on-disk-format.txt revision 2144
pkg(5): image packaging system

This information is Copyright (c) 2010, Oracle and/or its affiliates.
All rights reserved.

ON-DISK FORMAT PROPOSAL

1. Introduction
    1.1. Date of This Document:

        06/02/2010

    1.2. Name of Document Author/Supplier:

        Shawn Walker, Oracle,
        on behalf of the pkg(5) project team

    1.3. Acknowledgements:

        This document is largely based on comments from the following
        individuals, to whom the author is exceedingly indebted:

        - Danek Duvall
        - Mike Gerdts
        - Stephen Hahn
        - Krister Johansen
        - Dan Price
        - Brock Pytlik
        - Bart Smaalders
        - Peter Tribble

2. Project Summary

    2.1. Project Description:

        "...the repository can be archived up, put on a CD, memory
        stick, 2D barcode, and protected by the Black Knight, fire
        moats, komodo dragons, etc." - Danek Duvall

        pkg(5) is primarily a network-oriented binary packaging system.
        Although some of the tools it provides support filesystem-based
        operations for publication, the primary expected use for package
        operations (such as install, update, search, etc.) is between an
        intelligent client and one or more servers that provide access
        to a package repository and/or other interactive services.

        This project seeks to define and establish an on-disk format
        (and corresponding container format) for the pkg(5) system,
        with the intent that it can enable the ubiquitous, transparent
        use of package data from filesystem-based resources.

        The changes proposed by this project are evolutionary, not
        revolutionary, in nature. In particular, this project seeks
        to refine and adopt the existing repository format used by the
        pkg(5) depot server as the on-disk format. Supplementary to
        that, it also seeks the addition of a container format to ease
        provisioning of the on-disk format, and the unification of the
        scheme used by the client and server to store package data.

    2.2. Problem Area:

        For some deployments, network-based package data access is not
        possible or is undesirable. Concerns often cited in this area
        include:

        - lack of access control or ability to easily integrate with
          existing access control systems,

        - inability to rely on alternative (or existing) provisioning
          arrangements (such as NFS-based file servers),

        - environmental or procedural requirements that prohibit
          access to, or use of, a network-based service,

        - characteristics of network protocols (such as HTTP, etc.) that
          artificially limit functionality or performance (as opposed to
          iSCSI or other alternatives),

        - ease of administration of filesystem-based resources, and

        - ease of transferring package data.

3. Project Technical Description:
    3.1. Details:

        This project defines an on-disk format (and corresponding
        container format) that is intended for the supplemental or
        complete provisioning of package data at all stages of the
        package lifecycle. That is, when package data is published,
        stored by the client or server, or otherwise used during
        package operations.

        The on-disk format (defined in detail later in this document)
        is intended to be distributable in its raw form (a pre-defined
        structure of directories and files) or within a container format
        (such as a zip file, etc.).

        Out of necessity, the use of filesystem-based resources (such as
        those provided by the on-disk format) will sometimes limit the
        operations that can be performed to a subset of those normally
        available when interacting with a network-based repository. For
        example, search and publisher configuration may not be possible,
        and purely interactive services such as the BUI (Browser UI)
        offered by the depot server for a repository, RSS feeds, and
        others will not be available.

        Because of the wide-ranging impact of the changes required to
        implement this functionality, it is intended that the project
        be implemented in the following sequence:

        - Client Support for filesystem-based Repository Access

        - Depot Storage, Client Transport and Publication Tool Update

        - Client Storage and Image Format Update

        - Client and Depot Support for On-Disk Archive Format

    3.2. Bug/RFE Number(s):

        As an example of the kinds of defects and RFEs intended to be
        resolved by this project, see the following selection of
        defect.opensolaris.org bug IDs:

1968N/A2152 standalone package support needed (on-disk format)
1968N/A166 depot doesn't set directory mode when creating directories
1968N/A2086 validate that a repository is really a repository in pkg.depotd
1968N/A6335 publisher repo with invalid certificate information shouldn't
1968N/A prevent querying other repos
1968N/A6576 pkg install/image-update support for temporary publisher origins
1968N/A desired
1968N/A6940 depot support for file:// URI desired
1968N/A7213 ability to remove published packages
1968N/A7273 manifests should be arranged in a hierarchy by publisher
1968N/A7276 /var/pkg metadata needs reorg (looks busy)
1968N/A8433 client and pull need to refer to refer to "repository" instead of
1968N/A "server"
1968N/A8722 advanced repository metadata store needed
1968N/A8725 versioning information for depot and repository metadata needed
1968N/A9571 CachedManifest should be named FactoredManifest
1968N/A9572 CachedManifest should allow consumers to specify cache location
1968N/A9872 publication api should use new transport subsystem
1968N/A9933 ability to control repository creation behaviour or removal of it
1968N/A10244 caching dictionaries as a class variable prevents multi-image and
1968N/A repo search
1968N/A11362 Image update dying when trying to talk to a disabled and offline
1968N/A publisher
1968N/A11740 publishers with installed packages should not be removable
1968N/A12814 publisher prefixes should be forcibly lower-cased or case
1968N/A insensitive
1968N/A14802 ability to have separate read / write download caches
1968N/A15320 pkgsend will traceback if unable to parse server error response
1968N/A15371 repository property defaults opensolaris.org-specific

    3.3. In Scope:

        Filesystem-based data resourcing for package operations.

    3.4. Out of Scope:

        Package signing and fine-grained access control for package
        repositories.

4. On-Disk Format Technical Description:
    4.1. Overview:

        The on-disk format is intended to exist both in a raw format as
        a pre-defined structure of directories and files, and in an
        archive format which is primarily a simple container for
        the raw format.

    4.2. Raw Format:

        4.2.1. Goals:
            The goals for the raw on-disk format include:

            - unification of client and server package data storage
              for data common to both,

            - transparent usage of package data regardless of operation
              or use by client or server,

            - ease in composition and decomposition of package data
              stored within by publisher or package,

            - re-use of existing publication tools for on-disk format,

            - enablement of future publication tools to automatically
              be able to manipulate or use on-disk format, and

            - ease of provisioning.

        4.2.2. Raw Format specification:

            The pkg(5) repository format is a set of directories and
            files that conform to a pre-defined structure.

            For a version 3 repository (the current format), the
            structure is as follows:

            <REPO_ROOT>/
                catalog/
                    <catalog v1 files>
                index/
                    <index files>
                file/
                    <first two letters of file hash>/
                        <file-named-by-hash>
                pkg/
                    <stem>/
                        <manifest-file>
                trans/
                    <in-flight transaction files>
                cfg_cache (optional repository configuration file)

            Version 4 of the repository format eliminates the potential
            for unintended collisions between package metadata from
            different publishers and simplifies composition and
            decomposition of repository content. The top-level is an
            optional shared storage space for data common to all
            publishers in the repository, while the publisher
            subdirectory contains data specific to a publisher. It is
            essentially a nested repository format, and can be defined
            as follows:

            <REPO_ROOT>/
                file/ (optional)
                publisher/ (optional)
                    <prefix>/ (optional)
                        catalog/ (optional)
                            <catalog v1 files>
                        file/ (optional)
                            <first two letters of file hash>/
                                <file-named-by-hash>
                        index/ (optional)
                        pkg/ (optional)
                            <stem>/
                                <manifest-file-for-pkg-version>
                        trans/ (optional)
                            <in-flight transaction files>
                        pub.p5i (optional)
                pkg5.repository (required)

            By default, repository operations will store data in the
            publisher-specific location found under publisher/<prefix>
            for new repositories.

            In the case that the top-level file/ directory is used,
            automatic decomposition of contents into its publisher-
            specific components will not be possible unless
            corresponding package manifests are also available.

            To support easy composition, filtering, and creation of
            package archives, directories above marked with the text
            '(optional)' must not be required. The behaviour of
            consumers accessing the contents of the repository should
            be as follows based on the directory accessed:

            - file/
              This optional directory serves as a place to store file
              data for more than one publisher. Package files are
              stored in gzip format using the SHA-1 hash of the file
              as the filename, and then the first two letters of the
              filename as the parent directory's name.

            - publisher/<prefix>/catalog/
              If absent, consumers should determine the list of
              packages available based on the manifest files present
              in the publisher/ subdirectory. If present, consumers
              should expect v1 (or newer) catalog files, or none at
              all, to be contained within.

            - publisher/<prefix>/file/
              Consumers should always check this subdirectory first
              (if present) when retrieving package file data if the
              publisher is known. Package files are stored in gzip
              format using the SHA-1 hash of the file as the filename,
              and then the first two letters of the filename as the
              parent directory's name.

            - publisher/<prefix>/index/
              If absent, search functionality should be disabled for
              this publisher, or a fallback to 'slow manifest-based
              search' performed. If present, consumers should expect
              v1 (or newer) search files, or none at all, to be
              contained within.

            - publisher/<prefix>/pkg/
              If absent, search must be disabled for this publisher
              even if index is present. If present, manifests are
              stored in pkg(5) manifest format using the URI-encoded
              version of the package FMRI as the filename, and using
              the URI-encoded package FMRI stem (name) as the parent
              directory's name.

            - publisher/<prefix>/trans/
              If absent, this directory will be created during
              publication operations. If present, in-progress
              transaction data is stored in a directory named
              by the open time of the transaction as a UTC UNIX
              timestamp plus an '_' and the URI-encoded package
              FMRI. As an example:

                1245176111_pkg%3A%2FBRCMbnx%400.5.11%2C5.11-0.116
                %3A20090616T181511Z

            - publisher/<prefix>/pub.p5i
              This pkg(5) information (p5i) file should contain
              suggested configuration information for clients such as
              origins, mirrors, alias, etc. Consumers can use this to
              provide clients with initial or suggested configuration
              information for a given publisher. If not present, the
              publisher's identity should be assumed based on the
              directory structure, while the refresh interval should
              be assumed to be 4 hours.

            - pkg5.repository
              This file serves as an identifier and a place to store
              configuration information specific to the repository.
              It *is not* equivalent to the existing cfg_cache file,
              which will no longer be used. Its format and structure
              are as follows:

                [repository]
                version = <integer>
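
            The naming rules in the list above can be sketched in
            Python as follows. This is only an illustration, not the
            pkg(5) implementation: the function names are hypothetical,
            and it assumes that "URI-encoded" means percent-encoding of
            every reserved character, which is what the example names
            in this proposal suggest.

```python
from urllib.parse import quote


def file_relpath(file_hash, publisher=None):
    """Relative path of a package file named by its SHA-1 hash."""
    leaf = f"file/{file_hash[:2]}/{file_hash}"
    # Publisher-specific storage lives under publisher/<prefix>/.
    return f"publisher/{publisher}/{leaf}" if publisher else leaf


def candidate_file_relpaths(file_hash, publisher=None):
    """Lookup order: the publisher's file/ directory first, then the shared one."""
    paths = []
    if publisher:
        paths.append(file_relpath(file_hash, publisher))
    paths.append(file_relpath(file_hash))
    return paths


def manifest_relpath(publisher, stem, version):
    """Manifest path: URI-encoded stem directory, URI-encoded version file."""
    return (f"publisher/{publisher}/pkg/"
            f"{quote(stem, safe='')}/{quote(version, safe='')}")


def transaction_dirname(open_time, fmri):
    """Directory name under trans/: UTC UNIX timestamp, '_', URI-encoded FMRI."""
    return f"{int(open_time)}_{quote(fmri, safe='')}"
```

            A consumer that knows the publisher would try each path
            returned by candidate_file_relpaths() in order, which
            mirrors the "check the publisher subdirectory first" rule.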

            Any publisher-related information found in the cfg_cache
            file used by the previous repository format is now stored
            in the pub.p5i file for that publisher. (Examples of
            information include origins, mirrors, maintainer info,
            etc.) As a result, the cfg_cache file is no longer used.

            Any depot-specific properties, such as the feed icon, logo,
            etc. are now completely managed using SMF or a user-provided
            configuration file. This change was made not only to
            simplify configuration, but to separate depot configuration
            from repository configuration.

            An example version 4 repository might be structured as
            follows:

            <REPO_ROOT>/
                publisher/
                    example.com/
                        catalog/
                            catalog.attrs
                            catalog.base.C
                        file/
                            ff/
                                fffff277f5a8fb63e57670afc178415c2c5e706d
                        index/
                            __at_depend
                            ...
                        pkg/
                            package%2Fpkg/
                                0.5.11%2C5.11-0.136%3A20100327T063139Z
                        trans/
                            1245176111_pkg%3A%2FBRCMbnx%400.5.11%2C5.11-0.116
                              %3A20090616T181511Z
                        pub.p5i
                    example.net/
                        catalog/
                            catalog.attrs
                            catalog.base.C
                        file/
                            af/
                                affff277f5a8fb63e57670afc178415c2c5e706d
                        index/
                            __at_depend
                            ...
                        pkg/
                            package%2Fpkg/
                                0.5.11%2C5.11-0.133%3A20090327T062137Z
                        trans/
                            1245176111_pkg%3A%2FFAAMbnx%400.5.11%2C5.11-0.139
                              %3A20100616T181511Z
                        pub.p5i

            pkg5.repository:
                [repository]
                version = 4
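
            As a rough sketch, a consumer could read the repository
            version from pkg5.repository with Python's standard
            configparser module. This assumes the file is INI-compatible,
            as the layout shown above suggests; repository_version() is
            a hypothetical helper name, not part of any pkg(5) API.

```python
import configparser


def repository_version(text):
    """Return the integer version recorded in pkg5.repository contents."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Raises an error if the section or option is missing, or if the
    # value is not an integer.
    return parser.getint("repository", "version")


sample = "[repository]\nversion = 4\n"
version = repository_version(sample)
```

            A consumer would typically refuse to operate on, or fall
            back to read-only use of, a repository whose version it
            does not recognise.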

    4.3. Archive Format:

        4.3.1. Requirements:

            The requirements for the on-disk archive format include:

            - support for archives greater than 8GB in size,

            - support for files in archive greater than 4GB in size,

            - support for efficient storage of hard links,

            - support for pathnames significantly greater than 255
              characters in length,

            - core Python bindings exist or can be easily created using
              an existing library,

            - can be a container of compressed files, as opposed to a
              compressed container of uncompressed files,

            - open, royalty-free, well-documented format with wide
              platform support and acceptance,

            - multi-threaded decompression and compression possible,

            - creation and basic manipulation of package archives
              possible using widely-available tools,

            - simple composition and filtering of its content should be
              possible, and

            - random access to the archive contents must be possible
              without reading the entire archive file.

        4.3.2. Candidates:

            A number of potential archive formats have been considered
            for use, including:

            - 7z (7-Zip)
            - cpio
            - pax (portable archive exchange format)
            - ZIP

            The evaluations provided for each format here are not
            intended to be exhaustive; rather they focus on the specific
            requirements of this project. For more information about
            these formats, and the documents used to evaluate them,
            please refer to section 6 of this proposal.

        4.3.3. 7z Evaluation:

            The 7z format was rejected for the following reasons:

            - It either does not permit random access to archive
              contents or requires the entire archive file to access
              them; adding this capability would require a custom
              variation of 7z.

            - Although the 7z format supports compression methods other
              than LZMA, a primary motivator for using 7z would be the
              ability to use LZMA natively as part of the container
              format. However, the tradeoffs in terms of CPU and memory
              footprint currently make LZMA unsuitable for pkg(5) when
              compared to other compression algorithms such as those
              used by gzip(1).

            - Use of the 7z format would require integration of the LZMA
              SDK (which also provides a basic 7z API in C) and the
              creation of Python bindings or the integration of a third
              party's (such as pylzma).

            - No native support for extended attributes or UNIX owner/
              group permissions.

        4.3.4. cpio Evaluation:

            The cpio format doesn't natively support random access to
            archive contents, but the format itself doesn't prevent
            this. An index could be added as the first file in the
            archive with the information needed to provide fast, random
            access to the archive contents.

            The cpio format was rejected for the following reasons:

            - The length of pathnames in cpio archives is limited to
              256 characters for the portable format.

            - Available tools vary significantly in maximum archive size
              support.

            - The portable cpio format stores a copy of the file data
              with every hard link in an archive instead of simply
              storing a pointer to the source file in the archive.

        4.3.4. PAX Evaluation:

            The PAX format meets all of the requirements except that of
            random access to archive contents. However, the format
            itself doesn't prevent this. A table of contents file could
            be supplied as the first file in the archive with the
            information needed to provide fast, random access to the
            container contents.

        4.3.5. ZIP Evaluation:

            The ZIP format meets all of the requirements listed above
            (assuming that ZIP64 extensions are used), with the
            exceptions listed below for which it was rejected:

            - The use or implementation of some of the functionality
              documented in the .ZIP file format requires a license from
              PKWARE.

            - While random archive content access is possible, the ZIP
              file format stores the index for the archive at the end of
              the archive (as opposed to the beginning). This increases
              the number of round trips that would be required for
              potential remote random content access. It also means
              that extraction requires multiple seeks to the end of the
              file before any content can be extracted from the archive,
              which can be detrimental to performance for some media
              types (optical, etc.).

        4.3.6. Evaluation Conclusion:

            Based on the requirements set forth in section 4.3.1, the
            PAX format was selected as the on-disk archive format
            for pkg(5) packages. However, to enable efficient access
            to the archive contents, an index file needs to be present
            as the first file in the archive.

            Early evaluations of an unoptimised prototype were performed
            using a repository containing all packages for build 136 and
            unbundleds. The on-disk size of the repository was
            approximately 4.98G. The resulting archive was 5.0G in size,
            with an archive index file 9.7M in size (when the index was
            compressed using gzip).

            First-time access to the prototype archive for extraction of
            a single file after creation yielded a total time of
            approximately 5 seconds compared to approximately 36-42
            seconds for utilities such as pax(1), tar(1), or gtar(1).

            Creation of the archive took 7 minutes, 35 seconds on a
            custom-built Intel Core 2 Duo E8400, with 8GB Memory,
            and a 1TB 10000 RPM SATA Drive w/ 64MB Cache.

        4.3.7. Package Archive Specification:

            pkg(5) archive files will have an extension of 'p5p' which
            will stand for 'pkg(5) package'. The format of these
            archives matches that defined by IEEE Std 1003.1, 2004 for
            the pax Interchange Format, with the exception that the
            first archive entry must not use the optional pax headers
            allowed by the format, and must contain the index file
            for the package archive. The layout can be visualised as
            follows:

            .--------------------------------------------------------.
            | ustar header for package archive index file            |
            .--------------------------------------------------------.
            | file data for package archive index file               |
            .--------------------------------------------------------.
            | remaining archive data                                 |
            .________________________________________________________.

            The reason for this limitation is to ensure that clients
            performing selective archive extraction can be guaranteed
            to find the location and size of the package archive index
            file without knowing the size of the header for the index
            file in advance (ustar headers are always 512 bytes in
            size).

            In addition, pkg(5) archives in this format make remote,
            selective archive access possible. For example, a client
            could request the first 512 bytes of a pkg(5) archive file
            from a remote repository, then retrieve the archive index
            file. Once it has the archive index file, it can then
            perform an HTTP/1.1 byte-range request to selectively
            transfer the data for a set of specific files from the
            archive. This convention also optimises access to the
            archive for sources that are heavily biased towards
            sequential reads.
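
            To illustrate why a fixed 512-byte first header is useful,
            the following Python sketch locates the index file from
            nothing but the first block of an archive. It relies on the
            standard ustar field positions (name at bytes 0-99, octal
            size at bytes 124-135); index_location() and
            build_sample_archive() are hypothetical helper names, and
            the sample archive is built in memory purely for
            demonstration.

```python
import io
import tarfile

BLOCK = 512  # ustar header size and archive block size


def index_location(header):
    """Return (name, data_offset, size, next_entry_offset) for the first entry."""
    if len(header) != BLOCK:
        raise ValueError("expected a 512-byte ustar header")
    name = header[0:100].split(b"\0", 1)[0].decode("ascii")
    # The size field is an ASCII octal number, NUL- or space-terminated.
    size = int(header[124:136].rstrip(b"\0 ").decode("ascii"), 8)
    padded = -(-size // BLOCK) * BLOCK  # file data is padded to whole blocks
    return name, BLOCK, size, BLOCK + padded


def build_sample_archive(index_bytes):
    """Build an in-memory archive whose first entry uses a plain ustar header."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("p5p.index.0.0.gz")
        info.size = len(index_bytes)
        tar.addfile(info, io.BytesIO(index_bytes))
    return buf.getvalue()


archive = build_sample_archive(b"x" * 700)
name, start, size, next_entry = index_location(archive[:BLOCK])
index_data = archive[start:start + size]
```

            A remote client would replace the two slices of "archive"
            with byte-range requests: one for the first 512 bytes, and
            one for the range covering the index file data.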

            The index file must be named using the following template
            and be compressed using the gzip format described by RFCs
            1951 and 1952, and formatted according to section 4.3.8:

                p5p.index.<index_file_number>.<index_version>.gz

                <index_file_number> is an integer in string form that
                indicates which index file this is. The number only
                exists so that each index file can remain unique in
                the archive. An archive may contain multiple index
                files to support fast archive additions.

                <index_version> is an integer in string form that
                indicates the version of the index file. The initial
                version for this proposal will be '0'.

            If the first file in the archive is found not to be in the
            layout or format shown above, or any of the index files in
            the archive are found to not be in a format supported by
            the client (version too old or too new), the archive will
            be treated as a standard pax archive and some operations
            may not be possible or may experience degraded performance.
            The same is also true if the index file is found to not
            match the archive contents.

            When creating the archive, or adding to an existing archive,
            new index gzip files should be zero-padded with an extra 256
            bytes at the end. This reserved space is used for fast
            additions to existing package archives by updating the
            previous index file with an entry for the new index file.
            For example, the first index file's last entry should
            contain the name and offset of the second index file,
            and so on.

            All pathnames after the first in the archive (if the first
            file is the archive index file) must conform to the
            repository layout specified in section 4.2.2 of this
            proposal.

            Since a pkg(5) repository can contain one or more packages,
            pkg(5) archive files can also contain the data for one or
            more packages. This allows easy redistribution of a single
            package and all of its dependencies in a single file.

            Finally, it should be noted that only ASCII character
            pathnames are expected in the archive as the raw repository
            format does not use or support Unicode pathnames.

        4.3.8. Package Archive Index Specification:

            The pkg(5) archive index file enables fast, efficient access
            to the contents of an archive. It contains an entry for all
            files in the archive excluding the index file itself in the
            following format (also referred to as index format version
            0):

            <name>NUL<offset>NUL<entry_size>NUL<size>NUL<typeflag>NL

                <name> is a string containing the pathname of the file
                in the archive using only ASCII characters. It can be
                up to 65,535 bytes in length.

                <offset> is an unsigned long long integer in string form
                containing the relative offset in bytes of the first
                header block for the file in the archive. The offset is
                relative to the end of the last block of the index file
                in which the entry is listed.

                <entry_size> is an unsigned long long integer in string
                form containing the size of the file's entry in bytes
                in the archive (including archive headers and trailers
                for the entry).

                <size> is an unsigned long long integer in string form
                containing the size of the file in bytes in the archive.

                <typeflag> is a single character representing the type
                of the file in the archive. Possible values are:
                    0 Regular File
                    1 Hard Link
                    2 Symbolic Link
                    5 Directory or subdirectory

                All values not listed above are reserved for future
                use. Unrecognised values should be treated as a
                regular file.

            An example set of entries would appear as follows:

            pkg5.repositoryNUL0NUL2560NUL546NUL0
            pkgNUL2560NUL1536NUL0NUL5
            pkg/service%2Ffault-managementNUL4096NUL1536NUL0NUL5
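
            A streaming reader for index format version 0 might look
            like the following Python sketch. The names IndexEntry,
            iter_index(), format_entry(), and byte_range() are
            hypothetical, the field order follows the format line given
            above, and the sample values are illustrative only.

```python
import gzip
import io
from typing import Iterator, NamedTuple


class IndexEntry(NamedTuple):
    name: str
    offset: int
    entry_size: int
    size: int
    typeflag: str


def format_entry(name, offset, entry_size, size, typeflag):
    """Serialise one entry: NUL-separated fields terminated by a newline."""
    fields = [name, str(offset), str(entry_size), str(size), typeflag]
    return "\0".join(fields).encode("ascii") + b"\n"


def iter_index(stream) -> Iterator[IndexEntry]:
    """Yield entries one line at a time so the index is never fully loaded."""
    for raw in stream:
        line = raw.rstrip(b"\n")
        if not line:
            continue
        name, offset, entry_size, size, typeflag = line.split(b"\0")
        yield IndexEntry(name.decode("ascii"), int(offset), int(entry_size),
                         int(size), typeflag.decode("ascii"))


def byte_range(entry, index_end):
    """HTTP Range header value covering an entire archive entry.

    index_end is the absolute offset of the end of the last block of the
    index file, since entry offsets are relative to that point.
    """
    first = index_end + entry.offset
    return f"bytes={first}-{first + entry.entry_size - 1}"


sample = gzip.compress(
    format_entry("pkg5.repository", 0, 2560, 546, "0")
    + format_entry("pkg", 2560, 1536, 0, "5")
)
with gzip.open(io.BytesIO(sample)) as stream:
    entries = list(iter_index(stream))
```

            Because each entry is a single line, the reader can stop as
            soon as it has found the names it needs.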

            It should be noted that other possible formats were
            evaluated for the index file, including those based
            on JSON, XDR, and Python's pack. However, all other
            formats were found to be deficient for one or more
            of the following reasons:

            - larger in size

            - no streaming support (requires the entire index file to
              be loaded into memory)

            - significantly greater parsing times

5. Proposed Changes:

    5.1. Client Support for filesystem-based Repository Access:

        The pkg.client.api provided by pkg(5) will be updated to allow
        access to repositories via the filesystem. All functionality
        normally offered by pkg.depotd will be supported.

        pkg(1) and packagemanager(1) will be modified to support the
        use of URIs using the 'file' scheme. No user visible changes
        will be made to any existing subcommands or options except
        that URIs using the 'file' scheme will be allowed.

        When accessing repositories using the 'file' scheme, clients
        by default will not copy package file data into the client's
        cache (e.g. /var/pkg/download). Instead, the transport system
        will treat configured repositories as an additional read-only
        cache.

    5.2. Depot Storage, Client Transport and Publication Tool Update:

        The pkg.server.repository module will be updated to support
        the new repository format outlined in section 4.2.2. Existing
        repositories will not automatically be upgraded, while new
        repositories will use the new format. A new administrative
        command detailed below has been introduced to allow upgrading
        existing repositories to the new format.

        These changes will automatically allow the client to access
        repositories in the new format when using filesystem-based
        access. Older clients will remain unable to access
        repositories in the new format.

        The client transport system will be updated to support all
        publication operations and the publication tools and project
        private APIs will be changed to use the client transport
        system.

        The '-d' option of pkgrecv(1) will be changed such that if
        the name of a file with a '.p5p' extension is specified,
        and that file does not already exist, a pkg(5) archive
        file will be created containing the specified packages.
        If the file already exists, pkgrecv(1) will exit with an
        error. When pkgrecv(1) creates pkg(5) archive files, it will
        omit catalog and index data.

        Due to the transport changes above, pkgrecv(1) will also
        be able to use pkg(5) archive files as a source of package
        data. pkgsend(1) will not support the use of pkg(5)
        archive files as a destination due to the publication
        model it currently uses.

        To support the expanded multiple publisher version 4 format
        of repositories, the depot server will be updated to respond
        to requests as follows:

        - If clients include the publisher prefix as part of the request
          path, then responses will be for that specific publisher's
          data. For example:

            http://localhost/dev/opensolaris.org/manifest/
            0/opensolaris.org/backup%2Fareca/7.1%2C5.11-0.134
            %3A20100302T005731Z

            http://localhost/dev/file/0/opensolaris.org/
            2ce6c746c85cd7ac44571d094b53c5fe1bfc32c8

        - The default publisher specified in the depot configuration
          will be used when responding to requests for operations that
          do not include the publisher prefix. For example:

            http://localhost/dev/manifest/0/
            backup%2Fareca/7.1%2C5.11-0.134%3A20100302T005731Z

          ...provides a response identical to the first case where the
          publisher prefix was provided as part of the request. Those
          expecting to maintain a large population of older clients
          should reassign publisher URLs down a level, to include the
          publisher explicitly, although this is not required for
          correct operation.

1968N/A A new utility named pkgrepo will be added to facilitate the
1968N/A creation and management of pkg(5) repositories. It will have
1968N/A the following global options:
1968N/A
1968N/A -s repo_uri_or_path
1968N/A A URI or path specifying the location of a pkg(5)
1968N/A package repository.
1968N/A
1968N/A -? / --help
1968N/A
1968N/A It will have the following subcommands:
1968N/A
1968N/A create <uri_or_path>
1968N/A Creates a pkg(5) repository at the specified location.
1968N/A Can only be used with filesystem-based repositories.
1968N/A
1968N/A publisher [<pub_prefix> ...]
1968N/A Lists the publishers of packages in the repository:
1968N/A
1968N/A PUBLISHER PACKAGES VERSIONS UPDATED
1968N/A <pub_1> <num_uniq_pkgs> <num_pkg_vers> <cat_last_modified>
1968N/A <pub_2> <num_uniq_pkgs> <num_pkg_vers> <cat_last_modified>
1968N/A ...
1968N/A
1968N/A rebuild
1968N/A Discards any catalog, search or other cached information
1968N/A found in the repository and then re-creates it based on
1968N/A the current contents of the repository. Can only be used
1968N/A with filesystem-based repositories.
1968N/A
1968N/A refresh
1968N/A By default, catalogs any new packages found in the
1968N/A repository and updates search indices. This is intended for
1968N/A use with deferred publication (--no-catalog or --no-index
1968N/A options of pkgsend). Can only be used with filesystem-based
1968N/A repositories.
1968N/A
1968N/A Options:
1968N/A --no-catalog - doesn't add new packages
1968N/A --no-index - doesn't refresh search indices
1968N/A
1968N/A remove fmri_pattern ...
1968N/A Removes the specified package(s) from the repository.
1968N/A If more than one match is found for any given pattern,
1968N/A the exact FMRI must be provided.
1968N/A
1968N/A upgrade
1968N/A Can only be used with filesystem-based repositories.
1968N/A Upgrades the repository to the most current format if
1968N/A possible.
1968N/A
1968N/A Has these options:
1968N/A
1968N/A -n determine whether the upgrade could be performed and exit
1968N/A
1968N/A -v show a summary of what will be done, the current format
1968N/A of the repository and what it will be upgraded to
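
The option grammar listed above can be modeled with Python's standard
argparse module. This is only a sketch of the command-line shape: the
attribute names (repo, dry_run, verbose) are hypothetical, no subcommand
behavior is implemented, and argparse supplies '-h/--help' rather than the
'-?' spelling shown above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the proposed pkgrepo options."""
    parser = argparse.ArgumentParser(prog="pkgrepo")
    # Global option: location of the repository.
    parser.add_argument("-s", dest="repo", metavar="repo_uri_or_path")
    sub = parser.add_subparsers(dest="subcommand", required=True)

    create = sub.add_parser("create")
    create.add_argument("uri_or_path")

    publisher = sub.add_parser("publisher")
    publisher.add_argument("pub_prefix", nargs="*")

    sub.add_parser("rebuild")

    refresh = sub.add_parser("refresh")
    refresh.add_argument("--no-catalog", action="store_true")
    refresh.add_argument("--no-index", action="store_true")

    remove = sub.add_parser("remove")
    remove.add_argument("fmri_pattern", nargs="+")

    upgrade = sub.add_parser("upgrade")
    upgrade.add_argument("-n", dest="dry_run", action="store_true")
    upgrade.add_argument("-v", dest="verbose", action="store_true")
    return parser
```

For example, build_parser().parse_args(["-s", "/export/repo", "refresh",
"--no-index"]) yields a namespace with subcommand set to "refresh", no_index
set to True and no_catalog set to False.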
1968N/A
1968N/A 5.3. Client Storage and Image Format Update:
1968N/A
1968N/A To simplify and unify the storage format used by the client
1968N/A and pkg(5) repositories, the format of the client image
1968N/A will be changed to use the structure described below.
1968N/A
2144N/A For a version 3 image (the current format), the structure is as
2144N/A follows:
2144N/A
2144N/A <IMG_ROOT>
2144N/A     download/
2144N/A         <first two letters of file hash>/
2144N/A             <file-named-by-hash>
2144N/A     file/
2144N/A     gui_cache/
2144N/A     history/
2144N/A     index/
2144N/A     lost+found/
2144N/A     pkg/
2144N/A         <stem>/
2144N/A             <version>/
2144N/A                 manifest
2144N/A                 manifest.<cachefiles>
2144N/A     publisher/
2144N/A         <prefix>/
2144N/A             catalog/
2144N/A             certs/ (optional)
2144N/A             last_refreshed (optional)
2144N/A     state/
2144N/A         installed/
2144N/A             <image catalog files>
2144N/A         known/
2144N/A             <image catalog files>
2144N/A     tmp/
2144N/A     cfg_cache
2144N/A     lock
2144N/A
1968N/A For a version 4 image (the proposed format), the structure is
1968N/A as follows:
1968N/A
1968N/A <IMG_ROOT>
1968N/A     cache/
2144N/A         index/
2144N/A             <api search index files>
2144N/A         publisher/
2144N/A             <publisher_prefix>/
2144N/A                 catalog/
2144N/A                     <repository composition cache files>
2144N/A                 pkg/
2144N/A                     <stem>/
2144N/A                         <version>/
2144N/A                             <manifest-cache-files>
2144N/A         tmp/
2144N/A             <api temporary files>
2144N/A     gui_cache/
2144N/A         <package manager data files>
2144N/A     history/
2144N/A         <client history files>
2144N/A     license/
2144N/A         <stem>/
2144N/A             <license files>
2144N/A     lost+found/
2144N/A         <salvaged filesystem objects>
2144N/A     publisher/
2144N/A         <prefix>/
1968N/A             certs/
1968N/A                 <publisher signing certificates>
2144N/A             <otherwise as described in section 4.2.2>
1968N/A     ssl/
1968N/A         <client ssl certificates>
1968N/A     state/
1968N/A         installed/
1968N/A             <image catalog files>
1968N/A         known/
1968N/A             <image catalog files>
1968N/A     pkg5.image (client configuration file; was cfg_cache)
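
As an illustration of the version 4 layout, the sketch below creates the
fixed directories and an empty configuration file under a scratch image
root. It is not the client's image creation code: the names V4_SUBDIRS and
create_v4_skeleton are hypothetical, and the nesting of index/, publisher/
and tmp/ beneath cache/ is an assumption about the structure listed above.
Per-publisher and per-package directories are omitted because they depend
on image content.

```python
import os

# Fixed directories of a version 4 image, relative to <IMG_ROOT>.
# Assumption: index/, publisher/ and tmp/ are nested under cache/.
V4_SUBDIRS = [
    "cache/index",
    "cache/publisher",
    "cache/tmp",
    "gui_cache",
    "history",
    "license",
    "lost+found",
    "publisher",
    "ssl",
    "state/installed",
    "state/known",
]


def create_v4_skeleton(img_root):
    """Create the fixed version 4 directories beneath 'img_root'.

    Returns the path of the (empty placeholder) client configuration
    file, pkg5.image.
    """
    for sub in V4_SUBDIRS:
        os.makedirs(os.path.join(img_root, *sub.split("/")), exist_ok=True)
    cfg = os.path.join(img_root, "pkg5.image")
    # Create the file if missing without truncating an existing one.
    with open(cfg, "a"):
        pass
    return cfg
```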
1968N/A
1968N/A A new property named 'version' will be added to the image
1968N/A and will be readonly (cannot be set using the set-property
1968N/A subcommand of pkg(1)).
1968N/A
1968N/A Existing images will not automatically be upgraded to the new
1968N/A format. To enable the upgrading of existing images to newer
1968N/A formats, the following subcommand will be added:
1968N/A
2144N/A update-format
1968N/A Updates the format of the client's image to the current
1968N/A format if possible.
1968N/A
1968N/A 5.4. Client and Depot Support for On-Disk Archive Format:
1968N/A
1968N/A The pkg.server.repository module will be updated to support
1968N/A the serving of a repository in readonly mode using a pkg(5)
1968N/A archive file.
1968N/A
1968N/A The pkg.client.api transport system will be updated to support
1968N/A the usage of a pkg(5) archive file as an origin for package
1968N/A data.
1968N/A
1968N/A To support the specification of temporary origins, the install
1968N/A and image-update subcommands will be modified by adding a '-g'
1968N/A option to specify additional temporary package origin URIs or
1968N/A the path to a pkg(5) archive file or pkg(5) info file. The
1968N/A '-g' option may be specified multiple times. As an example:
1968N/A
1968N/A $ pkg install -g /path/to/foo.p5p \
1968N/A -g http://mytemprepo:10000/ \
1968N/A -g file:/path/to/bar.p5p \
1968N/A foo bar localpkg
1968N/A
1968N/A $ pkg image-update -g /path/to/foo.p5p
1968N/A
1968N/A pkg(5) archive files used as a source of package data during an
1968N/A install or image-update operation will have their content cached
1968N/A by the client before the operation begins. Any publishers found
1968N/A in the archive will be temporarily added to the image if they do
1968N/A not already exist. Publishers that were temporarily added but
1968N/A not used during the operation will be removed after operation
1968N/A completion or failure. Any package FMRIs or patterns provided
1968N/A will be matched using only the sources provided using '-g'.
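
The lifetime of temporarily added publishers described above can be
sketched with a context manager. This is a simplified model, not client API
code: publishers are represented as plain strings in a set, and the name
temporary_publishers is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def temporary_publishers(image_publishers, source_publishers):
    """Temporarily add publishers found in '-g' sources to an image.

    'image_publishers' is a mutable set of publisher prefixes already
    configured in the image. The caller records the publishers it
    actually used in the yielded set. Publishers that were added but
    never used are removed again, whether the operation completes or
    fails.
    """
    added = set(source_publishers) - image_publishers
    image_publishers.update(added)
    used = set()
    try:
        yield used
    finally:
        # Keep pre-existing publishers and any that were used.
        image_publishers.difference_update(added - used)
```

For example, if the image knows only example.com and an archive provides
example.com, example.org and example.net, then an install that uses only
example.org leaves the image with example.com and example.org afterwards.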
1968N/A
1968N/A The pkg list and pkg info commands will also be updated by
1968N/A adding the '-g' option described above, with the exception
1968N/A that the '-g' option may only be specified once, and only
1968N/A the source named will be used for the operation.
1968N/A
1968N/A Using '-g' with the pkg list subcommand implies '-n' by default,
1968N/A unless '-f' is specified; it also implies '-a'. To list all
1968N/A versions, the '-f' option must be used. As an example:
1968N/A
1968N/A $ pkg list -g /path/to/foo.p5p
1968N/A NAME (PUBLISHER) VERSION STATE UFOXI
1968N/A bar (example.com) 1.0-0.133 known -----
1968N/A foo (example.com) 1.0-0.133 installed -----
1968N/A
1968N/A $ pkg list -g file:/path/to/foo.p5p
1968N/A NAME (PUBLISHER) VERSION STATE UFOXI
1968N/A bar (example.com) 1.0-0.133 known -----
1968N/A foo (example.com) 1.0-0.133 installed -----
1968N/A
1968N/A $ pkg list -f -g http://example.com/multi_foo.p5p
1968N/A NAME (PUBLISHER) VERSION STATE UFOXI
1968N/A foo (example.com) 1.0-0.133 installed u----
1968N/A foo (example.com) 2.0-0.133 known u----
1968N/A foo (example.com) 3.0-0.133 known -----
1968N/A
1968N/A $ pkg list -g file:/path/to/repo
1968N/A NAME (PUBLISHER) VERSION STATE UFOXI
1968N/A repopkg (example.com) 2.0-0.133 known -----
1968N/A
1968N/A $ pkg list -g http://myrepo:10000
1968N/A NAME (PUBLISHER) VERSION STATE UFOXI
1968N/A localpkg (example.org) 3.0-0.133 known -----
1968N/A
1968N/A Using '-g' with the pkg info subcommand implies '-r'. The '-l'
1968N/A option cannot be used in combination with '-g'. As an example:
1968N/A
1968N/A $ pkg info -g /path/to/bundle.p5p
1968N/A Name: bar
1968N/A Summary: A useful complement to foo.
1968N/A State: Not Installed
1968N/A ...
1968N/A Name: foo
1968N/A Summary: Provides useful utilities.
1968N/A State: Installed
1968N/A ...
1968N/A
1968N/A '-g' was chosen for the option usage described above to match
1968N/A the '-g' already used by set-publisher and image-create for
1968N/A origins, and due to the unfortunate existing usage of '-s'
1968N/A by the 'pkg list' subcommand.
1968N/A
1968N/A6. Reference Documents:
1968N/A
1968N/A Project team members and community members have provided a number of
1968N/A informal comments that served as the basis for the goals of this
1968N/A project:
1968N/A
1968N/A - "new on-disk format?", 18 Jan. 2008:
1968N/A http://markmail.org/thread/2kg6w5bfwp4x3knc
1968N/A
1968N/A - "reorganising the repository and client metadata", 23 Sep. 2009:
1968N/A http://markmail.org/thread/stfrosvx3v6if2fi
1968N/A
1968N/A - "ZAP - Zip Archive Packaging", Sep. 2007:
1968N/A http://markmail.org/thread/ijyq3mlrhaofccgx
1968N/A
1968N/A In addition, the following materials were referenced when writing
1968N/A this proposal:
1968N/A
1968N/A - "7z", 12 Apr. 2010:
1968N/A http://en.wikipedia.org/wiki/7z
1968N/A
1968N/A - "RFC2616: HTTP/1.1 Header Field Definitions", 01 Sep. 2004:
1968N/A http://www.w3.org/Protocols/rfc2616/
1968N/A rfc2616-sec14.html#sec14.35.1
1968N/A
1968N/A - "cpio", 21 Mar. 2010:
1968N/A http://en.wikipedia.org/wiki/Cpio
1968N/A
1968N/A - "copy file archives in and out", 26 Mar. 2007:
1968N/A http://heirloom.sourceforge.net/man/cpio.1.html
1968N/A
1968N/A - "The gzip file format", Date Unknown:
1968N/A http://www.gzip.org/format.txt
1968N/A
1968N/A - "DragonFly File Formats Manual, cpio -- format of cpio archive
1968N/A files"
1968N/A http://leaf.dragonflybsd.org/cgi/web-man?command=cpio&section=5
1968N/A
1968N/A - "A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA", 31 May 2005:
1968N/A http://tukaani.org/lzma/benchmarks.html
1968N/A
1968N/A - "Lempel Ziv Markov Algorithm and 7-Zip", 7 Feb. 2008:
1968N/A http://blogs.sun.com/clayb/entry/lempel_ziv_markov_algorithm_and
1968N/A
1968N/A - "The Open Group Base Specifications Issue 6: pax Interchange
1968N/A Format, IEEE Std 1003.1, 2004 Edition"
1968N/A http://www.opengroup.org/onlinepubs/009695399/utilities/
1968N/A pax.html#tag_04_100_13_01
1968N/A
1968N/A - ".ZIP File Format Specification", 28 Sep. 2007:
1968N/A http://www.pkware.com/documents/casestudies/APPNOTE.TXT
1968N/A
1968N/A - "ZIP (file format)", 17 Apr. 2010:
1968N/A http://en.wikipedia.org/wiki/ZIP_%28file_format%29