pkg(5): image packaging system

This information is Copyright (c) 2010, Oracle and/or its affiliates.
All rights reserved.

ON-DISK FORMAT PROPOSAL

1. Introduction

    1.1. Date of This Document:

        06/02/2010

    1.2. Name of Document Author/Supplier:

        Shawn Walker, Oracle, on behalf of the pkg(5) project team

    1.3. Acknowledgements:

        This document is largely based on comments from the following
        individuals, to whom the author is exceedingly indebted:

        - Danek Duvall
        - Mike Gerdts
        - Stephen Hahn
        - Krister Johansen
        - Dan Price
        - Brock Pytlik
        - Bart Smaalders
        - Peter Tribble

2. Project Summary

    2.1. Project Description:

        "...the repository can be archived up, put on a CD, memory
        stick, 2D barcode, and protected by the Black Knight, fire
        moats, komodo dragons, etc." - Danek Duvall

        pkg(5) is primarily a network-oriented binary packaging system.
        Although some of the tools it provides support filesystem-based
        operations for publication, the primary expected use for package
        operations (such as install, update, search, etc.) is between an
        intelligent client and one or more servers that provide access
        to a package repository and/or other interactive services.

        This project seeks to define and establish an on-disk format
        (and corresponding container format) for the pkg(5) system, with
        the intent that it can enable the ubiquitous, transparent use of
        package data from filesystem-based resources.

        The changes proposed by this project are evolutionary, not
        revolutionary, in nature. In particular, this project seeks to
        refine and adopt the existing repository format used by the
        pkg(5) depot server as the on-disk format. Supplementary to
        that, it also seeks the addition of a container format to ease
        provisioning of the on-disk format, and the unification of the
        scheme used by the client and server to store package data.

    2.2. Problem Area:

        For some deployments, network-based package data access is not
        possible or is undesirable.
        Concerns often cited in this area include:

        - lack of access control or ability to easily integrate with
          existing access control systems,
        - inability to rely on alternative (or existing) provisioning
          arrangements (such as NFS-based file servers),
        - environmental or procedural requirements that prohibit the
          use of a network-based service,
        - characteristics of network protocols (such as HTTP, etc.)
          that artificially limit functionality or performance (as
          opposed to iSCSI or other alternatives),
        - ease of administration of filesystem-based resources, and
        - ease of transferring package data.

3. Project Technical Description:

    3.1. Details:

        This project defines an on-disk format (and corresponding
        container format) that is intended for the supplemental or
        complete provisioning of package data at all stages of the
        package lifecycle; that is, when package data is published,
        stored by the client or server, or otherwise used during
        package operations.

        The on-disk format (defined in detail later in this document)
        is intended to be distributable in its raw form (a pre-defined
        structure of directories and files) or within a container
        format (such as a zip file, etc.).

        Out of necessity, the use of filesystem-based resources (such
        as those provided by the on-disk format) will sometimes limit
        the operations that can be performed to a subset of those
        normally available when interacting with a network-based
        repository. For example, search and publisher configuration may
        not be possible, and purely interactive services such as the
        BUI (Browser UI) offered by the depot server for a repository,
        RSS feeds, and others will not be available.
        Because of the wide-ranging impact of the changes required to
        implement this functionality, it is intended that the project
        be implemented in the following sequence:

        - Client Support for filesystem-based Repository Access
        - Depot Storage, Client Transport and Publication Tool Update
        - Client Storage and Image Format Update
        - Client and Depot Support for On-Disk Archive Format

    3.2. Bug/RFE Number(s):

        As an example of the kinds of defects and RFEs intended to be
        resolved by this project, see the following selection of
        defect.opensolaris.org bug IDs:

         2152 standalone package support needed (on-disk format)
          166 depot doesn't set directory mode when creating directories
         2086 validate that a repository is really a repository in
              pkg.depotd
         6335 publisher repo with invalid certificate information
              shouldn't prevent querying other repos
         6576 pkg install/update support for temporary publisher origins
              desired
         6940 depot support for file:// URI desired
         7213 ability to remove published packages
         7273 manifests should be arranged in a hierarchy by publisher
         7276 /var/pkg metadata needs reorg (looks busy)
         8433 client and pull need to refer to refer to "repository"
              instead of "server"
         8722 advanced repository metadata store needed
         8725 versioning information for depot and repository metadata
              needed
         9571 CachedManifest should be named FactoredManifest
         9572 CachedManifest should allow consumers to specify cache
              location
         9872 publication api should use new transport subsystem
         9933 ability to control repository creation behaviour or
              removal of it
        10244 caching dictionaries as a class variable prevents
              multi-image and repo search
        11362 Image update dying when trying to talk to a disabled and
              offline publisher
        11740 publishers with installed packages should not be removable
        12814 publisher prefixes should be forcibly lower-cased or case
              insensitive
        14802 ability to have separate read / write download caches
        15320 pkgsend will traceback if unable to parse server error
              response
        15371
              repository property defaults opensolaris.org-specific

    3.3. In Scope:

        Filesystem-based data resourcing for package operations.

    3.4. Out of Scope:

        Package signing and fine-grained access control for package
        repositories.

4. On-Disk Format Technical Description:

    4.1. Overview:

        The on-disk format is intended to exist both in a raw format as
        a pre-defined structure of directories and files, and in an
        archive format which is primarily a simple container for the
        raw format.

    4.2. Raw Format:

        4.2.1. Goals:

            The goals for the raw on-disk format include:

            - unification of client and server package data storage for
              data common to both,
            - transparent usage of package data regardless of operation
              or use by client or server,
            - ease in composition and decomposition of package data
              stored within by publisher or package,
            - re-use of existing publication tools for on-disk format,
            - enablement of future publication tools to automatically
              be able to manipulate or use on-disk format, and
            - ease of provisioning.

        4.2.2. Raw Format specification:

            The pkg(5) repository format is a set of directories and
            files that conform to a pre-defined structure. For a
            version 3 repository (the current format), the structure is
            as follows:

                /
                    catalog/
                    index/
                    file/
                        <first two letters of hash>/
                    pkg/
                        <uri-encoded stem>/
                    trans/
                    cfg_cache (optional repository configuration file)

            Version 4 of the repository format eliminates the potential
            for unintended collisions between package metadata from
            different publishers and simplifies composition and
            decomposition of repository content. The top-level is an
            optional shared storage space for data common to all
            publishers in the repository, while the publisher
            subdirectory contains data specific to a publisher.
            It is essentially a nested repository format, and can be
            defined as follows:

                /
                    file/                               (optional)
                    publisher/                          (optional)
                        <publisher>/                    (optional)
                            catalog/                    (optional)
                            file/                       (optional)
                                <first two letters of hash>/
                            index/                      (optional)
                            pkg/                        (optional)
                                <uri-encoded stem>/
                            trans/                      (optional)
                            pub.p5i                     (optional)
                    pkg5.repository                     (required)

            By default, repository operations will store data in the
            publisher-specific location found under publisher/ for new
            repositories. In the case that the top-level file/
            directory is used, automatic decomposition of contents into
            its publisher-specific components will not be possible
            unless corresponding package manifests are also available.

            To support easy composition, filtering, and creation of
            package archives, directories above marked with the text
            '(optional)' must not be required. The behaviour of
            consumers accessing the contents of the repository should
            be as follows based on the directory accessed:

            - file/
                This optional directory serves as a place to store file
                data for more than one publisher. Package files are
                stored in gzip format using a sha1sum of the file as
                the filename, and then the first two letters of the
                filename as the parent directory's name.

            - publisher/<publisher>/catalog/
                If absent, consumers should determine the list of
                packages available based on the manifest files present
                in the publisher/ subdirectory. If present, consumers
                should expect v1 (or newer) catalog files, or none at
                all, to be contained within.

            - publisher/<publisher>/file/
                Consumers should always check this subdirectory first
                (if present) when retrieving package file data if the
                publisher is known. Package files are stored in gzip
                format using a sha1sum of the file as the filename, and
                then the first two letters of the filename as the
                parent directory's name.

            - publisher/<publisher>/index/
                If absent, search functionality should be disabled for
                this publisher, or a fallback to 'slow manifest-based
                search' performed. If present, consumers should expect
                v1 (or newer) search files, or none at all, to be
                contained within.
            - publisher/<publisher>/pkg/
                If absent, search must be disabled for this publisher
                even if index is present. If present, manifests are
                stored in pkg(5) manifest format using the URI-encoded
                version of the package FMRI as the filename, and using
                the URI-encoded package FMRI stem (name) as the parent
                directory's name.

            - publisher/<publisher>/trans/
                If absent, this directory will be created during
                publication operations. If present, in-progress
                transaction data is stored in a directory named by the
                open time of the transaction as a UTC UNIX timestamp
                plus an '_' and the URI-encoded package FMRI. As an
                example:

                1245176111_pkg%3A%2FBRCMbnx%400.5.11%2C5.11-0.116%3A20090616T181511Z

            - publisher/<publisher>/pub.p5i
                This pkg(5) information (p5i) file should contain
                suggested configuration information for clients such as
                origins, mirrors, alias, etc. Consumers can use this to
                provide clients with initial or suggested configuration
                information for a given publisher. If not present, the
                publisher's identity should be assumed based on the
                directory structure, while the refresh interval should
                be assumed to be 4 hours.

            - pkg5.repository
                This file serves as an identifier and a place to store
                configuration information specific to the repository.
                It *is not* an equivalent to the existing cfg_cache
                file, which will no longer be used. Its format and
                structure are as follows:

                    [repository]
                    version = <version>

            Any information found in the cfg_cache used in the previous
            repository format related to a publisher is now stored in
            the pub.p5i file for the related publisher. (Examples of
            information include origins, mirrors, maintainer info,
            etc.) As a result, the cfg_cache file is no longer used.

            Any depot-specific properties, such as the feed icon, logo,
            etc. are now completely managed using SMF or a
            user-provided configuration file. This change was made not
            only to simplify configuration, but to separate depot
            configuration from repository configuration.
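The naming rules described above can be sketched in Python as follows. This is an illustration only, not the pkg(5) implementation: the function names are hypothetical, and the sketch assumes that the SHA-1 digest is taken over the uncompressed file content (the text does not say which) and that the manifest filename is the URI-encoded version string, as the example layout suggests.

```python
import hashlib
import posixpath
from urllib.parse import quote


def file_path(publisher, content):
    """Relative path of a package file under publisher/<publisher>/file/.

    Assumption: the SHA-1 digest is computed over the uncompressed
    content; the stored file itself would be gzip-compressed.
    """
    digest = hashlib.sha1(content).hexdigest()
    # The first two letters of the filename name the parent directory.
    return posixpath.join("publisher", publisher, "file", digest[:2], digest)


def manifest_path(publisher, stem, version):
    """Relative path of a manifest: URI-encoded stem as the directory,
    URI-encoded version as the filename."""
    return posixpath.join("publisher", publisher, "pkg",
                          quote(stem, safe=""), quote(version, safe=""))


def transaction_dir(publisher, open_time, fmri):
    """Relative path of an in-progress transaction directory, named
    <UTC UNIX timestamp>_<URI-encoded FMRI>."""
    name = "{0}_{1}".format(int(open_time), quote(fmri, safe=""))
    return posixpath.join("publisher", publisher, "trans", name)
```

Passing safe="" to quote() matters here: by default quote() leaves '/' unencoded, which would turn a stem such as package/pkg into two directory levels instead of the single package%2Fpkg directory.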
            An example version 4 repository might be structured as
            follows:

                /
                    publisher/
                        example.com/
                            catalog/
                                catalog.attrs
                                catalog.base.C
                            file/
                                ff/
                                    fffff277f5a8fb63e57670afc178415c2c5e706d
                            index/
                                __at_depend
                                ...
                            pkg/
                                package%2Fpkg/
                                    0.5.11%2C5.11-0.136%3A20100327T063139Z
                            trans/
                                1245176111_pkg%3A%2FBRCMbnx%400.5.11%2C5.11-0.116%3A20090616T181511Z
                            pub.p5i
                        example.net/
                            catalog/
                                catalog.attrs
                                catalog.base.C
                            file/
                                af/
                                    affff277f5a8fb63e57670afc178415c2c5e706d
                            index/
                                __at_depend
                                ...
                            pkg/
                                package%2Fpkg/
                                    0.5.11%2C5.11-0.133%3A20090327T062137Z
                            trans/
                                1245176111_pkg%3A%2FFAAMbnx%400.5.11%2C5.11-0.139%3A20100616T181511Z
                            pub.p5i
                    pkg5.repository:
                        [repository]
                        version = 4

    4.3. Archive Format:

        4.3.1. Requirements:

            The requirements for the on-disk archive format include:

            - support for archives greater than 8GB in size,
            - support for files in archive greater than 4GB in size,
            - support for efficient storage of hard links,
            - support for pathnames significantly greater than 255
              characters in length,
            - core Python bindings exist or can be easily created using
              an existing library,
            - can be a container of compressed files, as opposed to a
              compressed container of uncompressed files,
            - open, royalty-free, well-documented format with wide
              platform support and acceptance,
            - multi-threaded decompression and compression possible,
            - creation and basic manipulation of package archives
              possible using widely-available tools,
            - simple composition and filtering of its content should be
              possible, and
            - random access to the archive contents must be possible
              without reading the entire archive file.

        4.3.2. Candidates:

            A number of potential archive formats have been considered
            for use, including:

            - 7z (7-Zip)
            - cpio
            - pax (portable archive exchange format)
            - ZIP

            The evaluations provided for each format here are not
            intended to be exhaustive; rather they focus on the
            specific requirements of this project.
            For more information about these formats, and the documents
            used to evaluate them, please refer to section 6 of this
            proposal.

        4.3.3. 7z Evaluation:

            The 7z format was rejected for the following reasons:

            - Does not permit random access to archive contents (the
              entire archive file is required to access the contents);
              adding this would require a custom variation of 7z.

            - Although the 7z format supports compression methods other
              than LZMA, a primary motivator for using 7z would be the
              ability to use LZMA natively as part of the container
              format. However, the tradeoffs in terms of CPU and memory
              footprint currently make LZMA unsuitable for pkg(5) when
              compared to other compression algorithms such as those
              used by gzip(1).

            - Use of the 7z format would require integration of the
              LZMA SDK (which also provides a basic 7z API in C) and
              the creation of Python bindings or the integration of a
              third party's (such as pylzma).

            - No native support for extended attributes or UNIX owner/
              group permissions.

        4.3.4. cpio Evaluation:

            The cpio format doesn't natively support random access to
            archive contents, but the format itself doesn't prevent
            this. An index could be added as the first file in the
            archive with the information needed to provide fast, random
            access to the archive contents.

            The cpio format was rejected for the following reasons:

            - The length of pathnames in cpio archives is limited to
              256 characters for the portable format.

            - Available tools vary significantly in maximum archive
              size support.

            - The portable cpio format stores a copy of the file data
              with every hard link in an archive instead of simply
              storing a pointer to the source file in the archive.

        4.3.4. PAX Evaluation:

            The PAX format meets all of the requirements except that of
            random access to archive contents. However, the format
            itself doesn't prevent this.
            A table of contents file could be supplied as the first
            file in the archive with the information needed to provide
            fast, random access to the container contents.

        4.3.5. ZIP Evaluation:

            The ZIP format meets all of the requirements listed above
            (assuming that ZIP64 extensions are used), with the
            exceptions listed below for which it was rejected:

            - The use or implementation of some of the functionality
              documented in the .ZIP file format requires a license
              from PKWARE.

            - While random archive content access is possible, the ZIP
              file format stores the index for the archive at the end
              of the archive (as opposed to the beginning). This
              increases the number of round trips that would be
              required for potential remote random content access. It
              also means that extraction requires multiple seeks to the
              end of the file before any content can be extracted from
              the archive, which can be detrimental to performance for
              some media types (optical, etc.).

        4.3.6. Evaluation Conclusion:

            Based on the requirements set forth in section 4.3.1, the
            PAX format was selected as the on-disk archive format for
            pkg(5) packages. However, to enable efficient access to the
            archive contents, an index file needs to be present as the
            first file in the archive.

            Early evaluations of an unoptimised prototype were
            performed using a repository containing all packages for
            build 136 and unbundleds. The on-disk size of the
            repository was approximately 4.98G. The resulting archive
            was 5.0G in size, with an archive index file 9.7M in size
            (when the index was compressed using gzip).

            First-time access to the prototype archive for extraction
            of a single file after creation yielded a total time of
            approximately 5 seconds, compared to approximately 36-42
            seconds for utilities such as pax(1), tar(1), or gtar(1).
            Creation of the archive took 7 minutes, 35 seconds on a
            custom-built Intel Core 2 DUO E8400, with 8GB Memory, and a
            1TB 10000 RPM SATA Drive w/ 64MB Cache.

        4.3.7.
        Package Archive Specification:

            pkg(5) archive files will have an extension of 'p5p' which
            will stand for 'pkg(5) package'. The format of these
            archives matches that defined by IEEE Std 1003.1, 2004 for
            the pax Interchange Format, with the exception that the
            first archive entry is tagged with an extended pax archive
            header that specifies the archive version and the version
            of the pkg(5) API that was used to write it. In addition,
            the file for the first archive entry must be the index file
            for the package archive. The layout can be visualised as
            follows:

            .--------------------------------------------------------.
            | ustar header for pax header global archive data        |
            .--------------------------------------------------------.
            | pax global extended header data for archive            |
            .--------------------------------------------------------.
            | ustar header for pax header for archive index file     |
            .--------------------------------------------------------.
            | pax extended header data for archive index file        |
            .--------------------------------------------------------.
            | ustar header for package archive index file            |
            .--------------------------------------------------------.
            | file data for package archive index file               |
            .--------------------------------------------------------.
            | remaining archive data                                 |
            .________________________________________________________.

            The archive and API versions are stored in the header of
            the index file instead of the global header for two
            reasons: first, any headers in the global header are
            treated as though they apply to every entry in the archive,
            and secondly, the pax specification states that global
            headers should not be used with interchange media that
            could suffer partial data loss during transport. Since the
            archive version primarily serves as a way for clients to
            reliably determine if a "standard" pax archive versus one
            with an index is being read, this approach seems
            reasonable.
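As a rough illustration of the layout above, the following Python sketch uses the standard tarfile module to write a pax archive whose first entry is the index file and carries per-entry extended headers, then reads those headers back. It is a sketch under stated assumptions rather than the proposed implementation: the two pax keyword names are hypothetical (this proposal does not name them), the index name and contents are supplied by the caller, and the global extended header shown in the diagram is omitted.

```python
import io
import tarfile

# Hypothetical pax keyword names; the proposal does not specify them.
ARCHIVE_VERSION_KEY = "pkg5.archive.version"
API_VERSION_KEY = "pkg5.api.version"


def write_archive(index_name, index_bytes, members):
    """Return the bytes of a pax archive whose first entry is the index.

    members is an iterable of (pathname, data) pairs written after it.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.PAX_FORMAT) as tf:
        info = tarfile.TarInfo(name=index_name)
        info.size = len(index_bytes)
        # Per-entry extended headers; these do not apply to later entries.
        info.pax_headers = {ARCHIVE_VERSION_KEY: "0", API_VERSION_KEY: "1"}
        tf.addfile(info, io.BytesIO(index_bytes))
        for name, data in members:
            member = tarfile.TarInfo(name=name)
            member.size = len(data)
            tf.addfile(member, io.BytesIO(data))
    return buf.getvalue()


def read_first_entry(archive_bytes):
    """Return (name, pax_headers) for the first entry of an archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tf:
        first = tf.next()
        return first.name, dict(first.pax_headers)
```

A reader that does not find the expected keywords on the first entry would simply treat the file as a standard pax archive.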
            The reason for this limitation is to ensure that clients
            performing selective archive extraction can be guaranteed
            to find the location and size of the package archive index
            file without knowing the size of the header for the index
            file in advance (this layout ensures that clients can find
            the archive index and/or identify the archive in the first
            2048 bytes).

            In addition, pkg(5) archives in this format make remote,
            selective archive access possible. For example, a client
            could request the first 2048 bytes of a pkg(5) archive file
            from a remote repository, identify the offsets of the index
            and then retrieve it using an HTTP/1.1 byte-range request.
            Once it has the archive index file, it can then perform
            additional byte-range requests to selectively transfer the
            data for a set of specific files from the archive. This
            convention also optimises access to the archive for sources
            that are heavily biased towards sequential reads.

            The index file must be named using the following template
            and be compressed using the gzip format described by RFCs
            1951 and 1952, and formatted according to section 4.3.8:

                p5p.index.<index_number>.v<version>.gz

            <index_number> is an integer in string form that indicates
            which index file this is. The number only exists so that
            each index file can remain unique in the archive. An
            archive may contain multiple index files to support fast
            archive additions.

            <version> is an integer in string form that indicates the
            version of the index file. The initial version for this
            proposal will be '0'.

            However, if the first file in the archive is found to not
            use the layout or format shown above, or any of the index
            files in the archive are not in a format supported by the
            client (version too old or too new), the archive must be
            treated as a standard pax archive and some operations may
            not be possible or may experience degraded performance. The
            same is also true if the index file is found to not match
            the archive contents.
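The remote access pattern described above relies on ordinary HTTP/1.1 byte-range requests. The sketch below shows how the Range header values could be formed in Python; the helper names and the URL in the usage note are illustrative only, and the request object is built but never sent.

```python
import urllib.request

# Number of leading bytes the text says suffice to locate the index.
PROBE_SIZE = 2048


def byte_range(offset, length):
    """Return a Range header value covering length bytes from offset.

    HTTP byte ranges are inclusive at both ends, hence the "- 1".
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return "bytes={0}-{1}".format(offset, offset + length - 1)


def range_request(url, offset, length):
    """Build (but do not send) a GET request for part of an archive."""
    return urllib.request.Request(
        url, headers={"Range": byte_range(offset, length)})
```

For example, range_request("http://example.com/foo.p5p", 0, PROBE_SIZE) asks for the first 2048 bytes. A server that honours the header answers with status 206 (Partial Content); a 200 response means the whole file was sent and the client should fall back accordingly.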
            All entries in the archive (excluding any archive index
            files) must conform to the repository layout specified in
            section 4.2.2 of this proposal. Since a pkg(5) repository
            can contain one or more packages, pkg(5) archive files can
            also contain the data for one or more packages. This allows
            easy redistribution of a single package and all of its
            dependencies in a single file.

            Finally, it should be noted that only ASCII pathnames are
            expected in the archive, as the raw repository format does
            not use or support Unicode pathnames.

        4.3.8. Package Archive Index Specification:

            The pkg(5) archive index file enables fast, efficient
            access to the contents of an archive. It contains an entry
            for every file in the archive, excluding the index file
            itself, in the following format (also referred to as index
            format version 0):

                <name>NUL<offset>NUL<entry_size>NUL<size>NUL
                <typeflag>NULNL

            <name> is a string containing the pathname of the file in
            the archive using only ASCII characters. It can be up to
            65,535 bytes in length.

            <offset> is an unsigned long long integer in string form
            containing the relative offset in bytes of the first header
            block for the file in the archive. The offset is relative
            to the end of the last block of the index file in which the
            entry is listed.

            <entry_size> is an unsigned long long integer in string
            form containing the size of the file's entry in bytes in
            the archive (including archive headers and trailers for the
            entry).

            <size> is an unsigned long long integer in string form
            containing the size of the file in bytes in the archive.

            <typeflag> is a single character representing the type of
            the file in the archive. Possible values are:

                0   Regular File
                1   Hard Link
                2   Symbolic Link
                5   Directory or subdirectory

            All values not listed above are reserved for future use.
            Unrecognised values should be treated as a regular file.
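A minimal Python sketch of a streaming reader for the (uncompressed) index format follows. The names are hypothetical. It also makes one assumption that should be checked against a real implementation: it reads the third field as the file size and the fourth as the entry size, because that is the order in which the example entries in this section are arithmetically consistent (each offset plus the fourth field gives the next offset), whereas the field descriptions list the entry size first.

```python
import io
from collections import namedtuple

# Assumed field order; see the note above about size versus entry size.
IndexEntry = namedtuple("IndexEntry", "name offset size entry_size typeflag")


def iter_index(fileobj):
    """Yield IndexEntry tuples from a binary file object, one per line.

    Each entry is five NUL-terminated fields followed by a newline, so
    the index can be processed without loading it all into memory.
    """
    for line in fileobj:
        line = line.rstrip(b"\n")
        if not line:
            continue
        fields = line.split(b"\0")
        # Five fields plus the empty string after the final NUL.
        if len(fields) != 6 or fields[5] != b"":
            raise ValueError("malformed index entry: {0!r}".format(line))
        name, offset, size, entry_size, typeflag = fields[:5]
        yield IndexEntry(name.decode("ascii"), int(offset), int(size),
                         int(entry_size), typeflag.decode("ascii"))


def parse_index(data):
    """Convenience wrapper for index data already held in memory."""
    return list(iter_index(io.BytesIO(data)))
```

In practice the file object would come from gzip.open() on the extracted index file, since the index is stored gzip-compressed in the archive.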
            An example set of entries would appear as follows:

                pkg5.repositoryNUL0NUL546NUL2560NUL0NUL
                pkgNUL2560NUL0NUL1536NUL5NUL
                pkg/service%2Ffault-managementNUL4096NUL0NUL1536NUL5NUL

            It should be noted that other possible formats were
            evaluated for the index file, including those based on
            JSON, XDR, and Python's pack. However, all other formats
            were found to be deficient for one or more of the following
            reasons:

            - larger in size
            - no streaming support (required entire index file be
              loaded into memory)
            - significantly greater parsing times using currently
              available Python libraries
            - required developing an envelope format that could contain
              the encoded data

5. Proposed Changes:

    5.1. Client Support for filesystem-based Repository Access:

        The pkg.client.api provided by pkg(5) will be updated to allow
        access to repositories via the filesystem. All functionality
        normally offered by pkg.depotd will be supported.

        pkg(1) and packagemanager(1) will be modified to support the
        use of URIs using the 'file' scheme. No user-visible changes
        will be made to any existing subcommands or options except that
        URIs using the 'file' scheme will be allowed.

        When accessing repositories using the 'file' scheme, clients by
        default will not copy package file data into the client's cache
        (e.g. /var/pkg/download). Instead, the transport system will
        treat configured repositories as an additional read-only cache.

    5.2. Depot Storage, Client Transport and Publication Tool Update:

        The pkg.server.repository module will be updated to support the
        new repository format outlined in section 4.2.2. Existing
        repositories will not automatically be upgraded, while new
        repositories will use the new format. A new administrative
        command detailed below has been introduced to allow upgrading
        existing repositories to the new format.

        These changes will automatically allow the client to access
        repositories in the new format when using filesystem-based
        access.
        Older clients will remain unable to access repositories in the
        new format.

        The client transport system will be updated to support all
        publication operations and the publication tools and project
        private APIs will be changed to use the client transport
        system.

        The '-d' option of pkgrecv(1) will be changed such that if the
        name of a file with a '.p5p' extension is specified, and that
        file does not already exist, a pkg(5) archive file will be
        created containing the specified packages. If the file already
        exists, it will exit with an error. When pkgrecv(1) creates
        pkg(5) archive files, it will omit catalog and index data.

        Due to the transport changes above, pkgrecv(1) will also be
        able to use pkg(5) archive files as a source of package data.

        pkgsend(1) will not support the use of pkg(5) archive files as
        a destination due to the publication model it currently uses.

        To support the expanded multiple publisher version 4 format of
        repositories, the depot server will be updated to respond to
        requests as follows:

        - If clients include the publisher prefix as part of the
          request path, then responses will be for that specific
          publisher's data. For example:

            http://localhost/dev/opensolaris.org/manifest/0/opensolaris.org/backup%2Fareca/7.1%2C5.11-0.134%3A20100302T005731Z

            http://localhost/dev/file/0/opensolaris.org/2ce6c746c85cd7ac44571d094b53c5fe1bfc32c8

        - The default publisher specified in the depot configuration
          will be used when responding to requests for operations that
          do not include the publisher prefix. For example:

            http://localhost/dev/manifest/0/backup%2Fareca/7.1%2C5.11-0.134%3A20100302T005731Z

          ...provides a response identical to the first case where the
          publisher prefix was provided as part of the request.

        Those expecting to maintain a large population of older clients
        should reassign publisher URLs down a level, to include the
        publisher explicitly, although this is not required for correct
        operation.
        A new utility named pkgrepo will be added to facilitate the
        creation and management of pkg(5) repositories. It will have
        the following global options:

            -s repo_uri_or_path
                A URI or path specifying the location of a pkg(5)
                package repository.

            -? / --help

        It will have the following subcommands:

            create
                Creates a pkg(5) repository at the specified location.

                Can only be used with filesystem-based repositories.

            publisher [ ...]
                Lists the publishers of packages in the repository:

                    PUBLISHER PACKAGES VERSIONS UPDATED
                    ...

            rebuild
                Discards any catalog, search or other cached
                information found in the repository and then re-creates
                it based on the current contents of the repository.

                Can only be used with filesystem-based repositories.

            refresh
                By default, catalogs any new packages found in the
                repository and updates search indices. This is intended
                for use with deferred publication (--no-catalog or
                --no-index options of pkgsend).

                Can only be used with filesystem-based repositories.

                Options:
                    --no-catalog - doesn't add new packages
                    --no-index   - doesn't refresh search indices

            remove fmri_pattern ...
                Removes the specified package(s) from the repository.
                If more than one match is found for any given pattern,
                the exact FMRI must be provided.

            upgrade
                Can only be used with filesystem-based repositories.

                Upgrades the repository to the most current format if
                possible.

                Has these options:
                    -n  determine whether the upgrade could be
                        performed and exit
                    -v  show a summary of what will be done, the
                        current format of the repository and what it
                        will be upgraded to

    5.3. Client Storage and Image Format Update:

        To simplify and unify the storage format used by the client and
        by pkg(5) repositories, the format of the client image will be
        changed to use the structure described below.

        For a version 3 image (the current format), the structure is as
        follows:

            download/ / file/ gui_cache/ history/ index/ lost+found/
            pkg/ / / manifest manifest.
            publisher/ / catalog/ certs/ (optional)
            last_refreshed (optional) state/ installed/ known/ tmp/
            cfg_cache lock

        For a version 4 image (the proposed format), the structure is
        as follows:

            cache/ index/ publisher/ / catalog/ pkg/ / / tmp/
            gui_cache/ history/ license/ / lost+found/ publisher/ /
            certs/ ssl/ client ssl certificates> state/ installed/
            known/ pkg5.image (client configuration file; was cfg_cache)

        A new property named 'version' will be added to the image and
        will be read-only (cannot be set using the set-property
        subcommand of pkg(1)).

        Existing images will not automatically be upgraded to the new
        format. To enable the upgrading of existing images to newer
        formats, the following subcommands will be added:

            update-format
                Updates the format of the client's image to the current
                format if possible.

    5.4. Client and Depot Support for On-Disk Archive Format:

        The pkg.server.repository module will be updated to support the
        serving of a repository in read-only mode using a pkg(5)
        archive file.

        The pkg.client.api transport system will be updated to support
        the usage of a pkg(5) archive file as an origin for package
        data.

        To support the specification of temporary origins, the install
        and update subcommands will be modified by adding a '-g' option
        to specify additional temporary package origin URIs or the path
        to a pkg(5) archive file or pkg(5) info file. The '-g' option
        may be specified multiple times. As an example:

            $ pkg install -g /path/to/foo.p5p \
                -g http://mytemprepo:10000/ \
                -g file:/path/to/bar.p5p \
                foo bar localpkg

        pkg(5) archive files used as a source of package data during an
        install or update operation will have their content cached by
        the client before the operation begins. Any publishers found in
        the archive will be temporarily added to the image if they do
        not already exist. Publishers that were temporarily added but
        not used during the operation will be removed after operation
        completion or failure.
        Any package FMRIs or patterns provided will be matched using
        only the sources provided using '-g'.

        The pkg list and pkg info commands will also be updated by
        adding the '-g' option described above, with the exception that
        the '-g' option may only be specified once, and only the source
        named will be used for the operation.

        Using '-g' with the pkg list subcommand implies '-n' by
        default, unless '-f' is specified; it also implies '-a'. To
        list all versions, the '-f' option must be used. As an example:

            $ pkg list -g /path/to/foo.p5p
            NAME (PUBLISHER)        VERSION    STATE      UFOXI
            bar (example.com)       1.0-0.133  known      -----
            foo (example.com)       1.0-0.133  installed  -----

            $ pkg list -g file:/path/to/foo.p5p
            NAME (PUBLISHER)        VERSION    STATE      UFOXI
            bar (example.com)       1.0-0.133  known      -----
            foo (example.com)       1.0-0.133  installed  -----

            $ pkg list -f -g http://example.com/multi_foo.p5p
            NAME (PUBLISHER)        VERSION    STATE      UFOXI
            foo (example.com)       1.0-0.133  installed  u----
            foo (example.com)       2.0-0.133  known      u----
            foo (example.com)       3.0-0.133  known      -----

            $ pkg list -g file:/path/to/repo
            NAME (PUBLISHER)        VERSION    STATE      UFOXI
            repopkg (example.com)   2.0-0.133  known      -----

            $ pkg list -g http://myrepo:10000
            NAME (PUBLISHER)        VERSION    STATE      UFOXI
            localpkg (example.org)  3.0-0.133  known      -----

        Using '-g' with the pkg info subcommand implies '-r'. The '-l'
        option cannot be used in combination with '-g'. As an example:

            $ pkg info -g /path/to/bundle.p5p
               Name: bar
            Summary: A useful complement to foo.
              State: Not Installed
            ...

               Name: foo
            Summary: Provides useful utilities.
              State: Installed
            ...

        '-g' was chosen for the option usage described above to match
        the '-g' already used by set-publisher and image-create for
        origins, and due to the unfortunate existing usage of '-s' by
        the 'pkg list' subcommand.

6. Reference Documents:

        Project team members and community members have provided a
        number of informal comments that served as the basis for the
        goals of this project:

        - "new on-disk format?", 18 Jan.
          2008:
          http://markmail.org/thread/2kg6w5bfwp4x3knc

        - "reorganising the repository and client metadata",
          23 Sep. 2009:
          http://markmail.org/thread/stfrosvx3v6if2fi

        - "ZAP - Zip Archive Packaging", Sep. 2007:
          http://markmail.org/thread/ijyq3mlrhaofccgx

        In addition, the following materials were referenced when
        writing this proposal:

        - "7z", 12 Apr. 2010:
          http://en.wikipedia.org/wiki/7z

        - "RFC2616: HTTP/1.1 Header Field Definitions", 01 Sep. 2004:
          http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1

        - "cpio", 21 Mar. 2010:
          http://en.wikipedia.org/wiki/Cpio

        - "copy file archives in and out", 26 Mar. 2007:
          http://heirloom.sourceforge.net/man/cpio.1.html

        - "The gzip file format", Date Unknown:
          http://www.gzip.org/format.txt

        - "DragonFly File Formats Manual, cpio -- format of cpio
          archive files":
          http://leaf.dragonflybsd.org/cgi/web-man?command=cpio&section=5

        - "A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA", 31 May 2005:
          http://tukaani.org/lzma/benchmarks.html

        - "Lempel Ziv Markov Algorithm and 7-Zip", 7 Feb. 2008:
          http://blogs.sun.com/clayb/entry/lempel_ziv_markov_algorithm_and

        - "The Open Group Base Specifications Issue 6: pax Interchange
          Format, IEEE Std 1003.1, 2004 Edition":
          http://www.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_01

        - ".ZIP File Format Specification", 28 Sep. 2007:
          http://www.pkware.com/documents/casestudies/APPNOTE.TXT

        - "ZIP (file format)", 17 Apr. 2010:
          http://en.wikipedia.org/wiki/ZIP_%28file_format%29