layout.py revision 1516
#!/usr/bin/python
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.

"""object to map content hashes to file paths

The Layout class hierarchy encapsulates bijective mappings between a hash
(or file name since those are equivalent in our system) and a relative path
that describes where to place that file in the file system. This bijective
relation should hold when the union of all layouts is considered as a single
set of mappings. In practical terms, this means that only one layout may
potentially deposit a hash into any particular location. This is not a
difficult requirement to satisfy since each layout may append a unique
identifier to the file name or choose to carve out its own namespace at some
level of directory hierarchy.
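
As a minimal sketch of that uniqueness requirement, the snippet below checks
which layouts would claim a given location. The helper names v0_path, v1_path
and claimants are hypothetical; they only mirror the lookup methods of the
V0Layout and V1Layout classes defined in this module.

```python
import os

def v0_path(hashval):
    # Mirrors V0Layout.lookup: 2 hex digits, 6 hex digits, then the name.
    return os.path.join(hashval[0:2], hashval[2:8], hashval)

def v1_path(hashval):
    # Mirrors V1Layout.lookup: 2 hex digits, then the name.
    return os.path.join(hashval[0:2], hashval)

def claimants(rel_path, file_name):
    # Names of the layouts that would place file_name at rel_path.
    layouts = (("V0", v0_path), ("V1", v1_path))
    return [name for name, fn in layouts if fn(file_name) == rel_path]

# Example hash value, made up for illustration.
h = "0123456789abcdef0123456789abcdef01234567"
v0_owner = claimants(v0_path(h), h)
v1_owner = claimants(v1_path(h), h)
```

Because the two layouts use different directory depths, each path produced
above is claimed by exactly one layout.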

The V1Layout places each file into a single layer of 256 directories. A
fanout of 256 provides good performance compared to the other layouts
tested. It also allows over 8M files to be stored even with filesystems
which limit the number of files in a directory to 65k.
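
The capacity figure can be checked with simple arithmetic. The sketch below
assumes a per-directory limit of 65,536 entries (one reading of "65k") and a
perfectly even spread of hashes; neither assumption is stated above, and the
name v1_capacity is hypothetical.

```python
FANOUT = 16 ** 2          # two hex digits give 256 directories
PER_DIR_LIMIT = 65536     # assumed meaning of "65k"

def v1_capacity(fanout=FANOUT, per_dir_limit=PER_DIR_LIMIT):
    # Upper bound on stored files if every directory fills evenly.
    return fanout * per_dir_limit

capacity = v1_capacity()
```

The first directory to fill sets the real limit, so the usable figure is
somewhat lower than this bound.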

The V0Layout uses two layers of directories; the first has a fanout
of 256 while the second has a fanout of 16M. This layout has the problem
that for the sizes of images (on the order of 300-500k files) and repos (on
the order of 1M files), each second-level directory usually contains a
single file. This imposes a substantial penalty for removing or resyncing
the directories because a readdir(3C) must be done for each directory and
readdir is two orders of magnitude slower than the open or read ZFS
operations, and one order of magnitude slower than ZFS remove. Reducing
the number of directories used to hold the downloaded files was a goal for
the next layout.
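
A rough way to see the directory problem is to count how many directories
each layout creates for the same set of hashes. The sketch below uses SHA-1
digests of integers as stand-in file names (an assumption for illustration),
and count_dirs is a hypothetical helper, not part of this module.

```python
import hashlib

def count_dirs(num_files):
    # Returns (v0_leaf_dirs, v1_dirs) for num_files synthetic hashes.
    v0_dirs = set()
    v1_dirs = set()
    for i in range(num_files):
        h = hashlib.sha1(str(i).encode("ascii")).hexdigest()
        v0_dirs.add((h[0:2], h[2:8]))   # V0: two directory levels
        v1_dirs.add(h[0:2])             # V1: one directory level
    return len(v0_dirs), len(v1_dirs)

v0_count, v1_count = count_dirs(20000)
```

With V0 the number of leaf directories stays close to the number of files,
so each one must be read separately; with V1 it never exceeds 256.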

To evaluate a layout, it is necessary to measure the insertion time, the
removal time, and the time to open a random file. The insertion time
affects the publication speed. The removal time affects the time a client
may take to clear its download cache. The access time affects how quickly
a server can open a file to serve it. File sizes from 1 to 10M were used
to assess the scalability of the different layouts."""


import os

class Layout(object):
    """This class is the parent class to all layouts.  It defines the
    interface which those subclasses must satisfy."""

    def lookup(self, hashval):
        """Return the path to the file with name "hashval"."""
        raise NotImplementedError

    def path_to_hash(self, path):
        """Return the hash which would map to "path"."""
        raise NotImplementedError

    def contains(self, rel_path, file_name):
        """Returns whether this layout would place a file named
        "file_name" at "rel_path"."""
        return self.lookup(file_name) == rel_path


class V0Layout(Layout):
    """This class implements the original layout used.  It uses a 256-way
    split (2 hex digits) followed by a 16.7M-way split (6 hex digits)."""

    def lookup(self, hashval):
        """Return the path to the file with name "hashval"."""
        return os.path.join(hashval[0:2], hashval[2:8], hashval)

    def path_to_hash(self, path):
        """Return the hash which would map to "path"."""
        return os.path.basename(path)


class V1Layout(Layout):
    """This class implements the new layout approach, which is a single
    256-way fanout using the first two hex digits of the hash."""

    def lookup(self, hashval):
        """Return the path to the file with name "hashval"."""
        return os.path.join(hashval[0:2], hashval)

    def path_to_hash(self, path):
        """Return the hash which would map to "path"."""
        return os.path.basename(path)


def get_default_layouts():
    """This function describes the default order in which to use the
    layouts defined above."""

    return [V1Layout(), V0Layout()]

def get_preferred_layout():
    """This function returns the single preferred layout to use."""

    return V1Layout()