search_storage.py revision 1100
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
# See the License for the specific language governing permissions
# and limitations under the License.
# When distributing Covered Code, include this CDDL HEADER in each
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.

"""Opens all data holders in data_list and ensures that the
versions are consistent among all of them.
It retries several times in case a race condition between file
migration and open is encountered.
Note: Do not set timeout to be 0. It will cause an exception to be

# The assignments to cur_version and missing cannot be
# placed here. They must be reset prior to breaking out of the
# for loop so that the while loop condition will be true. They
# cannot be placed after the for loop since that path is taken
# when all files are missing or opened successfully.

# All indexes must have the same version and all must
# either be present or absent for a successful return.
# If one of these conditions is not met, the function
# tries again until it succeeds or the time spent in
# the function is greater than timeout.

# If we get here, then the current index file

# Read the version. If this is the first file,
# set the expected version; otherwise, check that
# the version matches the expected version.

# Got inconsistent versions, so close
# all files and try again.
# If the index file is missing, ensure
# that previous files were missing as
# well. If not, try again.

# The index is missing (i.e., no files were present).

"""Base class for all data storage used by the indexer and
queryEngine. All members must have a file name and maintain
an internal file handle to that file as instructed by external

"""Closes the file handle and clears it so that it cannot

"""Writes the dictionary in the expected format.
Note: Only child classes should call this method.

"""This method uses the modification time and the file size
to (heuristically) determine whether the file backing this
storage has changed since it was last read.

"""This uses consistent open to ensure that the version line
processing is done consistently and that only a single function
actually opens files stored using this class.

"""Class for representing the main dictionary file

# Here is an example of a line from the main dictionary, it is
# Each line begins with a urllib quoted search token. It's followed by
# a set of space separated lists. Each of these lists begins with an
# action type. It's separated from its sublist by a '!'. Next is the
# key type, which is separated from its sublist by a '@'. Next is the
# full value, which is used in set actions to hold the full value which
# matched the token. It's separated from its sublist by a '#'. The
# next token (579) is the fmri id. The subsequent comma separated
# values are the byte offsets into that manifest of the lines containing

"""This class relies on external methods to write the file.
Making this empty call to protected_write_dict_file allows the
file to be set up correctly with the version number stored

"""Return the file handle.
Note that doing anything other than sequential reads or writes
to or from this file_handle may result in unexpected
behavior. In short, don't use seek.

"""Helper function for parse_main_dict_line.
The "split_chars" parameter is a list of characters to use to
The "unquote_list" parameter is a list of booleans which tells
whether to unquote each level of value.
The "line" parameter is the line to parse.

"""Parses one line of a main dictionary file.
Changes to this function must be paired with changes to
write_main_dict_line below.

[" ", "!", "@", "#", ","],
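The level-by-level splitting that the docstrings above describe can be sketched as a small recursive function. This is only an illustration under stated assumptions: the name parse_levels, the sample line, and the particular unquote flags are hypothetical and are not taken from the module, and urllib.parse.unquote stands in for whatever unquoting the original uses.

```python
from urllib.parse import unquote


def parse_levels(split_chars, unquote_list, line):
    """Split `line` one separator level at a time (hypothetical helper).

    `unquote_list` carries one flag per level plus one for the leaves,
    so it is one element longer than `split_chars`.
    """
    if not split_chars:
        # Leaf level: nothing left to split on.
        return unquote(line) if unquote_list[0] else line
    head, *rest = line.split(split_chars[0])
    if unquote_list[0]:
        head = unquote(head)
    # Everything after the head is a sublist parsed at the next level.
    children = [
        parse_levels(split_chars[1:], unquote_list[1:], part)
        for part in rest
    ]
    return (head, children)


# Assumed sample line: token, action type, key type, full value,
# fmri id, then byte offsets.
sample = "hello%20world set!description@Hello%20World#579,13,20"
parsed = parse_levels(
    [" ", "!", "@", "#", ","],
    [True, False, False, True, False, False],  # assumed flags
    sample,
)
```

Each level yields a (head, sublists) pair, so the nesting of the result mirrors the order of the separator list.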
"""Pulls the token out of a line from a main dictionary file.
Changes to this function must be paired with changes to
write_main_dict_line below.

"""Helper function for write_main_dict_line.
The "file_handle" parameter is the file handle to write lines
The "sep_chars" parameter is the list of characters to use to
separate each level of the entries.
The "quote" parameter is a list of boolean values which
determine whether the value being written is quoted or not.
The "entries" parameter is a list of lists of lists and so on.
The depth of all lists at each level must be consistent, and
must match the length of "sep_chars" and "quote".

"""Paired with parse_main_dict_line above. Writes
a line in a main dictionary file in the appropriate format.

["", " ", "!", "@", "#", ","],
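The writing side can be sketched the same way: each level emits its separator, then its own value, then its sublists. The function name format_levels, the shape of the entry tuple, and the quote flags below are assumptions made for illustration; the sketch returns a string rather than writing to a file handle.

```python
from urllib.parse import quote


def format_levels(sep_chars, quote_flags, entry):
    """Serialize a nested entry, one separator per level (hypothetical).

    `sep_chars` and `quote_flags` must have the same length; the last
    position describes the leaf values.
    """
    assert len(sep_chars) == len(quote_flags)
    sep = sep_chars[0]
    if len(sep_chars) == 1:
        # Leaf level: a plain value with no sublists.
        text = str(entry)
        return sep + (quote(text, safe="") if quote_flags[0] else text)
    head, children = entry
    text = str(head)
    if quote_flags[0]:
        text = quote(text, safe="")
    return sep + text + "".join(
        format_levels(sep_chars[1:], quote_flags[1:], child)
        for child in children
    )


# Assumed entry: token -> action type -> key type -> full value
# -> fmri id -> byte offsets.
entry = ("hello world",
         [("set", [("description", [("Hello World", [(579, [13, 20])])])])])
line = format_levels(
    ["", " ", "!", "@", "#", ","],
    [True, False, False, True, False, False],  # assumed flags
    entry,
)
```

The leading empty separator means the token itself is written with nothing in front of it, which is why this list is one element longer than the one used for parsing.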
"""Helper function for transform_main_dict_line.
The "file_handle" parameter is the file handle to write lines
The "sep_chars" parameter is the list of characters to use to
separate each level of the entries.
The "quote" parameter is a list of boolean values which
determine whether the value being written is quoted or not.
The "entries" parameter is a list of lists of lists and so on.
The depth of all lists at each level must be consistent, and
must match the length of sep_chars and quote.

"""Paired with parse_main_dict_line above. Transforms a token
and its data into the string which would be written to the main

["", " ", "!", "@", "#", ","],
"""Returns the number of entries removed during a second phase

# This returns 0 because this class is not responsible for
# storing anything in memory.

"""Moves the existing file with self._name in directory
use_dir to a new file named self._name + suffix in directory
use_dir. If it has done this previously, it removes the old
file it moved. It also opens the newly moved file and uses
that as the file for its file handle.

"""Used when both a list and a dictionary are needed to
store the information. Used for bidirectional lookup when
one item is an int (an id) and the other is not (an entity). It
maintains a list of empty spots in the list so that adding entities
can take advantage of unused space. It encodes empty space as a blank
line in the file format and '' in the internal list.

"""Adds an entity consistently to the list and dictionary
allowing bidirectional lookup.

"""deletes in_id from the list and the dictionary """

"""deletes the entity from the list and the dictionary """

"""returns the id of entity """

"""Adds entity if it's not previously stored and returns the

# This code purposefully reimplements add_entity
# code. Replacing the function calls to has_entity, add_entity,
# and get_id with direct access to the data structure gave a
# speed up of a factor of 4. Because this is a very hot path,
# the tradeoff seemed appropriate.

"""return the entity in_id maps to """

"""check if entity is in storage """

"""Check if the structure has any empty elements which
can be filled with data.

"""returns the next id which maps to no element """

"""Passes self._list to the parent class to write to a file.

"""Reads in a dictionary previously stored using the above

# A blank line means that id can be reused.
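A minimal sketch of the list-plus-dictionary idea described above, with freed ids reused and empty slots stored as '' in memory and as blank lines on disk. The class name SlotTable and its method names are hypothetical and only loosely modelled on the docstrings; version-line handling and error checking are left out.

```python
class SlotTable:
    """Hypothetical id <-> entity table that reuses freed ids."""

    def __init__(self, lines=()):
        self._list = []    # id -> entity; '' marks an empty slot
        self._dict = {}    # entity -> id
        self._empty = []   # ids of empty slots available for reuse
        for line in lines:
            entity = line.rstrip("\n")
            if entity == "":
                # A blank line means that id can be reused.
                self._empty.append(len(self._list))
            else:
                self._dict[entity] = len(self._list)
            self._list.append(entity)

    def add(self, entity):
        """Return the id of entity, storing it first if needed."""
        if entity in self._dict:
            return self._dict[entity]
        if self._empty:
            new_id = self._empty.pop()
            self._list[new_id] = entity
        else:
            new_id = len(self._list)
            self._list.append(entity)
        self._dict[entity] = new_id
        return new_id

    def remove_id(self, in_id):
        """Free in_id so that a later add can reuse it."""
        del self._dict[self._list[in_id]]
        self._list[in_id] = ""
        self._empty.append(in_id)

    def get_id(self, entity):
        return self._dict[entity]

    def get_entity(self, in_id):
        return self._list[in_id]

    def to_lines(self):
        """One line per id; empty slots become blank lines."""
        return [entity + "\n" for entity in self._list]


table = SlotTable()
first = table.add("pkg-a")
second = table.add("pkg-b")
table.remove_id(first)
saved = table.to_lines()
reused = table.add("pkg-c")
```

Keeping the ids stable across removals is what lets other index files refer to an entity by number.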
"""Returns the number of entries removed during a second phase

"""Class used when only entity -> id lookup is needed

"""Reads in a dictionary stored in line number -> entity

"""If it's necessary to reread the file, it rereads the
file. It matches the line it reads against the contents of
"in_set". If a match is found, the entry on the line is stored
for later use, otherwise the line is skipped. When all items
in in_set have been matched, the method is done and returns.
By default, any existing information is cleared before the
dictionary is reread. With "update", the original dictionary
is left in place and any new information is added to it.

# skip the version line

"""Returns the number of entries removed during a second phase

"""Dictionary which allows dynamic update of its storage

"""Reads in a dictionary stored with an entity
and its number on each line.

"""Opens the output file for this class and prepares it
to be written via write_entity.

"""Writes the entity out to the file with my_id """

""" Generates an iterable list of string representations of
the dictionary that the parent's protected_write_dict_file

"""Returns the number of entries removed during a second phase

"""Set the hash value."""

"""Calculate the hash value of the sorted members of vals."""

"""Write self.hash_val out to a line in a file """

"""Process a dictionary file written using the above method

"""Check the hash value of vals against the value stored
in the file for this object."""

"""Returns the number of entries removed during a second phase

"""Used when only set membership is desired.
This is currently designed for exclusive use
with storage of fmri.PkgFmris.
However, that impact
is only seen in the read_and_discard_matching_from_argument

"""Remove entity purposefully assumes that entity is
already in the set to be removed. This is useful for
error checking and debugging.

"""Write each member of the set out to a line in a file """

"""Process a dictionary file written using the above method

"""Reads the file and removes all fmris in the file

"""Returns the number of entries removed during a second phase

"""Class used to store and process fmri to offset mappings. It does
delta compression and deduplication of shared offset sets when writing

"""file_name is the name of the file to write to or read from.
p_id_trans is an object which has a get entity method which,
when given a package id number returns the PkgFmri object

"""Adds a package id number and an associated offset to the

"""Does delta encoding of offsets to reduce space by only
storing the difference between the current offset and the
previous offset. It also performs deduplication so that all
packages with the same set of offsets share a common bucket."""

"""For a given offset string, a list of package id numbers,
and a translator from package id numbers to PkgFmris, returns
the string which represents that information. Its format is
space separated package fmris, followed by a !, followed by
space separated offsets which have had delta compression

"""Write the mapping of package fmris to offset sets out

"""Read a file written by the above function and store the
information in a dictionary."""

"""For a list of strings of offsets, undo the delta compression
that has been performed."""

"""For a given function which returns true if it matches the
desired fmri, return the offsets which are associated with the
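The delta encoding and bucket sharing described in the docstrings above can be illustrated with a short sketch. All names here (delta_encode, delta_decode, bucket_by_offsets) are hypothetical, offsets are assumed to be non-negative integers that are sorted before encoding, and package ids stand in for the PkgFmri objects that the real class translates to.

```python
from collections import defaultdict


def delta_encode(offsets):
    """Store each offset as the difference from the previous one."""
    deltas = []
    previous = 0
    for offset in sorted(offsets):
        deltas.append(offset - previous)
        previous = offset
    return deltas


def delta_decode(deltas):
    """Undo delta_encode; accepts ints or strings of ints."""
    offsets = []
    total = 0
    for delta in deltas:
        total += int(delta)
        offsets.append(total)
    return offsets


def bucket_by_offsets(pkg_offsets):
    """Group package ids that share the same set of offsets.

    The key is the space separated, delta-encoded offset string, so
    packages with identical offsets land in one bucket.
    """
    buckets = defaultdict(list)
    for pkg_id, offsets in pkg_offsets.items():
        key = " ".join(str(d) for d in delta_encode(offsets))
        buckets[key].append(pkg_id)
    return dict(buckets)


encoded = delta_encode([10, 25, 40])
decoded = delta_decode(["10", "15", "15"])
buckets = bucket_by_offsets({1: [10, 25], 2: [10, 25], 3: [7]})
```

Because byte offsets into a manifest tend to be large and close together, the differences are usually much shorter to write than the absolute values.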