search_storage.py revision 3171
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
# See the License for the specific language governing permissions
# and limitations under the License.
# When distributing Covered Code, include this CDDL HEADER in each
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]

# Copyright (c) 2010, 2015, Oracle and/or its affiliates. All rights reserved.

"""Opens all data holders in data_list and ensures that the
versions are consistent among all of them.
It retries several times in case a race condition between file
migration and open is encountered.
Note: Do not set timeout to be 0. It will cause an exception to be

# The assignments to cur_version and missing cannot be
# placed here. They must be reset prior to breaking out of the
# for loop so that the while loop condition will be true. They
# cannot be placed after the for loop since that path is taken
# when all files are missing or opened successfully.

# All indexes must have the same version and all must
# either be present or absent for a successful return.
# If one of these conditions is not met, the function
# tries again until it succeeds or the time spent in
# the function is greater than timeout.

# If we get here, then the current index file

# Read the version. If this is the first file,
# set the expected version; otherwise check that
# the version matches the expected version.
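The retry behaviour described in the comments above can be sketched roughly as follows. This is a minimal illustration, not the file's actual implementation: the names `open_consistently` and `InconsistentIndexError`, the use of the first line as the version, and the 10 ms pause between attempts are all assumptions. Unlike the original, which warns against a timeout of 0, this sketch always makes at least one attempt.

```python
import time


class InconsistentIndexError(Exception):
    """Hypothetical error: no consistent view was found before the timeout."""


def open_consistently(paths, timeout=1.0):
    """Open every path and return (handles, version).

    Returns ([], None) when every file is absent. Retries while the files
    disagree on their version line or only some of them are present.
    """
    deadline = time.monotonic() + timeout
    while True:
        # State is reset at the start of every attempt.
        handles, version, missing, consistent = [], None, None, True
        for path in paths:
            try:
                handle = open(path)
            except FileNotFoundError:
                if missing is False:
                    consistent = False  # earlier files were present
                    break
                missing = True
                continue
            handles.append(handle)
            if missing:
                consistent = False  # earlier files were absent
                break
            missing = False
            found = handle.readline().rstrip("\n")
            if version is None:
                version = found  # the first file sets the expected version
            elif found != version:
                consistent = False
                break
        if consistent:
            return handles, version
        # Inconsistent view: close everything and try again until timeout.
        for handle in handles:
            handle.close()
        if time.monotonic() >= deadline:
            raise InconsistentIndexError(paths)
        time.sleep(0.01)
```

Callers own the returned handles and are expected to close them. Resetting the per-attempt state at the top of the loop is a simplification of the placement constraints the original comments describe.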
# Got inconsistent versions, so close

# If the index file is missing, ensure
# that previous files were missing as

# The index is missing (i.e., no files were present).

"""Base class for all data storage used by the indexer and
queryEngine. All members must have a file name and maintain
an internal file handle to that file as instructed by external

"""Closes the file handle and clears it so that it cannot

"""Writes the dictionary in the expected format.
Note: Only child classes should call this method.

"""This method uses the modification time and the file size
to (heuristically) determine whether the file backing this
storage has changed since it was last read.

"""This uses consistent open to ensure that the version line
processing is done consistently and that only a single function
actually opens files stored using this class.

"""Class for representing the main dictionary file

# Here is an example of a line from the main dictionary, it is

# Each line begins with a urllib quoted search token. It's followed by
# a set of space separated lists. Each of these lists begins with an
# action type. It's separated from its sublist by a '!'. Next is the
# key type, which is separated from its sublist by a '@'. Next is the
# full value, which is used in set actions to hold the full value which
# matched the token. It's separated from its sublist by a '#'. The
# next token (579) is the fmri id. The subsequent comma separated
# values are the byte offsets into that manifest of the lines containing

"""This class relies on external methods to write the file.
Making this empty call to protected_write_dict_file allows the
file to be set up correctly with the version number stored

"""Return the file handle.
Note that doing
anything other than sequential reads or writes
to or from this file_handle may result in unexpected
behavior. In short, don't use seek.

"""Parses one line of a main dictionary file.
Changes to this function must be paired with changes to
write_main_dict_line below.
This should produce the same data structure that

"""Pulls the token out of a line from a main dictionary file.
Changes to this function must be paired with changes to
write_main_dict_line below.

"""Paired with parse_main_dict_line above. Transforms a token
and its data into the string which can be written to the main

The "token" parameter is the token whose index line is being

The "entries" parameter is a list of lists of lists and so on.
It contains information about where and how "token" was seen in
manifests. The depth of all lists at each level must be
consistent, and must match the length of "sep_chars" and
"quote". The details of the contents of entries are described

"""Returns the number of entries removed during a second phase

# This returns 0 because this class is not responsible for
# storing anything in memory.

"""Moves the existing file with self._name in directory
use_dir to a new file named self._name + suffix in directory
use_dir. If it has done this previously, it removes the old
file it moved. It also opens the newly moved file and uses
that as the file for its file handle.

"""Used when both a list and a dictionary are needed to
store the information. Used for bidirectional lookup when
one item is an int (an id) and the other is not (an entity). It
maintains a list of empty spots in the list so that adding entities
can take advantage of unused space.
It encodes empty space as a blank
line in the file format and '' in the internal list.

"""Adds an entity consistently to the list and dictionary
allowing bidirectional lookup.

"""deletes in_id from the list and the dictionary """

"""deletes the entity from the list and the dictionary """

"""returns the id of entity """

"""Adds entity if it's not previously stored and returns the

# This code purposefully reimplements add_entity
# code. Replacing the function calls to has_entity, add_entity,
# and get_id with direct access to the data structure gave a
# speed up of a factor of 4. Because this is a very hot path,
# the tradeoff seemed appropriate.

"""return the entity in_id maps to """

"""check if entity is in storage """

"""Check if the structure has any empty elements which

"""returns the next id which maps to no element """

"""Passes self._list to the parent class to write to a file.

"""Reads in a dictionary previously stored using the above

# A blank line means that id can be reused.

"""Returns the number of entries removed during a second phase

"""Class used when only entity -> id lookup is needed

"""Reads in a dictionary stored in line number -> entity

"""Returns the number of entries removed during a second phase

"""Dictionary which allows dynamic update of its storage

"""Reads in a dictionary stored with an entity
and its number on each line.

"""Opens the output file for this class and prepares it
to be written via write_entity.
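The list-and-dictionary storage described above (bidirectional id and entity lookup, reuse of empty slots, a blank line per free id) could look roughly like this. It is a sketch under assumptions: the class name `ListDictStore` and the helpers `to_lines` and `from_lines` are hypothetical, file handling is left out, and an empty string is assumed never to be a valid entity.

```python
class ListDictStore:
    """Hypothetical in-memory model of the list plus dictionary storage."""

    def __init__(self):
        self._list = []   # id -> entity; '' marks an empty slot
        self._dict = {}   # entity -> id
        self._free = []   # ids whose slots can be reused

    def add_entity(self, entity):
        """Store entity in a free slot if one exists, else append it."""
        if entity in self._dict:
            raise ValueError("entity already stored: %r" % (entity,))
        if self._free:
            new_id = self._free.pop()
            self._list[new_id] = entity
        else:
            new_id = len(self._list)
            self._list.append(entity)
        self._dict[entity] = new_id
        return new_id

    def remove_id(self, in_id):
        """Clear the slot for in_id and remember that it can be reused."""
        entity = self._list[in_id]
        del self._dict[entity]
        self._list[in_id] = ""
        self._free.append(in_id)

    def get_id_and_add(self, entity):
        """Return the id of entity, adding it first if it is unknown."""
        try:
            return self._dict[entity]
        except KeyError:
            return self.add_entity(entity)

    def to_lines(self):
        """One line per id; a blank line encodes an empty slot."""
        return [entity + "\n" for entity in self._list]

    @classmethod
    def from_lines(cls, lines):
        """Rebuild the store; a blank line means that id can be reused."""
        store = cls()
        for line_id, line in enumerate(lines):
            entity = line.rstrip("\n")
            store._list.append(entity)
            if entity == "":
                store._free.append(line_id)
            else:
                store._dict[entity] = line_id
        return store
```

Keeping the id implicit in the line number is what makes the blank-line encoding necessary: removing a line outright would shift every later id.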
"""Writes the entity out to the file with my_id """ """ Generates an iterable list of string representations of the dictionary that the parent's protected_write_dict_file """Returns the number of entries removed during a second phase # In order to interoperate with older clients, we must use sha-1 """Calculate the hash value of the sorted members of vals.""" # In order to interoperate with older clients, we must use sha-1 """Write self.hash_val out to a line in a file """ """Process a dictionary file written using the above method """Check the hash value of vals against the value stored in the file for this object.""" """Returns the number of entries removed during a second phase """Used when only set membership is desired. This is currently designed for exclusive use with storage of fmri.PkgFmris. However, that impact is only seen in the read_and_discard_matching_from_argument """Remove entity purposfully assumes that entity is already in the set to be removed. This is useful for error checking and debugging. """Write each member of the set out to a line in a file """ """Process a dictionary file written using the above method """Reads the file and removes all frmis in the file """Returns the number of entries removed during a second phase """Class used to store and process fmri to offset mappings. It does delta compression and deduplication of shared offset sets when writing """file_name is the name of the file to write to or read from. p_id_trans is an object which has a get entity method which, when given a package id number returns the PkgFmri object """Adds a package id number and an associated offset to the """Does delta encoding of offsets to reduce space by only storing the difference between the current offset and the previous offset. 
It also performs deduplication so that all
packages with the same set of offsets share a common bucket."""

"""For a given offset string, a list of package id numbers,
and a translator from package id numbers to PkgFmris, returns
the string which represents that information. Its format is
space separated package fmris, followed by a !, followed by
space separated offsets which have had delta compression

"""Write the mapping of package fmris to offset sets out

"""Read a file written by the above function and store the
information in a dictionary."""

"""For a list of strings of offsets, undo the delta compression
that has been performed."""

"""For a given function which returns true if it matches the
desired fmri, return the offsets which are associated with the
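The delta encoding, bucketing, and line format described in the docstrings above can be illustrated with the sketch below. All function names are hypothetical, a plain dictionary stands in for the package id translator, and offsets are assumed to be non-negative integers encoded in ascending order; the original class may differ in each of these respects.

```python
from collections import defaultdict


def delta_encode(offsets):
    """Store each offset as the difference from the previous one."""
    previous = 0
    deltas = []
    for offset in sorted(offsets):
        deltas.append(offset - previous)
        previous = offset
    return deltas


def delta_decode(deltas):
    """Undo delta_encode by accumulating the differences."""
    total = 0
    offsets = []
    for delta in deltas:
        total += delta
        offsets.append(total)
    return offsets


def bucket_by_offsets(pkg_offsets):
    """Group package ids that share exactly the same set of offsets.

    pkg_offsets maps a package id to its offsets; the result maps the
    delta-encoded offset string to the list of package ids sharing it.
    """
    buckets = defaultdict(list)
    for pkg_id in sorted(pkg_offsets):
        key = " ".join(str(d) for d in delta_encode(pkg_offsets[pkg_id]))
        buckets[key].append(pkg_id)
    return dict(buckets)


def format_line(offset_str, pkg_ids, id_to_fmri):
    """Space separated fmris, then '!', then the delta-encoded offsets."""
    fmris = " ".join(id_to_fmri[pkg_id] for pkg_id in pkg_ids)
    return fmris + "!" + offset_str
```

Because offsets into a manifest tend to be large but close together, the differences are usually much shorter to write out than the absolute values, and packages with identical offset sets cost one offset string instead of one each.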