Importer.java revision 5117
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License"). You may not use this file except in compliance
 * You can obtain a copy of the license at
 * See the License for the specific language governing permissions
 * and limitations under the License.
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at
 * add the following below this CDDL HEADER, with the fields enclosed
 * by brackets "[]" replaced with your own identifying information:
 * Portions Copyright [yyyy] [name of copyright owner]
 * Copyright 2008-2010 Sun Microsystems, Inc.
 * This class provides the engine that performs both importing of LDIF files and
 * the rebuilding of indexes.
//Defaults for LDIF reader buffers, min memory required to import and default
//Min and MAX sizes of phase one buffer.
//Min size of phase two read-ahead cache.
//Set aside this much for the JVM from free memory.
//Percent of import memory to use for temporary environment if the
//skip DN validation flag isn't specified.
//Small heap threshold used to give more memory to JVM to attempt to avoid OOM errors.
//Comparators for DN and indexes respectively.
//Phase one buffer and imported entries counts.
//Phase one buffer size in bytes.
//Set to true when validation is skipped.
//Temporary environment used when DN validation is done in first phase.
//Size in bytes of temporary env, DB cache, DB log buf size.
//The executor service used for the buffer sort tasks.
//The executor service used for the scratch file processing tasks.
//Queue of free index buffers -- used to re-cycle index buffers;
//Map of index keys to index buffers.
//Used to allocate sorted
//index buffers to an index writer thread.
//Map of DB containers to index managers. Used to start phase 2.
//Map of DB containers to DN-based index managers. Used to start phase 2.
//Futures used to indicate when the index file writers are done flushing
//their work queues and have exited. End of phase one.
//List of index file writer tasks. Used to signal stopScratchFileWriters to
//the index file writer tasks when the LDIF file has been done.
//Map of DNs to Suffix objects.
//Map of container ids to database containers.
//Map of container ids to entry containers
//Used to synchronize when a scratch file index writer is first set up.
//Rebuild index manager used when rebuilding indexes.
//Set to true if the backend was cleared.
//Used to shut down import if an error occurs in phase one.
//Number of phase one buffers
 * Create a new import job with the specified ldif import config.
 * @param importConfiguration The LDIF import configuration.
 * @param localDBBackendCfg The local DB back-end configuration.
 * @param envConfig The JEB environment config.
 * @throws IOException If a problem occurs while opening the LDIF file for
 * @throws InitializationException If a problem occurs during initialization.
//Set up temporary environment.
 * Return an import LDIF instance using the specified arguments.
 * @param importCfg The import config to use.
 * @param localDBBackendCfg The local DB backend config to use.
 * @param envCfg The JEB environment config to use.
 * @return An import LDIF instance.
 * @throws IOException If an I/O error occurs.
 * @throws InitializationException If the instance cannot be initialized.
 * Return an import rebuild index instance using the specified arguments.
 * @param rebuildCfg The rebuild config to use.
 * @param localDBBackendCfg The local DB backend config to use.
 * @param envCfg The JEB environment config to use.
 * @return An import rebuild index instance.
 * @throws IOException If an I/O error occurs.
 * @throws InitializationException If the instance cannot be initialized.
 * @throws JebException If a JEB exception occurs.
 * @throws ConfigException If the instance cannot be configured.
 * Return the suffix instance in the specified map that matches the specified
 * @param dn The DN to search for.
 * @param map The map to search.
 * @return The suffix instance that matches the DN, or null if no match is
//Used for large heap sizes when the buffer max size has been identified. Any
//extra memory can be given to the temporary environment in that case.
//The DN cache probably needs to be smaller and the DB cache bigger
//because the dn2id is checked if the backend has not been cleared.
 * Calculate buffer sizes and initialize JEB properties based on memory.
 * @param envConfig The environment config to use in the calculations.
 * @throws InitializationException If a problem occurs during calculation.
//Give any extra memory to the temp environment cache if there is any.
//Mainly used to support multiple suffixes. Each index in each suffix gets
//a unique ID to identify which DB it needs to go to in phase two processing.
// This entire base DN was explicitly excluded. Skip.
There are no branches in the explicitly defined include list under
this base DN. Skip this base DN altogether.
// Remove any overlapping include branches.
// Remove any exclude branches that are not under an include
// branch since they will be migrated as part of the existing entries
// outside of the include branches anyways.
// This entire base DN is explicitly included in the import with
// no exclude branches that we need to migrate. Just clear the entry
// Create a temp entry container
 * Rebuild the indexes using the specified root container.
 * @param rootContainer The root container to rebuild indexes in.
 * @throws ConfigException If a configuration error occurred.
 * @throws InitializationException If an initialization error occurred.
 * @throws IOException If an IO error occurred.
 * @throws JebException If the JEB database had an error.
 * @throws DatabaseException If a database error occurred.
 * @throws InterruptedException If an interrupted error occurred.
 * @throws ExecutionException If an execution error occurred.
 * Import an LDIF using the specified root container.
 * @param rootContainer The root container to use during the import.
 * @throws ConfigException If the import failed because of a configuration
 * @throws IOException If the import failed because of an IO error.
 * @throws InitializationException If the import failed because of an
 * @throws JebException If the import failed due to a database error.
 * @throws InterruptedException If the import failed due to an interrupted
 * @throws ExecutionException If the import failed due to an execution error.
 * @throws DatabaseException If the import failed due to a database error.
//Try to clear as much memory as possible.
//Start DN processing first.
//For very small heaps, give more memory to the JVM.
//Cache size is never larger than the buffer size.
 * Task used to migrate excluded branch.
// This is the base entry for a branch that was excluded in the
// import so we must migrate all entries in this branch over to
// the new entry container.
 * Task to migrate existing entries.
// This is the base entry for a branch that will be included
// in the import so we don't want to copy the branch to the
 * Advance the cursor to next entry at the same level in the
 * skipping all the entries in this branch.
 * Set the next starting value to a value of equal length but
 * slightly greater than the previous DN. Since keys are
 * compared in reverse order we must set the first byte
 * No possibility of overflow here.
 * This task performs phase reading and processing of the entries read from
 * the LDIF file(s). This task is used if the append flag wasn't specified.
//Examine the DN for duplicates and missing parents.
//If the backend was not cleared, then the dn2id needs to be checked first
//for DNs that might not exist in the DN cache. If the DN is not in
//the suffixes dn2id DB, then the dn cache is used.
"Index buffer processing error.");
"Cancel processing received.");
 * This task reads sorted records from the temporary index scratch files,
 * processes the records and writes the results to the index database. The
 * DN index is treated differently than non-DN indexes.
 * This class is used by an index DB merge thread performing DN processing
 * to keep track of the state of individual DN2ID index processing.
//Bypass the cache for append data, lookup the parent in DN2ID and
//If null is returned then this is a suffix DN.
//Bypass the cache for append data, lookup the parent DN in the DN2ID
// We have a missing parent. Maybe parent checking was turned off?
 * This task writes the temporary scratch index files using the sorted
 * buffers read from a blocking queue private to each index.
 * This task's main function is to sort the index buffers given to it from
 * the import tasks reading the LDIF file. It will also create an index
 * file writer task and corresponding queue if needed. The sorted index
 * buffers are put on the index file writer queues for writing to a temporary
 * The buffer class is used to process a buffer from the temporary index files
 * during phase 2 processing.
 * The index manager class has several functions:
 * 1. It is used to carry information about index processing created in phase
 * 2. It collects statistics about phase two processing for each index.
 * 3. It manages opening and closing the scratch index files.
 * The rebuild index manager handles all rebuild index related processing.
//Rebuild index configuration.
//Local DB backend configuration.
//Map of index keys to indexes.
//Map of index keys to extensible indexes.
//Total entries to be processed.
//Set to true if the rebuild all flag was specified.
 * Create an instance of the rebuild index manager using the specified
 * @param rebuildConfig The rebuild configuration to use.
 * @param cfg The local DB configuration to use.
 * Initialize a rebuild index manager.
 * @throws ConfigException If a configuration error occurred.
 * @throws InitializationException If an initialization error occurred.
 * @throws DatabaseException If a database error occurred.
 * @param startTime The time the rebuild started.
 * Perform rebuild index processing.
 * @throws DatabaseException If a database error occurred.
 * @throws InterruptedException If an interrupted error occurred.
 * @throws ExecutionException If an execution error occurred.
 * @throws JebException If a JEB error occurred.
//Try to clear as much memory as possible.
//Add four for: DN, id2subtree, id2children and dn2uri.
 * Return the number of entries processed by the rebuild manager.
 * @return The number of entries processed.
 * Return the total number of entries to process by the rebuild manager.
 * @return The total number of entries to process.
 * This class reports progress of rebuild index processing at fixed
 * The number of records that had been processed at the time of the
 * previous progress report.
 * The time in milliseconds of the previous progress report.
 * The environment statistics at the time of the previous report.
 * Create a new rebuild index progress task.
 * @throws DatabaseException If an error occurred while accessing the JE
 * The action to be performed by this timer task.
 * This class reports progress of first phase of import processing at
 * The number of entries that had been read at the time of the
 * previous progress report.
 * The time in milliseconds of the previous progress report.
 * The environment statistics at the time of the previous report.
// Determines if eviction has been detected.
// Entry count when eviction was detected.
 * Create a new import progress task.
 * The action to be performed by this timer task.
//If first phase skip DN validation is specified use the root container
//stats, else use the temporary environment stats.
// Unlikely to happen and not critical.
 * This class reports progress of the second phase of import processing at
 * The number of entries that had been read at the time of the
 * previous progress report.
 * The time in milliseconds of the previous progress report.
 * The environment statistics at the time of the previous report.
// Determines if eviction has been detected.
 * Create a new import progress task.
 * @param latestCount The latest count of entries processed in phase one.
 * The action to be performed by this timer task.
// Unlikely to happen and not critical.
//Do DN index managers first.
//Do non-DN index managers.
 * A class to hold information about the entry determined by the LDIF reader.
 * Mainly the suffix the entry belongs under and the ID assigned to it by the
 * Return the suffix associated with the entry.
 * @return Entry's suffix instance;
 * Set the suffix instance associated with the entry.
 * @param suffix The suffix associated with the entry.
 * @param entryID The entry ID to set the entry ID to.
 * Return the entry ID associated with the entry.
 * @return The entry ID associated with the entry.
 * This class defines the individual index type available.
 * The sub-string index type.
 * The approximate index type.
 * The extensible sub-string index type.
 * The extensible shared index type.
 * This class is used as an index key for hash maps that need to
 * process multiple suffix index elements into a single queue and/or maps
 * based on both attribute type and index type
 * (i.e., cn.equality, sn.equality,...).
 * Create index key instance using the specified attribute type, index type
 * @param attributeType The attribute type.
 * @param indexType The index type.
 * @param entryLimit The entry limit for the index.
 * An equals method that uses both the attribute type and the index type.
 * Only returns {@code true} if the attribute type and index type are
 * @param obj the object to compare.
 * @return {@code true} if the objects are equal, or {@code false} if they
 * A hash code method that adds the hash codes of the attribute type and
 * index type and returns that value.
 * @return The combined hash values of attribute type hash code and the
 * Return the attribute type.
 * @return The attribute type.
 * Return the index key name, which is the attribute type primary name,
 * a period, and the index type name. Used for building file names and
 * @return The index key name.
 * Return the entry limit associated with the index.
 * The temporary environment will be shared when multiple suffixes are being
 * processed. This interface is used by those suffix instances to do parental
 * checking of the DN cache.
 * Returns {@code true} if the specified DN is contained in the DN cache,
 * or {@code false} otherwise.
 * @param dn The DN to check the presence of.
 * @return {@code true} if the cache contains the DN, or {@code false} if it
 * @throws DatabaseException If an error occurs reading the database.
 * Temporary environment used to check DNs when DN validation is performed
 * during phase one processing.
 * It is deleted after phase one processing.
 * Create a temporary DB environment and database to be used as a cache of
 * DNs when DN validation is performed in phase one processing.
 * @param envPath The file path to create the environment under.
 * @throws DatabaseException If an error occurs either creating the
 * environment or the DN database.
//Hash the DN bytes. Uses the FNV-1a hash.
 * Shutdown the temporary environment.
 * @throws JebException If an error occurs.
 * Insert the specified DN into the DN cache. It will return {@code true} if
 * the DN does not already exist in the cache and was inserted, or
 * {@code false} if the DN exists already in the cache.
 * @param dn The DN to insert in the cache.
 * @param val A database entry to use in the insert.
 * @param key A database entry to use in the insert.
 * @return {@code true} if the DN was inserted in the cache, or
 * {@code false} if the DN exists in the cache already and could
 * @throws JebException If an error occurs accessing the database.
"Search DN cache failed.");
//Add the DN to the DNs because of a hash collision.
"Add of DN to DN cache failed.");
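The comments above describe a DN cache keyed by an FNV-1a hash of the DN bytes, where an insert returns true only for a DN not seen before and colliding DNs are kept together under one key. The following is a minimal sketch of that idea, not the code from Importer.java. It assumes the 64-bit FNV-1a variant (the source does not say which width is used) and substitutes an in-memory HashMap for the temporary JE database. The names DnCacheSketch, fnv1a, insert and contains are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for the temporary DN cache.
final class DnCacheSketch {
    // Standard 64-bit FNV-1a parameters (assumption: the original may use another width).
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Hash key -> every DN stored under it; more than one entry only after a collision.
    private final Map<Long, List<String>> dnsByHash = new HashMap<>();

    // FNV-1a: XOR each byte into the hash, then multiply by the prime.
    public static long fnv1a(byte[] bytes) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : bytes) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    private static long keyOf(String dn) {
        return fnv1a(dn.getBytes(StandardCharsets.UTF_8));
    }

    // Returns true if the DN was not present and has been added, false if it already existed.
    public boolean insert(String dn) {
        List<String> dns = dnsByHash.computeIfAbsent(keyOf(dn), k -> new ArrayList<>(1));
        if (dns.contains(dn)) {
            return false;
        }
        // Either the first DN for this key, or a different DN with the same hash.
        dns.add(dn);
        return true;
    }

    // Returns true if the DN is stored under its hash key.
    public boolean contains(String dn) {
        List<String> dns = dnsByHash.get(keyOf(dn));
        return dns != null && dns.contains(dn);
    }

    public static void main(String[] args) {
        DnCacheSketch cache = new DnCacheSketch();
        System.out.println(cache.insert("dc=example,dc=com"));
        System.out.println(cache.insert("dc=example,dc=com"));
        System.out.println(cache.contains("ou=people,dc=example,dc=com"));
    }
}
```

Storing the full DN next to the hash is what makes the duplicate check exact: two different DNs that hash to the same key are both kept, so a collision never causes a valid entry to be rejected as a duplicate.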
//Return true if the specified DN is in the DNs saved as a result of hash
 * Check if the specified DN is contained in the temporary DN cache.
 * @param dn A DN to check for.
 * @return {@code true} if the specified DN is in the temporary DN cache,
 * or {@code false} if it is not.
 * Return temporary environment stats.
 * @param statsConfig A stats configuration instance.
 * @return Environment stats.
 * @throws DatabaseException If an error occurs retrieving the stats.
 * Uncaught exception handler. Try and catch any uncaught exceptions, log
 * them and print a stack trace.
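The last comment describes a handler that logs uncaught exceptions and prints a stack trace. A minimal sketch of such a handler follows, built on the standard Thread.UncaughtExceptionHandler interface and java.util.logging. The class name LoggingExceptionHandler and the lastError accessor are hypothetical, and the logger choice is an assumption; the original presumably reports through the server's own logging facilities.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of a handler that logs and prints uncaught exceptions.
final class LoggingExceptionHandler implements Thread.UncaughtExceptionHandler {
    private static final Logger LOG =
        Logger.getLogger(LoggingExceptionHandler.class.getName());

    // Remembered so callers can check whether a worker thread failed.
    private volatile Throwable lastError;

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        lastError = error;
        LOG.log(Level.SEVERE, "Uncaught exception in thread " + thread.getName(), error);
        error.printStackTrace();
    }

    public Throwable lastError() {
        return lastError;
    }

    public static void main(String[] args) throws InterruptedException {
        LoggingExceptionHandler handler = new LoggingExceptionHandler();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated import failure");
        }, "import-worker");
        // Install the handler on this one thread only.
        worker.setUncaughtExceptionHandler(handler);
        worker.start();
        worker.join();
        System.out.println("failed: " + (handler.lastError() != null));
    }
}
```

Installing the handler per thread, as in main above, keeps it scoped to the worker threads that need it; Thread.setDefaultUncaughtExceptionHandler would apply it to every thread in the JVM instead.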