bt_recno.c revision 1
1N/A * See the file LICENSE for redistribution information. 1N/A * Copyright (c) 1997, 1998 1N/A * Sleepycat Software. All rights reserved. 1N/A#
endif /* not lint */ 1N/A * In recno, there are two meanings to the on-page "deleted" flag. If we're 1N/A * re-numbering records, it means the record was implicitly created. We skip 1N/A * over implicitly created records if doing a cursor "next" or "prev", and 1N/A * return DB_KEYEMPTY if they're explicitly requested.. If not re-numbering 1N/A * records, it means that the record was implicitly created, or was deleted. 1N/A * We skip over implicitly created or deleted records if doing a cursor "next" 1N/A * or "prev", and return DB_KEYEMPTY if they're explicitly requested. 1N/A * If we're re-numbering records, then we have to detect in the cursor that 1N/A * a record was deleted, and adjust the cursor as necessary on the next get. 1N/A * If we're not re-numbering records, then we can detect that a record has 1N/A * been deleted by looking at the actual on-page record, so we completely 1N/A * ignore the cursor's delete flag. This is different from the B+tree code. 1N/A * It also maintains whether the cursor references a deleted record in the 1N/A * cursor, and it doesn't always check the on-page value. 1N/A * Recno open function. 1N/A * PUBLIC: int __ram_open __P((DB *, DB_INFO *)); 1N/A /* Allocate and initialize the private btree structure. */ 1N/A /* Allocate and initialize the private recno structure. */ 1N/A /* Link in the private recno structure. */ 1N/A * Intention is to make sure all of the user's selections are okay 1N/A * here and then use them without checking. 1N/A * If the user specified a source tree, open it and map it in. 1N/A * We don't complain if the user specified transactions or 1N/A * threads. It's possible to make it work, but you'd better 1N/A * know what you're doing! 1N/A /* Copy delimiter, length and padding values. */ 1N/A "record length must be greater than 0");
1N/A /* Start up the tree. */ 1N/A /* Set the overflow page size. */ 1N/A /* If we're snapshotting an underlying source file, do it now. */ 1N/A /* Allocate a cursor. */ 1N/A /* Do the snapshot. */ 1N/A /* Discard the cursor. */ 1N/Aerr:
/* If we mmap'd a source file, discard it. */ 1N/A /* If we opened a source file, discard it. */ 1N/A * Recno db->del function. 1N/A /* Check for invalid flags. */ 1N/A /* Acquire a cursor. */ 1N/A /* Check the user's record number and fill in as necessary. */ 1N/A /* Do the delete. */ 1N/A /* Release the cursor. */ 1N/A * Internal version of recno delete, called by __ram_delete and 1N/A * If this is CDB and this isn't a write cursor, then it's an error. 1N/A * If it is a write cursor, but we don't yet hold the write lock, then 1N/A * we need to upgrade to the write lock. 1N/A /* Make sure it's a valid update cursor. */ 1N/A /* Search the tree for the key; delete only deletes exact matches. */ 1N/A * If re-numbering records, the on-page deleted flag can only mean 1N/A * that this record was implicitly created. Applications aren't 1N/A * permitted to delete records they never created, return an error. 1N/A * If not re-numbering records, the on-page deleted flag means that 1N/A * this record was implicitly created, or, was deleted at some time. 1N/A * The former is an error because applications aren't permitted to 1N/A * delete records they never created, the latter is an error because 1N/A * if the record was "deleted", we could never have found it. 1N/A /* Delete the item, adjust the counts, adjust the cursors. */ 1N/A * If the page is empty, delete it. The whole tree is locked 1N/A * so there are no preparations to make. 1N/A /* Use a delete/put pair to replace the record with a marker. */ 1N/A /* If we upgraded the CDB lock upon entry; downgrade it now. */ 1N/A * Recno db->put function. 1N/A /* Check for invalid flags. */ 1N/A /* Allocate a cursor. */ 1N/A * If we're appending to the tree, make sure we've read in all of 1N/A * the backing source file. Otherwise, check the user's record 1N/A * number and fill in as necessary. 1N/A /* Add the record. */ 1N/A /* Discard the cursor. */ 1N/A /* Return the record number if we're appending to the tree. */ 1N/A * Recno db->sync function. 1N/A * Sync the underlying btree. 1N/A * We don't need to do a panic check or flags check, the "real" 1N/A * sync function does all that for us. 1N/A /* Allocate a cursor. */ 1N/A /* Copy back the backing source file. */ 1N/A /* Discard the cursor. */ 1N/A * Recno db->close function. 1N/A * PUBLIC: int __ram_close __P((DB *)); 1N/A /* Close any underlying mmap region. */ 1N/A /* Close any backing source file descriptor. */ 1N/A /* Free any backing source file name. */ 1N/A /* Free allocated memory. */ 1N/A /* Close the underlying btree. */ 1N/A * Recno cursor->c_del function. 1N/A * PUBLIC: int __ram_c_del __P((DBC *, u_int32_t)); 1N/A /* Check for invalid flags. */ 1N/A * If we are running CDB, this had better be either a write 1N/A * cursor or an immediate writer. 1N/A * The semantics of cursors during delete are as follows: if record 1N/A * numbers are mutable (DB_RE_RENUMBER is set), deleting a record 1N/A * causes the cursor to automatically point to the record immediately 1N/A * following. In this case it is possible to use a single cursor for 1N/A * repeated delete operations, without intervening operations. 1N/A * If record numbers are not mutable, then records are replaced with 1N/A * a marker containing a delete flag. If the record referenced by 1N/A * this cursor has already been deleted, we will detect that as part 1N/A * of the delete operation, and fail. 1N/A * Recno cursor->c_get function. 1N/A * PUBLIC: int __ram_c_get __P((DBC *, DBT *, DBT *, u_int32_t)); 1N/A /* Check for invalid flags. */ 1N/A /* Clear OR'd in additional bits so we can check for flag equality. */ 1N/A /* Initialize the cursor for a new retrieval. */ 1N/A * If record numbers are mutable: if we just deleted a record, 1N/A * there is no action necessary, we return the record following 1N/A * the deleted item by virtue of renumbering the tree. 1N/A * If record numbers are mutable: if we just deleted a record, 1N/A * we have to avoid incrementing the record number so that we 1N/A * return the right record by virtue of renumbering the tree. 1N/A /* Return the key if the user didn't give us one. */ 1N/A /* Search the tree for the record. */ 1N/A * If re-numbering records, the on-page deleted flag means this record 1N/A * was implicitly created. If not re-numbering records, the on-page 1N/A * deleted flag means this record was implicitly created, or, it was 1N/A * deleted at some time. Regardless, we skip such records if doing 1N/A * cursor next/prev operations, and fail if the application requested 1N/A /* Return the data item. */ 1N/A /* The cursor was reset, no further delete adjustment is necessary. */ 1N/A /* Release temporary lock upgrade. */ 1N/A * Recno cursor->c_put function. 1N/A * PUBLIC: int __ram_c_put __P((DBC *, DBT *, DBT *, u_int32_t)); 1N/A * If we are running CDB, this had better be either a write 1N/A * cursor or an immediate writer. If it's a regular writer, 1N/A * that means we have an IWRITE lock and we need to upgrade 1N/A * it to a write lock. 1N/A /* Initialize the cursor for a new retrieval. */ 1N/A * To split, we need a valid key for the page. Since it's a cursor, 1N/A * we have to build one. 1N/A * The split code discards all short-term locks and stack pages. 1N/A /* Adjust the cursors. */ 1N/A /* Set this cursor to reference the new record. */ 1N/A /* Adjust the cursors. */ 1N/A /* Set this cursor to reference the new record. */ 1N/A /* The cursor was reset, no further delete adjustment is necessary. */ 1N/A * PUBLIC: void __ram_ca __P((DB *, db_recno_t, ca_recno_arg)); 1N/A * Adjust the cursors. See the comment in __bam_ca_delete(). 1N/A * Check the user's record number, and make sure we've seen it. 1N/A * PUBLIC: int __ram_getno __P((DBC *, const DBT *, db_recno_t *, int)); 1N/A /* Check the user's record number. */ 1N/A * Btree can neither create records nor read them in. Recno can 1N/A * do both, see if we can find the record. 1N/A * Ensure the tree has records up to and including the specified one. 1N/A * If we can't create records and we've read the entire backing input 1N/A * If we haven't seen this record yet, try to get it from the original 1N/A * If we can create records, create empty ones up to the requested 1N/A * Load information about the backing file. 1N/A * The caller has full responsibility for cleaning up on error -- 1N/A * (it has to anyway, in case it fails after this routine succeeds). 1N/A * We'd like to test to see if the file is too big to mmap. Since we 1N/A * don't know what size or type off_t's or size_t's are, or the largest 1N/A * unsigned integral type is, or what random insanity the local C 1N/A * compiler will perpetrate, doing the comparison in a portable way is 1N/A * flatly impossible. Hope that mmap fails if the file is too large. 1N/A * __ram_writeback -- 1N/A * Rewrite the backing file. 1N/A /* If the file wasn't modified, we're done. */ 1N/A /* If there's no backing source file, we're done. */ 1N/A * Read any remaining records into the tree. 1N/A * This is why we can't support transactions when applications specify 1N/A * backing (re_source) files. At this point we have to read in the 1N/A * rest of the records from the file so that we can write all of the 1N/A * records back out again, which could modify a page for which we'd 1N/A * have to log changes and which we don't have locked. This could be 1N/A * partially fixed by taking a snapshot of the entire file during the 1N/A * db_open(), or, since db_open() isn't transaction protected, as part 1N/A * of the first DB operation. But, if a checkpoint occurs then, the 1N/A * part of the log holding the copy of the file could be discarded, and 1N/A * that would make it impossible to recover in the face of disaster. 1N/A * This could all probably be fixed, but it would require transaction 1N/A * protecting the backing source file, i.e. mpool would have to know 1N/A * about it, and we don't want to go there. 1N/A * Close any underlying mmap region. This is required for Windows NT 1N/A * (4.0, Service Pack 2) -- if the file is still mapped, the following 1N/A /* Get rid of any backing file descriptor, just on GP's. */ 1N/A /* Open the file, truncating it. */ 1N/A * We step through the records, writing each one out. Use the record 1N/A * number and the dbp->get() function, instead of a cursor, so we find 1N/A * and write out "deleted" or non-existent records. 1N/A * We'll need the delimiter if we're doing variable-length records, 1N/A * and the pad character if we're doing fixed-length records. 1N/Adone:
/* Close the file descriptor. */ 1N/A * Get fixed length records from a file. 1N/A * Another process may have read this record from the input 1N/A * file and stored it into the database already, in which 1N/A * case we don't need to repeat that operation. We detect 1N/A * this by checking if the last record we've read is greater 1N/A * or equal to the number of records in the database. 1N/A * We should just do a seek, since the records are fixed 1N/A * Get variable length records from a file. 1N/A * Another process may have read this record from the input 1N/A * file and stored it into the database already, in which 1N/A * case we don't need to repeat that operation. We detect 1N/A * this by checking if the last record we've read is greater 1N/A * or equal to the number of records in the database. 1N/A * Add records into the tree. 1N/Aretry:
/* Find the slot for insertion. */ 1N/A * If re-numbering records, the on-page deleted flag means this record 1N/A * was implicitly created. If not re-numbering records, the on-page 1N/A * deleted flag means this record was implicitly created, or, it was 1N/A * deleted at some time. 1N/A * If DB_NOOVERWRITE is set and the item already exists in the tree, 1N/A * return an error unless the item was either marked for deletion or 1N/A * only implicitly created. 1N/A * Select the arguments for __bam_iitem() and do the insert. If the 1N/A * key is an exact match, or we're replacing the data item with a 1N/A * new data item, replace the current item. If the key isn't an exact 1N/A * match, we're inserting a new key/data pair, before the search 1N/A * Don't adjust anything. 1N/A * If we inserted a record, no cursors need adjusting because 1N/A * the only new record it's possible to insert is at the very 1N/A * end of the tree. The necessary adjustments to the internal 1N/A * page counts were made by __bam_iitem(). 1N/A * If we overwrote a record, no cursors need adjusting because 1N/A * future DBcursor->get calls will simply return the underlying 1N/A * record (there's no adjustment made for the DB_CURRENT flag 1N/A * when a cursor get operation immediately follows a cursor 1N/A * delete operation, and the normal adjustment for the DB_NEXT 1N/A * flag is still correct). 1N/A /* Discard the stack of pages and split the page. */