/*-
 * See the file LICENSE for redistribution information.
 *
 * Copyright (c) 1996, 1997, 1998
 *	Sleepycat Software.  All rights reserved.
 */
/*
 * Copyright (c) 1990, 1993, 1994, 1995, 1996
 *	Keith Bostic.  All rights reserved.
 */
/*
 * Copyright (c) 1990, 1993, 1994, 1995
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *	This product includes software developed by the University of
 *	California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */
/*
 * PUBLIC: int __bam_split __P((DBC *, void *));
 *
 * The locking protocol we use to avoid deadlock is to acquire locks by
 * walking down the tree, but we do it as lazily as possible, locking the
 * root only as a last resort.  We expect all stack pages to have been
 * discarded before we're called; we discard all short-term locks.
 *
 * When __bam_split is first called, we know that a leaf page was too
 * full for an insert.  We don't know what leaf page it was, but we
 * have the key/recno that caused the problem.  We call XX_search to
 * reacquire the leaf page, but this time get both the leaf page and
 * its parent, locked.  We then split the leaf page and see if the new
 * internal key will fit into the parent page.  If it will, we're done.
 *
 * If it won't, we discard our current locks and repeat the process,
 * only this time acquiring the parent page and its parent, locked.
 * This process repeats until we succeed in the split, splitting the
 * root page as the final resort.  The entire process then repeats,
 * as necessary, until we split a leaf page.
 *
 * A traditional method of speeding this up is to maintain a stack of
 * the pages traversed in the original search.  You can detect if the
 * stack is correct by storing the page's LSN when it was searched and
 * comparing that LSN with the current one when it's locked during the
 * split.  This would be an easy change for this code, but I have no
 * numbers that indicate it's worthwhile.
 */

	/* Acquire a page and its parent, locked. */

	/*
	 * Split the page if it still needs it (it's possible another
	 * thread of control has already split the page).  If we are
	 * guaranteed that two items will fit on the page, the split
	 * is no longer necessary.
	 */

	/* Once we've split the leaf page, we're done. */

	/* Switch directions. */

	/*
	 * It's possible to fail to split repeatedly, as other
	 * threads may be modifying the tree, or the page usage
	 * is sufficiently bad that we don't get enough space
	 * the first time.
	 */

/*
 * Split the root page of a btree.
 */
	/* Create new left and right pages for the split. */

	/* Split the page. */

	/* Log the change. */

	/* Clean up the new root page. */

	/* Adjust any cursors.  Do it last so we don't have to undo it. */

	/* Success -- write the real pages back to the store. */

/*
 * Split the non-root page of a btree.
 */
	/* Create new right page for the split. */

	/* Create new left page for the split. */

	/*
	 * Only the indices are sorted on the page, i.e., the key/data pairs
	 * aren't, so it's simpler to copy the data from the split page onto
	 * two new pages instead of copying half the data to the right page
	 * and compacting the left page in place.  Since the left page can't
	 * change, we swap the original and the allocated left page after
	 * the split.
	 */

	/*
	 * Fix up the previous pointer of any leaf page following the split
	 * page.
	 *
	 * There are interesting deadlock situations here as we write-lock a
	 * page that's not in our direct ancestry.  Consider a cursor walking
	 * through the leaf pages, that has the previous page read-locked and
	 * is waiting on a lock for the page we just split.  It will deadlock
	 * here.  If this is a problem, we can fail in the split; it's not a
	 * problem as the split will succeed after the cursor passes through
	 * the page we're splitting.
	 */

	/* Insert the new pages into the parent page. */

	/* Log the change. */

	/* Copy the allocated page into place. */

	/* Finish the next-page link. */

	/* Adjust any cursors.  Do so last so we don't have to undo it. */

	/* Success -- write the real pages back to the store. */

/*
 * Fix up the btree root page after it has been split.
 *
 * If the root page was a leaf page, change it into an internal page.
 */
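/*
 * The lazy, walk-up retry protocol described above can be sketched very
 * roughly as follows.  This is a toy model, not Berkeley DB code: the
 * struct page layout, the entry/capacity counts, and bam_split_sketch
 * itself are invented for illustration, and locking, logging, searching,
 * and page allocation are omitted entirely.
 */

```c
#include <stddef.h>

/* Hypothetical page: a parent link plus a fill count and capacity. */
struct page {
	struct page *parent;	/* NULL for the root */
	int nentries;
	int capacity;
};

static int
page_full(const struct page *p)
{
	return (p->nentries >= p->capacity);
}

/* Split p, pushing one new separator key into its parent (if any). */
static void
do_split(struct page *p)
{
	p->nentries = p->nentries / 2 + 1;	/* keep roughly half */
	if (p->parent != NULL)
		p->parent->nentries++;		/* new internal key */
}

/*
 * Split the leaf, walking up the tree only as far as necessary: each
 * failed attempt moves the target one level up; each successful split
 * of a non-leaf page restarts the whole process from the leaf.  Returns
 * the number of passes taken.
 */
static int
bam_split_sketch(struct page *leaf)
{
	struct page *target;
	int passes;

	for (passes = 0, target = leaf;;) {
		passes++;
		/* "Acquire a page and its parent, locked" -- locks omitted. */
		if (target->parent == NULL || !page_full(target->parent)) {
			/* The new internal key fits (or target is the root). */
			do_split(target);
			if (target == leaf)
				return (passes);	/* Leaf split: done. */
			target = leaf;			/* Repeat from the leaf. */
		} else
			target = target->parent;	/* Go one level up. */
	}
}
```

/*
 * With a full leaf under a full parent under a root that has room, the
 * model takes three passes: fail at the leaf, split the parent, then
 * split the leaf.
 */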
/*
 * We copy the key we split on (but not the key's data, in the case of
 * a leaf page) to the new root page.
 *
 * The btree comparison code guarantees that the left-most key on any
 * level of the tree is never used, so it doesn't need to be filled in.
 */

	/* Copy the first key of the child page onto the root page. */

	/* Increment the overflow ref count. */

	/* Copy the first key of the child page onto the root page. */

	/* Increment the overflow ref count. */

/*
 * Fix up the recno root page after it has been split.
 */
	/* Initialize the page. */

	/* Initialize the header. */

	/* Insert the left and right keys, set the header information. */

/*
 * Insert a new key into a parent page, completing the split.
 */
	/* If handling record numbers, count records split to the right page. */

	/*
	 * Now we insert the new page's first key into the parent page, which
	 * completes the split.  The parent points to a PAGE and a page index
	 * offset, where the new key goes ONE AFTER the index, because we
	 * split to the right.
	 *
	 * Some btree algorithms replace the key for the old page as well as
	 * the new page.  We don't, as there's no reason to believe that the
	 * first key on the old page is any better than the key we have, and,
	 * in the case of a key being placed at index 0 causing the split, the
	 * key is unavailable.
	 */

	/*
	 * Calculate the space needed on the parent page.
	 *
	 * Prefix trees: space hack used when inserting into BINTERNAL pages.
	 * Retain only what's needed to distinguish between the new entry and
	 * the LAST entry on the page to its left.  If the keys compare equal,
	 * retain the entire key.  We ignore overflow keys, and the entire key
	 * must be retained for the next-to-leftmost key on the leftmost page
	 * of each level, or the search will fail.  Applicable ONLY to internal
	 * pages that have leaf pages as children.
	 */
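/*
 * The prefix "space hack" above amounts to keeping the shortest prefix
 * of the new key that still sorts strictly after the last key on the
 * left page.  A minimal sketch, assuming plain byte-string keys and a
 * memcmp-style ordering (the real code also handles overflow keys and
 * user-defined comparison functions, which are skipped here):
 */

```c
#include <stddef.h>

/*
 * Return the number of leading bytes of key that must be retained so
 * the truncated key still compares greater than last.  If the keys
 * compare equal, the entire key is retained.
 */
static size_t
prefix_len(const char *last, size_t lastlen, const char *key, size_t keylen)
{
	size_t n, shorter;

	shorter = lastlen < keylen ? lastlen : keylen;
	for (n = 0; n < shorter; n++)
		if (last[n] != key[n])
			return (n + 1);	/* First differing byte decides. */
	/*
	 * One key is a prefix of the other (or they're equal): keep one
	 * byte past the end of the shorter key, or the whole key.
	 */
	return (shorter < keylen ? shorter + 1 : keylen);
}
```

/*
 * E.g. with "apple" on the left and "banana" as the new key, a single
 * byte ("b") is enough to separate the pages.
 */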
	/*
	 * Further reduction of the key between pairs of internal pages loses
	 * too much information.
	 */

	/* Add a new record for the right page. */

	/* Increment the overflow ref count. */

	/* Increment the overflow ref count. */

	/* Add a new record for the right page. */

	/* Adjust the parent page's left page record count. */

	/* Log the change. */

	/* Update the left page count. */

/*
 * Do the real work of splitting the page.
 */
	/*
	 * If we're splitting the first (last) page on a level because we're
	 * inserting (appending) a key to it, it's likely that the data is
	 * sorted.  Moving a single item to the new page is less work and can
	 * push the fill factor higher than normal.  If we're wrong it's not
	 * a big deal, we'll just do the split the right way next time.
	 */

	/*
	 * Split the data to the left and right pages.  Try not to split on
	 * an overflow key.  (Overflow keys on internal pages will slow down
	 * searches.)  Refuse to split in the middle of a set of duplicates.
	 *
	 * First, find the optimum place to split.
	 *
	 * It's possible to try and split past the last record on the page if
	 * there's a very large record at the end of the page.  Make sure this
	 * doesn't happen by bounding the check at the next-to-last entry on
	 * the page.
	 *
	 * Note, we try and split half the data present on the page.  This is
	 * because another process may have already split the page and left
	 * it half empty.  We don't try and skip the split -- we don't know
	 * how much space we're going to need on the page, and we may need up
	 * to half the page for a big item, so there's no easy test to decide
	 * if we need to split or not.  Besides, if two threads are inserting
	 * data into the same place in the database, we're probably going to
	 * need more space soon anyway.
	 */

	/*
	 * Splitp is either at or just past the optimum split point.  If
	 * it's a big key, try and find something close by that's not.
	 */
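/*
 * The split-point choice above -- aim for half the bytes on the page,
 * never past the next-to-last entry, then nudge off any unsuitable
 * entry -- can be sketched like this.  The len[] sizes and big[] flags
 * (standing in for overflow keys) are hypothetical inputs; the real
 * code walks the page's index array and also refuses to land inside a
 * run of duplicates.
 */

```c
#include <stddef.h>

/*
 * Pick a split index for a page of nent entries (nent >= 1), where
 * len[i] is each entry's size in bytes and big[i] flags entries we'd
 * rather not split on.
 */
static size_t
choose_splitp(const size_t *len, const int *big, size_t nent)
{
	size_t half, i, splitp, total;

	for (total = 0, i = 0; i < nent; i++)
		total += len[i];
	half = total / 2;

	/* Walk until we've accounted for half the bytes, bounded at the
	 * next-to-last entry. */
	for (splitp = 0, total = 0; splitp < nent - 1; splitp++) {
		total += len[splitp];
		if (total >= half)
			break;
	}

	/* Search outward from splitp for an entry that isn't big. */
	for (i = 0; i < nent; i++) {
		if (splitp + i < nent && !big[splitp + i])
			return (splitp + i);
		if (splitp >= i && !big[splitp - i])
			return (splitp - i);
	}
	return (splitp);	/* Everything is big: split where we are. */
}
```

/*
 * With four equal-sized entries the byte-count walk stops at index 1;
 * flagging that entry as big pushes the split to the nearest clean
 * neighbor.
 */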
	/*
	 * We can't split in the middle of a set of duplicates.  We know that
	 * no duplicate set can take up more than about 25% of the page,
	 * because that's the point where we push it off onto a duplicate
	 * page set.  So, this loop can't be unbounded.
	 */

	/* We're going to split at splitp. */

/*
 * Copy a set of records from one page to another.
 *
 * PUBLIC: int __bam_copy __P((DB *, PAGE *, PAGE *, u_int32_t, u_int32_t));
 */
	/*
	 * Copy the rest of the data to the right page.  Nxt is the next
	 * offset placed on the target page.
	 */

	/*
	 * If we're on a key and it's a duplicate, just copy
	 */
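/*
 * The record-copy step above works on a slotted page: records grow down
 * from the end of the page, and an index array at the front maps slot
 * numbers to record offsets.  Since only the index is ordered, copying
 * a slot range just appends each record to the target and notes its new
 * offset.  The struct spage layout below is invented for illustration;
 * it is not the Berkeley DB PAGE format.
 */

```c
#include <stddef.h>
#include <string.h>

#define PGSZ 256

/* Hypothetical slotted page: index array up front, records at the end. */
struct spage {
	size_t nslots;
	size_t inp[16];		/* slot -> offset of record on the page */
	size_t reclen[16];	/* length of each record */
	size_t free_off;	/* next free byte, growing down from PGSZ */
	char data[PGSZ];
};

/* Copy slots [lo, hi) from src onto the target page, which is assumed
 * to have room. */
static void
page_copy(const struct spage *src, struct spage *dst, size_t lo, size_t hi)
{
	size_t i, nxt;

	for (i = lo; i < hi; i++) {
		/* Nxt is the next offset placed on the target page. */
		nxt = dst->free_off - src->reclen[i];
		memcpy(dst->data + nxt, src->data + src->inp[i],
		    src->reclen[i]);
		dst->inp[dst->nslots] = nxt;
		dst->reclen[dst->nslots] = src->reclen[i];
		dst->nslots++;
		dst->free_off = nxt;
	}
}
```

/*
 * This is why the split copies onto two fresh pages rather than
 * compacting in place: appending through the index is cheap and leaves
 * the source page untouched until the swap.
 */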