1N/A * See the file LICENSE for redistribution information. 1N/A * Copyright (c) 1996, 1997, 1998 1N/A * Sleepycat Software. All rights reserved. 1N/A * This implementation requires that values within the following structures 1N/A * NOT be padded -- note, ANSI C permits random padding within structures. 1N/A * If your compiler pads randomly you can just forget ever making DB run on 1N/A * your system. In addition, no data type can require larger alignment than 1N/A * its own size, e.g., a 4-byte data element may not require 8-byte alignment. 1N/A * Note that key/data lengths are often stored in db_indx_t's -- this is 1N/A * item fits on a page, it's guaranteed to be small enough to fit into a 1N/A * db_indx_t, and storing it in one saves space. 1N/A * When we create pages in mpool, we ask mpool to clear some number of bytes 1N/A * in the header. This number must be at least as big as the regular page 1N/A * headers and cover enough of the btree and hash meta-data pages to obliterate 1N/A * the magic and version numbers. 1N/A/************************************************************************ 1N/A BTREE METADATA PAGE LAYOUT 1N/A ************************************************************************/ 1N/A * Btree metadata page layout: 1N/A /* 48-67: Unique file ID. */ 1N/A/************************************************************************ 1N/A HASH METADATA PAGE LAYOUT 1N/A ************************************************************************/ 1N/A * Hash metadata page layout: 1N/A/* Hash Table Information */ 1N/Atypedef struct hashhdr {
/* Disk resident portion */ 1N/A /* 60-187: Spare pages for overflow */ 1N/A /* 188-207: Unique file ID. */ 1N/A * Minimum page size is 256. 1N/A/************************************************************************ 1N/A ************************************************************************/ 1N/A * +-----------------------------------+ 1N/A * | lsn | pgno | prev pgno | 1N/A * +-----------------------------------+ 1N/A * | next pgno | entries | hf offset | 1N/A * +-----------------------------------+ 1N/A * | level | type | index | 1N/A * +-----------------------------------+ 1N/A * | index | free --> | 1N/A * +-----------+-----------------------+ 1N/A * | F R E E A R E A | 1N/A * +-----------------------------------+ 1N/A * | <-- free | item | 1N/A * +-----------------------------------+ 1N/A * | item | item | item | 1N/A * +-----------------------------------+ 1N/A * sizeof(PAGE) == 26 bytes, and the following indices are guaranteed to be 1N/A * For hash and btree leaf pages, index items are paired, e.g., inp[0] is the 1N/A * key for inp[1]'s data. All other types of pages only contain single items. 1N/A * The btree levels are numbered from the leaf to the root, starting 1N/A * with 1, so the leaf is level 1, its parent is level 2, and so on. 1N/A * We maintain this level on all btree pages, but the only place that 1N/A * we actually need it is on the root page. It would not be difficult 1N/A * to hide the byte on the root page once it becomes an internal page, 1N/A * so we could get this byte back if we needed it for something else. 1N/A/* Element macros. */ 1N/A * The next_pgno and prev_pgno fields are not maintained for btree and recno 1N/A * internal pages. It's a minor performance improvement, and more, it's 1N/A * hard to do when deleting internal pages, and it decreases the chance of 1N/A * deadlock during deletes and splits. 1N/A * The btree/recno access method needs db_recno_t bytes of space on the root 1N/A * page to specify how many records are stored in the tree. (The alternative 1N/A * is to store the number of records in the meta-data page, which will create 1N/A * a second hot spot in trees being actively modified, or recalculate it from 1N/A * the BINTERNAL fields on each access.) Overload the prev_pgno field. 1N/A * Initialize a page. 1N/A * Don't modify the page's LSN, code depends on it being unchanged after a 1N/A/* Page header length (offset to first index). */ 1N/A/* First free byte. */ 1N/A/* Free space on the page. */ 1N/A/* Get a pointer to the bytes at a specific index. */ 1N/A/************************************************************************ 1N/A OVERFLOW PAGE LAYOUT 1N/A ************************************************************************/ 1N/A * Overflow items are referenced by HOFFPAGE and BOVERFLOW structures, which 1N/A * store a page number (the first page of the overflow item) and a length 1N/A * (the total length of the overflow item). The overflow item consists of 1N/A * some number of overflow pages, linked by the next_pgno field of the page. 1N/A * A next_pgno field of PGNO_INVALID flags the end of the overflow item. 1N/A * Overflow page overloads: 1N/A * The amount of overflow data stored on each page is stored in the 1N/A * The implementation reference counts overflow items as it's possible 1N/A * for them to be promoted onto btree internal pages. The reference 1N/A * count is stored in the entries field. 1N/A/* Maximum number of bytes that you can put on an overflow page. */ 1N/A/************************************************************************ 1N/A ************************************************************************/ 1N/A/* Each index references a group of bytes on the page. */ 1N/A * Items on hash pages are (potentially) unaligned, so we can never cast the 1N/A * (page + offset) pointer to an HKEYDATA, HOFFPAGE or HOFFDUP structure, as 1N/A * we do with B+tree on-page structures. Because we frequently want the type 1N/A * field, it requires no alignment, and it's in the same location in all three 1N/A * structures, there's a pair of macros. 1N/A * The first and second types are H_KEYDATA and H_DUPLICATE, represented 1N/A * by the HKEYDATA structure: 1N/A * +-----------------------------------+ 1N/A * +-----------------------------------+ 1N/A * For duplicates, the data field encodes duplicate elements in the data 1N/A * +---------------------------------------------------------------+ 1N/A * | type | len1 | element1 | len1 | len2 | element2 | len2 | 1N/A * +---------------------------------------------------------------+ 1N/A * Thus, by keeping track of the offset in the element, we can do both 1N/A * backward and forward traversal. 1N/A * The length of any HKEYDATA item. Note that indx is an element index, 1N/A * Page space required to add a new HKEYDATA item to the page, with and 1N/A * without the index value. 1N/A/* Put a HKEYDATA item at the location referenced by a page entry. */ 1N/A * Macros the describe the page layout in terms of key-data pairs. 1N/A * The use of "pindex" indicates that the argument is the index 1N/A * expressed in pairs instead of individual elements. 1N/A * The third type is the H_OFFPAGE, represented by the HOFFPAGE structure: 1N/A * Page space required to add a new HOFFPAGE item to the page, with and 1N/A * without the index value. 1N/A * The fourth type is H_OFFDUP represented by the HOFFDUP structure: 1N/A * Page space required to add a new HOFFDUP item to the page, with and 1N/A * without the index value. 1N/A/************************************************************************ 1N/A ************************************************************************/ 1N/A/* Each index references a group of bytes on the page. */ 1N/A * We have to store a deleted entry flag in the page. The reason is complex, 1N/A * but the simple version is that we can't delete on-page items referenced by 1N/A * a cursor -- the return order of subsequent insertions might be wrong. The 1N/A * delete flag is an overload of the top bit of the type byte. 1N/A * The first type is B_KEYDATA, represented by the BKEYDATA structure: 1N/A/* Get a BKEYDATA item for a specific index. */ 1N/A * Page space required to add a new BKEYDATA item to the page, with and 1N/A * without the index value. 1N/A * The second and third types are B_DUPLICATE and B_OVERFLOW, represented 1N/A * by the BOVERFLOW structure. 1N/A/* Get a BOVERFLOW item for a specific index. */ 1N/A * Page space required to add a new BOVERFLOW item to the page, with and 1N/A * without the index value. 1N/A * Btree leaf and hash page layouts group indices in sets of two, one 1N/A * for the key and one for the data. Everything else does it in sets 1N/A * of one to save space. I use the following macros so that it's real 1N/A * obvious what's going on... 1N/A/************************************************************************ 1N/A BTREE INTERNAL PAGE LAYOUT 1N/A ************************************************************************/ 1N/A * Btree internal entry. 1N/A/* Get a BINTERNAL item for a specific index. */ 1N/A * Page space required to add a new BINTERNAL item to the page, with and 1N/A * without the index value. 1N/A/************************************************************************ 1N/A RECNO INTERNAL PAGE LAYOUT 1N/A ************************************************************************/ 1N/A * The recno internal entry. 1N/A * Why not fold this into the db_indx_t structure, it's fixed length? 1N/A/* Get a RINTERNAL item for a specific index. */ 1N/A * Page space required to add a new RINTERNAL item to the page, with and 1N/A * without the index value. 1N/A#
endif /* _DB_PAGE_H_ */