red-black revision 40f53fa8d9c6a4fc38c0014495e7a42b08f52481
ac4e70ff8955669341f435bc0a734a17c01af124Mark AndrewsCopyright (C) 1999, 2000 Internet Software Consortium.
6fe48fb46e53ffc37542853a1edb74cb481b7d94Automatic UpdaterSee COPYRIGHT in the source root or http://www.isc.org/copyright for terms.
229ea4644b3a7d9c7fdaa43888e7f55ba01e2ee3Automatic Updater$Id: red-black,v 1.6 2000/08/01 01:18:14 tale Exp $
0c39b3ed9409ecb277d5e32fa763a4e4d6598df8Automatic Updater Red-Black Tree Implementation Notes
79b273c187a4aa1016a62181983dfdd0521681aeMark AndrewsBIND9's basic name storage mechanism is to use a modified form of
90ff38a0d8deaf5f9c2aa5916d99b2e572d28738Automatic Updaterbalanced binary tree known as a red-black tree. Red-black trees
9e3a7b0faf417a10f5f689edf288807b2d5eedc5Brian Wellingtonprovide for relatively efficient storage, retrieval and removal of
ac4e70ff8955669341f435bc0a734a17c01af124Mark Andrewsdata while maintaining the lexical order of all stored keys, a
6c6a121295b30772cbf3dd75a51fb9d883051a0eAutomatic Updaternecessary function for DNS security.
e171a4137c6ba348957e61b7c4c3541493c0da02Automatic UpdaterA red-black tree is a balanced binary tree named for the coloring that
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewsis done in the tree, identifying each node as either red or black.
e130ab53e992670e2a2ecf043976ac09f21358d1Automatic UpdaterThere are two simple rules for maintaining the color of nodes:
3cc98b8ecedcbc8465f1cf2740b966b315662430Automatic Updater (1) A red node has only black children.
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrews (2) The path from the root to any leaf node always includes the
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrews same number of black nodes.
831f79c4310a7d38fc3475ccfff531b2b2535641Automatic UpdaterWhenever a key is added or removed, adjustments are made to adhere to
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewsthose two rules. These adjustments are relatively cheap to make but
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updatermaintain the balance of the tree, thus making for efficient addition,
efb0e886f18894a1d2489f1ad74ad14b579e11c7Mark Andrewslookup and deletion operations, all of which are O(log N). The color
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updaterof a node is not relevant to external users of the tree; it is needed
91216cff91b34c9ff6e846dc23f248219cafe660Andreas Gustafssononly to maintain the balance of the tree.
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic UpdaterFor more information on basic red-black trees, see _Introduction to
91216cff91b34c9ff6e846dc23f248219cafe660Andreas GustafssonAlgorithms_, Cormen, Leiserson, and Rivest, MIT Press / McGraw Hill,
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updater1990, ISBN 0-262-03141-8, chapter 14.
9513a2a6670951f5cf5477fcfec9f933fcaff628Automatic UpdaterIn BIND9, the red-black tree implementation uses DNS names as keys,
aa9c561961e9d877946ebaa8795fa2be054ab7bfEvan Huntand can store arbitrary data with each key value. "name" and "key"
e130ab53e992670e2a2ecf043976ac09f21358d1Automatic Updaterare used interchangably in this document.
aa9c561961e9d877946ebaa8795fa2be054ab7bfEvan HuntThe basic red-black tree algorithm is further adapted for use in BIND9
9513a2a6670951f5cf5477fcfec9f933fcaff628Automatic Updaterto incorporate the notion of hierarchy, creating a tree of red-black
9513a2a6670951f5cf5477fcfec9f933fcaff628Automatic Updatertrees. Where there is more than one name with a common suffix, all
9513a2a6670951f5cf5477fcfec9f933fcaff628Automatic Updaternames with that suffix are stored in their own red-black tree, with a
aa9c561961e9d877946ebaa8795fa2be054ab7bfEvan Huntdown pointer from the suffix locating the subtree.
2d2dc37599979c83495510f8af8d1756753aa2c5Automatic UpdaterFor example, consider storing the following names:
9513a2a6670951f5cf5477fcfec9f933fcaff628Automatic UpdaterNo matter which order the keys were added, this would result in a tree
9513a2a6670951f5cf5477fcfec9f933fcaff628Automatic Updaterthat can be visualized as:
930f6069e5aa157cf6987cdafd412f5757a5a558Automatic UpdaterThis tree shows that when there is no key for a particular label, and
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewswhen there is only one known label for its immediate subordinate, then
930f6069e5aa157cf6987cdafd412f5757a5a558Automatic Updatermultiple labels can appear in a single node, such as at d.e.f and g.h.
80faf1588895fd26490f82f95a7a1b771df1c324Automatic UpdaterIt also demonstrates that there can be more nodes in the tree of trees
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewsthan there are actual keys (which degrades the O(log N) performance
6fe48fb46e53ffc37542853a1edb74cb481b7d94Automatic Updatermarginally); the nodes at d.e.f and w.y do not represent keys.
930f6069e5aa157cf6987cdafd412f5757a5a558Automatic UpdaterAs an aside, remember that when ordering DNS names, labels are
693c4232dfdffaff672197d4b9fea944c64cf80aAutomatic Updaterexamined from the right, therefore w.y sorts after x and before z.
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic UpdaterA split can occur not only on a regular label boundary, but also
91216cff91b34c9ff6e846dc23f248219cafe660Andreas Gustafssonbetween any two bits in an EDNS bitstring label. The common-suffix
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updaterrules will be applied to keep as many bits together as possible.
efb0e886f18894a1d2489f1ad74ad14b579e11c7Mark AndrewsIn the current implementation of the tree of trees, a node is
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updaterconsidered to "formally" exist only if it has data associated with
91216cff91b34c9ff6e846dc23f248219cafe660Andreas Gustafssonit. So if the above tree then had the key d.e.f added to it, the
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updateroperation would succeed rather than getting an "already exists"
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic UpdaterAlong the same lines, if a key is added with a name which is a proper
dd65eb1efb40b1c47d57963192bfc54873b219beAutomatic Updatersuperdomain of the name stored in an existing node, the operation will
dd65eb1efb40b1c47d57963192bfc54873b219beAutomatic Updatersucceed by splitting the existing node into one node that is the key
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrewsand another node that is the remaining parts of the name. Adding e.f
11af78f7dc35741bdab68dbab11b03daab005b28Automatic Updaterto the above tree results in the top level red-black tree having a
11af78f7dc35741bdab68dbab11b03daab005b28Automatic Updaternode named e.f where the current d.e.f is, and a down pointer from
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrewsd.e.f to a "tree" of a single node named d. The down pointer from d
2a31bd531072824ef252c18303859d6af7451b00Francis Dupontwould be kept to the level which has x, w.y, and z.
8ccd7da886e93cd490fcb6f4c4e98a6514f35820Automatic UpdaterA similar split of d.e.f would occur if the name k.e.f were added.
2a31bd531072824ef252c18303859d6af7451b00Francis DupontThe top level tree would have the node e.f with a down pointer to a
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewslevel that had both d and k, and d would continue to have its down
e130ab53e992670e2a2ecf043976ac09f21358d1Automatic Updaterpointer to the x, w.y and z level.
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark AndrewsIt is guaranteed when splitting that external references to the node
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrewsthat is split will remain valid --- in the previous examples, anything
08e3b6797706a13054bad749dea04e94b514b8e7Automatic Updaterthat was pointing to the node that was d.e.f will still point to the
dd65eb1efb40b1c47d57963192bfc54873b219beAutomatic Updaternode that is now just d.
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark AndrewsWhen deleting keys, nodes can be rejoined. If both of p.w.y.d.e.f and
a308b69ac66fadf66863484f301314d6e6a3f1d2Automatic Updaterq.w.y.d.e.f were removed from the example tree, the node named w.y
a308b69ac66fadf66863484f301314d6e6a3f1d2Automatic Updaterwould become o.w.y. Unlike splitting, it is _not_ guaranteed that
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrewsexternal references remain consistent; sometimes they will, sometimes
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewsthey won't. Also, note that deletion is not perfectly symmetric with
6fe48fb46e53ffc37542853a1edb74cb481b7d94Automatic Updateraddition. If you "undo" the last addition with a deletion of the same
82447d835d3ff5c658749b4e9b4f66166407b3eaAutomatic Updaterkey then the tree of trees is not guaranteed to have exactly the same
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewsstructure as it had prior to the addition. Sometimes, but not always.
0c39b3ed9409ecb277d5e32fa763a4e4d6598df8Automatic UpdaterRejoining does not happen if it would violate any of the rules that
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updatercause a split. o would not be rejoined with w.y if w.y had data
cdfc81e048bd34c1d628380247bda6b80a89e20eAutomatic Updaterassociated with the key; o would remain as a single node on its own
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updaterlevel. This emphasizes the rule that a node is considered to formally
fe80a4909bf62b602feaf246866e9d29f7654194Automatic Updaterexist only if data is associated with it, because even if w.y.d.e.f
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updaterhad been explicitly added as a key but with no data, then o would
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updaterstill be merged with the w.y node when p and q were deleted.
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic UpdaterSearching for a node generally returns one of three possible results:
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updatereither the key is found, a superdomain (partial match) of the key is
91216cff91b34c9ff6e846dc23f248219cafe660Andreas Gustafssonfound, or no part of the key is found. The first and last are rather
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updaterobvious, and the second result basically means that a hierarchically
aa1d397c4736cd86540555193d71e55fa3b37b2aMark Andrewsenclosing name is found; e.g, searching for bb.rc.vix.com turned up
91216cff91b34c9ff6e846dc23f248219cafe660Andreas Gustafssonrc.vix.com, but not the full name.
dd65eb1efb40b1c47d57963192bfc54873b219beAutomatic UpdaterNo locking is done within the RBT library. @@@
91216cff91b34c9ff6e846dc23f248219cafe660Andreas GustafssonWhen a partial match is made, level_matches is set while the chain
f2770f6b39a9b2a98afb7a11ed105f73f1570c1eAutomatic Updaterpoints to the partial match node that was found. Then the chain is
4104e236f71eb5108fcfda6711878a97f6f4a8e7Automatic Updateradjusted to point to the DNSSEC predecessor node, which might not even
8711e5c73ca872d59810760af0332194cbdd619bAutomatic Updaterbe under the same top level domain as the name that was searched for.
229ea4644b3a7d9c7fdaa43888e7f55ba01e2ee3Automatic UpdaterFor example, consider a database that had only the names vix.com and
0ce87e5749aabb8eef1e0a37e4bd6e6ffa1d7196Automatic Updaterisc.org. A search for uu.net would leave the chain pointed to
6fe48fb46e53ffc37542853a1edb74cb481b7d94Automatic Updatervix.com, the DNSSEC predecessor. Though this might first appear to
229ea4644b3a7d9c7fdaa43888e7f55ba01e2ee3Automatic Updatercause level_matches to be bogus because the chain has been unwound and
765c97d56ccddc9d7904c7d9ff2e2d825d9687e4Automatic Updatersent down another path, note that the partial match node will always
3e5340279d8875d136a4dd815cccad0044aa2644Automatic Updaterbe in the chain of the predecessor, too --- and often the partial
8ccd7da886e93cd490fcb6f4c4e98a6514f35820Automatic Updatermatch node will be the predecessor itself. In the vix.com/isc.org
da82e232161d67b77df2d67898bdac693f647be1Automatic Updaterexample, the search for uu.net finds a partial match at ".", which is
e130ab53e992670e2a2ecf043976ac09f21358d1Automatic Updaterof course also in the path to the vix.com predecessor. A search for
d145b64cacc8d9cda51f9924ec70cd4661c3e2cfAutomatic Updaterwww.isc.org would find that isc.org is both the partial match and the
1d4f4d2db2d69e48fec2dde5c1535853677d22a7Automatic UpdaterEXTERNAL PROGRAMMATIC DETAILS
da82e232161d67b77df2d67898bdac693f647be1Automatic UpdaterThis section details the functions used to interact with the BIND9
9c446b72069d0ab9f710502f4d7048e50875fccbAutomatic Updaterred-black tree library, or RBT for short.
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic UpdaterA source file that will be using RBT will usually need to include
bc0a53583d92309bebcf93c408e2f3247ebd3d3cAutomatic Updater<dns/rbt.h>. This header file automatically includes <isc/result.h),
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updater<isc/mem.h>, <dns/types.h>, and <dns/name.h>.
59528addd704f8d5757b54e540520f74e588a7c7Automatic UpdaterThe rbt.h file has more complete descriptions of each of the functions
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updaternamed here, including what is required for each argument, what each
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updaterfunction ensures (and might not ensure) will occur, and the full range
7f79131f9a8e804b93c57f3c679065cce878b726Automatic Updaterof possible results for each call. Note well: if a function returns a
59528addd704f8d5757b54e540520f74e588a7c7Automatic Updaterdns_result_t rather than void, it definitely means there is something
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updaterthat can go possibly wrong in the function and it should be checked by
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic UpdaterA new tree of trees must be initialized using:
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updater dns_result_t dns_rbt_create(isc_mem_t *mctx, void (*deleter)(void *, void *),
9513a2a6670951f5cf5477fcfec9f933fcaff628Automatic Updater void *deleter_arg, dns_rbt_t **rbtp);
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic UpdaterThe memory context, mctx, must be a non-null pointer that was
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updaterinitialized with isc_mem_create(). The deleter argument, if non-null,
5ecad47f69b3fd945472ab2900a9ff826a7ce2f6Automatic Updatershould point to a function that is responsible for cleaning up any
e130ab53e992670e2a2ecf043976ac09f21358d1Automatic Updatermemory associated with the data pointer of a node when the node is
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updaterdeleted. It is passed the deleted node's data pointer as its first
71bd43eebd9d6e42dbcae62b730f5b6508d5acd8Automatic Updaterargument and deleter_arg as its second argument.
7262eb86f2b465822206122921e2f357218f0cfdAutomatic UpdaterAfter initializing an RBT manager, to add keys to the tree, use:
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updater dns_result_t dns_rbt_addname(dns_rbt_t *rbt, dns_name_t *name, void *data);
bbb069be941f649228760edcc241122933c066d2Automatic UpdaterThe name _must_ be an absolute name. It is not required that the data
9cd5eb6fe0f26d65724b99216cb31dcdd12e4afdAutomatic Updaterpointer be non-null, but it is recommended that it point to something,
4cda4fd158d6ded5586bacea8c388445d99611eaAutomatic Updatereven just invalid memory, because of the various searching and
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewsdeletion issues described in the previous section. The RBT code will
9cd5eb6fe0f26d65724b99216cb31dcdd12e4afdAutomatic Updaternot attempt to dereference the pointer.
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark AndrewsTo find a key in the tree, use:
8711e5c73ca872d59810760af0332194cbdd619bAutomatic Updater dns_result_t dns_rbt_findname(dns_rbt_t *rbt, dns_name_t *name, void **data);
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark AndrewsThe data parameter must not be NULL, but *data must be NULL. The
765c97d56ccddc9d7904c7d9ff2e2d825d9687e4Automatic Updaterresult will be either DNS_R_SUCCESS, DNS_R_PARTIALMATCH or
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark AndrewsDNS_R_NOTFOUND. In the first case, an exact match was found for the
f7c88d61cc1ad2435b0b7cfaedfc9d5248c0be25Automatic Updatername and there was an associate data pointer, which is returned via
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrewsthe data parameter. A partial match results when the name has not
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updaterbeen found but a superdomain name, with data, does exist; then the
3f616e6f846be57b1717c6beaba0f74de9d5a7c6Automatic Updaterdata for that name is returned in the data parameter. If no data is
229ea4644b3a7d9c7fdaa43888e7f55ba01e2ee3Automatic Updaterfound for the name or a superdomain, *data will remain NULL.
f7a71eef29bcbf892270460269c79664f600cffdAutomatic UpdaterINTERNAL PROGRAMMATIC DETAILS
9e3a7b0faf417a10f5f689edf288807b2d5eedc5Brian WellingtonThis section is mainly relevant to the RBT DB implementation. It is
8711e5c73ca872d59810760af0332194cbdd619bAutomatic Updaterhighly recommended that programmers using the RBT library stick to the
8711e5c73ca872d59810760af0332194cbdd619bAutomatic Updaterfunctions named in the previous section.
930f6069e5aa157cf6987cdafd412f5757a5a558Automatic UpdaterThe dns_rbt_addname and dns_rbt_findname functions named in the
8ccd7da886e93cd490fcb6f4c4e98a6514f35820Automatic Updaterprevious section are wrappers around dns_rbt_addnode and
8711e5c73ca872d59810760af0332194cbdd619bAutomatic Updaterdns_rbt_findnode. The *node functions for the most part do not
ce9cad6bb04869c5e94d9dc721032b25117f9210Automatic Updaterparticularly care whether a node has an associated data pointer or
cf7e98f59148b559946a7f1ca728471374f1eef3Automatic Updaternot, whereas the *name functions do. The one exception to this is
c3fd32ed29e9e419bb56583f4272a506773b1ea0Automatic Updaterthat when a PARTIALMATCH is returned for a search, the indicated node
91216cff91b34c9ff6e846dc23f248219cafe660Andreas Gustafssonis the deepest match that has data, rather than just the deepest
c3fd32ed29e9e419bb56583f4272a506773b1ea0Automatic Updatermatch. Even that behavior is selectable, however, using the boolean
c3fd32ed29e9e419bb56583f4272a506773b1ea0Automatic Updaterempty_data_ok argument to dns_rbt_findnode.
8711e5c73ca872d59810760af0332194cbdd619bAutomatic UpdaterEach node in the tree of trees is represented by the following structure:
9e3a7b0faf417a10f5f689edf288807b2d5eedc5Brian Wellington typedef struct dns_rbtnode {
9513a2a6670951f5cf5477fcfec9f933fcaff628Automatic Updater struct dns_rbtnode *left;
3857cb6fcabeb79d85de4b3e3e4ab99912b701f8Mark Andrews struct dns_rbtnode *right;
572cb2c1c931f6bc6a4a019c103ae88239b0eb96Automatic Updater struct dns_rbtnode *down;
c651f15b30f1dae5cc2f00878fb5da5b3a35a468Mark Andrews * The following bitfields add up to a total bitwidth of 32.
9174e44c14b1cb91a651fa1dc29470438c246ab9Automatic Updater * The range of values necessary for each item is indicated,
91216cff91b34c9ff6e846dc23f248219cafe660Andreas Gustafsson * but in the case of "attributes" the field is wider to accomodate
e2caa7536302de34de6cc04025abcd53dc3a499aAutomatic Updater * possible future expansion. "offsetlen" could be one bit
56e7dc0c24b04210dcbffb180a9e35644fb820daAutomatic Updater * narrower by always adjusting its value by 1 to find the real
7d12a6b412fe47e6d6582923fd6954ab8cd0baebAutomatic Updater * offsetlen, but doing so does not gain anything (except perhaps
8292deab031e7599cd7622aa7675fbe139ca6095Mark Andrews * another bit for "attributes", which doesn't yet need any more).
0b57424d28c9a67018107133f9fbc0a7dcf057e2Mark Andrews unsigned int color:1; /* range is 0..1 */
0b57424d28c9a67018107133f9fbc0a7dcf057e2Mark Andrews unsigned int attributes:6; /* range is 0..2 */
ca35524ce2b57e6f1b261d23565d1288a355d12fAutomatic Updater unsigned int namelen:8; /* range is 1..255 */
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrews unsigned int offsetlen:8; /* range is 1..128 */
b109432c3a939bff66a463be86c371bd88efe3aaAutomatic Updater unsigned int padbytes:9; /* range is 0..380 */
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrews * These values are used in the RBT DB implementation. The
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrews * appropriate node lock must be held before accessing them.
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrews unsigned int dirty:1;
78f3ed4bc2fcd3d270bfd599804f3b27a1db4d91Mark Andrews unsigned int locknum:DNS_RBT_LOCKLENGTH;
3351ccbd5c1961404044f8273d54dad405f53960Mark Andrews unsigned int references:DNS_RBT_REFLENGTH;
7d12a6b412fe47e6d6582923fd6954ab8cd0baebAutomatic Updater } dns_rbtnode_t;