Copyright (C) 1999, 2000 Internet Software Consortium.
See COPYRIGHT in the source root or http://isc.org/copyright.html for terms.

$Id: database,v 1.7 2000/08/09 04:37:14 tale Exp $

Databases

The BIND 9 DNS database allows named rdatasets to be stored and
retrieved. DNS databases are used to store two different categories
of data: authoritative zone data and non-authoritative cache data.
Unlike previous versions of BIND, which used a monolithic database,
BIND 9 has one database per zone or cache. Certain database
operations, for example updates, have differing requirements and
actions depending upon whether the database contains zone data or
cache data.


Database Semantics

A database instance has either zone semantics or cache semantics. The
semantics are chosen when the database is created and cannot be
changed. The differences between zone databases and cache databases
are discussed further below.


Reference Safety

It is a general principle of the BIND 9 project, and of the database
API, that all references returned to the caller remain valid until the
caller discards the reference.

The database interface also mandates that the rdata in a retrieved
rdataset shall remain unaltered while any reference to the rdataset is
held. Some other properties of the rdataset, e.g. its DNSSEC
validation status, may change.

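One way to picture these two guarantees is the toy model below. It is
only an illustrative sketch in Python, not BIND 9 code: the names
Rdataset, ToyDatabase, find, replace and detach, and the trust and
refs fields, are all hypothetical. The rdata is held in an immutable
tuple, so a later replacement in the database installs a new object
instead of modifying the one a caller still holds, while the
validation status stays an ordinary mutable attribute.

```python
from dataclasses import dataclass


@dataclass
class Rdataset:
    """Toy rdataset: immutable rdata plus mutable metadata."""
    rdata: tuple            # immutable, so holders never see it change
    trust: str = "pending"  # hypothetical DNSSEC validation status
    refs: int = 0           # references handed out to callers


class ToyDatabase:
    def __init__(self):
        self._sets = {}     # (name, rdtype) -> Rdataset

    def replace(self, name, rdtype, rdata):
        # Install a new object; any previously returned Rdataset is
        # left untouched for the callers that still hold it.
        self._sets[(name, rdtype)] = Rdataset(tuple(rdata))

    def find(self, name, rdtype):
        rds = self._sets[(name, rdtype)]
        rds.refs += 1       # the caller now holds a reference
        return rds

    @staticmethod
    def detach(rds):
        rds.refs -= 1       # the caller discards its reference


db = ToyDatabase()
db.replace("www.example.", "A", ["192.0.2.1"])
held = db.find("www.example.", "A")

db.replace("www.example.", "A", ["192.0.2.2"])   # the database moves on
held.trust = "secure"                             # metadata may change

old_rdata = held.rdata                            # what the holder sees
new_rdata = db.find("www.example.", "A").rdata    # what a new lookup sees
ToyDatabase.detach(held)
```

In this sketch the holder keeps seeing the rdata it originally
retrieved, while a fresh lookup sees the replacement. A real
implementation would also need to decide when an unreferenced old
rdataset can be freed, which the sketch leaves to the Python runtime.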

Database Updates

A master zone is updated by a Dynamic Update message. A slave zone is
updated by IXFR or AXFR. AXFR provides the entire contents of the new
zone version, and replaces the entire contents of the database. IXFR
and Dynamic Update, although completely different protocols, have the
same basic database requirements. Both are differential update
protocols: each update describes changes relative to the existing
data, e.g. "add this record to the records at name 'foo'". The
updates are also atomic, i.e. an update must either succeed completely
or fail completely. Changes must not become visible to clients until
the update has committed. In short, zone updates are transactional.
This transaction occurs at a database level; the entire database goes
from one version to another.

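The all-or-nothing behaviour of a differential update can be sketched
as follows. This is a minimal Python illustration under stated
assumptions, not the BIND 9 implementation: apply_update, UpdateError
and the ("add" | "delete", name, rdata) tuple format are hypothetical,
and the rule that deleting an absent record fails is invented purely
to give the example a failure case (the real protocols define their
own rules).

```python
class UpdateError(Exception):
    """Raised when one step of a differential update cannot be applied."""


def apply_update(zone, serial, ops):
    """Return (new_zone, new_serial); the inputs are never modified.

    zone: dict mapping name -> frozenset of rdata strings
    ops:  list of ("add" | "delete", name, rdata) tuples
    """
    working = dict(zone)                 # private copy, invisible to readers
    for op, name, rdata in ops:
        current = working.get(name, frozenset())
        if op == "add":
            working[name] = current | {rdata}
        elif op == "delete":
            if rdata not in current:     # failure rule assumed for this sketch
                raise UpdateError(f"{rdata} not present at {name}")
            working[name] = current - {rdata}
        else:
            raise UpdateError(f"unknown operation {op!r}")
    return working, serial + 1           # the whole database moves forward


zone, serial = {"foo": frozenset({"192.0.2.1"})}, 1

# A successful update: the database goes from version 1 to version 2.
zone, serial = apply_update(zone, serial, [("add", "foo", "192.0.2.2")])

# A failing update: the second step fails, so the first is discarded too.
try:
    zone, serial = apply_update(
        zone, serial,
        [("add", "foo", "192.0.2.3"), ("delete", "foo", "198.51.100.9")],
    )
    failed = False
except UpdateError:
    failed = True
```

Because every change is made to a private copy and published only by
the final assignment, a reader holding the old mapping never observes
a half-applied update.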
Cache updates are done by the server in the ordinary course of
handling client requests. Unlike zone databases, there's no need (and
indeed, no ability) to ensure that data in the cache is consistent.
For example, the cache may hold rdatasets from different versions of a
given zone. A typical cache update involves looking at the existing
cache contents for the given name and type (if any), deciding if the
proposed replacement is better, and if so, doing the replacement.
Concurrent update attempts to the same node and rdataset type must
appear to have been executed in some serial order; there must be no
merging of data from multiple updates. Caches are not globally
versioned like zones are. There is no need to group changes to
multiple rdatasets into a cache transaction.

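The look, compare, replace sequence described above might be sketched
like this. It is a hedged Python illustration, not BIND 9 code:
ToyCache, add, find and the integer trust value are hypothetical, and
"better" is assumed here to mean a strictly higher trust value, since
this document does not define the comparison. The sketch also relies
on individual dict operations being safe under CPython's interpreter
lock.

```python
import threading


class ToyCache:
    """Toy cache: one rdataset per (name, rdtype), replaced as a whole."""

    def __init__(self, nlocks=16):
        self._entries = {}   # (name, rdtype) -> (trust, rdata tuple)
        # A small pool of locks so unrelated entries rarely contend.
        self._locks = [threading.Lock() for _ in range(nlocks)]

    def _lock_for(self, key):
        return self._locks[hash(key) % len(self._locks)]

    def add(self, name, rdtype, rdata, trust):
        """Install rdata if it is better than what is cached.

        Returns True if the cache was changed. "Better" here means a
        strictly higher trust value, an assumption of this sketch.
        """
        key = (name, rdtype)
        with self._lock_for(key):        # serializes updates to this entry
            existing = self._entries.get(key)
            if existing is not None and existing[0] >= trust:
                return False             # keep the existing rdataset
            # Whole-rdataset replacement: never a merge of old and new.
            self._entries[key] = (trust, tuple(rdata))
            return True

    def find(self, name, rdtype):
        entry = self._entries.get((name, rdtype))
        return None if entry is None else entry[1]


cache = ToyCache()
first = cache.add("example.", "NS", ["ns1.example."], trust=1)
worse = cache.add("example.", "NS", ["ns9.example."], trust=1)
better = cache.add("example.", "NS", ["ns2.example.", "ns3.example."], trust=2)
```

Each call holds only the lock covering its own entry, so updates to
different names can usually run in parallel, and no lock spans more
than one rdataset because no cache transaction is needed.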

Database Concurrency and Locking

A principal goal of the BIND 9 project is multiprocessor scalability.
The amount of concurrency in database accesses is an important factor
in achieving scalability. Consider a heavily used database, e.g. the
cache database serving some mail hubs, or ".com". If access to these
databases is not parallelized, then adding another CPU will not help
the server's performance for the portion of the runtime spent in
database lookup.

Support for multiple concurrent readers certainly helps both cache
databases and zone databases. Zones are typically read much more than
they are written, though less so than in prior years because dynamic
DNS support is now widely available. Caches are frequently read and
frequently written; a non-scientific survey of caching statistics on a
few busy caching nameservers showed the ratio of cache hits to misses
was about 2 to 1.

As mentioned above, zone updates must be serialized, but cache updates
can often proceed in parallel.

A simple approach to these concurrency goals would be to have a single
read-write lock on the database. This would allow for multiple
concurrent readers, and would provide the serialization of updates
that zone updates require. However, this approach has significant
limitations. Readers cannot run while an update is running. For a
short-lived transaction like a Dynamic Update, this may be acceptable,
but an IXFR can take a long time (even hours) to complete. Preventing
read access for such a long time is unacceptable. Another problem is
that it forces updates to be serialized, even for cache databases.
There are problems on the reader side of the lock too. If the entire
database is protected by one lock, then any data retrieved from the
database must either be used while the lock is held, or it must be
copied, because the data in the database can change when the lock
isn't held. Copying is expensive, and the server would like to be
able to hold a reference to database data for a long time. The most
significant long-running reader problem is outbound AXFR, which could
potentially block updates for a long time (hours).

A finer-grained locking scheme, e.g. one lock per node, helps
parallelize cache updates, but doesn't help with the long-lived reader
or long-lived writer problems. These problems are solved by zone
database versioning, described below.

The BIND 9 Database interface does not mandate any particular locking
scheme. Database implementations are strongly encouraged to provide
as much concurrency as possible without violating the database
interface's rules.


Database Versioning

Versioning is not available in cache databases.

A zone database has a "current version", which is the version most
recently committed. A database has a set of versions open for reading
(the "open versions"). This set is always non-empty, since the
current version is always open. The openversion method opens a
read-only handle to the current version. All retrievals using the
handle will see the database as it was at the time the version was
opened, regardless of subsequent changes to the database. It is not
possible to open a specific version; only the current version may be
opened. This helps limit the number of prior versions which must be
kept in the database.

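The reader side of versioning can be pictured with the toy model
below. It is a sketch under simplifying assumptions rather than the
actual database code: ToyZoneDb and its methods are hypothetical
names, each version is stored as a full copy of the zone (a real
implementation would presumably share unchanged data), and commit
simply stands in for whatever update installs a new current version.

```python
class ToyZoneDb:
    """Toy versioned zone database, reader side only."""

    def __init__(self, data):
        self._current = 1
        self._versions = {1: dict(data)}   # version number -> zone contents
        self._readers = {1: 0}             # version number -> open handles

    def open_version(self):
        """Open a read-only handle; only the current version can be opened."""
        self._readers[self._current] += 1
        return self._current

    def find(self, handle, name):
        return self._versions[handle].get(name)

    def close_version(self, handle):
        self._readers[handle] -= 1
        self._reclaim()

    def commit(self, data):
        """Stand-in for a committed update: install a new current version."""
        self._current += 1
        self._versions[self._current] = dict(data)
        self._readers[self._current] = 0
        self._reclaim()

    def _reclaim(self):
        # Drop prior versions that no reader has open any more.
        for v in list(self._versions):
            if v != self._current and self._readers[v] == 0:
                del self._versions[v], self._readers[v]

    def kept_versions(self):
        return sorted(self._versions)


db = ToyZoneDb({"foo": "192.0.2.1"})
h = db.open_version()                  # e.g. a long-running outbound AXFR
db.commit({"foo": "192.0.2.2"})        # an update commits meanwhile

seen_by_old = db.find(h, "foo")        # data as of the time h was opened
kept_while_open = db.kept_versions()   # the old version must be retained
h2 = db.open_version()
seen_by_new = db.find(h2, "foo")
db.close_version(h)
kept_after_close = db.kept_versions()  # the old version has been reclaimed
```

Since handles can only be taken on the current version, a prior
version can never gain new readers, so it becomes reclaimable as soon
as its last existing handle is closed.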
Each zone update transaction is assigned a new version. Only one such
"future version" may be open at any time. It is the caller's
responsibility to serialize multiple update requests, including
blocking and awakening them. The future version may be committed or
rolled back by the caller. If the future version commits, it becomes
the current version of the database.

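The writer side might look like the following sketch. Again this is
illustrative Python with hypothetical names (ToyWritableZone,
new_version, close_version), not the real interface. Two behaviours
are assumptions of the sketch: a second attempt to open a future
version is refused with an exception, and a rollback leaves the
current version number and contents unchanged.

```python
class ToyWritableZone:
    """Toy zone database with at most one open future version."""

    def __init__(self, data):
        self.current = 1               # current version number
        self._committed = dict(data)   # contents of the current version
        self._future = None            # working copy of the future version

    def new_version(self):
        if self._future is not None:
            # Serializing writers is the caller's job, so just refuse.
            raise RuntimeError("a future version is already open")
        self._future = dict(self._committed)
        return self._future

    def close_version(self, commit):
        if self._future is None:
            raise RuntimeError("no future version is open")
        if commit:
            self._committed = self._future
            self.current += 1
        self._future = None            # on rollback the changes vanish

    def find(self, name):
        return self._committed.get(name)


zone = ToyWritableZone({"foo": "192.0.2.1"})

future = zone.new_version()
future["foo"] = "192.0.2.2"
try:
    zone.new_version()                 # a second writer has to wait
    second_refused = False
except RuntimeError:
    second_refused = True
visible_before = zone.find("foo")      # uncommitted change is not visible
zone.close_version(commit=False)       # roll back
after_rollback = (zone.current, zone.find("foo"))

future = zone.new_version()
future["foo"] = "192.0.2.3"
zone.close_version(commit=True)        # commit
after_commit = (zone.current, zone.find("foo"))
```

A caller handling several update requests would typically queue them
and open the next future version only after the previous one has been
committed or rolled back.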