database revision 3bbe4751ff06fc8bfe190289dddb2f58553e308e
3bbe4751ff06fc8bfe190289dddb2f58553e308eBob Halley
Databases

The BIND 9 DNS database allows named rdatasets to be stored and retrieved.
DNS databases are used to store two different categories of data:
authoritative zone data and non-authoritative cache data. Unlike
previous versions of BIND, which used a monolithic database, BIND 9 has
one database per zone or cache. Certain database operations, for
example updates, have differing requirements and actions depending
upon whether the database contains zone data or cache data.


Database Updates

A master zone is updated by a Dynamic Update message. A slave zone is
updated by IXFR or AXFR. AXFR provides the entire contents of the new
zone version, and replaces the entire contents of the database. IXFR
and Dynamic Update, although completely different protocols, have the
same basic database requirements. They are differential update
protocols, e.g. "add this record to the records at name 'foo'". They
are transactional, and must either succeed or fail completely.
Changes must not become visible to clients until the transaction has
committed. Because each change is expressed relative to the current
contents of the zone, these transactions must be serialized.
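
The requirements above (differential changes, all-or-nothing, invisible
until commit, one transaction at a time) can be illustrated with a small
Python sketch. It is not BIND 9 code: the Zone class, the apply_diff
function and the tuple format for changes are hypothetical names chosen
for illustration. Copying the whole zone for each transaction is a
deliberate simplification, as is treating a delete that matches nothing
as an error.

```python
import copy
import threading


class Zone:
    """Hypothetical in-memory zone: maps (name, type) to a set of rdata strings."""

    def __init__(self):
        self.records = {}
        self.update_lock = threading.Lock()  # serializes update transactions


def apply_diff(zone, changes):
    """Apply a list of ("add" | "del", name, rdtype, rdata) changes atomically."""
    with zone.update_lock:  # only one transaction runs at a time
        # Work on a private copy so that readers of zone.records see nothing yet.
        pending = copy.deepcopy(zone.records)
        for op, name, rdtype, rdata in changes:
            key = (name, rdtype)
            if op == "add":
                pending.setdefault(key, set()).add(rdata)
            elif op == "del":
                if rdata not in pending.get(key, set()):
                    # Simplification for this sketch: a non-matching delete
                    # aborts the whole transaction.
                    raise ValueError(f"no such record: {name} {rdtype} {rdata}")
                pending[key].remove(rdata)
                if not pending[key]:
                    del pending[key]
            else:
                raise ValueError(f"unknown operation: {op}")
        # Commit: a single reference swap, so readers see all changes or none.
        zone.records = pending
```

If any change raises an exception, the working copy is simply discarded
and zone.records still refers to the last committed contents.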

Cache updates are done by the server in the ordinary course of
handling client requests. Unlike zone updates, cache updates do not
refer to the current contents of the cache, so concurrent writing to
the cache is possible. The main requirement is that concurrent update
attempts to the same node and rdataset type must appear to have been
executed in some order. To make DB versioning simpler, the DB
interface actually imposes a more restrictive set of requirements, namely
that access to a node is serialized and that database changes will become
visible in version order (more on this below).
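
As a rough illustration of serialized access to a node, the following
Python sketch gives each cache node its own lock. The Cache and
CacheNode classes and their method names are assumptions made for this
example and do not correspond to the BIND 9 DB interface. Writers to
different nodes proceed independently, while writers to the same node
are applied one after another, so the final contents always match some
ordering of the writes.

```python
import threading


class CacheNode:
    """Hypothetical cache node holding one rdataset per type."""

    def __init__(self):
        self.lock = threading.Lock()  # serializes all access to this node
        self.rdatasets = {}


class Cache:
    """Hypothetical cache database with one lock per node."""

    def __init__(self):
        self._nodes = {}
        self._nodes_lock = threading.Lock()  # protects only the name -> node map

    def _find_node(self, name):
        with self._nodes_lock:
            node = self._nodes.get(name)
            if node is None:
                node = self._nodes[name] = CacheNode()
            return node

    def add_rdataset(self, name, rdtype, rdata):
        node = self._find_node(name)
        with node.lock:
            # A cache write replaces the rdataset; it never reads the old value.
            node.rdatasets[rdtype] = rdata

    def find_rdataset(self, name, rdtype):
        node = self._find_node(name)
        with node.lock:
            return node.rdatasets.get(rdtype)
```

The map lock is held only long enough to find or create a node, so it
does not serialize the writes themselves.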


Database Concurrency and Locking

A principal goal of the BIND 9 project is multiprocessor scalability.
The amount of concurrency in database accesses is an important factor
in achieving scalability. Consider a heavily used database, e.g. the
cache database serving some mail hubs, or ".com". If access to these
databases is not parallelized, then adding another CPU will not help
the server's performance for the portion of the runtime spent in
database lookup.
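
The point about the serialized portion of the runtime is commonly
expressed as Amdahl's law. That name and the formula below are a framing
added here, not something the original text states, and the 40% figure
in the usage note is purely hypothetical.

```python
def speedup(serial_fraction, cpus):
    """Amdahl's law: overall speedup when only the non-serial part scales.

    serial_fraction is the share of runtime that cannot run in parallel
    (here: time spent in unparallelized database lookup), from 0.0 to 1.0.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0.0 and 1.0")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)
```

For example, if 40% of the runtime were spent in serialized database
lookup, speedup(0.4, 2) would be about 1.43 and no number of CPUs could
push the result to 2.5.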

Support for multiple concurrent readers certainly helps both cache
databases and zone databases. Zones are typically read much more than
they are written, though less so than in prior years because dynamic
DNS support is now widely available. Caches are frequently written as
well as read; a non-scientific survey of caching statistics on a few
busy caching nameservers showed the ratio of cache hits to misses was
about 2 to 1, so roughly one lookup in three was a miss, and a miss
generally leads to a cache write.

As mentioned above, zone updates must be serialized, but cache updates
often provide good opportunities for concurrency.

A simple approach to these concurrency goals would be to have a single
read-write lock on the database. This would allow for multiple
concurrent readers, and would provide the serialization of updates
that zone updates require. However, this approach has significant
limitations. Readers cannot run while an update is running. For a
short-lived transaction like a Dynamic Update, this may be acceptable,
but an IXFR can take a very long time (even hours) to complete.
Preventing read access for such a long time is unacceptable. Another
problem is that it forces updates to be serialized, even for cache
databases.

There are problems on the reader side of the lock too. If the entire
database is protected by one lock, then any data retrieved from the
database must either be used while the lock is held, or it must be
copied, because the data in the database can change when the lock
isn't held. Copying is expensive, and the server would like to be
able to hold a reference to database data for a long time. The most
significant long-running reader problem is outbound AXFR, which could
potentially block updates for a very long time (hours).
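
The behaviour of a single database-wide read-write lock can be seen in
a minimal Python sketch. The RWLock class below is hypothetical and much
simpler than a production lock (for instance, it makes no attempt to
stop readers from starving writers). The non-blocking mode exists only
so that the shut-out cases can be observed without extra threads.

```python
import threading


class RWLock:
    """Minimal read-write lock sketch: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self, blocking=True):
        with self._cond:
            while self._writer:
                if not blocking:
                    return False  # a writer is active: the reader is shut out
                self._cond.wait()
            self._readers += 1
            return True

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_write(self, blocking=True):
        with self._cond:
            while self._writer or self._readers:
                if not blocking:
                    return False  # readers or another writer are active
                self._cond.wait()
            self._writer = True
            return True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers
```

While a writer holds the lock, as a long IXFR would, acquire_read with
blocking=False returns False for every lookup. While any reader holds
it, as an outbound AXFR would, acquire_write with blocking=False
returns False for every update.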

A finer-grained locking scheme, e.g. one lock per node, helps
parallelize cache updates, but doesn't help with the long-lived reader
or long-lived writer problems.


Database Versioning

XXX TBS XXX