Databases

A BIND 9 DNS database allows named rdatasets to be stored and retrieved.
DNS databases are used to store two different categories of data:
authoritative zone data and non-authoritative cache data. Unlike
previous versions of BIND, which used a monolithic database, BIND 9 has
one database per zone or cache. Certain database operations, for
example updates, have differing requirements and actions depending
upon whether the database contains zone data or cache data.


Database Semantics

A database instance has either zone semantics or cache semantics. The
semantics are chosen when the database is created and cannot be
changed. The differences between zone databases and cache databases
are discussed further below.


Reference Safety

It is a general principle of the BIND 9 project, and of the database
API, that all references returned to the caller remain valid until the
caller discards the reference.

The database interface also mandates that the rdata in a retrieved
rdataset shall remain unaltered while any reference to the rdataset is
held. Some other properties of the rdataset, e.g. its DNSSEC
validation status, may change.


Database Updates

A master zone is updated by a Dynamic Update message. A slave zone is
updated by IXFR or AXFR. AXFR provides the entire contents of the new
zone version, and replaces the entire contents of the database. IXFR
and Dynamic Update, although completely different protocols, have the
same basic database requirements. They are differential update
protocols, e.g. "add this record to the records at name 'foo'". The
updates are also atomic, i.e. they must either succeed completely or
fail completely. Changes must not become visible to clients until the
update has committed. In short, zone updates are transactional. This
transaction occurs at the database level; the entire database goes
from one version to another.
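The all-or-nothing behaviour described above can be illustrated with a
small sketch. This is not BIND 9 code: the class ToyZoneDB, its
apply_update method, and the tuple layout of a change are hypothetical
names invented for this example. The rule that deleting a missing
record fails is a toy rule, chosen only to trigger a rollback. The
sketch applies a differential update to a private copy and publishes
the copy only if every change succeeds.

```python
import copy


class ToyZoneDB:
    """Hypothetical in-memory zone: maps (name, rdtype) -> set of rdata strings."""

    def __init__(self):
        self.serial = 0   # counts committed versions
        self.data = {}    # the version currently visible to readers

    def apply_update(self, changes):
        """Apply ("add" | "delete", name, rdtype, rdata) tuples as one transaction."""
        # Work on a private copy so readers never see a half-applied update.
        pending = copy.deepcopy(self.data)
        for op, name, rdtype, rdata in changes:
            key = (name, rdtype)
            if op == "add":
                pending.setdefault(key, set()).add(rdata)
            elif op == "delete":
                # Toy rule (an assumption of this sketch): deleting a record
                # that does not exist makes the whole transaction fail.
                if rdata not in pending.get(key, set()):
                    raise ValueError(f"no such record: {name} {rdtype} {rdata}")
                pending[key].remove(rdata)
                if not pending[key]:
                    del pending[key]
            else:
                raise ValueError(f"unknown operation: {op}")
        # Commit: the whole database moves from one version to the next.
        self.data = pending
        self.serial += 1
```

If any change raises an exception, self.data and self.serial are left
untouched, so a failed update is never visible to readers.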

Cache updates are done by the server in the ordinary course of
handling client requests. Unlike with zone databases, there's no need
(and indeed, no ability) to ensure that data in the cache is consistent.
For example, the cache may hold rdatasets from different versions of a
given zone. A typical cache update involves looking at the existing
cache contents for the given name and type (if any), deciding if the
proposed replacement is better, and if so, doing the replacement.
Concurrent update attempts to the same node and rdataset type must
appear to have been executed in some order; there must be no merging
of data from multiple updates. Caches are not globally versioned like
zones are. There is no need to group changes to multiple rdatasets
into a cache transaction.
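The look-compare-replace step for a cache can be sketched as follows.
This is an illustration only, not the BIND 9 implementation: ToyCache,
Rdataset and the integer "trust" ranking are hypothetical, and "better"
is assumed here to mean "trust at least as high as the existing entry".
Each node carries its own lock, so updates to the same node are
serialized while updates to different nodes can run in parallel, and a
replacement always swaps in a whole rdataset rather than merging.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Rdataset:
    rdata: frozenset  # immutable, so holders of a reference never see changes
    trust: int        # hypothetical ranking: larger means more trustworthy


class ToyCache:
    def __init__(self):
        self._table_lock = threading.Lock()  # protects only the node table
        self._nodes = {}                     # name -> (node lock, {rdtype: Rdataset})

    def _node(self, name):
        with self._table_lock:
            if name not in self._nodes:
                self._nodes[name] = (threading.Lock(), {})
            return self._nodes[name]

    def update(self, name, rdtype, proposed):
        """Store 'proposed' unless the existing entry is better; return the winner."""
        lock, rdatasets = self._node(name)
        with lock:  # serializes concurrent updates to this node
            existing = rdatasets.get(rdtype)
            if existing is not None and existing.trust > proposed.trust:
                return existing
            # Replace the whole rdataset; data is never merged.
            rdatasets[rdtype] = proposed
            return proposed

    def lookup(self, name, rdtype):
        lock, rdatasets = self._node(name)
        with lock:
            return rdatasets.get(rdtype)
```

No version number or multi-rdataset transaction appears anywhere in the
sketch, which matches the point that caches are not globally versioned.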


Database Concurrency and Locking

A principal goal of the BIND 9 project is multiprocessor scalability.
The amount of concurrency in database accesses is an important factor
in achieving scalability. Consider a heavily used database, e.g. the
cache database serving some mail hubs, or ".com". If access to these
databases is not parallelized, then adding another CPU will not help
the server's performance for the portion of the runtime spent in
database lookup.

Support for multiple concurrent readers certainly helps both cache
databases and zone databases. Zones are typically read much more than
they are written, though less so than in prior years because dynamic
DNS support is now widely available. Caches are frequently read and
frequently written; a non-scientific survey of caching statistics on a
few busy caching nameservers showed the ratio of cache hits to misses
was about 2 to 1.

As mentioned above, zone updates must be serialized, but cache updates
can often proceed in parallel.

A simple approach to these concurrency goals would be to have a single
read-write lock on the database. This would allow for multiple
concurrent readers, and would provide the serialization of updates
that zone updates require. However, this approach has significant
limitations. Readers cannot run while an update is running. For a
short-lived transaction like a Dynamic Update, this may be acceptable,
but an IXFR can take a long time (even hours) to complete. Preventing
read access for such a long time is unacceptable. Another problem is
that it forces updates to be serialized, even for cache databases.
There are problems on the reader side of the lock too. If the entire
database is protected by one lock, then any data retrieved from the
database must either be used while the lock is held, or it must be
copied, because the data in the database can change when the lock
isn't held. Copying is expensive, and the server would like to be
able to hold a reference to database data for a long time. The most
significant long-running reader problem is outbound AXFR, which could
potentially block updates for a long time (hours).

A finer-grained locking scheme, e.g. one lock per node, helps
parallelize cache updates, but doesn't help with the long-lived reader
or long-lived writer problems. These problems are solved by zone
database versioning, described below.

The BIND 9 Database interface does not mandate any particular locking
scheme. Database implementations are strongly encouraged to provide
as much concurrency as possible without violating the database
interface's rules.


Database Versioning

Versioning is not available in cache databases.

A zone database has a "current version" which is the version most
recently committed. A database has a set of versions open for reading
(the "open versions"). This set is always non-empty, since the
current version is always open. The openversion method opens a
read-only handle to the current version. All retrievals using the
handle will see the database as it was at the time the version was
opened, regardless of subsequent changes to the database. It is not
possible to open a specific version; only the current version may be
opened. This helps limit the number of prior versions which must be
kept in the database.

Each zone update transaction is assigned a new version. Only one such
"future version" may be open at any time. It is the caller's
responsibility to serialize and handle the blocking and awakening of
multiple update requests. The future version may be committed or
rolled back by the caller. If the future version commits, its version
becomes the current version of the database.
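
The versioning rules above can be sketched in a few lines. This is a
toy model under stated assumptions, not the BIND 9 database API: the
class ToyVersionedZone and the method names open_version, new_version
and close_version are hypothetical, a version is modelled as a
(number, data) tuple, and stored values are assumed to be immutable
(e.g. frozensets). Readers keep whatever version was current when they
opened it, at most one future version exists, and the caller decides
whether it commits or rolls back.

```python
class ToyVersionedZone:
    def __init__(self):
        self._current = (0, {})  # (version number, data); never mutated once current
        self._future = None      # at most one open future version

    def open_version(self):
        """Return a read-only view of the current version (by convention)."""
        return self._current

    def new_version(self):
        """Open the single future version for writing."""
        if self._future is not None:
            # Serializing competing updaters is the caller's responsibility.
            raise RuntimeError("a future version is already open")
        number, data = self._current
        # Copy the mapping so writes never disturb open read versions.
        self._future = (number + 1, dict(data))
        return self._future

    def close_version(self, version, commit):
        """Commit or roll back the future version."""
        if version is not self._future:
            raise ValueError("not the open future version")
        if commit:
            self._current = version  # becomes the new current version
        self._future = None          # on rollback the copy is simply dropped
```

A reader that opened version 0 before a commit continues to see
version 0 afterwards; only later open_version calls return the newly
committed version. The writer must stop modifying a version once it
has been closed.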