dhcp_network.c revision 2
2N/A * The contents of this file are subject to the terms of the 2N/A * Common Development and Distribution License, Version 1.0 only 2N/A * (the "License"). You may not use this file except in compliance 2N/A * See the License for the specific language governing permissions 2N/A * and limitations under the License. 2N/A * When distributing Covered Code, include this CDDL HEADER in each 2N/A * If applicable, add the following below this CDDL HEADER, with the 2N/A * fields enclosed by brackets "[]" replaced with your own identifying 2N/A * information: Portions Copyright [yyyy] [name of copyright owner] 2N/A * Copyright 2005 Sun Microsystems, Inc. All rights reserved. 2N/A * Use is subject to license terms. 2N/A#
pragma ident "%Z%%M% %I% %E% SMI" 2N/A * This file contains public functions for managing DHCP network 2N/A * containers. For the semantics of these functions, please see the 2N/A * Enterprise DHCP Architecture Document. 2N/A * This module uses synchronization guarantees provided by dsvclockd(1M); 2N/A * Big Theory Statement for the SUNWbinfiles DHCP Network Module 2N/A * ============================================================= 2N/A * 1. On-disk Structure 2N/A * Each container consists of two basic pieces on-disk: a header and an 2N/A * array of records. In order to provide fast client IP lookup, the array 2N/A * of records is directly indexed by client IP address (using a simple 2N/A * mapping function). In order to provide fast client id lookup, each 2N/A * in-use record is also on exactly one doubly-linked client id hash chain; 2N/A * the hash chains heads are contained in the header). For all other 2N/A * lookups, we can restrict our search to only the in-use records by merely 2N/A * walking all of the hash chains. Here's a crude illustration of what 2N/A * this looks like on-disk (note that hash chains 2 and 3 are empty): 2N/A * _______________________________________________ 2N/A * | container info | hash chain heads (buckets) | 2N/A * header | | 1 | 2 | 3 | [ .... ] | N | 2N/A * |__________________|_|________________________|_| 2N/A * | rec1 | rec2 | | rec3 | rec4 | | 2N/A * | unused | unused | hash1 | unused | | 2N/A * |___________|___________|________^|_|_________|_| 2N/A * | rec5 | rec6 | rec7 |v | rec8 | | 2N/A * records | unused | hashN | hash1 <- hash1 | | 2N/A * |___________|________^|_|___________|_________|_| 2N/A * | : :: : [ more records... ] : | 2N/A * |___________:________::_:___________:_________:_| 2N/A * | recN-3 | recN-2 || | recN-1 | recN v | 2N/A * | unused | unused +--- hashN <- hashN | 2N/A * |___________|___________|___________|___________| 2N/A * Note that the actual on-disk format is a bit more complicated than this 2N/A * due to robustness issues; see section 3 below for details. 2N/A * 2. Robustness Requirements 2N/A * This module has been designed to be as efficient as possible while still 2N/A * retaining the robustness minimally required for an enterprise-level 2N/A * environment. In particular, it is designed to handle the following 2N/A * failure situations: 2N/A * 1. An update operation (add, modify, delete) on a container is 2N/A * unable to complete due to an unexpected internal error at 2N/A * any point in the update code. 2N/A * 2. An update operation (add, modify, delete) on a container is 2N/A * unable to complete due to unexpected program termination while 2N/A * at any point in the update code. 2N/A * If either of these situations occur, the container in question must be 2N/A * left in a consistent (and viable) state. In addition, only the pending 2N/A * transaction (at most) may be lost. 2N/A * 3. Robustness Techniques 2N/A * This module uses a few different techniques to meet our robustness goals 2N/A * while maintaining high performance. The biggest problem we encounter 2N/A * when trying to achieve robustness is updating the client id hash chain. 2N/A * In particular, it is not possible to atomically add, move, or delete an 2N/A * item from a doubly linked list, thus creating a window where a crash 2N/A * could leave our hash chains in an inconsistent state. 2N/A * To address this problem, we actually maintain two images (copies) of all 2N/A * the hash chains in the container. At any point in time, exactly one of 2N/A * the two images is active (and thus considered authoritative), as 2N/A * indicated by a byte in the container header. When performing an update 2N/A * operation, all hash chain modifications are done on the *inactive* 2N/A * image, then, once the inactive image has completed the hash chain 2N/A * operations required by the update, the active and inactive images are 2N/A * atomically switched, making the formerly-inactive image authoritative. 2N/A * After the image switch, the update code then updates the formerly-active 2N/A * image's hash chains to match the active image's hash chains. 2N/A * This approach has the nice property that internal container consistency 2N/A * can always be restored after a crash by just resynchronizing the 2N/A * inactive image's hash chains with the active image's chains. Note that 2N/A * the atomic image switch serves as the "commit point" for the operation: 2N/A * if we crash before this point, we roll back the operation upon recovery 2N/A * and it appears as though the operation never happened; if we crash after 2N/A * this point, we roll forward the rest of the operation upon recovery as 2N/A * if the crash had not happened. 2N/A * This technique is enough to robustly implement our add and delete 2N/A * operations, but modify has an additional complication due to our direct 2N/A * mapping of client IP addresses to records. In particular, unless the 2N/A * record modification includes changing the client IP address, the 2N/A * modified record must be written at the same location as the original 2N/A * record -- however, if the modify operation fails part way through 2N/A * writing out the new client record, the record will be corrupt and we 2N/A * will have no way to return the record to a consistent state. To address 2N/A * this issue, we allocate a spare record in the container header called 2N/A * the "temporary" record. Upon a modification of this type, we first 2N/A * write the modified record to the temporary record and indicate that the 2N/A * temporary record is currently proxying for the actual record. We then 2N/A * copy the temporary record to the actual record and make the temporary 2N/A * record available again for future use. If a crash occurs before the 2N/A * copy to the temporary record is complete, then we just roll back as if 2N/A * the modify never happened (since we have not modified the actual 2N/A * record). If a crash occurs after copying the temporary record, we roll 2N/A * forward and complete the copy operation as if the crash never happened. 2N/A * Note that there are some additional subtle complications here; see the 2N/A * comments in the code for details. 2N/A * As a safeguard, check that the size of a dn_header_t hasn't 2N/A * changed (since it contains a dn_rec_t, this will probably catch 2N/A * a change in that structure as well). If it has, bail rather 2N/A * than totally corrupting the container (by continuing). Note 2N/A * that this situation indicates an internal programming error, 2N/A * which is why we prefer assert() to just returning DSVC_INTERNAL. 2N/A * We just created the per-network container; initialize 2N/A * the header and put it out on disk. Note that we leave 2N/A * `dnh_version' zero until the entire header has been 2N/A * written, so we can detect partial failure. 2N/A * Virtually reserve all the space we're going to need for 2N/A * the dn_rec_t's ahead of time, so that we don't have to 2N/A * worry about "growing" the file later (though it may 2N/A * increase in size as we fill in holes). We're guaranteed 2N/A * that we'll read these holes as zeros, which we take 2N/A * advantage of since a dn_filerec_t with a rec_prev of 2N/A * DN_NOREC (which is 0) indicates that a record is unused. 2N/A * Set the version field on the container, effectively 2N/A * making it available for use. 2N/A * Container already exists; sanity check against the 2N/A * header that's on-disk. If we detect a problem then 2N/A * either someone scribbled on our container or we 2N/A * terminated abnormally when creating the container. 2N/A * It's possible that a previous update to this container failed 2N/A * part-way through. In general, this is fine since we always keep 2N/A * our active image's hash chains correct and only swap to the 2N/A * alternate image when the other image is completely safe to use. 2N/A * However, for reasons explained in modify_dn(), it's possible 2N/A * that a record being modified was not completely updated before a 2N/A * failure occurred. In this case, the actual data for that record 2N/A * is contained in the temporary record in the header. We need to 2N/A * be careful to use that temporary record anywhere we'd otherwise 2N/A * refer to the partially updated record. Note that we do this 2N/A * rather than attempting to restore the consistency of the 2N/A * container because we're MT-hot here. 2N/A * Lookup scenario 1: Caller has requested a QN_CIP 2N/A * query lookup; set `recid' to the only possible 2N/A * entry (which may not be in-use). 2N/A * Lookup scenario 2: Caller has requested a 2N/A * QN_CID-based lookup. Walk the `cidhash' chain 2N/A * (one call at a time) and set `recid' to hash 2N/A * bucket candidates. 2N/A * Note that it's possible for the client id value 2N/A * 00 to appear more than once, and it's not 2N/A * impossible for other duplicate client ids to 2N/A * occur, so continue until we reach `nrecords'. 2N/A * Lookup scenario 3: Caller has requested any 2N/A * other type of search. Walk the all the client 2N/A * No more records; bail. 2N/A * The temporary record is actually authoritative 2N/A * for this record's contents; use it instead. 2N/A * If the record isn't in-use, then skip... 2N/A * See if we've got a match... 2N/A * Caller just wants a count of the number of matching 2N/A * records, not the records themselves; continue. 2N/A * Allocate the record and fill it in. 2N/A * Chuck the record on the list and up the counter. 2N/A * Compares `dnp' to the target `targetp', using `query' to decide what 2N/A * fields to compare. Returns B_TRUE if `dnp' matches `targetp', B_FALSE 2N/A * As an optimization, skip any checks if the query is empty. 2N/A for (i = 0; i <
sizeof (
qflags) /
sizeof (
unsigned int); i++) {
2N/A * Get the active image. 2N/A * Doublecheck to make sure this entry doesn't exist already. 2N/A * We're going to insert `rec' at the head of the `hash' hash 2N/A * chain; get it ready-to-go. Note that we update the alternate 2N/A * image's hash record id pointers so that the record will 2N/A * atomically become in-use when we switch to the alternate image. 2N/A * If there's a record currently on the hash chain (i.e, we're 2N/A * not the first) then load the record. 2N/A * Before we update any information on disk, mark the container as 2N/A * dirty so that there's no chance the container is inconsistent 2N/A * without us knowing about it. 2N/A * Update the new record on-disk; note that it's not yet reachable 2N/A * Update the alternate image's on-disk hash pointers. We need to 2N/A * do this before we switch to the alternate image so we cannot 2N/A * abort with an inconsistent active image. 2N/A * Activate the alternate image. This is our commit point -- if we 2N/A * fail after this point, we will roll forward on recovery. 2N/A * Update the old record id pointers to match 2N/A * Update the signature on the record handed back to the caller. 2N/A * Finally, mark the container as clean. 2N/A * Get the active image. 2N/A * Find the original entry in the network table, make sure the 2N/A * record is in-use, and check the signature field (to guard 2N/A * against collisions). 2N/A * The signatures must match to delete a record, *except* when 2N/A * delp->dn_sig == 0. This is so records can be deleted that 2N/A * weren't retrieved via lookup_dn() 2N/A * Read our neighboring records. 2N/A * Before we update the alternate image's on-disk hash pointers, 2N/A * mark the container as dirty so that there's no chance the 2N/A * container is inconsistent without us knowing about it. 2N/A * Update the alternate image's on-disk hash pointers. We need to 2N/A * do this before we switch to the alternate image so we do not 2N/A * abort with an inconsistent active image. Also reset the 2N/A * record's alternate image record id pointers, so that the old 2N/A * record will not be in-use when we switch to the alternate image. 2N/A * Activate the alternate image. This is our commit point -- if we 2N/A * fail after this point, we will roll forward on recovery. 2N/A * Update the old record id pointers to match. 2N/A * Finally, mark the container as clean. 2N/A * Get the active image 2N/A * Find the original entry in the network table, make sure the 2N/A * entry is in-use, and check the signature field (to guard against 2N/A * Check if the record id is changing (as a result of modifying the 2N/A * IP address). If it is, then make sure the new one is available 2N/A * (if not, fail with DSVC_EXISTS). 2N/A * Update the record with the new information. 2N/A * Find out if our hash chain is changing. If so, then update the 2N/A * new record's record id pointers to be on the new chain; 2N/A * otherwise just take the original record's pointers. Note that 2N/A * in either case, only update the alternate image pointers, so 2N/A * that the new record becomes in-use when we switch to the 2N/A * Write the record out; if this means overwriting the old record, 2N/A * then write to a temporary record instead. 2N/A * Mark the container as dirty so that there's no chance the 2N/A * container is inconsistent without us knowing about it. 2N/A * If we've changed either the hash chain or the record id, then 2N/A * update our neighboring records' record id pointers. If we're 2N/A * changing hash chains, then remove ourselves from the old 2N/A * hash chain and insert ourselves on the new one -- otherwise, if 2N/A * we're changing record id's, then update our neighbors with our 2N/A * new record id. Note that we only apply these changes to the 2N/A * alternate image for now so that we can recover upon failure. 2N/A * If our hash is changing, update the alternate image 2N/A * record id pointers to point to our moved record. 2N/A * If our record id is changing, reset the old record's 2N/A * alternate image record id pointers, so that the old 2N/A * record will not be in-use once we switch over to the 2N/A * If we're using the temporary record, then set `dnh_tempimage' to 2N/A * the image that will be active when we're done. This piece of 2N/A * state is critical in the case of failure, since it indicates 2N/A * both that the temporary record is valid, and tells us whether we 2N/A * failed before or after activating the alternate image (below). 2N/A * If we failed before activating the alternate image, then the 2N/A * failure code can just reset `dnh_tempimage' to DN_NOIMAGE and 2N/A * resynchronize the pointers. Otherwise, we failed somewhere 2N/A * after making the alternate image active but before we completed 2N/A * copying the temporary record over to the actual record, which 2N/A * the recovery code will then complete on our behalf before 2N/A * resynchronizing the pointers. 2N/A * Activate the alternate image. This is our commit point -- if we 2N/A * fail after this point, we will roll forward on recovery. 2N/A * If we used the temporary record, copy the data into the actual 2N/A * record. Once finished, reset `dnh_tempimage' to DN_NOIMAGE 2N/A * since the temporary record no longer needs to be used. 2N/A * Update the old record id pointers to match. 2N/A * If our hash changed, update the alternate image record 2N/A * id pointers to point to our moved record. 2N/A * If our record id changed, then finish marking the old 2N/A * record as "not in use". 2N/A * Update the signature on the new record handed back to the caller. 2N/A * Finally, mark the container as clean. 2N/A * Compile a regular expression matching "SUNWbinfilesX_" (where X 2N/A * is a container version number) followed by an IP address 2N/A * (roughly speaking). Note that the $N constructions allow us to 2N/A * get the container version and IP address when calling regex(3C). 2N/A "(([0-9]{1,3}_){3}[0-9]{1,3})$1$", (
char *)0);
2N/A * Check (a la fsck) that a given DHCP network container is in a consistent 2N/A * state. If not, then attempt to restore internal consistency; this should 2N/A * always be possible unless the container has been externally corrupted. 2N/A * Reading the whole header is a very expensive operation; only do 2N/A * it once we're sure the container is actually dirty. On an 2N/A * E4500, this optimization lowers the wall-clock cost of creating 2N/A * a 5000-record datastore by 20 percent. 2N/A * If `dnh_tempimage' matches the current working image, then we 2N/A * crashed in the middle of a modify_dn() operation. Complete 2N/A * writing out the temporary record before restoring internal 2N/A * consistency. This is a bit of a kludge but there doesn't seem 2N/A * to be another way. 2N/A * Blindly update all the header hashhead pointers since we're 2N/A * going to have to re-write the header anyway. 2N/A * Synchronize the record pointers of all in-use records. We do 2N/A * this instead of just walking the hashheads because not all dirty 2N/A * records are hashed (for instance, we may have failed part way 2N/A * through an add_dn()). 2N/A * Verify the pointers match. If not, then correct 2N/A * the record and write it back to disk. 2N/A * Clear the dirty bit on the container. 2N/A * Given a buffer `path' of `pathlen' bytes, fill it in with a path to the 2N/A * DHCP Network table for IP network `ip' located in directory `dir'. 2N/A * Given a `cid' that's `cidlen' bytes long, hash it to a value between 0 2N/A * and DN_CIDHASHSZ - 1. We use CRC16 for our hash since it's known to be 2N/A * very evenly distributed. 2N/A * Convert the dn_filerec_t pointed to by `rec' from native (host) to 2N/A * network order or the other way. 2N/A * Convert the header pointed to by `hdrp' from native (host) to network 2N/A * order or the other way. If `hash' is false, then don't bother 2N/A * converting the hash chains. 2N/A * Read the dn_filerec_t identified by `recid' from open container `fd' 2N/A * into `rec'. Returns 0 on success, -1 on failure (errno is set). 2N/A * Write the dn_filerec_t `rec' identified by `recid' into the open 2N/A * container `fd'. Returns 0 on success, -1 on failure (errno is set). 2N/A * Read the dn_header_t from the open container `fd' into the dn_header_t 2N/A * pointed to by `hdrp'; if `hash' is not set, then skip reading the 2N/A * dn_header_t hash chains. Returns 0 on success, -1 on failure (errno is 2N/A * Write the dn_header_t pointed to by `hdrp' into open container `fd'. 2N/A * Returns 0 on success, -1 on failure (errno is set). 2N/A * Read in the head of the `cidhash' hash chain from open container `fd' 2N/A * into `recid_headp', using image `image'. Returns 0 on success, -1 on 2N/A * failure (errno is set). 2N/A * Write out the head of the `cidhash' hash chain into open container `fd' 2N/A * from `recid_head', using image `image'. Returns 0 on success, -1 on 2N/A * failure (errno is set). 2N/A * Get the byte `offset' bytes into open file `fd', and store in `bytep'. 2N/A * Returns a DSVC_* return code. 2N/A * Set the byte `offset' bytes into open file `fd' to `byte'. Returns a 2N/A * DSVC_* return code.