meta_import.c revision 3e992d44958b161ac51c4643fb42f686bd072ab2
* store the disks in a two dimensional sparse array. The disks are bucketed * based on the length of their device ids. * The list of replicated disks is built just once and this flag is set * Map logical blk to physical * This is based on the routine of the same name in the md kernel module (see * file md_mddb.c), with the following caveats: * - The kernel routine works on in core master blocks, or mddb_mb_ic_t; this * routine works instead on the mddb_mb_t read directly from the disk * Sanity check: is the block within range? If so, we then assume * that the block range map in the master block is valid and * consistent with the block count. Unfortunately, there is no * reliable way to validate this assumption. * Append to tail of linked list of md_im_drive_info_t. * Will allocate space for new node and copy args into new space. * Returns pointer to new node. * If rdid is not NULL then we know we are dealing with * replicated diskset case. 'devid_sz' will always be the * size of a valid devid which can be 'did' or 'rdid' * Also need to store the 'other' devid * In the case of regular diskset, midp->mid_o_devid * Constant time append wrapper; the append function will always walk the list, * this will take a tail argument and use the append function on just the tail * node, doing the appropriate old-tail-next-pointer bookkeeping. * Append to tail of linked list of md_im_replica_info_t. * Will allocate space for new node and copy args into new space. * Returns pointer to new node. * replica_append_wrapper() * Constant time append wrapper; the append function will always walk the list, * this will take a tail argument and use the append function on just the tail * node, doing the appropriate old-tail-next-pointer bookkeeping. * Searches the device id list for a specific * disk based on the locator block device id array index. * Returns a pointer to the did_list node if a match was * found or NULL otherwise. 
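The constant time append wrappers described above can be sketched as follows. This is a minimal illustration, not the code from this file: `node_t`, `list_append()` and `list_append_wrapper()` are hypothetical stand-ins for the drive and replica list types and their append routines. The plain append always walks from the pointer it is given; the wrapper hands it only the tail, so the walk is at most one step.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type standing in for the drive/replica list nodes. */
typedef struct node {
	int		value;
	struct node	*next;
} node_t;

/*
 * Append to the tail of the list reachable from *headp. Always walks
 * to the end of the list. Returns the new node, or NULL if the
 * allocation fails.
 */
static node_t *
list_append(node_t **headp, int value)
{
	node_t *n;

	assert(headp != NULL);
	n = calloc(1, sizeof (*n));
	if (n == NULL)
		return (NULL);
	n->value = value;
	while (*headp != NULL)
		headp = &(*headp)->next;
	*headp = n;
	return (n);
}

/*
 * Constant time wrapper: run list_append() starting at the current
 * tail, then advance the tail to the new node. On the first call
 * *tailp is NULL and the new node is also the head of the list.
 */
static node_t *
list_append_wrapper(node_t **tailp, int value)
{
	node_t *n;

	assert(tailp != NULL);
	n = list_append(tailp, value);	/* links old tail -> n */
	if (n != NULL)
		*tailp = n;		/* n is the new tail */
	return (n);
}
```

A caller keeps the node returned by the first call as the head and passes the same tail pointer to every later call.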
/* not found, return failure */ * replicated_list_lookup() * looks up a replicated disk entry in the global replicated disk list * based upon the length of that disk's device id. returns the new device id * If you store the returned devid you must create a local copy. * replicated_list_insert() * inserts a replicated disk entry into the global replicated disk list * Will step through the locator records in the supplied locator block, and add * each one with an active replica to a supplied list of md_im_drive_info_t, and * add the appropriate replicas to the md_im_replica_info_t contained therein. * search the device id list for a * specific ctds based on the locator * block device id array index. * metadrivename() can fail for a slice name * if there is not an existing mddrivename_t. * So we use metadiskname() to strip the slice * You could get a dnp match, but if 1 disk * is unavailable and the other isn't, they * will have the same dnp due * to the name being the same, but in fact * New on the list so add it * If the disk isn't available, we don't * want to try to read from it. /* determine the replica slice */ * if the replica slice size is zero, * a drive may not have a master block * For either of these assertions to fail, it implies * a NULL return from metadrivename() above. Since * the args came from a presumed valid locator block, * Extract the parameters describing this replica. * The magic "1" in the length calculation accounts * for the length of the master block, in addition to * the block count it describes. (The master block * will always take up one block on the disk, and * there will always only be one master block per * replica, even though much of the code is structured * to handle noncontiguous replicas.) 
* If we're here it means - * we've added the disk to the list of * We need to bump up the active * replica count for each such replica that is * active so that it can be used later for replica * Append pnm_rec_t entry to list of physical devices in the diskset. Entry * contains a mapping of n_key in NM namespace (or min_key in DID_NM namespace) * to name of the physical device. This list will be used to ensure that the * correct names of the physical devices are printed in the metastat output--the * NM namespace might have stale information about where the physical devices * were previously located when the diskset was last active. * Allocates pnm_rec_t record for the physical len =
strlen(p) + 1;
/* Length of name plus Null term */ * Adds new element to head of pnm_rec_t list. * Freeing all pnm_rec_t entries on the list of physical devices in the * get_disks_from_didnamespace() * This function was originally called: get_nonreplica_disks() * Extracts the disks without replicas from the locator name space and adds them * to the supplied list of md_im_drive_info_t. * If the print verbose option was given then this function will also * correct the nm namespace so that the n_name is the right ctd name void *
r_did;
/* NULL if not a replicated diskset */ * We got a pointer to an mddb record, which we expect to contain a * name record; extract the pointer thereto. * Skip the nm_rec_hdr and iterate on the array of struct minor_name * at the end of the devid_min_rec * For a given DID_NM key, locate the corresponding device * We got a match, this is the device id we're /* we didn't find a match */ * In this case, did->did_devid will * be invalid so lookup the real one /* we have a partial replicated set, fake it */ * Get a ctds mapping for that device id. * Since disk is being imported into this system, * just use the first ctds in list. * We know the disk is available. Use the * device information in nmlist. * The disk is not available. That means we need to * use the (old) device information stored in the /* search in nm space for a match */ * Use the namespace n_dir_key to look in the * shared namespace. When we find the matching * key, that is the devname and minor number we * This complicated looking series * of code creates a devname of the * form <sn_name>/<n_name> which * Use the namespace n_drv_key to look in the * shared namespace. When we find the matching * key, that is the driver name for the disk. * Add drive to pnm_rec_t list of physical devices for /* Is it already on the list? */ /* determine the replica slice */ * if the replica slice size is zero, * a drive may not have a master block * If it is replicated diskset, * r_did will be non-NULL. * Passing the devname as NULL because field * is not currently used for a non-replica disk. * Append to tail of linked list of md_im_set_desc_t. * Will allocate space for new node AND populate it by extracting disks with * and without replicas from the locator blocks and locator namespace. * Returns pointer to new node. /* allocate new list element */ /* Get the disks with and without replicas */ * An error in this struct could come from either of * in both cases, we want to pass it back on up. 
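The devname of the form <sn_name>/<n_name> mentioned in the comments above can be built along these lines. This is a sketch under assumptions: `build_devname()` is a hypothetical helper, and the real code works on namespace records rather than plain strings. The length accounts for the separator and the null terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join a directory name and a device name as "<sn_name>/<n_name>".
 * Returns a malloc'd string that the caller must free, or NULL if
 * the allocation fails.
 */
static char *
build_devname(const char *sn_name, const char *n_name)
{
	size_t len;
	char *buf;

	assert(sn_name != NULL && n_name != NULL);
	/* both names, plus the '/' separator, plus the null terminator */
	len = strlen(sn_name) + 1 + strlen(n_name) + 1;
	buf = malloc(len);
	if (buf == NULL)
		return (NULL);
	(void) snprintf(buf, len, "%s/%s", sn_name, n_name);
	return (buf);
}
```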
* Iterator to walk the minor node tree of the device snapshot, adding only the * first non-block instance of each non-cdrom minor node to a list of disks. * If a device does not have a device id, we can't * do anything with it so just exclude it from our * This would also encompass CD devices and floppy * devices that don't have a device id. /* char disk devices (as opposed to block) */ /* only first occurrence (slice 0) of each instance */ * Snapshots the device tree and extracts disk devices from the snapshot. * Checks if given drive is mounted, swapped, part of disk configuration * or in use by SVM. ep also has error code set up if drive is in use. * Returns 1 if drive is in use. * Returns 0 if drive is not in use. * We pass in db_ep to meta_setup_db_locations * and never ever use the error contained therein * because all we're interested in is a check to * see whether any local metadbs are present. * Removes in-use disks from the list prior to further processing. * Return value depends on err_on_prune flag: if set, and one or more disks * are pruned, the return list will be the pruned disks. If not set, or if no * disks are pruned, the return list will be the unpruned disks. * Assuming we're interested in knowing about * whatever error occurred, but not in stopping. * Check if the drive is inuse. * 0 for no valid master block * 1 for valid master block * The supplied buffer will be filled in for EITHER 0 or 1. * The master block magic number can either be MDDB_MAGIC_MB in * the case of a real master block, or, it can be MDDB_MAGIC_DU * in the case of a dummy master block * 0 for no valid locator block * 1 for valid locator block * read_locator_block_did() * 0 for no valid locator name struct * 1 for valid locator name struct * 0 for no valid locator name struct * 1 for valid locator name struct * Return the DE corresponding to the requested namespace record type. * Modifies dbp to have a firstentry if one isn't there. 
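The return value rule for pruning in-use disks, described above in terms of the err_on_prune flag, can be sketched as follows. The `disk_t` type, its `in_use` field and `prune_in_use()` are hypothetical; the in-use check itself is assumed to have been done already. Pruned nodes are only unlinked here; ownership and freeing are left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical disk list node; in_use holds the result of the check. */
typedef struct disk {
	const char	*name;
	bool		in_use;
	struct disk	*next;
} disk_t;

/*
 * Split the list into kept and pruned disks, preserving order.
 * If err_on_prune is set and at least one disk was pruned, return
 * the pruned list; otherwise return the kept (unpruned) list.
 */
static disk_t *
prune_in_use(disk_t *list, bool err_on_prune)
{
	disk_t *kept = NULL, **keptp = &kept;
	disk_t *pruned = NULL, **prunedp = &pruned;
	disk_t *d, *next;

	for (d = list; d != NULL; d = next) {
		next = d->next;
		d->next = NULL;
		if (d->in_use) {
			*prunedp = d;
			prunedp = &d->next;
		} else {
			*keptp = d;
			keptp = &d->next;
		}
	}
	/* both result lists must be properly terminated */
	assert(*keptp == NULL && *prunedp == NULL);

	if (err_on_prune && pruned != NULL)
		return (pruned);
	return (kept);
}
```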
* Reads the NM, NM_DID or NM_DID_SHR record in the mddb and stores the * configuration data in the buffer 'nm' * only one record per mddb. There is a rare case when we * can't expand the record. If this is the case then we * For now assume the normal case and handle the extended /* If meta_nm_rec() never succeeded, bail out */ /* Read in the appropriate record and return configurations */ * Determines whether a disk has been replicated or not. It checks to see * if the device id stored in the master block is the same as the device id * registered for that disk on the current system. If the two device ids are * different, then we know that the disk has been replicated. * If need_devid is set and the disk is replicated, fill in the new_devid. * Also, if need_devid is set, this routine allocates memory for the device * ids; the caller of this routine is responsible for free'ing up the memory. * MD_IM_SET_REPLICATED if it's a replicated disk * 0 if it's not a replicated disk * free_replicated_disks_list() * this frees up all the memory allocated by build_replicated_disks_list * build_replicated_disks_list() * Builds a list of disks that have been replicated using either a * remote replication or a point-in-time replication software. The * list is stored as a two dimensional sparse array. /* determine the replica slice */ * if the replica slice size is zero, don't bother opening /* a drive may not have a master block so we just continue */ * Frees the did_list allocated as part of build_did_list * meta_free_im_replica_info * Frees the md_im_replica_info list * meta_free_im_drive_info * Frees the md_im_drive_info list * dnp is not on the drivenamelist and is a temp * dnp for metaimport if the disk is unavailable. * We need to specifically free it because of this. * If the disk is available, standard drivelist freeing * will kick in so we don't need to do it. * Frees the md_im_set_desc_t list * Build a list of device ids corresponding to disks in the locator block. 
* Memory is allocated here for the nodes in the did_list. The callers of * this routine must also call free_did_list to free up the memory after * 0 for no valid locator block device id array * 1 for valid locator block device id array * ENOTSUP partial diskset, not all disks in a diskset on the * system where import is being executed * If we can re-use the buffer that has already been * read in then just use it. Otherwise free * the previous one and alloc a new one * If we are not able to find the ctd mapping corresponding * to a given device id, it probably means the device id in * question is not registered with the system. * Highly likely that the only time this happens, we've hit * a case where not all the disks that are a part of the * diskset were moved before importing the diskset. * If set is a replicated diskset, then the device id we get * from 'lb' will be the 'other' did and we need to lookup * the real one before we call this routine. /* we have a partial replicated set, fake it */ * Partial diskset case. We'll need to get the * device information from the metadb instead * of the output (nm) of meta_deviceid_to_nmlist. * Disk is there. Grab device information from nm structure. * Checks the disks listed in the shared did namespace to see if they * are accessible on the system. If not, return ENOTSUP error to * indicate we have a partial diskset. * ENOTSUP partial diskset, not all disks in a diskset on the * system where import is being executed /* grab device id and minor name from the shared spaces */ * We need to check that the DID_NM and DID_SHR_NM are in * sync. It is possible that we took a panic between writing * the two areas to disk. This would be cleaned up on the * next snarf but we don't know for sure that snarf has even * happened since we're reading from disk. * Try to find disk in the system. If we can't find the * disk, we have a partial diskset. 
/* Partial diskset detected */ /* increment to next item in the shared spaces */ * Generates metadb output for the diskset. * Looping through all drives in the diskset to print * out information about the drive and if the verbose * option is set print out replica data. (
void)
printf(
"%7.7s\t\t%7.7s\t",
(
void)
printf(
"%i\t\t%7.7s\t",
* meta_replica_quorum will determine if the disks in the set to be * imported have enough valid replicas to have quorum. * -1 Set doesn't have quorum * The drive is okay. Now count its replicas /* odd number of replicas */ /* even number of replicas */ * Choose the best drive to use for the metaimport command. /* drive must be available */ /* replica must be active to be a good one */ /* Calculates the correct indentation. */ * This will print before the information for the first diskset * if the verbose option was set. gettext(
"Disksets eligible for import"));
* Make the distinction between a regular diskset and * a replicated diskset. Also make the distinction * between a partial vs. full diskset. "Found partial replicated diskset " "Importing partial replicated diskset " "Found partial regular diskset containing " "Importing partial regular diskset " "Found replicated diskset containing " "Importing replicated diskset containing " "Found regular diskset containing disks"));
"Importing regular diskset containing " * Check each drive in the set. If it's unavailable or * an overlap tell the user. * There is the potential for an overlap, see if * this disk is one of the overlapped disks. * This note explains the (UNAVAIL) that appears next to the * disks in the diskset that are not available. gettext(
"(UNAVAIL) WARNING: This disk is unavailable on" "data in the diskset."));
* This note explains the (CONFLICT) that appears next to the * disks whose lb_inittime timestamp does not * match the rest of the diskset. gettext(
"(CONFLICT) WARNING: This disk has been reused in " "another diskset or system configuration."),
indent,
gettext(
"Import may corrupt data in the diskset."));
* If the verbose flag was given on the command line, * we will print out the metastat -c information, the * creation time, and last modified time for the diskset. gettext(
"Metadatabase information:"));
* Printing creation time and last modified time. * Last modified: uses the global variable "lastaccess", * which is set to the last updated timestamp from all of * the database blocks (db_timestamp) or record blocks * Creation time is the locator block init time gettext(
"Metadevice information:"));
* Even if the verbose option is not set, we will print the * creation time for the diskset. * If the diskset is not actually being imported, then we * print out extra information about how to import it. * If the verbose flag was not set, then we will also * print out information about how to obtain verbose output. * The translation of the phrase "For more information * about this set" will be followed by a ":" and a * suggested command (untranslatable) that the user * may use to request additional information. gettext(
"For more information about this diskset"),
* The translation of the phrase "To import this set" * will be followed by a ":" and a suggested command * (untranslatable) that the user may use to import (
void)
printf(
"%s%s:\n%s %s -f -s <newsetname> %s\n",
(
void)
printf(
"%s%s:\n%s %s -s <newsetname> %s\n",
* meta_get_and_report_set_info * Scans a given drive for set specific information. If the given drive * has a shared metadb, scans the shared metadb for information pertaining * If imp_flags has META_IMP_PASS1 set don't report. * 0 success but no replicas were found * 1 success and a replica was found * Determine and open the replica slice * Test for the size of replica slice in question. If * the size is zero, we know that this is not a disk that was * part of a set and it should be silently ignored for import. * After the open() succeeds, we should return via the "out" * label to clean up after ourselves. (Up 'til now, we can * just return directly, because there are no resources to * Once the locator block has been read, we need to * check if the locator block commit count is zero. * If it is zero, we know that the replica we're dealing * with is on a disk that was deleted from the disk set; * and, it potentially has stale data. We need to quit * Make sure that the disk being imported has device id * namespace present for disksets. If a disk doesn't have * device id namespace, we skip reading the replica on that disk * Grab the locator block device id array. Allocate memory for the * For a disk that has not been replicated, extract the device ids * stored in the locator block device id array and store them in * If the disk has been replicated using replication software such * the locator block are invalid and we need to build a list of * We need to do this for both passes but * replicated_disk_list_built is global so we need some way * to determine which pass we're on. Set it to the appropriate * if there's a replicated diskset involved, we need to * scan the system one more time and build a list of all * candidate disks that might be part of that replicated set * Until here, we've gotten away with fixed sizes for the * master block and locator block. 
The locator names, * however, are sized (and therefore allocated) dynamically * according to information in the locator block. * An rval of ENOTSUP means we have a partial diskset. We'll want * to set the partial variable so we can pass this information * set_append_wrapper later for placing on the misp list. * If no NM record was found, it still is a valid configuration * but it also means that we won't find any corresponding DID_NM * At this point, we have read in all of the blocks that form * the nm_rec. We should at least detect the corner case * mentioned above, in which r_next_recid links to another * nm_rec. Extended namespace handling is left for Phase 2. * What this should really be is a loop, each iteration of * which reads in a nm_rec and calls the set_append(). * We need to check if all of the disks listed in the namespace * are actually available. If they aren't we'll return with * an ENOTSUP error which indicates a partial diskset. * An rval of ENOTSUP means we have a partial diskset. We'll want * to set the partial variable so we can pass this information * to set_append_wrapper later for placing on the misp list. /* Finally, we've got what we need to process this replica. */ /* Return the fact that we found at least one set */ * If we are at the end of the list, we must free up * the replicated list too * Return the minor name associated with a given disk slice * Update or create the master block with the new set number. * If a non-null devid pointer is given, the devid in the * master block will also be changed. * This routine is called during the import of a diskset * (meta_imp_update_mb) and during the take of a diskset that has * some unresolved replicated drives (meta_unrslv_replicated_mb). * Returns : nothing (void) void *
new_devid,
/* devid to be stored in mb */ void *
old_devid,
/* old devid stored in mb */ /* determine the replica slice */ * if the replica slice size is zero, /* If no replica on disk, check for dummy mb */ * Check to see if there is a dummy there. If not * create one. This would happen if the set was * created before the master block dummy code was * If old_devid is non-NULL then we are dealing with a * replicated diskset and the devid needs to be updated. * Now write out the changes to disk. * If an error occurs, just continue on. * Next take of set will register this drive as * an unresolved replicated drive and will attempt * to fix the master block again. * Update the master block information during an import. * Takes an import set descriptor. * Returns : nothing (void) int offset =
16;
/* default mb offset is 16 */ * If disk isn't available we can't update, so go to next * If we have replicas on this disk we need to make * sure that we update the master block on every /* No replicas, just update the one dummy mb */ * meta_unrslv_replicated_common * Given a drive_desc and a drivenamelist pointer, * return the devidp associated with the drive_desc, * the replicated (new) devidp associated with the drive_desc * and the specific mddrivename in the drivenamelist that * matches the replicated (new) devidp. * Typically the drivenamelist pointer would be setup by * the meta_prune_cnames function. * Calling function must free devidp using devid_free. * Returns 0 - success, found new_devidp and dnp_new. * Returns 1 - failure, didn't find new devid info /* name of replicated drive */ /* Get old devid from drive record */ /* Look up replicated (new) devid */ * Using new_devid, find a drivename entry with a matching devid. * Use the passed in dnlp since it has the new (replicated) disknames /* If can't find new name for drive - nothing to update */ * Setup returned value to be the drivename structure associated * with new (replicated) drive. * Need to return the new devid including the minor name. * Find the minor_name here using the sidename or by * looking in the namespace. * The disk has no side name information * minor_name will be NULL if dnp->devid == NULL * Now, use the old devid with minor name to lookup * the replicated (new) devid that will also contain * meta_unrslv_replicated_mb * Update the master block information during a take. * Takes an md_drive_desc descriptor. * Returns : nothing (void) /* If don't need to update master block - skip it. */ * Get old and replicated (new) devids associated with this * drive. Also, get the new (replicated) drivename structure. int offset =
16;
/* default mb offset is 16 */ * Update each master block on the disk /* update the one dummy mb */ /* Set drive record flags to ok */ /* Just update this one drive record. */ /* Ignore failure since no bad effect. */ * Change a devid stored in the diskset namespace and in the local set * namespace with the new devid. * This routine is called during the import of a diskset * (meta_imp_update_nm) and during the take of a diskset that has * some unresolved replicated drives (meta_unrslv_replicated_nm). * Returns : nothing (void) void *
old_devid,
/* old devid being replaced */ void *
new_devid,
/* devid to be stored in nm */ (
void)
memset(&c, 0, sizeof (c));
/* During import do NOT update the local namespace. */ * Change a devid stored in the diskset namespace with the new devid. * This routine is called during the import of a remotely replicated diskset. * Returns : nothing (void) * If disk isn't available we can't update, so go to next * meta_unrslv_replicated_nm * Change a devid stored in the diskset namespace and in the local set * namespace with the new devid. * This routine is called during the take of a diskset that has * some unresolved replicated drives. * Returns : nothing (void) /* If don't need to update namespace - skip it. */ /* Get old devid from drive record */ * Get old and replicated (new) devids associated with this * drive. Also, get the new (replicated) drivename structure. * Using the new devid, fix up the name. * If meta_upd_ctdnames fails, the next take will re-resolve * the name from the new devid. * This code needs to be expanded when we run in SunCluster * environment; SunCluster obtains setno internally (
void)
memset(&c, 0, sizeof (c));
* Check to see if the setname that the set is being imported into, * Find the next available set number /* Check to see if replica quorum requirement is fulfilled */ * If we have a stale diskset, the kernel will * delete the replicas on the unavailable disks. * To be consistent, we'll zero out the mirp on those * We pass the list of the drives in the * set with replicas on them down to the kernel. * No replicas on this disk, go to next disk. * The disk isn't there. We'll need to get the * disk information from the midp list instead * of going and looking for it. This means it * will be information relative to the old * If the dry run option was specified, flag success "import should be successful"));
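The replica quorum check referred to above (with its separate odd and even cases, and -1 meaning the set doesn't have quorum) might look like the sketch below. The thresholds are an assumption, not taken from the comments: quorum is assumed to mean that strictly more than half of the active replicas are on available drives. `replica_quorum()` and its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * ASSUMPTION: quorum requires a strict majority of the active
 * replicas to be usable (located on available drives).
 * Returns 0 if the set has quorum, -1 if it doesn't.
 */
static int
replica_quorum(int active_replicas, int usable_replicas)
{
	assert(active_replicas >= 0 && usable_replicas >= 0);

	if (active_replicas & 1) {
		/* odd number of replicas: need at least (n + 1) / 2 */
		if (usable_replicas < (active_replicas + 1) / 2)
			return (-1);
	} else {
		/* even number of replicas: need more than n / 2 */
		if (usable_replicas <= active_replicas / 2)
			return (-1);
	}
	return (0);
}
```

Under this assumption, a set with four active replicas needs three of them usable, while a set with three needs two.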
* Now the kernel should have all the information * regarding the import diskset replica. * Tell the kernel to load them up and import the set (
void)
memset(&c, 0, sizeof (c));
* Create a set name for the set. /* Update the diskset namespace */ /* Release the diskset - even if update_nm failed */ (
void)
memset(&c, 0, sizeof (c));
/* Don't need device id information from this ioctl */ /* If update_nm failed, then fail the import. */ * We'll need to update information in the master block due * to the set number changing and if the case of a replicated * diskset, the device id changing. May also need to create a * dummy master block if it's not there. * Create set record for diskset, but record is left in * MD_SR_ADD state until after drives are added to set. * Create drive records for the disks in the set. * If the disk isn't available, the dnp->devid is * no good. It is either blank for the case where * there is no disk with that devname, or it * contains the devid for the real disk in the system * with that name. The problem is, if the disk is * unavailable, then the devid should be the devid * of the missing disk. So we're faking a dnp for * the import. This is needed for creating drive /* If drives were added without error, set set_record to OK */