zpool_vdev.c revision f94275ce205810a201404c5f35f4cc96057022b1
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Functions to convert between a list of vdevs and an nvlist representing the
 * configuration.  Each entry in the list can be one of:
 *
 *	disk=(path=..., devid=...)
 *
 * While the underlying implementation supports it, group vdevs cannot contain
 * other group vdevs.  All userland verification of devices is contained within
 * this file.  If successful, the nvlist returned can be passed directly to the
 * kernel; we've done as much verification as possible in userland.
 *
 * Hot spares are a special case, and passed down as an array of disk vdevs, at
 * the same level as the root of the vdev tree.
 *
 * The only function exported by this file is 'make_root_vdev'.  The
 * function performs several passes:
 *
 *	1. Construct the vdev specification.  Performs syntax validation and
 *	   makes sure each device is valid.
 *	2. Check for devices in use.  Using libdiskmgt, makes sure that no
 *	   devices are also in use.  Some can be overridden using the 'force'
 *	   flag, others cannot.
 *	3. Check for replication errors if the 'force' flag is not specified.
 *	   Validates that the replication level is consistent across the
 *	   entire pool.
 *	4. Call libzfs to label any whole disks with an EFI label.
 */
/*
 * For any given vdev specification, we can have multiple errors.  The
 * vdev_error() function keeps track of whether we have seen an error yet, and
 * prints out a header if it's the first error we've seen.
 */
	    "the following errors:\n"));
	    "must be manually repaired:\n"));
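The print-a-header-once pattern described above can be modeled in portable C. This is a hypothetical sketch, not the illumos implementation: the `error_seen`/`is_force` flags and the exact message wiring are assumptions based on the comment and the surviving message fragments.

```c
#include <stdarg.h>
#include <stdio.h>

static int error_seen;	/* have we printed the header yet? (assumption) */
static int is_force;	/* was '-f' given? (assumption) */

/*
 * Print a header before the first error only, then the error itself.
 */
static void
vdev_error(const char *fmt, ...)
{
	va_list ap;

	if (!error_seen) {
		(void) fprintf(stderr, "invalid vdev specification\n");
		if (!is_force)
			(void) fprintf(stderr, "use '-f' to override "
			    "the following errors:\n");
		else
			(void) fprintf(stderr, "the following errors "
			    "must be manually repaired:\n");
		error_seen = 1;
	}
	va_start(ap, fmt);
	(void) vfprintf(stderr, fmt, ap);
	va_end(ap);
}
```

Subsequent calls skip the header, so a multi-error specification reads as one header followed by each individual complaint.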
/*
 * ... /dev/dsk.  Don't bother printing an error message in this case.
 */

/*
 * Validate a device, passing the bulk of the work off to libdiskmgt.
 */

	/*
	 * If we're given a whole disk, ignore overlapping slices since we're
	 * about to label it anyway.
	 */

	/* dm_isoverlapping returned -1 */

	/* libdiskmgt's devcache only handles physical drives */

/*
 * Validate a whole disk.  Iterate over all slices on the disk and make sure
 * that none is in use by calling check_slice().
 */

	/*
	 * Get the drive associated with this disk.  This should never fail,
	 * because we already have an alias handle open for the device.
	 */

	/*
	 * It is possible that the user has specified a removable media drive,
	 * and the media is not present.
	 */

	/*
	 * Iterate over all slices and report any errors.  We don't care about
	 * overlapping slices because we are using the whole disk.
	 */

/*
 * Validate a device.
 */

	/*
	 * For whole disks, libdiskmgt does not include the leading dev path.
	 */

/*
 * Check that a file is valid.  All we can do in this case is check that it's
 * not in use by another pool, and not in use by swap.
 */

	/*
	 * Allow hot spares to be shared between pools.
	 */

/*
 * By "whole disk" we mean an entire physical disk (something we can
 * label, toggle the write cache on, etc.) as opposed to the full
 * capacity of a pseudo-device such as lofi or did.  We act as if we
 * are labeling the disk, which should be a pretty good test of whether
 * it's a viable device or not.  Returns B_TRUE if it is and B_FALSE if
 * not.
 */

/*
 * Create a leaf vdev.  Determine if this is a file or a device.  If it's a
 * device, fill in the device id to make a complete nvlist.  Valid forms for a
 * leaf vdev are:
 *
 *	/xxx		Full path to file
 */

	/*
	 * Determine what type of vdev this is, and put the full path into
	 * 'path'.  We detect whether this is a device or file afterwards by
	 * checking the st_mode of the file.
	 */

		/*
		 * Complete device or file path.  Exact type is determined by
		 * examining the file descriptor afterwards.
		 */

		/*
		 * This may be a short path for a device, or it could be total
		 * gibberish.  Check to see if it's a known device in
		 * /dev/dsk/.  As part of this check, see if we've been given
		 * an entire disk (minus the slice number).
		 */

			/*
			 * If we got ENOENT, then the user gave us
			 * gibberish, so try to direct them with a
			 * reasonable error message.  Otherwise,
			 * regurgitate strerror() since it's the best we
			 * can do.
			 */
			    "shorthand device name\n"));
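The path-resolution rule the comments describe (a leading `/` means a complete device or file path; anything else is treated as shorthand relative to `/dev/dsk`) can be sketched in portable C. The function name `resolve_shorthand` is hypothetical, and the real code goes on to stat() the result and distinguish ENOENT from other errors, which is omitted here.

```c
#include <stdio.h>
#include <string.h>

#define	DISK_ROOT	"/dev/dsk"

/*
 * Expand a vdev argument into a full path: absolute arguments pass
 * through unchanged, shorthand device names are rooted at /dev/dsk.
 */
static void
resolve_shorthand(const char *arg, char *path, size_t len)
{
	if (arg[0] == '/')
		(void) snprintf(path, len, "%s", arg);
	else
		(void) snprintf(path, len, "%s/%s", DISK_ROOT, arg);
}
```

After this expansion the real code stat()s `path`; only then does it decide, via `st_mode`, whether the argument names a block device or a regular file.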
	/*
	 * Determine whether this is a device or a file.
	 */
	    "block device or regular file\n"), path);
	/*
	 * Finally, we have the complete device or file, and we know that it is
	 * acceptable to use.  Construct the nvlist to describe this vdev.  All
	 * vdevs have a 'path' element, and devices also have a 'devid'
	 * element.
	 */

		/*
		 * For a whole disk, defer getting its devid until after
		 * labeling it.
		 */

		/*
		 * Get the devid for the device.
		 */

/*
 * Go through and verify the replication level of the pool is consistent.
 * Performs the following checks:
 *
 *	For the new spec, verifies that devices in mirrors and raidz are the
 *	same size.
 *
 *	If the current configuration already has inconsistent replication
 *	levels, ignore any other potential problems in the new spec.
 *
 *	Otherwise, make sure that the current spec (if there is one) and the
 *	new spec have consistent replication levels.
 */

/*
 * Given a list of toplevel vdevs, return the current replication level.  If
 * the config is inconsistent, then NULL is returned.  If 'fatal' is set, then
 * an error message will be displayed for each self-inconsistent vdev.
 */

		/*
		 * For separate logs we ignore the top level vdev replication
		 * checks.
		 */

		/*
		 * This is a 'file' or 'disk' vdev.
		 */

		/*
		 * This is a mirror or RAID-Z vdev.  Go through and make
		 * sure the contents are all the same (files vs. disks),
		 * keeping track of the number of elements in the process.
		 *
		 * We also check that the size of each vdev (if it can
		 * be determined) is the same.
		 */

		/*
		 * The 'dontreport' variable indicates that we've
		 * already reported an error for this spec, so don't
		 * bother doing it again.
		 */

			/*
			 * If this is a replacing or spare vdev, then
			 * get the real first child of the vdev.
			 */

			/*
			 * ... with files, report it as an error.
			 */
			    "mismatched replication "
			    "level: %s contains both "
			    "files and devices\n"),
			/*
			 * According to stat(2), the value of 'st_size'
			 * is undefined for block devices and character
			 * devices.  But there is no effective way to
			 * determine the real size in userland.
			 *
			 * Instead, we'll take advantage of an
			 * implementation detail of spec_size().  If the
			 * device is currently open, then we (should)
			 * return a valid size.
			 *
			 * If we still don't get a valid size (indicated
			 * by a size of 0 or MAXOFFSET_T), then ignore
			 * this device altogether.
			 */

			/*
			 * Also make sure that devices and
			 * slices have a consistent size.  If
			 * they differ by a significant amount
			 * (~16MB) then report an error.
			 */
			    "%s contains devices of "
			    "different sizes\n"),
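The "differ by a significant amount (~16MB)" tolerance above can be modeled as a simple fuzz comparison. This is a simplified sketch: the constant name `ZPOOL_FUZZ` and the treatment of an unknown size as 0 follow the comments, but the helper `sizes_consistent` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* ~16MB tolerance, per the comment above (assumed constant name) */
#define	ZPOOL_FUZZ	(16 * 1024 * 1024)

/*
 * Return 1 if two vdev sizes are consistent.  A size of 0 means the
 * size could not be determined, in which case the device is ignored
 * for this check rather than reported as a mismatch.
 */
static int
sizes_consistent(int64_t size, int64_t vsize)
{
	if (size == 0 || vsize == 0)
		return (1);	/* unknown size: ignore this device */
	return (llabs((long long)(size - vsize)) <= ZPOOL_FUZZ);
}
```

The fuzz exists because otherwise two disks of nominally equal capacity, differing only by label or geometry rounding, would spuriously trigger the "devices of different sizes" error.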
		/*
		 * At this point, we have the replication of the last toplevel
		 * vdev in 'rep'.  Compare it to 'lastrep' to see if it's
		 * consistent.
		 */
			    "mismatched replication level: "
			    "both %s and %s vdevs are "
			    "present\n"),
			    "mismatched replication level: "
			    "both %llu and %llu device parity "
			    "%s vdevs are present\n"),
			    "mismatched replication level: "
			    "both %llu-way and %llu-way %s "
			    "vdevs are present\n"),
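The three messages above compare toplevel vdevs along three axes: type, parity, and width. That triple can be sketched as a struct plus an equality check; the `zprl_` field names are assumptions modeled on the checks described, not a copy of the real definition.

```c
#include <stdint.h>
#include <string.h>

/*
 * Replication level of one toplevel vdev: its type ("disk", "file",
 * "mirror", "raidz"), how many leaves it has, and its raidz parity.
 */
typedef struct replication_level {
	const char	*zprl_type;	/* assumed field name */
	uint64_t	zprl_children;	/* number of leaf vdevs */
	uint64_t	zprl_parity;	/* raidz parity, else 0 */
} replication_level_t;

/*
 * Two toplevel vdevs provide the same replication iff all three
 * components match; each mismatch maps to one of the messages above.
 */
static int
replication_equal(const replication_level_t *a, const replication_level_t *b)
{
	return (strcmp(a->zprl_type, b->zprl_type) == 0 &&
	    a->zprl_parity == b->zprl_parity &&
	    a->zprl_children == b->zprl_children);
}
```

For example, a 2-way mirror next to a 3-way mirror fails on `zprl_children` (the "%llu-way" message), while raidz1 next to raidz2 fails on `zprl_parity` (the "device parity" message).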
/*
 * Check the replication level of the vdev spec against the current pool.  Calls
 * get_replication() to make sure the new spec is self-consistent.  If the pool
 * already has an inconsistent replication level, then we ignore any errors in
 * the new spec.  Otherwise, report any difference between the two.
 */

	/*
	 * If we have a current pool configuration, check to see if it's
	 * self-consistent.  If not, simply return success.
	 */

		/*
		 * For spares there may be no children, and therefore no
		 * replication level to check.
		 */

	/*
	 * If all we have is logs then there's no replication level to check.
	 */

	/*
	 * Get the replication level of the new vdev spec, reporting any
	 * inconsistencies found.
	 */

	/*
	 * Check to see if the new vdev spec matches the replication level of
	 * the current pool.
	 */
		    "mismatched replication level: pool uses %s "
		    "and new vdev is %s\n"),
		    "mismatched replication level: pool uses %llu "
		    "device parity and new vdev uses %llu\n"),
		    "mismatched replication level: pool uses %llu-way "
		    "%s and new vdev uses %llu-way %s\n"),
/*
 * Go through and find any whole disks in the vdev specification, labelling them
 * as appropriate.  When constructing the vdev spec, we were unable to open this
 * device in order to provide a devid.  Now that we have labelled the disk and
 * know that slice 0 is valid, we can construct the devid.
 *
 * If the disk was already labeled with an EFI label, we will have gotten the
 * devid already (because we were able to open the whole disk).  Otherwise, we
 * need to get the devid after we label the disk.
 */

		/*
		 * We have a disk device.  Get the path to the device
		 * and see if it's a whole disk by appending the backup
		 * slice and stat()ing the device.
		 */

		/*
		 * Fill in the devid, now that we've labeled the disk.
		 */

		/*
		 * Update the path to refer to the 's0' slice.  The presence of
		 * the 'whole_disk' field indicates to the CLI that we should
		 * chop off the slice number when displaying the device.
		 */

/*
 * Determine if the given path is a hot spare within the given configuration.
 */

/*
 * Go through and find any devices that are in use.  We rely on libdiskmgt for
 * the majority of this task.
 */

		/*
		 * As a generic check, we look to see if this is a replace of a
		 * hot spare within the same pool.  If so, we allow it
		 * regardless of what libdiskmgt or zpool_in_use() says.
		 */

		const char *p = type + 5;

			return (NULL);	/* no zero prefixes allowed */

/*
 * Construct a syntactically valid vdev specification, and ensure that all
 * devices and files exist and can be opened.
 *
 * Note: we don't bother freeing anything in the error paths because the
 * program is just going to exit anyway.
 */

	/*
	 * If it's a mirror or raidz, the subsequent arguments are
	 * its leaves -- until we encounter the next mirror or raidz.
	 */
		    "specification: 'spare' can be "
		    "specified only once\n"));
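The surviving fragments (`type + 5` and the "no zero prefixes allowed" comment) point at parsing of the `raidz[N]` grouping name, where the digits after `raidz` give the parity. Here is a hedged reconstruction in portable C; `raidz_parity` is a made-up name, and the real function additionally handles "mirror", "spare", "log", and "cache" and returns the canonical type string plus minimum device counts.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a "raidz[N]" grouping name.  Returns the parity implied by the
 * name, or -1 if the name is not a valid raidz grouping.  A bare
 * "raidz" means single parity, and leading zeros ("raidz02") are
 * rejected, as are out-of-range parity values.
 */
static long
raidz_parity(const char *type)
{
	const char *p;
	char *end;
	long nparity;

	if (strncmp(type, "raidz", 5) != 0)
		return (-1);
	p = type + 5;
	if (*p == '\0')
		return (1);	/* plain "raidz" == "raidz1" */
	if (*p == '0')
		return (-1);	/* no zero prefixes allowed */
	errno = 0;
	nparity = strtol(p, &end, 10);
	if (errno != 0 || nparity < 1 || nparity >= 255 || *end != '\0')
		return (-1);	/* upper bound is an assumption */
	return (nparity);
}
```

Rejecting zero prefixes keeps the mapping between names and parity unambiguous: "raidz2" is the only spelling of double parity, so "raidz02" is an error rather than an alias.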
"specification: 'log' can be " "specified only once\n"));
* A log is not a real grouping device. * We just set is_log and continue. "specification: 'cache' can be " "specified only once\n"));
"specification: unsupported 'log' " for (c =
1; c <
argc; c++) {
"specification: %s requires at least %d " "specification: %s supports no more than " * We have a device. Pass off to make_leaf_vdev() to * construct the appropriate nvlist describing the vdev. "specification: at least one toplevel vdev must be " "log requires at least 1 device\n"));
	/*
	 * Finally, create nvroot and add all top-level vdevs to it.
	 */

/*
 * Get and validate the contents of the given vdev specification.  This ensures
 * that the nvlist returned is well-formed, that all the devices exist, and that
 * they are not currently in use by any other known consumer.  The 'poolconfig'
 * parameter is the current configuration of the pool when adding devices to an
 * existing pool, and is used to perform additional checks, such as changing the
 * replication level of the pool.  It can be 'NULL' to indicate that this is a
 * new pool.  The 'force' flag controls whether devices should be forcefully
 * added, even if they appear in use.
 */

	/*
	 * Construct the vdev specification.  If this is successful, we know
	 * that we have a valid specification, and that all devices can be
	 * opened.
	 */

	/*
	 * Validate each device to make sure that it's not shared with another
	 * subsystem.  We do this even if 'force' is set, because there are
	 * some uses (such as a dedicated dump device) that even '-f' cannot
	 * override.
	 */

	/*
	 * Check the replication level of the given vdevs and report any errors
	 * found.  We include the existing pool spec, if any, as we need to
	 * catch changes against the existing replication level.
	 */

	/*
	 * Run through the vdev specification and label any whole disks found.
	 */