libzfs_pool.c revision 9a686fbc186e8e2a64e9a5094d44c7d6fa0ea167
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2015 Nexenta Systems, Inc. All rights reserved.
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright (c) 2011, 2015 by Delphix. All rights reserved.
 * Copyright (c) 2013, Joyent, Inc. All rights reserved.
 */

	int create:1;	/* Validate property on creation */
	int import:1;	/* Validate property on import */

/*
 * ====================================================================
 *	zpool property functions
 * ====================================================================
 */

/*
 * zpool_get_all_props() has most likely failed because the pool is
 * faulted, but if all we need is the top-level vdev's guid, then get
 * it from the zhp config nvlist.
 */

/* Map VDEV STATE to printed strings. */

/*
 * Get a zpool property value for 'prop' and return the value in a
 * pre-allocated buffer.
 */

/*
 * Check that the bootfs name carries the same pool name as the pool
 * it is being set on, assuming bootfs is a valid dataset name.
 */

/*
 * Given an nvlist of zpool properties to be set, validate that they
 * are correct, and parse any numeric properties (index, boolean, etc)
 * if they are specified as strings.
 */
	"property '%s' can only be set to "
/* Make sure this property is valid and applies to this type. */
/* Perform additional checking for specific properties. */
	"property '%s' number %d is invalid."),
	"property '%s' cannot be set at creation "
	"pool must be upgraded to support "
/*
 * The bootfs property value has to be a dataset name, and the dataset
 * has to reside in the same pool that the property is being set on.
 */
	"property '%s' can only be set during pool "
	"property '%s' must be empty, an "
	"'%s' is not a valid directory"),
	"comment may only have printable "
	"comment must not exceed %d characters"),
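The two comment-property messages above imply a simple check: every byte of the pool comment must be printable, and the total length is capped. A minimal standalone sketch of that check, where `MAX_COMMENT_LEN` is an assumed illustrative limit rather than the real `ZPROP_MAX_COMMENT` value:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed limit for illustration; the real bound is ZPROP_MAX_COMMENT. */
#define	MAX_COMMENT_LEN	32

int
comment_ok(const char *s)
{
	if (strlen(s) > MAX_COMMENT_LEN)
		return (0);	/* "comment must not exceed %d characters" */
	for (; *s != '\0'; s++) {
		/* "comment may only have printable characters" */
		if (!isprint((unsigned char)*s))
			return (0);
	}
	return (1);
}
```

For example, `comment_ok("scratch pool")` passes, while a comment containing a tab fails the `isprint()` test.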
	"property '%s' can only be set at "
/* Set zpool property : propname=propval. */
/* Execute the corresponding ioctl() to set this property. */
/* add any unsupported features */
/*
 * Before adding the property to the list, make sure that no other
 * pool already added the same property.
 */
/* Get the state for the given feature on the given ZFS pool. */
/*
 * Convert from feature name to feature guid. This conversion is
 * unnecessary for unsupported@... properties because they already
 * contain the guid.
 */
/*
 * Don't start the slice at the default block of 34; many storage
 * devices will use a stripe width of 128k, so start there instead.
 */
/*
 * Validate the given pool name, optionally putting an extended error
 * message in 'buf'.
 */
/*
 * The rules for reserved pool names were extended at a later point.
 * But we need to support users with existing pools that may now be
 * invalid. So we only check for this expanded set of names during a
 * create (or import), and only in userland.
 */
	"name must begin with a letter"));
	"name is reserved"));
	"pool name is reserved"));
	"leading slash in name"));
	"empty component in name"));
	"trailing slash in name"));
	"multiple '@' delimiters in name"));
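The error messages above enumerate the pool-name rules: the name must begin with a letter, must not contain a leading or trailing slash or an empty component, must not use multiple '@' delimiters, and must not collide with a reserved name. A minimal, self-contained sketch of those rules, for illustration only (the real checks live elsewhere in libzfs, and the reserved-name list here is an assumed subset):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

int
name_ok(const char *name)
{
	/* Assumed subset of reserved names, for illustration. */
	static const char *reserved[] = { "mirror", "raidz", "spare", "log" };
	size_t i;

	if (name[0] == '/')
		return (0);		/* "leading slash in name" */
	if (!isalpha((unsigned char)name[0]))
		return (0);		/* "name must begin with a letter" */
	if (strstr(name, "//") != NULL)
		return (0);		/* "empty component in name" */
	if (name[strlen(name) - 1] == '/')
		return (0);		/* "trailing slash in name" */
	if (strchr(name, '@') != NULL &&
	    strchr(name, '@') != strrchr(name, '@'))
		return (0);		/* "multiple '@' delimiters in name" */
	for (i = 0; i < sizeof (reserved) / sizeof (reserved[0]); i++) {
		if (strcmp(name, reserved[i]) == 0)
			return (0);	/* "name is reserved" */
	}
	return (1);
}
```

So `name_ok("tank")` passes, while `name_ok("mirror")`, `name_ok("1tank")`, and `name_ok("tank//fs")` all fail.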
/*
 * Open a handle to the given pool, even if the pool is currently in
 * the FAULTED state.
 */
/* Make sure the pool name is valid. */
/*
 * Like the above, but silent on error. Used when iterating over pools
 * (because the configuration cache may be out of date).
 */
/*
 * Similar to zpool_open_canfail(), but refuses to open pools in the
 * faulted state.
 */
/* Close the handle. Simply frees the memory associated with the handle. */
/* Return the name of the pool. */
/* Return the state of the pool (ACTIVE or UNAVAILABLE). */
/*
 * Create the named pool, using the provided vdev list. It is assumed
 * that the consumer has already validated the contents of the nvlist,
 * so we don't have to worry about error semantics.
 */
/*
 * This can happen if the user has specified the same device multiple
 * times. We can't reliably detect this until we try to add it and see
 * we already have a vdev with the same devid.
 */
	"one or more vdevs refer to the same device"));
/*
 * This happens if the record size is smaller or larger than the
 * allowed size range, or not a power of 2.
 *
 * NOTE: although zfs_valid_proplist is called earlier, this case may
 * have slipped through since the pool does not exist yet and it is
 * therefore impossible to read properties, e.g. max blocksize.
 */
	"record size invalid"));
/*
 * This occurs when one of the devices is below SPA_MINDEVSIZE.
 * Unfortunately, we can't detect which device was the problem device,
 * since there's no reliable way to determine device size from
 * userland.
 */
	"one or more devices is less than the "
	"one or more devices is out of space"));
	"cache device must be a disk or disk slice"));
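The "record size invalid" comment above states the two conditions: the value must fall within an allowed range and must be a power of 2. A sketch of that check, with assumed illustrative bounds (the real minimum comes from SPA_MINBLOCKSIZE and the maximum from the pool's max blocksize, which, as the note says, cannot be read before the pool exists):

```c
#include <assert.h>

/* Assumed bounds for illustration only. */
#define	MIN_RECSIZE	512ULL
#define	MAX_RECSIZE	(128ULL * 1024)

int
recsize_ok(unsigned long long rs)
{
	if (rs < MIN_RECSIZE || rs > MAX_RECSIZE)
		return (0);		/* outside the allowed size range */
	/* power-of-two test: a power of 2 has no bits in common with rs-1 */
	return ((rs & (rs - 1)) == 0);
}
```

For example, 4096 and 131072 pass, while 3000 (not a power of 2) and 256 (below the assumed minimum) fail.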
/*
 * Destroy the given pool. It is up to the caller to ensure that there
 * are no datasets left in the pool.
 */
	"one or more devices is read only"));
/*
 * Add the given vdevs to the pool. The caller must have already
 * performed the necessary verification to ensure that the vdev
 * specification is well-formed.
 */
	"upgraded to add hot spares"));
	"upgraded to add cache devices"));
/*
 * This can happen if the user has specified the same device multiple
 * times. We can't reliably detect this until we try to add it and see
 * we already have a vdev with the same devid.
 */
	"one or more vdevs refer to the same device"));
/*
 * This occurs when one of the devices is below SPA_MINDEVSIZE.
 * Unfortunately, we can't detect which device was the problem device,
 * since there's no reliable way to determine device size from
 * userland.
 */
	"device is less than the minimum "
	"pool must be upgraded to add these vdevs"));
	"root pool can not have multiple vdevs"
	"cache device must be a disk or disk slice"));
/*
 * Exports the pool from the system. The caller must ensure that there
 * are no mounted datasets in the pool.
 */
	"use '-f' to override the following errors:\n"
	"'%s' has an active shared spare which could be"
	" used by other pools once '%s' is exported."),
	"Would be able to return %s "
	"to its state as of %s.\n"),
	"Pool %s returned to its state as of %s.\n"),
	"%s approximately %lld "),
	    dryrun ? "Would discard" : "Discarded",
	"minutes of transactions.\n"));
	"%s approximately %lld "),
	"seconds of transactions.\n"));
/* All attempted rewinds failed if ZPOOL_CONFIG_LOAD_TIME missing */
	"Recovery is possible, but will result in some data loss.\n"));
	"\tReturning the pool to its state as of %s\n"
	"\tshould correct the problem. "),
	"\tReverting the pool to an earlier state "
	"should correct the problem.\n\t"));
	"Approximately %lld minutes of data\n"
	"\tmust be discarded, irreversibly. "), (loss + 30) / 60);
	"Approximately %lld seconds of data\n"
	"\tmust be discarded, irreversibly. "), loss);
	"After rewind, at least\n"
	"\tone persistent user-data error will remain. "));
	"After rewind, several\n"
	"\tpersistent user-data errors will remain. "));
	"Recovery can be attempted\n\tby executing 'zpool %s -F %s'. "),
	"\tis strongly recommended after recovery.\n"));
	"Destroy and re-create the pool from\n\ta backup source.\n"));
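The rewind messages above report lost transaction time in minutes when it is large, and the fragments show the conversion as the integer expression `(loss + 30) / 60`, i.e. rounding seconds to the nearest minute. A small sketch of that arithmetic:

```c
#include <assert.h>

/* Round a transaction-loss figure in seconds to the nearest minute. */
long long
loss_minutes(long long loss_seconds)
{
	return ((loss_seconds + 30) / 60);
}
```

So 29 seconds rounds down to 0 minutes, 30 seconds rounds up to 1 minute, and 90 seconds rounds to 2 minutes.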
/*
 * zpool_import() is a contracted interface. Should be kept the same
 * if possible.
 *
 * Applications should use zpool_import_props() to import a pool with
 * new property values to be set.
 */
/*
 * Import the given pool using the known configuration and a list of
 * properties to be set. The configuration should have come from
 * zpool_find_import(). The 'newname' parameter controls whether the
 * pool is imported with a different name.
 */
/*
 * Dry-run failed, but we print out what success looks like if we
 * found a best txg.
 */
	"pool uses the following feature(s) not "
	"supported by this system:\n"));
	"All unsupported features are only "
	"required for writing to the pool."
	"\nThe pool can be imported using "
	"one or more devices is read only"));
	"The devices below are missing, use "
	"'-m' to import the pool anyway:\n"));
/* This should never fail, but play it safe anyway. */
/*
 * This provides a very minimal check whether a given string is likely
 * a c#t#d# style string. Users of this are expected to do their own
 * verification of the s# part.
 */
/* More elaborate version for ones which may start with "/dev/dsk/". */
/*
 * If it starts with a slash, check the last component. If it ends in
 * "/old", check the second-to-last component of the string instead.
 */
/*
 * Find a vdev that matches the search criteria specified. We use the
 * nvpair name to determine how we should look for the device.
 * 'avail_spare' is set to TRUE if the provided guid refers to an AVAIL
 * spare; but FALSE if it's an INUSE spare.
 */
/* Nothing to look for */
/* Obtain the key we will use to search */
/*
 * Search for the requested value. Special cases:
 *
 * - ZPOOL_CONFIG_PATH for whole disk entries. These end in "s0" or
 *   "s0/old". The "s0" part is hidden from the user, but included in
 *   the string, so this matches around it.
 * - Looking for a top-level vdev name (i.e. ZPOOL_CONFIG_TYPE).
 *
 * Otherwise, all other searches are simple string compares.
 */
/*
 * make_leaf_vdev() should only set wholedisk for ZPOOL_CONFIG_PATHs
 * which will include "/dev/dsk/", giving plenty of room for the
 * indices used next.
 */
/* strings identical except trailing "s0" */
/* strings identical except trailing "s0/old" */
/*
 * Determine our vdev type, keeping in mind that the srchval is
 * composed of a type and vdev id pair (i.e. mirror-4).
 */
/* If the types don't match then keep looking. */
/* Now verify that we have the correct vdev id. */
/*
 * The 'is_log' value is only set for the toplevel vdev, not the leaf
 * vdevs. So we always lookup the log device from the root of the vdev
 * tree.
 */
/*
 * Given a physical path (minus the "/devices" prefix), find the
 * associated vdev.
 */
/* Determine if we have an "interior" top-level vdev (i.e. mirror/raidz). */
	} else if (path[0] != '/') {
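The "very minimal check" described above for c#t#d# style device names can be sketched as a small scanner: a 'c' followed by digits, an optional 't' group, then 'd' and digits. As the comment says, callers are expected to validate any trailing s# slice part themselves, so this sketch deliberately stops after the 'd' digits. This is an illustration of the described rule, not the actual libzfs function:

```c
#include <assert.h>
#include <ctype.h>

int
ctd_check(const char *p)
{
	/* mandatory controller part: c<digits> */
	if (*p++ != 'c' || !isdigit((unsigned char)*p))
		return (0);
	while (isdigit((unsigned char)*p))
		p++;
	/* optional target part: t<digits> */
	if (*p == 't') {
		p++;
		if (!isdigit((unsigned char)*p))
			return (0);
		while (isdigit((unsigned char)*p))
			p++;
	}
	/* mandatory disk part: d<digits>; any s# suffix is the caller's job */
	if (*p++ != 'd' || !isdigit((unsigned char)*p))
		return (0);
	return (1);
}
```

So `ctd_check("c0t0d0")` and `ctd_check("c3d1")` pass, `ctd_check("sda")` fails, and `ctd_check("c0t0d0s0")` also passes because the s# part is not examined here.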
/* Helper function for zpool_get_physpaths(). */
/* if physpath was not copied properly, clear it */
/* An active spare device has ZPOOL_CONFIG_IS_SPARE set. */
/* For a spare vdev, we only want to boot from the active spare. */
	for (i = 0; i < count; i++) {
/*
 * Get phys_path for a root pool config.
 * Return 0 on success; non-zero on failure.
 */
/* root pool can only have a single top-level vdev */
/*
 * Get phys_path for a root pool.
 * Return 0 on success; non-zero on failure.
 */
/*
 * If the device has been dynamically expanded, then we need to
 * relabel the disk to use the new unallocated space.
 */
	"efi_use_whole_disk")) == NULL)
	"relabel '%s': unable to open device"), name);
/*
 * It's possible that we might encounter an error if the device does
 * not have any unallocated space left. If so, we simply ignore that
 * error and continue on.
 */
	"relabel '%s': unable to read disk capacity"), name);
/*
 * Bring the specified vdev online. The 'flags' parameter is a set of
 * the ZFS_ONLINE_* flags.
 */
/* XXX - L2ARC 1.0 devices can't support expansion. */
	"cannot expand cache devices"));
	"from this pool into a new one. Use '%s' "
	"instead"), "zpool detach");
/* Take the specified vdev offline */
/* There are no other replicas of this device. */
/* The log device has unplayed logs */
/* Mark the given vdev faulted. */
/* There are no other replicas of this device. */
/* Mark the given vdev degraded. */
/*
 * Returns TRUE if the given nvlist is a vdev that was originally
 * swapped in as a hot spare.
 */
/*
 * Attach new_disk (fully described by nvroot) to old_disk. If
 * 'replacing' is specified, the new disk will replace the old one.
 */
	"new device must be a single disk"));
/*
 * If the target is a hot spare that has been swapped in, we can only
 * replace it with another hot spare.
 */
	"can only be replaced by another hot spare"));
/*
 * XXX need a better way to prevent user from booting up a half-baked
 * vdev.
 */
	"sure to wait until resilver is done "
/* Can't attach to or replace this type of vdev. */
	"cannot replace a log with a spare"));
	"for completion or use 'zpool detach'"));
	"cannot replace a replacing device"));
	"can only attach to mirrors and top-level "
/* The new device must be a single disk. */
	"new device must be a single disk"));
/* The new device is too small. */
/* The new device has a different alignment requirement. */
	"devices have different sector alignment"));
/* The resulting top-level vdev spec won't fit in the label. */
/* Detach the specified device. */
/* Can't detach from this type of vdev. */
	"applicable to mirror and replacing vdevs"));
/* There are no other replicas of this device. */
/* Find a mirror vdev in the source nvlist. */
/*
 * The mchild array contains a list of disks in one of the top-level
 * mirrors of the source pool. The schild array contains a list of
 * disks that the user specified on the command line. We loop over the
 * mchild array to see if any entry in the schild array matches.
 *
 * If a disk in the mchild array is found in the schild array, we
 * return the index of that entry. Otherwise we return -1.
 */
/*
 * Split a mirror pool. If newroot points to null, then a new nvlist
 * is generated, and it is the responsibility of the caller to free it.
 */
	"retrieve pool configuration\n"));
	"Source pool is missing vdev tree"));
/*
 * Unlike cache & spares, slogs are stored in the
 * ZPOOL_CONFIG_CHILDREN array. We filter them out here.
 */
/* Create a hole vdev and put it in the config. */
	"Source pool must be composed only of mirrors\n"));
/* find or add an entry for this top-level vdev */
/* We found a disk that the user specified. */
/* User didn't specify a disk for this vdev. */
/* did we find every disk the user specified? */
	"include at most one disk from each mirror"));
/* Prepare the nvlist for populating. */
/* Add all the children we found */
/* If we're just doing a dry run, exit now with success. */
/* now build up the config list & call the ioctl */
/* The new pool is automatically part of the namespace unless we */
/*
 * Remove the given device. Currently, this is supported only for hot
 * spares and level 2 cache devices.
 */
/* XXX - this should just go away. */
	"only inactive hot spares, cache, top-level, "
	"or log devices can be removed"));
	"pool must be upgraded to support log removal"));
/*
 * Clear the errors for the pool, or the particular device if
 * specified.
 */
/*
 * Don't allow error clearing for hot spares. Do allow error clearing
 * for l2cache devices.
 */
/* Similar to zpool_clear(), but takes a GUID (used by fmd). */
/* Change the GUID for a pool. */
/*
 * Convert from a devid string to a path. In case the strdup() fails,
 * we will just return NULL below.
 */
/* Convert from a path to a devid string. */
/*
 * Issue the necessary ioctl() to update the stored path value for the
 * vdev. We ignore any failure here, since a common case is for an
 * unprivileged user to type 'zpool status', and we'll display the
 * correct information anyway.
 */
/*
 * Given a vdev, return the name to display in iostat. If the vdev has
 * a path, we use that, stripping off any leading "/dev/dsk/"; if not,
 * we use the type. We also check if this is a whole disk, in which
 * case we strip off the trailing 's0' slice name.
 *
 * This routine is also responsible for identifying when disks have
 * been reconfigured in a new location. The kernel will have opened
 * the device by devid, but the path will still refer to the old
 * location. To catch this, we first do a path -> devid translation
 * (which is fast for the common case). If the devid matches, we're
 * done. If not, we do a reverse devid -> path translation and issue
 * the appropriate ioctl() to update the path of the vdev. If 'zhp' is
 * NULL, then this is an exported pool, and we don't need to do any of
 * this.
 */
/*
 * If the device is dead (faulted, offline, etc) then don't bother
 * opening it. Otherwise we may be forcing the user to open a
 * misbehaving device, which can have undesirable side effects.
 */
/* Determine if the current path is correct. */
/* Update the path appropriately. */
/*
 * If it starts with c#, and ends with "s0", chop the "s0" off, or if
 * it ends with "s0/old", remove the "s0" from the middle.
 */
/* If it's a raidz device, we need to stick in the parity level. */
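The display-name rules above (strip a leading "/dev/dsk/"; for whole disks starting with c#, drop a trailing "s0" or remove the "s0" in the middle of "...s0/old") can be sketched as a small string-rewriting helper. This is an illustration of the stated rules only, not the real `zpool_vdev_name()`:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

void
vdev_display_name(const char *path, int wholedisk, char *out, size_t outlen)
{
	const char *p = path;
	char *mid;

	/* strip any leading "/dev/dsk/" */
	if (strncmp(p, "/dev/dsk/", 9) == 0)
		p += 9;
	(void) snprintf(out, outlen, "%s", p);

	/* whole disks named c#... hide the "s0" slice from the user */
	if (wholedisk && out[0] == 'c' && isdigit((unsigned char)out[1])) {
		size_t len = strlen(out);

		if (len > 2 && strcmp(out + len - 2, "s0") == 0) {
			out[len - 2] = '\0';	/* chop trailing "s0" */
		} else if ((mid = strstr(out, "s0/old")) != NULL) {
			/* remove the "s0" from the middle of "...s0/old" */
			memmove(mid, mid + 2, strlen(mid + 2) + 1);
		}
	}
}
```

So "/dev/dsk/c0t0d0s0" displays as "c0t0d0", and "c1d0s0/old" displays as "c1d0/old".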
/* We identify each top-level vdev by using a <type-id> naming scheme. */
/*
 * Retrieve the persistent error log, uniquify the members, and return
 * to the caller.
 */
/*
 * Retrieve the raw error list from the kernel. If the number of
 * errors has increased, allocate more space and continue until we get
 * the entire list.
 */
/*
 * Sort the resulting bookmarks. This is a little confusing due to the
 * implementation of ZFS_IOC_ERROR_LOG. The bookmarks are copied last
 * to first, and 'zc_nvlist_dst_size' indicates the number of
 * bookmarks _not_ copied as part of the process. So we point the
 * start of our array appropriately and decrement the total number of
 * elements.
 */
/* Fill in the nverrlistp with nvlist's of dataset and object numbers. */
	for (i = 0; i < count; i++) {
/* ignoring zb_blkid and zb_level for now */
/* Upgrade a ZFS pool to the latest on-disk version. */
	for (int i = 1; i < argc; i++) {
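The error-log comment above describes a slightly confusing contract: the kernel fills the bookmark array from the end backward and reports how many entries were *not* copied, so userland must advance the array pointer past the uncopied prefix and shrink the element count. A sketch of just that pointer/count adjustment, with a stand-in bookmark type (the real code uses zbookmark structures):

```c
#include <assert.h>

/* Stand-in for the kernel bookmark structure, for illustration. */
struct bookmark { unsigned long long obj; };

/*
 * Given the full array, its total capacity, and the number of entries
 * the kernel did NOT copy, return where the copied entries begin and
 * store how many there are.
 */
struct bookmark *
errlog_start(struct bookmark *array, unsigned long long total,
    unsigned long long not_copied, unsigned long long *countp)
{
	*countp = total - not_copied;	/* elements actually copied */
	return (array + not_copied);	/* copied entries sit at the end */
}
```

With a capacity of 8 and 3 uncopied entries, the valid data starts at `array + 3` and holds 5 elements.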
/*
 * Perform ioctl to get some command history of a pool.
 *
 * 'buf' is the buffer to fill up to 'len' bytes. 'off' is the logical
 * offset of the history buffer to start reading from.
 *
 * Upon return, 'off' is the next logical offset to read from and
 * 'len' is the actual amount of bytes read into 'buf'.
 */
	"cannot show history for pool '%s'"),
/*
 * Process the buffer of nvlists, unpacking and storing each nvlist
 * record into 'records'. 'leftover' is set to the number of bytes
 * that weren't processed, as there wasn't a complete record.
 */
/* get length of packed record (stored as little endian) */
/* add record to nvlist array */
/* Retrieve the command history of a pool. */
/* if nothing else was read in, we're at EOF, just return */
/*
 * no progress made, because buffer is not big enough to hold this
 * record; resize and retry
 */
/* special case for the MOS */
/* get the dataset's name */
/* just write out a path of two object numbers */
/* find out if the dataset is mounted */
/* get the corrupted object's path */
/*
 * Read the EFI label from the config; if a label does not exist then
 * pass back the error to the caller. If the caller has passed a
 * non-NULL diskaddr argument then we set it to the starting address
 * of the EFI partition.
 */
/*
 * determine where a partition starts on a disk in the current
 * configuration
 */
/*
 * Label an individual disk. The name provided is the short name,
 * stripped of any leading /dev path.
 */
/* prepare an error message just in case */
/* This shouldn't happen. We've long since verified that this is a valid device. */
/*
 * The only way this can fail is if we run out of memory, or we were
 * unable to read the disk's capacity.
 */
	"unable to read disk capacity"), name);
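The history comments above note that each packed nvlist record is preceded by its length "stored as little endian". A sketch of decoding such a length field portably, regardless of host byte order; the 8-byte field width is an assumption for illustration, since the fragments do not show it:

```c
#include <assert.h>
#include <stdint.h>

/* Decode an assumed 8-byte little-endian length prefix. */
uint64_t
record_len_le(const unsigned char *buf)
{
	uint64_t len = 0;
	int i;

	/* fold in bytes from most- to least-significant */
	for (i = 7; i >= 0; i--)
		len = (len << 8) | buf[i];
	return (len);
}
```

Reading byte 0 as the least-significant byte means `{0x2A, 0, ...}` decodes to 42 and `{0x00, 0x01, 0, ...}` decodes to 256 on any host.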
/*
 * Why we use V_USR: V_BACKUP confuses users, and is considered
 * disposable by some EFI utilities (since EFI doesn't have a backup
 * slice). V_UNASSIGNED is supposed to be used only for zero-size
 * partitions, and efi_write() will fail if we use it. V_ROOT, V_BOOT,
 * etc. were all pretty specific. V_USR is as close to reality as we
 * can get, in the absence of V_OTHER.
 */
/*
 * Some block drivers (like pcata) may not support EFI GPT labels.
 * Print out a helpful error message directing the user to manually
 * label the disk and give us a specific slice.
 */
	"try using fdisk(1M) and then provide a specific slice"));
	"vdev type '%s' is not supported"), type);
/*
 * Check if this zvol is allowable for use as a dump device; zero if
 * it is, > 0 if it isn't, < 0 if it isn't a zvol.
 *
 * Allowable storage configurations include mirrors, all raidz
 * variants, and pools with log, cache, and spare devices. Pools which
 * are backed by files are not suitable.
 */
	"dump is not supported on device '%s'"), arg);
/* check the configuration of the pool */
	"malformed dataset name"));
	"dataset name is too long"));
	"could not obtain vdev configuration for '%s'"), poolname);