spa_misc.c revision 2f8aaab38e6371ad39ed90a1211ba8921acbb4d5
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident	"%Z%%M%	%I%	%E% SMI"

/*
 * There are four basic locks for managing spa_t structures:
 *
 * spa_namespace_lock (global mutex)
 *
 *	This lock must be acquired to do any of the following:
 *
 *		- Lookup a spa_t by name
 *		- Add or remove a spa_t from the namespace
 *		- Increase spa_refcount from non-zero
 *		- Check if spa_refcount is zero
 *
 *	It does not need to handle recursion.  A create or destroy may
 *	reference objects (files or zvols) in other pools, but by
 *	definition they must have an existing reference, and will never need
 *	to lookup a spa_t by name.
 *
 * spa_refcount (per-spa refcount_t protected by mutex)
 *
 *	This reference count keeps track of any active users of the spa_t.
 *	The spa_t cannot be destroyed or freed while this is non-zero.
 *	Internally, the refcount is never really 'zero' - opening a pool
 *	implicitly keeps some references in the DMU.  Internally we check
 *	against SPA_MINREF, but present the image of a zero/non-zero value
 *	to consumers.
 *
 * spa_config_lock (per-spa read-priority rwlock)
 *
 *	This protects the spa_t from config changes, and must be held in
 *	the following circumstances:
 *
 *		- RW_READER to perform I/O to the spa
 *		- RW_WRITER to change the vdev config
 *
 * spa_config_cache_lock (per-spa mutex)
 *
 *	This mutex prevents the spa_config nvlist from being updated.  No
 *	other locks are required to obtain this lock, although implicitly you
 *	must have the namespace lock or non-zero refcount to have any kind
 *	of spa_t pointer at all.
 *
 * The locking order is fairly straightforward:
 *
 *		spa_namespace_lock	->	spa_refcount
 *
 *	The namespace lock must be acquired to increase the refcount from 0
 *	or to check if it is zero.
 *
 *		spa_refcount		->	spa_config_lock
 *
 *	There must be at least one valid reference on the spa_t to acquire
 *	the config lock.
 *
 *		spa_namespace_lock	->	spa_config_lock
 *
 *	The namespace lock must always be taken before the config lock.
 *
 * The spa_namespace_lock and spa_config_cache_lock can be acquired directly
 * and are globally visible.
 *
 * The namespace is manipulated using the following functions, all of which
 * require the spa_namespace_lock to be held.
 */
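Before the function-by-function list, the two ordering rules above (the namespace lock gates refcount transitions from zero, and a reference must exist before the config lock is taken) can be sketched as a single-threaded toy model. All names below are illustrative stand-ins, not the ZFS implementation; the asserts encode the documented invariants.

```c
#include <assert.h>

/* Toy state modeling the three synchronization objects described above. */
struct spa_model {
	int namespace_held;	/* models spa_namespace_lock */
	int refcount;		/* models spa_refcount */
	int config_readers;	/* models spa_config_lock, RW_READER side */
};

void
model_namespace_enter(struct spa_model *m)
{
	assert(!m->namespace_held);	/* not recursive */
	m->namespace_held = 1;
}

void
model_namespace_exit(struct spa_model *m)
{
	m->namespace_held = 0;
}

/* Increasing the refcount from zero requires the namespace lock. */
void
model_open_ref(struct spa_model *m)
{
	if (m->refcount == 0)
		assert(m->namespace_held);
	m->refcount++;
}

/* The config lock may only be taken with at least one reference held. */
void
model_config_enter_reader(struct spa_model *m)
{
	assert(m->refcount > 0);
	m->config_readers++;
}

void
model_config_exit(struct spa_model *m)
{
	assert(m->config_readers > 0);
	m->config_readers--;
}
```

Note that taking a second reference needs no lock at all in this model, mirroring the rule that only the zero/non-zero transition is protected by the namespace lock.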
/*
 *	spa_lookup()		Lookup a spa_t by name.
 *
 *	spa_add()		Create a new spa_t in the namespace.
 *
 *	spa_remove()		Remove a spa_t from the namespace.  This also
 *				frees up any memory associated with the spa_t.
 *
 *	spa_next()		Returns the next spa_t in the system, or the
 *				first if NULL is passed.
 *
 *	spa_evict_all()		Shutdown and remove all spa_t structures in
 *				the system.
 *
 *	spa_guid_exists()	Determine whether a pool/device guid exists.
 *
 * The spa_refcount is manipulated using the following functions:
 *
 *	spa_open_ref()		Adds a reference to the given spa_t.  Must be
 *				called with spa_namespace_lock held if the
 *				refcount is currently zero.
 *
 *	spa_close()		Remove a reference from the spa_t.  This will
 *				not free the spa_t or remove it from the
 *				namespace.  No locking is required.
 *
 *	spa_refcount_zero()	Returns true if the refcount is currently
 *				zero.  Must be called with spa_namespace_lock
 *				held.
 *
 * The spa_config_lock is manipulated using the following functions:
 *
 *	spa_config_enter()	Acquire the config lock as RW_READER or
 *				RW_WRITER.  At least one reference on the spa_t
 *				must exist.
 *
 *	spa_config_exit()	Release the config lock.
 *
 *	spa_config_held()	Returns true if the config lock is currently
 *				held in the given state.
 *
 * The vdev configuration is protected by spa_vdev_enter() / spa_vdev_exit().
 *
 *	spa_vdev_enter()	Acquire the namespace lock and the config lock
 *				for writing.
 *
 *	spa_vdev_exit()		Release the config lock, wait for all I/O
 *				to complete, sync the updated configs to the
 *				cache, and release the namespace lock.
 *
 * The spa_name() function also requires either the spa_namespace_lock
 * or the spa_config_lock, as both are needed to do a rename.  spa_rename() is
 * also implemented within this file since it requires manipulation of the
 * namespace.
 */

/* Everything except dprintf is on by default in debug builds */

/*
 * zfs_recover can be set to nonzero to attempt to recover from
 * otherwise-fatal errors, typically caused by on-disk corruption.  When
 * set, calls to zfs_panic_recover() will turn into warning messages.
 */

#define	SPA_MINREF	5	/* spa_refcnt for an open-but-idle pool */

/*
 * ==========================================================================
 * SPA namespace functions
 * ==========================================================================
 */

/*
 * Lookup the named spa_t in the AVL tree.  The spa_namespace_lock must be held.
 * Returns NULL if no matching spa_t is found.
 */
	/*
	 * If it's a full dataset name, figure out the pool name and
	 * just use that.
	 */

/*
 * Create an uninitialized spa_t with the given name.  Requires
 * spa_namespace_lock.  The caller must ensure that the spa_t doesn't already
 * exist by calling spa_lookup() first.
 */
	/*
	 * Set the alternate root, if there is one.
	 */

/*
 * Removes a spa_t from the namespace, freeing up any memory used.  Requires
 * spa_namespace_lock.  This is called only after the spa_t has been closed and
 * deactivated.
 */

/*
 * Given a pool, return the next pool in the namespace, or NULL if there is
 * none.  If 'prev' is NULL, return the first pool.
 */

/*
 * ==========================================================================
 * SPA refcount functions
 * ==========================================================================
 */

/*
 * Add a reference to the given spa_t.  Must have at least one reference, or
 * have the namespace lock held.
 */

/*
 * Remove a reference to the given spa_t.  Must have at least one reference, or
 * have the namespace lock held.
 */

/*
 * Check to see if the spa refcount is zero.  Must be called with
 * spa_namespace_lock held.  We really compare against SPA_MINREF, which is the
 * number of references acquired when opening a pool.
 */

/*
 * ==========================================================================
 * SPA spare tracking
 * ==========================================================================
 */

/*
 * Spares are tracked globally due to the following constraints:
 *
 *	- A spare may be part of multiple pools.
 *	- A spare may be added to a pool even if it's actively in use within
 *	  another pool.
 *	- A spare in use in any pool can only be the source of a replacement if
 *	  the target is a spare in the same pool.
 */
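The SPA_MINREF convention noted earlier (an open pool always holds some internal references, so "zero" as seen by consumers really means the count has fallen back to SPA_MINREF) can be sketched as a small stand-alone model. The `model_*` names and the flat struct are illustrative, not the kernel code.

```c
#define	MODEL_SPA_MINREF	5	/* mirrors SPA_MINREF above */

struct spa_ref_model {
	int refcount;
};

/* Opening the pool implicitly takes the internal references. */
void
model_spa_activate(struct spa_ref_model *m)
{
	m->refcount = MODEL_SPA_MINREF;
}

void
model_spa_open_ref(struct spa_ref_model *m)
{
	m->refcount++;
}

void
model_spa_close(struct spa_ref_model *m)
{
	m->refcount--;
}

/* "Zero" as presented to consumers: only the internal references remain. */
int
model_spa_refcount_zero(const struct spa_ref_model *m)
{
	return (m->refcount == MODEL_SPA_MINREF);
}
```

This is why spa_refcount_zero() compares against SPA_MINREF rather than a literal 0: the DMU's implicit references never go away while the pool is open.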
/*
 * We keep track of all spares on the system through the use of a reference
 * counted AVL tree.  When a vdev is added as a spare, or used as a replacement
 * spare, then we bump the reference count in the AVL tree.  In addition, we set
 * the 'vdev_isspare' member to indicate that the device is a spare (active or
 * inactive).  When a spare is made active (used to replace a device in the
 * pool), we also keep track of which pool it's been made a part of.
 *
 * The 'spa_spare_lock' protects the AVL tree.  These functions are normally
 * called under the spa_namespace lock as part of vdev reconfiguration.  The
 * separate spare lock exists for the status query path, which does not need to
 * be completely consistent with respect to other vdev configuration changes.
 */

/*
 * ==========================================================================
 * SPA config locking
 * ==========================================================================
 */

/*
 * ==========================================================================
 * SPA vdev locking
 * ==========================================================================
 */

/*
 * Lock the given spa_t for the purpose of adding or removing a vdev.
 * Grabs the global spa_namespace_lock plus the spa config lock for writing.
 * It returns the next transaction group for the spa_t.
 */
	/*
	 * Suspend scrub activity while we mess with the config.  We must do
	 * this after acquiring the namespace lock to avoid a 3-way deadlock
	 * with spa_scrub_stop() and the scrub thread.
	 */

/*
 * Unlock the spa_t after adding or removing a vdev.  Besides undoing the
 * locking of spa_vdev_enter(), we also want to make sure the transactions have
 * synced to disk, and then update the global configuration cache with the new
 * information.
 */
	/*
	 * If the config changed, notify the scrub thread that it must restart.
	 */
	/*
	 * Allow scrubbing to resume.
	 */
	/*
	 * Note: this txg_wait_synced() is important because it ensures
	 * that there won't be more than one config change per txg.
	 * This allows us to use the txg as the generation number.
	 */
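The txg-as-generation-number property in the note above can be sketched as a toy model: spa_vdev_enter() hands back the txg the change will commit in, and spa_vdev_exit() waits for that txg to sync, so at most one config change lands per txg. The `model_*` functions below are illustrative stand-ins, with a simple counter in place of txg_wait_synced().

```c
struct txg_model {
	unsigned long long txg;		/* last synced txg */
	unsigned long long config_gen;	/* generation of last config change */
};

/* Model of spa_vdev_enter(): returns the txg the change will commit in. */
unsigned long long
model_vdev_enter(struct txg_model *m)
{
	return (m->txg + 1);
}

/*
 * Model of spa_vdev_exit(): sync through 'txg' (a stand-in for
 * txg_wait_synced()), then record it as the config generation.
 */
void
model_vdev_exit(struct txg_model *m, unsigned long long txg)
{
	while (m->txg < txg)
		m->txg++;
	m->config_gen = txg;
}
```

Because each enter/exit pair blocks until its txg has synced, two config changes can never share a txg, which is exactly what lets the txg double as a generation number.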
	/*
	 * If the config changed, update the config cache.
	 */

/*
 * ==========================================================================
 * Miscellaneous functions
 * ==========================================================================
 */

	/*
	 * Lookup the spa_t and grab the config lock for writing.  We need to
	 * actually open the pool so that we can sync out the necessary labels.
	 * It's OK to call spa_open() with the namespace lock held because we
	 * allow recursive calls for other reasons.
	 */
	/*
	 * Sync all labels to disk with the new names by marking the root vdev
	 * dirty and waiting for it to sync.  It will pick up the new pool name
	 * during the sync.
	 */
	/*
	 * Sync the updated config cache.
	 */

/*
 * Determine whether a pool with given pool_guid exists.  If device_guid is
 * non-zero, determine whether the pool exists *and* contains a device with the
 * specified device_guid.
 */
	/*
	 * Check any devices we may be in the process of adding.
	 */

		    "DVA[%d]=<%llu:%llx:%llx> ", d,

	    "%s %s %s %s birth=%llu fill=%llu cksum=%llx:%llx:%llx:%llx",
/*
 * ==========================================================================
 * Accessor functions
 * ==========================================================================
 */

/*
 * Accessing the name requires holding either the namespace lock or the
 * config lock, both of which are required to do a rename.
 */
	/*
	 * If we fail to parse the config during spa_load(), we can go through
	 * the error path (which posts an ereport) and end up here with no root
	 * vdev.  We stash the original pool guid in 'spa_load_guid' to handle
	 * this case.
	 */

/*
 * Return how much space is allocated in the pool (ie. sum of all asize)
 */

/*
 * Return how much (raid-z inflated) space there is in the pool.
 */

/*
 * Return the amount of raid-z-deflated space in the pool.
 */

	/*
	 * For now, the worst case is 512-byte RAID-Z blocks, in which
	 * case the space requirement is exactly 2x; so just assume that.
	 * Add to this the fact that we can have up to 3 DVAs per bp, and
	 * we have to multiply by a total of 6x.
	 */

/*
 * Return the failure mode that has been set to this pool.  The default
 * behavior will be to block all I/Os when a complete failure occurs.
 */

	/*
	 * As of SPA_VERSION == SPA_VERSION_DITTO_BLOCKS, we are able to
	 * handle BPs with more than one DVA allocated.  Set our max
	 * replication level accordingly.
	 */

/*
 * ==========================================================================
 * Initialization and Termination
 * ==========================================================================
 */

/*
 * Return whether this pool has slogs.  No locking needed.
 * It's not a problem if the wrong answer is returned as it's only for
 * performance and not correctness.
 */
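The 6x worst-case figure above is simple arithmetic: 512-byte RAID-Z blocks inflate space by 2x, and up to 3 DVAs per block pointer multiply that by 3, giving 6x. A minimal stand-in for that calculation; the real accessor operates on the spa_t rather than a raw size.

```c
#include <stdint.h>

/*
 * Worst-case on-disk space for 'lsize' logical bytes:
 * 2x RAID-Z inflation (512-byte blocks) * 3 DVAs per bp = 6x.
 */
uint64_t
model_worst_case_asize(uint64_t lsize)
{
	return (lsize * 6);
}
```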