spa_misc.c revision d80c45e0f58fa434ba37259ea2e2b12e0380c19a
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */
#pragma ident	"%Z%%M%	%I%	%E% SMI"

/*
 * There are four basic locks for managing spa_t structures:
 *
 * spa_namespace_lock (global mutex)
 *
 *	This lock must be acquired to do any of the following:
 *
 *		- Lookup a spa_t by name
 *		- Add or remove a spa_t from the namespace
 *		- Increase spa_refcount from non-zero
 *		- Check if spa_refcount is zero
 *
 *	It does not need to handle recursion.  A create or destroy may
 *	reference objects (files or zvols) in other pools, but by
 *	definition they must have an existing reference, and will never need
 *	to lookup a spa_t by name.
 *
 * spa_refcount (per-spa refcount_t protected by mutex)
 *
 *	This reference count keeps track of any active users of the spa_t.
 *	The spa_t cannot be destroyed or freed while this is non-zero.
 *	Internally, the refcount is never really 'zero' - opening a pool
 *	implicitly keeps some references in the DMU.  Internally we check
 *	against SPA_MINREF, but present the image of a zero/non-zero value
 *	to consumers.
 *
 * spa_config_lock (per-spa crazy rwlock)
 *
 *	This SPA special is a recursive rwlock, capable of being acquired from
 *	asynchronous threads.  It protects the spa_t from config changes,
 *	and must be held in the following circumstances:
 *
 *		- RW_READER to perform I/O to the spa
 *		- RW_WRITER to change the vdev config
 *
 * spa_config_cache_lock (per-spa mutex)
 *
 *	This mutex prevents the spa_config nvlist from being updated.  No
 *	other locks are required to obtain this lock, although implicitly you
 *	must have the namespace lock or non-zero refcount to have any kind
 *	of spa_t pointer at all.
 *
 * The locking order is fairly straightforward:
 *
 *		spa_namespace_lock	->	spa_refcount
 *
 *	The namespace lock must be acquired to increase the refcount from 0
 *	or to check if it is zero.
 *
 *		spa_refcount		->	spa_config_lock
 *
 *	There must be at least one valid reference on the spa_t to acquire
 *	the config lock.
 *
 *		spa_namespace_lock	->	spa_config_lock
 *
 *	The namespace lock must always be taken before the config lock.
 *	(An illustrative usage sketch of this ordering follows this
 *	comment block.)
 *
 * The spa_namespace_lock and spa_config_cache_lock can be acquired directly
 * and are globally visible.
 *
 * The namespace is manipulated using the following functions, all of which
 * require the spa_namespace_lock to be held.
 *
 *	spa_lookup()		Lookup a spa_t by name.
 *
 *	spa_add()		Create a new spa_t in the namespace.
 *
 *	spa_remove()		Remove a spa_t from the namespace.  This also
 *				frees up any memory associated with the spa_t.
 *
 *	spa_next()		Returns the next spa_t in the system, or the
 *				first if NULL is passed.
 *
 *	spa_evict_all()		Shutdown and remove all spa_t structures in
 *				the system.
 *
 *	spa_guid_exists()	Determine whether a pool/device guid exists.
 *
 * The spa_refcount is manipulated using the following functions:
 *
 *	spa_open_ref()		Adds a reference to the given spa_t.  Must be
 *				called with spa_namespace_lock held if the
 *				refcount is currently zero.
 *
 *	spa_close()		Remove a reference from the spa_t.  This will
 *				not free the spa_t or remove it from the
 *				namespace.  No locking is required.
 *
 *	spa_refcount_zero()	Returns true if the refcount is currently
 *				zero.  Must be called with spa_namespace_lock
 *				held.
 *
 * The spa_config_lock is manipulated using the following functions:
 *
 *	spa_config_enter()	Acquire the config lock as RW_READER or
 *				RW_WRITER.  At least one reference on the
 *				spa_t must exist.
 *
 *	spa_config_exit()	Release the config lock.
 *
 *	spa_config_held()	Returns true if the config lock is currently
 *				held in the given state.
 *
 * The vdev configuration is protected by spa_vdev_enter() / spa_vdev_exit().
 *
 *	spa_vdev_enter()	Acquire the namespace lock and the config lock
 *				for writing.
 *
 *	spa_vdev_exit()		Release the config lock, wait for all I/O
 *				to complete, sync the updated configs to the
 *				cache, and release the namespace lock.
 *
 * The spa_name() function also requires either the spa_namespace_lock
 * or the spa_config_lock, as both are needed to do a rename.  spa_rename() is
 * also implemented within this file since it requires manipulation of the
 * namespace.
 */
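/*
 * Illustrative sketch, not part of the original source: one way a reader-side
 * consumer might honor the locking order described above.  The helper name
 * example_read_pool() is hypothetical, and the exact signatures of
 * spa_config_enter()/spa_config_exit() (in particular whether they take a
 * tag argument in this revision) are assumptions.
 */
#include <sys/zfs_context.h>
#include <sys/spa.h>

static int
example_read_pool(const char *name)
{
	spa_t *spa;
	int error;

	/*
	 * spa_open() acquires spa_namespace_lock internally, so the refcount
	 * can legally be raised from zero on the caller's behalf.
	 */
	if ((error = spa_open(name, &spa, FTAG)) != 0)
		return (error);

	/*
	 * With at least one reference held, the config lock may be taken as
	 * RW_READER for the duration of the I/O; this respects the
	 * spa_refcount -> spa_config_lock ordering.
	 */
	spa_config_enter(spa, RW_READER, FTAG);

	/* ... perform I/O against the pool here ... */

	spa_config_exit(spa, FTAG);
	spa_close(spa, FTAG);
	return (0);
}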
#define	SPA_MINREF	5	/* spa_refcnt for an open-but-idle pool */

/*
 * ==========================================================================
 * SPA namespace functions
 * ==========================================================================
 */

/*
 * Lookup the named spa_t in the AVL tree.  The spa_namespace_lock must be
 * held.  Returns NULL if no matching spa_t is found.
 */

/*
 * Create an uninitialized spa_t with the given name.  Requires
 * spa_namespace_lock.  The caller must ensure that the spa_t doesn't already
 * exist by calling spa_lookup() first.
 */

	/*
	 * Set the alternate root, if there is one.
	 */

/*
 * Removes a spa_t from the namespace, freeing up any memory used.  Requires
 * spa_namespace_lock.  This is called only after the spa_t has been closed
 * and deactivated.
 */

/*
 * Given a pool, return the next pool in the namespace, or NULL if there is
 * none.  If 'prev' is NULL, return the first pool.
 */

/*
 * ==========================================================================
 * SPA refcount functions
 * ==========================================================================
 */

/*
 * Add a reference to the given spa_t.  Must have at least one reference, or
 * have the namespace lock held.
 */

/*
 * Remove a reference to the given spa_t.  Must have at least one reference,
 * or have the namespace lock held.
 */

/*
 * Check to see if the spa refcount is zero.  Must be called with
 * spa_namespace_lock held.  We really compare against SPA_MINREF, which is
 * the number of references acquired when opening a pool.
 */

/*
 * ==========================================================================
 * SPA config locking
 * ==========================================================================
 */

/*
 * Acquire the config lock.  The config lock is a special rwlock that allows
 * for recursive enters.  Because these enters come from the same thread as
 * well as asynchronous threads working on behalf of the owner, we must
 * unilaterally allow all reads access as long as at least one reader is held
 * (even if a write is requested).  This has the side effect of write
 * starvation, but write locks are extremely rare, and a solution to this
 * problem would be significantly more complex (if even possible).
 */

	/*
	 * We would like to assert that the namespace lock isn't held, but
	 * this is a valid use during create.
	 */

/*
 * Release the spa config lock, notifying any waiters in the process.
 */

/*
 * Returns true if the config lock is held in the given manner.
 */

/*
 * ==========================================================================
 * SPA vdev locking
 * ==========================================================================
 */

/*
 * Lock the given spa_t for the purpose of adding or removing a vdev.
 * Grabs the global spa_namespace_lock plus the spa config lock for writing.
 * It returns the next transaction group for the spa_t.
 */

	/*
	 * Suspend scrub activity while we mess with the config.
	 */

/*
 * Unlock the spa_t after adding or removing a vdev.  Besides undoing the
 * locking of spa_vdev_enter(), we also want to make sure the transactions
 * have synced to disk, and then update the global configuration cache with
 * the new information.  (An illustrative usage sketch of the
 * spa_vdev_enter()/spa_vdev_exit() pattern appears below, after the
 * miscellaneous-function comments.)
 */

	/*
	 * If the config changed, notify the scrub thread that it must restart.
	 */

	/*
	 * Allow scrubbing to resume.
	 */

	/*
	 * Note: this txg_wait_synced() is important because it ensures
	 * that there won't be more than one config change per txg.
	 * This allows us to use the txg as the generation number.
	 */

	/*
	 * If the config changed, update the config cache.
	 */

/*
 * ==========================================================================
 * Miscellaneous functions
 * ==========================================================================
 */

	/*
	 * Lookup the spa_t and grab the config lock for writing.
	 * We need to actually open the pool so that we can sync out the
	 * necessary labels.  It's OK to call spa_open() with the namespace
	 * lock held because we allow recursive calls for other reasons.
	 */

	/*
	 * Sync all labels to disk with the new names by marking the root vdev
	 * dirty and waiting for it to sync.  It will pick up the new pool
	 * name during the sync.
	 */

	/*
	 * Sync the updated config cache.
	 */

/*
 * Determine whether a pool with given pool_guid exists.  If device_guid is
 * non-zero, determine whether the pool exists *and* contains a device with
 * the specified device_guid.
 */

		    "DVA[%d]=<%llu:%llx:%llx> ", d,
"%s %s %s %s birth=%llu fill=%llu cksum=%llx:%llx:%llx:%llx",
/*
 * ==========================================================================
 * Accessor functions
 * ==========================================================================
 */

/*
 * Accessing the name requires holding either the namespace lock or the
 * config lock, both of which are required to do a rename.
 */

/*
 * In the future, this may select among different metaslab classes
 * depending on the zdp.  For now, there's no such distinction.
 */

/*
 * Return pool-wide allocated space.
 */

/*
 * For now, the worst case is 512-byte RAID-Z blocks, in which
 * case the space requirement is exactly 2x; so just assume that.
 * Add to this the fact that we can have up to 3 DVAs per bp, and
 * we have to multiply by a total of 6x.  (A worked sketch of this
 * 6x factor appears at the end of the file.)
 */

/*
 * As of ZFS_VERSION == ZFS_VERSION_DITTO_BLOCKS, we are able to
 * handle BPs with more than one DVA allocated.  Set our max
 * replication level accordingly.
 */

/*
 * ==========================================================================
 * Initialization and Termination
 * ==========================================================================
 */
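/*
 * Illustrative sketch, not part of the original source: a worked version of
 * the 2x RAID-Z worst case times 3 DVAs per block pointer described in the
 * worst-case space comment above.  The helper name and the use of a plain
 * multiplication are assumptions.
 */
static uint64_t
example_worst_case_asize(uint64_t lsize)
{
	/*
	 * 512-byte RAID-Z blocks can require 2x the logical size, and a
	 * block pointer can carry up to 3 DVAs, so the worst case is
	 * 2 * 3 = 6 times the logical size; e.g. a 128K logical block may
	 * consume up to 768K of allocated space.
	 */
	return (lsize * 6);
}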