dsl_dataset.c revision ca45db4129beff691dc46576c328149443788af2
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Figure out how much of this delta should be propagated to the
 * dsl_dir layer.  If there's a refreservation, that space has already
 * been partially accounted for in our ancestors.
 */

/* It could have been compressed away to nothing */

/* Account for the meta-objset space in its placeholder dataset. */

/* No block pointer => nothing to free */

/* Account for the meta-objset space in its placeholder dataset. */

/* if (bp->blk_birth > prev prev snap txg) prev unique += bs */

/*
 * The snapshot creation could fail, but that would cause an
 * incorrect FALSE return, which would only result in an
 * overestimation of the amount of space that an operation would
 * consume, which is completely safe.
 *
 * There's also a small window where we could miss a pending
 * snapshot, because we could set the sync task in the quiescing
 * phase.  So this should only be used as a guess.
 */

/* we don't really need to close the blist if we ... */

/*
 * In sync context, we're called with either no lock
 * or with the write lock.  If we're not syncing,
 * we're always called with the read lock held.
 */

/*
 * In syncing context we don't want the rwlock lock: there
 * may be an existing writer waiting for sync phase to
 * finish.  We don't need to worry about such writers, since
 * sync phase is single-threaded, so the writer can't be
 * doing anything while we are active.
 */
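The refreservation comment above can be made concrete. Below is a minimal sketch of that arithmetic; `parent_delta`, its parameters, and the use of plain integers in place of the dataset's on-disk fields are illustrative assumptions, not the kernel's actual interface:

```c
#include <stdint.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Sketch (hypothetical names): how much of a delta-byte change in a
 * dataset's unique space should be charged to the dsl_dir layer.
 * Space up to the refreservation was already accounted for in our
 * ancestors when the reservation was set, so only the portion above
 * the reservation propagates.  Assumes unique_bytes + delta >= 0.
 */
int64_t
parent_delta(uint64_t unique_bytes, uint64_t reserved, int64_t delta)
{
	uint64_t old_charged = MAX(unique_bytes, reserved);
	uint64_t new_charged = MAX(unique_bytes + delta, reserved);

	return ((int64_t)(new_charged - old_charged));
}
```

With no reservation the full delta passes through; a delta that stays entirely under the reservation propagates nothing upward.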
/*
 * Normal users will hold the ds_rwlock as a READER until they
 * are finished (i.e., call dsl_dataset_rele()).  "Owners" will
 * drop their READER lock after they set the ds_owner field.
 *
 * If the dataset is being destroyed, the destroy thread will
 * obtain a WRITER lock for exclusive access after it's done its
 * open-context work and then change the ds_owner to
 * dsl_reaper once destruction is assured.  So threads
 * may block here temporarily, until the "destructability" of
 * the dataset is determined.
 */

/* we may be looking for a snapshot */

/*
 * We use a "recursive" mutex so that we
 * can call dprintf_ds() with ds_lock held.
 */

	++result;	/* adding one for the @-sign */

/*
 * Destroy 'snapname' in all descendants of 'fsname'.
 */

/*
 * Return the file system name that triggered the error.
 */

/*
 * If we're removing a clone, and these three conditions are true:
 *	1) the clone's origin has no other children
 *	2) the clone's origin has no user references
 *	3) the clone's origin has been marked for deferred destruction
 * Then, prepare to remove the origin as part of this sync task group.
 */

/*
 * ds must be opened as OWNER.  On return (whether successful or not),
 * ds will be closed and caller can no longer dereference it.
 */

/* Destroying a snapshot is simpler */

/* NOTE: defer is always B_FALSE for non-snapshots */

/*
 * Check for errors and mark this ds as inconsistent, in
 * case we crash while freeing the objects.
 */

/*
 * remove the objects in open context, so that we won't
 * have too much to do in syncing context.
 */

/*
 * Ignore errors; if there is not enough disk space
 * we will deal with it in dsl_dataset_destroy_sync().
 */

/*
 * We need to sync out all in-flight IO before we try to evict
 * (the dataset evict func is trying to clear the cached entries
 * for this dataset in the ARC).
 */

/*
 * If we managed to free all the objects in open
 * context, the user space accounting should be zero.
 */

/*
 * We need to sync out all in-flight IO before we try
 * to evict (the dataset evict func is trying to clear
 * the cached entries for this dataset in the ARC).
 */

/*
 * Blow away the dsl_dir + head dataset.
 */

/*
 * If we're removing a clone, we might also need to remove its
 * origin.
 */

/*
 * We could be racing against 'zfs release' or 'zfs destroy -d'
 * on the origin snap, in which case we can get EBUSY if we
 * needed to destroy the origin snap but were not ready to ...
 */

/* if it is successful, dsl_dir_destroy_sync will close the dd */

/* If it's the meta-objset, set dp_meta_rootbp */
	if (ds == NULL) {
		/* this is the meta-objset */
	}

	panic("dirtying snapshot!");
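A recurring piece of bookkeeping in this file's comments is the head dataset's unique space: current usage minus whatever the most recent snapshot still pins, where the latter is the snapshot's total used space minus everything freed since it was taken. A toy sketch of that identity, with hypothetical names and plain integers standing in for the on-disk fields:

```c
#include <stdint.h>

/*
 * Sketch: bytes unique to the head dataset.
 *   unique = used_now - (snap_used - freed_since_snap)
 * where (snap_used - freed_since_snap) is the portion of the most
 * recent snapshot still referenced by this file system.
 */
uint64_t
head_unique_bytes(uint64_t used_now, uint64_t snap_used,
    uint64_t freed_since_snap)
{
	uint64_t snap_still_in_use = snap_used - freed_since_snap;

	return (used_now - snap_still_in_use);
}
```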
/* up the hold count until we can be written out */

/*
 * The unique space in the head dataset can be calculated by subtracting
 * the space used in the most recent snapshot that is still being used
 * in this file system from the space currently in use.  To figure out
 * the space in the most recent snapshot still in use, we need to take
 * the total space used in the snapshot and subtract out the space that
 * has been freed up since the snapshot was taken.
 */

/*
 * It's a block in the intent log.  It has no
 * accounting, so just free it.
 */

/*
 * Can't delete a head dataset if there are snapshots of it.
 * (Except if the only snapshots are from the branch we cloned from.)
 */

/*
 * This is really a dsl_dir thing, but check it here so that
 * we'll be less likely to leave this dataset inconsistent & ...
 */

/* Mark it as inconsistent on-disk, in case we crash */

/*
 * If we're not prepared to remove the origin, don't remove ...
 */

/*
 * If we're not going to remove the origin after all,
 * undo the open context setup.
 */

/* we have an owner hold, so no one else can destroy us */

/*
 * Only allow deferred destroy on pools that support it.
 * NOTE: deferred destroy is only supported on snapshots.
 */

/*
 * Can't delete a head dataset if there are snapshots of it.
 * (Except if the only snapshots are from the branch we cloned from.)
 */

/* If we made changes this txg, traverse_dsl_dataset won't find ... */

/*
 * If this snapshot has an elevated user reference count,
 * we can't destroy it yet.
 */

/*
 * Can't delete a branch point.  However, if we're destroying
 * a clone and removing its origin due to it having a user
 * hold count of 0 and having been marked for deferred destroy,
 * it's OK for the origin to have a single clone.
 */

/* XXX we should do some i/o error checking... */

/* signal any waiters that this dataset is going away */

/* Remove our reservation */

/* This clone is toast. */

/*
 * If the clone's origin has no other clones, no
 * user holds, and has been marked for deferred
 * deletion, then we should have done the necessary ...
 */

/*
 * Transfer to our deadlist (which will become next's
 * new deadlist) any entries from next's current
 * deadlist which were born before prev, and free the ...
 */

/* XXX we're doing this long task with the config lock held */

/* XXX check return value? */

/* free next's deadlist */

/* set next's deadlist to our deadlist */

/*
 * Update next's unique to include blocks which
 * were previously shared by only this snapshot
 * and it.  Those blocks will be born after the
 * prev snap and before this snap, and will have
 * died after the next snap and before the one
 * after that (ie. be on the snap after next's
 * deadlist).
 */

/* XXX we're doing this long task with the config lock held */

/*
 * Reduce the amount of our unconsumed refreservation
 * being charged to our parent by the amount of
 * new unique data we have gained.
 */

/*
 * There's no next snapshot, so this is a head dataset.
 * Destroy the deadlist.  Unless it's a clone, the
 * deadlist should be empty.  (If it's a clone, it's
 * safe to ignore the deadlist contents.)
 */

/*
 * Free everything that we point to (that's born after
 * the previous snapshot, if we are a clone).
 *
 * NB: this should be very quick, because we already
 * freed all the objects in open context.
 */

/* Erase the link in the dir */

/* remove from snapshot namespace */

/*
 * Remove the origin of the clone we just destroyed.
 */

/*
 * If there's an fs-only reservation, any blocks that might become
 * owned by the snapshot dataset must be accommodated by space
 * outside of the reservation.
 */

/*
 * Propagate any reserved space for this snapshot to other
 * snapshot checks in this sync group.
 */

/*
 * We don't allow multiple snapshots of the same txg.  If there
 * is already one, try again.
 */

/* Check for a conflicting snapshot name. */

/*
 * Check that the dataset's name is not too long.  Name consists
 * of the dataset's length + 1 for the @-sign + snapshot name's length.
 */

/* The origin's ds_creation_txg has to be < TXG_INITIAL */

/*
 * If we have a reference-reservation on this dataset, we will
 * need to increase the amount of refreservation being charged
 * since our unique space is going to zero.
 */
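The name-length rule described in the comments (dataset name's length + 1 for the @-sign + snapshot name's length) translates directly into a check. The sketch below assumes a 256-byte limit on the full `dataset@snapshot` name; the function name and constant are illustrative, not the kernel's:

```c
#include <string.h>
#include <errno.h>

#define SKETCH_MAXNAMELEN 256	/* assumed limit on the full name */

/*
 * The full snapshot name is "<dataset>@<snapname>", so its length is
 * the dataset name's length, plus 1 for the @-sign, plus the snapshot
 * name's length.  Returns 0 if it fits, ENAMETOOLONG otherwise.
 */
int
check_snapname_len(const char *dsname, const char *snapname)
{
	if (strlen(dsname) + 1 + strlen(snapname) >= SKETCH_MAXNAMELEN)
		return (ENAMETOOLONG);
	return (0);
}
```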
/*
 * in case we had to change ds_fsid_guid when we opened it,
 * sync it out now.
 */

/*
 * This is a snapshot; override the dd's space used with
 * our unique space and compression ratio.
 */

/* clone origin is really a dsl_dir thing... */

/* Adjust available bytes according to refquota */

/* new name better not be in use */

/* dataset name + 1 for the "@" + the new snapshot name must fit */

/*
 * For recursive snapshot renames the parent won't be changing
 * so we just pass name for both the to/from argument.
 */

/* For all filesystems undergoing rename, we'll need to unmount them. */

/* truncate the snapshot name to get the fsname */

/*
 * If there are more than 2 references there may be holds
 * hanging around that haven't been cleared out yet.
 */

/* if we're growing, validate child name lengths */

/* the name ended in a nonexistent component */

/* new name must be a snapshot in the same filesystem */

/* Check that it is a real clone */

/* Since this is so expensive, don't do the preliminary check */

/* compute origin's new unique space */

/*
 * Walk the snapshots that we are moving.
 *
 * Compute space to transfer.  Consider the incremental changes
 * to used for each snapshot:
 * (my used) = (prev's used) + (blocks born) - (blocks killed)
 * So each snapshot gave birth to:
 * (blocks born) = (my used) - (prev's used) + (blocks killed)
 * So a sequence would look like:
 * (uN - u(N-1) + kN) + ... + (u1 - u0 + k1) + (u0 - 0 + k0)
 * Which simplifies to:
 * uN + kN + kN-1 + ... + k1 + k0
 * Note however, if we stop before we reach the ORIGIN we get:
 * uN + kN + kN-1 + ... + kM - uM-1
 */

/* Check that the snapshot name does not conflict */

/* The very first snapshot does not have a deadlist */

/*
 * If we are a clone of a clone then we never reached ORIGIN,
 * so we need to subtract out the clone origin's used space.
 */

/* Check that there is enough space here */

/*
 * Compute the amounts of space that will be used by snapshots
 * after the promotion (for both origin and clone).  For each,
 * it is the amount of space that will be on all of their
 * deadlists (that was not born before their new origin).
 */

/*
 * Note, typically this will not be a clone of a clone,
 * so snap->ds->ds_origin_txg will be < TXG_INITIAL, so
 * these snaplist_space() -> bplist_space_birthrange()
 * calls will be fast because they do not have to ...
 */

/* We need to explicitly open odd, since origin_ds's dd will be changing. */

/* change origin's next snap */

/* change the origin's next clone */

/* move snapshots to this dir */

/* unregister props as dsl_dir is changing */

/* move snap name entry */

/* change containing dsl_dir */

/*
 * Change space accounting.
 * Note, pa->*usedsnap and dd_used_breakdown[SNAP] will either
 * both be valid, or both be 0 (resulting in delta == 0).  This
 * is true for each of {clone,origin} independently.
 */

/*
 * Make a list of dsl_dataset_t's for the snapshots between first_obj
 * (exclusive) and last_obj (inclusive).  The list will be in reverse
 * order (last_obj will be the list_head()).  If first_obj == 0, do all
 * snapshots back to this dataset's origin.
 */

/* lost race with snapshot destroy */

/*
 * Promote a clone.  Nomenclature note:
 * "clone" or "cds": the original clone which is being promoted
 * "origin" or "ods": the snapshot which is originally the clone's origin
 * "origin head" or "ohds": the dataset which is the head ...
 * "origin origin": the origin of the origin's filesystem (typically
 * NULL, indicating that the clone is not a clone of a clone).
 */

/*
 * We are going to inherit all the snapshots taken before our
 * origin (i.e., our new origin will be our parent's origin).
 */
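The space-to-transfer derivation above (born = used - prev's used + killed, which telescopes) can be checked with a small sketch; the arrays and function are hypothetical, standing in for the kernel's walk over snapshot deadlists:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Sum of (blocks born) over snapshots M..N, where for snapshot i:
 *   born[i] = used[i] - used[i-1] + killed[i]
 * The sum telescopes to:
 *   used[N] + killed[N] + ... + killed[M] - used[M-1]
 * with used[M-1] taken as 0 when we go all the way back to the ORIGIN.
 */
uint64_t
blocks_born_sum(const uint64_t *used, const uint64_t *killed,
    size_t m, size_t n)
{
	uint64_t sum = used[n] - ((m == 0) ? 0 : used[m - 1]);
	size_t i;

	for (i = m; i <= n; i++)
		sum += killed[i];
	return (sum);
}
```

With used = {10, 15, 12} and killed = {2, 3, 6}, the full sum (m = 0) is 12 + 2 + 3 + 6 = 23, matching the per-snapshot births 12 + 8 + 3; stopping at m = 1 gives 12 + 3 + 6 - 10 = 11.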
/*
 * Take ownership of them so that we can rename them into our
 * namespace.
 */

/*
 * Add in 128x the snapnames zapobj size, since we will be moving
 * a bunch of snapnames to the promoted ds, and dirtying their
 * bonus buffers.
 */

/* they should both be heads */

/* the branch point should be just before them */

/* cds should be the clone (unless they are unrelated) */

/* the clone should be a child of the origin */

/* ohds shouldn't be modified unless 'force' */

/* adjust amount of any unconsumed refreservation */

/* Reset origin's unique bytes, if it exists. */

/*
 * The difference in the space used by snapshots is the
 * difference in snapshot space due to the head's
 * deadlist (since that's the only thing that's
 * changing that affects the snapused).
 */

/* apply any parent delta for change in unconsumed refreservation */

/*
 * Swap 'clone' with its origin head dataset.  Used at the end of "zfs
 * recv" into an existing fs to swizzle the file system to the new
 * version, and by "zfs rollback".  Can also be used to swap two
 * independent head datasets if neither has any snapshots.
 */

/* Need exclusive access for the swap */

/*
 * Given a pool name and a dataset object number in that pool,
 * return the name of that dataset.
 */

/*
 * *ref_rsrv is the portion of asize that will come from any
 * unconsumed refreservation space.
 */

/* Make a space adjustment for reserved bytes. */

/*
 * If they are requesting more space, and our current estimate
 * is over quota, they get to try again unless the actual
 * on-disk is over quota and there are no pending changes (which
 * may free up space for us).
 */

/*
 * If someone removes a file, then tries to set the quota, we
 * want to make sure the file freeing takes effect.
 */

/*
 * If we are doing the preliminary check in open context, the
 * space estimates may be inaccurate.
 */

/* tags must be unique */

/*
 * This is the first user hold for this dataset.  Create
 * the userrefs zap object.
 */

/* alloc a buffer to hold dsname@snapname plus terminating NULL */

/* The tag can't possibly exist */

/* Make sure the tag exists */

/*
 * If we're not prepared to remove the snapshot,
 * we can't allow the release to happen right now.
 */

/* We already did the destroy_check */

/* alloc a buffer to hold dsname@snapname, plus the terminating NULL */

/*
 * Called at spa_load time to release a stale temporary user hold.
 */

/*
 * Note, this function is used as the callback for dmu_objset_find().  We
 * always return 0 so that we will continue to find and process
 * inconsistent datasets, even if we encounter an error trying to
 * process one of them.
 */
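The quota comment above ("they get to try again unless the actual on-disk is over quota and there are no pending changes") describes a three-way decision. A toy version of that logic, with hypothetical names and a made-up retry code standing in for the kernel's try-again signaling:

```c
#include <stdbool.h>
#include <stdint.h>
#include <errno.h>

#define SKETCH_ERESTART (-1)	/* made-up "try again next txg" code */

/*
 * Decide whether a request for `asize` more bytes fits under `quota`.
 * If only the (possibly inaccurate) estimate is over quota, or there
 * are pending changes that may free space, let the caller retry.
 */
int
check_quota(uint64_t est_used, uint64_t on_disk_used, uint64_t asize,
    uint64_t quota, bool pending_changes)
{
	if (est_used + asize <= quota)
		return (0);
	if (on_disk_used + asize > quota && !pending_changes)
		return (EDQUOT);	/* definitely over quota */
	return (SKETCH_ERESTART);	/* space may free up; retry */
}
```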