/* dsl_scan.c, revision 5f37736ac8f99922368294d745d3fefa22b49d11 */
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright (c) 2011, 2014 by Delphix. All rights reserved.
 */

/* max number of blocks to free in a single TXG */

/* the order has to match pool_scan_type */

/*
 * It's possible that we're resuming a scan after a reboot, so
 * make sure that the scan_async_destroying flag is initialized
 * appropriately.
 */

/*
 * There was an old-style scrub in progress.  Restart a
 * new-style scrub from the beginning.
 */
	    "restarting new-style scrub in txg %llu",
/*
 * Load the queue obj from the old location so that it
 * can be freed by dsl_scan_done().
 */

/*
 * A new-type scrub was in progress on an old
 * pool, and the pool was accessed by old
 * software.  Restart from the beginning, since
 * the old software may have changed the pool in
 * the meantime.
 */
	    "by old software; restarting in txg %llu",
/* rewrite all disk labels */

/*
 * If this is an incremental scrub, limit the DDT scrub phase
 * to just the auto-ditto class (for correctness); the rest
 * of the scrub should go faster using top-down pruning.
 */

/* back to the generic stuff */
	    "func=%u mintxg=%llu maxtxg=%llu",
	    "scrub_ddt_bookmark",
	    "scrub_ddt_class_max",
/* Remove any remnants of an old-style scrub. */

/*
 * If we were "restarted" from a stopped state, don't bother
 * with anything else.
 */

/*
 * If the scrub/resilver completed, update all DTLs to
 * reflect this.  Whether it succeeded or not, vacate
 * all temporary scrub DTLs.
 */

/*
 * We may have finished replacing a device.
 * Let the async thread assess this and handle the detach.
 */

/* We only know how to resume from level-0 blocks. */
	dprintf(
"pausing at bookmark %llx/%llx/%llx/%llx\n",
	dprintf(
"pausing at DDT bookmark %llx/%llx/%llx/%llx\n",
/*
 * One block ("stubby") can be allocated a long time ago; we
 * want to visit that one because it has been allocated
 * (on-disk) even if it hasn't been claimed (even though for
 * scrub there's nothing to do to it).
 */

/*
 * birth can be < claim_txg if this record's txg is
 * already txg sync'ed (but this log block contains
 * other records that are not synced)
 */

/*
 * We only want to visit blocks that have been claimed but not yet
 * replayed (or, in read-only mode, blocks that *would* be claimed).
 */

/*
 * If we already visited this bp & everything below (in
 * a prior txg sync), don't bother doing it again.
 */

/*
 * If we found the block we're trying to resume from, or
 * we went past it to a different object, zero it out to
 * indicate that it's OK to start checking for pausing
 * again.
 */

/*
 * Return nonzero on i/o error.
 * Return new buf to write out in *bufp.
 */

/*
 * We always visit user/group accounting
 * objects, and never skip them, even if we are
 * pausing.  This is necessary so that the space
 * deltas from this txg get integrated.
 */

/*
 * The arguments are in this order because mdb can only print the
 * first 5; we want them to be useful.
 */

/* ASSERT(pbuf == NULL || arc_released(pbuf)); */
	    "visiting ds=%p/%llu zb=%llx/%llx/%llx/%llx bp=%p",
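The claim/birth rule for ZIL blocks described above reduces to a small predicate. This is a hedged sketch: the function name and the flattened arguments are illustrative, not the real dsl_scan interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * A log block is worth visiting only if the ZIL was claimed
 * (claim_txg != 0) and the block was born at or after the claim txg.
 * birth < claim_txg means the record's txg already synced out, so
 * there is nothing left to do for it.  (Illustrative sketch only.)
 */
static int
zil_block_needs_visit(uint64_t birth_txg, uint64_t claim_txg)
{
	return (claim_txg != 0 && birth_txg >= claim_txg);
}
```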
/*
 * If dsl_scan_ddt() has already visited this block, it will have
 * already done any translations or scrubbing, so don't call the
 * callback again.
 */

/*
 * If this block is from the future (after cur_max_txg), then we
 * are doing this on behalf of a deleted snapshot, and we will
 * revisit the future block on the next pass of this dataset.
 * Don't scan it now unless we need to because something
 * under it was modified.
 */

/* Note, scn_cur_{min,max}_txg stays the same. */
	    "reset zb_objset to %llu",
	    "reset bookmark to -1,0,0,0",
/*
 * We keep the same mintxg; it could be >
 * ds_creation_txg if the previous snapshot was
 * deleted too.
 */
	    "replacing with %llu",
zfs_dbgmsg(
"destroying ds %llu; in queue; removing",
/*
 * dsl_scan_sync() should be called after this, and should sync
 * out our changed state, but just to be safe, do it here.
 */
	zfs_dbgmsg(
"snapshotting ds %llu; currently traversing; " "reset zb_objset to %llu",
zfs_dbgmsg(
"clone_swap ds %llu; currently traversing; " "reset zb_objset to %llu",
zfs_dbgmsg(
"clone_swap ds %llu; currently traversing; " "reset zb_objset to %llu",
/* Both were there to begin with */

/*
 * Only the ZIL in the head (non-snapshot) is valid.  Even though
 * snapshots can have ZIL block pointers (which may be the same
 * BP as in the head), they must be ignored.  So we traverse the
 * ZIL here, rather than in scan_recurse(), because the regular
 * snapshot block-sharing rules don't apply to it.
 */

/*
 * Iterate over the bps in this ds.
 */
	zfs_dbgmsg(
	    "scanned dataset %llu (%s) with min=%llu max=%llu; "

/*
 * We've finished this pass over this dataset.
 */

/*
 * If we did not completely visit this dataset, do another pass.
 */

/*
 * Add descendant datasets to work queue.
 */

/*
 * A bug in a previous version of the code could
 * cause upgrade_clones_cb() to not set
 * ds_next_snap_obj when it should, leading to a
 * missing entry.  Therefore we can only use the
 * next_clones_obj when its count is correct.
 */

/*
 * If this is a clone, we don't need to worry about it for now.
 */

/*
 * If there are N references to a deduped block, we don't want to scrub it
 * N times -- ideally, we should scrub it exactly once.
 *
 * We leverage the fact that the dde's replication class (enum ddt_class)
 * is ordered from highest replication class (DDT_CLASS_DITTO) to lowest
 * (DDT_CLASS_UNIQUE) so that we may walk the DDT in that order.
 *
 * To prevent excess scrubbing, the scrub begins by walking the DDT
 * to find all blocks with refcnt > 1, and scrubs each of these once.
 * Since there are two replication classes which contain blocks with
 * refcnt > 1, we scrub the highest replication class (DDT_CLASS_DITTO) first.
 * Finally the top-down scrub begins, only visiting blocks with refcnt == 1.
 *
 * There would be nothing more to say if a block's refcnt couldn't change
 * during a scrub, but of course it can, so we must account for changes
 * in a block's replication class.
 *
 * Here's an example of what can occur:
 * If a block has refcnt > 1 during the DDT scrub phase, but has refcnt == 1
 * when visited during the top-down scrub phase, it will be scrubbed twice.
 * This negates our scrub optimization, but is otherwise harmless.
 *
 * If a block has refcnt == 1 during the DDT scrub phase, but has refcnt > 1
 * on each visit during the top-down scrub phase, it will never be scrubbed.
 * To catch this, ddt_sync_entry() notifies the scrub code whenever a block's
 * reference class transitions to a higher level (i.e., DDT_CLASS_UNIQUE to
 * DDT_CLASS_DUPLICATE); if it transitions from refcnt == 1 to refcnt > 1
 * while a scrub is in progress, it scrubs the block right then.
 */
	dprintf(
	    "visiting ddb=%llu/%llu/%llu/%llx\n",
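The two-phase dedup-aware scrub described in the comment above can be sketched as follows. `struct dde` and the phase helpers here are simplified stand-ins for the real DDT structures and walk interfaces, not actual ZFS code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the on-disk DDT classes, highest first. */
enum ddt_class { DDT_CLASS_DITTO, DDT_CLASS_DUPLICATE, DDT_CLASS_UNIQUE };

struct dde {			/* hypothetical dedup-table entry */
	int refcnt;
	int scrubbed;		/* times this entry's block was scrubbed */
};

/*
 * Phase 1: walk the DDT (highest replication class down) and scrub
 * each block with refcnt > 1 exactly once.
 */
static void
ddt_scrub_phase(struct dde *entries, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (entries[i].refcnt > 1)
			entries[i].scrubbed++;
}

/*
 * Phase 2: the top-down traversal only issues scrub I/O for blocks
 * whose refcnt == 1, so deduped blocks are not scrubbed N times.
 */
static void
topdown_scrub_phase(struct dde *entries, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (entries[i].refcnt == 1)
			entries[i].scrubbed++;
}
```

With a stable refcnt, every block ends up scrubbed exactly once; the comment above covers the corner cases where the refcnt changes mid-scrub.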
/* There should be no pending changes to the dedup table */
	zfs_dbgmsg(
"scanned %llu ddt entries with class_max = %u; pausing=%u",
/* First do the MOS & ORIGIN */

/*
 * If we were paused, continue from here.  Note if the
 * ds we were paused on was deleted, the zb_objset may
 * be -1, so we will skip this and find a new objset
 * below.
 */

/*
 * In case we were paused right at the end of the ds, zero the
 * bookmark so we don't think that we're still trying to resume.
 */

/* keep pulling things out of the zap-object-as-queue */

/*
 * Check for scn_restart_txg before checking spa_load_state, so
 * that we can restart an old-style scan while the pool is being
 * imported (see dsl_scan_init).
 */

/*
 * If the scan is inactive due to a stalled async destroy, try again.
 */

/*
 * First process the async destroys.  If we pause, don't do
 * any scrubbing or resilvering.  This ensures that there are no
 * async destroys while we are scanning, so the scan code doesn't
 * have to worry about traversing it.  It is also faster to free the
 * blocks than to scrub them.
 */
	    "traverse_dataset_destroyed()",
err);
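The ordering described above (drain the async destroys before doing any scrub or resilver work in a txg) can be sketched as follows. `struct pool`, `sync_pass()`, and the per-txg free budget are hypothetical stand-ins, not the actual dsl_scan_sync() logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool state for illustrating the ordering above. */
struct pool {
	int pending_frees;	/* blocks queued for async destroy */
	bool scanning;		/* scrub/resilver ran this txg */
};

/*
 * Free queued blocks first (cheaper than scrubbing them, and it keeps
 * destroyed datasets out of the scan).  If the free budget runs out,
 * pause: no scrubbing or resilvering this txg.  Returns true when the
 * scan was allowed to proceed.
 */
static bool
sync_pass(struct pool *p, int max_frees_per_txg)
{
	while (p->pending_frees > 0 && max_frees_per_txg-- > 0)
		p->pending_frees--;
	if (p->pending_frees > 0)
		return (false);		/* paused on async destroys */
	p->scanning = true;
	return (true);
}
```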
/*
 * If we didn't make progress, mark the async destroy as
 * stalled, so that we will not initiate a spa_sync() on
 * its behalf.
 */

/* finished; deactivate async destroy feature */

/*
 * Write out changes to the DDT that may be required as a
 * result of the blocks freed.  This ensures that the DDT
 * is clean when a scrub/resilver runs.
 */

/*
 * We have finished background destroying, but there is still
 * some space left in the dp_free_dir.  Transfer this leaked
 * space to the dp_leak_dir.
 */

/* finished; verify that space accounting went to zero */

/* finished with scan. */
	    "ddt bm=%llu/%llu/%llu/%llx",
zfs_dbgmsg(
"doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu",
zfs_dbgmsg(
"txg %llu traversal complete, waiting till txg %llu",
/*
 * This will start a new scan, or restart an existing one.
 */

/*
 * If we resume after a reboot, zab will be NULL; don't record
 * incomplete stats in that case.
 */

	for (i = 0; i < 4; i++) {
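The `for (i = 0; i < 4; i++)` fragment above belongs to the block-statistics accounting: each examined block is counted under four (level, type) buckets — its own level and type, plus the "all levels" and "all types" totals. A minimal sketch, with hypothetical table dimensions in place of the real `DN_MAX_LEVELS`/`DMU_OT_TOTAL` constants:

```c
#include <assert.h>

#define MAX_LEVELS	8	/* illustrative stand-in for DN_MAX_LEVELS */
#define NUM_TYPES	4	/* illustrative stand-in for DMU_OT_NUMTYPES */

/* stats[level][type]; the last row/column hold the "all" totals */
static unsigned stats[MAX_LEVELS + 1][NUM_TYPES + 1];

/*
 * Count one block under four buckets: (lvl, type), (lvl, all),
 * (all, type), and (all, all) -- the same scheme as the original
 * 4-iteration accounting loop.
 */
static void
count_block(int lvl, int type)
{
	for (int i = 0; i < 4; i++) {
		int l = (i < 2) ? lvl : MAX_LEVELS;
		int t = (i & 1) ? type : NUM_TYPES;
		stats[l][t]++;
	}
}
```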
/* If it's an intent log block, failure is expected. */

/*
 * Keep track of how much data we've examined so that
 * zpool(1M) status can make useful progress reports.
 */

/* if it's a resilver, this may not be in the target range */

/*
 * Gang members may be spread across multiple
 * vdevs, so the best estimate we have is the
 * scrub range, which has already been checked.
 * XXX -- it would be better to change our
 * allocation policy to ensure that all
 * gang members reside on the same vdev.
 */

/*
 * If we're seeing recent (zfs_scan_idle) "important" I/Os
 * then throttle our workload to limit the impact of a scan.
 */

/* do not relocate this block */

/*
 * Purge all vdev caches and probe all devices.  We do this here
 * rather than in sync context because this requires a writer lock
 * on the spa_config lock, which we can't do from sync context.  The
 * spa_scrub_reopen flag indicates that vdev_open() should not
 * attempt to start another scrub.
 */
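The zfs_scan_idle throttle mentioned above can be sketched as a small delay policy: if "important" (non-scan) I/O was seen within the idle window, scan I/O is delayed. The parameter names and the flat millisecond arithmetic are assumptions for illustration, not the actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/*
 * Return how long (in ticks) the next scan I/O should be delayed.
 * If important I/O occurred within the last scan_idle window, the
 * pool is considered busy and scan I/O backs off; otherwise the scan
 * runs at full speed.  (Hypothetical sketch of the throttle policy.)
 */
static uint64_t
scan_io_delay(uint64_t now, uint64_t last_important_io,
    uint64_t scan_idle, uint64_t scrub_delay)
{
	bool busy = (now - last_important_io) < scan_idle;
	return (busy ? scrub_delay : 0);
}
```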