dsl_scan.c revision af3465da8fa420c4ec701e3e57704d537a6f755b
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright (c) 2011, 2014 by Delphix. All rights reserved.
 */

/* max number of blocks to free in a single TXG */

/* the order has to match pool_scan_type */

/*
 * It's possible that we're resuming a scan after a reboot so
 * make sure that the scan_async_destroying flag is initialized
 * appropriately.
 */

/*
 * There was an old-style scrub in progress.  Restart a
 * new-style scrub from the beginning.
 */
	"restarting new-style scrub in txg %llu",
/*
 * Load the queue obj from the old location so that it
 * can be freed by dsl_scan_done().
 */

/*
 * A new-type scrub was in progress on an old
 * pool, and the pool was accessed by old
 * software.  Restart from the beginning, since
 * the old software may have changed the pool in
 * the meantime.
 */
	"by old software; restarting in txg %llu",
/* rewrite all disk labels */

/*
 * If this is an incremental scrub, limit the DDT scrub phase
 * to just the auto-ditto class (for correctness); the rest
 * of the scrub should go faster using top-down pruning.
 */

/* back to the generic stuff */

	"func=%u mintxg=%llu maxtxg=%llu",
	"scrub_ddt_bookmark",
	"scrub_ddt_class_max",
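The incremental-scrub comment above says the DDT phase is capped at the auto-ditto class so the remaining work can be pruned top-down. A minimal sketch of that cap, using illustrative stand-ins (the enum values mirror `ddt_class` but this is not the real dsl_scan.c code):

```c
#include <assert.h>

/* Stand-in for the DDT replication classes, highest first. */
enum ddt_class { DDT_CLASS_DITTO = 0, DDT_CLASS_DUPLICATE, DDT_CLASS_UNIQUE };

/*
 * Sketch: choose how far the DDT phase of a scrub walks.  A full scrub
 * visits every class; an incremental scrub stops after the auto-ditto
 * class and leaves the rest to top-down pruning.  Hypothetical helper.
 */
static enum ddt_class
scrub_ddt_class_max(int incremental)
{
	return (incremental ? DDT_CLASS_DITTO : DDT_CLASS_UNIQUE);
}
```

The persisted `"scrub_ddt_class_max"` string above suggests this cap is stored with the rest of the scan state so a resumed scrub keeps the same bound.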
/* Remove any remnants of an old-style scrub. */

/*
 * If we were "restarted" from a stopped state, don't bother
 * with anything else.
 */

/*
 * If the scrub/resilver completed, update all DTLs to
 * reflect this.  Whether it succeeded or not, vacate
 * all temporary scrub DTLs.
 */

/*
 * We may have finished replacing a device.
 * Let the async thread assess this and handle the detach.
 */

/* We only know how to resume from level-0 blocks. */

	dprintf(
"pausing at bookmark %llx/%llx/%llx/%llx\n",
	dprintf(
"pausing at DDT bookmark %llx/%llx/%llx/%llx\n",
/*
 * One block ("stubby") can be allocated a long time ago; we
 * want to visit that one because it has been allocated
 * (on-disk) even if it hasn't been claimed (even though for
 * scrub there's nothing to do to it).
 */

/*
 * birth can be < claim_txg if this record's txg is
 * already txg sync'ed (but this log block contains
 * other records that are not synced)
 */

/*
 * We only want to visit blocks that have been claimed but not yet
 * replayed (or, in read-only mode, blocks that *would* be claimed).
 */

/*
 * If we already visited this bp & everything below (in
 * a prior txg sync), don't bother doing it again.
 */

/*
 * If we found the block we're trying to resume from, or
 * we went past it to a different object, zero it out to
 * indicate that it's OK to start checking for pausing
 * again.
 */

/*
 * Return nonzero on i/o error.
 * Return new buf to write out in *bufp.
 */

/*
 * objects, and never skip them, even if we are
 * pausing.  This is necessary so that the space
 * deltas from this txg get integrated.
 */

/*
 * The arguments are in this order because mdb can only print the
 * first 5; we want them to be useful.
 */

	/* ASSERT(pbuf == NULL || arc_released(pbuf)); */

	"visiting ds=%p/%llu zb=%llx/%llx/%llx/%llx buf=%p bp=%p",
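The ZIL comments above describe a visit predicate: a log block matters only if the log was claimed, and the block's birth txg shows it was not already carried forward by a normal txg sync. A hedged sketch of that predicate, with invented names (not the actual dsl_scan.c interface):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the rule described above: visit a ZIL block only if the
 * log was claimed (claim_txg != 0) and the block was born at or after
 * the claim.  A block with birth < claim_txg belongs to a txg that was
 * already synced, so there is nothing to scan.  Illustrative only.
 */
static int
zil_block_needs_visit(uint64_t birth_txg, uint64_t claim_txg)
{
	if (claim_txg == 0)	/* log never claimed: nothing to replay */
		return (0);
	return (birth_txg >= claim_txg);
}
```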
/*
 * If dsl_scan_ddt() has already visited this block, it will have
 * already done any translations or scrubbing, so don't call the
 * callback again.
 */

/*
 * If this block is from the future (after cur_max_txg), then we
 * are doing this on behalf of a deleted snapshot, and we will
 * revisit the future block on the next pass of this dataset.
 * Don't scan it now unless we need to because something
 * under it was modified.
 */

/* Note, scn_cur_{min,max}_txg stays the same. */

	"reset zb_objset to %llu",
	"reset bookmark to -1,0,0,0",

/*
 * We keep the same mintxg; it could be >
 * ds_creation_txg if the previous snapshot was
 * deleted too.
 */
	"replacing with %llu",

/*
 * dsl_scan_sync() should be called after this, and should sync
 * out our changed state, but just to be safe, do it here.
 */
	"reset zb_objset to %llu",
	"reset zb_objset to %llu",
	"reset zb_objset to %llu",

/* Both were there to begin with */
	"replacing with %llu",
	"replacing with %llu",
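The `"reset zb_objset"` / `"replacing with %llu"` fragments above come from the handlers that patch up the scan's resume bookmark when the dataset being traversed is destroyed, snapshotted, or clone-swapped. A minimal sketch of the retargeting idea, with an invented bookmark type (not the real `dsl_scan_ds_snapshotted()`):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the scan's persisted resume bookmark. */
struct scan_bookmark {
	uint64_t zb_objset;	/* objset (dataset) being traversed */
};

/*
 * Sketch: if the dataset we are mid-way through scanning is replaced
 * (e.g. snapshotted, so the blocks now live in the snapshot), point the
 * bookmark at the replacement; the rest of the bookmark is untouched so
 * traversal resumes in place.  Hypothetical helper.
 */
static void
scan_retarget_dataset(struct scan_bookmark *zb, uint64_t cur_ds,
    uint64_t new_ds)
{
	if (zb->zb_objset == cur_ds)
		zb->zb_objset = new_ds;	/* "reset zb_objset to %llu" */
}
```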
/*
 * Only the ZIL in the head (non-snapshot) is valid.  Even though
 * snapshots can have ZIL block pointers (which may be the same
 * BP as in the head), they must be ignored.  So we traverse the
 * ZIL here, rather than in scan_recurse(), because the regular
 * snapshot block-sharing rules don't apply to it.
 */

/*
 * Iterate over the bps in this ds.
 */

/*
 * We've finished this pass over this dataset.
 */

/*
 * If we did not completely visit this dataset, do another pass.
 */

/*
 * Add descendent datasets to work queue.
 */

/*
 * A bug in a previous version of the code could
 * cause upgrade_clones_cb() to not set
 * ds_next_snap_obj when it should, leading to a
 * missing entry.  Therefore we can only use the
 * next_clones_obj when its count is correct.
 */

/*
 * If this is a clone, we don't need to worry about it for now.
 */

/*
 * If there are N references to a deduped block, we don't want to scrub it
 * N times -- ideally, we should scrub it exactly once.
 *
 * We leverage the fact that the dde's replication class (enum ddt_class)
 * is ordered from highest replication class (DDT_CLASS_DITTO) to lowest
 * (DDT_CLASS_UNIQUE) so that we may walk the DDT in that order.
 *
 * To prevent excess scrubbing, the scrub begins by walking the DDT
 * to find all blocks with refcnt > 1, and scrubs each of these once.
 * Since there are two replication classes which contain blocks with
 * refcnt > 1, we scrub the highest replication class (DDT_CLASS_DITTO) first.
 * Finally the top-down scrub begins, only visiting blocks with refcnt == 1.
 *
 * There would be nothing more to say if a block's refcnt couldn't change
 * during a scrub, but of course it can so we must account for changes
 * in a block's replication class.
 */
/*
 * Here's an example of what can occur:
 *
 * If a block has refcnt > 1 during the DDT scrub phase, but has refcnt == 1
 * when visited during the top-down scrub phase, it will be scrubbed twice.
 * This negates our scrub optimization, but is otherwise harmless.
 *
 * If a block has refcnt == 1 during the DDT scrub phase, but has refcnt > 1
 * on each visit during the top-down scrub phase, it will never be scrubbed.
 * To catch this, ddt_sync_entry() notifies the scrub code whenever a block's
 * reference class transitions to a higher level (i.e. DDT_CLASS_UNIQUE to
 * DDT_CLASS_DUPLICATE); if it transitions from refcnt == 1 to refcnt > 1
 * while a scrub is in progress, it scrubs the block right then.
 */

/* There should be no pending changes to the dedup table */

	zfs_dbgmsg(
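The two-phase rule described above reduces to a simple per-block decision: the DDT walk handles shared blocks (refcnt > 1) once, and the top-down traversal handles only unique blocks. A sketch of that decision under assumed names (not the real dsl_scan.c logic):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the "scrub each deduped block exactly once" rule:
 * during the DDT phase, scrub only entries with refcnt > 1;
 * during the later top-down phase, scrub only blocks whose dedup
 * refcnt is 1 (shared blocks were already covered by the DDT walk).
 * Illustrative only; names are invented.
 */
static int
should_scrub_now(int in_ddt_phase, uint64_t refcnt)
{
	if (in_ddt_phase)
		return (refcnt > 1);	/* DDT walk covers shared blocks */
	return (refcnt <= 1);		/* top-down covers unique blocks */
}
```

Note how this also illustrates the benign failure mode from the comment: a block whose refcnt drops from >1 to 1 between the two phases satisfies both predicates and is scrubbed twice.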
"scanned %llu ddt entries with class_max = %u; pausing=%u",
/* First do the MOS & ORIGIN */

/*
 * If we were paused, continue from here.  Note if the
 * ds we were paused on was deleted, the zb_objset may
 * be -1, so we will skip this and find a new objset
 * below.
 */

/*
 * In case we were paused right at the end of the ds, zero the
 * bookmark so we don't think that we're still trying to resume.
 */

/* keep pulling things out of the zap-object-as-queue */

/*
 * Check for scn_restart_txg before checking spa_load_state, so
 * that we can restart an old-style scan while the pool is being
 * imported (see dsl_scan_init).
 */

/*
 * If the scan is inactive due to a stalled async destroy, try again.
 */

/*
 * First process the async destroys.  If we pause, don't do
 * any scrubbing or resilvering.  This ensures that there are no
 * async destroys while we are scanning, so the scan code doesn't
 * have to worry about traversing it.  It is also faster to free the
 * blocks than to scrub them.
 */

	"traverse_dataset_destroyed()",
err);
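The ordering described above (async destroys first, and no scrub work in a txg where freeing paused) can be sketched as a tiny control-flow skeleton. The function names are placeholders for the real `dsl_scan_sync()` flow, not its actual implementation:

```c
#include <assert.h>
#include <stdbool.h>

static bool g_freed, g_scanned;	/* observability for the sketch only */

/* Free queued (async-destroyed) blocks; false means we had to pause. */
static bool
process_async_destroys(bool must_pause)
{
	g_freed = true;
	return (!must_pause);
}

static void
scan_visit(void)
{
	g_scanned = true;
}

/*
 * Sketch: frees run first so the scan never has to traverse blocks
 * that are about to be destroyed; if freeing paused, skip scrubbing
 * entirely for this txg.
 */
static void
scan_sync(bool destroy_paused)
{
	g_freed = g_scanned = false;
	if (!process_async_destroys(destroy_paused))
		return;		/* paused: no scrubbing this txg */
	scan_visit();
}
```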
/*
 * If we didn't make progress, mark the async destroy as
 * stalled, so that we will not initiate a spa_sync() on
 * its behalf.
 */

/* finished; deactivate async destroy feature */

/*
 * Write out changes to the DDT that may be required as a
 * result of the blocks freed.
 */

/*
 * We have finished background destroying, but there is still
 * some space left in the dp_free_dir.  Transfer this leaked
 * space to the dp_leak_dir.
 */

/* finished; verify that space accounting went to zero */

/* finished with scan. */

	"ddt bm=%llu/%llu/%llu/%llx",
/*
 * This will start a new scan, or restart an existing one.
 */

/*
 * If we resume after a reboot, zab will be NULL; don't record
 * incomplete stats in that case.
 */

	for (i = 0; i < 4; i++) {
/* If it's an intent log block, failure is expected. */

/*
 * Keep track of how much data we've examined so that
 * zpool(1M) status can make useful progress reports.
 */

/* if it's a resilver, this may not be in the target range */

/*
 * Gang members may be spread across multiple
 * vdevs, so the best estimate we have is the
 * scrub range, which has already been checked.
 * XXX -- it would be better to change our
 * allocation policy to ensure that all
 * gang members reside on the same vdev.
 */

/*
 * If we're seeing recent (zfs_scan_idle) "important" I/Os
 * then throttle our workload to limit the impact of a scan.
 */

/* do not relocate this block */

/*
 * Purge all vdev caches and probe all devices.  We do this here
 * rather than in sync context because this requires a writer lock
 * on the spa_config lock, which we can't do from sync context.  The
 * spa_scrub_reopen flag indicates that vdev_open() should not
 * attempt to start another scrub.
 */
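The throttle mentioned above (back off when "important" I/O was seen within the last `zfs_scan_idle` ticks) amounts to a recency check before issuing a scrub read. A hedged sketch; the tunable name mirrors the comment, but the logic and signature are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch: decide whether a scrub I/O should be delayed because
 * non-scrub ("important") I/O was issued recently.  'now' and
 * 'last_user_io' are in the same tick units as zfs_scan_idle.
 * Illustrative only, not the real dsl_scan.c throttle.
 */
static int
scan_should_delay(int64_t now, int64_t last_user_io, int64_t zfs_scan_idle)
{
	return (now - last_user_io <= zfs_scan_idle);
}
```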