dnode.c revision 0fbc0cd0e52a11f6c4397a1714f94412cbf98b60
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright (c) 2012, 2014 by Delphix. All rights reserved.
 */

/*
 * Define DNODE_STATS to turn on statistic gathering. By default, it is only
 * turned on when DEBUG is also defined.
 */
#endif	/* DNODE_STATS */

/*
 * Every dbuf has a reference, and dropping a tracked reference is
 * O(number of references), so don't track dn_holds.
 */

/*
 * dn_nblkptr is only one byte, so it's OK to read it in either
 * byte order. We can't read dn_bonuslen.
 */

/*
 * OK to check dn_bonuslen for zero, because it won't matter if
 * we have the wrong byte order. This is necessary because the
 * dnode dnode is smaller than a regular dnode.
 */

/*
 * Note that the bonus length calculated here may be
 * longer than the actual bonus buffer. This is because
 * we always put the bonus buffer after the last block
 * pointer (instead of packing it against the end of the
 * dnode buffer).
 */

/* Swap SPILL block if we have one */

/*
 * Defer setting dn_objset until the dnode is ready to be a candidate
 * for the dnode_move() callback.
 */

/*
 * Everything else must be valid before assigning dn_objset makes the
 * dnode eligible for dnode_move().
 */

/*
 * Caller must be holding the dnode handle, which is released upon return.
 */

/* the dnode can no longer move, so we can release the handle */

/* clean up any unreferenced dbufs */

/* change bonus size and type */

/* fix up the bonus db_size */

#endif	/* DNODE_STATS */

/*
 * Update back pointers. Updating the handle fixes the back pointer of
 * every descendant dbuf as well as the bonus dbuf.
 */

/*
 * Invalidate the original dnode by clearing all of its back pointers.
 * Set the low bit of the objset pointer to ensure that dnode_move()
 * recognizes the dnode as invalid in any subsequent callback.
 */

/*
 * Satisfy the destructor.
 */

/*
 * The dnode is on the objset's list of known dnodes if the objset
 * pointer is valid. We set the low bit of the objset pointer when
 * freeing the dnode to invalidate it, and the memory patterns written
 * by kmem (baddcafe and deadbeef) set at least one of the two low bits.
 * A newly created dnode sets the objset pointer last of all to indicate
 * that the dnode is known and in a valid state to be moved by this
 * callback.
 */

/*
 * Ensure that the objset does not go away during the move.
 */

/*
 * If the dnode is still valid, then so is the objset. We know that no
 * valid objset can be freed while we hold os_lock, so we can safely
 * ensure that the objset remains in use.
 */

/*
 * Recheck the objset pointer in case the dnode was removed just before
 * acquiring the lock.
 */

/*
 * At this point we know that as long as we hold os->os_lock, the dnode
 * cannot be freed and fields within the dnode can be safely accessed.
 * The objset listing this dnode cannot go away as long as this dnode is
 * on its list.
 */

/*
 * Lock the dnode handle to prevent the dnode from obtaining any new
 * holds. This also prevents the descendant dbufs and the bonus dbuf
 * from accessing the dnode, so that we can discount their holds. The
 * handle is safe to access because we know that while the dnode cannot
 * go away, neither can its handle. Once we hold dnh_zrlock, we can
 * safely move any dnode referenced only by dbufs.
 */

/*
 * Ensure a consistent view of the dnode's holds and the dnode's dbufs.
 */
/*
 * We need to guarantee that there is a hold for every dbuf in order to
 * determine whether the dnode is actively referenced. Falsely matching
 * a dbuf to an active hold would lead to an unsafe move. It's possible
 * that a thread already having an active dnode hold is about to add a
 * dbuf, and we can't compare hold and dbuf counts while the add is in
 * progress.
 */

/*
 * A dbuf may be removed (evicted) without an active dnode hold. In that
 * case, the dbuf count is decremented under the handle lock before the
 * dbuf's hold is released. This order ensures that if we count the hold
 * after the dbuf is removed but before its hold is released, we will
 * treat the unmatched hold as active and exit safely. If we count the
 * hold before the dbuf is removed, the hold is discounted, and the
 * removal is blocked until the move completes.
 */

/* We can't have more dbufs than dnode holds. */

/*
 * At this point we know that anyone with a hold on the dnode is not
 * actively referencing it. The dnode is known and in a valid state to
 * move. We're holding the locks needed to execute the critical section.
 */

/* If the dnode was safe to move, the refcount cannot have changed. */

/*
 * Wait for final references to the dnode to clear. This can
 * only happen if the arc is asynchronously evicting state that
 * has a hold on this dnode while we are trying to evict this
 * dnode.
 */

	for (i = 0; i < epb; i++) {
/*
 * The dnode handle lock guards against the dnode moving to
 * another valid address, so there is no need here to guard
 * against changes to or from NULL.
 */

/*
 * If there are holds on this dnode, then there should
 * be holds on the dnode's containing dbuf as well; thus
 * it wouldn't be eligible for eviction and this function
 * would not have been called.
 */

/*
 * EINVAL - invalid object number.
 * succeeds even for free dnodes.
 */

/*
 * If you are holding the spa config lock as writer, you shouldn't
 * be asking the DMU to do *anything* unless it's the root pool
 * which may require us to read from the root filesystem while
 * holding some (not all) of the locks as writer.
 */

	for (i = 0; i < epb; i++) {

	for (i = 0; i < epb; i++) {
/* Now we can rely on the hold to prevent the dnode from moving. */

/*
 * Return held dnode if the object is allocated, NULL if not.
 */

/*
 * Can only add a reference if there is already at least one
 * reference on the dnode. Returns FALSE if unable to add a
 * new reference.
 */

/* Get while the hold prevents the dnode from moving. */

/*
 * It's unsafe to release the last hold on a dnode by dnode_rele() or
 * indirectly by dbuf_rele() while relying on the dnode handle to
 * prevent the dnode from moving, since releasing the last hold could
 * result in the dnode's parent dbuf evicting its dnode handles. For
 * that reason anyone calling dnode_rele() or dbuf_rele() without some
 * other direct or indirect hold on the dnode must first drop the dnode
 * handle.
 */

/* NOTE: the DNODE_DNODE does not have a dn_dbuf */

/*
 * Another thread could add a hold to the dnode handle in
 * dnode_hold_impl() while holding the parent dbuf. Since the
 * hold on the parent dbuf prevents the handle from being
 * destroyed, the hold on the handle is OK. We can't yet assert
 * that the handle has zero references, but that will be
 * asserted anyway when the handle gets destroyed.
 */

/*
 * Determine old uid/gid when necessary
 */

/*
 * If we are already marked dirty, we're done.
 */

/*
 * The dnode maintains a hold on its containing dbuf as
 * long as there are holds on it. Each instantiated child
 * dbuf maintains a hold on the dnode. When the last child
 * drops its hold, the dnode will drop its hold on the
 * containing dbuf. We add a "dirty hold" here so that the
 * dnode will hang around after we finish processing its
 * children.
 */

/* we should be the only holder... hopefully */
/* ASSERT3U(refcount_count(&dn->dn_holds), ==, 1); */

/*
 * If the dnode is already dirty, it needs to be moved from
 * the dirty list to the free list.
 */

/*
 * Try to change the block size for the indicated dnode. This can only
 * succeed if there are no blocks allocated or dirty beyond first block.
 */

/* Check for any allocated blocks beyond the first */

/* resize the old block */

/* rele after we have fixed the blocksize in the dnode */

/* read-holding callers must not rely on the lock being continuously held */

/*
 * if we have a read-lock, check to see if we need to do any work
 * before upgrading to a write-lock.
 */

/*
 * Compute the number of levels necessary to support the new maxblkid.
 */

/* dirty the left indirects */

/* transfer the dirty records to the new indirect */

/*
 * First, block align the region to free:
 */

/*
 * Freeing the whole block; fast-track this request.
 * Note that we won't dirty any indirect blocks,
 * which is fine because we will be freeing the entire
 * file and thus all indirect blocks will be freed
 * by free_children().
 */

/* Freeing past end-of-data */

/* Freeing part of the block. */

/* zero out any partial block data at the start of the range */

/* don't dirty if it isn't on disk and isn't dirty */

/* If the range was less than one block, we're done */

/* If the remaining range is past end of file, we're done */

/* zero out any partial block data at the end of the range */

/* don't dirty if not on disk and not dirty */

/* If the range did not include a full block, we are done */

/*
 * Dirty the first and last indirect blocks, as they (and/or their
 * parents) will need to be written out if they were only
 * partially freed. Interior indirect blocks will themselves be freed
 * by free_children(), so they need not be dirtied. Note that these
 * interior blocks have already been prefetched by dmu_tx_hold_free().
 */

/*
 * Add this range to the dnode range list.
 * We will finish up this free operation in the syncing phase.
 */

/* return TRUE if this blkid was freed in a recent txg, or FALSE if it wasn't */

/*
 * If we're in the process of opening the pool, dp will not be
 * set yet, but there shouldn't be anything dirty.
 */
/* call from syncing context when we actually write/free space for this dnode */

/*
 * Call when we think we're going to write/free space in open context to track
 * the amount of memory in use by the currently open txg.
 */

/*
 * Scans a block at the indicated "level" looking for a hole or data.
 *
 * If level > 0, then we are scanning an indirect block looking at its
 * pointers. If level == 0, then we are looking at a block of dnodes.
 *
 * If we don't find what we are looking for in the block, we return ESRCH.
 * Otherwise, return with *offset pointing to the beginning (if searching
 * forwards) or end (if searching backwards) of the range covered by the
 * block pointer we matched on (or dnode).
 *
 * The basic search algorithm used below by dnode_next_offset() is to
 * use this function to search up the block tree (widen the search) until
 * we find something (i.e., we don't return ESRCH) and then search back
 * down the tree (narrow the search) until we reach our original search
 * offset.
 */

	dprintf("probing object %llu offset %llx level %d of %u\n",
/*
 * This can only happen when we are searching up
 * the block tree for data. We don't really need to
 * adjust the offset, as we will just end up looking
 * at the pointer to this block in its parent, and it's
 * going to be unallocated, so we will skip over it.
 */

/*
 * This can only happen when we are searching up the tree
 * and these conditions mean that we need to keep climbing.
 */

	i >= 0 && i < epb; i += inc) {

/* traversing backwards; position offset at the end */

/*
 * Find the next hole, data, or sparse region at or after *offset.
 * The value 'blkfill' tells us how many items we expect to find
 * in an L0 data block; this value is 1 for normal objects,
 * DNODES_PER_BLOCK for the meta dnode, and some fraction of
 * DNODES_PER_BLOCK when searching for sparse regions thereof.
 *
 * dnode_next_offset(dn, flags, offset, 1, 1, 0);
 *	Used in dmu_offset_next().
 *
 * dnode_next_offset(mdn, flags, offset, 0, DNODES_PER_BLOCK, txg);
 *	Only finds objects that have new contents since txg (ie.
 *	bonus buffer changes and content removal are ignored).
 *	Used in dmu_object_next().
 *
 * dnode_next_offset(mdn, DNODE_FIND_HOLE, offset, 2, DNODES_PER_BLOCK >> 2, 0);
 *	Finds the next L2 meta-dnode bp that's at most 1/4 full.
 *	Used in dmu_object_alloc().
 */

/*
 * There's always a "virtual hole" at the end of the object, even
 * if all BP's which physically exist are non-holes.
 */