dmu_tx.c revision 3e30c24aeefdee1631958ecf17f18da671781956
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]

 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
 * Copyright (c) 2013 by Delphix. All rights reserved.

 * dn->dn_assigned_txg == tx->tx_txg doesn't pose a
 * problem, but there's no way for it to happen (for

 * If we're syncing, they can manipulate any object anyhow, and
 * the hold on the dnode_t can cause problems.

 * For i/o error checking, read the first and last level-0
 * blocks (if they are not aligned), and all the level-1 blocks.

/* first level-0 block */

 * The blocksize can't change,
 * so we can make a more precise estimate.

 * If this write is not off the end of the file

 * Account for new indirects appearing
 * before this IO gets assigned into a txg.

 * 'end' is the last thing we will access, not one past.
 * This way we won't overflow when accessing the last byte.

 * The object contains at most 2^(64 - min_bs) blocks,
 * and each indirect level maps 2^epbs.

 * We also need a new blkid=0 indirect block
 * to reference any existing file data.

 * The struct_rwlock protects us against dn_nlevels
 * changing, in case (against all odds) we manage to dirty &
 * sync out the changes after we check for being dirty.
 * Also, dbuf_hold_impl() wants us to have the struct_rwlock.

l0span = nblks; /* save for later use to calc level > 1 overhead */

for (i = 0; i < nblks; i++) {

 * We don't check memory_tohold against DMU_MAX_ACCESS because
 * memory_tohold is an over-estimation (especially the >L1
 * indirect blocks), so it could fail. Callers should have
 * already verified that they will not be holding too much

for (i = 0; i < tochk; i++) {
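
The comments above about the inclusive 'end' and about each indirect level mapping 2^epbs blocks can be illustrated with a small standalone sketch. It is not this file's actual accounting: the function names last_byte() and indirect_blocks_estimate() are hypothetical, the sketch counts blocks rather than bytes, and it assumes the loop shape suggested by the comments (shift the block range right by epbs once per level, and add one extra blkid=0 indirect whenever the range does not start at zero).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: offset of the last byte accessed by [off, off + len).
 * Using the inclusive end keeps the value representable even when the
 * range ends at the very last byte of a 64-bit offset space, where
 * off + len would wrap around to 0.
 */
uint64_t
last_byte(uint64_t off, uint64_t len)
{
	assert(len > 0);
	assert(off <= UINT64_MAX - (len - 1));
	return (off + (len - 1));
}

/*
 * Hypothetical estimate of how many indirect blocks a write may touch.
 * min_bs is the data block shift and epbs the number of block-pointer
 * bits mapped per indirect level, as named in the comments above.
 */
uint64_t
indirect_blocks_estimate(uint64_t off, uint64_t len, int min_bs, int epbs)
{
	assert(min_bs > 0 && min_bs < 64 && epbs > 0 && epbs < 64);

	uint64_t start = off >> min_bs;			/* first level-0 blkid */
	uint64_t end = last_byte(off, len) >> min_bs;	/* last level-0 blkid */
	uint64_t nblks = 0;

	/* At most 2^(64 - min_bs) blocks; each level maps 2^epbs of them. */
	for (int bits = 64 - min_bs; bits >= 0; bits -= epbs) {
		start >>= epbs;
		end >>= epbs;
		assert(end >= start);
		nblks += end - start + 1;
		if (start != 0)
			nblks++;	/* extra blkid=0 indirect for existing data */
	}
	return (nblks);
}
```

Under these assumptions, with 512-byte blocks (min_bs = 9) and epbs = 7, a one-byte write at offset 0 is charged one indirect block in each of the 8 loop iterations, and last_byte(UINT64_MAX - 9, 10) yields UINT64_MAX where off + len would have wrapped.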
 * Add in memory requirements of higher-level indirects.
 * This assumes a worst-possible scenario for dn_nlevels and a
 * worst-possible distribution of l1-blocks over the region to free.

 * Here we don't use DN_MAX_LEVEL, but calculate it with the
 * given datablkshift and indblkshift. This makes the
 * difference between 19 and 8 on large files.

/* account for new level 1 indirect blocks that might show up */

 * For i/o error checking, read the first and last level-0
 * blocks, and all the level-1 blocks. The above count_write's
 * have already taken care of the level-0 blocks.

 * We will be able to fit a new object's entries into one leaf
 * block. So there will be at most 2 blocks total,
 * including the header block.

 * If there is only one block (i.e. this is a micro-zap)
 * and we are not adding anything, the accounting is simple.

 * Use max block size here, since we don't know how much
 * the size will change between now and the dbuf dirty call.

 * access the name in this fat-zap so that we'll check
 * for i/o errors to the leaf blocks, etc.

 * If the modified blocks are scattered to the four winds,
 * we'll have to modify an indirect twig for each.

 * By asserting that the tx is assigned, we're counting the
 * number of dn_tx_holds, which is the same as the number of
 * dn_holds. Otherwise, we'd be counting dn_holds, but
 * dn_tx_holds could be 0.

/* if (tx->tx_anyobj == TRUE) */

/* XXX No checking on the meta dnode for now */

/* XXX txh_arg2 better not be zero... */

dprintf("found txh type %x beginblk=%llx endblk=%llx\n",
 * We will let this hold work for the bonus
 * or spill buffer so that we don't need to
 * hold it when creating a new object.

 * They might have to increase nlevels,
 * thus dirtying the new TLIBs. Or they
 * might have to change the block size,
 * thus dirtying the new lvl=0 blk=0.

 * We will dirty all the level 1 blocks in
 * the free range and perhaps the first and

panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
 * If the user has indicated a blocking failure mode
 * then return ERESTART which will block in dmu_tx_wait().
 * Otherwise, return EIO so that an error can get
 * propagated back to the VOP calls.
 * Note that we always honor the txg_how flag regardless
 * of the failuremode setting.

 * NB: No error returns are allowed after txg_hold_open, but
 * before processing the dnode holds, due to the
 * dmu_tx_unassign() logic.

 * If a snapshot has been taken since we made our estimates,
 * assume that we won't be able to free or overwrite anything.

/* needed allocation: worst-case estimate of write space */

/* freed space estimate: worst-case overwrite + free estimate */

/* convert unrefd space to worst-case estimate */

/* calculate memory footprint estimate */

 * Add in 'tohold' to account for our dirty holds on this memory
 * XXX - the "fudge" factor is to account for skipped blocks that
 * we missed because dnode_next_offset() misses in-core-only blocks.

 * Walk the transaction's hold list, removing the hold on the
 * associated dnode, and notifying waiters if the refcount drops to 0.

 * Assign tx to a transaction group. txg_how can be one of:
 * (1) TXG_WAIT. If the current open txg is full, waits until there's
 *     a new one. This should be used when you're not holding locks.
 *     It will only fail if we're truly out of space (or over quota).
 * (2) TXG_NOWAIT. If we can't assign into the current open txg without
 *     blocking, returns immediately with ERESTART. This should be used
 *     whenever you're holding locks. On an ERESTART error, the caller
 *     should drop locks, do a dmu_tx_wait(tx), and try again.

/* If we might wait, we must not hold the config lock. */

 * It's possible that the pool has become active after this thread
 * has tried to obtain a tx. If that's the case then his
 * tx_lasttried_txg would not have been assigned.

 * Go through the transaction's hold list and remove holds on
 * associated dnodes, notifying waiters if no holds remain.

dprintf("towrite=%llu written=%llu tofree=%llu freed=%llu\n",
 * Call any registered callbacks with an error code.

 * Call all the commit callbacks on a list, with a given error code.

 * Interface to hold a bunch of attributes.
 * used for creating new files.
 * attrsize is the total size of all attributes
 * to be added during object creation
 * For updating/adding a single attribute dmu_tx_hold_sa() should be used.

 * hold necessary attribute name for attribute registration.
 * should be a very rare case where this is needed. If it does
 * happen it would only happen on the first write to the file system.

/* If blkptr doesn't exist then add space to towrite */

 * dmu_tx_hold_sa(dmu_tx_t *tx, sa_handle_t *, attribute, add, size)
 * variable_size is the total size of all variable sized attributes
 * passed to this function. It is not the total size of all
 * variable size attributes that *may* exist on this object.
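
The TXG_WAIT / TXG_NOWAIT comment earlier in this file describes a caller-side protocol: with locks held, assign with TXG_NOWAIT, and on ERESTART drop the locks, wait, and try again. The toy model below sketches only that control flow. Every identifier prefixed with toy_ or TOY_ is a hypothetical stand-in invented for this illustration and is not part of the DMU interface; the "pool" is reduced to a counter of how many more times the open txg will be found full.

```c
#include <assert.h>

/* Hypothetical stand-ins, not the real DMU constants or types. */
enum { TOY_OK = 0, TOY_ERESTART = 1 };
typedef enum { TOY_TXG_WAIT, TOY_TXG_NOWAIT } toy_txg_how_t;

typedef struct toy_pool {
	int full_rounds;	/* how many more times the open txg is full */
	int attempts;		/* calls to toy_tx_assign() */
	int waits;		/* calls to toy_tx_wait() */
	int lock_held;		/* models a caller-owned lock */
} toy_pool_t;

/* Models waiting for a new txg; must never be called with the lock held. */
void
toy_tx_wait(toy_pool_t *p)
{
	assert(!p->lock_held);
	if (p->full_rounds > 0)
		p->full_rounds--;
	p->waits++;
}

/* NOWAIT returns TOY_ERESTART instead of blocking; WAIT blocks internally. */
int
toy_tx_assign(toy_pool_t *p, toy_txg_how_t how)
{
	p->attempts++;
	while (p->full_rounds > 0) {
		if (how == TOY_TXG_NOWAIT)
			return (TOY_ERESTART);
		toy_tx_wait(p);
	}
	return (TOY_OK);
}

/* Caller that holds a lock: assign with NOWAIT, drop the lock, wait, retry. */
int
toy_update_locked(toy_pool_t *p)
{
	for (;;) {
		p->lock_held = 1;
		if (toy_tx_assign(p, TOY_TXG_NOWAIT) == TOY_OK) {
			/* ... modify state under the lock and commit ... */
			p->lock_held = 0;
			return (TOY_OK);
		}
		p->lock_held = 0;	/* drop locks before waiting */
		toy_tx_wait(p);
	}
}
```

In this model a pool that is full for two rounds makes toy_update_locked() attempt the assignment three times and wait twice, never waiting while the lock is held.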