zvol.c revision b7e50089acb2c25bc533ab07f5326ff48cfe3621
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * ZFS volume emulation driver.
 *
 * Makes a DMU object look like a volume of arbitrary size, up to 2^64 bytes.
 * Volumes are accessed through the symbolic links named:
 *
 * These links are created by the ZFS-specific devfsadm link generator.
 * Volumes are persistent through reboot.  No user command needs to be
 * run before opening and using a device.
 */

/*
 * This lock protects the zvol_state structure from being modified
 * while it's being used, e.g. an open that comes in before a create
 * finishes.  It also protects temporary opens of the dataset so that,
 * e.g., an open doesn't get a spurious EBUSY.
 */

/*
 * The list of extents associated with the dump device
 */

/*
 * The in-core state of each volume.
 */

/*
 * zvol specific flags
 */

/*
 * zvol maximum transfer in one DMU tx.
 */

/* Notify specfs to invalidate the cached size */

/*
 * Find a free minor number.
 */

/* extent mapping arg */

/* If there is an error, then keep trying to make progress */

/* Abort immediately if we have encountered gang blocks */

/* second block in this extent */

/*
 * the block we allocated has the same
 */

/*
 * dtrace -n 'zfs-dprintf
 * printf("%s: %s", stringof(arg1), stringof(arg3))
 */
dprintf("ma_extent 0x%lx mrstride 0x%lx stride %lx\n",
/* start a new extent */

/*
 * These properties must be removed from the list so the generic
 * property setting step won't apply to them.
 */

/*
 * Replay a TX_WRITE ZIL transaction that didn't get committed
 * after a system failure
 */
char *data = (char *)(lr + 1);
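The `(char *)(lr + 1)` cast above relies on the write payload being stored directly after the fixed-size log record. The following is a minimal sketch of that header-plus-payload layout, not the driver's code: `demo_rec_t`, `demo_rec_data()` and `demo_rec_create()` are hypothetical names invented for illustration, and `demo_rec_t` is only a stand-in for the real `lr_write_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size header; a stand-in for a log record, not lr_write_t. */
typedef struct demo_rec {
	uint64_t offset;	/* byte offset of the write */
	uint64_t length;	/* number of payload bytes that follow */
} demo_rec_t;

/* Return a pointer to the payload stored immediately after the header. */
static char *
demo_rec_data(demo_rec_t *rec)
{
	/* rec + 1 advances by sizeof (demo_rec_t), i.e. just past the header */
	return ((char *)(rec + 1));
}

/* Allocate header and payload in one buffer, then copy the payload in. */
static demo_rec_t *
demo_rec_create(uint64_t offset, const void *buf, size_t len)
{
	demo_rec_t *rec = malloc(sizeof (*rec) + len);

	if (rec == NULL)
		return (NULL);
	rec->offset = offset;
	rec->length = len;
	memcpy(demo_rec_data(rec), buf, len);
	return (rec);
}
```

A caller would build a record with `demo_rec_create(off, buf, len)`, read the payload back through `demo_rec_data(rec)`, and release both header and payload with a single `free(rec)`.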
/* data follows lr_write_t */

/*
 * Callback vectors for replaying records.
 * Only TX_WRITE is needed for zvol.
 */

/*
 * reconstruct dva that gets us to the desired offset (offset
 */

/* we've reached the end of this array */

/*
 * We currently don't support dump devices when the pool
 * is so fragmented that our allocation has resulted in
 */

/*
 * Create a minor node (plus a whole lot more) for the specified volume.
 */

/*
 * If there's an existing /dev/zvol symlink, try to use the
 * same minor number we used last time.
 */

/*
 * If we found a minor but it's already in use, we must pick a new one.
 */

/* get and cache the blocksize */

/* XXX this should handle the possible i/o error */

/*
 * Remove minor node for the specified volume.
 */

/* Check the space usage before attempting to allocate the space */

/* Free old extents if they exist */

/* allocate the blocks by writing each one */

/*
 * Reinitialize the dump area to the new size. If we
 * failed to resize the dump area then restore it back to
 */
if (minor == 0)	/* This is the control device */

if (minor == 0)
/* This is the control device */

/*
 * The next statement is a workaround for the following DDI bug:
 * 6343604 specfs race: multiple "last-close" of the same device
 */

/*
 * If the open count is zero, this is a spurious close.
 * That indicates a bug in the kernel / DDI framework.
 */

/*
 * You may get multiple opens, but only one close.
 */

/*
 * Get data to generate a TX_WRITE intent log record.
 */

/*
 * Write records come in two flavors: immediate and indirect.
 * For small writes it's cheaper to store the data with the
 * log record (immediate); for large writes it's cheaper to
 * sync the data and get a pointer to it (indirect) so that
 * we don't have to write the data twice.
 */
if (buf != NULL)
/* immediate write */

/*
 * Lock the range of the block to ensure that when the data is
 * written out and its checksum is being calculated that no other
 * thread can change the block.
 */

/*
 * If we get EINPROGRESS, then we need to wait for a
 * write IO initiated by dmu_sync() to complete before
 * we can release this dbuf.  We will finish everything
 * up in the zvol_get_done() callback.
 */

/*
 * zvol_log_write() handles synchronous writes using TX_WRITE ZIL transactions.
 *
 * We store data in the log buffers if it's small enough.
 * Otherwise we will later flush the data out via dmu_sync().
 */

/* restrict requests to multiples of the system block size */

/*
 * There must be no buffer changes when doing a dmu_sync() because
 * we can't change the data whilst calculating the checksum.
 */

/* can't straddle a block boundary */

/*
 * Set the buffer count to the zvol maximum transfer.
 * Using our own routine instead of the default minphys()
 * means that for larger writes we write bigger buffers on X86
 * (128K instead of 56K) and flush the disk write cache less often
 * (every zvol_maxphys - currently 1MB) instead of minphys (currently
 * 56K on X86 and 128K on sparc).
 */
if (minor == 0)
/* This is the control device */

/* dump should know better than to write here */

/* can't straddle a block boundary */
if (minor == 0)
/* This is the control device */

/* don't read past the end */
if (minor == 0)
/* This is the control device */

/*
 * Dirtbag ioctls to support mkfs(1M) for UFS filesystems.  See dkio(7I).
 */

/*
 * Some clients may attempt to request a PMBR for the
 * zvol.  Currently this interface will return ENOTTY to
 * such requests.  These requests could be supported by
 * adding a check for lba == 0 and consing up an appropriate
 */

/*
 * commands using these (like prtvtoc) expect ENOTSUP
 * since we're emulating an EFI label
 */

/*
 * If we are resizing the dump device then we only need to
 * update the refreservation to match the newly updated
 * zvolsize.  Otherwise, we save off the original state of the
 * zvol so that we can restore it if the zvol is ever undumpified.
 */

/*
 * We only need to update the zvol's property if we are initializing
 * the dump area for the first time.
 */

/* Allocate the space for the dump */

/*
 * We do not support swap devices acting as dump devices.
 */

/*
 * Build up our lba mapping.
 */

/*
 * Attempt to restore the zvol back to its pre-dumpified state.
 * This is a best-effort attempt as it's possible that not all
 * of these properties were initialized during the dumpify process
 * (i.e. error during zvol_dump_init).
 */
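The best-effort restore described in the last comment can be sketched as a loop that skips anything never saved and keeps going after a failure. This is an illustration under stated assumptions, not the driver's implementation: `demo_saved_prop_t`, `demo_set_fn`, `demo_restore_props()` and `demo_counting_set()` are hypothetical names, and the property table is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of one property saved before dumpification. */
typedef struct demo_saved_prop {
	const char *name;	/* property name (illustrative only) */
	uint64_t value;		/* value saved before the change */
	int present;		/* nonzero if the save actually happened */
} demo_saved_prop_t;

/* Hypothetical setter: returns 0 on success, nonzero on failure. */
typedef int (*demo_set_fn)(const char *name, uint64_t value, void *arg);

/*
 * Best-effort restore: skip properties that were never saved and keep
 * going after a failed set.  Returns the number of properties restored.
 */
static size_t
demo_restore_props(const demo_saved_prop_t *props, size_t n,
    demo_set_fn set, void *arg)
{
	size_t restored = 0;

	for (size_t i = 0; i < n; i++) {
		if (!props[i].present)
			continue;	/* never initialized; nothing to undo */
		if (set(props[i].name, props[i].value, arg) == 0)
			restored++;	/* a failed set is ignored on purpose */
	}
	return (restored);
}

/* Example setter: counts calls and fails for the name "fail". */
static int
demo_counting_set(const char *name, uint64_t value, void *arg)
{
	size_t *calls = arg;

	(void) value;
	(*calls)++;
	return (strcmp(name, "fail") == 0 ? -1 : 0);
}
```

Tracking a `present` flag per property is one way to tolerate an initialization that stopped partway; the return value lets a caller log how much was actually restored without treating a partial restore as fatal.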