zfs_dir.c revision a6e57bd4c7a2bf9cc33be939d674d4c7d3e67cce
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 *
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2008 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

/*
 * zfs_match_find() is used by zfs_dirent_lock() to perform zap lookups
 * of names after deciding which is the appropriate lookup interface.
 */

/*
 * In the non-mixed case we only expect there would ever
 * be one match, but we need to use the normalizing lookup.
 */

/*
 * Lock a directory entry. A dirlock on <dzp, name> protects that name
 * in dzp's directory zap object. As long as you hold a dirlock, you can
 * assume two things: (1) dzp cannot be reaped, and (2) no other thread
 * can change the zap entry for (i.e. link or unlink) this name.
 *
 * Input arguments:
 *	dzp	- znode for directory
 *	name	- name of entry to lock
 *	flag	- ZNEW: if the entry already exists, fail with EEXIST.
 *		  ZEXISTS: if the entry does not exist, fail with ENOENT.
 *		  ZSHARED: allow concurrent access with other ZSHARED callers.
 *		  ZXATTR: we want dzp's xattr directory
 *		  ZCILOOK: On a mixed sensitivity file system,
 *			   this lookup should be case-insensitive.
 *		  ZCIEXACT: On a purely case-insensitive file system,
 *			    this lookup should be case-sensitive.
 *		  ZRENAMING: we are locking for renaming, force narrow locks
 *
 * Output arguments:
 *	zpp	- pointer to the znode for the entry (NULL if there isn't one)
 *	dlpp	- pointer to the dirlock for this entry (NULL on error)
 *	direntflags - (case-insensitive lookup only)
 *		flags if multiple case-sensitive matches exist in directory
 *	realpnp	- (case-insensitive lookup only)
 *		actual name matched within the directory
 *
 * Return value: 0 on success or errno on failure.
 *
 * NOTE: Always checks for, and rejects, '.' and '..'.
 * NOTE: For case-insensitive file systems we take wide locks (see below),
 *	 but return znode pointers to a single match.
 */

/*
 * Verify that we are not trying to lock '.', '..', or '.zfs'
 */

/*
 * Case sensitivity and normalization preferences are set when
 * the file system is created. These are stored in the
 * zfsvfs->z_case and zfsvfs->z_norm fields. These choices
 * affect what vnodes can be cached in the DNLC, how we
 * perform zap lookups, and the "width" of our dirlocks.
 *
 * A normal dirlock locks a single name. Note that with
 * normalization a name can be composed multiple ways, but
 * when normalized, these names all compare equal. A wide
 * dirlock locks multiple names. We need these when the file
 * system is supporting mixed-mode access. It is sometimes
 * necessary to lock all case permutations of a file name at
 * once so that simultaneous case-insensitive and case-sensitive
 * access behaves as rationally as possible.
 */

/*
 * Decide if exact matches should be requested when performing
 * a zap lookup on file systems supporting case-insensitive
 * access.
 */

/*
 * Only look in or update the DNLC if we are looking for the
 * name on a file system that does not require normalization
 * or case folding. We can also look there if we happen to be
 * on a non-normalizing, mixed sensitivity file system IF we
 * are looking for the exact name.
 */
/*
 * Maybe can add TO-UPPERed version of name to dnlc in ci-only
 * case for performance improvement?
 */

/*
 * ZRENAMING indicates we are in a situation where we should
 * take narrow locks regardless of the file system's
 * preferences for normalizing and case folding. This will
 * prevent us deadlocking trying to grab the same wide lock
 * twice if the two names happen to be case-insensitive
 * matches.
 */

/*
 * Wait until there are no locks on this name.
 */

/*
 * Allocate a new dirlock and add it to the list.
 */

/*
 * We're the second shared reference to dl. Make a copy of
 * dl_name in case the first thread goes away before we do.
 * Note that we initialize the new name before storing its
 * pointer into dl_name, because the first thread may load
 * dl->dl_name at any time. He'll either see the old value,
 * which is his, or the new shared copy; either is OK.
 */

/*
 * We have a dirlock on the name. (Note that it is the dirlock,
 * not the dzp's z_lock, that protects the name in the zap object.)
 * See if there's an object by this name; if so, put a hold on it.
 */

/*
 * Unlock this directory entry and wake anyone who was waiting for it.
 */

/*
 * Look up an entry in a directory.
 *
 * NOTE: '.' and '..' are handled as special cases because
 *	no directory entries are actually stored for them. If this is
 *	the root of a filesystem, then '.zfs' is also treated as a
 *	special pseudo-directory.
 */

/*
 * If we are a snapshot mounted under .zfs, return
 * the vp for the snapshot directory.
 */

/*
 * Unlinked Set (formerly known as the "delete queue") Error Handling
 *
 * When dealing with the unlinked set, we dmu_tx_hold_zap(), but we
 * don't specify the name of the entry that we will be manipulating. We
 * also fib and say that we won't be adding any new entries to the
 * unlinked set, even though we might (this is to lower the minimum file
 * size that can be deleted in a full filesystem). So on the small
 * chance that the nlink list is using a fat zap (i.e. has more than
 * 2000 entries), we *may* not pre-read a block that's needed.
 * Therefore it is remotely possible for some of the assertions
 * regarding the unlinked set below to fail due to i/o error. On a
 * nondebug system, this will result in the space being leaked.
 */

/*
 * Clean up any znodes that had no links when we either crashed or
 * (force) umounted the file system.
 */

/*
 * Iterate over the contents of the unlinked set.
 */

/*
 * See what kind of object we have in list
 */

/*
 * We need to re-mark these list entries for deletion,
 * so we pull them back into core and set zp->z_unlinked.
 */

/*
 * We may pick up znodes that are already marked for deletion.
 * This could happen during the purge of an extended attribute
 * directory. All we need to do is skip over them, since they
 * are already in the system marked z_unlinked.
 */

/*
 * Delete the entire contents of a directory. Return a count
 * of the number of entries that could not be deleted. If we encounter
 * an error, return a count of at least one so that the directory stays
 * in the unlinked set.
 *
 * NOTE: this function assumes that the directory is inactive,
 *	so there is no need to lock its entries before deletion.
 *	Also, it assumes the directory contents is *only* regular
 *	files.
 */

/*
 * If this is a ZIL replay then leave the object in the unlinked set.
 * Otherwise we can get a deadlock, because the delete can be
 * quite large and span multiple tx's and txgs, but each replay
 * creates a tx to atomically run the replay function and mark the
 * replay record as complete. We deadlock trying to start a tx in
 * a new txg to further the deletion but can't because the replay
 * tx hasn't finished.
 *
 * We actually delete the object if we get a failure to create an
 * object in zil_replay_log_record(), or after calling zil_replay().
 */
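The purge contract described above (return the number of entries that could not be deleted, and at least one if the walk itself fails, so the directory stays in the unlinked set) can be sketched in isolation. This is an illustrative model only: `sk_entry`, `sk_purgedir()`, and `sk_stays_in_unlinked_set()` are hypothetical names, and an in-memory array stands in for the directory zap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one directory entry; not a ZFS type. */
struct sk_entry {
	const char *name;
	int delete_err;	/* 0 if deleting would succeed, else an errno */
};

/*
 * Try to delete every entry and return how many could not be deleted.
 * walk_err models a failure of the directory iteration itself; it adds
 * one to the count so that the result is never zero after an error.
 */
static int
sk_purgedir(const struct sk_entry *entries, size_t n, int walk_err)
{
	int skipped = 0;

	assert(entries != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (entries[i].delete_err != 0)
			skipped++;	/* entry survives the purge */
	}
	if (walk_err != 0)
		skipped++;		/* guarantee a nonzero count */

	return (skipped);
}

/* The caller keeps the directory in the unlinked set on any leftover. */
static bool
sk_stays_in_unlinked_set(int skipped)
{
	return (skipped != 0);
}
```

Returning a count rather than a single error code lets the caller tell a fully purged directory from one that must be retried later, without needing to know which entries failed.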
/*
 * If this is an attribute directory, purge its contents.
 */

/*
 * Not enough space to delete some xattrs.
 * Leave it in the unlinked set.
 */

/*
 * Free up all the data in the file.
 */

/*
 * Not enough space. Leave the file in the unlinked set.
 */

/*
 * If the file has extended attributes, we're going to unlink
 * the xattr dir.
 */

/*
 * Set up the final transaction.
 */

/*
 * Not enough space to delete the file. Leave it in the
 * unlinked set, leaking it until the fs is remounted (at
 * which point we'll call zfs_unlinked_drain() to process it).
 */

/* Remove this znode from the unlinked set */

/*
 * Link zp into dl. Can only fail if zp has been unlinked.
 */

/*
 * Unlink zp from dl, and mark zp for deletion if this was the last link.
 * Can fail if zp is a mount point (EBUSY) or a non-empty directory (EEXIST).
 * If 'unlinkedp' is NULL, we put unlinked znodes on the unlinked list.
 * If it's non-NULL, we use it to indicate whether the znode needs deletion,
 * and it's the caller's job to do it.
 */

"should be at least %u",
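The 'unlinkedp' convention in the unlink comment above is easy to get backwards, so here is a small sketch of it under stated assumptions. `sk_node` and `sk_link_destroy()` are hypothetical, `links` counts only the names that point at the node, and the real link accounting for directories is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node; not the real znode layout. */
struct sk_node {
	unsigned links;		/* names currently pointing at the node */
	bool is_dir;
	bool is_mountpoint;
	bool dir_empty;		/* only meaningful when is_dir is set */
	bool on_unlinked_list;
};

/*
 * Drop one link. If that was the last link, either report it through
 * *unlinkedp (the caller then owns the deletion) or, when unlinkedp is
 * NULL, put the node on the unlinked list ourselves.
 */
static int
sk_link_destroy(struct sk_node *zp, bool *unlinkedp)
{
	bool unlinked;

	assert(zp != NULL && zp->links > 0);

	if (zp->is_mountpoint)
		return (EBUSY);
	if (zp->is_dir && !zp->dir_empty)
		return (EEXIST);

	zp->links--;
	unlinked = (zp->links == 0);

	if (unlinkedp != NULL)
		*unlinkedp = unlinked;		/* caller's job to delete */
	else if (unlinked)
		zp->on_unlinked_list = true;	/* we queue it for deletion */

	return (0);
}
```

Passing a non-NULL `unlinkedp` is useful when the caller wants to defer or batch the deletion itself; passing NULL hands that responsibility to the unlinked list.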
/*
 * Indicate whether the directory is empty. Works with or without z_lock
 * held, but can only be considered a hint in the latter case. Returns true
 * if only "." and ".." remain and there's no work in progress.
 */

/*
 * Return a znode for the extended attribute directory for zp.
 * ** If the directory does not already exist, it is created **
 *
 *	IN:	zp	- znode to obtain attribute directory from
 *		cr	- credentials of caller
 *		flags	- flags from the VOP_LOOKUP call
 *
 *	OUT:	xzpp	- pointer to extended attribute znode
 *
 *	RETURN:	0 on success
 *		error number on failure
 */

/*
 * The ability to 'create' files in an attribute
 * directory comes from the write_xattr permission on the base file.
 *
 * The ability to 'search' an attribute directory requires
 * read_xattr permission on the base file.
 *
 * Once in a directory the ability to read/write attributes
 * is controlled by the permissions on the attribute file.
 */

/* NB: we already did dmu_tx_wait() if necessary */

/*
 * Decide whether it is okay to remove within a sticky directory.
 *
 * In sticky directories, write access is not sufficient;
 * you can remove entries from a directory only if:
 *
 *	you own the directory,
 *	you own the entry,
 *	the entry is a plain file and you have write access,
 *	or you are privileged (checked in secpolicy...).
 *
 * The function returns 0 if remove access is granted.
 */
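The sticky-directory rule above is a short chain of checks, sketched below as a pure function. This is not the code from zfs_dir.c: `sk_remove_req` and `sk_sticky_remove_access()` are hypothetical, the privilege check is collapsed into a boolean instead of a secpolicy call, and returning EPERM on denial is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request describing one remove attempt; not a ZFS type. */
struct sk_remove_req {
	bool dir_sticky;		/* sticky bit set on the directory */
	unsigned caller_uid;
	unsigned dir_owner;
	unsigned entry_owner;
	bool entry_is_plain_file;
	bool caller_can_write_entry;
	bool caller_privileged;		/* stand-in for the secpolicy check */
};

/* Returns 0 if remove access is granted, otherwise an errno (assumed EPERM). */
static int
sk_sticky_remove_access(const struct sk_remove_req *r)
{
	assert(r != NULL);

	if (!r->dir_sticky)
		return (0);	/* rule only applies to sticky directories */
	if (r->caller_uid == r->dir_owner)
		return (0);	/* you own the directory */
	if (r->caller_uid == r->entry_owner)
		return (0);	/* you own the entry */
	if (r->entry_is_plain_file && r->caller_can_write_entry)
		return (0);	/* plain file and you have write access */
	if (r->caller_privileged)
		return (0);	/* privileged caller */

	return (EPERM);
}
```

The checks are ordered from cheapest to most expensive, so the privilege test, which in a real kernel involves a policy call, runs only when every ownership test has failed.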