nfs4_srv.c revision a08f57bc3483a7b354af646c7950bdddf23c1d6a
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Copyright (c) 1983,1984,1985,1986,1987,1988,1989  AT&T.
 * All Rights Reserved
 */

/*
 * Used to bump the stateid4.seqid value and show changes in the stateid
 */

/*
 * RFS4_MINLEN_ENTRY4: XDR-encoded size of smallest possible dirent.
 * This is used to return NFS4ERR_TOOSMALL when clients specify
 * maxcount that isn't large enough to hold the smallest possible
 * XDR-encoded dirent:
 *	sizeof cookie (8 bytes) +
 *	sizeof name_len (4 bytes) +
 *	sizeof smallest (padded) name (4 bytes) +
 *	sizeof bitmap4_len (12 bytes) +	NOTE: we always encode len=2 bm4
 *	sizeof attrlist4_len (4 bytes) +
 *	sizeof next boolean (4 bytes)
 *
 * RFS4_MINLEN_RDDIR4: XDR-encoded size of READDIR op reply containing
 * the smallest possible entry4 (assumes no attrs requested).
 *	sizeof nfsstat4 (4 bytes) +
 *	sizeof verifier4 (8 bytes) +
 *	sizeof entry4list bool (4 bytes) +
 *	sizeof entry4 (36 bytes) +
 *	sizeof eof bool (4 bytes)
 *
 * RFS4_MINLEN_RDDIR_BUF: minimum length of buffer server will provide to
 * VOP_READDIR.  Its value is the size of the maximum possible dirent
 * for solaris.  The DIRENT64_RECLEN macro returns the size of dirent
 * required for a given name length.  MAXNAMELEN is the maximum
 * filename length allowed in Solaris.  The first two DIRENT64_RECLEN()
 * macros are to allow for . and .. entries -- just a minor tweak to try
 * and guarantee that the buffer we give to VOP_READDIR will be large
 * enough to hold ., .., and the largest possible solaris dirent64.
 *
 * It would be better to pad to 4 bytes since that's what XDR would do,
 * but the dirents UFS gives us are already padded to 8, so just take
 * what we're given.  Dircount is only a hint anyway.  Currently the
 * solaris kernel is ASCII only, so there's no point in calling the
 * UTF-8 conversion routines.
 *
 * dirent64: name padded to provide 8 byte struct alignment
 *	d_ino(8) + d_off(8) + d_reclen(2) + d_name(namelen + null(1) + pad)
 *
 * cookie: uint64_t + utf8namelen: uint_t + utf8name padded to 8 bytes
 */

/* translation table for attrs */

void (*dis_resfree)();
	/* frees space allocated by proc */

	/* OP_OPEN_CONFIRM = 20 */
	/* OP_OPEN_DOWNGRADE = 21 */
	/* OP_SETCLIENTID = 35 */
	/* OP_SETCLIENTID_CONFIRM = 36 */
	/* OP_RELEASE_LOCKOWNER = 39 */

	"rfs4_op_open_downgrade",
	"rfs4_op_setclientid_confirm",
	"rfs4_op_release_lockowner",
/*
 * The following algorithm attempts to find a unique verifier
 * to be used as the write verifier returned from the server
 * to the client.  It is important that this verifier change
 * whenever the server reboots.  Of secondary importance, it
 * is important for the verifier to be unique between two
 * different servers.
 *
 * Thus, an attempt is made to use the system hostid and the
 * current time in seconds when the nfssrv kernel module is
 * loaded.  It is assumed that an NFS server will not be able
 * to boot and then to reboot in less than a second.  If the
 * hostid has not been set, then the current high resolution
 * time is used.  This will ensure different verifiers each
 * time the server reboots and minimize the chances that two
 * different servers will have the same verifier.
 *
 * XXX - this is broken on LP64 kernels.
 */

/* Used to manage access to server instance linked list */

/* Used to manage access to rfs4_deleg_policy */

/*
 * returns true if the instance's grace period has never been started
 */

/*
 * Indicates if server instance is within the grace period.
 */

/*
 * reset all currently active grace periods
 */

/*
 * start any new instances' grace periods
 */

/*
 * Take a copy of the string, since the original may be overwritten.
 * Sadly, no strdup() in the kernel.
 */

/* associate with servinst */

/*
 * Add to list of served paths.
 * No locking required, as we're only ever called at startup.
 */

/* this is the first dss_path_t */

/*
 * Create a new server instance, and make it the currently active instance.
 * Note that starting the grace period too early will reduce the clients'
 * recovery window.
 */

/*
 * This initial dummy entry is required to set up for insque/remque.
 * It must be skipped over whenever the list is traversed.
 */

/* insque/remque require initial list entry to be self-terminated */

/* make the new instance "current" */

/*
 * In future, we might add a rfs4_servinst_destroy(sip) but, for now,
 * destroy all instances directly.
 */

/*
 * Assign the current server instance to a client_t.
 * Should be called with cp->dbe held.
 * The lock ensures that if the current instance is in the process
 * of changing, we will see the new one.
 */

/*
 * This is a fall-through for invalid or not implemented (yet) ops
 */

/*
 * Check if the security flavor, nfsnum, is in the flavor_list.
 */
for (i = 0; i < count; i++) {
/*
 * Used by rfs4_op_secinfo to get the security information from the
 * export structure associated with the component.
 */

/*
 * If dotdotting, then need to check whether it's above the
 * root of a filesystem, or above an export point.
 */

/*
 * If dotdotting at the root of a filesystem, then
 * need to traverse back to the mounted-on filesystem
 * and do the dotdot lookup there.
 */

/*
 * If at the system root, then can go no further.
 */

/*
 * Traverse back to the mounted-on filesystem.
 */

/*
 * Set the different_export flag so we remember
 * to pick up a new exportinfo entry for this filesystem.
 */

/*
 * If dotdotting above an export point then set
 * the different_export flag to get new export info.
 */

/*
 * Get the vnode for the component "nm".
 */

/*
 * If the vnode is in a pseudo filesystem, or if the security flavor
 * used in the request is valid but not an explicitly shared flavor,
 * or the access bit indicates that this is a limited access,
 * check whether this vnode is visible.
 */

/*
 * If it's a mountpoint, then traverse it.
 */

/* remember that we had to traverse mountpoint */

/*
 * If vp isn't a mountpoint and the vfs ptrs aren't the same,
 * then vp is probably an LOFS object.  We don't need the
 * realvp, we just need to know that we might have crossed
 * a server fs boundary and need to call checkexport4.
 * (LOFS lookup hides server fs mountpoints, and actually calls
 * traverse.)
 */

/*
 * Get the export information for it.
 */

/*
 * If this vnode is a mounted-on vnode,
 * but the mounted-on file system is not
 * exported, send back the secinfo for
 * the exported node that the mounted-on
 * vnode lives in.
 */

/*
 * Create the secinfo result based on the security information
 * from the exportinfo structure (exi).
 *
 * Return all flavors for a pseudo node.
 * For a real export node, return the flavor that the client
 * has access with.
 */
for (i = 0; i < count; i++) {
/* get oid opaque data */

/* find out which flavors to return */
for (i = 0; i < count; i++) {

/* Create the returning SECINFO value */
for (i = 0; i < count; i++) {
/*
 * If the flavor is in the flavor list, return it.
 */

/* get oid opaque data */

/*
 * SECINFO (Operation 33): Obtain required security information on
 * the component name in the format of (security-mechanism-oid, qop, service)
 * triplets.
 *
 * Current file handle (cfh) should have been set before getting
 * into this function.  If not, return error.
 */

/*
 * Verify the component name.  If it fails, error out, but
 * do not error out if the component name is "..".
 * SECINFO will return its parent's secinfo data for SECINFO "..".
 */

/* If necessary, convert to UTF-8 for ill-behaved clients */

/* If this is not an Ok result, nothing to free. */

for (i = 0; i < count; i++) {
#if 0	/* XXX allow access even if !cs->access.  Eventually only pseudo fs */
#endif

/*
 * If the file system is exported read only, it is not appropriate
 * to check write permissions for regular files and directories.
 * Special files are interpreted by the client, so the underlying
 * permissions are sent back to the client for interpretation.
 */

/*
 * We need the mode so that we can correctly determine access
 * permissions relative to a mandatory lock file.  Access to
 * mandatory lock files is denied on the server, so it might
 * as well be reflected to the client during the open.
 */

	char *, "got client label from request(1)",

	char *, "got server label(1) for vp(2)",
/*
 * If we can't get the attributes, then we can't do the
 * right access checking.  So, we'll fail the request.
 */

/*
 * rfs4_op_mknod is called from rfs4_op_create after all initial
 * verification was completed.  It does the nfsv4 create for special
 * files.  nfsv4 create is used to create non-regular files.
 * For regular files, use OPEN.
 */

/*
 * If there is an unshared filesystem mounted on this vnode,
 * do not allow creating an object in this directory.
 */

/* Verify that type is correct */

/*
 * Name of newly created object
 */

/* If necessary, convert to UTF-8 for poorly behaved clients */

/* Get "before" change value */

/*
 * Set default initial values for attributes when not specified
 * in createattrs.
 */
vap->va_mode = 0700;	/* default: owner rwx only */

/*
 * Get the initial "after" sequence number, if it fails, set to zero.
 */
vap->va_mode = 0700;	/* default: owner rwx only */

/*
 * symlink names must be treated as data
 */

/*
 * Get the initial "after" sequence number, if it fails, set to zero.
 */

/*
 * va_seq is not safe over VOP calls, check it again;
 * if it has changed zero out iva to force atomic = FALSE.
 */

/*
 * probably a special file.
 * We know this will only generate one VOP call
 */

/*
 * Get the initial "after" sequence number, if it fails, set to zero.
 */

/*
 * Force modified data and metadata out to stable storage.
 */

/*
 * Finish setup of cinfo response, "before" value already set.
 * Get "after" change value, if it fails, simply return the
 * before value.
 */

/*
 * True verification that object was created with correct
 * attrs is impossible.  The attrs could have been changed
 * immediately after object creation.  If attributes did
 * not verify, the only recourse for the server is to
 * destroy the object.  Maybe if some attrs (like gid)
 * are set incorrectly, the object should be destroyed;
 * however, that seems bad as a default policy.  Do we really
 * want to destroy an object over one of the times not
 * verifying correctly?  For these reasons, the server
 * currently sets bits in attrset for createattrs
 * that were set; however, no verification is done.
 *
 * vmask_to_nmask accounts for vattr bits set on create
 * [do_rfs4_set_attrs() only sets resp bits for
 *
 * Mask off any bits set by default so as not to return
 * more attrset bits than were requested in createattrs.
 */

/*
 * The cinfo.atomic = TRUE only if we got no errors, we have
 * non-zero va_seq's, and it has incremented by exactly one
 * during the creation and it didn't change during the VOP_LOOKUP.
 */

/*
 * Force modified metadata out to stable storage.
 *
 * if an underlying vp exists, pass it to VOP_FSYNC
 */

/* Ensure specified filehandle matches */

/*
 * Check to see if a given "flavor" is an explicitly shared flavor.
 * The assumption of this routine is the "flavor" is already a valid
 * flavor in the secinfo list of "exi".
 */
/*
 * For example:
 *	# share -o sec=flavor1 /export
 *
 * flavor2 is not an explicitly shared flavor for /export,
 * however it is in the secinfo list for /export thru the
 * server namespace setup.
 */

/* Should not reach this point based on the assumption */

/*
 * Check if the security flavor used in the request matches what is
 * required at the export point or at the root pseudo node (exi_root).
 *
 * returns 1 if there's a match or if exported with AUTH_NONE; 0 otherwise.
 */

/*
 * Check cs->nfsflavor (from the request) against
 * the current export data in cs->exi.
 */

/*
 * Check the access authority for the client and return the correct error.
 */

/*
 * First, check if the security flavor used in the request
 * is among the flavors set in the server namespace.
 */

/*
 * bitmap4_to_attrmask is called by getattr and readdir.
 * It sets up the vattr mask and determines whether a vfsstat call is
 * needed based on the input bitmap.
 */

/*
 * Set rdattr_error_req to true to return an error per
 * failed entry rather than fail the readdir.
 */

/*
 * Handle the easy cases first.
 */

/*
 * Check if vfsstat is needed.
 */

/*
 * bitmap4_get_sysattrs is called by getattr and readdir.
 * It makes both VOP_GETATTR and VFS_STATVFS calls to get the attrs.
 */

/*
 * XXX Should do the same checks for whether the bit is set.
 */

/*
 * xdr_free for getattr will be done later.
 */

/*
 * do_rfs4_op_getattr gets the system attrs and converts them into fattr4.
 */

/* if no bits requested, then return empty fattr4 */

/*
 * return NFS4ERR_INVAL when client requests write-only attrs
 */

/*
 * Now loop to get or verify the attrs.
 *	>0 if sv_getit failed to get the attr;
 *	 0 if it succeeded;
 *	<0 if rdattr_error and the attribute cannot be returned.
 */

/*
 * If error then just for entry
 */
} else if ((error > 0) &&

/*
 * If rdattr_error was set after the return value for it was assigned,
 */

/* freed by rfs4_op_getattr_free() */

/* xdrmem_destroy(&xdrs); */	/* NO-OP */

/*
 * res: status (NFS4ERR_OP_ILLEGAL)
 */

/*
 * link: args: SAVED_FH: file, CURRENT_FH: target directory
 * res: status.  If success - CURRENT_FH unchanged, return change_info
 */

/* SAVED_FH: source object */

/* CURRENT_FH: target directory */

/*
 * If there is a non-shared filesystem mounted on this vnode,
 * do not allow linking any file in this directory.
 */

/* Check source object's type validity */

/* Check target directory's type */

/* Get "before" change value */

/*
 * Get the initial "after" sequence number, if it fails, set to zero.
 */

/*
 * Force modified data and metadata out to stable storage.
 */

/*
 * Get "after" change value, if it fails, simply return the
 * before value.
 */

/*
 * The cinfo.atomic = TRUE only if we have
 * non-zero va_seq's, and it has incremented by exactly one
 * during the VOP_LINK and it didn't change during the VOP_FSYNC.
 */

/*
 * Used by rfs4_op_lookup and rfs4_op_lookupp to do the actual work.
 */

/*
 * If dotdotting, then need to check whether it's
 * above the root of a filesystem, or above an
 * export point.
 */

/*
 * If dotdotting at the root of a filesystem, then
 * need to traverse back to the mounted-on filesystem
 * and do the dotdot lookup there.
 */

/*
 * If at the system root, then can go no further.
 */

/*
 * Traverse back to the mounted-on filesystem.
 */

/*
 * Set the different_export flag so we remember
 * to pick up a new exportinfo entry for this filesystem.
 */

/*
 * If dotdotting above an export point then set
 * the different_export flag to get new export info.
 */

/*
 * If the vnode is in a pseudo filesystem, check whether it is visible.
 *
 * XXX if the vnode is a symlink and it is not visible in
 * a pseudo filesystem, return ENOENT (not following symlink).
 * V4 client can not mount such symlink.  This is a regression
 * from V2/V3.
 */

/*
 * In the same exported filesystem, if the security flavor used
 * is not an explicitly shared flavor, limit the view to the visible
 * list entries only.
 */
/*
 * This is not a WRONGSEC case because it's already
 */

/*
 * If it's a mountpoint, then traverse it.
 */

/*
 * hold pre_tvp to counteract rele by traverse.  We will
 * need pre_tvp below if checkexport4 fails.
 */

/*
 * The vfsp comparison is to handle the case where
 * a LOFS mount is shared.  lo_lookup traverses mount points,
 * and NFS is unaware of local fs transitions because
 * v_vfsmountedhere isn't set.  For this special LOFS case,
 * the dir and the obj returned by lookup will have different
 * vfs pointers.
 */

/*
 * If this vnode is a mounted-on vnode,
 * but the mounted-on file system is not
 * exported, send back the filehandle for
 * the mounted-on vnode, not the root of
 * the mounted-on file system.
 */

/* we're done with pre_tvp now.  release extra hold */

/*
 * Now we do a checkauth4.  The reason is that the client may not be
 * entitled to the exported file system, and if he does,
 * the client/user may be mapped to a different uid.
 *
 * We start with a new cr, because the checkauth4 done
 * in the PUT*FH operation overwrote the cred's uid,
 * gid, etc, and we want the real thing before calling
 * VOP_LOOKUP.
 */

/*
 * After various NFS checks, do a label check on the path
 * component.  The label on this path should either be the
 * global zone's label or a zone's label.  We are only
 * interested in the zone's label because exported files
 * in the global zone are accessible (though read-only); that check is
 * done before reaching this code.
 */

	"got client label from request(1)",
	struct svc_req *, req);

/*
 * We grant access to admin_low label clients
 * only if the client is trusted, i.e. also
 * running Solaris Trusted Extensions.
 */

/*
 * if did lookup on attrdir and didn't lookup .., set named attr fh flag
 */

/* Assume false for now, open proc will set this */

/* If necessary, convert to UTF-8 for ill-behaved clients */

/*
 * From NFSv4 Specification, LOOKUPP should not check for
 * NFS4ERR_WRONGSEC.  Return NFS4_OK instead.
 */

/*
 * If file system supports passing ACE mask to VOP_ACCESS then
 * check for ACE_READ_NAMED_ATTRS, otherwise do legacy checks.
 */

/*
 * The CREATE_XATTR_DIR VOP flag cannot be specified if
 * the file system is exported read-only -- regardless of
 * createdir flag.  Otherwise the attrdir would be created
 * (assuming server fs isn't mounted readonly locally).  If
 * VOP_LOOKUP returns ENOENT in this case, the error will
 * be translated into EROFS.  ENOSYS is mapped to ENOTSUP
 * because specfs has no VOP_LOOKUP op, so the macro would
 * return ENOSYS.  EINVAL is returned by all (current)
 * Solaris file system implementations when any of their
 * restrictions are violated (xattr(dir) can't have xattrdir).
 * Returning NOTSUPP is more appropriate in this case
 * because the object will never be able to have an attrdir.
 */

/*
 * There is no requirement for an attrdir fh flag
 * because the attrdir has a vnode flag to distinguish
 * it from regular (non-xattr) directories.  The
 * FH4_ATTRDIR flag is set for future sanity checks.
 */

/*
 * Don't block on mandatory locks.  If this routine returns
 * EAGAIN, the caller should return NFS4ERR_LOCKED.
 */

/*
 * Enter the critical region before calling VOP_RWLOCK
 * to avoid a deadlock with write requests.
 */

/*
 * If we can't get the attributes, then we can't do the
 * right access checking.  So, we'll fail the request.
 */

/*
 * Do not allocate memory more than maximum allowed.
 */

/*
 * If returning data via RDMA Write, then grab the chunk list.  If we
 * aren't returning READ data w/RDMA_WRITE, then grab a mblk.
 */

/*
 * mp will contain the data to be sent out in the read reply.
 */
/*
 * It will be freed after the reply has been sent.  Let's
 * round up the data to a BYTES_PER_XDR_UNIT multiple, so that
 * the call to xdrmblk_putmblk() never fails.  If the first
 * alloc of the requested size fails, then decrease the size to
 * something more reasonable and wait for the allocation to
 * succeed.
 */

	"got client label from request(1)",

/*
 * No filesystem is actually shared public, so we default
 * to exi_root.  In this case, we must check whether root
 * is exported.
 */

/*
 * if root filesystem is exported, the exportinfo struct that we
 * should use is what checkexport4 returns, because root_exi is
 * actually a mostly empty struct.
 */

/*
 * it's a properly shared filesystem
 */

/*
 * XXX - issue with put*fh operations.  Suppose /export/home is exported.
 * Suppose an NFS client goes to mount /export/home/joe.  If /export, home,
 * or joe have restrictive search permissions, then we shouldn't let
 * the client get a file handle.  This is easy to enforce.  However, we
 * don't know what security flavor should be used until we resolve the
 * path name.  Another complication is uid mapping.  If root is
 * the user, then it will be mapped to the anonymous user by default,
 * but we won't know that till we've resolved the path name.  And we won't
 * know what the anonymous user is.
 *
 * Luckily, SECINFO is specified to take a full filename.
 * So what we will have to do in rfs4_op_lookup is check that the flavor
 * of the target object matches that of the request, and if root was the
 * caller, check for the root= and anon= options, and if necessary,
 * repeat the lookup using the right cred_t.  But that's not done yet.
 */

/*
 * Using rootdir, the system root vnode,
 * then use the root fsid & fid to find out if it's exported.
 */

/*
 * If the server root isn't exported directly, then
 * it should at least be a pseudo export based on
 * one or more exports further down in the server's
 * file tree.
 */

	(CE_WARN, "rfs4_op_putrootfh: export check failure"));

/*
 * Now make a filehandle based on the root export.
 */

/*
 * A directory entry is a valid nfsv4 entry if
 *	- it has a non-zero ino
 *	- it is not a dot or dotdot name
 *	- it is visible in a pseudo export or in a real export that can
 *	  only have a limited view.
 */

/*
 * set_rdattr_params sets up the variables used to manage what information
 * to get for each directory entry.
 */

/* could not even figure attr mask */

/*
 * dirent's d_ino is always correct value for mounted_on_fileid.
 * mntdfid_set is set once here, but mounted_on_fileid is
 * set in main dirent processing loop for each dirent.
 * The mntdfid_set is a simple optimization that lets the
 * server attr code avoid work when caller is readdir.
 */

/*
 * Lookup entry only if client asked for any of the following:
 *	c) attrs w/per-object scope requested (change, filehandle, etc)
 *	   other than mounted_on_fileid (which we can take from dirent)
 */

/*
 * If filesystem attrs are requested, get them now from the
 * directory vp, as most entries will have same filesystem.  The only
 * exception are mounted over entries but we handle
 * those as we go (XXX mounted over detection not yet implemented).
 */

/*
 * Failed to get filesystem attributes.
 * Return a rdattr_error for each entry, but don't fail.
 * However, don't get any obj-dependent attrs.
 */

/*
 * At least get fileid for regular readdir output.
 */

/*
 * readlink: args: CURRENT_FH.
 * res: status.  If success - CURRENT_FH unchanged, return linktext.
 */

/* CURRENT_FH: directory */

/*
 * Even though the conversion failed, we return
 * something.  We just don't translate it.
 */

/*
 * treat link name as data
 */

/*
 * Release any state associated with the supplied
 * lockowner.  Note if any lo_state is holding locks we will not
 * rele that lo_state and thus the lockowner will not be destroyed.
 * A client using a lock after the lock owner stateid has been released
 * will suffer the consequence of NFS4ERR_BAD_STATEID and would have
 * to reissue the lock with new_lock_owner set to TRUE.
 */

/* Make sure there is a clientid around for this request */

/*
 * Check for EXPIRED client.
 */
/*
 * If so, state will be reaped within a lease
 * period or on the next set_clientid_confirm step.
 */

/*
 * If no sysid has been assigned, then no locks exist; just return.
 */

/*
 * Mark the lockowner invalid.
 */

/*
 * sysid-pid pair should now not be used since the lockowner is
 * invalid.  If the client were to instantiate the lockowner again
 * it would be assigned a new pid.  Thus we can get the list of
 * held locks.
 */

/* If we are still holding locks fail */

/*
 * We need to unhide the lockowner so the client can
 * try it again.  The bad thing here is if the client
 * has a logic error that took it here in the first place
 * it probably has lost accounting of the locks that it
 * is holding.  So we may have dangling state until the
 * open owner state is reaped via close.  One scenario
 * that could possibly occur is that the client has
 * sent the unlock request(s) in separate threads
 * and has not waited for the replies before sending the
 * RELEASE_LOCKOWNER request.  Presumably, it would expect
 * and deal appropriately with NFS4ERR_LOCKS_HELD, by
 * retrying.
 */

/*
 * For the corresponding client we need to check each open
 * owner for any opens that have lockowner state associated
 * with them.
 */

/*
 * short utility function to lookup a file and recall the delegation
 */

/*
 * remove: args: CURRENT_FH: directory; name.
 * res: status.  If success - CURRENT_FH unchanged, return change_info
 */

/* CURRENT_FH: directory */

/*
 * If there is an unshared filesystem mounted on this vnode,
 * do not allow removing anything in this directory.
 */

/*
 * Lookup the file so that we can check if it's a directory.
 */

/* If necessary, convert to UTF-8 for ill-behaved clients */

/*
 * Lookup the file to determine type and while we are at it see if
 * there is a file struct around and check for delegation.
 * We don't need to acquire va_seq before this lookup, if
 * it causes an update, cinfo.before will not match, which will
 * trigger a cache flush even if atomic is TRUE.
 */

/* Didn't find anything to remove */

/* check label before allowing removal */

	"got client label from request(1)",

/* Get dir "before" change value */

/* Actually do the REMOVE operation */

/*
 * Can't remove a directory that has a mounted-on filesystem.
 */

/*
 * System V defines rmdir to return EEXIST,
 * not ENOTEMPTY, if the directory is not
 * empty.  A System V NFS server needs to map
 * NFS4ERR_EXIST to NFS4ERR_NOTEMPTY to
 * transmit over the wire.
 */

/*
 * This is va_seq safe because we are not
 */

/* Remove state on file remove */

/*
 * Get the initial "after" sequence number, if it fails, set to zero.
 */

/*
 * Force modified data and metadata out to stable storage.
 */

/*
 * Get "after" change value, if it fails, simply return the
 * before value.
 */

/*
 * The cinfo.atomic = TRUE only if we have
 * non-zero va_seq's, and it has incremented by exactly one.
 */

/*
 * rename: args: SAVED_FH: from directory, CURRENT_FH: target directory,
 * res: status.  If success - CURRENT_FH unchanged, return change_info
 * for both from and target directories.
 */

/* CURRENT_FH: target directory */

/* SAVED_FH: from directory */

/*
 * If there is an unshared filesystem mounted on this vnode,
 * do not allow renaming objects in this directory.
 */

/*
 * If there is an unshared filesystem mounted on this vnode,
 * do not allow renaming to this directory.
 */

/* check label of the target dir */

	"got client label from request(1)",

/*
 * Is the source a file and does it have a delegation?
 * We don't need to acquire va_seq before these lookups, if
 * it causes an update, cinfo.before will not match, which will
 * trigger a cache flush even if atomic is TRUE.
 */

/* Does the destination exist, is it a file, and does it have a delegation? */

/* Check for NBMAND lock on both source and target */

/* Get source "before" change value */

/* The file is gone and so should the state */

/*
 * Get the initial "after" sequence number, if it fails, set to zero.
 */

/*
 * Force modified data and metadata out to stable storage.
 */

/*
 * Get "after" change values, if it fails, simply return the
 * before values.
 */

/*
 * The cinfo.atomic = TRUE only if we have
 * non-zero va_seq's, and it has incremented by exactly one
 * during the VOP_RENAME and it didn't change during the VOP_FSYNC.
 */

#ifdef VOLATILE_FH_TEST
/*
 * Add the renamed file handle to the volatile rename list.
 */

/* file handles may expire on rename */

/*
 * Already know that nnm will be a valid string.
 */
#endif /* VOLATILE_FH_TEST */

/* No need to check cs->access - we are not accessing any object */

/* No need to check cs->access - we are not accessing any object */

/*
 * since SAVEFH is fairly rare, don't alloc space for its fh
 */

/*
 * rfs4_verify_attr is called when nfsv4 Setattr failed, but we wish to
 * return the bitmap of attrs that were set successfully.  It must
 * always be called only after rfs4_do_set_attrs().
 *
 * Verify that the attributes are the same as the expected ones.
 * sargp->vap and sargp->sbp contain the input attributes as translated
 * from fattr4.
 *
 * This function verifies only the attrs that correspond to a vattr or
 * vfsstat struct.  That is because of the extra step needed to get the
 * corresponding system structs.  Other attributes have already been set
 * or verified by do_rfs4_set_attrs.
 *
 * Return 0 if all attrs match, -1 if some don't, error if error processing.
 */

/*
 * Okay to overwrite sargp->vap because we verify based
 * on the incoming values.
 */

/*
 * Must return bitmap of successful attrs
 */
/* to prevent checking vap later */ * Some file systems clobber va_mask. it is probably * wrong of them to do so, nonethless we practice * Now get the superblock and loop on the bitmap, as there is * no simple way of translating from superblock to bitmap4. * Now loop and verify each attribute which getattr returned * whether it's the same as the input. * If vattr attribute but VOP_GETATTR failed, or it's * superblock attribute but VFS_STATVFS failed, skip else /* update response bitmap */ * Decode the attribute to be set/verified. If the attr requires a sys op * (VOP_GETATTR, VFS_VFSSTAT), and the request is to verify, then don't * call the sv_getit function for it, because the sys op hasn't yet been done. * Return 0 for success, error code if failed. * Note: the decoded arg is not freed here but in nfs4_ntov_table_free. * don't verify yet if a vattr or sb dependent attr, * because we don't have their sys values yet. * ACLs are a special case, since setting the MODE * conflicts with setting the ACL. We delay setting * the ACL until all other attributes have been set. * The ACL gets set in do_rfs4_op_setattr(). "decoding attribute %d\n", k);
* Set vattr based on incoming fattr4 attrs - used by setattr. * Set response mask. Ignore any values that are not writable vattr attrs. * Make sure that maximum attribute number can be expressed as an /* sargp->sbp is set by the caller */ * The following loop iterates on the nfs4_ntov_map checking * if the fbit is set in the requested bitmap. * If set then we process the arguments using the * rfs4_fattr4 conversion functions to populate the setattr * vattr and va_mask. Any settable attrs that are not using vattr * will be set in this loop. * If setattr, must be a writable attr. * unsupported attribute, tries to set * a read only attr or verify a write /* xdrmem_destroy(&xdrs); */ /* NO-OP */ * no further work to be done * If we got a request to set the ACL and the MODE, only * allow changing VSUID, VSGID, and VSVTX. Attempting * to change any other bits, along with setting an ACL, /* Check stateid only if size has been set */ /* XXX start of possible race with delegations */ * We need to specially handle size changes because it is * possible for the client to create a file with read-only * modes, but with the file opened for writing. If the client * then tries to set the file size, e.g. ftruncate(3C), * fcntl(F_FREESP), the normal access checking done in * VOP_SETATTR would prevent the client from doing it even though * it should be allowed to do so. To get around this, we do the * access checking for ourselves and use VOP_SPACE which doesn't * do the access checking. * Also the client should not be allowed to change the file * size if there is a conflicting non-blocking mandatory lock in * the region of the change. * ufs_setattr clears AT_SIZE from vap->va_mask, but * before returning, sarg.vap->va_mask is used to * generate the setattr reply bitmap. We also clear * AT_SIZE below before calling VOP_SPACE. For both * of these cases, the va_mask needs to be saved here * and restored after calling VOP_SETATTR. * Check any possible conflict due to NBMAND locks. 
/*
 * Get into critical region before VOP_GETATTR, so the
 * size attribute is valid when checking conflicts.
 */

/* restore va_mask -- ufs_setattr clears AT_SIZE */

/*
 * If an ACL was being set, it has been delayed until now,
 * in order to set the mode (via the VOP_SETATTR() above) first.
 */

	"unable to find ACL in fattr4"));

/* check if a monitor detected a delegation conflict */

/*
 * Set the response bitmap when setattr failed.
 * If VOP_SETATTR partially succeeded, test by doing a
 * VOP_GETATTR on the object and comparing the data
 * to the setattr arguments.
 */

/*
 * Force modified metadata out to stable storage.
 */

/* Return early and already have a NFSv4 error */

/*
 * Except for nfs4_vmask_to_nmask_set(), vattr --> fattr
 * conversion sets both readable and writeable NFS4 attrs
 * for AT_MTIME and AT_ATIME.  The line below masks out
 * unrequested attrs from the setattr result bitmap.  This
 * is placed after the done: label to catch the ATTRNOTSUP
 * case.
 */

/*
 * If there is an unshared filesystem mounted on this vnode,
 * do not allow setattr on this vnode.
 */

/* check label before setting attributes */

	"got client label from request(1)",

/*
 * verify and nverify are exactly the same, except that nverify
 * succeeds when some argument changed, and verify succeeds when
 * nothing changed.
 */

/*
 * do_rfs4_set_attrs will try to verify systemwide attrs,
 * so it could return -1 for "no match".
 */

/*
 * XXX - This should live in an NFS header file.
 */

/*
 * We have to enter the critical region before calling VOP_RWLOCK
 * to avoid a deadlock with ufs.
 */

/*
 * If we can't get the attributes, then we can't do the
 * right access checking.  So, we'll fail the request.
 */

/* should have ended on an mblk boundary */
"bytes=0x%x, round_len=0x%x, req len=0x%x\n",
printf(
"args=%p, args->mblk=%p, m=%p", (
void *)
args,
 * We're changing creds because VM may fault and we need * the cred of the current thread to be used if quota /* XXX put in a header file */ * Form a reply tag by copying over the request tag. * XXX for now, minorversion should be zero * For now, NFS4 compound processing must be protected by * exported_lock because it can access more than one exportinfo * exinfo structs. The NFS2/3 code only refs 1 exportinfo * per proc (excluding public exinfo), and exi_count design * is sufficient to protect concurrent execution of NFS2/3 * ops along with unexport. This lock will be removed as * part of the NFSv4 phase 2 namespace redesign work. * If this is the first compound we've seen, we need to start all * new instances' grace periods. * This must be set after rfs4_grace_start_new(), otherwise * another thread could proceed past here before the former * Count the individual ops here; NULL and COMPOUND * are counted in common_dispatch() * This is effectively dead code since XDR code * will have already returned BADXDR if op doesn't * decode to legal value. This is only done for a * day when XDR code doesn't verify v4 opcodes. * If not at last op, and if we are to stop, then * compact the results array. * done with this compound request, free the label * XXX because of what appears to be duplicate calls to rfs4_compound_free * XXX zero out the tag and array values. Need to investigate why the * XXX calls occur, but at least prevent the panic for now. * Process the value of the compound request rpc flags, as a bit-AND * of the individual per-op flags (idempotent, allowork, publicfh_ok) "rfs4_client_sysid: allocated 0x%x\n", *sp)); default: op = "F_UNKNOWN"; default: type = "F_UNKNOWN";
* Look up the pathname using the vp in cs as the directory vnode. * cs->vp will be the vnode for the file on success /* Get "before" change value */ /* rfs4_lookup may VN_RELE directory */ * Get "after" change value, if it fails, simply return the * Validate the file is a file * It is undefined if VOP_LOOKUP will change va_seq, so * cinfo.atomic = TRUE only if we have * non-zero va_seq's, and they have not changed. /* Check for mandatory locking */ * The file open mode used is VWRITE. If the client needs * some other semantic, then it should do the access checking * itself. It would have been nice to have the file open mode * passed as part of the arguments. * If we got something other than file already exists * then just return this error. Otherwise, we got * EEXIST. If we were doing a GUARDED create, then * just return this error. Otherwise, we need to * make sure that this wasn't a duplicate of an * exclusive create request. * The assumption is made that a non-exclusive create * request will never return EEXIST. * We couldn't find the file that we thought that * we just created. So, we'll just try creating /* existing object must be regular file */ /* Check for duplicate request */ /* but its not our creation */ /* For now we don't allow mandatory locking as per V2/V3 */ * If the file system is exported read only and we are trying * to open for write, then return NFS4ERR_ROFS /* Check if the file system is read only */ /* check the label of including directory */ "got client label from request(1)",
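The EEXIST handling described above can be sketched as a small decision helper. This is an illustrative model only; the helper name is hypothetical, while the createmode4 values come from the NFSv4 protocol.

```c
#include <assert.h>
#include <errno.h>

typedef enum { UNCHECKED4, GUARDED4, EXCLUSIVE4 } createmode4;

/*
 * Returns 1 if the server should go on to check whether this EEXIST
 * is a retransmission of the same EXCLUSIVE4 create (i.e. the stored
 * verifier matches), 0 if the error should simply be returned to the
 * client. A non-exclusive create is assumed never to return EEXIST.
 */
static int
check_dup_exclusive(int error, createmode4 mode)
{
	if (error != EEXIST)
		return (0);		/* some other failure: return it */
	if (mode == GUARDED4)
		return (0);		/* guarded create: EEXIST is final */
	return (mode == EXCLUSIVE4);	/* maybe a replayed excl create */
}
```

Only in the EXCLUSIVE4 case does the server compare the stored create verifier before deciding whether the "duplicate" create should succeed.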
* Get the last component of path name in nm. cs will reference * the including directory on success. /* Disallow create with a non-zero size */ /* prohibit EXCL create of named attributes */ * Ensure no time overflows. Assumes underlying * filesystem supports at least 32 bits. * Truncate nsec to usec resolution to allow valid * compares even if the underlying filesystem truncates. /* If necessary, convert to UTF-8 for illbehaved clients */ * True verification that object was created with correct * attrs is impossible. The attrs could have been changed * immediately after object creation. If attributes did * not verify, the only recourse for the server is to * destroy the object. Maybe if some attrs (like gid) * are set incorrectly, the object should be destroyed; * however, seems bad as a default policy. Do we really * want to destroy an object over one of the times not * verifying correctly? For these reasons, the server * currently sets bits in attrset for createattrs * that were set; however, no verification is done. * vmask_to_nmask accounts for vattr bits set on create * [do_rfs4_set_attrs() only sets resp bits for * Mask off any bits we set by default so as not to return * more attrset bits than were requested in createattrs * We did not create the vnode (we tried but it * already existed). In this case, the only createattr * that the spec allows the server to set is size, * and even then, it can only be set if it is 0. * Get the initial "after" sequence number, if it fails, * set to zero, time to before. * create_vnode attempts to create the file exclusive, * if it already exists the VOP_CREATE will fail and * may not increase va_seq. It is atomic if * we haven't changed the directory, but if it has changed * we don't know what changed it. 
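The time-overflow and nsec-to-usec truncation checks mentioned above can be modeled as follows. This is a sketch under stated assumptions: the struct and function names are hypothetical, and it assumes the underlying filesystem stores 32-bit seconds and microsecond resolution.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
	int64_t		seconds;
	uint32_t	nseconds;
} nfstime4;

/*
 * Convert an NFSv4 time to a 32-bit-seconds, microsecond-resolution
 * filesystem time. Returns 0 on success, -1 if the seconds value
 * overflows the filesystem's 32-bit representation. Truncating nsec
 * to usec keeps later verifier compares valid even when the
 * filesystem itself truncates.
 */
static int
nfstime4_to_fs_time(const nfstime4 *nt, int32_t *secp, int32_t *usecp)
{
	if (nt->seconds < INT32_MIN || nt->seconds > INT32_MAX)
		return (-1);			/* time overflow */
	*secp = (int32_t)nt->seconds;
	*usecp = (int32_t)(nt->nseconds / 1000);	/* nsec -> usec */
	return (0);
}
```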
 * The entry was created, we need to sync the * Get "after" change value, if it fails, simply return the * The cinfo->atomic = TRUE only if we have * non-zero va_seq's, and it has incremented by exactly one * during the create_vnode and it didn't * change during the VOP_FSYNC. /* Check for mandatory locking and that the size gets set. */ * Truncate the file if necessary; this would be * the case for create over an existing file. * We are writing over an existing file. * Check to see if we need to recall a delegation. * Force modified data and metadata out to stable storage. /* if parent dir is attrdir, set namedattr fh flag */ * if we did not create the file, we will need to check * the access bits on the file /* XXX Currently not using req */ /* get the file struct and hold a lock on it during initial open */ (CE_NOTE, "rfs4_do_open: can't find file")); (CE_NOTE, "rfs4_do_open: can't find state"));
/* No need to keep any reference */ /* try to get the sysid before continuing */ /* Not a fully formed open; "close" it */ /* Calculate the fflags for this OPEN. */ * Calculate the new deny and access mode that this open is adding to * the file for this open owner; * Check to see if the client has already sent an open for this * open owner on this file with the same share/deny modes. * If so, we don't need to check for a conflict and we don't * need to add another shrlock. If not, then we need to * check for conflicts in deny and access before checking for * conflicts in delegation. We don't want to recall a * delegation based on an open that will eventually fail based /* Not a fully formed open; "close" it */ * Check to see if this file is delegated and if so, if a * recall needs to be done. /* Let's see if the delegation was returned */ /* Not a fully formed open; "close" it */ * the share check passed and any delegation conflict has been * taken care of, now call vop_open. * if this is the first open then call vop_open with fflags. * if not, call vn_open_upgrade with just the upgrade flags. * if the file has been opened already, it will have the current * access mode in the state struct. if it has no share access, then * However, if this is open with CLAIM_DELEGATE_CUR, then don't * call VOP_OPEN(), just do the open upgrade. /* Not a fully formed open; "close" it */ /* check if a monitor detected a delegation conflict */ } else {
/* open upgrade */ * calculate the fflags for the new mode that is being added * Check for delegation here. if the deleg argument is not * DELEG_ANY, then this is a reclaim from a client and * we must honor the delegation requested. If necessary we can /* inhibit delegation grants during exclusive create */ /* cs->vp cs->fh now reference the desired file */ * If rfs4_createfile set attrset, we must * clear this attrset before the response is copied. /* Verify that we have a regular file */ * Check if we have access to the file. Note that the file * could have originally been open UNCHECKED or GUARDED * with mode bits that will now fail, but there is nothing * we can really do about that except in the case that the * owner of the file is the one requesting the open. * cinfo on a CLAIM_PREVIOUS is undefined, initialize to zero * Find the state info from the stateid and confirm that the * file is delegated. If the state openowner is the same as * the supplied openowner we're done. If not, get the file * info from the found state info. Use that file info to * create the state for this lock owner. Note solaris doesn't * really need the pathname to find the file. We may want to * lookup the pathname and make sure that the vp exists and * matches the vp in the file structure. However it is * possible that the pathname no longer exists (local process * unlinks the file), so this may not be that useful. * New lock owner, create state. Since this was probably called * in response to a CB_RECALL we set deleg to DELEG_NONE /* Mark progress for delegation returns */ * Lookup the pathname, it must already exist since this file * Find the file and state info for this vp and open owner pair. * check that they are in fact delegated. * check that the state access and deny modes are the same. * Return the delegation, possibly setting the recall flag. /* Note we ignore oflags */ /* get the file struct and hold a lock on it during initial open */ (CE_NOTE, "rfs4_do_opendelprev: can't find file")); (CE_NOTE, "rfs4_do_opendelprev: can't find state")); (CE_NOTE, "rfs4_do_opendelprev: state mixup"));
 * Generic function for sequence number checks. /* Same sequence ids and matching operations? */ "Replayed SEQID %d\n", seqid));
/* If the incoming sequence is not the next expected then it is bad */ (CE_NOTE, "BAD SEQID: Replayed sequence id " "but last op was %d current op is %d\n", (CE_NOTE, "BAD SEQID: got %u expecting %u\n",
/* Everything okay -- next expected */ * Need to check clientid and lease expiration first based on * error ordering and incrementing sequence id. * Find the open_owner for use from this point forward. Take * care in updating the sequence id based on the type of error /* Hold off access to the sequence space while the open is done */ * If the open_owner existed before at the server, then check * Sequence was ok and open owner exists * check to see if we have yet to see an /* Grace only applies to regular-type OPENs */ * If previous state at the server existed then can_reclaim * will be set. If not reply NFS4ERR_NO_GRACE to the * Reject the open if the client has missed the grace period /* Couple of up-front bookkeeping items */ * If this is a reclaim OPEN then we should not ask * for a confirmation of the open_owner per the * protocol specification. * If there is an unshared filesystem mounted on this vnode, * access must READ, WRITE, or BOTH. No access is invalid. * deny can be READ, WRITE, BOTH, or NONE. * make sure attrset is zero before response is built. /* Catch sequence id handling here to make it a little easier */ * The protocol states that if any of these errors are * being returned, the sequence id should not be * incremented. Any other return requires an /* Always update the lease in this case */ /* Regular response - copy the result */ * REPLAY case: Only if the previous response was OK * do we copy the filehandle. If not OK, no * If this is a replay, we must restore the * returned originally. Try our best to do * If this was a replay, no need to update the * sequence id. If the open_owner was not created on * this pass, then update. The first use of an * open_owner will not bump the sequence id. * If the client is receiving an error and the * open_owner needs to be confirmed, there is no way * to notify the client of this fact ignoring the fact * that the server has no method of returning a * stateid to confirm. 
Therefore, the server needs to * mark this open_owner in a way as to avoid the * sequence id checking the next time the client uses * If OK response then clear the postpone flag and * reset the sequence id to keep in sync with the /* Ensure specified filehandle matches */ /* hold off other access to open_owner while we tinker */ * If it is the appropriate stateid and determined to * be "OKAY" then this means that the stateid does not * need to be confirmed and the client is in error for * sending an OPEN_CONFIRM. * This is replayed stateid; if seqid matches * next expected, then client is using wrong seqid. * Note this case is the duplicate case so * resp->status is already set. /* Ensure specified filehandle matches */ /* hold off other access to open_owner while we tinker */ /* Check the sequence id for the open owner */ * This is replayed stateid; if seqid matches * next expected, then client is using wrong seqid. * Note this case is the duplicate case so * resp->status is already set. * Check that the new access modes and deny modes are valid. * Check that no invalid bits are set. * The new modes must be a subset of the current modes and * the access must specify at least one mode. To test that * the new mode is a subset of the current modes we bitwise * AND them together and check that the result equals the new * New mode, access == R and current mode, sp->share_access == RW * access & sp->share_access == R == access, so the new access mode * is valid. Consider access == RW, sp->share_access = R * access & sp->share_access == R != access, so the new access mode * Release any share locks associated with this stateID. * Strictly speaking, this violates the spec because the * spec effectively requires that open downgrade be atomic. * At present, fs_shrlock does not have this capability. 
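The subset test for OPEN_DOWNGRADE described above (new mode AND current mode must equal the new mode, and at least one access bit must be set) is small enough to show directly. A sketch, with the OPEN4_SHARE_ACCESS_* values taken from the NFSv4 protocol and the helper name hypothetical:

```c
#include <assert.h>

#define OPEN4_SHARE_ACCESS_READ		0x1
#define OPEN4_SHARE_ACCESS_WRITE	0x2
#define OPEN4_SHARE_ACCESS_BOTH		0x3

/*
 * The downgraded access must be a non-empty subset of the current
 * share access: AND-ing the requested mode with the current mode
 * must give back the requested mode unchanged.
 */
static int
downgrade_access_ok(unsigned cur_access, unsigned new_access)
{
	return (new_access != 0 &&
	    (new_access & cur_access) == new_access);
}
```

This matches the worked example in the comment: downgrading RW to R passes (R & RW == R), while "downgrading" R to RW fails (RW & R == R != RW). The same test applies to the deny modes, except that zero deny is allowed.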
* If the current mode has deny read and the new mode * does not, decrement the number of deny read mode bits * and if it goes to zero turn off the deny read bit * If the current mode has deny write and the new mode * does not, decrement the number of deny write mode bits * and if it goes to zero turn off the deny write bit * If the current mode has access read and the new mode * does not, decrement the number of access read mode bits * and if it goes to zero turn off the access read bit * on the file. set fflags to FREAD for the call to * If the current mode has access write and the new mode * does not, decrement the number of access write mode bits * and if it goes to zero turn off the access write bit * on the file. set fflags to FWRITE for the call to /* Set the new access and deny modes */ /* Check that the file is still accessible */ * we successfully downgraded the share lock, now we need to downgrade * the open. it is possible that the downgrade was only for a deny * mode and we have nothing else to do. * The logic behind this function is detailed in the NFSv4 RFC in the * SETCLIENTID operation description under IMPLEMENTATION. Refer to * that section for explicit guidance to server behavior for * In search of an EXISTING client matching the incoming * request to establish a new client identifier at the server /* Should never happen */ * Easiest case. Client identifier is newly created and is * unconfirmed. Also note that for this case, no other * entries exist for the client identifier. Nothing else to * check. Just setup the response and respond. /* Setup callback information; CB_NULL confirmation later */ * An existing, confirmed client may exist but it may not have * been active for at least one lease period. If so, then * "close" the client and create a new client identifier * We have a confirmed client, now check for an /* If creds don't match then client identifier is inuse */ * Some one else has established this client * id. 
Try and say * who they are. We will use * the call back address supplied by * the * Confirmed, creds match, and verifier matches; must * be an update of the callback info /* Setup callback information */ /* everything okay -- move ahead */ /* update the confirm_verifier and return it */ * Creds match but the verifier doesn't. Must search * for an unconfirmed client that would be replaced by * At this point, we have taken care of the brand new client * struct, INUSE case, update of an existing, and confirmed * check to see if things have changed while we originally * picked up the client struct. If they have, then return and * retry the processing of this SETCLIENTID request. /* do away with the old unconfirmed one */ * This search will temporarily hide the confirmed client * struct while a new client struct is created as the * If one was not created, then a similar request must be in * process so release and start over with this one /* Setup callback information; CB_NULL confirmation later */ /* If the verifier doesn't match, the record doesn't match */ * Update the client's associated server instance, if it's changed * since the client was created. * Record clientid in stable storage. * Must be done after server instance has been assigned. /* don't need to rele, client_close does it */ /* If needed, initiate CB_NULL call for callback path */ * Check to see if client can perform reclaims /* Ensure specified filehandle matches */ /* hold off other access to open_owner while we tinker */ /* Check the sequence id for the open owner */ * This is replayed stateid; if seqid matches * next expected, then client is using wrong seqid. * Note this case is the duplicate case so * resp->status is already set. /* Update the stateid. */ * Manage the counts on the file struct and close all file locks * Decrement the count for each access and deny bit that this * state has contributed to the file. If the file counts go to zero * clear the appropriate bit in the appropriate mask. 
* If this call is part of the larger closing down of client * state then it is just easier to release all locks * associated with this client instead of going through each * individual file and cleaning locks there. /* Is the PxFS kernel module loaded? */ /* Encode the cluster nodeid in new sysid */ * This PxFS routine removes file locks for a * client over all nodes of a cluster. "lm_remove_file_locks(sysid=0x%x)\n",
/* Release all locks for this client */ * Release all locks on this file by this lock owner or at * least mark the locks as having been released /* Was this already taken care of above? */ * Release any shrlocks associated with this open state ID. * This must be done before the rfs4_state gets marked closed. * lock_denied: Fill in a LOCK4denied structure given an flock64 structure. * It's not an NFS4 lock. We take advantage that the upper 32 bits * of the client id contain the boot time for a NFS4 lock. So we * fabricate an identity by setting clientid to the sysid, and * the lock owner to the pid. /* Get the owner of the lock */ /* No longer locked, retry */ /* Check for zero length. To lock to end of file use all ones for V4 */ length = 0; /* Posix to end of file */ /* Note that length4 is uint64_t but l_len and l_start are off64_t */ * N.B. FREAD has the same value as OPEN4_SHARE_ACCESS_READ and * FWRITE has the same value as OPEN4_SHARE_ACCESS_WRITE. * N.B. We map error values to nfsv4 errors. This is different * from the puterrno4 routine. /* Can only get here if op is OP_LOCK */ /* Create a new lockowner for this instance */ /* Ensure specified filehandle matches */ /* hold off other access to open_owner while we tinker */ * This is replayed stateid; if seqid * matches next expected, then client /* This is a duplicate LOCK request */ * For a duplicate we do not want to * create a new lockowner as it should * Turn off the lockowner create flag. (CE_NOTE, "rfs4_op_lock: no lock owner"));
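The zero-length and all-ones conventions noted above (NFSv4 uses a length of all ones to mean "to end of file", POSIX uses l_len == 0) make the range conversion worth sketching. A stand-alone model, with the function name hypothetical and int64_t standing in for off64_t:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an NFSv4 (offset, length) pair into POSIX l_start/l_len.
 * length4 is uint64_t while l_start and l_len are signed 64-bit, so
 * ranges past INT64_MAX cannot be represented; return -1 for those.
 */
static int
nfs4_to_posix_range(uint64_t offset, uint64_t length,
    int64_t *l_start, int64_t *l_len)
{
	if (offset > INT64_MAX)
		return (-1);
	*l_start = (int64_t)offset;
	if (length == UINT64_MAX) {
		*l_len = 0;		/* POSIX: lock to end of file */
		return (0);
	}
	if (length > INT64_MAX || offset + length > INT64_MAX)
		return (-1);
	*l_len = (int64_t)length;
	return (0);
}
```

Note the end-of-range check uses offset + length, which cannot wrap here because both operands have already been bounded by INT64_MAX.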
 * Only update the open_seqid if this is not (CE_NOTE, "rfs4_op_lock: no state"));
 * This is the new_lock_owner branch and the client is * supposed to be associating a new lock_owner with * the open file at this point. If we find that a * successful LOCK request was returned to the client, * an error is returned to the client since this is * not appropriate. The client should be using the * existing lock_owner branch. * Only update the open_seqid if this is not * If this is a duplicate lock request, just copy the * previously saved reply and return. /* verify that lock_seqid's match */ (CE_NOTE, "rfs4_op_lock: Dup-Lock seqid bad" "lsp->seqid=%d old->seqid=%d",
* Make sure to copy the just * retrieved reply status into the * overall compound status /* Make sure to update the lock sequence id */ * This is used to signify the newly created lockowner * stateid and its sequence number. The checks for * sequence number and increment don't occur on the * very first lock request for a lockowner. /* hold off other access to lsp while we tinker */ /* get lsp and hold the lock on the underlying file struct */ /* Ensure specified filehandle matches */ /* hold off other access to lsp while we tinker */ * The stateid looks like it was okay (expected to be * The sequence id is now checked. Determine * if this is a replay or if it is in the * expected (next) sequence. In the case of a * replay, there are two replay conditions * that may occur. The first is the normal * condition where a LOCK is done with a * NFS4_OK response and the stateid is * updated. That case is handled below when * the stateid is identified as a REPLAY. The * second is the case where an error is * returned, like NFS4ERR_DENIED, and the * sequence number is updated but the stateid * is not updated. This second case is dealt * with here. So it may seem odd that the * stateid is okay but the sequence id is a * Here is our replay and need * to verify that the last * This is done since the sequence id * looked like a replay but it didn't * pass our check so a BAD_SEQID is /* Everything looks okay move ahead */ * This is a replayed stateid; if * seqid matches the next expected, * then client is using wrong seqid. * NFS4 only allows locking on regular files, so * Only update the "OPEN" response here if this was a new * If an sp obtained, then the lsp does not represent * a lock on the file struct. /* Ensure specified filehandle matches */ /* hold off other access to lsp while we tinker */ * This is a replayed stateid; if * seqid matches the next expected, * then client is using wrong seqid. 
 * NFS4 only allows locking on regular files, so * LOCKT is a best effort routine, the client can not be guaranteed that * the status return is still in effect by the time the reply is received. * There are numerous race conditions in this routine, but we are not required * and can not be accurate. * NFS4 only allows locking on regular files, so * Check out the clientid to ensure the server knows about it * so that we correctly inform the client of a server reboot. * Protocol doesn't allow returning NFS4ERR_STALE as * other operations do on this check so STALE_CLIENTID /* Check for zero length. To lock to end of file use all ones for V4 */ /* Find or create a lockowner */ /* Note that length4 is uint64_t but l_len and l_start are off64_t */ * N.B. We map error values to nfsv4 errors. This is different * from the puterrno4 routine. (CE_NOTE, "rfs4_shrlock %s vp=%p acc=%d dny=%d sysid=%d " /* Can make the rest chunks all 0-len */ * MUST fail if there are still more data