clnt_clts.c revision de8c4a14ec9a49bad5e62b2cfa6c1ba21de1c708
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Copyright (c) 1983, 1984, 1985, 1986, 1987, 1988, 1989 AT&T
 * All Rights Reserved
 *
 * Portions of this source code were derived from Berkeley 4.3 BSD
 * under license from the Regents of the University of California.
 */

/*
 * Implements a kernel-based, client-side RPC over connectionless
 * transports (CLTS):
 *
 *	Operations vector for CLTS-based RPC
 *	Endpoint for CLTS (INET, INET6, loopback, etc.)
 *	Response completion hash queue
 *	Routines for the endpoint manager
 *	Request dispatching function
 */

/*
 * The size of the preserialized RPC header information.
 *
 * The initial allocation size.  It is small to reduce space requirements.
 *
 * The size of additional allocations, if required.  It is larger to
 * reduce the number of actual allocations.
 */

/*
 * Private data per RPC handle.  This structure is allocated by
 * clnt_clts_kcreate(), and freed by clnt_clts_kdestroy().
 */
	/* ptr to feedback rtn */

/*
 * The following is used to determine the global default behavior for
 * CLTS when binding to a local port.
 *
 * If the value is set to 1 the default will be to select a reserved
 * (aka privileged) port; if the value is zero the default will be to
 * use non-reserved ports.
 */
/*
 * Users of kRPC may override this by using CLNT_CONTROL() and
 * CLSET_BINDRESVPORT.
 */

/*
 * Backwards compatibility for old kstat clients.
 */

/*
 * Create an rpc handle for a clts rpc connection.
 * Allocates space for the handle structure and the private data.
 */
	/* call message, just used to pre-serialize below */

	/* pre-serialize call message header */
	/* LINTED pointer alignment */

/*
 * Set the timers.  Returns the current retransmission timeout.
 */
	/* LINTED pointer alignment */

/*
 * Time out back off function.  tim is in HZ.
 */

/*
 * Call remote procedure.
 * Most of the work of rpc is done here.  We serialize what is left
 * of the header (some was pre-serialized in the handle), serialize
 * the arguments, and send it off.  We wait for a reply or a time out.
 * Timeout causes an immediate return; other packet problems may cause
 * a retry on the receive.  When a good packet is received we deserialize
 * it, and check verification.  A bad reply code will cause one retry
 * with full (longhand) credentials.
 */
	/* LINTED pointer alignment */

	/*
	 * Copy in the preserialized RPC header.
	 */

	/*
	 * Transaction id is the 1st thing in the output buffer.
	 */
	/* LINTED pointer alignment */

	/* Skip the preserialized stuff. */

	/* Serialize dynamic stuff into the output buffer. */

	/* Serialize the procedure number and the arguments. */

	/*
	 * Grab an endpnt only if the endpoint is NULL.  We could be
	 * retrying the request and in this case we want to go through
	 * the same source port, so that the duplicate request cache
	 * may detect a retransmission.
	 */
/*
 * There are two reasons for which we go back to tryread:
 *
 * a)	In case the status is RPC_PROCUNAVAIL and we sent out a
 *	broadcast, we should not get any invalid messages with the
 *	RPC_PROCUNAVAIL error back.  Some broken RPC implementations
 *	send them anyway, and for this we have to ignore them (as we
 *	would never have received them) and look for another message
 *	which might contain the valid response, because we don't know
 *	how many broken implementations are in the network.  So we are
 *	going to loop until
 *	- we received a valid response, or
 *	- we have processed all invalid responses and got a time out
 *	  when we tried to receive again.
 *
 * b)	We will also jump back to tryread in case we failed within
 *	AUTH_VALIDATE.  In this case we should move on and loop until
 *	we have received a valid response or we have processed all
 *	responses with broken authentication and got a time out when
 *	we tried to receive a message.
 */

/*
 * We have to reset call_notified here.  In case we have to do a
 * retry (e.g. in case we got an RPC_PROCUNAVAIL error) we need to
 * set this to false to ensure that we will wait for the next
 * message.  When the next message arrives, the function
 * clnt_clts_dispatch_notify() will set this to true again.
 */

/*
 * We got interrupted, bail out.
 */

/*
 * It's possible that our response arrived right after we timed out.
 * Check to see if it has arrived before we remove the calllist from
 * the dispatch queue.
 */

#if 0	/* XXX not yet */
/*
 * Timeout may be due to a dead gateway.  Send an ioctl downstream
 * advising deletion of the route when we reach the half-way point
 * to timing out.
 */
#endif

/*
 * Check to see if a response arrived.  If one is present then
 * proceed to process the response.  Otherwise fall through to retry
 * or retransmit the request.  This is probably not the optimal thing
 * to do, but since we are most likely dealing with an unreliable
 * transport it is the safe thing to do.
 */

/*
 * Prepare the message for further processing.  We need to remove
 * the datagram header and copy the source address if necessary.  No
 * need to verify the header since rpcmod took care of that.
 */

/*
 * Copy the source address if the caller has supplied a netbuf.
 */

/*
 * Pop off the datagram header.
 */

/*
 * Van Jacobson timer algorithm here, only if NOT a retransmission.
 */

/*
 * xdr_results will be done in AUTH_UNWRAP.
 */

/*
 * Decode and validate the response.
 */

/*
 * Reply is good, check auth.
 */
	/* set errno in case we can't recover */

/*
 * Determine whether or not we're doing an RPC broadcast.  Some
 * server implementations don't follow RFC 1050, section 7.4.2, in
 * that they don't remain silent when they see a proc they don't
 * support.  Therefore we keep trying to receive on RPC_PROCUNAVAIL,
 * hoping to get a valid response from a compliant server.
 */

/*
 * Maybe our credentials need to be refreshed.
 */

/*
 * The credential is refreshed.  Try the request again.
 * Even if stries == 0, we still retry as long as refreshes > 0.
 * This prevents a soft authentication error from turning into a
 * hard one at an upper level.
 */

/*
 * We have used the client handle to do an AUTH_REFRESH and the RPC
 * status may be set to RPC_SUCCESS; let's make sure to set it to
 * RPC_AUTHERROR.
 */
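The retry path above backs off the retransmit timeout and, per Karn's rule, feeds only non-retransmitted replies into a Van Jacobson style smoothed round-trip estimator. A minimal userspace sketch of both mechanisms; the function names, the cap, and the scaling factors here are illustrative assumptions, not the kernel's actual code:

```c
#include <assert.h>

/* Illustrative cap on the retransmit timeout, in clock ticks. */
#define	RETRY_TIM_MAX	3000

/* Exponential backoff: double the timeout, clamped to the maximum. */
static int
clts_backoff(int tim)
{
	tim <<= 1;
	return (tim > RETRY_TIM_MAX ? RETRY_TIM_MAX : tim);
}

/*
 * Van Jacobson style smoothing in integer arithmetic: srtt holds the
 * smoothed RTT scaled by 8, rttvar the mean deviation scaled by 4.
 * Applied only to measurements that are not retransmissions.
 */
struct clts_timers {
	int srtt;
	int rttvar;
};

static int
clts_rtt_update(struct clts_timers *t, int measured)
{
	int err = measured - (t->srtt >> 3);

	t->srtt += err;				/* srtt = 7/8 srtt + 1/8 m */
	if (err < 0)
		err = -err;
	t->rttvar += err - (t->rttvar >> 2);	/* var = 3/4 var + 1/4 |err| */
	return ((t->srtt >> 3) + t->rttvar);	/* rto = srtt + 4 * mdev */
}
```

The same scaled-integer update is what TCP implementations use for SRTT/RTTVAR; the shift factors keep the arithmetic free of division.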
/*
 * Map recoverable and unrecoverable authentication errors to
 * appropriate errno values.
 */

/*
 * Could be an nfsportmon failure; set useresvport and try again.
 */

/*
 * Errors due to lack of resources, wait a bit and try again.
 */
	/* (void) sleep((caddr_t)&lbolt, PZERO-4); */

/*
 * Allow the endpoint to be held by the client handle in case this
 * RPC was not successful.  A retry may occur at a higher level and
 * in this case we may want to send the request over the same
 * source port.
 * The endpoint is also released for one-way RPC: no reply, nor
 * retransmission, is expected.
 */

/*
 * Return error info on this handle.
 */
	/* LINTED pointer alignment */
	/* LINTED pointer alignment */
	/* LINTED pointer alignment */

/*
 * Destroy rpc handle.
 * Frees the space used for output buffer, private data, and handle
 * structure.
 */
	/* LINTED pointer alignment */

	RPCLOG(8, "clnt_clts_kdestroy h: %p\n", (void *)h);
/*
 * The connectionless (CLTS) kRPC endpoint management subsystem.
 *
 * Because endpoints are potentially shared among threads making RPC calls,
 * they are managed in a pool according to type (endpnt_type_t).  Each
 * endpnt_type_t points to a list of usable endpoints through the e_pool
 * field, which is of type list_t.  list_t is a doubly-linked list.
 * The number of endpoints in the pool is stored in the e_cnt field of
 * endpnt_type_t and the endpoints are reference counted using the e_ref
 * field in the endpnt_t structure.
 *
 * As an optimization, endpoints that have no references are also linked
 * to an idle list via e_ilist, which is also of type list_t.  When a
 * thread calls endpnt_get() to obtain a transport endpoint, the idle list
 * is first consulted and if such an endpoint exists, it is removed from
 * the idle list and returned to the caller.
 *
 * If the idle list is empty, then a check is made to see if more endpoints
 * can be created.  If so, we proceed and create a new endpoint which is
 * added to the pool and returned to the caller.  If we have reached the
 * limit and cannot make a new endpoint then one is returned to the caller
 * via round-robin policy.
 *
 * When an endpoint is placed on the idle list by a thread calling
 * endpnt_rele(), it is timestamped and then a reaper taskq is scheduled
 * to be dispatched if one hasn't already been.  When the timer fires, the
 * taskq traverses the idle list and checks to see which endpoints are
 * eligible to be closed.  It determines this by checking if the timestamp
 * when the endpoint was released has exceeded the threshold for how long
 * it should stay alive.
 *
 * endpnt_t structures remain persistent until the memory reclaim callback,
 * endpnt_reclaim(), is invoked.
 */
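The hand-out policy just described (idle list first, then create while under the limit, then round-robin sharing) can be modeled in userspace as follows. The struct layout, `POOL_MAX`, and the function names are illustrative stand-ins for `endpnt_type_t`/`endpnt_t`, not the kernel's definitions:

```c
#include <assert.h>

#define	POOL_MAX	4	/* stand-in for the per-type endpoint limit */

struct pool {
	int ref[POOL_MAX];	/* reference counts, models endpnt_t.e_ref */
	int cnt;		/* endpoints created, models e_cnt */
	int pcurr;		/* round-robin cursor, models e_pcurr */
};

/* Return the index of the endpoint handed to the caller. */
static int
pool_get(struct pool *p)
{
	int i;

	/* 1. Prefer an idle endpoint (no references). */
	for (i = 0; i < p->cnt; i++) {
		if (p->ref[i] == 0) {
			p->ref[i]++;
			return (i);
		}
	}
	/* 2. Otherwise create a new endpoint if under the limit. */
	if (p->cnt < POOL_MAX) {
		i = p->cnt++;
		p->ref[i] = 1;
		return (i);
	}
	/* 3. Otherwise share an endpoint in pseudo-LRU (round-robin). */
	p->pcurr = (p->pcurr + 1) % p->cnt;
	p->ref[p->pcurr]++;
	return (p->pcurr);
}

static void
pool_rele(struct pool *p, int i)
{
	p->ref[i]--;	/* at zero, the reaper may later close it */
}
```

Note how step 3 never fails: once the pool is full, callers share endpoints, and the reference counts stay consistent even if a thread grabs an endpoint in the window before it reaches the idle list.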
/*
 * Here is an example of how the data structures would be laid out by the
 * implementation.  (The original ASCII diagram's alignment was lost; its
 * content was:)  endpnt_type_t structures are chained through e_next.
 * Each endpnt_type_t anchors its endpnt_t structures through e_pool (all
 * endpoints, linked via e_node) and e_ilist (unreferenced endpoints,
 * linked via e_idle), while e_pcurr marks the round-robin position.
 * Each endpnt_t (holding e_tiptr, e_lock, e_ref, e_itime, ...) points
 * back to its owning endpnt_type_t through e_type.  In the example, one
 * type holds a single idle endpoint (e_cnt 1, e_ref 0, e_itimer armed
 * at 90) and another holds three endpoints (e_cnt 3, e_itimer 0): one
 * referenced twice (e_ref 2) and two idle (e_ref 0).
 */

/*
 * Endpoint locking strategy:
 *
 * The following functions manipulate lists which hold the endpoint and
 * the endpoints themselves:
 *
 *	endpnt_get(), check_endpnt(), endpnt_rele(), endpnt_reap(),
 *	do_endpnt_reclaim()
 *
 * Lock description follows:
 *
 * endpnt_type_lock:	Global reader/writer lock which protects accesses
 *			to the endpnt_type list.
 *
 * e_plock:		Lock defined in the endpnt_type_t.  It is intended
 *			to protect accesses to the pool of endpoints
 *			(e_pool) for a given endpnt_type_t.
 *
 * e_ilock:		Lock defined in endpnt_type_t.  It is intended to
 *			protect accesses to the idle list (e_ilist) of
 *			available endpoints for a given endpnt_type_t.
 */
/*
 * e_ilock also protects access to the e_itimer, e_async_cv, and
 * e_async_count fields in endpnt_type_t.
 *
 * e_lock:		Lock defined in the endpnt structure.  It is
 *			intended to protect flags, cv, and ref count.
 *
 * The lock ordering goes as follows, so as not to induce deadlock:
 *
 *	endpnt_type_lock -> e_plock -> e_ilock -> e_lock
 *
 * Interaction with Zones and shutting down:
 *
 * endpnt_type_ts are uniquely identified by the (e_zoneid, e_rdev,
 * e_protofmly) tuple, which means that a zone may not reuse another
 * zone's idle endpoints without first doing a t_kclose().
 *
 * A zone's endpnt_type_ts are destroyed when a zone is shut down;
 * e_async_cv and e_async_count are used to keep track of the threads in
 * endpnt_taskq trying to reap endpnt_ts in the endpnt_type_t.
 */

/*
 * Allocate and initialize an endpnt_type_t.
 */

/*
 * Allocate a new endpoint type to hang a list of endpoints off of it.
 */

/*
 * Check to see if we need to create a taskq for endpoint reaping.
 */

/*
 * Free an endpnt_type_t.
 */

/*
 * Check the endpoint to ensure that it is suitable for use.
 *
 * Possible return values:
 *
 *	return (1)			Endpoint is established, but needs
 *					to be re-opened.
 *	return (0) && *newp == NULL	Endpoint is established, but
 *					unusable.
 *	return (0) && *newp != NULL	Endpoint is established and usable.
 */

/*
 * The first condition we check for is if the endpoint has been
 * allocated, but is unusable either because it has been closed or
 * has been marked stale.  Only *one* thread will be allowed to
 * execute the then clause.  This is enforced because the first thread
 * to check this condition will clear the flags, so that subsequent
 * thread(s) checking this endpoint will move on.
 */

/*
 * Clear the flags here since they will be set again by this thread.
 * They need to be individually cleared because we want to maintain
 * the state for ENDPNT_ONIDLE.
 */
/*
 * The second condition is meant for any thread that is waiting for
 * an endpoint to become established.  It will cv_wait() until
 * the condition for the endpoint has been changed to ENDPNT_BOUND or
 * ENDPNT_STALE.
 */

/*
 * The last case we check for is if the endpoint has been marked stale.
 * If this is the case then set *newp to NULL and return, so that the
 * caller is notified of the error and can take appropriate action.
 */

/*
 * Provide a fault injection setting to test error conditions.
 */

/*
 * Returns a handle (struct endpnt *) to an open and bound endpoint
 * specified by the knetconfig passed in.  Returns NULL if no valid
 * endpoint can be obtained.
 */

/*
 * Inject fault if desired.  Pretend we have a stale endpoint.
 */

/*
 * Link the endpoint type onto the list.
 */

/*
 * The logic here is that we were unable to find an
 * endpnt_type_t that matched our criteria, so we allocate a
 * new one.  Because kmem_alloc() needs to be called with
 * KM_SLEEP, we drop our locks so that we don't induce
 * deadlock.  After allocating and initializing the
 * endpnt_type_t, we reacquire the lock and go back to check
 * if this entry needs to be added to the list.  Since we do
 * some operations without any locking, other threads may
 * have been looking for the same endpnt_type_t and gone
 * through this code path.  We check for this case and allow
 * one thread to link its endpnt_type_t to the list and the
 * other threads will simply free theirs.
 */

/*
 * We need to reacquire the lock with RW_WRITER here so that
 * we can safely link the new endpoint type onto the list.
 */

/*
 * If n_etype is not NULL, then another thread was able to
 * insert an endpnt_type_t of this type onto the list before
 * we did.  Go ahead and free ours.
 */

/*
 * The algorithm to hand out endpoints is to first
 * give out those that are idle if such endpoints
 * exist.  Otherwise, create a new one if we haven't
 * reached the max threshold.
 */
/*
 * Finally, we give out endpoints in a pseudo-LRU fashion (round-robin).
 *
 * Note: The idle list is merely a hint of those endpoints
 * that should be idle.  There exists a window after the
 * endpoint is released and before it is linked back onto the
 * idle list where a thread could get a reference to it and
 * use it.  This is okay, since the reference counts will
 * still be consistent.
 */

/*
 * Pop the endpoint off the idle list and hand it off.
 */

/*
 * Reset the idle timer if it has been set.
 */

/*
 * There are no idle endpoints currently, so
 * create a new one if we have not reached the maximum, or
 * hand one out in round-robin.
 */

/*
 * Advance the pointer to the next eligible endpoint, if
 * one exists.
 */

/*
 * We need to check to see if this endpoint is bound or
 * not.  If it is in progress then just wait until
 * the setup is complete.
 */

/*
 * Allocate a new endpoint to use.  If we can't allocate any
 * more memory then use one that is already established if any
 * such endpoints exist.
 */
	RPCLOG0(1, "endpnt_get: kmem_cache_alloc failed\n");
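The find-or-create race on the endpnt_type_t described earlier (drop the locks, allocate with KM_SLEEP, reacquire as RW_WRITER, re-check, and free the duplicate if another thread won) is a classic double-checked insertion. A userspace sketch; the registry, the key, and all names here are illustrative, with the kernel's rwlock calls shown only as comments:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy registry keyed by protocol family, modeling the endpnt_type list. */
struct etype {
	char proto[16];
	struct etype *next;
};

static struct etype *type_list;

static struct etype *
type_find(const char *proto)
{
	struct etype *t;

	for (t = type_list; t != NULL; t = t->next)
		if (strcmp(t->proto, proto) == 0)
			return (t);
	return (NULL);
}

/*
 * Find-or-create with the double check: the first lookup would run
 * under the reader lock, the allocation with no locks held, and the
 * second lookup under the writer lock before linking.
 */
static struct etype *
etype_get(const char *proto)
{
	struct etype *t, *n;

	/* rw_enter(&type_lock, RW_READER); ... rw_exit() */
	t = type_find(proto);
	if (t != NULL)
		return (t);

	/* Allocate with no locks held (models the KM_SLEEP allocation). */
	n = calloc(1, sizeof (*n));
	(void) strncpy(n->proto, proto, sizeof (n->proto) - 1);

	/* rw_enter(&type_lock, RW_WRITER); */
	t = type_find(proto);
	if (t == NULL) {		/* we won the race: link ours in */
		n->next = type_list;
		type_list = n;
		t = n;
		n = NULL;
	}
	/* rw_exit(&type_lock); */
	free(n);			/* the loser frees its duplicate */
	return (t);
}
```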
/*
 * Try to recover by using an existing endpoint.
 */

/*
 * Partially init an endpoint structure and put
 * it on the list, so that other interested threads
 * know that one is being created.
 */

/*
 * Link the endpoint into the pool.
 */

/*
 * The transport should be opened with sufficient privs.
 */

/*
 * Allow the kernel to push the module on behalf of the user.
 */
	RPCLOG(1, "endpnt_get: kstr_push on rpcmod failed %d\n", error);
/*
 * Connectionless data flow should bypass the stream head.
 */
	RPCLOG(1, "endpnt_get: kstr_push on timod failed %d\n", error);
/*
 * Attempt to bind the endpoint.  If we fail then propagate the
 * error back to the calling subsystem, so that it can be handled
 * appropriately.
 */

/*
 * If the caller has not specified reserved port usage then
 * take the system default.
 */
	RPCLOG(1, "endpnt_get: bindresvport error %d\n", error);
/*
 * Reopen with all privileges.
 */

/*
 * Set the flags and notify any waiters that we have an established
 * endpoint.
 */

/*
 * Mark this endpoint as stale and notify any threads waiting
 * on this endpoint that it will be going away.
 */

/*
 * If there was a transport endpoint opened, then close it.
 */

/*
 * Release a reference to the endpoint.
 */

/*
 * If the ref count is zero, then start the idle timer and link
 * the endpoint onto the idle list.
 */

/*
 * Check to see if the endpoint is already linked to the idle
 * list, so that we don't try to reinsert it.
 */

/*
 * The idle timer has fired, so dispatch the taskq to close the
 * idle endpoints.
 */

/*
 * Traverse the idle list and close those endpoints that have reached
 * their idle timeout.
 */
	RPCLOG0(1, "endpnt_reclaim: reclaim callback started\n");
/*
 * The nice thing about maintaining an idle list is that if
 * there are any endpoints to reclaim, they are going to be
 * on this list.  Just go through and reap the ones that
 * have ref counts of zero.
 */

/*
 * Reset the current pointer to be safe.
 */
	RPCLOG(1, "endpnt_reclaim: reclaimed %d endpoint(s)\n", rcnt);
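The reaper's eligibility test described above (unreferenced, and idle for longer than the keep-alive threshold since the release timestamp) can be sketched in userspace. The struct, the threshold value, and the names are illustrative, not the kernel's:

```c
#include <assert.h>

#define	IDLE_TIME	600	/* illustrative keep-alive threshold */

struct ep {
	int ref;	/* models endpnt_t.e_ref */
	long itime;	/* models e_itime: when the endpoint went idle */
	int open;	/* nonzero while the transport endpoint is open */
};

/* Close every unreferenced endpoint idle longer than the threshold. */
static int
reap(struct ep *list, int n, long now)
{
	int closed = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (list[i].open && list[i].ref == 0 &&
		    now - list[i].itime >= IDLE_TIME) {
			list[i].open = 0;	/* models t_kclose() + free */
			closed++;
		}
	}
	return (closed);
}
```

Because endpoints are timestamped when they are released, a single pass over the idle list at timer expiry is enough; anything still referenced or recently released simply survives until the next pass.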
/*
 * Endpoint reclaim zone destructor callback routine.
 *
 * After reclaiming any cached entries, we basically go through the
 * endpnt_type list, canceling outstanding timeouts and freeing data
 * structures.
 */
	/* Make sure NFS client handles are released. */

/*
 * We don't need to be holding on to any locks across the call to
 * endpnt_reclaim() and the code below; we know that no-one can
 * be holding open connections for this zone (all processes and kernel
 * threads are gone), so nothing could be adding anything to the list.
 */

/*
 * untimeout() any outstanding timers that have not yet fired.
 */

/*
 * Wait for threads in endpnt_taskq trying to reap endpnt_ts in
 * the endpnt_type_t.
 */

/*
 * Endpoint reclaim kmem callback routine.
 * Reclaim idle endpnt's from all zones.
 */

/*
 * RPC request dispatch routine.  Constructs a datagram message and wraps
 * it around the RPC request to pass downstream.
 */

/*
 * Set up the call record.
 */

/*
 * Link the datagram header with the actual data.
 */

/*
 * RPC response delivery routine.  Deliver the response to the waiting
 * thread by matching the xid.
 */

/*
 * If the RPC response is not contained in the same mblk as the
 * datagram header, then move to the next mblk.
 */
	unsigned char *p = (unsigned char *)&xid;
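The byte-by-byte assembly matters because the reply's xid may straddle message blocks. A userspace sketch using a plain segment chain in place of `mblk_t`; the struct and names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* A crude stand-in for an mblk_t: a chained buffer segment. */
struct seg {
	const unsigned char *rptr;	/* read pointer */
	const unsigned char *wptr;	/* end of valid data */
	struct seg *next;
};

/*
 * Copy the first 4 bytes of the message into *xidp, walking to the
 * next segment whenever the current one is exhausted.  Returns 0 if
 * the message is shorter than an xid (such a reply is discarded).
 */
static int
copy_xid(struct seg *m, uint32_t *xidp)
{
	unsigned char *p = (unsigned char *)xidp;
	size_t i;

	for (i = 0; i < sizeof (*xidp); i++) {
		while (m != NULL && m->rptr >= m->wptr)
			m = m->next;	/* skip exhausted segments */
		if (m == NULL)
			return (0);	/* ran out of message */
		*p++ = *m->rptr++;
	}
	return (1);
}
```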
/*
 * Copy the xid, byte-by-byte, into xid.
 */

/*
 * If we got here, we ran out of mblk space before the
 * whole xid could be copied.
 */

/*
 * Reset the read pointer back to the beginning of the protocol
 * header.
 */
	/* call_table_find() returns with the hash bucket locked */

/*
 * Verify that the reply is coming in on
 * the same zone that it was sent from.
 */

/*
 * Found thread waiting for this reply.
 */

/*
 * This is unfortunate, but we need to lookup the zone so we
 * can increment its "rcbadxids" counter.
 */

/*
 * Init routine.  Called when rpcmod is loaded.
 */

/*
 * Perform simple bounds checking to make sure that the setting is
 * reasonable.
 */

/*
 * Defer creating the taskq until rpcmod gets pushed.  If we are
 * in diskless boot mode, rpcmod will get loaded early even before
 * thread_create() is available.
 */

/*
 * Dispatch the taskq at an interval which is offset from the
 * interval that the endpoints should be reaped.
 */

/*
 * Initialize the completion queue.
 */

/*
 * Initialize the zone destructor callback.
 */
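The delivery path above matches a reply to its waiting caller by hashing the xid into a bucket of the completion queue and scanning for a record with the same xid and zone. A userspace sketch; the bucket count, the struct, and the names are illustrative stand-ins for the kernel's call table:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define	NBUCKETS	16	/* illustrative; the real table size differs */

struct callrec {
	uint32_t xid;
	int zoneid;
	int notified;		/* models call_notified */
	struct callrec *next;
};

static struct callrec *buckets[NBUCKETS];

static void
call_insert(struct callrec *c)
{
	unsigned b = c->xid % NBUCKETS;

	c->next = buckets[b];
	buckets[b] = c;
}

/* Deliver a reply: find the waiter with this xid in the right zone. */
static struct callrec *
dispatch_notify(uint32_t xid, int zoneid)
{
	struct callrec *c;

	for (c = buckets[xid % NBUCKETS]; c != NULL; c = c->next) {
		if (c->xid == xid && c->zoneid == zoneid) {
			c->notified = 1;	/* wake the sleeping caller */
			return (c);
		}
	}
	return (NULL);		/* no caller: a late reply or a bad xid */
}
```

A miss is exactly the "no caller for reply" case logged above, where only the zone's `rcbadxids` counter gets incremented.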