rsm.c revision b97d6ca7333c353b6ca20c20c99fb1be8d32a8de
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Copyright 2012 Milan Jurik. All rights reserved.
 */

/*
 * Overview of the RSM Kernel Agent:
 * ---------------------------------
 *
 * rsm.c constitutes the implementation of the RSM kernel agent. The RSM
 * kernel agent is a pseudo device driver which makes use of the RSMPI
 * interface on behalf of the RSMAPI user library.
 *
 * The kernel agent functionality can be categorized into the following
 * components:
 *
 * 1. Driver Infrastructure
 *
 * The driver infrastructure includes the basic module loading entry points
 * like _init, _info, _fini to load, unload and report information about
 * the driver module. The driver infrastructure also includes the
 * autoconfiguration entry points, namely attach, detach and getinfo, for
 * the device autoconfiguration.
 *
 * The kernel agent is a pseudo character device driver and exports
 * a cb_ops structure which defines the driver entry points for character
 * device access. This includes the open and close entry points. The
 * other entry points provided include ioctl, devmap, segmap and chpoll.
 * The read and write entry points are not used since the device is memory
 * mapped. Also, ddi_prop_op is used for the prop_op entry point.
 *
 * The ioctl entry point supports a number of commands, which are used by
 * the RSMAPI library in order to export and import segments. These
 * commands include commands for binding and rebinding the physical pages
 * allocated to the virtual address range, publishing the export segment,
 * unpublishing and republishing an export segment, creating an
 * import segment and a virtual connection from this import segment to
 * an export segment, performing scatter-gather data transfer, and barrier
 * operations.
 *
 * Export and Import segments:
 * ---------------------------
 *
 * In order to create an RSM export segment a process allocates a range in its
 * virtual address space for the segment using standard Solaris interfaces.
 * The process then calls RSMAPI, which in turn makes an ioctl call to the
 * RSM kernel agent for an allocation of physical memory pages and for
 * creation of the export segment by binding these pages to the virtual
 * address range. These pages are locked in memory so that remote accesses
 * are always applied to the correct page. Then the RSM segment is published,
 * again via RSMAPI making an ioctl to the RSM kernel agent, and a segment id
 * is assigned to it.
 *
 * In order to import a published RSM segment, RSMAPI creates an import
 * segment and forms a virtual connection across the interconnect to the
 * export segment, via an ioctl into the kernel agent with the connect
 * command. The import segment setup is completed by mapping the
 * local device memory into the importer's virtual address space. The
 * mapping of the import segment is handled by the segmap/devmap
 * infrastructure described as follows.
 *
 * Segmap and Devmap interfaces:
 * -----------------------------
 *
 * The RSM kernel agent allows device memory to be directly accessed by user
 * threads via memory mapping. In order to do so, the RSM kernel agent
 * supports the devmap and segmap entry points.
 *
 * The segmap entry point (rsm_segmap) is responsible for setting up a memory
 * mapping as requested by mmap. The devmap entry point (rsm_devmap) is
 * responsible for exporting the device memory to the user applications.
 * rsm_segmap calls RSMPI rsm_map to allocate device memory. Then
 * control is transferred to the devmap_setup call which calls rsm_devmap.
 *
 * rsm_devmap validates the user mapping to the device or kernel memory
 * and passes the information to the system for setting up the mapping. The
 * actual setting up of the mapping is done by devmap_devmem_setup (for
 * device memory) or devmap_umem_setup (for kernel memory). Callbacks are
 * registered for device context management via the devmap_devmem_setup
 * or devmap_umem_setup calls. The callbacks are rsmmap_map, rsmmap_unmap,
 * rsmmap_access, rsmmap_dup. The callbacks are called when a new mapping
 * is created, a mapping is freed, a mapping is accessed or an existing
 * mapping is duplicated respectively. These callbacks allow the RSM kernel
 * agent to maintain state information associated with the mappings.
 * The state information is mainly in the form of a cookie list for the import
 * segment for which mapping has been done.
 *
 * Forced disconnect of import segments:
 * -------------------------------------
 *
 * When an exported segment is unpublished, the exporter sends a forced
 * disconnect message to all its importers. The importer segments are
 * unloaded and disconnected. This involves unloading the original
 * mappings and remapping to a preallocated kernel trash page. This is
 * done by devmap_umem_remap. The trash/dummy page is a kernel page,
 * preallocated by the kernel agent during attach using ddi_umem_alloc with
 * the DDI_UMEM_TRASH flag set. This avoids a core dump in the application
 * due to unloading of the original mappings.
 *
 * Additionally, every segment has a mapping generation number associated
 * with it. This is an entry in the barrier generation page, created
 * during attach time.
 * This mapping generation number for the import
 * segments is incremented on a force disconnect to notify the application
 * of the force disconnect. On this notification, the application needs
 * to reconnect the segment to establish a new legitimate mapping.
 *
 * Locks used in the kernel agent:
 * -------------------------------
 *
 * The kernel agent uses a variety of mutexes and condition variables for
 * mutual exclusion of the shared data structures and for synchronization
 * between the various threads. Some of the locks are described as follows.
 *
 * Each resource structure, which represents either an export or import
 * segment, has a lock associated with it. The lock is the resource mutex,
 * rsmrc_lock. This is used directly by the RSMRC_LOCK and RSMRC_UNLOCK
 * macros and in the rsmseglock_acquire and rsmseglock_release macros. An
 * additional lock called the rsmsi_lock is used for the shared import data
 * structure that is relevant for resources representing import segments.
 * There is also a condition variable associated with the resource called
 * s_cv. This is used to wait for events like the segment state change etc.
 *
 * The resource structures are allocated from a pool of resource structures,
 * called rsm_resource. This pool is protected via a reader-writer lock.
 *
 * There are two separate hash tables, one for the export segments and
 * one for the import segments. The export segments are inserted into the
 * export segment hash table only after they have been published and the
 * import segments are inserted in the import segments list only after they
 * have successfully connected to an exported segment. These tables are
 * protected via reader-writer locks.
 *
 * Debug Support in the kernel agent:
 * ----------------------------------
 *
 * Debugging support in the kernel agent is provided by the following
 * macros.
 *
 * DBG_PRINTF((category, level, message)) is a macro which logs a debug
 * message to the kernel agent's debug buffer, rsmka_dbg. This debug buffer
 * can be viewed in kmdb as *rsmka_dbg/s.
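The generation-number protocol described above can be illustrated with a small user-space sketch. This is not the driver's code: the structure and function names below are hypothetical stand-ins, and only the accounting mirrors the description (the importer snapshots the generation at connect time, a forced disconnect bumps it, and a mismatch tells the application to reconnect).

```c
#include <stdint.h>

/* one slot in the (simulated) barrier generation page */
typedef struct {
	volatile uint32_t gen;
} barrier_slot_t;

/* importer-side view of a connected segment */
typedef struct {
	barrier_slot_t *slot;	/* shared with the kernel agent */
	uint32_t snapshot;	/* generation observed at connect time */
} import_seg_t;

/* application connects: snapshot the current generation */
static void
import_connect(import_seg_t *seg, barrier_slot_t *slot)
{
	seg->slot = slot;
	seg->snapshot = slot->gen;
}

/* kernel-agent side: a forced disconnect increments the generation */
static void
force_disconnect(barrier_slot_t *slot)
{
	slot->gen++;
}

/* application side: nonzero means the mapping is stale - reconnect */
static int
import_is_stale(const import_seg_t *seg)
{
	return (seg->slot->gen != seg->snapshot);
}
```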
 * The message is logged based
 * on the definition of the category and level. All messages that belong to
 * the specified category (rsmdbg_category) and are of an equal or greater
 * severity than the specified level (rsmdbg_level) are logged. The message
 * is a printf-style format string.
 *
 * The category defines which component of the kernel agent has logged this
 * message. There are a number of categories that have been defined such as
 * RSM_KERNEL_AGENT, RSM_OPS, RSM_IMPORT, RSM_EXPORT etc. A macro,
 * DBG_ADDCATEGORY, is used to add another category to the currently
 * specified category value so that the component using this new category
 * can also effectively log debug messages. Thus, the category of a specific
 * message is some combination of the available categories and we can define
 * sub-categories if we want a finer level of granularity.
 *
 * The level defines the severity of the message. Different level values are
 * defined, with RSM_ERR being the most severe and RSM_DEBUG_VERBOSE being
 * the least severe (debug level is 0).
 *
 * DBG_DEFINE and DBG_DEFINE_STR are macros provided to declare a debug
 * variable or a string respectively.
 *
 * Special Fork and Exec Handling:
 * -------------------------------
 *
 * The backing physical pages of an exported segment are always locked down.
 * Thus, there are two cases in which a process having exported segments
 * will cause a cpu to hang: (1) the process invokes exec; (2) a process
 * forks and invokes exit before the duped file descriptors for the export
 * segments are closed in the child process. The hang is caused because the
 * address space release algorithm in the Solaris VM subsystem is based on a
 * non-blocking loop which does not terminate while segments are locked
 * down. In addition to this, the Solaris VM subsystem lacks a callback
 * mechanism to the rsm kernel agent to allow unlocking these export
 * segments.
 *
 * In order to circumvent this problem, the kernel agent does the following.
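The category/level filtering rule can be modeled in a few lines. This is an illustrative sketch only: the bit values and level ordering below are assumptions chosen to match the description (OR-able category bits; RSM_DEBUG_VERBOSE is level 0 and severity increases from there), not the driver's actual definitions.

```c
#include <stdint.h>

/* hypothetical category bits - OR-able, as described above */
#define RSM_KERNEL_AGENT	0x1
#define RSM_IMPORT		0x2
#define RSM_EXPORT		0x4

/* hypothetical levels: verbose is 0, severity grows upward */
enum { RSM_DEBUG_VERBOSE = 0, RSM_DEBUG, RSM_NOTICE, RSM_WARNING, RSM_ERR };

static uint32_t rsmdbg_category = RSM_KERNEL_AGENT | RSM_IMPORT;
static int rsmdbg_level = RSM_NOTICE;

/*
 * A message is logged iff its category overlaps the configured mask and
 * it is of equal or greater severity than the configured level.
 */
static int
dbg_would_log(uint32_t category, int level)
{
	return ((category & rsmdbg_category) != 0 && level >= rsmdbg_level);
}
```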
 * The Solaris VM subsystem keeps memory segments in increasing order of
 * virtual addresses. Thus a special page (special_exit_offset) is allocated
 * by the kernel agent and is mmapped into the heap area of the process address
 * space (the mmap is done by the RSMAPI library). During the mmap processing
 * of this special page by the devmap infrastructure, a callback (the same
 * devmap context management callbacks discussed above) is registered for an
 * unmap.
 *
 * As discussed above, this page is processed by the Solaris address space
 * release code before any of the exported segments pages (which are allocated
 * from high memory). It is during this processing that the unmap callback gets
 * called and this callback is responsible for force destroying the exported
 * segments and thus eliminating the problem of locked pages.
 *
 * A credit based flow control algorithm is used for messages whose
 * processing cannot be done in the interrupt context because it might
 * involve invoking rsmpi calls, or might take a long time to complete
 * or might need to allocate resources. The algorithm operates on a per
 * path basis. To send a message the pathend needs to have a credit and
 * it consumes one for every message that is flow controlled. On the
 * receiving pathend the message is put on a msgbuf_queue and a task is
 * dispatched on the worker thread - recv_taskq - where it is processed.
 * After processing the message, the receiving pathend dequeues the message,
 * and if it has processed > RSMIPC_LOTSFREE_MSGBUFS messages sends
 * credits to the sender pathend.
 *
 * This is used to enable the DR testing using a test driver on test
 * platforms which do not support DR.
 */

	(struct bus_ops *)0,	/* bus operations */

/*
 * Module linkage information for the kernel.
 */
	&mod_driverops,	/* Type of module. This one is a pseudo driver */
	"Remote Shared Memory Driver",
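The credit scheme described above can be sketched in user space. This is a minimal model, not the driver's implementation: the structure, field names and the RSMIPC_LOTSFREE_MSGBUFS value here are illustrative, and only the accounting follows the text (one credit consumed per flow-controlled send; the receiver returns credits after processing more than RSMIPC_LOTSFREE_MSGBUFS messages).

```c
/* illustrative threshold; the driver defines its own value */
#define RSMIPC_LOTSFREE_MSGBUFS	4

typedef struct {
	int send_credits;	/* credits available to the sender pathend */
	int processed;		/* messages handled since last credit return */
} pathend_t;

/* sender: returns 1 if the message may be sent (one credit consumed) */
static int
path_send(pathend_t *p)
{
	if (p->send_credits == 0)
		return (0);	/* no credit - the message must wait */
	p->send_credits--;
	return (1);
}

/*
 * receiver: after dequeueing and processing a message, send the
 * accumulated credits back once more than RSMIPC_LOTSFREE_MSGBUFS
 * messages have been handled.
 */
static void
path_process(pathend_t *p)
{
	p->processed++;
	if (p->processed > RSMIPC_LOTSFREE_MSGBUFS) {
		p->send_credits += p->processed;
		p->processed = 0;
	}
}
```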
/* This flag can be changed to 0 to help with PIT testing */

/* cookie, va, offsets and length for the barrier */
/* cookie for the trash memory */
/* list of nodes to which RSMIPC_MSG_SUSPEND has been sent */
/* list of descriptors for remote importers */

/*
 * data and locks to keep track of total amount of exported memory
 *
 * The locking model is as follows:
 *	find resource	- grab reader lock on resource list
 *	insert rc	- grab writer lock
 *	delete rc	- grab writer lock and resource mutex
 *	find resource	- grab read lock and resource mutex
 *	resource state	- grab resource mutex
 */

/*
 * Initialize the suspend message list.
 * It is assumed here that configuration data is available
 * during system boot since _init may be called at that time.
 */

/*
 * The rsmka_modunloadok flag is simply used to help with
 * the PIT testing. Make this flag 0 to disallow modunload.
 */

/* rsm_detach will be called as a result of mod_remove */
	"Unable to fini RSM %x\n", e));
	"rsm:rsm_attach - cmd not supported\n"));
	"rsm:rsm_attach - supports only "
	"enable-dynamic-reconfiguration", 1);
	"reconfiguration setup failed\n");

/* page_list_read_unlock(); */
	"rsm: segment-hashtable-size in rsm.conf "
	"must be greater than 0, defaulting to 128\n"));
	"max-exported-memory", 0);
	"rsm:rsm_attach not enough memory available to "
	"export, or max-exported-memory set incorrectly.\n"));

/*
 * 0 indicates no fixed upper limit. maxmem is the max
 * available pageable physical mem.
 */
	"rsm: Available physical memory = %lu pages, "
	"Max exportable memory = %lu pages",
	"rsm: rsm_attach - Unable to get "
	"rsm: rsm_attach - unable to allocate "

/* Allocate the hashtables */
/* Allocate a resource struct */

/*
 * Based on the rsm.conf property max-segments, determine the maximum
 * number of segments, which is then used
 * to determine the size for barrier failure pages.
 */

/* First get the max number of segments from the rsm.conf file */
/* Use default number of segments */

/*
 * Based on the max number of segments allowed, determine the barrier
 * page size. Add 1 to max_segs since the barrier page itself uses
 * one entry.
 */

/* allocation of the barrier failure page */

/*
 * Allocate a trash memory and get a cookie for it. This will be used
 * when remapping segments during force disconnects. Allocate the
 * trash memory with a large size which is page aligned.
 */

/* initialize user segment id allocation variable */

/* initialize the null_rsmpi_ops vector and the loopback adapter */

/*
 * The call to mod_remove in the _fini routine will cause the system
 * to call rsm_detach.
 */
	"rsm:rsm_detach - cmd %x not supported\n",
/* Unregister the DR callback functions */

/* Release all resources, seglist, controller, ... */

/* remove intersend queues */
/* remove registered services */

/* Free minor zero resource */

/* Free the memory allocated for the trash */

/*
 * *********************** Resource Number Management ********************
 * All resources are stored in a simple hash table. The table is an array
 * of pointers to resource blks. Each blk contains:
 *	base	- base number of this blk
 *	used	- number of used slots in this blk.
 *	blks	- array of pointers to resource items.
 * An entry in a resource blk is empty if it's NULL.
 *
 * We start with no resource array. Each time we run out of slots, we
 * reallocate a new larger array and copy the pointer to the new array and
 * a new resource blk is allocated and added to the hash table.
 *
 * The resource control block contains:
 *	root	- array of pointer of resource blks
 *	sz	- current size of array.
 *	len	- last valid entry in array.
 *
 * A search operation based on a resource number is as follows:
 *	index = rnum / RESOURCE_BLKSZ;
 *	ASSERT(index < resource_block.len);
 *	offset = rnum % RESOURCE_BLKSZ;
 *	ASSERT(offset >= resource_block.root[index]->base);
 *	ASSERT(offset < resource_block.root[index]->base + RESOURCE_BLKSZ);
 *	return resource_block.root[index]->blks[offset];
 *
 * A resource blk is freed when its used count reaches zero.
 */

/* search for available resource slot */
	"rsmresource_alloc enter\n"));
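The two-level lookup sketched in the comment above can be modeled as a small stand-alone program. The block size, types and function name here are illustrative (the driver's real structures carry more state); only the index/offset arithmetic follows the comment.

```c
#include <stddef.h>

#define RESOURCE_BLKSZ	8	/* illustrative; the driver picks its own */

typedef struct {
	int	base;				/* base number of this blk */
	int	used;				/* used slots in this blk */
	void	*blks[RESOURCE_BLKSZ];		/* NULL means slot is empty */
} rsmresource_blk_t;

typedef struct {
	rsmresource_blk_t **root;	/* array of pointers to blks */
	int len;			/* last valid entry in root */
} rsmresource_table_t;

/* index = rnum / RESOURCE_BLKSZ, offset = rnum % RESOURCE_BLKSZ */
static void *
rsmresource_lookup_model(rsmresource_table_t *tbl, int rnum)
{
	int index = rnum / RESOURCE_BLKSZ;
	int offset = rnum % RESOURCE_BLKSZ;

	if (index >= tbl->len || tbl->root[index] == NULL)
		return (NULL);
	return (tbl->root[index]->blks[offset]);
}
```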
/* Try to find an empty slot */
/* found an empty slot in this blk */
	"rsmresource_alloc done\n"));
/* remember first empty slot */

/* Couldn't find anything, allocate a new blk */

/* Do we need to reallocate the root array? */

/*
 * Allocate new array and copy current stuff into it.
 * Don't allocate more than max valid rnum.
 */

/*
 * Copy old data into new space and
 * watch out not to exceed bounds of barrier page.
 */
	"rsmresource_alloc failed %d\n", *rnum));
	"rsmresource_alloc done\n"));

/* search for available resource slot */
	"rsmresource_free enter\n"));
	"rsmresource_free done\n"));
	"rsmresource_free done\n"));
	"rsmresource_free done\n"));
	"rsmresource_lookup enter\n"));

/* Find resource and lock it in READER mode */
/* search for available resource slot */
	"rsmresource_lookup done\n"));
	"rsmresource_lookup done\n"));

/* Find resource and lock it in READER mode */
/* Caller can upgrade if need be */
/* search for available resource slot */
	"rsmresource_insert enter\n"));
	"rsmresource_insert done\n"));
	"rsmresource_destroy enter\n"));
	"Not null slot %d, %lx\n", j,
	"rsmresource_destroy done\n"));
/* ******************** Generic Key Hash Table Management ********* */

/* acquire resource lock */
/* state changed, release lock and return null */
	"rsmhash_lookup done: state changed\n"));

/* It's ok not to find the segment. */

/*
 * If the current resource state is other than the state passed in
 * then the resource is (probably) already on the list. eg. for an
 * import segment if the state is not RSM_STATE_NEW then it's on the
 * list already.
 */

/*
 * Used for checking export segments; don't want to have
 * the same key used for multiple segments.
 */

/* Key doesn't exist, add it */

/* XOR each byte of the key. */

/* generic function to get a specific bucket */
/* generic function to get a specific bucket's address */
/* generic function to alloc a hash table */
/* generic function to free a hash table */

/* *********************** Exported Segment Key Management ************ */

/* ************************** Import Segment List Management ********** */

/*
 * Add segment to import list. This will be useful for paging and loopback.
 */

/*
 * #define rsmimport_lookup(key) \
 *	(rsmseg_t *)rsmhash_lookup(&rsm_import_segs, (key), RSM_STATE_CONNECT)
 */

/*
 * increase the ref count and make the import segment point to the
 * shared data structure. Return a pointer to the share data struct;
 * the shared data struct is locked upon return.
 */

/*
 * Look for an entry that is importing the same exporter
 * with the share data structure allocated.
 */

/* we are the first importer, create the shared data struct */
/* we grab the shared lock before returning from this function */

/*
 * the shared data structure should be locked before calling
 * this function.
 */

/* Change the state and signal any waiting segments. */

/* ************************** Segment Structure Management ************* */

/* need to take seglock here to avoid race with rsmmap_unmap() */
/* Segment is still busy */

/*
 * If it's an importer, decrement the refcount
 * and if it's down to zero free the shared data structure.
 * This is where failures during rsm_connect() are unrefcounted.
 */

/*
 * The following needs to be done after any
 * rsmsharelock calls which use seg->s_share.
 */

/* allocate memory for new segment. This should be a segkmem cache. */
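The "XOR each byte of the key" bucket computation mentioned above is simple enough to show directly. This is a hedged sketch: the function name and table size are assumptions, not the driver's definitions; only the byte-XOR technique matches the comment.

```c
#include <stddef.h>
#include <stdint.h>

#define RSM_HASHSZ	128	/* illustrative table size */

/* XOR each byte of the key, then reduce to a bucket index */
static unsigned int
rsmhash_model(const void *key, size_t len)
{
	const uint8_t *p = key;
	uint8_t h = 0;

	while (len-- > 0)
		h ^= *p++;
	return (h % RSM_HASHSZ);
}
```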
/* ******************************** Driver Open/Close/Poll *************** */

/* Only zero can be opened; clones are used for resources. */

/*
 * The library function _rsm_librsm_init calls open for
 * /dev/rsm with flag set to O_RDONLY. We want a valid
 * file descriptor to be returned for minor device zero.
 */
	"rsm_open RDONLY done\n"));

/*
 * - allocate new minor number and segment.
 * - add segment to list of all segments.
 * - set minordev data to segment
 * - update devp argument to new device
 * - update s_cred to cred; make sure you do crhold(cred);
 */

/* allocate a new resource number */

/*
 * We will bind this minor to a specific resource in the first
 * ioctl call.
 */

/*
 * If we are processing rsm_close, wait for force_destroy
 * processing to complete since force_destroy processing
 * needs to finish first before we can free the segment.
 * force_destroy is only for export segments.
 */

/* It's ok to read the state without a lock */

/*
 * Segments in this state have been removed off the
 * exported segments list and have been unpublished
 * and unbound. These segments have been removed during
 * a callback to rsm_export_force_destroy, which
 * is called for the purpose of unlocking these
 * exported memory segments when a process exits but
 * leaves the segments locked down since rsm_close is
 * not called for the segments. This can happen
 * when a process calls fork or exec and then exits.
 * Once the segments are in the ZOMBIE state, all that
 * remains is to destroy them when rsm_close is called.
 * This is done here. Thus, for such segments the
 * state is changed to new so that later in this
 * function rsmseg_free is called.
 */

/* Disconnect will handle the unmap */
	"Invalid segment state %d in rsm_close\n", seg->s_state));
	"Invalid segment state %d in rsm_close\n", seg->s_state));

/*
 * - make sure you do crfree(s_cred);
 * - release segment and minor number
 */

/*
 * The export_force_destroy callback is created to unlock
 * the exported segments of a process when the process does
 * a fork or exec and then exits. It calls this
 * function with the force flag set to 1, which indicates that the
 * segment state must be converted to ZOMBIE. This state means that the
 * segments still exist and have been unlocked and, most importantly, the
 * only operation allowed is to destroy them on an rsm_close.
 */

/*
 * At this point we are the last reference to the resource.
 * Free resource number from resource table.
 * It's ok to remove number before we free the segment.
 * We need to lock the resource to protect against remote calls.
 */

/*
 * Description:	increment rsm page counter.
 *
 * Parameters:	pgcnt_t	pnum;	number of pages to be used
 *
 * Returns:	RSM_SUCCESS	if memory limit not exceeded
 *		ENOSPC		if memory limit exceeded. In this case, the
 *				page counter remains unchanged.
 */

/* ensure that limits have not been exceeded */

/*
 * Description:	decrement rsm page counter.
 *
 * Parameters:	pgcnt_t	pnum;	number of pages freed
 */

/* Make sure vaddr and len are aligned on a page boundary */
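The page-counter contract documented above (increment fails with ENOSPC and leaves the counter unchanged) is easy to model. The names, the error value and the limit below are illustrative stand-ins for the driver's internals.

```c
typedef unsigned long pgcnt_t;

#define RSM_SUCCESS	0
#define ENOSPC_ERR	28	/* stand-in for ENOSPC */

static pgcnt_t rsm_pgcnt;		/* pages currently accounted */
static pgcnt_t rsm_pgcnt_max = 100;	/* illustrative export limit */

/* increment; on failure the counter remains unchanged */
static int
rsm_incr_pgcnt(pgcnt_t pnum)
{
	if (rsm_pgcnt + pnum > rsm_pgcnt_max)
		return (ENOSPC_ERR);
	rsm_pgcnt += pnum;
	return (RSM_SUCCESS);
}

/* decrement when pages are freed */
static void
rsm_decr_pgcnt(pgcnt_t pnum)
{
	rsm_pgcnt -= pnum;
}
```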
	"rsm_bind_pages:mem limit exceeded\n"));
	"rsm_bind_pages:ddi_umem_lock failed\n"));

/*
 * ddi_umem_lock, in the case of failure, returns one of
 * the following three errors. These are translated into
 * the RSMERR namespace and returned.
 */

/* unlock address range */
	"rsm_bind done: no adapter\n"));
	"rsm: rsm_bind done: invalid vaddr\n"));
	"rsm_bind: invalid length\n"));
	"rsm_bind done: cv_wait INTERRUPTED"));

/*
 * Set the s_pid value in the segment structure. This is used
 * to identify exported segments belonging to a particular
 * process so that when the process exits, these segments can
 * be unlocked forcefully even if rsm_close is not called on
 * process exit since there may be other processes referencing
 * them (for example on a fork or exec).
 *
 * The s_pid value is also used to authenticate the process
 * doing a publish or unpublish on the export segment. Only
 * the creator of the export segment has a right to do a
 * publish or unpublish and unbind on the segment.
 */
	"unable to lock down pages\n"));

/* copyout the resource number */
	"rsm_remap_local_importers enter\n"));

/*
 * Change the s_cookie value of only the local importers
 * which have been mapped (in state RSM_STATE_ACTIVE).
 * Note that there is no need to change the s_cookie value
 * if the imported segment is in RSM_STATE_MAPPING since
 * eventually the s_cookie will be updated via the mapping
 * callbacks.
 */
	"rsm_remap_local_importers done\n"));

/* Check for permissions to rebind */

/*
 * We will not be allowing partial rebind and hence length passed
 * in must be same as segment length.
 */
	"rsm_rebind done: null msg->vaddr\n"));
	"rsm_rebind: invalid length\n"));
	"rsm_rebind done: cv_wait INTERRUPTED"));

/* verify segment state */
	"rsm_rebind done: invalid state\n"));

/*
 * unbind the older pages, and unload local importers;
 * but don't disconnect importers
 */

/*
 * Unbind the pages associated with "cookie" by the
 * rsm_bind_pages calls prior to this. This is
 * similar to what is done in the rsm_unbind_pages
 * routine for the seg->s_cookie.
 */
	"rsm_rebind failed with %d\n", e));

/*
 * At present there is no dependency on the existence of xbuf.
 * So we can free it here. If in the future this changes, it can
 * be freed sometime during the segment destroy.
 */

/* verify segment state */
	"rsm_unbind: invalid state\n"));

/* unlock current range */

/* **************************** Exporter Access List Management ******* */
	"rsmacl_build done: acl invalid\n"));
	"rsmacl_build done: BAD_ADDR\n"));
	"rsmacl_build done: EINVAL\n"));

/* rsmpi understands only RSM_PERM_XXXX */
	"rsmsegacl_validate enter\n"));

/*
 * Find segment and grab its lock. The reason why we grab the segment
 * lock inside the search is to avoid the race when the segment is
 * being deleted and we already have a pointer to it.
 */
	"rsmsegacl_validate done: %u ENXIO\n", key));

/*
 * We implement a 2-level protection scheme.
 * First, we check if local/remote host has access rights.
 * Second, we check if the user has access rights.
 *
 * This routine only validates the rnode access_list.
 */

/* rnode is not found in the list */
	"rsmsegacl_validate done: EPERM\n"));
* Get per node access list "rsm_publish done: rsmacl_build failed\n"));
* The application provided msg->key is used for resolving a * segment id according to the following: * key = 0 Kernel Agent selects the segment id * key <= RSM_DLPI_ID_END Reserved for system usage except * key < RSM_USER_APP_ID_BASE segment id = key * key >= RSM_USER_APP_ID_BASE Reserved for KA selections * rsm_nextavail_segmentid is initialized to 0x80000000 and * overflows to zero after 0x80000000 allocations. * An algorithm is needed which allows reinitialization and provides * for reallocation after overflow. For now, ENOMEM is returned * once the overflow condition has occurred. "rsm_publish done: no more keys avlbl\n"));
/* range reserved for internal use by base/ndi libraries */ "rsm_publish done: invalid key %u\n",
msg->
key));
/* Add key to exportlist; The segment lock is held on success */ "rsm_publish done: export_add failed: %d\n", e));
/* state changed since then, free acl and return */ "rsm_publish done: segment in wrong state: %d\n",
* If this is for a local memory handle and permissions are zero, * then the surrogate segment is very large and we want to skip * allocation of DVMA space. * Careful! If the user didn't use an ACL list, acl will be a NULL * pointer. Check that before dereferencing it. /* This call includes a bind operations */ * create a acl list with hwaddr for RSMPI publish "rsm_publish done: rsmpiacl_create failed: %d\n", e));
/* This call includes a bind operations */ * At present there is no dependency on the existence of xbuf. * So we can free it here. If in the future this changes, it can * be freed sometime during the segment destroy. "rsm_publish done: export_create failed: %d\n", e));
* The following assertion ensures that the two errors * related to the length and its alignment do not occur * since they have been checked during export_create /* export segment, this should create an IMMU mapping */ "rsm_publish done: export_publish failed: %d\n",
* If the segment id was solicited, then return it in * the original incoming message. * This function modifies the access control list of an already published * segment. There is no effect on import segments which are already "rsm_republish: Not owner\n"));
/*
 * This function modifies the access control list of an already published
 * segment. There is no effect on import segments which are already
 * connected.
 */
	"rsm_republish: Not owner\n"));
	"rsm_republish done: rsmacl_build failed %d", e));

/*
 * a republish is in progress - the REPUBLISH message is being
 * sent to the importers so wait for it to complete OR
 * for the wait to be interrupted.
 */
	"rsm_republish done: cv_wait INTERRUPTED"));

/* recheck if state is valid */

/*
 * This call will only be meaningful if and when the interconnect
 * layer makes use of the access list.
 */

/* create an acl list with hwaddr for RSMPI publish */
	"rsm_republish done: rsmpiacl_create failed %d", e));
	"rsm_republish done: rsmpi republish failed %d\n", e));

/* create a tmp copy of the new acl */

/*
 * The default permission of a node which was in the old
 * ACL but not in the new ACL is 0, ie. no access.
 */

/*
 * NULL acl means all importers can connect and
 * default permission will be owner creation umask.
 */

/* make other republishers wait for republish to complete */

/* send the new perms to the importing nodes */

/* wake up any one waiting for republish to complete */
	"rsm_unpublish: Not creator\n"));

/*
 * wait for QUIESCING to complete here before rsmexport_rm
 * is called because the SUSPEND_COMPLETE mesg which changes
 * the seg state from EXPORT_QUIESCING to EXPORT_QUIESCED and
 * signals the cv_wait needs to find it in the hashtable.
 */
	"rsm_unpublish done: cv_wait INTR qscing"));

/* verify segment state */
	"rsm_unpublish done: bad state %x\n", seg->s_state));

/* wait for republish to complete */
	"rsm_unpublish done: cv_wait INTR repubing"));
	"rsm_unpublish done: invalid state"));

/*
 * check for putv/get surrogate segment which was not published.
 *
 * Be certain to see if there is an ACL first! If this segment was
 * not published with an ACL, acl will be a null pointer. Check
 * that before dereferencing it.
 */
	"rsm_unpublish done: bad state %x\n",

/* unpublish from adapter */

/* wait for unpublish to succeed, it's busy */
/* wait for a max of 1 ms - this is an empirical */
/* value that was found by some minimal testing */
/* can be fine tuned when we have better numbers */
/* A long term fix would be to send cv_signal */
/* from the intr callback routine */
/* currently nobody signals this wait */
	"rsm_unpublish: SEG_IN_USE\n"));
	"rsm:rsmpi unpublish err %x\n", e));
* Called from rsm_unpublish to force an unload and disconnection of all * importers of the unpublished segment. * First build the list of segments requiring a force disconnect, then * send a request for each. "rsm_send_importer_disconnects enter\n"));
* take it off the importer list and add it * to the force disconnect list. * make sure that the tmp_token's node * is not already on the force disconnect "rsm_send_importer_disconnects done\n"));
* This function is used as a callback for unlocking the pages locked * down by a process which then does a fork or an exec. * It marks the export segments corresponding to umem cookie given by * the *arg to be in a ZOMBIE state(by calling rsmseg_close to be * destroyed later when an rsm_close occurs). "rsm_export_force_destroy enter\n"));
* Walk the resource list and locate the export segment (either * in the BIND or the EXPORT state) which corresponds to the * ddi_umem_cookie_t being freed up, and call rsmseg_close. * Change the state to ZOMBIE by calling rsmseg_close with the * force_flag argument (the second argument) set to 1. Also, * unpublish and unbind the segment, but don't free it. Free it * only on a rsm_close call for the segment. continue;
/* continue searching */ * Found the segment, set flag to indicate * force destroy processing is in progress /* call rsmseg_close with force flag set to 1 */ * force destroy processing done, clear flag and signal any * thread waiting in rsmseg_close. "rsm_export_force_destroy done\n"));
/* ******************************* Remote Calls *********************** */ "rsm_intr_segconnect enter\n"));
"rsm_intr_segconnect done\n"));
* When an exported segment is unpublished the exporter sends an ipc * message (RSMIPC_MSG_DISCONNECT) to all importers. The recv ipc dispatcher * calls this function. The import list is scanned; segments which match the * exported segment id are unloaded and disconnected. * Will also be called from rsm_rebind with disconnect_flag FALSE. * In order to make rsmseg_unload and rsm_force_unload * thread safe, acquire the segment lock here. * rsmseg_unload is responsible for releasing the lock. * rsmseg_unload releases the lock just before a call * to rsmipc_send or in case of an early exit which * occurs if the segment was in the state * RSM_STATE_CONNECTING or RSM_STATE_NEW. * Find slot for cookie in reply. * Match sequence with sequence in cookie * Try to grap lock of slot, if locked return * copy data into reply slot area "rsm: rsm_intr_reply mismatched reply %d\n",
* This function gets dispatched on the worker thread when we receive * the SQREADY message. This function sends the SQREADY_ACK message. "rsm_sqready_ack_deferred enter\n"));
* If path is not active no point in sending the ACK * because the whole SQREADY protocol will again start * when the path becomes active. * decrement the path refcnt incremented in rsm_proc_sqready "rsm_sqready_ack_deferred done:!ACTIVE\n"));
/* send an SQREADY_ACK message */ /* initialize credits to the max level */ /* wake up any send that is waiting for credits */ * decrement the path refcnt since we incremented it in "rsm_sqready_ack_deferred done\n"));
* Process the SQREADY message /* look up the path - incr the path refcnt */ * No path exists or path is not active - drop the message "rsm_proc_sqready done: msg dropped no path\n"));
/* drain any tasks from the previous incarnation */ * If we had sent an SQREADY message and, while waiting for the * SQREADY_ACK, received another SQREADY message, blindly reset * the WAIT_FOR_SQACK flag because we'll just send SQREADY_ACK * and forget about the SQREADY that we sent. /* decr refcnt and drop the mutex */ "rsm_proc_sqready done: msg dropped path !ACTIVE\n"));
* The sender's local incarnation number is our remote incarnation * number; save it in the path data structure. * path is active - dispatch task to send SQREADY_ACK - remember * RSMPI calls can't be done in interrupt context * We can use the recv_taskq to send because the remote endpoint * cannot start sending messages till it receives SQREADY_ACK; hence * at this point there are no tasks on recv_taskq. * The path refcnt will be decremented in rsm_sqready_ack_deferred. * Process the SQREADY_ACK message "rsm_proc_sqready_ack enter\n"));
/* look up the path - incr the path refcnt */ * drop the message if - no path exists or path is not active * or if its not waiting for SQREADY_ACK message "rsm_proc_sqready_ack done: msg dropped no path\n"));
/* decrement the refcnt */ "rsm_proc_sqready_ack done: msg dropped\n"));
* Check if this message is in response to the last RSMIPC_MSG_SQREADY /* decrement the refcnt */ "rsm_proc_sqready_ack done: msg old incn %lld\n",
* clear the WAIT_FOR_SQACK flag since we have recvd the ack /* save the remote sendq incn number */ /* initialize credits to the max level */ /* wake up any send that is waiting for credits */ /* decrement the refcnt */ "rsm_proc_sqready_ack done\n"));
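The SQREADY/SQREADY_ACK handshake above boils down to a small state update on the path. The following is a minimal, hypothetical sketch of the ack side; `sq_path_t`, `WAIT_FOR_SQACK`, and the function name are simplified stand-ins for the driver's actual path structure and flags, not its real API:

```c
#include <stdint.h>

#define WAIT_FOR_SQACK	0x1	/* illustrative stand-in for the driver's flag */

typedef struct {
	uint64_t local_incn;	/* our incarnation number */
	uint64_t remote_incn;	/* sender's incarnation, learned from SQREADY */
	int	flags;
	int	credits;
} sq_path_t;

/*
 * Process an SQREADY_ACK carrying the incarnation number we sent in
 * our last SQREADY.  Returns 1 if accepted, 0 if the message is dropped.
 */
static int
proc_sqready_ack_sketch(sq_path_t *p, uint64_t ack_incn, int max_credits)
{
	if (!(p->flags & WAIT_FOR_SQACK))
		return (0);		/* not waiting for an ack - drop */
	if (ack_incn != p->local_incn)
		return (0);		/* ack for an older SQREADY - drop */
	p->flags &= ~WAIT_FOR_SQACK;	/* handshake complete */
	p->credits = max_credits;	/* initialize credits to the max */
	return (1);
}
```

A stale ack (wrong incarnation) leaves the path waiting, which matches the "msg old incn" drop path in the comments above.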
* process the RSMIPC_MSG_CREDIT message /* look up the path - incr the path refcnt */ "rsm_add_credits enter: path not found\n"));
/* the path is not active - discard credits */ "rsm_add_credits enter:path=%lx !ACTIVE\n",
path));
* Check if these credits are for current incarnation of the path. /* decrement the refcnt */ "rsm_add_credits enter: old incn %lld\n",
"rsm_add_credits:path=%lx new-creds=%d " /* add credits to the path's sendq */ /* wake up any send that is waiting for credits */ /* decrement the refcnt */ /* This is for an import segment */ /* This is for an export segment */ "rsm_intr_event done: exp seg not found\n"));
* We must hold the segment lock here, or else the segment * can be freed while pollwakeup is using it. This implies * that we MUST NOT grab the segment lock during rsm_chpoll, * as outlined in the chpoll(2) man page. * The exporter did a republish and changed the ACL - this change is only * visible to new importers. * find the importer and update the permission in the shared * data structure. Any new importers will use the new perms int done = 1;
/* indicate all SUSPENDS have been acked */ "rsm_suspend_complete enter\n"));
"rsm_suspend_complete done: suspend_list is empty\n"));
/* clear the pending flag for the node */ done = 0;
/* still some nodes have not yet ACKED */ "rsm_suspend_complete done: acks pending\n"));
* Now that we are done with suspending all the remote importers * time to quiesce the local exporters "rsm_suspend_complete done\n"));
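The ack-tracking in rsm_suspend_complete amounts to clearing a per-node pending flag and checking whether any node is still outstanding. A minimal sketch, assuming a simplified list node (names are illustrative, not the driver's):

```c
#include <stddef.h>

typedef struct susp_node {
	struct susp_node *next;
	int	nodeid;
	int	pending;	/* 1 until this node acks the SUSPEND */
} susp_node_t;

/*
 * Record a SUSPEND_COMPLETE from acking_node; returns 1 only when
 * all SUSPENDS have been acked, so the caller may quiesce the
 * local exporters.
 */
static int
suspend_complete_sketch(susp_node_t *list, int acking_node)
{
	susp_node_t *p;
	int done = 1;		/* indicate all SUSPENDS have been acked */

	for (p = list; p != NULL; p = p->next) {
		if (p->nodeid == acking_node)
			p->pending = 0;	/* clear the pending flag */
		if (p->pending)
			done = 0;	/* still some nodes have not ACKED */
	}
	return (done);
}
```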
* The importers send a SUSPEND_COMPLETE to the exporter node * Unpublish, unbind the export segment and * move the segments to the EXPORT_QUIESCED state * some local memory handles are not published * check if it was published "exporter_quiesce:unpub %d\n", e));
"exporter_quiesce:destroy %d\n",
* We are done with the pre-del processing for all the local segments * - time to move to PREDEL_COMPLETED. int susp_flg;
/* true means already suspended */ * Suspend all importers with same <node, key> pair. * After the last one of the shared importers has been continue;
/* go to next entry */ * search the rest of the bucket for * other siblings (importers with the same key) * of "first" and suspend them. * All importers with same key fall in * either not a peer segment or it's a * disconnected segment - skip it if (susp_flg) { /* seg already suspended */ break;
/* the inner for loop */ * we've processed all importers that are * All the importers with the same key and * nodeid as "first" have been suspended. * This is done only once. /* send an ACK for SUSPEND message */ "rsmseg_suspend enter: key=%u\n",
seg->s_key));
"rsmseg_suspend:segment %x state=%d\n",
/* wait until segment leaves the mapping state */ /* unload the mappings */ /* rsmseg_suspend already done for seg */ ASSERT(0);
/* invalid state */ "rsmsegshare_suspend enter\n"));
/* do the rsmpi disconnect */ "rsm:rsmpi disconnect seg=%x:err=%d\n",
/* do the rsmpi unmap and disconnect */ "rsmshare_suspend: rsmpi unmap %d\n", e));
"rsm:rsmpi disconnect seg=%x:err=%d\n",
ASSERT(0);
/* invalid state */ "rsmsegshare_suspend done\n"));
* This should get called on receiving a RESUME message or from * the pathmanager if the node undergoing DR dies. /* process only importers of node undergoing DR */ * inform the exporter so that it can "rsmseg_resume enter: key=%u\n",
seg->s_key));
/* resume the shared connection and/or mapping */ /* shared state can either be connected or mapped */ } else {
/* error in rsmpi connect during resume */ /* clean up the shared data structure */ * The following needs to be done after any * rsmsharelock calls which use seg->s_share. /* signal any waiting segment */ /* Setup protections for remap */ /* error in rsmpi connect or map during resume */ /* remap to trash page */ "rsmseg_resume:remap=%d\n", e));
/* clean up the shared data structure */ * The following needs to be done after any * rsmsharelock calls which use seg->s_share. /* signal any waiting segment */ "rsmseg_resume done:seg=%x,err=%d\n",
"rsmseg_resume:remap=%d\n", e));
} else {
/* remote exporter */ /* remap to the new rsmpi maps */ "rsmseg_resume:remap=%d\n", e));
* If we are not in a xxxx_QUIESCE state that means shared "rsmsegshare_resume:rsmpi connect seg=%x:err=%d\n",
/* when do we send the NOT_IMPORTING message */ /* signal any waiting segment */ /* signal any waiting segment */ /* do the rsmpi map of the whole segment here */ * We need to do rsmpi maps with <off, lens> identical to * the old mapinfo list because the segment mapping handles * dhp and such need the fragmentation of rsmpi maps to be * identical to what it was during the mmap of the segment "rsmsegshare_resume: rsmpi map err=%d\n",
"rsmsegshare_resume: rsmpi maplen" /* Check if this is the first rsm_map */ * A single rsm_unmap undoes multiple rsm_maps. "rsmsegshare_resume:disconn seg=%x:err=%d\n",
/* signal the waiting segments */ "rsmsegshare_resume done: rsmpi map err\n"));
/* signal any waiting segment */ * this is the routine that gets called by recv_taskq which is the * thread that processes messages that are flow-controlled. "rsm_intr_proc_deferred enter\n"));
/* use the head of the msgbuf_queue */ * messages that need to send a reply should check the message version * before processing the message. And all messages that need to * send a reply should be processed here by the worker thread. /* incr procmsg_cnt can be at most RSMIPC_MAX_MESSAGES */ /* No need to send credits if path is going down */ * send credits and reset procmsg_cnt if success otherwise * credits will be sent after processing the next message "rsm_intr_proc_deferred:send credits err=%d\n", e));
* decrement the path refcnt since we incremented it in * rsm_intr_callback_dispatch "rsm_intr_proc_deferred done\n"));
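The receiver side of the flow control (bumping procmsg_cnt and returning credits once enough messages have been consumed) can be sketched as below. This is a hypothetical simplification: the success/failure of the RSMIPC_MSG_CREDIT send is modelled by a flag rather than a real send:

```c
#define RSMIPC_MAX_MESSAGES	64	/* illustrative cap */

/*
 * Called after each processed message.  Returns the number of
 * credits handed back to the sender (0 if none were sent).  On a
 * failed send the count is preserved, so credits will be sent
 * after processing the next message.
 */
static int
proc_msg_done_sketch(int *procmsg_cnt, int send_ok)
{
	(*procmsg_cnt)++;	/* at most RSMIPC_MAX_MESSAGES */
	if (*procmsg_cnt >= RSMIPC_MAX_MESSAGES / 2) {
		if (send_ok) {
			int credits = *procmsg_cnt;

			*procmsg_cnt = 0;	/* reset on success */
			return (credits);
		}
		/* send failed - retry after the next message */
	}
	return (0);
}
```

Batching the credit returns this way halves the control-message traffic compared to acking every message individually.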
* Flow-controlled messages are enqueued and dispatched onto a taskq here "rsm_intr_callback_dispatch enter\n"));
/* look up the path - incr the path refcnt */ /* the path has been removed - drop this message */ "rsm_intr_callback_dispatch done: msg dropped\n"));
/* the path is not active - don't accept new messages */ "rsm_intr_callback_dispatch done: msg dropped" " path=%lx !ACTIVE\n",
path));
* Check if this message was sent to an older incarnation /* decrement the refcnt */ "rsm_intr_callback_dispatch done: old incn %lld\n",
/* copy and enqueue msg on the path's msgbuf queue */ * schedule task to process messages - ignore retval from * task_dispatch because the sender cannot send more than * what the receiver can handle. "rsm_intr_callback_dispatch done\n"));
* This procedure is called from rsm_srv_func when a remote node creates * a send queue. This event is used as a hint that an earlier failed * attempt to create a send queue to that remote node may now succeed and * should be retried. Indication of an earlier failed attempt is provided * by the RSMKA_SQCREATE_PENDING flag. "rsm_sqcreateop_callback enter\n"));
/* look up the path - incr the path refcnt */ "rsm_sqcreateop_callback done: no path\n"));
* previous attempt to create sendq had failed, retry * it and move to RSMKA_PATH_ACTIVE state if successful. * the refcnt will be decremented in the do_deferred_work /* decrement the refcnt */ "rsm_sqcreateop_callback done\n"));
* Check for the version number in the msg header. If it is not * RSM_VERSION, drop the message. In the future, we need to manage * incompatible version numbers in some way. * Drop requests that don't have a reply right here. * Requests with replies will send a BAD_VERSION reply * when they get processed by the worker thread. * These message types are handled by a worker thread using * the flow-control algorithm. * Any message processing that does one or more of the * following should be handled in a worker thread: * - allocates resources and might sleep * - makes RSMPI calls down to the interconnect driver; * by definition this includes requests with replies * - takes a long duration of time "rsm_intr_callback: bad msg %lx type %d data %lx\n",
"rsm_srv_func: unknown opcode = %x\n",
opcode));
/* *************************** IPC slots ************************* */ /* try to find a free slot, if not wait */ /* An empty slot is available, find it */ * Check if this is a local case "rsm: rsmipc_send bad node number %x\n",
dest));
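The "find a free slot, if not wait" step above is essentially a lowest-free-bit search over a slot bitmap. A hedged sketch under that assumption (the pool size, names, and bitmap representation are illustrative; the real driver blocks on a condition variable when the pool is empty):

```c
#include <stdint.h>

#define RSMIPC_SZ	32	/* illustrative number of reply slots */

static uint32_t slot_map;	/* bit set => slot in use */

/* Claim the lowest free slot; returns its index, or -1 to wait. */
static int
rsmipc_alloc_slot_sketch(void)
{
	int i;

	for (i = 0; i < RSMIPC_SZ; i++) {
		if (!(slot_map & (1u << i))) {
			slot_map |= (1u << i);	/* an empty slot - claim it */
			return (i);
		}
	}
	return (-1);	/* no free slot - caller must wait */
}

static void
rsmipc_free_slot_sketch(int i)
{
	slot_map &= ~(1u << i);	/* release on reply or timeout */
}
```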
* Oh boy! we are going remote. * identify if we need to have credits to send this message * - only selected requests are flow controlled "rsmipc_send:request type=%d\n",
/* backoff before further retries for 10ms */ "rsm: rsmipc_send no device to reach node %d\n",
dest));
/* Send request without ack */ * Set the rsmipc_version number in the msghdr for KA * communication versioning * remote endpoints incn should match the value in our * path's remote_incn field. No need to grab any lock * since we have refcnted the path in rsmka_get_sendq_token * wait till we recv credits or path goes down. If path * goes down rsm_send will fail and we handle the error * path is not active retry on another path. "rsm: rsmipc_send: path !ACTIVE"));
* release the reserved msgbuf since "rsm: rsmipc_send no reply send" " err = %d no reply count = %d\n",
/* Send reply - No flow control is done for reply */ * Set the version in the msg header for KA communication /* incn number is not used for reply msgs currently */ "rsm: rsmipc_send reply send" * Set the rsmipc_version number in the msghdr for KA * communication versioning * remote endpoints incn should match the value in our * path's remote_incn field. No need to grab any lock * since we have refcnted the path in rsmka_get_sendq_token * wait till we recv credits or path goes down. If path * goes down rsm_send will fail and we handle the error * path is not active retry on another path. "rsm: rsmipc_send: path !ACTIVE"));
* release the reserved msgbuf since "rsm: rsmipc_send rsmpi send err = %d\n", e));
/* wait for a reply signal, a SIGINT, or 5 sec. timeout */ /* signalled - return error */ * inform the exporter to delete this importer * send the new access mode to all the nodes that have imported * If the new acl does not have a node that was present in * the old acl a access permission of 0 is sent. "rsm_send_suspend enter\n"));
* create a list of nodes to send the suspend message to. * Currently the whole importer list is scanned and we obtain * all the nodes - this basically gets all nodes that at least * import one segment from the local node. * no need to grab the rsm_suspend_list lock here since we are * single threaded when suspend is called. * make sure that the token's node * is not already on the suspend list if (tokp == NULL) {
/* not in suspend list */ if (head == NULL) {
/* no importers so go ahead and quiesce segments */ * update the suspend list right away so that if a node dies the * pathmanager can set the NODE dead flag * Error in rsmipc_send currently happens due to inaccessibility if (e == RSM_SUCCESS) {
/* send failed - don't wait for ack */ "rsm_send_suspend done\n"));
* save the suspend list so that we know where to send * the resume messages and make the suspend list head * This function takes path and sends a message using the sendq * corresponding to it. The RSMIPC_MSG_SQREADY, RSMIPC_MSG_SQREADY_ACK * and RSMIPC_MSG_CREDIT are sent using this function. "rsmipc_send_controlmsg enter\n"));
"msgtype=%d %lx:%llx->%lx:%llx procmsg=%d\n",
path,
msgtype,
"rsmipc_send_controlmsg done: ! RSMKA_PATH_ACTIVE"));
/* incr the sendq, path refcnt */ /* drop the path lock before doing the rsm_send */ /* error counter for statistics */ "rsmipc_send_controlmsg:rsm_send error=%d", e));
/* decrement the sendq,path refcnt that we incr before rsm_send */ "rsmipc_send_controlmsg done=%d", e));
* Called from rsm_force_unload and path_importer_disconnect. The memory * mapping for the imported segment is removed and the segment is * disconnected at the interconnect layer if disconnect_flag is TRUE. * rsm_force_unload will get disconnect_flag TRUE from rsm_intr_callback * and FALSE from rsm_rebind. * When subsequent accesses cause page faulting, the dummy page is mapped * to resolve the fault, and the mapping generation number is incremented * so that the application can be notified on a close barrier operation. * It is important to note that the caller of rsmseg_unload is responsible for * acquiring the segment lock before making a call to rsmseg_unload. This is * required to make the caller and rsmseg_unload thread safe. The segment lock * will be released by the rsmseg_unload function. /* wait until segment leaves the mapping state */ * An unload is only necessary if the segment is connected. However, * if the segment was on the import list in state RSM_STATE_CONNECTING * then a connection was in progress. Change to RSM_STATE_NEW * here to cause an early exit from the connection process. "rsmseg_unload done: RSM_STATE_NEW\n"));
"rsmseg_unload done: RSM_STATE_CONNECTING\n"));
/* Setup protections for remap */ "remap returns %d\n", e));
* inform the exporting node so this import * can be deleted from the list of importers. /* ****************************** Importer Calls ************************ */ "rsm_connect done:ENODEV adapter=NULL\n"));
"rsm_connect done:ENODEV loopback\n"));
* Translate perm to access "rsm_connect done:EINVAL invalid perms\n"));
* Adding to the import list locks the segment; release the segment * lock so we can get the reply for the send. "rsm_connect done:rsmimport_add failed %d\n", e));
* Set the s_adapter field here so as to have a valid comparison of * the adapter and the s_adapter value during rsmshare_get. For * any error, set s_adapter to NULL before doing a release_adapter * get the pointer to the shared data structure; the * shared data is locked and refcount has been incremented /* flag indicates whether we need to recheck the state */ /* wait for the state to change */ /* signalled - clean up and return */ "rsm_connect done: INTERRUPTED\n"));
* the state changed, loop back and check what it is /* exit the loop and clean up further down */ /* already connected, good - fall through */ /* already mapped, wow - fall through */ /* access validation etc is done further down */ /* disconnected - so reconnect now */ ASSERT(0);
/* Invalid State */ /* we are the first to connect */ "rsm_connect done: hwaddr<0\n"));
* send request to node [src, dest, key, msgid] and get back * [status, msgid, cookie] * we need the s_mode of the exporter so pass "rsm_connect done:rsmipc_send failed %d\n", e));
"rsm_connect done:rsmipc_send reply err %d\n",
/* store the information recvd into the shared data struct */ * Get the segment lock and check for a force disconnect * from the export side which would have changed the state * back to RSM_STATE_NEW. Once the segment lock is acquired a * force disconnect will be held off until the connection * set a flag indicating abort handling has been /* send a message to exporter - only once */ * wake up any waiting importers and inform that * connection has been aborted "rsm_connect done: RSM_STATE_ABORT_CONNECT\n"));
* We need to verify that this process has access * No need to lock segment it has been removed /* this is the first importer */ "rsm_connect done: ipcaccess failed\n"));
/* update state and cookie */ * inform the exporter to delete this importer * Now inform any waiting importers to * retry connect. This needs to be done * after sending notimporting so that * the notimporting is sent before a waiting * importer sends a segconnect while retrying * No need to lock segment it has been removed "rsm_connect error %d\n", e));
/* increment generation number on barrier page */ /* return user off into barrier page where status will be */ /* Return back to user the segment size & perm in case it's needed */ /* assert seg is locked */ /* segment unmap has already been done */ * - shared data struct is in MAPPED or MAP_QUIESCE state * Unmap pages - previously rsm_memseg_import_unmap was called only if * the segment cookie list was NULL; but it is always NULL when * called from rsmmap_unmap and won't be NULL when called for * a force disconnect - so the check for NULL cookie list was removed /* unmap the shared RSMPI mapping */ "rsm_unmap: rsmpi unmap %d\n",
err));
} else {
/* MAP_QUIESCE --munmap()--> CONN_QUIESCE */ * The s_cookie field is used to store the cookie returned from the * ddi_umem_lock when binding the pages for an export segment. This * is the primary use of the s_cookie field and does not normally * pertain to any importing segment except in the loopback case. * For the loopback case, the import segment and export segment are * on the same node, the s_cookie field of the segment structure for * the importer is initialized to the s_cookie field in the exported * segment during the map operation and is used during the call to * devmap_umem_setup for the import mapping. * Thus, during unmap, we simply need to set s_cookie to NULL to * indicate that the mapping no longer exists. * cookie returned here if not null indicates that it is * the last importer and it can be used in the RSMIPC_NOT_IMPORTING "rsm_closeconnection enter\n"));
/* assert seg is locked */ "rsm_closeconnection done: already disconnected\n"));
/* wait for all putv/getv ops to get done */ * The current algorithm is stateless, I don't have to contact * server when I go away. He only gives me permissions. Of course, * the adapters will talk to terminate the connect. * disconnect is needed only if we are CONNECTED not in CONN_QUIESCE /* this is the last importer */ "rsm:disconnect failed seg=%x:err=%d\n",
/* clean up the shared data structure */ /* increment generation number on barrier page */ * The following needs to be done after any * rsmsharelock calls which use seg->s_share. /* signal anyone waiting in the CONN_QUIESCE state */ "rsm_closeconnection done\n"));
/* assert seg isn't locked */ /* Remove segment from imported list */ /* acquire the segment */ /* wait until segment leaves the mapping state */ "rsm_disconnect done: already disconnected\n"));
* This is the last importer so inform the exporting node * so this import can be deleted from the list of importers. /* find minor, no lock */ * An exported segment must be in state RSM_STATE_EXPORT; an * imported segment must be in state RSM_STATE_ACTIVE. /* cannot take segment lock here */ /* ************************* IOCTL Commands ********************* */ /* get segment from resource handle */ /* Allocate segment now and bind it */ * if DR pre-processing is going on or DR is in progress * then the new export segments should be in the NEW_QSCD state "rsmbar_ioctl done: RSM_IMPORT_DUMMY\n"));
"rsmbar_ioctl done: loopback\n"));
"rsmbar_ioctl done: RSM_BAR_CHECK %d\n",
bar_va));
for (i = 0; i < 4; i++) {
"rsmbar_ioctl done: error=%d\n", e));
* Ring the doorbell of the export segment to which this segment is "exportbell_ioctl done: %d\n", e));
* Ring the doorbells of all segments importing this segment "importbell_ioctl done\n"));
/* copyin the ioctl message */ "consumeevent_copyin msgp: RSMERR_BAD_ADDR\n"));
* If numents is large alloc events list on heap otherwise * use the address of array that was passed in. "RSMERR_BAD_ARGS_ERRORS\n"));
/* copyin the seglist into the rsm_poll_event32_t array */ "consumeevent_copyin evlist: RSMERR_BAD_ADDR\n"));
/* evlist and evlistsz are based on rsm_poll_event_t type */ * copy the rsm_poll_event32_t array to the rsm_poll_event_t /* free the temp 32-bit event list */ /* copyin the ioctl message */ "consumeevent_copyin msgp: RSMERR_BAD_ADDR\n"));
* If numents is large alloc events list on heap otherwise * use the address of array that was passed in. "consumeevent_copyin: RSMERR_BAD_ARGS_ERRORS\n"));
"consumeevent_copyin evlist: RSMERR_BAD_ADDR\n"));
"consumeevent_copyin done\n"));
"consumeevent_copyout enter: numents(%d) eventsp(%p)\n",
* copy the rsm_poll_event_t array to the rsm_poll_event32_t if (evlist32) {
/* free the temp 32-bit event list */ * eventsp and evlistsz are based on rsm_poll_event_t /* event list on the heap and needs to be freed here */ "consumeevent_copyout done: err=%d\n",
err));
/* event list on the heap and needs to be freed here */ "consumeevent_copyout done: err=%d\n",
err));
/* get the segment structure */ "consumeevent_ioctl: rnum(%d) seg(%p)\n",
rnum,
"iovec_copyin: returning RSMERR_BAD_ADDR\n"));
"iovec_copyin done: RSMERR_BAD_ADDR\n"));
"sgio_copyin done: returning EFAULT\n"));
"sgio_copyin done: returning EFAULT\n"));
"sgio_resid_copyout enter\n"));
"sgio_resid_copyout error: rescnt\n"));
"sgio_resid_copyout error: flags\n"));
"sgio_resid_copyout done\n"));
"sgio_resid_copyout error:rescnt\n"));
"sgio_resid_copyout error:flags\n"));
"rsm_iovec_ioctl done: sgio_copyin %d\n", e));
"rsm_iovec_ioctl done: request_count(%d) too large\n",
/* Allocate memory and copyin io vector array */ "rsm_iovec_ioctl done: iovec_copyin %d\n", e));
/* get the import segment descriptor */ * The following sequence of locking may (or MAY NOT) cause a * deadlock but this is currently not addressed here since the * implementation will be changed to incorporate the use of * reference counting for both the import and the export segments. /* rsmseglock_acquire(im_seg) done in rsmresource_lookup */ "rsm_iovec_ioctl done: rsmresource_lookup failed\n"));
/* putv/getv supported is supported only on import segments */ "rsm_iovec_ioctl done: not an import segment\n"));
* wait for a remote DR to complete, i.e., for segments to get UNQUIESCED, * as well as wait for a local DR to complete. "rsm_iovec_ioctl done: cv_wait INTR"));
"rsm_iovec_ioctl done: im_seg not conn/map"));
* Allocate and set up the io vector for rsmpi /* error while processing handle */ "iovec_ioctl: bad command = %x\n",
cmd));
"rsm_iovec_ioctl RSMPI oper done %d\n", e));
* Check for implicit signal post flag and do the signal * Reset the implicit signal post flag to 0 to indicate * that the signal post has been done and need not be * done in the RSMAPI library * At present there is no dependency on the existence of xbufs * created by ddi_umem_iosetup for each of the iovecs. So we "rsm_iovec_ioctl done %d\n", e));
/* if RSMPI call fails return that else return copyout's retval */ "rsmaddr_ioctl done: adapter not found\n"));
/* returns the hwaddr in msg->hwaddr */ /* returns the nodeid in msg->nodeid */ "rsmaddr_ioctl done: %d\n",
rval));
"rsm_ddi_copyin done: EFAULT\n"));
for (i = 0; i < 4; i++) {
"rsm_ddi_copyin done\n"));
"rsmattr_ddi_copyout enter\n"));
* need to copy appropriate data from rsm_controller_attr_t * to rsmka_int_controller_attr_t "rsmattr_ddi_copyout done\n"));
"rsmattr_ddi_copyout done\n"));
"rsm_ioctl RSM_IOCTL_CONSUMEEVENT done: %d\n",
error));
/* topology cmd does not use the arg common to other cmds */ "rsm_ioctl done: %d\n",
error));
"rsm_ioctl done: %d\n",
error));
"rsm_ioctl done: EFAULT\n"));
"rsm_ioctl done: ENODEV\n"));
"rsm_ioctl:after copyout %d\n",
error));
/* Return library off,len of barrier page */ /* map the nodeid or hwaddr */ "rsm_ioctl done: %d\n",
error));
/* Find resource and lock it in read mode */ * Export list is searched during publish, loopback and /* Import list is searched during remote unmap call. */ } else {
/* invalid res value */ else /* RSM_RESOURCE_BAR */ } else {
/* invalid res value */ /* **************************** Segment Mapping Operations ********* */ * Find the correct mapinfo structure to use during the mapping * from the seg->s_mapinfo list. * The seg->s_mapinfo list contains in reverse order the mappings * as returned by the RSMPI rsm_map. In rsm_devmap, we need to * access the correct entry within this list for the mapping. * The algorithm for selecting a list entry is as follows: * When start_offset of an entry <= off we have found the entry * we were looking for. Adjust the dev_offset and map_len (needs * to be PAGESIZE aligned). "rsmmap_map: dhp = %x\n",
dhp));
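The mapinfo selection algorithm just described (first entry with start_offset <= off, then adjust the device offset) can be sketched as a simple list walk. Structure and field names below are illustrative simplifications, not the driver's real mapinfo layout:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * The list is kept in reverse order of the RSMPI rsm_map entries,
 * i.e. descending start_offset, so the first entry whose
 * start_offset <= off covers the requested offset.
 */
typedef struct mapinfo {
	struct mapinfo *next;
	uint64_t start_offset;	/* segment offset where this entry begins */
	uint64_t dev_offset;	/* device offset returned by rsm_map */
} mapinfo_t;

static mapinfo_t *
find_mapinfo_sketch(mapinfo_t *list, uint64_t off, uint64_t *dev_off)
{
	mapinfo_t *p;

	for (p = list; p != NULL; p = p->next) {
		if (p->start_offset <= off) {
			/* adjust dev_offset for the offset into this entry */
			*dev_off = p->dev_offset + (off - p->start_offset);
			return (p);
		}
	}
	return (NULL);	/* off not covered by any entry */
}
```

In the driver the resulting length must additionally be trimmed to the entry and PAGESIZE-aligned, which is omitted here.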
* Allocate structure and add cookie to segment list * Page fault handling is done here. The prerequisite mapping setup * has been done in rsm_devmap with calls to ddi_devmem_setup or "rsmmap_access done: cv_wait INTR"));
"rsmmap_access: dhp = %x\n",
dhp));
* Same as map, create an entry to hold cookie and add it to * connect segment list. The oldpvt is a pointer to segment. * Return segment pointer in newpvt. "rsmmap_dup done: EINVAL\n"));
* Remove pvtp structure from segment list. "rsmmap_unmap: dhp = %x\n",
dhp));
* We can go ahead and remove the dhps even if we are in * the MAPPING state because the dhps being removed here * belong to a different mmap and we are holding the segment /* find and remove dhp handle */ "rsmmap_unmap:parital unmap" "new_dhp1 %lx, new_dhp2 %lx\n",
* rsmmap_unmap is called for each mapping cookie on the list. * When the list becomes empty and we are not in the MAPPING * state then unmap in the rsmpi driver. /* Free the segment structure */ "rsm_devmap: off = %lx, len = %lx\n",
off,
len));
* The offset argument in devmap_umem_setup represents * the offset within the kernel memory defined by the * cookie. We use this offset as barrier_offset. "rsm_devmap done: %d\n",
err));
"rsm_devmap done: %d\n",
err));
"rsm_devmap done: %d\n",
err));
* Make sure we still have permission for the map operation. * For each devmap call, rsmmap_map is called. This maintains driver * private information for the mapping. Thus, if there are multiple * devmap calls there will be multiple rsmmap_map calls and for each * call, the mapping information will be stored. * In case of an error during the processing of the devmap call, error * will be returned. This error return causes the caller of rsm_devmap * to undo all the mappings by calling rsmmap_unmap for each one. * rsmmap_unmap will free up the private information for the requested "rsm_devmap: incorrect mapping info\n"));
"rsm_devmap: dip=%lx,dreg=%lu,doff=%lx," "rsm_devmap: devmap_devmem_setup failed %d\n",
/* cur_len is always an integral multiple pagesize */ "rsm_devmap: devmap_umem_setup failed %d\n",
"rsm_devmap: loopback done\n"));
* We can use the devmap framework for mapping device memory to user space by * specifying this routine in the rsm_cb_ops structure. The kernel mmap * processing calls this entry point and devmap_setup is called within this * function, which eventually calls rsm_devmap "rsm_segmap done: invalid segment\n"));
* the user is trying to map a resource that has not been * defined yet. The library uses this to map in the * The mapping for the barrier page is identified * by the special offset barrier_offset "rsm_segmap: bar cookie/va is NULL\n"));
"rsm_segmap done: %d\n",
error));
/* Make sure you can only map imported segments */ "rsm_segmap done: not an import segment\n"));
/* check means library is broken */ /* wait for the segment to become unquiesced */ "rsm_segmap done: cv_wait INTR"));
/* wait until segment leaves the mapping state */ * we allow multiple maps of the same segment in the KA * and it works because we do an rsmpi map of the whole * segment during the first map and all the device mapping * information needed in rsm_devmap is in the mapinfo list. "rsm_segmap done: segment not connected\n"));
* Make sure we are not mapping a larger segment than what's "rsm_segmap done: off+len>seg size\n"));
* Make sure we still have permission for the map operation. "rsm_segmap done: no permission\n"));
"rsm_segmap done:RSMSI_STATE %d invalid\n",
* Do the map - since we want importers to share mappings * we do the rsmpi map for the whole segment * length_to_map = seg->s_len is always an integral * multiple of PAGESIZE. Length mapped in each entry in mapinfo * list is a multiple of PAGESIZE - RSMPI map ensures this /* map the whole segment */ * Store the mapping info obtained from rsm_map /* Check if this is the the first rsm_map */ * A single rsm_unmap undoes "rsm_segmap done: rsmpi map err %d\n",
/* move to an intermediate mapping state */ /* unmap the shared RSMPI mapping */ "rsm: devmap_setup failed %d\n",
error));
* For loopback, the export segment mapping cookie (s_cookie) * is also used as the s_cookie value for its import segments * Note that reference counting for s_cookie of the export * segment is not required due to the following: * We never have a case of the export segment being destroyed, * leaving the import segments with a stale value for the * s_cookie field, since a force disconnect is done prior to a * destroy of an export segment. The force disconnect causes * the s_cookie value to be reset to NULL. Also for the * rsm_rebind operation, we change the s_cookie value of the * export segment as well as of all its local (loopback) * In order to maintain the lock ordering between the export * and import segment locks, we need to acquire the export * segment lock first and only then acquire the import * The above is necessary to avoid any deadlock scenarios * with rsm_rebind which also acquires both the export * and import segment locks in the above mentioned order. * Based on code inspection, there seem to be no other * situations in which both the export and import segment * locks are acquired either in the same or opposite order * Thus in order to conform to the above lock order, we * need to change the state of the import segment to * RSM_STATE_MAPPING, release the lock. Once this is done we * can now safely acquire the export segment lock first * followed by the import segment lock which is as per * the lock order mentioned above. /* move to an intermediate mapping state */ * Revert to old_state and signal any waiters * The shared state is not changed "rsm_segmap done: key %d not found\n",
seg->s_key));
/*
 * It is not required or necessary to acquire the import segment lock
 * here to change the value of s_cookie, since no one will touch the
 * import segment as long as it is in the RSM_STATE_MAPPING state.
 */

	"rsm_segmap done: %d\n", error));
"rsmka_init_loopback enter\n"));
/* initialize null ops vector */

/* initialize attributes for loopback adapter */

/* initialize loopback adapter */

	"rsmka_init_loopback done\n"));
/* ************** DR functions ********************************** */

/* wait for putv/getv to complete if the segp is */

/* state changed - need to see what it */

/* send SUSPEND messages - currently it will be */
} else {
/* bind failed - resource unavailable */

/* wait for the segment to move to EXPORT_QUIESCED state */

/* bind failed - resource unavailable */

	"%s done: exp_qscd bind failed = %d\n",
/* segp->s_state = RSM_STATE_EXPORT; */

/* segp->s_state = RSM_STATE_BIND; */

/* check whether it is a local_memory_handle */

	"%s done: exp_qscd create failed = %d\n",
"%s done: exp_qscd publish failed = %d\n",
/* wait for the RDMA to complete */

	"rsm_dr_process_local_segments enter\n"));

/* iterate through the resource structure */

	"rsm_dr_process_local_segments done\n"));
/* *************** DR callback functions ************ */

	"rsm_dr_callback_post_add is a no-op\n"));
"rsm_dr_callback_pre_del enter\n"));
"rsm_dr_callback_pre_del:state=%d\n",
/*
 * The state should usually never be RSM_DRV_NEW, since in this state
 * the callbacks have not yet been registered. So, ASSERT.
 */

/*
 * The driver is in the process of registering with the DR framework.
 * So, wait till the registration process is complete.
 */

/*
 * If the state is RSM_DRV_UNREG_PROCESSING, the module is in the
 * process of detaching and unregistering the callbacks from the DR
 * framework. So, simply return.
 */

	"rsm_dr_callback_pre_del:"
	"rsm_dr_callback_pre_del done\n"));
/* Do all the quiescing stuff here */

	"rsm_dr_callback_pre_del: quiesce things now\n"));
/*
 * Now that all local segments have been quiesced, let's inform the
 * remote nodes.
 *
 * In response to the suspend message the remote node(s) will process
 * the segments and send a suspend_complete message. Till all the
 * nodes send the suspend_complete message we wait in the
 * RSM_DRV_PREDEL_STARTED state. In the exporter_quiesce function we
 * transition to the RSM_DRV_PREDEL_COMPLETED state.
 */

	"rsm_dr_callback_pre_del done\n"));
"rsm_dr_callback_post_del enter\n"));
"rsm_dr_callback_post_del:state=%d\n",
/*
 * The driver state cannot be RSM_DRV_NEW, since in this state the
 * callbacks have not yet been registered.
 */

/*
 * The driver is in the process of registering with the DR framework.
 * Wait till the registration is complete.
 */

/*
 * RSM_DRV_UNREG_PROCESSING state means the module is detaching and
 * unregistering the callbacks from the DR framework. So simply return.
 */

/*
 * RSM_DRV_OK means we missed the pre-del corresponding to this
 * post-del because we had not registered yet, so simply return.
 */

	"rsm_dr_callback_post_del:"
	"rsm_dr_callback_post_del done:\n"));
/* Do all the unquiescing stuff here */

	"rsm_dr_callback_post_del: unquiesce things now\n"));
/*
 * Now that all local segments have been unquiesced, let's inform the
 * remote nodes.
 */

	"rsm_dr_callback_post_del done\n"));