/* rsm.c revision 5dd2c7e9f9042148fc81d6aada5df28c2705977d */
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
/*
* Overview of the RSM Kernel Agent:
* ---------------------------------
*
* rsm.c constitutes the implementation of the RSM kernel agent. The RSM
* kernel agent is a pseudo device driver which makes use of the RSMPI
* interface on behalf of the RSMAPI user library.
*
* The kernel agent functionality can be categorized into the following
* components:
 * 1. Driver Infrastructure
 * 2. Export/Import segment management
 * 3. Internal resource allocation/deallocation
*
* The driver infrastructure includes the basic module loading entry points
* like _init, _info, _fini to load, unload and report information about
* the driver module. The driver infrastructure also includes the
* autoconfiguration entry points namely, attach, detach and getinfo for
* the device autoconfiguration.
*
* The kernel agent is a pseudo character device driver and exports
* a cb_ops structure which defines the driver entry points for character
* device access. This includes the open and close entry points. The
* other entry points provided include ioctl, devmap and segmap and chpoll.
 * The read and write entry points are not used since the device is memory
 * mapped. Also, ddi_prop_op is used for the prop_op entry point.
*
* The ioctl entry point supports a number of commands, which are used by
* the RSMAPI library in order to export and import segments. These
* commands include commands for binding and rebinding the physical pages
* allocated to the virtual address range, publishing the export segment,
* unpublishing and republishing an export segment, creating an
* import segment and a virtual connection from this import segment to
 * an export segment, and performing scatter-gather data transfer and
 * barrier operations.
*
*
* Export and Import segments:
* ---------------------------
*
* In order to create an RSM export segment a process allocates a range in its
* virtual address space for the segment using standard Solaris interfaces.
* The process then calls RSMAPI, which in turn makes an ioctl call to the
* RSM kernel agent for an allocation of physical memory pages and for
* creation of the export segment by binding these pages to the virtual
* address range. These pages are locked in memory so that remote accesses
* are always applied to the correct page. Then the RSM segment is published,
* again via RSMAPI making an ioctl to the RSM kernel agent, and a segment id
* is assigned to it.
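 *
 * For illustration, the exporter-side sequence as driven from user level
 * might look like the following sketch (the calls are RSMAPI library
 * calls; ctlr, buf, len, acl and acl_len are illustrative variables and
 * error handling is elided):
 *
 *	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
 *	    MAP_PRIVATE | MAP_ANON, -1, 0);
 *	(void) rsm_memseg_export_create(ctlr, &seg, buf, len, 0);
 *	(void) rsm_memseg_export_publish(seg, &segid, acl, acl_len);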
*
* In order to import a published RSM segment, RSMAPI creates an import
* segment and forms a virtual connection across the interconnect to the
* export segment, via an ioctl into the kernel agent with the connect
* command. The import segment setup is completed by mapping the
 * local device memory into the importer's virtual address space. This
 * infrastructure is described in the sections that follow.
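 *
 * The corresponding importer-side sketch (again RSMAPI call names; node,
 * segid, attr and len are illustrative):
 *
 *	(void) rsm_memseg_import_connect(ctlr, node, segid,
 *	    RSM_PERM_RDWR, &im_seg);
 *	(void) rsm_memseg_import_map(im_seg, &va, attr,
 *	    RSM_PERM_RDWR, 0, len);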
*
* Segmap and Devmap interfaces:
*
* The RSM kernel agent allows device memory to be directly accessed by user
* threads via memory mapping. In order to do so, the RSM kernel agent
* supports the devmap and segmap entry points.
*
* The segmap entry point(rsm_segmap) is responsible for setting up a memory
* mapping as requested by mmap. The devmap entry point(rsm_devmap) is
* responsible for exporting the device memory to the user applications.
* rsm_segmap calls RSMPI rsm_map to allocate device memory. Then the
 * control is transferred to the devmap_setup call which calls rsm_devmap.
*
* rsm_devmap validates the user mapping to the device or kernel memory
* and passes the information to the system for setting up the mapping. The
* actual setting up of the mapping is done by devmap_devmem_setup(for
* device memory) or devmap_umem_setup(for kernel memory). Callbacks are
* registered for device context management via the devmap_devmem_setup
* or devmap_umem_setup calls. The callbacks are rsmmap_map, rsmmap_unmap,
* rsmmap_access, rsmmap_dup. The callbacks are called when a new mapping
* is created, a mapping is freed, a mapping is accessed or an existing
* mapping is duplicated respectively. These callbacks allow the RSM kernel
* agent to maintain state information associated with the mappings.
* The state information is mainly in the form of a cookie list for the import
* segment for which mapping has been done.
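 *
 * As a sketch, the callback registration might look as follows (the
 * struct layout follows the DDI devmap_callback_ctl definition; dhp,
 * rsm_dip, cookie, off, len, maxprot and acc_attr are illustrative):
 *
 *	static struct devmap_callback_ctl rsmmap_ops = {
 *		DEVMAP_OPS_REV,		devmap_rev
 *		rsmmap_map,		devmap_map
 *		rsmmap_access,		devmap_access
 *		rsmmap_dup,		devmap_dup
 *		rsmmap_unmap		devmap_unmap
 *	};
 *
 *	err = devmap_umem_setup(dhp, rsm_dip, &rsmmap_ops, cookie,
 *	    off, len, maxprot, 0, &acc_attr);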
*
* Forced disconnect of import segments:
*
* When an exported segment is unpublished, the exporter sends a forced
* disconnect message to all its importers. The importer segments are
* unloaded and disconnected. This involves unloading the original
* mappings and remapping to a preallocated kernel trash page. This is
* preallocated by the kernel agent during attach using ddi_umem_alloc with
* the DDI_UMEM_TRASH flag set. This avoids a core dump in the application
* due to unloading of the original mappings.
*
* Additionally every segment has a mapping generation number associated
* with it. This is an entry in the barrier generation page, created
* during attach time. This mapping generation number for the import
* segments is incremented on a force disconnect to notify the application
* of the force disconnect. On this notification, the application needs
* to reconnect the segment to establish a new legitimate mapping.
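 *
 * A sketch of how an importer can detect a forced disconnect using the
 * mapping generation number (the indexing shown here is illustrative):
 *
 *	gen = bar_va[segment resource number];	saved when mapping
 *	...
 *	if (bar_va[segment resource number] != gen)
 *		a forced disconnect occurred; reconnect the segment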
*
*
* Locks used in the kernel agent:
* -------------------------------
*
* The kernel agent uses a variety of mutexes and condition variables for
* mutual exclusion of the shared data structures and for synchronization
* between the various threads. Some of the locks are described as follows.
*
 * Each resource structure, which represents either an export or an import
 * segment, has a lock associated with it. The lock is the resource mutex,
 * rsmrc_lock.
* This is used directly by RSMRC_LOCK and RSMRC_UNLOCK macros and in the
* rsmseglock_acquire and rsmseglock_release macros. An additional
* lock called the rsmsi_lock is used for the shared import data structure
* that is relevant for resources representing import segments. There is
* also a condition variable associated with the resource called s_cv. This
* is used to wait for events like the segment state change etc.
*
* The resource structures are allocated from a pool of resource structures,
* called rsm_resource. This pool is protected via a reader-writer lock,
* called rsmrc_lock.
*
* There are two separate hash tables, one for the export segments and
* one for the import segments. The export segments are inserted into the
* export segment hash table only after they have been published and the
* import segments are inserted in the import segments list only after they
* have successfully connected to an exported segment. These tables are
* protected via reader-writer locks.
*
* Debug Support in the kernel agent:
* ----------------------------------
*
* Debugging support in the kernel agent is provided by the following
* macros.
*
* DBG_PRINTF((category, level, message)) is a macro which logs a debug
 * message to the kernel agent's debug buffer, rsmka_dbg. This debug buffer
* can be viewed in kmdb as *rsmka_dbg/s. The message is logged based
* on the definition of the category and level. All messages that belong to
* the specified category(rsmdbg_category) and are of an equal or greater
* severity than the specified level(rsmdbg_level) are logged. The message
* is a string which uses the same formatting rules as the strings used in
* printf.
*
* The category defines which component of the kernel agent has logged this
* message. There are a number of categories that have been defined such as
* RSM_KERNEL_AGENT, RSM_OPS, RSM_IMPORT, RSM_EXPORT etc. A macro,
* DBG_ADDCATEGORY is used to add in another category to the currently
* specified category value so that the component using this new category
* can also effectively log debug messages. Thus, the category of a specific
* message is some combination of the available categories and we can define
* sub-categories if we want a finer level of granularity.
*
* The level defines the severity of the message. Different level values are
* defined, with RSM_ERR being the most severe and RSM_DEBUG_VERBOSE being
* the least severe(debug level is 0).
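 *
 * For example, a call such as the following sketch is logged only when
 * rsmdbg_category includes RSM_EXPORT and rsmdbg_level admits messages
 * of RSM_DEBUG_VERBOSE severity (segid is a hypothetical variable):
 *
 *	DBG_PRINTF((RSM_KERNEL_AGENT | RSM_EXPORT, RSM_DEBUG_VERBOSE,
 *	    "rsm_publish done: segid %d\n", segid));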
*
* DBG_DEFINE and DBG_DEFINE_STR are macros provided to declare a debug
* variable or a string respectively.
*
*
* NOTES:
*
* Special Fork and Exec Handling:
* -------------------------------
*
* The backing physical pages of an exported segment are always locked down.
* Thus, there are two cases in which a process having exported segments
* will cause a cpu to hang: (1) the process invokes exec; (2) a process
* forks and invokes exit before the duped file descriptors for the export
* segments are closed in the child process. The hang is caused because the
* address space release algorithm in Solaris VM subsystem is based on a
* non-blocking loop which does not terminate while segments are locked
* down. In addition to this, Solaris VM subsystem lacks a callback
* mechanism to the rsm kernel agent to allow unlocking these export
* segment pages.
*
* In order to circumvent this problem, the kernel agent does the following.
* The Solaris VM subsystem keeps memory segments in increasing order of
 * virtual addresses. Thus a special page(special_exit_offset) is allocated
* by the kernel agent and is mmapped into the heap area of the process address
* space(the mmap is done by the RSMAPI library). During the mmap processing
* of this special page by the devmap infrastructure, a callback(the same
* devmap context management callbacks discussed above) is registered for an
* unmap.
*
* As discussed above, this page is processed by the Solaris address space
* release code before any of the exported segments pages(which are allocated
* from high memory). It is during this processing that the unmap callback gets
* called and this callback is responsible for force destroying the exported
* segments and thus eliminating the problem of locked pages.
*
* Flow-control:
* ------------
*
* A credit based flow control algorithm is used for messages whose
* processing cannot be done in the interrupt context because it might
* involve invoking rsmpi calls, or might take a long time to complete
* or might need to allocate resources. The algorithm operates on a per
* path basis. To send a message the pathend needs to have a credit and
* it consumes one for every message that is flow controlled. On the
* receiving pathend the message is put on a msgbuf_queue and a task is
* dispatched on the worker thread - recv_taskq where it is processed.
* After processing the message, the receiving pathend dequeues the message,
 * and if it has processed more than RSMIPC_LOTSFREE_MSGBUFS messages, it
 * sends credits back to the sender pathend.
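 *
 * A sender-side sketch of the credit check (the lock, condition variable
 * and field names here are illustrative, not the actual ones):
 *
 *	mutex_enter(&sqt->sq_lock);
 *	while (sqt->sq_credits == 0)		wait for a credit
 *		cv_wait(&sqt->sq_cv, &sqt->sq_lock);
 *	sqt->sq_credits--;			consume one credit
 *	mutex_exit(&sqt->sq_lock);
 *	... send the flow-controlled message on the path's sendq ...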
*
* RSM_DRTEST:
* -----------
*
 * This is used to enable DR testing using a test driver on test
 * platforms which do not support DR.
*
*/
#include <sys/mem_config.h>
#include <sys/ddidevmap.h>
#include <sys/ddi_impldefs.h>
#include <rsm_in.h>
#include <sys/tuneable.h>
#ifdef RSM_DRTEST
void *arg);
void *arg);
#endif
extern void rsmka_pathmanager_init();
extern void rsmka_pathmanager_cleanup();
extern void rele_sendq_token();
extern int rsmka_topology_ioctl(caddr_t, int, int);
extern pri_t maxclsyspri;
extern work_queue_t work_queue;
extern kmutex_t ipc_info_lock;
extern kmutex_t ipc_info_cvlock;
extern kcondvar_t ipc_info_cv;
extern kmutex_t path_hold_cvlock;
extern kcondvar_t path_hold_cv;
extern kmutex_t rsmka_buf_lock;
extern adapter_t *rsmka_lookup_adapter(char *, int);
extern void rsmka_release_adapter(adapter_t *);
/* lint -w2 */
uint_t);
static void rsm_export_force_destroy(ddi_umem_cookie_t *);
static void rsmacl_free(rsmapi_access_entry_t *, int);
static void rsmpiacl_free(rsm_access_entry_t *, int);
static int rsm_inc_pgcnt(pgcnt_t);
static void rsm_dec_pgcnt(pgcnt_t);
size_t *);
static void exporter_quiesce();
static void rsmseg_suspend(rsmseg_t *, int *);
static void rsmsegshare_suspend(rsmseg_t *);
static int rsmseg_resume(rsmseg_t *, void **);
static int rsmsegshare_resume(rsmseg_t *);
static struct cb_ops rsm_cb_ops = {
rsm_open, /* open */
rsm_close, /* close */
nodev, /* strategy */
nodev, /* print */
nodev, /* dump */
nodev, /* read */
nodev, /* write */
rsm_ioctl, /* ioctl */
rsm_devmap, /* devmap */
NULL, /* mmap */
rsm_segmap, /* segmap */
rsm_chpoll, /* poll */
ddi_prop_op, /* cb_prop_op */
0, /* streamtab */
0,
0,
0
};
static struct dev_ops rsm_ops = {
	DEVO_REV, /* devo_rev, */
0, /* refcnt */
rsm_info, /* get_dev_info */
nulldev, /* identify */
nulldev, /* probe */
rsm_attach, /* attach */
rsm_detach, /* detach */
nodev, /* reset */
&rsm_cb_ops, /* driver operations */
(struct bus_ops *)0, /* bus operations */
0
};
/*
* Module linkage information for the kernel.
*/
static struct modldrv modldrv = {
	&mod_driverops, /* Type of module. This one is a pseudo driver */
"Remote Shared Memory Driver %I%",
&rsm_ops, /* driver ops */
};
static struct modlinkage modlinkage = {
	MODREV_1,
	(void *)&modldrv,
0,
0,
0
};
static kphysm_setup_vector_t rsm_dr_callback_vec = {
};
/* This flag can be changed to 0 to help with PIT testing */
int rsmka_modunloadok = 1;
int no_reply_cnt = 0;
#define MAX_NODES 64
static struct rsm_driver_data rsm_drv_data;
static struct rsmresource_table rsm_resource;
static void rsmresource_destroy(void);
static int rsmresource_alloc(minor_t *);
void *cookie);
void rsmseg_unload(rsmseg_t *);
/* cookie, va, offsets and length for the barrier */
static rsm_gnum_t *bar_va;
static ddi_umem_cookie_t bar_cookie;
static off_t barrier_offset;
static size_t barrier_size;
static int max_segs;
/* cookie for the trash memory */
static ddi_umem_cookie_t remap_cookie;
extern taskq_t *work_taskq;
extern char *taskq_name;
/* list of nodes to which RSMIPC_MSG_SUSPEND has been sent */
static list_head_t rsm_suspend_list;
/* list of descriptors for remote importers */
static importers_table_t importer_list;
void rsmka_init_loopback();
int rsmka_null_bind(
int rsmka_null_unbind(
size_t);
int rsmka_null_rebind(
int rsmka_null_publish(
int rsmka_null_republish(
int rsmka_null_unpublish(
/*
* data and locks to keep track of total amount of exported memory
*/
static kmutex_t rsm_pgcnt_lock;
static int rsm_enable_dr;
static char loopback_str[] = "loopback";
int rsm_hash_size;
/*
* The locking model is as follows:
*
* Local operations:
 * find resource - grab reader lock on resource list
* insert rc - grab writer lock
* delete rc - grab writer lock and resource mutex
*
* Remote invocations:
* find resource - grab read lock and resource mutex
*
* State:
* resource state - grab resource mutex
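 *
 * For example, a local find-and-update might be coded as the following
 * sketch (RSMRC_LOCK/RSMRC_UNLOCK are the macros named earlier; rnum is
 * a hypothetical resource number):
 *
 *	rw_enter(&rsm_resource.rsmrc_lock, RW_READER);
 *	p = lookup of rnum in the resource table (see rsmresource_lookup)
 *	rw_exit(&rsm_resource.rsmrc_lock);
 *	RSMRC_LOCK(p);			grab the resource mutex
 *	p->rsmrc_state = new state;
 *	RSMRC_UNLOCK(p);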
*/
int
_init(void)
{
int e;
e = mod_install(&modlinkage);
if (e != 0) {
return (e);
}
for (e = 0; e < RSMIPC_SZ; e++) {
}
/*
* Initialize the suspend message list
*/
/*
* It is assumed here that configuration data is available
* during system boot since _init may be called at that time.
*/
"rsm: _init done\n"));
return (DDI_SUCCESS);
}
int
_info(struct modinfo *modinfop)
{
	return (mod_info(&modlinkage, modinfop));
}
int
_fini(void)
{
int e;
"rsm: _fini enter\n"));
/*
* The rsmka_modunloadok flag is simply used to help with
* the PIT testing. Make this flag 0 to disallow modunload.
*/
if (rsmka_modunloadok == 0)
return (EBUSY);
/* rsm_detach will be called as a result of mod_remove */
e = mod_remove(&modlinkage);
if (e) {
"Unable to fini RSM %x\n", e));
return (e);
}
(void) mutex_destroy(&rsm_pgcnt_lock);
return (DDI_SUCCESS);
}
/*ARGSUSED1*/
static int
rsm_attach(dev_info_t *devi, ddi_attach_cmd_t cmd)
{
int percent;
int ret;
switch (cmd) {
case DDI_ATTACH:
break;
case DDI_RESUME:
default:
"rsm:rsm_attach - cmd not supported\n"));
return (DDI_FAILURE);
}
"rsm:rsm_attach - supports only "
"one instance\n"));
return (DDI_FAILURE);
}
"enable-dynamic-reconfiguration", 1);
if (rsm_enable_dr) {
#ifdef RSM_DRTEST
(void *)NULL);
#else
(void *)NULL);
#endif
if (ret != 0) {
"reconfiguration setup failed\n");
return (DDI_FAILURE);
}
}
/*
* page_list_read_lock();
* xx_setup();
* page_list_read_unlock();
*/
"segment-hashtable-size", RSM_HASHSZ);
if (rsm_hash_size == 0) {
"rsm: segment-hashtable-size in rsm.conf "
"must be greater than 0, defaulting to 128\n"));
}
rsm_pgcnt = 0;
"max-exported-memory", 0);
if (percent < 0) {
"rsm:rsm_attach not enough memory available to "
"export, or max-exported-memory set incorrectly.\n"));
return (DDI_FAILURE);
}
/* 0 indicates no fixed upper limit. maxmem is the max */
/* available pageable physical mem */
if (rsm_pgcnt_max > 0) {
"rsm: Available physical memory = %lu pages, "
"Max exportable memory = %lu pages",
maxmem, rsm_pgcnt_max));
}
/*
* Create minor number
*/
"rsm: rsm_attach - Unable to get "
"minor number\n"));
return (DDI_FAILURE);
}
"rsm: rsm_attach - unable to allocate "
"minor #\n"));
return (DDI_FAILURE);
}
/*
* Allocate the hashtables
*/
KM_SLEEP);
/*
* Allocate a resource struct
*/
{
rsmresource_t *p;
}
/*
* Based on the rsm.conf property max-segments, determine the maximum
	 * number of segments that can be exported and imported. This is then
	 * used to determine the size for barrier failure pages.
*/
/* First get the max number of segments from the rsm.conf file */
"max-segments", 0);
if (max_segs == 0) {
/* Use default number of segments */
}
/*
* Based on the max number of segments allowed, determine the barrier
	 * page size. Add 1 to max_segs since the barrier page itself uses
* a slot
*/
PAGESIZE);
/*
* allocation of the barrier failure page
*/
/*
* Set the barrier_offset
*/
barrier_offset = 0;
/*
* Allocate a trash memory and get a cookie for it. This will be used
* when remapping segments during force disconnects. Allocate the
* trash memory with a large size which is page aligned.
*/
/* initialize user segment id allocation variable */
/*
* initialize the null_rsmpi_ops vector and the loopback adapter
*/
return (DDI_SUCCESS);
}
/*
 * The call to mod_remove in the _fini routine will cause the system
* to call rsm_detach
*/
/*ARGSUSED*/
static int
rsm_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
switch (cmd) {
case DDI_DETACH:
break;
default:
"rsm:rsm_detach - cmd %x not supported\n",
cmd));
return (DDI_FAILURE);
}
/*
* Unregister the DR callback functions
*/
if (rsm_enable_dr) {
#ifdef RSM_DRTEST
(void *)NULL);
#else
(void *)NULL);
#endif
}
/*
* Release all resources, seglist, controller, ...
*/
/* remove intersend queues */
/* remove registered services */
/*
* Free minor zero resource
*/
{
rsmresource_t *p;
if (p) {
mutex_destroy(&p->rsmrc_lock);
kmem_free((void *)p, sizeof (*p));
}
}
/*
* Free resource table
*/
/*
* Free the hash tables
*/
rsm_hash_size * sizeof (importing_token_t *));
/* free barrier page */
if (bar_cookie != NULL) {
}
bar_cookie = NULL;
/*
* Free the memory allocated for the trash
*/
if (remap_cookie != NULL) {
}
remap_cookie = NULL;
return (DDI_SUCCESS);
}
/*ARGSUSED*/
static int
rsm_info(dev_info_t *dip, ddi_info_cmd_t infocmd, void *arg, void **result)
{
register int error;
switch (infocmd) {
case DDI_INFO_DEVT2DEVINFO:
error = DDI_FAILURE;
else {
error = DDI_SUCCESS;
}
break;
case DDI_INFO_DEVT2INSTANCE:
*result = (void *)0;
error = DDI_SUCCESS;
break;
default:
error = DDI_FAILURE;
}
return (error);
}
static adapter_t *
rsm_getadapter(rsm_ioctlmsg_t *msg, int mode)
{
char adapter_devname[MAXNAMELEN];
int instance;
return (NULL);
}
return (NULL);
return (&loopback_adapter);
return (adapter);
}
/*
* *********************** Resource Number Management ********************
* All resources are stored in a simple hash table. The table is an array
* of pointers to resource blks. Each blk contains:
* base - base number of this blk
* used - number of used slots in this blk.
* blks - array of pointers to resource items.
* An entry in a resource blk is empty if it's NULL.
*
 * We start with no resource array. Each time we run out of slots, we
 * reallocate a larger array, copy the existing pointers into it, and then
 * allocate a new resource blk which is added to the hash table.
*
* The resource control block contains:
* root - array of pointer of resource blks
* sz - current size of array.
* len - last valid entry in array.
*
* A search operation based on a resource number is as follows:
* index = rnum / RESOURCE_BLKSZ;
* ASSERT(index < resource_block.len);
* ASSERT(index < resource_block.sz);
* offset = rnum % RESOURCE_BLKSZ;
* ASSERT(offset >= resource_block.root[index]->base);
* ASSERT(offset < resource_block.root[index]->base + RESOURCE_BLKSZ);
* return resource_block.root[index]->blks[offset];
*
 * A resource blk is freed when its used count reaches zero.
*/
static int
rsmresource_alloc(minor_t *rnum)
{
/* search for available resource slot */
int i, j, empty = -1;
"rsmresource_alloc enter\n"));
/* Try to find an empty slot */
for (i = 0; i < rsm_resource.rsmrc_len; i++) {
/* found an empty slot in this blk */
for (j = 0; j < RSMRC_BLKSZ; j++) {
(j + (i * RSMRC_BLKSZ));
/*
* obey gen page limits
*/
if (empty < 0) {
"rsmresource"
"_alloc failed:"
"not enough res"
"%d\n", *rnum));
return (
} else {
/* use empty slot */
break;
}
}
blk->rsmrcblk_avail--;
"rsmresource_alloc done\n"));
return (RSM_SUCCESS);
}
}
/* remember first empty slot */
empty = i;
}
}
/* Couldn't find anything, allocate a new blk */
/*
* Do we need to reallocate the root array
*/
if (empty < 0) {
/*
* Allocate new array and copy current stuff into it
*/
rsmresource_blk_t **p;
/*
		 * Don't allocate more than the max valid rnum
*/
max_segs + 1) {
return (RSMERR_INSUFFICIENT_RESOURCES);
}
p = (rsmresource_blk_t **)kmem_zalloc(
newsz * sizeof (*p),
KM_SLEEP);
if (rsm_resource.rsmrc_root) {
(int)sizeof (*p));
/*
* Copy old data into new space and
* free old stuff
*/
}
rsm_resource.rsmrc_root = p;
}
}
/*
* Allocate a new blk
*/
/*
* Allocate slot
*/
/*
* watch out not to exceed bounds of barrier page
*/
"rsmresource_alloc failed %d\n", *rnum));
return (RSMERR_INSUFFICIENT_RESOURCES);
}
"rsmresource_alloc done\n"));
return (RSM_SUCCESS);
}
static rsmresource_t *
rsmresource_free(minor_t rnum)
{
/* search for available resource slot */
int i, j;
rsmresource_t *p;
"rsmresource_free enter\n"));
i = (int)(rnum / RSMRC_BLKSZ);
j = (int)(rnum % RSMRC_BLKSZ);
if (i >= rsm_resource.rsmrc_len) {
"rsmresource_free done\n"));
return (NULL);
}
"rsmresource_free done\n"));
return (NULL);
}
p = blk->rsmrcblk_blks[j];
if (p == RSMRC_RESERVED) {
p = NULL;
}
blk->rsmrcblk_avail++;
/* free this blk */
}
"rsmresource_free done\n"));
return (p);
}
static rsmresource_t *
rsmresource_lookup(minor_t rnum, int lock)
{
int i, j;
rsmresource_t *p;
"rsmresource_lookup enter\n"));
/* Find resource and lock it in READER mode */
/* search for available resource slot */
i = (int)(rnum / RSMRC_BLKSZ);
j = (int)(rnum % RSMRC_BLKSZ);
if (i >= rsm_resource.rsmrc_len) {
"rsmresource_lookup done\n"));
return (NULL);
}
p = blk->rsmrcblk_blks[j];
if (p != RSMRC_RESERVED) {
mutex_enter(&p->rsmrc_lock);
} else {
p = NULL;
}
}
} else {
p = NULL;
}
"rsmresource_lookup done\n"));
return (p);
}
static void
rsmresource_insert(minor_t rnum, rsmresource_t *p, rsm_resource_type_t type)
{
/* Find resource and lock it in READER mode */
/* Caller can upgrade if need be */
/* search for available resource slot */
int i, j;
"rsmresource_insert enter\n"));
i = (int)(rnum / RSMRC_BLKSZ);
j = (int)(rnum % RSMRC_BLKSZ);
p->rsmrc_type = type;
blk->rsmrcblk_blks[j] = p;
"rsmresource_insert done\n"));
}
static void
rsmresource_destroy(void)
{
int i, j;
"rsmresource_destroy enter\n"));
for (i = 0; i < rsm_resource.rsmrc_len; i++) {
continue;
}
for (j = 0; j < RSMRC_BLKSZ; j++) {
"Not null slot %d, %lx\n", j,
}
}
}
if (rsm_resource.rsmrc_root) {
rsm_resource.rsmrc_len = 0;
rsm_resource.rsmrc_sz = 0;
}
"rsmresource_destroy done\n"));
}
/* ******************** Generic Key Hash Table Management ********* */
static rsmresource_t *
rsmhash_lookup(rsmhash_table_t *rhash, rsm_memseg_id_t key,
    rsm_resource_state_t state)
{
rsmresource_t *p;
for (; p; p = p->rsmrc_next) {
/* acquire resource lock */
RSMRC_LOCK(p);
break;
}
}
/* state changed, release lock and return null */
RSMRC_UNLOCK(p);
"rsmhash_lookup done: state changed\n"));
return (NULL);
}
return (p);
}
static void
rsmhash_rm(rsmhash_table_t *rhash, rsmresource_t *rcelm)
{
rsmresource_t *p, **back;
/*
* It's ok not to find the segment.
*/
if (p == rcelm) {
break;
}
}
}
static int
rsmhash_add(rsmhash_table_t *rhash, rsmresource_t *new, rsm_memseg_id_t key,
    int dup_check, rsm_resource_state_t state)
{
/* lock table */
/*
* If the current resource state is other than the state passed in
* then the resource is (probably) already on the list. eg. for an
* import segment if the state is not RSM_STATE_NEW then it's on the
* list already.
*/
return (RSMERR_BAD_SEG_HNDL);
}
if (dup_check) {
/*
* Used for checking export segments; don't want to have
* the same key used for multiple segments.
*/
for (; p; p = p->rsmrc_next) {
break;
}
}
}
if (p == NULL) {
/* Key doesn't exist, add it */
}
}
/*
* XOR each byte of the key.
*/
static uint_t
rsmhash(rsm_memseg_id_t key)
{
	uint_t hash = key ^ (key >> 8) ^ (key >> 16) ^ (key >> 24);
	return (hash % rsm_hash_size);
}
/*
* generic function to get a specific bucket
*/
static void *
{
return (NULL);
else
}
/*
* generic function to get a specific bucket's address
*/
static void **
{
return (NULL);
else
}
/*
* generic function to alloc a hash table
*/
static void
{
}
/*
* generic function to free a hash table
*/
static void
{
}
/* *********************** Exported Segment Key Management ************ */
#define rsmexport_rm(arg) \
#define rsmexport_lookup(key) \
/* ************************** Import Segment List Management ********** */
/*
* Add segment to import list. This will be useful for paging and loopback
* segment unloading.
*/
#define rsmimport_rm(arg) \
/*
* #define rsmimport_lookup(key) \
* (rsmseg_t *)rsmhash_lookup(&rsm_import_segs, (key), RSM_STATE_CONNECT)
*/
/*
* increase the ref count and make the import segment point to the
 * shared data structure. Return a pointer to the shared data struct
* and the shared data struct is locked upon return
*/
static rsm_import_share_t *
{
rsmresource_t *p;
/* lock table */
for (; p; p = p->rsmrc_next) {
/*
* Look for an entry that is importing the same exporter
* with the share data structure allocated.
*/
(p->rsmrc_node == node) &&
(p->rsmrc_adapter == adapter) &&
break;
}
}
if (p == NULL) {
/* we are the first importer, create the shared data struct */
}
/* we grab the shared lock before returning from this function */
shdatap->rsmsi_refcnt++;
return (shdatap);
}
/*
* the shared data structure should be locked before calling
* rsmsharecv_signal().
* Change the state and signal any waiting segments.
*/
void
{
}
}
/*
* Add to the hash table
*/
static void
void *cookie)
{
int index;
}
static void
{
int index;
else
break;
} else {
}
}
}
/* **************************Segment Structure Management ************* */
/*
* Free segment structure
*/
static void
{
/* need to take seglock here to avoid race with rsmmap_unmap() */
/* Segment is still busy */
"rsmseg_free done\n"));
return;
}
/*
* If it's an importer decrement the refcount
	 * and if it's down to zero free the shared data structure.
* This is where failures during rsm_connect() are unrefcounted
*/
sizeof (rsm_import_share_t));
} else {
}
/*
* The following needs to be done after any
* rsmsharelock calls which use seg->s_share.
*/
}
}
static rsmseg_t *
{
/*
* allocate memory for new segment. This should be a segkmem cache.
*/
return (new);
}
/*ARGSUSED1*/
static int
{
/*
* Char only
*/
return (EINVAL);
}
/*
* Only zero can be opened, clones are used for resources.
*/
return (ENODEV);
}
return (EPERM);
}
/*
	 * The library function _rsm_librsm_init calls open so that a
	 * file descriptor is returned for minor device zero.
*/
"rsm_open RDONLY done\n"));
return (DDI_SUCCESS);
}
/*
* - allocate new minor number and segment.
* - add segment to list of all segments.
* - set minordev data to segment
* - update devp argument to new device
* - update s_cred to cred; make sure you do crhold(cred);
*/
/* allocate a new resource number */
/*
	 * We will bind this minor to a specific resource in the first
	 * ioctl
*/
} else {
return (EAGAIN);
}
return (DDI_SUCCESS);
}
static void
{
int e = RSM_SUCCESS;
/*
* If we are processing rsm_close wait for force_destroy
* processing to complete since force_destroy processing
* needs to finish first before we can free the segment.
* force_destroy is only for export segments
*/
}
}
/* It's ok to read the state without a lock */
case RSM_STATE_EXPORT:
/* FALLTHRU */
case RSM_STATE_BIND_QUIESCED:
/* FALLTHRU */
case RSM_STATE_BIND:
e = rsm_unbind(seg);
return;
/* FALLTHRU */
case RSM_STATE_NEW_QUIESCED:
break;
case RSM_STATE_NEW:
break;
case RSM_STATE_ZOMBIE:
/*
* Segments in this state have been removed off the
* exported segments list and have been unpublished
		 * and unbound. These segments have been removed during
* a callback to the rsm_export_force_destroy, which
* is called for the purpose of unlocking these
* exported memory segments when a process exits but
* leaves the segments locked down since rsm_close is
		 * not called for the segments. This can happen
* when a process calls fork or exec and then exits.
* Once the segments are in the ZOMBIE state, all that
* remains is to destroy them when rsm_close is called.
* This is done here. Thus, for such segments the
		 * state is changed to new so that later in this
* function rsmseg_free is called.
*/
break;
case RSM_STATE_MAP_QUIESCE:
case RSM_STATE_ACTIVE:
/* Disconnect will handle the unmap */
case RSM_STATE_CONN_QUIESCE:
case RSM_STATE_CONNECT:
case RSM_STATE_DISCONNECT:
(void) rsm_disconnect(seg);
break;
case RSM_STATE_MAPPING:
/*FALLTHRU*/
case RSM_STATE_END:
break;
default:
break;
}
/*
* check state.
* - make sure you do crfree(s_cred);
* release segment and minor number
*/
/*
* The export_force_destroy callback is created to unlock
* the exported segments of a process
	 * when the process does a fork or exec and then exits. It calls this
	 * function with the force flag set to 1, which indicates that the
* segment state must be converted to ZOMBIE. This state means that the
* segments still exist and have been unlocked and most importantly the
* only operation allowed is to destroy them on an rsm_close.
*/
if (force_flag) {
} else {
}
}
static int
{
return (EINVAL);
/*
* At this point we are the last reference to the resource.
* Free resource number from resource table.
* It's ok to remove number before we free the segment.
* We need to lock the resource to protect against remote calls.
*/
if (rnum == RSM_DRIVER_MINOR ||
return (DDI_SUCCESS);
}
switch (res->rsmrc_type) {
break;
case RSM_RESOURCE_BAR:
break;
default:
break;
}
return (DDI_SUCCESS);
}
/*
* rsm_inc_pgcnt
*
* Description: increment rsm page counter.
*
* Parameters: pgcnt_t pnum; number of pages to be used
*
* Returns: RSM_SUCCESS if memory limit not exceeded
 *		RSMERR_INSUFFICIENT_MEM if memory limit exceeded. In this
 *		case, the page counter remains unchanged.
*
*/
static int
rsm_inc_pgcnt(pgcnt_t pnum)
{
	if (rsm_pgcnt_max == 0) { /* no upper limit has been set */
		return (RSM_SUCCESS);
	}

	mutex_enter(&rsm_pgcnt_lock);
	if (rsm_pgcnt + pnum > rsm_pgcnt_max) {
		/* ensure that limits have not been exceeded */
		mutex_exit(&rsm_pgcnt_lock);
		return (RSMERR_INSUFFICIENT_MEM);
	}

	rsm_pgcnt += pnum;
	DBG_PRINTF((RSM_KERNEL_AGENT, RSM_DEBUG_VERBOSE, "rsm_pgcnt incr to "
	    "%lu\n", rsm_pgcnt));
	mutex_exit(&rsm_pgcnt_lock);
	return (RSM_SUCCESS);
}
/*
* rsm_dec_pgcnt
*
* Description: decrement rsm page counter.
*
* Parameters: pgcnt_t pnum; number of pages freed
*
*/
static void
rsm_dec_pgcnt(pgcnt_t pnum)
{
	if (rsm_pgcnt_max == 0) { /* no upper limit has been set */
		return;
	}

	mutex_enter(&rsm_pgcnt_lock);
	ASSERT(rsm_pgcnt >= pnum);
	rsm_pgcnt -= pnum;
	DBG_PRINTF((RSM_KERNEL_AGENT, RSM_DEBUG_VERBOSE, "rsm_pgcnt decr to "
	    "%lu\n", rsm_pgcnt));
	mutex_exit(&rsm_pgcnt_lock);
}
static struct umem_callback_ops rsm_as_ops = {
UMEM_CALLBACK_VERSION, /* version number */
};
static int
{
int error = RSM_SUCCESS;
/*
* Make sure vaddr and len are aligned on a page boundary
*/
return (RSMERR_BAD_ADDR);
}
return (RSMERR_BAD_LENGTH);
}
/*
* Find number of pages
*/
if (error != RSM_SUCCESS) {
"rsm_bind_pages:mem limit exceeded\n"));
return (RSMERR_INSUFFICIENT_MEM);
}
callbackops, procp);
if (error) {
"rsm_bind_pages:ddi_umem_lock failed\n"));
/*
* ddi_umem_lock, in the case of failure, returns one of
* the following three errors. These are translated into
* the RSMERR namespace and returned.
*/
return (RSMERR_BAD_ADDR);
return (RSMERR_PERM_DENIED);
else
return (RSMERR_INSUFFICIENT_MEM);
}
return (error);
}
static int
{
/* unlock address range */
}
return (RSM_SUCCESS);
}
static int
{
int e;
"rsm_bind done:no adapter\n"));
return (RSMERR_CTLR_NOT_PRESENT);
}
/* lock address range */
"rsm: rsm_bind done: invalid vaddr\n"));
return (RSMERR_BAD_ADDR);
}
"rsm_bind: invalid length\n"));
return (RSMERR_BAD_LENGTH);
}
/* Lock segment */
"rsm_bind done: cv_wait INTERRUPTED"));
return (RSMERR_INTERRUPTED);
}
}
if (e == RSM_SUCCESS) {
}
}
/*
* Set the s_pid value in the segment structure. This is used
* to identify exported segments belonging to a particular
* process so that when the process exits, these segments can
* be unlocked forcefully even if rsm_close is not called on
	 * process exit since there may be other processes referencing
* them (for example on a fork or exec).
* The s_pid value is also used to authenticate the process
* doing a publish or unpublish on the export segment. Only
* the creator of the export segment has a right to do a
* publish or unpublish and unbind on the segment.
*/
} else {
"unable to lock down pages\n"));
}
/* Unlock segment */
if (e == RSM_SUCCESS) {
/* copyout the resource number */
#ifdef _MULTI_DATAMODEL
e = RSMERR_BAD_ADDR;
}
}
#endif
e = RSMERR_BAD_ADDR;
}
}
return (e);
}
static void
{
rsmresource_t *p = NULL;
"rsm_remap_local_importers enter\n"));
for (; p; p = p->rsmrc_next) {
/*
* Change the s_cookie value of only the local importers
* which have been mapped (in state RSM_STATE_ACTIVE).
* Note that there is no need to change the s_cookie value
* if the imported segment is in RSM_STATE_MAPPING since
* eventually the s_cookie will be updated via the mapping
* functionality.
*/
}
}
"rsm_remap_local_importers done\n"));
}
static int
{
int e;
/* Check for permissions to rebind */
return (RSMERR_REBIND_NOT_ALLOWED);
}
ddi_get_pid() != 0) {
return (RSMERR_NOT_CREATOR);
}
/*
	 * We will not be allowing partial rebind and hence the length passed
	 * in must be the same as the segment length
*/
"rsm_rebind done: null msg->vaddr\n"));
return (RSMERR_BAD_ADDR);
}
"rsm_rebind: invalid length\n"));
return (RSMERR_BAD_LENGTH);
}
/* Lock segment */
"rsm_rebind done: cv_wait INTERRUPTED"));
return (RSMERR_INTERRUPTED);
}
}
/* verify segment state */
/* Unlock segment */
"rsm_rebind done: invalid state\n"));
return (RSMERR_BAD_SEG_HNDL);
}
return (RSM_SUCCESS);
}
if (e == RSM_SUCCESS) {
if (e == RSM_SUCCESS) {
/*
* unbind the older pages, and unload local importers;
* but don't disconnect importers
*/
(void) rsm_unbind_pages(seg);
cookie);
} else {
/*
* Unbind the pages associated with "cookie" by the
* rsm_bind_pages calls prior to this. This is
* similar to what is done in the rsm_unbind_pages
* routine for the seg->s_cookie.
*/
"rsm_rebind failed with %d\n", e));
}
/*
* At present there is no dependency on the existence of xbuf.
* So we can free it here. If in the future this changes, it can
* be freed sometime during the segment destroy.
*/
}
/* Unlock segment */
return (e);
}
static int
{
/* verify segment state */
"rsm_unbind: invalid state\n"));
return (RSMERR_BAD_SEG_HNDL);
}
/* unlock current range */
(void) rsm_unbind_pages(seg);
}
return (RSM_SUCCESS);
}
/* **************************** Exporter Access List Management ******* */
static void
{
int acl_sz;
/* acl could be NULL */
}
}
static void
{
int acl_sz;
}
}
static int
{
int acl_len;
int i;
*len = 0;
"rsmacl_build done: acl invalid\n"));
return (RSMERR_BAD_ACL);
}
"rsmacl_build done: BAD_ADDR\n"));
return (RSMERR_BAD_ADDR);
}
/*
* Verify access list
*/
for (i = 0; i < acl_len; i++) {
/* invalid entry */
"rsmacl_build done: EINVAL\n"));
return (RSMERR_BAD_ACL);
}
}
}
return (DDI_SUCCESS);
}
static int
{
int i;
/*
* translate access list
*/
for (i = 0; i < acl_len; i++) {
} else {
/* invalid hwaddr */
"rsmpiacl_create done:"
"EINVAL hwaddr\n"));
return (RSMERR_INTERNAL_ERROR);
}
}
/* rsmpi understands only RSM_PERM_XXXX */
acl[i].ae_permission =
}
} else {
}
return (RSM_SUCCESS);
}
static int
{
int i;
"rsmsegacl_validate enter\n"));
/*
* Find segment and grab its lock. The reason why we grab the segment
	 * lock inside the search is to avoid the race when the segment is
* being deleted and we already have a pointer to it.
*/
if (!seg) {
"rsmsegacl_validate done: %u ENXIO\n", key));
return (RSMERR_SEG_NOT_PUBLISHED);
}
/*
	 * We implement a 2-level protection scheme.
	 * First, we check if the importing node is in the segment's
	 * access list. Second, we check if the user has access rights.
*
* This routine only validates the rnode access_list
*/
/*
* Check host access list
*/
goto found;
}
}
/* rnode is not found in the list */
"rsmsegacl_validate done: EPERM\n"));
return (RSMERR_SEG_NOT_PUBLISHED_TO_NODE);
} else {
/* use default owner creation umask */
}
/* update perm for this node */
/*
* Perm of requesting node is valid; source will validate user
*/
/*
* Add the importer to the list right away, if connect fails
* the importer will ask the exporter to remove it.
*/
return (RSM_SUCCESS);
}
/* ************************** Exporter Calls ************************* */
static int
{
int e;
int acl_len;
int loopback_flag = 0;
int create_flags = 0;
loopback_flag = 1;
ddi_get_pid() != 0) {
"rsm_publish: Not creator\n"));
return (RSMERR_NOT_CREATOR);
}
/*
* Get per node access list
*/
if (e != DDI_SUCCESS) {
"rsm_publish done: rsmacl_build failed\n"));
return (e);
}
/*
* The application provided msg->key is used for resolving a
* segment id according to the following:
* key = 0 Kernel Agent selects the segment id
* key <= RSM_DLPI_ID_END Reserved for system usage except
* RSMLIB range
* key < RSM_USER_APP_ID_BASE segment id = key
* key >= RSM_USER_APP_ID_BASE Reserved for KA selections
*
* rsm_nextavail_segmentid is initialized to 0x80000000 and
* overflows to zero after 0x80000000 allocations.
* An algorithm is needed which allows reinitialization and provides
* for reallocation after overflow. For now, ENOMEM is returned
* once the overflow condition has occurred.
*/
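	/*
	 * For illustration, a sketch of the checks implied by the table
	 * above (ignoring the RSMLIB exception; segment_id is the local
	 * variable used below):
	 *
	 *	if (key == 0)
	 *		segment_id = next id from rsm_nextavail_segmentid
	 *	else if (key <= RSM_DLPI_ID_END ||
	 *	    key >= RSM_USER_APP_ID_BASE)
	 *		return (RSMERR_RESERVED_SEGID);
	 *	else
	 *		segment_id = key;
	 */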
if (segment_id != 0) {
} else {
"rsm_publish done: no more keys avlbl\n"));
return (RSMERR_INSUFFICIENT_RESOURCES);
}
return (RSMERR_RESERVED_SEGID);
else {
return (RSMERR_RESERVED_SEGID);
}
/* Add key to exportlist; The segment lock is held on success */
if (e) {
"rsm_publish done: export_add failed: %d\n", e));
return (e);
}
/* state changed since then, free acl and return */
"rsm_publish done: segment in wrong state: %d\n",
return (RSMERR_BAD_SEG_HNDL);
}
/*
* If this is for a local memory handle and permissions are zero,
* then the surrogate segment is very large and we want to skip
* allocation of DVMA space.
*
* Careful! If the user didn't use an ACL list, acl will be a NULL
* pointer. Check that before dereferencing it.
*/
goto skipdriver;
}
/* create segment */
	/* This call includes a bind operation */
/*
	 * create an acl list with hwaddr for RSMPI publish
*/
if (e != RSM_SUCCESS) {
"rsm_publish done: rsmpiacl_create failed: %d\n", e));
return (e);
}
/* create segment */
	/* This call includes a bind operation */
}
} else {
}
create_flags, &mem,
/*
* At present there is no dependency on the existence of xbuf.
* So we can free it here. If in the future this changes, it can
* be freed sometime during the segment destroy.
*/
if (e != RSM_SUCCESS) {
"rsm_publish done: export_create failed: %d\n", e));
/*
* The following assertion ensures that the two errors
* related to the length and its alignment do not occur
* since they have been checked during export_create
*/
ASSERT(e != RSMERR_BAD_MEM_ALIGNMENT &&
e != RSMERR_BAD_LENGTH);
if (e == RSMERR_NOT_MEM)
return (e);
}
/* export segment, this should create an IMMU mapping */
if (e != RSM_SUCCESS) {
"rsm_publish done: export_publish failed: %d\n",
e));
return (e);
}
}
}
/*
* If the segment id was solicited, then return it in
* the original incoming message.
*/
#ifdef _MULTI_DATAMODEL
"rsm_publish done\n"));
}
#endif
"rsm_publish done\n"));
}
return (DDI_SUCCESS);
}
/*
* This function modifies the access control list of an already published
* segment. There is no effect on import segments which are already
* connected.
*/
static int
{
int e, i;
int loopback_flag = 0;
return (RSMERR_SEG_NOT_PUBLISHED);
ddi_get_pid() != 0) {
"rsm_republish: Not owner\n"));
return (RSMERR_NOT_CREATOR);
}
loopback_flag = 1;
/*
* Build new list first
*/
if (e) {
"rsm_republish done: rsmacl_build failed %d", e));
return (e);
}
/* Lock segment */
/*
* a republish is in progress - REPUBLISH message is being
* sent to the importers so wait for it to complete OR
* wait till DR completes
*/
"rsm_republish done: cv_wait INTERRUPTED"));
return (RSMERR_INTERRUPTED);
}
}
/* recheck if state is valid */
return (RSMERR_SEG_NOT_PUBLISHED);
}
/*
* This call will only be meaningful if and when the interconnect
* layer makes use of the access list
*/
/*
	 * create an acl list with hwaddr for RSMPI publish
*/
if (e != RSM_SUCCESS) {
"rsm_republish done: rsmpiacl_create failed %d", e));
return (e);
}
if (e != RSM_SUCCESS) {
"rsm_republish done: rsmpi republish failed %d\n", e));
return (e);
}
/* create a tmp copy of the new acl */
if (tmp_acl_len > 0) {
for (i = 0; i < tmp_acl_len; i++) {
}
/*
* The default permission of a node which was in the old
* ACL but not in the new ACL is 0 ie no access.
*/
permission = 0;
} else {
/*
* NULL acl means all importers can connect and
* default permission will be owner creation umask
*/
}
/* make other republishers to wait for republish to complete */
/* send the new perms to the importing nodes */
/* wake up any one waiting for republish to complete */
return (DDI_SUCCESS);
}
static int
{
int acl_len;
int e;
ddi_get_pid() != 0) {
"rsm_unpublish: Not creator\n"));
return (RSMERR_NOT_CREATOR);
}
/*
* wait for QUIESCING to complete here before rsmexport_rm
* is called because the SUSPEND_COMPLETE mesg which changes
* the seg state from EXPORT_QUIESCING to EXPORT_QUIESCED and
* signals the cv_wait needs to find it in the hashtable.
*/
"rsm_unpublish done: cv_wait INTR qscing"
return (RSMERR_INTERRUPTED);
}
}
/* verify segment state */
return (RSMERR_SEG_NOT_PUBLISHED);
}
/*
* wait for republish to complete
*/
"rsm_unpublish done: cv_wait INTR repubing"));
return (RSMERR_INTERRUPTED);
}
}
"rsm_unpublish done: invalid state"));
return (RSMERR_SEG_NOT_PUBLISHED);
}
/*
* Be certain to see if there is an ACL first! If this segment was
* not published with an ACL, acl will be a null pointer. Check
* that before dereferencing it.
*/
goto bypass;
}
goto bypass;
for (;;) {
"rsm_unpublish done: bad state %x\n",
return (RSMERR_SEG_NOT_PUBLISHED);
}
/* unpublish from adapter */
if (e == RSM_SUCCESS) {
break;
}
/*
* wait for unpublish to succeed, it's busy.
*/
/* wait for a max of 1 ms - this is an empirical */
/* value that was found by some minimal testing */
/* can be fine tuned when we have better numbers */
/* A long term fix would be to send cv_signal */
/* from the intr callback routine */
/* currently nobody signals this wait */
"rsm_unpublish: SEG_IN_USE\n"));
} else {
if (mode == 1) {
"rsm:rsmpi unpublish err %x\n", e));
}
return (e);
}
}
/* Free segment */
if (e != RSM_SUCCESS) {
"rsm_unpublish: rsmpi destroy key=%x failed %x\n",
}
}
return (DDI_SUCCESS);
}
/*
* Called from rsm_unpublish to force an unload and disconnection of all
* importers of the unpublished segment.
*
* First build the list of segments requiring a force disconnect, then
* send a request for each.
*/
static void
{
int index;
"rsm_send_importer_disconnects enter\n"));
prev_token = NULL;
/*
* take it off the importer list and add it
* to the force disconnect list.
*/
if (prev_token == NULL)
else
if (force_disconnect_list == NULL) {
} else {
/*
* make sure that the tmp_token's node
* is not already on the force disconnect
* list.
*/
if (tokp->importing_node ==
break;
}
}
} else {
sizeof (*token));
}
}
} else {
prev_token = token;
}
}
} else {
for (;;) {
&request,
RSM_NO_REPLY) == RSM_SUCCESS) {
break;
} else {
}
}
}
}
"rsm_send_importer_disconnects done\n"));
}
/*
* This function is used as a callback for unlocking the pages locked
* down by a process which then does a fork or an exec.
* It marks the export segments corresponding to umem cookie given by
 * the *arg to be in a ZOMBIE state (by calling rsmseg_close); they are
 * destroyed later when an rsm_close occurs.
*/
static void
{
rsmresource_t *p;
int i, j;
int found = 0;
"rsm_export_force_destroy enter\n"));
/*
* Walk the resource list and locate the export segment (either
* in the BIND or the EXPORT state) which corresponds to the
* ddi_umem_cookie_t being freed up, and call rsmseg_close.
* Change the state to ZOMBIE by calling rsmseg_close with the
* force_flag argument (the second argument) set to 1. Also,
* unpublish and unbind the segment, but don't free it. Free it
* only on a rsm_close call for the segment.
*/
for (i = 0; i < rsm_resource.rsmrc_len; i++) {
continue;
}
for (j = 0; j < RSMRC_BLKSZ; j++) {
p = blk->rsmrcblk_blks[j];
if ((p != NULL) && (p != RSMRC_RESERVED) &&
(p->rsmrc_type == RSM_RESOURCE_EXPORT_SEGMENT)) {
continue; /* continue searching */
/*
* Found the segment, set flag to indicate
* force destroy processing is in progress
*/
found = 1;
break;
}
}
if (found)
break;
}
if (found) {
/* call rsmseg_close with force flag set to 1 */
/*
* force destroy processing done, clear flag and signal any
* thread waiting in rsmseg_close.
*/
}
"rsm_export_force_destroy done\n"));
}
/* ******************************* Remote Calls *********************** */
static void
{
"rsm_intr_segconnect enter\n"));
"rsm_intr_segconnect done\n"));
}
/*
* When an exported segment is unpublished the exporter sends an ipc
* message (RSMIPC_MSG_DISCONNECT) to all importers. The recv ipc dispatcher
* calls this function. The import list is scanned; segments which match the
* exported segment id are unloaded and disconnected.
*
* Will also be called from rsm_rebind with disconnect_flag FALSE.
*
*/
static void
{
rsmresource_t *p = NULL;
for (; p; p = p->rsmrc_next) {
/*
* In order to make rsmseg_unload and rsm_force_unload
* thread safe, acquire the segment lock here.
* rsmseg_unload is responsible for releasing the lock.
* rsmseg_unload releases the lock just before a call
* to rsmipc_send or in case of an early exit which
* occurs if the segment was in the state
* RSM_STATE_CONNECTING or RSM_STATE_NEW.
*/
if (disconnect_flag)
}
}
}
static void
{
/*
* Find slot for cookie in reply.
* Match sequence with sequence in cookie
* If no match; return
	 * Try to grab the lock of the slot, if locked return
* copy data into reply slot area
* signal waiter
*/
return;
}
/* found a match */
}
} else {
"rsm: rsm_intr_reply mismatched reply %d\n",
}
}
/*
* This function gets dispatched on the worker thread when we receive
* the SQREADY message. This function sends the SQREADY_ACK message.
*/
static void
rsm_sqready_ack_deferred(void *arg)
{
"rsm_sqready_ack_deferred enter\n"));
/*
* If path is not active no point in sending the ACK
* because the whole SQREADY protocol will again start
* when the path becomes active.
*/
/*
* decrement the path refcnt incremented in rsm_proc_sqready
*/
"rsm_sqready_ack_deferred done:!ACTIVE\n"));
return;
}
/* send an SQREADY_ACK message */
/* initialize credits to the max level */
/* wake up any send that is waiting for credits */
/*
* decrement the path refcnt since we incremented it in
* rsm_proc_sqready
*/
"rsm_sqready_ack_deferred done\n"));
}
/*
* Process the SQREADY message
*/
static void
{
/* look up the path - incr the path refcnt */
/*
* No path exists or path is not active - drop the message
*/
"rsm_proc_sqready done: msg dropped no path\n"));
return;
}
/* drain any tasks from the previous incarnation */
/*
	 * If we'd sent an SQREADY message and were waiting for an
	 * SQREADY_ACK, but in the meanwhile received an SQREADY message,
	 * blindly reset the WAIT_FOR_SQACK flag because we'll just send
	 * SQREADY_ACK and forget about the SQREADY that we sent.
*/
/* decr refcnt and drop the mutex */
"rsm_proc_sqready done: msg dropped path !ACTIVE\n"));
return;
}
/*
* The sender's local incarnation number is our remote incarnation
	 * number; save it in the path data structure
*/
path->procmsg_cnt = 0;
/*
* path is active - dispatch task to send SQREADY_ACK - remember
* RSMPI calls can't be done in interrupt context
*
* We can use the recv_taskq to send because the remote endpoint
* cannot start sending messages till it receives SQREADY_ACK hence
* at this point there are no tasks on recv_taskq.
*
* The path refcnt will be decremented in rsm_sqready_ack_deferred.
*/
}
/*
* Process the SQREADY_ACK message
*/
static void
{
"rsm_proc_sqready_ack enter\n"));
/* look up the path - incr the path refcnt */
/*
* drop the message if - no path exists or path is not active
* or if its not waiting for SQREADY_ACK message
*/
"rsm_proc_sqready_ack done: msg dropped no path\n"));
return;
}
/* decrement the refcnt */
"rsm_proc_sqready_ack done: msg dropped\n"));
return;
}
/*
* Check if this message is in response to the last RSMIPC_MSG_SQREADY
* sent, if not drop it.
*/
/* decrement the refcnt */
"rsm_proc_sqready_ack done: msg old incn %lld\n",
msghdr->rsmipc_incn));
return;
}
/*
* clear the WAIT_FOR_SQACK flag since we have recvd the ack
*/
/* save the remote sendq incn number */
/* initialize credits to the max level */
/* wake up any send that is waiting for credits */
/* decrement the refcnt */
"rsm_proc_sqready_ack done\n"));
}
/*
* process the RSMIPC_MSG_CREDIT message
*/
static void
{
/* look up the path - incr the path refcnt */
"rsm_add_credits enter: path not found\n"));
return;
}
/* the path is not active - discard credits */
"rsm_add_credits enter:path=%lx !ACTIVE\n", path));
return;
}
/*
* Check if these credits are for current incarnation of the path.
*/
/* decrement the refcnt */
"rsm_add_credits enter: old incn %lld\n",
msghdr->rsmipc_incn));
return;
}
"rsm_add_credits:path=%lx new-creds=%d "
src_hwaddr));
/* add credits to the path's sendq */
/* wake up any send that is waiting for credits */
/* decrement the refcnt */
}
static void
{
rsmresource_t *p;
/* This is for an import segment */
for (; p; p = p->rsmrc_next) {
(p->rsmrc_node == src_node)) {
}
}
} else {
/* This is for an export segment */
if (!seg) {
"rsm_intr_event done: exp seg not found\n"));
return;
}
/*
* We must hold the segment lock here, or else the segment
* can be freed while pollwakeup is using it. This implies
* that we MUST NOT grab the segment lock during rsm_chpoll,
* as outlined in the chpoll(2) man page.
*/
}
}
/*
* The exporter did a republish and changed the ACL - this change is only
* visible to new importers.
*/
static void
{
rsmresource_t *p;
for (; p; p = p->rsmrc_next) {
/*
* find the importer and update the permission in the shared
* data structure. Any new importers will use the new perms
*/
break;
}
}
}
void
{
"rsm_suspend_complete enter\n"));
"rsm_suspend_complete done: suspend_list is empty\n"));
return;
}
/* clear the pending flag for the node */
}
done = 0; /* still some nodes have not yet ACKED */
}
if (!done) {
"rsm_suspend_complete done: acks pending\n"));
return;
}
/*
* Now that we are done with suspending all the remote importers
* time to quiesce the local exporters
*/
"rsm_suspend_complete done\n"));
}
static void
{
int i, e;
/*
	 * The importers send a SUSPEND_COMPLETE to the exporter node.
* Unpublish, unbind the export segment and
* move the segments to the EXPORT_QUIESCED state
*/
for (i = 0; i < rsm_hash_size; i++) {
if (current->rsmrc_state ==
/*
* some local memory handles are not published
* check if it was published
*/
"exporter_quiesce:unpub %d\n", e));
"exporter_quiesce:destroy %d\n",
e));
}
(void) rsm_unbind_pages(seg);
}
}
}
/*
	 * We are done with the pre-del processing of all the local segments
* - time to move to PREDEL_COMPLETED.
*/
}
static void
{
int i;
int susp_flg; /* true means already suspended */
int num_importers;
for (i = 0; i < rsm_hash_size; i++) {
/*
* Suspend all importers with same <node, key> pair.
* After the last one of the shared importers has been
* suspended - suspend the shared mappings/connection.
*/
for (; p; p = p->rsmrc_next) {
continue; /* go to next entry */
/*
* search the rest of the bucket for
			 * other siblings (importers with the same key)
* of "first" and suspend them.
* All importers with same key fall in
* the same bucket.
*/
num_importers = 0;
/*
* either not a peer segment or its a
* disconnected segment - skip it
*/
continue;
}
if (susp_flg) { /* seg already suspended */
break; /* the inner for loop */
}
/*
* we've processed all importers that are
* siblings of "first"
*/
if (num_importers ==
break;
}
}
/*
* All the importers with the same key and
* nodeid as "first" have been suspended.
* This is done only once.
*/
if (!susp_flg) {
}
}
}
/* send an ACK for SUSPEND message */
}
static void
{
int recheck_state;
*susp_flg = 0;
do {
recheck_state = 0;
"rsmseg_suspend:segment %x state=%d\n",
case RSM_STATE_NEW:
/* not a valid state */
break;
case RSM_STATE_CONNECTING:
break;
case RSM_STATE_ABORT_CONNECT:
break;
case RSM_STATE_CONNECT:
break;
case RSM_STATE_MAPPING:
/* wait until segment leaves the mapping state */
recheck_state = 1;
break;
case RSM_STATE_ACTIVE:
/* unload the mappings */
}
}
break;
case RSM_STATE_CONN_QUIESCE:
/* FALLTHRU */
case RSM_STATE_MAP_QUIESCE:
/* rsmseg_suspend already done for seg */
*susp_flg = 1;
break;
case RSM_STATE_DISCONNECT:
break;
default:
ASSERT(0); /* invalid state */
}
} while (recheck_state);
}
static void
{
int e;
"rsmsegshare_suspend enter\n"));
switch (sharedp->rsmsi_state) {
case RSMSI_STATE_NEW:
break;
case RSMSI_STATE_CONNECTING:
break;
break;
case RSMSI_STATE_CONNECTED:
/* do the rsmpi disconnect */
"rsm:rsmpi disconnect seg=%x:err=%d\n",
sharedp->rsmsi_segid, e));
}
break;
case RSMSI_STATE_CONN_QUIESCE:
break;
case RSMSI_STATE_MAPPED:
/* do the rsmpi unmap and disconnect */
"rsmshare_suspend: rsmpi unmap %d\n", e));
"rsm:rsmpi disconnect seg=%x:err=%d\n",
sharedp->rsmsi_segid, e));
}
break;
case RSMSI_STATE_MAP_QUIESCE:
break;
case RSMSI_STATE_DISCONNECTED:
break;
default:
ASSERT(0); /* invalid state */
}
"rsmsegshare_suspend done\n"));
}
/*
* This should get called on receiving a RESUME message or from
 * the pathmanager if the node undergoing DR dies.
*/
static void
{
int i;
rsmresource_t *p = NULL;
void *cookie;
for (i = 0; i < rsm_hash_size; i++) {
for (; p; p = p->rsmrc_next) {
/* process only importers of node undergoing DR */
continue;
}
/*
* inform the exporter so that it can
* remove the importer.
*/
} else {
}
}
}
}
static int
{
int e;
int retc;
rsm_mapinfo_t *p;
return (RSM_SUCCESS);
}
/* shared state can either be connected or mapped */
} else { /* error in rsmpi connect during resume */
sharedp->rsmsi_refcnt--;
if (sharedp->rsmsi_refcnt == 0) {
/* clean up the shared data structure */
sizeof (rsm_import_share_t));
} else {
}
/*
* The following needs to be done after any
* rsmsharelock calls which use seg->s_share.
*/
}
/* signal any waiting segment */
return (retc);
}
/* Setup protections for remap */
}
maxprot |= PROT_WRITE;
}
/* error in rsmpi connect or map during resume */
/* remap to trash page */
"rsmseg_resume:remap=%d\n", e));
}
sharedp->rsmsi_refcnt--;
sharedp->rsmsi_mapcnt--;
if (sharedp->rsmsi_refcnt == 0) {
/* clean up the shared data structure */
sizeof (rsm_import_share_t));
} else {
}
/*
* The following needs to be done after any
* rsmsharelock calls which use seg->s_share.
*/
/* signal any waiting segment */
"rsmseg_resume done:seg=%x,err=%d\n",
return (retc);
}
"rsmseg_resume:remap=%d\n", e));
}
} else { /* remote exporter */
/* remap to the new rsmpi maps */
&dev_offset, &maplen);
"rsmseg_resume:remap=%d\n", e));
}
}
return (retc);
}
static int
{
int e = RSM_SUCCESS;
/*
	 * If we are not in a xxxx_QUIESCE state, that means the shared
	 * connect/mapping processing has already been done, so return success.
*/
return (RSM_SUCCESS);
}
"rsmsegshare_resume:rsmpi connect seg=%x:err=%d\n",
sharedp->rsmsi_segid, e));
if (e != RSM_SUCCESS) {
/* when do we send the NOT_IMPORTING message */
/* signal any waiting segment */
return (e);
}
}
/* signal any waiting segment */
return (e);
}
/* do the rsmpi map of the whole segment here */
rsm_mapinfo_t *p;
/*
* We need to do rsmpi maps with <off, lens> identical to
* the old mapinfo list because the segment mapping handles
* dhp and such need the fragmentation of rsmpi maps to be
* identical to what it was during the mmap of the segment
*/
p = sharedp->rsmsi_mapinfo;
while (p != NULL) {
mapped_len = 0;
p->individual_len, &mapped_len,
if (e != 0) {
"rsmsegshare_resume: rsmpi map err=%d\n",
e));
break;
}
if (mapped_len != p->individual_len) {
"rsmsegshare_resume: rsmpi maplen"
"< reqlen=%lx\n", mapped_len));
e = RSMERR_BAD_LENGTH;
break;
}
p = p->next;
}
if (e != RSM_SUCCESS) { /* rsmpi map failed */
int err;
/* Check if this is the first rsm_map */
if (p != sharedp->rsmsi_mapinfo) {
/*
* A single rsm_unmap undoes multiple rsm_maps.
*/
}
"rsmsegshare_resume:disconn seg=%x:err=%d\n",
/* signal the waiting segments */
"rsmsegshare_resume done: rsmpi map err\n"));
return (e);
}
}
/* signal any waiting segment */
return (e);
}
/*
 * This is the routine that gets dispatched on recv_taskq, the worker
 * thread that processes messages that are flow-controlled.
*/
static void
rsm_intr_proc_deferred(void *arg)
{
int e;
"rsm_intr_proc_deferred enter\n"));
/* use the head of the msgbuf_queue */
/*
* messages that need to send a reply should check the message version
* before processing the message. And all messages that need to
* send a reply should be processed here by the worker thread.
*/
switch (msghdr->rsmipc_type) {
case RSMIPC_MSG_SEGCONNECT:
} else {
}
break;
case RSMIPC_MSG_DISCONNECT:
break;
case RSMIPC_MSG_SUSPEND:
break;
case RSMIPC_MSG_SUSPEND_DONE:
break;
case RSMIPC_MSG_RESUME:
break;
default:
ASSERT(0);
}
/* increment procmsg_cnt; it can be at most RSMIPC_MAX_MESSAGES */
path->procmsg_cnt++;
/* No need to send credits if path is going down */
/*
* send credits and reset procmsg_cnt if success otherwise
* credits will be sent after processing the next message
*/
if (e == 0)
path->procmsg_cnt = 0;
else
"rsm_intr_proc_deferred:send credits err=%d\n", e));
}
/*
* decrement the path refcnt since we incremented it in
* rsm_intr_callback_dispatch
*/
"rsm_intr_proc_deferred done\n"));
}
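/*
 * Illustrative sketch (not part of the driver): a minimal version of the
 * credit scheme described above. The receiver counts processed messages
 * in procmsg_cnt and periodically returns credits to the sender, which
 * may send at most the credited number of flow-controlled messages.
 * All ex_* names are hypothetical; the block is compiled out.
 */
#if 0
#define	EX_MAX_MESSAGES	64		/* receive window, in messages */

typedef struct ex_path {
	int	procmsg_cnt;		/* processed but not yet credited */
} ex_path_t;

extern int ex_send_credits(ex_path_t *, int);

static void
ex_proc_one_msg(ex_path_t *p)
{
	/* ... process one dequeued message here ... */

	p->procmsg_cnt++;		/* at most EX_MAX_MESSAGES */

	/* return credits in a batch; on failure retry after next message */
	if (p->procmsg_cnt >= EX_MAX_MESSAGES / 2 &&
	    ex_send_credits(p, p->procmsg_cnt) == 0)
		p->procmsg_cnt = 0;
}
#endif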
/*
* Flow-controlled messages are enqueued and dispatched onto a taskq here
*/
static void
{
"rsm_intr_callback_dispatch enter\n"));
/* look up the path - incr the path refcnt */
/* the path has been removed - drop this message */
"rsm_intr_callback_dispatch done: msg dropped\n"));
return;
}
/* the path is not active - don't accept new messages */
"rsm_intr_callback_dispatch done: msg dropped"
" path=%lx !ACTIVE\n", path));
return;
}
/*
* Check if this message was sent to an older incarnation
*/
/* decrement the refcnt */
"rsm_intr_callback_dispatch done: old incn %lld\n",
msghdr->rsmipc_incn));
return;
}
/* copy and enqueue msg on the path's msgbuf queue */
/*
* schedule task to process messages - ignore retval from
* taskq_dispatch because the sender cannot send more than
* what the receiver can handle.
*/
"rsm_intr_callback_dispatch done\n"));
}
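/*
 * Illustrative sketch (not part of the driver): the incarnation check
 * performed above. A path gets a fresh incarnation number each time it
 * is (re)established; a message stamped with an older incarnation was
 * addressed to a previous instance of the path and must be dropped.
 * Names are hypothetical; the block is compiled out.
 */
#if 0
typedef struct ex_msghdr {
	int64_t	msg_incn;		/* incarnation the sender targeted */
} ex_msghdr_t;

typedef struct ex_path {
	int64_t	local_incn;		/* incarnation of this path instance */
} ex_path_t;

static int
ex_msg_is_stale(const ex_path_t *path, const ex_msghdr_t *hdr)
{
	return (hdr->msg_incn != path->local_incn);
}
#endif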
/*
* This procedure is called from rsm_srv_func when a remote node creates
* a send queue. This event is used as a hint that an earlier failed
* attempt to create a send queue to that remote node may now succeed and
* should be retried. Indication of an earlier failed attempt is provided
* by the RSMKA_SQCREATE_PENDING flag.
*/
static void
{
"rsm_sqcreateop_callback enter\n"));
/* look up the path - incr the path refcnt */
"rsm_sqcreateop_callback done: no path\n"));
return;
}
/*
* previous attempt to create sendq had failed, retry
* it and move to RSMKA_PATH_ACTIVE state if successful.
* the refcnt will be decremented in the do_deferred_work
*/
} else {
/* decrement the refcnt */
}
"rsm_sqcreateop_callback done\n"));
}
static void
{
msghdr->rsmipc_type));
/*
* Check for the version number in the msg header. If it is not
* RSM_VERSION, drop the message. In the future, we need to manage
* incompatible version numbers in some way
*/
/*
* Drop requests that don't have a reply right here.
* Requests with replies will send a BAD_VERSION reply
* when they get processed by the worker thread.
*/
return;
}
}
switch (msghdr->rsmipc_type) {
case RSMIPC_MSG_SEGCONNECT:
case RSMIPC_MSG_DISCONNECT:
case RSMIPC_MSG_SUSPEND:
case RSMIPC_MSG_SUSPEND_DONE:
case RSMIPC_MSG_RESUME:
/*
* These message types are handled by a worker thread using
* the flow-control algorithm.
* Any message processing that does one or more of the
* following should be handled in a worker thread.
* - allocates resources and might sleep
* - makes RSMPI calls down to the interconnect driver
* by definition this includes requests with replies.
* - takes a long duration of time
*/
break;
case RSMIPC_MSG_NOTIMPORTING:
break;
case RSMIPC_MSG_SQREADY:
break;
case RSMIPC_MSG_SQREADY_ACK:
break;
case RSMIPC_MSG_CREDIT:
break;
case RSMIPC_MSG_REPLY:
break;
case RSMIPC_MSG_BELL:
break;
case RSMIPC_MSG_IMPORTING:
break;
case RSMIPC_MSG_REPUBLISH:
break;
default:
"rsm_intr_callback: bad msg %lx type %d data %lx\n",
}
}
{
switch (opcode) {
case RSM_INTR_Q_OP_CREATE:
break;
case RSM_INTR_Q_OP_DESTROY:
break;
case RSM_INTR_Q_OP_RECEIVE:
break;
default:
"rsm_srv_func: unknown opcode = %x\n", opcode));
}
return (RSM_INTR_HAND_CLAIMED);
}
/* *************************** IPC slots ************************* */
static rsmipc_slot_t *
{
int i;
/* try to find a free slot, if not wait */
}
/* An empty slot is available, find it */
break;
}
}
return (slot);
}
static void
{
}
}
static int
{
int e = 0;
int credit_check = 0;
int retry_cnt = 0;
int min_retry_cnt = 10;
dest));
/*
* Check if this is a local case
*/
case RSMIPC_MSG_SEGCONNECT:
break;
case RSMIPC_MSG_BELL:
break;
case RSMIPC_MSG_IMPORTING:
break;
case RSMIPC_MSG_NOTIMPORTING:
break;
case RSMIPC_MSG_REPUBLISH:
req->rsmipc_perm);
break;
case RSMIPC_MSG_SUSPEND:
break;
case RSMIPC_MSG_SUSPEND_DONE:
rsm_suspend_complete(dest, 0);
break;
case RSMIPC_MSG_RESUME:
break;
default:
ASSERT(0);
}
"rsmipc_send done\n"));
return (0);
}
"rsm: rsmipc_send bad node number %x\n", dest));
return (RSMERR_REMOTE_NODE_UNREACHABLE);
}
/*
* Oh boy! we are going remote.
*/
/*
* identify if we need to have credits to send this message
* - only selected requests are flow controlled
*/
"rsmipc_send:request type=%d\n",
case RSMIPC_MSG_SEGCONNECT:
case RSMIPC_MSG_DISCONNECT:
case RSMIPC_MSG_IMPORTING:
case RSMIPC_MSG_SUSPEND:
case RSMIPC_MSG_SUSPEND_DONE:
case RSMIPC_MSG_RESUME:
credit_check = 1;
break;
default:
credit_check = 0;
}
}
if (retry_cnt++ == min_retry_cnt) {
/* back off for 10ms before further retries */
retry_cnt = 0; /* reset retry_cnt */
}
if (sendq_token == NULL) {
"rsm: rsmipc_send no device to reach node %d\n", dest));
return (RSMERR_REMOTE_NODE_UNREACHABLE);
}
if ((sendq_token == used_sendq_token) &&
((e == RSMERR_CONN_ABORTED) || (e == RSMERR_TIMEOUT) ||
(e == RSMERR_COMM_ERR_MAYBE_DELIVERED))) {
return (RSMERR_CONN_ABORTED);
} else
/* lint -save -e413 */
/* lint -restore */
/* Send request without ack */
/*
* Set the rsmipc_version number in the msghdr for KA
* communication versioning
*/
/*
* remote endpoint's incn should match the value in our
* path's remote_incn field. No need to grab any lock
* since we have refcnted the path in rsmka_get_sendq_token
*/
if (credit_check) {
/*
* wait till we recv credits or path goes down. If path
* goes down rsm_send will fail and we handle the error
* then
*/
while ((sendq_token->msgbuf_avail == 0) &&
if (e == 0) {
no_reply_cnt++;
"rsmipc_send done: "
"cv_wait INTERRUPTED"));
return (RSMERR_INTERRUPTED);
}
}
/*
* path is not active, retry on another path.
*/
e = RSMERR_CONN_ABORTED;
"rsm: rsmipc_send: path !ACTIVE"));
goto again;
}
/*
* reserve a msgbuf
*/
NULL);
if (e != RSM_SUCCESS) {
/*
* release the reserved msgbuf since
* the send failed
*/
}
} else
NULL);
no_reply_cnt++;
if (e != RSM_SUCCESS) {
"rsm: rsmipc_send no reply send"
" err = %d no reply count = %d\n",
e, no_reply_cnt));
ASSERT(e != RSMERR_QUEUE_FENCE_UP &&
e != RSMERR_BAD_BARRIER_HNDL);
goto again;
} else {
"rsmipc_send done\n"));
return (e);
}
}
/* Send reply - No flow control is done for reply */
/*
* Set the version in the msg header for KA communication
* versioning
*/
/* incn number is not used for reply msgs currently */
if (e != RSM_SUCCESS) {
"rsm: rsmipc_send reply send"
" err = %d\n", e));
goto again;
} else {
"rsmipc_send done\n"));
return (e);
}
}
/* Reply needed */
/*
* Set the rsmipc_version number in the msghdr for KA
* communication versioning
*/
/*
* remote endpoint's incn should match the value in our
* path's remote_incn field. No need to grab any lock
* since we have refcnted the path in rsmka_get_sendq_token
*/
if (credit_check) {
/*
* wait till we recv credits or path goes down. If path
* goes down rsm_send will fail and we handle the error
* then.
*/
while ((sendq_token->msgbuf_avail == 0) &&
if (e == 0) {
"rsmipc_send done: "
"cv_wait INTERRUPTED"));
return (RSMERR_INTERRUPTED);
}
}
/*
* path is not active, retry on another path.
*/
e = RSMERR_CONN_ABORTED;
"rsm: rsmipc_send: path !ACTIVE"));
goto again;
}
/*
* reserve a msgbuf
*/
NULL);
if (e != RSM_SUCCESS) {
/*
* release the reserved msgbuf since
* the send failed
*/
}
} else
NULL);
if (e != RSM_SUCCESS) {
"rsm: rsmipc_send rsmpi send err = %d\n", e));
goto again;
}
/* wait for a reply signal, a SIGINT, or 5 sec. timeout */
ticks);
if (e < 0) {
/* timed out - retry */
e = RSMERR_TIMEOUT;
} else if (e == 0) {
/* signalled - return error */
e = RSMERR_INTERRUPTED;
break;
} else {
e = RSM_SUCCESS;
}
}
return (e);
}
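/*
 * Illustrative sketch (not part of the driver): the retry policy used in
 * the send path above - retry immediately up to min_retry_cnt times, then
 * pause ~10ms before continuing so a transient path outage does not turn
 * into a hot spin. ex_try_send() and ex_pause_usec() are hypothetical;
 * the block is compiled out.
 */
#if 0
#define	EX_FATAL	(-1)		/* unrecoverable, give up */

extern int ex_try_send(void *);
extern void ex_pause_usec(int);

static int
ex_send_with_backoff(void *msg)
{
	int	retry_cnt = 0;
	int	min_retry_cnt = 10;
	int	e;

	for (;;) {
		if (retry_cnt++ == min_retry_cnt) {
			ex_pause_usec(10000);	/* back off for ~10ms */
			retry_cnt = 0;		/* then keep trying */
		}
		e = ex_try_send(msg);
		if (e == 0 || e == EX_FATAL)
			return (e);
		/* transient error: loop and retry, possibly on a new path */
	}
}
#endif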
static int
{
/*
* inform the exporter to delete this importer
*/
}
static void
{
int i;
int index;
/*
* send the new access mode to all the nodes that have imported
* this segment.
* If the new acl does not have a node that was present in
* the old acl, an access permission of 0 is sent.
*/
/*
* create a list of node/permissions to send the republish message
*/
for (i = 0; i < acl_len; i++) {
break;
}
}
republish_list = rp;
}
}
while (republish_list != NULL) {
&request, RSM_NO_REPLY);
rp = republish_list;
}
}
static void
{
int i, e;
"rsm_send_suspend enter\n"));
/*
* create a list of node to send the suspend message
*
* Currently the whole importer list is scanned and we obtain
* all the nodes - this basically gets all nodes that at least
* import one segment from the local node.
*
* no need to grab the rsm_suspend_list lock here since we are
* single threaded when suspend is called.
*/
for (i = 0; i < rsm_hash_size; i++) {
/*
* make sure that the token's node
* is not already on the suspend list
*/
break;
}
}
KM_SLEEP);
}
}
}
return;
}
/*
* update the suspend list right away so that if a node dies the
* pathmanager can set the NODE dead flag
*/
/*
* Error in rsmipc_send currently happens due to inaccessibility
* of the remote node.
*/
if (e == RSM_SUCCESS) { /* send succeeded - wait for the ack */
}
}
"rsm_send_suspend done\n"));
}
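/*
 * Illustrative sketch (not part of the driver): building the unique list
 * of importer nodes before broadcasting SUSPEND, as the hash scan above
 * does. A node already on the list is not added twice. ex_node_t is
 * hypothetical; the block is compiled out.
 */
#if 0
typedef struct ex_node {
	struct ex_node	*next;
	int		nodeid;
} ex_node_t;

static ex_node_t *
ex_add_unique(ex_node_t *list, int nodeid)
{
	ex_node_t	*p;

	for (p = list; p != NULL; p = p->next) {
		if (p->nodeid == nodeid)
			return (list);		/* already listed */
	}
	p = kmem_zalloc(sizeof (ex_node_t), KM_SLEEP);
	p->nodeid = nodeid;
	p->next = list;
	return (p);
}
#endif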
static void
{
/*
* save the suspend list so that we know where to send
* the resume messages and make the suspend list head
* NULL.
*/
}
}
/*
* This function takes a path and sends a message using the sendq
* corresponding to it. The RSMIPC_MSG_SQREADY, RSMIPC_MSG_SQREADY_ACK
* and RSMIPC_MSG_CREDIT are sent using this function.
*/
int
{
int e;
int retry_cnt = 0;
int min_retry_cnt = 10;
"rsmipc_send_controlmsg enter\n"));
"rsmipc_send_controlmsg done: ! RSMKA_PATH_ACTIVE"));
return (1);
}
if (msgtype == RSMIPC_MSG_CREDIT)
/* incr the sendq, path refcnt */
do {
/* drop the path lock before doing the rsm_send */
ASSERT(e != RSMERR_QUEUE_FENCE_UP &&
e != RSMERR_BAD_BARRIER_HNDL);
if (e == RSM_SUCCESS) {
break;
}
/* error counter for statistics */
"rsmipc_send_controlmsg:rsm_send error=%d", e));
retry_cnt = 0;
}
/* decrement the sendq,path refcnt that we incr before rsm_send */
"rsmipc_send_controlmsg done=%d", e));
return (e);
}
/*
* Called from rsm_force_unload and path_importer_disconnect. The memory
* mapping for the imported segment is removed and the segment is
* disconnected at the interconnect layer if disconnect_flag is TRUE.
* rsm_force_unload will get disconnect_flag TRUE from rsm_intr_callback
* and FALSE from rsm_rebind.
*
* When subsequent accesses cause page faulting, the dummy page is mapped
* to resolve the fault, and the mapping generation number is incremented
* so that the application can be notified on a close barrier operation.
*
* It is important to note that the caller of rsmseg_unload is responsible for
* acquiring the segment lock before making a call to rsmseg_unload. This is
* required to make the caller and rsmseg_unload thread safe. The segment lock
* will be released by the rsmseg_unload function.
*/
void
{
void *shared_cookie;
/* wait until segment leaves the mapping state */
/*
* An unload is only necessary if the segment is connected. However,
* if the segment was on the import list in state RSM_STATE_CONNECTING
* then a connection was in progress. Change to RSM_STATE_NEW
* here to cause an early exit from the connection process.
*/
"rsmseg_unload done: RSM_STATE_NEW\n"));
return;
"rsmseg_unload done: RSM_STATE_CONNECTING\n"));
return;
}
int e;
/* Setup protections for remap */
}
maxprot |= PROT_WRITE;
}
"remap returns %d\n", e));
}
}
if (shared_cookie != NULL) {
/*
* inform the exporting node so this import
* can be deleted from the list of importers.
*/
} else {
}
}
else
}
/* ****************************** Importer Calls ************************ */
static int
{
int shifts = 0;
shifts += 3;
shifts += 3;
}
if (mode == 0)
return (0);
}
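/*
 * Illustrative sketch (not part of the driver): the owner/group/other
 * permission check that the routine above implements with its "shifts"
 * variable. Depending on whether the caller is the owner, a group
 * member, or neither, the segment's permission bits are shifted into
 * the matching position and every requested mode bit must be covered.
 * Names are hypothetical; the block is compiled out.
 */
#if 0
static int
ex_access(uid_t owner, int in_group, int perm, int mode, uid_t uid)
{
	int	shifts = 0;

	if (uid != owner) {
		shifts += 3;		/* not owner: use group bits */
		if (!in_group)
			shifts += 3;	/* not in group: use other bits */
	}

	mode &= ~(perm << shifts);	/* clear the bits perm grants us */

	if (mode == 0)
		return (0);		/* every requested bit granted */

	return (EACCES);		/* some requested bit denied */
}
#endif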
static int
{
int e;
int recheck_state = 0;
void *shared_cookie;
rsm_addr_t addr = 0;
"rsm_connect done:ENODEV adapter=NULL\n"));
return (RSMERR_CTLR_NOT_PRESENT);
}
"rsm_connect done:ENODEV loopback\n"));
return (RSMERR_CTLR_NOT_PRESENT);
}
/*
* Translate perm to access
*/
"rsm_connect done:EINVAL invalid perms\n"));
return (RSMERR_BAD_PERMS);
}
access = 0;
/*
* Adding to the import list locks the segment; release the segment
* lock so we can get the reply for the send.
*/
if (e) {
"rsm_connect done:rsmimport_add failed %d\n", e));
return (e);
}
/*
* Set the s_adapter field here so as to have a valid comparison of
* the adapter and the s_adapter value during rsmshare_get. For
* any error, set s_adapter to NULL before doing a release_adapter
*/
/*
* get the pointer to the shared data structure; the
* shared data is locked and refcount has been incremented
*/
do {
/* flag indicates whether we need to recheck the state */
recheck_state = 0;
switch (sharedp->rsmsi_state) {
case RSMSI_STATE_NEW:
break;
case RSMSI_STATE_CONNECTING:
/* FALLTHRU */
case RSMSI_STATE_CONN_QUIESCE:
/* FALLTHRU */
case RSMSI_STATE_MAP_QUIESCE:
/* wait for the state to change */
while ((sharedp->rsmsi_state ==
(sharedp->rsmsi_state ==
(sharedp->rsmsi_state ==
&sharedp->rsmsi_lock) == 0) {
/* signalled - clean up and return */
"rsm_connect done: INTERRUPTED\n"));
return (RSMERR_INTERRUPTED);
}
}
/*
* the state changed, loop back and check what it is
*/
recheck_state = 1;
break;
/* exit the loop and clean up further down */
break;
case RSMSI_STATE_CONNECTED:
/* already connected, good - fall through */
case RSMSI_STATE_MAPPED:
/* already mapped, wow - fall through */
/* access validation etc is done further down */
break;
case RSMSI_STATE_DISCONNECTED:
/* disconnected - so reconnect now */
break;
default:
ASSERT(0); /* Invalid State */
}
} while (recheck_state);
/* we are the first to connect */
"rsm_connect done: hwaddr<0\n"));
return (RSMERR_INTERNAL_ERROR);
}
} else {
}
/*
* send request to node [src, dest, key, msgid] and get back
* [status, msgid, cookie]
*/
/*
* we need the s_mode of the exporter so pass
* RSM_ACCESS_TRUSTED
*/
if (e) {
"rsm_connect done:rsmipc_send failed %d\n", e));
return (e);
}
"rsm_connect done:rsmipc_send reply err %d\n",
return (reply.rsmipc_status);
}
/* store the information recvd into the shared data struct */
}
/*
* Get the segment lock and check for a force disconnect
* from the export side which would have changed the state
* back to RSM_STATE_NEW. Once the segment lock is acquired a
* force disconnect will be held off until the connection
* has completed.
*/
/*
* set a flag indicating abort handling has been
* done
*/
/* send a message to exporter - only once */
/*
* wake up any waiting importers and inform that
* connection has been aborted
*/
}
"rsm_connect done: RSM_STATE_ABORT_CONNECT\n"));
return (RSMERR_INTERRUPTED);
}
/*
* We need to verify that this process has access
*/
if (e) {
/*
* No need to lock the segment; it has been removed
* from the hash table
*/
/* this is the first importer */
}
"rsm_connect done: ipcaccess failed\n"));
return (RSMERR_PERM_DENIED);
}
/* update state and cookie */
if (e != RSM_SUCCESS) {
/*
* inform the exporter to delete this importer
*/
/*
* Now inform any waiting importers to
* retry connect. This needs to be done
* after sending notimporting so that
* the notimporting is sent before a waiting
* importer sends a segconnect while retrying
*
* No need to lock the segment; it has been removed
* from the hash table
*/
"rsm_connect error %d\n", e));
if (e == RSMERR_SEG_NOT_PUBLISHED_TO_RSM_ADDR)
return (
else if ((e == RSMERR_RSM_ADDR_UNREACHABLE) ||
(e == RSMERR_UNKNOWN_RSM_ADDR))
return (RSMERR_REMOTE_NODE_UNREACHABLE);
else
return (e);
}
}
}
if (bar_va) {
/* increment generation number on barrier page */
/* return user off into barrier page where status will be */
} else {
}
/* Return back to user the segment size & perm in case it's needed */
#ifdef _MULTI_DATAMODEL
else
"rsm_connect done\n"));
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
#endif
mode))
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
static int
{
int err;
/* assert seg is locked */
/* segment unmap has already been done */
return (RSM_SUCCESS);
}
/*
* - shared data struct is in MAPPED or MAP_QUIESCE state
*/
/*
* Unmap pages - previously rsm_memseg_import_unmap was called only if
* the segment cookie list was NULL; but it is always NULL when
* called from rsmmap_unmap and won't be NULL when called for
* a force disconnect - so the check for NULL cookie list was removed
*/
sharedp->rsmsi_mapcnt--;
if (sharedp->rsmsi_mapcnt == 0) {
/* unmap the shared RSMPI mapping */
"rsm_unmap: rsmpi unmap %d\n", err));
}
} else { /* MAP_QUIESCE --munmap()--> CONN_QUIESCE */
}
}
/*
* The s_cookie field is used to store the cookie returned from the
* ddi_umem_lock when binding the pages for an export segment. This
* is the primary use of the s_cookie field and does not normally
* pertain to any importing segment except in the loopback case.
* For the loopback case, the import segment and export segment are
* on the same node, the s_cookie field of the segment structure for
* the importer is initialized to the s_cookie field in the exported
* segment during the map operation and is used during the call to
* devmap_umem_setup for the import mapping.
* Thus, during unmap, we simply need to set s_cookie to NULL to
* indicate that the mapping no longer exists.
*/
else
return (RSM_SUCCESS);
}
/*
* A non-null cookie returned here indicates that this is
* the last importer; it can be used in the RSMIPC_NOT_IMPORTING
* message.
*/
static int
{
int e;
"rsm_closeconnection enter\n"));
/* assert seg is locked */
"rsm_closeconnection done: already disconnected\n"));
return (RSM_SUCCESS);
}
}
/*
* Disconnect on adapter
*
* The current algorithm is stateless, I don't have to contact
* server when I go away. He only gives me permissions. Of course,
* the adapters will talk to terminate the connect.
*
* disconnect is needed only if we are CONNECTED not in CONN_QUIESCE
*/
/* this is the last importer */
if (e != RSM_SUCCESS) {
"rsm:disconnect failed seg=%x:err=%d\n",
}
}
}
sharedp->rsmsi_refcnt--;
if (sharedp->rsmsi_refcnt == 0) {
/* clean up the shared data structure */
} else {
}
/* increment generation number on barrier page */
if (bar_va) {
}
/*
* The following needs to be done after any
* rsmsharelock calls which use seg->s_share.
*/
/* signal anyone waiting in the CONN_QUIESCE state */
"rsm_closeconnection done\n"));
return (RSM_SUCCESS);
}
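/*
 * Illustrative sketch (not part of the driver): the reference-count
 * discipline on the shared import structure used above. Several import
 * segments on one node can share a single connection to an export
 * segment; only the last reference tears the shared structure down,
 * otherwise waiters are woken to re-examine the state. Names are
 * hypothetical; the block is compiled out.
 */
#if 0
typedef struct ex_share {
	int	refcnt;			/* importers using this connection */
	/* ... connection state, lock, condition variable ... */
} ex_share_t;

extern void ex_share_destroy(ex_share_t *);
extern void ex_share_broadcast(ex_share_t *);

static void
ex_share_rele(ex_share_t *sharedp)
{
	sharedp->refcnt--;
	if (sharedp->refcnt == 0)
		ex_share_destroy(sharedp);	/* last importer */
	else
		ex_share_broadcast(sharedp);	/* wake state waiters */
}
#endif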
int
{
void *shared_cookie;
/* assert seg isn't locked */
/* Remove segment from imported list */
/* acquire the segment */
/* wait until segment leaves the mapping state */
"rsm_disconnect done: already disconnected\n"));
return (RSM_SUCCESS);
}
/* update state */
if (shared_cookie != NULL) {
/*
* This is the last importer so inform the exporting node
* so this import can be deleted from the list of importers.
*/
} else {
}
return (DDI_SUCCESS);
}
/*ARGSUSED*/
static int
{
/* find minor, no lock */
return (ENXIO);
}
*reventsp = 0;
/*
* An exported segment must be in state RSM_STATE_EXPORT; an
* imported segment must be in state RSM_STATE_ACTIVE.
*/
if (seg->s_pollevent) {
*reventsp = POLLRDNORM;
} else if (!anyyet) {
/* cannot take segment lock here */
}
return (0);
}
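/*
 * Illustrative sketch (not part of the driver): the general shape of a
 * chpoll(9E) handler like the one above. A pending event is reported
 * immediately; otherwise, if no pollhead has been supplied yet, ours is
 * handed back so the framework can wait on it. ex_pollhead and
 * ex_event_pending are hypothetical; the block is compiled out.
 */
#if 0
static struct pollhead	ex_pollhead;
static int		ex_event_pending;

/*ARGSUSED*/
static int
ex_chpoll(short events, int anyyet, short *reventsp, struct pollhead **phpp)
{
	*reventsp = 0;
	if (ex_event_pending)
		*reventsp = POLLRDNORM;
	else if (!anyyet)
		*phpp = &ex_pollhead;
	return (0);
}
#endif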
/* ************************* IOCTL Commands ********************* */
static rsmseg_t *
{
/* get segment from resource handle */
if (res != RSMRC_RESERVED) {
} else {
/* Allocate segment now and bind it */
/*
* if DR pre-processing is going on or DR is in progress
* then the new export segments should be in the NEW_QSCD state
*/
if (type == RSM_RESOURCE_EXPORT_SEGMENT) {
if ((rsm_drv_data.drv_state ==
}
}
}
return (seg);
}
static int
{
int error;
switch (cmd) {
case RSM_IOCTL_BIND:
break;
case RSM_IOCTL_REBIND:
break;
case RSM_IOCTL_UNBIND:
break;
case RSM_IOCTL_PUBLISH:
break;
case RSM_IOCTL_REPUBLISH:
break;
case RSM_IOCTL_UNPUBLISH:
break;
default:
break;
}
error));
return (error);
}
static int
{
int error;
switch (cmd) {
case RSM_IOCTL_CONNECT:
break;
default:
break;
}
error));
return (error);
}
static int
int mode)
{
int e;
"rsmbar_ioctl done: RSM_IMPORT_DUMMY\n"));
return (RSMERR_CONN_ABORTED);
"rsmbar_ioctl done: loopback\n"));
return (RSM_SUCCESS);
}
switch (cmd) {
case RSM_IOCTL_BAR_CHECK:
"rsmbar_ioctl done: RSM_BAR_CHECK %d\n", bar_va));
case RSM_IOCTL_BAR_OPEN:
break;
case RSM_IOCTL_BAR_ORDER:
break;
case RSM_IOCTL_BAR_CLOSE:
break;
default:
e = EINVAL;
break;
}
if (e == RSM_SUCCESS) {
#ifdef _MULTI_DATAMODEL
int i;
for (i = 0; i < 4; i++) {
}
"rsmbar_ioctl done\n"));
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
#endif
"rsmbar_ioctl done\n"));
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
"rsmbar_ioctl done: error=%d\n", e));
return (e);
}
/*
* Ring the doorbell of the export segment to which this segment is
* connected.
*/
static int
{
int e = 0;
"exportbell_ioctl done: %d\n", e));
return (e);
}
/*
* Ring the doorbells of all segments importing this segment
*/
static int
{
int index;
&request, RSM_NO_REPLY);
}
}
"importbell_ioctl done\n"));
return (RSM_SUCCESS);
}
static int
{
#ifdef _MULTI_DATAMODEL
int i;
rsm_consume_event_msg32_t cemsg32 = {0};
/* copyin the ioctl message */
sizeof (rsm_consume_event_msg32_t), mode)) {
"consumeevent_copyin msgp: RSMERR_BAD_ADDR\n"));
return (RSMERR_BAD_ADDR);
}
/*
* If numents is large, allocate the events list on the heap;
* otherwise use the address of the array that was passed in.
*/
"consumeevent_copyin: "
"RSMERR_BAD_ARGS_ERRORS\n"));
return (RSMERR_BAD_ARGS_ERRORS);
}
} else {
}
/* copyin the seglist into the rsm_poll_event32_t array */
evlistsz32, mode)) {
}
"consumeevent_copyin evlist: RSMERR_BAD_ADDR\n"));
return (RSMERR_BAD_ADDR);
}
/* evlist and evlistsz are based on rsm_poll_event_t type */
} else {
}
/*
* copy the rsm_poll_event32_t array to the rsm_poll_event_t
* array
*/
}
/* free the temp 32-bit event list */
}
return (RSM_SUCCESS);
}
#endif
/* copyin the ioctl message */
mode)) {
"consumeevent_copyin msgp: RSMERR_BAD_ADDR\n"));
return (RSMERR_BAD_ADDR);
}
/*
* If numents is large, allocate the events list on the heap;
* otherwise use the address of the array that was passed in.
*/
"consumeevent_copyin: RSMERR_BAD_ARGS_ERRORS\n"));
return (RSMERR_BAD_ARGS_ERRORS);
}
}
/* copyin the seglist */
if (evlist) {
}
"consumeevent_copyin evlist: RSMERR_BAD_ADDR\n"));
return (RSMERR_BAD_ADDR);
}
"consumeevent_copyin done\n"));
return (RSM_SUCCESS);
}
static int
{
int err = RSM_SUCCESS;
"consumeevent_copyout enter: numents(%d) eventsp(%p)\n",
#ifdef _MULTI_DATAMODEL
int i;
} else {
}
/*
* copy the rsm_poll_event_t array to the rsm_poll_event32_t
* array
*/
}
evlistsz32, mode)) {
}
if (evlist32) { /* free the temp 32-bit event list */
}
/*
* eventsp and evlistsz are based on rsm_poll_event_t
* type
*/
/* event list on the heap and needs to be freed here */
if (eventsp) {
}
}
"consumeevent_copyout done: err=%d\n", err));
return (err);
}
#endif
mode)) {
}
/* event list on the heap and needs to be freed here */
}
"consumeevent_copyout done: err=%d\n", err));
return (err);
}
static int
{
int rc;
int i;
rsm_consume_event_msg_t msg = {0};
event_list = events;
RSM_SUCCESS) {
return (rc);
}
event_list[i].revent = 0;
/* get the segment structure */
if (seg) {
"consumeevent_ioctl: rnum(%d) seg(%p)\n", rnum,
seg));
if (seg->s_pollevent) {
/* consume the event */
}
}
}
RSM_SUCCESS) {
return (rc);
}
return (RSM_SUCCESS);
}
static int
{
int size;
#ifdef _MULTI_DATAMODEL
int i;
"iovec_copyin: returning RSMERR_BAD_ADDR\n"));
return (RSMERR_BAD_ADDR);
}
else
}
"iovec_copyin done\n"));
return (DDI_SUCCESS);
}
#endif
"iovec_copyin done: RSMERR_BAD_ADDR\n"));
return (RSMERR_BAD_ADDR);
}
return (DDI_SUCCESS);
}
static int
{
#ifdef _MULTI_DATAMODEL
mode)) {
"sgio_copyin done: returning EFAULT\n"));
return (RSMERR_BAD_ADDR);
}
"sgio_copyin done\n"));
return (DDI_SUCCESS);
}
#endif
mode)) {
"sgio_copyin done: returning EFAULT\n"));
return (RSMERR_BAD_ADDR);
}
return (DDI_SUCCESS);
}
static int
{
"sgio_resid_copyout enter\n"));
#ifdef _MULTI_DATAMODEL
"sgio_resid_copyout error: rescnt\n"));
return (RSMERR_BAD_ADDR);
}
"sgio_resid_copyout error: flags\n"));
return (RSMERR_BAD_ADDR);
}
"sgio_resid_copyout done\n"));
return (DDI_SUCCESS);
}
#endif
"sgio_resid_copyout error:rescnt\n"));
return (RSMERR_BAD_ADDR);
}
"sgio_resid_copyout error:flags\n"));
return (RSMERR_BAD_ADDR);
}
return (DDI_SUCCESS);
}
static int
{
int e;
int error = 0;
uint_t i;
/*
* for rsmpi.
*/
if (e != DDI_SUCCESS) {
"rsm_iovec_ioctl done: sgio_copyin %d\n", e));
return (e);
}
"rsm_iovec_ioctl done: request_count(%d) too large\n",
return (RSMERR_BAD_SGIO);
}
rsmpi_sg_io.io_segflg = 0;
/* Allocate memory and copyin io vector array */
} else {
}
if (e != DDI_SUCCESS) {
"rsm_iovec_ioctl done: iovec_copyin %d\n", e));
return (e);
}
/* get the import segment descriptor */
/*
* The following sequence of locking may (or MAY NOT) cause a
* deadlock but this is currently not addressed here since the
* implementation will be changed to incorporate the use of
* reference counting for both the import and the export segments.
*/
/* rsmseglock_acquire(im_seg) done in rsmresource_lookup */
"rsm_iovec_ioctl done: rsmresource_lookup failed\n"));
return (EINVAL);
}
"rsm_iovec_ioctl done: not an import segment\n"));
return (EINVAL);
}
/*
* wait for a remote DR to complete, i.e. for segments to get UNQUIESCED,
* as well as wait for a local DR to complete.
*/
"rsm_iovec_ioctl done: cv_wait INTR"));
return (RSMERR_INTERRUPTED);
}
}
e = RSMERR_BAD_SGIO;
goto out;
}
/*
* Allocate and set up the io vector for rsmpi
*/
} else {
}
e = RSMERR_BAD_SGIO;
break;
}
if (acl[0].ae_permission == 0) {
} else {
}
} else {
}
iovec++;
ka_iovec++;
}
/* error while processing handle */
}
goto out;
}
/* call rsmpi */
if (cmd == RSM_IOCTL_PUTV)
&rsmpi_sg_io);
else if (cmd == RSM_IOCTL_GETV)
&rsmpi_sg_io);
else {
e = EINVAL;
"iovec_ioctl: bad command = %x\n", cmd));
}
"rsm_iovec_ioctl RSMPI oper done %d\n", e));
/*
* Check for implicit signal post flag and do the signal
* post if needed
*/
e == RSM_SUCCESS) {
/*
* Reset the implicit signal post flag to 0 to indicate
* that the signal post has been done and need not be
* done in the RSMAPI library
*/
}
}
out:
iovec = iovec_start;
for (i = 0; i < iov_proc; i++) {
}
}
/*
* At present there is no dependency on the existence of xbufs
* created by ddi_umem_iosetup for each of the iovecs. So we
* can free these xbufs here.
*/
}
iovec++;
ka_iovec++;
}
if (iovec_start)
}
"rsm_iovec_ioctl done %d\n", e));
/* if RSMPI call fails return that else return copyout's retval */
return ((e != RSM_SUCCESS) ? e : error);
}
static int
{
int rval = DDI_SUCCESS;
"rsmaddr_ioctl done: adapter not found\n"));
return (RSMERR_CTLR_NOT_PRESENT);
}
switch (cmd) {
case RSM_IOCTL_MAP_TO_ADDR: /* nodeid to hwaddr mapping */
/* returns the hwaddr in msg->hwaddr */
} else {
} else {
}
}
break;
case RSM_IOCTL_MAP_TO_NODEID: /* hwaddr to nodeid mapping */
/* returns the nodeid in msg->nodeid */
} else {
if ((int)node < 0) {
} else {
}
}
break;
default:
break;
}
"rsmaddr_ioctl done: %d\n", rval));
return (rval);
}
static int
{
#ifdef _MULTI_DATAMODEL
int i;
"rsm_ddi_copyin done: EFAULT\n"));
return (RSMERR_BAD_ADDR);
}
for (i = 0; i < 4; i++) {
}
"rsm_ddi_copyin done\n"));
return (RSM_SUCCESS);
}
#endif
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
static int
{
"rsmattr_ddi_copyout enter\n"));
/*
* need to copy appropriate data from rsm_controller_attr_t
* to rsmka_int_controller_attr_t
*/
#ifdef _MULTI_DATAMODEL
else
else
else
else
else
else
"rsmattr_ddi_copyout done\n"));
sizeof (rsmka_int_controller_attr32_t), mode)) {
return (RSMERR_BAD_ADDR);
}
else
return (RSM_SUCCESS);
}
#endif
"rsmattr_ddi_copyout done\n"));
sizeof (rsmka_int_controller_attr_t), mode)) {
return (RSMERR_BAD_ADDR);
}
else
return (RSM_SUCCESS);
}
/*ARGSUSED*/
static int
int *rvalp)
{
rsm_ioctlmsg_t msg = {0};
int error;
if (cmd == RSM_IOCTL_CONSUMEEVENT) {
"rsm_ioctl RSM_IOCTL_CONSUMEEVENT done: %d\n", error));
return (error);
}
/* topology cmd does not use the arg common to other cmds */
"rsm_ioctl done: %d\n", error));
return (error);
}
"rsm_ioctl done: %d\n", error));
return (error);
}
/*
* try to load arguments
*/
if (cmd != RSM_IOCTL_RING_BELL &&
"rsm_ioctl done: EFAULT\n"));
return (RSMERR_BAD_ADDR);
}
if (cmd == RSM_IOCTL_ATTR) {
"rsm_ioctl done: ENODEV\n"));
return (RSMERR_CTLR_NOT_PRESENT);
}
"rsm_ioctl:after copyout %d\n", error));
return (error);
}
if (cmd == RSM_IOCTL_BAR_INFO) {
/* Return library off,len of barrier page */
#ifdef _MULTI_DATAMODEL
else
"rsm_ioctl done\n"));
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
#endif
"rsm_ioctl done\n"));
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
/* map the nodeid or hwaddr */
if (error == RSM_SUCCESS) {
#ifdef _MULTI_DATAMODEL
"rsm_ioctl done\n"));
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
#endif
"rsm_ioctl done\n"));
return (RSMERR_BAD_ADDR);
else
return (RSM_SUCCESS);
}
"rsm_ioctl done: %d\n", error));
return (error);
}
/* Find resource and lock it in read mode */
/*
* Find command group
*/
switch (RSM_IOCTL_CMDGRP(cmd)) {
case RSM_IOCTL_EXPORT_SEG:
/*
* Export list is searched during publish, loopback and
* remote lookup call.
*/
credp);
}
break;
case RSM_IOCTL_IMPORT_SEG:
/* Import list is searched during remote unmap call. */
credp);
}
break;
case RSM_IOCTL_BAR:
if (res != RSMRC_RESERVED &&
mode);
} else { /* invalid res value */
}
break;
case RSM_IOCTL_BELL:
if (res != RSMRC_RESERVED) {
else /* RSM_RESOURCE_BAR */
} else { /* invalid res value */
}
break;
default:
}
error));
return (error);
}
/* **************************** Segment Mapping Operations ********* */
static rsm_mapinfo_t *
{
rsm_mapinfo_t *p;
/*
* Find the correct mapinfo structure to use during the mapping
* from the seg->s_mapinfo list.
* The seg->s_mapinfo list contains in reverse order the mappings
* as returned by the RSMPI rsm_map. In rsm_devmap, we need to
* access the correct entry within this list for the mapping
* requested.
*
* The algorithm for selecting a list entry is as follows:
*
* When start_offset of an entry <= off we have found the entry
* we were looking for. Adjust the dev_offset and map_len (needs
* to be PAGESIZE aligned).
*/
for (; p; p = p->next) {
if (p->start_offset <= off) {
return (p);
}
}
return (NULL);
}
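/*
 * Illustrative sketch (not part of the driver): looking up a covering
 * entry in a mapinfo list kept in reverse (descending start_offset)
 * order, per the comment above. Because entries were prepended as the
 * RSMPI maps were created, the first entry whose start_offset is <= the
 * requested offset covers it. Names are hypothetical; compiled out.
 */
#if 0
typedef struct ex_mapinfo {
	struct ex_mapinfo *next;
	off_t	start_offset;		/* segment offset entry begins at */
	size_t	individual_len;		/* length this entry covers */
	off_t	dev_offset;		/* corresponding device offset */
} ex_mapinfo_t;

static ex_mapinfo_t *
ex_get_mapinfo(ex_mapinfo_t *list, off_t off, off_t *dev_offset)
{
	ex_mapinfo_t	*p;

	for (p = list; p != NULL; p = p->next) {
		if (p->start_offset <= off) {
			/* offset of the request within this entry */
			*dev_offset = p->dev_offset +
			    (off - p->start_offset);
			return (p);
		}
	}
	return (NULL);			/* no covering entry found */
}
#endif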
static void
{
rsm_mapinfo_t *p;
while (mapinfo != NULL) {
p = mapinfo;
mapinfo = mapinfo->next;
kmem_free(p, sizeof (*p));
}
}
static int
{
rsmcookie_t *p;
"rsmmap_map: dhp = %x\n", dhp));
/*
* Allocate structure and add cookie to segment list
*/
p = kmem_alloc(sizeof (*p), KM_SLEEP);
return (DDI_SUCCESS);
}
/*
* Page fault handling is done here. The prerequisite mapping setup
* has been done in rsm_devmap with calls to ddi_devmem_setup or
* ddi_umem_setup
*/
static int
{
int e;
"rsmmap_access done: cv_wait INTR"));
return (RSMERR_INTERRUPTED);
}
}
"rsmmap_access: dhp = %x\n", dhp));
}
return (e);
}
static int
void **newpvt)
{
rsmcookie_t *p, *old;
/*
* Same as map, create an entry to hold cookie and add it to
* connect segment list. The oldpvt is a pointer to segment.
* Return segment pointer in newpvt.
*/
/*
* Find old cookie
*/
break;
}
}
"rsmmap_dup done: EINVAL\n"));
return (EINVAL);
}
p = kmem_alloc(sizeof (*p), KM_SLEEP);
return (DDI_SUCCESS);
}
static void
{
/*
* Remove pvtp structure from segment list.
*/
int freeflag;
"rsmmap_unmap: dhp = %x\n", dhp));
/*
* We can go ahead and remove the dhps even if we are in
* the MAPPING state because the dhps being removed here
* belong to a different mmap and we are holding the segment
* lock.
*/
/* find and remove dhp handle */
break;
}
}
} else {
"rsmmap_unmap:parital unmap"
"new_dhp1 %lx, new_dhp2 %lx\n",
}
/*
* rsmmap_unmap is called for each mapping cookie on the list.
* When the list becomes empty and we are not in the MAPPING
* state then unmap in the rsmpi driver.
*/
freeflag = 1;
} else {
freeflag = 0;
}
if (freeflag) {
/* Free the segment structure */
}
}
static struct devmap_callback_ctl rsmmap_ops = {
DEVMAP_OPS_REV, /* devmap_ops version number */
rsmmap_map, /* devmap_ops map routine */
rsmmap_access, /* devmap_ops access routine */
rsmmap_dup, /* devmap_ops dup routine */
rsmmap_unmap, /* devmap_ops unmap routine */
};
static int
{
int err;
if ((off == barrier_offset) &&
(len == barrier_size)) {
/*
* The offset argument in devmap_umem_setup represents
* the offset within the kernel memory defined by the
* cookie. We use this offset as barrier_offset.
*/
DEVMAP_DEFAULTS, 0);
if (err != 0) {
"rsm_devmap done: %d\n", err));
return (RSMERR_MAP_FAILED);
}
"rsm_devmap done: %d\n", err));
*maplen = barrier_size;
return (err);
} else {
"rsm_devmap done: %d\n", err));
return (RSMERR_MAP_FAILED);
}
}
/*
* Make sure we still have permission for the map operation.
*/
}
maxprot |= PROT_WRITE;
}
/*
* For each devmap call, rsmmap_map is called. This maintains driver
* private information for the mapping. Thus, if there are multiple
* devmap calls there will be multiple rsmmap_map calls and for each
* call, the mapping information will be stored.
* In case of an error during the processing of the devmap call, error
* will be returned. This error return causes the caller of rsm_devmap
* to undo all the mappings by calling rsmmap_unmap for each one.
* rsmmap_unmap will free up the private information for the requested
* mapping.
*/
rsm_mapinfo_t *p;
if (p == NULL) {
"rsm_devmap: incorrect mapping info\n"));
return (RSMERR_MAP_FAILED);
}
callbackops, p->dev_register,
"rsm_devmap: dip=%lx,dreg=%lu,doff=%lx,"
"off=%lx,len=%lx\n",
if (err != 0) {
"rsm_devmap: devmap_devmem_setup failed %d\n",
err));
return (RSMERR_MAP_FAILED);
}
/* cur_len is always an integral multiple of PAGESIZE */
return (err);
} else {
if (err != 0) {
"rsm_devmap: devmap_umem_setup failed %d\n",
err));
return (RSMERR_MAP_FAILED);
}
"rsm_devmap: loopback done\n"));
return (err);
}
}
/*
* We can use the devmap framework for mapping device memory to user space by
* specifying this routine in the rsm_cb_ops structure. The kernel mmap
* processing calls this entry point and devmap_setup is called within this
* function, which eventually calls rsm_devmap
*/
static int
{
int error = 0;
int old_state;
/*
* find segment
*/
"rsm_segmap done: invalid segment\n"));
return (EINVAL);
}
/*
* the user is trying to map a resource that has not been
* defined yet. The library uses this to map in the
* barrier page.
*/
/*
* The mapping for the barrier page is identified
* by the special offset barrier_offset
*/
return (EINVAL);
}
"rsm_segmap done: %d\n", error));
return (error);
} else {
return (EINVAL);
}
}
/* Make sure you can only map imported segments */
"rsm_segmap done: not an import segment\n"));
return (EINVAL);
}
/* check means library is broken */
/* wait for the segment to become unquiesced */
"rsm_segmap done: cv_wait INTR"));
return (ENODEV);
}
}
/* wait until segment leaves the mapping state */
/*
* we allow multiple maps of the same segment in the KA
* and it works because we do an rsmpi map of the whole
* segment during the first map and all the device mapping
* information needed in rsm_devmap is in the mapinfo list.
*/
"rsm_segmap done: segment not connected\n"));
return (ENODEV);
}
/*
* Make sure we are not mapping a larger segment than what's
* exported
*/
"rsm_segmap done: off+len>seg size\n"));
return (ENXIO);
}
/*
* Make sure we still have permission for the map operation.
*/
}
maxprot |= PROT_WRITE;
}
/* No permission */
"rsm_segmap done: no permission\n"));
return (EACCES);
}
"rsm_segmap done:RSMSI_STATE %d invalid\n",
sharedp->rsmsi_state));
return (ENODEV);
}
/*
* Do the map - since we want importers to share mappings
* we do the rsmpi map for the whole segment
*/
rsm_mapinfo_t *p;
/*
* length_to_map = seg->s_len is always an integral
* multiple of PAGESIZE. Length mapped in each entry in mapinfo
* list is a multiple of PAGESIZE - RSMPI map ensures this
*/
error = 0;
/* map the whole segment */
tmp_len = 0;
if (error != 0)
break;
/*
* Store the mapping info obtained from rsm_map
*/
p = kmem_alloc(sizeof (*p), KM_SLEEP);
p->dev_register = dev_register;
p->dev_offset = dev_offset;
p->individual_len = tmp_len;
p->start_offset = tmp_off;
sharedp->rsmsi_mapinfo = p;
length_to_map -= tmp_len;
}
if (error != RSM_SUCCESS) {
/* Check if this is the first rsm_map */
/*
* A single rsm_unmap undoes
* multiple rsm_maps.
*/
}
"rsm_segmap done: rsmpi map err %d\n",
error));
error != RSMERR_BAD_SEG_HNDL);
if (error == RSMERR_UNSUPPORTED_OPERATION)
return (ENOTSUP);
else if (error == RSMERR_INSUFFICIENT_RESOURCES)
return (EAGAIN);
else if (error == RSMERR_CONN_ABORTED)
return (ENODEV);
else
return (error);
} else {
}
} else {
}
sharedp->rsmsi_mapcnt++;
/* move to an intermediate mapping state */
if (error == DDI_SUCCESS) {
} else {
sharedp->rsmsi_mapcnt--;
if (sharedp->rsmsi_mapcnt == 0) {
/* unmap the shared RSMPI mapping */
}
"rsm: devmap_setup failed %d\n", error));
}
error));
return (error);
} else {
/*
* For loopback, the export segment mapping cookie (s_cookie)
* is also used as the s_cookie value for its import segments
* during mapping.
* Note that reference counting for s_cookie of the export
* segment is not required due to the following:
* We never have a case of the export segment being destroyed,
* leaving the import segments with a stale value for the
* s_cookie field, since a force disconnect is done prior to a
* destroy of an export segment. The force disconnect causes
* the s_cookie value to be reset to NULL. Also for the
* rsm_rebind operation, we change the s_cookie value of the
* export segment as well as of all its local (loopback)
* importers.
*/
/*
* In order to maintain the lock ordering between the export
* and import segment locks, we need to acquire the export
* segment lock first and only then acquire the import
* segment lock.
* The above is necessary to avoid any deadlock scenarios
* with rsm_rebind which also acquires both the export
* and import segment locks in the above mentioned order.
* Based on code inspection, there seem to be no other
* situations in which both the export and import segment
* locks are acquired either in the same or opposite order
* as mentioned above.
* Thus in order to conform to the above lock order, we
* need to change the state of the import segment to
* RSM_STATE_MAPPING, release the lock. Once this is done we
* can now safely acquire the export segment lock first
* followed by the import segment lock which is as per
* the lock order mentioned above.
*/
/* move to an intermediate mapping state */
/*
* Revert to old_state and signal any waiters
* The shared state is not changed
*/
return (ENODEV);
}
sharedp->rsmsi_mapcnt++;
/*
* It is not required or necessary to acquire the import
* segment lock here to change the value of s_cookie since
* no one will touch the import segment as long as it is
* in the RSM_STATE_MAPPING state.
*/
if (error == 0) {
} else {
sharedp->rsmsi_mapcnt--;
if (sharedp->rsmsi_mapcnt == 0) {
}
}
"rsm_segmap done: %d\n", error));
return (error);
}
}
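/*
 * Illustrative sketch (not part of the driver): the lock-ordering trick
 * described in the loopback comment above. To always take the export
 * segment lock before the import segment lock, the import segment is
 * first parked in an intermediate MAPPING state and unlocked; nothing
 * else touches a segment in that state, so both locks can then be taken
 * in the required export-then-import order. Names are hypothetical;
 * the block is compiled out.
 */
#if 0
typedef struct ex_seg ex_seg_t;

extern void ex_seg_lock(ex_seg_t *);
extern void ex_seg_unlock(ex_seg_t *);
extern void ex_seg_set_state(ex_seg_t *, int);

static void
ex_loopback_map(ex_seg_t *imp, ex_seg_t *exp)
{
	ex_seg_lock(imp);
	ex_seg_set_state(imp, /* EX_STATE_MAPPING */ 1);
	ex_seg_unlock(imp);

	ex_seg_lock(exp);		/* export lock first ... */
	ex_seg_lock(imp);		/* ... then import lock */

	/* ... share the exporter's cookie with the importer ... */

	ex_seg_unlock(imp);
	ex_seg_unlock(exp);
}
#endif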
int
{
return (RSM_SUCCESS);
}
int
{
return (RSM_SUCCESS);
}
int
{
return (RSM_SUCCESS);
}
int
{
return (DDI_SUCCESS);
}
int
{
return (RSM_SUCCESS);
}
int
{
return (RSM_SUCCESS);
}
int
{
return (RSM_SUCCESS);
}
int
{
return (RSM_SUCCESS);
}
void
{
"rsmka_init_loopback enter\n"));
/* initialize null ops vector */
/* initialize attributes for loopback adapter */
/* initialize loopback adapter */
"rsmka_init_loopback done\n"));
}
/* ************** DR functions ********************************** */
static void
{
int recheck_state;
do {
recheck_state = 0;
"%s done:state =%d\n", function,
return;
}
"%s done:state =%d\n", function,
return;
}
/* unbind */
(void) rsm_unbind_pages(segp);
"%s done:state =%d\n", function,
return;
}
/*
* a local memory handle
*/
}
/*
* state changed; need to see what it
* should be changed to.
*/
recheck_state = 1;
continue;
}
/*
* send SUSPEND messages - currently it will be
* done at the end
*/
"%s done:state =%d\n", function,
return;
}
} while (recheck_state);
}
static void
{
int ret;
int acl_len;
int create_flags = 0;
return;
}
return;
}
/* bind the segment */
} else { /* bind failed - resource unavailable */
}
return;
}
/* wait for the segment to move to EXPORT_QUIESCED state */
}
/* bind the segment */
if (ret != RSM_SUCCESS) {
/* bind failed - resource unavailable */
"%s done: exp_qscd bind failed = %d\n",
return;
}
/*
* publish the segment
* if successful
* segp->s_state = RSM_STATE_EXPORT;
* else failed
* segp->s_state = RSM_STATE_BIND;
*/
/* check whether it is a local_memory_handle */
"%s done:exp_qscd\n", function));
return;
}
}
}
} else {
}
if (ret != RSM_SUCCESS) {
"%s done: exp_qscd create failed = %d\n",
return;
}
if (ret != RSM_SUCCESS) {
"%s done: exp_qscd publish failed = %d\n",
return;
}
function));
return;
}
}
static void
{
/* wait for the RDMA to complete */
}
}
static void
{
}
static void
{
if (event == RSM_DR_QUIESCE)
else /* UNQUIESCE */
}
static void
{
if (event == RSM_DR_QUIESCE)
else /* UNQUIESCE */
}
static void
{
int i, j;
rsmresource_t *p;
"rsm_dr_process_local_segments enter\n"));
/* iterate through the resource structure */
for (i = 0; i < rsm_resource.rsmrc_len; i++) {
for (j = 0; j < RSMRC_BLKSZ; j++) {
p = blk->rsmrcblk_blks[j];
if ((p != NULL) && (p != RSMRC_RESERVED)) {
/* valid resource */
if (p->rsmrc_type ==
rsm_process_exp_seg(p, event);
else if (p->rsmrc_type ==
rsm_process_imp_seg(p, event);
}
}
}
}
"rsm_dr_process_local_segments done\n"));
}
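/*
 * Illustrative sketch (not part of the driver): walking a two-level
 * resource table of fixed-size blocks the way the DR scan above does.
 * A NULL slot is empty and a RESERVED sentinel marks a slot whose
 * allocation is still in progress; both are skipped. Names are
 * hypothetical; the block is compiled out.
 */
#if 0
#define	EX_BLKSZ	16
#define	EX_RESERVED	((ex_res_t *)-1)	/* allocation in progress */

typedef struct ex_res ex_res_t;

typedef struct ex_blk {
	ex_res_t	*blks[EX_BLKSZ];
} ex_blk_t;

static void
ex_walk_resources(ex_blk_t **tbl, int nblks, void (*cb)(ex_res_t *))
{
	int	i, j;

	for (i = 0; i < nblks; i++) {
		ex_blk_t	*blk = tbl[i];

		if (blk == NULL)
			continue;		/* no block allocated */
		for (j = 0; j < EX_BLKSZ; j++) {
			ex_res_t	*p = blk->blks[j];

			if (p != NULL && p != EX_RESERVED)
				cb(p);		/* valid resource */
		}
	}
}
#endif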
/* *************** DR callback functions ************ */
static void
{
"rsm_dr_callback_post_add is a no-op\n"));
/* Noop */
}
static int
{
int recheck_state = 0;
"rsm_dr_callback_pre_del enter\n"));
do {
recheck_state = 0;
"rsm_dr_callback_pre_del:state=%d\n",
switch (rsm_drv_data.drv_state) {
case RSM_DRV_NEW:
/*
* The state should usually never be RSM_DRV_NEW
* since in this state the callbacks have not yet
* been registered. So, ASSERT.
*/
ASSERT(0);
return (0);
case RSM_DRV_REG_PROCESSING:
/*
* The driver is in the process of registering
* with the DR framework. So, wait till the
* registration process is complete.
*/
recheck_state = 1;
break;
case RSM_DRV_UNREG_PROCESSING:
/*
* If the state is RSM_DRV_UNREG_PROCESSING, the
* module is in the process of detaching and
* unregistering the callbacks from the DR
* framework. So, simply return.
*/
"rsm_dr_callback_pre_del:"
return (0);
case RSM_DRV_OK:
break;
case RSM_DRV_PREDEL_STARTED:
/* FALLTHRU */
case RSM_DRV_PREDEL_COMPLETED:
/* FALLTHRU */
recheck_state = 1;
break;
case RSM_DRV_DR_IN_PROGRESS:
"rsm_dr_callback_pre_del done\n"));
return (0);
/* break; */
default:
ASSERT(0);
break;
}
} while (recheck_state);
/* Do all the quiescing stuff here */
"rsm_dr_callback_pre_del: quiesce things now\n"));
/*
* now that all local segments have been quiesced lets inform
* the importers
*/
/*
* In response to the suspend message the remote node(s) will process
* the segments and send a suspend_complete message. Till all
* the nodes send the suspend_complete message we wait in the
* RSM_DRV_PREDEL_STARTED state. In the exporter_quiesce
* function we transition to the RSM_DRV_PREDEL_COMPLETED state.
*/
}
"rsm_dr_callback_pre_del done\n"));
return (0);
}
static void
{
int recheck_state = 0;
"rsm_dr_callback_post_del enter\n"));
do {
recheck_state = 0;
"rsm_dr_callback_post_del:state=%d\n",
switch (rsm_drv_data.drv_state) {
case RSM_DRV_NEW:
/*
* The driver state cannot be RSM_DRV_NEW
* since in this state the callbacks have not
* yet been registered.
*/
ASSERT(0);
return;
case RSM_DRV_REG_PROCESSING:
/*
* The driver is in the process of registering with
* the DR framework. Wait till the registration is
* complete.
*/
recheck_state = 1;
break;
case RSM_DRV_UNREG_PROCESSING:
/*
* RSM_DRV_UNREG_PROCESSING state means the module
* is detaching and unregistering the callbacks
* from the DR framework. So simply return.
*/
/* FALLTHRU */
case RSM_DRV_OK:
/*
* RSM_DRV_OK means we missed the pre-del
* corresponding to this post-del because we had not
* registered yet, so simply return.
*/
"rsm_dr_callback_post_del:"
return;
/* break; */
case RSM_DRV_PREDEL_STARTED:
/* FALLTHRU */
case RSM_DRV_PREDEL_COMPLETED:
/* FALLTHRU */
recheck_state = 1;
break;
case RSM_DRV_DR_IN_PROGRESS:
if (rsm_drv_data.drv_memdel_cnt > 0) {
"rsm_dr_callback_post_del done:\n"));
return;
}
break;
default:
ASSERT(0);
return;
/* break; */
}
} while (recheck_state);
/* Do all the unquiescing stuff here */
"rsm_dr_callback_post_del: unquiesce things now\n"));
/*
* now that all local segments have been unquiesced lets inform
* the importers
*/
"rsm_dr_callback_post_del done\n"));
return;
}