fmd_xprt.c revision 724365f7556fc4201fdb11766ebc6bd918523130
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2006 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * FMD Transport Subsystem
 *
 * A transport module uses some underlying mechanism to transport events.
 * This mechanism may use any underlying link-layer protocol and may support
 * additional link-layer packets unrelated to FMA. Some appropriate
 * link-layer mechanism to create the underlying connection is expected to be
 * called prior to calling fmd_xprt_open() itself. Alternatively, a transport
 * may be created in the suspended state by specifying the FMD_XPRT_SUSPENDED
 * flag as part of the call to fmd_xprt_open(), and then may be resumed later.
 * The underlying transport mechanism is *required* to provide ordering: that
 * is, the sequences of bytes written across the transport must be read by
 * the remote peer in the order that they are written, even across separate
 * calls to fmdo_send(). As an example, the Internet TCP protocol would be
 * a valid transport as it guarantees ordering, whereas the Internet UDP
 * protocol would not because UDP datagrams may be delivered in any order
 * as a result of delays introduced when datagrams pass through routers.
 *
 * Similar to sending events, a transport module receives events that are from
 * its peer remote endpoint using some transport-specific mechanism that is
 * unknown to FMD. As each event is received, the transport module is
 * responsible for constructing a valid nvlist_t object from the data and then
 * calling fmd_xprt_post() to post the event to the containing FMD's dispatch
 * queue, making it available to all local modules that are not transport
 * modules that have subscribed to the event.
 *
 * The following state machine is used for each transport. The initial state
 * is either SYN, ACK, or RUN, depending on the flags specified to xprt_create.
 */
/*
 *      FMD_XPRT_ACCEPT  !FMD_XPRT_ACCEPT
 *
 * waiting  +--v--+           +--v--+  waiting
 * for syn  | SYN |--+     --+| ACK |  for ack
 * event    +-----+   \   /   +-----+  event
 * drop all +--v--+     X     +--v--+  send subscriptions,
 * events   | ERR |<---+ +--->| SUB |  recv subscriptions,
 *          +-----+           +-----+  wait for run event
 *
 * When fmd_xprt_open() is called without FMD_XPRT_ACCEPT, the Common Transport
 * Layer enqueues a "syn" event for the module in its event queue and sets the
 * state to ACK. In state ACK, we are waiting for the transport to get an
 * "ack" event and call fmd_xprt_post() on this event. Other events will be
 * discarded. If an "ack" is received, we transition to state SUB. If a
 * configurable timeout occurs or if the "ack" is invalid (e.g. invalid version
 * exchange), we transition to state ERR. Once in state ERR, no further
 * operations are valid except fmd_xprt_close() and fmd_xprt_error() will
 * return a non-zero value to the caller indicating the transport has failed.
 *
 * When fmd_xprt_open() is called with FMD_XPRT_ACCEPT, the Common Transport
 * Layer assumes this transport is being used to accept a virtual connection
 * from a remote peer that is sending a "syn", and sets the initial state to
 * SYN. In this state, the transport waits for a "syn" event, validates it,
 * and then transitions to state SUB if it is valid or state ERR if it is not.
 *
 * Once in state SUB, the transport module is expected to receive a sequence of
 * zero or more "subscribe" events from the remote peer, followed by a "run"
 * event. Once in state RUN, the transport is active and any events can be
 * sent or received. The transport module is free to call fmd_xprt_close()
 * from any state. The fmd_xprt_error() function will return zero if the
 * transport is not in the ERR state, or non-zero if it is in the ERR state.
 *
 * Once the state machine reaches RUN, other FMA protocol events can be sent
 * and received across the transport in addition to the various control events.
 */
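/*
 * Illustrative sketch, not part of fmd: the transitions described above can
 * be modelled as a small pure function. The type, flag, and function names
 * here (xprt_state_t, XF_ACCEPT, XF_RDONLY, xprt_initial_state,
 * xprt_next_state) are hypothetical. The sketch follows the wording of the
 * comment (unexpected events are discarded), assumes the same holds in SYN,
 * and leaves out the configurable timeout. fmd itself encodes each state as
 * an array of class patterns with action functions.
 */

```c
#include <assert.h>
#include <string.h>

typedef enum { XS_SYN, XS_ACK, XS_SUB, XS_RUN, XS_ERR } xprt_state_t;

#define	XF_ACCEPT	0x1	/* stands in for FMD_XPRT_ACCEPT */
#define	XF_RDONLY	0x2	/* stands in for a read-only transport */

/* Pick the initial state from the creation flags. */
static xprt_state_t
xprt_initial_state(unsigned int flags)
{
	if (flags & XF_RDONLY)
		return (XS_RUN);	/* read-only: go directly to RUN */
	return ((flags & XF_ACCEPT) ? XS_SYN : XS_ACK);
}

/*
 * Compute the next state for a control event named by its short name
 * ("syn", "ack", "subscribe", "run"). 'valid' says whether the event's
 * payload (e.g. the version exchange) passed validation.
 */
static xprt_state_t
xprt_next_state(xprt_state_t s, const char *event, int valid)
{
	assert(event != NULL);

	switch (s) {
	case XS_SYN:
		if (strcmp(event, "syn") != 0)
			return (s);	/* assumed: other events discarded */
		return (valid ? XS_SUB : XS_ERR);
	case XS_ACK:
		if (strcmp(event, "ack") != 0)
			return (s);	/* other events are discarded */
		return (valid ? XS_SUB : XS_ERR);
	case XS_SUB:
		if (strcmp(event, "run") == 0)
			return (XS_RUN);
		return (s);		/* "subscribe" keeps us in SUB */
	default:
		return (s);		/* RUN stays active; ERR is final */
	}
}
```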
/*
 * Table of Common Transport Layer Control Events
 * ==============================================
 *
 * Class                         Payload
 * -----                         -------
 * resource.fm.xprt.uuclose      string (uuid of case)
 * resource.fm.xprt.unsubscribe  string (class pattern)
 * resource.fm.xprt.unsuback     string (class pattern)
 * resource.fm.xprt.ack          version information
 * resource.fm.xprt.run          version information
 *
 * Control events are used to add and delete proxy subscriptions on the remote
 * transport peer module, and to set up connections. When a "syn" event is
 * sent, FMD will include in the payload the highest version of the FMA event
 * protocol that is supported by the sender. When a "syn" event is received,
 * the receiving FMD will use the minimum of this version and its version of
 * the protocol, and reply with this new minimum version in the "ack" event.
 * The receiver will then use this new minimum for subsequent event semantics.
 */

/*
 * The states shown above in the transport state machine diagram are encoded
 * using arrays of class patterns and a corresponding action function. These
 * arrays are then passed to fmd_xprt_transition() to change transport states.
 */

/*
 * Template for per-transport statistics installed by fmd on behalf of each
 * transport. These are used to initialize the per-transport xi_stats. For
 * each statistic, the name is prepended with "fmd.xprt.%u", where %u is the
 * transport ID (xi_id) and then are inserted into the per-module stats hash.
 * The values in this array must match fmd_xprt_stat_t from <fmd_xprt.h>.
 */
{ "prdequeued", FMD_TYPE_UINT64, "protocol events dequeued by transport" },
{ "wlentime", FMD_TYPE_TIME, "total wait length * time product" },
{ "wlastupdate", FMD_TYPE_TIME, "hrtime of last wait queue update" },
{ "dtime", FMD_TYPE_TIME, "total processing time after dequeue" },
{ "dlastupdate", FMD_TYPE_TIME, "hrtime of last event dequeue completion" },
{ "authority", FMD_TYPE_STRING, "authority associated with this transport" },
{ "timeouts", FMD_TYPE_UINT64, "events received by transport with ttl=0" },
{ "subscriptions", FMD_TYPE_UINT64, "subscriptions registered to transport" },
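/*
 * Illustrative sketch, not part of fmd: how a template name such as
 * "timeouts" might be given its per-transport prefix. The helper name
 * xprt_stat_name is hypothetical, and the "." separator between the
 * transport ID and the statistic name is an assumption; the comment above
 * only states the "fmd.xprt.%u" prefix.
 */

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "fmd.xprt.<id>.<name>" into buf. Returns 0 on success, or -1 if
 * the name is empty or the result would not fit in len bytes.
 */
static int
xprt_stat_name(char *buf, size_t len, unsigned int id, const char *name)
{
	int n;

	assert(buf != NULL && name != NULL);
	if (strlen(name) == 0)
		return (-1);		/* nothing to prefix */

	n = snprintf(buf, len, "fmd.xprt.%u.%s", id, name);
	return ((n >= 0 && (size_t)n < len) ? 0 : -1);
}
```

/*
 * Because the transport ID is part of every name, two transports owned by
 * the same module can share one stats hash without their names colliding.
 */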
/*
 * Insert the specified class into the specified class hash, and return the
 * reference count. A return value of one indicates this is the first insert.
 * If an eventq is associated with the hash, insert a dispq subscription for it.
 */

/*
 * Delete the specified class from the specified class hash, and return the
 * reference count. A return value of zero indicates the class was deleted.
 * If an eventq is associated with the hash, delete the dispq subscription.
 */
	return (-1U); /* explicitly permit an invalid delete */

/*
 * Queue subscribe events for the specified transport corresponding to all of
 * the active module subscriptions. This is an extremely heavyweight operation
 * that we expect to take place rarely (i.e. when loading a transport module
 * or when it establishes a connection). We lock all of the known modules to
 * prevent them from adding or deleting subscriptions, then snapshot their
 * subscriptions, and then unlock all of the modules. We hold the modhash
 * lock for the duration of this operation to prevent new modules from loading.
 */

/*
 * If we've reached the SUB state, take out the big hammer and snapshot
 * all of the subscriptions of all of the loaded modules. Then queue a
 * run event for our remote peer indicating that it can enter RUN.
 */
	return; /* transitioned to error state */

/*
 * If the transport module didn't specify an authority, extract the
 * one that is passed along with the xprt.syn event and use that.
 */
	return; /* transitioned to error state */

/*
 * If the transport module didn't specify an authority, extract the
 * one that is passed along with the xprt.syn event and use that.
 */

/*
 * Upon transition to RUN, we take every solved case and resend a list.suspect
 * event for it to our remote peer. If a case transitions from solved to a
 * future state (CLOSE_WAIT, CLOSED, or REPAIRED) while we are iterating over
 * the case hash, we will get it as part of examining the resource cache, next.
 */
	return; /* unsolved, or we'll get it during the ASRU pass */

/*
 * Upon transition to RUN, we take every ASRU which is in the degraded state
 * and resend a fault.* event for it to our remote peer, in case the peer is
 * running in the fault manager that knows how to disable this resource. If
 * any new resources are added to the cache during our iteration, this is no
 * problem because our subscriptions are already proxied and so any new cases
 * will result in a list.suspect event being transported if that is needed.
 */
	return; /* asru is internal, unusable, or not faulty */
	return; /* transitioned to error state */
	return; /* malformed protocol event */
	return; /* transitioned to error state */
	return; /* malformed protocol event */
	return; /* transitioned to error state */
	return; /* malformed protocol event */
	return; /* transitioned to error state */
	char *class = "<unknown>";
	char *class = "<unknown>";
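/*
 * Illustrative sketch, not part of fmd: the reference-count contract that the
 * class-hash insert and delete comments describe earlier in this file (one on
 * first insert, zero when the class is deleted, -1U for an invalid delete).
 * The types and names here (class_tab_t, class_insert, class_delete) are
 * hypothetical, a fixed-size table stands in for the real hash, and the
 * eventq/dispq subscription hooks are left out.
 */

```c
#include <assert.h>
#include <string.h>

#define	NCLASS		8	/* table capacity for this sketch */
#define	CLASSLEN	64	/* maximum class pattern length, with NUL */

typedef struct {
	char name[CLASSLEN];
	unsigned int refs;	/* zero means the slot is free */
} class_ent_t;

typedef struct {
	class_ent_t ents[NCLASS];
} class_tab_t;

static class_ent_t *
class_find(class_tab_t *t, const char *name)
{
	int i;

	for (i = 0; i < NCLASS; i++) {
		if (t->ents[i].refs != 0 &&
		    strcmp(t->ents[i].name, name) == 0)
			return (&t->ents[i]);
	}
	return (NULL);
}

/* Returns the new reference count; 1 means first insert, 0 means full. */
static unsigned int
class_insert(class_tab_t *t, const char *name)
{
	class_ent_t *e = class_find(t, name);
	int i;

	assert(strlen(name) < CLASSLEN);
	if (e != NULL)
		return (++e->refs);

	for (i = 0; i < NCLASS; i++) {
		if (t->ents[i].refs == 0) {
			(void) strcpy(t->ents[i].name, name);
			t->ents[i].refs = 1;
			return (1);
		}
	}
	return (0);
}

/* Returns the new reference count; 0 means the class is now deleted. */
static unsigned int
class_delete(class_tab_t *t, const char *name)
{
	class_ent_t *e = class_find(t, name);

	if (e == NULL)
		return (-1U);	/* explicitly permit an invalid delete */
	return (--e->refs);	/* the slot is free again once this is 0 */
}
```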
/*
 * Grab fmd.d_xprt_lock to block fmd_xprt_suspend_all() and then create
 * a transport ID and make it visible in fmd.d_xprt_ids. If transports
 * were previously suspended, set the FMD_XPRT_DSUSPENDED flag on us to
 * ensure that this transport will not run until fmd_xprt_resume_all().
 */

/*
 * If the module has not yet finished _fmd_init(), set the ISUSPENDED
 * bit so that fmdo_send() is not called until _fmd_init() completes.
 */

/*
 * Initialize the transport statistics that we keep on behalf of fmd.
 * These are set up using a template defined at the top of this file.
 * We rename each statistic with a prefix ensuring its uniqueness.
 */
	for (i = 0; i < statc; i++) {
/*
 * Create the outbound eventq for this transport and link to its stats.
 * If any suspend bits were set above, suspend the eventq immediately.
 */

/*
 * Create our subscription hashes: local subscriptions go to xi_queue,
 * remote subscriptions are tracked only for protocol requests, and
 * pending unsubscriptions are associated with the /dev/null eventq.
 */

/*
 * Determine our initial state based upon the creation flags. If we're
 * read-only, go directly to RUN. If we're accepting a new connection,
 * wait for a SYN. Otherwise send a SYN and wait for an ACK.
 */

/*
 * If client.xprtlog is set to TRUE, create a debugging log for the
 */

/*
 * If this is a read-only transport, return without creating a send
 * queue thread and setting up any connection events in our queue.
 */

/*
 * Once the transport is fully initialized, create a send queue thread
 * and start any connect events flowing to complete our initialization.
 */
	"failed to create thread for transport %u", xip->xi_id);
/*
 * If the transport is not being opened to accept an inbound connect,
 * start an outbound connection by enqueuing a SYN event for our peer.
 */

/*
 * Remove the transport from global visibility, cancel its send-side
 * thread, join with it, and then remove the transport from module
 * visibility. Once all this is done, destroy and free the transport.
 */

/*
 * Release every case handle in the module that was cached by this
 * transport. This will result in these cases disappearing from the
 * local case hash so that fmd_case_uuclose() can no longer be used.
 */

/*
 * Destroy every class in the various subscription hashes and remove
 * any corresponding subscriptions from the event dispatch queue.
 */

/*
 * Since our statistics are created by hand, after deleting them from
 * the ustat hash we must manually free them and any embedded strings.
 */
	for (i = 0; i < n; i++, sp++) {
/*
 * Grab the transport lock and set the busy flag to indicate we are
 * busy receiving an event. If [DI]SUSPEND is pending, wait until fmd
 * resumes the transport before continuing on with the receive.
 */
	return; /* fmd_destroy() is in progress */
	"required \"%s\" payload element", (void *)nvl, FM_CLASS);
/*
 * If a time-to-live value is present in the event and is zero, drop
 * the event and bump xs_timeouts. Otherwise decrement the TTL value.
 */
	"timeout: event received with ttl=0\n",
/*
 * If we are using the native system clock, the underlying transport
 * code can provide a tighter event time bound by telling us when the
 * event was enqueued. If we're using simulated clocks, this time
 * has no meaning to us, so just reset the value to use HRT_NOW.
 */

/*
 * If an event's class is in the FMD_CTL_CLASS family, then create a
 * control event. If a FMD_EVN_TOD member is found, create a protocol
 * event using this time. Otherwise create a protocol event using hrt.
 */

/*
 * If the debug log is enabled, create a temporary event, log it to the
 * debug log, and then reset the underlying state of the event.
 */

/*
 * Iterate over the rules for the current state trying to match the
 * event class to one of our special rules. If a rule is matched, the
 * event is consumed and not dispatched to other modules. If the rule
 * set ends without matching an event, we fall through to dispatching.
 */

/*
 * Record the event in the errlog if it is an ereport. This code will
 * be replaced later with a per-transport intent log instead.
 */

/*
 * If a list.suspect event is received, create a case for the specified
 * UUID in the case hash, with the transport module as its owner. If
 * the UUID is already known, fmd_case_recreate() will return NULL and
 * we simply proceed to our normal event handling regardless.
 */

/*
 * Insert the specified class into our remote subscription hash. If the class
 * is already present, bump the reference count; otherwise add it to the hash
 * and then enqueue an event for our remote peer to proxy our subscription.
 */
	return; /* read-only transports do not proxy subscriptions */
	return; /* transport is not yet an active subscriber */
	return; /* we've already asked our peer for this subscription */

/*
 * Delete the specified class from the remote subscription hash. If the
 * reference count drops to zero, ask our remote peer to unsubscribe by proxy.
 */
	return; /* read-only transports do not proxy subscriptions */
	return; /* transport is not yet an active subscriber */

/*
 * If the subscription reference count drops to zero in xi_rsub, insert
 * an entry into the xi_usub hash indicating we await an unsuback event.
 */
	return; /* other subscriptions for this class still active */
	return; /* already suspended */
	return; /* not ready to be resumed */