ntwdt.c revision 03831d35f7499c87d51205817c93e9a8d42c4bae
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright 2005 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
#pragma ident "%Z%%M% %I% %E% SMI"
/*
* ntwdt driver
* ------------
*
* Subsystem Overview
* ------------------
*
* This is a pseudo driver for the Netra-1280 watchdog
* timer (WDT). It provides for an *application-driven*
* WDT (AWDT), not a traditional, hardware-based WDT. A
* hardware-based feature is already present on the
* Netra-1280, and it is referred to here as the
* System WDT (SWDT).
*
* ScApp and Solaris cooperate to provide either a SWDT or
* an AWDT; they are mutually exclusive. Once in AWDT
* mode, one can only transition to SWDT mode via a reboot.
* This obviously gives priority to the AWDT and was done
* to handle scenarios where the customer might temporarily
* terminate their wdog-app in order to do some debugging,
* or even to load a new version of the wdog-app.
*
* The wdog-app open()s this driver's device node
* and then issues ioctls to control the state of the AWDT.
* The ioctls are implemented by this driver. Only one
* concurrent instance of open() is allowed. On the close(),
* a watchdog timer still in progress is NOT terminated.
* This allows the global state machine to monitor the
* progress of a Solaris reboot. ScApp will reset Solaris
* if the reboot latency is larger than the current AWDT timeout.
*
* The rationale for implementing an AWDT (vs a SWDT) is
* that it is more sensitive to system outage scenarios than
* a SWDT. Eg, a system could be in such a failed state that
* even though its clock-interrupt could still run (and the
* SWDT's watchdog timer therefore re-armed), the system could
* in effect have a corrupt or very poor dispatch latency.
* An AWDT would be sensitive to dispatch latency issues, as
* well as problems with its own execution (eg, a hang or
* crash).
*
* Subsystem Interface Overview
* ----------------------------
*
* This pseudo-driver does not have any 'extern' functions.
*
* All system interaction is done via the traditional driver
* entry points (eg, attach(9e), _init(9e)).
*
* All interaction with user is via the entry points in the
* 'struct cb_ops' vector (eg, open(9e), ioctl(9e), and
* close(9e)).
*
* Subsystem Implementation Overview
* ---------------------------------
*
* ScApp and Solaris (eg, ntwdt) cooperate so that a state
* machine global to ScApp and ntwdt is either in AWDT mode
* or in SWDT mode. These two peers communicate via the SBBC
* Mailbox that resides in IOSRAM (SBBC_MAILBOX_KEY).
* They use two new mailbox messages (LW8_MBOX_WDT_GET and
* LW8_MBOX_WDT_SET) and one new event (LW8_EVENT_SC_RESTARTED).
*
* ntwdt implements the AWDT by implementing a "virtual
* WDT" (VWDT). Eg, the watchdog timer is not a traditional
* counter in hardware, it is a variable in ntwdt's
* softstate. The wdog-app's actions cause changes to this
* and other variables in ntwdt's softstate.
*
* The wdog-app uses the LOMIOCDOGTIME ioctl to specify
* the number of seconds in the watchdog timeout (and
* therefore the VWDT). The wdog-app then uses the
* LOMIOCDOGCTL ioctl to enable the wdog. This causes
* ntwdt to create a Cyclic that will both decrement
* the VWDT and check to see if it has expired. To keep
* the VWDT from expiring, the wdog-app uses the
* LOMIOCDOGPAT ioctl to re-arm (or "pat") the watchdog.
* This sets the VWDT value to that specified in the
* last LOMIOCDOGTIME ioctl. The wdog-app can use the
* LOMIOCDOGSTATE ioctl to query the state of the VWDT.
*
* The wdog-app can also specify how Recovery is to be
* done. The only choice is whether to do a crashdump
* or not. If ntwdt computes a VWDT expiration, then
* ntwdt initiates the Recovery, else ScApp will. Eg,
* a hang in Solaris will be sensed by ScApp and not
* ntwdt. The wdog-app specifies the Recovery policy
* via the DOGCTL ioctl.
*
* Timeout Expiration
* ------------------
* In our implementation, ScApp senses a watchdog
* expiration the same way it historically has:
* by reading a well-known area of IOSRAM (SBBC_TOD_KEY)
* to see if the timestamp associated with a
* Solaris-generated "heartbeat" field is older
* than the currently specified timeout (which is
* also specified in this same IOSRAM section).
*
* What is different when ntwdt is running is that
* ntwdt is responsible for updating the Heartbeat,
* and not the normal client (todsg). When ntwdt
* puts the system in AWDT mode, it disables todsg's
* updating of the Heartbeat by changing the state of
* a pair of kernel tunables (watchdog_activated and
* watchdog_enable). ntwdt then takes responsibility
* for updating the Heartbeat. It does this by
* updating the Heartbeat from the Cyclic that is
* created when the user enables the AWDT (DOGCTL)
* or specifies a new timeout value (DOGTIME).
*
* As long as the AWDT is enabled, ntwdt will update
* the real system Heartbeat. As a result, ScApp
* will conclude that Solaris is still running. If
* the user stops re-arming the VWDT or Solaris
* hangs (eg), ntwdt will stop updating the Heartbeat.
*
* Note that ntwdt computes expiration via the
* repeatedly firing Cyclic, and ScApp computes
* expiration via a cessation of Heartbeat update.
* Since Heartbeat update stops once user stops
* re-arming the VWDT (ie, DOGPAT ioctl), ntwdt
* will compute a timeout at t(x), and ScApp will
* compute a timeout at t(2x), where 'x' is the
* current timeout value. When ntwdt computes
* the expiration, ntwdt masks this asymmetry.
*
* Lifecycle Events
* ----------------
*
* ntwdt only handles one of the coarse-grained
* "lifecycle events" (eg, entering OBP, shutdown,
* power-down, DR) that are possible during a Solaris
* session: a panic. (Note that ScApp handles one
* of the others: "entering OBP"). Other than these,
* a user choosing such a state transition must first
* use the wdog-app to disable the watchdog, else
* an expiration could occur.
*
* Solaris handles a panic by registering a handler
* that's called during the panic. The handler will
* set the watchdog timeout to the value specified
* in the NTWDT_BOOT_TIMEOUT_PROP driver Property.
* Again, this value should be greater than the actual
* time needed to write a crashdump and reboot Solaris.
*
* When the user enters OBP via the System Controller,
* ScApp will disable the watchdog (from ScApp's
* perspective), but it will not communicate this to
* ntwdt. After having exited OBP, the wdog-app can
* be used to enable or disable the watchdog (which
* will get both ScApp and ntwdt in-sync).
*
* Locking
* -------
*
* ntwdt has code running at three interrupt levels as
* well as base level.
*
* The ioctls run at base level in User Context. The
* driver's entry points run at base level in Kernel
* Context.
*
* ntwdt's three interrupt levels are used by:
*
* o LOCK_LEVEL :
* the Cyclic used to manage the VWDT is initialized
* to CY_LOCK_LEVEL
*
* o DDI_SOFTINT_MED :
* the SBBC mailbox implementation registers the
* specified handlers at this level
*
* o DDI_SOFTINT_LOW :
* this level is used by two handlers. One handler
* is triggered by the LOCK_LEVEL Cyclic. The other
* handler is triggered by the DDI_SOFTINT_MED
* handler registered to handle SBBC mailbox events.
*
* The centralizing concept is that the ntwdt_wdog_mutex
* in the driver's softstate is initialized to have an
* interrupt-block-cookie corresponding to DDI_SOFTINT_LOW.
*
* As a result, any base level code grabs ntwdt_wdog_mutex
* before doing work. Also, any handler running at interrupt
* level higher than DDI_SOFTINT_LOW "posts down" so that
* a DDI_SOFTINT_LOW handler is responsible for executing
* the "real work". Each DDI_SOFTINT_LOW handler also
* first grabs ntwdt_wdog_mutex, and so base level is
* synchronized with all interrupt levels.
*
* Note there's another mutex in the softstate: ntwdt_mutex.
* This mutex has few responsibilities. However, this
* locking order must be followed: ntwdt_wdog_mutex is
* held first, and then ntwdt_mutex. This choice results
* from the fact that the number of dynamic call sites
* for ntwdt_wdog_mutex is MUCH greater than that of
* ntwdt_mutex. As a result, almost all uses of
* ntwdt_wdog_mutex do not even require ntwdt_mutex to
* be held, which saves resources.
*
* Driver Properties
* -----------------
*
* "ddi-forceattach=1;"
* ------------------
*
* Using this allows our driver to be automatically
* loaded at boot-time AND to not be removed from memory
* solely due to memory-pressure.
*
* Being loaded at boot allows ntwdt to (as soon as
* possible) tell ScApp of the current mode of the
* state-machine (eg, SWDT). This is needed for the case
* when Solaris is re-loaded while in AWDT mode; having
* Solaris communicate ASAP with ScApp reduces the duration
* of any "split-brain" scenario where ScApp and Solaris
* are not in the same mode.
*
* Having ntwdt remain in memory even after a close()
* allows ntwdt to answer any SBBC mailbox commands
* that ScApp sends (as the mailbox infrastructure is
* not torn down until ntwdt is detach()'d). Specifically,
* ScApp could be re-loaded after AWDT mode had been
* entered and the wdog-app had close()'d ntwdt. ScApp
* will then eventually send a LW8_EVENT_SC_RESTARTED
* mailbox event in order to learn the current state of
* the state machine. Having ntwdt remain loaded ensures
* that this event never goes unanswered.
*
* "ntwdt-boottimeout=600;"
* ----------------------
*
* This specifies the watchdog timeout value (in seconds) to
* use while Solaris reboots.
*
* ntwdt will update ScApp by setting the watchdog timeout
* to the specified number of seconds when either a) Solaris
* panics or b) the VWDT expires. Note that this is only done
* if the user has chosen to enable Reset.
*
* ntwdt boundary-checks the specified value, and if out-of-range,
* it initializes the watchdog timeout to a default value of
* NTWDT_DEFAULT_BOOT_TIMEOUT seconds. Note that this is a
* default value and is not a *minimum* value. The valid range
* for the watchdog timeout is between one second and
* NTWDT_MAX_TIMEOUT seconds, inclusive.
*
* If ntwdt-boottimeout is set to a value less than an actual
* Solaris boot's latency, ScApp will reset Solaris during boot.
* Note that a continuous series of ScApp-induced resets will
* not occur; ScApp only resets Solaris on the first transition
* into the watchdog-expired state.
*/
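/*
 * Illustrative sketch only (not part of the driver): a user-space
 * model of the t(x) versus t(2x) asymmetry described under
 * "Timeout Expiration" above. All sim_* names are hypothetical.
 * It assumes a one-second tick, that the heartbeat is refreshed on
 * every tick until the VWDT expires, and that ScApp declares an
 * expiration once the heartbeat is 'timeout' seconds old.
 */
```c
#include <assert.h>

/* Result of one simulated run; times are seconds after the last pat. */
typedef struct {
	int ntwdt_expiry;	/* tick on which the VWDT reaches zero */
	int scapp_expiry;	/* tick on which the heartbeat is stale */
} sim_result_t;

static sim_result_t
sim_expiry(int timeout)
{
	sim_result_t r = { -1, -1 };
	int secs_remaining = timeout;	/* VWDT, armed by the pat at t=0 */
	int last_heartbeat = 0;		/* time of the newest heartbeat */
	int t;

	if (timeout <= 0)
		return (r);		/* out of range: nothing to model */

	for (t = 1; r.scapp_expiry < 0; t++) {
		if (r.ntwdt_expiry < 0) {
			/* tick still runs: refresh heartbeat, then count down */
			last_heartbeat = t;
			if (--secs_remaining == 0)
				r.ntwdt_expiry = t;
		} else if (t - last_heartbeat >= timeout) {
			/* no refresh since the VWDT expired */
			r.scapp_expiry = t;
		}
	}
	return (r);
}
```
/*
 * With a timeout of x seconds the model reports the driver-side
 * expiration at x and the heartbeat-based expiration at 2x, which
 * is the gap the driver hides by acting first.
 */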
#include <sys/ddi_impldefs.h>
#include <sys/lw8_impl.h>
#include <sys/sgsbbc_iosram.h>
#include <sys/sgsbbc_mailbox.h>
#include <sys/mem_config.h>
/*
* tunables
*/
int ntwdt_disable_timeout_action = 0;
#ifdef DEBUG
/*
* tunable to simulate a Solaris hang. If it is non-zero, then
* no system heartbeats ("hardware patting") will be done,
* even though all AWDT machinery is functioning OK.
*/
int ntwdt_stop_heart;
#endif
/*
* Driver Property
*/
#define NTWDT_BOOT_TIMEOUT_PROP "ntwdt-boottimeout"
/*
* watchdog-timeout values (in seconds):
*
* NTWDT_DEFAULT_BOOT_TIMEOUT: the default value used if
* this driver is aware of the
* reboot.
*
* NTWDT_MAX_TIMEOUT: max value settable by app (via the
* LOMIOCDOGTIME ioctl)
*/
#define NTWDT_CYCLIC_CHK_PERCENT (20)
#define NTWDT_MINOR_NODE "awdt"
#define NTWDT_SUCCESS 0
#define NTWDT_FAILURE 1
typedef struct {
/* MBOX_EVENT_LW8 that is sent in IOSRAM Mailbox: */
static ddi_softintr_t ntwdt_mbox_softint_id;
/*
* VWDT (i.e., Virtual Watchdog Timer) state
*/
typedef struct {
int ntwdt_wdog_enabled; /* wdog enabled ? */
int ntwdt_reset_enabled; /* reset enabled ? */
int ntwdt_timer_running; /* wdog running ? */
int ntwdt_wdog_expired; /* wdog expired ? */
int ntwdt_is_initial_enable; /* 1st wdog-enable? */
} ntwdt_wdog_t;
/* ntwdt_wdog_flags */
#define NTWDT_FLAG_SET(p, f)\
((p)->ntwdt_wdog_flags |= NTWDT_FLAG_##f)
#define NTWDT_FLAG_CLR(p, f)\
((p)->ntwdt_wdog_flags &= ~NTWDT_FLAG_##f)
/* softstate */
typedef struct {
int ntwdt_open_flag; /* file open ? */
static void *ntwdt_statep; /* softstate */
static dev_info_t *ntwdt_dip;
/*
* if non-zero, then the app-wdog feature is available on
* this system configuration.
*/
static int ntwdt_watchdog_available;
/*
* if non-zero, then application has used the LOMIOCDOGCTL
* ioctl at least once in order to Enable the app-wdog.
* Also, if this is non-zero, then system is in AWDT mode,
* else it is in SWDT mode.
*/
static int ntwdt_watchdog_activated;
void **result);
static void ntwdt_reprogram_wd(ntwdt_state_t *);
static void ntwdt_start_timer(ntwdt_state_t *);
static void ntwdt_stop_timer(void *);
static void ntwdt_stop_timer_lock(void *arg);
static void ntwdt_remove_callbacks();
static void ntwdt_cyclic_pat(void *arg);
static void ntwdt_enforce_timeout();
static void ntwdt_pat_hw_watchdog();
static int ntwdt_read_props(ntwdt_state_t *);
static int ntwdt_add_mbox_handlers(ntwdt_state_t *);
static int ntwdt_remove_mbox_handlers(void);
static int ntwdt_chk_wdog_support();
static int ntwdt_chk_sc_support();
static int ntwdt_set_swdt_state();
static void ntwdt_swdt_to_awdt(ntwdt_wdog_t *);
#ifdef DEBUG
#endif
struct cb_ops ntwdt_cb_ops = {
ntwdt_open, /* open */
ntwdt_close, /* close */
nulldev, /* strategy */
nulldev, /* print */
nulldev, /* dump */
nulldev, /* read */
nulldev, /* write */
ntwdt_ioctl, /* ioctl */
nulldev, /* devmap */
nulldev, /* mmap */
nulldev, /* segmap */
nochpoll, /* poll */
ddi_prop_op, /* cb_prop_op */
NULL, /* streamtab */
};
DEVO_REV, /* Devo_rev */
0, /* Refcnt */
ntwdt_info, /* Info */
nulldev, /* Identify */
nulldev, /* Probe */
ntwdt_attach, /* Attach */
ntwdt_detach, /* Detach */
nodev, /* Reset */
&ntwdt_cb_ops, /* Driver operations */
0, /* Bus operations */
NULL /* Power */
};
&mod_driverops, /* This one is a driver */
"ntwdt-Netra-T12 v%I%", /* Name of the module. */
&ntwdt_ops, /* Driver ops */
};
static struct modlinkage modlinkage = {
};
/*
* Flags to set in ntwdt_debug.
*
* Use either the NTWDT_DBG or NTWDT_NDBG macros
*/
/* used in non-debug version of module */
#ifdef DEBUG
typedef struct {
} ntwdt_data_t;
/* used in debug version of module */
#else
#endif
int
_init(void)
{
int error = 0;
/* Initialize the soft state structures */
sizeof (ntwdt_state_t), 1)) != 0) {
return (error);
}
/* Install the loadable module */
}
return (error);
}
int
{
}
int
_fini(void)
{
int error;
if (error == 0) {
}
return (error);
}
static int
{
int instance;
switch (cmd) {
case DDI_ATTACH:
break;
case DDI_RESUME:
return (DDI_SUCCESS);
default:
return (DDI_FAILURE);
}
/* see if app-wdog is supported on our config */
if (ntwdt_chk_wdog_support() != 0)
return (DDI_FAILURE);
/* (unsolicitedly) send SWDT state to ScApp via mailbox */
!= DDI_SUCCESS) {
return (DDI_FAILURE);
}
MUTEX_DRIVER, NULL);
/*
* Initialize the watchdog structure
*/
/*
* Create an iblock-cookie so that ntwdt_wdog_mutex can be
* used at User Context and Interrupt Context.
*/
"for ntwdt_wdog_mutex");
goto err1;
} else {
(void *)wdog_state->ntwdt_wdog_mtx_cookie);
}
MUTEX_DRIVER, NULL);
/* Cyclic fires once per second: */
/* interpret our .conf file. */
(void) ntwdt_read_props(ntwdt_ptr);
/* init the Cyclic that drives the VWDT */
/* Register handler for SBBC Mailbox events */
goto err2;
/* Softint that will be triggered by Cyclic that drives VWDT */
!= DDI_SUCCESS) {
goto err3;
}
/* Register callbacks for various system events, e.g. panic */
/*
* Create Minor Node as last activity. This prevents
* application from accessing our implementation until it
* is initialized.
*/
goto err4;
}
/* Display our driver info in the banner */
return (DDI_SUCCESS);
err4:
err3:
err2:
err1:
return (DDI_FAILURE);
}
/*
* Do static checks to see if the app-wdog feature is supported in
* the current configuration.
*
* If the kernel debugger was booted, then we disallow the app-wdog
* feature, as we assume the user will be interested more in
* debuggability of system than its ability to support an app-wdog.
* (Note that the System Watchdog (SWDT) can still be available).
*
* If the currently loaded version of ScApp does not understand one
* of the IOSRAM mailbox messages that is specific to the app-wdog
* protocol, then we disallow use of the app-wdog feature (else
* we could have a "split-brain" scenario where Solaris supports
* app-wdog but ScApp doesn't).
*
* Note that there is no *dynamic* checking of whether ScApp supports
* the wdog protocol. Eg, if a new version of ScApp was loaded out
* from under Solaris, then once in AWDT mode, Solaris has no way
* of knowing that (a possibly older version of) ScApp was loaded.
*/
static int
{
int rv;
"application watchdog is not available.");
return (retval);
}
/*
* if ScApp does not support the MBOX_GET cmd, then
* it does not support the app-wdog feature. Also,
* if there is *any* type of SBBC Mailbox error at
* this point, we will disable the app watchdog
* feature.
*/
if ((rv = ntwdt_chk_sc_support()) != 0) {
"the application watchdog feature.");
else
"application watchdog is not available.");
} else {
retval = 0;
}
return (retval);
}
/*
* Check to see if ScApp supports the app-watchdog feature.
*
* Do this by sending one of the mailbox commands that is
* specific to the app-wdog protocol. If ScApp does not
* return an error code, we will assume it understands it
* (as well as the remainder of the app-wdog protocol).
*
* Notes:
* ntwdt_lomcmd() will return EINVAL if ScApp does not
* understand the message. The underlying sbbc_mbox_
* utility function returns SG_MBOX_STATUS_ILLEGAL_PARAMETER
* ("illegal ioctl parameter").
*/
static int
{
}
static int
{
return (DDI_FAILURE);
}
switch (cmd) {
case DDI_SUSPEND:
return (DDI_SUCCESS);
case DDI_DETACH:
/*
* release resources in the opposite (LIFO) order from
* that in which they were allocated in attach(9e).
*/
ntwdt_stop_timer_lock((void *)ntwdt_ptr);
sizeof (ntwdt_wdog_t));
return (DDI_SUCCESS);
default:
return (DDI_FAILURE);
}
}
/*
* Register the SBBC Mailbox handlers.
*
* Currently, only one handler is used. It processes the MBOX_EVENT_LW8
* Events that are sent by ScApp. Of the Events that are sent, only
* the Event declaring that ScApp is coming up from a reboot
* (LW8_EVENT_SC_RESTARTED) is processed.
*
* sbbc_mbox_reg_intr registers the handler so that it executes at
* a DDI_SOFTINT_MED priority.
*/
static int
{
int err;
/*
* We need two interrupt handlers to handle the SBBC mbox
* events. The sbbc_mbox_xxx implementation will
* trigger our ntwdt_event_data_handler, which itself will
* trigger our ntwdt_mbox_softint. As a result, we'll
* register ntwdt_mbox_softint first, to ensure it cannot
* be called until its caller (ntwdt_event_data_handler)
* is registered.
*/
/*
* add the softint that will do the real work of handling the
* LW8_EVENT_SC_RESTARTED event sent from ScApp.
*/
return (DDI_FAILURE);
}
/*
* Register an interrupt handler with the SBBC mailbox utility.
* This handler will get called for every MBOX_EVENT_LW8
* event, whatever its type. However, it will only conditionally
* trigger the worker-handler (ntwdt_mbox_softint).
*/
if (err != 0) {
" handler. err=%d", err);
return (DDI_FAILURE);
}
return (DDI_SUCCESS);
}
/*
* Unregister the SBBC Mailbox handlers that were registered
* by ntwdt_add_mbox_handlers.
*/
static int
{
int rv = DDI_SUCCESS;
int err;
/*
* unregister the two handlers that cooperate to handle
* the LW8_EVENT_SC_RESTARTED event. Note that they are unregistered
* in LIFO order (as compared to how they were registered).
*/
if (err != 0) {
"handler. Err=%d", err);
rv = DDI_FAILURE;
}
/* remove the associated softint */
return (rv);
}
static int
{
int instance;
int error = DDI_SUCCESS;
return (DDI_FAILURE);
switch (infocmd) {
case DDI_INFO_DEVT2DEVINFO:
else
error = DDI_FAILURE;
break;
case DDI_INFO_DEVT2INSTANCE:
break;
default:
error = DDI_FAILURE;
}
return (error);
}
/*
* Open the device this driver manages.
*
* Ensure the caller is a privileged process, else
* a non-privileged user could cause denial-of-service
* and/or negatively impact reliability/availability.
*
* Ensure there is only one concurrent open().
*/
static int
{
int ret = 0;
/* ensure caller is a privileged process */
return (EPERM);
/*
* Check for a Deferred Attach scenario.
* Return ENXIO so DDI framework will call
* attach() and then retry the open().
*/
return (ENXIO);
if (ntwdt_ptr->ntwdt_open_flag != 0)
else
return (ret);
}
/*
* Close the device this driver manages.
*
* Notes:
*
* The close() can happen while the AWDT is running !
* (and nothing is done, eg, to disable the watchdog
* or to stop updating the system heartbeat). This
* is the desired behavior, as this allows for the
* case of monitoring a Solaris reboot in terms
* of watchdog expiration.
*/
static int
{
return (ENXIO);
if (ntwdt_ptr->ntwdt_open_flag != 0) {
ntwdt_ptr->ntwdt_open_flag = 0;
}
return (0);
}
static int
{
int retval = 0;
return (ENXIO);
if (ntwdt_watchdog_available == 0)
return (ENXIO);
switch (cmd) {
case LOMIOCDOGSTATE: {
/*
* Return the state of the AWDT to the application.
*/
sizeof (lom_dogstate_t), mode) != 0) {
}
break;
}
case LOMIOCDOGCTL: {
/*
* Allow application to control whether watchdog
* is {dis,en}abled and whether Reset is
* {dis,en}abled.
*/
sizeof (lom_dogctl_t), mode) != 0) {
break;
}
if (wdog_state->ntwdt_wdog_timeout == 0) {
/*
* then LOMIOCDOGTIME has never been used
* to setup a valid timeout.
*/
goto end;
}
/*
* Return error for the nonsensical combination:
* "enable Reset" and "disable watchdog".
*/
if (lom_dogctl.dog_enable == 0 &&
lom_dogctl.reset_enable != 0) {
goto end;
}
/*
* Store the user-specified state in our softstate.
* Note that our implementation here is stateless.
* Eg, we do not disallow an "enable the watchdog"
* command when the watchdog is currently enabled.
* This is needed, at least in the case where
* ScApp disables the watchdog, but does not inform
* Solaris. As a result, an ensuing, unfiltered DOGCTL
* to enable the watchdog is required.
*/
if (wdog_state->ntwdt_wdog_enabled != 0) {
/*
* then user wants to enable watchdog.
* Arm the watchdog timer and start the
* Cyclic, if it is not running.
*/
if (wdog_state->ntwdt_timer_running == 0) {
}
} else {
/*
* user wants to disable the watchdog.
* Note that we do not set ntwdt_secs_remaining
* to zero; that could cause a false expiration.
*/
if (wdog_state->ntwdt_timer_running != 0) {
}
}
/*
* Send a permutation of mailbox commands to
* ScApp that describes the current state of the
* watchdog timer. Note that the permutation
* depends on whether this is the first
* Enabling of the watchdog or not.
*/
if (wdog_state->ntwdt_wdog_enabled != 0 &&
wdog_state->ntwdt_is_initial_enable == 0) {
/* switch from SWDT to AWDT mode */
/* Tell ScApp we're in AWDT mode */
}
/* Inform ScApp of the choices made by the app */
if (wdog_state->ntwdt_wdog_enabled != 0 &&
wdog_state->ntwdt_is_initial_enable == 0) {
/*
* Clear tod_iosram_t.tod_timeout_period,
* which is used in SWDT part of state
* machine. (If this field is non-zero,
* ScApp assumes that Solaris' SWDT is active).
*
* Clearing this is useful in case SC reboots
* while Solaris is running, as ScApp will read
* a zero and not assume SWDT is running.
*/
/* "the first watchdog-enable has been seen" */
}
break;
}
case LOMIOCDOGTIME: {
/*
* Allow application to set the period (in seconds)
* of the watchdog timeout.
*/
break;
}
lom_dogtime));
/* Ensure specified timeout is within range. */
if ((lom_dogtime == 0) ||
(lom_dogtime > NTWDT_MAX_TIMEOUT)) {
break;
}
/*
* If watchdog is currently running, re-arm the
* watchdog timeout with the specified value.
*/
if (wdog_state->ntwdt_timer_running != 0) {
}
/* Tell ScApp of the specified timeout */
break;
}
case LOMIOCDOGPAT: {
/*
* Allow user to re-arm ("pat") the watchdog.
*/
/*
* If watchdog is not enabled or underlying
* Cyclic timer is not running, exit.
*/
if (!(wdog_state->ntwdt_wdog_enabled &&
goto end;
if (wdog_state->ntwdt_wdog_expired == 0) {
/* then VWDT has not expired; re-arm it */
" %d seconds",
}
break;
}
#ifdef DEBUG
case NTWDTIOCPANIC: {
/*
* Use in unit/integration testing to test our
* panic-handler code.
*/
break;
}
case NTWDTIOCSTATE: {
/*
* Allow application to read wdog state from the
* SC (and *not* the driver's softstate).
*
* Return state of:
* o recovery-enabled
* o current timeout value
*/
int action;
int timeout;
int ret;
if (ret != NTWDT_SUCCESS) {
break;
}
sizeof (ntwdt_data_t), mode) != 0) {
}
break;
}
#endif
default:
break;
}
return (retval);
end:
return (retval);
}
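/*
 * Illustrative sketch only (not part of the driver): the two sanity
 * checks that the LOMIOCDOGCTL case above applies before it stores
 * the request. ex_dogctl_t and ex_check_dogctl are hypothetical
 * stand-ins, and returning EINVAL is an assumption; the listing
 * above does not show which error code the driver returns.
 */
```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two fields tested by the handler. */
typedef struct {
	int dog_enable;		/* non-zero: enable the watchdog */
	int reset_enable;	/* non-zero: reset on expiration */
} ex_dogctl_t;

/* Returns 0 if the request is acceptable, else an errno value. */
static int
ex_check_dogctl(const ex_dogctl_t *req, int wdog_timeout)
{
	/* LOMIOCDOGTIME has never been used to set a valid timeout */
	if (wdog_timeout == 0)
		return (EINVAL);

	/* "enable Reset" together with "disable watchdog" makes no sense */
	if (req->dog_enable == 0 && req->reset_enable != 0)
		return (EINVAL);

	return (0);
}
```
/*
 * Note that a repeated "enable" request is deliberately not rejected
 * here, which matches the stateless handling described in the case
 * above.
 */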
/*
* Arm the Virtual Watchdog Timer (VWDT).
*
* Assign the current watchdog timeout (ntwdt_wdog_timeout)
* to the softstate variable representing the watchdog
* timer (ntwdt_secs_remaining).
*
* To ensure (from ntwdt's perspective) that any actual
* timeout expiration is at least as large as the expected
* timeout, a flag (NTWDT_FLAG_SKIP_CYCLIC) is conditionally
* set here; that flag is checked in the Cyclic's softint.
*
* If the Cyclic has been started, the goal is to ignore
* the _next_ firing of the Cyclic, as that firing will
* NOT represent a full, one-second period. If the Cyclic
* has NOT been started yet, then do not ignore the next
* Cyclic's firing, as that's the First One, and it was
* programmed to fire at a specific time (see ntwdt_start_timer).
*/
static void
{
/* arm the watchdog timer (VWDT) */
if (wdog_state->ntwdt_timer_running != 0)
else
}
/*
* Switch from SWDT mode to AWDT mode.
*/
static void
{
/*
* Disable SWDT. If SWDT is currently active,
* display a message so user knows that SWDT Mode
* has terminated.
*/
if (watchdog_enable != 0 ||
watchdog_activated != 0)
watchdog_enable = 0;
watchdog_activated = 0;
/* "we are in AWDT mode" */
}
/*
* This is the Cyclic that runs at a multiple of the
* AWDT's watchdog-timeout period. This Cyclic runs at
* LOCK_LEVEL (ie, CY_LOCK_LEVEL) and will post a
* soft-interrupt in order to complete all processing.
*
* Executing at LOCK_LEVEL gives this function a high
* interrupt priority, while performing its work via
* a soft-interrupt allows for a consistent (eg, MT-safe)
* view of driver softstate between User and Interrupt
* context.
*
* Context:
* interrupt context: Cyclic framework calls at
* CY_LOCK_LEVEL (=> 10)
*/
static void
ntwdt_cyclic_pat(void *arg)
{
/* post-down to DDI_SOFTINT_LOW */
}
/*
* This is the soft-interrupt triggered by the AWDT
* Cyclic.
*
* This softint does all the work re: computing whether
* the VWDT expired. It grabs ntwdt_wdog_mutex
* so User Context code (eg, the IOCTLs) cannot run,
* and then it tests whether the VWDT expired. If it
* hasn't, it decrements the VWDT timer by the amount
* of the Cyclic's period. If the timer has expired,
* it initiates Recovery (based on what user specified
* in LOMIOCDOGCTL).
*
* This function also updates the normal system "heartbeat".
*
* Context:
* interrupt-context: DDI_SOFTINT_LOW
*/
static uint_t
ntwdt_cyclic_softint(char *arg)
{
if ((wdog_state->ntwdt_wdog_flags &
NTWDT_FLAG_SKIP_CYCLIC) != 0) {
/*
* then skip all processing by this interrupt.
* (see ntwdt_arm_vwdt()).
*/
goto end;
}
if (wdog_state->ntwdt_timer_running == 0 ||
(wdog_state->ntwdt_wdog_enabled == 0))
goto end;
/* re-arm ("pat") the hardware watchdog */
/* Decrement the VWDT and see if it has expired. */
if (--wdog_state->ntwdt_secs_remaining == 0) {
if (wdog_state->ntwdt_reset_enabled != 0) {
/*
* Update ScApp so that the new wdog-timeout
* value is as specified in the
* NTWDT_BOOT_TIMEOUT_PROP driver Property.
* This timeout is presumably larger than the
* actual Solaris reboot time. This will allow
* our forced-reboot to not cause an unplanned
* (series of) watchdog expiration(s).
*/
if (ntwdt_disable_timeout_action == 0)
} else {
wdog_state->ntwdt_wdog_enabled = 0;
/*
* Tell ScApp to disable wdog; this prevents
* the "2x-timeout" artifact. Eg, Solaris
* times-out at t(x) and ScApp times-out at t(2x),
* where (x==ntwdt_wdog_timeout).
*/
(void) ntwdt_set_cfgvar(LW8_WDT_PROP_WDT,
}
/* Schedule Callout to stop this Cyclic */
} else {
}
end:
return (DDI_INTR_CLAIMED);
}
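/*
 * Illustrative sketch only (not part of the driver): a user-space
 * model of how arming the VWDT and the once-per-second softint
 * interact through the skip flag. The ex_* names and the flag value
 * are hypothetical. Clearing the flag when a tick is skipped is an
 * assumption; the listing above shows only the test of the flag.
 */
```c
#include <assert.h>

#define	EX_FLAG_SKIP_CYCLIC	0x1	/* hypothetical flag value */

typedef struct {
	int timeout;		/* seconds, as set by the DOGTIME ioctl */
	int secs_remaining;	/* the VWDT itself */
	int timer_running;	/* non-zero once the tick is started */
	int expired;		/* non-zero after the VWDT hits zero */
	unsigned int flags;
} ex_vwdt_t;

/* Arm (or re-arm) the VWDT, as enabling or patting would. */
static void
ex_arm(ex_vwdt_t *w)
{
	w->secs_remaining = w->timeout;
	if (w->timer_running != 0)
		w->flags |= EX_FLAG_SKIP_CYCLIC;  /* next tick is partial */
	else
		w->flags &= ~EX_FLAG_SKIP_CYCLIC; /* first tick is a full one */
}

/* One firing of the tick. Returns 1 if the VWDT expired on this tick. */
static int
ex_tick(ex_vwdt_t *w)
{
	if ((w->flags & EX_FLAG_SKIP_CYCLIC) != 0) {
		w->flags &= ~EX_FLAG_SKIP_CYCLIC; /* assumed: used once */
		return (0);
	}
	if (w->timer_running == 0)
		return (0);
	if (--w->secs_remaining == 0) {
		w->expired = 1;
		return (1);
	}
	return (0);
}
```
/*
 * Skipping the tick that follows a re-arm means the model can only
 * expire late, never early, relative to the requested timeout.
 */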
/*
* Program the AWDT watchdog-timeout value to that specified
* in the NTWDT_BOOT_TIMEOUT_PROP driver Property. However,
* only do this if the AWDT is in the correct state.
*
* Caller's Context:
* o interrupt context: (from software-interrupt)
* o during a panic
*/
static void
{
/*
* Program the AWDT watchdog-timeout value only if the
* watchdog is enabled, the user wants to do recovery,
* ("reset is enabled") and the AWDT timer is currently
* running.
*/
if (wdog_state->ntwdt_wdog_enabled != 0 &&
wdog_state->ntwdt_reset_enabled != 0 &&
wdog_state->ntwdt_timer_running != 0) {
if (ddi_in_panic() != 0)
else
(void) ntwdt_set_cfgvar(LW8_WDT_PROP_TO,
}
}
/*
* This is the callback that was registered to run during a panic.
* It will set the watchdog-timeout value to the value specified
* in the NTWDT_BOOT_TIMEOUT_PROP driver Property.
*
* Note that unless this Property's value specifies a timeout
* that's larger than the actual reboot latency, ScApp will
* experience a timeout and initiate Recovery.
*/
static boolean_t
{
ASSERT(ddi_in_panic() != 0);
return (B_TRUE);
}
/*
* Initialize the Cyclic that is used to monitor the VWDT.
*/
static void
{
/*
* Init Cyclic so its first expiry occurs wdog-timeout
* seconds from the current, absolute time.
*/
wdog_state->ntwdt_wdog_expired = 0;
}
/*
* Stop the cyclic that is used to monitor the VWDT (and
* was Started by ntwdt_start_timer).
*
* Context: per the Cyclic API, cyclic_remove cannot be called
* from interrupt-context. Note that when this is
* called via a Callout, it's called from base level.
*/
static void
ntwdt_stop_timer(void *arg)
{
}
/*
* Stop the cyclic that is used to monitor the VWDT (and
* do it in a thread-safe manner).
*
* This is a wrapper function for the core function,
* ntwdt_stop_timer. Both functions are useful, as some
* callers will already have the appropriate mutex locked, and
* other callers will not.
*/
static void
ntwdt_stop_timer_lock(void *arg)
{
}
/*
* Add callbacks needed to react to major system state transitions.
*/
static void
{
/* register a callback that's called during a panic */
}
/*
* Remove callbacks added by ntwdt_add_callbacks.
*/
static void
{
}
/*
* Initiate a Reset (as a result of the VWDT timeout expiring).
*/
static void
{
if (ntwdt_disable_timeout_action != 0) {
return;
}
}
/*
* Interpret the Properties from driver's config file.
*/
static int
{
int boot_timeout;
/*
* interpret Property that specifies how long
* the watchdog-timeout should be set to when
* Solaris panics. Assumption is that this value
* is larger than the amount of time it takes
* to reboot and write crashdump. If not,
* ScApp could induce a reset, due to an expired
* watchdog-timeout.
*/
NTWDT_BOOT_TIMEOUT_PROP, -1);
} else {
": using default of %d seconds.",
}
return (DDI_SUCCESS);
}
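/*
 * Illustrative sketch only (not part of the driver): the boundary
 * check described for the "ntwdt-boottimeout" Property. The ex_*
 * names and both numeric limits are assumptions made for this
 * example; the real values of NTWDT_DEFAULT_BOOT_TIMEOUT and
 * NTWDT_MAX_TIMEOUT are not shown in this listing.
 */
```c
#include <assert.h>

#define	EX_DEFAULT_BOOT_TIMEOUT	600	/* assumed default, in seconds */
#define	EX_MAX_TIMEOUT		10800	/* assumed maximum, in seconds */

/*
 * Map a raw Property value to the boot timeout actually used.
 * A missing Property is modelled as -1, which is out of range and
 * therefore selects the default.
 */
static int
ex_boot_timeout(int prop_val)
{
	/* valid range is one second to the maximum, inclusive */
	if (prop_val < 1 || prop_val > EX_MAX_TIMEOUT)
		return (EX_DEFAULT_BOOT_TIMEOUT);
	return (prop_val);
}
```
/*
 * The default is a fallback, not a floor: an in-range value smaller
 * than the default is accepted unchanged.
 */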
/*
* Write state of SWDT to ScApp.
*
* Currently, this function is only called on attach()
* of our driver.
*
* Note that we do not need to call this function, eg,
* in response to a solicitation from ScApp (eg,
* the LW8_EVENT_SC_RESTARTED event).
*
* Context:
* called in Kernel Context
*/
static int
{
/*
* note that ScApp only needs this one
* variable when system is in SWDT mode.
*/
return (0);
}
/*
* Write all AWDT state to ScApp via the SBBC mailbox
* in IOSRAM. Note that the permutation of Writes
* is as specified in the design spec.
*
* Notes: caller must perform synchronization so that
* this series of Writes is consistent as viewed
* by ScApp (eg, there is no LW8_WDT_xxx mailbox
* command that contains "all Properties"; each
* Property must be written individually).
*/
static int
{
/* ScApp expects values in this order: */
ntwdt_watchdog_activated != 0);
return (NTWDT_SUCCESS);
}
/*
* Write a specified WDT Property (and Value) to ScApp.
*
* <Property, Value> is passed in the LW8_MBOX_WDT_SET
* (SBBC) mailbox message. The SBBC mailbox resides in
* IOSRAM.
*
* Note that this function is responsible for ensuring that
* a driver-specific representation of a mailbox <Value> is
* mapped into the representation that is expected by ScApp
* (eg, see LW8_WDT_PROP_RECOV).
*/
static int
{
int rv;
int mbox_val;
switch (var) {
case LW8_WDT_PROP_RECOV:
#ifdef DEBUG
#endif
break;
case LW8_WDT_PROP_WDT:
#ifdef DEBUG
#endif
break;
case LW8_WDT_PROP_TO:
#ifdef DEBUG
" %d seconds", val));
#endif
break;
case LW8_WDT_PROP_MODE:
#ifdef DEBUG
#endif
break;
default:
ASSERT(0);
}
if (rv != 0) {
}
return (rv);
}
static void
{
}
#ifdef DEBUG
/*
* Read a specified WDT Property from ScApp.
*
* <Property> is passed in the Request of the LW8_MBOX_WDT_GET
* (SBBC) mailbox message, and the Property's <Value>
* is returned in the message's Response. The SBBC mailbox
* resides in IOSRAM.
*/
static int
{
int rv;
if (rv != 0) {
} else {
switch (var) {
case LW8_WDT_PROP_RECOV:
*val));
break;
case LW8_WDT_PROP_WDT:
*val));
break;
case LW8_WDT_PROP_TO:
" %d seconds", *val));
break;
default:
ASSERT(0);
}
}
return (rv);
}
#endif
/*
* Update the real system "heartbeat", which resides in IOSRAM.
* This "heartbeat" is normally used in SWDT Mode, but when
* in AWDT Mode, ScApp also uses its value to determine if Solaris
* is up-and-running.
*/
static void
{
static uint32_t i_am_alive = 0;
#ifdef DEBUG
if (ntwdt_stop_heart != 0)
return;
#endif
/* Update the system heartbeat */
if (i_am_alive == UINT32_MAX)
i_am_alive = 0;
else
i_am_alive++;
i_am_alive));
(char *)&i_am_alive, sizeof (uint32_t))) {
"write heartbeat failed");
}
}
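/*
* Illustration only (not part of the driver): a minimal sketch
* of the wrap-around counter described above. The helper name
* ex_heartbeat_next is hypothetical. For an unsigned 32-bit
* value the explicit UINT32_MAX test yields the same result as
* a plain increment; it simply makes the wrap-around visible.
*/

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the heartbeat value that follows
 * 'current', wrapping from UINT32_MAX back to 0.
 */
static uint32_t
ex_heartbeat_next(uint32_t current)
{
	if (current == UINT32_MAX)
		return (0);
	return (current + 1);
}
```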
/*
* Write the specified value to the system's normal (IOSRAM)
* location that's used to specify Solaris' watchdog-timeout
* on Serengeti platforms.
*
* In SWDT Mode, this location can hold values [0,n).
* In AWDT Mode, this location must have value 0
* (otherwise, after a ScApp reboot, ScApp could mistakenly
* conclude that the system is in SWDT Mode).
*/
static int
{
int rv;
if (rv != 0)
return (rv);
}
/*
* Soft-interrupt handler that is triggered when ScApp wants
* to know the current state of the app-wdog.
*
* Grab ntwdt_wdog_mutex so that we synchronize with any
* concurrent User Context and Interrupt Context activity. Call
* a function that writes a permutation of the watchdog state
* to the SC, then release the mutex.
*
* We grab the mutex not only so that each variable is consistent
* but also so that the *permutation* of variables is consistent.
* I.e., any set of one or more variables (that we write to SC
* using multiple mailbox commands) will truly be seen as a
* consistent snapshot. Note that if our protocol had a MBOX_SET
* command that allowed writing all watchdog state in one
* command, then the lock-hold latency would be greatly reduced.
* To our advantage, this softint normally executes very
* infrequently.
*
* Context:
* called at Interrupt Context (DDI_SOFTINT_LOW)
*/
static uint_t
ntwdt_mbox_softint(char *arg)
{
/* tell ScApp state of AWDT */
return (DDI_INTR_CLAIMED);
}
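/*
* Illustration only (not part of the driver): a user-space
* sketch of the "consistent snapshot" idea described above,
* using a pthread mutex in place of the kernel mutex. All
* names (ex_prop, ex_set_prop, ex_send_snapshot, and so on)
* are hypothetical, the property order is arbitrary, and
* stopping at the first failed send is a choice made for this
* sketch rather than documented driver behavior.
*/

```c
#include <pthread.h>

/* Hypothetical property identifiers; not the real LW8_WDT_PROP_* values. */
enum ex_prop { EX_PROP_A, EX_PROP_B, EX_PROP_C, EX_PROP_COUNT };

/* Hypothetical per-property send routine; returns 0 on success. */
typedef int (*ex_send_fn)(enum ex_prop prop, int val);

static pthread_mutex_t ex_wdog_mutex = PTHREAD_MUTEX_INITIALIZER;
static int ex_wdog_state[EX_PROP_COUNT];

/* Writers change one property at a time while holding the lock. */
static void
ex_set_prop(enum ex_prop prop, int val)
{
	pthread_mutex_lock(&ex_wdog_mutex);
	ex_wdog_state[prop] = val;
	pthread_mutex_unlock(&ex_wdog_mutex);
}

/*
 * Send every property, one message each, without dropping the
 * lock, so no writer can change the state between messages.
 * Stops at the first failure and returns that error.
 */
static int
ex_send_snapshot(ex_send_fn send)
{
	int rv = 0;
	int i;

	pthread_mutex_lock(&ex_wdog_mutex);
	for (i = 0; i < EX_PROP_COUNT && rv == 0; i++)
		rv = send((enum ex_prop)i, ex_wdog_state[i]);
	pthread_mutex_unlock(&ex_wdog_mutex);
	return (rv);
}

/* Example sender that only records what it was asked to send. */
static int ex_sent[EX_PROP_COUNT];
static int ex_sent_count;

static int
ex_record_send(enum ex_prop prop, int val)
{
	ex_sent[prop] = val;
	ex_sent_count++;
	return (0);
}
```

/*
* The cost of this approach is the one noted above: the lock
* is held for as long as it takes to send every message.
*/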
/*
* Handle MBOX_EVENT_LW8 Events that are sent from ScApp.
*
* The only (sub-)type of Event we handle is the
* LW8_EVENT_SC_RESTARTED Event. We handle this by triggering
* a soft-interrupt only if we are in AWDT mode.
*
* ScApp sends this Event when it wants to learn the current
* state of the AWDT variables. Design-wise, this is used to
* handle the case where the SC reboots while the system is in
* AWDT mode (if the SC reboots in SWDT mode, then ScApp
* already knows all necessary info and therefore won't send
* this Event).
*
* Context:
* this function is called in Interrupt Context (at
* DDI_SOFTINT_MED) and conditionally triggers a softint that
* will run at DDI_SOFTINT_LOW. Note that this function
* executes at DDI_SOFTINT_MED due to how this handler was
* registered by the implementation of sbbc_mbox_reg_intr().
*
* Notes:
* Currently, the LW8_EVENT_SC_RESTARTED Event is only sent
* by SC when in AWDT mode.
*/
static uint_t
ntwdt_event_data_handler(char *arg)
{
return (DDI_INTR_CLAIMED);
}
return (DDI_INTR_CLAIMED);
}
switch (payload->event_type) {
case LW8_EVENT_SC_RESTARTED:
/*
* then SC probably was rebooted, and it therefore
* needs to know what the current state of AWDT is.
*/
"received in %s mode",
if (ntwdt_watchdog_activated != 0) {
/* then system is in AWDT mode */
}
break;
default:
break;
}
return (DDI_INTR_CLAIMED);
}
/*
* Send an SBBC Mailbox command to ScApp.
*
* Use the sbbc_mbox_request_response utility function to
* send the Request and receive the optional Response.
*
* Context:
* can be called from Interrupt Context or User Context.
*/
static int
{
int rv = 0;
switch (cmd) {
case LW8_MBOX_WDT_GET:
break;
case LW8_MBOX_WDT_SET:
break;
default:
return (EINVAL);
}
/* errors from sgsbbc */
if (resp->msg_status > 0) {
return (resp->msg_status);
}
/* errors from ScApp */
switch (resp->msg_status) {
/* illegal ioctl parameter */
return (EINVAL);
default:
return (EIO);
}
}
return (0);
}
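/*
* Illustration only (not part of the driver): a sketch of the
* status-to-errno mapping performed above. Positive status
* values are treated as errno values from the mailbox transport
* and returned unchanged; negative values are treated as ScApp
* errors and mapped to EINVAL or EIO. The function name, the
* constant EX_SC_ILLEGAL_PARAM and its value, and the use of 0
* for success are assumptions made for this example.
*/

```c
#include <errno.h>

/* Hypothetical stand-in for ScApp's "illegal parameter" status. */
#define	EX_SC_ILLEGAL_PARAM	(-1)

/*
 * Map a mailbox response status to the value handed back to
 * the caller.
 */
static int
ex_map_mbox_status(int msg_status)
{
	if (msg_status == 0)
		return (0);		/* assumed success value */

	/* errors from the transport layer: already an errno */
	if (msg_status > 0)
		return (msg_status);

	/* errors reported by the remote side */
	switch (msg_status) {
	case EX_SC_ILLEGAL_PARAM:
		return (EINVAL);
	default:
		return (EIO);
	}
}
```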