errorq.c revision 267b64d58a6a4b7a69129176bbf72eb5e4acf75c
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
/*
* Kernel Error Queues
*
* A common problem when handling hardware error traps and interrupts is that
* these errors frequently must be handled at high interrupt level, where
* reliably producing error messages and safely examining and manipulating
* other kernel state may not be possible. The kernel error queue primitive is
* a common set of routines that allow a subsystem to maintain a queue of
* errors that can be processed by an explicit call from a safe context or by a
* soft interrupt that fires at a specific lower interrupt level. The queue
* management code also ensures that if the system panics, all in-transit
* errors are logged prior to reset. Each queue has an associated kstat for
* observing the number of errors dispatched and logged, and mdb(1) debugging
* support is provided for live and post-mortem observability.
*
* Memory Allocation
*
* All of the queue data structures are allocated in advance as part of
* the errorq_create() call. No additional memory allocations are
* performed as part of errorq_dispatch(), errorq_reserve(),
* errorq_commit() or errorq_drain(). This design
* facilitates reliable error queue processing even when the system is low
* on memory, and ensures that errorq_dispatch() can be called from any
* context. When the queue is created, the maximum queue length is
* specified as a parameter to errorq_create() and errorq_nvcreate(). This
* length should represent a reasonable upper bound on the number of
* simultaneous errors. If errorq_dispatch() or errorq_reserve() is
* invoked and no free queue elements are available, the error is
* dropped and will not be logged. Typically, the queue will only be
* exhausted by an error storm, and in this case
* the earlier errors provide the most important data for analysis.
* When a new error is dispatched, the error data is copied into the
* preallocated queue element so that the caller's buffer can be reused.
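The preallocation scheme above can be sketched in userspace C. This is an illustrative model only, not the kernel implementation: pool_t, pool_claim(), and pool_dispatch() are invented names, and a C11 atomic word stands in for eq_bitmap. The point it demonstrates is that the error path performs no allocation, only an atomic bit claim plus a copy, and silently drops when the pool is exhausted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define POOL_QLEN	8	/* maximum simultaneous errors */
#define POOL_ELTSIZE	64	/* bytes recorded per element */

typedef struct pool {
	_Atomic unsigned long bitmap;		/* one bit per element; 1 = in use */
	char data[POOL_QLEN][POOL_ELTSIZE];	/* preallocated element buffers */
} pool_t;

/* Claim a free element; returns its index, or -1 if the pool is exhausted. */
static int
pool_claim(pool_t *p)
{
	for (;;) {
		unsigned long map = atomic_load(&p->bitmap);
		int i;
		for (i = 0; i < POOL_QLEN; i++) {
			if (!(map & (1UL << i)))
				break;
		}
		if (i == POOL_QLEN)
			return (-1);	/* no free elements */
		if (atomic_compare_exchange_weak(&p->bitmap, &map,
		    map | (1UL << i)))
			return (i);
		/* lost the race to another claimant; retry with fresh map */
	}
}

/* Copy error data into a claimed slot, zero-filling the remainder. */
static int
pool_dispatch(pool_t *p, const void *data, size_t len)
{
	int i;

	if (len > POOL_ELTSIZE)
		return (-1);
	i = pool_claim(p);
	if (i < 0)
		return (-1);	/* pool exhausted: the error is dropped */
	memset(p->data[i], 0, POOL_ELTSIZE);
	memcpy(p->data[i], data, len);
	return (i);
}
```

Because nothing on this path can block or allocate, the same code shape is callable from any context, which is exactly the property errorq_dispatch() needs.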
*
* When a new error is reserved, an element is moved from the free pool
* and returned to the caller. The element buffer data, eqe_data, may be
* managed by the caller and dispatched to the errorq by calling
* errorq_commit(). This is useful for additions to errorqs created
* with errorq_nvcreate() to handle name-value pair (nvpair) data.
* See below for a discussion of nvlist errorqs.
*
* Queue Drain Callback
*
* When the error queue is drained, the caller's queue drain callback is
* invoked with a pointer to the saved error data. This function may be
* called from passive kernel context or soft interrupt context at or
* below LOCK_LEVEL, or as part of panic(). As such, the callback should
* restrict itself to calling cmn_err() (but NOT with the CE_PANIC flag).
* The callback must not call panic(), attempt to allocate memory, or wait
* on a condition variable. The callback may not call errorq_destroy()
* or errorq_drain() on the same error queue that called it.
*
* The queue drain callback will always be called for each pending error
* in the order in which errors were enqueued (oldest to newest). The
* queue drain callback is guaranteed to provide at *least* once semantics
* for all errors that are successfully dispatched (i.e. for which
* errorq_dispatch() has successfully completed). If an unrelated panic
* occurs while the queue drain callback is running on a vital queue, the
* panic subsystem will continue the queue drain and the callback may be
* invoked again for the same error. Therefore, the callback should
* restrict itself to logging messages and taking other actions that are
* not destructive if repeated.
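A drain callback obeying these rules might look like the userspace sketch below. The my_errdata_t layout and my_drain_func name are invented, and the three-argument shape (private, data, element) merely mirrors the description above rather than reproducing the exact kernel typedef; fputs() to stderr stands in for cmn_err(). What matters is that it formats on the stack, allocates nothing, takes no locks, and is harmless if invoked a second time for the same error after an unrelated panic.

```c
#include <stdio.h>

typedef struct my_errdata {
	unsigned long ed_addr;	/* faulting address */
	unsigned int ed_code;	/* error code */
} my_errdata_t;

static int my_drain_calls;	/* visible to the usage example below */

static void
my_drain_func(void *private, void *data, void *eqep)
{
	my_errdata_t *ed = data;
	char msg[80];

	(void) private;
	(void) eqep;
	/* Format on the stack: no allocation, no locks, no panic(). */
	(void) snprintf(msg, sizeof (msg),
	    "hw error: code=%u addr=0x%lx", ed->ed_code, ed->ed_addr);
	(void) fputs(msg, stderr);
	(void) fputc('\n', stderr);
	my_drain_calls++;
}
```

Logging twice merely repeats a message, so at-least-once delivery is safe; anything destructive (freeing state, retiring a page twice) would not be.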
*
* Name-Value Pair Error Queues
*
* During error handling, it may be more convenient to store error
* queue element data as a fixed buffer of name-value pairs. The
* nvpair library allows construction and destruction of nvlists
* in pre-allocated memory buffers.
*
* Error queues created via errorq_nvcreate() store queue element
* data as fixed buffer nvlists (ereports). errorq_reserve()
* allocates an errorq element from eqp->eq_bitmap and returns a valid
* pointer to an errorq_elem_t (queue element) and a pre-allocated
* fixed buffer nvlist. errorq_elem_nvl() is used to gain access
* to the nvlist to add name-value ereport members prior to
* dispatching the error queue element in errorq_commit().
*
* Once dispatched, the drain function will return the element to
* eqp->eq_bitmap and reset the associated nv_alloc structure.
* errorq_cancel() may be called to cancel the reservation of an
* element that was never dispatched (committed). This is useful in
* cases where a programming error prevents a queue element from being
* dispatched.
*
* Queue Management
*
* The queue element structures and error data buffers are allocated in
* two contiguous chunks as part of errorq_create() or errorq_nvcreate().
* Each queue element structure contains a next pointer,
* a previous pointer, and a pointer to the corresponding error data
* buffer. The data buffer for a nvlist errorq is a shared buffer
* for the allocation of name-value pair lists. The elements are kept on
* one of four lists:
*
* Unused elements are kept in the free pool, managed by eqp->eq_bitmap.
* The eqe_prev and eqe_next pointers are not used while in the free pool
* and will be set to NULL.
*
* Pending errors are kept on the pending list, a singly-linked list
* pointed to by eqp->eq_pend, and linked together using eqe_prev. This
* list is maintained in order from newest error to oldest. The eqe_next
* pointer is not used by the pending list and will be set to NULL.
*
* The processing list is a doubly-linked list pointed to by eqp->eq_phead
* (the oldest element) and eqp->eq_ptail (the newest element). The
* eqe_next pointer is used to traverse from eq_phead to eq_ptail, and the
* eqe_prev pointer is used to traverse from eq_ptail to eq_phead. Once a
* queue drain operation begins, the current pending list is moved to the
* processing list in a two-phase commit fashion (eq_ptail being cleared
* at the beginning but eq_phead only at the end), allowing the panic code
* to always locate and process all pending errors in the event that a
* panic occurs in the middle of queue processing.
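The pending-to-processing handoff can be modeled in userspace with C11 atomics. This is a sketch of the technique, not the kernel code: elem_t, pend_push(), and pend_take_oldest_first() are invented names, and only the list manipulation is shown. Producers CAS elements onto a newest-to-oldest list linked by eqe_prev; the drainer grabs the whole list atomically, then builds the oldest-to-newest eqe_next chain, so every intermediate state still reaches all pending entries.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct elem {
	struct elem *eqe_prev;	/* next-older entry on the pending list */
	struct elem *eqe_next;	/* filled in at drain time, oldest-first */
	int eqe_val;
} elem_t;

static _Atomic(elem_t *) eq_pend;	/* newest pending entry, or NULL */

/* Producer side: push an element; safe from any number of threads. */
static void
pend_push(elem_t *e)
{
	elem_t *old;
	do {
		old = atomic_load(&eq_pend);
		e->eqe_prev = old;
	} while (!atomic_compare_exchange_weak(&eq_pend, &old, e));
}

/*
 * Drain side: atomically take the entire pending list, then walk the
 * eqe_prev chain backward to thread the eqe_next pointers, returning
 * the oldest element so callers can process in arrival order.
 */
static elem_t *
pend_take_oldest_first(void)
{
	elem_t *e = atomic_exchange(&eq_pend, NULL);	/* phase one */
	if (e == NULL)
		return (NULL);
	e->eqe_next = NULL;		/* newest entry ends the chain */
	while (e->eqe_prev != NULL) {	/* phase two */
		e->eqe_prev->eqe_next = e;
		e = e->eqe_prev;
	}
	return (e);	/* oldest entry; traverse via eqe_next */
}
```

In the real code the taken list is published through eq_ptail and eq_phead with memory barriers between the phases, which is what lets errorq_panic() classify whichever intermediate state a panic interrupts.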
*
* A fourth list is maintained for nvlist errorqs. The dump list,
* eq_dump, is used to link all errorq elements that should be stored
* in a crash dump file in the event of a system panic. During
* errorq_panic(), the list is created and subsequently traversed
* in errorq_dump() during the final phases of a crash dump.
*
* Platform Considerations
*
* In order to simplify their implementation, error queues make use of the
* C wrappers for compare-and-swap. If the platform itself does not
* support compare-and-swap in hardware and the kernel emulation routines
* are used instead, then the context in which errorq_dispatch() can be
* safely invoked is further constrained by the implementation of the
* compare-and-swap emulation. Specifically, if errorq_dispatch() is
* called from a code path that can be executed above ATOMIC_LEVEL on such
* a platform, the dispatch code could potentially deadlock unless the
* corresponding error interrupt is blocked or disabled prior to calling
* errorq_dispatch(). Error queues should therefore be deployed with
* caution on these platforms.
*
* Interfaces
*
* errorq_t *errorq_create(name, func, private, qlen, eltsize, ipl, flags);
* errorq_t *errorq_nvcreate(name, func, private, qlen, eltsize, ipl, flags);
*
* Create a new error queue with the specified name, callback, and
* properties. A pointer to the new error queue is returned upon success,
* or NULL is returned to indicate that the queue could not be created.
* This function must be called from passive kernel context with no locks
* held that can prevent a sleeping memory allocation from occurring.
* errorq_create() will return failure if the queue kstats cannot be
* created, or if a soft interrupt handler cannot be registered.
*
* The queue 'name' is a string that is recorded for live and post-mortem
* examination by a debugger. The queue callback 'func' will be invoked
* for each error drained from the queue, and will receive the 'private'
* pointer as its first argument. The callback must obey the rules for
* callbacks described above. The queue will have maximum length 'qlen'
* and each element will be able to record up to 'eltsize' bytes of data.
* The queue's soft interrupt (see errorq_dispatch(), below) will fire
* at 'ipl', which should not exceed LOCK_LEVEL. The queue 'flags' may
* include the following flag:
*
* ERRORQ_VITAL - This queue contains information that is considered
* vital to problem diagnosis. Error queues that are marked vital will
* be automatically drained by the panic subsystem prior to printing
* the panic messages to the console.
*
* void errorq_destroy(errorq);
*
* Destroy the specified error queue. The queue is drained of any
* pending elements and these are logged before errorq_destroy returns.
* Once errorq_destroy() begins draining the queue, any simultaneous
* calls to dispatch errors will result in the errors being dropped.
* The caller must invoke a higher-level abstraction (e.g. disabling
* an error interrupt) to ensure that error handling code does not
* attempt to dispatch errors to the queue while it is being freed.
*
* void errorq_dispatch(errorq, data, len, flag);
*
* Attempt to enqueue the specified error data. If a free queue element
* is available, the data is copied into a free element and placed on a
* pending list. If no free queue element is available, the error is
* dropped. The data length (len) is specified in bytes and should not
* exceed the queue's maximum element size. If the data length is less
* than the maximum element size, the remainder of the queue element is
* filled with zeroes. The flag parameter should be one of:
*
* ERRORQ_ASYNC - Schedule a soft interrupt at the previously specified
* IPL to asynchronously drain the queue on behalf of the caller.
*
* ERRORQ_SYNC - Do not schedule a soft interrupt to drain the queue.
* The caller is presumed to be calling errorq_drain() or panic() in
* the near future in order to drain the queue and log the error.
*
* The errorq_dispatch() function may be called from any context, subject
* to the Platform Considerations described above.
*
* void errorq_drain(errorq);
*
* Drain the error queue of all pending errors. The queue's callback
* function is invoked for each error in order from oldest to newest.
* This function may be used at or below LOCK_LEVEL or from panic context.
*
* errorq_elem_t *errorq_reserve(errorq);
*
* Reserve an error queue element for later processing and dispatching.
* The element is returned to the caller, who may add error-specific
* data to it. The element is returned to the free pool either when
* errorq_commit() is called and the element has been asynchronously
* processed, or immediately when errorq_cancel() is called.
*
* void errorq_commit(errorq, errorq_elem, flag);
*
* Commit an errorq element (eqep) for dispatching; see
* errorq_dispatch().
*
* void errorq_cancel(errorq, errorq_elem);
*
* Cancel a pending errorq element reservation. The errorq element is
* returned to the free pool upon cancellation.
*/
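To make the calling convention above concrete, here is a hypothetical userspace model of the interfaces. None of this is the kernel code: model_errorq_t, model_create(), model_dispatch(), and model_drain() are invented names, and a plain array replaces the lock-free lists. Only the visible semantics match the contract: capacity fixed at create time, silent drops when full, oldest-to-newest delivery on drain.

```c
#include <stdlib.h>
#include <string.h>

typedef void (*model_func_t)(void *private, const void *data);

typedef struct model_errorq {
	model_func_t meq_func;	/* drain callback */
	void *meq_private;	/* callback's first argument */
	size_t meq_eltsize;	/* bytes recorded per element */
	int meq_qlen;		/* maximum pending elements */
	int meq_npend;		/* current pending count */
	char *meq_data;		/* qlen * eltsize preallocated bytes */
} model_errorq_t;

static model_errorq_t *
model_create(model_func_t func, void *private, int qlen, size_t eltsize)
{
	model_errorq_t *eqp = calloc(1, sizeof (*eqp));
	if (eqp == NULL)
		return (NULL);
	eqp->meq_data = calloc(qlen, eltsize);
	if (eqp->meq_data == NULL) {
		free(eqp);
		return (NULL);
	}
	eqp->meq_func = func;
	eqp->meq_private = private;
	eqp->meq_qlen = qlen;
	eqp->meq_eltsize = eltsize;
	return (eqp);
}

/* Copy the error in, or drop it silently when the queue is full. */
static void
model_dispatch(model_errorq_t *eqp, const void *data, size_t len)
{
	if (eqp->meq_npend == eqp->meq_qlen || len > eqp->meq_eltsize)
		return;	/* dropped, as errorq_dispatch() drops when full */
	memcpy(eqp->meq_data + eqp->meq_npend * eqp->meq_eltsize, data, len);
	eqp->meq_npend++;
}

/* Invoke the callback for each pending error, oldest to newest. */
static void
model_drain(model_errorq_t *eqp)
{
	for (int i = 0; i < eqp->meq_npend; i++)
		eqp->meq_func(eqp->meq_private,
		    eqp->meq_data + i * eqp->meq_eltsize);
	eqp->meq_npend = 0;
}

/* A callback for the usage example: records values in arrival order. */
static int model_seen[8];
static int model_nseen;

static void
model_log_cb(void *private, const void *data)
{
	(void) private;
	model_seen[model_nseen++] = *(const int *)data;
}
```

A typical consumer would create the queue at attach time, dispatch from its error handler, and let the drain deliver in order, exactly as the errorq_create()/errorq_dispatch()/errorq_drain() descriptions above specify.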
#include <sys/errorq_impl.h>
#include <sys/sysmacros.h>
#include <sys/machlock.h>
#include <sys/bootconf.h>
#include <sys/compress.h>
static struct errorq_kstat errorq_kstat_template = {
{ "dispatched", KSTAT_DATA_UINT64 },
{ "dropped", KSTAT_DATA_UINT64 },
{ "logged", KSTAT_DATA_UINT64 },
{ "reserved", KSTAT_DATA_UINT64 },
{ "reserve_fail", KSTAT_DATA_UINT64 },
{ "committed", KSTAT_DATA_UINT64 },
{ "commit_fail", KSTAT_DATA_UINT64 },
{ "cancelled", KSTAT_DATA_UINT64 }
};
static uint64_t errorq_lost = 0;
static kmutex_t errorq_lock;
static uint_t
errorq_intr(caddr_t eqp)
{
errorq_drain((errorq_t *)eqp);
return (DDI_INTR_CLAIMED);
}
/*
* Create a new error queue with the specified properties and add a software
* interrupt handler and kstat for it. This function must be called from
* passive kernel context with no locks held that can prevent a sleeping
* memory allocation from occurring. This function will return NULL if the
* softint or kstat for this queue cannot be created.
*/
errorq_t *
errorq_create(const char *name, errorq_func_t func, void *private,
    ulong_t qlen, size_t size, uint_t ipl, uint_t flags)
{
/*
* If a queue is created very early in boot before device tree services
* are available, the queue softint handler cannot be created. We
* manually drain these queues and create their softint handlers when
* it is safe to do so as part of errorq_init(), below.
*/
return (NULL);
}
KSTAT_TYPE_NAMED, sizeof (struct errorq_kstat) /
"for queue %s", name);
return (NULL);
}
sizeof (struct errorq_kstat));
/*
* Iterate over the array of errorq_elem_t structures and set each
* element's data pointer.
*/
eep++;
}
/*
* Once the errorq is initialized, add it to the global list of queues,
* and then return a pointer to the new queue to the caller.
*/
errorq_list = eqp;
return (eqp);
}
/*
* Create a new errorq as if by errorq_create(), but set the ERRORQ_NVLIST
* flag and initialize each element to have the start of its data region used
* as an errorq_nvelem_t with a nvlist allocator that consumes the data region.
*/
errorq_t *
errorq_nvcreate(const char *name, errorq_func_t func, void *private,
    ulong_t qlen, size_t size, uint_t ipl, uint_t flags)
{
return (NULL);
}
return (eqp);
}
/*
* To destroy an error queue, we mark it as disabled and then explicitly drain
* all pending errors. Once the drain is complete, we can remove the queue
* from the global list of queues examined by errorq_panic(), and then free
* the various queue data structures. The caller must use some higher-level
* abstraction (e.g. disabling an error interrupt) to ensure that no one will
* attempt to enqueue new errors while we are freeing this queue.
*/
void
errorq_destroy(errorq_t *eqp)
{
ulong_t i;
pp = &errorq_list;
if (p == eqp) {
break;
}
}
}
}
}
/*
* Private version of bt_availbit() which makes a best-effort attempt
* at allocating in a round-robin fashion in order to facilitate post-mortem
* diagnosis.
*/
static index_t
errorq_availbit(ulong_t *bitmap, index_t nbits, index_t curindex)
{
/*
* First check if there are still some bits remaining in the current
* word, and see if any of those are available. We need to do this by
* hand as the bt_availbit() function always starts at the beginning
* of a word.
*/
nextword++;
}
/*
* Now check if there are any words remaining before the end of the
* bitmap. Use bt_availbit() to find any free bits.
*/
/*
* Finally loop back to the start and look for any free bits starting
* from the beginning of the bitmap to the current rotor position.
*/
}
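The round-robin search described in the comment above can be sketched in userspace. This is an illustrative model, not the kernel function: availbit_rr is an invented name, and a simple bit-at-a-time loop replaces the word-at-a-time scan a real implementation would do with bt_availbit(). Starting just past the previous allocation (the "rotor") and wrapping means recently freed slots are reused last, so more distinct error records survive for post-mortem inspection.

```c
/*
 * Find the next clear bit after 'rotor', wrapping to the start of the
 * bitmap. Returns the bit index, or -1 if every bit is set.
 */
static int
availbit_rr(const unsigned long *bitmap, int nbits, int rotor)
{
	for (int off = 1; off <= nbits; off++) {
		int i = (rotor + off) % nbits;
		int word = i / (int)(8 * sizeof (unsigned long));
		int bit = i % (int)(8 * sizeof (unsigned long));
		if (!(bitmap[word] & (1UL << bit)))
			return (i);	/* first free bit past the rotor */
	}
	return (-1);	/* bitmap is full */
}
```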
/*
* Dispatch a new error into the queue for later processing. The specified
* data buffer is copied into a preallocated queue element. If 'len' is
* smaller than the queue element size, the remainder of the queue element is
* filled with zeroes. This function may be called from any context subject
* to the Platform Considerations described above.
*/
void
errorq_dispatch(errorq_t *eqp, const void *data, size_t len, uint_t flag)
{
return; /* drop error if queue is uninitialized or disabled */
}
for (;;) {
int i, rval;
return;
}
if (rval == 0) {
break;
}
}
for (;;) {
break;
}
}
/*
* Drain the specified error queue by calling eq_func() for each pending error.
* This function must be called at or below LOCK_LEVEL or from panic context.
* In order to synchronize with other attempts to drain the queue, we acquire
* the adaptive eq_lock, blocking other consumers. Once this lock is held,
* we must use compare-and-swap to move the pending list to the processing
* list and to return elements to the free pool in order to synchronize
* with errorq_dispatch().
*
* An additional constraint on this function is that if the system panics
* while this function is running, the panic code must be able to detect and
* handle all intermediate states and correctly dequeue all errors. The
* errorq_panic() function below will be used for detecting and handling
* these intermediate states. The comments in errorq_drain() below explain
* how we make sure each intermediate state is distinct and consistent.
*/
void
errorq_drain(errorq_t *eqp)
{
/*
* If there are one or more pending errors, set eq_ptail to point to
* the first element on the pending list and then attempt to compare-
* and-swap NULL to the pending list. We use membar_producer() to
* make sure that eq_ptail will be visible to errorq_panic() below
* before the pending list is NULLed out. This section is labeled
* case (1) for errorq_panic, below. If eq_ptail is not yet set (1A)
* eq_pend has all the pending errors. If casptr fails or has not
* been called yet (1B), eq_pend still has all the pending errors.
* If casptr succeeds (1C), eq_ptail has all the pending errors.
*/
break;
}
/*
* If no errors were pending, assert that eq_ptail is set to NULL,
* drop the consumer lock, and return without doing anything.
*/
return;
}
/*
* Now iterate from eq_ptail (a.k.a. eep, the newest error) to the
* oldest error, setting the eqe_next pointer so that we can iterate
* over the errors from oldest to newest. We use membar_producer()
* to make sure that these stores are visible before we set eq_phead.
* If we panic before, during, or just after this loop (case 2),
* errorq_panic() will simply redo this work, as described below.
*/
/*
* Now set eq_phead to the head of the processing list (the oldest
* error) and issue another membar_producer() to make sure that
* eq_phead is seen as non-NULL before we clear eq_ptail. If we panic
* after eq_phead is set (case 3), we will detect and log these errors
* in errorq_panic(), as described below.
*/
/*
* If we enter from errorq_panic_drain(), we may already have
* errorq elements on the dump list. Find the tail of
* the list ready for append.
*/
}
/*
* Now iterate over the processing list from oldest (eq_phead) to
* newest and log each error. Once an error is logged, we use
* atomic clear to return it to the free pool. If we panic before,
* during, or after calling eq_func() (case 4), the error will still be
* found on eq_phead and will be logged in errorq_panic below.
*/
/*
* On panic, we add the element to the dump list for each
* nvlist errorq. Elements are stored oldest to newest.
* Then continue, so we don't free and subsequently overwrite
* any elements which we've put on the dump queue.
*/
else
continue;
}
}
}
/*
* Now that device tree services are available, set up the soft interrupt
* handlers for any queues that were created early in boot. We then
* manually drain these queues to report any pending early errors.
*/
void
errorq_init(void)
{
ASSERT(modrootloaded != 0);
continue; /* softint already initialized */
panic("errorq_init: failed to register IPL %u softint "
}
}
}
/*
* This function is designed to be called from panic context only, and
* therefore does not need to acquire errorq_lock when iterating over
* errorq_list. This function must be called no more than once for each
* 'what' value (if you change this then review the manipulation of 'dep').
*/
static uint64_t
errorq_panic_drain(uint_t what)
{
continue; /* do not drain this queue on this pass */
/*
* In case (1B) above, eq_ptail may be set but the casptr may
* not have been executed yet or may have failed. Either way,
* we must log errors in chronological order. So we search
* the pending list for the error pointed to by eq_ptail. If
* it is found, we know that all subsequent errors are also
* still on the pending list, so just NULL out eq_ptail and let
* errorq_drain(), below, take care of the logging.
*/
break;
}
}
/*
* In cases (1C) and (2) above, eq_ptail will be set to the
* newest error on the processing list but eq_phead will still
* be NULL. We set the eqe_next pointers so we can iterate
* over the processing list in order from oldest error to the
* newest error. We then set eq_phead to point to the oldest
* error and fall into the for-loop below.
*/
}
/*
* In cases (3) and (4) above (or after case (1C/2) handling),
* eq_phead will be set to the oldest error on the processing
* list. We log each error and return it to the free pool.
*
* Unlike errorq_drain(), we don't need to worry about updating
* eq_phead because errorq_panic() will be called at most once.
* However, we must use casptr to update the freelist in case
* errors are still being enqueued during panic.
*/
/*
* On panic, we add the element to the dump list for
* each nvlist errorq, stored oldest to newest. Then
* continue, so we don't free and subsequently overwrite
* any elements which we've put on the dump queue.
*/
else
continue;
}
}
/*
* Now go ahead and drain any other errors on the pending list.
* This call transparently handles case (1A) above, as well as
* any other errors that were dispatched after errorq_drain()
* completed its first compare-and-swap.
*/
}
return (logged);
}
/*
* Drain all error queues - called only from panic context. Some drain
* functions may enqueue errors to ERRORQ_NVLIST error queues so that
* they may be written out in the panic dump - so ERRORQ_NVLIST queues
* must be drained last. Drain ERRORQ_VITAL queues before nonvital queues
* so that vital errors get to fill the ERRORQ_NVLIST queues first, and
* do not drain the nonvital queues if there are many vital errors.
*/
void
errorq_panic(void)
{
(void) errorq_panic_drain(0);
(void) errorq_panic_drain(ERRORQ_NVLIST);
}
/*
* Reserve an error queue element for later processing and dispatching. The
* element is returned to the caller, who may add error-specific data
* to it. The element is returned to the free pool either when
* errorq_commit() is called and the element has been asynchronously
* processed, or immediately when errorq_cancel() is called.
*/
errorq_elem_t *
errorq_reserve(errorq_t *eqp)
{
return (NULL);
}
for (;;) {
int i, rval;
return (NULL);
}
if (rval == 0) {
break;
}
}
}
return (eqep);
}
/*
* Commit an errorq element (eqep) for dispatching.
* This function may be called from any context subject
* to the Platform Considerations described above.
*/
void
errorq_commit(errorq_t *eqp, errorq_elem_t *eqep, uint_t flag)
{
return;
}
for (;;) {
break;
}
}
/*
* Cancel an errorq element reservation by returning the specified element
* to the free pool. Duplicate or invalid frees are not supported.
*/
void
errorq_cancel(errorq_t *eqp, errorq_elem_t *eqep)
{
return;
}
/*
* Write elements on the dump list of each nvlist errorq to the dump device.
* Upon reboot, fmd(1M) will extract and replay them for diagnosis.
*/
void
errorq_dump(void)
{
if (ereport_dumpbuf == NULL)
return; /* reboot or panic before errorq is even set up */
continue; /* do not dump this queue on panic */
int err;
&len, NV_ENCODE_NATIVE);
"report %p due to size %lu\n",
continue;
}
(char **)&ereport_dumpbuf, &ereport_dumplen,
NV_ENCODE_NATIVE, KM_NOSLEEP)) != 0) {
"report %p due to pack error %d\n",
continue;
}
ed.ed_hrt_nsec = 0;
}
}
}
nvlist_t *
errorq_elem_nvl(errorq_t *eqp, const errorq_elem_t *eqep)
{
}
nv_alloc_t *
errorq_elem_nva(errorq_t *eqp, const errorq_elem_t *eqep)
{
}
/*
* Reserve a new element and duplicate the data of the original into it.
*/
void *
errorq_elem_dup(errorq_t *eqp, const errorq_elem_t *eqep, errorq_elem_t **neqep)
{
return (NULL);
}