error.c revision c4b034952d3374cdd114e12b3990493b1b45dc32
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 *
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */
/*
 * Copyright 2006 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * Being used by memory test driver.
 * ce_verbose_memory - covers CEs in DIMMs
 * ce_verbose_other - covers "others" (ecache, IO, etc.)
 *
 * If the value is 0, nothing is logged.
 * If the value is 1, the error is logged to the log file, but not console.
 * If the value is 2, the error is logged to the log file and console.
 */

/*
 * Tunables for controlling the handling of asynchronous faults (AFTs). Setting
 * these to non-default values on a non-DEBUG kernel is NOT supported.
 */
int aft_panic = 0;	/* panic (not reboot) on fatal usermode AFLT */

/*
 * Used for vbsc hostshutdown (power-off button)
 */

	/* kernel buffer starts right after the resumable queue */

	/* Copy the error report to local buffer */

	/* Increment the queue head */

	/* set error handle to zero so it can hold new error report */

	/*
	 * Power-off requested, but handle it one time only.
	 */

	" invalid in resumable error handler",
	/* If it is an error on other cpu */

	/*
	 * Handle resumable queue full case.
	 */

	/*
	 * Queue the error on the ce or ue queue depending on flt_panic.
	 * Even if flt_panic is set, the code still keeps processing
	 * the remaining elements on rq until the panic starts.
	 */

	/*
	 * Panic here if aflt->flt_panic has been set.
	 * Enqueued errors will be logged as part of the panic flow.
	 */

	/* kernel buffer starts right after the nonresumable queue */

	/* Copy the error report to local buffer */

	/* Increment the queue head */

	/* set error handle to zero so it can hold new error report */

	/*
	 * For the first error packet on the queue, check if it
	 */

	/*
	 * Fall through, precise fault also needs to check
	 * to see if it was protected.
	 */

	/*
	 * If the trap occurred in privileged mode at TL=0,
	 * we need to check to see if we were executing
	 * in kernel under on_trap() or t_lofault
	 * protection. If so, and if it was a PIO or MEM
	 * error, then modify the saved registers so that
	 * we return from the trap to the appropriate
	 * trampoline routine.
	 */

	/*
	 * If PIO error, we need to query the bus nexus
	 */

	" invalid in non-resumable error handler",
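The comments in both the resumable and non-resumable handlers describe the same three steps per queued report: copy it to a local buffer, advance the queue head, and zero the error handle so the slot can be reused. The fragment below is only a user-space sketch of that loop under stated assumptions. `errq_t`, `errq_drain()`, `Q_ENTRY_SIZE`, and `Q_ENTRIES` are hypothetical names, the 64-byte report size is a placeholder, and treating the first 8 bytes of a slot as the error handle is an assumption made for illustration. The elided kernel code may differ in all of these respects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	Q_ENTRY_SIZE	64	/* hypothetical size of one error report */
#define	Q_ENTRIES	4	/* hypothetical number of queue slots */

/* Hypothetical queue: a producer fills slots, the handler owns head. */
typedef struct errq {
	uint8_t	slots[Q_ENTRIES][Q_ENTRY_SIZE];
	size_t	head;	/* byte offset of the next report to consume */
	size_t	tail;	/* byte offset where the producer writes next */
} errq_t;

/*
 * Drain pending reports: copy each one out, advance the head, then zero
 * the first 8 bytes of the slot (standing in for the error handle) so the
 * slot can hold a new report. Returns the number of reports copied.
 */
size_t
errq_drain(errq_t *q, uint8_t out[][Q_ENTRY_SIZE], size_t max_out)
{
	size_t n = 0;
	const size_t qsize = sizeof (q->slots);

	assert(q != NULL && out != NULL);
	while (q->head != q->tail && n < max_out) {
		uint8_t *slot = q->slots[q->head / Q_ENTRY_SIZE];

		/* Copy the error report to the caller's local buffer. */
		memcpy(out[n++], slot, Q_ENTRY_SIZE);
		/* Increment the queue head, wrapping at the queue size. */
		q->head = (q->head + Q_ENTRY_SIZE) % qsize;
		/* Clear the handle so the slot can take a new report. */
		memset(slot, 0, sizeof (uint64_t));
	}
	return (n);
}
```

Copying before clearing matters in this sketch: once the handle is zeroed, the slot is treated as free, so the local copy is the only one that can safely be queued for later logging.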
	/*
	 * Queue the error report for further processing. If
	 * flt_panic is set, the code still processes other errors
	 * in the queue until the panic routine stops the
	 */

	/*
	 * Panic here if aflt->flt_panic has been set.
	 * Enqueued errors will be logged as part of the panic flow.
	 */

	/*
	 * Call page_retire() to handle memory errors.
	 */

	/*
	 * If we queued an error and it was in user mode, or
	 * protected by t_lofault, or user_spill_fill is set, we
	 * set the AST flag so the queue will be drained before
	 * returning to user mode.
	 */

/*
 * For PIO errors, this routine calls the nexus driver's error
 * callback routines. If the callback routine returns fatal, and
 * we are in kernel or unknown mode without any error protection,
 * we need to turn on the panic flag.
 */

	/*
	 * If the error is protected, it will jump to the proper routine
	 * to handle the error; if it is in user level, we just
	 * kill the user process; if the driver thinks the error is
	 * not fatal, we can drive on. If none of the above are true,
	 */

/*
 * This routine checks to see if we are under any error protection when
 * the error happens. If we are under error protection, we unwind to
 * the protection and indicate fault.
 */

	/*
	 * for peek and caut_gets
	 * errors are expected
	 */

/*
 * The cpu_async_log_err() function is called by the ce/ue_drain() function to
 * handle logging for CPU events that are dequeued. As such, it can be invoked
 * from softint context, from AST processing in the trap() flow, or from the
 * panic flow. We decode the CPU-specific data, and log appropriate messages.
 */

	/*
	 * Turn on the PR_UE flag. The page will be
	 * scrubbed when it is freed.
	 */

	/*
	 * For non-resumable memory error, retire
	 */

	/*
	 * If we are going to panic, scrub the page first
	 */

/*
 * Called from ce_drain().
 */

/*
 * Called from ue_drain().
 */

/*
 * Turn on flag on the error memory region.
 */

/*
 * Call hypervisor to flush the memory region. The memory region
 * must be within the same page frame.
 */

/*
 * If the resumable queue is full, we need to check if any cpu is in
 * error state. If not, we drive on. If yes, we need to panic. The
 * hypervisor call hv_cpu_state() is being used for checking the
 */

/*
 * Return processor specific async error structure
 */

/*
 * Message printed when the resumable queue has overflowed
 */

/*
 * Handler to process a fatal error. This routine can be called from a
 * softint, called from trap()'s AST handling, or called from the panic flow.
 */

/*
 * Handler to process a correctable error. This routine can be called from a
 * softint. We just call the CPU module's logging routine.
 */

/*
 * Handler to process vbsc hostshutdown (power-off button).
 */

	/*
	 * just in case do_shutdown() fails
	 */

/*
 * Allocate error queue sizes based on max_ncpus. max_ncpus is set just
 * after ncpunode has been determined. ncpus is set in start_other_cpus
 * which is called after error_init() but may change dynamically.
 */

/*
 * Initialize the correctable and uncorrectable error queues.
 */
		panic("failed to create required system error queue");
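The PIO error comments earlier in this file describe an ordered set of outcomes: a protected access unwinds to its protection, a user-level error kills the user process, a driver verdict of "not fatal" lets the system drive on, and an unprotected fatal error in kernel or unknown mode sets the panic flag. The sketch below encodes that ordering as a pure function. It is an illustration only: `err_action_t`, `pio_err_action()`, and the three boolean inputs are hypothetical names, not identifiers from the elided code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for a PIO error. */
typedef enum {
	ERR_UNWIND,	/* jump to the on_trap()/t_lofault protection */
	ERR_KILL_PROC,	/* error hit user level: kill the user process */
	ERR_DRIVE_ON,	/* driver callback says the error is not fatal */
	ERR_PANIC	/* fatal, unprotected, kernel or unknown mode */
} err_action_t;

/* Apply the checks in the order the comments list them. */
err_action_t
pio_err_action(bool is_protected, bool user_mode, bool driver_fatal)
{
	if (is_protected)
		return (ERR_UNWIND);
	if (user_mode)
		return (ERR_KILL_PROC);
	if (!driver_fatal)
		return (ERR_DRIVE_ON);
	return (ERR_PANIC);
}

/* Small usage example: protection takes priority over a fatal verdict. */
void
pio_err_action_selfcheck(void)
{
	assert(pio_err_action(true, false, true) == ERR_UNWIND);
	assert(pio_err_action(false, false, true) == ERR_PANIC);
}
```

Keeping the decision separate from the side effects (register rewriting, signalling, panicking) makes the precedence easy to test in isolation, which is why the sketch returns a code instead of acting directly.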
	/*
	 * Setup interrupt handler for power-off button.
	 */

	/*
	 * Initialize the busfunc list mutex. This must be a PIL_15 spin lock
	 * because we will need to acquire it from cpu_async_error().
	 */

/*
 * Nonresumable queue is full, panic here
 */
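The two queue-full comments in this file describe different policies: a full non-resumable queue always panics, while a full resumable queue panics only if some cpu is in error state. A minimal sketch of that contrast follows. The names `errq_kind_t` and `errq_full_should_panic()` are hypothetical, and the per-cpu error state is passed in as a plain array. That array is an assumption standing in for whatever the `hv_cpu_state()` hypervisor call reports, since its interface is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag for which queue overflowed. */
typedef enum { Q_RESUMABLE, Q_NONRESUMABLE } errq_kind_t;

/*
 * Decide whether a full queue should lead to a panic.
 * cpu_in_error[i] is assumed to be true when cpu i is in error state.
 */
bool
errq_full_should_panic(errq_kind_t kind, const bool cpu_in_error[],
    size_t ncpus)
{
	size_t i;

	assert(cpu_in_error != NULL || ncpus == 0);

	/* A full non-resumable queue is always fatal. */
	if (kind == Q_NONRESUMABLE)
		return (true);

	/* A full resumable queue is fatal only if some cpu is in error. */
	for (i = 0; i < ncpus; i++) {
		if (cpu_in_error[i])
			return (true);
	}
	return (false);	/* no cpu in error state: drive on */
}
```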