ao_mca.c revision bb86c3425be684b7eaa9e875ec2740b39d444ec8
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2007 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * Additional NB MCA ctl initialization for revs F and G
 */

/*
 * This is quite awful but necessary to work around x86 system vendor's view of
 * the world. Other operating systems (you know who you are) don't understand
 * Opteron-specific error handling, so BIOS and system vendors often hide these
 * conditions from them by using SMI polling to copy out any errors from the
 * machine-check registers. When Solaris runs on a system with this feature,
 * we want to disable the SMI polling so we can use FMA instead. Sadly, there
 * isn't even a standard self-describing way to express the whole situation,
 * so we have to resort to hard-coded values. This should all be changed to
 * be a self-describing vendor-specific SMBIOS structure in the future.
 */
{
"Sun Microsystems",
"Galaxy12",
"American Megatrends",
0x59 },
{
"Sun Microsystems",
"Sun Fire X4100 Server",
"American Megatrends",
0x59 },
{
"Sun Microsystems",
"Sun Fire X4200 Server",
"American Megatrends",
0x59 },
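
/*
 * Illustrative sketch only: one way a table like the one above might be
 * consulted once the system vendor, product and BIOS vendor strings have
 * been read from SMBIOS. The struct layout, the field names and
 * smi_disable_lookup() are hypothetical and are not taken from this file;
 * exact string comparison is likewise an assumption.
 */

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical row layout; field names are assumptions for illustration. */
struct smi_disable_entry {
	const char *sys_vendor;		/* SMBIOS system manufacturer */
	const char *sys_product;	/* SMBIOS system product name */
	const char *bios_vendor;	/* SMBIOS BIOS vendor */
	unsigned char smi_code;		/* value written to the SMI command port */
};

static const struct smi_disable_entry smi_disable_table[] = {
	{ "Sun Microsystems", "Galaxy12", "American Megatrends", 0x59 },
	{ "Sun Microsystems", "Sun Fire X4100 Server", "American Megatrends", 0x59 },
	{ "Sun Microsystems", "Sun Fire X4200 Server", "American Megatrends", 0x59 },
};

/*
 * Return the matching table row, or NULL if this platform is not listed
 * (in which case SMI polling would be left alone).
 */
static const struct smi_disable_entry *
smi_disable_lookup(const char *sys_vendor, const char *sys_product,
    const char *bios_vendor)
{
	size_t n = sizeof (smi_disable_table) / sizeof (smi_disable_table[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		const struct smi_disable_entry *e = &smi_disable_table[i];

		if (strcmp(sys_vendor, e->sys_vendor) == 0 &&
		    strcmp(sys_product, e->sys_product) == 0 &&
		    strcmp(bios_vendor, e->bios_vendor) == 0)
			return (e);
	}
	return (NULL);
}
```

/*
 * A caller would act only on a non-NULL result, which keeps unlisted
 * platforms on their BIOS defaults.
 */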
/*
 * If the bank's status register indicates overflow, then we can no
 * longer rely on the value of CECC: our experience with actual fault
 * injection has shown that multiple CE's overwriting each other shows
 * AMD_BANK_STAT_CECC and AMD_BANK_STAT_UECC both set to zero. This
 * should be clarified in a future BKDG or by the Revision Guide.
 * This behaviour is fixed in revision F.
 */

/*
 * r4 and pp bits are stored separately, so we mask off and compare them
 * for the code types that use them. Once we've taken the r4 and pp
 * bits out of the equation, we can directly compare the resulting code
 * with the one stored in the ao_error_disp_t.
 */

/*
 * ao_chip_once returns 1 if the caller should perform the operation for
 * this chip, or 0 if some other core has already performed the operation.
 */

/*
 * Set up individual bank detectors after stashing their BIOS settings.
 * The 'donb' argument indicates whether this core should configure
 * the shared NorthBridge MSRs.
 */

/* Initialize MCi_CTL register for this bank */

/* Initialize the MCi_MISC register for this bank */

/*
 * This knob exists in case any platform has a problem with our default
 * policy of disabling any interrupt registered in the NB MC4_MISC
 * register. Setting this may cause Solaris and external entities
 * who also have an interest in this register to argue over available
 * telemetry (so setting it is generally not recommended).
 */

/*
 * The BIOS may have set up to receive SMI on counter overflow. It may also
 * have locked various fields or made them read-only. We will clear any
 * SMI request and leave the register locked. We will also clear the
 * counter and enable counting - while we don't use the counter it is nice
 * to have it enabled for verification and debug work.
 */
return;
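
/*
 * Illustrative sketch only: the "once per chip" contract described for
 * ao_chip_once (1 for the first caller, 0 for every later caller on the
 * same chip) can be expressed with an atomic fetch-or. The names
 * chip_state, chip_once_op and chip_once() are hypothetical, and C11
 * atomics stand in for whatever primitive the real code uses.
 */

```c
#include <stdatomic.h>

/* Hypothetical state shared by all cores of one chip. */
struct chip_state {
	atomic_uint once_done;	/* one bit per one-time operation */
};

/* Hypothetical operation identifiers; each must be a distinct bit. */
enum chip_once_op {
	CHIP_ONCE_NB_CFG = 0x1,
	CHIP_ONCE_DRAM_CFG = 0x2
};

/*
 * Return 1 if the caller is the first to claim 'op' on this chip and
 * should therefore perform it; return 0 if another core already has.
 */
static int
chip_once(struct chip_state *chip, enum chip_once_op op)
{
	unsigned int bit = (unsigned int)op;
	unsigned int prev = atomic_fetch_or(&chip->once_done, bit);

	return ((prev & bit) == 0);
}
```

/*
 * Because the fetch-or is a single atomic step, two cores racing on the
 * same operation cannot both see the bit clear.
 */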
/* stash BIOS value, but no changes */

/*
 * The Valid bit tells us whether the CtrP bit is defined; if it
 * is, the CtrP bit tells us whether an ErrCount field is present.
 * If not then there is nothing for us to do.
 */

/*
 * NorthBridge (NB) MCA Configuration.
 *
 * We add and remove bits from the BIOS-configured value, rather than
 * writing an absolute value. The variables ao_nb_cfg_{add,remove}_cmn and
 * ao_nb_cfg_{add,remove}_revFG are available for modification via kmdb
 * and /etc/system. The revision-specific adds and removes are applied
 * after the common changes, and one write is made to the config register.
 * These variables are not intended for watchdog configuration;
 * use the watchdog policy below.
 */

/*
 * Bits to be added to the NB configuration register - all revs.
 */

/*
 * Bits to be cleared from the NB configuration register - all revs.
 */

/*
 * Bits to be added to the NB configuration register - revs F and G.
 */

/*
 * Bits to be cleared from the NB configuration register - revs F and G.
 */

/*
 * Bits to be used if we configure the NorthBridge (NB) Watchdog. The watchdog
 * triggers a machine check exception when no response to an NB system access
 * occurs within a specified time interval.
 */

/*
 * The default watchdog policy is to enable it (at the above rate) if it
 * is disabled; if it is enabled then we leave it enabled at its
 * existing rate.
 */

/*
 * Read the NorthBridge (NB) configuration register in PCI space,
 * modify the settings accordingly, and store the new value back.
 */
break;
/* if enabled leave rate intact */

/*
 * Now apply bit adds and removes, first those common to all revs
 * and then the revision-specific ones.
 */

/*
 * This knob exists in case any platform has a problem with our default
 * policy of disabling any interrupt registered in the online spare
 * control register. Setting this may cause Solaris and external entities
 * who also have an interest in this register to argue over available
 * telemetry (so setting it is generally not recommended).
 */

/*
 * Set up the online spare control register (revs F and G). We disable
 * any interrupt registered by the BIOS and zero all error counts.
 */
return;
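
/*
 * Illustrative sketch only: the NB configuration update described above
 * starts from the BIOS value, applies the common adds and removes, then
 * the revision-specific ones, and produces a single value to write back.
 * The function names, the add-before-remove order within each step, and
 * the NB_CFG_WDOG_* bit positions are assumptions made for illustration;
 * they are not the real register layout.
 */

```c
#include <stdint.h>

/* Hypothetical bit assignments, chosen only so the example is concrete. */
#define	NB_CFG_WDOG_DISABLE	0x00000100u	/* watchdog disabled when set */
#define	NB_CFG_WDOG_RATE_MASK	0x00007000u	/* watchdog rate field */

/*
 * Default watchdog policy: if the watchdog is disabled, enable it at
 * 'default_rate'; if it is already enabled, leave its rate intact.
 */
static uint32_t
nb_wdog_default(uint32_t val, uint32_t default_rate)
{
	if (val & NB_CFG_WDOG_DISABLE) {
		val &= ~(NB_CFG_WDOG_DISABLE | NB_CFG_WDOG_RATE_MASK);
		val |= default_rate & NB_CFG_WDOG_RATE_MASK;
	}
	return (val);
}

/*
 * Apply the common adds/removes and then, for revs F and G only, the
 * revision-specific ones. The caller writes the result back once.
 */
static uint32_t
nb_cfg_apply(uint32_t bios_val, uint32_t add_cmn, uint32_t remove_cmn,
    uint32_t add_rev, uint32_t remove_rev, int is_rev_fg)
{
	uint32_t val = bios_val;

	val |= add_cmn;
	val &= ~remove_cmn;

	if (is_rev_fg) {
		val |= add_rev;
		val &= ~remove_rev;
	}
	return (val);
}
```

/*
 * Working from the BIOS value rather than an absolute constant preserves
 * any platform-specific bits that neither mask mentions.
 */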
/* stash BIOS value, but no changes */

/*
 * If the BIOS has requested SMI interrupt type for ECC count
 * overflow for a chip-select or channel, force those off.
 */

/* Enable writing to the EccErrCnt field */

/* First write, preparing for writes to EccErrCnt */

/*
 * Zero EccErrCnt and write this back to all chan/cs combinations.
 */

/*
 * Capture the machine-check exception state into our per-CPU logout area, and
 * dispatch a copy of the logout area to our error queue for ereport creation.
 * If 'rp' is non-NULL, we're being called from trap context; otherwise we're
 * being polled or poked by the injector. We return the number of errors
 * found through 'np', and a boolean indicating whether the error is fatal.
 * The caller is expected to call fm_panic() if we return fatal (non-zero).
 */

/*
 * Iterate over the banks of machine-check registers, read the address
 * and status registers into the logout area, and clear status as we go.
 * Also read the MCi_MISC register if MCi_STATUS.MISCV indicates that
 * there is valid info there (as it will in revisions F and G for
 * NorthBridge ECC errors).
 */

/*
 * Clear MCG_STATUS, indicating that machine-check trap processing is
 * complete. Once we do this, another machine-check trap can occur
 * (if another occurs up to this point then the system will reset).
 */

/*
 * If we took a machine-check trap, then the error is fatal if the
 * return instruction pointer is not valid in the global register.
 */

/*
 * Now iterate over the saved logout area, determining whether the
 * error that we saw is fatal or not based upon our dispositions
 * and the hardware's indicators of whether or not we can resume.
 */

/*
 * If we are taking a machine-check exception and our context
 * is corrupt, then we must die.
 */

/*
 * The overflow bit is set if the bank detects an error but
 * the valid bit of its status register is already set
 * (software has not yet read and cleared it). Enabled
 * (for mc# reporting) errors overwrite disabled errors,
 * uncorrectable errors overwrite correctable errors,
 * uncorrectable errors are not overwritten.
 *
 * For the NB detector bank the overflow bit will not be
 * set for repeated correctable errors on revisions D and
 * earlier; it will be set on revisions E and later.
 * On revision E, however, the CorrECC bit does appear
 * to clear in these circumstances. Since we can enable
 * machine-check exception on NB correctables we need to
 * be careful here; we never enable mc# for correctable from
 *
 * Our solution will be to declare a machine-check exception
 * fatal if the overflow bit is set except in the case of
 * revision F on the NB detector bank for which CorrECC
 * is indicated. Machine-check exception for NB correctables
 * on rev E is explicitly not supported.
 */

/*
 * If we are taking a machine-check exception and we don't
 * recognize the error case at all, then assume it's fatal.
 * This will need to change if we eventually use the Opteron
 * Rev E exception mechanism for detecting correctable errors.
 */
*np = n; /* return number of errors found to caller */

/*
 * Now try to allocate another element for scratch space and
 * use that for further scratch space (e.g. for constructing
 * nvlists to add to the main ereport). If we can't reserve
 * a scratch element just fall back to working within the
 * element we already have, and hope for the best. All this
 * is necessary because the fixed buffer nv allocator does
 * not reclaim freed space and nvlist construction is
 */

/*
 * Create the "hc" scheme detector FMRI identifying this cpu
 */

/*
 * Encode all the common data into the ereport.
 */

/*
 * We're done with 'detector' so reclaim the scratch space.
 */

/*
 * Encode the error-specific data that was saved in the logout area.
 */

/*
 * Machine check interrupt handler - we jump here from mcetrap.
 */

/*
 * A sibling core may attempt to poll the NorthBridge during the
 * time we are performing the logout. So we coordinate NB access
 * of all cores of the same chip via a per-chip lock. If the lock
 * is held on a sibling core then we spin for it here; if the
 * lock is held by the thread we have interrupted then we do
 * not acquire the lock but can proceed safe in the knowledge that
 * the lock owner can't actually perform any NB accesses. This
 * requires that threads that take the aos_nb_poll_lock do not
 * block and that they disable preemption while they hold the lock.
 * It also requires that the lock be adaptive since mutex_owner does
 * not work for spin locks.
 */

/*
 * The mutex is not owned by the thread we have interrupted
 * (since the holder may not block or be preempted once the
 * lock is acquired). We will spin for this adaptive lock.
 */
for (i = 0; i < nregs; i++)
/*
 * cmi_mca_init is only called during cpu startup if features include
 * X86_MCA (defined as both MCA and MCE support indicated by CPUID).
 * Furthermore, our ao_init function returns ENOTSUP if features
 * lacked X86_MCA, IA32_MSR_MCG_CAP lacks MCG_CAP_CTL_P, or the
 * cpu has an unexpected number of detector banks.
 */

/*
 * Configure the logout areas. We preset every logout area's acl_ao
 * pointer to refer back to our per-CPU state for errorq drain usage.
 */

/* LINTED: logical expression always true */

/*
 * Must this core perform NB MCA or DRAM configuration? This must be
 */

/*
 * Initialize poller data, but don't start polling yet.
 */

/*
 * Configure the bank MCi_CTL register to nominate which error
 * types for each bank will produce a machine-check (we'll poll
 * for others). Correctable error types mentioned in these MCi_CTL
 * settings won't actually produce an exception unless an additional
 * (and undocumented) bit is set elsewhere - the poller must still
 */

/*
 * Modify the MCA NB Configuration and Dram Configuration Registers.
 */

/*
 * Set up the Online Spare Control Register
 */

/*
 * Enable all error reporting banks (icache, dcache, ...). This
 * enables error detection, as opposed to error reporting above.
 */

/*
 * Throw away all existing bank state. We do this because some BIOSes,
 * perhaps during POST, do things to the machine that cause MCA state
 * to be updated. If we interpret this state as an actual error, we
 * may end up indicting something that's not actually broken.
 */

/*
 * Note that although this cpu module is loaded before the PSMs are
 * loaded (and hence before acpica is loaded), this function is
 * called from post_startup(), after PSMs are initialized and acpica
 */

/*
 * AcpiGetFirmwareTable works even if ACPI is disabled, so a failure
 * here means we weren't able to retrieve a pointer to the FADT.
 */

/*
 * Fetch the System and BIOS vendor strings from SMBIOS and see if they
 * match a value in our table. If so, disable SMI error polling. This
 * is grotesque and should be replaced by self-describing vendor-
 * specific SMBIOS data or a specification enhancement instead.
 */

/*
 * Look for the SMI_CMD port in the ACPI FADT,
 * if the port is 0, this platform doesn't support
 * SMM, so there is no SMI error polling to disable.
 */
"favor of Solaris Fault Management for "
"for AMD Processors could not disable SMI "
"polling because an error occurred while "
"trying to determine the SMI command port "
"from the ACPI FADT table\n");

/*
 * Called after a CPU has been marked with CPU_FAULTED. Not called on the
 * faulted CPU. cpu_lock is held.
 */

/*
 * Nothing to do here. We'd like to turn off the faulted CPU's
 * correctable error detectors, but that can only be done by the
 * faulted CPU itself. cpu_get_state() will now return P_FAULTED,
 * allowing the poller to skip this CPU until it is re-enabled.
 */

/*
 * Called after the CPU_FAULTED bit has been cleared from a previously-faulted
 * CPU. Not called on the faulted CPU. cpu_lock is held.
 */

/*
 * We'd like to clear the faulted CPU's MCi_STATUS registers so as to
 * avoid generating ereports for errors which occurred while the CPU was
 * officially faulted. Unfortunately, those registers can only be
 * cleared by the CPU itself, so we can't do it here.
 *
 * We're going to set the UNFAULTING bit on the formerly-faulted CPU's
 * MCA state. This will tell the poller that the MCi_STATUS registers
 * can't yet be trusted. The poller, which is the first thing we
 * control that'll execute on that CPU, will clear the registers, and
 * will then clear the bit.
 */
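
/*
 * Illustrative sketch only: the UNFAULTING handshake described above,
 * modelled in user-space C. Another CPU sets a flag; the poller, running
 * on the formerly-faulted CPU, discards the untrusted status, and only
 * then clears the flag. The names mca_state, MCA_F_UNFAULTING, NBANKS,
 * mca_mark_unfaulting() and mca_poll() are hypothetical, and a plain
 * array stands in for the per-bank MCi_STATUS registers.
 */

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define	MCA_F_UNFAULTING	0x1u	/* hypothetical flag bit */
#define	NBANKS			5	/* hypothetical bank count */

struct mca_state {
	atomic_uint flags;
	uint64_t bank_status[NBANKS];	/* stand-in for MCi_STATUS */
};

/* Runs on some other CPU: mark the status registers as untrustworthy. */
static void
mca_mark_unfaulting(struct mca_state *m)
{
	atomic_fetch_or(&m->flags, MCA_F_UNFAULTING);
}

/*
 * Runs on the formerly-faulted CPU. Returns true if stale status was
 * discarded on this pass, false if the status may be examined normally.
 */
static bool
mca_poll(struct mca_state *m)
{
	int i;

	if (atomic_load(&m->flags) & MCA_F_UNFAULTING) {
		/* Clear the registers first, then clear the bit. */
		for (i = 0; i < NBANKS; i++)
			m->bank_status[i] = 0;
		atomic_fetch_and(&m->flags, ~MCA_F_UNFAULTING);
		return (true);
	}
	return (false);
}
```

/*
 * Clearing the registers before the flag means no pass of the poller can
 * see the flag clear while stale status is still present.
 */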