mach_cpu_states.c revision cdf9f8c9206117516a14f3c70f86510326c3d5fa
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 *
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */
/*
 * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * hvdump_buf_va is a pointer to the currently-configured hvdump_buf.
 * A value of NULL indicates that this area is not configured.
 * hvdump_buf_sz is tunable but will be clamped to HVDUMP_SIZE_MAX.
 */

/*
 * For xt_sync synchronization.
 */

/*
 * We keep our own copies, used for cache flushing, because we can be called
 */

/*
 * In an LDoms system we do not save the user's boot args in NVRAM
 * as is done on legacy systems.  Instead, we format and send a
 * 'reboot-command' variable to the variable service.  The contents
 * of the variable are retrieved by OBP and used verbatim for
 */

/*
 * Machine dependent code to reboot.
 * "bootstr", when non-null, points to a string to be used as the
 * argument string when rebooting.
 *
 * "invoke_cb" is a boolean. It is set to true when mdboot() can safely
 * invoke CB_CL_MDBOOT callbacks before shutting the system down, i.e. when
 * we are in a normal shutdown sequence (interrupts are not blocked, the
 * system is not panic'ing or being suspended).
 */

/*
 * XXX - rconsvp is set to NULL to ensure that output messages
 * are sent to the underlying "hardware" device using the
 * monitor's printf routine since we are in the process of
 * either rebooting or halting the machine.
 */

/*
 * If LDoms is running, we must save the boot string
 * before we enter restricted mode.  This is possible
 * only if we are not being called from panic.
 */

/*
 * At a high interrupt level we can't:
 * 2) wait for pending interrupts prior to redistribution
 */

/* make sure there are no more changes to the device tree */

/*
 * Clear any unresolved UEs from memory.
 */

/*
 * stop other cpus which also raise our priority. since there is only
 * one active cpu after this, and our priority will be too high
 * for us to be preempted, we're essentially single threaded
 */

/*
 * try and reset leaf devices.  reset_leaves() should only
 * be called when there are no other threads that could be
 */

/* mdpreboot - may be called prior to mdboot while root fs still mounted */

/*
 * Halt the machine and then reboot with the device
 * and arguments specified in bootstr.
 */

/*
 * For platforms that use CPU signatures, we
 * need to set the signature block to OS and
 * the state to exiting for all the processors.
 */

/*
 * We use the x-trap mechanism and idle_stop_xcall() to stop the other CPUs.
 * Once in panic_idle() they raise spl, record their location, and spin.
 */

/*
 * Force the other CPUs to trap into panic_idle(), and then remove them
 * from the cpu_ready_set so they will no longer receive cross-calls.
 */

/*
 * Platform callback following each entry to panicsys().  If we've panicked at
 * level 14, we examine t_panic_trap to see if a fatal trap occurred.  If so,
 * we disable further %tick_cmpr interrupts.  If not, an explicit call to panic
 * was made and so we re-enqueue an interrupt request structure to allow
 * further level 14 interrupts to be processed once we lower PIL.  This allows
 * us to handle panics from the deadman() CY_HIGH_LEVEL cyclic.
 */

/* there are no possible error codes for this hcall */

/*
 * Clear SOFTINT<14>, SOFTINT<0> (TICK_INT)
 * and SOFTINT<16> (STICK_INT) to indicate
 * that the current level 14 has been serviced.
 */

/*
 * Miscellaneous hardware-specific code to execute after panicstr is set
 * by the panic code: we also print and record PTL1 panic information here.
 */

/*
 * Turn off TRAPTRACE and save the current %tick value in panic_tick.
 */
/* there are no possible error codes for this hcall */

/*
 * For platforms that use CPU signatures, we
 * need to set the signature block to OS, the state to
 * exiting, and the substate to panic for all the processors.
 */

/*
 * Disable further ECC errors from the bus nexus.
 */

/*
 * Redirect all interrupts to the current CPU.
 */

/*
 * This call exists solely to support dumps to network
 * devices after sync from OBP.
 */

/*
 * If we came here via the sync callback, then on some
 * platforms, interrupts may have arrived while we were
 * stopped in OBP.  OBP will arrange for those interrupts to
 * be redelivered if you say "go", but not if you invoke a
 * client callback like 'sync'.  For some dump devices
 * (network swap devices), we need interrupts to be
 * delivered in order to dump, so we have to call the bus
 * nexus driver to reset the interrupt state machines.
 */

/*
 * Platforms that use CPU signatures need to set the signature block to OS and
 * the state to exiting for all CPUs. PANIC_CONT indicates that we're about to
 * write the crash dump, which tells the SSP/SMS to begin a timeout routine to
 * reboot the machine if the dump never completes.
 */

panic("ptl1_init_cpu: not enough space left for ptl1_panic "
    "stack, sizeof (struct cpu) = %lu",
    (unsigned long)sizeof (struct cpu));

/*
 * ptl1_panic reason strings, one per PTL1_BAD_* code.  The entries marked
 * "(alternate)" replace the entry above them in builds without the
 * corresponding debug or trace support; only one string of each pair is
 * compiled in, selected by a build-time conditional.
 */
    "trap for debug purpose",           /* PTL1_BAD_DEBUG */
    "unknown trap",                     /* PTL1_BAD_DEBUG (alternate) */
    "register window trap",             /* PTL1_BAD_WTRAP */
    "kernel MMU miss",                  /* PTL1_BAD_KMISS */
    "kernel protection fault",          /* PTL1_BAD_KPROT_FAULT */
    "ISM MMU miss",                     /* PTL1_BAD_ISM */
    "kernel MMU trap",                  /* PTL1_BAD_MMUTRAP */
    "kernel trap handler state",        /* PTL1_BAD_TRAP */
    "floating point trap",              /* PTL1_BAD_FPTRAP */
    "pointer to intr_vec",              /* PTL1_BAD_INTR_VEC */
    "unknown trap",                     /* PTL1_BAD_INTR_VEC (alternate) */
    "TRACE_PTR state",                  /* PTL1_BAD_TRACE_PTR */
    "unknown trap",                     /* PTL1_BAD_TRACE_PTR (alternate) */
    "stack overflow",                   /* PTL1_BAD_STACK */
    "DTrace flags",                     /* PTL1_BAD_DTRACE_FLAGS */
    "attempt to steal locked ctx",      /* PTL1_BAD_CTX_STEAL */
    "CPU ECC error loop",               /* PTL1_BAD_ECC */
    "unexpected error from hypervisor call",    /* PTL1_BAD_HCALL */
    "unexpected global level(%gl)",     /* PTL1_BAD_GL */
    "Watchdog Reset",                   /* PTL1_BAD_WATCHDOG */
    "unexpected RED mode trap",         /* PTL1_BAD_RED */
    "return value EINVAL from hcall: "\
        "UNMAP_PERM_ADDR",              /* PTL1_BAD_HCALL_UNMAP_PERM_EINVAL */
    "return value ENOMAP from hcall: "\
        "UNMAP_PERM_ADDR",              /* PTL1_BAD_HCALL_UNMAP_PERM_ENOMAP */

/*
 * Use trap_info as a placeholder to call panic_savetrap() and
 * panic_showtrap() to save and print out ptl1_panic information.
 */

/*
 * Restore the watchdog timer when returning from a debugger
 * after a panic or L1-A and resume watchdog pat.
 */

    "dump buffer. Error = 0x%lx, size = 0x%lx, "
    "Available buffer size = 0x%lx, "
    "Minimum buffer size required = 0x%lx",

    "buffer. Error = 0x%lx", ret);
value = 1;	/* boolean properties */

panic("stick_frequency property not found in MD");

panic("cannot allocate list for MD properties");

panic("stick_frequency property not found in MD");

    "cpuid: 0x%x has been marked in "

    "unexpected hypervisor error 0x%x "
    "while sending a mondo to cpuid: "

/*
 * If there is a big jump between the current tick
 * count and lasttick, we have probably hit a break
 * point.  Adjust endtick accordingly to avoid panic.
 */

    "(target 0x%x) [retries: 0x%x hvstat: 0x%x]",
/*
 * Assemble CPU list for HV argument. We already know
 * smallestid and largestid are members of set.
 */

/*
 * Either not all CPU mondos were sent, or an
 * error occurred. CPUs that were sent mondos
 * have their CPU IDs overwritten in cpu_list.
 * Reset cpu_list so that it only holds those
 * CPU IDs that still need to be sent.
 */
for (i = 0, j = 0; i < ncpuids; i++) {
/*
 * Now handle possible errors returned
 */

/*
 * Remove any CPUs in the error state from
 * cpu_list. At this point cpu_list only
 * contains the CPU IDs for mondos not
 */

    "H_ECPUERROR but no CPU in "
    "cpu_list in error state");

    "CPU(s) in error state");

/*
 * For all other errors, panic.
 */

    "hypervisor error 0x%x while sending a "
    "mondo to cpuid(s):", stat);
/*
 * If there is a big jump between the current tick
 * count and lasttick, we have probably hit a break
 * point.  Adjust endtick accordingly to avoid panic.
 */

    "[retries: 0x%x] cpuids: ", retries);
for (rc = 0, i = 0; i < NCPU; i++) {
/*
 * Sends a cross-call to a specified processor.  The caller assumes
 * responsibility for repetition of cross-calls, as appropriate (MARSA for
 */

return (KDI_XC_RES_ERR);	/* Not required on sun4v architecture. */

/*
 * For "mdb -K", set soft state to debugging
 */

/*
 * check again as the read above may or may not have worked and if
 * it didn't then soft state will still be -1
 */

/*
 * For "mdb -K", set soft_state back to its original state on exit
 */

/*
 * Routine to return memory information associated
 * with a physical address and syndrome.
 */

/*
 * This routine returns the size of the kernel's FRU name buffer.
 */

/*
 * This routine is a more generic interface to cpu_get_mem_unum(),
 * that may be used by other modules (e.g. mm).
 */

/*
 * xt_sync - wait for previous x-traps to finish
 */

/*
 * Sun4v uses a queue for receiving mondos.  Successful
 * transmission of a mondo only indicates that the mondo
 * has been written into the queue.
 *
 * We use an array of bytes to let each cpu signal back
 * to the cross trap sender that the cross trap has been
 * executed.  Set the byte to 1 before sending the cross trap
 * and wait until other cpus reset it to 0.
 */

/*
 * To help debug xt_sync panics, each mondo is uniquely identified
 * by passing the tick value (traptrace_id) as the second mondo
 * argument to xt_some(); the value is logged in the CPU's mondo
 * queue, in the traptrace buffer, and in the panic message.
 */

/*
 * If there is a big jump between the current tick
 * count and lasttick, we have probably hit a break
 * point.  Adjust endtick accordingly to avoid panic.
 */

    "at cpu_sync.xword[%d]: 0x%lx "
    "starttick: 0x%lx endtick: 0x%lx "
    "traptrace_id = 0x%lx\n",
/*
 * Recalculate the values of the cross-call timeout variables based
 * on the value of the 'inter-cpu-latency' property of the platform node.
 * The property sets the number of nanosec to wait for a cross-call
 * to be acknowledged.  Other timeout variables are derived from it.
 *
 * N.B. This implementation is aware of the internals of xc_init()
 * and updates many of the same variables.
 */

/* See x_call.c for descriptions of these extern variables. */

/* Temp versions of the target variables */

/*
 * Look up the 'inter-cpu-latency' (optional) property in the
 * platform node of the MD. The units are nanoseconds.
 */

    "Unable to initialize machine description");

    "inter-cpu-latency", &latency) == -1)
/*
 * clock.h defines an assembly-language macro
 * (NATIVE_TIME_TO_NSEC_SCALE) to convert from %stick
 * units to nanoseconds.  Since the inter-cpu-latency
 * units are nanoseconds and the xc_* variables require
 * %stick units, we need the inverse of that function.
 * The trick is to perform the calculation without
 * floating point, but also without integer truncation
 * or overflow.  To understand the calculation below,
 * please read the discussion of the macro in clock.h.
 * Since this new code will be invoked infrequently,
 * we can afford to implement it in C.
 */

/*
 * tick_scale is the reciprocal of nsec_scale which is
 * calculated at startup in setcpudelay().  The calc
 * of tick_limit parallels that of NATIVE_TIME_TO_NSEC_SCALE
 * except we use tick_scale instead of nsec_scale and
 * C instead of assembler.
 */

/*
 * xc_init() calculated 'maxfreq' by looking at all the cpus,
 * and used it to derive some of the timeout variables that we
 * recalculate below.  We can back into the original value by
 * using the inverse of one of those calculations.
 */

/*
 * Don't allow the new timeout (xc_tick_limit) to fall below
 * the system tick frequency (stick).  Allowing the timeout
 * to be set more tightly than this empirically determined
 * value may cause panics.
 */

/*
 * Recalculate xc_scale since it is used in a callback function
 * (xc_func_timeout_adj) to adjust two of the timeouts dynamically.
 * Make the change in xc_scale proportional to the change in
 */

/*
 * Don't modify the timeouts if nothing has changed.  Else,
 * stuff the variables with the freshly calculated (temp)
 * variables.  This minimizes the window where the set of
 * values could be inconsistent.
 */

/*
 * Force the new values to be used for future cross
 * calls.  This is necessary only when we increase
 */

/*
 * Try to register soft_state api.  If it fails, soft_state api has not
 * been implemented in the firmware, so do not bother to setup
 * soft_state in the kernel.
 */

/*
 * Tell OBP that we are supporting Guest State
 */

    "hv_soft_state_set returned %ld\n", rc);

    "hv_soft_state_get returned %ld\n", rc);