mp_machdep.c revision 7417cfdecea1902cef03c0d61a72df97d945925d
/*
 * Local function prototypes
 */

/*
 * External reference functions
 */

/*
 * PSM functions initialization
 */

/* global IRM pool for APIX (PSM) module */

/*
 * True if the generic TSC code is our source of hrtime, rather than whatever
 * else is in use.
 */

/*
 * True if the hrtime implementation is "hires"; namely, better than microdata.
 */

/*
 * Virtualization support for PSM.
 */

/*
 * If non-zero, idle cpus will become "halted" when there's nothing to do.
 */

/*
 * If non-zero, idle cpus will use mwait if available to halt instead of hlt.
 */

/*
 * Set to 0 to avoid MONITOR+CLFLUSH assertion.
 */

/*
 * If non-zero, idle cpus will not use the power saving Deep C-States idle loop.
 */

/*
 * Non-power saving idle loop and wakeup pointers.
 * Allows user to toggle Deep Idle power saving feature on/off.
 */

/*
 * Object for the kernel to access the HPET.
 */
#endif	/* ifndef __xpv */

/*
 * Compare two CPUs and see if they have a pghw_type_t sharing relationship.
 * If pghw_type_t is an unsupported hardware type, then return -1.
 */

/*
 * Return a physical instance identifier for known hardware sharing
 * relationships.
 */

/*
 * Express preference for optimizing for sharing relationship.
 */

/*
 * Override the default CMT dispatcher policy for the specified
 * hardware sharing relationship.
 */

/*
 * For shared caches, also load balance across them to
 * maximize aggregate cache capacity.
 */

/* Set the nosteal interval (used by disp_getbest()) to 100us */

/*
 * Routine to ensure initial callers to hrtime get 0 as return value.
 */

/*
 * Supports Deep C-State power saving idle loop.
 */

/*
 * Function called by the CPU idle notification framework to check whether
 * the CPU has been awakened. It will be called with interrupts disabled.
 * If the CPU has been awakened, call cpu_idle_exit() to notify the CPU idle
 * notification framework.
 */

/*
 * Toggle interrupt flag to detect pending interrupts.
 * If an interrupt happened, do_interrupt() will notify the CPU idle
 * notification framework, so there is no need to call cpu_idle_exit() here.
 */

/*
 * Idle the present CPU until awakened via an interrupt.
 */

/*
 * If this CPU is online, and there are multiple CPUs
 * in the system, then we should note our halting
 * by adding ourselves to the partition's halted CPU
 * bitmap. This allows other CPUs to find/awaken us when
 * work becomes available.
 */

/*
 * Add ourselves to the partition's halted CPUs bitmap
 * and set our HALTED flag, if necessary.
 *
 * When a thread becomes runnable, it is placed on the queue
 * and then the halted CPU bitmap is checked to determine who
 * (if anyone) should be awakened. We therefore need to first
 * add ourselves to the bitmap, and then check if there
 * is any work available. The order is important to prevent a race
 * that can lead to work languishing on a run queue somewhere while
 * this CPU remains halted.
 *
 * Either the producing CPU will see we're halted and will awaken us,
 * or this CPU will see the work available in disp_anywork().
 */
/*
 * Note that memory barriers after updating the HALTED flag
 * are not necessary since an atomic operation (updating the bitset)
 * immediately follows. On x86 the atomic operation acts as a
 * memory barrier for the update of cpu_disp_flags.
 */

/*
 * Check to make sure there's really nothing to do.
 * Work destined for this CPU may become available after
 * this check. We'll be notified through the clearing of our
 * bit in the halted CPU bitmap, and a poke.
 */

/*
 * We're on our way to being halted.
 *
 * Disable interrupts now, so that we'll awaken immediately
 * after halting if someone tries to poke us between now and
 * the time we actually halt.
 *
 * We check for the presence of our bit after disabling interrupts.
 * If it's cleared, we'll return. If the bit is cleared after
 * we check then the poke will pop us out of the halted state.
 *
 * This means that the ordering of the poke and the clearing
 * of the bit by cpu_wakeup is important.
 * cpu_wakeup() must clear, then poke.
 * cpu_idle() must disable interrupts, then check for the bit.
 */

/*
 * The check for anything locally runnable is here for performance
 * and isn't needed for correctness. disp_nrunnable ought to be
 * in our cache still, so it's inexpensive to check, and if there
 * is anything runnable we won't have to wait for the poke.
 */

/*
 * If "cpu" is halted, then wake it up clearing its halted bit in advance.
 * Otherwise, see if other CPUs in the cpu partition are halted and need to
 * be woken up so that they can steal the thread we placed on this CPU.
 * This function is only used on MP systems.
 */

/*
 * Clear the halted bit for that CPU since it will be woken up.
 */

/*
 * We may find the current CPU present in the halted cpuset
 * if we're in the context of an interrupt that occurred
 * before we had a chance to clear our bit in cpu_idle().
 * Poking ourself is obviously unnecessary, since if
 * we're here, we're not halted.
 */

/*
 * This cpu isn't halted, but it's idle or undergoing a
 * context switch. No need to awaken anyone else.
 */
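The clear-then-poke / disable-then-check ordering described in these comments can be modeled in a few lines. This is a single-threaded illustration with invented names, not the kernel's code (the real implementation uses a per-partition bitset, sti/hlt, and an IPI poke); the "racer" callback stands in for a wakeup arriving between the idle CPU's steps.

```c
#include <stdbool.h>
#include <stddef.h>

static bool halted_bit;		/* our bit in the partition's halted set */
static bool poke_pending;	/* stands in for the wakeup IPI */
static bool intr_enabled = true;

/* Waker side: clear the bit first, then poke, in that order. */
static void
cpu_wakeup_model(void)
{
	halted_bit = false;
	poke_pending = true;
}

/* Idle side: returns true if the CPU would actually halt. */
static bool
cpu_idle_model(void (*racer)(void))
{
	halted_bit = true;		/* advertise that we are halting */
	if (racer != NULL)
		racer();		/* a wakeup may race in right here */
	intr_enabled = false;		/* close the window before hlt */
	if (!halted_bit) {		/* waker already cleared our bit */
		intr_enabled = true;
		return (false);
	}
	return (true);			/* would sti;hlt; the poke wakes us */
}
```

Because the waker clears before poking and the idler disables interrupts before re-checking, a wakeup that lands anywhere in the window is either seen in the bit check or delivered as the poke interrupt; it cannot be lost.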
/*
 * No need to wake up other CPUs if this is for a bound thread.
 */

/*
 * The CPU specified for wakeup isn't currently halted, so check
 * to see if there are any other halted CPUs in the partition,
 * and if there are then awaken one.
 */

/*
 * Function called by the CPU idle notification framework to check whether
 * the CPU has been awakened. It will be called with interrupts disabled.
 * If the CPU has been awakened, call cpu_idle_exit() to notify the CPU idle
 * notification framework.
 */

/*
 * CPU has been awakened, notify the CPU idle notification system.
 */

/*
 * Toggle interrupt flag to detect pending interrupts.
 * If an interrupt happened, do_interrupt() will notify the CPU idle
 * notification framework, so there is no need to call cpu_idle_exit() here.
 */

/*
 * Idle the present CPU until awakened via touching its monitored line.
 */

/*
 * Set our mcpu_mwait here, so we can tell if anyone tries to
 * wake us between now and when we call mwait. No other cpu will
 * attempt to set our mcpu_mwait until we add ourself to the halted
 * CPU bitmap.
 */

/*
 * If this CPU is online, and there are multiple CPUs
 * in the system, then we should note our halting
 * by adding ourselves to the partition's halted CPU
 * bitmap. This allows other CPUs to find/awaken us when
 * work becomes available.
 */

/*
 * Add ourselves to the partition's halted CPUs bitmap
 * and set our HALTED flag, if necessary.
 *
 * When a thread becomes runnable, it is placed on the queue
 * and then the halted CPU bitmap is checked to determine who
 * (if anyone) should be awakened. We therefore need to first
 * add ourselves to the bitmap, and then check if there
 * is any work available.
 *
 * Note that memory barriers after updating the HALTED flag
 * are not necessary since an atomic operation (updating the bitmap)
 * immediately follows. On x86 the atomic operation acts as a
 * memory barrier for the update of cpu_disp_flags.
 */

/*
 * Check to make sure there's really nothing to do.
 * Work destined for this CPU may become available after
 * this check. We'll be notified through the clearing of our
 * bit in the halted CPU bitmap, and a write to our mcpu_mwait.
 */
/*
 * disp_anywork() checks disp_nrunnable, so we do not have to later.
 */

/*
 * We're on our way to being halted.
 * To avoid a lost wakeup, arm the monitor before checking if another
 * cpu wrote to mcpu_mwait to wake us up.
 */

/*
 * If "cpu" is halted in mwait, then wake it up clearing its halted bit in
 * advance. Otherwise, see if other CPUs in the cpu partition are halted and
 * need to be woken up so that they can steal the thread we placed on this CPU.
 * This function is only used on MP systems.
 */

/*
 * Clear the halted bit for that CPU since it will be woken up in a moment.
 */

/*
 * We may find the current CPU present in the halted cpuset
 * if we're in the context of an interrupt that occurred
 * before we had a chance to clear our bit in cpu_idle().
 * Waking ourself is obviously unnecessary, since if
 * we're here, we're not halted. Doing the wakeup anyway is
 * harmless and less expensive than always checking if we
 * are waking ourself, which is an uncommon case.
 */

/*
 * This cpu isn't halted, but it's idle or undergoing a
 * context switch. No need to awaken anyone else.
 */

/*
 * No need to wake up other CPUs if the thread we just enqueued is bound.
 */

/*
 * See if there are any other halted CPUs. If there are, then
 * select one, and awaken it.
 * It's possible that after we find a CPU, somebody else
 * will awaken it before we get the chance.
 * In that case, look again.
 */

/*
 * Switch to the offline cpu.
 */

/*
 * Raise ipl to just below the cross call level.
 */

/*
 * Set base spl to prevent the next swtch to idle from lowering it.
 */

/*
 * Switch to the online cpu.
 */

/*
 * Clear the interrupt active mask.
 */

/* no psm_preshutdown function */
/* no psm_intr_ops function */
/* no psm_state function */
/* no psm_cpu_ops function */

/*
 * Save the version of the PSM module, in case we need to
 * behave differently based on version.
 */

panic("No valid PSM modules found");
/* check to see if there are any conflicts */

/* remove all psm modules except uppc */

"Conflicts detected on the following PSM modules:");
"Setting the system back to SINGLE processor mode!");
"Please edit /etc/mach to remove the invalid PSM module.");
/* register the interrupt and clock initialization routines */

/* register the interrupt setup code */

/*
 * Time-of-day functionality now handled in TOD modules.
 * (Warn about PSM modules that think that we're going to use theirs.)
 */

/*
 * Initialize the dispatcher's function hooks to enable CPU halting
 * when idle. Set both the deep-idle and non-deep-idle hooks.
 *
 * Assume we can use the power saving deep-idle loop cpu_idle_adaptive.
 * The platform deep-idle driver will reset our idle loop to
 * non_deep_idle_cpu if the power saving deep-idle feature is not available.
 *
 * Do not use monitor/mwait if idle_cpu_use_hlt is not set (spin idle)
 * or idle_cpu_prefer_mwait is not set.
 */

/*
 * Protect ourself from insane mwait size.
 */
"handle cpu 0 mwait size.");
/*
 * Disable power saving deep idle loop?
 */

/*
 * Only add boot_ncpus CPUs to mp_cpus. Other CPUs will be handled
 * by the CPU DR driver at runtime.
 */

/* MP related routines */

/* optional MP related routines */

/* check for multiple CPUs */

/* check for MP platforms */

/*
 * Set the dispatcher hook to enable cpu "wake up"
 * when a thread becomes runnable.
 */

/* register the interrupt handlers */

/* initialize the interrupt hardware */

/* set interrupt mask for current ipl */

/*
 * During dom0 bringup, it was noted that on at least one older
 * Intel HT machine, the hypervisor initially gives a tsc_to_system_mul
 * value that is quite wrong (the 3.06GHz clock was reported incorrectly).
 *
 * The curious thing is that if you stop the kernel at entry,
 * breakpoint here and inspect the value with kmdb, the value
 * is correct - but if you don't stop and simply enable the
 * printf statement (below), you can see the bad value printed
 * here. Almost as if something kmdb did caused the hypervisor to
 * figure it out correctly. And, note that the hypervisor
 * eventually -does- figure it out correctly ... if you look at
 * the field later in the life of dom0, it is correct.
 *
 * For now, on dom0, we employ a slightly cheesy workaround of
 * using the DOM0_PHYSINFO hypercall.
 */

printf("mach_getcpufreq: system_mul 0x%x, shift %d, "

/*
 * We have a TSC. freq_tsc() knows how to measure the number
 * of clock cycles sampled against the PIT.
 */

panic("mach_getcpufreq: no TSC!");
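For reference, Xen's public `vcpu_time_info` defines the conversion the dom0 comment above refers to as `ns = ((cycles << tsc_shift) * tsc_to_system_mul) >> 32` (a negative shift means shifting cycles right instead), so the CPU frequency can be recovered by inverting it. A hedged sketch of that inversion follows; the function name is invented and the kernel's actual arithmetic is not shown in this file.

```c
#include <stdint.h>

/*
 * Recover CPU frequency (Hz) from Xen's tsc_to_system_mul/tsc_shift
 * pair. Illustrative only: 10^9 ns/sec * 2^32 / mul gives cycles/sec
 * before the shift correction is undone.
 */
static uint64_t
xen_mul_to_hz(uint32_t system_mul, int tsc_shift)
{
	uint64_t hz = (UINT64_C(1000000000) << 32) / system_mul;

	if (tsc_shift >= 0)
		hz >>= tsc_shift;	/* larger shift means slower clock */
	else
		hz <<= -tsc_shift;
	return (hz);
}
```

For example, a 1 GHz clock is typically encoded by Xen as `mul = 0x80000000, shift = 1`, and a 2 GHz clock as `mul = 0x80000000, shift = 0`.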
/*
 * We are a Cyrix based on a 6x86 core or an Intel Pentium
 * for which freq_notsc() knows how to measure the number of
 * elapsed clock cycles sampled against the PIT.
 */

/* We do not know how to calculate cpu frequency for this cpu. */

/*
 * If the clock speed of a cpu is found to be reported incorrectly, do not add
 * to this array, instead improve the accuracy of the algorithm that determines
 * the clock speed of the processor or extend the implementation to support the
 * vendor as appropriate. This is here only to support adjusting the speed on
 * older slower processors that mach_fixcpufreq() would not be able to account
 * for otherwise.
 */
static int x86_cpu_freq[] = { 60, 75, 80, 90, 120, 160, 166, 175, 180, 233 };
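mach_fixcpufreq(), described below, snaps the measured frequency to the nearest integer multiple of 50 MHz or 200/3 MHz, whichever is closer, within a bounded correction. The following is a standalone sketch of that heuristic; the 5 MHz cap and the rounding details are assumptions for illustration, not the kernel's exact arithmetic (which also consults the table of odd legacy frequencies above).

```c
/*
 * Snap a measured MHz value to the nearest multiple of 200/3 MHz or
 * 50 MHz; keep the measured value if the correction would exceed a
 * small cap (assumed to be 5 MHz here).
 */
static unsigned int
fix_cpu_freq_mhz(unsigned int measured)
{
	/* Nearest multiple of 200/3 MHz; 666.67 rounds up to 667. */
	unsigned int mul66 = (measured * 3 + 100) / 200;
	unsigned int fix66 = (mul66 * 200 + 1) / 3;

	/* Nearest multiple of 50 MHz. */
	unsigned int fix50 = ((measured + 25) / 50) * 50;

	unsigned int d66 = (measured > fix66) ? measured - fix66 : fix66 - measured;
	unsigned int d50 = (measured > fix50) ? measured - fix50 : fix50 - measured;

	unsigned int fixed = (d66 < d50) ? fix66 : fix50;
	unsigned int delta = (d66 < d50) ? d66 : d50;

	/* Cap the correction so genuinely odd clocks are left alone. */
	return ((delta > 5) ? measured : fixed);
}
```

This reproduces the examples cited in the comment below: 998 snaps to 1000, 731 to 733, and 1495 to 1500, while a value far from both grids is returned unchanged.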
/*
 * On fast processors the clock frequency that is measured may be off by
 * a few MHz from the value printed on the part. This is a combination of
 * the factors that for such fast parts being off by this much is within
 * the tolerances for manufacture and because of the difficulties in the
 * measurement that can lead to small error. This function uses some
 * heuristics in order to tweak the value that was measured to match what
 * is most likely printed on the part.
 *
 * For example:
 *	AMD Athlon 1000 mhz measured as 998 mhz
 *	Intel Pentium III Xeon 733 mhz measured as 731 mhz
 *	Intel Pentium IV 1500 mhz measured as 1495 mhz
 *
 * If in the future this function is no longer sufficient to correct
 * for the error in the measurement, then the algorithm used to perform
 * the measurement will have to be improved in order to increase accuracy
 * rather than adding horrible and questionable kludges here.
 *
 * This is called after the cyclics subsystem because of the potential
 * that the heuristics within may give a worse estimate of the clock
 * frequency than the value that was measured.
 */

/*
 * Find the nearest integer multiple of 200/3 (about 66) MHz to the
 * measured speed, taking into account that the 667 MHz parts were
 * rounded up from 666.67 MHz.
 */

/* Find the nearest integer multiple of 50 MHz to the measured speed */

/* Find the closer of the two */

/*
 * Some older parts have a core clock frequency that is not an
 * integral multiple of 50 or 66 MHz. Check if one of the old
 * clock frequencies is closer to the measured value than any
 * of the integral multiples of 50 and 66, and if so set fixed
 * and delta appropriately to represent the closest value.
 */

/*
 * Set a reasonable maximum for how much to correct the measured
 * result by. This check is here to prevent the adjustment made
 * by this function from being more harm than good. It is entirely
 * possible that in the future parts will be made that are not
 * integral multiples of 66 or 50 in clock frequency or that
 * someone may overclock a part to some odd frequency. If the
 * measured value is farther from the corrected value than
 * allowed, then assume the corrected value is in error and use
 * the measured value.
 */

/* Round to nearest MHz */

/* scalehrtimef will remain dummy */

/*
 * Either periodic mode was requested or we could not set one-shot mode.
 * psm should be able to do periodic, so we do not check
 * for the return value of psm_clkinit here.
 */

/*
 * PSMI interface prior to PSMI_3 does not define a return
 * value for psm_clkinit, so the return value is ignored.
 */

/*
 * This is here to allow us to simulate cpus that refuse to start.
 */

/*
 * Default handler to create device node for CPU.
 * One reference count will be held on created device node.
 */

/* First check whether the cpus node exists. */

/* Create the cpus node if it doesn't exist. */
"?failed to create cpu nexus device.\n");
/*
 * Create a child node for the cpu identified as 'cpu_id'.
 */
"?failed to create device node for cpu%d.\n", cp->cpu_id);
/*
 * Create cpu device node in device tree and online it.
 * Return created dip with reference count held if requested.
 */

/* Recursively attach driver for parent nexus device. */

/* Configure cpu itself and descendants. */

/*
 * The dipp contains one of following values on return:
 *	- NULL if no device node found
 *	- pointer to device node if found
 */

return (irqno);
/* default to NO translation */

/*
 * SL_FATAL is passed in once panicstr is set; deliver it
 * as CE_PANIC. Also, translate SL_ codes back to CE_
 * codes for the psmi handler.
 */

/*
 * It provides the default basic intr_ops interface for the new DDI
 * interrupt framework if the PSM doesn't have one.
 *
 * Input:
 *	dip	- pointer to the dev_info structure of the requested device
 *	hdlp	- pointer to the internal interrupt handle structure for the
 *		  requested interrupt
 *	intr_op	- opcode for this call
 *	result	- pointer to the integer that will hold the result to be
 *		  passed back if return value is PSM_SUCCESS
 *
 * Output:
 *	return value is either PSM_SUCCESS or PSM_FAILURE
 */

/*
 * Return 1 if CMT load balancing policies should be
 * implemented across instances of the specified hardware
 * sharing relationship.
 */

/*
 * Return 1 if thread affinity policies should be implemented
 * for instances of the specified hardware sharing relationship.
 */

/*
 * Return number of counter events requested to measure hardware capacity and
 * utilization and setup CPC requests for specified CPU as needed.
 *
 * May return 0 when platform or processor specific code knows that no CPC
 * events should be programmed on this CPU or -1 when platform or processor
 * specific code doesn't know which counter events are best to use and common
 * code should decide for itself.
 */

/* LINTED E_FUNC_ARG_UNUSED */

/*
 * Return error if pcbe_ops not set.
 */

/*
 * Return that no CPC events should be programmed on hyperthreaded
 * Pentium 4, and return error for all other x86 processors to tell
 * common code to decide what counter events to program on those CPUs
 * for measuring hardware capacity and utilization.
 */
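The counter-event decision described above reduces to a small function. A model follows (the name and parameters are invented for illustration): -1 tells common code to choose counter events itself (also the error case when no PCBE backend is loaded), while 0 means no CPC events should be programmed at all.

```c
/*
 * Model of the capacity/utilization CPC decision: no backend loaded is
 * an error (-1), hyperthreaded Pentium 4 programs nothing (0), and all
 * other x86 processors defer to common code (-1).
 */
static int
cu_plat_cpc_ncounters_model(int pcbe_ops_set, int is_htt_pentium4)
{
	if (!pcbe_ops_set)
		return (-1);		/* no backend: error */
	if (is_htt_pentium4)
		return (0);		/* program nothing on HT Pentium 4 */
	return (-1);			/* other x86: common code decides */
}
```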