/*
 * Copyright (c) 1998, 2013, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

// Need to inhibit inlining for older versions of GCC to avoid build-time failures.

#ifdef DTRACE_ENABLED

// Only bother with this argument setup if dtrace is available.
// TODO-FIXME: probes should not fire when the caller is _blocked.  assert() accordingly.
#else // ndef DTRACE_ENABLED
#endif // ndef DTRACE_ENABLED

// The knob* variables are effectively final.  Once set they should
// never be modified thereafter.  Consider using __read_mostly with GCC.

// -----------------------------------------------------------------------------
// Theory of operations -- Monitors lists, thread residency, etc:
//
// * A thread acquires ownership of a monitor by successfully
//   CAS()ing the _owner field from null to non-null.
//
// * Invariant: A thread appears on at most one monitor list --
//   cxq, EntryList or WaitSet -- at any one time.
//
// * Contending threads "push" themselves onto the cxq with CAS
//   and then spin or park.
//
// * After a contending thread eventually acquires the lock it must
//   dequeue itself from either the EntryList or the cxq.
//
// * The exiting thread identifies and unparks an "heir presumptive"
//   tentative successor thread on the EntryList.  Critically, the
//   exiting thread doesn't unlink the successor thread from the EntryList.
//   After having been unparked, the wakee will recontend for ownership of
//   the monitor.  The successor (wakee) will either acquire the lock or
//   re-park itself.
//
//   Succession is provided for by a policy of competitive handoff.
//   The exiting thread does _not_ grant or pass ownership to the
//   successor thread.  (This is also referred to as "handoff succession").
//   Instead the exiting thread releases ownership and possibly wakes
//   a successor, so the successor can (re)compete for ownership of the lock.
//   If the EntryList is empty but the cxq is populated the exiting
//   thread will drain the cxq into the EntryList.  It does so by
//   detaching the cxq (installing null with CAS) and folding
//   the threads from the cxq into the EntryList.  The EntryList is
//   doubly linked, while the cxq is singly linked because of the
//   CAS-based "push" used to enqueue recently arrived threads (RATs).
//
// * Concurrency invariants:
//   -- Only the monitor owner may access or mutate the EntryList.
//      The mutex property of the monitor itself protects the EntryList
//      from concurrent interference.
//   -- Only the monitor owner may detach the cxq.
//
// * The monitor entry list operations avoid locks, but strictly speaking
//   they're not lock-free.  Enter is lock-free, exit is not.
//
// * The cxq can have multiple concurrent "pushers" but only one concurrent
//   detaching thread.  This mechanism is immune to ABA corruption.
//   More precisely, the CAS-based "push" onto cxq is ABA-oblivious.
//
// * Taken together, the cxq and the EntryList constitute a single
//   logical queue of threads stalled trying to acquire the lock.
//   We use two distinct lists to improve the odds of a constant-time
//   dequeue operation after acquisition (in the ::enter() epilog) and
//   to reduce heat on the list ends.  (c.f. Michael Scott's "2Q" algorithm).
//   A key desideratum is to minimize queue & monitor metadata manipulation
//   that occurs while holding the monitor lock -- that is, we want to
//   minimize monitor lock hold times.  Note that even a small amount of
//   fixed spinning will greatly reduce the # of enqueue-dequeue operations
//   on EntryList|cxq.  That is, spinning relieves contention on the "inner"
//   locks and monitor metadata.  (A sketch of the CAS-based push follows.)
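// As a concrete illustration of the CAS-based "push" described above, here is
// a hedged sketch using C++11 atomics and a hypothetical Waiter node type.
// Illustrative only: the real code uses ObjectWaiter and the VM's own CAS
// primitives rather than std::atomic.

#include <atomic>

struct Waiter { Waiter* _next; };

// Push self onto the front of a cxq-style singly-linked LIFO.  The push is
// ABA-oblivious: even if the observed head is popped and re-pushed between
// the load and the CAS, linking self in front of it remains correct.
static void push_onto_cxq(std::atomic<Waiter*>& cxq, Waiter* self) {
  Waiter* head = cxq.load(std::memory_order_relaxed);
  for (;;) {
    self->_next = head;                 // link before publishing
    if (cxq.compare_exchange_weak(head, self,
                                  std::memory_order_release,
                                  std::memory_order_relaxed)) {
      return;                           // published
    }
    // CAS failed because the head changed; head was reloaded -- just retry.
  }
}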
//
// Cxq points to the set of Recently Arrived Threads attempting entry.
// Because we push threads onto _cxq with CAS, the RATs must take the form of
// a singly-linked LIFO.  We drain _cxq into EntryList at unlock-time when
// the unlocking thread notices that EntryList is null but _cxq is != null.
//
// The EntryList is ordered by the prevailing queue discipline and
// can be organized in any convenient fashion, such as a doubly-linked list or
// a circular doubly-linked list.  Critically, we want insert and delete operations
// to operate in constant-time.  If we need a priority queue then something akin
// to Solaris' sleepq would work nicely.
//
// Queue discipline is enforced at ::exit() time, when the unlocking thread
// drains the cxq into the EntryList, and orders or reorders the threads on the
// EntryList accordingly.
//
// Barring "lock barging", this mechanism provides fair cyclic ordering,
// somewhat similar to an elevator-scan.
//
// * The monitor synchronization subsystem avoids the use of native
//   synchronization primitives except for the narrow platform-specific
//   park-unpark abstraction; see the platform sources for
//   the semantics of park-unpark.  Put another way, this monitor implementation
//   depends only on atomic operations and park-unpark.  The monitor subsystem
//   manages all RUNNING->BLOCKED and BLOCKED->READY transitions while the
//   underlying OS manages the READY<->RUN transitions.
//
// * Waiting threads reside on the WaitSet list -- wait() puts
//   the caller onto the WaitSet.
//
// * notify() or notifyAll() simply transfers threads from the WaitSet to
//   either the EntryList or cxq.  Subsequent exit() operations will
//   unpark the notifyee.  Unparking a notifyee in notify() is inefficient --
//   it's likely the notifyee would simply impale itself on the lock held
//   by the notifier.
//
// * An interesting alternative is to encode cxq as (List,LockByte) where
//   the LockByte is 0 iff the monitor is owned.  _owner is simply an auxiliary
//   variable, like _recursions, in the scheme.  The threads or Events that form
//   the list would have to be aligned on 256-byte addresses.  A thread would
//   try to acquire the lock or enqueue itself with CAS, but exiting threads
//   could use a 1-0 protocol and simply STB to set the LockByte to 0.
//   Note that this is *not* word-tearing, but it does presume that full-word
//   CAS operations are coherent when intermixed with STB operations.  That's true
//   on most common processors.
//
// -----------------------------------------------------------------------------

  // The following code is ordered to check the most common cases first
  // and to reduce RTS->RTO cache line upgrades on SPARC and IA32 processors.

  // Either ASSERT _recursions == 0 or explicitly set _recursions = 0.
  // CONSIDER: set or assert OwnerIsThread == 1

  // TODO-FIXME: check for integer overflow!  BUGID 6557169.

  // Commute owner from a thread-specific on-stack BasicLockObject address to
  // a full-fledged "Thread *".

  // We've encountered genuine contention.

  // Try one round of spinning *before* enqueueing Self
  // and before going through the awkward and expensive state
  // transitions.  The following spin is strictly optional ...
  // Note that if we acquire the monitor from an initial spin
  // we forgo posting JVMTI events and firing DTRACE probes.
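// To summarize the entry protocol sketched by the comments above -- a hedged,
// self-contained outline.  The names (MonitorSketch, try_lock, try_spin,
// enqueue_and_park) are illustrative, not HotSpot's, and the slow path is
// deliberately left abstract:

#include <atomic>

struct Thread;

struct MonitorSketch {
  std::atomic<Thread*> _owner{nullptr};

  bool try_lock(Thread* self) {
    Thread* expected = nullptr;          // acquire by CASing null -> self
    return _owner.compare_exchange_strong(expected, self,
                                          std::memory_order_acquire);
  }

  bool try_spin(Thread* self) {
    // Placeholder fixed spin; the real policy is adaptive (see the
    // Adaptive Spinning section far below).
    for (int i = 0; i < 1000; i++) {
      if (_owner.load(std::memory_order_relaxed) == nullptr &&
          try_lock(self)) {
        return true;                     // acquired without parking
      }
    }
    return false;
  }

  void enqueue_and_park(Thread* self);   // push onto _cxq, park, recontend

  void enter(Thread* self) {
    if (try_lock(self)) return;          // uncontended fast path
    if (try_spin(self)) return;          // optional spin: skips JVMTI/DTrace
    enqueue_and_park(self);              // genuine contention
  }
};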
  // Prevent deflation at STW-time.  See deflate_idle_monitors() and is_busy().
  // Ensure the object-monitor relationship remains stable while there's contention.
  { // Change java thread status to indicate blocked on monitor enter.

    // TODO-FIXME: change the following for(;;) loop to straight-line code.

      // cleared by handle_special_suspend_equivalent_condition()
      // or java_suspend_self()

      // We have acquired the contended monitor, but while we were
      // waiting another thread suspended us.  We don't want to enter
      // the monitor while suspended because that would surprise the
      // thread that suspended us.

  // Must either set _recursions = 0 or ASSERT _recursions == 0.

  // The thread -- now the owner -- is back in vm mode.
  // Report the glorious news via TI, DTrace and jvmstat.
  // The probe effect is non-trivial.  All the reportage occurs
  // while we hold the monitor, increasing the length of the critical
  // section.  Amdahl's parallel speedup law comes vividly into play.
  //
  // Another option might be to aggregate the events (thread local or
  // per-monitor aggregation) and defer reporting until a more opportune
  // time -- such as next time some thread encounters contention but has
  // yet to acquire the lock.  While spinning, that thread could also
  // increment JVMStat counters, etc.

// Caveat: TryLock() is not necessarily serializing if it returns failure.
// Callers must compensate as needed.

  // Either guarantee _recursions == 0 or set _recursions = 0.
  // CONSIDER: set or assert that OwnerIsThread == 1

    // The lock had been free momentarily, but we lost the race to the lock.
    // Interference -- the CAS failed.
    // We can either return -1 or retry.
    // Retry doesn't make as much sense because the lock was just acquired.

  // We try one round of spinning *before* enqueueing Self.
  //
  // If the _owner is ready but OFFPROC we could use a YieldTo()
  // operation to donate the remainder of this thread's quantum
  // to the owner.  This has subtle but beneficial affinity
  // effects.

  // The Spin failed -- Enqueue and park the thread ...

  // Enqueue "Self" on ObjectMonitor's _cxq.
  //
  // Node acts as a proxy for Self.
  // As an aside, if we were ever to rewrite the synchronization code mostly
  // in Java, WaitNodes, ObjectMonitors, and Events would become 1st-class
  // Java objects.  This would avoid awkward lifecycle and liveness issues,
  // as well as eliminate a subset of ABA issues.
  // TODO: eliminate ObjectWaiter and enqueue either Threads or Events.

  // Push "Self" onto the front of the _cxq.
  // Note that spinning tends to reduce the rate at which threads
  // enqueue and dequeue on EntryList|cxq.

    // Interference -- the CAS failed because _cxq changed.  Just retry.
    // As an optional optimization we retry the lock.

  // Check for cxq|EntryList edge transition to non-null.  This indicates
  // the onset of contention.  While contention persists exiting threads
  // will use a ST:MEMBAR:LD 1-1 exit protocol.  When contention abates exit
  // operations revert to the faster 1-0 mode.  This enter operation may interleave
  // (race) a concurrent 1-0 exit operation, resulting in stranding, so we
  // arrange for one of the contending threads to use a timed park() operation
  // to detect and recover from the race.  (Stranding is a form of progress failure
  // where the monitor is unlocked but all the contending threads remain parked.)
  // That is, at least one of the contended threads will periodically poll _owner.
  // One of the contending threads will become the designated "Responsible" thread.
  // The Responsible thread uses a timed park instead of a normal indefinite park
  // operation -- it periodically wakes and checks for and recovers from potential
  // strandings admitted by 1-0 exit operations.  We need at most one Responsible
  // thread per-monitor at any given moment.  Only threads on cxq|EntryList may
  // be responsible for a monitor.
  //
  // Currently, one of the contended threads takes on the added role of "Responsible".
  // A viable alternative would be to use a dedicated "stranding checker" thread
  // that periodically iterated over all the threads (or active monitors) and unparked
  // successors where there was risk of stranding.  This would help eliminate the
  // timer scalability issues we see on some platforms as we'd only have one thread
  // -- the checker -- parked on a timer.

    // Try to assume the role of responsible thread for the monitor.
    // CONSIDER: ST vs CAS vs { if (Responsible==null) Responsible=Self }

    // The lock might have been released while this thread was occupied queueing
    // itself onto _cxq.  To close the race and avoid "stranding" and
    // progress-liveness failure we must resample-retry _owner before parking.
    // In this case the ST-MEMBAR is accomplished with CAS().

    // TODO: Defer all thread state transitions until park-time.
    // Since state transitions are heavy and inefficient we'd like
    // to defer the state transitions until absolutely necessary,
    // and in doing so avoid some transitions ...

      // Increase the RecheckInterval, but clamp the value.

      // The lock is still contested.
      // Keep a tally of the # of futile wakeups.
      // Note that the counter is not protected by a lock or updated by atomics.
      // That is by design -- we trade "lossy" counters which are exposed to
      // races during updates for a lower probe effect.

    // Assuming this is not a spurious wakeup we'll normally find _succ == Self.
    // We can defer clearing _succ until after the spin completes.
    // TrySpin() must tolerate being called with _succ == Self.
    // Try yet another round of adaptive spinning.

    // We can find that we were unpark()ed and redesignated _succ while
    // we were spinning.  That's harmless.  If we iterate and call park(),
    // park() will consume the event and return immediately and we'll
    // just spin again.  This pattern can repeat, leaving _succ to simply
    // spin on a CPU.  Enable Knob_ResetEvent to clear pending unparks().
    // Alternately, we can sample fired() here, and if set, forgo spinning
    // in the next round.

    // Invariant: after clearing _succ a thread *must* retry _owner before parking.

  // Self has acquired the lock -- Unlink Self from the cxq or EntryList.
  // Normally we'll find Self on the EntryList.
  // From the perspective of the lock owner (this thread), the
  // EntryList is stable and cxq is prepend-only.
  // The head of cxq is volatile but the interior is stable.
  // In addition, Self.TState is stable.

  // We'd like to assert
  //   guarantee(((oop)(object()))->mark() == markOopDesc::encode(this), "invariant");
  // but as we're at a safepoint that's not safe.
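// To make the Responsible thread's role concrete -- a hedged sketch of the
// timed-park recovery described above.  park()/park_timed() stand in for the
// ParkEvent operations, and the interval constants are illustrative:

#include <atomic>

void park(Thread* self);                       // stand-in: indefinite park
void park_timed(Thread* self, int timeout_ms); // stand-in: timed park

static void contended_park_loop(std::atomic<Thread*>& owner,
                                std::atomic<Thread*>& responsible,
                                Thread* self) {
  int recheck_ms = 1;
  for (;;) {
    Thread* expected = nullptr;                // resample-retry _owner
    if (owner.compare_exchange_strong(expected, self)) break;  // before parking
    if (responsible.load() == self) {
      park_timed(self, recheck_ms);            // periodically wake and poll,
      recheck_ms *= 8;                         // recovering from stranding;
      if (recheck_ms > 1000) recheck_ms = 1000;   // back off, but clamp
    } else {
      park(self);                              // ordinary indefinite park
    }
    // Futile-wakeup accounting and the optional re-spin are elided.
  }
}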
     // We may leave threads on cxq|EntryList without a designated
     // "Responsible" thread.  This is benign.  When this thread subsequently
     // exits the monitor it can "see" such preexisting "old" threads --
     // threads that arrived on the cxq|EntryList before the fence, above --
     // by LDing cxq|EntryList.  Newly arrived threads -- that is, threads
     // that arrive on cxq after the ST:MEMBAR, above -- will set Responsible
     // non-null and elect a new "Responsible" timer thread.
     //
     // This thread executes:
     //   ST Responsible=null; MEMBAR              (in enter epilog -- here)
     //   LD cxq|EntryList                         (in subsequent exit)
     // while entering threads execute:
     //   ST cxq=nonnull; MEMBAR; LD Responsible   (in enter prolog)
     // The (ST cxq; MEMBAR) is accomplished with CAS().
     //
     // The MEMBAR, above, prevents the LD of cxq|EntryList in the subsequent
     // exit operation from floating above the ST Responsible=null.

  // We've acquired ownership with CAS().
  // But since the CAS() this thread may have also stored into _succ,
  // EntryList, cxq or Responsible.  These meta-data updates must be
  // visible __before this thread subsequently drops the lock.
  // Consider what could occur if we didn't enforce this constraint --
  // STs to monitor meta-data and user-data could reorder with (become
  // visible after) the ST in exit that drops ownership of the lock.
  // Some other thread could then acquire the lock, but observe inconsistent
  // or old monitor meta-data and heap data.  That violates the JMM.
  // To that end, the 1-0 exit() operation must have at least STST|LDST
  // "release" barrier semantics.  Specifically, there must be at least a
  // STST|LDST barrier in exit() before the ST of null into _owner that drops
  // the lock.  The barrier ensures that changes to monitor meta-data and data
  // protected by the lock will be visible before we release the lock, and
  // therefore before some other thread (CPU) has a chance to acquire the lock.
  //
  // Critically, any prior STs to _succ or EntryList must be visible before
  // the ST of null into _owner in the *subsequent* (following) corresponding
  // monitorexit.  Recall too, that in 1-0 mode monitorexit does not necessarily
  // execute a serializing instruction.

// ReenterI() is a specialized inline form of the latter half of the
// contended slow-path from EnterI().  We use ReenterI() only for
// monitor reentry in wait().
//
// In the future we should reconcile EnterI() and ReenterI(), adding
// Knob_Reset and Knob_SpinAfterFutile support and restructuring the
// code accordingly.

    // State transition wrappers around park() ...
    // ReenterI() wisely defers state transitions until
    // it's clear we must park the thread.

      // cleared by handle_special_suspend_equivalent_condition()
      // or java_suspend_self()

      // were we externally suspended while we were waiting?

      // Try again, but just so we distinguish between futile wakeups and
      // successful wakeups.  The following test isn't algorithmically
      // necessary, but it helps us maintain sensible statistics.

      // The lock is still contested.
      // Keep a tally of the # of futile wakeups.
      // Note that the counter is not protected by a lock or updated by atomics.
      // That is by design -- we trade "lossy" counters which are exposed to
      // races during updates for a lower probe effect.
      // Assuming this is not a spurious wakeup we'll normally
      // find that _succ == Self.

      // Invariant: after clearing _succ a contending thread
      // *must* retry _owner before parking.

  // Self has acquired the lock -- Unlink Self from the cxq or EntryList.
  // Normally we'll find Self on the EntryList.
  // Unlinking from the EntryList is constant-time and atomic-free.
  // From the perspective of the lock owner (this thread), the
  // EntryList is stable and cxq is prepend-only.
  // The head of cxq is volatile but the interior is stable.
  // In addition, Self.TState is stable.

// By convention we unlink a contending thread from EntryList|cxq immediately
// after the thread acquires the lock in ::enter().  Equally, we could defer
// unlinking the thread until ::exit()-time.

    // Normal case: remove Self from the DLL EntryList.
    // This is a constant-time operation.

    // Inopportune interleaving -- Self is still on the cxq.
    // This usually means the enqueue of self raced an exiting thread.
    // Normally we'll find Self near the front of the cxq, so
    // dequeueing is typically fast.  If need be we can accelerate
    // this with some MCS/CLH-like bidirectional list hints and advisory
    // back-links so dequeueing from the interior will normally operate
    // in constant-time.

    // Dequeue Self from either the head (with CAS) or from the interior
    // with a linear-time scan and normal non-atomic memory operations.
    // CONSIDER: if Self is on the cxq then simply drain cxq into EntryList
    // and then unlink Self from EntryList.  We have to drain eventually,
    // so it might as well be now.

        // The CAS above can fail from interference IFF a "RAT" arrived.
        // In that case Self must be in the interior and can no longer be
        // at the head of cxq.
        v = _cxq;          // CAS above failed - start scan at head of list
// -----------------------------------------------------------------------------
// Exit support
//
// Note that the collector can't reclaim the objectMonitor or deflate
// the object out from underneath the thread calling ::exit() as the
// thread calling ::exit() never transitions to a stable state.
// This inhibits GC, which in turn inhibits asynchronous (and
// inopportune) reclamation of "this".
//
// We'd like to assert that: (THREAD->thread_state() != _thread_blocked) ;
// There's one exception to the claim above, however.  EnterI() can call
// exit() to drop a lock if the acquirer has been externally suspended.
// In that case exit() is called with _thread_state as _thread_blocked,
// but the monitor's _count field is > 0, which inhibits reclamation.
//
// ::exit() uses a canonical 1-1 idiom with a MEMBAR although some of
// the fast-path operators have been optimized so the common ::exit()
// operation is 1-0.  See i486.ad fast_unlock(), for instance.
// The code emitted by fast_unlock() elides the usual MEMBAR.  This
// greatly improves latency -- MEMBAR and CAS having considerable local
// latency on modern processors -- but at the cost of "stranding".  Absent the
// MEMBAR, a thread in fast_unlock() can race a thread in the slow
// ::enter() path, resulting in the entering thread being stranded
// and a progress-liveness failure.  Stranding is extremely rare.
// We use timers (timed park operations) & periodic polling to detect
// and recover from stranding.  Potentially stranded threads periodically
// wake up and poll the lock.  See the usage of the _Responsible variable.
//
// The CAS() in enter provides for safety and exclusion, while the CAS or
// MEMBAR in exit provides for progress and avoids stranding.  1-0 locking
// eliminates the CAS/MEMBAR from the exit path, but it admits stranding.
// We detect and recover from stranding with timers.
//
// If a thread transiently strands it'll park until (a) another
// thread acquires the lock and then drops the lock, at which time the
// exiting thread will notice and unpark the stranded thread, or, (b)
// the timer expires.  If the lock is high traffic then the stranding latency
// will be low due to (a).  If the lock is low traffic then the odds of
// stranding are lower, although the worst-case stranding latency
// is longer.  Critically, we don't want to put excessive load on the
// platform's timer subsystem.  We want to minimize both the timer injection
// rate (timers created per second) and the number of timers active at
// any one time.  (More precisely, we want to minimize timer-seconds, which is
// the integral of the # of active timers at any instant over time.)
// Both impinge on OS scalability.  Given that, at most one thread parked on
// a monitor will use a timer.

  // Transmute _owner from a BasicLock pointer to a Thread address.
  // We don't need to hold _mutex for this transition.
  // Non-null to non-null is safe as long as all readers can
  // tolerate either flavor.

    // An unbalanced exit could be signaled
    // in native code by throwing an exception.
    // TODO: Throw an IllegalMonitorStateException ?

   // Invariant: after setting Responsible=null a thread must execute
   // a MEMBAR or other serializing instruction before fetching EntryList|cxq.
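// A hedged sketch of the 1-0 exit idiom and the serializing step the
// invariant above demands, using C++11 atomics in place of the VM's
// OrderAccess/MEMBAR machinery (Waiter is the sketch type defined earlier):

#include <atomic>

static void exit_sketch(std::atomic<Thread*>& owner,
                        std::atomic<Waiter*>& entry_list,
                        std::atomic<Waiter*>& cxq) {
  // Release store: critical-section stores become visible before the unlock.
  owner.store(nullptr, std::memory_order_release);   // 1-0: plain ST, no CAS
  // Full fence: keep the following loads from floating above the store,
  // so the exiting thread reliably notices waiters (avoiding stranding).
  std::atomic_thread_fence(std::memory_order_seq_cst);
  if (entry_list.load() != nullptr || cxq.load() != nullptr) {
    // Waiters exist and no successor is ready: reacquire ownership and
    // wake an heir presumptive (succession logic elided).
  }
}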
  // get the owner's thread id for the MonitorEnter event
  // if it is enabled and the thread isn't suspended

      // release semantics: prior loads and stores from within the critical section
      // must not float (reorder) past the following store that drops the lock.
      // On SPARC that requires MEMBAR #loadstore|#storestore.
      // But of course in TSO #loadstore|#storestore is not required.
      // I'd like to write one of the following:
      // A.  OrderAccess::release() ; _owner = NULL
      // B.  OrderAccess::loadstore(); OrderAccess::storestore(); _owner = NULL;
      // Unfortunately OrderAccess::release() and OrderAccess::loadstore() both
      // store into a _dummy variable.  That store is not needed, but can result
      // in massive wasteful coherency traffic on classic SMP systems.
      // Instead, I use release_store(), which is implemented as just a simple
      // ST on x64, x86 and SPARC.

      // Normally the exiting thread is responsible for ensuring succession,
      // but if other successors are ready or other entering threads are spinning
      // then this thread can simply store NULL into _owner and exit without
      // waking a successor.  The existence of spinners or ready successors
      // guarantees proper succession (liveness).  Responsibility passes to the
      // ready or running successors.  The exiting thread delegates the duty.
      // More precisely, if a successor already exists this thread is absolved
      // of the responsibility of waking (unparking) one.
      //
      // The _succ variable is critical to reducing futile wakeup frequency.
      // _succ identifies the "heir presumptive" thread that has been made
      // ready (unparked) but that has not yet run.  We need only one such
      // successor thread to guarantee progress.
      // See section 3.3, "Futile Wakeup Throttling", of Dice's JVM'01 paper
      // for details.
      //
      // Note that spinners in Enter() also set _succ non-null.
      // In the current implementation spinners opportunistically set
      // _succ so that exiting threads might avoid waking a successor.
      // Another less appealing alternative would be for the exiting thread
      // to drop the lock and then spin briefly to see if a spinner managed
      // to acquire the lock.  If so, the exiting thread could exit
      // immediately without waking a successor, otherwise the exiting
      // thread would need to dequeue and wake a successor.
      // (Note that we'd need to make the post-drop spin short, but no
      // shorter than the worst-case round-trip cache-line migration time.
      // The dropped lock needs to become visible to the spinner, and then
      // the acquisition of the lock by the spinner must become visible to
      // the exiting thread.)

    // It appears that an heir-presumptive (successor) must be made ready.
    // Only the current lock owner can manipulate the EntryList or
    // drain _cxq, so we need to reacquire the lock.  If we fail
    // to reacquire the lock the responsibility for ensuring succession
    // falls to the new owner.

      // Ratify the previously observed values.

      // Inopportune interleaving -- the exiting thread (this thread)
      // in the fast-exit path raced an entering thread in the slow-enter
      // path.  We have two choices:
      // A.  Try to reacquire the lock.
      //     If the CAS() fails return immediately, otherwise
      //     fall-through into the code below which wakes a successor.
      // B.  If the elements forming the EntryList|cxq are TSM
      //     we could simply unpark() the lead thread and return
      //     without having set _succ.

      // QMode == 2 : cxq has precedence over EntryList.
      // Try to directly wake a successor from the cxq.
      // If successful, the successor will need to unlink itself from cxq.

      // QMode == 3 : drain cxq into EntryList, appending at the tail.
      // Aggressively drain cxq into EntryList at the first opportunity.
      // This policy ensures that recently-run threads live at the head of EntryList.

      // Drain _cxq into EntryList -- bulk transfer.
      // The following loop is tantamount to: w = swap(&cxq, NULL)

      // Append the RATs to the EntryList.
      // TODO: organize EntryList as a CDLL so we can locate the tail in constant-time.

      // Fall thru into code that tries to wake a successor from EntryList.

      // QMode == 4 : drain cxq into EntryList, prepending at the head.
      // Aggressively drain cxq into EntryList at the first opportunity.
      // This policy ensures that recently-run threads live at the head of EntryList.

      // Drain _cxq into EntryList -- bulk transfer.
      // The following loop is tantamount to: w = swap(&cxq, NULL)

      // Prepend the RATs to the EntryList.

      // Fall thru into code that tries to wake a successor from EntryList.
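// The "loop tantamount to w = swap(&cxq, NULL)" above reduces, in C++11
// atomics, to a single exchange.  A hedged sketch (the real code loops a
// CAS because only compare-and-swap primitives are assumed); Waiter is the
// sketch type defined earlier:

static Waiter* detach_cxq(std::atomic<Waiter*>& cxq) {
  // Only the lock owner detaches, so one detacher races only with pushers.
  // CAS-loop equivalent:
  //   Waiter* w = cxq.load();
  //   while (w != nullptr && !cxq.compare_exchange_weak(w, nullptr)) { }
  //   return w;
  return cxq.exchange(nullptr, std::memory_order_acquire);
}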
      // I'd like to write: guarantee (w->_thread != Self).
      // But in practice an exiting thread may find itself on the EntryList.
      // Let's say thread T1 calls O.wait().  wait() enqueues T1 on O's waitset and
      // then calls exit().  exit() releases the lock by setting O._owner to NULL.
      // Let's say T1 then stalls.  T2 acquires O and calls O.notify().  The
      // notify() operation moves T1 from O's waitset to O's EntryList.  T2 then
      // releases the lock "O".  T2 resumes immediately after the ST of null into
      // _owner, above.  T2 notices that the EntryList is populated, so it
      // reacquires the lock and then finds itself on the EntryList.
      // Given all that, we have to tolerate the circumstance where "w" is
      // associated with Self.

    // If we find that both _cxq and EntryList are null then just
    // re-run the exit protocol from the top.

    // Drain _cxq into EntryList -- bulk transfer.
    // The following loop is tantamount to: w = swap(&cxq, NULL)

      // Convert the LIFO SLL anchored by _cxq into a DLL.
      // The list reorganization step operates in O(LENGTH(w)) time.
      // It's critical that this step operate quickly as
      // "Self" still holds the outer-lock, restricting parallelism
      // and effectively lengthening the critical section.
      // Invariant: s chases t chases u.
      // TODO-FIXME: consider changing EntryList from a DLL to a CDLL so
      // we have faster access to the tail.  (A sketch of the conversion follows.)
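// The list-reorganization step described above -- turning the detached,
// singly-linked cxq chain into a doubly-linked EntryList -- is one linear
// pass.  A hedged sketch with an illustrative node type carrying both links
// (the real code also updates each waiter's TState as it goes):

struct DWaiter {
  DWaiter* _next;
  DWaiter* _prev;
};

// O(LENGTH(w)) conversion.  This runs while holding the monitor, so it must
// be brief: it lengthens the critical section and restricts parallelism.
static DWaiter* cxq_to_entry_list(DWaiter* w) {
  DWaiter* prev = nullptr;
  for (DWaiter* q = w; q != nullptr; q = q->_next) {
    q->_prev = prev;          // add the back-link; _next order is preserved
    prev = q;
  }
  return w;                   // head of the now doubly-linked list
}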
      // QMode == 1 : drain cxq to EntryList, reversing order.
      // We also reverse the order of the list.

      // QMode == 0 or QMode == 2

    // In 1-0 mode we need: ST EntryList; MEMBAR #storestore; ST _owner = NULL
    // The MEMBAR is satisfied by the release_store() operation in ExitEpilog().

    // See if we can abdicate to a spinner instead of waking a thread.
    // A primary goal of the implementation is to reduce the
    // context-switch rate.

// A faster alternate to handle_special_suspend_equivalent_condition()
//
// handle_special_suspend_equivalent_condition() unconditionally
// acquires the SR_lock.  On some platforms uncontended MutexLocker()
// operations have high latency.  Note that in ::enter() we call HSSEC
// while holding the monitor, so we effectively lengthen the critical sections.
//
// There are a number of possible solutions:
//
// A.  To ameliorate the problem we might also defer state transitions
//     to as late as possible -- just prior to parking.
//     Given that, we'd call HSSEC after having returned from park(),
//     but before attempting to acquire the monitor.  This is only a
//     partial solution.  It avoids calling HSSEC while holding the
//     monitor (good), but it still increases successor reacquisition latency --
//     the interval between unparking a successor and the time the successor
//     resumes and retries the lock.  See ReenterI(), which defers state transitions.
//     If we use this technique we can also avoid the EnterI()-exit() loop
//     in ::enter() where we iteratively drop the lock and then attempt
//     to reacquire it after suspending.
//
// B.  In the future we might fold all the suspend bits into a
//     composite per-thread suspend flag and then update it with CAS().
//     Alternately, a Dekker-like mechanism with multiple variables
//     would suffice:
//       ST Self->_suspend_equivalent = false
//       MEMBAR
//       LD Self->_suspend_flags

    // We raced a suspension -- fall thru into the slow path.

   // Exit protocol:
   // 1. ST _succ = wakee
   // 2. membar #loadstore|#storestore;
   // 3. ST _owner = NULL
   // 4. unpark(wakee)

   // Hygiene -- once we've set _owner = NULL we can't safely dereference Wakee again.
   // The thread associated with Wakee may have grabbed the lock and "Wakee" may be
   // out-of-scope (non-extant).

   // Maintain stats and report events to JVMTI.

// -----------------------------------------------------------------------------
// Class Loader deadlock handling.
//
// complete_exit exits a lock returning recursion count
// complete_exit/reenter operate as a wait without waiting
// complete_exit requires an inflated monitor
// The _owner field is not always the Thread addr even with an
// inflated monitor, e.g. the monitor can be inflated by a non-owning
// thread due to contention.

// reenter() enters a lock and sets recursion count

// -----------------------------------------------------------------------------
// A macro is used below because there may already be a pending
// exception which should not abort the execution of the routines
// which use this (which is why we don't put this into check_slow and
// call it with a CHECK argument).

// check_slow() is a misnomer.  It's called simply to throw an IMSX exception.
// TODO-FIXME: remove check_slow() -- it's likely dead.

// helper method for posting a monitor wait event

// -----------------------------------------------------------------------------
// Wait/Notify/NotifyAll
//
// Note: a subset of changes to ObjectMonitor::wait()
// will need to be replicated in complete_exit above.

  // check for a pending interrupt

    // post monitor waited event.  Note that this is past-tense, we are done waiting.
    // Note: 'false' parameter is passed here because the
    // wait was not timed out due to thread interrupt.

  // create a node to be put into the queue
  // Critically, after we reset() the event but prior to park(), we must check
  // for a pending interrupt.

  // Enter the waiting queue, which is a circular doubly linked list in this case
  // but it could be a priority queue or any data structure.
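// A hedged sketch of the circular doubly-linked insert used for the wait
// queue, where the tail is simply head->_prev (illustrative DWaiter type
// from the earlier sketch; the real code operates on ObjectWaiter):

static void add_waiter_cdll(DWaiter** head, DWaiter* node) {
  if (*head == nullptr) {
    node->_next = node;       // first element: the node is its own
    node->_prev = node;       // successor and predecessor
    *head = node;
  } else {
    DWaiter* h = *head;
    DWaiter* t = h->_prev;    // tail of a CDLL: O(1) access
    t->_next = node;
    node->_prev = t;
    node->_next = h;
    h->_prev = node;          // append at the tail
  }
}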
  // _WaitSetLock protects the wait queue.  Normally the wait queue is accessed only
  // by the owner of the monitor *except* in the case where park()
  // returns because of a timeout or interrupt.  Contention is exceptionally rare
  // so we use a simple spin-lock instead of a heavier-weight blocking lock.

  // As soon as the ObjectMonitor's ownership is dropped in the exit()
  // call above, another thread can enter() the ObjectMonitor, do the
  // notify(), and exit() the ObjectMonitor.  If the other thread's
  // exit() call chooses this thread as the successor and the unpark()
  // call happens to occur while this thread is posting a
  // MONITOR_CONTENDED_EXIT event, then we run the risk of the event
  // handler using RawMonitors and consuming the unpark().
  //
  // To avoid the problem, we re-post the event.  This does no harm
  // even if the original unpark() was not consumed because we are the
  // chosen successor for this monitor.

  // The thread is on the WaitSet list -- now park() it.
  // On MP systems it's conceivable that a brief spin before we park
  // could be profitable.
  //
  // TODO-FIXME: change the following logic to a loop of the form
  //   while (!timeout && !interrupted && _notified == 0) park()
  // (sketched below)
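// Spelled out, the restructuring the TODO asks for would look roughly like
// this (hedged sketch; the flags and park() stand in for the real
// timed/untimed ParkEvent machinery):

void park();   // stand-in for the timed/untimed park used by wait()

static void wait_park_loop(volatile int& notified,
                           bool& timed_out, bool& interrupted) {
  while (!timed_out && !interrupted && notified == 0) {
    park();    // a spurious return simply re-tests the predicates and re-parks
    // (the real loop would refresh timed_out / interrupted here)
  }
}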
    { // State transition wrappers

      // Thread is in thread_blocked state and oop access is unsafe.

      // were we externally suspended while we were waiting?

      // TODO-FIXME: add -- if succ == Self then succ = null.

    } // Exit thread safepoint: transition _thread_blocked -> _thread_in_vm

    // Node may be on the WaitSet, the EntryList (or cxq), or in transition
    // from the WaitSet to the EntryList.
    // See if we need to remove Node from the WaitSet.
    // We use double-checked locking to avoid grabbing _WaitSetLock
    // if the thread is not on the wait queue.
    //
    // Note that we don't need a fence before the fetch of TState.
    // In the worst case we'll fetch an old, stale value of TS_WAIT previously
    // written by this thread.  (Perhaps the fetch might even be satisfied
    // by a look-aside into the processor's own store buffer, although given
    // the length of the code path between the prior ST and this load that's
    // highly unlikely.)  If the following LD fetches a stale TS_WAIT value
    // then we'll acquire the lock and then re-fetch a fresh TState value.
    // That is, we fail toward safety.
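// A hedged sketch of the double-checked unlink described above.  The names
// (spin_acquire, dequeue_specific_waiter, the TS_* states) are illustrative
// stand-ins for the VM's wait-set spin lock and ObjectWaiter states:

#include <atomic>

enum { TS_RUN, TS_WAIT, TS_ENTER, TS_CXQ };

struct WNode { std::atomic<int> TState; };

void spin_acquire(std::atomic<int>& lock);    // stand-ins for the
void spin_release(std::atomic<int>& lock);    // _WaitSetLock spin lock
void dequeue_specific_waiter(WNode* node);    // unlink node from the WaitSet

static void unlink_from_wait_set(std::atomic<int>& wait_set_lock, WNode* node) {
  if (node->TState.load() == TS_WAIT) {       // cheap, possibly stale peek
    spin_acquire(wait_set_lock);
    if (node->TState.load() == TS_WAIT) {     // re-check under the lock
      dequeue_specific_waiter(node);          // we really were still waiting
      node->TState.store(TS_RUN);
    }
    spin_release(wait_set_lock);
  }
  // A stale TS_WAIT only costs us the lock acquisition and a re-fetch:
  // we fail toward safety, never toward skipping a needed unlink.
}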
    // The thread is now either off-list (TS_RUN),
    // on the EntryList (TS_ENTER), or on the cxq (TS_CXQ).
    // The Node's TState variable is stable from the perspective of this thread.
    // No other threads will asynchronously modify TState.

    // Reentry phase -- reacquire the monitor.
    // re-enter contended monitor after object.wait().
    // retain OBJECT_WAIT state until re-enter successfully completes
    // Thread state is thread_in_vm and oop access is again safe,
    // although the raw address of the object may have changed.
    // (Don't cache naked oops over safepoints, of course.)

    // post monitor waited event.  Note that this is past-tense, we are done waiting.

    // Self has reacquired the lock.
    // Lifecycle -- the node representing Self must not appear on any queues.
    // Node is about to go out-of-scope, but even if it were immortal we wouldn't
    // want residual elements associated with this thread left on any lists.

  // Verify a few postconditions

  // check if the notification happened
    // no, it could be timeout or Thread.interrupt() or both
    // check for interrupt event, otherwise it is timeout
      // NOTE: A spurious wake-up will be considered a timeout.
      // Monitor notify has precedence over thread interrupt.

// Consider:
// If the lock is cool (cxq == null && succ == null) and we're on an MP system
// then instead of transferring a thread from the WaitSet to the EntryList
// we might just dequeue a thread from the WaitSet and directly unpark() it.

    // CONSIDER: finding the tail currently requires a linear-time walk of
    // the EntryList.  We can make tail access constant-time by converting to
    // a CDLL instead of using our current DLL.

  // _WaitSetLock protects the wait queue, not the EntryList.  We could
  // move the add-to-EntryList operation, above, outside the critical section
  // protected by _WaitSetLock.  In practice that's not useful.  With the
  // exception of wait() timeouts and interrupts the monitor owner
  // is the only thread that grabs _WaitSetLock.  There's almost no contention
  // on _WaitSetLock so it's not profitable to reduce the length of the
  // critical section.

  // Disposition -- what might we do with the iterator?
  // a.  add it directly to the EntryList -- either tail or head.
  // b.  push it onto the front of the _cxq.
  // (A sketch of the transfer follows.)
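// A hedged sketch of the dispositions just listed.  The policy selector and
// helper names are illustrative (the real code keys the choice off a knob
// and manipulates ObjectWaiter nodes directly); Waiter and push_onto_cxq
// come from the earlier sketches:

struct MonitorQueues {
  std::atomic<Waiter*> _cxq{nullptr};
  void entry_list_append(Waiter* w);     // left abstract: tail, FIFO-ish
  void entry_list_prepend(Waiter* w);    // left abstract: head, constant-time
};

static void transfer_for_notify(MonitorQueues* m, Waiter* w, int policy) {
  if (policy == 0) {
    m->entry_list_append(w);             // disposition (a), at the tail
  } else if (policy == 1) {
    m->entry_list_prepend(w);            // disposition (a), at the head
  } else {
    push_onto_cxq(m->_cxq, w);           // disposition (b): CAS push, as in enter()
  }
}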
  // CONSIDER: finding the tail currently requires a linear-time walk of
  // the EntryList.  We can make tail access constant-time by converting to
  // a CDLL instead of using our current DLL.

  // _WaitSetLock protects the wait queue, not the EntryList.  We could
  // move the add-to-EntryList operation, above, outside the critical section
  // protected by _WaitSetLock.  In practice that's not useful.  With the
  // exception of wait() timeouts and interrupts the monitor owner
  // is the only thread that grabs _WaitSetLock.  There's almost no contention
  // on _WaitSetLock so it's not profitable to reduce the length of the
  // critical section.

// -----------------------------------------------------------------------------
// Adaptive Spinning Support
//
// Adaptive spin-then-block -- rational spinning
//
// Note that we spin "globally" on _owner with a classic SMP-polite TATAS
// algorithm.  On high order SMP systems it would be better to start with
// a brief global spin and then revert to spinning locally.  In the spirit of MCS/CLH,
// a contending thread could enqueue itself on the cxq and then spin locally
// on a thread-specific variable such as its ParkEvent._Event flag.
// That's left as an exercise for the reader.  Note that global spinning is
// not problematic on Niagara, as the L2$ serves the interconnect and has both
// low latency and massive bandwidth.
//
// Broadly, we can fix the spin frequency -- that is, the % of contended lock
// acquisition attempts where we opt to spin -- at 100% and vary the spin count
// (duration) or we can fix the count at approximately the duration of
// a context switch and vary the frequency.  Of course we could also
// vary both, satisfying K == Frequency * Duration, where K is adaptive by monitor.
//
// This implementation varies the duration "D", where D varies with
// the success rate of recent spin attempts.  (D is capped at approximately
// the length of a round-trip context switch.)  The success rate for recent
// spin attempts is a good predictor of the success rate of future spin
// attempts.  The mechanism adapts automatically to varying critical
// section length (lock modality), system load and degree of parallelism.
// D is maintained per-monitor in _SpinDuration and is initialized
// optimistically.  Spin frequency is fixed at 100%.
//
// Note that _SpinDuration is volatile, but we update it without locks
// or atomics.  The code is designed so that _SpinDuration stays within
// a reasonable range even in the presence of races.  The arithmetic
// operations on _SpinDuration are closed over the domain of legal values,
// so at worst a race will install an older but still legal value.
// At the very worst this introduces some apparent non-determinism.
// We might spin when we shouldn't or vice-versa, but since the spin
// counts are relatively short, even in the worst case, the effect is harmless.
//
// Care must be taken that a low "D" value does not become an
// absorbing state.  Transient spinning failures -- when spinning
// is overall profitable -- should not cause the system to converge
// on low "D" values.
//
// We want spinning to be stable and predictable
// and fairly responsive to change and at the same time we don't want
// it to oscillate, become metastable, be "too" non-deterministic,
// or converge on or enter undesirable stable absorbing states.
//
// We implement a feedback-based control system -- using past behavior
// to predict future behavior.  We face two issues: (a) if the
// input signal is random then the spin predictor won't provide optimal
// results, and (b) if the signal frequency is too high then the control
// system, which has some natural response lag, will "chase" the signal.
// (b) can arise from multimodal lock hold times.  Transient preemption
// can also result in apparent bimodal lock hold times.
// Although sub-optimal, neither condition is particularly harmful, as
// in the worst-case we'll spin when we shouldn't or vice-versa.
// The maximum spin duration is rather short so the failure modes aren't bad.
// To be conservative, the gain of the system has been tuned to bias toward
// _not spinning.  Relatedly, the system can sometimes enter a mode where it
// "rings" or oscillates between spinning and not spinning.  This happens
// when spinning is just on the cusp of profitability, however, so the
// situation is not dire.  The state is benign -- there's no need to add
// hysteresis control to damp the transition rate between spinning and
// not spinning.
//
// Spinning: Fixed frequency (100%), vary duration

  // Dumb, brutal spin.  Good for comparative measurements against adaptive spinning.

    // Increase _SpinDuration ...
    // Note that we don't clamp SpinDuration precisely at SpinLimit.
    // Raising _SpinDuration to the poverty line is key.

  // Admission control -- verify preconditions for spinning
  //
  // We always spin a little bit, just to prevent _SpinDuration == 0 from
  // becoming an absorbing state.  Put another way, we spin briefly to
  // sample, just in case the system load, parallelism, contention, or lock
  // modality changed.
  //
  // Consider the following alternative:
  // Periodically set _SpinDuration = _SpinLimit and try a long/full
  // spin attempt.  "Periodically" might mean after a tally of
  // the # of failed spin attempts (or iterations) reaches some threshold.
  // This takes us into the realm of 1-out-of-N spinning, where we
  // hold the duration constant but vary the frequency.

  // Slightly racy, but benign ...

  // We're good to spin ... spin ingress.
  // CONSIDER: use Prefetch::write() to avoid RTS->RTO upgrades
  // when preparing to LD...CAS _owner, etc. and the CAS is likely
  // to succeed.

  // There are three ways to exit the following loop:
  // 1.  A successful spin where this thread has acquired the lock.
  // 2.  Spin failure with prejudice
  // 3.  Spin failure without prejudice

    // Periodic polling -- Check for pending GC
    // Threads may spin while they're unsafe.
    // We don't want spinning threads to delay the JVM from reaching
    // a stop-the-world safepoint or to steal cycles from GC.
    // If we detect a pending safepoint we abort in order that
    // (a) this thread, if unsafe, doesn't delay the safepoint, and (b)
    // this thread, if safe, doesn't steal cycles from GC.
    // This is in keeping with the "no loitering in runtime" rule.
    // We periodically check to see if there's a safepoint pending.

    // Exponential back-off ...  Stay off the bus to reduce coherency traffic.
    // This is useful on classic SMP systems, but is of less utility on
    // CMT platforms such as Niagara.
    //
    // Trade-off: lock acquisition latency vs coherency bandwidth.
    // Lock hold times are typically short.  A histogram
    // of successful spin attempts shows that we usually acquire
    // the lock early in the spin.  That suggests we want to
    // sample _owner frequently in the early phase of the spin,
    // but then back off and sample less frequently as the spin
    // progresses.  The back-off makes us a good citizen on big
    // SMP systems.  Oversampling _owner can consume excessive
    // coherency bandwidth.  Relatedly, if we oversample _owner we
    // can inadvertently interfere with the ST m->owner=null
    // executed by the lock owner.  (In a masked scheme such as
    // ctr & 0xF, the mask corresponds to the exponent; see the
    // sketch below.)
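// A hedged sketch of such a spin loop with masked back-off -- a fixed-rate
// simplification of the exponential scheme described above.  Illustrative
// shape only: the real loop also polls for safepoints, applies CAS and
// runnability penalties, and derives ctr from _SpinDuration:

#include <atomic>

void spin_pause();   // stand-in for a PAUSE-style hardware hint

static bool try_spin_sketch(std::atomic<Thread*>& owner, Thread* self, int ctr) {
  while (--ctr >= 0) {
    if ((ctr & 0xF) != 0) {        // most iterations: stay off the bus
      spin_pause();
      continue;
    }
    Thread* ox = owner.load(std::memory_order_relaxed);   // sample _owner
    if (ox == nullptr) {
      Thread* expected = nullptr;
      if (owner.compare_exchange_strong(expected, self)) {
        return true;               // successful spin: we acquired the lock
      }
      // CAS failed: consider penalizing ctr or aborting without prejudice.
    }
    // A non-null -> non-null change of _owner would also argue for aborting.
  }
  return false;                    // spin failure
}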
    // If this thread observes the monitor transition or flicker
    // from locked to unlocked to locked, then the odds that this
    // thread will acquire the lock in this spin attempt go down
    // considerably.  The same argument applies if the CAS fails
    // or if we observe _owner change from one non-null value to
    // another non-null value.  In such cases we might abort
    // the spin without prejudice or apply a "penalty" to the
    // spin count-down variable "ctr", reducing it by 100, say.

      // The CAS succeeded -- this thread acquired ownership.
      // Take care of some bookkeeping to exit spin state.

      // Increase _SpinDuration:
      // The spin was successful (profitable) so we tend toward
      // longer spin attempts in the future.
      // CONSIDER: factor "ctr" into the _SpinDuration adjustment.
      // If we acquired the lock early in the spin cycle it
      // makes sense to increase _SpinDuration proportionally.
      // Note that we don't clamp SpinDuration precisely at SpinLimit.

      // The CAS failed ... we can take any of the following actions:
      // * penalize: ctr -= Knob_CASPenalty
      // * exit spin with prejudice -- goto Abort;
      // * exit spin without prejudice.
      // * Since CAS is high-latency, retry again immediately.

    // Did lock ownership change hands?

    // Abort the spin if the owner is not executing.
    // The owner must be executing in order to drop the lock.
    // Spinning while the owner is OFFPROC is idiocy.
    // Consider: ctr -= RunnablePenalty ;

  // Spin failed with prejudice -- reduce _SpinDuration.
  // TODO: Use an AIMD-like policy to adjust _SpinDuration.
  // AIMD is globally stable.
  // Consider an AIMD scheme like: x -= (x >> 3) + 100
  // This is globally stable and tends to damp the response.

  // Invariant: after setting succ=null a contending thread
  // must recheck-retry _owner before parking.  This usually happens
  // in the normal usage of TrySpin(), but it's safest
  // to make TrySpin() as foolproof as possible.
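// Pulling the success/failure adjustments above together -- a hedged sketch
// of the _SpinDuration update.  Deliberately lock-free and lossy, as the
// comments describe; the constants are illustrative, not HotSpot's knobs:

static void adjust_spin_duration(volatile int& spin_duration,
                                 bool spin_succeeded,
                                 int spin_limit /* e.g. 5000 */) {
  int x = spin_duration;                    // racy read, by design
  if (spin_succeeded) {
    if (x < spin_limit) {
      spin_duration = x + 100;              // additive increase; note we don't
    }                                       // clamp precisely at the limit
  } else {
    int y = x - ((x >> 3) + 100);           // AIMD-style decrease
    spin_duration = (y > 0) ? y : 0;        // keep D in the legal domain
  }
}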
// NotRunnable() -- informed spinning
//
// Don't bother spinning if the owner is not eligible to drop the lock.
// Spin only if the owner thread is _thread_in_Java or _thread_in_vm.
// The thread must be runnable in order to drop the lock in a timely fashion.
// If the _owner is not runnable then spinning will not likely be
// successful (profitable).
//
// Beware -- the thread referenced by _owner could have died
// so a simple fetch from _owner->_thread_state might trap.
// Instead, we use SafeFetchXX() to safely LD _owner->_thread_state.
// Because of the lifecycle issues the schedctl and _thread_state values
// observed by NotRunnable() might be garbage.  NotRunnable must
// tolerate this and consider the observed _thread_state value
// as advisory.
//
// Beware too, that _owner is sometimes a BasicLock address and sometimes
// a thread pointer.  We differentiate the two cases with OwnerIsThread.
// Alternately, we might tag the type (thread pointer vs basiclock pointer)
// with the LSB of _owner.  Another option would be to probabilistically probe
// the putative _owner->TypeTag value.
//
// Checking _thread_state isn't perfect.  Even if the thread is
// in_java it might be blocked on a page-fault or have been preempted
// and sitting on a ready/dispatch queue.  _thread_state in conjunction
// with schedctl.sc_state gives us a good picture of what the
// thread is doing, however.
// We'll need to use SafeFetch32() to read from the schedctl block.
//
// The return value from NotRunnable() is *advisory* -- the
// result is based on sampling and is not necessarily coherent.
// The caller must tolerate false-negative and false-positive errors.
// Spinning, in general, is probabilistic anyway.

  // Check either OwnerIsThread or ox->TypeTag == 2BAD.

  // Avoid transitive spinning ...
  // Say T1 spins or blocks trying to acquire L.  T1._Stalled is set to L.
  // Immediately after T1 acquires L it's possible that T2, also
  // spinning on L, will see L.Owner=T1 and T1._Stalled=L.
  // This occurs transiently after T1 acquired L but before
  // T1 managed to clear T1.Stalled.  T2 does not need to abort
  // its spin in this circumstance.

  // consider also: jst != _thread_in_Java -- but that's overspecific.

// -----------------------------------------------------------------------------

  // put node at end of queue (circular doubly linked list)

  // dequeue the very first waiter

  // when the waiter has woken up because of interrupt,
  // timeout or other spurious wake-up, dequeue the
  // waiter from the waiting list

// -----------------------------------------------------------------------------
// One-shot global initialization for the sync subsystem.
// We could also defer initialization and initialize on-demand
// the first time we call inflate().  Initialization would
// be protected -- like so many things -- by the MonitorCache_lock.

// When possible, it's better to catch errors deterministically at
// compile-time than at runtime.  The down-side to using compile-time
// asserts is that the error messages -- often something about negative array
// sizes -- are opaque.

  // One-shot global initialization ...
  // The initialization is idempotent, so we don't need locks.
  // In the future consider doing this via os::init_2().

  // SyncKnobs consist of <Key>=<Value> pairs in the style
  // of environment variables.  Start by converting ':' to NUL;
  // a lookup sketch follows.

  // CONSIDER: BackOffMask = ROUNDUP_NEXT_POWER2 (ncpus-1)
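// A hedged sketch of the <Key>=<Value> lookup over that NUL-separated list
// (close in spirit to the parser this initialization uses, but illustrative):

#include <cstring>

// kv_list holds "Key=Value" entries separated by NULs (the converted ':'s)
// and terminated by an empty string.  Returns the value, "1" for a bare
// flag, or NULL when the key is absent.
static char* kv_get(char* kv_list, const char* key) {
  if (kv_list == NULL) return NULL;
  size_t n = strlen(key);
  for (char* p = kv_list; *p; p += strlen(p) + 1) {
    if (strncmp(p, key, n) == 0) {
      if (p[n] == '=') return p + n + 1;    // "Key=Value"
      if (p[n] == '\0') return (char*)"1";  // bare "Key" acts as Key=1
    }
  }
  return NULL;
}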