// memnode.cpp, revision 2674
/*
 * Copyright (c) 1997, 2011, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

// Portions of code courtesy of Clifford Click

// Optimization - Graph Style

//=============================================================================
    // fake the missing field
    return mchain;            // don't try to optimize non-instance types
        break;                // hit one of our sentinels
        // skip over a call which does not affect this memory slice
        break;                // hit one of our sentinels
        // Stop if this is the initialization for the object instance which
        // contains this memory slice, otherwise skip over it.
          // Can not bypass initialization of the instance
          // we are looking for.
          // Otherwise skip it (the call updated 'result' value).
    // clone the Phi with our address type
  // Check that current type is consistent with the alias index used during graph construction
    // Sometimes dead array references collapse to a[-1], a[-2], or a[-3]
    // don't assert if it is dead code.
    // TypeOopPtr::NOTNULL+any is an OOP with unknown offset - generally
    // means an array I have not precisely typed yet.  Do not do any
    // alias stuff with it any time soon.
      // compress paths and change unreachable cycles to TOP
      // If not, we can update the input infinitely along a MergeMem cycle
      // Equivalent code in PhiNode::Ideal
      // If transformed to a MergeMem, get the desired slice
      // Otherwise the returned node represents memory for every slice
      // Update input if it is progress over what we have now

//--------------------------Ideal_common---------------------------------------
// Look for degenerate control and memory inputs.  Bypass MergeMem inputs.
// Unhook non-raw memories from complete (macro-expanded) initializations.
  // If our control input is a dead region, kill all below the region
  // Don't bother trying to transform a dead node
    // Wait if control is on the worklist.
      // This control path may be dead.
      // Delay this memory node transformation until the control is processed.
  // Ignore if memory is dead, or is a self-loop
    // The address's base and type may change when the address is processed.
    // Delay this mem node transformation until the address is processed.
  // Do NOT remove or optimize the next lines: ensure a new alias index
  // is allocated for an oop pointer type before Escape Analysis.
  // Note: C++ will not remove it since the call has side effect.
    // Avoid independent memory operations
    // The code which unhooks non-raw memories from complete (macro-expanded)
    // initializations was removed.  After macro-expansion all stores caught
    // by an Initialize node became raw stores and there is no information
    // about which memory slices they modify.  So it is unsafe to move any
    // memory operation above these stores.  Also, in most cases hooked
    // non-raw memories were already unhooked by using information from
    // detect_ptr_independence() and find_previous_store().
  // let the subclass continue analyzing...

// Helper function for proving some simple control dominations.
// Attempt to prove that all control inputs of 'dom' dominate 'sub'.
// Already assumes that 'dom' is available at 'sub', and that 'sub'
// is not a constant (dominated by the method's StartNode).
// Used by MemNode::find_previous_store to prove that the
// control input of a memory operation predates (dominates)
// an allocation it wants to look past.
    return false;             // Conservative answer for dead code
  // Check 'dom'.  Skip Proj and CatchProj nodes.
    return false;             // Conservative answer for dead code
    // For the case when, for example, 'sub' is Initialize and the original
    // 'dom' is the Proj node of that 'sub'.
    // 'dom' dominates 'sub' if its control edge and the control edges
    // of all its inputs dominate or are equal to sub's control edge.
    // Currently 'sub' is either an Allocate, Initialize or Start node.
    // Or a Region, for the check in LoadNode::Ideal();
    // 'sub' should have sub->in(0) != NULL.
    // Get control edge of 'sub'.
      return false;           // Conservative answer for dead code
    // Check all control edges of 'dom'.
        return false;         // One of dom's inputs dominated by sub.
      // Check only own control edge for pinned non-control nodes.
        return false;         // Conservative answer for dead code
        // First, own control edge.
          return false;       // Conservative answer for dead code
        // Now, the rest of the edges.

//---------------------detect_ptr_independence---------------------------------
// Used by MemNode::find_previous_store to prove that two base
// pointers are never equal.
// The pointers are accompanied by their associated allocations,
// if any, which have been previously discovered by the caller.
  // Attempt to prove that these two pointers cannot be aliased.
  // They may both manifestly be allocations, and they should differ.
  // Or, if they are not both allocations, they can be distinct constants.
  // Otherwise, one is an allocation and the other a pre-existing value.
  } else if (a1 != NULL) {    // one allocation a1
    // (Note:  p2->is_Con implies p2->in(0)->is_Root, which dominates.)
  } else {                    // (a2 != NULL)   one allocation a2
  }

// The logic for reordering loads and stores uses four steps:
// (a) Walk carefully past stores and initializations which we
//     can prove are independent of this load.
// (b) Observe that the next memory state makes an exact match
//     with self (load or store), and locate the relevant store.
// (c) Ensure that, if we were to wire self directly to the store,
//     the optimizer would fold it up somehow.
// (d) Do the rewiring, and return, depending on some other part of
//     the optimizer to fold up the load.
// This routine handles steps (a) and (b).  Steps (c) and (d) are
// specific to loads and stores, so they are handled by the callers.
// (Currently, only LoadNode::Ideal has steps (c), (d).  More later.)
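// ---------------------------------------------------------------------------
// [Editorial sketch, not HotSpot code; compile separately.] A standalone
// illustration of the independence test that step (a) relies on when both
// offsets are exact constants: two memory operations are provably independent
// iff their byte ranges are disjoint.  The helper name 'ranges_disjoint' is
// hypothetical, invented for this sketch.
#include <assert.h>
#include <stdint.h>

// True when [off1, off1+size1) and [off2, off2+size2) share no bytes.
static bool ranges_disjoint(intptr_t off1, intptr_t size1,
                            intptr_t off2, intptr_t size2) {
  return off1 + size1 <= off2 || off2 + size2 <= off1;
}

int main() {
  // A jint store at offset 12 overlaps a jbyte load at offset 14 even though
  // 12 != 14 -- which is why testing st_offset != offset alone is not enough
  // (stores of different sizes can co-exist, e.g. whole-'tile' inits).
  assert(!ranges_disjoint(12, 4, 14, 1));
  // A jlong tile init at offset 16 vs. a jint load at offset 24: disjoint.
  assert(ranges_disjoint(16, 8, 24, 4));
  return 0;
}
// ---------------------------------------------------------------------------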
    return NULL;              // cannot unalias unless there are precise offsets

  int cnt = 50;               // Cycle limiter
  for (;;) {                  // While we can dance past unrelated stores...
    if (--cnt < 0)  break;    // Caught in cycle or a complicated dance?

        break;                // inscrutable pointer
        // Success:  The offsets are provably independent.
        // (You may ask, why not just test st_offset != offset and be done?
        // The answer is that stores of different sizes can co-exist
        // in the same sequence of RawMem effects.  We sometimes initialize
        // a whole 'tile' of array elements with a single jint or jlong.)
        continue;             // (a) advance through independent store memory
        // Success:  The bases are provably independent.
        continue;             // (a) advance through independent store memory
      // (b) At this point, if the bases or offsets do not agree, we lose,
      // since we have not managed to prove 'this' and 'mem' independent.
        return mem;           // let caller handle steps (c), (d)
        break;                // something degenerated
        // The bases are provably independent: Either they are
        // manifestly distinct allocations, or else the control
        // of this load dominates the store's allocation.
        continue;             // (a) advance through independent store memory
      // (b) at this point, if we are not looking at a store initializing
      // the same allocation we are loading from, we lose.
      // From caller, can_see_stored_value will consult find_captured_store.
        return mem;           // let caller handle steps (c), (d)
      // Can't use optimize_simple_memory_chain() since it needs PhaseGVN.
        continue;             // (a) advance through independent call memory
        continue;             // (a) advance through independent MemBar memory
        // (the call updated 'mem' value)
        continue;             // (a) advance through independent allocation memory
        // Can not bypass initialization of the instance
        // we are looking for.
        continue;             // (a) advance through independent MergeMem memory

    // Unless there is an explicit 'continue', we must bail out here,
    // because 'mem' is an inscrutable memory state (e.g., a call).
  }

//----------------------calculate_adr_type-------------------------------------
// Helper function.  Notices when the given type of address hits top or bottom.
// Also, asserts a cross-check of the type against the expected address type.
    // %%%% [phh] We don't check the alias index if cross_check is
    //           TypeRawPtr::BOTTOM.  Needs to be investigated.
      // Recheck the alias index, to see if it has changed (due to a bug).
             "must stay in the original alias category");
      // The type of the address must be contained in the adr_type,
      // disregarding "null"-ness.
      // (We make an exception for TypeRawPtr::BOTTOM, which is a bit bucket.)
             "real address must not escape from expected memory type");

//------------------------adr_phi_is_loop_invariant----------------------------
// A helper function for Ideal_DU_postCCP to check if a Phi in a counted
// loop is loop invariant.  Make a quick traversal of Phi and associated
// CastPP nodes, looking to see if they are a closed group within the loop.
  // The idea is that the phi-nest must boil down to only CastPP nodes
  // with the same data.  This implies that any path into the loop already
  // includes such a CastPP, and so the original cast, whatever its input,
  // must be covered by an equivalent cast, with an earlier control input.

  // The loop entry input of the phi should be the unique dominating
  // node for all Cast input control edges.
  // Add the phi node and the cast to the worklist.
  // Begin recursive walk of phi nodes.
    // Take a node off the worklist
      // Add it to the closure.
      // Make a sanity check to ensure we don't waste too much time here.
      // - it is a cast of an identical value
      // - or it is a phi node (then we add its inputs to the worklist)
      // Otherwise, the node is not OK, and we presume the cast is not invariant
  // Quit when the worklist is empty, and we've found no offending nodes.

//------------------------------Ideal_DU_postCCP-------------------------------
// Find any cast-away of null-ness and keep its control.  Null cast-aways are
// going away in this pass and we need to make this memory op depend on the
// test that is guarding the CastPP.
// I tried to leave the CastPP's in.  This makes the graph more accurate in
// some sense; we get to keep around the knowledge that an oop is not-null
// after some test.  Alas, the CastPP's interfere with GVN (some values are
// the regular oop, some are the CastPP of the oop, all merge at Phi's which
// cannot collapse, etc).  This cost us 10% on SpecJVM, even when I removed
// some of the more trivial cases in the optimizer.  Removing more useless
// Phi's started allowing Loads to illegally float above null checks.  I gave
// up on this approach.  CNC 10/20/2000
// This static method may be called not from MemNode (EncodePNode calls it).
// Only the control edge of the node 'n' might be updated.
  // Need a null check?  Regular static accesses do not because they are
  // from constant addresses.  Array ops are gated by the range check (which
  // always includes a NULL check).  Just check field ops.
    // Scan upwards for the highest location we can place this memory op.
// No change to NULL-ness, so peek thru AddP's 0N/A // If the CastPP is useless, just peek on through it. 0N/A // Remember the cast that we've peeked though. If we peek 0N/A // through more than one, then we end up remembering the highest 0N/A // one, that is, if in a loop, the one closest to the top. 293N/A // CastPP is going away in this pass! We need this memory op to be 293N/A // control-dependent on the test that is guarding the CastPP. 0N/A // Attempt to float above a Phi to some dominating point. 0N/A // If we've already peeked through a Cast (which could have set the 0N/A // control), we can't float above a Phi, because the skipped Cast 0N/A // may not be loop invariant. 0N/A // Intentional fallthrough! 0N/A // No obvious dominating point. The mem op is pinned below the Phi 293N/A // by the Phi itself. If the Phi goes away (no true value is merged) 293N/A // then the mem op can float, but not indefinitely. It must be pinned 293N/A // behind the controls leading to the Phi. 293N/A // These usually stick around to change address type, however a 0N/A // useless one can be elided and we still need to pick up a control edge 0N/A // This CheckCastPP node has NO control and is likely useless. But we 0N/A // need check further up the ancestor chain for a control input to keep 0N/A // the node in place. 4959717. 0N/A // List of "safe" opcodes; those that implicitly block the memory 0N/A // op below any null check. 0N/A break;
// No progress 0N/A case Op_Proj:
// Direct call to an allocation routine 113N/A // We further presume that this is one of 113N/A // new_instance_Java, new_array_Java, or 113N/A // the like, but do not assert for this. 113N/A // similar case to new_instance_Java, etc. 113N/A // Projections from fetch_oop (OSR) are allowed as well. 113N/A//============================================================================= 296N/A // standard dump does this in Verbose and WizardMode 296N/A//----------------------------is_immutable_value------------------------------- 296N/A// Helper function to allow a raw load without control edge for some cases 113N/A//----------------------------LoadNode::make----------------------------------- 113N/A// Polymorphic factory method: 113N/A // sanity check the alias category against the created node type 113N/A "use LoadKlassNode instead");
113N/A "use LoadRangeNode instead");
113N/A // Check control edge of raw loads 113N/A // oop will be recorded in oop map if load crosses safepoint 113N/A "raw memory operations should have control edge");
0N/A//------------------------------hash------------------------------------------- 0N/A // unroll addition of interesting fields 0N/A//---------------------------can_see_stored_value------------------------------ 0N/A// This routine exists to make sure this set of tests is done the same 0N/A// everywhere. We need to make a coordinated change: first LoadNode::Ideal 0N/A// will change the graph shape in a way which makes memory alive twice at the 0N/A// same time (uses the Oracle model of aliasing), then some 0N/A// LoadXNode::Identity will fold things back to the equivalence-class model 0N/A // Skip through chains of MemBarNodes checking the MergeMems for 0N/A // new states for the slice of this load. Stop once any other 0N/A // kind of node is encountered. Loads from final memory can skip 0N/A // through any kind of MemBar but normal loads shouldn't skip 0N/A // through MemBarAcquire since the could allow them to move out of 0N/A // a synchronized region. 0N/A // Save the new memory state for the slice and fall through 0N/A // Loop around twice in the case Load -> Initialize -> Store. 0N/A // (See PhaseIterGVN::add_users_to_worklist, which knows about this case.) 0N/A // Try harder before giving up... Match raw and non-raw pointers. 0N/A // At this point we have proven something like this setup: 0N/A // A = Allocate(...) 0N/A // L = LoadQ(, AddP(CastPP(, A.Parm),, #Off)) 0N/A // S = StoreQ(, AddP(, A.Parm , #Off), V) 0N/A // (Actually, we haven't yet proven the Q's are the same.) 0N/A // In other words, we are loading from a casted version of 0N/A // the same pointer-and-offset that we stored to. 0N/A // Thus, we are able to replace L by V. 0N/A // Now prove that we have a LoadQ matched to a StoreQ, for some Q. 0N/A // A load from a freshly-created object always returns zero. 0N/A // (This can happen after LoadNode::Ideal resets the load's memory input 0N/A // to find_captured_store, which returned InitializeNode::zero_memory.) 0N/A // return a zero value for the load's basic type 0N/A // (This is one of the few places where a generic PhaseTransform 0N/A // can create new nodes. Think of it as lazily manifesting 0N/A // virtually pre-existing constants.) 0N/A // A load from an initialization barrier can match a captured store. 0N/A // examine a captured store value 0N/A continue;
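// ---------------------------------------------------------------------------
// [Editorial sketch, not HotSpot code; compile separately.] The effect
// can_see_stored_value aims for, modeled on plain data: a load that provably
// reads the same base+offset a prior store wrote folds to the stored value V,
// and a load from a freshly-allocated object with no matching store folds to
// zero.  'ToyHeap' is an invented stand-in, not a C2 type.
#include <assert.h>
#include <map>
#include <utility>

struct ToyHeap {
  // (allocation id, offset) -> stored value; absent means primordial zero.
  std::map<std::pair<int, int>, int> mem;
  void store(int base, int off, int v) { mem[std::make_pair(base, off)] = v; }
  int load(int base, int off) const {
    std::map<std::pair<int, int>, int>::const_iterator it =
        mem.find(std::make_pair(base, off));
    return it == mem.end() ? 0 : it->second;  // fresh object reads as zero
  }
};

int main() {
  ToyHeap h;
  int A = 1;                    // stands for a new Allocate
  h.store(A, 12, 42);           // S = StoreQ(, AddP(, A.Parm, #12), 42)
  assert(h.load(A, 12) == 42);  // L = LoadQ(, AddP(CastPP(, A.Parm), #12)) -> 42
  assert(h.load(A, 16) == 0);   // untouched slot of a fresh object is zero
  return 0;
}
// ---------------------------------------------------------------------------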
//----------------------is_instance_field_load_with_local_phi------------------

//------------------------------Identity---------------------------------------
// Loads are identity if previous store is to same address
  // If the previous store-maker is the right kind of Store, and the store is
  // to the same address, then we are equal to the value stored.
    // byte, short & char stores truncate naturally.
    // A load has to load the truncated value which requires
    // some sort of masking operation and that requires an
    // Ideal call instead of an Identity call.
    // If the input to the store does not fit with the load's result type,
    // it must be truncated via an Ideal call.
  // (This works even when value is a Con, but LoadNode::Value
  // usually runs first, producing the singleton type of the Con.)
  // Search for an existing data phi which was generated before for the same
  // instance's field to avoid infinite generation of phis in a loop.

// Returns true if the AliasType refers to the field that holds the
// cached box array.  Currently only handles the IntegerCache case.

// Fetch the base value in the autobox array
    // Fetch the box object at the base of the array and get its value
    // This should be true; nonstatic_field_at requires calling
    // nof_nonstatic_fields, so check it anyway

// Returns true if the AliasType refers to the value field of an
// autobox object.  Currently only handles Integer.

// We're loading from an object which has autobox behaviour.
// If this object is a result of a valueOf call we'll have a phi
// merging a newly allocated object and a load from the cache.
// We want to replace this load with the original incoming
// argument to the valueOf call.
      // Push the loads from the phi that comes from valueOf up
      // through it to allow elimination of the loads and the recovery
      // of the original value.
        // Get the LoadN node which loads the cached Integer object
      // Eliminate the load of Integer.value for integers from the cache
      // array by deriving the value from the index into the array.
        // Capture the offset of the load and then reverse the computation.
          // Add up all the offsets making up the address of the load
          // Remove the constant offset from the address and then
          // remove the scaling of the offset to recover the original index.
            // Peel the shift off directly but wrap it in a dummy node
            // since Ideal can't return existing nodes

//------------------------------split_through_phi------------------------------
// Split instance field load through Phi.
  // Check for loop invariant.
    // Do nothing here if Identity will find a value
    // (to avoid infinite chain of value phis generation).
  // Skip the split if the region dominates some control edge of the address.
      x = this->clone();      // Else clone up the data op
    // Alter data node to use pre-phi inputs
    // Check for a 'win' on some paths
    // See comments in PhaseIdealLoop::split_thru_phi().
      // We now call Identity to try to simplify the cloned node.
      // Note that some Identity methods call phase->type(this).
      // Make sure that the type array is big enough for
      // our new node, even though we may throw the node away.
      // (This tweaking with igvn only works because x is a new node.)
      // If x is a TypeNode, capture any more-precise type permanently into Node
      // otherwise it will be not updated during igvn->transform since
      // igvn->type(x) is set to x->Value() already.
      // Else x is a new node we are keeping
      // We do not need register_new_node_with_optimizer
      // because set_type has already been called.

//------------------------------Ideal------------------------------------------
// If the load is from Field memory and the pointer is non-null, we can
// zero out the control input.
// If the offset is constant and the base is an object allocation,
// try to hook me up to the exact initializing store.
    // Skip up past a SafePoint control.  Cannot do this for Stores because
    // pointer stores & cardmarks must stay on the same side of a SafePoint.
  // Check for useless control edge in some common special cases
      // A method-invariant, non-null address (constant or 'this' argument).
  // try to optimize our memory input
    // Split instance field load through Phi.

  // Check for prior store with a different base or offset; make Load
  // independent.  Skip through any number of them.  Bail out if the stores
  // are in an endless dead cycle and report no progress.  This is a key
  // transform for Reflection.  However, if after skipping through the Stores
  // we can't then fold up against a prior store do NOT do the transform as
  // this amounts to using the 'Oracle' model of aliasing.  It leaves the same
  // array memory alive twice: once for the hoisted Load and again after the
  // bypassed Store.  This situation only works if EVERYBODY who does
  // anti-dependence work knows how to bypass.  I.e. we need all
  // anti-dependence checks to ask the same Oracle.  Right now, that Oracle is
  // the alias index stuff.  So instead, peek through Stores and IFF we can
  // fold up, do so.
    // Steps (a), (b):  Walk past independent stores to find an exact match.
      // (c) See if we can fold up on the spot, but don't fold up here.
      // Fold-up might require truncation (for LoadB/LoadS/LoadUS) or
      // just return a prior value, which is done by Identity calls.
        // Make ready for step (d):
  return NULL;                // No further progress
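// ---------------------------------------------------------------------------
// [Editorial sketch, not HotSpot code; compile separately.] The algebraic
// identity behind split_through_phi above: selecting a memory state and then
// loading from it equals loading from each input and selecting the result,
// i.e. Load(Phi(m1, m2), adr) == Phi(Load(m1, adr), Load(m2, adr)).
#include <assert.h>

static int load(const int* mem, int adr) { return mem[adr]; }

int main() {
  int m1[4] = {0, 10, 20, 30};
  int m2[4] = {0, 11, 21, 31};
  for (int c = 0; c <= 1; ++c) {                // c plays the Phi's region
    const int* phi_mem = c ? m1 : m2;           // Phi of two memory states
    int folded = load(phi_mem, 2);              // Load(Phi(m1, m2), adr)
    int split  = c ? load(m1, 2) : load(m2, 2); // Phi(Load(m1), Load(m2))
    assert(folded == split);
  }
  return 0;
}
// ---------------------------------------------------------------------------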
// Helper to recognize certain Klass fields which are invariant across
// some group of array types (e.g., int[] or all T[] where T < Object).
    // The field is Klass::_modifier_flags.  Return its (constant) value.
    // (Folds up the 2nd indirection in aClassConstant.getModifiers().)
    // The field is Klass::_access_flags.  Return its (constant) value.
    // (Folds up the 2nd indirection in Reflection.getClassAccessFlags(aClassConstant).)
    // The field is Klass::_layout_helper.  Return its constant value if known.

//------------------------------Value-----------------------------------------
  // Either input is TOP ==> the result is TOP
  // Try to guess loaded type from pointer type
      // Don't do this for integer types.  There is only potential profit if
      // the element type t is lower than _type; that is, for int types, if
      // _type is more restrictive than t.  This only happens here if one is
      // short and the other char (both 16 bits), and in those cases we've
      // made an intentional decision to use one kind of load over the other.
      // See AndINode::Ideal and 4965907.
      //
      // Also, do not try to narrow the type for a LoadKlass, regardless of offset.
      // Yes, it is possible to encounter an expression like (LoadKlass p1:(AddP x x 8))
      // where the _gvn.type of the AddP is wider than 8.  This occurs when an earlier
      // copy p0 of (AddP x x 8) has been proven equal to p1, and the p0 has been
      // subsumed by p1.  If p1 is on the worklist but has not yet been re-transformed,
      // it is possible that p1 will have a type like Foo*[int+]:NotNull*+any.
      // In fact, that could have been the original type of p1, and p1 could have
      // had an original form like p1:(AddP x x (LShiftL quux 3)), where the
      // expression (LShiftL quux 3) independently optimized to the constant 8.
      // t might actually be lower than _type, if _type is a unique
      // concrete subclass of abstract class t.
        // Make sure the reference is not into the header, by comparing
        // the offset against the offset of the start of the array's data.
        // Different array types begin at slightly different offsets (12 vs. 16).
        // We choose T_BYTE as an example base type that is least restrictive
        // as to alignment, which will therefore produce the smallest
        // possible base offset.
      // In any case, do not allow the join, per se, to empty out the type.
        // This can happen if an interface-typed array narrows to a class type.
      // The pointers in the autobox arrays are always non-null
    // arrays can be cast to Objects
    // unsafe field access may not have a constant offset
           "Field accesses must be precise");
    // For oop loads, we expect the _type to be precise
      // For constant Strings treat the final fields as compile time constants.
    // arrays can be cast to Objects
    // also allow array-loading from the primary supertype
    // array during subtype checks
           "Field accesses must be precise");
    // For klass/static loads, we expect the _type to be precise

    // We are loading a field from a Klass metaobject whose identity
    // is known at compile time (the type is "exact" or "precise").
    // Check for fields we know are maintained as constants by the VM.
      // The field is Klass::_super_check_offset.  Return its (constant) value.
      // (Folds up type checking code.)
      // Compute index into primary_supers array
      // Check for overflowing; use unsigned compare to handle the negative case.
        // The field is an element of Klass::_primary_supers.  Return its (constant) value.
        // (Folds up type checking code.)
      // The field is arrayKlass::_component_mirror.  Return its (constant) value.
      // (Folds up aClassConstant.getComponentType, common in Arrays.copyOf.)
      // The field is Klass::_java_mirror.  Return its (constant) value.
      // (Folds up the 2nd indirection in anObjConstant.getClass().)

    // We can still check if we are loading from the primary_supers array at a
    // shallow enough depth.  Even though the klass is not exact, entries less
    // than or equal to its super depth are correct.
      // Compute index into primary_supers array
      // Check for overflowing; use unsigned compare to handle the negative case.
        // The field is an element of Klass::_primary_supers.  Return its (constant) value.
        // (Folds up type checking code.)

    // If the type is enough to determine that the thing is not an array,
    // we can give the layout_helper a positive interval type.
    // This will help short-circuit some reflective code.
      // Note:  When interfaces are reliable, we can narrow the interface
      // test to (klass != Serializable && klass != Cloneable).
      // The key property of this type is that it folds up tests
      // for array-ness, since it proves that the layout_helper is positive.
      // Thus, a generic value like the basic object layout helper works fine.

  // If we are loading from a freshly-allocated object, produce a zero,
  // if the load is provably beyond the header of the object.
  // (Also allow a variable load from a fresh array to produce zero.)
      // If we have an instance type and our memory input is the
      // program's initial memory state, there is no matching store,
      // so just return a zero of the appropriate type

//------------------------------match_edge-------------------------------------
// Do we Match on this edge index or not?  Match only the address.

//--------------------------LoadBNode::Ideal-----------------------------------
// If the previous store is to the same address as this load,
// and the value stored was larger than a byte, replace this load
// with the value stored truncated to a byte.  If no truncation is
// needed, the replacement is done in LoadNode::Identity().
  // Identity call will handle the case where truncation is not needed.

//--------------------------LoadUBNode::Ideal----------------------------------
// If the previous store is to the same address as this load,
// and the value stored was larger than a byte, replace this load
// with the value stored truncated to a byte.  If no truncation is
// needed, the replacement is done in LoadNode::Identity().
  // Identity call will handle the case where truncation is not needed.

//--------------------------LoadUSNode::Ideal----------------------------------
// If the previous store is to the same address as this load,
// and the value stored was larger than a char, replace this load
// with the value stored truncated to a char.  If no truncation is
// needed, the replacement is done in LoadNode::Identity().
  // Identity call will handle the case where truncation is not needed.

//--------------------------LoadSNode::Ideal-----------------------------------
// If the previous store is to the same address as this load,
// and the value stored was larger than a short, replace this load
// with the value stored truncated to a short.  If no truncation is
// needed, the replacement is done in LoadNode::Identity().
  // Identity call will handle the case where truncation is not needed.

//=============================================================================
//----------------------------LoadKlassNode::make------------------------------
// Polymorphic factory method:
  // sanity check the alias category against the created node type

//------------------------------Value------------------------------------------
  // Either input is TOP ==> the result is TOP
  // Return a more precise klass, if possible
    // We are loading a special hidden field from a Class mirror object,
    // the field which points to the VM's Klass metaobject.
    // java_mirror_type returns non-null for compile-time Class constants.
        // constant oop => constant klass
      // a primitive Class (e.g., int.class) has NULL for a klass field
      // (Folds up the 1st indirection in aClassConstant.getModifiers().)
    // non-constant mirror, so we can't tell what's going on
    return _type;             // Bail out if not loaded
      // See if we can become precise: no subklasses and no interface
      // (Note:  We need to support verified interfaces.)
        //assert(!UseExactTypes, "this code should be useless with exact types");
        // Add a dependence; if any subclass added we need to recompile
        // %%% should use stronger assert_unique_concrete_subtype instead
      // Return root of possible klass
    // Check for loading klass from an array
      // If the klass is an object array, we defer the question to the
      // array component klass.
          // See if we can become precise: no subklasses and no interface
            //assert(!UseExactTypes, "this code should be useless with exact types");
            // Add a dependence; if any subclass added we need to recompile
            // Return precise array klass
      } else {                // Found a type-array?
        //assert(!UseExactTypes, "this code should be useless with exact types");
    // Check for loading klass from an array klass
      return _type;           // Bail out if not loaded
        // // Always returning precise element type is incorrect,
        // // e.g., element type could be object and array may contain strings
        // return TypeKlassPtr::make(TypePtr::Constant, elem, 0);
        // The array's TypeKlassPtr was declared 'precise' or 'not precise'
        // according to the element type's subclassing.
      // The field is Klass::_super.  Return its (constant) value.
      // (Folds up the 2nd indirection in aClassConstant.getSuperClass().)

//------------------------------Identity---------------------------------------
// To clean up reflective code, simplify k.java_mirror.as_klass to plain k.
// Also feed through the klass in Allocate(...klass...)._klass.
  // Take apart the address into an oop and an offset.
  // Return 'this' if we cannot.
    // We can fetch the klass directly through an AllocateNode.
    // This works even if the klass is not constant (clone or newArray).
  // Simplify k.java_mirror.as_klass to plain k, where k is a klassOop.
  // Simplify ak.component_mirror.array_klass to plain ak, ak an arrayKlass.
  // See inline_native_Class_query for occurrences of these patterns.
  // Java Example:  x.getClass().isAssignableFrom(y)
  // Java Example:  Array.newInstance(x.getClass().getComponentType(), n)
  //
  // This improves reflective code, often making the Class
  // mirror go completely dead.  (Current exception:  Class
  // mirrors may appear in debug info, but we could clean them out by
  // introducing a new debug info operator for klassOop.java_mirror).
    // We are loading a special hidden field from a Class mirror,
    // the field which points to its Klass or arrayKlass metaobject.

//------------------------------Value------------------------------------------
//------------------------------Identity---------------------------------------
// To clean up reflective code, simplify k.java_mirror.as_klass to narrow k.
// Also feed through the klass in Allocate(...klass...)._klass.

//------------------------------Value------------------------------------------
  // Either input is TOP ==> the result is TOP

//-------------------------------Ideal---------------------------------------
// Feed through the length in AllocateArray(...length...)._length.
  // Take apart the address into an oop and an offset.
  // Return 'this' if we cannot.
    // We can fetch the length directly through an AllocateArrayNode.
    // This works even if the length is not constant (clone or newArray).
      // New CastII improves on this.

//------------------------------Identity---------------------------------------
// Feed through the length in AllocateArray(...length...)._length.
  // Take apart the address into an oop and an offset.
  // Return 'this' if we cannot.
    // We can fetch the length directly through an AllocateArrayNode.
    // This works even if the length is not constant (clone or newArray).
      // Do not allow make_ideal_length to allocate a CastII node.
      // Return allocated_length only if it would not be improved by a CastII.

//=============================================================================
//---------------------------StoreNode::make-----------------------------------
// Polymorphic factory method:
  assert(ctl != NULL, "raw memory operations should have control edge");
//--------------------------bottom_type----------------------------------------

//------------------------------hash-------------------------------------------
  // unroll addition of interesting fields
  //return (uintptr_t)in(Control) + (uintptr_t)in(Memory) + (uintptr_t)in(Address) + (uintptr_t)in(ValueIn);
  // Since they are not commoned, do not hash them:

//------------------------------Ideal------------------------------------------
// Change back-to-back Store(, p, x) -> Store(m, p, y) to Store(m, p, x).
  // try to capture it into the initialization, or hoist it above.

  // Back-to-back stores to same address?  Fold em up.  Generally
  // unsafe if I have intervening uses...  Also disallowed for StoreCM
  // since they must follow each StoreP operation.  Redundant StoreCMs
  // are eliminated just before matching in final_graph_reshape.
    // Looking at a dead closed cycle of memory?
           "no mismatched stores, except on raw memory");
    if (mem->outcnt() == 1 &&   // check for intervening uses
      // If anybody other than 'this' uses 'mem', we cannot fold 'mem' away.
      // For example, 'mem' might be the final state at a conditional return.
      // Or, 'mem' might be used by some node which is live at the same time
      // 'this' is live, which might be unschedulable.  So, require exactly
      // ONE user, the 'this' store, until such time as we clone 'mem' for
      // each of 'mem's uses (thus making the exactly-1-user-rule hold true).
      // It's OK to do this in the parser, since DU info is always accurate,
      // and the parser always refers to nodes via SafePointNode maps.

  // Capture an unaliased, unconditional, simple store into an initializer.
  // Or, if it is independent of the allocation, hoist it above the allocation.
      // If the InitializeNode captured me, it made a raw copy of me,
      // and I need to disappear.
        // %%% hack to ensure that Ideal returns a new node:
        return mem;           // fold me away

  return NULL;                // No further progress

//------------------------------Value-----------------------------------------
  // Either input is TOP ==> the result is TOP

//------------------------------Identity---------------------------------------
// Remove redundant stores:
//   Store(m, p, Load(m, p)) changes to m.
//   Store(, p, x) -> Store(m, p, x) changes to Store(m, p, x).
  // Load then Store?  Then the Store is useless
  // Two stores in a row of the same value?
  // Store of zero anywhere into a freshly-allocated object?
  // Then the store is useless.
  // (It must already have been captured by the InitializeNode.)
    // a newly allocated object is already all-zeroes everywhere
    // the store may also apply to zero-bits in an earlier object
      // Steps (a), (b):  Walk past independent stores to find an exact match.
        // prev_val and val might differ by a cast; it would be good
        // to keep the more informative of the two.

//------------------------------match_edge-------------------------------------
// Do we Match on this edge index or not?  Match only memory & value

//------------------------------cmp--------------------------------------------
// Do not common stores up together.  They generally have to be split
// back up anyways, so do not bother.
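// ---------------------------------------------------------------------------
// [Editorial sketch, not HotSpot code; compile separately.] The two Identity
// rules above on plain data: storing back a just-loaded value leaves memory
// equal to the prior state m, and the second of two identical stores is
// redundant.
#include <assert.h>

int main() {
  int m[4] = {0, 1, 2, 3};          // 'm', the incoming memory state
  const int p = 2;

  int loaded = m[p];                // Store(m, p, Load(m, p)) changes nothing,
  m[p] = loaded;                    // so the Store folds to m itself
  assert(m[p] == 2);

  m[p] = 7;                         // Store(m, p, 7)
  int snapshot = m[p];
  m[p] = 7;                         // same value again: this Store is useless
  assert(m[p] == snapshot);
  return 0;
}
// ---------------------------------------------------------------------------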
  return (&n == this);        // Always fail except on self

//------------------------------Ideal_masked_input-----------------------------
// Check for a useless mask before a partial-word store
// (StoreB ... (AndI valIn conIa) )
// If (conIa & mask == mask) this simplifies to
// (StoreB ... (valIn) )

//------------------------------Ideal_sign_extended_input----------------------
// Check for useless sign-extension before a partial-word store
// (StoreB ... (RShiftI _ (LShiftI _ valIn conIL ) conIR) )
// If (conIL == conIR && conIR <= num_bits) this simplifies to
// (StoreB ... (valIn) )

//------------------------------value_never_loaded-----------------------------------
// Determine whether there are any possible loads of the value stored.
// For simplicity, we actually check if there are any loads from the
// address stored to, not just for loads of the value stored by this node.
    return false;             // if not a distinct instance, there may be aliases of the address

//=============================================================================
//------------------------------Ideal------------------------------------------
// If the store is from an AND mask that leaves the low bits untouched, then
// we can skip the AND operation.  If the store is from a sign-extension
// (a left shift, then right shift) we can skip both.
  // Finally check the default case

//=============================================================================
//------------------------------Ideal------------------------------------------
// If the store is from an AND mask that leaves the low bits untouched, then
// we can skip the AND operation
  // Finally check the default case

//=============================================================================
//------------------------------Identity---------------------------------------
// No need to card mark when storing a null ptr

//=============================================================================
//------------------------------Ideal---------------------------------------
//------------------------------Value-----------------------------------------
  // Either input is TOP ==> the result is TOP
  // If extra input is TOP ==> the result is TOP

//=============================================================================
//----------------------------------SCMemProjNode------------------------------

//=============================================================================
//=============================================================================
//-------------------------------adr_type--------------------------------------
// Do we Match on this edge index or not?  Do not match memory

//------------------------------match_edge-------------------------------------
// Do we Match on this edge index or not?  Do not match memory

//------------------------------Identity---------------------------------------
// Clearing a zero length array does nothing

//------------------------------Idealize---------------------------------------
// Clearing a short array is faster with stores
  // Clearing nothing uses the Identity call.
  // Negative clears are possible on dead ClearArrays
  // Length too long; use fast hardware clear
  // adjust atp to be the correct array element address type
  // Get base for derived pointer purposes

//----------------------------step_through----------------------------------
// Return allocation input memory edge if it is different instance
// or itself if it is the one we are looking for.
// This method is called only before Allocate nodes are expanded during
// macro nodes expansion.  Before that, ClearArray nodes are only generated
// in LibraryCallKit::generate_arraycopy() which follows allocations.
  // Can not bypass initialization of the instance we are looking for.

//----------------------------clear_memory-------------------------------------
// Generate code to initialize object storage to zero.
  // Initialize the remaining stuff, if any, with a ClearArray.
  // Scale to the unit required by the CPU:
  // Bulk clear double-words
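// ---------------------------------------------------------------------------
// [Editorial sketch, not HotSpot code; compile separately.] Why the AND mask
// and the shift pair can be skipped before a subword store, per
// Ideal_masked_input and Ideal_sign_extended_input above: a byte store keeps
// only the low 8 bits, so (valIn & 0xFF) and ((valIn << 24) >> 24) store the
// same byte as valIn.  (Shifts go through uint32_t; the sketch assumes the
// usual two's-complement, arithmetic-right-shift targets HotSpot runs on.)
#include <assert.h>
#include <stdint.h>

int main() {
  const int32_t vals[] = {0, 1, -1, 127, 128, -129, 0x12345678};
  for (int i = 0; i < 7; i++) {
    int32_t v = vals[i];
    int8_t plain  = (int8_t)v;          // StoreB ... valIn
    int8_t masked = (int8_t)(v & 0xFF); // StoreB ... (AndI valIn 0xFF)
    int32_t shifted = (int32_t)((uint32_t)v << 24) >> 24;
    int8_t sext   = (int8_t)shifted;    // (RShiftI (LShiftI valIn 24) 24)
    assert(masked == plain && sext == plain);
  }
  return 0;
}
// ---------------------------------------------------------------------------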
//=============================================================================
// Do not match memory edge.

//------------------------------Ideal------------------------------------------
// Return a node which is more "ideal" than the current node.  Strip out
// control copies
    // If transformed to a MergeMem, get the desired slice

//=============================================================================
//------------------------------cmp--------------------------------------------
  return (&n == this);        // Always fail except on self

//------------------------------make-------------------------------------------

//------------------------------Ideal------------------------------------------
// Return a node which is more "ideal" than the current node.  Strip out
// control copies
  // Eliminate volatile MemBars for scalar replaced objects.
    // Volatile field loads and stores.
    // Check for scalar replaced object reference.
      // Replace MemBar projections by its inputs.
  // Must return either the original node (now dead) or a new node
  // (Do not return a top here, since that would break the uniqueness of top.)

//------------------------------Value------------------------------------------

//------------------------------match------------------------------------------
// Construct projections for memory.

//===========================InitializeNode====================================
// This node acts as a memory barrier on raw memory, after some raw stores.
// The 'cooked' oop value feeds from the Initialize, not the Allocation.
// The Initialize can 'capture' suitably constrained stores as raw inits.
// It can coalesce related raw stores into larger units (called 'tiles').
// It can avoid zeroing new storage for memory units which have raw inits.
// At macro-expansion, it is marked 'complete', and does not optimize further.
//
// EXAMPLE:
// The object 'new short[2]' occupies 16 bytes in a 32-bit machine.
//   ctl = incoming control; mem* = incoming memory
// (Note:  A star * on a memory edge denotes I/O and other standard edges.)
// First allocate uninitialized memory and fill in the header:
//   alloc = (Allocate ctl mem* 16 #short[].klass ...)
//   ctl := alloc.Control; mem* := alloc.Memory*
//   rawmem = alloc.Memory; rawoop = alloc.RawAddress
// Then initialize to zero the non-header parts of the raw memory block:
//   init = (Initialize alloc.Control alloc.Memory* alloc.RawAddress)
//   ctl := init.Control; mem.SLICE(#short[*]) := init.Memory
// After the initialize node executes, the object is ready for service:
//   oop := (CheckCastPP init.Control alloc.RawAddress #short[])
// Suppose its body is immediately initialized as {1,2}:
//   store1 = (StoreC init.Control init.Memory (+ oop 12) 1)
//   store2 = (StoreC init.Control store1      (+ oop 14) 2)
//   mem.SLICE(#short[*]) := store2
//
// DETAILS:
// An InitializeNode collects and isolates object initialization after
// an AllocateNode and before the next possible safepoint.  As a
// memory barrier (MemBarNode), it keeps critical stores from drifting
// down past any safepoint or any publication of the allocation.
// Before this barrier, a newly-allocated object may have uninitialized bits.
// After this barrier, it may be treated as a real oop, and GC is allowed.
//
// The semantics of the InitializeNode include an implicit zeroing of
// the new object from object header to the end of the object.
// (The object header and end are determined by the AllocateNode.)
//
// Certain stores may be added as direct inputs to the InitializeNode.
// These stores must update raw memory, and they must be to addresses
// derived from the raw address produced by AllocateNode, and with
// a constant offset.  They must be ordered by increasing offset.
// The first one is at in(RawStores), the last at in(req()-1).
// Unlike most memory operations, they are not linked in a chain,
// but are displayed in parallel as users of the rawmem output of
// the allocation.
//
// (See comments in InitializeNode::capture_store, which continue
// the example given above.)
//
// When the associated Allocate is macro-expanded, the InitializeNode
// may be rewritten to optimize collected stores.  A ClearArrayNode
// may also be created at that point to represent any required zeroing.
// The InitializeNode is then marked 'complete', prohibiting further
// capturing of nearby memory operations.
//
// During macro-expansion, all captured initializations which store
// constant values of 32 bits or smaller are coalesced (if advantageous)
// into larger 'tiles' of 32 or 64 bits.  This allows an object to be
// initialized in fewer memory operations.  Memory words which are
// covered by neither tiles nor non-constant stores are pre-zeroed
// by explicit stores of zero.  (The code shape happens to do all
// zeroing first, then all other stores, with both sequences occurring
// in order of ascending offsets.)
//
// Alternatively, code may be inserted between an AllocateNode and its
// InitializeNode, to perform arbitrary initialization of the new object.
// E.g., the object copying intrinsics insert complex data transfers here.
// The initialization must then be marked as 'complete' to disable the
// built-in zeroing semantics and the collection of initializing stores.
//
// While an InitializeNode is incomplete, reads from the memory state
// produced by it are optimizable if they match the control edge and
// new oop address associated with the allocation.
// They return a stored value (if the offset matches) or else zero.
// A write to the memory state, if it matches control and address,
// and if it is to a constant offset, may be 'captured' by the
// InitializeNode.  It is cloned as a raw memory operation and rewired
// inside the initialization, to the raw oop produced by the allocation.
// Operations on addresses which are provably distinct (e.g., to
// other AllocateNodes) are allowed to bypass the initialization.
//
// The effect of all this is to consolidate object initialization
// (both arrays and non-arrays, both piecewise and bulk) into a
// single location, where it can be optimized as a unit.
//
// Only stores with an offset less than TrackedInitializationLimit words
// will be considered for capture by an InitializeNode.  This puts a
// reasonable limit on the complexity of optimized initializations.
//---------------------------InitializeNode------------------------------------
// Note:  allocation() can be NULL, for secondary initialization barriers
  // Since this node is not matched, it will be processed by the
  // register allocator.  Declare that there are no constraints
  // on the allocation of the RawAddress edge.
  // This edge should be set to top, by the set_complete.  But be conservative.
  // incoming raw memory is not split
  // After this node is complete, it contains a bunch of
  // raw-memory initializations.  There is no need for
  // it to have anything to do with non-raw memory effects.
  // Therefore, tell all non-raw users to re-optimize themselves,
  // after skipping the memory effects of this initialization.
  // return false if the init contains any stores already
  // for now, if this allocation has already collected any inits, bail:
  // delete any empty spaces created:

// Helper for remembering which stores go with which offsets.
  if (!st->is_Store())  return -1;  // can happen to dead code via subsume_node
  if (base == NULL)     return -1;  // something is dead,
  if (offset < 0)       return -1;  //        dead, dead

// Helper for proving that an initialization expression is
// "simple enough" to be folded into an object initialization.
// Attempts to prove that a store's initial value 'n' can be captured
// within the initialization without creating a vicious cycle, such as:
//     { Foo p = new Foo(); p.next = p; }
// True for constants and parameters and small combinations thereof.
  if (n == NULL)      return true;   // (can this really happen?)
  if (n == this)      return false;  // found a cycle
  if (n->is_Start())  return true;   // params, etc., are OK
  if (n->is_Root())   return true;   // even better
    if (ctl == this)  return false;

    // If we already know that the enclosing memory op is pinned right after
    // the init, then any control flow that the store has picked up
    // must have preceded the init, or else be equal to the init.
    // Even after loop optimizations (which might change control edges)
    // a store is never pinned *before* the availability of its inputs.
      return false;                  // failed to prove a good control

  // Check data edges for possible dependencies on 'this'.
  if ((count += 1) > 20)  return false;  // complexity limit
  for (uint i = 1; i < n->req(); i++) {
    if (i != first_i)  continue;     // process duplicate edge just once
  }
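// ---------------------------------------------------------------------------
// [Editorial sketch, not HotSpot code; compile separately.] The shape of the
// detect_init_independence check above: walk the captured value's inputs,
// fail on a cycle back to the initialization (e.g. p.next = p) or when a
// small node budget is exhausted, and accept when every path bottoms out at
// leaves (params, constants, Start/Root).  'ToyNode' is invented here.
#include <vector>

struct ToyNode {
  bool is_leaf;                         // stands for Start/Root/Con/params
  std::vector<ToyNode*> inputs;
};

static bool value_is_independent(const ToyNode* n, const ToyNode* init,
                                 int& budget) {
  if (n == NULL)     return true;       // (can this really happen?)
  if (n == init)     return false;      // found a cycle
  if (n->is_leaf)    return true;       // params, etc., are OK
  if (--budget < 0)  return false;      // complexity limit (cf. the 20 above)
  for (size_t i = 0; i < n->inputs.size(); i++)
    if (!value_is_independent(n->inputs[i], init, budget))  return false;
  return true;
}

int main() {
  ToyNode init = {false, {}};
  ToyNode leaf = {true,  {}};
  ToyNode add  = {false, {&leaf, &leaf}};  // small combination of leaves: OK
  ToyNode self = {false, {&init}};         // value depends on the init: cycle
  int b1 = 20, b2 = 20;
  return (value_is_independent(&add, &init, b1) &&
          !value_is_independent(&self, &init, b2)) ? 0 : 1;
}
// ---------------------------------------------------------------------------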
// Here are all the checks a Store must pass before it can be moved into
// an initialization.  Returns zero if a check fails.
// On success, returns the (constant) offset to which the store applies,
// within the initialized memory.
    return FAIL;              // an inscrutable StoreNode (card mark?)
    return FAIL;              // must be unconditional after the initialization
    return FAIL;              // must not be preceded by other stores
    return FAIL;              // inscrutable address
    return FAIL;              // wrong allocation!  (store needs to float up)
    return FAIL;              // stored value must be 'simple enough'

// Find the captured store in(i) which corresponds to the range
// [start..start+size) in the initialized object.
// If there is one, return its index i.  If there isn't, return the
// negative of the index where it should be inserted.
// Return 0 if the queried range overlaps an initialization boundary
// or if dead code is encountered.
// If size_in_bytes is zero, do not bother with overlap checks.
    return FAIL;              // arraycopy got here first; punt
  // no negatives, no header fields:
  // after a certain size, we bail out on tracking all the stores:
    if (i >= limit)  return -(int)i;  // not found; here is where to put it
      return FAIL;            // bail out if there is dead garbage
      // ...we are done, since stores are ordered
        return FAIL;          // the next store overlaps
      return -(int)i;         // not found; here is where to put it
        return FAIL;          // the previous store overlaps
        return FAIL;          // mismatched store size

// Look for a captured store which initializes at the offset 'start'
// with the given size.  If there is no such store, and no other
// initialization interferes, then return zero_memory (the memory
// projection of the AllocateNode).
    return NULL;              // something is dead
    return zero_memory();     // just primordial zero bits here
    Node* st = in(i);         // here is the store at this position

// Create, as a raw pointer, an address within my new object at 'offset'.

// Clone the given store, converting it into a raw store
// initializing a field or element of my new object.
// Caller is responsible for retiring the original store,
// with subsume_node or the like.
//
// From the example above InitializeNode::InitializeNode,
// here are the old stores to be captured:
//   store1 = (StoreC init.Control init.Memory (+ oop 12) 1)
//   store2 = (StoreC init.Control store1      (+ oop 14) 2)
//
// Here is the changed code; note the extra edges on init:
//   alloc = (Allocate ...)
//   rawoop = alloc.RawAddress
//   rawstore1 = (StoreC alloc.Control alloc.Memory (+ rawoop 12) 1)
//   rawstore2 = (StoreC alloc.Control alloc.Memory (+ rawoop 14) 2)
//   init = (Initialize alloc.Control alloc.Memory rawoop
//                      rawstore1 rawstore2)
    if (i == 0)  return NULL; // bail out
    prev_mem = in(i);         // there is a pre-existing store under this one
    set_req(i, C->top());     // temporarily disconnect it
    // See StoreNode::Ideal 'st->outcnt() == 1' for the reason to disconnect.
    i = -i;                   // no pre-existing store
    set_req(--i, C->top());   // reuse this edge; it has been folded away

  // At this point, new_st might have swallowed a pre-existing store
  // at the same offset, or perhaps new_st might have disappeared,
  // if it redundantly stored the same value (or zero to fresh memory).

  // In any case, wire it in:

  // The caller may now kill the old guy.

    return false;             // strange store offset (assume size==2**N)
  default: return false;      // strange store size (detect size!=2**N here)
  return true;                // return success to caller

// Coalesce subword constants into int constants and possibly
// into long constants.  The goal, if the CPU permits,
// is to initialize the object with a small number of 64-bit tiles.
// Also, convert floating-point constants to bit patterns.
// Non-constants are not relevant to this pass.
//
// In terms of the running example on InitializeNode::InitializeNode
// and InitializeNode::capture_store, here is the transformation
// of rawstore1 and rawstore2 into rawstore12:
//   alloc = (Allocate ...)
//   rawoop = alloc.RawAddress
//   rawstore12 = (StoreI alloc.Control alloc.Memory (+ rawoop 12) tile12)
//   init = (Initialize alloc.Control alloc.Memory rawoop rawstore12)
//
// Note:  After this pass, they are not completely sane,
// since there may be some overlaps.

  // allocate space for the tile map:
  // tiles: exact bitwise model of all primitive constants
  // nodes: last constant-storing node subsumed into the tiles model
  // inits: which bytes (in each tile) are touched by any initializations

  //// Pass A: Fill in the tile model with any relevant stores.

    // Figure out the store's offset and constant value:
    // Record which bytes are touched, whether by constant or not.
      continue;               // skip (strange store size)
    default: continue;        // skip (odd store type)
      continue;               // This StoreL is already optimal.
    // Store down the constant.
        // This StoreI is already optimal by itself.
        intcon[1] = 0;        // undo the store_constant()
      // If the previous store is also optimal by itself, back up and
      // undo the action of the previous loop iteration... if we can.
      // But if we can't, just let the previous half take care of itself.
        // Undo the effects of the previous loop trip, which swallowed st:
        intcon[0] = 0;        // undo store_constant()
        continue;             // This StoreI is already optimal.
    // This store is not needed.
    nodes[j] = st;            // record for the moment
    return;                   // nothing more to do

  //// Pass B: Convert any non-zero tiles into optimal constant stores.
  // Be sure to insert them before overlapping non-constant stores.
  // (E.g., byte[] x = { 1,2,y,4 }  =>  x[int 0] = 0x01020004, x[2]=y.)
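// ---------------------------------------------------------------------------
// [Editorial sketch, not HotSpot code; compile separately.] Pass A's
// bookkeeping on a single 32-bit tile, using the big-endian layout of the
// byte[] x = { 1,2,y,4 } example above: the constant bytes OR into one int
// tile, 0x01020004, leaving the non-constant byte y for a separate store.
// 'store_constant' here is a toy stand-in for the real helper of that name.
#include <assert.h>
#include <stdint.h>

static void store_constant(uint32_t* tile, int byte_off, uint8_t val) {
  int shift = (3 - byte_off) * 8;     // big-endian position within the tile
  *tile = (*tile & ~((uint32_t)0xFF << shift)) | ((uint32_t)val << shift);
}

int main() {
  uint32_t tile = 0;                  // exact bitwise model of the constants
  store_constant(&tile, 0, 1);
  store_constant(&tile, 1, 2);
  store_constant(&tile, 3, 4);        // x[2] = y stays a separate StoreB
  assert(tile == 0x01020004);         // one StoreI plops the whole tile down
  return 0;
}
// ---------------------------------------------------------------------------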
true;
// only the second word counts // Example: int a[] = { 42 ... } split =
true;
// first word is covered by full inits // Example: int a[] = { ... foo(), 42 ... } split =
true;
// second word is covered by full inits // Example: int a[] = { ... 42, foo() ... } // Here's a case where init0 is neither 0 nor -1: // byte a[] = { ... 0,0,foo(),0, 0,0,0,42 ... } // Assuming big-endian memory, init0, init1 are 0x0000FF00, 0x000000FF. // In this case the tile is not split; it is (jlong)42. // The big tile is stored down, and then the foo() value is inserted. // (If there were foo(),foo() instead of foo(),0, init0 would be -1.) // One or two coalesced stores to plop down. // Omit either if it is a zero. // Insert second store first, then the first before the second. // Insert each one just before any overlapping non-constant stores. // Clean up any remaining occurrences of zmem: // Explore forward from in(start) to find the first fully initialized // word, and return its offset. Skip groups of subword stores which // together initialize full words. If in(start) is itself part of a // fully initialized word, return the offset of in(start). If there // are no following full-word stores, or if something is fishy, return if (
st_off < 0)
break;
// return conservative answer return st_off;
// Did this store hit or cross the word boundary?
// We passed the current int, without fully initializing it.
// We passed the current and next int.

// Called when the associated AllocateNode is expanded into CFG.
// At this point, we may perform additional optimizations.
// Linearize the stores by ascending offset, to make memory
// activity as coherent as possible.
// reduce instruction count for common initialization patterns
bool do_zeroing = true;     // we might give up if inits are very sparse
break;                      // unknown junk in the inits
break;                      // complicated store chains somehow in list

// See if this store needs a zero before it or under it.
// Look for subword stores which only partially initialize words.
// If we find some, we must lay down some word-level zeroes first,
// underneath the subword stores.
//
//   byte[] a = { p,q,r,s }  =>  a[0]=p,a[1]=q,a[2]=r,a[3]=s
//   byte[] a = { x,y,0,0 }  =>  a[0..3] = 0, a[0]=x,a[1]=y
//   byte[] a = { 0,0,z,0 }  =>  a[0..3] = 0, a[2]=z
//
// Note: coalesce_subword_stores may have already done this,
// if it was prompted by constant non-zero subword initializers.
// But this case can still arise with non-constant stores.
//
// In the examples above:
//   st_off         12  13  14  15    12  13    14
//   next_full_s.   12  16  16  16    16  16    16
//   z's_done       12  16  16  16    12  16    12
//   z's_needed     12  16  16  16    16  16    16
//
// Conservative tack:  Zero to end of current word.
// Zero to beginning of next fully initialized word.
// Or, don't zero at all, if we are already in that word.

// Do some incremental zeroing on rawmem, in parallel with inits.

// Collect the store and move on:
inits = st;                 // put it on the linearized chain
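// Hedged sketch of the zeroing bounds illustrated in the table above
// (illustrative names; the real code carries zeroes_done/zeroes_needed
// across the loop, and the rounding helper is an assumption here):
//
//   intptr_t zeroes_needed = st_off + st_size;                  // under the store
//   zeroes_needed = align_size_up(zeroes_needed, BytesPerInt);  // conservative tack
//   if (zeroes_done < zeroes_needed) {
//     // lay down a raw clear of [zeroes_done, zeroes_needed) first,
//     // so the subword store lands on already-zeroed memory
//   }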
// Various order invariants.  Weaker than stores_are_sane because
// a large constant tile can be filled in by smaller non-constant stores.

// If anything remains to be zeroed, zero it all now.
// if it is the last unused 4 bytes of an instance, forget about it

return true;                // stores could be anything at this point
if (st_off < 0)
  continue;                 // ignore dead garbage
assert(false, "ascending store offsets");
//============================MergeMemNode=====================================
//
// SEMANTICS OF MEMORY MERGES:  A MergeMem is a memory state assembled from several
// contributing store or call operations.  Each contributor provides the memory
// state for a particular "alias type" (see Compile::alias_type).  For example,
// if a MergeMem has an input X for alias category #6, then any memory reference
// to alias category #6 may use X as its memory state input, as an exact equivalent
// to using the MergeMem as a whole.
//   Load<6>( MergeMem(<6>: X, ...), p ) <==> Load<6>(X,p)
// (Here, the <N> notation gives the index of the relevant adr_type.)
//
// In one special case (and more cases in the future), alias categories overlap.
// The special alias category "Bot" (Compile::AliasIdxBot) includes all memory
// states.  Therefore, if a MergeMem has only one contributing input W for Bot,
// it is exactly equivalent to that state W:
//   MergeMem(<Bot>: W) <==> W
//
// Usually, the merge has more than one input.  In that case, where inputs
// overlap (i.e., one is Bot), the narrower alias type determines the memory
// state for that type, and the wider alias type (Bot) fills in everywhere else:
//   Load<5>( MergeMem(<Bot>: W, <6>: X), p ) <==> Load<5>(W,p)
//   Load<6>( MergeMem(<Bot>: W, <6>: X), p ) <==> Load<6>(X,p)
//
// A merge can take a "wide" memory state as one of its narrow inputs.
// This simply means that the merge observes only the relevant parts of
// the wide input.  That is, wide memory states arriving at narrow merge inputs
// are implicitly "filtered" or "sliced" as necessary.  (This is rare.)
//
// These rules imply that MergeMem nodes may cascade (via their <Bot> links),
// and that memory slices "leak through":
//   MergeMem(<Bot>: MergeMem(<Bot>: W, <7>: Y)) <==> MergeMem(<Bot>: W, <7>: Y)
//
// But, in such a cascade, repeated memory slices can "block the leak":
//   MergeMem(<Bot>: MergeMem(<Bot>: W, <7>: Y), <7>: Y') <==> MergeMem(<Bot>: W, <7>: Y')
//
// In the last example, Y is not part of the combined memory state of the
// outermost MergeMem.  The system must, of course, prevent unschedulable
// memory states from arising, so you can be sure that the state Y is somehow
// a precursor to state Y'.
//
//
// REPRESENTATION OF MEMORY MERGES: The indexes used to address the Node::in array
// of each MergeMemNode are exactly the numerical alias indexes, including
// but not limited to AliasIdxTop, AliasIdxBot, and AliasIdxRaw.  The functions
// Compile::alias_type (and kin) produce and manage these indexes.
//
// By convention, the value of in(AliasIdxTop) (i.e., in(1)) is always the top node.
// (Note that this provides quick access to the top node inside MergeMem methods,
// without the need to reach out via TLS to Compile::current.)
//
// As a consequence of what was just described, a MergeMem that represents a full
// memory state has an edge in(AliasIdxBot) which is a "wide" memory state,
// containing all alias categories.
//
// MergeMem nodes never (?) have control inputs, so in(0) is NULL.
//
// All other edges in(N) (including in(AliasIdxRaw), which is in(3)) are either
// a memory state for the alias type <N>, or else the top node, meaning that
// there is no particular input for that alias type.  Note that the length of
// a MergeMem is variable, and may be extended at any time to accommodate new
// memory states at larger alias indexes.  When merges grow, they are of course
// filled with "top" in the unused in() positions.
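// A hedged reading of the slicing rules above in accessor form (assuming a
// MergeMem* mm built as MergeMem(<Bot>: W, <6>: X); W and X are the symbolic
// states from the examples, not real nodes):
//
//   Node* m5 = mm->memory_at(5);   // no explicit slice at <5>: yields W
//   Node* m6 = mm->memory_at(6);   // exact slice at <6>: yields X
//   assert(mm->base_memory() == W, "the <Bot> edge is the wide state");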
// This use of top is named "empty_memory()", or "empty_mem" (no-memory) as a variable.
// (Top was chosen because it works smoothly with passes like GCM.)
//
// For convenience, we hardwire the alias index for TypeRawPtr::BOTTOM.  (It is
// the type of random VM bits like TLS references.)  Since it is always the
// first non-Bot memory slice, some low-level loops use it to initialize an
// index variable:  for (i = AliasIdxRaw; i < req(); i++).
//
//
// ACCESSORS:  There is a special accessor MergeMemNode::base_memory which returns
// the distinguished "wide" state.  The accessor MergeMemNode::memory_at(N) returns
// the memory state for alias type <N>, or (if there is no particular slice at <N>)
// the base memory.  To prevent bugs, memory_at does not accept <Top> or <Bot>
// indexes.  The iterator MergeMemStream provides robust iteration over MergeMem
// nodes or pairs of such nodes, ensuring that the non-top edges are visited.
//
// %%%% We may get rid of base_memory as a separate accessor at some point; it isn't
// really that different from the other memory inputs.  An abbreviation called
// "bot_memory()" for "memory_at(AliasIdxBot)" would keep code tidy.
//
//
// PARTIAL MEMORY STATES:  During optimization, MergeMem nodes may arise that represent
// partial memory states.  When a Phi splits through a MergeMem, the copy of the Phi
// that "emerges through" the base memory will be marked as excluding the alias types
// of the other (narrow-memory) copies which "emerged through" the narrow edges:
//
//   Phi<Bot>(U, MergeMem(<Bot>: W, <8>: Y))
//     ==Ideal=>  MergeMem(<Bot>: Phi<Bot-8>(U, W), <8>: Phi<8>(U, Y))
//
// This strange "subtraction" effect is necessary to ensure IGVN convergence.
// (It is currently unimplemented.)  As you can see, the resulting merge is
// actually a disjoint union of memory states, rather than an overlay.

//------------------------------MergeMemNode-----------------------------------
// all inputs are nullified in Node::Node(int)
// set_input(0, NULL);  // no control input

// Initialize the edges uniformly to top, for starters.

// Make a new, untransformed MergeMem with the same base as 'mem'.
// If mem is itself a MergeMem, populate the result with the same edges.

//------------------------------cmp--------------------------------------------
return (&n == this);        // Always fail except on self
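// Hedged usage sketch of the MergeMemStream iterator mentioned above,
// visiting only the non-top slices of a single MergeMem mm:
//
//   for (MergeMemStream mms(mm); mms.next(); ) {
//     int   idx   = mms.alias_idx();  // alias index of this slice
//     Node* slice = mms.memory();     // memory state stored at in(idx)
//     // ... e.g., mms.set_memory(transformed_slice) to update it ...
//   }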
//------------------------------Identity---------------------------------------
// Identity if this merge point does not record any interesting memory
return this;                // Many memory splits; no change
return base_mem;            // No memory splits; ID on the one true input

//------------------------------Ideal------------------------------------------
// This method is invoked recursively on chains of MergeMem nodes

// Remove chain'd MergeMems
// This is delicate, because each "in(i)" (i >= Raw) is interpreted
// relative to the "in(Bot)".  Since we are patching both at the same time,
// we have to be careful to read each "in(i)" relative to the old "in(Bot)",
// but rewrite each "in(i)" relative to the new "in(Bot)".
return NULL;                // Dead memory path.
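// Hedged sketch of the old-vs-new rebasing discipline described above
// (illustrative locals and helper; the real loop also works through
// sliced views of the two bases):
//
//   Node* old_base = base_memory();
//   Node* new_base = simplified(old_base);            // hypothetical helper
//   for (uint i = Compile::AliasIdxRaw; i < req(); i++) {
//     Node* m = in(i);
//     if (m == old_base)  m = empty_memory();         // read vs. the old base
//     // ... transform m ...
//     set_req(i, m == new_base ? empty_memory() : m); // write vs. the new base
//   }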
// simplify stacked MergeMems in base memory
// the base memory might contribute new slices beyond my req()

// Look carefully at the base node if it is a phi.
// do not examine phi if degraded to a copy
// see if the phi is unfinished
// incomplete phi; do not look at it yet!

// Note: We do not call verify_sparse on entry, because inputs
// can normalize to the base_memory via subsume_node or similar
// mechanisms.  This method repairs that damage.

// calculate the old memory value
// maybe update (reslice) the old memory value
// simplify stacked MergeMems

// This can happen if loops break up and safepoints disappear.
// A merge of BotPtr (default) with a RawPtr memory derived from a
// safepoint can be rewritten to a merge of the same BotPtr with
// the BotPtr phi coming into the loop.  If that phi disappears
// also, we can end up with a self-loop of the mergemem.
// In general, if loops degenerate and memory effects disappear,
// a mergemem can be left looking at itself.  This simply means
// that the mergemem's default should be used, since there is
// no longer any apparent effect on this slice.
// Note: If a memory slice is a MergeMem cycle, it is unreachable
// from start.  Update the input to TOP.
// else preceding memory was not a MergeMem

// replace equivalent phis (unfortunately, they do not GVN together)
// equivalent phi nodes; revert to the def

// maybe store down a new value
// Warning: Do not combine this "if" with the previous "if"
// A memory slice might have to be rewritten even if it is semantically
// unchanged, if the base_memory value has changed.
// Don't use set_base_memory(new_base), because we need to update du.

// a self cycle indicates this memory path is dead

// Resolve external cycles by calling Ideal on a MergeMem base_memory
// Recursion must occur after the self cycle check above
// propagate rollup of dead cycle to self

// Cut inputs during Parse phase only.
// During Optimize phase a dead MergeMem node will be subsumed by Top.

// Check if PhiNode::Ideal's "Split phis through memory merges"
// transform should be attempted.  Look for this->phi->this cycle.
for (uint i = 1; i < phi->req(); ++i) {  // For all paths in
  if (phi->in(i) == this) {
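// (A this->phi->this cycle means the phi merges this MergeMem back into
// its own base memory; detecting it here decides whether the phi-splitting
// transform is worth re-attempting on that phi.)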
//-------------------------set_base_memory-------------------------------------
// Clear out other occurrences of new_base:

//------------------------------out_RegMask------------------------------------

//------------------------------dump_spec--------------------------------------
// phis shift around during optimization
return true;                // pretty stupid...

// verify a narrow slice (either incoming or outgoing)
if (!VerifyAliases)
  return;                   // don't bother to verify unless requested
if (Node::in_dump())
  return;                   // muzzle asserts when printing
// Elide intervening MergeMem's
// Implicit copy of base_memory()
// A few places like make_runtime_call "know" that VM calls are narrow,
// and can be used to update only the VM bits stored as TypeRawPtr::BOTTOM.
// memory can "leak through" calls on channels that
// are write-once.  Allow this also.

//-----------------------------memory_at---------------------------------------
       "must avoid base_memory and AliasIdxTop");
// Otherwise, it is a narrow slice.
// the array is sparse; empty slots are the "top" node
       "must be a wide memory");
// AliasLevel == 0 if we are organizing the memory states manually.
// See verify_memory_slice for comments on TypeRawPtr::BOTTOM.
// make sure the stored slice is sane
// Give it a pass:  It is a mostly harmless repetition of the base.
// This can arise normally from node subsumption during optimization.

//---------------------------set_memory_at-------------------------------------
if (n == empty_mem)
  return;                   // already the default, so do not grow me

//--------------------------iteration_setup------------------------------------
// invariant: the finite support of mm2 is within mm->req()
// Replace spurious copies of base_memory by top.

//---------------------------grow_to_match-------------------------------------
// look for the finite support of the other memory

//---------------------------verify_sparse-------------------------------------
// The following can happen in degenerate cases, since empty==top.
if (in(i) == base_mem)
  return false;             // should have been the sentinel value!
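// Hedged restatement of the sparseness invariant checked here: every
// narrow edge in(i), i >= AliasIdxRaw, is either a genuine slice or
// empty_memory(); an explicit copy of base_memory() is malformed, since
// the base already covers any slice left unset.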
if (mem == n)
  return true;              // might be empty_memory()
if (mem == n)
  return true;
if (mem == n)
  return true;