loopopts.cpp revision 251
196N/A * Copyright 1999-2008 Sun Microsystems, Inc. All Rights Reserved. 0N/A * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 0N/A * This code is free software; you can redistribute it and/or modify it 0N/A * under the terms of the GNU General Public License version 2 only, as 0N/A * published by the Free Software Foundation. 0N/A * This code is distributed in the hope that it will be useful, but WITHOUT 0N/A * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 0N/A * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License 0N/A * version 2 for more details (a copy is included in the LICENSE file that 0N/A * accompanied this code). 0N/A * You should have received a copy of the GNU General Public License version 0N/A * 2 along with this work; if not, write to the Free Software Foundation, 0N/A * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 0N/A * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, 0N/A * CA 95054 USA or visit www.sun.com if you need additional information or 0N/A * have any questions. 0N/A#
include "incls/_precompiled.incl" 0N/A//============================================================================= 0N/A//------------------------------split_thru_phi--------------------------------- 0N/A// Split Node 'n' through merge point if there is enough win. 69N/A // ConvI2L may have type information on it which is unsafe to push up 69N/A // so disable this for now 0N/A x = C->
top();
// Dead path? Use a dead data op 0N/A x = n->
clone();
// Else clone up the data op 0N/A // Alter data node to use pre-phi inputs 0N/A // Check for a 'win' on some paths 0N/A // A TOP singleton indicates that there are no possible values incoming 0N/A // along a particular edge. In most cases, this is OK, and the Phi will 0N/A // be eliminated later in an Ideal call. However, we can't allow this to 0N/A // happen if the singleton occurs on loop entry, as the elimination of 0N/A // the PhiNode may cause the resulting node to migrate back to a previous 0N/A // Is_Loop() == false does not confirm the absence of a loop (e.g., an 0N/A // irreducible loop may not be indicated by an affirmative is_Loop()); 0N/A // therefore, the only top we can split thru a phi is on a backedge of 0N/A // We now call Identity to try to simplify the cloned node. 0N/A // Note that some Identity methods call phase->type(this). 0N/A // Make sure that the type array is big enough for 0N/A // our new node, even though we may throw the node away. 0N/A // (Note: This tweaking with igvn only works because x is a new node.) 0N/A // Else x is a new node we are keeping 0N/A // We do not need register_new_node_with_optimizer 0N/A // because set_type has already been called. 0N/A // If we commoned up the cloned 'x' with another existing Node, 0N/A // the existing Node picks up a new use. We need to make the 0N/A // existing Node occur higher up so it dominates its uses. 0N/A // The occasional new node 0N/A // New late point must dominate new use 0N/A // If changing loop bodies, see if we need to collect into new body 0N/A//------------------------------dominated_by------------------------------------ 0N/A// Replace the dominated test with an obvious true or false. Place it on the 0N/A// IGVN worklist for later cleanup. Move control-dependent data Nodes on the 0N/A// live path up to the dominating control. 0N/A // prevdom is the dominating projection of the dominating test. 0N/A // 'con' is set to true or false to kill the dominated test. 0N/A // Hack the dominated test 0N/A // If I dont have a reachable TRUE and FALSE path following the IfNode then 0N/A // I can assume this path reaches an infinite loop. In this case it's not 0N/A // important to optimize the data Nodes - either the whole compilation will 0N/A // be tossed or this path (and all data Nodes) will go dead. 0N/A // Make control-dependent data Nodes on the live path (path that will remain 0N/A // once the dominated IF is removed) become control-dependent on the 0N/A // dominating projection. 0N/A//------------------------------has_local_phi_input---------------------------- 0N/A// Return TRUE if 'n' has Phi inputs from its local block and no other 0N/A// block-local inputs (all non-local-phi inputs come from earlier blocks) 0N/A // See if some inputs come from a Phi in this block, or from before 0N/A for( i =
1; i < n->
req(); i++ ) {
0N/A return NULL;
// No Phi inputs; nowhere to clone thru 0N/A // Check for inputs created between 'n' and the Phi input. These 0N/A // must split as well; they have already been given the chance 0N/A // (courtesy of a post-order visit) and since they did not we must 0N/A // recover the 'cost' of splitting them by being very profitable 0N/A // when splitting 'n'. Since this is unlikely we simply give up. 0N/A for( i =
1; i < n->
req(); i++ ) {
0N/A // We allow the special case of AddP's with no local inputs. 0N/A // This allows us to split-up address expressions. 0N/A // Move the AddP up to dominating point 0N/A//------------------------------remix_address_expressions---------------------- 0N/A// Rework addressing expressions to get the most loop-invariant stuff 0N/A// moved out. We'd like to do all associative operators, but it's especially 0N/A// important (common) to do address expressions. 0N/A // See if 'n' mixes loop-varying and loop-invariant inputs and 0N/A // itself is loop-varying. 0N/A // Only interested in binary ops (and AddP) 0N/A // Does one of my inputs spin in a tighter loop than self? 0N/A return NULL;
// Leave well enough alone 0N/A // Is at least one of my inputs loop-invariant? 0N/A return NULL;
// No loop-invariant inputs 0N/A // Replace expressions like ((V+I) << 2) with (V<<2 + I<<2). 0N/A // Scale is loop invariant 0N/A // Add must vary with loop (else shift would be loop-invariant) 0N/A //assert( n_loop == add_loop, "" ); 0N/A // Convert I-V into I+ (0-V); same for V-I 0N/A // See if one add input is loop invariant 0N/A // Swap to find the invariant part 0N/A }
else // Else neither input is loop invariant 0N/A return NULL;
// No invariant part of the add? 0N/A // Yes! Reshape address expression! 0N/A // Replace (I+V) with (V+I) 0N/A // Replace ((I1 +p V) +p I2) with ((I1 +p I2) +p V), 0N/A // but not if I2 is a constant. 0N/A // Stuff new AddP in the loop preheader 0N/A // Replace (I1 +p (I2 + V)) with ((I1 +p I2) +p V) 0N/A // Stuff new AddP in the loop preheader 0N/A//------------------------------conditional_move------------------------------- 0N/A// Attempt to replace a Phi with a conditional move. We have some pretty 0N/A// strict profitability requirements. All Phis at the merge point must 0N/A// be converted, so we can remove the control flow. We need to limit the 0N/A// number of c-moves to a small handful. All code that was in the side-arms 0N/A// of the CFG diamond is now speculatively executed. This code has to be 0N/A// "cheap enough". We are pretty much limited to CFG diamonds that merge 0N/A// 1 or 2 items with a total of 1 or 2 ops executed speculatively. 0N/A // Check for CFG diamond 0N/A // Check for highly predictable branch. No point in CMOV'ing if 0N/A // we are going to predict accurately all the time. 0N/A // %%% This hides patterns produced by utility methods like Math.min. 0N/A // Check for ops pinned in an arm of the diamond. 0N/A // Can't remove the control flow in this case 0N/A // Check profitability 0N/A if( !
out->
is_Phi() )
continue;
// Ignore other control edges, etc 0N/A cost++;
// Probably encodes as 2 CMOV's 0N/A case T_OBJECT: {
// Base oops are OK, but not derived oops 0N/A // Derived pointers are Bad (tm): what's the Base (for GC purposes) of a 0N/A // CMOVE'd derived pointer? It's a CMOVE'd derived base. Thus 0N/A // CMOVE'ing a derived pointer requires we also CMOVE the base. If we 0N/A // have a Phi for the base here that we convert to a CMOVE all is well 0N/A // and good. But if the base is dead, we'll not make a CMOVE. Later 0N/A // the allocator will have to produce a base by creating a CMOVE of the 0N/A // relevant bases. This puts the allocator in the business of 0N/A // manufacturing expensive instructions, generally a bad plan. 0N/A // Just Say No to Conditionally-Moved Derived Pointers. 0N/A return NULL;
// In particular, can't do memory or I/O 0N/A // Add in cost any speculative ops 0N/A // Check for a chain of dependent ops; these will all become 0N/A // speculative in a CMOV. 0N/A return NULL;
// Too much speculative goo 0N/A // See if the Phi is used by a Cmp. This will likely Split-If, a 0N/A // higher-payoff operation. 35N/A // It is expensive to generate flags from a float compare. 35N/A // Avoid duplicated float compare. 0N/A // Now replace all Phis with CMOV's 0N/A // Move speculative ops 0N/A // The useless CFG diamond will fold up later; see the optimization in 0N/A // RegionNode::Ideal. 0N/A//------------------------------split_if_with_blocks_pre----------------------- 0N/A// Do the real work in a non-recursive function. Data nodes want to be 0N/A// cloned in the pre-order so they can feed each other nicely. 0N/A // Cloning these guys is unlikely to win 0N/A // Do not clone-up CmpFXXX variations, as these are always 0N/A // followed by a CmpI 0N/A // Attempt to use a conditional move instead of a phi/branch 0N/A if( n->
is_Con() )
return n;
// No cloning for Con nodes 0N/A // Attempt to remix address expressions for loop invariants 0N/A // Determine if the Node has inputs from some local Phi. 0N/A // Returns the block to clone thru. 0N/A // Do not clone the trip counter through on a CountedLoop 0N/A // (messes up the canonical shape). 0N/A // Check for having no control input; not pinned. Allow 0N/A // dominating control. 0N/A // Policy: when is it profitable. You must get more wins than 0N/A // policy before it is considered profitable. Policy is usually 0, 0N/A // so 1 win is considered profitable. Big merges will require big 0N/A // cloning, so get a larger policy. 0N/A // If the loop is a candidate for range check elimination, 0N/A // delay splitting through it's phi until a later loop optimization 0N/A // Use same limit as split_if_with_blocks_post 0N/A if( C->
unique() >
35000 )
return n;
// Method too big 0N/A // Split 'n' through the merge point if it is profitable 0N/A // Found a Phi to split thru! 0N/A // Replace 'n' with the new phi 0N/A // Moved a load around the loop, 'en-registering' something. 0N/A // Bail out if the region and its phis have too many users. 0N/A // 4799512: Stop split_if_with_blocks from splitting a block with a ConvI2LNode 0N/A // having a PhiNode input. This sidesteps the dangerous case where the split 0N/A // ConvI2LNode may become TOP if the input Value() does not 0N/A // overlap the ConvI2L range, leaving a node which may not dominate its 0N/A // A better fix for this problem can be found in the BugTraq entry, but 0N/A // expediency for Mantis demands this hack. 0N/A//------------------------------place_near_use--------------------------------- 0N/A// Place some computation next to use but not inside inner loops. 0N/A// For inner loop uses move it to the preheader area. 0N/A//------------------------------split_if_with_blocks_post---------------------- 0N/A// Do the real work in a non-recursive function. CFG hackery wants to be 0N/A// in the post-order, so it can dirty the I-DOM info and not use the dirtied 0N/A // Cloning Cmp through Phi's involves the split-if transform. 0N/A // FastLock is not used by an If 0N/A if( C->
unique() >
35000 )
return;
// Method too big 0N/A // Do not do 'split-if' if irreducible loops are present. 0N/A // Determine if the Node has inputs from some local Phi. 0N/A // Returns the block to clone thru. 0N/A if( n->
outcnt() !=
1 )
return;
// Multiple bool's from 1 compare? 0N/A if(
bol->
outcnt() !=
1 )
return;
// Multiple branches from 1 compare? 0N/A // Check some safety conditions 0N/A if(
iff->
in(0) !=
n_ctrl )
return;
// Compare must be in same blk as if 0N/A return;
// Inputs not yet split-up 0N/A return;
// Loop-invar test gates loop-varying CMOVE 0N/A return;
// some other kind of node, such as an Allocate 0N/A // Do not do 'split-if' if some paths are dead. First do dead code 0N/A // elimination and then see if its still profitable. 0N/A // When is split-if profitable? Every 'win' on means some control flow 0N/A // goes dead, so it's almost always a win. 0N/A // If trying to do a 'Split-If' at the loop head, it is only 0N/A // profitable if the cmp folds up on BOTH paths. Otherwise we 0N/A // risk peeling a loop forever. 0N/A // CNC - Disabled for now. Requires careful handling of loop 0N/A // body selection for the cloned code. Also, make sure we check 0N/A // for any input path not being in the same loop as n_ctrl. For 0N/A // irreducible loops we cannot check for 'n_ctrl->is_Loop()' 0N/A // because the alternative loop entry points won't be converted 0N/A // Check for safety of the merge point. 0N/A // Split compare 'n' through the merge point if it is profitable 0N/A // Found a Phi to split thru! 0N/A // Replace 'n' with the new phi 0N/A // Now split the bool up thru the phi 0N/A // Conditional-move? Must split up now 0N/A // Check for an IF ready to split; one that has its 0N/A // condition codes input coming from a Phi at the block start. 0N/A // Check for an IF being dominated by another IF same test 0N/A // Check for same test used more than once? 0N/A // Search up IDOMs to see if this IF is dominated. 0N/A // Now search up IDOMs till cutoff, looking for a dominating test 0N/A // Replace the dominated test with an obvious true or false. 0N/A // Place it on the IGVN worklist for later cleanup. 0N/A // See if a shared loop-varying computation has no loop-varying uses. 0N/A // Happens if something is only used for JVM state in uncommon trap exits, 0N/A // like various versions of induction variable+offset. Clone the 0N/A // computation per usage to allow it to sink out of the loop. 0N/A if (
has_ctrl(n) && !n->
in(0)) {
// n not dead and has no control edge (can float about) 0N/A // If n is a load, get and save the result from get_late_ctrl(), 0N/A // to be later used in calculating the control for n's clones. 0N/A // If n is a load, and the late control is the same as the current 0N/A // control, then the cloning of n is a pointless exercise, because 0N/A // GVN will ensure that we end up where we started. 0N/A // Replace all uses of normal nodes. Replace Phi uses 0N/A // individually, so the seperate Nodes can sink down 0N/A while( u->
in(k) != n ) k++;
0N/A // x goes next to Phi input path 0N/A }
else {
// Normal use 0N/A // Find control for 'x' next to use but not inside inner loops. 0N/A // For inner loop uses get the preheader area. 0N/A // For loads, add a control edge to a CFG node outside of the loop 0N/A // to force them to not combine and return back inside the loop 0N/A // during GVN optimization (4641526). 0N/A // Because we are setting the actual control input, factor in 0N/A // the result from get_late_ctrl() so we respect any 0N/A // anti-dependences. (6233005). 0N/A // Don't allow the control input to be a CFG splitting node. 0N/A // Such nodes should only have ProjNodes as outs, e.g. IfNode 0N/A // should only have IfTrueNode and IfFalseNode (4985384). 0N/A // Some institutional knowledge is needed here: 'x' is 0N/A // yanked because if the optimizer runs GVN on it all the 0N/A // cloned x's will common up and undo this optimization and 0N/A // be forced back in the loop. This is annoying because it 0N/A // makes +VerifyOpto report false-positives on progress. I 0N/A // tried setting control edges on the x's to force them to 0N/A // not combine, but the matching gets worried when it tries 0N/A // to fold a StoreP and an AddP together (as part of an 0N/A // address expression) and the AddP and StoreP have 0N/A // different controls. 0N/A // Check for Opaque2's who's loop has disappeared - who's input is in the 0N/A // same loop nest as their output. Remove 'em, they are no longer useful. 0N/A//------------------------------split_if_with_blocks--------------------------- 0N/A// Check for aggressive application of 'split-if' optimization, 0N/A// using basic block level info. 0N/A // Do pre-visit work for root 0N/A // Visit all children 0N/A // Now do pre-visit work for this use 0N/A n =
use;
// Process all children of current use. 0N/A // All of n's children have been processed, complete post-processing. 0N/A // Finished all nodes on stack. 0N/A // Get saved parent node and next use's index. Visit the rest of uses. 0N/A//============================================================================= 0N/A// C L O N E A L O O P B O D Y 0N/A//------------------------------clone_iff-------------------------------------- 0N/A// Passed in a Phi merging (recursively) some nearly equivalent Bool/Cmps. 0N/A// "Nearly" because all Nodes have been cloned from the original in the loop, 0N/A// but the fall-in edges to the Cmp are different. Clone bool/Cmp pairs 0N/A// through the Phi recursively, and return a Bool. 0N/A // Convert this Phi into a Phi merging Bools 0N/A // Make Phis to merge the Cmp's inputs. 0N/A // See if these Phis have been made before. 0N/A // Register with optimizer 0N/A if(
hit1 ) {
// Hit, toss just made Phi 0N/A if(
hit2 ) {
// Hit, toss just made Phi 0N/A//------------------------------clone_bool------------------------------------- 0N/A// Passed in a Phi merging (recursively) some nearly equivalent Bool/Cmps. 0N/A// "Nearly" because all Nodes have been cloned from the original in the loop, 0N/A// but the fall-in edges to the Cmp are different. Clone bool/Cmp pairs 0N/A// through the Phi recursively, and return a Bool. 0N/A // Convert this Phi into a Phi merging Bools 0N/A // Make Phis to merge the Cmp's inputs. 0N/A // See if these Phis have been made before. 0N/A // Register with optimizer 0N/A if(
hit1 ) {
// Hit, toss just made Phi 0N/A if(
hit2 ) {
// Hit, toss just made Phi 0N/A//------------------------------sink_use--------------------------------------- 0N/A// If 'use' was in the loop-exit block, it now needs to be sunk 0N/A// below the post-loop merge point. 0N/A//------------------------------clone_loop------------------------------------- 0N/A// C L O N E A L O O P B O D Y 0N/A// This is the basic building block of the loop optimizations. It clones an 0N/A// entire loop body. It makes an old_new loop body mapping; with this mapping 0N/A// you can find the new-loop equivalent to an old-loop node. All new-loop 0N/A// nodes are exactly equal to their old-loop counterparts, all edges are the 0N/A// same. All exits from the old-loop now have a RegionNode that merges the 0N/A// equivalent new-loop path. This is true even for the normal "loop-exit" 0N/A// condition. All uses of loop-invariant old-loop values now come from (one 0N/A// or more) Phis that merge their new-loop equivalents. 0N/A// This operation leaves the graph in an illegal state: there are two valid 0N/A// control edges coming from the loop pre-header to both loop bodies. I'll 0N/A// definitely have to hack the graph after running this transform. 0N/A// From this building block I will further edit edges to perform loop peeling 0N/A// or loop unrolling or iteration splitting (Range-Check-Elimination), etc. 0N/A// Parameter side_by_size_idom: 0N/A// When side_by_size_idom is NULL, the dominator tree is constructed for 0N/A// the clone loop to dominate the original. Used in construction of 0N/A// pre-main-post loop sequence. 0N/A// When nonnull, the clone and original are side-by-side, both are 0N/A// dominated by the side_by_side_idom node. Used in construction of 0N/A // Step 1: Clone the loop body. Make the old->new mapping. 0N/A // Step 2: Fix the edges in the new body. If the old input is outside the 0N/A // loop use it. If the old input is INside the loop, use the corresponding 0N/A // new node instead. 0N/A // Correct edges to the new node 0N/A // Step 3: Now fix control uses. Loop varying control uses have already 0N/A // been fixed up (as part of all input edges in Step 2). Loop invariant 0N/A // control uses must be either an IfFalse or an IfTrue. Make a merge 0N/A // Copy uses to a worklist, so I can munge the def-use info 0N/A // Both OLD and USE are CFG nodes here. 0N/A // Clone the loop exit control projection 0N/A // We need a Region to merge the exit from the peeled body and the 0N/A // exit from the old loop body. 0N/A // Map the old use to the new merge point 0N/A // The original user of 'use' uses 'r' instead. 0N/A // Now finish up 'r' 0N/A }
// End of if a loop-exit test 0N/A // Step 4: If loop-invariant use is not control, it must be dominated by a 0N/A // there if needed. Make a Phi there merging old and new used values. 0N/A // Copy uses to a worklist, so I can munge the def-use info 0N/A // Check for data-use outside of loop - at least one of OLD or USE 0N/A // must not be a CFG node. 0N/A // If the Data use is an IF, that means we have an IF outside of the 0N/A // loop that is switching on a condition that is set inside of the 0N/A // loop. Happens if people set a loop-exit flag; then test the flag 0N/A // in the loop to break the loop, then test is again outside of the 0N/A // loop to determine which way the loop exited. 0N/A // Since this code is highly unlikely, we lazily build the worklist 0N/A // of such Nodes to go split. 0N/A // Get "block" use is in 0N/A // If the use occurs after merging several exits from the loop, then 0N/A // old value must have dominated all those exits. Since the same old 0N/A // value was used on all those exits we did not need a Phi at this 0N/A // merge point. NOW we do need a Phi here. Each loop exit value 0N/A // is now merged with the peeled body exit; each exit gets its own 0N/A // private Phi and those Phis need to be merged here. 0N/A if(
idx == 0 ) {
// Updating control edge? 0N/A }
else {
// Else need a new Phi 0N/A // Now recursively fix up the new uses of old! 0N/A // Get new RegionNode merging old and new loop exits 0N/A if(
idx == 0 ) {
// Updating control edge? 0N/A }
else {
// Else need a new Phi 0N/A // Make a new Phi merging data values properly 0N/A // If inserting a new Phi, check for prior hits 0N/A // Remove the new phi from the graph and use the hit 0N/A // Make 'use' use the Phi instead of the old loop body exit value 0N/A // Not needed for correctness, but prevents a weak assert 0N/A // in AddPNode from tripping (when we end up with different 0N/A // base & derived Phis that will become the same after 0N/A if(
hit )
// Go ahead and re-hash for hits. 0N/A // If 'use' was in the loop-exit block, it now needs to be sunk 0N/A // below the post-loop merge point. 0N/A // the loop uses a condition set in the loop. The original IF probably 0N/A // takes control from one or more OLD Regions (which in turn get from NEW 0N/A // Regions). In any case, there will be a set of Phis for each merge point 0N/A // from the IF up to where the original BOOL def exists the loop. 0N/A//---------------------- stride_of_possible_iv ------------------------------------- 0N/A// being a cycle involving an add and a phi, 0N/A// with an optional truncation (left-shift followed by a right-shift) 0N/A// of the add. Returns zero if not an iv. 0N/A // Must have an invariant operand 0N/A // (If (Bool (CmpX phi:(Phi ...(Optional-trunc(AddI phi add2))) ))) 0N/A // (If (Bool (CmpX addtrunc:(Optional-trunc((AddI (Phi ...addtrunc...) add2)) ))) 0N/A//---------------------- stay_in_loop ------------------------------------- 0N/A// Return the (unique) control output node that's in the loop (if it exists.) 0N/A//------------------------------ register_node ------------------------------------- 0N/A// Utility to register node "n" with PhaseIdealLoop 0N/A//------------------------------ proj_clone ------------------------------------- 0N/A// Utility to create an if-projection 0N/A//------------------------------ short_circuit_if ------------------------------------- 0N/A// Force the iff control output to be the live_proj 0N/A//------------------------------ insert_if_before_proj ------------------------------------- 0N/A// Insert a new if before an if projection (* - new node) 0N/A// other-proj proj (arg) 0N/A// * new_if(relop(cmp[IU](left,right))) 0N/A//------------------------------ insert_region_before_proj ------------------------------------- 0N/A// Insert a region before an if projection (* - new node) 0N/A//------------------------------ insert_cmpi_loop_exit ------------------------------------- 0N/A// Clone a signed compare loop exit from an unsigned compare and 0N/A// insert it before the unsigned cmp on the stay-in-loop path. 0N/A// All new nodes inserted in the dominator tree between the original 0N/A// if and it's projections. The original if test is replaced with 0N/A// a constant to force the stay-in-loop path. 0N/A// This is done to make sure that the original if and it's projections 0N/A// still dominate the same set of control nodes, that the ctrl() relation 0N/A// from data nodes to them is preserved, and that their loop nesting is 0N/A// if(i <u limit) unsigned compare loop exit 0N/A// exit-proj stay-in-loop-proj 0N/A// if(stay-in-loop-const) original if 0N/A// / if(i < limit) new signed test 0N/A// / / if(i <u limit) new cloned unsigned test 0N/A// exit-proj stay-in-loop-proj 0N/A // Create a new region on the exit path 0N/A // Clone the if-cmpu-true-false using a signed compare 0N/A // Clone the if-cmpu-true-false 0N/A // Force original if to stay in loop. 0N/A//------------------------------ remove_cmpi_loop_exit ------------------------------------- 0N/A// Remove a previously inserted signed compare loop exit. 0N/A//------------------------------ scheduled_nodelist ------------------------------------- 0N/A// Create a post order schedule of nodes that are in the 0N/A// "member" set. The list is returned in "sched". 0N/A// The first node in "sched" is the loop head, followed by 0N/A// nodes which have no inputs in the "member" set, and then 0N/A// followed by the nodes that have an immediate input dependence 0N/A// on a node in "sched". 0N/A // Initially push all with no inputs from within member set 0N/A // traverse out's that are in the member set 0N/A // All outputs processed 0N/A//------------------------------ has_use_in_set ------------------------------------- 0N/A// Has a use in the vector set 0N/A//------------------------------ has_use_internal_to_set ------------------------------------- 0N/A// Has use internal to the vector set (ie. not in a phi at the loop head) 0N/A//------------------------------ clone_for_use_outside_loop ------------------------------------- 0N/A// clone "n" for uses that are outside of loop 0N/A // clone "n" and insert it between the inputs of "n" and the use outside the loop 0N/A // Use in a phi is considered a use in the associated predecessor block 0N/A//------------------------------ clone_for_special_use_inside_loop ------------------------------------- 0N/A// clone "n" for special uses that are in the not_peeled region. 0N/A// If these def-uses occur in separate blocks, the code generator 0N/A// marks the method as not compilable. For example, if a "BoolNode" 0N/A// is in a different basic block than the "IfNode" that uses it, then 0N/A// the compilation is aborted in the code generator. 0N/A // clone "n" and insert it between inputs of "n" and the use 0N/A//------------------------------ insert_phi_for_loop ------------------------------------- 0N/A// Insert phi(lp_entry_val, back_edge_val) at use->in(idx) for loop lp if phi does not already exist 0N/A // Use existing phi if it already exists 0N/A // Remove the new phi from the graph and use the hit 0N/A//------------------------------ is_valid_loop_partition ------------------------------------- 0N/A// Validate the loop partition sets: peel and not_peel 0N/A // Check that peel_list entries are in the peel set 0N/A // Check at loop members are in one of peel set or not_peel set 0N/A // Check that peel set elements are in peel_list 0N/A // Must be in peel_list also 0N/A//------------------------------ is_valid_clone_loop_exit_use ------------------------------------- 0N/A// Ensure a use outside of loop is of the right form 0N/A//------------------------------ is_valid_clone_loop_form ------------------------------------- 0N/A// Ensure that all uses outside of loop are of the right form 0N/A // use is not in the loop, check for correct structure 0N/A//------------------------------ partial_peel ------------------------------------- 0N/A// Partially peel (aka loop rotation) the top portion of a loop (called 0N/A// the peel section below) by cloning it and placing one copy just before 0N/A// the new loop head and the other copy at the bottom of the new loop. 0N/A// before after where it came from 0N/A// stmt2 if condA goto exitA clone 0N/A// if condA goto exitA new_loop: new 0N/A// if !condB goto loop if condB goto exitB clone 0N/A// stmt4 if !condA goto new_loop orig 0N/A// Step 1: find the cut point: an exit test on probable 0N/A// induction variable. 0N/A// Step 2: schedule (with cloning) operations in the peel 0N/A// section that can be executed after the cut into 0N/A// the section that is not peeled. This may need 0N/A// to clone operations into exit blocks. For 0N/A// instance, a reference to A[i] in the not-peel 0N/A// section and a reference to B[i] in an exit block 0N/A// may cause a left-shift of i by 2 to be placed 0N/A// in the peel block. This step will clone the left 0N/A// shift into the exit block and sink the left shift 0N/A// from the peel to the not-peel section. 0N/A// Step 3: clone the loop, retarget the control, and insert 0N/A// phis for values that are live across the new loop 0N/A// head. This is very dependent on the graph structure 0N/A// from clone_loop. It creates region nodes for 0N/A// exit control and associated phi nodes for values 0N/A// flow out of the loop through that exit. The region 0N/A// node is dominated by the clone's control projection. 0N/A// So the clone's peel section is placed before the 0N/A// new loop head, and the clone's not-peel section is 0N/A// forms the top part of the new loop. The original 0N/A// peel section forms the tail of the new loop. 0N/A// Step 4: update the dominator tree and recompute the 0N/A// false true ^ <-- last_peel 0N/A// / stmt3 | <-- first_not_peel 0N/A// +---->loop loop<----+ 0N/A// ^ true false false true ^ <-- last_peel 0N/A// | cut==|== \ \ / ===|==cut | 0N/A// | stmt3 \ \ / stmt3 | <-- first_not_peel 0N/A// | ifB regionA ifB | 0N/A// | v v exitA: v v | 0N/A// | true false false true | 0N/A// after partial peel 0N/A// TOP->region region----+ 0N/A// true false false true | <-- last_peel 0N/A// | ^ \ / +------|---+ 0N/A// +->newloop \ \ / === ==cut | | 0N/A// | stmt3 \ \ / TOP | | 0N/A// | | dom | | stmt3 | | <-- first_not_peel 0N/A// | ifB regionA ifB ^ v 0N/A// | v v exitA: v v | | 0N/A// | true false false true | | 0N/A// | | dom \ / TOP | | 0N/A// | +------------>-----------------+ | 0N/A// +-----------------<---------------------+ 0N/A// ........> ifA clone 0N/A// : | newloop<-----+ 0N/A // Check for complex exit control 0N/A // Step 1: find cut point 0N/A // Walk up dominators to loop head looking for first loop exit 0N/A // which is executed on every path thru loop. 0N/A // If loop-varying exit-test, check for induction variable 0N/A // Prefer signed compare over unsigned compare. 0N/A return false;
// No peel point found 0N/A return false;
// No peel point found 0N/A // Set of cfg nodes to peel are those that are executable from 0N/A // the head through last_peel. 0N/A // Set of non-cfg nodes to peel are those that are control 0N/A // dependent on the cfg nodes. 0N/A // Step 2: move operations from the peeled section down into the 0N/A // not-peeled section 0N/A // Get a post order schedule of nodes in the peel region 0N/A // Result in right-most operand. 0N/A // For future check for too many new phis 0N/A // Evacuate nodes in peel region into the not_peeled region if possible 0N/A // If not used internal to the peeled region, 0N/A // move "n" from peeled to not_peeled region. 0N/A // if not pinned and not a load (which maybe anti-dependent on a store) 0N/A // and not a CMove (Matcher expects only bool->cmove). 0N/A // Otherwise check for special def-use cases that span 0N/A // Inhibit more partial peeling on this loop 0N/A // Step 3: clone loop, retarget control, and insert new phis 0N/A // Create new loop head for new phis and to hang 0N/A // the nodes being moved (sinked) from the peel region. 0N/A // Add phi if "def" node is in peel set and "use" is not 0N/A // "def" is in peel set, "use" is not in peel set 0N/A // or "use" is in the entry boundary (a phi) of the peel set 0N/A // use is not in the loop, check if the live range includes the cut 0N/A // Step 3b: retarget control 0N/A // Redirect control to the new loop head if a cloned node in 0N/A // the not_peeled region has control that points into the peeled region. 0N/A // This necessary because the cloned peeled region will be outside 0N/A // cloned-peeled <---+ 0N/A // new_head_clone: | <--+ 0N/A // cloned-not_peeled in(0) in(0) 0N/A // Backedge of the surviving new_head (the clone) is original last_peel 0N/A // Cut first node in original not_peel set 0N/A // Copy head_clone back-branch info to original head 0N/A // and remove original head's loop entry and 0N/A // clone head's back-branch 0N/A // Similarly modify the phis 0N/A // Step 4: update dominator tree and dominator depth 0N/A // Inhibit more partial peeling on this loop 0N/A//------------------------------reorg_offsets---------------------------------- 0N/A// Reorganize offset computations to lower register pressure. Mostly 0N/A// prevent loop-fallout uses of the pre-incremented trip counter (which are 0N/A// then alive with the post-incremented trip counter forcing an extra 0N/A if( !
cle )
return;
// The occasional dead loop 0N/A // Find loop exit control 0N/A // Check for the special case of folks using the pre-incremented 0N/A // trip-counter on the fall-out path (forces the pre-incremented 0N/A // and post-incremented trip counter to be live at the same time). 0N/A // Fix this by adjusting to use the post-increment trip counter. 0N/A if( !
phi )
return;
// Dead infinite loop 0N/A // Look for loop-invariant use 0N/A // Check that use is live out the bottom. Assuming the trip-counter 0N/A // update is right at the bottom, uses of of the loop middle are ok. 0N/A // protect against stride not being a constant 0N/A // Hit! Refactor use to use the post-incremented tripcounter. 0N/A // Compute a post-increment tripcounter. 0N/A // Since DU info changed, rerun loop