loopnode.hpp revision 0
0N/A * Copyright 1998-2007 Sun Microsystems, Inc. All Rights Reserved. 0N/A * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 0N/A * This code is free software; you can redistribute it and/or modify it 0N/A * under the terms of the GNU General Public License version 2 only, as 0N/A * published by the Free Software Foundation. 0N/A * This code is distributed in the hope that it will be useful, but WITHOUT 0N/A * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 0N/A * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License 0N/A * version 2 for more details (a copy is included in the LICENSE file that 0N/A * accompanied this code). 0N/A * You should have received a copy of the GNU General Public License version 0N/A * 2 along with this work; if not, write to the Free Software Foundation, 0N/A * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 0N/A * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, 0N/A * CA 95054 USA or visit www.sun.com if you need additional information or 0N/A * have any questions. 0N/A// I D E A L I Z E D L O O P S 0N/A// Idealized loops are the set of loops I perform more interesting 0N/A// transformations on, beyond simple hoisting. 0N/A//------------------------------LoopNode--------------------------------------- 0N/A// Simple loop header. Fall in path on left, loop-back path on right. 0N/A // Size is bigger to hold the flags. However, the flags do not change 0N/A // the semantics so it does not appear in the hash & cmp functions. 0N/A // Names for flag bitfields 0N/A // Names for edge indices 0N/A//------------------------------Counted Loops---------------------------------- 0N/A// Counted loops are all trip-counted loops, with exactly 1 trip-counter exit 0N/A// path (and maybe some other exit paths). The trip-counter exit is always 0N/A// last in the loop. The trip-counter does not have to stride by a constant, 0N/A// but it does have to stride by a loop-invariant amount; the exit value is 0N/A// also loop invariant. 0N/A// CountedLoopNodes and CountedLoopEndNodes come in matched pairs. The 0N/A// CountedLoopNode has the incoming loop control and the loop-back-control 0N/A// which is always the IfTrue before the matching CountedLoopEndNode. The 0N/A// CountedLoopEndNode has an incoming control (possibly not the 0N/A// CountedLoopNode if there is control flow in the loop), the post-increment 0N/A// trip-counter value, and the limit. The trip-counter value is always of 0N/A// the form (Op old-trip-counter stride). The old-trip-counter is produced 0N/A// by a Phi connected to the CountedLoopNode. The stride is loop invariant. 0N/A// The Op is any commutable opcode, including Add, Mul, Xor. The 0N/A// CountedLoopEndNode also takes in the loop-invariant limit value. 0N/A// From a CountedLoopNode I can reach the matching CountedLoopEndNode via the 0N/A// loop-back control. From CountedLoopEndNodes I can reach CountedLoopNodes 0N/A// via the old-trip-counter from the Op node. 0N/A//------------------------------CountedLoopNode-------------------------------- 0N/A// CountedLoopNodes head simple counted loops. CountedLoopNodes have as 0N/A// inputs the incoming loop-start control and the loop-back control, so they 0N/A// act like RegionNodes. They also take in the initial trip counter, the 0N/A// loop-invariant stride and the loop-invariant limit value. CountedLoopNodes 0N/A// produce a loop-body control and the trip counter value. Since 0N/A// CountedLoopNodes behave like RegionNodes I still have a standard CFG model. 0N/A // Size is bigger to hold _main_idx. However, _main_idx does not change 0N/A // the semantics so it does not appear in the hash & cmp functions. 0N/A // For Pre- and Post-loops during debugging ONLY, this holds the index of 0N/A // the Main CountedLoop. Used to assert that we understand the graph shape. 0N/A // Known trip count calculated by policy_maximally_unroll 0N/A // Expected trip count from profile data 0N/A // Log2 of original loop bodies in unrolled loop 0N/A // Node count prior to last unrolling - used to decide if 0N/A // unroll,optimize,unroll,optimize,... is making progress 0N/A // Initialize _trip_count to the largest possible value. 0N/A // Will be reset (lower) if the loop's trip count is known. 0N/A // Match increment with optional truncation 0N/A // A 'main' loop has a pre-loop and a post-loop. The 'main' loop 0N/A // can run short a few iterations and may start a few iterations in. 0N/A // It will be RCE'd and unrolled and aligned. 0N/A // A following 'post' loop will run any remaining iterations. Used 0N/A // during Range Check Elimination, the 'post' loop will do any final 0N/A // iterations with full checks. Also used by Loop Unrolling, where 0N/A // the 'post' loop will do any epilog iterations needed. Basically, 0N/A // a 'post' loop can not profitably be further unrolled or RCE'd. 0N/A // A preceding 'pre' loop will run at least 1 iteration (to do peeling), 0N/A // it may do under-flow checks for RCE and may do alignment iterations 0N/A // so the following main loop 'knows' that it is striding down cache 0N/A // A 'main' loop that is ONLY unrolled or peeled, never RCE'd or 0N/A // Aligned, may be missing it's pre-loop. 0N/A//------------------------------CountedLoopEndNode----------------------------- 0N/A// CountedLoopEndNodes end simple trip counted loops. They act much like 0N/A// -----------------------------IdealLoopTree---------------------------------- 0N/A // The head-tail backedge defines the loop. 0N/A // If tail is NULL then this loop has multiple backedges as part of the 0N/A // same loop. During cleanup I'll peel off the multiple backedges; merge 0N/A // them at the loop bottom and flow 1 real backedge into the loop. 0N/A inline Node *
tail();
// Handle lazy update of _tail field 0N/A // Is 'l' a member of 'this'? 0N/A // Set loop nesting depth. Accumulate has_call bits. 0N/A // Split out multiple fall-in edges from the loop header. Move them to a 0N/A // private RegionNode before the loop. This becomes the loop landing pad. 0N/A // Split out the outermost loop from this shared header. 0N/A // Merge all the backedges from the shared header into a private Region. 0N/A // Feed that region as the one backedge to this loop. 0N/A // Split shared headers and insert loop landing pads. 0N/A // Insert a LoopNode to replace the RegionNode. 0N/A // Returns TRUE if loop tree is structurally changed. 0N/A // Perform iteration-splitting on inner loops. Split iterations to avoid 0N/A // range checks or one-shot null checks. 0N/A // Driver for various flavors of iteration splitting 0N/A // Given dominators, try to find loops with calls that must always be 0N/A // executed (call dominates loop tail). These loops do not need non-call 0N/A // safepoints (ncsfpt). 0N/A // Allpaths backwards scan from loop tail, terminating each path at first safepoint 0N/A // Convert to counted loops where possible 0N/A // Check for Node being a loop-breaking test 0N/A // Returns true if ctrl is executed on every complete iteration 0N/A // Remove simplistic dead code from loop body 0N/A // Look for loop-exit tests with my 50/50 guesses from the Parsing stage. 0N/A // Replace with a 1-in-10 exit guess. 0N/A // Return TRUE or FALSE if the loop should never be RCE'd or aligned. 0N/A // Useful for unrolling loops with NO array accesses. 0N/A // Return TRUE or FALSE if the loop should be unswitched -- clone 0N/A // loop with an invariant test 0N/A // Micro-benchmark spamming. Remove empty loops. 0N/A // Return TRUE or FALSE if the loop should be peeled or not. Peel if we can 0N/A // make some loop-invariant test (usually a null-check) happen before the 0N/A // Return TRUE or FALSE if the loop should be maximally unrolled. Stash any 0N/A // known trip count in the counted loop node. 0N/A // Return TRUE or FALSE if the loop should be unrolled or not. Unroll if 0N/A // the loop is a CountedLoop and the body is small enough. 0N/A // Return TRUE or FALSE if the loop should be range-check-eliminated. 0N/A // Gather a list of IF tests that are dominated by iteration splitting; 0N/A // also gather the end of the first split and the start of the 2nd split. 0N/A // Return TRUE or FALSE if the loop should be cache-line aligned. 0N/A // Gather the expression that does the alignment. Note that only 0N/A // one array base can be aligned in a loop (unless the VM guarentees 0N/A // mutual alignment). Note that if we vectorize short memory ops 0N/A // into longer memory ops, we may want to increase alignment. 0N/A // Compute loop trip count from profile data 0N/A // Reassociate invariant expressions. 0N/A // Reassociate invariant add and subtract expressions. 0N/A // Return nonzero index of invariant operand if invariant and variant 0N/A // are combined with an Add or Sub. Helper for reassoicate_invariants. 0N/A // Return true if n is invariant 0N/A // Put loop body on igvn work list 0N/A void dump()
const;
// Dump this loop recursively 0N/A// -----------------------------PhaseIdealLoop--------------------------------- 0N/A// Computes the mapping from Nodes to IdealLoopTrees. Organizes IdealLoopTrees into a 0N/A// loop tree. Drives the loop-based transformations on the ideal graph. 0N/A // Pre-computed def-use info 0N/A // Head of loop tree 0N/A // Array of pre-order numbers, plus post-visited bit. 0N/A // ZERO for not pre-visited. EVEN for pre-visited but not post-visited. 0N/A // ODD for post-visited. Other bits are the pre-order number. 0N/A // Allocate _preorders[] array 0N/A // Allocate _preorders[] array 0N/A // Check to grow _preorders[] array for the case when build_loop_tree_impl() 0N/A // Check for pre-visited. Zero for NOT visited; non-zero for visited. 0N/A // Pre-order numbers are written to the Nodes array as low-bit-set values. 0N/A // Return pre-order number. 0N/A // Check for being post-visited. 0N/A // Should be previsited already (checked with assert(is_visited(n))). 0N/A // Mark as post visited 0N/A // Set/get control node out. Set lower bit to distinguish from IdealLoopTree 0N/A // Returns true if "n" is a data node, false if it's a control node. 0N/A // clear out dead code after build_loop_late 0N/A // Support for faster execution of get_late_ctrl()/dom_lca() 0N/A // when a node has many uses and dominator depth is deep. 0N/A // Inline wrapper for frequent cases: 0N/A // 2) a use is the same as the current LCA passed as 'n1' 0N/A // Fast-path NULL lca 0N/A // find LCA of all uses 0N/A // true if CFG node d dominates CFG node n 0N/A // Helper function for directing control inputs away from CFG split 0N/A // check if transform created new nodes that need _ctrl recorded 0N/A // Set control and update loop membership 0N/A // Control nodes can be replaced or subsumed. During this pass they 0N/A // get their replacement Node in slot 1. Instead of updating the block 0N/A // location of all Nodes in the subsumed block, we lazily do it. As we 0N/A // pull such a subsumed block out of the array, we write back the final 0N/A // Skip dead CFG nodes 0N/A // Check for loop being set 0N/A // "n" must be a control node. Returns true if "n" is known to be in a loop. 0N/A // Lazy-dazy update of 'get_ctrl' and 'idom_at' mechanisms. Replace 0N/A // the 'old_node' with 'new_node'. Kill old-node. Add a reference 0N/A // from old_node to new_node to support the lazy update. Reference 0N/A // replaces loop reference, since that is not neede for dead node. 0N/A //old_node->set_req( 1, new_node /*NO DU INFO*/ ); 0N/A // Nodes always have DU info now, so re-use the side array slot 0N/A // for this node to provide the forwarding pointer. 0N/A // Place 'n' in some loop nest, where 'n' is a CFG node 0N/A // Insert loop into the existing loop tree. 'innermost' is a leaf of the 0N/A // loop tree, not the root. 0N/A // Place Data nodes in some loop nest 0N/A // Array of immediate dominance info for each CFG node indexed by node idx 0N/A while (n->
in(0) ==
NULL) {
// Skip dead CFG nodes 0N/A _idom[
didx] = n;
// Lazily remove dead CFG nodes from table. 0N/A // Locally compute IDOM using dom_lca call 0N/A // Recompute dom_depth 0N/A // Is safept not required by an outer loop? 0N/A // Dominators for the sea of nodes 0N/A // Compute the Ideal Node to Loop mapping 0N/A // True if the method has at least 1 irreducible loop 0N/A // Per-Node transform 0N/A // Return a post-walked LoopNode 0N/A // Dead nodes have no loop, so return the top level loop instead 0N/A // Is 'n' a (nested) member of 'loop'? 0N/A // This is the basic building block of the loop optimizations. It clones an 0N/A // entire loop body. It makes an old_new loop body mapping; with this 0N/A // mapping you can find the new-loop equivalent to an old-loop node. All 0N/A // new-loop nodes are exactly equal to their old-loop counterparts, all 0N/A // edges are the same. All exits from the old-loop now have a RegionNode 0N/A // that merges the equivalent new-loop path. This is true even for the 0N/A // normal "loop-exit" condition. All uses of loop-invariant old-loop values 0N/A // now come from (one or more) Phis that merge their new-loop equivalents. 0N/A // Parameter side_by_side_idom: 0N/A // When side_by_size_idom is NULL, the dominator tree is constructed for 0N/A // the clone loop to dominate the original. Used in construction of 0N/A // pre-main-post loop sequence. 0N/A // When nonnull, the clone and original are side-by-side, both are 0N/A // dominated by the passed in side_by_side_idom node. Used in 0N/A // construction of unswitched loops. 0N/A // If we got the effect of peeling, either by actually peeling or by 0N/A // making a pre-loop which must execute at least once, we can remove 0N/A // all loop-invariant dominated tests in the main body. 0N/A // Generate code to do a loop peel for the given loop (and body). 0N/A // old_new is a temp array. 0N/A // Add pre and post loops around the given loop. These loops are used 0N/A // during RCE, unrolling and aligning loops. 0N/A // If Node n lives in the back_ctrl block, we clone a private version of n 0N/A // in preheader_ctrl block and return that, otherwise return n. 0N/A // Take steps to maximally unroll the loop. Peel any odd iterations, then 0N/A // unroll to do double iterations. The next round of major loop transforms 0N/A // will repeat till the doubled loop body does all remaining iterations in 1 0N/A // Unroll the loop body one step - make each trip do 2 iterations. 0N/A // Return true if exp is a constant times an induction var 0N/A // Return true if exp is a scaled induction var plus (or minus) constant 0N/A // Eliminate range-checks and other trip-counter vs loop-invariant tests. 0N/A // Create a slow version of the loop by cloning the loop 0N/A // and inserting an if to select fast-slow versions. 0N/A // Clone loop with an invariant test (that does not exit) and 0N/A // insert a clone of the test that selects which version to 0N/A // Find candidate "if" for unswitching 0N/A // Range Check Elimination uses this function! 0N/A // Constrain the main loop iterations so the affine function: 0N/A // scale_con * I + offset < limit 0N/A // always holds true. That is, either increase the number of iterations in 0N/A // the pre-loop or the post-loop until the condition holds true in the main 0N/A // loop. Scale_con, offset and limit are all loop invariant. 0N/A // Partially peel loop up through last_peel node. 0N/A // Create a scheduled list of nodes control dependent on ctrl set. 0N/A // Has a use in the vector set 0N/A // Has use internal to the vector set (ie. not in a phi at the loop head) 0N/A // clone "n" for uses that are outside of loop 0N/A // clone "n" for special uses that are in the not_peeled region 0N/A // Insert phi(lp_entry_val, back_edge_val) at use->in(idx) for loop lp if phi does not already exist 0N/A // Validate the loop partition sets: peel and not_peel 0N/A // Ensure that uses outside of loop are of the right form 0N/A // Returns nonzero constant stride if-node is a possible iv test (otherwise returns zero.) 0N/A // Return the (unique) control output node that's in the loop (if it exists.) 0N/A // Insert a signed compare loop exit cloned from an unsigned compare. 0N/A // Utility to register node "n" with PhaseIdealLoop 0N/A // Utility to create an if-projection 0N/A // Force the iff control output to be the live_proj 0N/A // Insert a region before an if projection 0N/A // Insert a new if before an if projection 0N/A // Passed in a Phi merging (recursively) some nearly equivalent Bool/Cmps. 0N/A // "Nearly" because all Nodes have been cloned from the original in the loop, 0N/A // but the fall-in edges to the Cmp are different. Clone bool/Cmp pairs 0N/A // through the Phi recursively, and return a Bool. 0N/A // Rework addressing expressions to get the most loop-invariant stuff 0N/A // moved out. We'd like to do all associative operators, but it's especially 0N/A // important (common) to do address expressions. 0N/A // Attempt to use a conditional move instead of a phi/branch 0N/A // Reorganize offset computations to lower register pressure. 0N/A // Mostly prevent loop-fallout uses of the pre-incremented trip counter 0N/A // (which are then alive with the post-incremented trip counter 0N/A // forcing an extra register move) 0N/A // Check for aggressive application of 'split-if' optimization, 0N/A // using basic block level info. 0N/A // Mark an IfNode as being dominated by a prior test, 0N/A // without actually altering the CFG (and hence IDOM info). 0N/A // Split Node 'n' through merge point 0N/A // Split Node 'n' through merge point if there is enough win. 0N/A // Found an If getting its condition-code input from a Phi in the 0N/A // same block. Split thru the Region. 0N/A // Return a type based on condition control flow 0N/A // Helpers for filtered type 0N/A // Dead nodes have no loop, so return the top level loop instead 0N/A// Handle lazy update of _tail field 0N/A //while( !n->in(0) ) // Skip dead CFG nodes 0N/A// Iterate over the loop tree using a preorder, left-to-right traversal. 0N/A// Example that visits all counted loops from within PhaseIdealLoop 0N/A// for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) { 0N/A// if (!lpt->is_counted()) continue; 0N/A void next();
// Advance to next loop tree