loopnode.hpp revision 2670
/*
 * Copyright (c) 1998, 2011, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

// I D E A L I Z E D   L O O P S
//
// Idealized loops are the set of loops I perform more interesting
// transformations on, beyond simple hoisting.

//------------------------------LoopNode---------------------------------------
// Simple loop header.  Fall in path on left, loop-back path on right.
  // Size is bigger to hold the flags.  However, the flags do not change
  // the semantics so they do not appear in the hash & cmp functions.
  // Names for flag bitfields
  // Names for edge indices

//------------------------------Counted Loops----------------------------------
// Counted loops are all trip-counted loops, with exactly 1 trip-counter exit
// path (and maybe some other exit paths).  The trip-counter exit is always
// last in the loop.  The trip-counter has to stride by a constant;
// the exit value is also loop invariant.
//
// CountedLoopNodes and CountedLoopEndNodes come in matched pairs.  The
// CountedLoopNode has the incoming loop control and the loop-back-control
// which is always the IfTrue before the matching CountedLoopEndNode.  The
// CountedLoopEndNode has an incoming control (possibly not the
// CountedLoopNode if there is control flow in the loop), the post-increment
// trip-counter value, and the limit.  The trip-counter value is always of
// the form (Op old-trip-counter stride).  The old-trip-counter is produced
// by a Phi connected to the CountedLoopNode.  The stride is constant.
// The Op is any commutable opcode, including Add, Mul, Xor.  The
// CountedLoopEndNode also takes in the loop-invariant limit value.
//
// From a CountedLoopNode I can reach the matching CountedLoopEndNode via the
// loop-back control.  From CountedLoopEndNodes I can reach CountedLoopNodes
// via the old-trip-counter from the Op node.

//------------------------------CountedLoopNode--------------------------------
// CountedLoopNodes head simple counted loops.  CountedLoopNodes have as
// inputs the incoming loop-start control and the loop-back control, so they
// act like RegionNodes.  They also take in the initial trip counter, the
// loop-invariant stride and the loop-invariant limit value.  CountedLoopNodes
// produce a loop-body control and the trip counter value.  Since
// CountedLoopNodes behave like RegionNodes I still have a standard CFG model.
  // Size is bigger to hold _main_idx.  However, _main_idx does not change
  // the semantics so it does not appear in the hash & cmp functions.
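// For orientation only (not part of the original header): a source-level
// sketch of the shape the counted-loop comments above describe, with the IR
// roles noted alongside; the function is made up for the example.
//
//   int sum_example(int limit) {
//     int sum = 0;
//     for (int i = 0; i < limit; i += 2) {   // i:     trip-counter Phi on the CountedLoopNode
//       sum += i;                            // 0:     loop-invariant initial trip value
//     }                                      // 2:     constant stride
//     return sum;                            // limit: loop-invariant limit
//   }
//
// Each trip computes the post-increment value (here AddI i stride); the
// matching CountedLoopEndNode compares that value against 'limit', and its
// IfTrue projection is the loop-back control into the CountedLoopNode.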
  // For Pre- and Post-loops during debugging ONLY, this holds the index of
  // the Main CountedLoop.  Used to assert that we understand the graph shape.

  // Known trip count calculated by compute_exact_trip_count()
  // Expected trip count from profile data
  // Log2 of original loop bodies in unrolled loop
  // Node count prior to last unrolling - used to decide if
  // unroll,optimize,unroll,optimize,... is making progress

  // Initialize _trip_count to the largest possible value.
  // Will be reset (lower) if the loop's trip count is known.

  // Match increment with optional truncation

  // A 'main' loop has a pre-loop and a post-loop.  The 'main' loop
  // can run short a few iterations and may start a few iterations in.
  // It will be RCE'd and unrolled and aligned.

  // A following 'post' loop will run any remaining iterations.  Used
  // during Range Check Elimination, the 'post' loop will do any final
  // iterations with full checks.  Also used by Loop Unrolling, where
  // the 'post' loop will do any epilog iterations needed.  Basically,
  // a 'post' loop cannot profitably be further unrolled or RCE'd.

  // A preceding 'pre' loop will run at least 1 iteration (to do peeling),
  // it may do under-flow checks for RCE and may do alignment iterations
  // so the following main loop 'knows' that it is striding down cache
  // lines.

  // A 'main' loop that is ONLY unrolled or peeled, never RCE'd or
  // Aligned, may be missing its pre-loop.

//------------------------------CountedLoopEndNode-----------------------------
// CountedLoopEndNodes end simple trip counted loops.  They act much like
// IfNodes.

//------------------------------LoopLimitNode-----------------------------
// Counted Loop limit node which represents exact final iterator value:
//   trip_count  = (limit - init_trip + stride - 1) / stride
//   final_value = trip_count * stride + init_trip.
// Use HW instructions to calculate it when it can overflow in integer.
// Note, final_value should fit into integer since counted loop has
// limit check: limit <= max_int-stride.
// Put it on the Macro nodes list to optimize during macro nodes expansion.
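// Illustrative sketch (not part of the original header) of the formula in
// the comment above, for a positive stride and ignoring overflow; the
// function and variable names are made up for the example.
//
//   int exact_trip_count(int init_trip, int limit, int stride) {
//     int trip_count  = (limit - init_trip + stride - 1) / stride;
//     int final_value = trip_count * stride + init_trip;
//     // e.g. init_trip=0, limit=10, stride=3  ->  4 trips, final_value=12
//     return trip_count;
//   }
//
// The real node also has to cope with the subtraction and multiplication
// overflowing 32-bit arithmetic, which is why the comment above says to use
// HW instructions when the computation can overflow in integer.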
// -----------------------------IdealLoopTree----------------------------------
  // The head-tail backedge defines the loop.
  // If tail is NULL then this loop has multiple backedges as part of the
  // same loop.  During cleanup I'll peel off the multiple backedges; merge
  // them at the loop bottom and flow 1 real backedge into the loop.
  inline Node *tail();          // Handle lazy update of _tail field

  uint8 _has_sfpt:1;            // True if has non-call safepoint

  // Is 'l' a member of 'this'?

  // Set loop nesting depth.  Accumulate has_call bits.

  // Split out multiple fall-in edges from the loop header.  Move them to a
  // private RegionNode before the loop.  This becomes the loop landing pad.

  // Split out the outermost loop from this shared header.

  // Merge all the backedges from the shared header into a private Region.
  // Feed that region as the one backedge to this loop.

  // Split shared headers and insert loop landing pads.
  // Insert a LoopNode to replace the RegionNode.
  // Returns TRUE if loop tree is structurally changed.

  // Perform optimization to use the loop predicates for null checks and range checks.
  // Applies to any loop level (not just the innermost one)

  // Perform iteration-splitting on inner loops.  Split iterations to
  // avoid range checks or one-shot null checks.  Returns false if the
  // current round of loop opts should stop.

  // Driver for various flavors of iteration splitting.  Returns false
  // if the current round of loop opts should stop.

  // Given dominators, try to find loops with calls that must always be
  // executed (call dominates loop tail).  These loops do not need non-call
  // safepoints.

  // Allpaths backwards scan from loop tail, terminating each path at first safepoint

  // Convert to counted loops where possible

  // Check for Node being a loop-breaking test

  // Returns true if ctrl is executed on every complete iteration

  // Remove simplistic dead code from loop body

  // Look for loop-exit tests with my 50/50 guesses from the Parsing stage.
  // Replace with a 1-in-10 exit guess.

  // Return TRUE or FALSE if the loop should never be RCE'd or aligned.
  // Useful for unrolling loops with NO array accesses.

  // Return TRUE or FALSE if the loop should be unswitched -- clone
  // loop with an invariant test

  // Micro-benchmark spamming.  Remove empty loops.

  // Convert one iteration loop into normal code.

  // Return TRUE or FALSE if the loop should be peeled or not.  Peel if we can
  // make some loop-invariant test (usually a null-check) happen before the
  // loop.

  // Return TRUE or FALSE if the loop should be maximally unrolled.  Stash any
  // known trip count in the counted loop node.

  // Return TRUE or FALSE if the loop should be unrolled or not.  Unroll if
  // the loop is a CountedLoop and the body is small enough.

  // Return TRUE or FALSE if the loop should be range-check-eliminated.
  // Gather a list of IF tests that are dominated by iteration splitting;
  // also gather the end of the first split and the start of the 2nd split.

  // Return TRUE or FALSE if the loop should be cache-line aligned.
  // Gather the expression that does the alignment.  Note that only
  // one array base can be aligned in a loop (unless the VM guarantees
  // mutual alignment).  Note that if we vectorize short memory ops
  // into longer memory ops, we may want to increase alignment.

  // Return TRUE if "iff" is a range check.

  // Compute loop exact trip count if possible

  // Compute loop trip count from profile data

  // Reassociate invariant expressions (see the sketch below).
  // Reassociate invariant add and subtract expressions.
  // Return nonzero index of invariant operand if invariant and variant
  // are combined with an Add or Sub.  Helper for reassociate_invariants.

  // Return true if n is invariant

  // Put loop body on igvn work list

  void dump_head( ) const;      // Dump loop head only
  void dump() const;            // Dump this loop recursively
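// Illustrative sketch (not part of the original header) of the kind of
// reassociation referred to above: with inv1 and inv2 loop invariant and x
// varying inside the loop,
//
//   inv1 + (x + inv2)  =>  (inv1 + inv2) + x
//   inv1 - (inv2 - x)  =>  (inv1 - inv2) + x
//
// so the purely invariant part (inv1 +/- inv2) can be hoisted out of the
// loop and only a single varying Add/Sub remains in the body.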
// -----------------------------PhaseIdealLoop---------------------------------
// Computes the mapping from Nodes to IdealLoopTrees.  Organizes IdealLoopTrees
// into a loop tree.  Drives the loop-based transformations on the ideal graph.

  // Pre-computed def-use info

  // Array of pre-order numbers, plus post-visited bit.
  // ZERO for not pre-visited.  EVEN for pre-visited but not post-visited.
  // ODD for post-visited.  Other bits are the pre-order number.
  // (See the encoding sketch at the end of this block.)

  // Allocate _preorders[] array
  // Allocate _preorders[] array
  // Check to grow _preorders[] array for the case when build_loop_tree_impl()
  // adds new nodes.

  // Check for pre-visited.  Zero for NOT visited; non-zero for visited.

  // Pre-order numbers are written to the Nodes array as low-bit-set values.

  // Return pre-order number.

  // Check for being post-visited.
  // Should be previsited already (checked with assert(is_visited(n))).

  // Set/get control node out.  Set lower bit to distinguish from IdealLoopTree
  // Returns true if "n" is a data node, false if it's a control node.

  // clear out dead code after build_loop_late

  // Support for faster execution of get_late_ctrl()/dom_lca()
  // when a node has many uses and dominator depth is deep.

  // Helper for debugging bad dominance relationships

  // Inline wrapper for frequent cases:
  // 1) only one use
  // 2) a use is the same as the current LCA passed as 'n1'

  // Helper function for directing control inputs away from CFG split
  // points.

  // check if transform created new nodes that need _ctrl recorded

  // Set control and update loop membership

  // Control nodes can be replaced or subsumed.  During this pass they
  // get their replacement Node in slot 1.  Instead of updating the block
  // location of all Nodes in the subsumed block, we lazily do it.  As we
  // pull such a subsumed block out of the array, we write back the final
  // correct block.

  // true if CFG node d dominates CFG node n

  // return get_ctrl for a data node and self(n) for a CFG node

  // Check for loop being set
  // "n" must be a control node.  Returns true if "n" is known to be in a loop.

  // Lazy-dazy update of 'get_ctrl' and 'idom_at' mechanisms.  Replace
  // the 'old_node' with 'new_node'.  Kill old-node.  Add a reference
  // from old_node to new_node to support the lazy update.  Reference
  // replaces loop reference, since that is not needed for dead node.
    //old_node->set_req( 1, new_node /*NO DU INFO*/ );
    // Nodes always have DU info now, so re-use the side array slot
    // for this node to provide the forwarding pointer.

  // Place 'n' in some loop nest, where 'n' is a CFG node

  // Insert loop into the existing loop tree.  'innermost' is a leaf of the
  // loop tree, not the root.

  // Place Data nodes in some loop nest
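// Illustrative sketch (not part of the original header) of the ZERO/EVEN/ODD
// pre-order encoding described above; the names are made up for the example.
// The low bit of a _preorders[] entry is the post-visited flag and the
// remaining bits hold the pre-order number:
//
//   entry = 0;                     // not pre-visited
//   entry = (pre_order << 1);      // pre-visited: even, non-zero
//   entry |= 1;                    // post-visited: odd
//
//   bool visited    = (entry != 0);
//   uint pre_order  = (entry >> 1);
//   bool post_done  = ((entry & 1) != 0);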
  // Array of immediate dominance info for each CFG node indexed by node idx
  Node **_idom;                 // Array of immediate dominators

  Node* idom_no_update(Node* d) const {
    Node *n = _idom[d->_idx];
    assert(n != NULL, "Bad immediate dominator info.");
    while (n->in(0) == NULL) {  // Skip dead CFG nodes
      n = (Node*)n->_adr_type;  // advance past the dead node to its replacement
      assert(n != NULL, "Bad immediate dominator info.");
    }
    return n;
  }

  Node *idom(Node* d) const {
    uint didx = d->_idx;
    Node *n = idom_no_update(d);
    _idom[didx] = n;            // Lazily remove dead CFG nodes from table.
    return n;
  }
  // Locally compute IDOM using dom_lca call

  // Is safept not required by an outer loop?

  // Replace parallel induction variable (parallel to trip counter)

  // Perform verification that the graph is valid.

  // build the loop tree and perform any requested optimizations
  // Dominators for the sea of nodes
  // Compute the Ideal Node to Loop mapping

  // Verify that verify_me made the same decisions as a fresh run.
  // Build and verify the loop tree without modifying the graph.  This
  // is useful to verify that all inputs properly dominate their uses.

  // True if the method has at least 1 irreducible loop

  // Return a post-walked LoopNode

  // Dead nodes have no loop, so return the top level loop instead

  // Is 'n' a (nested) member of 'loop'?

  // This is the basic building block of the loop optimizations.  It clones an
  // entire loop body.  It makes an old_new loop body mapping; with this
  // mapping you can find the new-loop equivalent to an old-loop node.  All
  // new-loop nodes are exactly equal to their old-loop counterparts, all
  // edges are the same.  All exits from the old-loop now have a RegionNode
  // that merges the equivalent new-loop path.  This is true even for the
  // normal "loop-exit" condition.  All uses of loop-invariant old-loop values
  // now come from (one or more) Phis that merge their new-loop equivalents.
  //
  // Parameter side_by_side_idom:
  //   When side_by_side_idom is NULL, the dominator tree is constructed for
  //     the clone loop to dominate the original.  Used in construction of
  //     pre-main-post loop sequence.
  //   When nonnull, the clone and original are side-by-side, both are
  //     dominated by the passed in side_by_side_idom node.  Used in
  //     construction of unswitched loops.

  // If we got the effect of peeling, either by actually peeling or by
  // making a pre-loop which must execute at least once, we can remove
  // all loop-invariant dominated tests in the main body.

  // Generate code to do a loop peel for the given loop (and body).
  // old_new is a temp array.

  // Add pre and post loops around the given loop.  These loops are used
  // during RCE, unrolling and aligning loops.

  // If Node n lives in the back_ctrl block, we clone a private version of n
  // in preheader_ctrl block and return that, otherwise return n.

  // Take steps to maximally unroll the loop.  Peel any odd iterations, then
  // unroll to do double iterations.  The next round of major loop transforms
  // will repeat till the doubled loop body does all remaining iterations in 1
  // pass.

  // Unroll the loop body one step - make each trip do 2 iterations.

  // Return true if exp is a constant times an induction var
  // Return true if exp is a scaled induction var plus (or minus) constant

  // Return true if proj is for "proj->[region->..]call_uct"
  // Return true for "if(test)-> proj -> ...
  //                          other_proj->[region->..]call_uct"
  // Create a new if above the uncommon_trap_if_pattern for the predicate to be promoted
  // Clone loop predicates to cloned loops (peeled, unswitched)
  // Find a good location to insert a predicate
  // Construct a range check for a predicate if
  // Implementation of the loop predication to promote checks outside the loop
  // Helper function to collect predicate for eliminating the useless ones

  // Eliminate range-checks and other trip-counter vs loop-invariant tests.

  // Create a slow version of the loop by cloning the loop
  // and inserting an if to select fast-slow versions.
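// Illustrative sketch (not part of the original header) of the range-check
// elimination mentioned above.  A loop whose body checks an affine function
// of the trip counter, such as
//
//   for (int i = 0; i < n; i++) {
//     a[i] = 0;               // implicit check: 0 <= i && i < a.length
//   }
//
// is split into pre-, main- and post-loops; the pre- and post-loop limits are
// constrained (see add_constraint() below) so the condition always holds
// inside the main loop, where the check can then be dropped, while the pre-
// and post-loops keep the full checks.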
  // Clone loop with an invariant test (that does not exit) and
  // insert a clone of the test that selects which version to
  // execute.

  // Find candidate "if" for unswitching

  // Range Check Elimination uses this function!
  // Constrain the main loop iterations so the affine function:
  //    low_limit <= scale_con * I + offset  <  upper_limit
  // always holds true.  That is, either increase the number of iterations in
  // the pre-loop or the post-loop until the condition holds true in the main
  // loop.  Scale_con, offset and limit are all loop invariant.

  // Helper function for add_constraint().

  // Partially peel loop up through last_peel node.

  // Create a scheduled list of nodes control dependent on ctrl set.
  // Has a use in the vector set
  // Has use internal to the vector set (i.e. not in a phi at the loop head)
  // clone "n" for uses that are outside of loop
  // clone "n" for special uses that are in the not_peeled region
  // Insert phi(lp_entry_val, back_edge_val) at use->in(idx) for loop lp if phi does not already exist
  // Validate the loop partition sets: peel and not_peel
  // Ensure that uses outside of loop are of the right form

  // Returns nonzero constant stride if the if-node is a possible iv test (otherwise returns zero.)

  // Return the (unique) control output node that's in the loop (if it exists.)

  // Insert a signed compare loop exit cloned from an unsigned compare.

  // Utility to register node "n" with PhaseIdealLoop

  // Utility to create an if-projection

  // Force the iff control output to be the live_proj

  // Insert a region before an if projection

  // Insert a new if before an if projection

  // Passed in a Phi merging (recursively) some nearly equivalent Bool/Cmps.
  // "Nearly" because all Nodes have been cloned from the original in the loop,
  // but the fall-in edges to the Cmp are different.  Clone bool/Cmp pairs
  // through the Phi recursively, and return a Bool.

  // Rework addressing expressions to get the most loop-invariant stuff
  // moved out.  We'd like to do all associative operators, but it's especially
  // important (common) to do address expressions.

  // Attempt to use a conditional move instead of a phi/branch

  // Reorganize offset computations to lower register pressure.
  // Mostly prevent loop-fallout uses of the pre-incremented trip counter
  // (which are then alive with the post-incremented trip counter
  // forcing an extra register move)

  // Check for aggressive application of 'split-if' optimization,
  // using basic block level info.

  // Mark an IfNode as being dominated by a prior test,
  // without actually altering the CFG (and hence IDOM info).

  // Split Node 'n' through merge point
  // Split Node 'n' through merge point if there is enough win.

  // Found an If getting its condition-code input from a Phi in the
  // same block.  Split thru the Region.

  // Conversion of fill/copy patterns into intrinsic versions

  // Return a type based on condition control flow
  // Helpers for filtered type
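// Illustrative sketch (not part of the original header) of the fill pattern
// conversion mentioned above: a loop that stores one loop-invariant value to
// consecutive array elements,
//
//   for (int i = off; i < off + len; i++) {
//     a[i] = v;                // v, off and len loop invariant
//   }
//
// can be replaced with a single call to an array-fill stub covering the same
// index range.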
  void verify() const;          // Major slow  :-)

  // Dead nodes have no loop, so return the top level loop instead

  static int _loop_work;        // Sum of PhaseIdealLoop x _unique

// Handle lazy update of _tail field
inline Node* IdealLoopTree::tail() {
  Node *n = _tail;
  //while( !n->in(0) )  // Skip dead CFG nodes
  //  n = n->in(1);
  if (n->in(0) == NULL) {       // Skip a lazily-replaced (dead) CFG node
    n = _phase->get_ctrl(n);
    _tail = n;
  }
  return n;
}

// Iterate over the loop tree using a preorder, left-to-right traversal.
//
// Example that visits all counted loops from within PhaseIdealLoop
//
//  for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) {
//   IdealLoopTree* lpt = iter.current();
//   if (!lpt->is_counted()) continue;
//   ...
//  }
  void next();                  // Advance to next loop tree

#endif // SHARE_VM_OPTO_LOOPNODE_HPP