parCardTableModRefBS.cpp revision 2941
/*
 * Copyright (c) 2007, 2011, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 */

  "# worker threads != # requested!");
  "# worker threads != # requested!");
  // Make sure the LNC array is valid for the space.

  // Sets the condition for completion of the subtask (how many threads
  // need to finish in order to be done).

  // Clear lowest_non_clean array for next time.

  // We go from higher to lower addresses here; it wouldn't help that much
  // because of the strided parallelism pattern used here.

  // Find the first card address of the first chunk in the stride that is
  // at least "bottom" of the used region.

  // Go ahead to the next chunk group boundary, then to the requested stride.

  // Even though we go from lower to higher addresses below, the
  // strided parallelism can interleave the actual processing of the
  // dirty pages in various ways. For a specific chunk within this
  // stride, we take care to avoid double scanning or missing a card
  // by suitably initializing the "min_done" field in process_chunk_boundaries()
  // below, together with the dirty region extension accomplished in
  // DirtyCardToOopClosure::do_MemRegion().

  // Invariant: chunk_mr should be fully contained within the "used" region.

  // Process the chunk.

  // We want the LNC array updates above in process_chunk_boundaries
  // to be visible before any of the card table value changes as a
  // result of the dirty card iteration below.

  // We do not call the non_clean_card_iterate_serial() version because
  // we want to clear the cards: clear_cl here does the work of finding
  // contiguous dirty ranges of cards to process and clear.

  // Find the next chunk of the stride.

// If you want a talkative process_chunk_boundaries,
#error "Encountered a global preprocessor flag, NOISY, which might clash with local definition to follow"

  // We must worry about non-array objects that cross chunk boundaries,
  // because such objects are both precisely and imprecisely marked:
  // .. if the head of such an object is dirty, the entire object
  //    needs to be scanned, under the interpretation that this
  // .. if the head of such an object is not dirty, we can assume
  //    precise marking and it's efficient to scan just the dirty
  // In either case, each scanned reference must be scanned precisely
  // once so as to avoid cloning of a young referent. For efficiency,
  // our closures depend on this property and do not protect against

  // First, set "our" lowest_non_clean entry, which would be
  // used by the thread scanning an adjoining left chunk with
  // a non-array object straddling the mutual boundary.
  // Find the object that spans our boundary, if one exists.
  // first_block is the block possibly straddling our left boundary.

  "First chunk should always have a co-initial block");
  // Does the block straddle the chunk's left boundary, and is it

  // Find our least non-clean card, so that a left neighbour
  // does not scan an object straddling the mutual boundary
  // too far to the right, and attempt to scan a portion of

  // Note that this does not need to go beyond our last card
  // if our first object completely straddles this chunk.

  "Write exactly once : value should be stable hereafter for this round");
  tty->print_cr(" LNC: Found no dirty card in current chunk; leaving LNC entry NULL");
  // In the future, we could have this thread look for a non-NULL value to copy from its
  // right neighbour (up to the end of the first object).

  " might be efficient to get value from right neighbour?");
  // In this case we can help our neighbour by just asking them
  // to stop at our first card (even though it may not be dirty).
  NOISY(tty->print_cr(" LNC: first block is not a non-array object; setting LNC to first card of current chunk");)
  // the highest address that we will scan past the right end of our chunk.

  // This is not the last chunk in the used region.
  // What is our last block? We check the first block of
  // the next (right) chunk rather than strictly check our last block
  // because it's potentially more efficient to do so.

  // It is a non-array object that straddles the right boundary of this chunk.

  // last_obj_card is the card corresponding to the start of the last object
  // in the chunk. Note that the last object may not start in

  // The card containing the head is not dirty. Any marks on
  // subsequent cards still in this chunk must have been made
  // precisely; we can cap processing at the end of our chunk.

  // The last object must be considered dirty, and extends onto the
  // following chunk. Look for a dirty card in that chunk that will
  // bound our processing.

  // This search potentially goes a long distance looking
  // for the next card that will be scanned, terminating
  // at the end of the last_block, if no earlier dirty card

  "last card of next chunk may be wrong");
  " max_to_do set at " PTR_FORMAT " which is before end of last block in chunk: "

  // The following is a pessimistic value, because it's possible
  // that a dirty card on a subsequent chunk has been cleared by
  // the time we get to look at it; we'll correct for that further below,
  // using the LNC array which records the least non-clean card
  // before cards were cleared in a particular chunk.

  // It is possible that a dirty card for the last object may have been
  // cleared before we had a chance to examine it. In that case, the value
  // will have been logged in the LNC for that chunk.
  // We need to examine as many chunks to the right as this object
  // covers. However, we need to bound this checking to the largest
  // entry in the LNC array: this is because the heap may expand
  // after the LNC array has been created but before we reach this point,
  // and the last block in our chunk may have been expanded to include
  // the expansion delta (and possibly subsequently allocated from, so
  // it wouldn't be sufficient to check whether that last block was
  // or was not an object at this point).

  // we can stop at the first non-NULL entry we find

  // In any case, we break now
  }
  // else continue to look for a non-NULL entry if any

  // Now we can set the closure we're using so it doesn't go beyond

  // Only the first thread to obtain the lock will resize the
  // LNC array for the covered region. Any later expansion can't affect
  // the used_at_save_marks region.
  // (I observed a bug in which the first thread to execute this would
  // resize, and then it would cause "expand_and_allocate" that would
  // increase the number of chunks in the covered region. Then a second
  // thread would come and execute this, see that the size didn't match,
  // and free and allocate again. So the first thread would be using a
  // freed "_lowest_non_clean" array.)

  // Do a dirty read here. If we pass the conditional then take the rare
  // event lock and do the read again in case some other thread had already
  // succeeded and done the resize.

  // Should we delete the old?

  "logical consequence");
  // Now allocate a new one if necessary.

  // In any case, now do the initialization.
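
// Illustrative sketch only, not HotSpot code: the "dirty read, then take the
// lock and read again" pattern described in the comments above can be shown
// in portable C++ roughly as follows. The class name LncTable, its members,
// and the use of std::mutex and std::atomic are assumptions made for this
// example; the real code uses the VM's own lock and array types. The sketch
// does not model the heap-expansion bug mentioned above.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a per-region "lowest non-clean" table.
class LncTable {
 public:
  // Make the table hold exactly `chunks` entries, all reset to nullptr.
  // Returns true only for the call that actually performed the resize.
  bool ensure_size(std::size_t chunks) {
    bool resized = false;
    // Unlocked ("dirty") read: a cheap filter for the common case in
    // which the table already has the right size.
    if (size_.load(std::memory_order_acquire) != chunks) {
      std::lock_guard<std::mutex> guard(lock_);
      // Read again under the lock: another thread may have completed
      // the resize between our first read and acquiring the lock.
      if (size_.load(std::memory_order_relaxed) != chunks) {
        entries_.assign(chunks, nullptr);  // reallocate and clear
        size_.store(chunks, std::memory_order_release);
        resized = true;
      }
    }
    return resized;
  }

  std::size_t size() const { return size_.load(std::memory_order_acquire); }

  // Callers are assumed not to read entries while a resize is in flight.
  const void* entry(std::size_t i) const { return entries_[i]; }

 private:
  std::mutex lock_;
  std::atomic<std::size_t> size_{0};
  std::vector<const void*> entries_;
};
```

// With this sketch, when several threads call ensure_size() with the same
// chunk count, only one of them resizes; the others either skip the lock
// after the dirty read or find the size already correct once they hold it.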