sharedRuntime_x86_64.cpp revision 3158
/*
 * Copyright (c) 2003, 2012, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 */

  // Most of the runtime stubs have this simple frame layout.
  // This class exists to make the layout shared in one place.
  // Offsets are for compiler stack slots, which are jints.
  // The frame sender code expects that rbp will be in the "natural" place and
  // will override any oopMap setting for it. We must therefore force the layout
  // so that it agrees with the frame sender code.

  // Capture info about frame layout. Layout offsets are in jint
  // units because compiler frame slots are jints.
  // The frame sender code expects that rbp will be in the "natural" place and
  // will override any oopMap setting for it. We must therefore force the layout
  // so that it agrees with the frame sender code.
  // Offsets into the register save area
  // Used by deoptimization when it is managing result register
  // values on its own

  // During deoptimization only the result registers need to be restored,
  // all the other values have already been extracted.

  // Always make the frame size 16-byte aligned
  // OopMap frame size is in compiler stack slots (jint's) not bytes or words
  // The caller will allocate additional_frame_words
  // CodeBlob frame size is in words.

  // Save registers, fpu state, and flags.
  // We assume caller has already pushed the return address onto the
  // stack, so rsp is 8-byte aligned here.
  // We push rbp twice in this sequence because we want the real rbp
  // to be under the return like a normal enter.
  // Allocate argument register save area

  // Set an oopmap for the call site. This oopmap will map all
  // oop-registers and debug-info registers as callee-saved. This
  // will allow deoptimization at this safepoint to find all possible
  // debug-info recordings, as well as let GC find all oops.
  // rbp location is known implicitly by the frame sender code, needs no oopmap
  // and the location where rbp was saved is ignored
  // %%% These should all be a waste but we'll keep things as they were for now
  // rbp location is known implicitly by the frame sender code, needs no oopmap

  // Pop arg register save area
  // Recover CPU state
  // Get the rbp described implicitly by the calling convention (no oopMap)

  // Just restore result register. Only used by deoptimization. By
  // now any callee save register that needs to be restored to a c2
  // caller of the deoptee has been extracted into the vframeArray
  // and will be stuffed into the c2i adapter we create for later
  // restoration so only result registers need to be restored here.
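The frame-size comments above use three units for the same quantity: bytes (16-byte aligned), 4-byte compiler stack slots (for the OopMap), and machine words (for the CodeBlob). The following is a minimal sketch of that bookkeeping, assuming a 64-bit build. The names `FrameSizes`, `align_up`, and `frame_sizes` are illustrative and do not come from this file.

```cpp
#include <cassert>
#include <cstddef>

// Assumed constants for a 64-bit build (illustrative names, not HotSpot's).
constexpr std::size_t kWordSize = 8;         // bytes per machine word
constexpr std::size_t kSlotSize = 4;         // bytes per compiler stack slot (jint)
constexpr std::size_t kFrameAlignment = 16;  // required frame alignment in bytes

// Round n up to the next multiple of a power-of-two alignment.
constexpr std::size_t align_up(std::size_t n, std::size_t alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

struct FrameSizes {
  std::size_t bytes;  // 16-byte aligned size
  std::size_t slots;  // unit an OopMap would use
  std::size_t words;  // unit a CodeBlob would use
};

// Hypothetical helper: derive all three views from a raw byte count.
inline FrameSizes frame_sizes(std::size_t raw_bytes) {
  const std::size_t bytes = align_up(raw_bytes, kFrameAlignment);
  return FrameSizes{bytes, bytes / kSlotSize, bytes / kWordSize};
}
```

Keeping the three views in one struct makes it harder to pass a slot count where a word count is expected, which is the mistake the original comments warn about.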
  // Restore fp result register
  // Restore integer result register
  // Pop all of the register save area off the stack except the return address

// The java_calling_convention describes stack locations as ideal slots on
// a frame with no abi restrictions. Since we must observe abi restrictions
// (like the placement of the register window) the slots must be biased by
// the following value.
  // Account for saved rbp and return address
  // This should really be in_preserve_stack_slots

// ---------------------------------------------------------------------------
// Read the array of BasicTypes from a signature, and compute where the
// arguments should go. Values in the VMRegPair regs array refer to 4-byte
// quantities. Values less than VMRegImpl::stack0 are registers, those above
// refer to 4-byte stack slots. All stack slots are based off of the stack pointer
// as framesizes are fixed.
// VMRegImpl::stack0 refers to the first slot, 0(sp),
// and VMRegImpl::stack0+1 refers to the memory word 4 bytes higher. Registers
// (up to RegisterImpl::number_of_registers) are the 64-bit
// integer registers.
// Note: the INPUTS in sig_bt are in units of Java argument words, which are
// either 32-bit or 64-bit depending on the build. The OUTPUTS are in 32-bit
// units regardless of build. Of course for i486 there is no 64-bit build.
// The Java calling convention is a "shifted" version of the C ABI.
// By skipping the first C ABI register we can call non-static jni methods
// with small numbers of arguments without having to shuffle the arguments
// at all. Since we control the java ABI we ought to at least get some
// advantage out of it.
  // Create the mapping between argument positions and
  // halves of T_LONG or T_DOUBLE

// Patch the caller's callsite with entry to compiled code if it exists.
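The argument-placement rules described earlier in this section (registers first, then 4-byte stack slots, with the second half of a T_LONG or T_DOUBLE left unassigned) can be sketched as follows. This is a simplified model, not the file's code: `ArgType`, `ArgLocation`, and `assign_locations` are hypothetical names, the count of 6 integer argument registers is taken from the layout diagram later in the file, and the count of 8 floating-point registers is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for BasicType and VMRegPair (illustrative only).
enum class ArgType { Int, Long, Float, Double, Object, Void };

struct ArgLocation {
  enum Kind { IntReg, FloatReg, Stack, Unused } kind;
  int index;  // register number, or 4-byte stack slot index
};

// Assumed register budget.
constexpr int kIntArgRegs = 6;
constexpr int kFloatArgRegs = 8;

// Assigns each argument a register or a stack slot and returns the number of
// 4-byte stack slots used. Every stack argument takes two slots (one word).
inline int assign_locations(const std::vector<ArgType>& sig,
                            std::vector<ArgLocation>& out) {
  int int_args = 0, fp_args = 0, stack_slots = 0;
  out.clear();
  for (ArgType t : sig) {
    switch (t) {
      case ArgType::Void:  // second half of a long or double
        out.push_back({ArgLocation::Unused, -1});
        break;
      case ArgType::Float:
      case ArgType::Double:
        if (fp_args < kFloatArgRegs) {
          out.push_back({ArgLocation::FloatReg, fp_args++});
        } else {
          out.push_back({ArgLocation::Stack, stack_slots});
          stack_slots += 2;
        }
        break;
      default:  // Int, Long, Object
        if (int_args < kIntArgRegs) {
          out.push_back({ArgLocation::IntReg, int_args++});
        } else {
          out.push_back({ArgLocation::Stack, stack_slots});
          stack_slots += 2;
        }
        break;
    }
  }
  return stack_slots;
}
```

With seven int arguments followed by a long and a double, the seventh int and the long spill to stack slots 0 and 2, while the double still gets the first floating-point register.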
  // Save the current stack pointer
  // Schedule the branch target address early.
  // Call into the VM to patch the caller, then jump to compiled callee
  // rax isn't live so capture return address while we easily can
  // align stack so push_CPU_state doesn't fault
  // VM needs caller's callsite
  // VM needs target method
  // This needs to be a long call since we will relocate this adapter to
  // the codeBuffer and it may not reach
  // Allocate argument register save area
  // De-allocate argument register save area

  // Before we get into the guts of the C2I adapter, see if we should be here
  // at all. We've come from compiled code and are attempting to jump to the
  // interpreter, which means the caller made a static call to get here
  // (vcalls always get a compiled target if there is one). Check for a
  // compiled target. If there is one, we need to patch the caller's call.

  // Since all args are passed on the stack, total_args_passed *
  // Interpreter::stackElementSize is the space we need. Plus 1 because
  // we also account for the return address location since
  // we store it first rather than hold it in rax across all the shuffling
  // stack is aligned, keep it that way
  // Get return address
  // set senderSP value
  // Store the return address in the expected location

  // Now write the args into the outgoing interpreter space
  // offset to start parameters
  // - 0 return address
  // However, to make things extra confusing: because we can fit a long/double in
  // a single slot on a 64-bit VM and it would be silly to break them up, the interpreter
  // leaves one slot empty and only stores to a single slot. In this case the
  // slot that is occupied is the T_VOID slot. See, I said it was confusing.
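The slot rule above is easier to follow with concrete offsets. The sketch below computes, for each argument, the SP-relative byte offset where a C2I adapter would store the value, assuming that offset 0 holds the return address and that a long or double is written to the lower-addressed slot (the one belonging to its T_VOID half). `ArgType`, `kStackElementSize`, and `interpreter_store_offsets` are hypothetical names, and the 8-byte slot size is an assumption for a 64-bit build.

```cpp
#include <cassert>
#include <vector>

enum class ArgType { Int, Long, Double, Object, Void };

constexpr int kStackElementSize = 8;  // assumed interpreter slot size in bytes

// For each argument, the SP-relative byte offset where its value is written,
// or -1 for the T_VOID half, which gets no store of its own.
// Offset 0 is reserved for the return address.
inline std::vector<int> interpreter_store_offsets(const std::vector<ArgType>& sig) {
  const int total = static_cast<int>(sig.size());
  std::vector<int> offsets(sig.size(), -1);
  for (int i = 0; i < total; ++i) {
    if (sig[i] == ArgType::Void) continue;
    const int st_off = (total - i) * kStackElementSize;
    const int next_off = st_off - kStackElementSize;
    const bool two_slot = sig[i] == ArgType::Long || sig[i] == ArgType::Double;
    // A long/double lands in the lower-addressed slot; the slot at st_off
    // is left unused (the real code fills it with known junk).
    offsets[i] = two_slot ? next_off : st_off;
  }
  return offsets;
}
```

For a signature of (Object, Long, Void, Int), the object goes to offset 32, the long to offset 16 with offset 24 left unused, and the int to offset 8.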
  // memory to memory use rax
  // Two VMRegs|OptoRegs can be T_OBJECT, T_ADDRESS, T_DOUBLE, T_LONG
  // T_DOUBLE and T_LONG use two slots in the interpreter
  // ld_off == LSW, ld_off+wordSize == MSW
  // st_off == MSW, next_off == LSW
  // Overwrite the unused slot with known junk
  // must be only an int (or less) so move only 32 bits to slot
  // why not sign extend??
  // Two VMRegs|OptoRegs can be T_OBJECT, T_ADDRESS, T_DOUBLE, T_LONG
  // T_DOUBLE and T_LONG use two slots in the interpreter
  // Overwrite the unused slot with known junk
  // only a float use just part of the slot
  // Overwrite the unused slot with known junk
  // Schedule the branch target address early.

  // Note: r13 contains the senderSP on entry. We must preserve it since
  // we may do a i2c -> c2i transition if we lose a race where compiled
  // code goes non-entrant while we get args ready.
  // In addition we use r13 to locate all the interpreter args as
  // we must align the stack to 16 bytes on an i2c entry else we
  // lose alignment we expect in all compiled code and register
  // save code can segv when fxsave instructions find improperly
  // aligned stack pointer.
  // Pick up the return address
  // Must preserve original SP for loading incoming arguments because
  // we need to align the outgoing SP for compiled code.
  // Cut-out for having no stack args. Since up to 2 int/oop args are passed
  // in registers, we will occasionally have no stack args.
  // Sig words on the stack are greater-than VMRegImpl::stack0. Those in
  // registers are below. By subtracting stack0, we either get a negative
  // number (all values in registers) or the maximum stack slot accessed.
  // Convert 4-byte c2 stack slots to words.
  // Round up to minimum stack alignment, in wordSize
  // Ensure compiled code always sees stack at proper alignment
  // push the return address and misalign the stack that youngest frame always sees
  // as far as the placement of the call instruction
  // Put saved SP in another register
  // Will jump to the compiled code just as if compiled code was doing it.
  // Pre-load the register-jump target early, to schedule it better.
  // Now generate the shuffle code. Pick up all register args and move the
  // rest through the floating point stack top.
  // Longs and doubles are passed in native word order, but misaligned
  // in the 32-bit build.
  // Pick up 0, 1 or 2 words from SP+offset.
         "scrambled load targets?");
  // Load in argument order going down.
  // Point to interpreter value (vs. tag)
  // Convert stack slot to an SP offset (+ wordSize to account for return address)
  // We can use r13 as a temp here because compiled code doesn't need r13 as an input
  // and if we end up going thru a c2i because of a miss a reasonable value of r13
  // We are using two optoregs. This can be either T_OBJECT, T_ADDRESS, T_LONG, or T_DOUBLE
  // the interpreter allocates two slots but only uses one for the T_LONG or T_DOUBLE case
  // So we must adjust where to pick up the data to match the interpreter.
  // Interpreter local[n] == MSW, local[n+1] == LSW however locals
  // are accessed as negative so LSW is at LOW address
  // ld_off is MSW so get LSW
  // st_off is LSW (i.e. reg.first())
  // We are using two VMRegs. This can be either T_OBJECT, T_ADDRESS, T_LONG, or T_DOUBLE
  // the interpreter allocates two slots but only uses one for the T_LONG or T_DOUBLE case
  // So we must adjust where to pick up the data to match the interpreter.
  // this can be a misaligned move
  // sign extend and use a full word?

  // 6243940 We might end up in handle_wrong_method if
  // the callee is deoptimized as we race thru here. If that
  // happens we don't want to take a safepoint because the
  // caller frame will look interpreted and arguments are now
  // "compiled" so it is much better to make this transition
  // invisible to the stack walking code. Unfortunately if
  // we try and find the callee by normal means a safepoint
  // is possible. So we stash the desired callee in the thread
  // and the VM will find it there should this case occur.
  // put methodOop where a c2i would expect should we end up there
  // only needed because of c2 resolve stubs return methodOop as a result in

// ---------------------------------------------------------------
  // -------------------------------------------------------------------------
  // Generate a C2I adapter. On entry we know rbx holds the methodOop during calls
  // to the interpreter. The args start out packed in the compiled layout. They
  // need to be unpacked into the interpreter layout. This will almost always
  // require some stack space. We grow the current (compiled) stack, then repack
  // the args. We finally end in a jump to the generic interpreter entry point.
  // On exit from the interpreter, the interpreter will restore our SP (lest the
  // compiled code, which relies solely on SP and not RBP, get sick).

  // Method might have been compiled since the call site was patched to
  // interpreted; if that is the case treat it as a miss so we can get
  // the call site corrected.

// We return the amount of VMRegImpl stack slots we need to reserve for all
// the arguments NOT counting out_preserve_stack_slots.
// NOTE: These arrays will have to change when c1 is ported
  // Allocate slots for callee to stuff register args on the stack.
  // Allocate slots for callee to stuff register args on the stack.
  // Allocate slots for callee to stuff register args on the stack.
  // windows abi requires that we always allocate enough stack space
  // for 4 64-bit registers to be stored down.

// On 64 bit we will store integer like items to the stack as
// 64 bits items (sparc abi) even though java would only store
// 32 bits for a parameter. On 32 bit it will simply be 32 bits
// So this routine will do 32->32 on 32 bit and 32->64 on 64 bit
  // Do we really have to sign extend???
  // __ movslq(src.first()->as_Register(), src.first()->as_Register());
  // Do we really have to sign extend???
  // __ movslq(dst.first()->as_Register(), src.first()->as_Register());

// An oop arg. Must pass a handle not the oop itself
  // must pass a handle. First figure out the location we use as a handle
  // See if oop is NULL; if it is we need no handle
  // Oop is already on the stack as an argument
  // conditionally move a NULL
  // Oop is in a register; we must store it to the space we reserve
  // on the stack for oop_handles and pass a handle if oop is non-NULL
  // Store oop in handle area, may be NULL
  // conditionally move a NULL from the handle area where it was just stored
  // If arg is on the stack then place it otherwise it is already in correct reg.

// A float arg may have to do float reg int reg conversion
  // The calling convention assures us that each VMregpair is either
  // all really one physical register or adjacent stack slots.
  // This greatly simplifies the cases here compared to sparc.
  // In theory these overlap but the ordering is such that this is likely a nop
  // The calling convention assures us that each VMregpair is either
  // all really one physical register or adjacent stack slots.
  // This greatly simplifies the cases here compared to sparc.
  // The calling convention assures us that each VMregpair is either
  // all really one physical register or adjacent stack slots.
  // This greatly simplifies the cases here compared to sparc.
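The handle rule for oop arguments described above (store the oop in a reserved stack slot, then pass the slot's address, or NULL when the oop itself is NULL) can be sketched in plain C++. `Oop` and `make_handle` are illustrative stand-ins and not names from this file.

```cpp
#include <cassert>

using Oop = void*;  // stand-in for a Java object reference

// Hypothetical helper: store the oop in its reserved slot and return the
// handle to pass to native code. A null oop yields a null handle, not a
// pointer to a slot that contains null.
inline Oop* make_handle(Oop oop, Oop* handle_slot) {
  *handle_slot = oop;  // store oop in handle area, may be null
  return oop == nullptr ? nullptr : handle_slot;
}
```

The indirection lets a garbage collector update the slot if the object moves while native code runs; the native side only ever holds the slot's address.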
  // In theory these overlap but the ordering is such that this is likely a nop
  // We always ignore the frame_slots arg and just use the space just below frame pointer
  // which by this time is free to use
  // We always ignore the frame_slots arg and just use the space just below frame pointer
  // which by this time is free to use
  // if map is non-NULL then the code should store the values,
  // otherwise it should load them.
  // Save down double word first
  // Save or restore single word registers
  // Value is in an input register pass we must flush it to the stack

// Check GC_locker::needs_gc and enter the runtime if it's true. This
// keeps a new JNI critical region from starting until a GC has been
// forced. Save down any oops in registers and describe them in an
  // Save down any incoming oops and call into the runtime to halt for a GC
  // Destroy argument registers

// Unpack an array argument into a pointer to the body and the length
// if the array is non-null, otherwise pass 0 for both.
  // Pass the length, ptr pair
  // Load the arg up from the stack
  // load the length relative to the body.

// ---------------------------------------------------------------------------
// Generate a native wrapper for a given method. The method takes arguments
// in the Java compiled code convention, marshals them to the native
// convention (handlizes oops, etc), transitions to native, makes the call,
// returns to java state (possibly blocking), unhandlizes any result and
  // An OopMap for lock (and class if static)
  // We have received a description of where all the java args are located
  // on entry to the wrapper. We need to convert these args to where
  // the jni function will expect them. To figure out where they go
  // we convert the java signature to a C signature by inserting
  // the hidden arguments as arg[0] and possibly arg[1] (static method)
  // Arrays are passed as int, elem* pair
  // Now figure out where the args must be stored and how much stack space
  // Compute framesize for the wrapper. We need to handlize all oops in
  // incoming registers
  // Calculate the total number of stack slots we will need.
  // First count the abi requirement plus all of the outgoing args
  // Now the space for the inbound oop handle area
  // Critical natives may have to call out so they need a save area
  // Now any space we need for handlizing a klass if static method
  // Plus a lock if needed
  // Now a place (+2) to save return values or temp during shuffling
  // + 4 for return address (which we own) and saved rbp
  // Ok The space we have allocated will look like:
  //
  //      |---------------------|
  //      | 2 slots for moves   |
  //      |---------------------|
  //      | lock box (if sync)  |
  //      |---------------------| <- lock_slot_offset
  //      | klass (if static)   |
  //      |---------------------| <- klass_slot_offset
  //      | oopHandle area      |
  //      |---------------------| <- oop_handle_offset (6 java arg registers)
  //      | outbound memory     |
  //      | based arguments     |
  //      |---------------------|
  // SP-> | out_preserved_slots |
  //
  // Now compute actual number of stack words we need rounding to make
  // stack properly aligned.
  // First thing make an ic check to see if we should even be here
  // We are free to use all registers as temps without saving them and
  // restoring them except rbp. rbp is the only callee save register
  // as far as the interpreter and the compiler(s) are concerned.
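The slot accounting behind the layout diagram above can be written out step by step. The sketch below follows the order of the comments (ABI area plus outgoing args, oop handle area for 6 argument registers, optional klass slot, optional lock box, 2 slots for moves, 4 slots for return address and saved rbp, then alignment). It omits the extra save area for critical natives, and the constants, `WrapperLayout`, and `layout_native_wrapper` are assumptions made for illustration.

```cpp
#include <cassert>

// Assumed constants; real values depend on platform and build.
constexpr int kSlotsPerWord = 2;           // 4-byte slots per 8-byte word
constexpr int kJavaArgRegisters = 6;       // from the layout diagram
constexpr int kStackAlignmentInSlots = 4;  // 16 bytes / 4-byte slots

struct WrapperLayout {
  int oop_handle_offset;
  int klass_slot_offset;  // -1 when the method is not static
  int lock_slot_offset;   // -1 when the method is not synchronized
  int stack_slots;        // total, rounded up for alignment
};

inline WrapperLayout layout_native_wrapper(int out_preserve_slots, int out_arg_slots,
                                           bool is_static, bool is_synchronized) {
  WrapperLayout l{0, -1, -1, 0};
  int slots = out_preserve_slots + out_arg_slots;  // ABI area plus outgoing args
  l.oop_handle_offset = slots;                     // inbound oop handle area
  slots += kJavaArgRegisters * kSlotsPerWord;
  if (is_static) {
    l.klass_slot_offset = slots;
    slots += kSlotsPerWord;
  }
  if (is_synchronized) {
    l.lock_slot_offset = slots;
    slots += kSlotsPerWord;
  }
  slots += 2;  // temp space for moves or a saved return value
  slots += 4;  // return address and saved rbp
  l.stack_slots = (slots + kStackAlignmentInSlots - 1) & ~(kStackAlignmentInSlots - 1);
  return l;
}
```

Each offset is recorded before its area is added, so offsets point at the low end of the area they name, matching the arrows in the diagram.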
  // Verified entry point must be aligned
  // The instruction at the verified entry point must be 5 bytes or longer
  // because it can be patched on the fly by make_non_entrant. The stack bang
  // instruction fits that requirement.
  // Generate stack overflow check
  // need a 5 byte instruction to allow MT safe patching to non-entrant
  // Generate a new frame for the wrapper.
  // -2 because return address is already present and so is saved rbp
  // Frame is now completed as far as size and linkage.
  // It is callee save so it survives the call to native

  // We immediately shuffle the arguments so that any vm call we have to
  // make from here on out (sync slow path, jvmti, etc.) we will have
  // captured the oops from our caller and have a valid oopMap for

  // -----------------
  // The Grand Shuffle
  // The Java calling convention is either equal (linux) or denser (win64) than the
  // c calling convention. However, because of the jni_env argument the c calling
  // convention always has at least one more (and two for static) arguments than Java.
  // Therefore if we move the args from java -> c backwards then we will never have
  // a register->register conflict and we don't have to build a dependency graph
  // and figure out how to break any cycles.

  // Record esp-based slot for receiver on stack for non-static methods
  // This is a trick. We double the stack slots so we can claim
  // the oops in the caller's frame. Since we are sure to have
  // more args than the caller doubling is enough to make
  // sure we can capture all the incoming oop args from the
  // Mark location of rbp (someday)
  // map->set_callee_saved(VMRegImpl::stack2reg( stack_slots - 2), stack_slots * 2, 0, vmreg(rbp));
  // Use eax, ebx as temporaries during any memory-memory moves we have to do
  // All inbound args are referenced based on rbp and all outbound args via rsp.
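The reasoning behind "The Grand Shuffle" above is that when every argument moves to a higher-numbered position, copying from the last argument to the first never overwrites a value that is still needed. The toy model below represents registers as a vector and shifts each argument up by a fixed amount. It is an illustration of the ordering argument only; `shuffle_backwards` and `shuffle_forwards` are hypothetical names, and the real code also handles stack slots and floating-point registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model: Java argument i arrives in register i and must end up in register
// i + shift, because the C signature has `shift` extra leading arguments.
// regs must have room for n_args + shift entries.
inline void shuffle_backwards(std::vector<int>& regs, std::size_t n_args,
                              std::size_t shift) {
  for (std::size_t i = n_args; i-- > 0;) {
    regs[i + shift] = regs[i];  // destination is never a still-needed source
  }
}

// The same moves in forward order, shown only to illustrate the hazard:
// an early move overwrites a source that a later move still needs.
inline void shuffle_forwards(std::vector<int>& regs, std::size_t n_args,
                             std::size_t shift) {
  for (std::size_t i = 0; i < n_args; ++i) {
    regs[i + shift] = regs[i];
  }
}
```

With three arguments and a shift of one, the backward order preserves all three values, while the forward order copies the first argument into every destination.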
  // The mapping of Java and C arguments passed in registers is
  // rotated by one, which helps when passing arguments to regular
  // Java method but for critical natives that creates a cycle which
  // can cause arguments to be killed before they are used. Break
  // the cycle by moving the first argument into a temporary
  // This may iterate in two different directions depending on the
  // kind of native it is. The reason is that for regular JNI natives
  // the incoming and outgoing registers are offset upwards and for
  // critical natives they are offset down.
  // point c_arg at the first arg that is already loaded in case we
  // need to spill before we call out

  // Pre-load a static method's oop into r14. Used both by locking code and
  // the normal JNI call code.
  // load oop into a register
  // Now handlize the static class mirror; it's known not-null.
  // Now get the handle
  // store the klass handle as second argument
  // and protect the arg if we must spill

  // Change state to native (we save the return address in the thread, since it might not
  // be pushed on the stack when we do a stack traversal). It is enough that the pc()
  // points into the right code segment. It does not have to be the correct return pc.
  // We use the same pc/oopMap repeatedly when we call out
  // We have all of the arguments setup at this point. We must not touch any register
  // argument registers at this point (what if we save/restore them there are no oop?
  // protect the args we've loaded
  // RedefineClasses() tracing support for obsolete method entry
  // protect the args we've loaded

  // Lock a synchronized method
  // Register definitions used by locking and unlocking
  // Get the handle (the 2nd argument)
  // Get address of the box
  // Load the oop from the handle
  // Load immediate 1 into swap_reg %rax
  // Load (object->mark() | 1) into swap_reg %rax
  // Save (object->mark() | 1) into BasicLock's displaced header
  // src -> dest iff dest == rax else rax <- dest
  // Hmm should this move to the slow path code area???
  // Test if the oopMark is an obvious stack pointer, i.e.,
  //  1) (mark & 3) == 0, and
  //  2) rsp <= mark < rsp + os::vm_page_size()
  // These 3 tests can be done by evaluating the following
  // expression: ((mark - rsp) & (3 - os::vm_page_size())),
  // assuming both stack pointer and pagesize have their
  // least significant 2 bits clear.
  // NOTE: the oopMark is in swap_reg %rax as the result of cmpxchg
  // Save the test result, for recursive case, the result is zero
  // Slow path will re-enter here

  // Finally just about ready to make the JNI call
  // get JNIEnv* which is first argument to native
  // Now set thread in native
  // Either restore the MXCSR register after returning from the JNI Call
  // or verify that it wasn't changed.
  // Unpack native results.
  // Result is in xmm0 we'll save as needed
    break; // can't de-handlize until after safepoint check

  // Switch thread to "native transition" state before reading the synchronization state.
  // This additional state is necessary because reading and testing the synchronization
  // state is not atomic w.r.t. GC, as this scenario demonstrates:
  //     Java thread A, in _thread_in_native state, loads _not_synchronized and is preempted.
  //     VM thread changes sync state to synchronizing and suspends threads for GC.
  //     Thread A is resumed to finish this native method, but doesn't block here since it
  //     didn't see any synchronization in progress, and escapes.
  // Force this write out before the read below
  // Write serialization page so VM thread can do a pseudo remote membar.
  // We use the current thread pointer to calculate a thread specific
  // offset to write to within the page. This minimizes bus traffic
  // due to cache line collision.
  // check for safepoint operation in progress and/or pending suspend requests
  // Don't use call_VM as it will see a possible pending exception and forward it
  // and never return here preventing us from clearing _last_native_pc down below.
  // Also can't use call_VM_leaf either as it will check to see if rsi & rdi are
  // preserved and correspond to the bcp/locals pointers. So we do a runtime call
  // Restore any method result value
  // The call above performed the transition to thread_in_Java so
  // skip the transition logic below.
  // change thread state
  // native result if any is live
  // Get locked oop from the handle we passed to jni
  // Simple recursive lock?
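The "obvious stack pointer" test in the locking fast path above folds an alignment check and a range check into one masked subtraction. The sketch below reproduces that arithmetic with unsigned integers, assuming the stack pointer has its low two bits clear and the page size is a power of two of at least 4. `is_recursive_stack_lock` is an illustrative name, not a function from this file.

```cpp
#include <cassert>
#include <cstdint>

// True when mark looks like a pointer into the page starting at sp:
//   (mark & 3) == 0  and  sp <= mark < sp + page_size.
// Assumes sp has its low two bits clear and page_size is a power of two >= 4.
inline bool is_recursive_stack_lock(std::uintptr_t mark, std::uintptr_t sp,
                                    std::uintptr_t page_size) {
  // 3 - page_size wraps around: every bit at or above log2(page_size) is set,
  // plus the low two bits. Any of those bits set in (mark - sp) fails the test.
  const std::uintptr_t mask = std::uintptr_t{3} - page_size;
  return ((mark - sp) & mask) == 0;
}
```

If `mark` is below `sp`, the subtraction wraps to a very large value whose high bits are set, so the lower bound is checked for free.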
  // Must save rax if it is live now because cmpxchg must use it
  // get address of the stack lock
  // get old displaced header
  // Atomic swap old header if oop still contains the stack lock
  // slow path re-enters here
  // Unpack oop result
  // Unexpected paths are out of line and go here
  // and forward the exception

  // Slow path locking & unlocking
  // BEGIN Slow path lock
  // has last_Java_frame setup. No exceptions so do vanilla call not call_VM
  // args are (oop obj, BasicLock* lock, JavaThread* thread)
  // protect the args we've loaded
  // Not a leaf but we have last_Java_frame setup as we want
    __ stop("no pending exception allowed on exit from monitorenter");
  // END Slow path lock

  // BEGIN Slow path unlock
  // If we haven't already saved the native result we must save it now as xmm registers
  // are still exposed.
  // Save pending exception around call to VM (which contains an EXCEPTION_MARK)
  // NOTE that obj_reg == rbx currently
    __ stop("no pending exception allowed on exit complete_monitor_unlocking_C");
  // END Slow path unlock

  // SLOW PATH Reguard the stack if needed

// ---------------------------------------------------------------------------
// Generate a dtrace nmethod for a given signature. The method takes arguments
// in the Java compiled code convention, marshals them to the native
// abi and then leaves nops at the position you would expect to call a native
// function. When the probe is enabled the nops are replaced with a trap
// instruction that dtrace inserts and the trace will cause a notification
// arguments. No other java types are allowed. Strings are converted to utf8
// strings so that from dtrace point of view java strings are converted to C
// strings. There is an arbitrary fixed limit on the total space that a method
// can use for converting the strings. (256 chars per string in the signature).
// So any java string larger than this is truncated.
  // generate_dtrace_nmethod is guarded by a mutex so we are sure to
  // be single threaded in this method.
  // Fill in the signature array, for the calling-convention call.
  // The signature we are going to use for the trap that dtrace will see
  // is converted to a two-slot long, which is why we double the allocation).
  // Skip the receiver as dtrace doesn't want to see it
  // We need to convert the java args to where a native (non-jni) function
  // would expect them. To figure out where they go we convert the java
  // signature to a C signature.
  // We convert double to long
  // We convert float to int
  // Now get the compiled-Java layout as input arguments
  // Now figure out where the args must be stored and how much stack space
  // they require (neglecting out_preserve_stack_slots but space for storing
  // the 1st six register arguments). It's weird; see int_stk_helper.
  // Calculate the total number of stack slots we will need.
  // First count the abi requirement plus all of the outgoing args
  // Now space for the string(s) we must convert
  // Plus the temps we might need to juggle register args
  // regs take two slots each
  // + 4 for return address (which we own) and saved rbp,
  // Ok The space we have allocated will look like:
  //
  //      |---------------------|
  //      |---------------------| <- string_locs[n]
  //      |---------------------| <- string_locs[n-1]
  //      |---------------------| <- string_locs[1]
  //      |---------------------| <- string_locs[0]
  //      |---------------------|
  // SP-> | out_preserved_slots |
  //
  // Now compute actual number of stack words we need rounding to make
  // stack properly aligned.
  // First thing make an ic check to see if we should even be here
  // We are free to use all registers as temps without saving them and
  // restoring them except rbp. rbp is the only callee save register
  // as far as the interpreter and the compiler(s) are concerned.
  // verified entry must be aligned for code patching.
  // and the first 5 bytes must be in the same cache line
  // if we align at 8 then we will be sure 5 bytes are in the same line
  // The instruction at the verified entry point must be 5 bytes or longer
  // because it can be patched on the fly by make_non_entrant. The stack bang
  // instruction fits that requirement.
  // Generate stack overflow check
  // need a 5 byte instruction to allow MT safe patching to non-entrant
         "valid size for make_non_entrant");
116N/A // Generate a new frame for the wrapper. 116N/A // -4 because return address is already present and so is saved rbp, 116N/A // Frame is now completed as far a size and linkage. 116N/A // State of input register args 116N/A // All args (except strings) destined for the stack are moved first 116N/A // Get the real reg value or a dummy (rsp) 116N/A // Even though a string arg in a register is still live after this loop 116N/A // after the string conversion loop (next) it will be dead so we take 116N/A // advantage of that now for simpler code to manage live. 116N/A // need to unbox a one-word value 116N/A ++
c_arg;
  // skip over T_VOID to keep the loop indices in sync
  // Convert the arg to NULL
  // This does the right thing since we know it is destined for the
  // This does the right thing since we know it is destined for the
  // If we have any strings we must store any register based arg to the stack
  // This includes any still live xmm registers too.
  // string oops were left untouched by the previous loop even if the
  // eventual (converted) arg is destined for the stack so park them
  // away now (except for first)
  // The first string arg won't be killed until after the utf8
  // Convert the xmm register to an int and store it in the reserved
  // location for the eventual c register arg
  // If the arg is an oop type we don't support don't bother to store
  // it remember string was handled above.
  ++c_arg;
  // skip over T_VOID to keep the loop indices in sync
  // Now that the volatile registers are safe, convert all the strings
  // The first string we find might still be in the original java arg
  // We will need to eventually save the final argument to the trap
  // in the non-volatile location dedicated to src. This is the offset
  // from fp we will use.
  // This is where the argument will eventually reside
  // arg is still in the original location
  // see if the oop is NULL
  // Save the ptr to utf string in the original src loc or the tmp
  // And do the conversion
  ++c_arg;
  // skip over T_VOID to keep the loop indices in sync
  // The get_utf call killed all the c_arg registers
  // Now we can finally move the register args to their desired locations
  // Only need to look for args destined for the integer registers (since we
  // Check if the java arg is unsupported and therefore useless
  // If we're going to kill an existing arg save it first
  // you can't kill yourself
  // If the arg is an oop type we don't support don't bother to store
  // full sized move even for int should be ok
  // At this point r has the original java arg in the final location
  // (assuming it wasn't useless). If the java arg was an oop
  // we have a bit more to do
  // need to unbox a one-word value
  // Convert the arg to NULL
  // dst can no longer be holding an input value
  ++c_arg;
  // skip over T_VOID to keep the loop indices in sync
  // Ok now we are done. Need to place the nop that dtrace wants in order to

// this function returns the adjustment size (in number of words) to a c2i adapter
// activation for use during deoptimization

//------------------------------generate_deopt_blob----------------------------
  // Allocate space for the code
  // Setup code generation tools
  // This code enters when returning to a de-optimized nmethod. A return
  // address has been pushed on the stack, and return values are in
  // If we are doing a normal deopt then we were called from the patched
  // nmethod from the point we returned to the nmethod. So the return
  // address on the stack is wrong by NativeCall::instruction_size
  // We will adjust the value so it looks like we have the original return
  // address on the stack (like when we eagerly deoptimized).
  // In the case of an exception pending when deoptimizing, we enter
  // with a return address on the stack that points after the call we patched
  // into the exception handler. We have the following register state from,
  // rax: exception oop
  // rbx: exception handler
  // So in this case we simply jam rdx into the useless return address and
  // the stack looks just like we want.
  // At this point we need to de-opt. We save the argument return
  // registers. We call the first C routine, fetch_unroll_info(). This
  // routine captures the return values and returns a structure which
  // describes the current frame size and the sizes of all replacement frames.
  // The current frame is compiled code and may contain many inlined
  // functions, each with their own JVM state. We pop the current frame, then
  // push all the new frames. Then we call the C routine unpack_frames() to
  // populate these frames. Finally unpack_frames() returns us the new target
  // address. Notice that callee-save registers are BLOWN here; they have
  // already been captured in the vframeArray at the time the return PC was
  // Prolog for non exception case!
  // Save everything in sight.
  // Normal deoptimization. Save exec mode for unpack_frames.
  // return address is the pc that describes which bci to re-execute at
  // No need to update map as each call to save_live_registers will produce identical oopmap
  // Prolog for exception case
  // all registers are dead at this entry point, except for rax, and
  // rdx which contain the exception oop and exception pc
  // respectively. Set them in TLS and fall thru to the
  // unpack_with_exception_in_tls entry point.
  // new implementation because exception oop is now passed in JavaThread
  // Prolog for exception case
  // All registers must be preserved because they might be used by LinearScan
  // Exception oop and throwing PC are passed in JavaThread
  // tos: stack at point of call to method that threw the exception (i.e. only
  // args are on the stack, no return address)
  // make room on stack for the return address
  // It will be patched later with the throwing pc. The correct value is not
  // available now because loading it from memory would destroy registers.
  // Save everything in sight.
  // Now it is safe to overwrite any register
  // Deopt during an exception. Save exec mode for unpack_frames.
  // load throwing pc from JavaThread and patch it as the return address
  // of the current frame. Then clear the field in JavaThread
  // verify that there is really an exception oop in JavaThread
  // verify that there is no pending exception
  // Call C code. Need thread and this frame, but NOT official VM entry
  // crud. We cannot block on this call, no GC can happen.
  // UnrollBlock* fetch_unroll_info(JavaThread* thread)
  // fetch_unroll_info needs to call last_java_frame().
  __ stop("SharedRuntime::generate_deopt_blob: last_Java_fp not cleared");
  // Need to have an oopmap that tells fetch_unroll_info where to
  // find any register it might need.
  // Load UnrollBlock* into rdi
  // QQQ this is useless it was NULL above
  // Overwrite the result registers with the exception results.
  // I think this is useless
  // Only register save data is on the stack.
  // Now restore the result registers. Everything else is either dead
  // or captured in the vframeArray.
  // All of the register save area has been popped off the stack. Only the
  // return address remains.
  // Frame picture (youngest to oldest)
  // 1: self-frame (no frame link)
  // 2: deopting frame (no frame link)
  // Note: by leaving the return address of self-frame on the stack
  // and using the size of frame 2 to adjust the stack
  // when we are done the return to frame 3 will still be on the stack.
  // Pop deoptimized frame
  // rsp should be pointing at the return address to the caller (3)
  // Stack bang to make sure there's enough room for these interpreter frames.
  // Load address of array of frame pcs into rcx
  // Load address of array of frame sizes into rsi
  // Load counter into rdx
  // Pick up the initial fp we should save
  // Now adjust the caller's stack to make up for the extra locals
  // but record the original sp so that we can save it in the skeletal interpreter
  // frame and the stack walking of interpreter_sender will get the unextended sp
  // value and not the "real" sp value.
  // Push interpreter frames in a loop
  // This value is corrected by layout_activation_impl
  // Re-push self-frame
  // Allocate a full sized register save area.
  // Return address and rbp are in place, so we allocate two less words.
  // Restore frame locals after moving the frame
  // Call C code. Need thread but NOT official VM entry
  // crud. We cannot block on this call, no GC can happen. Call should
  // restore return values to their stack-slots with the new SP.
  // void Deoptimization::unpack_frames(JavaThread* thread, int exec_mode)
  // Use rbp because the frames look interpreted now
  // Save "the_pc" since it cannot easily be retrieved using the last_java_SP after we aligned SP.
  // Don't need the precise return PC here, just precise enough to point into this code blob.
  // Revert SP alignment after call since we're going to do some SP relative addressing below
  // Set an oopmap for the call site
  // Use the same PC we used for the last java frame
  // Collect return values
  // I think this is useless (throwing pc?)
  // Jump to interpreter
  // Make sure all code is generated

//------------------------------generate_uncommon_trap_blob--------------------
  // Allocate space for the code
  // Setup code generation tools
  // Push self-frame. We get here with a return address on the
  // stack, so rsp is 8-byte aligned until we allocate our frame.
  // No callee saved registers. rbp is assumed implicitly saved
  // compiler left unloaded_class_index in j_rarg0 move to where the
  // runtime expects it.
  // Call C code. Need thread but NOT official VM entry
  // crud. We cannot block on this call, no GC can happen. Call should
  // capture callee-saved registers as well as return values.
  // Thread is in rdi already.
  // UnrollBlock* uncommon_trap(JavaThread* thread, jint unloaded_class_index);
  // Set an oopmap for the call site
  // location of rbp is known implicitly by the frame sender code
  // Load UnrollBlock* into rdi
  // Frame picture (youngest to oldest)
  // 1: self-frame (no frame link)
  // 2: deopting frame (no frame link)
  // Pop self-frame. We have no frame, and must rely only on rax and rsp.
  // Pop deoptimized frame (int)
  // rsp should be pointing at the return address to the caller (3)
  // Stack bang to make sure there's enough room for these interpreter frames.
  // Load address of array of frame pcs into rcx (address*)
  // Trash the return pc
  // Load address of array of frame sizes into rsi (intptr_t*)
  // Pick up the initial fp we should save
  // Now adjust the caller's stack to make up for the extra locals but
  // record the original sp so that we can save it in the skeletal
  // interpreter frame and the stack walking of interpreter_sender
  // will get the unextended sp value and not the "real" sp value.
  // Push interpreter frames in a loop
  // This value is corrected by layout_activation_impl
  // Re-push self-frame
  // Use rbp because the frames look interpreted now
  // Save "the_pc" since it cannot easily be retrieved using the last_java_SP after we aligned SP.
  // Don't need the precise return PC here, just precise enough to point into this code blob.
  // Call C code. Need thread but NOT official VM entry
  // crud. We cannot block on this call, no GC can happen. Call should
  // restore return values to their stack-slots with the new SP.
  // Thread is in rdi already.
  // BasicType unpack_frames(JavaThread* thread, int exec_mode);
  // Set an oopmap for the call site
  // Use the same PC we used for the last java frame
  // Jump to interpreter
  // Make sure all code is generated

//------------------------------generate_handler_blob------
// Generate a special Compile2Runtime blob that saves all registers,
         "must be generated before");
  // Allocate space for the code. Setup code generation tools.
  // Make room for return address (or push it again)
  // Save registers, fpu state, and flags
  // The following is basically a call_VM. However, we need the precise
  // address of the call in order to generate an oopmap. Hence, we do all the
  // The return address must always be correct so that frame constructor never
  // sees an invalid pc.
  // overwrite the dummy value we pushed on entry
  // Set an oopmap for the call site. This oopmap will map all
  // oop-registers and debug-info registers as callee-saved. This
  // will allow deoptimization at this safepoint to find all possible
  // debug-info recordings, as well as let GC find all oops.
  // Exception pending
  // No exception case
  // Normal exit, restore registers and exit.
  // Make sure all code is generated
  // Fill-out other meta info

// Generate a stub that calls into vm to find out the proper destination
// of a java call. All the argument registers are live at this point
// but since this is generic code we don't know what they are and the caller
// must do any gc of the args.
  // allocate space for the code
  // Set an oopmap for the call site.
  // We need this not only for callee-saved registers, but also for volatile
  // registers that the compiler might be keeping live across a safepoint.
  // rax contains the address we are going to jump to assuming no exception got installed
  // clear last_Java_sp
  // check for pending exceptions
  // get the returned methodOop
  // We are back to the original state on entry and ready to go.
  // Pending exception after the safepoint
  // exception pending => remove activation and forward to exception handler
  // make sure all code is generated
  // frame_size_words or bytes??
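Both the deoptimization blob and the uncommon-trap blob above walk an array of frame pcs and an array of frame sizes and "push interpreter frames in a loop". The following is a hedged model of that loop in plain C++, not the generated assembly. It assumes each frame size counts whole words and includes the return-pc and saved-fp slots, which is consistent with the comment that two fewer words are allocated once those are in place. `StackModel` and `push_skeletal_frames` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a downward-growing stack, indexed in words.
struct StackModel {
  std::vector<std::intptr_t> words;  // backing store
  std::size_t sp;                    // index of the current top of stack
  std::size_t fp;                    // index of the current saved-fp slot

  explicit StackModel(std::size_t n) : words(n, 0), sp(n), fp(n) {}

  void push(std::intptr_t v) { words[--sp] = v; }
};

// Push one skeletal frame per array entry.
// Assumption: frame_words[i] >= 2 and includes the pc and fp slots.
inline void push_skeletal_frames(StackModel& s,
                                 const std::vector<std::intptr_t>& frame_pcs,
                                 const std::vector<std::size_t>& frame_words) {
  for (std::size_t i = 0; i < frame_words.size(); ++i) {
    s.push(frame_pcs[i]);                      // return address for this frame
    s.push(static_cast<std::intptr_t>(s.fp));  // link to the previous fp
    s.fp = s.sp;                               // establish the new fp
    s.sp -= frame_words[i] - 2;                // rest of the frame body
  }
}
```

The frames are left unpopulated on purpose: in the source, filling them in is the job of the later `unpack_frames()` call.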

//------------------------------generate_exception_blob---------------------------
// creates exception blob at the end
// Using exception blob, this code is jumped from a compiled method.
// (see emit_exception_handler in x86_64.ad file)
// Given an exception pc at a call we call into the runtime for the
// handler in this method. This handler might merely restore state
// (i.e. callee save registers) unwind the frame and jump to the
// exception handler for the nmethod if there is no Java level handler
// This code is entered with a jmp.
//   rax: exception oop
//   rax: exception oop
//   rdx: exception pc in caller or ???
//   destination: exception handler of caller
// Note: the exception pc MUST be at a call (precise debug information)
//       Registers rax, rdx, rcx, rsi, rdi, r8-r11 are not callee saved.
  // Allocate space for the code
  // Setup code generation tools
  // Exception pc is 'return address' for stack walker
  // Save callee-saved registers. See x86_64.ad.
  // rbp is an implicitly saved callee saved register (i.e. the calling
  // there are no callee save registers now that adapter frames are gone.
  // Store exception in Thread object. We cannot pass any arguments to the
  // handle_exception call, since we do not want to make any assumption
  // about the size of the frame where the exception happened in.
  // c_rarg0 is either rdi (Linux) or rcx (Windows).
  // This call does all the hard work. It checks if an exception handler
  // exists in the method.
  // If so, it returns the handler address.
  // If not, it prepares for stack-unwinding, restoring the callee-save
  // registers of the frame being removed.
  // address OptoRuntime::handle_exception_C(JavaThread* thread)
  // Set an oopmap for the call site. This oopmap will only be used if we
  // are unwinding the stack. Hence, all locations will be dead.
  // Callee-saved registers will be the same as the frame above (i.e.,
  // handle_exception_stub), since they were restored when we got the
  // Restore callee-saved registers
  // rbp is an implicitly saved callee saved register (i.e. the calling
  // there are no callee save registers now that adapter frames are gone.
  // rax: exception handler
  // Restore SP from BP if the exception PC is a MethodHandle call site.
  // We have a handler in rax (could be deopt blob).
  // Get the exception oop
  // Get the exception pc in case we are deoptimized
  // Clear the exception oop so GC no longer processes it as a root.
  // rax: exception oop
  // r8: exception handler
  // rdx: exception pc
  // Make sure all code is generated
  // Set exception blob
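The hand-off described in the exception blob comments (store the exception in the thread, call the lookup with no other arguments, reload the oop and pc, then clear the oop) can be modeled as below. This is a sketch of the data flow only, under the assumption that the lookup may rewrite the stored pc, as the "in case we are deoptimized" comment suggests. `ThreadModel`, `Dispatch`, `Lookup`, and `dispatch_exception` are hypothetical names, not HotSpot types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-thread exception fields.
struct ThreadModel {
  std::uintptr_t exception_oop = 0;
  std::uintptr_t exception_pc = 0;
};

// What the blob holds in registers when it jumps to the handler.
struct Dispatch {
  std::uintptr_t oop;      // rax in the source comments
  std::uintptr_t pc;       // rdx in the source comments
  std::uintptr_t handler;  // r8 in the source comments
};

// The lookup receives only the thread, mirroring handle_exception_C.
using Lookup = std::uintptr_t (*)(ThreadModel&);

inline Dispatch dispatch_exception(ThreadModel& t, std::uintptr_t oop,
                                   std::uintptr_t pc, Lookup find_handler) {
  t.exception_oop = oop;  // pass state through the thread, not as arguments
  t.exception_pc = pc;
  std::uintptr_t handler = find_handler(t);
  // Reload both values: the lookup may have changed what is stored.
  Dispatch d{t.exception_oop, t.exception_pc, handler};
  t.exception_oop = 0;    // cleared so it is no longer treated as a GC root
  return d;
}
```

Passing state through the thread keeps the lookup independent of the size of the frame in which the exception happened, which is the reason the source gives for not passing arguments.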