sfmmu_asm.s revision 9d0d62ad2e60e8f742a2e723d06e88352ee6a1f3
 * Assumes TSBE_INTHI is 0
#error "TSB_UPDATE and TSB_INVALIDATE assume TSBE_TAG = 0"
#error "TSB_UPDATE and TSB_INVALIDATE assume TSBTAG_INTHI = 0"

 * The following code assumes the tsb is not split.
 *
 * With TSBs no longer shared between processes, it's no longer
 * necessary to hash the context bits into the tsb index to get
 * tsb coloring; the new implementation treats the TSB as a
 * direct-mapped, virtually-addressed cache.
 *
 * vpshift = virtual page shift; e.g. 13 for 8K TTEs (constant or ro)
 * tsbbase = base address of TSB (clobbered)
 * tagacc  = tag access register (clobbered)
 * szc     = size code of TSB (ro)
 * tsbbase = pointer to entry in TSB

 * When the kpm TSB is used it is assumed that it is direct mapped
 * using (vaddr>>vpshift)%tsbsz as the index.
 *
 * Note that, for now, the kpm TSB and kernel TSB are the same for
 * each mapping size.  However that need not always be the case.  If
 * the trap handlers are updated to search a different TSB for kpm
 * addresses than for kernel addresses then kpm_tsbbase and kpm_tsbsz
 * would need to be set up independently of the kernel TSB values.
 *
 * vpshift = virtual page shift; e.g. 13 for 8K TTEs (constant or ro)
 * vaddr = virtual address (clobbered)
 * tsbp, szc, tmp = scratch
 * tsbp = pointer to entry in TSB

        bne,pn  %icc, 1f                /* branch if large case */      ;\
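 * Illustrative note (not from the original source): the TSB entry pointer
 * computation described above is a direct-mapped, virtually-indexed lookup.
 * Assuming 16-byte TSB entries and a size code szc that yields (512 << szc)
 * entries, a C sketch of the same arithmetic is:
 *
 *	#include <stdint.h>
 *
 *	uint64_t
 *	tsb_entry_addr(uint64_t tsbbase, uint64_t vaddr, int vpshift, int szc)
 *	{
 *		uint64_t nentries = 512ULL << szc;	/* assumed TSB sizing */
 *		uint64_t idx = (vaddr >> vpshift) & (nentries - 1);
 *		return (tsbbase + (idx << 4));		/* 16 bytes per TSBE */
 *	}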
 * Lock the TSBE at virtual address tsbep.
 *
 * tmp1, tmp2 = scratch registers (clobbered)
 * label = label to jump to if we fail to lock the tsb entry
 * %asi = ASI to use for TSB access
 *
 * NOTE that we flush the TSB using fast VIS instructions that
 * set all 1's in the TSB tag, so TSBTAG_LOCKED|TSBTAG_INVALID must
 * not be treated as a locked entry or we'll get stuck spinning on
 * an entry that isn't locked but really invalid.

        /* tsbe lock acquired */                                         ;\

        /* tsbe lock acquired */                                         ;\
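 * Illustrative sketch (not from the original source; the bit values are
 * assumptions): the lock is a compare-and-swap on the high word of the TSB
 * tag, and -- as the note above requires -- an entry with both LOCKED and
 * INVALID set (the all-1's pattern left by the VIS flush) is treated as
 * free rather than as locked:
 *
 *	#include <stdint.h>
 *
 *	#define	TSBTAG_LOCKED_BIT	0x40000000u	/* assumed */
 *	#define	TSBTAG_INVALID_BIT	0x80000000u	/* assumed */
 *
 *	int
 *	tsb_lock_entry(volatile uint32_t *inthi)
 *	{
 *		uint32_t old = *inthi;
 *		if ((old & (TSBTAG_LOCKED_BIT | TSBTAG_INVALID_BIT)) ==
 *		    TSBTAG_LOCKED_BIT)
 *			return (0);	/* genuinely locked: caller branches to label */
 *		return (__sync_bool_compare_and_swap(inthi, old,
 *		    old | TSBTAG_LOCKED_BIT));
 *	}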
 * Atomically write TSBE at virtual address tsbep.
 *
 * tagtarget = TSBE tag (ro)
 * %asi = ASI to use for TSB access

 * Load an entry into the TSB at TL > 0.
 *
 * tsbep = pointer to the TSBE to load as va (ro)
 * tte = value of the TTE retrieved and loaded (wo)
 * tagtarget = tag target register.  To get TSBE tag to load,
 *   we need to mask off the context and leave only the va (clobbered)
 * tmp1, tmp2 = scratch registers
 * label = label to jump to if we fail to lock the tsb entry
 * %asi = ASI to use for TSB access

 * I don't need to update the TSB then check for the valid tte.         ;\
 * TSB invalidate will spin till the entry is unlocked.  Note,          ;\
 * we always invalidate the hash table before we unload the TSB.        ;\

 * I don't need to update the TSB then check for the valid tte.         ;\
 * TSB invalidate will spin till the entry is unlocked.  Note,          ;\
 * we always invalidate the hash table before we unload the TSB.        ;\

 * Load a 32M/256M Panther TSB entry into the TSB at TL > 0,
 *
 * tsbep = pointer to the TSBE to load as va (ro)
 * tte = 4M pfn offset (in), value of the TTE retrieved and loaded (out)
 *   with exec_perm turned off and exec_synth turned on
 * tagtarget = tag target register.  To get TSBE tag to load,
 *   we need to mask off the context and leave only the va (clobbered)
 * tmp1, tmp2 = scratch registers
 * label = label to use for branch (text)
 * %asi = ASI to use for TSB access

 * I don't need to update the TSB then check for the valid tte.         ;\
 * TSB invalidate will spin till the entry is unlocked.  Note,          ;\
 * we always invalidate the hash table before we unload the TSB.        ;\

 * Or in 4M pfn offset to TTE and set the exec_perm bit to 0            ;\
 * and exec_synth bit to 1.                                             ;\

 * Build a 4M pfn offset for a Panther 32M/256M page, for ITLB synthesis.
 *
 * tte = value of the TTE, used to get tte_size bits (ro)
 * tagaccess = tag access register, used to get 4M pfn bits (ro)
 * pfn = 4M pfn bits shifted to offset for tte (out)
 * tmp1 = scratch register
 * label = label to use for branch (text)

 * Get 4M bits from tagaccess for 32M, 256M pagesizes.                  ;\
 * Return them, shifted, in pfn.                                        ;\

 * Add 4M TTE size code to a tte for a Panther 32M/256M page,
 *
 * tte = value of the TTE, used to get tte_size bits (rw)
 * tmp1 = scratch register

 * Set 4M pagesize tte bits.                                            ;\

 * Load an entry into the TSB at TL=0.
 *
 * tsbep = pointer to the TSBE to load as va (ro)
 * tteva = pointer to the TTE to load as va (ro)
 * tagtarget = TSBE tag to load (which contains no context), synthesized
 *   to match va of MMU tag target register only (ro)
 * tmp1, tmp2 = scratch registers (clobbered)
 * label = label to use for branches (text)
 * %asi = ASI to use for TSB access

        /* can't rd tteva after locking tsb because it can tlb miss */  ;\

        /* can't rd tteva after locking tsb because it can tlb miss */  ;\
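 * Illustrative sketch (not from the original source; field names assumed):
 * the ordering that keeps a TSB entry consistent for concurrent readers is
 * "read the TTE before locking (the read can TLB miss), lock/invalidate the
 * tag, write the data word, then write the tag last so a single store both
 * unlocks and validates the entry":
 *
 *	#include <stdint.h>
 *
 *	struct tsbe {
 *		volatile uint64_t tte_tag;
 *		volatile uint64_t tte_data;
 *	};
 *
 *	void
 *	tsb_update(struct tsbe *tsbep, uint64_t tagtarget, const uint64_t *tteva)
 *	{
 *		uint64_t tte = *tteva;		/* read first: may TLB miss */
 *		/* ... lock the entry, see the TSB_LOCK_ENTRY sketch above ... */
 *		tsbep->tte_data = tte;		/* data word first */
 *		__sync_synchronize();		/* order data before tag */
 *		tsbep->tte_tag = tagtarget;	/* valid tag last: unlock */
 *	}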
 * Invalidate a TSB entry in the TSB.
 *
 * NOTE: TSBE_TAG is assumed to be zero.  There is a compile time check
 * about this earlier to ensure this is true.  Thus when we are
 * directly referencing tsbep below, we are referencing the tte_tag
 * field of the TSBE.  If this offset ever changes, the code below
 * will need to be modified.
 *
 * tsbep = pointer to TSBE as va (ro)
 * tag = invalidation is done if this matches the TSBE tag (ro)
 * tmp1 - tmp3 = scratch registers (clobbered)
 * label = label name to use for branches (text)
 * %asi = ASI to use for TSB access

 * An implementation of setx which will be hot patched at run time.
 * Since it is being hot patched, there is no value passed in.
 * Thus, essentially we are implementing
 *	setx	value, tmp, dest
 * where value is RUNTIME_PATCH (aka 0) in this case.

        nop     /* for perf reasons */                                  ;\
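 * Illustrative sketch (not from the original source; helper and constant
 * names are assumptions): the invalidate spins while the entry is locked,
 * then atomically swaps a matching tag for the invalid pattern so that a
 * concurrent update is never torn:
 *
 *	#include <stdint.h>
 *
 *	void
 *	tsb_invalidate(volatile uint64_t *tte_tag, uint64_t tag)
 *	{
 *		for (;;) {
 *			uint64_t cur = *tte_tag;
 *			if (TAG_IS_LOCKED(cur))		/* assumed predicate */
 *				continue;		/* spin till unlocked */
 *			if (cur != tag)
 *				break;			/* not our entry anymore */
 *			if (__sync_bool_compare_and_swap(tte_tag, cur,
 *			    TSBTAG_INVALID_PATTERN))	/* assumed constant */
 *				break;			/* invalidated */
 *		}
 *	}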
 * sfmmu related subroutines

 * Use cas, if tte has changed underneath us then reread and try again.
 * In the case of a retry, it will update sttep with the new original.

 * Use cas, if tte has changed underneath us then return 1, else return 0

        .asciz  "sfmmu_asm: interrupts already disabled"
        .asciz  "sfmmu_asm: sfmmu_vatopfn called for user"
        .asciz  "sfmmu_asm: 4M tsb pointer mis-match"
        .asciz  "sfmmu_asm: no unlocked TTEs in TLB 0"
        .asciz  "sfmmu_asm: interrupts not disabled"
        .asciz  "sfmmu_asm: kernel as"
        .asciz  "sfmmu_asm: gnum is zero"
        .asciz  "sfmmu_asm: cnum is greater than MAX_SFMMU_CTX_VAL"
        .asciz  "sfmmu_asm: valid SCD with no 3rd scd TSB"
asciz "sfmmu_asm: ktsb_phys must not be 0 on a sun4v platform" * This routine is called both by resume() and sfmmu_get_ctx() to * allocate a new context for the process on a MMU. * if allocflag == 1, then alloc ctx when HAT mmu cnum == INVALID . * if allocflag == 0, then do not alloc ctx if HAT mmu cnum == INVALID, which * is the case when sfmmu_alloc_ctx is called from resume(). * The caller must disable interrupts before entering this routine. * To reduce ctx switch overhead, the code contains both 'fast path' and * 'slow path' code. The fast path code covers the common case where only * a quick check is needed and the real ctx allocation is not required. * It can be done without holding the per-process (PP) lock. * The 'slow path' code must be protected by the PP Lock and performs ctx * Hardware context register and HAT mmu cnum are updated accordingly. * ret - 0: no ctx is allocated * %g5 = sfmmu gnum returned * %g6 = sfmmu cnum returned * %g2 = &sfmmu_ctxs[mmuid] - SFMMU_CTXS * Fast path code, do a quick check. * Grab per process (PP) sfmmu_ctx_lock spinlock, * followed by the 'slow path' code. * %g5 = sfmmu gnum returned * %g6 = sfmmu cnum returned * %g2 = &sfmmu_ctxs[mmuid] - SFMMU_CTXS * We get here if we do not have a valid context, or * the HAT gnum does not match global gnum. We hold * sfmmu_ctx_lock spinlock. Allocate that context. * %g2 = &sfmmu_ctx_t[mmuid] - SFMMU_CTXS; * %o1 = mmu current cnum value (used as new cnum) * cnum reachs max, bail, so wrap around can be performed later. * program the secondary context register * When we come here and context is invalid, we want to set both * private and shared ctx regs to INVALID. In order to * do so, we set the sfmmu priv/shared flag to 'private' regardless * so that private ctx reg will be set to invalid. * Note that on sun4v values written to private context register are * automatically written to corresponding shared context register as * well. On sun4u SET_SECCTX() will invalidate shared context register * when it sets a private secondary context register. cmp %
g2, %
g3 /* is modified = current? */ be,a,
pt %
xcc,
1f /* yes, don't write */ stx %
g3, [%
o0]
/* update new original */ be,
pt %
xcc,
1f /* cas succeeded - return */ stx %
g3, [%
o0]
/* save as new original */ cmp %
g3, %
g2 /* is modified = current? */ be,a,
pn %
xcc,
1f /* yes, don't write */ mov 0, %
o1 /* as if cas failed. */ stx %
g2, [%
o0]
/* report "current" value */ * Calculate a TSB entry pointer for the given TSB, va, pagesize. * %o0 = TSB base address (in), pointer to TSB entry (out) * %o3 = tsb size code (in) * Return a TSB tag for the given va. * %o0 = va shifted to be in tsb tag format (with no context) (out) * %o0 = start of patch area * %o1 = size code of TSB to patch sub %
o3, %
o1, %
o3 /* decrease shift by tsb szc */ sub %
o3, %
o1, %
o3 /* decrease shift by tsb szc */ * %o0 = start of patch area * %o5 = kernel virtual or physical tsb base address * %o2, %o3 are used as scratch registers. /* fixup sethi instruction */ * %o0 = start of patch area * %o4 = 64 bit value to patch * %o2, %o3 are used as scratch registers. * Note: Assuming that all parts of the instructions which need to be * patched correspond to RUNTIME_PATCH (aka 0) * Note the implementation of setx which is being patched is as follows: * sethi %hh(RUNTIME_PATCH), tmp * sethi %lm(RUNTIME_PATCH), dest * or tmp, %hm(RUNTIME_PATCH), tmp * or dest, %lo(RUNTIME_PATCH), dest * which differs from the implementation in the * "SPARC Architecture Manual" /* fixup sethi instruction */ /* fixup sethi instruction */ /* fixup or instruction */ /* fixup or instruction */ * %o0 = start of patch area * %o4 = 32 bit value to patch * %o2, %o3 are used as scratch registers. * Note: Assuming that all parts of the instructions which need to be * patched correspond to RUNTIME_PATCH (aka 0) * %o0 = start of patch area * %o4 = signed int immediate value to add to sllx/srlx imm field * %o2, %o3 are used as scratch registers. * sllx/srlx store the 6 bit immediate value in the lowest order bits * so we do a simple add. The caller must be careful to prevent * overflow, which could easily occur if the initial value is nonzero! * Patch imm_asi of all ldda instructions in the MMU * trap handlers. We search MMU_PATCH_INSTR instructions * starting from the itlb miss handler (trap 0x64). * %o0 = address of tt[0,1]_itlbmiss * %o1 = imm_asi to setup, shifted by appropriate offset. * %o3 = number of instructions to search * %o4 = reserved by caller: called from leaf routine SET_SIZE(sfmmu_fixup_mmu_asi) * Patch immediate ASI used to access the TSB in the * inputs: %o0 = value of ktsb_phys ENTRY_NP(sfmmu_patch_mmu_asi) mov %o7, %o4 ! save return pc in %o4 mov ASI_QUAD_LDD_PHYS, %o3 ! set QUAD_LDD_PHYS by default * Check ktsb_phys. It must be non-zero for sun4v, panic if not. sethi %hi(sfmmu_panic11), %o0 or %o0, %lo(sfmmu_panic11), %o0 * Some non-sun4v platforms deploy virtual ktsb (ktsb_phys==0). movrz %o0, ASI_NQUAD_LD, %o3 sll %o3, 5, %o1 ! imm_asi offset mov 6, %o3 ! number of instructions sethi %hi(dktsb), %o0 ! to search call sfmmu_fixup_mmu_asi ! patch kdtlb miss mov 6, %o3 ! number of instructions sethi %hi(dktsb4m), %o0 ! to search call sfmmu_fixup_mmu_asi ! patch kdtlb4m miss or %o0, %lo(dktsb4m), %o0 mov 6, %o3 ! number of instructions sethi %hi(iktsb), %o0 ! to search call sfmmu_fixup_mmu_asi ! patch kitlb miss mov 6, %o3 ! number of instructions sethi %hi(iktsb4m), %o0 ! to search call sfmmu_fixup_mmu_asi ! patch kitlb4m miss or %o0, %lo(iktsb4m), %o0 mov %o4, %o7 ! retore return pc -- leaf SET_SIZE(sfmmu_patch_mmu_asi) ENTRY_NP(sfmmu_patch_ktsb) * We need to fix iktsb, dktsb, et. al. save %sp, -SA(MINFRAME), %sp sethi %hi(ktsb_szcode), %o1 ld [%o1 + %lo(ktsb_szcode)], %o1 /* %o1 = ktsb size code */ call sfmmu_fix_ktlb_traptable call sfmmu_fix_ktlb_traptable sethi %hi(ktsb4m_szcode), %o1 ld [%o1 + %lo(ktsb4m_szcode)], %o1 /* %o1 = ktsb4m size code */ call sfmmu_fix_ktlb_traptable or %o0, %lo(iktsb4m), %o0 call sfmmu_fix_ktlb_traptable or %o0, %lo(dktsb4m), %o0 movrnz %o4, ASI_MEM, %o2 ! setup kernel 32bit ASI to patch mov %o2, %o4 ! sfmmu_fixup_or needs this in %o4 sethi %hi(tsb_kernel_patch_asi), %o0 or %o0, %lo(tsb_kernel_patch_asi), %o0 ldx [%o5], %o4 ! load ktsb base addr (VA or PA) sethi %hi(dktsbbase), %o0 call sfmmu_fixup_setx ! 
patch value of ktsb base addr or %o0, %lo(dktsbbase), %o0 sethi %hi(iktsbbase), %o0 call sfmmu_fixup_setx ! patch value of ktsb base addr or %o0, %lo(iktsbbase), %o0 sethi %hi(sfmmu_kprot_patch_ktsb_base), %o0 call sfmmu_fixup_setx ! patch value of ktsb base addr or %o0, %lo(sfmmu_kprot_patch_ktsb_base), %o0 sethi %hi(sfmmu_dslow_patch_ktsb_base), %o0 call sfmmu_fixup_setx ! patch value of ktsb base addr or %o0, %lo(sfmmu_dslow_patch_ktsb_base), %o0 ldx [%l1], %o4 ! load ktsb4m base addr (VA or PA) sethi %hi(dktsb4mbase), %o0 call sfmmu_fixup_setx ! patch value of ktsb4m base addr or %o0, %lo(dktsb4mbase), %o0 sethi %hi(iktsb4mbase), %o0 call sfmmu_fixup_setx ! patch value of ktsb4m base addr or %o0, %lo(iktsb4mbase), %o0 sethi %hi(sfmmu_kprot_patch_ktsb4m_base), %o0 call sfmmu_fixup_setx ! patch value of ktsb4m base addr or %o0, %lo(sfmmu_kprot_patch_ktsb4m_base), %o0 sethi %hi(sfmmu_dslow_patch_ktsb4m_base), %o0 call sfmmu_fixup_setx ! patch value of ktsb4m base addr or %o0, %lo(sfmmu_dslow_patch_ktsb4m_base), %o0 sethi %hi(sfmmu_kprot_patch_ktsb_szcode), %o0 call sfmmu_fixup_or ! patch value of ktsb_szcode or %o0, %lo(sfmmu_kprot_patch_ktsb_szcode), %o0 sethi %hi(sfmmu_dslow_patch_ktsb_szcode), %o0 call sfmmu_fixup_or ! patch value of ktsb_szcode or %o0, %lo(sfmmu_dslow_patch_ktsb_szcode), %o0 sethi %hi(sfmmu_kprot_patch_ktsb4m_szcode), %o0 call sfmmu_fixup_or ! patch value of ktsb4m_szcode or %o0, %lo(sfmmu_kprot_patch_ktsb4m_szcode), %o0 sethi %hi(sfmmu_dslow_patch_ktsb4m_szcode), %o0 call sfmmu_fixup_or ! patch value of ktsb4m_szcode or %o0, %lo(sfmmu_dslow_patch_ktsb4m_szcode), %o0 SET_SIZE(sfmmu_patch_ktsb) ENTRY_NP(sfmmu_kpm_patch_tlbm) * Fixup trap handlers in common segkpm case. This is reserved * for future use should kpm TSB be changed to be other than the SET_SIZE(sfmmu_kpm_patch_tlbm) ENTRY_NP(sfmmu_kpm_patch_tsbm) * nop the branch to sfmmu_kpm_dtsb_miss_small * in the case where we are using large pages for * seg_kpm (and hence must probe the second TSB for set dktsb4m_kpmcheck_small, %o0 SET_SIZE(sfmmu_kpm_patch_tsbm) ENTRY_NP(sfmmu_patch_utsb) * We need to hot patch utsb_vabase and utsb4m_vabase save %sp, -SA(MINFRAME), %sp /* patch value of utsb_vabase */ sethi %hi(sfmmu_uprot_get_1st_tsbe_ptr), %o0 or %o0, %lo(sfmmu_uprot_get_1st_tsbe_ptr), %o0 sethi %hi(sfmmu_uitlb_get_1st_tsbe_ptr), %o0 or %o0, %lo(sfmmu_uitlb_get_1st_tsbe_ptr), %o0 sethi %hi(sfmmu_udtlb_get_1st_tsbe_ptr), %o0 or %o0, %lo(sfmmu_udtlb_get_1st_tsbe_ptr), %o0 /* patch value of utsb4m_vabase */ sethi %hi(sfmmu_uprot_get_2nd_tsb_base), %o0 or %o0, %lo(sfmmu_uprot_get_2nd_tsb_base), %o0 sethi %hi(sfmmu_uitlb_get_2nd_tsb_base), %o0 or %o0, %lo(sfmmu_uitlb_get_2nd_tsb_base), %o0 sethi %hi(sfmmu_udtlb_get_2nd_tsb_base), %o0 or %o0, %lo(sfmmu_udtlb_get_2nd_tsb_base), %o0 * Patch TSB base register masks and shifts if needed. * By default the TSB base register contents are set up for 4M slab. /* patch reserved VA range size if needed. */ /* patch TSBREG_VAMASK used to set up TSB base register */ * Routine that loads an entry into a tsb using virtual addresses. * Locking is required since all cpus can use the same TSB. * Note that it is no longer required to have a valid context * when calling this function. * %o0 = pointer to tsbe to load * %o2 = virtual pointer to TTE * %o3 = 1 if physical address in %o0 else 0 * Flush TSB of a given entry if the tag matches. * %o0 = pointer to tsbe to be flushed * %o2 = 1 if physical address in %o0 else 0 * Routine that loads a TTE into the kpm TSB from C code. 
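 * Illustrative sketch (not from the original source; the patch helpers are
 * assumptions, and the real routine also flushes the patched I-cache lines):
 * the hot-patched setx sequence described above splits a 64-bit value into
 * the standard SPARC V9 %hh/%hm/%lm/%lo fields and writes them into the
 * immediate fields of the two sethi and two or instructions:
 *
 *	#include <stdint.h>
 *
 *	static void
 *	patch_imm22(uint32_t *instr, uint32_t val)	/* sethi imm22 field */
 *	{
 *		*instr = (*instr & ~0x3fffffu) | (val & 0x3fffffu);
 *	}
 *
 *	static void
 *	patch_simm13(uint32_t *instr, uint32_t val)	/* or simm13 field */
 *	{
 *		*instr = (*instr & ~0x1fffu) | (val & 0x1fffu);
 *	}
 *
 *	void
 *	fixup_setx(uint32_t *patch_area, uint64_t v)
 *	{
 *		uint32_t hh = (uint32_t)(v >> 42);		/* bits 63..42 */
 *		uint32_t hm = (uint32_t)(v >> 32) & 0x3ff;	/* bits 41..32 */
 *		uint32_t lm = (uint32_t)(v >> 10) & 0x3fffff;	/* bits 31..10 */
 *		uint32_t lo = (uint32_t)(v & 0x3ff);		/* bits  9..0  */
 *
 *		patch_imm22(&patch_area[0], hh);   /* sethi %hh(v), tmp    */
 *		patch_imm22(&patch_area[1], lm);   /* sethi %lm(v), dest   */
 *		patch_simm13(&patch_area[2], hm);  /* or tmp, %hm(v), tmp  */
 *		patch_simm13(&patch_area[3], lo);  /* or dest, %lo(v), dest */
 *	}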
 * Routine that loads a TTE into the kpm TSB from C code.
 * Locking is required since kpm TSB is shared among all CPUs.
 *
 * %o2 = virtpg to TSB index shift (e.g. TTE pagesize shift)

        /* GET_KPM_TSBE_POINTER(vpshift, tsbp, vaddr (clobbers), tmp1, tmp2) */
        /* %g2 = tsbep, %g1 clobbered */
        /* TSB_UPDATE(tsbep, tteva, tagtarget, tmp1, tmp2, label) */

 * Routine that shoots down a TTE in the kpm TSB or in the
 * kernel TSB depending on virtpg.  Locking is required since
 * the kpm TSB is shared among all CPUs.
 *
 * %o1 = virtpg to TSB index shift (e.g. TTE page shift)

        /* GET_KPM_TSBE_POINTER(vpshift, tsbp, vaddr (clobbers), tmp1, tmp2) */
        /* %g2 = tsbep, %g1 clobbered */
        /* TSB_INVALIDATE(tsbep, tag, tmp1, tmp2, tmp3, label) */

 * These macros are used to update global sfmmu hme hash statistics
 * in perf critical paths.  They are only enabled in debug kernels or
 * if SFMMU_STAT_GATHER is defined.

#else /* DEBUG || SFMMU_STAT_GATHER */
#endif /* DEBUG || SFMMU_STAT_GATHER */

 * This macro is used to update global sfmmu kstats in non
 * perf critical areas so they are enabled all the time.

 * These macros are used to update per cpu stats in non perf
 * critical areas so they are enabled all the time.

 * These macros are used to update per cpu stats in non perf
 * critical areas so they are enabled all the time.

 * Count kpm dtlb misses separately to allow a different
 * evaluation of hme and kpm tlbmisses.  kpm tsb hits can
 * be calculated by (kpm_dtlb_misses - kpm_tsb_misses).

#endif /* KPM_TLBMISS_STATS_GATHER */

 * The following routines are jumped to from the mmu trap handlers to do
 * the setting up to call systrap.  They are separate routines instead of
 * being part of the handlers because the handlers would exceed 32
 * instructions and since this is part of the slow path the jump
 * cost is irrelevant.

#endif /* PTL1_PANIC_DEBUG */

        /* check if we want to test the tl1 panic */

#endif /* PTL1_PANIC_DEBUG */

        /* g1 = TL0 handler, g2 = tagacc, g3 = trap type */

 * No %g registers in use at this point.

        /* We assume previous %gl was 1 */

        /* user miss at tl>1. better be the window handler or user_rtt */
        /* user miss at tl>1. better be the window handler */
        /* tpc should be in the trap table */

 * some wbuf handlers will call systrap to resolve the fault
 * we pass the trap type so they figure out the correct parameters.
 * g5 = trap type, g6 = tag access reg

 * only use g5, g6, g7 registers after we have switched to alternate
 * globals.

 * We have accessed an unmapped segkpm address or a legal segkpm
 * address which is involved in a VAC alias conflict prevention.
 * Before we go to trap(), check to see if CPU_DTRACE_NOFAULT is
 * set.  If it is, we will instead note that a fault has occurred
 * by setting CPU_DTRACE_BADADDR and issue a "done" (instead of
 * a "retry").  This will step over the faulting instruction.
 * Note that this means that a legal segkpm address involved in
 * a VAC alias conflict prevention (a rare case to begin with)
 * cannot be used in DTrace.
 *
 * g2=tagacc g3.l=type g3.h=0

 * Copies ism mapping for this ctx in param "ism" if this is an ISM
 * tlb miss and branches to label "ismhit".  If this is not an ISM
 * process or an ISM tlb miss it falls thru.
 *
 * Checks to see if the vaddr passed in via tagacc is in an ISM segment for
 * this process.
 * If so, it will branch to label "ismhit".  If not, it will fall through.
 *
 * Also hat_unshare() will set the context for this process to INVALID_CONTEXT
 * so that any other threads of this process will not try and walk the ism
 * maps while they are being changed.
 *
 * will make sure of that.  This means we can terminate our search on
 * the first zero mapping we find.
 *
 * tagacc = (pseudo-)tag access register (vaddr + ctx) (in)
 * tsbmiss = address of tsb miss area (in)
 * ismseg = contents of ism_seg for this ism map (out)
 * ismhat = physical address of imap_ismhat for this ism map (out)
 * tmp1 = scratch reg (CLOBBERED)
 * tmp2 = scratch reg (CLOBBERED)
 * tmp3 = scratch reg (CLOBBERED)
 * label: temporary labels
 * ismhit: label where to jump to if an ism dtlb miss
 * exitlabel: label where to jump if hat is busy due to hat_unshare.

 * Returns the hme hash bucket (hmebp) given the vaddr, and the hatid
 * It also returns the virtual pg for vaddr (ie. vaddr << hmeshift)
 *
 * tagacc = reg containing virtual address
 * hatid = reg containing sfmmu pointer
 * hmebp = register where bucket pointer will be stored
 * vapg = register where virtual page will be stored
 * tmp1, tmp2 = tmp registers
 *
 * hashtag includes bspage + hashno (64 bits).

 * Function to traverse hmeblk hash link list and find corresponding match.
 * The search is done using physical pointers.  It returns the physical
 * address pointer to the hmeblk that matches with the tag provided.
 *
 * hmebp = register that points to hme hash bucket, also used as
 *	a scratch register
 * hmeblktag = register with hmeblk tag match
 * hatid = register with hatid
 * hmeblkpa = register where physical ptr will be stored

 * Function to traverse hmeblk hash link list and find corresponding match.
 * The search is done using physical pointers.  It returns the physical
 * address pointer to the hmeblk that matches with the tag
 *
 * hmeblktag = register with hmeblk tag match (rid field is 0)
 * hatid = register with hatid (pointer to SRD)
 * hmeblkpa = register where physical ptr will be stored
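 * Illustrative sketch (not from the original source; structure layout,
 * sentinel value, and the physical-load helper are assumptions): the chain
 * walk the two HMEHASH_SEARCH variants above perform is, in C terms, a
 * physical-pointer list traversal that compares the hmeblk tag (and hatid)
 * at each node:
 *
 *	#include <stdint.h>
 *
 *	struct hme_blk_hdr {		/* assumed minimal layout */
 *		uint64_t hblk_tag;
 *		uint64_t hblk_hatid;
 *		uint64_t hblk_nextpa;
 *	};
 *
 *	#define	HMEBLK_ENDPA	(~0ULL)	/* assumed end-of-chain sentinel */
 *
 *	/* assumed helper: copy len bytes from physical address pa */
 *	extern void phys_read(void *dst, uint64_t pa, uint64_t len);
 *
 *	uint64_t
 *	hmehash_search(uint64_t headpa, uint64_t hmeblktag, uint64_t hatid)
 *	{
 *		uint64_t pa;
 *
 *		for (pa = headpa; pa != HMEBLK_ENDPA; ) {
 *			struct hme_blk_hdr h;
 *			phys_read(&h, pa, sizeof (h));
 *			if (h.hblk_tag == hmeblktag && h.hblk_hatid == hatid)
 *				return (pa);		/* match found */
 *			pa = h.hblk_nextpa;
 *		}
 *		return (HMEBLK_ENDPA);			/* no match */
 *	}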
 * HMEBLK_TO_HMENT is a macro that given an hmeblk and a vaddr returns
 * the offset for the corresponding hment.
 *
 * vaddr = register with virtual address
 * hmeblkpa = physical pointer to hme_blk
 * hmentoff = register where hment offset will be stored

 * GET_TTE is a macro that returns a TTE given a tag and hatid.
 *
 * tagacc = (pseudo-)tag access register (in)
 * hatid = sfmmu pointer for TSB miss (in)
 * tte = tte for TLB miss if found, otherwise clobbered (out)
 * hmeblkpa = PA of hment if found, otherwise clobbered (out)
 * tsbarea = pointer to the tsbmiss area for this cpu. (in)
 * hmemisc = hblk_misc if TTE is found (out), otherwise clobbered
 * tmp = temp value - clobbered
 * label = temporary label for branching within macro.
 * foundlabel = label to jump to when tte is found.
 * suspendlabel = label to jump to when tte is suspended.
 * exitlabel = label to jump to when tte is not found.

 * tte = hmebp (hme bucket pointer)                                     ;\
 * hmeblkpa = vapg (virtual page)                                       ;\
 * hmemisc, tmp = scratch                                               ;\

 * hmeblkpa = CLOBBERED                                                 ;\
 * hmemisc = htag_bspage+hashno+invalid_rid                             ;\

 * We have found the hmeblk containing the hment.                       ;\
 * Now we calculate the corresponding tte.                              ;\

 * Mapping is suspended, so goto suspend label.                         ;\

 * GET_SHME_TTE is similar to GET_TTE() except it searches
 * shared hmeblks via the HMEHASH_SEARCH_SHME() macro.
 * If a valid tte is found, hmemisc = shctx flag, i.e., shme is
 * either 0 (not part of scd) or 1 (part of scd).

 * tte = hmebp (hme bucket pointer)                                     ;\
 * hmeblkpa = vapg (virtual page)                                       ;\
 * hmemisc, tmp = scratch                                               ;\

 * hmemisc = htag_bspage + hashno + 0 (for rid)                         ;\
 * hmeblkpa = CLOBBERED                                                 ;\

 * We have found the hmeblk containing the hment.                       ;\
 * Now we calculate the corresponding tte.                              ;\
 * tsbarea = tsbmiss area                                               ;\

 * tsbarea = tsbmiss area                                               ;\

 * We found an invalid 8K tte in shme.                                  ;\
 * It may not belong to shme's region since                             ;\
 * regions don't share hmeblks.  Continue the search.                   ;\

 * Mapping is suspended, so goto suspend label.                         ;\

 * KERNEL PROTECTION HANDLER
 *
 * g1 = tsb8k pointer register (clobbered)
 * g2 = tag access register (ro)
 * g3 - g7 = scratch registers
 *
 * Note: This function is patched at runtime for performance reasons.
 *	 Any changes here require sfmmu_patch_ktsb fixed.

        /* %g1 = contents of ktsb_base or ktsb_pbase */
        /* %g3 = contents of ktsb4m_base or ktsb4m_pbase */

 * USER PROTECTION HANDLER
 *
 * g1 = tsb8k pointer register (ro)
 * g2 = tag access register (ro)
 * g3 = faulting context (clobbered, currently not used)
 * g4 - g7 = scratch registers

        /* %g1 = first TSB entry ptr now, %g2 preserved */
        /* %g3 = second TSB entry ptr now, %g2 preserved */
        /* g1 = first TSB entry ptr */
        /* %g3 = second TSB entry ptr now, %g2 preserved */

        mov     -1, %g3                 /* set second tsbe ptr to -1 */
        /* %g3 = second TSB entry ptr now, %g7 clobbered */

 * Kernel 8K page iTLB miss.  We also get here if we took a
 * fast instruction access mmu miss trap while running in an
 * invalid context.
 *
 * %g1 = 8K TSB pointer register (not used, clobbered)
 * %g2 = tag access register (used)
 * %g3 = faulting context id (used)
 * %g7 = TSB tag to match (used)

        /* get kernel tsb pointer */
        /* we patch the next set of instructions at run time */
        /* NOTE: any changes here require sfmmu_patch_ktsb fixed */
        /* %g4 = contents of ktsb_base or ktsb_pbase */
        /* %g4 = contents of ktsb4m_base or ktsb4m_pbase */

 * Kernel dTLB miss.  We also get here if we took a fast data
 * access mmu miss trap while running in invalid context.
 *
 * Note: for now we store kpm TTEs in the kernel TSB as usual.
 *	We select the TSB miss handler to branch to depending on
 *	the virtual address of the access.  In the future it may
 *	be desirable to separate kpm TTEs into their own TSB,
 *	in which case all that needs to be done is to set up a new
 *	handler and branch to it early in the miss if we detect a kpm VA.
 *
 * %g1 = 8K TSB pointer register (not used, clobbered)
 * %g2 = tag access register (used)
 * %g3 = faulting context id (used)

        /* Gather some stats for kpm misses in the TLB. */
        /* KPM_TLBMISS_STAT_INCR(tagacc, val, tsbma, tmp1, label) */

 * Get first TSB offset and look for 8K/64K/512K mapping
 * using the 8K virtual page as the index.
 *
 * We patch the next set of instructions at run time;
 * any changes here require sfmmu_patch_ktsb changes too.

        /* %g7 = contents of ktsb_base or ktsb_pbase */

 * At this point %g1 is our index into the TSB.
 * We just masked off enough bits of the VA depending
 * on the TSB size code.

        /* trapstat expects tte in %g5 */

 * If kpm is using large pages, the following instruction needs
 * to be patched to a nop at boot time (by sfmmu_kpm_patch_tsbm)
 * so that we will probe the 4M TSB regardless of the VA.  In
 * the case kpm is using small pages, we know no large kernel
 * mappings are located above 0x80000000.00000000 so we skip the
 * probe as an optimization.

        /* delay slot safe, below */

 * Get second TSB offset and look for 4M mapping
 * using 4M virtual page as the TSB index.
 *
 * %g1 = 8K TSB pointer.  Don't squash it.
 * %g2 = tag access register (we still need it)
 *
 * We patch the next set of instructions at run time;
 * any changes here require sfmmu_patch_ktsb changes too.

        /* %g7 = contents of ktsb4m_base or ktsb4m_pbase */

 * At this point %g3 is our index into the TSB.
 * We just masked off enough bits of the VA depending
 * on the TSB size code.

        /* we don't check TTE size here since we assume 4M TSB is separate */
        /* trapstat expects tte in %g5 */

 * So, we failed to find a valid TTE to match the faulting
 * address in either TSB.  There are a few cases that could land
 * us here:
 *
 * 1) This is a kernel VA below 0x80000000.00000000.  We branch
 *    to sfmmu_tsb_miss_tt to handle the miss.
 * 2) We missed on a kpm VA, and we didn't find the mapping in the
 *    4M TSB.  Let segkpm handle it.
 *
 * Note that we shouldn't land here in the case of a kpm VA when
 * kpm_smallpages is active -- we handled that case earlier at
 * dktsb4m_kpmcheck_small.
 *
 * g1 = 8K-indexed primary TSB pointer
 * g2 = tag access register
 * g3 = 4M-indexed secondary TSB pointer

 * User instruction miss w/ single TSB.
 * The first probe covers 8K, 64K, and 512K page sizes,
 * because 64K and 512K mappings are replicated off 8K
 * pointer.
 *
 * g1 = tsb8k pointer register
 * g2 = tag access register
 * g3 - g6 = scratch registers

        /* g4 - g5 = clobbered by PROBE_1ST_ITSB */
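 * Illustrative sketch (not from the original source; tsb_probe() is an
 * assumed helper standing in for the PROBE macros): the probe order the
 * kernel dTLB miss handler implements above is "8K-indexed TSB first, then
 * the 4M-indexed TSB, then fall through to the slow path":
 *
 *	#include <stdint.h>
 *
 *	extern uint64_t ktsb_base, ktsb4m_base;		/* patched at boot */
 *	extern uint64_t tsb_probe(uint64_t tsbbase, uint64_t va, int vpshift);
 *
 *	uint64_t
 *	kdtlb_probe(uint64_t va)
 *	{
 *		uint64_t tte;
 *
 *		/* 1st probe: 8K-indexed TSB, holds 8K/64K/512K mappings */
 *		if ((tte = tsb_probe(ktsb_base, va, 13)) != 0)
 *			return (tte);
 *		/* 2nd probe: 4M-indexed TSB (skipped for kpm VAs when
 *		 * kpm_smallpages is in use, as described above) */
 *		if ((tte = tsb_probe(ktsb4m_base, va, 22)) != 0)
 *			return (tte);
 *		return (0);	/* miss: handled by sfmmu_tsb_miss_tt / segkpm */
 *	}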
 * User data miss w/ single TSB.
 * The first probe covers 8K, 64K, and 512K page sizes,
 * because 64K and 512K mappings are replicated off 8K
 * pointer.
 *
 * g1 = tsb8k pointer register
 * g2 = tag access register
 * g3 - g6 = scratch registers

        /* g4 - g5 = clobbered by PROBE_1ST_DTSB */

 * User instruction miss w/ multiple TSBs (sun4v).
 * The first probe covers 8K, 64K, and 512K page sizes,
 * because 64K and 512K mappings are replicated off 8K
 * pointer.  Second probe covers 4M page size only.
 *
 * Just like sfmmu_udtlb_slowpath, except:
 *	o checks for execute permission
 *
 * g1 = tsb8k pointer register
 * g2 = tag access register
 * g3 - g6 = scratch registers

        /* g4 - g5 = clobbered here */
        /* g1 = first TSB pointer, g3 = second TSB pointer */

 * User instruction miss w/ multiple TSBs (sun4u).
 * The first probe covers 8K, 64K, and 512K page sizes,
 * because 64K and 512K mappings are replicated off 8K
 * pointer.  Probe of 1st TSB has already been done prior to entry
 * into this routine.  For the UTSB_PHYS case we probe up to 3
 * valid other TSBs in the following order:
 *	1) shared TSB for 4M-256M pages
 *	2) private TSB for 4M-256M pages
 *	3) shared TSB for 8K-512K pages
 * For the non UTSB_PHYS case we probe the 2nd TSB here that backs
 * the 4M-256M page sizes.
 *
 * Just like sfmmu_udtlb_slowpath, except:
 *	o checks for execute permission
 *
 * g1 = tsb8k pointer register
 * g2 = tag access register
 * g4 - g6 = scratch registers

        mov     %g1, %g3                /* save tsb8k reg in %g3 */
        mov     %g2, %g6                /* GET_2ND_TSBE_PTR clobbers tagacc */
        mov     %g3, %g7                /* copy tsb8k reg in %g7 */
        /* g1 = first TSB pointer, g3 = second TSB pointer */

 * We come here for ism predict DTLB_MISS case or if
 * probe in first TSB failed.
 *
 * g1 = tsb8k pointer register
 * g2 = tag access register
 * g4 - %g6 = scratch registers
 *
 * ISM non-predict probe order
 *	probe 1ST_TSB (8K index)
 *	probe 2ND_TSB (4M index)
 *	probe 4TH_TSB (4M index)
 *	probe 3RD_TSB (8K index)
 *
 * We already probed first TSB in DTLB_MISS handler.

 * Private 2ND TSB 4M-256 pages
 * Shared Context 4TH TSB 4M-256 pages
 * Shared Context 3RD TSB 8K-512K pages

 * g1 = tsb8k pointer register
 * g2 = tag access register
 * g4 - g6 = scratch registers
 *
 * ISM predict probe order
 *	probe 4TH_TSB (4M index)
 *	probe 2ND_TSB (4M index)
 *	probe 1ST_TSB (8K index)
 *	probe 3RD_TSB (8K index)

 * Shared Context 4TH TSB 4M-256 pages
 * Private 2ND TSB 4M-256 pages
 * Shared Context 3RD TSB 8K-512K pages

#else /* sun4u && UTSB_PHYS */

 * g1 = 8K TSB pointer register
 * g2 = tag access register
 * g3 = (potentially) second TSB entry ptr

 * g3 = second TSB ptr IFF ISM pred. (else don't care)
 * g3 = second TSB ptr IFF ISM pred. (else don't care)

        /* fall through in 8K->4M probe order */

 * Look in the second TSB for the TTE
 *
 * g1 = First TSB entry ptr if !ISM pred, TSB8K ptr reg if ISM pred.
 * g3 = 8K TSB pointer register

        /* GET_2ND_TSBE_PTR(tagacc, tsbe_ptr, tmp1, tmp2) */
        /* %g2 is okay, no need to reload, %g3 = second tsbe ptr */
        /* %g2 clobbered, %g3 = second tsbe ptr */
        /* g4 - g5 = clobbered here; %g7 still vpg_4m at this point */
        /* fall through to sfmmu_tsb_miss_tt */

#endif /* sun4u && UTSB_PHYS */

 * We get here if there is a TSB miss OR a write protect trap.
 *
 * g1 = First TSB entry pointer
 * g2 = tag access register
 * g3 = 4M TSB entry pointer; -1 if no 2nd TSB
 * g4 - g7 = scratch registers

 * If trapstat is running, we need to shift the %tpc and %tnpc to
 * point to trapstat's TSB miss return code (note that trapstat
 * itself will patch the correct offset to add).

        brz,a,pn %g3, 1f                /* skip ahead if kernel */

#endif /* sun4v || UTSB_PHYS */

 * The miss wasn't in an ISM segment.
 *
 * %g1, %g3, %g4, %g5, %g7 all clobbered
 * %g2 = (pseudo) tag access

 * Note that there is a small window here where we may have
 * a 512k page in the hash list but have not set the HAT_512K_FLAG
 * flag yet, so we will skip searching the 512k hash list.
 * In this case we will end up in pagefault which will find
 * the mapping and return.  So, in this instance we will end up
 * spending a bit more time resolving this TSB miss, but it can
 * only happen once per process and even then, the chances of that
 * are very small, so it's not worth the extra overhead it would
 * take to close this window.

#else /* sun4u && !UTSB_PHYS */
#endif /* sun4u && !UTSB_PHYS */

 * Set ref/mod bits if this is a prot trap.  Usually, it isn't.

 * If ITLB miss check exec bit.
 * If not set treat as invalid TTE.

 * Set reference bit if not already set

 * Now, load into TSB/TLB.  At this point:

#else /* ITLB_32M_256M_SUPPORT */
#endif /* ITLB_32M_256M_SUPPORT */
#else /* defined(sun4v) || defined(UTSB_PHYS) */
#endif /* defined(sun4v) || defined(UTSB_PHYS) */
#else /* defined(sun4v) || defined(UTSB_PHYS) */
#endif /* defined(sun4v) || defined(UTSB_PHYS) */

        brlz,pn %g1, 5f                 /* Check to see if we have 2nd TSB programmed */

 * Panther ITLB synthesis.
 * The Panther 32M and 256M ITLB code simulates these two large page
 * sizes with 4M pages, to provide support for programs, for example
 * Java, that may copy instructions into a 32M or 256M data page and
 * then execute them.  The code below generates the 4M pfn bits and
 * saves them in the modified 32M/256M ttes in the TSB.  If the tte is
 * stored in the DTLB to map a 32M/256M page, the 4M pfn offset bits
 * are ignored by the hardware.
 *
 * Now, load into TSB/TLB.  At this point:

        bz,pn   %icc, 4b                /* if not, been here before */
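 * Illustrative sketch (not from the original source; the exact bit layout
 * of the TTE is an assumption): for the ITLB synthesis described above, the
 * 4M pfn offset is just the VA bits that select a 4M chunk inside the 32M
 * or 256M page, kept at their natural physical-address positions so they
 * can be OR'ed into the pfn field of the 4M TTE being built:
 *
 *	#include <stdint.h>
 *
 *	#define	SZ_4M	(4ULL * 1024 * 1024)
 *	#define	SZ_32M	(32ULL * 1024 * 1024)
 *	#define	SZ_256M	(256ULL * 1024 * 1024)
 *
 *	uint64_t
 *	pfn_off_4m(uint64_t tagaccess_va, int is_256m)
 *	{
 *		uint64_t pgsz = is_256m ? SZ_256M : SZ_32M;
 *		/* offset within the large page, truncated to a 4M boundary */
 *		return (tagaccess_va & (pgsz - 1) & ~(SZ_4M - 1));
 *	}
 *	/* tte |= pfn_off_4m(...); exec_perm is cleared, exec_synth is set */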
        mov     ASI_N, %g7              /* user TSBs always accessed by VA */

        brlz,a,pn %g1, 7f               /* Check to see if we have 2nd TSB programmed */

        or      %g5, %g3, %g5           /* add 4M bits to TTE */
        mov     ASI_N, %g7              /* user TSBs always accessed by VA */

#endif /* sun4v && ITLB_32M_256M_SUPPORT */

        brlz,pn %g1, 3f                 /* skip programming if 4M TSB ptr is -1 */

 * This is an ISM [i|d]tlb miss.  We optimize for largest
 * page size down to smallest.
 *
 * g2 = vaddr + ctx (or ctxtype (sun4v)) aka (pseudo-)tag access
 * g4 = physical address of ismmap->ism_sfmmu

        /* g5 = pa of imap_vb_shift */

        sub     %g1, %g3, %g2           /* g2 = offset in ISM seg */
        or      %g2, %g4, %g2           /* g2 = (pseudo-)tagacc */

#endif /* defined(sun4v) || defined(UTSB_PHYS) */

 * ISM pages are always locked down.
 * If we can't find the tte then pagefault
 * and let the spt segment driver resolve it.
 *
 * g2 = tagacc w/ISM vaddr (offset in ISM seg)

 * we get here if we couldn't find a valid tte in the hash.
 *
 * If user and we are at tl>1 we go to window handling code.
 *
 * If kernel and the fault is on the same page as our stack
 * pointer, then we know the stack is bad and the trap handler
 * will fail, so we call ptl1_panic with PTL1_BAD_STACK.
 *
 * If this is a kernel trap and tl>1, panic.
 *
 * Otherwise we call pagefault.

        brnz,pn %g4, 3f                 /* skip if not kernel */

 * We are taking a pagefault in the kernel on a kernel address.  If
 * CPU_DTRACE_NOFAULT is set in the cpuc_dtrace_flags, we don't actually
 * want to call sfmmu_pagefault -- we will instead note that a fault
 * has occurred by setting CPU_DTRACE_BADADDR and issue a "done"
 * (instead of a "retry").  This will step over the faulting
 * instruction.

 * We are taking a pagefault on a non-kernel address.  If we are in
 * the kernel (e.g., due to a copyin()), we will check cpuc_dtrace_flags
 * and (if CPU_DTRACE_NOFAULT is set) will proceed as outlined above.

 * Be sure that we're actually taking this miss from the kernel --
 * otherwise we have managed to return to user-level with
 * CPU_DTRACE_NOFAULT set in cpuc_dtrace_flags.

 * If we have no context, check to see if CPU_DTRACE_NOFAULT is set;
 * if it is, indicate that we have faulted and issue a done.

 * Be sure that we're actually taking this miss from the kernel --
 * otherwise we have managed to return to user-level with
 * CPU_DTRACE_NOFAULT set in cpuc_dtrace_flags.

 * This routine will look for a user or kernel vaddr in the hash
 * structure.  It returns a valid pfn or PFN_INVALID.  It doesn't
 * grab any locks.  It should only be used by other sfmmu routines.

 * disable interrupts to protect the TSBMISS area

        mov     %g1, %o5                /* o5 = tsbmiss_area */

 * The first arg to GET_TTE is actually tagaccess register
 * not just vaddr.  Since this call is for kernel we need to clear
 * any lower vaddr bits that would be interpreted as ctx bits.

        brgez,a,pn %g1, 6f              /* if tte invalid goto tl0 */
        mov     -1, %o0                 /* output = -1 (PFN_INVALID) */
        stx     %g1, [%o2]              /* put tte into *ttep */
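 * Illustrative sketch (not from the original source; the helpers are
 * assumptions standing in for GET_TTE and the TTE accessor macros): the
 * lookup described above walks the hash at increasing rehash (page size)
 * levels and returns the pfn of the first valid TTE, or -1 (PFN_INVALID):
 *
 *	#include <stdint.h>
 *
 *	extern uint64_t hash_get_tte(void *hatid, uintptr_t vaddr, int hashno);
 *	extern int tte_is_valid(uint64_t tte);
 *	extern int64_t tte_to_pfn(uintptr_t vaddr, uint64_t tte);
 *
 *	int64_t
 *	vatopfn(uintptr_t vaddr, void *hatid, int max_hashcnt)
 *	{
 *		int hashno;
 *
 *		for (hashno = 1; hashno <= max_hashcnt; hashno++) {
 *			uint64_t tte = hash_get_tte(hatid, vaddr, hashno);
 *			if (tte_is_valid(tte))
 *				return (tte_to_pfn(vaddr, tte));
 *		}
 *		return (-1);	/* PFN_INVALID */
 *	}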
 * we get here if we couldn't find valid hblk in hash.  We rehash
 * to the next size and try again.

        mov     -1, %o0                 /* output = -1 (PFN_INVALID) */

 * o2 = tsbmiss area; use o5 instead of o2 for tsbmiss

        stx     %g1, [%o2]              /* put tte into *ttep */

        brgez,a,pn %g1, 8f              /* if tte invalid goto 8: */
        sub     %g0, 1, %o0             /* output = PFN_INVALID */
        sub     %g0, 2, %o0             /* output = PFN_SUSPENDED */

 * This routine does NOT support user addresses
 * There is a routine in C that supports this.
 * The only reason why we don't have the C routine
 * support kernel addresses as well is because
 * we do va_to_pa while holding the hashlock.

 * This routine is similar to sfmmu_vatopfn() but will only look for
 * a kernel vaddr in the hash structure for the specified rehash value.
 * It's just an optimization for the case when pagesize for a given
 * va range is already known (e.g. large page heap) and we don't want
 * to start the search with rehash value 1 as sfmmu_vatopfn() does.
 *
 * Returns valid pfn or PFN_INVALID if
 * tte for specified rehash # is not found, invalid or suspended.

 * disable interrupts to protect the TSBMISS area

 * The first arg to GET_TTE is actually tagaccess register
 * not just vaddr.  Since this call is for kernel we need to clear
 * any lower vaddr bits that would be interpreted as ctx bits.

        brgez,a,pn %g3, 1f              /* check if tte is invalid */
        mov     -1, %o0                 /* output = -1 (PFN_INVALID) */

 * kpm lock used between trap level tsbmiss handler and kpm C level.

 * Lookup a memseg for a given pfn and if found, return the physical
 * address of the corresponding struct memseg in mseg, otherwise
 * return MSEG_NULLPTR_PA.  The kpmtsbm pointer must be provided in
 * tsbmp, %asi is assumed to be ASI_MEM.
 *
 * This lookup is done by strictly traversing only the physical memseg
 * linkage.  The more generic approach, to check the virtual linkage
 * before using the physical (used e.g. with hmehash buckets), cannot
 * be used here.  Memory DR operations can run in parallel to this
 * lookup w/o any locks and updates of the physical and virtual linkage
 * cannot be done atomically wrt. each other.  Because physical
 * address zero can be a valid physical address, MSEG_NULLPTR_PA acts
 * as the "physical NULL" pointer.

        /* brute force lookup */                                        ;\
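 * Illustrative sketch (not from the original source; the structure layout
 * and the physical-load helper are assumptions): the "brute force" lookup
 * above walks the memseg list purely through physical pointers and returns
 * MSEG_NULLPTR_PA as a "physical NULL", since PA 0 may be a valid address:
 *
 *	#include <stdint.h>
 *
 *	struct memseg_phys {		/* assumed minimal layout */
 *		uint64_t pages_base;	/* first pfn of segment */
 *		uint64_t pages_end;	/* last pfn + 1 */
 *		uint64_t nextpa;	/* PA of next memseg */
 *	};
 *
 *	/* assumed helper: copy len bytes from physical address pa */
 *	extern void phys_read(void *dst, uint64_t pa, uint64_t len);
 *
 *	uint64_t
 *	memseg_lookup_pa(uint64_t memsegspa, uint64_t pfn, uint64_t mseg_nullptr_pa)
 *	{
 *		uint64_t segpa = memsegspa;
 *
 *		while (segpa != mseg_nullptr_pa) {
 *			struct memseg_phys ms;
 *			phys_read(&ms, segpa, sizeof (ms));
 *			if (pfn >= ms.pages_base && pfn < ms.pages_end)
 *				return (segpa);		/* found */
 *			segpa = ms.nextpa;
 *		}
 *		return (mseg_nullptr_pa);		/* not found */
 *	}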
 * kpm tsb miss handler large pages
 * g1 = 8K kpm TSB entry pointer
 * g2 = tag access register
 * g3 = 4M kpm TSB entry pointer

 * check TL tsbmiss handling flag

 * g1 = 8K kpm TSB pointer (not used)
 * g2 = tag access register
 * g6 = per-CPU kpm tsbmiss area

        sub     %g2, %g7, %g4           /* paddr = vaddr-kpm_vbase */

 * mseg_pa = page_numtomemseg_nolock(pfn)
 * if (mseg_pa == NULL) sfmmu_kpm_exception

 * inx = ptokpmp((kpmptop((ptopkpmp(pfn))) - mseg_pa->kpm_pbase));
 *
 * g2=pfn g3=mseg_pa g4=inx

 * kp = &mseg_pa->kpm_pages[inx]
 *
 * g2=pfn g3=mseg_pa g4=offset g5=kp g7=kpmp_table_sz

 * Calculate physical kpm_page pointer
 *
 * g2=pfn g3=mseg_pa g4=offset g5=hashinx

 * Calculate physical hash lock address
 *
 * g1=kp_refcntc_pa g2=pfn g5=hashinx

 * g1=kp_pa g2=pfn g3=hlck_pa

 * g1=kp_pa g2=ttarget g3=hlck_pa g4=kpmtsbp4m g5=tte g6=kpmtsbm_area

        /* KPMLOCK_ENTER(kpmlckp, tmp1, label1, asi) */

        /* use C-handler if there's no go for dropin */
        bne,pn  %xcc, 5f                /* use C-handler if there's no go for dropin */

        /* double check refcnt */
        brz,pn  %g7, 5f                 /* let C-handler deal with this */

 * TSB_LOCK_ENTRY(tsbp, tmp1, tmp2, label) (needs %asi set)
 * If we fail to lock the TSB entry then just load the tte into the TLB.

        /* TSB_INSERT_UNLOCK_ENTRY(tsbp, tte, tagtarget, tmp) */
        /* KPMLOCK_EXIT(kpmlckp, asi) */

 * If trapstat is running, we need to shift the %tpc and %tnpc to
 * point to trapstat's TSB miss return code (note that trapstat
 * itself will patch the correct offset to add).
 * Note: TTE is expected in %g5 (allows per pagesize reporting).

 * kpm tsbmiss handler for smallpages
 * g1 = 8K kpm TSB pointer
 * g2 = tag access register
 * g3 = 4M kpm TSB pointer
 *
 * On fail: goto sfmmu_tsb_miss

 * check TL tsbmiss handling flag

 * g2 = tag access register
 * g3 = 4M kpm TSB pointer (not used)
 * g6 = per-CPU kpm tsbmiss area

 * Assembly implementation of SFMMU_KPM_VTOP(vaddr, paddr)
 * which is defined in mach_kpm.h.  Any changes in that macro
 * should also be ported back to this assembly code.

        sub     %g2, %g7, %g4           /* paddr = vaddr-kpm_vbase */
        sub     %g4, %g5, %g4           /* paddr -= r << kpm_size_shift */
        add     %g4, %g7, %g4           /* paddr += (r-v)<<MMU_PAGESHIFT */
        sub     %g4, %g5, %g4           /* paddr -= r << MMU_PAGESHIFT */

 * g2 = tag access register
 * g6 = per-CPU kpm tsbmiss area

 * mseg_pa = page_numtomemseg_nolock_pa(pfn)
 * if (mseg not found) sfmmu_kpm_exception
 *
 * g2=pfn g6=per-CPU kpm tsbmiss area
 * g4 g5 g7 for scratch use.

 * inx = pfn - mseg_pa->kpm_pbase
 *
 * g2=pfn g3=mseg_pa g6=per-CPU kpm tsbmiss area

 * g2=pfn g3=mseg_pa g4=inx g6=per-CPU tsbmiss area

        /* ksp = &mseg_pa->kpm_spages[inx] */

 * g2=pfn g3=mseg_pa g4=inx g5=ksp
 * g6=per-CPU kpm tsbmiss area g7=kpmp_stable_sz

 * Calculate physical kpm_spage pointer
 *
 * g2=pfn g3=mseg_pa g4=offset g5=hashinx
 * g6=per-CPU kpm tsbmiss area

 * Calculate physical hash lock address.
 * Note: Changes in kpm_shlk_t must be reflected here.
 *
 * g1=ksp_pa g2=pfn g5=hashinx
 * g6=per-CPU kpm tsbmiss area

 * Assemble non-cacheable tte initially
 *
 * g1=ksp_pa g2=pfn g3=hlck_pa
 * g6=per-CPU kpm tsbmiss area

 * g1=ksp_pa g2=ttarget g3=hlck_pa g4=ktsbp g5=tte (non-cacheable)
 * g6=per-CPU kpm tsbmiss area g7=scratch register

        /* KPMLOCK_ENTER(kpmlckp, tmp1, label1, asi) */

        /* use C-handler if there's no go for dropin */

 * TSB_LOCK_ENTRY(tsbp, tmp1, tmp2, label) (needs %asi set)
 * If we fail to lock the TSB entry then just load the tte into the TLB.

        /* TSB_INSERT_UNLOCK_ENTRY(tsbp, tte, tagtarget, tmp) */
        /* KPMLOCK_EXIT(kpmlckp, asi) */

 * If trapstat is running, we need to shift the %tpc and %tnpc to
 * point to trapstat's TSB miss return code (note that trapstat
 * itself will patch the correct offset to add).
 * Note: TTE is expected in %g5 (allows per pagesize reporting).

 * Enable/disable tsbmiss handling at trap level for a kpm (large) page.
 * Called from C-level, sets/clears "go" indication for trap level handler.
 * khl_lock is a low level spin lock to protect the kp_tsbmtl field.
 * Assumed that &kp->kp_refcntc is checked for zero or -1 at C-level.
 * Assumes khl_mutex is held when called from C-level.

 * kpm_smallpages: stores val to byte at address mapped within
 * low level lock brackets.  The old value is returned.
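 * Illustrative sketch (not from the original source; the lock helpers and
 * types are assumptions): the small-page variant described above is simply
 * an exchange of the per-page "go" byte performed inside the low-level spin
 * lock, with the previous value handed back to the C caller:
 *
 *	#include <stdint.h>
 *
 *	/* assumed helpers standing in for the KPMLOCK_ENTER/EXIT macros */
 *	extern void kpmlock_enter(volatile uint64_t *lckp);
 *	extern void kpmlock_exit(volatile uint64_t *lckp);
 *
 *	uint8_t
 *	kpm_stsbmtl(volatile uint8_t *flag_byte, volatile uint64_t *khl_lock,
 *	    uint8_t val)
 *	{
 *		uint8_t old;
 *
 *		kpmlock_enter(khl_lock);
 *		old = *flag_byte;	/* previous "go" indication */
 *		*flag_byte = val;	/* set/clear for the TL handler */
 *		kpmlock_exit(khl_lock);
 *		return (old);
 *	}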
ascii "sfmmu_kpm_tsbmtl: interrupts disabled" .
ascii "sfmmu_kpm_stsbmtl: interrupts disabled" * The first probe covers 8K, 64K, and 512K page sizes, * because 64K and 512K mappings are replicated off 8K * pointer. Second probe covers 4M page size only. * MMU fault area contains miss address and context. * %g2 = tagacc register (needed for sfmmu_tsb_miss_tt) * %g3 = ctx (cannot be INVALID_CONTEXT) * Get 8K and 4M TSB pointers in %g1 and %g3 and * branch to sfmmu_tsb_miss_tt to handle it. * Get first TSB pointer in %g1 * Get second TSB pointer (or NULL if no second TSB) in %g3 * Branch to sfmmu_tsb_miss_tt to handle it /* %g1 = first TSB entry ptr now, %g2 preserved */ /* %g3 = second TSB entry ptr now, %g2 preserved */ * The first probe covers 8K, 64K, and 512K page sizes, * because 64K and 512K mappings are replicated off 8K * pointer. Second probe covers 4M page size only. * MMU fault area contains miss address and context. * Per-CPU tsbmiss areas to avoid cache misses in TSB miss handlers.