hat_i86.c revision 250b7ff955bc8ffaf1e3b9aae014cbf82bff0589
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident	"%Z%%M% %I% %E% SMI"

/*
 * VM - Hardware Address Translation management for i386 and amd64
 *
 * Nearly all the details of how the hardware is managed should not be
 * visible outside this layer except for misc. machine specific functions
 * that work in conjunction with this code.
 *
 * Routines used only inside of i86pc/vm start with hati_ for HAT Internal.
 */

/*
 * Basic parameters for hat operation.
 */

/*
 * The page that is the kernel's top level pagetable.
 *
 * For 32 bit VLP support, the kernel hat will use the 1st 4 entries
 * on this 4K page for its top level page table. The remaining groups of
 * 4 entries are used for per processor copies of user VLP pagetables for
 * running threads. See hat_switch() and reload_pae32() for details.
 *
 * vlp_page[0]  - 0th level==2 PTE for kernel HAT (will be zero)
 * vlp_page[1]  - 1st level==2 PTE for kernel HAT (will be zero)
 * vlp_page[2]  - 2nd level==2 PTE for kernel HAT (zero for small memory)
 * vlp_page[3]  - 3rd level==2 PTE for kernel
 *
 * vlp_page[4]  - 0th level==2 PTE for user thread on cpu 0
 * vlp_page[5]  - 1st level==2 PTE for user thread on cpu 0
 * vlp_page[6]  - 2nd level==2 PTE for user thread on cpu 0
 * vlp_page[7]  - probably copy of kernel PTE
 *
 * vlp_page[8]  - 0th level==2 PTE for user thread on cpu 1
 * vlp_page[9]  - 1st level==2 PTE for user thread on cpu 1
 * vlp_page[10] - 2nd level==2 PTE for user thread on cpu 1
 * vlp_page[11] - probably copy of kernel PTE
 *
 * when / where the kernel PTE's are (entry 2 or 3 or none) depends
 */

/*
 * forward declaration of internal utility routines
 */

/*
 * The kernel address space exists in all HATs. To implement this the
 * kernel reserves a fixed number of entries in every topmost level page
 * table. The values are setup in hat_init() and then copied to every hat
 * created by hat_alloc(). This means that kernelbase must be:
 *
 *	  4Meg aligned for 32 bit kernels
 *	512Gig aligned for x86_64 64 bit kernel
 *
 * The PAE 32 bit hat is handled as a special case. Otherwise requiring 1Gig
 * alignment would use too much VA for the kernel.
 */

/*
 * A cpuset for all cpus. This is used for kernel address cross calls, since
 * the kernel addresses apply to all cpus.
 */

/*
 * management stuff for hat structures
 */

/*
 * kmem cache constructor for struct hat
 */

/*
 * Allocate a hat structure for as. We also create the top level
 * htable and initialize it to contain the kernel hat entries.
 *
 * Once we start creating user process HATs we can enable
 * the htable_steal() code.
 */

/*
 * a 32 bit process uses a VLP style hat when using PAE
 */

/*
 * Allocate the htable hash
 */

/*
 * Initialize Kernel HAT entries at the top of the top level page
 *
 * Note that we don't call htable_release() for the top level, that
 * happens when the hat is destroyed in hat_free_end()
 */

/*
 * PAE32 HAT alignment is less restrictive than the others to keep
 * the kernel from using too much VA. Because of this we may need
 * one layer further down when kernelbase isn't 1Gig aligned.
 * See hat_free_end() for the htable_release() that goes with this
 */

/*
 * Put it at the start of the global list of all hats (used by stealing)
 *
 * kas.a_hat is not in the list but is instead used to find the
 * first and last items in the list.
 *
 * - kas.a_hat->hat_next points to the start of the user hats.
 *   The list ends where hat->hat_next == NULL
 *
 * - kas.a_hat->hat_prev points to the last of the user hats.
 *   The list begins where hat->hat_prev == NULL
 */

/*
 * process has finished executing but as has not been cleaned up yet.
 *
 * If the hat is currently a stealing victim, wait for the stealing
 * to finish.
 */
Once we mark it as HAT_FREEING, htable_steal() * won't look at its pagetables anymore. * An address space is being destroyed, so we destroy the associated hat. * must not be running on the given hat * Remove it from the list of HATs * Make a pass through the htables freeing them all up. * Decide which kmem cache the hash table came from, then free it. * round kernelbase down to a supported value to use for _userlimit * userlimit must be aligned down to an entry in the top level htable. * The one exception is for 32 bit HAT's running PAE. panic(
"_userlimit %p will fall in VA hole\n", (
void *)
va);
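/*
 * Illustrative sketch of the rounding and VA hole check described
 * above.  This is not the function from this file; LEVEL_SIZE(),
 * mmu.max_level, mmu.hole_start and mmu.hole_end mirror names used by
 * this layer, but treat the exact expressions as assumptions (32 bit
 * PAE, for example, is handled as a special case).
 */
static uintptr_t
sketch_round_userlimit(uintptr_t va)
{
	/* align down to an entry in the top level htable */
	va &= ~(LEVEL_SIZE(mmu.max_level) - 1);

	/* a value inside the 64 bit VA hole can never be used */
	if (va >= mmu.hole_start && va <= mmu.hole_end)
		panic("_userlimit %p will fall in VA hole\n", (void *)va);
	return (va);
}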
* Initialize hat data structures based on processor MMU information. * If CPU enabled the page table global bit, use it for the kernel * This is bit 7 in CR4 (PGE - Page Global Enable). * Detect NX and PAE usage. * Intel CPUs allow speculative caching (in TLB-like h/w) of * entries in upper page tables even though there may not be * any valid entries in lower tables. This implies we have to * re-INVLPG at every upper page table entry invalidation. * Use CPU info to set various MMU parameters * If erratum 121 has already been detected at this time, hole_start * contains the value to be subtracted from mmu.hole_start. panic(
"Processor does not support PAE");
panic("Processor does not support cmpxchg8b instruction");
* Initialize parameters based on the 64 or 32 bit kernels and * for the 32 bit kernel decide if we should use PAE. * NOTE Legacy 32 bit PAE mode only has the P_VALID bit at top level. * Compute how many hash table entries to have per process for htables. * We start with 1 page's worth of entries. * If physical memory is small, reduce the amount need to cover it. * If running in 64 bits and physical memory is large, * increase the size of the cache to cover all of memory for * initialize hat data structures * _userlimit must be aligned correctly prom_printf(
"hat_init(): _userlimit=%p, not aligned at %p\n",
halt("hat_init(): Unable to continue");
* VLP hats can use a smaller hash table size on large memroy machines * Set up the kernel's hat * The kernel hat's next pointer serves as the head of the hat list . * The kernel hat's prev pointer tracks the last hat on the list for * Allocate an htable hash bucket for the kernel * XX64 - tune for 64 bit procs * zero out the top level and cached htable pointers * Pre-allocate hrm_hashtab before enabling the collection of * refmod statistics. Allocating on the fly would mean us * running the risk of suffering recursive mutex enters or * Prepare CPU specific pagetables for VLP processes on 64 bit kernels. * Each CPU has a set of 2 pagetables that are reused for any 32 bit * process it runs. They are the top level pagetable, hci_vlp_l3ptes, and * the next to top level table for the bottom 512 Gig, hci_vlp_l2ptes. * allocate the level==2 page table for the bottom most * 512Gig of address space (this is where 32 bit apps live) * Allocate a top level pagetable and copy the kernel's * entries into it. Then link in hci_vlp_l2ptes in the 1st entry. * Finish filling in the kernel hat. * Pre fill in all top level kernel page table entries for the kernel's * part of the address range. From this point on we can't use any new * kernel large pages if they need PTE's at max_level * create the kmap mappings. * Deal with kernelbase not 1Gig aligned for 32 bit PAE hats. * The kernel hat will need fixed values in the highest level * ptable for copying to all other hat's. This implies * alignment restrictions on _userlimit. * Note we don't htable_release() these htables. This keeps them * from ever being stolen or free'd. * top_level_count is used instead of ptes_per_table, since * on 32-bit PAE we only have 4 usable entries at the top level ptable. * We are now effectively running on the kernel hat. * Clearing use_boot_reserve shuts off using the pre-allocated boot * reserve for all HAT allocations. From here on, the reserves are * only used when mapping in memory for the hat's own allocations. * 32 bit kernels use only 4 of the 512 entries in its top level * pagetable. We'll use the remainder for the "per CPU" page tables * We also map the top level kernel pagetable into the kernel to make * it easy to use bcopy to initialize new address spaces. * Create kmap (cached mappings of kernel PTEs) * for 32 bit we map from segmap_start .. ekernelheap * for 64 bit we map from segmap_start .. segmap_start + segmapsize; * On 32 bit PAE mode, PTE's are 64 bits, but ordinary atomic memory references * are 32 bit, so for safety we must use cas64() to install these. * Load the 4 entries of the level 2 page table into this * cpu's range of the vlp_page and point cr3 at them. * Switch to a new active hat, maintaining bit masks to track active CPUs. * set up this information first, so we don't miss any cross calls * Add this CPU to the active set for this HAT. * now go ahead and load cr3 * Utility to return a valid x86pte_t from protections, pfn, and level number * Set the software bits used track ref/mod sync's and hments. * If not using REF/MOD, set them to avoid h/w rewriting PTEs. * Set the caching attributes in the PTE. The combination * of attributes are poorly defined, so we pay attention * to them in the given order. * The test for HAT_STRICTORDER is different because it's defined * as "0" - which was a stupid thing to do, but is too late to change! /*LINTED [Lint hates empty ifs, but it's the obvious way to do this] */ * Duplicate address translations of the parent to the child. 
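/*
 * Illustrative sketch, not this file's PTE construction utility, of
 * building an x86pte_t from a pfn, a level and protection/caching
 * attributes as described above.  The SK_PT_* values are the
 * architectural x86 PTE bits; the software ref/mod-tracking bits and
 * the full caching policy are omitted.  The HAT_STRICTORDER test shows
 * why an equality compare is needed: the flag is defined as 0.
 */
#define	SK_PT_VALID	0x001
#define	SK_PT_WRITABLE	0x002
#define	SK_PT_USER	0x004
#define	SK_PT_WRITETHRU	0x008
#define	SK_PT_NOCACHE	0x010
#define	SK_PT_PAGESIZE	0x080	/* large page bit; levels > 0 only */

static x86pte_t
sketch_mkpte(pfn_t pfn, uint_t attr, level_t level)
{
	x86pte_t pte = ((x86pte_t)pfn << MMU_PAGESHIFT) | SK_PT_VALID;

	if (level > 0)
		pte |= SK_PT_PAGESIZE;		/* 2M/4M/1G mapping */
	if (attr & PROT_WRITE)
		pte |= SK_PT_WRITABLE;
	if (attr & PROT_USER)
		pte |= SK_PT_USER;
	if (!(attr & PROT_EXEC))
		pte |= mmu.pt_nx;		/* zero when NX is unsupported */

	/*
	 * caching attributes, honored in the documented order; since
	 * HAT_STRICTORDER is 0 it can only be tested with a mask compare
	 */
	if ((attr & HAT_ORDER_MASK) == HAT_STRICTORDER)
		pte |= SK_PT_NOCACHE | SK_PT_WRITETHRU;
	return (pte);
}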
* This function really isn't used anymore. * Allocate any hat resources required for a process being swapped in. /* do nothing - we let everything fault back in */ * Unload all translations associated with an address space of a process * that is being swapped out. * We can't just call hat_unload(hat, 0, _userlimit...) here, because * seg_spt and shared pagetables can't be swapped out. * Take a look at segspt_shmswapout() - it's a big no-op. * Instead we'll walk through all the address space and unload * any mappings which we are sure are not shared, not locked. * If the page table is shared skip its entire range. * This code knows that only level 0 page tables are shared * If the page table has no locked entries, unload this one. * If we have a level 0 page table with locked entries, * skip the entire page table, otherwise skip just one entry. * We're in swapout because the system is low on memory, so * go back and flush all the htables off the cached list. * returns number of bytes that have valid mappings in hat. * Utility to sync the ref/mod bits from a page table entry to the page_t * We must be holding the mapping list lock when this is called. * sync to all constituent pages of a large page * hat_page_demote() can't decrease * pszc below this mapping size * since this large mapping existed after we * This the set of PTE bits for PFN, permissions and caching * that require a TLB flush (hat_tlb_inval) if changed on a HAT_LOAD_REMAP * Do the low-level work to get a mapping entered into a HAT's pagetables * and in the mapping list of the associated page_t. * Is this a consistant (ie. need mapping list lock) mapping? * Track locked mapping count in the htable. Do this first, * as we track locking even if there already is a mapping present. * Acquire the page's mapping list lock and get an hment to use. * Note that hment_prepare() might return NULL. * Set the new pte, retrieving the old one at the same time. * did we get a large page / page table collision? * If the mapping didn't change there is nothing more to do. * Install a new mapping in the page's mapping list * Remap's are more complicated: * - HAT_LOAD_REMAP must be specified if changing the pfn. * We also require that NOCONSIST be specified. * - Otherwise only permission or caching bits may change. * We only let remaps change the bits for PFNs, permissions * We don't create any mapping list entries on a remap, so release * any allocated hment after we drop the mapping list lock. * Internal routine to load a single page table entry. This only fails if * we attempt to overwrite a page table link with a large page. * The number 16 is arbitrary and here to catch a recursion problem * early before we blow out the kernel stack. * Find the page table that maps this page if it already exists. * We must have HAT_LOAD_NOCONSIST if page_t is NULL. * a bunch of paranoid error checking panic(
"hati_load_common: bad htable %p, va %p",
ht, (
void *)
va);
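/*
 * Illustrative sketch of the remap rules described above: changing the
 * pfn of an existing valid mapping requires both HAT_LOAD_REMAP and
 * HAT_LOAD_NOCONSIST; otherwise only permission or caching bits may
 * differ between the old and new PTE.  The masks below (architectural
 * paddr bits; W/PWT/PCD for the "changeable" set) are simplifications,
 * not this file's code.
 */
#define	SK_PTE_PFN_MASK		0x000ffffffffff000ULL
#define	SK_PTE_PERM_BITS	(0x002 | 0x008 | 0x010)

static void
sketch_check_remap(x86pte_t old_pte, x86pte_t new_pte, uint_t flags)
{
	if ((old_pte & SK_PTE_PFN_MASK) != (new_pte & SK_PTE_PFN_MASK)) {
		/* the pfn is changing */
		ASSERT(flags & HAT_LOAD_REMAP);
		ASSERT(flags & HAT_LOAD_NOCONSIST);
	} else {
		/* same pfn: only permission/caching bits may change */
		ASSERT((old_pte & ~(x86pte_t)SK_PTE_PERM_BITS) ==
		    (new_pte & ~(x86pte_t)SK_PTE_PERM_BITS));
	}
}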
* release the htable and any reserves * special case of hat_memload to deal with some kernel addrs for performance * construct the requested PTE * Figure out the pte_ptr and htable and use common code to finish up * hat_memload() - load a translation to the given page struct * HAT_LOAD Default flags to load a translation to the page. * HAT_LOAD_LOCK Lock down mapping resources; hat_map(), hat_memload(), * HAT_LOAD_NOCONSIST Do not add mapping to page_t mapping list. * HAT_LOAD_SHARE A flag to hat_memload() to indicate h/w page tables * that map some user pages (not kas) is shared by more * than one process (eg. ISM). * HAT_LOAD_REMAP Reload a valid pte with a different page frame. * HAT_NO_KALLOC Do not kmem_alloc while creating the mapping; at this * point, it's setting up mapping to allocate internal * hat layer data structures. This flag forces hat layer * to tap its reserves in order to prevent infinite * The following is a protection attribute (like PROT_READ, etc.) * HAT_NOSYNC set PT_NOSYNC - this mapping's ref/mod bits * Installing new valid PTE's and creation of the mapping list * entry are controlled under the same lock. It's derived from the * kernel address special case for performance. * This is used for memory with normal caching enabled, so * always set HAT_STORECACHING_OK. panic(
"unexpected hati_load_common() failure");
* Load the given array of page structs using large pages when possible * memload is used for memory with full caching enabled, so * set HAT_STORECACHING_OK. * handle all pages using largest possible pagesize * decide what level mapping to use (ie. pagesize) * To use a large mapping of this size, all the * pages we are passed must be sequential subpages * hat_page_demote() can't change p_szc because * Load this page mapping. If the load fails, try a smaller panic(
"unexpected hati_load_common() failure");
* void hat_devload(hat, addr, len, pf, attr, flags) * Advisory ordering attributes. Apply only to device mappings. * HAT_STRICTORDER: the CPU must issue the references in order, as the * programmer specified. This is the default. * HAT_UNORDERED_OK: the CPU may reorder the references (this is all kinds * of reordering; store or load with store or load). * HAT_MERGING_OK: merging and batching: the CPU may merge individual stores * to consecutive locations (for example, turn two consecutive byte * stores into one halfword store), and it may batch individual loads * (for example, turn two consecutive byte loads into one halfword load). * This also implies re-ordering. * HAT_LOADCACHING_OK: the CPU may cache the data it fetches and reuse it * until another store occurs. The default is to fetch new data * on every load. This also implies merging. * HAT_STORECACHING_OK: the CPU may keep the data in the cache and push it to * the device (perhaps with other data) at a later time. The default is * to push the data right away. This also implies load caching. * Equivalent of hat_memload(), but can be used for device memory where * there are no page_t's and we support additional flags (write merging, etc). * Note that we can have large page mappings with this interface. int f;
/* per PTE copy of flags - maybe modified */ uint_t a;
/* per PTE copy of attr */ * decide what level mapping to use (ie. pagesize) * If this is just memory then allow caching (this happens * for the nucleus pages) - though HAT_PLAT_NOCACHE can be used * to override that. If we don't have a page_t then make sure panic(
"unexpected hati_load_common() failure");
* void hat_unlock(hat, addr, len) * unlock the mappings to a given range of addresses * Locks are tracked by ht_lock_cnt in the htable. * kernel entries are always locked, we don't track lock counts panic(
"hat_unlock() address out of range - above _userlimit");
panic(
"hat_unlock(): lock_cnt < 1, " * Cross call service routine to demap a virtual page on * the current CPU or flush all mappings in TLB. * If the target hat isn't the kernel and this CPU isn't operating * in the target hat, we can ignore the cross call. * For a normal address, we just flush one page mapping * Otherwise we reload cr3 to effect a complete TLB flush. * A reload of cr3 on a VLP process also means we must also recopy in * the pte values from the struct hat * Flush all TLB entries, including global (ie. kernel) ones. * 32 bit PAE also needs to always reload_cr3() * Record that a CPU is going idle * Service a delayed TLB flush if coming out of being idle. * Be sure interrupts are off while doing this so that * higher level interrupts correctly wait for flushes to finish. * We only have to do something if coming out of being idle. * Atomic clear and fetch of old state. * Restore interrupt enable control bit. * Internal routine to do cross calls to invalidate a range of pages on * all CPUs using a given hat. * If the hat is being destroyed, there are no more users, so * demap need not do anything. * If demapping from a shared pagetable, we best demap the * entire set of user TLBs, since we don't know what addresses * if not running with multiple CPUs, don't use cross calls * Determine CPUs to shootdown. Kernel changes always do all CPUs. * Otherwise it's just CPUs currently executing in this hat. * If any CPUs in the set are idle, just request a delayed flush * and avoid waking them up. * Interior routine for HAT_UNLOADs from hat_unload_callback(), * hat_kmap_unload() OR from hat_steal() code. This routine doesn't * handle releasing of the htables. * We always track the locking counts, even if nothing is unmapped * Figure out which page's mapping list lock to acquire using the PFN * passed in "old" PTE. We then attempt to invalidate the PTE. * If another thread, probably a hat_pageunload, has asynchronously panic(
"no page_t, not NOCONSIST: old_pte=" FMT_PTE " ht=%lx entry=0x%x pte_ptr=%lx",
* If freeing the address space, check that the PTE * hasn't changed, as the mappings are no longer in use by * any thread, invalidation is unnecessary. * If not freeing, do a full invalidate. * If the page hadn't changed we've unmapped it and can proceed * Otherwise, we'll have to retry with the current old_pte. * Drop the hment lock, since the pfn may have changed. * If the old mapping wasn't valid, there's nothing more to do * Take care of syncing any MOD/REF bits and removing the hment. * Handle book keeping in the htable and hat * very cheap unload implementation to special case some kernel addresses * use mostly common code to unmap it. * unload a range of virtual address space (no callback) * special case for performance. * Do the callbacks for ranges being unloaded. * do callbacks to upper level VM system * Unload a given range of addresses (has optional callback) * define HAT_UNLOAD_NOSYNC 0x02 * define HAT_UNLOAD_UNLOCK 0x04 * define HAT_UNLOAD_OTHER 0x08 - not used * define HAT_UNLOAD_UNMAP 0x10 - same as HAT_UNLOAD * Special case a single page being unloaded for speed. This happens * quite frequently, COW faults after a fork() for example. panic(
"hat_unload_callback(): unmap inside large page");
* We'll do the call backs for contiguous ranges * Unload one mapping from the page tables. * handle last range for callbacks * synchronize mapping with software data structures * This interface is currently only used by the working set monitor * We need to acquire the mapping list lock to protect * against hat_pageunload(), hat_unload(), etc. * Need to clear ref or mod bits. We may compete with * hardware updating the R/M bits and have to try again. * sync the PTE to the page_t * void hat_map(hat, addr, len, flags) * uint_t hat_getattr(hat, addr, *attr) * returns attr for <hat,addr> in *attr. returns 0 if there was a * mapping and *attr is valid, nonzero if there was no mapping and * hat_updateattr() applies the given attribute change to an existing mapping * We found a page table entry in the desired range, * figure out the new attributes. * x86pte_set() depends on this. * what about PROT_READ or others? this code only handles: * If new PTE really changed, update the table. * Various wrappers for hat_updateattr() * size_t hat_getpagesize(hat, addr) * returns pagesize in bytes for <hat, addr>. returns -1 of there is * no mapping. This is an advisory call. * pfn_t hat_getpfnum(hat, addr) * returns pfn for <hat, addr> or PFN_INVALID if mapping is invalid. * A very common use of hat_getpfnum() is from the DDI for kernel pages. * Use the kmap_ptes (which also covers the 32 bit heap) to speed /*LINTED [use of constant 0 causes a silly lint warning] */ * hat_getkpfnum() is an obsolete DDI routine, and its use is discouraged. * Use hat_getpfnum(kas.a_hat, ...) instead. * We'd like to return PFN_INVALID if the mappings have underlying page_t's * but can't right now due to the fact that some software has grown to use * this interface incorrectly. So for now when the interface is misused, * return a warning to the user that in the future it won't work in the * way they're abusing it, and carry on. * Note that hat_getkpfnum() is never supported on amd64. panic(
"hat_getkpfnum(): called too early\n");
* int hat_probe(hat, addr) * return 0 if no valid mapping is present. Faster version * of hat_getattr in certain architectures. * Most common use of hat_probe is from segmap. We special case it * Find out if the segment for hat_share()/hat_unshare() is DISM or locked ISM. * Simple implementation of ISM. hat_share() is similar to hat_memload_array(), * except that we use the ism_hat's existing mappings to determine the pages * and protections to use for this hat. If we find a full properly aligned * and sized pagetable, we will attempt to share the pagetable itself. size_t len,
/* almost useless value, see below.. */ * We might be asked to share an empty DISM hat by as_dup() * The SPT segment driver often passes us a size larger than there are * valid mappings. That's because it rounds the segment size up to a * large pagesize, even if the actual memory mapped by ism_hat is less. * use htable_walk to get the next valid ISM mapping * First check to see if we already share the page table. * Can't ever share top table. * Avoid level mismatches later due to DISM faults. * addresses and lengths must align * table must be fully populated * no lower level page tables * The range of address space must cover a full table. * All entries in the ISM page table must be leaf PTEs. * We know the 0th is from htable_walk() above. * Unable to share the page table. Instead we will * create new mappings from the values in the ISM mappings. * Figure out what level size mappings to use; * The ISM mapping might be larger than the share area, * be careful to truncate it if needed. * Make a new pte for the PFN for this level. * Copy protections for the pte from the ISM pte. panic(
"hati_load_common() failure");
* hat_unshare() is similar to hat_unload_callback(), but * we have to look for empty shared pagetables. Note that * hat_unshare() is always invoked against an entire segment. * First go through and remove any shared pagetables. * Note that it's ok to delay the TLB shootdown till the entire range is * finished, because if hat_pageunload() were to unload a shared * pagetable page, its hat_tlb_inval() will do a global TLB invalidate. * find a pagetable that maps the current address * clear page count, set valid_cnt to 0, * let htable_release() finish the job * flush the TLBs - since we're probably dealing with MANY mappings * we do just one CR3 reload. * Now go back and clean up any unaligned mappings that * couldn't share pagetables. * hat_reserve() does nothing * Called when all mappings to a page should have write permission removed. * Mostly stolem from hat_pagesync() * walk thru the mapping list clearing write permission * Is this mapping of interest? * Clear ref/mod writable bits. This requires cross * calls to ensure any executing TLBs see cleared bits. * void hat_page_setattr(pp, flag) * void hat_page_clrattr(pp, flag) * Some File Systems examine v_pages for NULL w/o * grabbing the vphm mutex. Must not let it become NULL when * pp is the only page on the list. * Caller is expected to hold page's io lock for VMODSORT to work * correctly with pvn_vplist_dirty() and pvn_getdirty() when mod * We don't have assert to avoid tripping some existing third party * code. The dirty page is moved back to top of the v_page list * after IO is done in pvn_write_done(). * VMODSORT works by removing write permissions and getting * a fault when a page is made dirty. At this point * we need to remove write permission from all mappings * If flag is specified, returns 0 if attribute is disabled * and non zero if enabled. If flag specifes multiple attributs * then returns 0 if ALL atriibutes are disabled. This is an advisory * common code used by hat_pageunload() and hment_steal() * We need to acquire a hold on the htable in order to * do the invalidate. We know the htable must exist, since * unmap's don't release the htable until after removing any * hment. Having x86_hm_enter() keeps that from proceeding. * Invalidate the PTE and remove the hment. " pfn being unmapped is %lx ht=0x%lx entry=0x%x",
* Clean up all the htable information for this mapping * sync ref/mod bits to the page_t * Remove the mapping list entry for this page. * drop the mapping list lock so that we might free the * Unload all translations to a page. If the page is a subpage of a large * page, the large page mappings are also removed. * The forceflags are unused. * The loop with next_size handles pages with multiple pagesize mappings * Get a mapping list entry * If not part of a larger page, we're done. * Else check the next larger page size. * hat_page_demote() may decrease p_szc * but that's ok we'll just take an extra * trip discover there're no larger mappings * If this mapping size matches, remove it. * Remove the mapping list entry for this page. * Note this does the x86_hm_exit() for us. * Unload all large mappings to pp and reduce by 1 p_szc field of every large * page level that included pp. * pp must be locked EXCL. Even though no other constituent pages are locked * it's legal to unload large mappings to pp because all constituent pages of * large locked mappings have to be locked SHARED. therefore if we have EXCL * lock on one of constituent pages none of the large mappings to pp are * Change (always decrease) p_szc field starting from the last constituent * page and ending with root constituent page so that root's pszc always shows * the area where hat_page_demote() may be active. * This mechanism is only used for file system pages where it's not always * possible to get EXCL locks on all constituent pages to demote the size code * (as is done for anonymous or kernel large pages). * all large mappings to pp are gone * and no new can be setup since pp is locked exclusively. * Lock the root to make sure there's only one hat_page_demote() * outstanding within the area of this root's pszc. * Second potential hat_page_demote() is already eliminated by upper * VM layer via page_szc_lock() but we don't rely on it and use our * own locking (so that upper layer locking can be changed without * assumptions that hat depends on upper layer VM to prevent multiple * hat_page_demote() to be issued simultaneously to the same large * If root's p_szc is different from pszc we raced with another * hat_page_demote(). Drop the lock and try to find the root again. * If root's p_szc is greater than pszc previous hat_page_demote() is * not done yet. Take and release mlist lock of root's root to wait * for previous hat_page_demote() to complete. /* p_szc of a locked non free page can't increase */ * Decrement by 1 p_szc of every constituent page of a region that * covered pp. For example if original szc is 3 it gets changed to 2 * everywhere except in region 2 that covered pp. Region 2 that * covered pp gets demoted to 1 everywhere except in region 1 that * covered pp. The region 1 that covered pp is demoted to region * 0. It's done this way because from region 3 we removed level 3 * mappings, from region 2 that covered pp we removed level 2 mappings * and from region 1 that covered pp we removed level 1 mappings. All * changes are done from from high pfn's to low pfn's so that roots * are changed last allowing one to know the largest region where * hat_page_demote() is stil active by only looking at the root page. * This algorithm is implemented in 2 while loops. First loop changes * p_szc of pages to the right of pp's level 1 region and second * loop changes p_szc of pages of level 1 region that covers pp * and all pages to the left of level 1 region that covers pp. 
/*
 * In the first loop p_szc keeps dropping with every iteration
 * and in the second loop it keeps increasing with every iteration.
 *
 * First loop description: Demote pages to the right of pp outside of
 * level 1 region that covers pp. In every iteration of the while
 * loop below find the last page of szc region and the first page of
 * (szc - 1) region that is immediately to the right of (szc - 1)
 * region that covers pp. From last such page to first such page
 * change every page's szc to szc - 1. Decrement szc and continue
 * looping until szc is 1. If pp belongs to the last (szc - 1) region
 * of szc region skip to the next iteration.
 *
 * Second loop description:
 * First iteration changes p_szc to 0 of every
 * page of level 1 region that covers pp.
 * Subsequent iterations find last page of szc region
 * immediately to the left of szc region that covered pp
 * and first page of (szc + 1) region that covers pp.
 * From last to first page change p_szc of every page to szc.
 * Increment szc and continue looping until szc is pszc.
 * If pp belongs to the first szc region of (szc + 1) region
 * skip to the next iteration.
 */

/*
 * get hw stats from hardware into page struct and reset hw stats
 * returns attributes of page
 */

/*
 * Flags for hat_pagesync, hat_getstat, hat_sync
 *
 * define	HAT_SYNC_ZERORM		0x01
 *
 * Additional flags for hat_pagesync
 *
 * define	HAT_SYNC_STOPON_REF	0x02
 * define	HAT_SYNC_STOPON_MOD	0x04
 * define	HAT_SYNC_STOPON_RM	0x06
 * define	HAT_SYNC_STOPON_SHARED	0x08
 */

/*
 * walk thru the mapping list syncing (and clearing) ref/mod bits.
 *
 * Need to clear ref or mod bits. Need to demap
 * to make sure any executing TLBs see cleared bits.
 *
 * can stop short if we found a ref'd or mod'd page
 */

/*
 * returns approx number of mappings to this pp. A return of 0 implies
 * there are no mappings to the page.
 */

/*
 * hat_softlock isn't supported anymore
 */

/*
 * Routine to expose supported HAT features to platform independent code.
 */
panic("hat_supported() - unknown feature");
* Called when a thread is exiting and has been switched to the kernel AS * Setup the given brand new hat structure as the new HAT on this cpu's mmu. * Prepare for a CPU private mapping for the given address. * The address can only be used from a single CPU and can be remapped * using hat_mempte_remap(). Return the address of the PTE. * We do the htable_create() if necessary and increment the valid count so * the htable can't disappear. We also hat_devload() the page table into * kernel so that the PTE is quickly accessed. panic(
"hat_mempte_setup(): address already mapped" * increment ht_valid_cnt so that the pagetable can't disappear * return the PTE physical address to the caller. * Release a CPU private mapping for the given address. * We decrement the htable valid count so it might be destroyed. * invalidate any left over mapping and decrement the htable valid count panic(
"hat_mempte_release(): invalid address");
* Apply a temporary CPU private mapping to a page. We flush the TLB only * on this CPU, so this ought to have been called with preemption disabled. * Remap the given PTE to the new page's PFN. Invalidate only * XXX - these two functions are currently being used by hatstats * they can be removed by using a per-as mutex for hatstats. * HAT part of cpu initialization. * HAT part of cpu deletion. * (currently, we only call this after the cpu is safely passivated.) * Function called after all CPUs are brought online. * Used to remove low address boot mappings. * On 1st CPU we can unload the prom mappings, basically we blow away * all virtual mappings under _userlimit. * Unload the mapping from the page tables. * Atomically update a new translation for a single page. If the * currently installed PTE doesn't match the value we expect to find, * it's not updated and we return the PTE we found. * If activating nosync or NOWRITE and the page was modified we need to sync * with the page_t. Also sync with page_t if clearing ref/mod bits. * sync to all constituent pages of a large page * hat_page_demote() can't decrease * pszc below this mapping size * since large mapping existed after we * Kernel Physical Mapping (kpm) facility * Most of the routines needed to support segkpm are almost no-ops on the * x86 platform. We map in the entire segment when it is created and leave * it mapped in, so there is no additional work required to set up and tear * down individual mappings. All of these routines were created to support * SPARC platforms that have to avoid aliasing in their virtually indexed * Most of the routines have sanity checks in them (e.g. verifying that the * passed-in page is locked). We don't actually care about most of these * checks on x86, but we leave them in place to identify problems in the * Map in a locked page and return the vaddr. * Return the kpm virtual address for a specific pfn * Return the kpm virtual address for the page at pp. * Return the page frame number for the kpm virtual address vaddr. * Return the page for the kpm virtual address vaddr. * hat_kpm_fault is called from segkpm_fault when we take a page fault on a * KPM page. This should never happen on x86 panic(
"pagefault in seg_kpm. hat: 0x%p vaddr: 0x%p",
hat,
vaddr);
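/*
 * Illustrative sketch of why the kpm routines above are nearly no-ops
 * on x86: physical memory is mapped linearly at kpm_vbase, so the
 * pfn/vaddr conversions are simple arithmetic.  Treat the exact
 * expressions as a sketch rather than the functions from this file.
 */
static caddr_t
sketch_kpm_pfn2va(pfn_t pfn)
{
	return ((caddr_t)((uintptr_t)kpm_vbase + mmu_ptob(pfn)));
}

static pfn_t
sketch_kpm_va2pfn(caddr_t vaddr)
{
	return (mmu_btop((uintptr_t)vaddr - (uintptr_t)kpm_vbase));
}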