mem_config.c revision ad23a2db4cfc94c0ed1d58554479ce8d2e7e5768
#include <sys/atomic.h>	/* for use in stats collection */

/*
 * Add a chunk of memory to the system. page_t's for this memory
 * are allocated in the first few pages of the chunk.
 * base: starting PAGESIZE page of new memory.
 * npgs: length in PAGESIZE pages.
 * Adding mem this way doesn't increase the size of the hash tables;
 * growing them would be too hard. This should be OK, but adding memory
 * dynamically most likely means more hash misses, since the tables will
 * be smaller than they otherwise would be.
 */
	"?kphysm_add_memory_dynamic: adding %ldK at 0x%" PRIx64 "\n",
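/*
 * Illustrative caller sketch (not part of this file): a DR driver would
 * convert a physical byte range into PAGESIZE units before calling
 * kphysm_add_memory_dynamic().  The names phys_base, len and
 * example_add_board_memory are invented for this example.
 */
static int
example_add_board_memory(uint64_t phys_base, uint64_t len)
{
	pfn_t	base = (pfn_t)(phys_base >> PAGESHIFT);	/* starting page */
	pgcnt_t	npgs = (pgcnt_t)(len >> PAGESHIFT);	/* length in pages */

	return (kphysm_add_memory_dynamic(base, npgs));
}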
/*
 * Add this span in the delete list to prevent interactions.
 */
/*
 * Check to see if any of the memory span has been added
 * by trying an add to the installed memory list. This
 * forms the interlocking process for add.
 */
/*
 * We store the page_t's for this new memory in the first
 * few pages of the chunk. Here, we go and get'em ...
 */
/*
 * The expression after the '-' gives the number of pages
 * that will fit in the new memory based on a requirement
 * of (PAGESIZE + sizeof (page_t)) bytes per page.
 */
/*
 * A viable kpm large page mapping must not overlap two
 * dynamic memsegs. Therefore the total size is checked
 * to be at least kpm_pgsz and also whether start and end
 * points are at least kpm_pgsz aligned.
 */
/*
 * There is no specific error code for violating
 * kpm granularity constraints.
 */
/* Is memory area supplied too small? */
/* There is no specific error code for 'too small'. */
/*
 * We may re-use a previously allocated VA space for the page_ts
 * eventually, but we need to initialize and lock the pages first.
 */
/*
 * Get an address in the kernel address map, map
 * the page_t pages and see if we can touch them.
 */
	    " Can't allocate VA for page_ts");
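/*
 * Hedged sketch of the split described above: of the npgs pages being
 * added, metapgs are set aside to hold the page_t metadata and the rest
 * become usable pages.  The variable names are illustrative.
 */
	pgcnt_t metapgs;

	metapgs = npgs - (((uint64_t)npgs << PAGESHIFT) /
	    (PAGESIZE + sizeof (page_t)));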
/*
 * In the remapping code we map one page at a time so we must do
 * the same here to match mapping sizes.
 */
	    " Can't access pp array at 0x%p [phys 0x%lx]",
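/*
 * Hedged sketch of the one-page-at-a-time mapping described above,
 * assuming the usual hat_devload() interface.  The vaddr, pt_base and
 * metapgs names and the exact flag set are illustrative.
 */
	pgcnt_t i;

	for (i = 0; i < metapgs; i++) {
		hat_devload(kas.a_hat, vaddr + ptob(i), PAGESIZE,
		    pt_base + i, PROT_READ | PROT_WRITE,
		    HAT_LOAD | HAT_LOAD_LOCK | HAT_LOAD_NOCONSIST);
	}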
/*
 * Add this memory slice to its memory node translation.
 * Note that right now, each node may have only one slice;
 * this may change with COD or in larger SSM systems with
 * nested latency groups, so we must not assume that the
 * node does not yet exist.
 */
/*
 * Allocate or resize page counters as necessary to accommodate
 * the increase in memory pages.
 */
/*
 * Update the phys_avail memory list.
 * The phys_install list was done at the start.
 */
/* See if we can find a memseg to re-use. */
/*
 * Initialize the memseg structure representing this memory
 * and add it to the existing list of memsegs. Do some basic
 * initialization and add the memory to the system.
 * In order to prevent lock deadlocks, the add_physmem()
 * code is repeated here, but split into several stages.
 */
/* Initialize metadata. The page_ts are set to locked state */
/* Save the original pp base in case we reuse a memseg. */
/* Remap our page_ts to the re-used memseg VA space. */
/*
 * The new memseg is inserted at the beginning of the list.
 * Not only does this save searching for the tail, but in the
 * case of a re-used memseg, it solves the problem of what
 * happens if some process has still got a pointer to the
 * memseg and follows the next pointer to continue traversing
 * the memsegs list.
 */
/*
 * Recalculate the paging parameters now total_pages has changed.
 * This will also cause the clock hands to be reset before next use.
 */
/* Free the pages outside the lock to avoid locking loops. */
/*
 * Now that we've updated the appropriate memory lists we
 * need to reset a number of globals, since we've increased memory.
 * Several have already been updated for us as noted above. The
 * globals we're interested in at this point are:
 *	physmax - highest page frame number.
 *	physinstalled - number of pages currently installed (done earlier)
 *	maxmem - max free pages in the system
 *	physmem - physical memory pages available
 *	availrmem - real memory available
 */
/* Update lgroup generation number on single lgroup systems */
	return (KPHYSM_OK);
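/*
 * Hedged sketch of the global updates listed above.  The real code
 * excludes the metadata pages from the usable count; new_end_pfn and
 * navpgs are invented names for this example, and physinstalled has
 * already been bumped when phys_install was updated.
 */
	if (new_end_pfn > physmax)
		physmax = new_end_pfn;
	maxmem += navpgs;
	physmem += navpgs;
	availrmem += navpgs;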
/* Successfully added system memory */
/*
 * There are various error conditions in kphysm_add_memory_dynamic()
 * which require a rollback of already changed global state.
 */
/* Unreserve memory span. */
/*
 * Only return an available memseg of exactly the right size.
 * When the meta data area has its own virtual address space
 * we will need to manage this more carefully and do best fit
 * allocations, possibly splitting an available area.
 */
/*
 * The stat values are only incremented in the delete thread
 * so no locking or atomic required.
 */
#else /* MEM_DEL_STATS */
#endif /* MEM_DEL_STATS */
/*
 * mh_mutex must be taken to examine or change mh_exthandle and mh_state.
 * The mutex may not be required for other fields, dependent on mh_state.
 */
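/*
 * Hedged sketch of the rule stated above: mh_state is only examined with
 * mh_mutex held, and a handle already returned to the FREE state by
 * kphysm_del_release() is treated as invalid.  The MHND_FREE name is an
 * assumption; KPHYSM_EHANDLE is the documented error for a bad handle.
 */
	mutex_enter(&mhp->mh_mutex);
	if (mhp->mh_state == MHND_FREE) {
		/* Handle was released while we waited for mh_mutex. */
		mutex_exit(&mhp->mh_mutex);
		return (KPHYSM_EHANDLE);
	}
	/* ... operate on the handle with mh_mutex still held ... */
	mutex_exit(&mhp->mh_mutex);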
#endif /* MEM_DEL_STATS */
/* handle_gen is protected by list mutex. */
/*
 * Exit the mutex to preserve locking order. This is OK
 * here as once in the FREE state, the handle cannot
 */
/*
 * No need to lock the handle (mh_mutex) as only
 * mh_next changing and this is the only thread that
 */
/*
 * This function finds the internal mem_handle corresponding to an
 * external handle and returns it with the mh_mutex held.
 */
/*
 * The state of the handle could have been changed
 * by kphysm_del_release() while waiting for mh_mutex.
 */
/*
 * The handle is allocated using KM_SLEEP, so cannot fail.
 * If the implementation is changed, the correct error to return
 * here would be KPHYSM_ENOHANDLES.
 */
/* TODO: phys_install could change now */
/* Concatenate lists. No list ordering is required. */
/*
 * Given a new list of delspans, check there is no overlap with
 * all existing span activity (add or delete) and then concatenate
 * the new spans to the given list.
 * Return 1 for OK, 0 if overlapping.
 */
/* ASSERT(my_tlp->trl_spans == NULL || tlp_in_list(trh, my_tlp)); */
/*
 * Reserve interface for add to stop delete before add finished.
 * functions and so is fully protected by the mutex in struct transit_list.
 */
/*
 * Return whether memseg was created by kphysm_add_memory_dynamic().
 * If this is the case and startp is non-zero, return also the start pfn
 * of the meta data via startp.
 */
/* Meta data is required to be at the beginning */
/* Intersect the span with the installed memory list (phys_install). */
/*
 * No physical memory in this range. Is this an
 * error? If an attempt to start the delete is made
 * for OK returns from del_span such as this, start will
 */
/* Could return KPHYSM_ENOWORK. */
/*
 * It is assumed that there are no error returns
 * from span_to_install() due to kmem_alloc failure.
 */
/* Does this span overlap an existing span? */
/*
 * Differentiate between already on list for this handle
 * (KPHYSM_EDUP) and busy elsewhere (KPHYSM_EBUSY).
 */
/*
 * At this point the spans in mdsp_new have been inserted into the
 * list of spans for this handle and thereby to the global list of
 * spans being processed. Each of these spans must now be checked
 * for relocatability. As a side-effect segments in the memseg list
 */
/*
 * Note that mdsp_new can no longer be used as it is now part of
 * a larger list. Select elements of this larger list based
 */
/*
 * The pages_checked count is a hack. All pages should be
 * checked for relocatability. Those not covered by memsegs
 * should be tested with arch_kphysm_del_span_ok().
 */
/* Span and memseg don't overlap. */
/* Check that segment is suitable for delete. */
/* Can only delete whole added segments */
/* Check that this is completely within the */
/* Set mseg_start for accounting below. */
/*
 * If this segment is larger than the span,
 * try to split it. After the split, it
 * is necessary to restart.
 */
/*
 * The memseg is wholly within the delete span.
 * The individual pages can now be checked.
 */
/* Keep holding the mh_mutex to prevent it going away. */
/* It is OK to proceed here if mdsp_new == NULL. */
/*
 * Find the lowest addressed memseg that starts
 * after sbase and account for it.
 * This is to catch dynamic memsegs whose start
 */
/* Now have the full extent of the memseg so */
/* Span does not overlap memseg. */
/*
 * Account for gap either before the segment if
 * there is one or to the end of the span.
 */
/* Check with arch layer for relocatability. */
/*
 * No non-relocatable pages in this
 * area, avoid the fine-grained
 */
/* Skip the page_t area of a */
/* The individual pages can now be checked. */
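/*
 * Hedged sketch of the lookup contract described above ("finds the
 * internal mem_handle corresponding to an external handle and returns it
 * with the mh_mutex held").  The list head and list mutex names are
 * illustrative.
 */
static struct mem_handle *
example_lookup_mem_handle(memhandle_t handle)
{
	struct mem_handle *mhp;

	mutex_enter(&mem_handle_list_mutex);
	for (mhp = mem_handle_head; mhp != NULL; mhp = mhp->mh_next) {
		mutex_enter(&mhp->mh_mutex);
		if (mhp->mh_exthandle == handle)
			break;			/* return with mh_mutex held */
		mutex_exit(&mhp->mh_mutex);
	}
	mutex_exit(&mem_handle_list_mutex);
	return (mhp);
}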
/*
 * This release function can be called at any stage as follows:
 *	_start called but failed
 */
/*
 * Set state so that we can wait if necessary.
 * Also this means that we have read/write access to all
 * fields except mh_exthandle and mh_state.
 */
/*
 * The mem_handle cannot be de-allocated by any other operation
 * now, so no need to hold mh_mutex.
 */
/* This cancel function can only be called with the thread running. */
/*
 * Set the cancel flag and wake the delete thread up.
 * The thread may be waiting on I/O, so the effect of the cancel
 */
/*
 * Calling kphysm_del_status() is allowed before the delete
 * is started to allow for status display.
 */
/*
 * If all pageable pages were paged out, freemem would
 * equal availrmem. There is a minimum requirement for
 */
/* TODO: check swap space, etc. */
/* Get up to freemem_incr pages. */
/* Take free_get pages away from freemem, */
/*
 * Duplicate test from page_create_throttle()
 * but don't override with !PG_WAIT.
 */
/* Put pressure on pageout. */
/*
 * This function is run as a helper thread for delete_memory_thread.
 * It is needed in order to force kaio cleanup, so that pages used in kaio
 * will be unlocked and subsequently relocated by delete_memory_thread.
 * The address of the delete_memory_thread's mem_handle is passed in to
 * this thread function, and is used to set the mh_aio_cleanup_done member
 * prior to calling thread_exit().
 */
	if (modload("sys", "kaio") == -1) {
	    "aio_cleanup_dr_delete_memory not found in kaio");
/* cleanup proc's outstanding kaio */
/* delay a bit before retrying all procs again */
#endif /* MEM_DEL_STATS */
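/*
 * Hedged sketch of the helper loop the two comments above belong to:
 * walk the process list, invoke the kaio cleanup entry point for each
 * process with outstanding kaio, then delay and retry until the delete
 * thread signals completion.  pcancel, aio_cleanup_func and the delay
 * interval are illustrative, and the locking here is simplified for
 * illustration.
 */
	while (*pcancel == 0) {
		proc_t *procp;

		mutex_enter(&pidlock);
		for (procp = practive; procp != NULL && *pcancel == 0;
		    procp = procp->p_next) {
			if (procp->p_aio != NULL) {
				/* cleanup proc's outstanding kaio */
				(void) (*aio_cleanup_func)(procp);
			}
		}
		mutex_exit(&pidlock);
		/* delay a bit before retrying all procs again */
		delay(drv_usectohz(100 * MILLISEC));
	}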
#endif /* MEM_DEL_STATS */
/* Allocate the remap pages now, if necessary. */
/*
 * Subtract from availrmem now if possible as availrmem
 * may not be available by the end of the delete.
 */
/*
 * Start dr_aio_cleanup_thread, which periodically iterates
 * through the process list and invokes aio cleanup. This
 * is needed in order to avoid a deadly embrace between the
 * delete_memory_thread (waiting on writer lock for page, with the
 * exclusive-wanted bit set), kaio read request threads (waiting for a
 * reader lock on the same page that is wanted by the
 * delete_memory_thread), and threads waiting for kaio completion
 * (blocked on spt_amp->lock).
 */
/*
 * Release mh_mutex - some of this
 * stuff takes some time (eg PUTPAGE).
 */
/*
 * Not covered by a page_t - will
 * be dealt with elsewhere.
 */
/* Page in use elsewhere. Skip it. */
/*
 * See if the cage expanded into the delete.
 * This can happen as we have to allow the
 */
/*
 * Page has been retired and is
 * not part of the cage so we
 * can now do the accounting for
 */
/*
 * Like page_reclaim() only 'freemem'
 * processing is already done.
 */
/*
 * Keep stats on pages encountered that
 * are marked for retirement.
 */
/*
 * In certain cases below, special exceptions
 * are made for pages that are toxic. This
 * is because the current meaning of toxic
 * is that an uncorrectable error has been
 * previously associated with the page.
 */
/* Must relocate locked in */
#endif /* MEM_DEL_STATS */
/*
 * Lock all constituent pages
 * of a large page to ensure
 * that p_szc won't change.
 */
#endif /* MEM_DEL_STATS */
#endif /* MEM_DEL_STATS */
/*
 * Cannot do anything about
 * this page because it is
 */
/*
 * Unload the mappings and check if mod bit is set.
 */
#endif /* MEM_DEL_STATS */
/*
 * Lock all constituent pages
 * of a large page to ensure
 * that p_szc won't change.
 */
#endif /* MEM_DEL_STATS */
#endif /* MEM_DEL_STATS */
/*
 * page_destroy was called with
 * dontfree. As long as p_lckcnt
 * and p_cowcnt are both zero, the
 * only additional action of
 * page_destroy with !dontfree is to
 * call page_free, so we can collect
 */
#endif /* MEM_DEL_STATS */
/*
 * The page is toxic and the mod bit is
 * set, we cannot do anything here to deal with it.
 */
#endif /* MEM_DEL_STATS */
#endif /* MEM_DEL_STATS */
/*
 * Try to get the page back immediately
 * so that it can be collected.
 */
/*
 * This should not happen as this
 * thread is deleting the page.
 * If this code is generalized, this
 */
	    "delete_memory_thread(0x%p) "
	    "pfn 0x%lx has no page_t",
/*
 * Got some freemem and a target
 * page, so move the data to avoid
 */
/*
 * page_relocate() will return pgcnt: the
 * number of consecutive pages relocated.
 * If it is successful, pp will be a
 * linked list of the page structs that
 * were relocated. If page_relocate() is
 * unsuccessful, pp will be unmodified.
 */
#endif /* MEM_DEL_STATS */
#endif /* MEM_DEL_STATS */
/*
 * We did not succeed. We need
 * to give the pp_targ pages back.
 * page_free(pp_targ, 1) without
 * the freemem accounting.
 */
/* We will then collect pgcnt pages. */
/* We need to make sure freemem_left is */
/* Do not proceed if mh_cancel is set. */
/* Unlink and unlock each page. */
/*
 * We need to give the pp pages back.
 * page_free(pp, 1) without the freemem accounting.
 */
/* Now remove pgcnt from freemem_left */
/*
 * pp and pp_targ were passed back as
 * a linked list of pages.
 */
/* Unlink and unlock each page. */
/*
 * The original page is now free
 * so remove it from the linked list.
 */
/*
 * This code is needed as we cannot wait
 * for a page to be locked OR the delete to
 * be cancelled. Also, we must delay so
 * that other threads get a chance to run
 * on our cpu, otherwise page locks may be
 * held indefinitely by those threads.
 */
/* stop the dr aio cleanup thread */
/* Return any surplus. */
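/*
 * Hedged sketch of the page_relocate() call the comments above describe;
 * the surrounding retry and giveback logic is omitted and the variable
 * names mirror the comments (pp, pp_targ, pgcnt).
 */
	spgcnt_t pgcnt;

	if (page_relocate(&pp, &pp_targ, 0, 0, &pgcnt, NULL) == 0) {
		/*
		 * Success: pp is now a linked list of the pgcnt page
		 * structs that were relocated (the freed source frames).
		 */
		collected += pgcnt;	/* illustrative accounting */
	} else {
		/* Failure: pp is unmodified; give the pp_targ pages back. */
	}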
#endif /* MEM_DEL_STATS */
/*
 * If the memory delete was cancelled, exclusive-wanted bits must
 * be cleared. If there are retired pages being deleted, they need
 */
/* do we already have pp? */
/* To satisfy ASSERT below in */
/* Free retired page bitmap and collected page bitmap */
/* wait for our dr aio cancel thread to exit */
/* Go through list of deleted pages (mh_deleted) freeing */
/* All the pages are no longer in use and are exclusively locked. */
/* mhp->mh_mutex exited by CALLB_CPR_EXIT() */
/* Start the delete of the memory from the system. */
/* Release the mutex in case thread_create sleeps. */
/*
 * The "obvious" process for this thread is pageout (proc_pageout)
 * but this gives the thread too much power over freemem
 * which results in freemem starvation.
 */
/*
 * dpages starts off as the size of the structure and
 * ends up as the minimum number of pages that will
 * hold a whole number of page_t structures.
 */
/*
 * Allocate pp_dummy pages directly from static_arena,
 * since these are whole page allocations and are
 * referenced by physical address. This also has the
 * nice fringe benefit of hiding the memory from
 * ::findleaks since it doesn't deal well with allocated
 * kernel heap memory that doesn't have any mappings.
 */
/*
 * Initialize the page_t's to a known 'deleted' state
 * that matches the state of deleted pages.
 */
/* Remove kmem mappings for the pages for safety. */
/* Leave pp_dummy pointer set as flag that init is done. */
	for (i = 0; i < n; i++) {
/*
 * Transition all the deleted pages to the deleted state so that
 * page_lock will not wait. The page_lock_delete call will
 * also wake up any waiters.
 */
/* remove from main segment list. */
/* Span and memseg don't overlap. */
/* Hide the memseg from future scans. */
/*
 * Leave the deleted segment's next pointer intact
 * in case a memsegs scanning loop is walking this
 */
/*
 * Recalculate the paging parameters now total_pages has changed.
 * This will also cause the clock hands to be reset before next use.
 */
/*
 * Put the page_t's into the deleted state to stop
 * cv_wait()s on the pages. When we remap, the dummy
 * page_t's will be in the same state.
 */
/*
 * Collect up information based on pages_base and pages_end
 * early so that we can flag early that the memseg has been
 * deleted by setting pages_end == pages_base.
 */
/* Remap the meta data to our special dummy area. */
/* Set for clean-up below. */
/*
 * For memory whose page_ts were allocated
 * at boot, we need to find a new use for
 */
/*
 * For the moment, just leak it.
 * (It is held in the memseg_delete_junk list.)
 */
/* Must not use seg now as it could be re-used. */
/* availrmem is adjusted during the delete. */
/* Update lgroup generation number on single lgroup systems */
/* Successfully deleted system memory */
/* do not do PP_SETAGED(pp); */
	printf("memory delete loop %x/%x, statistics%s\n",
#endif /* MEM_DEL_STATS */
/*
 * This test will become more complicated when the version must
 */
/* Catch this in DEBUG kernels. */
	    "(0x%p, 0x%p) duplicate registration from 0x%p",
/*
 * Note the locking between pre_del and post_del: The reader lock is held
 * between the two calls to stop the set of functions from changing.
 */
/* Lock the memsegs list against other updates now */
/* Find boot time memseg that wholly covers this area. */
/* First find the memseg with page 'base' in it. */
/*
 * Work out the size of the two segments that will
 * surround the new segment, one for low address
 */
/*
 * Allocate the new structures. The old memseg will not be freed
 * as there may be a reference to it.
 */
/* All allocation done now. */
/*
 * Update hat_kpm specific info of all involved memsegs and
 * allow hat_kpm specific global chain updates.
 */
/*
 * At this point we have two equivalent memseg sub-chains, both starting
 * at the same place in the global chain. By re-writing the pointer
 * in the previous element we switch atomically from using the old
 * sub-chain to the new one.
 */
/*
 * We leave the old segment, 'seg', intact as there may be
 * references to it. Also, as the value of total_pages has not
 * changed and the memsegs list is effectively the same when
 * accessed via the old or the new pointer, we do not have to
 * cause pageout_scanner() to re-evaluate its hand pointers.
 */
/*
 * We currently do not re-use or reclaim the page_t memory.
 * If we do, then this may have to change.
 */
/*
 * The memsegs lock is only taken when modifying the memsegs list
 * and rebuilding the pfn hash table (after boot).
 * No lock is needed for read as memseg structures are never de-allocated
 * and the pointer linkage is never updated until the memseg is ready.
 */
/* memlist (phys_install, phys_avail) locking. */
/*
 * The sfmmu hat layer (e.g.) accesses some parts of the memseg
 * structure using physical addresses. Therefore a kmem_cache is
 * used with KMC_NOHASH to avoid page crossings within a memseg
 * structure. KMC_NOHASH requires that no external (outside of
 * slab) information is allowed. This, in turn, implies that the
 * cache's slabsize must be exactly a single page, since per-slab
 * information (e.g. the freelist for the slab) is kept at the
 * end of the slab, where it is easy to locate. Should be changed
 */
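/*
 * Hedged sketch of the cache creation the comment above calls for.  The
 * cache name, the use of static_arena and the absence of constructor,
 * destructor and reclaim callbacks are illustrative; the KMC_NOHASH flag
 * is the requirement stated above.
 */
static kmem_cache_t *memseg_cache;

void
example_memseg_cache_init(void)
{
	memseg_cache = kmem_cache_create("memseg_cache",
	    sizeof (struct memseg), 0, NULL, NULL, NULL, NULL,
	    static_arena, KMC_NOHASH);
}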