seg_kmem.c revision 35b1ab9964f57b69ba8f03d2962f94036aa78c57
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */
#pragma ident	"%Z%%M% %I% %E% SMI"

/*
 * seg_kmem is the primary kernel memory segment driver.  It
 * maps the kernel heap [kernelheap, ekernelheap), module text,
 * and all memory which was allocated before the VM was initialized
 * into kas.
 *
 * Pages which belong to seg_kmem are hashed into &kvp vnode at
 * an offset equal to (u_offset_t)virt_addr, and have p_lckcnt >= 1.
 * They must never be paged out since segkmem_fault() is a no-op to
 * prevent recursive faults.
 *
 * Currently, seg_kmem pages are sharelocked (p_sharelock == 1) on
 * __x86 and are unlocked (p_sharelock == 0) on __sparc.  Once __x86
 * supports relocation the #ifdef kludges can be removed.
 *
 * seg_kmem pages may be subject to relocation by page_relocate(),
 * provided that the HAT supports it; if this is so, segkmem_reloc
 * will be set to a nonzero value.  All boot time allocated memory as
 * well as static memory is considered off limits to relocation.
 * Pages are "relocatable" if p_state does not have P_NORELOC set, so
 * we request P_NORELOC pages for memory that isn't safe to relocate.
 *
 * The kernel heap is logically divided up into four pieces:
 *
 *   heap32_arena is for allocations that require 32-bit absolute
 *   virtual addresses.
 *
 *   heap_core is for allocations that require 2GB *relative*
 *   offsets; in other words all memory from heap_core is within
 *   2GB of all other memory from the same arena.  This is a requirement
 *   of the addressing modes of some processors in supervisor code.
 *
 *   heap_arena is the general heap arena.
 *
 *   static_arena is the static memory arena.  Allocations from it
 *   are not subject to relocation so it is safe to use the memory
 *   physical address as well as the virtual address (e.g. the VA to
 *   PA translations are static).  Caches may import from static_arena;
 *   all other static memory allocations should use static_alloc_arena.
 *
 * On some platforms which have limited virtual address space, seg_kmem
 * may share [kernelheap, ekernelheap) with seg_kp; if this is so,
 * segkp_bitmap is non-NULL, and each bit represents a page of virtual
 * address space which is actually seg_kp mapped.
 */
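/*
 * Illustrative sketch (not part of the original file): the (vnode, offset)
 * identity described above means a seg_kmem page can be found by looking
 * up &kvp at an offset equal to the page's virtual address.  The helper
 * name below is hypothetical; page_lookup() is the standard VM routine
 * assumed here.
 */
#include <sys/types.h>
#include <vm/page.h>
#include <vm/seg_kmem.h>

static page_t *
segkmem_page_of(caddr_t vaddr)
{
        /* offset == virtual address, per the seg_kmem hashing convention */
        return (page_lookup(&kvp, (u_offset_t)(uintptr_t)vaddr, SE_SHARED));
}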
char *kernelheap;		/* start of primary kernel heap */
struct seg kvseg;		/* primary kernel heap segment */
struct seg kzioseg;		/* Segment for zio mappings */
char *heap_lp_base;		/* start of kernel large page heap arena */
char *heap_lp_end;		/* end of kernel large page heap arena */
struct seg kvseg32;		/* 32-bit kernel heap segment */
struct as kas;			/* kernel address space */
struct vnode kvp;		/* vnode for all segkmem pages */
struct vnode zvp;		/* vnode for zfs pages */

/*
 * The seg_kmem driver can map part of the kernel heap with large pages.
 * Currently this functionality is implemented for sparc platforms only.
 *
 * The large page size "segkmem_lpsize" for the kernel heap is selected in
 * the platform specific code.  It can also be modified via the /etc/system
 * file.  Setting segkmem_lpsize to PAGESIZE in /etc/system disables usage
 * of large pages for the kernel heap.  "segkmem_lpshift" is adjusted
 * appropriately to match segkmem_lpsize.
 *
 * At boot time we carve from the kernel heap arena a range of virtual
 * addresses that will be used for large page mappings.  This range
 * [heap_lp_base, heap_lp_end) is set up as a separate vmem arena -
 * "heap_lp_arena".  We also create "kmem_lp_arena" that caches memory
 * already backed by large pages.  kmem_lp_arena imports virtual segments
 * from heap_lp_arena.
 *
 * We use "segkmem_kmemlp_max" to limit the total amount of physical memory
 * consumed by the large page heap.  By default this parameter is set to 1/8
 * of physmem but can be adjusted through /etc/system either directly or
 * indirectly by setting "segkmem_kmemlp_pcnt" to the percent of physmem
 * we allow for the large page heap.
 *
 * Getting large pages for the kernel heap could be problematic due to
 * physical memory fragmentation.  That's why we allow preallocating
 * "segkmem_kmemlp_min" bytes at boot time.
 *
 * Throttling is used to avoid expensive attempts to allocate large pages
 * for the kernel heap when a lot of successive attempts to do so fail.
 */

/*
 * Freed pages accumulate on a garbage list until segkmem is ready,
 * at which point we call segkmem_gc() to free it all.
 */

/*
 * Allocations from the hat_memload arena add VM_MEMLOAD to their
 * vmflags so that segkmem_xalloc() can inform the hat layer that it needs
 * to take steps to prevent infinite recursion.  HAT allocations also
 * must be non-relocatable to prevent recursive page faults.
 */

/*
 * Allocations from the static_arena arena (or any other arena that uses
 * segkmem_alloc_permanent()) require non-relocatable (permanently
 * wired) memory pages, since these pages are referenced by physical
 * as well as virtual address.
 */

/*
 * Initialize kernel heap boundaries.
 */

	/*
	 * Bias the heap_lp start address by kmem64_sz to reduce collisions
	 * in the 4M kernel TSB between the kmem64 area and heap_lp.
	 */

	/*
	 * If this platform has a 'core' heap area, then the space for
	 * overflow module text should be carved out of the end of that
	 * heap.  Otherwise, it gets carved out of the general purpose
	 * heap.
	 */

	/*
	 * Reserve space for the large page heap.  If large pages for the
	 * kernel heap are enabled, the large page heap arena will be created
	 * later in the boot sequence in segkmem_heap_lp_init().  Otherwise
	 * the allocated range will be returned to the heap_arena.
	 */

	/*
	 * Remove the already-spoken-for memory range [kernelheap, first_avail).
	 */

	/*
	 * Create a set of arenas for memory with static translations
	 * (e.g. VA -> PA translations cannot change).  Since using
	 * kernel pages by physical address implies it isn't safe to
	 * walk across page boundaries, the static_arena quantum must
	 * be PAGESIZE.  Any kmem caches that require static memory
	 * should source from static_arena, while direct allocations
	 * should only use static_alloc_arena.
	 */

	/*
	 * Create an arena for translation data (ptes, hmes, or hblks).
	 * We need an arena for this because hat_memload() is essential
	 * to vmem_populate().
	 *
	 * Note: any kmem cache that allocates from hat_memload_arena
	 * must be created as a KMC_NOHASH cache (i.e. no external slab
	 * and bufctl structures to allocate) so that slab creation doesn't
	 * require anything more than a single vmem_alloc().
	 */
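/*
 * Illustrative sketch (not from the original source): how the large page
 * heap cap described above could be derived.  By default the cap is 1/8 of
 * physmem; a nonzero segkmem_kmemlp_pcnt overrides that with a percentage
 * of physmem.  The function and parameter names are hypothetical.
 */
#include <sys/types.h>

static size_t
kmemlp_max_sketch(pgcnt_t physmem_pages, size_t pagesize, uint_t lp_pcnt)
{
        size_t physbytes = (size_t)physmem_pages * pagesize;

        if (lp_pcnt != 0)
                return (physbytes / 100 * lp_pcnt);	/* explicit percentage */
        return (physbytes / 8);				/* default: 1/8 of physmem */
}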
"boot_mapin: page_resv failed");
panic(
"boot_mapin(): No pp for pfnum = %lx",
pfnum);
	/*
	 * We must break up any large pages that may have constituent
	 * pages being utilized for BOP_ALLOC()'s before calling
	 * page_numtopp().  The locking code (i.e. page_reclaim())
	 * can't deal with them.
	 */

	panic("boot_alloc: pp is NULL or free");
	/*
	 * If the cage is on but doesn't yet contain this page,
	 * mark it as non-relocatable.
	 */

/*
 * Get pages from boot and hash them into the kernel's vp.
 * Used after page structs have been allocated, but before segkmem is ready.
 */

	prom_panic("boot_alloc: attempt to allocate memory after "
	    /* ... */);

	panic("boot_alloc: BOP_ALLOC failed");
	panic("segkmem_fault: bad args");

	/*
	 * If it is one of segkp pages, call segkp_fault.
	 */

	case F_SOFTLOCK:	/* lock down already-loaded translations */

		/*
		 * Hmm, no page.  Does a kernel mapping
		 * exist for it?
		 */

	panic("segkmem_setprot: bad args");
	/*
	 * If it is one of segkp pages, call segkp.
	 */

/*
 * This is a dummy segkmem function overloaded to call segkp
 * when segkp is under the heap.
 */

	/*
	 * If it is one of segkp pages, call into segkp.
	 */

/*
 * This is a dummy segkmem function overloaded to call segkp
 * when segkp is under the heap.
 */

	/*
	 * If it is one of segkp pages, call into segkp.
	 */

	/*
	 * If we are about to start dumping the range of addresses we
	 * carved out of the kernel heap for the large page heap, walk
	 * heap_lp_arena to find what segments are actually populated.
	 */

/*
 * The kernel's heap_arena (represented by kvseg) is a very large
 * VA space, most of which is typically unused.  To speed up dumping
 * we use vmem_walk() to quickly find the pieces of heap_arena that
 * are actually in use.  We do the same for heap32_arena and
 * heap_core.
 *
 * We specify VMEM_REENTRANT to vmem_walk() because dump_addpage()
 * may ultimately need to allocate memory.  Reentrant walks are
 * necessarily imperfect snapshots.  The kernel heap continues
 * to change during a live crash dump, for example.  For a normal
 * crash dump, however, we know that there won't be any other threads
 * messing with the heap.  Therefore, at worst, we may fail to dump
 * the pages that get allocated by the act of dumping; but we will
 * always dump every page that was allocated when the walk began.
 *
 * The other segkmem segments are dense (fully populated), so there's
 * no need to use this technique when dumping them.
 *
 * Note: when adding special dump handling for any new sparsely-
 * populated segments, be sure to add similar handling to the ::kgrep
 * code in mdb.
 */

	/*
	 * We don't want to dump pages attached to kzioseg since they
	 * contain file data from ZFS.  If this page's segment is
	 * kzioseg, return instead of writing it to the dump device.
	 */

/*
 * lock/unlock kmem pages over a given range [addr, addr+len).
 * Returns a shadow list of pages in ppp.  If there are holes
 * in the range (e.g. some of the kernel mappings do not have
 * underlying page_ts) returns ENOTSUP so that as_pagelock()
 * will handle the range via as_fault(F_SOFTLOCK).
 */

	/*
	 * If it is one of segkp pages, call into segkp.
	 */

		return (ENOTSUP);	/* take the slow path */
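/*
 * Illustrative sketch (not the original segkmem_dump code): the vmem_walk()
 * technique described above.  Only the in-use spans of the sparse heap
 * arena are visited, and VMEM_REENTRANT makes the walk safe even though
 * the callback may itself allocate memory.  The callback and function
 * names below are hypothetical.
 */
#include <sys/types.h>
#include <sys/param.h>
#include <sys/vmem.h>
#include <sys/dumphdr.h>
#include <vm/as.h>
#include <vm/hat.h>
#include <vm/seg_kmem.h>

static void
dump_span_sketch(void *arg, void *start, size_t size)
{
        struct as *as = arg;
        caddr_t addr = start;
        caddr_t addr_end = addr + size;

        /* add every mapped page of this in-use span to the dump */
        for (; addr < addr_end; addr += PAGESIZE) {
                pfn_t pfn = hat_getpfnum(kas.a_hat, addr);
                if (pfn != PFN_INVALID)
                        dump_addpage(as, addr, pfn);
        }
}

static void
dump_sparse_heap_sketch(struct as *as)
{
        /* visit only the allocated spans of the (mostly empty) heap arena */
        vmem_walk(heap_arena, VMEM_ALLOC | VMEM_REENTRANT,
            dump_span_sketch, as);
}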
/*
 * This is a dummy segkmem function overloaded to call segkp
 * when segkp is under the heap.
 */

	/*
	 * If it is one of segkp pages, call into segkp.
	 */

/*
 * Allocate pages to back the virtual address range [addr, addr + size).
 * If addr is NULL, allocate the virtual address space as well.
 */

	/*
	 * Under certain conditions, we need to let the HAT layer know
	 * that it cannot safely allocate memory.  Allocations from
	 * the hat_memload vmem arena always need this, to prevent
	 * infinite recursion.
	 *
	 * In addition, the x86 hat cannot safely do memory
	 * allocations while in vmem_populate(), because there
	 * is no simple bound on its usage.
	 */

		halt("Memory allocation between bop_alloc() and "
		    /* ... */);

	/*
	 * There's not a lot of memory to go around during boot,
	 * so recycle it if we can.
	 */

		panic("segkmem_alloc: boot_alloc failed");
/*
 * Any changes to this routine must also be carried over to
 * devmap_free_pages() in the seg_dev driver.  This is because
 * we currently don't have a special kernel segment for non-paged
 * kernel memory that is exported by drivers to user space.
 */

		panic("segkmem_free: page not found");
			/*
			 * Some other thread has a sharelock.  Wait for
			 * it to drop the lock so we can free this page.
			 */

			panic("segkmem_free: page not found");
	/* Clear p_lckcnt so page_destroy() doesn't update availrmem */

/*
 * Legacy entry points from here to end of file.
 */

/*
 * segkmem_page_create_large() allocates a large page to be used for the kmem
 * caches.  If kpr is enabled we ask for a relocatable page unless requested
 * otherwise.  If kpr is disabled we have to ask for a non-reloc page.
 */

/*
 * Allocate a large page to back the virtual address range
 * [addr, addr + size).  If addr is NULL, allocate the virtual address
 * space as well.
 */

	/*
	 * Allocate an array we need for hat_memload_array.
	 * We use a separate arena to avoid recursion.
	 * We will not need this array when hat_memload_array learns pp++.
	 */

	/* create all the pages */

	/* at this point we have all the resources to complete the request */

	/*
	 * Load the locked entry.  It's OK to preload the entry into the
	 * TSB since we now support large mappings in the kernel TSB.
	 */

		panic("segkmem_free_one_lp: page not found");
	/* page_unresv() is done by the caller */

/*
 * This function is called to import new spans into vmem arenas like
 * kmem_default_arena and kmem_oversize_arena.  It first tries to import
 * spans from the large page arena - kmem_lp_arena.  In order to do this it
 * might have to "upgrade the requested size" to the kmem_lp_arena quantum.
 * If it was not able to satisfy the upgraded request it then calls the
 * regular segkmem_alloc() that satisfies the request by importing from the
 * "*vmp" arena.
 */

	/* try to update the throttle value */

	/*
	 * When we get above the throttle, start an exponential
	 * backoff at trying large pages and reaping.
	 */

		/*
		 * We are low on free memory in kmem_lp_arena, so we let
		 * only one thread allocate the heap_lp quantum-size chunk
		 * that everybody is going to share.
		 */

			/* we are not the first one - wait */

			/*
			 * We are the first one; make sure we import
			 * a large page.
			 */

			/*
			 * The VM_ABORT flag prevents sleeps in vmem_xalloc
			 * when large pages are not available.  In that case
			 * this allocation attempt will fail and we will retry
			 * the allocation with small pages.  We also do not
			 * want to panic if this allocation fails because we
			 * are going to retry.
			 */

	/* if large page throttling is not started yet, start it */

/*
 * segkmem_alloc_lpi() imports virtual memory from the large page heap arena
 * into the kmem_lp arena.  In the process it maps the imported segment with
 * large pages.
 */

	/* do not allow the large page heap to grow beyond its limits */

/*
 * segkmem_free_lpi() returns virtual memory back into the large page heap
 * arena from the kmem_lp arena.  Before doing this it unmaps the segment
 * and frees the large pages used to map it.
 */

/*
 * This function is called at system boot time by kmem_init right after the
 * /etc/system file has been read.  It checks, based on hardware configuration
 * and /etc/system settings, whether the system is going to use large pages.
 * The initialization necessary to actually start using large pages
 * happens later in the process, after segkmem_heap_lp_init() is called.
 */

	/* get a platform dependent value of large page size for kernel heap */

		/*
		 * Put the virtual space reserved for the large page kernel
		 * heap back to the regular heap.
		 */

	/* set heap_lp quantum if necessary */

	/* set kmem_lp quantum if necessary */

	/* set total amount of memory allowed for large page kernel heap */

	/* fix lp kmem preallocation request if necessary */

	/* create large page heap arena */

	/* This arena caches memory already mapped by large pages */

	/*
	 * This arena is used for the array of page_t pointers necessary
	 * to call hat_memload_array.
	 */

	/* preallocate some memory for the lp kernel heap */
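/*
 * Illustrative sketch (not the original import routine): the "try the large
 * page arena first, then fall back" flow described above.  The request is
 * rounded up to the kmem_lp_arena quantum, and VM_ABORT keeps the attempt
 * from sleeping so the caller can retry with small pages.  The function
 * name is hypothetical, and the two externs are assumed globals standing in
 * for this file's actual tunables.
 */
#include <sys/types.h>
#include <sys/vmem.h>
#include <vm/seg_kmem.h>

extern vmem_t *kmem_lp_arena;		/* assumed: large page cache arena */
extern size_t segkmem_kmemlp_quantum;	/* assumed: kmem_lp_arena quantum */

static void *
kmem_lp_import_sketch(vmem_t *vmp, size_t *sizep, int vmflag)
{
        void *addr;

        if (*sizep <= segkmem_kmemlp_quantum) {
                /* upgrade the request to a whole large-page-backed quantum */
                addr = vmem_alloc(kmem_lp_arena, segkmem_kmemlp_quantum,
                    vmflag | VM_ABORT);
                if (addr != NULL) {
                        *sizep = segkmem_kmemlp_quantum;
                        return (addr);
                }
        }

        /* large pages unavailable: satisfy the request from the regular arena */
        return (segkmem_alloc(vmp, *sizep, vmflag));
}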