/* kmem.c, revision d7dba7e519e96f726807ca55f6a17fef3f90092f */
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Copyright 2011 Joyent, Inc.  All rights reserved.
 */

        mdb_warn("kmem_cpu_cache doesn't support global walks");
        mdb_warn("slab %p isn't in cache %p (in cache %p)\n",
        mdb_warn("kmem_slab doesn't support global walks\n");
        mdb_warn("kmem_slab_partial doesn't support global walks\n");
/*
 * Some consumers (umem_walk_step(), in particular) require at
 * least one callback if there are any buffers in the cache.  So
 * if there are *no* partial slabs, report the first full slab, if
 * any.
 *
 * Yes, this is ugly, but it's cleaner than the other possibilities.
 */

        mdb_printf("%-?s %-25s %4s %6s %8s %8s\n",
            "ADDR", "NAME", "FLAG", "CFLAG", "BUFSIZE", "BUFTOTL");
        mdb_printf("%s", "Print kernel memory caches.\n\n");

        "  name of kmem cache (or matching partial name)\n"
        "\n"
        "ADDR\t\taddress of kmem cache\n"
        "NAME\t\tname of kmem cache\n"
        "FLAG\t\tvarious cache state flags\n"
        "CFLAG\t\tcache creation flags\n"
        "BUFSIZE\tobject size in bytes\n"
        "BUFTOTL\tcurrent total buffers in cache (allocated and free)\n");
/*
 * minbucketsize does not apply to the first bucket reserved
 * for completely allocated slabs.
 */

/*
 * The first printed bucket is reserved for completely allocated slabs.
 * Passing (buckets - 1) excludes that bucket from the generated
 * distribution, since we're handling it as a special case.
 */

/*
 * Print bucket ranges in descending order after the first bucket for
 * completely allocated slabs, so a person can see immediately whether
 * or not there is fragmentation without having to scan possibly
 * multiple screens of output.  Starting at (buckets - 2) excludes the
 * extra terminating bucket.
 */
        for (i = buckets - 2; i >= 0; i--) {
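/*
 * Illustrative sketch (not part of the original file): printing a slab
 * distribution with the full-slab bucket first, then the remaining buckets
 * in descending order, as the comment above describes.  The bucket array
 * and field widths are hypothetical stand-ins for the real dcmd's state.
 */
static void
print_slab_dist_sketch(const uint_t *bucket, int buckets)
{
        int i;

        /* bucket[buckets - 1] is reserved for completely allocated slabs */
        mdb_printf("%10s: %u\n", "complete", bucket[buckets - 1]);

        /* remaining buckets, most-allocated range first */
        for (i = buckets - 2; i >= 0; i--)
                mdb_printf("%10d: %u\n", i, bucket[i]);
}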
* The "kmem_partial_slab" walker reports the first full slab if there * are no partial slabs (for the sake of consumers that require at least * one callback if there are any buffers in the cache). int ksu_refcnt;
/* count of allocated buffers on slab */ "",
"",
"Partial",
"",
"Unused",
"");
"Cache Name",
"Slabs",
"Slabs",
"Buffers",
"Buffers",
"Waste");
"-------------------------",
"--------",
"--------",
"---------",
        /* match either -n or -N */

        /* +1 to include a zero bucket */

            "Display slab usage per kmem cache.\n\n");

        "  name of kmem cache (or matching partial name)\n"
        "  exact name of kmem cache\n"
        "  Print a distribution of allocated buffers per slab using at\n"
        "  most maxbins bins.  The first bin is reserved for completely\n"
        "  allocated slabs.  Setting maxbins to zero (-b 0) has the same\n"
        "  effect as specifying the maximum allocated buffers per slab\n"
        "  or setting minbinsize to 1 (-B 1).\n"
        "  Print a distribution of allocated buffers per slab, making\n"
        "  all bins (except the first, reserved for completely allocated\n"
        "  slabs) at least minbinsize buffers apart.\n"
        "  -v    verbose output: List the allocated buffer count of each partial\n"
        "        slab on the free list in order from front to back to show how\n"
        "        closely the slabs are ordered by usage.  For example\n"
        "\n"
        "          10 complete, 3 partial (8): 7 3 1\n"
        "\n"
        "        means there are thirteen slabs with eight buffers each, including\n"
        "        three partially allocated slabs with fewer than all eight buffers\n"
        "\n"
        "        Buffer allocations are always from the front of the partial slab\n"
        "        list.  When a buffer is freed from a completely used slab, that\n"
        "        slab is added to the front of the partial slab list.  Assuming\n"
        "        that all buffers are equally likely to be freed soon, the\n"
        "        desired order of partial slabs is most-used at the front of the\n"
        "        list and least-used at the back (as in the example above).\n"
        "        However, if a slab contains an allocated buffer that will not\n"
        "        soon be freed, it would be better for that slab to be at the\n"
        "        front where all of its buffers can be allocated.  Taking a slab\n"
        "        off the partial slab list (either with all buffers freed or all\n"
        "        buffers allocated) reduces cache fragmentation.\n"
        "\n"
        "        A slab's allocated buffer count representing a partial slab (9 in\n"
        "        the example below) may be marked as follows:\n"
        "\n"
        "        9*   An asterisk indicates that kmem has marked the slab non-\n"
        "        reclaimable because the kmem client refused to move one of the\n"
        "        slab's buffers.  Since kmem does not expect to completely free the\n"
        "        slab, it moves it to the front of the list in the hope of\n"
        "        completely allocating it instead.  A slab marked with an asterisk\n"
        "        stays marked for as long as it remains on the partial slab list.\n"
        "\n"
        "Column\t\tDescription\n"
        "Cache Name\t\tname of kmem cache\n"
        "Slabs\t\t\ttotal slab count\n"
        "Partial Slabs\t\tcount of partially allocated slabs on the free list\n"
        "Buffers\t\ttotal buffer count (Slabs * (buffers per slab))\n"
        "Unused Buffers\tcount of unallocated buffers across all partial slabs\n"
        "Waste\t\t\t(Unused Buffers / Buffers); does not include space\n"
        "\t\t\tfor accounting structures (debug mode), slab\n"
        "\t\t\tcoloring (incremental small offsets to stagger\n"
        "\t\t\tbuffer alignment), or the per-CPU magazine layer\n");
        mdb_warn("kmem_hash doesn't support global walks\n");
/*
 * Find the address of the bufctl structure for the address 'buf' in cache
 * 'cp', which is at address caddr, and place it in *out.
 */
                mdb_warn("unable to read hash bucket for %p in cache %p",
/*
 * If cpu 0 has a non-zero magsize, it must be correct.  Caches
 * with KMF_NOMAGAZINE have disabled their magazine layers, so
 * it is okay to return 0 for them.
 */
                mdb_warn("unable to read 'kmem_magtype'");

                mdb_warn("cache '%s' has invalid magtype pointer (%p)\n",
/*
 * Returns an upper bound on the number of allocated buffers in a given
 * cache.
 */

                mdb_warn("cache %p's magazine layer holds more buffers "
                    "than the slab layer.\n", addr);

        for (i = 0; i < rounds; i++) { \

                        mdb_warn("%d magazines exceeds fudge factor\n", \
/*
 * Read the magtype out of the cache, after verifying the pointer's
 * sanity.
 */

/*
 * There are several places where we need to go buffer hunting:
 * the per-CPU loaded magazine, the per-CPU spare full magazine,
 * and the full magazine list in the depot.
 *
 * For an upper bound on the number of buffers in the magazine
 * layer, we have the number of magazines on the cache_full
 * list plus at most two magazines per CPU (the loaded and the
 * spare).  Toss in 100 magazines as a fudge factor in case this
 * is live (the number "100" comes from the same fudge factor in
 * crash(1M)).
 */
                mdb_warn("magazine size for cache %p unreasonable (%x)\n",
/*
 * First up: the magazines in the depot (i.e. on the cache_full list).
 */
                        break;  /* cache_full list loop detected */

        dprintf(("cache_full list done\n"));
/*
 * Now whip through the CPUs, snagging the loaded magazines
 * and full spares.
 *
 * In order to prevent inconsistent dumps, rounds and prounds
 * are copied aside before dumping begins.
 */
                dprintf(("reading %d previously loaded rounds\n",
/*
 * If KMF_AUDIT is not set, we know that we're looking at a
 * kmem_bufctl_t.
 */
        (void) memset(&b, 0, sizeof (b));

        char *kmw_valid;        /* to keep track of freed buffers */

        mdb_warn("kmem walk doesn't support global walks\n");
/*
 * First we need to figure out how many CPUs are configured in the
 * system to know how much to slurp out.
 */

/*
 * It's easy for someone to hand us an invalid cache address.
 * Unfortunately, it is hard for this walker to survive an
 * invalid cache cleanly.  So we make sure that:
 *
 *      1. the vmem arena for the cache is readable,
 *      2. the vmem arena's quantum is a power of 2,
 *      3. our slabsize is a multiple of the quantum, and
 *      4. our chunksize is >0 and less than our slabsize.
 */

/*
 * If they ask for bufctls, but it's a small-slab cache,
 * there is nothing to report.
 */
        dprintf(("bufctl requested, not KMF_HASH (flags: %p)\n",
/*
 * If they want constructed buffers, but there's no constructor or
 * the cache has DEADBEEF checking enabled, there is nothing to report.
 */

/*
 * Read in the contents of the magazine layer.
 */

/*
 * We have all of the buffers from the magazines; if we are walking
 * allocated buffers, sort them so we can bsearch them later.
 */

/*
 * When walking allocated buffers in a KMF_HASH cache, we walk the
 * hash table instead of the slab layer.
 */

/*
 * If we are walking freed buffers, we only need the
 * magazine layer plus the partially allocated slabs.
 * To walk allocated buffers, we need all of the slabs.
 */

/*
 * For small-slab caches, we read in the entire slab.  For
 * freed buffers, we can just walk the freelist.  For
 * allocated buffers, we use a 'valid' array to track
 * the freed buffers.
 */

/*
 * First, handle the 'kmem_hash' layered walk case.
 */

/*
 * We have a buffer which has been allocated out of the
 * global layer.  We need to make sure that it's not
 * actually sitting in a magazine before we report it as
 * allocated.
 */

/*
 * If we're walking freed buffers, report everything in the
 * magazine layer before processing the first slab.
 */
        for (i = 0; i < magcnt; i++) {

/*
 * If they want constructed buffers, we're finished, since the
 * magazine layer holds them all.
 */

/*
 * Handle the buffers in the current slab.
 */

/*
 * Set up the valid map as fully allocated -- we'll punch
 * out the freelist.
 */

/*
 * Walk the slab's freelist.
 */

/*
 * Since we could be in the middle of allocating a buffer,
 * our refcnt could be one higher than it ought to be.  So we
 * check one further on the freelist than the count allows.
 */
                "slab %p in cache %p freelist too short by %d\n",
                mdb_warn("failed to read bufctl ptr at %p",

/*
 * Otherwise the buffer is (or should be) in the slab
 * that we've read in; determine its offset in the
 * slab, validate that it's not corrupt, and add to
 * our base address to find the kmem_bufctl_t.  (Note
 * that we don't need to add the size of the bufctl
 * to our offset calculation because of the slop that's
 * allocated for the buffer at the base address.)
 */
                    " in slab %p in cache %p\n", bcp,

/*
 * This is very wrong; we have managed to find
 * a buffer in the slab which shouldn't
 * actually be here.  Emit a warning, and
 * try to continue.
 */

/*
 * We have found a buffer on the slab's freelist;
 * report this freed buffer.
 */
        dprintf(("slab %p in cache %p freelist too long (%p)\n",
/*
 * If we are walking freed buffers, the loop above handled reporting
 * them all.
 */

                mdb_warn("impossible situation: small-slab KM_BUFCTL walk for "

/*
 * Report allocated buffers, skipping buffers in the magazine layer.
 * We only get this far for small-slab caches.
 */
                        continue;       /* on slab freelist */

                        continue;       /* in magazine layer */

/*
 * Buffers allocated from NOTOUCH caches can also show up as freed
 * memory in other caches.  This can be a little confusing, so we
 * don't walk NOTOUCH caches when walking all caches (thereby assuring
 * that "::walk kmem" and "::walk freemem" yield disjoint output).
 */

        mdb_warn("bufctl_history walk doesn't support global walks\n");
/*
 * Sometimes the first log entry matches the base bufctl; in that
 * case, skip the base bufctl.
 */

/*
 * The bufctl is only valid if the address, cache, and slab are
 * correct.  We also check that the timestamp is decreasing, to
 * prevent infinite loops.
 */

/*
 * By default (global walk), walk the kmem_transaction_log.  Otherwise
 * read the log whose kmem_log_header_t is stored at walk_addr.
 */
                mdb_warn("failed to read 'kmem_transaction_log'");
            (int(*)(const void *, const void *))bufctlcmp);
        mdb_warn("allocdby walk doesn't support global walks\n");
        mdb_printf("%-?s %12s %s\n", "BUFCTL", "TIMESTAMP", "CALLER");
/*
 * Return a string describing the address in relation to the given thread's
 * stack.
 *
 * - If the thread state is TS_FREE, return " (inactive interrupt thread)".
 *
 * - If the address is above the stack pointer, return an empty string
 *   signifying that the address is active.
 *
 * - If the address is below the stack pointer, and the thread is not on proc,
 *   return " (below sp)".
 *
 * - If the address is below the stack pointer, and the thread is on proc,
 *   return " (possibly below sp)".  Depending on context, we may or may not
 *   care about this distinction.
 */
        return (" (inactive interrupt thread)");

/*
 * Check to see if we're on the panic stack.  If so, ignore t_sp, as it
 * no longer relates to the thread's real stack.
 */
        return (" (possibly below sp)");
/*
 * Additional state for the kmem and vmem ::whatis handlers.
 */

        /* call one of our dcmd functions with "-v" and the provided address */

        /* validate our arguments and read in the buftag */

        /* validate the buffer state and read in the callers */

        /* If there aren't any filled-in callers, bail */

        /* Everything's done and checked; print them out */
        for (i = 1; i < count; i++) {

        /* LINTED pointer cast may result in improper alignment */

        /* for KMF_LITE caches, try to print out the previous callers */

        /* We're not interested in anything but alloc and free segments */

/*
 * If we're not printing it separately, provide the vmem_seg
 * pointer if it has a stack trace.
 */

        /* It must overlap with the slab data, or it's not interesting */

        /* Override the '-b' flag as necessary */

/*
 * If more than two buffers live on each slab, figure out if we're
 * interested in anything in any slab before doing the more expensive
 * per-slab work.
 */
                mdb_warn("can't find kmem_slab walker");
/*
 * We have searched for allocated memory; now search for freed memory.
 */

/*
 * Often, one calls ::whatis on an address from a thread structure.
 * We use this opportunity to short circuit this case...
 */
            "allocated as a thread structure\n");

/*
 * This assumes that t_stk is the end of the stack, but it's really
 * only the initial stack pointer for the thread.  Arguments to the
 * initial procedure, SA(MINFRAME), etc. are all after t_stk.  So
 * that 't->t_stk::whatis' reports "part of t's stack", we include
 * t_stk in the range (the "+ 1", below), but the kernel should
 * really include the full stack bounds where we can find it.
 */

/*
 * Since we're searching for addresses inside a module, we report
 * the module that contains the address.
 */
                mdb_warn("couldn't read symbol header for %p's module", addr);
        /* round our found pointer down to the page_t base. */

            "allocated as a page structure\n");

                mdb_warn("couldn't find modctl walker");
/*
 * Now search all thread stacks.  Yes, this is a little weak; we
 * can save a lot of work by first checking to see if the
 * address is in segkp vs. segkmem.  But hey, computers are fast.
 */
                mdb_warn("couldn't find thread walker");

                mdb_warn("couldn't find memseg walker");

                mdb_warn("unable to readvar \"kmem_msb_arena\"");
/*
 * We process kmem caches in the following order:
 *
 *      non-KMC_NOTOUCH, non-metadata   (typically the most interesting)
 *      metadata                        (can be huge with KMF_AUDIT)
 *      KMC_NOTOUCH, non-metadata       (see kmem_walk_all())
 */
                mdb_warn("couldn't find kmem_cache walker");

                mdb_warn("couldn't find vmem_postfix walker");
        for (i = 0; i < NCPU; i++) {

                    "failed to read cache_bufsize for cache at %p",

                mdb_warn("failed to read 'kmem_transaction_log'");

                mdb_warn("expected 'cpu' to be of size %d; found %d\n",
        for (i = 0; i < NCPU; i++) {

                mdb_warn("cannot read cpu %d's log header at %p",

        mdb_printf("%3s %-?s %-?s %16s %-?s\n",
            "CPU", "ADDR", "BUFADDR",

/*
 * If we have been passed an address, print out only log entries
 * corresponding to that address.  If opt_b is specified, then interpret
 * the address as a bufctl.
 */

            "Display the contents of kmem_bufctl_audit_ts, with optional "
            "filtering.\n\n");
" -v Display the full content of the bufctl, including its stack trace\n" " -h retrieve the bufctl's transaction history, if available\n" " filter out bufctls not involving the buffer at addr\n" " filter out bufctls without the function/PC in their stack trace\n" " filter out bufctls timestamped before earliest\n" " filter out bufctls timestamped after latest\n" " filter out bufctls not involving thread\n");
for (i = 0; i <
argc; i++)
* When in history mode, we treat each element as if it * were in a seperate loop, so that the headers group * bufctls with similar histories. mdb_warn(
"unable to walk bufctl_history");
"%<u>%16s %16s %16s %16s%</u>\n",
"ADDR",
"BUFADDR",
"TIMESTAMP",
"THREAD",
"",
"CACHE",
"LASTLOG",
"CONTENTS");
"ADDR",
"BUFADDR",
"TIMESTAMP",
"THREAD",
"CALLER");
/*
 * Guard against bogus bc_depth in case the bufctl is corrupt or
 * the address does not really refer to a bufctl.
 */

/*
 * We were provided an exact symbol value; any
 * address in the function is valid.
 */
        for (i = 0; i < depth; i++)

            "%<b>%16p%</b> %16p %16llx %16p\n"

        for (i = 0; i < depth; i++)

        for (i = 0; i < depth; i++) {
/*
 * Verify that buf is filled with the pattern pat.
 */

/*
 * Verify that btp->bt_bxstat == (bcp ^ pat).
 */

/*
 * Verify the integrity of a free block of memory by checking
 * that it is filled with 0xdeadbeef and that its buftag is sane.
 */

        /* Read the buffer to check. */

                mdb_printf("buffer %p (free) seems corrupted, at %p\n",

/*
 * When KMF_LITE is set, buftagp->bt_redzone is used to hold
 * the first bytes of the buffer, hence we cannot check for redzone
 * corruption.
 */
                    "have a corrupt redzone pattern\n", addr);
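/*
 * Illustrative sketch (not part of the original file): checking that a
 * freed buffer still holds the 0xdeadbeef fill pattern, as described above.
 * The buffer is assumed to have been read into local memory already; the
 * return value is the byte offset of the first mismatch, or -1 if clean.
 */
static intptr_t
verify_pattern_sketch(const uint64_t *buf_arg, size_t size, uint64_t pat)
{
        const uint64_t *bufend = buf_arg + size / sizeof (uint64_t);
        const uint64_t *buf;

        for (buf = buf_arg; buf < bufend; buf++)
                if (*buf != pat)
                        return ((intptr_t)buf - (intptr_t)buf_arg);

        return (-1);    /* every word of buf matches pat */
}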
/*
 * Confirm bufctl pointer integrity.
 */

/*
 * Verify that the buftag of an allocated buffer makes sense with respect
 * to the buffer.
 */

        /* Read the buffer to check. */

/*
 * There are two cases to handle:
 * 1. If the buf was alloc'd using kmem_cache_alloc, it will have
 *    0xfeedfacefeedface at the end of it
 * 2. If the buf was alloc'd using kmem_alloc, it will have
 *    0xbb just past the end of the region in use.  At the buftag,
 *    it will have 0xfeedface (or, if the whole buffer is in use,
 *    0xfeedface & bb000000 or 0xfeedfacf & 000000bb depending on
 *    endianness), followed by 32 bits containing the offset of the
 *    0xbb byte in the buffer.
 *
 * Finally, the two 32-bit words that comprise the second half of the
 * buftag should xor to KMEM_BUFTAG_ALLOC.
 */
                    "redzone size encoding\n", addr);

                    "redzone signature\n", addr);

                    "corrupt buftag\n", addr);

                    "redzone checking enabled\n", addr,
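/*
 * Illustrative sketch (not part of the original file): the final buftag
 * consistency check described above.  kmem_buftag_t and KMEM_BUFTAG_ALLOC
 * are the kernel's; the helper itself is hypothetical.
 */
static int
buftag_alloc_sane_sketch(const kmem_buftag_t *btp)
{
        /*
         * For an allocated buffer, bt_bufctl and bt_bxstat must xor to
         * KMEM_BUFTAG_ALLOC; anything else means the buftag is corrupt.
         */
        return (((uintptr_t)btp->bt_bufctl ^ btp->bt_bxstat) ==
            KMEM_BUFTAG_ALLOC);
}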
/*
 * Table mode; don't print out every corrupt buffer.
 */
        char *s = "";   /* optional s in "buffer[s]" */

/*
 * This is the more verbose mode, when the user has
 * typed addr::kmem_verify.  If the cache was clean,
 * nothing will have yet been printed.  So say something.
 */

/*
 * If the user didn't specify a cache to verify, we'll walk all
 * kmem_cache's, specifying ourself as a callback for each...
 * this is the equivalent of '::walk kmem_cache .::kmem_verify'.
 */
            "Cache Name", "Addr", "Cache Integrity");
                mdb_warn("couldn't find %p's parent (%p)\n",

/*
 * The "vmem_postfix" walk walks the vmem arenas in post-fix order; all
 * children are visited before their parent.  We perform the postfix walk
 * iteratively (rather than recursively) to allow mdb to regain control
 * after each callback.
 */

/*
 * If this node is marked, then we know that we have already visited
 * all of its children.  If the node has any siblings, they need to
 * be visited next; otherwise, we need to visit the parent.  Note
 * that vp->vn_marked will only be zero on the first invocation of
 * the walker.
 */

/*
 * We have neither a parent, nor a sibling, and we
 * have already been visited; we're done.
 */

/*
 * Before we visit this node, visit its children.
 */

/*
 * vmem segments can't have type 0 (this should be added to
 * vmem_impl.h).
 */
                mdb_warn("failed to read 'vmem_seg_size'");
"TOTAL",
"SUCCEED",
"FAIL");
mdb_printf(
"%0?p %-*s %10llu %12llu %9llu %5llu\n",
"Display the contents of vmem_seg_ts, with optional filtering.\n\n" "A vmem_seg_t represents a range of addresses (or arbitrary numbers),\n" "representing a single chunk of data. Only ALLOC segments have debugging\n" " -v Display the full content of the vmem_seg, including its stack trace\n" " -s report the size of the segment, instead of the end address\n" " filter out segments without the function/PC in their stack trace\n" " filter out segments timestamped before earliest\n" " filter out segments timestamped after latest\n" " filer out segments smaller than minsize\n" " filer out segments larger than maxsize\n" " filter out segments not involving thread\n" " filter out segments not of type 'type'\n" "%<u>%16s %4s %16s %16s %16s%</u>\n",
"ADDR",
"TYPE",
"START",
"END",
"SIZE",
"",
"",
"THREAD",
"TIMESTAMP",
"");
mdb_printf(
"%?s %4s %?s %?s %s\n",
"ADDR",
"TYPE",
"START",
size?
"SIZE" :
"END",
"WHO");
                mdb_warn("\"%s\" is not a recognized vmem_seg type\n",

        /*
         * debug info, when present, is only accurate for VMEM_ALLOC segments
         */
                return (DCMD_OK);
                /* not enough info */

            sizeof (c), &sym) != -1 &&

/*
 * We were provided an exact symbol value; any
 * address in the function is valid.
 */
        for (i = 0; i < depth; i++)

        for (i = 0; i < depth; i++) {

        for (i = 0; i < depth; i++) {

            c, sizeof (c), &sym) == -1)
"failed to read cache_bufsize for cache at %p",
for (i = 0; i <
depth; i++)
const char *
logname =
"kmem_transaction_log";
mdb_warn(
"failed to read %s log header pointer");
/*
 * As the final lure for die-hard crash(1M) users, we provide ::kmausers here.
 * The first piece is a structure which we use to accumulate kmem_cache_t
 * addresses of interest.  The kmc_add is used as a callback for the kmem_cache
 * walker; we either add all caches, or ones named explicitly as arguments.
 */

        const char *kmc_name;           /* Name to match (or NULL) */
        int kmc_size;                   /* Size of kmc_caches array */

/*
 * If we have a match, grow our array (if necessary), and then
 * add the virtual address of the matching cache to our list.
 */

/*
 * The second piece of ::kmausers is a hash table of allocations.  Each
 * allocation owner is identified by its stack trace and data_size.  We then
 * track the total bytes of all such allocations, and the number of allocations
 * to report at the end.  Once we have a list of caches, we walk through the
 * allocated bufctls of each, and update our hash table accordingly.
 */

        int kmu_size;                   /* Total number of entries */

/*
 * If the hash table is full, double its size and rehash everything.
 */

/*
 * Finish computing the hash signature from the stack trace, and then
 * see if the owner is in the hash table.  If so, update our stats.
 */
        for (i = 0; i < depth; i++)

        for (i = 0; i < depth; i++) {

/*
 * If the owner is not yet hashed, grab the next element and fill it
 * in based on the allocation information.
 */
        for (i = 0; i < depth; i++)

/*
 * When ::kmausers is invoked without the -f flag, we simply update our hash
 * table with the information from each allocated bufctl.
 */

/*
 * When ::kmausers is invoked with the -f flag, we print out the information
 * for each bufctl as well as updating the hash table.
 */
        mdb_printf("size %d, addr %p, thread %p, cache %s\n",

        for (i = 0; i < depth; i++)
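/*
 * Illustrative sketch (not part of the original file): hashing an allocation
 * owner by data size and stack trace, per the scheme described above.  The
 * kmowner_t members are modeled on the dcmd's bookkeeping; names are
 * approximate, and KMEM_STACK_DEPTH comes from <sys/kmem_impl.h>.
 */
typedef struct kmowner {
        struct kmowner *kmo_head;               /* first hash elt in bucket */
        struct kmowner *kmo_next;               /* next hash elt in chain */
        size_t kmo_signature;                   /* hash table signature */
        uint_t kmo_num;                         /* number of allocations */
        size_t kmo_data_size;                   /* size of each allocation */
        size_t kmo_total_size;                  /* total bytes of allocation */
        int kmo_depth;                          /* depth of stack */
        uintptr_t kmo_stack[KMEM_STACK_DEPTH];  /* stack trace */
} kmowner_t;

static size_t
kmowner_signature_sketch(size_t data_size, int depth, const uintptr_t *stack)
{
        size_t sig = data_size;
        int i;

        /* fold each return address into the signature */
        for (i = 0; i < depth; i++)
                sig += stack[i];

        return (sig);
}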
/*
 * We sort our results by allocation size before printing them.
 */

/*
 * The main engine of ::kmausers is relatively straightforward: First we
 * accumulate our list of kmem_cache_t addresses into the kmclist_t.  Next we
 * iterate over the allocated bufctls of each cache in the list.  Finally,
 * we sort and print our results.
 */

        argv += i;      /* skip past options we just processed */
        argc -= i;      /* adjust argc */

                mdb_warn("KMF_AUDIT is not enabled for %s\n",
        mdb_warn("KMF_AUDIT is not enabled for any caches\n");
        mdb_printf("%lu bytes for %u allocations with data size %lu:\n",

        "Displays the largest users of the kmem allocator, sorted by total\n"
        "bytes consumed and grouped by stack trace.  If one or more caches is\n"
        "specified, only those caches will be searched.  By default, all\n"
        "caches are searched.  If an address is specified, then only those\n"
        "allocations which include the given address are displayed.\n"
        "Specifying an address implies -f.\n"
        "\n"
        "\t-e\tInclude all users, not just the largest\n"
        "\t-f\tDisplay individual allocations.  By default, users are\n"
        "\t\tgrouped by stack\n");
        return (-1);    /* errno is set for us */

/*
 * If kmem is ready, we'll need to invoke the kmem_cache walker
 * immediately.  Walkers in the linkage structure won't be ready until
 * _mdb_init returns, so we'll need to add this one manually.  If kmem
 * is ready, we'll use the walker to initialize the caches.  If kmem
 * isn't ready, we'll register a callback that will allow us to defer
 * cache walking until it is.
 */
                mdb_warn("failed to add kmem_cache walker");
        /* register our ::whatis handlers */

/*
 * Warn about swapped out threads, but drive on anyway.
 */

/*
 * Search the thread's stack for the given pointer.  Note that it would
 * be more efficient to follow ::kgrep's lead and read in page-sized
 * chunks, but this routine is already fast and simple.
 */
                mdb_warn("couldn't read thread %p's stack at %p",