cpr_dump.c revision 903a11ebdc8df157c4700150f41f1f262f4a8ae8
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */
#pragma ident	"%Z%%M%	%I%	%E%	SMI"

/*
 * Fill in and write out the cpr state file
 *	1. Allocate and write headers, ELF and cpr dump header
 *	2. Allocate bitmaps according to phys_install
 *	3. Tag kernel pages into corresponding bitmap
 *	4. Write bitmaps to state file
 *	5. Write actual physical page data to state file
 */

/* Local defines and variables */

static char *cpr_wptr;		/* keep track of where to write to next */

/*
 * On some platforms bcopy may modify the thread structure
 * during bcopy (eg, to prevent cpu migration).  If the
 * range we are currently writing out includes our own
 * thread structure then it will be snapshotted by bcopy
 * including those modified members - and the updates made
 * on exit from bcopy will no longer be seen when we later
 * restore the mid-bcopy kthread_t.  So if the range we
 * need to copy overlaps with our thread structure we will
 * use a simple byte copy.
 */

/*
 * Allocate pages for buffers used in writing out the statefile
 */
	char *allocerr = "Unable to allocate memory for cpr buffer";
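The bcopy-overlap concern described above can be sketched in isolation. This is a hypothetical illustration, not the kernel's implementation: `ranges_overlap` and `safe_copy` are invented names, and a plain `memcpy` stands in for the optimized `bcopy`; the point is only the fallback to a byte loop when the source range covers the "do not disturb" region (the current thread structure in the real code).

```c
#include <stddef.h>
#include <string.h>

/* return nonzero when [a, a+alen) and [b, b+blen) intersect */
static int
ranges_overlap(const void *a, size_t alen, const void *b, size_t blen)
{
	const char *a0 = a;
	const char *b0 = b;

	return (a0 < b0 + blen && b0 < a0 + alen);
}

/*
 * Hypothetical sketch: if the source range overlaps a region the
 * optimized copy routine might itself modify (standing in for the
 * current kthread_t), fall back to a simple byte copy.
 */
static void
safe_copy(void *dst, const void *src, size_t len,
    const void *self, size_t selflen)
{
	if (ranges_overlap(src, len, self, selflen)) {
		const char *s = src;
		char *d = dst;
		size_t i;

		for (i = 0; i < len; i++)	/* simple byte copy */
			d[i] = s[i];
	} else {
		memcpy(dst, src, len);		/* fast path */
	}
}
```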
	/*
	 * set the cpr write buffer size to at least the historic
	 * size (128k) or large enough to store both the early
	 * set of statefile structures (well under 0x800) plus the
	 * bitmaps, and roundup to the next pagesize.
	 */

	/*
	 * Set bitmap size in bytes based on phys_install.
	 */

/*
 * CPR dump header contains the following information:
 *	1. header magic -- unique to cpr state file
 *	2. kernel return pc & ppn for resume
 *	3. current thread info
 *	4. debug level and test mode
 *	5. number of bitmaps allocated
 *	6. number of page records
 */

	/*
	 * Remember how many pages we plan to save to statefile.
	 * This information will be used for sanity checks.
	 */

	/*
	 * Untag those pages that will not be saved to statefile.
	 */

	    "\ncpr_write_header: kpages %ld - vpages %ld + upages %ld = %d\n",

	/*
	 * Some pages contain volatile data (cpr_buf and storage area for
	 * sensitive kpages), which are no longer needed after the statefile
	 * is dumped to disk.  We have already untagged them from regular
	 * bitmaps.  Now tag them into the volatile bitmaps.  The pages in
	 * volatile bitmaps will be claimed during resume, and the resumed
	 * kernel will free them.
	 */

	/*
	 * Export accurate statefile size for statefile allocation retry.
	 * statefile_size = all the headers + total pages +
	 * number of pages used by the bitmaps.
	 * Roundup will be done in the file allocation code.
	 */

	/*
	 * If the estimated statefile is not big enough,
	 * go retry now to save unnecessary operations.
	 */
		    "STAT->cs_nocomp_statefsz > "
		    "STAT->cs_est_statefsz\n");
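The size-export formula above (all headers + total pages + pages used by the bitmaps, with roundup deferred to the allocation code) can be illustrated with a small arithmetic sketch. Everything here is hypothetical: `estimate_statefile_size` and the 8K `PAGESIZE` are illustration choices, not the kernel's actual names or constants.

```c
#include <stddef.h>

#define PAGESIZE	8192	/* assumed page size for this sketch only */

/*
 * Hypothetical sketch of the estimate described in the comment:
 * header bytes, plus one page of space per saved page, plus the
 * whole pages occupied by the bitmaps.  Roundup to the file
 * allocation unit is left to the allocation code.
 */
static size_t
estimate_statefile_size(size_t header_bytes, size_t npages,
    size_t bitmap_bytes)
{
	size_t bitmap_pages = (bitmap_bytes + PAGESIZE - 1) / PAGESIZE;

	return (header_bytes + npages * PAGESIZE + bitmap_pages * PAGESIZE);
}
```

A caller would compare this estimate against the space actually reserved for the statefile and retry the allocation early when it falls short, as the comment describes.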
	/* now write cpr dump descriptor */

/*
 * CPR dump tail record contains the following information:
 *	1. header magic -- unique to cpr state file
 *	2. all misc info that needs to be passed to cprboot or resumed kernel
 */

	/* count the last one (flush) */

/*
 * Write bitmap descriptor array, followed by merged bitmaps.
 */

	/*
	 * merge regular and volatile bitmaps into tmp space
	 */

	/*
	 * to get an accurate view of kas, we need to untag sensitive
	 * pages *before* dumping them because the disk driver makes
	 * allocations and changes kas along the way.  The remaining
	 * pages referenced in the bitmaps are dumped out later as
	 * regular kpages.
	 */
	str = "cpr_write_statefile:";
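The "merge regular and volatile bitmaps into tmp space" step above amounts to a bytewise OR: a page belongs in the merged view if it is tagged in either map. A minimal hypothetical sketch (`merge_bitmaps` is an invented name, not the kernel routine):

```c
#include <stddef.h>

/*
 * Hypothetical sketch of merging two page bitmaps into a temporary
 * buffer: a page is present in the merged map when it is tagged in
 * either the regular or the volatile map, so the merge is a
 * bytewise OR over the backing storage.
 */
static void
merge_bitmaps(const unsigned char *regmap, const unsigned char *volmap,
    unsigned char *tmp, size_t nbytes)
{
	size_t i;

	for (i = 0; i < nbytes; i++)
		tmp[i] = regmap[i] | volmap[i];
}
```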
	/*
	 * now it's OK to call a driver that makes allocations
	 */

	/*
	 * now write out the clean sensitive kpages
	 * according to the sensitive descriptors
	 */
		    "%s cpr_dump_sensitive_kpages() failed!\n", str);

	/*
	 * cpr_dump_regular_pages() counts cpr_regular_pgs_dumped
	 */
		    "%s cpr_dump_regular_pages() failed!\n", str);
	/*
	 * sanity check to verify the right number of pages were dumped
	 */

/*
 * creates the CPR state file, the following sections are
 * written out in sequence:
 *	- writes the cpr dump header
 *	- writes the memory usage bitmaps
 *	- writes the platform dependent info
 *	- writes the remaining user pages
 *	- writes the kernel pages
 */

	/* point to top of internal buffer */

	/* initialize global variables used by the write operation */

	/*
	 * set internal cross checking; we don't want to call
	 * a disk driver that makes allocations until after
	 * sensitive pages are saved
	 */

	/*
	 * 1253112: heap corruption due to memory allocation when dumping
	 *
	 * Theoretically on Sun4u only the kernel data nucleus, kvalloc and
	 * kvseg segments can be contaminated should memory allocations happen
	 * during sddump, which is not supposed to happen after the system
	 * is quiesced.  Let's call the kernel pages that tend to be affected
	 * 'sensitive kpages' here.  To avoid saving inconsistent pages, we
	 * will allocate some storage space to save the clean sensitive pages
	 * aside before statefile dumping takes place.  Since there may not be
	 * much memory left at this stage, the sensitive pages will be
	 * compressed before they are saved into the storage area.
	 */
		    "cpr_dump: save_sensitive_kpages failed!\n");

	/*
	 * since all cpr allocations are done (space for sensitive kpages,
	 * bitmaps, cpr_buf), kas is stable, and now we can accurately
	 * count regular and sensitive kpages.
	 */
		    "cpr_dump: cpr_write_header() failed!\n");

		    "cpr_dump: cpr_write_statefile() failed!\n");
* cpr_xwalk() is called many 100x with a range within kvseg or kvseg_reloc; * a page-count from each range is accumulated at arg->pages. * cpr_walk() is called many 100x with a range within kvseg or kvseg_reloc; * a page-count from each range is accumulated at arg->pages. * If we are about to start walking the range of addresses we * carved out of the kernel heap for the large page heap walk * heap_lp_arena to find what segments are actually populated * faster scan of kvseg using vmem_walk() to visit * cpr_walk_kpm() is called for every used area within the large * segkpm virtual address window. A page-count is accumulated at * faster scan of segkpm using hat_kpm_walk() to visit only used ranges. * Sparsely filled kernel segments are registered in kseg_table for * easier lookup. See also block comment for cpr_count_seg_pages. struct seg **
st_seg;
/* segment pointer or segment address */ * Compare seg with each entry in kseg_table; when there is a match * return the entry pointer, otherwise return NULL. * Count pages within each kernel segment; call cpr_sparse_seg_check() * to find out whether a sparsely filled segment needs special * treatment (e.g. kvseg). * Todo: A "SEGOP_CPR" like SEGOP_DUMP should be introduced, the cpr * module shouldn't need to know segment details like if it is * sparsely filled or not (makes kseg_table obsolete). * count kernel pages within kas and any special ranges * Some pages need to be taken care of differently. * eg: panicbuf pages of sun4m are not in kas but they need * to be saved. On sun4u, the physical pages of panicbuf are * allocated via prom_retain(). * Set a bit corresponding to the arg phys page number; * returns 0 when the ppn is valid and the corresponding * map bit was clear, otherwise returns 1. * Clear a bit corresponding to the arg phys page number. * Lookup a bit corresponding to the arg phys page number. * Go thru all pages and pick up any page not caught during the invalidation * stage. This is also used to save pages with cow lock or phys page lock held * (none zero p_lckcnt or p_cowcnt) dcnt++;
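The three bitmap primitives described above (set, clear, lookup by physical page number) can be sketched as follows. This is a hypothetical stand-alone illustration: `ppn_map`, `ppn_setbit`, `ppn_clrbit`, and `ppn_tstbit` are invented names, but the set routine mirrors the return convention the comment states, returning 0 only when the ppn is in range and the bit was previously clear.

```c
#include <stddef.h>

typedef unsigned long pfn_t;		/* physical page number */

/* hypothetical bitmap covering ppns [low, high] */
struct ppn_map {
	unsigned char	*bits;
	pfn_t		low;
	pfn_t		high;
};

/* set bit for ppn; 0 when valid and previously clear, else 1 */
static int
ppn_setbit(struct ppn_map *m, pfn_t ppn)
{
	size_t rel;
	unsigned char mask;

	if (ppn < m->low || ppn > m->high)
		return (1);		/* invalid ppn */
	rel = ppn - m->low;
	mask = (unsigned char)(1 << (rel & 7));
	if (m->bits[rel >> 3] & mask)
		return (1);		/* already tagged */
	m->bits[rel >> 3] |= mask;
	return (0);
}

/* clear bit for ppn; out-of-range ppns are ignored */
static void
ppn_clrbit(struct ppn_map *m, pfn_t ppn)
{
	if (ppn >= m->low && ppn <= m->high)
		m->bits[(ppn - m->low) >> 3] &=
		    (unsigned char)~(1 << ((ppn - m->low) & 7));
}

/* lookup bit for ppn; returns 0 or 1 */
static int
ppn_tstbit(const struct ppn_map *m, pfn_t ppn)
{
	if (ppn < m->low || ppn > m->high)
		return (0);
	return ((m->bits[(ppn - m->low) >> 3] >> ((ppn - m->low) & 7)) & 1);
}
```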
/* dirty count */ * try compressing pages based on cflag, * and for DEBUG kernels, verify uncompressed data checksum; * this routine replaces common code from * i_cpr_compress_and_save() and cpr_compress_and_write() * set length to the original uncompressed data size; * always init cpd_flag to zero * Make a copy of the uncompressed data so we can checksum it. * Compress that copy so the checksum works at the other end * try compressing the raw data to cpr_pagedata; * if there was a size reduction: record the new length, * flag the compression, and point to the compressed data. * decompress the data back to a scratch area * and compare the new checksum with the original * checksum to verify the compression. * 1. Prepare cpr page descriptor and write it to file * 2. Compress page data and write it out * Fill cpr page descriptor. /* Write cpr page descriptor */ /* Write compressed page data */ * Unmap the pages for tlb and vac flushing "cpr_compress_and_write: vp 0x%p va 0x%x ", (
void *)
vp,
va);
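The compress-or-keep-raw pattern described above (record the new length, flag the compression, and point at the compressed data only when there was a size reduction) can be sketched in miniature. Everything here is hypothetical: `page_desc`, `prepare_page`, and the trivial run-length encoder are illustration stand-ins, not the kernel's descriptor or compressor.

```c
#include <stddef.h>

/* hypothetical slimmed-down page descriptor */
struct page_desc {
	const unsigned char	*data;
	size_t			len;
	int			compressed;
};

/* toy run-length encoder standing in for the real compressor;
 * returns the encoded length, or 0 when dst would overflow */
static size_t
rle_compress(const unsigned char *src, size_t len, unsigned char *dst,
    size_t dstmax)
{
	size_t in = 0, out = 0;

	while (in < len) {
		unsigned char c = src[in];
		size_t run = 1;

		while (in + run < len && src[in + run] == c && run < 255)
			run++;
		if (out + 2 > dstmax)
			return (0);
		dst[out++] = (unsigned char)run;
		dst[out++] = c;
		in += run;
	}
	return (out);
}

/* only when the result shrank do we flag and use the compressed copy */
static void
prepare_page(struct page_desc *pd, const unsigned char *raw, size_t rawlen,
    unsigned char *scratch, size_t scratchlen)
{
	size_t clen = rle_compress(raw, rawlen, scratch, scratchlen);

	if (clen != 0 && clen < rawlen) {
		pd->data = scratch;	/* size reduction: use compressed */
		pd->len = clen;
		pd->compressed = 1;
	} else {
		pd->data = raw;		/* no gain: keep the raw page */
		pd->len = rawlen;
		pd->compressed = 0;
	}
}
```

The real code additionally checksums the uncompressed copy and, on DEBUG kernels, decompresses into a scratch area to verify the round trip, as the comments above describe.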
/*
 * break the write into multiple parts if request is large,
 * calculate count up to buf page boundary, then write it out.
 */

		return (0);	/* buffer not full yet */

	    "cpr_write: frmp=%p wptr=%p cnt=%lx...",

	/*
	 * cross check, this should not happen!
	 */

	/*
	 * Calculate remaining blocks in buffer, rounded up to the
	 * nearest block.
	 */

	i = 0;			/* Beginning of bitmap */

	else			/* not contiguous anymore */

	/* Stopped on a non-tagged page */