dboot_startkern.c revision c1374a13e412c4ec42cba867e57347a0e049a822
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each file.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

/*
 * This file contains code that runs to transition us from either a multiboot
 * compliant loader (32 bit non-paging) or a XPV domain loader to
 * regular kernel execution. Its task is to set up the kernel memory image.
 *
 * The code executes as:
 *	- 32 bits under GRUB (for 32 or 64 bit Solaris)
 *	- a 32 bit program for the 32-bit PV hypervisor
 *	- a 64 bit program for the 64-bit PV hypervisor (at least for now)
 *
 * Under the PV hypervisor, we must create mappings for any memory beyond the
 * initial start of day allocation (such as the kernel itself).
 *
 * When on the metal, the mapping between maddr_t and paddr_t is 1:1.
 * Since we are running in real mode, all such memory is accessible.
 */

/*
 * Standard bits used in PTE (page level) and PTP (internal levels)
 */

/*
 * This is the target address (physical) where the kernel text and data
 * nucleus pages will be unpacked. On the hypervisor this is actually a
 * pseudo-physical address.
 */

/*
 * The stack is set up in assembler before entering startup_kernel().
 */

/*
 * Used to track physical memory allocation
 */

/*
 * Additional information needed for hypervisor memory allocation.
 * Only memory up to scratch_end is mapped by page tables.
 */
/*
 * mfn_base is the start of the hypervisor virtual image. It's ONE_GIG, so
 * to derive a pfn from a pointer, you subtract mfn_base.
 *
 * If on the metal, then we have a multiboot loader.
 */

/*
 * This contains information passed to the kernel
 */

/*
 * Page table and memory stuff.
 */

/*
 * Information about processor MMU
 */

/*
 * Low 32 bits of kernel entry address passed back to assembler.
 * When running a 64 bit kernel, the high 32 bits are 0xffffffff.
 */

/*
 * Memlists for the kernel. We shouldn't need a lot of these.
 */

/*
 * Either hypervisor-specific or grub-specific code builds the initial
 * memlists.
 */

/*
 * Now sort the memlists, in case they weren't in order.
 * Yeah, this is a bubble sort; small, simple and easy to get right.
 */
		for (i = 0; i < j; ++i) {
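The bubble sort the comment describes can be sketched as below. The `boot_memlist` layout here (a bare addr/size pair) is an assumption for illustration, not the kernel's actual structure:

```c
#include <assert.h>
#include <stdint.h>

struct boot_memlist {		/* illustrative layout, not the real one */
	uint64_t addr;
	uint64_t size;
};

/* Bubble sort the list by base address; the lists are tiny, so this is fine. */
static void
sort_memlists(struct boot_memlist *ml, int n)
{
	for (int j = n - 1; j > 0; --j) {
		for (int i = 0; i < j; ++i) {
			if (ml[i].addr <= ml[i + 1].addr)
				continue;
			struct boot_memlist tmp = ml[i];
			ml[i] = ml[i + 1];
			ml[i + 1] = tmp;
		}
	}
}
```

Small, simple, and easy to get right, as the comment says; for a handful of entries the O(n^2) cost is irrelevant.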
/*
 * Merge any memlists that don't have holes between them.
 */
			--i;	/* after merging we need to reexamine, so do this */

/*
 * link together the memlists with native size pointers
 */

/*
 * build bios reserved memlists
 */

/*
 * halt on the hypervisor after a delay to drain console output
 */

/*
 * From a machine address, find the corresponding pseudo-physical address.
 * Pseudo-physical addresses are contiguous and run from mfn_base in each VM.
 * Machine addresses are the real underlying hardware addresses.
 * These are needed for page table entries. Note that this routine is
 * poorly protected. A bad value of "ma" will cause a page fault.
 */

/*
 * From a pseudo-physical address, find the corresponding machine address.
 */
		dboot_printf("pa_to_ma(pfn=%lx) got %lx ma_to_pa() says %lx\n",
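The hole-free merge described above can be sketched like this, again over a hypothetical addr/size memlist layout. Note the `--i` so a merged entry is reexamined against its new neighbor:

```c
#include <assert.h>
#include <stdint.h>

struct boot_memlist {		/* illustrative layout */
	uint64_t addr;
	uint64_t size;
};

/*
 * Merge neighbors where one range ends exactly where the next begins.
 * The list must already be sorted by address. Returns the new count.
 */
static int
merge_memlists(struct boot_memlist *ml, int n)
{
	for (int i = 0; i < n - 1; ++i) {
		if (ml[i].addr + ml[i].size != ml[i + 1].addr)
			continue;
		ml[i].size += ml[i + 1].size;
		for (int j = i + 1; j < n - 1; ++j)	/* close the gap */
			ml[j] = ml[j + 1];
		--n;
		--i;	/* after merging we need to reexamine, so do this */
	}
	return (n);
}
```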
	/* Remove write permission to the new page table. */

/*
 * dump out the contents of page tables...
 */

/*
 * Don't try to walk hypervisor private pagetables
 */

/*
 * shorten dump for consecutive mappings
 */
			if (l == 3 && index == 256)	/* VA hole */
				va = 0xffff800000000000ull;
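Why index 256 at level 3 marks the VA hole: with 4-level paging each top-level entry spans 512GB and virtual addresses are sign-extended from bit 47, so entry 256 is the first one whose canonical form lies on the far side of the non-canonical hole, at 0xffff800000000000. A small sketch of that arithmetic (standard x86-64 paging math, not the dboot code itself):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a top-level (level 3) page table index to the first virtual
 * address it maps. Each entry covers 512GB (1 << 39 bytes); addresses
 * with bit 47 set must be sign-extended to be canonical.
 */
static uint64_t
top_index_to_va(unsigned index)
{
	uint64_t va = (uint64_t)index << 39;
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ull;	/* sign extend */
	return (va);
}
```

So when a table dump crosses from entry 255 to entry 256, it jumps from the top of the lower canonical half straight to 0xffff800000000000.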
/*
 * Add a mapping for the machine page at the given virtual address.
 */

/*
 * see if we can avoid find_pte() on the hypervisor
 */

/*
 * Find the pte that will map this address. This creates any
 * missing intermediate level page tables.
 */

/*
 * When paravirtualized, we must use hypervisor calls to modify the
 * PTE, since paging is active. On real hardware we just write to
 * the pagetables, which aren't in use yet.
 */

/*
 * Add a mapping for the physical page at the given virtual address.
 */

/*
 * This is called to remove start..end from the
 * possible range of PCI addresses.
 */
		/* delete the entire range? */
			--i;	/* to revisit the new one at this index */

			++i;	/* skip on to next one */

		/* cut memory off the start? */

		/* cut memory off the end? */

/*
 * Xen strips the size field out of the mb_memory_map_t, see struct e820entry
 * definition in Xen source.
 */

/*
 * page align start and end
 */

/*
 * Finish off the pcimemlist
 */

/*
 * Initialize memory allocator stuff from hypervisor-supplied start info.
 * There is 512KB of scratch area after the boot stack page.
 * We'll use that for everything except the kernel nucleus pages, which are too
 * big to fit there and are allocated last anyway.
 */
	int local;	/* variables needed to find start region */

	DBG_MSG("Entered init_mem_alloc()\n");
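The delete/split/trim cases for removing start..end from a PCI memlist can be sketched as below. The memlist layout and helper signature are illustrative, not the real exclude_from_pci():

```c
#include <assert.h>
#include <stdint.h>

struct boot_memlist {		/* illustrative layout */
	uint64_t addr;
	uint64_t size;
};

/*
 * Remove [start, end) from entry i of the list. One of four things
 * happens: the whole entry is deleted, it is split in two, or memory
 * is cut off its start or its end. Returns the new entry count;
 * `max` is the array capacity (needed for the split case).
 */
static int
exclude_range(struct boot_memlist *ml, int n, int i,
    uint64_t start, uint64_t end, int max)
{
	uint64_t ml_end = ml[i].addr + ml[i].size;

	/* delete the entire range? */
	if (start <= ml[i].addr && end >= ml_end) {
		for (int j = i; j < n - 1; ++j)
			ml[j] = ml[j + 1];
		return (n - 1);
	}

	/* split the range? */
	if (start > ml[i].addr && end < ml_end && n < max) {
		for (int j = n; j > i + 1; --j)
			ml[j] = ml[j - 1];
		ml[i + 1].addr = end;
		ml[i + 1].size = ml_end - end;
		ml[i].size = start - ml[i].addr;
		return (n + 1);
	}

	/* cut memory off the start? */
	if (start <= ml[i].addr && end > ml[i].addr) {
		ml[i].size = ml_end - end;
		ml[i].addr = end;
	/* cut memory off the end? */
	} else if (start < ml_end && end >= ml_end) {
		ml[i].size = start - ml[i].addr;
	}
	return (n);
}
```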
/*
 * Free memory follows the stack. There's at least 512KB of scratch
 * space, rounded up to at least 2Mb alignment. That should be enough
 * for the page tables we'll need to build. The nucleus memory is
 * allocated last and will be outside the addressable range. We'll
 * switch to new page tables before we unpack the kernel.
 */

/*
 * For paranoia, leave some space between hypervisor data and ours.
 * Use 500 instead of 512.
 */

/*
 * The domain builder gives us at most 1 module
 */

/*
 * Using pseudo-physical addresses, so only 1 memlist element
 */

/*
 * finish building physinstall list
 */

/*
 * build bios reserved memlists
 */
	/*LINTED: constant in conditional context*/
		dboot_panic("getting XENMEM_machine_memory_map failed");
/*
 * During memory allocation, find the highest address not used yet.
 */

/*
 * Walk through the module information finding the last used address.
 * The first available address will become the top level page table.
 *
 * We then build the phys_install memlist from the multiboot information.
 */
	DBG_MSG("Entered init_mem_alloc()\n");
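Several of the comments that follow note a "page align start and end" step: a region's start is rounded up and its end rounded down so that only whole usable pages survive. A minimal sketch, assuming a 4KB page size:

```c
#include <assert.h>
#include <stdint.h>

#define MMU_PAGESIZE	4096ull		/* assumed page size */

/* Round an address up to the next page boundary. */
static uint64_t
page_align_up(uint64_t addr)
{
	return ((addr + MMU_PAGESIZE - 1) & ~(MMU_PAGESIZE - 1));
}

/* Round an address down to the enclosing page boundary. */
static uint64_t
page_align_down(uint64_t addr)
{
	return (addr & ~(MMU_PAGESIZE - 1));
}
```

Aligning start up and end down shrinks the region, which is the safe direction when deciding what RAM is actually usable.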
		dboot_panic("Too many modules (%d) -- the maximum is %d.",

/*
 * search the modules to find the last used address
 * we'll build the module list while we're walking through here
 */
			dboot_panic("module[%d]: Invalid module start address "

/*
 * Walk through the memory map from multiboot and build our memlist
 * structures. Note these will have native format pointers.
 */

/*
 * page align start and end
 */

/*
 * only type 1 is usable RAM
 */

/*
 * Old platform - assume I/O space at the end of memory.
 */

/*
 * finish processing the physinstall list
 */

/*
 * build bios reserved mem lists
 */

/*
 * Simple memory allocator, allocates aligned physical memory.
 * Note that startup_kernel() only allocates memory, never frees.
 * Memory usage just grows in an upward direction.
 */

/*
 * make sure size is a multiple of pagesize
 */

/*
 * a really large bootarchive that causes you to run out of memory
 * may cause this to blow up
 */
	/* LINTED E_UNEXPECTED_UINT_PROMOTION */

/*
 * did we find the desired address?
 */

/*
 * if not is this address the best so far?
 */

/*
 * We didn't find exactly the address we wanted, due to going off the
 * end of a memory region. Return the best found memory address.
 */
		dboot_panic("Out of mem next_avail: 0x%lx, scratch_end: "

/*
 * Build page tables to map all of memory used so far as well as the kernel.
 */

/*
 * If we're on metal, we need to create the top level pagetable.
 */

/*
 * Determine if we'll use large mappings for kernel, then map it.
 */

/*
 * The kernel will need a 1 page window to work with page tables.
 */

	/* If this is a domU we're done. */
	DBG_MSG("\nPage tables constructed\n");
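The simple allocator described above (allocate-only, aligned, usage grows strictly upward) can be sketched like this; the addresses and bounds are made-up placeholders, and the real code panics where this sketch returns 0:

```c
#include <assert.h>
#include <stdint.h>

#define PAGESIZE	4096ull

/* Stand-ins for the allocator's state: high-water mark and scratch limit. */
static uint64_t next_avail_addr = 0x200000;	/* assumed starting point */
static uint64_t scratch_end = 0x400000;		/* assumed end of scratch */

/*
 * Allocate `size` bytes at the given power-of-two alignment.
 * Memory is never freed; the high-water mark only moves up.
 * Returns 0 when scratch space is exhausted.
 */
static uint64_t
do_mem_alloc(uint64_t size, uint64_t align)
{
	/* make sure size is a multiple of pagesize */
	size = (size + PAGESIZE - 1) & ~(PAGESIZE - 1);
	uint64_t addr = (next_avail_addr + align - 1) & ~(align - 1);
	if (addr + size > scratch_end)
		return (0);	/* out of mem -- the real code panics here */
	next_avail_addr = addr + size;
	return (addr);
}
```

Because nothing is ever freed, the allocator needs no bookkeeping beyond the single high-water mark, which is also what makes the low-memory mapping trick described later work.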
/*
 * We need 1:1 mappings for the lower 1M of memory to access
 * BIOS tables used by a couple of drivers during boot.
 *
 * The following code works because our simple memory allocator
 * only grows usage in an upwards direction.
 *
 * Note that by this point in boot some mappings for low memory
 * may already exist because we've already accessed devices in low
 * memory. (Specifically the video frame buffer and keyboard
 * status ports.) If we're booting on raw hardware then GRUB
 * created these mappings for us. If we're booting under a
 * hypervisor then we went ahead and remapped these devices into
 * memory allocated within dboot itself.
 */
	DBG_MSG("\nPage tables constructed\n");
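The find_pte() behavior described earlier (walk down from the top level, creating any missing intermediate page table pages) can be sketched over plain in-memory arrays. Storing pointers directly in the entries, instead of physical frame numbers, is a simplification for illustration; the constants are the usual 512-entry, 9-bits-per-level x86 layout:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NPTE		512		/* entries per page table page */
#define PT_VALID	0x1ull		/* illustrative "present" bit */

typedef uint64_t x86pte_t;

/* Allocate a zeroed page table page (stands in for the bump allocator). */
static x86pte_t *
alloc_ptable(void)
{
	return (calloc(NPTE, sizeof (x86pte_t)));
}

/*
 * Walk from top_level down to `level`, creating any missing intermediate
 * page table pages, and return a pointer to the entry that maps `va`
 * at that level.
 */
static x86pte_t *
find_pte(x86pte_t *top, uint64_t va, int top_level, int level)
{
	x86pte_t *table = top;

	for (int l = top_level; l > level; --l) {
		int index = (int)((va >> (12 + 9 * l)) & (NPTE - 1));
		if (!(table[index] & PT_VALID)) {
			x86pte_t *newpt = alloc_ptable();
			table[index] = (x86pte_t)(uintptr_t)newpt | PT_VALID;
		}
		table = (x86pte_t *)(uintptr_t)(table[index] & ~PT_VALID);
	}
	return (&table[(va >> (12 + 9 * level)) & (NPTE - 1)]);
}
```

A second walk to the same address descends through the now-valid entries and lands on the same PTE, which is what lets the caller simply write the mapping through the returned pointer.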
"multiboot is no longer used to boot the Solaris Operating System.\n\
The grub entry should be changed to:\n\

/*
 * startup_kernel has a pretty simple job. It builds pagetables which reflect
 * 1:1 mappings for all memory in use. It then also adds mappings for
 * the kernel nucleus at the virtual address of target_kernel_text using
 * large page mappings. The page table pages are also accessible at 1:1
 * mapped addresses.
 */

/*
 * At this point we are executing in 32 bit real mode.
 */

/*
 * For dom0, before we initialize the console subsystem we'll
 * need to enable io operations, so set I/O privilege level to 1.
 */
	DBG_MSG("\n\nSolaris prekernel set: ");
/*
 * boot info must be 16 byte aligned for 64 bit kernel ABI
 */

/*
 * Need correct target_kernel_text value
 */

/*
 * XXPV	Derive this stuff from CPUID / what the hypervisor has enabled
 */

#else	/* _BOOT_TARGET_amd64 */

/*
 * See if we are running on a PAE Hypervisor
 */

#endif	/* _BOOT_TARGET_amd64 */

/*
 * The hypervisor loads stuff starting at 1Gig
 */

/*
 * enable writable page table mode for the hypervisor
 */
		dboot_panic("HYPERVISOR_vm_assist(writable_pagetables) failed");
/*
 * The 32-bit hypervisor uses segmentation to protect itself from
 * guests. This means when a guest attempts to install a flat 4GB
 * code or data descriptor the 32-bit hypervisor will protect itself
 * by silently shrinking the segment such that if the guest attempts
 * any access where the hypervisor lives a #gp fault is generated.
 *
 * The problem is that some applications expect a full 4GB flat
 * segment for their current thread pointer and will use negative
 * offset segment wrap around to access data. TLS support in linux
 * brand is one example of this.
 *
 * The 32-bit hypervisor can catch the #gp fault in these cases
 * and emulate the access without passing the #gp fault to the guest
 * but only if VMASST_TYPE_4gb_segments is explicitly turned on.
 * Seems like this should have been the default.
 *
 * Either way, we want the hypervisor -- and not Solaris -- to deal
 * with emulating these accesses.
 */
		dboot_panic("HYPERVISOR_vm_assist(4gb_segments) failed");
#endif	/* !_BOOT_TARGET_amd64 */

/*
 * use cpuid to enable MMU features
 */

/*
 * Allow the command line to over-ride use of PAE for 32 bit.
 */

/*
 * initialize the simple memory allocator
 */

/*
 * disable PAE on 32 bit h/w w/o NX and < 4Gig of memory
 */

/*
 * configure mmu information
 */

/*
 * For grub, copy kernel bits from the ELF64 file to final place.
 */
	DBG_MSG("\nAllocating nucleus pages.\n");
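The PAE decision these comments describe (honor cpuid's PAE bit, require PAE for a 64-bit target, but allow skipping it on 32-bit hardware without NX and under 4GB of memory) can be sketched as a pure predicate over the cpuid feature words. The function name and parameters are illustrative; the bit positions are the standard cpuid ones (PAE is leaf 1 %edx bit 6, NX is leaf 0x80000001 %edx bit 20):

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_PAE_BIT	(1u << 6)	/* cpuid leaf 1, %edx */
#define CPUID_NX_BIT	(1u << 20)	/* cpuid leaf 0x80000001, %edx */

/*
 * Decide whether to run with PAE page tables, given the cpuid feature
 * words, whether we're booting a 64-bit kernel, and how much physical
 * memory is present.
 */
static int
use_pae(uint32_t leaf1_edx, uint32_t ext_edx, int is_64bit,
    uint64_t max_phys)
{
	if (!(leaf1_edx & CPUID_PAE_BIT))
		return (0);		/* hardware can't do it */
	if (is_64bit)
		return (1);		/* long mode requires PAE */
	/* disable PAE on 32 bit h/w w/o NX and < 4Gig of memory */
	if (!(ext_edx & CPUID_NX_BIT) && max_phys < 0x100000000ull)
		return (0);
	return (1);
}
```

Without NX there is no benefit to the wider PAE entries on a small-memory 32-bit machine, so plain 32-bit page tables are the cheaper choice there.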
		dboot_panic("failed to allocate aligned kernel memory");

		dboot_panic("failed to parse kernel ELF image, rebooting");
/*
 * return to assembly code to switch to running kernel
 */

/*
 * unmap unused pages in start area to make them available for DMA
 */
	DBG_MSG("\n\n*** DBOOT DONE -- back to asm to jump to kernel\n\n");