Psymtab.c revision 30da143285931291f495cc20b5a1b8869f0618a6
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License"). You may not use this file except in compliance
 * See the License for the specific language governing permissions
 * and limitations under the License.
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2005 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * Allocation function for a new file_info_t
 */

/*
 * To figure out which map_info_t instances correspond to the mappings
 * for this load object, we look at the in-memory ELF image in the
 * base mapping (usually the program text). We examine the program
 * headers to find the addresses at the beginning and end of each
 * section and store them in a list which we then sort. Finally, we
 * walk down the list of addresses and the list of map_info_t
 * instances in lock step to correctly find the mappings that
 * correspond to this load object.
 */

/*
 * Deallocation function for a file_info_t
 */

/*
 * Deallocation function for a map_info_t
 */

/*
 * Call-back function for librtld_db to iterate through all of its shared
 * libraries. We use this to get the load object names for the mappings.
 */
return (
1);
/* Base address does not match any mapping */ return (
1);
/* Failed to allocate a new file_info_t */ return (
1);
/* Failed to allocate rd_loadobj_t */ dprintf(
"loaded rd object %s lmid %lx\n",
return;
/* Failed to allocate a new file_info_t */ return;
/* Failed to allocate rd_loadobj_t */

/*
 * Construct the map for the a.out.
 */

/*
 * If the dynamic linker exists for this process,
 * construct the map for it.
 */

/*
 * Go through all the address space mappings, validating or updating
 * the information already gathered, or gathering new information.
 *
 * This function is only called when we suspect that the mappings have changed
 * because this is the first time we're calling it or because of rtld activity.
 */

/*
 * We try to merge any file information we may have for existing
 * mappings, to avoid having to rebuild the file info.
 */

/*
 * We've exhausted all the old mappings. Every new
 * mapping should be added.
 */

/*
 * This mapping matches exactly. Copy over the old
 * mapping, taking care to get the latest flags.
 * Make sure the associated file_info_t is updated
 */

/*
 * The old mapping doesn't exist any more, remove it
 */

/*
 * This is a new mapping, add it directly.
 */

/*
 * Consult librtld_db to get the load object
 * names for all of the shared libraries.
 */

/*
 * Update all of the mappings and rtld_db as if by Pupdate_maps(), and then
 * forcibly cache all of the symbol tables associated with all object files.
 */

/*
 * Return the librtld_db agent handle for the victim process.
 * The handle will become invalid at the next successful exec() and the
 * client (caller of proc_rd_agent()) must not use it beyond that point.
 * If the process is already dead, we've already tried our best to
 * create the agent during core file initialization.
 */

/*
 * Return the prmap_t structure containing 'addr', but only if it
 * is in the dynamic linker's link map and is the text section.
 */

/*
 * Return the prmap_t structure containing 'addr' (no restrictions on
 */

/*
 * Convert a full or partial load object name to the prmap_t for its
 * corresponding primary text mapping.
 */
return (
NULL);
/* A reasonable mistake */

/*
 * By building the symbol table, we implicitly bring the PLT
 * information up to date in the load object.
 */

/*
 * By building the symbol table, we implicitly bring the PLT
 * information up to date in the load object.
 */

/*
 * The buffer may already be allocated if this is a core file that
 * contained CTF data for this file.
 */
dprintf(
"failed to allocate ctf buffer\n");
dprintf(
"failed to read ctf data\n");
dprintf(
"loaded %lu bytes of CTF data for %s\n",
/*
 * If we're not a core file, re-read the /proc/<pid>/auxv file and store
 * its contents in P->auxv. In the case of a core file, we either
 * initialized P->auxv in Pcore() from the NT_AUXV, or we don't have an
 * auxv because the note was missing.
 */
return;
/* Already read during Pgrab_core() */ return;
/* No aux vec for Pgrab_file() */

/*
 * Return a requested element from the process's aux vector.
 * Return -1 on failure (this is adequate for our purposes).
 */

/*
 * Return a pointer to our internal copy of the process's aux vector.
 * The caller should not hold on to this pointer across any libproc calls.
 */

/*
 * Find or build the symbol table for the given mapping.
 */

/*
 * Attempt to find a matching file.
 * (A file can be mapped at several different addresses.)
 */

/*
 * If we need to create a new file_info structure, iterate
 * through the load objects in order to attempt to connect
 * this new file with its primary text mapping. We again
 * need to handle ld.so as a special case because we need
 * to be able to bootstrap librtld_db.
 */

/*
 * If librtld_db wasn't able to help us connect the file to a primary
 * text mapping, set file_map to the current mapping because we require
 * fptr->file_map to be set in Pbuild_file_symtab. librtld_db may be
 * unaware of what's going on in the rare case that a legitimate ELF
 * file has been mmap(2)ed into the process address space *without*
 * the use of dlopen(3x). Why would this happen? See pwdx ... :)
 */
for (i = 0; i <
phnum; i++) {
for (i = 0; i <
phnum; i++) {
/*
 * The text segment for each load object contains the elf header and
 * program headers. We can use this information to determine if the
 * file that corresponds to the load object is the same file that
 * was loaded into the process's address space. There can be a discrepancy
 * if a file is recompiled after the process is started or if the target
 * represents a core file from a differently configured system -- two
 * common examples. The DT_CHECKSUM entry in the dynamic section
 * provides an easy method of comparison. It is important to note that
 * the dynamic section usually lives in the data segment, but the
 * metadata we use to find the dynamic section lives in the text segment,
 * so if either of those segments is absent we can't proceed.
 *
 * We're looking through the elf file for several items: the symbol tables
 * (both dynsym and symtab), the procedure linkage table (PLT) base,
 * size, and relocation base, and the CTF information. Most of this can
 * be recovered from the loaded image of the file itself, the exceptions
 * being the symtab and CTF data.
 *
 * First we try to open the file that we think corresponds to the load
 * object. If the DT_CHECKSUM values match, we're all set, and can simply
 * recover all the information we need from the file. If the values of
 * DT_CHECKSUM don't match, or if we can't access the file for whatever
 * reason, we fake up an elf file to use in its stead. If we can't read
 * the elf data in the process's address space, we fall back to using
 * the file even though it may give inaccurate information.
 *
 * The elf file that we fake up has to consist of sections for the
 * dynsym, the PLT and the dynamic section. Note that in the case of a
 * core file, we'll get the CTF data in the file_info_t later on from
 * a section embedded in the core file (if it's present).
 *
 * file_differs() conservatively looks for mismatched files, identifying
 * a match when there is any ambiguity (since that's the legacy behavior).
 */
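The conservative comparison described in the comment above can be sketched as follows. This is a minimal illustration and not the libproc code: `dyn_entry_t`, `find_checksum`, and `files_differ` are hypothetical names, the two arrays stand in for the dynamic section read from the file on disk and from the process image, and `SKETCH_DT_CHECKSUM` is assumed to carry the same value as the `DT_CHECKSUM` tag in the system ELF headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one dynamic-section entry (hypothetical type). */
typedef struct {
	int64_t		d_tag;
	uint64_t	d_val;
} dyn_entry_t;

#define	SKETCH_DT_NULL		0
/* Assumed to equal DT_CHECKSUM in the system ELF headers. */
#define	SKETCH_DT_CHECKSUM	0x6ffffdf8

/* Return 1 and store the value if a checksum entry exists, else 0. */
int
find_checksum(const dyn_entry_t *dyn, size_t ndyn, uint64_t *cksum)
{
	size_t i;

	for (i = 0; i < ndyn && dyn[i].d_tag != SKETCH_DT_NULL; i++) {
		if (dyn[i].d_tag == SKETCH_DT_CHECKSUM) {
			*cksum = dyn[i].d_val;
			return (1);
		}
	}
	return (0);
}

/*
 * Report a difference only when both sides carry a checksum and the
 * two values disagree. Any ambiguity is treated as a match.
 */
int
files_differ(const dyn_entry_t *file_dyn, size_t nfile,
    const dyn_entry_t *image_dyn, size_t nimage)
{
	uint64_t fsum, isum;

	if (!find_checksum(file_dyn, nfile, &fsum))
		return (0);
	if (!find_checksum(image_dyn, nimage, &isum))
		return (0);
	return (fsum != isum);
}
```

Treating a missing checksum as "no difference" mirrors the stated legacy behavior: the on-disk file is rejected only when there is positive evidence of a mismatch.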
/*
 * First, we find the checksum value in the elf file.
 */
for (i = 0; i <
ndyn; i++) {
/*
 * Get the base of the text mapping that corresponds to this file.
 */
dprintf(
"image cksum value is %llx\n",
dprintf(
"image cksum value is %llx\n",
static char shstr[] =
".shstrtab\0.dynsym\0.dynstr\0.dynamic\0.plt";
/*
 * We're building an in-memory elf file that will let us use libelf
 * for most of the work we need to do later (e.g. symbol table lookups).
 * We need sections for the dynsym, dynstr, and plt, and we need
 * the program headers from the text section. The former is used in
 * Pbuild_file_symtab(); the latter is used in several functions in
 * Pcore.c to reconstruct the origin of each mapping from the load
 * object that spawned it.
 *
 * Here are some useful pieces of elf trivia that will help
 * to elucidate this code.
 *
 * All the information we need about the dynstr can be found in these
 * two entries in the dynamic section:
 *
 *     DT_STRTAB    base of dynstr
 *     DT_STRSZ     size of dynstr
 *
 * So deciphering the dynstr is pretty straightforward.
 *
 * The dynsym is a little trickier.
 *
 *     DT_SYMTAB    base of dynsym
 *     DT_SYMENT    size of a dynsym entry (Elf{32,64}_Sym)
 *     DT_HASH      base of hash table for dynamic lookups
 *
 * The DT_SYMTAB entry gives us an easy way of getting to the base
 * of the dynsym, but getting the size involves rooting around in the
 * dynamic lookup hash table. Here's the layout of the hash table:
 *
 *     +-------------------+
 *     |      nbucket      |    All values are of type
 *     +-------------------+    Elf32_Word
 *     |      nchain       |
 *     +-------------------+
 *     |     bucket[0]     |
 *     |       . . .       |
 *     | bucket[nbucket-1] |
 *     +-------------------+
 *     |     chain[0]      |
 *     |       . . .       |
 *     |  chain[nchain-1]  |
 *     +-------------------+
 *     (figure 5-12 from the SYS V Generic ABI)
 *
 * Symbol names are hashed into a particular bucket which contains
 * an index into the symbol table. Each entry in the symbol table
 * has a corresponding entry in the chain table which tells the
 * consumer where the next entry in the hash chain is. We can use
 * the nchain field to find out the size of the dynsym.
 *
 * We can figure out the size of the .plt section, but it takes some
 * doing. We need to use the following information:
 *
 *     DT_PLTGOT    base of the PLT
 *     DT_JMPREL    base of the PLT's relocation section
 *     DT_PLTRELSZ  size of the PLT's relocation section
 *     DT_PLTREL    type of the PLT's relocation section
 *
 * We can use the relocation section to figure out the address of the
 * last entry and subtract off the value of DT_PLTGOT to calculate
 * the size of the PLT.
 *
 * For more information, check out the System V Generic ABI.
 */

/*
 * For the .dynsym section.
 */

/*
 * For the .dynstr section.
 */

/*
 * We need all of those dynamic entries in order to put
 * together a complete set of elf sections, but we'll
 * let the PLT section slide if need be. The dynsym- and
 * dynstr-related dynamic entries are mandatory in both
 * executables and shared objects so if one of those is
 * missing, we're in some trouble and should abort.
 */
dprintf(
"text section missing required dynamic " /* program headers from in-core elf fragment */ /* unused shdr, and .shstrtab section */ sizeof (r[0]) *
ndx) !=
sizeof (r))
sizeof (r[0]) *
ndx) !=
sizeof (r))
/*
 * Copying the program headers directly from the process's
 * address space is a little suspect, but since we only
 * use them for their address and size values, this is fine.
 */

/*
 * The first elf section is always skipped.
 */

/* Section Header[1] sh_name: .shstrtab */
/* Section Header[2] sh_name: .dynsym */
/* Section Header[3] sh_name: .dynstr */
/* Section Header[4] sh_name: .dynamic */
/* Section Header[5] sh_name: .plt */

/*
 * For the .dynsym section.
 */

/*
 * For the .dynstr section.
 */

/*
 * We need all of those dynamic entries in order to put
 * together a complete set of elf sections, but we'll
 * let the PLT section slide if need be. The dynsym- and
 * dynstr-related dynamic entries are mandatory in both
 * executables and shared objects so if one of those is
 * missing, we're in some trouble and should abort.
 */
dprintf(
"text section missing required dynamic " /* program headers from in-core elf fragment */ /* unused shdr, and .shstrtab section */ sizeof (r[0]) *
ndx) !=
sizeof (r))
sizeof (r[0]) *
ndx) !=
sizeof (r))
/*
 * Copying the program headers directly from the process's
 * address space is a little suspect, but since we only
 * use them for their address and size values, this is fine.
 */

/*
 * The first elf section is always skipped.
 */

/* Section Header[1] sh_name: .shstrtab */
/* Section Header[2] sh_name: .dynsym */
/* Section Header[3] sh_name: .dynstr */
/* Section Header[4] sh_name: .dynamic */
/* Section Header[5] sh_name: .plt */

/*
 * We wouldn't need these if qsort(3C) took an argument for the callback...
 */

/*
 * Prefer the function to the non-function.
 */

/*
 * Prefer the weak or strong global symbol to the local symbol.
 */

/*
 * Prefer the name with fewer leading underscores in the name.
 */

/*
 * Prefer the symbol with the smaller size.
 */

/*
 * All other factors being equal, fall back to lexicographic order.
 */

/*
 * First record all the symbols into a table and count up the ones
 * that we're interested in. We mark symbols as invalid by setting
 * the st_name to an illegal value.
 */

/*
 * Allocate sufficient space for both tables and populate them
 * with the same symbols we just counted.
 */

/*
 * Sort the two tables according to the appropriate criteria.
 */

/*
 * Build the symbol table for the given mapped file.
 */
return;
/* We've already processed this file */

/*
 * Mark the file_info struct as having the symbol table initialized
 * even if we fail below. We tried once; we don't try again.
 */
dprintf(
"libproc ELF version is more recent than libelf\n");
/*
 * If we're not live, we can't open files from the /proc
 * object directory; we have only the mapping and file names
 * to guide us. We prefer the file_lname, but need to handle
 * the case of it being NULL in order to bootstrap: we first
 * come here during rd_new() when the only information we have
 * is the interpreter name associated with the AT_BASE mapping.
 */

/*
 * Open the object file, create the elf file, and then get the elf
 * header and .shstrtab data buffer so we can process sections by
 * name. If anything goes wrong try to fake up an elf file from
 */
dprintf(
"Pbuild_file_symtab: failed to open %s: %s\n",
dprintf(
"failed to fake up ELF file\n");
dprintf(
"failed to process ELF file %s: %s\n",
dprintf(
"failed to fake up ELF file\n");
/*
 * Before we get too excited about this elf file, we'll check
 * its checksum value against the value we have in memory. If
 * they don't agree, we try to fake up a new elf file and
 * proceed with that instead.
 */
dprintf(
"ELF file %s (%lx) doesn't match in-core image\n",
dprintf(
"failed to fake up ELF file\n");
dprintf(
"switched to faked up ELF file\n");
/*
 * Iterate through each section, caching its section header, data
 * pointer, and name. We use this for handling sh_link values below.
 */
goto bad;
/* Failed to get section header */ goto bad;
/* Failed to get section data */ goto bad;
/* Corrupt section name */

/*
 * Now iterate through the section cache in order to locate info
 * for the .symtab, .dynsym, .dynamic, .plt, and .SUNW_ctf sections:
 */

/*
 * It's possible that we already got the symbol
 * table from the core file itself. Either the file
 * differs, in which case our faked up elf file will
 * only contain the dynsym (not the symtab), or the
 * file matches, in which case we'll just be replacing
 * the symbol table we pulled out of the core file
 * with an equivalent one. In either case, this
 * check isn't essential, but it's a good idea.
 */

/*
 * Skip over bogus CTF sections so they don't come back
 */

/*
 * At this point, we've found all the symbol tables we're ever going
 * to find: the ones in the loop above and possibly the symtab that
 * was included in the core file. Before we perform any lookups, we
 * create sorted versions to optimize for lookups.
 */

/*
 * Fill in the base address of the text mapping for shared libraries.
 * This allows us to translate symbols before librtld_db is ready.
 */
dprintf(
"setting file_dyn_base for %s to %p\n",
/*
 * Record the CTF section information in the file info structure.
 */
goto done;
/* Nothing else to do if no load object info */

/*
 * If the object is a shared library and we have a different rl_base
 * value, reset file_dyn_base according to librtld_db's information.
 */
dprintf(
"resetting file_dyn_base for %s to %p\n",
/*
 * Fill in the PLT information for this file if a PLT symbol is found.
 */

/*
 * Bring the load object up to date; it is the only way the
 * user has to access the PLT data. The PLT information in the
 * rd_loadobj_t is not set in the call to map_iter() (the
 * callback for rd_loadobj_iter) where we set file_lo.
 */
dprintf(
"PLT found at %p, size = %lu\n",
/*
 * Fill in the PLT information.
 */
for (i = 0; i <
ndyn; i++) {
dprintf(
"_DYNAMIC found at %p, %lu entries, DT_JMPREL = %p\n",
/*
 * Given a process virtual address, return the map_info_t containing it.
 * If none found, return NULL.
 */

/* check that addr is in [vaddr, vaddr + size) */

/*
 * Return the map_info_t for the executable file.
 * If not found, return NULL.
 */

/* This is a poor way to test for text space */

/*
 * Given a shared object name, return the map_info_t for it. If no matching
 * object is found, return NULL. Normally, the link maps contain the full
 * take one of the following forms:
 *
 * 2. An exact basename match: "libc.so.1"
 * 3. An initial basename match up to a '.' suffix: "libc.so" or "libc"
 * 4. The literal string "a.out" is an alias for the executable mapping
 *
 * The third case is a convenience for callers and may not be necessary.
 *
 * As the exact same object name may be loaded on different link maps (see
 * dlmopen(3DL)), we also allow the caller to resolve the object name by
 * specifying a particular link map id. If lmid is PR_LMID_EVERY, the
 * first matching name will be returned, regardless of the link map id.
 */

/*
 * First pass: look for exact matches of the entire pathname or
 * basename (cases 1 and 2 above):
 */

/*
 * If we match, return the primary text mapping; otherwise
 * just return the mapping we matched.
 */

/*
 * Second pass: look for partial matches (case 3 above):
 */

/*
 * If we match, return the primary text mapping; otherwise
 * just return the mapping we matched.
 */

/*
 * One last check: we allow "a.out" to always alias the executable,
 * assuming this name was not in use for something else.
 */

/*
 * When two symbols are found by address, decide which one is to be preferred.
 */

/*
 * Prefer the non-NULL symbol.
 */

/*
 * Defer to the sort ordering...
 */

/*
 * Look up a symbol by address in the specified symbol table.
 * Adjustment to 'addr' must already have been made for the
 * offset of the symbol if this is a dynamic library symbol table.
 */

/*
 * We can't return when we've found a match, we have to continue
 * searching for the closest matching symbol.
 */
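The by-address lookup described above (a binary search that keeps going after the first hit) can be sketched like this. It is a simplified illustration under stated assumptions, not the libproc routine: `sketch_sym_t` and `sym_by_addr_sketch` are hypothetical names, the table is a plain array sorted by start address rather than an index into an ELF symbol table, and ties are broken by preferring the higher start address instead of deferring to the file's sort ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record. */
typedef struct {
	uint64_t	value;	/* start address */
	uint64_t	size;	/* length in bytes */
	const char	*name;
} sketch_sym_t;

/*
 * 'syms' must be sorted by ascending start address. Return a symbol
 * whose [value, value + size) range contains 'addr', preferring the
 * highest start address among the entries the search visits, or NULL
 * if no visited entry contains the address.
 */
const sketch_sym_t *
sym_by_addr_sketch(const sketch_sym_t *syms, size_t count, uint64_t addr)
{
	const sketch_sym_t *best = NULL;
	size_t lo = 0, hi = count;	/* half-open search range [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const sketch_sym_t *s = &syms[mid];

		/* Remember a hit, but keep searching for a closer one. */
		if (addr >= s->value && addr - s->value < s->size &&
		    (best == NULL || s->value > best->value))
			best = s;

		if (addr < s->value)
			hi = mid;
		else
			lo = mid + 1;
	}
	return (best);
}
```

Continuing to the right after a hit is what lets the search settle on the symbol that starts closest to the address rather than on whichever containing symbol the midpoint happened to land on first.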
/*
 * There may be many symbols with identical values so we walk
 * backward in the byaddr table to find the best match.
 */

/*
 * Look up a symbol by name in the specified symbol table.
 */

/*
 * Search the process symbol tables looking for a symbol whose
 * value to value+size contain the address specified by addr.
 *
 *     sym_name_buffer    containing the symbol name
 *     GElf_Sym           symbol table entry
 *     prsyminfo_t        ancillary symbol information
 *
 * Returns 0 on success, -1 on failure.
 */

/*
 * Adjust the address by the load object base address in
 * case the address turns out to be in a shared library.
 */

/*
 * Search both symbol tables, symtab first, then dynsym.
 */

/*
 * Search the process symbol tables looking for a symbol whose name matches the
 * specified name and whose object and link map optionally match the specified
 * parameters. On success, the function returns 0 and fills in the GElf_Sym
 * symbol table entry. On failure, -1 is returned.
 */
Lmid_t lmid,
/* link map to match, or -1 for any */ const char *
oname,
/* load object name */ const char *
sname,
/* symbol name */

/* create all the file_info_t's for all the mappings */

/*
 * Iterate through the loaded object files and look for the symbol
 * name in the .symtab and .dynsym of each. If we encounter a match
 * with SHN_UNDEF, keep looking in hopes of finding a better match.
 * This means that a name such as "puts" will match the puts function
 * in libc instead of matching the puts PLT entry in the a.out file.
 */

/*
 * Search the process symbol tables looking for a symbol whose name matches the
 * specified name, but without any restriction on the link map id.
 */

/*
 * Iterate over the process's address space mappings.
 */

/* create all the file_info_t's for all the mappings */

/*
 * Iterate over the process's mapped objects.
 */
(
void)
Prd_agent(P);
/* create file_info_t's for all the mappings */

/*
 * Given a virtual address, return the name of the underlying
 * mapped object (file), as provided by the dynamic linker.
 * Return NULL on failure (no underlying shared library).
 */

/* create all the file_info_t's for all the mappings */

/*
 * Given a virtual address, return the link map id of the underlying mapped
 * object (file), as provided by the dynamic linker. Return -1 on failure.
 */

/* create all the file_info_t's for all the mappings */

/*
 * Given an object name and optional lmid, iterate over the object's symbols.
 * If which == PR_SYMTAB, search the normal symbol table.
 * If which == PR_DYNSYM, search the dynamic symbol table.
 */

/*
 * Search the specified symbol table.
 */
for (i = 0; i <
count; i++) {
/*
 * In case you haven't already guessed, this relies on
 * the bitmask used in <libproc.h> for encoding symbol
 * type and binding matching the order of STB and STT
 * constants in <sys/elf.h>. ELF can't change without
 * breaking binary compatibility, so I think this is
 */
continue;
/* Invalid type or binding */

/*
 * Get the platform string from the core file if we have it;
 * just perform the system call for the caller if this is a live process.
 */

/*
 * Get the uname(2) information from the core file if we have it;
 * just perform the system call for the caller if this is a live process.
 */

/*
 * Get the zone name from the core file if we have it; look up the
 * name based on the zone id if this is a live process.
 */

/*
 * Called from Pcreate(), Pgrab(), and Pfgrab_core() to initialize
 * the symbol table heads in the new ps_prochandle.
 */

/*
 * Called from Prelease() to destroy the symbol tables.
 * Must be called by the client after an exec() in the victim process.
 */

/* number of argument or environment pointers to read all at once */

/*
 * Attempt to find the "_environ" variable in the process.
 * Failing that, use the original value provided by Ppsinfo().
 */
for (i = 0; i <
NARG; i++)
/*
 * Attempt to read the string from the process.
 */

/*
 * Bail if we have a corrupted environment
 */