gprof.c revision e0ddff35438f277370a2eae5c6718cd5ba0fe3ab
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License").  You may not use this file except in compliance
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */
/*
 * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * things which get -E excluded by default.
 */

/*
 * calculate scaled entry point addresses (to save time in asgnsamples),
 * and possibly push the scaled entry points over the entry mask,
 * if it turns out that the entry point is in one bucket and the code
 * for a routine is in the next bucket.
 */

/* for old-style gmon.out, nameslist is only in modules.nl */

"[alignentries] pushing svalue 0x%llx "

/*
 * Assign samples to the procedures to which they belong.
 *
 * There are three cases as to where pcl and pch can be
 * with respect to the routine entry addresses svalue0 and svalue1
 * as shown in the following diagram.  overlap computes the
 * distance between the arrows, the fraction of the sample
 * that is to be credited to the routine which starts at svalue0.
 *
 *      +-----------------------------------------------+
 * |  ->|    |<-         ->|         |<-         ->|    |<-  |
 * +---------+             +---------+             +---------+
 * pcl     pch             pcl     pch             pcl     pch
 *
 * For the vax we assert that samples will never fall in the first
 * two bytes of any routine, since that is the entry mask,
 * thus we call alignentries() to adjust the entry points if
 * the entry mask falls in one bucket but the code for the routine
 * doesn't start until the next bucket.  In conjunction with the
 * alignment of routine addresses, this should allow us to have
 * only one sample for every four bytes of text space and never
 * have any overlap (the two end cases, above).
 */

/* read samples and assign to namelist symbols */

"[asgnsamples] pcl 0x%llx pch 0x%llx ccnt %d\n",
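/*
 * Illustrative sketch only, not part of gprof.c: one way to compute the
 * overlap described in the asgnsamples comment above. The helper names
 * sample_overlap() and credited_ticks() are hypothetical, and the sketch
 * assumes half-open ranges [pcl, pch) and [svalue0, svalue1).
 */

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes of the sample bucket [pcl, pch)
 * that fall inside the routine [svalue0, svalue1). Returns 0 when the
 * bucket lies entirely outside the routine.
 */
static uint64_t
sample_overlap(uint64_t pcl, uint64_t pch, uint64_t svalue0, uint64_t svalue1)
{
	uint64_t lo = (pcl > svalue0) ? pcl : svalue0;	/* max of the low ends */
	uint64_t hi = (pch < svalue1) ? pch : svalue1;	/* min of the high ends */

	return ((hi > lo) ? hi - lo : 0);
}

/*
 * Hypothetical helper: share of a bucket's ccnt ticks credited to the
 * routine, proportional to the overlapping part of the bucket.
 */
static double
credited_ticks(uint64_t pcl, uint64_t pch, uint64_t svalue0,
    uint64_t svalue1, double ccnt)
{
	if (pch <= pcl)
		return (0.0);	/* empty bucket: nothing to credit */
	return (ccnt * (double)sample_overlap(pcl, pch, svalue0, svalue1) /
	    (double)(pch - pcl));
}
```

/*
 * With a routine spanning [10, 20), a bucket [8, 12) overlaps it by two
 * bytes (the left-hand case in the diagram), a bucket [12, 16) by all four
 * bytes (the middle case), and a bucket [18, 22) by two bytes (the
 * right-hand case).
 */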
for (j = (j ? j - 1 : 0); j < nname; j++) {
/*
 * if high end of tick is below entry address,
 */

/*
 * if low end of tick into next routine,
 */

(void) printf("[asgnsamples] "
    "(0x%llx->0x%llx-0x%llx) %s gets "
    "%f ticks %lld overlap\n",
/*
 * Write the callgraph header
 */

/* Current offset inside the callgraph object */

/* If this is the last callee, set next_to to 0 */

/*
 * Dump this callee's raw arc information with all
 */

/*
 * If no more callers for this callee, set
 */

/*
 * To save all pc-hits in all the gmon.out's is infeasible, as this
 * may become quite huge even with a small number of files to sum.
 *
 * Instead, we'll dump *fictitious hits* to correct functions
 * by scanning module namelists. Again, since this is summing
 * pc-hits, we may have to dump the pcsamples out in chunks if the
 * number of pc-hits is high.
 */

/*
 * Set up *fictitious* hits (to function entry) buffer
 */

for (i = 0; i < nelem; i++)
/* Allocate for path strings buffer */

/* Dump out PROF_MODULE_T info for all non-aout modules */

/*
 * Initialize offsets for ProfModule elements.
 */

/* Note that offset to every path str need not be aligned */

/* Write out the module path strings */

/*
 * If we have inactive modules, their current load addresses may overlap with
 * active ones, and so we've to assign fictitious, non-overlapping addresses
 * to all modules before we dump them.
 */

/* Pick the lowest load address among modules */

/*
 * Return total path size of non-aout modules only
 */

/*
 * All module info is in fine shape already if there are no
 */

/*
 * Assign fictitious load addresses to all (non-aout) modules so
 * that sum info can be dumped out.
 */

/* just to give an appearance of reality */

/*
 * can't use this lbase & lend pair, as it
 * overlaps with aout's addresses
 */

/*
 * dump the header; use the last header read in
 */

/*
 * dump the normalized raw arc information. For old-style dumping,
 * the only namelist is in modules.nl
 */

"[dumpsum_ostyle] frompc 0x%llx selfpc "

unsigned long total_arcs;	/* total number of arcs in all */
unsigned long ncallees;		/* no. of callees with parents */

/*
 * Dump the new-style gprof header. Even if one of the original
 * profiled-files was of an older version, the summed file is of
 */

/*
 * Fix up load-maps and dump out modules info
 *
 * Fix up module load maps so inactive modules get *some* address
 * (and btw, could you get the total size of non-aout module path
 */

/*
 * Dump out the summ'd pcsamples
 *
 * For dumping call graph information later, we need certain
 * statistics (like total arcs, number of callers for each node);
 * collect these also while we are at it.
 */

/*
 * Dump out the summ'd call graph information
 */

/*
 * if count == 0 this is a null arc and
 * we don't need to tally it.
 */

/*
 * Lookup the caller and callee pcs in namelists of
 */

(void) printf("[tally] arc from %s to %s traversed "

/*
 * Look up a module's base address in a sorted list of pc-hits. Unlike
 * nllookup(), this deals with misses by mapping them to the next *higher*
 * pc-hit. This is so that we get into the module's first pc-hit right away,
 * even if the module's entry-point (load_base) itself is not a hit.
 */

/* must never reach here! */

/* Locate the first pc-hit for this module */

(void) printf("[assign_pcsamples] no pc-hits in\n");
return;		/* no pc-hits in this module */

/* Assign all pc-hits in this module to appropriate functions */

/* Update the corresponding function's time */

/*
 * Collect all pc-hits in this function. Each
 * pc-hit counts as 1 tick.
 */

/*
 * pc sample could not be assigned to function;
 */

"[process_pcsamples] number of pcsamples = %lld\n",
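/*
 * Illustrative sketch only, not the routine used in this file: a binary
 * search that behaves the way the lookup comment above describes, mapping
 * a miss to the next *higher* pc-hit. The name first_hit_at_or_above() and
 * the plain uint64_t array are assumptions made for the example; returning
 * n when no hit qualifies is also a choice of this sketch.
 */

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given hits[0..n) sorted in ascending order, return
 * the index of the first hit that is >= base, or n if every hit is lower.
 */
static size_t
first_hit_at_or_above(const uint64_t *hits, size_t n, uint64_t base)
{
	size_t lo = 0;
	size_t hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (hits[mid] < base)
			lo = mid + 1;	/* answer lies strictly above mid */
		else
			hi = mid;	/* mid itself may be the answer */
	}
	return (lo);
}
```

/*
 * A caller would treat a result of n, or a hit at or beyond the module's
 * end address, as "no pc-hits in this module".
 */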
/* buffer with no pc samples ? */

/*
 * If we're processing pcsamples of a profile sum, we could have
 * more than PROF_BUFFER_SIZE number of samples. In such a case,
 * we must read the pcsamples in chunks.
 */

/* Allocate for the pcsample chunk */

/* Copy the current set of pcsamples */

/* Sort the pc samples */

/*
 * Assign pcsamples to functions in the currently active
 */

/* Update total number of pcsamples read so far */

/*
 * Note that *callee_off* increment in the for loop below
 * uses *calleep* and *calleep* doesn't get set until the for loop
 * is entered. We don't expect the increment to be executed before
 * the loop body is executed at least once, so this should be ok.
 */

/* LINTED: pointer cast */

/*
 * We could choose either to sort the {caller, callee}
 * the modules list is usually very small, we'll choose the
 */

/*
 * If we cannot identify a callee with a module, there's
 * no use worrying about who called it.
 */

"[process_cgraph] callee %#llx missed\n",
/* LINTED: pointer cast */

"[process_cgraph] caller %#llx "

"[process_cgraph] arc <%#llx, %#llx, "

/*
 * Two modules overlap each other if they don't lie completely *outside*
 */

/* case 1: new module lies completely *before* the old one */

/* case 2: new module lies completely *after* the old one */

/* probably a dlopen: the modules overlap each other */

(void) printf("[process_modules] module obj version %u\n",
/* Check version of module type object */

/*
 * Scan the PROF_MODULES_T list and add modules to current list
 * of modules, if they're not present already
 */

/* LINTED: pointer cast */

/*
 * Since the prog could've been renamed after its run, we
 * should see if this overlaps a.out. If it does, it is
 * probably the renamed aout. We should also skip any other
 * non-sharedobj's that we see (or should we report an error ?)
 */

/* LINTED: pointer cast */

"[process_modules] `%s'\n", so_path);

/*
 * Check all modules (leave the first one, 'cos that
 * is the program executable info). If this module is already
 * there in the list, update the load addresses and proceed.
 */

/*
 * We expect the full pathname for all shared objects
 * needed by the program executable. In this case, we
 * simply need to compare the paths to see if they are
 */

/*
 * Check if this new shared object will overlap
 * any existing module. If yes, remove the old one
 * from the linked list (but don't free it, 'cos
 * there may be symbols referring to this module
 */

"[process_modules] `%s'\n",
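/*
 * Illustrative sketch only, not the code from this file: the two-case
 * overlap rule from the comments above ("completely before" and
 * "completely after"), written as a standalone predicate. The name
 * ranges_overlap() is hypothetical, and treating each module as a
 * half-open range [base, end) is an assumption of the sketch.
 */

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the new module's address range intersects
 * the old module's address range.
 */
static bool
ranges_overlap(uint64_t new_base, uint64_t new_end,
    uint64_t old_base, uint64_t old_end)
{
	/* case 1: new module lies completely before the old one */
	if (new_end <= old_base)
		return (false);

	/* case 2: new module lies completely after the old one */
	if (new_base >= old_end)
		return (false);

	/* neither case holds, so the ranges share at least one address */
	return (true);
}
```

/*
 * Under the half-open assumption, a module ending at 0x2000 and one
 * starting at 0x2000 are adjacent and do not overlap.
 */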
/* Module already there, skip it */

/* LINTED: pointer cast */

(void) printf("[process_modules] base=%#llx, "

/*
 * Check if gmon.out is outdated with respect to the new
 */

"%s: shared obj outdates prof info\n", whoami);

/* Create a new module element */

/* and fill in info... */

"[process_modules] base=%#llx, end=%#llx\n",

/* Create this module's nameslist */

/* Add it to the tail of active module list */

"[process_modules] total shared objects = %ld\n",

/*
 * Move to the next module in the PROF_MODULES_T list
 */

/* LINTED: pointer cast */

/* Except the executable, no other module should remain active */

/*
 * Before processing a new gmon.out, all modules except the
 * program executable must be made inactive, so that symbols
 * are searched only in the program executable, if we don't
 * find a MODULES_T object. Don't do it *after* we read a gmon.out,
 * because we need the active module data after we're done with
 * the last gmon.out, if we're doing summing.
 */

/* LINTED: pointer cast */

"\n[getpfiledata] object %s [%#lx]\n",

"%s: unknown prof object type=%d\n",
/* LINTED: pointer cast */

/*
 * the rest of the file consists of
 * a bunch of <from,self,count> tuples.
 */

/*
 * If rflag is set then this is a profiled
 * image generated by rtld. It needs to be
 * 'converted' to the standard data format.
 */

/*
 * If rflag is set then this is a profiled
 * image generated by rtld. It needs to be
 * 'converted' to the standard data format.
 */

/*
 * If these aren't big %pc's, we need to read
 * into the 32-bit raw arc structure, and
 * assign the members into the actual arc.
 */

(void) printf("[getpfile] frompc 0x%llx selfpc "

"%s: No room for %d sample pc's\n",

"%s: unexpected EOF after reading %d/%d samples\n",
/*
 * Check versioning info. For now, let's say we provide
 * backward compatibility, so we accept all older versions.
 */

/*
 * Map gmon.out onto memory.
 */

/*
 * Before we close this fd, save this gmon.out's info to later verify
 * if the shared objects it references have changed since the time
 * they were used to generate this gmon.out
 */

/*
 * Read in the magic. Note that we changed the cast "unsigned long"
 * to "unsigned int" because that's how h_magic is defined in the
 */

/*
 * First check if this is versioned or *old-style* gmon.out
 */

/*
 * Now, we need to determine if this is a run-time linker
 * profiled file or if it is a standard gmon.out.
 *
 * We do this by checking if magic matches PRF_MAGIC. If it
 * does, then this is a run-time linker profiled file, if it
 * doesn't, it must be a gmon.out file.
 */

/*
 * If the rflag is set then the input file is
 * rtld profiled data, we'll read it in and convert
 * it to the standard format (ie: make it look like
 */

"%s: expected version %d, "
"got version %d when processing 64-bit "
"run-time linker profiled file.\n",

/*
 * If the rflag is set then the input file is
 * rtld profiled data, we'll read it in and convert
 * it to the standard format (ie: make it look like
 */

"%s: expected version %d, "
"got version %d when processing "
"run-time linker profiled file.\n",

/*
 * If we're not reading big %pc's, we need to read
 * the 32-bit header, and assign the members to
 */

/*
 * perform sanity check on profiled file we've opened.
 */

"%s: badly formed profiled data.\n",
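/*
 * Illustrative sketch only: the "read the 32-bit header, and assign the
 * members" step mentioned above, shown as a member-by-member widening.
 * The struct layouts and the names hdr32_sketch, hdr_sketch and
 * widen_hdr() are hypothetical; only the field names lowpc, highpc and
 * ncnt are taken from the [openpfile] debug output in this file.
 */

```c
#include <stdint.h>

/* Hypothetical 32-bit header as it might be read from an old profile. */
struct hdr32_sketch {
	uint32_t lowpc;
	uint32_t highpc;
	int32_t ncnt;
};

/* Hypothetical in-memory header wide enough for 64-bit pc values. */
struct hdr_sketch {
	uint64_t lowpc;
	uint64_t highpc;
	int64_t ncnt;
};

/*
 * Hypothetical helper: copy each member across so that the pc values are
 * zero-extended and the count is sign-extended to the wider types.
 */
static struct hdr_sketch
widen_hdr(const struct hdr32_sketch *h32)
{
	struct hdr_sketch h;

	h.lowpc = h32->lowpc;
	h.highpc = h32->highpc;
	h.ncnt = h32->ncnt;
	return (h);
}
```

/*
 * Assigning member by member, rather than copying bytes, keeps the result
 * independent of the padding and field sizes of the two layouts.
 */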
"%s: badly formed gmon.out file.\n",
"%s: incompatible with first gmon file\n",
(void) printf("[openpfile] hdr.lowpc 0x%llx hdr.highpc "
    "0x%llx hdr.ncnt %lld\n",

"[openpfile] s_lowpc 0x%llx s_highpc 0x%llx\n",

"[openpfile] lowpc 0x%llx highpc 0x%llx\n",

(void) printf("[openpfile] sampbytes %d nsamples %d\n",
/*
 * Information from a gmon.out file depends on whether it's versioned
 * or non-versioned, *old style* gmon.out. If old-style, it is in two
 * parts : an array of sampling hits within pc ranges, and the arcs. If
 * versioned, it contains a header, followed by any number of
 */

"usage: gprof [ -abcCDlsz ] [ -e function-name ] "
"[ -E function-name ]\n\t[ -f function-name ] "
"[ -F function-name ]\n\t[ image-file "
"[ profile-file ... ] ]\n");

/*
 * turn off default functions
 */

/*
 * how many ticks per second?
 * if we can't tell, report time in ticks.
 */

/*
 * get information about mon.out file(s).
 */

/*
 * dump out a gmon.sum file if requested
 */

/*
 * assign samples to procedures
 */

/*
 * assemble the dynamic profile
 */

/*
 * print the dynamic profile
 */

/* raw output of all symbols in all their glory */

(void) printf(" Name, pc_entry_pt, svalue, tix_in_routine, "
    "#calls, selfcalls, index \n");
for (i = 0; i < modules.nname; i++) {
	/* Print each symbol */