Commit message | Author | Age | Files | Lines
...
* Add support for reading debug symbols automatically on systems where shared ↵Matt Cross2015-03-261-1/+10
| | | | libraries with debug symbols are installed at "/usr/lib/debug/<originalpath>.debug", such as RHEL and CentOS.
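The lookup rule in this commit message can be sketched as a path computation: prefix the library's original absolute path with "/usr/lib/debug" and append ".debug". The sketch below is in Python for illustration only (pprof itself is a Perl script); the function names `debug_symbol_path` and `pick_symbol_file`, and the fallback to the original library when no debug file exists, are assumptions rather than the patch's actual code.

```python
import os

# Directory named in the commit message for separately installed debug symbols.
DEBUG_ROOT = "/usr/lib/debug"


def debug_symbol_path(library_path, debug_root=DEBUG_ROOT):
    """Return "<debug_root><library_path>.debug" for an absolute library path."""
    if not library_path.startswith("/"):
        raise ValueError("expected an absolute path: %r" % library_path)
    return debug_root + library_path + ".debug"


def pick_symbol_file(library_path):
    """Prefer the separate debug file when present (assumed fallback policy)."""
    candidate = debug_symbol_path(library_path)
    return candidate if os.path.exists(candidate) else library_path
```

A caller would pass the path seen in the process mappings, for example `pick_symbol_file("/usr/lib64/libfoo.so.1")`, where the library name is purely illustrative.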
* callgrind : handle inlined functionsJonathan Lambrechts2015-02-131-8/+33
|
* pprof : callgrind : fix unknown filesJonathan Lambrechts2015-02-131-1/+1
|
* issue-672: fixed date of news entry of gperftools 2.4 releaseAliaksey Kandratsenka2015-02-091-1/+1
| | | | It is 2015 and not 2014. Spotted and reported by Armin Rigo.
* fixed default value of HEAP_PROFILER_TIME_INTERVAL in .html docAliaksey Kandratsenka2015-01-101-1/+1
|
* bumped version to 2.4gperftools-2.4Aliaksey Kandratsenka2015-01-104-8/+12
|
* bumped version to 2.4rcgperftools-2.3.90Aliaksey Kandratsenka2014-12-283-7/+7
|
* updated NEWS for gperftools 2.4rcAliaksey Kandratsenka2014-12-281-0/+23
|
* pprof: allow disabling auto-removal of "constant 2nd frame"Aliaksey Kandratsenka2014-12-281-1/+10
| | | | | | | | | | | | | | "constant 2nd frame" feature is supposed to detect and work around incorrect cpu profile stack captures where parts of or whole cpu profiling signal handler frames are not skipped. I've seen programs where this feature incorrectly removes non-signal frames. Plus it actually hides bugs in stacktrace capturing which we want to be able to spot. There is now a --no-auto-signal-frm option for disabling it.
* cpuprofiler: drop correct number of signal handler framesAliaksey Kandratsenka2014-12-281-9/+10
| | | | We actually have 3 and not 2 of them.
* pprof: eliminate duplicate top frames if dropping signal framesAliaksey Kandratsenka2014-12-281-0/+8
| | | | | | | | | | | | | | | In cpu profiles that had parts of signal handler we could have a situation like this: * PC * signal handler frame * PC Specifically when capturing stacktraces via libunwind. For such stacktraces pprof used to draw a self-cycle in functions, confusing everybody. Given that we might have a number of such profiles in the wild it makes sense to treat that duplicate PC issue.
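The situation described here (PC, signal handler frame, PC) can be illustrated with a small sketch: once the signal handler frames are dropped, the top of the stack holds the same PC twice, and one copy has to go. This is a Python illustration under stated assumptions, not pprof's Perl code; the function name `drop_signal_frames` and the `is_signal_frame` predicate are hypothetical, and how pprof actually recognizes signal frames is not shown in this log.

```python
def drop_signal_frames(stack, is_signal_frame):
    """Remove signal handler frames, then collapse a duplicated top PC.

    `stack` is a list of program counters, innermost frame first.
    `is_signal_frame` is a caller-supplied predicate (hypothetical).
    """
    kept = [pc for pc in stack if not is_signal_frame(pc)]
    # After dropping signal frames the leaf PC may appear twice in a row;
    # keeping both would be drawn as a function calling itself.
    if len(kept) >= 2 and kept[0] == kept[1]:
        kept = kept[1:]
    return kept
```

Only the top pair is collapsed, so genuine recursion deeper in the stack is left untouched.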
* cpuprofiler: better explain deduplication of top stacktrace entryAliaksey Kandratsenka2014-12-281-4/+5
|
* cpuprofiler: disable capturing stacktrace from signal's ucontextAliaksey Kandratsenka2014-12-281-1/+1
| | | | | | This was reported to cause problems due to libunwind occasionally returning a top level pc that is 1 smaller than the real pc.
* pprof: added support for dumping stacks in --text modeAliaksey Kandratsenka2014-12-281-0/+17
| | | | | Which is very useful for diagnosing stack capturing and processing bugs.
* pprof: made --show-addresses workAliaksey Kandratsenka2014-12-281-0/+1
|
* Make PPC64 use 64K of internal page size for tcmalloc by defaultRaphael Moreira Zinsly2014-12-231-2/+4
| | | | | This patch sets the default tcmalloc internal page size to 64K when built on PPC.
* New configure flags to set the alignment and page size of tcmallocRaphael Moreira Zinsly2014-12-233-28/+65
| | | | | | | | Added two new configure flags, --with-tcmalloc-pagesize and --with-tcmalloc-alignment, in order to set the tcmalloc internal page size and tcmalloc allocation alignment without the need of a compiler directive and to make the choice of the page size independent of the allocation alignment.
* start building malloc_extension_c_test even with static linkingAliaksey Kandratsenka2014-12-211-7/+4
| | | | | Comment in Makefile.am stating that it doesn't work with static linking is not accurate anymore.
* unbreak malloc_extension_c_test on clangAliaksey Kandratsenka2014-12-211-1/+2
| | | | | | Looks like even force_malloc trick was not enough to force clang to actually call malloc. I'm now calling tc_malloc directly to prevent that smartness.
* added subdir-objects automake optionsAliaksey Kandratsenka2014-12-212-0/+9
| | | | This is suggested by automake itself regarding future-compat.
* fixed C++ comment warning in malloc_extension_c.h from C compilerAliaksey Kandratsenka2014-12-211-1/+1
|
* made AtomicOps_x86CPUFeatureStruct hiddenAliaksey Kandratsenka2014-12-201-0/+3
| | | | So that access to has_sse2 is faster under -fPIC.
* dropped atomicops workaround for irrelevant Opteron locking bugAliaksey Kandratsenka2014-12-202-32/+1
| | | | | | | | | | | | | | | | | | | It's not cheap at all when done in this way (i.e. without runtime patching) and apparently useless. It looks like Linux kernel never got this workaround at all. See bugzilla ticket: https://bugzilla.kernel.org/show_bug.cgi?id=11305 And I see no traces of this workaround in glibc either. On the other hand, opensolaris folks apparently still have it (or something similar, based on comments on linux bugzilla) in their code: https://github.com/illumos/illumos-gate/blob/32842aabdc7c6f8f0c6140a256cf42cf5404fefb/usr/src/uts/i86pc/os/mp_startup.c#L1136 And affected CPUs (if any) are from year 2008 (that's 6 years now). Plus even if somebody still uses those cpus (which is unlikely), they won't have working kernel and glibc anyways.
* enabled aggressive decommit by defaultAliaksey Kandratsenka2014-12-202-3/+3
| | | | | TCMALLOC_AGGRESSIVE_DECOMMIT=f is one way to disable it and SetNumericProperty is another.
* added basic unit test for singular malloc hooksAliaksey Kandratsenka2014-12-071-0/+22
|
* inform compiler that tcmalloc allocation sampling is unlikelyAliaksey Kandratsenka2014-12-071-1/+1
| | | | | Now compiler generates slightly better code which produces jump-less code for common case of not sampling allocations.
* eliminated CheckIfKernelSupportsTLSAliaksey Kandratsenka2014-12-074-65/+3
| | | | | We don't care about pre-2.6.0 kernels anymore. So we can assume that if compile time check worked, then at runtime it'll work.
* set elf visibility to hidden for malloc hooksAliaksey Kandratsenka2014-12-071-10/+10
| | | | To speed up access to them under -fPIC.
* introduced ATTRIBUTE_VISIBILITY_HIDDENAliaksey Kandratsenka2014-12-071-0/+6
| | | | | So that we can disable elf symbol interposition for certain perf-sensitive symbols.
* replaced separate singular malloc hooks with faster HookListAliaksey Kandratsenka2014-12-072-137/+16
| | | | | Specifically, we can now check in one place if hooks are set at all, instead of two places. Which makes fast path shorter.
* removed extra barriers in malloc hooks mutation methodsAliaksey Kandratsenka2014-12-071-5/+5
| | | | | Because those are already done under spinlock and read-only and lockless Traverse is already tolerant to slight inconsistencies.
* introduced support for deprecated singular hooks into HookListAliaksey Kandratsenka2014-12-072-11/+44
| | | | So that we can later drop separate singular hooks.
* returned date of 2.3rc in NEWS backAliaksey Kandratsenka2014-12-071-0/+2
|
* bumped version to 2.3gperftools-2.3Aliaksey Kandratsenka2014-12-073-8/+8
|
* updated NEWS for gperftools 2.3Aliaksey Kandratsenka2014-12-071-1/+16
|
* Added option to disable libunwind linkingRaphael Moreira Zinsly2014-11-271-6/+19
| | | | | This patch adds a configure option to enable or disable libunwind linking. The patch also disables libunwind on ppc by default.
* compile libunwind unwinder only if __thread is supportedAliaksey Kandratsenka2014-11-271-1/+3
| | | | This fixed build on certain OSX that I have access to.
* issue-658: correctly close socketpair fds when socketpair failsAliaksey Kandratsenka2014-11-271-1/+1
| | | | This applies patch by glider.
* bumped version to 2.3rcgperftools-2.2.90Aliaksey Kandratsenka2014-11-023-7/+7
|
* updated NEWS for gperftools 2.3rcAliaksey Kandratsenka2014-11-021-0/+62
|
* implemented cpu-profiling mode that profiles threads separatelyAliaksey Kandratsenka2014-11-026-24/+183
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Default mode of operation of cpu profiler uses itimer and SIGPROF. This timer is by definition per-process and no spec defines which thread is going to receive SIGPROF. And it provides correct profiles only if we assume that probability of picking threads will be proportional to cpu time spent by threads. It is easy to see that recent Linux (at least on common SMP hardware) doesn't satisfy that assumption. Quite big skews of SIGPROF ticks between threads are visible. I.e. I could see as big as 70%/20% division instead of 50%/50% for pair of cpu-hog threads. (And I do see it become 50/50 with new mode.) Fortunately POSIX provides mechanism to track per-thread cpu time via posix timers facility. And even more fortunately, Linux also provides mechanism to deliver timer ticks to specific threads. Interestingly, it looks like FreeBSD also has very similar facility and seems to suffer from same skew. But due to differences in how threads are identified, I haven't bothered to try to support this mode on FreeBSD. This commit implements new profiling mode where every thread creates posix timer which tracks thread's cpu time. Threads also set up signal delivery to themselves on overflows of that timer. This new mode requires every thread to be registered in cpu profiler. Existing ProfilerRegisterThread function is used for that. Because registering threads requires application support (or suitable LD_PRELOAD-able wrapper for thread creation API), new mode is off by default. And it has to be manually activated by setting environment variable CPUPROFILE_PER_THREAD_TIMERS. New mode also requires librt symbols to be available. Which we do not link to due to librt's dependency on libpthread. Which we avoid due to perf impact of bringing in libpthread to otherwise single-threaded programs. So it has to be either already loaded by profiling program or LD_PRELOAD-ed.
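Activating the per-thread mode comes down to environment setup before the profiled program starts. The Python sketch below only builds that environment. Assumptions, none of which come from this commit message: the helper name `per_thread_profiling_env` is hypothetical, the value "1" is arbitrary (the message only says the variable has to be set), `CPUPROFILE` is the cpu profiler's usual output-path variable, and the librt path must be supplied by the caller because it differs between systems.

```python
import os


def per_thread_profiling_env(profile_path, librt_path, base_env=None):
    """Return a copy of the environment with per-thread cpu profiling enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["CPUPROFILE"] = profile_path            # where the profile is written
    env["CPUPROFILE_PER_THREAD_TIMERS"] = "1"   # value is an assumption
    # librt has to be loaded already or preloaded; keep any existing preloads.
    preload = [librt_path]
    if env.get("LD_PRELOAD"):
        preload.append(env["LD_PRELOAD"])
    env["LD_PRELOAD"] = ":".join(preload)
    return env
```

The result can be passed as `env=` to `subprocess.run`. The program itself still has to call ProfilerRegisterThread from every thread, as the commit message explains; nothing in this sketch does that.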
* drop workaround for too old redhat 7Aliaksey Kandratsenka2014-11-021-7/+0
| | | | Note that this is _not_ RHEL7 but original redhat 7 from early 2000s.
* don't add leaf function twice to profile under libunwindAliaksey Kandratsenka2014-11-021-2/+11
|
* pprof: indicate if using remote profileAliaksey Kandratsenka2014-11-021-0/+1
| | | | | A missing profile file is a common source of confusion, so a bit more clarity is useful.
* issue-493: correctly detect __ARM_ARCH_6ZK__ for MemoryBarrierAliaksey Kandratsenka2014-11-021-1/+1
| | | | Which should fix issue reported by user pedronavf
* issue-655: use safe getenv for aggressive decommit mode flagAliaksey Kandratsenka2014-11-022-5/+46
| | | | | Because otherwise we risk deadlock due to too early use of getenv on windows.
* issue-654: [pprof] handle split text segmentsAliaksey Kandratsenka2014-10-181-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This applies patch by user simonb. Quoting: Relocation packing splits a single executable load segment into two. Before: LOAD 0x000000 0x00000000 0x00000000 0x2034d28 0x2034d28 R E 0x1000 LOAD 0x2035888 0x02036888 0x02036888 0x182d38 0x1a67d0 RW 0x1000 After: LOAD 0x000000 0x00000000 0x00000000 0x14648 0x14648 R E 0x1000 LOAD 0x014648 0x0020c648 0x0020c648 0x1e286e0 0x1e286e0 R E 0x1000 ... LOAD 0x1e3d888 0x02036888 0x02036888 0x182d38 0x1a67d0 RW 0x1000 The .text section is in the second LOAD, and this is not at offset/address zero. The result is that this library shows up in /proc/self/maps as multiple executable entries, for example (note: this trace is not from the library dissected above, but rather from an earlier version of it): 73b0c000-73b21000 r-xp 00000000 b3:19 786460 /data/.../libchrome.2160.0.so 73b21000-73d12000 ---p 00000000 00:00 0 73d12000-75a90000 r-xp 00014000 b3:19 786460 /data/.../libchrome.2160.0.so 75a90000-75c0d000 rw-p 01d91000 b3:19 786460 /data/.../libchrome.2160.0.so When parsing this, pprof needs to merge the two r-xp entries above into a single entry, otherwise the addresses it prints are incorrect. The following fix against 2.2.1 was sufficient to make pprof --text print the correct output. Untested with other pprof options.
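The merge described in the quoted patch can be sketched as follows: parse each /proc/self/maps line, keep only executable mappings backed by a file, and fold multiple r-xp entries for the same file into one range. This is a simplified Python illustration, not the Perl patch itself; the function names are hypothetical, and the rule used here (merge by path, keeping the lowest start, highest end and lowest file offset) is an assumption about one reasonable way to do it.

```python
import re

# start-end perms offset dev inode [path]
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+([0-9a-f]+)\s+\S+\s+\d+\s*(.*)$"
)


def parse_maps(text):
    """Parse /proc/<pid>/maps text into a list of dicts."""
    entries = []
    for line in text.splitlines():
        match = MAPS_LINE.match(line.strip())
        if not match:
            continue
        start, end, perms, offset, path = match.groups()
        entries.append({
            "start": int(start, 16),
            "end": int(end, 16),
            "perms": perms,
            "offset": int(offset, 16),
            "path": path,
        })
    return entries


def merge_executable_segments(entries):
    """Fold all executable mappings of the same file into a single entry."""
    merged = {}
    for entry in entries:
        if "x" not in entry["perms"] or not entry["path"]:
            continue
        previous = merged.get(entry["path"])
        if previous is None:
            merged[entry["path"]] = dict(entry)
        else:
            previous["start"] = min(previous["start"], entry["start"])
            previous["end"] = max(previous["end"], entry["end"])
            previous["offset"] = min(previous["offset"], entry["offset"])
    return list(merged.values())
```

Applied to the four-line trace quoted above, the two r-xp lines for the library become one entry spanning 73b0c000 to 75a90000 with file offset 0.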
* Fix parsing /proc/pid/maps dump in CPU profile data fileRicardo M. Correia2014-10-111-1/+1
| | | | | | | | | | | | | When trying to use pprof on my machine, the symbols of my program were not being recognized. It turned out that pprof, when calculating the offset of the text list of mapped objects (the last section of the CPU profile data file), was assuming that the slot size was always 4 bytes, even on 64-bit machines. This led to ParseLibraries() reading a lot of garbage data at the beginning of the map, and consequently the regex was failing to match on the first line of the real (non-garbage) map.
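The bug above is a slot-size assumption: byte offsets into the profile must be computed with 8-byte slots on 64-bit machines, not 4. A hedged Python sketch of the idea follows. It assumes the CPU profile header begins with the slot values 0 and 3, as in gperftools' CPU profile file format description, and that the data is little-endian; both are assumptions not stated in this log, and the function names are hypothetical.

```python
import struct


def detect_slot_size(header_bytes):
    """Guess the slot size (4 or 8 bytes) from the first 8 header bytes.

    With 4-byte slots the first two 32-bit words are (0, 3).
    With 8-byte little-endian slots the first slot alone fills 8 bytes,
    so both 32-bit words are 0.
    """
    first, second = struct.unpack("<II", header_bytes[:8])
    if first != 0:
        raise ValueError("unexpected header: first word is not 0")
    return 8 if second == 0 else 4


def byte_offset(slot_count, slot_size):
    """Convert a position counted in slots into a byte offset."""
    return slot_count * slot_size
```

Hard-coding 4 in place of the detected size halves every offset on a 64-bit profile, which would explain the map parser starting in the middle of binary data as the commit message describes.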
* Added remaining memory allocated info to 'Exiting' dump messageAliaksey Kandratsenka2014-09-061-1/+21
| | | | This applies patch by user yurivict.
* Cope with new addr2line outputs for DWARF4Adam McNeeney2014-08-231-0/+6
| | | | | | | Copes with ? for line number (converts to 0). Copes with (discriminator <num>) suffixes to file/linenum (removes). Change-Id: I96207165e4852c71d3512157864f12d101cdf44a
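Both adjustments in this commit are small string normalizations on the "file:line" text that addr2line prints. A minimal Python sketch is shown below, not the Perl change itself; the function name `parse_addr2line_location` is hypothetical, and treating any non-numeric line field as 0 goes slightly beyond the "?" case the message mentions.

```python
import re

# Trailing " (discriminator N)" suffix emitted by newer addr2line versions.
_DISCRIMINATOR = re.compile(r"\s*\(discriminator \d+\)\s*$")


def parse_addr2line_location(text):
    """Split "file:line" output into (filename, line_number)."""
    cleaned = _DISCRIMINATOR.sub("", text.strip())
    filename, separator, line = cleaned.rpartition(":")
    if not separator:
        return cleaned, 0
    # "?" (or anything else non-numeric) is mapped to line 0.
    return filename, int(line) if line.isdigit() else 0
```

Using `rpartition` keeps file names that themselves contain a colon intact, since only the last colon separates the line number.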