On some distributions, such as RHEL and CentOS, libraries with debug symbols are installed at "/usr/lib/debug/<originalpath>.debug".
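Here is a minimal C++ sketch of that lookup rule. It assumes the debug file's path is "/usr/lib/debug", followed by the library's original absolute path, followed by ".debug". `DebugFileFor` is a hypothetical name used for illustration; it is not a function in pprof.

```cpp
#include <string>

// Hypothetical helper: maps a library's absolute path to the location where
// RHEL/CentOS-style debuginfo packages install its debug symbols.
std::string DebugFileFor(const std::string& original_path) {
  return "/usr/lib/debug" + original_path + ".debug";
}
```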
It is 2015 and not 2014. Spotted and reported by Armin Rigo.
The "constant 2nd frame" feature is supposed to detect and work around
incorrect cpu profile stack captures where some or all of the cpu
profiling signal handler frames are not skipped.
I've seen programs where this feature incorrectly removes non-signal
frames.
It also hides bugs in stacktrace capturing, which we want to be
able to spot.
There is now a --no-auto-signal-frm option for disabling it.
We actually have 3 and not 2 of them.
In cpu profiles that included parts of the signal handler we could have
a situation like this:
* PC
* signal handler frame
* PC
This happens specifically when capturing stacktraces via libunwind.
For such stacktraces pprof used to draw a self-cycle in functions,
confusing everybody. Given that we might have a number of such
profiles in the wild, it makes sense to handle that duplicate PC issue.
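The duplicate-PC fix can be sketched in C++ as follows. The sketch assumes signal handler frames have already been stripped, so the symptom appears as the same PC in the first two slots. `DropDuplicateTopPc` is a hypothetical helper, not pprof's actual code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: if a captured stack begins with the same PC twice
// (leftover from a partially skipped signal handler), drop the repeat so a
// call graph does not show a bogus self-cycle in the leaf function.
void DropDuplicateTopPc(std::vector<uintptr_t>& stack) {
  if (stack.size() >= 2 && stack[0] == stack[1]) {
    stack.erase(stack.begin() + 1);
  }
}
```

Only the top pair is examined. A genuinely repeated PC deeper in the stack is left alone.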
This was reported to cause problems because libunwind occasionally
returns a top-level pc that is 1 smaller than the real pc.
Which is very useful for diagnosing stack capturing and processing
bugs.
This patch sets the default tcmalloc internal page size to 64K when
built on PPC.
Added two new configure flags, --with-tcmalloc-pagesize and
--with-tcmalloc-alignment, to set the tcmalloc internal page
size and the tcmalloc allocation alignment without the need for a compiler
directive, and to make the choice of the page size independent of the
allocation alignment.
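As a rough illustration of keeping the two choices independent, here is a C++ sketch in which each configure result arrives as its own preprocessor macro and selects its own constant. The macro names and the defaults (8K pages, 16-byte alignment) are assumptions made for this sketch. They do not describe tcmalloc's headers.

```cpp
#include <cstddef>

// Hypothetical: the page-size choice only affects kPageShift.
#if defined(TCMALLOC_64K_PAGES)
constexpr std::size_t kPageShift = 16;  // 64K pages
#elif defined(TCMALLOC_32K_PAGES)
constexpr std::size_t kPageShift = 15;  // 32K pages
#else
constexpr std::size_t kPageShift = 13;  // 8K pages (assumed default)
#endif
constexpr std::size_t kPageSize = std::size_t{1} << kPageShift;

// Hypothetical: the alignment choice is a separate, independent macro.
#if defined(TCMALLOC_ALIGN_8BYTES)
constexpr std::size_t kMinAlign = 8;
#else
constexpr std::size_t kMinAlign = 16;  // assumed default
#endif

// Any combination is valid as long as pages stay a multiple of the alignment.
static_assert((kPageSize & (kPageSize - 1)) == 0, "page size must be a power of two");
static_assert(kPageSize % kMinAlign == 0, "page size must be a multiple of the alignment");
```

Compiling with `-DTCMALLOC_64K_PAGES` would give 64K pages without changing the alignment.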
The comment in Makefile.am stating that it doesn't work with static
linking is no longer accurate.
It looks like even the force_malloc trick was not enough to force clang to
actually call malloc. I'm now calling tc_malloc directly to get around
that smartness.
This is suggested by automake itself regarding future-compat.
So that access to has_sse2 is faster under -fPIC.
It's not cheap at all when done in this way (i.e. without runtime
patching) and apparently useless.
It looks like the Linux kernel never got this workaround at all. See
bugzilla ticket: https://bugzilla.kernel.org/show_bug.cgi?id=11305
And I see no traces of this workaround in glibc either.
On the other hand, opensolaris folks apparently still have it (or
something similar, based on comments on linux bugzilla) in their code:
https://github.com/illumos/illumos-gate/blob/32842aabdc7c6f8f0c6140a256cf42cf5404fefb/usr/src/uts/i86pc/os/mp_startup.c#L1136
And the affected CPUs (if any) are from 2008 (that's 6 years ago now).
Plus even if somebody still uses those cpus (which is unlikely), they
won't have a working kernel and glibc anyway.
TCMALLOC_AGGRESSIVE_DECOMMIT=f is one way to disable it and
SetNumericProperty is another.
Now the compiler generates slightly better code, which is jump-less
for the common case of not sampling allocations.
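This sketch shows one shape such a fast path can take. It assumes a simple countdown sampler with a fixed period, whereas a real sampler typically randomizes the period. `Sampler` is a hypothetical class. `__builtin_expect` is the GCC/Clang builtin that marks which branch outcome is likely, which encourages the compiler to lay out the no-sample path as straight-line fall-through code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical allocation sampler: counts down the bytes left until the
// next sampled allocation.
class Sampler {
 public:
  explicit Sampler(int64_t period) : period_(period), bytes_until_sample_(period) {}

  // Returns true if this allocation of `size` bytes should be sampled.
  bool RecordAllocation(std::size_t size) {
    int64_t k = static_cast<int64_t>(size);
    if (__builtin_expect(bytes_until_sample_ > k, 1)) {
      bytes_until_sample_ -= k;  // common case: just decrement, no sample
      return false;
    }
    bytes_until_sample_ = period_;  // rare case: sample and restart countdown
    return true;
  }

 private:
  int64_t period_;
  int64_t bytes_until_sample_;
};
```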
We don't care about pre-2.6.0 kernels anymore. So we can assume that
if the compile-time check worked, then it'll work at runtime too.
To speed up access to them under -fPIC.
So that we can disable elf symbol interposition for certain
perf-sensitive symbols.
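Here is a hedged C++ illustration of disabling interposition with the GCC/Clang visibility attribute. Under -fPIC, another shared object may interpose a default-visibility global or function, so accesses go through the GOT or PLT. A hidden symbol cannot be interposed, so the compiler can use direct PC-relative access instead. `g_fast_path_enabled` and `HotPathValue` are hypothetical stand-ins for perf-sensitive symbols such as has_sse2.

```cpp
// Hypothetical perf-sensitive flag. Hidden visibility means no other DSO can
// interpose it, so -fPIC code can read it without a GOT indirection.
__attribute__((visibility("hidden"))) bool g_fast_path_enabled = true;

// Hypothetical hot function. Hidden visibility lets callers in the same DSO
// call it directly instead of through the PLT.
__attribute__((visibility("hidden"))) int HotPathValue(int x) {
  return g_fast_path_enabled ? x * 2 : x;
}
```

The broader form of the same idea is to build the whole library with `-fvisibility=hidden` and export only the public API.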
Specifically, we can now check in one place, instead of two, whether
any hooks are set at all. This makes the fast path shorter.
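Here is a C++ sketch of the single-check idea. It assumes two hypothetical hooks (`g_new_hook`, `g_delete_hook`) plus a combined `g_any_hook_set` flag, which the setters keep up to date under a mutex. None of these names come from gperftools.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical hook type: called with the pointer just allocated or freed.
using Hook = void (*)(void* ptr);

static std::mutex g_hooks_mu;  // serializes hook updates (slow path only)
static std::atomic<Hook> g_new_hook{nullptr};
static std::atomic<Hook> g_delete_hook{nullptr};
static std::atomic<bool> g_any_hook_set{false};  // true if either hook is set

// Must be called with g_hooks_mu held.
static void RecomputeAnyHookSet() {
  bool any = g_new_hook.load() != nullptr || g_delete_hook.load() != nullptr;
  g_any_hook_set.store(any, std::memory_order_release);
}

void SetNewHook(Hook h) {
  std::lock_guard<std::mutex> lock(g_hooks_mu);
  g_new_hook.store(h, std::memory_order_release);
  RecomputeAnyHookSet();
}

void SetDeleteHook(Hook h) {
  std::lock_guard<std::mutex> lock(g_hooks_mu);
  g_delete_hook.store(h, std::memory_order_release);
  RecomputeAnyHookSet();
}

// Fast paths: one check answers "are any hooks set at all?".
void AfterAllocate(void* p) {
  if (__builtin_expect(!g_any_hook_set.load(std::memory_order_acquire), 1)) return;
  if (Hook h = g_new_hook.load(std::memory_order_acquire)) h(p);
}

void BeforeFree(void* p) {
  if (__builtin_expect(!g_any_hook_set.load(std::memory_order_acquire), 1)) return;
  if (Hook h = g_delete_hook.load(std::memory_order_acquire)) h(p);
}
```

When no hook is installed, each fast path costs one load and one well-predicted branch.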
Because those are already done under a spinlock, and the read-only,
lockless Traverse is already tolerant of slight inconsistencies.
So that we can later drop separate singular hooks.
This patch adds a configure option to enable or disable libunwind linking.
The patch also disables libunwind on ppc by default.
This fixed the build on certain OSX setups that I have access to.
This applies a patch by glider.
The default mode of operation of the cpu profiler uses itimer and
SIGPROF. This timer is by definition per-process, and no spec defines
which thread is going to receive SIGPROF. So it provides correct
profiles only if we assume that the probability of picking a thread is
proportional to the cpu time spent by that thread.
It is easy to see that recent Linux (at least on common SMP hardware)
doesn't satisfy that assumption. Quite big skews of SIGPROF ticks
between threads are visible. E.g. I could see as big as a 70%/20%
division instead of 50%/50% for a pair of cpu-hog threads. (And I do see
it become 50/50 with the new mode.)
Fortunately, POSIX provides a mechanism to track per-thread cpu time via
the posix timers facility. And even more fortunately, Linux also provides
a mechanism to deliver timer ticks to specific threads.
Interestingly, it looks like FreeBSD also has a very similar facility
and seems to suffer from the same skew. But due to differences in how
threads are identified, I haven't bothered to try to support this
mode on FreeBSD.
This commit implements a new profiling mode where every thread creates
a posix timer that tracks the thread's cpu time. Each thread also sets up
signal delivery to itself on overflows of that timer.
This new mode requires every thread to be registered with the cpu
profiler. The existing ProfilerRegisterThread function is used for that.
Because registering threads requires application support (or a suitable
LD_PRELOAD-able wrapper for the thread creation API), the new mode is off
by default. It has to be activated manually by setting the environment
variable CPUPROFILE_PER_THREAD_TIMERS.
The new mode also requires librt symbols to be available. We do not
link to librt because it depends on libpthread, which we avoid due
to the perf impact of bringing libpthread into otherwise single-threaded
programs. So librt has to be either already loaded by the profiled
program or LD_PRELOAD-ed.
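The mechanism described above can be sketched with POSIX timer_create on Linux. A timer on CLOCK_THREAD_CPUTIME_ID counts only the calling thread's cpu time, and the Linux-specific SIGEV_THREAD_ID notification sends SIGPROF to that same thread. This is a standalone demo, not the profiler's code, and the helper names are hypothetical. Older glibc versions need -lrt (and -lpthread) at link time, which is the librt dependency discussed above.

```cpp
#include <atomic>
#include <csignal>
#include <cstdint>
#include <ctime>
#include <sys/syscall.h>
#include <unistd.h>

// Older glibc headers lack this alias for the Linux target-thread field.
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

static std::atomic<int> g_prof_ticks{0};

static void OnSigprof(int) { g_prof_ticks.fetch_add(1, std::memory_order_relaxed); }

// Creates a timer that measures the calling thread's cpu time and, on each
// expiry, sends SIGPROF to that same thread. interval_ns must be < 1e9.
static bool StartThreadCpuTimer(long interval_ns, timer_t* timer) {
  struct sigevent sev = {};
  sev.sigev_notify = SIGEV_THREAD_ID;
  sev.sigev_signo = SIGPROF;
  sev.sigev_notify_thread_id = static_cast<pid_t>(syscall(SYS_gettid));
  if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, timer) != 0) return false;
  struct itimerspec its = {};
  its.it_value.tv_nsec = interval_ns;     // first expiry
  its.it_interval.tv_nsec = interval_ns;  // then periodic
  if (timer_settime(*timer, 0, &its, nullptr) != 0) {
    timer_delete(*timer);
    return false;
  }
  return true;
}

static int64_t ThreadCpuNs() {
  struct timespec ts;
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  return static_cast<int64_t>(ts.tv_sec) * 1000000000 + ts.tv_nsec;
}

// Demo: burns spin_ns of this thread's cpu time with a per-thread timer
// armed and returns the number of SIGPROF ticks delivered (-1 on error).
int CountTicksWhileSpinning(long interval_ns, int64_t spin_ns) {
  struct sigaction sa = {};
  sa.sa_handler = OnSigprof;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  if (sigaction(SIGPROF, &sa, nullptr) != 0) return -1;

  timer_t timer;
  if (!StartThreadCpuTimer(interval_ns, &timer)) return -1;
  int before = g_prof_ticks.load();
  int64_t start = ThreadCpuNs();
  volatile unsigned long sink = 0;
  while (ThreadCpuNs() - start < spin_ns) sink = sink + 1;
  timer_delete(timer);
  return g_prof_ticks.load() - before;
}
```

In a real profiler, each registered thread would run the equivalent of `StartThreadCpuTimer` itself, so that the gettid() value identifies the thread being measured.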
Note that this is _not_ RHEL7 but the original Red Hat 7 from the early 2000s.
A missing profile file is a common source of confusion. So a bit more
clarity is useful.
Which should fix the issue reported by user pedronavf.
Because otherwise we risk a deadlock due to too-early use of getenv on
windows.
This applies a patch by user simonb.
Quoting:
Relocation packing splits a single executable load segment into two. Before:
LOAD 0x000000 0x00000000 0x00000000 0x2034d28 0x2034d28 R E 0x1000
LOAD 0x2035888 0x02036888 0x02036888 0x182d38 0x1a67d0 RW 0x1000
After:
LOAD 0x000000 0x00000000 0x00000000 0x14648 0x14648 R E 0x1000
LOAD 0x014648 0x0020c648 0x0020c648 0x1e286e0 0x1e286e0 R E 0x1000
...
LOAD 0x1e3d888 0x02036888 0x02036888 0x182d38 0x1a67d0 RW 0x1000
The .text section is in the second LOAD, and this is not at
offset/address zero. The result is that this library shows up in
/proc/self/maps as multiple executable entries, for example (note:
this trace is not from the library dissected above, but rather from an
earlier version of it):
73b0c000-73b21000 r-xp 00000000 b3:19 786460 /data/.../libchrome.2160.0.so
73b21000-73d12000 ---p 00000000 00:00 0
73d12000-75a90000 r-xp 00014000 b3:19 786460 /data/.../libchrome.2160.0.so
75a90000-75c0d000 rw-p 01d91000 b3:19 786460 /data/.../libchrome.2160.0.so
When parsing this, pprof needs to merge the two r-xp entries above
into a single entry, otherwise the addresses it prints are incorrect.
The following fix against 2.2.1 was sufficient to make pprof --text
print the correct output. Untested with other pprof options.
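Here is a C++ sketch of one way to do the merge, assuming the maps lines have already been parsed into records. Non-executable entries are skipped. An executable entry for the same file as the previous executable entry is folded into it, keeping the first entry's start address and offset. `Mapping` and `MergeTextMappings` are hypothetical, and the actual patch to pprof may use a different rule.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One parsed /proc/self/maps line (only the fields needed here).
struct Mapping {
  uint64_t start, end, offset;
  std::string perms;  // e.g. "r-xp"
  std::string path;
};

// Returns the executable mappings, with consecutive r-x entries of the same
// file merged into one spanning from the first start to the last end.
std::vector<Mapping> MergeTextMappings(const std::vector<Mapping>& maps) {
  std::vector<Mapping> out;
  for (const Mapping& m : maps) {
    if (m.perms.size() < 3 || m.perms[2] != 'x') continue;  // text only
    if (!out.empty() && out.back().path == m.path && m.start >= out.back().end) {
      out.back().end = m.end;  // extend the previous text entry
    } else {
      out.push_back(m);
    }
  }
  return out;
}
```

Keeping the first entry's start means addresses inside the second r-xp range are still resolved relative to the library's load base.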
When trying to use pprof on my machine, the symbols of my program were
not being recognized.
It turned out that pprof, when calculating the offset of the text list
of mapped objects (the last section of the CPU profile data file), was
assuming that the slot size was always 4 bytes, even on 64-bit machines.
This led to ParseLibraries() reading a lot of garbage data at the
beginning of the map, and consequently the regex was failing to match on
the first line of the real (non-garbage) map.
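To see why the slot size matters, consider how the CPU profile is laid out. Every binary slot is a native machine word (8 bytes on 64-bit). The slots form a header (0, header word count, version, sampling period, padding), then sample records (count, depth, PCs...), then a trailer (0, 1, 0). The text list of mapped objects begins after the trailer. Below is a C++ sketch that locates that text for a given slot size. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Reads slot `idx` of a profile whose slots are `slot_size` bytes wide, in
// native byte order (the profiler writes native words).
static uint64_t ReadSlot(const std::vector<unsigned char>& d, std::size_t idx,
                         std::size_t slot_size) {
  std::size_t off = idx * slot_size;
  if (off + slot_size > d.size()) throw std::out_of_range("truncated profile");
  if (slot_size == 8) {
    uint64_t v;
    std::memcpy(&v, &d[off], 8);
    return v;
  }
  uint32_t v;
  std::memcpy(&v, &d[off], 4);
  return v;
}

// Returns the byte offset where the text list of mapped objects starts.
std::size_t MapTextOffset(const std::vector<unsigned char>& d, std::size_t slot_size) {
  // Header: slot 0 is 0, slot 1 is the number of header words that follow.
  std::size_t i = 2 + ReadSlot(d, 1, slot_size);
  for (;;) {
    uint64_t count = ReadSlot(d, i, slot_size);
    uint64_t depth = ReadSlot(d, i + 1, slot_size);
    if (count == 0 && depth == 1 && ReadSlot(d, i + 2, slot_size) == 0) {
      return (i + 3) * slot_size;  // skip the 3-slot trailer
    }
    i += 2 + depth;  // skip count, depth and this record's PCs
  }
}
```

Passing 4 for a file written on a 64-bit machine walks the same bytes at the wrong stride. That matches the garbage-before-the-map symptom described above.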
This applies a patch by user yurivict.
Copes with ? for the line number (converts it to 0).
Copes with (discriminator <num>) suffixes on file/linenum (removes them).
Change-Id: I96207165e4852c71d3512157864f12d101cdf44a
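Here is a C++ sketch of both rules applied to an addr2line-style "file:line" string, assuming that input shape. `ParseFileLine` is a hypothetical helper, not the code this commit changed.

```cpp
#include <string>

// Hypothetical parser for "file:line" strings:
//  - a trailing " (discriminator N)" suffix is removed;
//  - a line number of "?" (or a missing one) becomes 0.
void ParseFileLine(std::string s, std::string* file, int* line) {
  std::string::size_type disc = s.find(" (discriminator ");
  if (disc != std::string::npos) s.erase(disc);
  std::string::size_type colon = s.rfind(':');
  if (colon == std::string::npos) {  // no line number at all
    *file = s;
    *line = 0;
    return;
  }
  *file = s.substr(0, colon);
  std::string num = s.substr(colon + 1);
  *line = (num.empty() || num == "?") ? 0 : std::stoi(num);
}
```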