35 files changed, 420 insertions, 378 deletions
@@ -1,3 +1,39 @@
+Thu Aug 5 12:48:03 PDT 2010
+
+ * google-perftools: version 1.6 release
+ * Add tc_malloc_usable_size for compatibility with glibc (csilvers)
+ * Override malloc_usable_size with tc_malloc_usable_size (csilvers)
+ * Default to no automatic heap sampling in tcmalloc (csilvers)
+ * Add -DTCMALLOC_LARGE_PAGES, a possibly faster tcmalloc (rus)
+ * Make some functions extern "C" to avoid false ODR warnings (jyasskin)
+ * pprof: Add SVG-based output (rsc)
+ * pprof: Extend pprof --tools to allow per-tool configs (csilvers)
+ * pprof: Improve support of 64-bit and big-endian profiles (csilvers)
+ * pprof: Add interactive callgrind support (weidenri...)
+ * pprof: Improve address->function mapping a bit (dpeng)
+ * Better detection of when we're running under valgrind (csilvers)
+ * Better CPU-speed detection under valgrind (saito)
+ * Use, and recommend, -fno-builtin-malloc when compiling (csilvers)
+ * Avoid false-sharing of memory between caches (bmaurer)
+ * BUGFIX: Fix heap sampling to use correct alloc size (bmaurer)
+ * BUGFIX: Avoid gcc 4.0.x bug by making hook-clearing atomic (csilvers)
+ * BUGFIX: Avoid gcc 4.5.x optimization bug (csilvers)
+ * BUGFIX: Work around deps-determining bug in libtool 1.5.26 (csilvers)
+ * BUGFIX: Fixed test to use HAVE_PTHREAD, not HAVE_PTHREADS (csilvers)
+ * BUGFIX: Fix tls callback behavior on windows when using wpo (wtc)
+ * BUGFIX: properly align allocation sizes on Windows (antonm)
+ * BUGFIX: Fix prototypes for tcmalloc/debugalloc wrt throw() (csilvers)
+ * DOC: Updated heap-checker doc to match reality better (fischman)
+ * DOC: Document ProfilerFlush, ProfilerStartWithOptions (csilvers)
+ * DOC: Update docs for heap-profiler functions (csilvers)
+ * DOC: Clean up documentation around tcmalloc.slack_bytes (fikes)
+ * DOC: Renamed README.windows to README_windows.txt (csilvers)
+ * DOC: Update the NEWS file to be non-empty (csilvers)
+ * PORTING: Fix windows addr2line and nm with proper rc code (csilvers)
+ * 
PORTING: Add CycleClock and atomicops support for arm 5 (sanek)
+ * PORTING: Improve PC finding on cygwin and redhat 7 (csilvers)
+ * PORTING: speed up function-patching under windows (csilvers)
+
 Tue Jan 19 14:46:12 2010  Google Inc. <opensource@google.com>

 * google-perftools: version 1.5 release
@@ -65,6 +65,26 @@ application with frame pointers (via 'gcc -fno-omit-frame-pointer ...')
 in this case.

+*** TCMALLOC LARGE PAGES: TRADING TIME FOR SPACE
+
+Internally, tcmalloc divides its memory into "pages."  The default
+page size is chosen to minimize memory use by reducing fragmentation.
+The cost is that keeping track of these pages can cost tcmalloc time.
+We've added a new, experimental flag to tcmalloc that enables a larger
+page size.  In general, this will increase the memory needs of
+applications using tcmalloc.  However, in many cases it will speed up
+the applications as well, particularly if they allocate and free a lot
+of memory.  We've seen average speedups of 3-5% on Google
+applications.
+
+This feature is still very experimental; it's not even a configure
+flag yet.  To build libtcmalloc with large pages, run
+
+   ./configure <normal flags> CXXFLAGS=-DTCMALLOC_LARGE_PAGES
+
+(or add -DTCMALLOC_LARGE_PAGES to your existing CXXFLAGS argument).
+
+
 *** NOTE FOR ___tls_get_addr ERROR

 When compiling perftools on some old systems, like RedHat 8, you may
@@ -191,6 +211,15 @@ above, by linking in libtcmalloc_minimal.
   successfully build are exactly the same as for FreeBSD.  See that
   section for a list of binaries and instructions on building them.

+  In addition, it appears OS X regularly fails profiler_unittest.sh
+  in the "thread" test (in addition to occasionally failing in the
+  "fork" test).  It looks like OS X often delivers the profiling
+  signal to the main thread, even when it's sleeping, rather than
+  spawned threads that are doing actual work.
If anyone knows
+  details of how OS X handles SIGPROF (via setitimer()) events with
+  threads, and has insight into this problem, please send mail to
+  google-perftools@googlegroups.com.
+
 ** Solaris 10 x86:

   I've only tested using the GNU C++ compiler, not the Sun C++
@@ -236,7 +265,10 @@ above, by linking in libtcmalloc_minimal.
   the heap-checker and a few other pieces of functionality will not
   compile).  'make' will compile those libraries and tests that can
   be compiled.  You can run 'make check' to make sure the basic
-  functionality is working.
+  functionality is working.  I've heard reports that some versions of
+  cygwin fail calls to pthread_join() with EINVAL, causing several
+  tests to fail.  If you have any insight into this, please mail
+  google-perftools@googlegroups.com.

   This Windows functionality is also available using MinGW and Msys.
   In this case, you can use the regular './configure && make'
@@ -1,3 +1,31 @@
+=== 5 August 2010 ===
+
+I've just released perftools 1.6
+
+This version also has a large number of minor changes, including
+support for `malloc_usable_size()` as a glibc-compatible alias to
+`malloc_size()`, the addition of SVG-based output to `pprof`, and
+experimental support for tcmalloc large pages, which may speed up
+tcmalloc at the cost of greater memory use.  To use tcmalloc large
+pages, see the
+[http://google-perftools.googlecode.com/svn/tags/perftools-1.5/INSTALL
+INSTALL file]; for all changes, see the
+[http://google-perftools.googlecode.com/svn/tags/perftools-1.5/ChangeLog
+ChangeLog].
+
+OS X NOTE: improvements in the profiler unittest have turned up an OS
+X issue: in multithreaded programs, it seems that OS X often delivers
+the profiling signal (from setitimer()) to the main thread, even when
+it's sleeping, rather than spawned threads that are doing actual work.
+If anyone knows details of how OS X handles SIGPROF events (from
+setitimer) in threaded programs, and has insight into this problem,
+please send mail to google-perftools@googlegroups.com.
+
+To see if you're affected by this, look for profiling time that pprof
+attributes to ___semwait_signal.  This is work being done in other
+threads that is being attributed to sleeping-time in the main thread.
+
+
 === 20 January 2010 ===

 I've just released perftools 1.5
@@ -11,7 +11,7 @@
 tcmalloc -- a replacement for malloc and new.  See below for some
 environment variables you can use with tcmalloc, as well.

 tcmalloc functionality is available on all systems we've tested; see
-INSTALL for more details.  See README.windows for instructions on
+INSTALL for more details.  See README_windows.txt for instructions on
 using tcmalloc on Windows.

 NOTE: When compiling with programs with gcc, that you plan to link
@@ -161,7 +161,7 @@
 in its full generality only on those systems. However, we've
 successfully ported much of the tcmalloc library to FreeBSD, Solaris
 x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
 functionality in tcmalloc_minimal to Windows.  See INSTALL for details.
-See README.windows for details on the Windows port.
+See README_windows.txt for details on the Windows port.

 PERFORMANCE
@@ -175,6 +175,11 @@ win32's malloc.
   http://www.highlandsun.com/hyc/malloc/
   http://gaiacrtn.free.fr/articles/win32perftools.html

+It's possible to build tcmalloc in a way that trades off faster
+performance (particularly for deletes) at the cost of more memory
+fragmentation (that is, more unusable memory on your system).  See the
+INSTALL file for details.
+
 OLD SYSTEM ISSUES
 -----------------
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.64 for google-perftools 1.5.
+# Generated by GNU Autoconf 2.64 for google-perftools 1.6.
 #
 # Report bugs to <opensource@google.com>.
# @@ -703,8 +703,8 @@ MAKEFLAGS= # Identity of this package. PACKAGE_NAME='google-perftools' PACKAGE_TARNAME='google-perftools' -PACKAGE_VERSION='1.5' -PACKAGE_STRING='google-perftools 1.5' +PACKAGE_VERSION='1.6' +PACKAGE_STRING='google-perftools 1.6' PACKAGE_BUGREPORT='opensource@google.com' PACKAGE_URL='' @@ -1464,7 +1464,7 @@ if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF -\`configure' configures google-perftools 1.5 to adapt to many kinds of systems. +\`configure' configures google-perftools 1.6 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... @@ -1535,7 +1535,7 @@ fi if test -n "$ac_init_help"; then case $ac_init_help in - short | recursive ) echo "Configuration of google-perftools 1.5:";; + short | recursive ) echo "Configuration of google-perftools 1.6:";; esac cat <<\_ACEOF @@ -1648,7 +1648,7 @@ fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF -google-perftools configure 1.5 +google-perftools configure 1.6 generated by GNU Autoconf 2.64 Copyright (C) 2009 Free Software Foundation, Inc. @@ -2317,7 +2317,7 @@ cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. -It was created by google-perftools $as_me 1.5, which was +It was created by google-perftools $as_me 1.6, which was generated by GNU Autoconf 2.64. Invocation command line was $ $0 $@ @@ -3050,7 +3050,7 @@ fi # Define the identity of the package. PACKAGE='google-perftools' - VERSION='1.5' + VERSION='1.6' cat >>confdefs.h <<_ACEOF @@ -20303,7 +20303,7 @@ $as_echo_n "checking how to access the program counter from a struct ucontext... 
pc_fields="$pc_fields uc_mcontext.sc_ip" # Linux (ia64) pc_fields="$pc_fields uc_mcontext.uc_regs->gregs[PT_NIP]" # Linux (ppc) pc_fields="$pc_fields uc_mcontext.gregs[R15]" # Linux (arm old [untested]) - pc_fields="$pc_fields uc_mcontext.arm_pc" # Linux (arm new [untested]) + pc_fields="$pc_fields uc_mcontext.arm_pc" # Linux (arm arch 5) pc_fields="$pc_fields uc_mcontext.gp_regs[PT_NIP]" # Suse SLES 11 (ppc64) pc_fields="$pc_fields uc_mcontext.mc_eip" # FreeBSD (i386) pc_fields="$pc_fields uc_mcontext.mc_rip" # FreeBSD (x86_64 [untested]) @@ -22208,7 +22208,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. ac_log=" -This file was extended by google-perftools $as_me 1.5, which was +This file was extended by google-perftools $as_me 1.6, which was generated by GNU Autoconf 2.64. Invocation command line was CONFIG_FILES = $CONFIG_FILES @@ -22272,7 +22272,7 @@ Report bugs to <opensource@google.com>." 
 _ACEOF
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_version="\\
-google-perftools config.status 1.5
+google-perftools config.status 1.6
 configured by $0, generated by GNU Autoconf 2.64,
   with options \\"`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`\\"

diff --git a/configure.ac b/configure.ac
index adbb2e5..22d1c5e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -4,7 +4,7 @@
 # make sure we're interpreted by some minimal autoconf
 AC_PREREQ(2.57)

-AC_INIT(google-perftools, 1.5, opensource@google.com)
+AC_INIT(google-perftools, 1.6, opensource@google.com)
 # The argument here is just something that should be in the current directory
 # (for sanity checking)
 AC_CONFIG_SRCDIR(README)

diff --git a/doc/heap_checker.html b/doc/heap_checker.html
index caf46ef..544ce60 100644
--- a/doc/heap_checker.html
+++ b/doc/heap_checker.html
@@ -89,7 +89,7 @@ check:</p>
   <li> <code>draconian</code>
 </ol>

-<p>"Minimal" heap-checking starts as late as possible ina
+<p>"Minimal" heap-checking starts as late as possible in
 initialization, meaning you can leak some memory in your
 initialization routines (that run before <code>main()</code>, say),
 and not trigger a leak message.  If you frequently (and purposefully)
@@ -162,19 +162,13 @@ cleanup code as active only when the heap-checker is turned on.</p>

 <h2><a name="explicit">Explicit (Partial-program) Heap Leak Checking</h2>

-<p>Instead of whole-program checking, you can check certain parts of
-your code to verify they do not have memory leaks.  There are two
-types of checks you can do.  The "no leak" check verifies that between
-two parts of a program, no memory is allocated without being freed; it
-checks that memory does not grow.
The stricter "same heap" check
-verifies that two parts of a program share the same heap profile; that
-is, that the memory does not grow <i>or shrink</i>, or change in any
-way.</p>
-
+<p>Instead of whole-program checking, you can check certain parts of your
+code to verify they do not have memory leaks.  This check verifies that
+between two parts of a program, no memory is allocated without being freed.</p>

 <p>To use this kind of checking code, bracket the code you want
 checked by creating a <code>HeapLeakChecker</code> object at the
-beginning of the code segment, and calling <code>*SameHeap()</code> or
-<code>*NoLeaks()</code> at the end.  These functions, and all others
+beginning of the code segment, and calling
+<code>NoLeaks()</code> at the end.  This function, and all others
 referred to in this file, are declared in
 <code><google/heap-checker.h></code>.
 </p>

@@ -184,31 +178,11 @@ referred to in this file, are declared in
   HeapLeakChecker heap_checker("test_foo");
   {
     code that exercises some foo functionality;
-    this code should preserve memory allocation state;
+    this code should not leak memory;
   }
-  if (!heap_checker.SameHeap()) assert(NULL == "heap memory leak");
-</pre>
-
-<p>The various flavors of these functions -- <code>SameHeap()</code>,
-<code>QuickSameHeap()</code>, <code>BriefSameHeap()</code> -- trade
-off running time for accuracy: the faster routines might miss some
-legitimate leaks.
For instance, the briefest tests might be confused
-by code like this:</p>
-<pre>
-  void LeakTwentyBytes() {
-    char* a = malloc(20);
-    HeapLeakChecker heap_checker("test_malloc");
-    char* b = malloc(20);
-    free(a);
-    // This will pass: it totes up 20 bytes allocated and 20 bytes freed
-    assert(heap_checker.BriefNoLeaks());  // doesn't detect that b is leaked
-  }
+  if (!heap_checker.NoLeaks()) assert(NULL == "heap memory leak");
 </pre>

-<p>(This is because <code>BriefSameHeap()</code> does not use <A
-HREF="#pprof">pprof</A>, which is slower but is better able to track
-allocations in tricky situations like the above.)</p>
-
 <p>Note that adding in the <code>HeapLeakChecker</code> object merely
 instruments the code for leak-checking.  To actually turn on this
 leak-checking on a particular run of the executable, you must still
@@ -300,28 +274,12 @@ checking.</p>

 <table frame=box rules=sides cellpadding=5 width=100%>

 <tr valign=top>
-  <td><code>HEAP_CHECK_REPORT</code></td>
-  <td>Default: true</td>
-  <td>
-    If true, use <code>pprof</code> to report more info about found leaks.
-  </td>
-</tr>
-
-<tr valign=top>
-  <td><code>HEAP_CHECK_STRICT_CHECK</code></td>
-  <td>Default: true</td>
-  <td>
-    If true, do the program-end check via <code>SameHeap()</code>;
-    if false, use <code>NoLeaks()</code>.
-  </td>
-</tr>
-
-<tr valign=top>
   <td><code>HEAP_CHECK_MAX_LEAKS</code></td>
   <td>Default: 20</td>
   <td>
-    The maximum number of leaks to be reported.  If negative or zero, print all
-    the leaks found.
+    The maximum number of leaks to be printed to stderr (all leaks are still
+    emitted to file output for pprof to visualize).  If negative or zero,
+    print all the leaks found.
   </td>
 </tr>

@@ -449,16 +407,15 @@ and then look closely at the generated leak report messages.

 <p>When a <code>HeapLeakChecker</code> object is constructed, it dumps
 a memory-usage profile named
 <code><prefix>.<name>-beg.heap</code> to a temporary
-directory.  When <code>*NoLeaks()</code> or <code>*SameHeap()</code>
+directory.  When <code>NoLeaks()</code>
 is called (for whole-program checking, this happens automatically at
 program-exit), it dumps another profile, named
 <code><prefix>.<name>-end.heap</code>.
 (<code><prefix></code> is typically determined automatically, and
 <code><name></code> is typically <code>argv[0]</code>.)  It
-then compares the two profiles.  If the second profile shows more
-memory use than the first (or, for <code>*SameHeap()</code> calls,
-any different pattern of memory use than the first), the
-<code>*NoLeaks()</code> or <code>*SameHeap()</code> function will
+then compares the two profiles.  If the second profile shows
+more memory use than the first, the
+<code>NoLeaks()</code> function will
 return false.  For "whole program" profiling, this will cause the
 executable to abort (via <code>exit(1)</code>).  In all cases, it will
 print a message on how to process the dumped profiles to locate
@@ -520,94 +477,20 @@ of explicit clean up code and other hassles when dealing with thread
 data.</p>

-<h3><A NAME="pprof">More Exact Checking via pprof</A></h3>
+<h3>Visualizing Leaks with <code>pprof</code></h3>

-<p>The perftools profiling tool, <code>pprof</code>, is primarily
-intended for users to use interactively in order to explore heap and
-CPU usage.  However, the heap-checker can -- and, by default, does --
-call <code>pprof</code> internally, in order to improve its leak
-checking.</p>
-
-<p>In particular, the heap-checker calls <code>pprof</code> to utilize
-the full call-path for all allocations.  <code>pprof</code> uses this
-data to disambiguate allocations.  When the time comes to do a
-<code>SameHeap</code> or <code>NoLeaks</code> check, the heap-checker
-asks <code>pprof</code> to do this check on an
-allocation-by-allocation basis, rather than just by comparing global
-counts.</p>
-
-<p>Here's an example.
Consider the following function:</p>
-<pre>
-  void LeakTwentyBytes() {
-    char* a = malloc(20);
-    HeapLeakChecker heap_checker("test_malloc");
-    char* b = malloc(20);
-    free(a);
-    heap_checker.NoLeaks();
-  }
-</pre>
-
-<p>Without using pprof, the only thing we will do is count up the
-number of allocations and frees inside the leak-checked interval.
-Twenty bytes allocated, twenty bytes freed, and the code looks ok.</p>
-
-<p>With pprof, however, we can track the call-path for each
-allocation, and account for them separately.  In the example function
-above, there are two call-paths that end in an allocation, one that
-ends in "LeakTwentyBytes:line1" and one that ends in
-"LeakTwentyBytes:line3".</p>
-
-<p>Here's how the heap-checker works when it can use pprof in this
-way:</p>
-<ol>
-  <li> <b>Line 1:</b> Allocate 20 bytes, mark <code>a</code> as having
-       call-path "LeakTwentyBytes:line1", and update the count-map
-       <pre>count["LeakTwentyByte:line1"] += 20;</pre>
-  <li> <b>Line 2:</b> Dump the current <code>count</code> map to a file.
-  <li> <b>Line 3:</b> Allocate 20 bytes, mark <code>b</code> as having
-       call-path "LeakTwentyBytes:line3", and update the count-map:
-       <pre>count["LeakTwentyByte:line3"] += 20;</pre>
-  <li> <b>Line 4:</b> Look up <code>a</code> to find its call-path
-       (stored in line 1), and use that to update the count-map:
-       <pre>count["LeakTwentyByte:line1"] -= 20;</pre>
-  <li> <b>Line 5:</b> Look at each bucket in the current count-map,
-       minus what was dumped in line 2.  Here's the diffs we'll have
-       in each bucket:
-       <pre>
-count["LeakTwentyByte:line1"] == -20;
-count["LeakTwentyByte:line3"] == 20;
-       </pre>
-       Since <i>at least one</i> bucket has a positive number, we
-       complain of a leak.  (Note if line 5 had been
-       <code>SameHeap</code> instead of <code>NoLeaks</code>, we would
-       have complained if any bucket had had a <i>non-zero</i>
-       number.)
-</ol>
-
-<p>Note that one way to visualize the non-<code>pprof</code> mode is
-that we do the same thing as above, but always use "unknown" as the
-call-path.  That is, our count-map always only has one entry in it:
-<code>count["unknown"]</code>.  Looking at the example above shows how
-having only one entry in the map can lead to incorrect results.</p>
-
-<p>Here is when <code>pprof</code> is used by the heap-checker:</p>
-<ul>
-  <li> <code>NoLeaks()</code> and <code>SameHeap()</code> both use
-       <code>pprof</code>.
-  <li> <code>BriefNoLeaks()</code> and <code>BriefSameHeap()</code> do
-       not use <code>pprof</code>.
-  <li> <code>QuickNoLeaks</code> and <code>QuickSameHeap()</code> are
-       a kind of compromise: they do <i>not</i> use pprof for their
-       leak check, but if that check happens to find a leak anyway,
-       then they re-do the leak calculation using <code>pprof</code>.
-       This means they do not always find leaks, but when they do,
-       they will be as accurate as possible in their leak report.
-</ul>
+<p>
+The heap checker automatically prints basic leak info with stack traces of
+leaked objects' allocation sites, as well as a pprof command line that can be
+used to visualize the call-graph involved in these allocations.
+The latter can be much more useful for a human
+to see where/why the leaks happened, especially if the leaks are numerous.
+</p>

 <h3>Leak-checking and Threads</h3>

 <p>At the time of HeapLeakChecker's construction and during
-<code>*NoLeaks()</code>/<code>*SameHeap()</code> calls, we grab a lock
+<code>NoLeaks()</code> calls, we grab a lock
 and then pause all other threads so other threads do not interfere
 with recording or analyzing the state of the heap.</p>

@@ -635,7 +518,8 @@ depending on how the compiled code works with the stack:</p>

   int* foo = new int [20];
   HeapLeakChecker check("a_check");
   foo = NULL;
-  CHECK(check.NoLeaks());  // this might succeed
+  // May fail to trigger.
+  if (!check.NoLeaks()) assert(NULL == "heap memory leak");
 </pre>

diff --git a/m4/pc_from_ucontext.m4 b/m4/pc_from_ucontext.m4
index 19ec347..dee73a1 100644
--- a/m4/pc_from_ucontext.m4
+++ b/m4/pc_from_ucontext.m4
@@ -27,7 +27,7 @@ AC_DEFUN([AC_PC_FROM_UCONTEXT],
   pc_fields="$pc_fields uc_mcontext.sc_ip"          # Linux (ia64)
   pc_fields="$pc_fields uc_mcontext.uc_regs->gregs[[PT_NIP]]"  # Linux (ppc)
   pc_fields="$pc_fields uc_mcontext.gregs[[R15]]"   # Linux (arm old [untested])
-  pc_fields="$pc_fields uc_mcontext.arm_pc"         # Linux (arm new [untested])
+  pc_fields="$pc_fields uc_mcontext.arm_pc"         # Linux (arm arch 5)
   pc_fields="$pc_fields uc_mcontext.gp_regs[[PT_NIP]]"  # Suse SLES 11 (ppc64)
   pc_fields="$pc_fields uc_mcontext.mc_eip"         # FreeBSD (i386)
   pc_fields="$pc_fields uc_mcontext.mc_rip"         # FreeBSD (x86_64 [untested])

diff --git a/packages/deb/changelog b/packages/deb/changelog
index 933795e..579ebf0 100644
--- a/packages/deb/changelog
+++ b/packages/deb/changelog
@@ -1,3 +1,9 @@
+google-perftools (1.6-1) unstable; urgency=low
+
+  * New upstream release.
+
+ -- Google Inc. <opensource@google.com>  Thu, 05 Aug 2010 12:48:03 -0700
+
 google-perftools (1.5-1) unstable; urgency=low

   * New upstream release.

diff --git a/src/base/atomicops.h b/src/base/atomicops.h
index 0f3d3ef..ec60489 100644
--- a/src/base/atomicops.h
+++ b/src/base/atomicops.h
@@ -89,6 +89,8 @@
 // TODO(csilvers): match piii, not just __i386.
Also, match k8 #if defined(__MACH__) && defined(__APPLE__) #include "base/atomicops-internals-macosx.h" +#elif defined(__GNUC__) && defined(__ARM_ARCH_5T__) +#include "base/atomicops-internals-arm-gcc.h" #elif defined(_MSC_VER) && defined(_M_IX86) #include "base/atomicops-internals-x86-msvc.h" #elif defined(__MINGW32__) && defined(__i386__) diff --git a/src/base/cycleclock.h b/src/base/cycleclock.h index 8af664e..b114170 100644 --- a/src/base/cycleclock.h +++ b/src/base/cycleclock.h @@ -48,6 +48,8 @@ #include "base/basictypes.h" // make sure we get the def for int64 #if defined(__MACH__) && defined(__APPLE__) #include <mach/mach_time.h> +#elif defined(__ARM_ARCH_5T__) +#include <sys/time.h> #endif // NOTE: only i386 and x86_64 have been well tested. @@ -71,8 +73,7 @@ struct CycleClock { return mach_absolute_time(); #elif defined(__i386__) int64 ret; - __asm__ volatile ("rdtsc" - : "=A" (ret) ); + __asm__ volatile ("rdtsc" : "=A" (ret) ); return ret; #elif defined(__x86_64__) || defined(__amd64__) uint64 low, high; @@ -82,11 +83,15 @@ struct CycleClock { // This returns a time-base, which is not always precisely a cycle-count. 
int64 tbl, tbu0, tbu1; asm("mftbu %0" : "=r" (tbu0)); - asm("mftb %0" : "=r" (tbl )); + asm("mftb %0" : "=r" (tbl)); asm("mftbu %0" : "=r" (tbu1)); tbl &= -static_cast<int64>(tbu0 == tbu1); // high 32 bits in tbu1; low 32 bits in tbl (tbu0 is garbage) return (tbu1 << 32) | tbl; +#elif defined(__ARM_ARCH_5T__) + struct timeval tv; + gettimeofday(&tv, NULL); + return static_cast<uint64>(tv.tv_sec) * 1000000 + tv.tv_usec; #elif defined(__sparc__) int64 tick; asm(".byte 0x83, 0x41, 0x00, 0x00"); diff --git a/src/base/dynamic_annotations.c b/src/base/dynamic_annotations.c index bddd693..ec37318 100644 --- a/src/base/dynamic_annotations.c +++ b/src/base/dynamic_annotations.c @@ -139,24 +139,24 @@ static int GetRunningOnValgrind(void) { /* See the comments in dynamic_annotations.h */ int RunningOnValgrind(void) { static volatile int running_on_valgrind = -1; + int local_running_on_valgrind = running_on_valgrind; /* C doesn't have thread-safe initialization of statics, and we don't want to depend on pthread_once here, so hack it. */ ANNOTATE_BENIGN_RACE(&running_on_valgrind, "safe hack"); - int local_running_on_valgrind = running_on_valgrind; if (local_running_on_valgrind == -1) running_on_valgrind = local_running_on_valgrind = GetRunningOnValgrind(); return local_running_on_valgrind; } /* See the comments in dynamic_annotations.h */ -double ValgrindSlowdown() { - if (RunningOnValgrind() == 0) { - return 1.0; - } +double ValgrindSlowdown(void) { /* Same initialization hack as in RunningOnValgrind(). */ static volatile double slowdown = 0.0; + double local_slowdown = slowdown; ANNOTATE_BENIGN_RACE(&slowdown, "safe hack"); - int local_slowdown = slowdown; + if (RunningOnValgrind() == 0) { + return 1.0; + } if (local_slowdown == 0.0) { char *env = getenv("VALGRIND_SLOWDOWN"); slowdown = local_slowdown = env ? 
atof(env) : 50.0; diff --git a/src/base/dynamic_annotations.h b/src/base/dynamic_annotations.h index ceb9809..10642fd 100644 --- a/src/base/dynamic_annotations.h +++ b/src/base/dynamic_annotations.h @@ -468,7 +468,7 @@ int RunningOnValgrind(void); SleepForSeconds(5 * ValgrindSlowdown()); } */ -double ValgrindSlowdown(); +double ValgrindSlowdown(void); #ifdef __cplusplus } diff --git a/src/base/sysinfo.cc b/src/base/sysinfo.cc index 7af0495..7cfa051 100644 --- a/src/base/sysinfo.cc +++ b/src/base/sysinfo.cc @@ -56,6 +56,7 @@ #endif #include "base/sysinfo.h" #include "base/commandlineflags.h" +#include "base/dynamic_annotations.h" // for RunningOnValgrind #include "base/logging.h" #include "base/cycleclock.h" @@ -240,9 +241,15 @@ static void InitializeSystemInfo() { if (already_called) return; already_called = true; - // I put in a never-called reference to EstimateCyclesPerSecond() here - // to silence the compiler for OS's that don't need it - if (0) EstimateCyclesPerSecond(0); + bool saw_mhz = false; + + if (RunningOnValgrind()) { + // Valgrind may slow the progress of time artificially (--scale-time=N + // option). We thus can't rely on CPU Mhz info stored in /sys or /proc + // files. Thus, actually measure the cps. + cpuinfo_cycles_per_second = EstimateCyclesPerSecond(100); + saw_mhz = true; + } #if defined(__linux__) || defined(__CYGWIN__) || defined(__CYGWIN32__) char line[1024]; @@ -250,21 +257,23 @@ static void InitializeSystemInfo() { // If CPU scaling is in effect, we want to use the *maximum* frequency, // not whatever CPU speed some random processor happens to be using now. - bool saw_mhz = false; - const char* pname0 = "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq"; - int fd0 = open(pname0, O_RDONLY); - if (fd0 != -1) { - memset(line, '\0', sizeof(line)); - read(fd0, line, sizeof(line)); - const int max_freq = strtol(line, &err, 10); - if (line[0] != '\0' && (*err == '\n' || *err == '\0')) { - // The value is in kHz. 
For example, on a 2GHz machine, the file - // contains the value "2000000". Historically this file contained no - // newline, but at some point the kernel started appending a newline. - cpuinfo_cycles_per_second = max_freq * 1000.0; - saw_mhz = true; + if (!saw_mhz) { + const char* pname0 = + "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq"; + int fd0 = open(pname0, O_RDONLY); + if (fd0 != -1) { + memset(line, '\0', sizeof(line)); + read(fd0, line, sizeof(line)); + const int max_freq = strtol(line, &err, 10); + if (line[0] != '\0' && (*err == '\n' || *err == '\0')) { + // The value is in kHz. For example, on a 2GHz machine, the file + // contains the value "2000000". Historically this file contained no + // newline, but at some point the kernel started appending a newline. + cpuinfo_cycles_per_second = max_freq * 1000.0; + saw_mhz = true; + } + close(fd0); } - close(fd0); } // Read /proc/cpuinfo for other values, and if there is no cpuinfo_max_freq. @@ -272,7 +281,9 @@ static void InitializeSystemInfo() { int fd = open(pname, O_RDONLY); if (fd == -1) { perror(pname); - cpuinfo_cycles_per_second = EstimateCyclesPerSecond(1000); + if (!saw_mhz) { + cpuinfo_cycles_per_second = EstimateCyclesPerSecond(1000); + } return; // TODO: use generic tester instead? } diff --git a/src/base/thread_annotations.h b/src/base/thread_annotations.h index ded13d6..f1b3593 100644 --- a/src/base/thread_annotations.h +++ b/src/base/thread_annotations.h @@ -45,15 +45,21 @@ #ifndef BASE_THREAD_ANNOTATIONS_H_ #define BASE_THREAD_ANNOTATIONS_H_ + #if defined(__GNUC__) && defined(__SUPPORT_TS_ANNOTATION__) && (!defined(SWIG)) +#define THREAD_ANNOTATION_ATTRIBUTE__(x) __attribute__((x)) +#else +#define THREAD_ANNOTATION_ATTRIBUTE__(x) // no-op +#endif + // Document if a shared variable/field needs to be protected by a lock. 
 // GUARDED_BY allows the user to specify a particular lock that should be
 // held when accessing the annotated variable, while GUARDED_VAR only
 // indicates a shared variable should be guarded (by any lock). GUARDED_VAR
 // is primarily used when the client cannot express the name of the lock.
-#define GUARDED_BY(x)       __attribute__ ((guarded_by(x)))
-#define GUARDED_VAR         __attribute__ ((guarded))
+#define GUARDED_BY(x)       THREAD_ANNOTATION_ATTRIBUTE__(guarded_by(x))
+#define GUARDED_VAR         THREAD_ANNOTATION_ATTRIBUTE__(guarded)

 // Document if the memory location pointed to by a pointer should be guarded
 // by a lock when dereferencing the pointer. Similar to GUARDED_VAR,
@@ -63,90 +69,64 @@
 // q, which is guarded by mu1, points to a shared memory location that is
 // guarded by mu2, q should be annotated as follows:
 //     int *q GUARDED_BY(mu1) PT_GUARDED_BY(mu2);
-#define PT_GUARDED_BY(x) __attribute__ ((point_to_guarded_by(x)))
-#define PT_GUARDED_VAR __attribute__ ((point_to_guarded))
+#define PT_GUARDED_BY(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(point_to_guarded_by(x))
+#define PT_GUARDED_VAR \
+  THREAD_ANNOTATION_ATTRIBUTE__(point_to_guarded)

 // Document the acquisition order between locks that can be held
 // simultaneously by a thread. For any two locks that need to be annotated
 // to establish an acquisition order, only one of them needs the annotation.
 // (i.e. You don't have to annotate both locks with both ACQUIRED_AFTER
 // and ACQUIRED_BEFORE.)
-#define ACQUIRED_AFTER(...) __attribute__ ((acquired_after(__VA_ARGS__)))
-#define ACQUIRED_BEFORE(...) __attribute__ ((acquired_before(__VA_ARGS__)))
+#define ACQUIRED_AFTER(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(acquired_after(x))
+#define ACQUIRED_BEFORE(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(acquired_before(x))

 // The following three annotations document the lock requirements for
 // functions/methods.

 // Document if a function expects certain locks to be held before it is called
-#define EXCLUSIVE_LOCKS_REQUIRED(...) \
-  __attribute__ ((exclusive_locks_required(__VA_ARGS__)))
+#define EXCLUSIVE_LOCKS_REQUIRED(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_locks_required(x))

-#define SHARED_LOCKS_REQUIRED(...) \
-  __attribute__ ((shared_locks_required(__VA_ARGS__)))
+#define SHARED_LOCKS_REQUIRED(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(shared_locks_required(x))

 // Document the locks acquired in the body of the function. These locks
 // cannot be held when calling this function (as google3's Mutex locks are
 // non-reentrant).
-#define LOCKS_EXCLUDED(...) __attribute__ ((locks_excluded(__VA_ARGS__)))
+#define LOCKS_EXCLUDED(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(locks_excluded(x))

 // Document the lock the annotated function returns without acquiring it.
-#define LOCK_RETURNED(x) __attribute__ ((lock_returned(x)))
+#define LOCK_RETURNED(x) THREAD_ANNOTATION_ATTRIBUTE__(lock_returned(x))

 // Document if a class/type is a lockable type (such as the Mutex class).
-#define LOCKABLE __attribute__ ((lockable))
+#define LOCKABLE THREAD_ANNOTATION_ATTRIBUTE__(lockable)

 // Document if a class is a scoped lockable type (such as the MutexLock class).
-#define SCOPED_LOCKABLE __attribute__ ((scoped_lockable))
+#define SCOPED_LOCKABLE THREAD_ANNOTATION_ATTRIBUTE__(scoped_lockable)

 // The following annotations specify lock and unlock primitives.
-#define EXCLUSIVE_LOCK_FUNCTION(...) \
-  __attribute__ ((exclusive_lock(__VA_ARGS__)))
+#define EXCLUSIVE_LOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_lock(x))

-#define SHARED_LOCK_FUNCTION(...) \
-  __attribute__ ((shared_lock(__VA_ARGS__)))
+#define SHARED_LOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(shared_lock(x))

-#define EXCLUSIVE_TRYLOCK_FUNCTION(...) \
-  __attribute__ ((exclusive_trylock(__VA_ARGS__)))
+#define EXCLUSIVE_TRYLOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_trylock(x))

-#define SHARED_TRYLOCK_FUNCTION(...) \
-  __attribute__ ((shared_trylock(__VA_ARGS__)))
+#define SHARED_TRYLOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(shared_trylock(x))

-#define UNLOCK_FUNCTION(...) __attribute__ ((unlock(__VA_ARGS__)))
+#define UNLOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(unlock(x))

 // An escape hatch for thread safety analysis to ignore the annotated function.
-#define NO_THREAD_SAFETY_ANALYSIS __attribute__ ((no_thread_safety_analysis))
-
-
-#else
-
-// When the compiler is not GCC, these annotations are simply no-ops.
-
-// NOTE: in theory, the macros that take "arg" below *could* take
-// multiple arguments, but in practice so far they only take one.
-// Since not all non-gcc compilers support ... -- notably MSVC 7.1 --
-// I just hard-code in a single arg.  If this assumption ever breaks,
-// we can change it back to "...", or handle it some other way.
-
-#define GUARDED_BY(x)            // no-op
-#define GUARDED_VAR              // no-op
-#define PT_GUARDED_BY(x)         // no-op
-#define PT_GUARDED_VAR           // no-op
-#define ACQUIRED_AFTER(arg)      // no-op
-#define ACQUIRED_BEFORE(arg)     // no-op
-#define EXCLUSIVE_LOCKS_REQUIRED(arg)    // no-op
-#define SHARED_LOCKS_REQUIRED(arg)       // no-op
-#define LOCKS_EXCLUDED(arg)      // no-op
-#define LOCK_RETURNED(x)         // no-op
-#define LOCKABLE                 // no-op
-#define SCOPED_LOCKABLE          // no-op
-#define EXCLUSIVE_LOCK_FUNCTION(arg)     // no-op
-#define SHARED_LOCK_FUNCTION(arg)        // no-op
-#define EXCLUSIVE_TRYLOCK_FUNCTION(arg)  // no-op
-#define SHARED_TRYLOCK_FUNCTION(arg)     // no-op
-#define UNLOCK_FUNCTION(arg)     // no-op
-#define NO_THREAD_SAFETY_ANALYSIS        // no-op
-
-#endif  // defined(__GNUC__) && defined(__SUPPORT_TS_ANNOTATION__)
-        //     && !defined(SWIG)
+#define NO_THREAD_SAFETY_ANALYSIS \
+  THREAD_ANNOTATION_ATTRIBUTE__(no_thread_safety_analysis)

 #endif  // BASE_THREAD_ANNOTATIONS_H_
diff --git a/src/base/vdso_support.h b/src/base/vdso_support.h
index c47b3c5..86c4527 100644
--- a/src/base/vdso_support.h
+++ b/src/base/vdso_support.h
@@ -64,7 +64,7 @@ class VDSOSupport {
   // Supports iteration over all dynamic symbols.
   class SymbolIterator {
    public:
-    friend struct VDSOSupport;
+    friend class VDSOSupport;
     const SymbolInfo *operator->() const;
     const SymbolInfo &operator*() const;
     SymbolIterator& operator++();
diff --git a/src/common.cc b/src/common.cc
index 04723b1..4b84f18 100644
--- a/src/common.cc
+++ b/src/common.cc
@@ -53,6 +53,24 @@ static inline int LgFloor(size_t n) {
   return log;
 }

+int AlignmentForSize(size_t size) {
+  int alignment = kAlignment;
+  if (size >= 2048) {
+    // Cap alignment at 256 for large sizes.
+    alignment = 256;
+  } else if (size >= 128) {
+    // Space wasted due to alignment is at most 1/8, i.e., 12.5%.
+    alignment = (1 << LgFloor(size)) / 8;
+  } else if (size >= 16) {
+    // We need an alignment of at least 16 bytes to satisfy
+    // requirements for some SSE types.
+    alignment = 16;
+  }
+  CHECK_CONDITION(size < 16 || alignment >= 16);
+  CHECK_CONDITION((alignment & (alignment - 1)) == 0);
+  return alignment;
+}
+
 int SizeMap::NumMoveSize(size_t size) {
   if (size == 0) return 0;
   // Use approx 64k transfers between thread and central caches.
@@ -93,19 +111,7 @@ void SizeMap::Init() {
     int lg = LgFloor(size);
     if (lg > last_lg) {
       // Increase alignment every so often to reduce number of size classes.
-      if (size >= 2048) {
-        // Cap alignment at 256 for large sizes
-        alignment = 256;
-      } else if (size >= 128) {
-        // Space wasted due to alignment is at most 1/8, i.e., 12.5%.
-        alignment = size / 8;
-      } else if (size >= 16) {
-        // We need an alignment of at least 16 bytes to satisfy
-        // requirements for some SSE types.
-        alignment = 16;
-      }
-      CHECK_CONDITION(size < 16 || alignment >= 16);
-      CHECK_CONDITION((alignment & (alignment - 1)) == 0);
+      alignment = AlignmentForSize(size);
       last_lg = lg;
     }
     CHECK_CONDITION((size % alignment) == 0);
diff --git a/src/common.h b/src/common.h
index 5226998..e2906d6 100644
--- a/src/common.h
+++ b/src/common.h
@@ -111,6 +111,10 @@ inline Length pages(size_t bytes) {
          ((bytes & (kPageSize - 1)) > 0 ? 1 : 0);
 }

+// For larger allocation sizes, we use larger memory alignments to
+// reduce the number of size classes.
+int AlignmentForSize(size_t size);
+
 // Size-class information + mapping
 class SizeMap {
  private:
diff --git a/src/google/malloc_extension.h b/src/google/malloc_extension.h
index 9c05897..3fbefc9 100644
--- a/src/google/malloc_extension.h
+++ b/src/google/malloc_extension.h
@@ -219,6 +219,7 @@ class PERFTOOLS_DLL_DECL MallocExtension {
   // SIZE bytes may reserve more bytes, but will never reserve less.
   // (Currently only implemented in tcmalloc, other implementations
   // always return SIZE.)
+  // This is equivalent to malloc_good_size() in OS X.
   virtual size_t GetEstimatedAllocatedSize(size_t size);

   // Returns the actual number N of bytes reserved by tcmalloc for the
@@ -232,6 +233,8 @@ class PERFTOOLS_DLL_DECL MallocExtension {
   // from that -- and should not have been freed yet.  p may be NULL.
   // (Currently only implemented in tcmalloc; other implementations
   // will return 0.)
+  // This is equivalent to malloc_size() in OS X, malloc_usable_size()
+  // in glibc, and _msize() for windows.
   virtual size_t GetAllocatedSize(void* p);

   // The current malloc implementation.  Always non-NULL.
diff --git a/src/google/tcmalloc.h.in b/src/google/tcmalloc.h.in
index fbb70ab..cdaaaa0 100644
--- a/src/google/tcmalloc.h.in
+++ b/src/google/tcmalloc.h.in
@@ -89,6 +89,13 @@ extern "C" {
   PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW;
 #endif

+  // This is an alias for MallocExtension::instance()->GetAllocatedSize().
+  // It is equivalent to
+  //    OS X: malloc_size()
+  //    glibc: malloc_usable_size()
+  //  Windows: _msize()
+  size_t tc_malloc_size(void* ptr) __THROW;
+
 #ifdef __cplusplus
   PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) __THROW;
   PERFTOOLS_DLL_DECL void* tc_new(size_t size);
diff --git a/src/heap-checker.cc b/src/heap-checker.cc
index 2b0b854..c4f6da8 100644
--- a/src/heap-checker.cc
+++ b/src/heap-checker.cc
@@ -965,7 +965,8 @@
     // specially via self_thread_stack, not here:
     if (thread_pids[i] == self_thread_pid) continue;
     RAW_VLOG(11, "Handling thread with pid %d", thread_pids[i]);
-#if defined(HAVE_LINUX_PTRACE_H) && defined(HAVE_SYS_SYSCALL_H) && defined(DUMPER)
+#if (defined(__i386__) || defined(__x86_64)) && \
+    defined(HAVE_LINUX_PTRACE_H) && defined(HAVE_SYS_SYSCALL_H) && defined(DUMPER)
     i386_regs thread_regs;
 #define sys_ptrace(r, p, a, d) syscall(SYS_ptrace, (r), (p), (a), (d))
     // We use sys_ptrace to avoid thread locking
@@ -2091,12 +2092,11 @@ void HeapLeakChecker::CancelGlobalCheck() {
 // HeapLeakChecker global constructor/destructor ordering components
 //----------------------------------------------------------------------

-static bool in_initial_malloc_hook = false;
-
 #ifdef HAVE___ATTRIBUTE__   // we need __attribute__((weak)) for this to work
 #define INSTALLED_INITIAL_MALLOC_HOOKS

 void HeapLeakChecker_BeforeConstructors();  // below
+static bool in_initial_malloc_hook = false;

 // Helper for InitialMallocHook_* below
 static inline void InitHeapLeakCheckerFromMallocHook() {
@@ -2115,20 +2115,20 @@
 // These will overwrite the weak definitions in malloc_hook.cc:

 // Important to have this to catch the first allocation call from the binary:
-extern void InitialMallocHook_New(const void* ptr, size_t size) {
+extern "C" void InitialMallocHook_New(const void* ptr, size_t size) {
   InitHeapLeakCheckerFromMallocHook();
   // record this first allocation as well (if we need to):
   MallocHook::InvokeNewHook(ptr, size);
 }

 // Important to have this to catch the first mmap call (say from tcmalloc):
-extern void InitialMallocHook_MMap(const void* result,
-                                   const void* start,
-                                   size_t size,
-                                   int protection,
-                                   int flags,
-                                   int fd,
-                                   off_t offset) {
+extern "C" void InitialMallocHook_MMap(const void* result,
+                                       const void* start,
+                                       size_t size,
+                                       int protection,
+                                       int flags,
+                                       int fd,
+                                       off_t offset) {
   InitHeapLeakCheckerFromMallocHook();
   // record this first mmap as well (if we need to):
   MallocHook::InvokeMmapHook(
@@ -2136,7 +2136,8 @@
 }

 // Important to have this to catch the first sbrk call (say from tcmalloc):
-extern void InitialMallocHook_Sbrk(const void* result, ptrdiff_t increment) {
+extern "C" void InitialMallocHook_Sbrk(const void* result,
+                                       ptrdiff_t increment) {
   InitHeapLeakCheckerFromMallocHook();
   // record this first sbrk as well (if we need to):
   MallocHook::InvokeSbrkHook(result, increment);
diff --git a/src/heap-profiler.cc b/src/heap-profiler.cc
index 3055f4c..f28dffb 100644
--- a/src/heap-profiler.cc
+++ b/src/heap-profiler.cc
@@ -210,6 +210,7 @@ static char* DoGetHeapProfileLocked(char* buf, int buflen) {
   int bytes_written = 0;
   if (is_on) {
     HeapProfileTable::Stats const stats = heap_profile->total();
+    (void)stats;   // avoid an unused-variable warning in non-debug mode.
     AddRemoveMMapDataLocked(ADD);
     bytes_written = heap_profile->FillOrderedProfile(buf, buflen - 1);
     // FillOrderedProfile should not reduce the set of active mmap-ed regions,
diff --git a/src/malloc_hook-inl.h b/src/malloc_hook-inl.h
index a629691..a690b07 100644
--- a/src/malloc_hook-inl.h
+++ b/src/malloc_hook-inl.h
@@ -70,8 +70,17 @@ class AtomicPtr {
   // Sets the contained value to new_val and returns the old value,
   // atomically, with acquire and release semantics.
+  // This is a full-barrier instruction.
   PtrT Exchange(PtrT new_val);

+  // Atomically executes:
+  //   result = data_
+  //   if (data_ == old_val)
+  //     data_ = new_val;
+  //   return result;
+  // This is a full-barrier instruction.
+  PtrT CompareAndSwap(PtrT old_val, PtrT new_val);
+
   // Not private so that the class is an aggregate and can be
   // initialized by the linker. Don't access this directly.
   AtomicWord data_;
diff --git a/src/malloc_hook.cc b/src/malloc_hook.cc
index 4315b86..e823a44 100644
--- a/src/malloc_hook.cc
+++ b/src/malloc_hook.cc
@@ -66,8 +66,10 @@
 using std::copy;

-// Declarations of three default weak hook functions, that can be overridden by
-// linking-in a strong definition (as heap-checker.cc does)
+// Declarations of five default weak hook functions, that can be overridden by
+// linking-in a strong definition (as heap-checker.cc does).  These are extern
+// "C" so that they don't trigger gold's --detect-odr-violations warning, which
+// only looks at C++ symbols.
 //
 // These default hooks let some other library we link in
 // to define strong versions of InitialMallocHook_New, InitialMallocHook_MMap,
@@ -81,31 +83,35 @@ using std::copy;
 // weak symbols too early, at compile rather than link time.  By declaring it
 // (weak) here, then defining it below after its use, we can avoid the problem.
 //
+extern "C" {
+
 ATTRIBUTE_WEAK
-extern void InitialMallocHook_New(const void* ptr, size_t size);
+void InitialMallocHook_New(const void* ptr, size_t size);

 ATTRIBUTE_WEAK
-extern void InitialMallocHook_PreMMap(const void* start,
-                                      size_t size,
-                                      int protection,
-                                      int flags,
-                                      int fd,
-                                      off_t offset);
+void InitialMallocHook_PreMMap(const void* start,
+                               size_t size,
+                               int protection,
+                               int flags,
+                               int fd,
+                               off_t offset);

 ATTRIBUTE_WEAK
-extern void InitialMallocHook_MMap(const void* result,
-                                   const void* start,
-                                   size_t size,
-                                   int protection,
-                                   int flags,
-                                   int fd,
-                                   off_t offset);
+void InitialMallocHook_MMap(const void* result,
+                            const void* start,
+                            size_t size,
+                            int protection,
+                            int flags,
+                            int fd,
+                            off_t offset);

 ATTRIBUTE_WEAK
-extern void InitialMallocHook_PreSbrk(ptrdiff_t increment);
+void InitialMallocHook_PreSbrk(ptrdiff_t increment);

 ATTRIBUTE_WEAK
-extern void InitialMallocHook_Sbrk(const void* result, ptrdiff_t increment);
+void InitialMallocHook_Sbrk(const void* result, ptrdiff_t increment);
+
+}  // extern "C"

 namespace base { namespace internal {
 template<typename PtrT>
@@ -123,6 +129,18 @@ PtrT AtomicPtr<PtrT>::Exchange(PtrT new_val) {
   return old_val;
 }

+template<typename PtrT>
+PtrT AtomicPtr<PtrT>::CompareAndSwap(PtrT old_val, PtrT new_val) {
+  base::subtle::MemoryBarrier();  // Release semantics.
+  PtrT retval = reinterpret_cast<PtrT>(static_cast<AtomicWord>(
+      base::subtle::NoBarrier_CompareAndSwap(
+          &data_,
+          reinterpret_cast<AtomicWord>(old_val),
+          reinterpret_cast<AtomicWord>(new_val))));
+  base::subtle::MemoryBarrier();  // And acquire semantics.
+  return retval;
+}
+
 AtomicPtr<MallocHook::NewHook> new_hook_ = {
   reinterpret_cast<AtomicWord>(InitialMallocHook_New) };
 AtomicPtr<MallocHook::DeleteHook> delete_hook_ = { 0 };
@@ -215,8 +233,8 @@ MallocHook_SbrkHook MallocHook_SetSbrkHook(MallocHook_SbrkHook hook) {

 // TODO(csilvers): add support for removing a hook from the middle of a chain.
 void InitialMallocHook_New(const void* ptr, size_t size) {
-  if (MallocHook::GetNewHook() == &InitialMallocHook_New)
-    MallocHook::SetNewHook(NULL);
+  // Set new_hook to NULL iff its previous value was InitialMallocHook_New
+  new_hook_.CompareAndSwap(&InitialMallocHook_New, NULL);
 }

 void InitialMallocHook_PreMMap(const void* start,
@@ -225,8 +243,7 @@
                                int flags,
                                int fd,
                                off_t offset) {
-  if (MallocHook::GetPreMmapHook() == &InitialMallocHook_PreMMap)
-    MallocHook::SetPreMmapHook(NULL);
+  premmap_hook_.CompareAndSwap(&InitialMallocHook_PreMMap, NULL);
 }

 void InitialMallocHook_MMap(const void* result,
@@ -236,18 +253,15 @@
                             int flags,
                             int fd,
                             off_t offset) {
-  if (MallocHook::GetMmapHook() == &InitialMallocHook_MMap)
-    MallocHook::SetMmapHook(NULL);
+  mmap_hook_.CompareAndSwap(&InitialMallocHook_MMap, NULL);
 }

 void InitialMallocHook_PreSbrk(ptrdiff_t increment) {
-  if (MallocHook::GetPreSbrkHook() == &InitialMallocHook_PreSbrk)
-    MallocHook::SetPreSbrkHook(NULL);
+  presbrk_hook_.CompareAndSwap(&InitialMallocHook_PreSbrk, NULL);
 }

 void InitialMallocHook_Sbrk(const void* result, ptrdiff_t increment) {
-  if (MallocHook::GetSbrkHook() == &InitialMallocHook_Sbrk)
-    MallocHook::SetSbrkHook(NULL);
+  sbrk_hook_.CompareAndSwap(&InitialMallocHook_Sbrk, NULL);
 }

 DEFINE_ATTRIBUTE_SECTION_VARS(google_malloc);
diff --git a/src/pprof b/src/pprof
@@ -594,6 +594,10 @@ sub Main() {
   } elsif ($main::use_symbol_page) {
     $symbols = FetchSymbols($pcs);
   } else {
+    # TODO(csilvers): $libs uses the /proc/self/maps data from profile1,
+    # which may differ from the data from subsequent profiles, especially
+    # if they were run on different machines.  Use appropriate libs for
+    # each pc somehow.
     $symbols = ExtractSymbols($libs, $pcs);
   }

@@ -3043,6 +3047,7 @@ BEGIN {
       stride => 512 * 1024,   # must be a multiple of bitsize/8
       slots => [],
      unpack_code => "",       # N for big-endian, V for little
+      perl_is_64bit => 1,     # matters if profile is 64-bit
     };
     bless $self, $class;
     # Let unittests adjust the stride
@@ -3066,17 +3071,15 @@
     }
     @$slots = unpack($self->{unpack_code} . "*", $str);
   } else {
-    # If we're a 64-bit profile, make sure we're a 64-bit-capable
+    # If we're a 64-bit profile, check if we're a 64-bit-capable
     # perl.  Otherwise, each slot will be represented as a float
     # instead of an int64, losing precision and making all the
-    # 64-bit addresses right.  We *could* try to handle this with
-    # software emulation of 64-bit ints, but that's added complexity
-    # for no clear benefit (yet).  We use 'Q' to test for 64-bit-ness;
-    # perl docs say it's only available on 64-bit perl systems.
+    # 64-bit addresses wrong.  We won't complain yet, but will
+    # later if we ever see a value that doesn't fit in 32 bits.
     my $has_q = 0;
     eval { $has_q = pack("Q", "1") ? 1 : 1; };
     if (!$has_q) {
-      ::error("$fname: need a 64-bit perl to process this 64-bit profile.\n");
+      $self->{perl_is_64bit} = 0;
     }
     read($self->{file}, $str, 8);
     if (substr($str, 4, 4) eq chr(0)x4) {
@@ -3112,11 +3115,17 @@
       # TODO(csilvers): if this is a 32-bit perl, the math below
       #    could end up in a too-large int, which perl will promote
       #    to a double, losing necessary precision.  Deal with that.
-      if ($self->{unpack_code} eq 'V') {    # little-endian
-        push(@b64_values, $b32_values[$i] + $b32_values[$i+1] * (2**32));
-      } else {
-        push(@b64_values, $b32_values[$i] * (2**32) + $b32_values[$i+1]);
-      }
+      #    Right now, we just die.
+      my ($lo, $hi) = ($b32_values[$i], $b32_values[$i+1]);
+      if ($self->{unpack_code} eq 'N') {    # big-endian
+        ($lo, $hi) = ($hi, $lo);
+      }
+      my $value = $lo + $hi * (2**32);
+      if (!$self->{perl_is_64bit} &&   # check value is exactly represented
+          (($value % (2**32)) != $lo || int($value / (2**32)) != $hi)) {
+        ::error("Need a 64-bit perl to process this 64-bit profile.\n");
+      }
+      push(@b64_values, $value);
     }
     @$slots = @b64_values;
   }
@@ -4341,7 +4350,7 @@ sub ConfigureTool {
   if ($tools =~ m/(,|^)\Q$tool\E:([^,]*)/) {
     $path = $2;
     # TODO(csilvers): sanity-check that $path exists?  Hard if it's relative.
-  } elsif ($tools) {
+  } elsif ($tools ne '') {
     foreach my $prefix (split(',', $tools)) {
       next if ($prefix =~ /:/);  # ignore "tool:fullpath" entries in the list
       if (-x $prefix . $tool) {
diff --git a/src/stacktrace_x86-inl.h b/src/stacktrace_x86-inl.h
index 6753fdb..a140ab6 100644
--- a/src/stacktrace_x86-inl.h
+++ b/src/stacktrace_x86-inl.h
@@ -297,7 +297,7 @@ int GET_STACK_TRACE_OR_FRAMES {
   //    sp[2]   first argument
   //    ...
   // NOTE: This will break under llvm, since result is a copy and not in sp[2]
-  sp = (void **)&pcs - 2;
+  sp = (void **)&result - 2;
 #elif defined(__x86_64__)
   unsigned long rbp;
   // Move the value of the register %rbp into the local variable rbp.
diff --git a/src/tcmalloc.cc b/src/tcmalloc.cc
index 13d2c23..93bdd1d 100644
--- a/src/tcmalloc.cc
+++ b/src/tcmalloc.cc
@@ -137,6 +137,7 @@
 #endif

 using std::max;
+using tcmalloc::AlignmentForSize;
 using tcmalloc::PageHeap;
 using tcmalloc::PageHeapAllocator;
 using tcmalloc::SizeMap;
@@ -212,7 +213,7 @@ extern "C" {
       ATTRIBUTE_SECTION(google_malloc);
   int tc_mallopt(int cmd, int value) __THROW
       ATTRIBUTE_SECTION(google_malloc);
-#ifdef HAVE_STRUCT_MALLINFO  // struct mallinfo isn't defined on freebsd
+#ifdef HAVE_STRUCT_MALLINFO
   struct mallinfo tc_mallinfo(void) __THROW
       ATTRIBUTE_SECTION(google_malloc);
 #endif
@@ -238,6 +239,15 @@ extern "C" {
       ATTRIBUTE_SECTION(google_malloc);
   void tc_deletearray_nothrow(void* ptr, const std::nothrow_t&) __THROW
       ATTRIBUTE_SECTION(google_malloc);
+
+  // Some non-standard extensions that we support.
+
+  // This is equivalent to
+  //    OS X: malloc_size()
+  //    glibc: malloc_usable_size()
+  //  Windows: _msize()
+  size_t tc_malloc_size(void* p) __THROW
+      ATTRIBUTE_SECTION(google_malloc);
 }  // extern "C"
 #endif  // #ifndef _WIN32
@@ -282,6 +292,8 @@ extern "C" {
 #ifdef HAVE_STRUCT_MALLINFO
   struct mallinfo mallinfo(void) __THROW ALIAS("tc_mallinfo");
 #endif
+  size_t malloc_size(void* p) __THROW ALIAS("tc_malloc_size");
+  size_t malloc_usable_size(void* p) __THROW ALIAS("tc_malloc_size");
 }  // extern "C"
 #else  // #if defined(__GNUC__) && !defined(__MACH__)
   // Portable wrappers
@@ -318,6 +330,8 @@ extern "C" {
 #ifdef HAVE_STRUCT_MALLINFO
   struct mallinfo mallinfo(void) __THROW { return tc_mallinfo(); }
 #endif
+  size_t malloc_size(void* p) __THROW { return tc_malloc_size(p); }
+  size_t malloc_usable_size(void* p) __THROW { return tc_malloc_size(p); }
 }  // extern "C"
 #endif  // #if defined(__GNUC__)
@@ -845,6 +859,8 @@ static void* DoSampledAllocation(size_t size) {
   return SpanToMallocResult(span);
 }

+namespace {
+
 // Copy of FLAGS_tcmalloc_large_alloc_report_threshold with
 // automatic increases factored in.
 static int64_t large_alloc_threshold =
@@ -868,8 +884,6 @@ static void ReportLargeAlloc(Length num_pages, void* result) {
   write(STDERR_FILENO, buffer, strlen(buffer));
 }

-namespace {
-
 inline void* cpp_alloc(size_t size, bool nothrow);
 inline void* do_malloc(size_t size);
@@ -1096,16 +1110,23 @@ inline void* do_realloc(void* old_ptr, size_t new_size) {

 // For use by exported routines below that want specific alignments
 //
-// Note: this code can be slow, and can significantly fragment memory.
-// The expectation is that memalign/posix_memalign/valloc/pvalloc will
-// not be invoked very often.  This requirement simplifies our
-// implementation and allows us to tune for expected allocation
-// patterns.
+// Note: this code can be slow for alignments > 16, and can
+// significantly fragment memory.  The expectation is that
+// memalign/posix_memalign/valloc/pvalloc will not be invoked very
+// often.  This requirement simplifies our implementation and allows
+// us to tune for expected allocation patterns.
 void* do_memalign(size_t align, size_t size) {
   ASSERT((align & (align - 1)) == 0);
   ASSERT(align > 0);
   if (size + align < size) return NULL;         // Overflow

+  // Fall back to malloc if we would already align this memory access properly.
+  if (align <= AlignmentForSize(size)) {
+    void* p = do_malloc(size);
+    ASSERT((reinterpret_cast<uintptr_t>(p) % align) == 0);
+    return p;
+  }
+
   if (Static::pageheap() == NULL) ThreadCache::InitModule();

   // Allocate at least one byte to avoid boundary conditions below
@@ -1178,7 +1199,7 @@ inline int do_mallopt(int cmd, int value) {
   return 1;     // Indicates error
 }

-#ifdef HAVE_STRUCT_MALLINFO  // mallinfo isn't defined on freebsd, for instance
+#ifdef HAVE_STRUCT_MALLINFO
 inline struct mallinfo do_mallinfo() {
   TCMallocStats stats;
   ExtractStats(&stats, NULL);
@@ -1204,7 +1225,7 @@ inline struct mallinfo do_mallinfo() {
   return info;
 }
-#endif  // #ifndef HAVE_STRUCT_MALLINFO
+#endif  // HAVE_STRUCT_MALLINFO

 static SpinLock set_new_handler_lock(SpinLock::LINKER_INITIALIZED);
@@ -1489,6 +1510,10 @@ extern "C" PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW {
 }
 #endif

+extern "C" PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW {
+  return GetSizeWithCallback(ptr, &InvalidGetAllocatedSize);
+}
+
 // This function behaves similarly to MSVC's _set_new_mode.
 // If flag is 0 (default), calls to malloc will behave normally.
 // If flag is 1, calls to malloc will behave like calls to new,
diff --git a/src/tests/sampler_test.cc b/src/tests/sampler_test.cc
index fca10ac..045cd02 100755
--- a/src/tests/sampler_test.cc
+++ b/src/tests/sampler_test.cc
@@ -647,6 +647,11 @@ TEST(Sample, size_of_class) {
   LOG(INFO) << "Size of Sampler object is: " << sizeof(sampler);
 }

+// Make sure sampling is enabled, or the tests won't work right.
+DECLARE_int64(tcmalloc_sample_parameter);
+
 int main(int argc, char **argv) {
+  if (FLAGS_tcmalloc_sample_parameter == 0)
+    FLAGS_tcmalloc_sample_parameter = 524288;
   return RUN_ALL_TESTS();
 }
diff --git a/src/tests/tcmalloc_unittest.cc b/src/tests/tcmalloc_unittest.cc
index 6b2ec26..522c0d9 100644
--- a/src/tests/tcmalloc_unittest.cc
+++ b/src/tests/tcmalloc_unittest.cc
@@ -126,6 +126,7 @@ using std::string;

 DECLARE_double(tcmalloc_release_rate);
 DECLARE_int32(max_free_queue_size);     // in debugallocation.cc
+DECLARE_int64(tcmalloc_sample_parameter);

 namespace testing {
@@ -559,6 +560,13 @@ static void TestCalloc(size_t n, size_t s, bool ok) {
 // direction doesn't cause us to allocate new memory.
 static void TestRealloc() {
 #ifndef DEBUGALLOCATION  // debug alloc doesn't try to minimize reallocs
+  // When sampling, we always allocate in units of page-size, which
+  // makes reallocs of small sizes do extra work (thus, failing these
+  // checks).  Since sampling is random, we turn off sampling to make
+  // sure that doesn't happen to us here.
+  const int64 old_sample_parameter = FLAGS_tcmalloc_sample_parameter;
+  FLAGS_tcmalloc_sample_parameter = 0;   // turn off sampling
+
   int start_sizes[] = { 100, 1000, 10000, 100000 };
   int deltas[] = { 1, -2, 4, -8, 16, -32, 64, -128 };
@@ -566,7 +574,7 @@
     void* p = malloc(start_sizes[s]);
     CHECK(p);
     // The larger the start-size, the larger the non-reallocing delta.
-    for (int d = 0; d < s*2; ++d) {
+    for (int d = 0; d < (s+1) * 2; ++d) {
       void* new_p = realloc(p, start_sizes[s] + deltas[d]);
       CHECK(p == new_p);  // realloc should not allocate new memory
     }
@@ -577,6 +585,7 @@
     }
     free(p);
   }
+  FLAGS_tcmalloc_sample_parameter = old_sample_parameter;
 #endif
 }
@@ -998,9 +1007,14 @@ static int RunAllTests(int argc, char** argv) {

     void* p1 = malloc(10);
     VerifyNewHookWasCalled();
+    // Also test the non-standard tc_malloc_size
+    size_t actual_p1_size = tc_malloc_size(p1);
+    CHECK_GE(actual_p1_size, 10);
+    CHECK_LT(actual_p1_size, 100000);   // a reasonable upper-bound, I think
     free(p1);
     VerifyDeleteHookWasCalled();
+
     p1 = calloc(10, 2);
     VerifyNewHookWasCalled();
     p1 = realloc(p1, 30);
diff --git a/src/windows/google/tcmalloc.h b/src/windows/google/tcmalloc.h
index 663b7f9..5bd4c59 100644
--- a/src/windows/google/tcmalloc.h
+++ b/src/windows/google/tcmalloc.h
@@ -90,6 +90,13 @@ extern "C" {
   PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW;
 #endif

+  // This is an alias for MallocExtension::instance()->GetAllocatedSize().
+  // It is equivalent to
+  //    OS X: malloc_size()
+  //    glibc: malloc_usable_size()
+  //  Windows: _msize()
+  PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW;
+
 #ifdef __cplusplus
   PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) __THROW;
   PERFTOOLS_DLL_DECL void* tc_new(size_t size);
diff --git a/src/windows/port.cc b/src/windows/port.cc
index 9a9da80..d62fa9d 100644
--- a/src/windows/port.cc
+++ b/src/windows/port.cc
@@ -35,6 +35,7 @@
 # error You should only be including windows/port.cc in a windows environment!
 #endif

+#define NOMINMAX    // so std::max, below, compiles correctly
 #include <config.h>
 #include <string.h>    // for strlen(), memset(), memcmp()
 #include <assert.h>
diff --git a/vsprojects/addressmap_unittest/addressmap_unittest.vcproj b/vsprojects/addressmap_unittest/addressmap_unittest.vcproj
index 7dd8657..d48ef27 100755
--- a/vsprojects/addressmap_unittest/addressmap_unittest.vcproj
+++ b/vsprojects/addressmap_unittest/addressmap_unittest.vcproj
@@ -128,7 +128,7 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\base\dynamic_annotations.cc">
+ RelativePath="..\..\src\base\dynamic_annotations.c">
<FileConfiguration
Name="Debug|Win32">
<Tool
diff --git a/vsprojects/libtcmalloc_minimal/libtcmalloc_minimal.vcproj b/vsprojects/libtcmalloc_minimal/libtcmalloc_minimal.vcproj
index 3755fb0..58d32e6 100755
--- a/vsprojects/libtcmalloc_minimal/libtcmalloc_minimal.vcproj
+++ b/vsprojects/libtcmalloc_minimal/libtcmalloc_minimal.vcproj
@@ -130,7 +130,7 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\base\dynamic_annotations.cc">
+ RelativePath="..\..\src\base\dynamic_annotations.c">
<FileConfiguration
Name="Debug|Win32">
<Tool
@@ -504,23 +504,6 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\stacktrace_with_context.cc">
- <FileConfiguration
- Name="Debug|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="3"/>
- </FileConfiguration>
- <FileConfiguration
- Name="Release|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="2"/>
- </FileConfiguration>
- </File>
- <File
RelativePath="..\..\src\stack_trace_table.cc">
<FileConfiguration
Name="Debug|Win32">
diff --git a/vsprojects/low_level_alloc_unittest/low_level_alloc_unittest.vcproj b/vsprojects/low_level_alloc_unittest/low_level_alloc_unittest.vcproj
index 85fe7f7..f55b56c 100755
--- a/vsprojects/low_level_alloc_unittest/low_level_alloc_unittest.vcproj
+++ b/vsprojects/low_level_alloc_unittest/low_level_alloc_unittest.vcproj
@@ -111,7 +111,7 @@
Filter="cpp;c;cxx;def;odl;idl;hpj;bat;asm;asmx"
UniqueIdentifier="{4FC737F1-C7A5-4376-A066-2A32D752A2FF}">
<File
- RelativePath="..\..\src\base\dynamic_annotations.cc">
+ RelativePath="..\..\src\base\dynamic_annotations.c">
<FileConfiguration
Name="Debug|Win32">
<Tool
@@ -263,23 +263,6 @@
RuntimeLibrary="2"/>
</FileConfiguration>
</File>
- <File
- RelativePath="..\..\src\stacktrace_with_context.cc">
- <FileConfiguration
- Name="Debug|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="3"/>
- </FileConfiguration>
- <FileConfiguration
- Name="Release|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="2"/>
- </FileConfiguration>
- </File>
</Filter>
<Filter
Name="Header Files"
diff --git a/vsprojects/tmu-static/tmu-static.vcproj b/vsprojects/tmu-static/tmu-static.vcproj
index a5d6402..8d739ae 100755
--- a/vsprojects/tmu-static/tmu-static.vcproj
+++ b/vsprojects/tmu-static/tmu-static.vcproj
@@ -130,7 +130,7 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\base\dynamic_annotations.cc">
+ RelativePath="..\..\src\base\dynamic_annotations.c">
<FileConfiguration
Name="Debug|Win32">
<Tool
@@ -544,25 +544,6 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\stacktrace_with_context.cc">
- <FileConfiguration
- Name="Debug|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalOptions="/D PERFTOOLS_DLL_DECL="
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="3"/>
- </FileConfiguration>
- <FileConfiguration
- Name="Release|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalOptions="/D PERFTOOLS_DLL_DECL="
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="2"/>
- </FileConfiguration>
- </File>
- <File
RelativePath="..\..\src\stack_trace_table.cc">
<FileConfiguration
Name="Debug|Win32">