35 files changed, 420 insertions, 378 deletions
@@ -1,3 +1,39 @@
+Thu Aug 5 12:48:03 PDT 2010
+
+ * google-perftools: version 1.6 release
+ * Add tc_malloc_usable_size for compatibility with glibc (csilvers)
+ * Override malloc_usable_size with tc_malloc_usable_size (csilvers)
+ * Default to no automatic heap sampling in tcmalloc (csilvers)
+ * Add -DTCMALLOC_LARGE_PAGES, a possibly faster tcmalloc (rus)
+ * Make some functions extern "C" to avoid false ODR warnings (jyasskin)
+ * pprof: Add SVG-based output (rsc)
+ * pprof: Extend pprof --tools to allow per-tool configs (csilvers)
+ * pprof: Improve support of 64-bit and big-endian profiles (csilvers)
+ * pprof: Add interactive callgrind support (weidenri...)
+ * pprof: Improve address->function mapping a bit (dpeng)
+ * Better detection of when we're running under valgrind (csilvers)
+ * Better CPU-speed detection under valgrind (saito)
+ * Use, and recommend, -fno-builtin-malloc when compiling (csilvers)
+ * Avoid false-sharing of memory between caches (bmaurer)
+ * BUGFIX: Fix heap sampling to use correct alloc size (bmaurer)
+ * BUGFIX: Avoid gcc 4.0.x bug by making hook-clearing atomic (csilvers)
+ * BUGFIX: Avoid gcc 4.5.x optimization bug (csilvers)
+ * BUGFIX: Work around deps-determining bug in libtool 1.5.26 (csilvers)
+ * BUGFIX: Fixed test to use HAVE_PTHREAD, not HAVE_PTHREADS (csilvers)
+ * BUGFIX: Fix tls callback behavior on windows when using wpo (wtc)
+ * BUGFIX: properly align allocation sizes on Windows (antonm)
+ * BUGFIX: Fix prototypes for tcmalloc/debugalloc wrt throw() (csilvers)
+ * DOC: Updated heap-checker doc to match reality better (fischman)
+ * DOC: Document ProfilerFlush, ProfilerStartWithOptions (csilvers)
+ * DOC: Update docs for heap-profiler functions (csilvers)
+ * DOC: Clean up documentation around tcmalloc.slack_bytes (fikes)
+ * DOC: Renamed README.windows to README_windows.txt (csilvers)
+ * DOC: Update the NEWS file to be non-empty (csilvers)
+ * PORTING: Fix windows addr2line and nm with proper rc code (csilvers)
+ * 
PORTING: Add CycleClock and atomicops support for arm 5 (sanek)
+ * PORTING: Improve PC finding on cygwin and redhat 7 (csilvers)
+ * PORTING: speed up function-patching under windows (csilvers)
+
 Tue Jan 19 14:46:12 2010  Google Inc. <opensource@google.com>

 * google-perftools: version 1.5 release
@@ -65,6 +65,26 @@ application with frame pointers (via 'gcc -fno-omit-frame-pointer ...')
 in this case.

+*** TCMALLOC LARGE PAGES: TRADING TIME FOR SPACE
+
+Internally, tcmalloc divides its memory into "pages."  The default
+page size is chosen to minimize memory use by reducing fragmentation.
+The cost is that keeping track of these pages can cost tcmalloc time.
+We've added a new, experimental flag to tcmalloc that enables a larger
+page size.  In general, this will increase the memory needs of
+applications using tcmalloc.  However, in many cases it will speed up
+the applications as well, particularly if they allocate and free a lot
+of memory.  We've seen average speedups of 3-5% on Google
+applications.
+
+This feature is still very experimental; it's not even a configure
+flag yet.  To build libtcmalloc with large pages, run
+
+   ./configure <normal flags> CXXFLAGS=-DTCMALLOC_LARGE_PAGES
+
+(or add -DTCMALLOC_LARGE_PAGES to your existing CXXFLAGS argument).
+
+
 *** NOTE FOR ___tls_get_addr ERROR

 When compiling perftools on some old systems, like RedHat 8, you may
@@ -191,6 +211,15 @@ above, by linking in libtcmalloc_minimal.
   successfully build are exactly the same as for FreeBSD.  See that
   section for a list of binaries and instructions on building them.

+  In addition, it appears OS X regularly fails profiler_unittest.sh
+  in the "thread" test (in addition to occasionally failing in the
+  "fork" test).  It looks like OS X often delivers the profiling
+  signal to the main thread, even when it's sleeping, rather than
+  spawned threads that are doing actual work.
If anyone knows
+  details of how OS X handles SIGPROF (via setitimer()) events with
+  threads, and has insight into this problem, please send mail to
+  google-perftools@googlegroups.com.
+
 ** Solaris 10 x86:

   I've only tested using the GNU C++ compiler, not the Sun C++
@@ -236,7 +265,10 @@ above, by linking in libtcmalloc_minimal.
   the heap-checker and a few other pieces of functionality will not
   compile).  'make' will compile those libraries and tests that can
   be compiled.  You can run 'make check' to make sure the basic
-  functionality is working.
+  functionality is working.  I've heard reports that some versions of
+  cygwin fail calls to pthread_join() with EINVAL, causing several
+  tests to fail.  If you have any insight into this, please mail
+  google-perftools@googlegroups.com.

   This Windows functionality is also available using MinGW and Msys.
   In this case, you can use the regular './configure && make'
@@ -1,3 +1,31 @@
+=== 5 August 2010 ===
+
+I've just released perftools 1.6
+
+This version also has a large number of minor changes, including
+support for `malloc_usable_size()` as a glibc-compatible alias to
+`malloc_size()`, the addition of SVG-based output to `pprof`, and
+experimental support for tcmalloc large pages, which may speed up
+tcmalloc at the cost of greater memory use.  To use tcmalloc large
+pages, see the
+[http://google-perftools.googlecode.com/svn/tags/perftools-1.5/INSTALL
+INSTALL file]; for all changes, see the
+[http://google-perftools.googlecode.com/svn/tags/perftools-1.5/ChangeLog
+ChangeLog].
+
+OS X NOTE: improvements in the profiler unittest have turned up an OS
+X issue: in multithreaded programs, it seems that OS X often delivers
+the profiling signal (from setitimer()) to the main thread, even when
+it's sleeping, rather than spawned threads that are doing actual work.
+If anyone knows details of how OS X handles SIGPROF events (from
+setitimer) in threaded programs, and has insight into this problem,
+please send mail to google-perftools@googlegroups.com.
+
+To see if you're affected by this, look for profiling time that pprof
+attributes to ___semwait_signal.  This is work being done in other
+threads that is being attributed to sleeping-time in the main thread.
+
+
 === 20 January 2010 ===

 I've just released perftools 1.5
@@ -11,7 +11,7 @@
 tcmalloc -- a replacement for malloc and new.  See below for some
 environment variables you can use with tcmalloc, as well.

 tcmalloc functionality is available on all systems we've tested; see
-INSTALL for more details.  See README.windows for instructions on
+INSTALL for more details.  See README_windows.txt for instructions on
 using tcmalloc on Windows.

 NOTE: When compiling with programs with gcc, that you plan to link
@@ -161,7 +161,7 @@
 in its full generality only on those systems. However, we've
 successfully ported much of the tcmalloc library to FreeBSD, Solaris
 x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
 functionality in tcmalloc_minimal to Windows.  See INSTALL for details.
-See README.windows for details on the Windows port.
+See README_windows.txt for details on the Windows port.

 PERFORMANCE
@@ -175,6 +175,11 @@ win32's malloc.
   http://www.highlandsun.com/hyc/malloc/
   http://gaiacrtn.free.fr/articles/win32perftools.html

+It's possible to build tcmalloc in a way that trades off faster
+performance (particularly for deletes) at the cost of more memory
+fragmentation (that is, more unusable memory on your system).  See the
+INSTALL file for details.
+
 OLD SYSTEM ISSUES
 -----------------
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.64 for google-perftools 1.5.
+# Generated by GNU Autoconf 2.64 for google-perftools 1.6.
 #
 # Report bugs to <opensource@google.com>.
# @@ -703,8 +703,8 @@ MAKEFLAGS= # Identity of this package. PACKAGE_NAME='google-perftools' PACKAGE_TARNAME='google-perftools' -PACKAGE_VERSION='1.5' -PACKAGE_STRING='google-perftools 1.5' +PACKAGE_VERSION='1.6' +PACKAGE_STRING='google-perftools 1.6' PACKAGE_BUGREPORT='opensource@google.com' PACKAGE_URL='' @@ -1464,7 +1464,7 @@ if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF -\`configure' configures google-perftools 1.5 to adapt to many kinds of systems. +\`configure' configures google-perftools 1.6 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... @@ -1535,7 +1535,7 @@ fi if test -n "$ac_init_help"; then case $ac_init_help in - short | recursive ) echo "Configuration of google-perftools 1.5:";; + short | recursive ) echo "Configuration of google-perftools 1.6:";; esac cat <<\_ACEOF @@ -1648,7 +1648,7 @@ fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF -google-perftools configure 1.5 +google-perftools configure 1.6 generated by GNU Autoconf 2.64 Copyright (C) 2009 Free Software Foundation, Inc. @@ -2317,7 +2317,7 @@ cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. -It was created by google-perftools $as_me 1.5, which was +It was created by google-perftools $as_me 1.6, which was generated by GNU Autoconf 2.64. Invocation command line was $ $0 $@ @@ -3050,7 +3050,7 @@ fi # Define the identity of the package. PACKAGE='google-perftools' - VERSION='1.5' + VERSION='1.6' cat >>confdefs.h <<_ACEOF @@ -20303,7 +20303,7 @@ $as_echo_n "checking how to access the program counter from a struct ucontext... 
pc_fields="$pc_fields uc_mcontext.sc_ip" # Linux (ia64) pc_fields="$pc_fields uc_mcontext.uc_regs->gregs[PT_NIP]" # Linux (ppc) pc_fields="$pc_fields uc_mcontext.gregs[R15]" # Linux (arm old [untested]) - pc_fields="$pc_fields uc_mcontext.arm_pc" # Linux (arm new [untested]) + pc_fields="$pc_fields uc_mcontext.arm_pc" # Linux (arm arch 5) pc_fields="$pc_fields uc_mcontext.gp_regs[PT_NIP]" # Suse SLES 11 (ppc64) pc_fields="$pc_fields uc_mcontext.mc_eip" # FreeBSD (i386) pc_fields="$pc_fields uc_mcontext.mc_rip" # FreeBSD (x86_64 [untested]) @@ -22208,7 +22208,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. ac_log=" -This file was extended by google-perftools $as_me 1.5, which was +This file was extended by google-perftools $as_me 1.6, which was generated by GNU Autoconf 2.64. Invocation command line was CONFIG_FILES = $CONFIG_FILES @@ -22272,7 +22272,7 @@ Report bugs to <opensource@google.com>." 
 _ACEOF
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_version="\\
-google-perftools config.status 1.5
+google-perftools config.status 1.6
 configured by $0, generated by GNU Autoconf 2.64,
   with options \\"`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`\\"

diff --git a/configure.ac b/configure.ac
index adbb2e5..22d1c5e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -4,7 +4,7 @@
 # make sure we're interpreted by some minimal autoconf
 AC_PREREQ(2.57)

-AC_INIT(google-perftools, 1.5, opensource@google.com)
+AC_INIT(google-perftools, 1.6, opensource@google.com)
 # The argument here is just something that should be in the current directory
 # (for sanity checking)
 AC_CONFIG_SRCDIR(README)

diff --git a/doc/heap_checker.html b/doc/heap_checker.html
index caf46ef..544ce60 100644
--- a/doc/heap_checker.html
+++ b/doc/heap_checker.html
@@ -89,7 +89,7 @@ check:</p>
   <li> <code>draconian</code>
 </ol>

-<p>"Minimal" heap-checking starts as late as possible ina
+<p>"Minimal" heap-checking starts as late as possible in
 initialization, meaning you can leak some memory in your
 initialization routines (that run before <code>main()</code>, say),
 and not trigger a leak message.  If you frequently (and purposefully)
@@ -162,19 +162,13 @@ cleanup code as active only when the heap-checker is turned on.</p>

 <h2><a name="explicit">Explicit (Partial-program) Heap Leak Checking</h2>

-<p>Instead of whole-program checking, you can check certain parts of
-your code to verify they do not have memory leaks.  There are two
-types of checks you can do.  The "no leak" check verifies that between
-two parts of a program, no memory is allocated without being freed; it
-checks that memory does not grow.
The stricter "same heap" check
-verifies that two parts of a program share the same heap profile; that
-is, that the memory does not grow <i>or shrink</i>, or change in any
-way.</p>
-
+<p>Instead of whole-program checking, you can check certain parts of your
+code to verify they do not have memory leaks.  This check verifies that
+between two parts of a program, no memory is allocated without being freed.</p>

 <p>To use this kind of checking code, bracket the code you want
 checked by creating a <code>HeapLeakChecker</code> object at the
-beginning of the code segment, and calling <code>*SameHeap()</code> or
-<code>*NoLeaks()</code> at the end.  These functions, and all others
+beginning of the code segment, and calling
+<code>NoLeaks()</code> at the end.  This function, and all others
 referred to in this file, are declared in
 <code><google/heap-checker.h></code>.
 </p>

@@ -184,31 +178,11 @@ referred to in this file, are declared in
   HeapLeakChecker heap_checker("test_foo");
   {
     code that exercises some foo functionality;
-    this code should preserve memory allocation state;
+    this code should not leak memory;
   }
-  if (!heap_checker.SameHeap()) assert(NULL == "heap memory leak");
-</pre>
-
-<p>The various flavors of these functions -- <code>SameHeap()</code>,
-<code>QuickSameHeap()</code>, <code>BriefSameHeap()</code> -- trade
-off running time for accuracy: the faster routines might miss some
-legitimate leaks.
For instance, the briefest tests might be confused
-by code like this:</p>
-<pre>
-  void LeakTwentyBytes() {
-    char* a = malloc(20);
-    HeapLeakChecker heap_checker("test_malloc");
-    char* b = malloc(20);
-    free(a);
-    // This will pass: it totes up 20 bytes allocated and 20 bytes freed
-    assert(heap_checker.BriefNoLeaks());  // doesn't detect that b is leaked
-  }
+  if (!heap_checker.NoLeaks()) assert(NULL == "heap memory leak");
 </pre>

-<p>(This is because <code>BriefSameHeap()</code> does not use <A
-HREF="#pprof">pprof</A>, which is slower but is better able to track
-allocations in tricky situations like the above.)</p>
-
 <p>Note that adding in the <code>HeapLeakChecker</code> object merely
 instruments the code for leak-checking.  To actually turn on this
 leak-checking on a particular run of the executable, you must still
@@ -300,28 +274,12 @@ checking.</p>

 <table frame=box rules=sides cellpadding=5 width=100%>

 <tr valign=top>
-  <td><code>HEAP_CHECK_REPORT</code></td>
-  <td>Default: true</td>
-  <td>
-    If true, use <code>pprof</code> to report more info about found leaks.
-  </td>
-</tr>
-
-<tr valign=top>
-  <td><code>HEAP_CHECK_STRICT_CHECK</code></td>
-  <td>Default: true</td>
-  <td>
-    If true, do the program-end check via <code>SameHeap()</code>;
-    if false, use <code>NoLeaks()</code>.
-  </td>
-</tr>
-
-<tr valign=top>
   <td><code>HEAP_CHECK_MAX_LEAKS</code></td>
   <td>Default: 20</td>
   <td>
-    The maximum number of leaks to be reported.  If negative or zero, print all
-    the leaks found.
+    The maximum number of leaks to be printed to stderr (all leaks are still
+    emitted to file output for pprof to visualize).  If negative or zero,
+    print all the leaks found.
   </td>
 </tr>

@@ -449,16 +407,15 @@ and then look closely at the generated leak report messages.

 <p>When a <code>HeapLeakChecker</code> object is constructed, it dumps
 a memory-usage profile named
 <code><prefix>.<name>-beg.heap</code> to a temporary
-directory.  When <code>*NoLeaks()</code> or <code>*SameHeap()</code>
+directory.  When <code>NoLeaks()</code>
 is called (for whole-program checking, this happens automatically at
 program-exit), it dumps another profile, named
 <code><prefix>.<name>-end.heap</code>.
 (<code><prefix></code> is typically determined automatically, and
 <code><name></code> is typically <code>argv[0]</code>.)  It
-then compares the two profiles.  If the second profile shows more
-memory use than the first (or, for <code>*SameHeap()</code> calls,
-any different pattern of memory use than the first), the
-<code>*NoLeaks()</code> or <code>*SameHeap()</code> function will
+then compares the two profiles.  If the second profile shows
+more memory use than the first, the
+<code>NoLeaks()</code> function will
 return false.  For "whole program" profiling, this will cause the
 executable to abort (via <code>exit(1)</code>).  In all cases, it will
 print a message on how to process the dumped profiles to locate
@@ -520,94 +477,20 @@ of explicit clean up code and other hassles when dealing with thread
 data.</p>

-<h3><A NAME="pprof">More Exact Checking via pprof</A></h3>
+<h3>Visualizing Leaks with <code>pprof</code></h3>

-<p>The perftools profiling tool, <code>pprof</code>, is primarily
-intended for users to use interactively in order to explore heap and
-CPU usage.  However, the heap-checker can -- and, by default, does --
-call <code>pprof</code> internally, in order to improve its leak
-checking.</p>
-
-<p>In particular, the heap-checker calls <code>pprof</code> to utilize
-the full call-path for all allocations.  <code>pprof</code> uses this
-data to disambiguate allocations.  When the time comes to do a
-<code>SameHeap</code> or <code>NoLeaks</code> check, the heap-checker
-asks <code>pprof</code> to do this check on an
-allocation-by-allocation basis, rather than just by comparing global
-counts.</p>
-
-<p>Here's an example.
Consider the following function:</p>
-<pre>
-  void LeakTwentyBytes() {
-    char* a = malloc(20);
-    HeapLeakChecker heap_checker("test_malloc");
-    char* b = malloc(20);
-    free(a);
-    heap_checker.NoLeaks();
-  }
-</pre>
-
-<p>Without using pprof, the only thing we will do is count up the
-number of allocations and frees inside the leak-checked interval.
-Twenty bytes allocated, twenty bytes freed, and the code looks ok.</p>
-
-<p>With pprof, however, we can track the call-path for each
-allocation, and account for them separately.  In the example function
-above, there are two call-paths that end in an allocation, one that
-ends in "LeakTwentyBytes:line1" and one that ends in
-"LeakTwentyBytes:line3".</p>
-
-<p>Here's how the heap-checker works when it can use pprof in this
-way:</p>
-<ol>
-  <li> <b>Line 1:</b> Allocate 20 bytes, mark <code>a</code> as having
-       call-path "LeakTwentyBytes:line1", and update the count-map
-       <pre>count["LeakTwentyByte:line1"] += 20;</pre>
-  <li> <b>Line 2:</b> Dump the current <code>count</code> map to a file.
-  <li> <b>Line 3:</b> Allocate 20 bytes, mark <code>b</code> as having
-       call-path "LeakTwentyBytes:line3", and update the count-map:
-       <pre>count["LeakTwentyByte:line3"] += 20;</pre>
-  <li> <b>Line 4:</b> Look up <code>a</code> to find its call-path
-       (stored in line 1), and use that to update the count-map:
-       <pre>count["LeakTwentyByte:line1"] -= 20;</pre>
-  <li> <b>Line 5:</b> Look at each bucket in the current count-map,
-       minus what was dumped in line 2.  Here's the diffs we'll have
-       in each bucket:
-       <pre>
-count["LeakTwentyByte:line1"] == -20;
-count["LeakTwentyByte:line3"] == 20;
-       </pre>
-       Since <i>at least one</i> bucket has a positive number, we
-       complain of a leak.  (Note if line 5 had been
-       <code>SameHeap</code> instead of <code>NoLeaks</code>, we would
-       have complained if any bucket had had a <i>non-zero</i>
-       number.)
-</ol>
-
-<p>Note that one way to visualize the non-<code>pprof</code> mode is
-that we do the same thing as above, but always use "unknown" as the
-call-path.  That is, our count-map always only has one entry in it:
-<code>count["unknown"]</code>.  Looking at the example above shows how
-having only one entry in the map can lead to incorrect results.</p>
-
-<p>Here is when <code>pprof</code> is used by the heap-checker:</p>
-<ul>
-  <li> <code>NoLeaks()</code> and <code>SameHeap()</code> both use
-       <code>pprof</code>.
-  <li> <code>BriefNoLeaks()</code> and <code>BriefSameHeap()</code> do
-       not use <code>pprof</code>.
-  <li> <code>QuickNoLeaks</code> and <code>QuickSameHeap()</code> are
-       a kind of compromise: they do <i>not</i> use pprof for their
-       leak check, but if that check happens to find a leak anyway,
-       then they re-do the leak calculation using <code>pprof</code>.
-       This means they do not always find leaks, but when they do,
-       they will be as accurate as possible in their leak report.
-</ul>
+<p>
+The heap checker automatically prints basic leak info with stack traces of
+leaked objects' allocation sites, as well as a pprof command line that can be
+used to visualize the call-graph involved in these allocations.
+The latter can be much more useful for a human
+to see where/why the leaks happened, especially if the leaks are numerous.
+</p>

 <h3>Leak-checking and Threads</h3>

 <p>At the time of HeapLeakChecker's construction and during
-<code>*NoLeaks()</code>/<code>*SameHeap()</code> calls, we grab a lock
+<code>NoLeaks()</code> calls, we grab a lock
 and then pause all other threads so other threads do not interfere
 with recording or analyzing the state of the heap.</p>

@@ -635,7 +518,8 @@ depending on how the compiled code works with the stack:</p>

   int* foo = new int [20];
   HeapLeakChecker check("a_check");
   foo = NULL;
-  CHECK(check.NoLeaks());  // this might succeed
+  // May fail to trigger.
+  if (!check.NoLeaks()) assert(NULL == "heap memory leak");
 </pre>

diff --git a/m4/pc_from_ucontext.m4 b/m4/pc_from_ucontext.m4
index 19ec347..dee73a1 100644
--- a/m4/pc_from_ucontext.m4
+++ b/m4/pc_from_ucontext.m4
@@ -27,7 +27,7 @@ AC_DEFUN([AC_PC_FROM_UCONTEXT],
   pc_fields="$pc_fields uc_mcontext.sc_ip"          # Linux (ia64)
   pc_fields="$pc_fields uc_mcontext.uc_regs->gregs[[PT_NIP]]"  # Linux (ppc)
   pc_fields="$pc_fields uc_mcontext.gregs[[R15]]"   # Linux (arm old [untested])
-  pc_fields="$pc_fields uc_mcontext.arm_pc"         # Linux (arm new [untested])
+  pc_fields="$pc_fields uc_mcontext.arm_pc"         # Linux (arm arch 5)
   pc_fields="$pc_fields uc_mcontext.gp_regs[[PT_NIP]]"  # Suse SLES 11 (ppc64)
   pc_fields="$pc_fields uc_mcontext.mc_eip"         # FreeBSD (i386)
   pc_fields="$pc_fields uc_mcontext.mc_rip"         # FreeBSD (x86_64 [untested])

diff --git a/packages/deb/changelog b/packages/deb/changelog
index 933795e..579ebf0 100644
--- a/packages/deb/changelog
+++ b/packages/deb/changelog
@@ -1,3 +1,9 @@
+google-perftools (1.6-1) unstable; urgency=low
+
+  * New upstream release.
+
+ -- Google Inc. <opensource@google.com>  Thu, 05 Aug 2010 12:48:03 -0700
+
 google-perftools (1.5-1) unstable; urgency=low

   * New upstream release.

diff --git a/src/base/atomicops.h b/src/base/atomicops.h
index 0f3d3ef..ec60489 100644
--- a/src/base/atomicops.h
+++ b/src/base/atomicops.h
@@ -89,6 +89,8 @@
 // TODO(csilvers): match piii, not just __i386.
Also, match k8 #if defined(__MACH__) && defined(__APPLE__) #include "base/atomicops-internals-macosx.h" +#elif defined(__GNUC__) && defined(__ARM_ARCH_5T__) +#include "base/atomicops-internals-arm-gcc.h" #elif defined(_MSC_VER) && defined(_M_IX86) #include "base/atomicops-internals-x86-msvc.h" #elif defined(__MINGW32__) && defined(__i386__) diff --git a/src/base/cycleclock.h b/src/base/cycleclock.h index 8af664e..b114170 100644 --- a/src/base/cycleclock.h +++ b/src/base/cycleclock.h @@ -48,6 +48,8 @@ #include "base/basictypes.h" // make sure we get the def for int64 #if defined(__MACH__) && defined(__APPLE__) #include <mach/mach_time.h> +#elif defined(__ARM_ARCH_5T__) +#include <sys/time.h> #endif // NOTE: only i386 and x86_64 have been well tested. @@ -71,8 +73,7 @@ struct CycleClock { return mach_absolute_time(); #elif defined(__i386__) int64 ret; - __asm__ volatile ("rdtsc" - : "=A" (ret) ); + __asm__ volatile ("rdtsc" : "=A" (ret) ); return ret; #elif defined(__x86_64__) || defined(__amd64__) uint64 low, high; @@ -82,11 +83,15 @@ struct CycleClock { // This returns a time-base, which is not always precisely a cycle-count. 
int64 tbl, tbu0, tbu1; asm("mftbu %0" : "=r" (tbu0)); - asm("mftb %0" : "=r" (tbl )); + asm("mftb %0" : "=r" (tbl)); asm("mftbu %0" : "=r" (tbu1)); tbl &= -static_cast<int64>(tbu0 == tbu1); // high 32 bits in tbu1; low 32 bits in tbl (tbu0 is garbage) return (tbu1 << 32) | tbl; +#elif defined(__ARM_ARCH_5T__) + struct timeval tv; + gettimeofday(&tv, NULL); + return static_cast<uint64>(tv.tv_sec) * 1000000 + tv.tv_usec; #elif defined(__sparc__) int64 tick; asm(".byte 0x83, 0x41, 0x00, 0x00"); diff --git a/src/base/dynamic_annotations.c b/src/base/dynamic_annotations.c index bddd693..ec37318 100644 --- a/src/base/dynamic_annotations.c +++ b/src/base/dynamic_annotations.c @@ -139,24 +139,24 @@ static int GetRunningOnValgrind(void) { /* See the comments in dynamic_annotations.h */ int RunningOnValgrind(void) { static volatile int running_on_valgrind = -1; + int local_running_on_valgrind = running_on_valgrind; /* C doesn't have thread-safe initialization of statics, and we don't want to depend on pthread_once here, so hack it. */ ANNOTATE_BENIGN_RACE(&running_on_valgrind, "safe hack"); - int local_running_on_valgrind = running_on_valgrind; if (local_running_on_valgrind == -1) running_on_valgrind = local_running_on_valgrind = GetRunningOnValgrind(); return local_running_on_valgrind; } /* See the comments in dynamic_annotations.h */ -double ValgrindSlowdown() { - if (RunningOnValgrind() == 0) { - return 1.0; - } +double ValgrindSlowdown(void) { /* Same initialization hack as in RunningOnValgrind(). */ static volatile double slowdown = 0.0; + double local_slowdown = slowdown; ANNOTATE_BENIGN_RACE(&slowdown, "safe hack"); - int local_slowdown = slowdown; + if (RunningOnValgrind() == 0) { + return 1.0; + } if (local_slowdown == 0.0) { char *env = getenv("VALGRIND_SLOWDOWN"); slowdown = local_slowdown = env ? 
atof(env) : 50.0; diff --git a/src/base/dynamic_annotations.h b/src/base/dynamic_annotations.h index ceb9809..10642fd 100644 --- a/src/base/dynamic_annotations.h +++ b/src/base/dynamic_annotations.h @@ -468,7 +468,7 @@ int RunningOnValgrind(void); SleepForSeconds(5 * ValgrindSlowdown()); } */ -double ValgrindSlowdown(); +double ValgrindSlowdown(void); #ifdef __cplusplus } diff --git a/src/base/sysinfo.cc b/src/base/sysinfo.cc index 7af0495..7cfa051 100644 --- a/src/base/sysinfo.cc +++ b/src/base/sysinfo.cc @@ -56,6 +56,7 @@ #endif #include "base/sysinfo.h" #include "base/commandlineflags.h" +#include "base/dynamic_annotations.h" // for RunningOnValgrind #include "base/logging.h" #include "base/cycleclock.h" @@ -240,9 +241,15 @@ static void InitializeSystemInfo() { if (already_called) return; already_called = true; - // I put in a never-called reference to EstimateCyclesPerSecond() here - // to silence the compiler for OS's that don't need it - if (0) EstimateCyclesPerSecond(0); + bool saw_mhz = false; + + if (RunningOnValgrind()) { + // Valgrind may slow the progress of time artificially (--scale-time=N + // option). We thus can't rely on CPU Mhz info stored in /sys or /proc + // files. Thus, actually measure the cps. + cpuinfo_cycles_per_second = EstimateCyclesPerSecond(100); + saw_mhz = true; + } #if defined(__linux__) || defined(__CYGWIN__) || defined(__CYGWIN32__) char line[1024]; @@ -250,21 +257,23 @@ static void InitializeSystemInfo() { // If CPU scaling is in effect, we want to use the *maximum* frequency, // not whatever CPU speed some random processor happens to be using now. - bool saw_mhz = false; - const char* pname0 = "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq"; - int fd0 = open(pname0, O_RDONLY); - if (fd0 != -1) { - memset(line, '\0', sizeof(line)); - read(fd0, line, sizeof(line)); - const int max_freq = strtol(line, &err, 10); - if (line[0] != '\0' && (*err == '\n' || *err == '\0')) { - // The value is in kHz. 
For example, on a 2GHz machine, the file - // contains the value "2000000". Historically this file contained no - // newline, but at some point the kernel started appending a newline. - cpuinfo_cycles_per_second = max_freq * 1000.0; - saw_mhz = true; + if (!saw_mhz) { + const char* pname0 = + "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq"; + int fd0 = open(pname0, O_RDONLY); + if (fd0 != -1) { + memset(line, '\0', sizeof(line)); + read(fd0, line, sizeof(line)); + const int max_freq = strtol(line, &err, 10); + if (line[0] != '\0' && (*err == '\n' || *err == '\0')) { + // The value is in kHz. For example, on a 2GHz machine, the file + // contains the value "2000000". Historically this file contained no + // newline, but at some point the kernel started appending a newline. + cpuinfo_cycles_per_second = max_freq * 1000.0; + saw_mhz = true; + } + close(fd0); } - close(fd0); } // Read /proc/cpuinfo for other values, and if there is no cpuinfo_max_freq. @@ -272,7 +281,9 @@ static void InitializeSystemInfo() { int fd = open(pname, O_RDONLY); if (fd == -1) { perror(pname); - cpuinfo_cycles_per_second = EstimateCyclesPerSecond(1000); + if (!saw_mhz) { + cpuinfo_cycles_per_second = EstimateCyclesPerSecond(1000); + } return; // TODO: use generic tester instead? } diff --git a/src/base/thread_annotations.h b/src/base/thread_annotations.h index ded13d6..f1b3593 100644 --- a/src/base/thread_annotations.h +++ b/src/base/thread_annotations.h @@ -45,15 +45,21 @@ #ifndef BASE_THREAD_ANNOTATIONS_H_ #define BASE_THREAD_ANNOTATIONS_H_ + #if defined(__GNUC__) && defined(__SUPPORT_TS_ANNOTATION__) && (!defined(SWIG)) +#define THREAD_ANNOTATION_ATTRIBUTE__(x) __attribute__((x)) +#else +#define THREAD_ANNOTATION_ATTRIBUTE__(x) // no-op +#endif + // Document if a shared variable/field needs to be protected by a lock. 
 // GUARDED_BY allows the user to specify a particular lock that should be
 // held when accessing the annotated variable, while GUARDED_VAR only
 // indicates a shared variable should be guarded (by any lock). GUARDED_VAR
 // is primarily used when the client cannot express the name of the lock.
-#define GUARDED_BY(x)       __attribute__ ((guarded_by(x)))
-#define GUARDED_VAR         __attribute__ ((guarded))
+#define GUARDED_BY(x)       THREAD_ANNOTATION_ATTRIBUTE__(guarded_by(x))
+#define GUARDED_VAR         THREAD_ANNOTATION_ATTRIBUTE__(guarded)

 // Document if the memory location pointed to by a pointer should be guarded
 // by a lock when dereferencing the pointer. Similar to GUARDED_VAR,
@@ -63,90 +69,64 @@
 // q, which is guarded by mu1, points to a shared memory location that is
 // guarded by mu2, q should be annotated as follows:
 //     int *q GUARDED_BY(mu1) PT_GUARDED_BY(mu2);
-#define PT_GUARDED_BY(x) __attribute__ ((point_to_guarded_by(x)))
-#define PT_GUARDED_VAR __attribute__ ((point_to_guarded))
+#define PT_GUARDED_BY(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(point_to_guarded_by(x))
+#define PT_GUARDED_VAR \
+  THREAD_ANNOTATION_ATTRIBUTE__(point_to_guarded)

 // Document the acquisition order between locks that can be held
 // simultaneously by a thread. For any two locks that need to be annotated
 // to establish an acquisition order, only one of them needs the annotation.
 // (i.e. You don't have to annotate both locks with both ACQUIRED_AFTER
 // and ACQUIRED_BEFORE.)
-#define ACQUIRED_AFTER(...) __attribute__ ((acquired_after(__VA_ARGS__)))
-#define ACQUIRED_BEFORE(...) __attribute__ ((acquired_before(__VA_ARGS__)))
+#define ACQUIRED_AFTER(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(acquired_after(x))
+#define ACQUIRED_BEFORE(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(acquired_before(x))

 // The following three annotations document the lock requirements for
 // functions/methods.

 // Document if a function expects certain locks to be held before it is called
-#define EXCLUSIVE_LOCKS_REQUIRED(...) \
-  __attribute__ ((exclusive_locks_required(__VA_ARGS__)))
+#define EXCLUSIVE_LOCKS_REQUIRED(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_locks_required(x))

-#define SHARED_LOCKS_REQUIRED(...) \
-  __attribute__ ((shared_locks_required(__VA_ARGS__)))
+#define SHARED_LOCKS_REQUIRED(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(shared_locks_required(x))

 // Document the locks acquired in the body of the function. These locks
 // cannot be held when calling this function (as google3's Mutex locks are
 // non-reentrant).
-#define LOCKS_EXCLUDED(...) __attribute__ ((locks_excluded(__VA_ARGS__)))
+#define LOCKS_EXCLUDED(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(locks_excluded(x))

 // Document the lock the annotated function returns without acquiring it.
-#define LOCK_RETURNED(x) __attribute__ ((lock_returned(x)))
+#define LOCK_RETURNED(x) THREAD_ANNOTATION_ATTRIBUTE__(lock_returned(x))

 // Document if a class/type is a lockable type (such as the Mutex class).
-#define LOCKABLE __attribute__ ((lockable))
+#define LOCKABLE THREAD_ANNOTATION_ATTRIBUTE__(lockable)

 // Document if a class is a scoped lockable type (such as the MutexLock class).
-#define SCOPED_LOCKABLE __attribute__ ((scoped_lockable))
+#define SCOPED_LOCKABLE THREAD_ANNOTATION_ATTRIBUTE__(scoped_lockable)

 // The following annotations specify lock and unlock primitives.
-#define EXCLUSIVE_LOCK_FUNCTION(...) \
-  __attribute__ ((exclusive_lock(__VA_ARGS__)))
+#define EXCLUSIVE_LOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_lock(x))

-#define SHARED_LOCK_FUNCTION(...) \
-  __attribute__ ((shared_lock(__VA_ARGS__)))
+#define SHARED_LOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(shared_lock(x))

-#define EXCLUSIVE_TRYLOCK_FUNCTION(...) \
-  __attribute__ ((exclusive_trylock(__VA_ARGS__)))
+#define EXCLUSIVE_TRYLOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_trylock(x))

-#define SHARED_TRYLOCK_FUNCTION(...) \
-  __attribute__ ((shared_trylock(__VA_ARGS__)))
+#define SHARED_TRYLOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(shared_trylock(x))

-#define UNLOCK_FUNCTION(...) __attribute__ ((unlock(__VA_ARGS__)))
+#define UNLOCK_FUNCTION(x) \
+  THREAD_ANNOTATION_ATTRIBUTE__(unlock(x))

 // An escape hatch for thread safety analysis to ignore the annotated function.
-#define NO_THREAD_SAFETY_ANALYSIS __attribute__ ((no_thread_safety_analysis))
-
-
-#else
-
-// When the compiler is not GCC, these annotations are simply no-ops.
-
-// NOTE: in theory, the macros that take "arg" below *could* take
-// multiple arguments, but in practice so far they only take one.
-// Since not all non-gcc compilers support ... -- notably MSVC 7.1 --
-// I just hard-code in a single arg.  If this assumption ever breaks,
-// we can change it back to "...", or handle it some other way.
-
-#define GUARDED_BY(x)            // no-op
-#define GUARDED_VAR              // no-op
-#define PT_GUARDED_BY(x)         // no-op
-#define PT_GUARDED_VAR           // no-op
-#define ACQUIRED_AFTER(arg)      // no-op
-#define ACQUIRED_BEFORE(arg)     // no-op
-#define EXCLUSIVE_LOCKS_REQUIRED(arg)    // no-op
-#define SHARED_LOCKS_REQUIRED(arg)       // no-op
-#define LOCKS_EXCLUDED(arg)      // no-op
-#define LOCK_RETURNED(x)         // no-op
-#define LOCKABLE                 // no-op
-#define SCOPED_LOCKABLE          // no-op
-#define EXCLUSIVE_LOCK_FUNCTION(arg)     // no-op
-#define SHARED_LOCK_FUNCTION(arg)        // no-op
-#define EXCLUSIVE_TRYLOCK_FUNCTION(arg)  // no-op
-#define SHARED_TRYLOCK_FUNCTION(arg)     // no-op
-#define UNLOCK_FUNCTION(arg)     // no-op
-#define NO_THREAD_SAFETY_ANALYSIS        // no-op
-
-#endif  // defined(__GNUC__) && defined(__SUPPORT_TS_ANNOTATION__)
-        //     && !defined(SWIG)
+#define NO_THREAD_SAFETY_ANALYSIS \
+  THREAD_ANNOTATION_ATTRIBUTE__(no_thread_safety_analysis)

 #endif  // BASE_THREAD_ANNOTATIONS_H_
diff --git a/src/base/vdso_support.h b/src/base/vdso_support.h
index c47b3c5..86c4527 100644
--- a/src/base/vdso_support.h
+++ b/src/base/vdso_support.h
@@ -64,7 +64,7 @@ class VDSOSupport {
   // Supports iteration over all dynamic symbols.
   class SymbolIterator {
    public:
-    friend struct VDSOSupport;
+    friend class VDSOSupport;
     const SymbolInfo *operator->() const;
     const SymbolInfo &operator*() const;
     SymbolIterator& operator++();
diff --git a/src/common.cc b/src/common.cc
index 04723b1..4b84f18 100644
--- a/src/common.cc
+++ b/src/common.cc
@@ -53,6 +53,24 @@ static inline int LgFloor(size_t n) {
   return log;
 }

+int AlignmentForSize(size_t size) {
+  int alignment = kAlignment;
+  if (size >= 2048) {
+    // Cap alignment at 256 for large sizes.
+    alignment = 256;
+  } else if (size >= 128) {
+    // Space wasted due to alignment is at most 1/8, i.e., 12.5%.
+    alignment = (1 << LgFloor(size)) / 8;
+  } else if (size >= 16) {
+    // We need an alignment of at least 16 bytes to satisfy
+    // requirements for some SSE types.
+    alignment = 16;
+  }
+  CHECK_CONDITION(size < 16 || alignment >= 16);
+  CHECK_CONDITION((alignment & (alignment - 1)) == 0);
+  return alignment;
+}
+
 int SizeMap::NumMoveSize(size_t size) {
   if (size == 0) return 0;
   // Use approx 64k transfers between thread and central caches.
@@ -93,19 +111,7 @@ void SizeMap::Init() {
     int lg = LgFloor(size);
     if (lg > last_lg) {
       // Increase alignment every so often to reduce number of size classes.
-      if (size >= 2048) {
-        // Cap alignment at 256 for large sizes
-        alignment = 256;
-      } else if (size >= 128) {
-        // Space wasted due to alignment is at most 1/8, i.e., 12.5%.
-        alignment = size / 8;
-      } else if (size >= 16) {
-        // We need an alignment of at least 16 bytes to satisfy
-        // requirements for some SSE types.
-        alignment = 16;
-      }
-      CHECK_CONDITION(size < 16 || alignment >= 16);
-      CHECK_CONDITION((alignment & (alignment - 1)) == 0);
+      alignment = AlignmentForSize(size);
       last_lg = lg;
     }
     CHECK_CONDITION((size % alignment) == 0);
diff --git a/src/common.h b/src/common.h
index 5226998..e2906d6 100644
--- a/src/common.h
+++ b/src/common.h
@@ -111,6 +111,10 @@ inline Length pages(size_t bytes) {
          ((bytes & (kPageSize - 1)) > 0 ? 1 : 0);
 }

+// For larger allocation sizes, we use larger memory alignments to
+// reduce the number of size classes.
+int AlignmentForSize(size_t size);
+
 // Size-class information + mapping
 class SizeMap {
  private:
diff --git a/src/google/malloc_extension.h b/src/google/malloc_extension.h
index 9c05897..3fbefc9 100644
--- a/src/google/malloc_extension.h
+++ b/src/google/malloc_extension.h
@@ -219,6 +219,7 @@ class PERFTOOLS_DLL_DECL MallocExtension {
   // SIZE bytes may reserve more bytes, but will never reserve less.
   // (Currently only implemented in tcmalloc, other implementations
   // always return SIZE.)
+  // This is equivalent to malloc_good_size() in OS X.
   virtual size_t GetEstimatedAllocatedSize(size_t size);

   // Returns the actual number N of bytes reserved by tcmalloc for the
@@ -232,6 +233,8 @@ class PERFTOOLS_DLL_DECL MallocExtension {
   // from that -- and should not have been freed yet.  p may be NULL.
   // (Currently only implemented in tcmalloc; other implementations
   // will return 0.)
+  // This is equivalent to malloc_size() in OS X, malloc_usable_size()
+  // in glibc, and _msize() for windows.
   virtual size_t GetAllocatedSize(void* p);

   // The current malloc implementation.  Always non-NULL.
diff --git a/src/google/tcmalloc.h.in b/src/google/tcmalloc.h.in
index fbb70ab..cdaaaa0 100644
--- a/src/google/tcmalloc.h.in
+++ b/src/google/tcmalloc.h.in
@@ -89,6 +89,13 @@ extern "C" {
   PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW;
 #endif

+  // This is an alias for MallocExtension::instance()->GetAllocatedSize().
+  // It is equivalent to
+  //    OS X: malloc_size()
+  //    glibc: malloc_usable_size()
+  //  Windows: _msize()
+  size_t tc_malloc_size(void* ptr) __THROW;
+
 #ifdef __cplusplus
   PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) __THROW;
   PERFTOOLS_DLL_DECL void* tc_new(size_t size);
diff --git a/src/heap-checker.cc b/src/heap-checker.cc
index 2b0b854..c4f6da8 100644
--- a/src/heap-checker.cc
+++ b/src/heap-checker.cc
@@ -965,7 +965,8 @@
     // specially via self_thread_stack, not here:
     if (thread_pids[i] == self_thread_pid) continue;
     RAW_VLOG(11, "Handling thread with pid %d", thread_pids[i]);
-#if defined(HAVE_LINUX_PTRACE_H) && defined(HAVE_SYS_SYSCALL_H) && defined(DUMPER)
+#if (defined(__i386__) || defined(__x86_64)) && \
+    defined(HAVE_LINUX_PTRACE_H) && defined(HAVE_SYS_SYSCALL_H) && defined(DUMPER)
     i386_regs thread_regs;
 #define sys_ptrace(r, p, a, d) syscall(SYS_ptrace, (r), (p), (a), (d))
     // We use sys_ptrace to avoid thread locking
@@ -2091,12 +2092,11 @@ void HeapLeakChecker::CancelGlobalCheck() {
 // HeapLeakChecker global constructor/destructor ordering components
 //----------------------------------------------------------------------

-static bool in_initial_malloc_hook = false;
-
 #ifdef HAVE___ATTRIBUTE__   // we need __attribute__((weak)) for this to work
 #define INSTALLED_INITIAL_MALLOC_HOOKS

 void HeapLeakChecker_BeforeConstructors();  // below
+static bool in_initial_malloc_hook = false;

 // Helper for InitialMallocHook_* below
 static inline void InitHeapLeakCheckerFromMallocHook() {
@@ -2115,20 +2115,20 @@
 // These will overwrite the weak definitions in malloc_hook.cc:

 // Important to have this to catch the first allocation call from the binary:
-extern void InitialMallocHook_New(const void* ptr, size_t size) {
+extern "C" void InitialMallocHook_New(const void* ptr, size_t size) {
   InitHeapLeakCheckerFromMallocHook();
   // record this first allocation as well (if we need to):
   MallocHook::InvokeNewHook(ptr, size);
 }

 // Important to have this to catch the first mmap call (say from tcmalloc):
-extern void InitialMallocHook_MMap(const void* result,
-                                   const void* start,
-                                   size_t size,
-                                   int protection,
-                                   int flags,
-                                   int fd,
-                                   off_t offset) {
+extern "C" void InitialMallocHook_MMap(const void* result,
+                                       const void* start,
+                                       size_t size,
+                                       int protection,
+                                       int flags,
+                                       int fd,
+                                       off_t offset) {
   InitHeapLeakCheckerFromMallocHook();
   // record this first mmap as well (if we need to):
   MallocHook::InvokeMmapHook(
@@ -2136,7 +2136,8 @@
 }

 // Important to have this to catch the first sbrk call (say from tcmalloc):
-extern void InitialMallocHook_Sbrk(const void* result, ptrdiff_t increment) {
+extern "C" void InitialMallocHook_Sbrk(const void* result,
+                                       ptrdiff_t increment) {
   InitHeapLeakCheckerFromMallocHook();
   // record this first sbrk as well (if we need to):
   MallocHook::InvokeSbrkHook(result, increment);
diff --git a/src/heap-profiler.cc b/src/heap-profiler.cc
index 3055f4c..f28dffb 100644
--- a/src/heap-profiler.cc
+++ b/src/heap-profiler.cc
@@ -210,6 +210,7 @@ static char* DoGetHeapProfileLocked(char* buf, int buflen) {
   int bytes_written = 0;
   if (is_on) {
     HeapProfileTable::Stats const stats = heap_profile->total();
+    (void)stats;   // avoid an unused-variable warning in non-debug mode.
     AddRemoveMMapDataLocked(ADD);
     bytes_written = heap_profile->FillOrderedProfile(buf, buflen - 1);
     // FillOrderedProfile should not reduce the set of active mmap-ed regions,
diff --git a/src/malloc_hook-inl.h b/src/malloc_hook-inl.h
index a629691..a690b07 100644
--- a/src/malloc_hook-inl.h
+++ b/src/malloc_hook-inl.h
@@ -70,8 +70,17 @@ class AtomicPtr {
   // Sets the contained value to new_val and returns the old value,
   // atomically, with acquire and release semantics.
+  // This is a full-barrier instruction.
   PtrT Exchange(PtrT new_val);

+  // Atomically executes:
+  //   result = data_
+  //   if (data_ == old_val)
+  //     data_ = new_val;
+  //   return result;
+  // This is a full-barrier instruction.
+  PtrT CompareAndSwap(PtrT old_val, PtrT new_val);
+
   // Not private so that the class is an aggregate and can be
   // initialized by the linker. Don't access this directly.
   AtomicWord data_;
diff --git a/src/malloc_hook.cc b/src/malloc_hook.cc
index 4315b86..e823a44 100644
--- a/src/malloc_hook.cc
+++ b/src/malloc_hook.cc
@@ -66,8 +66,10 @@
 using std::copy;

-// Declarations of three default weak hook functions, that can be overridden by
-// linking-in a strong definition (as heap-checker.cc does)
+// Declarations of five default weak hook functions, that can be overridden by
+// linking-in a strong definition (as heap-checker.cc does).  These are extern
+// "C" so that they don't trigger gold's --detect-odr-violations warning, which
+// only looks at C++ symbols.
 //
 // These default hooks let some other library we link in
 // to define strong versions of InitialMallocHook_New, InitialMallocHook_MMap,
@@ -81,31 +83,35 @@ using std::copy;
 // weak symbols too early, at compile rather than link time.  By declaring it
 // (weak) here, then defining it below after its use, we can avoid the problem.
 //
+extern "C" {
+
 ATTRIBUTE_WEAK
-extern void InitialMallocHook_New(const void* ptr, size_t size);
+void InitialMallocHook_New(const void* ptr, size_t size);

 ATTRIBUTE_WEAK
-extern void InitialMallocHook_PreMMap(const void* start,
-                                      size_t size,
-                                      int protection,
-                                      int flags,
-                                      int fd,
-                                      off_t offset);
+void InitialMallocHook_PreMMap(const void* start,
+                               size_t size,
+                               int protection,
+                               int flags,
+                               int fd,
+                               off_t offset);

 ATTRIBUTE_WEAK
-extern void InitialMallocHook_MMap(const void* result,
-                                   const void* start,
-                                   size_t size,
-                                   int protection,
-                                   int flags,
-                                   int fd,
-                                   off_t offset);
+void InitialMallocHook_MMap(const void* result,
+                            const void* start,
+                            size_t size,
+                            int protection,
+                            int flags,
+                            int fd,
+                            off_t offset);

 ATTRIBUTE_WEAK
-extern void InitialMallocHook_PreSbrk(ptrdiff_t increment);
+void InitialMallocHook_PreSbrk(ptrdiff_t increment);

 ATTRIBUTE_WEAK
-extern void InitialMallocHook_Sbrk(const void* result, ptrdiff_t increment);
+void InitialMallocHook_Sbrk(const void* result, ptrdiff_t increment);
+
+}  // extern "C"

 namespace base { namespace internal {
 template<typename PtrT>
@@ -123,6 +129,18 @@ PtrT AtomicPtr<PtrT>::Exchange(PtrT new_val) {
   return old_val;
 }

+template<typename PtrT>
+PtrT AtomicPtr<PtrT>::CompareAndSwap(PtrT old_val, PtrT new_val) {
+  base::subtle::MemoryBarrier();  // Release semantics.
+  PtrT retval = reinterpret_cast<PtrT>(static_cast<AtomicWord>(
+      base::subtle::NoBarrier_CompareAndSwap(
+          &data_,
+          reinterpret_cast<AtomicWord>(old_val),
+          reinterpret_cast<AtomicWord>(new_val))));
+  base::subtle::MemoryBarrier();  // And acquire semantics.
+  return retval;
+}
+
 AtomicPtr<MallocHook::NewHook> new_hook_ = {
   reinterpret_cast<AtomicWord>(InitialMallocHook_New) };
 AtomicPtr<MallocHook::DeleteHook> delete_hook_ = { 0 };
@@ -215,8 +233,8 @@ MallocHook_SbrkHook MallocHook_SetSbrkHook(MallocHook_SbrkHook hook) {

 // TODO(csilvers): add support for removing a hook from the middle of a chain.
 void InitialMallocHook_New(const void* ptr, size_t size) {
-  if (MallocHook::GetNewHook() == &InitialMallocHook_New)
-    MallocHook::SetNewHook(NULL);
+  // Set new_hook to NULL iff its previous value was InitialMallocHook_New
+  new_hook_.CompareAndSwap(&InitialMallocHook_New, NULL);
 }

 void InitialMallocHook_PreMMap(const void* start,
@@ -225,8 +243,7 @@
                                int flags,
                                int fd,
                                off_t offset) {
-  if (MallocHook::GetPreMmapHook() == &InitialMallocHook_PreMMap)
-    MallocHook::SetPreMmapHook(NULL);
+  premmap_hook_.CompareAndSwap(&InitialMallocHook_PreMMap, NULL);
 }

 void InitialMallocHook_MMap(const void* result,
@@ -236,18 +253,15 @@
                             int flags,
                             int fd,
                             off_t offset) {
-  if (MallocHook::GetMmapHook() == &InitialMallocHook_MMap)
-    MallocHook::SetMmapHook(NULL);
+  mmap_hook_.CompareAndSwap(&InitialMallocHook_MMap, NULL);
 }

 void InitialMallocHook_PreSbrk(ptrdiff_t increment) {
-  if (MallocHook::GetPreSbrkHook() == &InitialMallocHook_PreSbrk)
-    MallocHook::SetPreSbrkHook(NULL);
+  presbrk_hook_.CompareAndSwap(&InitialMallocHook_PreSbrk, NULL);
 }

 void InitialMallocHook_Sbrk(const void* result, ptrdiff_t increment) {
-  if (MallocHook::GetSbrkHook() == &InitialMallocHook_Sbrk)
-    MallocHook::SetSbrkHook(NULL);
+  sbrk_hook_.CompareAndSwap(&InitialMallocHook_Sbrk, NULL);
 }

 DEFINE_ATTRIBUTE_SECTION_VARS(google_malloc);
diff --git a/src/pprof b/src/pprof
@@ -594,6 +594,10 @@ sub Main() {
   } elsif ($main::use_symbol_page) {
     $symbols = FetchSymbols($pcs);
   } else {
+    # TODO(csilvers): $libs uses the /proc/self/maps data from profile1,
+    # which may differ from the data from subsequent profiles, especially
+    # if they were run on different machines.  Use appropriate libs for
+    # each pc somehow.
     $symbols = ExtractSymbols($libs, $pcs);
   }

@@ -3043,6 +3047,7 @@ BEGIN {
       stride => 512 * 1024,   # must be a multiple of bitsize/8
       slots => [],
      unpack_code => "",       # N for big-endian, V for little
+      perl_is_64bit => 1,     # matters if profile is 64-bit
     };
     bless $self, $class;
     # Let unittests adjust the stride
@@ -3066,17 +3071,15 @@
     }
     @$slots = unpack($self->{unpack_code} . "*", $str);
   } else {
-    # If we're a 64-bit profile, make sure we're a 64-bit-capable
+    # If we're a 64-bit profile, check if we're a 64-bit-capable
     # perl.  Otherwise, each slot will be represented as a float
     # instead of an int64, losing precision and making all the
-    # 64-bit addresses right.  We *could* try to handle this with
-    # software emulation of 64-bit ints, but that's added complexity
-    # for no clear benefit (yet).  We use 'Q' to test for 64-bit-ness;
-    # perl docs say it's only available on 64-bit perl systems.
+    # 64-bit addresses wrong.  We won't complain yet, but will
+    # later if we ever see a value that doesn't fit in 32 bits.
     my $has_q = 0;
     eval { $has_q = pack("Q", "1") ? 1 : 1; };
     if (!$has_q) {
-      ::error("$fname: need a 64-bit perl to process this 64-bit profile.\n");
+      $self->{perl_is_64bit} = 0;
     }
     read($self->{file}, $str, 8);
     if (substr($str, 4, 4) eq chr(0)x4) {
@@ -3112,11 +3115,17 @@
       # TODO(csilvers): if this is a 32-bit perl, the math below
       #    could end up in a too-large int, which perl will promote
       #    to a double, losing necessary precision.  Deal with that.
-      if ($self->{unpack_code} eq 'V') {    # little-endian
-        push(@b64_values, $b32_values[$i] + $b32_values[$i+1] * (2**32));
-      } else {
-        push(@b64_values, $b32_values[$i] * (2**32) + $b32_values[$i+1]);
-      }
+      #    Right now, we just die.
+      my ($lo, $hi) = ($b32_values[$i], $b32_values[$i+1]);
+      if ($self->{unpack_code} eq 'N') {    # big-endian
+        ($lo, $hi) = ($hi, $lo);
+      }
+      my $value = $lo + $hi * (2**32);
+      if (!$self->{perl_is_64bit} &&   # check value is exactly represented
+          (($value % (2**32)) != $lo || int($value / (2**32)) != $hi)) {
+        ::error("Need a 64-bit perl to process this 64-bit profile.\n");
+      }
+      push(@b64_values, $value);
     }
     @$slots = @b64_values;
   }
@@ -4341,7 +4350,7 @@ sub ConfigureTool {
   if ($tools =~ m/(,|^)\Q$tool\E:([^,]*)/) {
     $path = $2;
     # TODO(csilvers): sanity-check that $path exists?  Hard if it's relative.
-  } elsif ($tools) {
+  } elsif ($tools ne '') {
     foreach my $prefix (split(',', $tools)) {
       next if ($prefix =~ /:/);  # ignore "tool:fullpath" entries in the list
       if (-x $prefix . $tool) {
diff --git a/src/stacktrace_x86-inl.h b/src/stacktrace_x86-inl.h
index 6753fdb..a140ab6 100644
--- a/src/stacktrace_x86-inl.h
+++ b/src/stacktrace_x86-inl.h
@@ -297,7 +297,7 @@ int GET_STACK_TRACE_OR_FRAMES {
   //    sp[2]   first argument
   //    ...
   // NOTE: This will break under llvm, since result is a copy and not in sp[2]
-  sp = (void **)&pcs - 2;
+  sp = (void **)&result - 2;
 #elif defined(__x86_64__)
   unsigned long rbp;
   // Move the value of the register %rbp into the local variable rbp.
diff --git a/src/tcmalloc.cc b/src/tcmalloc.cc
index 13d2c23..93bdd1d 100644
--- a/src/tcmalloc.cc
+++ b/src/tcmalloc.cc
@@ -137,6 +137,7 @@
 #endif

 using std::max;
+using tcmalloc::AlignmentForSize;
 using tcmalloc::PageHeap;
 using tcmalloc::PageHeapAllocator;
 using tcmalloc::SizeMap;
@@ -212,7 +213,7 @@ extern "C" {
       ATTRIBUTE_SECTION(google_malloc);
   int tc_mallopt(int cmd, int value) __THROW
       ATTRIBUTE_SECTION(google_malloc);
-#ifdef HAVE_STRUCT_MALLINFO  // struct mallinfo isn't defined on freebsd
+#ifdef HAVE_STRUCT_MALLINFO
   struct mallinfo tc_mallinfo(void) __THROW
       ATTRIBUTE_SECTION(google_malloc);
 #endif
@@ -238,6 +239,15 @@ extern "C" {
       ATTRIBUTE_SECTION(google_malloc);
   void tc_deletearray_nothrow(void* ptr, const std::nothrow_t&) __THROW
       ATTRIBUTE_SECTION(google_malloc);
+
+  // Some non-standard extensions that we support.
+
+  // This is equivalent to
+  //    OS X: malloc_size()
+  //    glibc: malloc_usable_size()
+  //  Windows: _msize()
+  size_t tc_malloc_size(void* p) __THROW
+      ATTRIBUTE_SECTION(google_malloc);
 }  // extern "C"
 #endif  // #ifndef _WIN32
@@ -282,6 +292,8 @@ extern "C" {
 #ifdef HAVE_STRUCT_MALLINFO
   struct mallinfo mallinfo(void) __THROW ALIAS("tc_mallinfo");
 #endif
+  size_t malloc_size(void* p) __THROW ALIAS("tc_malloc_size");
+  size_t malloc_usable_size(void* p) __THROW ALIAS("tc_malloc_size");
 }  // extern "C"
 #else  // #if defined(__GNUC__) && !defined(__MACH__)
   // Portable wrappers
@@ -318,6 +330,8 @@ extern "C" {
 #ifdef HAVE_STRUCT_MALLINFO
   struct mallinfo mallinfo(void) __THROW { return tc_mallinfo(); }
 #endif
+  size_t malloc_size(void* p) __THROW { return tc_malloc_size(p); }
+  size_t malloc_usable_size(void* p) __THROW { return tc_malloc_size(p); }
 }  // extern "C"
 #endif  // #if defined(__GNUC__)
@@ -845,6 +859,8 @@ static void* DoSampledAllocation(size_t size) {
   return SpanToMallocResult(span);
 }

+namespace {
+
 // Copy of FLAGS_tcmalloc_large_alloc_report_threshold with
 // automatic increases factored in.
 static int64_t large_alloc_threshold =
@@ -868,8 +884,6 @@ static void ReportLargeAlloc(Length num_pages, void* result) {
   write(STDERR_FILENO, buffer, strlen(buffer));
 }

-namespace {
-
 inline void* cpp_alloc(size_t size, bool nothrow);
 inline void* do_malloc(size_t size);
@@ -1096,16 +1110,23 @@ inline void* do_realloc(void* old_ptr, size_t new_size) {

 // For use by exported routines below that want specific alignments
 //
-// Note: this code can be slow, and can significantly fragment memory.
-// The expectation is that memalign/posix_memalign/valloc/pvalloc will
-// not be invoked very often.  This requirement simplifies our
-// implementation and allows us to tune for expected allocation
-// patterns.
+// Note: this code can be slow for alignments > 16, and can
+// significantly fragment memory.  The expectation is that
+// memalign/posix_memalign/valloc/pvalloc will not be invoked very
+// often.  This requirement simplifies our implementation and allows
+// us to tune for expected allocation patterns.
 void* do_memalign(size_t align, size_t size) {
   ASSERT((align & (align - 1)) == 0);
   ASSERT(align > 0);
   if (size + align < size) return NULL;         // Overflow

+  // Fall back to malloc if we would already align this memory access properly.
+  if (align <= AlignmentForSize(size)) {
+    void* p = do_malloc(size);
+    ASSERT((reinterpret_cast<uintptr_t>(p) % align) == 0);
+    return p;
+  }
+
   if (Static::pageheap() == NULL) ThreadCache::InitModule();

   // Allocate at least one byte to avoid boundary conditions below
@@ -1178,7 +1199,7 @@ inline int do_mallopt(int cmd, int value) {
   return 1;     // Indicates error
 }

-#ifdef HAVE_STRUCT_MALLINFO  // mallinfo isn't defined on freebsd, for instance
+#ifdef HAVE_STRUCT_MALLINFO
 inline struct mallinfo do_mallinfo() {
   TCMallocStats stats;
   ExtractStats(&stats, NULL);
@@ -1204,7 +1225,7 @@ inline struct mallinfo do_mallinfo() {
   return info;
 }
-#endif  // #ifndef HAVE_STRUCT_MALLINFO
+#endif  // HAVE_STRUCT_MALLINFO

 static SpinLock set_new_handler_lock(SpinLock::LINKER_INITIALIZED);
@@ -1489,6 +1510,10 @@ extern "C" PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW {
 }
 #endif

+extern "C" PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW {
+  return GetSizeWithCallback(ptr, &InvalidGetAllocatedSize);
+}
+
 // This function behaves similarly to MSVC's _set_new_mode.
 // If flag is 0 (default), calls to malloc will behave normally.
 // If flag is 1, calls to malloc will behave like calls to new,
diff --git a/src/tests/sampler_test.cc b/src/tests/sampler_test.cc
index fca10ac..045cd02 100755
--- a/src/tests/sampler_test.cc
+++ b/src/tests/sampler_test.cc
@@ -647,6 +647,11 @@ TEST(Sample, size_of_class) {
   LOG(INFO) << "Size of Sampler object is: " << sizeof(sampler);
 }

+// Make sure sampling is enabled, or the tests won't work right.
+DECLARE_int64(tcmalloc_sample_parameter);
+
 int main(int argc, char **argv) {
+  if (FLAGS_tcmalloc_sample_parameter == 0)
+    FLAGS_tcmalloc_sample_parameter = 524288;
   return RUN_ALL_TESTS();
 }
diff --git a/src/tests/tcmalloc_unittest.cc b/src/tests/tcmalloc_unittest.cc
index 6b2ec26..522c0d9 100644
--- a/src/tests/tcmalloc_unittest.cc
+++ b/src/tests/tcmalloc_unittest.cc
@@ -126,6 +126,7 @@ using std::string;

 DECLARE_double(tcmalloc_release_rate);
 DECLARE_int32(max_free_queue_size);     // in debugallocation.cc
+DECLARE_int64(tcmalloc_sample_parameter);

 namespace testing {
@@ -559,6 +560,13 @@ static void TestCalloc(size_t n, size_t s, bool ok) {
 // direction doesn't cause us to allocate new memory.
 static void TestRealloc() {
 #ifndef DEBUGALLOCATION  // debug alloc doesn't try to minimize reallocs
+  // When sampling, we always allocate in units of page-size, which
+  // makes reallocs of small sizes do extra work (thus, failing these
+  // checks).  Since sampling is random, we turn off sampling to make
+  // sure that doesn't happen to us here.
+  const int64 old_sample_parameter = FLAGS_tcmalloc_sample_parameter;
+  FLAGS_tcmalloc_sample_parameter = 0;   // turn off sampling
+
   int start_sizes[] = { 100, 1000, 10000, 100000 };
   int deltas[] = { 1, -2, 4, -8, 16, -32, 64, -128 };
@@ -566,7 +574,7 @@
     void* p = malloc(start_sizes[s]);
     CHECK(p);
     // The larger the start-size, the larger the non-reallocing delta.
-    for (int d = 0; d < s*2; ++d) {
+    for (int d = 0; d < (s+1) * 2; ++d) {
       void* new_p = realloc(p, start_sizes[s] + deltas[d]);
       CHECK(p == new_p);  // realloc should not allocate new memory
     }
@@ -577,6 +585,7 @@
     }
     free(p);
   }
+  FLAGS_tcmalloc_sample_parameter = old_sample_parameter;
 #endif
 }
@@ -998,9 +1007,14 @@ static int RunAllTests(int argc, char** argv) {

     void* p1 = malloc(10);
     VerifyNewHookWasCalled();
+    // Also test the non-standard tc_malloc_size
+    size_t actual_p1_size = tc_malloc_size(p1);
+    CHECK_GE(actual_p1_size, 10);
+    CHECK_LT(actual_p1_size, 100000);   // a reasonable upper-bound, I think
     free(p1);
     VerifyDeleteHookWasCalled();
+
     p1 = calloc(10, 2);
     VerifyNewHookWasCalled();
     p1 = realloc(p1, 30);
diff --git a/src/windows/google/tcmalloc.h b/src/windows/google/tcmalloc.h
index 663b7f9..5bd4c59 100644
--- a/src/windows/google/tcmalloc.h
+++ b/src/windows/google/tcmalloc.h
@@ -90,6 +90,13 @@ extern "C" {
   PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW;
 #endif

+  // This is an alias for MallocExtension::instance()->GetAllocatedSize().
+  // It is equivalent to
+  //    OS X: malloc_size()
+  //    glibc: malloc_usable_size()
+  //  Windows: _msize()
+  PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW;
+
 #ifdef __cplusplus
   PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) __THROW;
   PERFTOOLS_DLL_DECL void* tc_new(size_t size);
diff --git a/src/windows/port.cc b/src/windows/port.cc
index 9a9da80..d62fa9d 100644
--- a/src/windows/port.cc
+++ b/src/windows/port.cc
@@ -35,6 +35,7 @@
 # error You should only be including windows/port.cc in a windows environment!
 #endif

+#define NOMINMAX    // so std::max, below, compiles correctly
 #include <config.h>
 #include <string.h>    // for strlen(), memset(), memcmp()
 #include <assert.h>
diff --git a/vsprojects/addressmap_unittest/addressmap_unittest.vcproj b/vsprojects/addressmap_unittest/addressmap_unittest.vcproj
index 7dd8657..d48ef27 100755
--- a/vsprojects/addressmap_unittest/addressmap_unittest.vcproj
+++ b/vsprojects/addressmap_unittest/addressmap_unittest.vcproj
@@ -128,7 +128,7 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\base\dynamic_annotations.cc">
+ RelativePath="..\..\src\base\dynamic_annotations.c">
<FileConfiguration
Name="Debug|Win32">
<Tool
diff --git a/vsprojects/libtcmalloc_minimal/libtcmalloc_minimal.vcproj b/vsprojects/libtcmalloc_minimal/libtcmalloc_minimal.vcproj
index 3755fb0..58d32e6 100755
--- a/vsprojects/libtcmalloc_minimal/libtcmalloc_minimal.vcproj
+++ b/vsprojects/libtcmalloc_minimal/libtcmalloc_minimal.vcproj
@@ -130,7 +130,7 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\base\dynamic_annotations.cc">
+ RelativePath="..\..\src\base\dynamic_annotations.c">
<FileConfiguration
Name="Debug|Win32">
<Tool
@@ -504,23 +504,6 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\stacktrace_with_context.cc">
- <FileConfiguration
- Name="Debug|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="3"/>
- </FileConfiguration>
- <FileConfiguration
- Name="Release|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="2"/>
- </FileConfiguration>
- </File>
- <File
RelativePath="..\..\src\stack_trace_table.cc">
<FileConfiguration
Name="Debug|Win32">
diff --git a/vsprojects/low_level_alloc_unittest/low_level_alloc_unittest.vcproj b/vsprojects/low_level_alloc_unittest/low_level_alloc_unittest.vcproj
index 85fe7f7..f55b56c 100755
--- a/vsprojects/low_level_alloc_unittest/low_level_alloc_unittest.vcproj
+++ b/vsprojects/low_level_alloc_unittest/low_level_alloc_unittest.vcproj
@@ -111,7 +111,7 @@
Filter="cpp;c;cxx;def;odl;idl;hpj;bat;asm;asmx"
UniqueIdentifier="{4FC737F1-C7A5-4376-A066-2A32D752A2FF}">
<File
- RelativePath="..\..\src\base\dynamic_annotations.cc">
+ RelativePath="..\..\src\base\dynamic_annotations.c">
<FileConfiguration
Name="Debug|Win32">
<Tool
@@ -263,23 +263,6 @@
RuntimeLibrary="2"/>
</FileConfiguration>
</File>
- <File
- RelativePath="..\..\src\stacktrace_with_context.cc">
- <FileConfiguration
- Name="Debug|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="3"/>
- </FileConfiguration>
- <FileConfiguration
- Name="Release|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="2"/>
- </FileConfiguration>
- </File>
</Filter>
<Filter
Name="Header Files"
diff --git a/vsprojects/tmu-static/tmu-static.vcproj b/vsprojects/tmu-static/tmu-static.vcproj
index a5d6402..8d739ae 100755
--- a/vsprojects/tmu-static/tmu-static.vcproj
+++ b/vsprojects/tmu-static/tmu-static.vcproj
@@ -130,7 +130,7 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\base\dynamic_annotations.cc">
+ RelativePath="..\..\src\base\dynamic_annotations.c">
<FileConfiguration
Name="Debug|Win32">
<Tool
@@ -544,25 +544,6 @@
</FileConfiguration>
</File>
<File
- RelativePath="..\..\src\stacktrace_with_context.cc">
- <FileConfiguration
- Name="Debug|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalOptions="/D PERFTOOLS_DLL_DECL="
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="3"/>
- </FileConfiguration>
- <FileConfiguration
- Name="Release|Win32">
- <Tool
- Name="VCCLCompilerTool"
- AdditionalOptions="/D PERFTOOLS_DLL_DECL="
- AdditionalIncludeDirectories="..\..\src\windows; ..\..\src"
- RuntimeLibrary="2"/>
- </FileConfiguration>
- </File>
- <File
RelativePath="..\..\src\stack_trace_table.cc">
<FileConfiguration
Name="Debug|Win32">