Diffstat (limited to 'deps/jemalloc/ChangeLog')
-rw-r--r-- deps/jemalloc/ChangeLog 596
1 files changed, 595 insertions, 1 deletions
diff --git a/deps/jemalloc/ChangeLog b/deps/jemalloc/ChangeLog
index e3b0a5190..29a00fb78 100644
--- a/deps/jemalloc/ChangeLog
+++ b/deps/jemalloc/ChangeLog
@@ -4,6 +4,600 @@ brevity. Much more detail can be found in the git revision history:
https://github.com/jemalloc/jemalloc
+* 5.1.0 (May 4th, 2018)
+
+ This release is primarily about fine-tuning, ranging from several new features
+ to numerous notable performance and portability enhancements. The release and
+ prior dev versions have been running in multiple large scale applications for
+ months, and the cumulative improvements are substantial in many cases.
+
+ Given the long and successful production runs, this release is likely a good
+ upgrade target for applications running jemalloc 5.0 or earlier. For
+ performance-critical applications, the newly added TUNING.md provides
+ guidelines on jemalloc tuning.
+
+ New features:
+ - Implement transparent huge page support for internal metadata. (@interwq)
+ - Add opt.thp to allow enabling / disabling transparent huge pages for all
+ mappings. (@interwq)
+ - Add maximum background thread count option. (@djwatson)
+ - Allow prof_active to control opt.lg_prof_interval and prof.gdump.
+ (@interwq)
+ - Allow arena index lookup based on allocation addresses via mallctl.
+ (@lionkov)
+ - Allow disabling initial-exec TLS model. (@davidtgoldblatt, @KenMacD)
+ - Add opt.lg_extent_max_active_fit to set the max ratio between the size of
+ the active extent selected (to split off from) and the size of the requested
+ allocation. (@interwq, @davidtgoldblatt)
+ - Add retain_grow_limit to set the max size when growing virtual address
+ space. (@interwq)
+ - Add mallctl interfaces:
+ + arena.<i>.retain_grow_limit (@interwq)
+ + arenas.lookup (@lionkov)
+ + max_background_threads (@djwatson)
+ + opt.lg_extent_max_active_fit (@interwq)
+ + opt.max_background_threads (@djwatson)
+ + opt.metadata_thp (@interwq)
+ + opt.thp (@interwq)
+ + stats.metadata_thp (@interwq)
+
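The opt.lg_extent_max_active_fit entry above caps the ratio between the size of a reused active extent and the size of the request. A minimal sketch of such a cap, assuming the limit is given as a base-2 logarithm and applied with a right shift; the helper name and the example value 6 are illustrative and not taken from jemalloc:

```python
def within_max_active_fit(extent_size: int, request_size: int, lg_max_fit: int) -> bool:
    """Return True if the extent may be split to serve the request.

    Hypothetical helper: the extent must be large enough, and its size
    shifted right by lg_max_fit must not exceed the request size, so a
    small request is not carved out of a disproportionately large extent.
    """
    if extent_size < request_size:
        return False  # extent cannot satisfy the request at all
    return (extent_size >> lg_max_fit) <= request_size


# With lg_max_fit = 6 the cap is roughly a 64x size ratio.
print(within_max_active_fit(64 * 4096, 4096, 6))   # ratio 64, accepted
print(within_max_active_fit(256 * 4096, 4096, 6))  # ratio 256, rejected
```

A larger logarithm accepts more candidate extents at the cost of splitting big extents for small requests, which is the fragmentation trade-off the changelog entry refers to.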
+ Portability improvements:
+ - Support GNU/kFreeBSD configuration. (@paravoid)
+ - Support m68k, nios2 and SH3 architectures. (@paravoid)
+ - Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable. (@zonyitoo)
+ - Fix symbol listing for cross-compiling. (@tamird)
+ - Fix high bits computation on ARM. (@davidtgoldblatt, @paravoid)
+ - Disable the CPU_SPINWAIT macro for Power. (@davidtgoldblatt, @marxin)
+ - Fix MSVC 2015 & 2017 builds. (@rustyx)
+ - Improve RISC-V support. (@EdSchouten)
+ - Set name mangling script in strict mode. (@nicolov)
+ - Avoid MADV_HUGEPAGE on ARM. (@marxin)
+ - Modify configure to determine return value of strerror_r.
+ (@davidtgoldblatt, @cferris1000)
+ - Make sure CXXFLAGS is tested with CPP compiler. (@nehaljwani)
+ - Fix 32-bit build on MSVC. (@rustyx)
+ - Fix external symbol on MSVC. (@maksqwe)
+ - Avoid a printf format specifier warning. (@jasone)
+ - Add configure option --disable-initial-exec-tls which can allow jemalloc to
+ be dynamically loaded after program startup. (@davidtgoldblatt, @KenMacD)
+ - AArch64: Add ILP32 support. (@cmuellner)
+ - Add --with-lg-vaddr configure option to support cross compiling.
+ (@cmuellner, @davidtgoldblatt)
+
+ Optimizations and refactors:
+ - Improve active extent fit with extent_max_active_fit. This considerably
+ reduces fragmentation over time and improves virtual memory and metadata
+ usage. (@davidtgoldblatt, @interwq)
+ - Eagerly coalesce large extents to reduce fragmentation. (@interwq)
+ - sdallocx: only read size info when page aligned (i.e. possibly sampled),
+ which speeds up the sized deallocation path significantly. (@interwq)
+ - Avoid attempting new mappings for in place expansion with retain, since
+ it rarely succeeds in practice and causes high overhead. (@interwq)
+ - Refactor OOM handling in newImpl. (@wqfish)
+ - Add internal fine-grained logging functionality for debugging use.
+ (@davidtgoldblatt)
+ - Refactor arena / tcache interactions. (@davidtgoldblatt)
+ - Refactor extent management with dumpable flag. (@davidtgoldblatt)
+ - Add runtime detection of lazy purging. (@interwq)
+ - Use pairing heap instead of red-black tree for extents_avail. (@djwatson)
+ - Use sysctl on startup in FreeBSD. (@trasz)
+ - Use thread local prng state instead of atomic. (@djwatson)
+ - Make decay always purge one more extent than before, because in
+ practice large extents are usually the ones that cross the decay threshold.
+ Purging the additional extent helps save memory as well as reduce VM
+ fragmentation. (@interwq)
+ - Fast division by dynamic values. (@davidtgoldblatt)
+ - Improve the fit for aligned allocation. (@interwq, @edwinsmith)
+ - Refactor extent_t bitpacking. (@rkmisra)
+ - Optimize the generated assembly for ticker operations. (@davidtgoldblatt)
+ - Convert stats printing to use a structured text emitter. (@davidtgoldblatt)
+ - Remove preserve_lru feature for extents management. (@djwatson)
+ - Consolidate two memory loads into one on the fast deallocation path.
+ (@davidtgoldblatt, @interwq)
+
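The "fast division by dynamic values" entry above refers to replacing a hardware divide with a multiply and a shift once the divisor is known at runtime. A minimal sketch of that general technique, assuming 32-bit dividends that are exact multiples of the divisor; the function names are illustrative, and this is not presented as jemalloc's implementation:

```python
MASK32 = (1 << 32) - 1


def div_magic(d: int) -> int:
    """Precompute ceil(2**32 / d) for a divisor d >= 2 (fits in 32 bits)."""
    if d < 2:
        raise ValueError("divisor must be >= 2 so the magic fits in 32 bits")
    return -(-(1 << 32) // d)


def fast_div_exact(n: int, magic: int) -> int:
    """Compute n // d with one multiply and one shift.

    Valid when 0 <= n < 2**32 and n is an exact multiple of the divisor
    that `magic` was computed for.
    """
    assert 0 <= n <= MASK32
    return (n * magic) >> 32


magic = div_magic(48)                     # computed once per divisor
print(fast_div_exact(48 * 1000, magic))   # 1000
```

The result is exact because, for n = q * d, the rounding error introduced by the ceiling is q * r / 2**32 with r < d, and q * r < n < 2**32, so the shift discards it.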
+ Bug fixes (most of the issues are only relevant to jemalloc 5.0):
+ - Fix deadlock with multithreaded fork in OS X. (@davidtgoldblatt)
+ - Validate returned file descriptor before use. (@zonyitoo)
+ - Fix a few background thread initialization and shutdown issues. (@interwq)
+ - Fix an extent coalesce + decay race by taking both coalescing extents off
+ the LRU list. (@interwq)
+ - Fix potentially unbounded increase during decay, caused by one thread that
+ keeps stashing memory to purge while other threads generate new pages. The
+ number of pages to purge is checked to prevent this. (@interwq)
+ - Fix a FreeBSD bootstrap assertion. (@strejda, @interwq)
+ - Handle 32 bit mutex counters. (@rkmisra)
+ - Fix an indexing bug when creating background threads. (@davidtgoldblatt,
+ @binliu19)
+ - Fix arguments passed to extent_init. (@yuleniwo, @interwq)
+ - Fix addresses used for ordering mutexes. (@rkmisra)
+ - Fix abort_conf processing during bootstrap. (@interwq)
+ - Fix include path order for out-of-tree builds. (@cmuellner)
+
+ Incompatible changes:
+ - Remove --disable-thp. (@interwq)
+ - Remove mallctl interfaces:
+ + config.thp (@interwq)
+
+ Documentation:
+ - Add TUNING.md. (@interwq, @davidtgoldblatt, @djwatson)
+
+* 5.0.1 (July 1, 2017)
+
+ This bugfix release fixes several issues, most of which are obscure enough
+ that typical applications are not impacted.
+
+ Bug fixes:
+ - Update decay->nunpurged before purging, in order to avoid potential update
+ races and subsequent incorrect purging volume. (@interwq)
+ - Only abort on dlsym(3) error if the failure impacts an enabled feature (lazy
+ locking and/or background threads). This mitigates an initialization
+ failure bug for which we still do not have a clear reproduction test case.
+ (@interwq)
+ - Modify tsd management so that it neither crashes nor leaks if a thread's
+ only allocation activity is to call free() after TLS destructors have been
+ executed. This behavior was observed when operating with GNU libc, and is
+ unlikely to be an issue with other libc implementations. (@interwq)
+ - Mask signals during background thread creation. This prevents signals from
+ being inadvertently delivered to background threads. (@jasone,
+ @davidtgoldblatt, @interwq)
+ - Avoid inactivity checks within background threads, in order to prevent
+ recursive mutex acquisition. (@interwq)
+ - Fix extent_grow_retained() to use the specified hooks when the
+ arena.<i>.extent_hooks mallctl is used to override the default hooks.
+ (@interwq)
+ - Add missing reentrancy support for custom extent hooks which allocate.
+ (@interwq)
+ - Post-fork(2), re-initialize the list of tcaches associated with each arena
+ to contain no tcaches except the forking thread's. (@interwq)
+ - Add missing post-fork(2) mutex reinitialization for extent_grow_mtx. This
+ fixes potential deadlocks after fork(2). (@interwq)
+ - Enforce minimum autoconf version (currently 2.68), since 2.63 is known to
+ generate corrupt configure scripts. (@jasone)
+ - Ensure that the configured page size (--with-lg-page) is no larger than the
+ configured huge page size (--with-lg-hugepage). (@jasone)
+
+* 5.0.0 (June 13, 2017)
+
+ Unlike all previous jemalloc releases, this release does not use naturally
+ aligned "chunks" for virtual memory management, and instead uses page-aligned
+ "extents". This change has few externally visible effects, but the internal
+ impacts are... extensive. Many other internal changes combine to make this
+ the most cohesively designed version of jemalloc so far, with ample
+ opportunity for further enhancements.
+
+ Continuous integration is now an integral aspect of development thanks to the
+ efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
+ stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a
+ side effect the official release frequency may decrease over time.
+
+ New features:
+ - Implement optional per-CPU arena support; threads choose which arena to use
+ based on current CPU rather than on fixed thread-->arena associations.
+ (@interwq)
+ - Implement two-phase decay of unused dirty pages. Pages transition from
+ dirty-->muzzy-->clean, where the first phase transition relies on
+ madvise(... MADV_FREE) semantics, and the second phase transition discards
+ pages such that they are replaced with demand-zeroed pages on next access.
+ (@jasone)
+ - Increase decay time resolution from seconds to milliseconds. (@jasone)
+ - Implement opt-in per CPU background threads, and use them for asynchronous
+ decay-driven unused dirty page purging. (@interwq)
+ - Add mutex profiling, which collects a variety of statistics useful for
+ diagnosing overhead/contention issues. (@interwq)
+ - Add C++ new/delete operator bindings. (@djwatson)
+ - Support manually created arena destruction, such that all data and metadata
+ are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats
+ associated with destroyed arenas. (@jasone)
+ - Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
+ merged/destroyed arena statistics via mallctl. (@jasone)
+ - Add opt.abort_conf to optionally abort if invalid configuration options are
+ detected during initialization. (@interwq)
+ - Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
+ stats dumped during exit if opt.stats_print is true. (@jasone)
+ - Add --with-version=VERSION for use when embedding jemalloc into another
+ project's git repository. (@jasone)
+ - Add --disable-thp to support cross compiling. (@jasone)
+ - Add --with-lg-hugepage to support cross compiling. (@jasone)
+ - Add mallctl interfaces (various authors):
+ + background_thread
+ + opt.abort_conf
+ + opt.retain
+ + opt.percpu_arena
+ + opt.background_thread
+ + opt.{dirty,muzzy}_decay_ms
+ + opt.stats_print_opts
+ + arena.<i>.initialized
+ + arena.<i>.destroy
+ + arena.<i>.{dirty,muzzy}_decay_ms
+ + arena.<i>.extent_hooks
+ + arenas.{dirty,muzzy}_decay_ms
+ + arenas.bin.<i>.slab_size
+ + arenas.nlextents
+ + arenas.lextent.<i>.size
+ + arenas.create
+ + stats.background_thread.{num_threads,num_runs,run_interval}
+ + stats.mutexes.{ctl,background_thread,prof,reset}.
+ {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
+ num_owner_switch}
+ + stats.arenas.<i>.{dirty,muzzy}_decay_ms
+ + stats.arenas.<i>.uptime
+ + stats.arenas.<i>.{pmuzzy,base,internal,resident}
+ + stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
+ + stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
+ + stats.arenas.<i>.bins.<j>.mutex.
+ {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
+ num_owner_switch}
+ + stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
+ + stats.arenas.<i>.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
+ extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
+ {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
+ num_owner_switch}
+
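The two-phase decay entry in the 5.0.0 feature list describes pages moving dirty-->muzzy-->clean. A toy model of those transitions follows, assuming a fixed per-page deadline for each phase; that deadline rule, the class names and the millisecond values are simplifications for illustration and do not reproduce jemalloc's actual purging schedule:

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DIRTY = "dirty"
    MUZZY = "muzzy"
    CLEAN = "clean"


@dataclass
class Page:
    state: State
    since_ms: int  # time at which the page entered its current state


def advance(pages, now_ms, dirty_decay_ms, muzzy_decay_ms):
    """Apply at most one phase transition per page for the current time."""
    for page in pages:
        age = now_ms - page.since_ms
        if page.state is State.DIRTY and age >= dirty_decay_ms:
            # Phase 1: lazily freeable (the MADV_FREE-style step in the text).
            page.state, page.since_ms = State.MUZZY, now_ms
        elif page.state is State.MUZZY and age >= muzzy_decay_ms:
            # Phase 2: discarded; the next access sees a demand-zeroed page.
            page.state, page.since_ms = State.CLEAN, now_ms


pages = [Page(State.DIRTY, 0), Page(State.DIRTY, 500)]
for now in (1000, 1500, 2000):
    advance(pages, now, dirty_decay_ms=1000, muzzy_decay_ms=1000)
    print(now, [p.state.value for p in pages])
```

In this model the two decay times play the role of the opt.{dirty,muzzy}_decay_ms settings listed above, with each phase timed independently.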
+ Portability improvements:
+ - Improve reentrant allocation support, such that deadlock is less likely if
+ e.g. a system library call in turn allocates memory. (@davidtgoldblatt,
+ @interwq)
+ - Support static linking of jemalloc with glibc. (@djwatson)
+
+ Optimizations and refactors:
+ - Organize virtual memory as "extents" of virtual memory pages, rather than as
+ naturally aligned "chunks", and store all metadata in arbitrarily distant
+ locations. This reduces virtual memory external fragmentation, and will
+ interact better with huge pages (not yet explicitly supported). (@jasone)
+ - Fold large and huge size classes together; only small and large size classes
+ remain. (@jasone)
+ - Unify the allocation paths, and merge most fast-path branching decisions.
+ (@davidtgoldblatt, @interwq)
+ - Embed per thread automatic tcache into thread-specific data, which reduces
+ conditional branches and dereferences. Also reorganize tcache to increase
+ fast-path data locality. (@interwq)
+ - Rewrite atomics to closely model the C11 API, convert various
+ synchronization from mutex-based to atomic, and use the explicit memory
+ ordering control to resolve various hypothetical races without increasing
+ synchronization overhead. (@davidtgoldblatt)
+ - Extensively optimize rtree via various methods:
+ + Add multiple layers of rtree lookup caching, since rtree lookups are now
+ part of fast-path deallocation. (@interwq)
+ + Determine rtree layout at compile time. (@jasone)
+ + Make the tree shallower for common configurations. (@jasone)
+ + Embed the root node in the top-level rtree data structure, thus avoiding
+ one level of indirection. (@jasone)
+ + Further specialize leaf elements as compared to internal node elements,
+ and directly embed extent metadata needed for fast-path deallocation.
+ (@jasone)
+ + Ignore leading always-zero address bits (architecture-specific).
+ (@jasone)
+ - Reorganize headers (ongoing work) to make them hermetic, and disentangle
+ various module dependencies. (@davidtgoldblatt)
+ - Convert various internal data structures such as size class metadata from
+ boot-time-initialized to compile-time-initialized. Propagate resulting data
+ structure simplifications, such as making arena metadata fixed-size.
+ (@jasone)
+ - Simplify size class lookups when constrained to size classes that are
+ multiples of the page size. This speeds lookups, but the primary benefit is
+ complexity reduction in code that was the source of numerous regressions.
+ (@jasone)
+ - Lock individual extents when possible for localized extent operations,
+ rather than relying on a top-level arena lock. (@davidtgoldblatt, @jasone)
+ - Use first fit layout policy instead of best fit, in order to improve
+ packing. (@jasone)
+ - If munmap(2) is not in use, use an exponential series to grow each arena's
+ virtual memory, so that the number of disjoint virtual memory mappings
+ remains low. (@jasone)
+ - Implement per arena base allocators, so that arenas never share any virtual
+ memory pages. (@jasone)
+ - Automatically generate private symbol name mangling macros. (@jasone)
+
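The entry above about growing each arena's virtual memory with an exponential series can be illustrated with simple arithmetic. The sketch below assumes a plain doubling series starting at 2 MiB, which is an illustrative choice and not necessarily the series jemalloc uses:

```python
def mappings_needed(total_bytes: int, first_bytes: int, growth: int = 2) -> int:
    """Count the mappings needed to cover total_bytes when each new mapping
    is `growth` times the size of the previous one (growth=1: fixed size)."""
    count, covered, size = 0, 0, first_bytes
    while covered < total_bytes:
        covered += size
        size *= growth
        count += 1
    return count


MiB = 1 << 20
GiB = 1 << 30
print(mappings_needed(64 * GiB, 2 * MiB, growth=2))  # 16 with doubling
print(mappings_needed(64 * GiB, 2 * MiB, growth=1))  # 32768 with fixed size
```

The mapping count grows logarithmically with the arena's footprint under doubling, and linearly with fixed-size mappings, which is why the number of disjoint mappings stays low.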
+ Incompatible changes:
+ - Replace chunk hooks with an expanded/normalized set of extent hooks.
+ (@jasone)
+ - Remove ratio-based purging. (@jasone)
+ - Remove --disable-tcache. (@jasone)
+ - Remove --disable-tls. (@jasone)
+ - Remove --enable-ivsalloc. (@jasone)
+ - Remove --with-lg-size-class-group. (@jasone)
+ - Remove --with-lg-tiny-min. (@jasone)
+ - Remove --disable-cc-silence. (@jasone)
+ - Remove --enable-code-coverage. (@jasone)
+ - Remove --disable-munmap (replaced by opt.retain). (@jasone)
+ - Remove Valgrind support. (@jasone)
+ - Remove quarantine support. (@jasone)
+ - Remove redzone support. (@jasone)
+ - Remove mallctl interfaces (various authors):
+ + config.munmap
+ + config.tcache
+ + config.tls
+ + config.valgrind
+ + opt.lg_chunk
+ + opt.purge
+ + opt.lg_dirty_mult
+ + opt.decay_time
+ + opt.quarantine
+ + opt.redzone
+ + opt.thp
+ + arena.<i>.lg_dirty_mult
+ + arena.<i>.decay_time
+ + arena.<i>.chunk_hooks
+ + arenas.initialized
+ + arenas.lg_dirty_mult
+ + arenas.decay_time
+ + arenas.bin.<i>.run_size
+ + arenas.nlruns
+ + arenas.lrun.<i>.size
+ + arenas.nhchunks
+ + arenas.hchunk.<i>.size
+ + arenas.extend
+ + stats.cactive
+ + stats.arenas.<i>.lg_dirty_mult
+ + stats.arenas.<i>.decay_time
+ + stats.arenas.<i>.metadata.{mapped,allocated}
+ + stats.arenas.<i>.{npurge,nmadvise,purged}
+ + stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
+ + stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
+ + stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
+ + stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}
+
+ Bug fixes:
+ - Improve interval-based profile dump triggering to dump only one profile when
+ a single allocation's size exceeds the interval. (@jasone)
+ - Use prefixed function names (as controlled by --with-jemalloc-prefix) when
+ pruning backtrace frames in jeprof. (@jasone)
+
+* 4.5.0 (February 28, 2017)
+
+ This is the first release to benefit from much broader continuous integration
+ testing, thanks to @davidtgoldblatt. Had we had this testing infrastructure
+ in place for prior releases, it would have caught all of the most serious
+ regressions fixed by this release.
+
+ New features:
+ - Add --disable-thp and the opt.thp mallctl to provide opt-out mechanisms for
+ transparent huge page integration. (@jasone)
+ - Update zone allocator integration to work with macOS 10.12. (@glandium)
+ - Restructure *CFLAGS configuration, so that CFLAGS behaves typically, and
+ EXTRA_CFLAGS provides a way to specify e.g. -Werror during building, but not
+ during configuration. (@jasone, @ronawho)
+
+ Bug fixes:
+ - Fix DSS (sbrk(2)-based) allocation. This regression was first released in
+ 4.3.0. (@jasone)
+ - Handle race in per size class utilization computation. This functionality
+ was first released in 4.0.0. (@interwq)
+ - Fix lock order reversal during gdump. (@jasone)
+ - Fix/refactor tcache synchronization. This regression was first released in
+ 4.0.0. (@jasone)
+ - Fix various JSON-formatted malloc_stats_print() bugs. This functionality
+ was first released in 4.3.0. (@jasone)
+ - Fix huge-aligned allocation. This regression was first released in 4.4.0.
+ (@jasone)
+ - When transparent huge page integration is enabled, detect what state pages
+ start in according to the kernel's current operating mode, and only convert
+ arena chunks to non-huge during purging if that is not their initial state.
+ This functionality was first released in 4.4.0. (@jasone)
+ - Fix lg_chunk clamping for the --enable-cache-oblivious --disable-fill case.
+ This regression was first released in 4.0.0. (@jasone, @428desmo)
+ - Properly detect sparc64 when building for Linux. (@glaubitz)
+
+* 4.4.0 (December 3, 2016)
+
+ New features:
+ - Add configure support for *-*-linux-android. (@cferris1000, @jasone)
+ - Add the --disable-syscall configure option, for use on systems that place
+ security-motivated limitations on syscall(2). (@jasone)
+ - Add support for Debian GNU/kFreeBSD. (@thesam)
+
+ Optimizations:
+ - Add extent serial numbers and use them where appropriate as a sort key that
+ is higher priority than address, so that the allocation policy prefers older
+ extents. This tends to improve locality (decrease fragmentation) when
+ memory grows downward. (@jasone)
+ - Refactor madvise(2) configuration so that MADV_FREE is detected and utilized
+ on Linux 4.5 and newer. (@jasone)
+ - Mark partially purged arena chunks as non-huge-page. This improves
+ interaction with Linux's transparent huge page functionality. (@jasone)
+
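The extent serial number entry above describes a sort key in which age takes priority over address. A minimal sketch of such a composite key, assuming lower serial numbers mean older extents; the Extent type and its field names are hypothetical:

```python
from typing import NamedTuple


class Extent(NamedTuple):
    sn: int    # serial number: lower means created earlier
    addr: int  # base address, used only to break ties


def extent_key(e: Extent):
    """Order by serial number first, then by address."""
    return (e.sn, e.addr)


free_extents = [
    Extent(sn=7, addr=0x1000),
    Extent(sn=2, addr=0x9000),
    Extent(sn=2, addr=0x4000),
]
oldest = min(free_extents, key=extent_key)
print(oldest)  # the sn=2 extent with the lower address
```

An address-only key would pick the sn=7 extent here, since it has the lowest address; the composite key prefers the older extents instead.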
+ Bug fixes:
+ - Fix size class computations for edge conditions involving extremely large
+ allocations. This regression was first released in 4.0.0. (@jasone,
+ @ingvarha)
+ - Remove overly restrictive assertions related to the cactive statistic. This
+ regression was first released in 4.1.0. (@jasone)
+ - Implement a more reliable detection scheme for os_unfair_lock on macOS.
+ (@jszakmeister)
+
+* 4.3.1 (November 7, 2016)
+
+ Bug fixes:
+ - Fix a severe virtual memory leak. This regression was first released in
+ 4.3.0. (@interwq, @jasone)
+ - Refactor atomic and prng APIs to restore support for 32-bit platforms that
+ use pre-C11 toolchains, e.g. FreeBSD's mips. (@jasone)
+
+* 4.3.0 (November 4, 2016)
+
+ This is the first release that passes the test suite for multiple Windows
+ configurations, thanks in large part to @glandium setting up continuous
+ integration via AppVeyor (and Travis CI for Linux and OS X).
+
+ New features:
+ - Add "J" (JSON) support to malloc_stats_print(). (@jasone)
+ - Add Cray compiler support. (@ronawho)
+
+ Optimizations:
+ - Add/use adaptive spinning for bootstrapping and radix tree node
+ initialization. (@jasone)
+
+ Bug fixes:
+ - Fix large allocation to search starting in the optimal size class heap,
+ which can substantially reduce virtual memory churn and fragmentation. This
+ regression was first released in 4.0.0. (@mjp41, @jasone)
+ - Fix stats.arenas.<i>.nthreads accounting. (@interwq)
+ - Fix and simplify decay-based purging. (@jasone)
+ - Make DSS (sbrk(2)-related) operations lockless, which resolves potential
+ deadlocks during thread exit. (@jasone)
+ - Fix over-sized allocation of radix tree leaf nodes. (@mjp41, @ogaun,
+ @jasone)
+ - Fix over-sized allocation of arena_t (plus associated stats) data
+ structures. (@jasone, @interwq)
+ - Fix EXTRA_CFLAGS to not affect configuration. (@jasone)
+ - Fix a Valgrind integration bug. (@ronawho)
+ - Disallow 0x5a junk filling when running in Valgrind. (@jasone)
+ - Fix a file descriptor leak on Linux. This regression was first released in
+ 4.2.0. (@vsarunas, @jasone)
+ - Fix static linking of jemalloc with glibc. (@djwatson)
+ - Use syscall(2) rather than {open,read,close}(2) during boot on Linux. This
+ works around other libraries' system call wrappers performing reentrant
+ allocation. (@kspinka, @Whissi, @jasone)
+ - Fix OS X default zone replacement to work with OS X 10.12. (@glandium,
+ @jasone)
+ - Fix cached memory management to avoid needless commit/decommit operations
+ during purging, which resolves permanent virtual memory map fragmentation
+ issues on Windows. (@mjp41, @jasone)
+ - Fix TSD fetches to avoid (recursive) allocation. This is relevant to
+ non-TLS and Windows configurations. (@jasone)
+ - Fix malloc_conf overriding to work on Windows. (@jasone)
+ - Forcibly disable lazy-lock on Windows (was forcibly *enabled*). (@jasone)
+
+* 4.2.1 (June 8, 2016)
+
+ Bug fixes:
+ - Fix bootstrapping issues for configurations that require allocation during
+ tsd initialization (e.g. --disable-tls). (@cferris1000, @jasone)
+ - Fix gettimeofday() version of nstime_update(). (@ronawho)
+ - Fix Valgrind regressions in calloc() and chunk_alloc_wrapper(). (@ronawho)
+ - Fix potential VM map fragmentation regression. (@jasone)
+ - Fix opt_zero-triggered in-place huge reallocation zeroing. (@jasone)
+ - Fix heap profiling context leaks in reallocation edge cases. (@jasone)
+
+* 4.2.0 (May 12, 2016)
+
+ New features:
+ - Add the arena.<i>.reset mallctl, which makes it possible to discard all of
+ an arena's allocations in a single operation. (@jasone)
+ - Add the stats.retained and stats.arenas.<i>.retained statistics. (@jasone)
+ - Add the --with-version configure option. (@jasone)
+ - Support --with-lg-page values larger than actual page size. (@jasone)
+
+ Optimizations:
+ - Use pairing heaps rather than red-black trees for various hot data
+ structures. (@djwatson, @jasone)
+ - Streamline fast paths of rtree operations. (@jasone)
+ - Optimize the fast paths of calloc() and [m,d,sd]allocx(). (@jasone)
+ - Decommit unused virtual memory if the OS does not overcommit. (@jasone)
+ - Specify MAP_NORESERVE on Linux if [heuristic] overcommit is active, in order
+ to avoid unfortunate interactions during fork(2). (@jasone)
+
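For readers unfamiliar with the pairing heaps mentioned in the optimizations above, here is a minimal sketch of the data structure with a two-pass delete-min. It illustrates the general technique only; the names are illustrative and the layout is not taken from jemalloc's implementation:

```python
class Node:
    __slots__ = ("key", "children")

    def __init__(self, key):
        self.key = key
        self.children = []


def meld(a, b):
    """Merge two heaps; the root with the larger key becomes a child."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    return a


def insert(root, key):
    """Insert is a single meld with a one-node heap."""
    return meld(root, Node(key))


def pop_min(root):
    """Remove the minimum; returns (key, new_root). Uses two-pass pairing."""
    kids = root.children
    # Pass 1: meld children pairwise, left to right.
    paired = [meld(kids[i], kids[i + 1] if i + 1 < len(kids) else None)
              for i in range(0, len(kids), 2)]
    # Pass 2: meld the resulting heaps right to left into one heap.
    new_root = None
    for heap in reversed(paired):
        new_root = meld(heap, new_root)
    return root.key, new_root


root = None
for k in [5, 1, 4, 2, 3]:
    root = insert(root, k)
out = []
while root is not None:
    k, root = pop_min(root)
    out.append(k)
print(out)  # [1, 2, 3, 4, 5]
```

Insert and meld do constant work with no rebalancing, in contrast to a red-black tree insert.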
+ Bug fixes:
+ - Fix chunk accounting related to triggering gdump profiles. (@jasone)
+ - Link against librt for clock_gettime(2) if glibc < 2.17. (@jasone)
+ - Scale leak report summary according to sampling probability. (@jasone)
+
+* 4.1.1 (May 3, 2016)
+
+ This bugfix release resolves a variety of mostly minor issues, though the
+ bitmap fix is critical for 64-bit Windows.
+
+ Bug fixes:
+ - Fix the linear scan version of bitmap_sfu() to shift by the proper amount
+ even when sizeof(long) is not the same as sizeof(void *), as on 64-bit
+ Windows. (@jasone)
+ - Fix hashing functions to avoid unaligned memory accesses (and resulting
+ crashes). This is relevant at least to some ARM-based platforms.
+ (@rkmisra)
+ - Fix fork()-related lock rank ordering reversals. These reversals were
+ unlikely to cause deadlocks in practice except when heap profiling was
+ enabled and active. (@jasone)
+ - Fix various chunk leaks in OOM code paths. (@jasone)
+ - Fix malloc_stats_print() to print opt.narenas correctly. (@jasone)
+ - Fix MSVC-specific build/test issues. (@rustyx, @yuslepukhin)
+ - Fix a variety of test failures that were due to test fragility rather than
+ core bugs. (@jasone)
+
+* 4.1.0 (February 28, 2016)
+
+ This release is primarily about optimizations, but it also incorporates a lot
+ of portability-motivated refactoring and enhancements. So many people
+ contributed to this release, even setting aside minor changes (see git
+ revision history) and the people who reported and diagnosed issues, that
+ starting with this release, changes are annotated with author credits to
+ help reflect the collaborative effort involved.
+
+ New features:
+ - Implement decay-based unused dirty page purging, a major optimization with
+ mallctl API impact. This is an alternative to the existing ratio-based
+ unused dirty page purging, and is intended to eventually become the sole
+ purging mechanism. New mallctls:
+ + opt.purge
+ + opt.decay_time
+ + arena.<i>.decay
+ + arena.<i>.decay_time
+ + arenas.decay_time
+ + stats.arenas.<i>.decay_time
+ (@jasone, @cevans87)
+ - Add --with-malloc-conf, which makes it possible to embed a default
+ options string during configuration. This was motivated by the desire to
+ specify --with-malloc-conf=purge:decay , since the default must remain
+ purge:ratio until the 5.0.0 release. (@jasone)
+ - Add MS Visual Studio 2015 support. (@rustyx, @yuslepukhin)
+ - Make *allocx() size class overflow behavior defined. The maximum
+ size class is now less than PTRDIFF_MAX to protect applications against
+ numerical overflow, and all allocation functions are guaranteed to indicate
+ errors rather than potentially crashing if the request size exceeds the
+ maximum size class. (@jasone)
+ - jeprof:
+ + Add raw heap profile support. (@jasone)
+ + Add --retain and --exclude for backtrace symbol filtering. (@jasone)
+
+ Optimizations:
+ - Optimize the fast path to combine various bootstrapping and configuration
+ checks and execute more streamlined code in the common case. (@interwq)
+ - Use linear scan for small bitmaps (used for small object tracking). In
+ addition to speeding up bitmap operations on 64-bit systems, this reduces
+ allocator metadata overhead by approximately 0.2%. (@djwatson)
+ - Separate arena_avail trees, which substantially speeds up run tree
+ operations. (@djwatson)
+ - Use memoization (boot-time-computed table) for run quantization. Separate
+ arena_avail trees reduced the importance of this optimization. (@jasone)
+ - Attempt mmap-based in-place huge reallocation. This can dramatically speed
+ up incremental huge reallocation. (@jasone)
+
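The linear-scan bitmap entry above concerns finding and claiming the first free slot in a small bitmap. A minimal sketch of a linear "set first unset" operation, assuming 64-bit words and a polarity where 0 means free; both choices and all names are assumptions made for illustration:

```python
BITS_PER_WORD = 64
FULL_WORD = (1 << BITS_PER_WORD) - 1


def bitmap_new(nbits: int):
    """All bits start unset (0 = free slot, 1 = allocated slot)."""
    nwords = (nbits + BITS_PER_WORD - 1) // BITS_PER_WORD
    return [0] * nwords


def set_first_unset(words, nbits: int) -> int:
    """Linearly scan for the first unset bit, set it, and return its index.

    Returns -1 if all nbits are already set.
    """
    for w, word in enumerate(words):
        if word == FULL_WORD:
            continue  # word is full, keep scanning
        free = ~word & FULL_WORD
        bit = (free & -free).bit_length() - 1  # lowest set bit of `free`
        index = w * BITS_PER_WORD + bit
        if index >= nbits:
            return -1  # only padding bits remain
        words[w] |= 1 << bit
        return index
    return -1


bm = bitmap_new(70)
print([set_first_unset(bm, 70) for _ in range(3)])  # [0, 1, 2]
```

For bitmaps that span only a few words, the scan visits very little memory and needs no summary levels above the leaf words.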
+ Incompatible changes:
+ - Make opt.narenas unsigned rather than size_t. (@jasone)
+
+ Bug fixes:
+ - Fix stats.cactive accounting regression. (@rustyx, @jasone)
+ - Handle unaligned keys in hash(). This caused problems for some ARM systems.
+ (@jasone, @cferris1000)
+ - Refactor arenas array. In addition to fixing a fork-related deadlock, this
+ makes arena lookups faster and simpler. (@jasone)
+ - Move retained memory allocation out of the default chunk allocation
+ function, to a location that gets executed even if the application installs
+ a custom chunk allocation function. This resolves a virtual memory leak.
+ (@buchgr)
+ - Fix a potential tsd cleanup leak. (@cferris1000, @jasone)
+ - Fix run quantization. In practice this bug had no impact unless
+ applications requested memory with alignment exceeding one page.
+ (@jasone, @djwatson)
+ - Fix LinuxThreads-specific bootstrapping deadlock. (Cosmin Paraschiv)
+ - jeprof:
+ + Don't discard curl options if timeout is not defined. (@djwatson)
+ + Detect failed profile fetches. (@djwatson)
+ - Fix stats.arenas.<i>.{dss,lg_dirty_mult,decay_time,pactive,pdirty} for
+ --disable-stats case. (@jasone)
+
+* 4.0.4 (October 24, 2015)
+
+ This bugfix release fixes another xallocx() regression. No other regressions
+ have come to light in over a month, so this is likely a good starting point
+ for people who prefer to wait for "dot one" releases with all the major issues
+ shaken out.
+
+ Bug fixes:
+ - Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of large
+ allocations that have been randomly assigned an offset of 0 when the
+ --enable-cache-oblivious configure option is enabled.
+
* 4.0.3 (September 24, 2015)
This bugfix release continues the trend of xallocx() and heap profiling fixes.
@@ -38,7 +632,7 @@ brevity. Much more detail can be found in the git revision history:
these fixes, xallocx() now tries harder to partially fulfill requests for
optional extra space. Note that a couple of minor heap profiling
optimizations are included, but these are better thought of as performance
- fixes that were integral to disovering most of the other bugs.
+ fixes that were integral to discovering most of the other bugs.
Optimizations:
- Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the