diff options
author | Ben Gamari <bgamari.foss@gmail.com> | 2015-07-09 02:08:01 +0200 |
---|---|---|
committer | Ben Gamari <ben@smart-cactus.org> | 2015-07-09 02:08:01 +0200 |
commit | b29633f5cf310824f3e34716e9261162ced779d3 (patch) | |
tree | 5bfb753fb27c55a52c4cdc1303dee150015278a3 /compiler/cmm/Bitmap.hs | |
parent | 4f9d6008c04b71fc9449b3dc10861f757539ed0f (diff) | |
download | haskell-b29633f5cf310824f3e34716e9261162ced779d3.tar.gz |
Bitmap: Fix thunk explosion
Previously we would build up another `map (-N)` thunk
for every word in the bitmap. Now we strictly accumulate the position
and carry out a single ``map (`subtract` accum)``.
`Bitmap.intsToBitmap` showed up in the profile while compiling a
testcase of #7450 (namely a program containing a record type with large
number of fields which derived `Read`). The culprit was
`CmmBuildInfoTables.procpointSRT.bitmap`. On the testcase (with 4096
fields), the profile previously looked like,
```
total time = 307.94 secs (307943 ticks @ 1000 us, 1
processor)
total alloc = 336,797,868,056 bytes (excludes profiling
overheads)
COST CENTRE MODULE %time %alloc
lintAnnots CoreLint 17.2 25.8
procpointSRT.bitmap CmmBuildInfoTables 11.3 25.2
FloatOutwards SimplCore 7.5 1.6
flatten.lookup CmmBuildInfoTables 4.0 3.9
...
```
After this fix it looks like,
```
total time = 256.88 secs (256876 ticks @ 1000 us, 1
processor)
total alloc = 255,033,667,448 bytes (excludes profiling
overheads)
COST CENTRE MODULE %time %alloc
lintAnnots CoreLint 20.3 34.1
FloatOutwards SimplCore 9.1 2.1
flatten.lookup CmmBuildInfoTables 4.8 5.2
pprNativeCode AsmCodeGen 3.7 4.3
simplLetUnfolding Simplify 3.6 2.2
StgCmm HscMain 3.6 2.1
```
Signed-off-by: Ben Gamari <ben@smart-cactus.org>
Test Plan: Validate
Reviewers: austin, simonpj
Reviewed By: simonpj
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1041
GHC Trac Issues: #7450
Diffstat (limited to 'compiler/cmm/Bitmap.hs')
-rw-r--r-- | compiler/cmm/Bitmap.hs | 84 |
1 files changed, 64 insertions, 20 deletions
diff --git a/compiler/cmm/Bitmap.hs b/compiler/cmm/Bitmap.hs index e7aa072063..22ec6ee238 100644 --- a/compiler/cmm/Bitmap.hs +++ b/compiler/cmm/Bitmap.hs @@ -1,4 +1,4 @@ -{-# LANGUAGE CPP #-} +{-# LANGUAGE CPP, BangPatterns #-} -- -- (c) The University of Glasgow 2003-2006 @@ -45,31 +45,75 @@ chunkToBitmap dflags chunk = -- eg. @[0,1,3], size 4 ==> 0xb@. -- -- The list of @Int@s /must/ be already sorted. -intsToBitmap :: DynFlags -> Int -> [Int] -> Bitmap -intsToBitmap dflags size slots{- must be sorted -} - | size <= 0 = [] - | otherwise = - (foldr (.|.) (toStgWord dflags 0) (map (toStgWord dflags 1 `shiftL`) these)) : - intsToBitmap dflags (size - wORD_SIZE_IN_BITS dflags) - (map (\x -> x - wORD_SIZE_IN_BITS dflags) rest) - where (these,rest) = span (< wORD_SIZE_IN_BITS dflags) slots +intsToBitmap :: DynFlags + -> Int -- ^ size in bits + -> [Int] -- ^ sorted indices of ones + -> Bitmap +intsToBitmap dflags size = go 0 + where + word_sz = wORD_SIZE_IN_BITS dflags + oneAt :: Int -> StgWord + oneAt i = toStgWord dflags 1 `shiftL` i + + -- It is important that we maintain strictness here. + -- See Note [Strictness when building Bitmaps]. + go :: Int -> [Int] -> Bitmap + go !pos slots + | size <= pos = [] + | otherwise = + (foldr (.|.) (toStgWord dflags 0) (map (\i->oneAt (i - pos)) these)) : + go (pos + word_sz) rest + where + (these,rest) = span (< (pos + word_sz)) slots -- | Make a bitmap where the slots specified are the /zeros/ in the bitmap. -- eg. @[0,1,3], size 4 ==> 0x4@ (we leave any bits outside the size as zero, -- just to make the bitmap easier to read). -- -- The list of @Int@s /must/ be already sorted and duplicate-free. -intsToReverseBitmap :: DynFlags -> Int -> [Int] -> Bitmap -intsToReverseBitmap dflags size slots{- must be sorted -} - | size <= 0 = [] - | otherwise = - (foldr xor (toStgWord dflags init) (map (toStgWord dflags 1 `shiftL`) these)) : - intsToReverseBitmap dflags (size - wORD_SIZE_IN_BITS dflags) - (map (\x -> x - wORD_SIZE_IN_BITS dflags) rest) - where (these,rest) = span (< wORD_SIZE_IN_BITS dflags) slots - init - | size >= wORD_SIZE_IN_BITS dflags = -1 - | otherwise = (1 `shiftL` size) - 1 +intsToReverseBitmap :: DynFlags + -> Int -- ^ size in bits + -> [Int] -- ^ sorted indices of zeros free of duplicates + -> Bitmap +intsToReverseBitmap dflags size = go 0 + where + word_sz = wORD_SIZE_IN_BITS dflags + oneAt :: Int -> StgWord + oneAt i = toStgWord dflags 1 `shiftL` i + + -- It is important that we maintain strictness here. + -- See Note [Strictness when building Bitmaps]. + go :: Int -> [Int] -> Bitmap + go !pos slots + | size <= pos = [] + | otherwise = + (foldr xor (toStgWord dflags init) (map (\i->oneAt (i - pos)) these)) : + go (pos + word_sz) rest + where + (these,rest) = span (< (pos + word_sz)) slots + remain = size - pos + init + | remain >= word_sz = -1 + | otherwise = (1 `shiftL` remain) - 1 + +{- + +Note [Strictness when building Bitmaps] +======================================== + +One of the places where @Bitmap@ is used is in in building Static Reference +Tables (SRTs) (in @CmmBuildInfoTables.procpointSRT@). In #7450 it was noticed +that some test cases (particularly those whose C-- have large numbers of CAFs) +produced large quantities of allocations from this function. + +The source traced back to 'intsToBitmap', which was lazily subtracting the word +size from the elements of the tail of the @slots@ list and recursively invoking +itself with the result. This resulted in large numbers of subtraction thunks +being built up. Here we take care to avoid passing new thunks to the recursive +call. Instead we pass the unmodified tail along with an explicit position +accumulator, which get subtracted in the fold when we compute the Word. + +-} {- | Magic number, must agree with @BITMAP_BITS_SHIFT@ in InfoTables.h. |