author | Johan Tibell <johan.tibell@gmail.com> | 2014-03-13 09:35:21 +0100 |
---|---|---|
committer | Johan Tibell <johan.tibell@gmail.com> | 2014-03-22 10:32:02 +0100 |
commit | 1eece45692fb5d1a5f4ec60c1537f8068237e9c1 (patch) | |
tree | b5d99d52c5a6ab762f9b92dfd0504105122ed62b /includes/stg | |
parent | 99ef27913dbe55fa57891bbf97d131e0933733e3 (diff) | |
download | haskell-1eece45692fb5d1a5f4ec60c1537f8068237e9c1.tar.gz |
codeGen: inline allocation optimization for clone array primops

The inline allocation version is 69% faster than the out-of-line
version when cloning an array of 16 unit elements on a 64-bit
machine.

Comparing the new and the old primop implementations isn't
straightforward. The old version had a missing heap check that I
discovered during the development of the new version. Comparing the
old and the new version would require fixing the old version, which
in turn means reimplementing the equivalent of MAYBE_GC in StgCmmPrim.

The inline allocation threshold is configurable via
-fmax-inline-alloc-size, which gives the maximum array size, in bytes,
to allocate inline. The size does not include the closure header size.
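
As a rough illustration of the size rule just described, the following C sketch compares an array's payload size against a byte threshold. The helper name `fits_inline_threshold` and its parameters are hypothetical and exist only for this example; the real decision is made inside GHC's code generator, not by a function like this one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GHC.
 *
 * Returns true when an array of n_elems word-sized elements has a
 * payload no larger than max_inline_alloc_bytes. The closure header is
 * deliberately left out of the computed size, matching the rule that
 * the threshold does not include the header. */
static bool fits_inline_threshold(size_t n_elems, size_t word_bytes,
                                  size_t max_inline_alloc_bytes)
{
    /* Treat a multiplication that would overflow as "too large". */
    if (word_bytes != 0 && n_elems > (size_t)-1 / word_bytes)
        return false;

    return n_elems * word_bytes <= max_inline_alloc_bytes;
}
```

With 8-byte words and an illustrative threshold of 128 bytes, a 16-element array (128 bytes of payload) would fit under this rule, while a 17-element array (136 bytes) would not.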

Allowing the same primop to be either inline or out-of-line has some
implications for how we lay out heap checks. We always place a heap
check around out-of-line primops, as they may allocate outside of our
knowledge. However, for the inline primops we only allow allocation
via the standard means (i.e. virtHp). Since the clone primops might be
either inline or out-of-line, the heap check layout code now consults
shouldInlinePrimOp to know whether a primop will be inlined.
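
The layout rule above can be modelled loosely in C. This is a simplified sketch under stated assumptions, not GHC's code: the type `primop_mode` and the functions `clone_primop_mode` and `needs_own_heap_check` are hypothetical names, and the threshold comparison merely stands in for whatever shouldInlinePrimOp actually decides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the heap check layout rule; not GHC code. */
typedef enum {
    PRIMOP_INLINE,      /* allocates only via the tracked heap pointer */
    PRIMOP_OUT_OF_LINE  /* may allocate in ways the layout code cannot see */
} primop_mode;

/* Stand-in for consulting shouldInlinePrimOp (an assumption for this
 * sketch): a clone whose payload fits under the threshold is inlined. */
static primop_mode clone_primop_mode(size_t payload_bytes,
                                     size_t max_inline_alloc_bytes)
{
    return payload_bytes <= max_inline_alloc_bytes ? PRIMOP_INLINE
                                                   : PRIMOP_OUT_OF_LINE;
}

/* Out-of-line primops always get a heap check placed around them;
 * inline primops do not need one of their own in this model. */
static bool needs_own_heap_check(primop_mode mode)
{
    return mode == PRIMOP_OUT_OF_LINE;
}
```

The point of the sketch is only that the same primop can land on either side of the rule, so the layout decision has to be made per call site rather than per primop.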
Diffstat (limited to 'includes/stg')
-rw-r--r-- | includes/stg/MiscClosures.h | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/includes/stg/MiscClosures.h b/includes/stg/MiscClosures.h
index ff781dd4ec..8be51fb036 100644
--- a/includes/stg/MiscClosures.h
+++ b/includes/stg/MiscClosures.h
@@ -347,6 +347,10 @@ RTS_FUN_DECL(stg_casIntArrayzh);
 RTS_FUN_DECL(stg_fetchAddIntArrayzh);
 RTS_FUN_DECL(stg_newArrayzh);
 RTS_FUN_DECL(stg_newArrayArrayzh);
+RTS_FUN_DECL(stg_cloneArrayzh);
+RTS_FUN_DECL(stg_cloneMutableArrayzh);
+RTS_FUN_DECL(stg_freezzeArrayzh);
+RTS_FUN_DECL(stg_thawArrayzh);
 RTS_FUN_DECL(stg_newMutVarzh);
 RTS_FUN_DECL(stg_atomicModifyMutVarzh);