author | Austin Seipp <austin@well-typed.com> | 2013-10-01 21:13:14 -0500 |
---|---|---|
committer | Austin Seipp <austin@well-typed.com> | 2013-10-01 21:26:47 -0500 |
commit | fd74014079f14bd3ab50e328e52c44ef97d40e05 (patch) | |
tree | da31c992a76d3816a4f1012ceb1eb4e68d0fb556 /compiler/prelude | |
parent | 627d1e008cbe4d9318b2466394420a968d1659da (diff) | |
Add support for prefetch with locality levels.
This patch adds support for several new primitive operations which
support using processor-specific instructions to help guide data and
cache locality decisions. Locality levels range from 0 to 3.
For LLVM, we generate llvm.prefetch intrinsics at the proper locality
level (similar to GCC).
For x86 we generate prefetch{NTA, t2, t1, t0} instructions. On SPARC and
PowerPC, the locality levels are ignored.
This closes #8256.
Authored-by: Carter Tazio Schonwald <carter.schonwald@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
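The per-backend behaviour described above can be summarised in a small hypothetical Haskell sketch. `Backend`, `Lowering` and `lowerPrefetch` are invented names for illustration only. This is not GHC's code generator. Following the section docs in the diff below, SPARC and PPC are modelled as emitting nothing, and the x86 mnemonics are written in lowercase assembler syntax.

```haskell
-- Hypothetical sketch (not GHC's code generator): how a prefetch locality
-- level N in [0..3] is lowered on each backend, per the commit message and
-- the "Prefetch" section docs.

data Backend = LLVM | X86NCG | SparcNCG | PpcNCG
  deriving (Show, Eq)

-- What a backend emits for a prefetch*N# primop.
data Lowering
  = LlvmPrefetch Int   -- llvm.prefetch intrinsic with locality argument N
  | X86Instr String    -- an x86 prefetch instruction mnemonic
  | NoOp               -- nothing is emitted
  deriving (Show, Eq)

-- Returns Nothing for locality levels outside [0..3].
lowerPrefetch :: Backend -> Int -> Maybe Lowering
lowerPrefetch backend n
  | n < 0 || n > 3 = Nothing
  | otherwise = Just $ case backend of
      LLVM     -> LlvmPrefetch n
      X86NCG   -> X86Instr (x86Mnemonic n)
      SparcNCG -> NoOp
      PpcNCG   -> NoOp
  where
    -- Higher N keeps the data in more cache levels (t0 is the "closest").
    x86Mnemonic :: Int -> String
    x86Mnemonic 0 = "prefetchnta"
    x86Mnemonic 1 = "prefetcht2"
    x86Mnemonic 2 = "prefetcht1"
    x86Mnemonic _ = "prefetcht0"  -- only reached for N = 3 (guarded above)
```

For example, `lowerPrefetch X86NCG 1` yields `Just (X86Instr "prefetcht2")`, while the same level on `LLVM` is passed through unchanged as the intrinsic's locality argument.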
Diffstat (limited to 'compiler/prelude')
-rw-r--r-- | compiler/prelude/primops.txt.pp | 85 |
1 file changed, 77 insertions, 8 deletions
```diff
diff --git a/compiler/prelude/primops.txt.pp b/compiler/prelude/primops.txt.pp
index dcd536eeae..5bedc31a7b 100644
--- a/compiler/prelude/primops.txt.pp
+++ b/compiler/prelude/primops.txt.pp
@@ -2596,22 +2596,91 @@ primop VecWriteScalarOffAddrOp "writeOffAddrAs#" GenPrimOp
    vector = ALL_VECTOR_TYPES
 
 ------------------------------------------------------------------------
+
 section "Prefetch"
-	{Prefetch operations}
+	{Prefetch operations: Note how every prefetch operation has a name
+  with the pattern prefetch*N#, where N is either 0,1,2, or 3.
+
+  This suffix number, N, is the "locality level" of the prefetch, following the
+  convention in GCC and other compilers.
+  Higher locality numbers correspond to the memory being loaded in more
+  levels of the cpu cache, and being retained after initial use.
+
+  On the LLVM backend, prefetch*N# uses the LLVM prefetch intrinsic
+  with locality level N. The code generated by LLVM is target architecture
+  dependent, but should agree with the GHC NCG on x86 systems.
+
+  On the Sparc and PPC native backends, prefetch*N is a No-Op.
+
+  On the x86 NCG, N=0 will generate prefetchNTA,
+  N=1 generates prefetcht2, N=2 generates prefetcht1, and
+  N=3 generates prefetcht0.
+
+  For streaming workloads, the prefetch*0 operations are recommended.
+  For workloads which do many reads or writes to a memory location in a short period of time,
+  prefetch*3 operations are recommended.
+  }
 ------------------------------------------------------------------------
 
-primop PrefetchByteArrayOp "prefetchByteArray#" GenPrimOp
+
+--- the Int# argument for prefetch is the byte offset on the byteArray or Addr#
+
+---
+primop PrefetchByteArrayOp3 "prefetchByteArray3#" GenPrimOp
    ByteArray# -> Int# -> ByteArray#
-   with llvm_only = True
+   with can_fail = True
 
-primop PrefetchMutableByteArrayOp "prefetchMutableByteArray#" GenPrimOp
+primop PrefetchMutableByteArrayOp3 "prefetchMutableByteArray3#" GenPrimOp
    MutableByteArray# s -> Int# -> State# s -> State# s
-   with has_side_effects = True
-        llvm_only = True
+   with can_fail = True
+
+primop PrefetchAddrOp3 "prefetchAddr3#" GenPrimOp
+   Addr# -> Int# -> Addr#
+   with can_fail = True
 
-primop PrefetchAddrOp "prefetchAddr#" GenPrimOp
+----
+
+primop PrefetchByteArrayOp2 "prefetchByteArray2#" GenPrimOp
+   ByteArray# -> Int# -> ByteArray#
+   with can_fail = True
+
+primop PrefetchMutableByteArrayOp2 "prefetchMutableByteArray2#" GenPrimOp
+   MutableByteArray# s -> Int# -> State# s -> State# s
+   with can_fail = True
+
+primop PrefetchAddrOp2 "prefetchAddr2#" GenPrimOp
    Addr# -> Int# -> Addr#
-   with llvm_only = True
+   with can_fail = True
+
+----
+
+primop PrefetchByteArrayOp1 "prefetchByteArray1#" GenPrimOp
+   ByteArray# -> Int# -> ByteArray#
+   with can_fail = True
+
+primop PrefetchMutableByteArrayOp1 "prefetchMutableByteArray1#" GenPrimOp
+   MutableByteArray# s -> Int# -> State# s -> State# s
+   with can_fail = True
+
+primop PrefetchAddrOp1 "prefetchAddr1#" GenPrimOp
+   Addr# -> Int# -> Addr#
+   with can_fail = True
+
+----
+
+primop PrefetchByteArrayOp0 "prefetchByteArray0#" GenPrimOp
+   ByteArray# -> Int# -> ByteArray#
+   with can_fail = True
+
+primop PrefetchMutableByteArrayOp0 "prefetchMutableByteArray0#" GenPrimOp
+   MutableByteArray# s -> Int# -> State# s -> State# s
+   with can_fail = True
+
+primop PrefetchAddrOp0 "prefetchAddr0#" GenPrimOp
+   Addr# -> Int# -> Addr#
+   with can_fail = True
+
+
 
 
 ------------------------------------------------------------------------
```
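The conventions documented in the new "Prefetch" section can be captured in a second hypothetical sketch: the `prefetch*N#` naming pattern, the recommended level per workload, and the fact that the `Int#` argument is a byte offset. `Target`, `Workload`, `primopName`, `recommendedLevel`, `byteOffset` and `allPrimopNames` are illustrative names only. They are not part of `GHC.Prim`, and this sketch does not call the primops themselves.

```haskell
-- Hypothetical sketch: naming scheme and usage conventions of the
-- prefetch*N# primops added by this patch (illustrative names only).

-- The three kinds of argument the prefetch primops operate on.
-- Constructor names deliberately match the primop name fragments.
data Target = ByteArray | MutableByteArray | Addr
  deriving (Show, Eq, Enum, Bounded)

-- Rough workload classes taken from the section docs.
data Workload
  = Streaming    -- data touched once and not reused soon
  | HeavyReuse   -- many reads/writes to the same memory in a short period
  deriving (Show, Eq)

-- Primop name for a target and locality level, or Nothing if the
-- level is outside [0..3].
primopName :: Target -> Int -> Maybe String
primopName t n
  | n < 0 || n > 3 = Nothing
  | otherwise      = Just ("prefetch" ++ show t ++ show n ++ "#")

-- Level recommended by the docs: 0 for streaming, 3 for heavy reuse.
recommendedLevel :: Workload -> Int
recommendedLevel Streaming  = 0
recommendedLevel HeavyReuse = 3

-- The Int# argument is a byte offset, so an element index must be
-- scaled by the element size in bytes.
byteOffset :: Int -> Int -> Int
byteOffset elemSize ix = elemSize * ix

-- All twelve primop names introduced by the patch, level 3 first.
allPrimopNames :: [String]
allPrimopNames =
  [ name
  | n <- [3, 2, 1, 0]
  , t <- [minBound .. maxBound]
  , Just name <- [primopName t n]
  ]
```

For instance, to prefetch element index 5 of an array of 8-byte words, the offset passed would be `byteOffset 8 5`, i.e. 40, and a streaming loop would use the level-0 variant, `primopName ByteArray (recommendedLevel Streaming)`.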