summaryrefslogtreecommitdiff
path: root/includes/stg
Commit message (Collapse)AuthorAgeFilesLines
* Rename isPinnedByteArray# to isByteArrayPinned#Ben Gamari2016-06-041-1/+2
| | | | | | | | | | | | Reviewers: simonmar, duncan, erikd, austin Reviewed By: austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2290 GHC Trac Issues: #12059
* RTS SMP: Use compiler built-ins on all platforms.Peter Trommler2016-06-041-166/+23
| | | | | | | | | | | | | | | | | | | Use C compiler builtins for atomic SMP primitives. This saves a lot of CPP ifdefs. Add test for atomic xchg: Test if __sync_lock_test_and_set() builtin stores the second argument. The gcc manual says the actual value stored is implementation defined. Test Plan: validate and eyeball generated assembler code Reviewers: kgardas, simonmar, hvr, bgamari, austin, erikd Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2233
* rts: Add isPinnedByteArray# primopBen Gamari2016-05-181-0/+1
| | | | | | | | | | | | | | | | | Adds a primitive operation to determine whether a particular `MutableByteArray#` is backed by a pinned buffer. Test Plan: Validate with included testcase Reviewers: austin, simonmar Reviewed By: austin, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2217 GHC Trac Issues: #12059
* Fix histograms for ticky codeMateusz Lenik2016-05-181-2/+9
| | | | | | | | | | | | | | | | | | | | This patch fixes Cmm generation required to produce histograms when compiling with -ticky flag, strips dead code from rts/Ticky.c and reworks it to use a shared constant in both C and Haskell code. Fixes #8308. Test Plan: T8308 Reviewers: jstolarek, simonpj, austin Reviewed By: simonpj Subscribers: mpickering, simonpj, bgamari, mlen, thomie, jstolarek Differential Revision: https://phabricator.haskell.org/D931 GHC Trac Issues: #8308
* PPC: Implement SMP primitives using gcc built-insPeter Trommler2016-05-161-64/+9
| | | | | | | | | | | | | | | | | | | | | | | The SMP primitives were missing appropriate memory barriers (sync, isync instructions) on all PowerPCs. Use the built-ins _sync_* provided by gcc and clang. This reduces code size significantly. Remove broken mark for concprog001 on powerpc64. The referenced ticket number (11259) was wrong. Test Plan: validate on powerpc and ARM Reviewers: erikd, austin, simonmar, bgamari, hvr Reviewed By: bgamari, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2225 GHC Trac Issues: #12070
* rts: Fix C compiler warnings on WindowsErik de Castro Lopo2016-05-111-0/+5
| | | | | | | | | | | | | | | | Summary: Specifcally we want the MinGW compiler to use ISO print format specfifiers. Test Plan: Validate on Linux, OS X and Windows Reviewers: Phyx, austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2192
* stg/Types.h: Fix comment and #includeBen Gamari2016-05-101-3/+5
|
* Use stdint types for Stg{Word,Int}{8,16,32,64}Tomas Carnecky2016-05-101-55/+75
| | | | | | | | | | | | | | | | | | We can't define Stg{Int,Word} in terms of {,u}intptr_t because STG depends on them being the exact same size as void*, and {,u}intptr_t does not make that guarantee. Furthermore, we also need to define StgHalf{Int,Word}, so the preprocessor if needs to stay. But we can at least keep it in a single place instead of repeating it in various files. Also define STG_{INT,WORD}{8,16,32,64}_{MIN,MAX} and use it in HsFFI.h, further reducing the need for CPP in other files. Reviewers: austin, bgamari, simonmar, hvr, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2182
* Avoid local label syntax for assembler on AIXHerbert Valerio Riedel2016-03-241-0/+20
| | | | | | | | | | | | | Unfortunately (for inline `__asm__()` uses), IBM's `as` doesn't seem to support local labels[1] like GNU `as` does so we need to workaround this when on AIX. [1]: https://sourceware.org/binutils/docs/as/Symbol-Names.html#Symbol-Names Turns out this also addresses the long-standing bug #485 Reviewed By: bgamari, trommler Differential Revision: https://phabricator.haskell.org/D2029
* Split external symbol prototypes (EF_) (Trac #11395)Sergei Trofimovich2016-03-081-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before the patch both Cmm and C symbols were declared with 'EF_' macro: #define EF_(f) extern StgFunPtr f() but for Cmm symbols we know exact prototypes. The patch splits there prototypes in to: #define EFF_(f) void f() /* See Note [External function prototypes] */ #define EF_(f) StgFunPtr f(void) Cmm functions are 'EF_' (External Functions), C functions are 'EFF_' (External Foreign Functions). While at it changed external C function prototype to return 'void' to workaround ghc bug on m68k. Described in detail in Trac #11395. This makes simple tests work on m68k-linux target! Thanks to Michael Karcher for awesome analysis happening in Trac #11395. Signed-off-by: Sergei Trofimovich <siarheit@google.com> Test Plan: ran "hello world" on m68k successfully Reviewers: simonmar, austin, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1975 GHC Trac Issues: #11395
* rts: drop unused global 'blackhole_queue'Sergei Trofimovich2016-02-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | Commit 5d52d9b64c21dcf77849866584744722f8121389 removed global 'blackhole_queue' in favour of new mechanism: when TSO hits blackhole TSO blocks waiting for 'MessgaeBlackhole' delivery. Patch removed unused global and updates stale comments. Noticed by Yuras Shumovich. Signed-off-by: Sergei Trofimovich <siarheit@google.com> Test Plan: build test Reviewers: simonmar, austin, Yuras, bgamari Reviewed By: Yuras, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1953
* PowerPC: Improve float register assignment.Peter Trommler2016-02-161-3/+9
| | | | | | | | | | | | | | | | | | On Linux assign F5 and F6 and D3 through D6 to caller-saved registers. Fixes #11273 Test Plan: validate on powerpc (I validated on powerpc64) Reviewers: bgamari, erikd, austin Reviewed By: erikd, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1914 GHC Trac Issues: #11273
* Ensure that we don't produce code for pre-ARMv7 without barriersBen Gamari2016-01-251-14/+3
| | | | | | | | | | | | | | | | | | | | | | We are unable to produce load/store barriers for pre-ARMv7 targets. Phab:D894 added dummy cases to SMP.h for these barriers to prevent the build from failing under the assumption that there are no SMP-capable devices of this vintage. However, #10433 points out that it is more correct to simply set NOSMP for such targets. Tested By: rwbarton Test Plan: Validate Reviewers: erikd, rwbarton, austin Reviewed By: rwbarton Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1704 GHC Trac Issues: #10433
* Comments only: more alternate names for ARM registers [skip ci]Reid Barton2016-01-251-3/+4
| | | | | | | | | | | | | | Summary: These are the names used by arm-linux-androideabi-objdump, so it's helpful to have them here next to the Stg register mapping. Reviewers: austin, erikd, bgamari Reviewed By: erikd, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1840
* Remove unused IND_PERMJoachim Breitner2016-01-231-1/+0
| | | | | | | | | | | | | | | | | it seems that this closure type has not been in use since 5d52d9, so all this is dead and untested code. This removes it. Some of the code might be useful for a counting indirection as described in #10613, so when implementing that, have a look at what this commit removes. Test Plan: validate on harbormaster Reviewers: austin, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1821
* Maintain cost-centre stacks in the interpreterSimon Marlow2015-12-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Breakpoints become SCCs, so we have detailed call-stack info for interpreted code. Currently this only works when GHC is compiled with -prof, but D1562 (Remote GHCi) removes this constraint so that in the future call stacks will be available without building your own GHCi. How can you get a stack trace? * programmatically: GHC.Stack.currentCallStack * I've added an experimental :where command that shows the stack when stopped at a breakpoint * `error` attaches a call stack automatically, although since calls to `error` are often lifted out to the top level, this is less useful than it might be (ImplicitParams still works though). * Later we might attach call stacks to all exceptions Other related changes in this diff: * I reduced the number of places that get ticks attached for breakpoints. In particular there was a breakpoint around the whole declaration, which was often redundant because it bound no variables. This reduces clutter in the stack traces and speeds up compilation. * I tidied up some RealSrcSpan stuff in InteractiveUI, and made a few other small cleanups Test Plan: validate Reviewers: ezyang, bgamari, austin, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1595 GHC Trac Issues: #11047
* New magic function for applying realWorld#Ben Gamari2015-11-121-0/+2
| | | | | | | | | | | | | | Test Plan: validate Reviewers: goldfire, erikd, rwbarton, simonpj, austin, simonmar, hvr Reviewed By: simonpj Subscribers: simonmar, thomie Differential Revision: https://phabricator.haskell.org/D1103 GHC Trac Issues: #10678
* cmm: Expose machine's stack and return address registerBen Gamari2015-11-011-0/+2
| | | | | | | | | | We will need to use these to setup proper unwinding information for the stg_stop_thread closure. This pokes a hole in the STG abstraction, exposing the machine's stack pointer register so that we can accomplish this. We also expose a dummy return address register, which corresponds to the register used to hold the DWARF return address. Differential Revision: https://phabricator.haskell.org/D1225
* Fix signature of atomic builtinsAndreas Schwab2015-10-021-36/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is due to Andreas Schwab. This fixes #10926, which reports (on AArch64) errors of the form, ``` /tmp/ghc1492_0/ghc_1.hc:2844:25: warning: passing argument 1 of 'hs_atomic_xor64' makes pointer from integer without a cast [-Wint-conversion] _c1Ho = hs_atomic_xor64((*Sp) + (((Sp[1]) << 0x3UL) + 0x10UL), Sp[2]); ^ In file included from /home/abuild/rpmbuild/BUILD/ghc-7.10.2/includes/Stg.h:273:0: 0, from /tmp/ghc1492_0/ghc_1.hc:3: /home/abuild/rpmbuild/BUILD/ghc-7.10.2/includes/stg/Prim.h:41:11: note: expected 'volatile StgWord64 * {aka volatile long unsigned int *}' but argument is of type 'long unsigned int' StgWord64 hs_atomic_xor64(volatile StgWord64 *x, StgWord64 val); ^ ``` Test Plan: Validate Reviewers: austin, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1300 GHC Trac Issues: #10926
* Make headers C++ compatible (fixes #10700)Alexey Shmalko2015-07-302-12/+12
| | | | | | | | | | | | | | | | | Some headers used `new` as parameter name, which is reserved word in C++. This patch changes these names to `new_`. Test Plan: validate Reviewers: austin, ezyang, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1107 GHC Trac Issues: #10700
* Implement PowerPC 64-bit native code backend for LinuxPeter Trommler2015-07-033-5/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the PowerPC 32-bit native code generator for "64-bit PowerPC ELF Application Binary Interface Supplement 1.9" by Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification -- OpenPOWER ABI for Linux Supplement" by IBM. The latter ABI is mainly used on POWER7/7+ and POWER8 Linux systems running in little-endian mode. The code generator supports both static and dynamic linking. PowerPC 64-bit code for ELF ABI 1.9 and 2 is mostly position independent anyway, and thus so is all the code emitted by the code generator. In other words, -fPIC does not make a difference. rts/stg/SMP.h support is implemented. Following the spirit of the introductory comment in PPC/CodeGen.hs, the rest of the code is a straightforward extension of the 32-bit implementation. Limitations: * Code is generated only in the medium code model, which is also gcc's default * Local symbols are not accessed directly, which seems to also be the case for 32-bit * LLVM does not work, but this does not work on 32-bit either * Must use the system runtime linker in GHCi, because the GHC linker for "static" object files (rts/Linker.c) for PPC 64-bit is not implemented. The system runtime (dynamic) linker works. * The handling of the system stack (register 1) is not ELF- compliant so stack traces break. Instead of allocating a new stack frame, spill code should use the "official" spill area in the current stack frame and deallocation code should restore the back chain * DWARF support is missing Fixes #9863 Test Plan: validate (on powerpc, too) Reviewers: simonmar, trofi, erikd, austin Reviewed By: trofi Subscribers: bgamari, arnons1, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D629 GHC Trac Issues: #9863
* Be aware of overlapping global STG registers in CmmSink (#10521)Reid Barton2015-06-251-0/+6
| | | | | | | | | | | | | | | | | | | Summary: On x86_64, commit e2f6bbd3a27685bc667655fdb093734cb565b4cf assigned the STG registers F1 and D1 the same hardware register (xmm1), and the same for the registers F2 and D2, etc. When mixing calls to functions involving Float#s and Double#s, this can cause wrong Cmm optimizations that assume the F1 and D1 registers are independent. Reviewers: simonpj, austin Reviewed By: austin Subscribers: simonpj, thomie, bgamari Differential Revision: https://phabricator.haskell.org/D993 GHC Trac Issues: #10521
* rts: Fix aarch64 implementation of xchgErik de Castro Lopo2015-06-011-4/+2
| | | | | | | | | | | | | | | | In the previous implementation, the `stlxr` instruction clobbered the value that was supposed to be returned by the the `xchg` function. Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com> Test Plan: build on aarch64 Reviewers: austin, bgamari, rwbarton Subscribers: bgamari, thomie Differential Revision: https://phabricator.haskell.org/D932
* Add a TODO FIXME w.r.t. D894Austin Seipp2015-05-191-0/+5
| | | | | | | | | | | | As Reid mentioned in a comment on D894, the case fixed by this revision likely isn't really correct, because old ARM binaries could run on newer machines, meaning we need to detect at runtime whether we need a proper barrier. But in the mean time, this actually stops the build from failing - which is better off. So we'll just remember this when we fix it in the future. Signed-off-by: Austin Seipp <austin@well-typed.com>
* includes/stg/SMP.h: implement simple load_/store_load_barrier on armv6 and olderSergei Trofimovich2015-05-181-0/+4
| | | | | | | | | | | | | | | | | | | | Assuming there is no real SMP systems on these CPUs I've added only compiler barrier (otherwise write_barrier and friends need to be fixed as well). Patch also fixes build breakage reported in #10244. Signed-off-by: Sergei Trofimovich <siarheit@google.com> Reviewers: rwbarton, nomeata, austin Reviewed By: nomeata, austin Subscribers: bgamari, thomie Differential Revision: https://phabricator.haskell.org/D894 GHC Trac Issues: #10244
* commentsSimon Marlow2014-12-151-1/+11
|
* Link pre-ARMv6 spinlocks into all RTS variantsJoachim Breitner2014-12-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For compatibility with ARM machines from pre v6, the RTS provides implementations of certain atomic operations. Previously, these were only included in the threaded RTS. But ghc (the library) contains the code in compiler/cbits/genSym.c, which uses these operations if there is more than one capability. But there is only one libHSghc, so the linker wants to resolve these symbols in every case. By providing these operations in all RTSs, the linker is happy. The only downside is a small amount of dead code in the non-threaded RTS on old ARM machines. Test Plan: It helped here. Reviewers: bgamari, austin Reviewed By: bgamari, austin Subscribers: carter, thomie Differential Revision: https://phabricator.haskell.org/D564 GHC Trac Issues: #8951
* arm64: 64bit iOS and SMP support (#7942)Luke Iannini2014-11-193-1/+39
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Revert "Rename _closure to _static_closure, apply naming consistently."Edward Z. Yang2014-10-201-15/+15
| | | | | | | This reverts commit 35672072b4091d6f0031417bc160c568f22d0469. Conflicts: compiler/main/DriverPipeline.hs
* Rename _closure to _static_closure, apply naming consistently.Edward Z. Yang2014-10-011-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In preparation for indirecting all references to closures, we rename _closure to _static_closure to ensure any old code will get an undefined symbol error. In order to reference a closure foobar_closure (which is now undefined), you should instead use STATIC_CLOSURE(foobar). For convenience, a number of these old identifiers are macro'd. Across C-- and C (Windows and otherwise), there were differing conventions on whether or not foobar_closure or &foobar_closure was the address of the closure. Now, all foobar_closure references are addresses, and no & is necessary. CHARLIKE/INTLIKE were not changed, simply alpha-renamed. Part of remove HEAP_ALLOCED patch set (#8199) Depends on D265 Signed-off-by: Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D267 GHC Trac Issues: #8199
* `M-x delete-trailing-whitespace` & `M-x untabify`Herbert Valerio Riedel2014-09-241-1/+1
|
* Implement `decodeDouble_Int64#` primopHerbert Valerio Riedel2014-09-171-0/+1
| | | | | | | | | | | | | | | The existing `decodeDouble_2Int#` primop is rather inconvenient to use (and in fact is not even used by `integer-gmp`) as the mantissa is split into 3 components which would actually fit in an `Int64#` value. However, `decodeDouble_Int64#` is to be used by the new `integer-gmp2` re-implementation (see #9281). Moreover, `decodeDouble_2Int#` performs direct bit-wise operations on the IEEE representation which can be replaced by a combination of the portable standard C99 `scalbn(3)` and `frexp(3)` functions. Differential Revision: https://phabricator.haskell.org/D160
* [ci skip] includes: detabify/dewhitespace stg/Types.hAustin Seipp2014-08-201-20/+20
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* [ci skip] includes: detabify/dewhitespace stg/SMP.hAustin Seipp2014-08-201-21/+21
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* [ci skip] includes: detabify/dewhitespace stg/Regs.hAustin Seipp2014-08-201-52/+52
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* includes/stg/Prim.h: add matching 'hs_atomic_*' prototypesSergei Trofimovich2014-08-191-0/+38
| | | | | | Fixes implicit function declarations in C codegen. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Revert "Fix typos 'resizze'"Gabor Greif2014-08-161-1/+1
| | | | | | this is z-encoding (as hvr tells me) This reverts commit 425d5178af55620efa00e6e16426f491c63ad533.
* Fix typos 'resizze'Gabor Greif2014-08-161-1/+1
|
* Implement {resize,shrink}MutableByteArray# primopsHerbert Valerio Riedel2014-08-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The two new primops with the type-signatures resizeMutableByteArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, MutableByteArray# s #) shrinkMutableByteArray# :: MutableByteArray# s -> Int# -> State# s -> State# s allow to resize MutableByteArray#s in-place (when possible), and are useful for algorithms where memory is temporarily over-allocated. The motivating use-case is for implementing integer backends, where the final target size of the result is either N or N+1, and only known after the operation has been performed. A future commit will implement a stateful variant of the `sizeofMutableByteArray#` operation (see #9447 for details), since now the size of a `MutableByteArray#` may change over its lifetime (i.e before it gets frozen or GCed). Test Plan: ./validate --slow Reviewers: ezyang, austin, simonmar Reviewed By: austin, simonmar Differential Revision: https://phabricator.haskell.org/D133
* Implement new CLZ and CTZ primops (re #9340)Herbert Valerio Riedel2014-08-141-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | This implements the new primops clz#, clz32#, clz64#, ctz#, ctz32#, ctz64# which provide efficient implementations of the popular count-leading-zero and count-trailing-zero respectively (see testcase for a pure Haskell reference implementation). On x86, NCG as well as LLVM generates code based on the BSF/BSR instructions (which need extra logic to make the 0-case well-defined). Test Plan: validate and succesful tests on i686 and amd64 Reviewers: rwbarton, simonmar, ezyang, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D144 GHC Trac Issues: #9340
* stg/Prim.h: drop redundant #ifdefSergei Trofimovich2014-08-111-4/+0
| | | | | | | | | | | | | | | | | Summary: Noticed by Herbert Valerio Riedel Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Test Plan: build test Reviewers: simonmar, austin, hvr Reviewed By: hvr Subscribers: rwbarton, phaskell, simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D143
* includes/stg/SMP.h: use 'NOSMP' instead of never defined 'WITHSMP' (Trac #8789)Sergei Trofimovich2014-07-021-21/+21
| | | | | | | | | | | | | | | | | | Summary: git history does not contain state where 'WITHSMP' macro was ever defined. It should have always been '!NOSMP'. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Test Plan: build tested Reviewers: austin, simonmar Reviewed By: austin, simonmar Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D33
* Re-add more primops for atomic ops on byte arraysJohan Tibell2014-06-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | This is the second attempt to add this functionality. The first attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due to register allocator failure on x86. Given how the register allocator currently works, we don't have enough registers on x86 to support cmpxchg using complicated addressing modes. Instead we fall back to a simpler addressing mode on x86. Adds the following primops: * atomicReadIntArray# * atomicWriteIntArray# * fetchSubIntArray# * fetchOrIntArray# * fetchXorIntArray# * fetchAndIntArray# Makes these pre-existing out-of-line primops inline: * fetchAddIntArray# * casIntArray#
* Revert "Add more primops for atomic ops on byte arrays"Johan Tibell2014-06-261-0/+1
| | | | | | | | This commit caused the register allocator to fail on i386. This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and 04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to the first).
* Add more primops for atomic ops on byte arraysJohan Tibell2014-06-241-1/+0
| | | | | | | | | | | | | | | | | | | Summary: Add more primops for atomic ops on byte arrays Adds the following primops: * atomicReadIntArray# * atomicWriteIntArray# * fetchSubIntArray# * fetchOrIntArray# * fetchXorIntArray# * fetchAndIntArray# Makes these pre-existing out-of-line primops inline: * fetchAddIntArray# * casIntArray#
* ghc: initial AArch64 patchesColin Watson2014-04-212-1/+57
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Add missing symbols to linkerJohan Tibell2014-03-291-0/+4
| | | | The copy array family of primops were moved out-of-line.
* Add SmallArray# and SmallMutableArray# typesJohan Tibell2014-03-291-0/+14
| | | | | | | | | | | | | | | These array types are smaller than Array# and MutableArray# and are faster when the array size is small, as they don't have the overhead of a card table. Having no card table reduces the closure size with 2 words in the typical small array case and leads to less work when updating or GC:ing the array. Reduces both the runtime and memory allocation by 8.8% on my insert benchmark for the HashMap type in the unordered-containers package, which makes use of lots of small arrays. With tuned GC settings (i.e. `+RTS -A6M`) the runtime reduction is 15%. Fixes #8923.
* Follow hs_popcntX changes in ghc-primJohan Tibell2014-03-221-5/+5
|
* codeGen: inline allocation optimization for clone array primopsJohan Tibell2014-03-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | The inline allocation version is 69% faster than the out-of-line version, when cloning an array of 16 unit elements on a 64-bit machine. Comparing the new and the old primop implementations isn't straightforward. The old version had a missing heap check that I discovered during the development of the new version. Comparing the old and the new version would requiring fixing the old version, which in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim. The inline allocation threshold is configurable via -fmax-inline-alloc-size which gives the maximum array size, in bytes, to allocate inline. The size does not include the closure header size. Allowing the same primop to be either inline or out-of-line has some implication for how we lay out heap checks. We always place a heap check around out-of-line primops, as they may allocate outside of our knowledge. However, for the inline primops we only allow allocation via the standard means (i.e. virtHp). Since the clone primops might be either inline or out-of-line the heap check layout code now consults shouldInlinePrimOp to know whether a primop will be inlined.