Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
I don't want to fall back to gettimeofday(), because that might have a
different absolute value.
This is a patch from FB's internal build of GHC that I'm pushing
upstream.
Author: Andrew Gallagher <agallagher@fb.com>
This diff adds simple thin archive support to GHC's linker code, which
basically just entails reading the member data from disk rather than
from inside the archive (except for the case of the symbol index and
gnu filename index, where the member data is still inline).
Harmonize the indentation amount. The file mixed 4, 2, and in some
cases 3 spaces for indentation.
The copy array family of primops were moved out-of-line.
These array types are smaller than Array# and MutableArray# and are
faster when the array size is small, as they don't have the overhead
of a card table. Having no card table reduces the closure size by 2
words in the typical small array case and leads to less work when
updating or garbage collecting the array.
Reduces both the runtime and memory allocation by 8.8% on my insert
benchmark for the HashMap type in the unordered-containers package,
which makes use of lots of small arrays. With tuned GC settings
(i.e. `+RTS -A6M`) the runtime reduction is 15%.
Fixes #8923.
This should reduce code size when there's little to gain from inlining
these primops, while still retaining the inlining benefit when the
size of the copy is known statically.
After a toolchain update, Clang no longer tolerates these being
unused, thanks to -Werror during validate.
Signed-off-by: Austin Seipp <austin@well-typed.com>
The inline allocation version is 69% faster than the out-of-line
version, when cloning an array of 16 unit elements on a 64-bit
machine.
Comparing the new and the old primop implementations isn't
straightforward. The old version had a missing heap check that I
discovered during the development of the new version. Comparing the
old and the new version would require fixing the old version, which
in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
The inline allocation threshold is configurable via
-fmax-inline-alloc-size which gives the maximum array size, in bytes,
to allocate inline. The size does not include the closure header size.
Allowing the same primop to be either inline or out-of-line has some
implications for how we lay out heap checks. We always place a heap
check around out-of-line primops, as they may allocate outside of our
knowledge. However, for the inline primops we only allow allocation
via the standard means (i.e. virtHp). Since the clone primops might be
either inline or out-of-line, the heap check layout code now consults
shouldInlinePrimOp to know whether a primop will be inlined.
Signed-off-by: Austin Seipp <austin@well-typed.com>
gcptr should only be used for pointers that the GC should
follow. While this hasn't caused any bugs, since these variables
aren't live across a GC, it's clearer to use the right type.
Signed-off-by: Austin Seipp <austin@well-typed.com>
The implementations of newArray# and newArrayArray#, stg_newArrayzh
and stg_newArrayArrayzh, had three issues:
* The condition for the loop that fills the array with the initial
  element was incorrect: it would also write into the card table.
  The loop that was meant to fill the card table never executed, as
  its condition was wrong too. In the end this didn't lead to any
  disasters, as the contents of the card table don't matter for
  newly allocated arrays.
* The card table was unnecessarily initialized. The card table only
  matters for arrays that aren't copied during GC, and new arrays are
  always copied. By not writing the card table at all we save some
  cycles.
* The ticky allocation accounting was wrong. The second argument to
  TICK_ALLOC_PRIM is the size of the closure excluding the header,
  but the header size was incorrectly included.
Fixes #8867.
See documentation for details.
Found by clang:
    rts_dist_HC rts/dist/build/RetainerProfile.p_o
    rts/RetainerProfile.c:1779:5:
        error: implicit declaration of function 'markStableTables' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        markStableTables(retainRoot, NULL);
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Austin Seipp <austin@well-typed.com>
This should have manifested earlier, but for some reason it only seemed
to trigger on Mavericks.
Signed-off-by: Austin Seipp <austin@well-typed.com>
passed explicitly
Issue #8748
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Austin Seipp <austin@well-typed.com>
UNREG mode has a rather nasty invariant to maintain:
    capabilities[0] == &MainCapability
MainCapability lives in static (non-heap) memory, while the other
capabilities are dynamically allocated.
Issue #8748
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Following 298a25bdf and #8722 as Peter mentioned, this probably isn't
needed anymore.
Signed-off-by: Austin Seipp <austin@well-typed.com>
Our old function for searching for sections could only deal
with section names that were eight bytes or shorter; this
patch adds support for long section names.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
As Luke Iannini reported, the Clang iOS cross compiler apparently
doesn't support __thread for some bizarre reason, so unfortunately they
too must fall back to pthread_{get,set}specific.
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
This basically cleans up a lot of GCTDecl; I found it quite hard to
read and a bit confusing. The changes are mostly cosmetic: better
delineation between the alternative cases, light touchups, and an
attempt to make every branch as consistent as possible.
However, this patch does have one significant effect: it will ensure
that any LLVM-based compilers will use __thread if they support it.
Before, they would simply always use pthread_getspecific and
pthread_setspecific, which are almost surely even *more* inefficient.
The details are a bit too long and boring to go into here; see #7602.
After talking with Simon, we decided to play it safe - __thread can at
least be optimized by future clang releases even further on OS X if they
choose, and it's safer until we can investigate the pthread
implementation further on Mavericks.
For Linux, the story isn't so bleak if you use Clang (for whatever
reason) - Linux accesses __thread slots directly through `%fs` (while
OS X performs a load followed by an indirect call). So it should still
be fairly competitive, speed-wise.
Signed-off-by: Austin Seipp <austin@well-typed.com>
This fixes #7134
Signed-off-by: Austin Seipp <austin@well-typed.com>
We occasionally need to reserve some temporary memory in a primop for
passing to a foreign function. We've been using the stack for this,
but when we moved to high-level Cmm it became quite fragile because
primops are in high-level Cmm and the stack is supposed to be under
the control of the Cmm pipeline.
So this change puts things on a firmer footing by adding a new Cmm
construct 'reserve'. e.g. in decodeFloat_Int#:
    reserve 2 = tmp {
        mp_tmp1  = tmp + WDS(1);
        mp_tmp_w = tmp;
        /* Perform the operation */
        ccall __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg);
        r1 = W_[mp_tmp1];
        r2 = W_[mp_tmp_w];
    }
reserve is described in CmmParse.y.
Unfortunately the argument to reserve must be a compile-time constant.
We might have to extend the parser to allow expressions with
arithmetic operators if this is too restrictive.
Note also that the return instruction for the procedure must be
outside the scope of the reserved stack area, so we have to extract
the values from the reserved area before we close the scope. This
means some more local variables (r1, r2 in the example above). The
generated code is more or less identical to what we had before though.
When printing an update frame in printClosure(), it no longer prints
the generic UPDATE_FRAME; instead it prints BH_UPDATE_FRAME,
NORMAL_UPDATE_FRAME or MARKED_UPDATE_FRAME.
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
Gold apparently doesn't recognize `-z origin`; it seems to accept only `-zorigin`.
Authored-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
On win64 sizeof(long) != sizeof(void*), so debugTrace was casting a
value to a type of the wrong size, causing a validate failure.
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
An earlier patch fixes a bug in flushExec on Linux only. This
patch uses the fixed code on all operating systems.
Signed-off-by: Austin Seipp <austin@well-typed.com>
We now do the allocation of the blackhole indirection closure inside the
RTS procedure 'newCAF' instead of generating the allocation code inline
in the closure body of each CAF. This slightly decreases code size in
modules with a lot of CAFs.
As a result of this change, for example, the size of DynFlags.o drops by
~60KB and HsExpr.o by ~100KB.
Instead, just don't do anything on x86/amd64, and on !x86, use either A)
__clear_cache from libgcc, or B) sys_icache_invalidate for OS X (and
iOS.)
Signed-off-by: Austin Seipp <austin@well-typed.com>
Authored-by: Luke Iannini <lukexi@me.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Arash Rouhani <rarash@student.chalmers.se>
Reviewed-by: Austin Seipp <austin@well-typed.com>
This adds code for jumping to given addresses for ARM, written by Ben
Gamari.
However, when allocating new infotables for bytecode (which is where
this jump code occurs), we need to be sure to flush the cache on the
execute pointer returned from allocateExec(): on systems like ARM, the
processor won't reliably read back freshly written code or automatically
flush its caches, whereas x86 will.
So we add a new flushExec primitive to call out to GCC's
__builtin___clear_cache primitive, which generates the correct code
(nothing on x86, and a call to libgcc's __clear_cache on ARM), and make
sure we use it after writing the code out.
Authored-by: Ben Gamari <bgamari.foss@gmail.com>
Authored-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>