| Commit message (Collapse) | Author | Age | Files | Lines |
| |
All the wibbles seem to have cancelled out, and (non-debug) object sizes
are back to where they started.
I'm not 100% sure that the types are optimal, but at least now the
functions have types and we can fix them if necessary.
|
This has several advantages:
* It can be called from gdb
* There is more type information for the user, and type checking
for the compiler
* Less opportunity for things to go wrong, e.g. due to missing
parentheses or repeated execution
The sizes of the non-debug .o files haven't changed (other than
Inlines.o), so I'm pretty sure the compiled code is identical.
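The pitfalls listed above can be shown with a small sketch. SQUARE_MACRO, square_fn and next_value are hypothetical names made up for illustration; they are not the macros or functions touched by this commit.

```c
#include <stdio.h>

/* Hypothetical macro with two classic pitfalls: the argument is not
   parenthesised, and it is expanded (and so evaluated) twice. */
#define SQUARE_MACRO(x) (x * x)

/* The same operation as a typed inline function: the argument is
   evaluated exactly once, the compiler checks its type, and the
   function can be called by name from a debugger. */
static inline int square_fn(int x)
{
    return x * x;
}

/* Helper with a visible side effect, used to count evaluations. */
static int calls = 0;

static int next_value(void)
{
    calls++;
    return 3;
}
```

With these definitions, SQUARE_MACRO(1 + 2) expands to (1 + 2 * 1 + 2), and SQUARE_MACRO(next_value()) calls next_value() twice, whereas square_fn evaluates its argument once in both cases.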
|
The calculation should be done in one place, of course.
|
Submitted by: Markus Pfeiffer <markus.pfeiffer@morphism.de> on cvs-ghc
|
This came up since the addition of C finalizers, since Haskell
finalizers are already stored in an explicit list. C finalizers on
the other hand get a WEAK object each, so in order to run them in the
right order we have to make sure that list stays in the correct
order. I hate adding new invariants, but this is the quickest way to
fix the bug for now. A better way to fix it would be to have a single
WEAK object with a list of finalizers attached to it, and a primop
for adding finalizers to the list.
|
This fixes an unresolved symbols error when dynamically linking base.
|
The code for retainer profiling is used with e.g. +RTS -hc -hrfoo -RTS,
as well as with +RTS -hr -RTS.
|
Slightly modified version of a patch from Ben Collins <bcollins@ubuntu.com>
who did the final debugging that showed the segfault was being caused by
the memory protection mechanism.
Due to the requirement of "jump islands" to handle 24 bit relative jump
offsets, GHCi on PowerPC did not use mmap to load object files like the
other architectures. Instead, it allocated memory using malloc and used
fread to load the object code. However, there is a quirk in the GNU libc
malloc implementation. For memory regions over a certain size (dynamic
and configurable), malloc will use mmap to obtain the required memory
instead of sbrk, and malloc's call to mmap sets the memory readable and
writable, but not executable. That means when GHCi loads code into a
memory region that malloc obtained via mmap rather than sbrk and tries
to execute it, we get a segfault.
This solution drops the malloc/fread object loading in favour of using
mmap and then puts the jump island for each object code module at the
end of the mmapped region for that object.
This patch may also be a solution on other ELF based powerpc systems
but does not work on darwin-powerpc.
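A minimal sketch of the layout described above, assuming a POSIX system with anonymous mappings. The helper names (round_up_to_page, map_object_with_island, jump_island_start) are hypothetical and this is not the linker code from the patch, which maps the object file itself; the sketch only shows one executable region holding the object code with room for its jump island at the end.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round n up to a multiple of the system page size. */
static size_t round_up_to_page(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Hypothetical helper: map one region large enough for the object code
   followed by its jump island. The region is requested readable,
   writable and executable, which malloc does not do for its own mmap
   calls. Returns NULL on failure; on success *mapped_size receives the
   length to pass to munmap(). */
static void *map_object_with_island(size_t object_size, size_t island_size,
                                    size_t *mapped_size)
{
    size_t total = round_up_to_page(object_size + island_size);
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *mapped_size = total;
    return p;
}

/* The jump island starts right after the object code, inside the same
   mapping, so it stays close to the code that jumps through it. */
static void *jump_island_start(void *base, size_t object_size)
{
    return (char *)base + object_size;
}
```

Some hardened systems refuse mappings that are both writable and executable, so a real loader would need to check for NULL here and handle that case.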
|
This allows us to provide access to them in the base library.
|
The problem occurred when the idle GC was turned off with +RTS -I0.
Then the scheduler would go into the state ACTIVITY_DONE_GC directly
without doing a GC, and a subsequent GC would put it back to
ACTIVITY_YES but without turning the timer back on. Instead, if the GC
finds the state is ACTIVITY_DONE_GC, it should leave it there.
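The state handling described above can be modelled in a few lines. Only the names ACTIVITY_YES and ACTIVITY_DONE_GC come from the message; the SchedModel struct, the timer flag and the helper functions are assumptions made for illustration, including the assumption that the timer is stopped when the scheduler enters ACTIVITY_DONE_GC.

```c
/* Simplified model: the real scheduler has more states than these two. */
typedef enum { ACTIVITY_YES, ACTIVITY_DONE_GC } Activity;

typedef struct {
    Activity activity;
    int timer_running;      /* 1 while the timer is ticking */
} SchedModel;

/* Idle path with the idle GC disabled: the scheduler goes straight to
   ACTIVITY_DONE_GC without doing a GC (timer assumed stopped here). */
static void go_idle(SchedModel *s)
{
    s->activity = ACTIVITY_DONE_GC;
    s->timer_running = 0;
}

/* Old behaviour: a later GC always resets the state to ACTIVITY_YES,
   but nothing restarts the timer. */
static void after_gc_buggy(SchedModel *s)
{
    s->activity = ACTIVITY_YES;
}

/* Fixed behaviour: if the state is ACTIVITY_DONE_GC, leave it there. */
static void after_gc_fixed(SchedModel *s)
{
    if (s->activity != ACTIVITY_DONE_GC)
        s->activity = ACTIVITY_YES;
}

/* The invariant the bug violated: ACTIVITY_YES implies a running timer. */
static int is_consistent(const SchedModel *s)
{
    return !(s->activity == ACTIVITY_YES && !s->timer_running);
}
```

In this model, go_idle followed by after_gc_buggy leaves the state at ACTIVITY_YES with the timer stopped, while after_gc_fixed keeps the state at ACTIVITY_DONE_GC.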
|
A companion ghc-events package commit displays task ids in the same format.
|
The tid argument was missing
|
This patch will need to be tested by someone on OSX.
Fixed a couple of wrong names:
CapsetID vs EventCapsetID
gc__sync vs gc__global__sync
|
Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com>
Use the new task tracing functions traceTaskCreate/Migrate/Delete.
There are two key places. One is for worker tasks which have a
relatively simple life cycle. Worker tasks are created and deleted by
the RTS. The other case is bound tasks which are either created by the
RTS, or appear as foreign C threads making calls into the RTS. For bound
threads we do the tracing in rts_lock/unlock, which actually covers both
threads coming in from outside, and also bound threads made by the RTS.
|
Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com>
These new eventlog events are to let profiling tools keep track of all
the OS threads that belong to an RTS capability at any moment in time.
In the RTS, OS threads correspond to the Task abstraction, so that is
what we track. There are events for tasks being created, migrated
between capabilities and deleted. In particular the task creation event
also records the kernel thread id which lets us match up the OS thread
with data collected by other tools (in the initial use case with
Linux's perf tool, but in principle also with DTrace).
|
On most platforms the userspace thread type (e.g. pthread_t) and kernel
thread id are different. Normally we don't care about kernel thread Ids,
but some system tools for tracing/profiling etc report kernel ids.
For example Solaris and OSX's DTrace and Linux's perf tool report kernel
thread ids. To be able to match these up with the RTS's OSThread we need a
way to get at the kernel thread id, so we add a new function to do just
that (the implementation is system-dependent).
Additionally, strictly speaking the OSThreadId type, used as task ids,
is not a serialisable representation. On Unix OSThreadId is a typedef for
pthread_t, but pthread_t is not guaranteed to be a numeric type.
Indeed on some systems pthread_t is a pointer and in principle it
could be a structure type. So we add another new function to get a
serialisable representation of an OSThreadId. This is only for use
in log files. We use the function to serialise an id of a task,
with the extra feature that it works in non-threaded builds
by always returning 1.
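A rough Linux-only sketch of the two functions described above. The names kernel_thread_id and serialisable_task_id and the MODEL_THREADED switch are hypothetical, not the RTS's own; the threaded branch also assumes pthread_t is an integer or pointer type, which holds on Linux but is not guaranteed in general.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifdef MODEL_THREADED
#include <pthread.h>
#endif

/* Kernel thread id of the calling thread, as reported by tools such as
   perf. Linux-specific: other systems need a different implementation. */
static uint64_t kernel_thread_id(void)
{
    return (uint64_t)syscall(SYS_gettid);
}

/* A number suitable for writing to a log file. In the non-threaded
   model there is only one task, so the id is always 1. */
static uint64_t serialisable_task_id(void)
{
#ifdef MODEL_THREADED
    /* Assumes pthread_t converts to an integer; this would not compile
       where pthread_t is a structure type. */
    return (uint64_t)(uintptr_t)pthread_self();
#else
    return 1;
#endif
}
```

Compiled without MODEL_THREADED, serialisable_task_id() returns 1, matching the non-threaded behaviour described in the message.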
|
You can get it with +RTS -P, as with the other systemish cost centres
like "GC".
|
The clearNurseries() operation resets the free pointer in each nursery
block to the start of the block, emptying the nursery. In the
parallel GC this was done on the main GC thread, but that's bad
because it accesses the bdescr of every nursery block, and moves all
those cache lines onto the CPU of the main GC thread. With large
nurseries, this can be especially bad. So instead we want to clear
each nursery in its local GC thread.
Thanks to Andreas Voellmy <andreas.voellmy@gmail.com> for identifying
the issue.
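The idea can be sketched with simplified data structures. Block, Nursery and the two functions below are hypothetical stand-ins, not the RTS's bdescr or its clearNurseries() code; the sketch only shows that clearing a nursery means resetting each block's free pointer, and that each GC thread touches only its own nursery's descriptors.

```c
#include <stddef.h>

/* Simplified stand-in for a block descriptor. */
typedef struct Block {
    char *start;            /* first byte of the block */
    char *free;             /* next free byte in the block */
    struct Block *link;     /* next block in the nursery */
} Block;

typedef struct {
    Block *blocks;          /* chain of blocks making up this nursery */
} Nursery;

/* Reset the free pointer of every block to the start of the block,
   which empties the nursery. */
static void clear_nursery(Nursery *n)
{
    for (Block *b = n->blocks; b != NULL; b = b->link)
        b->free = b->start;
}

/* Run by GC thread number `me`: it clears only its local nursery, so
   the descriptors it writes stay in that CPU's cache instead of all
   being pulled onto the main GC thread's CPU. */
static void gc_thread_clear_local(Nursery *nurseries, size_t me)
{
    clear_nursery(&nurseries[me]);
}
```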
After this change and the previous patch to make the last GC a major
one, I see these results for nofib/parallel on 8 cores:
Program              Size    Allocs   Runtime   Elapsed  TotalMem
-----------------------------------------------------------------
blackscholes        +0.0%     +0.0%     -3.7%     -3.3%     +0.3%
coins               +0.0%     +0.0%     -5.1%     -5.0%     +0.4%
gray                +0.0%     +0.0%     -4.5%     -2.1%     +0.8%
mandel              +0.0%     -0.0%     -7.6%     -5.1%     -2.3%
matmult             +0.0%     +5.5%     -2.8%     -1.9%     -5.8%
minimax             +0.0%     +0.0%    -10.6%    -10.5%     +0.0%
nbody               +0.0%     -4.4%     +0.0%      0.07     +0.0%
parfib              +0.0%     +1.0%     +0.5%     +0.9%     +0.0%
partree             +0.0%     +0.0%     -2.4%     -2.5%     +1.7%
prsa                +0.0%     -0.2%     +1.8%     +4.2%     +0.0%
queens              +0.0%     -0.0%     -1.8%     -1.4%     -4.8%
ray                 +0.0%     -0.6%    -18.5%    -17.8%     +0.0%
sumeuler            +0.0%     -0.0%     -3.7%     -3.7%     +0.0%
transclos           +0.0%     -0.0%    -25.7%    -26.6%     +0.0%
-----------------------------------------------------------------
Min                 +0.0%     -4.4%    -25.7%    -26.6%     -5.8%
Max                 +0.0%     +5.5%     +1.8%     +4.2%     +1.7%
Geometric Mean      +0.0%     +0.1%     -6.3%     -6.1%     -0.7%
|
We do a final GC before shutting down the system, to clean up.
However, we were doing an ordinary GC rather than forcing a major GC,
so especially when the allocation area is large, this final GC could
be expensive. This is really just a bug - the final GC should have
virtually nothing to do, because there is nothing live.
|
Patch by Samuel Thibault.
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=659530.
|
It uses native 64-bit instructions instead of these, despite having
32-bit pointers.
|