| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While the thread ids had been changed to 64 bit words in
e57b7cc6d8b1222e0939d19c265b51d2c3c2b4c0 the return type of the foreign
import function used to retrieve these ids - namely
'GHC.Conc.Sync.getThreadId' - was never updated accordingly.
In order to fix that this function returns now a 'CUULong'.
In addition to that the types used in the thread labeling subsystem were
adjusted as well and several format strings were modified throughout the
whole RTS to display thread ids in a consistent and correct way.
Fixes #16761
|
|
|
|
|
|
| |
is used outside of the rts so we do this rather than just fish it out of
the repo in ad-hoc way, in order to make packages in this repo more
self-contained.
|
|
|
|
|
|
|
|
|
|
| |
We have a couple of places where the conditions in asserts depend on code
ifdefed out when DEBUG is off. I'd like to allow compiling assertions into
non-DEBUG RTSen so that won't do.
Currently if we remove the conditional around the definition of ASSERT()
the build will not actually work due to a deadlock caused by initMutex not
initializing mutexes with PTHREAD_MUTEX_ERRORCHECK because DEBUG is off.
|
|\ |
|
| |
| |
| |
| | |
Specifically, we need to hold all_tasks_mutex to read taskCount.
|
| | |
|
| |
| |
| |
| |
| | |
This shouldn't be necessary since only the owning thread of the capability
should be touching this.
|
|/
|
|
|
|
|
|
|
| |
The `rts_pause` and `rts_resume` functions have been added to `RtsAPI.h` and
allow an external process to completely pause and resume the RTS.
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: erikd, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5325
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC 8 now generates warnings for incompatible function pointer casts
[-Werror=cast-function-type]. Apparently there are a few of those in rts
code, which makes `./validate` unhappy (since we compile with `-Werror`)
This commit tries to fix these issues by changing the functions to have
the correct type (and, if necessary, moving the casts into those
functions).
For instance, hash/comparison function are declared (`Hash.h`) to take
`StgWord` but we want to use `StgWord64[2]` in `StaticPtrTable.c`.
Instead of casting the function pointers, we can cast the `StgWord`
parameter to `StgWord*`. I think this should be ok since `StgWord`
should be the same size as a pointer.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4673
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to this commit, worker OS thread were renamed to "ghc_worker" when
spawned. This was annoying when reading debugging messages that print
the process name because it doesn't tell you *which* Haskell program is
generating the message.
This commit changes it to "original_process_name:w", truncating the
original name to fit in the kernel buffer if neccesary.
Test Plan: ./validate
Reviewers: austin, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: Phyx, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4001
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`rts_setInCallCapability` sets the thread affinity as well as pins the
numa node. We should also have the ability to set the numa node without
setting the capability affinity. `rts_pinNumaNodeForCapability` function
is added and exported via `RtsAPI.h`.
Previous callers of `rts_setInCallCapability` should now also call
`rts_pinNumaNodeForCapability` to get the same effect as before.
Test Plan:
./validate
Reviewers: austin, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie, niteria
Differential Revision: https://phabricator.haskell.org/D2637
GHC Trac Issues: #12764
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a fast, non-blocking, asynchronous, interface to tryPutMVar that
can be called from C/C++.
It's useful for callback-based C/C++ APIs: the idea is that the callback
invokes hs_try_putmvar(), and the Haskell code waits for the callback to
run by blocking in takeMVar.
The callback doesn't block - this is often a requirement of
callback-based APIs. The callback wakes up the Haskell thread with
minimal overhead and no unnecessary context-switches.
There are a couple of benchmarks in
testsuite/tests/concurrent/should_run. Some example results comparing
hs_try_putmvar() with using a standard foreign export:
./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s
./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s
hs_try_putmvar() is 4x faster for this workload (see the source for
hs_try_putmvar003.hs for details of the workload).
An alternative solution is to use the IO Manager for this. We've tried
it, but there are problems with that approach:
* Need to create a new file descriptor for each callback
* The IO Manger thread(s) become a bottleneck
* More potential for things to go wrong, e.g. throwing an exception in
an IO Manager callback kills the IO Manager thread.
Test Plan: validate; new unit tests
Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2501
|
|
|
|
|
| |
- Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
- Add a test to tests/rts
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
| |
This allows an OS thread to specify which capability it should run on
when it makes a call into Haskell. It is intended for a fairly
specialised use case, when the client wants to have tighter control over
the mapping between OS threads and Capabilities - perhaps 1:1
correspondence, for example.
|
|
|
|
|
|
|
|
|
|
|
| |
On AIX, C system headers can redirect the token `stat` via
#define stat stat64
to provide large-file support. Simply avoiding the use of `stat` as an
identifier eschews macro-replacement.
Differential Revision: https://phabricator.haskell.org/D1566
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In #9261, there was some confusion about the meaning of the taskCount
stats variable in the rts.
It turns out that taskCount is not decremented when a worker task is
stopped (i.e. from workerTaskStop), but only when freeMyTask is called,
which frees the task bound to the current thread. So taskCount is the
current number of bound tasks + the total number of worker tasks.
This makes the calculation of the current number of bound tasks in
rts/Stats.c correct _as is_.
[skip ci]
Reviewed By: austin
Differential Revision: https://phabricator.haskell.org/D746
|
|
|
|
|
| |
This helps identify threads in gdb particularly in processes with a
lot of threads.
|
|
|
|
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
| |
This will hopefully help ensure some basic consistency in the forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: (for the same reason that we acquire all the other mutexes)
Test Plan: validate
Reviewers: simonmar, austin, duncan
Reviewed By: simonmar, austin, duncan
Subscribers: simonmar, relrod, carter
Differential Revision: https://phabricator.haskell.org/D60
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Documented in more detail inline with the change.
Test Plan: validate
Reviewers: austin, simonmar, duncan
Reviewed By: austin, simonmar, duncan
Subscribers: simonmar, relrod, carter
Differential Revision: https://phabricator.haskell.org/D59
|
|
|
|
| |
See documentation for details.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have various problems with reallocating the array of Capabilities,
due to threads in waitForReturnCapability that are already holding a
pointer to a Capability.
Rather than add more locking to make this safer, I decided it would be
easier to ensure that we never move the Capabilities at all. The
capabilities array is now an array of pointers to Capabaility. There
are extra indirections, but it rarely matters - we don't often access
Capabilities via the array, normally we already have a pointer to
one. I ran the parallel benchmarks and didn't see any difference.
|
|
|
|
|
|
|
| |
Reordering of includes in GC.c broke on OS X because gctKey is
declared in Task.h and is needed in the storage manager. This is
really the wrong place for it anyway, so I've moved the gctKey pieces
to where they should be.
|
|
|
|
| |
A companion ghc-events pachakge commit displays task ids in the same format.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com>
Use the new task tracing functions traceTaskCreate/Migrate/Delete.
There are two key places. One is for worker tasks which have a
relatively simple life cycle. Worker tasks are created and deleted by
the RTS. The other case is bound tasks which are either created by the
RTS, or appear as foreign C threads making calls into the RTS. For bound
threads we do the tracing in rts_lock/unlock, which actually covers both
threads coming in from outside, and also bound threads made by the RTS.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On most platforms the userspace thread type (e.g. pthread_t) and kernel
thread id are different. Normally we don't care about kernel thread Ids,
but some system tools for tracing/profiling etc report kernel ids.
For example Solaris and OSX's DTrace and Linux's perf tool report kernel
thread ids. To be able to match these up with RTS's OSThread we need a
way to get at the kernel thread, so we add a new function for to do just
that (the implementation is system-dependent).
Additionally, strictly speaking the OSThreadId type, used as task ids,
is not a serialisable representation. On unix OSThreadId is a typedef for
pthread_t, but pthread_t is not guaranteed to be a numeric type.
Indeed on some systems pthread_t is a pointer and in principle it
could be a structure type. So we add another new function to get a
serialisable representation of an OSThreadId. This is only for use
in log files. We use the function to serialise an id of a task,
with the extra feature that it works in non-threaded builds
by always returning 1.
|
|
|
|
|
|
| |
Mostly this meant getting pointer<->int conversions to use the right
sizes. lnat is now size_t, rather than unsigned long, as that seems a
better match for how it's used.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were keeping around the Task struct (216 bytes) for every worker we
ever created, even though we only keep a maximum of 6 workers per
Capability. These Task structs accumulate and cause a space leak in
programs that do lots of safe FFI calls; this patch frees the Task
struct as soon as a worker exits.
One reason we were keeping the Task structs around is because we print
out per-Task timing stats in +RTS -s, but that isn't terribly useful.
What is sometimes useful is knowing how *many* Tasks there were. So
now I'm printing a single-line summary, this is for the program in
TASKS: 2001 (1 bound, 31 peak workers (2000 total), using -N1)
So although we created 2k tasks overall, there were only 31 workers
active at any one time (which is exactly what we expect: the program
makes 30 safe FFI calls concurrently).
This also gives an indication of how many capabilities were being
used, which is handy if you use +RTS -N without an explicit number.
|
|
|
|
|
| |
At present the number of capabilities can only be *increased*, not
decreased. The latter presents a few more challenges!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Terminology cleanup: the type "Ticks" has been renamed "Time", which
is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
The terminology "tick" is now used consistently to mean the interval
between timer signals.
The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
we have it). Before it used CPU time in the non-threaded RTS and
realtime in the threaded RTS, but I've discovered that the CPU timer
has terrible resolution (at least on Linux) and isn't much use for
profiling. So now we always use realtime. This should also fix
The default tick interval is now 10ms, except when profiling where we
drop it to 1ms. This gives more accurate profiles without affecting
runtime too much (<1%).
Lots of cleanups - the resolution of Time is now in one place
only (Rts.h) rather than having calculations that depend on the
resolution scattered all over the RTS. I hope I found them all.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM does not support the __thread attribute for thread
local storage and may generate incorrect code for global
register variables. We want to allow building the runtime with
LLVM-based compilers such as llvm-gcc and clang,
particularly for MacOS.
This patch changes the gct variable used by the garbage
collector to use pthread_getspecific() for thread local
storage when an llvm based compiler is used to build the
runtime.
|
|
|
|
|
|
|
|
|
|
|
| |
Based on a patch from David Terei.
Some parts are a little ugly (e.g. defining things that only ASSERTs
use only when DEBUG is defined), so we might want to tweak things a
little.
I've also turned off -Werror for didn't-inline warnings, as we now
get a few such warnings.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a port of some of the changes from my private local-GC branch
(which is still in darcs, I haven't converted it to git yet). There
are a couple of small functional differences in the GC stats: first,
per-thread GC timings should now be more accurate, and secondly we now
report average and maximum pause times. e.g. from minimax +RTS -N8 -s:
Tot time (elapsed) Avg pause Max pause
Gen 0 2755 colls, 2754 par 13.16s 0.93s 0.0003s 0.0150s
Gen 1 769 colls, 769 par 3.71s 0.26s 0.0003s 0.0059s
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bug in this case was that we had a worker thread making a foreign
call which invoked a callback (in this case it was performGC, I
think). When the callback ended, boundTaskExiting() was setting
task->stopped, but the Task is now per-OS-thread, so it is shared by
the worker that made the original foreign call. When the foreign call
returned, because task->stopped was set, the worker was not placed on
the queue of spare workers. Somehow the worker woke up again, and
found the spare_workers queue empty, which lead to a crash.
Two bugs here: task->stopped should not have been set by
boundTaskExiting (this broke when I split the Task and InCall structs,
in 6.12.2), and releaseCapabilityAndQueueWorker() should not be
testing task->stopped anyway, because it should only ever be called
when task->stopped is false (this is now an assertion).
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch that adds support for interruptible FFI calls in the form
of a new foreign import keyword 'interruptible', which can be used
instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe
FFI calls, except that the worker thread they run on may be interrupted.
Internally, it replaces BlockedOnCCall_NoUnblockEx with
BlockedOnCCall_Interruptible, and changes the behavior of the RTS
to not modify the TSO_ flags on the event of an FFI call from
a thread that was interruptible. It also modifies the bytecode
format for foreign call, adding an extra Word16 to indicate
interruptibility.
The semantics of interruption vary from platform to platform, but the
intent is that any blocking system calls are aborted with an error code.
This is most useful for making function calls to system library
functions that support interrupting. There is no support for pre-Vista
Windows.
There is a partner testsuite patch which adds several tests for this
functionality.
|
|
|
|
| |
Which entailed fixing an incorrect #ifdef in Task.c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In GHC 6.12.x I found a rare deadlock caused by this
lock-order-reversal:
AQ cap->lock
startWorkerTask
newTask
AQ sched_mutex
scheduleCheckBlackHoles
AQ sched_mutex
unblockOne_
wakeupThreadOnCapabilty
AQ cap->lock
so sched_mutex and cap->lock are taken in a different order in two
places.
This doesn't happen in the HEAD because we don't have
scheduleCheckBlackHoles, but I thought it would be prudent to make
this less likely to happen in the future by using a different mutex in
newTask. We can clearly see that the all_tasks mutex cannot be
involved in a deadlock, becasue we never call anything else while
holding it.
|
|
|
|
|
| |
Broken by "Split part of the Task struct into a separate struct
InCall".
|
|
|
|
|
| |
Fixes a bug reported by Lennart Augustsson, whereby we could get an
incorrect error from the RTS about re-entry from a finalizer,
|