| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
pthread_join returns its error code and apparently doesn't set errno.
|
|
|
|
|
| |
Previously we would take all capabilities but fail to join on the thread
itself, potentially resulting in a leaked thread.
|
|
|
|
|
| |
Previously we would race on the cached processor count. Avoiding this is
straightforward; just use relaxed operations.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would report the number of physical processors, which
can be quite wrong in a containerized setting. Now we rather return how
many processors are in our affinity mask when possible.
I also refactored the code to prefer platform-specific since this will
report logical CPUs instead of physical (using
`machdep.cpu.thread_count` on Darwin and `cpuset_getaffinity` on FreeBSD).
Fixes #14781.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC 8 now generates warnings for incompatible function pointer casts
[-Werror=cast-function-type]. Apparently there are a few of those in rts
code, which makes `./validate` unhappy (since we compile with `-Werror`)
This commit tries to fix these issues by changing the functions to have
the correct type (and, if necessary, moving the casts into those
functions).
For instance, hash/comparison function are declared (`Hash.h`) to take
`StgWord` but we want to use `StgWord64[2]` in `StaticPtrTable.c`.
Instead of casting the function pointers, we can cast the `StgWord`
parameter to `StgWord*`. I think this should be ok since `StgWord`
should be the same size as a pointer.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4673
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #14725
Differential Revision: https://phabricator.haskell.org/D4346
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The C code in the RTS now gets built with `-Wundef` and the Haskell code
(stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever
`#if` is used on undefined identifiers.
Test Plan: Validate on Linux and Windows
Reviewers: austin, angerman, simonmar, bgamari, Phyx
Reviewed By: bgamari
Subscribers: thomie, snowleopard
Differential Revision: https://phabricator.haskell.org/D3278
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on DragonflyBSD
Reviewers: austin, erikd, simonmar
Reviewed By: erikd
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3480
|
|
|
|
|
|
|
|
| |
This is causing too much platform dependent breakage at the moment. We
will need a more rigorous testing strategy before this can be
merged again.
This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The C code in the RTS now gets built with `-Wundef` and the Haskell code
(stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever
`#if` is used on undefined identifiers.
Test Plan: Validate on Linux and Windows
Reviewers: austin, angerman, simonmar, bgamari, Phyx
Reviewed By: bgamari
Subscribers: thomie, snowleopard
Differential Revision: https://phabricator.haskell.org/D3278
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On iOS, we use the pthread-based implementation of Itimer.c even for a
non-threaded RTS. Since 999c464, this relies on synchronization
primitives like Mutex, so ensure those primitives are defined whenever
they are supported, even if !THREADED_RTS.
Fixes #12799.
Reviewers: erikd, austin, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2712
GHC Trac Issues: #12799
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to build arm64-apple-iso, the build fell over `strdup`, as
the arm64-apple-ios build did not fall into `darwin_HOST_OS`, and would
need `ios_HOST_OS`.
This diff tries to clean up PosixSource.h, instead of layering another
define on top.
As we use `strnlen` in sources that include PosixSource.h, and `strnlen`
is defined in POSIX.1-2008, the `_POSIX_C_SOURCE` and `_XOPEN_SOURCE`
are increased accordingly.
Furthermore the `_DARWIN_C_SOURCE` (required for `u_char`, etc. used in
sysctl.h) define is moved into `OSThreads.h` alongside a similar ifdef
for freebsd.
Test Plan: Build on all supported platforms.
Reviewers: austin, simonmar, erikd, kgardas, bgamari
Reviewed By: simonmar, erikd, kgardas, bgamari
Subscribers: Phyx, hvr, thomie
Differential Revision: https://phabricator.haskell.org/D2579
GHC Trac Issues: #12624
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Windows support for more than 64 logical processors are implemented
using processor groups.
Essentially what it's doing is keeping the existing maximum of 64
processors and keeping the affinity mask a 64 bit value, but adds an
hierarchy above that.
This support was added to Windows 7 and so we need to at runtime detect
if the APIs are still there due to our minimum supported version being
Windows Vista.
The Maximum number of groups supported at this time is 4, so 256 logical
cores. The group indices are 0 based. One thread can have affinity with
multiple groups.
See
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251.aspx
and particularly helpful is the whitepaper: 'Supporting Systems that
have more than 64 processors' at
https://msdn.microsoft.com/en-us/library/windows/hardware/dn653313.aspx
Processor groups are not guaranteed to be uniformly distributed nor
guaranteed to be filled before a next group is needed. The OS will
assign processors to groups based on physical proximity and will never
partially assign cores from one physical cpu to more than one group. If
one has two 48 core CPUs then you'd end up with two groups of 48 logical
cpus. Now add a 3rd CPU with 10 cores and the group it is assigned to
depends where the socket is on the board.
Test Plan:
./validate or make test -c . in the rts test folder.
This tests for regressions, to test this particular functionality
itself:
<program> +RTS -N -qa -RTS
Test is detailed in description.
Reviewers: bgamari, simonmar, austin, erikd
Reviewed By: simonmar
Subscribers: thomie, #ghc_windows_task_force
Differential Revision: https://phabricator.haskell.org/D2533
GHC Trac Issues: #11054
|
|
|
|
|
|
| |
This reverts commit cac3fb06f4b282eee21159c364c4d08e8fdedce9.
This breaks OS X and Windows.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to build arm64-apple-iso, the build fell over
`strdup`, as the arm64-apple-ios build did not fall into `darwin_HOST_OS`,
and would need `ios_HOST_OS`.
This diff tries to clean up PosixSource.h, instead of layering another
define on top.
As we use `strnlen` in sources that include PosixSource.h, and `strnlen`
is defined in POSIX.1-2008, the `_POSIX_C_SOURCE` and `_XOPEN_SOURCE`
are increased accordingly.
Furthermore the `_DARWIN_C_SOURCE` (required for `u_char`, etc. used in
sysctl.h) define is moved into `OSThreads.h` alongside a similar ifdef
for freebsd.
Test Plan: Build on all supported platforms.
Reviewers: rwbarton, erikd, austin, hvr, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2375
|
|
|
|
|
| |
- Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
- Add a test to tests/rts
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NUMA code was enabled whenever numa.h and numaif.h are detected.
Unfortunately, the hosts' header files were being detected even then
cross compiling in the absence of a target libnuma.
Fix that by relying on the the presence of libnuma instead of the
presence of the header files. The test for libnuma does `AC_TRY_LINK`
which will fail if the test program (compiled for the target) can't
be linked against libnuma.
Test Plan:
Build on x86_64/linux and make sure NUMA works and cross compile to
armhf/linux.
Reviewers: austin, bgamari, hvr, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2329
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
| |
Reviewed By: erikd, austin
Differential Revision: https://phabricator.haskell.org/D2082
GHC Trac Issues: #8594
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This takes care of setting feature test macros (i.e. let Autoconf decide when
those can be set safely) to allow subsequent Autoconf tests to better detect
available OS features.
This also includes a submodule update of unix which enables the use of
`AC_USE_SYSTEM_EXTENSIONS` in there as well.
Specifically, this takes care of setting `_GNU_SOURCE` (which allows to remove
two occurences where it's set manually) and `_ALL_SOURCE` (which fixes issues
on AIX).
See also
https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Posix-Variants.html
for details.
At some point we may want to reconsider the purpose of "rts/PosixSource.h" and
rely more on Autoconf instead.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If `pthread_setname_np` is not available, then a regular
./validate will fail due to warnings; the `name` parameter to
`createOSThread` becomes unused.
Signed-off-by: Austin Seipp <austin@well-typed.com>
Test Plan: iiam
Reviewers: simonmar, nomeata, jstolarek, hvr
Reviewed By: nomeata, jstolarek, hvr
Subscribers: nomeata, thomie, carter, ezyang, simonmar
Differential Revision: https://phabricator.haskell.org/D344
|
| |
|
|
|
|
|
| |
This helps identify threads in gdb particularly in processes with a
lot of threads.
|
|
|
|
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
|
|
|
|
|
|
|
|
| |
This will hopefully help ensure some basic consistency in the forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
|
|
|
|
| |
Based on a patch from Thorkil Naur.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On most platforms the userspace thread type (e.g. pthread_t) and kernel
thread id are different. Normally we don't care about kernel thread Ids,
but some system tools for tracing/profiling etc report kernel ids.
For example Solaris and OSX's DTrace and Linux's perf tool report kernel
thread ids. To be able to match these up with RTS's OSThread we need a
way to get at the kernel thread, so we add a new function for to do just
that (the implementation is system-dependent).
Additionally, strictly speaking the OSThreadId type, used as task ids,
is not a serialisable representation. On unix OSThreadId is a typedef for
pthread_t, but pthread_t is not guaranteed to be a numeric type.
Indeed on some systems pthread_t is a pointer and in principle it
could be a structure type. So we add another new function to get a
serialisable representation of an OSThreadId. This is only for use
in log files. We use the function to serialise an id of a task,
with the extra feature that it works in non-threaded builds
by always returning 1.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were keeping around the Task struct (216 bytes) for every worker we
ever created, even though we only keep a maximum of 6 workers per
Capability. These Task structs accumulate and cause a space leak in
programs that do lots of safe FFI calls; this patch frees the Task
struct as soon as a worker exits.
One reason we were keeping the Task structs around is because we print
out per-Task timing stats in +RTS -s, but that isn't terribly useful.
What is sometimes useful is knowing how *many* Tasks there were. So
now I'm printing a single-line summary, this is for the program in
TASKS: 2001 (1 bound, 31 peak workers (2000 total), using -N1)
So although we created 2k tasks overall, there were only 31 workers
active at any one time (which is exactly what we expect: the program
makes 30 safe FFI calls concurrently).
This also gives an indication of how many capabilities were being
used, which is handy if you use +RTS -N without an explicit number.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider this experimental for the time being. There are a lot of
things that could go wrong, but I've verified that at least it works
on the test cases we have.
I also did some API cleanups while I was here. Previously we had:
Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
but this API is particularly error-prone: if you forget to discard the
Capability * you passed in and use the return value instead, then
you're in for subtle bugs with +RTS -N later on. So I changed all
these functions to this form:
void rts_eval (/* inout */ Capability **cap,
/* in */ HaskellObj p,
/* out */ HaskellObj *ret)
It's much harder to use this version incorrectly, because you have to
pass the Capability in by reference.
|
|
|
|
|
|
|
|
| |
Fixes validate on amd64/Linux with:
SRC_CC_OPTS += -Wmissing-parameter-type
SRC_CC_OPTS += -Wold-style-declaration
SRC_CC_OPTS += -Wold-style-definition
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch that adds support for interruptible FFI calls in the form
of a new foreign import keyword 'interruptible', which can be used
instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe
FFI calls, except that the worker thread they run on may be interrupted.
Internally, it replaces BlockedOnCCall_NoUnblockEx with
BlockedOnCCall_Interruptible, and changes the behavior of the RTS
to not modify the TSO_ flags on the event of an FFI call from
a thread that was interruptible. It also modifies the bytecode
format for foreign call, adding an extra Word16 to indicate
interruptibility.
The semantics of interruption vary from platform to platform, but the
intent is that any blocking system calls are aborted with an error code.
This is most useful for making function calls to system library
functions that support interrupting. There is no support for pre-Vista
Windows.
There is a partner testsuite patch which adds several tests for this
functionality.
|
|
|
|
|
|
|
|
|
| |
- Implement missing functions for setting thread affinity and getting real
number of processors.
- It is available starting from 7.1-RELEASE, which includes a native support
for managing CPU sets.
- Add __BSD_VISIBLE, since it is required for certain types to be visible in
addition to POSIX & C99.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposinng publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do, this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
|
|
|
|
| |
getNumberOfProcessors() for MacOS X
|
|
|
|
|
| |
This checks if darwin_HOST_OS is defined and, if so, we call
sysctlbyname() on the "hw.ncpu" property to get the processor count.
|
| |
|
| |
|
| |
|
| |
|