| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to come from allocating node-local
memory, so programs using a large -A value (as here) should benefit most.
According to perf, on this program the number of remote memory accesses
was reduced by more than 50% by using `--numa`.
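The capability-to-node assignment (cap C on node C%N) and the per-node free lists described above can be sketched roughly as follows. This is a minimal illustration, not the RTS code: `MAX_NUMA_NODES`, `block`, `cap_to_node`, `free_block` and `alloc_block` are hypothetical names, and a real allocator would also have to obtain node-local memory from the OS.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMA_NODES 16            /* hypothetical upper bound */

typedef struct block {               /* stand-in for a block descriptor */
    struct block *next;
} block;

/* One free list per NUMA node instead of a single global list. */
static block *free_list[MAX_NUMA_NODES];

/* Capability C is assigned to node C % N. */
unsigned cap_to_node(unsigned cap, unsigned n_nodes)
{
    assert(n_nodes > 0 && n_nodes <= MAX_NUMA_NODES);
    return cap % n_nodes;
}

/* Return a block to the free list of the node it belongs to. */
void free_block(block *b, unsigned node)
{
    assert(node < MAX_NUMA_NODES);
    b->next = free_list[node];
    free_list[node] = b;
}

/* Take a block from the given node's free list, or NULL if it is empty. */
block *alloc_block(unsigned node)
{
    assert(node < MAX_NUMA_NODES);
    block *b = free_list[node];
    if (b != NULL) {
        free_list[node] = b->next;
    }
    return b;
}
```

With N = 2, capabilities 0, 2, 4, ... map to node 0 and 1, 3, 5, ... map to node 1, so a nursery for capability 5 would be served from node 1's list.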
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
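A rough sketch of what keeping a deprecated alias can look like, assuming GCC or Clang (the attribute syntax is compiler-specific) and assuming new code switches to a fixed-width type such as `uint32_t`. The helper `next_index` is hypothetical and only shows the replacement type in use.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Old alias kept for client code. Declaring it is silent; using it
   produces a deprecation warning on GCC and Clang. */
typedef unsigned int nat __attribute__((deprecated));

/* New code states the width it needs explicitly. */
uint32_t next_index(uint32_t i, uint32_t size)
{
    assert(size > 0);
    return (i + 1) % size;   /* unsigned arithmetic wraps, never overflows */
}
```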
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
| |
x86 and x64
Summary:
On Windows, the default action for things like division by zero and
segfaults is to pop up a Dr. Watson error reporting dialog if the exception
is unhandled by the user code.
This is a pain when we are SSHed into a Windows machine, or when we
want to debug a problem with gdb (gdb will get a first and second chance to
handle the exception, but if it doesn't the pop-up will show).
veh_excn provides two macros, `BEGIN_CATCH` and `END_CATCH`, which
will catch such exceptions in the entire process and die by
printing a message and calling `stg_exit(1)`.
Previously this code was handled using SEH (Structured Exception Handlers);
however, each compiler and platform has a different way of dealing with SEH.
`MSVC` compilers have the keywords `__try`, `__except` and `__finally` to have the
compiler generate the appropriate SEH handler code for you.
`MinGW` compilers have no such keywords and require you to set the SEH handlers
manually; however, because SEH is implemented differently on x86 and x64,
the methods for using them in GCC differ.
`x86`: SEH is based on the stack; the SEH handlers are available at `FS[0]`.
On startup one would only need to add a new handler there. This has
a number of issues: handlers are hard to share and the mechanism can be exploited.
`x64`: To fix the issues with the way SEH worked on x86, on x64 SEH handlers
are statically compiled and added to the .pdata section by the compiler.
Instead of being thread-global they can now be image-global, since you have to
specify the `RVA` of the region of code that the handlers govern.
On x64 you can allocate SEH handlers dynamically, but it seems (based on
experimentation, as this is very under-documented) that the dynamic calls cannot override
static SEH handlers in the .pdata section.
Because of this, and because GHC no longer needs to support Windows versions older than XP, the better
alternative for handling errors is VEH, which was introduced in XP.
The bonus is that, because VEH (Vectored Exception Handlers) is a runtime construct, the API
is the same for both x86 and x64 (note that the Context object does contain CPU-specific
structures) and the calls are the same across compilers, which means this file can be
simplified quite a bit.
Using VEH also means we don't have to worry about the dynamic code generated by GHCi.
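As a rough illustration of the VEH approach, the sketch below registers a process-wide handler with the Win32 calls `AddVectoredExceptionHandler` and `RemoveVectoredExceptionHandler`. It is not the code in veh_excn: `describe_exception`, `on_exception`, `install_handler` and `remove_handler` are hypothetical names, and plain `exit(1)` stands in for `stg_exit(1)`. The status-code values are repeated for non-Windows builds only so that the portable part compiles everywhere.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Win32 status code values, defined here only so that the portable
   part of this sketch compiles on other platforms. */
#define EXCEPTION_ACCESS_VIOLATION   0xC0000005UL
#define EXCEPTION_INT_DIVIDE_BY_ZERO 0xC0000094UL
#endif

/* Map the exception codes we want to die on to a message;
   NULL means "not ours, let other handlers see it". */
const char *describe_exception(unsigned long code)
{
    switch (code) {
    case EXCEPTION_INT_DIVIDE_BY_ZERO: return "divide by zero";
    case EXCEPTION_ACCESS_VIOLATION:   return "segmentation fault";
    default:                           return NULL;
    }
}

#ifdef _WIN32
/* The same handler signature is used on x86 and x64. */
static LONG CALLBACK on_exception(PEXCEPTION_POINTERS info)
{
    const char *what =
        describe_exception(info->ExceptionRecord->ExceptionCode);
    if (what == NULL) {
        return EXCEPTION_CONTINUE_SEARCH;
    }
    fprintf(stderr, "fatal: %s\n", what);
    exit(1);                 /* stand-in for stg_exit(1) */
}

/* Register the handler at the front of the handler chain. */
void *install_handler(void)
{
    return AddVectoredExceptionHandler(1, on_exception);
}

void remove_handler(void *handle)
{
    assert(handle != NULL);
    RemoveVectoredExceptionHandler(handle);
}
#endif
```

A `BEGIN_CATCH`/`END_CATCH` pair could then be thought of as wrapping `install_handler` and `remove_handler` around the code to protect, though how the real macros are defined is not shown here.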
Test Plan:
Prior to this diff the tests for `derefnull` and `divbyzero` seem to have been disabled for Windows.
To reproduce the issue on x64:
1) open ghci
2) import GHC.Base
3) run: 1 `divInt` 0
which should lead to ghci crashing and a Watson error box being displayed.
After applying the patch, run:
make TEST="derefnull divbyzero"
on both x64 and x86 builds of ghc to verify the fix.
Reviewers: simonmar, austin
Reviewed By: austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D691
GHC Trac Issues: #6079
| |
This helps identify threads in gdb particularly in processes with a
lot of threads.
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
| |
This will hopefully help ensure some basic consistency going forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
| |
GlobalMemoryStatusEx actually requires _WIN32_WINNT to be defined as
0x0501 (Windows XP) for availability.
For completeness, I bumped _WIN32_WINNT in Ticker and OSThreads as well.
Signed-off-by: Austin Seipp <austin@well-typed.com>
| |
On most platforms the userspace thread type (e.g. pthread_t) and kernel
thread id are different. Normally we don't care about kernel thread ids,
but some system tools for tracing/profiling etc. report kernel ids.
For example, Solaris and OSX's DTrace and Linux's perf tool report kernel
thread ids. To be able to match these up with the RTS's OSThread we need a
way to get at the kernel thread id, so we add a new function to do just
that (the implementation is system-dependent).
Additionally, strictly speaking the OSThreadId type, used as task ids,
is not a serialisable representation. On unix OSThreadId is a typedef for
pthread_t, but pthread_t is not guaranteed to be a numeric type.
Indeed on some systems pthread_t is a pointer and in principle it
could be a structure type. So we add another new function to get a
serialisable representation of an OSThreadId. This is only for use
in log files. We use the function to serialise an id of a task,
with the extra feature that it works in non-threaded builds
by always returning 1.
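The two functions described above might look roughly like this on Linux. This is a sketch under assumptions: the names `kernel_thread_id` and `serialisable_task_id` and the `SKETCH_THREADED` flag are hypothetical, only a Linux implementation is given, and reusing the kernel id as the serialisable id in a threaded build is a choice made for this sketch, not something the commit states.

```c
#define _GNU_SOURCE            /* for syscall() on glibc */
#include <assert.h>
#include <stdint.h>

#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Hypothetical name: the kernel's id for the calling thread, or 0 on
   platforms this sketch does not cover. */
uint64_t kernel_thread_id(void)
{
#if defined(__linux__)
    long tid = syscall(SYS_gettid);
    assert(tid > 0);
    return (uint64_t) tid;
#else
    return 0;
#endif
}

/* Hypothetical name: an integer that is safe to write to a log file,
   unlike pthread_t, which may be a pointer or a structure.
   SKETCH_THREADED is an assumed build flag, not a real RTS macro. */
uint64_t serialisable_task_id(void)
{
#if defined(SKETCH_THREADED)
    return kernel_thread_id();
#else
    return 1;   /* non-threaded build: there is only one task */
#endif
}
```

The value from `kernel_thread_id` is the one that tools such as perf print, which is what makes it possible to line their output up with RTS logs.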
| |
Also, use the Win32 API (CreateThread) instead of the CRT API
(_beginthreadex) for thread creation.
| |
This patch adds support for interruptible FFI calls in the form
of a new foreign import keyword 'interruptible', which can be used
instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe
FFI calls, except that the worker thread they run on may be interrupted.
Internally, it replaces BlockedOnCCall_NoUnblockEx with
BlockedOnCCall_Interruptible, and changes the behavior of the RTS
to not modify the TSO_ flags in the event of an FFI call from
a thread that was interruptible. It also modifies the bytecode
format for foreign calls, adding an extra Word16 to indicate
interruptibility.
The semantics of interruption vary from platform to platform, but the
intent is that any blocking system calls are aborted with an error code.
This is most useful for making function calls to system library
functions that support interrupting. There is no support for pre-Vista
Windows.
There is a partner testsuite patch which adds several tests for this
functionality.
| |
Somebody needs to implement getNumberOfProcessors() for MacOS X;
currently it will return 1.
| |
The C-- parser was missing the "stdcall" calling convention for
foreign calls, but once added we can call {Enter,Leave}CriticalSection
directly.
| |
When calling EnterCriticalSection and LeaveCriticalSection from C--
code, we go via wrappers which use ccall (rather than stdcall).
| |
It seems that when a program exits with open DLLs on Windows, the
system attempts to shut down the DLLs, but it also terminates (some
of?) the running threads. The RTS isn't prepared for threads to die
unexpectedly, so it sits around waiting for its workers to finish.
This bites in two places: ShutdownIOManager() in the unthreaded
RTS, and shutdownCapability() in the threaded RTS. So far I've
modified the latter to notice when worker threads have died
unexpectedly and continue shutting down. It seems a bit trickier to
fix the unthreaded RTS, so for now the workaround for #926 is to use
the threaded RTS.
|
Most of the other users of the fptools build system have migrated to
Cabal, and with the move to darcs we can now flatten the source tree
without losing history, so here goes.
The main change is that the ghc/ subdir is gone, and most of what it
contained is now at the top level. The build system now makes no
pretense at being multi-project; it is just the GHC build system.
No doubt this will break many things, and there will be a period of
instability while we fix the dependencies. A straightforward build
should work, but I haven't yet fixed binary/source distributions.
Changes to the Building Guide will follow, too.
|