| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
We already have a function to go from time to ms so use it.
Also expand on the state of timer resolution.
|
| |
|
|
|
|
| |
This will help track down #17035.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
These are unexploded minds as far as the linter is concerned. I don't
want to hit in my MRs by mistake!
I did this with `sed`, and then rolled back some changes in the docs,
config.guess, and the linter itself.
|
| |
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The megablock allocator does not currently check that after aligning the
free region if it still has enough space to actually do the allocation.
This causes it to return a memory region which it didn't fully allocate
itself. Even worse, it can cause it to return a block with a region
that will be present in two allocation pools.
This causes if you're lucky an error from the OS that you're committing
memory that has never been reserved, or causes random heap corruption.
This change makes it consider the alignment as well.
Test Plan: ./validate , testcase testmblockalloc
Reviewers: bgamari, erikd, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5363
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix a number of issues that have broken the 32 bit build.
This makes it build again.
Test Plan: ./validate
Reviewers: hvr, goldfire, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4691
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit b2ff5dde399cd012218578945ada1d9ff68daa35 "Fix #15038"
added new stable closure 'absentSumFieldError_closure' to
base package. This closure is used in rts package.
Unfortunately the symbol was not explicitly exported and build
failed on windows as:
```
"inplace/bin/ghc-stage1" -o ...hsc2hs.exe ...
rts/dist/build/libHSrts.a(RtsStartup.o): In function `hs_init_ghc':
rts/RtsStartup.c:272:0: error:
undefined reference to `base_ControlziExceptionziBase_absentSumFieldError_closure'
|
272 | getStablePtr((StgPtr)absentSumFieldError_closure);
| ^
```
This change adds 'absentSumFieldError_closure' to explicit export
into libHSbase.def.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate, run program with `+RTS --numa` without libnuma
support compiled in
Reviewers: erikd, simonmar
Subscribers: thomie, carter
GHC Trac Issues: #14956
Differential Revision: https://phabricator.haskell.org/D4556
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* osNumaNodes now returns the right number of nodes
* thread affinity is now correctly set
TODO: no noticeable performance improvement.
does windows already distribute threads in a NUMA-aware fashion?
Test Plan:
* validate
* local tests on a NUMA machine
Reviewers: bgamari, erikd, simonmar
Reviewed By: bgamari, simonmar
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4607
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's no-op on all platforms
Reviewers: bgamari, simonmar, erikd, dfeuer
Reviewed By: dfeuer
Subscribers: dfeuer, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4588
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4450
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Adds quick-cross-ncg flavour.
- Fix windows wchar with `_s` for mingw
- Lookup windres, dllwrap and objdump
- Fix type.
Reviewers: bgamari, hvr, Phyx, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, erikd, carter
Differential Revision: https://phabricator.haskell.org/D4430
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: bgamari, erikd, simonmar
Reviewed By: bgamari, simonmar
Subscribers: Phyx, rwbarton, thomie, carter
GHC Trac Issues: #11777
Differential Revision: https://phabricator.haskell.org/D4428
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #14725
Differential Revision: https://phabricator.haskell.org/D4346
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes the testsuite pass clean on Windows again.
It also fixes the `libstdc++-6.dll` error harbormaster
was showing.
I'm marking some tests as isolated tests to reduce their
flakiness (mostly concurrency tests) when the test system
is under heavy load.
Updates process submodule.
Test Plan: ./validate
Reviewers: hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4277
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The failure is visible when we build a cross-compiler
from linux to mingw32 as:
```
$ ./configure --host=x86_64-pc-linux-gnu \
--target=x86_64-w64-mingw32
$ make
rts/linker/PEi386.c:159:10: error:
fatal error: Psapi.h: No such file or directory
#include <Psapi.h>
^~~~~~~~~
|
159 | #include <Psapi.h>
| ^
```
The problem here is case-sensitive linux filesystem. On windows
it does not matter what case is used for includes and libraries.
mingw32 provides all libraries and headers lowercase. This change
fixes case for <dbghelp.h>, <psapi.h>, -ldbghelp, -lpsapi.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewers: bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4247
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate
Reviewers: Phyx, austin, erikd, simonmar
Reviewed By: Phyx
Subscribers: rwbarton, thomie
GHC Trac Issues: #14415
Differential Revision: https://phabricator.haskell.org/D4151
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds the ability to generate stack traces on crashes for Windows.
When running in the interpreter this attempts to use symbol information from
the interpreter and information we know about the loaded object files to
resolve addresses to symbols.
When running compiled it doesn't have this information and then defaults
to using symbol information from PDB files. Which for now means only
files compiled with ICC or MSVC will show traces compiled.
But I have a future patch that may address this shortcoming.
Also since I don't know how to walk a pure haskell stack, I can for now
only show the last entry. I'm hoping to figure out how Apply.cmm works to
be able to walk the stalk and give more entries for pure haskell code.
In GHCi
```
$ echo main | inplace/bin/ghc-stage2.exe --interactive ./testsuite/tests/rts/derefnull.hs
GHCi, version 8.3.20170830: http://www.haskell.org/ghc/ :? for help
Ok, 1 module loaded.
Prelude Main>
Access violation in generated code when reading 0x0
Attempting to reconstruct a stack trace...
Frame Code address
* 0x77cde10 0xc370229 E:\..\base\dist-install\build\HSbase-4.10.0.0.o+0x190031
(base_ForeignziStorable_zdfStorableInt4_info+0x3f)
```
and compiled
```
Access violation in generated code when reading 0x0
Attempting to reconstruct a stack trace...
Frame Code address
* 0xf0dbd0 0x40bb01 E:\..\rts\derefnull.run\derefnull.exe+0xbb01
```
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3913
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: austin, erikd, simonmar
Reviewed By: simonmar
Subscribers: pacak, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4068
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's often hard to debug things like segfaults on Windows,
mostly because gdb isn't always of use and users don't know
how to effectively use it.
This patch provides a way to create a crash drump by passing
`+RTS --generate-crash-dumps` as an option. If any unhandled
exception is triggered a dump is made that contains enough
information to be able to diagnose things successfully.
Currently the created dumps are a bit big because I include
all registers, code and threads information.
This looks like
```
$ testsuite/tests/rts/derefnull.run/derefnull.exe +RTS
--generate-crash-dumps
Access violation in generated code when reading 0000000000000000
Crash dump created. Dump written to:
E:\msys64\tmp\ghc-20170901-220250-11216-16628.dmp
```
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar
Reviewed By: bgamari, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3912
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Knowing this is important for followup commits, where we will subtract
getProcessElapsedTime() values from each other, in a way that assumes
that there is no wrapping every 49 days.
Reviewers: bgamari, austin, erikd, simonmar, NicolasT
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #14233
Differential Revision: https://phabricator.haskell.org/D3964
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Exception handling on Windows is unfortunately a bit complicated.
But essentially the VEH Handlers we currently have are running too
early.
This was a problem as it ran so early it also swallowed C++ exceptions
and other software exceptions which the system could have very well
recovered from.
So instead we use a sequence of chains to for the exception handlers to
run as late as possible. You really can't get any later than this.
Please read the comment in the patch for more details.
I'm also providing a switch to allow people to turn off the exception
handling entirely. In case it does present a problem with their code.
(Reverted and recommitted to fix authorship information)
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13911, #12110
Differential Revision: https://phabricator.haskell.org/D3911
|
|
|
|
|
|
| |
Reverting to fix authorship of commit.
This reverts commit 1825cbdbdf08ed4bd6fd6794852596078953298a.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Exception handling on Windows is unfortunately a bit complicated.
But essentially the VEH Handlers we currently have are running too
early.
This was a problem as it ran so early it also swallowed C++ exceptions
and other software exceptions which the system could have very well
recovered from.
So instead we use a sequence of chains to for the exception handlers to
run as late as possible. You really can't get any later than this.
Please read the comment in the patch for more details.
I'm also providing a switch to allow people to turn off the exception
handling entirely. In case it does present a problem with their code.
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13911, #12110
Differential Revision: https://phabricator.haskell.org/D3911
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
compiler/ghc.mk
@echo "#define $(HostArch_CPP)_HOST_ARCH 1" >> $@
@echo "#define $(TargetArch_CPP)_HOST_ARCH 1" >> $@
this leads to warnigns like:
> warning: 'x86_64_HOST_ARCH' is not defined, evaluates to 0 [-Wundef]
Reviewers: austin, bgamari, erikd, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3555
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This both says what we mean and silences a bunch of spurious CPP linting
warnings. This pragma is supported by all CPP implementations which we
support.
Reviewers: austin, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3482
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While debugging an unrelated issue I noticed that we leak a
TimerQueueTimer on exit since we don't necessarily call stopTicker
before exitTicker. Fix this.
Test Plan: Validate on Windows
Reviewers: simonmar, austin, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3477
|
|
|
|
| |
This will make it a bit easier to maintain consistent output in the testsuite.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[skip ci]
There ware some old file names (.lhs, ...) at comments.
* rts/win32/ThrIOManager.c
- Conc.lhs -> Conc.hs
* rts/PrimOps.cmm
- ByteCodeLink.lhs -> ByteCodeLink.hs
- StgMiscClosures.hc -> StgMiscClosures.cmm
* rts/AutoApply.h
- AutoApply.hc -> AutoApply.cmm
* rts/HeapStackCheck.cmm
- PrimOps.hc -> PrimOps.cmm
* rts/LdvProfile.h
- Updates.hc -> Updates.cmm
* rts/Schedule.c
- StgStartup.hc -> StgStartup.cmm
* rts/Weak.c
- StgMiscClosures.hc -> StgMiscClosures.cmm
Reviewers: bgamari, austin, erikd, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3075
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch is courtesy of @awson.
Currently, whenever GHC catches a segfault on Windows, it simply reports the
somewhat uninformative message
`Segmentation fault/access violation in generated code`. This patch adds to
the message the type of violation (read/write/dep) and location information,
which should help debugging segfaults in the future.
Fixes #13108.
Test Plan: Build on Windows
Reviewers: austin, erikd, bgamari, simonmar, Phyx
Reviewed By: bgamari, Phyx
Subscribers: awson, thomie, #ghc_windows_task_force
Differential Revision: https://phabricator.haskell.org/D2969
GHC Trac Issues: #13108
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code has been broken on 64-bit systems for some time: the length
and timeout arguments of `addIORequest` and `addDelayRequest`,
respectively, were declared as `int`. However, they were passed Haskell
integers from their respective primops. Integer overflow and madness
ensued. This resulted in #7325 and who knows what else.
Also, there were a few left-over `BOOL`s in here which were not passed
to Windows system calls; these were changed to C99 `bool`s.
However, there is still a bit of signedness inconsistency within the
`delay#` call-chain,
* `GHC.Conc.IO.threadDelay` and the `delay#` primop accept `Int`
arguments
* The `delay#` implementation in `PrimOps.cmm` expects the timeout as
a `W_`
* `AsyncIO.c:addDelayRequest` expects an `HsInt` (was `int` prior to
this patch)
* `IOManager.c:AddDelayRequest` expects an `HsInt`` (was `int`)
* The Windows `Sleep` function expects a `DWORD` (which is unsigned)
Test Plan: Validate on Windows
Reviewers: erikd, austin, simonmar, Phyx
Reviewed By: Phyx
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2861
GHC Trac Issues: #7325
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit makes various improvements and addresses some issues with
Compact Regions (aka Compact Normal Forms).
This was the most important thing I wanted to fix. Compaction
previously prevented GC from running until it was complete, which
would be a problem in a multicore setting. Now, we compact using a
hand-written Cmm routine that can be interrupted at any point. When a
GC is triggered during a sharing-enabled compaction, the GC has to
traverse and update the hash table, so this hash table is now stored
in the StgCompactNFData object.
Previously, compaction consisted of a deepseq using the NFData class,
followed by a traversal in C code to copy the data. This is now done
in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
longer use the NFData instances, instead the Cmm routine evaluates
components directly as it compacts.
The new compaction is about 50% faster than the old one with no
sharing, and a little faster on average with sharing (the cost of the
hash table dominates when we're doing sharing).
Static objects that don't (transitively) refer to any CAFs don't need
to be copied into the compact region. In particular this means we
often avoid copying Char values and small Int values, because these
are static closures in the runtime.
Each Compact# object can support a single compactAdd# operation at any
given time, so the Data.Compact library now enforces mutual exclusion
using an MVar stored in the Compact object.
We now get exceptions rather than killing everything with a barf()
when we encounter an object that cannot be compacted (a function, or a
mutable object). We now also detect pinned objects, which can't be
compacted either.
The Data.Compact API has been refactored and cleaned up. A new
compactSize operation returns the size (in bytes) of the compact
object.
Most of the documentation is in the Haddock docs for the compact
library, which I've expanded and improved here.
Various comments in the code have been improved, especially the main
Note [Compact Normal Forms] in rts/sm/CNF.c.
I've added a few tests, and expanded a few of the tests that were
there. We now also run the tests with GHCi, and in a new test way
that enables sanity checking (+RTS -DS).
There's a benchmark in libraries/compact/tests/compact_bench.hs for
measuring compaction speed and comparing sharing vs. no sharing.
The field totalDataW in StgCompactNFData was unnecessary.
Test Plan:
* new unit tests
* validate
* tested manually that we can compact Data.Aeson data
Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
Subscribers: thomie, simonpj
Differential Revision: https://phabricator.haskell.org/D2751
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix issues preventing x86 GHC to build on Windows and
fix segfault in the testsuite.
Test Plan: ./validate
Reviewers: austin, erikd, simonmar, bgamari
Reviewed By: bgamari
Subscribers: #ghc_windows_task_force, thomie
Differential Revision: https://phabricator.haskell.org/D2789
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
NOTE: I have been able to do simple testing on emulated NUMA nodes.
Real hardware would be needed for a proper test.
D2199 Added NUMA support for Linux, I have just filled in the missing pieces following
the description of the Linux APIs.
Test Plan:
Use `bcdedit.exe /set groupsize 2` to modify the kernel again (Similar to D2533).
This generates some NUMA nodes:
```
Logical Processor to NUMA Node Map:
NUMA Node 0:
**
--
NUMA Node 1:
--
**
Approximate Cross-NUMA Node Access Cost (relative to fastest):
00 01
00: 1.1 1.1
01: 1.0 1.0
```
run ` ../test-numa.exe +RTS --numa -RTS`
and check PerfMon for NUMA allocations.
Reviewers: simonmar, erikd, bgamari, austin
Reviewed By: simonmar
Subscribers: thomie, #ghc_windows_task_force
Differential Revision: https://phabricator.haskell.org/D2534
GHC Trac Issues: #12602
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Windows support for more than 64 logical processors are implemented
using processor groups.
Essentially what it's doing is keeping the existing maximum of 64
processors and keeping the affinity mask a 64 bit value, but adds an
hierarchy above that.
This support was added to Windows 7 and so we need to at runtime detect
if the APIs are still there due to our minimum supported version being
Windows Vista.
The Maximum number of groups supported at this time is 4, so 256 logical
cores. The group indices are 0 based. One thread can have affinity with
multiple groups.
See
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251.aspx
and particularly helpful is the whitepaper: 'Supporting Systems that
have more than 64 processors' at
https://msdn.microsoft.com/en-us/library/windows/hardware/dn653313.aspx
Processor groups are not guaranteed to be uniformly distributed nor
guaranteed to be filled before a next group is needed. The OS will
assign processors to groups based on physical proximity and will never
partially assign cores from one physical cpu to more than one group. If
one has two 48 core CPUs then you'd end up with two groups of 48 logical
cpus. Now add a 3rd CPU with 10 cores and the group it is assigned to
depends where the socket is on the board.
Test Plan:
./validate or make test -c . in the rts test folder.
This tests for regressions, to test this particular functionality
itself:
<program> +RTS -N -qa -RTS
Test is detailed in description.
Reviewers: bgamari, simonmar, austin, erikd
Reviewed By: simonmar
Subscribers: thomie, #ghc_windows_task_force
Differential Revision: https://phabricator.haskell.org/D2533
GHC Trac Issues: #11054
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We stumbled upon a case where an external library (OpenCL) does not work
if a specific address (0x200000000) is taken.
It so happens that `osReserveHeapMemory` starts trying to mmap at 0x200000000:
```
void *hint = (void*)((W_)8 * (1 << 30) + attempt * BLOCK_SIZE);
at = osTryReserveHeapMemory(*len, hint);
```
This makes it impossible to use Haskell programs compiled with GHC 8
with C functions that use OpenCL.
See this example https://github.com/chpatrick/oclwtf for a repro.
This patch allows the user to work around this kind of behavior outside
our control by letting the user override the starting address through an
RTS command line flag.
Reviewers: bgamari, Phyx, simonmar, erikd, austin
Reviewed By: Phyx, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2513
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes the code a little more modular and allows the removal of some
CPP hackery. By providing dummy implementations of of the `m32_*`
functions (which simply call `errorBelch`) it means that the call sites
for these functions are syntax checked even when `RTS_LINKER_USE_MMAP`
is `0`.
Also changes some size parameter types from `unsigned int` to `size_t`.
Test Plan: Validate on Linux, OS X and Windows
Reviewers: Phyx, hsyl20, bgamari, simonmar, austin
Reviewed By: simonmar, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2237
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first argument of 'osFreeMBlocks' ought to have the same type as the
return value from 'osGetMBlocks'. Make it so.
Reviewers: austin, simonmar, bgamari
Reviewed By: bgamari
Subscribers: erikd, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2235
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
|
| |
At some point there may have been a reason for the
`INLINE_ME` macro, but not anymore...
Reviewed By: austin
Differential Revision: https://phabricator.haskell.org/D2041
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use of this helper function was removed in:
commit 3c9fc104337a142fe4f375d30d7a6b81d55a70c1
Author: Brian Brooks <brooks.brian@gmail.com>
Date: Thu Jul 10 02:55:33 2014 -0500
Avoid unnecessary clock_gettime() syscalls in GC stats.
Noticed by uselex.rb:
getThreadCPUTime: [R]: exported from:
./rts/dist/build/posix/GetTime.p_o
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|