| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This is causing too much platform dependent breakage at the moment. We
will need a more rigorous testing strategy before this can be
merged again.
This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
|
|
|
|
| |
These were missed in D3278.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The C code in the RTS now gets built with `-Wundef` and the Haskell code
(stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever
`#if` is used on undefined identifiers.
Test Plan: Validate on Linux and Windows
Reviewers: austin, angerman, simonmar, bgamari, Phyx
Reviewed By: bgamari
Subscribers: thomie, snowleopard
Differential Revision: https://phabricator.haskell.org/D3278
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we throw an exception for heap overflow, we should only print
the heap overflow message in the main thread when the HeapOverflow
exception is caught, rather than as a side effect in the GC.
Stack overflows were already done this way, I just made heap overflow
consistent with stack overflow, and did some related cleanup.
Fixes broken T2592(profasm) which was reporting the heap overflow
message twice (you would only notice when building with profiling
libs enabled).
Test Plan: validate
Reviewers: bgamari, niteria, austin, DemiMarie, hvr, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3394
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to
https://ghc.haskell.org/trac/ghc/ticket/13491
https://phabricator.haskell.org/D3122
SIZEOF_HSINT and SIZEOF_VOID_P are sizes of
target platform. These values are usually
not correct when stage1 is built.
It means the code
```haskell
newFastMutInt = IO $ \s ->
case newByteArray# size s of { (# s, arr #) ->
(# s, FastMutInt arr #) }
where !(I# size) = SIZEOF_HSINT
```
would try to allocate only 4 bytes on 64-bit-host
targeting 32-bit system.
It does not matter in practice as newByteArray#
implementation rounds up passed value to host's
word size. But one day it might not.
To prevent this class of problems in compiler/
directory 'MachDeps.h' contents is hidden when
ghc-stage1 (-DSTAGE=1) is built.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewers: austin, rwbarton, simonmar, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3405
|
|
|
|
|
| |
I evidently neglected to consider that validate doesn't build profiled
ways. Arg.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces a RTS option, -po, which allows the user to override the stem
used to form the output file names of the heap profile and cost center summary.
It's a bit unclear to me whether this is really the interface we want.
Alternatively we could just allow the user to specify the `.hp` and `.prof` file
names separately. This would arguably be a bit more straightforward and would
allow the user to name JSON output with an appropriate `.json` suffix if they so
desired. However, this would come at the cost of taking more of the option
space, which is a somewhat precious commodity.
Test Plan: Validate, try using `-po` RTS option
Reviewers: simonmar, austin, erikd
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3182
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces a JSON output format for cost-centre profiler reports.
It's not clear whether this is really something we want to introduce
given that we may also move to a more Haskell-driven output pipeline in
the future, but I nevertheless found this helpful, so I thought I would
put it up.
Test Plan: Compile a program with `-prof -fprof-auto`; run with `+RTS
-pj`
Reviewers: austin, erikd, simonmar
Reviewed By: simonmar
Subscribers: duncan, maoe, thomie, simonmar
Differential Revision: https://phabricator.haskell.org/D3132
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This corrects the unwind information for `stg_stop_thread`, which
allows us to unwind back to the C stack after reaching the end of the
STG stack.
Test Plan: Validate
Reviewers: simonmar, austin, erikd
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2746
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[skip ci]
There ware some old file names (.lhs, ...) at comments.
* includes/rts/Bytecodes.h
- ghc/compiler/ghci/ByteCodeGen.lhs -> ByteCodeAsm.hs
* includes/rts/Constants.h
- libraries/base/GHC/Conc.lhs -> libraries/base/GHC/Conc/Sync.hs
* includes/rts/storage/FunTypes.h
- utils/genapply/GenApply.hs -> utils/genappl/Main.hs
- compiler/codeGen/CgCallConv.lhs -> compiler/codeGen/StgCmmLayout.hs
* includes/stg/MiscClosures.h
- compiler/codeGen/CgStackery.lhs -> compiler/codeGen/StgCmmArgRep.hs
- HeapStackCheck.hc -> HeapStackCheck.cmm
Reviewers: bgamari, austin, simonmar, erikd
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3074
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we add support to GHCi for StaticPointers. This process begins by
adding remote GHCi messages for adding entries to the static pointer
table. We then collect binders needing SPT entries after linking and
send the interpreter a message adding entries with the appropriate
fingerprints.
Test Plan: `make test TEST=StaticPtr`
Reviewers: facundominguez, mboes, simonpj, simonmar, goldfire, austin,
hvr, erikd
Reviewed By: simonpj, simonmar
Subscribers: RyanGlScott, simonpj, thomie
Differential Revision: https://phabricator.haskell.org/D2504
GHC Trac Issues: #12356
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently eventlog data is always written to a file `progname.eventlog`.
This patch introduces the `flushEventLog` field in `RtsConfig` which
allows to customize the writing of eventlog data.
One possible scenario is the ongoing live-profile-monitor effort by
@NCrashed which slurps all eventlog data through `fluchEventLog`.
`flushEventLog` takes a buffer with eventlog data and its size and
returns `false` (0) in case eventlog data could not be procesed.
Reviewers: simonmar, austin, erikd, bgamari
Reviewed By: simonmar, bgamari
Subscribers: qnikst, thomie, NCrashed
Differential Revision: https://phabricator.haskell.org/D2934
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When toplevel literals don't have a way to be exported
from module GHC infers their labels as static.
Example from GHC.Arr:
static char rdVA_bytes[] = " out of range ";
When this label is used in module internally
we also need to provide it's forward declaration.
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Typical UNREG build failure looks like that:
ghc-unreg/includes/Stg.h:226:46: error:
note: in definition of macro 'EI_'
#define EI_(X) extern StgWordArray (X) GNU_ATTRIBUTE(aligned (8))
^
|
226 | #define EI_(X) extern StgWordArray (X) GNU_ATTRIBUTE(aligned (8))
| ^
/tmp/ghc10489_0/ghc_3.hc:1754:6: error:
note: previous definition of 'ghczmprim_GHCziTypes_zdtcTyCon2_bytes' was here
char ghczmprim_GHCziTypes_zdtcTyCon2_bytes[] = "TyCon";
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
1754 | char ghczmprim_GHCziTypes_zdtcTyCon2_bytes[] = "TyCon";
| ^
As we see here "_bytes" string literals are defined as 'char []'
array, not 'StgWord []'.
The change special-cases "_bytes" string literals to have
correct declaration type.
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes heap overflow to throw a HeapOverflow exception instead of
killing the process.
Test Plan: GHC CI
Reviewers: simonmar, austin, hvr, erikd, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2790
GHC Trac Issues: #1791
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* In stg_ap_0_fast, if we're evaluating a thunk, the thunk might
evaluate to a function in which case we may have to adjust its CCS.
* The interpreter has its own implementation of stg_ap_0_fast, so we
have to do the same shenanigans with creating empty PAPs and copying
PAPs there.
* GHCi creates Cost Centres as children of CCS_MAIN, which enterFunCCS()
wrongly assumed to imply that they were CAFs. Now we use the is_caf
flag for this, which we have to correctly initialise when we create a
Cost Centre in GHCi.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 394231b301efb6b56654b0a480ab794fe3b7e4db aded
CCS_OVERHEAD annotation to 'rts/Apply.cmm'.
Before the change CCS_OVERHEAD was used only in C code.
The change exports CCS_OVERHEAD to STG.
Fixes UNREG build failure:
rts_dist_HC rts/dist/build/Apply.p_o
/tmp/ghc29563_0/ghc_4.hc: In function 'cm_entry':
/tmp/ghc29563_0/ghc_4.hc:73:13: error:
error: 'CCS_OVERHEAD' undeclared (first use in this function)
*((P_)((W_)&CCS_OVERHEAD+72)) = ...
^~~~~~~~~~~~
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes some cases of wrong stacks being generated by the profiler.
For background and details on the fix see
`Note [Evaluating functions with profiling]` in `rts/Apply.cmm`.
This does have an impact on allocations for some programs when
profiling. nofib results:
```
k-nucleotide +0.0% +8.8% +11.0% +11.0% 0.0%
puzzle +0.0% +12.5% 0.244 0.246 0.0%
typecheck 0.0% +8.7% +16.1% +16.2% 0.0%
------------------------------------------------------------------------
--------
Min -0.0% -0.0% -34.4% -35.5% -25.0%
Max +0.0% +12.5% +48.9% +49.4% +10.6%
Geometric Mean +0.0% +0.6% +2.0% +1.8% -0.3%
```
But runtimes don't seem to be affected much, and the examples I looked
at were completely legitimate. For example, in puzzle we have this:
```
position :: ItemType -> StateType -> BankType
position Bono = bonoPos
position Edge = edgePos
position Larry = larryPos
position Adam = adamPos
```
where the identifiers on the rhs are all record selectors. Previously
the profiler gave a stack that looked like
```
position
bonoPos
...
```
i.e. `bonoPos` was at the same level of the call stack as `position`,
but now it looks like
```
position
bonoPos
...
```
I used the normaliser from the testsuite to diff the profiling output
from other nofib programs and they all looked better.
Test Plan:
* the broken test passes
* validate
* compiled and ran all of nofib, measured perf, diff'd several .prof
files
Reviewers: niteria, erikd, austin, scpmw, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2804
GHC Trac Issues: #5654, #10007
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: austin, simonmar, erikd, bgamari
Reviewed By: erikd, bgamari
Subscribers: angerman, thomie
Differential Revision: https://phabricator.haskell.org/D2823
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The use of globals is quite painful when multiple rts are loaded, e.g.
when plugins are loaded, which bring in a second rts. The sharedCAF
appraoch was employed for the FastStringTable; I've taken the libery
to extend this to the other globals I could find.
This is a reboot of D2575, that should hopefully not exhibit the same
windows build issues.
Reviewers: Phyx, simonmar, goldfire, bgamari, austin, hvr, erikd
Reviewed By: Phyx, simonmar, bgamari
Subscribers: mpickering, thomie
Differential Revision: https://phabricator.haskell.org/D2773
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces calls to barf() in loadArchive() with proper
error handling.
Test Plan: GHC CI
Reviewers: rwbarton, erikd, hvr, austin, simonmar, bgamari
Reviewed By: bgamari
Subscribers: thomie
Tags: #ghc
Differential Revision: https://phabricator.haskell.org/D2652
GHC Trac Issues: #12388
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit makes various improvements and addresses some issues with
Compact Regions (aka Compact Normal Forms).
This was the most important thing I wanted to fix. Compaction
previously prevented GC from running until it was complete, which
would be a problem in a multicore setting. Now, we compact using a
hand-written Cmm routine that can be interrupted at any point. When a
GC is triggered during a sharing-enabled compaction, the GC has to
traverse and update the hash table, so this hash table is now stored
in the StgCompactNFData object.
Previously, compaction consisted of a deepseq using the NFData class,
followed by a traversal in C code to copy the data. This is now done
in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
longer use the NFData instances, instead the Cmm routine evaluates
components directly as it compacts.
The new compaction is about 50% faster than the old one with no
sharing, and a little faster on average with sharing (the cost of the
hash table dominates when we're doing sharing).
Static objects that don't (transitively) refer to any CAFs don't need
to be copied into the compact region. In particular this means we
often avoid copying Char values and small Int values, because these
are static closures in the runtime.
Each Compact# object can support a single compactAdd# operation at any
given time, so the Data.Compact library now enforces mutual exclusion
using an MVar stored in the Compact object.
We now get exceptions rather than killing everything with a barf()
when we encounter an object that cannot be compacted (a function, or a
mutable object). We now also detect pinned objects, which can't be
compacted either.
The Data.Compact API has been refactored and cleaned up. A new
compactSize operation returns the size (in bytes) of the compact
object.
Most of the documentation is in the Haddock docs for the compact
library, which I've expanded and improved here.
Various comments in the code have been improved, especially the main
Note [Compact Normal Forms] in rts/sm/CNF.c.
I've added a few tests, and expanded a few of the tests that were
there. We now also run the tests with GHCi, and in a new test way
that enables sanity checking (+RTS -DS).
There's a benchmark in libraries/compact/tests/compact_bench.hs for
measuring compaction speed and comparing sharing vs. no sharing.
The field totalDataW in StgCompactNFData was unnecessary.
Test Plan:
* new unit tests
* validate
* tested manually that we can compact Data.Aeson data
Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
Subscribers: thomie, simonpj
Differential Revision: https://phabricator.haskell.org/D2751
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Visible API changes:
* The C struct `GCDetails` gives the stats about a single GC. This is
passed to the `gcDone()` callback if one is set via the
RtsConfig. (previously we just passed a collection of values, so this
is more extensible, at the expense of breaking the existing API)
* `RTSStats` gives cumulative stats since the start of the program,
and includes the `GCDetails` for the most recent GC. This struct
can be obtained via `getRTSStats()` (the old `getGCStats()` has been
removed, and `getGCStatsEnabled()` has been renamed to
`getRTSStatsEnabled()`)
Improvements:
* The per-GC stats and cumulative stats are now cleanly separated.
* Inside the RTS we have a top-level `RTSStats` struct to keep all our
stats in, previously this was just a collection of strangely-named
variables. This struct is mostly just copied in `getRTSStats()`, so
the implementation of that function is a lot shorter.
* Types are more consistent. We use a uint64_t byte count for all
memory values, and Time for all time values.
* Names are more consistent. We use a suffix `_bytes` for all byte
counts and `_ns` for all time values.
* We now collect information about the amount of memory in large
objects and compact objects in `GCDetails`. (the latter was the reason
I started doing this patch but it seems to have ballooned a bit!)
* I fixed a bug in the calculation of the elapsed MUT time, and added
an ASSERT to stop the calculations going wrong in the future.
For now I kept the Haskell API in `GHC.Stats` the same, by
impedence-matching with the new API. We could either break that API
and make it match the C API more closely, or we could add a new API
and deprecate the old one. Opinions welcome.
This stuff is very easy to get wrong, and it's hard to test. Reviews
welcome!
Test Plan:
manual testing
validate
Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2756
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When rts is forked it doesn't update toplevel handler, so UserInterrupt
exception is sent to Thread1 that doesn't exist in forked process.
We install toplevel handler when fork so signal will be delivered to the
new main thread.
Fixes #12903
Reviewers: simonmar, austin, erikd, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2770
GHC Trac Issues: #12903
|
|
|
|
|
| |
This reverts commit 6f7ed1e51bf360621a3c2a447045ab3012f68575 due to breakage of
the build on Windows.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The use of globals is quite painful when multiple rts are loaded, e.g.
when plugins are loaded, which bring in a second rts. The sharedCAF
appraoch was employed for the FastStringTable; I've taken the libery
to extend this to the other globals I could find.
Reviewers: rwbarton, simonmar, austin, hvr, erikd, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2575
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On iOS, we use the pthread-based implementation of Itimer.c even for a
non-threaded RTS. Since 999c464, this relies on synchronization
primitives like Mutex, so ensure those primitives are defined whenever
they are supported, even if !THREADED_RTS.
Fixes #12799.
Reviewers: erikd, austin, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2712
GHC Trac Issues: #12799
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We currently have two info tables for a constructor
* XXX_con_info: the info table for a heap-resident instance of the
constructor, It has type CONSTR, or one of the specialised types like
CONSTR_1_0
* XXX_static_info: the info table for a static instance of this
constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
I'm getting rid of the latter, and using the `con_info` info table for
both static and dynamic constructors. For rationale and more details
see Note [static constructors] in SMRep.hs.
I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
`closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
distinction, and anyway HEAP_ALLOCED() does the same job.
Test Plan: validate
Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2690
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`rts_setInCallCapability` sets the thread affinity as well as pins the
numa node. We should also have the ability to set the numa node without
setting the capability affinity. `rts_pinNumaNodeForCapability` function
is added and exported via `RtsAPI.h`.
Previous callers of `rts_setInCallCapability` should now also call
`rts_pinNumaNodeForCapability` to get the same effect as before.
Test Plan:
./validate
Reviewers: austin, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie, niteria
Differential Revision: https://phabricator.haskell.org/D2637
GHC Trac Issues: #12764
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Read it
Reviewers: austin, erikd, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2663
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Windows support for more than 64 logical processors are implemented
using processor groups.
Essentially what it's doing is keeping the existing maximum of 64
processors and keeping the affinity mask a 64 bit value, but adds an
hierarchy above that.
This support was added to Windows 7 and so we need to at runtime detect
if the APIs are still there due to our minimum supported version being
Windows Vista.
The Maximum number of groups supported at this time is 4, so 256 logical
cores. The group indices are 0 based. One thread can have affinity with
multiple groups.
See
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251.aspx
and particularly helpful is the whitepaper: 'Supporting Systems that
have more than 64 processors' at
https://msdn.microsoft.com/en-us/library/windows/hardware/dn653313.aspx
Processor groups are not guaranteed to be uniformly distributed nor
guaranteed to be filled before a next group is needed. The OS will
assign processors to groups based on physical proximity and will never
partially assign cores from one physical cpu to more than one group. If
one has two 48 core CPUs then you'd end up with two groups of 48 logical
cpus. Now add a 3rd CPU with 10 cores and the group it is assigned to
depends where the socket is on the board.
Test Plan:
./validate or make test -c . in the rts test folder.
This tests for regressions, to test this particular functionality
itself:
<program> +RTS -N -qa -RTS
Test is detailed in description.
Reviewers: bgamari, simonmar, austin, erikd
Reviewed By: simonmar
Subscribers: thomie, #ghc_windows_task_force
Differential Revision: https://phabricator.haskell.org/D2533
GHC Trac Issues: #11054
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: And document it. See the docmentation for the reason I want this.
Test Plan: It's an existing API, just exposing it.
Reviewers: bgamari, niteria, austin, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2531
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a fast, non-blocking, asynchronous, interface to tryPutMVar that
can be called from C/C++.
It's useful for callback-based C/C++ APIs: the idea is that the callback
invokes hs_try_putmvar(), and the Haskell code waits for the callback to
run by blocking in takeMVar.
The callback doesn't block - this is often a requirement of
callback-based APIs. The callback wakes up the Haskell thread with
minimal overhead and no unnecessary context-switches.
There are a couple of benchmarks in
testsuite/tests/concurrent/should_run. Some example results comparing
hs_try_putmvar() with using a standard foreign export:
./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s
./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s
hs_try_putmvar() is 4x faster for this workload (see the source for
hs_try_putmvar003.hs for details of the workload).
An alternative solution is to use the IO Manager for this. We've tried
it, but there are problems with that approach:
* Need to create a new file descriptor for each callback
* The IO Manger thread(s) become a bottleneck
* More potential for things to go wrong, e.g. throwing an exception in
an IO Manager callback kills the IO Manager thread.
Test Plan: validate; new unit tests
Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2501
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We stumbled upon a case where an external library (OpenCL) does not work
if a specific address (0x200000000) is taken.
It so happens that `osReserveHeapMemory` starts trying to mmap at 0x200000000:
```
void *hint = (void*)((W_)8 * (1 << 30) + attempt * BLOCK_SIZE);
at = osTryReserveHeapMemory(*len, hint);
```
This makes it impossible to use Haskell programs compiled with GHC 8
with C functions that use OpenCL.
See this example https://github.com/chpatrick/oclwtf for a repro.
This patch allows the user to work around this kind of behavior outside
our control by letting the user override the starting address through an
RTS command line flag.
Reviewers: bgamari, Phyx, simonmar, erikd, austin
Reviewed By: Phyx, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2513
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of stg_interp_constr_entry there are now 7 functions (one for
each value of the tag bits) that tag the constructor pointer before
returning. This is consistent with compiled constructors' entry code,
and expectations that compiled code places on compiled constructors. The
iserv protocol is extended with an extra field that explains what
pointer tag the constructor should use.
Test Plan: Added tests for #12523
Reviewers: erikd, bgamari, hvr, austin, simonmar
Reviewed By: simonmar
Subscribers: osa1, thomie, rwbarton
Differential Revision: https://phabricator.haskell.org/D2473
GHC Trac Issues: #12523
|
|
|
|
|
|
|
| |
This reverts commit e3e2e49a8f6952e1c8a19321c729c17b294d8c92.
I'm reverting because it makes ghc-stage2 seg-fault on
64-bit Windows machines. Even ghc-stage2 --version seg-faults.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch refactors GNU C version test (for 4.5 and more modern)
due to usage of __builtin_unreachable done in the CNF.c code directly
into the new RTS_UNREACHABLE macro placed into Rts.h
Reviewers: bgamari, austin, simonmar, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2457
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was a complication on the x86_64 platform, where pointers were 64
bits, but the tools didn't support 64-bit relative relocations. This
was true before binutils 2.17, which nowadays is quite standart (even
CentOs 5 is shipped with 2.17).
Hacks were removed from x86 genSwitch and asm pretty printer. Also
[x86-64-relative] note was dropped from
includes/rts/storage/InfoTables.h as it's not referenced anywhere now.
Reviewers: austin, simonmar, rwbarton, erikd, bgamari
Reviewed By: simonmar, erikd, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2426
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adjust `CmmParse.y` to parse the `cmpxchg{8, 16, 32, 64}` instructions
and use the 32 respectively the 64 bit variant in `Primops.cmm`. This
effectively eliminates the compare-and-swap ccall to the rts.
Based off the mailing list question from @osa1
(https://mail.haskell.org/pipermail/ghc-devs/2016-July/012506.html).
Reviewers: simonmar, austin, erikd, bgamari, trommler
Reviewed By: erikd, bgamari, trommler
Subscribers: carter, trommler, osa1, thomie
Differential Revision: https://phabricator.haskell.org/D2431
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This exposes mblocks_allocated in the GCStats struct.
Test Plan: it builds
Reviewers: bgamari, simonmar, austin, hvr, erikd
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2429
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements primitive unboxed sum types, as described in
https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes.
Main changes are:
- Add new syntax for unboxed sums types, terms and patterns. Hidden
behind `-XUnboxedSums`.
- Add unlifted unboxed sum type constructors and data constructors,
extend type and pattern checkers and desugarer.
- Add new RuntimeRep for unboxed sums.
- Extend unarise pass to translate unboxed sums to unboxed tuples right
before code generation.
- Add `StgRubbishArg` to `StgArg`, and a new type `CmmArg` for better
code generation when sum values are involved.
- Add user manual section for unboxed sums.
Some other changes:
- Generalize `UbxTupleRep` to `MultiRep` and `UbxTupAlt` to
`MultiValAlt` to be able to use those with both sums and tuples.
- Don't use `tyConPrimRep` in `isVoidTy`: `tyConPrimRep` is really
wrong, given an `Any` `TyCon`, there's no way to tell what its kind
is, but `kindPrimRep` and in turn `tyConPrimRep` returns `PtrRep`.
- Fix some bugs on the way: #12375.
Not included in this patch:
- Update Haddock for new the new unboxed sum syntax.
- `TemplateHaskell` support is left as future work.
For reviewers:
- Front-end code is mostly trivial and adapted from unboxed tuple code
for type checking, pattern checking, renaming, desugaring etc.
- Main translation routines are in `RepType` and `UnariseStg`.
Documentation in `UnariseStg` should be enough for understanding
what's going on.
Credits:
- Johan Tibell wrote the initial front-end and interface file
extensions.
- Simon Peyton Jones reviewed this patch many times, wrote some code,
and helped with debugging.
Reviewers: bgamari, alanz, goldfire, RyanGlScott, simonpj, austin,
simonmar, hvr, erikd
Reviewed By: simonpj
Subscribers: Iceland_jack, ggreif, ezyang, RyanGlScott, goldfire,
thomie, mpickering
Differential Revision: https://phabricator.haskell.org/D2259
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings in initial support for compact regions, as described in the
ICFP 2015 paper "Efficient Communication and Collection with Compact
Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
Campagna.
Some things may change before the 8.2 release, but I (Simon M.) wanted
to get the main patch committed so that we can iterate.
What documentation there is is in the Data.Compact module in the new
compact package. We'll need to extend and polish the documentation
before the release.
Test Plan:
validate
(new test cases included)
Reviewers: ezyang, simonmar, hvr, bgamari, austin
Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1264
GHC Trac Issues: #11493
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Try it
Reviewers: hvr, simonmar, austin, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1722
GHC Trac Issues: #11094
|
|
|
|
|
| |
- Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
- Add a test to tests/rts
|
|
|
|
|
|
|
|
| |
* Remove unused/old flags from the structs
* Update old comments
* Add missing flags to GHC.RTS
* Simplify GHC.RTS, remove C code and use hsc2hs instead
* Make ParFlags unconditional, and add support to GHC.RTS
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: simonmar, duncan, erikd, austin
Reviewed By: austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2290
GHC Trac Issues: #12059
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use C compiler builtins for atomic SMP primitives. This saves a lot
of CPP ifdefs.
Add test for atomic xchg:
Test if __sync_lock_test_and_set() builtin stores the second argument.
The gcc manual says the actual value stored is implementation defined.
Test Plan: validate and eyeball generated assembler code
Reviewers: kgardas, simonmar, hvr, bgamari, austin, erikd
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2233
|