| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GC had a two-level structure, G generations each of T steps.
Steps are for aging within a generation, mostly to avoid premature
promotion.
Measurements show that more than 2 steps is almost never worthwhile,
and 1 step is usually worse than 2. In theory fractional steps are
possible, so the ideal number of steps is somewhere between 1 and 3.
GHC's default has always been 2.
We can implement 2 steps quite straightforwardly by having each block
point to the generation to which objects in that block should be
promoted, so blocks in the nursery point to generation 0, and blocks
in gen 0 point to gen 1, and so on.
This commit removes the explicit step structures, merging generations
with steps, thus simplifying a lot of code. Performance is
unaffected. The tunable number of steps is now gone, although it may
be replaced in the future by a way to tune the aging in generation 0.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a batch of refactoring to remove some of the GC's global
state, as we move towards CPU-local GC.
- allocateLocal() now allocates large objects into the local
nursery, rather than taking a global lock and allocating
then in gen 0 step 0.
- allocatePinned() was still allocating from global storage and
taking a lock each time, now it uses local storage.
(mallocForeignPtrBytes should be faster with -threaded).
- We had a gen 0 step 0, distinct from the nurseries, which are
stored in a separate nurseries[] array. This is slightly strange.
I removed the g0s0 global that pointed to gen 0 step 0, and
removed all uses of it. I think now we don't use gen 0 step 0 at
all, except possibly when there is only one generation. Possibly
more tidying up is needed here.
- I removed the global allocate() function, and renamed
allocateLocal() to allocate().
- the alloc_blocks global is gone. MAYBE_GC() and
doYouWantToGC() now check the local nursery only.
|
| |
|
|
|
|
| |
Also, use C99-style array initialisers
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposinng publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do, this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Eager blackholing can improve parallel performance by reducing the
chances that two threads perform the same computation. However, it
has a cost: one extra memory write per thunk entry.
To get the best results, any code which may be executed in parallel
should be compiled with eager blackholing turned on. But since
there's a cost for sequential code, we make it optional and turn it on
for the parallel package only. It might be a good idea to compile
applications (or modules) with parallel code in with
-feager-blackholing.
ToDo: document -feager-blackholing.
|
| |
|
|
|
|
|
|
|
|
|
| |
gcc 4.3 emits warnings for static inline functions that its heuristics
decided not to inline. The workaround is to either mark appropriate
functions as "hot" (a new attribute in gcc 4.3), or sometimes to use
"extern inline" instead.
With this fix I can validate with gcc 4.3 on Fedora 9.
|
|
|
|
|
|
|
|
| |
Now allocate() is a synonym for allocateInGen().
I also made various cleanups: there is now less special-case code for
supporting -G1 (two-space collection), and -G1 now works with
-threaded.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously MVars were always on the mutable list of the old
generation, which meant every MVar was visited during every minor GC.
With lots of MVars hanging around, this gets expensive. We addressed
this problem for MUT_VARs (aka IORefs) a while ago, the solution is to
use a traditional GC write-barrier when the object is modified. This
patch does the same thing for MVars.
TVars are still done the old way, they could probably benefit from the
same treatment too.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the con_desc field of an info table was made into a relative
reference, this had the side effect of making the profiling fields
(closure_desc and closure_type) also relative, but only when compiling
via C, and the heap profiler was still treating them as absolute,
leading to crashes when profiling with -hd or -hy.
This patch fixes up the story to be consistent: these fields really
should be relative (otherwise we couldn't make shared versions of the
profiling libraries), so I've made them relative and fixed up the RTS
to know about this.
|
| |
|
|
|
|
|
| |
Seems to be a bug introduced by code to free the memory allocated by
the heap profiler.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that constructor info tables contain the name of the constructor,
we can generate useful heap profiles without requiring the whole
program and libraries to be compiled with -prof. So now, "+RTS -hT"
generates a heap profile for any program, dividing the profile by
constructor. It wouldn't be hard to add support for grouping
constructors by module, or to restrict the profile to certain
constructors/modules/packages.
This means that for the first time we can get heap profiles for GHCi,
which was previously impossible because the byte-code
interpreter and linker don't work with -prof.
|
|
|
|
|
| |
We recently discovered that they aren't a win any more, and just cost
code size.
|
| |
|
| |
|
|
|
|
|
|
|
| |
Fix output of cost-centre stacks so that the slashes appear in the correct place
Please include this patch in the 6.6 branch as well as HEAD
|
|
|
|
|
|
|
|
| |
Add the -L RTS flag to control the length of the cost-centre stacks reported in
a heap profile.
Please include this change in the 6.6 branch as well as HEAD
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
These closure types aren't used/needed, as far as I can tell. The
commoning up of Chars/Ints happens by comparing info pointers, and
the info table for a dynamic C#/I# is CONSTR_0_1. The RTS seemed
a little confused about whether CONSTR_CHARLIKE/CONSTR_INTLIKE were
supposed to be static or dynamic closures, too.
|
|
Most of the other users of the fptools build system have migrated to
Cabal, and with the move to darcs we can now flatten the source tree
without losing history, so here goes.
The main change is that the ghc/ subdir is gone, and most of what it
contained is now at the top level. The build system now makes no
pretense at being multi-project, it is just the GHC build system.
No doubt this will break many things, and there will be a period of
instability while we fix the dependencies. A straightforward build
should work, but I haven't yet fixed binary/source distributions.
Changes to the Building Guide will follow, too.
|