| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Except for CgUtils.fixStgRegisters that is used in the NCG and LLVM
backends, and should probably be moved somewhere else.
|
|
|
|
|
| |
Mostly d -> g (matching DynFlag -> GeneralFlag).
Also renamed if* to when*, matching the Haskell if/when names
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
I've switched to passing DynFlags rather than Platform, as (a) it's
simpler to not have to extract targetPlatform in so many places, and
(b) it may be useful to have DynFlags around in future.
|
| |
|
|
|
|
|
|
| |
To explicitly choose whether you want an unregisterised build you now
need to use the "--enable-unregisterised"/"--disable-unregisterised"
configure flags.
|
|
|
|
|
| |
This is a bit odd by itself, but it's a stepping stone on the way to
putting "target unregisterised" into the settings file.
|
|
|
|
| |
All the flags that 'ways' imply are now dynamic
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A thunk with no free variables was not getting blackholed when
-feager-blackholing was on, but we were nevertheless pushing the
stg_bh_upd_frame version of the update frame that expects to see a
black hole.
I fixed this twice for good measure:
- we now call blackHoleOnEntry when pushing the update frame to check
whether the closure was actually blackholed, and so that we use the
same predicate in both places
- we now black hole thunks even if they have no free variables.
These only occur when optimisation is off, but presumably if you say
-feager-blackholing then that's what you want to happen.
|
|
|
|
| |
No special-casing in any NCGs yet
|
|
|
|
|
|
|
|
|
| |
We now carry around with CmmJump statements a list of
the STG registers that are live at that jump site.
This is used by the LLVM backend so it can avoid
unnesecarily passing around dead registers, improving
perfromance. This gives us the framework to finally
fix trac #4308.
|
| |
|
|
|
|
|
|
|
|
| |
Instead of enterLocalIdLabel we should get the label from the
ClosureInfo, because that knows better whether the label should be
local or not.
Needed by #5357
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means that both time and heap profiling work for parallel
programs. Main internal changes:
- CCCS is no longer a global variable; it is now another
pseudo-register in the StgRegTable struct. Thus every
Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for
Capabilities in the idle state. If you profile a single-threaded
program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the
shared cost-centre-stack data structures.
This patch does enough to get it working, I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs).
|
|
|
|
|
| |
This field was doing nothing. I think it originally appeared in a
very old incarnation of the new code generator.
|
|
|
|
|
| |
We only use it for "compiler" sources, i.e. not for libraries.
Many modules have a -fno-warn-tabs kludge for now.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
User visible changes
====================
Profilng
--------
Flags renamed (the old ones are still accepted for now):
OLD NEW
--------- ------------
-auto-all -fprof-auto
-auto -fprof-exported
-caf-all -fprof-cafs
New flags:
-fprof-auto Annotates all bindings (not just top-level
ones) with SCCs
-fprof-top Annotates just top-level bindings with SCCs
-fprof-exported Annotates just exported bindings with SCCs
-fprof-no-count-entries Do not maintain entry counts when profiling
(can make profiled code go faster; useful with
heap profiling where entry counts are not used)
Cost-centre stacks have a new semantics, which should in most cases
result in more useful and intuitive profiles. If you find this not to
be the case, please let me know. This is the area where I have been
experimenting most, and the current solution is probably not the
final version, however it does address all the outstanding bugs and
seems to be better than GHC 7.2.
Stack traces
------------
+RTS -xc now gives more information. If the exception originates from
a CAF (as is common, because GHC tends to lift exceptions out to the
top-level), then the RTS walks up the stack and reports the stack in
the enclosing update frame(s).
Result: +RTS -xc is much more useful now - but you still have to
compile for profiling to get it. I've played around a little with
adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
quite accurately.
I plan to add more facilities for stack tracing (e.g. in GHCi) in the
future.
Coverage (HPC)
--------------
* derived instances are now coloured yellow if they weren't used
* likewise record field names
* entry counts are more accurate (hpc --fun-entry-count)
* tab width is now correct (markup was previously off in source with
tabs)
Internal changes
================
In Core, the Note constructor has been replaced by
Tick (Tickish b) (Expr b)
which is used to represent all the kinds of source annotation we
support: profiling SCCs, HPC ticks, and GHCi breakpoints.
Depending on the properties of the Tickish, different transformations
apply to Tick. See CoreUtils.mkTick for details.
Tickets
=======
This commit closes the following tickets, test cases to follow:
- Close #2552: not a bug, but the behaviour is now more intuitive
(test is T2552)
- Close #680 (test is T680)
- Close #1531 (test is result001)
- Close #949 (test is T949)
- Close #2466: test case has bitrotted (doesn't compile against current
version of vector-space package)
|
| |
|
|
|
|
| |
See Note [atomic CAFs] in rts/sm/Storage.c
|
|
|
|
|
| |
1c2f89535394958f75cfb15c8c5e0433a20953ed (symptom was broken
biographical profiling, see #5451).
|
| |
|
|
|
|
|
|
|
| |
Warning: This change seems to tickle a bug in ghc-stage1 compiler
built with GHC 6.12.1 during validates.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the new code generator to make use of the Hoopl package
for dataflow analysis. Hoopl is a new boot package, and is maintained
in a separate upstream git repository (as usual, GHC has its own
lagging darcs mirror in http://darcs.haskell.org/packages/hoopl).
During this merge I squashed recent history into one patch. I tried
to rebase, but the history had some internal conflicts of its own
which made rebase extremely confusing, so I gave up. The history I
squashed was:
- Update new codegen to work with latest Hoopl
- Add some notes on new code gen to cmm-notes
- Enable Hoopl lag package.
- Add SPJ note to cmm-notes
- Improve GC calls on new code generator.
Work in this branch was done by:
- Milan Straka <fox@ucw.cz>
- John Dias <dias@cs.tufts.edu>
- David Terei <davidterei@gmail.com>
Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD
and fixed a few bugs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces the global blackhole_queue with a clever scheme that
enables us to queue up blocked threads on the closure that they are
blocked on, while still avoiding atomic instructions in the common
case.
Advantages:
- gets rid of a locked global data structure and some tricky GC code
(replacing it with some per-thread data structures and different
tricky GC code :)
- wakeups are more prompt: parallel/concurrent performance should
benefit. I haven't seen anything dramatic in the parallel
benchmarks so far, but a couple of threading benchmarks do improve
a bit.
- waking up a thread blocked on a blackhole is now O(1) (e.g. if
it is the target of throwTo).
- less sharing and better separation of Capabilities: communication
is done with messages, the data structures are strictly owned by a
Capability and cannot be modified except by sending messages.
- this change will utlimately enable us to do more intelligent
scheduling when threads block on each other. This is what started
off the whole thing, but it isn't done yet (#3838).
I'll be documenting all this on the wiki in due course.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type of the CmmLabel ctor is now
CmmLabel :: PackageId -> FastString -> CmmLabelInfo -> CLabel
- When you construct a CmmLabel you have to explicitly say what
package it is in. Many of these will just use rtsPackageId, but
I've left it this way to remind people not to pretend labels are
in the RTS package when they're not.
- When parsing a Cmm file, labels that are not defined in the
current file are assumed to be in the RTS package.
Labels imported like
import label
are assumed to be in a generic "foreign" package, which is different
from the current one.
Labels imported like
import "package-name" label
are marked as coming from the named package.
This last one is needed for the integer-gmp library as we want to
refer to labels that are not in the same compilation unit, but
are in the same non-rts package.
This should help remove the nasty #ifdef __PIC__ stuff from
integer-gmp/cbits/gmp-wrappers.cmm
|
| |
|
|
|
|
| |
This one complains sometimes, but there's no good way to improve it.
|
| |
|
|
|
|
| |
We used to use StaticFlags
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This merge does not turn on the new codegen (which only compiles
a select few programs at this point),
but it does introduce some changes to the old code generator.
The high bits:
1. The Rep Swamp patch is finally here.
The highlight is that the representation of types at the
machine level has changed.
Consequently, this patch contains updates across several back ends.
2. The new Stg -> Cmm path is here, although it appears to have a
fair number of bugs lurking.
3. Many improvements along the CmmCPSZ path, including:
o stack layout
o some code for infotables, half of which is right and half wrong
o proc-point splitting
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Eager blackholing can improve parallel performance by reducing the
chances that two threads perform the same computation. However, it
has a cost: one extra memory write per thunk entry.
To get the best results, any code which may be executed in parallel
should be compiled with eager blackholing turned on. But since
there's a cost for sequential code, we make it optional and turn it on
for the parallel package only. It might be a good idea to compile
applications (or modules) with parallel code in with
-feager-blackholing.
ToDo: document -feager-blackholing.
|
|
|
|
|
|
| |
C-- no longer has 'hints'; to guide parameter passing, it
has 'kinds'. Renamed type constructor, data constructor, and record
fields accordingly
|
| |
|
|
|
|
|
|
|
| |
This allows the instance of UserOfLocalRegs to be within Haskell98, and IMHO
makes the code a little cleaner generally.
This is one small (though tedious) step towards making GHC's code more
portable...
|
| |
|
|
|
|
|
|
|
| |
Older GHCs can't parse OPTIONS_GHC.
This also changes the URL referenced for the -w options from
WorkingConventions#Warnings to CodingStyle#Warnings for the compiler
modules.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements pointer tagging as per our ICFP'07 paper "Faster
laziness using dynamic pointer tagging". It improves performance by
10-15% for most workloads, including GHC itself.
The original patches were by Alexey Rodriguez Yakushev
<mrchebas@gmail.com>, with additions and improvements by me. I've
re-recorded the development as a single patch.
The basic idea is this: we use the low 2 bits of a pointer to a heap
object (3 bits on a 64-bit architecture) to encode some information
about the object pointed to. For a constructor, we encode the "tag"
of the constructor (e.g. True vs. False), for a function closure its
arity. This enables some decisions to be made without dereferencing
the pointer, which speeds up some common operations. In particular it
enables us to avoid costly indirect jumps in many cases.
More information in the commentary:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging
|
|
|
|
|
| |
mapAccumL and mapAccumR are in Data.List now.
mapAccumB is unused.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following changes restore ticky-ticky profiling to functionality
from its formerly bit-rotted state. Sort of. (It got bit-rotted as part
of the switch to the C-- back-end.)
The way that ticky-ticky is supposed to work is documented in Section 5.7
of the GHC manual (though the manual doesn't mention that it hasn't worked
since sometime around 6.0, alas). Changes from this are as follows (which
I'll document on the wiki):
* In the past, you had to build all of the libraries with way=t in order to
use ticky-ticky, because it entailed a different closure layout. No longer.
You still need to do make way=t in rts/ in order to build the ticky RTS,
but you should now be able to mix ticky and non-ticky modules.
* Some of the counters that worked in the past aren't implemented yet.
I was originally just trying to get entry counts to work, so those should
be correct. The list of counters was never documented in the first place,
so I hope it's not too much of a disaster that some don't appear anymore.
Someday, someone (perhaps me) should document all the counters and what
they do. For now, all of the counters are either accurate (or at least as
accurate as they always were), zero, or missing from the ticky profiling
report altogether.
This hasn't been particularly well-tested, but these changes shouldn't
affect anything except when compiling with -fticky-ticky (famous last
words...)
Implementation details:
I got rid of StgTicky.h, which in the past had the macros and declarations
for all of the ticky counters. Now, those macros are defined in Cmm.h.
StgTicky.h was still there for inclusion in C code. Now, any remaining C
code simply cannot call the ticky macros -- or rather, they do call those
macros, but from the perspective of C code, they're defined as no-ops.
(This shouldn't be too big a problem.)
I added a new file TickyCounter.h that has all the declarations for ticky
counters, as well as dummy macros for use in C code. Someday, these
declarations should really be automatically generated, since they need
to be kept consistent with the macros defined in Cmm.h.
Other changes include getting rid of the header that was getting added to
closures before, and getting rid of various code having to do with eager
blackholing and permanent indirections (the changes under compiler/
and rts/Updates.*).
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is a start on removing import lists and generally tidying
up the top of each module. In addition to removing import lists:
- Change DATA.IOREF -> Data.IORef etc.
- Change List -> Data.List etc.
- Remove $Id$
- Update copyrights
- Re-order imports to put non-GHC imports last
- Remove some unused and duplicate imports
|