summaryrefslogtreecommitdiff
path: root/ghc/docs/comm/the-beast/ghci.html
diff options
context:
space:
mode:
Diffstat (limited to 'ghc/docs/comm/the-beast/ghci.html')
-rw-r--r--ghc/docs/comm/the-beast/ghci.html407
1 files changed, 0 insertions, 407 deletions
diff --git a/ghc/docs/comm/the-beast/ghci.html b/ghc/docs/comm/the-beast/ghci.html
deleted file mode 100644
index b893acdeb4..0000000000
--- a/ghc/docs/comm/the-beast/ghci.html
+++ /dev/null
@@ -1,407 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
-<html>
- <head>
- <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
- <title>The GHC Commentary - GHCi</title>
- </head>
-
- <body BGCOLOR="FFFFFF">
- <h1>The GHC Commentary - GHCi</h1>
-
- This isn't a coherent description of how GHCi works, sorry. What
- it is (currently) is a dumping ground for various bits of info
- pertaining to GHCi, which ought to be recorded somewhere.
-
- <h2>Debugging the interpreter</h2>
-
- The usual symptom is that some expression / program crashes when
- running on the interpreter (commonly), or gets wierd results
- (rarely). Unfortunately, finding out what the problem really is
- has proven to be extremely difficult. In retrospect it may be
- argued a design flaw that GHC's implementation of the STG
- execution mechanism provides only the weakest of support for
- automated internal consistency checks. This makes it hard to
- debug.
- <p>
- Execution failures in the interactive system can be due to
- problems with the bytecode interpreter, problems with the bytecode
- generator, or problems elsewhere. From the bugs seen so far,
- the bytecode generator is often the culprit, with the interpreter
- usually being correct.
- <p>
- Here are some tips for tracking down interactive nonsense:
- <ul>
- <li>Find the smallest source fragment which causes the problem.
- <p>
- <li>Using an RTS compiled with <code>-DDEBUG</code> (nb, that
- means the RTS from the previous stage!), run with <code>+RTS
- -D2</code> to get a listing in great detail from the
- interpreter. Note that the listing is so voluminous that
- this is impractical unless you have been diligent in
- the previous step.
- <p>
- <li>At least in principle, using the trace and a bit of GDB
- poking around at the time of death, you can figure out what
- the problem is. In practice you quickly get depressed at
- the hopelessness of ever making sense of the mass of
- details. Well, I do, anyway.
- <p>
- <li><code>+RTS -D2</code> tries hard to print useful
- descriptions of what's on the stack, and often succeeds.
- However, it has no way to map addresses to names in
- code/data loaded by our runtime linker. So the C function
- <code>ghci_enquire</code> is provided. Given an address, it
- searches the loaded symbol tables for symbols close to that
- address. You can run it from inside GDB:
- <pre>
- (gdb) p ghci_enquire ( 0x50a406f0 )
- 0x50a406f0 + -48 == `PrelBase_Czh_con_info'
- 0x50a406f0 + -12 == `PrelBase_Izh_static_info'
- 0x50a406f0 + -48 == `PrelBase_Czh_con_entry'
- 0x50a406f0 + -24 == `PrelBase_Izh_con_info'
- 0x50a406f0 + 16 == `PrelBase_ZC_con_entry'
- 0x50a406f0 + 0 == `PrelBase_ZMZN_static_entry'
- 0x50a406f0 + -36 == `PrelBase_Czh_static_entry'
- 0x50a406f0 + -24 == `PrelBase_Izh_con_entry'
- 0x50a406f0 + 64 == `PrelBase_EQ_static_info'
- 0x50a406f0 + 0 == `PrelBase_ZMZN_static_info'
- 0x50a406f0 + 48 == `PrelBase_LT_static_entry'
- $1 = void
- </pre>
- In this case the enquired-about address is
- <code>PrelBase_ZMZN_static_entry</code>. If no symbols are
- close to the given addr, nothing is printed. Not a great
- mechanism, but better than nothing.
- <p>
- <li>We have had various problems in the past due to the bytecode
- generator (<code>compiler/ghci/ByteCodeGen.lhs</code>) being
- confused about the true set of free variables of an
- expression. The compilation scheme for <code>let</code>s
- applies the BCO for the RHS of the let to its free
- variables, so if the free-var annotation is wrong or
- misleading, you end up with code which has wrong stack
- offsets, which is usually fatal.
- <p>
- <li>The baseline behaviour of the interpreter is to interpret
- BCOs, and hand all other closures back to the scheduler for
- evaluation. However, this causes a huge number of expensive
- context switches, so the interpreter knows how to enter the
- most common non-BCO closure types by itself.
- <p>
- These optimisations complicate the interpreter.
- If you think you have an interpreter problem, re-enable the
- define <code>REFERENCE_INTERPRETER</code> in
- <code>ghc/rts/Interpreter.c</code>. All optimisations are
- thereby disabled, giving the baseline
- I-only-know-how-to-enter-BCOs behaviour.
- <p>
- <li>Following the traces is often problematic because execution
- hops back and forth between the interpreter, which is
- traced, and compiled code, which you can't see.
- Particularly annoying is when the stack looks OK in the
- interpreter, then compiled code runs for a while, and later
- we arrive back in the interpreter, with the stack corrupted,
- and usually in a completely different place from where we
- left off.
- <p>
- If this is biting you baaaad, it may be worth copying
- sources for the compiled functions causing the problem, into
- your interpreted module, in the hope that you stay in the
- interpreter more of the time. Of course this doesn't work
- very well if you've defined
- <code>REFERENCE_INTERPRETER</code> in
- <code>ghc/rts/Interpreter.c</code>.
- <p>
- <li>There are various commented-out pieces of code in
- <code>Interpreter.c</code> which can be used to get the
- stack sanity-checked after every entry, and even after after
- every bytecode instruction executed. Note that some
- bytecodes (<code>PUSH_UBX</code>) leave the stack in
- an unwalkable state, so the <code>do_print_stack</code>
- local variable is used to suppress the stack walk after
- them.
- </ul>
-
-
- <h2>Useful stuff to know about the interpreter</h2>
-
- The code generation scheme is straightforward (naive, in fact).
- <code>-ddump-bcos</code> prints each BCO along with the Core it
- was generated from, which is very handy.
- <ul>
- <li>Simple lets are compiled in-line. For the general case, let
- v = E in ..., E is compiled into a new BCO which takes as
- args its free variables, and v is bound to AP(the new BCO,
- free vars of E).
- <p>
- <li><code>case</code>s as usual, become: push the return
- continuation, enter the scrutinee. There is some magic to
- make all combinations of compiled/interpreted calls and
- returns work, described below. In the interpreted case, all
- case alts are compiled into a single big return BCO, which
- commences with instructions implementing a switch tree.
- </ul>
- <p>
- <b>ARGCHECK magic</b>
- <p>
- You may find ARGCHECK instructions at the start of BCOs which
- don't appear to need them; case continuations in particular.
- These play an important role: they force objects which should
- evaluated to BCOs to actually be BCOs.
- <p>
- Typically, there may be an application node somewhere in the heap.
- This is a thunk which when leant on turns into a BCO for a return
- continuation. The thunk may get entered with an update frame on
- top of the stack. This is legitimate since from one viewpoint
- this is an AP which simply reduces to a data object, so does not
- have functional type. However, once the AP turns itself into a
- BCO (so to speak) we cannot simply enter the BCO, because that
- expects to see args on top of the stack, not an update frame.
- Therefore any BCO which expects something on the stack above an
- update frame, even non-function BCOs, start with an ARGCHECK. In
- this case it fails, the update is done, the update frame is
- removed, and the BCO re-entered. Subsequent entries of the BCO of
- course go unhindered.
- <p>
- The optimised (<code>#undef REFERENCE_INTERPRETER</code>) handles
- this case specially, so that a trip through the scheduler is
- avoided. When reading traces from <code>+RTS -D2 -RTS</code>, you
- may see BCOs which appear to execute their initial ARGCHECK insn
- twice. The first time it fails; the interpreter does the update
- immediately and re-enters with no further comment.
- <p>
- This is all a bit ugly, and, as SimonM correctly points out, it
- would have been cleaner to make BCOs unpointed (unthunkable)
- objects, so that a pointer to something <code>:: BCO#</code>
- really points directly at a BCO.
- <p>
- <b>Stack management</b>
- <p>
- There isn't any attempt to stub the stack, minimise its growth, or
- generally remove unused pointers ahead of time. This is really
- due to lazyness on my part, although it does have the minor
- advantage that doing something cleverer would almost certainly
- increase the number of bytecodes that would have to be executed.
- Of course we SLIDE out redundant stuff, to get the stack back to
- the sequel depth, before returning a HNF, but that's all. As
- usual this is probably a cause of major space leaks.
- <p>
- <b>Building constructors</b>
- <p>
- Constructors are built on the stack and then dumped into the heap
- with a single PACK instruction, which simply copies the top N
- words of the stack verbatim into the heap, adds an info table, and zaps N
- words from the stack. The constructor args are pushed onto the
- stack one at a time. One upshot of this is that unboxed values
- get pushed untaggedly onto the stack (via PUSH_UBX), because that's how they
- will be in the heap. That in turn means that the stack is not
- always walkable at arbitrary points in BCO execution, although
- naturally it is whenever GC might occur.
- <p>
- Function closures created by the interpreter use the AP-node
- (tagged) format, so although their fields are similarly
- constructed on the stack, there is never a stack walkability
- problem.
- <p>
- <b>Unpacking constructors</b>
- <p>
- At the start of a case continuation, the returned constructor is
- unpacked onto the stack, which means that unboxed fields have to
- be tagged. Rather than burdening all such continuations with a
- complex, general mechanism, I split it into two. The
- allegedly-common all-pointers case uses a single UNPACK insn
- to fish out all fields with no further ado. The slow case uses a
- sequence of more complex UPK_TAG insns, one for each field (I
- think). This seemed like a good compromise to me.
- <p>
- <b>Perspective</b>
- <p>
- I designed the bytecode mechanism with the experience of both STG
- hugs and Classic Hugs in mind. The latter has an small
- set of bytecodes, a small interpreter loop, and runs amazingly
- fast considering the cruddy code it has to interpret. The former
- had a large interpretative loop with many different opcodes,
- including multiple minor variants of the same thing, which
- made it difficult to optimise and maintain, yet it performed more
- or less comparably with Classic Hugs.
- <p>
- My design aims were therefore to minimise the interpreter's
- complexity whilst maximising performance. This means reducing the
- number of opcodes implemented, whilst reducing the number of insns
- despatched. In particular there are only two opcodes, PUSH_UBX
- and UPK_TAG, which deal with tags. STG Hugs had dozens of opcodes
- for dealing with tagged data. In cases where the common
- all-pointers case is significantly simpler (UNPACK) I deal with it
- specially. Finally, the number of insns executed is reduced a
- little by merging multiple pushes, giving PUSH_LL and PUSH_LLL.
- These opcode pairings were determined by using the opcode-pair
- frequency profiling stuff which is ifdef-d out in
- <code>Interpreter.c</code>. These significantly improve
- performance without having much effect on the uglyness or
- complexity of the interpreter.
- <p>
- Overall, the interpreter design is something which turned out
- well, and I was pleased with it. Unfortunately I cannot say the
- same of the bytecode generator.
-
- <h2><code>case</code> returns between interpreted and compiled code</h2>
-
- Variants of the following scheme have been drifting around in GHC
- RTS documentation for several years. Since what follows is
- actually what is implemented, I guess it supersedes all other
- documentation. Beware; the following may make your brain melt.
- In all the pictures below, the stack grows downwards.
- <p>
- <b>Returning to interpreted code</b>.
- <p>
- Interpreted returns employ a set of polymorphic return infotables.
- Each element in the set corresponds to one of the possible return
- registers (R1, D1, F1) that compiled code will place the returned
- value in. In fact this is a bit misleading, since R1 can be used
- to return either a pointer or an int, and we need to distinguish
- these cases. So, supposing the set of return registers is {R1p,
- R1n, D1, F1}, there would be four corresponding infotables,
- <code>stg_ctoi_ret_R1p_info</code>, etc. In the pictures below we
- call them <code>stg_ctoi_ret_REP_info</code>.
- <p>
- These return itbls are polymorphic, meaning that all 8 vectored
- return codes and the direct return code are identical.
- <p>
- Before the scrutinee is entered, the stack is arranged like this:
- <pre>
- | |
- +--------+
- | BCO | -------> the return contination BCO
- +--------+
- | itbl * | -------> stg_ctoi_ret_REP_info, with all 9 codes as follows:
- +--------+
- BCO* bco = Sp[1];
- push R1/F1/D1 depending on REP
- push bco
- yield to sched
- </pre>
- On entry, the interpreted contination BCO expects the stack to look
- like this:
- <pre>
- | |
- +--------+
- | BCO | -------> the return contination BCO
- +--------+
- | itbl * | -------> ret_REP_ctoi_info, with all 9 codes as follows:
- +--------+
- : VALUE : (the returned value, shown with : since it may occupy
- +--------+ multiple stack words)
- </pre>
- A machine code return will park the returned value in R1/F1/D1,
- and enter the itbl on the top of the stack. Since it's our magic
- itbl, this pushes the returned value onto the stack, which is
- where the interpreter expects to find it. It then pushes the BCO
- (again) and yields. The scheduler removes the BCO from the top,
- and enters it, so that the continuation is interpreted with the
- stack as shown above.
- <p>
- An interpreted return will create the value to return at the top
- of the stack. It then examines the return itbl, which must be
- immediately underneath the return value, to see if it is one of
- the magic <code>stg_ctoi_ret_REP_info</code> set. Since this is so,
- it knows it is returning to an interpreted contination. It
- therefore simply enters the BCO which it assumes it immediately
- underneath the itbl on the stack.
-
- <p>
- <b>Returning to compiled code</b>.
- <p>
- Before the scrutinee is entered, the stack is arranged like this:
- <pre>
- ptr to vec code 8 ------> return vector code 8
- | | ....
- +--------+ ptr to vec code 1 ------> return vector code 1
- | itbl * | -- Itbl end
- +--------+ \ ....
- \ Itbl start
- ----> direct return code
- </pre>
- The scrutinee value is then entered.
- The case continuation(s) expect the stack to look the same, with
- the returned HNF in a suitable return register, R1, D1, F1 etc.
- <p>
- A machine code return knows whether it is doing a vectored or
- direct return, and, if the former, which vector element it is.
- So, for a direct return we jump to <code>Sp[0]</code>, and for a
- vectored return, jump to <code>((CodePtr*)(Sp[0]))[ - ITBL_LENGTH
- - vector number ]</code>. This is (of course) the scheme that
- compiled code has been using all along.
- <p>
- An interpreted return will, as described just above, have examined
- the itbl immediately beneath the return value it has just pushed,
- and found it not to be one of the <code>ret_REP_ctoi_info</code> set,
- so it knows this must be a return to machine code. It needs to
- pop the return value, currently on the stack, into R1/F1/D1, and
- jump through the info table. Unfortunately the first part cannot
- be accomplished directly since we are not in Haskellised-C world.
- <p>
- We therefore employ a second family of magic infotables, indexed,
- like the first, on the return representation, and therefore with
- names of the form <code>stg_itoc_ret_REP_info</code>. (Note:
- <code>itoc</code>; the previous bunch were <code>ctoi</code>).
- This is pushed onto the stack (note, tagged values have their tag
- zapped), giving:
- <pre>
- | |
- +--------+
- | itbl * | -------> arbitrary machine code return itbl
- +--------+
- : VALUE : (the returned value, possibly multiple words)
- +--------+
- | itbl * | -------> stg_itoc_ret_REP_info, with code:
- +--------+
- pop myself (stg_itoc_ret_REP_info) off the stack
- pop return value into R1/D1/F1
- do standard machine code return to itbl at t.o.s.
- </pre>
- We then return to the scheduler, asking it to enter the itbl at
- t.o.s. When entered, <code>stg_itoc_ret_REP_info</code> removes
- itself from the stack, pops the return value into the relevant
- return register, and returns to the itbl to which we were trying
- to return in the first place.
- <p>
- Amazingly enough, this stuff all actually works! Well, mostly ...
- <p>
- <b>Unboxed tuples: a Right Royal Spanner In The Works</b>
- <p>
- The above scheme depends crucially on having magic infotables
- <code>stg_{itoc,ctoi}_ret_REP_info</code> for each return
- representation <code>REP</code>. It unfortunately fails miserably
- in the face of unboxed tuple returns, because the set of required
- tables would be infinite; this despite the fact that for any given
- unboxed tuple return type, the scheme could be made to work fine.
- <p>
- This is a serious problem, because it prevents interpreted
- code from doing <code>IO</code>-typed returns, since <code>IO
- t</code> is implemented as <code>(# t, RealWorld# #)</code> or
- thereabouts. This restriction in turn rules out FFI stuff in the
- interpreter. Not good.
- <p>
- Although we have no way to make general unboxed tuples work, we
- can at least make <code>IO</code>-types work using the following
- ultra-kludgey observation: <code>RealWorld#</code> doesn't really
- exist and so has zero size, in compiled code. In turn this means
- that a type of the form <code>(# t, RealWorld# #)</code> has the
- same representation as plain <code>t</code> does. So the bytecode
- generator, whilst rejecting code with general unboxed tuple
- returns, recognises and accepts this special case. Which means
- that <code>IO</code>-typed stuff works in the interpreter. Just.
- <p>
- If anyone asks, I will claim I was out of radio contact, on a
- 6-month walking holiday to the south pole, at the time this was
- ... er ... dreamt up.
-
-
-<p><small>
-
-<!-- hhmts start -->
-Last modified: Thursday February 7 15:33:49 GMT 2002
-<!-- hhmts end -->
- </small>
- </body>
-</html>