diff options
Diffstat (limited to 'ghc/docs/comm/the-beast/ghci.html')
-rw-r--r-- | ghc/docs/comm/the-beast/ghci.html | 407 |
1 files changed, 0 insertions, 407 deletions
diff --git a/ghc/docs/comm/the-beast/ghci.html b/ghc/docs/comm/the-beast/ghci.html deleted file mode 100644 index b893acdeb4..0000000000 --- a/ghc/docs/comm/the-beast/ghci.html +++ /dev/null @@ -1,407 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -<html> - <head> - <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"> - <title>The GHC Commentary - GHCi</title> - </head> - - <body BGCOLOR="FFFFFF"> - <h1>The GHC Commentary - GHCi</h1> - - This isn't a coherent description of how GHCi works, sorry. What - it is (currently) is a dumping ground for various bits of info - pertaining to GHCi, which ought to be recorded somewhere. - - <h2>Debugging the interpreter</h2> - - The usual symptom is that some expression / program crashes when - running on the interpreter (commonly), or gets wierd results - (rarely). Unfortunately, finding out what the problem really is - has proven to be extremely difficult. In retrospect it may be - argued a design flaw that GHC's implementation of the STG - execution mechanism provides only the weakest of support for - automated internal consistency checks. This makes it hard to - debug. - <p> - Execution failures in the interactive system can be due to - problems with the bytecode interpreter, problems with the bytecode - generator, or problems elsewhere. From the bugs seen so far, - the bytecode generator is often the culprit, with the interpreter - usually being correct. - <p> - Here are some tips for tracking down interactive nonsense: - <ul> - <li>Find the smallest source fragment which causes the problem. - <p> - <li>Using an RTS compiled with <code>-DDEBUG</code> (nb, that - means the RTS from the previous stage!), run with <code>+RTS - -D2</code> to get a listing in great detail from the - interpreter. Note that the listing is so voluminous that - this is impractical unless you have been diligent in - the previous step. - <p> - <li>At least in principle, using the trace and a bit of GDB - poking around at the time of death, you can figure out what - the problem is. In practice you quickly get depressed at - the hopelessness of ever making sense of the mass of - details. Well, I do, anyway. - <p> - <li><code>+RTS -D2</code> tries hard to print useful - descriptions of what's on the stack, and often succeeds. - However, it has no way to map addresses to names in - code/data loaded by our runtime linker. So the C function - <code>ghci_enquire</code> is provided. Given an address, it - searches the loaded symbol tables for symbols close to that - address. You can run it from inside GDB: - <pre> - (gdb) p ghci_enquire ( 0x50a406f0 ) - 0x50a406f0 + -48 == `PrelBase_Czh_con_info' - 0x50a406f0 + -12 == `PrelBase_Izh_static_info' - 0x50a406f0 + -48 == `PrelBase_Czh_con_entry' - 0x50a406f0 + -24 == `PrelBase_Izh_con_info' - 0x50a406f0 + 16 == `PrelBase_ZC_con_entry' - 0x50a406f0 + 0 == `PrelBase_ZMZN_static_entry' - 0x50a406f0 + -36 == `PrelBase_Czh_static_entry' - 0x50a406f0 + -24 == `PrelBase_Izh_con_entry' - 0x50a406f0 + 64 == `PrelBase_EQ_static_info' - 0x50a406f0 + 0 == `PrelBase_ZMZN_static_info' - 0x50a406f0 + 48 == `PrelBase_LT_static_entry' - $1 = void - </pre> - In this case the enquired-about address is - <code>PrelBase_ZMZN_static_entry</code>. If no symbols are - close to the given addr, nothing is printed. Not a great - mechanism, but better than nothing. - <p> - <li>We have had various problems in the past due to the bytecode - generator (<code>compiler/ghci/ByteCodeGen.lhs</code>) being - confused about the true set of free variables of an - expression. The compilation scheme for <code>let</code>s - applies the BCO for the RHS of the let to its free - variables, so if the free-var annotation is wrong or - misleading, you end up with code which has wrong stack - offsets, which is usually fatal. - <p> - <li>The baseline behaviour of the interpreter is to interpret - BCOs, and hand all other closures back to the scheduler for - evaluation. However, this causes a huge number of expensive - context switches, so the interpreter knows how to enter the - most common non-BCO closure types by itself. - <p> - These optimisations complicate the interpreter. - If you think you have an interpreter problem, re-enable the - define <code>REFERENCE_INTERPRETER</code> in - <code>ghc/rts/Interpreter.c</code>. All optimisations are - thereby disabled, giving the baseline - I-only-know-how-to-enter-BCOs behaviour. - <p> - <li>Following the traces is often problematic because execution - hops back and forth between the interpreter, which is - traced, and compiled code, which you can't see. - Particularly annoying is when the stack looks OK in the - interpreter, then compiled code runs for a while, and later - we arrive back in the interpreter, with the stack corrupted, - and usually in a completely different place from where we - left off. - <p> - If this is biting you baaaad, it may be worth copying - sources for the compiled functions causing the problem, into - your interpreted module, in the hope that you stay in the - interpreter more of the time. Of course this doesn't work - very well if you've defined - <code>REFERENCE_INTERPRETER</code> in - <code>ghc/rts/Interpreter.c</code>. - <p> - <li>There are various commented-out pieces of code in - <code>Interpreter.c</code> which can be used to get the - stack sanity-checked after every entry, and even after after - every bytecode instruction executed. Note that some - bytecodes (<code>PUSH_UBX</code>) leave the stack in - an unwalkable state, so the <code>do_print_stack</code> - local variable is used to suppress the stack walk after - them. - </ul> - - - <h2>Useful stuff to know about the interpreter</h2> - - The code generation scheme is straightforward (naive, in fact). - <code>-ddump-bcos</code> prints each BCO along with the Core it - was generated from, which is very handy. - <ul> - <li>Simple lets are compiled in-line. For the general case, let - v = E in ..., E is compiled into a new BCO which takes as - args its free variables, and v is bound to AP(the new BCO, - free vars of E). - <p> - <li><code>case</code>s as usual, become: push the return - continuation, enter the scrutinee. There is some magic to - make all combinations of compiled/interpreted calls and - returns work, described below. In the interpreted case, all - case alts are compiled into a single big return BCO, which - commences with instructions implementing a switch tree. - </ul> - <p> - <b>ARGCHECK magic</b> - <p> - You may find ARGCHECK instructions at the start of BCOs which - don't appear to need them; case continuations in particular. - These play an important role: they force objects which should - evaluated to BCOs to actually be BCOs. - <p> - Typically, there may be an application node somewhere in the heap. - This is a thunk which when leant on turns into a BCO for a return - continuation. The thunk may get entered with an update frame on - top of the stack. This is legitimate since from one viewpoint - this is an AP which simply reduces to a data object, so does not - have functional type. However, once the AP turns itself into a - BCO (so to speak) we cannot simply enter the BCO, because that - expects to see args on top of the stack, not an update frame. - Therefore any BCO which expects something on the stack above an - update frame, even non-function BCOs, start with an ARGCHECK. In - this case it fails, the update is done, the update frame is - removed, and the BCO re-entered. Subsequent entries of the BCO of - course go unhindered. - <p> - The optimised (<code>#undef REFERENCE_INTERPRETER</code>) handles - this case specially, so that a trip through the scheduler is - avoided. When reading traces from <code>+RTS -D2 -RTS</code>, you - may see BCOs which appear to execute their initial ARGCHECK insn - twice. The first time it fails; the interpreter does the update - immediately and re-enters with no further comment. - <p> - This is all a bit ugly, and, as SimonM correctly points out, it - would have been cleaner to make BCOs unpointed (unthunkable) - objects, so that a pointer to something <code>:: BCO#</code> - really points directly at a BCO. - <p> - <b>Stack management</b> - <p> - There isn't any attempt to stub the stack, minimise its growth, or - generally remove unused pointers ahead of time. This is really - due to lazyness on my part, although it does have the minor - advantage that doing something cleverer would almost certainly - increase the number of bytecodes that would have to be executed. - Of course we SLIDE out redundant stuff, to get the stack back to - the sequel depth, before returning a HNF, but that's all. As - usual this is probably a cause of major space leaks. - <p> - <b>Building constructors</b> - <p> - Constructors are built on the stack and then dumped into the heap - with a single PACK instruction, which simply copies the top N - words of the stack verbatim into the heap, adds an info table, and zaps N - words from the stack. The constructor args are pushed onto the - stack one at a time. One upshot of this is that unboxed values - get pushed untaggedly onto the stack (via PUSH_UBX), because that's how they - will be in the heap. That in turn means that the stack is not - always walkable at arbitrary points in BCO execution, although - naturally it is whenever GC might occur. - <p> - Function closures created by the interpreter use the AP-node - (tagged) format, so although their fields are similarly - constructed on the stack, there is never a stack walkability - problem. - <p> - <b>Unpacking constructors</b> - <p> - At the start of a case continuation, the returned constructor is - unpacked onto the stack, which means that unboxed fields have to - be tagged. Rather than burdening all such continuations with a - complex, general mechanism, I split it into two. The - allegedly-common all-pointers case uses a single UNPACK insn - to fish out all fields with no further ado. The slow case uses a - sequence of more complex UPK_TAG insns, one for each field (I - think). This seemed like a good compromise to me. - <p> - <b>Perspective</b> - <p> - I designed the bytecode mechanism with the experience of both STG - hugs and Classic Hugs in mind. The latter has an small - set of bytecodes, a small interpreter loop, and runs amazingly - fast considering the cruddy code it has to interpret. The former - had a large interpretative loop with many different opcodes, - including multiple minor variants of the same thing, which - made it difficult to optimise and maintain, yet it performed more - or less comparably with Classic Hugs. - <p> - My design aims were therefore to minimise the interpreter's - complexity whilst maximising performance. This means reducing the - number of opcodes implemented, whilst reducing the number of insns - despatched. In particular there are only two opcodes, PUSH_UBX - and UPK_TAG, which deal with tags. STG Hugs had dozens of opcodes - for dealing with tagged data. In cases where the common - all-pointers case is significantly simpler (UNPACK) I deal with it - specially. Finally, the number of insns executed is reduced a - little by merging multiple pushes, giving PUSH_LL and PUSH_LLL. - These opcode pairings were determined by using the opcode-pair - frequency profiling stuff which is ifdef-d out in - <code>Interpreter.c</code>. These significantly improve - performance without having much effect on the uglyness or - complexity of the interpreter. - <p> - Overall, the interpreter design is something which turned out - well, and I was pleased with it. Unfortunately I cannot say the - same of the bytecode generator. - - <h2><code>case</code> returns between interpreted and compiled code</h2> - - Variants of the following scheme have been drifting around in GHC - RTS documentation for several years. Since what follows is - actually what is implemented, I guess it supersedes all other - documentation. Beware; the following may make your brain melt. - In all the pictures below, the stack grows downwards. - <p> - <b>Returning to interpreted code</b>. - <p> - Interpreted returns employ a set of polymorphic return infotables. - Each element in the set corresponds to one of the possible return - registers (R1, D1, F1) that compiled code will place the returned - value in. In fact this is a bit misleading, since R1 can be used - to return either a pointer or an int, and we need to distinguish - these cases. So, supposing the set of return registers is {R1p, - R1n, D1, F1}, there would be four corresponding infotables, - <code>stg_ctoi_ret_R1p_info</code>, etc. In the pictures below we - call them <code>stg_ctoi_ret_REP_info</code>. - <p> - These return itbls are polymorphic, meaning that all 8 vectored - return codes and the direct return code are identical. - <p> - Before the scrutinee is entered, the stack is arranged like this: - <pre> - | | - +--------+ - | BCO | -------> the return contination BCO - +--------+ - | itbl * | -------> stg_ctoi_ret_REP_info, with all 9 codes as follows: - +--------+ - BCO* bco = Sp[1]; - push R1/F1/D1 depending on REP - push bco - yield to sched - </pre> - On entry, the interpreted contination BCO expects the stack to look - like this: - <pre> - | | - +--------+ - | BCO | -------> the return contination BCO - +--------+ - | itbl * | -------> ret_REP_ctoi_info, with all 9 codes as follows: - +--------+ - : VALUE : (the returned value, shown with : since it may occupy - +--------+ multiple stack words) - </pre> - A machine code return will park the returned value in R1/F1/D1, - and enter the itbl on the top of the stack. Since it's our magic - itbl, this pushes the returned value onto the stack, which is - where the interpreter expects to find it. It then pushes the BCO - (again) and yields. The scheduler removes the BCO from the top, - and enters it, so that the continuation is interpreted with the - stack as shown above. - <p> - An interpreted return will create the value to return at the top - of the stack. It then examines the return itbl, which must be - immediately underneath the return value, to see if it is one of - the magic <code>stg_ctoi_ret_REP_info</code> set. Since this is so, - it knows it is returning to an interpreted contination. It - therefore simply enters the BCO which it assumes it immediately - underneath the itbl on the stack. - - <p> - <b>Returning to compiled code</b>. - <p> - Before the scrutinee is entered, the stack is arranged like this: - <pre> - ptr to vec code 8 ------> return vector code 8 - | | .... - +--------+ ptr to vec code 1 ------> return vector code 1 - | itbl * | -- Itbl end - +--------+ \ .... - \ Itbl start - ----> direct return code - </pre> - The scrutinee value is then entered. - The case continuation(s) expect the stack to look the same, with - the returned HNF in a suitable return register, R1, D1, F1 etc. - <p> - A machine code return knows whether it is doing a vectored or - direct return, and, if the former, which vector element it is. - So, for a direct return we jump to <code>Sp[0]</code>, and for a - vectored return, jump to <code>((CodePtr*)(Sp[0]))[ - ITBL_LENGTH - - vector number ]</code>. This is (of course) the scheme that - compiled code has been using all along. - <p> - An interpreted return will, as described just above, have examined - the itbl immediately beneath the return value it has just pushed, - and found it not to be one of the <code>ret_REP_ctoi_info</code> set, - so it knows this must be a return to machine code. It needs to - pop the return value, currently on the stack, into R1/F1/D1, and - jump through the info table. Unfortunately the first part cannot - be accomplished directly since we are not in Haskellised-C world. - <p> - We therefore employ a second family of magic infotables, indexed, - like the first, on the return representation, and therefore with - names of the form <code>stg_itoc_ret_REP_info</code>. (Note: - <code>itoc</code>; the previous bunch were <code>ctoi</code>). - This is pushed onto the stack (note, tagged values have their tag - zapped), giving: - <pre> - | | - +--------+ - | itbl * | -------> arbitrary machine code return itbl - +--------+ - : VALUE : (the returned value, possibly multiple words) - +--------+ - | itbl * | -------> stg_itoc_ret_REP_info, with code: - +--------+ - pop myself (stg_itoc_ret_REP_info) off the stack - pop return value into R1/D1/F1 - do standard machine code return to itbl at t.o.s. - </pre> - We then return to the scheduler, asking it to enter the itbl at - t.o.s. When entered, <code>stg_itoc_ret_REP_info</code> removes - itself from the stack, pops the return value into the relevant - return register, and returns to the itbl to which we were trying - to return in the first place. - <p> - Amazingly enough, this stuff all actually works! Well, mostly ... - <p> - <b>Unboxed tuples: a Right Royal Spanner In The Works</b> - <p> - The above scheme depends crucially on having magic infotables - <code>stg_{itoc,ctoi}_ret_REP_info</code> for each return - representation <code>REP</code>. It unfortunately fails miserably - in the face of unboxed tuple returns, because the set of required - tables would be infinite; this despite the fact that for any given - unboxed tuple return type, the scheme could be made to work fine. - <p> - This is a serious problem, because it prevents interpreted - code from doing <code>IO</code>-typed returns, since <code>IO - t</code> is implemented as <code>(# t, RealWorld# #)</code> or - thereabouts. This restriction in turn rules out FFI stuff in the - interpreter. Not good. - <p> - Although we have no way to make general unboxed tuples work, we - can at least make <code>IO</code>-types work using the following - ultra-kludgey observation: <code>RealWorld#</code> doesn't really - exist and so has zero size, in compiled code. In turn this means - that a type of the form <code>(# t, RealWorld# #)</code> has the - same representation as plain <code>t</code> does. So the bytecode - generator, whilst rejecting code with general unboxed tuple - returns, recognises and accepts this special case. Which means - that <code>IO</code>-typed stuff works in the interpreter. Just. - <p> - If anyone asks, I will claim I was out of radio contact, on a - 6-month walking holiday to the south pole, at the time this was - ... er ... dreamt up. - - -<p><small> - -<!-- hhmts start --> -Last modified: Thursday February 7 15:33:49 GMT 2002 -<!-- hhmts end --> - </small> - </body> -</html> |