author    Mikael Djurfeldt <djurfeldt@nada.kth.se>  2000-08-17 04:10:42 +0000
committer Mikael Djurfeldt <djurfeldt@nada.kth.se>  2000-08-17 04:10:42 +0000
commit    2fb8bdabd27f6620f143dc5b9d1b429e4287fe6c (patch)
tree      1237a50be38d29d622af4f409a74a007f481d9f9 /devel
parent    53bb55082885bae1770cd4e4268724c7e7200d6e (diff)
download  guile-2fb8bdabd27f6620f143dc5b9d1b429e4287fe6c.tar.gz
Mikael's ideas on a new type of Scheme interpreter
Diffstat (limited to 'devel')

 devel/vm/ior/ior-intro.text |   0
 devel/vm/ior/ior.text       | 665
 2 files changed, 665 insertions(+), 0 deletions(-)
diff --git a/devel/vm/ior/ior-intro.text b/devel/vm/ior/ior-intro.text
new file mode 100644
index 000000000..e69de29bb
--- /dev/null
+++ b/devel/vm/ior/ior-intro.text
diff --git a/devel/vm/ior/ior.text b/devel/vm/ior/ior.text
new file mode 100644
index 000000000..9730de55d
--- /dev/null
+++ b/devel/vm/ior/ior.text
@@ -0,0 +1,665 @@

***
*** These notes about the design of a new type of Scheme interpreter
*** "Ior" are cut out from various emails from early spring 2000.
***
*** MDJ 000817 <djurfeldt@nada.kth.se>
***

Generally, we should try to make a design which is clean and
minimalistic in as many respects as possible.  For example, even if we
need more primitives than those in R5RS internally, I don't think
these should be made available to the user in the core, but rather be
made available *through* libraries (implementation in the core,
publication via libraries).

The suggested working name for this project is "Ior" (the Swedish name
for the donkey in "Winnie the Pooh" :).  If, against the odds, we
really should succeed in producing an Ior, and we find it suitable, we
could turn it into a Guile 2.0 (or whatever).  (The architecture still
allows for support of the gh interface and uses conservative GC (Hans
Böhm's, in fact).)

  Beware that I'm just sending over my original letter, which is just
  a sketch of the more detailed, but cryptic, design notes I made
  originally, which are, in turn, not as detailed as the design has
  become by now. :)

  Please also excuse the lack of structure.  I shouldn't work on this
  at all right now.  Choose for yourselves whether you want to read
  this unstructured information or whether you want to wait until I've
  structured it after the end of January.

But then I actually have to blurt out the basic idea of my
architecture already now.  (I had hoped to present you with a proper
and fairly detailed spec, but I won't be able to complete such a spec
quickly.)
The basic idea is this:

* Don't waste time on non-computation!

Why waste a lot of time on type-checks, unboxing and boxing of data?
Neither of these actions does any computation!

I'd like both the interpreter and compiled code to work directly with
data in raw, native form (integers represented as 32-bit longs,
inexact numbers as doubles, short strings as bytes in a word, longer
strings as a normal pointer to malloced memory, bignums as just
pointers to a gmp (GNU MultiPrecision library) object, etc.).

* Don't we need to dispatch on type to know what to do?

But don't we need to dispatch on the type in order to know how to
compute with the data?  E.g., `display' does entirely different
computations on a <fixnum> and a <string>.  (A <fixnum> is an integer
between -2^31 and 2^31-1.)

The answer is *no*, not in 95% of all cases.  The main reason is that
the interpreter does type analysis while converting closures to
bytecode, and already knows when _calling_ `display' what types its
arguments have.  This means that the bytecode compiler can choose a
suitable _version_ of `display' which handles that particular type.

This type analysis is greatly simplified by the fact that, just as the
type analysis _results_ in the type of the argument in the call to
`display' (and, thus, we can select the correct _version_ of
`display'), the closure bytecode itself will only be one _version_ of
the closure, with the types of its arguments fixed at the start of the
analysis.

As you have already understood by now, the basic architecture is that
all procedures are generic functions, and the "versions" I'm speaking
about are a kind of method.  Let's call them "branches" for now.

For example:

(define foo
  (lambda (x)
    ...
    (display x)
    ...))

may result in the following two branches:

1. [<fixnum>-foo] =
   (branch ((x <fixnum>))
     ...
     ([<fixnum>-display] x)
     ...)

2. [<string>-foo] =
   (branch ((x <string>))
     ...
     ([<string>-display] x)
     ...)

and a new closure

(define bar
  (lambda (x y)
    ...
    (foo x)
    ...))

results in

[<fixnum>-<fixnum>-bar] =
  (branch ((x <fixnum>) (y <fixnum>))
    ...
    ([<fixnum>-foo] x)
    ...)

Note how all type dispatch is eliminated in these examples.

As a further reinforcement of the type analysis, branches will not
only have typed parameters but also return types.  This means that the
type of a branch will look like

  <type 1> x ... x <type n> --> <type r>

In essence, the entire system will be very ML-like internally, and we
can benefit from the research done on ML compilation.

However, we now have three major problems to confront:

1. In the Scheme language, not all situations can be completely type
   analyzed.

2. In particular, for some operations, even if the types of the
   parameters are well defined, we can't determine the return type
   generically.  For example, [<fixnum>-<fixnum>-+] may have return
   type <fixnum> _or_ <bignum>.

3. Even if we can do a complete analysis, some closures will generate
   a combinatoric explosion of branches.


Problem 1: Incomplete analysis

We introduce a new type <boxed>.  An object of this type has type
<boxed> and contents

struct ior_boxed_t {
  ior_type *type;  /* pointer to class struct */
  void *data;      /* generic field, may also contain immediate
                      objects */
};

For example, a boxed fixnum 4711 has type <boxed> and contents
{ <fixnum>, 4711 }.  The boxed type essentially corresponds to Guile's
SCM type.  It's just that the 1-, 3-, 7- or 16-bit type tag has been
replaced with a 32-bit type tag (the pointer to the class structure
describing the type of the object).

This is less efficient than the SCM type system, but that's no problem
since it won't be used in 95% of all cases.  The big advantage
compared to SCM's type system is that it is so simple and uniform.
I should note here that while SCM and Guile are centered around the
cell representation, and all objects either _are_ cells or have a cell
handle, objects in Ior will look more like malloced blocks.  This is
the reason why I planned to start with Böhm's GC, which has C pointers
as object handles.  But it is of course still possible to use a heap,
or, preferably, several heaps for different kinds of objects.  (Böhm's
GC has multiple heaps for different sizes of objects.)  If we write a
custom GC, we can increase speed further.


Problem 3 (yes, I skipped :): Combinatoric explosion

We simply don't generate all possible branches.  In the interpreter we
generate branches "just-too-late" (well, it's normally called "lazy
compilation" or "just-in-time", but if it were "in time", the
procedure would already be compiled when it was needed, right? :), as
when Guile memoizes, or when a Java machine turns bytecodes into
machine code, or as when GOOPS turns methods into cmethods, for that
matter.

Have you noticed that branches (although still without return type
information) already exist in GOOPS?  They are currently called
"cmethods" and are generated on demand from the method code and put
into the GF cache during evaluation of GOOPS code. :-)  (I have not
utilized this fully yet.  I plan to soon use this method compilation
(into branches) to eliminate almost all type dispatch in calls to
accessors.)

For the compiler, we use profiling information, just as the modern GCC
scheduler does, or else rely on some type analysis (if a procedure
says (+ x y), x is not normally a <string> but rather some subclass of
<number>) and some common sense (it's usually more important to
generate <fixnum> branches than <foobar> branches).

The rest of the cases can be handled by <boxed> branches.  We can, for
example, have a

[<boxed>-<boxed>-bar] =
  (branch ((x <boxed>) (y <boxed>))
    ...
    ([<boxed>-foo] x)
    ...)
[<boxed>-foo] will use an efficient type dispatch mechanism (for
example akin to the GOOPS one) to select the right branch of
`display'.


Problem 2: Ambiguous return type

If the return type of a branch is ambiguous, we simply define the
return type as <boxed>, and box data at the point in the branch where
it can be decided which type of data we will return.  This is how
things can be handled in the general case.  However, we might be able
to handle things more neatly, at least in some cases:

During compilation to byte code, we'll probably use an intermediate
representation in continuation passing style.  We might even use a
subtype of branches represented as continuations (not a heavy
representation, as in Guile and SCM, but probably not much more than a
function pointer).  This is, for example, one way of handling tail
recursion, especially mutual tail recursion.

One case where we would like to try really hard not to box data is
when fixnums "overflow into" bignums.

Let's say that the branch [<fixnum>-<fixnum>-bar] contains a form

  (+ x y)

where the type analyzer knows that x and y are fixnums.  We then split
the branch right after the form and let it fork into two possible
continuation branches, bar1 and bar2:

[The following is only pseudo code.  It can be made efficient on the C
 level.  We can also use the asm compiler directive in conditional
 compilation for GCC on i386.  We could even let autoconf/automake
 substitute an architecture-specific solution for multiple
 architectures, but still support a C-level default case.]

  (if (sum-over/underflow? x y)
      (bar1 (fixnum->bignum x) (fixnum->bignum y) ...)
      (bar2 x y ...))

bar1 begins with the evaluation of the form

  ([<bignum>-<bignum>-+] x y)

while bar2 begins with

  ([<fixnum>-<fixnum>-+] x y)

Note that the return type of each of these forms is unambiguous.


Now some random points from the design:

* The basic concept in Ior is the class.  A type is a concrete class.
  Classes which are subclasses of <object> are concrete; otherwise
  they are abstract.

* A procedure is a collection of methods.  Each method can have an
  arbitrary number of parameters of arbitrary class (not type).

* The type of a method is the tuple of its argument classes.

* The type of a procedure is the set of its method types.

But the most important new concept is the branch.
Regard the procedure:

(define (half x)
  (quotient x 2))

The procedure half will have the single method

  (method ((x <top>))
    (quotient x 2))

When `(half 128)' is called, the Ior evaluator will create a new
branch during the actual evaluation.  I'm now going to extend the
branch syntax by adding a second list of formals: the continuations of
the branch.

* The type of a branch is namely the tuple of the tuple of its
  argument types (not classes!) and the tuple of its continuation
  argument types.  The branch generated above will be:

  (branch ((x <fixnum>)) ((c <fixnum>))
    (c (quotient x 2)))

  If the method

  (method ((x <top>) (y <top>))
    (quotient (+ x 1) y))

  is called with arguments 1 and 2, it results in the branch

  (branch ((x <fixnum>) (y <fixnum>)) ((c1 <fixnum>) (c2 <bignum>))
    (quotient (+ x 1 c3) 2))

  where c3 is:

  (branch ((x <fixnum>) (y <fixnum>)) ((c <bignum>))
    (quotient (+ (fixnum->bignum x) 1) 2))

The generated branches are stored in a cache in the procedure object.


But wait a minute!  What about variables and data structures?

In essence, what we do is fork up all data paths so that they can be
typed: We put the type tags on the _data paths_ instead of on the data
itself.  You can look upon the "branches" as tubes of information
where the type tag is attached to the tube instead of to what passes
through it.

Variables and data structures are part of the "tubes", so they need to
be typed.
For example, the generic pair looks like:

(define-class <pair> ()
  car-type
  car
  cdr-type
  cdr)

But note that since car and cdr are generic procedures, we can let
more efficient pairs exist in parallel, like

(define-class <immutable-fixnum-list> ()
  (car (class <fixnum>))
  (cdr (class <immutable-fixnum-list>)))

Note that instances of this last type take only two words of memory!
They are easy to use, too.  We can't use `cons' or `list' to create
them, since these procedures can't assume immutability, but we don't
need to specify the type <fixnum> in our program.  Something like

  (const-cons 1 x)

where x is in the data flow path tagged as <immutable-fixnum-list>, or

  (const-list 1 2 3)

will do.


Some further notes:

* The concepts module and instance are the same thing.  Using other
  modules means 1. creating a new module class which inherits from the
  classes of the used modules and 2. instantiating it.

* Module definitions and class definitions are equivalent, just
  different syntactic sugar adapted to each kind of use.

* (define x 1) means: create an instance variable which is itself a
  subclass of <boxed>, with initial value 1 (which is an instance of
  <fixnum>).


The interpreter is a mixture between a stack machine and a register
machine.  The evaluator looks like this... :)

  /* the interpreter! */
  if (!setjmp (ior_context->exit_buf))
#ifndef i386_GCC
    while (1)
#endif
      (*ior_continue) (IOR_MICRO_OP_ARGS);

The branches are represented as arrays of pointers to micro
operations.  In essence, the evaluator doesn't exist in itself, but is
folded out over the entire implementation.  This allows for an extreme
form of modularity!

The i386_GCC case is a machine-specific optimization which avoids all
unnecessary popping and pushing of the CPU stack (which is different
from the Ior data stack).
The execution environment consists of

* a continue register, similar to the program counter in the CPU
* a data stack (where micro operation arguments and results are stored)
* a linked chain of environment frames (but see the exception below!)
* a dynamic context

I've written a small baby Ior which uses Guile's infrastructure.
Here's the context from that baby Ior:

typedef struct ior_context_t {
  ior_data_t *env;           /* rest of environment frames */
  ior_cont_t save_continue;  /* saves or represents continuation */
  ior_data_t *save_env;      /* saves or represents environment */
  ior_data_t *fluids;        /* array of fluids (use GC_malloc!) */
  int n_fluids;
  int fluids_size;
  /* dynwind chain is stored directly in the environment, not in context */
  jmp_buf exit_buf;
  IOR_SCM guile_protected;   /* temporary */
} ior_context_t;

There's an important exception regarding the lowest environment frame.
That frame isn't stored in a separate block on the heap, but on Ior's
data stack.  Frames are copied out onto the heap when necessary (for
example when closures "escape").


Now a concrete example:

Look at:

(define sum
  (lambda (from to res)
    (if (= from to)
        res
        (sum (+ 1 from) to (+ from res)))))

This can be rewritten into CPS (which captures a lot of what happens
during flow analysis):

(define sum
  (lambda (from to res c1)
    (let ((c2 (lambda (limit?)
                (let ((c3 (lambda ()
                            (c1 res)))
                      (c4 (lambda ()
                            (let ((c5 (lambda (from+1)
                                        (let ((c6 (lambda (from+res)
                                                    (sum from+1 to from+res c1))))
                                          (_+ from res c6)))))
                              (_+ 1 from c5)))))
                  (_if limit? c3 c4)))))
      (_= from to c2))))

Finally, after branch expansion, some optimization, code generation,
and some optimization again, we end up with the byte code for the two
branches (here marked by labels `sum' and `sumbig'):

  c5
        (ref -3)
        (shift -1)
        (+ <fixnum> <fixnum> c4big)
  ;; c4
        (shift -2)
        (+ <fixnum> 1 sumbig)
  ;; c6
  sum
        (shift 3)
        (ref2 -3)
  ;; c2
        (if!= <fixnum> <fixnum> c5)
  ;; c3
        (ref -1)
  ;; c1
        (end)

  c5big
        (ref -3)
        (shift -1)
        (+ <bignum> <bignum>)
  c4big
        (shift -2)
        (+ <bignum> 1)
  ;; c6
  sumbig
        (shift 3)
        (ref2 -3)
  ;; c2
        (= <bignum> <bignum>)
        (if! c5big)
  ;; c3
        (ref -1)
  ;; c1
        (end)

Let's take a closer look at the (+ <fixnum> 1 sumbig) micro
operation.  The generated assembler from the Ior C source + machine
specific optimizations for i386_GCC looks like this (with some rubbish
deleted):

ior_int_int_sum_intbig:
        movl 4(%ebx),%eax       ; fetch arg 2
        addl (%ebx),%eax        ; fetch arg 1 and do the work!
        jo ior_big_sum_int_int  ; dispatch to other branch on overflow
        movl %eax,(%ebx)        ; store result in first environment frame
        addl $8,%esi            ; increment program counter
        jmp (%esi)              ; execute next opcode

ior_big_sum_int_int:

To clarify: this is output from the C compiler.  I added the comments
afterwards.

The source currently looks like this:

IOR_MICRO_BRANCH_2_2 ("+", int, big, sum, int, int, 1, 0)
{
  int res = IOR_ARG (int, 0) + IOR_ARG (int, 1);
  IOR_JUMP_OVERFLOW (res, ior_big_sum_int_int);
  IOR_NEXT2 (z);
}

where the macros allow for different definitions depending on whether
we want to play pure ANSI or optimize for a certain machine/compiler.

The plan is actually to write all source in the Ior language and write
Ior code to translate the core code into bootstrapping C code.

Please note that if i386_GCC isn't defined, we run plain portable
ANSI C.


Just one further note:

In Ior, there are three modes of evaluation:

1. evaluating and type analyzing (these go in parallel)
2. code generation
3. executing byte codes

It is mode 3 which is really fast in Ior.

You can look upon your program as a web of branch segments, where one
branch segment can be generated from fragments of many closures.  Mode
switches don't occur at procedure borders, but at "growth points".  I
don't have time to define them here, but they are based upon the idea
that the continuation together with the type signature of the data
flow path is unique.

We normally run in mode 3.  When we come to a source growth point
(essentially an apply instruction) for uncompiled code, we "dive out"
of mode 3 into mode 1, which starts to eval/analyze code until we come
to a "sink".  When we reach the "sink", we have enough information
about the data path to do code generation, so we backtrack to the
source growth point and grow the branch between source and sink.
Finally, we "dive into" mode 3!

So, code generation doesn't respect procedure borders.  We instead get
a very neat kind of inlining, which, e.g., means that it is OK to use
closures instead of macros in many cases.
----------------------------------------------------------------------
Ior and module system
=====================

How, exactly, should the module system of Ior look?

There is this general issue of whether to have a single-dispatch or a
multi-dispatch system.  Personally, I observe that Scheme already uses
multi-dispatch.  Compare (+ 1.0 2) and (+ 1 2.0).

As you've seen if you've read the notes about Ior's design, efficiency
is not an issue here, since almost all dispatch will be eliminated
anyway.

Also, note an interesting thing: GOOPS actually has a special,
implicit argument to all of its methods: the lexical environment.  It
would be very ugly to add a second, special argument to this.
Of course, the theoreticians have already recognised this, and in many
systems, the implicit argument (the object) and the environment of the
method are the same thing.

I think we should especially take impressions from Matthias Blume's
module/object system.

The idea, now, for Ior (remember that everything about Ior is
negotiable between us) is that a module is a type, as well as an
instance of that type.  The idea is that we basically keep the GOOPS
style of methods, with the implicit argument being the module object
(or some other lexical environment, in a chain with the module as
root).

Let's say now that module C uses modules A and B.  Modules A and B
both export the procedure `foo'.  But A:foo and B:foo are different
sets of methods.

What does this mean?  Well, it obviously means that the procedure
`foo' in module C is a subtype of A:foo and B:foo.  Note how this is
similar in structure to slot inheritance: When class C is created with
superclasses A and B, the properties of a slot in C are created
through slot inheritance.  One way of interpreting variable foo in
module A is as a slot with init value foo.  Through the MOP, we can
specify that procedure slot inheritance in a module class implies
creation of new init values through inheritance.

This may look like a kludge, and perhaps it is, and, sure, we are not
going to accept any kludges in Ior.  But it might actually not be a
kludge...

I think it is commonly accepted by computer scientists that a module,
or at least a module interface, is a type.  Again, this type can be
seen as the set of types of the functions in the interface.  The types
of our procedures are the sets of branch types they provide.  It is
then natural that a module using two other modules creates new
procedure types by folding.
This thing would become less cloudy (yes, this is a cloudy part of my
reasoning; I meant previously that the interpreter itself is now
clear) if module interfaces were required to be explicitly typed.

Actually, this would fit much better with the rest of Ior's design.
On one hand, we might be free to introduce such a restriction
(compiler writers would applaud it), since R5RS hasn't specified any
module system.  On the other hand, it might be strange to require
explicit typing when Scheme is fundamentally implicitly typed...

We also have to consider that a module has an "inward" face, which is
one type, and possibly many "outward" faces, which are different
types.  (Compare the idea of "interfaces" in Scheme48.)

It thus seems that, while a module can truly be an Ior class, the
reverse should probably not hold in the general case...

Unless

  instance               <->  module proper
  class of the instance  <->  "inward interface"
  superclasses           <->  "outward interfaces + inward uses"

...hmm, is this possible to reconcile with Rees' object system?

Please think about these issues.  We should try to end up with a
beautiful and consistent object/module system.

----------------------------------------------------------------------

Here's a difficult problem in Ior's design:

Let's say that we have a mutable data structure, like an ordinary
list.  Since, in Ior, the type tag (which is really a pointer to a
class structure) is stored separately from the data, it is thinkable
that another thread modifies the location in the list between when our
thread reads the type tag and when it reads the data.

The reading of type and data must be made atomic in some way.
Probably, some kind of locking of the heap is required.  It's just
that it may cause a lot of overhead to lock the heap at every *read*
from a mutable data structure.

Look how much trouble those set! operations cause!  Not only do they
force us to store type tags for each car and cdr in the list, but they
also force a lot of explicit dispatch to be done, and cause trouble in
a threaded system...

----------------------------------------------------------------------

Jim Blandy <jimb@red-bean.com> writes:

> We also should try to make less work for the GC, by avoiding consing
> up local environments until they're closed over.

Did the texts which I sent to you talk about Ior's solution?

It basically is: Use *two* environment "arguments" to the evaluator
(in Ior, they aren't arguments but registers):

* One argument is a pointer to the "top" of an environment stack.
  This is used in the "inner loop" for very efficient access to
  in-between results.  The "top" segment of the environment stack is
  also regarded as the first environment frame in the lexical
  environment.  ("Top" is the bottom on a stack which grows
  downwards.)

* The other argument points to a structure holding the evaluation
  context.  In this context, there is a pointer to the chain of the
  rest of the environment frames.  Note that since frames are just
  blocks of SCM values, you can very efficiently "release" a frame
  into the heap by block copying it.  (Remember that Ior uses Böhm's
  GC; this is how we allocate the block.)