diff options
author | michele.simionato <devnull@localhost> | 2009-06-29 15:26:59 +0000 |
---|---|---|
committer | michele.simionato <devnull@localhost> | 2009-06-29 15:26:59 +0000 |
commit | d8e3dcdb814db0c6c11e85cf6961a6691c3d8d00 (patch) | |
tree | ddc8860a7cf908f51fcbc0789fde2c3ead8b2ab1 /artima | |
parent | 3e172339013209f767e655a44d601766d5bfd9b5 (diff) | |
download | micheles-d8e3dcdb814db0c6c11e85cf6961a6691c3d8d00.tar.gz |
Lot of work on scheme30
Diffstat (limited to 'artima')
-rw-r--r-- | artima/scheme/scheme30.ss | 506 |
1 files changed, 299 insertions, 207 deletions
diff --git a/artima/scheme/scheme30.ss b/artima/scheme/scheme30.ss index 7bae815..231c8d9 100644 --- a/artima/scheme/scheme30.ss +++ b/artima/scheme/scheme30.ss @@ -1,264 +1,356 @@ -#|Runtime pattern matching and object oriented APIs -======================================================= +#|Comparing identifiers +================================================= -As a final example of macros power, this episode will implement a syntax -for full runtime pattern matching of lists . +Equality of identifiers is one of the trickiest things in R6RS +Scheme, since the meaning of identifier equality in a lexically +scoped language with hygienic macros is non-obvious. -.. _13: http://www.artima.com/weblogs/viewpost.jsp?thread=248953 -.. _15: http://www.artima.com/weblogs/viewpost.jsp?thread=249681 -.. _association lists: +Identifier equality is a compile-time +concept which has nothing to do with the run-time concept of equality +between variables: two variables are equal if their underlying values +are equal, but the value to which a variable is bound is only known +at run-time, therefore identifier equality cannot rely on it. -list-match ----------------------------------------------------------------- +Identifiers are not variables: they are syntax objects with an +underlying symbol and an underlying lexical context, which are known +statically at compile time. It is possible to know if an +identifier is bound or not at compile-time, but the value the +identifier will be bound to at run-time is unknown. -In episode 15_ I have discussed a very limited form of pattern -matching, i.e. list destructuring. There is more to -pattern matching than that. +The Scheme standard define two different equality procedures +for identifiers (``bound-identifier=?`` and ``free-identifier=?``, +with absolutely misleading names); to those, I will add a third one, +``symbol-identifier=?``. -A general list matcher should be able to -destructure a list, whatever complicate it may be, and to perform -*different* actions for different structures. In this sense, pattern -matching is a kind of if statement on steroids. +This episode will be spent discussing the subtilities of identifier +equality. -The Scheme standard does not provide a default list matcher, but there -are tons of libraries providing such a feature, and in the appendix -we will provide an implementation of ``list-match`` ourselves, so -let us assume we have list matcher accepting the following patterns: +.. image:: Love_equality.png -.. code-block:: scheme +``symbol-identifier=?`` +----------------------------------------------- - > (list-match <patterns>) - ((list-match lst (sub pattern action guard ...) ...)) +The simplest concept of identifier equality is expressed by +the following ``symbol-identifier=?`` comparison function +(for convenience, I have added the ``symbol-identifier=?`` +precedure in the ``(aps lang)`` library): -Given an object (which usually is a list), the matcher compares -it with various patterns, and perform different actions depending -on the match, which can be further restricted with a guard. -Here are a few trivial tests: +$$lang:SYMBOL-IDENTIFIER=? -$$MATCH-TESTS +Two identifiers are ``symbol-identifier=?`` if they are equal as +symbols. -As usual, the understore ``_`` means "match but do not bind. -Here are two further tests, that makes use of the following helper function: +``symbol-identifier=?`` can +be used to manage duplicate names in macros defining multiple +identifiers at once, or in macros defining name->value tables, such as +the ``static-map`` I discussed in episode 22. Moreover, +``symbol-identifier=?`` can be used to reject reserved +identifiers (you may need such +functionality if are buildin a mini-language on top of Scheme +and you want to reject a few identifiers as language keywords), as +in the following macro: -$$MATCH-ODD +$$CHECK-RESERVED -``replace-odd`` makes use of guarded patterns to replace the odd -numbers in a list of numbers with the string "odd": +``(check-reserved id)`` will raise a syntax-violation if ``id`` +is one of the keyword ``reserved1`` or ``reserved2``. -$$MATCH-ODD1 +``bound-identifier=?`` +----------------------------------------------- -``replace-odd`` only recognizes numbers and lists: it chokes on strings -or any other kind of object, by raising a ``pattern failure`` error: +``symbol-identifier=?`` is simple and easy to understand, but it cannot +be used in all situations. Consider for instance the very first macro +I wrote, in episode 9_: -$$MATCH-ODD2 +.. code-block:: scheme -This example is simple, but it does not make justice to the power of -pattern matching. The following example, however, will do (or at least -I hope so): it explains how to implement object oriented APIs in terms -of pattern matching functions. + (def-syntax (multi-define (name1 name2 ...) (value1 value2 ...)) + #'(begin (define name1 value1) (define name2 value2) ...)) -Pattern matching vs message passing -------------------------------------------------------- +It is quite common to write macros defining multiple bindings, such +as ``multi-define``. There is also the common situation of +macros which expand themselves to macros defining +multiple bindings. When code is generated automatically, errors are +more than likely and it makes sense to manage explicitly +the situation when the list of names to be defined contains some +duplicate. ``multi-define`` as written does not perform any check, so that it +relies on the standard behavior of R6RS scheme, raising an error. +Unfortunately, the standard behavior only applies to programs and scripts, +whereas the REPL is quite free to behaves differently and indeed it does: -A typical use case for -full pattern matching is message passing: in functional languages you -can send lists as messages to an object, and the object will respond -differently depending on the structure of the list. This is the -standard mechanism to send messages in Erlang, but can be done in any -functional language with pattern matching. +.. code-block:: scheme -I you think about it for a minute, you will recognize that pattern -matching makes the creation of object oriented APIs rather trivial. -Let consider for instance Scheme `association lists`_, which by default -have a functional API, and suppose we want to provide an object oriented -interface on top on it, say a Python-like API. The task can be easily -performed via pattern matching, as follows: + > (multi-define (a a) (1 2)) + a + 2 -$$ALIST +(in the REPL latter definitions override previous definitions). Being +unhappy with that, we can introduce a ``bound-identifier=?`` check +and raise a custom exception: -Here we used the R6RS procedures ``(assq k a)`` to extract the slot -corresponding to the key ``k`` from the association list ``a``, as -well as the procedure ``set-cdr!`` described in episode 13_ to modify -the ``cdr`` of a slot (imported from ``(rnrs mutable-pairs)``). -Notice that I am using the term *slot* here to refer to -the inner lists that compose an association list. +$$MULTI-DEFINE -Here are a few tests: +Two identifiers are equal according to ``bound-identifier=?`` only if +they have the same name *and* the same marks. The name is +misleading since the arguments of ``bound-identifier=?`` are not +required to be bound identifiers; a better name would be +``strict-identifier=?``. -$$ALIST-TESTS +You can ``check`` that ``multi-define`` correctly reject duplicated +identifiers: -Appendix: implementation of ``list-match`` -------------------------------------------------------------- +.. code-block:: scheme -The implementation below is not my own, I have adapted -(i.e. shamelessly copied) it from Phil Bewig, which in turns copied it -from Jos Koot, which maybe is the original author, or maybe he copied -it from some other source. But attribution is not important, the -important thing is the code: + > (multi-define (a a) (1 2)) + Unhandled exception + Condition components: + 1. &who: multi-define + 2. &message: "Found duplicated identifier in" + 3. &syntax: + form: (a a) + subform: #f -$$aps/list-match: +You may also believe that using ``symbol-identifier=?`` would work too. +However this is not the case, in general. Consider +for instance the following macro expanding to ``multi-define``: -The implementation above makes use of a few tricks which are well known -to experienced Schemers, but may look unfamiliar to my readers. +$$MULTI-DEFINE2 -First of all, we leverage on the fact that ``(and boolean expr)`` returns -``expr`` if ``boolean`` is true and ``#f`` otherwise, i.e. it is equivalent -to ``(if boolean expr #f)``. Then, we use the ``=>`` form of the conditional -expression. -As explained in the `cond specification`, a ``cond`` clause may -have the form ``(value => func)``; when the value is not false, -then the function is applied to it and ``cond`` returns the result, -otherwise ``cond`` looks at the next clause. For instance +The macro introduces a dummy identifier ``id2``. +Had we defined ``multi-define`` in terms of ``symbol-identifier=?``, +calling ``multi-define2`` with argument ``id`` equal to ``id2`` +would have generated a spurious name clash. Fortunately, since we defined +``multi-define`` in terms of ``bound-identifier=?``, nothing bad happens: .. code-block:: scheme - > (cond (#f => any-function) ('(1) => car)) + > (multi-define2 id2 1) + id2 1 -Notice that the underscore is a special identifier in macros -(just as the ellipsis identifier ``...``` is special) and it -cannot be used as a literal, this is why the trick -``(free-identifier=? #'_ underscore)`` is used. +``bound-identifier=?`` works in this case because +the identifier ``id2`` introduced by the macro has +different marks from the identifier ``id2`` coming as an argument. + +``free-identifier=?`` +------------------------------------ + +``bound-identifier=?`` is not good for every circumstance. Consider +for instance the following variation of ``multi-define``, featuring +a literal keyword ``as``: + +$$MULTI-DEF + +This work, but the error messages could stand some improvement. +For instance, if an user misspells the infix identifier ``as``, +she gets a generic ``"invalid syntax"`` error, whereas we +would like to provide a customized error message showing the misspelled +literal identifier. Using ``bound-identifier=?`` we could try to +solve the problem as follows: + +$$MULTI-DEF-BAD + +Unfortunately this solution does not work at all, since it raises +an error even when the ``as`` identifiers are spelled correctly: -I have exported the internal macro ``_match``, to make it -possible to understand how the implementation works. ``_match`` is -able to match a single pattern. For instance we have +.. code-block:: scheme + + > (multi-def-bad (x as y) (1 as 2)) + Unhandled exception + Condition components: + 1. &who: multi-def-bad + 2. &message: "Offending infix syntax (expected `as')" + 3. &syntax: + form: (as as) + subform: #f + +The reason is that ``as`` is not ``bound-identifier=?`` +to ``#'as``. We need a less strict comparison predicate such as +``symbol-identifier=?`` or ``free-identifier=?``. + +``free-identifier=?`` is +the most complicated equality predicate. As +usual, the name is misleading since the arguments of +``free-identifier=?`` are not required to be free identifiers. +A better name would be ``lax-identifier=?``. +Two identifiers are ``free-identifier=?`` if + +1. they are both bound and they share the same name, or they shared + the same name before renaming at import time; +2. they are both unbound and they share the same name. + +In implementations with full phase separation, the identifiers +must also be both bound/unbound in the same phase. +In all other cases the two identifiers are not ``free-identifier=?``. +Here is an example: + +.. code-block:: scheme + + > (import (only (aps list-utils) range)) + > (import (rename (aps list-utils) (range r))) + > (free-identifier=? #'r #'range) + #t + +Notice that both ``symbol-identifier=?`` and ``bound-identifier=?`` +would fail to recognize the identity of ``range`` and ``r`` in this +case. + +.. _9: http://www.artima.com/weblogs/viewpost.jsp?thread=240804 + +Literal identifiers and auxiliary syntax +----------------------------------------------------- + +Macros with literal identifiers may be surprising, because internally +the literal identifiers are compared by using ``free-identifier=?``. +Consider for instance the macro ``multi-def`` defined in the previous +paragraph. This works: .. code-block:: scheme - > (_match '(1 2) (x y) (list (+ x y))) - (3) + > (let () + (multi-def (x as 1) (y as 2)) + (list x y)) + (1 2) -Using ``syntax-expand`` we can see the inner working of ``_match``: +But this does not work: .. code-block:: scheme - > (syntax-expand(_match '(1 2) (x y) (list (+ x y)) #t)) - (let ((ob '(1 2))) - (and (pair? ob) - (let ((kar-obj ob) (kdr-obj (cdr ob))) - (_match kar-obj x - (_match kdr-obj (y) (list (+ x y)) #t))))) + > (let ((as 2)) + (multi-def (x as 1) (y as 2)) + (list x y)) + Unhandled exception + Condition components: + 1. &message: "invalid syntax" + 2. &syntax: + form: (multi-def (x as 1) (y as 2)) + subform: #f + 3. &trace: #<syntax (multi-def (x as 1) (y as 2))> + +The reason is that in the second example ``as`` is bound, and therefore +it is not ``free-identifier=?`` to the literal identifier ``#'as``. + +The recommended "solution" is to define at top-level and to +export some dummy definition for the literal identifiers you +are going to use in your macro. I think this is fundamentally +broken: literal identifiers should be a concept internal to +the macro and they should not be exported. The mistake is +that the R6RS requires the literal identifiers to be matched +via ``free-identifier=?``, whereas they should be matched +with ``symbol-identifier=?``. I never understood why the editors +decided to use ``free-identifier=?``, perhaps because it makes it +possible to rename the identifiers used as literal identifiers, +a feature that looks more like a misfeature to me. + +Anyway, the R6RS document defines a set of special macros, such as +``_``, ``...``, ``else`` and ``=>``, which lives in the global +namespace and are available to all R6RS programs. Such macros are used +as auxiliary syntax in various special forms, like ``cond`` and +``syntax-case``; for this reason they are usually called auxiliary +keywords. The existence of such global variables makes it impossible +to redefine them at top-level in scripts (but it can be done at the +REPL); however they can be redefined locally, thus breaking the macros +using the auxiliary syntax: +.. code-block:: scheme -.. _cond specification: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-14.html#node_idx_376 + > (let ((else #f)) + (cond (else 'something))) + > ; does not return something |# -(import (rnrs) (sweet-macros) (aps list-utils) (aps list-match) - (rnrs mutable-pairs) (aps easy-test) (aps compat)) - -;;ALIST -;; a function performing pattern matching which provides a Pythonic -;; dictionary interface over an association list -(define (alist . a) - (lambda args - (list-match args - (sub () a); return the underlying alist - (sub ('->keys); return the keys - (map car a)) - (sub ('->values); return the values - (map cdr a)) - (sub (k); emulate Python .__getitem__ - (let ((slot (assq k a))) - (if slot - (cdr slot) - (error 'alist (format "Missing key ~a" k))))) - (sub (k 'or val); emulate Python .get - (let ((slot (assq k a))) - (if slot - (cdr slot) - val))) - (sub (k '! val); emulate Python .__setitem__ - (let ((slot (assq k a))) - (if slot - ;; modify existing slot - (begin (set-cdr! slot val) a) - ;; else append new slot - (let ((new-a (append a (list (cons k val))))) - (set! a new-a) a)))) - ))) +(import (rnrs) (sweet-macros) (aps compat) + (for (aps cut) expand) + (for (aps lang) expand) + (for (aps list-utils) expand) + (aps easy-test)) + +;;STATIC-MAP +(def-syntax (static-map (name value) ...) + #'(syntax-match (<names> name ...) + (sub (ctx <names>) #''(name ...)) + (sub (ctx name) #'value) + ...) + (distinct? symbol-identifier=? #'(name ...)) + (syntax-violation 'static-map "Found duplicated identifier in" + #'(name ...))) ;;END -(define a (alist '(x . 1) '(y . 2))) +;;MULTI-DEFINE +(def-syntax (multi-define (name1 name2 ...) (value1 value2 ...)) + #'(begin (define name1 value1) (define name2 value2) ...) + (distinct? bound-identifier=? #'(name1 name2 ...)) + (syntax-violation 'multi-define "Found duplicated identifier in" + #'(name1 name2 ...))) +;;END -;;MATCH-ODD -(define (replace-odd obj) - (list-match obj - (sub x "odd" (and (integer? x) (odd? x))) - (sub x x (integer? x)) - (sub x (map replace-odd x) (list? x)) - )) +;;MULTI-DEFINE2 +(def-syntax (multi-define2 id value) + #'(multi-define (id id2) (value 'dummy))) ;;END -(run +(def-syntax (compare-ids id1 id2) + (let ((res '())) + (when (symbol-identifier=? #'id1 #'id2) + (set! res (cons 'raw= res))) + (when (bound-identifier=? #'id1 #'id2) + (set! res (cons 'bound= res))) + (when (free-identifier=? #'id1 #'id2) + (set! res (cons 'free= res))) + (datum->syntax #'compare-ids `',res))) + +(let ((a 1)) + (define id-a #'a) + (let ((a a)) + (free-identifier=? #'a id-a))) + +;;MULTI-DEF-BAD +(def-syntax (multi-def-bad (name as_ value) ...) + #'(begin (define name value) ...) + (for-all (lambda (id) (bound-identifier=? id #'as)) #'(as_ ...)) + (syntax-violation 'multi-def-bad "Offending infix syntax (expected `as')" + (remp (lambda (id) (bound-identifier=? id #'as)) #'(as_ ...)))) +;;END - (test "patterns" - (list-match <patterns>) - '((list-match lst (sub pattern template guard ...) ...))) +;;MULTI-DEF + (define-syntax multi-def + (syntax-rules (as) + ((multi-def (name as value) ...) + (begin (define name value) ...)))) +;;END - ;;MATCH-TESTS - (test "match1" - (list-match '(1) (sub (x) x)) - 1) - - (test "match2" - (list-match '(1 2) (sub (_ y) y)) - 2) - - (test "match3" - (list-match '(1 (2 3)) (sub (_ (_ z)) z)) - 3) - ;;END - - ;;MATCH-ODD1 - (test "match-odd1" - (replace-odd '(1 (2 3))) - '("odd" (2 "odd"))) - ;;END - - ;;MATCH-ODD2 - (test "match-odd2" - (catch-error (replace-odd "string")) - "pattern failure") - ;;END - - ;;ALIST-TESTS - (test "alist1" - (a 'x) - 1) - - (test "alist2" - (a 'y) - 2) +;;FREE-DISTINCT +(def-syntax (free-distinct name ...) + (distinct? free-identifier=? #'(name ...))) +;;END - (test "alist3" - (catch-error (a 'z)) - "Missing key z") - - (test "alist4" - (a 'z 'or 3) - 3) - - (test "alist5" - (a 'z '! 3) - '((x . 1) (y . 2) (z . 3))) +;;BOUND-DISTINCT +(def-syntax (bound-distinct name ...) + (distinct? bound-identifier=? #'(name ...))) +;;END + +;;SYMBOL-DISTINCT +(def-syntax (symbol-distinct name ...) + (distinct? symbol-identifier=? #'(name ...))) +;;END + +;;CHECK-RESERVED +(def-syntax (check-reserved id) + (syntax-violation 'check-reserved "Reserved identifier" #'id) + (exists (cut symbol-identifier=? #'id <>) (list #'reserved1 #'reserved2)) + #f) +;;END + +(def-syntax (distinct-x a) + #'(symbol-distinct x a)) + +(run + (test "symbol1" (symbol-distinct a b) #t) + (test "symbol2" (symbol-distinct a a) #f) - (test "alist6" - (a 'y '! 4) - '((x . 1) (y . 4) (z . 3))) - - (test "alist7" - (a '->keys) - '(x y z)) - - (test "alist8" - (a '->values) - '(1 4 3)) - ;;END + (test "bound1" (bound-distinct a b) #t) + (test "bound2" (bound-distinct a a) #f) + (test "free1" (free-distinct a b) #t) + (test "free2" (free-distinct a a) #f) ) - -; (test "match4" -; (list-match '(1 (2 (x 4) 5) 6) -; (sub x (and (integer? x) (odd? x)) "odd") -; (sub x (and (integer? x) (odd? x)) "odd")) 1) |