summaryrefslogtreecommitdiff
path: root/doc/ref/sxml-match.texi
blob: 9b5f1dbd424e63fddf6a154fead42f18fbfd7369 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2010, 2013  Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.
@c
@c Based on the documentation at
@c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
@c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
@c rest of `sxml-match'.)
@c
@c Converted to Texinfo and modified by Ludovic Courtès, 2010.

@node sxml-match
@section @code{sxml-match}: Pattern Matching of SXML

@cindex pattern matching (SXML)
@cindex SXML pattern matching

The @code{(sxml match)} module provides syntactic forms for pattern
matching of SXML trees, in a ``by example'' style reminiscent of the
pattern matching of the @code{syntax-rules} and @code{syntax-case} macro
systems.  @xref{SXML}, for more information on SXML.

The following example@footnote{This example is taken from a paper by
Krishnamurthi et al.  Their paper was the first to show the usefulness of the
@code{syntax-rules} style of pattern matching for transformation of XML, though
the language described, XT3D, is an XML language.} provides a brief
illustration, transforming a music album catalog language into HTML.

@lisp
(define (album->html x)
  (sxml-match x
    ((album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
     `(ul (li ,t)
          (li (b ,n) (i ,f)) ...))))
@end lisp

Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
@code{sxml-match-let*}.

Compared to a standard s-expression pattern matcher (@pxref{Pattern
Matching}), @code{sxml-match} provides the following benefits:

@itemize
@item
matching of SXML elements does not depend on any degree of normalization of the
SXML;
@item
matching of SXML attributes (within an element) is under-ordered; the order of
the attributes specified within the pattern need not match the ordering with the
element being matched;
@item
all attributes specified in the pattern must be present in the element being
matched; in the spirit that XML is 'extensible', the element being matched may
include additional attributes not specified in the pattern.
@end itemize

The present module is a descendant of WebIt!, and was inspired by an
s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
Dybvig at Indiana University.

@unnumberedsubsec Syntax

@code{sxml-match} provides @code{case}-like form for pattern matching of XML
nodes.

@deffn {Scheme Syntax} sxml-match input-expression clause1 clause2 @dots{}
Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
(one or more), each consisting of a pattern and one or more expressions to be
evaluated if the pattern match succeeds.  Optionally, each @var{clause} within
@code{sxml-match} may include a @dfn{guard expression}.
@end deffn

The pattern notation is based on that of Scheme's @code{syntax-rules} and
@code{syntax-case} macro systems.  The grammar for the @code{sxml-match} syntax
is given below:

@verbatim
match-form ::= (sxml-match input-expression
                 clause+)

clause ::= [node-pattern action-expression+]
         | [node-pattern (guard expression*) action-expression+]

node-pattern ::= literal-pattern
               | pat-var-or-cata
               | element-pattern
               | list-pattern

literal-pattern ::= string
                  | character
                  | number
                  | #t
                  | #f

attr-list-pattern ::= (@ attribute-pattern*)
                    | (@ attribute-pattern* . pat-var-or-cata)

attribute-pattern ::= (tag-symbol attr-val-pattern)

attr-val-pattern ::= literal-pattern
                   | pat-var-or-cata
                   | (pat-var-or-cata default-value-expr)

element-pattern ::= (tag-symbol attr-list-pattern?)
                  | (tag-symbol attr-list-pattern? nodeset-pattern)
                  | (tag-symbol attr-list-pattern?
                                nodeset-pattern? . pat-var-or-cata)

list-pattern ::= (list nodeset-pattern)
               | (list nodeset-pattern? . pat-var-or-cata)
               | (list)

nodeset-pattern ::= node-pattern
                  | node-pattern ...
                  | node-pattern nodeset-pattern
                  | node-pattern ... nodeset-pattern

pat-var-or-cata ::= (unquote var-symbol)
                  | (unquote [var-symbol*])
                  | (unquote [cata-expression -> var-symbol*])
@end verbatim

Within a list or element body pattern, ellipses may appear only once, but may be
followed by zero or more node patterns.

Guard expressions cannot refer to the return values of catamorphisms.

Ellipses in the output expressions must appear only in an expression context;
ellipses are not allowed in a syntactic form.

The sections below illustrate specific aspects of the @code{sxml-match} pattern
matcher.

@unnumberedsubsec Matching XML Elements

The example below illustrates the pattern matching of an XML element:

@lisp
(sxml-match '(e (@@ (i 1)) 3 4 5)
  ((e (@@ (i ,d)) ,a ,b ,c) (list d a b c))
  (,otherwise #f))
@end lisp

Each clause in @code{sxml-match} contains two parts: a pattern and one or more
expressions which are evaluated if the pattern is successfully match.  The
example above matches an element @code{e} with an attribute @code{i} and three
children.

Pattern variables must be ``unquoted'' in the pattern.  The above expression
binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
to @code{5}.

@unnumberedsubsec Ellipses in Patterns

As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
Note that the pattern @code{item ...} specifies zero-or-more matches of the
pattern @code{item}.

The use of ellipses in a pattern is illustrated in the code fragment below,
where nested ellipses are used to match the children of repeated instances of an
@code{a} element, within an element @code{d}.

@lisp
(define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))

(sxml-match x
  ((d (a ,b ...) ...)
   (list (list b ...) ...)))
@end lisp

The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.

@unnumberedsubsec Ellipses in Quasiquote'd Output

Within the body of an @code{sxml-match} form, a slightly extended version of
quasiquote is provided, which allows the use of ellipses.  This is illustrated
in the example below.

@lisp
(sxml-match '(e 3 4 5 6 7)
  ((e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end"))
  (,otherwise #f))
@end lisp

The general pattern is that @code{`(something ,i ...)} is rewritten as
@code{`(something ,@@i)}.

@unnumberedsubsec Matching Nodesets

A nodeset pattern is designated by a list in the pattern, beginning the
identifier list.  The example below illustrates matching a nodeset.

@lisp
(sxml-match '("i" "j" "k" "l" "m")
  ((list ,a ,b ,c ,d ,e)
   `((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))))
@end lisp

This example wraps each nodeset item in an HTML paragraph element.  This example
can be rewritten and simplified through using ellipsis:

@lisp
(sxml-match '("i" "j" "k" "l" "m")
  ((list ,i ...)
   `((p ,i) ...)))
@end lisp

This version will match nodesets of any length, and wrap each item in the
nodeset in an HTML paragraph element.

@unnumberedsubsec Matching the ``Rest'' of a Nodeset

Matching the ``rest'' of a nodeset is achieved by using a @code{. rest)} pattern
at the end of an element or nodeset pattern.

This is illustrated in the example below:

@lisp
(sxml-match '(e 3 (f 4 5 6) 7)
  ((e ,a (f . ,y) ,d)
   (list a y d)))
@end lisp

The above expression returns @code{(3 (4 5 6) 7)}.

@unnumberedsubsec Matching the Unmatched Attributes

Sometimes it is useful to bind a list of attributes present in the element being
matched, but which do not appear in the pattern.  This is achieved by using a
@code{. rest)} pattern at the end of the attribute list pattern.  This is
illustrated in the example below:

@lisp
(sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
  ((a (@@ (y ,www) . ,qqq) ,t ,u ,v)
   (list www qqq t u v)))
@end lisp

The above expression matches the attribute @code{y} and binds a list of the
remaining attributes to the variable @var{qqq}.  The result of the above
expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.

This type of pattern also allows the binding of all attributes:

@lisp
(sxml-match '(a (@@ (z 1) (y 2) (x 3)))
  ((a (@@ . ,qqq))
   qqq))
@end lisp

@unnumberedsubsec Default Values in Attribute Patterns

It is possible to specify a default value for an attribute which is used if the
attribute is not present in the element being matched.  This is illustrated in
the following example:

@lisp
(sxml-match '(e 3 4 5)
  ((e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)))
@end lisp

The value @code{1} is used when the attribute @code{z} is absent from the
element @code{e}.

@unnumberedsubsec Guards in Patterns

Guards may be added to a pattern clause via the @code{guard} keyword.  A guard
expression may include zero or more expressions which are evaluated only if the
pattern is matched.  The body of the clause is only evaluated if the guard
expressions evaluate to @code{#t}.

The use of guard expressions is illustrated below:

@lisp
(sxml-match '(a 2 3)
  ((a ,n) (guard (number? n)) n)
  ((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
@end lisp

@unnumberedsubsec Catamorphisms

The example below illustrates the use of explicit recursion within an
@code{sxml-match} form.  This example implements a simple calculator for the
basic arithmetic operations, which are represented by the XML elements
@code{plus}, @code{minus}, @code{times}, and @code{div}.

@lisp
(define simple-eval
  (lambda (x)
    (sxml-match x
      (,i (guard (integer? i)) i)
      ((plus ,x ,y) (+ (simple-eval x) (simple-eval y)))
      ((times ,x ,y) (* (simple-eval x) (simple-eval y)))
      ((minus ,x ,y) (- (simple-eval x) (simple-eval y)))
      ((div ,x ,y) (/ (simple-eval x) (simple-eval y)))
      (,otherwise (error "simple-eval: invalid expression" x)))))
@end lisp

Using the catamorphism feature of @code{sxml-match}, a more concise version of
@code{simple-eval} can be written.  The pattern @code{,(x)} recursively invokes
the pattern matcher on the value bound in this position.

@lisp
(define simple-eval
  (lambda (x)
    (sxml-match x
      (,i (guard (integer? i)) i)
      ((plus ,(x) ,(y)) (+ x y))
      ((times ,(x) ,(y)) (* x y))
      ((minus ,(x) ,(y)) (- x y))
      ((div ,(x) ,(y)) (/ x y))
      (,otherwise (error "simple-eval: invalid expression" x)))))
@end lisp

@unnumberedsubsec Named-Catamorphisms

It is also possible to explicitly name the operator in the ``cata'' position.
Where @code{,(id*)} recurs to the top of the current @code{sxml-match},
@code{,(cata -> id*)} recurs to @code{cata}.  @code{cata} must evaluate to a
procedure which takes one argument, and returns as many values as there are
identifiers following @code{->}.

Named catamorphism patterns allow processing to be split into multiple, mutually
recursive procedures.  This is illustrated in the example below: a
transformation that formats a ``TV Guide'' into HTML.

@lisp
(define (tv-guide->html g)
  (define (cast-list cl)
    (sxml-match cl
      ((CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
       `(div (ul (li ,ch ": " ,a) ...)))))
  (define (prog p)
    (sxml-match p
      ((Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
                (Description ,desc ...))
       `(div (p ,start-time
                (br) ,series-title
                (br) ,desc ...)))
      ((Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
                (Description ,desc ...)
                ,(cast-list -> cl))
       `(div (p ,start-time
                (br) ,series-title
                (br) ,desc ...)
             ,cl))))
  (sxml-match g
    ((TVGuide (@@ (start ,start-date)
                 (end ,end-date))
              (Channel (Name ,nm) ,(prog -> p) ...) ...)
     `(html (head (title "TV Guide"))
            (body (h1 "TV Guide")
                  (div (h2 ,nm) ,p ...) ...)))))
@end lisp

@unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}

@deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...
@deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...
These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
an XML pattern in the binding position, rather than a simple variable.
@end deffn

For example, the expression below:

@lisp
(sxml-match-let (((a ,i ,j) '(a 1 2)))
  (+ i j))
@end lisp

binds the variables @var{i} and @var{j} to @code{1} and @code{2} in the XML
value given.

@c Local Variables:
@c coding: utf-8
@c End: