.\"
.\"   Copyright 2001-2007 Adrian Thurston <thurston@complang.org>
.\"

.\"   This file is part of Ragel.
.\"
.\"   Ragel is free software; you can redistribute it and/or modify
.\"   it under the terms of the GNU General Public License as published by
.\"   the Free Software Foundation; either version 2 of the License, or
.\"   (at your option) any later version.
.\"
.\"   Ragel is distributed in the hope that it will be useful,
.\"   but WITHOUT ANY WARRANTY; without even the implied warranty of
.\"   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.\"   GNU General Public License for more details.
.\"
.\"   You should have received a copy of the GNU General Public License
.\"   along with Ragel; if not, write to the Free Software
.\"   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA 

.\"   Process this file with
.\"   groff -man -Tascii ragel.1
.\"
.TH RAGEL 1 "@PUBDATE@" "Ragel @VERSION@" "Ragel State Machine Compiler"
.SH NAME
ragel \- compile regular languages into executable state machines 
.SH SYNOPSIS
.B ragel 
.RI [ options ]
.I file
.SH DESCRIPTION
Ragel compiles executable finite state machines from regular languages.
Ragel can generate C, C++, Objective-C, C#, D, Go, Java, or Ruby code. Ragel state
machines can not only recognize byte
sequences as regular expression machines do, but can also execute code at
arbitrary points in the recognition of a regular language.  User code is
embedded using inline operators that do not disrupt the regular language
syntax.

The core language consists of standard regular expression operators, such as
union, concatenation and Kleene star, accompanied by action embedding
operators. Ragel also provides operators that let you control any
non-determinism that you create, construct scanners using the longest match
paradigm, and build state machines using the statechart model. It is also
possible to influence the execution of a state machine from inside an embedded
action by jumping or calling to other parts of the machine and reprocessing
input.

Ragel provides a very flexible interface to the host language that attempts to
place minimal restrictions on how the generated code is used and integrated
into the application. The generated code has no dependencies.

.SH OPTIONS
.TP
.BR \-h ", " \-H ", " \-? ", " \-\-help
Display help and exit.
.TP
.B \-v
Print version information and exit.
.TP
.B \-o " file"
Write output to file. If \-o is not given, a default file name is chosen by
replacing the file extension of the input file. For source files ending in .rh
the suffix .h is used. For all other source files a suffix based on the output
language is used (.c, .cpp, .m, etc.). If \-o is not given for Graphviz output,
the generated dot file is written to standard output.
.TP
.B \-s
Print some statistics on standard error.
.TP
.B \-\-error\-format=gnu
Print error messages using the format "file:line:column:" (default).
.TP
.B \-\-error\-format=msvc
Print error messages using the format "file(line,column):".
.TP
.B \-d
Do not remove duplicate actions from action lists.
.TP
.B \-I " dir"
Add dir to the list of directories to search for included and imported files.
.TP
.B \-n
Do not perform state minimization.
.TP
.B \-m
Perform minimization once, at the end of the state machine compilation. 
.TP
.B \-l
Minimize after nearly every operation. Lists of like operations such as unions
are minimized once at the end. This is the default minimization option.
.TP
.B \-e
Minimize after every operation.
.TP
.B \-x
Compile the state machines and emit an XML representation of the host data and
the machines.
.TP
.B \-V
Generate a dot file for Graphviz.
.TP
.B \-p
Display printable characters on labels (for Graphviz output).
.TP
.B \-S <spec>
FSM specification to output (for Graphviz output).
.TP
.B \-M <machine>
Machine definition/instantiation to output (for Graphviz output).
.TP
.B \-C
The host language is C, C++, Obj-C or Obj-C++. This is the default host language option.
.TP
.B \-D
The host language is D.
.TP
.B \-J
The host language is Java.
.TP
.B \-Z
The host language is Go.
.TP
.B \-R
The host language is Ruby.
.TP
.B \-L
Inhibit writing of #line directives.
.TP
.B \-T0
(C/D/Java/Ruby/C#/Go) Generate a table driven FSM. This is the default code style.
The table driven
FSM represents the state machine as static data. There are tables of states,
transitions, indices and actions. The current state is stored in a variable.
The execution is a loop that, given the current state and the current
character to process, looks up the transition to take using a binary search,
executes any actions and moves to the target state. In general, the table
driven FSM produces a smaller binary and requires a less expensive host language
compile but results in slower running code. The table driven FSM is suitable
for any FSM.
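.LP
To make the table driven style concrete, here is a C sketch that hand-codes a
tiny machine for the language of an optional minus sign followed by one or
more digits. It assumes one possible table layout, a sorted array of key
ranges per state searched with a binary search; the layout and the names
Trans, lookup and table_accepts are hypothetical, and this is not the code
Ragel actually emits.
.nf
```c
#include <string.h>

/* Hypothetical table layout: one entry per key range [lo, hi]. */
typedef struct { char lo, hi; int target; } Trans;

/* Transitions for  '-'? digit+ , sorted by lo within each state. */
static const Trans trans[] = {
    { '-', '-', 1 }, { '0', '9', 2 },   /* state 0 (start) */
    { '0', '9', 2 },                    /* state 1 (after the minus) */
    { '0', '9', 2 },                    /* state 2 (in digits, final) */
};
static const int trans_off[] = { 0, 2, 3 };  /* first entry of each state */
static const int trans_len[] = { 2, 1, 1 };  /* entries per state */
static const int is_final[]  = { 0, 0, 1 };

/* Binary search over the key ranges of state cs; -1 means no transition. */
static int lookup(int cs, char c)
{
    int lo = trans_off[cs], hi = trans_off[cs] + trans_len[cs] - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (c < trans[mid].lo)
            hi = mid - 1;
        else if (c > trans[mid].hi)
            lo = mid + 1;
        else
            return trans[mid].target;
    }
    return -1;
}

/* Returns 1 if the whole string is accepted, 0 otherwise. */
int table_accepts(const char *s)
{
    const char *p = s, *pe = s + strlen(s);
    int cs = 0;                      /* the current state is a variable */
    for (; p != pe; p++) {
        cs = lookup(cs, *p);         /* find the transition, take it */
        if (cs < 0)
            return 0;                /* error: no transition on *p */
    }
    return is_final[cs];
}
```
.fi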
.TP
.B \-T1
(C/D/Ruby/C#/Go) Generate a faster table driven FSM by expanding action lists in the action
execute code.
.TP
.B \-F0
(C/D/Ruby/C#/Go) Generate a flat table driven FSM. Transitions are represented as an array
indexed by the current alphabet character. This eliminates the need for a
binary search to locate transitions and produces faster code; however, it is
only suitable for small alphabets.
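.LP
A flat table trades space for speed. In the following hypothetical sketch of
a machine for an optional minus sign followed by digits, each state covers
one contiguous key range and stores a target for every key in that range, so
a transition is found with one subtraction and one array index. The FlatState
layout and the function names are illustrative assumptions, not Ragel's
generated code. With a large alphabet the per-state ranges, and therefore the
tables, can grow quickly.
.nf
```c
#include <string.h>

/* Hypothetical flat layout: each state covers the keys lo .. lo+span-1
   and stores one target per key, starting at targets[first]. */
typedef struct { unsigned char lo, span; int first; } FlatState;

enum { ERR = -1 };

/* Machine  '-'? digit+ .  State 0 spans '-' (45) .. '9' (57). */
static const int targets[] = {
    1, ERR, ERR, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,  /* state 0: '-' '.' '/' '0'..'9' */
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2,               /* state 1: '0'..'9' */
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2,               /* state 2: '0'..'9' */
};
static const FlatState states[] = { { '-', 13, 0 }, { '0', 10, 13 }, { '0', 10, 23 } };
static const int is_final[] = { 0, 0, 1 };

/* One subtraction and one array index; no search. */
static int flat_next(int cs, unsigned char c)
{
    const FlatState *st = &states[cs];
    unsigned idx = (unsigned)(c - st->lo);  /* wraps to a huge value if c < lo */
    if (idx >= st->span)
        return ERR;                         /* key outside the state's range */
    return targets[st->first + idx];
}

/* Returns 1 if the whole string is accepted, 0 otherwise. */
int flat_accepts(const char *s)
{
    const char *p = s, *pe = s + strlen(s);
    int cs = 0;
    for (; p != pe; p++) {
        cs = flat_next(cs, (unsigned char)*p);
        if (cs == ERR)
            return 0;
    }
    return is_final[cs];
}
```
.fi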
.TP
.B \-F1
(C/D/Ruby/C#/Go) Generate a faster flat table driven FSM by expanding action lists in the action
execute code.
.TP
.B \-G0
(C/D/C#/Go) Generate a goto driven FSM. The goto driven FSM represents the state machine
as a series of goto statements. While in the machine, the current state is
stored by the processor's instruction pointer. The execution is a flat function
where control is passed from state to state using gotos. In general, the goto
FSM produces faster code but results in a larger binary and a more expensive
host language compile.
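.LP
For comparison, this hand-written C sketch shows the goto style for a machine
that accepts an optional minus sign followed by digits: each state is a block
of code, and the current state is implied by which block is executing rather
than stored in a variable. It is not Ragel's actual output, and the function
name goto_accepts is hypothetical.
.nf
```c
#include <string.h>

/* Hand-written goto-style recognizer for  '-'? digit+ .
   Each state is a labelled block; no state variable is kept. */
int goto_accepts(const char *s)
{
    const char *p = s, *pe = s + strlen(s);

    /* state 0 (start, not final) */
    if (p == pe) return 0;
    if (*p == '-') { p++; goto st1; }
    if (*p >= '0' && *p <= '9') { p++; goto st2; }
    return 0;                        /* error */
st1:                                 /* state 1: after the minus, not final */
    if (p == pe) return 0;
    if (*p >= '0' && *p <= '9') { p++; goto st2; }
    return 0;
st2:                                 /* state 2: in digits, final */
    if (p == pe) return 1;
    if (*p >= '0' && *p <= '9') { p++; goto st2; }
    return 0;
}
```
.fi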
.TP
.B \-G1
(C/D/C#/Go) Generate a faster goto driven FSM by expanding action lists in the action
execute code.
.TP
.B \-G2
(C/D/Go) Generate a really fast goto driven FSM by embedding action lists in the state
machine control code.
.TP
.B \-P<N>
(C/D) Generate an N-way split really fast goto-driven FSM.

.SH RAGEL INPUT
NOTE: This is a very brief description of Ragel input. Ragel is described in
more detail in the user guide available from the homepage (see below).

Ragel normally passes input files straight to the output. When it sees an FSM
specification that contains machine instantiations it stops to generate the
state machine. If there are write statements (such as "write exec") then Ragel emits the
corresponding code. There can be any number of FSM specifications in an input
file. A multi-line FSM specification starts with '%%{' and ends with '}%%'. A
single-line FSM specification starts with '%%' and ends at the first newline.
.SH FSM STATEMENTS
.TP
.I Machine Name:
Set the name of the machine. If given, it must be the first statement.
.TP
.I Alphabet Type:
Set the data type of the alphabet.
.TP
.I GetKey:
Specify how to retrieve the alphabet character from the element type.
.TP
.I Include:
Include a machine of the same name as the current machine or of a different name in either
the current file or some other file.
.TP
.I Action Definition:
Define an action that can be invoked by the FSM.
.TP
.I Fsm Definition, Instantiation and Longest Match Instantiation:
Used to build FSMs. Their syntax is described in the following sections.
.TP
.I Access:
Specify how to access the persistent state machine variables.
.TP
.I Write:
Write some component of the machine.
.TP
.I Variable:
Override the default variable names (p, pe, cs, act, etc).
.SH BASIC MACHINES
The basic machines are the base operands of the regular language expressions.
.TP
.I 'hello'
Concat literal. Produces a concatenation of the characters in the string.
Supports escape sequences with '\\'.  The result will have a start state and a
transition to a new state for each character in the string. The last state in
the sequence will be made final. To make the string case-insensitive, append
an 'i' to the string, as in \fI'cmd'i\fR.
.TP
.I \(dqhello\(dq
Identical to single quote version.
.TP
.I [hello]
Or literal. Produces a union of characters.  Supports character ranges 
with '\-', negating the sense of the union with an initial '^' and escape
sequences with '\\'. The result will have two states with a transition between
them for each character or range. 
.LP
NOTE: '', "", and [] produce null FSMs. Null machines have one state that is
both a start state and a final state and match the zero length string. A null machine
may be created with the null builtin machine.
.TP
.I integer
Makes a two state machine with one transition on the given integer number.
.TP
.I hex
Makes a two state machine with one transition on the given hexadecimal number.
.TP
.I "/simple_regex/"
A simple regular expression. Supports the notation '.', '*' and '[]', character
ranges with '\-', negating the sense of an OR expression with an initial '^'
and escape sequences with '\\'. Also supports one trailing flag: i. Use it to
produce a case-insensitive regular expression, as in /GET/i.
.TP
.I lit .. lit
Specifies a range. The allowable upper and lower bounds are concat literals of
length one and number machines. 
For example, 0x10..0x20,  0..63, and 'a'..'z' are valid ranges.
.TP 
.I "variable_name"
References the machine definition assigned to the variable name given.
.TP
.I "builtin_machine"
There are several builtin machines available. They are all two state machines
for the purpose of matching common classes of characters. They are:
.RS
.TP
.B any
Any character in the alphabet.
.TP
.B ascii
ASCII characters 0..127.
.TP
.B extend
Extended ASCII characters. This is the range \-128..127 for signed alphabets
and the range 0..255 for unsigned alphabets.
.TP
.B alpha
Alphabetic characters /[A-Za-z]/.
.TP
.B digit
Digits /[0-9]/.
.TP
.B alnum
Alphanumerics /[0-9A-Za-z]/.
.TP
.B lower
Lowercase characters /[a-z]/.
.TP
.B upper
Uppercase characters /[A-Z]/.
.TP
.B xdigit
Hexadecimal digits /[0-9A-Fa-f]/.
.TP
.B cntrl
Control characters 0..31.
.TP
.B graph
Graphical characters /[!-~]/.
.TP
.B print
Printable characters /[ -~]/.
.TP
.B punct
Punctuation. Graphical characters that are not alphanumerics
/[!-/:-@\\[-`{-~]/. 
.TP
.B space
Whitespace /[\\t\\v\\f\\n\\r ]/.
.TP
.B null
Zero length string. Equivalent to '', "" and [].
.TP
.B empty
Empty set. Matches nothing.
.RE
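.LP
Several of these classes line up with the <ctype.h> classes of the same name
in the default "C" locale. The sketch below encodes graph, print, punct and
space as C predicates over the ranges listed above (the rl_ names are
hypothetical) and counts disagreements with isgraph, isprint, ispunct and
isspace over the ASCII range. It assumes the program has not called
setlocale, so the "C" locale is in effect.
.nf
```c
#include <ctype.h>

/* Hypothetical predicates for some Ragel builtin machines, written
   directly from the ranges listed above. */
static int in(int c, int lo, int hi) { return c >= lo && c <= hi; }

int rl_graph(int c) { return in(c, '!', '~'); }
int rl_print(int c) { return in(c, ' ', '~'); }
int rl_punct(int c)
{
    return in(c, '!', '/') || in(c, ':', '@') ||
           in(c, '[', 96) || in(c, '{', '~');        /* 96 is the backquote */
}
int rl_space(int c) { return in(c, 9, 13) || c == ' '; }  /* HT LF VT FF CR, space */

/* Number of (class, character) pairs in 0..127 on which the predicates
   above disagree with <ctype.h>; assumes the "C" locale. */
int class_mismatches(void)
{
    int c, bad = 0;
    for (c = 0; c < 128; c++) {
        bad += !rl_graph(c) != !isgraph(c);
        bad += !rl_print(c) != !isprint(c);
        bad += !rl_punct(c) != !ispunct(c);
        bad += !rl_space(c) != !isspace(c);
    }
    return bad;
}
```
.fi
In the "C" locale class_mismatches() returns 0, so for ASCII input these four
classes, as listed on this page, agree with their <ctype.h> counterparts.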
.SH BRIEF OPERATOR REFERENCE
Operators are grouped by precedence, group 1 being the lowest and group 9 the
highest.
.LP
.B GROUP 1:
.TP
.I expr , expr
Join machines together without drawing any transitions, setting up a start
state or any final states. The start state must be explicitly specified with the
"start" label. Final states may be specified with an epsilon transition to
the implicitly created "final" state.
.LP
.B GROUP 2:
.TP
.I expr | expr
Produces a machine that matches any string in machine one or machine two.
.TP
.I expr & expr
Produces a machine that matches any string that is in both machine one and
machine two.
.TP
.I expr - expr
Produces a machine that matches any string that is in machine one but not in
machine two.
.TP
.I expr -- expr
Strong Subtraction. Matches any string in machine one that does not have any string
in machine two as a substring.
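.LP
The sketch below is not Ragel's implementation. It illustrates the textbook
product construction behind union, intersection and subtraction of
deterministic machines: run both machines in lockstep on the input and
combine their final-state flags at the end. The two small DFAs (A accepts
strings over {a, b} with an even number of a's, B accepts strings ending in
b), the enum op and the function product_accepts are assumptions chosen for
illustration.
.nf
```c
/* Two hypothetical DFAs over {a, b}. Row = state, column 0 = 'a',
   column 1 = 'b'. State 2 is an error state used for other input.
   A accepts an even number of a's; B accepts strings ending in b. */
static const int dfa_a[3][2] = { { 1, 0 }, { 0, 1 }, { 2, 2 } };
static const int dfa_b[3][2] = { { 0, 1 }, { 0, 1 }, { 2, 2 } };
static const int final_a[3] = { 1, 0, 0 };
static const int final_b[3] = { 0, 1, 0 };

enum op { OP_UNION, OP_INTERSECT, OP_SUBTRACT };

/* Simulates the product machine, whose state is the pair (qa, qb). */
int product_accepts(const char *s, enum op which)
{
    int qa = 0, qb = 0;
    for (; *s; s++) {
        int k = *s == 'a' ? 0 : *s == 'b' ? 1 : -1;
        if (k < 0) {                 /* character outside {a, b} */
            qa = qb = 2;
            break;
        }
        qa = dfa_a[qa][k];           /* both machines step together */
        qb = dfa_b[qb][k];
    }
    switch (which) {
    case OP_UNION:     return final_a[qa] || final_b[qb];   /* A | B */
    case OP_INTERSECT: return final_a[qa] && final_b[qb];   /* A & B */
    default:           return final_a[qa] && !final_b[qb];  /* A - B */
    }
}
```
.fi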
.LP
.B GROUP 3:
.TP
.I expr . expr
Produces a machine that matches all the strings in machine one followed
by all the strings in machine two.
.TP
.I expr :> expr
Entry-Guarded Concatenation: terminates machine one upon entry to machine two.
.TP
.I expr :>> expr
Finish-Guarded Concatenation: terminates machine one when machine two finishes.
.TP
.I expr <: expr
Left-Guarded Concatenation: gives a higher priority to machine one.
.LP
NOTE: Concatenation is the default operator. Two machines next to each other
with no operator between them result in the concatenation operation.
.LP
.B GROUP 4:
.TP
.I label: expr
Attaches a label to an expression. Labels can be used by epsilon transitions
and fgoto and fcall statements in actions. Also note that the referencing of a
machine definition causes the implicit creation of a label by the same name.
.LP
.B GROUP 5:
.TP
.I expr -> label
Draws an epsilon transition to the state defined by label. Label must
be a name in the current scope. Epsilon transitions are resolved when
comma operators are evaluated and at the root of the expression tree of
machine assignment/instantiation.
.LP
.B GROUP 6: Actions
.LP
An action may be a name predefined with an action statement or may
be specified directly with '{' and '}' in the expression.
.TP
.I expr > action
Embeds action into starting transitions.
.TP
.I expr @ action
Embeds action into transitions that go into a final state.
.TP
.I expr $ action
Embeds action into all transitions. Does not include pending out transitions.
.TP
.I expr % action
Embeds action into pending out transitions from final states.
.LP
.B GROUP 6: EOF Actions
.LP
When a machine's finish routine is called the current state's EOF actions are
executed. 
.TP
.I expr >/ action
Embed an EOF action into the start state.
.TP
.I expr </ action
Embed an EOF action into all states except the start state.
.TP
.I expr $/ action
Embed an EOF action into all states.
.TP
.I expr %/ action
Embed an EOF action into final states.
.TP
.I expr @/ action
Embed an EOF action into all states that are not final.
.TP
.I expr <>/ action
Embed an EOF action into all states that are not the start
state and that are not final (middle states).
.LP
.B GROUP 6: Global Error Actions
.LP
Global error actions are stored in states until the final state machine has
been fully constructed. They are then transferred to error transitions, giving
the effect of a default action.
.TP
.I expr >! action
Embed a global error action into the start state.
.TP
.I expr <! action
Embed a global error action into all states except the start state.
.TP
.I expr $! action
Embed a global error action into all states.
.TP
.I expr %! action
Embed a global error action into the final states.
.TP
.I expr @! action
Embed a global error action into all states which are not final.
.TP
.I expr <>! action
Embed a global error action into all states which are not the start state and
are not final (middle states).
.LP
.B GROUP 6: Local Error Actions 
.LP
Local error actions are stored in states until the named machine is fully
constructed. They are then transferred to error transitions, giving the effect
of a default action for a section of the total machine. Note that the name may
be omitted, in which case the action will be transferred to error actions upon
construction of the current machine.
.TP
.I expr >^ action
Embed a local error action into the start state.
.TP
.I expr <^ action
Embed a local error action into all states except the start state.
.TP
.I expr $^ action
Embed a local error action into all states.
.TP
.I expr %^ action
Embed a local error action into the final states.
.TP
.I expr @^ action
Embed a local error action into all states which are not final.
.TP
.I expr <>^ action
Embed a local error action into all states which are not the start state and
are not final (middle states).
.LP
.B GROUP 6: To-State Actions
.LP
To-state actions are stored in states and executed any time the machine moves
into a state. This includes regular transitions and transfers of control such
as fgoto. Note that setting the current state from outside the machine (for
example during initialization) does not count as a transition into a state.
.TP
.I expr >~ action
Embed a to-state action into the start state.
.TP
.I expr <~ action
Embed a to-state action into all states except the start state.
.TP
.I expr $~ action
Embed a to-state action into all states.
.TP
.I expr %~ action
Embed a to-state action into the final states.
.TP
.I expr @~ action
Embed a to-state action into all states which are not final.
.TP
.I expr <>~ action
Embed a to-state action into all states which are not the start state and
are not final (middle states).
.LP
.B GROUP 6: From-State Actions
.LP
From-state actions are executed whenever a state takes a transition on a character.
This includes the error transition and a transition to self.
.TP
.I expr >* action
Embed a from-state action into the start state.
.TP
.I expr <* action
Embed a from-state action into every state except the start state.
.TP
.I expr $* action
Embed a from-state action into all states.
.TP
.I expr %* action
Embed a from-state action into the final states.
.TP
.I expr @* action
Embed a from-state action into all states which are not final.
.TP
.I expr <>* action
Embed a from-state action into all states which are not the start state and
are not final (middle states).
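.LP
The relative order of from-state, transition and to-state actions can be seen
in the following hand-written C sketch of a driver loop for the machine 'ab'.
It assumes the ordering used by Ragel's generated code (from-state actions
before the transition is chosen, to-state actions after the state changes);
the tracing scheme and the names fire and run_traced are hypothetical.
.nf
```c
#include <string.h>

/* Hypothetical trace buffer: F = from-state action, T = transition
   action, S = to-state action. */
static char trace[64];
static int ntrace;

static void fire(char kind)
{
    if (ntrace < (int)sizeof trace - 1) {
        trace[ntrace++] = kind;
        trace[ntrace] = 0;
    }
}

/* Driver loop for the machine 'ab'; state i means i characters of
   "ab" have been matched. Every state carries a from-state and a
   to-state action, and every transition carries an action. */
const char *run_traced(const char *s)
{
    static const char word[] = "ab";
    const char *p = s, *pe = s + strlen(s);
    int cs = 0;

    ntrace = 0;
    trace[0] = 0;
    for (; p != pe; p++) {
        fire('F');                        /* from-state: before the lookup */
        if (cs >= 2 || *p != word[cs])
            break;                        /* error transition: stop */
        fire('T');                        /* action on the transition */
        cs++;                             /* move to the target state */
        fire('S');                        /* to-state: after entering cs */
    }
    return trace;
}
```
.fi
For the input ax the trace is FTSF: the from-state action fires once more
before the error transition stops the machine.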
.LP
.B GROUP 6: Priority Assignment
.LP
Priorities are assigned to names within transitions. Only priorities on the
same name are allowed to interact. In the first form of priorities the name
defaults to the name of the machine definition the priority is assigned in.
Transitions do not have default priorities.
.TP
.I expr > int
Assigns the priority int in all transitions leaving the start state.
.TP
.I expr @ int
Assigns the priority int in all transitions that go into a final state.
.TP
.I expr $ int
Assigns the priority int in all existing transitions.
.TP
.I expr % int
Assigns the priority int in all pending out transitions.
.LP
A second form of priority assignment allows the programmer to specify the name
to which the priority is assigned, allowing interactions to cross machine
definition boundaries.
.TP
.I expr > (name,int)
Assigns the priority int to name in all transitions leaving the start state.
.TP
.I expr @ (name, int)
Assigns the priority int to name in all transitions that go into a final state.
.TP
.I expr $ (name, int)
Assigns the priority int to name in all existing transitions.
.TP
.I expr % (name, int)
Assigns the priority int to name in all pending out transitions.
.LP
.B GROUP 7:
.TP
.I expr *
Produces the Kleene star of a machine. Matches zero or more repetitions of the
machine.
.TP
.I expr **
Longest-Match Kleene Star. This version of Kleene star puts a higher
priority on staying in the machine over wrapping around and starting over. This
operator is equivalent to ( ( expr ) $0 %1 )*.
.TP
.I expr ?
Produces a machine that accepts the machine given or the null string. This operator
is equivalent to  ( expr | '' ).
.TP
.I expr +
Produces the machine concatenated with the Kleene star of itself. Matches one or
more repetitions of the machine.  This operator is equivalent to ( expr . expr* ).
.TP
.I expr {n}
Produces a machine that matches exactly n repetitions of expr.
.TP
.I expr {,n}
Produces a machine that matches anywhere from zero to n repetitions of expr.
.TP
.I expr {n,}
Produces a machine that matches n or more repetitions of expr.
.TP
.I expr {n,m}
Produces a machine that matches n to m repetitions of expr.
.LP
.B GROUP 8:
.TP
.I ! expr
Produces a machine that matches any string not matched by the given machine.
This operator is equivalent to ( *extend - expr ).
.TP
.I ^ expr
Character-Level Negation. Matches any single character not matched by the
single character machine expr.
.LP
.B GROUP 9:
.TP
.I ( expr )
Forces precedence on operators.
.SH VALUES AVAILABLE IN CODE BLOCKS
.TP
.I fc
The current character. Equivalent to *p.
.TP
.I fpc
A pointer to the current character. Equivalent to p.
.TP
.I fcurs
An integer value representing the current state.
.TP
.I ftargs
An integer value representing the target state.
.TP
.I fentry(<label>)
An integer value representing the entry point <label>.
.SH STATEMENTS AVAILABLE IN CODE BLOCKS
.TP
.I fhold;
Do not advance over the current character. Equivalent to \-\-p;.
.TP
.I fexec <expr>;
Set the next character to process. Equivalent to p = (<expr>)\-1;
.TP
.I fgoto <label>;
Jump to the machine defined by <label>. 
.TP
.I fgoto *<expr>;
Jump to the entry point given by <expr>. The expression must
evaluate to an integer value representing a state.
.TP
.I fnext <label>;
Set the next state to be the entry point defined by <label>.  The fnext
statement does not immediately jump to the specified state. Any action code
following the statement is executed.
.TP
.I fnext *<expr>;
Set the next state to be the entry point given by <expr>. The expression must
evaluate to an integer value representing a state.
.TP
.I fcall <label>;
Call the machine defined by <label>. The next fret will jump to the
target of the transition on which the action is invoked.
.TP
.I fcall *<expr>;
Call the entry point given by <expr>. The next fret will jump to the target of
the transition on which the action is invoked.
.TP
.I fret;
Return to the target state of the transition on which the last fcall was made.
.TP
.I fbreak;
Save the current state and immediately break out of the machine.
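.LP
The effect of fhold can be illustrated with a hand-written C scanner that
splits input into numbers and words. When a token ends, the character that
ended it is not consumed (\-\-p), so the start state sees it again and can
begin the next token with it. The function tokenize and its token codes (N
for number, W for word) are hypothetical.
.nf
```c
#include <string.h>

/* Hypothetical scanner: states 0 = between tokens, 1 = in a number,
   2 = in a word. Token kinds ('N' or 'W') are written to types, which
   must hold max + 1 chars; returns the number of tokens recorded. */
int tokenize(const char *s, char *types, int max)
{
    const char *p = s, *pe = s + strlen(s);
    int cs = 0, n = 0;

    for (; p != pe; p++) {
        int digit = *p >= '0' && *p <= '9';
        int alpha = (*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z');
        if (cs == 0) {
            if (digit)
                cs = 1;
            else if (alpha)
                cs = 2;                   /* other characters are skipped */
        } else if ((cs == 1 && !digit) || (cs == 2 && !alpha)) {
            if (n < max)
                types[n++] = cs == 1 ? 'N' : 'W';   /* token ended */
            cs = 0;
            --p;                          /* fhold: state 0 sees *p again */
        }
    }
    if (cs != 0 && n < max)
        types[n++] = cs == 1 ? 'N' : 'W'; /* token ran to end of input */
    types[n] = 0;
    return n;
}
```
.fi
On the input 12ab3 the sketch records NWN. Without the \-\-p line, the
characters a and 3 that end each token would never reach state 0, and the
result would be NW.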
.SH CREDITS
Ragel was written by Adrian Thurston <thurston@complang.org>.
Objective-C output contributed by Erich Ocean. D output contributed by
Alan West. Ruby output contributed by Victor Hugo Borja. C Sharp code
generation contributed by Daniel Tang. Contributions to Java code
generation by Colin Fleming.  Go code generation contributed by
Justine Tunney.
.SH "SEE ALSO"
.BR re2c (1),
.BR flex (1)

Homepage: http://www.complang.org/ragel/