diff options
author | Lorry Tar Creator <lorry-tar-importer@lorry> | 2014-10-13 19:14:30 +0000 |
---|---|---|
committer | Lorry Tar Creator <lorry-tar-importer@lorry> | 2014-10-13 19:14:30 +0000 |
commit | eafd7a3974e8605fd02794269db6114a3446e016 (patch) | |
tree | 064737b35dbe10f2995753ead92f95bac30ba048 /doc/ragel.1.in | |
download | ragel-tarball-eafd7a3974e8605fd02794269db6114a3446e016.tar.gz |
ragel-6.9ragel-6.9
Diffstat (limited to 'doc/ragel.1.in')
-rw-r--r-- | doc/ragel.1.in | 661 |
1 files changed, 661 insertions, 0 deletions
diff --git a/doc/ragel.1.in b/doc/ragel.1.in new file mode 100644 index 0000000..ca58f6e --- /dev/null +++ b/doc/ragel.1.in @@ -0,0 +1,661 @@ +.\" +.\" Copyright 2001-2007 Adrian Thurston <thurston@complang.org> +.\" + +.\" This file is part of Ragel. +.\" +.\" Ragel is free software; you can redistribute it and/or modify +.\" it under the terms of the GNU General Public License as published by +.\" the Free Software Foundation; either version 2 of the License, or +.\" (at your option) any later version. +.\" +.\" Ragel is distributed in the hope that it will be useful, +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +.\" GNU General Public License for more details. +.\" +.\" You should have received a copy of the GNU General Public License +.\" along with Ragel; if not, write to the Free Software +.\" Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + +.\" Process this file with +.\" groff -man -Tascii ragel.1 +.\" +.TH RAGEL 1 "@PUBDATE@" "Ragel @VERSION@" "Ragel State Machine Compiler" +.SH NAME +ragel \- compile regular languages into executable state machines +.SH SYNOPSIS +.B ragel +.RI [ options ] +.I file +.SH DESCRIPTION +Ragel compiles executable finite state machines from regular languages. +Ragel can generate C, C++, Objective-C, D, Go, or Java code. Ragel state +machines can not only recognize byte +sequences as regular expression machines do, but can also execute code at +arbitrary points in the recognition of a regular language. User code is +embedded using inline operators that do not disrupt the regular language +syntax. + +The core language consists of standard regular expression operators, such as +union, concatenation and kleene star, accompanied by action embedding +operators. Ragel also provides operators that let you control any +non-determinism that you create, construct scanners using the longest match +paradigm, and build state machines using the statechart model. It is also +possible to influence the execution of a state machine from inside an embedded +action by jumping or calling to other parts of the machine and reprocessing +input. + +Ragel provides a very flexibile interface to the host language that attempts to +place minimal restrictions on how the generated code is used and integrated +into the application. The generated code has no dependencies. + +.SH OPTIONS +.TP +.BR \-h ", " \-H ", " \-? ", " \-\-help +Display help and exit. +.TP +.B \-v +Print version information and exit. +.TP +.B \-o " file" +Write output to file. If -o is not given, a default file name is chosen by +replacing the file extenstion of the input file. For source files ending in .rh +the suffix .h is used. For all other source files a suffix based on the output +language is used (.c, .cpp, .m, etc.). If -o is not given for Graphviz output +the generated dot file is written to standard output. +.TP +.B \-s +Print some statistics on standard error. +.TP +.B \--error-format=gnu +Print error messages using the format "file:line:column:" (default) +.TP +.B \--error-format=msvc +Print error messages using the format "file(line,column):" +.TP +.B \-d +Do not remove duplicate actions from action lists. +.TP +.B \-I " dir" +Add dir to the list of directories to search for included and imported files +.TP +.B \-n +Do not perform state minimization. +.TP +.B \-m +Perform minimization once, at the end of the state machine compilation. +.TP +.B \-l +Minimize after nearly every operation. Lists of like operations such as unions +are minimized once at the end. This is the default minimization option. +.TP +.B \-e +Minimize after every operation. +.TP +.B \-x +Compile the state machines and emit an XML representation of the host data and +the machines. +.TP +.B \-V +Generate a dot file for Graphviz. +.TP +.B \-p +Display printable characters on labels. +.TP +.B \-S <spec> +FSM specification to output. +.TP +.B \-M <machine> +Machine definition/instantiation to output. +.TP +.B \-C +The host language is C, C++, Obj-C or Obj-C++. This is the default host language option. +.TP +.B \-D +The host language is D. +.TP +.B \-J +The host language is Java. +.TP +.B \-Z +The host language is Go. +.TP +.B \-R +The host language is Ruby. +.TP +.B \-L +Inhibit writing of #line directives. +.TP +.B \-T0 +(C/D/Java/Ruby/C#/Go) Generate a table driven FSM. This is the default code style. +The table driven +FSM represents the state machine as static data. There are tables of states, +transitions, indicies and actions. The current state is stored in a variable. +The execution is a loop that looks that given the current state and current +character to process looks up the transition to take using a binary search, +executes any actions and moves to the target state. In general, the table +driven FSM produces a smaller binary and requires a less expensive host language +compile but results in slower running code. The table driven FSM is suitable +for any FSM. +.TP +.B \-T1 +(C/D/Ruby/C#/Go) Generate a faster table driven FSM by expanding action lists in the action +execute code. +.TP +.B \-F0 +(C/D/Ruby/C#/Go) Generate a flat table driven FSM. Transitions are represented as an array +indexed by the current alphabet character. This eliminates the need for a +binary search to locate transitions and produces faster code, however it is +only suitable for small alphabets. +.TP +.B \-F1 +(C/D/Ruby/C#/Go) Generate a faster flat table driven FSM by expanding action lists in the action +execute code. +.TP +.B \-G0 +(C/D/C#/Go) Generate a goto driven FSM. The goto driven FSM represents the state machine +as a series of goto statements. While in the machine, the current state is +stored by the processor's instruction pointer. The execution is a flat function +where control is passed from state to state using gotos. In general, the goto +FSM produces faster code but results in a larger binary and a more expensive +host language compile. +.TP +.B \-G1 +(C/D/C#/Go) Generate a faster goto driven FSM by expanding action lists in the action +execute code. +.TP +.B \-G2 +(C/D/Go) Generate a really fast goto driven FSM by embedding action lists in the state +machine control code. +.TP +.B \-P<N> +(C/D) N-Way Split really fast goto-driven FSM. + +.SH RAGEL INPUT +NOTE: This is a very brief description of Ragel input. Ragel is described in +more detail in the user guide available from the homepage (see below). + +Ragel normally passes input files straight to the output. When it sees an FSM +specification that contains machine instantiations it stops to generate the +state machine. If there are write statements (such as "write exec") then ragel emits the +corresponding code. There can be any number of FSM specifications in an input +file. A multi-line FSM specification starts with '%%{' and ends with '}%%'. A +single line FSM specification starts with %% and ends at the first newline. +.SH FSM STATEMENTS +.TP +.I Machine Name: +Set the the name of the machine. If given, it must be the first statement. +.TP +.I Alphabet Type: +Set the data type of the alphabet. +.TP +.I GetKey: +Specify how to retrieve the alphabet character from the element type. +.TP +.I Include: +Include a machine of same name as the current or of a different name in either +the current file or some other file. +.TP +.I Action Definition: +Define an action that can be invoked by the FSM. +.TP +.I Fsm Definition, Instantiation and Longest Match Instantiation: +Used to build FSMs. Syntax description in next few sections. +.TP +.I Access: +Specify how to access the persistent state machine variables. +.TP +.I Write: +Write some component of the machine. +.TP +.I Variable: +Override the default variable names (p, pe, cs, act, etc). +.SH BASIC MACHINES +The basic machines are the base operands of the regular language expressions. +.TP +.I 'hello' +Concat literal. Produces a concatenation of the characters in the string. +Supports escape sequences with '\\'. The result will have a start state and a +transition to a new state for each character in the string. The last state in +the sequence will be made final. To make the string case-insensitive, append +an 'i' to the string, as in 'cmd'i\fR. +.TP +.I \(dqhello\(dq +Identical to single quote version. +.TP +.I [hello] +Or literal. Produces a union of characters. Supports character ranges +with '\-', negating the sense of the union with an initial '^' and escape +sequences with '\\'. The result will have two states with a transition between +them for each character or range. +.LP +NOTE: '', "", and [] produce null FSMs. Null machines have one state that is +both a start state and a final state and match the zero length string. A null machine +may be created with the null builtin machine. +.TP +.I integer +Makes a two state machine with one transition on the given integer number. +.TP +.I hex +Makes a two state machine with one transition on the given hexidecimal number. +.TP +.I "/simple_regex/" +A simple regular expression. Supports the notation '.', '*' and '[]', character +ranges with '\-', negating the sense of an OR expression with and initial '^' +and escape sequences with '\\'. Also supports one trailing flag: i. Use it to +produce a case-insensitive regular expression, as in /GET/i. +.TP +.I lit .. lit +Specifies a range. The allowable upper and lower bounds are concat literals of +length one and number machines. +For example, 0x10..0x20, 0..63, and 'a'..'z' are valid ranges. +.TP +.I "variable_name" +References the machine definition assigned to the variable name given. +.TP +.I "builtin_machine" +There are several builtin machines available. They are all two state machines +for the purpose of matching common classes of characters. They are: +.RS +.TP +.B any +Any character in the alphabet. +.TP +.B ascii +Ascii characters 0..127. +.TP +.B extend +Ascii extended characters. This is the range -128..127 for signed alphabets +and the range 0..255 for unsigned alphabets. +.TP +.B alpha +Alphabetic characters /[A-Za-z]/. +.TP +.B digit +Digits /[0-9]/. +.TP +.B alnum +Alpha numerics /[0-9A-Za-z]/. +.TP +.B lower +Lowercase characters /[a-z]/. +.TP +.B upper +Uppercase characters /[A-Z]/. +.TP +.B xdigit +Hexidecimal digits /[0-9A-Fa-f]/. +.TP +.B cntrl +Control characters 0..31. +.TP +.B graph +Graphical characters /[!-~]/. +.TP +.B print +Printable characters /[ -~]/. +.TP +.B punct +Punctuation. Graphical characters that are not alpha-numerics +/[!-/:-@\\[-`{-~]/. +.TP +.B space +Whitespace /[\\t\\v\\f\\n\\r ]/. +.TP +.B null +Zero length string. Equivalent to '', "" and []. +.TP +.B empty +Empty set. Matches nothing. +.RE +.SH BRIEF OPERATOR REFERENCE +Operators are grouped by precedence, group 1 being the lowest and group 6 the +highest. +.LP +.B GROUP 1: +.TP +.I expr , expr +Join machines together without drawing any transitions, setting up a start +state or any final states. Start state must be explicitly specified with the +"start" label. Final states may be specified with the an epsilon transitions to +the implicitly created "final" state. +.LP +.B GROUP 2: +.TP +.I expr | expr +Produces a machine that matches any string in machine one or machine two. +.TP +.I expr & expr +Produces a machine that matches any string that is in both machine one and +machine two. +.TP +.I expr - expr +Produces a machine that matches any string that is in machine one but not in +machine two. +.TP +.I expr -- expr +Strong Subtraction. Matches any string in machine one that does not have any string +in machine two as a substring. +.LP +.B GROUP 3: +.TP +.I expr . expr +Produces a machine that matches all the strings in machine one followed +by all the strings in machine two. +.TP +.I expr :> expr +Entry-Guarded Concatenation: terminates machine one upon entry to machine two. +.TP +.I expr :>> expr +Finish-Guarded Concatenation: terminates machine one when machine two finishes. +.TP +.I expr <: expr +Left-Guarded Concatenation: gives a higher priority to machine one. +.LP +NOTE: Concatenation is the default operator. Two machines next to each other +with no operator between them results in the concatenation operation. +.LP +.B GROUP 4: +.TP +.I label: expr +Attaches a label to an expression. Labels can be used by epsilon transitions +and fgoto and fcall statements in actions. Also note that the referencing of a +machine definition causes the implicit creation of label by the same name. +.LP +.B GROUP 5: +.TP +.I expr -> label +Draws an epsilon transition to the state defined by label. Label must +be a name in the current scope. Epsilon transitions are resolved when +comma operators are evaluated and at the root of the expression tree of +machine assignment/instantiation. +.LP +.B GROUP 6: Actions +.LP +An action may be a name predefined with an action statement or may +be specified directly with '{' and '}' in the expression. +.TP +.I expr > action +Embeds action into starting transitions. +.TP +.I expr @ action +Embeds action into transitions that go into a final state. +.TP +.I expr $ action +Embeds action into all transitions. Does not include pending out transitions. +.TP +.I expr % action +Embeds action into pending out transitions from final states. +.LP +.B GROUP 6: EOF Actions +.LP +When a machine's finish routine is called the current state's EOF actions are +executed. +.TP +.I expr >/ action +Embed an EOF action into the start state. +.TP +.I expr </ action +Embed an EOF action into all states except the start state. +.TP +.I expr $/ action +Embed an EOF action into all states. +.TP +.I expr %/ action +Embed an EOF action into final states. +.TP +.I expr @/ action +Embed an EOF action into all states that are not final. +.TP +.I expr <>/ action +Embed an EOF action into all states that are not the start +state and that are not final (middle states). +.LP +.B GROUP 6: Global Error Actions +.LP +Global error actions are stored in states until the final state machine has +been fully constructed. They are then transferred to error transitions, giving +the effect of a default action. +.TP +.I expr >! action +Embed a global error action into the start state. +.TP +.I expr <! action +Embed a global error action into all states except the start state. +.TP +.I expr $! action +Embed a global error action into all states. +.TP +.I expr %! action +Embed a global error action into the final states. +.TP +.I expr @! action +Embed a global error action into all states which are not final. +.TP +.I expr <>! action +Embed a global error action into all states which are not the start state and +are not final (middle states). +.LP +.B GROUP 6: Local Error Actions +.LP +Local error actions are stored in states until the named machine is fully +constructed. They are then transferred to error transitions, giving the effect +of a default action for a section of the total machine. Note that the name may +be omitted, in which case the action will be transferred to error actions upon +construction of the current machine. +.TP +.I expr >^ action +Embed a local error action into the start state. +.TP +.I expr <^ action +Embed a local error action into all states except the start state. +.TP +.I expr $^ action +Embed a local error action into all states. +.TP +.I expr %^ action +Embed a local error action into the final states. +.TP +.I expr @^ action +Embed a local error action into all states which are not final. +.TP +.I expr <>^ action +Embed a local error action into all states which are not the start state and +are not final (middle states). +.LP +.B GROUP 6: To-State Actions +.LP +To state actions are stored in states and executed any time the machine moves +into a state. This includes regular transitions, and transfers of control such +as fgoto. Note that setting the current state from outside the machine (for +example during initialization) does not count as a transition into a state. +.TP +.I expr >~ action +Embed a to-state action action into the start state. +.TP +.I expr <~ action +Embed a to-state action into all states except the start state. +.TP +.I expr $~ action +Embed a to-state action into all states. +.TP +.I expr %~ action +Embed a to-state action into the final states. +.TP +.I expr @~ action +Embed a to-state action into all states which are not final. +.TP +.I expr <>~ action +Embed a to-state action into all states which are not the start state and +are not final (middle states). +.LP +.B GROUP 6: From-State Actions +.LP +From state actions are executed whenever a state takes a transition on a character. +This includes the error transition and a transition to self. +.TP +.I expr >* action +Embed a from-state action into the start state. +.TP +.I expr <* action +Embed a from-state action into every state except the start state. +.TP +.I expr $* action +Embed a from-state action into all states. +.TP +.I expr %* action +Embed a from-state action into the final states. +.TP +.I expr @* action +Embed a from-state action into all states which are not final. +.TP +.I expr <>* action +Embed a from-state action into all states which are not the start state and +are not final (middle states). +.LP +.B GROUP 6: Priority Assignment +.LP +Priorities are assigned to names within transitions. Only priorities on the +same name are allowed to interact. In the first form of priorities the name +defaults to the name of the machine definition the priority is assigned in. +Transitions do not have default priorities. +.TP +.I expr > int +Assigns the priority int in all transitions leaving the start state. +.TP +.I expr @ int +Assigns the priority int in all transitions that go into a final state. +.TP +.I expr $ int +Assigns the priority int in all existing transitions. +.TP +.I expr % int +Assigns the priority int in all pending out transitions. +.LP +A second form of priority assignment allows the programmer to specify the name +to which the priority is assigned, allowing interactions to cross machine +definition boundaries. +.TP +.I expr > (name,int) +Assigns the priority int to name in all transitions leaving the start state. +.TP +.I expr @ (name, int) +Assigns the priority int to name in all transitions that go into a final state. +.TP +.I expr $ (name, int) +Assigns the priority int to name in all existing transitions. +.TP +.I expr % (name, int) +Assigns the priority int to name in all pending out transitions. +.LP +.B GROUP 7: +.TP +.I expr * +Produces the kleene star of a machine. Matches zero or more repetitions of the +machine. +.TP +.I expr ** +Longest-Match Kleene Star. This version of kleene star puts a higher +priority on staying in the machine over wrapping around and starting over. This +operator is equivalent to ( ( expr ) $0 %1 )*. +.TP +.I expr ? +Produces a machine that accepts the machine given or the null string. This operator +is equivalent to ( expr | '' ). +.TP +.I expr + +Produces the machine concatenated with the kleen star of itself. Matches one or +more repetitions of the machine. This operator is equivalent to ( expr . expr* ). +.TP +.I expr {n} +Produces a machine that matches exactly n repetitions of expr. +.TP +.I expr {,n} +Produces a machine that matches anywhere from zero to n repetitions of expr. +.TP +.I expr {n,} +Produces a machine that matches n or more repetitions of expr. +.TP +.I expr {n,m} +Produces a machine that matches n to m repetitions of expr. +.LP +.B GROUP 8: +.TP +.I ! expr +Produces a machine that matches any string not matched by the given machine. +This operator is equivalent to ( *extend - expr ). +.TP +.I ^ expr +Character-Level Negation. Matches any single character not matched by the +single character machine expr. +.LP +.B GROUP 9: +.TP +.I ( expr ) +Forces precedence on operators. +.SH VALUES AVAILABLE IN CODE BLOCKS +.TP +.I fc +The current character. Equivalent to *p. +.TP +.I fpc +A pointer to the current character. Equivalent to p. +.TP +.I fcurs +An integer value representing the current state. +.TP +.I ftargs +An integer value representing the target state. +.TP +.I fentry(<label>) +An integer value representing the entry point <label>. +.SH STATEMENTS AVAILABLE IN CODE BLOCKS +.TP +.I fhold; +Do not advance over the current character. Equivalent to --p;. +.TP +.I fexec <expr>; +Sets the current character to something else. Equivalent to p = (<expr>)-1; +.TP +.I fgoto <label>; +Jump to the machine defined by <label>. +.TP +.I fgoto *<expr>; +Jump to the entry point given by <expr>. The expression must +evaluate to an integer value representing a state. +.TP +.I fnext <label>; +Set the next state to be the entry point defined by <label>. The fnext +statement does not immediately jump to the specified state. Any action code +following the statement is executed. +.TP +.I fnext *<expr>; +Set the next state to be the entry point given by <expr>. The expression must +evaluate to an integer value representing a state. +.TP +.I fcall <label>; +Call the machine defined by <label>. The next fret will jump to the +target of the transition on which the action is invoked. +.TP +.I fcall *<expr>; +Call the entry point given by <expr>. The next fret will jump to the target of +the transition on which the action is invoked. +.TP +.I fret; +Return to the target state of the transition on which the last fcall was made. +.TP +.I fbreak; +Save the current state and immediately break out of the machine. +.SH CREDITS +Ragel was written by Adrian Thurston <thurston@complang.org>. +Objective-C output contributed by Erich Ocean. D output contributed by +Alan West. Ruby output contributed by Victor Hugo Borja. C Sharp code +generation contributed by Daniel Tang. Contributions to Java code +generation by Colin Fleming. Go code generation contributed by +Justine Tunney. +.SH "SEE ALSO" +.BR re2c (1), +.BR flex (1) + +Homepage: http://www.complang.org/ragel/ |