diff options
Diffstat (limited to 'doc/ragel/ragel.1.in')
-rw-r--r-- | doc/ragel/ragel.1.in | 702 |
1 files changed, 0 insertions, 702 deletions
diff --git a/doc/ragel/ragel.1.in b/doc/ragel/ragel.1.in deleted file mode 100644 index 9c768e85..00000000 --- a/doc/ragel/ragel.1.in +++ /dev/null @@ -1,702 +0,0 @@ -.\" -.\" Copyright 2001-2007 Adrian Thurston <thurston@colm.net> -.\" -.\" Permission is hereby granted, free of charge, to any person obtaining a copy -.\" of this software and associated documentation files (the "Software"), to -.\" deal in the Software without restriction, including without limitation the -.\" rights to use, copy, modify, merge, publish, distribute, sublicense, and/or -.\" sell copies of the Software, and to permit persons to whom the Software is -.\" furnished to do so, subject to the following conditions: -.\" -.\" The above copyright notice and this permission notice shall be included in all -.\" copies or substantial portions of the Software. -.\" -.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -.\" OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -.\" SOFTWARE. -.\" - -.\" -.\" Process this file with -.\" groff -man -Tascii ragel.1 -.\" - - -.TH RAGEL 1 "@PUBDATE@" "Ragel @VERSION@" "Ragel State Machine Compiler" -.SH NAME -ragel \- compile regular languages into executable state machines -.SH SYNOPSIS -.B ragel -.RI [ options ] -.I file -.SH DESCRIPTION -Ragel compiles executable finite state machines from regular languages. -Ragel can generate C, C++, Objective-C, D, Go, or Java code. Ragel state -machines can not only recognize byte -sequences as regular expression machines do, but can also execute code at -arbitrary points in the recognition of a regular language. User code is -embedded using inline operators that do not disrupt the regular language -syntax. - -The core language consists of standard regular expression operators, such as -union, concatenation and kleene star, accompanied by action embedding -operators. Ragel also provides operators that let you control any -non-determinism that you create, construct scanners using the longest match -paradigm, and build state machines using the statechart model. It is also -possible to influence the execution of a state machine from inside an embedded -action by jumping or calling to other parts of the machine and reprocessing -input. - -Ragel provides a very flexibile interface to the host language that attempts to -place minimal restrictions on how the generated code is used and integrated -into the application. The generated code has no dependencies. - -.SH OPTIONS -.TP -.BR \-h ", " \-H ", " \-? ", " \-\-help -Display help and exit. -.TP -.B \-v -Print version information and exit. -.TP -.B \-o " file" -Write output to file. If -o is not given, a default file name is chosen by -replacing the file extenstion of the input file. For source files ending in .rh -the suffix .h is used. For all other source files a suffix based on the output -language is used (.c, .cpp, .m, etc.). If -o is not given for Graphviz output -the generated dot file is written to standard output. -.TP -.B \-s -Print some statistics on standard error. -.TP -.B \--error-format=gnu -Print error messages using the format "file:line:column:" (default) -.TP -.B \--error-format=msvc -Print error messages using the format "file(line,column):" -.TP -.B \-d -Do not remove duplicate actions from action lists. -.TP -.B \-I " dir" -Add dir to the list of directories to search for included and imported files -.TP -.B \-n -Do not perform state minimization. -.TP -.B \-m -Perform minimization once, at the end of the state machine compilation. -.TP -.B \-l -Minimize after nearly every operation. Lists of like operations such as unions -are minimized once at the end. This is the default minimization option. -.TP -.B \-e -Minimize after every operation. -.TP -.B \-x -Compile the state machines and emit an XML representation of the host data and -the machines. -.TP -.B \-V -Generate a dot file for Graphviz. -.TP -.B \-p -Display printable characters on labels. -.TP -.B \-L -Inhibit writing of #line directives. -.TP -.B \-S <spec> -FSM specification to output. -.TP -.B \-M <machine> -Machine definition/instantiation to output. -.TP -.B \-C -The host language is C, C++, Obj-C or Obj-C++. This is the default host language option. -.TP -.B --asm --gas-x86-64-sys-v -GNU AS, x86_64, System V ABI -ASM is generated in a code style equiv to -G2 -.TP -.B \-D -The host language is D. -.TP -.B \-Z -The host language is Go. -.TP -.B \-J -The host language is Java. -.TP -.B \-R -The host language is Ruby. -.TP -.B -A -The host language is C# -.TP -.B -O -The host language is OCaml. -.TP -.B -U -The host language is Rust. -.TP -.B -Y -The host language is Julia. -.TP -.B -K -The host language is Crack. -.TP -.B -P -The host language is JavaScript. -.TP -.B \-T0 -(C/D/Java/Ruby/C#) Generate a table driven FSM. This is the default code style. -The table driven -FSM represents the state machine as static data. There are tables of states, -transitions, indices and actions. The current state is stored in a variable. -The execution is a loop that looks that given the current state and current -character to process looks up the transition to take using a binary search, -executes any actions and moves to the target state. In general, the table -driven FSM produces a smaller binary and requires a less expensive host language -compile but results in slower running code. The table driven FSM is suitable -for any FSM. -.TP -.B \-T1 -(C/D/Ruby/C#) Generate a faster table driven FSM by expanding action lists in the action -execute code. -.TP -.B \-F0 -(C/D/Ruby/C#) Generate a flat table driven FSM. Transitions are represented as an array -indexed by the current alphabet character. This eliminates the need for a -binary search to locate transitions and produces faster code, however it is -only suitable for small alphabets. -.TP -.B \-F1 -(C/D/Ruby/C#) Generate a faster flat table driven FSM by expanding action lists in the action -execute code. -.TP -.B \-G0 -(C/D/C#) Generate a goto driven FSM. The goto driven FSM represents the state machine -as a series of goto statements. While in the machine, the current state is -stored by the processor's instruction pointer. The execution is a flat function -where control is passed from state to state using gotos. In general, the goto -FSM produces faster code but results in a larger binary and a more expensive -host language compile. -.TP -.B \-G1 -(C/D/C#) Generate a faster goto driven FSM by expanding action lists in the action -execute code. -.TP -.B \-G2 -(C/D/Go) Generate a really fast goto driven FSM by embedding action lists in the state -machine control code. -.TP -.B --nfa-conds-depth=D -Search for high-cost conditions inside a prefix of the machine (depth D from -start state). Search is rooted at NFA union contructs. -.TP -.B --nfa-term-check -Search for condition-based general repetitions that will not function properly -and must be NFA reps. Search is rooted at NFA union constructs. -.TP -.B --nfa-intermed-state-limit=L -Report fail if number of states exceeds this during compilation. -.TP -.B --nfa-final-state-limit=L -Report a fail if number states in final machine exceeds this. -.TP -.B --nfa-breadth-check=E1,E2,.. -Report breadth cost of named entry points by (and start). Reporting starts at -NFA union contructs. -.TP -.B --input-histogram=FN -Input char histogram for breadth check. If unspecified a flat histogram is -used. -.SH RAGEL INPUT -NOTE: This is a very brief description of Ragel input. Ragel is described in -more detail in the user guide available from the homepage (see below). - -Ragel normally passes input files straight to the output. When it sees an FSM -specification that contains machine instantiations it stops to generate the -state machine. If there are write statements (such as "write exec") then ragel emits the -corresponding code. There can be any number of FSM specifications in an input -file. A multi-line FSM specification starts with '%%{' and ends with '}%%'. A -single line FSM specification starts with %% and ends at the first newline. -.SH FSM STATEMENTS -.TP -.I Machine Name: -Set the the name of the machine. If given, it must be the first statement. -.TP -.I Alphabet Type: -Set the data type of the alphabet. -.TP -.I GetKey: -Specify how to retrieve the alphabet character from the element type. -.TP -.I Include: -Include a machine of same name as the current or of a different name in either -the current file or some other file. -.TP -.I Action Definition: -Define an action that can be invoked by the FSM. -.TP -.I Fsm Definition, Instantiation and Longest Match Instantiation: -Used to build FSMs. Syntax description in next few sections. -.TP -.I Access: -Specify how to access the persistent state machine variables. -.TP -.I Write: -Write some component of the machine. -.TP -.I Variable: -Override the default variable names (p, pe, cs, act, etc). -.SH BASIC MACHINES -The basic machines are the base operands of the regular language expressions. -.TP -.I 'hello' -Concat literal. Produces a concatenation of the characters in the string. -Supports escape sequences with '\\'. The result will have a start state and a -transition to a new state for each character in the string. The last state in -the sequence will be made final. To make the string case-insensitive, append -an 'i' to the string, as in 'cmd'i\fR. -.TP -.I \(dqhello\(dq -Identical to single quote version. -.TP -.I [hello] -Or literal. Produces a union of characters. Supports character ranges -with '\-', negating the sense of the union with an initial '^' and escape -sequences with '\\'. The result will have two states with a transition between -them for each character or range. -.LP -NOTE: '', "", and [] produce null FSMs. Null machines have one state that is -both a start state and a final state and match the zero length string. A null machine -may be created with the null builtin machine. -.TP -.I integer -Makes a two state machine with one transition on the given integer number. -.TP -.I hex -Makes a two state machine with one transition on the given hexidecimal number. -.TP -.I "/simple_regex/" -A simple regular expression. Supports the notation '.', '*' and '[]', character -ranges with '\-', negating the sense of an OR expression with and initial '^' -and escape sequences with '\\'. Also supports one trailing flag: i. Use it to -produce a case-insensitive regular expression, as in /GET/i. -.TP -.I lit .. lit -Specifies a range. The allowable upper and lower bounds are concat literals of -length one and number machines. -For example, 0x10..0x20, 0..63, and 'a'..'z' are valid ranges. -.TP -.I "variable_name" -References the machine definition assigned to the variable name given. -.TP -.I "builtin_machine" -There are several builtin machines available. They are all two state machines -for the purpose of matching common classes of characters. They are: -.RS -.TP -.B any -Any character in the alphabet. -.TP -.B ascii -Ascii characters 0..127. -.TP -.B extend -Ascii extended characters. This is the range -128..127 for signed alphabets -and the range 0..255 for unsigned alphabets. -.TP -.B alpha -Alphabetic characters /[A-Za-z]/. -.TP -.B digit -Digits /[0-9]/. -.TP -.B alnum -Alpha numerics /[0-9A-Za-z]/. -.TP -.B lower -Lowercase characters /[a-z]/. -.TP -.B upper -Uppercase characters /[A-Z]/. -.TP -.B xdigit -Hexidecimal digits /[0-9A-Fa-f]/. -.TP -.B cntrl -Control characters 0..31, 127. -.TP -.B graph -Graphical characters /[!-~]/. -.TP -.B print -Printable characters /[ -~]/. -.TP -.B punct -Punctuation. Graphical characters that are not alpha-numerics -/[!-/:-@\\[-`{-~]/. -.TP -.B space -Whitespace /[\\t\\v\\f\\n\\r ]/. -.TP -.B null -Zero length string. Equivalent to '', "" and []. -.TP -.B empty -Empty set. Matches nothing. -.RE -.SH BRIEF OPERATOR REFERENCE -Operators are grouped by precedence, group 1 being the lowest and group 6 the -highest. -.LP -.B GROUP 1: -.TP -.I expr , expr -Join machines together without drawing any transitions, setting up a start -state or any final states. Start state must be explicitly specified with the -"start" label. Final states may be specified with the an epsilon transitions to -the implicitly created "final" state. -.LP -.B GROUP 2: -.TP -.I expr | expr -Produces a machine that matches any string in machine one or machine two. -.TP -.I expr & expr -Produces a machine that matches any string that is in both machine one and -machine two. -.TP -.I expr - expr -Produces a machine that matches any string that is in machine one but not in -machine two. -.TP -.I expr -- expr -Strong Subtraction. Matches any string in machine one that does not have any string -in machine two as a substring. -.LP -.B GROUP 3: -.TP -.I expr . expr -Produces a machine that matches all the strings in machine one followed -by all the strings in machine two. -.TP -.I expr :> expr -Entry-Guarded Concatenation: terminates machine one upon entry to machine two. -.TP -.I expr :>> expr -Finish-Guarded Concatenation: terminates machine one when machine two finishes. -.TP -.I expr <: expr -Left-Guarded Concatenation: gives a higher priority to machine one. -.LP -NOTE: Concatenation is the default operator. Two machines next to each other -with no operator between them results in the concatenation operation. -.LP -.B GROUP 4: -.TP -.I label: expr -Attaches a label to an expression. Labels can be used by epsilon transitions -and fgoto and fcall statements in actions. Also note that the referencing of a -machine definition causes the implicit creation of label by the same name. -.LP -.B GROUP 5: -.TP -.I expr -> label -Draws an epsilon transition to the state defined by label. Label must -be a name in the current scope. Epsilon transitions are resolved when -comma operators are evaluated and at the root of the expression tree of -machine assignment/instantiation. -.LP -.B GROUP 6: Actions -.LP -An action may be a name predefined with an action statement or may -be specified directly with '{' and '}' in the expression. -.TP -.I expr > action -Embeds action into starting transitions. -.TP -.I expr @ action -Embeds action into transitions that go into a final state. -.TP -.I expr $ action -Embeds action into all transitions. Does not include pending out transitions. -.TP -.I expr % action -Embeds action into pending out transitions from final states. -.LP -.B GROUP 6: EOF Actions -.LP -When a machine's finish routine is called the current state's EOF actions are -executed. -.TP -.I expr >/ action -Embed an EOF action into the start state. -.TP -.I expr </ action -Embed an EOF action into all states except the start state. -.TP -.I expr $/ action -Embed an EOF action into all states. -.TP -.I expr %/ action -Embed an EOF action into final states. -.TP -.I expr @/ action -Embed an EOF action into all states that are not final. -.TP -.I expr <>/ action -Embed an EOF action into all states that are not the start -state and that are not final (middle states). -.LP -.B GROUP 6: Global Error Actions -.LP -Global error actions are stored in states until the final state machine has -been fully constructed. They are then transferred to error transitions, giving -the effect of a default action. -.TP -.I expr >! action -Embed a global error action into the start state. -.TP -.I expr <! action -Embed a global error action into all states except the start state. -.TP -.I expr $! action -Embed a global error action into all states. -.TP -.I expr %! action -Embed a global error action into the final states. -.TP -.I expr @! action -Embed a global error action into all states which are not final. -.TP -.I expr <>! action -Embed a global error action into all states which are not the start state and -are not final (middle states). -.LP -.B GROUP 6: Local Error Actions -.LP -Local error actions are stored in states until the named machine is fully -constructed. They are then transferred to error transitions, giving the effect -of a default action for a section of the total machine. Note that the name may -be omitted, in which case the action will be transferred to error actions upon -construction of the current machine. -.TP -.I expr >^ action -Embed a local error action into the start state. -.TP -.I expr <^ action -Embed a local error action into all states except the start state. -.TP -.I expr $^ action -Embed a local error action into all states. -.TP -.I expr %^ action -Embed a local error action into the final states. -.TP -.I expr @^ action -Embed a local error action into all states which are not final. -.TP -.I expr <>^ action -Embed a local error action into all states which are not the start state and -are not final (middle states). -.LP -.B GROUP 6: To-State Actions -.LP -To state actions are stored in states and executed any time the machine moves -into a state. This includes regular transitions, and transfers of control such -as fgoto. Note that setting the current state from outside the machine (for -example during initialization) does not count as a transition into a state. -.TP -.I expr >~ action -Embed a to-state action action into the start state. -.TP -.I expr <~ action -Embed a to-state action into all states except the start state. -.TP -.I expr $~ action -Embed a to-state action into all states. -.TP -.I expr %~ action -Embed a to-state action into the final states. -.TP -.I expr @~ action -Embed a to-state action into all states which are not final. -.TP -.I expr <>~ action -Embed a to-state action into all states which are not the start state and -are not final (middle states). -.LP -.B GROUP 6: From-State Actions -.LP -From state actions are executed whenever a state takes a transition on a character. -This includes the error transition and a transition to self. -.TP -.I expr >* action -Embed a from-state action into the start state. -.TP -.I expr <* action -Embed a from-state action into every state except the start state. -.TP -.I expr $* action -Embed a from-state action into all states. -.TP -.I expr %* action -Embed a from-state action into the final states. -.TP -.I expr @* action -Embed a from-state action into all states which are not final. -.TP -.I expr <>* action -Embed a from-state action into all states which are not the start state and -are not final (middle states). -.LP -.B GROUP 6: Priority Assignment -.LP -Priorities are assigned to names within transitions. Only priorities on the -same name are allowed to interact. In the first form of priorities the name -defaults to the name of the machine definition the priority is assigned in. -Transitions do not have default priorities. -.TP -.I expr > int -Assigns the priority int in all transitions leaving the start state. -.TP -.I expr @ int -Assigns the priority int in all transitions that go into a final state. -.TP -.I expr $ int -Assigns the priority int in all existing transitions. -.TP -.I expr % int -Assigns the priority int in all pending out transitions. -.LP -A second form of priority assignment allows the programmer to specify the name -to which the priority is assigned, allowing interactions to cross machine -definition boundaries. -.TP -.I expr > (name,int) -Assigns the priority int to name in all transitions leaving the start state. -.TP -.I expr @ (name, int) -Assigns the priority int to name in all transitions that go into a final state. -.TP -.I expr $ (name, int) -Assigns the priority int to name in all existing transitions. -.TP -.I expr % (name, int) -Assigns the priority int to name in all pending out transitions. -.LP -.B GROUP 7: -.TP -.I expr * -Produces the kleene star of a machine. Matches zero or more repetitions of the -machine. -.TP -.I expr ** -Longest-Match Kleene Star. This version of kleene star puts a higher -priority on staying in the machine over wrapping around and starting over. This -operator is equivalent to ( ( expr ) $0 %1 )*. -.TP -.I expr ? -Produces a machine that accepts the machine given or the null string. This operator -is equivalent to ( expr | '' ). -.TP -.I expr + -Produces the machine concatenated with the kleen star of itself. Matches one or -more repetitions of the machine. This operator is equivalent to ( expr . expr* ). -.TP -.I expr {n} -Produces a machine that matches exactly n repetitions of expr. -.TP -.I expr {,n} -Produces a machine that matches anywhere from zero to n repetitions of expr. -.TP -.I expr {n,} -Produces a machine that matches n or more repetitions of expr. -.TP -.I expr {n,m} -Produces a machine that matches n to m repetitions of expr. -.LP -.B GROUP 8: -.TP -.I ! expr -Produces a machine that matches any string not matched by the given machine. -This operator is equivalent to ( *extend - expr ). -.TP -.I ^ expr -Character-Level Negation. Matches any single character not matched by the -single character machine expr. -.LP -.B GROUP 9: -.TP -.I ( expr ) -Forces precedence on operators. -.SH VALUES AVAILABLE IN CODE BLOCKS -.TP -.I fc -The current character. Equivalent to *p. -.TP -.I fpc -A pointer to the current character. Equivalent to p. -.TP -.I fcurs -An integer value representing the current state. -.TP -.I ftargs -An integer value representing the target state. -.TP -.I fentry(<label>) -An integer value representing the entry point <label>. -.SH STATEMENTS AVAILABLE IN CODE BLOCKS -.TP -.I fhold; -Do not advance over the current character. Equivalent to --p;. -.TP -.I fexec <expr>; -Sets the current character to something else. Equivalent to p = (<expr>)-1; -.TP -.I fgoto <label>; -Jump to the machine defined by <label>. -.TP -.I fgoto *<expr>; -Jump to the entry point given by <expr>. The expression must -evaluate to an integer value representing a state. -.TP -.I fnext <label>; -Set the next state to be the entry point defined by <label>. The fnext -statement does not immediately jump to the specified state. Any action code -following the statement is executed. -.TP -.I fnext *<expr>; -Set the next state to be the entry point given by <expr>. The expression must -evaluate to an integer value representing a state. -.TP -.I fcall <label>; -Call the machine defined by <label>. The next fret will jump to the -target of the transition on which the action is invoked. -.TP -.I fcall *<expr>; -Call the entry point given by <expr>. The next fret will jump to the target of -the transition on which the action is invoked. -.TP -.I fret; -Return to the target state of the transition on which the last fcall was made. -.TP -.I fbreak; -Save the current state and immediately break out of the machine. -.SH CREDITS -Ragel was written by Adrian Thurston <thurston@colm.net>. There have been many -contributors. Please see CREDITS file in source distribution. -.SH "SEE ALSO" -.BR re2c (1), -.BR flex (1) - -Homepage: http://www.colm.net/open-source/ragel/ |