author | Eric S. Raymond <esr@thyrsus.com> | 2020-10-14 13:16:44 -0400 |
---|---|---|
committer | Eric S. Raymond <esr@thyrsus.com> | 2020-10-14 13:16:44 -0400 |
commit | e740e500e21be7a95b5f19835c01bbdfd3f9fe07 (patch) | |
tree | 90aa01e57c21a4bcdf12491942939dca64bf2bc3 /doc | |
parent | 35c1cf34269654f6bf254be7cd5bd46e28d29bd3 (diff) | |
download | flex-git-e740e500e21be7a95b5f19835c01bbdfd3f9fe07.tar.gz |
Documentation polishing.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/flex.texi | 74 |
1 files changed, 48 insertions, 26 deletions
diff --git a/doc/flex.texi b/doc/flex.texi
index 9531366..f4a526b 100644
--- a/doc/flex.texi
+++ b/doc/flex.texi
@@ -2378,9 +2378,10 @@ rules is matched. The following would do the trick:
 @end example
 
 @vindex YY_NUM_RULES
-where @code{ctr} is an array to hold the counts for the different rules.
-Note that the macro @code{YY_NUM_RULES} gives the total number of rules
-(including the default rule), even if you use @samp{-s)}, so a correct
+where @code{ctr} is an array to hold the counts for the different
+rules. Note that the public constant @code{YY_NUM_RULES} (a macro in
+the default C/C++ back end) gives the total number of rules (including
+the default rule), even if you use @samp{-s)}, so a correct
 declaration for @code{ctr} is:
 
 @example
@@ -2396,14 +2397,14 @@ internal initializations are done). For example, it could be used to
 call a routine to read in a data table or open a logging file.
 
 @findex yy_set_interactive
-The macro @code{yy_set_interactive(is_interactive)} can be used to
+The entry point @code{yy_set_interactive(is_interactive)} can be used to
 control whether the current buffer is considered @dfn{interactive}. An
 interactive buffer is processed more slowly, but must be used when the
 scanner's input source is indeed interactive to avoid problems due to
 waiting to fill buffers (see the discussion of the @samp{-I} flag in
-@ref{Scanner Options}). A non-zero value in the macro invocation marks
+@ref{Scanner Options}). Passing a boolean true (in C/C++, non-zero) value marks
 the buffer as interactive, a zero value as non-interactive. Note that
-use of this macro overrides @code{%option always-interactive} or
+use of this entry point overrides @code{%option always-interactive} or
 @code{%option never-interactive} (@pxref{Scanner Options}).
 @code{yy_set_interactive()} must be invoked prior to beginning to scan
 the buffer that is (or is not) to be considered interactive.
@@ -2412,7 +2413,7 @@ the buffer that is (or is not) to be considered interactive.
 @findex yy_set_bol
 The rule hook @code{yy_set_bol(at_bol)} can be used to control whether
 the current buffer's scanning context for the next token match is done as
-though at the beginning of a line. A non-zero macro argument makes
+though at the beginning of a line. A non-zero argument makes
 rules anchored with @samp{^} active, while a zero argument makes
 @samp{^} rules inactive.
 
@@ -2906,7 +2907,7 @@ is performed in @code{yylex_init} at runtime.
 directs @code{flex} to generate a scanner that maintains the number of
 the current line read from its input in the global variable
 @code{yylineno}. This option is implied by @code{%option
-lex-compat}. In a reentrant C scanner, the macro @code{yylineno} is
+lex-compat}. In a reentrant C scanner, @code{yylineno} is
 accessible regardless of the value of @code{%option yylineno}, however, its
 value is not modified by @code{flex} unless @code{%option yylineno} is enabled.
 
@@ -4141,12 +4142,15 @@ scanners. Here is a quick overview of the API:
 All functions take one additional argument: @code{yyscanner}
 
 @item
-All global variables are replaced by their macro equivalents.
-(We tell you this because it may be important to you during debugging.)
+In C/C++, all global variables are replaced by their macro equivalents.
+(We tell you this because it may be important to you during
+debugging.) This is a historical-compatibilty hack; other back ends
+probably will not emulate it.
 
 @item
-@code{yylex_init} and @code{yylex_destroy} must be called before and
-after @code{yylex}, respectively.
+In the default C/C++ @code{yylex_init} and @code{yylex_destroy} must
+be called before and after @code{yylex}, respectively. Other back ebds
+may or may not require this.
 
 @item
 Accessor methods (get/set functions) provide access to common
@@ -4193,7 +4197,7 @@ First, an example of a reentrant scanner:
 @node Reentrant Detail, Reentrant Functions, Reentrant Example, Reentrant
 @section The Reentrant API in Detail
 
-Here are the things you need to do or know to use the reentrant C API of
+Here are the things you need to do or know to use the reentrant API of
 @code{flex}.
 
 @menu
@@ -4270,7 +4274,8 @@ equivalents. In particular, @code{yytext}, @code{yyleng}, @code{yylineno},
 @code{yyin}, @code{yyout}, @code{yyextra}, @code{yylval}, and @code{yylloc}
 are macros. You may safely use these macros in actions as if they were plain
 variables. We only tell you this so you don't expect to link to these variables
-externally. Currently, each macro expands to a member of an internal struct, e.g.,
+externally. Currently, each macro expands to a member of an internal
+struct, e.g., in C/C++:
 
 @example
 @verbatim
@@ -4296,8 +4301,10 @@ to accomplish this. (See below).
 
 @findex yylex_init
 @findex yylex_destroy
-@code{yylex_init} and @code{yylex_destroy} must be called before and
-after @code{yylex}, respectively.
+In the default C/C++ back end @code{yylex_init} and
+@code{yylex_destroy} must be called before and after @code{yylex},
+respectively. This may not be true in other target langages,
+especially those with automatic memory management.
 
 @example
 @verbatim
@@ -4334,8 +4341,9 @@ takes one argument, which is the value returned (via an argument) by
 @code{yylex_init}. Otherwise, it behaves the same as the non-reentrant
 version of @code{yylex}.
 
-Both @code{yylex_init} and @code{yylex_init_extra} returns 0 (zero) on success,
-or non-zero on failure, in which case errno is set to one of the following values:
+Both @code{yylex_init} and @code{yylex_init_extra} returns 0 (zero) on
+success, or non-zero on failure. On error, in the C/C++ back end in
+which case errno is set to one of the following values:
 
 @itemize
 @item ENOMEM
@@ -4344,6 +4352,8 @@ Memory allocation error. @xref{memory-management}.
 Invalid argument.
 @end itemize
 
+Othert target langages may use different means of passing back an
+error indication.
 
 The function @code{yylex_destroy} should be
 called to free resources used by the scanner. After @code{yylex_destroy}
@@ -4500,7 +4510,7 @@ scanner:
 @subsection About yyscan_t
 @tindex yyscan_t (reentrant only)
 
-@code{yyscan_t} is defined as:
+On C/C++, @code{yyscan_t} is defined as:
 
 @example
 @verbatim
@@ -5462,15 +5472,18 @@ in @emph{fixed} trailing context being turned into the more expensive
 @end verbatim
 @end example
 
-Use of @code{yyunput()} invalidates yytext and yyleng, unless the
+Some caveats are specific ro the In the C/C++ back end: Use of
+@code{yyunput()} invalidates yytext and yyleng, unless the
 @code{%array} directive or the @samp{-l} option has been used.
 Pattern-matching of @code{NUL}s is substantially slower than matching
 other characters. Dynamic resizing of the input buffer is slow, as it
-entails rescanning all the text matched so far by the current (generally
-huge) token. Due to both buffering of input and read-ahead, you cannot
-intermix calls to @file{<stdio.h>} routines, such as, @b{getchar()},
-with @code{flex} rules and expect it to work. Call @code{yyinput()}
-instead. The total table entries listed by the @samp{-v} flag excludes
+entails rescanning all the text matched so far by the current
+(generally huge) token. Due to both buffering of input and
+read-ahead, you cannot intermix calls to @file{<stdio.h>} routines,
+such as, @b{getchar()}, with @code{flex} rules and expect it to work.
+Call @code{yyinput()} instead.
+
+The total table entries listed by the @samp{-v} flag excludes
 the number of table entries needed to determine what rule has been
 matched. The number of entries is equal to the number of DFA states if the
 scanner does not use @code{yyreject()}, and somewhat greater than the
@@ -8748,7 +8761,7 @@ targets. Almost anything generally descended from Algol shouldn't be
 much more difficult; this certainly includes the whole
 Pascal/Modula/Oberon family.
 
-Two notes about the interesting part:
+Some notes about the interesting part:
 
 @itemize
 @item
@@ -8764,6 +8777,15 @@ The ``one exception'' to target-syntax independence hinted at earlier
 is some C code spliced into the skeleton when table serialization is
 enabled. This option is thus available only with the C back end; you
 need not bother supporting it in yours.
+
+@item
+If your target language has an object system, you probably want your
+back end to generate a class named by default FlexLexer (as the
+C++ back end does) with all of the controls and query functions as
+methods. As in C++, @code{%option yyclass} should modify the
+class name. If your target language has a moduke system, the
+-P option (which in C/C++ sets a common prefix on exposed entry
+points) can be pressed into service to set the module name.
 @end itemize
 
 The following assumptions in the code might trip you up and