summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorAkim Demaille <akim.demaille@gmail.com>2020-07-02 08:19:58 +0200
committerAkim Demaille <akim.demaille@gmail.com>2020-07-04 11:43:35 +0200
commit156e548341c24975c8c84d4013ab53bc990401fb (patch)
tree67952bb00b863d8c9b7c52ca50ea11cf951e54f9 /doc
parent526ef45f306ed85ae22411e2564a66e5e39c9778 (diff)
downloadbison-156e548341c24975c8c84d4013ab53bc990401fb.tar.gz
cex: give more details about -Wcex and -rcex
* data/bison-default.css: Cobalt does not seem to be supported. * doc/bison.texi (Counterexamples): A new section. (Understanding): Show the counterexamples as it shows in the report: with its items. (Bison Options): Document -Wcex and -rcex.
Diffstat (limited to 'doc')
-rw-r--r--doc/bison.texi309
1 files changed, 235 insertions, 74 deletions
diff --git a/doc/bison.texi b/doc/bison.texi
index 5409f36c..84a609cf 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -53,11 +53,11 @@
\gdef\colorBlue{%
\setcolor{\rgbBlue}%
}
-
-\gdef\rgbWarning{0.50 0 0.50}
-\gdef\colorWarning{%
- \setcolor{\rgbWarning}%
+\gdef\rgbPurple{0.50 0 0.50}
+\gdef\colorPurple{%
+ \setcolor{\rgbPurple}%
}
+
\gdef\rgbError{0.80 0 0}
\gdef\colorError{%
\setcolor{\rgbError}%
@@ -84,10 +84,10 @@
@macro colorBlue
@inlineraw{html, <b style="color:blue">}
@end macro
-
-@macro colorWarning
+@macro colorPurple
@inlineraw{html, <b style="color:darkviolet">}
@end macro
+
@macro colorError
@inlineraw{html, <b style="color:red">}
@end macro
@@ -115,8 +115,12 @@
@colorBlue{}\text\@colorOff{}
@end macro
+@macro purple{text}
+@colorPurple{}\text\@colorOff{}
+@end macro
+
@macro dwarning{text}
-@colorWarning{}\text\@colorOff{}
+@purple{\text\}
@end macro
@macro derror{text}
@@ -445,6 +449,7 @@ Handling Context Dependencies
Debugging Your Parser
+* Counterexamples:: Understanding conflicts.
* Understanding:: Understanding the structure of your parser.
* Graphviz:: Getting a visual representation of the parser.
* Xml:: Getting a markup representation of the parser.
@@ -9785,12 +9790,18 @@ clear the flag.
@chapter Debugging Your Parser
Developing a parser can be a challenge, especially if you don't understand
-the algorithm (@pxref{Algorithm}). This
-chapter explains how to understand and debug a parser.
-
-The first sections focus on the static part of the parser: its structure.
-They explain how to generate and read the detailed description of the
-automaton. There are several formats available:
+the algorithm (@pxref{Algorithm}). This chapter explains how to understand
+and debug a parser.
+
+The most frequent issue users face is solving their conflicts. To fix them,
+the first step is understanding how they arise in a given grammar. This is
+made much easier by automated generation of counterexamples, cover in the
+first section (@pxref{Counterexamples}).
+
+In most cases though, looking at the structure of the automaton is still
+needed. The following sections explain how to generate and read the
+detailed structural description of the automaton. There are several formats
+available:
@itemize @minus
@item
as text, see @ref{Understanding};
@@ -9807,19 +9818,175 @@ The last section focuses on the dynamic part of the parser: how to enable
and understand the parser run-time traces (@pxref{Tracing}).
@menu
+* Counterexamples:: Understanding conflicts.
* Understanding:: Understanding the structure of your parser.
* Graphviz:: Getting a visual representation of the parser.
* Xml:: Getting a markup representation of the parser.
* Tracing:: Tracing the execution of your parser.
@end menu
+@node Counterexamples
+@section Generation of Counterexamples
+
+Solving conflicts is probably the most delicate part of the design of an LR
+parser, as demonstrated by the number of sections devoted to them in this
+very documentation. To solve a conflict, one must understand it: when does
+it occur? Is it because of a flaw in the grammar? Is it rather because
+LR(1) cannot cope with this grammar?
+
+On difficulty is that conflicts occur in the @emph{automaton}, and it can be
+tricky to related them to issues in the @emph{grammar} itself. With
+experience and patience, analysis the detailed description of the automaton
+(@pxref{Understanding}) allows to find example strings that reach these conflicts.
+
+That task is made much easier thanks to the generation of counterexamples,
+initially developed by Chinawat Isradisaikul and Andrew Myers
+@pcite{Isradisaikul 2015}.
+
+As a first example, see the example grammar of @ref{Shift/Reduce}, which
+features on shift/reduce conflict:
+
+@example
+$ @kbd{bison if-then-else.y}
+if-then-else.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
+if-then-else.y: warning: rerun with option '-Wcounterexamples' to generate conflict counterexamples [-Wother]
+@end example
+
+@noindent
+Let's rerun @command{bison} with the option
+@option{-Wcex}/@option{-Wcounterexamples}@inlinefmt{info, (the following
+output is actually in color)}:
+
+@ifhtml
+@example
+Shift/reduce conflict on token "else":
+@group
+ Example @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @yellow{"else" stmt}
+ First derivation @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{]} @green{]} @yellow{"else" stmt ]}
+ Example @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @blue{"else" stmt}
+ Second derivation @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{"else" stmt ]} @green{]} @yellow{]}
+@end group
+@end example
+@end ifhtml
+@ifnothtml
+@smallexample
+Shift/reduce conflict on token "else":
+@group
+ Example
+ @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @yellow{"else" stmt}
+ First derivation
+ @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{]} @green{]} @yellow{"else" stmt ]}
+ Example
+ @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @blue{"else" stmt}
+ Second derivation
+ @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{"else" stmt ]} @green{]} @yellow{]}
+@end group
+@end smallexample
+@end ifnothtml
+
+This shows two different derivations for one single expression. That
+demonstrates that the grammar is ambiguous.
+
+@sp 1
+
+As a more delicate example, consider the example grammar of
+@ref{Reduce/Reduce}, which features a reduce/reduce conflict:
+
+@example
+%%
+sequence:
+ %empty
+| maybeword
+| sequence "word"
+;
+maybeword:
+ %empty
+| "word"
+;
+@end example
+
+Bison generates the following counterexamples:
+
+@example
+$ @kbd{bison -Wcex sequence.y}
+sequence.y: @dwarning{warning}: 1 shift/reduce conflict [@dwarning{-Wconflicts-sr}]
+sequence.y: @dwarning{warning}: 2 reduce/reduce conflicts [@dwarning{-Wconflicts-rr}]
+Shift/reduce conflict on token "word":
+ Example @red{•} @yellow{"word"}
+ First derivation @yellow{sequence ::=[} @green{sequence ::=[} @red{•} @green{]} @yellow{"word" ]}
+ Example @red{•} @green{"word"}
+ Second derivation @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} @green{"word" ]} @yellow{]}
+
+Reduce/reduce conflict on tokens $end, "word":
+ Example @red{•}
+ First derivation @yellow{sequence ::=[} @red{•} @yellow{]}
+ Example @red{•}
+ Second derivation @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} @green{]} @yellow{]}
+
+Shift/reduce conflict on token "word":
+ Example @red{•} @yellow{"word"}
+ First derivation @yellow{sequence ::=[} @green{sequence ::=[} @blue{maybeword ::=[} @red{•} @blue{]} @green{]} @yellow{"word" ]}
+ Example @red{•} @green{"word"}
+ Second derivation @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} @green{"word" ]} @yellow{]}
+
+sequence.y:8.3-45: @dwarning{warning}: rule useless in parser due to conflicts [@dwarning{-Wother}]
+ 8 | @dwarning{%empty @{ printf ("empty maybeword\n"); @}}
+ | @dwarning{^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
+@end example
+
+Each of these three conflicts, again, prove that the grammar is ambiguous.
+For instance, the second conflict (the reduce/reduce one) shows that the
+grammar accept the empty input in two different ways.
+
+@sp 1
+
+Sometimes, the search will not find an example that can be derived in two
+ways. In these cases, counterexample generation will provide two examples
+that are the same up until the dot. Most notably, this will happen when
+your grammar requires a stronger parser (more lookahead, LR instead of
+LALR). The following example isn't LR(1):
+
+@example
+%token ID
+%%
+s: a ID
+a: expr
+expr: %empty | expr ID ','
+@end example
+
+@command{bison} reports:
+
+@smallexample
+Shift/reduce conflict on token ID:
+ First example @blue{expr} @red{•} @green{ID} @yellow{$end}
+ First derivation @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr} @red{•} @blue{]} @green{ID ]} @yellow{$end ]}
+ Second example @purple{expr} @red{•} @purple{ID ','} @green{ID} @yellow{$end}
+ Second derivation @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[} @purple{expr ::=[ expr} @red{•} @purple{ID ',' ]} @blue{]} @green{ID ]} @yellow{$end ]}
+@end smallexample
+
+This conflict is caused by the parser not having enough information to know
+the difference between these two examples. The parser would need an
+additional lookahead token to know whether or not a comma follows the
+@code{ID} after @code{expr}. These types of conflicts tend to be more
+difficult to fix, and usually need a rework of the grammar. In this case,
+it can be fixed by changing around the recursion: @code{expr: ID | ',' expr
+ID}.
+
+Alternatively, you might also want to consider using a GLR parser
+(@pxref{GLR Parsers}).
+
+@sp 1
+
+On occasions, it is useful to look at counterexamples @emph{in situ}: with
+the automaton report (@xref{Understanding}, in particular @ref{state-8,,
+State 8}).
+
@node Understanding
@section Understanding Your Parser
-As documented elsewhere (@pxref{Algorithm}) Bison parsers are
-@dfn{shift/reduce automata}. In some cases (much more frequent than one
-would hope), looking at this automaton is required to tune or simply fix a
-parser.
+Bison parsers are @dfn{shift/reduce automata} (@pxref{Algorithm}). In some
+cases (much more frequent than one would hope), looking at this automaton is
+required to tune or simply fix a parser.
The textual file is generated when the options @option{--report} or
@option{--verbose} are specified, see @ref{Invocation}. Its name is made by
@@ -9877,62 +10044,6 @@ calc.y: @dwarning{warning}: 7 shift/reduce conflicts [@dwarning{-Wconflicts-sr}]
calc.y: @dwarning{warning}: rerun with option '-Wcounterexamples' to generate conflict counterexamples [@dwarning{-Wother}]
@end smallexample
-When given @option{-Wcounterexamples}, @command{bison} will run a search for
-strings in your grammar that better demonstrate you
-conflicts. Counterexample generation was initially developed by Chinawat
-Isradisaikul and Andrew Myers @pcite{Isradisaikul 2015}. For
-@file{calc.y}, the first printed example is:
-
-@example
-Shift/reduce conflict on token '/':
- Example @green{exp '+' exp} @red{•} @yellow{'/' exp}
- First derivation @yellow{exp ::=[} @green{exp ::=[ exp '+' exp} @red{•} @green{]} @yellow{'/' exp ]}
- Example @yellow{exp '+'} @green{exp} @red{•} @green{'/' exp}
- Second derivation @yellow{exp ::=[ exp '+'} @green{exp ::=[ exp} @red{•} @green{'/' exp ]} @yellow{]}
-@end example
-
-This shows two separate derivations in the grammar for the same @code{exp}:
-@samp{e1 + e2 / e3}. The derivations show how your rules would parse the
-given example. Here, the first derivation completes a reduction when seeing
-@samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second
-derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as
-an @code{exp}. Therefore, it is easy to see that adding @code{%precedence}
-directives would fix this conflict.
-
-Sometimes, the search will not find an example that can be derived in two
-ways. In these cases, counterexample generation will provide two examples
-that are the same up until the dot. Most notably, this will happen when
-your grammar requires a stronger parser (more lookahead, LR instead of
-LALR). The following example isn't LR(1):
-
-@example
-%token ID
-%%
-s: a ID
-a: expr
-expr: %empty | expr ID ','
-@end example
-
-@command{bison} reports:
-
-@smallexample
-Shift/reduce conflict on token ID:
- First example @blue{expr} @red{•} @green{ID} @yellow{$end}
- First derivation @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr} @red{•} @blue{]} @green{ID ]} @yellow{$end ]}
- Second example @blue{expr} @red{•} @blue{ID ','} @green{ID} @yellow{$end}
- Second derivation @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr ::=[ expr} @red{•} @blue{ID ',' ] ]} @green{ID ]} @yellow{$end ]}
-@end smallexample
-
-This conflict is caused by the parser not having enough information to know
-the difference between these two examples. The parser would need an
-additional lookahead token to know whether or not a comma follows the
-@code{ID} after @code{expr}. These types of conflicts tend to be more
-difficult to fix, and usually need a rework of the grammar. In this case,
-it can be fixed by changing around the recursion: @code{expr: ID | ',' expr
-ID}.
-
-Counterexamples can also be written to a file with @option{--report=cex}.
-
Going back to the calc example, when given @option{--report=state},
in addition to @file{calc.tab.c}, it creates a file @file{calc.output}
with contents detailed below. The order of the output and the exact
@@ -10164,6 +10275,7 @@ State 7
exp go to state 11
@end example
+@anchor{state-8}
As was announced in beginning of the report, @samp{State 8 conflicts:
1 shift/reduce}:
@@ -10237,6 +10349,27 @@ Conflict between rule 1 and token '-' resolved as reduce (%left '-').
Conflict between rule 1 and token '*' resolved as shift ('+' < '*').
@end example
+When given @option{--report=counterexamples}, @command{bison} will generate
+counterexamples within the report, augmented with the corresponding items
+(@pxref{Counterexamples}).
+
+@example
+Shift/reduce conflict on token '/':
+ 1 exp: exp '+' exp •
+ 4 exp: exp • '/' exp
+ Example @green{exp '+' exp} @red{•} @yellow{'/' exp}
+ First derivation @yellow{exp ::=[} @green{exp ::=[ exp '+' exp} @red{•} @green{]} @yellow{'/' exp ]}
+ Example @yellow{exp '+'} @green{exp} @red{•} @green{'/' exp}
+ Second derivation @yellow{exp ::=[ exp '+'} @green{exp ::=[ exp} @red{•} @green{'/' exp ]} @yellow{]}
+@end example
+
+This shows two separate derivations in the grammar for the same @code{exp}:
+@samp{e1 + e2 / e3}. The derivations show how your rules would parse the
+given example. Here, the first derivation completes a reduction when seeing
+@samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second
+derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as
+an @code{exp}. Therefore, it is easy to see that adding @code{%precedence}
+directives would fix this conflict.
The remaining states are similar:
@@ -11084,6 +11217,13 @@ unexpected number of conflicts is an error, and an expected number of
conflicts is not reported, so @option{-W} and @option{--warning} then have
no effect on the conflict report.
+@item counterexamples
+@itemx cex
+Provide counterexamples for conflicts. @xref{Counterexamples}.
+Counterexamples take time to compute. The option @option{-Wcex} should be
+used by the developer when working on the grammar; it hardly makes sense to
+use it in a CI.
+
@item dangling-alias
Report string literals that are not bound to a token symbol.
@@ -11458,6 +11598,13 @@ each rule's lookahead set.
Implies @code{state}. Explain how conflicts were solved thanks to
precedence and associativity directives.
+@item counterexamples
+@itemx cex
+Look for counterexamples for the conflicts. @xref{Counterexamples}.
+Counterexamples take time to compute. The option @option{-rcex} should be
+used by the developer when working on the grammar; it hardly makes sense to
+use it in a CI.
+
@item all
Enable all the items.
@@ -15121,6 +15268,18 @@ Thus, if there is a rule which says that an integer can be used as an
expression, integers are allowed @emph{anywhere} an expression is
permitted. @xref{Language and Grammar}.
+@item Counterexample
+A sequence of tokens and/or nonterminals, with one dot, that demonstrates a
+conflict. The dot marks the place where the conflict occurs.
+
+A @emph{unifying} counterexample is a single string that has two different
+parses; its existence proves that the grammar is ambiguous. When a unifying
+counterexample cannot be found in reasonable time, a @emph{nonunifying}
+counterexample is built: @emph{two} different string sharing the prefix up
+to the dot.
+
+@xref{Counterexamples}
+
@item Default reduction
The reduction that a parser should perform if the current parser state
contains no other action for the lookahead token. In permitted parser
@@ -15507,7 +15666,9 @@ London, Department of Computer Science, TR-00-12 (December 2000).
@c LocalWords: ResourceBundle myResources getString getName getToken ylwrap
@c LocalWords: getLocation getExpectedTokens reportSyntaxError bistromathic
@c LocalWords: TokenKind Automake's rtti Wcounterexamples Chinawat PLDI
-@c LocalWords: Isradisaikul
+@c LocalWords: Isradisaikul tcite pcite rgbGreen colorGreen rgbYellow Wcex
+@c LocalWords: colorYellow rgbRed colorRed rgbBlue colorBlue rgbPurple
+@c LocalWords: colorPurple ifhtml ifnothtml situ rcex
@c Local Variables:
@c ispell-dictionary: "american"