author | Joel E. Denny <joeldenny@joeldenny.org> | 2011-01-29 12:54:28 -0500 |
---|---|---|
committer | Joel E. Denny <joeldenny@joeldenny.org> | 2011-01-29 14:57:53 -0500 |
commit | 82f3355eaf8d5988391021262dc9acfa6485c098 (patch) | |
tree | 647d67f92023898e19261e8d075ba7c01c40e9f2 | |
parent | 676997e53bf9e8d364302bd1f90d812e50b1477a (diff) | |
download | bison-82f3355eaf8d5988391021262dc9acfa6485c098.tar.gz |
Do not allow identifiers that start with a dash.
This cleans up our previous fixes for a bug whereby Bison
discarded `.field' in `$-1.field'. The previous fixes were less
restrictive about where a dash could appear in an identifier, but
the restrictions were hard to explain. That bug was reported and
this final fix was originally suggested by Paul Hilfinger. This
also fixes a remaining bug reported by Paul Eggert whereby Bison
parses `%token ID -123' as `%token ID - 123' and handles `-' as an
identifier. Now, `-' cannot be an identifier. Discussed in
threads beginning at
<http://lists.gnu.org/archive/html/bug-bison/2011-01/msg00000.html>,
<http://lists.gnu.org/archive/html/bug-bison/2011-01/msg00004.html>.
* NEWS (2.5): Update entry describing the dash extension to
grammar symbol names. Also, move that entry before the named
references entry because the latter mentions the former.
* doc/bison.texinfo (Symbol): Update documentation for symbol
names. As suggested by Paul Eggert, mention the effect of periods
and dashes on named references.
(Decl Summary): Update documentation for unquoted %define values,
which, as a side effect, can no longer start with dashes either.
* src/scan-code.l (id): Implement.
* src/scan-gram.l (id): Implement.
* tests/actions.at (Exotic Dollars): Extend test group to exercise
bug reported by Paul Hilfinger.
* tests/input.at (Symbols): Update test group, and extend to
exercise bug reported by Paul Eggert.
* tests/named-refs.at (Stray symbols in brackets): Update test
group.
($ or @ followed by . or -): Likewise.
* tests/regression.at (Invalid inputs): Likewise.
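
The effect of the new `id' definition can be sketched outside flex. The snippet below is only an illustration, not part of the commit. It translates the old and new `id' patterns from src/scan-code.l and src/scan-gram.l into Python regular expressions and checks whole strings against them. The names LETTER, OLD_ID, NEW_ID and is_id are hypothetical. A real flex scanner picks the longest match at the current input position rather than testing a whole string, so this shows only which spellings each pattern accepts.

```python
import re

# Character class taken from the "letter" definition in the diff, where it is
# written out as an explicit list of ASCII letters plus '.' and '_'.
LETTER = r"[.a-zA-Z_]"

# Hypothetical Python translations of the two flex "id" definitions.
# Old: -*(-|{letter}({letter}|[-0-9])*)   New: {letter}({letter}|[-0-9])*
OLD_ID = re.compile(rf"-*(?:-|{LETTER}(?:{LETTER}|[-0-9])*)")
NEW_ID = re.compile(rf"{LETTER}(?:{LETTER}|[-0-9])*")


def is_id(pattern, text):
    """Return True if the whole string is accepted by the identifier pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # Spellings taken from the commit message and the updated tests.
    for sample in ["foo-bar", ".GOOD", "-GOOD", "-", "1NV4L1D"]:
        old = is_id(OLD_ID, sample)
        new = is_id(NEW_ID, sample)
        print(f"{sample!r:10} old={old} new={new}")
```

Under this translation, a leading dash (as in `-GOOD' or a lone `-') is accepted only by the old pattern. Non-initial dashes and leading periods are accepted by both, and a leading digit by neither.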
-rw-r--r-- | ChangeLog | 33 |
-rw-r--r-- | NEWS | 16 |
-rw-r--r-- | doc/bison.texinfo | 18 |
-rw-r--r-- | src/scan-code.l | 2 |
-rw-r--r-- | src/scan-gram.l | 2 |
-rw-r--r-- | tests/actions.at | 46 |
-rw-r--r-- | tests/input.at | 11 |
-rw-r--r-- | tests/named-refs.at | 27 |
-rw-r--r-- | tests/regression.at | 3 |
9 files changed, 123 insertions, 35 deletions
@@ -1,3 +1,36 @@
+2011-01-29 Joel E. Denny <joeldenny@joeldenny.org>
+
+	Do not allow identifiers that start with a dash.
+	This cleans up our previous fixes for a bug whereby Bison
+	discarded `.field' in `$-1.field'. The previous fixes were less
+	restrictive about where a dash could appear in an identifier, but
+	the restrictions were hard to explain. That bug was reported and
+	this final fix was originally suggested by Paul Hilfinger. This
+	also fixes a remaining bug reported by Paul Eggert whereby Bison
+	parses `%token ID -123' as `%token ID - 123' and handles `-' as an
+	identifier. Now, `-' cannot be an identifier. Discussed in
+	threads beginning at
+	<http://lists.gnu.org/archive/html/bug-bison/2011-01/msg00000.html>,
+	<http://lists.gnu.org/archive/html/bug-bison/2011-01/msg00004.html>.
+	* NEWS (2.5): Update entry describing the dash extension to
+	grammar symbol names. Also, move that entry before the named
+	references entry because the latter mentions the former.
+	* doc/bison.texinfo (Symbol): Update documentation for symbol
+	names. As suggested by Paul Eggert, mention the effect of periods
+	and dashes on named references.
+	(Decl Summary): Update documentation for unquoted %define values,
+	which, as a side effect, can no longer start with dashes either.
+	* src/scan-code.l (id): Implement.
+	* src/scan-gram.l (id): Implement.
+	* tests/actions.at (Exotic Dollars): Extend test group to exercise
+	bug reported by Paul Hilfinger.
+	* tests/input.at (Symbols): Update test group, and extend to
+	exercise bug reported by Paul Eggert.
+	* tests/named-refs.at (Stray symbols in brackets): Update test
+	group.
+	($ or @ followed by . or -): Likewise.
+	* tests/regression.at (Invalid inputs): Likewise.
+
 2011-01-24 Joel E. Denny <joeldenny@joeldenny.org>
 
 	* data/yacc.c: Fix last apostrophe warning from xgettext.
@@ -62,6 +62,14 @@ Bison News
 
 * Changes in version 2.5 (????-??-??):
 
+** Grammar symbol names can now contain non-initial dashes:
+
+  Consistently with directives (such as %error-verbose) and with
+  %define variables (e.g. push-pull), grammar symbol names may contain
+  dashes in any position except the beginning. This is a GNU
+  extension over POSIX Yacc. Thus, use of this extension is reported
+  by -Wyacc and rejected in Yacc mode (--yacc).
+
 ** Named references:
 
   Historically, Yacc and Bison have supported positional references
@@ -157,14 +165,6 @@ Bison News
   LAC is an experimental feature. More user feedback will help to
   stabilize it.
 
-** Grammar symbol names can now contain dashes:
-
-  Consistently with directives (such as %error-verbose) and variables
-  (e.g. push-pull), grammar symbol names may include dashes in any
-  position, similarly to periods and underscores. This is GNU
-  extension over POSIX Yacc whose use is reported by -Wyacc, and
-  rejected in Yacc mode (--yacc).
-
 ** %define improvements:
 
 *** Can now be invoked via the command line:
diff --git a/doc/bison.texinfo b/doc/bison.texinfo
index cc1e0645..8b96ad93 100644
--- a/doc/bison.texinfo
+++ b/doc/bison.texinfo
@@ -3123,12 +3123,13 @@ A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
 groupings. The symbol name is used in writing grammar rules. By convention,
 it should be all lower case.
 
-Symbol names can contain letters, underscores, periods, dashes, and (not
-at the beginning) digits. Dashes in symbol names are a GNU
-extension, incompatible with POSIX Yacc. Terminal symbols
-that contain periods or dashes make little sense: since they are not
-valid symbols (in most programming languages) they are not exported as
-token names.
+Symbol names can contain letters, underscores, periods, and non-initial
+digits and dashes. Dashes in symbol names are a GNU extension, incompatible
+with POSIX Yacc. Periods and dashes make symbol names less convenient to
+use with named references, which require brackets around such names
+(@pxref{Named References}). Terminal symbols that contain periods or dashes
+make little sense: since they are not valid symbols (in most programming
+languages) they are not exported as token names.
 
 There are three ways of writing terminal symbols in the grammar:
 
@@ -5039,9 +5040,8 @@ Define a variable to adjust Bison's behavior.
 It is an error if a @var{variable} is defined by @code{%define} multiple
 times, but see @ref{Bison Options,,-D @var{name}[=@var{value}]}.
 
-@var{value} must be placed in quotation marks if it contains any
-character other than a letter, underscore, period, dash, or non-initial
-digit.
+@var{value} must be placed in quotation marks if it contains any character
+other than a letter, underscore, period, or non-initial dash or digit.
 
 Omitting @code{"@var{value}"} entirely is always equivalent to specifying
 @code{""}.
diff --git a/src/scan-code.l b/src/scan-code.l
index 3dd10443..66757199 100644
--- a/src/scan-code.l
+++ b/src/scan-code.l
@@ -86,7 +86,7 @@ splice (\\[ \f\t\v]*\n)*
    named symbol references. Shall be kept synchronized with
    scan-gram.l "letter" and "id". */
 letter    [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
-id        -*(-|{letter}({letter}|[-0-9])*)
+id        {letter}({letter}|[-0-9])*
 ref       -?[0-9]+|{id}|"["{id}"]"|"$"
 
 %%
diff --git a/src/scan-gram.l b/src/scan-gram.l
index 41291812..83d76506 100644
--- a/src/scan-gram.l
+++ b/src/scan-gram.l
@@ -119,7 +119,7 @@ static void unexpected_newline (boundary, char const *);
 %x SC_BRACKETED_ID SC_RETURN_BRACKETED_ID
 
 letter    [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
-id        -*(-|{letter}({letter}|[-0-9])*)
+id        {letter}({letter}|[-0-9])*
 directive %{id}
 int       [0-9]+
 
diff --git a/tests/actions.at b/tests/actions.at
index 6f267af7..24c6ac8a 100644
--- a/tests/actions.at
+++ b/tests/actions.at
@@ -158,6 +158,52 @@ AT_PARSER_CHECK([./input], 0,
 [[15
 ]])
 
+# Make sure that fields after $n or $-n are parsed correctly. At one
+# point while implementing dashes in symbol names, we were dropping
+# fields after $-n.
+AT_DATA_GRAMMAR([[input.y]],
+[[
+%{
+# include <stdio.h>
+  static int yylex (void);
+  static void yyerror (char const *msg);
+  typedef struct { int val; } stype;
+# define YYSTYPE stype
+%}
+
+%%
+start: one two { $$.val = $1.val + $2.val; } sum ;
+one: { $$.val = 1; } ;
+two: { $$.val = 2; } ;
+sum: { printf ("%d\n", $0.val + $-1.val + $-2.val); } ;
+
+%%
+
+static int
+yylex (void)
+{
+  return 0;
+}
+
+static void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+AT_BISON_CHECK([[-o input.c input.y]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[0]],
+[[6
+]])
+
 AT_CLEANUP
 
 
diff --git a/tests/input.at b/tests/input.at
index f223a332..8a71ff6f 100644
--- a/tests/input.at
+++ b/tests/input.at
@@ -653,17 +653,20 @@ AT_BISON_CHECK([-o input.c input.y])
 AT_COMPILE([input.o], [-c input.c])
 
 
-# Periods and dashes are genuine letters, they can start identifiers.
-# Digits cannot.
+# Periods are genuine letters, they can start identifiers.
+# Digits and dashes cannot.
 AT_DATA_GRAMMAR([input.y],
 [[%token .GOOD
          -GOOD
          1NV4L1D
+         -123
 %%
-start: .GOOD -GOOD
+start: .GOOD GOOD
 ]])
 AT_BISON_CHECK([-o input.c input.y], [1], [],
-[[input.y:11.10-16: invalid identifier: `1NV4L1D'
+[[input.y:10.10: invalid character: `-'
+input.y:11.10-16: invalid identifier: `1NV4L1D'
+input.y:12.10: invalid character: `-'
 ]])
 
 AT_CLEANUP
diff --git a/tests/named-refs.at b/tests/named-refs.at
index 74549c6e..3c7b072d 100644
--- a/tests/named-refs.at
+++ b/tests/named-refs.at
@@ -446,13 +446,14 @@ AT_SETUP([Stray symbols in brackets])
 AT_DATA_GRAMMAR([test.y],
 [[
 %%
-start: foo[ /* aaa */ *&-+ ] bar
+start: foo[ /* aaa */ *&-.+ ] bar
   { s = $foo; }
 ]])
 AT_BISON_CHECK([-o test.c test.y], 1, [],
 [[test.y:11.23: invalid character in bracketed name: `*'
 test.y:11.24: invalid character in bracketed name: `&'
-test.y:11.26: invalid character in bracketed name: `+'
+test.y:11.25: invalid character in bracketed name: `-'
+test.y:11.27: invalid character in bracketed name: `+'
 ]])
 AT_CLEANUP
 
@@ -570,23 +571,27 @@ AT_DATA([[test.y]],
 %%
 start:
   .field { $.field; }
-| -field { @-field; }
 | 'a'    { @.field; }
-| 'a'    { $-field; }
 ;
 .field: ;
--field: ;
 ]])
 AT_BISON_CHECK([[test.y]], [[1]], [],
 [[test.y:4.12-18: invalid reference: `$.field'
 test.y:4.13: syntax error after `$', expecting integer, letter, `_', `@<:@', or `$'
 test.y:4.3-8: possibly meant: $[.field] at $1
-test.y:5.12-18: invalid reference: `@-field'
+test.y:5.12-18: invalid reference: `@.field'
 test.y:5.13: syntax error after `@', expecting integer, letter, `_', `@<:@', or `$'
-test.y:5.3-8: possibly meant: @[-field] at $1
-test.y:6.12-18: invalid reference: `@.field'
-test.y:6.13: syntax error after `@', expecting integer, letter, `_', `@<:@', or `$'
-test.y:7.12-18: invalid reference: `$-field'
-test.y:7.13: syntax error after `$', expecting integer, letter, `_', `@<:@', or `$'
+]])
+AT_DATA([[test.y]],
+[[
+%%
+start:
+  'a' { $-field; }
+| 'b' { @-field; }
+;
+]])
+AT_BISON_CHECK([[test.y]], [[0]], [],
+[[test.y:4.9: warning: stray `$'
+test.y:5.9: warning: stray `@'
 ]])
 AT_CLEANUP
diff --git a/tests/regression.at b/tests/regression.at
index a49d1df0..5e0ead03 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -392,7 +392,8 @@ input.y:3.14: invalid character: `}'
 input.y:4.1: invalid character: `%'
 input.y:4.2: invalid character: `&'
 input.y:5.1-17: invalid directive: `%a-does-not-exist'
-input.y:6.1-2: invalid directive: `%-'
+input.y:6.1: invalid character: `%'
+input.y:6.2: invalid character: `-'
 input.y:7.1-8.0: missing `%}' at end of file
 input.y:7.1-8.0: syntax error, unexpected %{...%}
 ]])