author | Yves Orton <demerphq@gmail.com> | 2014-09-17 00:23:01 +0200 |
---|---|---|
committer | Yves Orton <demerphq@gmail.com> | 2014-09-17 04:47:34 +0200 |
commit | d3d47aac53402ea3d4836c60e3659dc927a9887c (patch) | |
tree | 90ce6aa3a324b6c2cf4c6b17b2b69f91c4239a7c /regen | |
parent | e1919fe5716d96be44afc32406d9504bc70403de (diff) | |
download | perl-d3d47aac53402ea3d4836c60e3659dc927a9887c.tar.gz |
Eliminate the duplicative regops BOL and EOL
See also perl5porters thread titled: "Perl MBOLism in regex engine"
In the perl 5.000 release (a0d0e21ea6ea90a22318550944fe6cb09ae10cda)
the BOL regop was split into two behaviours MBOL and SBOL, with SBOL
and BOL behaving identically. Similarly the EOL regop was split into
two behaviors SEOL and MEOL, with EOL and SEOL behaving identically.
This resulted in duplicated flag handling and case statements in
various parts of the regex engine.
It appears that BOL and EOL were kept because they are the
type ("regkind") for SBOL/MBOL and SEOL/MEOL/EOS. Reworking regcomp.pl
to handle aliases for the type data, so that SBOL/MBOL are of type
BOL even though BOL == SBOL, seems to cover that case without adding
to the confusion.
This means two regops, a regstate, and an internal regex flag can
be removed (and used for other things), and various logic relating
to them can be removed.
For the uninitiated, SBOL is /^/ and /\A/ (with or without /m) and
MBOL is /^/m. (I consider it a fail we have no way to say MBOL without
the /m modifier). Similarly SEOL is /$/ and MEOL is /$/m (there is
also a /\z/ which is EOS "end of string" with or without the /m).
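The anchor behaviours described above can be observed from plain Perl. The following is a minimal sketch, not part of the commit: `match_offsets` and the sample string are made up for illustration, and the regop column only repeats the mapping given in the message (it is not read back from the regex engine).

```perl
use strict;
use warnings;

# Hypothetical helper: return the offsets at which a zero-width
# pattern matches in $str, using repeated /g matching.
sub match_offsets {
    my ($re, $str) = @_;
    my @offsets;
    while ($str =~ /$re/g) {
        push @offsets, pos($str);
    }
    return @offsets;
}

# Sample input: newlines at offsets 3 and 7, total length 8.
my $str = "one\ntwo\n";

# Regop names follow the mapping in the commit message.
my @cases = (
    [ 'SBOL', '/^/',   qr/^/   ],
    [ 'MBOL', '/^/m',  qr/^/m  ],
    [ 'SBOL', '/\A/m', qr/\A/m ],
    [ 'SEOL', '/$/',   qr/$/   ],
    [ 'MEOL', '/$/m',  qr/$/m  ],
    [ 'EOS',  '/\z/m', qr/\z/m ],
);

my %offsets;
for my $case (@cases) {
    my ($regop, $label, $re) = @$case;
    $offsets{$label} = join ',', match_offsets($re, $str);
    printf "%-4s %-6s matches at offsets %s\n",
        $regop, $label, $offsets{$label};
}
```

For this input, /^/ and /\A/m match only at offset 0, while /^/m also matches at offset 4. /$/ matches at offsets 7 and 8 (before the final newline and at the very end), /$/m adds offset 3, and /\z/m matches only at offset 8.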
Diffstat (limited to 'regen')
-rw-r--r-- | regen/regcomp.pl | 21 |
1 files changed, 20 insertions, 1 deletions
diff --git a/regen/regcomp.pl b/regen/regcomp.pl
index 2b6d9641c2..538bddefe7 100644
--- a/regen/regcomp.pl
+++ b/regen/regcomp.pl
@@ -28,6 +28,7 @@ open DESC, 'regcomp.sym';
 my $ind = 0;
 my (@name,@rest,@type,@code,@args,@flags,@longj,@cmnt);
 my ($longest_name_length,$desc,$lastregop) = 0;
+my (%seen_op, %type_alias);
 while (<DESC>) {
     # Special pod comments
     if (/^#\* ?/) { $cmnt[$ind] .= "# $'"; }
@@ -43,8 +44,21 @@ while (<DESC>) {
     }
     unless ($lastregop) {
         ($name[$ind], $desc, $rest[$ind]) = /^(\S+)\s+([^\t]+?)\s*;\s*(.*)/;
+
+        if (defined $seen_op{$name[$ind]}) {
+            die "Duplicate regop $name[$ind] in regcomp.sym line $. previously defined on line $seen_op{$name[$ind]}\n";
+        } else {
+            $seen_op{$name[$ind]}= $.;
+        }
+
         ($type[$ind], $code[$ind], $args[$ind], $flags[$ind], $longj[$ind])
           = split /[,\s]\s*/, $desc;
+
+        if (!defined $seen_op{$type[$ind]} and !defined $type_alias{$type[$ind]}) {
+            warn "Regop type '$type[$ind]' from regcomp.sym line $. is not an existing regop, and will be aliased to $name[$ind]\n";
+            $type_alias{$type[$ind]}= $name[$ind];
+        }
+
         $longest_name_length = length $name[$ind]
           if length $name[$ind] > $longest_name_length;
         ++$ind;
@@ -148,10 +162,15 @@ EOP
     -$width, REGMATCH_STATE_MAX => $tot - 1
 ;
 
-
+my %rev_type_alias= reverse %type_alias;
 for ($ind=0; $ind < $lastregop ; ++$ind) {
   printf $out "#define\t%*s\t%d\t/* %#04x %s */\n",
     -$width, $name[$ind], $ind, $ind, $rest[$ind];
+  if (defined(my $alias= $rev_type_alias{$name[$ind]})) {
+    printf $out "#define\t%*s\t%d\t/* %#04x %s */\n",
+      -$width, $alias, $ind, $ind, "type alias";
+  }
+
 }
 print $out "\t/* ------------ States ------------- */\n";
 for ( ; $ind < $tot ; $ind++) {
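
The aliasing logic in this diff can be tried in isolation. The sketch below reuses the two regexes and the `%seen_op` / `%type_alias` / `%rev_type_alias` names from the patch, but everything else is an assumption made for illustration: the `@lines` entries are simplified stand-ins written in the style of regcomp.sym (not copied from it), and `%value` stands in for the generated `#define` output.

```perl
use strict;
use warnings;

# Hypothetical, simplified entries in the style of regcomp.sym:
#   NAME <tab> TYPE, args ; description
my @lines = (
    "SBOL\tBOL, no ; Match at beginning of string",
    "MBOL\tBOL, no ; Same, assuming multiline",
    "SEOL\tEOL, no ; Match at end of string",
    "MEOL\tEOL, no ; Same, assuming multiline",
    "EOS\tEOL, no ; Match only at end of string",
    "NOTHING\tNOTHING, no ; Match empty string",
);

my (%seen_op, %type_alias, @name);
my $lineno = 0;
for my $line (@lines) {
    ++$lineno;
    my ($name, $desc, $rest) = $line =~ /^(\S+)\s+([^\t]+?)\s*;\s*(.*)/
        or die "Cannot parse line $lineno\n";

    # A regop name may only be defined once.
    die "Duplicate regop $name on line $lineno\n"
        if defined $seen_op{$name};
    $seen_op{$name} = $lineno;

    # The first field of the description is the type ("regkind").
    my ($type) = split /[,\s]\s*/, $desc;

    # A type that is not itself a regop becomes an alias for the
    # first regop that names it.
    if (!defined $seen_op{$type} and !defined $type_alias{$type}) {
        $type_alias{$type} = $name;
    }
    push @name, $name;
}

# Map each aliased regop back to its type name, then number the
# regops, giving every alias the same value as its target.
my %rev_type_alias = reverse %type_alias;
my %value;
for my $ind (0 .. $#name) {
    $value{ $name[$ind] } = $ind;
    if (defined(my $alias = $rev_type_alias{ $name[$ind] })) {
        $value{$alias} = $ind;
    }
}

printf "%-8s %d\n", $_, $value{$_}
    for sort { $value{$a} <=> $value{$b} || $a cmp $b } keys %value;
```

With these sample entries, BOL ends up with the same number as SBOL and EOL with the same number as SEOL, while NOTHING gets no alias because its type is already a regop. The result depends on ordering: a type is aliased to the first regop that uses it, so SBOL has to precede MBOL for BOL == SBOL to hold.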