author: Yves Orton <demerphq@gmail.com> 2014-09-17 00:23:01 +0200
committer: Yves Orton <demerphq@gmail.com> 2014-09-17 04:47:34 +0200
commit: d3d47aac53402ea3d4836c60e3659dc927a9887c
tree: 90ce6aa3a324b6c2cf4c6b17b2b69f91c4239a7c /regcomp.sym
parent: e1919fe5716d96be44afc32406d9504bc70403de
Eliminate the duplicative regops BOL and EOL
See also the perl5-porters thread titled "Perl MBOLism in regex engine".

In the perl 5.000 release (a0d0e21ea6ea90a22318550944fe6cb09ae10cda) the BOL regop was split into two behaviours, MBOL and SBOL, with SBOL and BOL behaving identically. Similarly, the EOL regop was split into two behaviours, SEOL and MEOL, with EOL and SEOL behaving identically. This resulted in duplicated code for flags and case statements in various parts of the regex engine.

It appears that BOL and EOL were kept because they are the type ("regkind") for SBOL/MBOL and SEOL/MEOL/EOS. Reworking regcomp.pl to handle aliases for the type data, so that SBOL/MBOL are still of type BOL even though BOL == SBOL, seems to cover that case without adding to the confusion.

This means two regops, a regstate, and an internal regex flag can be removed (and used for other things), along with various logic relating to them.

For the uninitiated, SBOL is /^/ and /\A/ (with or without /m) and MBOL is /^/m. (I consider it a failing that we have no way to say MBOL without the /m modifier.) Similarly, SEOL is /$/ and MEOL is /$/m. There is also /\z/, which is EOS ("end of string"), with or without /m.
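The anchor distinctions described above can be seen by listing the offsets at which each zero-width anchor matches. The following is only a rough sketch using Python's `re` module as a stand-in for Perl's engine, not anything taken from the perl source: the `positions` helper and the `ANCHORS` table are hypothetical names introduced for illustration, and Python spells Perl's `\z` as `\Z`. One known divergence is flagged in the comments.

```python
import re

def positions(pattern, text, flags=0):
    """Return the start offset of every zero-width match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

text = "one\ntwo\n"  # 8 characters, ending in a newline

# Regop name -> (Python pattern, flags). The Perl spelling is in the comment.
ANCHORS = {
    "SBOL": (r"^", 0),             # Perl /^/ (also /\A/): start of string only
    "MBOL": (r"^", re.MULTILINE),  # Perl /^/m: start of each line
    "SEOL": (r"$", 0),             # Perl /$/: end, or before a final newline
    "MEOL": (r"$", re.MULTILINE),  # Perl /$/m: end of each line
    "EOS":  (r"\Z", 0),            # Perl /\z/: very end of string only
}

results = {name: positions(p, text, f) for name, (p, f) in ANCHORS.items()}

for name, offsets in results.items():
    print(f"{name:4} {offsets}")

# SBOL [0]
# MBOL [0, 4, 8]   (Python-specific: perlre documents that Perl's /^/m does
#                   not match after a newline that ends the string, so the
#                   Perl result would omit offset 8)
# SEOL [7, 8]
# MEOL [3, 7, 8]
# EOS  [8]
```

The SEOL and EOS rows show why EOS is a separate regop: `$` also matches just before a trailing newline, while the end-of-string anchor does not.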
Diffstat (limited to 'regcomp.sym')
-rw-r--r-- regcomp.sym 34
1 files changed, 19 insertions, 15 deletions
diff --git a/regcomp.sym b/regcomp.sym
index bea2a8e716..b285647086 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -24,15 +24,19 @@
END END, no ; End of program.
SUCCEED END, no ; Return from a subroutine, basically.
-#* Anchors:
-
-BOL BOL, no ; Match "" at beginning of line.
-MBOL BOL, no ; Same, assuming multiline.
-SBOL BOL, no ; Same, assuming singleline.
-EOS EOL, no ; Match "" at end of string.
-EOL EOL, no ; Match "" at end of line.
-MEOL EOL, no ; Same, assuming multiline.
-SEOL EOL, no ; Same, assuming singleline.
+#* Line Start Anchors:
+SBOL BOL, no ; Match "" at beginning of line: /^/, /\A/
+MBOL BOL, no ; Same, assuming multiline: /^/m
+
+#* Line End Anchors:
+SEOL EOL, no ; Match "" at end of line: /$/
+MEOL EOL, no ; Same, assuming multiline: /$/m
+EOS EOL, no ; Match "" at end of string: /\z/
+
+#* Match Start Anchors:
+GPOS GPOS, no ; Matches where last m//g left off.
+
+#* Word Boundary Opcodes:
# The regops that have varieties that vary depending on the character set regex
# modifiers have to ordered thusly: /d, /l, /u, /a, /aa. This is because code
# in regcomp.c uses the enum value of the modifier as an offset from the /d
@@ -47,15 +51,14 @@ NBOUND NBOUND, no ; Match "" at any word non-boundary using nati
NBOUNDL NBOUND, no ; Match "" at any locale word non-boundary
NBOUNDU NBOUND, no ; Match "" at any word non-boundary using Unicode rules
NBOUNDA NBOUND, no ; Match "" at any word non-boundary using ASCII rules
-GPOS GPOS, no ; Matches where last m//g left off.
#* [Special] alternatives:
-
REG_ANY REG_ANY, no 0 S ; Match any one character (except newline).
SANY REG_ANY, no 0 S ; Match any one character.
CANY REG_ANY, no 0 S ; Match any one byte.
ANYOF ANYOF, sv 0 S ; Match character in (or not in) this class, single char match only
+#* POSIX Character Classes:
# Order of the below is important. See ordering comment above.
POSIXD POSIXD, none 0 S ; Some [[:class:]] under /d; the FLAGS field gives which one
POSIXL POSIXD, none 0 S ; Some [[:class:]] under /l; the FLAGS field gives which one
@@ -147,16 +150,17 @@ NREFFL REF, no-sv 1 V ; Match already matched string, folded in loc.
NREFFU REF, num 1 V ; Match already matched string, folded using unicode rules for non-utf8
NREFFA REF, num 1 V ; Match already matched string, folded using unicode rules for non-utf8, no mixing ASCII, non-ASCII
+#*Support for long RE
+LONGJMP LONGJMP, off 1 . 1 ; Jump far away.
+BRANCHJ BRANCHJ, off 1 V 1 ; BRANCH with long offset.
+
+#*Special Case Regops
IFMATCH BRANCHJ, off 1 . 2 ; Succeeds if the following matches.
UNLESSM BRANCHJ, off 1 . 2 ; Fails if the following matches.
SUSPEND BRANCHJ, off 1 V 1 ; "Independent" sub-RE.
IFTHEN BRANCHJ, off 1 V 1 ; Switch, should be preceded by switcher.
GROUPP GROUPP, num 1 ; Whether the group matched.
-#*Support for long RE
-
-LONGJMP LONGJMP, off 1 . 1 ; Jump far away.
-BRANCHJ BRANCHJ, off 1 V 1 ; BRANCH with long offset.
#*The heavy worker