Eliminate the duplicative regops BOL and EOL

See also perl5porters thread titled: "Perl MBOLism in regex engine".

In the perl 5.000 release (a0d0e21ea6ea90a22318550944fe6cb09ae10cda) the BOL regop was split into two behaviours MBOL and SBOL, with SBOL and BOL behaving identically. Similarly the EOL regop was split into two behaviors SEOL and MEOL, with EOL and SEOL behaving identically.

This then resulted in various duplicative code related to flags and case statements in various parts of the regex engine.

It appears that perhaps BOL and EOL were kept because they are the type ("regkind") for SBOL/MBOL and SEOL/MEOL/EOS. Reworking regcomp.pl to handle aliases for the type data so that SBOL/MBOL are of type BOL, even though BOL == SBOL, seems to cover that case without adding to the confusion.

This means two regops, a regstate, and an internal regex flag can be removed (and used for other things), and various logic relating to them can be removed.

For the uninitiated, SBOL is /^/ and /\A/ (with or without /m) and MBOL is /^/m. (I consider it a fail we have no way to say MBOL without the /m modifier). Similarly SEOL is /$/ and MEOL is /$/m (there is also a /\z/ which is EOS "end of string" with or without the /m).
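The anchor semantics described above can be illustrated with a small sketch. It uses Python's re module as a stand-in, not the Perl regex engine, so the regop names appear only as labels; the helper positions() is hypothetical. Two assumptions to note: Python's \Z plays the role of Perl's /\z/ here, and the line-start input deliberately has no trailing newline, because Python and Perl differ on whether a multiline ^ matches after a final newline.

```python
import re

def positions(pattern, text, flags=0):
    """Return the offsets at which a zero-width anchor matches in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

# Line-start anchors (input has no trailing newline, see note above).
START_TEXT = "a\nb"
sbol = positions(r"^", START_TEXT)                     # like SBOL: /^/
sbol_a = positions(r"\A", START_TEXT, re.MULTILINE)    # like SBOL: /\A/, unaffected by /m
mbol = positions(r"^", START_TEXT, re.MULTILINE)       # like MBOL: /^/m

# Line-end anchors.
END_TEXT = "a\nb\n"
seol = positions(r"$", END_TEXT)                       # like SEOL: /$/
meol = positions(r"$", END_TEXT, re.MULTILINE)         # like MEOL: /$/m
eos = positions(r"\Z", END_TEXT, re.MULTILINE)         # like EOS: /\z/ (Python spells it \Z)

print("SBOL", sbol, sbol_a)   # [0] [0]
print("MBOL", mbol)           # [0, 2]
print("SEOL", seol)           # [3, 4]
print("MEOL", meol)           # [1, 3, 4]
print("EOS ", eos)            # [4]
```

The single-line anchors match only at the string boundaries (with /$/ also allowed just before a final newline), the multiline variants additionally match around every embedded newline, and the end-of-string anchor ignores the multiline flag entirely.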
author: Yves Orton <demerphq@gmail.com> 2014-09-17 00:23:01 +0200
committer: Yves Orton <demerphq@gmail.com> 2014-09-17 04:47:34 +0200
commit: d3d47aac53402ea3d4836c60e3659dc927a9887c (patch)
tree: 90ce6aa3a324b6c2cf4c6b17b2b69f91c4239a7c /regcomp.sym
parent: e1919fe5716d96be44afc32406d9504bc70403de (diff)
download: perl-d3d47aac53402ea3d4836c60e3659dc927a9887c.tar.gz
1 files changed, 19 insertions, 15 deletions
diff --git a/regcomp.sym b/regcomp.sym
index bea2a8e716..b285647086 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -24,15 +24,19 @@
 END         END,        no        ; End of program.
 SUCCEED     END,        no        ; Return from a subroutine, basically.
 
-#* Anchors:
-
-BOL         BOL,        no        ; Match "" at beginning of line.
-MBOL        BOL,        no        ; Same, assuming multiline.
-SBOL        BOL,        no        ; Same, assuming singleline.
-EOS         EOL,        no        ; Match "" at end of string.
-EOL         EOL,        no        ; Match "" at end of line.
-MEOL        EOL,        no        ; Same, assuming multiline.
-SEOL        EOL,        no        ; Same, assuming singleline.
+#* Line Start Anchors:
+SBOL        BOL,        no        ; Match "" at beginning of line: /^/, /\A/
+MBOL        BOL,        no        ; Same, assuming multiline: /^/m
+
+#* Line End Anchors:
+SEOL        EOL,        no        ; Match "" at end of line: /$/
+MEOL        EOL,        no        ; Same, assuming multiline: /$/m
+EOS         EOL,        no        ; Match "" at end of string: /\z/
+
+#* Match Start Anchors:
+GPOS        GPOS,       no        ; Matches where last m//g left off.
+
+#* Word Boundary Opcodes:
 # The regops that have varieties that vary depending on the character set regex
 # modifiers have to ordered thusly: /d, /l, /u, /a, /aa.  This is because code
 # in regcomp.c uses the enum value of the modifier as an offset from the /d
@@ -47,15 +51,14 @@ NBOUND      NBOUND,     no        ; Match "" at any word non-boundary using nati
 NBOUNDL     NBOUND,     no        ; Match "" at any locale word non-boundary
 NBOUNDU     NBOUND,     no        ; Match "" at any word non-boundary using Unicode rules
 NBOUNDA     NBOUND,     no        ; Match "" at any word non-boundary using ASCII rules
-GPOS        GPOS,       no        ; Matches where last m//g left off.
 
 #* [Special] alternatives:
-
 REG_ANY     REG_ANY,    no 0 S    ; Match any one character (except newline).
 SANY        REG_ANY,    no 0 S    ; Match any one character.
 CANY        REG_ANY,    no 0 S    ; Match any one byte.
 ANYOF       ANYOF,      sv 0 S    ; Match character in (or not in) this class, single char match only
 
+#* POSIX Character Classes:
 # Order of the below is important.  See ordering comment above.
 POSIXD      POSIXD,     none 0 S   ; Some [[:class:]] under /d; the FLAGS field gives which one
 POSIXL      POSIXD,     none 0 S   ; Some [[:class:]] under /l; the FLAGS field gives which one
@@ -147,16 +150,17 @@ NREFFL      REF,        no-sv 1 V ; Match already matched string, folded in loc.
 NREFFU      REF,        num   1 V ; Match already matched string, folded using unicode rules for non-utf8
 NREFFA      REF,        num   1 V ; Match already matched string, folded using unicode rules for non-utf8, no mixing ASCII, non-ASCII
 
+#*Support for long RE
+LONGJMP     LONGJMP,    off 1 . 1 ; Jump far away.
+BRANCHJ     BRANCHJ,    off 1 V 1 ; BRANCH with long offset.
+
+#*Special Case Regops
 IFMATCH     BRANCHJ,    off 1 . 2 ; Succeeds if the following matches.
 UNLESSM     BRANCHJ,    off 1 . 2 ; Fails if the following matches.
 SUSPEND     BRANCHJ,    off 1 V 1 ; "Independent" sub-RE.
 IFTHEN      BRANCHJ,    off 1 V 1 ; Switch, should be preceded by switcher.
 GROUPP      GROUPP,     num 1     ; Whether the group matched.
 
-#*Support for long RE
-
-LONGJMP     LONGJMP,    off 1 . 1 ; Jump far away.
-BRANCHJ     BRANCHJ,    off 1 V 1 ; BRANCH with long offset.
 
 #*The heavy worker