author | Karl Williamson <public@khwilliamson.com> | 2011-03-24 15:37:00 -0600 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2011-03-24 15:41:57 -0600 |
commit | 5da6b59a91ad9679bcad4cba7615c78cd63d09f8 (patch) | |
tree | 7cf868d8f7b88f53ede23fbf7982040fd05e86ba /pod/perldebguts.pod | |
parent | efdea7e225ab2d9822f0b179482b0ea44630e51d (diff) | |
perldebguts: Update regnodes to 5.14

This hadn't been updated for quite some time. It just takes what is
in regcomp.sym, and removes some columns, and some reformatting.

Diffstat (limited to 'pod/perldebguts.pod')
-rw-r--r-- | pod/perldebguts.pod | 339 |
1 files changed, 230 insertions, 109 deletions
diff --git a/pod/perldebguts.pod b/pod/perldebguts.pod
index e5970a3e54..9bc0b63de4 100644
--- a/pod/perldebguts.pod
+++ b/pod/perldebguts.pod
@@ -536,115 +536,236 @@ C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)
 
 Here are the possible types, with short descriptions:
 
- # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
-
- # Exit points
- END       no       End of program.
- SUCCEED   no       Return from a subroutine, basically.
-
- # Anchors:
- BOL       no       Match "" at beginning of line.
- MBOL      no       Same, assuming multiline.
- SBOL      no       Same, assuming singleline.
- EOS       no       Match "" at end of string.
- EOL       no       Match "" at end of line.
- MEOL      no       Same, assuming multiline.
- SEOL      no       Same, assuming singleline.
- BOUND     no       Match "" at any word boundary
- BOUNDL    no       Match "" at any word boundary
- NBOUND    no       Match "" at any word non-boundary
- NBOUNDL   no       Match "" at any word non-boundary
- GPOS      no       Matches where last m//g left off.
-
- # [Special] alternatives
- ANY       no       Match any one character (except newline).
- SANY      no       Match any one character.
- ANYOF     sv       Match character in (or not in) this class.
- ALNUM     no       Match any alphanumeric character
- ALNUML    no       Match any alphanumeric char in locale
- NALNUM    no       Match any non-alphanumeric character
- NALNUML   no       Match any non-alphanumeric char in locale
- SPACE     no       Match any whitespace character
- SPACEL    no       Match any whitespace char in locale
- NSPACE    no       Match any non-whitespace character
- NSPACEL   no       Match any non-whitespace char in locale
- DIGIT     no       Match any numeric character
- NDIGIT    no       Match any non-numeric character
-
- # BRANCH  The set of branches constituting a single choice are hooked
- #         together with their "next" pointers, since precedence prevents
- #         anything being concatenated to any individual branch. The
- #         "next" pointer of the last BRANCH in a choice points to the
- #         thing following the whole choice. This is also where the
- #         final "next" pointer of each individual branch points; each
- #         branch starts with the operand node of a BRANCH node.
- #
- BRANCH    node     Match this alternative, or the next...
-
- # BACK    Normal "next" pointers all implicitly point forward; BACK
- #         exists to make loop structures possible.
- # not used
- BACK      no       Match "", "next" ptr points backward.
-
- # Literals
- EXACT     sv       Match this string (preceded by length).
- EXACTF    sv       Match this string, folded (prec. by length).
- EXACTFL   sv       Match this string, folded in locale (w/len).
-
- # Do nothing
- NOTHING   no       Match empty string.
- # A variant of above which delimits a group, thus stops optimizations
- TAIL      no       Match empty string. Can jump here from outside.
-
- # STAR,PLUS '?', and complex '*' and '+', are implemented as circular
- #         BRANCH structures using BACK. Simple cases (one character
- #         per match) are implemented with STAR and PLUS for speed
- #         and to minimize recursive plunges.
- #
- STAR      node     Match this (simple) thing 0 or more times.
- PLUS      node     Match this (simple) thing 1 or more times.
-
- CURLY     sv 2     Match this simple thing {n,m} times.
- CURLYN    no 2     Match next-after-this simple thing
- #                  {n,m} times, set parens.
- CURLYM    no 2     Match this medium-complex thing {n,m} times.
- CURLYX    sv 2     Match this complex thing {n,m} times.
-
- # This terminator creates a loop structure for CURLYX
- WHILEM    no       Do curly processing and see if rest matches.
-
- # OPEN,CLOSE,GROUPP ...are numbered at compile time.
- OPEN      num 1    Mark this point in input as start of #n.
- CLOSE     num 1    Analogous to OPEN.
-
- REF       num 1    Match some already matched string
- REFF      num 1    Match already matched string, folded
- REFFL     num 1    Match already matched string, folded in loc.
-
- # grouping assertions
- IFMATCH   off 1 2  Succeeds if the following matches.
- UNLESSM   off 1 2  Fails if the following matches.
- SUSPEND   off 1 1  "Independent" sub-regex.
- IFTHEN    off 1 1  Switch, should be preceded by switcher.
- GROUPP    num 1    Whether the group matched.
-
- # Support for long regex
- LONGJMP   off 1 1  Jump far away.
- BRANCHJ   off 1 1  BRANCH with long offset.
-
- # The heavy worker
- EVAL      evl 1    Execute some Perl code.
-
- # Modifiers
- MINMOD    no       Next operator is not greedy.
- LOGICAL   no       Next opcode should set the flag only.
-
- # This is not used yet
- RENUM     off 1 1  Group with independently numbered parens.
-
- # This is not really a node, but an optimized-away piece of a "long" node.
- # To simplify debugging output, we mark it as if it were a node
- OPTIMIZED off      Placeholder for dump.
+ # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
+
+ # Exit points
+ END          no          End of program.
+ SUCCEED      no          Return from a subroutine, basically.
+
+ # Anchors:
+
+ BOL          no          Match "" at beginning of line.
+ MBOL         no          Same, assuming multiline.
+ SBOL         no          Same, assuming singleline.
+ EOS          no          Match "" at end of string.
+ EOL          no          Match "" at end of line.
+ MEOL         no          Same, assuming multiline.
+ SEOL         no          Same, assuming singleline.
+ BOUND        no          Match "" at any word boundary using native charset
+                          semantics for non-utf8
+ BOUNDL       no          Match "" at any locale word boundary
+ BOUNDU       no          Match "" at any word boundary using Unicode semantics
+ BOUNDA       no          Match "" at any word boundary using ASCII semantics
+ NBOUND       no          Match "" at any word non-boundary using native charset
+                          semantics for non-utf8
+ NBOUNDL      no          Match "" at any locale word non-boundary
+ NBOUNDU      no          Match "" at any word non-boundary using Unicode semantics
+ NBOUNDA      no          Match "" at any word non-boundary using ASCII semantics
+ GPOS         no          Matches where last m//g left off.
+
+ # [Special] alternatives:
+
+ REG_ANY      no          Match any one character (except newline).
+ SANY         no          Match any one character.
+ CANY         no          Match any one byte.
+ ANYOF        sv          Match character in (or not in) this class, single char
+                          match only
+ ANYOFV       sv          Match character in (or not in) this class, can
+                          match-multiple chars
+ ALNUM        no          Match any alphanumeric character using native charset
+                          semantics for non-utf8
+ ALNUML       no          Match any alphanumeric char in locale
+ ALNUMU       no          Match any alphanumeric char using Unicode semantics
+ ALNUMA       no          Match [A-Za-z_0-9]
+ NALNUM       no          Match any non-alphanumeric character using native charset
+                          semantics for non-utf8
+ NALNUML      no          Match any non-alphanumeric char in locale
+ NALNUMU      no          Match any non-alphanumeric char using Unicode semantics
+ NALNUMA      no          Match [^A-Za-z_0-9]
+ SPACE        no          Match any whitespace character using native charset
+                          semantics for non-utf8
+ SPACEL       no          Match any whitespace char in locale
+ SPACEU       no          Match any whitespace char using Unicode semantics
+ SPACEA       no          Match [ \t\n\f\r]
+ NSPACE       no          Match any non-whitespace character using native charset
+                          semantics for non-utf8
+ NSPACEL      no          Match any non-whitespace char in locale
+ NSPACEU      no          Match any non-whitespace char using Unicode semantics
+ NSPACEA      no          Match [^ \t\n\f\r]
+ DIGIT        no          Match any numeric character using native charset semantics
+                          for non-utf8
+ DIGITL       no          Match any numeric character in locale
+ DIGITA       no          Match [0-9]
+ NDIGIT       no          Match any non-numeric character using native charset
+ i                        semantics for non-utf8
+ NDIGITL      no          Match any non-numeric character in locale
+ NDIGITA      no          Match [^0-9]
+ CLUMP        no          Match any extended grapheme cluster sequence
+
+ # Alternation
+
+ # BRANCH     The set of branches constituting a single choice are hooked
+ #            together with their "next" pointers, since precedence prevents
+ #            anything being concatenated to any individual branch. The
+ #            "next" pointer of the last BRANCH in a choice points to the
+ #            thing following the whole choice. This is also where the
+ #            final "next" pointer of each individual branch points; each
+ #            branch starts with the operand node of a BRANCH node.
+ #
+ BRANCH       node        Match this alternative, or the next...
+
+ # Back pointer
+
+ # BACK       Normal "next" pointers all implicitly point forward; BACK
+ #            exists to make loop structures possible.
+ # not used
+ BACK         no          Match "", "next" ptr points backward.
+
+ # Literals
+
+ EXACT        str         Match this string (preceded by length).
+ EXACTF       str         Match this string, folded, native charset semantics for
+                          non-utf8 (prec. by length).
+ EXACTFL      str         Match this string, folded in locale (w/len).
+ EXACTFU      str         Match this string, folded, Unicode semantics for non-utf8
+                          (prec. by length).
+ EXACTFA      str         Match this string, folded, Unicode semantics for non-utf8,
+                          but no ASCII-range character matches outside ASCII (prec.
+                          by length),.
+
+ # Do nothing types
+
+ NOTHING      no          Match empty string.
+ # A variant of above which delimits a group, thus stops optimizations
+ TAIL         no          Match empty string. Can jump here from outside.
+
+ # Loops
+
+ # STAR,PLUS  '?', and complex '*' and '+', are implemented as circular
+ #            BRANCH structures using BACK. Simple cases (one character
+ #            per match) are implemented with STAR and PLUS for speed
+ #            and to minimize recursive plunges.
+ #
+ STAR         node        Match this (simple) thing 0 or more times.
+ PLUS         node        Match this (simple) thing 1 or more times.
+
+ CURLY        sv 2        Match this simple thing {n,m} times.
+ CURLYN       no 2        Capture next-after-this simple thing
+ CURLYM       no 2        Capture this medium-complex thing {n,m} times.
+ CURLYX       sv 2        Match this complex thing {n,m} times.
+
+ # This terminator creates a loop structure for CURLYX
+ WHILEM       no          Do curly processing and see if rest matches.
+
+ # Buffer related
+
+ # OPEN,CLOSE,GROUPP ...are numbered at compile time.
+ OPEN         num 1       Mark this point in input as start of #n.
+ CLOSE        num 1       Analogous to OPEN.
+
+ REF          num 1       Match some already matched string
+ REFF         num 1       Match already matched string, folded using native charset
+                          semantics for non-utf8
+ REFFL        num 1       Match already matched string, folded in loc.
+ REFFU        num 1       Match already matched string, folded using unicode
+                          semantics for non-utf8
+ REFFA        num 1       Match already matched string, folded using unicode
+                          semantics for non-utf8, no mixing ASCII, non-ASCII
+
+ # Named references. Code in regcomp.c assumes that these all are after the
+ # numbered references
+ NREF         no-sv 1     Match some already matched string
+ NREFF        no-sv 1     Match already matched string, folded using native charset
+                          semantics for non-utf8
+ NREFFL       no-sv 1     Match already matched string, folded in loc.
+ NREFFU       num 1       Match already matched string, folded using unicode
+                          semantics for non-utf8
+ NREFFA       num 1       Match already matched string, folded using unicode
+                          semantics for non-utf8, no mixing ASCII, non-ASCII
+
+ IFMATCH      off 1 2     Succeeds if the following matches.
+ UNLESSM      off 1 2     Fails if the following matches.
+ SUSPEND      off 1 1     "Independent" sub-RE.
+ IFTHEN       off 1 1     Switch, should be preceded by switcher.
+ GROUPP       num 1       Whether the group matched.
+
+ # Support for long RE
+
+ LONGJMP      off 1 1     Jump far away.
+ BRANCHJ      off 1 1     BRANCH with long offset.
+
+ # The heavy worker
+
+ EVAL         evl 1       Execute some Perl code.
+
+ # Modifiers
+
+ MINMOD       no          Next operator is not greedy.
+ LOGICAL      no          Next opcode should set the flag only.
+
+ # This is not used yet
+ RENUM        off 1 1     Group with independently numbered parens.
+
+ # Trie Related
+
+ # Behave the same as A|LIST|OF|WORDS would. The '..C' variants have
+ # inline charclass data (ascii only), the 'C' store it in the structure.
+ # NOTE: the relative order of the TRIE-like regops is significant
+
+ TRIE         trie 1      Match many EXACT(F[ALU]?)? at once. flags==type
+ TRIEC        charclass   Same as TRIE, but with embedded charclass data
+
+ # For start classes, contains an added fail table.
+ AHOCORASICK  trie 1      Aho Corasick stclass. flags==type
+ AHOCORASICKC charclass   Same as AHOCORASICK, but with embedded charclass data
+
+ # Regex Subroutines
+ GOSUB        num/ofs 2L  recurse to paren arg1 at (signed) ofs arg2
+ GOSTART      no          recurse to start of pattern
+
+ # Special conditionals
+ NGROUPP      no-sv 1     Whether the group matched.
+ INSUBP       num 1       Whether we are in a specific recurse.
+ DEFINEP      none 1      Never execute directly.
+
+ # Backtracking Verbs
+ ENDLIKE      none        Used only for the type field of verbs
+ OPFAIL       none        Same as (?!)
+ ACCEPT       parno 1     Accepts the current matched string.
+
+
+ # Verbs With Arguments
+ VERB         no-sv 1     Used only for the type field of verbs
+ PRUNE        no-sv 1     Pattern fails at this startpoint if no-backtracking through this
+ MARKPOINT    no-sv 1     Push the current location for rollback by cut.
+ SKIP         no-sv 1     On failure skip forward (to the mark) before retrying
+ COMMIT       no-sv 1     Pattern fails outright if backtracking through this
+ CUTGROUP     no-sv 1     On failure go to the next alternation in the group
+
+ # Control what to keep in $&.
+ KEEPS        no          $& begins here.
+
+ # New charclass like patterns
+ LNBREAK      none        generic newline pattern
+ VERTWS       none        vertical whitespace (Perl 6)
+ NVERTWS      none        not vertical whitespace (Perl 6)
+ HORIZWS      none        horizontal whitespace (Perl 6)
+ NHORIZWS     none        not horizontal whitespace (Perl 6)
+
+ FOLDCHAR     codepoint 1 codepoint with tricky case folding properties.
+
+ # SPECIAL REGOPS
+
+ # This is not really a node, but an optimized away piece of a "long" node.
+ # To simplify debugging output, we mark it as if it were a node
+ OPTIMIZED    off         Placeholder for dump.
+
+ # Special opcode with the property that no opcode in a compiled program
+ # will ever be of this type. Thus it can be used as a flag value that
+ # no other opcode has been seen. END is used similarly, in that an END
+ # node cant be optimized. So END implies "unoptimizable" and PSEUDO mean
+ # "not seen anything to optimize yet".
+ PSEUDO       off         Pseudo opcode for internal use.
 
 =for unprinted-credits
 Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421
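
Reader's note, not part of the commit: the regnode table in this hunk follows the column layout named in its first line (`TYPE arg-description [num-args] [longjump-len] DESCRIPTION`). The Python sketch below shows one way those columns could be split apart when reading the table. The helper name `parse_regnode_row` and the regular expression are assumptions made for illustration; nothing like them exists in the perl source tree. Wrapped continuation lines are skipped rather than merged into the previous description.

```python
import re

# Hypothetical helper for illustration only; not part of perl or this commit.
# Columns: TYPE arg-description [num-args] [longjump-len] DESCRIPTION
ROW = re.compile(
    r"^\s*(?P<type>[A-Z_]+)\s+"          # regnode name, e.g. IFMATCH
    r"(?P<arg>\S+)"                      # arg-description, e.g. off, no-sv
    r"(?:\s+(?P<num_args>\d+L?))?"       # optional num-args, e.g. 1 or 2L
    r"(?:\s+(?P<longjump_len>\d+))?"     # optional longjump-len
    r"\s+(?P<description>\S.*)$"         # free-text description
)


def parse_regnode_row(line):
    """Return a dict of columns, or None for comments, blank lines and
    wrapped continuation lines (which start with lowercase text)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = ROW.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for sample in (
        "IFMATCH off 1 2 Succeeds if the following matches.",
        "GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs arg2",
        "# Anchors:",
    ):
        print(parse_regnode_row(sample))
```

The optional columns are told apart from the description only by being numeric, which holds for the rows in this table because no description begins with a digit.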