author | Yves Orton <demerphq@gmail.com> | 2006-11-13 00:29:41 +0100 |
---|---|---|
committer | Steve Peters <steve@fisharerojo.org> | 2006-11-13 02:19:12 +0000 |
commit | de8c53012b7e614137ab875e0d58a92474b317ce (patch) | |
tree | cc24fc09cc1af2e140a8d29a1bcd652cba6c4b00 /pod/perlreguts.pod | |
parent | 7834bb7eff465724a885b368420973bce2d27483 (diff) | |
download | perl-de8c53012b7e614137ab875e0d58a92474b317ce.tar.gz |
Regex Utility Functions and Substitution Fix (XML::Twig core dump)
Message-ID: <9b18b3110611121429g1fc9d6c1t4007dc711f9e8396@mail.gmail.com>
Plus a couple of tweaks to ext/re/re.pm and t/op/pat.t to get those patches
to apply cleanly.
p4raw-id: //depot/perl@29252
Diffstat (limited to 'pod/perlreguts.pod')
-rw-r--r-- | pod/perlreguts.pod | 30 |
1 files changed, 23 insertions, 7 deletions
diff --git a/pod/perlreguts.pod b/pod/perlreguts.pod
index 4ee2be172f..937565745c 100644
--- a/pod/perlreguts.pod
+++ b/pod/perlreguts.pod
@@ -759,7 +759,8 @@ F<regexp.h> contains the base structure definition:
         U32 *offsets; /* offset annotations 20001228 MJD */
         I32 sublen; /* Length of string pointed by subbeg */
         I32 refcnt;
-        I32 minlen; /* mininum possible length of $& */
+        I32 minlen; /* mininum length of string to match */
+        I32 minlenret; /* mininum possible length of $& */
         I32 prelen; /* length of precomp */
         U32 nparens; /* number of parentheses */
         U32 lastparen; /* last paren matched */
@@ -838,13 +839,28 @@ that handles this is called C<find_by_class()>. Sometimes this field
 points at a regop embedded in the program, and sometimes it points at
 an independent synthetic regop that has been constructed by the optimiser.
 
-=item C<minlen>
+=item C<minlen> C<minlenret>
 
-The minimum possible length of the final matching string. This is used
-to prune the search space by not bothering to match any closer to the
-end of a string than would allow a match. For instance there is no point
-in even starting the regex engine if the minlen is 10 but the string
-is only 5 characters long. There is no way that the pattern can match.
+C<minlen> is the minimum string length required for the pattern to match.
+This is used to prune the search space by not bothering to match any
+closer to the end of a string than would allow a match. For instance
+there is no point in even starting the regex engine if the minlen is
+10 but the string is only 5 characters long. There is no way that the
+pattern can match.
+
+C<minlenret> is the minimum length of the string that would be found
+in $& after a match.
+
+The difference between C<minlen> and C<minlenret> can be seen in the
+following pattern:
+
+    /ns(?=\d)/
+
+where the C<minlen> would be 3 but the minlen ret would only be 2 as
+the \d is required to match but is not actually included in the matched
+content. This distinction is particularly important as the substitution
+logic uses the C<minlenret> to tell whether it can do in-place substition
+which can result in considerable speedup.
 
 =item C<reganch>
 
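The two fields documented by this patch answer different questions, which a small C sketch may make easier to see. It is an illustration only, not code from the Perl source: the `rx_lengths` struct and the functions `could_match` and `may_subst_in_place` are hypothetical names, and the in-place rule shown (the replacement is no longer than the shortest possible `$&`) is an assumption about how the substitution logic uses `minlenret`. The values for `/ns(?=\d)/` are the ones given in the documentation text above.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the two fields this patch documents.
 * The real struct regexp in regexp.h has many more members. */
typedef struct {
    int minlen;     /* minimum length of string needed for a match */
    int minlenret;  /* minimum possible length of $& after a match */
} rx_lengths;

/* Values the documentation gives for /ns(?=\d)/: the digit must be
 * present (minlen 3) but is not part of the matched text (minlenret 2). */
const rx_lengths ns_digit = { 3, 2 };

/* Pruning check: a string shorter than minlen cannot match, so the
 * regex engine need not be started at all. */
int could_match(const rx_lengths *rx, size_t len)
{
    return len >= (size_t)rx->minlen;
}

/* Assumed in-place rule: a replacement that is never longer than the
 * shortest possible $& always fits in the space the match occupied. */
int may_subst_in_place(const rx_lengths *rx, size_t repl_len)
{
    return repl_len <= (size_t)rx->minlenret;
}
```

Under these assumptions, a two-character target such as "ns" is rejected before matching starts, while a two-character replacement still qualifies for the in-place path. Using `minlen` (3) for that second test would wrongly admit a three-character replacement that can be longer than the matched "ns".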