author | David Mitchell <davem@iabyn.com> | 2013-05-18 15:05:57 +0100 |
---|---|---|
committer | David Mitchell <davem@iabyn.com> | 2013-06-02 22:28:50 +0100 |
commit | 52a21eb36148cc4f249f436a989e2cfe5c6bab1f (patch) | |
tree | e85cc834c1b706d4f5245c6aef489970a0e6621a /pod/perlreapi.pod | |
parent | f9176b44e50593d8f3446da63d3989558f6d4c20 (diff) | |
add strbeg argument to Perl_re_intuit_start()
(note that this is a change both to the perl API and the regex engine
plugin API).
Currently, Perl_re_intuit_start() is passed an SV, plus pointers to:
where in the string to start matching (strpos); and to the end of the
string (strend).
Unlike Perl_regexec_flags(), it doesn't also have a strbeg arg.
Because of this, it guesses strbeg: based on the passed SV (if it's
SvPOK()); or just set to strpos otherwise. This latter can happen if, for
example, the SV is overloaded. Note also that this latter guess is wrong,
and could in theory make /\b.../ fail.
But just to confuse matters, although Perl_re_intuit_start() itself uses
its guesstimate strbeg var, some of the functions it calls use the global
value of PL_bostr instead. To make this work, the *callers* of
Perl_re_intuit_start() currently set PL_bostr first. This is why \b
doesn't actually break.
The fix to this unholy mess is to simply add a strbeg arg to
Perl_re_intuit_start(). It's also the first step to eliminating PL_bostr
altogether.
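
To illustrate why guessing strbeg as strpos could in theory break /\b.../, here is a minimal standalone sketch in C. It does not use perl's headers or internals: is_word_char() and at_word_boundary() are hypothetical helpers written only for this illustration, and they handle plain ASCII word characters only.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of perl): nonzero if c is an ASCII
 * word character in the \w sense (letter, digit or underscore). */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: nonzero if there is a \b-style boundary at pos,
 * where the string occupies [strbeg, strend).  The character before pos
 * is only inspected when pos lies after strbeg, so the answer depends
 * on strbeg being the real start of the string. */
static int at_word_boundary(const char *strbeg, const char *pos,
                            const char *strend)
{
    int before = (pos > strbeg) && is_word_char(pos[-1]);
    int after  = (pos < strend) && is_word_char(pos[0]);
    return before != after;
}
```

With the string "foobar" and strpos pointing at the "b", passing the real strbeg reports no boundary, because the preceding "o" is visible. Passing strpos as strbeg hides that character and reports a boundary that is not there.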
Diffstat (limited to 'pod/perlreapi.pod')
-rw-r--r-- | pod/perlreapi.pod | 27 |
1 files changed, 24 insertions, 3 deletions
diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod
index 3d0962ac8d..c4e30cb4a8 100644
--- a/pod/perlreapi.pod
+++ b/pod/perlreapi.pod
@@ -21,6 +21,7 @@ following format:
                                      void* data, U32 flags);
         char*   (*intuit) (pTHX_
                             REGEXP * const rx, SV *sv,
+                            const char * const strbeg,
                             char *strpos, char *strend, U32 flags,
                             struct re_scream_pos_data_s *data);
         SV*     (*checkstr) (pTHX_ REGEXP * const rx);
@@ -286,9 +287,14 @@ Optimisation flags; subject to change.
 
 =head2 intuit
 
-    char* intuit(pTHX_ REGEXP * const rx,
-                  SV *sv, char *strpos, char *strend,
-                  const U32 flags, struct re_scream_pos_data_s *data);
+    char* intuit(pTHX_
+                REGEXP * const rx,
+                SV *sv,
+                const char * const strbeg,
+                char *strpos,
+                char *strend,
+                const U32 flags,
+                struct re_scream_pos_data_s *data);
 
 Find the start position where a regex match should be attempted,
 or possibly if the regex engine should not be run because the
@@ -296,6 +302,21 @@ pattern can't match. This is called, as appropriate, by the core,
 depending on the values of the C<extflags> member of the C<regexp>
 structure.
 
+Arguments:
+
+    rx:     the regex to match against
+    sv:     the SV being matched: only used for utf8 flag; the string
+            itself is accessed via the pointers below. Note that on
+            something like an overloaded SV, SvPOK(sv) may be false
+            and the string pointers may point to something unrelated to
+            the SV itself.
+    strbeg: real beginning of string
+    strpos: the point in the string at which to begin matching
+    strend: pointer to the byte following the last char of the string
+    flags   currently unused; set to 0
+    data:   currently unused; set to NULL
+
+
 =head2 checkstr
 
     SV* checkstr(pTHX_ REGEXP * const rx);
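
For regex engine plugin authors, the practical effect is one extra parameter on the intuit callback. The following is a rough standalone sketch of a callback with the new parameter order, not code from perl: the typedefs are stand-ins so it compiles without perl.h, pTHX_ is omitted, my_intuit() and the stub's minlen field are hypothetical, and returning NULL to mean "cannot match" is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in types so the sketch compiles without perl.h; in a real
 * plugin these come from the perl headers. */
typedef struct { size_t minlen; } REGEXP;  /* hypothetical stub */
typedef struct sv_stub SV;                 /* opaque stub */
typedef uint32_t U32;
struct re_scream_pos_data_s;               /* unused; callers pass NULL */

/* Hypothetical callback: accept strpos as the start position if enough
 * bytes remain for the shortest possible match, otherwise return NULL. */
static char *my_intuit(REGEXP * const rx, SV *sv,
                       const char * const strbeg,
                       char *strpos, char *strend,
                       const U32 flags,
                       struct re_scream_pos_data_s *data)
{
    (void)sv; (void)flags; (void)data;

    /* strbeg is now supplied by the caller, so the callback can check
     * strpos against the real buffer instead of guessing its start. */
    if (strpos < strbeg || strpos > strend)
        return NULL;
    if ((size_t)(strend - strpos) < rx->minlen)
        return NULL;
    return strpos;
}
```

The point of the sketch is only the position of strbeg between sv and strpos, matching the updated prototype in the diff; a real implementation would take its types and return conventions from perl itself.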