| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cause of this is the emulation of recursion in executing regex
patterns and that the script run feature did not cooperate with it. The
result is that the input pointers got pushed (but not the script run)
and popped, so that the script run was pointing to something that had
been tried, failed and otherwise popped.
The solution I've adopted is to always push the current script run start
position whenever a push is done; and pop it whenever a pop is done.
If someone has suggestions about this code, please step forward.
|
|
|
|
| |
See [perl #132367].
|
|
|
|
|
|
| |
This continues the process of changing regmatch() to be able to restrict
the end point of where it is looking in the input, by adding that eol
position to the pushed state, and popping it when appropriate.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the pattern parsing to use offsets from the first node in
the pattern program, rather than direct addresses of such nodes. This
is in preparation for a later change in which more mallocs will be done
which will change those addresses, whereas the offsets will remain
constant. Once the final program space is allocated, real addresses are
used as currently. This limits the necessary changes to a few
functions. Also, real addresses are used if they are constant across a
function; again this limits the changes.
Doing this introduces a new typedef for clarity 'regnode_offset' which
is not a pointer, but a count. This necessitates changing a bunch of
things to use 0 instead of NULL to indicate an error.
A new boolean is also required to indicate if we are in the first or
second passes of the pattern. And separate heap space is allocated for
scratch during the first pass.
|
|
|
|
|
| |
There's lots of confusion here, especially about lastparen - some of
the docs are just plain wrong.
|
| |
|
|
|
|
|
|
|
|
| |
It is used in only one place now, so its better to expand it out rather
than having a complex struct's fields defined using a huge macro.
No functional changes, but I did take the opportunity to tidy up the
formatting of the code comments for each struct field
|
|
|
|
|
|
| |
v5.27.2-34-g196a02a reorganised the RX_FOO() macros, mostly redefining
them in terms of the RXp_FOO() macros. A cut-an-paste error screwed up
the definition of RX_MATCH_UTF8_on(), which is isn't used in core.
|
|
|
|
|
|
|
| |
My recent commit made RX_MATCH_COPY_FREE() a wrapper for
RXp_MATCH_COPY_FREE() but it didn't build on VC 2003.
Spotted by bulk88.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For various RX_FOO() macros, add a RXp_FOO() variant, in such a way that
the original macro is now defined in terms of
#define RX_FOO(rx_sv) (RXp_FOO(ReANY(rx_sv)))
(This is a pre-existing convention; this commit just makes a larger subset
of the RX_() macros have an RXp_() variant).
Then use those macros in various pp_hot.c and regexec.c functions like
pp_match() and regexec_flags(), which already, or added via this commit,
have this line near the start:
regexp *prog = ReANY(rx);
This avoids having to do multiple ReANY()'s, which is important as this
macro now includes a conditional in its expression (to cope with
PVLV-as-REGEX)..
|
|
|
|
|
|
|
|
|
| |
It takes a private/internal regexp* pointer rather than a public REGEXP
pointer, so rename it to match the other RXp_ macros.
Most RXp_ macros are a variant of an RX_ macro; however, I didn't add an
RX_HAS_CUTGROUP() macro - someone can always add that if needed. (There's
only one use of RXp_HAS_CUTGROUP in core).
|
|
|
|
|
|
|
|
|
| |
The previous commit caused the formatting of these macros to become
messy. Realign them, replacing tabs with spaces, and split some long
macros over multiple lines.
Whitespace-only change (if you count splitting a macro with \ as
whitespace).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a family of RX_() macros which take an SV of type REGEXP as an
argument. For historical reasons (regexeps didn't use to be SVs), the name
of the parameter is 'prog', which is also by convention used to name the
actual regexp struct in some places. So e.g. at the top of
re_intuit_start(), you see
struct regexp *const prog = ReANY(rx);
This is confusing. So for this class of macro, rename the parameter from
'prog' to 'rx_sv'. This makes it clearer that the arg should be a REGEXP*
rather than an regexp*.
Note that there are some RXp_() macros which do take a regexp* as an arg;
I've left their parameter name as 'prog'.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit v5.17.5-99-g8d919b0 stopped SVt_REGEXP SVs (and PVLVs acting as
regexes) from having the POK and pPOK flags set. This made things like
SvOK() and SvTRUE() slower, because as well as the quick single test for
any I/N/P/R flags, SvOK() also has to test for
(SvTYPE(sv) == SVt_REGEXP
|| (SvFLAGS(sv) & (SVTYPEMASK|SVp_POK|SVpgv_GP|SVf_FAKE))
== (SVt_PVLV|SVf_FAKE))
This commit fixes the issue fixed by g8d919b0 in a slightly different way,
which is less invasive and allows the POK flag.
Background:
PVLV are basically PVMGs with a few extra fields. They are intended to
be a superset of all scalar types, so any scalar value can be assigned
to a PVLV SV.
However, once REGEXPs were made into first-class scalar SVs, this
assumption broke - there are a whole bunch of fields in a regex SV body
which can't be copied to to a PVLV. So this broke:
sub f {
my $r = qr/abc/; # $r is reference to an SVt_REGEXP
$_[0] = $$r;
}
f($h{foo}); # the hash access is deferred - a temporary PVLV is
# passed instead
The basic idea behind the g8d919b0 fix was, for an LV-acting-as-regex,
to attach both a PVLV body and a regex body to the SV head. This commit
keeps this basic concept; it just changes how the extra body is attached.
The original fix changed SVt_REGEXP SVs so that sv.sv_u.svu_pv no longer
pointed to the regexp's string representation; instead this pointer was
stored in a union made out of the xpv_len field. Doing this necessitated
not turning the POK flag on for any REGEXP SVs.
This freed up the sv_u to point to the regex body, while the sv_any field
could continue to point to the PVLV body. An ReANY() macro was introduced
that returned the sv_u field rather than the sv_any field.
This commit changes it so that instead, on regexp SVs (and LV-as-regexp
SVs), sv_u always points to the string buffer (so they can have POK set
again), but on specifically LV-as-regex SVs, the xpv_len_u union of the
PVLV body points to the regexp body.
This means that SVt_REGEXP SVs are now completely "normal" again,
and SVt_PVLV SVs are normal except in the one case where they hold a
regex, in which case rather than storing the string buffer's length, the
PVLV body stores a pointer to the regex body.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT #130650 heap-use-after-free in S_free_codeblocks
When compiling qr/(?{...})/, a reg_code_blocks structure is allocated
and various SVs are attached to it. Initially this is set to be freed
via a destructor on the savestack, in case of early dying. Later the
structure is attached to the compiling regex, and a boolean flag in the
structure, 'attached', is set to true to show that the destructor no
longer needs to free the struct.
However, it is possible to get three orders of destruction:
1) allocate, push destructor, die early
2) allocate, push destructor, attach to regex, die
2) allocate, push destructor, attach to regex, succeed
In 2, the regex is freed (via the savestack) before the destructor is
called. In 3, the destructor is called, then later the regex is freed.
It turns out perl can't currently handle case 2:
qr'(?{})\6'
Fix this by turning the 'attached' boolean field into an integer refcount,
then keep a count of whether the struct is referenced from the savestack
and/or the regex. Since it normally has a value of 1 or 2, it's similar
to a boolean flag, but crucially it no longer just indicates that the
regex has a pointer to it ('attached'), but that at least one of the
savestack and regex have a pointer to it. So order of freeing no longer
matters.
I also updated S_free_codeblocks() so that it nulls out SV pointers in
the reg_code_blocks struct before freeing them. This is is generally good
practice to avoid double frees, although is probably not needed at the
moment.
|
|
|
|
| |
Except under cpan/ and dist/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[perl #129140] attempting double-free
Thus fixes some leaks and double frees in regexes which contain code
blocks.
During compilation, an array of struct reg_code_block's is malloced.
Initially this is just attached to the RExC_state_t struct local var in
Perl_re_op_compile(). Later it may be attached to a pattern. The difficulty
is ensuring that the array is free()d (and the ref counts contained within
decremented) should compilation croak early, while avoiding double frees
once the array has been attached to a regex.
The current mechanism of making the array the PVX of an SV is a bit flaky,
as the array can be realloced(), and code can be re-entered when utf8 is
detected mid-compilation.
This commit changes the array into separately malloced head and body.
The body contains the actual array, and can be realloced. The head
contains a pointer to the array, plus size and an 'attached' boolean.
This indicates whether the struct has been attached to a regex, and is
effectively a 1-bit ref count.
Whenever a head is allocated, SAVEDESTRUCTOR_X() is used to call
S_free_codeblocks() to free the head and body on scope exit. This function
skips the freeing if 'attached' is true, and this flag is set only at the
point where the head gets attached to the regex.
In one way this complicates the code, since the num_code_blocks field is now
not always available (it's only there is a head has been allocated), but
mainly its simplifies, since all the book-keeping is now done in the two
new static functions S_alloc_code_blocks() and S_free_codeblocks()
|
|
|
|
|
| |
This was first proposed in the thread starting at
http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most ops that execute a regex, such as match and subst, are of type PMOP.
A PMOP allows the actual regex to be attached directly to that op, due
to its extra fields.
OP_SPLIT is different; it is just a plain LISTOP, but it always has an
OP_PUSHRE as its first child, which *is* a PMOP and which has the regex
attached.
At runtime, pp_pushre()'s only job is to push itself (i.e. the current
PL_op) onto the stack. Later pp_split() pops this to get access to the
regex it wants to execute.
This is a bit unpleasant, because we're pushing an OP* onto the stack,
which is supposed to be an array of SV*'s. As a bit of a hack, on
DEBUGGING builds we push a PVLV with the PL_op address embedded instead,
but this still isn't very satisfactory.
Now that regexes are first-class SVs, we could push a REGEXP onto the
stack rather than PL_op. However, there is an optimisation of @array =
split which eliminates the assign and embeds the array's GV/padix directly
in the PUSHRE op. So split still needs access to that op. But the pushre
op will always be splitop->op_first anyway, so one possibility is to just
skip executing the pushre altogether, and make pp_split just directly
access op_first instead to get the regex and @array info.
But if we're doing that, then why not just go the full hog and make
OP_SPLIT into a PMOP, and eliminate the OP_PUSHRE op entirely: with the
data that was spread across the two ops now combined into just the one
split op.
That is exactly what this commit does.
For a simple compile-time pattern like split(/foo/, $s, 1), the optree
looks like:
before:
<@> split[t2] lK
</> pushre(/"foo"/) s/RTIME
<0> padsv[$s:1,2] s
<$> const(IV 1) s
after:
</> split(/"foo"/)[t2] lK/RTIME
<0> padsv[$s:1,2] s
<$> const[IV 1] s
while for a run-time expression like split(/$pat/, $s, 1),
before:
<@> split[t3] lK
</> pushre() sK/RTIME
<|> regcomp(other->8) sK
<0> padsv[$pat:2,3] s
<0> padsv[$s:1,3] s
<$> const(IV 1)s
after:
</> split()[t3] lK/RTIME
<|> regcomp(other->8) sK
<0> padsv[$pat:2,3] s
<0> padsv[$s:1,3] s
<$> const[IV 1] s
This makes the code faster and simpler.
At the same time, two new private flags have been added for OP_SPLIT -
OPpSPLIT_ASSIGN and OPpSPLIT_LEX - which make it explicit that the
assign op has been optimised away, and if so, whether the array is
lexical.
Also, deparsing of split has been improved, to the extent that
perl TEST -deparse op/split.t
now passes.
Also, a couple of panic messages in pp_split() have been replaced with
asserts().
|
|
|
|
| |
This has been deprecated since v5.22
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 24be310237a0f8f19cfdb71de1b068b4ce9572a0 I reworked how
we stored the close_paren info in the regexp match state
structure. Unfortunately I missed a subtle aspect of the
logic which meant that in certain cases we were relying
on close_paren being true to avoid comparing it against
a false ARG value for things like CURLYX, which meant that
sometimes we would exit an stack frame prematurely. This
patch fixes that logic and makes it more clear (via macros)
what is going on.
|
|
|
|
|
|
|
|
| |
In most cases the curlyx member is the first thing after the yes state
member, but eval was reversed. While debugging perl #127705 I switched
them to see what would happen, which changed the bug, and ultimately
revealed the cause of the problem. So I am going to leave them in the
"consistent" order.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way we tracked if pattern recursion was infinite did not work
properly. A pattern like
"a"=~/(.(?2))((?<=(?=(?1)).))/
would loop forever, slowly eat up all available ram as it added
pattern recursion stack frames.
This patch changes the rules for recursion so that recursively
entering a given pattern "subroutine" twice from the same position
fails the match. This means that where previously we might have
seen fatal exception we will now simply fail. This means that
"aaabbb"=~/a(?R)?b/
succeeds with $& equal to "aaabbb".
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GOSTART is a special case of GOSUB, we can remove a lot of offset twiddling,
and other special casing by unifying them, at pretty much no cost.
GOSUB has 2 arguments, ARG() and ARG2L(), which are interpreted as
a U32 and an I32 respectively. ARG() holds the "parno" we will recurse
into. ARG2L() holds a signed offset to the relevant start node for the
recursion.
Prior to this patch the argument to GOSUB would always be >=, and unlike
other parts of our logic we would not use 0 to represent "start/end" of
pattern, as GOSTART would be used for "recurse to beginning of pattern",
after this patch we use 0 to represent "start/end", and a lot of
complexity "goes away" along with GOSTART regops.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we do a GOSUB/GOSTART we need to know where the recursion
"returns" to the previous position in the pattern. Currently
this is done by comparing cur_eval->u.eval.close_paren to the
the argument of CLOSE parens, and within END blocks (for GOSTART).
However, there is a problem. The state machinery for GOSUB/GOSTART
is shared with EVAL ( implementing /(??{ ... })/ ), which also sets
cur_eval->u.eval.close_paren to 0. This for instance breaks /(?(R)YES|NO)/
by making it return true within a /(??{ ... })/ when arguably it
shouldn't. It also is confusing.
So we change the meaning of close_paren so that 0 means "from EVAL",
and that otherwise the value is "+1", so 1 means GOSTART (trigger for
END), and 2 means "CLOSE1", etc. In order to make this transparent
this patch adds the EVAL_CLOSE_PAREN_IS( cur_eval, EXPR ) macro which
does the right thing:
( cur_eval && cur_eval->u.eval.close_paren &&
( ( cur_eval->u.eval.close_paren - 1 ) == EXPR ) )
|
|
|
|
| |
Removes 'the' in front of parameter names in some instances.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These 2 macros were created for the Symbian port in commit
"Symbian port of Perl" to replace a direct "extern" token. I guess the
author was unaware of PERL_CALLCONV.
PERL_CALLCONV is the official macro to use. PERL_XS_EXPORT_C and
PERL_EXPORT_C have no usage on cpan grep except for modules with direct
copies of core headers. A defect of using PERL_EXPORT_C and
PERL_XS_EXPORT_C instead of PERL_CALLCONV is that win32/win32.h has no
knowledge of the 2 macros and doesn't set them, and os/os2ish.h doesn't
either. On Win32, since the unix defaults are used instead of Win32
specific "__declspec(dllimport)" token, XS modules use indirect function
stubs in each XS module placed by the CC to call into perl5**.dll instead
of directly calls the core C functions. I observed this in in XS-Typemap's
DLL. To simplify the API, and to decrease the amount of macros needing to
implemented to support each platform, just remove the 2 macros.
Since perl.h's fallback defaults for PERL_CALLCONV are very late in perl.h,
they need to be moved up before function declarations start in perlio.h
(perlio.h is included from iperlsys.h).
win32iop.h contains the "PerlIO" and SV" tokens, so perlio.h must be
included before win32iop.h is. Including perlio.h so early in win32.h,
causes PERL_CALLCONV not be defined since Win32 platform uses the
fallback in perl.h, since win32.h doesn't always define PERL_CALLCONV and
sometimes relies on the fallback. Since win32iop.h contains alot of
declarations, it belongs with other declarations such as those in proto.h
so move it from win32.h to perl.h.
the "free" token in struct regexp_engine conflicts with win32iop's
"#define free win32_free" so rename that member.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An empty cpan/.dir-locals.el stops Emacs using the core defaults for
code imported from CPAN.
Committer's work:
To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION needed
to be incremented in many files, including throughout dist/PathTools.
perldelta entry for module updates.
Add two Emacs control files to MANIFEST; re-sort MANIFEST.
For: RT #124119.
|
|
|
|
| |
This is another step in the process
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This flag will prevent () from capturing and filling in $1, $2, etc...
Named captures will still work though, and if used will cause $1, $2, etc...
to be filled in *only* within named groups.
The motivation behind this is to allow the common construct of:
/(?:b|c)a(?:t|n)/
To be rewritten more cleanly as:
/(b|c)a(t|n)/n
When you want grouping but no memory penalty on captures.
You can also use ?n inside of a () directly to avoid capturing, and
?-n inside of a () to negate its effects if you want to capture.
|
| |
|
|
|
|
| |
It is no longer needed as of 1067df30ae9.
|
|
|
|
|
|
|
|
| |
We get an integer overflow message when we left shift a 1 into the
highest bit of a word. This changes the 1's into 1U's to indicate
unsigned. This is done for all the flag bits in the affected word, as
they could get reorderd by someone in the future, unintentionally
reintroducing this problem again.
|
|
|
|
|
|
|
|
|
|
| |
It is planned for a future Perl release to have /xx mean something
different from just /x. To prepare for this, this commit raises a
deprecation warning if someone currently has this usage. A grep of CPAN
did not turn up any instances of this, but this is to be safe anyway.
The added code is more general than actually needed, in case we want to
do this for another flag.
|
|
|
|
|
|
| |
This doesn't actually use the flag yet.
We no longer have to make version-dependent changes to
ext/Devel-Peek/t/Peek.t, (it being in /ext) so this doesn't
|
| |
|
|
|
|
|
|
| |
This sets a #define to point in the middle of the free-space, so that
bits at either end can be added without having to adjust many other
defines.
|
|
|
|
|
|
|
| |
This #defined a symbol then did a compile time check that it was the
same as another symbol. This commit simply defines it as the other
symbol directly, and moves it to above the other definitions, which it
no longer is part of. This prepares for the next commit.
|
|
|
|
|
|
| |
We do not need a placeholder for unused flag bits. And removing them
makes the generated regnodes.h more accurate as to what bits are
available.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves three bits to create a block of unused bits at the beginning.
The first bit had to be moved to make space for other uses that are
coming in future commits. This breaks binary compatibility, so might as
well move the other two bits so that all the unused bits are
consolidated at the beginning.
This pool of unused bits is the boundary between the bits that are
common to op.h and regexp.h (and in op_reg_common.h) and those that are
separate. It's best to have all the unused bits there, so when we need
to use one, it can be taken from either side, as needed, without us
being trapped into having an available bit, but of the wrong kind.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- after return/croak/die/exit, return/break are pointless
(break is not a terminator/separator, it's a goto)
- after goto, another goto (!) is pointless
- in some cases (usually function ends) introduce explicit NOT_REACHED
to make the noreturn nature clearer (do not do this everywhere, though,
since that would mean adding NOT_REACHED after every croak)
- for the added NOT_REACHED also add /* NOTREACHED */ since
NOT_REACHED is for gcc (and VC), while the comment is for linters
- declaring variables in switch blocks is just too fragile:
it kind of works for narrowing the scope (which is nice),
but breaks the moment there are initializations for the variables
(the initializations will be skipped since the flow will bypass
the start of the block); in some easy cases simply hoist the declarations
out of the block and move them earlier
Note 1: Since after this patch the core is not yet -Wunreachable-code
clean, not enabling that via cflags.SH, one needs to -Accflags=... it.
Note 2: At least with the older gcc 4.4.7 there are far too many
"unreachable code" warnings, which seem to go away with gcc 4.8,
maybe better flow control analysis. Therefore, the warning should
eventually be enabled only for modernish gccs (what about clang and
Intel cc?)
|
|
|
|
|
|
|
| |
This reverts commit 8c2b19724d117cecfa186d044abdbf766372c679.
I don't understand - smoke-me came back happy with three
separate reports... oh well, some other time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- after croak/die/exit (or return), break (or return!) are pointless
(break is not a terminator/separator, it's a promise of a jump)
- after goto, another goto (!) is pointless
- in some cases (usually function ends) introduce explicit NOT_REACHED
to make the noreturn nature clearer (do not do this everywhere, though,
since that would mean adding NOT_REACHED after every croak)
- for the added NOT_REACHED also add /* NOTREACHED */ since
NOT_REACHED is for gcc (and VC), while the comment is for linters
- declaring variables in switch blocks is just too fragile:
it kind of works for narrowing the scope (which is nice),
but breaks the moment there are initializations for the variables
(they will be skipped!); in some easy cases simply hoist the declarations
out of the block and move them earlier
There are still a few places left.
|
|
|
|
| |
(For some odd reason assert() cannot be found and Jenkins becomes apoplectic.)
|
|
|
|
|
|
|
|
| |
The default case really is impossible because all the valid
enums values are already covered in the switch.
The NOT_REACHED; is for the compiler (from perl.h),
the /* NOTREACHED */ is for static analyzers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit v5.19.8-533-g63baef5 changed the handling of locale-dependent
regexes so that the pattern was considered tainted at compile-time, rather
than determining it each time at run-time whenever it executed a
locale-dependent node. Unfortunately due to the conflating of two flags,
RXf_TAINTED and RXf_TAINTED_SEEN, it had the side effect of permanently
marking a pattern as tainted once it had had a single tainted result.
E.g.
use re qw(taint);
use Scalar::Util qw(tainted);
for ($^X, "abc") {
/(.*)/ or die;
print "not " unless tainted("$1"); print "tainted\n";
};
which from 5.19.9 onwards output:
tainted
tainted
but with this commit (and with 5.19.8 and earlier), it now outputs:
tainted
not tainted
The RXf_TAINTED flag indicates that the pattern itself is tainted, e.g.
$r = qr/$tainted_value/
while the RXf_TAINTED_SEEN flag means that the results of the last match
are tainted, e.g.
use re 'tainted';
$tainted =~ /(.*)/;
# $1 is tainted
Pre 63baef5, the code used to look like:
at run-time:
turn off RXf_TAINTED_SEEN;
while (nodes to execute) {
switch(node) {
case
BOUNDL: /* and other locale-specific ops */
turn on RXf_TAINTED_SEEN;
...;
}
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
63baef5 changed it to:
at compile-time:
if (pattern has locale ops)
turn on RXf_TAINTED_SEEN;
at run-time:
while (nodes to execute) {
...
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
This commit changes it to:
at compile-time;
if (pattern has locale ops)
turn on RXf_TAINTED;
at run-time:
turn off RXf_TAINTED_SEEN;
while (nodes to execute) {
...
}
if (tainted || RXf_TAINTED)
turn on RXf_TAINTED_SEEN;
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently prog->substrs->data[] is a 3 element array of structures.
Elements 0 and 1 record the longest anchored and floating substrings,
while element 2 ('check'), is a copy of the longest of 0 and 1.
Record in a new field, prog->substrs->check_ix, the index of which element
was copied. (Eventually I intend to remove the copy altogether.)
Also for the anchored substr, set max_offset equal to min offset.
Previously it was left as zero and ignored, although if copied to check,
the check copy of max *was* set equal to min. Having this always set will
allow us to make the code simpler.
|
|
|
|
|
| |
In particular, specify that the various offset fields are char rather
than byte counts.
|
| |
|
|
|
|
|
|
|
|
|
| |
The flag tells us that a pattern may match an infinitely long string.
The new member in the regexp struct tells us how long the string might
be.
With these two items we can implement regexp based $/
|