| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
| |
New macros {GCC,CLANG}_DIAG_{IGNORE,RESTORE}_{DECL,STMT}, which take a
following semicolon. It is necessary to use the _DECL or _STMT version
as appropriate to the context. Fixes [perl #130726].
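
As a rough illustration of why two variants are needed, the sketch below uses hypothetical `MY_DIAG_*` macros (stand-ins, not the definitions from the Perl headers). It assumes each variant ends in a token sequence that absorbs the trailing semicolon: a harmless declaration for the _DECL form, and a no-op expression for the _STMT form.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; the real macros are defined in
 * the Perl headers and may be written differently. */
#define MY_DIAG_PRAGMA(x) _Pragma(#x)
#if defined(__GNUC__) || defined(__clang__)
#  define MY_DIAG_IGNORE(w) \
        _Pragma("GCC diagnostic push") MY_DIAG_PRAGMA(GCC diagnostic ignored #w)
#  define MY_DIAG_RESTORE _Pragma("GCC diagnostic pop")
#else
#  define MY_DIAG_IGNORE(w)
#  define MY_DIAG_RESTORE
#endif

/* Declaration context: end in a harmless declaration to absorb the ';'. */
#define MY_DIAG_IGNORE_DECL(w) MY_DIAG_IGNORE(w) extern int my_diag_unused
#define MY_DIAG_RESTORE_DECL   MY_DIAG_RESTORE extern int my_diag_unused

/* Statement context: end in a no-op expression to absorb the ';'. */
#define MY_DIAG_IGNORE_STMT(w) MY_DIAG_IGNORE(w) (void)0
#define MY_DIAG_RESTORE_STMT   MY_DIAG_RESTORE (void)0

static int
format_value_stmt(char *buf, size_t len, const char *fmt, int value)
{
    int n;                                     /* declarations end here */

    assert(buf != NULL && fmt != NULL);
    MY_DIAG_IGNORE_STMT(-Wformat-nonliteral);  /* used among statements */
    n = snprintf(buf, len, fmt, value);
    MY_DIAG_RESTORE_STMT;
    return n;
}

static int
format_value_decl(char *buf, size_t len, const char *fmt, int value)
{
    MY_DIAG_IGNORE_DECL(-Wformat-nonliteral);  /* used among declarations */
    int n = snprintf(buf, len, fmt, value);
    MY_DIAG_RESTORE_DECL;
    return n;
}
```

Using the _STMT form where declarations are still expected would, under C89 rules, turn everything after it into "code", which is the kind of mismatch the two variants are meant to avoid.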
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
The extended charclass parser makes some assumptions during the
first pass which are only true on well structured input, and it
does not properly catch various errors. Later on the code assumes
that things the first pass will let through are valid, when in
fact they should trigger errors.
|
|
|
|
|
|
|
|
|
|
|
|
| |
'depth' is used to track the recursion depth during compilation,
and is used by things like DEBUG_PARSE() to show the compilation
process.
handle_regex_sets() was using its own 'depth' for two different purposes,
which is quite confusing.
At the same time, when we call handle_regex_sets() from reg() it is
important to increment 'depth'.
|
|
|
|
| |
Fixes [perl #131893].
|
|
|
|
| |
so it is a bit easier to follow what they are used for.
|
|
|
|
| |
Coverity #169257, #169265, #169269.
|
|
|
|
|
|
| |
Reduce Newxz() to Newx() where all relevant parts of the memory are
being explicitly initialised, and don't explicitly zero memory that was
already zeroed. [perl #36078]
|
| |
|
|
|
|
|
| |
This allows things to work properly in the face of embedded NULs.
See the branch merge message for more information.
|
|
|
|
|
|
|
| |
This allows \x and \o to work properly in the face of embedded NULs.
A limit parameter is added to each function, and that is passed to
memchr (which replaces strchr). See the branch merge message for more
information.
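
The difference between the two calls can be sketched as follows. `find_closing_brace` is a hypothetical helper written for this note, not a function from regcomp.c; it only assumes that the caller knows where the buffer ends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the '}' that closes a \x{...} or \o{...}
 * construct. 's' points just past the '{'; 'send' is one past the last
 * byte of the buffer. Returns NULL if there is no '}' before 'send'. */
static const char *
find_closing_brace(const char *s, const char *send)
{
    assert(s != NULL && s <= send);
    /* memchr() is given an explicit length, so an embedded NUL is just
     * another byte. strchr(s, '}') would stop at the first NUL instead. */
    return (const char *) memchr(s, '}', (size_t)(send - s));
}
```

With the bytes `4 1 NUL 4 2 }`, the memchr-based search finds the brace at offset 5, whereas strchr() gives up at the NUL and reports no match.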
|
| |
|
|
|
|
|
|
| |
The latter is generally faster when the length is already known.
This commit also changes a few hard-coded numbers to use sizeof().
|
|
|
|
|
| |
The latter is much clearer as to what's going on, and the programmer and
program reader don't have to count characters.
|
|
|
|
|
|
|
|
|
|
| |
Where the length is known, we can use these functions which relieve
the programmer and the program reader from having to count characters.
The memFOO functions should also be slightly faster than the strFOO
equivalents.
In some instances in this commit, hard coded numbers are used. These
come from the 'case' statement values that apply to them.
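
A minimal sketch of the idea, using a hypothetical `BUF_EQ_LITERAL` macro rather than any of the actual Perl helpers: when the length is already known, a length check plus memcmp() does the comparison, and sizeof() on the literal counts the characters so nobody has to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: compare a (pointer, length) pair against a string
 * literal. sizeof(lit) - 1 is the literal's length without its NUL. */
#define BUF_EQ_LITERAL(p, len, lit) \
    ((len) == sizeof(lit) - 1 && memcmp((p), "" lit "", sizeof(lit) - 1) == 0)

static int
is_alpha_name(const char *name, size_t len)
{
    assert(name != NULL);
    return BUF_EQ_LITERAL(name, len, "alpha");
}
```

The `"" lit ""` concatenation is only there so that passing anything other than a string literal fails to compile.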
|
| |
Commit 2bfbbbaf9ef1783ba914ff9e9270e877fbbb6aba changed things so -Dr
output could be changed through an environment variable to truncate
the output differently than the default.
For most purposes, the default is good enough, but for someone trying to
debug the regcomp internals, sometimes one wants to see more than is
output by default.
That commit did not catch all the places. This one changes the handling
so that any place that used the previous default maximum now uses the
environment variable (if set) instead.
|
|
|
|
|
| |
By adding two branches, we can avoid the expensive UTF-8 decode step for
the common case of the input being an ASCII character.
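
The commit does not show which branches were added, so the following is only a sketch of the general technique, with hypothetical function names and a simplified decoder that assumes well-formed input. A single test on the first byte lets ASCII input skip the decoder entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the expensive general decoder. Assumes well-formed UTF-8
 * and does no validation. */
static uint32_t
decode_utf8_slow(const unsigned char *s, size_t *lenp)
{
    uint32_t cp;
    size_t n, i;

    if      (s[0] < 0x80) { cp = s[0];        n = 1; }
    else if (s[0] < 0xE0) { cp = s[0] & 0x1F; n = 2; }
    else if (s[0] < 0xF0) { cp = s[0] & 0x0F; n = 3; }
    else                  { cp = s[0] & 0x07; n = 4; }
    for (i = 1; i < n; i++)
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    *lenp = n;
    return cp;
}

static uint32_t
next_code_point(const unsigned char *s, size_t *lenp)
{
    assert(s != NULL && lenp != NULL);
    if (s[0] < 0x80) {          /* fast path: ASCII is its own code point */
        *lenp = 1;
        return s[0];
    }
    return decode_utf8_slow(s, lenp);
}
```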
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In code reading (so I don't have a test case), I realized that this code
could return out of its function without restoring the state that it has
changed out from under the caller, and that can lead to havoc when the
caller continues on assuming the original state.
This commit moves the return and other checking to after the state
restoral code. It can call FAIL2 as part of a panic after the state the
failure is in is gone. This would be a problem if it called vFAIL2
instead, but isn't because FAIL2 doesn't need the state the failure was
in.
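
The ordering described above can be sketched like this. The struct and function names are hypothetical and much simpler than the real compilation state; the point is only that the saved state is put back before the result is examined, so every exit path leaves the caller's state intact.

```c
#include <assert.h>
#include <stddef.h>

struct parse_state {           /* hypothetical, far simpler than the real one */
    const char *parse;         /* current position */
    const char *end;           /* one past the end of the input */
};

/* Consume leading digits; return how many, or -1 if there were none. */
static int
parse_digits(struct parse_state *st)
{
    int n = 0;
    while (st->parse < st->end && *st->parse >= '0' && *st->parse <= '9') {
        st->parse++;
        n++;
    }
    return n > 0 ? n : -1;
}

/* Temporarily parse a substitute buffer on behalf of the caller. */
static int
parse_substitute(struct parse_state *st, const char *sub, size_t sublen)
{
    const char *save_parse = st->parse;
    const char *save_end   = st->end;
    int result;

    assert(sub != NULL);
    st->parse = sub;
    st->end   = sub + sublen;
    result = parse_digits(st);

    /* Restore the caller's state first ... */
    st->parse = save_parse;
    st->end   = save_end;

    /* ... and only then decide whether to report failure. */
    if (result < 0)
        return -1;
    return result;
}
```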
|
|
|
|
|
| |
We have already assured earlier in the function that this 'if' is always
true.
|
|
|
|
| |
Vertically align some text.
|
| |
|
| |
|
|
|
|
| |
If this value is negative, something is wrong.
|
|
|
|
| |
However, we do preserve it outside PERL_CORE for the use of XS authors.
|
|
|
|
|
|
|
|
|
|
| |
We check that numerically quantified subpatterns can match something,
so that we can detect things like (){4}. However, we produce false positives
when using regex recursion. This is related to slow-downs in grammar matches
in Perl 5.20 which were fixed by a51d618a82a7057c3aabb600a7a8691d27f44a34.
In an ideal world we would do a lot of work and this false-positive would not
happen, but that requires more round tuits than I have available.
|
|
|
|
|
|
|
|
| |
The cause of this is that the vFAIL macro uses RExC_parse, and that
variable has just been changed in preparation for code after the vFAIL.
The solution is to not change RExC_parse until after the vFAIL.
This is a case where the macro hides stuff that can bite you.
|
| |
|
|
|
|
|
| |
this means that callers do not have to worry about resetting the flags,
which reduces the chance of error in using reginsert.
|
|
|
|
| |
Why reginsert doesn't do this stuff I don't know.
|
|
|
|
|
|
|
|
|
|
|
| |
This function copies a regexp SV. Rename its args to ssv and dsv to
match a usual convention in other functions such as sv_catsv().
Similarly rename the two local vars holding ReANY(ssv/dsv) to srx, drx.
This is less confusing than having four vars called rx, ret_x, r, ret.
Also update the comments explaining what the function does.
|
| |
Commit v5.17.5-99-g8d919b0 stopped SVt_REGEXP SVs (and PVLVs acting as
regexes) from having the POK and pPOK flags set. This made things like
SvOK() and SvTRUE() slower, because as well as the quick single test for
any I/N/P/R flags, SvOK() also has to test for
    (SvTYPE(sv) == SVt_REGEXP
     || (SvFLAGS(sv) & (SVTYPEMASK|SVp_POK|SVpgv_GP|SVf_FAKE))
         == (SVt_PVLV|SVf_FAKE))
This commit fixes the issue fixed by g8d919b0 in a slightly different way,
which is less invasive and allows the POK flag.
Background:
PVLV are basically PVMGs with a few extra fields. They are intended to
be a superset of all scalar types, so any scalar value can be assigned
to a PVLV SV.
However, once REGEXPs were made into first-class scalar SVs, this
assumption broke - there are a whole bunch of fields in a regex SV body
which can't be copied to a PVLV. So this broke:
    sub f {
        my $r = qr/abc/; # $r is reference to an SVt_REGEXP
        $_[0] = $$r;
    }
    f($h{foo}); # the hash access is deferred - a temporary PVLV is
                # passed instead
The basic idea behind the g8d919b0 fix was, for an LV-acting-as-regex,
to attach both a PVLV body and a regex body to the SV head. This commit
keeps this basic concept; it just changes how the extra body is attached.
The original fix changed SVt_REGEXP SVs so that sv.sv_u.svu_pv no longer
pointed to the regexp's string representation; instead this pointer was
stored in a union made out of the xpv_len field. Doing this necessitated
not turning the POK flag on for any REGEXP SVs.
This freed up the sv_u to point to the regex body, while the sv_any field
could continue to point to the PVLV body. An ReANY() macro was introduced
that returned the sv_u field rather than the sv_any field.
This commit changes it so that instead, on regexp SVs (and LV-as-regexp
SVs), sv_u always points to the string buffer (so they can have POK set
again), but on specifically LV-as-regex SVs, the xpv_len_u union of the
PVLV body points to the regexp body.
This means that SVt_REGEXP SVs are now completely "normal" again,
and SVt_PVLV SVs are normal except in the one case where they hold a
regex, in which case rather than storing the string buffer's length, the
PVLV body stores a pointer to the regex body.
|
|
|
|
|
|
| |
This reverts commit 2ac902efe11ee156653eb2ca1369f0e5f4546c31.
See thread at Message-ID: <d8jfuebazrl.fsf@dalvik.ping.uio.no>
|
| |
Now that the scan_data_t stores its fixed and floating substring data
as a 2-element array, replace various bits of duplicated code which
separately handled fixed and floating substrings with for (i = 0; i < 2;
i++) loops etc.
This makes the code shorter and simpler, and will make it easier in future
to expand to more than a single each of fixed+float.
There should be no functional changes, except that debugging output
now displays N..N rather than just N for the fixed substring
start range (i.e. it's now just a subset of float where max == min).
|
|
|
|
|
|
|
|
|
| |
.. to 'cur_is_floating'
It's an index into either the fixed or float substring info; the
information it provides is whether the currently being captured substring
is fixed or floating; it's nothing to do with whether the fixed or the
floating is currently the longest.
|
| |
In this src file, expand all the various macros like
    #define anchored_offset substrs->data[0].min_offset
    #define float_min_offset substrs->data[1].min_offset
This will later allow parts of the code to be parameterised, e.g.
    for (i=0; i<2; i++) {
        substrs->data[i].min_offset = ...;
        ...
    }
|
| |
Rather than passing e.g.
    &(r->float_utf8),
    &(r->float_substr),
    &(r->float_end_shift),
pass the single arg
    &(r->substrs->data[1])
(float_foo are macros which expand to substrs->data[1].foo)
|
| |
Previously scan_data_t had the three fields
    offset_fixed
    offset_float_min
    offset_float_max
A few commits ago that was converted into a 2 element array (for fixed
and float), each with the fields
    min_offset
    max_offset
where the max_offset was unused in the fixed (substrs[0]) case.
Instead, set it equal to min_offset.
This makes the fixed and float code paths more similar.
At the same time expand a few of the 'float_max_offset' type macros
to make it clearer what's going on.
|
| |
Currently the scan_data_t struct has a flags field which contains
SF_ and SCF_ flags. Some of the SF_ flags are general; others are specific
to the fixed or floating substr. For example there are these 3 flags:
    SF_BEFORE_MEOL
    SF_FIX_BEFORE_MEOL
    SF_FL_BEFORE_MEOL
This commit adds a flags field to the per-substring substruct and sets
some flags per-substring instead. For example
    previously we did:                  now we would do:
    ----------------------------------  ----------------------------------------
    data->flags |= SF_BEFORE_MEOL       unchanged
    data->flags |= SF_FIX_BEFORE_MEOL   data->substrs[0].flags |= SF_BEFORE_MEOL
    data->flags |= SF_FL_BEFORE_MEOL    data->substrs[1].flags |= SF_BEFORE_MEOL
This allows us to simplify the code (e.g. eliminating some args from
S_setup_longest()) and in future will allow more than one fixed or
floating substring.
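
A small sketch of what per-substring flags buy, using hypothetical `my_`-prefixed types and an invented bit value rather than the real scan_data_t: one flag name and one accessor can serve both the fixed and the floating substring, selected by index.

```c
#include <assert.h>

#define MY_SF_BEFORE_MEOL 0x1u   /* hypothetical bit value */

struct my_substr {
    unsigned flags;              /* flags for this substring only */
};

struct my_scan_data {
    unsigned flags;              /* general flags */
    struct my_substr substrs[2]; /* [0] fixed, [1] floating */
};

/* One accessor for both kinds; no FIX_/FL_ variants of the flag needed. */
static int
substr_before_meol(const struct my_scan_data *data, int i)
{
    assert(data != 0 && (i == 0 || i == 1));
    return (data->substrs[i].flags & MY_SF_BEFORE_MEOL) != 0;
}
```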
|
|
|
|
|
|
|
|
| |
DEBUG_PEEP(..., flags) was invoked from 3 functions - however in two of
those functions, the 'flags' local var did *not* contain SF_ and SCF_
bits, so the flag bits were being incorrectly displayed as SF_ etc.
In those two functions, change it instead to DEBUG_PEEP(..., 0)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make these 3 macros into thin wrappers around some new static
functions, rather than just being huge macros:
    DEBUG_SHOW_STUDY_FLAGS
    DEBUG_STUDYDATA
    DEBUG_PEEP
Also, avoid the macros implicitly using local vars: make them into
explicit parameters instead (this is one of my pet peeves).
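
The shape of such a wrapper might look like the sketch below. The names `debug_show_flags`, `MY_DEBUG_SHOW_FLAGS` and `my_debug_enabled` are hypothetical; the real macros and their static functions take different arguments. What matters is that the macro body is a single call and that everything it needs arrives as a parameter.

```c
#include <assert.h>
#include <stdio.h>

static int my_debug_enabled = 1;   /* hypothetical runtime switch */

/* The real work lives in an ordinary function: easier to read, to step
 * through in a debugger, and to type-check than a huge macro. */
static int
debug_show_flags(char *buf, size_t buflen, const char *where, unsigned flags)
{
    assert(buf != NULL && where != NULL);
    return snprintf(buf, buflen, "%s: flags=0x%x", where, flags);
}

/* Thin wrapper: every input is an explicit parameter, so nothing is
 * silently picked up from the caller's local variables. */
#define MY_DEBUG_SHOW_FLAGS(buf, buflen, where, flags) \
    (my_debug_enabled \
        ? debug_show_flags((buf), (buflen), (where), (flags)) : 0)
```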
|
|
|
|
|
|
|
|
|
|
|
| |
In this private data structure used during regex compilation, the
'longest' field was an SV** pointer which was always set to point to
one of these two addresses:
    &(data->substrs[0].str)
    &(data->substrs[1].str)
Instead, just make it a U8 with the value 0 or 1.
|
|
|
|
|
| |
Now that a substring is a separate struct, pass as a single pointer
rather than as 4 separate args.
|
| |
This private struct is used just within regcomp.c while compiling a
pattern. It has a set of fields for a fixed substring, and a similar set
for floating, e.g.
    SV *longest_fixed;
    SV *longest_float;
    SSize_t *minlen_fixed;
    SSize_t *minlen_float;
    etc
Instead have a 2 element array, one for fixed, one for float, so e.g.
    data->offset_float_max
becomes
    data->substrs[1].max_offset
There are 3 reasons for doing this.
First, it makes the code more regular, and allows a whole substr ptr to be
passed as an arg to a function rather than having to pass every individual
field;
second, it makes the compile-time struct more similar to the runtime
struct, which already has such an arrangement;
third, it allows for a hypothetical future expansion where there aren't
necessarily at most 1 fixed and 1 floating substring.
Note that a side effect of this commit has been to change
lookbehind_fixed from I32 to SSize_t; lookbehind_float was already
SSize_t, so the I32 was probably a bug.
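
The first of those reasons can be sketched with hypothetical, much-reduced types (the real struct has more fields and uses SV pointers). Once each substring is its own struct, a helper takes one pointer instead of one argument per field, and code that treats both kinds alike becomes a loop over the array.

```c
#include <assert.h>
#include <stddef.h>

struct my_substr {                 /* hypothetical, reduced field set */
    const char *str;
    size_t      len;
    long        min_offset;
    long        max_offset;
};

struct my_scan_data {
    struct my_substr substrs[2];   /* [0] fixed, [1] floating */
};

/* One pointer argument instead of a separate argument for each field. */
static void
set_substr(struct my_substr *sub, const char *str, size_t len,
           long min_offset, long max_offset)
{
    assert(sub != NULL && min_offset <= max_offset);
    sub->str        = str;
    sub->len        = len;
    sub->min_offset = min_offset;
    sub->max_offset = max_offset;
}

/* The same loop handles the fixed and the floating substring. */
static int
longest_substr_index(const struct my_scan_data *data)
{
    int i, best = 0;

    for (i = 1; i < 2; i++)
        if (data->substrs[i].len > data->substrs[best].len)
            best = i;
    return best;
}
```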
|
|
|
|
|
|
| |
this worked because 'a' and 'o' are treated the same for all intents
and purposes, but it is confusing as 'a' stands for array, and 'o'
for hash, and the DEBUG mode code here adds two arrays not hashes.
|
| |
With the original code you'd have to be very, very careful:
    if (foo)
        CLEAR_POSIX_WARNINGS_AND_RETURN(42);
would have expanded to
    if (foo)
        CLEAR_POSIX_WARNINGS();
    return 42; /* always returns! */
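
One common way to make a multi-statement macro safe in this position is to wrap it in `do { ... } while (0)`. The message above does not say which form the fix actually used, so the sketch below, with hypothetical `MY_` names, only illustrates the general idiom.

```c
#include <assert.h>

static int warnings_pending = 0;   /* hypothetical state to be cleared */

#define MY_CLEAR_WARNINGS() (warnings_pending = 0)

/* Fragile form: expands to two statements, so only the first one is
 * governed by a preceding 'if':
 *   #define MY_CLEAR_AND_RETURN(ret) MY_CLEAR_WARNINGS(); return ret
 */

/* Safe form: the whole body is a single statement. */
#define MY_CLEAR_AND_RETURN(ret) \
    do { MY_CLEAR_WARNINGS(); return (ret); } while (0)

static int
classify(int foo)
{
    warnings_pending = 1;
    if (foo)
        MY_CLEAR_AND_RETURN(42);
    return 0;                      /* reached only when foo is false */
}
```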
|
| |
|
| |
|