| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
| |
This reverts commit 2ac902efe11ee156653eb2ca1369f0e5f4546c31.
See thread at Message-ID: <d8jfuebazrl.fsf@dalvik.ping.uio.no>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the scan_data_t stores its fixed and floating substring data
as a 2-element array, replace various bits of duplicated code which
separately handled fixed and floating substrings with for (i = 0; i < 2;
i++) loops etc.
This makes the code shorter and simpler, and will make it easier in future
to expand to more than a single each of fixed+float.
There should be no functional changes, except that debugging output
now displays N..N rather than just N for the fixed substring
start range (i.e. it's now just a subset of float where max == min).
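
A minimal sketch of the loop pattern, assuming a simplified stand-in
struct (demo_substr, demo_scan_data and demo_reset_offsets are
hypothetical names, not the real regcomp.c definitions):

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one substring's bookkeeping. */
typedef struct {
    ptrdiff_t min_offset;
    ptrdiff_t max_offset;
} demo_substr;

/* Index 0 is the fixed substring, index 1 the floating one. */
typedef struct {
    demo_substr substrs[2];
} demo_scan_data;

/* One loop replaces two near-identical blocks of code. */
static void demo_reset_offsets(demo_scan_data *data, ptrdiff_t pos)
{
    int i;
    for (i = 0; i < 2; i++) {
        data->substrs[i].min_offset = pos;
        data->substrs[i].max_offset = pos;
    }
}
```

With the fixed entry carrying max == min, both entries can go through
the same code path.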
|
|
|
|
|
|
|
|
|
| |
.. to 'cur_is_floating'
It's an index into either the fixed or float substring info; the
information it provides is whether the currently being captured substring
is fixed or floating; it's nothing to do with whether the fixed or the
floating is currently the longest.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this src file, expand all the various macros like
#define anchored_offset substrs->data[0].min_offset
#define float_min_offset substrs->data[1].min_offset
This will later allow parts of the code to be parameterised, e.g.
for (i=0; i<2; i++) {
substrs->data[i].min_offset = ...;
...
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than passing e.g.
&(r->float_utf8),
&(r->float_substr),
&(r->float_end_shift),
pass the single arg
&(r->substrs->data[1])
(float_foo are macros which expand to substrs->data[1].foo)
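
A rough sketch of the calling convention, assuming a cut-down record
(demo_substr_data, demo_substrs and demo_clear_substr are hypothetical
names; the real struct has more fields):

```c
#include <stddef.h>

/* Hypothetical per-substring record. */
typedef struct {
    const char *substr;
    const char *utf8_substr;
    ptrdiff_t   end_shift;
} demo_substr_data;

typedef struct {
    demo_substr_data data[2];   /* [0] = fixed (anchored), [1] = floating */
} demo_substrs;

/* Takes one pointer instead of three separate field pointers. */
static void demo_clear_substr(demo_substr_data *sub)
{
    sub->substr = NULL;
    sub->utf8_substr = NULL;
    sub->end_shift = 0;
}
```

A caller would then write demo_clear_substr(&s.data[1]) for the
floating substring and demo_clear_substr(&s.data[0]) for the fixed one.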
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously scan_data_t had the three fields
offset_fixed
offset_float_min
offset_float_max
A few commits ago that was converted into a 2-element array (for fixed
and float), each with the fields
min_offset
max_offset
where max_offset was unused in the fixed (substrs[0]) case.
Instead, set it equal to min_offset.
This makes the fixed and float code paths more similar.
At the same time expand a few of the 'float_max_offset' type macros
to make it clearer what's going on.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the scan_data_t struct has a flags field which contains
SF_ and SCF_ flags. Some of the SF_ flags are general; others are specific
to the fixed or floating substr. For example there are these 3 flags:
SF_BEFORE_MEOL
SF_FIX_BEFORE_MEOL
SF_FL_BEFORE_MEOL
This commit adds a flags field to the per-substring substruct and sets
some flags per-substring instead. For example
previously we did: now we would do:
-------------------------------- --------------------------------------
data->flags |= SF_BEFORE_MEOL unchanged
data->flags |= SF_FIX_BEFORE_MEOL data->substrs[0].flags |= SF_BEFORE_MEOL
data->flags |= SF_FL_BEFORE_MEOL data->substrs[1].flags |= SF_BEFORE_MEOL
This allows us to simplify the code (e.g. eliminating some args from
S_setup_longest()) and in future will allow more than one fixed or
floating substring.
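
A minimal sketch of the per-substring flag layout, assuming made-up
names and a made-up flag value (DEMO_SF_BEFORE_MEOL, demo_substr,
demo_scan_data and demo_before_meol are all hypothetical):

```c
/* Hypothetical flag value; the real SF_BEFORE_MEOL lives in regcomp.c. */
#define DEMO_SF_BEFORE_MEOL 0x2u

typedef struct {
    unsigned flags;             /* flags that apply to this substring only */
} demo_substr;

typedef struct {
    unsigned    flags;          /* general flags */
    demo_substr substrs[2];     /* [0] = fixed, [1] = floating */
} demo_scan_data;

/* One helper serves both substrings; no FIX_/FL_ flag variants needed. */
static int demo_before_meol(const demo_scan_data *data, int i)
{
    return (data->substrs[i].flags & DEMO_SF_BEFORE_MEOL) != 0;
}
```

Because the same flag bit is reused in each substring's own field, a
function that takes a substring index or pointer no longer needs to be
told which flag variant to test.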
|
|
|
|
|
|
|
|
| |
DEBUG_PEEP(..., flags) was invoked from 3 functions - however in two of
those functions, the 'flags' local var did *not* contain SF_ and SCF_
bits, so the flag bits were being incorrectly displayed as SF_ etc.
In those two functions, change it instead to DEBUG_PEEP(..., 0)
|
|
|
|
|
|
|
|
|
|
|
|
| |
make these 3 macros into thin wrappers around some new static
functions, rather than just being huge macros:
DEBUG_SHOW_STUDY_FLAGS
DEBUG_STUDYDATA
DEBUG_PEEP
Also, avoid the macros implicitly using local vars: make them into
explicit parameters instead (this is one of my pet peeves).
|
|
|
|
|
|
|
|
|
|
|
| |
In this private data structure used during regex compilation, the
'longest' field was an SV** pointer which was always set to point to
one of these two addresses:
&(data->substrs[0].str)
&(data->substrs[1].str)
Instead, just make it a U8 with the value 0 or 1.
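
A small sketch of the idea, assuming simplified stand-in types
(demo_substr, demo_scan_data and demo_longest_str are hypothetical):

```c
/* Hypothetical stand-in: the real field holds an SV*, not a C string. */
typedef struct {
    const char *str;
} demo_substr;

typedef struct {
    demo_substr   substrs[2];
    unsigned char longest;      /* 0 = fixed, 1 = floating */
} demo_scan_data;

/* The index selects the entry; no pointer into the struct is stored. */
static const char *demo_longest_str(const demo_scan_data *data)
{
    return data->substrs[data->longest].str;
}
```

An index also gives access to the other fields of the selected entry,
which a pointer to just the str member does not do as directly.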
|
|
|
|
|
| |
Now that a substring is a separate struct, pass as a single pointer
rather than as 4 separate args.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This private struct is used just within regcomp.c while compiling a
pattern. It has a set of fields for a fixed substring, and a similar set for
floating, e.g.
SV *longest_fixed;
SV *longest_float;
SSize_t *minlen_fixed;
SSize_t *minlen_float;
etc
Instead have a 2 element array, one for fixed, one for float, so e.g.
data->offset_float_max
becomes
data->substrs[1].max_offset
There are 3 reasons for doing this.
First, it makes the code more regular, and allows a whole substr ptr to be
passed as an arg to a function rather than having to pass every individual
field;
second, it makes the compile-time struct more similar to the runtime
struct, which already has such an arrangement;
third, it allows for a hypothetical future expansion where there aren't
necessarily at most 1 fixed and 1 floating substring.
Note that a side effect of this commit has been to change
lookbehind_fixed from I32 to SSize_t; lookbehind_float was already
SSize_t, so the I32 was probably a bug.
|
|
|
|
|
|
| |
this worked because 'a' and 'o' are treated the same for all intents
and purposes, but it is confusing as 'a' stands for array, and 'o'
for hash, and the DEBUG mode code here adds two arrays not hashes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the original code you'd have to be very, very careful:
if (foo)
CLEAR_POSIX_WARNINGS_AND_RETURN(42);
would have expanded to
if (foo)
CLEAR_POSIX_WARNINGS();
return 42; /* always returns! */
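
The usual C remedy for this hazard is to wrap a multi-statement macro
body in do { ... } while (0) so that it expands to a single statement.
A minimal sketch, assuming hypothetical DEMO_ names (the real macro
body and the exact wrapper used in the fix are not shown here):

```c
static int demo_warnings_pending = 1;

/* Hypothetical stand-in for the real cleanup macro. */
#define DEMO_CLEAR_WARNINGS() (demo_warnings_pending = 0)

/* do { ... } while (0) makes the body a single statement, so the macro
 * is safe as the sole statement under an unbraced 'if'. */
#define DEMO_CLEAR_WARNINGS_AND_RETURN(ret) \
    do {                                    \
        DEMO_CLEAR_WARNINGS();              \
        return (ret);                       \
    } while (0)

static int demo_check(int foo)
{
    if (foo)
        DEMO_CLEAR_WARNINGS_AND_RETURN(42);
    return 0;   /* reached only when foo is false */
}
```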
|
| |
|
| |
|
|
|
|
|
| |
Here, there is no advantage to assigning a variable within an 'if', and
it is somewhat harder to read, so don't do it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After the 5.26.0 code freeze, it came out that an application that many
others depend on, GNU Autoconf, has an unescaped '{' in it. Commit
7335cb814c19345052a23bc4462c701ce734e6c5 created a kludge that was
minimal, and designed to get just that one application to work.
I originally proposed a less kludgy patch that was applicable across a
larger set of applications. The proposed patch didn't fatalize uses
of unescaped '{' where we don't anticipate using it for something other
than its literal self. That approach worked for Autoconf and for far
more instances besides, but it was more complicated, and was rejected as
being too risky during code freeze.
Now this commit implements my original suggestion. I am putting it in
now, to let it soak in blead, in case something else surfaces besides
Autoconf, that we need to work around. By having experience with the
patch live, we can be more confident about using it, if necessary, in a
dot release.
|
|
|
|
|
|
|
| |
Sometimes it is convenient and/or necessary to do an assignment within a
clause of an 'if', but it adds a little cognitive load. In this case,
it's entirely unnecessary. This patch changes to do the assignment
before the 'if'.
|
|
|
|
|
| |
This change precedes each literal '[' in a [...] class with a
backslash, to make it stand out better as a literal.
|
|
|
|
|
|
| |
Instead of using a bunch of branches, use strchr() to see if a
character is a member of a class. This is a common paradigm in the
parsers.
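
A minimal sketch of the strchr() idiom, assuming a hypothetical helper
name (demo_in_class) and an arbitrary example set:

```c
#include <string.h>

/* Returns 1 if c is one of the characters in 'set'. The c != '\0' guard
 * matters because strchr() also matches the terminating NUL. */
static int demo_in_class(int c, const char *set)
{
    return c != '\0' && strchr(set, c) != NULL;
}
```

So a chain such as c == '{' || c == '[' || c == '(' becomes
demo_in_class(c, "{[(").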
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See [perl #130497]
GNU Autoconf depends on Perl, and will not work on Blead (and the
forthcoming Perl 5.26), due to a single unescaped '{', that has
previously been deprecated and is now fatal. A patch for it has been in
the Autoconf repository since early 2013, but there has not been a
release since before then.
Because this is depended on by so much code, and because it is simpler
than trying to revert to making the fatality merely deprecated, this
patch simply changes perl to not die when compiled with the exact
pattern that trips up Autoconf. Thus Autoconf can continue to work, but
any other patterns that use the now illegal construct will continue to
die. If other code uses the exact pattern, they too will not die, but
the deprecation message continues to get raised. The use of the left
brace in this particular pattern is not one where we envision using the
construct to mean something else, so a deprecation is suitable for the
foreseeable future.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT #130841
In general code, change this idiom:
PL_foo_max += size;
Renew(PL_foo, PL_foo_max, foo_t);
to
Renew(PL_foo, PL_foo_max + size, foo_t);
PL_foo_max += size;
so that if Renew dies, PL_foo_max won't be left hanging.
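
The same ordering can be sketched in plain C. This is only an analogy:
realloc() reports failure by returning NULL rather than dying as Renew
does, and demo_pool / demo_pool_grow are hypothetical names.

```c
#include <stdlib.h>

typedef struct {
    int    *items;
    size_t  max;    /* number of slots actually allocated */
} demo_pool;

/* Returns 0 on success, -1 on failure. 'max' is updated only after the
 * reallocation has succeeded, so a failure leaves the pool consistent. */
static int demo_pool_grow(demo_pool *p, size_t extra)
{
    size_t new_max = p->max + extra;
    int *tmp = realloc(p->items, new_max * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    p->items = tmp;
    p->max = new_max;
    return 0;
}
```

Either way the point is the same: the recorded size must never claim
more slots than the buffer really has.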
|
|
|
|
| |
Originally noted as a scoping issue by Andy Lester.
|
|
|
|
| |
This reverts commit bfdc8cd3d5a81ab176f7d530d2e692897463c97d.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These names sparked some controversy when created:
http://www.nntp.perl.org/group/perl.perl5.porters/2016/03/msg235216.html
I looked through existing code for paradigms to follow, and found some
occurrences of 'skip_foo_mg'. So this commit changes the names to be
av_top_index_skip_len_mg()
av_tindex_skip_len_mg()
This is explicit about the type of magic that is ignored, and will still
be valid if another type of magic ever gets added.
|
|
|
|
|
| |
Even though code calling S_pat_upgrade_to_utf8 from the
Perl_re_op_compile is testing the code_blocks for NULLness.
|
|
|
|
| |
See 147e38468b8279e26a0ca11e4efd8492016f2702 for complete explanation
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
study_chunk() for CURLYX is used to set flags on the linked WHILEM
node to say it is the whilem_c'th of whilem_seen. However it assumes
each CURLYX can be studied only once, which is not the case - there
are various cases such as GOSUB which call study_chunk() recursively
on already-visited parts of the program.
Storing the wrong index can cause the super-linear cache handling in
regmatch() to read/write the byte after the end of poscache.
Also reported in [perl #129281].
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT #130650 heap-use-after-free in S_free_codeblocks
When compiling qr/(?{...})/, a reg_code_blocks structure is allocated
and various SVs are attached to it. Initially this is set to be freed
via a destructor on the savestack, in case of early dying. Later the
structure is attached to the compiling regex, and a boolean flag in the
structure, 'attached', is set to true to show that the destructor no
longer needs to free the struct.
However, it is possible to get three orders of destruction:
1) allocate, push destructor, die early
2) allocate, push destructor, attach to regex, die
3) allocate, push destructor, attach to regex, succeed
In 2, the regex is freed (via the savestack) before the destructor is
called. In 3, the destructor is called, then later the regex is freed.
It turns out perl can't currently handle case 2:
qr'(?{})\6'
Fix this by turning the 'attached' boolean field into an integer refcount,
then keep a count of whether the struct is referenced from the savestack
and/or the regex. Since it normally has a value of 1 or 2, it's similar
to a boolean flag, but crucially it no longer just indicates that the
regex has a pointer to it ('attached'), but that at least one of the
savestack and regex have a pointer to it. So order of freeing no longer
matters.
I also updated S_free_codeblocks() so that it nulls out SV pointers in
the reg_code_blocks struct before freeing them. This is generally good
practice to avoid double frees, although is probably not needed at the
moment.
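
A minimal sketch of the refcount scheme, assuming a cut-down struct and
hypothetical function names (demo_blocks, demo_blocks_new,
demo_blocks_attach, demo_blocks_release); the real code deals with SVs
and the savestack, which are omitted here:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the code-blocks header. */
typedef struct {
    int   refcnt;   /* holders: the savestack destructor and/or the regex */
    void *body;
} demo_blocks;

static int demo_frees = 0;  /* counts real frees, for illustration only */

static demo_blocks *demo_blocks_new(void)
{
    demo_blocks *cb = malloc(sizeof *cb);
    if (cb) {
        cb->refcnt = 1;     /* held by the pending destructor */
        cb->body = NULL;
    }
    return cb;
}

/* Called when a second holder (the regex) takes a pointer. */
static void demo_blocks_attach(demo_blocks *cb)
{
    cb->refcnt++;
}

/* Called by each holder when it lets go; the last one frees. */
static void demo_blocks_release(demo_blocks *cb)
{
    if (--cb->refcnt > 0)
        return;
    free(cb->body);
    free(cb);
    demo_frees++;
}
```

Whichever holder releases last does the freeing, so the regex being
freed before the destructor runs and the destructor running before the
regex is freed both work.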
|
|
|
|
|
|
|
|
|
| |
77c8f26370dcc0e added support for a doubled x regexp flag, and ensured
the doubled flag was passed to the qr// created by
S_compile_runtime_code().
Unfortunately it didn't ensure enough space was allocated for that
extra 'x'.
|
|
|
|
| |
As per bb78386f13.
|
|
|
|
|
| |
This assert will fail if someone adds code that optimises away a GOSUB
call. At which point they will see the comment and know what to do.
|
|
|
|
|
|
| |
In 31fc93954d1f379c7a49889d91436ce99818e1f6 I added code that would modify
NEXT_OFF() when we were not in PASS2, when we should not do so. Strangely this
did not segfault when I tested, but this fix is required.
|
|
|
|
|
|
| |
Had these docs been here I would have saved some time debugging. So
save the next guy from the same trouble... (with my memory *I* might
even be the /next guy/. Sigh.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
not friends
Instead of optimising away impossible quantifiers like (foo){1,0}, treat them
as unquantified, and guard them with an OPFAIL. Thus /(foo){1,0}/ is treated
the same as /(*FAIL)(foo)/. This is important in patterns like /(foo){1,0}|(?1)/
where the (?1) needs to be able to recurse into the (foo) even though the
(foo){1,0} can never match. It also resolves various issues (SEGVs) with patterns
like /((?1)){1,0}/.
This patch would have been easier if S_reginsert() documented that it is
the caller's responsibility to properly set up the NEXT_OFF() of the inserted
node (if the node has a NEXT_OFF()).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[perl #129140] attempting double-free
This fixes some leaks and double frees in regexes which contain code
blocks.
During compilation, an array of struct reg_code_block's is malloced.
Initially this is just attached to the RExC_state_t struct local var in
Perl_re_op_compile(). Later it may be attached to a pattern. The difficulty
is ensuring that the array is free()d (and the ref counts contained within
decremented) should compilation croak early, while avoiding double frees
once the array has been attached to a regex.
The current mechanism of making the array the PVX of an SV is a bit flaky,
as the array can be realloced(), and code can be re-entered when utf8 is
detected mid-compilation.
This commit changes the array into separately malloced head and body.
The body contains the actual array, and can be realloced. The head
contains a pointer to the array, plus size and an 'attached' boolean.
This indicates whether the struct has been attached to a regex, and is
effectively a 1-bit ref count.
Whenever a head is allocated, SAVEDESTRUCTOR_X() is used to call
S_free_codeblocks() to free the head and body on scope exit. This function
skips the freeing if 'attached' is true, and this flag is set only at the
point where the head gets attached to the regex.
In one way this complicates the code, since the num_code_blocks field is now
not always available (it's only there if a head has been allocated), but
mainly it simplifies, since all the book-keeping is now done in the two
new static functions S_alloc_code_blocks() and S_free_codeblocks().
|
|
|
|
|
|
|
|
|
| |
"use re 'strict'" is supposed to warn if the start and end points of a
range are digits that aren't from the same group of 10, for example if you
mix Bengali and Thai digits. It wasn't working properly for the 5 groups of
mathematical digits starting at U+1D7CE. This commit fixes that, and
refactors the code to bail out as soon as it discovers that no warning
is warranted, instead of doing unnecessary work.
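
The "same group of 10" test can be sketched as a range check against
the group's zero digit. This is an illustration only:
demo_same_digit_group is a hypothetical helper, and the real code
finds the groups from Unicode property data rather than being handed
the zero.

```c
/* Returns 1 if start..end is a range of digits that all belong to the
 * group of 10 whose zero digit is the code point 'zero'. */
static int demo_same_digit_group(unsigned long start, unsigned long end,
                                 unsigned long zero)
{
    return start >= zero && start <= end && end <= zero + 9;
}
```

For instance, with the Bengali zero at U+09E6, the range U+09E6..U+09EF
passes, while a range ending at a Thai digit such as U+0E59 does not.
The mathematical digits are five consecutive groups starting at
U+1D7CE, so U+1D7D8..U+1D7E1 is one group but U+1D7CE..U+1D7E1 spans
two.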
|
|
|
|
|
|
|
|
|
|
|
| |
Starting in 5.14, we deprecated the use of "\cX" when this
results in a printable character. For instance, "\c:" is just
a fancy way of writing "z". Starting in 5.28, this will be a
fatal error.
This also includes certain usage in regular expressions with the
experimental (?[ ]) construct, or when "use re 'strict'" is in
effect (also experimental).
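
Why "\c:" comes out as "z" can be seen from the mapping itself. A
minimal sketch, assuming an ASCII platform and a hypothetical helper
name (demo_ctrl): the character is upper-cased and then XORed with 64.

```c
#include <ctype.h>

/* Maps X to the character "\cX" denotes on an ASCII platform:
 * upper-case X, then flip bit 6 (XOR with 64). */
static int demo_ctrl(int x)
{
    return toupper((unsigned char)x) ^ 64;
}
```

For letters this yields a control character ('A' gives 1), but ':' is
58 and 58 ^ 64 is 122, which is the printable 'z'.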
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 5.26, some uses of unescaped left braces were made fatal; they have
given a deprecation warning since 5.20. Due to an oversight, some cases
were missed, and did not give a deprecation warning. They do now.
This patch changes said deprecation warning to mention the Perl version
in which the use of an unescaped left brace will be fatal (5.30).
The patch also cleans up some unnecessary quotes inside a C<> construct
in the discussion of this warning in perldiag.pod.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit generates a warning when the experimental 're strict'
feature is in effect for unescaped '}' and ']' characters (in a regular
expression pattern) that are interpreted literally.
This brings the behavior of these more in line with ')' which croaks
when it is taken literally.
The problem with the existing behavior is that these characters may be
metacharacters or they may be literals, depending on action at a
distance. Not so with ')', which is always a metacharacter unless
escaped.
Ideally, all three of these characters should behave similarly, but it
really is too late for that, except we can warn if the user has
requested extra checking of their patterns with this experimental
're strict' feature.
|
| |
|
|
|
|
|
| |
This was first proposed in the thread starting at
http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
|
|
|
|
|
| |
This was used for the removed feature of having the source in a
different encoding.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was introduced in a1a5ec35e6a3df0994b103aadb28a8c1a3a278da, and was
due to a thinko on my part. Zefram figured it out.
A macro evaluating to a string constant returns an instance of that
constant. Compilers are free to collapse all instances into a single
one (which saves space), or to have multiple copies. The code was
assuming the former, but HP-UX cc does the latter.
The passed size also was one byte larger than it should have been.
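
A sketch of one way to avoid depending on literal merging, assuming
hypothetical names (DEMO_NAME, demo_name, demo_is_name); it is not
taken from the actual fix:

```c
#include <string.h>

#define DEMO_NAME "demo"   /* each use may be a distinct copy */

/* A single named object has exactly one address, unlike the macro. */
static const char demo_name[] = DEMO_NAME;

static int demo_is_name(const char *p)
{
    /* Cheap identity check first, then fall back to comparing contents. */
    return p == demo_name || strcmp(p, demo_name) == 0;
}
```

On the size point, note that sizeof(DEMO_NAME) counts the trailing NUL,
so the length of the text itself is sizeof(DEMO_NAME) - 1.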
|
|
|
|
| |
"warning: unused variable 'i' [-Wunused-variable]"
|