| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
The extended charclass parser makes some assumptions during the
first pass which are only true on well-structured input, and it
does not properly catch various errors. Later on, the code assumes
that things the first pass lets through are valid, when in
fact they should trigger errors.
(cherry picked from commit 19a498a461d7c81ae3507c450953d1148efecf4f)
|
|
|
|
|
|
|
| |
This allows things to work properly in the face of embedded NULs.
See the branch merge message for more information.
(cherry picked from commit 43b2f4ef399e2fd7240b4eeb0658686ad95f8e62)
|
| |
|
|
|
|
|
|
|
| |
This checks for and aborts if it finds control characters in a supposed
Unicode property name. Code further along could not handle these.
This also fixes #132553 and #132658.
|
|
|
|
|
|
|
|
| |
encounter a sharp S
This could lead to a buffer overflow.
(cherry picked from commit a02c70e35d1313a5f4e245e8f863c810e991172d)
|
|
|
|
|
|
| |
Why reginsert doesn't do this stuff I don't know.
(cherry picked from commit 4dc12118f61b997fbd030230665b46e7c40f32d6)
|
|
|
|
|
|
| |
Fixes [perl #131893].
(cherry picked from commit 6c4f4eb174d1e2e9f874786123a699d11ae741f9)
|
|
|
|
| |
(cherry picked from commit f1d945b85ac2d18ddd1ed2e1d4f72011246d905a)
|
|
|
|
| |
(cherry picked from commit 910a6a8be166fb3780dcd2520e3526e537383ef2)
|
|
|
|
|
|
|
|
|
|
| |
The cause of this is that the vFAIL macro uses RExC_parse, and that
variable has just been changed in preparation for code after the vFAIL.
The solution is to not change RExC_parse until after the vFAIL.
This is a case where the macro hides stuff that can bite you.
(cherry picked from commit 2be4edede4ae226e2eebd4eff28cedd2041f300f)
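The ordering problem described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `parse_pos`, `REPORT_FAIL` and `parse_braced` are hypothetical stand-ins, not the real RExC_parse or vFAIL. The point is that the error macro silently reads the cursor, so the cursor must not move until after the error path has been taken.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parser cursor, standing in for RExC_parse. */
static const char *parse_pos;
static char last_error[64];

/* Hypothetical error macro: like vFAIL, it reads the cursor implicitly. */
#define REPORT_FAIL(msg) \
    ((void)snprintf(last_error, sizeof last_error, \
                    "%s at \"%s\"", (msg), parse_pos))

/* Parse "{name}" at the cursor. Returns 0 on success, -1 on error. */
static int parse_braced(void)
{
    const char *start = parse_pos;
    const char *end;

    assert(start != NULL && *start == '{');
    end = strchr(start, '}');
    if (end == NULL || end == start + 1) {
        /* Cursor has not moved yet, so the message points at the bad text. */
        REPORT_FAIL("bad braced name");
        return -1;
    }
    parse_pos = end + 1;    /* moved only after every failure check */
    return 0;
}
```

Had `parse_pos = end + 1` been placed before the check, the message would have quoted whatever follows the braces instead of the offending construct.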
|
|
|
|
| |
(cherry picked from commit 96c83ed78aeea1a0496dd2b2d935869a822dc8a5)
|
|
|
|
| |
(cherry picked from commit bab0f8e933b383b6bef406d79c2da340bbcded33)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See [perl #130497]
GNU Autoconf depends on Perl, and will not work on Blead (and the
forthcoming Perl 5.26), due to a single unescaped '{', that has
previously been deprecated and is now fatal. A patch for it has been in
the Autoconf repository since early 2013, but there has not been a
release since before then.
Because this is depended on by so much code, and because it is simpler
than trying to revert to making the fatality merely deprecated, this
patch simply changes perl to not die when compiled with the exact
pattern that trips up Autoconf. Thus Autoconf can continue to work, but
any other patterns that use the now illegal construct will continue to
die. If other code uses the exact pattern, they too will not die, but
the deprecation message continues to get raised. The use of the left
brace in this particular pattern is not one where we envision using the
construct to mean something else, so a deprecation is suitable for the
foreseeable future.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT #130841
In general code, change this idiom:
    PL_foo_max += size;
    Renew(PL_foo, PL_foo_max, foo_t);
to
    Renew(PL_foo, PL_foo_max + size, foo_t);
    PL_foo_max += size;
so that if Renew dies, PL_foo_max won't be left hanging.
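A minimal standalone sketch of the same idiom, using plain realloc() rather than perl's Renew(). The names `foo_buf` and `foo_grow` are hypothetical, and failure is signalled by a return value here, whereas Renew() dies. In both cases the recorded size is committed only once the allocation has succeeded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer standing in for PL_foo / PL_foo_max. */
typedef struct {
    int    *items;
    size_t  max;    /* number of allocated slots */
} foo_buf;

/* Grow buf by `extra` slots. Returns 0 on success, -1 on failure.
 * On failure buf->items and buf->max still describe the old allocation. */
static int foo_grow(foo_buf *buf, size_t extra)
{
    size_t new_max = buf->max + extra;
    int *p;

    if (extra == 0)
        return 0;                       /* nothing to do */
    if (new_max < buf->max || new_max > (size_t)-1 / sizeof *p)
        return -1;                      /* size computation would overflow */
    p = realloc(buf->items, new_max * sizeof *p);
    if (p == NULL)
        return -1;                      /* old buffer and old max untouched */
    buf->items = p;
    buf->max   = new_max;               /* commit only after success */
    assert(buf->max >= extra);
    return 0;
}
```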
|
|
|
|
| |
Originally noted as a scoping issue by Andy Lester.
|
|
|
|
| |
This reverts commit bfdc8cd3d5a81ab176f7d530d2e692897463c97d.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These names sparked some controversy when created:
http://www.nntp.perl.org/group/perl.perl5.porters/2016/03/msg235216.html
I looked through existing code for paradigms to follow, and found some
occurrences of 'skip_foo_mg'. So this commit changes the names to be
    av_top_index_skip_len_mg()
    av_tindex_skip_len_mg()
This is explicit about the type of magic that is ignored, and will still
be valid if another type of magic ever gets added.
|
|
|
|
|
| |
Even though code calling S_pat_upgrade_to_utf8 from the
Perl_re_op_compile is testing the code_blocks for NULLness.
|
|
|
|
| |
See 147e38468b8279e26a0ca11e4efd8492016f2702 for complete explanation
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
study_chunk() for CURLYX is used to set flags on the linked WHILEM
node to say it is the whilem_c'th of whilem_seen. However it assumes
each CURLYX can be studied only once, which is not the case - there
are various cases such as GOSUB which call study_chunk() recursively
on already-visited parts of the program.
Storing the wrong index can cause the super-linear cache handling in
regmatch() to read/write the byte after the end of poscache.
Also reported in [perl #129281].
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT #130650 heap-use-after-free in S_free_codeblocks
When compiling qr/(?{...})/, a reg_code_blocks structure is allocated
and various SVs are attached to it. Initially this is set to be freed
via a destructor on the savestack, in case of early dying. Later the
structure is attached to the compiling regex, and a boolean flag in the
structure, 'attached', is set to true to show that the destructor no
longer needs to free the struct.
However, it is possible to get three orders of destruction:
1) allocate, push destructor, die early
2) allocate, push destructor, attach to regex, die
3) allocate, push destructor, attach to regex, succeed
In 2, the regex is freed (via the savestack) before the destructor is
called. In 3, the destructor is called, then later the regex is freed.
It turns out perl can't currently handle case 2:
qr'(?{})\6'
Fix this by turning the 'attached' boolean field into an integer refcount,
then keep a count of whether the struct is referenced from the savestack
and/or the regex. Since it normally has a value of 1 or 2, it's similar
to a boolean flag, but crucially it no longer just indicates that the
regex has a pointer to it ('attached'), but that at least one of the
savestack and regex have a pointer to it. So order of freeing no longer
matters.
I also updated S_free_codeblocks() so that it nulls out SV pointers in
the reg_code_blocks struct before freeing them. This is generally good
practice to avoid double frees, although it is probably not needed at the
moment.
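The refcount scheme can be sketched as follows. This is an illustration only: `code_blocks`, `cb_alloc`, `cb_attach` and `cb_release` are hypothetical names, the payload is a plain int array rather than SVs, and `live_structs` exists purely so the behaviour can be observed. Each owner (savestack destructor, regex) holds one count and calls the same release function, so the struct is freed by whichever owner lets go last.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct reg_code_blocks. */
typedef struct {
    int  refcnt;    /* one count per owner: savestack and/or regex */
    int *blocks;    /* stand-in payload */
} code_blocks;

static int live_structs;    /* demonstration only: structs not yet freed */

/* Allocate with one reference, held by the savestack destructor. */
static code_blocks *cb_alloc(void)
{
    code_blocks *cb = malloc(sizeof *cb);

    if (cb == NULL)
        return NULL;
    cb->refcnt = 1;
    cb->blocks = NULL;
    live_structs++;
    return cb;
}

/* The regex takes a second reference when the struct is attached to it. */
static void cb_attach(code_blocks *cb)
{
    assert(cb->refcnt > 0);
    cb->refcnt++;
}

/* Called by either owner; frees only when the last reference goes away. */
static void cb_release(code_blocks *cb)
{
    assert(cb->refcnt > 0);
    if (--cb->refcnt > 0)
        return;
    free(cb->blocks);
    cb->blocks = NULL;      /* null out before freeing the container */
    free(cb);
    live_structs--;
}
```

With this shape, cases 2 and 3 differ only in which owner calls cb_release() first, and neither order leads to a use after free.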
|
|
|
|
|
|
|
|
|
| |
77c8f26370dcc0e added support for a doubled x regexp flag, and ensured
the doubled flag was passed to the qr// created by
S_compile_runtime_code().
Unfortunately it didn't ensure enough space was allocated for that
extra 'x'.
|
|
|
|
| |
As per bb78386f13.
|
|
|
|
|
| |
This assert will fail if someone adds code that optimises away a GOSUB
call. At which point they will see the comment and know what to do.
|
|
|
|
|
|
| |
In 31fc93954d1f379c7a49889d91436ce99818e1f6 I added code that would modify
NEXT_OFF() when we were not in PASS2, when we should not do so. Strangely this
did not segfault when I tested, but this fix is required.
|
|
|
|
|
|
| |
Had these docs been here I would have saved some time debugging. So
save the next guy from the same trouble... (with my memory *I* might
even be the /next guy/. Sigh.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
not friends
Instead of optimising away impossible quantifiers like (foo){1,0}, treat them
as unquantified, and guard them with an OPFAIL. Thus /(foo){1,0}/ is treated
the same as /(*FAIL)(foo)/. This is important in patterns like /(foo){1,0}|(?1)/
where the (?1) needs to be able to recurse into the (foo) even though the
(foo){1,0} can never match. It also resolves various issues (SEGVs) with patterns
like /((?1)){1,0}/.
This patch would have been easier if S_reginsert() documented that it is
the caller's responsibility to properly set up the NEXT_OFF() of the inserted
node (if the node has a NEXT_OFF()).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[perl #129140] attempting double-free
This fixes some leaks and double frees in regexes which contain code
blocks.
During compilation, an array of struct reg_code_block's is malloced.
Initially this is just attached to the RExC_state_t struct local var in
Perl_re_op_compile(). Later it may be attached to a pattern. The difficulty
is ensuring that the array is free()d (and the ref counts contained within
decremented) should compilation croak early, while avoiding double frees
once the array has been attached to a regex.
The current mechanism of making the array the PVX of an SV is a bit flaky,
as the array can be realloced(), and code can be re-entered when utf8 is
detected mid-compilation.
This commit changes the array into separately malloced head and body.
The body contains the actual array, and can be realloced. The head
contains a pointer to the array, plus size and an 'attached' boolean.
This indicates whether the struct has been attached to a regex, and is
effectively a 1-bit ref count.
Whenever a head is allocated, SAVEDESTRUCTOR_X() is used to call
S_free_codeblocks() to free the head and body on scope exit. This function
skips the freeing if 'attached' is true, and this flag is set only at the
point where the head gets attached to the regex.
In one way this complicates the code, since the num_code_blocks field is now
not always available (it's only there if a head has been allocated), but
mainly it simplifies, since all the book-keeping is now done in the two
new static functions S_alloc_code_blocks() and S_free_codeblocks().
|
|
|
|
|
|
|
|
|
| |
"use re 'strict'" is supposed to warn if the start and end points of a
range are digits that aren't from the same group of 10, for example, if you
mix Bengali and Thai digits. It wasn't working properly for the 5 groups of
mathematical digits starting at U+1D7CE. This commit fixes that, and
refactors the code to bail out as soon as it discovers that no warning
is warranted, instead of doing unnecessary work.
|
|
|
|
|
|
|
|
|
|
|
| |
Starting in 5.14, we deprecated the use of "\cI<X>" when this
results in a printable character. For instance, "\c:" is just
a fancy way of writing "z". Starting in 5.28, this will be a
fatal error.
This also includes certain usage in regular expressions with the
experimental (?[ ]) construct, or when "use re 'strict'" is in
effect (also experimental).
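For reference, the "\c:" example above follows from how the control-character escape is computed. The sketch below assumes an ASCII platform and the usual rule of uppercasing X and flipping bit 6 (XOR 64); `ctrl_of` and `ctrl_is_printable` are hypothetical helper names, not perl internals.

```c
#include <assert.h>

/* Map X in "\cX" to the resulting character code (ASCII only). */
static int ctrl_of(int c)
{
    assert(c >= 0 && c < 128);
    if (c >= 'a' && c <= 'z')
        c -= 'a' - 'A';         /* "\ca" and "\cA" are the same */
    return c ^ 64;              /* flip bit 6 */
}

/* Nonzero if "\cX" yields a printable character: the deprecated case. */
static int ctrl_is_printable(int c)
{
    int r = ctrl_of(c);

    return r >= 0x20 && r < 0x7f;
}
```

Here ctrl_of(':') is 58 ^ 64 = 122, which is 'z', while ctrl_of('[') is 27 (ESC) and so is unaffected by the deprecation.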
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 5.26, some uses of unescaped left braces were made fatal; they have
given a deprecation warning since 5.20. Due to an oversight, some cases
were missed, and did not give a deprecation warning. They do now.
This patch changes said deprecation warning to mention the Perl version
in which the use of an unescaped left brace will be fatal (5.30).
The patch also cleans up some unnecessary quotes inside a C<> construct
in the discussion of this warning in perldiag.pod.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit generates a warning when the experimental 're strict'
feature is in effect for unescaped '}' and ']' characters (in a regular
expression pattern) that are interpreted literally.
This brings the behavior of these more in line with ')' which croaks
when it is taken literally.
The problem with the existing behavior is that these characters may be
metacharacters or they may be literals, depending on action at a
distance. Not so with ')', which is always a metacharacter unless
escaped.
Ideally, all three of these characters should behave similarly, but it
really is too late for that, except we can warn if the user has
requested extra checking of their patterns with this experimental
're strict' feature.
|
| |
|
|
|
|
|
| |
This was first proposed in the thread starting at
http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
|
|
|
|
|
| |
This was used for the removed feature of having the source in a
different encoding.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was introduced in a1a5ec35e6a3df0994b103aadb28a8c1a3a278da, and was
due to a thinko on my part. Zefram figured it out.
A macro evaluating to a string constant returns an instance of that
constant. Compilers are free to collapse all instances into a single
one (which saves space), or to have multiple copies. The code was
assuming the former, and HP-UX cc doesn't.
The passed size also was one byte larger than it should have been.
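A short sketch of the portable alternatives, under the assumption that the code needs to recognise "the default string". The names `default_name`, `is_default_ptr` and `is_default_content` are hypothetical. Comparing a pointer against a macro that expands to a literal is only reliable if the compiler merges literals; comparing against one named object, or comparing contents, does not depend on that.

```c
#include <assert.h>
#include <string.h>

/* Exactly one instance of the string, regardless of compiler behaviour. */
static const char default_name[] = "default";

/* Pointer identity is well defined here because there is a single object. */
static int is_default_ptr(const char *s)
{
    assert(s != NULL);
    return s == default_name;
}

/* Content comparison works for any copy of the string. */
static int is_default_content(const char *s)
{
    assert(s != NULL);
    return strcmp(s, default_name) == 0;
}
```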
|
|
|
|
| |
"warning: unused variable 'i' [-Wunused-variable]"
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This creates several macros that future commits will use to provide a
layer between the caller and the function.
|
|
|
|
| |
Better to use the macro than to directly call the function it wraps
|
|
|
|
|
|
|
| |
The previous commit added this feature; now this commit uses it in core.
toke.c is deferred to the next commit to aid in possible future
bisecting, because some of the changes there seem somewhat more likely
to expose bugs.
|
|
|
|
|
|
|
| |
This function is equivalent to sv_setsv(sv, &PL_sv_undef), but more
efficient.
Also change the obvious places in the core to use the new idiom.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
C++11 requires space between the end of a string literal and a macro, so
that a feature can unambiguously be added to the language. Starting in
g++ 6.2, the compiler emits a warning when there isn't a space
(presumably so that future versions can support C++11). Unfortunately
there are many such instances in the perl core. This commit fixes
those, including those in ext/, but individual commits will be used for
the other modules, those in dist/ and cpan/.
This commit also inserts space at the end of a macro before a string
literal, even though that is not deprecated, and removes useless ""
literals following a macro (instead of inserting a blank). The result
is easier to read, making the macro stand out, and be clearer as to the
intention.
Code and modules included with the Perl core need to be compilable using
C++. This is so that perl can be embedded in C++ programs. (Actually,
only the hdr files need to be so compilable, but it would be hard to
test that just the hdrs are compilable.) So we need to accommodate
changes to the C++ language.
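As an illustration of the spacing rule, here is a small C fragment that is also acceptable to a C++11 compiler. `UNIT_FMT` and `format_delay` are hypothetical names. Written without the spaces, as "delay: "UNIT_FMT"\n", a C++11 compiler would try to read UNIT_FMT as a user-defined-literal suffix on the preceding string.

```c
#include <assert.h>
#include <stdio.h>

#define UNIT_FMT "%d ms"    /* hypothetical format macro */

/* Format a delay into buf; returns the snprintf() result. */
static int format_delay(char *buf, size_t len, int ms)
{
    assert(buf != NULL && len > 0);
    /* Spaces on both sides of the macro keep the literals separate tokens. */
    return snprintf(buf, len, "delay: " UNIT_FMT "\n", ms);
}
```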
|
|
|
|
|
|
|
|
|
| |
An unclosed \N{ that made it through to the regex engine rather than
being handled by the lexer would erroneously trigger the error for
"\N{NAME} must be resolved by the lexer".
This separates the check for the missing trailing } and issues the
correct error message for this.
|
|
|
|
|
| |
SvTAIL() isn't set on an SV until fbm_compile() has been called,
so there's no point testing it before calling fbm_compile()
|
|
|
|
|
|
|
|
| |
I wrote this code some time ago. It is somewhat of
a state machine with some interesting implicit
assumptions which took me a while to remember. While
I do, it seems reasonable to document them so the next
guy (maybe/probably me) doesn't have to think so hard.
|