| Commit message | Author | Age | Files | Lines |
| |
Commit c74340f9 added backreferences as well as the idea of a ->swap
regex pointer to keep track of the match offsets in case of backtracking.
The problem is that when Perl re-enters the regex engine to handle
utf8::SWASHNEW, the ->swap is not saved/restored/cleared, so any capture
from the utf8 (Perl) code could inadvertently modify the regex match
data that caused the utf8 swash to get built.
This change should close out RT #60508.
| |
Calculate memory allocation using regexp and XPVIO, and the offset of the first
real structure member. This avoids tripping over alignment differences between
X* and x*_allocated, because x*_allocated doesn't have a double in it.
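
The size calculation described above can be sketched in C. The struct and function names below are hypothetical stand-ins, not Perl's real regexp or XPVIO layouts; the point is only that subtracting the offset of the first real member from the full struct size can give a different answer than taking sizeof a separately declared "shadow" struct that lacks the double.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical full body: the leading double is never used by this type,
   but it raises the alignment (and so the padded size) of the whole struct. */
struct full_body {
    double unused_nv;
    int    first_real;   /* first member that is really needed */
    int    second;
    int    third;
};

/* Hypothetical shadow struct in the style of x*_allocated: same tail
   members, but no double, so it may have a weaker alignment. */
struct trimmed_body {
    int first_real;
    int second;
    int third;
};

/* Size derived from the full struct and the offset of its first real member. */
static size_t tail_size_from_offset(void)
{
    size_t skip = offsetof(struct full_body, first_real);
    assert(skip <= sizeof(struct full_body));
    return sizeof(struct full_body) - skip;
}

/* Size derived from the shadow struct; this ignores the padding that the
   double forces onto the end of the full struct. */
static size_t tail_size_from_shadow(void)
{
    return sizeof(struct trimmed_body);
}
```

On ABIs where double is 8-byte aligned and int is 4 bytes, the two functions return different values because the full struct is padded to a multiple of 8. Deriving the size from the full struct keeps the allocation consistent with the layout the rest of the code actually uses.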
| |
"Hugo van der Sanden via RT" <perlbug-followup@perl.org> wrote:
:This is caused by a failure of the start_class optimization in the case
:of lookahead, as per the attached comment.
:
:In more detail: at the point study_chunk() attempts to deal with the
:start_class discovered for the lookahead chunk, we have
:SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
:ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.
[...]
:In other words, we need to stack an alternation of ANDs and ORs to cope
:with this situation, and we don't have a mechanism to do that except to
:recurse into study_chunk() some more.
:
:A simpler short-term fix is instead to throw up our hands in this
:situation, and just nullify start_class. I'm not sure exactly how to do
:that, but it seems the more likely to be achievable for 5.10.1.
This patch implements the simple fix, and passes all tests including
Abigail's test cases for the bug.
Yves: note that I've preserved the 'was' code in this chunk, introduced
by you in the patch [1], discussed in the thread [2]. As far as I can
see the 3 lines propagating ANYOF_EOS via 'was' (and the copy of those
3 lines a little later) are simply doing the wrong thing - they seem
to be saying "when we combine two start classes using SCF_DO_STCLASS_AND,
claim that end-of-string is valid if the first class says it would be
even though the second says it wouldn't be". Removing those lines doesn't
cause any test failures - can you remember why you introduced those lines,
and maybe add a test case that fails without them?
Hugo
[1] http://perl5.git.perl.org/perl.git/commit/b515a41db88584b4fd1c30cf890c92d3f9697760
[2] http://groups.google.co.uk/group/perl.perl5.porters/browse_thread/thread/436187077ef96918/f11c3268394abf89
Message-Id: <200907021036.n62Aa8rv029500@zen.crypt.org>
rt.perl.org #56690
| |
... )
This fixes RT #59734 : Segfault when using (?|) in regexp.
| |
\N, like in Perl 6, is equivalent to . but is not influenced by /s:
it matches any character except \n. Note that when followed by { and
a non-number, \N still introduces a named character.
| |
(Tweaked by rgs)
Message-ID: <496D3F02.6020204@khwilliamson.com>
| |
the regex engine)
Perlbug #60156 and #49302 (and probably others) boil down to the problem
that \s, \w, \d and the POSIX character classes are defined differently
for Unicode strings and for non-Unicode strings. This broke the character class
logic in the regex engine. The easiest way to make the character class logic sane
again is to define new properties which do match.
This change creates new property classes that can be used instead of the
traditional ones (it does not change the previously defined ones). If the
define in regcomp.h:
#define PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS 1
is changed to 0, then the new mappings will be used. This will fix a bunch
of bugs that are reported as TODO items in the new reg_posixcc.t test file.
p4raw-id: //depot/perl@34769
| |
And refactor the code that adds the extra braces into a macro, and make it support the colorization stuff.
p4raw-id: //depot/perl@34766
| |
* Make ANYOF output from regprop easier to read by adding ][ between the Unicode representation and the "ascii" one.
* Make it possible to mark tests in re_tests as TODO.
* Add a TODO test for a complementary character class match that should fail (perl #60156).
* Also add a comment explaining a previous commit (relating to perl #60344).
p4raw-id: //depot/perl@34755
| |
Subject: PATCH [perl #59328] In re's, \N{U+...} doesn't match for ... > 256
Message-ID: <49124B78.2000907@khwilliamson.com>
Date: Wed, 05 Nov 2008 18:42:16 -0700
p4raw-id: //depot/perl@34747
| |
Message-ID: <25940.1225611819@chthon>
Date: Sun, 02 Nov 2008 01:43:39 -0600
p4raw-id: //depot/perl@34698
| |
From: Michael Cartmell (via RT) <perlbug-followup@perl.org>
Message-ID: <rt-3.6.HEAD-27577-1215001078-1211.56526-75-0@perl.org>
p4raw-id: //depot/perl@34697
| |
This is mostly to silence gcc's warning, "format not a string
literal and no format arguments".
p4raw-id: //depot/perl@34694
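
For readers unfamiliar with the gcc warning quoted above, here is a minimal C sketch of the usual remedy. The helper names are hypothetical and are not taken from the Perl source; the technique is simply to pass a literal "%s" format instead of passing a variable as the format string.

```c
#include <assert.h>
#include <stdio.h>

/* The pattern gcc warns about looks like this:
 *
 *     fprintf(out, msg);
 *
 * If msg ever contains a '%', it is interpreted as a conversion
 * specification with no matching argument. */

/* Safe form: the format is a string literal and msg is plain data. */
static int print_message(FILE *out, const char *msg)
{
    assert(out != NULL && msg != NULL);
    return fprintf(out, "%s", msg);
}

/* Same idea when formatting into a caller-supplied buffer. */
static int format_message(char *buf, size_t size, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    return snprintf(buf, size, "%s", msg);
}
```

With the literal format, a message such as "100%s done" is copied through unchanged rather than being parsed as a format.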
| |
erroneous const in dump.c.
p4raw-id: //depot/perl@34675
| |
to Perl_re_compile() can't be const, which means that the pattern
argument to Perl_pregcomp() can't be const, as can't the argument in
the function in the regexp engine structure.
It's a shame that no-one spotted this earlier.
(Again) I may have rendered the documentation inaccurate.
p4raw-id: //depot/perl@34672
| |
p4raw-id: //depot/perl@34653
| |
p4raw-id: //depot/perl@34650
| |
p4raw-id: //depot/perl@34629
| |
p4raw-id: //depot/perl@34628
| |
p4raw-id: //depot/perl@34585
| |
optimization. This was most probably introduced with #28262.
This change fixes perl #59516.
p4raw-id: //depot/perl@34507
| |
See http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-09/msg00590.html
and http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-10/msg00163.html
p4raw-id: //depot/perl@34464
| |
p4raw-id: //depot/perl@34381
| |
along with a bunch of other named-capture-related leaks.
p4raw-id: //depot/perl@34151
| |
a++; so write it as the former, to keep PERL_DEBUG_COW happy.
p4raw-id: //depot/perl@34039
| |
Message-ID: <484D491D.9050704@x-ray.at>
Date: Mon, 09 Jun 2008 17:15:41 +0200
p4raw-id: //depot/perl@34038
| |
p4raw-id: //depot/perl@33853
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80804091738r15d37763lf900d59f8bcc5e81@mail.gmail.com>
p4raw-id: //depot/perl@33667
| |
http://www.nntp.perl.org/group/perl.daily-build.reports/2008/02/msg53937.html
p4raw-id: //depot/perl@33370
| |
-- lastcloseparen is literally the index of the last paren closed.
-- lastparen is the highest index of any paren that has been closed.
With nested parens the two values will differ:
'ab'=~/(a(b))/ will have: lastparen = 2, lastcloseparen = 1
'ab'=~/(a)(b)/ will have: lastparen = lastcloseparen = 2
p4raw-id: //depot/perl@33325
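
The distinction between the two counters can be modelled with a few lines of C. This is a toy model under stated assumptions, not the regex engine's code: the struct and function names are hypothetical, and the model only replays the order in which groups close, with no backtracking.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the two counters described in the commit message. */
struct paren_state {
    unsigned lastparen;       /* highest group index closed so far */
    unsigned lastcloseparen;  /* index of the group closed most recently */
};

/* Record that capture group `group` (1-based) has just been closed. */
static void close_paren(struct paren_state *st, unsigned group)
{
    assert(st != NULL && group > 0);
    if (group > st->lastparen)
        st->lastparen = group;
    st->lastcloseparen = group;
}

/* Replay a sequence of close events and return the final state. */
static struct paren_state replay_closes(const unsigned *groups, size_t count)
{
    struct paren_state st = {0, 0};
    size_t i;
    for (i = 0; i < count; i++)
        close_paren(&st, groups[i]);
    return st;
}
```

For /(a(b))/ the inner group 2 closes before the outer group 1, so replaying the sequence {2, 1} leaves lastparen = 2 and lastcloseparen = 1. For /(a)(b)/ the sequence is {1, 2}, and both counters end at 2.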
| |
p4raw-id: //depot/perl@33324
| |
ability to create landmines that will explode under someone in the
future when they upgrade their compiler to one with better
optimisation. We've already done this at least twice.
(Yes, some of the assertions are after code that would already have
SEGVd because it already dereferences a pointer, but they are put in
to make it easier to automate checking that each and every case is
covered.)
Add a tool, checkARGS_ASSERT.pl, to check that every case is covered.
p4raw-id: //depot/perl@33291
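
A rough C sketch of what a per-function argument assertion of this kind might look like follows. The macro and function names are hypothetical, chosen only for illustration; the beginning of the message is missing here, so the sketch assumes the assertions in question check pointer arguments on entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function macro: one assert per pointer argument.
   Keeping it in a uniformly named macro is what would let a script
   check mechanically that every function has one. */
#define ARGS_ASSERT_COUNT_SPACES \
    assert(str)

static size_t count_spaces(const char *str)
{
    size_t n = 0;

    ARGS_ASSERT_COUNT_SPACES;   /* fails loudly in debug builds if str is NULL */

    for (; *str != '\0'; str++) {
        if (*str == ' ')
            n++;
    }
    return n;
}
```

Placing the macro in every function, even where a later dereference would crash anyway, keeps the convention uniform enough to be verified by a tool.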
| |
Message-ID: <86zlveaewk.fsf@cpan.org>
with two corrections.
Plus remove reg_stringify from embed.fnc and regen.
p4raw-id: //depot/perl@32934
| |
p4raw-id: //depot/perl@32933
| |
(Certain regexps could SEGV if cloned).
p4raw-id: //depot/perl@32932
| |
(and related changes)
p4raw-id: //depot/perl@32880
| |
p4raw-id: //depot/perl@32861
| |
p4raw-id: //depot/perl@32859
| |
be accessed via RXp_PAREN_NAMES(). (They are entirely within the
regexp implementation).
p4raw-id: //depot/perl@32853
| |
p4raw-id: //depot/perl@32852
| |
bit in pmflags, to decide whether the pattern is UTF-8.
p4raw-id: //depot/perl@32851
| |
p4raw-id: //depot/perl@32849
| |
p4raw-id: //depot/perl@32845
| |
Fix up some uses of RX_* macros in the block conditionally compiled
with STUPID_PATTERN_CHECKS.
p4raw-id: //depot/perl@32843
| |
in the SvPVX().
p4raw-id: //depot/perl@32841
| |
Remove RXp_PRECOMP() and RXp_WRAPPED().
Change the parameter of S_debug_start_match() from regexp to REGEXP.
Change its callers [the only part wrong for 5.10.x]
p4raw-id: //depot/perl@32840
| |
the flags. Move its implementation just ahead of sv_2mortal()'s for
CPU cache locality. Refactor all code that can be to use this.
p4raw-id: //depot/perl@32818
| |
But use newSVhek() in preference when possible.
p4raw-id: //depot/perl@32813
| |
flag bits. Right now the only flag bit is SVf_UTF8, which will call
SvUTF8_on() on the new SV for you. Provide a wrapper newSVpvn_utf8(),
which takes a boolean, and passes in SVf_UTF8 if that is true.
Refactor the core to use it where possible. It makes the source code
clearer and smaller, but seems to be swings and roundabouts on object
code size.
p4raw-id: //depot/perl@32807
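
The shape of the API described above, a constructor that takes flag bits plus a thin wrapper that takes a boolean, can be sketched in self-contained C. Everything here is a toy model: `toy_sv`, `TOY_F_UTF8`, `toy_newpvn_flags` and `toy_newpvn_utf8` are hypothetical stand-ins and do not reproduce Perl's SV implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_F_UTF8 0x1u   /* hypothetical stand-in for SVf_UTF8 */

struct toy_sv {
    char    *pv;      /* NUL-terminated copy of the string */
    size_t   len;
    unsigned flags;
};

/* Constructor taking flag bits; the only bit understood is TOY_F_UTF8. */
static struct toy_sv *toy_newpvn_flags(const char *s, size_t len, unsigned flags)
{
    struct toy_sv *sv;

    assert(s != NULL);
    assert((flags & ~TOY_F_UTF8) == 0);   /* reject unknown flag bits */

    sv = malloc(sizeof *sv);
    if (sv == NULL)
        return NULL;
    sv->pv = malloc(len + 1);
    if (sv->pv == NULL) {
        free(sv);
        return NULL;
    }
    memcpy(sv->pv, s, len);
    sv->pv[len] = '\0';
    sv->len = len;
    sv->flags = flags;                    /* plays the role of SvUTF8_on() */
    return sv;
}

/* Wrapper taking a boolean, which passes the flag only if it is true. */
#define toy_newpvn_utf8(s, len, is_utf8) \
    toy_newpvn_flags((s), (len), (is_utf8) ? TOY_F_UTF8 : 0u)

static void toy_free(struct toy_sv *sv)
{
    if (sv != NULL) {
        free(sv->pv);
        free(sv);
    }
}
```

Callers that already hold a boolean can use the wrapper and avoid a separate "create, then set the flag" step, which is the source-level simplification the message refers to.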
| |
and regexp reference counting is via the regular SV reference counting.
This was not as easy as it looks.
p4raw-id: //depot/perl@32804