| Commit message | Author | Age | Files | Lines |
| |
This changes s/// so that it doesn't act destructively on its target.
Instead it returns the result of the substitution (or the original string if
there was no match).
In addition this patch:
* Adds a new warning when s///r happens in void context.
* Adds an error when you try to use s///r with !~
* Makes it so constant strings can be bound to s///r with =~
* Adds documentation.
* Adds some tests.
* Updates various debug code so it knows about the /r flag.
* Adds some new 'r' words to B::Deparse.
|
|
|
|
| |
This was the only user of xivu_hv in union _xivu, so remove that too.
| |
This is an indirect fix for
[perl #74484] Regex causing exponential runtime+mem usage
The trie runtime code was doing more SAVETMPS than FREETMPS and was thus
growing a large tmps stack on heavy backtracking. Rather than fixing this
directly, I rewrote part of the trie code so that it no longer needs to
allocate memory in S_regmatch (it still does in find_byclass()).
The basic issue is that multiple branches in the trie may trigger an
accept state; for example:
    "abcd" =~ /xyz|abcd.*X|ab.*Y/
here, words (branches) 2 and 3 are accept states. The original approach
was, at run time, to create a list of accepted word numbers and the
character positions of the end of each of those words. Then run the rest
of the pattern for each word in the list in turn (in word index order).
This requires memory for the list to be allocated and freed.
The new approach involves creating extra info at compile time; in
particular, for each word, a pointer to the previous accepted word (if
any) in the state tree. For example for the above pattern, part of the
state tree may be
      a    b    c    d
    1 -> 2 -> 3 -> 4 -> 5
            (#3)      (#2)
(e.g. at state 1, if the next char is 'a', we transition to state 2).
Here, state 3 is an accept state with word #3, and 5 is an accept state
with word #2. So we build a table indexed by word number, which has
wordinfo[2] = 3, wordinfo[3] = 0, thus building the word chain 2->3->0.
At run time we run the trie to completion, and remember the word
associated with the longest accept state (word #2 above). Then by following
back the chain of .prev fields, we can produce a list of all accepting
words. We then iteratively find the smallest-numbered (i.e. leftmost) word in
the chain, and run with it. On failure and backtrack, we find the
next-smallest and so on.
Since we are no longer recording the end-position of each word in the
string, we have to recalculate this for each backtrack. We initially
record the end-position of the shortest accepting word, and given that we
know the length of each word, we can calculate the new position each time
as an offset from that first word. Depending on unicode and folding, that
calculation can be cheap or expensive.
This algorithm is optimised for the typical case where there is a small
number (<= 2) of accepting states.
This patch creates a new compile-time array, trie->wordinfo[], indexed by
word number, which contains relevant info about each word. This also
supersedes the old trie->newword[] array, whose function of recording
"overspills" of multiple words per accept state, is now handled as part of
the wordinfo[].prev chain.
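
The chain walk described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `word_info` struct, the helper names, and the one-byte-per-character position arithmetic are hypothetical simplifications, not the code in regexec.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-word info built at
 * compile time. Index 0 is unused because word numbers start at 1. */
typedef struct {
    unsigned prev;  /* previous accepting word on the path; 0 = none */
    unsigned len;   /* length of the word in characters */
} word_info;

/*
 * Walk the .prev chain starting at `longest` and return the smallest word
 * number strictly greater than `after`, or 0 when no candidate is left.
 * Passing after = 0 yields the first (leftmost) word to try.
 */
static unsigned next_accepting_word(const word_info *wordinfo,
                                    unsigned longest, unsigned after)
{
    unsigned best = 0;
    for (unsigned w = longest; w != 0; w = wordinfo[w].prev) {
        if (w > after && (best == 0 || w < best))
            best = w;
    }
    return best;
}

/*
 * Recompute where word `w` ends, as an offset from the recorded end of the
 * shortest accepting word. Assumes one byte per character with no folding,
 * which is the cheap case mentioned in the commit message.
 */
static size_t word_end(const word_info *wordinfo, unsigned w,
                       unsigned shortest, size_t shortest_end)
{
    assert(wordinfo[w].len >= wordinfo[shortest].len);
    return shortest_end + (wordinfo[w].len - wordinfo[shortest].len);
}
```

For the example pattern, a table of `{0,0}, {0,3}, {3,4}, {0,2}` encodes the chain 2->3->0; calling `next_accepting_word` with `after` set to the word that just failed yields word 2 first, then word 3, then 0.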
| |
Commit c74340f9 added backreferences as well as the idea of a ->swap
regex pointer to keep track of the match offsets in case of backtracking.
The problem is that when Perl re-enters the regex engine to handle
utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture
from the utf8 (Perl) code could inadvertently modify the regex match
data that caused the utf8 swash to get built.
This change should close out RT #60508
|
|
|
|
|
|
| |
Calculate memory allocation using regexp and XPVIO, and the offset of the first
real structure member. This avoids tripping over alignment differences between
X* and x*_allocated, because x*_allocated doesn't have a double in it.
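
A rough sketch of the size calculation this describes, assuming two hypothetical structs (`full_body` and `allocated_body`) in place of the real X* and x*_allocated pairs. The allocation size is derived from the full struct and the offset of its first real member, so any padding caused by the double is accounted for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical full body: two leading members that are never accessed for
 * this type, followed by the members that are really stored. */
struct full_body {
    double  ghost_nv;     /* unused here, but forces double alignment */
    void   *ghost_stash;  /* unused here */
    size_t  cur;          /* first real member */
    size_t  len;
};

/* Hypothetical trimmed twin that starts at the first real member. With no
 * double inside, its alignment and tail padding may differ. */
struct allocated_body {
    size_t cur;
    size_t len;
};

/* Bytes needed for the real members, derived from the full struct alone. */
static size_t body_alloc_size(void)
{
    assert(offsetof(struct full_body, cur) < sizeof(struct full_body));
    return sizeof(struct full_body) - offsetof(struct full_body, cur);
}
```

On a platform where `double` is more strictly aligned than `size_t`, `body_alloc_size()` can be larger than `sizeof(struct allocated_body)`, which is the kind of mismatch the commit avoids.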
|
|
|
|
|
|
|
| |
MUTABLE_SV() check. Use SvPVX_const() instead of SvPVX()
where only a const SV* is available. Also fix two falsely
consted pointers in Perl_sv_2pv_flags().
p4raw-id: //depot/perl@34770
| |
to Perl_re_compile() can't be const, which means that the pattern
argument to Perl_pregcomp() can't be const, as can't the argument in
the function in the regexp engine structure.
It's a shame that no-one spotted this earlier.
(Again) I may have rendered the documentation inaccurate.
p4raw-id: //depot/perl@34672
|
|
|
| |
p4raw-id: //depot/perl@34585
|
|
|
|
|
|
| |
From: "Reini Urban" <rurban@x-ray.at>
Message-ID: <6910a60806080717h1aaaef1fh425a2ef21a62c9ed@mail.gmail.com>
p4raw-id: //depot/perl@34030
|
|
|
|
|
|
|
|
| |
and XS?]
From: "Jan Dubois" <jand@activestate.com>
Message-ID: <02ee01c8651b$17ef72f0$47ce58d0$@com>
p4raw-id: //depot/perl@33292
|
|
|
|
|
|
|
|
|
| |
#if defined(__GNUC__) && !defined(__STRICT_ANSI__) && !defined(PERL_GCC_PEDANTIC)
to
#if defined(__GNUC__) && !defined(PERL_GCC_BRACE_GROUPS_FORBIDDEN)
because the ({}) construction can be used under __STRICT_ANSI__
(and should be, because it avoids temporary use of PL_Sv).
p4raw-id: //depot/perl@33077
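
As an illustration of why the ({}) form is preferable where available, here is a minimal sketch. Only `PERL_GCC_BRACE_GROUPS_FORBIDDEN` comes from the commit message; `SQUARE_ONCE`, `sq_shared_tmp` and `square_of_next` are hypothetical names, with the shared variable standing in for the role PL_Sv plays.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && !defined(PERL_GCC_BRACE_GROUPS_FORBIDDEN)
/* Statement expression: the temporary is local to the macro expansion. */
#  define SQUARE_ONCE(x) ({ int sq_tmp_ = (x); sq_tmp_ * sq_tmp_; })
#else
/* Portable fallback: needs a shared temporary that every use clobbers. */
static int sq_shared_tmp;
#  define SQUARE_ONCE(x) (sq_shared_tmp = (x), sq_shared_tmp * sq_shared_tmp)
#endif

static int square_of_next(int *counter)
{
    assert(counter != NULL);
    /* The argument has a side effect, so it must be evaluated exactly once. */
    return SQUARE_ONCE((*counter)++);
}
```

Both branches evaluate the argument once, but only the fallback touches state outside the expression.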
|
|
|
|
|
|
| |
From: "Robin Barker" <Robin.Barker@npl.co.uk>
Message-ID: <46A0F33545E63740BC7563DE59CA9C6D0939CA@exchsvr2.npl.ad.local>
p4raw-id: //depot/perl@32976
|
|
|
|
|
|
| |
change 32961, and I thought that it was the right thing, but I guess
not. It should have read like this.
p4raw-id: //depot/perl@32962
|
|
|
|
|
|
|
| |
(at least for the most commonly used macros).
Remove the duplicate definition of RX_SUBBEG(), which I was sure I'd
done earlier.
p4raw-id: //depot/perl@32961
|
|
|
| |
p4raw-id: //depot/perl@32925
|
|
|
|
|
|
| |
[I don't get warnings about void context here, but I'm sure someone
will :-(]
p4raw-id: //depot/perl@32890
|
|
|
| |
p4raw-id: //depot/perl@32859
|
|
|
| |
p4raw-id: //depot/perl@32854
|
|
|
|
|
|
| |
be accessed via RXp_PAREN_NAMES(). (They are entirely within the
regexp implementation).
p4raw-id: //depot/perl@32853
|
|
|
| |
p4raw-id: //depot/perl@32852
|
|
|
| |
p4raw-id: //depot/perl@32845
|
|
|
|
|
|
| |
Fix up some uses of RX_* macros in the block conditionally compiled
with STUPID_PATTERN_CHECKS.
p4raw-id: //depot/perl@32843
|
|
|
|
|
| |
in the SvPVX().
p4raw-id: //depot/perl@32841
|
|
|
|
|
|
|
| |
Remove RXp_PRECOMP() and RXp_WRAPPED().
Change the parameter of S_debug_start_match() from regexp to REGEXP.
Change its callers [the only part wrong for 5.10.x]
p4raw-id: //depot/perl@32840
|
|
|
|
|
| |
broken window: ../ext/Compress/Raw/Zlib/t/07bufsize.t)
p4raw-id: //depot/perl@32837
|
|
|
|
|
|
| |
and regexp reference counting is via the regular SV reference counting.
This was not as easy as it looks.
p4raw-id: //depot/perl@32804
|
|
|
|
|
|
|
| |
regcomp.c and regexec.c RXp_* where necessary] so that in future we
can maintain source compatibility when we add an extra level of
dereferencing.
p4raw-id: //depot/perl@32802
|
|
|
|
|
|
|
| |
lowest 4 bits (which saves a shift), and the "flags indicating special
patterns" into contiguous bits. This makes everything a little tidier,
and saves 88 bytes (woohoo!) of object file with -Os on x86 FreeBSD.
p4raw-id: //depot/perl@32775
|
|
|
|
|
|
|
| |
shift 12 in two places, so replace that magic number with a macro
RXf_PMf_STD_PMMOD_SHIFT defined adjacent to the flags it interacts
with.
p4raw-id: //depot/perl@32774
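
The idea of naming the shift next to the flags it depends on might look like the sketch below. All `DEMO_` names, the flag values, and the two helper functions are hypothetical; only the shift amount of 12 and the practice of defining the macro beside the flags come from the commit message.

```c
#include <assert.h>

/* Hypothetical layout: four modifier flags stored in bits 12..15 of a
 * flags word (the values are illustrative, not Perl's). */
#define DEMO_PMf_MULTILINE  0x1000u
#define DEMO_PMf_SINGLELINE 0x2000u
#define DEMO_PMf_FOLD       0x4000u
#define DEMO_PMf_EXTENDED   0x8000u
#define DEMO_PMf_MASK       0xF000u

/* Defined beside the flags it depends on, instead of a bare 12 at each use. */
#define DEMO_STD_PMMOD_SHIFT 12

/* Compact the modifier flags into a 0..15 index, e.g. for a lookup table. */
static unsigned pmmod_index(unsigned pmflags)
{
    return (pmflags & DEMO_PMf_MASK) >> DEMO_STD_PMMOD_SHIFT;
}

/* Inverse: expand an index back into flag bits. */
static unsigned pmmod_flags(unsigned index)
{
    assert(index <= 0xFu);
    return index << DEMO_STD_PMMOD_SHIFT;
}
```

If the flags later move to other bits, only the definitions at the top need to change.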
|
|
|
| |
p4raw-id: //depot/perl@32759
|
|
|
|
|
|
| |
RX_WRAPLEN() to preserve source compatibility when they get moved
around.
p4raw-id: //depot/perl@32758
|
|
|
|
|
|
| |
too much, as the replacement assumes that the wrapping string has
exactly 1 character after the wrapped string [specifically ')'].
p4raw-id: //depot/perl@32757
|
|
|
|
|
|
|
| |
wrapped in pre_prefix, a 4 bit value. (Maybe only for now) reduce
seen_evals from I32 to 28 bits. Will anyone have more than 268435456
eval groups in a regexp?
p4raw-id: //depot/perl@32755
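
A small sketch of the 4-bit plus 28-bit split, assuming a hypothetical `demo_regexp_bits` struct and an unsigned counter. A 28-bit unsigned field holds values up to 2^28 - 1 = 268435455, and the helper below refuses to wrap at that limit; the real struct may handle overflow differently or not at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment: the two fields together occupy 32 bits. */
struct demo_regexp_bits {
    unsigned int pre_prefix : 4;   /* 0..15 */
    unsigned int seen_evals : 28;  /* 0..268435455 */
};

#define DEMO_SEEN_EVALS_MAX ((1u << 28) - 1u)

/* Count one more eval group. Returns 1 on success, or 0 (leaving the
 * counter unchanged) when the 28-bit field is already at its maximum. */
static int note_eval_group(struct demo_regexp_bits *rx)
{
    assert(rx != NULL);
    if (rx->seen_evals == DEMO_SEEN_EVALS_MAX)
        return 0;
    rx->seen_evals++;
    return 1;
}
```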
|
|
|
|
|
|
| |
the macros RX_PRECOMP() and RX_PRELEN(). This will allow us to reduce
the regexp storage overhead by computing them at retrieve time.
p4raw-id: //depot/perl@32753
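
One way accessor macros can compute these values at retrieve time is sketched below. The `demo_regexp` struct, the `DEMO_` macros and `demo_wrap` are hypothetical; the sketch assumes the wrapped form is a prefix such as "(?-xism:", then the pattern, then exactly one trailing ')'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified regexp record: only the wrapped form is stored. */
struct demo_regexp {
    const char  *wrapped;        /* e.g. "(?-xism:ab+c)" */
    size_t       wraplen;        /* strlen(wrapped) */
    unsigned int pre_prefix : 4; /* length of the "(?-xism:" style prefix */
};

/* Accessors derive the original pattern instead of storing it separately.
 * They assume exactly one trailing character, ')', after the pattern. */
#define DEMO_PRECOMP(rx) ((rx)->wrapped + (rx)->pre_prefix)
#define DEMO_PRELEN(rx)  ((rx)->wraplen - (rx)->pre_prefix - 1)

static struct demo_regexp demo_wrap(const char *wrapped, unsigned prefix_len)
{
    struct demo_regexp rx;
    assert(prefix_len <= 15 && strlen(wrapped) > prefix_len);
    rx.wrapped = wrapped;
    rx.wraplen = strlen(wrapped);
    rx.pre_prefix = prefix_len;
    return rx;
}
```

Callers that only go through the accessors keep compiling unchanged if the stored representation moves again.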
|
|
|
| |
p4raw-id: //depot/perl@32237
|
|
|
|
|
|
|
|
| |
regcomp.pl to update regnodes.h
Currently the *NIX makefiles are not set up to update regnodes.h automatically when regexp.h is modified.
This at least warns people modifying the list about what they should do. A better solution is needed.
p4raw-id: //depot/perl@31734
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80708090049p2cf4810ep5a437ad53f64fa78@mail.gmail.com>
p4raw-id: //depot/perl@31693
| |
Date: Fri, 29 Jun 2007 23:38:07 +0200
Message-ID: <20070629213807.GA14454@abigail.nl>
Subject: [PATCH pod/perlre.pod] Keeping up with the changes.
From: Abigail <abigail@abigail.be>
Date: Sat, 30 Jun 2007 01:24:36 +0200
Message-ID: <20070629232436.GA15326@abigail.nl>
Plus tweaks, and debug enhancements.
p4raw-id: //depot/perl@31506
|
|
|
| |
p4raw-id: //depot/perl@31498
| |
their usage.
RXf_ => flags used in pm_flags argument to regcomp
and stored in the regex via rx->extflags
PREGf_ => flags stored in rx->intflags
RXapif_ => argument flags for regex named capture api
RX_BUFF_IDX_ => special indexes to represent $` $' $&
used in the numeric capture buffer api
PREGf is untouched by this change, but RXf_ is split into RXapif and RX_BUFF_IDX_.
p4raw-id: //depot/perl@31497
|
|
|
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason"
<avarab@gmail.com>
Message-ID: <51dd1af80706281306i4dbba39em3eeb8da1d67ea27c@mail.gmail.com>
(with tweaks)
p4raw-id: //depot/perl@31495
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80706172033h1908aa0ge15698204e0b79ed@mail.gmail.com>
p4raw-id: //depot/perl@31409
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80706031324y5618d519p460da27a2e7fe712@mail.gmail.com>
p4raw-id: //depot/perl@31341
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80705160938w13789b63m6d5f4710441ceac@mail.gmail.com>
p4raw-id: //depot/perl@31244
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80705011658g1156e14cw4d2b21a8d772ed41@mail.gmail.com>
p4raw-id: //depot/perl@31130
|
|
|
|
|
| |
Message-ID: <9b18b3110705011446h2113221cndf70af928d72505@mail.gmail.com>
p4raw-id: //depot/perl@31118
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80704261922j3db0615wa86ccc4cb65b2713@mail.gmail.com>
p4raw-id: //depot/perl@31106
|
|
|
|
|
|
|
| |
char* xend
Message-ID: <51dd1af80704211430m6ad1b4afy49b069faa61e33a9@mail.gmail.com>
p4raw-id: //depot/perl@31027
|