| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
The netbsd-5.0.2 compiler pointed out that the recent changes to add
longjmps to speed up some regex compilations can result in clobbering a
few values. These depend on the compiled code, and so didn't show up in
other compilers' warnings. This patch reinitializes them after a
longjmp.
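The clobbering problem can be illustrated with a minimal sketch. This is not the code from the patch; restart_point, compile_pass and compile_with_restart are hypothetical names chosen for the example. It shows an automatic variable that is modified between setjmp and longjmp, and is therefore re-initialised on the longjmp path before it is read again.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf restart_point;            /* hypothetical name */

/* Hypothetical stand-in for a compile pass that gives up once. */
static void compile_pass(int attempt)
{
    if (attempt == 0)
        longjmp(restart_point, 1);       /* abandon this pass and start over */
}

/* Runs the pass, restarting once, and returns the final depth. */
int compile_with_restart(void)
{
    int depth = 0;                       /* plain automatic variable */
    volatile int attempt = 0;            /* volatile: keeps its value across longjmp */

    if (setjmp(restart_point) != 0) {
        /* Reached via longjmp. depth was modified after setjmp, so its
         * value is indeterminate here; re-initialise it before use. */
        depth = 0;
        attempt = attempt + 1;
    }

    depth += 3;                          /* modified between setjmp and longjmp */
    compile_pass(attempt);

    assert(depth == 3);                  /* exactly one pass worth of work */
    return depth;
}
```

Without the re-initialisation, whether depth still holds 0 or 3 after the longjmp depends on how the compiler allocated it, which is consistent with the warning showing up under only one compiler.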
|
|
|
|
|
| |
Previously the AV paren_name_list would "leak" until global destruction.
This was only an issue under -DDEBUGGING. Fixes RT #73438.
|
|
|
|
|
|
| |
Add a new flags column to regcomp.sym, with V if the node type is in PL_varies,
S if it is in PL_simple, and . if a placeholder is needed because subsequent
optional columns are present.
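As a rough illustration of how a tool might interpret such a flags column, here is a small sketch. The function name, the bit values and the idea that the field is read character by character are all assumptions made for the example; the real regcomp.sym processing is done elsewhere and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, for illustration only. */
enum { NODE_VARIES = 1, NODE_SIMPLE = 2 };

/* Interpret one flags field: 'V' = in PL_varies, 'S' = in PL_simple,
 * '.' = placeholder with no meaning. Returns a bitmask, or -1 if the
 * field contains an unrecognised character. */
int parse_flags_field(const char *field)
{
    int flags = 0;

    assert(field != NULL);
    for (; *field != '\0'; field++) {
        switch (*field) {
        case 'V': flags |= NODE_VARIES; break;
        case 'S': flags |= NODE_SIMPLE; break;
        case '.': break;                 /* placeholder column */
        default:  return -1;             /* unknown flag */
        }
    }
    return flags;
}
```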
|
| |
This is an indirect fix for
[perl #74484] Regex causing exponential runtime+mem usage
The trie runtime code was doing more SAVETMPS than FREETMPS and was thus
growing a large tmps stack on heavy backtracking. Rather than fixing this
directly, I rewrote part of the trie code so that it no longer needs to
allocate memory in S_regmatch (it still does in find_byclass()).
The basic issue is that multiple branches in the trie may trigger an
accept state; for example:
    "abcd" =~ /xyz|abcd.*X|ab.*Y/
here, words (branches) 2 and 3 are accept states. The original approach
was, at run time, to create a list of accepted word numbers and the
character positions of the end of each of those words. Then run the rest
of the pattern for each word in the list in turn (in word index order).
This requires memory for the list to be allocated and freed.
The new approach involves creating extra info at compile time; in
particular, for each word, a pointer to the previous accepted word (if
any) in the state tree. For example for the above pattern, part of the
state tree may be
      a    b    c    d
    1 -> 2 -> 3 -> 4 -> 5
             (#3)      (#2)
(e.g. at state 1, if the next char is 'a', we transition to state 2).
Here, state 3 is an accept state with word #3, and 5 is an accept state
with word #2. So we build a table indexed by word number, which has
wordinfo[2].prev = 3, wordinfo[3].prev = 0, thus building the word chain 2->3->0.
At run time we run the trie to completion, and remember the word
associated with the longest accept state (word #2 above). Then by following
back the chain of .prev fields, we can produce a list of all accepting
words. We then iteratively find the smallest-numbered (i.e. leftmost) word in
the chain, and run with it. On failure and backtrack, we find the
next-smallest and so on.
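The chain walk described above can be sketched as follows. This is a simplified illustration, not the code in S_regmatch: the word_info struct, the example_words table and next_accepting_word are hypothetical, and the real wordinfo[] entries carry more than these two fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-word record. */
typedef struct {
    unsigned prev;   /* previous accepting word in the chain, 0 = end */
    unsigned len;    /* length of the word in characters */
} word_info;

/* Table for /xyz|abcd.*X|ab.*Y/, indexed by word number (slot 0 unused):
 * word 2 ("abcd") chains back to word 3 ("ab"), which ends the chain. */
static const word_info example_words[4] = {
    {0, 0}, {0, 3}, {3, 4}, {0, 2}
};

/* Walk the .prev chain starting at `head` (the word of the longest
 * accept state) and return the smallest word number strictly greater
 * than `after`, or 0 when no untried word remains. */
unsigned next_accepting_word(const word_info *wordinfo,
                             unsigned head, unsigned after)
{
    unsigned best = 0;
    unsigned w;

    assert(wordinfo != NULL);
    for (w = head; w != 0; w = wordinfo[w].prev) {
        if (w > after && (best == 0 || w < best))
            best = w;
    }
    return best;
}
```

Calling it first with after = 0 yields the leftmost accepting word; on each backtrack the caller passes the word just tried. Rescanning the chain every time is cheap only because the chain is expected to be very short.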
Since we are no longer recording the end-position of each word in the
string, we have to recalculate this for each backtrack. We initially
record the end-position of the shortest accepting word, and given that we
know the length of each word, we can calculate the new position each time
as an offset from that first word. Depending on unicode and folding, that
calculation can be cheap or expensive.
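A minimal sketch of that recalculation, assuming no case folding and well-formed UTF-8 input: for byte strings the new end position is plain pointer arithmetic, while for UTF-8 the extra characters have to be stepped over one by one. The names utf8_hop, word_end and the sample buffers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* "abcd" as bytes, and "ébéd" as UTF-8 (é is 0xC3 0xA9). */
static const unsigned char sample_bytes[] = { 'a', 'b', 'c', 'd', 0 };
static const unsigned char sample_utf8[]  = { 0xC3, 0xA9, 'b', 0xC3, 0xA9, 'd', 0 };

/* Advance n characters in well-formed UTF-8. */
static const unsigned char *utf8_hop(const unsigned char *s, size_t n)
{
    while (n--) {
        s++;
        while ((*s & 0xC0) == 0x80)      /* skip continuation bytes */
            s++;
    }
    return s;
}

/* Given where the shortest accepting word ended, compute where a longer
 * accepting word ends. Lengths are in characters. */
const unsigned char *word_end(const unsigned char *shortest_end,
                              size_t shortest_len, size_t word_len,
                              int is_utf8)
{
    size_t extra;

    assert(word_len >= shortest_len);
    extra = word_len - shortest_len;     /* characters beyond the shortest word */
    return is_utf8 ? utf8_hop(shortest_end, extra)   /* expensive: per-character scan */
                   : shortest_end + extra;           /* cheap: pointer arithmetic */
}
```

Folding is deliberately left out here; when a folded character can change length, the offset can no longer be derived from the word lengths alone.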
This algorithm is optimised for the typical case where there are a small
number (<= 2) of accepting states.
This patch creates a new compile-time array, trie->wordinfo[], indexed by
word number, which contains relevant info about each word. This also
supersedes the old trie->newword[] array, whose function of recording
"overspills" of multiple words per accept state, is now handled as part of
the wordinfo[].prev chain.
|
|
|
|
|
|
|
|
| |
revert ba9ac1759cb6e7a5e6883c85edd0b450061b5ccb
Changing the semantics of \w \s and \d breaks too much
and Jesse wants to do a rollout. This disables the new
semantics until we can get all the details worked out.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
class matching
This also alters which Unicode properties the POSIX character
classes and the Perl "special" character classes, like \w and \d, map
to. At the same time it allows a number of tests for POSIX character
class behaviour to be switched from todo to non todo. Legacy testing
is still available by changing the define and setting the
PERL_TEST_LEGACY_POSIX_CC value to true.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the regex engine)
Perlbug #60156 and #49302 (and probably others) resolve down to the problem
that the definition of \s and \w and \d and the POSIX charclasses are different
for unicode strings and for non-unicode strings. This broke the character class
logic in the regex engine. The easiest fix to make the character class logic sane
again is to define new properties which do match.
This change creates new property classes that can be used instead of the
traditional ones (it does not change the previously defined ones). If the
define in regcomp.h:
#define PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS 1
is changed to 0, then the new mappings will be used. This will fix a bunch
of bugs that are reported as TODO items in the new reg_posixcc.t test file.
p4raw-id: //depot/perl@34769
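The effect of such a compile-time switch can be sketched as a configuration fragment. Only PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS comes from the message above; the WORD_PROPERTY macro and both property names are hypothetical placeholders, not the names the change actually introduces.

```c
/* Default to the legacy mappings unless the build overrides it. */
#ifndef PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS
#define PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS 1
#endif

#if PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS
#  define WORD_PROPERTY "LegacyWord"     /* hypothetical: traditional property */
#else
#  define WORD_PROPERTY "NewWord"        /* hypothetical: new, consistent property */
#endif
```

Because the traditional properties are left untouched, flipping the define back to 1 restores the old behaviour without any other code change.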
|
|
|
|
|
|
|
|
|
|
|
| |
* Make ANYOF output from regprop easier to read by adding ][ in between the unicode representation and the "ascii" one
* Make it possible to make tests in re_tests todo.
* add a todo test for a complementary character class match that should fail (perl #60156)
* Also add a comment explaining a previous commit (relating to perl #60344)
p4raw-id: //depot/perl@34755
|
|
|
|
|
|
| |
and regexp reference counting is via the regular SV reference counting.
This was not as easy as it looks.
p4raw-id: //depot/perl@32804
|
|
|
|
|
|
|
| |
regcomp.c and regexec.c RXp_* where necessary] so that in future we
can maintain source compatibility when we add an extra level of
dereferencing.
p4raw-id: //depot/perl@32802
|
|
|
| |
p4raw-id: //depot/perl@32793
|
|
|
| |
p4raw-id: //depot/perl@32237
|
|
|
| |
p4raw-id: //depot/perl@31983
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Date: Fri, 29 Jun 2007 23:38:07 +0200
Message-ID: <20070629213807.GA14454@abigail.nl>
Subject: [PATCH pod/perlre.pod] Keeping up with the changes.
From: Abigail <abigail@abigail.be>
Date: Sat, 30 Jun 2007 01:24:36 +0200
Message-ID: <20070629232436.GA15326@abigail.nl>
Plus tweaks, and debug enhancements.
p4raw-id: //depot/perl@31506
|
|
|
| |
p4raw-id: //depot/perl@31455
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80706031324y5618d519p460da27a2e7fe712@mail.gmail.com>
p4raw-id: //depot/perl@31341
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80705011658g1156e14cw4d2b21a8d772ed41@mail.gmail.com>
p4raw-id: //depot/perl@31130
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80704261922j3db0615wa86ccc4cb65b2713@mail.gmail.com>
p4raw-id: //depot/perl@31106
|
|
|
|
|
|
|
| |
PCRE and unicode tr18
Message-ID: <9b18b3110704221434g43457742p28cab00289f83639@mail.gmail.com>
p4raw-id: //depot/perl@31026
|
|
|
|
|
|
| |
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Message-ID: <51dd1af80703291552y1073bcb6r954b043eb68a4459@mail.gmail.com>
p4raw-id: //depot/perl@30849
|
|
|
|
|
|
|
|
| |
LP64 platforms, by pairing up I32 and U32 members. Notably structs
_reg_trie_data, reg_ac_data, regexp and regmatch_state down by 8 bytes,
re_save_state by 16, and regmatch_slab up by 48 (ie one more state per
slab)
p4raw-id: //depot/perl@30815
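The saving from pairing 32-bit members comes from alignment padding: on LP64, a 32-bit member followed by a pointer is padded out to 8 bytes. The sketch below uses two made-up structs (interleaved and paired are hypothetical, not the Perl structs named above) with the same members in different orders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit members separated by pointers: each is padded to pointer
 * alignment on LP64. */
struct interleaved {
    int32_t  refcnt;
    void    *data;
    uint32_t flags;
    void    *next;
};

/* Same members with the two 32-bit fields adjacent, so they can share
 * one 8-byte slot on LP64. */
struct paired {
    int32_t  refcnt;
    uint32_t flags;
    void    *data;
    void    *next;
};

/* Number of bytes saved by the reordering on the current platform. */
size_t bytes_saved_by_pairing(void)
{
    assert(sizeof(struct paired) <= sizeof(struct interleaved));
    return sizeof(struct interleaved) - sizeof(struct paired);
}
```

On a typical LP64 platform the two structs are 32 and 24 bytes, so the function reports 8; on ILP32 both are usually the same size and it reports 0.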
|
|
|
|
|
|
|
|
|
| |
pattern is a qr.
Message-ID: <9b18b3110703210239x540f5ad9mdb41c2ea6229ac31@mail.gmail.com>
plus two follow-up patches (minor tweaks)
p4raw-id: //depot/perl@30678
|
|
|
|
|
| |
Message-ID: <9b18b3110702280845p7860ca08taf1aead39a178aa4@mail.gmail.com>
p4raw-id: //depot/perl@30436
|
|
|
|
|
| |
Message-ID: <9b18b3110702131127q79cc6df1lb1480d9a40d15213@mail.gmail.com>
p4raw-id: //depot/perl@30265
|
|
|
|
|
| |
Message-ID: <9b18b3110701301458k2f6a8254hea6c6db28489c38b@mail.gmail.com>
p4raw-id: //depot/perl@30084
|
|
|
|
|
|
| |
Message-ID: <9b18b3110701210953l4df6198re36a9342e6049583@mail.gmail.com>
Date: Sun, 21 Jan 2007 18:53:38 +0100
p4raw-id: //depot/perl@29923
|
|
|
|
|
|
|
| |
and add support for %-
Message-ID: <9b18b3110701151406p7168b20byf873ee2e58091ca3@mail.gmail.com>
p4raw-id: //depot/perl@29843
|
|
|
|
|
| |
Message-ID: <9b18b3110701140624v452f7684x5e9d2890805489fd@mail.gmail.com>
p4raw-id: //depot/perl@29842
|
|
|
|
|
|
|
|
| |
${^POSTMATCH}
Message-ID: <9b18b3110701111731x29b1c63i57b1698f769b3bbc@mail.gmail.com>
(with tweaks)
p4raw-id: //depot/perl@29831
|
|
|
|
|
|
| |
files that generate .h files, so they'll be ready
next time.
p4raw-id: //depot/perl@29695
|
|
|
|
|
| |
Message-ID: <9b18b3110612240538m5c45654br7d27171835f6664@mail.gmail.com>
p4raw-id: //depot/perl@29621
|
|
|
|
|
|
| |
by taking advantage of how anchored_* and float_* are stored in arrays
to use a loop.
p4raw-id: //depot/perl@29503
|
|
|
|
|
|
|
| |
Message-ID: <9b18b3110612050713g77cac516x46fb5baac99b47c9@mail.gmail.com>
(with tweaks)
p4raw-id: //depot/perl@29468
|
|
|
|
|
|
| |
Actually the regexp engine structure only needs
one compilation function hook.
p4raw-id: //depot/perl@29459
|
|
|
| |
p4raw-id: //depot/perl@29458
|
|
|
|
|
|
|
| |
specific.
Message-ID: <9b18b3110611301306p5cad5deal4aa55559b8c8defd@mail.gmail.com>
p4raw-id: //depot/perl@29430
|
|
|
|
|
| |
Message-ID: <9b18b3110611290718o685a07ddja39f595ed97c231a@mail.gmail.com>
p4raw-id: //depot/perl@29420
|
|
|
|
|
| |
them on thread clone.
p4raw-id: //depot/perl@29394
|
|
|
|
|
|
| |
top level regdata array, so that it can be correctly duplicated on
thread clone.
p4raw-id: //depot/perl@29393
|
|
|
|
|
| |
preliminary to moving _reg_trie_data.widecharmap out too.
p4raw-id: //depot/perl@29392
|
|
|
|
|
| |
_reg_ac_data allows smaller code in Perl_regdupe.
p4raw-id: //depot/perl@29391
|
|
|
|
|
| |
Message-ID: <9b18b3110611220811k1a54f650t1bd7c6a9450b0a7e@mail.gmail.com>
p4raw-id: //depot/perl@29354
|
|
|
|
|
| |
Message-ID: <9b18b3110611090809l667860c9t6c27453d7c86a21e@mail.gmail.com>
p4raw-id: //depot/perl@29260
|
|
|
|
|
|
|
|
| |
Message-ID: <9b18b3110611121429g1fc9d6c1t4007dc711f9e8396@mail.gmail.com>
Plus a couple of tweaks to ext/re/re.pm and t/op/pat.t to get those patches
to apply cleanly.
p4raw-id: //depot/perl@29252
|
|
|
|
|
|
|
|
| |
Message-ID: <9b18b3110611060406u2fa1572as57073949a5df9e62@mail.gmail.com>
Plus a portability fix (in string comparison for regex verbs)
and doc tweaks / podchecker fixes
p4raw-id: //depot/perl@29222
|
|
|
|
|
| |
Message-ID: <9b18b3110611020335h7ea469a8g28ca483f6832816d@mail.gmail.com>
p4raw-id: //depot/perl@29189
|
|
|
|
|
|
|
|
|
|
| |
Message-ID: <9b18b3110610181151i3ca438cdied769ebaa4255079@mail.gmail.com>
1. code necessary to make patterns with interpolated vars behave
correctly under lexical re 'debug', including additional tests.
2. changes necessary to resolve the off-by-one error.
3. tweaks to re.pm to document that re 'debug' is lexical.
p4raw-id: //depot/perl@29057
|
|
|
|
|
|
|
| |
Message-ID: <9b18b3110610121223m191e47ddtce3398cb0e8ba320@mail.gmail.com>
With doc tweaks
p4raw-id: //depot/perl@29005
|