path: root/regexp.h
Commit message (Author, Date; Files, Lines)
* reduce size of regmatch_state.u.curlyx by 2 words
  (David Mitchell, 2010-06-06; 1 file, -3/+2)
* Add s///r (non-destructive substitution).
  (David Caldwell, 2010-05-22; 1 file, -1/+3)
  With the new /r flag, s/// does not act destructively on its target. Instead it returns the result of the substitution (or the original string if there was no match).

  In addition this patch:
    * Adds a new warning when s///r happens in void context.
    * Adds an error when you try to use s///r with !~.
    * Makes it so constant strings can be bound to s///r with =~.
    * Adds documentation.
    * Adds some tests.
    * Updates various debug code so it knows about the /r flag.
    * Adds some new 'r' words to B::Deparse.
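The semantic point of /r is that the target is never written and a fresh string always comes back. As a rough C analogy only (not Perl's implementation, which works on SVs and the real regex engine), the hypothetical `subst_r()` below replaces the first literal occurrence of `pat` and returns a newly allocated string, or an unmodified copy when nothing matched. Literal matching with `strstr()` stands in for the regex engine, and an empty `pat` is simply treated as "no match".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogy for s///r: never modifies `target`.
 * Returns a newly malloc'd string holding the substituted result, or an
 * unmodified copy of `target` when `pat` does not occur (or is empty).
 * Only the first occurrence is replaced, like s/// without /g.
 * The caller frees the result; returns NULL if allocation fails. */
char *subst_r(const char *target, const char *pat, const char *repl)
{
    const char *hit = (*pat != '\0') ? strstr(target, pat) : NULL;
    size_t tlen = strlen(target);

    if (hit == NULL) {                       /* no match: return a copy */
        char *copy = malloc(tlen + 1);
        if (copy != NULL)
            memcpy(copy, target, tlen + 1);
        return copy;
    }

    size_t plen = strlen(pat);
    size_t rlen = strlen(repl);
    size_t prefix = (size_t)(hit - target);  /* bytes before the match */
    char *out = malloc(tlen - plen + rlen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, target, prefix);                  /* text before match */
    memcpy(out + prefix, repl, rlen);             /* replacement text */
    memcpy(out + prefix + rlen, hit + plen,       /* text after match, */
           tlen - prefix - plen + 1);             /* incl. the '\0' */
    return out;
}
```

For example, `subst_r("hello world", "world", "perl")` returns a new string `"hello perl"` while the input is left as `"hello world"`, mirroring how `my $new = $old =~ s/world/perl/r;` leaves `$old` alone.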
* Remove union _xivu from struct regexp - replace it with a non-union paren_names.
  (Nicholas Clark, 2010-05-21; 1 file, -5/+2)
  This was the only user of xivu_hv in union _xivu, so remove that too.
* In the SV body, exchange the positions of the NV and stash/magic.
  (Nicholas Clark, 2010-05-21; 1 file, -1/+1)
* tries: don't allocate memory at runtime
  (David Mitchell, 2010-05-03; 1 file, -9/+6)
  This is an indirect fix for [perl #74484] Regex causing exponential runtime+mem usage.

  The trie runtime code was doing more SAVETMPS than FREETMPS and was thus growing a large tmps stack on heavy backtracking. Rather than fixing this directly, I rewrote part of the trie code so that it no longer needs to allocate memory in S_regmatch (it still does in find_byclass()).

  The basic issue is that multiple branches in the trie may trigger an accept state; for example:

      "abcd" =~ /xyz|abcd.*X|ab.*Y|/

  Here, words (branches) 2 and 3 are accept states.

  The original approach was, at run time, to create a list of accepted word numbers and the character positions of the end of each of those words, and then to run the rest of the pattern for each word in the list in turn (in word index order). This requires memory for the list to be allocated and freed.

  The new approach involves creating extra info at compile time; in particular, for each word, a pointer to the previous accepted word (if any) in the state tree. For example, for the above pattern, part of the state tree may be:

        a    b    c    d
      1 -> 2 -> 3 -> 4 -> 5
               (#3)      (#2)

  (e.g. at state 1, if the next char is 'a', we transition to state 2). Here, state 3 is an accept state with word #3, and 5 is an accept state with word #2. So we build a table indexed by word number, which has wordinfo[2].prev = 3, wordinfo[3].prev = 0, thus building the word chain 2->3->0.

  At run time we run the trie to completion, and remember the word associated with the longest accept state (word #2 above). Then by following back the chain of .prev fields, we can produce a list of all accepting words. We then iteratively find the smallest-numbered (i.e. leftmost) word in the chain, and run with it. On failure and backtrack, we find the next-smallest, and so on.

  Since we are no longer recording the end-position of each word in the string, we have to recalculate this for each backtrack. We initially record the end-position of the shortest accepting word, and given that we know the length of each word, we can calculate the new position each time as an offset from that first word. Depending on Unicode and case folding, that calculation can be cheap or expensive.

  This algorithm is optimised for the typical case where there is a small number (<= 2) of accepting states.

  This patch creates a new compile-time array, trie->wordinfo[], indexed by word number, which contains relevant info about each word. This also supersedes the old trie->newword[] array, whose function of recording "overspills" of multiple words per accept state is now handled as part of the wordinfo[].prev chain.
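The compile-time chain and the run-time walk described in this commit can be sketched in C. This is a hypothetical, simplified model, not the code in regcomp.c or regexec.c: `struct wordinfo`, `next_word()` and `word_end()` are illustrative names, words are numbered from 1 with 0 meaning "none", and every character is assumed to be one byte (no Unicode or case folding), so an end position is a plain offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-word info built at compile time; index 0 is unused
 * so that 0 can mean "no previous word". */
struct wordinfo {
    int prev;   /* previous (shorter) accepting word on the same path, 0 = none */
    int len;    /* length of the word, in characters (== bytes here) */
};

/* Walk the .prev chain starting at the longest accepting word and return
 * the smallest-numbered (leftmost) word that is greater than `last_tried`.
 * Returns 0 when every word in the chain has been tried. */
int next_word(const struct wordinfo *wi, int longest, int last_tried)
{
    int best = 0;
    for (int w = longest; w != 0; w = wi[w].prev)
        if (w > last_tried && (best == 0 || w < best))
            best = w;
    return best;
}

/* Recompute where `word` ends, as an offset from the recorded end of the
 * shortest accepting word (the last word in the chain). Assumes one byte
 * per character, so this is simple arithmetic. */
size_t word_end(const struct wordinfo *wi, int longest, int word,
                size_t shortest_end)
{
    int shortest = longest;
    while (wi[shortest].prev != 0)
        shortest = wi[shortest].prev;
    return shortest_end + (size_t)(wi[word].len - wi[shortest].len);
}
```

With the commit's example, ignoring the trailing empty branch (words 1, 2, 3 are `xyz`, `abcd`, `ab`; chain 2->3->0) and matching at offset 0 of "abcd", the shortest accepting word `ab` ends at offset 2. `next_word()` then yields word 2, ending at offset 4, and on backtrack word 3, ending at offset 2. Rescanning the chain on every backtrack is quadratic in the number of accepting words, which is harmless for the typical <= 2 case the commit optimises for.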
* much better swap logic to support reentrancy and fix assert failure
  (George Greer, 2009-07-26; 1 file, -1/+1)
  Commit c74340f9 added backreferences as well as the idea of a ->swap regex pointer to keep track of the match offsets in case of backtracking. The problem is that when Perl re-enters the regex engine to handle utf8::SWASHNEW, the ->swap is not saved/restored/cleared, so any capture from the utf8 (Perl) code could inadvertently modify the regex match data that caused the utf8 swash to get built.

  This change should close out RT #60508.
* Eliminate struct regexp_allocated and xpvio_allocated.
  (Nicholas Clark, 2009-07-17; 1 file, -6/+0)
  Calculate memory allocation using regexp and XPVIO, and the offset of the first real structure member. This avoids tripping over alignment differences between X* and x*_allocated, because x*_allocated doesn't have a double in it.
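A minimal sketch of the size calculation this commit describes, using a hypothetical `struct body` rather than Perl's real `regexp` or `XPVIO`: the allocation size is derived from the real struct with `offsetof()`, so any padding the compiler chose for that struct (for example because of a leading `double`) is accounted for automatically, instead of maintaining a parallel `*_allocated` struct that gets its own, possibly different, alignment. The step this enables, pointing the body pointer back by `BODY_FIRST_REAL` bytes so member offsets still line up, is deliberately not shown: it forms a pointer before the start of the allocation, which ISO C does not define.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical body type: the leading member is never used for this
 * kind of value, so no storage needs to be allocated for it. */
struct body {
    double nv;        /* leading slot: not allocated */
    size_t cur;       /* first "real" member */
    size_t len;
    void  *extra;
};

/* Everything from the first real member to the end of the struct,
 * including whatever tail padding the compiler added to struct body.
 * A hand-written "struct body_allocated { size_t cur; ... }" would be
 * padded and aligned on its own, which is where the two could drift. */
#define BODY_FIRST_REAL  offsetof(struct body, cur)
#define BODY_ALLOC_SIZE  (sizeof(struct body) - BODY_FIRST_REAL)
```

Because both macros are computed from the one real definition, adding or reordering members cannot silently leave the allocation size out of date.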
* Revert SvPVX() to allow lvalue usage, but also add a MUTABLE_SV() check.
  (Marcus Holland-Moritz, 2008-11-07; 1 file, -0/+2)
  Use SvPVX_const() instead of SvPVX() where only a const SV* is available. Also fix two falsely consted pointers in Perl_sv_2pv_flags().
  p4raw-id: //depot/perl@34770
* SvPV() does not take a const SV*, which means that the pattern argument to Perl_re_compile() can't be const, which means that the pattern argument to Perl_pregcomp() can't be const, nor can the argument of the corresponding function in the regexp engine structure.
  (Nicholas Clark, 2008-10-30; 1 file, -1/+1)
  It's a shame that no-one spotted this earlier. (Again) I may have rendered the documentation inaccurate.
  p4raw-id: //depot/perl@34672
* Update copyright years.
  (Nicholas Clark, 2008-10-25; 1 file, -1/+1)
  p4raw-id: //depot/perl@34585
* Re: [PATCH] readable assertion names, now the rest
  (Reini Urban, 2008-06-08; 1 file, -27/+27)
  From: "Reini Urban" <rurban@x-ray.at>
  Message-ID: <6910a60806080717h1aaaef1fh425a2ef21a62c9ed@mail.gmail.com>
  p4raw-id: //depot/perl@34030
* Fix bit-fields for VC [was RE: [perl #50386] GIMME_V broken with 5.10.0/GCC and XS?]
  (Jan Dubois, 2008-02-12; 1 file, -2/+2)
  From: "Jan Dubois" <jand@activestate.com>
  Message-ID: <02ee01c8651b$17ef72f0$47ce58d0$@com>
  p4raw-id: //depot/perl@33292
* Standardise the conditional compilation protection of ({}) from

      #if defined(__GNUC__) && !defined(__STRICT_ANSI__) && !defined(PERL_GCC_PEDANTIC)

  to

      #if defined(__GNUC__) && !defined(PERL_GCC_BRACE_GROUPS_FORBIDDEN)

  (Nicholas Clark, 2008-01-26; 1 file, -2/+2)
  because the ({}) construction can be used under __STRICT_ANSI__ (and should be, because it avoids temporary use of PL_Sv).
  p4raw-id: //depot/perl@33077
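A small sketch of what such a guard protects, using a hypothetical macro `MY_SQUARE_ONCE` (not a Perl macro). With GCC brace groups, also called statement expressions, the argument is evaluated exactly once into a block-local temporary; without them, a plain macro has to evaluate its argument twice, or, as the commit message notes for Perl, fall back on a global temporary such as PL_Sv. GCC still accepts the extension in strict ANSI modes, which is why `__STRICT_ANSI__` was dropped from the condition.

```c
#include <assert.h>

/* Hypothetical example of the guard pattern from the commit message. */
#if defined(__GNUC__) && !defined(PERL_GCC_BRACE_GROUPS_FORBIDDEN)
/* Brace group: evaluate x once into a local, then yield an expression. */
#  define MY_SQUARE_ONCE(x) ({ long my_tmp_ = (x); my_tmp_ * my_tmp_; })
#else
/* Portable fallback: x is evaluated twice, so it must be side-effect free. */
#  define MY_SQUARE_ONCE(x) ((long)(x) * (long)(x))
#endif
```

Defining `PERL_GCC_BRACE_GROUPS_FORBIDDEN` at compile time selects the portable branch, which gives the same results for side-effect-free arguments such as `MY_SQUARE_ONCE(v + 1)`.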
* consting
  (Robin Barker, 2008-01-14; 1 file, -6/+7)
  From: "Robin Barker" <Robin.Barker@npl.co.uk>
  Message-ID: <46A0F33545E63740BC7563DE59CA9C6D0939CA@exchsvr2.npl.ad.local>
  p4raw-id: //depot/perl@32976
* Well, I know *something* passed make test from a clean build before change 32961, and I thought that it was the right thing, but I guess not. It should have read like this.
  (Nicholas Clark, 2008-01-11; 1 file, -2/+2)
  p4raw-id: //depot/perl@32962
* assert that these are the regexps you were looking for (at least for the most commonly used macros).
  (Nicholas Clark, 2008-01-11; 1 file, -6/+39)
  Remove the duplicate definition of RX_SUBBEG(), which I was sure I'd done earlier.
  p4raw-id: //depot/perl@32961
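One way an accessor macro can "assert that these are the regexps you were looking for" is sketched below. Everything here is hypothetical and simplified (`struct sv`, `KIND_REGEXP`, `CHECKED_NPARENS`); it is not Perl's SV layout or its actual macro bodies. `assert()` is a void expression in standard C, so the comma operator can run the type check and still yield the body pointer, without needing GCC brace groups. With `NDEBUG` defined, the check compiles away.

```c
#include <assert.h>

/* Hypothetical, much-simplified stand-ins for an SV and a regexp body. */
enum sv_kind { KIND_PV, KIND_REGEXP };
struct regexp_body { unsigned nparens; };
struct sv { enum sv_kind kind; void *any; };

/* Check the SV's type (in debugging builds), then yield its body.
 * Note: `sv` is evaluated twice, so callers must not pass side effects. */
#define CHECKED_BODY(sv) \
    (assert((sv)->kind == KIND_REGEXP), (struct regexp_body *)(sv)->any)

#define CHECKED_NPARENS(sv)  (CHECKED_BODY(sv)->nparens)
```

Passing an SV of any other kind to `CHECKED_NPARENS()` trips the assertion at the call site instead of silently reinterpreting the wrong body.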
* Fix prototype in regexp code following #32851, and regen
  (Steve Hay, 2008-01-09; 1 file, -1/+1)
  p4raw-id: //depot/perl@32925
* ReREFCNT_inc() should return a pointer to REGEXP.
  (Nicholas Clark, 2008-01-07; 1 file, -2/+2)
  [I don't get warnings about void context here, but I'm sure someone will :-(]
  p4raw-id: //depot/perl@32890
* Don't allocate the NV slot for SVt_REGEXP.
  (Nicholas Clark, 2008-01-05; 1 file, -33/+42)
  p4raw-id: //depot/perl@32859
* In struct regexp move the member paren_names to the IV union.
  (Nicholas Clark, 2008-01-05; 1 file, -2/+4)
  p4raw-id: //depot/perl@32854
* Convert all accesses of the member paren_names of struct regexp to be accessed via RXp_PAREN_NAMES().
  (Nicholas Clark, 2008-01-05; 1 file, -0/+2)
  (They are entirely within the regexp implementation.)
  p4raw-id: //depot/perl@32853
* Abolish RXf_UTF8. Store the UTF-8-ness of the pattern with SvUTF8().
  (Nicholas Clark, 2008-01-05; 1 file, -2/+1)
  p4raw-id: //depot/perl@32852
* Abolish wraplen from struct regexp. We're already storing it in SvCUR.
  (Nicholas Clark, 2008-01-05; 1 file, -2/+1)
  p4raw-id: //depot/perl@32845
* Abolish RXp_PRELEN(rx) and RXp_WRAPLEN()
  (Nicholas Clark, 2008-01-05; 1 file, -7/+5)
  Fix up some uses of RX_* macros in the block conditionally compiled with STUPID_PATTERN_CHECKS.
  p4raw-id: //depot/perl@32843
* Abolish wrapped in struct regexp - store the wrapped pattern pointer in the SvPVX().
  (Nicholas Clark, 2008-01-05; 1 file, -4/+2)
  p4raw-id: //depot/perl@32841
* Add RX_UTF8(), which is effectively SvUTF8() but for regexps.
  (Nicholas Clark, 2008-01-05; 1 file, -4/+5)
  Remove RXp_PRECOMP() and RXp_WRAPPED(). Change the parameter of S_debug_start_match() from regexp to REGEXP. Change its callers [the only part wrong for 5.10.x].
  p4raw-id: //depot/perl@32840
* Fix the compile for -DPERL_OLD_COPY_ON_WRITE (apart from the tenacious broken window: ../ext/Compress/Raw/Zlib/t/07bufsize.t)
  (Nicholas Clark, 2008-01-05; 1 file, -2/+3)
  p4raw-id: //depot/perl@32837
* Make struct regexp the body of SVt_REGEXP SVs, REGEXPs become SVs, and regexp reference counting is via the regular SV reference counting.
  (Nicholas Clark, 2008-01-02; 1 file, -27/+43)
  This was not as easy as it looks.
  p4raw-id: //depot/perl@32804
* Wrap all dereferences of struct regexp* in macros RX_*() [and for regcomp.c and regexec.c RXp_* where necessary] so that in future we can maintain source compatibility when we add an extra level of dereferencing.
  (Nicholas Clark, 2008-01-02; 1 file, -16/+41)
  p4raw-id: //depot/perl@32802
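The payoff of routing every dereference through macros can be shown with a hypothetical, much-simplified model (`struct regexp_body`, `struct handle` and the macro bodies are illustrative, not Perl's definitions). `RXp_*` operates on a pointer to the body itself, while `RX_*` takes a handle that adds one more level of indirection, as an SV pointing at its regexp body would. Call sites written as `RX_NPARENS(rx)` do not change when that extra level appears; only the macro definitions do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified regexp body (not Perl's struct regexp). */
struct regexp_body { unsigned extflags; size_t nparens; };

/* RXp_*: operate directly on a body pointer (engine-internal code). */
#define RXp_NPARENS(prog)  ((prog)->nparens)

/* RX_*: operate on a handle with one extra level of indirection, standing
 * in for an SV whose body is the regexp. Only these definitions know that. */
struct handle { void *any; };
#define RX_BODY(rx)        ((struct regexp_body *)(rx)->any)
#define RX_NPARENS(rx)     RXp_NPARENS(RX_BODY(rx))
#define RX_EXTFLAGS(rx)    (RX_BODY(rx)->extflags)
```

Because the macros expand to member accesses, they remain usable as lvalues, e.g. `RX_EXTFLAGS(rx) |= 1;`, which is part of what keeps existing source compiling unchanged.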
* Reorder the external regexp flags to get RXf_PMf_STD_PMMOD into the lowest 4 bits (which saves a shift), and the "flags indicating special patterns" into contiguous bits.
  (Nicholas Clark, 2007-12-29; 1 file, -40/+40)
  This makes everything a little tidier, and saves 88 bytes (woohoo!) of object file with -Os on x86 FreeBSD.
  p4raw-id: //depot/perl@32775
* The position of the modifier flag bits is actually encoded by a right shift of 12 in two places, so replace that magic number with a macro RXf_PMf_STD_PMMOD_SHIFT defined adjacent to the flags it interacts with.
  (Nicholas Clark, 2007-12-29; 1 file, -0/+1)
  p4raw-id: //depot/perl@32774
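The two flag commits above can be illustrated together with a hypothetical layout (the `MY_*` names and the m/s/i/x bit order are assumptions for the sketch, not Perl's actual values). The four standard modifiers sit in the lowest 4 bits, so the shift macro is 0 and the compiler drops the shift entirely; if the bits ever move, only `MY_PMMOD_SHIFT` changes instead of a magic 12 repeated at each use site.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag layout: the four standard modifiers in the lowest
 * 4 bits, all defined relative to one shift macro. */
#define MY_PMMOD_SHIFT     0
#define MY_PMf_MULTILINE   (1u << (MY_PMMOD_SHIFT + 0))   /* m */
#define MY_PMf_SINGLELINE  (1u << (MY_PMMOD_SHIFT + 1))   /* s */
#define MY_PMf_FOLD        (1u << (MY_PMMOD_SHIFT + 2))   /* i */
#define MY_PMf_EXTENDED    (1u << (MY_PMMOD_SHIFT + 3))   /* x */
#define MY_PMf_STD_PMMOD   (MY_PMf_MULTILINE | MY_PMf_SINGLELINE | \
                            MY_PMf_FOLD | MY_PMf_EXTENDED)

/* Write the letters of the standard modifiers set in `flags`, in "msix"
 * order, into `buf` (which must hold at least 5 bytes). Other bits are
 * ignored. */
void std_modifiers(unsigned flags, char *buf)
{
    static const char letters[] = "msix";
    unsigned bits = (flags & MY_PMf_STD_PMMOD) >> MY_PMMOD_SHIFT;
    char *p = buf;
    for (int i = 0; i < 4; i++)
        if (bits & (1u << i))
            *p++ = letters[i];
    *p = '\0';
}
```

For instance, `std_modifiers(MY_PMf_FOLD | MY_PMf_MULTILINE, buf)` writes `"mi"`.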
* Note to future self about moving the regexp flag bits around.
  (Nicholas Clark, 2007-12-29; 1 file, -1/+3)
  p4raw-id: //depot/perl@32759
* Wrap wrapped and wraplen from struct regexp in macros RX_WRAPPED() and RX_WRAPLEN() to preserve source compatibility when they get moved around.
  (Nicholas Clark, 2007-12-29; 1 file, -0/+3)
  p4raw-id: //depot/perl@32758
* Eliminate prelen from struct regexp. Possibly we are hardcoding a bit too much, as the replacement assumes that the wrapping string has exactly 1 character after the wrapped string [specifically ')'].
  (Nicholas Clark, 2007-12-28; 1 file, -2/+4)
  p4raw-id: //depot/perl@32757
* Eliminate precomp from struct regexp. Store the offset of precomp from wrapped in pre_prefix, a 4 bit value.
  (Nicholas Clark, 2007-12-28; 1 file, -3/+3)
  (Maybe only for now) reduce seen_evals from I32 to 28 bits. Will anyone have more than 268435456 eval groups in a regexp?
  p4raw-id: //depot/perl@32755
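How precomp and prelen can be derived instead of stored is sketched below with a hypothetical `struct rx_sketch` and `SKETCH_*` macros (not Perl's definitions). The pattern is kept once in its wrapped form, e.g. `(?-xism:a+b)`; the 4-bit `pre_prefix` records where the bare pattern starts (the 8-byte `(?-xism:` prefix fits), and the length assumes exactly one trailing `)`, the hardcoding the prelen commit worries about. The 28-bit `seen_evals` can hold values up to 2^28 - 1 = 268435455.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified layout: the stringified pattern is stored once,
 * in wrapped form, and the bare pattern is recovered from it on demand. */
struct rx_sketch {
    const char *wrapped;          /* e.g. "(?-xism:" + pattern + ")" */
    size_t      wraplen;          /* strlen(wrapped) */
    unsigned    pre_prefix : 4;   /* offset of the bare pattern, < 16 */
    unsigned    seen_evals : 28;  /* at most 2^28 - 1 = 268435455 */
};

/* Start of the bare pattern inside the wrapped string. */
#define SKETCH_PRECOMP(rx)  ((rx)->wrapped + (rx)->pre_prefix)
/* Its length: total minus prefix minus exactly one closing ')'. */
#define SKETCH_PRELEN(rx)   ((rx)->wraplen - (rx)->pre_prefix - 1)
```

For `"(?-xism:a+b)"` (12 bytes, `pre_prefix` 8), `SKETCH_PRELEN()` gives 12 - 8 - 1 = 3 and `SKETCH_PRECOMP()` points at `"a+b)"`, of which the first 3 bytes are the pattern. Note the result is not NUL-terminated at the pattern's end, so it must always be used together with the length.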
* Wrap all accesses to the members precomp and prelen of struct regexp in the macros RX_PRECOMP() and RX_PRELEN().
  (Nicholas Clark, 2007-12-28; 1 file, -0/+4)
  This will allow us to reduce the regexp storage overhead by computing them at retrieve time.
  p4raw-id: //depot/perl@32753
* Fix up copyright years for files modified in 2007.
  (Nicholas Clark, 2007-11-07; 1 file, -1/+1)
  p4raw-id: //depot/perl@32237
* Add note to regexp.h that modifying RXf_ type flags requires running regen.pl or regcomp.pl to update regnodes.h
  (Yves Orton, 2007-08-18; 1 file, -0/+14)
  Currently the *NIX makefiles are not set up to update regnodes.h automatically when regexp.h is modified. This at least warns people modifying the list about what they should do. A better solution is needed.
  p4raw-id: //depot/perl@31734
* Optimize split //
  (Ævar Arnfjörð Bjarmason, 2007-08-09; 1 file, -0/+1)
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80708090049p2cf4810ep5a437ad53f64fa78@mail.gmail.com>
  p4raw-id: //depot/perl@31693
* /p vs (?p)
  (Abigail, 2007-06-30; 1 file, -1/+1)
  Date: Fri, 29 Jun 2007 23:38:07 +0200
  Message-ID: <20070629213807.GA14454@abigail.nl>

  Subject: [PATCH pod/perlre.pod] Keeping up with the changes.
  From: Abigail <abigail@abigail.be>
  Date: Sat, 30 Jun 2007 01:24:36 +0200
  Message-ID: <20070629232436.GA15326@abigail.nl>

  Plus tweaks, and debug enhancements.
  p4raw-id: //depot/perl@31506
* fix overzealous search and replace
  (Yves Orton, 2007-06-29; 1 file, -4/+4)
  p4raw-id: //depot/perl@31498
* Rename various regex defines so that they have distinct prefixes based on their usage.
  (Yves Orton, 2007-06-28; 1 file, -20/+20)
      RXf_         => flags used in pm_flags argument to regcomp and stored in the regex via rx->extflags
      PREGf_       => flags stored in rx->intflags
      RXapif_      => argument flags for regex named capture api
      RX_BUFF_IDX_ => special indexes to represent $` $' $& used in the numeric capture buffer api

  PREGf is untouched by this change, but RXf_ is split into RXapif and RX_BUFF_IDX_.
  p4raw-id: //depot/perl@31497
* Move the RXf_WHITE logic for split " " into the regex engine
  (Ævar Arnfjörð Bjarmason, 2007-06-28; 1 file, -1/+8)
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80706281306i4dbba39em3eeb8da1d67ea27c@mail.gmail.com>
  (with tweaks)
  p4raw-id: //depot/perl@31495
* SvRX() and SvRXOK() macros
  (Ævar Arnfjörð Bjarmason, 2007-06-18; 1 file, -0/+35)
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80706172033h1908aa0ge15698204e0b79ed@mail.gmail.com>
  p4raw-id: //depot/perl@31409
* Re: [PATCH] Callbacks for named captures (%+ and %-)
  (Ævar Arnfjörð Bjarmason, 2007-06-06; 1 file, -3/+40)
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80706031324y5618d519p460da27a2e7fe712@mail.gmail.com>
  p4raw-id: //depot/perl@31341
* Minor perlreapi.pod cleanup
  (Ævar Arnfjörð Bjarmason, 2007-05-20; 1 file, -2/+11)
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80705160938w13789b63m6d5f4710441ceac@mail.gmail.com>
  p4raw-id: //depot/perl@31244
* FETCH/STORE/LENGTH callbacks for numbered capture variables
  (Ævar Arnfjörð Bjarmason, 2007-05-03; 1 file, -5/+9)
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80705011658g1156e14cw4d2b21a8d772ed41@mail.gmail.com>
  p4raw-id: //depot/perl@31130
* tweak some regexp params to avoid warnings
  (Yves Orton, 2007-05-02; 1 file, -2/+2)
  Message-ID: <9b18b3110705011446h2113221cndf70af928d72505@mail.gmail.com>
  p4raw-id: //depot/perl@31118
* Re: [PATCH] Cleanup of the regexp API
  (Ævar Arnfjörð Bjarmason, 2007-04-30; 1 file, -12/+14)
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80704261922j3db0615wa86ccc4cb65b2713@mail.gmail.com>
  p4raw-id: //depot/perl@31106
* Re: [PATCH (incomplete)] Make regcomp use SV* sv, instead of char* exp, char* xend
  (Ævar Arnfjörð Bjarmason, 2007-04-23; 1 file, -1/+1)
  Message-ID: <51dd1af80704211430m6ad1b4afy49b069faa61e33a9@mail.gmail.com>
  p4raw-id: //depot/perl@31027