path: root/regcomp.h
Commit message | Author | Age | Files | Lines
* PATCH: regex longjmp flaws | Karl Williamson | 2010-09-15 | 1 | -1/+3
  The NetBSD 5.0.2 compiler pointed out that the recent changes to add longjmps to speed up some regex compilations can result in clobbering a few values. These depend on the compiled code, and so didn't show up in other compilers' warnings. This patch reinitializes them after a longjmp.
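The commit above concerns local variables that `longjmp` clobbers. The C standard says that a non-`volatile` automatic variable changed between `setjmp` and `longjmp` has an indeterminate value after the jump. Code that restarts work this way must therefore either declare such variables `volatile` or reinitialize them. The following minimal C sketch shows the reinitialize approach. It is not the regcomp.c code: `parse_pass()`, `compile_with_restart()` and the restart condition are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf restart_env;

/* Hypothetical compile pass: in mode 0 it discovers it must start over. */
static void parse_pass(int mode, int *size_out)
{
    *size_out = 10 * (mode + 1);   /* pretend result of this pass */
    if (mode == 0)
        longjmp(restart_env, 1);   /* jump back to the setjmp point */
}

/* Returns the size computed by the final, successful pass. */
int compile_with_restart(void)
{
    volatile int restarts = 0; /* volatile: its value survives longjmp */
    int mode;                  /* not volatile: reset after every setjmp */
    int size;                  /* not volatile: reset after every setjmp */

    if (setjmp(restart_env) != 0)
        restarts++;            /* we arrived here via longjmp */

    /* Reinitialize the non-volatile locals; after a longjmp their
       values would otherwise be indeterminate. */
    mode = (restarts > 0) ? 1 : 0;
    size = 0;

    parse_pass(mode, &size);
    return size;
}
```

The first pass jumps back, the second runs in mode 1, and the function returns 20. Only `restarts` needs `volatile`, because every other local is assigned fresh after `setjmp` returns.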
* Properly free paren_name_list with its regexp. | Nicholas Clark | 2010-05-29 | 1 | -0/+1
  Previously the AV paren_name_list would "leak" until global destruction. This was only an issue under -DDEBUGGING. Fixes RT #73438.
* Generate PL_simple[] and PL_varies[] with regcomp.pl, rather than hard-coding. | Nicholas Clark | 2010-05-27 | 1 | -31/+0
  Add a new flags column to regcomp.sym, with V if the node type is in PL_varies, S if it is in PL_simple, and . if a placeholder is needed because subsequent optional columns are present.
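The commit derives lookup tables from a flags column instead of hard-coding them. The real generator is regcomp.pl, a Perl script that reads regcomp.sym. The C sketch below models only the selection step. The rows, `struct node_row` and `select_by_flag()` are hypothetical, and the flag assignments are for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table rows: a node-type name and its flags column
   ('S' = simple, 'V' = varies, '.' = placeholder for neither). */
struct node_row {
    const char *name;
    char        flag;
};

static const struct node_row rows[] = {
    { "REG_ANY", 'S' },
    { "STAR",    'V' },
    { "PLUS",    'V' },
    { "EXACT",   '.' },
};

/* Collects, in table order, the names of rows whose flag is `want`.
   Writes at most `max` names to `out` and returns the number written. */
size_t select_by_flag(char want, const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        if (rows[i].flag == want && n < max)
            out[n++] = rows[i].name;
    }
    return n;
}
```

A generator would print each selected list as a C initializer. Adding or reclassifying a node type then means editing one table row, not two hand-maintained arrays.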
* tries: don't allocate memory at runtime | David Mitchell | 2010-05-03 | 1 | -3/+11
  This is an indirect fix for
      [perl #74484] Regex causing exponential runtime+mem usage

  The trie runtime code was doing more SAVETMPS than FREETMPS and was thus growing a large tmps stack on heavy backtracking. Rather than fixing this directly, I rewrote part of the trie code so that it no longer needs to allocate memory in S_regmatch (it still does in find_byclass()).

  The basic issue is that multiple branches in the trie may trigger an accept state; for example:

      "abcd" =~ /xyz|abcd.*X|ab.*Y|/

  here, words (branches) 2 and 3 are accept states. The original approach was, at run time, to create a list of accepted word numbers and the character positions of the end of each of those words. Then run the rest of the pattern for each word in the list in turn (in word index order). This requires memory for the list to be allocated and freed.

  The new approach involves creating extra info at compile time; in particular, for each word, a pointer to the previous accepted word (if any) in the state tree. For example, for the above pattern, part of the state tree may be

          a        b        c        d
      1 -----> 2 -----> 3 -----> 4 -----> 5
                       (#3)              (#2)

  (e.g. at state 1, if the next char is 'a', we transition to state 2). Here, state 3 is an accept state with word #3, and 5 is an accept state with word #2. So we build a table indexed by word number, which has wordinfo[2].prev = 3, wordinfo[3].prev = 0, thus building the word chain 2->3->0.

  At run time we run the trie to completion, and remember the word associated with the longest accept state (word #2 above). Then by following back the chain of .prev fields, we can produce a list of all accepting words. We then iteratively find the smallest-numbered (i.e. left-most) word in the chain, and run with it. On failure and backtrack, we find the next-smallest and so on.

  Since we are no longer recording the end-position of each word in the string, we have to recalculate this for each backtrack. We initially record the end-position of the shortest accepting word, and given that we know the length of each word, we can calculate the new position each time as an offset from that first word. Depending on Unicode and folding, that calculation can be cheap or expensive.

  This algorithm is optimised for the typical case where there are a small number (<= 2) of accepting states.

  This patch creates a new compile-time array, trie->wordinfo[], indexed by word number, which contains relevant info about each word. This also supersedes the old trie->newword[] array, whose function of recording "overspills" of multiple words per accept state is now handled as part of the wordinfo[].prev chain.
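The wordinfo scheme in this commit can be modelled in a few lines of C. This is a simplified sketch, not the trie code in regexec.c. The following are assumptions:

- the `struct wordinfo` layout;
- the helper names;
- the word list (1 "xyz", 2 "abcd", 3 "ab"), which uses the 2->3->0 chain from the message.

End positions are computed assuming one byte per character. That is the cheap case, with no Unicode and no folding.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified per-word info built at "compile time".
   Word numbers start at 1; a prev of 0 ends the chain. */
struct wordinfo {
    unsigned prev;   /* previous (shorter) accepting word on the same path */
    size_t   len;    /* word length in characters */
};

static const struct wordinfo info[] = {
    { 0, 0 },   /* index 0 unused */
    { 0, 3 },   /* word 1: "xyz"  */
    { 3, 4 },   /* word 2: "abcd", prev -> word 3 */
    { 0, 2 },   /* word 3: "ab",   end of chain   */
};

/* Smallest word number on the chain starting at `longest` that is
   greater than `after`; 0 when no candidates remain. */
unsigned next_word(unsigned longest, unsigned after)
{
    unsigned best = 0;
    for (unsigned w = longest; w != 0; w = info[w].prev)
        if (w > after && (best == 0 || w < best))
            best = w;
    return best;
}

/* The last word on the chain is the shortest accepting word. */
unsigned shortest_word(unsigned longest)
{
    unsigned w = longest;
    while (info[w].prev != 0)
        w = info[w].prev;
    return w;
}

/* End position of word w, derived from the recorded end of the shortest
   accepting word (assumes one byte per character). */
size_t end_pos(unsigned w, unsigned shortest, size_t shortest_end)
{
    return shortest_end + (info[w].len - info[shortest].len);
}
```

For "abcd" the trie's longest accept is word 2, and the shortest accept is word 3, ending at offset 2. Word 2 is therefore tried first, with end position 4. After a backtrack, word 3 is tried with end position 2, and then `next_word(2, 3)` returns 0. Each step rescans the chain, which is cheap when there are only a few accepting words.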
* revert to 5.8.x semantics for \s \w and \d | Yves Orton | 2009-10-19 | 1 | -1/+1
  revert ba9ac1759cb6e7a5e6883c85edd0b450061b5ccb
  Changing the semantics of \w \s and \d breaks too much and Jesse wants to do a rollout. This disables the new semantics until we can get all the details worked out.
* somewhat fix failing regex tests. but break lots of other stuff at the same time | Yves Orton | 2009-10-19 | 1 | -1/+1
* add more positive gofs GPOS tests and fix some bugs too | Yves Orton | 2009-09-10 | 1 | -0/+3
* set PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS to 0 and enable proper POSIX char class matching | Yves Orton | 2009-09-02 | 1 | -1/+2
  This also alters which Unicode properties the POSIX character classes and the Perl "special" character classes, like \w and \d, map to. At the same time it allows a number of tests for POSIX character class behaviour to be switched from todo to non-todo. Legacy testing is still available by changing the define and setting the PERL_TEST_LEGACY_POSIX_CC value to true.
* create new unicode props as defined in POSIX spec (optionally use them in the regex engine) | Yves Orton | 2008-11-07 | 1 | -0/+18
  Perlbug #60156 and #49302 (and probably others) resolve down to the problem that the definitions of \s, \w, \d and the POSIX charclasses are different for Unicode strings and for non-Unicode strings. This broke the character class logic in the regex engine. The easiest fix to make the character class logic sane again is to define new properties which do match. This change creates new property classes that can be used instead of the traditional ones (it does not change the previously defined ones). If the define in regcomp.h:
      #define PERL_LEGACY_UNICODE_CHARCLASS_MAPPINGS 1
  is changed to 0, then the new mappings will be used. This will fix a bunch of bugs that are reported as TODO items in the new reg_posixcc.t test file.
  p4raw-id: //depot/perl@34769
* Various changes to regex diagnostics and testing | Yves Orton | 2008-11-06 | 1 | -2/+2
  * Make ANYOF output from regprop easier to read by adding ][ in between the unicode representation and the "ascii" one
  * Make it possible to make tests in re_tests todo.
  * add a todo test for a complementary character class match that should fail (perl #60156)
  * Also add a comment explaining a previous commit (relating to perl #60344)
  p4raw-id: //depot/perl@34755
* Make struct regexp the body of SVt_REGEXP SVs, REGEXPs become SVs, | Nicholas Clark | 2008-01-02 | 1 | -4/+4
  and regexp reference counting is via the regular SV reference counting. This was not as easy as it looks.
  p4raw-id: //depot/perl@32804
* Wrap all dereferences of struct regexp* in macros RX_*() [and for | Nicholas Clark | 2008-01-02 | 1 | -1/+4
  regcomp.c and regexec.c RXp_* where necessary] so that in future we can maintain source compatibility when we add an extra level of dereferencing.
  p4raw-id: //depot/perl@32802
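Routing every field access through accessor macros means that, if the handle later gains an extra level of indirection, only the macro definitions change. This is a minimal C sketch of that idea. `EXTRA_INDIRECTION`, `RX_BODY`, these definitions of `RX_NPARENS` and `RX_MINLEN`, and the two-field struct are hypothetical stand-ins, not Perl's actual definitions.

```c
#include <assert.h>

/* Simplified stand-in for the regexp body. */
struct regexp {
    int nparens;
    int minlen;
};

#ifdef EXTRA_INDIRECTION
/* Possible future layout: the handle wraps the body. */
typedef struct { struct regexp *body; } REGEXP;
#  define RX_BODY(rx)   ((rx)->body)
#else
/* Current layout: the handle is the body itself. */
typedef struct regexp REGEXP;
#  define RX_BODY(rx)   (rx)
#endif

/* Callers use only these accessors, never rx->field directly. */
#define RX_NPARENS(rx)  (RX_BODY(rx)->nparens)
#define RX_MINLEN(rx)   (RX_BODY(rx)->minlen)

/* Identical source compiles under either layout. */
int total_parens(REGEXP **rxs, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += RX_NPARENS(rxs[i]);
    return sum;
}
```

Building with `-DEXTRA_INDIRECTION` changes only the typedef and `RX_BODY`, and `total_parens()` is untouched. Callers that construct handles would need updating, but code that reads fields through the macros would not.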
* Add editor blocks to some header files. | Marcus Holland-Moritz | 2008-01-01 | 1 | -2/+9
  p4raw-id: //depot/perl@32793
* Fix up copyright years for files modified in 2007. | Nicholas Clark | 2007-11-07 | 1 | -1/+1
  p4raw-id: //depot/perl@32237
* API spelling patch, by Jerry D. Hedden | Rafael Garcia-Suarez | 2007-09-26 | 1 | -1/+1
  p4raw-id: //depot/perl@31983
* /p vs (?p) | Abigail | 2007-06-30 | 1 | -1/+3
  Date: Fri, 29 Jun 2007 23:38:07 +0200
  Message-ID: <20070629213807.GA14454@abigail.nl>
  Subject: [PATCH pod/perlre.pod] Keeping up with the changes.
  From: Abigail <abigail@abigail.be>
  Date: Sat, 30 Jun 2007 01:24:36 +0200
  Message-ID: <20070629232436.GA15326@abigail.nl>
  Plus tweaks, and debug enhancements.
  p4raw-id: //depot/perl@31506
* s/\bunicode\b/Unicode/; # For everything not dual life | Nicholas Clark | 2007-06-24 | 1 | -1/+1
  p4raw-id: //depot/perl@31455
* Re: [PATCH] Callbacks for named captures (%+ and %-) | Ævar Arnfjörð Bjarmason | 2007-06-06 | 1 | -1/+2
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80706031324y5618d519p460da27a2e7fe712@mail.gmail.com>
  p4raw-id: //depot/perl@31341
* FETCH/STORE/LENGTH callbacks for numbered capture variables | Ævar Arnfjörð Bjarmason | 2007-05-03 | 1 | -4/+6
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80705011658g1156e14cw4d2b21a8d772ed41@mail.gmail.com>
  p4raw-id: //depot/perl@31130
* Re: [PATCH] Cleanup of the regexp API | Ævar Arnfjörð Bjarmason | 2007-04-30 | 1 | -1/+1
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80704261922j3db0615wa86ccc4cb65b2713@mail.gmail.com>
  p4raw-id: //depot/perl@31106
* Change meaning of \v, \V, and add \h, \H to match Perl6, add \R to match PCRE and unicode tr18 | Yves Orton | 2007-04-23 | 1 | -1/+13
  Message-ID: <9b18b3110704221434g43457742p28cab00289f83639@mail.gmail.com>
  p4raw-id: //depot/perl@31026
* Re: Proposed changes and to regular expression interfaces in core | Ævar Arnfjörð Bjarmason | 2007-04-06 | 1 | -1/+2
  From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
  Message-ID: <51dd1af80703291552y1073bcb6r954b043eb68a4459@mail.gmail.com>
  p4raw-id: //depot/perl@30849
* Reorder the members of various regexp structs to reduce their size on | Nicholas Clark | 2007-03-31 | 1 | -5/+5
  LP64 platforms, by pairing up I32 and U32 members. Notably structs _reg_trie_data, reg_ac_data, regexp and regmatch_state down by 8 bytes, re_save_state by 16, and regmatch_slab up by 48 (ie one more state per slab)
  p4raw-id: //depot/perl@30815
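The saving comes from alignment padding. This is a minimal C sketch with `int32_t` and `uint32_t` standing in for Perl's I32 and U32, and with hypothetical member names. On a typical LP64 ABI, pointers are 8 bytes with 8-byte alignment. Interleaving 4-byte members with pointers then costs 4 bytes of padding per 4-byte member, and pairing the 4-byte members removes that padding. Exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4-byte members interleaved with pointers: on a typical LP64 ABI the
   compiler adds 4 bytes of padding after each int32_t/uint32_t. */
struct unpaired {
    void     *a;
    int32_t   x;     /* stands in for I32 */
    void     *b;
    uint32_t  y;     /* stands in for U32 */
};

/* Same members, with the two 4-byte fields placed next to each other. */
struct paired {
    void     *a;
    void     *b;
    int32_t   x;
    uint32_t  y;
};

/* Bytes saved by the reordering on this platform. */
size_t bytes_saved(void)
{
    return sizeof(struct unpaired) - sizeof(struct paired);
}
```

On x86-64 Linux this is 32 versus 24 bytes. On a typical 32-bit ABI both structs are 16 bytes, which is why the commit frames the change as an LP64 size reduction.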
* Resolve PL_curpm issues with (??{}) and fix corruption of match results when pattern is a qr. | Yves Orton | 2007-03-22 | 1 | -6/+1
  Message-ID: <9b18b3110703210239x540f5ad9mdb41c2ea6229ac31@mail.gmail.com>
  plus two follow-up patches (minor tweaks)
  p4raw-id: //depot/perl@30678
* Re: New file: t/op/regexp_email.t | Yves Orton | 2007-03-01 | 1 | -0/+4
  Message-ID: <9b18b3110702280845p7860ca08taf1aead39a178aa4@mail.gmail.com>
  p4raw-id: //depot/perl@30436
* add hooks for capture buffers into regex engine. | Yves Orton | 2007-02-13 | 1 | -0/+2
  Message-ID: <9b18b3110702131127q79cc6df1lb1480d9a40d15213@mail.gmail.com>
  p4raw-id: //depot/perl@30265
* Improve regex stringification code | Yves Orton | 2007-01-31 | 1 | -0/+1
  Message-ID: <9b18b3110701301458k2f6a8254hea6c6db28489c38b@mail.gmail.com>
  p4raw-id: //depot/perl@30084
* Disable positive lookaround optimisations | Yves Orton | 2007-01-22 | 1 | -1/+1
  Message-ID: <9b18b3110701210953l4df6198re36a9342e6049583@mail.gmail.com>
  Date: Sun, 21 Jan 2007 18:53:38 +0100
  p4raw-id: //depot/perl@29923
* Re: [PATCH] Change implementation of %+ to use a proper tied hash interface and add support for %- | Yves Orton | 2007-01-16 | 1 | -4/+4
  Message-ID: <9b18b3110701151406p7168b20byf873ee2e58091ca3@mail.gmail.com>
  p4raw-id: //depot/perl@29843
* Make offsets support conditional | Yves Orton | 2007-01-16 | 1 | -3/+14
  Message-ID: <9b18b3110701140624v452f7684x5e9d2890805489fd@mail.gmail.com>
  p4raw-id: //depot/perl@29842
* Add support for /k modifier for matching along with ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} | Yves Orton | 2007-01-15 | 1 | -3/+0
  Message-ID: <9b18b3110701111731x29b1c63i57b1698f769b3bbc@mail.gmail.com>
  (with tweaks)
  p4raw-id: //depot/perl@29831
* Update copyright years in .h files. Also, in .pl | Rafael Garcia-Suarez | 2007-01-05 | 1 | -1/+1
  files that generate .h files, so they'll be ready next time.
  p4raw-id: //depot/perl@29695
* Re: Named-capture regex syntax | Yves Orton | 2006-12-25 | 1 | -1/+5
  Message-ID: <9b18b3110612240538m5c45654br7d27171835f6664@mail.gmail.com>
  p4raw-id: //depot/perl@29621
* Remove code duplication in S_to_utf8_substr() and S_to_byte_substr() | Nicholas Clark | 2006-12-10 | 1 | -0/+2
  by taking advantage of how anchored_* and float_* are stored in arrays to use a loop.
  p4raw-id: //depot/perl@29503
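This C sketch shows the refactoring pattern. Once the two substrings live in one array, named accessors can stay for readability while the conversion code runs in a single loop. The struct layout, the macros and `convert_substrs()` are hypothetical. Upper-casing stands in for the real byte/UTF-8 conversion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical simplification: the anchored and floating literal
   substrings share one array instead of being separate members. */
struct substr_datum { char text[32]; };
struct substr_data  { struct substr_datum data[2]; };

/* Named accessors keep call sites readable. */
#define ANCHORED_TEXT(s)  ((s)->data[0].text)
#define FLOAT_TEXT(s)     ((s)->data[1].text)

/* One loop replaces two copies of the same conversion code.
   Upper-casing stands in for the real byte/UTF-8 conversion. */
void convert_substrs(struct substr_data *s)
{
    for (size_t i = 0; i < sizeof s->data / sizeof s->data[0]; i++) {
        for (char *p = s->data[i].text; *p != '\0'; p++)
            *p = (char)toupper((unsigned char)*p);
    }
}
```

Because the loop bound comes from the array size, adding a third substring kind would need no change to the conversion function.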
* Further tweaks to make it easier to create regexp engine plug ins. | Yves Orton | 2006-12-05 | 1 | -13/+25
  Message-ID: <9b18b3110612050713g77cac516x46fb5baac99b47c9@mail.gmail.com>
  (with tweaks)
  p4raw-id: //depot/perl@29468
* Better version of last patch, by Yves Orton. | Rafael Garcia-Suarez | 2006-12-04 | 1 | -1/+17
  Actually the regexp engine structure only needs one compilation function hook.
  p4raw-id: //depot/perl@29459
* The new regexp compilation function must be added to the engine structure. | Rafael Garcia-Suarez | 2006-12-04 | 1 | -0/+1
  p4raw-id: //depot/perl@29458
* Continue split of perl internal regexp structures from ones that are engine specific. | Yves Orton | 2006-12-01 | 1 | -3/+2
  Message-ID: <9b18b3110611301306p5cad5deal4aa55559b8c8defd@mail.gmail.com>
  p4raw-id: //depot/perl@29430
* Re: Fix \k<foo> preceded by literal | Yves Orton | 2006-11-29 | 1 | -1/+4
  Message-ID: <9b18b3110611290718o685a07ddja39f595ed97c231a@mail.gmail.com>
  p4raw-id: //depot/perl@29420
* Move words and revcharmap out of struct _rev_trie_data and duplicate | Nicholas Clark | 2006-11-27 | 1 | -4/+13
  them on thread clone.
  p4raw-id: //depot/perl@29394
* Move widecharmap out of the shared structure _reg_trie_data into the | Nicholas Clark | 2006-11-26 | 1 | -1/+1
  top level regdata array, so that it can be correctly duplicated on thread clone.
  p4raw-id: //depot/perl@29393
* Swap _reg_ac_data.trie to U32 offset into the regdata array, as | Nicholas Clark | 2006-11-26 | 1 | -1/+1
  preliminary to moving _reg_trie_data.widecharmap out too.
  p4raw-id: //depot/perl@29392
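One plausible reading of why an offset is preferable to a pointer here is an assumption, since the commit states only the intent. When the data array is duplicated for a new thread, an index into the array stays valid in the copy without any fixup. A raw pointer would still point into the original. The types and helpers in this C sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a per-regexp data array. */
struct slot       { int value; };
struct data_array { size_t count; struct slot *slots; };

/* Refers to another entry by index, like a U32 offset. */
struct ac_ref     { uint32_t trie; };

/* Duplicates every slot into a fresh array (e.g. for a new thread).
   Returns 1 on success, 0 if allocation fails. */
int clone_array(const struct data_array *src, struct data_array *dst)
{
    dst->slots = malloc(src->count * sizeof *dst->slots);
    if (dst->slots == NULL)
        return 0;
    memcpy(dst->slots, src->slots, src->count * sizeof *dst->slots);
    dst->count = src->count;
    return 1;
}

/* Resolves the index against whichever array it is used with, so a
   bitwise copy of struct ac_ref needs no fixup after cloning. */
struct slot *resolve(const struct data_array *d, const struct ac_ref *ref)
{
    assert(ref->trie < d->count);
    return &d->slots[ref->trie];
}
```

After cloning, the same `struct ac_ref` finds the clone's slot when resolved against the clone. A pointer copied bitwise would have needed an explicit remapping step.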
* Moving the reference count to the front of both _reg_trie_data and | Nicholas Clark | 2006-11-26 | 1 | -3/+7
  _reg_ac_data allows smaller code in Perl_regdupe.
  p4raw-id: //depot/perl@29391
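A shared leading member shortens duplication code. If every shared structure starts with its reference count, one helper can bump the count without knowing which structure it has. C guarantees that a pointer to a structure, suitably converted, points to its first member (C11 6.7.2.1). The structs and `share_data()` in this C sketch are hypothetical and are not taken from Perl_regdupe.

```c
#include <assert.h>
#include <stdint.h>

/* Both hypothetical structures put the reference count first. */
struct trie_like {
    uint32_t refcount;
    int      states;
};

struct ac_like {
    uint32_t refcount;
    int      fail_links;
};

/* A pointer to a structure, suitably converted, points to its first
   member, so one helper works for either type. */
void share_data(void *p)
{
    uint32_t *refcount = p;   /* first member of whichever struct */
    ++*refcount;
}
```

Without the common first member, the duplication code would need a separate branch per structure type to locate the count.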
* \G with /g results in infinite loop in 5.6 and later | Yves Orton | 2006-11-22 | 1 | -0/+8
  Message-ID: <9b18b3110611220811k1a54f650t1bd7c6a9450b0a7e@mail.gmail.com>
  p4raw-id: //depot/perl@29354
* Re: [PATCH] New regex syntax omnibus | Yves Orton | 2006-11-13 | 1 | -0/+1
  Message-ID: <9b18b3110611090809l667860c9t6c27453d7c86a21e@mail.gmail.com>
  p4raw-id: //depot/perl@29260
* Regex Utility Functions and Substitution Fix (XML::Twig core dump) | Yves Orton | 2006-11-13 | 1 | -0/+1
  Message-ID: <9b18b3110611121429g1fc9d6c1t4007dc711f9e8396@mail.gmail.com>
  Plus a couple of tweaks to ext/re/re.pm and t/op/pat.t for those patches to apply cleanly.
  p4raw-id: //depot/perl@29252
* New regex syntax omnibus | Yves Orton | 2006-11-07 | 1 | -0/+1
  Message-ID: <9b18b3110611060406u2fa1572as57073949a5df9e62@mail.gmail.com>
  Plus a portability fix (in string comparison for regex verbs) and doc tweaks / podchecker fixes
  p4raw-id: //depot/perl@29222
* Add more backtracking control verbs to regex engine (?CUT), (?ERROR) | Yves Orton | 2006-11-02 | 1 | -0/+4
  Message-ID: <9b18b3110611020335h7ea469a8g28ca483f6832816d@mail.gmail.com>
  p4raw-id: //depot/perl@29189
* Re: Off by one in the trie code? | Yves Orton | 2006-10-19 | 1 | -1/+2
  Message-ID: <9b18b3110610181151i3ca438cdied769ebaa4255079@mail.gmail.com>
  1. code necessary to make patterns with interpolated vars behave correctly under lexical re 'debug', including additional tests,
  2. changes necessary to resolve the off by one error,
  3. tweaks to re.pm to document that re 'debug' is lexical.
  p4raw-id: //depot/perl@29057
* Add possessive quantifiers to regex engine. | Yves Orton | 2006-10-13 | 1 | -0/+6
  Message-ID: <9b18b3110610121223m191e47ddtce3398cb0e8ba320@mail.gmail.com>
  With doc tweaks
  p4raw-id: //depot/perl@29005