path: root/regexp.h
Commit message | Author | Age | Files | Lines
* Add /a regex modifier | Karl Williamson | 2011-01-17 | 1 | -0/+3
| | | | | This restricts certain constructs, like \w, to matching in the ASCII range only.
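As a rough illustration of what "ASCII range only" means for a construct like \w, the following C sketch tests a code point against the ASCII word characters `[A-Za-z0-9_]`. The helper name `is_word_char_ascii` is hypothetical and is not taken from regexp.h or the regex engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: an ASCII-restricted test for a \w-style word character.
 * is_word_char_ascii is a hypothetical helper, not a function from regexp.h. */
static bool is_word_char_ascii(uint32_t cp)
{
    return (cp >= 'a' && cp <= 'z')
        || (cp >= 'A' && cp <= 'Z')
        || (cp >= '0' && cp <= '9')
        || cp == '_';
}
```

With this definition, `assert(is_word_char_ascii('_'))` holds, while a code point above the ASCII range such as 0xE9 is rejected.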
* Change name of /d to DEPENDS | Karl Williamson | 2011-01-16 | 1 | -3/+3
| | | | | | | I much prefer David Golden's name for /d whose meaning 'depends' on circumstances, instead of 'dual' meaning it could be one or another. Change it before this gets out in a stable release, and we're stuck with the old name.
* Use multi-bit field for regex character set | Karl Williamson | 2011-01-16 | 1 | -1/+27
| | | | | | | | | | | | | The /d, /l, and /u regex modifiers are mutually exclusive. This patch changes the field that stores the character set to use more than one bit with an enum determining which one. This data structure more closely follows the semantics of their being mutually exclusive, and conserves bits as well, and is better expandable. A small API is added to set and query the bit field. This patch is not .xs source backwards compatible. A handful of cpan programs are affected.
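The idea of a multi-bit field selected by an enum, with a small API to set and query it, can be sketched in C roughly as follows. All names, the field width, and the bit position are assumptions made for illustration; the actual definitions in regexp.h may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; the real regexp.h definitions may differ. */
typedef enum {
    CHARSET_DEPENDS = 0,  /* /d */
    CHARSET_LOCALE  = 1,  /* /l */
    CHARSET_UNICODE = 2   /* /u */
} charset_t;

#define CHARSET_SHIFT 5u                               /* assumed field position */
#define CHARSET_MASK  (UINT32_C(3) << CHARSET_SHIFT)   /* assumed two-bit width  */

/* Store a character set in the field, leaving every other flag bit alone. */
static inline uint32_t set_charset(uint32_t flags, charset_t cs)
{
    assert((unsigned)cs <= 3u);
    return (flags & ~CHARSET_MASK) | ((uint32_t)cs << CHARSET_SHIFT);
}

/* Read the character set back out of the field. */
static inline charset_t get_charset(uint32_t flags)
{
    return (charset_t)((flags & CHARSET_MASK) >> CHARSET_SHIFT);
}
```

Because the field holds exactly one enum value, two character sets cannot be recorded at once, which is the mutual exclusion the commit message describes.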
* Fix typos (spelling errors) in Perl sources. | Peter J. Acklam) (via RT | 2011-01-07 | 1 | -2/+2
| | | | | | | | | # New Ticket Created by (Peter J. Acklam) # Please include the string: [perl #81904] # in the subject line of all future correspondence about this issue. # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81904 > Signed-off-by: Abigail <abigail@abigail.be>
* The docs for SvRX and SvRXOK still referred to magic and the code snippet | David Leadbeater | 2010-12-07 | 1 | -9/+6
| | | | was wrong.
* regcomp.pl -> regen/regcomp.pl | Father Chrysostomos | 2010-10-13 | 1 | -6/+6
|
* Add /d, /l, /u (infixed) regex modifiers | Karl Williamson | 2010-09-22 | 1 | -3/+9
| | | | This patch adds recognition of these modifiers, with appropriate action for d and l. u does nothing useful yet. This allows for the interpolation of a regex into another one without losing the character set semantics that it was compiled with, as for the first time, the semantics is now specified in the stringification as one of these modifiers. To this end, it allocates an unused bit in the structures. The offsets change so as to not disturb other bits.
* Change to use mnemonic instead of char constant | Karl Williamson | 2010-09-22 | 1 | -0/+1
| | | | The new '^' in (?^...) should really be a macro.
* Add (?^...) regex construct | Karl Williamson | 2010-09-20 | 1 | -0/+4
| This adds (?^...) to signify to use the default regex modifiers for the cluster or embedded pattern-match modifier change. The major purpose of this is to simplify regex stringification, so that "^" is output in place of "-xism". As a result, the stringification will not change in the future when new regex modifiers are added, so tests, etc. that rely on a particular stringification will have to change now, but never again.
|
| Code that needs to work properly with both old- and new-style regexes can use something like the following:
|
|     # Accept both old and new-style stringification
|     my $modifiers = (qr/foobar/ =~ /\Q(?^/) ? '^' : '-xism';
|
| This construct is Ben Morrow's idea.
* regexp.h: Move bits around | Karl Williamson | 2010-08-11 | 1 | -14/+16
| | | | | | | | | | | | make regen needed. This commit moves some bits in extflags around so that all the unallocated ones are at the boundary between the unshared portion and the portion shared with op.h. This allows them to be allocated in the future to go either way, without affecting binary compatibility at that time. The high-order bits are unaffected, but the low order ones move to fill the gap.
* op.h, regexp.h: renumber shifts. | Karl Williamson | 2010-08-11 | 1 | -25/+25
| | | | | | This patch doesn't change any generated code. It just changes the base numbering of the shifts from 1 to 0. In regexp.h the RXf_BASE_SHIFT was changed to make sure the used bits didn't change
* op_reg_common.h: Continue refactoring | Karl Williamson | 2010-08-11 | 1 | -14/+4
| | | | The new op_reg_common.h did not have in it all the things that made sense for it to have, including some comment changes that I should have made when I created it. I also realized that the new mechanism of using shifts allowed RXf_PMf_STD_PMMOD_SHIFT to actually control things, rather than be a #define that one had to remember to change if those things changed independently. Finally, I created a check so that adding bits without adding them to RXf_PMf_COMPILETIME will force a compilation error. (This came from the school of hard knocks.)
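One way such a compile-time check could look is sketched below in C. The `CT_` names, bit positions, and the exact form of the test are assumptions for illustration and are not copied from op_reg_common.h; the point is only that a set maintained by hand is compared against a mask derived from the flag count, so forgetting to extend the set stops the build.

```c
#include <assert.h>

/* Hypothetical names and bit positions; a sketch of the idea only. */
#define CT_BASE_SHIFT   0
#define CT_MULTILINE    (1u << (CT_BASE_SHIFT + 0))
#define CT_SINGLELINE   (1u << (CT_BASE_SHIFT + 1))
#define CT_FOLD         (1u << (CT_BASE_SHIFT + 2))
#define CT_EXTENDED     (1u << (CT_BASE_SHIFT + 3))
#define CT_FLAG_COUNT   4   /* bump this whenever a flag is added above */

/* The set that has to be kept complete by hand. */
#define CT_COMPILETIME  (CT_MULTILINE | CT_SINGLELINE | CT_FOLD | CT_EXTENDED)

/* Every bit in the allocated range, derived from the count alone. */
#define CT_ALL_BITS     (((1u << CT_FLAG_COUNT) - 1u) << CT_BASE_SHIFT)

/* Adding a flag and bumping the count without extending CT_COMPILETIME
 * makes the two masks disagree, which aborts compilation here. */
#if CT_COMPILETIME != CT_ALL_BITS
#  error "CT_COMPILETIME is missing a flag bit"
#endif

/* Keep only the compile-time flags from an arbitrary flag word. */
static inline unsigned ct_compiletime_flags(unsigned flags)
{
    return flags & CT_COMPILETIME;
}
```

A preprocessor `#if`/`#error` pair is used here because it works on pre-C11 compilers; a static assertion would serve the same purpose on newer ones.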
* regexp.h: Nit in comments | Karl Williamson | 2010-08-11 | 1 | -4/+4
|
* op_reg_common.h: Refactor variable for safety | Karl Williamson | 2010-08-11 | 1 | -1/+1
| | | | | | | | This patch changes the variable that tells how many common bits there are to instead be +1 that value, so bits won't get reused. A later commit will renumber the bits in op.h and regexp.h, but for now things are left as-is there, which means the base variables in those two files must subtract one to compensate for the +1
* regexp.h, op.h: decouple mostly from op_reg_common.h | Karl Williamson | 2010-08-11 | 1 | -24/+26
| | | | This patch changes the shift bases to new ones local to the files, each set from the common one. Thus, there is now a single point of coupling in each file to the common one.
* regexp.h: Fix error check to use correct offset | Karl Williamson | 2010-08-11 | 1 | -1/+1
|
* Refactor common parts of op.h, regexp.h into new .h | Karl Williamson | 2010-07-29 | 1 | -31/+29
| | | | op.h and regexp.h share common elements in their data structures. They have had to be kept in sync manually. This patch makes it easier by putting those common parts into a common header #included by the two. To do this, it seemed easiest to change the symbol definitions to use left shifts to generate the flag bits. But this meant that regcomp.pl and ext/B/defsubs_h.PL had to be taught to recognize those forms of expressions, done in separate commits.
* regexp.h: Add some comments | Karl Williamson | 2010-07-29 | 1 | -1/+11
|
* reduce size of regmatch_state.u.curlyx by 2 words | David Mitchell | 2010-06-06 | 1 | -3/+2
|
* Add s///r (non-destructive substitution). | David Caldwell | 2010-05-22 | 1 | -1/+3
| This changes s/// so that it doesn't act destructively on its target. Instead it returns the result of the substitution (or the original string if there was no match).
|
| In addition this patch:
|
|   * Adds a new warning when s///r happens in void context.
|   * Adds an error when you try to use s///r with !~
|   * Makes it so constant strings can be bound to s///r with =~
|   * Adds documentation.
|   * Adds some tests.
|   * Updates various debug code so it knows about the /r flag.
|   * Adds some new 'r' words to B::Deparse.
* Remove union _xivu from struct regexp - replace it with a non-union paren_names. | Nicholas Clark | 2010-05-21 | 1 | -5/+2
| | | | This was the only user of xivu_hv in union _xivu, so remove that too.
* In the SV body, exchange the positions of the NV and stash/magic. | Nicholas Clark | 2010-05-21 | 1 | -1/+1
|
* tries: don't allocate memory at runtime | David Mitchell | 2010-05-03 | 1 | -9/+6
| This is an indirect fix for [perl #74484] Regex causing exponential runtime+mem usage
|
| The trie runtime code was doing more SAVETMPS than FREETMPS and was thus growing a large tmps stack on heavy backtracking. Rather than fixing this directly, I rewrote part of the trie code so that it no longer needs to allocate memory in S_regmatch (it still does in find_byclass()).
|
| The basic issue is that multiple branches in the trie may trigger an accept state; for example:
|
|     "abcd" =~ /xyz|abcd.*X|ab.*Y|/
|
| Here, words (branches) 2 and 3 are accept states. The original approach was, at run time, to create a list of accepted word numbers and the character positions of the end of each of those words, then run the rest of the pattern for each word in the list in turn (in word index order). This requires memory for the list to be allocated and freed.
|
| The new approach involves creating extra info at compile time; in particular, for each word, a pointer to the previous accepted word (if any) in the state tree. For example, for the above pattern, part of the state tree may be
|
|       a    b    c    d
|     1 -> 2 -> 3 -> 4 -> 5
|             (#3)      (#2)
|
| (e.g. at state 1, if the next char is 'a', we transition to state 2). Here, state 3 is an accept state with word #3, and 5 is an accept state with word #2. So we build a table indexed by word number, which has wordinfo[2] = 3, wordinfo[3] = 0, thus building the word chain 2->3->0.
|
| At run time we run the trie to completion, and remember the word associated with the longest accept state (word #2 above). Then by following back the chain of .prev fields, we can produce a list of all accepting words. We then iteratively find the smallest-numbered (i.e. LH-most) word in the chain, and run with it. On failure and backtrack, we find the next-smallest and so on.
|
| Since we are no longer recording the end-position of each word in the string, we have to recalculate this for each backtrack. We initially record the end-position of the shortest accepting word, and given that we know the length of each word, we can calculate the new position each time as an offset from that first word. Depending on unicode and folding, that calculation can be cheap or expensive.
|
| This algorithm is optimised for the typical case where there are a small number (<= 2) of accepting states.
|
| This patch creates a new compile-time array, trie->wordinfo[], indexed by word number, which contains relevant info about each word. This also supersedes the old trie->newword[] array, whose function of recording "overspills" of multiple words per accept state is now handled as part of the wordinfo[].prev chain.
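The chain walk described in this message can be sketched in C as below. This is a simplified model, not the code from regexec.c: `wordinfo_t` is reduced to a single hypothetical `prev` member, and `next_word` is an invented helper that finds the smallest-numbered word in the chain that has not yet been tried.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-word compile-time info. */
typedef struct {
    unsigned prev;  /* previous accepting word in the chain, 0 = end of chain */
} wordinfo_t;

/* Walk the chain that starts at `head` (the word of the longest accept
 * state) and return the smallest word number strictly greater than `after`.
 * Returns 0 when no untried word is left. No memory is allocated. */
static unsigned next_word(const wordinfo_t *info, unsigned head, unsigned after)
{
    unsigned best = 0;
    assert(info != NULL);
    for (unsigned w = head; w != 0; w = info[w].prev) {
        if (w > after && (best == 0 || w < best))
            best = w;
    }
    return best;
}
```

With the chain 2->3->0 from the example above (`info[2].prev = 3`, `info[3].prev = 0`), calling `next_word(info, 2, 0)` yields word 2, then `next_word(info, 2, 2)` yields word 3 on backtrack, and `next_word(info, 2, 3)` yields 0 to signal that every accepting word has been tried. Rescanning the chain on each backtrack is cheap when the chain is short, which matches the stated assumption of two or fewer accepting states.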
* much better swap logic to support reentrancy and fix assert failure | George Greer | 2009-07-26 | 1 | -1/+1
| | | | | | | | | | | Commit c74340f9 added backreferences as well as the idea of a ->swap regex pointer to keep track of the match offsets in case of backtracking. The problem is that when Perl re-enters the regex engine to handle utf8::SWASHNEW, the ->swap is not saved/restored/cleared so any capture from the utf8 (Perl) code could inadvertently modify the regex match data that caused the utf8 swash to get built. This change should close out RT #60508
* Eliminate struct regexp_allocated and xpvio_allocated. | Nicholas Clark | 2009-07-17 | 1 | -6/+0
| | | | | | Calculate memory allocation using regexp and XPVIO, and the offset of the first real structure member. This avoids tripping over alignment differences between X* and x*_allocated, because x*_allocated doesn't have a double in it.
* Revert SvPVX() to allow lvalue usage, but also add a | Marcus Holland-Moritz | 2008-11-07 | 1 | -0/+2
| | | | | | | MUTABLE_SV() check. Use SvPVX_const() instead of SvPVX() where only a const SV* is available. Also fix two falsely consted pointers in Perl_sv_2pv_flags(). p4raw-id: //depot/perl@34770
* SvPV() does not take a const SV*, which means that the pattern argument | Nicholas Clark | 2008-10-30 | 1 | -1/+1
| | | | | | | | | | to Perl_re_compile() can't be const, which means that the pattern argument to Perl_pregcomp() can't be const, as can't the argument in the function in the regexp engine structure. It's a shame that no-one spotted this earlier. (Again) I may have rendered the documentation inaccurate. p4raw-id: //depot/perl@34672
* Update copyright years. | Nicholas Clark | 2008-10-25 | 1 | -1/+1
| | | p4raw-id: //depot/perl@34585
* Re: [PATCH] readable assertion names, now the rest | Reini Urban | 2008-06-08 | 1 | -27/+27
| | | | | | From: "Reini Urban" <rurban@x-ray.at> Message-ID: <6910a60806080717h1aaaef1fh425a2ef21a62c9ed@mail.gmail.com> p4raw-id: //depot/perl@34030
* Fix bit-fields for VC [was RE: [perl #50386] GIMME_V broken with 5.10.0/GCC and XS?] | Jan Dubois | 2008-02-12 | 1 | -2/+2
| | | | From: "Jan Dubois" <jand@activestate.com> Message-ID: <02ee01c8651b$17ef72f0$47ce58d0$@com> p4raw-id: //depot/perl@33292
* Standardise the conditional compilation protection of ({}) from | Nicholas Clark | 2008-01-26 | 1 | -2/+2
|     #if defined(__GNUC__) && !defined(__STRICT_ANSI__) && !defined(PERL_GCC_PEDANTIC)
| to
|     #if defined(__GNUC__) && !defined(PERL_GCC_BRACE_GROUPS_FORBIDDEN)
| because the ({}) construction can be used under __STRICT_ANSI__ (and should be, because it avoids temporary use of PL_Sv).
|
| p4raw-id: //depot/perl@33077
* consting | Robin Barker | 2008-01-14 | 1 | -6/+7
| | | | | | From: "Robin Barker" <Robin.Barker@npl.co.uk> Message-ID: <46A0F33545E63740BC7563DE59CA9C6D0939CA@exchsvr2.npl.ad.local> p4raw-id: //depot/perl@32976
* Well, I know *something* passed make test from a clean build before | Nicholas Clark | 2008-01-11 | 1 | -2/+2
| | | | | | change 32961, and I thought that it was the right thing, but I guess not. It should have read like this. p4raw-id: //depot/perl@32962
* assert that these are the regexps you were looking for. | Nicholas Clark | 2008-01-11 | 1 | -6/+39
| | | | | | | (at least for the most commonly used macros). Remove the duplicate definition of RX_SUBBEG(), which I was sure I'd done earlier. p4raw-id: //depot/perl@32961
* Fix prototype in regexp code following #32851, and regen | Steve Hay | 2008-01-09 | 1 | -1/+1
| | | p4raw-id: //depot/perl@32925
* ReREFCNT_inc() should return a pointer to REGEXP. | Nicholas Clark | 2008-01-07 | 1 | -2/+2
| | | | | | [I don't get warnings about void context here, but I'm sure someone will :-(] p4raw-id: //depot/perl@32890
* Don't allocate the NV slot for SVt_REGEXP. | Nicholas Clark | 2008-01-05 | 1 | -33/+42
| | | p4raw-id: //depot/perl@32859
* In struct regexp move the member paren_names to the IV union. | Nicholas Clark | 2008-01-05 | 1 | -2/+4
| | | p4raw-id: //depot/perl@32854
* Convert all accesses of the member paren_names of struct regexp to | Nicholas Clark | 2008-01-05 | 1 | -0/+2
| | | | | | be accessed via RXp_PAREN_NAMES(). (They are entirely within the regexp implementation). p4raw-id: //depot/perl@32853
* Abolish RXf_UTF8. Store the UTF-8-ness of the pattern with SvUTF8(). | Nicholas Clark | 2008-01-05 | 1 | -2/+1
| | | p4raw-id: //depot/perl@32852
* Abolish wraplen from struct regexp. We're already storing it in SvCUR. | Nicholas Clark | 2008-01-05 | 1 | -2/+1
| | | p4raw-id: //depot/perl@32845
* Abolish RXp_PRELEN(rx) and RXp_WRAPLEN() | Nicholas Clark | 2008-01-05 | 1 | -7/+5
| | | | | | Fix up some uses of RX_* macros in the block conditionally compiled with STUPID_PATTERN_CHECKS. p4raw-id: //depot/perl@32843
* Abolish wrapped in struct regexp - store the wrapped pattern pointer | Nicholas Clark | 2008-01-05 | 1 | -4/+2
| | | | | in the SvPVX(). p4raw-id: //depot/perl@32841
* Add RX_UTF8(), which is effectively SvUTF8() but for regexps. | Nicholas Clark | 2008-01-05 | 1 | -4/+5
| | | | | | | Remove RXp_PRECOMP() and RXp_WRAPPED(). Change the parameter of S_debug_start_match() from regexp to REGEXP. Change its callers [the only part wrong for 5.10.x] p4raw-id: //depot/perl@32840
* Fix the compile for -DPERL_OLD_COPY_ON_WRITE (apart from the tenacious | Nicholas Clark | 2008-01-05 | 1 | -2/+3
| | | | | broken window: ../ext/Compress/Raw/Zlib/t/07bufsize.t) p4raw-id: //depot/perl@32837
* Make struct regexp the body of SVt_REGEXP SVs, REGEXPs become SVs, | Nicholas Clark | 2008-01-02 | 1 | -27/+43
| | | | and regexp reference counting is via the regular SV reference counting. This was not as easy as it looks. p4raw-id: //depot/perl@32804
* Wrap all dereferences of struct regexp* in macros RX_*() [and for | Nicholas Clark | 2008-01-02 | 1 | -16/+41
| | | | | | | regcomp.c and regexec.c RXp_* where necessary] so that in future we can maintain source compatibility when we add an extra level of dereferencing. p4raw-id: //depot/perl@32802
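The value of routing every member access through a macro can be seen in a small C sketch. The structures and the `_sketch`-suffixed macro names below are hypothetical and heavily reduced; they only model the pattern of one macro family that takes the body pointer and another that takes the outer handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the real structures. */
struct regexp_sketch {
    unsigned int nparens;
    unsigned int extflags;
};

/* Assumed shape after an extra level of dereferencing has been added:
 * callers hold an outer handle that points at the body. */
typedef struct {
    struct regexp_sketch *body;
} REGEXP_sketch;

/* RXp_* style: takes a pointer to the body itself. */
#define RXp_NPARENS_sketch(rx)   ((rx)->nparens)

/* RX_* style: takes the outer handle. Only this definition knows about
 * the extra hop, so calling code does not change when the layout does. */
#define RX_NPARENS_sketch(prog)  (RXp_NPARENS_sketch((prog)->body))
```

Both macros expand to lvalues, so calling code can read or assign through them, for example `RX_NPARENS_sketch(&re) = 5u;`.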
* Reorder the external regexp flags to get RXf_PMf_STD_PMMOD into the | Nicholas Clark | 2007-12-29 | 1 | -40/+40
| | | | | | | lowest 4 bits (which saves a shift), and the "flags indicating special patterns" into contiguous bits. This makes everything a little tidier, and saves 88 bytes (woohoo!) of object file with -Os on x86 FreeBSD. p4raw-id: //depot/perl@32775
* The position of the modifier flag bits is actually encoded by a right | Nicholas Clark | 2007-12-29 | 1 | -0/+1
| | | | | | | shift 12 in two places, so replace that magic number with a macro RXf_PMf_STD_PMMOD_SHIFT defined adjacent to the flags it interacts with. p4raw-id: //depot/perl@32774
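A minimal C sketch of replacing a magic shift count with a named macro follows. The `SK_` names are hypothetical and the flag layout (four modifier bits starting at bit 12) is an assumption based on the message above, not a copy of the header at that revision.

```c
#include <assert.h>

/* Hypothetical names; the shift value 12 comes from the commit message. */
#define SK_STD_PMMOD_SHIFT  12

/* Flags defined relative to the shift, so they move together with it. */
#define SK_PMf_MULTILINE    (1u << (SK_STD_PMMOD_SHIFT + 0))
#define SK_PMf_SINGLELINE   (1u << (SK_STD_PMMOD_SHIFT + 1))
#define SK_PMf_FOLD         (1u << (SK_STD_PMMOD_SHIFT + 2))
#define SK_PMf_EXTENDED     (1u << (SK_STD_PMMOD_SHIFT + 3))
#define SK_PMf_STD_PMMOD    (SK_PMf_MULTILINE | SK_PMf_SINGLELINE | \
                             SK_PMf_FOLD | SK_PMf_EXTENDED)

/* Extract the standard modifiers as a compact 0..15 value. The shift is
 * spelled with the macro rather than a literal 12, so both the flag
 * definitions and this extraction stay in agreement. */
static inline unsigned sk_std_pmmod(unsigned extflags)
{
    return (extflags & SK_PMf_STD_PMMOD) >> SK_STD_PMMOD_SHIFT;
}
```

For instance, `sk_std_pmmod(SK_PMf_FOLD)` evaluates to 4 regardless of where the shift places the field.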
* Note to future self about moving the regexp flag bits around. | Nicholas Clark | 2007-12-29 | 1 | -1/+3
| | | p4raw-id: //depot/perl@32759