path: root/ext/re
Commit message | Author | Age | Files | Lines
* add strbeg argument to Perl_re_intuit_start() | David Mitchell | 2013-06-02 | 2 | -4/+10
| (Note that this is a change both to the perl API and the regex engine
| plugin API.)
|
| Currently, Perl_re_intuit_start() is passed an SV, plus pointers to
| where in the string to start matching (strpos) and to the end of the
| string (strend).
|
| Unlike Perl_regexec_flags(), it doesn't also have a strbeg arg. Because
| of this, it guesses strbeg: based on the passed SV (if it's SvPOK()), or
| just set to strpos otherwise. The latter can happen if, for example, the
| SV is overloaded. Note also that this latter guess is wrong, and could in
| theory make /\b.../ fail.
|
| But just to confuse matters, although Perl_re_intuit_start() itself uses
| its guesstimate strbeg var, some of the functions it calls use the global
| value of PL_bostr instead. To make this work, the *callers* of
| Perl_re_intuit_start() currently set PL_bostr first. This is why \b
| doesn't actually break.
|
| The fix to this unholy mess is to simply add a strbeg arg to
| Perl_re_intuit_start(). It's also the first step to eliminating PL_bostr
| altogether.
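To see why a wrong strbeg guess could in theory break \b, here is a minimal Python sketch. It is not the engine's C code: the function names and the simplified word-character test are assumptions made for illustration, and only strbeg, strpos and strend come from the commit message.

```python
def is_word_char(ch):
    # Simplified stand-in for the engine's \w test (assumption).
    return ch.isalnum() or ch == "_"


def at_word_boundary(s, pos, strbeg, strend):
    """Model of \\b at `pos`: the character before `pos` is only
    inspected when `pos` lies after the believed start of the string."""
    before = pos > strbeg and is_word_char(s[pos - 1])
    after = pos < strend and is_word_char(s[pos])
    return before != after


s = "foobar"
strpos = 3  # start matching at the 'b'

# With the true start of the string, 'o' precedes 'b': no boundary.
correct = at_word_boundary(s, strpos, strbeg=0, strend=len(s))

# If strbeg is guessed to be strpos, the preceding 'o' is never looked at
# and the position is wrongly treated as the start of the string.
guessed = at_word_boundary(s, strpos, strbeg=strpos, strend=len(s))
```

Passing strbeg explicitly removes the need for the guess, which is the effect the commit describes.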
* typo fix for re | David Steinbrunner | 2013-05-25 | 1 | -2/+2
| Bump $VERSION.
* Enable perl core tests to pass when locale support is not available. | Jess Robinson | 2013-02-09 | 2 | -1/+7
| use locale - this will now die if $Config{d_setlocale} is not true.
|
| All tests that use locale will skip if $Config{d_setlocale} is not true.
| This enables us to pass tests on Android, which uses ICU instead of
| locales.
|
| The committer removed trailing white space.
* fix a hash order dependency in t/re_funcs_u.t | Yves Orton | 2012-10-26 | 1 | -1/+1
|
* ext/re: Optimize XPUSH's to EXTEND(), PUSH,... | Steffen Mueller | 2012-10-22 | 2 | -3/+4
|
* Remove the MPE/iX port. | Nicholas Clark | 2012-09-21 | 1 | -3/+0
| MPE/iX was a business-oriented minicomputer operating system made by
| Hewlett-Packard. Support from HP terminated at the end of 2010.
* Add empty inline_invlist.c | Karl Williamson | 2012-08-25 | 1 | -2/+7
| This will be used for things that need to handle inversion lists in the
| three files that currently use them. I'm putting this in a separate hdr,
| because inversion lists are very internal-only, so should not be grouped
| in with things that there is an external API for. It is a dot-c file so
| that the functions can continue to be declared with embed.fnc, and
| porting/args_assert.t will continue to work, as it looks only in .c
| files.
* re.pm: Nits in pod | Karl Williamson | 2012-08-11 | 1 | -15/+23
| This has clarifications, grammar changes, and reflowing to fit into 79
| columns.
* Optimize a single character [class] into EXACTish | Karl Williamson | 2012-07-24 | 1 | -4/+4
| Things like /[s]/ or /[s]/i can be optimized as if they did not have the
| brackets, /s/ or /s/i.
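The optimization is safe because a one-character bracketed class accepts exactly the same text as the bare character. The following sketch checks that equivalence with Python's `re` module, used here only as a stand-in for Perl's engine; the helper name and the sample subjects are assumptions made for illustration.

```python
import re


def same_verdicts(pattern_a, pattern_b, subjects, flags=0):
    """True if both patterns agree (match / no match) on every subject."""
    return all(
        bool(re.search(pattern_a, s, flags)) == bool(re.search(pattern_b, s, flags))
        for s in subjects
    )


SUBJECTS = ["s", "S", "miss", "MISS", "x", ""]

# /[s]/ versus /s/
plain_equivalent = same_verdicts("[s]", "s", SUBJECTS)

# /[s]/i versus /s/i
folded_equivalent = same_verdicts("[s]", "s", SUBJECTS, re.IGNORECASE)
```

A class with more than one member, such as `[st]`, does not have this property, so the shortcut applies only to the single-character case.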
* PATCH: [perl #113750] re.pm clobbers $_ | Karl Williamson | 2012-06-20 | 2 | -3/+8
| Thanks to Jesse Luehrs and Father Chrysostomos for testing advice.
* update docs for (?{}) jumbo fix | David Mitchell | 2012-06-14 | 1 | -2/+3
| Update the docs and add perldelta entries summarising the changes and
| fixes related to (?{}) and (??{}) accumulated over the 120 or so commits
| in this branch.
* propagate /msix and (?msix) etc flags into (??{}) | David Mitchell | 2012-06-13 | 1 | -1/+10
| In
|
|     /.........(??{ some_string_value; }).../flags
|
| and
|
|     /(?flags).(??{ some_string_value; }).../,
|
| use flags when compiling the inner /some_string_value/ pattern.
|
| Achieve this by storing the compile-time modifier flags in the
| (apparently) unused 'flags' field of the EVAL node in the (??{}) case.
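The idea of carrying the outer modifiers in the node and reusing them when the inner pattern is compiled can be sketched as follows. This is a Python model, not Perl's implementation: the `EvalNode` class and its `compile_result` method are hypothetical, and Python's `re` flags stand in for /msix.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalNode:
    """Hypothetical model of a (??{ ... }) node."""
    code: Callable[[], str]   # the code block; returns the inner pattern text
    flags: int = 0            # compile-time modifiers of the enclosing pattern

    def compile_result(self):
        # Compile the string produced by the code block using the
        # modifiers that were stored when the outer pattern was compiled.
        return re.compile(self.code(), self.flags)


with_flags = EvalNode(lambda: "abc", flags=re.IGNORECASE).compile_result()
without_flags = EvalNode(lambda: "abc").compile_result()
```

Here `with_flags` accepts "ABC" because the outer case-insensitive modifier was propagated, while `without_flags` does not.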
* make Perl_... and my_re_op_compile sigs match | David Mitchell | 2012-06-13 | 1 | -1/+1
|
* bump re.pm version number | David Mitchell | 2012-06-13 | 1 | -1/+1
|
* fix =/== typo in ext/re/t/regop.t | David Mitchell | 2012-06-13 | 1 | -1/+1
|
* add op_comp field to regexp_engine API | David Mitchell | 2012-06-13 | 1 | -2/+6
| Perl's internal function for compiling regexes that knows about code
| blocks, Perl_re_op_compile, isn't part of the engine API. However, the
| way that regcomp.c is dual-lifed as ext/re/re_comp.c with debugging
| compiled in means that Perl_re_op_compile is also compiled as
| my_re_op_compile. These days the mechanism to choose whether to call the
| main functions or the debugging my_* functions when 'use re debug' is in
| scope is the re engine API jump table. Ergo, to ensure that
| my_re_op_compile gets called appropriately, this method needs adding to
| the jump table.
|
| So, I've added it, but documented as 'for perl internal use only, set to
| null in your engine'.
|
| I've also updated current_re_engine() to always return a pointer to a
| jump table, even if we're using the internal engine (formerly it returned
| null). This then allows us to use the simple condition (eng->op_comp) to
| determine whether the current engine supports code blocks.
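The jump-table arrangement can be modelled in a few lines. This is a Python sketch under stated assumptions, not the C structure: the `RegexpEngine` class, the stub compile functions and the `plugin` parameter are hypothetical, and only the names op_comp and current_re_engine come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RegexpEngine:
    """Hypothetical model of an engine jump table."""
    comp: Callable[[str], object]
    # Internal-use entry; a plugin engine leaves it unset (null in C).
    op_comp: Optional[Callable[[list], object]] = None


def _core_comp(pattern):
    return ("core", pattern)


def _core_op_comp(ops):
    return ("core", "".join(ops))


CORE_ENGINE = RegexpEngine(comp=_core_comp, op_comp=_core_op_comp)


def current_re_engine(plugin=None):
    # Always return a table, even for the internal engine, so callers
    # never need a separate "no engine" case.
    return plugin if plugin is not None else CORE_ENGINE


def supports_code_blocks(eng):
    # Mirrors the simple condition (eng->op_comp) from the commit message.
    return eng.op_comp is not None


plugin_engine = RegexpEngine(comp=_core_comp)
```

Because the lookup never returns null, the single test on `op_comp` is enough to tell the core and debugging engines apart from plugins.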
* add Perl_re_op_compile function | David Mitchell | 2012-06-13 | 2 | -0/+2
| Make Perl_re_compile() a thin wrapper around a new function,
| Perl_re_op_compile(). This function can take either a string pattern or a
| list of ops. Then make pmruntime() pass a list of ops directly to it,
| rather than concatenating all the consts into a single string and passing
| the const to Perl_re_compile().
|
| For now, Perl_re_op_compile just does the same: if it's passed an op tree
| rather than an SV, then it just concats the consts. So this is just the
| next step towards eventually allowing the regex engine to use the ops
| directly.
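The wrapper relationship described here looks roughly like the following Python sketch. The function names echo the commit message, but the signatures, the keyword arguments and the use of plain strings as "ops" are assumptions for illustration only.

```python
import re


def re_op_compile(pattern=None, ops=None, flags=0):
    """Accept either a pattern string or a list of constant pieces."""
    if ops is not None:
        # For now the pieces are simply concatenated, as the commit says.
        pattern = "".join(ops)
    if pattern is None:
        raise ValueError("either pattern or ops is required")
    return re.compile(pattern, flags)


def re_compile(pattern, flags=0):
    # Thin wrapper: all of the work happens in re_op_compile().
    return re_op_compile(pattern=pattern, flags=flags)


from_string = re_compile("ab+")
from_pieces = re_op_compile(ops=["a", "b+"])
```

Both entry points produce the same compiled pattern today; the list form exists so that a later change can use the pieces directly instead of joining them.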
* update the editor hints for spaces, not tabs | Ricardo Signes | 2012-05-29 | 1 | -2/+2
| This updates the editor hints in our files for Emacs and vim to request
| that tabs be inserted as spaces.
* Remove ‘Useless use of "re" pragma’ warning | Father Chrysostomos | 2012-02-03 | 2 | -5/+10
| It’s wrong.
|
|     $ ./perl -Ilib -le 'use re; print re::regmust(qr/foo/)'
|     Useless use of "re" pragma at -e line 1.
|     foo
|
| Useless, eh? OK, then:
|
|     $ ./perl -Ilib -le 'print re::regmust(qr/foo/)'
|     Undefined subroutine &re::regmust called at -e line 1.
* Increase $re::VERSION to 0.19 | Father Chrysostomos | 2012-02-03 | 1 | -1/+1
|
* Update lengthen time-out time for t/re/re.t. | Nobuhiro Iwamatsu | 2011-06-04 | 1 | -1/+1
| When this test is run on SH4, it times out. The watchdog is set to 2
| seconds, which is too short for SH4. This patch changes it to 10
| seconds.
|
|     $ time ./perl t/re/re.t
|     1..19
|     ok 1 - is_regexp(REGEXP ref)
|     ok 2 - is_regexp(REGEXP)
|     ok 3 - is_regexp("")
|     ok 4 - regexp_pattern[0] (ref)
|     ok 5 - regexp_pattern[1] (ref)
|     ok 6 - scalar regexp_pattern (ref)
|     ok 7 - regexp_pattern[0] (bare REGEXP)
|     ok 8 - regexp_pattern[1] (bare REGEXP)
|     ok 9 - scalar regexp_pattern (bare REGEXP)
|     ok 10 - !regexp_pattern("")
|     ok 11 - regnames
|     ok 12 - regnames
|     ok 13 - regnames in scalar context
|     ok 14 - regnames
|     ok 15
|     ok 16
|     ok 17
|     ok 18
|     ok 19 - Didn't loop
|
|     real 0m7.482s
|     user 0m3.848s
|     sys 0m0.036s
* reflags.t: Remove no longer applicable TODO | Karl Williamson | 2011-05-31 | 1 | -6/+2
| When this test was written, the new 5.14 regex modifiers were not usable
| in suffix notation. That changed before 5.14 shipped, but the test did
| not.
* PATCH: final [perl #86972]: Allow /(?aia)/ | Karl Williamson | 2011-04-11 | 2 | -13/+34
| This fixes "use re '/aia'", and completes the sequence of commits for
| this ticket.
* Move t/re/re.t to ext/re/t/re_funcs_u.t, so that it is not part of minitest. | Nicholas Clark | 2011-03-06 | 1 | -0/+143
| The test file is for functions in the re:: namespace implemented in
| universal.c, but needs to load re, which isn't built for minitest. As
| none of these functions are used as part of the core's build process, it
| seems best to move it with all the other tests related to the re
| extension.
* Avoid segfault in re::regmust with pluggable RE engines | David Leadbeater | 2011-02-18 | 1 | -5/+8
| re::regmust would segfault if called on a Regexp belonging to a pluggable
| regexp engine, so only allow it on the core and debugging engines.
|
| Also correctly mortalize the return values to avoid leaking.
* Decrease (unbump?) re.pm’s version | Father Chrysostomos | 2011-02-14 | 1 | -1/+1
| b4ab316d increased it unnecessarily, as it had already been increased
| since 5.13.9 (by ffedb8c).
* re.pm: Add /aa support | Karl Williamson | 2011-02-14 | 2 | -8/+35
|
* re.pm: Forbid things like /dd, /uu | Karl Williamson | 2011-02-14 | 2 | -23/+32
| This is so they can perhaps be used in the future by Perl. The test file
| is refactored to test these more comprehensively, adding tests for the
| recently added /a.
* Bump re.pm’s version | Father Chrysostomos | 2011-02-12 | 1 | -1/+1
|
* perldebug: capitalise titles | Father Chrysostomos | 2011-02-12 | 1 | -1/+1
| The capitalisation was rather inconsistent throughout.
* Version bumps for re non-dual-life modules identified by | Jesse Vincent | 2011-01-20 | 1 | -1/+1
|     ./perl -Ilib Porting/cmpVERSION.pl -xd . v5.13.8
* Add /a regex modifier | Karl Williamson | 2011-01-17 | 1 | -2/+3
| This restricts certain constructs, like \w, to matching in the ASCII
| range only.
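As a rough analogue of what /a does to \w, Python's `re.ASCII` flag applies a similar restriction. The sketch below uses Python's flag purely as an illustration of the behaviour; it is not Perl's implementation, and the helper name `word_match` is an assumption.

```python
import re


def word_match(ch, ascii_only=False):
    """Does a single character count as a \\w character?"""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\w", ch, flags) is not None


unicode_rules = word_match("é")                   # Unicode-aware matching
ascii_rules = word_match("é", ascii_only=True)    # restricted to ASCII
```

Under the ASCII restriction an accented letter no longer counts as a word character, while plain ASCII letters are unaffected.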
* Use multi-bit field for regex character set | Karl Williamson | 2011-01-16 | 1 | -2/+2
| The /d, /l, and /u regex modifiers are mutually exclusive. This patch
| changes the field that stores the character set to use more than one bit
| with an enum determining which one. This data structure more closely
| follows the semantics of their being mutually exclusive, and conserves
| bits as well, and is better expandable.
|
| A small API is added to set and query the bit field.
|
| This patch is not .xs source backwards compatible. A handful of cpan
| programs are affected.
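A multi-bit field with an enum and a small set/query API can be sketched like this. The sketch is in Python rather than C, and every name and value in it (`Charset`, `CHARSET_SHIFT`, the two-bit width, the helper functions) is hypothetical; the real field layout is not given in the commit message.

```python
from enum import IntEnum


class Charset(IntEnum):
    """Mutually exclusive character-set choices (illustrative values)."""
    DEPENDS = 0   # /d
    LOCALE = 1    # /l
    UNICODE = 2   # /u


CHARSET_SHIFT = 5                       # hypothetical position of the field
CHARSET_MASK = 0b11 << CHARSET_SHIFT    # hypothetical two-bit width


def set_charset(flags, charset):
    # Clear the whole field first, so two character sets can never be
    # recorded at the same time.
    return (flags & ~CHARSET_MASK) | (int(charset) << CHARSET_SHIFT)


def get_charset(flags):
    return Charset((flags & CHARSET_MASK) >> CHARSET_SHIFT)


flags = 0b1                                   # some unrelated flag bit
flags = set_charset(flags, Charset.LOCALE)
flags = set_charset(flags, Charset.UNICODE)   # replaces LOCALE
```

Compared with one bit per modifier, the field cannot hold an inconsistent combination, and a value is still free in the two bits for a further character set.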
* Subject: [PATCH] re.pm: Correct pod statement | Karl Williamson | 2011-01-16 | 1 | -2/+2
| The /d also overrides one of the other pragmas; not just /u, /l.
* .pm: rename variables to reflect expanded usage | Karl Williamson | 2011-01-16 | 1 | -9/+9
| Certain variables have /dul in their names. /a is about to be added, and
| maybe more, so give them more generic names to avoid future confusion.
* re.pm: correct typo | Karl Williamson | 2011-01-16 | 1 | -1/+1
|
* Fix typos (spelling errors) in ext/*. | Peter J. Acklam (via RT) | 2011-01-07 | 1 | -1/+1
| # New Ticket Created by (Peter J. Acklam)
| # Please include the string: [perl #81882]
| # in the subject line of all future correspondence about this issue.
| # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81882 >
|
| Signed-off-by: Abigail <abigail@abigail.be>
* Emit warning for use re "/ul" | Father Chrysostomos | 2010-12-04 | 2 | -1/+32
| This was an omission on my part. This should perhaps be an error, but I
| am just following what ‘use re’ already does with ‘use re "whatever"’.
* ++substr $re::VERSION, -1 | Father Chrysostomos | 2010-11-28 | 1 | -1/+1
|
* Tiny pod fix | Andreas J. Koenig | 2010-11-28 | 1 | -1/+1
|
* Bump re’s version | Father Chrysostomos | 2010-10-21 | 1 | -1/+1
|
* [perl #78072] use re '/xism'; | Father Chrysostomos | 2010-10-21 | 2 | -0/+198
|
* Convert modules in ext/ to pass minimal arguments to XSLoader::load(). | Nicholas Clark | 2010-10-14 | 1 | -2/+2
|
* Make dquote_static.c available to ext/re/ | Tony Cook | 2010-09-23 | 1 | -1/+6
| Under Win32 the main perl source directory isn't in the C include path,
| so as we do with the re source files, copy dquote_static.c to the ext/re
| directory.
* re.pm: Change comment to use new (?^...) | Karl Williamson | 2010-09-22 | 1 | -1/+1
|
* Bump module version numbers | David Golden | 2010-07-19 | 1 | -1/+1
|
* Standardize on use of 'capture group' over 'buffer' | Karl Williamson | 2010-06-28 | 1 | -1/+1
| Both terms 'capture group' and 'capture buffer' are used in the
| documentation. This patch changes most uses of the latter to the former,
| as they are referenced using "\g".
* tries: don't allocate memory at runtime | David Mitchell | 2010-05-03 | 1 | -8/+10
| This is an indirect fix for
|     [perl #74484] Regex causing exponential runtime+mem usage
|
| The trie runtime code was doing more SAVETMPS than FREETMPS and was thus
| growing a large tmps stack on heavy backtracking. Rather than fixing this
| directly, I rewrote part of the trie code so that it no longer needs to
| allocate memory in S_regmatch (it still does in find_byclass()).
|
| The basic issue is that multiple branches in the trie may trigger an
| accept state; for example:
|
|     "abcd" =~ /xyz/abcd.*X|ab.*Y|/
|
| Here, words (branches) 2 and 3 are accept states. The original approach
| was, at run time, to create a list of accepted word numbers and the
| character positions of the end of each of those words. Then run the rest
| of the pattern for each word in the list in turn (in word index order).
| This requires memory for the list to be allocated and freed.
|
| The new approach involves creating extra info at compile time; in
| particular, for each word, a pointer to the previous accepted word (if
| any) in the state tree. For example, for the above pattern, part of the
| state tree may be
|
|          a    b    c    d
|       1 -> 2 -> 3 -> 4 -> 5
|                (#3)      (#2)
|
| (e.g. at state 1, if the next char is 'a', we transition to state 2).
|
| Here, state 3 is an accept state with word #3, and 5 is an accept state
| with word #2. So we build a table indexed by word number, which has
| wordinfo[2] = 3, wordinfo[3] = 0, thus building the word chain 2->3->0.
|
| At run time we run the trie to completion, and remember the word
| associated with the longest accept state (word #2 above). Then by
| following back the chain of .prev fields, we can produce a list of all
| accepting words. We then iteratively find the smallest-numbered (i.e.
| LH-most) word in the chain, and run with it. On failure and backtrack, we
| find the next-smallest and so on.
|
| Since we are no longer recording the end-position of each word in the
| string, we have to recalculate this for each backtrack. We initially
| record the end-position of the shortest accepting word, and given that we
| know the length of each word, we can calculate the new position each time
| as an offset from that first word. Depending on unicode and folding, that
| calculation can be cheap or expensive.
|
| This algorithm is optimised for the typical case where there are a small
| number (<= 2) of accepting states.
|
| This patch creates a new compile-time array, trie->wordinfo[], indexed by
| word number, which contains relevant info about each word. This also
| supersedes the old trie->newword[] array, whose function of recording
| "overspills" of multiple words per accept state is now handled as part of
| the wordinfo[].prev chain.
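The chain-walking step can be sketched as follows. This is a Python model of the idea only: the dictionary `prev` stands in for the wordinfo[].prev fields, and the function name `next_word` and its `after` parameter are assumptions, not names from the Perl source.

```python
def next_word(prev, longest, after):
    """Return the smallest word number in the accept chain that is
    strictly greater than `after`, or 0 when the chain is exhausted.

    prev    -- maps each word number to the previously accepted word (0 = none)
    longest -- word for the longest accept state; the head of the chain
    after   -- the word that was tried last (0 before the first attempt)
    """
    best = 0
    word = longest
    while word != 0:
        if word > after and (best == 0 or word < best):
            best = word
        word = prev[word]
    return best


# The chain from the commit message: 2 -> 3 -> 0.
prev = {2: 3, 3: 0}

first = next_word(prev, longest=2, after=0)       # first word to try
second = next_word(prev, longest=2, after=first)  # tried after a backtrack
done = next_word(prev, longest=2, after=second)   # nothing left to try
```

Each call rescans the short chain instead of keeping a list of accepted words, which is why no memory has to be allocated at match time; the cost is acceptable because the chain is typically only one or two words long.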
* bump versions for core libs changed since 5.11.3 | Ricardo Signes | 2010-01-19 | 1 | -1/+1
|
* Fix typo in reference | Abigail | 2010-01-06 | 1 | -1/+1
|