path: root/ext/re
Commit message | Author | Age | Files | Lines
* move ignore for re into its own dist's gitignore | Graham Knop | 2020-11-23 | 1 | -0/+1
* add gitignore exclusions for files in git | Graham Knop | 2020-11-23 | 1 | -0/+1
    There are a number of files excluded using gitignore rules that are
    included in the repository. This can lead to confusion if something
    other than git tries to read the ignore files. Add rules to the
    gitignore files so that these files won't be ignored.
* Add missing aTHX_ | Hugo van der Sanden | 2020-09-25 | 1 | -1/+1
    Somehow dropped in 49781baac3.
* Add re::optimization() | Hugo van der Sanden | 2020-09-25 | 3 | -12/+290
    Given a compiled regexp object, this returns a hashref of the
    optimization information discovered for it.
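    A minimal sketch of calling the new function. It assumes that a perl
    older than this commit may be running the code, so the export is
    requested at run time and the script falls back when it is missing;
    the pattern qr/foobar/ and the variable names are illustrative only.
    Of the hash keys, only 'minlen' is relied on here.

```perl
use strict;
use warnings;

# Assumption: the running perl may predate this commit, so the function is
# requested at run time and we fall back if re.pm does not export it.
my $optimization = eval {
    require re;
    local $SIG{__WARN__} = sub { };    # older re.pm only warns about unknown names
    re->import('optimization');
    __PACKAGE__->can('optimization');
};

my $info;
if ($optimization) {
    $info = $optimization->(qr/foobar/);
    # 'minlen' is the minimum length a string must have for the pattern
    # to be able to match.
    print "minlen: $info->{minlen}\n";
    print "keys: ", join(", ", sort keys %$info), "\n";
}
else {
    print "re::optimization() is not available in this perl\n";
}
```

    On a perl that ships the function, the usual compile-time form is
    "use re qw(optimization);".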
* reorder re.pm docs | Hugo van der Sanden | 2020-09-25 | 1 | -22/+22
    regmust() is about internals, so it should go at the end; move docs
    for functions interrogating match results above it.
* Add quadmath to list of libraries. | Peter John Acklam | 2020-05-25 | 1 | -0/+5
* Fix a bunch of repeated-word typos | Dagfinn Ilmari Mannsåker | 2020-05-22 | 1 | -2/+2
    Mostly in comments and docs, but some in diagnostic messages and one
    case of 'or die die'.
* use re qw(debug foo) should warn | Karl Williamson | 2020-03-11 | 2 | -3/+14
    Previously, foo was silently ignored.
* ext/re/re.pm: Fix up setting debug option defaults | Karl Williamson | 2020-03-11 | 1 | -1/+13
    This was not setting the defaults properly for 'debug' and 'Debug'.
* Allow wildcard pattern debugging | Karl Williamson | 2020-03-05 | 2 | -7/+63
    This fixes #17026

    Patterns can have subpatterns since 5.30. These are processed when
    encountered, by suspending the main pattern compilation, compiling the
    subpattern, and then matching that against the set of all legal
    possibilities, which Perl knows about.

    Debugging info for the compilation portion of the subpattern was added
    by be8790133a0ce8fc67454e55e7849a47a0858d32, without fanfare. But,
    prior to this new commit, debugging info was not available for that
    matching portion of the compilation, except under DEBUGGING builds,
    with -Drv. This commit adds a new option to 'use re qw(Debug ...)',
    WILDCARD, to enable subpattern match debugging. Whatever other match
    debugging options have been turned on will show up when a wildcard
    subpattern is compiled iff WILDCARD is specified.

    The output of this may be voluminous, which is why you have to ask for
    it specifically. Or, the EXTRA option turns it on, along with several
    other things.
* Add ability to dump pre-optimized compiled pattern | Karl Williamson | 2019-08-26 | 1 | -11/+17
    in qr//
* ext/re/re.pm: Clarify pod slightly | Karl Williamson | 2019-08-26 | 1 | -2/+2
* ext/re/re.pm: White-space only, bump version | Karl Williamson | 2019-08-26 | 1 | -24/+25
* regcomp.c: Fix up RE_TRACK_PATTERN_OFFSETS | Karl Williamson | 2018-11-16 | 1 | -2/+2
    These need to be changed around as a result of removing the sizing
    pass from pattern compilation. The first element in the array is the
    number of offsets. This had become wrong. And it is used instead of
    the program length when it is available.
* -Drv now turns on all regex debugging | Karl Williamson | 2018-11-16 | 1 | -2/+4
    This commit makes the v (verbose) modifier to -Dr do something: turn
    on all possible regex debugging.
* ext/re/t/qr.t: Rmv extraneous Debug statement | Karl Williamson | 2018-11-05 | 1 | -1/+0
    It looks like this statement to turn on Debugging was used to help
    debug the issues that this test now tests, and should have been
    removed when the file was first committed. But no harm came, as no
    debugging output got generated.

    This was changed by commit 15cab4d70522286feb2fcb1e7313b2f995343181,
    "Remove references to passes from regex compiler". It unknowingly
    fixed a bug wherein a Debug statement didn't get output. That caused
    qr.t to output that debugging statement.

    The fix is to simply remove the call for Debugging from qr.t.
* Don't use VOL internally, because "volatile" works just fine | Aaron Crane | 2017-10-21 | 2 | -2/+2
    However, we do preserve it outside PERL_CORE for the use of XS authors.
* add DEBUGGING_RE_ONLY define | David Mitchell | 2017-06-23 | 3 | -1/+3
    re.xs and re_top.h both turn DEBUGGING on. Make them set
    DEBUGGING_RE_ONLY too so that it's easy to tell this is a fake
    DEBUGGING.
* re.pm: pod formatting nits, and clarifications | Karl Williamson | 2017-01-13 | 1 | -8/+29
* Add /xx regex pattern modifier | Karl Williamson | 2017-01-13 | 2 | -11/+29
    This was first proposed in the thread starting at
    http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
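    A short sketch of what the modifier does as released (perl 5.26 and
    later), going by the perlre documentation rather than by this commit
    message: under /xx, spaces and tabs inside a bracketed character
    class are ignored, whereas under plain /x they are members of the
    class. The variable names are illustrative.

```perl
use strict;
use warnings;

# Under /xx the blanks inside [...] are padding, not members of the class.
my $vowel_xx = qr/ [ a e i o u ] /xx;

# Under plain /x the blanks inside [...] are still literal members.
my $vowel_x = qr/ [ a e i o u ] /x;

my $xx_matches_e     = ("e" =~ /\A $vowel_xx \z/x) ? 1 : 0;
my $xx_matches_space = (" " =~ /\A $vowel_xx \z/x) ? 1 : 0;
my $x_matches_space  = (" " =~ /\A $vowel_x  \z/x) ? 1 : 0;

print "/xx class matches 'e':   $xx_matches_e\n";
print "/xx class matches space: $xx_matches_space\n";
print "/x  class matches space: $x_matches_space\n";
```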
* t/regop.t: improve test name | Yves Orton | 2016-10-19 | 1 | -1/+1
* Make deprecated qr//xx fatal | Karl Williamson | 2016-05-09 | 2 | -14/+6
    This has been deprecated since v5.22.
* Various tests: use centralized locale detection | Karl Williamson | 2015-11-20 | 1 | -5/+4
    These tests were using individually defined heuristics to decide
    whether to do locale testing or not. However t/loc_tools.pl provides
    functions that are more reliable and complete for determining this
    than the hand-rolled ones in these tests.
* Update list of files to clean for ceab18aaa. | Matthew Horsfall | 2015-08-20 | 1 | -1/+1
* b992490d copied wrong for ext re. | Jarkko Hietaniemi | 2015-08-20 | 1 | -5/+5
    This has been magically working since ext re builds with -I../.., and
    so picks up the inline headers from the top; the copied bogus file has
    been left unused.
* dquote_static.c -> dquote.c | Jarkko Hietaniemi | 2015-07-22 | 1 | -5/+5
    Instead of #include-ing the C file, compile it normally.
* inline_invlist.c -> invlist_inline.h | Jarkko Hietaniemi | 2015-07-22 | 1 | -6/+6
* Replace common Emacs file-local variables with dir-locals | Dagfinn Ilmari Mannsåker | 2015-03-22 | 1 | -6/+0
    An empty cpan/.dir-locals.el stops Emacs using the core defaults for
    code imported from CPAN.

    Committer's work:

    To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION
    needed to be incremented in many files, including throughout
    dist/PathTools.

    perldelta entry for module updates.

    Add two Emacs control files to MANIFEST; re-sort MANIFEST.

    For: RT #124119.
* re_intuit_start(): improve debugging output | David Mitchell | 2015-03-17 | 1 | -2/+2
    1) make string offsets be consistently counted from strbeg, rather
       than a mixture of that and strpos;
    2) make it clearer when rx_origin has been updated, since that value
       is the raison d'etre of intuit();
    3) always show the input and output offsets when calling fbm_intr()
       from intuit().
* Skip various locale tests when locales are not available | Karl Williamson | 2015-03-09 | 1 | -6/+2
    It is possible to compile Perl without locales, and some platforms may
    not have them available properly. These tests were failing under these
    conditions. This commit uses the new infrastructure in loc_tools.pl to
    centralize the knowledge of how to determine if locales are available.
* use re 'strict' doc changes | Karl Williamson | 2015-03-07 | 1 | -5/+9
    Add to perlexperiment; note that an alternative syntax has been
    proposed; nits.
* ext/re/t/re_funcs_u.t: Generalize for non-ASCII platforms | Karl Williamson | 2015-03-05 | 1 | -1/+1
* ext/re/re.pm: Fix comment | Karl Williamson | 2015-02-05 | 1 | -1/+1
* re.pm: White-space only | Karl Williamson | 2015-02-05 | 1 | -4/+4
    Fix some indents, vertically align ternary.
* Make 'no re' work | Karl Williamson | 2015-02-04 | 5 | -6/+52
    Prior to this commit, a plain 'no re;' without subpragmas only turned
    off a few things. Now it turns off all the enabled things. For
    example, previously, you couldn't turn off debugging, once enabled,
    inside the same block.
* ext/re/t/re.t: Use variable instead of constants | Karl Williamson | 2015-02-04 | 1 | -6/+9
    There are multiple occurrences of these constants in the file. It's
    better to use a variable than to repeat them.
* re.pm: Bump version to 0.31 | Karl Williamson | 2015-02-04 | 1 | -1/+1
* Add 'strict' subpragma to 'use re' | Karl Williamson | 2015-01-13 | 2 | -1/+139
    This subpragma is to allow p5p to add warnings/errors for regex
    patterns without having to worry about backwards compatibility. And
    it allows users who want to have the latest checks on their code to do
    so. An experimental warning is raised by default when it is used, not
    because the subpragma might go away, but because what it catches is
    subject to change from release-to-release, and so the user is
    acknowledging that they waive the right to backwards compatibility. I
    will be working in the near term to make some changes to what is
    detected by this.

    Note that there is no indication in the pattern stringification that
    it was compiled under this. This means I didn't have to figure out
    how to stringify it. It is fine because using this doesn't affect
    what the pattern gets compiled into, if successful. And interpolating
    the stringified pattern under either strict or non-strict should both
    just work.
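    A small sketch of opting in, assuming perl 5.22 or later, where the
    warning category is named experimental::re_strict. The pattern and
    variable names are illustrative; the example only shows the opt-in and
    the unchanged stringification, not any particular check that 'strict'
    performs.

```perl
use strict;
use warnings;

# Compiled without the subpragma.
my $plain = qr/ab+c/;

# Compiled under 'strict'; the experimental warning is acknowledged first,
# and both pragmas are confined to this block.
my $strict = do {
    no warnings 'experimental::re_strict';
    use re 'strict';
    qr/ab+c/;
};

# The stringified forms carry no trace of 'strict'.
my $same_string   = ("$plain" eq "$strict") ? 1 : 0;
my $still_matches = ("abbc" =~ /\A$strict\z/) ? 1 : 0;

print "same stringification: $same_string\n";
print "still matches:        $still_matches\n";
```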
* Bump re.pm version for changes | Matthew Horsfall | 2014-12-28 | 1 | -1/+1
* Support for nocapture regexp flag /n | Matthew Horsfall (alh) | 2014-12-28 | 1 | -1/+2
    This flag will prevent () from capturing and filling in $1, $2, etc...

    Named captures will still work though, and if used will cause $1, $2,
    etc... to be filled in *only* within named groups.

    The motivation behind this is to allow the common construct of:

        /(?:b|c)a(?:t|n)/

    To be rewritten more cleanly as:

        /(b|c)a(t|n)/n

    When you want grouping but no memory penalty on captures.

    You can also use ?n inside of a () directly to avoid capturing, and
    ?-n inside of a () to negate its effects if you want to capture.
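    The behaviour described above can be checked with a short script.
    This is a sketch assuming perl 5.22 or later; the test strings and
    variable names are illustrative.

```perl
use strict;
use warnings;

# Plain groups under /n: the match succeeds but nothing is captured.
my ($plain_matched, $plain_first) = (0, undef);
if ("bat" =~ /(b|c)a(t|n)/n) {
    $plain_matched = 1;
    $plain_first   = $1;          # stays undefined
}

# Named groups still capture, and they also fill in $1.
my ($named_value, $named_first, $named_second);
if ("can" =~ /(?<first>b|c)a(t|n)/n) {
    $named_value  = $+{first};    # "c"
    $named_first  = $1;           # "c"
    $named_second = $2;           # undefined: (t|n) is unnamed
}

# (?-n:...) switches capturing back on for the groups inside it.
my $negated_first;
if ("bat" =~ /(?-n:(b|c))a(t|n)/n) {
    $negated_first = $1;          # "b"
}

print "plain:   ", defined $plain_first ? $plain_first : "undef", "\n";
print "named:   $named_value / $named_first\n";
print "negated: $negated_first\n";
```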
* [perl #123458] list cx re::regexp_pattern($nonre) | Father Chrysostomos | 2014-12-19 | 1 | -0/+1
    It was returning (undef) in list context, though it was documented to
    return the empty list.
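    A sketch of the documented list-context behaviour, assuming a perl
    that includes this fix (5.22 or later). The inputs and variable names
    are illustrative.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# For a compiled regexp, list context yields the pattern and its modifiers.
my ($pattern, $modifiers) = regexp_pattern(qr/foo/i);

# For anything else, list context yields the empty list, so the array
# really is empty rather than holding a single undef.
my @not_a_regexp = regexp_pattern("just a string");
my $count        = @not_a_regexp;

print "pattern: $pattern, modifiers: $modifiers\n";
print "elements returned for a non-regexp: $count\n";
```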
* add filename handling to xs handshake | Daniel Dragan | 2014-11-13 | 1 | -1/+1
    - this improves the error message on ABI incompatibility, per
      [perl #123136]
    - reduce the number of gv_fetchfile calls in newXS over registering
      many XSUBs
    - "v" was not stripped from PERL_API_VERSION_STRING since string
      "vX.XX.X\0", a typical version number is 8 bytes long, and aligned
      to 4/8 by most compilers in an image. A double digit maint release
      is extremely unlikely.
    - newXS_deffile saves on machine code in bootstrap functions by not
      passing arg filename
    - move newXS to where the rest of the newXS*()s live
    - move the "no address" panic closer to the start to get it out of the
      way sooner flow wise (it has nothing to do with var gv or cv)
    - move CvANON_on to not check var name twice
    - change die message to use %p, more efficient on 32 ptr/64 IV
      platforms; see ML post "about commit "util.c: fix comiler warnings""
    - vars cv/xs_spp (stack pointer pointer)/xs_interp exist for
      inspection by a C debugger in an unoptimized build
* add xs_handshake API | Daniel Dragan | 2014-11-07 | 2 | -1/+5
    This API elevates the amount of ABI compatibility protection between
    XS modules and the interp. It also makes each boot XSUB smaller in
    machine code by removing function calls and factoring out code into
    the new Perl_xs_handshake and Perl_xs_epilog functions.

    sv.c :
    - revise padlist duping code to reduce code bloat/asserts on DEBUGGING

    ext/DynaLoader/dlutils.c :
    - disable version checking so interp startup is faster, ABI mismatches
      are impossible because DynaLoader is never available as a shared
      library

    ext/XS-APItest/XSUB-redefined-macros.xs :
    - "" means don't check the version, so switch to " " to make the test
      in xsub_h.t pass, see ML thread "XS_APIVERSION_BOOTCHECK and
      XS_VERSION is CPP defined but "", mow what?"

    ext/re/re.xs :
    - disable API version checking until #123007 is resolved

    ParseXS/Utilities.pm :
    109-standard_XS_defs.t :
    - remove context from S_croak_xs_usage similar to core commit
      cb077ed296 . CvGV doesn't need a context until 5.21.4 and commit
      ae77754ae2 and by then core's croak_xs_usage API has been long
      available and this backport doesn't need to account for newer perls
    - fix test where lack of having PERL_IMPLICIT_CONTEXT caused it to fail
* Tighten uses of regex synthetic start class | Karl Williamson | 2014-09-29 | 1 | -4/+1
    A synthetic start class (SSC) is generated by the regular expression
    pattern compiler to give a consolidation of all the possible things
    that can match at the beginning of where a pattern can possibly match.
    For example

        qr/a?bfoo/;

    requires the match to begin with either an 'a' or a 'b'. There are no
    other possibilities. We can set things up to quickly scan for either
    of these in the target string, and only when one of these is found do
    we need to look for 'foo'.

    There is an overhead associated with using SSCs. If the number of
    possibilities that the SSC excludes is relatively small, it can be
    counter-productive to use them.

    This patch creates a crude sieve to decide whether to use an SSC or
    not. If the SSC doesn't exclude at least half the "likely"
    possibilities, it is discarded. This patch is a starting point, and
    can be refined if necessary as we gain experience.

    See thread beginning with
    http://nntp.perl.org/group/perl.perl5.porters/212644

    In many patterns, no SSC is generated; and with the advent of tries,
    SSC's have become less important, so whatever we do is not terribly
    critical.
* Deprecate multiple "x" in "/xx" | Karl Williamson | 2014-09-29 | 2 | -4/+22
    It is planned for a future Perl release to have /xx mean something
    different from just /x. To prepare for this, this commit raises a
    deprecation warning if someone currently has this usage. A grep of
    CPAN did not turn up any instances of this, but this is to be safe
    anyway.

    The added code is more general than actually needed, in case we want
    to do this for another flag.
* Make space for /xx flag | Karl Williamson | 2014-09-29 | 1 | -1/+1
    This doesn't actually use the flag yet. We no longer have to make
    version-dependent changes to ext/Devel-Peek/t/Peek.t, (it being in
    /ext) so this doesn't
* Add tests for a51d618a fix of RT #122283 | Yves Orton | 2014-09-28 | 3 | -4/+72
    Add a new re debug mode for outputting stuff useful for testing.

    In this case we count the number of times that we go through
    study_chunk. With a51d618a we should do so 5 times (or less) when we
    traverse the test pattern. Without a51d618a we recurse 11 times. In
    the case of RT #122283 we would do gazillions of recursions, so many
    that I never let it run to completion.

        /
            (?(DEFINE)(?<foo>foo))
            (?(DEFINE)(?<bar>(?&foo)bar))
            (?(DEFINE)(?<baz>(?&bar)baz))
            (?(DEFINE)(?<bop>(?&baz)bop))
        /x

    I say "or less" because you could argue that since these defines are
    never called, we should not actually recurse at all, and should maybe
    just compile this as a simple empty pattern.
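    For readers unfamiliar with the construct, here is a sketch of chained
    (?(DEFINE)...) groups that are actually invoked. Unlike the test
    pattern in the commit, this one adds a call to (?&baz) so the effect
    is observable; the input strings are illustrative.

```perl
use strict;
use warnings;

# Each DEFINE block declares a named subpattern without matching anything
# itself; (?&name) then recurses into the named group.
my $chained = qr/
    (?(DEFINE) (?<foo> foo ) )
    (?(DEFINE) (?<bar> (?&foo) bar ) )
    (?(DEFINE) (?<baz> (?&bar) baz ) )
    \A (?&baz) \z
/x;

my $full_match    = ("foobarbaz" =~ $chained) ? 1 : 0;
my $partial_match = ("foobaz"    =~ $chained) ? 1 : 0;

print "foobarbaz: $full_match\n";
print "foobaz:    $partial_match\n";
```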
* Eliminate the duplicative regops BOL and EOL | Yves Orton | 2014-09-17 | 1 | -3/+3
    See also perl5porters thread titled: "Perl MBOLism in regex engine"

    In the perl 5.000 release (a0d0e21ea6ea90a22318550944fe6cb09ae10cda)
    the BOL regop was split into two behaviours MBOL and SBOL, with SBOL
    and BOL behaving identically. Similarly the EOL regop was split into
    two behaviors SEOL and MEOL, with EOL and SEOL behaving identically.

    This then resulted in various duplicative code related to flags and
    case statements in various parts of the regex engine.

    It appears that perhaps BOL and EOL were kept because they are the
    type ("regkind") for SBOL/MBOL and SEOL/MEOL/EOS. Reworking regcomp.pl
    to handle aliases for the type data so that SBOL/MBOL are of type BOL,
    even though BOL == SBOL seems to cover that case without adding to the
    confusion.

    This means two regops, a regstate, and an internal regex flag can be
    removed (and used for other things), and various logic relating to
    them can be removed.

    For the uninitiated, SBOL is /^/ and /\A/ (with or without /m) and
    MBOL is /^/m. (I consider it a fail we have no way to say MBOL without
    the /m modifier). Similarly SEOL is /$/ and MEOL is /$/m (there is
    also a /\z/ which is EOS "end of string" with or without the /m).
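    The user-visible side of these regops can be seen by counting
    matches. This sketch uses an illustrative two-line string; the regop
    named in each comment follows the mapping given in the message above.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

# "my $n = () = EXPR" counts the matches that EXPR returns in list context.
my $sbol   = () = $text =~ /^\w+/g;      # SBOL: start of string only
my $mbol   = () = $text =~ /^\w+/mg;     # MBOL: start of every line
my $abs_m  = () = $text =~ /\A\w+/mg;    # \A stays SBOL even under m
my $seol   = () = $text =~ /\w+$/g;      # SEOL: end, or before a final newline
my $meol   = () = $text =~ /\w+$/mg;     # MEOL: end of every line
my $eos    = () = $text =~ /\w+\z/g;     # EOS: very end, the newline blocks it

print "^: $sbol  ^ with m: $mbol  \\A with m: $abs_m\n";
print "\$: $seol  \$ with m: $meol  \\z: $eos\n";
```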
* ext/re/t/regop.t: Use eq instead of == for strings | Karl Williamson | 2014-08-26 | 1 | -1/+1
    Interestingly, this bug has been unnoticed for almost 3 years.
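    Why such a bug can go unnoticed: comparing non-numeric strings with ==
    converts both sides to 0, so the comparison succeeds regardless of
    content. A minimal illustration with made-up strings (the actual
    values compared in regop.t are not shown in this log):

```perl
use strict;
use warnings;

my ($expected, $got) = ("abc", "abd");

# Numeric comparison: both strings numify to 0, so this is true.
my $numeric_equal = do {
    no warnings 'numeric';    # silence "isn't numeric" for the demonstration
    ($expected == $got) ? 1 : 0;
};

# String comparison sees the difference.
my $string_equal = ($expected eq $got) ? 1 : 0;

print "== says equal: $numeric_equal\n";
print "eq says equal: $string_equal\n";
```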
* Improve -Dr output of bracketed char classes | Karl Williamson | 2014-08-25 | 1 | -3/+3
    I look at this output a lot to verify that patterns compiled
    correctly. This commit makes them somewhat easier to read, while
    extending this to also work on EBCDIC platforms (as yet untested).

    In staring at these over time, I realized that punctuation literals
    are mostly what contributes to being hard to read. [A-Z] is just as
    readable as [A-Y], but [%!@\]~] is harder to read than if there were
    fewer. Sometimes that can't be helped, but if many get output,
    inverting the pattern [^...] can cause fewer to be output. This
    commit employs heuristics to invert when it thinks that that would be
    more legible. For example, it converts the output of [^"'] to be

        ANYOF[^"'][{unicode_all}]

    instead of

        ANYOF[\x{00}-\x{1F} !#$%&()*+,\-./0-9:;<=>?@A-Z[\\\]\^_`a-z{|}~\x{7F}-\x{FF}][{unicode_all}]

    Since it is a heuristic, it may not be the best under all
    circumstances, and may need to be tweaked in the future.

    If almost all the printables are to be output, it uses a hex range, as
    that is probably more closely aligned with the intent of the pattern
    than which individual printables are desired. Again this heuristic
    can be tweaked.

    And it now prints a leading 0 on hex values that were formerly output
    as a single digit: \x{0A} instead of \x{A}.