| Commit message (Collapse) | Author | Age | Files | Lines |
| |
There are a number of files excluded using gitignore rules that are
included in the repository. This can lead to confusion if something
other than git tries to read the ignore files.
Add rules to the gitignore files so that these files won't be ignored.
| |
Somehow dropped in 49781baac3.
| |
Given a compiled regexp object, this returns a hashref of the optimization
information discovered for it.
| |
regmust() is about internals, so it should go at the end; move docs for
functions interrogating match results above it.
| |
Mostly in comments and docs, but some in diagnostic messages and one
case of 'or die die'.
| |
Instead, foo was silently ignored
| |
This was not setting the defaults properly for 'debug' and 'Debug'.
| |
This fixes #17026
Patterns can have subpatterns since 5.30. These are processed when
encountered, by suspending the main pattern compilation, compiling the
subpattern, and then matching that against the set of all legal
possibilities, which Perl knows about.
Debugging info for the compilation portion of the subpattern was added
by be8790133a0ce8fc67454e55e7849a47a0858d32, without fanfare. But,
prior to this new commit, debugging info was not available for that
matching portion of the compilation, except under DEBUGGING builds, with
-Drv. This commit adds a new option to 'use re qw(Debug ...)',
WILDCARD, to enable subpattern match debugging. Whatever other match
debugging options have been turned on will show up when a wildcard
subpattern is compiled iff WILDCARD is specified.
The output of this may be voluminous, which is why you have to ask for
it specifically. Or, the EXTRA option turns it on, along with several
other things.
| |
in qr//
| |
These need to be changed around as a result of removing the sizing pass
from pattern compilation.
The first element in the array is the number of offsets. This had
become wrong. And it is used instead of the program length when it is
available.
| |
This commit makes the v (verbose) modifier to -Dr do something: turn on
all possible regex debugging.
| |
It looks like this statement to turn on Debugging was used to help debug
the issues that this test now tests, and should have been removed when
the file was first committed. But no harm came, as no debugging output
got generated.
This was changed by commit 15cab4d70522286feb2fcb1e7313b2f995343181,
"Remove references to passes from regex compiler". It unknowingly fixed
a bug wherein a Debug statement didn't get output. That caused qr.t to
output that debugging statement.
The fix is to simply remove the call for Debugging from qr.t.
| |
However, we do preserve it outside PERL_CORE for the use of XS authors.
| |
re.xs and re_top.h both turn DEBUGGING on. Make them set
DEBUGGING_RE_ONLY too so that it's easy to tell this is a fake DEBUGGING.
| |
This was first proposed in the thread starting at
http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
| |
This has been deprecated since v5.22
| |
These tests were using individually defined heuristics to decide whether
to do locale testing or not. However t/loc_tools.pl provides functions
that are more reliable and complete for determining this than the
hand-rolled ones in these tests.
| |
This has been magically working because ext/re builds with -I../..
and so picks up the inline headers from the top; the copied bogus
file has been left unused.
| |
Instead of #include-ing the C file, compile it normally.
| |
An empty cpan/.dir-locals.el stops Emacs using the core defaults for
code imported from CPAN.
Committer's work:
To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION needed
to be incremented in many files, including throughout dist/PathTools.
perldelta entry for module updates.
Add two Emacs control files to MANIFEST; re-sort MANIFEST.
For: RT #124119.
| |
1) make string offsets be consistently counted from strbeg, rather than
a mixture of that and strpos;
2) make it clearer when rx_origin has been updated, since that value
is the raison d'etre of intuit();
3) always show the input and output offsets when calling fbm_instr() from
intuit().
| |
It is possible to compile Perl without locales, and some platforms may
not have them available properly. These tests were failing under these
conditions. This commit uses the new infrastructure in loc_tools.pl to
centralize the knowledge of how to determine if locales are available.
| |
Add to perlexperiment; note that an alternative syntax has been
proposed; nits.
| |
Fix some indents, vertically align ternary
| |
Prior to this commit, a plain 'no re;' without subpragmas only turned off
a few things. Now it turns off all the enabled things. For example,
previously, you couldn't turn off debugging, once enabled, inside the
same block.
| |
There are multiple occurrences of these constants in the file. It's
better to use a variable than to repeat them.
| |
This subpragma is to allow p5p to add warnings/errors for regex patterns
without having to worry about backwards compatibility. And it allows
users who want to have the latest checks on their code to do so. An
experimental warning is raised by default when it is used, not because
the subpragma might go away, but because what it catches is subject to
change from release-to-release, and so the user is acknowledging that
they waive the right to backwards compatibility. I will be working in
the near term to make some changes to what is detected by this.
Note that there is no indication in the pattern stringification that it
was compiled under this. This means I didn't have to figure out how to
stringify it. It is fine because using this doesn't affect what the
pattern gets compiled into, if successful. And interpolating the
stringified pattern under either strict or non-strict should both just
work.
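As a rough illustration of the stringification point above, here is a minimal sketch. It assumes Perl 5.22 or later (where the 'strict' subpragma exists); the pattern and the variable names are made up for the example.

```perl
use strict;
use warnings;
# Acknowledge the experimental status before enabling the subpragma.
no warnings 'experimental::re_strict';

my $plain = qr/fo+bar/;

my $strict = do {
    use re 'strict';    # lexically scoped to this block
    qr/fo+bar/;
};

# The stringified forms carry no trace of 'strict'.
my $same_string = ("$plain" eq "$strict") ? 1 : 0;

# Interpolating the strict-compiled pattern into a non-strict one just works.
my $reused  = qr/\A$strict\z/;
my $matches = ("foobar" =~ $reused) ? 1 : 0;

print "same stringification: $same_string, interpolated match: $matches\n";
```

Because the subpragma only affects what the compiler accepts, not what it produces, a pattern that compiles under both settings should behave the same either way.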
| |
This flag will prevent () from capturing and filling in $1, $2, etc...
Named captures will still work though, and if used will cause $1, $2, etc...
to be filled in *only* within named groups.
The motivation behind this is to allow the common construct of:
/(?:b|c)a(?:t|n)/
To be rewritten more cleanly as:
/(b|c)a(t|n)/n
When you want grouping but no memory penalty on captures.
You can also use ?n inside of a () directly to avoid capturing, and
?-n inside of a () to negate its effects if you want to capture.
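The behaviour described above can be sketched as follows. This is an illustrative example only, assuming a perl new enough to have the /n flag (5.22 or later); the group name "lead" and the variable names are invented for the example.

```perl
use strict;
use warnings;

my ($n_first, $named, $named_first, $negated_first);

# Plain groups under /n only group; nothing is captured.
if ("cat" =~ /(b|c)a(t|n)/n) {
    $n_first = $1;                 # stays undef
}

# A named group still captures, and also fills in $1.
if ("cat" =~ /(?<lead>b|c)a(t|n)/n) {
    $named       = $+{lead};
    $named_first = $1;
}

# (?-n:...) switches capturing back on for the groups inside it.
if ("cat" =~ /(?-n:(b|c))a(t|n)/n) {
    $negated_first = $1;
}

printf "plain /n: %s, named: %s (\$1=%s), (?-n:...): %s\n",
    defined $n_first ? $n_first : 'undef',
    $named, $named_first, $negated_first;
```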
| |
It was returning (undef) in list context, though it was documented to
return the empty list.
| |
- this improves the error message on ABI incompatibility, per
[perl #123136]
- reduce the number of gv_fetchfile calls in newXS over registering many
XSUBs
- "v" was not stripped from PERL_API_VERSION_STRING since the string
"vX.XX.X\0", a typical version number, is 8 bytes long and is aligned to
4/8 by most compilers in an image. A double digit maint release is
extremely unlikely.
- newXS_deffile saves on machine code in bootstrap functions by not passing
arg filename
- move newXS to where the rest of the newXS*()s live
- move the "no address" panic closer to the start to get it out of the way
sooner flow wise (it has nothing to do with var gv or cv)
- move CvANON_on to not check var name twice
- change die message to use %p, more efficient on 32 ptr/64 IV platforms
see ML post "about commit "util.c: fix comiler warnings""
- vars cv/xs_spp (stack pointer pointer)/xs_interp exist for inspection by
a C debugger in an unoptimized build
| |
This API elevates the amount of ABI compatibility protection between XS
modules and the interp. It also makes each boot XSUB smaller in machine
code by removing function calls and factoring out code into the new
Perl_xs_handshake and Perl_xs_epilog functions.
sv.c :
- revise padlist duping code to reduce code bloat/asserts on DEBUGGING
ext/DynaLoader/dlutils.c :
- disable version checking so interp startup is faster, ABI mismatches are
impossible because DynaLoader is never available as a shared library
ext/XS-APItest/XSUB-redefined-macros.xs :
- "" means don't check the version, so switch to " " to make the test in
xsub_h.t pass, see ML thread "XS_APIVERSION_BOOTCHECK and XS_VERSION
is CPP defined but "", mow what?"
ext/re/re.xs :
- disable API version checking until #123007 is resolved
ParseXS/Utilities.pm :
109-standard_XS_defs.t :
- remove context from S_croak_xs_usage similar to core commit cb077ed296 .
CvGV doesn't need a context until 5.21.4 and commit ae77754ae2 and
by then core's croak_xs_usage API has been long available and this
backport doesn't need to account for newer perls
- fix test where lack of having PERL_IMPLICIT_CONTEXT caused it to fail
| |
A synthetic start class (SSC) is generated by the regular expression
pattern compiler to give a consolidation of all the possible things that
can match at the beginning of where a pattern can possibly match.
For example
qr/a?bfoo/;
requires the match to begin with either an 'a' or a 'b'. There are no
other possibilities. We can set things up to quickly scan for either of
these in the target string, and only when one of these is found do we
need to look for 'foo'.
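The idea can be emulated at the Perl level. The sketch below is not how the regex engine implements an SSC internally; match_with_start_class is a hypothetical helper written for this illustration, and the character class [ab] is hard-coded by hand for qr/a?bfoo/.

```perl
use strict;
use warnings;

# Hypothetical helper: scan for the start class first, and only attempt
# the full pattern at offsets where a start-class character was found.
sub match_with_start_class {
    my ($target) = @_;
    my @tried;                       # offsets where the full pattern was attempted
    while ($target =~ /[ab]/g) {     # the start class for qr/a?bfoo/ is 'a' or 'b'
        my $start = $-[0];
        push @tried, $start;
        return ($start, \@tried)
            if substr($target, $start) =~ /\Aa?bfoo/;
    }
    return (undef, \@tried);
}

my ($offset, $tried) = match_with_start_class("xaxbfoo");
print "matched at offset $offset after trying offsets @$tried\n";
```

For "xaxbfoo" only offsets 1 and 3 are candidates, so the full pattern is attempted twice rather than at every position in the string.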
There is an overhead associated with using SSCs. If the number of
possibilities that the SSC excludes is relatively small, it can be
counter-productive to use them.
This patch creates a crude sieve to decide whether to use an SSC or not.
If the SSC doesn't exclude at least half the "likely" possibilities, it
is discarded. This patch is a starting point, and can be refined if
necessary as we gain experience.
See thread beginning with
http://nntp.perl.org/group/perl.perl5.porters/212644
In many patterns, no SSC is generated; and with the advent of tries,
SSC's have become less important, so whatever we do is not terribly
critical.
| |
It is planned for a future Perl release to have /xx mean something
different from just /x. To prepare for this, this commit raises a
deprecation warning if someone currently has this usage. A grep of CPAN
did not turn up any instances of this, but this is to be safe anyway.
The added code is more general than actually needed, in case we want to
do this for another flag.
| |
This doesn't actually use the flag yet.
We no longer have to make version-dependent changes to
ext/Devel-Peek/t/Peek.t, (it being in /ext) so this doesn't
| |
Add a new re debug mode for outputting stuff useful for testing.
In this case we count the number of times that we go through
study_chunk. With a51d618a we should do so 5 times (or less) when
we traverse the test pattern. Without a51d618a we recurse 11
times. In the case of RT #122283 we would do gazillions of
recursions, so many that I never let it run to completion.
/
(?(DEFINE)(?<foo>foo))
(?(DEFINE)(?<bar>(?&foo)bar))
(?(DEFINE)(?<baz>(?&bar)baz))
(?(DEFINE)(?<bop>(?&baz)bop))
/x
I say "or less" because you could argue that since these defines are
never called, we should not actually recurse at all, and should maybe
just compile this as a simple empty pattern.
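To make the "simple empty pattern" remark concrete, here is a small sketch using the test pattern from above. It does not exercise the new debug mode or count study_chunk calls; it only shows what the pattern matches. The variable names are invented for the example.

```perl
use strict;
use warnings;

# The chained definitions from the test pattern, never called.
my $defines_only = qr/
    (?(DEFINE)(?<foo>foo))
    (?(DEFINE)(?<bar>(?&foo)bar))
    (?(DEFINE)(?<baz>(?&bar)baz))
    (?(DEFINE)(?<bop>(?&baz)bop))
/x;

# The same definitions, with the outermost one actually invoked.
my $calls_bop = qr/
    \A (?&bop) \z
    (?(DEFINE)(?<foo>foo))
    (?(DEFINE)(?<bar>(?&foo)bar))
    (?(DEFINE)(?<baz>(?&bar)baz))
    (?(DEFINE)(?<bop>(?&baz)bop))
/x;

# With no calls, the DEFINE blocks consume nothing: a zero-length match.
my $unused_len = "anything" =~ $defines_only ? $+[0] - $-[0] : -1;

# Once (?&bop) is called, the whole chain has to match.
my $chained_ok = "foobarbazbop" =~ $calls_bop ? 1 : 0;

print "length matched by unused defines: $unused_len\n";
print "chained call matches: $chained_ok\n";
```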
| |
See also perl5porters thread titled: "Perl MBOLism in regex engine"
In the perl 5.000 release (a0d0e21ea6ea90a22318550944fe6cb09ae10cda)
the BOL regop was split into two behaviors, MBOL and SBOL, with SBOL
and BOL behaving identically. Similarly the EOL regop was split into
two behaviors SEOL and MEOL, with EOL and SEOL behaving identically.
This then resulted in various duplicative code related to flags and
case statements in various parts of the regex engine.
It appears that perhaps BOL and EOL were kept because they are the
type ("regkind") for SBOL/MBOL and SEOL/MEOL/EOS. Reworking regcomp.pl
to handle aliases for the type data so that SBOL/MBOL are of type
BOL, even though BOL == SBOL seems to cover that case without adding
to the confusion.
This means two regops, a regstate, and an internal regex flag can
be removed (and used for other things), and various logic relating
to them can be removed.
For the uninitiated, SBOL is /^/ and /\A/ (with or without /m) and
MBOL is /^/m. (I consider it a fail we have no way to say MBOL without
the /m modifier). Similarly SEOL is /$/ and MEOL is /$/m (there is
also a /\z/ which is EOS "end of string" with or without the /m).
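For readers who want to see the difference from the Perl side, the following sketch exercises each anchor on a two-line string. The comments naming regops follow the description above; the sample string and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

my @sbol = $text =~ /^(\w+)/g;      # SBOL: ^ without /m, start of string only
my @mbol = $text =~ /^(\w+)/mg;     # MBOL: ^ with /m, start of every line
my @abs  = $text =~ /\A(\w+)/mg;    # \A is SBOL even under /m

my @seol = $text =~ /(\w+)$/g;      # SEOL: end of string, or before a final newline
my @meol = $text =~ /(\w+)$/mg;     # MEOL: end of every line
my @eos  = $text =~ /(\w+)\z/mg;    # EOS: the very end of the string only

print "SBOL: @sbol | MBOL: @mbol | \\A: @abs\n";
print "SEOL: @seol | MEOL: @meol | \\z: ", scalar(@eos), " matches\n";
```

The \z case finds nothing here because the string ends in a newline, so no word character sits immediately before the end of the string.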
| |
Interestingly, this bug has been unnoticed for almost 3 years.
| |
I look at this output a lot to verify that patterns compiled correctly.
This commit makes them somewhat easier to read, while extending this to
also work on EBCDIC platforms (as yet untested).
In staring at these over time, I realized that punctuation literals are
mostly what contributes to being hard to read. [A-Z] is just as
readable as [A-Y], but [%!@\]~] is harder to read than if there were
fewer. Sometimes that can't be helped, but if many get output,
inverting the pattern [^...] can cause fewer to be output. This commit
employs heuristics to invert when it thinks that that would be more
legible. For example, it converts the output of [^"'] to be
ANYOF[^"'][{unicode_all}]
instead of
ANYOF[\x{00}-\x{1F} !#$%&()*+,\-./0-9:;<=>?@A-Z[\\\]\^_`a-z{|}~\x{7F}-\x{FF}][{unicode_all}]
Since it is a heuristic, it may not be the best under all circumstances,
and may need to be tweaked in the future.
If almost all the printables are to be output, it uses a hex range, as
that is probably more closely aligned with the intent of the pattern
than which individual printables are desired. Again this heuristic can
be tweaked.
And it prints a leading 0 on hex values that formerly were output as a
single digit: \x{0A} now instead of \x{A} previously.