| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
|
|
|
|
| |
This patch makes this function header match the format of all the other
function headers in this pod. Links to it are revised accordingly.
|
|
|
|
|
|
|
|
|
| |
This emphasizes that the /d, /u, and /l modifiers should rarely be used
explicitly, but are automatically selected when various pragmas are in
effect.
It also calls the /a modifier ASCII-safe or ASCII-restrict instead of
plain ASCII, as this is more accurate.
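A minimal sketch of that automatic selection, assuming Perl 5.14 or later (where the character-set modifier appears in a regex's stringification). The variable names are illustrative only:

```perl
use strict;
use warnings;

# No pragma in effect: default (/d) semantics, so no charset letter is shown.
my $plain = qr/\w+/;

# Inside the scope of feature 'unicode_strings', /u is selected implicitly.
my $uni = do { use feature 'unicode_strings'; qr/\w+/ };

print "$plain\n";   # (?^:\w+)
print "$uni\n";     # (?^u:\w+)
```

Neither pattern spells out a modifier; the pragma alone decides which one is in effect.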
|
| |
|
|
|
|
|
|
| |
This makes many documents more consistent in their pod formatting. Don't trim
blank lines between verbatim blocks and =item, as removing them makes the (raw)
pod harder to read.
|
|
|
|
|
| |
This fixes some verbatim text exceeding an 80 column window by shortening
two =over amounts.
|
|
|
|
|
|
| |
This fixes some issues with the pod wrapping verbatim text in 80-column
windows by indenting less and by not placing the comments so far to the
right.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
pod/perlre.pod: (*COMMIT) should be grouped under "Verbs without an
argument".
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Several confusions have arisen about how things work, and this
addresses them.
|
| |
|
|
|
|
|
|
|
|
| |
perlre: Include a high-level description of what it does, and what a missing
pattern means
perlreref: Include missing look-around cases
Signed-off-by: Chris 'BinGOs' Williams <chris@bingosnet.co.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly typos, grammatical errors and factual errors (mostly due to
bitrot), but also:
• The section explaining how to work around the lack of look behind
obviously has not been relevant for years. :-)
• Since we have relative backreferences, we might as well use them in
the explanation of the (?>...) construct.
• Note that it’s possible to backtrack *past* (?>...), but
not into it.
• (?:non-zero-length|zero-length)* is *not* equivalent to nzl*|zl? as
"aaaaab" =~ /(?:a|(?{print "hello"})(?=(b)))*/ demonstrates.
• The custom re engine section doesn’t mention custom re engines. :-)
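The point about (?>...) in the list above can be sketched as follows. This is an illustration written for this log, not text taken from the patch, and the sample strings are arbitrary:

```perl
use strict;
use warnings;

my $s = "aaab";

# Ordinary quantifier: a+ gives back one "a" so the trailing "ab" can match.
my $plain  = $s =~ /a+ab/     ? 1 : 0;

# Atomic group: the engine cannot backtrack *into* it, so a+ keeps every
# "a" it grabbed and the trailing "ab" never matches.
my $atomic = $s =~ /(?>a+)ab/ ? 1 : 0;

# Backtracking *past* the group is still allowed: the lazy prefix is
# extended and the atomic group is retried from a later position.
my ($prefix) = "abbxbbc" =~ /^(\w+?)(?>b+)c/;

print "$plain $atomic $prefix\n";   # 1 0 abbx
```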
|
| |
|
|
|
|
|
|
|
| |
As explained in the doc changes of this patch, under /l, caseless
matching of code points less than 256 now uses locale rules regardless
of the utf8ness of the pattern or string. It now matches the behavior
of things like \w in this regard.
|
| |
|
|
|
|
|
|
| |
• Mention look-around assertions in the list
• It’s the code block’s return value, not the block itself
that is used
|
|
|
|
| |
Tests for \N{} with this option will be added later.
|
| |
|
|
|
|
| |
Now, a Unicode property match specified in the pattern indicates that the pattern is meant for matching according to Unicode rules.
|
| |
|
|
|
|
| |
(found using code from Karl Williamson <public@khwilliamson.com>)
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This restricts certain constructs, like \w, to matching in the ASCII range
only.
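A small sketch of the restriction, assuming Perl 5.14 or later; the sample characters are chosen here for illustration:

```perl
use strict;
use warnings;

my $arabic_zero = "\x{660}";   # ARABIC-INDIC DIGIT ZERO
my $e_acute     = "\x{e9}";    # LATIN SMALL LETTER E WITH ACUTE

my @results = (
    $arabic_zero =~ /\d/  ? 1 : 0,   # Unicode digit matches \d
    $arabic_zero =~ /\d/a ? 1 : 0,   # under /a only [0-9] match
    $e_acute     =~ /\w/u ? 1 : 0,   # Unicode rules: a word character
    $e_acute     =~ /\w/a ? 1 : 0,   # under /a only [A-Za-z0-9_] match
);

print "@results\n";   # 1 0 1 0
```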
|
| |
|
| |
|
|
|
|
|
|
| |
The previous documentation really didn't specify what \w is. It matches
the underscore, but also all other connector punctuation, plus any
marks, such as diacritical accents that occur within a word.
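As a quick check of that description, the sketch below probes a few characters against \w. The sample characters are picked here for illustration and do not come from the documentation patch:

```perl
use strict;
use warnings;

my @samples = (
    [ 'LOW LINE (underscore)',        "_"        ],
    [ 'UNDERTIE (connector punct.)',  "\x{203F}" ],
    [ 'COMBINING ACUTE ACCENT',       "\x{301}"  ],
    [ 'HYPHEN-MINUS',                 "-"        ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    my $verdict = $char =~ /\w/ ? 'matches \w' : 'does not match \w';
    printf "%-30s %s\n", $name, $verdict;
}
```

Only the hyphen is reported as not matching.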
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit causes regex sequences \b, \s, and \w (and complements) to
match in the latin1 range in the scope of feature 'unicode_strings' or
with the /u regex modifier.
It uses the previously unused flags field in the respective regnodes to
indicate the type of matching, and in regexec.c, uses that to decide
which of the handy.h macros to use, native or Latin1.
I chose this for now rather than create new nodes for each type of
match. An earlier version of this patch did that, and in every case the
switch case: statements were adjacent, offering no performance
advantage. If regexec were modified to use in-line functions or more
macros for various short sections of it, then it would be faster to have
new nodes rather than using the flags field. But, using that field
simplified things, as this change flies under the radar in a number of
places where it would not if separate nodes were used.
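The user-visible effect can be sketched as below, assuming Perl 5.14 or later and native (non-UTF-8) strings. The explicit /d and /u modifiers are used only to make the two rule sets visible side by side:

```perl
use strict;
use warnings;

my $e_acute = "\xE9";   # LATIN SMALL LETTER E WITH ACUTE, native string
my $nbsp    = "\xA0";   # NO-BREAK SPACE, native string

# Native rules: only ASCII characters count as word or space characters.
my @native  = ( $e_acute =~ /\w/d ? 1 : 0, $nbsp =~ /\s/d ? 1 : 0 );

# Unicode rules: the Latin1 range is classified as Unicode defines it.
my @unicode = ( $e_acute =~ /\w/u ? 1 : 0, $nbsp =~ /\s/u ? 1 : 0 );

# The feature selects the same behavior without an explicit modifier.
my $scoped = do { use feature 'unicode_strings'; $e_acute =~ /\w/ ? 1 : 0 };

print "@native | @unicode | $scoped\n";   # 0 0 | 1 1 | 1
```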
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mktables is changed to process the Unicode named sequence file.
charnames.pm is changed to cache the looked-up values in utf8. A new
function, string_vianame is created that can handle named sequences, as
the interface for vianame cannot. The subroutine lookup_name() is
slightly refactored to do almost all of the common work for \N{} and the
vianame routines. It now understands named sequences as created by
mktables.
Tests and documentation are added. In the randomized testing section,
half use vianame() and half string_vianame().
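A hedged sketch of the difference between the two functions. The named sequence used here (KATAKANA LETTER AINU P) is an assumption about what the installed Unicode data provides, so its result is checked for definedness rather than taken for granted:

```perl
use strict;
use warnings;
use charnames ':full';

# vianame() returns a code point; string_vianame() returns a string.
my $cp  = charnames::vianame("LATIN SMALL LETTER A");
my $str = charnames::string_vianame("LATIN SMALL LETTER A");
printf "vianame: U+%04X, string_vianame: %s\n", $cp, $str;

# A named sequence expands to several code points, which only the
# string-returning interface can represent.
my $seq = charnames::string_vianame("KATAKANA LETTER AINU P");
if (defined $seq) {
    printf "sequence of %d code points: %s\n", length $seq,
        join ' ', map { sprintf 'U+%04X', ord } split //, $seq;
}
else {
    print "named sequence not known to this perl\n";
}
```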
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds recognition of these modifiers, with appropriate action
for d and l. u does nothing useful yet. This allows for the
interpolation of a regex into another one without losing the character
set semantics that it was compiled with, as for the first time, the
semantics is now specified in the stringification as one of these
modifiers.
To this end, it allocates an unused bit in the structures. The
offsets change so as to not disturb other bits.
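The interpolation behavior can be sketched as follows. Note that this assumes a later Perl (5.14 or newer) in which /u is fully implemented, whereas the patch itself only acts on d and l:

```perl
use strict;
use warnings;

my $e_acute = "\xE9";     # native (non-UTF-8) string

# Compiled with Unicode semantics; the stringification records that.
my $word_u = qr/\w/u;
print "$word_u\n";        # (?^u:\w)

# The outer patterns use default semantics, but the interpolated piece
# keeps the semantics it was compiled with.
my $bare     = $e_acute =~ /^\w$/       ? 1 : 0;
my $embedded = $e_acute =~ /^$word_u$/  ? 1 : 0;

print "$bare $embedded\n";   # 0 1
```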
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds (?^...) to signify that the default regex modifiers should be
used for the cluster or embedded pattern-match modifier change. The major purpose of
this is to simplify regex stringification, so that "^" is output in
place of "-xism". As a result, the stringification will not change in
the future when new regex modifiers are added, so tests, etc. that rely
on a particular stringification will have to change now, but never
again.
Code that needs to work properly with both old- and new-style regexes
can use something like the following:
# Accept both old and new-style stringification
my $modifiers = (qr/foobar/ =~ /\Q(?^/) ? '^' : '-xism';
This construct is Ben Morrow's idea.
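For illustration, here is a slightly fuller sketch built on the snippet above. It shows a test that expects a particular stringification and still passes on both old and new perls; the variable names are hypothetical:

```perl
use strict;
use warnings;

my $re = qr/foobar/;

# Detect which stringification style this perl uses.
my $modifiers = ("$re" =~ /\Q(?^/) ? '^' : '-xism';

# Build the expected string from the detected style instead of hard-coding it.
my $expected = "(?$modifiers:foobar)";

print "stringified: $re\n";
print "expected:    $expected\n";
print "styles agree\n" if "$re" eq $expected;
```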
|
| |
|
|
|
|
|
|
|
|
| |
This patch adds a mention of \o{} to perlre to avoid the backreference
ambiguities, and uses 3 octal digits in an example, and suggests using 3
digits where 2 were suggested before.
Signed-off-by: David Golden <dagolden@cpan.org>
|
|
|
|
|
|
|
|
|
|
| |
This commit adds the new construct \o{} to express a character constant
by its octal ordinal value, along with ancillary tests and
documentation.
A function to handle this is added to util.c, and it is called from the
3 parsing places it could occur. The function is a candidate for
in-lining, though I doubt that it will ever be used frequently.
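A short sketch of the construct in a string and in a pattern, assuming Perl 5.14 or later; the values are chosen here for illustration:

```perl
use strict;
use warnings;

my $cap_a = "\o{101}";    # octal 101 is decimal 65, the letter "A"
my $wide  = "\o{400}";    # octal 400 is decimal 256, i.e. U+0100

# The braces make the extent of the number explicit, so it cannot be
# confused with a backreference inside a pattern.
my $in_pattern = "A" =~ /^\o{101}$/ ? 1 : 0;

printf "%s %d %d\n", $cap_a, ord $wide, $in_pattern;   # A 256 1
```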
|
|
|
|
| |
Signed-off-by: David Golden <dagolden@cpan.org>
|
|
|
|
|
|
| |
These come from Abigail.
Signed-off-by: David Golden <dagolden@cpan.org>
|
|
|
|
|
|
| |
I don't know where the text for the stuff below this new heading should
go, but it clearly doesn't belong with what came before, so add a heading
to separate them, perhaps rearranging things later.
|
|
|
|
|
|
|
| |
\g was added to avoid ambiguities that \digit causes. This updates the
pod documentation to use \g in examples, and to prefer it when
explaining the concepts. Some non-symmetrical outlined text dealing
with it was also cleaned up.
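A brief sketch of the forms of \g, using examples written for this log rather than the ones in the pod:

```perl
use strict;
use warnings;

# Absolute reference to group 1.
my $absolute = "abab" =~ /(ab)\g1/ ? 1 : 0;

# Braces delimit the group number, so a following digit is a literal;
# written as \10 this would be ambiguous.
my $braced   = "aa0" =~ /(a)\g{1}0/ ? 1 : 0;

# Relative reference: -1 means the group immediately before this point.
my $relative = "xyzz" =~ /(x)(y)(z)\g{-1}/ ? 1 : 0;

print "$absolute $braced $relative\n";   # 1 1 1
```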
|
|
|
|
|
|
| |
Both terms 'capture group' and 'capture buffer' are used in the
documentation. This patch changes most uses of the latter to the
former, as they are referenced using "\g".
|