The previous documentation really didn't specify what \w is. It matches
the underscore, but also all other connector punctuation, plus any
marks, such as diacritical accents that occur within a word.
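To make that definition concrete, here is a small Perl check (an illustration added here, not part of the commit) of three characters that are neither letters nor digits but are still matched by \w. The particular code points were chosen for this sketch.

```perl
use strict;
use warnings;

# Each entry: a character and a description. None of these is a letter
# or a digit, yet each is matched by \w.
my @examples = (
    [ "_",        'LOW LINE (connector punctuation)' ],
    [ "\x{203F}", 'UNDERTIE (connector punctuation)' ],
    [ "\x{0301}", 'COMBINING ACUTE ACCENT (a mark)' ],
);

for my $pair (@examples) {
    my ($char, $label) = @$pair;
    printf "%-35s %s\n", $label, ($char =~ /^\w\z/ ? 'matches \w' : 'no match');
}
```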
This commit causes the regex sequences \b, \s, and \w (and their
complements) to use Unicode rules for characters in the Latin-1 range
when in the scope of feature 'unicode_strings' or with the /u regex
modifier.
It uses the previously unused flags field in the respective regnodes to
indicate the type of matching, and in regexec.c, uses that to decide
which of the handy.h macros to use, native or Latin1.
I chose this approach for now rather than creating new nodes for each
type of match. An earlier version of this patch did that, and in every
case the switch case: statements were adjacent, offering no performance
advantage. If regexec were modified to use inline functions or more
macros for various short sections of it, then it would be faster to have
new nodes than to use the flags field. But using that field simplified
things, because this change flies under the radar in a number of places
where it would not if separate nodes were used.
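The user-visible effect can be sketched in Perl. This assumes perl 5.14 or later, where /u is accepted, and no `use locale`. The two Latin-1 characters are examples chosen for this sketch, and the internals (regnode flags, handy.h macros) are not shown.

```perl
use strict;
use warnings;

# "\xE9" is LATIN SMALL LETTER E WITH ACUTE; "\xA0" is NO-BREAK SPACE.
# Both are stored as single native bytes (the strings are not UTF-8).
my $e_acute = "\xE9";
my $nbsp    = "\xA0";

# Without /u (and outside 'unicode_strings'), a non-UTF-8 string gets
# native rules, so neither Latin-1 character is recognized.
my $w_native = $e_acute =~ /\w/ ? 1 : 0;
my $s_native = $nbsp    =~ /\s/ ? 1 : 0;

# With /u, Unicode rules apply to the Latin-1 range as well.
my $w_unicode = $e_acute =~ /\w/u ? 1 : 0;
my $s_unicode = $nbsp    =~ /\s/u ? 1 : 0;

# The same effect through the lexically scoped feature.
my $w_feature = do {
    use feature 'unicode_strings';
    $e_acute =~ /\w/ ? 1 : 0;
};

print "native:  \\w=$w_native \\s=$s_native\n";
print "/u:      \\w=$w_unicode \\s=$s_unicode\n";
print "feature: \\w=$w_feature\n";
```

The `do` block confines 'unicode_strings' to that one match, which mirrors the feature's lexical scoping.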
mktables is changed to process the Unicode named sequences file.
charnames.pm is changed to cache the looked-up values in utf8. A new
function, string_vianame(), is created that can handle named sequences,
which the vianame() interface cannot. The subroutine lookup_name() is
slightly refactored to do almost all of the common work for \N{} and the
vianame routines. It now understands named sequences as created by
mktables.
Tests and documentation are added. In the randomized testing section,
half of the tests use vianame() and half use string_vianame().
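A minimal sketch of the difference between the two interfaces follows. It assumes a perl whose charnames.pm provides string_vianame() (5.14 and later). The named sequence is assumed to be one listed in Unicode's NamedSequences.txt, so treat that exact name as an assumption of this example.

```perl
use strict;
use warnings;
use charnames ();   # load the module without import options

# string_vianame() returns a string, so it can represent a named
# sequence (several code points) as well as a single character.
my $single   = charnames::string_vianame("LATIN CAPITAL LETTER A");
my $sequence = charnames::string_vianame("LATIN CAPITAL LETTER A WITH MACRON AND GRAVE");

# vianame() returns a single ordinal, which is why it cannot express
# a multi-character sequence.
my $ordinal = charnames::vianame("LATIN CAPITAL LETTER A");

printf "single:   %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $single;
printf "sequence: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $sequence;
printf "ordinal:  %d\n", $ordinal;
```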
This patch adds recognition of these modifiers, with appropriate action
for d and l; u does nothing useful yet. This allows a regex to be
interpolated into another one without losing the character set semantics
it was compiled with because, for the first time, the semantics are
specified in the stringification as one of these modifiers.
To this end, it allocates an unused bit in the structures. The offsets
change so as not to disturb other bits.
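The interpolation problem this solves can be sketched as follows. The sketch assumes perl 5.14 or later, where the u modifier is fully functional. At the time of this commit, u did nothing useful yet.

```perl
use strict;
use warnings;

# Compile a pattern under Unicode (/u) rules; the modifier is recorded
# in the stringified form of the regex.
my $word = qr/\w/u;
print "$word\n";

# Interpolating $word into a pattern compiled without /u keeps the inner
# part's /u semantics, because the stringification carries the modifier.
my $outer = qr/^$word+\z/;

my $latin1 = "\xE9\xE9";    # two native (non-UTF-8) e-acute characters
print $latin1 =~ $outer ? "match\n" : "no match\n";
```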
This adds (?^...) to signify that the default regex modifiers should be
used for the cluster or embedded pattern-match modifier change. The
major purpose of this is to simplify regex stringification, so that "^"
is output in place of "-xism". As a result, the stringification will
not change in the future when new regex modifiers are added, so tests,
etc. that rely on a particular stringification will have to change now,
but never again.
Code that needs to work properly with both old- and new-style regexes
can use something like the following:
    # Accept both old and new-style stringification
    my $modifiers = (qr/foobar/ =~ /\Q(?^/) ? '^' : '-xism';
This construct is Ben Morrow's idea.
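The stringification change and the compatibility check above can be tried in a short Perl script. This assumes a perl that includes the change (5.14 or later). The `$mixed` pattern is an extra example added here to show (?^...) written directly inside a pattern.

```perl
use strict;
use warnings;

# Perls with this change stringify a regex with "^" (meaning "start from
# the default modifiers") instead of an explicit "-xism" list.
my $plain  = qr/foobar/;
my $nocase = qr/foobar/i;
print "$plain\n";    # (?^:foobar)
print "$nocase\n";   # (?^i:foobar)

# The construct from the commit message: detect which style the
# running perl produces.
my $modifiers = (qr/foobar/ =~ /\Q(?^/) ? '^' : '-xism';
print "$modifiers\n";

# (?^...) written directly: inside a pattern compiled with /i, it restores
# the defaults, so "b" is matched case-sensitively.
my $mixed = qr/a(?^:b)/i;
print "Ab: ", ("Ab" =~ $mixed ? "match" : "no match"), "\n";
print "AB: ", ("AB" =~ $mixed ? "match" : "no match"), "\n";
```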
This patch mentions \o{} in perlre as a way to avoid ambiguities with
backreferences, uses 3 octal digits in an example, and suggests using 3
digits where 2 were suggested before.
Signed-off-by: David Golden <dagolden@cpan.org>
This commit adds the new construct \o{} to express a character constant
by its octal ordinal value, along with ancillary tests and
documentation.
A function to handle this is added to util.c, and it is called from the
3 places in the parser where the construct can occur. The function is a
candidate for in-lining, though I doubt that it will ever be used
frequently.
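Here is a brief sketch of the construct in strings and patterns. It assumes perl 5.14 or later, and the values are examples chosen here.

```perl
use strict;
use warnings;

# \o{...} gives a character by its octal ordinal; the braces delimit the
# digits, so there is no question of how many digits belong to the escape.
my $letter_a = "\o{101}";    # octal 101 == decimal 65 == "A"
my $newline  = "\o{12}";     # octal 12  == decimal 10 == "\n"

# Inside a pattern, \o{...} cannot be mistaken for a backreference the
# way \1, \10, etc. can.
my $matches = "xA" =~ /(x)\o{101}/ ? 1 : 0;

printf "%d %d %d\n", ord $letter_a, ord $newline, $matches;   # 65 10 1
```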
Signed-off-by: David Golden <dagolden@cpan.org>
These come from Abigail.
Signed-off-by: David Golden <dagolden@cpan.org>
I don't know where the text for the stuff below this new heading should
go, but it clearly doesn't belong with what came before, so add a
heading to separate them, perhaps rearranging things later.
\g was added to avoid ambiguities that \digit causes. This updates the
pod documentation to use \g in examples, and to prefer it when
explaining the concepts. Some non-symmetrical outlined text dealing
with it was also cleaned up.
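Here is a short sketch of the \g forms in Perl. These examples were added here and are not taken from the updated pod.

```perl
use strict;
use warnings;

# \g1 (or \g{1}) refers to capture group 1. The braced form stays
# unambiguous even when literal digits follow the reference.
my $doubled = "abab" =~ /^(ab)\g1\z/     ? 1 : 0;
my $then_0  = "aa0"  =~ /^(a)\g{1}0\z/   ? 1 : 0;

# \g{-1} refers to the nearest group opened before it, which keeps
# working if more groups are added earlier in the pattern.
my $relative = "xyy" =~ /^(x)(y)\g{-1}\z/ ? 1 : 0;

print "$doubled $then_0 $relative\n";   # 1 1 1
```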
Both terms 'capture group' and 'capture buffer' are used in the
documentation. This patch changes most uses of the latter to the
former, as they are referenced using "\g".
double word; make table fit in an 80-column terminal
Things were getting out of sync.
The NAME portion of (*MARK:NAME) is not optional.
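For illustration, here is a sketch that assumes perl 5.10 or later, where the backtracking control verbs and the $REGMARK package variable are available. The mark names are made up for this example.

```perl
use strict;
use warnings;

# After a successful match, the package variable $REGMARK holds the NAME
# of the most recently executed (*MARK:NAME) that took part in the match.
# It is an ordinary package variable, so it is declared with "our".
our $REGMARK;

my %mark_for;
for my $input (qw(xay xby)) {
    if ($input =~ /x(?:a(*MARK:saw_a)|b(*MARK:saw_b))y/) {
        $mark_for{$input} = $REGMARK;
        print "$input: $REGMARK\n";
    }
}
```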
expression also requires "use re 'eval'", just as '(?{ code })' does.
These are all in the pod/ directory, and only the first is a code fix.
There was also a single lingering ISO 8859-1 encoding that missed the
UTF-8 upconvert. The rest are cleanups for typos, some of which seem
to have been around for a rather long time: spelling errors, incorrect
possessives, and extra, missing, or duplicated words.
If you actually read through, I bet you'll realize what sparked this. :)
--tom
Signed-off-by: Abigail <abigail@abigail.be>
pattern (see also #71136)
Also add a test for that, fill in test description, and sneak in a vim
modeline for re_tests
POSIX charclasses
From: "Robin Barker" <Robin.Barker@npl.co.uk>
Message-ID: <46A0F33545E63740BC7563DE59CA9C6D093B12@exchsvr2.npl.ad.local>
p4raw-id: //depot/perl@33752
p4raw-id: //depot/perl@33129
Message-ID: <477FACF4.5030801@casella.verplant.org>
p4raw-id: //depot/perl@32872
third component. (Suggested by Jarkko)
p4raw-id: //depot/perl@32523
p4raw-id: //depot/perl@32484