| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
The previous documentation really didn't specify what \w is. It matches
the underscore, but also all other connector punctuation, plus any
marks, such as diacritical accents that occur within a word.
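A minimal sketch of the behaviour described above, assuming a reasonably modern Perl. The sample characters and the helper name is_word_char are illustrative choices, not taken from the commit:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the single character $c matches \w.
sub is_word_char {
    my ($c) = @_;
    return $c =~ /\A\w\z/ ? 1 : 0;
}

# Sample characters, keyed by an ASCII description so the output stays plain.
my @samples = (
    [ 'LOW LINE (underscore)',          "_"        ],
    [ 'UNDERTIE (connector punct.)',    "\x{203F}" ],
    [ 'COMBINING ACUTE ACCENT (mark)',  "\x{0301}" ],
    [ 'HYPHEN-MINUS',                   "-"        ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "%-32s %s\n", $name,
        is_word_char($char) ? 'matches \w' : 'does not match \w';
}
```

The first three samples should be reported as matching, and the hyphen as not matching.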
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a set of synonyms \p{XPosixFOO} for the full extended
Unicode version of \p{PosixFOO}, so only one rule need be remembered.
Similarly, \p{XPerlSpace} is added to preserve the rule for the one
similar class that doesn't have Posix in its name.
Prior to this patch there was no exact equivalent to \p{PosixPunct}
extended beyond ASCII.
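As a rough illustration of the difference between the two families, assuming a Perl new enough to provide the \p{XPosixFOO} synonyms. The helper name classify and the sample characters are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: report whether one character matches the
# ASCII-only class and its extended Unicode counterpart.
sub classify {
    my ($c) = @_;
    return {
        posix  => ($c =~ /\A\p{PosixPunct}\z/  ? 1 : 0),
        xposix => ($c =~ /\A\p{XPosixPunct}\z/ ? 1 : 0),
    };
}

my @samples = (
    [ 'DOLLAR SIGN (ASCII)',                '$'      ],
    [ 'INVERTED EXCLAMATION MARK (U+00A1)', "\x{A1}" ],
    [ 'LATIN SMALL LETTER A',               'a'      ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    my $r = classify($char);
    printf "%-36s PosixPunct=%d XPosixPunct=%d\n",
        $name, $r->{posix}, $r->{xposix};
}
```

The ASCII punctuation character should match both classes, while the character outside ASCII should match only the XPosix form.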
|
|
|
|
| |
This reverts commit d5944336d74c819152158dabfd806d49ad0ecb21.
|
|
|
|
|
|
|
| |
This patch adds a set of synonyms \p{XPosixFOO} for the full extended
Unicode version of \p{PosixFOO}, so only one rule need be remembered.
Similarly, \p{XPerlSpace} is added to preserve the rule for the one
similar class that doesn't have Posix in its name.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit causes regex sequences \b, \s, and \w (and complements) to
match in the latin1 range in the scope of feature 'unicode_strings' or
with the /u regex modifier.
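The effect can be sketched as follows. This assumes an ASCII/Latin-1 platform, a Perl that supports the /u modifier, and a script that enables neither 'use locale' nor feature 'unicode_strings' (note that 'use v5.12' or later would enable the feature implicitly). The variable and helper names are illustrative:

```perl
use strict;
use warnings;

# Native (non-UTF-8) strings containing characters in the Latin-1 range.
my $e_acute = "\xE9";        # LATIN SMALL LETTER E WITH ACUTE
my $nbsp    = "\xA0";        # NO-BREAK SPACE
my $cafe    = "caf\xE9";

# Hypothetical helper to turn a match result into printable text.
sub yn { return $_[0] ? 'yes' : 'no' }

# Default matching on a native string: the Latin-1 range is not consulted.
my %default = (
    w => scalar($e_acute =~ /\w/),
    s => scalar($nbsp    =~ /\s/),
    b => scalar($cafe    =~ /\bcaf\b/),   # e-acute is not a word character here
);

# With /u, the same strings are matched using Unicode rules.
my %unicode = (
    w => scalar($e_acute =~ /\w/u),
    s => scalar($nbsp    =~ /\s/u),
    b => scalar($cafe    =~ /\bcaf\b/u),  # no boundary between "f" and e-acute
);

for my $k (qw(w s b)) {
    printf "\\%s  default: %-3s  /u: %s\n", $k, yn($default{$k}), yn($unicode{$k});
}
```

Under these assumptions \w and \s should match only with /u, while the \b test should succeed only without it, because e-acute becomes a word character under Unicode rules.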
It uses the previously unused flags field in the respective regnodes to
indicate the type of matching, and in regexec.c, uses that to decide
which of the handy.h macros to use, native or Latin1.
I chose this for now rather than create new nodes for each type of
match. An earlier version of this patch did that, and in every case the
switch case: statements were adjacent, offering no performance
advantage. If regexec were modified to use in-line functions or more
macros for various short sections of it, then it would be faster to have
new nodes rather than using the flags field. But, using that field
simplified things, as this change flies under the radar in a number of
places where it would not if separate nodes were used.
|
|
|
|
|
|
|
| |
Inside a bracketed character class, any \N{name} which expands to more
than one character will have only the first one considered. This
doesn't need named character sequences, as user-defined aliases have
long been able to be multi-char.
|
|
|
|
|
| |
The documentation had failed to mention that a regex pattern in utf8
encoding forces a Unicode interpretation on a non-utf8 string.
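A small sketch of that interaction, assuming an ASCII/Latin-1 platform and a script that enables neither 'use locale' nor feature 'unicode_strings'. The variable names are illustrative, and utf8::upgrade is used here only as a convenient way to obtain a pattern string stored in utf8 encoding:

```perl
use strict;
use warnings;

# A native (non-utf8) target string holding LATIN SMALL LETTER E WITH ACUTE.
my $target = "\xE9";

# Two copies of the same pattern text; only the second is stored as utf8.
my $pat_native = "\\w";
my $pat_utf8   = "\\w";
utf8::upgrade($pat_utf8);

my $native_match = ($target =~ /\A$pat_native\z/) ? 1 : 0;
my $utf8_match   = ($target =~ /\A$pat_utf8\z/)   ? 1 : 0;

print "native pattern matches: $native_match\n";
print "utf8 pattern matches:   $utf8_match\n";
```

Under these assumptions only the utf8-encoded pattern should match, even though the target string itself is not utf8.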
|
| |
|
|
|
|
|
| |
While not strictly wrong, the text here was missing information on what
\p{Punct} does.
|
|
|
|
|
|
| |
Made a number of clarification and wording edits, fixed some broken
links, and corrected details, especially on \d in the Unicode range.
Also fixed an incorrect character ordinal.
|
|
|
|
|
| |
Rewording to clarify a few paragraphs; make table fit in 80 column
terminal; remove extra word; other slight edits
|
|
|
|
|
|
| |
The regex documentation included changes that were put temporarily into
a 5.11 release, but not into 5.12. And there were a number of omissions.
I went through this pod and tried to make it reflect reality.
|
| |
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Abigail <abigail@abigail.be>
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Abigail <abigail@abigail.be>
|
|
|
|
|
| |
third component. (Suggested by Jarkko)
p4raw-id: //depot/perl@32523
|
|
p4raw-id: //depot/perl@31110
|