| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
| |
(cherry picked from commit 43c6e0a7ba1950c4a64b59be5d0a9cd7b1807cca)
|
|
|
|
|
|
| |
In the documentation, the term 'semantics' as applied to character sets is
changed to 'rules', a shorter and less jargony synonym in this case.
This was discussed several releases ago, but I didn't get around to it.
|
| |
This large (sorry, I couldn't figure out how to meaningfully split it
up) commit causes Perl to fully support LC_CTYPE operations (case
changing, character classification) in UTF-8 locales.
As a side effect it resolves [perl #56820].
The basics are easy, but there were a lot of details, and one
troublesome edge case discussed below.
What essentially happens is that when the locale is changed to a UTF-8
one, a global variable is set TRUE (FALSE when changed to a non-UTF-8
locale). Within the scope of 'use locale', this variable is checked,
and if TRUE, the code that Perl uses for non-locale behavior is used
instead of the code for locale behavior. Since Perl's internal
representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.
More work had to be done for regular expressions. There are three
cases.
1) Character classes such as \w and [[:punct:]] needed no extra work, as
the changes fall out from the base work.
2) Strings that are to be matched case-insensitively. These form
EXACTFL regops (nodes). Notice that if such a string contains only
characters above-Latin1 that match only themselves, the node can be
downgraded to an EXACT-only node, which presents better optimization
possibilities, as we now have a fixed string known at compile time to be
required to be in the target string to match. Similarly if all
characters in the string match only other above-Latin1 characters
case-insensitively, the node can be downgraded to a regular EXACTFU node
(match, folding, using Unicode, not locale, rules). The code changes
for this could be done without accepting UTF-8 locales fully, but there
were edge cases which needed to be handled differently if I stopped
there, so I continued on.
In an EXACTFL node, all such characters are now folded at compile time
(just as before this commit), while the other characters whose folds are
locale-dependent are left unfolded. This means that they have to be
folded at execution time based on the locale in effect at the moment.
Again, this isn't a change from before. The difference is that now some
of the folds that need to be done at execution time (in regexec) are
potentially multi-char. Some of the code in regexec was trivial to
extend to account for this because of existing infrastructure, but the
part dealing with regex quantifiers needed more work.
Also the code that joins EXACTish nodes together had to be expanded to
account for the possibility of multi-character folds within locale
handling. This was fairly easy, because it already has infrastructure
to handle these under somewhat different circumstances.
3) In bracketed character classes, represented by ANYOF nodes, a new
inversion list was created giving the characters that should be matched
by this node when the runtime locale is UTF-8. The list is ignored
except under that circumstance. To do this, I created a new ANYOF type
which has an extra SV for the inversion list.
The edge case that caused the most difficulty is folding involving the
MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the
GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range
character that folds to outside that range. The issue is that it
doesn't naturally fall out that it will match the CAP MU. If we let the
CAP MU fold to the small mu at compile time (which it can because both
are above-Latin1 and so the fold is the same no matter what locale is in
effect), it could appear that the regnode can be downgraded away from
EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not
case-insensitively match the CAP MU. This could be special cased in regcomp
and regexec, but I wanted to avoid that. Instead the mktables tables
are set up to include the CAP MU as a character whose presence forbids
the downgrading, so the special casing is in mktables, and not in the C
code.
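The fold relationships behind this edge case can be checked directly. The sketch below is illustrative only: it uses Unicode rules (the /u flag) rather than a locale, it does not exercise the EXACTFL downgrade logic itself, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# The three characters involved in the edge case.
my $micro    = "\x{B5}";     # MICRO SIGN
my $cap_mu   = "\x{39C}";    # GREEK CAPITAL LETTER MU
my $small_mu = "\x{3BC}";    # GREEK SMALL LETTER MU

# CAP MU lowercases to small mu; both are above Latin1.
my $cap_lowers_to_small = lc($cap_mu) eq $small_mu ? 1 : 0;

# Under Unicode rules the MICRO SIGN must match the CAP MU
# case-insensitively, because both fold to the small mu.
my $micro_matches_cap = $micro =~ /\A\x{39C}\z/iu ? 1 : 0;

print "lc(CAP MU) is small mu: ", ($cap_lowers_to_small ? "yes" : "no"), "\n";
print "MICRO SIGN =~ /CAP MU/i: ", ($micro_matches_cap ? "yes" : "no"), "\n";
```

A downgrade that dropped the second relationship would be exactly the bug the mktables special case is meant to prevent.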
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds an overview of the feature to perlref, and adds to
perlop's documentation of the arrow a pointer to that perlref section.
If/when this feature becomes non-experimental, the documentation
should be merged upward into Using References.
This documentation was written against a previous state of the
branch. It should be fact-checked before any merge.
|
|
|
|
|
| |
In many other dynamic languages it is the operator plus the type of the
first operand, so it is worth mentioning.
|
|
|
|
| |
For RT #118593, 118595, 118597, 118599.
|
| |
|
| |
|
|
|
|
|
| |
These were all uncovered by the new Pod::Checker, not yet in core.
Fixing these will speed up debugging the new Checker.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
These have always* had assignment precedence, such that
$a = goto $b = $c
is equivalent to
$a = (goto ($b = $c))
* I haven’t checked before perl 5.
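A minimal sketch of the grouping described above, using goto with a computed label. The variable and label names are hypothetical, and computed gotos are shown only to illustrate the precedence, not as recommended style.

```perl
use strict;
use warnings;

my $label  = '';
my $target = 'DONE';
my @trace;

# Parsed as goto($label = $target): the assignment happens first,
# then control jumps to the label named by the assigned value.
goto $label = $target;

push @trace, 'skipped';    # not reached if the jump is taken

DONE:
push @trace, "jumped to $label";
print "@trace\n";
```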
|
| |
|
|
|
|
|
| |
Remove some out-of-date modules and add Math::GMPq, Math::GMPz and
Math::GMPf.
|
| |
|
| |
|
|
|
|
| |
There is no 'bitfloat' pragma
|
|
|
|
|
| |
Sometimes patterns with embedded code are recompiled each time even
if the pattern string hasn't changed.
|
|
|
|
|
|
| |
Update the docs and add perldelta entries summarising the changes and
fixes related to (?{}) and (??{}) accumulated over the 120 or so commits
in this branch.
|
|
|
|
| |
Not having such space has been deprecated since v5.14.0.
|
|
|
|
| |
(Thanks for reporting this, Tom Christiansen!)
|
|
|
|
|
|
|
| |
It was already documented that, when scanning for the end of the string,
backslashes escaping the closing delimiter are eliminated; but
this is true for backslashes escaping backslashes as well. This means
that C<< '.\.' eq '.\\.' >>. (Pointed out by Mithaldu)
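The equality quoted above can be verified with a short script. This is only an illustration of the documented behaviour; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $single  = '.\.';     # backslash before an ordinary character is kept
my $doubled = '.\\.';    # backslash-backslash collapses to one backslash

my $same = $single eq $doubled ? 1 : 0;

# A backslash escaping the closing delimiter is removed as well.
my $quote = 'it\'s';

print $same ? "equal\n" : "different\n";
print length($single), " ", length($doubled), "\n";
print "$quote\n";
```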
|
| |
|
| |
|
|
|
|
|
| |
This adds the parameter handling, tests, and documentation for this new
feature which allows locale and Unicode to play well with each other.
|
| |
|
|
|
|
|
| |
This has been superseded by c2f1e229, which adds it
to perlsyn.
|
|
|
|
|
| |
Here is a patch against the first patch,
fixing typos reported to me.
|
|
|
|
|
|
|
| |
The thrust of this patch is to move the description of the ~~
operator into perlop where it properly belongs; given and when
remain relegated to perlsyn. This is also (nearly) the first-ever
set of examples for the smartmatch operator. Staggerment.
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Abigail <abigail@abigail.be>
|
|
|
|
| |
This is to address ticket #94252.
|
|
|
|
|
|
|
| |
See the discussion starting with mail:9879.1315954489@chthon
This rephrasing should avoid people getting the impression // is a
source filter, translating 'A // B' into 'defined(A) ? A : B', and
reparsing the result.
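The point of the rephrasing can be seen with a left operand that has a side effect: it is evaluated once, not twice as a literal 'defined(A) ? A : B' rewrite would suggest. The helper name next_value is hypothetical.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical helper that counts how often it is evaluated.
sub next_value { $calls++; return 0 }

# The left operand is evaluated once; 0 is defined, so it is kept.
my $kept = next_value() // 'fallback';

# An undefined left operand selects the right operand.
my $undef;
my $chosen = $undef // 'fallback';

print "kept=$kept chosen=$chosen calls=$calls\n";
```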
|
|
|
|
|
| |
Make the indentation in this example match the surrounding
examples.
|
|
|
|
|
|
|
| |
The perlop manpage was stating ‘the left operand’, which was
not entirely correct, as ‘time.shift =>’ quotes just the shift,
not the time (nor does it see the whole as not being an identifier
and refuse to quote anything).
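A small sketch of the behaviour described above: only the identifier immediately to the left of => is quoted, so time is still called as a function. The array names are made up for the example.

```perl
use strict;
use warnings;

# Only 'shift' is quoted by the fat comma; 'time' remains a function call,
# and its result is concatenated with the string 'shift'.
my @pair = (time . shift => 'value');

# The same list written out without relying on the fat comma.
my @plain = (time . 'shift', 'value');

print "$pair[0] => $pair[1]\n";
```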
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Subject: [perl #89490] PATCH: perlop.pod
|
| |
|
| |
|
|
|
|
|
|
| |
Links to this are broken because the X<> entries
were part of the heading, and the spaces between them are
significant.
|
| |
|
| |
|
| |
|