| Commit message | Author | Age | Files | Lines |
| |
eb0925341cc65ce6ce57503ec0ab97cdad39dc98 caused the definitions for
about 45% of the Unicode tables to be placed in-line in Heavy.pl instead
of having to be read in from disk. This new commit extends that so
that about 55% are in-lined, by in-lining tables which consist of up to
3 ranges.
This is a no-brainer to do, as the memory usage does not increase by
doing it, and disk accesses go down. I used the delta in the disk size
of Heavy.pl as a proxy for the delta in the memory size that it uses,
as what this commit does is to change various scalar strings in it.
Doing this measurement indicates that this commit results in a slightly
smaller Heavy.pl than what was there before eb092534. The amounts will
vary between Unicode releases. I also checked for Unicode beta 7.0, and
the sizes are again comparable, with a slightly larger Heavy.pl for the
3-range version there.
For 4-, 5-, ... range tables, doing this results in slowly increasing
Heavy.pl size (and hence more and more memory use), and that is
something we may wish to look at in the future, trading memory for fewer
files and less disk start-up cost. But for the imminent v5.20, doing it
for 3-range tables doesn't cost us anything, and gains us fewer disk
files and accesses.
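The in-lining decision described above can be sketched roughly as follows. This is a minimal illustration and not the actual table-generation code: `should_inline`, `$MAX_INLINE_RANGES`, and the array-of-pairs representation of a table are hypothetical names and structures chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical threshold: tables with at most this many ranges are in-lined.
my $MAX_INLINE_RANGES = 3;

# A table is represented here (as an assumption) by a list of
# [start, end] code point pairs.
sub should_inline {
    my ($ranges) = @_;
    return @$ranges <= $MAX_INLINE_RANGES ? 1 : 0;
}

my %tables = (
    small => [ [ 0x41, 0x5A ], [ 0x61, 0x7A ] ],
    large => [ map { [ $_ * 0x10, $_ * 0x10 + 1 ] } 1 .. 5 ],
);

for my $name (sort keys %tables) {
    printf "%s: %s\n", $name,
        should_inline($tables{$name}) ? 'in-line' : 'read from disk';
}
```

Raising the threshold would trade a larger Heavy.pl for fewer files on disk, which is the trade-off the message defers to a later release.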
| |
or, sometimes, simply remove them
| |
Add tests for 111f73b5d79, the fix for kill -SIG on win32, which was
broken in 5.18.0
(A follow-up commit will clean this code up a bit)
| |
This doesn't include module version updates.
| |
(cherry picked from commit 43c6e0a7ba1950c4a64b59be5d0a9cd7b1807cca)
| |
subpattern references
Match variables should be dynamically scoped during GOSUB and GOSTART.
The caller's state should be inherited by the callee, but once the callee
returns, the caller's state should be restored.
This is different from EVAL, where the caller's and callee's states are
not expected to be the same (although they might be), and where
the "reasonable" match semantics differ. Currently the following two
one-liners will produce different results:
$ ./perl -Ilib -le'"<ab><>>" =~/ < (?: \1 | [ab]+ ) (>) (?0)? /x and print $&;'
<ab><>>
$ ./perl -Ilib -le'$qr= qr/ < (?: \1 | [ab]+ ) (>) (??{ $qr })? /x; "<ab><>>" =~ m/$qr/ and print $&;'
<ab>
While I think reasonable people could argue that we should special-case
things when we know that the return from (??{ ... }) is the same as the
currently executing pattern, I think explaining the difference would be
harder than necessary.
Conversely, making GOSUB/GOSTART exactly the same as EVAL, so that
the match vars were totally independent, seems to throw away an
opportunity for much more powerful semantics than can be offered by
EVAL.
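For comparing the two forms on a given perl build, the one-liners above can be combined into a single script. This is only a convenience sketch: the variable names `$recursed` and `$evaled` are introduced here for illustration, and no particular output is asserted, since the result depends on whether the perl being run includes this change.

```perl
use strict;
use warnings;

my $input = '<ab><>>';

# Recursion into the whole pattern via (?0) (GOSUB/GOSTART).
my $recursed = $input =~ / < (?: \1 | [ab]+ ) (>) (?0)? /x ? $& : undef;

# Re-entering the same pattern via a postponed subexpression (EVAL).
my $qr;
$qr = qr/ < (?: \1 | [ab]+ ) (>) (??{ $qr })? /x;
my $evaled = $input =~ m/$qr/ ? $& : undef;

print "(?0):        ", $recursed // 'no match', "\n";
print "(??{ ... }): ", $evaled   // 'no match', "\n";
```

If the two lines differ, the inner call saw the caller's capture group 1 in one case and not in the other, which is the distinction this commit message draws.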
| |
Commit 1c604e7c7f fixed some pod errors, but broke
./perl -Ilib -Mdiagnostics -e '$/=[]'
This fixes that.
| |
The term 'semantics' in documentation, when applied to character sets, is
changed to 'rules' as a shorter, less-jargony synonym in this case.
This was discussed several releases ago, but I didn't get around to it.
| |
See discussion at https://rt.perl.org/Ticket/Display.html?id=120675
There are several unresolved items in this discussion, but we did agree
that tainting should be dependent only on the regex pattern, and not the
particular input string being matched against:
"The bottom line is we are moving to the policy that tainting is based
on the operation being in locale, without regard to the particular
operand's contents passed this time to the operation. This means simpler
core code and more consistent tainting results. And it lessens the
likelihood that there are paths in the core that should taint but don't"
This commit does the minimal work to change regex pattern matching to
determine tainting at pattern compilation time. Simply put, if a
pattern contains a regnode whose match/not match depends on the run-time
locale, any attempt to match against that pattern will taint, regardless
of the actual target string or runtime locale in effect. Given this
change, there are optimizations that can be made to avoid runtime work,
but these are deferred until later.
Note that just because a regular expression is compiled under locale
doesn't mean that the generated pattern will be tainted. It depends on
the actual pattern. For example, the pattern /(.)/ doesn't taint
because it will match exactly one character of the input, regardless of
locale settings.
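The difference between the two kinds of pattern can be observed with a small script along these lines. It is a sketch under assumptions: `capture_is_tainted` is a hypothetical helper, and the script has to be run with taint checks enabled (for example with `perl -T`) for anything to be reported as tainted; without them every result is 0.

```perl
use strict;
use warnings;
use locale;
use Scalar::Util qw(tainted);

# Returns 1 if the first capture of $re against $string is tainted, else 0.
sub capture_is_tainted {
    my ($string, $re) = @_;
    return 0 unless $string =~ $re;
    return tainted($1) ? 1 : 0;
}

my $target = 'abc';
print "taint mode is ", (${^TAINT} ? "on" : "off"), "\n";

# (\w) depends on the run-time locale; (.) does not.
for my $re (qr/(\w)/, qr/(.)/) {
    printf "%-14s capture tainted: %d\n", "$re", capture_is_tainted($target, $re);
}
```

Both patterns are compiled under `use locale`, so any difference in the reported values comes from the pattern contents rather than from the target string.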
| |
This variable is part of the environment, but wasn't previously
mentioned.
| |
Class::Tiny is similarly small and simple in API, but with more powerful
features available. Comparison to Object::Tiny and Class::Accessor is
here: https://metacpan.org/pod/Class::Tiny#RATIONALE
At mst's suggestion, a link to Class::Tiny::Antlers for Moose-syntax
is included.
| |
1.4413 2014-02-17 20:04:23-05:00 America/New_York
  [FIXED]
  - UTF-8 decoding is done differently to avoid requiring
    a newer version of Encode (Graham Knop)
| |
Amended with a suggestion from rjbs.
For: RT #120808
| |
This reverts commit 34fdef848b1687b91892ba55e9e0c3430e0770f6, and
adds comments referring to it, in case it is ever needed.
| |
This commit frees up a bit by using an extra regnode to pass the
information to the regex engine instead of the flag. I originally
thought that if this was needed, it should be the ANYOF_ABOVE_LATIN1_ALL
bit, as that might speed some things up. But if we need to do this
again by adding another node to get another bit, we want one that is
mutually exclusive of the first one we did, for otherwise we start
having to make 3 nodes instead of two to get the combinations:
1 0
0 1
1 1
This combinatorial problem is avoided by using bits that are mutually
exclusive, which ANYOF_ABOVE_LATIN1_ALL isn't. The bit freed by this
commit, ANYOF_NON_UTF8_NON_ASCII_ALL, is only set under /d matching, and
there are other bits that are set only under /l, so if we need to do
this again, we should use one of those.
I wrote this code when I thought I really needed a bit, but I have since
figured out a better way to get the bit needed now. I don't
want to lose this code to posterity, so this commit is being made just long
enough to get the commit number; then it will be reverted, adding
comments referring to the commit number, so that it can easily be
reconstructed when necessary.
| |
These functions have been supplanted in more modern Perls by
/[[:posix:]]/. The documentation has been wrong; they don't handle
UTF-8 and return true on an empty string. Rather than try to fix them,
the decision has been made to deprecate them instead.
See http://markmail.org/thread/jhqcag5njmx7jpyu
This commit also updates the documentation to be accurate.
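As an illustration of the regex-based replacement, a whole-string check can be written with a POSIX bracket class. This sketch assumes the deprecated functions are character-class tests in the style of `isalpha` (the subject line naming them is not shown here); `is_alpha` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Regex-based test for "every character is alphabetic".
# \A ... + ... \z requires at least one character, so the empty string fails.
sub is_alpha {
    my ($str) = @_;
    return (defined $str && $str =~ /\A[[:alpha:]]+\z/) ? 1 : 0;
}

printf "%-5s => %d\n", "'$_'", is_alpha($_) for 'abc', 'ab1', '';
```

Unlike the behavior described above, this form returns false for an empty string, and the bracket class follows the usual regex rules for the string and modifiers in effect.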
| |
If Perl encounters a problem during startup trying to initialize the
locales from the environment, it has until now immediately reverted to
the "C" locale.
This commit generalizes that so it tries each of the applicable
environment variables in order of priority until it works, or it gives
up and uses the "C" locale. For example, if LC_ALL is set to something
that is invalid, but LANG is valid, LANG will be used. This was
motivated by trying to get the Windows system default locale used in
preference to "C" if all else fails.
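The fallback order described above can be sketched in Perl roughly as follows. This is an illustration only and not the core's startup code: `pick_locale` and its callback argument are hypothetical, and only the LC_CTYPE category is shown.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Try each candidate environment variable in priority order; return the
# variable name and locale that worked, or (undef, "C") if none did.
# $try is a callback returning true when a locale name can be set.
sub pick_locale {
    my ($env, $try, @vars) = @_;
    for my $var (@vars) {
        my $name = $env->{$var};
        next unless defined $name && length $name;
        return ($var, $name) if $try->($name);
    }
    return (undef, 'C');
}

my ($var, $locale) = pick_locale(
    \%ENV,
    sub { defined setlocale(LC_CTYPE, $_[0]) },   # undef means the name was rejected
    qw(LC_ALL LC_CTYPE LANG),
);
setlocale(LC_CTYPE, $locale);   # makes the "C" fallback take effect too
print "using locale '$locale'", ($var ? " from $var" : " (fallback)"), "\n";
```

Passing the "can this locale be set?" check as a callback keeps the priority logic separate from the real `setlocale` call, so the ordering can be exercised with made-up locale names.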
| |
Locale initialization and setting on Windows haven't behaved as
described in perllocale when the locale is set to "". This is because ""
tells Windows to use the system default locale, as set through the
Control Panel, whereas on POSIX systems it means to look at various
environment variables.
This commit creates a wrapper for setlocale, used only on Windows, that
looks for the appropriate environment variables when called with a ""
input locale. If none are found, it continues to use the system default
locale.