| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This commit adds various release notes covering:
* module updates
* documentation updates
* some bug fixes and internal changes
|
| |
Because it's already defined in regcomp.c and the VMS build was
failing with a linker error (multiply-defined symbol).
|
|
|
|
|
|
|
| |
We can optimize ANYOF nodes that are equivalent to POSIX character
classes. Discovering if they are equivalent takes work, which can be
skipped with a simple test that will rule out many run-of-the-mill
character classes.
|
|
|
|
|
| |
Indent a section of code in preparation for the next commit which will
make it into a block.
|
| |
|
|
|
|
| |
So in a make, it is abundantly clear where the messages are coming from.
|
|
|
|
|
| |
This shouldn't actually happen, and g++ under -O0 didn't flag it, but
gcc under -O2 does, so initialize to an illegal value.
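The pattern described above can be sketched in C as follows. The commit message does not name the variable or the exact diagnostic, so `classify`, `OP_INVALID`, and the switch below are hypothetical; the sketch only assumes a "may be used uninitialized" style warning that appears at higher optimization levels.

```c
#include <assert.h>

/* Hypothetical sentinel that lies outside the set of valid results. */
enum { OP_INVALID = -1 };

/* Map a character to a small operation code. */
static int classify(char c)
{
    /* Starting from an illegal value gives every path a defined result,
     * so the compiler has nothing to warn about, and a path that
     * "shouldn't happen" is easy to detect. */
    int op = OP_INVALID;

    assert(c != '\0');
    switch (c) {
    case '+': op = 0; break;
    case '-': op = 1; break;
    default:  break;        /* unexpected input keeps the sentinel */
    }
    return op;
}
```

A caller can then treat `OP_INVALID` as a bug indicator rather than reading an indeterminate value.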
|
|
|
|
|
| |
These were invalidated by commit
709be747a32edc503b4645d9c5396bd4b40100d2
|
|
|
|
|
|
| |
Commits 108316fb65dc7243a1c5d87b4b29068b7d62d32e
and 5e85fd899767ba3003766fc9289c0ee2d8427d10
broke -Dr output in rare cases.
|
| |
Bracketed character classes generally generate an ANYOF-type regnode,
which consists of a bitmap for the lower code points, and an inversion
list or swash to handle ones not in the bitmap. They take up more
memory than other regnode types. There are already some optimizations
that use a smaller and/or faster regnode instead. For example, some
people prefer not to use a backslash to escape metacharacters, instead
writing something like /abc[.]def/. This has for some time generated
the same thing as /abc\.def/ does, namely a single EXACT node, which is
both smaller and faster than an ANYOF node in the middle of two EXACT
nodes.
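The two-part layout described above (a bitmap for low code points plus an inversion list for the rest) can be sketched in C. This is a minimal illustration, not perl's actual regnode layout: `char_class`, `class_matches`, `invlist_contains`, and the 256-code-point cutoff are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITMAP_SIZE 256u   /* assumed cutoff: lower code points use the bitmap */

typedef struct {
    uint8_t bitmap[BITMAP_SIZE / 8];  /* one bit per code point 0..255 */
    const uint32_t *invlist;          /* sorted boundaries for higher code points */
    size_t invlist_len;
} char_class;

/* An inversion list is a sorted array in which elements at even indices
 * start a range that is in the set and elements at odd indices start a
 * range that is not.  cp is a member exactly when the number of
 * elements <= cp is odd. */
static int invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    size_t lo = 0, hi = len;          /* binary search for that count */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (int)(lo & 1);
}

/* Low code points cost one bit test; higher ones need a search. */
static int class_matches(const char_class *cc, uint32_t cp)
{
    assert(cc != NULL);
    if (cp < BITMAP_SIZE)
        return (cc->bitmap[cp >> 3] >> (cp & 7)) & 1;
    return invlist_contains(cc->invlist, cc->invlist_len, cp);
}
```

The sketch shows why such a node is larger than a single-purpose node: it carries both the fixed-size bitmap and a variable-length list.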

This commit adds some optimizations that hadn't been done previously.
Now things like /[\p{Word}]/ will optimize to \w, for example. I had
not done this before, because my tests had shown very little performance
difference, but I had added most of the code to regcomp.c so it wouldn't
get lost, #ifdef'd out.

It turns out that I hadn't tested on code points above the bitmap, which
with this commit have a small, but appreciable speed up in matching, so
this commit enables and finishes that code.

Prior to this commit, things like /[[:word:]]/ were optimized to \w, but
things like /[_[:word:]]/ were not. This commit fixes that.

If the following command is run on a perl compiled with -O2 and no
DEBUGGING:

    blead Porting/bench.pl --raw --benchfile=charclass_perf --perlargs=-Ilib /path_to_prior_perl="before this commit" /path_to_this_perl=after

and the file 'charclass_perf' contains

    [
        'regex::charclass::ascii' => {
            desc  => 'charclass, ascii range',
            setup => 'my $a = qr/[\p{Word}]/',
            code  => '"A" =~ $a'
        },
        'regex::charclass::upper_latin1' => {
            desc  => 'charclass, upper latin1 range',
            setup => 'my $a = qr/[\p{Word}]/',
            code  => '"\x{e0}" =~ $a'
        },
        'regex::charclass::above_latin1' => {
            desc  => 'charclass, above latin1 range',
            setup => 'my $a = qr/[\p{Word}]/',
            code  => '"\x{100}" =~ $a'
        },
        'regex::charclass::high_Unicode' => {
            desc  => 'charclass, high Unicode code point',
            setup => 'my $a = qr/[\p{Word}]/',
            code  => '"\x{10FFFF}" =~ $a'
        },
    ];

the following results are obtained:

The numbers represent raw counts per loop iteration.

regex::charclass::above_latin1
charclass, above latin1 range

       before this commit    after
       ------------------ --------
    Ir             3344.0   2888.0
    Dr              971.0    855.0
    Dw              604.0    541.0
  COND              575.0    504.0
   IND               25.0     25.0
COND_m               11.0     10.7
 IND_m               10.0     10.0
 Ir_m1                8.9      6.0
 Dr_m1                3.0      3.2
 Dw_m1                1.5      1.4
 Ir_mm                0.0      0.0
 Dr_mm                0.0      0.0
 Dw_mm                0.0      0.0

regex::charclass::ascii
charclass, ascii range

       before this commit    after
       ------------------ --------
    Ir             2661.0   2649.0
    Dr              798.0    795.0
    Dw              516.0    517.0
  COND              467.0    465.0
   IND               23.0     23.0
COND_m               10.0      8.8
 IND_m               10.0     10.0
 Ir_m1                7.9      0.0
 Dr_m1                2.9      3.1
 Dw_m1                1.3      1.3
 Ir_mm                0.0      0.0
 Dr_mm                0.0      0.0
 Dw_mm                0.0      0.0

regex::charclass::high_Unicode
charclass, high Unicode code point

       before this commit    after
       ------------------ --------
    Ir             3344.0   2888.0
    Dr              971.0    855.0
    Dw              604.0    541.0
  COND              575.0    504.0
   IND               25.0     25.0
COND_m               11.0     10.7
 IND_m               10.0     10.0
 Ir_m1                8.9      6.0
 Dr_m1                3.0      3.2
 Dw_m1                1.5      1.4
 Ir_mm                0.0      0.0
 Dr_mm                0.0      0.0
 Dw_mm                0.0      0.0

regex::charclass::upper_latin1
charclass, upper latin1 range

       before this commit    after
       ------------------ --------
    Ir             2661.0   2651.0
    Dr              798.0    796.0
    Dw              516.0    517.0
  COND              467.0    466.0
   IND               23.0     23.0
COND_m               11.0      8.8
 IND_m               10.0     10.0
 Ir_m1                7.9      0.0
 Dr_m1                2.9      3.3
 Dw_m1                1.5      1.2
 Ir_mm                0.0      0.0
 Dr_mm                0.0      0.0
 Dw_mm                0.0      0.0
|
| |
|
|
|
|
|
|
| |
This commit sets a flag at pattern compilation time to indicate if
a rare case is present that requires special handling, so that that
handling can be avoided unless necessary.
|
|
|
|
|
|
| |
This changes the spare bit to be adjacent to the LOC_FOLD bit, in
preparation for the next commit, which will use that bit for a
LOC_FOLD-related use.
|
|
|
|
|
|
|
|
| |
This is done by combining 2 mutually exclusive bits into one. I hadn't
seen this possibility before because the name of one of them misled me.
It also misled me into turning that flag on unnecessarily, and into
missing opportunities to avoid creating a swash at runtime. This
commit corrects those things as well.
|
| |
Characters in a bracketed character class can come from a bunch of
sources, all bundled together. Some things under /d match only when the
target string is UTF-8; some match only when it isn't UTF-8. Other
sources may introduce ones that match regardless. It may be that some
things are specified as conditionally matching from one source, and as
unconditionally matching from another. We can subtract the
unconditionals from the conditionals, leaving a simpler set of things
that must be conditionally matched. In some cases, the conditional set
may go to zero, allowing other optimizations to happen that otherwise
couldn't. An example is

    qr/[\W\xAB]/

which before this commit compiled to:

    ANYOFD[^0-9A-Z_a-z\x{80}-\x{AA}\x{AC}-\x{FF}][{non-utf8-latin1-all}
    {utf8}0080-00A9 00AC-00B4 00B6-00B9 00BB-00BF 00D7 00F7
    02C2-02C5...] (12)

and after it, compiles to

    ANYOFD[^0-9A-Z_a-z\x{AA}\x{B5}\x{BA}\x{C0}-\x{D6}\x{D8}-\x{F6}
    \x{F8}-\x{FF}][{non-utf8-latin1-all}{utf8}02C2-02C5...] (12)

Notice that the {utf8} component has been stripped of everything below
256. That means no swash has to be created at runtime when matching
code points below 256, unlike the case before this commit.

A starker example, though unlikely in real life except in
machine-generated code, is

    qr/[\w\W]/

Before this commit, it would generate:

    ANYOFD[\x{00}-\x{7F}][{non-utf8-latin1-all}{above_bitmap_all}
    {utf8}0080-00FF]

and afterwards, simply:

    SANY
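The subtraction step described in this message can be illustrated with a small C sketch. It assumes a plain 256-bit bitmap per set, whereas the message itself talks about inversion lists and swashes; `set256` and its helper functions are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical set of code points 0..255, one bit each. */
typedef struct { uint8_t bits[32]; } set256;

static void set256_clear(set256 *s)
{
    memset(s->bits, 0, sizeof s->bits);
}

static void set256_add_range(set256 *s, unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < 256);
    for (unsigned cp = lo; cp <= hi; cp++)
        s->bits[cp >> 3] |= (uint8_t)(1u << (cp & 7));
}

static int set256_has(const set256 *s, unsigned cp)
{
    assert(cp < 256);
    return (s->bits[cp >> 3] >> (cp & 7)) & 1;
}

/* Remove from the conditional set everything that already matches
 * unconditionally; what remains is all that needs a runtime check. */
static void set256_subtract(set256 *conditional, const set256 *unconditional)
{
    for (size_t i = 0; i < sizeof conditional->bits; i++)
        conditional->bits[i] &= (uint8_t)~unconditional->bits[i];
}

/* An empty conditional set means the class has no conditional part left. */
static int set256_is_empty(const set256 *s)
{
    for (size_t i = 0; i < sizeof s->bits; i++)
        if (s->bits[i] != 0)
            return 0;
    return 1;
}
```

When `set256_is_empty()` reports that nothing conditional remains, a compiler could fall through to simpler node types, which is the effect the `qr/[\w\W]/` example shows.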
|
|
|
|
|
| |
This name confused me, and led to suboptimal code. The new name is more
cumbersome, but won't confuse (at least it won't confuse me).
|
|
|
|
| |
It does not work in SysV (Solaris) or old BSD greps.
|
|
|
|
|
|
| |
This reverts commit 0bd66ca801c5fb84ee6a8feeb8114f0d8248029f.
Worked for me, but Jenkins isn't happy :-(
|
| |
|
| |
|
|
|
|
| |
Blead customizations are now assimilated.
|
| |
On threaded builds on OS X, libSystem registers atfork handlers that
call setenv(), which internally modifies members of environ[], setting
them to malloc()ed blocks.

In some cases Perl_my_setenv() reallocates environ[] using
safesysmalloc(), which under debugging builds adds a tracking header,
and if perl_destruct() sees that environ[] has been reallocated, frees
it with safesysfree().

When these combine, perl attempts to free the malloc()ed block with
safesysfree(), which attempts to access the tracking header, causing
an invalid access in tools like valgrind, or a "free from wrong pool"
error, since the header contains unrelated data.

Avoid this mess by letting libc manage environ[] if unsetenv() is
available.
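The approach of leaving environ[] to libc can be sketched with the POSIX setenv() and unsetenv() calls. This is not the code of Perl_my_setenv(); `env_set` is a hypothetical wrapper, and treating a NULL value as "remove the variable" is a convention assumed for the example.

```c
#define _POSIX_C_SOURCE 200112L   /* request setenv()/unsetenv() declarations */

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Set NAME to VALUE, or remove NAME when VALUE is NULL.
 * libc allocates and owns every string stored in environ[], so the
 * caller never has to free an entry with its own allocator.
 * Returns 0 on success and -1 on failure, as the libc calls do. */
static int env_set(const char *name, const char *value)
{
    assert(name != NULL && *name != '\0' && strchr(name, '=') == NULL);

    if (value == NULL)
        return unsetenv(name);
    return setenv(name, value, 1);   /* 1: overwrite an existing entry */
}
```

Because no caller-side allocator touches the entries, there is no block for a mismatched free routine to misinterpret.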
|
| |
pp_sort() saves the SV pointers for *a and *b. If the sort block
cleared *a or *b, the GP in which the pointer is stored would be freed,
and the save stack processing would try to write to freed memory.

Make sure the GP lasts at least long enough for the SV slots to be
restored. This doesn't attempt to restore *a or *b; the user chose
to clear them.
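The lifetime fix described above amounts to holding an extra reference across the callback. The C sketch below shows that idea with a toy reference-counted block; `gp_like`, `with_saved_slot`, and the other names are hypothetical and do not correspond to perl's real GP or save-stack API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a glob's GP: a refcounted block with one slot. */
typedef struct {
    int refcnt;
    const char *sv;        /* stands in for the saved SV pointer slot */
} gp_like;

static int gp_frees = 0;   /* counts frees so the lifetime can be observed */

static gp_like *gp_new(const char *sv)
{
    gp_like *gp = malloc(sizeof *gp);
    assert(gp != NULL);
    gp->refcnt = 1;
    gp->sv = sv;
    return gp;
}

static void gp_hold(gp_like *gp) { gp->refcnt++; }

static void gp_release(gp_like *gp)
{
    assert(gp->refcnt > 0);
    if (--gp->refcnt == 0) {
        free(gp);
        gp_frees++;
    }
}

/* Save the slot, run a body that may drop the owner's reference, then
 * restore the slot.  The extra hold keeps the block alive until the
 * restore has been written. */
static void with_saved_slot(gp_like *gp, void (*body)(gp_like *))
{
    const char *saved = gp->sv;
    gp_hold(gp);           /* keep gp alive across the body */
    body(gp);              /* may call gp_release(), like clearing *a */
    gp->sv = saved;        /* safe: our hold means gp has not been freed */
    gp_release(gp);        /* drop the hold; frees gp if the owner let go */
}

/* Two example bodies: one only modifies the slot, one drops the owner. */
static void touch_glob(gp_like *gp) { gp->sv = "tmp"; }
static void clear_glob(gp_like *gp) { gp->sv = NULL; gp_release(gp); }
```

Without the `gp_hold()` call, `clear_glob` would free the block and the restore would write to freed memory, which is the failure the message describes.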
|
| |
Symptom of failure: in OpenIndiana "make" fails:

    ...
    ./perl -Ilib -f pod/buildtoc -q
    Can't load 'lib/auto/re/re.so' for module re: ld.so.1: perl: fatal:
    relocation error: file lib/auto/re/re.so: symbol PL_localizing:
    referenced symbol not found at lib/XSLoader.pm line 71.
    at lib/re.pm line 88.
    ...

Running the above command with 'env LD_DEBUG=files ...' shows that
there are many other symbol lookup failures; the one above is just
the last one before bailing.

If configured explicitly with -Duseshrplib, the OpenIndiana build succeeds.

Curiously, while hints/solaris_2.sh (which OpenIndiana uses) does
not specify useshrplib, Oracle/Sun has been building its perl
with useshrplib since Perl 5.6.1 or thereabouts (source: Alan Burlison).
Using shared libraries is strongly recommended in Solaris in general
(source: the same).

Tested in:
- Solaris 5.10/i386 with solstudio 12.2 and gcc 4.8.0
- Solaris 5.10/sparc with solarisstudio 12.3 and gcc 4.9.2
- OpenIndiana 5.11/i386 with solarisstudio 12.3 and gcc 4.5.0
|
| |
|
|
|
|
|
|
|
| |
See thread starting at
http://nntp.perl.org/group/perl.perl5.porters/227698
Ricardo Signes provided the perldelta and perldiag text.
|
|
|
|
|
| |
The old text used the passive voice. No 5.23 release has been made with
the old text, so no perldelta changes are needed.
|
| |
|