| Commit message (Collapse) | Author | Age | Files | Lines |
|
This changes the header to include a bit for each character, indicating
whether quotemeta should quote it under the unicode_strings feature.
|
This commit delivers the official Unicode character database files for
release 6.1, plus the final bits needed to cope with the changes in them
from release 6.0, including documentation.
|
This reverts commit 88c8c9616516015e2fe0b502cdb92dc4efcc0c10.
It turns out that these multi-char fold targets are now needed.
In a future commit, I plan to compile in the dozen or so rules that
are needed so that a Latin1-only regex does not have to go out to the
utf8 tables and pay the performance penalty. Alternatively, calling
code can use the also forthcoming 'use re "/aa"'.
|
These are not currently used, and they slow things down, because regular
expressions that have them, such as /[Etl]/i, now have to go out and load
utf8 code. This remains the case, though, for bracketed character
classes that include [KkSs].
|
Change it to read CaseFolding.txt from lib/unicore, instead of the file
installed with perl, so that it can run with an uninstalled perl.
Add "read only" editor blocks to l1_char_class_tab.h.
|
Now l1_char_class_tab.h contains only the output of
Porting/mk_PL_charclass.pl.
|
This patch is the result of running mk_PL_charclass.pl.
|
The output of the revised Porting/mk_charclass.pl is here incorporated
into this .h, with a #define for the new bit that signifies whether a
character participates in a fold with a non-Latin1 character.
|
The generated table was wrong in the Latin1 range for characters with
the ALNUMC property.
|
This patch adds a table for looking up character classes. It is 256
words long, in l1_char_class_tab.h, with each word corresponding to the
ordinal of a Latin1 character, and each word contains a bit map of all
the properties that character matches. Each property has a bit or two.
Ones named _CC_property_A are true only if the character is also in the
ASCII character set. Ones named _CC_property_L1 do not have this
restriction. (L1 stands for Latin1.)
Also added is a script that generates the table. It is not anticipated
that this will need to be used often.
(This commit was changed from its original form by Steffen.)
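The table layout described in this commit message can be illustrated with a minimal C sketch. This is not the contents of l1_char_class_tab.h: the SKETCH_ names, the bit positions, and the fill routine (a stand-in for the generator script) are all assumptions made for illustration, and the code assumes an ASCII-compatible execution character set.

```c
#include <stdint.h>

/* Hypothetical names and bit positions, modeled on the _CC_property_A and
 * _CC_property_L1 pattern; the real header may assign them differently. */
#define SKETCH_CC_ALPHA_A   (1U << 0)  /* alphabetic and in ASCII */
#define SKETCH_CC_ALPHA_L1  (1U << 1)  /* alphabetic anywhere in Latin1 */
#define SKETCH_CC_DIGIT_A   (1U << 2)  /* decimal digit and in ASCII */

/* One word per Latin1 ordinal; each word is a bit map of properties. */
static uint32_t sketch_tab[256];
static int sketch_tab_ready = 0;

/* Stand-in for the generator script: compute every word once. */
static void sketch_fill(void)
{
    for (unsigned c = 0; c < 256; c++) {
        uint32_t bits = 0;
        int ascii_alpha = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        /* Alphabetic code points in the upper half of Latin1: U+00AA,
         * U+00B5, U+00BA, and U+00C0..U+00FF except the multiplication
         * sign (U+00D7) and the division sign (U+00F7). */
        int high_alpha = c == 0xAA || c == 0xB5 || c == 0xBA
                      || (c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7);

        if (ascii_alpha)
            bits |= SKETCH_CC_ALPHA_A | SKETCH_CC_ALPHA_L1;
        if (high_alpha)
            bits |= SKETCH_CC_ALPHA_L1;
        if (c >= '0' && c <= '9')
            bits |= SKETCH_CC_DIGIT_A;
        sketch_tab[c] = bits;
    }
    sketch_tab_ready = 1;
}

/* A class test is a single indexed load and a bit mask. */
int sketch_has_class(unsigned char c, uint32_t class_bit)
{
    if (!sketch_tab_ready)
        sketch_fill();
    return (sketch_tab[c] & class_bit) != 0;
}
```

Under these assumptions, sketch_has_class(0xE9, SKETCH_CC_ALPHA_L1) is true for U+00E9, while sketch_has_class(0xE9, SKETCH_CC_ALPHA_A) is false because the character is outside ASCII.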
|