path: root/l1_char_class_tab.h
Commit message | Author | Age | Files | Lines
* Revert "l1_char_class_tab.h: Remove multi-char fold targets" | Karl Williamson | 2011-02-14 | 1 | -21/+21
  This reverts commit 88c8c9616516015e2fe0b502cdb92dc4efcc0c10. It turns out that these multi-char fold targets are now needed. In a future commit, I plan to compile in the dozen or so rules that are needed so that a Latin1-only regex does not have to go out to the utf8 tables and pay the performance penalty; alternatively, calling code can use the also forthcoming 'use re "/aa"'.
* l1_char_class_tab.h: Remove multi-char fold targets | Karl Williamson | 2011-02-04 | 1 | -21/+21
  These are not currently used, and they slow things down, because regular expressions that have them, such as /[Etl]/i, now have to go out and load utf8 code. This remains the case, though, for bracketed character classes that include [KkSs].
* Move mk_PL_charclass.pl from Porting/ to regen/ | Nicholas Clark | 2011-01-24 | 1 | -1/+1
* Convert mk_PL_charclass.pl to use regen_lib.pl | Nicholas Clark | 2011-01-24 | 1 | -1/+9
  Change it to read CaseFolding.txt from lib/unicore, instead of the file installed with perl, so that it can run with an uninstalled perl. Add "read only" editor blocks to l1_char_class_tab.h.
* Move the non-generated parts of l1_char_class_tab.h out into handy.h | Nicholas Clark | 2011-01-24 | 1 | -46/+0
  Now l1_char_class_tab.h contains only the output of Porting/mk_PL_charclass.pl.
* l1_char_class_tab.h: include multi-char folds | Karl Williamson | 2010-12-15 | 1 | -21/+21
  This patch is the result of running mk_PL_charclass.pl.
* l1_char_class_tab.h: Add new bit to table. | Karl Williamson | 2010-11-22 | 1 | -9/+9
  The output of the revised Porting/mk_charclass.pl is incorporated here into this .h, with a #define for the new bit that signifies whether a character participates in a fold with a non-Latin1 character.
* l1_char_class_tab.h: Wrong for ALNUMC | Karl Williamson | 2010-10-31 | 1 | -65/+65
  The generated table was wrong in the Latin1 range for characters with the ALNUMC property.
* Add 256 word bit table of character classes | Karl Williamson | 2010-09-25 | 1 | -0/+303
  This patch adds a table for looking up character classes. It is 256 words long, in l1_char_class_tab.h, with each word corresponding to the ordinal of a Latin1 character, and each word contains a bit map of all the properties that character matches. Each property has a bit or two. Ones named _CC_property_A are true only if the character is also in the ASCII character set. Ones named _CC_property_L1 do not have this restriction. (L1 stands for Latin1.) Also added is a script that generates the table. It is not anticipated that this will need to be used often. (This commit was changed from its original form by Steffen.)
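
The table layout described in these commits can be sketched roughly as follows. This is a minimal illustration, not the contents of the generated header: the names CC_ALPHA_A, CC_ALPHA_L1, CC_NON_L1_FOLD, cc_tab, fill_table and cc_has are hypothetical, the bit positions are arbitrary, and the table is filled at run time here whereas the real one is emitted by the generator script. The fold list is deliberately partial (only [KkSs], MICRO SIGN and y with diaeresis).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; the real header defines its own names
 * and positions. One property gets two bits: _A (ASCII only) and _L1. */
#define CC_ALPHA_A      (1u << 0)  /* alphabetic and in ASCII             */
#define CC_ALPHA_L1     (1u << 1)  /* alphabetic anywhere in Latin1       */
#define CC_NON_L1_FOLD  (1u << 2)  /* folds with a char above 0xFF        */

/* One word per Latin1 ordinal, each holding a bit map of properties. */
static uint32_t cc_tab[256];

/* Stand-in for the generator script: fill the table at run time. */
static void fill_table(void)
{
    unsigned c;
    for (c = 0; c < 256; c++) {
        uint32_t bits = 0;
        int ascii_alpha = (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A);
        /* Latin1 letters above ASCII: 0xAA, 0xB5, 0xBA, and 0xC0..0xFF
         * except the multiplication (0xD7) and division (0xF7) signs. */
        int high_alpha = c == 0xAA || c == 0xB5 || c == 0xBA
                      || (c >= 0xC0 && c != 0xD7 && c != 0xF7);

        if (ascii_alpha)
            bits |= CC_ALPHA_A | CC_ALPHA_L1;
        else if (high_alpha)
            bits |= CC_ALPHA_L1;

        /* Partial list, for illustration: K/k fold with KELVIN SIGN,
         * S/s with LONG S, 0xB5 with GREEK MU, 0xFF with U+0178. */
        if (c == 0x4B || c == 0x6B || c == 0x53 || c == 0x73
            || c == 0xB5 || c == 0xFF)
            bits |= CC_NON_L1_FOLD;

        cc_tab[c] = bits;
    }
}

/* A class test is a single indexed load plus a mask. */
static int cc_has(unsigned ord, uint32_t bit)
{
    assert(ord < 256);
    return (cc_tab[ord] & bit) != 0;
}
```

After fill_table() has run, cc_has(0xE9, CC_ALPHA_L1) is true while cc_has(0xE9, CC_ALPHA_A) is false, which is the _A versus _L1 distinction the commit message draws; cc_has(0x6B, CC_NON_L1_FOLD) is true, matching the note above that classes containing [KkSs] still need the utf8 code.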