author | Karl Williamson <public@khwilliamson.com> | 2012-06-28 13:32:17 -0600 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2012-07-24 21:13:43 -0600 |
commit | e94e94b5ceeb265476690d9992a953b7d876f3a1 (patch) | |
tree | 06c23cac92667c6d749c6cf3ccd882aa5f077075 /Configure | |
parent | f792674226f74e98903d6b00d08167effecfd8e9 (diff) | |
download | perl-e94e94b5ceeb265476690d9992a953b7d876f3a1.tar.gz |
mktables: Generate new table for foldable chars
This table consists of all characters that participate in any way in a
fold in the current Unicode version. regcomp.c currently uses the Cased
property as a proxy for these. This information is used to limit the
number of characters whose folds have to be dealt with in compiling
bracketed regex character classes. It turns out that Cased contains
over 1300 more code points than actually appear in folds, which
means potential extra work when compiling. This patch allows that
work to be avoided.
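
As a rough illustration of what "participates in any way in a fold" means, here is a minimal Python sketch. It is not how mktables builds the table; it assumes Python's `str.casefold()` (full case folding) is an acceptable stand-in, and it uses whatever Unicode version the running Python ships with, so the results will not match Unicode 6.1 exactly. The function name `foldable_code_points` is hypothetical.

```python
import sys


def foldable_code_points():
    """Return the set of code points that are the source or part of the
    target of some case fold (illustrative sketch, not mktables' logic)."""
    foldable = set()
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters; skip them
        ch = chr(cp)
        folded = ch.casefold()
        if folded != ch:
            # The source character folds to something else ...
            foldable.add(cp)
            # ... and every character in the result is a fold target.
            foldable.update(ord(c) for c in folded)
    return foldable
```

A regex compiler could consult such a set to skip fold handling for any character that is not in it, which is the saving described above.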
There are a few characters in this new table that aren't in Cased, which
are potential bugs in the old way of doing things. In Unicode 6.1,
these are: U+02BC MODIFIER LETTER APOSTROPHE, U+0308 COMBINING
DIAERESIS, U+0313 COMBINING COMMA ABOVE, and U+0342 COMBINING GREEK
PERISPOMENI. I can't figure out how these might currently be causing a
bug, but this patch fixes any such bug.
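
To see why these four code points belong in the table even though they are not Cased, the following hedged Python sketch lists the characters whose full case fold contains each of them. It again assumes `str.casefold()` as a stand-in for the Unicode fold data and runs against the Unicode version bundled with Python rather than 6.1; `fold_sources` and `TARGETS` are hypothetical names.

```python
import sys

# The four code points named in the commit message.
TARGETS = {0x02BC, 0x0308, 0x0313, 0x0342}


def fold_sources(targets):
    """Map each target code point to the sorted list of code points
    whose multi- or single-character fold contains it."""
    sources = {t: set() for t in targets}
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates
        ch = chr(cp)
        folded = ch.casefold()
        if folded == ch:
            continue  # character folds to itself; not a fold source
        for c in folded:
            if ord(c) in sources:
                sources[ord(c)].add(cp)
    return {t: sorted(s) for t, s in sources.items()}


if __name__ == "__main__":
    for target, srcs in sorted(fold_sources(TARGETS).items()):
        print(f"U+{target:04X} <- " + ", ".join(f"U+{s:04X}" for s in srcs))
```

Each of the four appears only as part of a multi-character fold, for example U+0149 folds to U+02BC followed by U+006E, which is why a Cased-based filter would miss them.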