author | Karl Williamson <public@khwilliamson.com> | 2011-02-27 18:44:43 -0700 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2011-02-27 19:21:33 -0700 |
commit | d50a4f90cab527593b2dd218f71b66a6be555490 (patch) | |
tree | 37c9334aa808d276506f002fcc5a34ae770073c2 /pod/perldiag.pod | |
parent | 2335b3d39eb70759d992779a5e8e11443648e5dd (diff) | |
Handle [folds] of 0-255 without swashes

Commit 56ca34cada940c7f6aae9a59da266e541530041e had the side effect of
causing regular expressions with things like [a-z], or even just [k] to
go out to disk to read tables to create swashes because it knew that
some of those characters matched outside the bitmap (and due to
l1_char_class_tab.h it knew which ones had those matches), but it didn't
know what the characters were that participated in those folds.

This patch hard-codes the Unicode 6.0 rules into regcomp.c for the
code points 0-255, so that the very slow utf8_heavy is not invoked on
them. (Code points above 255 will continue to invoke it.) It would,
of course, be better if these rules could be regen'd into regcomp.c, as
there is a risk that the standard will change, and the code will not.
But I don't think that has ever happened; in other words, I think that
the rules haven't changed so far since Day 1 of Unicode. (That would
not be the case if we were doing simple case folding, as the capital
sharp ss which folds to U+00DF was added later.) And the Standard is
getting more stable in this area. I believe one of their stability
policies now forbids them from adding something that simply folds to
one of the characters that already has a fold, such as M and m.
Ligatures are frowned on, and I doubt that new ones would be encoded,
so that leaves a new Unicode character that folds to a Latin-1
character plus some sort of mark. For those, this code is a no-op, so
those aren't a
problem either.
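
To illustrate the idea of hard-coding these rules, here is a minimal sketch in C. It is not the code this patch adds to regcomp.c: the function name latin1_fold_partners and its interface are hypothetical, and it lists only a handful of well-known single-character folds between code points 0-255 and code points above 255. Multi-character folds (the sharp s, the ligatures) are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not from regcomp.c).
 * Given a code point in 0-255, store in out[] the code points above
 * 255 that case-fold to the same thing, and return how many were
 * stored.  No table on disk is consulted. */
static size_t
latin1_fold_partners(uint8_t cp, uint32_t out[2])
{
    switch (cp) {
    case 'K': case 'k':
        out[0] = 0x212A;        /* KELVIN SIGN */
        return 1;
    case 'S': case 's':
        out[0] = 0x017F;        /* LATIN SMALL LETTER LONG S */
        return 1;
    case 0xC5: case 0xE5:       /* A WITH RING ABOVE, both cases */
        out[0] = 0x212B;        /* ANGSTROM SIGN */
        return 1;
    case 0xB5:                  /* MICRO SIGN */
        out[0] = 0x039C;        /* GREEK CAPITAL LETTER MU */
        out[1] = 0x03BC;        /* GREEK SMALL LETTER MU */
        return 2;
    case 0xFF:                  /* y WITH DIAERESIS */
        out[0] = 0x0178;        /* capital Y WITH DIAERESIS */
        return 1;
    default:
        return 0;               /* no partner above 255 in this sketch */
    }
}
```

With something of this shape, a class such as [k] can record U+212A as an extra match directly, while a class such as [b] needs nothing beyond the bitmap.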
Diffstat (limited to 'pod/perldiag.pod')
-rw-r--r-- | pod/perldiag.pod | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index aae2dd3b08..ce2a5d2d77 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3607,6 +3607,15 @@ redirected it with select().)
 "Can't locate object method \"%s\" via package \"%s\"". It often means
 that a method requires a package that has not been loaded.
 
+=item Perl folding rules are not up-to-date for 0x%x; please use the perlbug utility to report;
+
+(W regex, deprecated) You used a regular expression with
+case-insensitive matching, and there is a bug in Perl in which the
+built-in regular expression folding rules are not accurate. This may
+lead to incorrect results. Please report this as a bug using the
+"perlbug" utility. (This message is marked deprecated, so that it by
+default will be turned-on.)
+
 =item Perl_my_%s() not available
 
 (F) Your platform has very uncommon byte-order and integer size,