author | Karl Williamson <khw@cpan.org> | 2018-04-29 21:08:37 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2018-06-25 07:33:29 -0600 |
commit | 0426f63574a2379bce80c33f85f158ae093be0c2 (patch) | |
tree | b1dae5cd535465fbd78b01f364ae9b9732f2eb55 /uni_keywords.h | |
parent | 7a6f68415b295f4315b6181237ea0000dd706cd5 (diff) | |
Revise \p{nv=float} lookup
The Numeric Value property allows one to find all code points that have
a certain numeric value. An example would be to match against any
character in any of the world's scripts which is effectively equivalent
to the digit zero.
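As a rough illustration of what this property selects, here is a minimal sketch in Python rather than Perl. It relies on Python's own unicodedata module, not on perl's tables, so the exact set found depends on the Unicode version that Python ships with. The helper name chars_with_numeric_value is made up for this example. In perl the corresponding match would be \p{nv=0}.

```python
import sys
import unicodedata


def chars_with_numeric_value(value):
    """Return every character whose Unicode Numeric_Value equals value."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # numeric() returns the supplied default (None) for characters
        # that have no numeric value at all.
        if unicodedata.numeric(ch, None) == value:
            found.append(ch)
    return found


zeros = chars_with_numeric_value(0)
print(len(zeros), "characters have numeric value 0")
```

The result includes the ASCII digit 0 as well as zeros from other scripts, such as ARABIC-INDIC DIGIT ZERO (U+0660).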
It is documented that we accept either integers (like \p{nv=9}) or
rationals (like \p{nv=1/2}). But we also accept floating point
representations in case a conversion to numeric has happened. I think
it is right that we not document these and their vagaries. One reason
is that Unicode might someday create a new rational number that, to the
precision we currently accept, is indistinguishable from an existing
one, so that we would have to increase the precision.
But there was a bug I introduced years ago. I thought that, for a float
to be considered to match a close rational, 3 significant digits of
precision would be needed, like .667 to match 2/3. That still seems
reasonable. But I didn't implement that concept. Instead, prior to this
commit, it was 3 (not necessarily significant) digits, so that for
1/160, it would match .001.
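The difference between "3 digits" and "3 significant digits" can be seen with a small sketch, written in Python for illustration. It does not reproduce the old matching code, whose details are not shown here; it only compares two ways of rounding. The function names are hypothetical.

```python
def three_decimal_digits(x):
    """Round to 3 digits after the decimal point (not necessarily significant)."""
    return "%.3f" % x


def three_significant_digits(x):
    """Round to 3 significant digits."""
    return "%.3g" % x


for name, value in (("2/3", 2 / 3), ("1/160", 1 / 160)):
    print(name, three_decimal_digits(value), three_significant_digits(value))
```

For 2/3 both roundings keep three significant digits. For 1/160 the three-decimal form keeps only one significant digit, which is far too coarse to identify the rational.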
This commit corrects that, and makes the lookup simpler. mktables will
use sprintf %e to normalize the number to the 3 significant digits
required. At runtime, a floating number is normalized using the
same format, and the result looked up in a hash. This eliminates the
need to worry about matching within some epsilon.
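The normalize-then-look-up scheme described above might be sketched as follows, in Python rather than the Perl of mktables and utf8_heavy.pl. The format "%.2e" is an assumption (one digit before the point plus two after gives 3 significant digits); the exact format string used by the commit is not shown here. The table contents and the names normalize, TABLE, and lookup are hypothetical.

```python
from fractions import Fraction

# Assumed format: 3 significant digits in exponent notation.
FMT = "%.2e"


def normalize(x):
    """Return the canonical string form used as the hash key."""
    return FMT % float(x)


# Build time (the role mktables plays): key each known rational by its
# normalized form. These three rationals are illustrative only.
RATIONALS = [Fraction(1, 2), Fraction(2, 3), Fraction(1, 160)]
TABLE = {normalize(r): r for r in RATIONALS}


def lookup(text):
    """Run time: normalize the user's float the same way and look it up."""
    return TABLE.get(normalize(float(text)))


print(lookup(".667"))     # the entry for 2/3
print(lookup("0.00625"))  # the entry for 1/160
print(lookup("0.001"))    # no entry, so None
```

Because both sides go through the same format, the comparison is an exact string match on the key, and no epsilon is involved.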
Further simplifications in utf8_heavy.pl are achieved by making a more
precise definition as to what an acceptable number looks like, so we
don't have to check later to see if what matched really was one.
Diffstat (limited to 'uni_keywords.h')
-rw-r--r-- | uni_keywords.h | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/uni_keywords.h b/uni_keywords.h
index 418651a8d9..ef959407af 100644
--- a/uni_keywords.h
+++ b/uni_keywords.h
@@ -6751,7 +6751,7 @@ MPH_VALt match_uniprop( const unsigned char * const key, const U16 key_len ) {
  * be0f129691d479aa38646e4ca0ec1ee576ae7f75b0300a5624a7fa862fa8abba lib/unicore/extracted/DLineBreak.txt
  * 92449d354d9f6b6f2f97a292ebb59f6344ffdeb83d120d7d23e569c43ba67cd5 lib/unicore/extracted/DNumType.txt
  * e3a319527153b0c6c0c549b40fc6f3a01a7a0dcd6620784391db25901df3b154 lib/unicore/extracted/DNumValues.txt
- * 6f7e75c46e2c6e4cff53fd9c14a0fbc77611809565d609b15cb98868c5891cdd lib/unicore/mktables
+ * c237f9e6bda604db4388693b42a20ee0d5c2cf9c08152beca27aa0e1ee735550 lib/unicore/mktables
  * 21653d2744fdd071f9ef138c805393901bb9547cf3e777ebf50215a191f986ea lib/unicore/version
  * 4bb677187a1a64e39d48f2e341b5ecb6c99857e49d7a79cf503bd8a3c709999b regen/charset_translations.pl
  * 03e51b0f07beebd5da62ab943899aa4934eee1f792fa27c1fb638c33bf4ac6ea regen/mk_PL_charclass.pl