path: root/uni_keywords.h
author: Karl Williamson <khw@cpan.org> 2018-04-29 21:08:37 -0600
committer: Karl Williamson <khw@cpan.org> 2018-06-25 07:33:29 -0600
commit: 0426f63574a2379bce80c33f85f158ae093be0c2 (patch)
tree: b1dae5cd535465fbd78b01f364ae9b9732f2eb55 /uni_keywords.h
parent: 7a6f68415b295f4315b6181237ea0000dd706cd5 (diff)
Revise \p{nv=float} lookup
The Numeric Value property allows one to find all code points that have a certain numeric value. An example would be to match against any character in any of the world's scripts which is effectively equivalent to the digit zero. It is documented that we accept either integers (like \p{nv=9}) or rationals (like \p{nv=1/2}). But we also accept floating point representations in case a conversion to numeric has happened. I think it is right that we not document these and their vagaries. One reason is that Unicode might someday create a new rational number that, to the precision we currently accept, is indistinguishable from an existing one, so that we would have to increase the precision.

But there was a bug I introduced years ago. I thought that for a float to be considered to match a close rational, 3 significant digits of precision would be needed, like .667 to match 2/3. That still seems reasonable. But I didn't implement that concept. Instead, prior to this commit, it was 3 (not necessarily significant) digits, so that, for example, .001 would match 1/160.

This commit corrects that, and makes the lookup simpler. mktables will use sprintf %e to get the number normalized with the 3 significant digits required. At runtime, a floating point number is normalized using the same format, and the result is looked up in a hash. This eliminates the need to worry about matching within some epsilon.

Further simplifications in utf8_heavy.pl are achieved by defining more precisely what an acceptable number looks like, so we don't have to check later whether what matched really was one.
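The normalize-then-look-up idea described above can be sketched in C as follows. This is only an illustration, not the code in mktables or utf8_heavy.pl (both are Perl): the names nv_entry, nv_table, nv_normalize and nv_lookup are hypothetical, the "%.2e" format is an assumption about how "3 significant digits" would be spelled, and a linear scan stands in for the hash.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: a pre-normalized key and the rational it names.
 * In the real build the keys would be generated, not written by hand. */
struct nv_entry {
    const char *key;      /* the value formatted with "%.2e" (assumed format) */
    const char *rational; /* the documented form of the property value */
};

/* A tiny stand-in for the generated table. Keys assume a two-digit
 * exponent, which is what common C libraries print for these values. */
static const struct nv_entry nv_table[] = {
    { "5.00e-01", "1/2"   },
    { "6.67e-01", "2/3"   },
    { "6.25e-03", "1/160" },
};

/* Normalize to one leading digit plus two decimals: 3 significant digits. */
static void nv_normalize(double value, char *buf, size_t buf_len)
{
    snprintf(buf, buf_len, "%.2e", value);
}

/* Return the matching rational, or NULL if the normalized input is not a
 * known key. This is an exact string comparison; no epsilon is involved. */
const char *nv_lookup(double value)
{
    char key[32];
    size_t i;

    nv_normalize(value, key, sizeof key);
    for (i = 0; i < sizeof nv_table / sizeof nv_table[0]; i++) {
        if (strcmp(key, nv_table[i].key) == 0)
            return nv_table[i].rational;
    }
    return NULL;
}
```

Under these assumptions, nv_lookup(0.667) returns "2/3" and nv_lookup(0.00625) returns "1/160", while nv_lookup(0.001) returns NULL because it normalizes to a different key. That is the behavior the commit message describes as the fix.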
Diffstat (limited to 'uni_keywords.h')
-rw-r--r-- uni_keywords.h 2
1 files changed, 1 insertions, 1 deletions
diff --git a/uni_keywords.h b/uni_keywords.h
index 418651a8d9..ef959407af 100644
--- a/uni_keywords.h
+++ b/uni_keywords.h
@@ -6751,7 +6751,7 @@ MPH_VALt match_uniprop( const unsigned char * const key, const U16 key_len ) {
* be0f129691d479aa38646e4ca0ec1ee576ae7f75b0300a5624a7fa862fa8abba lib/unicore/extracted/DLineBreak.txt
* 92449d354d9f6b6f2f97a292ebb59f6344ffdeb83d120d7d23e569c43ba67cd5 lib/unicore/extracted/DNumType.txt
* e3a319527153b0c6c0c549b40fc6f3a01a7a0dcd6620784391db25901df3b154 lib/unicore/extracted/DNumValues.txt
- * 6f7e75c46e2c6e4cff53fd9c14a0fbc77611809565d609b15cb98868c5891cdd lib/unicore/mktables
+ * c237f9e6bda604db4388693b42a20ee0d5c2cf9c08152beca27aa0e1ee735550 lib/unicore/mktables
* 21653d2744fdd071f9ef138c805393901bb9547cf3e777ebf50215a191f986ea lib/unicore/version
* 4bb677187a1a64e39d48f2e341b5ecb6c99857e49d7a79cf503bd8a3c709999b regen/charset_translations.pl
* 03e51b0f07beebd5da62ab943899aa4934eee1f792fa27c1fb638c33bf4ac6ea regen/mk_PL_charclass.pl