author | Karl Williamson <khw@cpan.org> | 2018-08-16 16:27:52 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2018-08-16 16:54:39 -0600 |
commit | 8350b2740abc0cad147113487148473a9e19034b (patch) | |
tree | eec5b35258b4a08d72d703fb76bbf228bd2d4b70 /pod/perlrecharclass.pod | |
parent | 7da8e27b9d7d2be4e770d074405ddb9941e6c8b7 (diff) | |
download | perl-8350b2740abc0cad147113487148473a9e19034b.tar.gz |
perlre, perlrecharclass: Add examples
This adds more concrete cases of how mixed script digits can be
hazardous.
Diffstat (limited to 'pod/perlrecharclass.pod')
-rw-r--r-- | pod/perlrecharclass.pod | 7 |
1 files changed, 5 insertions, 2 deletions
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 3b5c5b12b1..225a092c05 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -109,14 +109,17 @@ security issues.
 
 Some digits that C<\d> matches look like some of the [0-9] ones, but
 have different values. For example, BENGALI DIGIT FOUR (U+09EA) looks
-very much like an ASCII DIGIT EIGHT (U+0038). An application that
+very much like an ASCII DIGIT EIGHT (U+0038), and LEPCHA DIGIT SIX
+(U+1C46) looks very much like an ASCII DIGIT FIVE (U+0035). An
+application that
 is expecting only the ASCII digits might be misled, or if the match is
 C<\d+>, the matched string might contain a mixture of digits from
 different writing systems that look like they signify a number different
 than they actually do. L<Unicode::UCD/num()> can
 be used to safely
 calculate the value, returning C<undef> if the input string contains
-such a mixture.
+such a mixture. Otherwise, for example, a displayed price might be
+deliberately different than it appears.
 
 What C<\p{Digit}> means (and hence C<\d> except under the C</a>
 modifier) is C<\p{General_Category=Decimal_Number}>, or synonymously,
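The hazard described in the patched paragraph can be sketched in a few lines of Perl. This is only an illustration, not part of the commit: the helper name check_digits() and the sample strings are hypothetical, and the sketch assumes a Perl recent enough to support the C</a> modifier and to ship Unicode::UCD with num().

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# check_digits() is a hypothetical helper written for this example;
# it is not part of Perl or of the commit shown above.
sub check_digits {
    my ($s) = @_;
    return {
        unicode_d => ($s =~ /\A\d+\z/  ? 1 : 0),  # \d under Unicode rules
        ascii_d   => ($s =~ /\A\d+\z/a ? 1 : 0),  # /a restricts \d to [0-9]
        value     => num($s),                     # undef if digits are mixed
    };
}

# Hypothetical inputs: plain ASCII "58", and ASCII DIGIT FIVE followed by
# BENGALI DIGIT FOUR (U+09EA), which can look like "58" when displayed.
my @samples = ("58", "5\x{09EA}");

for my $s (@samples) {
    my $r = check_digits($s);
    # Print code points rather than the raw characters to avoid
    # wide-character warnings on output.
    my $cps = join ' ', map { sprintf 'U+%04X', ord } split //, $s;
    printf "%-16s \\d+ matches: %d  \\d+ with /a matches: %d  num(): %s\n",
        $cps, $r->{unicode_d}, $r->{ascii_d},
        defined $r->{value} ? $r->{value} : 'undef';
}
```

Under these assumptions, both strings match a plain C<\d+>, but only the all-ASCII one matches with C</a>, and num() returns C<undef> for the mixed string instead of a misleading value.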