author Karl Williamson <khw@cpan.org> 2018-08-16 16:27:52 -0600
committer Karl Williamson <khw@cpan.org> 2018-08-16 16:54:39 -0600
commit 8350b2740abc0cad147113487148473a9e19034b
tree eec5b35258b4a08d72d703fb76bbf228bd2d4b70 /pod/perlre.pod
parent 7da8e27b9d7d2be4e770d074405ddb9941e6c8b7
perlre, perlrecharclass: Add examples
This adds more concrete cases of how mixed script digits can be hazardous.
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- pod/perlre.pod 15
1 files changed, 10 insertions, 5 deletions
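The hazard that this commit documents in perlre can be reproduced with a short script. What follows is a minimal sketch, not part of the commit. It assumes a perl new enough for the C</a> modifier and for C<Unicode::UCD>'s C<num()> (5.14 or later, to our understanding). The variable names (C<$price> and friends) are hypothetical. The sketch builds a "price" whose leading digit is LEPCHA DIGIT SIX (U+1C46), which can pass for an ASCII "5", and shows how C<\d+>, C</a>, and C<num()> each treat it. Script runs are left out because they were still experimental when this change was made.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Hypothetical displayed price: U+1C46 LEPCHA DIGIT SIX followed by two
# ASCII zeros.  On screen it can pass for "500", but it means 600.
my $price = "\x{1C46}00";

# Default (Unicode) rules: \d matches any decimal digit (\p{Nd}),
# whatever its script, so the mixed-script string is accepted.
my $unicode_match = $price =~ /\A\d+\z/ ? 1 : 0;

# Under /a, \d matches only the ASCII digits 0-9, so U+1C46 is rejected.
my $ascii_match = $price =~ /\A\d+\z/a ? 1 : 0;

# num() returns undef for a multi-character string unless every
# character is a decimal digit from the same script.
my $mixed_value = num($price);

# An all-Lepcha string (U+1C46 U+1C40 U+1C40) is a valid number: 600.
my $lepcha_value = num("\x{1C46}\x{1C40}\x{1C40}");

printf "\\d+ (Unicode rules): %s\n", $unicode_match ? "match" : "no match";
printf "\\d+ under /a:        %s\n", $ascii_match   ? "match" : "no match";
printf "num(mixed):          %s\n",
    defined $mixed_value ? $mixed_value : "undef";
printf "num(all Lepcha):     %s\n",
    defined $lepcha_value ? $lepcha_value : "undef";
```

Run as written, the sketch should report a match for the plain C<\d+>, no match under C</a>, C<undef> from C<num()> for the mixed string, and 600 for the all-Lepcha string. C</a> fits input that should only ever contain ASCII digits. C<num()> fits input that may legitimately use another script's digits but should not mix them.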
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 70c53f1536..ce557edf4d 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -649,11 +649,16 @@ possible matches. And some of those digits look like some of the 10
ASCII digits, but mean a different number, so a human could easily think
a number is a different quantity than it really is. For example,
C<BENGALI DIGIT FOUR> (U+09EA) looks very much like an
-C<ASCII DIGIT EIGHT> (U+0038). And, C<\d+>, may match strings of digits
-that are a mixture from different writing systems, creating a security
-issue. L<Unicode::UCD/num()> can be used to sort
-this out. Or the C</a> modifier can be used to force C<\d> to match
-just the ASCII 0 through 9.
+C<ASCII DIGIT EIGHT> (U+0038), and C<LEPCHA DIGIT SIX> (U+1C46) looks
+very much like an C<ASCII DIGIT FIVE> (U+0035). And, C<\d+>, may match
+strings of digits that are a mixture from different writing systems,
+creating a security issue. A fraudulent website, for example, could
+display the price of something using U+1C46, and it would appear to the
+user that something cost 500 units, but it really costs 600. A browser
+that enforced script runs (L</Script Runs>) would prevent that
+fraudulent display. L<Unicode::UCD/num()> can also be used to sort this
+out. Or the C</a> modifier can be used to force C<\d> to match just the
+ASCII 0 through 9.
Also, under this modifier, case-insensitive matching works on the full
set of Unicode