perlrecharclass: Note many fewer xdigits than digits

This adds a note explaining why there are only two sets of hex digits.
author: Karl Williamson <khw@cpan.org> 2019-02-13 10:02:13 -0700
committer: Karl Williamson <khw@cpan.org> 2019-02-13 10:09:31 -0700
commit: 7835a09a181366ad4d4188409a4c0e3a6236fcf5 (patch)
tree: 82c53e73d6b847c90bf102813668590e37fe1b9c /pod/perlrecharclass.pod
parent: 4f5c9941bb6f93a967e4cc3ef19c9d39351f0ad3 (diff)
download: perl-7835a09a181366ad4d4188409a4c0e3a6236fcf5.tar.gz
1 files changed, 12 insertions, 2 deletions
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 4e2857cddb..e07638844b 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -800,7 +800,7 @@ Perl recognizes the following POSIX character classes:
         ("\cK").
  upper  Any uppercase character (e.g., [A-Z]).
  word   A Perl extension (e.g., [A-Za-z0-9_]), equivalent to "\w".
- xdigit Any hexadecimal digit (e.g., [0-9a-fA-F]).
+ xdigit Any hexadecimal digit (e.g., [0-9a-fA-F]).  Note [7].
 
 Like the L<Unicode properties|/Unicode Properties>, most of the POSIX
 properties match the same regardless of whether case-insensitive (C</i>)
@@ -841,7 +841,7 @@ equivalent.
    space      \p{PosixSpace}       \p{XPosixSpace}          [6]
    upper      \p{PosixUpper}       \p{XPosixUpper}
    word       \p{PosixWord}        \p{XPosixWord}   \w
-   xdigit     \p{PosixXDigit}      \p{XPosixXDigit}
+   xdigit     \p{PosixXDigit}      \p{XPosixXDigit}         [7]
 
 =over 4
 
@@ -896,6 +896,16 @@ v5.18.  In earlier versions, these differ only in that in non-locale
 matching, C<\p{XPerlSpace}> did not match the vertical tab, C<\cK>.
 Same for the two ASCII-only range forms.
 
+=item [7]
+
+Unlike C<[[:digit:]]> which matches digits in many writing systems, such
+as Thai and Devanagari, there are currently only two sets of hexadecimal
+digits, and it is unlikely that more will be added.  This is because you
+not only need the ten digits, but also the six C<[A-F]> (and C<[a-f]>)
+to correspond.  That means only the Latin script is suitable for these,
+and Unicode has only two sets of these, the familiar ASCII set, and the
+fullwidth forms starting at U+FF10 (FULLWIDTH DIGIT ZERO).
+
 =back
 
 There are various other synonyms that can be used besides the names
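
The behavior described in note [7] can be checked with a short script. This is a minimal sketch rather than part of the patch: the helper name `matches` and the choice of sample characters are assumptions made for illustration. It tests single characters against C<[[:digit:]]> and C<[[:xdigit:]]>, using `\x{...}` escapes so that the strings containing code points above 255 are matched under Unicode rules.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): returns 1 if the string
# matches the compiled pattern, else 0.
sub matches {
    my ($char, $re) = @_;
    return $char =~ $re ? 1 : 0;
}

# Anchored so that each pattern must match exactly one character.
my $digit  = qr/\A[[:digit:]]\z/;
my $xdigit = qr/\A[[:xdigit:]]\z/;

# Sample characters chosen for illustration: [ Unicode name, character ].
my @samples = (
    [ 'DIGIT SEVEN (ASCII)',              '7'        ],
    [ 'LATIN SMALL LETTER F (ASCII)',     'f'        ],
    [ 'THAI DIGIT ONE',                   "\x{0E51}" ],
    [ 'DEVANAGARI DIGIT TWO',             "\x{0968}" ],
    [ 'FULLWIDTH DIGIT ZERO',             "\x{FF10}" ],
    [ 'FULLWIDTH LATIN CAPITAL LETTER A', "\x{FF21}" ],
    [ 'FULLWIDTH LATIN CAPITAL LETTER G', "\x{FF27}" ],
);

# Print only the names, so no wide characters are written to STDOUT.
for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "%-34s digit=%d xdigit=%d\n",
        $name, matches($char, $digit), matches($char, $xdigit);
}
```

If the note holds, the Thai and Devanagari digits should report `digit=1 xdigit=0`, while FULLWIDTH DIGIT ZERO and FULLWIDTH LATIN CAPITAL LETTER A should both report `xdigit=1`. FULLWIDTH LATIN CAPITAL LETTER G is included as a negative case that should match neither class.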