author Karl Williamson <khw@cpan.org> 2019-02-13 10:02:13 -0700
committer Karl Williamson <khw@cpan.org> 2019-02-13 10:09:31 -0700
commit 7835a09a181366ad4d4188409a4c0e3a6236fcf5 (patch)
tree 82c53e73d6b847c90bf102813668590e37fe1b9c /pod/perlrecharclass.pod
parent 4f5c9941bb6f93a967e4cc3ef19c9d39351f0ad3 (diff)
perlrecharclass: Note many fewer xdigits than digits
This adds a note explaining why there are only two sets of hex digits.
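The claim in the new note [7] can be checked with a short script. The following is a minimal sketch, not part of the patch: the helper names is_xdigit, is_digit and count_bmp and the choice of sample code points are assumptions made for illustration. It relies on Perl applying Unicode rules to code points above 255, and it scans only the Basic Multilingual Plane, where all of the hexadecimal digits live.

```perl
use strict;
use warnings;

# True if the one-character string matches the POSIX class [[:xdigit:]].
sub is_xdigit { return $_[0] =~ /\A[[:xdigit:]]\z/ ? 1 : 0 }

# True if the one-character string matches the POSIX class [[:digit:]].
sub is_digit { return $_[0] =~ /\A[[:digit:]]\z/ ? 1 : 0 }

# Count Basic Multilingual Plane code points accepted by a predicate.
# Surrogates (U+D800..U+DFFF) are skipped because they are not characters.
sub count_bmp {
    my ($predicate) = @_;
    my $count = 0;
    for my $cp (0 .. 0xD7FF, 0xE000 .. 0xFFFF) {
        $count++ if $predicate->(chr $cp);
    }
    return $count;
}

# Sample code points (a hypothetical selection, chosen for illustration).
my @samples = (
    [ 'LATIN CAPITAL LETTER F'           => 0x0046 ],
    [ 'FULLWIDTH DIGIT ZERO'             => 0xFF10 ],
    [ 'FULLWIDTH LATIN CAPITAL LETTER A' => 0xFF21 ],
    [ 'THAI DIGIT ZERO'                  => 0x0E50 ],
    [ 'DEVANAGARI DIGIT ZERO'            => 0x0966 ],
);

# Print code points and names only, so no wide characters reach STDOUT.
for my $sample (@samples) {
    my ($name, $cp) = @$sample;
    printf "U+%04X %-34s digit=%d xdigit=%d\n",
        $cp, $name, is_digit(chr $cp), is_xdigit(chr $cp);
}

printf "xdigit code points in the BMP: %d\n", count_bmp(\&is_xdigit);
printf "digit code points in the BMP:  %d\n", count_bmp(\&is_digit);
```

The Thai and Devanagari zeros should report as digits but not as hexadecimal digits, while the fullwidth forms report as hexadecimal digits. The digit total depends on the Unicode version bundled with the Perl in use, so only the contrast between the two totals is meaningful.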
Diffstat (limited to 'pod/perlrecharclass.pod')
-rw-r--r-- pod/perlrecharclass.pod 14
1 files changed, 12 insertions, 2 deletions
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 4e2857cddb..e07638844b 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -800,7 +800,7 @@ Perl recognizes the following POSIX character classes:
("\cK").
upper Any uppercase character (e.g., [A-Z]).
word A Perl extension (e.g., [A-Za-z0-9_]), equivalent to "\w".
- xdigit Any hexadecimal digit (e.g., [0-9a-fA-F]).
+ xdigit Any hexadecimal digit (e.g., [0-9a-fA-F]). Note [7].
Like the L<Unicode properties|/Unicode Properties>, most of the POSIX
properties match the same regardless of whether case-insensitive (C</i>)
@@ -841,7 +841,7 @@ equivalent.
space \p{PosixSpace} \p{XPosixSpace} [6]
upper \p{PosixUpper} \p{XPosixUpper}
word \p{PosixWord} \p{XPosixWord} \w
- xdigit \p{PosixXDigit} \p{XPosixXDigit}
+ xdigit \p{PosixXDigit} \p{XPosixXDigit} [7]
=over 4
@@ -896,6 +896,16 @@ v5.18. In earlier versions, these differ only in that in non-locale
matching, C<\p{XPerlSpace}> did not match the vertical tab, C<\cK>.
Same for the two ASCII-only range forms.
+=item [7]
+
+Unlike C<[[:digit:]]> which matches digits in many writing systems, such
+as Thai and Devanagari, there are currently only two sets of hexadecimal
+digits, and it is unlikely that more will be added. This is because you
+not only need the ten digits, but also the six C<[A-F]> (and C<[a-f]>)
+to correspond. That means only the Latin script is suitable for these,
+and Unicode has only two sets of these, the familiar ASCII set, and the
+fullwidth forms starting at U+FF10 (FULLWIDTH DIGIT ZERO).
+
=back
There are various other synonyms that can be used besides the names