author | Karl Williamson <khw@cpan.org> | 2019-02-13 10:02:13 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2019-02-13 10:09:31 -0700 |
commit | 7835a09a181366ad4d4188409a4c0e3a6236fcf5 (patch) | |
tree | 82c53e73d6b847c90bf102813668590e37fe1b9c /pod/perlrecharclass.pod | |
parent | 4f5c9941bb6f93a967e4cc3ef19c9d39351f0ad3 (diff) | |
download | perl-7835a09a181366ad4d4188409a4c0e3a6236fcf5.tar.gz |
perlrecharclass: Note many fewer xdigits than digits
This adds a note explaining why there are only two sets of hex digits.
Diffstat (limited to 'pod/perlrecharclass.pod')
-rw-r--r-- | pod/perlrecharclass.pod | 14 |
1 files changed, 12 insertions, 2 deletions
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 4e2857cddb..e07638844b 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -800,7 +800,7 @@ Perl recognizes the following POSIX character classes:
  ("\cK").
  upper Any uppercase character (e.g., [A-Z]).
  word A Perl extension (e.g., [A-Za-z0-9_]), equivalent to "\w".
- xdigit Any hexadecimal digit (e.g., [0-9a-fA-F]).
+ xdigit Any hexadecimal digit (e.g., [0-9a-fA-F]). Note [7].

 Like the L<Unicode properties|/Unicode Properties>, most of the POSIX
 properties match the same regardless of whether case-insensitive (C</i>)
@@ -841,7 +841,7 @@ equivalent.
  space \p{PosixSpace} \p{XPosixSpace} [6]
  upper \p{PosixUpper} \p{XPosixUpper}
  word \p{PosixWord} \p{XPosixWord} \w
- xdigit \p{PosixXDigit} \p{XPosixXDigit}
+ xdigit \p{PosixXDigit} \p{XPosixXDigit} [7]

 =over 4

@@ -896,6 +896,16 @@ v5.18. In earlier versions, these differ only in that in non-locale
 matching, C<\p{XPerlSpace}> did not match the vertical tab, C<\cK>.
 Same for the two ASCII-only range forms.

+=item [7]
+
+Unlike C<[[:digit:]]> which matches digits in many writing systems, such
+as Thai and Devanagari, there are currently only two sets of hexadecimal
+digits, and it is unlikely that more will be added. This is because you
+not only need the ten digits, but also the six C<[A-F]> (and C<[a-f]>)
+to correspond. That means only the Latin script is suitable for these,
+and Unicode has only two sets of these, the familiar ASCII set, and the
+fullwidth forms starting at U+FF10 (FULLWIDTH DIGIT ZERO).
+
 =back

 There are various other synonyms that can be used besides the names
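The behavior described in note [7] can be checked with a small script. This is a minimal sketch and not part of the commit: the helper names `is_digit`, `is_xdigit`, and `is_ascii_xdigit` and the sample characters are chosen here only for illustration. It assumes the sample characters are matched under Unicode rules, which Perl applies to code points above 255.

```perl
use strict;
use warnings;

# Hypothetical helpers for this illustration; each tests one character.
sub is_digit        { return $_[0] =~ /\A[[:digit:]]\z/    ? 1 : 0 }
sub is_xdigit       { return $_[0] =~ /\A[[:xdigit:]]\z/   ? 1 : 0 }
sub is_ascii_xdigit { return $_[0] =~ /\A\p{PosixXDigit}\z/ ? 1 : 0 }

# Sample characters, written as escapes so the script itself stays ASCII.
my @samples = (
    [ 'DIGIT SEVEN (ASCII)',              "7"        ],
    [ 'LATIN SMALL LETTER F (ASCII)',     "f"        ],
    [ 'THAI DIGIT FIVE',                  "\x{0E55}" ],
    [ 'FULLWIDTH DIGIT ZERO',             "\x{FF10}" ],
    [ 'FULLWIDTH LATIN CAPITAL LETTER A', "\x{FF21}" ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "%-34s digit=%d xdigit=%d ascii_xdigit=%d\n",
        $name, is_digit($char), is_xdigit($char), is_ascii_xdigit($char);
}
```

The Thai digit should match `[[:digit:]]` but not `[[:xdigit:]]`, while the two fullwidth characters should match `[[:xdigit:]]` but not the ASCII-only `\p{PosixXDigit}`.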