author | Jarkko Hietaniemi <jhi@iki.fi> | 2003-05-11 06:25:08 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2003-05-11 06:25:08 +0000 |
commit | dc0a4417816de4de16d412283906a2a3c2bbce9b (patch) | |
tree | e772fcf319a016f738aa97d7b239b30907e9f35f /lib/Unicode | |
parent | 82c0b05bfa6af07a8eed09c3accfe3016c6e65bc (diff) | |
download | perl-dc0a4417816de4de16d412283906a2a3c2bbce9b.tar.gz |
Clarify the doc (and the code) for Unicode code points.
p4raw-id: //depot/perl@19481
Diffstat (limited to 'lib/Unicode')
-rw-r--r-- | lib/Unicode/UCD.pm | 13 |
1 files changed, 8 insertions, 5 deletions
diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index dcb478a54d..ec1c9989df 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -126,9 +126,9 @@ you will need also the compexcl(), casefold(), and casespec() functions.
 sub _getcode {
     my $arg = shift;
 
-    if ($arg =~ /^\d+$/) {
+    if ($arg =~ /^[1-9]\d*$/) {
         return $arg;
-    } elsif ($arg =~ /^(?:U\+|0x)?([[:xdigit:]]+)$/) {
+    } elsif ($arg =~ /^(?:[Uu]\+|0[xX])?([[:xdigit:]]+)$/) {
         return hex($1);
     }
 
@@ -457,9 +457,12 @@ any of the 256 code points in the Tibetan block).
 
 A I<code point argument> is either a decimal or a hexadecimal scalar
 designating a Unicode character, or C<U+> followed by hexadecimals
-designating a Unicode character.  Note that Unicode is B<not> limited
-to 16 bits (the number of Unicode characters is open-ended, in theory
-unlimited): you may have more than 4 hexdigits.
+designating a Unicode character.  In other words, if you want a code
+point to be interpreted as a hexadecimal number, you must prefix it
+with either C<0x> or C<U+>, becauseq a string like e.g. C<123> will
+be interpreted as a decimal code point.  Also note that Unicode is
+B<not> limited to 16 bits (the number of Unicode characters is
+open-ended, in theory unlimited): you may have more than 4 hexdigits.
 
 =head2 charinrange
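
To make the effect of the two regex changes concrete, here is a minimal standalone sketch of the patched argument parsing. It copies the two patterns from the `+` lines of the diff. The name `getcode_sketch` is hypothetical (the real helper is the private `_getcode`), and the trailing bare `return` for unrecognized input is an assumption, since the hunk does not show what follows the `if`/`elsif`.

```perl
use strict;
use warnings;

# Hypothetical standalone copy of the patched logic; not part of the
# Unicode::UCD public API.
sub getcode_sketch {
    my $arg = shift;

    if ($arg =~ /^[1-9]\d*$/) {
        # Digits only, no leading zero: taken as a decimal code point.
        return $arg;
    } elsif ($arg =~ /^(?:[Uu]\+|0[xX])?([[:xdigit:]]+)$/) {
        # Optional U+/u+/0x/0X prefix, then hex digits: taken as hexadecimal.
        return hex($1);
    }

    # Assumption: anything else is rejected with an empty return.
    return;
}

for my $input ('123', '0x123', 'U+123', 'u+263A', '0X41', '0123', 'ff', 'xyz') {
    my $code = getcode_sketch($input);
    printf "%-7s => %s\n", $input, defined $code ? $code : 'undef';
}
```

Two consequences follow from the patterns themselves. Lowercase `u+` and uppercase `0X` prefixes are now accepted. A digit string with a leading zero, such as `0123`, no longer matches the decimal branch and falls through to the hexadecimal one.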