Make /[\N{}-\N{}]/ match Unicodely on EBCDIC

This makes [\N{U+06}-\N{U+09}] match U+06, U+07, U+08, and U+09 even on EBCDIC platforms, so that ranges can be written portably. On EBCDIC code page 1047, for example, the range matches the native code points 0x2E, 0x2F, 0x16, and 0x05. Thanks to Yaroslav Kuzmin for finding a bug in an earlier incarnation of this patch.
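
The behaviour described above can be sketched with a short script. This is only an illustration, not part of the patch: the helper name matching_chars and the sample characters are assumptions made for the example. The sample characters are themselves written with \N{U+...}, so the script should not depend on the platform's native code points.

```perl
use strict;
use warnings;
use charnames ':full';    # lets \N{NAME} resolve on older perls too

# The same range written two ways: by Unicode code point and by name.
my $by_code_point = qr/\A[\N{U+27}-\N{U+3F}]\z/;
my $by_name       = qr/\A[\N{APOSTROPHE}-\N{QUESTION MARK}]\z/;

# Hypothetical helper: return the characters from @chars that match $re.
sub matching_chars {
    my ($re, @chars) = @_;
    return join '', grep { $_ =~ $re } @chars;
}

# U+26 and U+40 lie just outside the range; the other three lie inside it.
my @samples = ("\N{U+26}", "\N{U+27}", "\N{U+30}", "\N{U+3F}", "\N{U+40}");

print matching_chars($by_code_point, @samples), "\n";
print matching_chars($by_name,       @samples), "\n";
```

Both lines of output should contain only the apostrophe, the digit zero, and the question mark, since the ampersand (U+26) and the at sign (U+40) fall outside the range.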
author: Karl Williamson <khw@cpan.org> 2014-11-24 13:19:21 -0700
committer: Karl Williamson <khw@cpan.org> 2014-11-24 13:43:07 -0700
commit: c7d255944c0b238f9cec18e728822535d42a9ed2 (patch)
tree: 4ac5dfc5e6cbd25c3a26fad3f166b37ab639acca /pod/perlrecharclass.pod
parent: 22e7ef05c1f7a7fcd58d10d6e720579b9bbea728 (diff)
download: perl-c7d255944c0b238f9cec18e728822535d42a9ed2.tar.gz
1 files changed, 15 insertions, 3 deletions
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index c79c9a0399..fb5868d521 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -608,10 +608,22 @@ Examples:
              #  hyphen ('-'), or the letter 'm'.
  ['-?]       #  Matches any of the characters  '()*+,-./0123456789:;<=>?
              #  (But not on an EBCDIC platform).
-
-Perl guarantees that the ranges C<A-Z>, C<a-z>, C<0-9>, and any
+ [\N{APOSTROPHE}-\N{QUESTION MARK}]
+             #  Matches any of the characters  '()*+,-./0123456789:;<=>?
+             #  even on an EBCDIC platform.
+ [\N{U+27}-\N{U+3F}] # Same. (U+27 is "'", and U+3F is "?")
+
+As the final two examples above show, you can achieve portability to
+non-ASCII platforms by using the C<\N{...}> form for the range
+endpoints.  These indicate that the specified range is to be interpreted
+using Unicode values, so C<[\N{U+27}-\N{U+3F}]> means to match
+C<\N{U+27}>, C<\N{U+28}>, C<\N{U+29}>, ..., C<\N{U+3D}>, C<\N{U+3E}>,
+and C<\N{U+3F}>, whatever the native code point versions for those are.
+
+Perl also guarantees that the ranges C<A-Z>, C<a-z>, C<0-9>, and any
 subranges of these match what an English-only speaker would expect them
-to match.  That is, C<[A-Z]> matches the 26 ASCII uppercase letters;
+to match on any platform.  That is, C<[A-Z]> matches the 26 ASCII
+uppercase letters;
 C<[a-z]> matches the 26 lowercase letters; and C<[0-9]> matches the 10
 digits.  Subranges, like C<[h-k]>, match correspondingly, in this case
 just the four letters C<"h">, C<"i">, C<"j">, and C<"k">.  This is the