-rw-r--r--  pod/perlre.pod       15
-rw-r--r--  pod/perlretut.pod    11
-rw-r--r--  pod/perlunicode.pod   4
3 files changed, 16 insertions, 14 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 6c687495cb..5c7e76b5ad 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -184,7 +184,9 @@ In addition, Perl defines the following:
     \PP Match non-P
     \X  Match eXtended Unicode "combining character sequence",
         equivalent to C<(?:\PM\pM*)>
-    \C  Match a single C char (octet) even under utf8.
+    \C  Match a single C char (octet) even under Unicode.
+        B<NOTE:> breaks up characters into their UTF-8 bytes,
+        so you may end up with malformed pieces of UTF-8.
 
 A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
 Use C<\w+> to match a string of Perl-identifier characters (which isn't
@@ -193,7 +195,7 @@ list of alphabetic characters generated by C<\w> is taken from the
 current locale.  See L<perllocale>.  You may use C<\w>, C<\W>, C<\s>, C<\S>,
 C<\d>, and C<\D> within character classes, but if you try to use them
 as endpoints of a range, that's not a range, the "-" is understood literally.
-See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.
+See L<perlunicode> for details about C<\pP>, C<\PP>, and C<\X>.
 
 The POSIX character class syntax
 
@@ -230,9 +232,10 @@ whole character class.  For example:
 matches zero, one, any alphabetic character, and the percentage
 sign.
 
-If the C<utf8> pragma is used, the following equivalences to Unicode
-\p{} constructs and equivalent backslash character classes (if available),
-will hold:
+The following equivalences to Unicode \p{} constructs and equivalent
+backslash character classes (if available), will hold:
+
+    [:...:]     \p{...}         backslash
 
     alpha       IsAlpha
     alnum       IsAlnum
@@ -291,7 +294,7 @@ work just fine) it is included for completeness.
 You can negate the [::] character classes by prefixing the class name
 with a '^'. This is a Perl extension.  For example:
 
-    POSIX       trad. Perl  utf8 Perl
+    POSIX       traditional Unicode
 
     [:^digit:]      \D      \P{IsDigit}
     [:^space:]      \S      \P{IsSpace}
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index f4e9bb6440..bb2423b8af 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1653,12 +1653,11 @@ Unicode characters in the range of 128-255 use two hexadecimal digits
 with braces: C<\x{ab}>.  Note that this is different than C<\xab>,
 which is just a hexadecimal byte with no Unicode significance.
 
-B<NOTE>: in perl 5.6.0 it used to be that one needed to say C<use utf8>
-to use any Unicode features.  This is no more the case: for almost all
-Unicode processing, the explicit C<utf8> pragma is not needed.
-(The only case where it matters is if your Perl script is in Unicode,
-that is, encoded in UTF-8/UTF-16/UTF-EBCDIC: then an explicit C<use utf8>
-is needed.)
+B<NOTE>: in Perl 5.6.0 it used to be that one needed to say C<use
+utf8> to use any Unicode features.  This is no more the case: for
+almost all Unicode processing, the explicit C<utf8> pragma is not
+needed.  (The only case where it matters is if your Perl script is in
+Unicode and encoded in UTF-8, then an explicit C<use utf8> is needed.)
 
 Figuring out the hexadecimal sequence of a Unicode character you want
 or deciphering someone else's hexadecimal Unicode regexp is about as
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 2fca71454a..64116bcae1 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -782,7 +782,7 @@ for more discussion of the issues.
 
 =head1 SEE ALSO
 
-L<encoding>, L<Encode>, L<open>, L<bytes>, L<utf8>, L<perlretut>,
-L<perlvar/"${^WIDE_SYSTEM_CALLS}">
+L<perluniintro>, L<encoding>, L<Encode>, L<open>, L<utf8>, L<bytes>,
+L<perlretut>, L<perlvar/"${^WIDE_SYSTEM_CALLS}">
 
 =cut
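
The new B<NOTE:> on C<\C> in the perlre.pod hunk can be illustrated with a small sketch. It is not part of the patch, and it deliberately avoids C<\C> itself (which later Perl releases no longer accept); instead it assumes the core C<utf8::encode> and C<utf8::decode> functions to show what "malformed pieces of UTF-8" means when one character is split into its octets. The variable names are illustrative only.

```perl
use strict;
use warnings;

# U+00E9 (LATIN SMALL LETTER E WITH ACUTE), written as \x{...} so the
# script stays pure ASCII and needs no "use utf8".
my $char = "\x{e9}";

# Encode a copy to UTF-8; utf8::encode works in place and leaves octets.
my $octets = $char;
utf8::encode($octets);

# One character becomes two octets.
my @pieces = map { sprintf "%02X", ord($_) } split //, $octets;
print "octets: @pieces\n";

# Try to decode each octet on its own. utf8::decode returns false when
# the input is not well-formed UTF-8, which is the situation the note
# warns about when a pattern matches octet by octet.
my @status;
for my $i (0 .. length($octets) - 1) {
    my $piece = substr $octets, $i, 1;
    push @status, utf8::decode($piece) ? "valid" : "malformed";
    print "octet $i alone: $status[-1]\n";
}
```

Neither octet is usable as a character by itself, so any code that captures a single octet of a multi-byte character would need to reassemble the full sequence before treating it as text.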