author | Juerd Waalboer <#####@juerd.nl> | 2007-03-04 17:00:19 +0100 |
---|---|---|
committer | H.Merijn Brand <h.m.brand@xs4all.nl> | 2007-03-07 13:23:23 +0000 |
commit | 2575c402a8f9be55f848bdfb219afbf912c50ac1 (patch) | |
tree | c21a19c42deaa2dba098c38d74338a7c01328c28 /pod/perluniintro.pod | |
parent | 2a6a970fa1b36c99c83fd3fdd48253c1b567db9b (diff) | |
Re: [PATCH] (Re: [PATCH] unicode/utf8 pod)
Message-ID: <20070304150019.GN4723@c4.convolution.nl>
p4raw-id: //depot/perl@30493
Diffstat (limited to 'pod/perluniintro.pod')
-rw-r--r-- | pod/perluniintro.pod | 22 |
1 files changed, 4 insertions, 18 deletions
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index b0d5859065..9337e5f919 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -278,21 +278,7 @@ encodings, I/O, and certain special cases:
 
 When you combine legacy data and Unicode the legacy data needs
 to be upgraded to Unicode. Normally ISO 8859-1 (or EBCDIC, if
-applicable) is assumed. You can override this assumption by
-using the C<encoding> pragma, for example
-
-    use encoding 'latin2'; # ISO 8859-2
-
-in which case literals (string or regular expressions), C<chr()>,
-and C<ord()> in your whole script are assumed to produce Unicode
-characters from ISO 8859-2 code points. Note that the matching for
-encoding names is forgiving: instead of C<latin2> you could have
-said C<Latin 2>, or C<iso8859-2>, or other variations. With just
-
-    use encoding;
-
-the environment variable C<PERL_ENCODING> will be consulted.
-If that variable isn't set, the encoding pragma will fail.
+applicable) is assumed.
 
 The C<Encode> module knows about many encodings and has interfaces
 for doing conversions between those encodings:
@@ -404,8 +390,8 @@ the file "text.utf8", encoded as UTF-8:
     while (<$nihongo>) { print $unicode $_ }
 
 The naming of encodings, both by the C<open()> and by the C<open>
-pragma, is similar to the C<encoding> pragma in that it allows for
-flexible names: C<koi8-r> and C<KOI8R> will both be understood.
+pragma allows for flexible names: C<koi8-r> and C<KOI8R> will both be
+understood.
 
 Common encodings recognized by ISO, MIME, IANA, and various other
 standardisation organisations are recognised; for a more detailed
@@ -885,7 +871,7 @@ to UTF-8 bytes and back, the code works even with older Perl 5 versions.
 
 =head1 SEE ALSO
 
-L<perlunicode>, L<Encode>, L<encoding>, L<open>, L<utf8>, L<bytes>,
+L<perlunitut>, L<perlunicode>, L<Encode>, L<open>, L<utf8>, L<bytes>,
 L<perlretut>, L<perlrun>, L<Unicode::Collate>, L<Unicode::Normalize>,
 L<Unicode::UCD>