author: Nick Ing-Simmons <nik@tiuk.ti.com> 2001-04-05 21:32:26 +0000
committer: Nick Ing-Simmons <nik@tiuk.ti.com> 2001-04-05 21:32:26 +0000
commit: 0a1f2d144e4463451f8627bd1c6ca420a59b01b0
tree: b1f6981a3fe5fa891326c4d23972ff64f451778c /pod/perlunicode.pod
parent: 62efc1596d65f50561044b28d65870870b167946
download: perl-0a1f2d144e4463451f8627bd1c6ca420a59b01b0.tar.gz
Change sense from "incomplete" to "implemented but needs more work" in perlunicode.pod
p4raw-id: //depot/perlio@9569
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- pod/perlunicode.pod | 36
1 files changed, 24 insertions, 12 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 30a4482260..bb3ce2b87d 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -4,28 +4,40 @@ perlunicode - Unicode support in Perl
=head1 DESCRIPTION
-=head2 Important Caveat
+=head2 Important Caveats
-WARNING: The implementation of Unicode support in Perl is incomplete.
+WARNING: While the implementation of Unicode support in Perl is now fairly
+complete, it is still evolving to some extent.
-The following areas need further work.
+In particular the way Unicode is handled on EBCDIC platforms is still rather
+experimental. On such a platform references to UTF-8 encoding in this
+document and elsewhere should be read as meaning UTF-EBCDIC as specified
+in Unicode Technical Report 16 unless ASCII vs EBCDIC issues are specifically
+discussed. There is no C<utfebcdic> pragma or ":utfebcdic" layer; rather,
+"utf8" and ":utf8" are re-used to mean the platform's "natural" 8-bit encoding
+of Unicode. See L<perlebcdic> for more discussion of the issues.
+
+The following areas are still under development.
=over 4
=item Input and Output Disciplines
-There is currently no easy way to mark data read from a file or other
-external source as being utf8. This will be one of the major areas of
-focus in the near future.
+A filehandle can be marked as containing perl's internal Unicode encoding
+(UTF-8 or UTF-EBCDIC) by opening it with the ":utf8" layer.
+Other encodings can be converted to perl's encoding on input, or from
+perl's encoding on output by use of the ":encoding()" layer.
+There is not yet a clean way to mark the perl source itself as being
+in a particular encoding.
=item Regular Expressions
-The existing regular expression compiler does not produce polymorphic
-opcodes. This means that the determination on whether to match Unicode
-characters is made when the pattern is compiled, based on whether the
-pattern contains Unicode characters, and not when the matching happens
-at run time. This needs to be changed to adaptively match Unicode if
-the string to be matched is Unicode.
+The regular expression compiler does now attempt to produce polymorphic
+opcodes. That is, the pattern should now adapt to the data and
+automatically switch to the Unicode character scheme when presented with Unicode data,
+or a traditional byte scheme when presented with byte data.
+The implementation is still new and (particularly on EBCDIC platforms) may
+need further work.
=item C<use utf8> still needed to enable a few features