author | Nick Ing-Simmons <nik@tiuk.ti.com> | 2001-04-05 21:32:26 +0000 |
---|---|---|
committer | Nick Ing-Simmons <nik@tiuk.ti.com> | 2001-04-05 21:32:26 +0000 |
commit | 0a1f2d144e4463451f8627bd1c6ca420a59b01b0 (patch) | |
tree | b1f6981a3fe5fa891326c4d23972ff64f451778c /pod/perlunicode.pod | |
parent | 62efc1596d65f50561044b28d65870870b167946 (diff) | |
download | perl-0a1f2d144e4463451f8627bd1c6ca420a59b01b0.tar.gz |
Change sense from "incomplete" to "implemented but needs more work" in perlunicode.pod
p4raw-id: //depot/perlio@9569
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- | pod/perlunicode.pod | 36 |
1 files changed, 24 insertions, 12 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 30a4482260..bb3ce2b87d 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -4,28 +4,40 @@ perlunicode - Unicode support in Perl
 
 =head1 DESCRIPTION
 
-=head2 Important Caveat
+=head2 Important Caveats
 
-WARNING: The implementation of Unicode support in Perl is incomplete.
+WARNING: While the implementation of Unicode support in Perl is now fairly
+complete it is still evolving to some extent.
 
-The following areas need further work.
+In particular the way Unicode is handled on EBCDIC platforms is still rather
+experimental. On such a platform references to UTF-8 encoding in this
+document and elsewhere should be read as meaning UTF-EBCDIC as specified
+in Unicode Technical Report 16 unless ASCII vs EBCDIC issues are specifically
+discussed. There is no C<utfebcdic> pragma or ":utfebcdic" layer, rather
+"utf8" and ":utf8" are re-used to mean platform's "natural" 8-bit encoding
+of Unicode. See L<perlebcdic> for more discussion of the issues.
+
+The following areas are still under development.
 
 =over 4
 
 =item Input and Output Disciplines
 
-There is currently no easy way to mark data read from a file or other
-external source as being utf8. This will be one of the major areas of
-focus in the near future.
+A filehandle can be marked as containing perl's internal Unicode encoding
+(UTF-8 or UTF-EBCDIC) by opening it with the ":utf8" layer.
+Other encodings can be converted to perl's encoding on input, or from
+perl's encoding on output by use of the ":encoding()" layer.
+There is not yet a clean way to mark the perl source itself as being
+in an particular encoding.
 
 =item Regular Expressions
 
-The existing regular expression compiler does not produce polymorphic
-opcodes. This means that the determination on whether to match Unicode
-characters is made when the pattern is compiled, based on whether the
-pattern contains Unicode characters, and not when the matching happens
-at run time. This needs to be changed to adaptively match Unicode if
-the string to be matched is Unicode.
+The regular expression compiler does now attempt to produce polymorphic
+opcodes. That is the pattern should now adapt to the data and
+automaticaly switch to the Unicode character scheme when presented with Unicode data,
+or a traditional byte scheme when presented with byte data.
+The implementation is still new and (particularly on EBCDIC platforms) may
+need further work.
 
 =item C<use utf8> still needed to enable a few features
 