author: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2007-12-18 09:51:39 +0000
committer: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2007-12-18 09:51:39 +0000
commit: a3d15f9a2bf22c599dfee4c8fb750856644c6d1f
tree: 618bbc3d2b62e25d629446f0575644e5a6118db4 /pod/perltodo.pod
parent: 4e4a88873a7e660d02017ef31d574516d5d3deb6
Notes on 5.12 Unicode revamping planned.
Complete the "reporting bug" section of perldelta.

p4raw-id: //depot/perl@32636
Diffstat (limited to 'pod/perltodo.pod')
-rw-r--r-- pod/perltodo.pod 24
1 files changed, 16 insertions, 8 deletions
diff --git a/pod/perltodo.pod b/pod/perltodo.pod
index 0c85ceb1ce..d869b67c87 100644
--- a/pod/perltodo.pod
+++ b/pod/perltodo.pod
@@ -667,6 +667,22 @@ also the warning messages (see L<perllexwarn>, C<warnings.pl>).
 These tasks would need C knowledge, and knowledge of how the interpreter works,
 or a willingness to learn.
+=head2 UTF-8 revamp
+
+The handling of Unicode is unclean in many places. For example, the regexp
+engine matches in Unicode semantics whenever the string or the pattern is
+flagged as UTF-8, but that should not be dependent on an internal storage
+detail of the string. Likewise, case folding behaviour is dependent on the
+UTF8 internal flag being on or off.
+
+=head2 Properly Unicode safe tokeniser and pads.
+
+The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
+variable names are stored in stashes as raw bytes, without the utf-8 flag
+set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
+tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
+source filters. All this could be fixed.
+
 =head2 state variable initialization in list context
 Currently this is illegal:
@@ -776,14 +792,6 @@ reinstated.
 The old perltodo notes "Look at the "reification" code in C<av.c>".
-=head2 Properly Unicode safe tokeniser and pads.
-
-The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
-variable names are stored in stashes as raw bytes, without the utf-8 flag
-set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
-tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
-source filters. All this could be fixed.
-
 =head2 The yada yada yada operators
 Perl 6's Synopsis 3 says: