perltodo: Revise utf8 todo

author: Karl Williamson <public@khwilliamson.com> 2010-12-19 11:37:06 -0700
committer: Karl Williamson <public@khwilliamson.com> 2010-12-19 20:22:25 -0700
commit: 85c006b64da3a6adb26786871a367c7b75119d2e (patch)
tree: e3f951e6f20296da466f2db10a71848b8f101949
parent: ff97e5cf7f9d89732c45b74ff5abc53519433776 (diff)
download: perl-85c006b64da3a6adb26786871a367c7b75119d2e.tar.gz
1 files changed, 7 insertions, 4 deletions
diff --git a/pod/perltodo.pod b/pod/perltodo.pod
index 4eda9920ce..3bd0c06a4e 100644
--- a/pod/perltodo.pod
+++ b/pod/perltodo.pod
@@ -966,10 +966,13 @@ years for this discrepancy.
 
 =head2 UTF-8 revamp
 
-The handling of Unicode is unclean in many places. For example, the regexp
-engine matches in Unicode semantics whenever the string or the pattern is
-flagged as UTF-8, but that should not be dependent on an internal storage
-detail of the string.
+The handling of Unicode is unclean in many places.  In the regex engine
+there are especially many problems.  The swash data structure could be
+replaced by something better.  Inversion lists and maps are likely
+candidates.  The whole Unicode database could be placed in-core for a
+huge speed-up.  Only minimal work was done on the optimizer when utf8
+was added, with the result that the synthetic start class often will
+fail to narrow down the possible choices when given non-Latin1 input.
 
 =head2 Properly Unicode safe tokeniser and pads.
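
Note on the inversion lists mentioned in the revised text: the general technique stores a set of code points as a sorted list of the positions where membership flips, so a lookup is a single binary search. The following is a minimal sketch of that idea in Python, not Perl's actual implementation. The function names (make_inversion_list, contains, invert) are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

def make_inversion_list(ranges):
    """Build an inversion list from inclusive (start, end) code point ranges.

    The result is a sorted list in which even-indexed entries begin a run
    of code points that are in the set and odd-indexed entries begin a
    run of code points that are not.
    """
    inv = []
    for start, end in sorted(ranges):
        if inv and start <= inv[-1]:
            # Overlaps or abuts the previous range: extend that range.
            inv[-1] = max(inv[-1], end + 1)
        else:
            inv.extend([start, end + 1])
    return inv

def contains(inv, cp):
    """Return True if code point cp is in the set described by inv."""
    # bisect_right counts the entries <= cp; an odd count means cp lies
    # inside an "in the set" run.
    return bisect_right(inv, cp) % 2 == 1

def invert(inv):
    """Return the complement set, treating code points as starting at 0."""
    # Complementing only toggles whether the list starts at 0.
    if inv and inv[0] == 0:
        return inv[1:]
    return [0] + inv
```

For example, make_inversion_list([(0x30, 0x39), (0x660, 0x669)]) describes the ASCII and Arabic-Indic digits with four integers, and complementing the set only adds or removes a leading 0. That compactness is one reason such a structure is a plausible replacement for a hash-based table.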