author     Karl Williamson <public@khwilliamson.com>  2010-12-19 11:37:06 -0700
committer  Karl Williamson <public@khwilliamson.com>  2010-12-19 20:22:25 -0700
commit     85c006b64da3a6adb26786871a367c7b75119d2e
tree       e3f951e6f20296da466f2db10a71848b8f101949
parent     ff97e5cf7f9d89732c45b74ff5abc53519433776
perltodo: Revise utf8 todo
-rw-r--r-- pod/perltodo.pod 11
1 files changed, 7 insertions, 4 deletions
diff --git a/pod/perltodo.pod b/pod/perltodo.pod
index 4eda9920ce..3bd0c06a4e 100644
--- a/pod/perltodo.pod
+++ b/pod/perltodo.pod
@@ -966,10 +966,13 @@ years for this discrepancy.
=head2 UTF-8 revamp
-The handling of Unicode is unclean in many places. For example, the regexp
-engine matches in Unicode semantics whenever the string or the pattern is
-flagged as UTF-8, but that should not be dependent on an internal storage
-detail of the string.
+The handling of Unicode is unclean in many places. In the regex engine
+there are especially many problems. The swash data structure could be
+replaced my something better. Inversion lists and maps are likely
+candidates. The whole Unicode database could be placed in-core for a
+huge speed-up. Only minimal work was done on the optimizer when utf8
+was added, with the result that the synthetic start class often will
+fail to narrow down the possible choices when given non-Latin1 input.
=head2 Properly Unicode safe tokeniser and pads.