author | Karl Williamson <public@khwilliamson.com> | 2010-12-19 11:37:06 -0700 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2010-12-19 20:22:25 -0700 |
commit | 85c006b64da3a6adb26786871a367c7b75119d2e (patch) | |
tree | e3f951e6f20296da466f2db10a71848b8f101949 | |
parent | ff97e5cf7f9d89732c45b74ff5abc53519433776 (diff) | |
download | perl-85c006b64da3a6adb26786871a367c7b75119d2e.tar.gz |
perltodo: Revise utf8 todo
-rw-r--r-- | pod/perltodo.pod | 11 |
1 files changed, 7 insertions, 4 deletions
diff --git a/pod/perltodo.pod b/pod/perltodo.pod
index 4eda9920ce..3bd0c06a4e 100644
--- a/pod/perltodo.pod
+++ b/pod/perltodo.pod
@@ -966,10 +966,13 @@ years for this discrepancy.
 
 =head2 UTF-8 revamp
 
-The handling of Unicode is unclean in many places. For example, the regexp
-engine matches in Unicode semantics whenever the string or the pattern is
-flagged as UTF-8, but that should not be dependent on an internal storage
-detail of the string.
+The handling of Unicode is unclean in many places. In the regex engine
+there are especially many problems. The swash data structure could be
+replaced my something better. Inversion lists and maps are likely
+candidates. The whole Unicode database could be placed in-core for a
+huge speed-up. Only minimal work was done on the optimizer when utf8
+was added, with the result that the synthetic start class often will
+fail to narrow down the possible choices when given non-Latin1 input.
 
 =head2 Properly Unicode safe tokeniser and pads.
 
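Note on the revised text: it names inversion lists as a likely replacement for swashes. As a rough illustration of the idea only, here is a minimal sketch in C. The names invlist_upper_bound, invlist_contains and ascii_alpha are hypothetical and are not taken from the Perl source; the layout (a sorted array of code points where even-indexed elements start an "in the set" range and odd-indexed elements start an "out of the set" range) is the usual textbook form and is assumed here, not confirmed by the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not Perl's internal API.
 * An inversion list is a sorted array of code points. Element 0 starts the
 * first range that is IN the set, element 1 starts the following range that
 * is OUT of the set, and so on, alternating. */

/* Return how many elements of list[0..len) are <= cp (binary search). */
size_t invlist_upper_bound(const uint32_t *list, size_t len, uint32_t cp)
{
    size_t lo = 0, hi = len;

    assert(list != NULL || len == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;   /* boundary at mid is at or below cp */
        else
            hi = mid;       /* boundary at mid is above cp */
    }
    return lo;
}

/* cp is in the set exactly when an odd number of boundaries are <= cp. */
bool invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    return (invlist_upper_bound(list, len, cp) & 1) == 1;
}

/* Example set: the ASCII letters [A-Za-z], stored as four boundaries. */
const uint32_t ascii_alpha[] = { 0x41, 0x5B, 0x61, 0x7B };
```

With this layout a membership test costs one binary search over the boundaries, and complementing a set only requires adding or removing a leading 0 boundary, which is part of why the structure is attractive for character classes. Whether these properties carry over to the regex engine as the todo item hopes is not something this sketch demonstrates.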