summaryrefslogtreecommitdiff
path: root/t/op/utf8cache.t
diff options
context:
space:
mode:
authorDavid Mitchell <davem@iabyn.com>2012-08-21 10:55:00 +0100
committerDavid Mitchell <davem@iabyn.com>2012-08-21 10:55:00 +0100
commit88621eaff54e5f5ea9adea13440d1750968643a6 (patch)
tree26e14afe1e690413f0e4eacb4e36c20cf9fbab8e /t/op/utf8cache.t
parent2c46c0863b19def8407894ed12c11c5bdfdf502a (diff)
downloadperl-88621eaff54e5f5ea9adea13440d1750968643a6.tar.gz
utf8 pos cache: always keep most recent value
UTF-8 strings may have magic attached that caches up to two byte position to char position (or vice versa) mappings. When a third position has been calculated (e.g. via sv_pos_b2u()), the code has to decide how to update the cache: i.e. which value to discard. Currently for each of the three possibilities, it looks at what would be the remaining two values, and calculates the RMS sum of the three distances between ^ ... cache A .. cache B ... $. Whichever permutation gives the lowest result is picked. Note that this means that the most recently calculated value may be discarded. This makes sense if the next position request will be for a random part of the string; however in reality, the next request is more likely to be for the same position, or one a bit further along. Consider the following innocuous code: $_ = "\x{100}" x 1_000_000; $p = pos while /./g; This goes quadratic, and takes 150s on my system. The fix is is to always keep the newest value, and use the RMS calculation only to decide which of the two older values to discard. With this fix, the above code takes 0.4s. The test suite takes the same time in both cases, so there's no obvious slowdown elsewhere with this change.
Diffstat (limited to 't/op/utf8cache.t')
-rw-r--r--t/op/utf8cache.t14
1 files changed, 13 insertions, 1 deletions
diff --git a/t/op/utf8cache.t b/t/op/utf8cache.t
index 7ac0011a79..83ad4e83ce 100644
--- a/t/op/utf8cache.t
+++ b/t/op/utf8cache.t
@@ -10,7 +10,7 @@ BEGIN {
use strict;
-plan(tests => 1);
+plan(tests => 2);
my $pid = open CHILD, '-|';
die "kablam: $!\n" unless defined $pid;
@@ -35,3 +35,15 @@ my $utf8magic = qr{ ^ \s+ MAGIC \s = .* \n
\s+ MG_LEN \s = .* \n }xm;
unlike($_, qr{ $utf8magic $utf8magic }x);
+
+# With bad caching, this code used to go quadratic and take 10s of minutes.
+# The 'test' in this case is simply that it doesn't hang.
+
+{
+ local ${^UTF8CACHE} = 1; # enable cache, disable debugging
+ my $x = "\x{100}" x 1000000;
+ while ($x =~ /./g) {
+ my $p = pos($x);
+ }
+ pass("quadratic pos");
+}