re_intuit_start(): avoid overshoot with utf8

RT #129012

re_intuit_start() is run before doing a "proper" regex match, either to quickly reject a match or to find the earliest position in a string where the match could occur. Part of its action is to search within the string for a known substring that forms part of the pattern. If that substring is utf8, with multiple bytes per character, the calculation of the highest point in the string where it's worth searching for the substring could overshoot the end of the string.

This is a (mostly) harmless issue. Apart from reading a few bytes beyond the end of the string (which might cause a problem if, for example, the string is memory-mapped), the only concern is that, in theory (although it is extremely unlikely), a spurious match for the substring could be found partly beyond the end of the string. That would result in the full RE engine being called to (correctly) do the match, when otherwise the match could have been rejected more quickly.
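To illustrate the kind of bound computation described above, here is a minimal C sketch of keeping such a search window inside the string. It is not the code from regexec.c. The helpers `utf8_seq_len()`, `hop_limited()` and `bounded_find()` are hypothetical, a naive byte-by-byte loop stands in for the Boyer-Moore search that `fbm_instr()` performs, and well-formed UTF-8 is assumed. The point is that the last candidate start position, derived from a character count, has to be clamped both to the string end and to the last byte at which the whole substring still fits. Hopping forward by characters without such a limit can land past `strend`, after which the comparison reads beyond the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 sequence whose lead
 * byte is b. Assumes well-formed UTF-8. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/* Hypothetical helper: hop forward up to n characters from s, never
 * returning a pointer beyond lim (in the spirit of Perl's limited
 * hop macros, but not their actual implementation). */
static const unsigned char *
hop_limited(const unsigned char *s, size_t n, const unsigned char *lim)
{
    while (n > 0 && s < lim) {
        s += utf8_seq_len(*s);
        n--;
    }
    return s < lim ? s : lim;
}

/* Hypothetical helper: search [start, strend) for needle (nlen bytes),
 * trying only start positions at most max_chars characters after start. */
static const unsigned char *
bounded_find(const unsigned char *start, const unsigned char *strend,
             size_t max_chars, const unsigned char *needle, size_t nlen)
{
    assert(start <= strend);
    if ((size_t)(strend - start) < nlen)
        return NULL;

    /* Highest byte at which the whole needle still fits in the string. */
    const unsigned char *last_fit = strend - nlen;

    /* Highest start allowed by the character offset, clamped to strend. */
    const unsigned char *last = hop_limited(start, max_chars, strend);
    if (last > last_fit)
        last = last_fit;

    /* Each memcmp() reads only bytes in [p, p + nlen), which ends at or
     * before strend because p <= last_fit. */
    for (const unsigned char *p = start; p <= last; p++) {
        if (memcmp(p, needle, nlen) == 0)
            return p;
    }
    return NULL;
}
```

For example, take `s` holding the 9 bytes `"\xC4\x80" "0000000"` (U+0100 followed by seven `0` characters, which is how the test string `"\x{100}0000000"` is laid out in UTF-8). Searching it for an 8-byte run of `0`s with `max_chars` set to 100 returns NULL without reading past `s + 9`, because `last` is clamped to `s + 1` even though a 100-character offset would reach far beyond the end.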
author: David Mitchell <davem@iabyn.com> 2016-08-24 13:21:04 +0100
committer: David Mitchell <davem@iabyn.com> 2016-08-24 13:30:33 +0100
commit: 71a9d1055562b01938400494965dac70b3a685c5
tree: ec9af7279d3a39d2503db4736ae12a7ba20a60c5 /t
parent: f82c7fdb5e25e4e2974e9e3c5519a3d41b00ae4c
2 files changed, 12 insertions, 1 deletion
diff --git a/t/re/pat_rt_report.t b/t/re/pat_rt_report.t
index cb09360f4d..addb3e226c 100644
--- a/t/re/pat_rt_report.t
+++ b/t/re/pat_rt_report.t
@@ -20,7 +20,7 @@ use warnings;
 use 5.010;
 use Config;
 
-plan tests => 2500;  # Update this when adding/deleting tests.
+plan tests => 2501;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -1113,6 +1113,16 @@ EOP
 	my $s = "\x{1ff}" . "f" x 32;
 	ok($s =~ /\x{1ff}[[:alpha:]]+/gca, "POSIXA pointer wrap");
     }
+
+    {
+        # RT #129012 heap-buffer-overflow Perl_fbm_instr.
+        # This test is unlikely to not pass, but it used to fail
+        # ASAN/valgrind
+
+        my $s ="\x{100}0000000";
+        ok($s !~ /00000?\x80\x80\x80/, "RT #129012");
+    }
+
 } # End of sub run_tests
 
 1;
diff --git a/t/re/re_tests b/t/re/re_tests
index b72b18a913..35948b3c23 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1968,6 +1968,7 @@ ab(?#Comment){2}c	abbc	y	$&	abbc
 (?:.||)(?|)000000000@	000000000@	y	$&	000000000@		#  [perl #126405]
 aa$|a(?R)a|a	aaa	y	$&	aaa		# [perl 128420] recursive matches
 (?:\1|a)([bcd])\1(?:(?R)|e)\1	abbaccaddedcb	y	$&	abbaccaddedcb		# [perl 128420] recursive match with backreferences
+AB\s+\x{100}	AB \x{100}X	y	-	-
 
 # Keep these lines at the end of the file
 # vim: softtabstop=0 noexpandtab