From 76513bdc5d9e7bddc7d5da43b64755a51aea8673 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 3 Jul 2017 12:26:34 -0600
Subject: Revert: Restrict code points to <= IV_MAX

This reverts the two related commits
51099b64db323d0e1d871837f619d72bea8ca2f9 (partially)
13f4dd346e6f3b61534a20f246de3a80b3feb743 (entirely)

I was in the middle of a long branch dealing with this and related
issues when these were pushed to blead. It was far easier for me to
revert these at the beginning of my branch than to try to rebase
unreverted. And there are changes needed to the approaches taken in the
reverted commits. A third related commit,
113b8661ce6d987db4dd217e2f90cbb983ce5d00, doesn't cause problems so
isn't reverted.

I reverted the second commit, then the first one, and squashed them
together into this one. No other changes were done in this commit.
The reason for the squashing is to avoid problems when bisecting on a
32-bit machine. If the bisect landed between the commits, it could show
failures. The portion of the first commit that wasn't reverted was the
part that was rendered moot because of the changes in the meantime that
forbid bitwise operations on strings containing code points above
Latin1.

The next commit in this series will reinstate portions of these commits.
I reverted as much as possible here to make this reversion commit
cleaner.

The biggest problem with these commits is that some Perl applications
are made vulnerable to Denial of Service attacks. I do believe it is ok
to croak when a program tries, for example, to do chr() of too large
a number, which is what the reverted commit does (and what this branch
will eventually reinstate doing). But when parsing UTF-8, you can't
just die if you find something too large. That would be an easy DoS on
any program, such as a web server, that gets its UTF-8 from the public.
Perl already has a means to deal with too-large code points (before
5.26, this was those code points that overflow the word size), and web
servers should have already been written in such a way as to deal with
these. This branch just adapts the code so that anything above IV_MAX
is considered to be overflowing. Web servers should not have to change
as a result.

A second issue is that one of the reasons we did the original
deprecation is so that we can use the forbidden code points internally
ourselves, such as Perl 6 does to store Grapheme Normal Form. The
implementation should not burn bridges, but allow that use to easily
happen when the time comes. For that reason, some tests should not be
deleted, but commented out, so they can be quickly adapted.

While working on this branch, I found several unlikely-to-occur bugs in
the existing code. These should be fixed now in the code that handles
up to UV_MAX code points, so that when we do allow internal use of such,
the bugs are already gone.

I had also researched the tests that fail as a result of the IV_MAX
restriction. Some of the test changes in these reverted commits were
inappropriate. For example, some tests that got changed were for bugs
that happen only on code points that are now illegal on 32-bit builds.
Lowering the code point in the test to a legal value, as was done in
some instances, no longer tests for the original bug. Instead, where I
found this, I just skip the test on 32-bit platforms.

Other tests were simply deleted, where a lower code point would have
worked, and the test is useful with a lower code point. I retain such
tests, using a lower code point. In some cases, it was probably ok to
delete the tests on 32-bit platforms, as something was retained for a
64-bit one, but since I had already done the adaptive work, I retain
that.

And still other tests were from files that I extensively revamp, so I
went with the revamp.

The following few commits fix those as far as possible now.
This is so that the reversion of the tests and my changes are close
together in the final commit series. Some changes have to wait until
later, as for those where the entire test files are revamped, or when
the deprecation messages finally go away in the final commit of this
series.

In cases where the message wording I was contemplating using conflicts
with the reverted commits, I change mine to use that of the reverted
commits.
---
 t/re/pat_advanced.t | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

(limited to 't/re')

diff --git a/t/re/pat_advanced.t b/t/re/pat_advanced.t
index 3d57beade4..68c827a362 100644
--- a/t/re/pat_advanced.t
+++ b/t/re/pat_advanced.t
@@ -2344,7 +2344,7 @@ EOF
         # We use 'ok' instead of 'like' because the warnings are lexically
         # scoped, and want to turn them off, so have to do the match in this
         # scope.
-        if ($Config{uvsize} > 4) {
+        if ($Config{uvsize} < 8) {
             ok(chr(0xFFFF_FFFE) =~ /\p{Is_32_Bit_Super}/,
                 "chr(0xFFFF_FFFE) can match a Unicode property");
             ok(chr(0xFFFF_FFFF) =~ /\p{Is_32_Bit_Super}/,
@@ -2355,6 +2355,24 @@ EOF
             ok(chr(0xFFFF_FFFF) =~ $p, # Tests any caching
                 "chr(0xFFFF_FFFF) can match itself in a [class] subsequently");
         }
+        else {
+            no warnings 'overflow';
+            ok(chr(0xFFFF_FFFF_FFFF_FFFE) =~ qr/\p{Is_Portable_Super}/,
+                "chr(0xFFFF_FFFF_FFFF_FFFE) can match a Unicode property");
+            ok(chr(0xFFFF_FFFF_FFFF_FFFF) =~ qr/^\p{Is_Portable_Super}$/,
+                "chr(0xFFFF_FFFF_FFFF_FFFF) can match a Unicode property");
+
+            my $p = qr/^[\x{FFFF_FFFF_FFFF_FFFF}]$/;
+            ok(chr(0xFFFF_FFFF_FFFF_FFFF) =~ $p,
+                "chr(0xFFFF_FFFF_FFFF_FFFF) can match itself in a [class]");
+            ok(chr(0xFFFF_FFFF_FFFF_FFFF) =~ $p, # Tests any caching
+                "chr(0xFFFF_FFFF_FFFF_FFFF) can match itself in a [class] subsequently");
+
+            # This test is because something was declared as 32 bits, but
+            # should have been cast to 64; only a problem where
+            # sizeof(STRLEN) != sizeof(UV)
+            ok(chr(0xFFFF_FFFF_FFFF_FFFE) !~ qr/\p{Is_32_Bit_Super}/, "chr(0xFFFF_FFFF_FFFF_FFFE) shouldn't match a range ending in 0xFFFF_FFFF");
+        }
     }
 
     { # [perl #112530], the code below caused a panic
@@ -2404,7 +2422,8 @@ EOF
         $Config{uvsize} == 8
           or skip("need large code-points for this test", 1);
 
-        fresh_perl_is('/\x{E000000000}|/ and print qq(ok\n)', "ok\n", {},
+        # This is above IV_MAX on 32 bit machines, so turn off those warnings
+        fresh_perl_is('no warnings "deprecated"; /\x{E000000000}|/ and print qq(ok\n)', "ok\n", {},
              "buffer overflow in TRIE_STORE_REVCHAR");
     }
 
-- 
cgit v1.2.1