summaryrefslogtreecommitdiff
path: root/toke.c
diff options
context:
space:
mode:
authorKarl Williamson <khw@cpan.org>2022-03-24 14:31:44 -0600
committerKarl Williamson <khw@cpan.org>2022-03-24 16:13:30 -0600
commit2ea73191f792b62c945c77ff01626704f7523ed0 (patch)
tree6190b17f51fe43c3b4163bae26ca81e10b678e14 /toke.c
parent7ddf4b5513d17e0aaa106c6dd076d0dfe4a7e4a8 (diff)
downloadperl-2ea73191f792b62c945c77ff01626704f7523ed0.tar.gz
toke.c: Add missing ptr update
scan_str() calls s=skipspace(s). It turns out that this function can actually change the buffer 's' is pointing to, so that the original 'start' passed in to the function is obsolete. Just update it. This is very much like the paradigm already in S_force_word(). This bug previously existed, but commit 32b87797e986f5d99836e16ea6b9d9ff5a56d3be increased the frequency of occurrence from close to non-existent to relatively often. It only happened when the string being delimited had some spaces before it, and only if the buffer got moved. This depends on the position the construct is in the file, and on the buffering of the reading of that file, hence the symptoms had it occurring much more often using stdio than PerlIO. (it could just as well have been the reverse, I suppose.) The mentioned commit collapsed two different loops; one of which didn't bother with a check it should have been doing. Without that check, the likelihood of this being triggered was much less. (But illegal input would get by.) There is a nuance here, which resulted in the need for this commit to also update the test file, from having two occurrences of an error on a single line to just one. This is because, if the buffer moves, we reset 'start' to 's'. This makes 's' appear to be at the left edge of the input when it really is just at the left edge of the buffer. The test that failed used a combining character (I'll call it 'cc' for short) after a space, to check that the code accurately catches the illegality that you can't delimit a string with a character that doesn't stand on its own, such as a cc. However when such a character comes at the beginning of the input, there's nothing for it to combine with, and Unicode says that is legal, so we do too. So this moving 'start' makes something that is illegal look to be legal. I don't think this is a problem because the code looks up the cc and discovers there is no mirror for it, so it must also be the terminator for the string. If this cc is just from a single typo in the input, there won't be a matching terminator, and the compilation will abort. If the program intended to use a cc as both fore and aft of a string, the terminating occurrence of this cc will also be checked for validity, and it will almost certainly be seen to be an illegal cc in this context, so again the compilation will fail. That is indeed what is happening in t/lib/warnings/toke. If the buffering were such that the terminating cc also began a new buffer, it again would be viewed as at the edge and the string would be parsed as being ok, when it really shouldn't have been. Should this happen, I don't see a real problem. An attacker could craft a string with the precise length to make this happen, but to do so they would have to control the source code, and the war is already lost.
Diffstat (limited to 'toke.c')
-rw-r--r--toke.c7
1 files changed, 4 insertions, 3 deletions
diff --git a/toke.c b/toke.c
index b4b4786a1d..8114ef03d0 100644
--- a/toke.c
+++ b/toke.c
@@ -11339,8 +11339,9 @@ Perl_scan_str(pTHX_ char *start, int keep_bracketed_quoted, int keep_delims, int
PERL_ARGS_ASSERT_SCAN_STR;
/* skip space before the delimiter */
- if (isSPACE(*s)) {
- s = skipspace(s);
+ if (isSPACE(*s)) { /* skipspace can change the buffer 's' is in, so
+ 'start' also has to change */
+ s = start = skipspace(s);
}
/* mark where we are, in case we need to report errors */
@@ -11362,7 +11363,7 @@ Perl_scan_str(pTHX_ char *start, int keep_bracketed_quoted, int keep_delims, int
if (UTF && UNLIKELY(! is_grapheme((U8 *) start,
(U8 *) s,
(U8 *) PL_bufend,
- open_delim_code)))
+ open_delim_code)))
{
yyerror(non_grapheme_msg);
}