path: root/t
Commit message | Author | Age | Files | Lines
* [perl #123893] Fix hang with "@{" | Father Chrysostomos | 2015-08-28 | 1 | -4/+15
  Commit v5.21.8-320-ge47d32d stopped code interpolated into quote-like operators from reading more lines of input, by making lex_next_chunk ignore the open filehandle and return false. That causes this block under case 0 in yylex to loop:

      if (!lex_next_chunk(fake_eof)) {
          CopLINE_dec(PL_curcop);
          s = PL_bufptr;
          TOKEN(';'); /* not infinite loop because rsfp is NULL now */
      }

  (rsfp is not null there.)

  This commit makes it check for quote-like operators above, in the same place where it checks whether the file is open, to avoid falling through to this code that can loop.

  This changes the syntax errors for a couple of cases recently added to t/op/lex.t, though I think the error output is now more consistent overall.

  (cherry picked from commit 0f9d53bbcafba2b30e50a1ad22c7759be170e14a)
* [perl #123712] Don’t check sub_inwhat | Father Chrysostomos | 2015-08-28 | 1 | -1/+8
  PL_sublex_info.sub_inwhat (in the parser struct) is a temporary spot to store the value of PL_lex_inwhat (also in the parser struct) when a sub-lexing scope (for a quote-like operator) is entered. PL_lex_inwhat is localised, and the value is copied from its temporary spot (sub_inwhat) into PL_lex_inwhat.

  The PL_sublex_info.sub_inwhat was not localised, but instead the value was set to 0 when a sub-lexing scope was exited. This value was being used, in a couple of places, to determine whether we were inside a quote-like operator. But because the value is not localised, it can be wrong when it is set to 0, if we have nested lexing scopes.

  So this ends up crashing for the same reason described in e47d32dcd5:

      echo -n '/$a[m||/<<a' | ./miniperl

  perl-5.005_02-1816-g09bef84 added the first use of PL_sublex_info.sub_inwhat to determine whether we are in a quote-like operator. (Later it got shifted around.) I copied that in e47d32dcd5 (earlier today), because I assumed the logic was correct. Other parts of the code use PL_lex_inwhat, which is already localised, as I said, and does not suffer this problem.

  If we do not check PL_sublex_info.sub_inwhat to see if we are in a quote-like construct, then we don’t need to clear it on lexing scope exit.

  (cherry picked from commit d27f4b916ce5819f564bdd4a135137c457156333)
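The nesting problem described in this message can be illustrated with a small Python model. This is only a sketch: the class, its methods, and the MATCH/SUBST constants are hypothetical stand-ins for the parser struct fields, not Perl's C code.

```python
MATCH, SUBST = 1, 2  # hypothetical codes for two quote-like operators


class LexState:
    """Toy model of PL_lex_inwhat (localised) and sub_inwhat (not localised)."""

    def __init__(self):
        self.lex_inwhat = 0
        self.sub_inwhat = 0

    def enter_scope(self, kind):
        saved = self.lex_inwhat            # localisation: remember the outer value
        self.sub_inwhat = kind             # the temporary spot
        self.lex_inwhat = self.sub_inwhat  # copied into the localised field
        return saved

    def leave_scope(self, saved):
        self.lex_inwhat = saved            # restored on scope exit
        self.sub_inwhat = 0                # merely cleared, never restored


state = LexState()
outer = state.enter_scope(MATCH)   # outer quote-like operator
inner = state.enter_scope(SUBST)   # nested quote-like operator
state.leave_scope(inner)           # back inside the outer operator
```

After the inner scope exits, `state.lex_inwhat` still reports the outer operator, while `state.sub_inwhat` is 0 and so wrongly suggests that no quote-like operator is being parsed. That is why checking the localised field is the safer test.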
* [perl #123712] Fix /$a[/ parsing | Father Chrysostomos | 2015-08-28 | 1 | -1/+9
  The parser used to read more lines of input when parsing code interpolated into quote-like operators, under some circumstances. This would result in code like this working, even though it should be a syntax error:

      s||${s/.*/|;
      /s}Just another Perl hacker,
      print

      "${;s/.*/Just an";
      other Perl hacker,
      /s} die or return;
      print

  While this was harmless, other cases, like /$a[/<<a with no trailing newline, would cause unexpected internal state that did not meet the reasonable assumptions made by S_scan_heredoc, resulting in a crash.

  The simplest fix is to modify the function that reads more input, namely, lex_next_chunk, and prevent it from reading more lines of input from inside a quote-like operator. (The alternative would be to modify all the calls to lex_next_chunk, and make them conditional.)

  That breaks here-doc parsing for things like s//<<EOF/, but the LEX_NO_TERM flag to lex_next_chunk is used only by the here-doc parser, so lex_next_chunk can make an exception if it is set.

  (cherry picked from commit e47d32dcd59a578274f445fac79e977d83055c8c)
* PATCH: [perl 125825] {n}+ possessive quantifier broken | Karl Williamson | 2015-08-27 | 2 | -1/+2
  I was unaware of this construct when I wrote the commit that broke it, and there were no tests for it. Now there are.

  (cherry picked from commit 9a7bb2f73a8a1b561890191974201d576371e7f9)
* RT #124156: death during unwinding causes crash | David Mitchell | 2015-08-19 | 1 | -1/+51
  v5.19.3-139-g2537512 changed POPSUB and POPFORMAT so that they also unwind the relevant portion of the scope stack. This (sensible) change means that during exception handling, contexts and savestack frames are popped in lock-step, rather than all the contexts being popped followed by all the savestack contents.

  However, LEAVE_SCOPE() is now called by POPSUB/FORMAT, which can trigger destructors, tied method calls etc, which themselves may croak. The new unwinding will see the old sub context still on the context stack and call POPSUB on it again, leading to double frees etc.

  At this late stage in code freeze, the least invasive change is to use an unused bit in cx->blk_u16 to indicate that POPSUB has already been called on this context frame. Sometime later, this whole area of code really needs a thorough overhaul. The main issue is that if cxstack_ix-- is done too early, then calling destructors etc can overwrite the current context frame while we're still using it; if cxstack_ix-- is done too late, then that stack frame can end up getting unwound twice.

  (cherry picked from commit 1956db7ee60460e5b4a25c19fda4999666c8cbd1)
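The "already popped" guard described in this message can be sketched in Python. This is an illustration under stated assumptions, not the C implementation: Frame, pop_sub and unwind are hypothetical names, and the boolean attribute stands in for the spare bit in cx->blk_u16.

```python
class Frame:
    """Hypothetical stand-in for a sub context frame."""

    def __init__(self, name, on_leave=None):
        self.name = name
        self.on_leave = on_leave   # cleanup hook that may itself raise
        self.popped = False        # plays the role of the spare flag bit
        self.pop_count = 0         # how many times cleanup really ran


def pop_sub(frame):
    """Run the frame's cleanup at most once, even if it raises part-way."""
    if frame.popped:
        return
    frame.popped = True            # set the flag before running cleanup
    frame.pop_count += 1
    if frame.on_leave is not None:
        frame.on_leave()


def unwind(stack):
    """Pop every frame; a dying cleanup restarts unwinding on the same frame."""
    while stack:
        frame = stack[-1]
        try:
            pop_sub(frame)
        except RuntimeError:
            continue               # frame is still on the stack, so it is seen again
        stack.pop()


def dying_destructor():
    raise RuntimeError("destructor died")


outer_frame = Frame("outer")
inner_frame = Frame("inner", on_leave=dying_destructor)
frames = [outer_frame, inner_frame]
unwind(frames)
```

Without the flag, the second visit to the inner frame would run its cleanup again, which is the Python analogue of the double free the commit describes.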
* [perl #123711] Fix crash with 0-5x-l{0} | Father Chrysostomos | 2015-08-17 | 1 | -0/+3
  perl-5.8.0-117-g6f33ba7, which added the XTERMORDORDOR hack, did not change the leftbracket code to treat XTERMORDORDOR the same way as XTERM, so -l {0} and getc {0} (among other ops) were treating {...} as a block, rather than an anonymous hash. This was not, however, being turned into a real block with enter/leave ops to protect the stack, so the nextstate op was corrupting the stack and possibly freeing mortals in use.

  This commit makes the leftbracket code check for XTERMORDORDOR and treat it like XTERM, so that -l {0} once more creates an anonymous hash. There is really no way to get to that hash, though, so all I can test for is the crash.

  (cherry picked from commit 83a85f49e265a458a481a9dc402dd3bdd30ae457)
* save_re_context(): do "local $n" with no PL_curpm | David Mitchell | 2015-08-12 | 2 | -0/+56
  RT #124109.

  2c1f00b9036 localised PL_curpm to NULL when calling swash init code (i.e. perl-level code that is loaded and executed when something like "lc $large_codepoint" is executed).

  b4fa55d3f1 followed this up by gutting Perl_save_re_context(), since that function did, basically,

      if (PL_curpm) {
          for (i = 1; i <= RX_NPARENS(PM_GETRE(PL_curpm))) {
              do the C equivalent of the perl code "local ${i}";
          }
      }

  and now that PL_curpm was null, the code wasn't called any more.

  However, it turns out that the localisation *was* still needed, it's just that nothing in the test suite actually tested for it.

  In something like the following:

      $x = "\x{41c}";
      $x =~ /(.*)/;
      $s = lc $1;

  pp_lc() calls get magic on $1, which sets $1's PV value to a copy of the substring captured by the current pattern match. Then pp_lc() calls a function to convert the string to lower case, which triggers a swash load, which calls perl code that does a pattern match and, most importantly, uses the value of $1. This triggers get magic on $1, which overwrites $1's PV value with a new value. When control returns to pp_lc(), $1 now holds the wrong string value.

  Hence $1, $2 etc need localising as well as PL_curpm.

  The old way that Perl_save_re_context() used to work (localising $1..${RX_NPARENS}) won't work directly when PL_curpm is NULL (as in the swash case), since we don't know how many vars to localise. In this case, hard-code it as localising $1,$2,$3 and add a porting test file that checks that the utf8.pm code and dependencies don't use anything outside those 3 vars.

  (cherry picked from commit 3553f4fa11fd9e8bb0797ace43605cc33ebf32fa)
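The effect of localising $1, $2 and $3 around nested Perl-level code can be modelled loosely in Python. This is a sketch only: the captures dictionary, match(), localised() and helper_that_matches() are hypothetical names, and Python's re module stands in for the regex engine. It does not model get magic, only the save-and-restore idea.

```python
import re
from contextlib import contextmanager

captures = {}  # stand-in for $1, $2, ... (hypothetical shared state)


def match(pattern, text):
    """Run a match and overwrite the shared capture slots."""
    found = re.match(pattern, text)
    captures.clear()
    if found:
        for index, group in enumerate(found.groups(), start=1):
            captures[index] = group


@contextmanager
def localised(*slots):
    """Save the named capture slots and restore them on exit, like local $1."""
    saved = {slot: captures.get(slot) for slot in slots}
    try:
        yield
    finally:
        for slot, value in saved.items():
            if value is None:
                captures.pop(slot, None)
            else:
                captures[slot] = value


def helper_that_matches():
    """Plays the role of the Perl-level code run during a swash load."""
    match(r"(\w)(\w)", "xy")


match(r"(.*)", "abc")
with localised(1, 2, 3):       # the commit hard-codes $1, $2 and $3
    helper_that_matches()
value_after = captures[1]      # the caller still sees its own capture
```

Calling helper_that_matches() outside the with block would leave captures[1] holding the helper's value, which mirrors the wrong string seen by pp_lc().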
* Fix test count in t/base/rs.t | Steve Hay | 2015-08-10 | 1 | -2/+1
  Commit da902b5900 cherry-picked 5fe499a8e2, but I got the conflict resolution wrong. Now resolved correctly after looking at blead commit 0b81c0dda6, which made corrections to the test counting/skipping.
* Fix "...without parentheses is ambiguous" warning for UTF-8 function names | Alex Vandiver | 2015-08-10 | 1 | -0/+10
  While isWORDCHAR_lazy_if is UTF-8 aware, the check advanced byte-by-byte. This led to errors of the form:

      Passing malformed UTF-8 to "XPosixWord" is deprecated
      Malformed UTF-8 character (unexpected continuation byte 0x9d, with no preceding start byte)
      Warning: Use of "�" without parentheses is ambiguous

  Use UTF8SKIP to advance character-by-character, not byte-by-byte.

  (cherry picked from commit 8ce2ba821761a7ada1e1def512c0374977759cf7)
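Advancing by character rather than by byte can be sketched as follows. This is a Python illustration, not toke.c: utf8_skip() is a hypothetical analogue of the UTF8SKIP macro for standard 1 to 4 byte sequences, well-formed input is assumed, and str.isalnum() is only a rough stand-in for Perl's word-character test.

```python
def utf8_skip(lead_byte):
    """Byte length of the UTF-8 sequence that starts with lead_byte.

    Assumes lead_byte really is a start byte of well-formed UTF-8.
    """
    if lead_byte < 0x80:
        return 1
    if lead_byte >= 0xF0:
        return 4
    if lead_byte >= 0xE0:
        return 3
    return 2  # 0xC0..0xDF


def word_length(buf, start=0):
    """Count the bytes of the leading word in buf, stepping one character at a time."""
    pos = start
    while pos < len(buf):
        step = utf8_skip(buf[pos])
        char = buf[pos:pos + step].decode("utf-8")
        if not (char.isalnum() or char == "_"):
            break
        pos += step            # whole character, never a single continuation byte
    return pos - start


name_bytes = "ünïcödé(".encode("utf-8")
ascii_bytes = b"foo bar"
```

Stepping by `pos += 1` instead would land on a continuation byte after the first two-byte character, which is the situation the "unexpected continuation byte" message reports.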
* Allow unquoted UTF-8 HERE-document terminators | Alex Vandiver | 2015-08-10 | 1 | -0/+11
  When not explicitly quoted, tokenization of the HERE-document terminator dealt improperly with multi-byte characters, advancing one byte at a time instead of one character at a time. This led to incomprehensible-to-the-user errors of the form:

      Passing malformed UTF-8 to "XPosixWord" is deprecated
      Malformed UTF-8 character (unexpected continuation byte 0xa7, with no preceding start byte)
      Can't find string terminator "EnFra�" anywhere before EOF

  If enclosed in single or double quotes, parsing was correctly effected, as delimcpy advances byte-by-byte, but looks only for the single-byte ending character.

  When doing a \w+ match looking for the end of the word, advance character-by-character instead of byte-by-byte, ensuring that the size does not extend past the available size in PL_tokenbuf.

  (cherry picked from commit 6e59c8626d31f697a2b7b36cf8a200b36d93eac2)
* [perl #124113] Make check for multi-dimensional arrays be UTF8-aware | Alex Vandiver | 2015-08-10 | 1 | -0/+10
  During parsing, toke.c checks if the user is attempting to provide multiple indexes to an array index:

      $a[ $foo, $bar ];

  However, while checking for word characters in variable names is aware of multi-byte characters if "use utf8" is enabled, the loop is only advanced one byte at a time, not one character at a time. As such, multibyte variables in array indexes incorrectly yield warnings:

      Passing malformed UTF-8 to "XPosixWord" is deprecated
      Malformed UTF-8 character (unexpected continuation byte 0x9d, with no preceding start byte)

  Switch the loop to advance character-by-character if UTF-8 semantics are in use.

  (cherry picked from commit b3089e964c0afaf7eb8d54aa5a912e4eb2e6c176)
* Stop $^H |= 0x1c020000 from enabling all features | Father Chrysostomos | 2015-08-10 | 1 | -0/+8
  That set of bits sets the feature bundle to ‘custom’, which means that the features are set by %^H, and also indicates that %^H has been diddled with, so it’s worth looking at.

  In the specific case where %^H is untouched and there is no corresponding cop hint hash behind the scenes, Perl_feature_is_enabled (in toke.c) ends up returning TRUE.

  Commit v5.15.6-55-g94250ae sped up feature checking by allowing refcounted_he_fetch to return a boolean when checking for existence, instead of converting the value to a scalar, whose contents we are not even going to use.

  This was when the bug started happening. I did not update the code path in refcounted_he_fetch that handles the absence of a hint hash. So it was returning &PL_sv_placeholder instead of NULL; TRUE instead of FALSE.

  This did not cause problems for most code, but with the introduction of the new bitwise ops in v5.21.8-150-g8823cb8, it started causing uni::perl to fail, because they were implicitly enabled, making ^ a numeric op, when it was being used as a string op.

  (cherry picked from commit 71622e40793536aa4f2ace7ffc704cc78151fd26)
* [perl #123202] speed up scalar //g against tainted strings | Tony Cook | 2015-08-10 | 1 | -0/+42
  (cherry picked from commit ed38223246c041b4e9ce5687cadf6f6b903050ca)
* Fix quoting in new switchd.t test. | Craig A. Berry | 2015-08-10 | 1 | -1/+1
  Escaped double quotes are not portable, but luckily we don't need to worry about what is portable as runperl will take care of it for us if we leave things in its capable hands.

  Follow-up to 8d28fc8f69270cc75d9564.

  (cherry picked from commit 7f8f1c2613d1d5df0b1071bd5fe3eec808c4a69e)
* [perl #123748] - Add test case for possible getenv/putenv/setenv stomping in perl.c | Matthew Horsfall (alh) | 2015-08-10 | 1 | -1/+23
  (cherry picked from commit 8d28fc8f69270cc75d9564b369ac6008c5b5d617)
* [perl #123218] "preserve" $/ if set to a bad value | Tony Cook | 2015-08-10 | 1 | -2/+9
  and base/rs.t tests $/ not $!

  (cherry picked from commit 5fe499a8e26270679c0c6d48431f3a328a8ffeba)
* [perl #124127] fix cloning arrays with unused elements | Tony Cook | 2015-08-10 | 1 | -1/+7
  ce0d59fd changed arrays to use NULL instead of &PL_sv_undef for unused elements; unfortunately, it missed updating sv_dup_common()'s initialization of unused elements, leaving them as &PL_sv_undef.

  This resulted in "modification of read only value" errors at runtime.

  (cherry picked from commit 902d16915db2735c3a41f15ef8d95cf300c31801)
* [perl #123652] eval {label:} crash | Father Chrysostomos | 2015-06-01 | 1 | -0/+3
  As of v5.13.6-130-geae48c8, the block consists solely of a nextstate op.

  The code in ck_eval that distinguished between eval-block and eval-string was checking the type of the kid op (looking for lineseq or stub) instead of simply checking the type of the op itself (entertry/entereval).

  The lexer was already making the distinction between the two but op.c was ignoring the information provided by the lexer.

  Usually

      entertry(unop)
        kid

  gets converted into

      leavetry
        entertry(logop)
        kid

  with the entertry reallocated as a larger-sized op, but that was not happening. The peephole optimiser assumed it had happened, and followed the cLOGOPo->op_other pointer, which is unrelated junk beyond the end of the unop struct. Hence the crash.

  (cherry picked from commit 2f465e08eb39981706429873d24e3bcc18015bfb)
* Bump Pod::PlainText $VERSION | Steve Hay | 2015-01-31 | 1 | -0/+1
  To keep Porting\cmpVERSION.pl --tag v5.18.4 happy.

  (cherry picked from commit f8d8294fb886e7b57f8bb3b2a1edd33218c74281)
* Bump some CPAN $VERSIONs | Steve Hay | 2015-01-31 | 1 | -0/+8
  To keep Porting\cmpVERSION.pl --tag v5.18.4 happy.

  (cherry picked from commit 057e4b4ac27b64f0c638760275b3d6205d87b3f0)
* Add Module-CoreList check to t/porting/corelist.t | Steve Hay | 2015-01-28 | 1 | -1/+2
  Manually backported from commit 1524a56aaa5d246d02ca951a4de771af3d9c2a54.
* intuit_more: no need to copy before keyword check | Hugo van der Sanden | 2015-01-21 | 1 | -1/+10
  That also avoids crashing on overrun.

  (cherry picked from commit 56f81afc0f2d331537f38e6f12b86a850187cb8a)
* [perl #123538] always set chophere and itembytes at the same time | Tony Cook | 2015-01-21 | 1 | -1/+13
  Previously this would crash in FF_MORE because chophere was still NULL.

  (cherry picked from commit 62db6ea5fed19611596cbc5fc0b8a4df2c604e58)
* perlunicook: add trusted-to-exist links for perlunicook | Ricardo Signes | 2015-01-17 | 1 | -0/+6
  (cherry picked from commit efb8961587a30e930d4581a96246b02e20d2b1f4)
* Silence "Useless use ... in void context" warnings from commit 2ce04c5f4a | Steve Hay | 2015-01-17 | 1 | -2/+2
  Thanks to Craig A. Berry for spotting this.
* parser.t: Correct bug number | Father Chrysostomos | 2015-01-13 | 1 | -1/+1
  (cherry picked from commit cc5af3775649fc00e4d4e74d41dcad591b1fa122)
* perldiag: Reunite ‘perhaps you forgot to load’ | Father Chrysostomos | 2015-01-13 | 1 | -1/+0
  to the other part of the message.

  diagnostics.pm won’t find it otherwise:

      $ perl -Mdiagnostics -we '"foo"->bar'
      Can't locate object method "bar" via package "foo" (perhaps you forgot to load "foo"?) at -e line 1 (#1)
      Uncaught exception from user code:
          Can't locate object method "bar" via package "foo" (perhaps you forgot to load "foo"?) at -e line 1.

  Now we have this:

      Can't locate object method "bar" via package "foo" (perhaps you forgot to load "foo"?) at -e line 1 (#1)
          (F) You called a method on a class that did not exist, and the method could not be found in UNIVERSAL. This often means that a method requires a package that has not been loaded.

      Uncaught exception from user code:
          Can't locate object method "bar" via package "foo" (perhaps you forgot to load "foo"?) at -e line 1.

  (cherry picked from commit 8af56b9d4cb926792c8f72b634303126a5b1d860)
* Semicolon before ellipsis inside block disambiguates. | James E Keenan | 2015-01-13 | 1 | -3/+28
  Correct documentation which indicated that, inside a block, a semicolon after an ellipsis statement would disambiguate between a block and a hash reference constructor. The semicolon must precede the ellipsis to perform this disambiguation.

  Add tests to demonstrate that whitespace around the ellipsis statement does not impede the disambiguation. Add perldelta entry.

  For: RT #122661

  (cherry picked from commit 12d22d1fe17e8471834a01cd417792ac5c022d62)
* t/io/eintr.t: Make this pass on my ppc64 box | Ævar Arnfjörð Bjarmason | 2015-01-12 | 1 | -2/+7
  See the "Test failures in blead on ppc64" thread on perl5-porters for details. I'd fail on the previous value, this passes every time.

  (cherry picked from commit 1e02895ff34c407637067df12a1b06eb07a5a96a)
* parser.t: Correct skip count | Father Chrysostomos | 2015-01-12 | 1 | -1/+1
  I increased the skip count in 08b999a9 by mistake. For crashing bugs, executing the code that would have crashed is sufficient to test it.

  (cherry picked from commit ca949e9c3c10048fd3c2829b96293e6139846dcc)
* [perl #123452] Fix crash with s/${<>{})// | Father Chrysostomos | 2015-01-12 | 1 | -1/+4
  s/foo/bar/ tokenizes as something akin to subst("foo","bar") and the resulting list op representing the contents of the parentheses is passed to pmruntime as its expr argument.

  If we have invalid code like s/${<>{})//, the bison parser will discard invalid tokens until it finds something it can fall back to, in an attempt to keep parsing (to report as many errors as possible).

  In the process of discarding tokens, it may convert s/${<>{})//, which the lexer emits like this:

      PMFUNC ( $ { THING(readline) { } ) , "" )

  into this:

      PMFUNC ( $ { THING(readline) } ) , "" )

  (or something similar). So when the parser sees the first closing parenthesis, it decides it has a complete PMFUNC(...), and the expr argument to pmruntime ends up being an rv2sv op (the ${...}), not a list op.

  pmruntime assumes it is a list op, and tries to access its op_last field, to find the replacement part; but rv2sv has no op_last field, so this reads past the end of the op struct, usually into the first pointer in the next op slot, which itself is an opslot pointer, not an op pointer, so things really screw up.

  If we check that the arguments to subst are indeed a list op first before trying to extract the replacement part, everything works. We get the syntax errors reported as expected, but no crash.

  (cherry picked from commit 08b999a9d7e845b758c38568f45f6b2b8d552ed9)
* Adjust test expectations from previous commit | Steve Hay | 2015-01-11 | 2 | -3/+11
  See http://www.nntp.perl.org/group/perl.perl5.porters/2015/01/msg224538.html
* [perl #123495] Stop gmtime(nan) from crashing | Father Chrysostomos | 2015-01-11 | 2 | -1/+25
  We were getting a time struct like this:

      $12 = {
        tm_sec = -2147483588,
        tm_min = 2147483647,
        tm_hour = -2147483624,
        tm_mday = -2147483647,
        tm_mon = 11,
        tm_year = 69,
        tm_wday = -2147483641,
        tm_yday = -2147483314,
        tm_isdst = 0,
        tm_gmtoff = 0,
        tm_zone = 0x1004f6bb6 "UTC"
      }

  which resulted in dayname[tmbuf.tm_wday] reading past the beginning of the array.

  We should check for nan explicitly instead of falling through to the time calculations.

  (cherry picked from commit d8bd3d828a02f8df716063d9980b8b9af539ca42)
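The "check for nan before doing the time calculations" idea can be shown with a minimal Python sketch. The function names and the DAYNAME table are hypothetical and chosen for illustration; the real fix lives in Perl's C code. Note that Python's tm_wday counts from Monday as 0, unlike C's struct tm.

```python
import math
import time

# Indexed by Python's tm_wday, where Monday is 0 (an assumption of this sketch).
DAYNAME = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def checked_gmtime(seconds):
    """Return a broken-down UTC time, or None when the input is NaN."""
    if math.isnan(seconds):
        return None            # reject NaN before any field arithmetic
    return time.gmtime(seconds)


def day_name(seconds):
    """Look up the weekday name only after the NaN check has passed."""
    broken_down = checked_gmtime(seconds)
    if broken_down is None:
        return None
    return DAYNAME[broken_down.tm_wday]
```

The table lookup is only reached with a validated tm_wday, so a garbage index like the one in the struct dump above never reaches the array.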
* Fix assertion failure with qr/\Q(?{})/ | Father Chrysostomos | 2015-01-11 | 1 | -1/+5
  \Q and \u create ops that need targets, and hence use the pad of the anonymous sub created temporarily when parsing something like qr/\Q(?{})/. If it turns out we don’t have a code block (in this case), that anon sub is thrown away, but there is an assertion that makes sure its pad has not been used, which fails:

      $ ./perl -e 'qr/\Q(?{})/'
      Assertion failed: (AvFILLp(PL_comppad) == 0), function Perl_pmruntime, file op.c, line 5395.
      Abort trap: 6

  (That assertion was added by d63c20f27.)

  If we have had \Q or \l, then the length of the pad may be more than 1, but constant folding should have stolen the values from the pad, so assert that instead.

  (cherry picked from commit d100ca43dce2c9a6bb636517e5595aa9e1e01e7e)
* Fix qr/@array(?{block})/ | Father Chrysostomos | 2015-01-11 | 1 | -1/+4
  For qr/(?{})/ to work closurewise, it has to have an implicit anonymous sub that the blocks run in. To that end, the parser compiles the entire thing in the context of a new anonymous sub. For a run-time pattern (with @a or $b outside the block), since the ops have been compiled in the context of that anonymous sub, they must be run within it, too (otherwise the ops point to the wrong pad), so at compile time the ‘arguments’ to qr are turned into a call to an anonymous sub that looks like sub { @a, "(?{...})", ...}.

  This was causing a bizarre copy:

      $ perl5.18.1 -e 'qr/@a(?{})/'
      Bizarre copy of ARRAY in subroutine exit at -e line 1.

  Bisect points to v5.17.10-92-g491453b:

      $ ../perl.git/Porting/bisect.pl --target=miniperl --start=v5.14.0 --end=v5.18.1 -e 'BEGIN{$^H|=0x00200000} qr/@a(?{})/'
      ...
      491453ba443e114f751f325a4734b3d07b897606 is the first bad commit
      commit 491453ba443e114f751f325a4734b3d07b897606
      Author: David Mitchell <davem@iabyn.com>
      Date:   Wed Apr 17 17:51:16 2013 +0100

          Handle /@a/ array expansion within regex engine

  The array op was not being flagged as a flattening op, and sub exit was trying to copy an unflattened array.

  We don’t want the array flattened. To allow it to pass through sub exit unscathed, we need to make this an lvalue sub. That fixes it and everything just works. It even makes non-array cases slightly faster, because nothing is copied at sub exit now:

      before$ time ./miniperl -e 'qr/$a(?{})/ for 1..1000000'

      real 0m3.321s
      user 0m3.312s
      sys 0m0.006s

      after$ time ./miniperl -e 'qr/$a(?{})/ for 1..1000000'

      real 0m2.855s
      user 0m2.845s
      sys 0m0.006s

  (cherry picked from commit 3bc8ec963e9657121e69386195faa61e46928dda)
* [perl #123410] sort CORE::fake bizarre behaviour | Father Chrysostomos | 2015-01-10 | 1 | -1/+3
  This commit:

      commit 01b5ef509f2ebf466fd7de2c1e7406717bb14332
      Author: Father Chrysostomos <sprout@cpan.org>
      Date:   Fri Jun 7 20:16:23 2013 -0700

          [perl #24482] Fix sort and require to treat CORE:: as keyword

  caused sort CORE::lc "FOO" to be equivalent to sort +CORE::lc "FOO", the way it does if a keyword is not preceded by CORE::. But it made the mistake of chopping off the last six characters if it is not a keyword after CORE::. So sort CORE::f @_ became equivalent to sort C @_ !

  This commit just reverts to the previous behaviour in such cases.

  (cherry picked from commit 487e470dbd7a885bb6a92a735b2783e1c6740066)
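The chopping mistake described in this message can be modelled in a few lines of Python. This is a toy model of the described behaviour only, not the lexer's logic: both function names and the KEYWORDS set are hypothetical.

```python
PREFIX = "CORE::"
KEYWORDS = {"lc", "uc", "reverse"}   # hypothetical, tiny subset of keywords


def buggy_name(name):
    """Model of the bug: six characters are dropped even for non-keywords."""
    if name.startswith(PREFIX) and name[len(PREFIX):] in KEYWORDS:
        return name[len(PREFIX):]
    return name[:-len(PREFIX)]       # wrong: chops the last six characters


def fixed_name(name):
    """Strip the prefix only when a real keyword follows it."""
    if name.startswith(PREFIX) and name[len(PREFIX):] in KEYWORDS:
        return name[len(PREFIX):]
    return name                      # non-keywords are left untouched
```

With this model, buggy_name("CORE::f") yields "C", matching the sort C @_ outcome quoted above, while fixed_name("CORE::f") leaves the name alone.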
* PATCH: [perl #123539] regcomp.c node overrun/segfault | Karl Williamson | 2015-01-10 | 1 | -1/+6
  This is a minimal patch suitable for a maintenance release. It extracts the guts of reguni and REGC without the conditional they have. The next commit will do some refactoring to avoid branching and to make things clearer.

  This bug is due to the current two pass structure of the Perl regular expression compiler. The first pass attempts to do just enough work to figure out how much space to malloc for the compiled pattern; and the 2nd pass actually fills in the details. One problem with this design is that in many cases quite a bit of work is required to figure out the size, and this work is thrown away and redone in the second pass. Another problem is that it is easy to forget to do enough work in the sizing pass, and that is what happened with the blamed commit. I understand that there are plans (God speed) to change the compiler design.

  When not under /i matching, the size of a node that will match a sequence of characters is just the number of bytes those characters take up. We have an easy way to calculate the number of bytes any code point will occupy in UTF-8, and it's just 1 byte per code point for non-UTF-8. So in the sizing pass, we don't actually have to figure out the representation of the characters. However under /i matching, we do. First of all, matching of UTF-8 strings is done by replacing each character of each string by its fold-case (function fc()) and then comparing. This is required by the nature of full Unicode matching which is not 1-1. If we do that replacement for the pattern at compile time, we avoid having to do it over-and-over as pattern matching backtracks at execution. And because fc(x) may not occupy the same number of bytes as x, and there is no easy way to know that size without actually doing the fc(), we have to do the fold in the sizing pass.

  Now, there are relatively few folds where sizeof(fc(x)) != sizeof(x), so we could construct an exception table for those few cases where it is, and look up through that. But there is another reason that we have to fold in the sizing pass. And that is because of the potential for multi-character folds being split across regnodes. The regular expression compiler generates EXACTish regnodes for matching sequences of characters exactly or via /i. The limit for how many bytes in a sequence such a node can match is 255 because the length is stored in a U8. If the pattern has a sequence longer than that, it is split into two or more EXACTish nodes in a row. (Actually, the compiler splits at a size much lower than that; I'm not sure why, but then two adjoining nodes whose total sum length is at most 255 get joined later in the third, optimizing pass.) Now consider matching the character U+FB03 LATIN SMALL LIGATURE FFI. It matches the sequence of the three characters "f f i". Because of the design of the regex pattern matching code, if these characters are such that the first one or two are at the end of one EXACTish node, and the final two or one are in another EXACTish node, then U+FB03 wrongly would not match them. Matches can't cross node boundaries. If the pattern were tweaked so all three characters were in either the first or second node, then the match would succeed. And that is what the compiler does. When it reaches the node's size limit, and the final character is one that is a non-terminal character in a multi-char fold, what's in the node is backed-off until it ends with a character without this characteristic. This has to be done in the sizing pass, as we are repacking the nodes, which can affect the size of the pattern, and we have to know what the folds are in order to determine all this.

  (We don't fold non-UTF-8 patterns. This is for two reasons. One is that one character, the U+00B5 MICRO SIGN, folds to above-Latin1, and if we folded it, we would have to change the pattern into UTF-8, and that would slow everything down. I've thought about adding a regnode type for the much more common case of a sequence that doesn't have this character in it, and which could hence be folded at compile time. But I've not been able to justify this because of the 2nd reason, which is folds in this range are simple enough to be handled by an array lookup, so folding is fast at runtime.)

  Then there is the complication of matching under locale rules. This bug manifested itself only under /l matching. We can't fold at pattern compile time, because the folding rules won't be known until runtime. This isn't a problem for non-UTF-8 locales, as all folds are 1-1, and so there never will be a multi-char fold. But there could be such folds in a UTF-8 locale, so the regnodes have to be packed to work for that eventuality. The blamed commit did not do that, and because this issue doesn't arise unless there is a string long enough to trigger the problem, this wasn't found until now.

  What is needed, and what this commit does, is for the unfolded characters to be accumulated in both passes. The code that looks for potential multi-char fold issues handles both folded and unfolded inputs, so will work.

  (cherry picked from commit 405dffcb17b9cc9d0e5d7b41835b998ca7f1d873)
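The backing-off rule for node boundaries can be illustrated with a simplified Python sketch. It is not regcomp.c: it works on characters rather than bytes, MULTI_CHAR_FOLDS is a tiny hypothetical table, and pack_nodes() and is_safe_cut() are names invented for this example.

```python
MULTI_CHAR_FOLDS = {"ffi", "ff", "fi"}   # hypothetical subset of fold sequences


def is_safe_cut(text, cut):
    """True if splitting text at index cut does not break a listed fold sequence."""
    for seq in MULTI_CHAR_FOLDS:
        for offset in range(1, len(seq)):
            start = cut - offset
            if start >= 0 and text[start:start + len(seq)] == seq:
                return False
    return True


def pack_nodes(text, limit):
    """Split text into chunks of at most limit characters.

    When a chunk boundary would fall inside a fold sequence, the boundary is
    moved back until the split is safe. A chunk always keeps at least one
    character so that the loop makes progress.
    """
    nodes = []
    pos = 0
    while pos < len(text):
        cut = min(pos + limit, len(text))
        while pos + 1 < cut < len(text) and not is_safe_cut(text, cut):
            cut -= 1
        nodes.append(text[pos:cut])
        pos = cut
    return nodes
```

For instance, pack_nodes("abcffi", 4) gives ["abc", "ffi"] rather than ["abcf", "fi"], so the three characters that U+FB03 folds to stay within one node. Because the chunk sizes depend on where the folds are, a sizing pass has to apply the same rule as the pass that fills in the nodes.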
* [perl #123245] avoid a panic in sv_chop() in formats | Tony Cook | 2015-01-08 | 1 | -3/+0
  This fixes two issues:

  1) if you don't supply enough arguments to the format, pp_formline() uses &PL_sv_no as the sv; since we've already warned about the missing format argument, we don't need to produce a read only error for an SV the caller didn't supply

  2) when the supplied string is empty for FF_LINESNGL and FF_LINEGLOB the case would skip most of its processing, including setting chophere; this meant that when the following FF_CHOP operator was processed it would pass a pointer into a different string, producing a panic.

  (cherry picked from commit fb9282c3ccd3b3c2e184a3158c46c930c23f30fb)
* [perl #123245] tests for format crashes | Tony Cook | 2015-01-08 | 1 | -1/+28
  (cherry picked from commit fcaef4dc8ca94ff0fe27bf4a249a5583ca0e7af5)
* add test for rt122747 | Yves Orton | 2015-01-08 | 1 | -0/+29
* svleak.t: Add test for #123198 | Father Chrysostomos | 2015-01-08 | 1 | -1/+2
* Manual backport of 572618de. | Jarkko Hietaniemi | 2014-12-27 | 1 | -1/+2
  To fix cherry-pick of fa2edc1a.

  (cherry picked from commit 69c6e8ed5072300ea7e1ba00cd9fadd60e9200c0)
* Tru64: Skip tests that for some reason grind Tru64 to a halt. | Jarkko Hietaniemi | 2014-12-27 | 5 | -0/+18
  fold_grind and pat_psycho finish but take several minutes, as opposed to other re tests which take seconds; uniprops grinds for even longer but eventually runs out of memory (ulimit mem ~0.5GB) (failing).

  There probably should be a more centralized/general way of doing this with the core tests: either 'on this $^O, skip these tests' (to avoid crowding the BEGIN of each test with $^O testing), or a more generalized watchdog system (if this test takes more than N sec, bail out -- probably should not be a hard failure as such by default, given slow systems), or some combination thereof.

  (cherry picked from commit fa2edc1a38517bf179bb9eefa2039264279c29db)
  (cherry picked from commit cbb727554c39b16b3662af6ec874a8f8e29f3980)
* Fix t/op/taint.t on Windows | Father Chrysostomos | 2014-12-27 | 1 | -6/+12
  $ENV{PATH} seems to be the problem. If we clear it, then we can’t spawn another process. I am basing this solely on this comment earlier in the file:

      # On Windows we can't spawn a fresh Perl interpreter unless at
      # least the Windows system directory (usually C:\Windows\System32)
      # is still on the PATH. There is however no way to determine the
      # actual path on the current system without loading the Win32
      # module, so we just restore the original $ENV{PATH} here.

  (cherry picked from commit eaff586aa6444fb20654ed863b7ff35e136737e8)
* [perl #122669] Don’t taint at compile time | Father Chrysostomos | 2014-12-27 | 1 | -1/+9
      #!perl -T
      # tainted constant
      use constant K=>$^X;
      # Just reading the constant for the sake of folding can enable
      # taintedness at compile time.
      0 if K;
      # Taintedness is still on when the ‘strict.pm’ SV is created, so
      # require croaks on it (‘Insecure dependency’).
      use strict;

  The fix is simply not to propagate taintedness at compile time. Hence, the value of K will still be tainted at run time (require(K) croaks), but just reading the value of K at compile time won’t taint subsequent string literals (or barewords treated as strings).

  ‘Compile time’ here is relative: Taintedness still wafts about as usual when BEGIN blocks are executed, because code is actually running. It’s when code is being parsed that propagation is disabled.

  The reason taint propagation could span across statements at compile time was that *execution* of a new statement resets taintedness, whereas parsing is oblivious to it.

  (cherry picked from commit 64ff300be0f7714585466af5bb87b2e37db5082a)
* Fix crash with lex subs under -d | Father Chrysostomos | 2014-12-27 | 1 | -1/+40
  (cherry picked from commit 9d8e4b9b32800eb499d83442ce8bbe6639773936)
* Fix crash when lex subs are used for overload | Father Chrysostomos | 2014-12-27 | 1 | -1/+11
  (cherry picked from commit 56117e3ef4ef2c7986c400f86f321f22f261571a)
* Fix crash when lex subs are used for AUTOLOAD | Father Chrysostomos | 2014-12-27 | 1 | -1/+19
  (cherry picked from commit 18691622911f2e18df42a5a98ea4c42386f4e558)
* state.t: Improve test for #123029 | Father Chrysostomos | 2014-12-27 | 1 | -1/+1
  This version fails in 5.20.1 whether COW is enabled or not.

  (cherry picked from commit a4f1ca6eb9658c4d589c98787f06e1851909c7d5)
  (cherry picked from commit 029988317d165cb2c0c7f73581bb70d358c56458)
* [perl #123029]: add regression test | Aaron Crane | 2014-12-27 | 1 | -1/+9
  This bug was fixed in c0683843e9299db25f354e2c8c90faa7614950d1.

  (cherry picked from commit c4a33ecd3009146ea545628e3014a22c637b6bb1)
  (cherry picked from commit 60eaf23b013ea62d010333c18283f269a49e6e3b)