path: root/regcomp.c
Commit message (Author, Date, Files changed, Lines -removed/+added)
* fix #131649 - extended charclass can trigger assert (Yves Orton, 2018-11-05, 1 file, -10/+18)
  The extended charclass parser makes some assumptions during the first pass which are only true on well-structured input, and it does not properly catch various errors. Later on, the code assumes that things the first pass lets through are valid, when in fact they should trigger errors.
  (cherry picked from commit 19a498a461d7c81ae3507c450953d1148efecf4f)
* regcomp.c: Convert some strchr to memchr (Karl Williamson, 2018-11-05, 1 file, -4/+7)
  This allows things to work properly in the face of embedded NULs. See the branch merge message for more information.
  (cherry picked from commit 43b2f4ef399e2fd7240b4eeb0658686ad95f8e62)
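A minimal C sketch of why strchr() misbehaves on a pattern buffer that contains an embedded NUL, while memchr() does not. The helpers `find_byte` and `find_byte_cstr` are hypothetical illustrations, not code from regcomp.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* memchr() searches exactly len bytes, so an embedded NUL in the
 * pattern does not end the search early. Returns an offset or -1. */
static long find_byte(const char *buf, size_t len, char c)
{
    assert(buf != NULL);
    const char *hit = memchr(buf, c, len);
    return hit ? (long)(hit - buf) : -1L;
}

/* strchr() treats buf as a NUL-terminated C string, so it stops at the
 * first NUL and never sees the bytes after it. Returns an offset or -1. */
static long find_byte_cstr(const char *buf, char c)
{
    assert(buf != NULL);
    const char *hit = strchr(buf, c);
    return hit ? (long)(hit - buf) : -1L;
}
```

Given the three bytes `a`, NUL, `{`, `find_byte` locates the brace at offset 2, while `find_byte_cstr` gives up at the NUL and reports -1.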
* PATCH: [perl #133423] for 5.26 maint (Karl Williamson, 2018-11-05, 1 file, -1/+0)
* PATCH: [perl #132055] Assertion failure (Karl Williamson, 2018-03-24, 1 file, -0/+10)
  This checks for, and aborts if it finds, control characters in a supposed Unicode property name. Code further along could not handle these. This also fixes #132553 and #132658.
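The check described above can be sketched as a scan that rejects a candidate property name as soon as it sees a control character. This is a minimal sketch assuming a plain byte string and the C library's `iscntrl()`; `property_name_ok` is a hypothetical name, and the real regcomp.c check may classify characters differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator: refuse a supposed property name that contains
 * any control character, before later parsing code ever sees it. */
static bool property_name_ok(const char *name, size_t len)
{
    assert(name != NULL);
    for (size_t i = 0; i < len; i++) {
        if (iscntrl((unsigned char)name[i]))
            return false;   /* abort: not a usable property name */
    }
    return true;
}
```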
* (perl #132227) restart a node if we change to uni rules within the node and encounter a sharp S (Karl Williamson, 2018-03-23, 1 file, -0/+12)
  This could lead to a buffer overflow.
  (cherry picked from commit a02c70e35d1313a5f4e245e8f863c810e991172d)
* fix #132017 - OPFAIL insert needs to set flags to 0 (Yves Orton, 2018-03-19, 1 file, -1/+5)
  Why reginsert() doesn't do this itself, I don't know.
  (cherry picked from commit 4dc12118f61b997fbd030230665b46e7c40f32d6)
* prevent integer overflow when compiling a regexp (Tony Cook, 2018-03-12, 1 file, -2/+6)
  Fixes [perl #131893].
  (cherry picked from commit 6c4f4eb174d1e2e9f874786123a699d11ae741f9)
* if an SV IsCOW_shared_hash then we can assume it has a null at the end (Yves Orton, 2018-03-02, 1 file, -1/+1)
  (cherry picked from commit f1d945b85ac2d18ddd1ed2e1d4f72011246d905a)
* perl #132892: avoid leak by mortalizing temporary copy of pattern (Yves Orton, 2018-03-02, 1 file, -2/+2)
  (cherry picked from commit 910a6a8be166fb3780dcd2520e3526e537383ef2)
* PATCH: [perl #131598] (Karl Williamson, 2017-09-10, 1 file, -2/+4)
  The cause of this is that the vFAIL macro uses RExC_parse, and that variable has just been changed in preparation for code after the vFAIL. The solution is to not change RExC_parse until after the vFAIL. This is a case where the macro hides stuff that can bite you.
  (cherry picked from commit 2be4edede4ae226e2eebd4eff28cedd2041f300f)
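The hazard is that a macro can read a variable the call site never mentions. The sketch below models it with a hypothetical `REPORT_ERROR` macro that, like vFAIL, takes its error position from the parser's current pointer. Unlike vFAIL it records the offset instead of croaking, and `struct parser` and both functions are illustrative assumptions, not regcomp.c code.

```c
#include <assert.h>

/* Hypothetical parser state; `parse` plays the role of RExC_parse. */
struct parser {
    const char *start;
    const char *parse;      /* current position in the pattern */
    long err_offset;        /* where the last error was reported */
};

/* Like vFAIL, this silently reads p->parse to decide where the error
 * points, so changing p->parse beforehand moves the reported position. */
#define REPORT_ERROR(p) ((p)->err_offset = (long)((p)->parse - (p)->start))

/* Buggy order: advance in preparation for later code, then report. */
static void fail_after_advance(struct parser *p)
{
    p->parse++;
    REPORT_ERROR(p);
}

/* Fixed order: report first, and only then change the position. */
static void fail_before_advance(struct parser *p)
{
    assert(p->parse >= p->start);
    REPORT_ERROR(p);
    p->parse++;
}
```

Starting at offset 1 of `"ab"`, the buggy order reports offset 2 and the fixed order reports offset 1.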
* regcomp [perl #131582] (Karl Williamson, 2017-09-10, 1 file, -0/+1)
  (cherry picked from commit 96c83ed78aeea1a0496dd2b2d935869a822dc8a5)
* Resolve Perl #131522: Spurious "Assuming NOT a POSIX class" warning (Yves Orton, 2017-09-07, 1 file, -12/+18)
  (cherry picked from commit bab0f8e933b383b6bef406d79c2da340bbcded33)
* Workaround for GNU Autoconf unescaped left brace (Karl Williamson, 2017-04-17, 1 file, -2/+22)
  See [perl #130497]
  GNU Autoconf depends on Perl, and will not work on Blead (and the forthcoming Perl 5.26), due to a single unescaped '{', that has previously been deprecated and is now fatal. A patch for it has been in the Autoconf repository since early 2013, but there has not been a release since before then.
  Because this is depended on by so much code, and because it is simpler than trying to revert to making the fatality merely deprecated, this patch simply changes perl to not die when compiled with the exact pattern that trips up Autoconf. Thus Autoconf can continue to work, but any other patterns that use the now illegal construct will continue to die. If other code uses the exact pattern, they too will not die, but the deprecation message continues to get raised. The use of the left brace in this particular pattern is not one where we envision using the construct to mean something else, so a deprecation is suitable for the foreseeable future.
* update size after Renew (Hugo van der Sanden, 2017-03-15, 1 file, -4/+6)
  RT #130841
  In general code, change this idiom:
      PL_foo_max += size;
      Renew(PL_foo, PL_foo_max, foo_t);
  to
      Renew(PL_foo, PL_foo_max + size, foo_t);
      PL_foo_max += size;
  so that if Renew dies, PL_foo_max won't be left hanging.
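The same ordering rule can be shown in plain C, where a failed allocation returns NULL instead of dying. This is a sketch, not Perl's Renew: `struct growable`, `grow`, and `failing_realloc` are hypothetical, and the allocator is passed in only so the failure path can be exercised. Overflow of `new_max * sizeof` is not checked here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array standing in for PL_foo / PL_foo_max. */
struct growable {
    int *items;
    size_t max;     /* number of slots currently allocated */
};

typedef void *(*realloc_fn)(void *, size_t);

/* Grow by `size` slots. The larger size goes to the allocator first, and
 * `max` is updated only after the allocation succeeds, so a failure
 * leaves `max` still describing the old buffer. */
static bool grow(struct growable *g, size_t size, realloc_fn alloc)
{
    assert(g != NULL);
    size_t new_max = g->max + size;
    void *p = alloc(g->items, new_max * sizeof *g->items);
    if (p == NULL)
        return false;       /* items and max are still consistent */
    g->items = p;
    g->max = new_max;       /* size updated only after success */
    return true;
}

/* Allocator that always fails, for exercising the error path. */
static void *failing_realloc(void *ptr, size_t n)
{
    (void)ptr;
    (void)n;
    return NULL;
}
```

Passing the standard `realloc` grows the array normally. Passing `failing_realloc` leaves `max` unchanged, which is the property the commit wants when Renew dies.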
* (perl #130822) fix an AV leak in Perl_reg_named_buff_fetch (Tony Cook, 2017-02-21, 1 file, -4/+1)
  Originally noted as a scoping issue by Andy Lester.
* Revert "Deprecating the use of C<< \cI<X> >> to specify a printable character." (Sawyer X, 2017-02-12, 1 file, -15/+6)
  This reverts commit bfdc8cd3d5a81ab176f7d530d2e692897463c97d.
* Change av_foo_nomg() name (Karl Williamson, 2017-02-11, 1 file, -17/+17)
  These names sparked some controversy when created:
  http://www.nntp.perl.org/group/perl.perl5.porters/2016/03/msg235216.html
  I looked through existing code for paradigms to follow, and found some occurrences of 'skip_foo_mg'. So this commit changes the names to be
      av_top_index_skip_len_mg()
      av_tindex_skip_len_mg()
  This is explicit about the type of magic that is ignored, and will still be valid if another type of magic ever gets added.
* Coverity #155950: pRExC->code_blocks is blindly derefed (Jarkko Hietaniemi, 2017-02-10, 1 file, -0/+2)
  This happens even though the code in Perl_re_op_compile that calls S_pat_upgrade_to_utf8 tests code_blocks for NULLness.
* regcomp.c: Fix so will compile on C++11 (Karl Williamson, 2017-02-09, 1 file, -1/+1)
  See 147e38468b8279e26a0ca11e4efd8492016f2702 for complete explanation.
* [perl #129061] CURLYX nodes can be studied more than once (Hugo van der Sanden, 2017-02-06, 1 file, -3/+9)
  study_chunk() for CURLYX is used to set flags on the linked WHILEM node to say it is the whilem_c'th of whilem_seen. However, it assumes each CURLYX can be studied only once, which is not the case - there are various cases such as GOSUB which call study_chunk() recursively on already-visited parts of the program.
  Storing the wrong index can cause the super-linear cache handling in regmatch() to read/write the byte after the end of poscache.
  Also reported in [perl #129281].
* avoid double-freeing regex code blocks (David Mitchell, 2017-02-01, 1 file, -10/+10)
  RT #130650 heap-use-after-free in S_free_codeblocks
  When compiling qr/(?{...})/, a reg_code_blocks structure is allocated and various SVs are attached to it. Initially this is set to be freed via a destructor on the savestack, in case of early dying. Later the structure is attached to the compiling regex, and a boolean flag in the structure, 'attached', is set to true to show that the destructor no longer needs to free the struct.
  However, it is possible to get three orders of destruction:
    1) allocate, push destructor, die early
    2) allocate, push destructor, attach to regex, die
    3) allocate, push destructor, attach to regex, succeed
  In 2, the regex is freed (via the savestack) before the destructor is called. In 3, the destructor is called, then later the regex is freed.
  It turns out perl can't currently handle case 2:
      qr'(?{})\6'
  Fix this by turning the 'attached' boolean field into an integer refcount, then keep a count of whether the struct is referenced from the savestack and/or the regex. Since it normally has a value of 1 or 2, it's similar to a boolean flag, but crucially it no longer just indicates that the regex has a pointer to it ('attached'), but that at least one of the savestack and regex have a pointer to it. So order of freeing no longer matters.
  I also updated S_free_codeblocks() so that it nulls out SV pointers in the reg_code_blocks struct before freeing them. This is generally good practice to avoid double frees, although it is probably not needed at the moment.
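The refcount scheme above can be sketched in a few lines of plain C. This is a simplified model under stated assumptions, not the real reg_code_blocks: `struct code_blocks`, `alloc_code_blocks`, `attach_to_regex`, `release_code_blocks`, and the `frees` counter are hypothetical, and an `int` array stands in for the attached SVs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for reg_code_blocks: a refcount replaces the
 * old boolean 'attached' flag. */
struct code_blocks {
    int refcnt;     /* 1 = savestack only, 2 = savestack and regex */
    int *svs;       /* stands in for the attached SVs */
};

static int frees;   /* how many structs were really freed (for testing) */

static struct code_blocks *alloc_code_blocks(void)
{
    struct code_blocks *cb = calloc(1, sizeof *cb);
    if (cb == NULL)
        abort();
    cb->refcnt = 1;                 /* reference held by the savestack destructor */
    cb->svs = malloc(sizeof *cb->svs);
    return cb;
}

/* The regex takes its own reference when the struct is attached. */
static void attach_to_regex(struct code_blocks *cb)
{
    cb->refcnt++;
}

/* Each owner calls this when it lets go; whichever runs last frees,
 * so the order in which the savestack and the regex release is irrelevant. */
static void release_code_blocks(struct code_blocks *cb)
{
    assert(cb->refcnt > 0);
    if (--cb->refcnt > 0)
        return;
    int *svs = cb->svs;
    cb->svs = NULL;                 /* null the pointer before freeing it */
    free(svs);
    free(cb);
    frees++;
}
```

Case 1 (die early) is a single release; cases 2 and 3 are two releases in either order, and in every case the struct is freed exactly once.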
* (perl #130684) allocate enough space for the extra 'x' (Tony Cook, 2017-02-01, 1 file, -1/+1)
  77c8f26370dcc0e added support for a doubled x regexp flag, and ensured the doubled flag was passed to the qr// created by S_compile_runtime_code(). Unfortunately, it didn't ensure enough space was allocated for that extra 'x'.
* mention PASS2 in reginsert() example (Hugo van der Sanden, 2017-01-29, 1 file, -1/+2)
  As per bb78386f13.
* assert that the RExC_recurse data structure points at a valid GOSUB (Yves Orton, 2017-01-28, 1 file, -0/+12)
  This assert will fail if someone adds code that optimises away a GOSUB call. At which point they will see the comment and know what to do.
* only mess with NEXT_OFF() when we are in PASS2 (Yves Orton, 2017-01-27, 1 file, -2/+2)
  In 31fc93954d1f379c7a49889d91436ce99818e1f6 I added code that would modify NEXT_OFF() when we were not in PASS2, when we should not do so. Strangely this did not segfault when I tested, but this fix is required.
* add some details to the docs for S_reginsert() (Yves Orton, 2017-01-27, 1 file, -0/+7)
  Had these docs been here I would have saved some time debugging. So save the next guy from the same trouble... (with my memory *I* might even be the /next guy/. Sigh.)
* fix RT #130561 - recursion and optimising away impossible quantifiers are not friends (Yves Orton, 2017-01-27, 1 file, -11/+3)
  Instead of optimising away impossible quantifiers like (foo){1,0}, treat them as unquantified, and guard them with an OPFAIL. Thus /(foo){1,0}/ is treated the same as /(*FAIL)(foo)/. This is important in patterns like /(foo){1,0}|(?1)/ where the (?1) needs to be able to recurse into the (foo) even though the (foo){1,0} can never match. It also resolves various issues (SEGVs) with patterns like /((?1)){1,0}/.
  This patch would have been easier if S_reginsert() documented that it is the caller's responsibility to properly set up the NEXT_OFF() of the inserted node (if the node has a NEXT_OFF()).
* rename opnd to operand to save my sanity (Yves Orton, 2017-01-27, 1 file, -5/+5)
* better handle freeing of code blocks in /(?{...})/ (David Mitchell, 2017-01-24, 1 file, -107/+121)
  [perl #129140] attempting double-free
  This fixes some leaks and double frees in regexes which contain code blocks.
  During compilation, an array of struct reg_code_block is malloced. Initially this is just attached to the RExC_state_t struct local var in Perl_re_op_compile(). Later it may be attached to a pattern. The difficulty is ensuring that the array is free()d (and the ref counts contained within decremented) should compilation croak early, while avoiding double frees once the array has been attached to a regex.
  The current mechanism of making the array the PVX of an SV is a bit flaky, as the array can be realloc()ed, and code can be re-entered when utf8 is detected mid-compilation.
  This commit changes the array into separately malloced head and body. The body contains the actual array, and can be realloced. The head contains a pointer to the array, plus size and an 'attached' boolean. This indicates whether the struct has been attached to a regex, and is effectively a 1-bit ref count. Whenever a head is allocated, SAVEDESTRUCTOR_X() is used to call S_free_codeblocks() to free the head and body on scope exit. This function skips the freeing if 'attached' is true, and this flag is set only at the point where the head gets attached to the regex.
  In one way this complicates the code, since the num_code_blocks field is now not always available (it's only there if a head has been allocated), but mainly it simplifies, since all the book-keeping is now done in the two new static functions S_alloc_code_blocks() and S_free_codeblocks().
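The head/body split described above can be sketched as follows. The point is that a destructor holding the head pointer stays valid however often the body is realloc()ed. All names here (`struct cb_head`, `struct cb_entry`, `free_codeblocks`, `push_entry`) are hypothetical simplifications, and the boolean `attached` reflects this commit's design, which a later commit replaced with a refcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical body element: the real struct holds more fields. */
struct cb_entry { int start, end; };

/* The head stays at a fixed address, so a scope-exit destructor can
 * keep a pointer to it, while the body array may be realloc()ed. */
struct cb_head {
    struct cb_entry *body;   /* realloc()able array */
    int count;
    bool attached;           /* true once a regex owns the head */
};

/* Destructor run on scope exit: frees head and body unless a regex took over. */
static void free_codeblocks(struct cb_head *h)
{
    assert(h != NULL);
    if (h->attached)
        return;
    free(h->body);
    free(h);
}

/* Append one entry; only the body moves, the head pointer stays valid. */
static bool push_entry(struct cb_head *h, int start, int end)
{
    struct cb_entry *nb =
        realloc(h->body, (size_t)(h->count + 1) * sizeof *h->body);
    if (nb == NULL)
        return false;
    nb[h->count].start = start;
    nb[h->count].end = end;
    h->body = nb;
    h->count++;
    return true;
}
```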
* Fix bug with a digit range under re 'strict' (Karl Williamson, 2017-01-19, 1 file, -39/+71)
  "use re 'strict'" is supposed to warn if a range whose start and end points are digits aren't from the same group of 10. For example, if you mix Bengali and Thai digits. It wasn't working properly for 5 groups of mathematical digits starting at U+1D7E. This commit fixes that, and refactors the code to bail out as soon as it discovers that no warning is warranted, instead of doing unnecessary work.
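A sketch of the "same group of 10" test, assuming a small hand-written table of digit-zero code points rather than Perl's Unicode tables. The five mathematical digit runs are adjacent in Unicode, so knowing that both endpoints are digits is not enough; the sketch has to find which run each endpoint falls in. `digit_zeros`, `digit_group`, and `mixed_digit_range` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Code point of DIGIT ZERO for a few runs of ten decimal digits. */
static const unsigned long digit_zeros[] = {
    0x0030,   /* ASCII */
    0x09E6,   /* Bengali */
    0x0E50,   /* Thai */
    0x1D7CE,  /* MATHEMATICAL BOLD DIGIT ZERO */
    0x1D7D8,  /* MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO */
    0x1D7E2,  /* MATHEMATICAL SANS-SERIF DIGIT ZERO */
    0x1D7EC,  /* MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO */
    0x1D7F6,  /* MATHEMATICAL MONOSPACE DIGIT ZERO */
};

/* Index of the run containing cp, or -1 if cp is not a digit in the table. */
static int digit_group(unsigned long cp)
{
    for (size_t i = 0; i < sizeof digit_zeros / sizeof digit_zeros[0]; i++)
        if (cp >= digit_zeros[i] && cp < digit_zeros[i] + 10)
            return (int)i;
    return -1;
}

/* True only when both endpoints are digits from different runs.
 * Returns as soon as it knows no warning is warranted. */
static bool mixed_digit_range(unsigned long start, unsigned long end)
{
    int gs = digit_group(start);
    if (gs < 0)
        return false;
    int ge = digit_group(end);
    return ge >= 0 && ge != gs;
}
```

A range from MATHEMATICAL BOLD DIGIT ZERO to MATHEMATICAL DOUBLE-STRUCK DIGIT ONE is flagged, while a range within the double-struck run is not.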
* Deprecating the use of C<< \cI<X> >> to specify a printable character. (Abigail, 2017-01-16, 1 file, -6/+15)
  Starting in 5.14, we deprecated the use of "\cI<X>" when this results in a printable character. For instance, "\c:" is just a fancy way of writing "z". Starting in 5.28, this will be a fatal error.
  This also includes certain usage in regular expressions with the experimental (?[ ]) construct, or when "use re 'strict'" is in effect (also experimental).
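On ASCII platforms, "\cX" uppercases X and then flips bit 6 (XOR 0x40). The sketch below applies that rule to show why "\c:" produces "z" and which inputs yield a printable result. It assumes ASCII and the C locale, and `control_char` and `cx_is_deprecated` are hypothetical helpers, not Perl internals.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* ASCII rule for "\cX": uppercase X, then flip bit 6 (XOR 0x40). */
static int control_char(int x)
{
    assert(x >= 0 && x < 128);
    return toupper(x) ^ 0x40;
}

/* The deprecated forms are those whose result is printable
 * rather than a control character. */
static bool cx_is_deprecated(int x)
{
    return isprint(control_char(x)) != 0;
}
```

For example, `\cA` and `\ca` both give 0x01 and `\c?` gives DEL (0x7F), none of which is printable. `\c:` gives 0x3A ^ 0x40 = 0x7A, which is `z`.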
* Unescaped left braces in regular expressions will be fatal in 5.30. (Abigail, 2017-01-16, 1 file, -1/+1)
  In 5.26, some uses of unescaped left braces were made fatal; they have given a deprecation warning since 5.20. Due to an oversight, some cases were missed, and did not give a deprecation warning. They do now.
  This patch changes said deprecation warning to mention the Perl version in which the use of an unescaped left brace will be fatal (5.30).
  The patch also cleans up some unnecessary quotes inside a C<> construct in the discussion of this warning in perldiag.pod.
* Warn on unescaped /[]}]/ under re strict (Karl Williamson, 2017-01-13, 1 file, -0/+6)
  This commit generates a warning when the experimental 're strict' feature is in effect for unescaped '}' and ']' characters (in a regular expression pattern) that are interpreted literally. This brings the behavior of these more in line with ')' which croaks when it is taken literally.
  The problem with the existing behavior is that these characters may be metacharacters or they may be literals, depending on action at a distance. Not so with ')', which is always a metacharacter unless escaped.
  Ideally, all three of these characters should behave similarly, but it really is too late for that, except we can warn if the user has requested extra checking of their patterns with this experimental 're strict' feature.
* regcomp.c: Clarify comment. (Karl Williamson, 2017-01-13, 1 file, -1/+1)
* Add /xx regex pattern modifier (Karl Williamson, 2017-01-13, 1 file, -8/+19)
  This was first proposed in the thread starting at
  http://www.nntp.perl.org/group/perl.perl5.porters/2014/09/msg219394.html
* regcomp.c: Remove obsolete data structure element (Karl Williamson, 2017-01-13, 1 file, -6/+0)
  This was used for the removed feature of having the source in a different encoding.
* Rmv unused regex implementation structure element (Karl Williamson, 2017-01-12, 1 file, -9/+0)
* PATCH: [perl #130530]: HP-UX assertion failure (Karl Williamson, 2017-01-11, 1 file, -3/+2)
  This was introduced in a1a5ec35e6a3df0994b103aadb28a8c1a3a278da, and was due to a thinko on my part. Zefram figured it out.
  A macro evaluating to a string constant returns an instance of that constant. Compilers are free to collapse all instances into a single one (which saves space), or to have multiple copies. The code was assuming the former, and HP-UX cc doesn't.
  The passed size also was one byte larger than it should have been.
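Both pitfalls can be shown in portable C. Whether two occurrences of the same literal share storage is unspecified, so their contents should be compared rather than their addresses. In addition, `sizeof` on a literal counts the trailing NUL. The macro `MSG` and the helpers below are hypothetical illustrations, not the code that was fixed.

```c
#include <assert.h>
#include <string.h>

#define MSG "abc"   /* hypothetical macro expanding to a string literal */

/* MSG == MSG may be true on one compiler and false on another, because
 * merging identical literals is optional. Compare the text instead. */
static int same_text(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* sizeof includes the terminating NUL, so the text length is one less. */
static size_t msg_len(void)
{
    return sizeof(MSG) - 1;
}
```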
* Eliminate two unused variables detected by clang. (James E Keenan, 2017-01-06, 1 file, -3/+0)
  "warning: unused variable 'i' [-Wunused-variable]"
* Removed unused CHR_DIST macro from a second file (RT 130519). (James E Keenan, 2017-01-06, 1 file, -1/+0)
* regcomp.c: Use memEQ instead of looping an element at a time (Karl Williamson, 2017-01-05, 1 file, -13/+2)
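The change replaces a hand-written comparison loop with a single block comparison. The sketch assumes that Perl's memEQ() amounts to `memcmp(...) == 0` and defines a local stand-in named `MEM_EQ` so the block compiles without perl.h. `equal_loop` and `equal_mem` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for memEQ(), assumed here to be memcmp() == 0. */
#define MEM_EQ(a, b, n) (memcmp((a), (b), (n)) == 0)

/* Before: compare one element at a time. */
static bool equal_loop(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return false;
    return true;
}

/* After: one library call performs the same comparison. */
static bool equal_mem(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    return MEM_EQ(a, b, n);
}
```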
* Convert core to use toFOO_utf8_safe() (Karl Williamson, 2016-12-23, 1 file, -3/+5)
* For character case changing, create macros and use (Karl Williamson, 2016-12-23, 1 file, -1/+1)
  This creates several macros that future commits will use to provide a layer between the caller and the function.
* regcomp.c, mathoms.c: Convert to use preferred macro (Karl Williamson, 2016-12-23, 1 file, -2/+2)
  Better to use the macro than to directly call the function it wraps.
* Convert core (except toke.c) to use isFOO_utf8_safe() (Karl Williamson, 2016-12-23, 1 file, -3/+4)
  The previous commit added this feature; now this commit uses it in core. toke.c is deferred to the next commit to aid in possible future bisecting, because some of the changes there seem somewhat more likely to expose bugs.
* add sv_set_undef() API function (David Mitchell, 2016-11-24, 1 file, -1/+1)
  This function is equivalent to sv_setsv(sv, &PL_sv_undef), but more efficient.
  Also change the obvious places in the core to use the new idiom.
* Change white space to avoid C++ deprecation warning (Karl Williamson, 2016-11-18, 1 file, -116/+117)
  C++11 requires space between the end of a string literal and a macro, so that a feature can unambiguously be added to the language. Starting in g++ 6.2, the compiler emits a warning when there isn't a space (presumably so that future versions can support C++11). Unfortunately there are many such instances in the perl core. This commit fixes those, including those in ext/, but individual commits will be used for the other modules, those in dist/ and cpan/.
  This commit also inserts space at the end of a macro before a string literal, even though that is not deprecated, and removes useless "" literals following a macro (instead of inserting a blank). The result is easier to read, making the macro stand out, and be clearer as to the intention.
  Code and modules included with the Perl core need to be compilable using C++. This is so that perl can be embedded in C++ programs. (Actually, only the header files need to be so compilable, but it would be hard to test that just the headers are compilable.) So we need to accommodate changes to the C++ language.
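The whitespace rule is easiest to see with a format macro that expands to a string literal. This sketch uses the standard `PRIu64` from `<inttypes.h>` in place of Perl's own format macros. Written as `"%"PRIu64`, the source can be read by a C++11 compiler as a literal with a user-defined suffix, whereas the spaced form below is unambiguous in both C and C++. `format_u64` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* PRIu64 expands to a string literal (e.g. "llu" or "lu"). The spaces
 * around it keep the literals and the macro as separate tokens, and
 * adjacent literals are then concatenated into one format string. */
static int format_u64(char *buf, size_t len, uint64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, len, "value=%" PRIu64 "!", v);
}
```

Calling `format_u64(buf, sizeof buf, 42)` writes `value=42!` and returns 9.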
* Fix error message for unclosed \N{ in regcomp (Dagfinn Ilmari Mannsåker, 2016-11-14, 1 file, -3/+5)
  An unclosed \N{ that made it through to the regex engine rather than being handled by the lexer would erroneously trigger the error for "\N{NAME} must be resolved by the lexer".
  This separates the check for the missing trailing } and issues the correct error message for this.
* S_setup_longest(): SvTAIL() used where always 0 (David Mitchell, 2016-11-12, 1 file, -1/+5)
  SvTAIL() isn't set on an SV until fbm_compile() has been called, so there's no point testing it before calling fbm_compile().
* regcomp.c: document the trie common prefix logic (Yves Orton, 2016-10-27, 1 file, -0/+15)
  I wrote this code some time ago. It is somewhat of a state machine with some interesting implicit assumptions which took me a while to remember. While I do, it seems reasonable to document them so the next guy (maybe/probably me) doesn't have to think so hard.