path: root/regcomp.c
Commit message (Author, Age, Files, Lines)
* regcomp.c: White-space only (Karl Williamson, 2021-08-07, 1 file, -7/+7)
* regcomp.c: Add comment; fix comment (Karl Williamson, 2021-08-07, 1 file, -1/+46)
  The flagp parameter currently can only be used to pass values up, not down.
* regcomp.c: Initialize a variable (Karl Williamson, 2021-08-07, 1 file, -1/+1)
  to silence some compilers that were warning.
* regcomp.c: Save a value instead of re-calling fcn (Karl Williamson, 2021-08-07, 1 file, -2/+4)
  This variable will be used in future commits in more places, so compute it just once.
* regcomp.c: Add a clearer mnemonic (Karl Williamson, 2021-08-07, 1 file, -26/+30)
* regcomp.c: Move some code to within a block (Karl Williamson, 2021-08-07, 1 file, -3/+3)
  This code is irrelevant unless the condition of the block immediately before it is TRUE, so move it to within that block.
* regcomp.c: Consolidate duplicate code (Karl Williamson, 2021-08-07, 1 file, -17/+17)
* regcomp.c: S_optimize_regclass() return 0 if fail (Karl Williamson, 2021-08-07, 1 file, -7/+13)
  Based on a comment from @hvds, I think it better if this function return an impossible node value if it didn't find a node to use.
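A minimal C sketch of the "impossible value means failure" convention described in this entry. The node numbering and the names `NODE_NONE`, `NODE_ANYOF`, `NODE_EXACT`, `NODE_POSIXD`, `optimize_class_sketch()` and `choose_node_sketch()` are hypothetical; they are not Perl's regnode types or the real S_optimize_regclass() signature.

```c
#include <stdbool.h>

/* Hypothetical node numbering: 0 is reserved and never names a real node,
 * so it can double as a "nothing found" return value. */
enum { NODE_NONE = 0, NODE_ANYOF = 1, NODE_EXACT = 2, NODE_POSIXD = 3 };

/* Return a cheaper node type for a bracketed class, or NODE_NONE if no
 * optimization applies.  The two inputs are toy stand-ins for the class
 * contents the real code would inspect. */
static int optimize_class_sketch(unsigned n_code_points, bool is_single_posix_class)
{
    if (n_code_points == 1)
        return NODE_EXACT;
    if (is_single_posix_class)
        return NODE_POSIXD;
    return NODE_NONE;
}

/* Caller: a zero result unambiguously means "keep the general form". */
static int choose_node_sketch(unsigned n_code_points, bool is_single_posix_class)
{
    int op = optimize_class_sketch(n_code_points, is_single_posix_class);
    return op != NODE_NONE ? op : NODE_ANYOF;
}
```

Because no valid node in this numbering is 0, the caller can test the return value directly instead of needing a separate success flag.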
* regcomp.c: Add some branch predictors (Karl Williamson, 2021-08-07, 1 file, -2/+2)
* regcomp.c: Move some code out of unlikely #ifdef (Karl Williamson, 2021-08-07, 1 file, -4/+5)
  Spotted by Hugo van der Sanden. Moving the code out of the #ifdef caused it to actually be compiled, which exposed a typo.
* utf8.h: Add symbol for easing EBCDIC handling (Karl Williamson, 2021-08-07, 1 file, -5/+6)
  This is then used in regcomp.c to avoid an #ifdef EBCDIC.
* regcomp: Save '&' instrs by casting to U8 (Karl Williamson, 2021-07-30, 1 file, -1/+1)
* Create and use single_1bit_pos32() (Karl Williamson, 2021-07-30, 1 file, -10/+1)
  This moves the code from regcomp.c to inline.h that calculates the position of the lone set bit in a U32. This is in preparation for use by other call sites.
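The entry does not show how the position is computed. As one possible approach, the C sketch below (hypothetical name `single_1bit_pos32_sketch()`) uses the well-known de Bruijn multiply-and-lookup technique; it is not necessarily what inline.h does. It requires that exactly one bit of the input be set.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index (0..31) of the only set bit in 'word'. */
static unsigned single_1bit_pos32_sketch(uint32_t word)
{
    /* Bit position indexed by the top 5 bits of (word * 0x077CB531). */
    static const unsigned char debruijn_pos[32] = {
         0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
        31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
    };

    assert(word != 0 && (word & (word - 1)) == 0);   /* exactly one bit set */

    /* Multiplying by a power of two is a left shift, so each possible input
     * moves a different 5-bit window of the de Bruijn constant into the top
     * bits; the table maps that window back to the shift amount. */
    return debruijn_pos[(uint32_t)(word * UINT32_C(0x077CB531)) >> 27];
}
```

On GCC and Clang, `__builtin_ctz()` gives the same answer for a nonzero word; the table version is portable C.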
* regcomp.c: Use existing macro (Karl Williamson, 2021-07-25, 1 file, -1/+1)
  Don't reinvent the macro.
* Fix spelling: precede (Felipe Gasper, 2021-06-15, 1 file, -1/+1)
* regcomp.c: comments (Hugo van der Sanden, 2021-06-14, 1 file, -16/+14)
  Comment change suggestions from @hvds in PR #18835.
* regcomp.c: White-space only (Karl Williamson, 2021-06-14, 1 file, -512/+512)
  My attempt to insulate the year-old commits finally pushed as 77a6d54c0deb1165b37dcf11c21cd334ae2579bb and 403d7eb3e4320188571cf61b9dab62ff10799f49 from the leading-tab removal failed miserably. I spent a bunch of time sorting it all out, and this is the result.
* regcomp.c: Fix typo in comment (Karl Williamson, 2021-06-12, 1 file, -1/+1)
* Rename G_ARRAY to G_LIST; provide back-compat when not(PERL_CORE) (Paul "LeoNerd" Evans, 2021-06-02, 1 file, -1/+1)
* gh18770: stop scanning for substrs after *COMMIT (Hugo van der Sanden, 2021-06-01, 1 file, -6/+20)
  *ACCEPT already avoids this (because it is "ENDLIKE"), but gets a related fix to stop scanning for start class.
* regcomp.c: white-space; comments (Karl Williamson, 2021-05-31, 1 file, -268/+239)
* Base *.[ch] files: Replace leading tabs with blanks (Michael G Schwern, 2021-05-31, 1 file, -2615/+2615)
  This is a rebasing by @khw of part of GH #18792, which I needed to get in now to proceed with other commits. It also strips trailing white space from the affected files.
* regcomp.c: Extract code from a too-large-function (Karl Williamson, 2021-05-31, 1 file, -140/+191)
  S_regclass() is unwieldy. This commit splits it into two nearly equal-sized parts. More could be done.
* [gh 17847] data->pos_delta should stick at infinity (Hugo van der Sanden, 2021-05-31, 1 file, -0/+1)
  The expression we're about to add to data->pos_delta in this part of study_chunk() can be either positive or negative; however, while we apply an overflow check to avoid exceeding OPTIMIZE_INFTY, we were happily subtracting from it when the expression was negative, making it no longer infinite.
* [gh 17847] avoid overflow on delta in study_chunk (Hugo van der Sanden, 2021-05-31, 1 file, -2/+14)
  delta and pos_delta may hold OPTIMIZE_INFTY to represent infinity.
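The two gh 17847 fixes above amount to treating OPTIMIZE_INFTY as an absorbing value: adding to it never overflows, and subtracting from it never makes it finite again. A minimal C sketch of that rule, assuming a nonnegative `long` counter with `LONG_MAX` as the hypothetical sentinel `SKETCH_INFTY` (Perl's actual type and value for OPTIMIZE_INFTY may differ); `delta_add_sketch()` is likewise a hypothetical name:

```c
#include <limits.h>

/* Hypothetical stand-in for OPTIMIZE_INFTY: the largest representable
 * value is reserved to mean "unbounded". */
#define SKETCH_INFTY LONG_MAX

/* Add a possibly negative 'change' to a nonnegative 'delta'.
 * Infinity plus anything stays infinity, and a positive change that
 * would reach or pass the sentinel saturates to it. */
static long delta_add_sketch(long delta, long change)
{
    if (delta == SKETCH_INFTY || change == SKETCH_INFTY)
        return SKETCH_INFTY;              /* stick at infinity */
    if (change > 0 && delta >= SKETCH_INFTY - change)
        return SKETCH_INFTY;              /* would overflow or hit the sentinel */
    return delta + change;                /* ordinary finite result */
}
```

The first test is the one the "stick at infinity" fix adds conceptually: without it, a negative change would turn the sentinel into a large finite number.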
* [gh 17847] Include data->pos_delta in #if'd-out diagnostic (Hugo van der Sanden, 2021-05-31, 1 file, -2/+3)
* regcomp.c: Remove memory leak (Karl Williamson, 2021-02-28, 1 file, -0/+7)
  This fixes GH #18604. There was a path through the code where a particular SV did not get its reference count decremented. I did an audit of the function and came up with several other possibilities that are included in this commit.

  Further, there would be leaks for some instances of finding syntax errors in the input pattern, or when warnings are fatalized. Those would require mortalizing some SVs, but that is beyond the scope of this commit.
* Hide Perl_regcurly in the re extension (Craig A. Berry, 2021-02-15, 1 file, -6/+8)
  Otherwise a strict linker will fail to build the extension due to a multiply defined symbol. We used to do this, but it was removed in e513125ac7bdea1f for unknown reasons. The same commit also defined some macros inside the function that are used both inside and outside it, so put them where they can be seen regardless of whether we are defining the function itself.
* gh18515: fix special handling of specific split() patterns (Hugo van der Sanden, 2021-02-09, 1 file, -4/+8)
  Commit 122af31004 acted on the wrong assumption that NEXTOPER() and regnext() were equivalent, and in fixing a valgrind complaint tried to simplify code for detecting specific patterns for split() that merited special-case handling by making them all use regnext(). As a result, the special case /\s+/ was no longer correctly detected, resulting in a degree of pessimisation.

  This commit fixes that, and avoids reading via the calculated 'next' pointer except for the ops we need (in which cases we know it'll point to another regop). For the EXACT case (which we don't need), valgrind was correctly pointing out that it points to potentially uninitialized data.
* regcomp.c: White-space and comments (Karl Williamson, 2021-01-20, 1 file, -12/+15)
* Allow blanks within and adjacent to {...} constructs (Karl Williamson, 2021-01-20, 1 file, -24/+77)
  This was the consensus in http://nntp.perl.org/group/perl.perl5.porters/258489
* perlre: Note the other forms of \k<name> (Karl Williamson, 2021-01-20, 1 file, -2/+2)
  Not all three synonyms were documented. This also fixes up related comments in regcomp.c to correspond.
* regcomp.c: Further refactor \g (Karl Williamson, 2021-01-20, 1 file, -14/+15)
  By changing a bool into a pointer, we can avoid some work and prepare for a future commit.
* regcomp.c: Refactor portions of \g parsing (Karl Williamson, 2021-01-20, 1 file, -13/+39)
  This moves the finding of the matching '}' for \g{ to earlier, and creates a temporary to point to the current position in the parse. This makes it easier to deal with backtracking: we haven't advanced the main parse pointer, so we don't have to remember how far we advanced. This will prove advantageous in a future commit.
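A C sketch of the "find the closing brace first, then parse with a temporary pointer" approach this entry describes. The function name `parse_g_brace_sketch()`, the `gref_sketch` struct, and the accepted syntax (`{N}`, `{-N}`, or a bare name, with no overflow or name-character validation) are simplifying assumptions, not the actual regcomp.c code:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Result of parsing the braced part of a hypothetical \g{...}. */
typedef struct {
    bool is_number;     /* true for {N} or {-N} */
    long number;        /* group number; negative means relative */
    const char *name;   /* start of the name when !is_number */
    size_t name_len;
} gref_sketch;

/* '*pp' points just after "\g".  The matching '}' is located first and a
 * local cursor walks the contents; '*pp' is advanced only on success. */
static bool parse_g_brace_sketch(const char **pp, const char *end, gref_sketch *out)
{
    const char *s = *pp;
    if (s >= end || *s != '{')
        return false;

    const char *close = memchr(s + 1, '}', (size_t)(end - (s + 1)));
    if (close == NULL)
        return false;

    const char *cur = s + 1;            /* temporary parse position */
    bool neg = false;
    if (cur < close && *cur == '-') {
        neg = true;
        cur++;
    }

    if (cur < close && isdigit((unsigned char)*cur)) {
        long n = 0;
        while (cur < close && isdigit((unsigned char)*cur))
            n = n * 10 + (*cur++ - '0');
        if (cur != close || n == 0)     /* trailing junk, or group 0 */
            return false;
        out->is_number = true;
        out->number = neg ? -n : n;
    }
    else {
        if (neg || cur == close)        /* "-name" or empty braces */
            return false;
        out->is_number = false;
        out->name = cur;
        out->name_len = (size_t)(close - cur);
    }

    *pp = close + 1;                    /* commit: step past the '}' */
    return true;
}
```

Because `*pp` is written only on the success path, a caller that needs to back up simply retries from its original position; there is nothing to undo.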
* regcomp.c: Move initialization into declaration (Karl Williamson, 2021-01-20, 1 file, -2/+2)
  This is considered better practice.
* regcomp.c: Slight simplification (Karl Williamson, 2021-01-20, 1 file, -1/+1)
  Rather than tracking how far we have advanced in parsing when we have to back up, use the already-existing checkpoint position. This results in slightly more maintainable code that a future commit will take advantage of.
* Allow empty lower bound in /{,n}/ (Karl Williamson, 2021-01-20, 1 file, -56/+11)
  This change has been planned for a long time, bringing Perl into parity with similar languages, but it took many deprecation cycles to reach the point where it could safely go in.

  This fixes GH #18264.
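A minimal C sketch of recognizing `{m}`, `{m,}`, `{m,n}` and the newly allowed `{,n}` (treated as `{0,n}`), with an optional out-parameter so the caller need not reparse the bounds, which is the approach the "Revamp regcurly()" entry describes. The names, the `-1` marker for an unbounded maximum, and the omission of blank handling, overflow checks and m > n checking are assumptions of this sketch:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_UNBOUNDED (-1L)   /* hypothetical marker: no upper bound */

typedef struct { long min; long max; } curly_bounds_sketch;

/* Read a run of decimal digits (no overflow check); return how many were read. */
static size_t scan_digits_sketch(const char *s, const char *end, long *val)
{
    size_t n = 0;
    *val = 0;
    while (s + n < end && isdigit((unsigned char)s[n])) {
        *val = *val * 10 + (s[n] - '0');
        n++;
    }
    return n;
}

/* If 's' starts a {m}, {m,}, {m,n} or {,n} quantifier, return a pointer just
 * past the closing '}' and, if 'out' is non-NULL, store the bounds there.
 * Otherwise return NULL so the caller can treat '{' as a literal. */
static const char *regcurly_sketch(const char *s, const char *end,
                                   curly_bounds_sketch *out)
{
    long min = 0, max = 0;
    size_t min_len, max_len = 0;
    bool has_comma;

    if (s >= end || *s != '{')
        return NULL;
    s++;
    min_len = scan_digits_sketch(s, end, &min);
    s += min_len;
    has_comma = (s < end && *s == ',');
    if (has_comma) {
        s++;
        max_len = scan_digits_sketch(s, end, &max);
        s += max_len;
    }
    if (s >= end || *s != '}')
        return NULL;
    if (min_len == 0 && max_len == 0)   /* "{}" and "{,}" are not quantifiers */
        return NULL;

    if (!has_comma)
        max = min;                      /* {m} */
    else if (max_len == 0)
        max = SKETCH_UNBOUNDED;         /* {m,} */
    /* {,n} keeps min at 0; {m,n} needs no adjustment */

    if (out != NULL) {
        out->min = min;
        out->max = max;
    }
    return s + 1;
}
```

Passing `NULL` for `out` gives a pure "is this a quantifier?" check, while a caller like regpiece() can pass a struct and reuse the bounds directly.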
* Point to error in malformed /x{y,z}/ (Karl Williamson, 2021-01-20, 1 file, -2/+2)
  Prior to this commit, the error message for a curly quantifier with an error in its bounds pointed to the left brace. Now it points to the first bound that has a problem.
* Revamp regcurly(), regpiece() use of it (Karl Williamson, 2021-01-20, 1 file, -66/+159)
  This commit copies portions of new_regcurly(), which has been around since 5.28, into plain regcurly(), as a baby step in preparation for converting entirely to the new one. These functions are used for parsing {m,n} quantifiers. Future commits will add capabilities not available using the old version.

  The commit adds an optional parameter to return to the caller information it gleans during parsing. regpiece() is changed by this commit to use this information, instead of itself reparsing the input.

  Part of the reason for this commit is that changes are planned soon to what is legal syntax. With this commit in place, those changes only have to be done once.

  This commit also extracts into a function the calculation of the quantifier bounds. This allows the logic for that to be done in one place instead of two.
* regcomp.c: Change names of 2 macros for mnemonics (Karl Williamson, 2021-01-20, 1 file, -2626/+2627)
  The new names are more understandable to me. This also adds a second parameter to one macro, which is unused until the next commit in the series.
* style: Detabify indentation of the C code maintained by the core. (Michael G. Schwern, 2021-01-17, 1 file, -2631/+2631)
  This just detabifies to get rid of the mixed tab/space indentation.

  Applying consistent indentation and dealing with other tabs are another issue.

  Done with `expand -i`.

    * vutil.* left alone, it's part of version.
    * Left regen managed files alone for now.
* Don't define Perl_regcurly in re extension (Craig A. Berry, 2020-12-24, 1 file, -1/+2)
  This makes the linker have to decide (or guess) which of the identically-named symbols to include. The VMS linker refuses and throws a multiply-defined symbol error.
* Remove empty "#ifdef"s (Tom Hukins, 2020-12-08, 1 file, -4/+0)
* Restrict scope/Shorten some very long macro names (Karl Williamson, 2020-11-22, 1 file, -11/+0)
  The names were intended to force people to not use them outside their intended scopes. But by restricting those scopes in the first place, we don't need such unwieldy names.
* Move regcurly to regcomp.c (from inline.h) (Karl Williamson, 2020-11-18, 1 file, -0/+24)
  This function is called only at compile time; experience has shown that compile-time operations are not time-critical. And future commits will lengthen it, making it not practically inlinable anyway.
* autodoc.pl: Specify scn for single-purpose files (Karl Williamson, 2020-11-06, 1 file, -1/+0)
  Many of the files in perl are for one thing only, and hence their embedded documentation will be for that one thing. By creating a hash here of them, those files don't have to worry about what section that documentation goes under, and so it can be completely changed without affecting them.
* don't croak when the \K follows the lookaround assertion (Tony Cook, 2020-11-04, 1 file, -23/+12)
  This also simplifies the flagging for these assertions: since this error is now the only thing using in_lookhead and in_lookbehind, they can be combined into a single in_lookaround.

  Rather than conditionally incrementing and decrementing as we recurse into S_reg, I simply save the value of in_lookaround and restore it before returning. Some unsuccessful or restart paths don't do the restore, but they either result in a croak(), or a restart which reinitialises in_lookaround anyway.

  Also added tests to ensure that all the different zero-width assertions with content trigger the error.
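A toy C sketch of the save-and-restore idea described in this entry. Here the flag is saved before descending into each group and restored once the group is parsed, a slight variant of restoring inside the callee, so a `\K` after a closed lookaround is accepted while one inside is rejected. The toy grammar and all names are hypothetical; only the flag name follows the commit message.

```c
#include <stdbool.h>
#include <string.h>

/* Toy compiler state; only the flag name mirrors the commit message. */
typedef struct {
    bool in_lookaround;
    bool error;            /* set if \K is seen inside a lookaround */
} toy_state;

/* Parse until an unmatched ')' or end of string.  Recognized: "(?=", "(?!",
 * "(?<=", "(?<!" (lookarounds), "(" (plain group), ")" and backslash escapes. */
static const char *toy_parse(toy_state *st, const char *p)
{
    while (*p != '\0' && *p != ')') {
        if (p[0] == '\\' && p[1] != '\0') {
            if (p[1] == 'K' && st->in_lookaround)
                st->error = true;
            p += 2;
        }
        else if (*p == '(') {
            bool saved = st->in_lookaround;   /* save before descending */
            size_t skip = 1;

            if (strncmp(p, "(?=", 3) == 0 || strncmp(p, "(?!", 3) == 0) {
                st->in_lookaround = true;
                skip = 3;
            }
            else if (strncmp(p, "(?<=", 4) == 0 || strncmp(p, "(?<!", 4) == 0) {
                st->in_lookaround = true;
                skip = 4;
            }
            p = toy_parse(st, p + skip);
            if (*p == ')')
                p++;
            st->in_lookaround = saved;        /* restore once the group is done */
        }
        else {
            p++;
        }
    }
    return p;
}

/* True if every \K in 'pattern' lies outside all lookarounds. */
static bool toy_k_is_legal(const char *pattern)
{
    toy_state st = { false, false };
    toy_parse(&st, pattern);
    return !st.error;
}
```

Unlike an increment/decrement counter, a saved copy cannot drift out of balance if some path forgets its matching decrement.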
* Fix GH #17278 (Karl Williamson, 2020-10-23, 1 file, -5/+10)
  This was an assertion failure in regexec.c under rare circumstances. A reduction of the fuzzed test case is now in pat_advanced.t.

  The root cause of this was that the pattern being compiled was encoded in UTF-8 and 'use locale' was in effect, equivalent to the /l charset, and then the charset was reset inside the pattern to /d. But /d in a UTF-8 pattern is illegal, hence the later assertion failure.

  The solution is to reset instead to /u when the pattern is UTF-8.
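The fix reduces to a small decision rule. A C sketch of it; the enum and function names are hypothetical, and the sketch covers only the choice of charset, not how regcomp.c stores it:

```c
#include <stdbool.h>

/* Hypothetical names for the /d, /l, /u and /a character-set modifiers. */
typedef enum {
    CHARSET_D,   /* "depends": not allowed for a UTF-8 pattern */
    CHARSET_L,   /* locale */
    CHARSET_U,   /* Unicode */
    CHARSET_A    /* ASCII-restricted */
} charset_sketch;

/* Charset actually applied when a pattern switches charsets partway
 * through: a request for /d in a UTF-8 pattern becomes /u. */
static charset_sketch effective_charset_sketch(charset_sketch requested,
                                               bool pattern_is_utf8)
{
    if (requested == CHARSET_D && pattern_is_utf8)
        return CHARSET_U;
    return requested;
}
```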
* perlapi: Add markup (Karl Williamson, 2020-10-22, 1 file, -1/+1)
* regcomp.c: Do some extra folding (Karl Williamson, 2020-10-16, 1 file, -4/+19)
  Generally we have to wait until runtime to do folding for regnodes that are locale dependent, because we don't know what the locale at runtime will be, and hence what the folds will be.

  But UTF-8 locales all have the same folding behavior, no matter what the locale is, with the exception of two fold pairs in Turkish. (Lithuanian too, but Perl doesn't support that language's special folding rules.) UTF-8 is the only locale type that Perl supports that can represent code points above 255. Therefore we do know at compile time what the above-255 folds are (again excepting the two in Turkish), and so we can do the folding then. But only if both the components are above 255. There are a few folds that cross the 255/256 boundary, and they must be deferred.

  However, there are two instances where three characters fold together, two of them above 255 and the third not. That the two high ones are equivalent under /i is known at compile time, and so that equivalence can be stated then.
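A C sketch of the compile-time test described in this entry. The function names are hypothetical, and treating U+0130 and U+0131 as the only Turkish-sensitive code points above 255 is an assumption of this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Turkish-sensitive code points above 255: U+0130 LATIN CAPITAL
 * LETTER I WITH DOT ABOVE and U+0131 LATIN SMALL LETTER DOTLESS I. */
static bool is_turkish_special_sketch(uint32_t cp)
{
    return cp == 0x130 || cp == 0x131;
}

/* Under /l, can the /i equivalence of code points a and b be settled at
 * compile time?  Only a UTF-8 locale can hold code points above 255, and
 * UTF-8 locales fold those the same way apart from the Turkish cases. */
static bool fold_known_at_compile_time_sketch(uint32_t a, uint32_t b)
{
    if (a <= 255 || b <= 255)
        return false;     /* depends on the runtime locale: defer */
    if (is_turkish_special_sketch(a) || is_turkish_special_sketch(b))
        return false;     /* Turkish locales fold these differently */
    return true;
}
```

For example, U+00B5 MICRO SIGN, U+039C GREEK CAPITAL LETTER MU and U+03BC GREEK SMALL LETTER MU fold together. This test lets the U+039C/U+03BC equivalence be recorded at compile time, while any pairing involving U+00B5 is deferred to runtime.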