path: root/regexec.c
Commit message | Author | Age | Files | Lines
* regexec.c: Add commentKarl Williamson2021-08-071-0/+1
|
* regexec.c: Refactor macro to generalize itKarl Williamson2021-08-071-11/+27
| | | | This is in preparation for a somewhat different use to be added.
* regexec.c: Use lsbit_pos32() to avoid iterationsKarl Williamson2021-07-301-26/+13
| | | | | | Before this commit, the code looped through a bitmap looking for a set bit. Now that we have a fast way to find where a set bit is, use it, and avoid the fruitless iterations.
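The idea in this commit can be sketched as follows: rather than testing every bit position of a bitmap in turn, jump straight to the lowest set bit, process it, then clear it. This is only an illustration under assumptions. `demo_lsbit_pos32()` and `demo_collect_set_bits()` are hypothetical names written for this note; they are not perl's `lsbit_pos32()` nor the code the commit changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "position of lowest set bit" helper.
 * Uses a fixed five-step binary search, so the cost does not depend
 * on how many clear bits precede the set one. */
static unsigned demo_lsbit_pos32(uint32_t word)
{
    unsigned pos = 0;

    assert(word != 0);      /* there is no set bit to find in 0 */
    if ((word & 0xFFFFu) == 0) { pos += 16; word >>= 16; }
    if ((word & 0xFFu)   == 0) { pos += 8;  word >>= 8;  }
    if ((word & 0xFu)    == 0) { pos += 4;  word >>= 4;  }
    if ((word & 0x3u)    == 0) { pos += 2;  word >>= 2;  }
    if ((word & 0x1u)    == 0) { pos += 1; }
    return pos;
}

/* Visit only the set bits: one loop pass per set bit, instead of one
 * pass per bit position. Returns how many positions were stored. */
static size_t demo_collect_set_bits(uint32_t bitmap, unsigned out[32])
{
    size_t n = 0;

    while (bitmap != 0) {
        out[n++] = demo_lsbit_pos32(bitmap);
        bitmap &= bitmap - 1;   /* clear the lowest set bit */
    }
    return n;
}
```

For a sparse bitmap such as 0x80000009 the loop body runs three times rather than thirty-two.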
* Fix spelling: precedeFelipe Gasper2021-06-151-1/+1
|
* Base *.[ch] files: Replace leading tabs with blanksMichael G Schwern2021-05-311-2092/+2092
| | | | | | | This is a rebasing by @khw of part of GH #18792, which I needed to get in now to proceed with other commits. It also strips trailing white space from the affected files.
* regexec.c: Move parameter cast into macroKarl Williamson2021-05-281-8/+5
| | | | | Instead of calling the macro with a cast parameter, do the cast inside the macro so the caller doesn't have to be bothered with it.
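A minimal sketch of the pattern, assuming a macro that needs to look at a byte as unsigned. Both macro names are hypothetical and are not the macro this commit touched.

```c
/* Before: the macro trusts the caller to pass a 'const unsigned char *',
 * so every call site on a plain 'char *' needs its own cast. */
#define DEMO_IS_HIGH_BYTE_OLD(p)  (*(p) >= 0x80)

/* After: the cast lives inside the macro, so callers can pass either
 * pointer type and still get an unsigned comparison. */
#define DEMO_IS_HIGH_BYTE(p)      (*(const unsigned char *)(p) >= 0x80)
```

With the second form a call such as `DEMO_IS_HIGH_BYTE(str)` works for `const char *str`, whereas the first form would compare a possibly negative `char` against 0x80.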
* regexec.c: Replace code with equivalent inline fcnKarl Williamson2021-05-281-2/+1
| | | | Don't repeat a paradigm
* Perl_regexec_flags(): fixup code commentsDavid Mitchell2021-03-091-4/+4
| | | | | One comment there since 5.000 had become meaningless, so remove it; add a couple of other code comments to compensate.
* regexec.c: Make internal function staticKarl Williamson2021-02-101-6/+2
| | | | | This used to be called from utf8.c, but no longer; no need to make it other than static. This allows the compiler to better optimize.
* Correct for build-time warningJames E Keenan2021-01-101-1/+1
| | | | | | Addresses this build-time warning: suggest braces around initialization of subobject [-Wmissing-braces]
* regexec.c: Fix assertion failure GH #18451Karl Williamson2021-01-031-13/+26
| | | | | This was caused by copying too many characters for the size of the buffer. Only one character is needed.
* regexec.c: Clarify commentsKarl Williamson2021-01-031-3/+6
|
* regexec.c: Silence compiler warningKarl Williamson2020-12-211-1/+1
|
* regexec.c: Fix failing CI 32-bit testsKarl Williamson2020-12-211-1/+3
| | | | | | This bug was introduced by bb3825626ed2b1217a2ac184eff66d0d4ed6e070, and was the result of overflowing a 32 bit space. The solution is to rework the expression so that it can't overflow.
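The log does not show the expression that overflowed, so the following is only a generic sketch of the kind of rework described: a bounds test written as a sum can wrap in 32 bits, while the same test written as a subtraction from the limit cannot. The function names and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overflow-prone form: offset + len can wrap around in 32 bits and
 * then compare as a small number, so the check wrongly passes. */
static bool demo_fits_naive(uint32_t offset, uint32_t len, uint32_t limit)
{
    return (uint32_t) (offset + len) <= limit;
}

/* Reworked form: once offset <= limit is known, limit - offset cannot
 * underflow, and no intermediate value exceeds the 32-bit range. */
static bool demo_fits_safe(uint32_t offset, uint32_t len, uint32_t limit)
{
    return offset <= limit && len <= limit - offset;
}
```

Both forms agree on small inputs; they differ only when the sum would exceed the 32-bit range, which is the case a test suite for 32-bit builds is likely to hit.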
* regexec.c: Link to github issue in commentKarl Williamson2020-12-201-0/+3
|
* regexec.c: White-space, comments onlyKarl Williamson2020-12-191-48/+50
| | | | Mostly indent because the prior commit created a new block
* regexec.c: Revamp S_setup_EXACTISH_ST() loop end conditionsKarl Williamson2020-12-191-599/+780
| | | | | | | | | | | Consider the pattern /A*B/ where A and B are arbitrary. The pattern matching code tries to make a tight loop to match the span of A's. The logic of this was not really updated when UTF-8 was added. I did revamp it some releases ago to fix some bugs and to at least consider UTF-8. This commit changes it so that Unicode is now a first class citizen. Some details are listed in the ticket GH #18414
* regexec.c: Change name of static functionKarl Williamson2020-12-191-4/+4
| | | | The new name reflects its new functionality coming in future commits
* regexec.c: Trim trailing blanksKarl Williamson2020-12-191-87/+87
|
* Restrict scope/Shorten some very long macro namesKarl Williamson2020-11-221-2/+0
| | | | | | The names were intended to force people to not use them outside their intended scopes. But by restricting those scopes in the first place, we don't need such unwieldy names
* autodoc.pl: Enhance apidoc_section featureKarl Williamson2020-11-061-1/+1
| | | | | | | | | | | This feature allows documentation destined for perlapi or perlintern to be split into sections of related functions, no matter where the documentation source is. Prior to this commit the line had to contain the exact text of the title of the section. Now it can be a $variable name that autodoc.pl expands to the title. It still has to be an exact match for the variable in autodoc, but now, the expanded text can be changed in autodoc alone, without other files needing to be updated at the same time.
* regexec.c: Store expression in a variableKarl Williamson2020-10-161-10/+11
| | | | | | This makes the text look cleaner, and prepares for a future commit, where we will want to change the variable (which can't be done with the expression).
* regexec.c: Change variable name in a functionKarl Williamson2020-10-161-8/+8
| | | | This makes it like a corresponding variable.
* regexec.c: Rename local variable; change typeKarl Williamson2020-10-161-32/+33
| | | | | | | | | | | I found myself getting confused, as this most likely was named before UTF-8 came along. It actually is just a byte, plus an out-of-bounds value. While I'm at it, I'm also changing the type from I32, to the perl equivalent of the C99 'int_fast16_t', as it doesn't need to be 32 bits, and we should let the compiler choose what size is the most efficient that still meets our needs.
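As an illustration of "a byte, plus an out-of-bounds value", here is a sketch using the C99 type directly. The sentinel name, its value, and both functions are assumptions made for this note and are not taken from regexec.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: any value outside 0..255 works, since a real
 * byte can never equal it. */
#define DEMO_NO_BYTE (-1)

/* Return the first byte of a buffer, or the sentinel if it is empty.
 * int_fast16_t holds 0..255 plus the sentinel, and lets the compiler
 * pick whatever width is fastest on the target. */
static int_fast16_t demo_first_byte(const unsigned char *s, size_t len)
{
    return len > 0 ? (int_fast16_t) s[0] : DEMO_NO_BYTE;
}

/* A sentinel 'wanted' means "no constraint"; otherwise compare bytes. */
static int demo_matches(int_fast16_t wanted, unsigned char actual)
{
    return wanted == DEMO_NO_BYTE || wanted == (int_fast16_t) actual;
}
```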
* regcomp.c,regexec.c: SimplifyKarl Williamson2020-10-161-11/+3
| | | | | This commit uses the new macros from the previous commit to simplify some code.
* regexec.c: Macroize another common paradigmKarl Williamson2020-10-141-22/+16
|
* regexec.c: Macroize a common paradigmKarl Williamson2020-10-141-19/+12
|
* regexec.c: Rename a static variableKarl Williamson2020-10-141-5/+9
| | | | | This is to distinguish it from a similar variable being added in a future commit
* regexec.c: find_byclass(): RestructureKarl Williamson2020-10-141-465/+754
| | | | | | | | | | | | This is a follow-on to the previous commit. The case number of the main switch statement now includes three things: the regnode op, the UTF8ness of the target, and the UTF8ness of the pattern. This allows the conditionals within the previous cases (which only encoded the op), to be removed, and things to be moved around so that there are more fall-throughs and fewer gotos, and the macros that are called no longer have to test for UTF8ness; so I teased the UTF8 ones apart from the non_UTF8 ones.
* regexec.c: S_find_byclass(): utf8ness in switch()Karl Williamson2020-10-141-40/+40
| | | | | | | | | | | | | | | | | | | | | | | | This uses the #defines created in the previous commit to make the switch statement in this function incorporate the UTF8ness of both the pattern and the target string. The reason for this is that the first statement in nearly every case of the switch is to test if the target string being matched is UTF-8 or not. By putting that information into the case number, those conditionals can be eliminated, leading to cleaner, more modular code. I had hoped that this would also improve performance since there are fewer conditionals, but Sergey Aleynikov did performance testing of this change for me, and found no real noticeable gain nor loss. Further, the cases involving matching EXACTish nodes have to also test if the pattern is UTF-8 or not before doing anything else. I added that information as well to the case number, so that those conditionals can be eliminated. For the non-EXACTish nodes, it simply means that two case statements execute the same code. This is an intermediate commit, which only does the expansion of the current cases into four for each. The refactoring that takes advantage of this is in the following commit.
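The technique described here, folding the two UTF8ness flags into the switch's case number, can be sketched roughly as below. Every identifier is hypothetical; the real #defines and regnode ops in regexec.c are not shown in this log and may be laid out differently.

```c
#include <stdbool.h>

enum demo_op   { DEMO_EXACT = 0, DEMO_ANYOF = 1 };
enum demo_path {
    PATH_EXACT_BYTES_BYTES, PATH_EXACT_BYTES_UTF8PAT,
    PATH_EXACT_UTF8_BYTESPAT, PATH_EXACT_UTF8_UTF8PAT,
    PATH_ANYOF_BYTES, PATH_ANYOF_UTF8, PATH_UNKNOWN
};

/* Pack op, target UTF8ness and pattern UTF8ness into one case number,
 * giving four case labels per op. */
#define DEMO_KEY(op, target_utf8, pattern_utf8) \
    (((op) << 2) | ((target_utf8) ? 2 : 0) | ((pattern_utf8) ? 1 : 0))

static enum demo_path demo_dispatch(enum demo_op op,
                                    bool target_utf8, bool pattern_utf8)
{
    switch (DEMO_KEY(op, target_utf8, pattern_utf8)) {
    /* EXACTish-style op: all four combinations differ */
    case DEMO_KEY(DEMO_EXACT, 0, 0): return PATH_EXACT_BYTES_BYTES;
    case DEMO_KEY(DEMO_EXACT, 0, 1): return PATH_EXACT_BYTES_UTF8PAT;
    case DEMO_KEY(DEMO_EXACT, 1, 0): return PATH_EXACT_UTF8_BYTESPAT;
    case DEMO_KEY(DEMO_EXACT, 1, 1): return PATH_EXACT_UTF8_UTF8PAT;

    /* Other ops ignore the pattern flag: two labels share one body */
    case DEMO_KEY(DEMO_ANYOF, 0, 0):
    case DEMO_KEY(DEMO_ANYOF, 0, 1): return PATH_ANYOF_BYTES;
    case DEMO_KEY(DEMO_ANYOF, 1, 0):
    case DEMO_KEY(DEMO_ANYOF, 1, 1): return PATH_ANYOF_UTF8;

    default: return PATH_UNKNOWN;
    }
}
```

No case body starts with an `if (utf8_target)` test, because the flag has already selected the label. The shared labels for the second op mirror the remark that, for non-EXACTish nodes, two case statements execute the same code.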
* regexec: disallow zero-width nodes in regrepeatHugo van der Sanden2020-10-081-19/+0
| | | | | GH #17594: the logic here expects the node to have width 1 (except for LNBREAK), it is not expected to do the right thing on zero-width nodes.
* regexec.c: White-space onlyKarl Williamson2020-10-021-169/+170
| | | | Adjust indentation as a result of the previous commit.
* S_find_byclass() Restructure bounds checkingKarl Williamson2020-10-021-59/+16
| | | | | | | There are five \b variants. Plain \b (without braces) is the outlier as far as implementation. This commit moves the handling of plain \b to outside the switch that handles the others. That allows the duplicate code that previously existed to be consolidated into one occurrence.
* Change some =head1 to apidoc_section linesKarl Williamson2020-09-041-1/+1
| | | | | apidoc_section is slightly favored over head1, as it is known only to autodoc, and can't be confused with real pod.
* Use av_top_index() instead of av_tindex()Karl Williamson2020-08-191-1/+1
| | | | | | | I was never happy with this short form, and other people weren't either. Now that most things are better expressed in terms of av_count, convert the few remaining items that are clearer when referring to an index into using the fully spelled out form
* regexec.c: Use withinCOUNT()Karl Williamson2020-08-081-7/+2
| | | | This is faster, and clearer
* regexec.c: Clarify commentKarl Williamson2020-08-081-1/+1
|
* regexec.c: Use UTF8SKIP for utf8 stringKarl Williamson2020-07-301-1/+1
| | | | | This code was advancing per-byte on a UTF-8 string, which still works, but is slower than need be.
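To illustrate the difference between stepping per byte and stepping per character, here is a sketch with a hypothetical `demo_utf8_skip()` that derives the sequence length from the lead byte. It covers standard UTF-8 only and assumes well-formed input; it is not perl's `UTF8SKIP` macro.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with 'lead'.
 * Continuation or invalid bytes fall back to a one-byte step. */
static size_t demo_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)           return 1;   /* ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Count characters by jumping a whole sequence at a time, so the loop
 * runs once per character rather than once per byte. */
static size_t demo_count_chars(const unsigned char *s, size_t len)
{
    size_t chars = 0;
    size_t i = 0;

    while (i < len) {
        i += demo_utf8_skip(s[i]);
        chars++;
    }
    return chars;
}
```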
* Remove use of dVAR in coreDagfinn Ilmari Mannsåker2020-07-201-15/+0
| | | | | It only does anything under PERL_GLOBAL_STRUCT, which is gone. Keep the dNOOP definition for CPAN back-compat.
* regexec.c: Fix commentKarl Williamson2020-07-171-1/+2
|
* regexec.c: Don't use sizeof()Karl Williamson2020-07-171-1/+1
| | | | | A future commit will change this array so that its size isn't known at compilation time.
* Fix a bunch of repeated-word typosDagfinn Ilmari Mannsåker2020-05-221-2/+2
| | | | | Mostly in comments and docs, but some in diagnostic messages and one case of 'or die die'.
* regexec.c: Clean up debug callHugo van der Sanden2020-03-111-3/+5
| | | | | | The code this replaces relies on the internal structure of a macro, which can change and break things. This commit changes to use a more straightforward way of accomplishing the same thing.
* Allow debugging from regexec.c back to regcomp.cKarl Williamson2020-03-111-2/+8
| | | | | | | | | The compilation of User-defined properties in a regular expression that haven't been defined at the time that pattern is compiled is deferred until execution time. Until this commit, any request for debugging info on those was ignored. This fixes that by
* regex: Change internal macro nameKarl Williamson2020-03-051-9/+9
| | | | | It wasn't clear to me that the macro did more than a declaration, given its name. Rename it to be clear as to what it does.
* regexec.c: Fix Debug statementKarl Williamson2020-02-261-2/+2
| | | | | | | | A proper debugging statement isn't just controlled by DEBUG_r, it needs what sort of class of debugging controls this, so that re.pm can operate properly. This is the second of two cases in the code where it was wrong.
* regexec.c: Fix Debug statementKarl Williamson2020-02-261-1/+1
| | | | | | | | A proper debugging statement isn't just controlled by DEBUG_r, it needs what sort of class of debugging controls this, so that re.pm can operate properly. This is one of two cases in the code where it was wrong.
* Change Unicode property abbrev to upcoming officialKarl Williamson2020-01-301-1/+1
| | | | | | | | | | | | Unicode 12.0 used a new property file that was not from the Unicode Character Database. It only had a long property name. I incorporated it into our data, and rather than use the very long name all the time, I created my own short name, since there was no official one. Now, the upcoming 13.0 has moved the file to the UCD, and come up with a short name that differs from the one I had. This commit converts to use Unicode's name. This property is not exposed to user or XS space, so there is no user impact.
* regexec: don't increment recursion counter for non-postponed EVALHugo van der Sanden2020-01-271-1/+1
| | | | | It wasn't intended to be part of the recursion logic, and doesn't get decremented again (GH 17490).
* Change parameter type of static fcnKarl Williamson2020-01-031-1/+1
| | | | | This makes the first parameter consistent with the other similar parameter.