path: root/regcomp.c
Commit message | Author | Age | Files | Lines
* semicolon-friendly diagnostic control | Zefram | 2017-12-16 | 1 | -2/+2
| | | | | | New macros {GCC,CLANG}_DIAG_{IGNORE,RESTORE}_{DECL,STMT}, which take a following semicolon. It is necessary to use the _DECL or _STMT version as appropriate to the context. Fixes [perl #130726].
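The "takes a following semicolon" idea can be sketched as below. This is a minimal illustration under stated assumptions: every `MY_` name is a hypothetical stand-in, not perl's actual definition, and the sketch assumes a compiler that accepts `GCC diagnostic` pragmas (GCC or clang). The statement form ends in an expression that absorbs the semicolon; the declaration form ends in a repeatable forward declaration.

```c
#include <stdio.h>

/* Hypothetical stand-ins, NOT perl's real macros. */
#define MY_PRAGMA(x) _Pragma(#x)

#if defined(__GNUC__) || defined(__clang__)
#  define MY_DIAG_IGNORE(w) \
       _Pragma("GCC diagnostic push") MY_PRAGMA(GCC diagnostic ignored w)
#  define MY_DIAG_RESTORE   _Pragma("GCC diagnostic pop")
#else
#  define MY_DIAG_IGNORE(w)
#  define MY_DIAG_RESTORE
#endif

/* Statement context: end in an expression statement that takes the ';'. */
#define MY_DIAG_IGNORE_STMT(w) MY_DIAG_IGNORE(w) (void)0
#define MY_DIAG_RESTORE_STMT   MY_DIAG_RESTORE (void)0

/* Declaration context: end in a harmless, repeatable declaration. */
#define MY_DIAG_IGNORE_DECL(w) MY_DIAG_IGNORE(w) struct my_diag_unused
#define MY_DIAG_RESTORE_DECL   MY_DIAG_RESTORE struct my_diag_unused

/* File scope: only the _DECL form is valid here. */
MY_DIAG_IGNORE_DECL("-Wunused-variable");
static const int my_buf_size = 64;
MY_DIAG_RESTORE_DECL;

/* Function body: only the _STMT form is valid between statements. */
static int format_len(const char *fmt, int value)
{
    char buf[64];
    int n;
    MY_DIAG_IGNORE_STMT("-Wformat-nonliteral");
    n = snprintf(buf, my_buf_size, fmt, value);
    MY_DIAG_RESTORE_STMT;
    return n;
}
```

With this shape, each use reads like an ordinary statement or declaration, so editors and indenters are not confused by a macro call that lacks a terminating semicolon.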
* PATCH: [perl #132548] regcomp.c Fix memory leak | Karl Williamson | 2017-12-08 | 1 | -0/+1
|
* add comment (to test pushing) | Yves Orton | 2017-12-08 | 1 | -1/+1
|
* fix #131649 - extended charclass can trigger assert | Yves Orton | 2017-12-07 | 1 | -10/+18
| | | | | | | | The extended charclass parser makes some assumptions during the first pass which are only true on well-structured input, and it does not properly catch various errors. Later on, the code assumes that things the first pass will let through are valid, when in fact they should trigger errors.
* regcomp.c: handle_regex_sets() - add DEBUG_PARSE and fixup 'depth' logic | Yves Orton | 2017-12-07 | 1 | -4/+6
| | | | | | | | | | | | 'depth' is used to track the recursion depth during compilation, and is used by things like DEBUG_PARSE() to show the compilation process. handle_regex_sets() was using its own 'depth' for two different purposes, which is quite confusing. At the same time, when we call handle_regex_sets() from reg() it is important to increment 'depth'.
* prevent integer overflow when compiling a regexp | Tony Cook | 2017-12-06 | 1 | -2/+6
| | | | Fixes [perl #131893].
* comment entry points to study_chunk | Yves Orton | 2017-12-01 | 1 | -0/+16
| | | | so it is a bit easier to follow what they are used for.
* Initialize variables. | Jarkko Hietaniemi | 2017-11-29 | 1 | -0/+2
| | | | Coverity #169257, #169265, #169269.
* avoid redundant initialisation around Newxz() | Zefram | 2017-11-13 | 1 | -8/+8
| | | | | | Reduce Newxz() to Newx() where all relevant parts of the memory are being explicitly initialised, and don't explicitly zero memory that was already zeroed. [perl #36078]
* remove unused struct member "is_top_frame" | Zefram | 2017-11-13 | 1 | -1/+0
|
* regcomp.c: Convert some strchr to memchr | Karl Williamson | 2017-11-06 | 1 | -4/+6
| | | | | This allows things to work properly in the face of embedded NULs. See the branch merge message for more information.
* dquote.c: Use memchr() instead of strchr() | Karl Williamson | 2017-11-06 | 1 | -0/+4
| | | | | | | This allows \x and \o to work properly in the face of embedded NULs. A limit parameter is added to each function, and that is passed to memchr (which replaces strchr). See the branch merge message for more information.
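Why a length limit matters with embedded NULs can be sketched as follows. The helper name `find_closing_brace` is hypothetical and is not a function from dquote.c; only `memchr()` and `strchr()` are real library calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the '}' that closes a \x{...} or \o{...}
 * escape, looking no further than s + len.  Unlike strchr(), memchr()
 * does not stop at a NUL byte inside the buffer. */
static const char *find_closing_brace(const char *s, size_t len)
{
    return (const char *) memchr(s, '}', len);
}
```

For the five-byte buffer `{a\0b}`, `strchr(buf, '}')` gives up at the embedded NUL and returns NULL, while `find_closing_brace(buf, 5)` returns a pointer to the final byte.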
* Use memBEGINs() in core | Karl Williamson | 2017-11-06 | 1 | -14/+15
|
* Change some strBEGINs() to memBEGINs() | Karl Williamson | 2017-11-06 | 1 | -4/+5
| | | | | | The latter is generally faster when the length is already known. This commit also changes a few hard-coded numbers to use sizeof().
* Change some strncmp(), etc. to strBEGINs() | Karl Williamson | 2017-11-06 | 1 | -2/+2
| | | | | The latter is much clearer as to what's going on, and the programmer and program reader don't have to count characters.
* Use memEQs, memNEs in core files | Karl Williamson | 2017-11-06 | 1 | -5/+5
| | | | | | | | | | Where the length is known, we can use these functions which relieve the programmer and the program reader from having to count characters. The memFOO functions should also be slightly faster than the strFOO equivalents. In some instances in this commit, hard coded numbers are used. These come from the 'case' statement values that apply to them.
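The "no counting characters" point of the three commits above can be sketched with look-alike macros. The `MY_` macros below are hypothetical illustrations; perl's real macros are defined in its own headers and may differ in detail. The key assumption is that the comparison string is a literal, so `sizeof` yields its length at compile time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical look-alikes, not perl's definitions.  The "" lit ""
 * concatenation makes the compiler reject a non-literal argument. */
#define MY_memEQs(s, len, lit) \
    ((len) == sizeof(lit) - 1 && memcmp((s), "" lit "", sizeof(lit) - 1) == 0)
#define MY_memBEGINs(s, len, lit) \
    ((len) >= sizeof(lit) - 1 && memcmp((s), "" lit "", sizeof(lit) - 1) == 0)

/* Whole-string match: no hand-counted 5 as in strncmp(name, "alpha", 5). */
static int is_alpha_class(const char *name, size_t len)
{
    return MY_memEQs(name, len, "alpha");
}

/* Prefix match when the length is already known. */
static int has_utf8_prefix(const char *name, size_t len)
{
    return MY_memBEGINs(name, len, "utf8::");
}
```

Because the length check comes first, a mismatch in length short-circuits before any bytes are compared.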
* Change upper limit handling of -Dr output | Karl Williamson | 2017-10-27 | 1 | -8/+9
| | | | | | | | | | | | | | Commit 2bfbbbaf9ef1783ba914ff9e9270e877fbbb6aba changed things so -Dr output could be changed through an environment variable to truncate the output differently than the default. For most purposes, the default is good enough, but for someone trying to debug the regcomp internals, sometimes one wants to see more than is output by default. That commit did not catch all the places. This one changes the handling so that any place that uses the previous default maximum now uses the environment variable (if set) instead.
* regcomp.c: Skip UTF-8 decoding for invariants | Karl Williamson | 2017-10-27 | 1 | -2/+2
| | | | | By adding two branches, we can avoid the expensive UTF-8 decode step for the common case of the input being an ASCII character.
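The fast path can be sketched as follows. This is an illustration only: the function names are hypothetical, the decoder assumes well-formed UTF-8 and does no validation, and "invariant" is taken to mean a byte below 0x80, which holds on ASCII platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Full decode of a multi-byte sequence.  Assumes well-formed input. */
static uint32_t decode_utf8_slow(const unsigned char *s, size_t *len)
{
    uint32_t cp;
    size_t n, i;

    if      (s[0] >= 0xF0) { cp = s[0] & 0x07; n = 4; }
    else if (s[0] >= 0xE0) { cp = s[0] & 0x0F; n = 3; }
    else                   { cp = s[0] & 0x1F; n = 2; }

    for (i = 1; i < n; i++)
        cp = (cp << 6) | (s[i] & 0x3F);   /* append 6 payload bits */

    *len = n;
    return cp;
}

/* Cheap branch first: non-UTF-8 input, or an invariant byte, is its own
 * code point, so the decode step is skipped entirely. */
static uint32_t next_code_point(const unsigned char *s, int is_utf8,
                                size_t *len)
{
    if (!is_utf8 || s[0] < 0x80) {
        *len = 1;
        return s[0];
    }
    return decode_utf8_slow(s, len);
}
```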
* regcomp.c: Don't forget to restore state | Karl Williamson | 2017-10-27 | 1 | -10/+12
| | | | | | | | | | | | | In code reading (so I don't have a test case), I realized that this code could return out of its function without restoring the state that it has changed out from under the caller, and that can lead to havoc when the caller continues on assuming the original state. This commit moves the return and other checking to after the state restoral code. It can call FAIL2 as part of a panic after the state the failure is in is gone. This would be a problem if it called vFAIL2 instead, but isn't because FAIL2 doesn't need the state the failure was in.
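The restore-before-return pattern described here can be sketched as below. The struct and function names are hypothetical and only loosely modelled on the idea of a shared parse position; this is not the code the commit touched.

```c
#include <stddef.h>

/* Hypothetical parser state shared between caller and callee. */
struct parse_state {
    const char *parse;   /* current position */
    const char *end;     /* one past the last byte */
};

/* Consume decimal digits; return how many, or -1 if there were none. */
static int parse_digits(struct parse_state *st)
{
    int n = 0;
    while (st->parse < st->end && *st->parse >= '0' && *st->parse <= '9') {
        st->parse++;
        n++;
    }
    return n > 0 ? n : -1;
}

/* Temporarily repoint the shared state at [sub, sub_end). */
static int parse_substring(struct parse_state *st,
                           const char *sub, const char *sub_end)
{
    const char *save_parse = st->parse;
    const char *save_end   = st->end;
    int result;

    st->parse = sub;
    st->end   = sub_end;
    result = parse_digits(st);

    /* Restore first ... */
    st->parse = save_parse;
    st->end   = save_end;

    /* ... and only then look at the result, so that every return path
     * leaves the caller's state as it found it. */
    if (result < 0)
        return -1;
    return result;
}
```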
* regcomp.c: Remove redundant 'if' | Karl Williamson | 2017-10-27 | 1 | -2/+0
| | | | | We have already assured earlier in the function that this 'if' is always true.
* regcomp.c: White-space only | Karl Williamson | 2017-10-27 | 1 | -10/+10
| | | | Vertically align some text.
* regcomp.c: Add assertion | Karl Williamson | 2017-10-27 | 1 | -0/+1
|
* regcomp.c: Fix typo in comment | Karl Williamson | 2017-10-21 | 1 | -1/+1
|
* regcomp.c: Add assertion | Karl Williamson | 2017-10-21 | 1 | -1/+1
| | | | If this value is negative, something is wrong.
* Don't use VOL internally, because "volatile" works just fine | Aaron Crane | 2017-10-21 | 1 | -2/+2
| | | | However, we do preserve it outside PERL_CORE for the use of XS authors.
* Fix #131868 - silence quantifier warnings for regex gosub | Yves Orton | 2017-09-13 | 1 | -1/+1
| | | | | | | | | | We check that numerically quantified subpatterns can match something, so that we can detect things like (){4}. However, we produce false positives when using regex recursion. This is related to slow-downs in grammar matches in Perl 5.20 which were fixed by a51d618a82a7057c3aabb600a7a8691d27f44a34. In an ideal world we would do a lot of work and this false-positive would not happen, but that requires more round tuits than I have available.
* PATCH: [perl #131598] | Karl Williamson | 2017-09-10 | 1 | -2/+4
| | | | | | | | The cause of this is that the vFAIL macro uses RExC_parse, and that variable has just been changed in preparation for code after the vFAIL. The solution is to not change RExC_parse until after the vFAIL. This is a case where the macro hides stuff that can bite you.
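How a macro's hidden read of a variable can bite is sketched below. All names are hypothetical stand-ins: the real vFAIL reports an error and does not return, whereas this toy version merely records the offset it would have reported.

```c
#include <stddef.h>

static const char *g_start;            /* start of the pattern */
static const char *g_parse;            /* stand-in for RExC_parse */
static ptrdiff_t   g_error_offset = -1;

/* Nothing at the call site shows that this macro reads g_parse. */
#define MY_vFAIL() (g_error_offset = g_parse - g_start)

/* Buggy order: g_parse is advanced for the code after the failure
 * check, so the macro reports the wrong position. */
static ptrdiff_t report_buggy(const char *start, const char *bad,
                              const char *resume)
{
    g_start = start;
    g_parse = bad;
    g_parse = resume;    /* changed too early */
    MY_vFAIL();
    return g_error_offset;
}

/* Fixed order: leave g_parse alone until after the macro has run. */
static ptrdiff_t report_fixed(const char *start, const char *bad,
                              const char *resume)
{
    g_start = start;
    g_parse = bad;
    MY_vFAIL();
    g_parse = resume;
    return g_error_offset;
}
```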
* regcomp [perl #131582] | Karl Williamson | 2017-09-10 | 1 | -0/+1
|
* reduce error surface of reginsert, set flags to 0 for inserted node | Yves Orton | 2017-09-10 | 1 | -5/+2
| | | | | this means that callers do not have to worry about resetting the flags, which reduces the chance of error in using reginsert.
* fix #132017 - OPFAIL insert needs to set flags to 0 | Yves Orton | 2017-09-10 | 1 | -1/+5
| | | | Why reginsert doesn't do this stuff I don't know.
* Perl_reg_temp_copy(): rename args. | David Mitchell | 2017-08-04 | 1 | -33/+42
| | | | | | | | | | | This function copies a regexp SV. Rename its args to ssv and dsv to match a usual convention in other functions such as sv_catsv(). Similarly rename the two local vars holding ReANY(ssv/dsv) to srx, drx. This is less confusing than having four vars called rx, ret_x, r, ret. Also update the comments explaining what the function does.
* give REGEXP SVs the POK flag again | David Mitchell | 2017-07-27 | 1 | -10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit v5.17.5-99-g8d919b0 stopped SVt_REGEXP SVs (and PVLVs acting as regexes) from having the POK and pPOK flags set. This made things like SvOK() and SvTRUE() slower, because as well as the quick single test for any I/N/P/R flags, SvOK() also has to test for
    (SvTYPE(sv) == SVt_REGEXP
     || (SvFLAGS(sv) & (SVTYPEMASK|SVp_POK|SVpgv_GP|SVf_FAKE))
            == (SVt_PVLV|SVf_FAKE))
This commit fixes the issue fixed by g8d919b0 in a slightly different way, which is less invasive and allows the POK flag.
Background: PVLV are basically PVMGs with a few extra fields. They are intended to be a superset of all scalar types, so any scalar value can be assigned to a PVLV SV. However, once REGEXPs were made into first-class scalar SVs, this assumption broke - there are a whole bunch of fields in a regex SV body which can't be copied to a PVLV. So this broke:
    sub f {
        my $r = qr/abc/; # $r is reference to an SVt_REGEXP
        $_[0] = $$r;
    }
    f($h{foo}); # the hash access is deferred - a temporary PVLV is
                # passed instead
The basic idea behind the g8d919b0 fix was, for an LV-acting-as-regex, to attach both a PVLV body and a regex body to the SV head. This commit keeps this basic concept; it just changes how the extra body is attached.
The original fix changed SVt_REGEXP SVs so that sv.sv_u.svu_pv no longer pointed to the regexp's string representation; instead this pointer was stored in a union made out of the xpv_len field. Doing this necessitated not turning the POK flag on for any REGEXP SVs. This freed up the sv_u to point to the regex body, while the sv_any field could continue to point to the PVLV body. An ReANY() macro was introduced that returned the sv_u field rather than the sv_any field.
This commit changes it so that instead, on regexp SVs (and LV-as-regexp SVs), sv_u always points to the string buffer (so they can have POK set again), but on specifically LV-as-regex SVs, the xpv_len_u union of the PVLV body points to the regexp body. This means that SVt_REGEXP SVs are now completely "normal" again, and SVt_PVLV SVs are normal except in the one case where they hold a regex, in which case rather than storing the string buffer's length, the PVLV body stores a pointer to the regex body.
* Revert "use symbolic constants for substrs[] indices" | David Mitchell | 2017-07-05 | 1 | -45/+42
| | | | | | This reverts commit 2ac902efe11ee156653eb2ca1369f0e5f4546c31. See thread at Message-ID: <d8jfuebazrl.fsf@dalvik.ping.uio.no>
* regcomp.c: use symbolic constants for substrs[] indices | Dagfinn Ilmari Mannsåker | 2017-07-05 | 1 | -42/+45
|
* regcomp.c: parameterise scan_data_t substrs[] | David Mitchell | 2017-07-02 | 1 | -157/+133
| | | | | | | | | | | | | | Now that the scan_data_t stores its fixed and floating substring data as a 2-element array, replace various bits of duplicated code which separately handled fixed and floating substrings with for (i = 0; i < 2; i++) loops etc. This makes the code shorter and simpler, and will make it easier in future to expand to more than a single each of fixed+float. There should be no functional changes, except that debugging output now displays N..N rather than just N for the fixed substring start range (i.e. it's now just a subset of float where max == min).
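The effect of the parameterisation can be sketched with a heavily trimmed, hypothetical mirror of the data. The field names `min_offset` and `max_offset` follow the commit messages in this log; `len`, the struct names, and both functions are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down per-substring record. */
struct substr_info {
    ptrdiff_t min_offset;
    ptrdiff_t max_offset;
    size_t    len;          /* assumed field, for illustration only */
};

struct scan_data {
    struct substr_info substrs[2];   /* [0] = fixed, [1] = floating */
};

/* One loop replaces two copies of the same reset code. */
static void reset_substrs(struct scan_data *d)
{
    int i;
    for (i = 0; i < 2; i++) {
        d->substrs[i].min_offset = 0;
        d->substrs[i].max_offset = 0;
        d->substrs[i].len        = 0;
    }
}

/* Index of the longer substring; ties go to the fixed one. */
static int longest_index(const struct scan_data *d)
{
    int i, best = 0;
    for (i = 1; i < 2; i++)
        if (d->substrs[i].len > d->substrs[best].len)
            best = i;
    return best;
}
```

If the array ever grew beyond two entries, only the loop bounds would need to change, which is the future expansion the message alludes to.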
* scan_data_t: rename 'longest' field | David Mitchell | 2017-07-02 | 1 | -17/+19
| | | | | | | | | .. to 'cur_is_floating' It's an index into either the fixed or float substring info; the information it provides is whether the currently being captured substring is fixed or floating; it's nothing to do with whether the fixed or the floating is currently the longest.
* regcomp.c: remove float_min_offset etc macro use | David Mitchell | 2017-07-02 | 1 | -56/+65
| | | | | | | | | | | | | | In this src file, expand all the various macros like #define anchored_offset substrs->data[0].min_offset #define float_min_offset substrs->data[1].min_offset This will later allow parts of the code to be parameterised, e.g. for (i=0; i<2; i++) { substrs->data[i].min_offset = ...; ... }
* regcomp.c: S_setup_longest(): simplify args | David Mitchell | 2017-07-02 | 1 | -12/+8
| | | | | | | | | | | | | | Rather than passing e.g. &(r->float_utf8), &(r->float_substr), &(r->float_end_shift), pass the single arg &(r->substrs->data[1]) (float_foo are macros which expand to substrs->data[1].foo)
* regcomp: set fixed max_offset to min_offset | David Mitchell | 2017-07-02 | 1 | -8/+23
| | | | | | | | | | | | | | | | | | | | | | | previously scan_data_t had the three fields offset_fixed offset_float_min offset_float_max a few commits ago that was converted into a 2 element array (for fixed and float), each with the fields min_offset max_offset where the max_offset was unused in fixed (substrs[0]) case. Instead, set it equal to min_offset This makes the fixed and float code paths more similar. At the same time expand a few of the 'float_max_offset' type macros to make it clearer what's going on.
* S_study_chunk: have per substring flags | David Mitchell | 2017-07-02 | 1 | -39/+40
| | | | | | | | | | | | | | | | | | | | | | | Currently the scan_data_t struct has a flags field which contains SF_ and SCF_ flags. Some of the SF_ flags are general; others are specific to the fixed or floating substr. For example there are these 3 flags:
    SF_BEFORE_MEOL
    SF_FIX_BEFORE_MEOL
    SF_FL_BEFORE_MEOL
This commit adds a flags field to the per-substring substruct and sets some flags per-substring instead. For example:
    previously we did:                  now we would do:
    --------------------------------    --------------------------------------
    data->flags |= SF_BEFORE_MEOL       unchanged
    data->flags |= SF_FIX_BEFORE_MEOL   data->substrs[0].flags |= SF_BEFORE_MEOL
    data->flags |= SF_FL_BEFORE_MEOL    data->substrs[1].flags |= SF_BEFORE_MEOL
This allows us to simplify the code (e.g. eliminating some args from S_setup_longest()) and in future will allow more than one fixed or floating substring.
* regcomp.c: DEBUG_PEEP(): invalid flags | David Mitchell | 2017-07-02 | 1 | -6/+6
| | | | | | | | DEBUG_PEEP(..., flags) was invoked from 3 functions - however in two of those functions, the 'flags' local var did *not* contain SF_ and SCF_ bits, so the flag bits were being incorrectly displayed as SF_ etc. In those two functions, change it instead to DEBUG_PEEP(..., 0)
* regcomp.c: convert debugging macros to static fns | David Mitchell | 2017-07-02 | 1 | -89/+133
| | | | | | | | | | | | make these 3 macros into thin wrappers around some new static functions, rather than just being huge macros: DEBUG_SHOW_STUDY_FLAGS DEBUG_STUDYDATA DEBUG_PEEP Also, avoid the macros implicitly using local vars: make them into explicit parameters instead (this is one of my pet peeves).
* make struct scan_data_t->longest an index val | David Mitchell | 2017-07-02 | 1 | -23/+22
| | | | | | | | | | | In this private data structure used during regex compilation, the 'longest' field was an SV** pointer which was always set to point to one of these two addresses: &(data->substrs[0].str) &(data->substrs[1].str) Instead, just make it a U8 with the value 0 or 1.
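The pointer-to-index change can be sketched with a cut-down, hypothetical struct; the real field is described above as a U8, for which `unsigned char` stands in here, and `const char *` stands in for the SV pointer. One consequence worth noting, as an observation rather than something the commit message claims, is that an index stays meaningful when the struct is copied, whereas a pointer into the original would not.

```c
#include <stddef.h>

/* Hypothetical cut-down versions of the structs involved. */
struct substr {
    const char *str;        /* stand-in for the SV* in the real struct */
};

struct scan_data {
    struct substr substrs[2];   /* [0] = fixed, [1] = floating */
    unsigned char longest;      /* 0 or 1: index, not a pointer */
};

/* Recover the address the old pointer field used to hold. */
static const char **longest_slot(struct scan_data *d)
{
    return &d->substrs[d->longest].str;
}
```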
* S_setup_longest() pass struct rather than fields | David Mitchell | 2017-07-02 | 1 | -25/+21
| | | | | Now that a substring is a separate struct, pass as a single pointer rather than as 4 separate args.
* struct scan_data_t: make some fields into an array | David Mitchell | 2017-07-02 | 1 | -79/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This private struct is used just within regcomp.c while compiling a pattern. It has a set of fields for a fixed substring, and similar set for floating, e.g. SV *longest_fixed; SV *longest_float; SSize_t *minlen_fixed; SSize_t *minlen_float; etc Instead have a 2 element array, one for fixed, one for float, so e.g. data->offset_float_max becomes data->substrs[1].max_offset There are 3 reasons for doing this. First, it makes the code more regular, and allows a whole substr ptr to be passed as an arg to a function rather than having to pass every individual field; second, it makes the compile-time struct more similar to the runtime struct, which already has such an arrangement; third, it allows for a hypothetical future expansion where there aren't necessarily at most 1 fixed and 1 floating substring. Note that a side effect of this commit has been to change lookbehind_fixed from I32 to SSize_t; lookbehind_float was already SSize_t, so the I32 was probably a bug.
* regcomp.c: correct the regdata which parameters under DEBUG | Yves Orton | 2017-06-27 | 1 | -1/+4
| | | | | | this worked because 'a' and 'o' are treated the same for all intents and purposes, but it is confusing as 'a' stands for array, and 'o' for hash, and the DEBUG mode code here adds two arrays not hashes.
* regcomp.c: document reg_data types better in reg_dup | Yves Orton | 2017-06-27 | 1 | -8/+22
|
* wrap multi-statement macros in STMT_START/STMT_END | Lukas Mai | 2017-06-20 | 1 | -4/+8
| | | | | | | | | | | | | With the original code you'd have to be very, very careful:
    if (foo)
        CLEAR_POSIX_WARNINGS_AND_RETURN(42);
would have expanded to
    if (foo)
        CLEAR_POSIX_WARNINGS();
    return 42; /* always returns! */
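The hazard and the usual cure can be reproduced in a few lines. The macro and variable names below are hypothetical; `MY_STMT_START` and `MY_STMT_END` are assumed here to be the classic `do { ... } while (0)` wrapper, which may not match perl's definition on every compiler.

```c
static int g_warnings = 3;

#define MY_STMT_START do
#define MY_STMT_END   while (0)

/* Two statements: only the first is governed by a preceding 'if'. */
#define CLEAR_AND_RETURN_UNSAFE(v) g_warnings = 0; return (v)

/* One statement: the whole body is governed by a preceding 'if'. */
#define CLEAR_AND_RETURN_SAFE(v) \
    MY_STMT_START { g_warnings = 0; return (v); } MY_STMT_END

static int unsafe_version(int cond)
{
    if (cond)
        CLEAR_AND_RETURN_UNSAFE(42);   /* the return escapes the 'if' */
    return 0;                          /* never reached */
}

static int safe_version(int cond)
{
    if (cond)
        CLEAR_AND_RETURN_SAFE(42);
    return 0;
}
```

Calling `unsafe_version(0)` still returns 42, while `safe_version(0)` returns 0 as the indentation suggests.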
* Resolve Perl #131522: Spurious "Assuming NOT a POSIX class" warning | Yves Orton | 2017-06-18 | 1 | -12/+18
|
* enforce size constraint via STATIC_ASSERT, not just a comment | Lukas Mai | 2017-06-07 | 1 | -0/+1
|