path: root/proto.h
Commit message | Author | Age | Files | Lines
* study_chunk: honour mutate_ok over recursion | Hugo van der Sanden | 2020-06-01 | 1 | -1/+1
  As described in #17743, study_chunk can re-enter itself either by simple recursion or by enframing. 089ad25d3f used the new mutate_ok variable to track whether we were within the framing scope of GOSUB, and to disallow mutating changes to ops if so. This commit extends that logic to reentry by recursion, passing in the current state as was_mutate_ok.

  (CVE-2020-12723)
  (cherry picked from commit 3445383845ed220eaa12cd406db2067eb7b8a741)
* study_chunk: extract rck_elide_nothing | Hugo van der Sanden | 2020-06-01 | 1 | -0/+3
  (CVE-2020-10878)
  (cherry picked from commit 4fccd2d99bdeb28c2937c3220ea5334999564aa8)
* regcomp.c: Rmv C undefined behavior | Karl Williamson | 2020-04-12 | 1 | -1/+1
  One analyzer said that what this commit changes was C undefined behavior, in casting void* pointers. Right now, the only actual type it is called with is SV*, but I made it void*, because I thought it might be used more generally. But, it turns out that Unicode is planning on changing its regular expression processing requirements to where what I have no longer will make sense. And, since only SV* is actually used, this commit changes the void* to SV*, removing any undefined behavior, with no changes to program logic.

  The changes for the new Unicode direction will come in probably 5.34; their document is still in draft, but I anticipate it will soon be finalized.
* Add named sequences to Unicode wildcard name capabilities | Karl Williamson | 2020-03-20 | 1 | -3/+3
  Prior to this commit, specifying a named sequence would result in a mostly unhelpful fatal error message. This makes their use legal.

  This is also the beginning of allowing Unicode string properties, which are a new thing in the (still draft) Unicode requirements for regular expression parsing, UTS 18. Full compliance will have to come later.
* pp_match(): output regex debugging info | Karl Williamson | 2020-03-18 | 1 | -0/+8
  This fixes #17612.

  This adds an inline function to pp_hot to be called to determine if debugging info should be output or not, regardless of whether it comes from -Dr, or from a 'use re Debug' statement.
* op.c: change Optype to I32 for cmpchain functions | Yves Orton | 2020-03-12 | 1 | -2/+2
  Optype appears to be almost completely unused, and on Win32 builds we saw warnings from the cmpchain patches:

    perly.y(1063) : warning C4244: 'function' : conversion from 'I32' to 'Optype', possible loss of data
    perly.y(1065) : warning C4244: 'function' : conversion from 'I32' to 'Optype', possible loss of data
    perly.y(1079) : warning C4244: 'function' : conversion from 'I32' to 'Optype', possible loss of data
    perly.y(1081) : warning C4244: 'function' : conversion from 'I32' to 'Optype', possible loss of data

  Reviewing the code I noticed that functions like Perl_newBINOP() have an I32 type argument, and functions like OpTYPE_set() coerce such arguments into type OPCODE:

    #define OpTYPE_set(o,type) \
        STMT_START { \
            o->op_type = (OPCODE)type; \
            o->op_ppaddr = PL_ppaddr[type]; \
        } STMT_END

  This patch changes the signature of the new cmpchain functions so that they do the same, and changes the type used for storage of op_type values to also use OPCODE like most of the other op.c code.
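The coercion pattern described in this message can be sketched in isolation. This is only an illustration: every `sketch_` name is hypothetical, `int32_t` and `uint16_t` are assumed stand-ins for Perl's I32 and OPCODE, and the two-entry dispatch table merely imitates the role of PL_ppaddr.

```c
#include <stdint.h>

typedef int32_t  sketch_I32;     /* assumed stand-in for Perl's I32 */
typedef uint16_t sketch_OPCODE;  /* assumed stand-in for Perl's OPCODE */

typedef int (*sketch_ppaddr_t)(void);

static int sketch_pp_null(void) { return 0; }
static int sketch_pp_lt(void)   { return 1; }

/* Tiny dispatch table playing the role of PL_ppaddr. */
static const sketch_ppaddr_t sketch_ppaddr[] = { sketch_pp_null, sketch_pp_lt };

struct sketch_op {
    sketch_OPCODE   op_type;
    sketch_ppaddr_t op_ppaddr;
};

/* The wide argument is narrowed by an explicit cast in one place,
 * so callers can keep passing the wider integer type. */
#define SKETCH_OpTYPE_set(o, type)                      \
    do {                                                \
        (o)->op_type   = (sketch_OPCODE)(type);         \
        (o)->op_ppaddr = sketch_ppaddr[(type)];         \
    } while (0)

/* Takes the wide type in its signature, as Perl_newBINOP() is said to do. */
static void sketch_new_cmpchain(struct sketch_op *o, sketch_I32 type)
{
    SKETCH_OpTYPE_set(o, type);
}
```

Because the narrowing happens through an explicit cast inside the macro, the compiler has no implicit I32-to-narrower conversion at the call site to warn about.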
* chained comparisons | Zefram | 2020-03-12 | 1 | -0/+14
* Stop requesting inlining some functions in regcomp.c | Karl Williamson | 2020-03-11 | 1 | -21/+7
  Pattern compilation is not a performance critical process; there's no need to request these to be inlined. Let the compiler decide, given they are static anyway. This came up because g++ was warning they weren't getting inlined anyway.
* Revert "croak_memory_wrap is an inline function." | Karl Williamson | 2020-03-11 | 1 | -3/+1
  This reverts commit 6c714a09cc08600278e72aea1fcdf83576d061b4.

  croak_memory_wrap is designed to save a few bytes of memory, and was never intended to be inlined. This commit moves it to util.c where the other croak functions are.
* Rmv obsolete function | Karl Williamson | 2020-03-11 | 1 | -5/+0
  Use of this function was removed as part of adding wildcarding to the Unicode name property.
* Allow debugging from regexec.c back to regcomp.c | Karl Williamson | 2020-03-11 | 1 | -3/+12
  The compilation of User-defined properties in a regular expression that haven't been defined at the time that pattern is compiled is deferred until execution time. Until this commit, any request for debugging info on those was ignored. This fixes that by
* Add thread safety to some environment accesses | Karl Williamson | 2020-03-11 | 1 | -0/+7
  The previous commit added a mutex specifically for protecting against simultaneous accesses of the environment. This commit changes the normal getenv, putenv, and clearenv functions to use it, to avoid races.

  This makes the code simpler in places where we've gotten burned and added stuff to avoid races. Other places where we haven't known we were getting burned could have existed until now. Now that comes automatically, and we can remove the special cases we earlier stumbled over.

  getenv() returns a pointer to static memory, which can be overwritten at any moment from another thread, or even another getenv from the same thread. This commit changes the accesses to be under control of a mutex, and in the case of getenv, a mortalized copy is created so that there is no possible race.
* Implement \p{Name=/.../} wildcards | Karl Williamson | 2020-03-11 | 1 | -0/+3
  This commit adds wildcard subpatterns for the Name and Name Aliases properties.
* optimize sort by inlining comparison functions | Tomasz Konojacki | 2020-03-09 | 1 | -6/+93
  This makes special-cased forms such as sort { $b <=> $a } even faster.

  Also, since this commit removes PL_sort_RealCmp, it fixes the issue with nested sort calls mentioned in gh #16129.
* regen/embed.pl: handle PERL_STATIC_INLINE_NO_RET properly | Tomasz Konojacki | 2020-03-09 | 1 | -0/+2
* Allow wildcard pattern debugging | Karl Williamson | 2020-03-05 | 1 | -2/+2
  This fixes #17026.

  Patterns can have subpatterns since 5.30. These are processed when encountered, by suspending the main pattern compilation, compiling the subpattern, and then matching that against the set of all legal possibilities, which Perl knows about.

  Debugging info for the compilation portion of the subpattern was added by be8790133a0ce8fc67454e55e7849a47a0858d32, without fanfare. But, prior to this new commit, debugging info was not available for that matching portion of the compilation, except under DEBUGGING builds, with -Drv. This commit adds a new option to 'use re qw(Debug ...)', WILDCARD, to enable subpattern match debugging. Whatever other match debugging options have been turned on will show up when a wildcard subpattern is compiled iff WILDCARD is specified.

  The output of this may be voluminous, which is why you have to ask for it specifically. Or, the EXTRA option turns it on, along with several other things.
* Allow more debugging in re_comp.c | Karl Williamson | 2020-03-02 | 1 | -2/+19
  This adds two main functions that were previously only defined in regcomp.c to also be defined in re_comp.c. This allows re.pm to use debugging with them.

  To avoid duplicating large data structures, several lightweight wrapper functions are added to regcomp.c that re_comp.c calls to access those structures.
* Move two regex functions so that they can use re debug | Karl Williamson | 2020-03-02 | 1 | -8/+8
  These have to have a version in re_comp.c for re.pm to work on them.
* embed.fnc: Reorder the entries dealing with regexes | Karl Williamson | 2020-03-02 | 1 | -101/+101
  This moves a bunch of entries around so that they make more sense, and consolidates some blocks that had the same #ifdefs. There should be no change in what gets compiled.
* regen/embed.pl: Force F or f flags on ... fcns | Karl Williamson | 2020-03-01 | 1 | -1/+3
  This makes sure that a function with varargs arguments is marked as format or non-format, so that such a new function can't be added without considering if it should be marked as 'f'.
* embed.fnc: Make re_croak a format fcn | Karl Williamson | 2020-03-01 | 1 | -1/+2
  This enables compiler warnings when argument types don't match the format.
* regcomp.c: Change re_croak2 to re_croak | Karl Williamson | 2020-03-01 | 1 | -3/+3
  This changes this function from taking two format parameters to instead taking a single one. The reason is that the generality isn't actually currently needed, and it prevents the function from being declared as taking a format, which adds extra checking. If this checking had been in effect, GH #17574 would have generated a warning message.

  The reason the second format isn't required is that in all cases, both formats are literal strings. In the macros that call this, simply removing the comma separators between them causes the two literals to automagically become one by the C preprocessor.
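A minimal sketch of the idea in this message, under stated assumptions: `sketch_msg`, `SKETCH_MSG2` and `SKETCH_PRINTF_LIKE` are hypothetical names, not Perl's, and the format attribute shown is the GCC/Clang spelling. Two adjacent string literals are merged into one by the C translator, so a single-format function can serve callers that used to pass two formats, and the single format can then be checked against its arguments.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC and Clang can check printf-style arguments; elsewhere this expands
 * to nothing. */
#if defined(__GNUC__)
#  define SKETCH_PRINTF_LIKE(fmt_idx, arg_idx) \
       __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#  define SKETCH_PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* One format parameter, so the declaration can carry the attribute. */
static int sketch_msg(char *buf, size_t size, const char *fmt, ...)
    SKETCH_PRINTF_LIKE(3, 4);

static int sketch_msg(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return written;
}

/* m1 and m2 must both be string literals. Writing them side by side,
 * with no comma, makes them a single literal. */
#define SKETCH_MSG2(buf, size, m1, m2, arg) \
    sketch_msg((buf), (size), m1 m2, (arg))
```

With this shape, a call such as `SKETCH_MSG2(buf, sizeof buf, "bad value %d", " in pattern", 7)` passes the single format `"bad value %d in pattern"`, and a mismatched argument type would draw a compiler warning where the attribute is supported.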
* regcomp.c: Add wrappers for cmplng/xctng wildcard subpatterns | Karl Williamson | 2020-02-19 | 1 | -0/+8
  This is in preparation for being called from more than one place. It has the salubrious effect that the wrapping we do around the user's supplied pattern is no longer visible in the Debug output of that pattern.
* regcomp.c: Create wrapper fcn for re_op_compile | Karl Williamson | 2020-02-19 | 1 | -0/+5
  This does the bulk of re_compile(), but is a private entry point, meaning it takes an extra parameter, and a future commit will call it from another place.
* Move some obsolete UTF-8 handling fcns to mathoms | Karl Williamson | 2020-02-19 | 1 | -2/+12
  Two of the functions are internal to the core; the third has long been deprecated.
* Improve handling of nested qr/(?[...])/ | Karl Williamson | 2020-02-19 | 1 | -0/+3
  A set operations expression can contain a previously-compiled one interpolated in. Prior to this commit, some heuristics were employed to verify it actually was such a thing, and not a sort of look-alike that wasn't necessarily valid. The heuristics actually forbade legal ones. I don't know of any illegal ones that were let through, but it is certainly possible. Also, the error/warning messages referred to the heuristics, and were unhelpful at best.

  The technique used instead in this commit is to return a regop only used by this feature for any nested compilations. This guarantees that the caller can determine if the result is valid, and what that result is without having to do any heuristics or inspecting any flags. The error/warning messages are changed to reflect this, and I believe are now helpful.

  This fixes the bugs in #16779
  https://github.com/Perl/perl5/issues/16779#issuecomment-563987618
* toke.c: Split code to load _charnames.pm into own fnc | Karl Williamson | 2020-02-12 | 1 | -0/+5
  This is in preparation for it being called from more than one place.
* utf8.c: Use common fcn for error message | Karl Williamson | 2020-01-23 | 1 | -4/+6
  There is now a function that generates this error message. This is so that it is always the same from wherever generated.
* Move cntrl_to_mnemonic() to util.c from regcomp.c | Karl Williamson | 2020-01-23 | 1 | -4/+4
  This is in preparation for it being used elsewhere, to reduce duplication of code.
* Change return type of regcurly to bool | Karl Williamson | 2020-01-23 | 1 | -1/+1
  This internal function is more properly bool, not I32.
* Remove dquote_inline.h | Karl Williamson | 2020-01-23 | 1 | -7/+9
  The remaining function in this file is moved to inline.h, just to not have an extra file lying around with hardly anything in it.
* (toke|regcomp).c: Use common fcn to handle \0 problems | Karl Williamson | 2020-01-23 | 1 | -7/+0
  This changes warning messages for too short \0 octal constants to use the function introduced in the previous commit. This function assures a consistent and clear warning message, which is slightly different than the one this commit replaces. I know of no CPAN code which depends on this warning's wording.
* Restructure grok_bslash_[ox] | Karl Williamson | 2020-01-23 | 1 | -2/+11
  This commit causes these functions to allow a caller to request any messages generated to be returned to the caller, instead of always being handled within these functions. The messages are somewhat changed from previously to be clearer. I did not find any code in CPAN that relied on the previous message text.

  Like the previous commit for grok_bslash_c, here are two reasons to do this, repeated here.

  1) In pattern compilation this brings these messages into conformity with the other ones that get generated in pattern compilation, where there is a particular syntax, including marking the exact position in the parse where the problem occurred.

  2) These could generate truncated messages due to the (mostly) single-pass nature of pattern compilation that is now in effect. It keeps track of where during a parse a message has been output, and won't output it again if a second parsing pass turns out to be necessary. Prior to this commit, it had to assume that a message from one of these functions did get output, and this caused some out-of-bounds reads when a subparse (using a constructed pattern) was executed. The possibility of those went away in commit 5d894ca5213, which guarantees it won't try to read outside bounds, but that may still mean it is outputting text from the wrong parse, giving meaningless results. This commit should stop that possibility.
* Restructure grok_bslash_c | Karl Williamson | 2020-01-23 | 1 | -2/+3
  This commit causes this function to allow a caller to request any messages generated to be returned to the caller, instead of always being handled within this function.

  There are two reasons to do this.

  1) In pattern compilation this brings these messages into conformity with the other ones that get generated in pattern compilation, where there is a particular syntax, including marking the exact position in the parse where the problem occurred.

  2) The messages could be truncated due to the (mostly) single-pass nature of pattern compilation that is now in effect. It keeps track of where during a parse a message has been output, and won't output it again if a second parsing pass turns out to be necessary. Prior to this commit, it had to assume that a message from this function did get output, and this caused some out-of-bounds reads when a subparse (using a constructed pattern) was executed. The possibility of those went away in commit 5d894ca5213, which guarantees it won't try to read outside bounds, but that may still mean it is outputting text from the wrong parse, giving meaningless results. This commit should stop that possibility.
* dquote.c: Change parameter name | Karl Williamson | 2020-01-23 | 1 | -4/+4
  In two functions, future commits will generalize this parameter to be possibly a warning message instead of only an error message. Change its name to reflect the added meaning.
* Hoist code point portability warnings | Karl Williamson | 2020-01-23 | 1 | -2/+2
* Improve performance of grok_bin_oct_hex() | Karl Williamson | 2020-01-13 | 1 | -1/+5
  This commit uses a variety of techniques for speeding this up. It is now faster than blead, and has less maintenance cost than before.

  Most of the checks that the current character isn't NUL are unnecessary. The logic works on that character, even if, for some reason, you can't trust the input length. A special test is added to not output the illegal character message if that character is a NUL. This is simply for backcompat.

  And a switch statement is used to unroll the loop for the leading digits in the number. This should handle most common cases. Beyond these, one has to start worrying about overflow. So this version has removed that worrying from the common cases.

  Extra conditionals are avoided for large numbers by extracting the portability warning message code into a separate static function called from two different places. Simplifying this logic led me to see that if it overflowed, it must be non-portable, so another conditional could be removed.

  Other conditionals were removed at the expense of adding parameters to the function. This function isn't public, but is called from the grok_hex, et al. macros. grok_hex knows, for example, that it is looking for an 'x' prefix and not a 'b'. Previously the code had a conditional to determine that.

  Similarly in pp.c, we look for the prefix. Having found it we can start the parse after the prefix, and tell this function not to look for it. Previously, this work was duplicated.

  The previous changes had left this function slower than blead. That is in part due to the fact that the loop doesn't go through that many iterations per function call, and the gcc compiler managed to optimize away the conditionals in XDIGIT_VALUE in the call of it from the loop. (The other call in this function did have the conditionals.)

  Thanks to Sergey Aleynikov for his help on this.
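The idea of keeping overflow checks out of the common case can be sketched as follows. This is not the commit's code: it uses two plain loops rather than the switch-based unrolling the message describes, it handles hex only, and all `sketch_` names are hypothetical. It relies on the fact that 16 hex digits are exactly 64 bits, so the first 16 digits can be accumulated without any overflow test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int sketch_xdigit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses up to len hex digits from s. *consumed receives the number of
 * digits accepted; *overflowed is set if a further digit would not fit. */
static uint64_t sketch_grok_hex(const char *s, size_t len,
                                size_t *consumed, bool *overflowed)
{
    uint64_t value = 0;
    size_t i = 0;
    /* The first 16 digits always fit in 64 bits. */
    const size_t safe = len < 16 ? len : 16;
    int d = 0;

    *overflowed = false;

    /* Common case: no overflow test at all. */
    while (i < safe && (d = sketch_xdigit_value(s[i])) >= 0) {
        value = (value << 4) | (uint64_t) d;
        i++;
    }

    /* Rare case: only inputs with more than 16 digits get here. */
    if (i == safe) {
        while (i < len && (d = sketch_xdigit_value(s[i])) >= 0) {
            if (value > (UINT64_MAX >> 4)) {
                *overflowed = true;
                break;
            }
            value = (value << 4) | (uint64_t) d;
            i++;
        }
    }

    *consumed = i;
    return value;
}
```

Parsing stops at the first non-digit, so a caller that has already skipped an `x` prefix can pass the remainder of the string directly.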
* Collapse grok_bin, _oct, _hex into one function | Karl Williamson | 2020-01-13 | 1 | -0/+3
  These functions are identical in logic in the main loop, the difference being which digits they accept. The rest of the code had slight variations. This commit unifies the functions.

  I presume the reason they were kept separate was because of speed. Future commits will make this unified function faster than blead, and the reduced maintenance cost makes this worthwhile.
* Rewrite and inline my_strnlen() | Karl Williamson | 2020-01-13 | 1 | -1/+3
  This commit changes this function to use memchr() instead of looping byte-by-byte through the string. And it inlines it into 3 lines of code. This should give comparable performance to a native libc strnlen().
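A memchr()-based bounded length function might look like the sketch below. The name `sketch_strnlen` is hypothetical and the body is an assumption about the general technique, not a copy of Perl's my_strnlen().

```c
#include <stddef.h>
#include <string.h>

/* Length of str, but never more than maxlen. memchr() finds the
 * terminating NUL (if any) within the first maxlen bytes. */
static inline size_t sketch_strnlen(const char *str, size_t maxlen)
{
    const char *end = (const char *) memchr(str, '\0', maxlen);
    return end ? (size_t) (end - str) : maxlen;
}
```

When no NUL is found in the first `maxlen` bytes, the function returns `maxlen`, which matches the documented behaviour of POSIX strnlen().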
* embed.fnc: Remove wrong 'const' | Karl Williamson | 2020-01-07 | 1 | -1/+1
  This parameter isn't const.
* Change len param in savepvn to Size_t from I32 | Karl Williamson | 2020-01-07 | 1 | -1/+1
  We handle longer strings than 31 bits.
* utf8.c: Change parameter types of internal fcns | Karl Williamson | 2020-01-03 | 1 | -2/+2
  These generated warnings on certain platform builds, and weren't the best types for the purpose anyway.
* Change parameter type of static fcn | Karl Williamson | 2020-01-03 | 1 | -1/+1
  This makes the first parameter consistent with the other similar parameter.
* Change some structures/fcns to use I32 and U32 | Karl Williamson | 2020-01-03 | 1 | -2/+2
  This is because these deal with only legal Unicode code points, which are restricted to 21 bits, so 16 is too few, but 32 is sufficient to hold them. Doing this saves some space/memory on 64 bit builds where an int is 64 bits.
* Rmv leading underscore from macro name | Karl Williamson | 2019-12-11 | 1 | -3/+3
  These are illegal in C, but we have plenty of them around; I happened to be looking at this function, and decided to fix it. Note that only the macro name is illegal; the function was fine, but to change the macro name means changing the function one.
* Add the `isa` operator | Paul "LeoNerd" Evans | 2019-12-09 | 1 | -0/+10
  Adds a new infix operator named `isa`, with the semantics that

    $x isa SomeClass

  is true if and only if `$x` is a blessed object reference that is either `SomeClass` directly, or includes the class somewhere in its @ISA hierarchy. It is false without warning or error for non-references or non-blessed references.

  This operator respects `->isa` method overloading, and is intended to replace boilerplate code such as

    use Scalar::Util 'blessed';
    blessed($x) and $x->isa("SomeClass")
* PATCH: gh #17275 Silence new warning | Karl Williamson | 2019-11-21 | 1 | -19/+21
  This was caused by a static inline function in a header that was #included in a file that didn't use it. Normally, these functions are #ifdef'd so as to be visible only to files in which they are used. Some compilers warn that the function is defined but not used otherwise.

  The solution is to remove this function's visibility from the file that didn't use it.
* regcomp.c: Add invlist_lowest() and use it | Karl Williamson | 2019-11-20 | 1 | -0/+7
  This makes it less complicated to find the lowest code point in an inversion list. This makes the place where it's used clearer as to what is going on. And it may eventually be used in more than one place.
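For readers unfamiliar with inversion lists, here is a generic sketch of the data structure, assuming the textbook layout: a sorted array of code points in which even-indexed entries start a range that is in the set and odd-indexed entries start a range that is not. The struct and function names are hypothetical, and Perl's actual representation and the real invlist_lowest() may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inversion list: sorted boundaries, alternating in/out. */
struct sketch_invlist {
    const uint32_t *bounds;
    size_t          count;
};

/* Stores the lowest code point in the set in *lowest and returns true,
 * or returns false if the set is empty. In this layout the lowest member
 * is simply the first boundary. */
static bool sketch_invlist_lowest(const struct sketch_invlist *list,
                                  uint32_t *lowest)
{
    if (list->count == 0)
        return false;
    *lowest = list->bounds[0];
    return true;
}

/* Membership test, shown for context: cp is in the set when an odd number
 * of boundaries are less than or equal to it. A linear scan keeps the
 * sketch short; a real implementation would likely use binary search. */
static bool sketch_invlist_contains(const struct sketch_invlist *list,
                                    uint32_t cp)
{
    size_t below = 0;
    while (below < list->count && list->bounds[below] <= cp)
        below++;
    return (below & 1) != 0;
}
```

For example, the boundaries `{0x41, 0x5B, 0x61, 0x7B}` describe the set A-Z plus a-z, and its lowest code point is 0x41.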
* find_first_differing_byte_pos | Karl Williamson | 2019-11-20 | 1 | -0/+5
* add explicit 1-arg and 3-arg sig handler functions | David Mitchell | 2019-11-18 | 1 | -0/+8
  Currently, whether the OS-level signal handler function is declared as 1-arg or 3-arg depends on the configuration. Add explicit versions of these functions, principally so that POSIX.xs can call whichever version of the handler it wants regardless of configuration: see next commit.