path: root/regen
Commit message (author, date; files changed, lines removed/added)
* Reset regen/feature.pl's mode following previous cherry-pick commit (Steve Hay, 2014-12-27; 1 file, -0/+0)
  [Why does my (Windows) git cherry-pick change 100755 modes to 100644?]
* document the postderef feature in feature.pm (Ricardo Signes, 2014-12-27; 1 file, -1/+22)
  (cherry picked from commit f86d720ebb7ad53ce8b1c12cee66586eabffe0c8) [Edited by the committer to bump the $VERSION to a value that has not already been used in a blead release and will not be used in a future blead release.]
* Fix for Coverity perl5 CID 29034: Out-of-bounds read (OVERRUN) (Jarkko Hietaniemi, 2014-04-30; 1 file, -1/+16)
  overrun-local: Overrunning array PL_reg_intflags_name of 14 8-byte elements at element index 31 (byte offset 248) using index bit (which evaluates to 31).
  Needed compile-time limits for PL_reg_intflags_name so that the bit loop doesn't waltz off past the end of the array. Could not use C_ARRAY_LENGTH because the size of the name array is not visible at compile time (only const char*[] is), so modified regcomp.pl to generate the size, and made it visible only under DEBUGGING. Did extflags analogously, even though its size is currently exactly 32 already. The sizeof(flags)*8 is extra paranoia for ILP64.
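The bounded-loop pattern described in this commit can be sketched in C. This is a minimal illustration and not Perl's code. `example_flag_names`, `EXAMPLE_FLAG_NAMES_COUNT` and `describe_flags` are hypothetical stand-ins for PL_reg_intflags_name, the generated size constant and the DEBUGGING dump loop. The loop limit is the smaller of the flags word's bit width and the name count. A set bit that has no name is therefore skipped instead of being used as an out-of-range index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generated data: the generator emits the array and a
   separate count macro, because code that sees only an extern
   "const char * const name[]" declaration cannot apply sizeof to it. */
static const char * const example_flag_names[] = { "FLAG0", "FLAG1", "FLAG2" };
#define EXAMPLE_FLAG_NAMES_COUNT 3

/* Catch a generator that emits a count disagreeing with the array. */
static_assert(sizeof example_flag_names / sizeof example_flag_names[0]
                  == EXAMPLE_FLAG_NAMES_COUNT,
              "generated count must match the name array");

/* Writes the names of the set bits of flags, space-separated, into buf
   (buflen must be at least 1). Bits beyond the last known name are
   ignored instead of indexing past the end of the array. */
static void describe_flags(uint32_t flags, char *buf, size_t buflen)
{
    size_t limit = sizeof(flags) * 8;   /* bits in the flags word */
    size_t bit;

    if (limit > EXAMPLE_FLAG_NAMES_COUNT)
        limit = EXAMPLE_FLAG_NAMES_COUNT;

    buf[0] = '\0';
    for (bit = 0; bit < limit; bit++) {
        if (flags & ((uint32_t)1 << bit)) {
            if (buf[0] != '\0')
                strncat(buf, " ", buflen - strlen(buf) - 1);
            strncat(buf, example_flag_names[bit], buflen - strlen(buf) - 1);
        }
    }
}
```

In this sketch, a call with bit 31 set, the case Coverity reported, produces an empty description and performs no out-of-bounds read.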
* warnings.pm: improve awkward sentence in pod (Ricardo Signes, 2014-03-18; 1 file, -4/+4)
* bump the version of warnings.pm (Ricardo Signes, 2014-03-18; 1 file, -2/+2)
  (and of regen/warnings.pl)
* remove references to perllexwarn from warnings.pm (Ricardo Signes, 2014-03-18; 1 file, -6/+3)
* regen/warnings.pl no longer touches perllexwarn (Ricardo Signes, 2014-03-18; 1 file, -19/+0)
* merge most of perllexwarn into warnings (Ricardo Signes, 2014-03-18; 1 file, -7/+465)
* replace printTree with warningsTree (Ricardo Signes, 2014-03-18; 1 file, -9/+12)
  We return, rather than print, the warnings, so we can potentially futz around with the string and put it where we like without having to worry about C<select>.
* enclose warnings.h generation in a block (Ricardo Signes, 2014-03-18; 1 file, -40/+45)
  ...to limit the number of variables visible everywhere and make it a bit easier to see what I am doing as I refactor regen/warnings.pl
* regcomp.c: Don't read past string-end (Karl Williamson, 2014-03-12; 1 file, -1/+1)
  In doing an audit of regcomp.c, and experimenting using Encode::_utf8_on(), I found this one instance of a regen/regcharclass.pl macro that could read beyond the end of the string if given malformed UTF-8. Hence we convert to use the 'safe' form. There are no other uses of the non-safe version, so we don't need to generate them.
* regen/regcharclass.pl: Don't generate unused macros (Karl Williamson, 2014-03-12; 1 file, -4/+4)
  Having these unused macros around just clutters up the header file.
* regen/regcharclass.pl: Generate correct macro instead of skipping (Karl Williamson, 2014-03-12; 1 file, -2/+1)
  It makes no sense to check for length safeness for the macros generated by this program that take a single UV code point as a parameter. Prior to this patch, it would skip trying to generate them if asked. But, because of the way things are structured, that means that if you need just this and the safe versions, you can't do it so easily. What this commit does is generate the cp macro if requested, even if the 'safe' versions of other macros are also requested.
* regen/regcharclass.pl: Forbid non-safe macros for multi-char matches (Karl Williamson, 2014-03-01; 1 file, -3/+13)
  For matches that can match more than a single code point, one should always use a macro that makes sure that one doesn't read off the end of the buffer. This is because the buffer might end with the first N characters of a sequence with at least N+1 in it, and we don't want to read that N+1 position in the buffer. If this had been in place, buggy commit 3a8bbffbce would not have happened.
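A pair of C macros in the style of generated code can show the failure mode described here: a buffer that ends partway through a multi-character sequence. This is a hypothetical sketch. `EXAMPLE_LINEBREAK_UNSAFE` and `EXAMPLE_LINEBREAK_SAFE` are not regcharclass.pl output. They match either "\r\n" (two bytes) or a lone "\r" or "\n" (one byte), and evaluate to the number of bytes matched, or 0. The pointer `e` points one past the last valid byte.

```c
#include <assert.h>
#include <stddef.h>

/* Unsafe form: whenever s[0] is '\r' it also reads s[1], so it assumes
   at least two readable bytes. If the buffer ends with a lone '\r'
   (the first character of a two-character sequence), that read goes
   past the end. Arguments may be evaluated more than once. */
#define EXAMPLE_LINEBREAK_UNSAFE(s)                         \
    ((s)[0] == '\r' ? ((s)[1] == '\n' ? 2 : 1)              \
     : (s)[0] == '\n' ? 1 : 0)

/* Safe form: use the unsafe form only when two bytes remain; with
   exactly one byte left, only single-character matches are possible,
   so only s[0] is examined. */
#define EXAMPLE_LINEBREAK_SAFE(s, e)                        \
    (((e) - (s)) >= 2 ? EXAMPLE_LINEBREAK_UNSAFE(s)         \
     : ((e) - (s)) == 1                                     \
         ? (((s)[0] == '\r' || (s)[0] == '\n') ? 1 : 0)     \
         : 0)
```

With a buffer ending in "\r", the safe form reports a one-byte match and never looks at the byte after it.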
* regen/regcharclass.pl: Don't generate unused macros (Karl Williamson, 2014-03-01; 1 file, -3/+3)
  The macros generated by these options are not needed in the core; generating them just clutters up the header file, and some will actually be forbidden by the next commit.
* Revert most of 3a8bbffbce: Avoid unnecessary malformed checking (Karl Williamson, 2014-03-01; 1 file, -2/+2)
  My thinking was muddled when I made that commit, and this reverts the essence of it. The theory was that since we have already processed the regex pattern, we don't need to check it for malformedness, hence we don't need the "safe" form of certain macros that check for and avoid running off the end of the buffer. It is true that we don't have to worry about malformedness indicating that the buffer is bigger than it really is, but these macros can match up to three well-formed characters, so we do have to make sure that all three are in the buffer, and that the input isn't just the first two at the buffer's end. This was caught by running valgrind.
* regen/regcharclass.pl: White-space; comment nits only (Karl Williamson, 2014-03-01; 1 file, -8/+8)
  Indent to account for new block added in the previous commit.
* regen/regcharclass.pl: Simplify generated safe macros (Karl Williamson, 2014-03-01; 1 file, -12/+117)
  This simplifies the generated macros that make sure there are no read errors. Prior to this commit, the generated code looked like

      (e - s) > 3
        ? see if things of at most length 4 match
        : (e - s) > 2
          ? see if things of at most length 3 match
          : (e - s) > 1
            ? see if things of at most length 2 match
            : (e - s) > 0
              ? see if things of at most length 1 match

  For things that are a single character, the ones of length greater than 2 must be in UTF-8, and their needed length can be determined by UTF8SKIP, so we can get rid of most of the (e - s) tests.
  This doesn't change the macros that can match multiple characters; that is harder to do.
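The simplification for single-character matches can be sketched as follows. One comparison of the remaining length against the lead byte's sequence length replaces the cascade of (e - s) tests. This is a hedged C illustration. `example_utf8_skip` is a hypothetical stand-in for Perl's UTF8SKIP and does not validate continuation bytes. The matched characters (U+0020, U+00A0 and U+2028) were chosen arbitrarily.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP: the length of a UTF-8 sequence,
   determined from its lead byte alone. */
static size_t example_utf8_skip(const unsigned char *s)
{
    if (s[0] < 0xC0) return 1;   /* ASCII (or a stray continuation byte) */
    if (s[0] < 0xE0) return 2;
    if (s[0] < 0xF0) return 3;
    return 4;
}

/* Matches SPACE (20), NO-BREAK SPACE (C2 A0) or LINE SEPARATOR
   (E2 80 A8); returns the number of bytes matched, or 0. Every match is
   a single character, so one bounds check against the lead byte's
   length is enough before reading the rest of the character. */
static size_t example_is_space_utf8_safe(const unsigned char *s,
                                         const unsigned char *e)
{
    size_t len;

    assert(s <= e);
    if (s == e)
        return 0;
    len = example_utf8_skip(s);
    if ((size_t)(e - s) < len)
        return 0;                 /* character truncated by buffer end */
    switch (len) {
    case 1:  return s[0] == 0x20 ? 1 : 0;
    case 2:  return (s[0] == 0xC2 && s[1] == 0xA0) ? 2 : 0;
    case 3:  return (s[0] == 0xE2 && s[1] == 0x80 && s[2] == 0xA8) ? 3 : 0;
    default: return 0;
    }
}
```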
* regen/regcharclass.pl: Warn that macros are internal only (Karl Williamson, 2014-03-01; 1 file, -1/+6)
  This adds a comment to the generated file that the macros are not to be generally used.
* Change 'semantics' to 'rules' (Karl Williamson, 2014-02-20; 2 files, -4/+4)
  The term 'semantics' in documentation, when applied to character sets, is changed to 'rules', as being a shorter, less jargony synonym in this case. This was discussed several releases ago, but I didn't get around to it.
* subroutine signatures (Zefram, 2014-02-01; 2 files, -1/+24)
  Declarative syntax to unwrap argument list into lexical variables. "sub foo ($a,$b) {...}" checks number of arguments and puts the arguments into lexical variables. Signatures are not equivalent to the existing idiom of "sub foo { my($a,$b) = @_; ... }". Signatures are only available by enabling a non-default feature, and generate warnings about being experimental. The syntactic clash with prototypes is managed by disabling the short prototype syntax when signatures are enabled.
* Work properly under UTF-8 LC_CTYPE locales (Karl Williamson, 2014-01-27; 1 file, -0/+8)
  This large (sorry, I couldn't figure out how to meaningfully split it up) commit causes Perl to fully support LC_CTYPE operations (case changing, character classification) in UTF-8 locales. As a side effect it resolves [perl #56820].

  The basics are easy, but there were a lot of details, and one troublesome edge case discussed below.

  What essentially happens is that when the locale is changed to a UTF-8 one, a global variable is set TRUE (FALSE when changed to a non-UTF-8 locale). Within the scope of 'use locale', this variable is checked, and if TRUE, the code that Perl uses for non-locale behavior is used instead of the code for locale behavior. Since Perl's internal representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.

  More work had to be done for regular expressions. There are three cases.

  1) The character classes \w, [[:punct:]] needed no extra work, as the changes fall out from the base work.

  2) Strings that are to be matched case-insensitively. These form EXACTFL regops (nodes). Note that if such a string contains only above-Latin1 characters that match only themselves, the node can be downgraded to an EXACT-only node, which presents better optimization possibilities, as we now have a fixed string known at compile time to be required to be in the target string to match. Similarly, if all characters in the string match only other above-Latin1 characters case-insensitively, the node can be downgraded to a regular EXACTFU node (match, folding, using Unicode, not locale, rules). The code changes for this could be done without accepting UTF-8 locales fully, but there were edge cases which needed to be handled differently if I stopped there, so I continued on.

  In an EXACTFL node, all such characters are now folded at compile time (just as before this commit), while the other characters whose folds are locale-dependent are left unfolded. This means that they have to be folded at execution time based on the locale in effect at the moment. Again, this isn't a change from before. The difference is that now some of the folds that need to be done at execution time (in regexec) are potentially multi-char. Some of the code in regexec was trivial to extend to account for this because of existing infrastructure, but the part dealing with regex quantifiers needed more work. Also the code that joins EXACTish nodes together had to be expanded to account for the possibility of multi-character folds within locale handling. This was fairly easy, because it already has infrastructure to handle these under somewhat different circumstances.

  3) In bracketed character classes, represented by ANYOF nodes, a new inversion list was created giving the characters that should be matched by this node when the runtime locale is UTF-8. The list is ignored except under that circumstance. To do this, I created a new ANYOF type which has an extra SV for the inversion list.

  The edge case that caused the most difficulty is folding involving the MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range character that folds to outside that range. The issue is that it doesn't naturally fall out that it will match the CAP MU. If we let the CAP MU fold to the small mu at compile time (which it can, because both are above-Latin1 and so the fold is the same no matter what locale is in effect), it could appear that the regnode can be downgraded away from EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not match the CAP MU case-insensitively. This could be special-cased in regcomp and regexec, but I wanted to avoid that. Instead the mktables tables are set up to include the CAP MU as a character whose presence forbids the downgrading, so the special casing is in mktables, and not in the C code.
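The global-flag approach from the start of this message can be sketched with standard POSIX calls. After switching LC_CTYPE, the sketch asks `nl_langinfo(CODESET)` for the locale's codeset and records whether it is UTF-8. This illustration rests on assumptions and is not Perl's implementation. `example_in_utf8_ctype_locale`, `example_codeset_is_utf8` and `example_set_ctype_locale` are hypothetical names. Platforms spell the codeset differently, so the comparison is lenient.

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>

/* Hypothetical global (not Perl's real variable): true while the
   current LC_CTYPE locale uses UTF-8. */
static bool example_in_utf8_ctype_locale = false;

/* Compares a codeset name to "UTF-8", ignoring case and hyphens, since
   platforms spell it differently (e.g. "UTF-8" or "utf8"). */
static bool example_codeset_is_utf8(const char *codeset)
{
    const char *want = "utf8";

    for (; *codeset != '\0'; codeset++) {
        if (*codeset == '-')
            continue;
        if (*want == '\0' || tolower((unsigned char)*codeset) != *want)
            return false;
        want++;
    }
    return *want == '\0';
}

/* Switches LC_CTYPE and refreshes the flag; returns false if the
   locale could not be set. Code running under 'use locale' would test
   the flag and, when it is true, take the Unicode (non-locale) path. */
static bool example_set_ctype_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return false;
    example_in_utf8_ctype_locale =
        example_codeset_is_utf8(nl_langinfo(CODESET));
    return true;
}
```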
* Avoid unnecessary malformed checking (Karl Williamson, 2014-01-27; 1 file, -2/+2)
  regen/regcharclass.pl can create macros for use where we need to worry about the possibility of malformed UTF-8, and for where we don't. In the case of looking at regex patterns, the Perl core has complete control over generating them, and hence isn't generally going to create too short a buffer; if it does, it's a bug that will show up and get fixed. This commit changes to generate and use the faster macros that don't do bounds checking.
* regen/regcharclass.pl: Don't test UV >= 0 (Karl Williamson, 2014-01-27; 1 file, -3/+11)
  An unsigned value is always >= 0, and generating a test for that can lead to a compiler warning, even if it gets optimized out. The input to the macros generated by this is supposed to be a UV. This commit suppresses any >= 0 test.
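regen/regcharclass.pl is a Perl script. A small C generator function can still show the rule this commit adds: when a range starts at 0, emit only the upper-bound test. The function name `example_emit_range_test` and its output format are hypothetical. A test such as `cp >= 0` on an unsigned operand is always true and can provoke warnings like GCC's -Wtype-limits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator helper: writes a C expression testing whether
   the unsigned code point "cp" lies in [lo, hi]. A "cp >= 0" test is
   never emitted, because for an unsigned operand it is always true. */
static int example_emit_range_test(char *buf, size_t buflen,
                                   unsigned long lo, unsigned long hi)
{
    assert(lo <= hi);
    if (lo == hi)
        return snprintf(buf, buflen, "( cp == 0x%lX )", lo);
    if (lo == 0)   /* lower bound is implied by cp being unsigned */
        return snprintf(buf, buflen, "( cp <= 0x%lX )", hi);
    return snprintf(buf, buflen, "( 0x%lX <= cp && cp <= 0x%lX )", lo, hi);
}
```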
* regen/regcharclass.pl: Fix warning (Karl Williamson, 2014-01-27; 1 file, -1/+0)
  wrap() is already defined by the regen infrastructure; there is no need to define it again, and we get a warning if we persist in doing so.
* Move an inversion list generation to mktables (Karl Williamson, 2014-01-27; 2 files, -6/+5)
  Prior to this patch, this was in regen/mk_invlists.pl, but future commits will want it to also be used by the header generated by regen/regcharclass.pl, so use a common source so the logic doesn't have to be duplicated.
* reentr.c: Handle systems without getpwent (Brian Fraser, 2014-01-26; 1 file, -0/+2)
  Namely, Android.
* [perl #120977] bump $warnings::VERSION (Tony Cook, 2014-01-22; 1 file, -1/+1)
* assume "all" in "use warnings 'FATAL';" and related (Hauke D, 2014-01-22; 1 file, -1/+5)
  Until now, the behavior of the statements

      use warnings "FATAL";
      use warnings "NONFATAL";
      no warnings "FATAL";

  was unspecified and inconsistent. This change causes them to be handled with an implied "all" at the end of the import list.
  Tony Cook: fix AUTHORS formatting
* rename aggref warnings to autoderef (Ricardo Signes, 2014-01-14; 2 files, -2/+2)
* Increase $warnings::VERSION to 1.21 (Father Chrysostomos, 2014-01-14; 1 file, -1/+1)
* Make key/push $scalar experimental (Father Chrysostomos, 2014-01-14; 2 files, -0/+3)
  We need a better name for the experimental category, but I have not thought of one, even after sleeping on it.
* IDStart and IDCont no longer go out to disk (Karl Williamson, 2014-01-09; 1 file, -0/+2)
  These are the base names for various macros used in parsing identifiers. Prior to this patch, parsing a code point above Latin1 caused loading disk files. This patch causes all the information to be compiled into the Perl binary.
* regen/mk_invlists.pl: White-space only (Karl Williamson, 2014-01-09; 1 file, -14/+14)
  This outdents a block to be in line with adjacent lines.
* Rmv PL_Posix_ptrs (Karl Williamson, 2014-01-09; 1 file, -14/+0)
  Previous commits in this series have removed all uses of this global array. This completely removes it. Since it is a global, consideration need be given to possible uses of it outside the core. It has never been externally documented, and is an opaque structure whose internals have changed with every release. The functions used to access it are almost all static to regcomp.c; those few that aren't have been hidden from all but the few .c files that need to have access to them, via #if's.
* Remove PL_L1Posix_ptrs (Karl Williamson, 2014-01-09; 1 file, -9/+0)
  This global array is no longer used, having been removed in previous commits in this series. Since it is a global, consideration need be given to possible uses of it outside the core. It has never been externally documented, and is an opaque structure whose internals have changed with every release. The functions used to access it are almost all static to regcomp.c; those few that aren't have been hidden from all but the few .c files that need to have access to them, via #if's.
* Compile in list of foldable code points (Karl Williamson, 2014-01-09; 1 file, -0/+1)
  When constructing what matches code points under /i, Perl uses an inversion list of all the possible code points that participate in folds. This number is relatively few compared to the possible universe of code points, as most of the world's scripts aren't cased, and many characters in the scripts that do fold aren't foldable (such as punctuation). Prior to this commit, the list for the above-Latin1 code points was read in from disk if and only if needed. This commit causes the list to be added to read-only data in a C header, trading a little space in Perl's text segment for speed at execution. This will enable ripping out some code in this and future commits (offsetting the space used by this one).
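An inversion list is a sorted array of code points. Each element starts a range, and the ranges alternate between "in the set" and "not in the set", beginning with "in". Below is a minimal C sketch of such a list compiled into read-only data, with a binary-search membership test. The array contents (A-Z and a-z) and all names are hypothetical, and Perl's own inversion-list code is more elaborate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compiled-in list for A-Z and a-z: the set is
   [0x41, 0x5B) plus [0x61, 0x7B). As const data in a header it lives
   in read-only storage and needs no disk access at run time. */
static const uint32_t example_invlist[] = { 0x41, 0x5B, 0x61, 0x7B };
static const size_t example_invlist_len =
    sizeof example_invlist / sizeof example_invlist[0];

/* Finds the last element <= cp by binary search; cp is in the set
   exactly when that element's index is even. */
static bool example_invlist_contains(const uint32_t *list, size_t len,
                                     uint32_t cp)
{
    size_t lo = 0, hi = len;   /* answer index lies in [lo, hi) */

    if (len == 0 || cp < list[0])
        return false;          /* before the first range: not in set */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }
    assert(list[lo] <= cp);
    return (lo % 2) == 0;
}
```

Lookup cost is logarithmic in the number of range boundaries, which stays small for sets like the foldable code points because they are mostly contiguous runs.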
* Compile in all POSIX class inversion lists (Karl Williamson, 2014-01-09; 1 file, -0/+10)
  This changes charclass_invlists.h to have the complete definitions for all the POSIX classes, like \w and [:alpha:]. Thus these won't have to be loaded off disk at run-time. Taking advantage of this will be done in stages in future commits.
* regen/warnings.pl: Add comments (Karl Williamson, 2014-01-01; 1 file, -0/+8)
  These note that warnings categories should be independent in the calls to ckWARN() and packWARN() type macros.
* silence -Wformat-nonliteral compiler warnings (David Mitchell, 2013-11-28; 1 file, -5/+19)
  Due to the security risks associated with user-supplied formats being passed to C-level printf()-style functions (e.g. %n), gcc has a -Wformat-nonliteral warning that complains whenever such a function is passed a non-literal format string. This commit silences all such warnings in core and ext/.
  The main changes are:
  1) The 'f' (format) flag in embed.fnc is now handled slightly more cleverly. Rather than just applying to functions whose last arg is '...' (and where the format arg is assumed to be the previous arg), it can now handle non-'...' functions: arg checking is disabled, but format checking is still done. It works by assuming that an arg called 'fmt', 'pat' or 'f' is the format string (and dies if it fails to find exactly one such arg).
  2) With the new embed.fnc functionality, more functions have been marked with the 'f' flag. When such a function passes its fmt arg on to an inner printf-like function, we simply disable the warning for that call using GCC_DIAG_IGNORE(-Wformat-nonliteral), since we know that the caller must have already checked it.
  3) In quite a few places the format string isn't literal, but it *is* constant (e.g. PL_warn_uninit_sv). For those cases, again disable the warning.
  4) In pp_formline(), a particular format was one of several different literal strings depending on circumstances. Rather than assigning this string to a temporary variable, incorporate the ?: branches directly in the function call arg. gcc is clever enough to decide the arg is then always literal.
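Points 3) and 4) of this commit can be illustrated in C. This is a hedged sketch. `example_uninit_fmt` is a hypothetical constant format, modeled loosely on PL_warn_uninit_sv. The `#pragma GCC diagnostic` push/ignored/pop sequence appears where Perl would use its GCC_DIAG_IGNORE wrapper, whose exact expansion is not reproduced here. GCC and Clang both accept these pragmas.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Point 3: a constant, but non-literal, format string (hypothetical). */
static const char example_uninit_fmt[] = "Use of uninitialized value%s%s";

static int example_warn_uninit(char *buf, size_t len, const char *name)
{
    int n;
    /* The format is a fixed constant, not user data, so silence
       -Wformat-nonliteral for this single call only. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
    n = snprintf(buf, len, example_uninit_fmt,
                 name ? " " : "", name ? name : "");
#pragma GCC diagnostic pop
    return n;
}

/* Point 4: rather than storing one of two literals in a variable, put
   the ?: in the argument itself; the compiler then sees (and checks)
   both literal formats, so no warning is raised. */
static int example_count(char *buf, size_t len, int n)
{
    return snprintf(buf, len, n == 1 ? "%d item" : "%d items", n);
}
```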
* mark Perl_my_strftime with format attribute (David Mitchell, 2013-11-28; 1 file, -2/+7)
  Mark this function with __attribute__format__null_ok__(__strftime__,pTHX_1,0) so that the compiler can check strftime-style format args and warn about them.
  Rather than adding new flag(s) to embed.fnc, I just enhanced the f flag to treat it as strftime-style rather than printf if the function name matches /strftime/. This was quicker, and we're unlikely to have many such functions.
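GCC's format attribute accepts `strftime` as an archetype, and this commit relies on it. Below is a minimal sketch on a hypothetical wrapper, `example_format_date`. Argument 3 is the format string. The 0 in the first-to-check position says there are no variadic arguments, and GCC requires 0 there for strftime formats. Perl spells this through its own __attribute__format__null_ok__ macro, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Declared with the attribute so calls with a literal format are
   checked like calls to strftime itself. */
static size_t example_format_date(char *buf, size_t len, const char *fmt,
                                  const struct tm *tm)
    __attribute__((format(strftime, 3, 0)));

/* Thin wrapper: forwards the caller's format unchanged to strftime. */
static size_t example_format_date(char *buf, size_t len, const char *fmt,
                                  const struct tm *tm)
{
    return strftime(buf, len, fmt, tm);
}
```

With the attribute in place, a call such as `example_format_date(buf, sizeof buf, "%y", &tm)` can draw a -Wformat-y2k warning (when enabled alongside -Wformat) about a two-digit year, just as a direct strftime call would.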
* Reënable qr caching for (??{}) retval where possible (Father Chrysostomos, 2013-11-24; 1 file, -1/+1)
  When a scalar is returned from (??{...}) inside a regexp, it gets compiled into a regexp if it is not one already. Then the regexp is supposed to be cached on that scalar (in magic), so that the same scalar returned again will not require another compilation.
  Commit e4bfbed39b disabled caching except on references to overloaded objects. But in that one case the caching caused erroneous behaviour, which was just fixed by 636209429f and this commit’s parent, effectively disabling the cache altogether.
  The cache is disabled because it does not apply to TEMP variables (those about to be freed anyway, for which caching would be a waste of CPU), and all non-overloaded non-qr thingies get copied into new mortal (TEMP) scalars (as of e4bfbed39b) before reaching the caching code.
  This commit skips the copy if the return value is already a non-magical string or number. It also allows the caching to happen on constants, which has never been permitted before. (There is actually no reason for disallowing qr magic on read-only variables.)
* Make &CORE::exit respect vmsish exit hint (Father Chrysostomos, 2013-11-08; 1 file, -1/+1)
  ...by removing the hint from the exit op itself and just having pp_exit look in the cop hint hash, where it is already stored (as a result of having been in %^H at compile time). &CORE:: subs intentionally lack a nextstate op (cop) so they can see the hints in the caller’s nextstate op.
* Fix &CORE::exit/die under vmsish "hushed" (Father Chrysostomos, 2013-11-08; 1 file, -1/+1)
  This commit makes them behave like exit and die without the ampersand by moving the OPpHUSH_VMSISH hint from the exit/die op to the current statement (nextstate/cop) instead. &CORE:: subs intentionally lack a nextstate op, so they can see the hints in the caller’s nextstate op.
* Stop lexical CORE sub from interfering with CORE:: (Father Chrysostomos, 2013-11-08; 1 file, -1/+0)
  The way CORE:: was handled in the lexer was convoluted. CORE was treated initially as a keyword, with exceptions in the lexer to make it behave correctly. If it turned out not to be followed by ::, then the lexer would fall back to treating it as a bareword or sub name.
  Before even checking for a keyword, the lexer looks for :: and goes to the bareword/sub code. But it made a special exception there for CORE::.
  In the end, treating CORE as a keyword recognized by the keyword() function requires more special cases than simply special-casing CORE:: in toke.c.
  This fixes the lexical CORE sub bug, while reducing the total number of lines.
* Split ck_open into two functions (Father Chrysostomos, 2013-11-06; 1 file, -1/+1)
  It is used for two op types, but only a small portion of it applies to both, so we can put that in a static function. This makes the next commit easier.
* rv2hv does not use its TARG (Father Chrysostomos, 2013-10-24; 1 file, -1/+1)
  rv2hv has had a TARG since perl 5.000, but it has not used it since hv_scalar was added in perl-5.8.0-3008-ga3bcc51. This commit removes it, saving a tiny bit of space in the pad.
* new warnings category, so bump warnings.pm (Ricardo Signes, 2013-10-05; 1 file, -1/+1)
* Make postderef experimental (Father Chrysostomos, 2013-10-05; 1 file, -0/+2)
* Add postderef_qq feature feature (Father Chrysostomos, 2013-10-05; 1 file, -0/+1)