Commit message (Collapse)AuthorAgeFilesLines
...
* Provide a simple API for testing features enabledTony Cook2021-08-104-0/+311
| | | | | | | | Inspired by discussion in #p5p. This calls caller() itself rather than taking hints and hints_hash parameters so if we end up adding an extra hints word callers won't need to adjust their code.
* Time-HiRes: handle $Config{d_various} correctlyTony Cook2021-08-101-4/+16
| | | | | | | | | | | The Time::HiRes Makefile.PL checks %Config for a variety of symbols so that it can detect them without compiling and running its own probes. This is useful for cross compilation, since the Time::HiRes probes don't appear to handle probing on a remote system as the Configure probes do. A few of these checks didn't set the appropriate -DTIME_HIRES_XXX symbol on the compilation command line; fix that.
* Upgraded Encode from 3.10_01 to 3.12Ricardo Signes2021-08-099-32/+589
|
* perlop: Clarify indented here-doc rulesKarl Williamson2021-08-091-6/+11
|
* Merge branch 'encode-cve-fix' into bleadRicardo Signes2021-08-091-3/+4
|\
| * Encode.pm: apply a local patch for CVE-2021-36770Ricardo Signes2021-08-091-3/+4
|/ | | | | | | | I expect Encode to see a new release today. Without this fix, Encode::ConfigLocal can be loaded from a path relative to the current directory, because the || operator will evaluate @INC in scalar context, putting an integer as the only value in @INC.
* mktables: Fix table outputKarl Williamson2021-08-095-6/+6
| | | | | | Commit 4fe9356b250 changed the signatures on subroutines, and didn't do this one correctly. The result was that the comments in the generated files had duplicate text and were slightly garbled.
* regcomp.c: White-space onlyKarl Williamson2021-08-071-7/+7
|
* regcomp.c: Add comment; fix commentKarl Williamson2021-08-071-1/+46
| | | | | The flagp parameter currently can only be used to pass values up, not down.
* regcomp.c: Initialize a variableKarl Williamson2021-08-071-1/+1
| | | | to silence some compilers that were warning
* regcomp.c: Save a value instead of re-calling fcnKarl Williamson2021-08-071-2/+4
| | | | | This variable will be used in future commits in more places, so compute it just once.
* regcomp.c: Add a clearer mnemonicKarl Williamson2021-08-071-26/+30
|
* regcomp.c: Move some code to within a blockKarl Williamson2021-08-071-3/+3
| | | | | This code is irrelevant unless the condition of the block immediately before it is TRUE, so move it to within that block.
* regcomp.c: Consolidate duplicate codeKarl Williamson2021-08-071-17/+17
|
* regcomp.c: S_optimize_regclass() return 0 if failKarl Williamson2021-08-071-7/+13
| | | | | Based on a comment from @hvds, I think it better if this function return an impossible node value if it didn't find a node to use.
* regcomp.c: Add some branch predictorsKarl Williamson2021-08-071-2/+2
|
* regcomp.c: Move some code out of unlikely #ifdefKarl Williamson2021-08-071-4/+5
| | | | | Spotted by Hugo van der Sanden. Doing this caused it to attempt to be compiled, and showed a typo.
* perlfunc: Fix typoKarl Williamson2021-08-071-1/+1
|
* utf8.c: Rename formal param to static fcnKarl Williamson2021-08-073-21/+21
| | | | The new name is more mnemonic
* regexec.c: Add commentKarl Williamson2021-08-071-0/+1
|
* handy.h: Fix internal macroKarl Williamson2021-08-071-1/+1
| | | | | | | | | | I found this reading code. The macro is supposed to check for something not being in the ASCII range, but instead checked that the input is invariant under UTF-8. These concepts evaluate to the same thing on ASCII platforms, but differently on EBCDIC ones. The calls to this macro are such that there isn't a bug that surfaces here, but the code generated is slightly different, and it should be fixed to prevent any future issues.
* regexec.c: Refactor macro to generalize itKarl Williamson2021-08-071-11/+27
| | | | This is in preparation for a somewhat different use to be added.
* APItest.xs: White space onlyKarl Williamson2021-08-071-1528/+1528
| | | | Remove tabs, trailing white space
* regcharclass.pl: Add fast surrogate UTF-8 trieKarl Williamson2021-08-072-2/+14
| | | | | This will be used in the next commit. It requires only the first two bytes to determine if a UTF-8 or UTF-EBCDIC sequence is for a surrogate
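The two-byte property described above can be sketched as follows. This is only an illustration for ASCII-platform UTF-8, where U+D800..U+DFFF encode as 0xED followed by a byte in 0xA0..0xBF; the function name is hypothetical and this is not the trie that regcharclass.pl generates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a perl API): reports whether the bytes at s
 * begin a UTF-8 encoded surrogate, U+D800..U+DFFF.  Only the first two
 * bytes are examined.  Assumes ASCII-platform UTF-8, not UTF-EBCDIC. */
bool
sketch_starts_surrogate(const unsigned char *s, size_t len)
{
    if (len < 2) {
        return false;           /* not enough input to decide */
    }

    /* U+D800 is ED A0 80 and U+DFFF is ED BF BF, so the start byte is
     * always 0xED and the first continuation byte is 0xA0..0xBF. */
    return s[0] == 0xED && s[1] >= 0xA0 && s[1] <= 0xBF;
}
```

The third byte never needs to be read, which is what makes a short lookup structure sufficient.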
* utf8.h: Comment changesKarl Williamson2021-08-071-10/+21
|
* utf8.h: White space onlyKarl Williamson2021-08-071-19/+19
|
* utf8.h: Refactor UTF8_IS_NONCHAR...Karl Williamson2021-08-071-7/+6
| | | | | | UTF8_IS_NONCHAR_GIVEN_THAT_NON_SUPER_AND_GE_PROBLEMATIC() is defined just for backward compatibility (though I don't think anyone uses it). Swap which macro is the base level that the other is defined in terms of.
* Refactor UTF8_IS_SUPER()Karl Williamson2021-08-071-20/+14
| | | | | | This uses macros recently introduced to remove an EBCDIC dependency and make the definition simpler. It now uses the DFA, which should speed up the non-edge case uses.
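For readers unfamiliar with what "super" means here, the test is whether a sequence encodes a code point above U+10FFFF. The sketch below shows that condition as a plain byte comparison on ASCII-platform UTF-8; it is not the DFA-based definition the commit introduces, it assumes the input is otherwise well-formed, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not the perl macro): true if the well-formed
 * sequence at s encodes a code point above U+10FFFF.  Assumes
 * ASCII-platform UTF-8, including Perl's extended start bytes. */
bool
sketch_is_above_unicode(const unsigned char *s, size_t len)
{
    if (len == 0) {
        return false;
    }

    /* U+10FFFF is F4 8F BF BF; U+110000 is F4 90 80 80. */
    if (s[0] > 0xF4) {
        return true;            /* start byte alone settles it */
    }
    if (s[0] == 0xF4 && len >= 2) {
        return s[1] >= 0x90;    /* edge case: need the second byte */
    }
    return false;
}
```

Only the 0xF4 start byte requires looking at a second byte, which is the edge case the commit message alludes to.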
* utf8.h: Document some #definesKarl Williamson2021-08-071-0/+37
| | | | | The reorganization in the previous commit revealed some undocumented public macros
* utf8.h: Move some #defines aroundKarl Williamson2021-08-071-116/+119
| | | | | | This moves the defines for things like surrogates, non-character code points, etc. to a more logical order, with like adjacent to like, and before they are otherwise used in the file.
* Remove EBCDIC-only codeKarl Williamson2021-08-076-47/+34
| | | | The previous commit stopped using this code, so can just get rid of it.
* utf8.h: Remove EBCDIC dependencyKarl Williamson2021-08-072-12/+10
| | | | By generalizing a macro, we can make it serve both ASCII and EBCDIC
* utf8.h: Add macros to calc UTF start byte, first contKarl Williamson2021-08-071-4/+41
| | | | | These two bytes are useful to know in some situations. This commit changes a couple such places to use the first macro.
* utfebcdic.h: White-space, comment onlyKarl Williamson2021-08-071-11/+17
|
* utf8.h: Reorder some preprocessor directivesKarl Williamson2021-08-071-14/+10
| | | | This is just so that things are clearer to the reader
* utf8.c: in-line only use of two macrosKarl Williamson2021-08-071-40/+30
| | | | | These macros don't need to be macros, as they each are only called from one place, and that isn't likely to change.
* utf8.c: Comment non-obvious fcn param meaningKarl Williamson2021-08-071-1/+2
|
* uvoffuni_to_utf8_flags_msgs: Avoid extra conditionalsKarl Williamson2021-08-073-27/+46
| | | | | | | The previous commit for EBCDIC paved the way for moving some checks for a code point being for Perl extended UTF-8 out of places where they cannot succeed. The resultant simplifications more than compensate for the two extra case statements added by this commit.
* Fix EBCDIC deficiency in uvoffuni_to_utf8_flags_msgs()Karl Williamson2021-08-071-4/+16
| | | | | | Simply by adjusting the case statement labels, and adding an extra case, the code can avoid checking for a problem on EBCDIC boxes when it would be impossible for the problem to exist.
* Refactor uvoffuni_to_utf8_flags_msgsKarl Williamson2021-08-071-119/+73
| | | | | | | | | | Having a fast UVOFFUNISKIP() allows this function to be refactored to simplify it. This commit continues to shortchange large code points and EBCDIC by a little. For example, it checks if a 4-byte character is above Unicode, but no 4-byte characters fit that description in UTF-EBCDIC. This will be fixed in the next commit, which will prepare for further enhancements.
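The general shape of "compute the length first, then switch on it" can be sketched as below. This is a simplified illustration restricted to standard UTF-8 (at most 4 bytes); both function names are hypothetical stand-ins, and unlike the real function, which takes flags, this sketch simply rejects surrogates and code points above U+10FFFF by returning 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fast skip computation: the number of
 * bytes needed to encode cp in standard UTF-8, or 0 if cp is above
 * U+10FFFF. */
size_t
sketch_uni_skip(uint32_t cp)
{
    return cp < 0x80       ? 1
         : cp < 0x800      ? 2
         : cp < 0x10000    ? 3
         : cp <= 0x10FFFF  ? 4
         :                   0;
}

/* Encodes cp into d (which must have room for 4 bytes) and returns the
 * length written, or 0 if cp is rejected.  Each case only performs the
 * checks that can matter for that length. */
size_t
sketch_encode_utf8(uint32_t cp, unsigned char *d)
{
    size_t len = sketch_uni_skip(cp);

    switch (len) {
      case 1:
        d[0] = (unsigned char) cp;
        break;
      case 2:
        d[0] = (unsigned char) (0xC0 | (cp >> 6));
        d[1] = (unsigned char) (0x80 | (cp & 0x3F));
        break;
      case 3:
        /* Surrogates only occur among 3-byte sequences, so this is
         * the only case that has to test for them. */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;
        }
        d[0] = (unsigned char) (0xE0 | (cp >> 12));
        d[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        d[2] = (unsigned char) (0x80 | (cp & 0x3F));
        break;
      case 4:
        d[0] = (unsigned char) (0xF0 | (cp >> 18));
        d[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        d[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        d[3] = (unsigned char) (0x80 | (cp & 0x3F));
        break;
      default:
        return 0;               /* above U+10FFFF in this sketch */
    }
    return len;
}
```

Knowing the length up front means problem checks can be placed only in the cases where they can succeed, which is the kind of simplification the following commits pursue.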
* utf8.c: Change formal parameter name to fcnKarl Williamson2021-08-073-41/+41
| | | | This will make the next commit easier to follow
* inline.h: Macroize DFA for isFOO_UTF8_CHAR()Karl Williamson2021-08-071-60/+114
| | | | | | | | | | | | | | | | | | | | | | | | | There are currently three functions for variants of finding if the next few bytes of a string form a proper UTF-8 encoded character of some ilk. The main code for each is identical to the others, except for the table that drives it. This commit makes that code a macro that takes arguments to customize its behavior sufficiently for current and foreseeable needs. This makes it easier to keep the varieties in sync with each other with future changes. The macro has three exit points: 1) successful parsing; 2) unsuccessful parsing; 3) successful parsing as far as it went, but the input was exhausted before reaching a full character. What to do for each of these eventualities is passed to the macro. This is a change from the previous behavior, in which 2) and 3) were not distinguished from each other. This actually leads to fewer tests in some situations, and future commits using this DFA for other purposes will take advantage of it.
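The three outcomes listed above can be illustrated with a small checker that reports them separately. This sketch is a plain function rather than the table-driven DFA macro the commit describes, it covers only standard (strict) UTF-8, and every name in it is hypothetical.

```c
#include <stddef.h>

/* The three exit points, as distinct results. */
typedef enum {
    SKETCH_UTF8_OK,          /* 1) a complete, well-formed character  */
    SKETCH_UTF8_MALFORMED,   /* 2) the bytes cannot form a character  */
    SKETCH_UTF8_INCOMPLETE   /* 3) fine so far, but input ran out     */
} sketch_utf8_status;

/* Examines at most one character starting at s, with avail bytes of
 * input.  On SKETCH_UTF8_OK, *len is set to the character's length. */
sketch_utf8_status
sketch_check_utf8_char(const unsigned char *s, size_t avail, size_t *len)
{
    unsigned char lo = 0x80, hi = 0xBF;  /* allowed range, 2nd byte */
    size_t need, i;

    if (avail == 0) {
        return SKETCH_UTF8_INCOMPLETE;
    }
    if (s[0] < 0x80) {
        *len = 1;
        return SKETCH_UTF8_OK;
    }

    if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        need = 2;
    }
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        need = 3;
        if (s[0] == 0xE0) lo = 0xA0;     /* exclude overlongs  */
        if (s[0] == 0xED) hi = 0x9F;     /* exclude surrogates */
    }
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        need = 4;
        if (s[0] == 0xF0) lo = 0x90;     /* exclude overlongs        */
        if (s[0] == 0xF4) hi = 0x8F;     /* exclude above U+10FFFF   */
    }
    else {
        return SKETCH_UTF8_MALFORMED;    /* stray continuation, C0/C1, F5+ */
    }

    for (i = 1; i < need; i++) {
        if (i >= avail) {
            return SKETCH_UTF8_INCOMPLETE;
        }
        if (s[i] < lo || s[i] > hi) {
            return SKETCH_UTF8_MALFORMED;
        }
        lo = 0x80;                       /* later bytes: full range */
        hi = 0xBF;
    }

    *len = need;
    return SKETCH_UTF8_OK;
}
```

A caller reading from a stream can treat the incomplete result as "fetch more bytes and retry" rather than as an error, which is one benefit of keeping outcomes 2) and 3) apart.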
* Add helper function for longest UTF8 sequenceKarl Williamson2021-08-074-0/+74
| | | | | | | | | This specialized functionality is used to check the validity of Perl's extended-length UTF-8, which has some idiosyncratic characteristics that differ from the shorter sequences. This means this function doesn't have to consider those differences. It will be used in the next commit to avoid some work, and to eventually enable is_utf8_char_helper() to be simplified.
* utf8.c: Fold 2 overlapping fcns into oneKarl Williamson2021-08-074-217/+97
| | | | | | | | One of these functions is now only called from the other, and there is significant overlap in their logic. This commit refactors them into one resulting function, which is half the code, and more straightforward.
* utf8.c: Change internal macro nameKarl Williamson2021-08-071-6/+6
| | | | | The sequences here aren't UTF-8, but UTF, since they are I8 in UTF-EBCDIC terms
* utf8.c: Improve algorithm for detecting overflowKarl Williamson2021-08-071-61/+25
| | | | | | | | | | | | | | The code has hard-coded into it the UTF-8 for the highest representable code point for various platforms and word sizes. The algorithm is to compare the input sequence to verify it is <= the highest. But the tail of each of them has some number of the highest possible continuation byte. We need not look at the tail, as the input cannot be above the highest possible. This commit shortens the highest string constants and exits the loop when we get to where the tail used to be. This change allows for the complete removal of the code that is #ifdef'd out that would be used when we allow core to use code points up to UV_MAX.
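The prefix comparison described above might look like the following sketch. The constant is an assumption for illustration: on a 64-bit ASCII platform, the extended UTF-8 for 2**64 - 1 is taken to be 0xFF 0x80 0x8F followed by ten 0xBF bytes, so only the first three bytes are stored. The names are hypothetical and the input is assumed to be otherwise well-formed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed for illustration: the leading bytes of the highest
 * representable code point on a 64-bit ASCII platform.  The omitted
 * tail consists solely of 0xBF, the highest continuation byte. */
static const unsigned char sketch_highest_prefix[] = { 0xFF, 0x80, 0x8F };

/* Returns true if the well-formed sequence at s encodes a value too
 * large to represent.  If the input is shorter than the prefix and
 * matches it so far, this sketch reports false (undecided). */
bool
sketch_overflows(const unsigned char *s, size_t len)
{
    size_t n = sizeof(sketch_highest_prefix);
    size_t i;

    for (i = 0; i < n && i < len; i++) {
        if (s[i] > sketch_highest_prefix[i]) {
            return true;        /* strictly above the maximum */
        }
        if (s[i] < sketch_highest_prefix[i]) {
            return false;       /* strictly below the maximum */
        }
    }

    /* Equal through the prefix: no continuation byte can exceed 0xBF,
     * so the tail cannot push the value above the maximum. */
    return false;
}
```

Stopping at the end of the prefix is safe precisely because the omitted tail is already the largest value any well-formed tail can take.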
* utf8.c: Use STRLENs() instead of sizeof()Karl Williamson2021-08-071-9/+14
| | | | This makes the code easier to read.
* utf8.c: Use C_ARRAY_LENGTH()Karl Williamson2021-08-071-1/+1
| | | | This macro is preferred to sizeof()
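As a rough illustration of why such macros read better than a bare sizeof(), the sketch below defines local equivalents. The SKETCH_ names are hypothetical; they mirror the usual idioms (element count as total size over element size, literal length as sizeof minus the trailing NUL) and are not copied from handy.h.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define SKETCH_ARRAY_LENGTH(a)  (sizeof(a) / sizeof((a)[0]))

/* Length of a string literal, excluding the trailing NUL.  The ""
 * on each side makes the macro fail to compile for non-literals. */
#define SKETCH_STRLENs(s)       (sizeof("" s "") - 1)

static const unsigned short sketch_table[] = { 1, 2, 3, 4, 5 };

/* sizeof() alone gives bytes, which differs from the element count
 * whenever the element type is wider than one byte. */
size_t sketch_table_bytes(void)    { return sizeof(sketch_table); }
size_t sketch_table_elements(void) { return SKETCH_ARRAY_LENGTH(sketch_table); }
size_t sketch_literal_length(void) { return SKETCH_STRLENs("abc"); }
```

Both values are compile-time constants, so the clearer spelling costs nothing at run time.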
* utf8.c: Generalize static fcnKarl Williamson2021-08-074-30/+45
| | | | | | | | | I've always been uncomfortable with the input constraints this function had. Now that it has been refactored into using a switch(), new cases for full generality can be added without affecting performance, and some conditionals removed before calling it. The function is renamed to reflect its greater generality
* utf8.c: Refactor internal functionKarl Williamson2021-08-071-35/+25
| | | | | The insight in the previous commit allows this function to become much more compact.