Commit message | Author | Age | Files | Lines
* Digest-MD5: Consolidate byte-swapping paths | Matt Turner | 2019-10-08 | 3 | -157/+3
| The code guarded by #ifndef U32_ALIGNMENT_REQUIRED attempts to optimize byte-swapping by doing unaligned loads, but accessing data through unaligned pointers is undefined behavior in C. Moreover, compilers are more than capable of recognizing these open-coded byte-swap patterns and emitting a bswap instruction, or an unaligned load instruction, or a combined load, etc. There's no need for multiple paths to attain the desired result.
| See https://rt.perl.org/Ticket/Display.html?id=133495
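As an illustration of the "open-coded" pattern that message refers to, here is a minimal sketch, not the code from Digest-MD5 itself. The function names `load_le32` and `store_le32` are hypothetical. MD5 works on little-endian 32-bit words, so the sketch reads and writes them one byte at a time instead of casting the buffer to a `uint32_t *`.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value one byte at a time.
 * No pointer cast is involved, so the alignment of p does not matter. */
static uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value as four little-endian bytes. */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}
```

Because the result is defined purely in terms of byte values, the same source gives the same answer on big-endian and little-endian hosts, and the compiler is free to turn it into a single load or byte-swap where the target allows.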
* Merge branch 'Remove EBCDCIC special handling' into blead | Karl Williamson | 2019-10-06 | 2 | -103/+65
|\
| | It turns out that only a single number is needed to distinguish between basic UTF-8 and UTF-EBCDIC: the number of bits of real information in each continuation byte. In UTF-8 it is 6 (2 bits reserved for syntax); in UTF-EBCDIC it is 5. Everything else stems from reasonable decisions based on this fundamental difference. So all the other constants can be common between the two systems, using compile-time shifts and masks.
| | For Perl's extended UTF-8-like encoding, another constant is needed, which is the number of continuation bytes appended when the start byte is 8 bits. For both systems, that number is the minimum required to be able to encode a 64-bit integer. (There are other ways to extend the encoding, including some that are infinitely so. But Perl chose to just append a fixed number of bytes, so it isn't extensible. But it has the advantage of needing to rely only on the first byte to know how many more are coming.)
| | This commit consolidates various constants that differed between the two systems, but were unnecessarily so. There are other constants that remain that differ between the two files; these are for convenience.
| * Make defn of UTF_IS_CONTINUED common | Karl Williamson | 2019-10-06 | 2 | -10/+5
| | This can be derived from other values, removing an EBCDIC dependency
| * Make defn of UVCHR_IS_INVARIANT common | Karl Williamson | 2019-10-06 | 2 | -24/+14
| | This can be derived from other values, removing an EBCDIC dependency
| * Make defn of OFFUNI_IS_INVARIANT common | Karl Williamson | 2019-10-06 | 2 | -11/+5
| | This can be derived from other values, removing an EBCDIC dependency
| * Make defn of UTF8_IS_DOWNGRADEABLE_START common | Karl Williamson | 2019-10-06 | 2 | -11/+7
| | This can be derived from other values, removing an EBCDIC dependency
| * Make defn of UTF_IS_ABOVE_LATIN1 common | Karl Williamson | 2019-10-06 | 2 | -12/+8
| | This can be derived from other values, removing an EBCDIC dependency
| * Make defn of UTF8_IS_START common | Karl Williamson | 2019-10-06 | 2 | -10/+10
| | This can be derived from other values, removing an EBCDIC dependency
| * Make defn of UTF8_IS_CONTINUATION common | Karl Williamson | 2019-10-06 | 2 | -14/+6
| | This can be derived from other values, removing an EBCDIC dependency
| * Make defn of UTF_CONTINUATION_MARK common | Karl Williamson | 2019-10-06 | 2 | -6/+6
| | This can be derived from other values, removing an EBCDIC dependency
| * Make defn of UTF_IS_CONTINUATION_MASK common | Karl Williamson | 2019-10-06 | 2 | -5/+4
| | This variable can be defined from the same base in both UTF-8 and UTF-EBCDIC, and doing so eliminates an EBCDIC dependency.
|/
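The merge message for this branch says the continuation-byte constants can all be derived from one number, the count of information bits per continuation byte (6 for UTF-8, 5 for UTF-EBCDIC). The sketch below illustrates that idea with shifts and masks. The function names are hypothetical and this is not the text of utf8.h. For UTF-EBCDIC it assumes the bytes are in the intermediate (I8) form, before translation to native EBCDIC code points.

```c
/* Low bits that carry payload in a continuation byte. */
static unsigned payload_mask(unsigned info_bits)
{
    return (1u << info_bits) - 1u;
}

/* High bits that identify a byte as a continuation byte. */
static unsigned marker_mask(unsigned info_bits)
{
    return (0xFFu << info_bits) & 0xFFu;
}

/* Value those high bits take in a continuation byte. 0xB0 is 1011 0000,
 * so the AND yields 10...... for 6 bits and 101..... for 5 bits. */
static unsigned marker_value(unsigned info_bits)
{
    return marker_mask(info_bits) & 0xB0u;
}

/* True if byte has the continuation-byte bit pattern for this encoding. */
static int is_continuation(unsigned byte, unsigned info_bits)
{
    return (byte & marker_mask(info_bits)) == marker_value(info_bits);
}
```

With `info_bits` fixed at compile time (as a macro rather than a parameter), each of these collapses to a constant, which is the effect the branch describes.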
* utf8.h: Add comment | Karl Williamson | 2019-10-06 | 1 | -1/+3
|
* utf8.h: Remove redundant cast | Karl Williamson | 2019-10-06 | 1 | -1/+1
| The called macro does the cast already
* utf8.h: Make sure macros not called with a ptr | Karl Williamson | 2019-10-06 | 1 | -8/+8
| By doing an '| 0' with a parameter in a macro expansion, a C compile-time error will be generated if that parameter is a pointer. This is free protection.
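A minimal sketch of the '| 0' guard, using a hypothetical macro `IS_ASCII_BYTE` rather than any macro from utf8.h. Bitwise OR with 0 leaves an integer argument unchanged, but C does not allow a pointer as an operand of `|`, so a caller who passes the string pointer instead of the byte gets a compile error.

```c
/* Hypothetical classification macro. The "| 0" costs nothing at run time
 * for integer arguments and rejects pointer arguments at compile time. */
#define IS_ASCII_BYTE(c) ((((c) | 0) & ~0x7F) == 0)

/* Count the ASCII bytes in a NUL-terminated string. */
static int count_ascii(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++) {
        if (IS_ASCII_BYTE(*s))   /* correct: pass the byte itself */
            n++;
        /* IS_ASCII_BYTE(s) would not compile, because s is a pointer. */
    }
    return n;
}
```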
* t/TEST: Test most of CPAN on EBCDIC | Karl Williamson | 2019-10-06 | 1 | -3/+17
| CPAN was mostly skipped before because so many distros raised errors, but that is no longer true, so just skip about 10 that have big problems, and test the rest
* Small typo in README.os390 from 2001-01-08 09:53 | H.Merijn Brand | 2019-10-04 | 1 | -1/+1
| 0e06870bf080a38cda51c06c6612359afc2334e1
* Recent os390 experiences reflected in docs and hints | H.Merijn Brand | 2019-10-03 | 3 | -45/+61
|
* fix some signed/unsigned warnings | David Mitchell | 2019-10-03 | 2 | -5/+5
| Note that utf8_distance returns IV, while STR_LEN is an unsigned value of varying sizes.
* regen charclass_invlists.h | David Mitchell | 2019-10-03 | 4 | -5/+5
| this was missed from the previous commit
| Also, fix typo in regen/regcharclass.pl
| It was still referring to itself as Porting/regcharclass.pl
* Use balanced delimiters for multi-line s///gxe | Dagfinn Ilmari Mannsåker | 2019-10-03 | 5 | -7/+7
|
* lib/charnames.t: Fix Named Sequence test for EBCDIC | Karl Williamson | 2019-10-02 | 1 | -0/+3
| The file from Unicode needs to be translated to native
* mktables: Fix Named Sequences for EBCDIC | Karl Williamson | 2019-10-02 | 5 | -4/+9
| This table wasn't being translated into native code points
* perldelta for 30fc7a2809e5 | Tony Cook | 2019-10-02 | 1 | -0/+18
|
* Eliminate modifiable variables in constants | James E Keenan | 2019-10-02 | 3 | -48/+25
| Transform previously deprecated cases into exceptions.
| Update diagnostic; change D to F
| remove now irrelevant code (TonyC)
| For: RT 134138
* Document the various stacks in perlguts.pod | Paul "LeoNerd" Evans | 2019-10-02 | 1 | -1/+235
|
* mathoms: Restore fcns accidentally deleted | Karl Williamson | 2019-09-30 | 1 | -0/+72
| Commit 059703b088f44d5665f67fba0b9d80cad89085fd removed more code than was intended. This commit restores the missing functions.
| This showed up in MSWin32 builds, I presume VMS as well.
| Spotted by Tony Cook
* regcomp.c: Fix MSWin32 compilation error | Karl Williamson | 2019-09-30 | 1 | -1/+6
| On DEBUGGING builds, the asserts in the expansion of this macro build up too large of literal strings for the Win32 compiler. Solve this by storing to an intermediary.
* Remove deprecated character classification/case changing macros | Karl Williamson | 2019-09-29 | 9 | -1121/+206
| It has been deprecated since 5.26 to use various macros that deal with UTF-8 inputs but don't have a parameter indicating the maximum length beyond which we should not look. This commit changes all such macros, as threatened in existing documentation and warning messages, to have an extra parameter giving the length.
| This was originally scheduled to happen in 5.30, but was delayed because it broke some CPAN modules, and there wasn't really a good way around it. But now that Devel::PPPort 3.54 is out, ppport.h has new facilities for getting modules making these changes to work with older Perl releases.
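To show why a length (or end-of-buffer) parameter matters for UTF-8 helpers, here is a minimal sketch with hypothetical function names. It is not one of the Perl macros. It also handles only standard UTF-8, not Perl's extended encoding. The check refuses to look past `e`, which a single-pointer interface cannot do.

```c
#include <stddef.h>

/* Bytes a standard UTF-8 sequence needs, judged from its first byte;
 * 0 if that byte cannot start a sequence. */
static size_t utf8_needed(unsigned char first)
{
    if (first < 0x80) return 1;
    if (first >= 0xC2 && first <= 0xDF) return 2;
    if (first >= 0xE0 && first <= 0xEF) return 3;
    if (first >= 0xF0 && first <= 0xF4) return 4;
    return 0;
}

/* s is the current position; e points one past the last valid byte.
 * Returns 1 only if a whole character lies inside [s, e). */
static int whole_char_available(const unsigned char *s, const unsigned char *e)
{
    size_t need;
    if (s >= e)
        return 0;
    need = utf8_needed(*s);
    return need != 0 && (size_t)(e - s) >= need;
}
```

A caller would run this kind of check before classifying or case-changing the character at `s`, so that a truncated sequence at the end of a buffer is rejected rather than read past.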
* APItest: Remove use of macros about to be removed | Karl Williamson | 2019-09-29 | 3 | -100/+76
| The next commit removes some macros that this uses. They have been deprecated, and the uses here were to test those deprecations.
* perl.h: Silence warning when compiled with C++ | Karl Williamson | 2019-09-29 | 1 | -0/+2
| This silences a warning that the pragma it surrounds is not valid on C++. We don't need to know that, and it clutters the compilation output.
* regex: Add LEXACT_ONLY8 node type | Karl Williamson | 2019-09-29 | 7 | -133/+176
| This is like LEXACT, but it is known that only strings encoded in UTF-8 will match it, so don't even have to try if that condition isn't met.
* regex: Create and handle LEXACT nodes | Karl Williamson | 2019-09-29 | 3 | -9/+125
| See the previous commit for info on these. I am not changing trie code to recognize these at this time.
* Add regnode LEXACT, for long strings | Karl Williamson | 2019-09-29 | 5 | -160/+217
| This commit adds a new regnode for strings that don't fit in a regular one, and adds a structure for that regnode to use. Actually using them is deferred to the next commit.
| This new regnode structure is needed because the previous structure only allows for an 8 bit length field, 255 max bytes. This commit puts the length instead in a new field, the same place single-argument regnodes put their argument. Hence this long string is an extra 32 bits of overhead, but at no string length is this node ever bigger than the combination of the smaller nodes it replaces.
| I also considered simply combining the original 8 bit length field (which is now unused) with the first byte of the string field to get a 16 bit length, and have the actual string be offset by 1. But I rejected that because it would mean the string would usually not be aligned, slowing down memory accesses.
| This new LEXACT regnode can hold up to what 1024 regular EXACT ones hold, using 4K fewer overhead bytes to do so. That means it can handle strings containing 262000 bytes. The comments give ideas for expanding that should it become necessary or desirable.
| Besides the space advantage, any hardware acceleration in memcmp can be done in much bigger chunks, and otherwise the memcmp inner loop (often written in assembly) will run many more times in a row, and our outer loop that calls it, correspondingly fewer.
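The overhead figures in that message can be checked with a small sketch. The two struct layouts below are assumptions modeled on the description (an 8-bit length in the short node, a 32-bit length field in the long one). The struct and function names are hypothetical and are not copied from regcomp.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a short string node: 8-bit length, 4 header bytes. */
struct short_str_node {
    uint8_t  str_len;
    uint8_t  type;
    uint16_t next_off;
    char     string[1];
};

/* Assumed layout of a long string node: the 8-bit slot is unused and the
 * length moves to a 32-bit field, giving 8 header bytes. */
struct long_str_node {
    uint8_t  unused;
    uint8_t  type;
    uint16_t next_off;
    uint32_t str_len;
    char     string[1];
};

enum { SHORT_MAX_LEN = 255, SHORT_HEADER_BYTES = 4, LONG_HEADER_BYTES = 8 };

/* Number of short nodes needed to hold len bytes of string. */
static size_t short_nodes_needed(size_t len)
{
    return (len + SHORT_MAX_LEN - 1) / SHORT_MAX_LEN;
}

/* Total header overhead when len bytes are split across short nodes. */
static size_t short_overhead(size_t len)
{
    return short_nodes_needed(len) * SHORT_HEADER_BYTES;
}
```

Under these assumptions a 256-byte string costs 8 header bytes either way, so the long node is never larger than the short nodes it replaces. At the stated limit, 1024 short nodes hold 1024 × 255 = 261,120 bytes (roughly the 262000 quoted) with 4096 header bytes, against 8 for one long node.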
* regcomp.c: Change handling of filled EXACT nodes | Karl Williamson | 2019-09-29 | 2 | -153/+292
| This changes the detection mechanism to check just before writing to see if it would be out of bounds, and if so, instead break out of the loop, and go close out the node. Prior to this commit space for a worst-case scenario was reserved, and we didn't start a new character if we were in that danger zone. This left nodes less fully packed than they could have been. Thus this improves the packing of nodes, especially under /i, from the previous mechanism. But more importantly, it sets things up so that we can potentially increase the node size as we go along.
| This also changes the handling of avoiding splitting a multi-character fold across nodes under /i. For example, take the sequence 'ffi'. We wouldn't want to end a node with 'ff' when the first character in the next node is an 'i', as U+FB03 folds to that sequence, and the code that does pattern matching can't currently match across node boundaries. Previously we backed off filling the node until the final character wasn't one that could potentially cause such a break. That is, we didn't look at the next character to see if it was an 'i' (or some other potential multi-char fold). Now we do look at the next character(s), and only back off if this actually would split a real multi-char fold.
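The difference between the two fill policies can be seen in a simplified model. This is a sketch under stated assumptions, not the regcomp.c loop: each character is reduced to its encoded width in bytes, `WORST_CASE_BYTES` is an assumed upper bound on that width, and the function names are hypothetical.

```c
#include <stddef.h>

enum { WORST_CASE_BYTES = 4 };  /* assumed maximum width of one character */

/* Old policy: refuse to start a character unless worst-case room remains.
 * Assumes every width is at most WORST_CASE_BYTES. */
static size_t fill_reserving(const size_t *widths, size_t n, size_t cap)
{
    size_t used = 0, i;
    for (i = 0; i < n; i++) {
        if (cap - used < (size_t)WORST_CASE_BYTES)
            break;                  /* inside the reserved danger zone */
        used += widths[i];
    }
    return used;
}

/* New policy: check the actual width just before writing. */
static size_t fill_checking(const size_t *widths, size_t n, size_t cap)
{
    size_t used = 0, i;
    for (i = 0; i < n; i++) {
        if (widths[i] > cap - used)
            break;                  /* this character would not fit */
        used += widths[i];
    }
    return used;
}
```

With ten one-byte characters and a capacity of 8, the reserving policy stops after 5 bytes while the checking policy fills all 8, which is the packing improvement the message describes.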
* regcomp.h: Add comments | Karl Williamson | 2019-09-29 | 1 | -0/+9
|
* regcomp.h: Remove obsolete macro | Karl Williamson | 2019-09-29 | 1 | -2/+0
| This is no longer used
* regcomp.c: Rename three variables | Karl Williamson | 2019-09-29 | 1 | -10/+11
| One of the variables is misnamed; the upper_fill indicates that the node has to be left not completely filled. Comments will be added in a later commit. The other two are renamed in preparation for future changes to more accurately describe their new purposes.
* regcomp.c: White-space only, comments | Karl Williamson | 2019-09-29 | 1 | -14/+15
| Outdent a block that was doubly indented. Change some other white space and fix grammar in a comment
* regcomp: Use new set macro to store a value | Karl Williamson | 2019-09-29 | 2 | -5/+7
| This is in preparation for a later commit in which the current mechanism will no longer be legal as an lhs
* Change to primary email address for Paul Marquess | pmqs | 2019-09-28 | 1 | -1/+1
|
* Devel::PPPort Adjust manifest for next release | Nicolas R | 2019-09-27 | 1 | -19/+139
|
* Devel::PPPort - fix podcheck issues | Nicolas R | 2019-09-27 | 2 | -5/+11
| Fix issues noticed by porting/podcheck.t
* Turn the clock backward for Devel::PPPort 3.54 | Nicolas R | 2019-09-27 | 1 | -1/+1
| The last released version at this date is 3.52; turn the clock backward to 3.54 for now so porting/cmp_version.t passes.
* Devel::PPPort: Fix commit d6d4687 vmess is already implemented | Pali | 2019-09-27 | 1 | -1/+0
| Fixes GH #61 aka RT 134101
| (cherry picked from commit 935b7556e54d4bd3c18fdfef2f072b674afb7051)
| Signed-off-by: Nicolas R <atoomic@cpan.org>
* Devel::PPPort - Reconciliate changes with GitHub 26a6a909 | Nicolas R | 2019-09-27 | 4 | -512/+3
|
* Update parts/base,todo files | Karl Williamson | 2019-09-27 | 280 | -761/+3826
| This is updated to the latest blead.
| (cherry picked from commit e7398cda98d95e464aefd3b7ab8a052bdf19c896)
| Signed-off-by: Nicolas R <atoomic@cpan.org>
* We don't provide GCC_BRACE_GROUPS_FORBIDDEND | Karl Williamson | 2019-09-27 | 1 | -2/+1
| So, don't use our macro that indicates we do provide it.
| (cherry picked from commit 36672207f64165e8e58251a2a4cb4569984dadcd)
| Signed-off-by: Nicolas R <atoomic@cpan.org>
* Backport start_subparse | Karl Williamson | 2019-09-27 | 4 | -15/+35
| (cherry picked from commit 59c0a72a7f36c9f3e2c0779f5affc420499252b8)
| Signed-off-by: Nicolas R <atoomic@cpan.org>
* Backport rsync_locale, switch_to_global_locale | Karl Williamson | 2019-09-27 | 3 | -0/+118
| Before these existed, they should be no-ops
| (cherry picked from commit fbde8074e56bf0da478eb424c4bc9329ee48210b)
| Signed-off-by: Nicolas R <atoomic@cpan.org>
* Backport UVCHR_SKIP | Karl Williamson | 2019-09-27 | 2 | -5/+89
| (cherry picked from commit bfe660f9f9775fc1cbbf1c5fd7ed809b3e4dd369)
| Signed-off-by: Nicolas R <atoomic@cpan.org>