delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	Bump \p{nv=} precision from 2 to 3	Karl Williamson	2022-04-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This closes #19603. Unicode has various characters whose numeric value is a rational non-integer. These can be specified in \p{nv=...} constructs by either the rational form or by the floating-point value that it evaluates to. The number of significant digits that must match is kept to a minimum to allow for variance in different platforms' floating-point lengths and rounding decisions. Previously that number was 2 digits, but that is no longer sufficient on all platforms. This commit changes it to 3.
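The idea of matching a rational value to a fixed number of significant digits can be sketched as follows. This is an illustration only, not the code in mktables: the function name `matches_numeric_value` and its `digits` parameter are hypothetical, and the rounding is whatever Python's `g` format does on the host platform.

```python
from fractions import Fraction


def matches_numeric_value(spec: str, value: Fraction, digits: int = 3) -> bool:
    """Return True if `spec` names `value`.

    `spec` is either a rational such as '2/3' (compared exactly) or a decimal
    such as '0.667' (compared after rounding both sides to `digits`
    significant digits). Hypothetical helper, for illustration only.
    """
    if "/" in spec:
        return Fraction(spec) == value
    # Round both the user's number and the true value to the same precision.
    return f"{float(spec):.{digits}g}" == f"{float(value):.{digits}g}"


two_thirds = Fraction(2, 3)
print(matches_numeric_value("2/3", two_thirds))
print(matches_numeric_value("0.667", two_thirds))
print(matches_numeric_value("0.67", two_thirds, digits=2))
```

In this sketch, raising `digits` from 2 to 3 means a two-digit approximation such as `0.67` no longer compares equal to 2/3, while `0.667` does.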
*	Add is_XPERLSPACE_utf8_safe_backwards()	Karl Williamson	2022-03-19	1	-1/+88
\| \| \| \| \|	This macro starts from the right side and matches UTF-8 white space characters.
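A rough sketch of what matching "from the right side" involves, assuming ordinary (ASCII-platform) UTF-8: step left over continuation bytes to find the start of the final character, never reading before the start of the buffer, then test that character. The function name and signature below are hypothetical, and Python's `str.isspace` is only a stand-in for Perl's own white space definition, which differs in some details.

```python
def is_space_backwards(buf: bytes, start: int, end: int) -> int:
    """Return the byte length of a white space character that ends just
    before `end`, or 0 if there is none. Never reads before `start`."""
    pos = end - 1
    # Continuation bytes look like 10xxxxxx; a character has at most 3 of them.
    while pos > start and end - pos < 4 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    if pos < start:
        return 0  # empty range
    try:
        ch = buf[pos:end].decode("utf-8")
    except UnicodeDecodeError:
        return 0  # malformed or truncated sequence
    return end - pos if len(ch) == 1 and ch.isspace() else 0


text = "a\u2003".encode("utf-8")  # 'a' followed by EM SPACE (3 bytes)
print(is_space_backwards(text, 0, len(text)))
```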
*	Remove 'no warnings experimental::signatures' from support files	Paul "LeoNerd" Evans	2022-02-20	1	-1/+1
\|
*	Fix lib/unicore/mktables for experimental::builtin warnings	Paul "LeoNerd" Evans	2022-01-25	1	-1/+1
\|
*	Remove remaining uses of @_ in signatured subs in lib/unicore/mktables	Paul "LeoNerd" Evans	2022-01-24	1	-1/+1
\|
*	Add missing aliases for \p{Present_In}	Karl Williamson	2022-01-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	\p{Present_In} is a Perl extension of the Unicode Age property, added because knowing the exact Unicode version in which a code point became assigned is rarely what you want; much more frequently you want to know whether the code point exists in a given version or not. (Since this extension was added, Unicode changed their language to declare that the Age property should be interpreted in pattern matching, not as described, but as Perl's Present_In is. But I chose not to change Age, to avoid backwards compatibility issues, and this way, a coder can choose whichever one they want.) Unicode typically has synonyms (aliases) for each value a property can take on, so \p{Age=6.1} and \p{Age=V6_1} mean the same thing. Prior to this commit, neither \p{Present_In=1_1} nor \p{Present_In=NA} worked.
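The difference between the two properties can be sketched with a tiny, hand-entered table. This is an illustration under stated assumptions: the `AGE` dictionary holds only three sample code points, and the helper names `age_is` and `present_in` are hypothetical, not part of Perl or Unicode::UCD.

```python
# Sample data only: the Unicode version in which each code point was assigned.
AGE = {
    0x0041: (1, 1),  # LATIN CAPITAL LETTER A
    0x20AC: (2, 1),  # EURO SIGN
    0x20BA: (6, 2),  # TURKISH LIRA SIGN
}


def age_is(cp: int, version: tuple) -> bool:
    """Like Age as described: assigned in exactly this version."""
    return AGE.get(cp) == version


def present_in(cp: int, version: tuple) -> bool:
    """Like Present_In: assigned in this version or any earlier one."""
    return cp in AGE and AGE[cp] <= version


print(age_is(0x0041, (6, 1)), present_in(0x0041, (6, 1)))
print(age_is(0x20BA, (6, 1)), present_in(0x20BA, (6, 1)))
```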
*	mktables: Use builtin::refaddr	Karl Williamson	2021-12-13	1	-1/+1
\| \| \| \| \|	Now that this function is available in miniperl, mktables can use it to avoid a bunch of visually distracting 'no overloading' calls.
*	mktables: Don't calculate some unused values	Karl Williamson	2021-12-13	1	-1/+1
\| \| \| \|	These apparently were once needed, but no longer.
*	mktables: Use mnemonic variable names	Karl Williamson	2021-12-07	1	-1/+1
\| \| \| \|	Spotted by Dagfinn Ilmari Mannsåker
*	Fix unicore/mktables to avoid any @_ accesses in signatured subs	Paul "LeoNerd" Evans	2021-12-07	1	-1/+1
\|
*	mktables: Remove relics of removed legacy tables	Karl Williamson	2021-09-15	1	-1/+1
\| \| \| \| \|	These mentions of the tables removed in b852e1da77b497e086508451bebff00541073fb1 were missed in that commit.
*	Support Unicode 14.0	Unicode Consortium	2021-09-15	1	-52/+52
\|
*	mktables: Split a Line Break equivalence class	Karl Williamson	2021-09-15	1	-1/+1
\| \| \| \|	This is used for the \b{lb}, and the rule is changing in Unicode 14.0
*	mktables: Reorder some comments, white-space	Karl Williamson	2021-09-15	1	-1/+1
\| \| \| \|	Move comments closer to the action
*	mktables: Rename variable, and hoist calc from loop	Karl Williamson	2021-09-15	1	-1/+1
\|
*	Unicode::UCD: Don't depend on a file current syntax	Karl Williamson	2021-08-31	1	-1/+1
\| \| \| \| \|	This generated file will be changed in a future commit. This shouldn't have been relying on its syntax anyway, but the value it returns.
*	Unicode::UCD: Fix typo in pod	Karl Williamson	2021-08-31	1	-1/+1
\|
*	Remove deprecated Unicode files	Karl Williamson	2021-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These files were once apparently intended for use by modules to supplement the core Unicode handling. They contain tables, in a form suitable for use by Perl code, of the portions of the Unicode character database about changing the case of characters and finding the numeric value of a given \d character. In particular, they were designed for fast access using the swash mechanism that has since been removed. Unicode::UCD now contains more convenient methods of accessing the data these contain, and the use of these files has been deprecated since 5.16. I could not figure out a way to force a message should someone open and read one of these files, but the text of each says that the file may be removed without notice at any time. I did not find any uses of them on CPAN. Unicode is adding new properties that the format of these files will not be able to handle. Consequently I'm coming up with a new format. Though these files don't contain the new properties, their existence means having the burden of maintaining two separate mechanisms. Better to have just one mechanism, suitable for going forward.
*	mktables: Generate =head1 NAME line in Name.pm	Karl Williamson	2021-08-15	1	-1/+1
\| \| \| \| \|	All .pm files are supposed to have this line. So far this hasn't been necessary for this file, but future commits will require it.
*	lib/unicore/mktables: correct sub signatures in 2 locations	James E Keenan	2021-08-14	1	-1/+1
\| \| \| \| \|	Then, re-run regen/mk_invlists.pl and regen/regcharclass.pl and commit changes in headers.
*	utf8.c: Rmv an EBCDIC dependency	Karl Williamson	2021-08-14	1	-1/+1
\| \| \| \|	This is now generated by regcharclass.pl
*	mktables: Change "null string" to "empty string"	Karl Williamson	2021-08-11	1	-1/+1
\| \| \| \|	The latter phrase makes more sense
*	mktables: Add, fix comments	Karl Williamson	2021-08-11	1	-1/+1
\|
*	mktables: Fix debugging issues	Karl Williamson	2021-08-11	1	-1/+1
\| \| \| \| \| \|	Commit 4fe9356b250 changed the signatures on subroutines, and didn't do these correctly. The result was that perl would croak when using the mktables debugging facility.
*	mktables: Fix table output	Karl Williamson	2021-08-09	1	-1/+1
\| \| \| \| \| \|	Commit 4fe9356b250 changed the signatures on subroutines, and didn't do this one correctly. The result was that the comments in the generated files had duplicate text and were slightly garbled.
*	regcharclass.pl: Add fast surrogate UTF-8 trie	Karl Williamson	2021-08-07	1	-1/+13
\| \| \| \| \|	This will be used in the next commit. It requires only the first two bytes to determine if a UTF-8 or UTF-EBCDIC sequence is for a surrogate
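For ordinary UTF-8 (not UTF-EBCDIC, whose byte values differ), the two-byte test can be sketched like this. Surrogates U+D800..U+DFFF encode as ED A0 80 through ED BF BF, so the first two bytes are enough. The function name is hypothetical and this is not the generated macro.

```python
def is_surrogate_utf8(buf: bytes) -> bool:
    """True if `buf` begins with the UTF-8 encoding of a surrogate.

    Only the first two bytes are examined: a lead byte of 0xED followed by a
    second byte in 0xA0..0xBF. ASCII-platform UTF-8 is assumed.
    """
    return len(buf) >= 2 and buf[0] == 0xED and 0xA0 <= buf[1] <= 0xBF


# Python refuses to encode surrogates unless 'surrogatepass' is requested.
print(is_surrogate_utf8("\ud800".encode("utf-8", "surrogatepass")))
print(is_surrogate_utf8("\ud7ff".encode("utf-8")))
```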
*	regcharclass.pl: Further improve EBCDIC code	Karl Williamson	2021-08-07	1	-23/+23
\| \| \| \| \| \| \| \| \| \| \|	A commit a couple of commits ago improved the generated output of this script. This builds on that. The improvement was to try a transform that could lead to fewer conditionals, as bytes were grouped in fewer ranges. But that introduced a useless transformation for the single-element ranges that remain. This commit removes the transformation when it is not needed.
*	regcharclass.pl: Make 2 locals into global hashes	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \|	This is in preparation for a future commit
*	regcharclass.pl: Improve generated code for EBCDIC	Karl Williamson	2021-08-07	1	-151/+139
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	UTF-8 has some desirable characteristics not shared by UTF-EBCDIC. One example is all the continuation bytes are in a single range. By transforming a UTF-EBCDIC byte into I8 (similar to UTF-8), we gain those characteristics, and may be able to save a conditional or three. This commit creates a 2nd pass over the bytes that are to be matched, transforming them into I8. If that pass results in fewer conditionals than the traditional, native, generated code, use the fewer result. This saves quite a bit in some of the generated code, enabling the quotemeta macro to be represented in a single part; previously it had to be split to avoid compiler macro size limits.
*	regcharclass.pl: White-space comment only	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \|	A future commit will put a block around this; indent now.
*	regcharclass.pl: Get UTF EBCDIC translations	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \|	These will be used in a future commit
*	regcharclass.pl: Add ability to avoid wrong mnemonic	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \| \| \| \|	A future commit will pass this function data that shouldn't be translated into a mnemonic, like 'f' for the letter f. The reason is that that code will potentially be executed on a machine with a different character set than what the mnemonic would be valid for.
*	regcharclass.pl: Change variable name	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \|	A future commit will use this differently than the current name implies
*	regcharclass.pl: Reorder execution path	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \| \|	This moves a loop earlier in the execution path. This will be useful in a later commit
*	regcharclass.pl: Rmv unused variable	Karl Williamson	2021-08-07	1	-1/+1
\|
*	regcharclass.pl: Add an error check	Karl Williamson	2021-08-07	1	-1/+1
\|
*	regcharclass.pl: Move some code earlier	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \| \|	We can short circuit some work by moving the test earlier. This does not change the generated file.
*	regcharclass.pl: Rmv unused variable	Karl Williamson	2021-08-07	1	-1/+1
\|
*	regen/regcharclass.pl: Use deref of an array	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \|	This will make future commits read better.
*	regen/charset xlations.pl: Use revised UTF-8 macros	Karl Williamson	2021-07-31	1	-1/+1
\| \| \| \| \| \| \| \|	I realized that two base level utf8.h macros for UTF-8 could be refactored to eliminate the conditionals in each. Those macros have equivalents in the pure perl code changed by this commit, which I changed before the utf8.h versions to verify that everything worked, by verifying there was no difference in the generated tables.
*	regcharclass.h: Remove 2 EBCDIC dependencies	Karl Williamson	2021-07-31	1	-2/+20
\| \| \| \| \| \| \| \| \|	This commit makes is_HANGUL_ED_utf8_safe() return 0 unconditionally on EBCDIC platforms. This means its callers don't have to care what platform is running. Change the two callers to take advantage of this. The commit also changes the description of the macro to be slightly more accurate.
*	regcharclass.h: #defines for non-chars by UTF8 length	Karl Williamson	2021-07-30	1	-1/+131
\| \| \| \| \| \| \| \| \| \|	This creates macros for the non-character code points so that, given the length of the UTF-8 sequence, only those ones that have that length match. This makes for more efficient processing, to be used in a future commit. The place where the length changes depends on the platform type, and these macros will keep the code from having to worry about that.
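A sketch of the per-length idea for ordinary UTF-8 on ASCII platforms, where the non-characters U+FDD0..U+FDEF, U+FFFE and U+FFFF take three bytes and the U+nFFFE/U+nFFFF pairs in planes 1 to 16 take four. The function names are hypothetical, the input is assumed to be one well-formed UTF-8 sequence, and the generated C macros are not reproduced here.

```python
def is_nonchar_3(b: bytes) -> bool:
    """Three-byte non-characters: U+FDD0..U+FDEF, U+FFFE, U+FFFF."""
    return (len(b) == 3 and b[0] == 0xEF and
            ((b[1] == 0xB7 and 0x90 <= b[2] <= 0xAF) or
             (b[1] == 0xBF and b[2] in (0xBE, 0xBF))))


def is_nonchar_4(b: bytes) -> bool:
    """Four-byte non-characters: U+nFFFE and U+nFFFF for planes 1..16.

    Assumes `b` is well formed, so overlong or out-of-range sequences
    are not checked for here.
    """
    return (len(b) == 4 and 0xF0 <= b[0] <= 0xF4 and
            (b[1] & 0xCF) == 0x8F and b[2] == 0xBF and
            b[3] in (0xBE, 0xBF))


def is_nonchar_utf8(b: bytes) -> bool:
    """Dispatch on sequence length so only the relevant test runs."""
    if len(b) == 3:
        return is_nonchar_3(b)
    if len(b) == 4:
        return is_nonchar_4(b)
    return False  # no non-characters are 1 or 2 bytes long


print(is_nonchar_utf8("\ufdd0".encode("utf-8")))
print(is_nonchar_utf8("\U0010ffff".encode("utf-8")))
print(is_nonchar_utf8("A".encode("utf-8")))
```

Dispatching on the length first means a caller that already knows the sequence length pays only for the comparisons that can possibly match.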
*	Unicode::UCD: Bump version; regen	Karl Williamson	2021-07-20	1	-1/+1
\|
*	Put back the old url for unicode.org (in lib/unicore) since there is now a redirection	Thibault DUPONCHELLE	2021-07-17	1	-4/+4
*	Update UCD version. Remove changes to cpan Encode. Regen	Thibault DUPONCHELLE	2021-06-17	1	-5/+5
\|
*	perluniprops: Remove references to Unicode::Unihan	Karl Williamson	2021-05-31	1	-1/+1
\| \| \| \| \| \|	This CPAN module doesn't work on recent Unicode versions. This fixes GH #18787.
*	mostly docs: replace "pumpking" when referring to the present	Ricardo Signes	2021-04-16	1	-1/+1
\| \| \| \| \| \| \|	Some other tweaks or modernizations are present, but I expect none of this is controversial. This also includes running regen/mk_invlists.pl and regen/regcharclass.pl
*	style: Detabify regen files.	Michael G. Schwern	2021-01-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	They generate C files. Bump feature.pm and warnings.pm versions to satisfy cmpVERSION.pl. I can't get it to easily ignore whitespace, `git diff --name-only` does not respect the -w flag. regen_perly.pl is left alone. That would require rebuilding perly.* which is beyond a simple indentation change.
*	regen/regcharclass.pl: Mark intermediate macros as internal	Karl Williamson	2020-12-21	1	-61/+61
\| \| \| \| \| \| \| \|	The macros generated by this script may have to be split into sub-macros to make the overall macro fit the maximum number of characters allowed by the compiler for a macro definition. This commit adds a trailing underscore to the names of such intermediate macros so as to mark them as non-API for autodoc.
*	regcharclass.pl: Get code point folding to a seq	Karl Williamson	2020-12-19	1	-20/+1340
\| \| \| \| \| \| \|	Previously regcharclass.pl could tell if an input string was a multi-character fold of some Unicode code point. This commit adds the ability to return what that code point is. This capability will be used in a later commit.