delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	Add utf8_to_utf16	Karl Williamson	2021-08-14	1	-0/+4
\|
*	Improve utf16_to_utf8_reversed()	Karl Williamson	2021-08-14	1	-0/+5
\| \| \| \| \| \|	Instead of destroying the input by first swapping the bytes, this calls a base function with the order to use. The non-reverse function is changed to call the base function with the non-reversed order.
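The idea of passing the byte order to a shared base routine, rather than swapping the input in place, might look roughly like the sketch below. All names here (`utf16_to_utf8_base`, `utf16be_to_utf8`, `utf16le_to_utf8`, `UTF16_ERR`) are hypothetical and are not the functions in the Perl source; the sketch only handles standard Unicode code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF16_ERR ((size_t)-1)  /* hypothetical error value */

/* Hypothetical base routine: 'hi' and 'lo' give the offsets of the high and
 * low byte inside each 16-bit unit, so the input is never modified. */
static size_t utf16_to_utf8_base(const uint8_t *in, size_t nbytes,
                                 uint8_t *out, int hi, int lo)
{
    size_t n = 0;

    assert((hi == 0 && lo == 1) || (hi == 1 && lo == 0));
    if (nbytes % 2)
        return UTF16_ERR;                   /* odd number of input bytes */

    for (size_t i = 0; i < nbytes; i += 2) {
        uint32_t cp = ((uint32_t)in[i + hi] << 8) | in[i + lo];

        if (cp >= 0xDC00 && cp <= 0xDFFF)
            return UTF16_ERR;               /* unpaired low surrogate */
        if (cp >= 0xD800 && cp <= 0xDBFF) { /* high surrogate: need a pair */
            uint32_t low;
            if (i + 4 > nbytes)
                return UTF16_ERR;
            low = ((uint32_t)in[i + 2 + hi] << 8) | in[i + 2 + lo];
            if (low < 0xDC00 || low > 0xDFFF)
                return UTF16_ERR;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i += 2;                         /* consumed the second unit */
        }

        if (cp < 0x80) {
            out[n++] = (uint8_t)cp;
        } else if (cp < 0x800) {
            out[n++] = (uint8_t)(0xC0 | (cp >> 6));
            out[n++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[n++] = (uint8_t)(0xE0 | (cp >> 12));
            out[n++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (uint8_t)(0xF0 | (cp >> 18));
            out[n++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return n;
}

/* Both public variants are thin wrappers that differ only in byte order.
 * The caller must supply an output buffer of at least 2 * nbytes bytes. */
static size_t utf16be_to_utf8(const uint8_t *in, size_t nbytes, uint8_t *out)
{
    return utf16_to_utf8_base(in, nbytes, out, 0, 1);
}

static size_t utf16le_to_utf8(const uint8_t *in, size_t nbytes, uint8_t *out)
{
    return utf16_to_utf8_base(in, nbytes, out, 1, 0);
}
```

Because the byte order is a parameter, the reversed variant can take a `const` input and needs no scratch copy.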
*	Make macro isUTF8_CHAR_flags an inline fcn	Karl Williamson	2021-08-14	1	-39/+0
\| \| \| \|	This makes it use the fast DFA for this functionality.
*	utf8.h: Comment changes	Karl Williamson	2021-08-07	1	-10/+21
\|
*	utf8.h: White space only	Karl Williamson	2021-08-07	1	-19/+19
\|
*	utf8.h: Refactor UTF8_IS_NONCHAR...	Karl Williamson	2021-08-07	1	-7/+6
\| \| \| \| \| \|	UTF8_IS_NONCHAR_GIVEN_THAT_NON_SUPER_AND_GE_PROBLEMATIC() is defined just for backward compatibility (though I don't think anyone uses it). Swap which macro is the base level that the other is defined in terms of.
*	Refactor UTF8_IS_SUPER()	Karl Williamson	2021-08-07	1	-20/+14
\| \| \| \| \| \|	This uses macros recently introduced to remove an EBCDIC dependency and make the definition simpler. It now uses the DFA, which should speed up the non-edge case uses.
*	utf8.h: Document some #defines	Karl Williamson	2021-08-07	1	-0/+37
\| \| \| \| \|	The reorganization in the previous commit revealed some undocumented public macros
*	utf8.h: Move some #defines around	Karl Williamson	2021-08-07	1	-116/+119
\| \| \| \| \| \|	This moves the defines for things like surrogates, non-character code points, etc. to a more logical order, with like adjacent to like, and before they are otherwise used in the file.
*	utf8.h: Remove EBCDIC dependency	Karl Williamson	2021-08-07	1	-7/+10
\| \| \| \|	By generalizing a macro, we can make it serve both ASCII and EBCDIC
*	utf8.h: Add macros to calc UTF start byte, first cont	Karl Williamson	2021-08-07	1	-4/+41
\| \| \| \| \|	These two bytes are useful to know in some situations. This commit changes a couple such places to use the first macro.
*	utf8.h: Reorder some preprocessor directives	Karl Williamson	2021-08-07	1	-14/+10
\| \| \| \|	This is just so that things are clearer to the reader
*	utf8.h: Add #define	Karl Williamson	2021-08-07	1	-1/+5
\| \| \| \|	UTF_MIN_CONTINUATION_BYTE is clearer for use in some contexts
*	utf8.h: Move all SKIP functions to be near each other	Karl Williamson	2021-08-07	1	-9/+8
\| \| \| \|	For convenient code reading
*	utf8.h: Add new #define for extended length UTF-8	Karl Williamson	2021-08-07	1	-0/+1
\| \| \| \| \| \| \| \|	The previous commit added a convenient place to create a symbol to indicate that the UTF-8 on this platform includes Perl's nearly-double length extension. The platforms this isn't needed on are 32-bit ASCII ones. This symbol allows removing one place where EBCDIC need be considered, and future commits will use it as well.
*	utf8.h: Refactor MAX_UTF8_TWO_BYTE	Karl Williamson	2021-08-07	1	-3/+11
\| \| \| \| \| \|	The previous commit removed a macro that the comments for this refer to in explaining its derivation. So use an alternative, that is actually clearer.
*	Reimplement OFFUNISKIP	Karl Williamson	2021-08-07	1	-47/+23
\| \| \| \| \| \| \|	Now that previous commits have made it fast to find the position of the first set bit in a word, we can use a formula to find how many bytes the UTF-8 of that will occupy. This allows for simplification of this macro, removing several conditionals.
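As an illustration of deriving the UTF-8 length from the position of the highest set bit, here is a minimal sketch for ASCII platforms. The names `msb_pos_sketch` and `utf8_len_sketch` are hypothetical, the bit scan is a plain loop rather than a fast intrinsic, and the formula is one that happens to fit the UTF-8 length boundaries; it is not claimed to be the expression used in utf8.h.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-based position of the highest set bit; v must be non-zero. */
static unsigned msb_pos_sketch(uint32_t v)
{
    unsigned pos = 0;
    while (v >>= 1)
        pos++;
    return pos;
}

/* Number of UTF-8 bytes needed for cp, for cp up to 0x7FFFFFFF.
 * Each continuation byte carries 6 bits and each extra byte costs one more
 * marker bit in the start byte, so capacity grows by 5 bits per added byte
 * (11, 16, 21, 26, 31). Only the one-byte case needs special handling. */
static unsigned utf8_len_sketch(uint32_t cp)
{
    assert(cp <= 0x7FFFFFFF);
    if (cp < 0x80)
        return 1;
    return (msb_pos_sketch(cp) + 4) / 5;
}
```

For example, U+20AC has its highest bit at position 13, and (13 + 4) / 5 gives 3 bytes.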
*	utf8.h: Add macro to compute UV skip by its log2	Karl Williamson	2021-08-07	1	-2/+30
\| \| \| \| \| \| \| \| \| \|	This macro will calculate at compile time, if passed a compile-time constant, how many UTF-8 bytes are required to represent the parameter. The macro is a helper which works fine except for edge cases, which a wrapper is needed to handle. The commit changes one instance to use this new macro
*	utf8.h: Rmv EBCDIC dependency	Karl Williamson	2021-08-07	1	-11/+59
\| \| \| \| \| \| \|	This moves a #define into the common code for ASCII and EBCDIC machines. It adds a bunch of comments about the value that I wish I hadn't had to figure out for myself.
*	Rename internal macro and move to utf8.h	Karl Williamson	2021-08-07	1	-0/+2
\| \| \| \| \| \|	This macro has a corresponding, older, name for the non-UTF-8 case. It makes sense to use the same paradigm, and move the definitions together so that the comments for one don't have to be repeated for the other.
*	utf8.h: Remove an EBCDIC dependency	Karl Williamson	2021-08-07	1	-2/+19
\| \| \| \| \|	A symbol introduced in a previous commit allows this internal macro to only need a single version, suitable for either EBCDIC or ASCII.
*	utf8.h: Add symbol for easing EBCDIC handling	Karl Williamson	2021-08-07	1	-0/+6
\| \| \| \|	This is then used in regcomp.c to avoid an #ifdef EBCDIC
*	utf8.h: Make a bit of EBCDIC known to ASCII	Karl Williamson	2021-08-07	1	-4/+15
\| \| \| \| \|	This info is needed in one other place; doing it here means only specifying it once.
*	utf8.h: Add a #define synonym	Karl Williamson	2021-08-07	1	-3/+9
\| \| \| \| \|	This is more clearly named for various uses in this file. It has an unwieldy length, but is unlikely to be used outside it.
*	Refactor UTF_START_MASK()	Karl Williamson	2021-08-07	1	-5/+14
\| \| \| \| \| \| \| \|	A slight change to this very low level macro (hence called a lot) removes the need for a conditional, and causes it to work on single-byte UTF-8 characters on ASCII platforms. The definition is also moved to a more logical place in the file
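One conditional-free way to write a start-byte payload mask is a single right shift, sketched below. `START_MASK_SKETCH` and `start_payload_sketch` are hypothetical names, and this is an assumption about the general shape of such a macro, not a copy of the utf8.h definition.

```c
#include <stdint.h>

/* Mask selecting the payload bits of the first byte of a 'len'-byte UTF-8
 * sequence. The mask for len >= 2 includes the always-zero separator bit,
 * which is harmless, and len == 1 yields 0x7F, so single-byte (ASCII)
 * characters work with the same expression. */
#define START_MASK_SKETCH(len) ((uint8_t)(0xFF >> (len)))

/* Extract the code point bits carried by the start byte. */
static uint32_t start_payload_sketch(uint8_t start, unsigned len)
{
    return (uint32_t)(start & START_MASK_SKETCH(len));
}
```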
*	utf8.h: Move macro to earlier in file	Karl Williamson	2021-08-07	1	-13/+13
\| \| \| \|	This is now defined before first use
*	UTF8_IS_DOWNGRADEABLE_START: Call less general helper	Karl Williamson	2021-08-07	1	-1/+1
\| \| \| \| \| \|	Future commits would otherwise make the expansion of this macro too complicated for some C compilers. Use a less general internal helper function to avoid that.
*	Refactor UTF_START_MARK()	Karl Williamson	2021-05-30	1	-5/+10
\| \| \| \| \|	This allows the removal of a conditional in a very low level (called a lot) macro
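A start-byte marker (the leading run of 1 bits that announces the sequence length) can likewise be computed without a conditional. The macro name `START_MARK_SKETCH` is hypothetical and the expression is only a sketch of the technique, meaningful for lengths of 2 and above.

```c
#include <stdint.h>

/* Marker bits for the first byte of a 'len'-byte sequence, len >= 2:
 * 'len' one bits followed by zeros. Truncating the complement to 8 bits
 * avoids a separate test for long sequences. */
#define START_MARK_SKETCH(len) ((uint8_t)~(0xFF >> (len)))
```

Under this sketch a 2-byte sequence starts with 0xC0, a 3-byte one with 0xE0, and a 4-byte one with 0xF0, each OR-ed with the payload bits.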
*	UTF8_IS_NEXT_CHAR_DOWNGRADEABLE() check before deref	Karl Williamson	2021-05-29	1	-2/+2
\| \| \| \|	Reorder the clauses to check first before dereferencing
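The clause ordering matters because `&&` short-circuits: putting the bounds check first guarantees neither byte is read past the end. A minimal sketch for ASCII platforms, using the hypothetical name `next_char_downgradeable_sketch`:

```c
#include <stdint.h>

/* True if the character starting at s is a two-byte UTF-8 sequence that
 * fits in a single byte when downgraded (code points 0x80..0xFF).
 * 'e' points one past the end of the buffer. */
static int next_char_downgradeable_sketch(const uint8_t *s, const uint8_t *e)
{
    return (e - s) > 1                  /* bounds check comes first */
        && (s[0] & 0xFE) == 0xC2        /* start byte is 0xC2 or 0xC3 */
        && (s[1] & 0xC0) == 0x80;       /* followed by a continuation */
}
```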
*	utf8.h: Simplify UNICODE_IS_SURROGATE()	Karl Williamson	2021-05-28	1	-4/+3
\| \| \| \| \|	This uses inRANGE() with mnemonics to make it clearer with no increase in the number of conditionals
*	utf8.h: Use inRANGE for UNICODE_IS_32_CONTIGUOUS_NONCHARS	Karl Williamson	2021-05-28	1	-2/+2
\| \| \| \|	This leads to a single conditional instead of two.
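A range test can be folded into one comparison by subtracting the lower bound and relying on unsigned wraparound. The macro below is a sketch under that assumption; `IN_RANGE_SKETCH` and the two helper names are hypothetical and are not Perl's `inRANGE()` definition.

```c
#include <stdint.h>

/* One comparison instead of two: values below 'lo' wrap around to a huge
 * unsigned number and so fail the <= test. Requires lo <= hi. */
#define IN_RANGE_SKETCH(c, lo, hi) \
    ((uint32_t)((uint32_t)(c) - (uint32_t)(lo)) <= (uint32_t)((hi) - (lo)))

/* UTF-16 surrogates occupy U+D800..U+DFFF. */
static int is_surrogate_sketch(uint32_t cp)
{
    return IN_RANGE_SKETCH(cp, 0xD800, 0xDFFF);
}

/* The 32 contiguous non-characters occupy U+FDD0..U+FDEF. */
static int is_32_contiguous_nonchar_sketch(uint32_t cp)
{
    return IN_RANGE_SKETCH(cp, 0xFDD0, 0xFDEF);
}
```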
*	utf8.h: Refactor UNICODE_IS_NONCHAR()	Karl Williamson	2021-05-28	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	This adds branch prediction and re-orders so that an unlikely-to-succeed test is done before the likely-to-succeed one, so that the latter usually doesn't need to be executed. Since both conditions must succeed for the entire expression to succeed, this doesn't change what the whole expression matches.
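The reordering can be sketched as follows. The hint macros and `is_nonchar_sketch` are hypothetical names; `__builtin_expect` is a GCC/Clang builtin, so the sketch falls back to plain expressions on other compilers.

```c
#include <stdint.h>

#if defined(__GNUC__)
#  define LIKELY_SKETCH(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY_SKETCH(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY_SKETCH(x)   (x)
#  define UNLIKELY_SKETCH(x) (x)
#endif

/* Unicode non-characters: U+FDD0..U+FDEF, plus the last two code points
 * of every plane up to U+10FFFF. */
static int is_nonchar_sketch(uint32_t cp)
{
    if (UNLIKELY_SKETCH((uint32_t)(cp - 0xFDD0) <= 0x1F))
        return 1;

    /* The rarely-true test runs first, so for most input the
     * almost-always-true range test is never evaluated. */
    return UNLIKELY_SKETCH((cp & 0xFFFE) == 0xFFFE)
        && LIKELY_SKETCH(cp <= 0x10FFFF);
}
```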
*	style: Detabify indentation of the C code maintained by the core.	Michael G. Schwern	2021-01-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	This just detabifies to get rid of the mixed tab/space indentation. Applying consistent indentation and dealing with other tabs are another issue. Done with `expand -i`. * vutil.* left alone, it's part of version. * Left regen managed files alone for now.
*	utf8.h: Fix syntax error only found on EBCDIC builds	Karl Williamson	2020-12-04	1	-1/+1
\|
*	autodoc.pl: Specify scn for single-purpose files	Karl Williamson	2020-11-06	1	-8/+0
\| \| \| \| \| \| \| \|	Many of the files in perl are for one thing only, and hence their embedded documentation will be for that one thing. By creating a hash here of them, those files don't have to worry about what section that documentation goes under, and so it can be completely changed without affecting them.
*	Change some link pod for better rendering	Karl Williamson	2020-08-31	1	-7/+7
\| \| \| \|	C<L</foo>> renders better in places than L</C<foo>>
*	Document ibcmp_utf8, and move to like-fcns hdr	Karl Williamson	2020-08-22	1	-3/+0
\|
*	utf8.h: Add comment	Karl Williamson	2020-07-31	1	-0/+1
\|
*	utf8.h: Remove obsolete macro	Karl Williamson	2020-07-30	1	-7/+0
\| \| \| \| \| \|	It turns out that this macro would have failed to compile since commit 538b546eb0f252250a30c08e6af47d0ea7433fa1, in October 2013. So it is clear no one is using it.
*	Fix typo when using nBIT_UMAX	Nicolas R	2020-07-22	1	-1/+1
\| \| \| \| \| \| \| \|	nBIT_MAX was used instead of nBIT_UMAX in the changes from d223e1ea9ae864c0e563187f1e76. Note: at first glance it seems that nBIT_UMAX is an alias for nBIT_MASK.
*	utf8.h: Add some branch predictions	Karl Williamson	2020-07-17	1	-20/+26
\| \| \| \| \|	It is likely that the data will be well-formed Unicode, and not one of its special characters, like surrogates or non-characters, nor NUL.
*	handy.h: Create nBIT_MASK(n) macro	Karl Williamson	2020-07-17	1	-2/+2
\| \| \| \| \|	This encapsulates a common paradigm, making sure that it is done correctly for the platform's size.
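The pitfall such a macro guards against is that the naive `(1 << n) - 1` is evaluated in `int` and overflows for large `n`. A sketch of a width-safe version, written as a function with the hypothetical name `nbit_mask_sketch` and a fixed 64-bit type (the real macro in handy.h may be defined differently):

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, for 0 <= n <= 64. The shift is done in a
 * 64-bit type, and n == 64 is handled separately because shifting by the
 * full width of the type is undefined behavior in C. */
static uint64_t nbit_mask_sketch(unsigned n)
{
    assert(n <= 64);
    return n >= 64 ? UINT64_MAX : (((uint64_t)1 << n) - 1);
}
```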
*	Fix a bunch of repeated-word typos	Dagfinn Ilmari Mannsåker	2020-05-22	1	-1/+1
\| \| \| \| \|	Mostly in comments and docs, but some in diagnostic messages and one case of 'or die die'.
*	pv_uni_display: Use common fcn; \b mnemonic	Karl Williamson	2020-01-23	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \|	This removes the (almost) duplicate code in this function to display mnemonics for control characters that have them. The reason the two pieces of code aren't precisely the same is that the other function also uses \b as a mnemonic for backspace. Using all possible mnemonics is desirable, so a flag is added for pv_uni_display to now use \b. This is now by default enabled in double-quoted strings, but not regex patterns (as \b there means something quite different except in character classes). B.pm is changed to expect \b.
*	Fix UTF8_IS_START on EBCDIC	Karl Williamson	2019-12-07	1	-3/+11
\|
*	utf8.h: Rmv obsolete macros	Karl Williamson	2019-11-24	1	-16/+0
\| \| \| \| \| \| \|	These macros were missed in dd1a3ba7882ca70c1e85b0fd6c03d07856672075 and 059703b088f44d5665f67fba0b9d80cad89085fd. Using them would cause things to fail to compile
*	Add missing back compat macros	Karl Williamson	2019-11-24	1	-0/+1
\| \| \| \|	These are needed only to allow some modules to stay updated with blead.
*	utf8.h: Use MAX() macro instead of its expansion	Karl Williamson	2019-11-14	1	-3/+1
\| \| \| \|	It makes things a little clearer.
*	utf8.h: Use a cast to U8 to avoid an AND	Karl Williamson	2019-11-11	1	-1/+1
\|
*	Allow core to work with code points above IV_MAX	Karl Williamson	2019-11-06	1	-0/+4
\| \| \| \| \|	Higher has been reserved for core use, and a future commit will want to finally do this.