| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no change for ASCII platforms. For EBCDIC ones, toCTRL('?')
and its inverse are special cased to map to/from the APC control
character, which is the outlier control on these platforms. The reason
to special case this is that otherwise toCTRL('?') would map to a
graphic character, not a control. By outlier, I mean it is the one
control not in the single block where all the other controls are placed.
Further, on two of the platforms it corresponds to 0xFF, which would be
an EBCDIC rub-out character corresponding to the ASCII rub-out
(or DEL) 0x7F, which is what toCTRL('?') maps to on ASCII. DEL is
itself an outlier control on ASCII, being a member of neither the C0
nor the C1 controls. Hence this makes '?' mean the outlier control on
both platforms.
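For reference, a minimal C sketch of the ASCII-platform mapping being
mirrored (the macro name is illustrative, not Perl's actual definition):

    #include <ctype.h>

    /* sketch: the classic caret-notation mapping on ASCII; '?' is the one
     * input whose result (0x7F, DEL) lies outside the C0 control block */
    #define toCTRL_ascii_sketch(c)  (toupper((unsigned char) (c)) ^ 64)

    /* toCTRL_ascii_sketch('A') == 0x01 (SOH);
     * toCTRL_ascii_sketch('?') == 0x7F (DEL).  The EBCDIC special case maps
     * '?' analogously, to the APC control. */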
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
isblank() is a C99 construct; Perl handles its use on C89 platforms by
falling back to the standard hard-coded definition. However, this code
was not updated to account for UTF-8 locales when handling for those was
recently added (31f05a37c): in a UTF-8 locale the no-break space is also
considered to be a blank.
This commit fixes that. Previously regcomp.c generated the hard-coded
definitions when there was no isblank(), using #ifdef'd code. That
special handling has been removed, and [:blank:] is now always treated
just like any other POSIX class; its specialness is hidden entirely in
handy.h. This simplifies the regcomp.c code slightly. I considered
also removing the special handling for isascii() (likewise not
universally available), in the name of simplicity over the slight speed
that would be lost, but that special handling is only a single line in
two places, so I left it in.
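A hedged sketch of the kind of hard-coded fallback now living in handy.h
(the macro name and the UTF-8-locale flag are illustrative, not the
actual definition):

    /* sketch: blank is space or tab; in a UTF-8 locale the no-break space
     * (0xA0 in Latin-1) also counts as a blank */
    #define isBLANK_fallback_sketch(c, in_utf8_ctype_locale)            \
        ((c) == ' ' || (c) == '\t'                                      \
         || ((in_utf8_ctype_locale) && (c) == 0xA0))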
|
|
|
|
|
|
|
| |
isascii() is used as a fallback for isASCII(). This would be on an
unusual platform or under unusual circumstances. isascii() may return
values besides 0 and 1, which can cause code that expects a bool to
fail.
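A minimal sketch of the normalization that addresses this (the !! idiom
stands in for whatever Perl actually uses):

    #include <ctype.h>

    /* sketch: force isascii()'s possibly non-0/1 result into a strict 0 or 1 */
    #define isASCII_fallback_sketch(c)  (!!isascii((unsigned char) (c)))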
|
|
|
|
|
|
|
| |
This was due to a logic error in toFOLD_LC() introduced in
31f05a37c4e9c37a7263491f2fc0237d836e1a80. It affects only the code
point 0xB5 and shows up only in locales in which the character at that
code point is an uppercase letter.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This large (sorry, I couldn't figure out how to meaningfully split it
up) commit causes Perl to fully support LC_CTYPE operations (case
changing, character classification) in UTF-8 locales.
As a side effect it resolves [perl #56820].
The basics are easy, but there were a lot of details, and one
troublesome edge case discussed below.
What essentially happens is that when the locale is changed to a UTF-8
one, a global variable is set TRUE (FALSE when changed to a non-UTF-8
locale). Within the scope of 'use locale', this variable is checked,
and if TRUE, the code that Perl uses for non-locale behavior is used
instead of the code for locale behavior. Since Perl's internal
representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale.
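To make the idea concrete, here is a hedged sketch of that dispatch with
illustrative names (Perl's real macro and variable names differ):

    #include <ctype.h>

    /* schematic Unicode rules for the 0-255 range: ASCII letters plus the
     * Latin-1 letters 0xC0-0xFF (minus the two math signs); the real table
     * also admits a few singletons such as 0xAA, 0xB5, and 0xBA */
    #define isALPHA_L1_sketch(c)                                          \
        (((c) >= 'A' && (c) <= 'Z') || ((c) >= 'a' && (c) <= 'z')         \
         || ((c) >= 0xC0 && (c) <= 0xFF && (c) != 0xD7 && (c) != 0xF7))

    /* inside 'use locale', dispatch on whether LC_CTYPE is a UTF-8 locale */
    #define isALPHA_LC_sketch(c, locale_is_utf8)                          \
        ((locale_is_utf8) ? isALPHA_L1_sketch(c)                          \
                          : isalpha((unsigned char) (c)))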
More work had to be done for regular expressions. There are three
cases.
1) Character classes such as \w and [[:punct:]] needed no extra work, as
the changes fall out from the base work.
2) Strings that are to be matched case-insensitively. These form
EXACTFL regops (nodes). Notice that if such a string contains only
above-Latin1 characters that match only themselves, the node can be
downgraded to an EXACT-only node, which presents better optimization
possibilities, as we now have a fixed string known at compile time to be
required to be in the target string for a match. Similarly, if all
characters in the string match only other above-Latin1 characters
case-insensitively, the node can be downgraded to a regular EXACTFU node
(matching and folding using Unicode, not locale, rules). The code
changes for this could have been done without accepting UTF-8 locales
fully, but there were edge cases which would have needed to be handled
differently had I stopped there, so I continued on.
In an EXACTFL node, all such characters are now folded at compile time
(just as before this commit), while the other characters whose folds are
locale-dependent are left unfolded. This means that they have to be
folded at execution time based on the locale in effect at the moment.
Again, this isn't a change from before. The difference is that now some
of the folds that need to be done at execution time (in regexec) are
potentially multi-character. Some of the code in regexec was trivial to
extend to account for this because of existing infrastructure, but the
part dealing with regex quantifiers needed more work.
Also the code that joins EXACTish nodes together had to be expanded to
account for the possibility of multi-character folds within locale
handling. This was fairly easy, because it already has infrastructure
to handle these under somewhat different circumstances.
3) In bracketed character classes, represented by ANYOF nodes, a new
inversion list was created giving the characters that should be matched
by this node when the runtime locale is UTF-8. The list is ignored
except under that circumstance. To do this, I created a new ANYOF type
which has an extra SV for the inversion list.
The edge case that caused the most difficulty is folding involving the
MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the
GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range
character that folds to outside that range. The issue is that it
doesn't naturally fall out that it will match the CAP MU. If we let the
CAP MU fold to the small mu at compile time (which it can, because both
are above-Latin1 and so the fold is the same no matter what locale is in
effect), it could appear that the regnode can be downgraded away from
EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not match
the CAP MU case-insensitively. This could be special cased in regcomp
and regexec, but I wanted to avoid that. Instead the mktables tables
are set up to include the CAP MU as a character whose presence forbids
the downgrading, so the special casing is in mktables, and not in the C
code.
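For reference, the code points behind that edge case (these are plain
Unicode facts rather than code from the commit):

    #define MICRO_SIGN_CP        0x00B5  /* folds to U+03BC */
    #define GREEK_CAPITAL_MU_CP  0x039C  /* also folds to U+03BC */
    #define GREEK_SMALL_MU_CP    0x03BC  /* the shared fold target */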
|
|
|
|
|
| |
Previously it was left so that some systems, like Android, didn't get this,
which broke the build.
|
|
|
|
|
|
|
|
|
| |
This extracts the code for looking up POSIX classes in locales into
base macros common to all of them. It does this for the NeXT-only code
as well as for the typical compilations.
This is in preparation for changing the behavior. Certain things look
weird, such as the alignment, as part of that preparation.
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that the definitions for isASCII_LC and isBLANK_LC end up
being the same for all three possible #if platform states, so we can
just have them once instead of three times.
It is unlikely that the
&& ! defined(USE_NEXT_CTYPE)
is necessary, because HAS_ISASCII likely won't be defined, but this
makes sure that this doesn't change the previous behavior.
|
| |
|
|
|
|
|
|
|
|
| |
handy.h contains a macro that reads a hex digit and returns its value,
with fewer branches than a naive implementation would use. This commit
just copies and modifies it to create two macros for
1) just converting a hex digit to its value, without advancing the
input; and 2) doing the same for an octal digit.
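A hedged sketch of the low-branch conversion being copied (the name is
illustrative; the real macros live in handy.h):

    #include <ctype.h>

    /* sketch: with ASCII hex digits, adding 9 to a letter and masking with
     * 0xF yields 10-15, while a digit masks directly to 0-9, so only one
     * branch is needed; the input must already be known to be a hex digit */
    #define XDIGIT_VALUE_sketch(c)                                        \
        (0xF & (isdigit((unsigned char) (c)) ? (c) : (c) + 9))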
|
|
|
|
|
| |
This macro requires its input to be a hex digit, but does not test for
that. It is prudent to assert it under DEBUGGING.
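A standalone sketch of the guarded form; DEBUGGING is from the message,
while the helper macro names are stand-ins for Perl's own assertion
helper, and the value computation repeats the low-branch trick from the
previous sketch:

    #include <assert.h>
    #include <ctype.h>

    #ifdef DEBUGGING
    #  define ASSERT_XDIGIT_(c)  assert(isxdigit((unsigned char) (c))),
    #else
    #  define ASSERT_XDIGIT_(c)
    #endif

    /* sketch: under DEBUGGING the precondition is asserted via the comma
     * operator; otherwise the check compiles away entirely */
    #define XDIGIT_VALUE_checked_sketch(c)                                \
        (ASSERT_XDIGIT_(c)                                                \
         (0xF & (isdigit((unsigned char) (c)) ? (c) : (c) + 9)))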
|
|
|
|
| |
Future commits will want this available outside utf8.h
|
|
|
|
| |
plus some typo fixes. I probably changed some things in perlintern, too.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were a few places that were doing
unsigned_var = cond ? signed_val : unsigned_val;
or similar. Fixed by suitable casts etc.
The four in utf8.c were fixed by assigning to an intermediate
unsigned var; this has the happy side-effect of collapsing
a large macro expansion, where toUPPER_LC() etc evaluate their arg
multiple times.
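A hedged sketch of the general shape of the fix, with stand-in types and
names (not the actual utf8.c code):

    #include <stdint.h>

    typedef uint64_t UV;   /* stand-in for Perl's UV */
    typedef int64_t  IV;   /* stand-in for Perl's IV */

    /* sketch: route the signed arm through an intermediate of the right
     * type so the conversion is explicit and both arms agree */
    static UV
    pick_unsigned(int cond, IV signed_val, UV unsigned_val)
    {
        UV tmp = (UV) signed_val;         /* make the conversion explicit */
        return cond ? tmp : unsigned_val; /* both arms now have one type */
    }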
|
|
|
|
|
|
|
|
|
| |
This adds comments as to how it works, factors out the mask to be
specified only once, and uses isDIGIT instead of isALPHA, as the former
is likely to be slightly more efficient (because isDIGIT doesn't have to
worry about there being non-ASCII digits, and isALPHA does have to worry
about non-ASCII alphas). The result is easier to understand what's
going on.
|
|
|
|
| |
Two adjacent lines were identical. Only one is needed.
|
|
|
|
|
|
| |
They are documented to return UV, but in one definition they return
tolower()/toupper(), which on Linux return a signed value. So
cast away the compiler warnings.
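A minimal sketch of the cast (UV here is a stand-in typedef; the real
definitions wrap Perl's locale case-changing macros):

    #include <ctype.h>
    #include <stdint.h>

    typedef uint64_t UV;   /* stand-in for Perl's UV */

    /* sketch: tolower()/toupper() return int, which is signed on Linux, so
     * cast the result to the documented unsigned return type */
    #define toLOWER_LC_sketch(c)  ((UV) tolower((unsigned char) (c)))
    #define toUPPER_LC_sketch(c)  ((UV) toupper((unsigned char) (c)))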
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 14dd3ad8c9, 3 NULL assignments were converted to a Zero() for
what I guess was an optimization. This also caused the large je_buf to
be zeroed even though je_buf was uninitialized before. At that time,
JMPENV had 2 extra members that don't exist anymore; they were removed
in commit 766f891612. The comment about je_throw was also made obsolete
in commit 766f891612, so rework it.
A function-call-free NULL assignment is faster than a memset() call.
je_buf is 0x40 bytes long on 32-bit VC2003 Win32 Perl. There is no need
to zero it, since je_buf is never read unless je_prev is not NULL. Also
there is no need to zero the last 2 members, je_ret and je_mustcatch,
since they are immediately assigned to. Move the PL_top_env assignment
to be near the je_prev one so the compiler can optimize better: je_prev
is the start of the struct, and hopefully the pointer will be calculated
only once.
Also put some poisoning in place in case JMPENV gets new members in the
future. To conditionally poison in a macro, PERL_POISON_EXPR is being
introduced instead of 2 different definitions of JMPENV_BOOTSTRAP.
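A hedged sketch of the conditional-poisoning helper described (modeled
on how such a macro is typically defined; treat the exact expansion and
the usage line as assumptions):

    /* sketch: the wrapped code is compiled in only under PERL_POISON */
    #ifdef PERL_POISON
    #  define PERL_POISON_EXPR_sketch(x)  x
    #else
    #  define PERL_POISON_EXPR_sketch(x)
    #endif

    /* usage sketch inside a bootstrap macro: scribble over yet-unused fields
     * so any future member added to JMPENV is caught if read before init */
    /* PERL_POISON_EXPR_sketch(PoisonNew(&cur_env, 1, JMPENV);) */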
|
|
|
|
|
|
|
|
|
| |
This allows compilers that do support real booleans (C++ or anything
with stdbool.h) to emulate those that don’t.
See ticket #120314.
This patch incorporates suggestions from Craig Berry.
|
|
|
|
|
|
|
|
|
|
|
| |
The special case has been there since 61bb59065bf1b12edab3, most
likely because the VMS C++ compiler, like a lot of other C++
compilers in the 1990s, implemented a bool as an int, and making
the type in C compatible seemed like a good idea. But no C++
compiler that's likely to build Perl on VMS has a bool type that
occupies more than one byte now, so remove the special case. We're
unlikely to even see this code since we've had stdbool.h since
DEC C 6.4, released in 2001.
|
|
|
|
|
|
|
| |
This simplifies some of the logic necessary for coping with its various
problems.
Suggested by Nicholas Clark.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a bunch of macros and moves things around to support
conditional compilation when Configure is called with
-DBOOTSTRAP_CHARSET. Doing so causes the usual macros that are
table-driven to not be used, since the table may not be valid when
bringing Perl up for the first time on a non-ASCII platform.
This allows it to compile using the platform's native C library ctype
functions, which should work well enough to compile miniperl and allow
the table to be changed to be valid. Then Configure can be re-run
without the bootstrap, and normal compilation can proceed.
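A hedged sketch of the shape of that conditional compilation; the
table-driven branch uses hypothetical names standing in for Perl's
classification table:

    #include <ctype.h>

    #ifdef BOOTSTRAP_CHARSET
    /* bring-up mode: trust only the platform's native ctype functions */
    #  define isALPHA_sketch(c)  (!!isalpha((unsigned char) (c)))
    #else
    /* normal mode: consult the table; char_class_table and CC_ALPHA_BIT
     * are hypothetical stand-ins for the real machinery */
    #  define isALPHA_sketch(c)                                           \
          (!!(char_class_table[(unsigned char) (c)] & CC_ALPHA_BIT))
    #endif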
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters 0-255. It turns
out that I was wrong; it natively (at least on some platforms) has the
same rules (essentially none) for the characters which don't correspond
to ASCII ones as an ASCII platform has for its non-ASCII characters.
A previous commit for 5.18 changed the docs about this issue. This
current commit forces ASCII rules on EBCDIC platforms (even should there
be one that natively uses all 256). To get rules for all 256, the same
steps, such as 'use feature "unicode_strings"', must now be taken.
|
|
|
|
|
|
|
|
|
| |
handy.h is included in files that don't include perl.h, and hence not
utf8.h. We therefore can't rely on the ASCII/EBCDIC conversion
macros being available to us. The best way to cope is to use the native
ctype functions. Most, but not all, of the macros in this commit
currently resolve to use those native ones, but a future commit will
change that.
|
|
|
|
|
| |
Now only one of the macros (isPRINT) relies on magic numbers, leading
to clearer definitions.
|
|
|
|
|
|
|
|
| |
These 4 macros can have the same RHS for their ASCII and EBCDIC
versions, so there is no need to duplicate their definitions.
This also enables the EBCDIC versions to not have undefined expansions
when compiling without perl.h.
|
|
|
|
|
|
|
|
| |
Now that the Unicode tables are stored in native format, we shouldn't be
doing remapping.
Note that this assumes that the Latin1 casing tables are stored in
native order; not all of this has been done yet.
|
|
|
|
|
|
|
|
| |
The conversion from UTF-8 to code point should generally be to the
native code point. This adds a macro to do that, and converts the
core calls to the existing macro to use the new one instead. The old
macro is retained for possible backwards compatibility, though it
probably should be deprecated.
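A hedged sketch of the relationship being captured; the names below are
illustrative of era-appropriate macros, not necessarily the one this
commit adds. On ASCII platforms the native mapping is the identity, so
nothing changes there.

    /* sketch: decode to the Unicode code point, then map it to native */
    #define utf8_to_native_cp_sketch(s, retlen)                           \
        UNI_TO_NATIVE(utf8_to_uvuni((s), (retlen)))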
|
|
|
|
|
|
|
|
|
|
| |
Macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
whole Unicode range. In the locations affected by this commit, it is
known that the domain is limited to a single byte, so the simpler ones
whose names contain LATIN1 may be used.
On ASCII platforms, all the macros are null, so there is no effective
change.
|
|
|
|
|
|
|
|
|
| |
This reverts commit 43387ee1abcd83c3c7586b7f7aa86e838d239aac.
That commit reverted parts of f019c49e380f764c1ead36fe3602184804292711, but that
reversion may no longer be necessary.
See [perl #116989]
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code dealt rather inconsistently with uids and gids. Some
places assumed that they could be safely stored in UVs, others
in IVs, others in ints; all of them should've been using the
macros from config.h instead. Similarly, code that created
SVs or pushed values onto the stack was also making incorrect
assumptions -- as a point of reference, only pp_stat did the
right thing:
#if Uid_t_size > IVSIZE
mPUSHn(PL_statcache.st_uid);
#else
# if Uid_t_sign <= 0
mPUSHi(PL_statcache.st_uid);
# else
mPUSHu(PL_statcache.st_uid);
# endif
#endif
The other places were potential bugs, and some were even causing
warnings in some unusual OSs, like haiku or qnx.
This commit amends the situation by introducing four new macros,
SvUID(), sv_setuid(), SvGID(), and sv_setgid(), and using them
where needed.
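Following the pp_stat pattern quoted above, the setter side presumably
reduces to something like this sketch (the _sketch suffix marks it as
illustrative, not the macro's actual definition):

    #if Uid_t_size > IVSIZE
    #  define sv_setuid_sketch(sv, uid)  sv_setnv(sv, (NV) (uid))
    #else
    #  if Uid_t_sign <= 0
    #    define sv_setuid_sketch(sv, uid)  sv_setiv(sv, (IV) (uid))
    #  else
    #    define sv_setuid_sketch(sv, uid)  sv_setuv(sv, (UV) (uid))
    #  endif
    #endif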
|
|
|
|
|
|
|
|
|
|
| |
These two undocumented macros returned the REPLACEMENT CHARACTER if the
input was outside the Latin1 range. This was contrary to all other
similar macros, which return their input if it is invalid. It caused
warnings in some (dumber than average) compilers.
Since the macros are undocumented, this changes the behavior only for
illegal inputs to them.
|
|
|
|
|
|
|
|
| |
These macros fill in all the missing case changing operations. They
were omitted before because they are identical in their input domains to
other operations. But by adding them here, that detail no longer needs
to be known by the callers. toFOLD_LC is not documented, as it is
subject to change.
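A hedged sketch of the kind of alias this adds; whether the real macro
is literally this simple is an assumption:

    /* sketch: in the single-byte locale domain, folding a character gives
     * the same result as lowercasing it, so the new macro can simply defer */
    #define toFOLD_LC_sketch(c)  toLOWER_LC(c)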
|
|
|
|
|
|
|
|
|
|
|
| |
The case changing macros are now almost all documented. The exception
is toUPPER_LC, which may change in 5.19.
In addition, the functions in utf8.c that these macros call now refer to
them instead of having their own documentation. People should really be
using the macros instead of calling the functions directly. I'm not
deprecating the functions because I can't foresee the need to change
them, so code that uses them should continue to be ok.
|
|
|
|
| |
This corresponds to the other case changing macros
|
|
|
|
| |
Other macros have these suffixes, so for uniformity add these.
|
| |
|
|
|
|
| |
The language was confusing, and this also fixes a typo.
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 3c3ecf18c35ad7832c6e454d304b30b2c0fef127, I mistakenly added
documentation for a non-existent macro. It turns out that only the
variants listed for that macro exist, and not the base macro. Since we
are in code freeze, the solution has to be not to change code by adding
the base macro, but to delete the documentation, or change it to refer
to just the existing versions. In order not to create an entry that is
anomalous compared to the others, for this release I'm just getting rid of the
documentation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This appears to resolve these three related tickets:
[perl #116989] S_croak_memory_wrap breaks gcc warning flags detection
[perl #117319] Can't include perl.h without linking to libperl
[perl #117331] Time::HiRes::clock_gettime not implemented on Linux (regression?)
This patch changes S_croak_memory_wrap from a static (but not inline)
function into an ordinary exported function Perl_croak_memory_wrap.
This has the advantage of allowing programs (particularly probes, such
as in cflags.SH and Time::HiRes) to include perl.h without linking
against libperl. Since it is not a static function defined within each
compilation unit, the optimizer can no longer remove it when it's not
needed or inline it as needed. This likely negates some of the savings
that motivated the original commit 380f764c1ead36fe3602184804292711.
However, calling the simpler function Perl_croak_memory_wrap() still
does take less set-up than the previous version, so it may still be a
slight win. Specific cross-platform measurements are welcome.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters 0-255. It turns
out that I was wrong; it natively (at least on some platforms) has the
same rules (essentially none) for the characters which don't correspond
to ASCII ones as an ASCII platform has for its non-ASCII characters.
This commit is documentation only, mostly just removing the special
mentions of EBCDIC.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are three pairs of characters that Perl recognizes as
metacharacters in regular expression patterns: {}, [], and (). These
can be used as well to delimit patterns, as in:
m{foo}
s(foo)(bar)
Since they are metacharacters, they have special meaning to regular
expression patterns, and it turns out that you can't turn off that
special meaning by the normal means of preceding them with a backslash,
if you use them, paired, within a pattern delimited by them. For
example, in
m{foo\{1,3\}}
the backslashes do not change the behavior, and this matches "f", "o"
followed by one to three more occurrences of "o".
Usages like this, where they are interpreted as metacharacters, are
exceedingly rare; we think there are none, for example, in all of CPAN.
Hence, this deprecation should affect very little code. It does give
notice, however, that any such code needs to change, which will in turn
allow us to change the behavior in future Perl versions so that the
backslashes do have an effect, and without fear that we are silently
breaking any existing code.
=head1 Performance Enhancements
|
|
|
|
|
| |
It was handling above-Latin1 code points as ID starts instead of as ID
continues.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
/[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property
\p{Cased}. This commit introduces a pseudo-Posix class, internally named
'cased', to represent this. This class isn't specifiable by the user,
except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug
output will say ':cased:'.
The regex parser changes either of :lower: or :upper: into :cased:,
which already existing logic can then handle just like any other class.
This commit fixes the regression introduced in
3018b823898645e44b8c37c70ac5c6302b031381, and also the fact that these
have never worked under 'use locale'. The next commit will un-TODO the
tests for these things.
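A self-contained sketch of the parse-time substitution described (the
enum and function names are invented for illustration):

    enum sketch_classnum { SKETCH_CC_LOWER, SKETCH_CC_UPPER, SKETCH_CC_CASED };

    /* sketch: under /i, either :lower: or :upper: is remapped to the
     * internal :cased: pseudo-class, and the existing per-class logic then
     * handles it like any other POSIX class */
    static enum sketch_classnum
    classnum_under_fold(enum sketch_classnum classnum, int folding)
    {
        if (folding
            && (classnum == SKETCH_CC_LOWER || classnum == SKETCH_CC_UPPER))
            return SKETCH_CC_CASED;
        return classnum;
    }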
|