path: root/handy.h
Commit message (Author, Age, Files, Lines)
* Add a USING_MSVC6 macro to identify Microsoft Visual C++ 6.0 (Steve Hay, 2013-09-19, 1 file, -1/+1)
| | | | | | | This simplifies some of the logic necessary for coping with its various problems. Suggested by Nicholas Clark.
* handy.h: Allow bootstrapping to non-ASCII platform (Karl Williamson, 2013-08-29, 1 file, -63/+125)
| | | | | | | | | | | | | This adds a bunch of macros and moves things around to support conditional compilation when Configure is called with -DBOOTSTRAP_CHARSET. Doing so causes the usual table-driven macros to not be used, since the table may not be valid when bringing Perl up for the first time on a non-ASCII platform. This allows it to compile using the platform's native C library ctype functions, which should work well enough to compile miniperl, and allow the table to be changed to be valid. Then Configure can be re-run to not bootstrap, and normal compilation can proceed.
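A minimal sketch of the conditional-compilation idea described above, not the actual handy.h code. The names `MY_BOOTSTRAP_CHARSET`, `my_isALPHA`, `my_isDIGIT`, and `my_class_table` are hypothetical stand-ins chosen for illustration; only the general approach (fall back to native ctype functions while the table cannot be trusted) comes from the commit message.

```c
#include <ctype.h>

/* Hypothetical stand-in for building with Configure -DBOOTSTRAP_CHARSET. */
#define MY_BOOTSTRAP_CHARSET

#ifdef MY_BOOTSTRAP_CHARSET
/* Bootstrapping: trust the platform's native ctype functions.
 * The cast avoids undefined behavior for negative char values. */
#  define my_isALPHA(c) (isalpha((unsigned char)(c)) != 0)
#  define my_isDIGIT(c) (isdigit((unsigned char)(c)) != 0)
#else
/* Normal build: consult a precomputed classification table
 * (hypothetical; only valid once generated for this platform). */
extern const unsigned char my_class_table[256];
#  define MY_ALPHA_BIT 0x01
#  define MY_DIGIT_BIT 0x02
#  define my_isALPHA(c) ((my_class_table[(unsigned char)(c)] & MY_ALPHA_BIT) != 0)
#  define my_isDIGIT(c) ((my_class_table[(unsigned char)(c)] & MY_DIGIT_BIT) != 0)
#endif
```

Callers use `my_isALPHA(ch)` either way; only the expansion changes between the bootstrap build and the normal build.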
* handy.h: Remove extraneous parens (Karl Williamson, 2013-08-29, 1 file, -1/+1)
|
* handy.h: White space only (Karl Williamson, 2013-08-29, 1 file, -23/+23)
|
* EBCDIC has the unicode bug too (Karl Williamson, 2013-08-29, 1 file, -28/+2)
| | | | | | | | | | | | | | We have not had a working modern Perl on EBCDIC for some years. When I started out, comments and code led me to conclude erroneously that natively it supported semantics for all 256 characters 0-255. It turns out that I was wrong; it natively (at least on some platforms) has the same rules (essentially none) for the characters which don't correspond to ASCII ones, as the rules for these on ASCII platforms. A previous commit for 5.18 changed the docs about this issue. This current commit forces ASCII rules on EBCDIC platforms (even should there be one that natively uses all 256). To get all 256, the same things like 'use feature "unicode_strings"' must now be done.
* handy.h: Solve a failure to compile problem under EBCDIC (Karl Williamson, 2013-08-29, 1 file, -13/+22)
| | | | | | | | | handy.h is included in files that don't include perl.h, and hence not utf8.h. We can't rely therefore on the ASCII/EBCDIC conversion macros being available to us. The best way to cope is to use the native ctype functions. Most, but not all, of the macros in this commit currently resolve to use those native ones, but a future commit will change that.
* handy.h: Simplify some macro definitions (Karl Williamson, 2013-08-29, 1 file, -6/+3)
| | | | | Now, only one of the macros relies on magic numbers (isPRINT), leading to clearer definitions.
* handy.h: Combine macros that are same in ASCII, EBCDIC (Karl Williamson, 2013-08-29, 1 file, -8/+4)
| | | | | | | | These 4 macros can have the same RHS for their ASCII and EBCDIC versions, so there is no need to duplicate their definitions. This also enables the EBCDIC versions to not have undefined expansions when compiling without perl.h.
* Remove EBCDIC remappings (Karl Williamson, 2013-08-29, 1 file, -7/+5)
| | | | | | | | Now that the Unicode tables are stored in native format, we shouldn't be doing remapping. Note that this assumes that the Latin1 casing tables are stored in native order; not all of this has been done yet.
* Add and use macro to return EBCDIC (Karl Williamson, 2013-08-29, 1 file, -7/+7)
| | | | | | | | The conversion from UTF-8 to code point should generally be to the native code point. This adds a macro to do that, and converts the core calls to the existing macro to use the new one instead. The old macro is retained for possible backwards compatibility, though it probably should be deprecated.
* Use byte domain EBCDIC/LATIN1 macro where appropriate (Karl Williamson, 2013-08-29, 1 file, -20/+20)
| | | | | | | | | | The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the whole Unicode range. In the locations affected by this commit, it is known that the domain is limited to a single byte, so the simpler ones whose names contain LATIN1 may be used. On ASCII platforms, all the macros are null, so there is no effective change.
* Revert "Remove the non-inline function S_croak_memory_wrap from inline.h." (Tony Cook, 2013-07-24, 1 file, -2/+2)
| | | | | | | | | This reverts commit 43387ee1abcd83c3c7586b7f7aa86e838d239aac, which reverted parts of f019c49e380f764c1ead36fe3602184804292711, but that reversion may no longer be necessary. See [perl #116989].
* perlapi: Add note to isASCII (Karl Williamson, 2013-06-18, 1 file, -0/+6)
|
* Stop making assumptions about uids and gids. (Brian Fraser, 2013-06-04, 1 file, -0/+31)
| The code dealt rather inconsistently with uids and gids. Some places
| assumed that they could be safely stored in UVs, others in IVs, others in
| ints; all of them should've been using the macros from config.h instead.
|
| Similarly, code that created SVs or pushed values onto the stack was also
| making incorrect assumptions. As a point of reference, only pp_stat did
| the right thing:
|
|     #if Uid_t_size > IVSIZE
|         mPUSHn(PL_statcache.st_uid);
|     #else
|     #   if Uid_t_sign <= 0
|         mPUSHi(PL_statcache.st_uid);
|     #   else
|         mPUSHu(PL_statcache.st_uid);
|     #   endif
|     #endif
|
| The other places were potential bugs, and some were even causing warnings
| on some unusual OSs, like Haiku or QNX.
|
| This commit amends the situation by introducing four new macros, SvUID(),
| sv_setuid(), SvGID(), and sv_setgid(), and using them where needed.
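The selection logic in the pp_stat snippet quoted in this commit message can be sketched in isolation as follows. This is an illustration under stated assumptions, not the definition of SvUID() or sv_setuid(): `MY_UID_SIZE`, `MY_UID_SIGN`, `MY_IV_SIZE`, and `my_uid_storage` are hypothetical names, and the values given to them are made up (in Perl the corresponding values come from config.h).

```c
/* Hypothetical configuration values; made up for this sketch. */
#define MY_UID_SIZE 4   /* size in bytes of the platform's uid type */
#define MY_UID_SIGN 1   /* > 0 means unsigned, <= 0 means signed    */
#define MY_IV_SIZE  8   /* size in bytes of the signed integer type */

typedef enum {
    STORE_AS_DOUBLE,    /* too wide for the integer type            */
    STORE_AS_SIGNED,
    STORE_AS_UNSIGNED
} my_uid_store;

/* Same three-way decision as the quoted pp_stat code: pick the
 * representation at compile time from the type's size and signedness. */
static my_uid_store my_uid_storage(void)
{
#if MY_UID_SIZE > MY_IV_SIZE
    return STORE_AS_DOUBLE;
#elif MY_UID_SIGN <= 0
    return STORE_AS_SIGNED;
#else
    return STORE_AS_UNSIGNED;
#endif
}
```

Wrapping the decision in one place is the point of the commit: callers no longer repeat the preprocessor ladder, so they cannot get it subtly wrong.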
* handy.h: Change the error return of two macros (Karl Williamson, 2013-05-20, 1 file, -8/+8)
| | | | | | | | | | These two undocumented macros returned the REPLACEMENT CHARACTER if the input was outside the Latin1 range. This was contrary to all other similar macros, which return their input if it is invalid. It caused warnings in some (dumber than average) compilers. These macros are undocumented; this changes the behavior only of illegal inputs to them.
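The convention described above (return the input unchanged when it is out of range, rather than a substitute character) might look like this. `my_to_lower_latin1` is a hypothetical helper, not one of the two macros the commit changed, and it assumes ASCII/Latin-1 code point values.

```c
/* Hypothetical sketch: lowercase a code point in the Latin-1 range.
 * Anything above 0xFF is invalid input for this helper and is
 * returned unchanged instead of being mapped to U+FFFD. */
static unsigned long my_to_lower_latin1(unsigned long cp)
{
    if (cp > 0xFF)
        return cp;                      /* invalid: echo the input    */
    if (cp >= 0x41 && cp <= 0x5A)
        return cp + 0x20;               /* A-Z                        */
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
        return cp + 0x20;               /* upper Latin-1 letters,     */
                                        /* skipping MULTIPLICATION SIGN */
    return cp;                          /* already lower or caseless  */
}
```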
* handy.h: Add some macro definitions (Karl Williamson, 2013-05-20, 1 file, -1/+23)
| | | | | | | | These macros fill in all the missing case changing operations. They were omitted before because they are identical in their input domains to other operations. But by adding them here, that detail no longer needs to be known by the callers. toFOLD_LC is not documented, as it is subject to change.
* perlapi: Add docs for some case-changing macros; clarify others (Karl Williamson, 2013-05-20, 1 file, -7/+100)
| | | | | | | | | | | The case changing macros are now almost all documented. The exception is toUPPER_LC, which may change in 5.19. In addition, the functions in utf8.c that these macros call now refer to them instead of having their own documentation. People should really be using the macros instead of calling the functions directly. I'm not deprecating the functions because I can't foresee the need to change them, so code that uses them should continue to be ok.
* handy.h: Add missing toFOLD_utf8 macro (Karl Williamson, 2013-05-20, 1 file, -0/+1)
| | | | This corresponds to the other case changing macros
* handy.h: define some synonyms for consistency (Karl Williamson, 2013-05-20, 1 file, -2/+8)
| | | | Other macros have these suffixes, so for uniformity add these.
* handy.h: Clarify comment (Karl Williamson, 2013-05-20, 1 file, -5/+4)
|
* perlapi.pod: Clarify character classification macros (Karl Williamson, 2013-04-20, 1 file, -37/+31)
| | | | The language was confusing, and this also fixes a typo.
* handy.h: Remove docs for non-existent macro (Karl Williamson, 2013-03-29, 1 file, -6/+0)
| | | | | | | | | | | In commit 3c3ecf18c35ad7832c6e454d304b30b2c0fef127, I mistakenly added documentation for a non-existent macro. It turns out that only the variants listed for that macro exist, and not the base macro. Since we are in code freeze, the solution has to be not to change code by adding the base macro, but to delete the documentation, or change it to refer to just the existing versions. In order to not cause an entry that is anomalous to the others, for this release, I'm just getting rid of the documentation.
* Remove the non-inline function S_croak_memory_wrap from inline.h. (Andy Dougherty, 2013-03-28, 1 file, -2/+2)
| This appears to resolve these three related tickets:
|
|     [perl #116989] S_croak_memory_wrap breaks gcc warning flags detection
|     [perl #117319] Can't include perl.h without linking to libperl
|     [perl #117331] Time::HiRes::clock_gettime not implemented on Linux (regression?)
|
| This patch changes S_croak_memory_wrap from a static (but not inline)
| function into an ordinary exported function Perl_croak_memory_wrap. This
| has the advantage of allowing programs (particularly probes, such as in
| cflags.SH and Time::HiRes) to include perl.h without linking against
| libperl. Since it is not a static function defined within each
| compilation unit, the optimizer can no longer remove it when it's not
| needed or inline it as needed. This likely negates some of the savings
| that motivated the original commit 380f764c1ead36fe3602184804292711.
| However, calling the simpler function Perl_croak_memory_wrap() still does
| take less set-up than the previous version, so it may still be a slight
| win. Specific cross-platform measurements are welcome.
* perlapi: Document some macros (Karl Williamson, 2013-03-26, 1 file, -2/+26)
|
* EBCDIC has the Unicode bug too (Karl Williamson, 2013-03-11, 1 file, -13/+9)
| | | | | | | | | | | | We have not had a working modern Perl on EBCDIC for some years. When I started out, comments and code led me to conclude erroneously that natively it supported semantics for all 256 characters 0-255. It turns out that I was wrong; it natively (at least on some platforms) has the same rules (essentially none) for the characters which don't correspond to ASCII ones, as the rules for these on ASCII platforms. This commit is documentation only, mostly just removing the special mentions of EBCDIC.
* perlapi: Nits (Karl Williamson, 2013-03-11, 1 file, -13/+13)
|
* Deprecate certain rare uses of backslashes within regexes (Karl Williamson, 2013-01-19, 1 file, -1/+2)
| There are three pairs of characters that Perl recognizes as
| metacharacters in regular expression patterns: {}, [], and (). These can
| be used as well to delimit patterns, as in:
|
|     m{foo}
|     s(foo)(bar)
|
| Since they are metacharacters, they have special meaning to regular
| expression patterns, and it turns out that you can't turn off that special
| meaning by the normal means of preceding them with a backslash, if you use
| them, paired, within a pattern delimited by them. For example, in
|
|     m{foo\{1,3\}}
|
| the backslashes do not change the behavior, and this matches "f", "o"
| followed by one to three more occurrences of "o".
|
| Usages like this, where they are interpreted as metacharacters, are
| exceedingly rare; we think there are none, for example, in all of CPAN.
| Hence, this deprecation should affect very little code. It does give
| notice, however, that any such code needs to change, which will in turn
| allow us to change the behavior in future Perl versions so that the
| backslashes do have an effect, and without fear that we are silently
| breaking any existing code.
* handy.h: Fix isIDCONT_utf8() (Karl Williamson, 2013-01-14, 1 file, -1/+1)
| | | | | It was handling above-Latin1 code points as IDstarts instead of continues.
* regex: Add pseudo-Posix class: 'cased' (Karl Williamson, 2012-12-31, 1 file, -18/+21)
| /[[:upper:]]/i and /[[:lower:]]/i should match the Unicode property
| \p{Cased}. This commit introduces a pseudo-Posix class, internally named
| 'cased', to represent this. This class isn't specifiable by the user,
| except through using either /[[:upper:]]/i or /[[:lower:]]/i. Debug
| output will say ':cased:'.
|
| When the regex compiler parses either :lower: or :upper: under /i, it
| changes the class into :cased:, which already existing logic can handle
| just like any other class.
|
| This commit fixes the regression introduced in
| 3018b823898645e44b8c37c70ac5c6302b031381, and also the fact that these
| have never worked under 'use locale'.
|
| The next commit will un-TODO the tests for these things.
* handy.h, regcomp.h, regexec.c: Sort initializers, switch() (Karl Williamson, 2012-12-31, 1 file, -8/+8)
| | | | | | | | Until recently, these needed to be (or it made sense for them to be) ordered by the numerical value that the RHS of each #define evaluates to. But now, they are all initialized to something else, and the numerical value is not even apparent. Alphabetical order gives a logical ordering to help a reader find things.
* perlapi: Clarify isSPACE(), document isPSXSPC() (Karl Williamson, 2012-12-23, 1 file, -2/+25)
|
* handy.h: Add full complement of isIDCONT() macros (Karl Williamson, 2012-12-23, 1 file, -3/+12)
| | | | | | | This also changes isIDCONT_utf8() to use the Perl definition, which excludes any \W characters (the Unicode definition includes a few of these). Tests are also added. These macros remain undocumented for now.
* Remove temporary back-compat PL_ variable names (Karl Williamson, 2012-12-22, 1 file, -10/+0)
| | | | | | These names are synonyms for specific array elements, and were used temporarily until all uses of them were removed. This commit removes the remaining uses, and the definitions.
* handy.h: Improve some comments (Karl Williamson, 2012-12-22, 1 file, -9/+14)
|
* Consolidate some regex OPS (Karl Williamson, 2012-12-22, 1 file, -1/+1)
| The regular expression operation POSIXA works on any of the (currently)
| 16 posix classes (like \w and [:graph:]) under the regex modifier /a.
| This commit creates similar operations for the other modifiers: POSIXL
| (for /l), POSIXD (for /d), POSIXU (for /u), plus their complements. It
| causes these ops to be generated instead of the ALNUM, DIGIT, HORIZWS,
| SPACE, and VERTWS ops, as well as all their variants. The net saving is
| 22 regnode types.
|
| The reason to do this is for maintenance. As of this commit, there are
| now 22 fewer node types for which code has to be maintained. The code for
| each variant was essentially the same logic, but on different operands.
| It would be easy to make a change to one copy and forget to make the
| corresponding change in the others. Indeed, this patch fixes
| [perl #114272] in which one copy was out of sync with others.
|
| This patch actually reduces the number of separate code paths to 5:
| POSIXA, NPOSIXA, POSIXL, POSIXD, and POSIXU. The complements of the last
| 3 use the same code path as their non-complemented version, except that a
| variable is initialized differently. The code then XORs this variable
| with its result to do the complementing or not. Further, the POSIXD
| branch now just checks if the target string being matched is UTF-8 or
| not, and then jumps to either the POSIXU or POSIXA code respectively. So,
| there are effectively only 4 cases that are coded: POSIXA, NPOSIXA,
| POSIXL, and POSIXU. (POSIXA doesn't have to worry about UTF-8, while
| NPOSIXA does, hence these for efficiency are coded separately.)
|
| Removing all this code saves memory. The output of the Linux size command
| shows that the perl executable was shrunk by 33K bytes on my platform
| compiled under -O0 (.7%) and by 18K bytes (1.3%) under -O2.
|
| The reason this patch was doable was previous work in numbering the POSIX
| classes, so that they could be indexed in arrays and bit positions. This
| is a large patch; I didn't see how to break it into smaller components.
|
| I chose to make this code more efficient as opposed to saving even more
| memory. Thus there is a separate loop that is jumped to after we know we
| have to load a swash; this just saves having to test if the swash is
| loaded each time through the loop. I avoid loading the swash until
| absolutely necessary. In places in the previous version of this code, the
| swash was loaded when the input was UTF-8, even if it wasn't yet needed
| (and might never be if the input didn't contain anything above Latin1);
| apparently to avoid the extra test per iteration.
|
| The Perl test suite runs slightly faster on my platform with this patch
| under -O0, and the speeds are indistinguishable under -O2. This is in
| spite of these new POSIX regops being unknown to the regex optimizer
| (this will be addressed in future commits), and extra machine
| instructions being required for each character (the xor, and some
| shifting and masking). I expect this is a result of better caching, and
| not loading swashes unless absolutely necessary.
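The XOR trick mentioned in this commit message (one code path serving both a class and its complement) can be illustrated with a small standalone sketch. This is not the regexec.c code; `my_find_class` and its parameters are hypothetical, and the native `isdigit` stands in for whatever classification the real code performs.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: return the index of the first byte in s that
 * matches the digit class, or, when complement is non-zero, the first
 * byte that does NOT match it. Returns len if there is no such byte.
 * A single loop handles both cases by XORing the class test with a
 * flag that is initialized differently for the complemented form. */
static size_t my_find_class(const char *s, size_t len, int complement)
{
    const int to_complement = complement ? 1 : 0;
    size_t i;

    for (i = 0; i < len; i++) {
        int in_class = isdigit((unsigned char)s[i]) != 0;
        if (in_class ^ to_complement)
            return i;
    }
    return len;
}
```

For instance, `my_find_class("ab1c", 4, 0)` scans for the first digit, while `my_find_class("12x", 3, 1)` scans for the first non-digit, using the same loop body.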
* handy.h: Refactor some internal macro calls (Karl Williamson, 2012-12-22, 1 file, -73/+78)
| | | | | I didn't plan very well when I added these macros recently. This refactors them to be more logical.
* Use array for some inversion lists (Karl Williamson, 2012-12-22, 1 file, -0/+1)
| | | | | | This patch creates an array pointing to the inversion lists that cover the Latin-1 ranges for Posix character classes, and uses it instead of the individual variables previously referred to.
* regcomp.c: Use table look-up instead of individual strings. (Karl Williamson, 2012-12-22, 1 file, -1/+5)
| | | | | | This changes the code to get the name for the character class's Unicode property via table lookup. This is in preparation for making most of the cases in this switch identical, so they can be collapsed.
* handy.h: Move some back compat macros (Karl Williamson, 2012-12-22, 1 file, -9/+7)
| | | | Move them to the section that is for back-compat definitions.
* Add generic _is_(uni|utf8)_FOO() function (Karl Williamson, 2012-12-22, 1 file, -18/+45)
| | | | | | This function uses table lookup to replace 9 more specific functions, which can be deprecated. They should not have been exposed to the public API in the first place.
* handy.h: Create isALPHANUMERIC() and kin (Karl Williamson, 2012-12-22, 1 file, -17/+38)
| Perl has had an undocumented macro isALNUMC() for a long time. I want to
| document it, but the name is very obscure. Neither Yves nor I are sure
| what it stands for. My best guess is "C's alnum". It corresponds to
| /[[:alnum:]]/, and so its best name would be isALNUM(). But that is the
| name long given to what matches \w. A new synonym, isWORDCHAR(), has been
| in place for several releases for that, but the old isALNUM() should
| remain for backwards compatibility.
|
| I don't think that the name isALNUMC() should be published, as it is too
| close to isALNUM(). I finally came to the conclusion that
| isALPHANUMERIC() is the best name; it describes its purpose clearly; the
| disadvantage is its long length. I doubt that it will get much use, but
| we need something, I think, that we can publish to accomplish this
| functionality.
|
| This commit also converts core uses of isALNUMC to isALPHANUMERIC. (I
| intended to do that separately, but made a mistake in rebasing, and
| combined the two patches; and it seemed like not a big enough problem to
| separate them out again.)
* handy.h: Move some #defines (Karl Williamson, 2012-12-22, 1 file, -10/+10)
| | | | | | I'm moving this block of back-compat macros to later in the file, so it comes after all the other definitions that may need to have backwards compatibility equivalents.
* intrpvar.h: Place some swash pointers in an array (Karl Williamson, 2012-12-22, 1 file, -0/+12)
|
* handy.h: Guard against recursive #inclusion (Karl Williamson, 2012-12-22, 1 file, -0/+5)
|
* regexec.c: Replace infamous if-else-if sequence by loop (Karl Williamson, 2012-12-09, 1 file, -1/+2)
| | | | This saves 1.5 KB in the text section on my machine in regexec.o (unoptimized) and 820 bytes when optimized. I did not benchmark, as we don't really care very much about performance under 'use locale'.
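The general refactoring named in this commit title, replacing a chain of per-class if-else-if tests with a loop over a table, can be sketched as below. This is only an illustration of the technique: `my_classes` and `my_class_mask` are hypothetical, and the table here holds standard ctype functions rather than anything from regexec.c.

```c
#include <ctype.h>
#include <stddef.h>

typedef int (*my_class_fn)(int);

/* One table row per class replaces one arm of the if-else-if chain. */
static const struct {
    const char *name;
    my_class_fn fn;
} my_classes[] = {
    { "alpha", isalpha },
    { "digit", isdigit },
    { "space", isspace },
    { "upper", isupper },
};

/* Return a bitmask with bit i set when c belongs to my_classes[i]. */
static unsigned my_class_mask(int c)
{
    unsigned mask = 0;
    size_t i;

    for (i = 0; i < sizeof my_classes / sizeof my_classes[0]; i++) {
        if (my_classes[i].fn((unsigned char)c))
            mask |= 1u << i;
    }
    return mask;
}
```

Adding a class then means adding a table row rather than another branch, which is where the code-size saving in such a change typically comes from.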
* handy.h: Add an enum typedef (Karl Williamson, 2012-12-09, 1 file, -0/+23)
| | | | | | | This creates a copy of all the Posix character class numbers and puts them in an enum. This enum is for internal Perl core use only, and is used so hopefully compilers can generate better code from future commits that will make use of it.
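As a rough illustration of mirroring class numbers in an enum so they can index arrays, consider the sketch below. The enumerator names, `my_class_names`, and `my_class_name` are hypothetical; the real enum in handy.h is internal to the Perl core and is not reproduced here.

```c
/* Hypothetical class numbers; the final enumerator counts the classes. */
typedef enum {
    MY_CC_ALPHA,
    MY_CC_DIGIT,
    MY_CC_SPACE,
    MY_CC_COUNT     /* keep last */
} my_char_class;

/* Because the enumerators are small consecutive integers, they can
 * index a table directly. */
static const char *const my_class_names[MY_CC_COUNT] = {
    "alpha", "digit", "space"
};

static const char *my_class_name(my_char_class cc)
{
    /* The unsigned cast also rejects negative values. */
    return ((unsigned)cc < MY_CC_COUNT) ? my_class_names[cc] : "unknown";
}
```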
* handy.h: Reorder char class #defines; add comments (Karl Williamson, 2012-12-09, 1 file, -18/+35)
| | | | | | This groups the Posix-like classes in two groups, one which contains those classes whose above-Latin1 lookups are done via swashes; the other which aren't. This will prove useful in future commits.
* handy.h: Add comment (Karl Williamson, 2012-12-09, 1 file, -0/+8)
|
* handy.h: Improve isDIGIT_utf8() and isXDIGIT_utf8() macros (Karl Williamson, 2012-12-09, 1 file, -2/+14)
| | | | | There are no digits in the upper Latin1 range, therefore we can skip testing for such.
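The shortcut described above can be sketched for an ASCII platform as follows. This is an assumption-laden illustration, not the isDIGIT_utf8() macro itself: `my_is_digit_utf8` and `my_slow_is_digit_utf8` are hypothetical, the input is assumed to be well-formed UTF-8, and the slow path is a stand-in that recognizes only ARABIC-INDIC DIGITs (U+0660 to U+0669) in place of a full table lookup.

```c
/* Hypothetical stand-in for the expensive lookup. It recognizes only
 * U+0660..U+0669, encoded in UTF-8 as D9 A0 .. D9 A9. */
static int my_slow_is_digit_utf8(const unsigned char *p)
{
    return p[0] == 0xD9 && p[1] >= 0xA0 && p[1] <= 0xA9;
}

/* Is the character starting at p a digit? */
static int my_is_digit_utf8(const unsigned char *p)
{
    if (p[0] < 0x80)                     /* ASCII: test the range directly */
        return p[0] >= 0x30 && p[0] <= 0x39;
    if (p[0] == 0xC2 || p[0] == 0xC3)    /* U+0080..U+00FF: no digits,     */
        return 0;                        /* so skip the lookup entirely    */
    return my_slow_is_digit_utf8(p);     /* above Latin1: slow path        */
}
```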
* handy.h: Change documentation for perlapi (Karl Williamson, 2012-12-09, 1 file, -41/+148)
| | | | | | | | | | | This documents several more of the character classification macros, including all variants of them. There are no code changes. The READ_XDIGIT macro was moved to "Miscellaneous Functions", as it really isn't character classification. Several of the macros remain undocumented because I'm not comfortable yet about their names and/or functionality.