path: root/handy.h
Commit message (Author, Age, Files, Lines)
* remove redundant PERL_EXPORT_C and PERL_XS_EXPORT_C macros (Daniel Dragan, 2015-06-03, 1 file, -3/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These 2 macros were created for the Symbian port in commit "Symbian port of Perl" to replace a direct "extern" token. I guess the author was unaware of PERL_CALLCONV. PERL_CALLCONV is the official macro to use. PERL_XS_EXPORT_C and PERL_EXPORT_C have no usage on cpan grep except for modules with direct copies of core headers. A defect of using PERL_EXPORT_C and PERL_XS_EXPORT_C instead of PERL_CALLCONV is that win32/win32.h has no knowledge of the 2 macros and doesn't set them, and os/os2ish.h doesn't either. On Win32, since the unix defaults are used instead of the Win32-specific "__declspec(dllimport)" token, XS modules use indirect function stubs in each XS module placed by the CC to call into perl5**.dll instead of directly calling the core C functions. I observed this in XS-Typemap's DLL. To simplify the API, and to decrease the number of macros needing to be implemented to support each platform, just remove the 2 macros. Since perl.h's fallback defaults for PERL_CALLCONV are very late in perl.h, they need to be moved up before function declarations start in perlio.h (perlio.h is included from iperlsys.h). win32iop.h contains the "PerlIO" and "SV" tokens, so perlio.h must be included before win32iop.h is. Including perlio.h so early in win32.h causes PERL_CALLCONV not to be defined, since the Win32 platform uses the fallback in perl.h: win32.h doesn't always define PERL_CALLCONV and sometimes relies on the fallback. Since win32iop.h contains a lot of declarations, it belongs with other declarations such as those in proto.h, so move it from win32.h to perl.h. The "free" token in struct regexp_engine conflicts with win32iop's "#define free win32_free", so rename that member.
* perlapi: Document some functions (Karl Williamson, 2015-05-07, 1 file, -0/+10)
| | | | | These are mentioned in some other pods. It's best to bring them into perlapi, and refer to them from the other pods.
* Replace common Emacs file-local variables with dir-locals (Dagfinn Ilmari Mannsåker, 2015-03-22, 1 file, -6/+0)
| | | | | | | | | | | | | | | | An empty cpan/.dir-locals.el stops Emacs using the core defaults for code imported from CPAN. Committer's work: To keep t/porting/cmp_version.t and t/porting/utils.t happy, $VERSION needed to be incremented in many files, including throughout dist/PathTools. perldelta entry for module updates. Add two Emacs control files to MANIFEST; re-sort MANIFEST. For: RT #124119.
* followup for MEM_WRAP_CHECK (David Mitchell, 2015-03-04, 1 file, -12/+12)
| | | | | On my previous commit, I failed to save the edited handy.h file where I had edited the comments, so they were missed from the commit.
* further tweak MEM_WRAP_CHECK() (David Mitchell, 2015-03-04, 1 file, -11/+16)
| | | | | | My recent mods to make it often constant-fold at compile-time were generating Coverity warnings when the expression happened to equal (cond ? 1 : 1).
* Make MEM_WRAP_CHECK more compile-time (David Mitchell, 2015-03-03, 1 file, -7/+37)
| | | | | | | | | | | | | | | | | | MEM_WRAP_CHECK(n,t) checks whether n * sizeof(t) exceeds the memory size, and so is likely to wrap. When the type of n is small (e.g. a U8), you used to get compiler warnings about a comparison always being true. This was avoided by adding 0.0. Now Coverity complains that you're doing a floating-point comparison with the results of an integer division. Instead of adding 0.0, add some more compile-time checks that will cause the runtime check to be skipped when the maximum value of n (as determined by sizeof(n)) is a lot less than memory size. On my 64-bit system this also pleasingly makes the executable 8384 bytes smaller, implying that in many cases, the run-time check is now being skipped.
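The idea described above can be sketched roughly as follows. This is not the actual MEM_WRAP_CHECK from handy.h; the macro names MAX_COUNT, MAX_OF_WIDTH, WRAP_POSSIBLE and WOULD_WRAP are hypothetical, and size_t stands in for "memory size". The point is only that the first operand of the && depends on sizeof(n) alone, so a compiler can fold it to a constant and drop the run-time comparison for narrow types.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; the real MEM_WRAP_CHECK in handy.h differs. */

/* Largest element count whose total byte size still fits in a size_t. */
#define MAX_COUNT(t) (SIZE_MAX / sizeof(t))

/* Largest value representable in an unsigned type as wide as n, capped at
 * SIZE_MAX. It depends only on sizeof(n), so it is a compile-time constant. */
#define MAX_OF_WIDTH(n) \
    (SIZE_MAX >> ((sizeof(size_t) - \
        (sizeof(n) < sizeof(size_t) ? sizeof(n) : sizeof(size_t))) * CHAR_BIT))

/* Compile-time part: could any value of n's type make n * sizeof(t) wrap? */
#define WRAP_POSSIBLE(n, t) (MAX_OF_WIDTH(n) > MAX_COUNT(t))

/* Full check: the run-time comparison is only reached when it can matter. */
#define WOULD_WRAP(n, t) \
    (WRAP_POSSIBLE(n, t) && (size_t)(n) > MAX_COUNT(t))
```

With a one-byte count, WRAP_POSSIBLE is constant false, so no "comparison is always true/false" diagnostic on the run-time test is provoked and no floating-point trick is needed.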
* \s matching VT is no longer experimental (Karl Williamson, 2015-02-21, 1 file, -37/+24)
| | | | | | | This was experimentally introduced in 5.18, and no issues were raised, except that it got us to thinking and spurred us to stop allowing $^X, where 'X' is a non-printable control character, and that change caused some issues.
* handy.h: EXTERN_C-ize PL_charclass (Karl Williamson, 2015-01-23, 1 file, -0/+2)
| | | | See thread http://nntp.perl.org/group/perl.perl5.porters/224999
* handy.h Cast to unsigned before doing xor (Karl Williamson, 2014-12-30, 1 file, -9/+9)
| | | | | | It occurred to me that these macros could have an xor applied to a signed value if the argument is signed, whereas the xor is expecting unsigned.
* Remove duplicate apidoc entries (David Mitchell, 2014-12-17, 1 file, -4/+0)
| | | | | Modify apidoc.pl to warn about duplicate apidoc entries, and remove duplicates for av_tindex and toLOWER_LC
* toupper/lower: avoid sign warnings (David Mitchell, 2014-12-16, 1 file, -2/+2)
| | | | | | | Perl's toLOWER_LC() etc macros are specified as having U8 arg and return, while the underlying macro may call the OS's tolower() function which is int. Stop the compiler warning about mismatched sign in conditional by casting the result of the OS function.
* handy.h: Add missing parentheses to macro #define (Karl Williamson, 2014-11-14, 1 file, -1/+1)
| | | | | | These being missing caused 3d3a881c1b0eb9c855d257a2eea1f72666e30fbc to have to be reverted. It only shows up on platforms that don't have an isblank() libc function.
* handy.h: Two EBCDIC fixes (Karl Williamson, 2014-10-21, 1 file, -10/+13)
| | | | | In EBCDIC-only macros, an argument previously failed to be dereferenced, and there was an extra ==. A few comment changes as well.
* Fix isASCII for EBCDIC (Karl Williamson, 2014-10-21, 1 file, -33/+82)
| | | | | | | | | | | | | | | | | | | | | | | | | Prior to this commit isASCII on EBCDIC platforms was defined as the isascii() libc function. It turns out that this doesn't work properly. It needed to be this way back when EBCDIC was bootstrapped onto the target machine, but now, various header files are furnished with the requisite definitions, so this is no longer necessary. The problem with isascii() is that it is locale-dependent, unlike on ASCII platforms. This means that instead of getting a standard ASCII definition, it returns whatever the underlying locale says, even if there is no 'use locale' anywhere in the program. Starting with this commit, the isASCII definition now comes from the l1_char_class_tab.h file which we know is accurate and not locale-dependent. This header can be used in compilations of utility programs where perl.h is not available. For these, there are alternate, more complicated definitions, should they be needed in those utility programs. Several of those definitions prior to this commit also used locale-sensitive isfoo() functions. The bulk of this commit refactors those definitions to not use these functions as much as possible. As noted in the added comments in the code, the one remaining use of such a function is only for the lesser-used control characters. Likely these aren't used in the utility programs.
* handy.h: Add missing macro (Karl Williamson, 2014-10-21, 1 file, -1/+6)
| | | | | | | This section of code is normally not compiled, but when circumstances call for it to be compiled, it may be missing the macro defined in this commit, which is trivial on ASCII platforms, so just define it if missing
* handy.h: Need macro definition for normally non-compiled code (Karl Williamson, 2014-10-21, 1 file, -0/+1)
| | | | | | | This section of code is compiled only when perl.h is not available, i.e. for utility programs. I periodically test that it still works, and this time a macro was added to the other branch of the #if, but not this one. This commit adds a trivial one to the missing area.
* handy.h: Comments only (Karl Williamson, 2014-10-21, 1 file, -3/+2)
| | | | | Removes obsolete comment, and adds text to make it easier to find matching #else and #endif of a #if
* perlapi: Clarify two entries (Karl Williamson, 2014-10-07, 1 file, -3/+6)
|
* regcomp.c: Make macro a lookup (Karl Williamson, 2014-09-06, 1 file, -1/+4)
| | | | | | | | | | | The recently introduced macro isMNEMONIC_CNTRL has a look-up and several tests in it, which occupy time and space. Since it was only used for debugging, that did not matter much, but future commits will use it in more mainline code. This commit changes it to be a single look-up, using up one of the spare bits available for that purpose in PL_charclass. There are enough available bits that we aren't likely to run out, really ever. (We can always add a 2nd word of bits if necessary.)
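A minimal sketch of the "single look-up" technique, assuming a 256-entry table of bit flags in the spirit of PL_charclass. The table name, the bit positions, and the exact set of controls treated as "mnemonic" are assumptions made for illustration, not taken from the Perl source.

```c
#include <stdint.h>

/* Hypothetical bit positions and table; stand-ins for the real PL_charclass. */
enum { CC_DIGIT_BIT = 0, CC_MNEMONIC_CNTRL_BIT = 1 };

static uint32_t charclass_tab[256];

/* Fill the table once; afterwards every test is one load and one mask. */
static void init_charclass_tab(void)
{
    /* Assumption: the "mnemonic" controls are those with short escapes. */
    static const char mnemonic_cntrls[] = "\a\b\f\n\r\t\033";
    const char *p;
    int c;

    for (c = '0'; c <= '9'; c++)
        charclass_tab[c] |= UINT32_C(1) << CC_DIGIT_BIT;
    for (p = mnemonic_cntrls; *p; p++)
        charclass_tab[(unsigned char)*p] |= UINT32_C(1) << CC_MNEMONIC_CNTRL_BIT;
}

/* One table look-up replaces a chain of comparisons. */
#define HAS_CLASS(c, bit) \
    ((charclass_tab[(unsigned char)(c)] >> (bit)) & 1u)
```

Adding a new class costs one more bit per entry rather than another series of tests at each call site, which is the trade the commit message describes.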
* handy.h, regcomp.c: Add, clarify comments (Karl Williamson, 2014-08-25, 1 file, -5/+6)
|
* Add and use macros for case-insensitive comparison (Karl Williamson, 2014-08-22, 1 file, -0/+18)
| | | | | | | | | | | | | | | | | | | | | | | | This adds to handy.h isALPHA_FOLD_EQ(c1,c2) which efficiently tests if c1 and c2 are the same character, case-insensitively. For example isALPHA_FOLD_EQ(c, 's') returns true if and only if <c> is 's' or 'S'. isALPHA_FOLD_NE() is also added by this commit. At least one of c1 and c2 must be known to be in [A-Za-z] or this macro doesn't work properly. (There is an assert for this in the macro in DEBUGGING builds). That is why the name includes "ALPHA", so you won't forget when using it. This functionality has been in regcomp.c for a while, under a different name. I had thought that the only reason to make it more generally available was potential speed gain, but recent gcc versions optimize to the same code, so I thought there wasn't any point to doing so. But I now think that using this makes things easier to read (and certainly shorter to type in). Once you grok what this macro does, it simplifies what you have to keep in your mind when reading logical expressions with multiple operands. That something can be either upper or lower case can be a distraction to understanding the larger point of the expression.
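For illustration, a comparison of this kind could look like the sketch below on an ASCII platform, where upper- and lower-case letters differ only in one bit. The names CASE_BIT, IS_ASCII_ALPHA, ALPHA_FOLD_EQ and ALPHA_FOLD_NE are hypothetical and the real handy.h definitions may differ; the assert plays the role of the DEBUGGING-build check mentioned above.

```c
#include <assert.h>

/* Sketch for ASCII platforms only; macro names are hypothetical. */
#define CASE_BIT ('A' ^ 'a')   /* 0x20 on ASCII */

#define IS_ASCII_ALPHA(c) \
    ((((c) | CASE_BIT) >= 'a') && (((c) | CASE_BIT) <= 'z'))

/* Mask off the case bit on both sides and compare. If neither argument is
 * a letter the result is meaningless ('@' and '`' would compare equal),
 * hence the assertion. */
#define ALPHA_FOLD_EQ(c1, c2) \
    (assert(IS_ASCII_ALPHA(c1) || IS_ASCII_ALPHA(c2)), \
     ((c1) & ~CASE_BIT) == ((c2) & ~CASE_BIT))

#define ALPHA_FOLD_NE(c1, c2) (!ALPHA_FOLD_EQ(c1, c2))
```

A call such as ALPHA_FOLD_EQ(c, 's') then reads as a single condition instead of (c == 's' || c == 'S').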
* Deprecate unescaped literal "{" in regex patterns (Karl Williamson, 2014-06-12, 1 file, -2/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit also causes escaped (by a backslash) "(", "[", and "{" to be considered literally. In the previous 2 Perl versions, the escaping was ignored, and a (default-on) deprecation warning was raised. Now that we have warned for 2 release cycles, we can change the meaning of escaping to actually do something. Warning when a literal left brace is not escaped by a backslash will allow us to eventually use this character in more contexts as being meta, allowing us to extend the language. For example, the lower limit of a quantifier could be omitted, and better error checking instituted, or things like \w could be followed by a {...} indicating some special word character, like \w{Greek} to restrict to just Greek word characters. We tried to do this in v5.16, and many CPAN modules changed to backslash their left braces at that time. However we had to back out that change before 5.16 shipped because it turned out that escaping a left brace in some contexts didn't work, namely when the brace would normally be a metacharacter (for example surrounding a quantifier), and the pattern delimiters were { }. Instead we raised the useless backslash warning mentioned above, which has now been there for the requisite 2 cycles. This patch partially reverts 2 patches. The first, e62d0b1335a7959680be5f7e56910067d6f33c1f, partially reverted the deprecation of unescaped literal left brace. The other, 4d68ffa0f7f345bc1ae6751744518ba4bc3859bd, instituted the deprecation of the useless left-characters. Note that, as in the original attempt to deprecate, we don't raise a warning if the left brace is the first character in the pattern. This is because in that position it can't be a metacharacter, so we don't require any disambiguation, and we found that if we did raise an error, there were quite a few places where this occurred.
* Removed NeXT support (Brian Fraser, 2014-06-11, 1 file, -102/+63)
|
* perlapi: Refactor placements, headings of some functions (Karl Williamson, 2014-06-05, 1 file, -18/+35)
| | | | | | | | | | | | | | It is not very user friendly to list functions as "Functions found in file FOO". Better is to group them by purpose, as many were already. I went through and placed the ones that weren't already so grouped into groups. Patches welcome if you have a better classification. I changed the headings of some so that the important distinction was the first word so that they are placed in the file more appropriately. For a couple that I had created myself, I came up with a name that I think is better than the original.
* Fix Windows ctype functions (Karl Williamson, 2014-06-05, 1 file, -1/+31)
| | | | | | | Windows doesn't follow the Posix standard for their functions like isalnum(), isdigit(), etc. This forces compliance by changing the macros that are the interfaces to those functions to be smarter than just calling the raw functions.
* regcomp.c: Skip work that is a no-op (Karl Williamson, 2014-06-01, 1 file, -9/+12)
| | | | | | | | | | | | There are a few characters in the Latin1 range that can be folded to by above-Latin1 characters. Some of these are folded to as part of a single character fold, like KELVIN SIGN folds to 'k'. More are folded to as part of a multi-character fold. Until this commit, there wasn't a quick way to distinguish between the two classes. A couple of places only want the single-character ones. It is more efficient to look for just those than to include the multi-char ones which end up not doing anything. This uses a bit in l1_char_class_tab.h to indicate those characters that are in the desired class.
* Fix definition of toCTRL() for EBCDIC (Karl Williamson, 2014-05-31, 1 file, -5/+7)
| | | | | | The definition was incorrect. When going from control to printable name, we need to go from Latin1 -> Native, so that e.g., a 65 gets turned into the native 'A'
* Revert bootstrapping to non-ASCII platforms (Karl Williamson, 2014-05-31, 1 file, -17/+6)
| | | | | | | | | | | | | | | | | | | This effectively reverts commit 3ded5eb052cdc3f861ec0c0ff85348086d653be0. That commit created a scheme to bootstrap Perl onto a non-ASCII platform, by adding a Configure option that caused the compiled code to bypass a number of normal macro definitions and use slower, generic ones, sufficient to get miniperl to compile on the target architecture. One would then use miniperl to run a few scripts that would re-order certain header files. Using this, one could then recompile all of perl, and once that was done, use it to recompile to use the normal fast macros. This worked, but was a cumbersome process. We now have the infrastructure, since commit 6ff677df5d6fe0f52ca0b6736f8b5a46ac402943, to cross compile on an ASCII platform to EBCDIC, the likely only non-ASCII character set to ever be used. So the new infrastructure will be used in future commits.
* handy.h: Make macro more efficient on EBCDIC (Karl Williamson, 2014-05-31, 1 file, -1/+9)
| | | | The comments say it all
* perlapi: Clarify some instances where NUL is or isn't permitted (Karl Williamson, 2014-05-30, 1 file, -8/+10)
| | | | | | Some functions that take a string/length pair can have embedded NULs and don't have to be NUL terminated; others are the opposite. This adds text to clarify the issue.
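The distinction between the two calling conventions can be illustrated in plain C. The helpers below are hypothetical and are not perlapi functions: count_byte takes a pointer/length pair and treats an embedded NUL as an ordinary byte, while count_byte_cstr relies on NUL termination and therefore stops at the first NUL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of byte b in a buffer given as
 * pointer + length. Embedded NULs are ordinary bytes and no terminating
 * NUL is required. */
static size_t count_byte(const char *s, size_t len, char b)
{
    size_t n = 0, i;

    for (i = 0; i < len; i++)
        if (s[i] == b)
            n++;
    return n;
}

/* Hypothetical NUL-terminated variant: everything after the first NUL
 * is invisible to it. */
static size_t count_byte_cstr(const char *s, char b)
{
    return count_byte(s, strlen(s), b);
}
```

Given the five-byte buffer "ab\0ab", the first helper sees both occurrences of 'a' when passed a length of 5, while the second sees only one.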
* handy.h: Comments, white-space only (Karl Williamson, 2014-05-30, 1 file, -3/+4)
|
* handy.h: Use some common macros for ASCII/EBCDIC (Karl Williamson, 2014-05-30, 1 file, -23/+15)
| | | | | | | It turns out that the EBCDIC definitions can be made the same as the ASCII ones, so this moves the ASCII definitions to the spot where other ones common to the 2 platforms reside, and removes the EBCDIC ones. In other words it combines separate definitions into common ones.
* __APPLE__ is not Apple, use PERL_DARWIN instead. (Jarkko Hietaniemi, 2014-05-29, 1 file, -1/+1)
| | | | See hints/darwin.sh for details.
* Make UINT64_C()/INT64_C() available anytime HAS_QUAD is defined (Tony Cook, 2014-05-29, 1 file, -1/+1)
| | | | | Prevent the failure for 32-bit builds on C89 compilers introduced in f4e3fd268af3.
* UINT64_C/INT64_C logic shuffling. (Jarkko Hietaniemi, 2014-05-28, 1 file, -23/+40)
| | | | | | | | (1) Prefer the native int/long over long long (not in C89!) or __int64. (2) Define them only if necessary; they might be defined in <stdint.h> by C99. (3) However, note the C99: they might not be available in strict C89. (4) In OS X they are defined with ULL/LL, which will not be to the liking of C89 pedantic gcc.
* Use the C_ARRAY_LENGTH. (Jarkko Hietaniemi, 2014-05-28, 1 file, -1/+6)
| | | | | | | | | | | | Use the C_ARRAY_LENGTH instead of sizeof(c_array)/sizeof(c_array[0]) or sizeof(c_array)/sizeof(type_of_element_in_c_array), and C_ARRAY_END for c_array + C_ARRAY_LENGTH(c_array). While doing this, found a potential off-by-one error in sv.c:Perl_sv_magic: "how > C_ARRAY_LENGTH(PL_magic_data)" should probably have been "how >= C_ARRAY_LENGTH(PL_magic_data)". No tests fail, but this seems to be more of an internal sanity check.
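A small sketch of the two macros as the message describes them, together with the kind of bounds test where the off-by-one arises. The array demo_table and the helper functions are hypothetical; only the macro expansions follow the wording above.

```c
#include <assert.h>
#include <stddef.h>

#define C_ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))
#define C_ARRAY_END(a)    ((a) + C_ARRAY_LENGTH(a))

/* Hypothetical table standing in for something like PL_magic_data. */
static const int demo_table[4] = { 10, 20, 30, 40 };

/* Valid indices are 0 .. length-1, so the out-of-range test must use >=.
 * Writing idx > C_ARRAY_LENGTH(...) would wrongly accept idx == length. */
static int index_out_of_range(size_t idx)
{
    return idx >= C_ARRAY_LENGTH(demo_table);
}

static int table_get(size_t idx)
{
    assert(!index_out_of_range(idx));   /* internal sanity check */
    return demo_table[idx];
}

/* C_ARRAY_END gives the usual one-past-the-end pointer for iteration. */
static int table_sum(void)
{
    const int *p;
    int sum = 0;

    for (p = demo_table; p < C_ARRAY_END(demo_table); p++)
        sum += *p;
    return sum;
}
```

Note that both macros only work on true arrays; applied to a pointer they silently compute the wrong length.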
* Fix comments and pod that mention 5.20 erroneously (Karl Williamson, 2014-04-01, 1 file, -2/+2)
| | | | | | In certain places in the documentation, "5.20" is no longer applicable. Also, a message referred to in perldiag got reworded, but our checks did not catch that perldiag should have been updated.
* sprinkle LIKELY() on pp_hot.c scope.c and some *.h (David Mitchell, 2014-03-12, 1 file, -2/+2)
| | | | | | | | I've gone through pp_hot.c and scope.c and added LIKELY() or UNLIKELY() to all conditionals where I understand the code well enough to know that a particular branch is or isn't likely to be taken very often. I also processed some of the .h files which contain commonly used macros.
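Branch-prediction hints of this kind are commonly built on the GCC/Clang builtin __builtin_expect. The sketch below shows one plausible shape, with a no-op fallback for other compilers; it is an assumption about the general technique, not a copy of Perl's definitions, and safe_div is a hypothetical example function.

```c
/* Sketch: hint macros with a portable fallback. */
#if defined(__GNUC__)
#  define LIKELY(cond)   __builtin_expect(!!(cond), 1)
#  define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define LIKELY(cond)   (!!(cond))
#  define UNLIKELY(cond) (!!(cond))
#endif

/* Hypothetical hot-path function: the error branch is marked as rare so
 * the compiler can lay out the common case as the fall-through path. */
static int safe_div(int a, int b, int *out)
{
    if (UNLIKELY(b == 0))
        return -1;
    *out = a / b;
    return 0;
}
```

The hints change only code layout, never the result, so a wrong guess costs speed but not correctness.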
* handy.h Special case toCTRL('?') for EBCDIC (Karl Williamson, 2014-02-05, 1 file, -5/+17)
| | | | | | | | | | | | | | There is no change for ASCII platforms. For EBCDIC ones, toCTRL('?') and its inverse are special cased to map to/from the APC control character, which is the outlier control on these platforms. The reason to special case this is that otherwise toCTRL('?') would map to a graphic character, not a control. By outlier, I mean it is the one control not in the single block where all the other controls are placed. Further, it corresponds on two of the platforms with 0xFF, which would be an EBCDIC rub-out character corresponding to an ASCII rub-out (or DEL) 0x7F, which is what toCTRL('?') maps to on ASCII. This is an outlier control on ASCII, not being a member of the C0 nor C1 controls. Hence this makes '?' mean the outlier control on both platforms.
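On ASCII the mapping referred to above can be sketched as "uppercase, then flip bit 6". TO_UPPER_ASCII and TO_CTRL are hypothetical names used only for this illustration; the EBCDIC special case is not shown.

```c
/* ASCII-only sketch; TO_CTRL is a hypothetical stand-in for toCTRL(). */
#define TO_UPPER_ASCII(c) \
    (((c) >= 'a' && (c) <= 'z') ? (c) - ('a' - 'A') : (c))

/* XOR with 64 maps 'A' to 0x01, '[' to ESC (0x1B) and '?' to DEL (0x7F).
 * Because XOR is its own inverse, the same macro maps a control back to
 * its printable name. */
#define TO_CTRL(c) (TO_UPPER_ASCII(c) ^ 64)
```

DEL is the only result here that lies outside the C0 block 0x00..0x1F, which is the sense in which '?' names the outlier control.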
* Fix [[:blank:]] handling when no isblank() on platform (Karl Williamson, 2014-02-03, 1 file, -2/+2)
| | | | | | | | | | | | | | | | | | isblank() is a C99 construct. Perl tries to handle the use of this on C89 platforms by using the standard hard-coded definition. However, this code was not updated to account for UTF-8 locales when handling for those was recently added (31f05a37c), since in a UTF-8 locale the no-break space is also considered to be a blank. This commit fixes that. Previously regcomp.c generated the hard-coded definitions when there was no isblank(), using #ifdef'd code. That special handling was removed, and [:blank:] is always treated just like any other POSIX class. The specialness of it is hidden entirely in handy.h. This simplifies the regcomp.c code slightly. I considered removing the special handling for isascii(), also a C99 construct, in the name of simplicity over the slight speed that would be lost. But the special handling is only a single line in two places, so I left it in.
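A fallback of the sort described might look like the following sketch. The macro name and the in_utf8_locale flag are hypothetical; the flag stands in for whatever state the real code consults to know that the current LC_CTYPE locale is UTF-8. The hard-coded part is the C-locale definition of a blank (space or tab), and 0xA0 is the code point of NO-BREAK SPACE.

```c
/* Hypothetical fallback for platforms without isblank(). */
#define IS_BLANK_FALLBACK(c, in_utf8_locale) \
    ((c) == ' ' || (c) == '\t' || ((in_utf8_locale) && (c) == 0xA0))
```

Keeping this inside one macro is what lets callers treat [:blank:] like any other POSIX class.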
* handy.h: Add a cBOOL() (Karl Williamson, 2014-01-30, 1 file, -1/+1)
| | | | | | | isascii() is used as a fallback for isASCII(). This would be on an unusual platform or under unusual circumstances. isascii() may return values besides 0 and 1 which can cause things that are expecting a bool to fail.
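To see why normalizing matters, consider the sketch below. The cBOOL definition is written along the lines one would expect for such a macro, and raw_is_ascii is a hypothetical classifier that, as the C standard permits for the is*() family, reports "true" with some nonzero value other than 1.

```c
#include <stdbool.h>

#define cBOOL(x) ((x) ? (bool)1 : (bool)0)

/* Hypothetical classifier returning an arbitrary nonzero "true" value. */
static int raw_is_ascii(int c)
{
    return (c >= 0 && c < 128) ? 0x100 : 0;
}

/* Storing the raw result in a narrow type silently loses the answer... */
static unsigned char narrow_flag(int c)
{
    return (unsigned char)raw_is_ascii(c);
}

/* ...while normalizing first always yields exactly 0 or 1. */
static unsigned char safe_flag(int c)
{
    return (unsigned char)cBOOL(raw_is_ascii(c));
}
```

Here narrow_flag('A') comes out as 0 even though 'A' is ASCII, because 0x100 does not fit in eight bits; safe_flag('A') is 1.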
* PATCH: [perl #121109] locales failing (Karl Williamson, 2014-01-28, 1 file, -6/+4)
| | | | | | | This was due to a logic error in toFOLD_LC() introduced in 31f05a37c4e9c37a7263491f2fc0237d836e1a80. It affected only the code point at 0xB5 and shows up only in locales in which the character at that code point is an uppercase letter.
* Work properly under UTF-8 LC_CTYPE locales (Karl Williamson, 2014-01-27, 1 file, -13/+46)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This large (sorry, I couldn't figure out how to meaningfully split it up) commit causes Perl to fully support LC_CTYPE operations (case changing, character classification) in UTF-8 locales. As a side effect it resolves [perl #56820]. The basics are easy, but there were a lot of details, and one troublesome edge case discussed below. What essentially happens is that when the locale is changed to a UTF-8 one, a global variable is set TRUE (FALSE when changed to a non-UTF-8 locale). Within the scope of 'use locale', this variable is checked, and if TRUE, the code that Perl uses for non-locale behavior is used instead of the code for locale behavior. Since Perl's internal representation is UTF-8, we get UTF-8 behavior for a UTF-8 locale. More work had to be done for regular expressions. There are three cases. 1) The character classes \w, [[:punct:]] needed no extra work, as the changes fall out from the base work. 2) Strings that are to be matched case-insensitively. These form EXACTFL regops (nodes). Notice that if such a string contains only characters above-Latin1 that match only themselves, that the node can be downgraded to an EXACT-only node, which presents better optimization possibilities, as we now have a fixed string known at compile time to be required to be in the target string to match. Similarly if all characters in the string match only other above-Latin1 characters case-insensitively, the node can be downgraded to a regular EXACTFU node (match, folding, using Unicode, not locale, rules). The code changes for this could be done without accepting UTF-8 locales fully, but there were edge cases which needed to be handled differently if I stopped there, so I continued on. 
In an EXACTFL node, all such characters are now folded at compile time (just as before this commit), while the other characters whose folds are locale-dependent are left unfolded. This means that they have to be folded at execution time based on the locale in effect at the moment. Again, this isn't a change from before. The difference is that now some of the folds that need to be done at execution time (in regexec) are potentially multi-char. Some of the code in regexec was trivial to extend to account for this because of existing infrastructure, but the part dealing with regex quantifiers had to have more work. Also the code that joins EXACTish nodes together had to be expanded to account for the possibility of multi-character folds within locale handling. This was fairly easy, because it already has infrastructure to handle these under somewhat different circumstances. 3) In bracketed character classes, represented by ANYOF nodes, a new inversion list was created giving the characters that should be matched by this node when the runtime locale is UTF-8. The list is ignored except under that circumstance. To do this, I created a new ANYOF type which has an extra SV for the inversion list. The edge case that caused the most difficulty is folding involving the MICRO SIGN, U+00B5. It folds to the GREEK SMALL LETTER MU, as does the GREEK CAPITAL LETTER MU. The MICRO SIGN is the only 0-255 range character that folds to outside that range. The issue is that it doesn't naturally fall out that it will match the CAP MU. If we let the CAP MU fold to the small mu at compile time (which it can because both are above-Latin1 and so the fold is the same no matter what locale is in effect), it could appear that the regnode can be downgraded away from EXACTFL to EXACTFU, but doing so would cause the MICRO SIGN to not case-insensitively match the CAP MU. This could be special cased in regcomp and regexec, but I wanted to avoid that.
Instead the mktables tables are set up to include the CAP MU as a character whose presence forbids the downgrading, so the special casing is in mktables, and not in the C code.
* handy.h: Move the _LC_CAST declaration. (Brian Fraser, 2014-01-23, 1 file, -2/+2)
| | | | | Previously it was left so that some systems, like Android, didn't get this, which broke the build.
* handy.h: Express locale macros using common base macros (Karl Williamson, 2014-01-22, 1 file, -34/+85)
| | | | | | | | | This extracts out the code of looking up POSIX classes in locales to use base macros common to all of them. It does this for the NeXT-only code as well as the typical compilations. This is in preparation for changing the behavior. Certain things look weird as they are aligned, etc. as part of that preparation.
* handy.h: Factor out common code (Karl Williamson, 2014-01-22, 1 file, -18/+14)
| | | | | | | | | | | It turns out that the definitions for isASCII_LC and is_BLANK_LC end up being the same for all three possible #if platform states, so can just have them once instead of three times. It is unlikely that the && ! defined(USE_NEXT_CTYPE) is necessary, because HAS_ISASCII likely won't be defined, but this makes sure that this doesn't change the previous behavior.
* handy.h: White-space, comments, pod nit only (Karl Williamson, 2014-01-22, 1 file, -37/+51)
|
* handy.h: Add two macros (Karl Williamson, 2014-01-01, 1 file, -4/+13)
| | | | | | | | handy.h contains a macro that reads a hex digit and returns its value, with fewer branches than a naive implementation would use. This commit just copies and modifies it to create two macros for 1) just converting the hex value, without advancing the input; and 2) doing the same for an octal value.
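One well-known way to get a hex digit's value with a single test, which may or may not match the handy.h macros exactly, uses the fact that on ASCII the low nibble of 'a'..'f' and 'A'..'F' is 1..6, so adding 9 before masking yields 10..15. All macro names in this sketch are hypothetical, it assumes ASCII, and the arguments are evaluated more than once.

```c
#include <assert.h>

/* ASCII-only sketch; names are hypothetical. */
#define IS_DIGIT(c)  ((c) >= '0' && (c) <= '9')
#define IS_XDIGIT(c) \
    (IS_DIGIT(c) || (((c) | 0x20) >= 'a' && ((c) | 0x20) <= 'f'))
#define IS_OCTAL(c)  ((c) >= '0' && (c) <= '7')

/* Value of a hex digit: one test instead of three range checks.
 * The input must already be known to be a hex digit, hence the assert. */
#define XDIGIT_VAL(c) \
    (assert(IS_XDIGIT(c)), 0xf & (IS_DIGIT(c) ? (c) : (c) + 9))

/* Value of an octal digit: the low three bits. */
#define OCTAL_VAL(c) (assert(IS_OCTAL(c)), 7 & (c))

/* Reading variant: advances the pointer s, then converts the digit it
 * just stepped over. */
#define READ_XDIGIT_VAL(s) ((s)++, XDIGIT_VAL((s)[-1]))
```

For example, 'F' is 0x46; adding 9 gives 0x4F, and masking with 0xf leaves 15.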
* handy.h: Add debugging assertion (Karl Williamson, 2014-01-01, 1 file, -1/+3)
| | | | | This macro requires the input to be a hex digit, without testing. It is prudent to assert that under DEBUGGING.
* Move a macro from utf8.h to handy.h for wider use. (Karl Williamson, 2014-01-01, 1 file, -0/+10)
| | | | Future commits will want this available outside utf8.h