path: root/handy.h
Commit message (Author, Age, Files, Lines)
* Use full sym name in isIDFIRST_utf8 to fix [perl #100930] (Father Chrysostomos, 2011-10-07, 1 file, -1/+1)
| _is_utf8__perl_idstart is not an API function, so the short _is_utf8__perl_idstart form cannot be used in public macros. The long form (Perl__is_utf8__perl_idstart) must be used.
* Now with comma :( (H.Merijn Brand, 2011-10-06, 1 file, -1/+1)
|
* _A is predefined in some precompiler environments (H.Merijn Brand, 2011-10-06, 1 file, -1/+1)
| On HP-UX 10.20 in the HP C-ANSI-C environment, CAT2(macro, _A) expands to macro01, as _A obviously expands to 01. This fix "breaks" the token
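The expansion problem described above can be reproduced in miniature. This is only a sketch under assumptions: `CAT2_SKETCH` is a hypothetical two-level pasting macro (assumed to resemble how CAT2 is built), and `SUFFIX_A` stands in for the predefined `_A`, since defining `_A` itself would use a reserved identifier.

```c
/* Hypothetical names throughout; not the real CAT2 from Perl's config.h. */

/* Pastes its arguments exactly as written, with no macro expansion. */
#define PASTE_SKETCH(a, b)  a ## b

/* Two-level form: the arguments are macro-expanded before pasting. */
#define CAT2_SKETCH(a, b)   PASTE_SKETCH(a, b)

/* Stands in for an environment that predefines _A as 01. */
#define SUFFIX_A 01

static int name01 = 1;        /* the identifier CAT2_SKETCH(name, SUFFIX_A) produces */
static int nameSUFFIX_A = 2;  /* the identifier PASTE_SKETCH(name, SUFFIX_A) produces */
```

With the two-level macro, the predefined suffix is expanded first, so the pasted name ends in 01 rather than in the suffix as written.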
* handy.h: Reorder tests for speed (Karl Williamson, 2011-10-01, 1 file, -4/+4)
| It's much more likely that a random character will have its ordinal be above the ordinal for '7' than below. So, in the test for whether a character is octal, testing first whether it is <= '7' excludes many more possibilities than testing first whether it is >= '0'. I left the ones for lowercase letters in the same order because, in ASCII anyway, there are more characters below 'a' than above it.
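The reordering can be sketched as below. The macro names are hypothetical, chosen for illustration; they are not the definitions in handy.h.

```c
/* Hypothetical names for illustration; not the actual handy.h macros. */

/* Lower bound first: most characters are >= '0', so the second
 * comparison usually has to be evaluated as well. */
#define IS_OCTAL_LOW_FIRST_SKETCH(c)   ((c) >= '0' && (c) <= '7')

/* Upper bound first: most characters are above '7', so && short-circuits
 * after a single comparison. */
#define IS_OCTAL_HIGH_FIRST_SKETCH(c)  ((c) <= '7' && (c) >= '0')
```

Both forms give the same answer for every input; only the expected number of comparisons differs.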
* handy.h: Add macro (Karl Williamson, 2011-10-01, 1 file, -0/+4)
|
* handy.h Fix isOCTAL_A macro (Karl Williamson, 2011-10-01, 1 file, -1/+1)
| This had the incorrect definition, allowing 8 and 9, for programs that don't include perl.h. Likely no one who uses this recently added macro does so without also including perl.h.
* handy.h: Add comments, pod change (Karl Williamson, 2011-10-01, 1 file, -2/+7)
|
* handy.h: Improve definition of FITS_IN_8_BITS (Karl Williamson, 2011-10-01, 1 file, -4/+2)
| Unoptimized, the new definition takes significantly fewer machine instructions than the old one.
* handy.h: Change '(foo) ? bar : 0' to 'foo && bar' (Karl Williamson, 2011-10-01, 1 file, -3/+3)
| This is clearer, and leads to better unoptimized code at least. 'bar' is a boolean.
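A minimal sketch of the two forms, using hypothetical macro names rather than the ones changed in this commit. The rewrite is safe only because the second operand is already a boolean; `&&` would collapse any other nonzero value to 1.

```c
/* Hypothetical helper: true for values in the ASCII range. */
#define IS_ASCII_BYTE_SKETCH(c)  ((unsigned)(c) < 128)

/* Old style: a conditional expression that yields 0 when the guard fails. */
#define IS_LOWER_TERNARY_SKETCH(c) \
    (IS_ASCII_BYTE_SKETCH(c) ? ((c) >= 'a' && (c) <= 'z') : 0)

/* New style: the guard and the boolean test joined with &&. */
#define IS_LOWER_AND_SKETCH(c) \
    (IS_ASCII_BYTE_SKETCH(c) && ((c) >= 'a' && (c) <= 'z'))
```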
* handy.h: Speed up isIDFIRST_utf8() (Karl Williamson, 2011-10-01, 1 file, -1/+1)
| This now takes advantage of the new table that mktables generates to find out if a character is a legal start character in Perl's definition. Previously, it had to be looked up in two tables.
* Comment-only nits (Karl Williamson, 2011-10-01, 1 file, -3/+4)
|
* handy.h: Add missing isASCII_L1 macro (Karl Williamson, 2011-10-01, 1 file, -0/+1)
| This macro is in the pod, but never got defined.
* handy.h: Don't call _utf8 fcns if Latin1 (Karl Williamson, 2011-10-01, 1 file, -7/+19)
| This patch avoids the overhead of calling, e.g., is_utf8_alpha() on Latin1 inputs. The result is known to Perl's core, and this can avoid a swash load.
* handy.h: Don't call _utf8 fcns if ASCII (Karl Williamson, 2011-10-01, 1 file, -17/+31)
| This patch avoids the overhead of calling, e.g., is_utf8_alpha() on ASCII inputs. The result is known to Perl's core, and this can avoid a swash load.
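The fast path can be sketched roughly as follows, assuming an ASCII platform and the C locale. Every name here is hypothetical: `slow_is_alpha_utf8_sketch` is a stub standing in for a function such as is_utf8_alpha(), and the call counter exists only to make the short-circuit visible.

```c
#include <ctype.h>

/* Counts how often the slow path runs; for demonstration only. */
static int slow_path_calls_sketch = 0;

/* Hypothetical stand-in for a function that may need to load a swash.
 * This stub only records that it was called. */
static int slow_is_alpha_utf8_sketch(const unsigned char *p)
{
    (void)p;
    slow_path_calls_sketch++;
    return 0;   /* placeholder; a real function would consult Unicode tables */
}

/* A UTF-8 sequence whose first byte is below 128 is a single ASCII
 * character, so it can be classified inline without a function call. */
#define IS_ALPHA_UTF8_SKETCH(p) \
    (*(p) < 128 ? (isalpha(*(p)) != 0) : slow_is_alpha_utf8_sketch(p))
```

Only input whose first byte is 128 or above ever reaches the function call.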
* handy.h: Don't call _uni fcns if have applicable macro (Karl Williamson, 2011-10-01, 1 file, -12/+23)
| This patch avoids the overhead of calling, e.g., is_uni_alpha() if the result is known to Perl's core. This can avoid a swash load.
* Don't use swash to find cntrls (Karl Williamson, 2011-10-01, 1 file, -1/+2)
| Unicode stability policy guarantees that no code points will ever be added to the control characters beyond those already in the set. All such characters are in the Latin1 range, so the Perl core already knows which ones they are, and there is no need to go out to disk and create a swash for them.
* handy.h: No need to call fcns to compute if ASCII (Karl Williamson, 2011-10-01, 1 file, -2/+3)
| Only the characters whose ordinals are 0-127 are ASCII. This is trivially computed by the macro, so no need to call is_uni_ascii() to do this. Also, since ASCII characters are the same when represented in utf8 or not, the utf8 function call is also superfluous.
* handy.h: Simplify isASCII definition (Karl Williamson, 2011-10-01, 1 file, -1/+6)
| This retains essentially the same definition for EBCDIC platforms, but substitutes a simpler one for ASCII platforms. On my system, the new definition compiles to about half the assembly instructions that the old one did (non-optimized).
| A bomb-proof definition of ASCII is to make sure that the value is unsigned in the largest possible unsigned for the platform so there is no possible loss of information, and then the ord must be < 128.
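One way to read the "bomb-proof" definition, for ASCII platforms only, is sketched below. The macro name is hypothetical, and `uintmax_t` is an assumption standing in for the platform's widest unsigned type; the actual handy.h definition may differ.

```c
#include <stdint.h>

/* Hypothetical macro. Casting to the widest unsigned type first means
 * negative values become very large and wide values are not truncated,
 * so a single comparison is enough. */
#define IS_ASCII_SKETCH(c)  ((uintmax_t)(c) < 128)
```

Casting to a narrower type instead would let a value such as 0x100000041 wrap around to 0x41 and be misreported as ASCII.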
* handy.h: refactor FITS_IN_8_BITS defn (Karl Williamson, 2011-10-01, 1 file, -8/+12)
| This creates a #define for the platform's widest UV, and then uses this in the FITS_IN_8_BITS definition, instead of #ifdef'ing that. This will be useful in future commits.
* handy.h: clarify, typos in comment (Karl Williamson, 2011-10-01, 1 file, -9/+10)
|
* Probe for <stdbool.h>, and if found use it in handy.h (Nicholas Clark, 2011-09-16, 1 file, -4/+7)
| This means that the core uses the compiler's bool type if one exists. This avoids potential problems of clashes between perl's own implementation of bool and the compiler's bool type, which otherwise occur when one attempts to include headers which in turn include <stdbool.h>.
| Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
* Tiny comment typo fix in handy.h (Father Chrysostomos, 2011-06-24, 1 file, -1/+1)
|
* handy.h: Link moved to perlhacktips (Karl Williamson, 2011-05-18, 1 file, -1/+1)
|
* handy.h: isIDFIRST_utf8() changed to use XIDStart (Karl Williamson, 2011-02-17, 1 file, -10/+7)
| Previously this used a home-grown definition of an identifier start, stemming from a bug in some early Unicode versions. This led to some problems, fixed by #74022. But the home-grown solution did not track Unicode, and allowed characters, like marks, to begin words when they shouldn't. This change brings this macro into compliance with Unicode going forward.
* Add comments (Karl Williamson, 2011-02-14, 1 file, -0/+2)
|
* Move the non-generated parts of l1_char_class_tab.h out into handy.h (Nicholas Clark, 2011-01-24, 1 file, -1/+45)
| Now l1_char_class_tab.h contains only the output of Porting/mk_PL_charclass.pl.
* Move metaconfig control comments into its own files (H.Merijn Brand, 2010-12-21, 1 file, -14/+0)
|
* Add sin6_scope_id probe (LeoNerd) (H.Merijn Brand, 2010-12-20, 1 file, -1/+1)
|
* Add probe for sa_len availability in sockaddr struct (H.Merijn Brand, 2010-12-10, 1 file, -0/+1)
| Sorry for the huge config_h.SH re-order. Don't know (yet) what caused that.
* regexec.c: Latin1 chars can fold match UTF8_ALL (Karl Williamson, 2010-11-28, 1 file, -1/+1)
| Some ANYOF regnodes have the ANYOF_UNICODE_ALL flag set, which means they match any non-Latin1 character. Under /i (in a utf8 target string), these should match any ASCII or Latin1 character that folds outside the Latin1 range.
| As part of this patch, an internal-only macro is renamed to account for its new use in regexec.c. The cumbersome name is to ward off others from using it until the final semantics have been settled on.
* handy.h: New #define to use new bit (Karl Williamson, 2010-11-22, 1 file, -0/+1)
| This creates a new macro for use by regcomp to test the new bit regarding non-ascii folds. Because the semantics may change in the future to deal with multi-char folds, the name of the macro is unwieldy and specific enough that no one should be tempted to use it.
* [perl #74022] Parser hangs on some Unicode characters (Father Chrysostomos, 2010-11-14, 1 file, -4/+10)
| This changes the definition of isIDFIRST_utf8 to avoid any characters that would put the parser in a loop.
| isIDFIRST_utf8 is used all over the place in toke.c. Almost every instance is followed by a call to S_scan_word. S_scan_word is only called when it is known that there is a word to scan.
| What was happening was that isIDFIRST_utf8 would accept a character, but S_scan_word in toke.c would then reject it, as it was using is_utf8_alnum, resulting in an infinite number of zero-length identifiers.
| Another possible solution was to change S_scan_word to use isIDFIRST_utf8 or similar, but that has back-compatibility problems, as it stops q·foo· from being a string and makes it an identifier instead.
* systematically provide pv/pvn/pvs/sv quartets (Zefram, 2010-09-28, 1 file, -3/+44)
| Anywhere an API function takes a string in pvn form, ensure that there are corresponding pv, pvs, and sv APIs.
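The quartet pattern can be illustrated with a toy function. Everything below is hypothetical: `sv_sketch` is a two-field stand-in for Perl's far richer SV, and `count_a_*` is an invented family, not a real API. The sketch only shows how the pv, pvs, and sv forms can all funnel into the pvn form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Perl scalar: just a pointer and a length. */
typedef struct { const char *ptr; size_t len; } sv_sketch;

/* pvn form: pointer plus explicit length; the other three wrap this one. */
static size_t count_a_pvn_sketch(const char *s, size_t len)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++)
        if (s[i] == 'a')
            n++;
    return n;
}

/* pv form: NUL-terminated string, length found with strlen(). */
#define count_a_pv_sketch(s)    count_a_pvn_sketch((s), strlen(s))

/* pvs form: string literal only, length computed at compile time. */
#define count_a_pvs_sketch(lit) count_a_pvn_sketch("" lit "", sizeof(lit) - 1)

/* sv form: takes the scalar and unpacks its pointer and length. */
static size_t count_a_sv_sketch(const sv_sketch *sv)
{
    return count_a_pvn_sketch(sv->ptr, sv->len);
}
```

The `"" lit ""` concatenation in the pvs form makes the compiler reject anything that is not a string literal.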
* handy.h: Fix so x2p compiles (Karl Williamson, 2010-09-25, 1 file, -19/+54)
| The recent series of commits on handy.h causes x2p to not compile. These commits had some differences from what I submitted, in that they moved the new table to a new header file instead of the submitted perl.h. Unfortunately, this bypasses code in perl.h that figures out about duplicate definitions, and externs, and so fails on programs that include handy.h but not perl.h.
| This patch changes things so that the table lookup is not used unless perl.h is included. This is essentially my original patch, but adding an #include of the new header file.
* handy.h: Add isFOO_L1() macros, using table lookup (Karl Williamson, 2010-09-25, 1 file, -37/+99)
| This patch adds *_L1() macros for character class lookup, using table lookup for O(1) performance. These force a Latin-1 interpretation on ASCII platforms.
| There were a couple existing macros that had the suffix U for Unicode semantics. I thought that those names might be confusing, so settled on L1 as the least bad name. The older names are kept as synonyms for backward compatibility. The problem with those names is that these are actually macros, not functions, and hence can be called with any int, including any Unicode code point. The U suffix might be mistaken for indicating they are more general purpose, whereas they are really only valid for the latin1 subset of Unicode (including the EBCDIC isomorphs). When called with something outside the latin1 range, they will return false.
| This patch necessitated rearranging a few things in the file. I added documentation for several more macros, and intend to document the rest.
| (This commit was modified from its original form by Steffen.)
* handy.h: Make isWORDCHAR() primary documentation (Karl Williamson, 2010-09-25, 1 file, -5/+8)
| This macro is clearer as to intent over isALNUM, and isn't confusable with isALNUMC. So document it primarily.
* handy.h: Slightly change the pod (Karl Williamson, 2010-09-25, 1 file, -8/+8)
|
* handy.h: alphabetize pod entries (Karl Williamson, 2010-09-25, 1 file, -8/+8)
| There are a number of macros missing from the documentation. This helps me figure out which ones.
* handy.h: Change isFOO_A() to be O(1) performance (Karl Williamson, 2010-09-25, 1 file, -32/+17)
| This patch changes the macros whose names end in _A to use table lookup except for the one (isASCII) which always has only one comparison. The table is in l1_char_class_tab.h.
| The advantage of this is speed. It replaces some fairly complicated expressions with an O(1) look-up and a mask. It uses the FITS_IN_8_BITS() macro to guarantee that the table bounds are not exceeded. For legal inputs that are byte size, the optimizer should get rid of this macro leaving only the lookup and mask.
| (This commit was changed from its original form by Steffen.)
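The lookup-and-mask scheme might look roughly like this. It is a sketch under assumptions: the table, bit names, and macros are hypothetical and assume an ASCII platform, and the table is filled at run time here purely for brevity, whereas the real one is a generated header.

```c
#include <stdint.h>

/* One bit per character class (hypothetical bit assignments). */
#define CC_DIGIT_SKETCH 0x01u
#define CC_ALPHA_SKETCH 0x02u

/* 256-entry class table; entries not set below stay 0. */
static uint8_t class_tab_sketch[256];

/* Mark the ASCII digits and letters. Must run before the macros are used. */
static void init_class_tab_sketch(void)
{
    int c;
    for (c = '0'; c <= '9'; c++) class_tab_sketch[c] |= CC_DIGIT_SKETCH;
    for (c = 'A'; c <= 'Z'; c++) class_tab_sketch[c] |= CC_ALPHA_SKETCH;
    for (c = 'a'; c <= 'z'; c++) class_tab_sketch[c] |= CC_ALPHA_SKETCH;
}

/* Bounds guard: only values 0..255 may index the table. */
#define FITS_IN_8_BITS_SKETCH(c)  ((uintmax_t)(c) < 256)

/* One guarded lookup and one mask per classification. */
#define IS_DIGIT_A_SKETCH(c) \
    (FITS_IN_8_BITS_SKETCH(c) && (class_tab_sketch[(uint8_t)(c)] & CC_DIGIT_SKETCH) != 0)
#define IS_ALPHA_A_SKETCH(c) \
    (FITS_IN_8_BITS_SKETCH(c) && (class_tab_sketch[(uint8_t)(c)] & CC_ALPHA_SKETCH) != 0)
```

Without the guard, an input such as 0x135 would be truncated to 0x35 ('5') by the index cast and wrongly classified as a digit.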
* handy.h: EBCDIC should use native isalpha() (Karl Williamson, 2010-09-25, 1 file, -1/+2)
|
* handy.h: Add isFOO_A() macros for ASCII range matches (Karl Williamson, 2010-09-25, 1 file, -26/+64)
| These macros return true only if the parameter is an ASCII character.
* handy.h: should use EBCDIC libc isdigit() (Karl Williamson, 2010-09-25, 1 file, -1/+2)
| as it is better optimized and suitable for the purpose.
* handy.h: move macro in file (Karl Williamson, 2010-09-25, 1 file, -1/+2)
|
* Subject: handy.h: Add isWORDCHAR() for clarity (Karl Williamson, 2010-09-25, 1 file, -3/+4)
| The name isALNUM() is problematic, as it is very close to isALNUMC(), and doesn't mean exactly what most people might think. I presume the C in isALNUMC stands for C language or libc, but am not sure. Others don't know either. But in any event, isALNUM is different from the C isalnum(), in that it matches the Perl concept of \w, which differs from the C definition in exactly one place. Perl includes the underscore character, '_'.
| So, I'm adding a isWORDCHAR() macro for future code to use to be more clear. I thought also about isWORD(), but I think confusion can arise from thinking that means a whole word.
| isWORDCHAR_L1() matches in the Latin1 range, to be equivalent to isALNUMU(). The motivation for using L1 instead of U will be explained in a commit message for the other L1 macros that are to be added.
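The one-character difference from C's isalnum() can be shown with a small sketch. The macro name is hypothetical and the behaviour shown assumes the default C locale; this is not the handy.h definition of isWORDCHAR().

```c
#include <ctype.h>

/* Hypothetical macro: C's isalnum() plus the underscore, which is the
 * single place where Perl's \w differs from the C definition. */
#define IS_WORDCHAR_SKETCH(c) \
    (isalnum((unsigned char)(c)) || (c) == '_')
```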
* Add a comment; clarify another (Karl Williamson, 2010-09-25, 1 file, -2/+2)
|
* Indent a comment better (Karl Williamson, 2010-09-25, 1 file, -1/+1)
|
* Subject: handy.h: Reorder #defines alphabetically (Karl Williamson, 2010-09-25, 1 file, -12/+13)
| The only change here is that I sorted these #defines within their groups, to make it much easier to follow what's going on.
* handy.h: isSPACE() is wrong for EBCDIC (Karl Williamson, 2010-09-25, 1 file, -2/+3)
| It didn't include the Latin1 space components.
* handy.h: EBCDIC isBLANK() is wrong (Karl Williamson, 2010-09-25, 1 file, -1/+2)
| It doesn't include NBSP.
* handy.h: isPSXSPC() is wrong for EBCDIC (Karl Williamson, 2010-09-25, 1 file, -1/+2)
| The macro was using the ASCII definition, which includes neither NEL nor NBSP. But libc contains the correct definition, which is usable on EBCDIC since we don't worry about locales there.