path: root/handy.h
Commit message (Author, Age, Files, Lines)
* Tiny comment typo fix in handy.h (Father Chrysostomos, 2011-06-24, 1, -1/+1)
|
* handy.h: Link moved to perlhacktips (Karl Williamson, 2011-05-18, 1, -1/+1)
|
* handy.h: isIDFIRST_utf8() changed to use XIDStart (Karl Williamson, 2011-02-17, 1, -10/+7)
| | | | | | | | | | Previously this used a home-grown definition of an identifier start, stemming from a bug in some early Unicode versions. This led to some problems, fixed by #74022. But the home-grown solution did not track Unicode, and allowed characters such as marks to begin words when they shouldn't. This change brings this macro into compliance with Unicode going forward.
* Add comments (Karl Williamson, 2011-02-14, 1, -0/+2)
|
* Move the non-generated parts of l1_char_class_tab.h out into handy.h (Nicholas Clark, 2011-01-24, 1, -1/+45)
| | | | | Now l1_char_class_tab.h contains only the output of Porting/mk_PL_charclass.pl
* Move metaconfig control comments into their own files (H.Merijn Brand, 2010-12-21, 1, -14/+0)
|
* Add sin6_scope_id probe (LeoNerd) (H.Merijn Brand, 2010-12-20, 1, -1/+1)
|
* Add probe for sa_len availability in sockaddr struct (H.Merijn Brand, 2010-12-10, 1, -0/+1)
| | | | Sorry for the huge config_h.SH re-order. Don't know (yet) what caused that
* regexec.c: Latin1 chars can fold match UTF8_ALL (Karl Williamson, 2010-11-28, 1, -1/+1)
| | | | | | | | | | | Some ANYOF regnodes have the ANYOF_UNICODE_ALL flag set, which means they match any non-Latin1 character. Under /i (in a utf8 target string), these should also match any ASCII or Latin1 character that folds outside the Latin1 range. As part of this patch, an internal-only macro is renamed to account for its new use in regexec.c. The cumbersome name is to ward off others from using it until the final semantics have been settled on.
* handy.h: New #define to use new bit (Karl Williamson, 2010-11-22, 1, -0/+1)
| | | | | | | | | This creates a new macro for use by regcomp to test the new bit regarding non-ascii folds. Because the semantics may change in the future to deal with multi-char folds, the name of the macro is unwieldy and specific enough that no one should be tempted to use it.
* [perl #74022] Parser hangs on some Unicode characters (Father Chrysostomos, 2010-11-14, 1, -4/+10)
| | | | | | | | | | | | | | | | | | | This changes the definition of isIDFIRST_utf8 to avoid any characters that would put the parser in a loop. isIDFIRST_utf8 is used all over the place in toke.c. Almost every instance is followed by a call to S_scan_word. S_scan_word is only called when it is known that there is a word to scan. What was happening was that isIDFIRST_utf8 would accept a character, but S_scan_word in toke.c would then reject it, as it was using is_utf8_alnum, resulting in an infinite number of zero-length identifiers. Another possible solution was to change S_scan_word to use isIDFIRST_utf8 or similar, but that has back-compatibility problems, as it stops q·foo· from being a string and makes it an identifier instead.
* systematically provide pv/pvn/pvs/sv quartets (Zefram, 2010-09-28, 1, -3/+44)
| | | | | Anywhere an API function takes a string in pvn form, ensure that there are corresponding pv, pvs, and sv APIs.
* handy.h: Fix so x2p compiles (Karl Williamson, 2010-09-25, 1, -19/+54)
| | | | | | | | | | | | | The recent series of commits on handy.h causes x2p to not compile. These commits had some differences from what I submitted, in that they moved the new table to a new header file instead of the submitted perl.h. Unfortunately, this bypasses code in perl.h that figures out about duplicate definitions, and externs, and so fails on programs that include handy.h but not perl.h. This patch changes things so that the table lookup is not used unless perl.h is included. This is essentially my original patch, but adding an #include of the new header file.
* handy.h: Add isFOO_L1() macros, using table lookup (Karl Williamson, 2010-09-25, 1, -37/+99)
| | | | | | | | | | | | | | | | | | | | | | This patch adds *_L1() macros for character class lookup, using table lookup for O(1) performance. These force a Latin-1 interpretation on ASCII platforms. There were a couple existing macros that had the suffix U for Unicode semantics. I thought that those names might be confusing, so settled on L1 as the least bad name. The older names are kept as synonyms for backward compatibility. The problem with those names is that these are actually macros, not functions, and hence can be called with any int, including any Unicode code point. The U suffix might be mistaken for indicating they are more general purpose, whereas they are really only valid for the latin1 subset of Unicode (including the EBCDIC isomorphs). When called with something outside the latin1 range, they will return false. This patch necessitated rearranging a few things in the file. I added documentation for several more macros, and intend to document the rest. (This commit was modified from its original form by Steffen.)
* handy.h: Make isWORDCHAR() primary documentation (Karl Williamson, 2010-09-25, 1, -5/+8)
| | | | | This macro is clearer as to intent over isALNUM, and isn't confusable with isALNUMC. So document it primarily.
* handy.h: Slightly change the pod (Karl Williamson, 2010-09-25, 1, -8/+8)
|
* handy.h: alphabetize pod entries (Karl Williamson, 2010-09-25, 1, -8/+8)
| | | | | There are a number of macros missing from the documentation. This helps me figure out which ones.
* handy.h: Change isFOO_A() to be O(1) performance (Karl Williamson, 2010-09-25, 1, -32/+17)
| | | | | | | | | | | | | | | | This patch changes the macros whose names end in _A to use table lookup except for the one (isASCII) which always has only one comparison. The table is in l1_char_class_tab.h. The advantage of this is speed. It replaces some fairly complicated expressions with an O(1) look-up and a mask. It uses the FITS_IN_8_BITS() macro to guarantee that the table bounds are not exceeded. For legal inputs that are byte size, the optimizer should get rid of this macro leaving only the lookup and mask. (This commit was changed from its original form by Steffen.)
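The lookup-and-mask idea in this commit message can be illustrated with a minimal sketch. It is not the content of l1_char_class_tab.h: every name prefixed demo_ or DEMO_ is hypothetical, the table covers only ASCII digits and letters, and it is filled at run time for brevity rather than generated.

```c
/* Hypothetical class bits; the real ones are generated into
 * l1_char_class_tab.h and are not reproduced here. */
#define DEMO_DIGIT_A (1u << 0)
#define DEMO_ALPHA_A (1u << 1)

static unsigned char demo_class_tab[256];

/* Fill the table once before using the macros below. */
static void demo_init_tab(void)
{
    const char *p;
    for (p = "0123456789"; *p; p++)
        demo_class_tab[(unsigned char)*p] |= DEMO_DIGIT_A;
    for (p = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"; *p; p++)
        demo_class_tab[(unsigned char)*p] |= DEMO_ALPHA_A;
}

/* Stand-in for FITS_IN_8_BITS(): always true for byte-sized arguments,
 * otherwise true only when no bits above the low eight are set. */
#define DEMO_FITS_IN_8_BITS(c) \
    (sizeof(c) == 1 || ((unsigned long)(c) & 0xFFul) == (unsigned long)(c))

/* One guarded table lookup plus one mask per classification. */
#define demo_isDIGIT_A(c) \
    (DEMO_FITS_IN_8_BITS(c) \
     && (demo_class_tab[(unsigned char)(c)] & DEMO_DIGIT_A) != 0)
#define demo_isALPHA_A(c) \
    (DEMO_FITS_IN_8_BITS(c) \
     && (demo_class_tab[(unsigned char)(c)] & DEMO_ALPHA_A) != 0)
```

Without the guard, a value such as 0x137 would index the table as 0x37 ('7' in ASCII) and be misclassified as a digit. Like most such macros, these evaluate their argument more than once, so arguments with side effects should be avoided.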
* handy.h: EBCDIC should use native isalpha() (Karl Williamson, 2010-09-25, 1, -1/+2)
|
* handy.h: Add isFOO_A() macros for ASCII range matches (Karl Williamson, 2010-09-25, 1, -26/+64)
| | | | These macros return true only if the parameter is an ASCII character.
* handy.h: should use EBCDIC libc isdigit() (Karl Williamson, 2010-09-25, 1, -1/+2)
| | | | It is better optimized and suitable for the purpose.
* handy.h: move macro in file (Karl Williamson, 2010-09-25, 1, -1/+2)
|
* Subject: handy.h: Add isWORDCHAR() for clarity (Karl Williamson, 2010-09-25, 1, -3/+4)
| | | | | | | | | | | | | | | | | The name isALNUM() is problematic, as it is very close to isALNUMC(), and doesn't mean exactly what most people might think. I presume the C in isALNUMC stands for C language or libc, but am not sure. Others don't know either. But in any event, isALNUM is different from the C isalnum(), in that it matches the Perl concept of \w, which differs from the C definition in exactly one place. Perl includes the underscore character, '_'. So, I'm adding a isWORDCHAR() macro for future code to use to be more clear. I thought also about isWORD(), but I think confusion can arise from thinking that means a whole word. isWORDCHAR_L1() matches in the Latin1 range, to be equivalent to isALNUMU(). The motivation for using L1 instead of U will be explained in a commit message for the other L1 macros that are to be added.
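The one-character difference between C's isalnum() and Perl's \w can be shown with a small sketch. The macro name is hypothetical and the sketch is ASCII-only, assuming an ASCII platform; it is not the handy.h definition.

```c
#include <ctype.h>

/* Hypothetical ASCII-only stand-in for isWORDCHAR(): C's isalnum() plus
 * the underscore. The range check keeps the argument passed to isalnum()
 * within 0..127, which is always a valid input for it. */
#define demo_isWORDCHAR_A(c) \
    ((unsigned)(c) < 128 && (isalnum((int)(c)) || (c) == '_'))
```

With this definition '_' is a word character even though isalnum('_') is false, which is the distinction the name isWORDCHAR() is meant to make obvious.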
* Add a comment; clarify another (Karl Williamson, 2010-09-25, 1, -2/+2)
|
* Indent a comment better (Karl Williamson, 2010-09-25, 1, -1/+1)
|
* Subject: handy.h: Reorder #defines alphabetically (Karl Williamson, 2010-09-25, 1, -12/+13)
| | | | | The only change here is that I sorted these #defines within their groups, to make it much easier to follow what's going on.
* handy.h: isSPACE() is wrong for EBCDIC (Karl Williamson, 2010-09-25, 1, -2/+3)
| | | | It didn't include the Latin1 space components.
* handy.h: EBCDIC isBLANK() is wrong (Karl Williamson, 2010-09-25, 1, -1/+2)
| | | | It doesn't include NBSP
* handy.h: isPSXSPC() is wrong for EBCDIC (Karl Williamson, 2010-09-25, 1, -1/+2)
| | | | | | The macro was using the ASCII definition, which includes neither NEL nor NBSP. But libc contains the correct definition, which is usable on EBCDIC since we don't worry about locales there.
* Subject: handy.h: Move defn's outside #ifndef EBCDIC (Karl Williamson, 2010-09-25, 1, -15/+15)
| | | | | | Commit 4125141464884619e852c7b0986a51eba8fe1636 improperly got rid of EBCDIC handling, as it combined the ASCII and EBCDIC versions, but left the result in the ASCII-only branch. Just move to the common code.
* Rename isALNUM_L1 to isWORDCHAR_L1 (Karl Williamson, 2010-09-23, 1, -1/+1)
|
* handy.h: Add isALNUM_L1() macro (Karl Williamson, 2010-09-23, 1, -0/+1)
| | | | This is a synonym for isALNUMU
* Subject: handy.h: Add isSPACE_L1 with Unicode semantics (Karl Williamson, 2010-09-23, 1, -0/+4)
|
* handy.h: isASCII() extend to work on > 8 bit values (Karl Williamson, 2010-09-22, 1, -3/+4)
| | | | | | | | | | | | | | Prior to this patch, if isASCII() is called with something like '256', it would return true. For some reason unknown to me, U64 is defined only inside the perl core. However, the equivalent U64TYPE is known everywhere, so in the macro that can be called outside of core, use that instead. The commit log doesn't give a reason for not defining U64 outside of core, and no tests in the suite fail when it is defined outside core. But out of caution, I'm just doing this workaround instead of exposing U64.
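A minimal sketch of why 256 could be reported as ASCII, assuming the earlier macro narrowed its argument to 8 bits before comparing. Both macro names are hypothetical, and uint64_t stands in for U64TYPE; neither line is claimed to be the historical definition.

```c
#include <stdint.h>

/* Narrow cast: anything wider than a byte is truncated before the
 * comparison, so 256 becomes 0 and looks like ASCII. */
#define demo_isASCII_narrow(c) ((uint8_t)(c) < 128)

/* Wide cast: the full value takes part in the comparison, and negative
 * inputs become very large unsigned numbers. */
#define demo_isASCII_wide(c) ((uint64_t)(c) < 128)
```

The wide form accepts only 0 through 127, whatever the width or signedness of the argument.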
* handy.h: Don't use isascii() as not in all libc's (Karl Williamson, 2010-09-22, 1, -2/+1)
| | | | | EBCDIC platforms use isascii(), but it is not in all libc's, so it is better to use our own.
* handy.h: Fix-up documentation (Karl Williamson, 2010-09-22, 1, -18/+25)
| | | | | Previous documentation was wrong for EBCDIC platforms. This fixes that and adds some more explanation.
* handy.h: toUPPER is not a char class fcn (Karl Williamson, 2010-09-22, 1, -0/+2)
| | | | | | toUPPER() and toLOWER() were grouped with the character class functions (in perlapi), to which they are related, but aren't the same. Create a new heading for these.
* Fix /[\8]/ to not match NULL; give correct warning (Karl Williamson, 2010-09-16, 1, -0/+5)
| | | | | | | | 8 and 9 are now treated as alphas in parsing, as opposed to illegal octals. This also adds tests to verify that 1-3 digits work in char classes. I created an isOCTAL macro in case that lookup gets moved to a bit field, as I plan to do later, for speed.
* handy.h: Add bounds checking to case change arrays (Karl Williamson, 2010-09-13, 1, -7/+13)
| | | | | | | This makes sure that the index into the arrays used to change between lower and upper case will fit into their bounds; returning an error character if not. The check is likely to be optimized out if the index is stored in 8 bits.
* handy.h: Add FITS_IN_8_BITS() macro (Karl Williamson, 2010-09-13, 1, -0/+14)
| | | | | | | This macro is designed to be optimized out if the argument is byte-length, but otherwise to be a bomb-proof way of making sure that the argument occupies only 8 bits or fewer in whatever storage class it is in.
* add lex_stuff_pvs() (Zefram, 2010-08-22, 1, -0/+9)
| | | | New macro lex_stuff_pvs(), wrapping lex_stuff_pvn() for literal strings.
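The general shape of a "pvs" wrapper around a "pvn" function can be sketched as follows. The buffer type and both function names are hypothetical and stand in for the lexer internals; only the literal-string trick is the point.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer, always kept NUL-terminated. */
typedef struct {
    char   data[64];
    size_t used;
} demo_buf;

/* pvn style: pointer plus explicit length. Input is truncated if it
 * does not fit. */
static void demo_stuff_pvn(demo_buf *b, const char *pv, size_t len)
{
    size_t room = sizeof(b->data) - 1 - b->used;
    if (len > room)
        len = room;
    memcpy(b->data + b->used, pv, len);
    b->used += len;
    b->data[b->used] = '\0';
}

/* pvs style: "" s "" only compiles when s is a string literal, and
 * sizeof(s) - 1 is its length without the trailing NUL, computed at
 * compile time. */
#define demo_stuff_pvs(b, s) demo_stuff_pvn((b), "" s "", sizeof(s) - 1)
```

The wrapper spares callers from counting characters by hand and rejects non-literal arguments at compile time, since the adjacent-literal concatenation fails for anything that is not a string literal.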
* handy.h: Note Devel::PPPort has duplicated macros (Karl Williamson, 2010-08-02, 1, -0/+3)
| | | | | | If a bug is found in the handy.h macros, it may be necessary to fix the duplicates in the cpan module. This may require filing a bug report there.
* Add C_ARRAY_END(), returning a pointer to after the last element of an array. (Nicholas Clark, 2010-05-28, 1, -0/+1)
| | | | Refactor the macro append_flags() in dump.c to use it.
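A sketch of how a one-past-the-end macro of this kind is typically built and used. The DEMO_ names and the sample array are hypothetical; the handy.h definition may differ in detail.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define DEMO_C_ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* Pointer to just after the last element, usable as a loop bound. */
#define DEMO_C_ARRAY_END(a) ((a) + DEMO_C_ARRAY_LENGTH(a))

static const int demo_flags[] = { 1, 2, 4, 8 };

/* Walk the array with a pointer until the end marker is reached. */
static int demo_sum_flags(void)
{
    const int *p;
    int total = 0;
    for (p = demo_flags; p < DEMO_C_ARRAY_END(demo_flags); p++)
        total += *p;
    return total;
}
```

Both macros only give the right answer for real arrays. Applied to a pointer, sizeof yields the pointer size and the result is silently wrong.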
* PATCH: Clean up EBCDIC handling of \cX (Karl Williamson, 2010-05-17, 1, -10/+3)
| | | | | | | | The function perl_ebcdic_control() is unnecessary, as the toCTRL macro that calls it can be changed to just map EBCDIC to ASCII first, and then do the normal procedure. This means that EBCDIC and ASCII will no longer diverge. Currently, EBCDIC gives a syntax error for inputs outside its domain, whereas the ASCII version accepts some of them.
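A sketch of the "map to ASCII first, then do the normal procedure" idea. It assumes that the normal procedure is the conventional ASCII mapping (upper-case the character, then flip bit 6), and it uses a hypothetical identity mapping in place of a real EBCDIC-to-ASCII translation, so it is only meaningful on ASCII platforms.

```c
#include <ctype.h>

/* Hypothetical placeholder for the native-to-ASCII mapping. On an ASCII
 * platform this is the identity; an EBCDIC build would translate here. */
#define demo_native_to_ascii(c) (c)

/* Conventional ASCII control mapping: upper-case, then XOR with 64. */
#define demo_toCTRL(c) \
    (toupper((unsigned char)demo_native_to_ascii(c)) ^ 64)
```

With this, \cA and \ca both give 1, \c[ gives 27 (ESC), and \c? gives 127 (DEL). Routing every platform through the same ASCII step is what keeps the two code paths from diverging.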
* Make sure isCNTRL and isASCII work on signed chars (Karl Williamson, 2010-04-26, 1, -2/+7)
| | | | | | Prior to this patch, there was a potential bug in these two macros: if they were called with a signed character outside the ASCII range, it would be negative, and they always returned true for negative values. Casting the parameter to an unsigned type should fix that by having it be interpreted as a number above the ASCII range.
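The signed-char pitfall can be reproduced in a few lines. Both macros are hypothetical illustrations of the before and after shapes, not the definitions from handy.h.

```c
/* Unguarded comparison: a negative signed char compares below 128 and
 * is wrongly reported as ASCII. */
#define demo_isASCII_unsafe(c) ((c) < 128)

/* Casting to unsigned char first maps negative chars to 128..255, so
 * they fall outside the ASCII range as intended. */
#define demo_isASCII_cast(c) ((unsigned char)(c) < 128)
```

For a signed char holding -23 (the byte 0xE9 on common platforms), the unsafe form returns true and the cast form returns false.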
* More defensive definition of memEQs(). (Nicholas Clark, 2010-04-25, 1, -1/+1)
|
* Set the legacy process name with prctl() on assignment to $0 on Linux (Ævar Arnfjörð Bjarmason, 2010-04-15, 1, -2/+1)
| | | | | | | | | | | | | Ever since perl 4.000 we've only set the POSIX process name via argv[0]. Unfortunately on Linux the POSIX name isn't used by utilities like top(1), ps(1) and killall(1). Now when we set C<$0 = "hello"> both C<qx[ps h $$]> (POSIX) and C<qx[ps hc $$]> (legacy) will say "hello", instead of the latter being "perl" as was previously the case. See also the March 9 2010 thread "Why doesn't assignment to $0 on Linux also call prctl()?" on perl5-porters.
* use cBOOL for bool casts (David Mitchell, 2010-04-15, 1, -0/+6)
| | | | | | | | | | | | | bool b = (bool)some_int doesn't necessarily do what you think. In some builds, bool is defined as char, and that cast's behaviour is thus undefined. So this line in mg.c: const bool was_temp = (bool)SvTEMP(sv); was actually setting was_temp to false even when the SVs_TEMP flag was set. Fix this by replacing all the (bool) casts with a new cBOOL() cast macro that (hopefully) does the right thing.
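A sketch of the failure mode and of a cast macro that avoids it. It assumes a build where bool is a byte-sized integer type; unsigned char is used here so the narrowing is well defined. The flag value and all demo_ names are hypothetical, and the real cBOOL() may be written differently.

```c
/* Assumption for the sketch: bool is a plain byte-sized integer. */
typedef unsigned char demo_bool;

/* Hypothetical flag that lives above the low eight bits. */
#define DEMO_FLAG_TEMP 0x0800u

/* Collapse any non-zero value to 1 before narrowing, so high bits
 * cannot be lost. */
#define demo_cBOOL(x) ((x) ? (demo_bool)1 : (demo_bool)0)
```

With flags set to DEMO_FLAG_TEMP, the plain cast (demo_bool)(flags & DEMO_FLAG_TEMP) keeps only the low byte and yields 0, while demo_cBOOL(flags & DEMO_FLAG_TEMP) yields 1.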
* Probe for prctl () and check if PR_SET_NAME is supported (H.Merijn Brand, 2010-04-13, 1, -1/+2)
|
* PATCH: deprecation warnings for unreasonable charnames (Karl Williamson, 2010-02-20, 1, -0/+12)
| | | | | | | | | | | | | | | | | Prior to now just about anything has been legal for a character name in \N{...}. This means that legal code was broken by having \N{3,4} for example mean [^\n]{3,4}. Such code doesn't come from standard charnames, but from legal custom translators. This patch deprecates "unreasonable" names. handy.h is changed by the addition of macros that taken together define the names we deem reasonable, namely alpha beginning with alphanumerics and some punctuations as continuations. toke.c is changed to parse each name and to raise a warning if any problematic characters are found. Some tests and diagnostic documentation are also included.