path: root/handy.h
Commit message | Author | Age | Files | Lines
* We don't support MS VC++ < 6.0 | Steve Hay | 2012-08-18 | 1 | -1/+1
|
* Remove the UTS port. | Nicholas Clark | 2012-08-17 | 1 | -1/+1
| | | | | | UTS was a mainframe version of System V created by Amdahl, subsequently sold to UTS Global. The port has not been touched since before 5.8.0, and UTS Global is now defunct.
* VC++ has QUADKIND == QUAD_IS___INT64 so we might as well make use of it | Steve Hay | 2012-08-07 | 1 | -6/+6
| - Use I64/UI64 suffixes rather than I64TYPE/U64TYPE casts for INT64_C/UINT64_C, not just when _WIN64 is defined
| - Use UI64 suffix rather than UL for U64_CONST
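The suffix-based approach described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in handy.h: the names `MY_INT64_C` and `MY_UINT64_C` are hypothetical, and the non-MSVC branch (plain `LL`/`ULL` suffixes) is an assumption added so the sketch compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical names; handy.h's own macros are INT64_C and UINT64_C. */
#if defined(_MSC_VER)
#  define MY_INT64_C(c)  c##I64    /* MSVC-specific suffix for __int64 */
#  define MY_UINT64_C(c) c##UI64   /* MSVC-specific unsigned variant */
#else
#  define MY_INT64_C(c)  c##LL     /* assumption: C99 long long elsewhere */
#  define MY_UINT64_C(c) c##ULL
#endif

/* A suffix yields a constant of the right type directly, with no cast. */
static const int64_t  big_signed   = MY_INT64_C(9007199254740993);
static const uint64_t big_unsigned = MY_UINT64_C(18446744073709551615);
```

Pasting a suffix onto the literal keeps the value a true integer constant, whereas a cast depends on the type the unsuffixed literal already has.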
* regcomp.c: Fix multi-char fold bug | Karl Williamson | 2012-08-02 | 1 | -0/+2
| Input text to be matched under /i is placed in EXACTFish nodes. The current limit on such text is 255 bytes per node. Even if we raised that limit, it will always be finite. If the input text is longer than this, it is split across 2 or more nodes. A problem occurs when that split occurs within a potential multi-character fold. For example, if the final character that fits in a node is 'f', and the next character is 'i', it should be matchable by LATIN SMALL LIGATURE FI, but because Perl isn't structured to find multi-char folds that cross node boundaries, we will miss it.
|
| The solution presented here isn't optimal. What we do is try to prevent all EXACTFish nodes from ending in a character that could be at the beginning or middle of a multi-char fold. That prevents the problem. But in actuality, the problem only occurs if the input text is actually a multi-char fold, which happens much less frequently. For example, we try to not end a full node with an 'f', but the problem doesn't actually occur unless the adjacent following node begins with an 'i' (or one of the other characters that 'f' participates in). That is, this patch splits when it doesn't need to.
|
| At the point of execution for this patch, we only know that the final character that fits in the node is that 'f'. The next character remains unparsed, and could be in any number of forms: a literal 'i', or a hex, octal, or named character constant, or it may need to be decoded (from 'use encoding'). So look-ahead is not really viable. Finding out whether a real multi-character fold is involved would therefore have to be done later in the process, when we have full knowledge of the nodes, at the places where join_exact() is now called, and would require inserting one or more new nodes in the middle of existing ones. This solution seems reasonable instead.
|
| It does not yet address named character constants (\N{}), which currently bypass the code added here.
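The splitting rule described in this commit message can be sketched as below. This is a simplified model under stated assumptions, not the regcomp.c code: `node_split_len` and `may_be_non_final_fold_char` are hypothetical names, and the character set is a two-letter stand-in for the table that mktables generates.

```c
#include <stddef.h>

#define NODE_MAX 255  /* per-node byte limit named in the commit message */

/* Illustrative stand-in for the generated table: characters that can sit
 * at the beginning or middle of a multi-char fold.  Only 'f' and 's' are
 * listed here; the real set is larger. */
static int may_be_non_final_fold_char(unsigned char c)
{
    return c == 'f' || c == 'F' || c == 's' || c == 'S';
}

/* Return how many bytes of text[0..len) to place in the next node. */
static size_t node_split_len(const unsigned char *text, size_t len)
{
    size_t n;

    if (len <= NODE_MAX)
        return len;          /* the whole text fits in one node */

    /* The node would be full: back off while it would end in a risky char. */
    for (n = NODE_MAX; n > 0; n--) {
        if (!may_be_non_final_fold_char(text[n - 1]))
            return n;
    }
    return NODE_MAX;         /* no safe split point found; split anyway */
}
```

As the message notes, this splits more often than strictly necessary, because it never looks at the character that would start the following node.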
* mktables: Generate tables for chars that aren't in final fold pos | Karl Williamson | 2012-08-02 | 1 | -1/+2
| | | | | | | | | | This starts with the existing table that mktables generates that lists all the characters in Unicode that occur in multi-character folds, and aren't in the final positions of any such fold. It generates data structures with this information to make it quickly available to code that wants to use it. Future commits will use these tables.
* Remove code for supporting 80286 based systems. | Nicholas Clark | 2012-07-28 | 1 | -4/+0
| | | | | | | | The 80286 was released two years before Perl 1, but the support code was added with Perl 3. The chip hasn't been produced for more than 15 years - even the 80386 hasn't been manufactured since 2007. Most of the other memory model code was removed by commit 5869b1f143426909 in Sep 2000, so support for 16 bit systems is long dead.
* regcomp.h: Use handy.h constants | Karl Williamson | 2012-07-24 | 1 | -4/+7
| | | | | This synchronizes the ANYOF_FOO usages to the isFOO() usages. Future commits will take advantage of this relationship.
* handy.h: Free up bits in PL_charclass[] | Karl Williamson | 2012-07-24 | 1 | -80/+66
| This array is a bit map containing the Posix and similar character classes for the first 256 code points. Prior to this commit many character classes were represented by two bits, one for characters that are in it over the full Latin-1 range, and one for just the ASCII characters that are in it. The number of bits in use was approaching the 32-bit limit available without playing games.
|
| This commit takes advantage of a recent commit that adds a bit to the table for all the ASCII characters, and the fact that the ASCII characters in a character class are a subset of the full Latin1 range. So a character is an ASCII-range character with the given character class iff both the full-range character class bit and the ASCII bit are set. A new internal macro is created to generate code to determine if a character is an ASCII-range character with the given class. It's not clear if the generated code is faster or slower than the full-range version.
|
| The result is that nearly half the bits are freed up, as the ones for the ASCII range are now redundant.
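The two-bit test this commit describes can be sketched as follows. The table, bit numbers and macro names here are hypothetical stand-ins (the real table is PL_charclass[] and is generated), and the initialisation loop assumes an ASCII platform.

```c
#include <stdint.h>

/* Hypothetical bit numbers, stored as shift counts rather than masks. */
#define CC_ASCII 0
#define CC_ALPHA 1
#define CC_DIGIT 2

static uint32_t charclass[256];   /* toy stand-in for PL_charclass[] */

/* Fill in a few entries; the real table is generated, not computed. */
static void init_charclass(void)
{
    int c;
    for (c = 0; c < 128; c++)
        charclass[c] |= UINT32_C(1) << CC_ASCII;
    for (c = 'a'; c <= 'z'; c++) {
        charclass[c] |= UINT32_C(1) << CC_ALPHA;
        charclass[c - 'a' + 'A'] |= UINT32_C(1) << CC_ALPHA;
    }
    for (c = '0'; c <= '9'; c++)
        charclass[c] |= UINT32_C(1) << CC_DIGIT;
    charclass[0xE9] |= UINT32_C(1) << CC_ALPHA;   /* U+00E9: Latin-1 only */
}

/* Full Latin-1 range test: a single class bit. */
#define HAS_CC(c, cc) \
    ((charclass[(uint8_t)(c)] >> (cc)) & 1)

/* ASCII-range test: the class bit and the shared ASCII bit must both be set. */
#define HAS_CC_ASCII(c, cc) \
    ((charclass[(uint8_t)(c)] \
        & ((UINT32_C(1) << (cc)) | (UINT32_C(1) << CC_ASCII))) \
        == ((UINT32_C(1) << (cc)) | (UINT32_C(1) << CC_ASCII)))
```

With one shared ASCII bit, each class needs only its full-range bit, which is where the freed bits come from.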
* handy.h: Add intermediate internal macro | Karl Williamson | 2012-07-24 | 1 | -2/+5
| | | | This macro abstracts an operation, and will make future commits cleaner.
* handy.h: Remove duplicated test | Karl Williamson | 2012-07-24 | 1 | -1/+1
| | | | This test is duplicated in the called macro
* handy.h: White space only | Karl Williamson | 2012-07-24 | 1 | -4/+5
| | | | This moves a #define next to similar ones, and removes some white space
* handy.h: Move bit shifting into base macro | Karl Williamson | 2012-07-24 | 1 | -35/+36
| | | | | | This changes the #defines to be just the shift number, while doing the shifting in the macro that the number is passed to. This will prove useful in future commits
* handy.h: Renumber character class bits | Karl Williamson | 2012-07-24 | 1 | -31/+31
| | | | | | These are renumbered so that the ones that correspond to character classes in regcomp.h are related numerically as well. This will prove useful in future commits.
* handy.h: Reorder some #defines | Karl Williamson | 2012-07-24 | 1 | -21/+22
| | | | | They are now ordered in the same order as the similar #defines in regcomp.h. This will be useful in later commits
* handy.h: l1_charclass.h: Add bit for matching ASCII | Karl Williamson | 2012-07-24 | 1 | -1/+2
| | | | | | | This does not replace the isASCII macro definition, as I think the current one is more efficient than this one provides. But future commits will rely on all the named character classes (e.g., /[[:ascii:]]/) having a bit, and this is the only one missing.
* handy.h: refactor some macros to use a new one in common. | Karl Williamson | 2012-07-24 | 1 | -30/+32
| | | | | This creates a new, unpublished, macro to implement most of the other macros. This macro will be useful in future commits.
* handy.h: Fix broken is_ASCII_utf8() | Karl Williamson | 2012-07-08 | 1 | -1/+1
| | | | Tests to follow in a future commit.
* both INT64_C and UINT64_C should be guarded [perl #76306] | Jesse Luehrs | 2012-07-05 | 1 | -14/+18
|
* handy.h: Fix isBLANK_uni and isBLANK_utf8 | Karl Williamson | 2012-06-29 | 1 | -4/+3
| | | | | | | | | | These macros have never worked outside the Latin1 range, so this extends them to work. There are no tests I could find for things in handy.h, except that many of them are called all over the place during the normal course of events. This commit adds a new file for such testing, containing for now only a few tests for the isBLANK's.
* [perl #113756] fix type of StructCopy in API documentation | Lukas Mai | 2012-06-20 | 1 | -2/+2
| | | | | perlapi currently claims StructCopy takes two structs when it really takes two pointers.
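To illustrate the documentation fix, here is a small sketch of a copy macro that takes two pointers plus a type. `MY_STRUCT_COPY` is a hypothetical name modelled on the described behaviour, and the source-then-destination argument order is an assumption, not something stated in the commit message.

```c
/* Hypothetical macro: copies *src into *dst; both arguments are pointers. */
#define MY_STRUCT_COPY(src, dst, type) (*((type *)(dst)) = *((type *)(src)))

struct point { int x; int y; };

/* Copy through pointers, as the corrected documentation describes. */
static struct point copy_point(struct point *from)
{
    struct point to = { 0, 0 };
    MY_STRUCT_COPY(from, &to, struct point);
    return to;
}
```

Passing the structs themselves instead of their addresses would not compile with a macro of this shape, which is why the distinction matters in the API documentation.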
* handy.h: Add comment | Karl Williamson | 2012-06-17 | 1 | -1/+1
|
* update the editor hints for spaces, not tabs | Ricardo Signes | 2012-05-29 | 1 | -2/+2
| | | | | This updates the editor hints in our files for Emacs and vim to request that tabs be inserted as spaces.
* handy.h: Fix definition of isOCTAL_A() | Karl Williamson | 2012-05-24 | 1 | -1/+1
| | | | | | | Commit c2da0b36ccf7393a329af732fac4153ddf6ab42e changed this macro, and created a syntax error. But it turns out that there were no current calls to it in the Perl core. When I tried adding one, it showed the failure.
* handy.h: New defn of isOCTAL_A() to free up bit | Karl Williamson | 2012-05-22 | 1 | -7/+20
| The new definition is likely slightly faster, as it replaces an array lookup with a mask. Comments are also added, listing the other possible candidates for this treatment, though the speed differential is unclear as they would also add an extra test.
|
| A U32 is used to store the information about the various properties for a character. This frees up one bit of that for future other use.
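One way a mask can replace a table lookup for octal digits is sketched below. The macro name is hypothetical and the exact definition in handy.h may differ; the sketch assumes an ASCII platform, where '0' through '7' are 0x30 through 0x37 and so differ only in their low three bits.

```c
/* Hypothetical name.  Clearing the low three bits of any octal digit
 * leaves exactly '0' (0x30) on an ASCII platform; every other value
 * leaves something else. */
#define IS_OCTAL_MASKED(c) ((((unsigned long)(c)) & ~7UL) == '0')
```

The cast to an unsigned type keeps negative inputs from matching by accident.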
* Use the new utf8 to code point functions | Karl Williamson | 2012-03-19 | 1 | -13/+14
| | | | | These functions should be used in preference to the old ones which can read beyond the end of the input string.
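The hazard mentioned here, reading past the end of the input, is avoided by passing the decoder an explicit end pointer. The following is a generic sketch of that idea, not Perl's implementation: `decode_utf8_bounded` is a hypothetical function, it handles only standard 1- to 4-byte sequences, and it does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s without reading at or beyond end.
 * On success returns the code point and stores the byte count in *len.
 * On malformed or truncated input returns 0 and stores 0 in *len. */
static uint32_t decode_utf8_bounded(const unsigned char *s,
                                    const unsigned char *end,
                                    size_t *len)
{
    size_t need, i;
    uint32_t cp;

    *len = 0;
    if (s >= end)
        return 0;

    if (s[0] < 0x80) {                     /* single byte: ASCII */
        *len = 1;
        return s[0];
    }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; }
    else
        return 0;                          /* stray continuation or invalid */

    if ((size_t)(end - s) < need)
        return 0;                          /* sequence would run past end */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                      /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *len = need;
    return cp;
}
```

A decoder without the `end` argument has to trust the lead byte's claimed length, which is how a truncated buffer leads to an out-of-bounds read.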
* handy.h: Silence Solaris compiler warning | Karl Williamson | 2012-02-19 | 1 | -1/+1
| | | | | Making this an unsigned constant silences the scary and wrong Solaris warnings about integer overflow
* handy.h: New macro for quotemeta | Karl Williamson | 2012-02-15 | 1 | -2/+3
| | | | This tests if a Latin1 character should be quoted.
* Allow [[:blank:]] to work under locale | Karl Williamson | 2012-02-09 | 1 | -1/+11
| | | | | | This takes advantage of the recently added Configure probe, and if the platform has an isblank library function, calls that under locale. This now matches the documentation
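The probe-then-fallback pattern described here can be sketched as below. `IS_BLANK_LC` is a hypothetical macro name, and `HAS_ISBLANK` is assumed to be the symbol the Configure probe defines; the fallback branch is an illustrative choice for platforms without `isblank()`.

```c
#include <ctype.h>

/* HAS_ISBLANK: assumed name of the Configure-generated symbol. */
#ifdef HAS_ISBLANK
   /* Defer to the C library so the current locale is honoured. */
#  define IS_BLANK_LC(c) (isblank((unsigned char)(c)) != 0)
#else
   /* No library function available: fall back to space and tab. */
#  define IS_BLANK_LC(c) ((c) == ' ' || (c) == '\t')
#endif
```

The cast to `unsigned char` matters because the `<ctype.h>` functions have undefined behaviour for negative arguments other than EOF.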
* Use system isascii() when available under locale | Karl Williamson | 2012-02-09 | 1 | -0/+7
| | | | | | | We have code that assumes that ASCII should be locale dependent, but it was missing its final link. This supplies that, and makes the code work as documented. I thought it better to do that than to document yet another exception.
* handy.h: Add comment | Karl Williamson | 2012-02-09 | 1 | -0/+4
|
* Tweak the cBOOL() macro to avoid problems with the AIX compiler. | Nicholas Clark | 2011-11-18 | 1 | -1/+2
| | | | (cherry picked from commit 0cebf65582f924952bfee1472749d442d51e43e6)
* Use full sym name in isIDFIRST_utf8 to fix [perl #100930] | Father Chrysostomos | 2011-10-07 | 1 | -1/+1
| | | | | | _is_utf8__perl_idstart is not an API function, so the short _is_utf8__perl_idstart form cannot be used in public macros. The long form (Perl__is_utf8__perl_idstart) must be used.
* Now with comma :( | H.Merijn Brand | 2011-10-06 | 1 | -1/+1
|
* _A is predefined in some precompiler environments | H.Merijn Brand | 2011-10-06 | 1 | -1/+1
| | | | | | | | | | | | On HP-UX 10.20 in the HP C-ANSI-C environment CAT2(macro, _A) expands to macro01 as _A obviously expands to 01. This fix "breaks" the token
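The failure mode described here follows from how a two-level concatenation macro expands its arguments before pasting. The sketch below reproduces it with a user-defined macro `SUFFIX` standing in for the compiler-predefined `_A` (which is a reserved identifier and so is not defined here); `CAT2_RAW` and the variable names are hypothetical.

```c
#define CAT2_RAW(a, b) a##b            /* pastes the tokens as written */
#define CAT2(a, b)     CAT2_RAW(a, b)  /* expands the arguments first */

#define SUFFIX 01   /* plays the role of the predefined _A */

static int value01     = 1;
static int valueSUFFIX = 2;

/* SUFFIX is expanded to 01 before pasting, so this names value01. */
static int pasted_after_expansion(void) { return CAT2(value, SUFFIX); }

/* No expansion happens here, so this names valueSUFFIX. */
static int pasted_raw(void) { return CAT2_RAW(value, SUFFIX); }
```

Any fix has to ensure that the suffix token reaching the paste operator is not itself a defined macro name.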
* handy.h: Reorder tests for speed | Karl Williamson | 2011-10-01 | 1 | -4/+4
| | | | | | | | | | It's much more likely that a random character will have its ordinal be above the ordinal for '7' than below. In the test for if a character is octal then, testing first if it is <= '7' will exclude many more possibilities than if the first test is if it is >= '0'. I left the ones for lowercase letters in the same order, because, in ASCII, anyway, there are more characters below 'a' than above it.
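The effect of the ordering can be made concrete by counting comparisons over every byte value. The function names below are hypothetical, and the counts depend on the ASCII positions of '0' and '7'; the sketch models short-circuit evaluation rather than measuring real timings.

```c
/* Count comparisons for ((c) >= '0' && (c) <= '7') over all byte values. */
static unsigned comparisons_low_first(void)
{
    unsigned n = 0;
    int c;
    for (c = 0; c < 256; c++) {
        n++;                 /* c >= '0' is always evaluated */
        if (c >= '0')
            n++;             /* c <= '7' only if the first test passed */
    }
    return n;
}

/* Count comparisons for ((c) <= '7' && (c) >= '0') over all byte values. */
static unsigned comparisons_high_first(void)
{
    unsigned n = 0;
    int c;
    for (c = 0; c < 256; c++) {
        n++;                 /* c <= '7' is always evaluated */
        if (c <= '7')
            n++;             /* c >= '0' only if the first test passed */
    }
    return n;
}
```

On an ASCII platform, 208 byte values are at or above '0' but only 56 are at or below '7', so testing the upper bound first rejects far more inputs with a single comparison.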
* handy.h: Add macro | Karl Williamson | 2011-10-01 | 1 | -0/+4
|
* handy.h Fix isOCTAL_A macro | Karl Williamson | 2011-10-01 | 1 | -1/+1
| | | | | | This has the incorrect definition, allowing 8 and 9, for programs that don't include perl.h. Likely no one actually uses this recently added macro who doesn't also include perl.h.
* handy.h: Add comments, pod change | Karl Williamson | 2011-10-01 | 1 | -2/+7
|
* handy.h: Improve definition of FITS_IN_8_BITS | Karl Williamson | 2011-10-01 | 1 | -4/+2
| | | | | Unoptimized, the new definition takes significantly fewer machine instructions than the old one.
* handy.h: Change '(foo) ? bar : 0' to 'foo && bar' | Karl Williamson | 2011-10-01 | 1 | -3/+3
| | | | | This is clearer, and leads to better unoptimized code at least. 'bar' is a boolean
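The equivalence relied on here holds because 'bar' is already a boolean. A small sketch with hypothetical names (`is_vowel`, `FITS_IN_BYTE`, and the two macros are illustrations, not handy.h definitions):

```c
/* Toy lookup table: 1 for lowercase ASCII vowels, 0 otherwise. */
static const unsigned char is_vowel[256] = {
    ['a'] = 1, ['e'] = 1, ['i'] = 1, ['o'] = 1, ['u'] = 1
};

/* Guard: true only if the value can index a 256-entry table. */
#define FITS_IN_BYTE(c) ((((unsigned long)(c)) & ~0xFFUL) == 0)

/* Old style: conditional expression with an explicit 0 branch. */
#define IS_VOWEL_TERNARY(c) \
    (FITS_IN_BYTE(c) ? is_vowel[(unsigned char)(c)] : 0)

/* New style: short-circuit AND; same result when the lookup is 0 or 1. */
#define IS_VOWEL_AND(c) \
    (FITS_IN_BYTE(c) && is_vowel[(unsigned char)(c)])
```

In both forms the table is consulted only when the guard succeeds, so out-of-range values never index the array.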
* handy.h: Speed up isIDFIRST_utf8() | Karl Williamson | 2011-10-01 | 1 | -1/+1
| | | | | | This now takes advantage of the new table that mktables generates to find out if a character is a legal start character in Perl's definition. Previously, it had to be looked up in two tables.
* Comment-only nits | Karl Williamson | 2011-10-01 | 1 | -3/+4
|
* handy.h: Add missing isASCII_L1 macro | Karl Williamson | 2011-10-01 | 1 | -0/+1
| | | | This macro is in the pod, but never got defined.
* handy.h: Don't call _utf8 fcns if Latin1 | Karl Williamson | 2011-10-01 | 1 | -7/+19
| | | | | | This patch avoids the overhead of calling, e.g., is_utf8_alpha() on Latin1 inputs. The result is known to Perl's core, and this can avoid a swash load.
* handy.h: Don't call _utf8 fcns if ASCII | Karl Williamson | 2011-10-01 | 1 | -17/+31
| | | | | | This patch avoids the overhead of calling, e.g., is_utf8_alpha() on ASCII inputs. The result is known to Perl's core, and this can avoid a swash load.
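The fast-path idea can be sketched as a macro that answers ASCII inputs inline and falls through to a function only for multi-byte sequences. Everything here is hypothetical: `slow_is_alpha_utf8` is a toy that understands only a subset of two-byte sequences, and `slow_calls` exists just to show when the slow path runs.

```c
static int slow_calls;   /* counts trips through the slow path */

/* Toy stand-in for a table-driven lookup: recognises only the Latin-1
 * letters U+00C0..U+00FF (excluding U+00D7 and U+00F7). */
static int slow_is_alpha_utf8(const unsigned char *p)
{
    slow_calls++;
    if ((p[0] & 0xE0) == 0xC0) {
        unsigned cp = ((unsigned)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        return cp >= 0xC0 && cp <= 0xFF && cp != 0xD7 && cp != 0xF7;
    }
    return 0;
}

/* Inline answer for ASCII bytes (assumes an ASCII platform). */
static int ascii_is_alpha(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* A lead byte below 0x80 is a complete ASCII character: no call needed. */
#define IS_ALPHA_UTF8(p) \
    ((*(p) < 0x80) ? ascii_is_alpha(*(p)) : slow_is_alpha_utf8(p))
```

Since most text handled this way is ASCII, the function call (and whatever table loading it implies) is skipped in the common case.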
* handy.h: Don't call _uni fcns if have applicable macro | Karl Williamson | 2011-10-01 | 1 | -12/+23
| | | | | This patch avoids the overhead of calling, e.g., is_uni_alpha() if the result is known to Perl's core. This can avoid a swash load.
* Don't use swash to find cntrls | Karl Williamson | 2011-10-01 | 1 | -1/+2
| | | | | | | | | Unicode stability policy guarantees that no code points will ever be added to the control characters beyond those already in it. All such characters are in the Latin1 range, and so the Perl core already knows which ones those are, and so there is no need to go out to disk and create a swash for these.
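Because the set of control characters is closed and lies entirely within the Latin1 range, membership reduces to two range checks. A minimal sketch, with `is_cntrl_cp` as a hypothetical name and code points given in Unicode (ASCII platform) terms:

```c
#include <stdint.h>

/* Control characters are U+0000..U+001F and U+007F..U+009F. */
static int is_cntrl_cp(uint32_t cp)
{
    return cp < 0x20 || (cp >= 0x7F && cp <= 0x9F);
}
```

No table or on-disk data is needed, since the stability guarantee means these ranges never grow.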
* handy.h: No need to call fcns to compute if ASCII | Karl Williamson | 2011-10-01 | 1 | -2/+3
| | | | | | | Only the characters whose ordinals are 0-127 are ASCII. This is trivially computed by the macro, so no need to call is_uni_ascii() to do this. Also, since ASCII characters are the same when represented in utf8 or not, the utf8 function call is also superfluous.
* handy.h: Simplify isASCII definition | Karl Williamson | 2011-10-01 | 1 | -1/+6
| This retains essentially the same definition for EBCDIC platforms, but substitutes a simpler one for ASCII platforms. On my system, the new definition compiles to about half the assembly instructions that the old one did (non-optimized).
|
| A bomb-proof definition of ASCII is to make sure that the value is unsigned in the largest possible unsigned for the platform so there is no possible loss of information, and then the ord must be < 128.
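The "bomb-proof" test described here can be sketched for ASCII platforms as below. `widest_utype` and `IS_ASCII_SKETCH` are hypothetical names, and using `uintmax_t` as the widest unsigned type is an assumption made for this illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the platform's widest unsigned integer type. */
typedef uintmax_t widest_utype;

/* Widen first so no high bits are lost, then compare.  Negative inputs
 * become very large unsigned values and are correctly rejected. */
#define IS_ASCII_SKETCH(c) ((widest_utype)(c) < 128)
```

Casting to a narrower type such as `unsigned char` would instead discard the high bits, so a value like 0x100000041 would be mistaken for 'A'.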
* handy.h: refactor FITS_IN_8_BITS defn | Karl Williamson | 2011-10-01 | 1 | -8/+12
| | | | | | This creates a #define for the platform's widest UV, and then uses this in the FITS_IN_8_BITS definition, instead of #ifdef'ing that. This will be useful in future commits.