Commit message | Author | Age | Files | Lines
* mktables: Change \w definition to match new Unicode's | Karl Williamson | 2012-07-26 | 3 | -1/+15
|   Unicode has changed their definition of what should match \w; see http://www.unicode.org/reports/tr18/. This follows that change.
* Make Module::CoreList install into 'site' >= 5.012 | Chris 'BinGOs' Williams | 2012-07-26 | 1 | -2/+18
|   Also if versiononly is set make sure that corelist is installed with the appropriate versioned suffix.
* [perl #113872] Fix leavewrite’s stack handling | Father Chrysostomos | 2012-07-26 | 1 | -8/+2
|   This commit fixes Scope::Escape compatibility by restoring the old stack pointer stored on the context stack when exiting a write.
|
|   I don’t really understand why this fixes Scope::Escape, or rather, how Scope::Escape ends up leaving some number of items on the stack other than 1. But I *do* know this is the correct approach, as it mirrors what pp_leavesub does in scalar context, but pops the stack back to the old value (SP = newsp), rather than the old value+1 (see pp_hot.c:pp_leavesub: MARK = newsp + 1; and later SP = MARK;). Then the code that follows takes care of pushing write’s own return value.
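The stack handling described above can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyStack`, `toy_push`, `toy_leave_write` and `toy_depth_after_leave` are hypothetical names, and the real code works on perl's argument stack through its own macros. The point it shows is that resetting the stack pointer to the saved value and then pushing one result leaves exactly one item, no matter how many stray items were left behind.

```c
#include <assert.h>
#include <stddef.h>

/* Toy value stack; sp points at the topmost occupied slot. */
typedef struct {
    int slots[16];
    int *sp;
} ToyStack;

static void toy_init(ToyStack *st)
{
    st->slots[0] = 0;       /* slot 0 acts as a sentinel below the stack */
    st->sp = st->slots;
}

static void toy_push(ToyStack *st, int v)
{
    assert(st->sp < st->slots + 15);    /* toy stack has a fixed size */
    *++st->sp = v;
}

/* Leave a "write": drop everything above newsp, then push one result. */
static void toy_leave_write(ToyStack *st, int *newsp, int retval)
{
    st->sp = newsp;         /* SP = newsp */
    toy_push(st, retval);   /* the code that follows pushes the return value */
}

/* Number of items above the saved pointer after leaving, given some
 * number of stray items left on the stack by the body. */
static ptrdiff_t toy_depth_after_leave(int stray_items)
{
    ToyStack st;
    toy_init(&st);
    toy_push(&st, 10);              /* something the caller already had */
    int *newsp = st.sp;             /* saved on the (toy) context stack */
    for (int i = 0; i < stray_items; i++)
        toy_push(&st, 99);          /* items the body failed to clean up */
    toy_leave_write(&st, newsp, 1);
    return st.sp - newsp;
}
```

Calling `toy_depth_after_leave` with 0 or with 3 stray items gives the same depth of 1, which is the property the commit message relies on.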
* op.c: op_clear is tempting fate | Father Chrysostomos | 2012-07-25 | 1 | -0/+1
|   This if() statement can be reached by op types to which the OpTRANS* flags do not apply. They happen at present not to use any flags that conflict with these (except when OPf_KIDS is set, in which case this code is not reached). But we should make sure, via an assertion, that new flags added to goto or last do not conflict with trans utf8 flags, and that trans utf8 flags (1 and 2), if renumbered, do not conflict with goto/last utf8 flags (128).
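A guard of this kind might look like the following sketch. The flag values 1, 2 and 128 come from the message above; the `TOY_*` names and both helper functions are hypothetical stand-ins, not the identifiers used in op.c.

```c
#include <assert.h>

/* Hypothetical stand-ins for the private flags discussed above. */
enum {
    TOY_TRANS_FROM_UTF = 1,     /* trans utf8 flag */
    TOY_TRANS_TO_UTF   = 2,     /* trans utf8 flag */
    TOY_LABEL_UTF8     = 128    /* goto/last utf8 flag */
};

#define TOY_TRANS_UTF_MASK (TOY_TRANS_FROM_UTF | TOY_TRANS_TO_UTF)

/* Compile-time guard: renumbering either group so that they share a bit
 * stops the build. */
_Static_assert((TOY_TRANS_UTF_MASK & TOY_LABEL_UTF8) == 0,
               "trans and goto/last utf8 flags must not share bits");

/* True if two flag masks have any bit in common. */
static int toy_flags_overlap(unsigned a, unsigned b)
{
    return (a & b) != 0;
}

/* Run-time variant of the same guard, placed where the flags are read. */
static unsigned toy_trans_utf_bits(unsigned private_flags)
{
    assert(!toy_flags_overlap(TOY_TRANS_UTF_MASK, TOY_LABEL_UTF8));
    return private_flags & TOY_TRANS_UTF_MASK;
}
```

The compile-time form costs nothing at run time; the run-time form is closer to an assertion added inside an existing if() branch.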
* Don’t let ?: folding affect truncate | Father Chrysostomos | 2012-07-25 | 2 | -2/+13
|   truncate(${\1} ? foo : bar, 0) and truncate(1 ? foo : bar, 0) should behave the same way, but were treated differently, due to the way ?: is folded in the latter case. Now that foldedness is recorded in the op tree (cc2ebcd7902), we can use the OPpCONST_FOLDED flag to distinguish truncate(1 ? foo : bar, 0) from truncate(foo, 0).
* Stop truncate(word) from falling back to file name | Father Chrysostomos | 2012-07-25 | 2 | -5/+13
|   In commit 5e0adc2d66, which was a bug fix, I made the mistake of checking the truth of the return value of gv_fetchsv, which is called when truncate’s argument is a bareword. This meant that truncate FOO, 0; would truncate the file named FOO if the glob happened to have been deleted.
* Don’t let ?: folding affect stat | Father Chrysostomos | 2012-07-25 | 2 | -2/+10
|   stat(${\1} ? foo : bar) and stat(1 ? foo : bar) should behave the same way, but were treated differently, due to the way ?: is folded in the latter case. Now that foldedness is recorded in the op tree (cc2ebcd7902), we can use the OPpCONST_FOLDED flag to distinguish stat(1 ? foo : bar) from stat(foo).
* Merge ck_trunc and ck_chdir | Father Chrysostomos | 2012-07-25 | 5 | -26/+2
|   ck_chdir, added in 2006 (d4ac975e), duplicates ck_trunc, added in 1993 (79072805), except for a null op check which is harmless when applied to chdir.
* op.c: dump LABEL leaks its label | Father Chrysostomos | 2012-07-25 | 1 | -0/+1
|   ./perl -Ilib -e 'warn $$; eval "sub { dump a }" while 1'
|
|   Watch the memory usage go up. It didn’t have its own case in op_clear.
* op.c:op_free: Rmv dead code; simplify cop_free logic | Father Chrysostomos | 2012-07-25 | 1 | -7/+3
|   This reverts c53f1caa and cc93af5f. See the thread starting at http://www.nntp.perl.org/group/perl.perl5.porters/2008/04/msg135885.html
|
|   Basically, change c53f1caa made a change, but then cc93af5f undid it, but differently. This resulted in dead code left by c53f1caa (type is unused after the assignment). And in the end the code behaved exactly the same way, so the original problem was not fixed. I suspect this was a B::C bug.
* Merge branch 'blead' of ssh://perl5.git.perl.org/perl into blead | Karl Williamson | 2012-07-24 | 1 | -2/+2
|\
| * In Perl_magic_setenv() s/ptr/key/ in two pieces of platform-specific code. | Nicholas Clark | 2012-07-24 | 1 | -2/+2
| |   These were missed in commit 1203306491d341ed, which renamed ptr to key.
* | Merge branch 'khw/invlist' into blead | Karl Williamson | 2012-07-24 | 18 | -915/+1313
|\ \
| |/
|/|
| |   This topic branch deals mostly with bracketed character classes in regular expressions. It has several main thrusts:
| |
| |   1) The character class macros in handy.h are tied to the class numbers in regcomp.h, and a new set of regnode types, POSIX, are introduced. This will allow more table driven code in regular expression compilation and matching, so that the same regnode type can be used for any of the Posix-like character classes, such as \w as well as [:upper:]. This will allow removal of nearly-duplicate (and triplicate, etc) code.
| |
| |   2) The optimizations for bracketed character classes are extended to work off not just the first 256 characters, but all code points. This extends some char class optimizations to Unicode, and will allow future work to change the regular expression optimizer to work off all Unicode characters, instead of its current behavior of mostly giving up on these.
| |
| |   3) Several new character class optimizations are introduced, with the groundwork laid for more. A character class containing a single character is the same as the single character without the class, except that most metacharacters are treated as literals. Thus, /a[.]b/ produces identical results to /a\.b/. Prior to this merge, the first form's compiled version would require quite a bit more space than the second. Now they are identical. Inversion lists of the whole class are used for optimization calculations.
| * regcomp.c: Revise bracketed char class optimizations | Karl Williamson | 2012-07-24 | 1 | -115/+146
| |   This commit uses the inversion list instead of the bit map constructed during compilation of [bracketed classes] for determining the optimizations that are done at the very end of processing the class. This provides optimizations for things that can't be seen by just looking at the bitmap. There are optimizations done earlier in the code for things that can be easily caught in Pass 1, but now we have complete information.
| |
| |   At this time, I'm not repeating checking for most optimizations that are checked for earlier, though this could be added. The earlier optimizations could overlook cases where someone specified a class in a suboptimal way. For example the earlier code looks for a class with a single range [0-9], but those 10 code points could instead have been specified via [0123456789], and the code doesn't catch that currently; here we could, but I'm not doing so at this time.
| |
| |   The code does do some duplicate checking. For example, some Unicode properties match only a single code point, such as lb=cr, and these aren't known at the earlier point where single code point classes are checked for.
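To see why an inversion list exposes what the original spelling of a class hides, here is a minimal sketch. It assumes the usual inversion-list layout (a sorted array of code points where even-indexed entries start a range inside the set and odd-indexed entries start a range outside it); every `toy_*` name is hypothetical and this is not the regcomp.c implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Append the range [lo, hi] to the inversion list in out[0..len).
 * Ranges must arrive in ascending order; a range that starts exactly
 * where the previous one ended is merged into it.  Returns the new
 * length.  (Toy code: hi + 1 is assumed not to overflow.) */
static size_t toy_invlist_append_range(unsigned long *out, size_t len,
                                       unsigned long lo, unsigned long hi)
{
    assert(lo <= hi);
    assert(len % 2 == 0);                    /* every range so far is closed */
    assert(len == 0 || out[len - 1] <= lo);  /* append only, never insert */
    if (len > 0 && out[len - 1] == lo) {
        out[len - 1] = hi + 1;               /* extend the previous range */
        return len;
    }
    out[len] = lo;                           /* first code point in the set */
    out[len + 1] = hi + 1;                   /* first code point after it */
    return len + 2;
}

/* True if the list is exactly the single range [lo, hi]. */
static int toy_invlist_is_single_range(const unsigned long *list, size_t len,
                                       unsigned long lo, unsigned long hi)
{
    return len == 2 && list[0] == lo && list[1] == hi + 1;
}

/* [0-9] written as one range and [0123456789] written as ten separate
 * elements end up as the same two-entry list. */
static int toy_digit_specs_agree(void)
{
    unsigned long as_range[2], as_members[2];
    size_t n_range = toy_invlist_append_range(as_range, 0, '0', '9');
    size_t n_members = 0;
    for (unsigned long cp = '0'; cp <= '9'; cp++)
        n_members = toy_invlist_append_range(as_members, n_members, cp, cp);
    return toy_invlist_is_single_range(as_range, n_range, '0', '9')
        && toy_invlist_is_single_range(as_members, n_members, '0', '9');
}
```

Because both spellings collapse to the same list, a check made on the final list does not depend on how the class was written.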
| * regcomp.c: Fix <if> condition | Karl Williamson | 2012-07-24 | 1 | -2/+1
| |   The else clause is expecting that the regex is compiled under /d, when in fact, until this commit, it could also be under /l. I could not come up with a case currently where this distinction matters, but it's best to not tempt fate.
| * regcomp.c: Add _invlist_contains_cp | Karl Williamson | 2012-07-24 | 4 | -0/+20
| |   This simply searches an inversion list without going through a swash. It will be used in a future commit.
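A membership test on an inversion list can be done with a plain binary search. The sketch below assumes the conventional layout (sorted boundaries, even-indexed entries start ranges inside the set) and uses hypothetical `toy_*` names; it is not the body of _invlist_contains_cp.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if cp is in the set described by the inversion list.
 * Counts how many boundaries are <= cp: an odd count means cp falls
 * in a range that is inside the set. */
static int toy_invlist_contains_cp(const unsigned long *list, size_t len,
                                   unsigned long cp)
{
    assert(list != NULL || len == 0);
    size_t lo = 0, hi = len;        /* search for first boundary > cp */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo & 1) != 0;           /* lo == number of boundaries <= cp */
}

/* Example set: ASCII digits and uppercase letters, i.e. [0-9A-Z]. */
static const unsigned long toy_digits_upper[] = { 0x30, 0x3A, 0x41, 0x5B };

static int toy_in_digits_upper(unsigned long cp)
{
    return toy_invlist_contains_cp(
        toy_digits_upper,
        sizeof toy_digits_upper / sizeof toy_digits_upper[0], cp);
}
```

The search is logarithmic in the number of ranges and needs no auxiliary hash, which is the attraction of looking at the list directly.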
| * utf8.c: Add a get_() method to hide internal details | Karl Williamson | 2012-07-24 | 5 | -6/+24
| |   This should have been written this way to begin with (I'm the culprit). But we should have a method so another routine doesn't have to know the internal details.
| * regcomp.c: Optimize /[[:blank:]]/u into \h | Karl Williamson | 2012-07-24 | 1 | -0/+9
| |   These two are equivalent.
| * regcomp.c: Properly count elements in [] for false ranges | Karl Williamson | 2012-07-24 | 1 | -1/+3
| |   When something that looks like a range turns out not to be, the hyphen is matched literally, and so is a separate element in the character class. It needs to be accounted as such.
| * regcomp.c: Use POSIXA, NPOSIXA | Karl Williamson | 2012-07-24 | 3 | -1/+95
| |   This commit optimizes character classes which are matched under /a or /aa and consist of a single Posix class, into POSIXA or NPOSIXA regop types. For example /[[:word:]]/a. Since [:ascii:] is always ascii-restricted no matter what the charset modifier is, it is always optimized.
| |
| |   These nodes should execute somewhat faster than a generic ANYOF node, and are significantly smaller, taking 2 bytes instead of 12.
| |
| |   The flags field of the node structure is used to hold an enum indicating which of the 15 Posix classes is being matched.
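The idea of one node type whose flags field selects the class can be sketched as a small table-driven matcher. Everything here is an assumption made for illustration: the `ToyNode` layout, the three-entry class enum (the real one has 15 classes) and the `toy_*` functions are hypothetical, and the range tests assume an ASCII execution character set.

```c
#include <assert.h>

/* Hypothetical node kinds and class numbers. */
enum { TOY_POSIXA, TOY_NPOSIXA };
typedef enum { TOY_CC_WORDCHAR, TOY_CC_DIGIT, TOY_CC_UPPER, TOY_CC_COUNT } ToyClass;

typedef struct {
    unsigned char type;     /* TOY_POSIXA or TOY_NPOSIXA */
    unsigned char flags;    /* which ToyClass is being matched */
} ToyNode;

static int toy_is_digit(unsigned long c) { return c >= '0' && c <= '9'; }
static int toy_is_upper(unsigned long c) { return c >= 'A' && c <= 'Z'; }
static int toy_is_word(unsigned long c)
{
    return toy_is_digit(c) || toy_is_upper(c)
        || (c >= 'a' && c <= 'z') || c == '_';
}

/* One table replaces a separate code path per class. */
static int (*const toy_class_table[TOY_CC_COUNT])(unsigned long) = {
    toy_is_word, toy_is_digit, toy_is_upper
};

/* Under /a only ASCII code points can be in the class; the N variant
 * simply inverts the answer. */
static int toy_node_matches(ToyNode node, unsigned long cp)
{
    assert(node.flags < TOY_CC_COUNT);
    int in_class = cp < 128 && toy_class_table[node.flags](cp);
    return node.type == TOY_NPOSIXA ? !in_class : in_class;
}

/* Convenience wrapper that builds the node from its two fields. */
static int toy_match(int type, int cls, unsigned long cp)
{
    ToyNode node = { (unsigned char)type, (unsigned char)cls };
    return toy_node_matches(node, cp);
}
```

For instance, `toy_match(TOY_POSIXA, TOY_CC_WORDCHAR, 0xE9)` is false because U+00E9 is outside ASCII, while the NPOSIXA form of the same class accepts it.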
| * regcomp.sym: Add new node types POSIXA and NPOSIXA | Karl Williamson | 2012-07-24 | 2 | -142/+191
| |   These will be used to handle things like /[[:word:]]/a. This patch doesn't add the code to actually use these. That will be done in a future patch.
| |
| |   Placeholders POSIXD, POSIXL, and POSIXU are also added for future use.
| * regcomp.h: Use handy.h constants | Karl Williamson | 2012-07-24 | 2 | -34/+40
| |   This synchronizes the ANYOF_FOO usages to the isFOO() usages. Future commits will take advantage of this relationship.
| * handy.h: Free up bits in PL_charclass[] | Karl Williamson | 2012-07-24 | 3 | -364/+337
| |   This array is a bit map containing the Posix and similar character classes for the first 256 code points. Prior to this commit many character classes were represented by two bits, one for characters that are in it over the full Latin-1 range, and one for just the ASCII characters that are in it. The number of bits in use was approaching the 32-bit limit available without playing games.
| |
| |   This commit takes advantage of a recent commit that adds a bit to the table for all the ASCII characters, and the fact that the ASCII characters in a character class are a subset of the full Latin1 range. So, a character is an ASCII-range character with the given character class iff both the full-range character class bit and the ASCII bit are set.
| |
| |   A new internal macro is created to generate code to determine if a character is an ASCII range character with the given class. It's not clear if the generated code is faster or slower than the full range version.
| |
| |   The result is that nearly half the bits are freed up, as the ones for the ASCII-range are now redundant.
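The two-bit test can be sketched as follows. This is an illustration under assumptions, not handy.h: the bit numbers and `toy_*` names are hypothetical, functions are used where the real code uses macros, and `toy_charclass` computes a simplified subset of the Latin-1 letters instead of indexing a 256-entry table.

```c
#include <assert.h>

/* Hypothetical bit numbers within one table entry. */
enum { TOY_CC_ASCII = 0, TOY_CC_ALPHA = 1, TOY_CC_DIGIT = 2 };

/* Stand-in for a PL_charclass[]-style lookup: class bits for one byte.
 * Simplified: the letters are A-Z, a-z and 0xC0-0xFF except 0xD7, 0xF7. */
static unsigned toy_charclass(unsigned char c)
{
    unsigned bits = 0;
    if (c < 128)
        bits |= 1u << TOY_CC_ASCII;
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= 0xC0 && c != 0xD7 && c != 0xF7))
        bits |= 1u << TOY_CC_ALPHA;
    if (c >= '0' && c <= '9')
        bits |= 1u << TOY_CC_DIGIT;
    return bits;
}

/* Full Latin-1 range test: only the class bit matters. */
static int toy_is_class_L1(unsigned char c, unsigned bit)
{
    assert(bit < 32);
    return (toy_charclass(c) >> bit) & 1u;
}

/* ASCII-restricted test: the class bit AND the shared ASCII bit must
 * both be set, so no per-class "ASCII only" bit is needed. */
static int toy_is_class_A(unsigned char c, unsigned bit)
{
    assert(bit < 32 && bit != TOY_CC_ASCII);
    unsigned mask = (1u << bit) | (1u << TOY_CC_ASCII);
    return (toy_charclass(c) & mask) == mask;
}
```

With this layout each class needs one bit plus the single shared ASCII bit, which is where the saving of nearly half the bits comes from.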
| * handy.h: Add intermediate internal macro | Karl Williamson | 2012-07-24 | 1 | -2/+5
| |   This macro abstracts an operation, and will make future commits cleaner.
| * regcomp.c: Relax some restrictions on optimizations for locale | Karl Williamson | 2012-07-24 | 1 | -7/+6
| |   Prior to this commit, we didn't do any inversions for bracketed character classes running under locale. However, this is more strict than necessary. If there is no folding, and everything else is known at compile time, then what is matched when the result is complemented is well-defined, and can be done now.
| |
| |   (Also clarifies one of the affected comments)
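When the full contents of a class are known at compile time, complementing it is cheap if it is held as an inversion list. The sketch below is one way to do that under the conventional layout (a list with an odd number of entries ends in a range that runs to the highest code point); the `toy_*` names are hypothetical and this does not reproduce what regcomp.c does for locale classes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Complement an inversion list.  in has len entries; out must have room
 * for len + 1 entries and must not overlap in.  Returns the new length. */
static size_t toy_invlist_complement(const unsigned long *in, size_t len,
                                     unsigned long *out)
{
    assert(in != NULL || len == 0);
    assert(out != NULL);
    if (len > 0 && in[0] == 0) {
        /* The set began at code point 0: dropping that boundary flips
         * every range from "in" to "out" and vice versa. */
        memcpy(out, in + 1, (len - 1) * sizeof *in);
        return len - 1;
    }
    /* Otherwise a new boundary at 0 does the flipping. */
    out[0] = 0;
    if (len > 0)
        memcpy(out + 1, in, len * sizeof *in);
    return len + 1;
}

/* Complementing [\n] gives [^\n]; complementing again restores it. */
static int toy_complement_demo(void)
{
    const unsigned long newline[] = { 0x0A, 0x0B };   /* only "\n" */
    unsigned long not_newline[3], back[4];
    size_t n1 = toy_invlist_complement(newline, 2, not_newline);
    size_t n2 = toy_invlist_complement(not_newline, n1, back);
    return n1 == 3 && not_newline[0] == 0
        && not_newline[1] == 0x0A && not_newline[2] == 0x0B
        && n2 == 2 && back[0] == 0x0A && back[1] == 0x0B;
}
```

This only makes sense when nothing about the class is deferred to run time, which is the condition the commit message states for locale.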
| * regcomp.c: Add func to test 2 inversion lists for equality | Karl Williamson | 2012-07-24 | 2 | -0/+72
| |   This adds _invlistEQ, which for now is commented out.
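For lists kept in canonical form (sorted, with no empty or touching ranges), equality of the sets reduces to equality of the arrays. The following is a minimal sketch of that idea with hypothetical `toy_*` names; it is not the commented-out _invlistEQ code.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if two canonical inversion lists describe the same set. */
static int toy_invlist_eq(const unsigned long *a, size_t a_len,
                          const unsigned long *b, size_t b_len)
{
    assert(a != NULL || a_len == 0);
    assert(b != NULL || b_len == 0);
    if (a_len != b_len)
        return 0;
    for (size_t i = 0; i < a_len; i++) {
        if (a[i] != b[i])
            return 0;       /* first differing boundary decides it */
    }
    return 1;
}

/* Two spellings of the digits compare equal. */
static int toy_eq_demo_same(void)
{
    const unsigned long x[] = { 0x30, 0x3A };
    const unsigned long y[] = { '0', '9' + 1 };
    return toy_invlist_eq(x, 2, y, 2);
}

/* [0-8] differs from [0-9] in its closing boundary. */
static int toy_eq_demo_different(void)
{
    const unsigned long x[] = { 0x30, 0x3A };
    const unsigned long y[] = { 0x30, 0x39 };
    return toy_invlist_eq(x, 2, y, 2);
}
```

The comparison is linear in the number of ranges, not in the number of code points.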
| * utf8.c: Add info to commented-out -DU lines | Karl Williamson | 2012-07-24 | 1 | -3/+3
| |   This proved useful when I recently needed to use these for debugging
| * regcomp.c: Reverse order of setting, for speed | Karl Williamson | 2012-07-24 | 1 | -2/+2
| |   It's faster to append to an inversion list than to insert into the middle. The previous order of doing things guaranteed that the 2nd thing done would be an insertion, hence slower than an append. Now we add the lowest ordinal character first, so there is a chance that both will be appends
| * perllocale: Mention that \n doesn't change for locales | Karl Williamson | 2012-07-24 | 1 | -0/+6
| |
| * handy.h: Remove duplicated test | Karl Williamson | 2012-07-24 | 1 | -1/+1
| |   This test is duplicated in the called macro
| * handy.h: White space only | Karl Williamson | 2012-07-24 | 1 | -4/+5
| |   This moves a #define next to similar ones, and removes some white space
| * regcomp.c: Move table to wider scope | Karl Williamson | 2012-07-24 | 1 | -33/+34
| |   This table will be used in future commits outside this block's scope
| * regcomp.c: Silence compiler warning | Karl Williamson | 2012-07-24 | 1 | -3/+3
| |   I suspect this being an IV stemmed from an earlier version. It always contains unsigned values.
| * regcomp.c: Change macro name to better indicate its purpose | Karl Williamson | 2012-07-24 | 1 | -11/+11
| |
| * Optimize a single character [class] into EXACTish | Karl Williamson | 2012-07-24 | 2 | -9/+28
| |   Things like /[s]/ or /[s]/i can be optimized as if they did not have the brackets, /s/ or /s/i.
| * regcomp.c: Extract some code into an inline function | Karl Williamson | 2012-07-24 | 4 | -12/+28
| |   This code will be used in future commits in multiple places
| * regcomp.c: shrink some optimized [class] nodes | Karl Williamson | 2012-07-24 | 1 | -1/+14
| |   Various bracketed character class specifications don't need the full generality, and can be optimized into smaller, faster nodes. Recent commits, such as 3a64b5154fffec75126d34d25954f0aef30d9f8a have introduced some such optimizations. However, their commit messages are wrong, in that they don't end up taking any less space than the original ANYOF node, as there is no provision for giving it back, once reserved.
| |
| |   This commit corrects that for non-locale nodes. Restructuring of the code, more than I care to do now, would be required to extend this to locale nodes.
| |
| |   Only optimizations that are determined in pass1 of the regex compilation are eligible for this, as once the space is calculated, it is reserved before pass2 begins. There are now two sections where optimization is done in this routine. The final section is after all the data is examined and processed, and we know exactly what is to be in the ANYOF node. Currently, most of this calculation processing is skipped in pass 1. We could do everything in both passes, greatly slowing down the first, in order to figure out the exact space requirements.
| |
| |   But instead, I've chosen to add a separate optimization section that looks at only the optimizations that are easily computable in the current first pass, and to apply those early, in time for the shrinking to occur. This commit adds that shrinking.
| |
| |   Locale nodes can't be shrunk because the code currently writes into their larger buffer before this decision to optimize is made. To illustrate, say that the node is at the end of the regex, and the data is written at x+20, within a generic locale ANYOF node's space. Pass 1 doesn't write anything, but calculates the space needed. At the point that this commit addresses, we would shrink the amount of space to be allocated. Then we carve out space for the regex from the heap. That x+20 could be pointing to something entirely different, which Pass 2 would destroy. This can be avoided by storing the stuff that would be written into a temporary until the optimization decision has been made. But there is extra work involved in doing this, that I don't think is worth it at this time.
| |
| |   The same problem doesn't occur in non-locale situations, even though the flags field of the node is written before the optimization decision. This is because every node has the space for a flags field, so it just gets overwritten as necessary when the optimization is done.
| * regcomp.c: Delay some initialization until needed | Karl Williamson | 2012-07-24 | 1 | -1/+1
| |   This delays the initialization of the bitmap in ANYOF nodes until just before it is needed, and to after where we make a decision to optimize that node to a node which takes less space. Currently, the space is not given up, once reserved in pass 1, so the write is harmless. This will allow a future commit to shrink the space.
| * regcomp.c: Remove duplicate assignments | Karl Williamson | 2012-07-24 | 1 | -3/+0
| |   These variables are already set to the same values a few lines up.
| * handy.h: Move bit shifting into base macro | Karl Williamson | 2012-07-24 | 3 | -293/+294
| |   This changes the #defines to be just the shift number, while doing the shifting in the macro that the number is passed to. This will prove useful in future commits
| * handy.h: Renumber character class bits | Karl Williamson | 2012-07-24 | 1 | -31/+31
| |   These are renumbered so that the ones that correspond to character classes in regcomp.h are related numerically as well. This will prove useful in future commits.
| * handy.h: Reorder some #defines | Karl Williamson | 2012-07-24 | 1 | -21/+22
| |   They are now ordered in the same order as the similar #defines in regcomp.h. This will be useful in later commits
| * handy.h: l1_charclass.h: Add bit for matching ASCII | Karl Williamson | 2012-07-24 | 3 | -131/+133
| |   This does not replace the isASCII macro definition, as I think the current one is more efficient than this one provides. But future commits will rely on all the named character classes (e.g., /[[:ascii:]]/) having a bit, and this is the only one missing.
| * handy.h: refactor some macros to use a new one in common. | Karl Williamson | 2012-07-24 | 1 | -30/+32
| |   This creates a new, unpublished, macro to implement most of the other macros. This macro will be useful in future commits.
| * regcomp.c: Extract code to inline function | Karl Williamson | 2012-07-24 | 4 | -6/+65
| |   Future commits will use this paradigm in additional places, so extract it to a function, so they all do things right. This isn't a great API, but it works for the few places this will be called.
| * regcomp.sym: Correct and add comments | Karl Williamson | 2012-07-24 | 1 | -1/+2
| |
| * regen/regcomp.pl: Allow ';' in comments | Karl Williamson | 2012-07-24 | 1 | -1/+1
| |   If a comment contained a semi-colon, the regular expression's greedy quantifier would think the portion of the comment before it was part of the data to be processed
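The effect described above can be modelled without a regex engine. In this sketch the line format (data, then ';', then a free-text comment) is an assumption, the `toy_*` functions are hypothetical, and the actual pattern used in regen/regcomp.pl is not reproduced: splitting at the last ';' stands in for a greedy match, and splitting at the first ';' stands in for the intended behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the data field if the split happens at the LAST ';', which
 * is what a greedy "anything, then ';'" match amounts to. */
static size_t toy_data_len_greedy(const char *line)
{
    assert(line != NULL);
    const char *p = strrchr(line, ';');
    return p ? (size_t)(p - line) : strlen(line);
}

/* Length of the data field if the split happens at the FIRST ';', so
 * that later semi-colons stay inside the comment. */
static size_t toy_data_len_first(const char *line)
{
    assert(line != NULL);
    const char *p = strchr(line, ';');
    return p ? (size_t)(p - line) : strlen(line);
}
```

For a line such as `"NAME TYPE ; note; more"`, the first-split data field is the 10 characters before the first ';', while the greedy split also swallows `; note`.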
| * regcomp.c: Optimize [^\n] into \N | Karl Williamson | 2012-07-24 | 2 | -1/+7
| |   This optimization is a big win, as it takes less space, and completely avoids any swash hash creation, and use.
| * regcomp.c: White-space, comments only | Karl Williamson | 2012-07-24 | 1 | -22/+32
| |   This fixes some nits in some comments, adds some comments, but mostly just indents and outdents to reflect a new outer block, and to fit within 80 columns
| * regcomp.c: Refactor new charclass optimizations | Karl Williamson | 2012-07-24 | 1 | -97/+61
| |   Commits 3a64b5154fffec75126d34d25954f0aef30d9f8a and 3172e3fd885a9c54105d3b6156f18dc761fe29e5 introduced some optimizations into the handling of bracketed character classes with just a single element.
| |
| |   In working on other optimizations, I realized that it would be better to put these all in one spot, instead of doing things partially and setting flags to pass to other areas of the routine.
| |
| |   This commit moves all the processing to a single spot, which gets called only after we know that there will be just one element in the character class.
| |
| |   I also realized that the [0-9] optimization should strictly not be done under locale. There is no test for this, as actually this would only be a problem if a locale was in violation of the C standard. But (most) of the rest of Perl doesn't assume that locales are well-behaved in this regard, so now this code doesn't either.