path: root/regen
Commit message | Author | Age | Files | Lines
* regen/regcharclass.pl: Simplify regex | Karl Williamson | 2012-10-09 | 1 | -1/+1
  There doesn't need to be a quantifier or capturing on this regex.
* regen/regcharclass.pl: Add ability for more complex inputs | Karl Williamson | 2012-10-09 | 1 | -4/+23
  This adds the capability to get input to this program from another program, thus allowing essentially arbitrary input. This will be used in future commits.
* regen/regcharclass.pl: improved optree generation | Yves Orton | 2012-10-03 | 1 | -20/+26
  Karl Williamson noticed that we don't always deal with common suffixes in the most efficient way. This change reworks how we convert a trie to an optree so that common suffixes are always grouped together.
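  To make "grouping common suffixes" concrete, here is a hand-written C sketch (not actual regcharclass.pl output) of a length-returning test for the two strings "ab" and "cb". Both function names are hypothetical. In the first form the test of the shared suffix 'b' is repeated in each branch; in the second it appears once.

```c
/* Hypothetical hand-written examples; regcharclass.pl's real output differs.
 * Each returns the length of the match at s (at least 2 readable bytes),
 * or 0 if neither "ab" nor "cb" matches. */

/* Suffix test duplicated in each branch of the ternary tree. */
static int match_ab_cb_unfactored(const char *s)
{
    return s[0] == 'a' ? (s[1] == 'b' ? 2 : 0)
         : s[0] == 'c' ? (s[1] == 'b' ? 2 : 0)
         : 0;
}

/* Common suffix grouped: both first-byte alternatives share one test of
 * the second byte. */
static int match_ab_cb_factored(const char *s)
{
    return (s[0] == 'a' || s[0] == 'c') ? (s[1] == 'b' ? 2 : 0) : 0;
}
```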
* regen/regcharclass.pl: add comments and some minor code cleanup | Yves Orton | 2012-10-03 | 1 | -17/+34
* Remove length magic on scalars | Father Chrysostomos | 2012-10-01 | 1 | -1/+1
  It is not possible to know how to interpret the returned length without accessing the UTF8 flag, which is not reliable until the SV has been stringified, which requires get-magic. So length magic has not made sense since utf8 support was added. I have removed all uses of length magic from the core, so this is now dead code.
* Increase $warnings::VERSION to 1.15 | Father Chrysostomos | 2012-09-30 | 1 | -1/+1
* Increase $feature::VERSION to 1.31 | Father Chrysostomos | 2012-09-30 | 1 | -1/+1
* Use two colons for lexsub warning | Father Chrysostomos | 2012-09-30 | 2 | -8/+4
* remove test define from regen/regcharclass.pl | Yves Orton | 2012-09-29 | 1 | -5/+0
* improve conditional folding logic in regen/regcharclass.pl | Yves Orton | 2012-09-29 | 1 | -7/+12
* fix perl #115078, ternary folding logic failure | Yves Orton | 2012-09-29 | 1 | -5/+22
* add a new define for testing perl #115078 | Yves Orton | 2012-09-29 | 1 | -0/+5
  We don't have any easy way to test regen/regcharclass.pl currently. Perl #115078 is related to a bug in the _cleanup() routine, which is fixed with the next patch.
* Correct fm vtable in perlguts.pod | Father Chrysostomos | 2012-09-25 | 1 | -1/+1
  fm magic uses want_vtbl_fm, which is #defined as want_vtbl_regexp. The definition in regen/mg_vtable.pl does not affect anything except the documentation. It was listed as using regdata, which was wrong.
* [perl #94490] const fold should not trigger special split " " | Father Chrysostomos | 2012-09-22 | 1 | -1/+1
  The easiest way to fix this was to move the special handling out of the regexp engine. Now a flag is set on the split op itself for this case. A real regexp is still created, as that is the most convenient way to propagate locale settings, and it prevents the need to rework pp_split to handle a null regexp. This also means that custom regexp plugins no longer need to handle split specially (which they all do currently).
* add shebangs where missing | Sawyer X | 2012-09-22 | 1 | -0/+1
* Document lexical subs | Father Chrysostomos | 2012-09-15 | 1 | -0/+14
* Add experimental lexical_subs feature | Father Chrysostomos | 2012-09-15 | 1 | -3/+25
* feature.pm: Missing space | Father Chrysostomos | 2012-09-15 | 1 | -1/+1
* Increase $feature::VERSION to 1.30 | Father Chrysostomos | 2012-09-15 | 1 | -1/+1
* Add experimental warnings categ and :lexical_subs warn ID | Father Chrysostomos | 2012-09-15 | 1 | -2/+11
  I reindented the tree in perllexwarn because I was simply copying and pasting the output from:
      perl regen/warnings.pl tree
* Add clonecv op type | Father Chrysostomos | 2012-09-15 | 2 | -1/+2
  This will be used for cloning a ‘my’ sub on scope entry. I was going to use pp_padcv for this, but it would end up having a top-level if/else.
* Add introcv op type | Father Chrysostomos | 2012-09-15 | 2 | -1/+2
  This will be used for introducing ‘my’ subs on scope entry, by turning off the stale flag.
* Add proto magic type | Father Chrysostomos | 2012-09-15 | 1 | -0/+1
  This will be used for storing the prototype CV of a ‘my’ sub. The clone needs to occupy the pad entry so that padcv ops will be able to find it. That means the clone has to displace its prototype. In case the same sub is called recursively, we still need to be able to access the prototype.
* padcv op type | Father Chrysostomos | 2012-09-15 | 1 | -0/+2
* Increase $warnings::VERSION to 1.14 | Father Chrysostomos | 2012-09-14 | 1 | -1/+1
* Stop lexical warnings from turning off deprecations | Father Chrysostomos | 2012-09-14 | 1 | -7/+15
  Some warnings, such as deprecation warnings, are on by default:
      $ perl5.16.0 -e '$*'
      $* is no longer supported at -e line 1.
  But turning *on* other warnings will turn them off:
      $ perl5.16.0 -e 'use warnings "void"; $*'
      Useless use of a variable in void context at -e line 1.
  Either all warnings in any given scope are controlled by lexical hints, or none of them are. When a single warnings category is turned on or off, if the warnings were controlled by $^W, then all warnings are first turned on lexically if $^W is 1 and all warnings are turned off lexically if $^W is 0. That has the unfortunate effect of turning off warnings when it was only requested that warnings be turned on.
  These categories contain default warnings:
      ambiguous debugging deprecated inplace internal io malloc utf8 redefine syntax glob inplace overflow precedence prototype threads misc
  Most also contain regular warnings, but these contain *only* default warnings:
      debugging deprecated glob inplace malloc
  So we can treat $^W==0 as equivalent to qw(debugging deprecated glob inplace malloc) when enabling lexical warnings. While this means that some default warnings will still be turned off by ‘use warnings "void"’, it won’t be as many as before. So at least this is a step in the right direction. (The real solution, of course, is to allow each warning to be turned off or on on its own.)
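  A minimal C sketch of the new starting point, assuming a made-up one-bit-per-category layout for a handful of categories (Perl's real warning bitmask, generated by regen/warnings.pl, is not laid out like this). The W_* names and enable_category_from_dollar_w() are hypothetical. Previously the starting mask for $^W==0 was empty; under the rule described above it is the set of categories that hold only default warnings, so "syntax" (default plus regular warnings) is still lost.

```c
#include <stdint.h>

/* Hypothetical one-bit-per-category layout, for illustration only. */
enum {
    W_VOID       = 1 << 0,  /* regular warnings only */
    W_SYNTAX     = 1 << 1,  /* default and regular warnings */
    W_DEBUGGING  = 1 << 2,  /* this and the next four hold only default warnings */
    W_DEPRECATED = 1 << 3,
    W_GLOB       = 1 << 4,
    W_INPLACE    = 1 << 5,
    W_MALLOC     = 1 << 6,
    W_ALL        = (1 << 7) - 1
};

#define W_DEFAULT_ONLY (W_DEBUGGING | W_DEPRECATED | W_GLOB | W_INPLACE | W_MALLOC)

/* Lexical mask after enabling one category in a scope whose warnings were
 * still controlled by $^W (passed in as dollar_w). */
static uint32_t enable_category_from_dollar_w(int dollar_w, uint32_t category)
{
    /* Old behaviour started from 0 when $^W was 0, dropping default warnings.
     * This sketch starts from the purely-default categories instead. */
    uint32_t start = dollar_w ? (uint32_t)W_ALL : (uint32_t)W_DEFAULT_ONLY;
    return start | category;
}
```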
* utf8.h: Use machine generated IS_UTF8_CHAR() | Karl Williamson | 2012-09-13 | 1 | -0/+16
  This takes the output of regen/regcharclass.pl for all the 1-4 byte UTF-8 representations of Unicode code points, and replaces the current hand-rolled definition there. It does this only for ASCII platforms, leaving EBCDIC to be machine generated when run on such a platform. I would rather have both versions regenerated each time they are needed, to avoid an EBCDIC dependency, but it takes more than 10 minutes on my computer to process the 2 billion code points that have to be checked for on ASCII platforms, and currently t/porting/regen.t runs this program every time; that slowdown would be unacceptable. If this is ever run under EBCDIC, the macro should be machine computed (very slowly). So, even though there is an EBCDIC dependency, it has essentially been solved.
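  For reference, this is roughly what a 1-4 byte well-formedness test has to check, written as a plain C function rather than a generated macro. It is a sketch following the strict RFC 3629 rules (no overlongs, no surrogates, nothing above U+10FFFF); the set of sequences accepted by Perl's generated IS_UTF8_CHAR() is not necessarily the same. utf8_char_len() is a hypothetical name.

```c
#include <stddef.h>

/* Return the length (1-4) of the well-formed UTF-8 character starting at s,
 * given len readable bytes, or 0 if there is none. Strict RFC 3629 rules. */
static int utf8_char_len(const unsigned char *s, size_t len)
{
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range for the 2nd byte */
    int need;

    if (len == 0)
        return 0;
    if (s[0] <= 0x7F)
        return 1;                          /* ASCII */
    if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        need = 2;                          /* 0xC0, 0xC1 would be overlong */
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        need = 3;
        if (s[0] == 0xE0) lo = 0xA0;       /* overlong below U+0800 */
        if (s[0] == 0xED) hi = 0x9F;       /* surrogates U+D800..U+DFFF */
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        need = 4;
        if (s[0] == 0xF0) lo = 0x90;       /* overlong below U+10000 */
        if (s[0] == 0xF4) hi = 0x8F;       /* above U+10FFFF */
    } else {
        return 0;                          /* continuation byte or bad lead */
    }

    if (len < (size_t)need || s[1] < lo || s[1] > hi)
        return 0;
    for (int i = 2; i < need; i++)         /* remaining bytes: 0x80..0xBF */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
    return need;
}
```

  A generated macro unrolls the same kind of comparisons into a single expression instead of using a loop and local variables.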
* regen/regcharclass.pl: Add ability to restrict platforms | Karl Williamson | 2012-09-13 | 1 | -0/+9
  This adds the capability to skip definitions if they are for other than a desired platform.
* utf8.h: Remove some EBCDIC dependencies | Karl Williamson | 2012-09-13 | 1 | -0/+12
  regen/regcharclass.pl has been enhanced in previous commits so that it generates code as good as these hand-defined macro definitions for various UTF-8 constructs. And it should be able to generate EBCDIC ones as well. By using its definitions, we can remove the EBCDIC dependencies for them. It is quite possible that the EBCDIC versions were wrong, since they have never been tested. Even if regcharclass.pl has bugs under EBCDIC, it is easier to find and fix those in one place than in all the sundry definitions.
* regen/regcharclass.pl: Add optimization | Karl Williamson | 2012-09-13 | 1 | -5/+42
  On UTF-8 input known to be valid, continuation bytes must be in the range 0x80 .. 0xBF. Therefore, any tests for being within those bounds will always be true, and may be omitted.
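  A small C illustration of this optimization, using a hypothetical test for the UTF-8 encoding of U+00A0..U+00FF (bytes C2 A0..C2 BF and C3 80..C3 BF). In UTF-8 every continuation byte lies in 0x80..0xBF, so once the input is known to be valid, comparisons against those bounds add nothing. The function names are made up for this sketch.

```c
#include <stdbool.h>

/* Hypothetical test: does s start with the UTF-8 for U+00A0..U+00FF?
 * s must have at least 2 readable bytes. Safe for arbitrary input. */
static bool is_upper_latin1_safe(const unsigned char *s)
{
    return (s[0] == 0xC2 && s[1] >= 0xA0 && s[1] <= 0xBF)
        || (s[0] == 0xC3 && s[1] >= 0x80 && s[1] <= 0xBF);
}

/* Same test when s is known to be valid UTF-8: s[1] is then a continuation
 * byte, so comparisons against 0x80 and 0xBF are always true and are
 * omitted. The 0xA0 lower bound still carries information and stays. */
static bool is_upper_latin1_valid(const unsigned char *s)
{
    return (s[0] == 0xC2 && s[1] >= 0xA0) || s[0] == 0xC3;
}
```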
* regen/regcharclass.pl: White-space only | Karl Williamson | 2012-09-13 | 1 | -7/+7
  Indent a newly-formed block.
* regen/regcharclass.pl: Extend previously added optimization | Karl Williamson | 2012-09-13 | 1 | -13/+71
  A previous commit added an optimization to save a branch in the generated code at the expense of an extra mask when the input class has certain characteristics. This extends that to the case where sub-portions of the class have similar characteristics. The first optimization for the entire class is moved to right before the new loop that checks each range in it.
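  A sketch of masking separate portions of one class, for a hypothetical class made of 0x30..0x37 plus 0x41 and 0x61. The class as a whole cannot be matched with a single mask, but each portion can, so each gets its own. This is hand-written C for illustration, not regcharclass.pl output, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical class: 0x30..0x37 plus 0x41 and 0x61. */

/* Plain form: one range test and two equality tests. */
static bool in_class_plain(unsigned char c)
{
    return (c >= 0x30 && c <= 0x37) || c == 0x41 || c == 0x61;
}

/* Each portion masked on its own:
 *   0x30..0x37 agree on all but the low 3 bits -> (c & 0xF8) == 0x30
 *   0x41, 0x61 differ only in bit 0x20        -> (c & 0xDF) == 0x41 */
static bool in_class_masked(unsigned char c)
{
    return (c & 0xF8) == 0x30 || (c & 0xDF) == 0x41;
}

/* Exhaustively confirm that the two forms agree on every byte value. */
static bool forms_agree(void)
{
    for (unsigned c = 0; c <= 0xFF; c++)
        if (in_class_plain((unsigned char)c) != in_class_masked((unsigned char)c))
            return false;
    return true;
}
```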
* regen/regcharclass.pl: Rmv always true components from gen'd macro | Karl Williamson | 2012-09-13 | 1 | -0/+3
  This adds a test and returns 1 from a subroutine if the condition will always match; and in the caller it adds a check for that, and omits the condition from the generated macro.
* regen/regcharclass.pl: Add an optimization | Karl Williamson | 2012-09-13 | 1 | -0/+126
  Branches can be eliminated from the macros that are generated here by using a mask in cases where applicable. This adds checking to see if this optimization is possible, and applies it if so.
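  One way to decide whether a single mask applies, sketched in C under the assumption that the class is a set of distinct byte values: the bits that vary across the set must take every possible combination, i.e. the set size must be 2 to the power of the number of varying bits. find_mask() is a hypothetical helper, not the routine in regcharclass.pl.

```c
#include <stdbool.h>
#include <stddef.h>

/* Try to express membership in a set of n distinct bytes as
 * (c & *mask) == *value. Returns false if no single mask works. */
static bool find_mask(const unsigned char *set, size_t n,
                      unsigned char *mask, unsigned char *value)
{
    unsigned char all_and = 0xFF, all_or = 0x00;
    unsigned varying, bits = 0;

    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        all_and &= set[i];   /* bits set in every member */
        all_or |= set[i];    /* bits set in at least one member */
    }
    varying = (unsigned)(all_and ^ all_or);
    for (unsigned d = varying; d != 0; d &= d - 1)
        bits++;              /* count the varying bits */

    /* The mask test accepts all 2^bits combinations of the varying bits,
     * so it is exact only if the set contains every one of them. */
    if (n != ((size_t)1 << bits))
        return false;
    *mask = (unsigned char)~varying;
    *value = all_and;
    return true;
}
```

  For {0x41, 0x61} this yields (c & 0xDF) == 0x41: one AND and one comparison in place of two comparisons joined by a branch.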
* regen/regcharclass.pl: Rename a variable | Karl Williamson | 2012-09-13 | 1 | -3/+3
  I find it confusing that the array element name is the same as the full array.
* regen/regcharclass.pl: Pass options deeper into call stack | Karl Williamson | 2012-09-13 | 1 | -8/+8
  This is to prepare for future commits which will act differently at the deep level depending on some of the options.
* Use macro not swash for utf8 quotemeta | Karl Williamson | 2012-09-13 | 1 | -0/+4
  The rules for determining whether an above-Latin1 code point should be quoted are now saved in a macro generated from a trie by regen/regcharclass.pl, and pp.c now uses this macro to test these cases. This allows removal of a wrapper subroutine, and there is no longer any need for dynamic loading into a swash at run time. This macro is about as big as I'm comfortable compiling in, but it saves building a hash that can grow over time, and removes a subroutine and interpreter variables. Indeed, performance benchmarks show that it is about the same speed as a hash, but it does not require loading the rules from disk the first time it is used.
* regen/regcharclass.pl: Add new output macro type | Karl Williamson | 2012-09-13 | 1 | -5/+10
  The new type 'high' is used on only above-Latin1 code points. It is designed for code that already knows the tested code point is not Latin1, and avoids unnecessary tests.
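  A sketch of the idea behind the 'high' variant, written over plain code points for readability; that is an assumption, since the generated macros may instead examine UTF-8 bytes. The class (U+00A0 plus U+2000..U+200A and U+3000) and both function names are hypothetical. When the caller already knows the code point is above Latin1, the Latin1 half of the test is never needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* General form: must first separate Latin1 from above-Latin1 input. */
static bool in_class_cp(uint32_t cp)
{
    if (cp <= 0xFF)
        return cp == 0xA0;
    return (cp >= 0x2000 && cp <= 0x200A) || cp == 0x3000;
}

/* 'high'-style form: the caller guarantees cp > 0xFF, so the Latin1
 * comparison and the dispatch on cp <= 0xFF are skipped entirely. */
static bool in_class_cp_high(uint32_t cp)
{
    return (cp >= 0x2000 && cp <= 0x200A) || cp == 0x3000;
}
```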
* regen/regcharclass.pl: Add documentation | Karl Williamson | 2012-09-13 | 1 | -32/+128
* regen/regcharclass.pl: Error check input better | Karl Williamson | 2012-09-13 | 1 | -3/+15
  This makes sure that the modifiers specified in the input are known to the program.
* regen/regcharclass.pl: Allow comments in input | Karl Williamson | 2012-09-13 | 1 | -8/+8
  Lines whose first non-blank character is a '#' are now considered to be comments, and ignored. This allows moving some lines that have been commented out back to after the __DATA__, where they really belong.
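  The comment rule, transliterated into C as a hypothetical helper; regcharclass.pl itself is written in Perl, so this mirrors only the rule, not the code. "Blank" is taken here to mean space or tab, which is what C99's isblank() tests in the C locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if the first non-blank character of line is '#'. */
static bool is_comment_line(const char *line)
{
    while (isblank((unsigned char)*line))   /* skip spaces and tabs */
        line++;
    return *line == '#';
}
```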
* regen/unicode_constants.pl: Add name parameter | Karl Williamson | 2012-09-13 | 1 | -3/+11
  A future commit will want to use the first surrogate code point's UTF-8 value. Add this to the generated macros, and give it a name, since there is no official one. The program has to be modified to cope with this.
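  Computing that value: the sketch below is a plain C UTF-8 encoder that deliberately does not reject surrogates, since the point is to find the bytes of U+D800. encode_utf8() is a hypothetical name. Working through U+D800 by hand gives 0xE0 | 0xD = 0xED, then 0x80 | 0x20 = 0xA0, then 0x80 | 0x00 = 0x80.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode cp (at most 0x1FFFFF) as UTF-8 into out, which must have room for
 * 4 bytes; return the number of bytes written. Surrogates are not rejected. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```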
* regexec.c: Use new macros instead of swashes | Karl Williamson | 2012-09-13 | 1 | -3/+0
  A previous commit has caused macros to be generated that will match Unicode code points of interest to the \X algorithm. This patch uses them. This speeds up modern Korean processing by 15%. Together with recent previous commits, the throughput of modern Korean under \X has more than doubled, and is now comparable to other languages (which have themselves increased by 35%).
* regen/regcharclass.pl: Generate macros for \X processing | Karl Williamson | 2012-09-13 | 1 | -0/+28
  \X is implemented in regexec.c as a complicated series of property look-ups. It turns out that many of those are for just a few code points, and so can be more efficiently implemented with a macro than a swash. This generates those.
* regen/regcharclass.pl: Change to work on an empty class | Karl Williamson | 2012-09-13 | 1 | -0/+1
  Future commits will add Unicode properties for this to generate macros, and some of them may be empty in some Unicode releases. This just causes such a generated macro to evaluate to 0.
* regen/regcharclass.pl: Fix bug for character '0' | Karl Williamson | 2012-09-13 | 1 | -1/+1
  The character '0' could be omitted from some generated macros because the code tested the value of a hash entry (getting 0, which is false) instead of whether the entry exists.
* regen/regcharclass.pl: Work on EBCDIC platforms | Karl Williamson | 2012-09-13 | 1 | -8/+40
  This will now automatically generate macros for non-ASCII platforms, by mapping the Unicode input to native output. Doing this will allow several cases of EBCDIC dependencies in other code to be removed, and fixes the bug that this previously had with non-ASCII platforms.
* regen/regcharclass.pl: Remove Encode:: dependency | Karl Williamson | 2012-09-13 | 1 | -6/+3
  Newer options to unpack alleviate the need for Encode, and run faster.
* regen/regcharclass.pl: Handle ranges, \p{} | Karl Williamson | 2012-09-13 | 1 | -33/+34
  Instead of having to list all code points in a class, you can now use \p{} or a range. This changes some classes to use the \p{}, so that any changes Unicode makes to the definitions don't have to manually be done here as well.
* Rename regen'd hdr to reflect expanded capabilities | Karl Williamson | 2012-09-13 | 1 | -4/+4
  The recently added utf8_strings.h has been expanded to include more than just strings. I'm renaming it to avoid confusion.