delta/perl.git - github.com: perl/perl5.git

	Commit message	Author	Age	Files	Lines
*	regcomp.c: Remove #if 0 code	Karl Williamson	2011-03-02	1	-98/+0
	This code is obsolete, as new code has been written to do folding; now that smokes are all passing with that new code, there is no point in retaining the old.
*	regex: Remove obsolete code	Karl Williamson	2011-02-28	1	-24/+2
	This code has been rendered obsolete in 5.14 by using a different mechanism altogether. This functionality is now provided at run-time, user-selectable, via the /u and /d regex modifiers. This code was for compile-time selection of which to use.
*	regcomp.c: white space only	Karl Williamson	2011-02-28	1	-145/+149
	A previous commit collapsed nested blocks. This outdents the nested part.
*	regcomp.c: collapse two blocks	Karl Williamson	2011-02-28	1	-6/+3
	An earlier commit removed code so that these two blocks can be written as one.
*	regcomp.c: Remove temporary code	Karl Williamson	2011-02-28	1	-9/+0
	This code was inserted to make sure no tests failed in the intermediate commits leading up to d50a4f90cab527593b2dd218f71b66a6be555490, and should have been removed in that commit, but I forgot to.
*	Handle [folds] of 0-255 without swashes	Karl Williamson	2011-02-27	1	-13/+110
	Commit 56ca34cada940c7f6aae9a59da266e541530041e had the side effect of causing regular expressions with things like [a-z], or even just [k], to go out to disk to read tables to create swashes, because it knew that some of those characters matched outside the bitmap (and due to l1_char_class_tab.h it knew which ones had those matches), but it didn't know what the characters were that participated in those folds.
	This patch hard-codes the Unicode 6.0 rules into regcomp.c for the code points 0-255, so that the very slow utf8_heavy is not invoked on them. (Code points above 255 will continue to invoke it.)
	It would, of course, be better if these rules could be regen'd into regcomp.c, as there is a risk that the standard will change and the code will not. But I don't think that has ever happened; in other words, I think the rules haven't changed since Day 1 of Unicode. (That would not be the case if we were doing simple case folding, as the capital sharp s, which folds to U+00DF, was added later.) And the Standard is getting more stable in this area. I believe one of their stability policies now forbids them from adding something that simply folds to one of the characters that already has a fold, such as M and m. Ligatures are frowned on, and I doubt that new ones would be encoded, so that leaves a new Unicode character that folds to a Latin-1 character plus some sort of mark. For those, this code is a no-op, so those aren't a problem either.
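A minimal C sketch of what hard-coding the fold rules for 0-255 can look like: a switch that, for a Latin-1 code point, reports the code points above 255 that share its single-character case fold, so no table has to be read from disk. The function name and the UV typedef are hypothetical stand-ins rather than regcomp.c's actual code, and the list is illustrative, not a verified copy of the Unicode 6.0 data.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UV;   /* stand-in for Perl's UV type */

/* Hypothetical helper, not taken from regcomp.c: for a Latin-1 code point,
 * store in out[] the code points above 255 that share its single-character
 * case fold, and return how many there are. U+00DF (sharp s), whose folds
 * involve "ss", is deliberately left out. */
static size_t latin1_fold_partners_above_255(UV cp, UV out[2])
{
    assert(cp <= 0xFF);
    switch (cp) {
    case 'K': case 'k':
        out[0] = 0x212A;            /* KELVIN SIGN */
        return 1;
    case 'S': case 's':
        out[0] = 0x017F;            /* LATIN SMALL LETTER LONG S */
        return 1;
    case 0xB5:                      /* MICRO SIGN */
        out[0] = 0x039C;            /* GREEK CAPITAL LETTER MU */
        out[1] = 0x03BC;            /* GREEK SMALL LETTER MU */
        return 2;
    case 0xC5: case 0xE5:           /* A WITH RING ABOVE, both cases */
        out[0] = 0x212B;            /* ANGSTROM SIGN */
        return 1;
    case 0xFF:                      /* SMALL Y WITH DIAERESIS */
        out[0] = 0x0178;            /* its capital lives outside Latin-1 */
        return 1;
    default:
        return 0;                   /* no fold partner above 255 */
    }
}
```

A switch keeps the common case (most Latin-1 letters have no partner above 255) to a single branch with no allocation, which is the point of avoiding utf8_heavy for these code points.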
*	regcomp.c: Add deprecation macro with extra param	Karl Williamson	2011-02-27	1	-0/+7
*	regcomp.c: More prep for bitmap/nonbitmap folds	Karl Williamson	2011-02-27	1	-1/+32
	This sets things up for a future commit that will move the calculation of all folds involving characters in the bitmap.
*	regcomp.c: Place marker for 2nd inversion list	Karl Williamson	2011-02-27	1	-19/+28
	The set_regclass_bit functions will be adding to a new inversion list. This declares that list and passes it to them.
*	Change to use new add_cp_to_invlist()	Karl Williamson	2011-02-27	1	-1/+1
*	regcomp.c: Add parameters to fcns	Karl Williamson	2011-02-27	1	-23/+23
	A pointer to the list of multi-char folds in an ANYOF node is now passed to the routines that set the bitmap. This is in preparation for those routines to add to the list.
*	regcomp.c: Convert old-style to inversion list	Karl Williamson	2011-02-27	1	-5/+5
	The code that handles a false range in a [character class] hadn't been converted to use inversion lists.
*	regcomp.c: Add fcn add_cp_to_invlist()	Karl Williamson	2011-02-27	1	-0/+5
	This is just an inline shorthand when a single code point is all that is needed. A macro could have been used instead, but this just seemed nicer.
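For readers new to inversion lists, here is a minimal sketch assuming a plain sorted array of UV range boundaries: even-indexed elements start ranges that are in the set, odd-indexed elements start ranges that are not. A single code point is the degenerate range [cp, cp], stored as the pair {cp, cp + 1}, which is why a shorthand like add_cp_to_invlist() can simply forward to the range function. The names append_range(), append_cp() and invlist_contains() are hypothetical, and this version only accepts ranges in ascending order.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long UV;   /* stand-in for Perl's UV type */

/* cp is in the set iff an odd number of boundaries are <= cp. */
static bool invlist_contains(const UV *list, size_t len, UV cp)
{
    size_t lo = 0, hi = len;            /* binary search for that count */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo % 2 == 1;
}

/* Append [start, end]; start must not precede the existing ranges, and
 * list must have room for two more elements. Returns the new length. */
static size_t append_range(UV *list, size_t len, UV start, UV end)
{
    assert(len % 2 == 0 && start <= end && end < ULONG_MAX);
    assert(len == 0 || start >= list[len - 1]);
    if (len > 0 && list[len - 1] == start) {
        list[len - 1] = end + 1;        /* abuts the last range: extend it */
        return len;
    }
    list[len] = start;                  /* first code point in the set */
    list[len + 1] = end + 1;            /* first code point after it */
    return len + 2;
}

/* The single-code-point shorthand is just a one-element range. */
static inline size_t append_cp(UV *list, size_t len, UV cp)
{
    return append_range(list, len, cp, cp);
}
```

For example, adding a-z and then U+212A yields the four boundaries {0x61, 0x7B, 0x212A, 0x212B}, and membership is a binary search over them.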
*	regcomp.c: Move code to common place	Karl Williamson	2011-02-27	1	-3/+3
	This is part of the refactoring of the code that sets the alternate array for multi-char folds. Changing the node type to ANYOFV can be done at the last second, in pass 2, as it doesn't change any sizing, etc.
*	regcomp.c: Factor code into a function.	Karl Williamson	2011-02-27	1	-6/+19
	A future commit uses this same code, so put it into a common place.
*	regcomp.c: Remove no longer necessary tests	Karl Williamson	2011-02-27	1	-6/+0
	A previous commit changed add_range_to_invlist() to do the creation that these lines did.
*	regcomp.c: accept NULL as inversion list param	Karl Williamson	2011-02-27	1	-5/+12
	Change the function add_range_to_invlist() to accept NULL as the inversion list, in which case it creates it. A common usage of this function is to create the list if it doesn't exist before calling it, so this puts that code in one place.
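The NULL-accepting behavior described above can be sketched as follows. The invlist_t struct, add_range() and the doubling growth policy are illustrative assumptions, not Perl's internal layout; ranges must be added in ascending order, and allocation failure simply aborts in this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef unsigned long UV;   /* stand-in for Perl's UV type */

typedef struct {
    UV *elems;      /* sorted range boundaries (in, out, in, out, ...) */
    size_t len;     /* elements in use */
    size_t cap;     /* elements allocated */
} invlist_t;

/* Append [start, end] to list, creating the list first if list is NULL.
 * Returns the (possibly newly created) list. */
static invlist_t *add_range(invlist_t *list, UV start, UV end)
{
    assert(start <= end && end < ULONG_MAX);
    if (list == NULL) {                         /* create on first use */
        list = malloc(sizeof *list);
        if (list == NULL)
            abort();
        list->elems = NULL;
        list->len = list->cap = 0;
    }
    assert(list->len % 2 == 0);
    assert(list->len == 0 || start >= list->elems[list->len - 1]);

    if (list->len > 0 && list->elems[list->len - 1] == start) {
        list->elems[list->len - 1] = end + 1;   /* extend the adjacent range */
        return list;
    }
    if (list->len + 2 > list->cap) {            /* grow the boundary array */
        size_t new_cap = list->cap ? list->cap * 2 : 8;
        UV *grown = realloc(list->elems, new_cap * sizeof *grown);
        if (grown == NULL)
            abort();
        list->elems = grown;
        list->cap = new_cap;
    }
    list->elems[list->len++] = start;
    list->elems[list->len++] = end + 1;
    return list;
}
```

Callers can then write list = add_range(list, lo, hi); with no separate "create it if it doesn't exist" step at each call site.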
*	[perl #84746] Accessing $2 causes the interpreter to crash	Father Chrysostomos	2011-02-25	1	-7/+1
	Actually, it doesn’t. The original test case was:
	    #!/usr/bin/perl
	    my $rx = qr'\$ (?| {(.+?)} | (.+?); | (.+?)(\s) )'x;
	    my $test = '/home/$USERNAME ';
	    die unless $test =~ $rx;
	    print "1: $1\n";
	    print "2: $2\n" if defined $2;
	This crashes even if I put an ‘exit’ right after the pattern match.
	What’s happening is that regcomp miscounts the number of capturing parenthesis pairs (cf. [perl #59734]), so the execution of the regular expression causes a buffer overflow which overwrites the op_sibling field of the regcreset op, causing a crash when the op is freed. (The exact failure may differ between builds, platforms, etc., of course.)
	S_reg in regcomp.c keeps a count of the parenthesised groups in a (?|...) construct, which it updates after each branch, if that branch has more captures than any previous branch. But it was not updating the count after the last branch. So this bug would occur if the last branch had more capturing parentheses than any previous branch.
	Commit ee91d26, which fixed bug #59734, only solved the problem when there was just one branch (by updating the count before the loop that deals with subsequent branches was entered).
	This commit changes the code at the end of S_reg to take into account that RExC_npar (the current paren count) might have been increased by the last branch. Since the loop to deal with subsequent branches resets the count before each branch, the code that commit ee91d26 added is no longer necessary, so this commit removes it.
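The counting bug can be modelled in a few lines. In this hypothetical model, start is the paren count when "(?|" is seen and caps[i] is the number of capture groups branch i opens; since every branch restarts numbering at start + 1, the correct total afterwards is start plus the largest caps[i]. The buggy version folds a branch into the running maximum only when the next branch begins, so the last branch is never compared. This is a simplification for illustration, not S_reg's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy model: the previous branch is counted only when the next one
 * starts, so the final branch never reaches the comparison. */
static size_t parens_after_buggy(size_t start, const size_t *caps, size_t n)
{
    assert(n >= 1);
    size_t npar = start + caps[0];      /* first branch */
    size_t max = npar;                  /* the ee91d26-style update before the loop */
    for (size_t i = 1; i < n; i++) {
        if (npar > max)                 /* fold in the PREVIOUS branch... */
            max = npar;
        npar = start + caps[i];         /* ...then restart numbering for this one */
    }
    return max;                         /* bug: last branch's npar is ignored */
}

/* Fixed model: every branch, including the last, updates the maximum. */
static size_t parens_after_fixed(size_t start, const size_t *caps, size_t n)
{
    size_t max = start;
    for (size_t i = 0; i < n; i++) {
        size_t npar = start + caps[i];  /* each branch restarts numbering */
        if (npar > max)
            max = npar;
    }
    return max;
}
```

For the test case above, the branches open 1, 1 and 2 groups, so the buggy count is 1 while the pattern really has 2 groups; sizing the capture buffer for 1 is what lets $2 write past its end.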
*	Free up bit in ANYOF flags	Karl Williamson	2011-02-25	1	-22/+37
	This is the foundation for fixing the regression RT #82610. My analysis was wrong that two bits could be shared, at least not without further work. This changes to use a different mechanism to pass needed information to regexec.c so that another bit can be freed up and, in a later commit, the two bits can become unshared again.
	The bit that is freed up is ANYOF_UTF8, which basically said there is something that is matched outside the ANYOF bitmap, and requires the target string to be in utf8. This changes things so the existence of something besides the bitmap indicates this, and so no flag is needed. The flag bit ANYOF_NONBITMAP_NON_UTF8 remains to indicate that there is something that should be matched outside the bitmap even if the target string isn't in utf8.
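A simplified sketch of how a matcher might decide whether to look beyond the ANYOF bitmap once the ANYOF_UTF8 flag is gone: the presence of non-bitmap data itself implies "consult it for utf8 targets", and a separate flag still opts in non-utf8 targets. The struct, its field names and SKETCH_NONBITMAP_NON_UTF8 are hypothetical; they are not the real node layout or flag values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NONBITMAP_NON_UTF8 0x01u   /* illustrative flag value */

typedef struct {
    unsigned char bitmap[32];   /* one bit for each code point 0-255 */
    unsigned flags;
    const void *nonbitmap;      /* non-NULL when matches exist beyond the bitmap */
} anyof_sketch;

static bool bitmap_has(const anyof_sketch *n, unsigned cp)
{
    assert(cp < 256);
    return (n->bitmap[cp >> 3] >> (cp & 7)) & 1u;
}

/* No "UTF8" flag is consulted: the non-bitmap data being present is
 * enough for utf8 targets; non-utf8 targets need the explicit flag. */
static bool must_check_nonbitmap(const anyof_sketch *n, bool target_is_utf8)
{
    if (n->nonbitmap == NULL)
        return false;               /* nothing outside the bitmap at all */
    if (target_is_utf8)
        return true;                /* implied by the data being present */
    return (n->flags & SKETCH_NONBITMAP_NON_UTF8) != 0;
}
```

Deriving the utf8 case from the pointer means the flag can no longer disagree with the data, and it frees a bit in the flags byte.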
*	regcomp.c: ANYOF node handle range to UV_MAX	Karl Williamson	2011-02-25	1	-2/+11
*	regcomp.c: Move inversion list conversion code	Karl Williamson	2011-02-25	1	-29/+32
	This just moves the code to later in the subroutine, in preparation for future commits.
*	regcomp.c: Use more precise ANYOF flag	Karl Williamson	2011-02-25	1	-1/+1
	As the comment above the changed line says, \p doesn't have to match only utf8, but the code was setting the two-bit flag that means UTF8. Set just the one flag that is needed.
*	regcomp.c: Add comment	Karl Williamson	2011-02-25	1	-1/+2
*	Two functions shouldn't be declared inline	Karl Williamson	2011-02-21	1	-2/+2
*	Revert "regcomp: Add warning if tries to use \p in locale."	Karl Williamson	2011-02-19	1	-8/+1
	This reverts commit fb2e24cdda774d9e9c28f1cd0356bba9070894c7. This turned out to be contentious, and it is past the date for contentious changes.
*	Fix locale caseless matching and utf8	Karl Williamson	2011-02-19	1	-16/+66
	As explained in the doc changes of this patch, under /l, caseless matching of code points less than 256 now uses locale rules regardless of the utf8ness of the pattern or string. This now matches the behavior of things like \w in this regard.
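A minimal sketch of the rule, assuming the current LC_CTYPE locale's tolower() stands in for "locale rules": below 256 the comparison never asks whether the pattern or string is utf8. The function name is hypothetical, and the branch for larger code points is a placeholder rather than real Unicode folding.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

typedef unsigned long UV;   /* stand-in for Perl's UV type */

/* Caseless comparison under /l: code points below 256 use the locale,
 * independent of utf8ness (there is no utf8 parameter at all). */
static bool fold_eq_under_locale(UV a, UV b)
{
    if (a < 256 && b < 256)
        return tolower((int)a) == tolower((int)b);   /* current locale */
    /* Placeholder: Unicode folding above 255 is not modelled here. */
    return a == b;
}
```

In the default "C" locale, U+00C9 and U+00E9 compare unequal here; after a setlocale() call selecting a Latin-1 locale they may compare equal, depending on that locale's definitions.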
*	regcomp.c: no sharp ss tricky fold under locale	Karl Williamson	2011-02-19	1	-2/+4
*	regcomp.c: Fix some comments	Karl Williamson	2011-02-19	1	-13/+11
*	regcomp.c: Silence win32 compiler warnings	Karl Williamson	2011-02-15	1	-2/+2
*	Add /aa regex modifier	Karl Williamson	2011-02-14	1	-35/+142
	Tests for \N{} with this option will be added later.
*	regcomp.c: Add cast.	Karl Williamson	2011-02-14	1	-1/+1
	I found this through gdb. Sign extension was happening.
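The commit doesn't say which expression was affected, so the following is only a generic illustration of the kind of sign extension a cast fixes: on platforms where plain char is signed, a byte such as 0xE9 becomes negative when read as char, and converting that to an unsigned type yields a huge value rather than 0xE9. Both function names are hypothetical.

```c
#include <assert.h>

typedef unsigned long UV;   /* stand-in for Perl's UV type */

/* Unsafe: if char is signed, (UV)*s sign-extends bytes 0x80-0xFF. */
static UV byte_to_cp_unsafe(const char *s)
{
    return (UV)*s;
}

/* Safe: going through unsigned char keeps the value in 0..255. */
static UV byte_to_cp(const char *s)
{
    return (UV)(unsigned char)*s;
}
```

Whether the unsafe version misbehaves depends on the platform's char signedness, which is why such bugs tend to show up only under a debugger on some builds.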
*	regcomp.c: Handle more cases of tricky fold chars	Karl Williamson	2011-02-14	1	-0/+61
	Certain characters are not placed in EXACTish nodes because of problems mostly with the optimizer. However, not all notations that generated those characters were caught. This catches all but those in \N{} constructs; handling for those is coming later. This does not use FOLDCHAR, which doesn't know the difference between /d and /u; instead it uses ANYOFV, which does handle those cases already, at the expense of larger (in storage) regexes for these few characters. If this were deemed a problem, there would be some work involved in adding FOLDCHARU, and fixing the code where it doesn't work properly now.
*	regex: Add comments	Karl Williamson	2011-02-14	1	-2/+4
*	regcomp.c: Add comment	Karl Williamson	2011-02-14	1	-0/+2
*	regcomp.c: simplify conditional	Karl Williamson	2011-02-14	1	-9/+5
	A previous commit removed some things, so this block can be rearranged.
*	Add comments	Karl Williamson	2011-02-14	1	-0/+5
*	regcomp.c: Remove special handling for U+00DF	Karl Williamson	2011-02-14	1	-26/+0
	The code elsewhere is now better equipped to handle this.
*	regcomp.c: tell regexec more about multi-char folds	Karl Williamson	2011-02-14	1	-2/+24
	A multi-char fold that matches in the Latin1 range needs to have that fact communicated to regexec.
*	regcomp.c: Synthetic start class should include ord >255 folds	Karl Williamson	2011-02-14	1	-0/+26
	Some characters above 255 fold to the < 256 range. These need to be in the synthetic start class so the optimizer won't reject them. This is temporary code which creates false positives, to be replaced by more precise matching later.
*	regcomp.c: Be more precise about ANYOF matching flag	Karl Williamson	2011-02-14	1	-1/+1
	There are two flags for matching outside the ANYOF bitmap. Instead of setting both, set the corresponding one.
*	regcomp.c: Put two static functions in embed.fnc	Karl Williamson	2011-02-14	1	-22/+26
*	Update comment	Karl Williamson	2011-02-14	1	-4/+2
*	regex: Deprecate \b{ and \B{	Karl Williamson	2011-02-12	1	-0/+6
	This reserves them for future use by Perl.
*	regcomp.c: include { in unrecognized \q{ warning	Karl Williamson	2011-02-12	1	-2/+7
	The warning message about regex unrecognized escapes passed through is changed to include any literal '{' following the 2-char escape. E.g., "\q{" will include the { in the message as part of the escape.
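A minimal sketch of the rule described above, assuming the escape has already been classified as unrecognized: quote two characters normally, three when a literal '{' follows. Both helper names are hypothetical; the actual warning text and surrounding code in regcomp.c are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* p points at the backslash of an unrecognized 2-char escape like "\q".
 * Returns how many characters the warning should quote. */
static size_t escape_len_for_warning(const char *p)
{
    assert(p[0] == '\\' && p[1] != '\0');
    return p[2] == '{' ? 3 : 2;     /* include a following literal '{' */
}

/* Copy the text to quote into out, NUL-terminated (needs 4 bytes). */
static void escape_text_for_warning(const char *p, char out[4])
{
    size_t n = escape_len_for_warning(p);
    memcpy(out, p, n);
    out[n] = '\0';
}
```

Quoting the brace makes the message point at "\q{" as a unit, which matters once sequences like \b{ are reserved for future syntax.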
*	Fix up \cX for 5.14	Karl Williamson	2011-02-09	1	-2/+2
	Throughout 5.13 there was temporary code to deprecate and forbid certain values of X following a \c in qq strings. This patch fixes this to the final 5.14 semantics. These are:
	1) A utf8 non-ASCII character will croak. This is the same behavior as pre-5.13, but it gives a correct error message, rather than the malformed utf8 message previously.
	2) \c{ and \cX where X is above ASCII will generate a deprecated message. The intent is to remove these capabilities in 5.16. The original agreement was to croak on above ASCII, but that would violate our stability policy, so I'm deprecating it instead.
	3) A non-deprecation warning is generated for all other \cX; this is the same as throughout the 5.13 series.
	I did not have the tuits to use \c{} as I had planned in 5.14, but \N{} can be used instead.
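The arithmetic behind \cX and the first two rules above can be sketched as follows. For an ASCII X, \cX is X upper-cased with bit 0x40 flipped, so \cA is 0x01, \c? is 0x7F and \c@ is 0x00. The enum and function names are hypothetical, and rule 3's warning logic is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CTRL_CROAK, CTRL_DEPRECATED, CTRL_OTHER } ctrl_status;

/* Rules 1 and 2, applied to the character X that follows "\c".
 * CTRL_OTHER covers everything else; no warning decision is made here. */
static ctrl_status classify_ctrl(unsigned x, bool x_is_utf8)
{
    if (x > 0x7F && x_is_utf8)
        return CTRL_CROAK;              /* 1) utf8 non-ASCII X */
    if (x == '{' || x > 0x7F)
        return CTRL_DEPRECATED;         /* 2) \c{ or X above ASCII */
    return CTRL_OTHER;
}

/* Value of \cX for ASCII X: upper-case X, then flip bit 0x40. */
static unsigned ctrl_value(unsigned x)
{
    assert(x <= 0x7F);
    if (x >= 'a' && x <= 'z')
        x -= 'a' - 'A';
    return x ^ 0x40u;
}
```

The XOR form explains why \c? gives DEL (0x7F) rather than an error: '?' is 0x3F, and flipping 0x40 lands on 0x7F.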
*	regcomp: Add/subtract consts to match embed.fnc	Karl Williamson	2011-02-06	1	-1/+1
*	Silence compile warnings before uni tables built	Karl Williamson	2011-02-06	1	-7/+17
	The recent move of Unicode folding to the compilation phase caused spurious warnings during the miniperl build phase of Perl itself, before the Unicode tables get built. Before the tables are built, Perl is unable to know about the Unicode semantics (it has ASCII/Latin1 hard-coded in), but it was still trying to access the tables. Now it checks, and if the tables aren't present, it uses just the hard-coded ASCII/Latin1 semantics.
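A sketch of the fallback, assuming the Unicode lookup is reachable through a function pointer that is NULL until the tables exist; that is an assumption for illustration, not how Perl performs the check. It uses only simple, single-code-point folding, so cases such as U+00DF's fold to "ss" or U+00B5's fold to a code point outside Latin-1 are left out. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UV;   /* stand-in for Perl's UV type */

/* Table-driven fold; NULL while the tables have not been built. */
typedef UV (*table_fold_fn)(UV cp);

/* Hard-coded simple lowercase fold for ASCII/Latin-1: A-Z and
 * U+00C0..U+00DE, except U+00D7 MULTIPLICATION SIGN, move up by 0x20. */
static UV latin1_fold(UV cp)
{
    assert(cp < 256);
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7))
        return cp + 0x20;
    return cp;
}

static UV fold_cp(UV cp, table_fold_fn from_tables)
{
    if (from_tables != NULL)
        return from_tables(cp);                 /* full Unicode semantics */
    return cp < 256 ? latin1_fold(cp) : cp;     /* tables not built yet */
}
```

Checking availability once and falling back silently avoids the spurious warnings, at the cost of miniperl not knowing about folds above 255.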
*	Two Safefree() changes to make -DPERL_POISON builds work again.	George Greer	2011-02-06	1	-1/+2
	The poison exposes a failure in t/op/magic: panic: corrupt saved stack index at - line 6. FAILED at test 7.
*	Don't redefine Perl API functions in ext/re.	Craig A. Berry	2011-02-05	1	-0/+4
*	Move ANYOF folding from regexec to regcomp	Karl Williamson	2011-02-02	1	-25/+205
	This is for security as well as performance. It allows Unicode properties to be matched case-insensitively. As a result, the swash inversion hash now has numeric (code point) keys instead of utf8 keys. It also fixes, for the first time, the bug where /i doesn't work when a code point that is not at the end of a range in a bracketed character class has a multi-character fold.