path: root/regexec.c
Commit message | Author | Age | Files | Lines
* Re: [ID 20010506.041] segfault when matching utf8 string | Inaba Hiroto | 2001-05-25 | 1 | -0/+3
  Message-Id: <200105250124.KAA19571@toshiba.co.jp>
  p4raw-id: //depot/perl@10206
* [LARGE!] symbolic magic | Dave Mitchell | 2001-05-20 | 1 | -6/+9
  Message-Id: <200105191912.UAA23925@gizmo.fdgroup.co.uk>
  p4raw-id: //depot/perl@10168
* small lookbehind fix | Hugo van der Sanden | 2001-05-17 | 1 | -28/+17
  Message-Id: <200105172307.AAA06142@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@10152
* Re: [PATCH bleadperl] [ID 20010426.002] Word boundry regex [...] | Hugo van der Sanden | 2001-04-30 | 1 | -28/+12
  Message-Id: <200104291609.RAA17790@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@9911
* In character classes one couldn't have 0x80..0xff characters | Jarkko Hietaniemi | 2001-04-29 | 1 | -32/+33
  at the left hand side if there were 0x100.. characters in the character class.
  p4raw-id: //depot/perl@9901
* A more minimal fix for 20010410.006 from Hugo. | Jarkko Hietaniemi | 2001-04-11 | 1 | -9/+12
  p4raw-id: //depot/perl@9682
* Integrate changes #9675,9676 from maintperl into mainline. | Jarkko Hietaniemi | 2001-04-11 | 1 | -11/+9
  fix for bug 20010410.006, undo change#7115
  port the OpenBSD glob() security patch
  p4raw-link: @9676 on //depot/maint-5.6/perl: 3f3c3e312f619efa81ad88565a24e92f15dff662
  p4raw-link: @9675 on //depot/maint-5.6/perl: c84593816ace2807d5ff27bb0745a28ec29187b1
  p4raw-link: @7115 on //depot/perl: 5675c1a6395a0842c857fc8de159747577df6c4b
  p4raw-id: //depot/perl@9677
  p4raw-integrated: from //depot/maint-5.6/perl@9672 'copy in' ext/File/Glob/bsd_glob.h (@9264..) ext/File/Glob/bsd_glob.c (@9512..) ext/File/Glob/Glob.xs (@9545..) 'merge in' t/op/pat.t (@9138..) regexec.c (@9288..) ext/File/Glob/Glob.pm (@9512..)
* Not OK: perl v5.7.0 +DEVEL9472 on VMS_AXP V7.1 (UNINSTALLED) | Peter Prymmer | 2001-03-31 | 1 | -1/+7
  Message-ID: <Pine.OSF.4.10.10103301805450.63762-100000@aspara.forte.com>
  p4raw-id: //depot/perl@9485
* More EBCDIC stuff: | Nick Ing-Simmons | 2001-03-20 | 1 | -5/+5
  - Lose the extra level of function on ASCII.
  - spotted a chr(0) issue in sv.c
  - re-work of UTF-X tr/// ranges to work in Unicode space. Still issues with the "0xff is illegal UTF-8" hack.
  - Yet another ad. hoc. utf8 'upgrade' in op.c recoded (why do it once when you can do it all over the place :-(
  - Enable HINTS_UTF8 on EBCDIC - then ignore it in toke.c, need utf8.pm for swashes.
  - Simplified and commented scan_const() in toke.c
  Still something wrong regexp and tr (swashes?).
  p4raw-id: //depot/perlio@9267
* Uninitialized Memory Read in regexec.c | Gurusamy Sarathy | 2001-03-14 | 1 | -1/+1
  p4raw-id: //depot/perl@9148
* Fix for ID 20010306.008, UTF-8 and \w without 'use utf8' coredump. | Jarkko Hietaniemi | 2001-03-10 | 1 | -1/+23
  p4raw-id: //depot/perl@9098
* EBCDIC sanity - phase I | Nick Ing-Simmons | 2001-03-10 | 1 | -95/+95
  - rename utf8/uv functions to indicate what sort of uv they provide (uvuni/uvchr)
  - use utf8n_xxxx (c.f. pvn) for forms which take length.
  - back out vN.N and $^V exceptions to e2a/a2e
  - make "locale" isxxx macros be uvchr (may be redundant?)
  Not clear yet that toUPPER_uni et. al. return being handled correctly. The tr// and regexp stuff still needs an audit, assumption is they are working in Unicode space. Need to provide v5.6 names for XS modules (decide is uni or chr ?).
  p4raw-id: //depot/perlio@9096
* Re: [ PATCH perl@8956 ] new debug option -DR shows ref counts | Dave Mitchell | 2001-03-09 | 1 | -1/+1
  Message-Id: <200103081206.MAA06281@tiree.fdgroup.co.uk>
  p4raw-id: //depot/perl@9084
* Misapplied regex optimizations when \C is present. | Jarkko Hietaniemi | 2001-02-18 | 1 | -18/+9
  Fixes 20001230.002. What still remains broken is that the submatches that have \C in them get their UTF8 flag on because their parent SV has it on. This will result in malformed UTF8 if a \C happened to match a non-ASCII byte.
  p4raw-id: //depot/perl@8836
* Retract #8762. | Jarkko Hietaniemi | 2001-02-11 | 1 | -1/+16
  p4raw-id: //depot/perl@8769
* (Retracted by #8769) | Jarkko Hietaniemi | 2001-02-10 | 1 | -16/+1
  p4raw-id: //depot/perl@8762
* More documentation for the regexp context stack. | Jarkko Hietaniemi | 2001-01-27 | 1 | -2/+7
  p4raw-id: //depot/perl@8566
* Document the regex content pushing/popping a bit better. | Jarkko Hietaniemi | 2001-01-21 | 1 | -7/+16
  p4raw-id: //depot/perl@8510
* One more UTF-8 fix from Inaba Hiroto. | Jarkko Hietaniemi | 2001-01-12 | 1 | -3/+2
  p4raw-id: //depot/perl@8415
* Mea culpa: I botched up Hugo's "Tw" bug fix when applying it. | Jarkko Hietaniemi | 2001-01-12 | 1 | -2/+2
  p4raw-id: //depot/perl@8414
* Re: [ID 20001029.005] Regex error: "cd. (A. Tw)" !~ /\((\w\. \w+)\)/ | Hugo van der Sanden | 2001-01-11 | 1 | -1/+1
  Message-Id: <200010300133.BAA10390@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@8403
* One more patch for UTF8 | Inaba Hiroto | 2001-01-09 | 1 | -0/+2
  Message-ID: <3A59E510.52BAB5B9@st.rim.or.jp>
  UTF-8 fixes for 'x' and tr///.
  p4raw-id: //depot/perl@8378
* UTF-8 cleanup. | Jarkko Hietaniemi | 2001-01-05 | 1 | -1/+1
  p4raw-id: //depot/perl@8328
* Bump up Larry's copyright. | Jarkko Hietaniemi | 2001-01-01 | 1 | -1/+1
  p4raw-id: //depot/perl@8289
* more UTF8 test suites and an UTF8 patch | Inaba Hiroto | 2000-12-30 | 1 | -181/+338
  Message-ID: <3A4D722D.243AFD88@st.rim.or.jp>
  Just the patch part for now, and the pragma renamed as unicode::distinct.
  p4raw-id: //depot/perl@8267
* [ID 20001218.005] Not OK: perl v5.7.0 +DEVEL8148 on powerpc-machten 4.1.4 | Dominic Dunlop | 2000-12-19 | 1 | -0/+1
  Message-Id: <p04320404b6639e7aa043@[192.168.1.4]>
  This patchlet is needed in order that perl can be statically linked.
  p4raw-id: //depot/perl@8191
* Polymorphic regexps. | Jarkko Hietaniemi | 2000-12-17 | 1 | -663/+650
  Fixes at least the bugs 20001028.003 (both of them...) and 20001108.001. The bugs 20001114.001 and 20001205.014 seem also to be fixed by now, probably already before this patch.
  p4raw-id: //depot/perl@8143
* dTHR is a nop in 5.6.0 onwards. Ergo, it can go. | Jarkko Hietaniemi | 2000-12-05 | 1 | -14/+0
  p4raw-id: //depot/perl@7984
* On DEBUGGING make ANYOFUTF8 nodes store away also the SV | Jarkko Hietaniemi | 2000-12-03 | 1 | -4/+15
  used to swash_init(), makes regprop() dumps more informative (+utf8::IsAlpha, -utf8::IsDigit, for example).
  p4raw-id: //depot/perl@7969
* Make uv_to_utf8() to zero-terminate its output buffer, | Jarkko Hietaniemi | 2000-12-03 | 1 | -1/+1
  always use (at least) UTF8_MAXLEN + 1 U8s deep buffer.
  p4raw-id: //depot/perl@7967
* Get the three different space character classes right under utf8. | Jarkko Hietaniemi | 2000-12-01 | 1 | -1/+1
  p4raw-id: //depot/perl@7940
* It seems that *both* the unused submatch loop cleanup | Jarkko Hietaniemi | 2000-11-27 | 1 | -6/+14
  codes are needed.
  p4raw-id: //depot/perl@7881
* The code in regcppop() (see #7878) contains the correct lower | Jarkko Hietaniemi | 2000-11-27 | 1 | -2/+2
  limit for the unused submatch 'cleanup' loop so that under "use utf8" the following code wouldn't dump core:
    "," =~ /([^,]*,)*/
  With the wrong lower limit (>=1) the cleanup loop in regtry() stomped beyond allocated area in the startp[] array. Therefore, copied the correct lower loop limit (*PL_reglastparen) to regtry().
  Note: something may still not be quite right: why was the _higher_ loop limit (prog->nparens) different in the utf8 case?
  After this patch "./perl -Ilib -Mutf8 t/op/regexp.t" works without core dumps, there were about 17 of them before the patch (with us since Perl 5.7.0). Two failures, still: 496 and 505 (though these may not be severe).
  Patch #7881 is also needed since both the cleanup loops seem to be needed.
  Also, the t/op/pat#44 seems to core dump under utf8. Plus a couple of failures. UGH-8.
  p4raw-id: //depot/perl@7879
* The unused submatch cleanup code in regtry() seems to be more crucial, | Jarkko Hietaniemi | 2000-11-27 | 1 | -1/+17
  the code in regcppop() seems to be redundant for the test suite -- but it contains a germ of truth, and it is needed for the build process itself: see #7879 and #7881.
  p4raw-id: //depot/perl@7878
* Comment on comment. | Jarkko Hietaniemi | 2000-11-27 | 1 | -1/+2
  p4raw-id: //depot/perl@7877
* BOUND regex opcodes (\b, \B) could try to scan zero length UTF-8. | Jarkko Hietaniemi | 2000-11-26 | 1 | -14/+29
  p4raw-id: //depot/perl@7873
* The first step in removing recursion from the REx engine | Ilya Zakharevich | 2000-11-20 | 1 | -51/+142
  Message-ID: <20001119223026.A5165@monk.mps.ohio-state.edu>
  p4raw-id: //depot/perl@7760
* restore match data on backtracking | Ilya Zakharevich | 2000-11-18 | 1 | -1/+7
  Message-ID: <20001117172802.A1032@monk.mps.ohio-state.edu>
  p4raw-id: //depot/perl@7733
* Continue the internal UTF-8 API tweaking. | Jarkko Hietaniemi | 2000-10-25 | 1 | -4/+4
  Rename utf8_to_uv_chk() back to utf8_to_uv() because it's used much more than the simpler API, now called utf8_to_uv_simple(). Still not quite happy with API, too much partial duplication of functionality.
  p4raw-id: //depot/perl@7439
* Make the UTF-8 decoding stricter and more verbose when | Jarkko Hietaniemi | 2000-10-24 | 1 | -4/+9
  malformation happens. This involved adding an argument to utf8_to_uv_chk(), which involved changing its prototype, and prefer STRLEN over I32 for the UTF-8 length, which as a domino effect necessitated changing the prototypes of scan_bin(), scan_oct(), scan_hex(), and reg_uni().
  The stricter UTF-8 decoding checking uses Markus Kuhn's UTF-8 Decode Stress Tester from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
  p4raw-id: //depot/perl@7416
* Re: [ID 20001021.005] SEGV with regex match | Hugo van der Sanden | 2000-10-23 | 1 | -4/+5
  Message-Id: <200010222347.AAA09697@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@7407
* Minor optimization in re_intuit_start | Ilya Zakharevich | 2000-10-02 | 1 | -8/+10
  Message-ID: <20000928215531.A4315@monk.mps.ohio-state.edu>
  p4raw-id: //depot/perl@7115
* Re-instate Perl_utf8_to_uv without checking parameter - added in change 7075. | Nick Ing-Simmons | 2000-09-30 | 1 | -4/+4
  i.e. rename Simon's function to Perl_utf8_to_uv_chk, change all calls to it to use new name and add Perl_utf8_to_uv() as a wrapper which calls it passing 0 to checking to get the warning.
  p4raw-id: //depot/perl@7096
* Batch of UTF-8 patches from Simon Cozens. | Jarkko Hietaniemi | 2000-09-14 | 1 | -4/+4
  p4raw-id: //depot/perl@7075
* Fix for | Marc Lehmann | 2000-09-07 | 1 | -1/+0
  Subject: [ID 20000903.001] \w in utf8-strings
  Message-Id: <E13VUS5-0000cv-00.pgcc-forever-2000-09-03-09-44-29@fuji>
  and various related nits.
  p4raw-id: //depot/perl@7030
* Add [[:blank:]] as suggested in | Jeffrey Friedl | 2000-08-18 | 1 | -1/+5
  Subject: [ID 20000716.024] [=cc=] / [:blank:]
  Message-Id: <200007170055.RAA23528@fummy.dsl.yahoo.com>
  (the [=cc=] has already been taken care of by #6439 so the whole bug report can be closed) and make [[:space:]] to be equivalent to isspace(3) (as opposed to \s, which is isSPACE()).
  The difference is that now [[:space:]] matches the mythical vertical tab, while \s doesn't.
  p4raw-id: //depot/perl@6703
* Re: [ID 20000809.005] trouble with long string and /m modifier - uninitialized value | Hugo van der Sanden | 2000-08-11 | 1 | -2/+11
  Message-Id: <200008101823.TAA23580@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@6591
* [ID 20000731.010] regex error | Hugo van der Sanden | 2000-08-02 | 1 | -1/+1
  Message-Id: <200008021353.OAA24761@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@6493
* Replace change #6337 with a better one. | Hugo van der Sanden | 2000-07-14 | 1 | -34/+20
  Subject: Re: [PATCH] [ID 20000701.002] Regular Expressions Not Unsetting $1 Vars When Backtracking
  Message-Id: <200007140316.EAA15857@crypt.compulink.co.uk>
  p4raw-link: @6337 on //depot/cfgperl: f06a1d4e6ae96bf8af49f0ef1c79f500d8de0143
  p4raw-id: //depot/cfgperl@6395
* [ID 20000701.002] Regular Expressions Not Unsetting $1 Vars When Backtracking | Hugo van der Sanden | 2000-07-11 | 1 | -0/+2
  Message-Id: <200007111144.MAA04446@crypt.compulink.co.uk>
  p4raw-id: //depot/cfgperl@6337