path: root/regcomp.c
Commit message | Author | Age | Files | Lines
* Integrate perlio: | Jarkko Hietaniemi | 2001-03-28 | 1 | -8/+34
  [ 9400] More EBCDIC tweaks:
  - one more swash issue &~(0xA0-1) did not do the right thing, for UTF-EBCDIC where &~(0x80-1) does for UTF-8.
  - add "use re 'asciirange'" to make [!-~] etc. work use it in MIME::QuotedPrint and t/op/regexp.t and t/op/pat.t
  - Choose a key for t/op/each.t test which gets encoded.
  - Skip utf8decode if this is UTF-EBCDIC.
  p4raw-link: @9400 on //depot/perlio: daf0f78e031c718c75590ef9ef573756f805776e
  p4raw-id: //depot/perl@9407
* More EBCDIC stuff: | Nick Ing-Simmons | 2001-03-20 | 1 | -10/+6
  - Loose the extra level of function on ASCII.
  - spotted a chr(0) issue in sv.c
  - re-work of UTF-X tr/// ranges to work in Unicode space. Still issues with the "0xff is illegal UTF-8" hack.
  - Yet another ad. hoc. utf8 'upgrade' in op.c recoded (why do it once when you can do it all over the place :-(
  - Enable HINTS_UTF8 on EBCDIC - then ignore it in toke.c, need utf8.pm for swashes.
  - Simplified and commented scan_const() in toke.c
  Still something wrong regexp and tr (swashes?).
  p4raw-id: //depot/perlio@9267
* Integrate changes #9137,9138,9142 from maintperl into mainline. | Jarkko Hietaniemi | 2001-03-14 | 1 | -3/+2
  fix leak in pregcomp() when RE fails to compile (e.g. m/\\/)
  remove squelch controls for "Scalars leaked" messages in most places (these are now cured)
  fix another memory leak reported by purify (tie callbacks that croak can leak when wiping out magic)
  p4raw-link: @9142 on //depot/maint-5.6/perl: 26972843796e21c404c9d13ec5ee86e7b952a2bd
  p4raw-link: @9138 on //depot/maint-5.6/perl: ad7f1144250940f9ca43bac32708ec5e718b30ff
  p4raw-link: @9137 on //depot/maint-5.6/perl: 1f35595ecca168b4f66e3399344799fdbd496d17
  p4raw-id: //depot/perl@9144
  p4raw-integrated: from //depot/maint-5.6/perl@9143
    'copy in' t/pragma/strict-vars (@7318..) t/pragma/warn/regcomp (@7887..) t/op/regexp.t (@8551..) t/op/lex_assign.t (@8987..)
    'merge in' t/op/local.t (@5902..) t/pragma/warn/op (@7846..) t/pragma/warnings.t (@7895..) t/comp/proto.t (@8173..) t/pragma/warn/toke (@8570..) regcomp.c (@8777..) scope.c (@8855..) t/op/pat.t (@9076..)
* regcomp.c is working in native space, not Unicode space (if different) | Nick Ing-Simmons | 2001-03-11 | 1 | -9/+8
  as it is doing compare against 'W' in \W etc.
  p4raw-id: //depot/perlio@9106
* Audit #ifdef EBCDIC and #ifndef ASCIIish, replace latter with former. | Nick Ing-Simmons | 2001-03-11 | 1 | -20/+7
  Use ASCII_TO_NATIVE and NATIVE_TO_ASCII to avoid some #ifs.
  p4raw-id: //depot/perlio@9105
* Fix for ID 20010306.008, UTF-8 and \w without 'use utf8' coredump. | Jarkko Hietaniemi | 2001-03-10 | 1 | -18/+0
  p4raw-id: //depot/perl@9098
* EBCDIC sanity - phase I | Nick Ing-Simmons | 2001-03-10 | 1 | -11/+11
  - rename utf8/uv functions to indicate what sort of uv they provide (uvuni/uvchr)
  - use utf8n_xxxx (c.f. pvn) for forms which take length.
  - back out vN.N and $^V exceptions to e2a/a2e
  - make "locale" isxxx macros be uvchr (may be redundant?)
  Not clear yet that toUPPER_uni et. al. return being handled correctly.
  The tr// and rexexp stuff still needs an audit, assumption is they are working in Unicode space.
  Need to provide v5.6 names for XS modules (decide is uni or chr ?).
  p4raw-id: //depot/perlio@9096
* Make /x{abcd}/ to work without use utf8. | Jarkko Hietaniemi | 2001-03-06 | 1 | -0/+2
  p4raw-id: //depot/perl@9058
* Retract #8929,8930,8932,8933 for now. | Jarkko Hietaniemi | 2001-02-25 | 1 | -31/+43
  p4raw-id: //depot/perl@8935
* (Retracted by #8395.) | Jarkko Hietaniemi | 2001-02-25 | 1 | -43/+31
  Attempt to fix the EBCDIC character range problem with //.
  p4raw-id: //depot/perl@8930
* Misapplied regex optimizations when \C is present. | Jarkko Hietaniemi | 2001-02-18 | 1 | -0/+3
  Fixes 20001230.002. What still remains broken is that the submatches that have \C in them get their UTF8 flag on because their parent SV has it on. This will result in malformed UTF8 if a \C happened to match a non-ASCII byte.
  p4raw-id: //depot/perl@8836
* Re: [ID 20010212.006] Core dump with /((?:hard|soft)cover)?/ | Hugo van der Sanden | 2001-02-13 | 1 | -6/+4
  Message-Id: <200102130011.AAA14310@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@8779
* Manually applied version for dev branch of Alan/Sarathy 5.6 patch. | Alan Burlison | 2001-02-07 | 1 | -118/+117
  Subject: Re: Incorrect scoping of PL_reg_start_tmp causes leak
  Message-Id: <3A808A9D.20F7A035@uk.sun.com>
  p4raw-id: //depot/perl@8711
* regcomp.c old feature removal | Mark-Jason Dominus | 2001-01-16 | 1 | -5/+0
  Message-ID: <20010116144318.7140.qmail@plover.com>
  p4raw-id: //depot/perl@8455
* One more patch for UTF8 | Inaba Hiroto | 2001-01-09 | 1 | -1/+5
  Message-ID: <3A59E510.52BAB5B9@st.rim.or.jp>
  UTF-8 fixes for 'x' and tr///.
  p4raw-id: //depot/perl@8378
* UTF-8 cleanup. | Jarkko Hietaniemi | 2001-01-05 | 1 | -1/+5
  p4raw-id: //depot/perl@8328
* Bump up Larry's copyright. | Jarkko Hietaniemi | 2001-01-01 | 1 | -1/+1
  p4raw-id: //depot/perl@8289
* more UTF8 test suites and an UTF8 patch | Inaba Hiroto | 2000-12-30 | 1 | -40/+89
  Message-ID: <3A4D722D.243AFD88@st.rim.or.jp>
  Just the patch part for now, and the pragma renamed as unicode::distinct.
  p4raw-id: //depot/perl@8267
* Comments work so much better when they are closed. | Jarkko Hietaniemi | 2000-12-18 | 1 | -1/+1
  p4raw-id: //depot/perl@8184
* Some compilers (e.g. HP-UX) can't switch on 64-bit integers. | Jarkko Hietaniemi | 2000-12-18 | 1 | -2/+8
  Fixes the bug 20001218.016.
  p4raw-id: //depot/perl@8183
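  The log does not say how the fix was made. As a hedged illustration only, one common way to avoid a `switch` on a 64-bit value is an if/else chain. The function name, enum, and cases below are hypothetical and are not taken from regcomp.c.

```c
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
enum esc_kind { ESC_WORD, ESC_DIGIT, ESC_SPACE, ESC_OTHER };

/* Classify a character code held in a 64-bit integer without using
 * `switch (code)`, which the compilers mentioned in the log reject.
 * Each comparison is an ordinary 64-bit equality test. */
static enum esc_kind classify_escape(uint64_t code)
{
    if (code == (uint64_t)'w')
        return ESC_WORD;
    else if (code == (uint64_t)'d')
        return ESC_DIGIT;
    else if (code == (uint64_t)'s')
        return ESC_SPACE;
    return ESC_OTHER;
}
```

  An alternative under the same assumption is to narrow the value to `int` before the `switch`, once it is known to fit.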
* Polymorphic regexps. | Jarkko Hietaniemi | 2000-12-17 | 1 | -501/+339
  Fixes at least the bugs 20001028.003 (both of them...) and 20001108.001. The bugs 20001114.001 and 20001205.014 seem also to be fixed by now, probably already before this patch.
  p4raw-id: //depot/perl@8143
* dTHR is a nop in 5.6.0 onwards. Ergo, it can go. | Jarkko Hietaniemi | 2000-12-05 | 1 | -26/+0
  p4raw-id: //depot/perl@7984
* On DEBUGGING make ANYOFUTF8 nodes store away also the SV | Jarkko Hietaniemi | 2000-12-03 | 1 | -2/+40
  used to swash_init(), makes regprop() dumps more informative (+utf8::IsAlpha, -utf8::IsDigit, for example).
  p4raw-id: //depot/perl@7969
* Implement ANYOFUTF8 regprop() dumping. | Jarkko Hietaniemi | 2000-12-03 | 1 | -10/+39
  p4raw-id: //depot/perl@7968
* Make uv_to_utf8() to zero-terminate its output buffer, | Jarkko Hietaniemi | 2000-12-03 | 1 | -7/+1
  always use (at least) UTF8_MAXLEN + 1 U8s deep buffer.
  p4raw-id: //depot/perl@7967
* Get the three different space character classes right under utf8. | Jarkko Hietaniemi | 2000-12-01 | 1 | -7/+8
  p4raw-id: //depot/perl@7940
* \x{} doesn't any more require 'use utf8' outside regexen so why | Jarkko Hietaniemi | 2000-12-01 | 1 | -7/+1
  should it be required inside regexen?
  p4raw-id: //depot/perl@7938
* Fix for 20001130.008 and 20001130.010, the PL_regnpar wasn't | Jarkko Hietaniemi | 2000-12-01 | 1 | -0/+1
  stored and restored, and thusly was trounced by the utf8 swash routines.
  p4raw-id: //depot/perl@7937
* Debug dump of ANYOFUTF8 was garbage (data from ANYOF). | Jarkko Hietaniemi | 2000-11-26 | 1 | -16/+24
  Not really fixed (should really dump the UTF-8 charclass), but stopped displaying the garbage. Also add a note on the (missing) Unicode PSXSPC and BLANK.
  p4raw-id: //depot/perl@7874
* Message nit. | Jarkko Hietaniemi | 2000-11-26 | 1 | -1/+1
  p4raw-id: //depot/perl@7870
* Fixes for signedness warnings noticed by VMSperlers. | Jarkko Hietaniemi | 2000-11-22 | 1 | -2/+6
  p4raw-id: //depot/perl@7824
* Overeager visited-positions optimizations | Ilya Zakharevich | 2000-11-22 | 1 | -7/+27
  Message-ID: <20001120183051.A15228@monk.mps.ohio-state.edu>
  p4raw-id: //depot/perl@7815
* [PATCH 5.7.0] make regcomp reenterable | Ilya Zakharevich | 2000-11-18 | 1 | -578/+624
  Date: Fri, 17 Nov 2000 20:35:11 -0500
  Message-ID: <20001117203511.A13121@monk.mps.ohio-state.edu>
  Subject: Re: [PATCH 5.7.0] make regcomp reenterable
  From: Ilya Zakharevich <ilya@math.ohio-state.edu>
  Date: Fri, 17 Nov 2000 21:03:47 -0500
  Message-ID: <20001117210347.A16570@monk.mps.ohio-state.edu>
  Plus a little bit of tweaking in pregcomp().
  p4raw-id: //depot/perl@7741
* restore match data on backtracing | Ilya Zakharevich | 2000-11-18 | 1 | -10/+32
  Message-ID: <20001117172802.A1032@monk.mps.ohio-state.edu>
  p4raw-id: //depot/perl@7733
* Too profiler-happy: with optimization the #7590 actually makes | Jarkko Hietaniemi | 2000-11-07 | 1 | -3/+6
  the test to run 0.5% _slower_. Requires much more instrumentation. Retract #7590.
  p4raw-id: //depot/perl@7591
* Shave off about 5% (Digital UNIX, -g, pixie) of the op/regexp | Jarkko Hietaniemi | 2000-11-07 | 1 | -6/+3
  execution time in regcomp.c S_cl_any() and S_cl_is_anything() by using memset() and testing bytewise (as opposed to bitwise).
  p4raw-id: //depot/perl@7590
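  The idea described in #7590 (which the log shows was retracted again in #7591) can be sketched roughly as follows. This is not the code from regcomp.c: the bitmap size and the function names are assumptions for illustration, standing in for S_cl_any() and S_cl_is_anything().

```c
#include <stddef.h>
#include <string.h>

/* Assumed size: 256 bits, one per byte value. Hypothetical name. */
#define BITMAP_BYTES 32

/* Mark every character as a member of the class with a single memset(),
 * instead of setting 256 bits one at a time. */
static void class_set_all(unsigned char *bitmap)
{
    memset(bitmap, 0xFF, BITMAP_BYTES);
}

/* Return 1 if every character is a member. Whole bytes are compared
 * against 0xFF, so the loop runs 32 times rather than 256. */
static int class_is_all(const unsigned char *bitmap)
{
    size_t i;
    for (i = 0; i < BITMAP_BYTES; i++)
        if (bitmap[i] != 0xFF)
            return 0;
    return 1;
}
```

  Whether this is faster in practice depends on the compiler and optimization level, which is the point made by the retraction in #7591.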
* [ID 20001031.004] Uninitialized auto variable in regcomp.c | Martin Husemann | 2000-11-01 | 1 | -2/+2
  Message-Id: <200010312239.e9VMdZR01580@night-porter.duskware.de>
  p4raw-id: //depot/perl@7512
* Continue the internal UTF-8 API tweaking. | Jarkko Hietaniemi | 2000-10-25 | 1 | -3/+3
  Rename utf8_to_uv_chk() back to utf8_to_uv() because it's used much more than the simpler API, now called utf8_to_uv_simple(). Still not quite happy with API, too much partial duplication of functionality.
  p4raw-id: //depot/perl@7439
* Make the UTF-8 decoding stricter and more verbose when | Jarkko Hietaniemi | 2000-10-24 | 1 | -8/+13
  malformation happens. This involved adding an argument to utf8_to_uv_chk(), which involved changing its prototype, and prefer STRLEN over I32 for the UTF-8 length, which as a domino effect necessitated changing the prototypes of scan_bin(), scan_oct(), scan_hex(), and reg_uni().
  The stricter UTF-8 decoding checking uses Markus Kuhn's UTF-8 Decode Stress Tester from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
  p4raw-id: //depot/perl@7416
* Re-instate Perl_utf8_to_uv without checking parameter - added in change 7075. | Nick Ing-Simmons | 2000-09-30 | 1 | -3/+3
  i.e. rename Simon's function to Perl_utf8_to_uv_chk, change all calls to it to use new name and add Perl_utf8_to_uv() as a wrapper which calls it passing 0 to checking to get the warning.
  p4raw-id: //depot/perl@7096
* continued -Wformat support | Robin Barker | 2000-09-14 | 1 | -11/+11
  Message-Id: <200009141707.SAA13276@tempest.npl.co.uk>
  p4raw-id: //depot/perl@7081
* Fix for a parsing bug, not for the original bug. | Spider Boardman | 2000-09-14 | 1 | -0/+3
  Subject: Re: [ID 20000910.005] Another segfault with regexes.
  Message-Id: <200009132152.RAA24029@leggy.zk3.dec.com>
  p4raw-id: //depot/perl@7076
* Batch of UTF-8 patches from Simon Cozens. | Jarkko Hietaniemi | 2000-09-14 | 1 | -3/+3
  p4raw-id: //depot/perl@7075
* nextchar() abuse misses an optimisation | Hugo van der Sanden | 2000-08-22 | 1 | -2/+2
  Message-Id: <200008221021.LAA03332@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@6770
* Rename the macro argument because some preprocessors | Jarkko Hietaniemi | 2000-08-21 | 1 | -8/+8
  can't tell the difference and expand arguments also inside double quoted strings.
  p4raw-id: //depot/perl@6747
* Fix a core dump in lib/selfloader under -DDEBUGGING. | Spider.Boardman@Orb.Nashua.NH.US | 2000-08-19 | 1 | -2/+7
  Subject: PATCH @6698 for [ID 20000817.007] Not OK: perl v5.7.0 +SUIDMAIL +DEVEL6676 on alpha-dec_osf 4.0f (UNINSTALLED)
  Message-Id: <200008182241.SAA29667@Orb.Nashua.NH.US>
  p4raw-id: //depot/perl@6709
* Add [[:blank:]] as suggested in | Jeffrey Friedl | 2000-08-18 | 1 | -6/+55
  Subject: [ID 20000716.024] [=cc=] / [:blank:]
  Message-Id: <200007170055.RAA23528@fummy.dsl.yahoo.com>
  (the [=cc=] has already been taken care of by #6439 so the whole bug report can be closed) and make [[:space:]] to be equivalent to isspace(3) (as opposed to \s, which is isSPACE()). The difference is that now [[:space:]] matches the mythical vertical tab, while \s doesn't.
  p4raw-id: //depot/perl@6703
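  The difference between the two classes can be illustrated with a small sketch. `backslash_s_like()` is a hypothetical stand-in for the \s behaviour described above (space, tab, newline, carriage return, form feed, but not vertical tab). It is not Perl's isSPACE() macro, and the comparison assumes the default "C" locale for isspace(3).

```c
#include <ctype.h>

/* Hypothetical approximation of what \s matched at the time of this
 * change: the isspace(3) set in the "C" locale minus vertical tab. */
static int backslash_s_like(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

/* Return 1 if the two classes disagree on character c.
 * In the "C" locale this is true only for '\v'. */
static int classes_differ(int c)
{
    return (isspace(c) != 0) != (backslash_s_like(c) != 0);
}
```

  For plain ASCII input in the "C" locale, `classes_differ('\v')` is the only case that returns 1, matching the "mythical vertical tab" remark in the commit message.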
* Tweak the regex compilation errors once more. | Jarkko Hietaniemi | 2000-08-17 | 1 | -3/+3
  p4raw-id: //depot/perl@6663
* Change the regx compilation error markers to use = instead of < | Lupe Christoph | 2000-08-16 | 1 | -2/+2
  since pod makes using the latter quite messy. Reported in ID 20000814.006 by Abigail and in
  Subject: Unknown escape E<> ?
  Message-ID: <20000811003027.F17420@alanya.lupe-christoph.de>
  p4raw-id: //depot/perl@6653
* Get back into sync with Jeffrey on the enhanced regex warnings. | Jarkko Hietaniemi | 2000-08-10 | 1 | -6/+6
  p4raw-id: //depot/perl@6563