path: root/regcomp.c
Commit message | Author | Age | Files | Lines
* More UTF-8 EXACT tweaking, plus a forgotten UTF-8 toggle-on from the encoding pragma.
  p4raw-id: //depot/perl@12872
  Author: Jarkko Hietaniemi; Age: 2001-11-06; Files: 1; Lines: -1/+2
* Implement the encoding pragma for regex literals.
  p4raw-id: //depot/perl@12864
  Author: Jarkko Hietaniemi; Age: 2001-11-06; Files: 1; Lines: -0/+15
* Regex debugging fixes from Hugo.
  p4raw-id: //depot/perl@12858
  Author: Jarkko Hietaniemi; Age: 2001-11-05; Files: 1; Lines: -1/+1
* Don't bother doing POSIX charclass parsing if it possibly cannot be so. Prepares way for charclass syntax like [[abc]||[def]] (or just [[abc][def]]) for union, [[\w]&&[$a]] for intersection, and [[a-z]&&[^def]] for subtraction. Currently /[[a]/ (or /[a[]/) parses as a character class containing two characters, "[" and "a", this may have to be broken for the syntax described above, otherwise we would have to scan the whole pattern to find out whether the square brackets match pairwise. Luckily, the special case of "[" doesn't seem to be documented (as opposed to "]" and "-"), so we may have better story for breaking it... One can always use \[ if one wants a literal "[", so there.
  p4raw-id: //depot/perl@12835
  Author: Jarkko Hietaniemi; Age: 2001-11-03; Files: 1; Lines: -18/+22
* Comment correction.
  p4raw-id: //depot/perl@12834
  Author: Jarkko Hietaniemi; Age: 2001-11-03; Files: 1; Lines: -1/+1
* STRLEN != int.
  p4raw-id: //depot/perl@12658
  Author: Jarkko Hietaniemi; Age: 2001-10-25; Files: 1; Lines: -1/+1
* This takes care of some of the re 'debug' cases but not all cases since the information whether the pattern or the target are utf8 seems to be either lost or not spread widely enough, sigh.
  p4raw-id: //depot/perl@12631
  Author: Jarkko Hietaniemi; Age: 2001-10-25; Files: 1; Lines: -2/+3
* Dump Unicode better for re 'debug'. The regprop() is unfinished since have to figure out how to detect Unicodeness in there.
  p4raw-id: //depot/perl@12621
  Author: Jarkko Hietaniemi; Age: 2001-10-24; Files: 1; Lines: -4/+14
* Make the toupper/lower/title API for Unicode not right but at least less wrong: prepare for the mapping being more than just one-character-to-one-character.
  p4raw-id: //depot/perl@12371
  Author: Jarkko Hietaniemi; Age: 2001-10-09; Files: 1; Lines: -4/+4
* Be careful to pull chars from the varargs stack when formatting chars.
  p4raw-id: //depot/perl@12292
  Author: Jarkko Hietaniemi; Age: 2001-10-01; Files: 1; Lines: -3/+5
* Further tweaks to the Unicode properties.
  p4raw-id: //depot/perl@12286
  Author: Jarkko Hietaniemi; Age: 2001-10-01; Files: 1; Lines: -2/+2
* More leniency to the \p and \P: now can have whitespace between the property definition and the curlies; now can invert the property by having a caret between the open curly and the property.
  p4raw-id: //depot/perl@12269
  Author: Jarkko Hietaniemi; Age: 2001-09-29; Files: 1; Lines: -5/+20
* Re: the remaining bugs in \x escapes (was Re: [PATCH] oct and hex in glorious 64 bit (with less bugs) (was Re: hex and oct again (was Re: FreeBSD MD5 crypt? Re: crypt/hex/oct and Unicode?)))
  Message-ID: <20010911000031.G1512@plum.flirble.org>
  p4raw-id: //depot/perl@11990
  Author: Nicholas Clark; Age: 2001-09-10; Files: 1; Lines: -4/+6
* Using strlen() not good on embedded nul bytes.
  p4raw-id: //depot/perl@11967
  Author: Jarkko Hietaniemi; Age: 2001-09-10; Files: 1; Lines: -13/+13
* [PATCH regcomp.c] zero-width assertions CAN be ?'d
  Date: Sat, 8 Sep 2001 15:42:30 -0400 (EDT)
  Message-ID: <Pine.GSO.4.21.0109081535480.24489-100000@crusoe.crusoe.net>
  Subject: Re: [PATCH t/op/misc.t] regcomp.c patch broke test
  From: "Jeff 'japhy/Marillion' Pinyan" <jeffp@crusoe.net>
  Date: Sat, 8 Sep 2001 18:33:12 -0400 (EDT)
  Message-ID: <Pine.GSO.4.21.0109081832030.24489-100000@crusoe.crusoe.net>
  Subject: [PATCH t/lib/warnings/regcomp] (?=...)? gives no warning now
  From: "Jeff 'japhy/Marillion' Pinyan" <jeffp@crusoe.net>
  Date: Sat, 8 Sep 2001 18:37:22 -0400 (EDT)
  Message-ID: <Pine.GSO.4.21.0109081835340.24489-100000@crusoe.crusoe.net>
  p4raw-id: //depot/perl@11956
  Author: Jeff Pinyan; Age: 2001-09-09; Files: 1; Lines: -0/+4
* oct and hex in glorious 64 bit (with less bugs) (was Re: hex and oct again (was Re: FreeBSD MD5 crypt? Re: crypt/hex/oct and Unicode?))
  Message-ID: <20010904224250.P25120@plum.flirble.org>
  p4raw-id: //depot/perl@11874
  Author: Nicholas Clark; Age: 2001-09-05; Files: 1; Lines: -14/+21
* Rename the variable: it *used* to be (wrongly) that the code related to PL_reg_sv (so PL_reg_sv_utf8 was logical) but that is no more the case: PL_reg_match_utf8 is better.
  p4raw-id: //depot/perl@11823
  Author: Jarkko Hietaniemi; Age: 2001-09-02; Files: 1; Lines: -1/+1
* remove deprecated PERL_OBJECT cruft, it has long since stopped working in 5.7.x
  p4raw-id: //depot/perl@11803
  Author: Gurusamy Sarathy; Age: 2001-08-31; Files: 1; Lines: -13/+3
* Fixes bug in change 11717 that bus errored on HP-UX 10.20
  Might break on platforms where bool is larger than 8 bits ???
  p4raw-id: //depot/perl@11800
  Author: Artur Bergman; Age: 2001-08-31; Files: 1; Lines: -1/+1
* Change 11797 sneaked in a faulty regcomp.c change which wasn't supposed to happen.
  p4raw-id: //depot/perl@11798
  Author: Artur Bergman; Age: 2001-08-31; Files: 1; Lines: -1/+1
* Re: Problem in ext/Time/HiRest/HiRes.t
  Message-Id: <200108311220.IAA54125@raptor.research.att.com>
  Fixes test 14 which could fail randomly in rare cases.
  p4raw-id: //depot/perl@11797
  Author: John P. Linderman; Age: 2001-08-31; Files: 1; Lines: -1/+1
* New try for ID 20010407.006: detach the semantics "was the last match target UTF8" into its own variable.
  p4raw-id: //depot/perl@11717
  Author: Jarkko Hietaniemi; Age: 2001-08-18; Files: 1; Lines: -0/+1
* Re: [ID 20010809.023] perlre misleads when stating that (?i) should be at front of pattern
  Message-Id: <200108151032.f7FAWBI30961@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@11677
  Author: Hugo van der Sanden; Age: 2001-08-15; Files: 1; Lines: -0/+1
* Not quite so relicy as thought in #11651 (op/concat #4 and #5 stopped working).
  p4raw-id: //depot/perl@11653
  Author: Jarkko Hietaniemi; Age: 2001-08-12; Files: 1; Lines: -1/+5
* More (less) regex/utf8 relics. (Toned down later in #11653.)
  p4raw-id: //depot/perl@11651
  Author: Jarkko Hietaniemi; Age: 2001-08-12; Files: 1; Lines: -5/+1
* Continuation of #11575: SANY_SEEN completely deprecated, plus more tests that unearthed a bug in @a = ($utf8 =~ /\C/g), plus a fix for the bug.
  p4raw-id: //depot/perl@11577
  Author: Jarkko Hietaniemi; Age: 2001-08-04; Files: 1; Lines: -5/+1
* Decouple SANY into SANY and CANY: the new SANY is /./s, the new CANY is the \C. The problem reported and the test case supplied in
  Subject: UTF-8 bugs in string length & single line regex matches
  Message-ID: <20010803113932.A19318@berrange.com>
  p4raw-id: //depot/perl@11575
  Author: Daniel P. Berrange; Age: 2001-08-04; Files: 1; Lines: -3/+7
* [patch] refcount re ops
  Message-ID: <Pine.LNX.4.21.0108031814240.23972-100000@mako.covalent.net>
  p4raw-id: //depot/perl@11568
  Author: Doug MacEachern; Age: 2001-08-04; Files: 1; Lines: -1/+8
* patch to add DEL to [:cntrl:]
  Message-Id: <200107140625.XAA01517@ventrue.corp.yahoo.com>
  p4raw-id: //depot/perl@11371
  Author: Jeffrey Friedl; Age: 2001-07-14; Files: 1; Lines: -1/+1
* Patch: document reg_data.what member
  Message-ID: <20010712182532.14821.qmail@plover.com>
  p4raw-id: //depot/perl@11322
  Author: Mark-Jason Dominus; Age: 2001-07-12; Files: 1; Lines: -0/+1
* Code cleanup based on turning off the -woffs in IRIX. Not all of the gripes cleaned up (hairy code in hv.c and regcomp.c; unused newsp, gimme, and optype from cop.h macros; unused 'key' arguments in ?DBM_File.xs) (and the -woffs left to the IRIX hints)
  p4raw-id: //depot/perl@11051
  Author: Jarkko Hietaniemi; Age: 2001-06-30; Files: 1; Lines: -1/+6
* Partially fix a problem noticed by IRIX compiler: the initialization of parse_start was bypassed by several gotos. Now initialized to zero, which may not be the best choice.
  p4raw-id: //depot/perl@10906
  Author: Jarkko Hietaniemi; Age: 2001-06-24; Files: 1; Lines: -1/+3
* Re: perl@10722: Bogus warnings on REs
  Message-Id: <200106210851.JAA01942@crypt.compulink.co.uk>
  Unroll to avoid a UTS compiler bug.
  p4raw-id: //depot/perl@10774
  Author: Hugo van der Sanden; Age: 2001-06-21; Files: 1; Lines: -1/+2
* RE: [PATCHES] regcomp.c, pod/perldiag.pod, t/op/pat.t
  Message-ID: <000601c0ebae$77d10dc0$99dcfea9@bfs.phone.com>
  p4raw-id: //depot/perl@10410
  Author: Paul Marquess; Age: 2001-06-03; Files: 1; Lines: -2/+10
* One less -Wall whine.
  p4raw-id: //depot/perl@10406
  Author: Jarkko Hietaniemi; Age: 2001-06-03; Files: 1; Lines: -1/+1
* -Wall cleanup continues.
  p4raw-id: //depot/perl@10392
  Author: Jarkko Hietaniemi; Age: 2001-06-02; Files: 1; Lines: -2/+8
* Re: [PATCHES] regcomp.c, pod/perldiag.pod, t/op/pat.t
  Message-ID: <Pine.GSO.4.21.0106011032080.21027-100000@crusoe.crusoe.net>
  p4raw-id: //depot/perl@10376
  Author: Jeff Pinyan; Age: 2001-06-01; Files: 1; Lines: -2/+55
* More -Wall sweeping.
  p4raw-id: //depot/perl@10338
  Author: Jarkko Hietaniemi; Age: 2001-05-30; Files: 1; Lines: -14/+14
* Medley of -Wall cleanups from Michael Schwen, Hugo van der Sanden, and Abhijit Menon-Sen.
  p4raw-id: //depot/perl@10321
  Author: Jarkko Hietaniemi; Age: 2001-05-30; Files: 1; Lines: -7/+6
* Re: [ID 20010506.041] segfault when matching utf8 string
  Message-Id: <200105250124.KAA19571@toshiba.co.jp>
  p4raw-id: //depot/perl@10206
  Author: Inaba Hiroto; Age: 2001-05-25; Files: 1; Lines: -0/+1
* Re: [ID 20000716.007] \G in a m//g expression causes problems
  Message-Id: <200105211532.QAA03999@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@10187
  Author: Hugo van der Sanden; Age: 2001-05-23; Files: 1; Lines: -1/+1
* Re: [PATCH] HERE mark in regex
  Message-ID: <20010516130443.E1516273@linguist.thayer.dartmouth.edu>
  p4raw-id: //depot/perl@10136
  Author: Ronald J. Kimball; Age: 2001-05-16; Files: 1; Lines: -3/+3
* Remove the 'asciir' re subpragma. Should instead implement the 'physical vs logical' range scheme:
  - \xAA-\xCC is a native physical range, you want that range of codepoints in your native encoding. In EBCDIC the codepoints in the gaps (between i-j and r-s) should be included.
  - \x{AA}-\x{CC} is a physical Unicode range, you want that range of codepoints in Unicode.
  - a-z is a logical range, you want that range of 'logical' codepoints in your native encoding. In EBCDIC the codepoints in the gaps (between i-j and r-s) should not be included.
  Mixed cases (a-\xAA, etc) should either be errors, or maybe the 'logical' endpoints should be converted to native/Unicode codepoints, and the range handled as a physical range.
  'Logical endpoints' are to be recognized only in the A-Z, a-z, and 0-9 ranges. Probably a warning should be given for mixed cases like A-z or a-9 (since such expressions are encoding dependent), with a recommendation to use physical ranges.
  p4raw-id: //depot/perl@10085
  Author: Jarkko Hietaniemi; Age: 2001-05-11; Files: 1; Lines: -31/+7
* Insecure regexes
  Message-ID: <20010507215612.A31114@penderel>
  p4raw-id: //depot/perl@10021
  Author: Robin Houston; Age: 2001-05-07; Files: 1; Lines: -1/+1
* -Wformat error from ext/re/re_comp.c
  Message-Id: <200105041709.SAA14835@tempest.npl.co.uk>
  p4raw-id: //depot/perl@9991
  Author: Robin Barker; Age: 2001-05-04; Files: 1; Lines: -6/+6
* The #9901 had removed one line essential for EBCDIC.
  p4raw-id: //depot/perl@9987
  Author: Jarkko Hietaniemi; Age: 2001-05-04; Files: 1; Lines: -0/+1
* Re: [PATCH bleadperl] [ID 20010426.002] Word boundry regex [...]
  Message-Id: <200104291609.RAA17790@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@9911
  Author: Hugo van der Sanden; Age: 2001-04-30; Files: 1; Lines: -1/+0
* In character classes one couldn't have 0x80..0xff characters at the left hand side if there were 0x100.. characters in the character class.
  p4raw-id: //depot/perl@9901
  Author: Jarkko Hietaniemi; Age: 2001-04-29; Files: 1; Lines: -63/+40
* Re: [PATCH @9846] dumping ANYOF
  Message-Id: <200104262233.XAA22352@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@9873
  Author: Hugo van der Sanden; Age: 2001-04-26; Files: 1; Lines: -1/+5
* Retract #9851, core dumps from pod2man.
  p4raw-id: //depot/perl@9852
  Author: Jarkko Hietaniemi; Age: 2001-04-26; Files: 1; Lines: -1/+0