| Commit message (Collapse) | Author | Age | Files | Lines |
| |
toggle-on from the encoding pragma.
p4raw-id: //depot/perl@12872
| |
p4raw-id: //depot/perl@12864
| |
p4raw-id: //depot/perl@12858
| |
possibly cannot be so. Prepares way for charclass
syntax like [[abc]||[def]] (or just [[abc][def]])
for union, [[\w]&&[$a]] for intersection,
and [[a-z]&&[^def]] for subtraction.
Currently /[[a]/ (or /[a[]/) parses as a character
class containing two characters, "[" and "a".
This may have to be broken for the syntax described
above; otherwise we would have to scan the whole pattern
to find out whether the square brackets match pairwise.
Luckily, the special case of "[" doesn't seem to be
documented (as opposed to "]" and "-"), so we may have
a better story for breaking it... One can always use \[
if one wants a literal "[", so there.
p4raw-id: //depot/perl@12835
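
The set syntax floated in the message above is a proposal, not something this commit implements. As a rough model of the intended semantics only, the sketch below treats each bracketed class as a Python set of characters. The helper name `charclass`, the ASCII-only universe, and the sample contents of `$a` are assumptions made for illustration.

```python
import string

# Assumption: classes are modelled over 7-bit ASCII only.
UNIVERSE = frozenset(chr(c) for c in range(128))

def charclass(chars, negated=False):
    """Model [chars] (or [^chars] when negated) as a frozenset of characters."""
    members = frozenset(chars)
    return UNIVERSE - members if negated else members

abc = charclass("abc")
def_ = charclass("def")
a_to_z = charclass(string.ascii_lowercase)

# [[abc]||[def]] (or [[abc][def]]): union
union = abc | def_

# [[a-z]&&[^def]]: subtraction, expressed as intersection with a negated class
subtraction = a_to_z & charclass("def", negated=True)

# [[\w]&&[$a]]: intersection of word characters with a class built from a variable
word = charclass(string.ascii_letters + string.digits + "_")
a = "x9!"  # hypothetical contents of $a, taken as literal characters
intersection = word & charclass(a)

print("".join(sorted(union)))
print("".join(sorted(subtraction)))
print("".join(sorted(intersection)))
```

Writing subtraction as an intersection with a negated class mirrors the proposed notation, which has no separate difference operator.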
| |
p4raw-id: //depot/perl@12834
| |
p4raw-id: //depot/perl@12658
| |
not all cases since the information whether the pattern
or the target are utf8 seems to be either lost or not
spread widely enough, sigh.
p4raw-id: //depot/perl@12631
| |
is unfinished since we have to figure out how to detect
Unicodeness in there.
p4raw-id: //depot/perl@12621
| |
but at least less wrong: prepare for the mapping being
more than just one-character-to-one-character.
p4raw-id: //depot/perl@12371
| |
formatting chars.
p4raw-id: //depot/perl@12292
| |
p4raw-id: //depot/perl@12286
| |
between the property definition and the curlies; one can now
invert the property by putting a caret between the open
curly and the property.
p4raw-id: //depot/perl@12269
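
To make the caret form concrete, here is a minimal sketch of recognising `\p{Name}` and `\p{^Name}` and reporting whether the property is inverted. It is not Perl's parser: the function name `parse_property` and the restriction to simple identifier-like property names are assumptions for illustration.

```python
import re

# Hypothetical recogniser for \p{Name} and \p{^Name}; not Perl's own parser.
_PROPERTY = re.compile(r"\\p\{(\^?)([A-Za-z_][A-Za-z0-9_]*)\}")

def parse_property(token):
    """Return (property_name, inverted) for a property escape token."""
    match = _PROPERTY.fullmatch(token)
    if match is None:
        raise ValueError(f"not a property escape: {token!r}")
    caret, name = match.groups()
    # A caret directly after the open curly inverts the property.
    return name, caret == "^"

print(parse_property(r"\p{Lu}"))
print(parse_property(r"\p{^Lu}"))
```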
| |
glorious 64 bit (with less bugs) (was Re: hex and oct again (was Re: FreeBSD MD5 crypt? Re: crypt/hex/oct and Unicode?)))
Message-ID: <20010911000031.G1512@plum.flirble.org>
p4raw-id: //depot/perl@11990
| |
p4raw-id: //depot/perl@11967
| |
Date: Sat, 8 Sep 2001 15:42:30 -0400 (EDT)
Message-ID: <Pine.GSO.4.21.0109081535480.24489-100000@crusoe.crusoe.net>
Subject: Re: [PATCH t/op/misc.t] regcomp.c patch broke test
From: "Jeff 'japhy/Marillion' Pinyan" <jeffp@crusoe.net>
Date: Sat, 8 Sep 2001 18:33:12 -0400 (EDT)
Message-ID: <Pine.GSO.4.21.0109081832030.24489-100000@crusoe.crusoe.net>
Subject: [PATCH t/lib/warnings/regcomp] (?=...)? gives no warning now
From: "Jeff 'japhy/Marillion' Pinyan" <jeffp@crusoe.net>
Date: Sat, 8 Sep 2001 18:37:22 -0400 (EDT)
Message-ID: <Pine.GSO.4.21.0109081835340.24489-100000@crusoe.crusoe.net>
p4raw-id: //depot/perl@11956
| |
(was Re: FreeBSD MD5 crypt? Re: crypt/hex/oct and Unicode?))
Message-ID: <20010904224250.P25120@plum.flirble.org>
p4raw-id: //depot/perl@11874
| |
code related to PL_reg_sv (so PL_reg_sv_utf8 was logical)
but that is no longer the case: PL_reg_match_utf8 is better.
p4raw-id: //depot/perl@11823
| |
working in 5.7.x
p4raw-id: //depot/perl@11803
| |
Might break on platforms where bool is larger than 8 bits?
p4raw-id: //depot/perl@11800
| |
supposed to happen.
p4raw-id: //depot/perl@11798
| |
Message-Id: <200108311220.IAA54125@raptor.research.att.com>
Fixes test 14 which could fail randomly in rare cases.
p4raw-id: //depot/perl@11797
| |
"was the last match target UTF8" into its own variable.
p4raw-id: //depot/perl@11717
| |
front of pattern
Message-Id: <200108151032.f7FAWBI30961@crypt.compulink.co.uk>
p4raw-id: //depot/perl@11677
| |
stopped working).
p4raw-id: //depot/perl@11653
| |
p4raw-id: //depot/perl@11651
| |
plus more tests that unearthed a bug in @a = ($utf8 =~ /\C/g),
plus a fix for the bug.
p4raw-id: //depot/perl@11577
| |
the new CANY is the \C. The problem reported and the
test case supplied in
Subject: UTF-8 bugs in string length & single line regex matches
Message-ID: <20010803113932.A19318@berrange.com>
p4raw-id: //depot/perl@11575
| |
Message-ID: <Pine.LNX.4.21.0108031814240.23972-100000@mako.covalent.net>
p4raw-id: //depot/perl@11568
| |
Message-Id: <200107140625.XAA01517@ventrue.corp.yahoo.com>
p4raw-id: //depot/perl@11371
| |
Message-ID: <20010712182532.14821.qmail@plover.com>
p4raw-id: //depot/perl@11322
| |
Not all of the gripes cleaned up (hairy code in hv.c and
regcomp.c; unused newsp, gimme, and optype from cop.h macros;
unused 'key' arguments in ?DBM_File.xs) (and the -woffs left
to the IRIX hints)
p4raw-id: //depot/perl@11051
| |
the initialization of parse_start was bypassed by
several gotos. Now initialized to zero, which may
not be the best choice.
p4raw-id: //depot/perl@10906
| |
Message-Id: <200106210851.JAA01942@crypt.compulink.co.uk>
Unroll to avoid a UTS compiler bug.
p4raw-id: //depot/perl@10774
| |
Message-ID: <000601c0ebae$77d10dc0$99dcfea9@bfs.phone.com>
p4raw-id: //depot/perl@10410
| |
p4raw-id: //depot/perl@10406
| |
p4raw-id: //depot/perl@10392
| |
Message-ID: <Pine.GSO.4.21.0106011032080.21027-100000@crusoe.crusoe.net>
p4raw-id: //depot/perl@10376
| |
p4raw-id: //depot/perl@10338
| |
and Abhijit Menon-Sen.
p4raw-id: //depot/perl@10321
| |
Message-Id: <200105250124.KAA19571@toshiba.co.jp>
p4raw-id: //depot/perl@10206
| |
Message-Id: <200105211532.QAA03999@crypt.compulink.co.uk>
p4raw-id: //depot/perl@10187
| |
Message-ID: <20010516130443.E1516273@linguist.thayer.dartmouth.edu>
p4raw-id: //depot/perl@10136
| |
the 'physical vs logical' range scheme:
  \xAA-\xCC is a native physical range: you want that range of
    codepoints in your native encoding. In EBCDIC the codepoints
    in the gaps (between i-j and r-s) should be included.
  \x{AA}-\x{CC} is a physical Unicode range: you want that range of
    codepoints in Unicode.
  a-z is a logical range: you want that range of 'logical' codepoints
    in your native encoding. In EBCDIC the codepoints in the gaps
    (between i-j and r-s) should not be included.
Mixed cases (a-\xAA, etc.) should either be errors, or maybe
the 'logical' endpoints should be converted to native/Unicode
codepoints, and the range handled as a physical range.
'Logical endpoints' are to be recognized only in the A-Z, a-z,
and 0-9 ranges. Probably a warning should be given for mixed
cases like A-z or a-9 (since such expressions are encoding
dependent), with a recommendation to use physical ranges.
p4raw-id: //depot/perl@10085
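
The difference between the physical and logical readings of a range can be checked numerically. The sketch below is an illustration, not the regcomp.c logic: it assumes Python's `cp037` codec as a stand-in for "your native encoding" on an EBCDIC system, and the helper names `native`, `physical_range`, and `logical_range` are hypothetical.

```python
# Assumption: cp037 (an EBCDIC code page) plays the role of the native encoding.
ENCODING = "cp037"

def native(ch):
    """Native code point of a single character under ENCODING."""
    return ch.encode(ENCODING)[0]

def physical_range(lo, hi):
    """Every native code point from lo to hi inclusive (the \\xAA-\\xCC reading)."""
    return set(range(lo, hi + 1))

def logical_range(first, last):
    """Only the code points of the characters first..last (the a-z reading)."""
    return {native(chr(c)) for c in range(ord(first), ord(last) + 1)}

physical = physical_range(native("a"), native("z"))
logical = logical_range("a", "z")
gaps = physical - logical  # code points between i-j and r-s

print(len(physical), len(logical), len(gaps))
print(sorted(hex(c) for c in gaps))
```

Under this code page the letters a-z are not contiguous, so the physical range from the code point of "a" to that of "z" contains extra code points that the logical range leaves out.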
| |
Message-ID: <20010507215612.A31114@penderel>
p4raw-id: //depot/perl@10021
| |
Message-Id: <200105041709.SAA14835@tempest.npl.co.uk>
p4raw-id: //depot/perl@9991
| |
p4raw-id: //depot/perl@9987
| |
Message-Id: <200104291609.RAA17790@crypt.compulink.co.uk>
p4raw-id: //depot/perl@9911
| |
at the left hand side if there were 0x100.. characters in the
character class.
p4raw-id: //depot/perl@9901
| |
Message-Id: <200104262233.XAA22352@crypt.compulink.co.uk>
p4raw-id: //depot/perl@9873
| |
p4raw-id: //depot/perl@9852