| Commit message | Author | Age | Files | Lines |
|
p4raw-id: //depot/perl@12803
|
p4raw-id: //depot/perl@12519
|
implement Category=, Script=, Block=
(these are based on an upcoming update of TR#18)
Fix a bug where we got two In categories named "old italic",
and another where shortcut for the Is categories wasn't taken.
p4raw-id: //depot/perl@12500
|
- Cleaner.
- Faster: 15-20 seconds as opposed to several minutes.
- More dynamic: the names of the various categories
such as the linebreak ones are dynamic, not static.
- Is.pl: long names for the general category properties
are now available.
- Ranges (<... ,First>, <..., Last>) from the general
categories work now.
- No more mktables.PL, because mktables.PL is not run,
and never has been run, to create mktables.
- syllables.txt and Is/Syl*.pl removed: they are non-standard
(not part of Unicode), and the whole concept is
being reworked (http://syllabary.sourceforge.net/);
the old way wouldn't even work with the new Syllables.txt
(it would result in 1000+ new categories).
p4raw-id: //depot/perl@12427
|
only IsL&) and Inherited (negative lookahead good);
add tests for Common, Inherited, and L&.
p4raw-id: //depot/perl@12320
|
not just one single space.
p4raw-id: //depot/perl@12309
|
p4raw-id: //depot/perl@12293
|
general properties.
p4raw-id: //depot/perl@12287
|
p4raw-id: //depot/perl@12286
|
between the property definition and the curlies; now can
invert the property by having a caret between the open
curly and the property.
p4raw-id: //depot/perl@12269
|
case doesn't matter, and any space or dash can be
matched by any space, dash, underbar, or empty.
(may be going too far on leniency)
p4raw-id: //depot/perl@12264
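
The lenient name matching described in this commit can be illustrated with a small sketch. It is written in Python purely for illustration and is not Perl's implementation; the helper names `loose_key` and `loose_match` are hypothetical. It assumes the rule is roughly as stated (case is ignored, and a space or dash may be written as a space, dash, underbar, or nothing), and for simplicity it treats all three separators symmetrically, which may be slightly more lenient than the actual change.

```python
import re

def loose_key(name):
    """Reduce a property name to a comparison key (hypothetical helper)."""
    # Drop every space, dash, and underbar, then ignore case.
    return re.sub(r"[\s_-]+", "", name).lower()

def loose_match(requested, official):
    """True if the two names are equal under the lenient rule."""
    return loose_key(requested) == loose_key(official)
```

Under this sketch, "old italic", "Old_Italic", "OLD-ITALIC", and "OldItalic" all reduce to the same key, which also shows why the commit message worries about leniency: distinct names that differ only in separators or case become indistinguishable.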
|
Message-ID: <Pine.GSO.4.21.0109140955250.12393-100000@crusoe.crusoe.net>
p4raw-id: //depot/perl@12020
|
glorious 64 bit (with less bugs) (was Re: hex and oct again (was Re: FreeBSD MD5 crypt? Re: crypt/hex/oct and Unicode?)))
Message-ID: <20010911000031.G1512@plum.flirble.org>
p4raw-id: //depot/perl@11990
|
p4raw-id: //depot/perl@11818
|
"was the last match target UTF8" into its own variable.
p4raw-id: //depot/perl@11717
|
be something like making PL_reg_sv a copy (PV + UTF8)
of the matched/substituted string (note: not just a SvPOK
string; for example, the stringified form of a ROK would
be applicable). Beware of leaks.
p4raw-id: //depot/perl@11714
|
Okay analysis, debatable fix. (The fix will increment
the refcount of all temporary match objects,
for example tied(%h) =~ /^.../ from Tie/RefHash.t,
which will then cause griping at untie() time:
"inner references remain".)
Fix for ID 20010407.006: PL_reg_sv got wiped out
by freetemps if the match target was a temporary
(like function_call() =~ /.../), which in turn meant
that $1 et al. stopped working if they had UTF-8
in them. Therefore bump up the refcount of PL_reg_sv.
p4raw-id: //depot/perl@11712
|
Message-Id: <200108161750.f7GHo1l22207@crypt.compulink.co.uk>
p4raw-id: //depot/perl@11696
|
p4raw-id: //depot/perl@11654
|
(and one file having explicit UTF-8) with an explicit
'no utf8' (and one explicit 'use utf8').
NOTE: t/op/pat.t #64 fails mysteriously under 'use utf8'.
p4raw-id: //depot/perl@11648
|
the unnecessary "use bytes" ones.
TODO: scour the documentation for unnecessary "use utf8"
and prominently display it in perldelta when the time comes.
("use utf8" should be necessary ONLY if one wants the script
to be in UTF-8.) This should also be checked on some non-ASCII,
non-Latin-1 platform, like EBCDIC.
p4raw-id: //depot/perl@11638
|
plus more tests that unearthed a bug in @a = ($utf8 =~ /\C/g),
plus a fix for the bug.
p4raw-id: //depot/perl@11577
|
the new CANY is the \C. The problem reported and the
test case supplied in
Subject: UTF-8 bugs in string length & single line regex matches
Message-ID: <20010803113932.A19318@berrange.com>
p4raw-id: //depot/perl@11575
|
Message-Id: <200107140625.XAA01517@ventrue.corp.yahoo.com>
p4raw-id: //depot/perl@11371
|
of a unilo..unihi range) in Unicode scripts.
p4raw-id: //depot/perl@11133
|
in the \p{In...} notation since according to Unicode the
scripts concept is more natural for matching than using
the somewhat artificial block names. The block names are
still available, though, and if there's a name conflict,
the script name wins and the block name has to make do with
'Block' appended to it. For more information see
http://www.unicode.org/unicode/reports/tr24/
p4raw-id: //depot/perl@11132
|
p4raw-id: //depot/perl@11038
|
p4raw-id: //depot/perl@11014
|
Message-ID: <Pine.LNX.4.21.0106241207320.17075-100000@oregonnet.com>
p4raw-id: //depot/perl@10909
|
to match the whole isprint(), only the space character.
p4raw-id: //depot/perl@10855
|
p4raw-id: //depot/perl@10778
|
Message-ID: <B757B74A.184D%artur@contiller.se>
p4raw-id: //depot/perl@10773
|
C<eval "/x$\r\n/x"> fails to compile correctly
p4raw-link: @10739 on //depot/maint-5.6/perl: a3d864e88a38f4417518c9eac1d0058e2537efe7
p4raw-id: //depot/perl@10742
p4raw-integrated: from //depot/maint-5.6/perl@10741 'merge in'
t/op/pat.t (@9675..) toke.c (@10158..)
|
p4raw-id: //depot/perl@10481
|
Message-ID: <Pine.GSO.4.21.0106011032080.21027-100000@crusoe.crusoe.net>
p4raw-id: //depot/perl@10376
|
the 'physical vs logical' range scheme:
\xAA-\xCC is a native physical range, you want that range of
codepoints in your native encoding. In EBCDIC the codepoints
in the gaps (between i-j and r-s) should be included.
\x{AA}-\x{CC} is a physical Unicode range, you want that range of
codepoints in Unicode.
a-z is a logical range, you want that range of 'logical' codepoints
in your native encoding. In EBCDIC the codepoints in the gaps
(between i-j and r-s) should not be included.
Mixed cases (a-\xAA, etc) should either be errors, or maybe
the 'logical' endpoints should be converted to native/Unicode
codepoints, and the range handled as a physical range.
'Logical endpoints' are to be recognized only in the A-Z, a-z,
and 0-9 ranges. Probably a warning should be given for mixed
cases like A-z or a-9 (since such expressions are encoding
dependent), with a recommendation to use physical ranges.
p4raw-id: //depot/perl@10085
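
The difference between a physical and a logical range on EBCDIC can be sketched as follows. This is an illustration in Python, not Perl's range code; the function names are hypothetical, and the sketch assumes the cp037 code page as a representative EBCDIC encoding (the commit message does not name one).

```python
def physical_range(lo, hi):
    """Every native code point from lo to hi, gaps included."""
    return list(range(lo, hi + 1))

def logical_range(first, last, encoding="cp037"):
    """Native code points of the letters first..last only (hypothetical helper)."""
    letters = [chr(c) for c in range(ord(first), ord(last) + 1)]
    return [ch.encode(encoding)[0] for ch in letters]

# Native code points of 'a' and 'z' in the assumed EBCDIC code page.
a_native = "a".encode("cp037")[0]
z_native = "z".encode("cp037")[0]

physical = physical_range(a_native, z_native)  # includes the i-j and r-s gaps
logical = logical_range("a", "z")              # the 26 letters only
gaps = sorted(set(physical) - set(logical))    # code points a-z should skip
```

Here `gaps` holds the code points that sit between i and j and between r and s: a physical range would include them, while a logical a-z range should not.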
|
at the left hand side if there were 0x100.. characters in the
character class.
p4raw-id: //depot/perl@9901
|
bug noticed by Robin Houston; basically the code for detecting
value wraparound behaved differently under different compilers
and platforms. The workaround is to remove the overflow check
for now; a real fix would be to do the overflow check (portably) right.
p4raw-id: //depot/perl@9740
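
One common way to make a wraparound check independent of compiler behaviour is to test before the arithmetic rather than inspect the result afterwards. The sketch below shows that idea in Python; it is not the code this commit touched, and which value was being accumulated is not stated here. `UV_MAX` (assumed to be 32 bits wide) and `accumulate` are hypothetical names.

```python
UV_MAX = 2**32 - 1  # assumed width of the unsigned value type

def accumulate(digits, base):
    """Fold digits into one value, refusing to exceed UV_MAX."""
    value = 0
    for d in digits:
        # value * base + d would exceed UV_MAX exactly when
        # value > (UV_MAX - d) // base, so test first and never wrap.
        if value > (UV_MAX - d) // base:
            raise OverflowError("value does not fit in UV_MAX")
        value = value * base + d
    return value
```

Because the comparison happens before the multiplication, the check never depends on what a particular platform does once a value has already wrapped.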
|
Fixes ID 20010411.001.
p4raw-id: //depot/perl@9680
|
fix for bug 20010410.006, undo change#7115
port the OpenBSD glob() security patch
p4raw-link: @9676 on //depot/maint-5.6/perl: 3f3c3e312f619efa81ad88565a24e92f15dff662
p4raw-link: @9675 on //depot/maint-5.6/perl: c84593816ace2807d5ff27bb0745a28ec29187b1
p4raw-link: @7115 on //depot/perl: 5675c1a6395a0842c857fc8de159747577df6c4b
p4raw-id: //depot/perl@9677
p4raw-integrated: from //depot/maint-5.6/perl@9672 'copy in'
ext/File/Glob/bsd_glob.h (@9264..) ext/File/Glob/bsd_glob.c
(@9512..) ext/File/Glob/Glob.xs (@9545..) 'merge in' t/op/pat.t
(@9138..) regexec.c (@9288..) ext/File/Glob/Glob.pm (@9512..)
|
[ 9400]
More EBCDIC tweaks:
- one more swash issue: &~(0xA0-1) did not do the right thing
for UTF-EBCDIC, whereas &~(0x80-1) does for UTF-8.
- add "use re 'asciirange'" to make [!-~] etc. work
use it in MIME::QuotedPrint and t/op/regexp.t and t/op/pat.t
- Choose a key for t/op/each.t test which gets encoded.
- Skip utf8decode if this is UTF-EBCDIC.
p4raw-link: @9400 on //depot/perlio: daf0f78e031c718c75590ef9ef573756f805776e
p4raw-id: //depot/perl@9407
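
A plausible reading of the swash item above is that the mask trick only rounds down to a boundary when the boundary size is a power of two. The Python sketch below illustrates that arithmetic only; it assumes the intent was to round a code point down to a multiple of the given size, which the commit message does not spell out, and the function names are hypothetical.

```python
def round_down_by_mask(cp, size):
    """Clear the low bits: correct only when size is a power of two."""
    return cp & ~(size - 1)

def round_down_by_division(cp, size):
    """Round down to a multiple of size for any positive size."""
    return (cp // size) * size

cp = 0x1FF
utf8_mask = round_down_by_mask(cp, 0x80)        # 0x80 is a power of two
ebcdic_mask = round_down_by_mask(cp, 0xA0)      # 0xA0 is not
ebcdic_div = round_down_by_division(cp, 0xA0)   # arithmetic alternative
```

With 0x80 the mask and the division agree. With 0xA0 the mask clears bits 0 to 4 and bit 7 but leaves bits 5 and 6 set, so the result is not a multiple of 0xA0.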
|
p4raw-id: //depot/perlio@9360
|
the squelching of the unneeded "Scalars leaked" messages.
p4raw-id: //depot/perl@9203
p4raw-integrated: from //depot/maint-5.6/perl@9202 'copy in'
t/pragma/strict-vars (@7318..) t/pragma/warn/regcomp (@7887..)
t/op/regexp.t (@8551..) t/op/lex_assign.t (@8987..) 'merge in'
t/op/local.t (@5902..) t/pragma/warn/op (@7846..)
t/pragma/warnings.t (@7895..) t/comp/proto.t (@8173..)
t/pragma/warn/toke (@8570..) t/op/pat.t (@9076..)
|
Hiroto's, Nicholas Clark's, and Vadim Konovalov's tests.
p4raw-id: //depot/perl@9194
|
p4raw-id: //depot/perl@9193
|
p4raw-id: //depot/perlio@9182
|
p4raw-id: //depot/perl@9098
|
p4raw-id: //depot/perl@9075
|
Hopefully no tests were lost in the shuffle.
(The beginning of pragma/utf8 was lost intentionally,
the tests were rather bogus and incomplete.)
p4raw-id: //depot/perl@9063
|
p4raw-id: //depot/perl@9057
|