path: root/t/op/pat.t
Commit message | Author | Age | Files | Lines
* The basic character classes seem to go untested with Unicode. | Jarkko Hietaniemi | 2001-11-02 | 1 | -1/+14
  p4raw-id: //depot/perl@12803
* Test vertical whitespace combined with /x in \p{}. | Jarkko Hietaniemi | 2001-10-20 | 1 | -1/+4
  p4raw-id: //depot/perl@12519
* Unicode categories continue: | Jarkko Hietaniemi | 2001-10-19 | 1 | -1/+13
  implement Category=, Script=, Block= (these are based on an upcoming update of TR#18)
  Fix a bug where we got two In categories named "old italic", and another where shortcut for the Is categories wasn't taken.
  p4raw-id: //depot/perl@12500
* Rewrite mktables from scratch. | Jarkko Hietaniemi | 2001-10-13 | 1 | -1/+14
  - Cleaner.
  - Faster: 15-20 seconds as opposed to several minutes.
  - More dynamic: the names of the various categories such as the linebreak ones are dynamic, not static.
  - Is.pl: long names for the general category properties are now available.
  - Ranges (<... ,First>, <..., Last>) from the general categories work now.
  - No more mktables.PL because the mktables.PL is not and never has been run to create a mktables.
  - syllables.txt and Is/Syl*.pl removed: non-standard (not part of the Unicode), and the whole concept is being reworked (http://syllabary.sourceforge.net/), the old way wouldn't even work with the new Syllables.txt (it would result in 1000+ new categories)
  p4raw-id: //depot/perl@12427
* Unicode properties: fix L& (the #12319 didn't allow L&, | Jarkko Hietaniemi | 2001-10-03 | 1 | -1/+25
  only IsL&) and Inherited (negative lookahead good); add tests for Common, Inherited, and L&.
  p4raw-id: //depot/perl@12320
* Unicode properties: allow also intra(wordbreak)name whitespace, | Jarkko Hietaniemi | 2001-10-02 | 1 | -2/+3
  not just one single space.
  p4raw-id: //depot/perl@12309
* Also the ^Is is optional. | Jarkko Hietaniemi | 2001-10-01 | 1 | -10/+35
  p4raw-id: //depot/perl@12293
* More Unicode property tests for the abbreviated | Jarkko Hietaniemi | 2001-10-01 | 1 | -13/+27
  general properties.
  p4raw-id: //depot/perl@12287
* Further tweaks to the Unicode properties. | Jarkko Hietaniemi | 2001-10-01 | 1 | -1/+31
  p4raw-id: //depot/perl@12286
* More leniency to the \p and \P: now can have whitespace | Jarkko Hietaniemi | 2001-09-29 | 1 | -1/+6
  between the property definition and the curlies; now can invert the property by having a caret between the open curly and the property.
  p4raw-id: //depot/perl@12269
* Allow for more flexibility in the \p{In...} names, now | Jarkko Hietaniemi | 2001-09-29 | 1 | -5/+9
  case doesn't matter, and any space or dash can be matched by any space, dash, underbar, or empty. (may be going too far on leniency)
  p4raw-id: //depot/perl@12264
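The lenient name matching described in the entry above can be pictured as comparing names after a loose normalization. The following is a minimal Python sketch of that idea, not the code Perl uses: the helper names `loose_key` and `same_property_name` are hypothetical, and the sketch drops separators anywhere in the name, which is slightly more permissive than the wording of the commit message.

```python
import re


def loose_key(name: str) -> str:
    """Fold case and drop spaces, dashes and underscores (illustrative only)."""
    return re.sub(r"[\s_-]+", "", name).lower()


def same_property_name(a: str, b: str) -> bool:
    """Treat two spellings as equal when their loose keys agree."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    # All four spellings reduce to the same key under this sketch.
    for spelling in ("Old Italic", "old-italic", "OLD_ITALIC", "olditalic"):
        print(repr(spelling), "->", loose_key(spelling))
```

Under this scheme a test only needs to check that every spelling variant of a block name reaches the same key.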
* more jumpables, and hit-bit bug | Jeff Pinyan | 2001-09-14 | 1 | -1/+10
  Message-ID: <Pine.GSO.4.21.0109140955250.12393-100000@crusoe.crusoe.net>
  p4raw-id: //depot/perl@12020
* Re: the remaining bugs in \x escapes (was Re: [PATCH] oct and hex in glorious 64 bit (with less bugs) (was Re: hex and oct again (was Re: FreeBSD MD5 crypt? Re: crypt/hex/oct and Unicode?))) | Nicholas Clark | 2001-09-10 | 1 | -1/+111
  Message-ID: <20010911000031.G1512@plum.flirble.org>
  p4raw-id: //depot/perl@11990
* Test tweaks. | Jarkko Hietaniemi | 2001-09-01 | 1 | -1/+3
  p4raw-id: //depot/perl@11818
* New try for ID 20010407.006: detach the semantics | Jarkko Hietaniemi | 2001-08-18 | 1 | -1/+20
  "was the last match target UTF8" into its own variable.
  p4raw-id: //depot/perl@11717
* Retract #11712 for now. The real fix would probably | Jarkko Hietaniemi | 2001-08-18 | 1 | -20/+1
  be something like making PL_reg_sv a copy (PV + UTF8) of the matched/substituted string (note: not just a SvPOK string, for example the stringified form of a ROK would be applicable) Beware of leaks.
  p4raw-id: //depot/perl@11714
* (Retracted by #11714) | Jarkko Hietaniemi | 2001-08-18 | 1 | -1/+20
  Okay analysis, debatable fix. (The fix will inc the refcount of all temporary match objects, like for example tied(%h) =~ /^.../ from Tie/RefHash.t, which will then cause griping at untie() time ("inner references remain").
  Fix for ID 20010407.006: PL_reg_sv got wiped out by freetemps if the match target was a temporary (like function_call() =~ /.../), which in turn meant that the $1 et al stopped working if they had UTF-8 in them. Therefore bump up the refcount of PL_reg_sv.
  p4raw-id: //depot/perl@11712
* Re: [ID 20010814.004] pos() doesn't work when using =~m// in list context | Hugo van der Sanden | 2001-08-17 | 1 | -1/+9
  Message-Id: <200108161750.f7GHo1l22207@crypt.compulink.co.uk>
  p4raw-id: //depot/perl@11696
* Failure not true anymore (probably caused by my broken setup). | Jarkko Hietaniemi | 2001-08-12 | 1 | -1/+1
  p4raw-id: //depot/perl@11654
* Mark tests (and one module) having explicit Latin-1 | Jarkko Hietaniemi | 2001-08-12 | 1 | -1/+1
  (and one file having explicit UTF-8) with an explicit 'no utf8' (and one explicit 'use utf8').
  NOTE: t/op/pat.t #64 fails mysteriously under 'use utf8'.
  p4raw-id: //depot/perl@11648
* Drop all the unnecessary "use utf8" clauses and some of | Jarkko Hietaniemi | 2001-08-12 | 1 | -1/+1
  the unnecessary "use bytes" ones.
  TODO: scour the documentation for unnecessary "use utf8" and prominently display it in perldelta when the time comes. ("use utf8" should be necessary ONLY if one wants the script to be in UTF-8.)
  Also should be checked in some non-ASCII non-Latin-1 platform, like EBCDIC.
  p4raw-id: //depot/perl@11638
* Continuation of #11575: SANY_SEEN completely deprecated, | Jarkko Hietaniemi | 2001-08-04 | 1 | -1/+28
  plus more tests that unearthed a bug in @a = ($utf8 =~ /\C/g), plus a fix for the bug.
  p4raw-id: //depot/perl@11577
* Decouple SANY into SANY and CANY: the new SANY is /./s, | Daniel P. Berrange | 2001-08-04 | 1 | -2/+11
  the new CANY is the \C.
  The problem reported and the test case supplied in
  Subject: UTF-8 bugs in string length & single line regex matches
  Message-ID: <20010803113932.A19318@berrange.com>
  p4raw-id: //depot/perl@11575
* patch to add DEL to [:cntrl:] | Jeffrey Friedl | 2001-07-14 | 1 | -1/+15
  Message-Id: <200107140625.XAA01517@ventrue.corp.yahoo.com>
  p4raw-id: //depot/perl@11371
* The #11132 missed singleton characters (not part | Jarkko Hietaniemi | 2001-07-04 | 1 | -1/+26
  of a unilo..unihi range) in Unicode scripts.
  p4raw-id: //depot/perl@11133
* Support preferentially the Unicode 'scripts' definition | Jarkko Hietaniemi | 2001-07-04 | 1 | -1/+16
  in the \p{In...} notation since according to Unicode the scripts concept is more natural for matching than using the somewhat artificial block names. The block names are still available, though, and if there's a name conflict, the scripts one wins and the blocks one has to do with 'Block' appended to its name.
  For more information see http://www.unicode.org/unicode/reports/tr24/
  p4raw-id: //depot/perl@11132
* Add support for $^N, the most-recently closed group. | Jarkko Hietaniemi | 2001-06-30 | 1 | -1/+36
  p4raw-id: //depot/perl@11038
* In EBCDIC assume UTF-EBCDIC, not UTF-8. | Jarkko Hietaniemi | 2001-06-29 | 1 | -14/+53
  p4raw-id: //depot/perl@11014
* t/op/pat.t typo fix | Richard Soderberg | 2001-06-24 | 1 | -1/+1
  Message-ID: <Pine.LNX.4.21.0106241207320.17075-100000@oregonnet.com>
  p4raw-id: //depot/perl@10909
* Fix for ID 20010619.003, the [[:print:]] is not supposed | Jarkko Hietaniemi | 2001-06-23 | 1 | -1/+20
  to match the whole isprint(), only the space character.
  p4raw-id: //depot/perl@10855
* Case of confused test numbering. | Jarkko Hietaniemi | 2001-06-21 | 1 | -5/+5
  p4raw-id: //depot/perl@10778
* Re: [PATCH] Make /o work under i?threads | Artur Bergman | 2001-06-21 | 1 | -0/+13
  Message-ID: <B757B74A.184D%artur@contiller.se>
  p4raw-id: //depot/perl@10773
* Integrate change #10739 from maintperl: | Jarkko Hietaniemi | 2001-06-20 | 1 | -1/+19
  C<eval "/x$\r\n/x"> fails to compile correctly
  p4raw-link: @10739 on //depot/maint-5.6/perl: a3d864e88a38f4417518c9eac1d0058e2537efe7
  p4raw-id: //depot/perl@10742
  p4raw-integrated: from //depot/maint-5.6/perl@10741 'merge in' t/op/pat.t (@9675..) toke.c (@10158..)
* More \p{In...} testing, combined with \N{...}. | Jarkko Hietaniemi | 2001-06-08 | 1 | -1/+23
  p4raw-id: //depot/perl@10481
* Re: [PATCHES] regcomp.c, pod/perldiag.pod, t/op/pat.t | Jeff Pinyan | 2001-06-01 | 1 | -1/+59
  Message-ID: <Pine.GSO.4.21.0106011032080.21027-100000@crusoe.crusoe.net>
  p4raw-id: //depot/perl@10376
* Remove the 'asciir' re subpragma. Should instead implement | Jarkko Hietaniemi | 2001-05-11 | 1 | -2/+0
  the 'physical vs logical' range scheme:
  \xAA-\xCC is a native physical range, you want that range of codepoints in your native encoding. In EBCDIC the codepoints in the gaps (between i-j and r-s) should be included.
  \x{AA}-\x{CC} is a physical Unicode range, you want that range of codepoints in Unicode.
  a-z is a logical range, you want that range of 'logical' codepoints in your native encoding. In EBCDIC the codepoints in the gaps (between i-j and r-s) should not be included.
  Mixed cases (a-\xAA, etc) should either be errors, or maybe the 'logical' endpoints should be converted to native/Unicode codepoints, and the range handled as a physical range.
  'Logical endpoints' are to be recognized only in the A-Z, a-z, and 0-9 ranges.
  Probably a warning should be given for mixed cases like A-z or a-9 (since such expressions are encoding dependent), with a recommendation to use physical ranges.
  p4raw-id: //depot/perl@10085
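The gaps mentioned in the entry above (between i-j and r-s) are easy to see by listing an EBCDIC code page. The Python sketch below is only an illustration of the physical-versus-logical distinction, not Perl's range handling. It assumes code page 037 (Python's `cp037` codec) as a representative EBCDIC encoding, and the helper name `ebcdic` is hypothetical.

```python
def ebcdic(ch: str) -> int:
    """Return the cp037 (EBCDIC) byte value of a single character."""
    return ch.encode("cp037")[0]


# Physical range: every byte value from 'a' through 'z' in the native encoding.
lo, hi = ebcdic("a"), ebcdic("z")
physical = [bytes([b]).decode("cp037") for b in range(lo, hi + 1)]

# Logical range: just the 26 lowercase letters, whatever their byte values.
logical = [chr(c) for c in range(ord("a"), ord("z") + 1)]

# Characters that sit in the gaps and belong only to the physical range.
gaps = [ch for ch in physical if ch not in logical]

if __name__ == "__main__":
    print(f"a=0x{lo:02X} z=0x{hi:02X}")
    print("physical size:", len(physical))
    print("logical size: ", len(logical))
    print("gap characters:", "".join(gaps))
```

In this code page the physical range spans more byte values than there are letters, which is why the entry distinguishes a-z (letters only) from \xAA-\xCC style ranges (every codepoint in between).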
* In character classes one couldn't have 0x80..0xff characters | Jarkko Hietaniemi | 2001-04-29 | 1 | -5/+109
  at the left hand side if there were 0x100.. characters in the character class.
  p4raw-id: //depot/perl@9901
* Workaround for the "\x{12345678}" plus s/(.)/$1/g plus ord/length | Jarkko Hietaniemi | 2001-04-18 | 1 | -1/+10
  bug noticed by Robin Houston; basically the code of detecting value wraparound was acting differently under different compilers and platforms. The workaround is to remove the overflow check for now, a real fix would be to do the overflow (portably) right.
  p4raw-id: //depot/perl@9740
* Bad test numbering in integrate in #9677. | Jarkko Hietaniemi | 2001-04-11 | 1 | -0/+2
  Fixes ID 20010411.001.
  p4raw-id: //depot/perl@9680
* Integrate changes #9675,9676 from maintperl into mainline. | Jarkko Hietaniemi | 2001-04-11 | 1 | -1/+26
  fix for bug 20010410.006, undo change#7115
  port the OpenBSD glob() security patch
  p4raw-link: @9676 on //depot/maint-5.6/perl: 3f3c3e312f619efa81ad88565a24e92f15dff662
  p4raw-link: @9675 on //depot/maint-5.6/perl: c84593816ace2807d5ff27bb0745a28ec29187b1
  p4raw-link: @7115 on //depot/perl: 5675c1a6395a0842c857fc8de159747577df6c4b
  p4raw-id: //depot/perl@9677
  p4raw-integrated: from //depot/maint-5.6/perl@9672 'copy in' ext/File/Glob/bsd_glob.h (@9264..) ext/File/Glob/bsd_glob.c (@9512..) ext/File/Glob/Glob.xs (@9545..) 'merge in' t/op/pat.t (@9138..) regexec.c (@9288..) ext/File/Glob/Glob.pm (@9512..)
* Integrate perlio: | Jarkko Hietaniemi | 2001-03-28 | 1 | -0/+3
  [ 9400] More EBCDIC tweaks:
  - one more swash issue &~(0xA0-1) did not do the right thing, for UTF-EBCDIC where &~(0x80-1) does for UTF-8.
  - add "use re 'asciirange'" to make [!-~] etc. work use it in MIME::QuotedPrint and t/op/regexp.t and t/op/pat.t
  - Choose a key for t/op/each.t test which gets encoded.
  - Skip utf8decode if this is UTF-EBCDIC.
  p4raw-link: @9400 on //depot/perlio: daf0f78e031c718c75590ef9ef573756f805776e
  p4raw-id: //depot/perl@9407
* Memory tweaks and notes for OEMVS. | Nick Ing-Simmons | 2001-03-26 | 1 | -0/+1
  p4raw-id: //depot/perlio@9360
* Re-integrate #9138 from maintperl to mainline, | Jarkko Hietaniemi | 2001-03-18 | 1 | -3/+0
  the squelching of the unneeded "Scalars leaked" messages.
  p4raw-id: //depot/perl@9203
  p4raw-integrated: from //depot/maint-5.6/perl@9202 'copy in' t/pragma/strict-vars (@7318..) t/pragma/warn/regcomp (@7887..) t/op/regexp.t (@8551..) t/op/lex_assign.t (@8987..) 'merge in' t/op/local.t (@5902..) t/pragma/warn/op (@7846..) t/pragma/warnings.t (@7895..) t/comp/proto.t (@8173..) t/pragma/warn/toke (@8570..) t/op/pat.t (@9076..)
* Sarathy's clear_pmop patch with Radu Greab's fix, | Jarkko Hietaniemi | 2001-03-18 | 1 | -18/+17
  Hiroto's, Nicholas Clark's, and Vadim Konovalov's tests.
  p4raw-id: //depot/perl@9194
* NI-S' cunning idea of how to de-UTF8 the "\C-broken" submatches. | Jarkko Hietaniemi | 2001-03-18 | 1 | -5/+1
  p4raw-id: //depot/perl@9193
* Allow test to pass even when \C leaves SvUTF8 set by adding 'use bytes' | Nick Ing-Simmons | 2001-03-17 | 1 | -45/+51
  p4raw-id: //depot/perlio@9182
* Fix for ID 20010306.008, UTF-8 and \w without 'use utf8' coredump. | Jarkko Hietaniemi | 2001-03-10 | 1 | -3/+11
  p4raw-id: //depot/perl@9098
* More UTF-8 test tweaks. | Jarkko Hietaniemi | 2001-03-07 | 1 | -0/+2
  p4raw-id: //depot/perl@9075
* Major utf8 test reorganisation and rewrite. | Jarkko Hietaniemi | 2001-03-07 | 1 | -1/+308
  Hopefully no tests were lost in the shuffle. (The beginning of pragma/utf8 was lost intentionally, the tests were rather bogus and incomplete.)
  p4raw-id: //depot/perl@9063
* Easier to outcomment all the three reset() tests for now. | Jarkko Hietaniemi | 2001-03-06 | 1 | -17/+16
  p4raw-id: //depot/perl@9057