path: root/t/re/regexp.t
Commit message | Author | Age | Files | Lines
* t/re/regexp.t: Properly handle \c?[ in regex_sets | Karl Williamson | 2014-11-01 | 1 | -4/+4
    t/re/regex_sets.t is actually handled by regexp.t, skipping all tests that don't have a [bracketed character class]. Prior to this commit, \[ and \c[ were thought to be such a class, when in fact they aren't.
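The distinction this commit draws can be sketched roughly as follows. The helper name `has_bracketed_class` is hypothetical and is not the code in regexp.t; the sketch only assumes that dropping backslash escapes (with `\c` taking one extra character) before looking for `[` is enough to tell a real class from `\[` or `\c[`.

```perl
use strict;
use warnings;

# Hypothetical helper (not the code in regexp.t): report whether a pattern
# string contains an unescaped '[' that could open a bracketed class.
sub has_bracketed_class {
    my ($pattern) = @_;
    # Drop every backslash escape first. '\c' consumes one extra
    # character, so '\c[' disappears as a unit, just like '\['.
    (my $stripped = $pattern) =~ s/\\c?.//gs;
    return $stripped =~ /\[/ ? 1 : 0;
}

for my $pattern ('a[bc]d', 'a\[b', 'a\c[b', '\\\\[bc]') {
    printf "%-8s %s\n", $pattern,
        has_bracketed_class($pattern) ? 'class' : 'no class';
}
```

Only the first and last patterns contain a real class; in the last one the backslash is itself escaped, so the `[` that follows is live.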
* t/re/regexp.t: Add ability to skip depending on platform | Karl Williamson | 2014-10-21 | 1 | -0/+10
    This adds the capability to specify that a test is to be done only on an ASCII platform, or only on an EBCDIC.
* t/re/regexp.t: Generalize for non-ASCII platforms | Karl Williamson | 2014-10-21 | 1 | -0/+29
    This adds code to the processing of the tests in t/re/re_tests to automatically convert most character constants from Unicode to native character sets. This allows most tests in t/re/re_tests to be run on both platforms without change. A later commit will add the capability to skip individual tests if on the wrong platform, so those few tests that this commit doesn't work for can be accommodated.
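As a rough illustration of that kind of conversion, here is a minimal sketch. The helper `unicode_constants_to_native` is hypothetical and handles only the `\x{...}` form; it is not the code added to regexp.t, which may cover other forms and work differently. It relies on the core function `utf8::unicode_to_native`.

```perl
use strict;
use warnings;

# Hypothetical helper (not the code in regexp.t): rewrite \x{...}
# constants in a test string from Unicode code points to native ones.
sub unicode_constants_to_native {
    my ($text) = @_;
    $text =~ s/\\x\{([[:xdigit:]]+)\}/sprintf('\\x{%X}', utf8::unicode_to_native(hex $1))/ge;
    return $text;
}

# On an ASCII platform the mapping is the identity; on EBCDIC the
# constant would be rewritten to the native code for 'A'.
print unicode_constants_to_native('a\x{41}b'), "\n";
```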
* Add test names to t/re/regexp.t and friends | Yves Orton | 2014-10-20 | 1 | -13/+20
* Skip t/re/regexp.t under miniperl unless uni tables exist | Father Chrysostomos | 2014-09-02 | 1 | -1/+4
    As of 2db3e09128, attempts to load Unicode tables under miniperl croak instead of failing silently.
* Partial minitest fix-up | Father Chrysostomos | 2014-08-23 | 1 | -1/+1
    While minitest passes all its tests when everything has been built, it is sometimes useful to run it when nothing has been built but miniperl (especially when one is working on low-level stuff that breaks miniperl). Many tests fail if things have not been built yet because miniperl can’t find modules like re.pm. This patch fixes up some tests to find those modules and changes _charnames.pm to load File::Spec only when it needs it. There are still many more failures, but I’ll leave the rest for another time (or another hacker :-).
* Add comments that re tests can be commented in col 7 | Karl Williamson | 2013-12-16 | 1 | -0/+2
* Fix and add tests for *PRUNE/*THEN plus leading non-greedy + | Yves Orton | 2013-06-22 | 1 | -0/+3
    "aaabc" should match /a+?(*THEN)bc/ with "abc".
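The case named in this commit message can be checked directly. The snippet below is only a sketch, not the test added to t/re/re_tests; on a perl that contains this fix it is expected to report a match on "abc".

```perl
use strict;
use warnings;

# Pattern and subject string taken from the commit message above.
my $subject = 'aaabc';
if ($subject =~ /a+?(*THEN)bc/) {
    print "matched: $&\n";
}
else {
    print "no match\n";
}
```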
* Add back-compat (?[ ]) tests | Karl Williamson | 2013-01-11 | 1 | -4/+194
    This adds testing of (?[ ]), using the same tests, t/re/re_tests, as are used by many of the regular expression .t files. Basically, it converts the [bracketed] character classes in these tests to the new syntax and verifies that they work there. Some tests won't work in one or the other, and the capability to skip depending on the .t is added.
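The equivalence being tested can be sketched as follows: a classic bracketed class and the same class wrapped in (?[ ]) should accept and reject the same strings. This is a hand-written illustration assuming Perl 5.18 or later, not the conversion code in regexp.t.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';  # (?[ ]) was experimental when added

my $classic  = qr/^[a-c]+$/;
my $extended = qr/^(?[ [a-c] ])+$/;

for my $string ('abc', 'abd') {
    my $old = $string =~ $classic  ? 1 : 0;
    my $new = $string =~ $extended ? 1 : 0;
    print "$string: classic=$old extended=$new\n";
}
```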
* regexp.t: Add a period in test name skip reason | Karl Williamson | 2013-01-11 | 1 | -1/+1
    This is easier to read.
* regexp.t: Skip tests that are supposed to | Karl Williamson | 2013-01-08 | 1 | -5/+5
    This reorders some if elsif ... blocks so that skip is tested for and done before actually trying the test. This only affected tests which were supposed to generate compiler errors.
* regexp.t: Add 'no warnings "utf8";' | Karl Williamson | 2012-10-14 | 1 | -0/+1
    This .t works fine unless there are failures that it tries to output, and the handle hasn't been opened using utf8. Because we aren't sure if that operation works, just turn off warnings.
* Avoid t/re/regexp.t failing on miniperl when displaying TODO test output. | Nicholas Clark | 2012-01-02 | 1 | -1/+5
    Change 1e9285c2ad54ae39 refactored Data::Dumper to load on miniperl. t/re/regexp.t attempts to load Data::Dumper (in an eval) to display failure output, including the failure of TODO tests. Hence Data::Dumper is now loaded without error as part of minitest, so regexp.t then attempts to use Data::Dumper to output better diagnostics. This fails (hard) because Data::Dumper attempts to load Scalar::Util, which attempts to load B, which bails out because this is miniperl. It's not obvious that there's a 100% solution here that gets full-on Data::Dumper functionality for miniperl.
* Tidy up t/re/regexp.t | Nicholas Clark | 2011-12-03 | 1 | -5/+4
    Eliminate the declaration of $numtests, unused since commit 1a6108908b085da4. Convert $iters and $OP to lexicals. Remove the vestigial logic for finding t/re/re_tests - the MacOS classic style pathname is redundant now, and the file can never be found at t/re/re_tests given that there is a chdir 't' in the BEGIN block.
* [RT #36079] Convert ` to '. | jkeenan | 2011-11-22 | 1 | -4/+4
* regexp.t: print diagnostics with leading '#' | Karl Williamson | 2011-09-24 | 1 | -3/+8
    Some test platforms don't like unexpected output without the comment prefix character.
* Move the special-case logic for $qr_embed_thr to regexp_qr_embed_thr.t | Nicholas Clark | 2011-03-08 | 1 | -12/+0
    t/re/regexp_qr_embed_thr.t is the only place that sets $::qr_embed_thr, so move the special-case startup logic related to it from t/re/regexp.t to t/re/regexp_qr_embed_thr.t. Use the skip_all_*() functions from test.pl.
* Allow t/re/regexp.t to conditionally skip tests on miniperl | Nicholas Clark | 2011-02-22 | 1 | -0/+2
    Annotate that all tests for %+ and %- are to be skipped on miniperl.
* Fix typos (spelling errors) in t/*. | Peter J. Acklam) (via RT | 2011-01-07 | 1 | -1/+1
    # New Ticket Created by (Peter J. Acklam)
    # Please include the string: [perl #81916]
    # in the subject line of all future correspondence about this issue.
    # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81916 >
* PATCH: [perl #56444] delayed interpolation of \N{...} | Karl Williamson | 2010-02-19 | 1 | -0/+4
    make regen embed.fnc needs to be run on this patch.

    This patch fixes Bugs #56444 and #62056.

    Hopefully we have finally gotten this right. The parser used to handle all the escaped constants, expanding \x2e to its single byte equivalent. The problem is that for regexp patterns, this is a '.', which is a metacharacter and has special meaning that \x2e does not. So things were changed so that the parser didn't expand things in patterns. But this causes problems for \N{NAME}, when the pattern doesn't get evaluated until runtime, as for example when it has a scalar reference in it, like qr/$foo\N{NAME}/. We want the value for \N{NAME} that was in effect at the point during the parsing phase that this regex was encountered in, but we don't actually look at it until runtime, when these bug reports show that it is gone.

    The solution is for the tokenizer to parse \N{NAME}, but to compile it into an intermediate value that won't ever be considered a metacharacter. We have chosen to compile NAME to its equivalent code point value, and express it in the already existing \N{U+...} form. This indicates to the regex compiler that the original input was a named character and retains the value it had at that point in the parse. This means that \N{U+...} now always must imply Unicode semantics for the string or pattern it appeared in. Previously there was an inconsistency, where effectively \N{NAME} implied Unicode semantics, but \N{U+...} did not necessarily. So now, any string or pattern that has either of these forms is utf8 upgraded.

    A complication is that a charnames handler can return a sequence of multiple characters instead of just one. To deal with this case, the tokenizer will generate a constant of the form \N{U+c1.c2.c3...}, where c1 etc are the individual characters. Perhaps this will be made a public interface someday, but I decided to not expose it externally as far as possible for now in case we find reason to change it. It is possible to defeat this by passing it in a single quoted string to the regex compiler, so the documentation will be changed to discourage that.

    A further complication is that \N can have an additional meaning: to match a non-newline. This means that the two meanings have to be disambiguated. embed.fnc was changed to make public the function regcurly() in regcomp.c so that it could be referred to in toke.c to see if the ... in \N{...} is a legal quantifier like {2,}. This is used in the disambiguation.

    toke.c was changed to update some out-dated relevant comments. It now parses \N in patterns. If it determines that it isn't a named sequence, it passes it through unchanged. This happens when there is no brace after the \N, or no closing brace, or if the braces enclose a legal quantifier. Previously there has been essentially no restriction on what can come between the braces so that a custom translator can accept virtually anything. Now, legal quantifiers are assumed to mean that the \N is a "match non-newline that quantity of times".

    I removed the #ifdef'd out code that had been left in in case pack U reverted to earlier behavior. I did this because it complicated things, and because the change to pack U has been in long enough and shown that it is correct so it's not likely to be reverted.

    \N meaning a named character is handled differently depending on whether this is a pattern or not. In all cases, the output will be upgraded to utf8 because a named character implies Unicode semantics. If not a pattern, the \N is parsed into a utf8 string, as before. Otherwise it will be parsed into the intermediate \N{U+...} form. If the original was already a valid \N{U+...} constant, it is passed through unchanged.

    I now check that the sequence returned by the charnames handler is not malformed, which was lacking before.

    The code in regcomp.c which dealt with interfacing with the charnames handler has been removed. All the values should be determined by the time regcomp.c gets involved. The affected subroutine is necessarily restructured. An EXACT-type node is generated for the character sequence. Such a node has a capacity of 255 bytes, and so it is possible to overflow it. This wasn't checked for before, but now it is, and a warning issued and the overflowing characters are discarded.
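The three readings of \N described in this commit message can be seen from user-level code. The following is a small sketch assuming Perl 5.12 or later; it exercises only the visible behavior and says nothing about the internal \N{U+c1.c2...} form, which the message describes as not public.

```perl
use strict;
use warnings;
use charnames ':full';   # explicit load; newer perls do this on demand

# \N{NAME} and \N{U+...}: both denote one specific character.
print "named ok\n" if 'A' =~ /^\N{LATIN CAPITAL LETTER A}$/;
print "U+ ok\n"    if 'A' =~ /^\N{U+41}$/;

# \N followed by a legal quantifier: "non-newline, that many times".
if ("ab\ncd" =~ /(\N{2})/) {
    print "quantified: $1\n";
}

# Bare \N: any single character except newline.
print "bare \\N rejects newline\n" if "\n" !~ /\N/;
```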
* Update some remaining comments that still point to the old regexp tests location | Vincent Pit | 2009-09-10 | 1 | -1/+1
* missed a comment reference to t/op that should now be t/re | Yves Orton | 2009-09-10 | 1 | -1/+1
* move regex related tests out of t/op/ into t/re/ | Yves Orton | 2009-09-10 | 1 | -0/+207