| Commit message (Collapse) | Author | Age | Files | Lines |
| |
In this loop, if a code point is unassigned, it means that the test is
being run on an early Unicode version which doesn't have this character
yet, or something is very wrong. Instead of persisting with the tests
that aren't going to succeed, fail with an appropriate message.
This means that the .t will not pass, but it gives fewer and better
messages. We want to mark the failure for the case where the problem
isn't an early Unicode version.
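
As a rough illustration of the fail-early idea, the sketch below produces one clear message for an unassigned code point instead of running further checks on it. The helper name `unassigned_message` is hypothetical, and this is not the code in the .t itself; it assumes that matching `\p{Cn}` (General_Category=Unassigned) is an acceptable way to detect the condition.

```perl
use strict;
use warnings;

# Hypothetical helper: return a diagnostic string when $cp is unassigned
# in the Unicode version this perl was built with, or undef otherwise.
sub unassigned_message {
    my ($cp) = @_;
    return undef unless chr($cp) =~ /\A\p{Cn}\z/;
    return sprintf "U+%04X is unassigned in this Unicode version; "
                 . "not running its remaining tests", $cp;
}

for my $cp (0x41, 0xE9, 0x3B1) {
    my $msg = unassigned_message($cp);
    if (defined $msg) {
        print "not ok - $msg\n";    # one message, then move on
        next;
    }
    printf "ok - U+%04X is assigned\n", $cp;
}
```

A real test would call its harness's failure function rather than printing, but the control flow is the same: report once, then skip the dependent tests.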
|
| |
This allows this .t to work on early Unicodes.
|
| |
This changes case.pl to use Unicode::UCD instead of directly reading
the casing files. This allows it to be used on Unicode releases that
don't have those files, as Unicode::UCD has the intelligence to cope
with that. The EBCDIC code in it can be removed as Unicode::UCD should
cope with that as well.
As a result, the .t's that call it have a slightly different API.
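
For readers unfamiliar with Unicode::UCD, here is a minimal sketch of looking up a casing mapping through the module instead of parsing the data files. It is not the code in case.pl; the helper name `simple_upper` is hypothetical, and it uses only `charinfo()`, whose `upper` field holds the simple uppercase mapping as a hex string (empty when there is none).

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: simple uppercase mapping of $cp as a number.
# Falls back to $cp itself when the code point is unassigned or has
# no simple uppercase mapping.
sub simple_upper {
    my ($cp) = @_;
    my $info = charinfo($cp);    # undef for unassigned code points
    return $cp unless defined $info && length $info->{upper};
    return hex $info->{upper};
}

printf "U+%04X -> U+%04X\n", $_, simple_upper($_) for 0x61, 0xE9, 0x41;
```

Going through the module keeps the caller independent of which data files a given Unicode release actually ships.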
|
| |
(Also allows the tempfile() to be unlink()ed :-)
|
| |
A recent change exposed a faulty test in t/uni/labels.t.
Previously, a downgraded label passed to eval under 'use utf8;'
would've been erroneously considered UTF-8 and the tests
would pass. Now it's correctly reported as illegal UTF-8
unless unicode_eval is in effect.
|
| |
A different ‘(eval xxx)’ number was being emitted under miniperl.
|
| |
This fixes up tests added in the previous commit, making them take
evalbytes into account. Those tests were originally written in a
branch where evalbytes didn’t exist and the unicode_eval feature
was implicitly enabled.
|
| |
This meant changing LABEL's definition in perly.y, so most of this
commit is actually from the regened files.
|
| |
Along with the simple_casefolding and full_casefolding features.
fc() stands for foldcase, a sort of pseudo case (like lowercase),
which is used to implement Unicode casefolding. It maps a string
to a form where all case differences are erased, so it's a
locale-independent way of checking if two strings are the same,
regardless of case.
This functionality was, and still is, available through the
regular expression engine -- /i matches would use casefolding
internally. The fc keyword merely exposes this for easier access.
Previously, one could attempt to case-insensitively test two strings
for equality by doing

    lc($a) eq lc($b)

But that might get you wrong results, for example in the case of
\x{DF}, LATIN SMALL LETTER SHARP S.
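
The difference can be sketched as follows. This assumes the `fc` feature name as released in perl 5.16 and later, and enables `unicode_strings` so that the byte string containing \x{DF} gets Unicode semantics; the helper names `equal_fc` and `equal_lc` are hypothetical.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);   # 'fc' needs perl 5.16 or later

# Case-insensitive equality via full Unicode casefolding.
sub equal_fc { my ($x, $y) = @_; return fc($x) eq fc($y) ? 1 : 0 }

# The lc()-based comparison from the text, for contrast.
sub equal_lc { my ($x, $y) = @_; return lc($x) eq lc($y) ? 1 : 0 }

my ($sharp, $plain) = ("stra\x{DF}e", "STRASSE");
printf "lc: %d, fc: %d\n",
    equal_lc($sharp, $plain), equal_fc($sharp, $plain);
```

Here the lc() comparison reports a mismatch, because \x{DF} lowercases to itself, while the fc() comparison reports a match, because \x{DF} folds to "ss".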
|
| |
Commit 604a99bd464c92d7 enabled the warning for package arrays, but failed
to lexically disable the warning for the various tests for the construction.
Even though the construction is deprecated, we'd still like to know if the
behaviour changes, in case it wasn't intentional.
|
| |
All Unicode properties actually turn into bracketed character classes,
whether explicitly done or not. A swash is generated for each property
in the class. If that is the only thing not in the class's bitmap, it
specifies completely the non-bitmap behavior of the class, and can be
passed explicitly to regexec.c. This avoids having to regenerate the
swash. It also means that the same swash is used for multiple instances
of a property. And that means the number of duplicated data structures
is greatly reduced. This currently doesn't extend to cases where
multiple Unicode properties are used in the same class:
[\p{greek}\p{latin}] will not share the same swash as another character
class with the same components. This is because I don't know of an
efficient method to determine if a new class being parsed has the
same components as one already generated. I suppose some sort of
checksum could be generated, but that is for future consideration.
|
| |
Future commits are planned to move the resolution of Unicode properties
from regex execution time to compile time. By moving the code into a
BEGIN block, this .t can now handle both types. Before this patch, it
wouldn't show any activity at all if things were done at compile time.
|
| |
This new test makes sure that a regular expression that forward
references a user-defined property works.
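
A forward reference of this kind might look like the sketch below, where the pattern appears in the file before the sub that defines the property. The names `all_vowels` and `IsAsciiVowel` are hypothetical and are not taken from the test itself.

```perl
use strict;
use warnings;

# This pattern names \p{IsAsciiVowel} before the sub that defines the
# property appears in the file.
sub all_vowels {
    my ($s) = @_;
    return $s =~ /\A\p{IsAsciiVowel}+\z/ ? 1 : 0;
}

# User-defined property (the name must start with "In" or "Is"):
# returns one hexadecimal code point per line.
sub IsAsciiVowel {
    return join '', map { sprintf("%04X\n", ord($_)) } qw(a e i o u);
}

printf "%s: %s\n", $_, (all_vowels($_) ? "match" : "no match")
    for qw(aeiou audio perl);
```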
|
| |
Previously the first letter for latin-1 classnames was being mischecked, only
allowing ASCII, which caused an instance of the Unicode Bug for downgradable
classnames.
|
| |
Change every existing instance of isa_ok() to use object_ok(). This is safe
because before this point, t/test.pl's isa_ok() only worked on objects.
lib/dbmt_common.pl is the last holdout because it uses Test::More.
object_ok() and class_ok() are like isa_ok(), but they also check whether
the thing being tested is an object or a class, respectively.
This lets the core tests defend against outlandish bugs while allowing
t/test.pl to retain feature parity with Test::More.
|
| |
Future commits are planned to change the base list in various
mapping tables to include the simple maps which are now suppressed
when there are full maps that override them.
The current test blindly tests all the simple maps, which would
start to fail because the core uses the full maps when available.
So, simply don't test the overridden ones.
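
One way to express "don't test the overridden ones" is sketched below. It is not the test's actual code: the helper name `has_full_map` is hypothetical, and it assumes that an entry returned by Unicode::UCD's `casespecial()` is a good enough signal that a full map overrides the simple one.

```perl
use strict;
use warnings;
use Unicode::UCD qw(casespecial);

# Hypothetical helper: true when $cp has a full (multi-character or
# conditional) case mapping, i.e. casespecial() has an entry for it.
sub has_full_map {
    my ($cp) = @_;
    return defined casespecial($cp) ? 1 : 0;
}

for my $cp (0x61, 0xDF, 0xFB01) {
    next if has_full_map($cp);    # overridden simple map: don't test it
    printf "would test the simple maps of U+%04X\n", $cp;
}
```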
|
| |
(modified by the committer only to apply when the unicode_eval
feature is enabled)
|
| |
This makes perl -E '$::{example} = "\x{30cb}"; say prototype example;'
store and fetch the correctly flagged prototype.
With this, all TODO tests in gv.t pass. The next commit will deal
with making the parsing of prototypes nul-clean.
|