| Commit message | Author | Age | Files | Lines |
|
The documentation says this is how it should behave, but only one of the
three paths in the code did it, and in fact there was a test to the
contrary.
I'm only adding a test for one of the two fixed paths, as the other one
appears to require a weird file name.
|
Previously, it was a warning with the REPLACEMENT CHARACTER substituted.
Unicode recommends that it be a syntax error, and any code that used
this had to be buggy since the REPLACEMENT CHARACTER has no other use in
Unicode.
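A minimal sketch of the resulting behavior, assuming a Perl recent enough to treat an unknown name as fatal. The name NO SUCH CHARACTER NAME is made up for illustration, and the string eval is only there to trap the compile-time failure so the script keeps running:

```perl
use strict;
use warnings;

# Compile a \N{} with a made-up name inside a string eval, so the
# compile-time error is trapped instead of killing this script.
my $compiled = eval q{ my $s = "\N{NO SUCH CHARACTER NAME}"; 1 };
my $error    = $@;

print $compiled ? "compiled\n" : "rejected: $error";
```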
|
This is supposed to print as-is, not interpolate.
|
As a result of the Unicode 6.0 mistake of using "BELL" to refer to
a different code point, Perl has deprecated use of this name for two major
release cycles, while not fully implementing Unicode in the interim, to
allow any affected code to migrate to the new name.
This commit now switches to the new meaning of BELL.
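As a rough illustration of the new meaning, assuming a Perl that already includes this switch, the two names should now resolve to different code points:

```perl
use strict;
use warnings;
use charnames ":full";

# On a Perl that has completed the switch (assumed here), BELL is the
# Unicode 6.0 character U+1F514, and ALERT names the control U+0007.
my $bell  = charnames::vianame("BELL");
my $alert = charnames::vianame("ALERT");

printf "BELL  => U+%04X\n", $bell;
printf "ALERT => U+%04X\n", $alert;
```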
|
Indent newly formed block
|
This is a recent addition. Use alternate means if the file doesn't
exist in the Unicode release, or is for a non-ASCII platform (as the
alternate means should take care of the translation in that case).
|
If the Unicode release doesn't contain hangul syllables, just skip those
tests.
|
Instead of dying when applied to a Unicode version that doesn't have
named sequences, skip them.
|
There are now four characters which have a different preferred name.
|
This commit changes the viacode() returned name for four control characters, as
follows:
    Code point   Old Name                New Name
    U+000A       LINE FEED (LF)          LINE FEED
    U+000C       FORM FEED (FF)          FORM FEED
    U+000D       CARRIAGE RETURN (CR)    CARRIAGE RETURN
    U+0085       NEXT LINE (NEL)         NEXT LINE
Only the return from viacode is affected. All the names are accepted as
input, as they always have been.
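A small sketch of what this looks like from the caller's side, assuming a Perl that includes this change. The loop feeds each old name to vianame() and prints whatever viacode() now returns for that code point; the fallback branch is there only in case a given Perl build no longer recognizes an old name:

```perl
use strict;
use warnings;
use charnames ":full";

my @old_names = ("LINE FEED (LF)", "FORM FEED (FF)",
                 "CARRIAGE RETURN (CR)", "NEXT LINE (NEL)");

for my $old (@old_names) {
    my $code = charnames::vianame($old);    # old names as input
    if (defined $code) {
        printf "%-22s => U+%04X => %s\n", $old, $code,
               charnames::viacode($code);   # new name as output
    }
    else {
        printf "%-22s => not recognized by this perl\n", $old;
    }
}
```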
Unicode 6.1 now has official names for all the controls, and the new
names match those. The old names were the ones that were recommended by
TR18 prior to 6.1, and still are, sort of. This change uses the
official names in preference to the TR18 ones. We probably wouldn't
bother except that the old names were problematic--the only names in the
whole universe of names containing parentheses, and not matching
traditional usage. The new names have always been accepted as inputs by
Perl.
I actually doubt that Unicode ever grokked that they were recommending
these ugly names, and they haven't paid much attention to TR18 anyway,
breaking it in version 6.0 by encoding one of the recommended names
(BELL) as an official name for another code point, and without realizing
it. TR18 now is in limbo, still wrongly recommending BELL, with a
rewrite being promised for many months now. It's unclear what will
happen with it.
It was agreed on p5p to go with the cleaner, now official names, instead
of the older, likely obsolete, TR18 names. I did a search of
CPAN; it was unclear whether this change (which again is only for viacode())
mattered to any code there. There were a few instances of the
old names, but none of those were apparently associated with viacode().
|
This commit delivers the official Unicode character database files for
release 6.1, plus the final bits needed to cope with the changes in them
from release 6.0, including documentation.
|
In versions of Unicode earlier than 6.1, there was no possibility of a
name being empty here; but 6.1 will make that happen, so guard against
it.
|
This adds test names to nearly all the ones that were missing. Most
were done via a global substitution in a text editor.
|
This test called grep followed by the comma operator, and the non-empty
string after the comma operator caused it to always succeed.
|
This autoloads charnames.pm when needed. It uses the :full and :short
options. :loose is not used because of its relative unfamiliarity in
the Perl community, and because it is slower. (If someone later added a typical
"use charnames qw(:full)", things that previously matched under :loose
would start to fail, causing confusion. If :loose does become more
common, we can change this in the future to use it; the converse isn't
true.)
The callable functions in the module are not automatically loaded. To
access them, an explicit "use charnames" must be provided.
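A minimal sketch of the effect, assuming a Perl that includes this autoloading. The file deliberately contains no "use charnames", yet both the :full and the :short forms of \N{} are expected to resolve:

```perl
use strict;
use warnings;
# Note: there is no "use charnames" anywhere in this file.

my $full  = "\N{GREEK SMALL LETTER ALPHA}";   # :full style name
my $short = "\N{greek:alpha}";                # :short style name

printf "U+%04X U+%04X\n", ord $full, ord $short;
```

Calling charnames::viacode() or a similar function from such a file would still need the explicit "use charnames".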
Thanks to Tony Cook for doing a code inspection and finding a missing
SPAGAIN.
|
This adds the capability to charnames to use Unicode loose name
look-ups, via ":loose" being specified in the pragma.
The number of tests per code point is doubled in the .t, so to preserve
the same amount of elapsed test time, the number of code points tested
in each run is halved.
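For illustration, a short sketch of a loose look-up, assuming a Perl with :loose support. Only case-insensitivity is exercised here; both the compile-time \N{} and the run-time vianame() are expected to honor the pragma in scope:

```perl
use strict;
use warnings;
use charnames ":loose";

# Lower-case spelling of GREEK SMALL LETTER SIGMA, matched loosely.
my $sigma = "\N{greek small letter sigma}";
my $code  = charnames::vianame("greek small letter sigma");

printf "U+%04X U+%04X\n", ord $sigma, $code;
```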
|
"use charnames qw(.*)" will match any script; it should match none.
|
The abbreviations for 4 of the C1 controls have a trailing blank.
Unfortunately so did the tests for them.
|
# New Ticket Created by (Peter J. Acklam)
# Please include the string: [perl #81890]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81890 >
Signed-off-by: Abigail <abigail@abigail.be>
|
Unicode version 6.0 has co-opted the name BELL for a different character
than traditionally used in Perl. This patch works around that by adding
ALERT as a synonym for BELL, and causing a deprecation warning for uses
of the old name.
The new Unicode character will be nameless in Perl 5.14, unless I can
(unlikely) get Unicode to grant a synonym that they will support.
|
This is a white-space only patch to indent the code that was put into an
if block by the previous commit.
|
This patch makes this .t look for this environment variable and, if it is
set, run more tests. There are two levels of setting, as explained in the
comments.
|
There are several cases where more than one name is valid for a code
point. This happens usually when the original name was published with a
typo in it. It's best for viacode to return the revised name, though
the original remains valid.
The names data is in a table generated by mktables exclusively for
charnames, including vianame (and its kin) and viacode. The fix is for
mktables to put the more correct name first in the table, so that it is
found first and returned by viacode().
When I originally designed this code, I thought the correct name should
come last in the tables, so someone looping and reading it could just
overwrite the less correct one with the more correct one.
But to save memory we have the same table shared by viacode and
vianame, and vianame has to recognize both names, so both entries
are needed. viacode could do an rindex to find the more correct name,
but experiments showed that this was twice as slow as going the other
direction. Therefore, this patch is for speed.
If the tables for vianame and viacode were ever to be split, this patch
could be reverted, if desired, to put things back to the reverse order.
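The ordering idea can be sketched as follows. This is a hypothetical miniature, not the real table format or the charnames code: the table layout and both helper names are assumptions made for illustration. With the corrected name listed first, a forward index() finds it for the code-to-name direction, while the name-to-code direction still sees both entries:

```perl
use strict;
use warnings;

# Hypothetical miniature of a shared name table: one "CODE<TAB>NAME" line
# per name, with the corrected name listed before the original one.
my $table = join "",
    "001A2\tLATIN CAPITAL LETTER GHA\n",    # corrected name, listed first
    "001A2\tLATIN CAPITAL LETTER OI\n",     # original name, still valid
    "001A3\tLATIN SMALL LETTER GHA\n",
    "001A3\tLATIN SMALL LETTER OI\n";

# Code point to name: a forward index() stops at the first (preferred) entry.
sub name_for_code {
    my ($code) = @_;
    my $key = sprintf "%05X\t", $code;
    my $pos = index "\n$table", "\n$key";   # match only at the start of a line
    return if $pos < 0;
    my $start = $pos + length $key;         # offset of the name in $table
    my $end   = index $table, "\n", $start;
    return substr $table, $start, $end - $start;
}

# Name to code point: either name is found, since both lines are present.
sub code_for_name {
    my ($name) = @_;
    my $pos = index $table, "\t$name\n";
    return if $pos < 0;
    my $line_start = rindex($table, "\n", $pos) + 1;
    return hex substr $table, $line_start, $pos - $line_start;
}

printf "U+01A2 => %s\n", name_for_code(0x1A2);
printf "LATIN CAPITAL LETTER OI => U+%04X\n",
       code_for_name("LATIN CAPITAL LETTER OI");
```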
|
Some code points have two (possibly more) names. This makes sure that
all work.
|
mktables is changed to process the Unicode named sequence file.
charnames.pm is changed to cache the looked-up values in utf8. A new
function, string_vianame is created that can handle named sequences, as
the interface for vianame cannot. The subroutine lookup_name() is
slightly refactored to do almost all of the common work for \N{} and the
vianame routines. It now understands named sequences as created by
mktables.
Tests and documentation are added. In the randomized testing section,
half use vianame() and half string_vianame().
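A brief sketch of the new function in use, assuming a Perl that provides string_vianame(). KATAKANA LETTER AINU P is used as the example named sequence (U+31F7 followed by U+309A); the result is a string, which is why the ordinal-returning vianame() interface cannot carry it:

```perl
use strict;
use warnings;
use charnames ":full";

# A single character and a named sequence, both returned as strings.
my $single   = charnames::string_vianame("LATIN SMALL LETTER A");
my $sequence = charnames::string_vianame("KATAKANA LETTER AINU P");

printf "LATIN SMALL LETTER A: %d character(s)\n", length $single;
printf "KATAKANA LETTER AINU P: %s\n",
       join " ", map { sprintf "U+%04X", ord } split //, $sequence;
```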
|
If the percentage of characters to test is changed to 100%, add code to
make the block size 1. This guarantees each character gets tested in
spite of randomness.
|
srand(undef) is the same as srand(0). The code is trying to get random
seeds, not a fixed one.
|
Commit 8f776eae73090661 turned out to be a bit optimistic with
"should be capable of running in parallel", as the temporary files and
modules written out by the various test scripts have clashing names.
Hence run each test in a private subdirectory.
|
Actually, this tweaks the amount of testing of characters whose names are
algorithmically determinable, most of which are CJK characters.
This patch changes the testing to test not 1% of them, but to test 1 in
each block, no matter what the block size. We really don't need to test
many of these to be confident the algorithm is working.
It also adds some comments to clarify what happens if one tweaks the
block size.
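The one-per-block idea can be sketched as below. The helper name, the range, and the block size of 256 are assumptions for illustration, not the values or code used in the .t:

```perl
use strict;
use warnings;

# Pick one random code point from each block of $block_size in the
# inclusive range $first..$last (hypothetical helper, not the .t's code).
sub sample_one_per_block {
    my ($first, $last, $block_size) = @_;
    my @picked;
    for (my $start = $first; $start <= $last; $start += $block_size) {
        my $end = $start + $block_size - 1;
        $end = $last if $end > $last;           # final block may be short
        push @picked, $start + int rand($end - $start + 1);
    }
    return @picked;
}

# A range covering the CJK Unified Ideographs block, one pick per 256.
my @cjk = sample_one_per_block(0x4E00, 0x9FFF, 256);
printf "%d code points sampled\n", scalar @cjk;
```

With a block size of 1 the helper returns every code point in the range, whatever the random numbers are.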
|
This is an important message. Better not to wrap it.
|
Somehow an empty line got inserted at the end of the file, and got
interpreted as 0's which caused the test for NULL to fail. This guards
against that.
I removed the empty line, but I'm not sure git has picked that up.
|
Don't calculate our own seed
|
This patch causes \N{}, vianame, and viacode to know the names of all
Unicode code points. Previously the names that are algorithmically
determinable were not handled. These include the Hangul syllables and
many CJK characters.
It simply adds using the routines that mktables inserts into Name.pl
that handle these characters. mktables generates these algorithms from
data in the Unicode database. The routines have been there since
11/2009 in anticipation of this change, but have been unused until now.
They probably have not been reviewed thoroughly.
The major change to this is the .t file. Now that all code points are
understood, the .t tests them all. But this would take too long each
time, so it tests a random sample. If there is a failure, the seed is
output so that the test can be reproduced. This idea came from Michael
Schwern, and is the same one he uses in Test::Sims. Various parameters
about the sampling are easily adjustable.
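The seed-reporting idea might look roughly like this. It is a sketch that assumes a Perl where srand() returns the seed it chose (5.14 or later), and it is not the code in the .t:

```perl
use strict;
use warnings;

# srand() with no argument picks a seed and (on Perl 5.14+) returns it.
my $seed = srand();
my @first_run = map { int rand 0x110000 } 1 .. 5;

# Re-seeding with the reported value replays the same sample.
srand($seed);
my @second_run = map { int rand 0x110000 } 1 .. 5;

print "seed $seed\n";
printf "sample: %s\n", join " ", map { sprintf "U+%04X", $_ } @first_run;
```

Printing the seed only on failure, as the test does, keeps normal output quiet while still allowing a failing sample to be replayed.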
|
The form "0x..." is supposed to evaluate as if it weren't a string.
Make sure that is tested.
|
The syntax for name look ups under :short is 'script:letter'. Allow
spaces adjacent to the colon and (while we're at it) at the beginning
and end.
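For reference, a minimal sketch of the basic 'script:letter' form, assuming a Perl with :short support. The spaced variants this commit allows are not exercised here:

```perl
use strict;
use warnings;
use charnames ":short";

my $small   = "\N{greek:alpha}";   # lower-case letter name: small letter
my $capital = "\N{greek:Alpha}";   # capitalized letter name: capital letter

printf "U+%04X U+%04X\n", ord $small, ord $capital;
```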
|
This was done by moving what could to %^H. Because data structures in
%^H get stringified at runtime, new serialized entries for them had to
be created and then unserialized on each runtime call. Also, because
%^H is read-only at runtime, some data structures couldn't be moved to
it. Things were set up so that these contain only things invariant
under scoping, and looked at only when the same scoped options are in
effect as when they were created. Further comments at declaration of
%full_names_cache.
I was well into this patch when it dawned on me that it was doing
unnecessary tests, so that
    if (! a) { conditionally set a }
    if (! a) {}
could be implemented more efficiently as
    if (! a) {
        conditionally set a
        if (! a) {}
    }
so I changed it, which messes up leading indentation for the diffs.
|
Use of t/lib/common.pl caused some glitches; some behaviors of the
underlying is() functions changed, so the .t was revised to work under
this scheme.
|
When vianame returns a chr, it now verifies that it is legal under 'use
bytes'. Update .t.
Taking a substr of a huge string is needed only in an
error leg. Move it to that leg for performance.
And make the message a subroutine so it will be identical whenever raised.
|
This changes viacode to accept aliases that the user has defined beyond
the Unicode range.
|
This patch refactors charnames so that vianame and \N call the same
common subroutine so that their behavior is as identical as possible.
|