./perl -Ilib Porting/cmpVERSION.pl -xd . v5.13.8
|
|
|
|
|
|
|
|
|
| |
# New Ticket Created by (Peter J. Acklam)
# Please include the string: [perl #81890]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81890 >
Signed-off-by: Abigail <abigail@abigail.be>
|
|
|
|
|
|
|
|
|
|
| |
Unicode version 6.0 has co-opted the name BELL for a different character
than traditionally used in Perl. This patch works around that by adding
ALERT as a synonym for BELL, and causing a deprecated warning for uses
of the old name.
The new Unicode character will be nameless in Perl 5.14, unless I can
(unlikely) get Unicode to grant a synonym that they will support.
|
|
|
|
| |
This reverts commit fc7c69e2154d208e45cf3f5c98b74ed035d8f50c.
|
| |
|
|
|
|
|
|
| |
Now that the code has less indent, the comments don't need so many
lines. The only changes in this commit reflow several blocks of comments
to occupy more of each line. No wording changes are involved.
|
|
|
|
|
|
| |
This patch changes white space only. It lessens the indent of certain
lines that were made longer in an earlier commit, and now most of them
fit into 80 columns.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mktables is changed to process the Unicode named sequence file.
charnames.pm is changed to cache the looked-up values in utf8. A new
function, string_vianame, is created that can handle named sequences, as
the interface for vianame cannot. The subroutine lookup_name() is
slightly refactored to do almost all of the common work for \N{} and the
vianame routines. It now understands named sequences as created by
mktables.
Tests and documentation are added. In the randomized testing section,
half use vianame() and half string_vianame().
|
| |
|
| |
|
|
|
|
|
| |
This is an intermediate commit in preparation for handling named
sequences.
|
|
|
|
|
| |
The double \t\t is unnecessary, and so we can remove one of them,
shortening the table.
|
|
|
|
|
|
| |
mktables is changed to output 5 digit code points, which means that
charnames doesn't have to go looking for the boundaries, which gives a
slight performance enhancement.
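A minimal Python sketch of why fixed-width code points can help, assuming a hypothetical `CODE<TAB>NAME` record layout (the real table format is not shown in this log) and code points no larger than 0xFFFFF. When every code point is zero-padded to the same width, the reader can slice a fixed number of characters instead of searching for where the code point ends.

```python
WIDTH = 5  # assumed fixed width, matching the "5 digit" wording above


def format_record(code_point: int, name: str) -> str:
    """Render one hypothetical table record with a zero-padded hex code."""
    return f"{code_point:0{WIDTH}X}\t{name}"


def code_of(record: str) -> int:
    """Read the code point back by slicing, with no boundary search."""
    return int(record[:WIDTH], 16)


records = [
    format_record(0x41, "LATIN CAPITAL LETTER A"),
    format_record(0x1F600, "GRINNING FACE"),
]
```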
|
|
|
|
|
|
| |
If "use charnames ()" is the only usage of this pragma in a program,
it fails because the pragma's import method is never called. This
fixes that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch causes \N{}, vianame, and viacode to know the names of all
Unicode code points. Previously the names that are algorithmically
determinable were not handled. These include the Hangul syllables and
many CJK characters.
It simply starts using the routines that mktables inserts into Name.pl
to handle these characters. mktables generates these algorithms from
data in the Unicode database. The routines have been there since
11/2009 in anticipation of this change, but have been unused until now.
They probably have not been reviewed thoroughly.
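As an illustration of an algorithmically determinable name, here is a minimal Python sketch of the Hangul syllable naming rule from the Unicode Standard. It is not the code that mktables generates; the function names are made up for this example, and Python's `unicodedata` module is used only as a cross-check.

```python
import unicodedata

S_BASE = 0xAC00                       # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT           # syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT           # total number of syllables

# Jamo short names, in code point order, as listed by the Unicode Standard.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]


def hangul_syllable_name(code_point: int) -> str:
    """Derive the name of a Hangul syllable from its code point alone."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return "HANGUL SYLLABLE " + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index]


def matches_stdlib(code_point: int) -> bool:
    """Cross-check the derived name against Python's Unicode database."""
    return hangul_syllable_name(code_point) == unicodedata.name(chr(code_point))
```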
The major change to this is the .t file. Now that all code points are
understood, the .t tests them all. But this would take too long each
time, so it tests a random sample. If there is a failure, the seed is
output so that the test can be reproduced. This idea came from Michael
Schwern, and is the same one he uses in Test::Sims. Various parameters
about the sampling are easily adjustable.
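The seeded-sampling idea can be sketched in Python as follows. This is an illustration under assumptions, not the actual .t file: `sample_code_points`, `run_sampled_test`, and the `check` callback are hypothetical names, and the sample size and range are arbitrary defaults.

```python
import random


def sample_code_points(seed, count, limit=0x110000):
    """Return a reproducible random sample of code points for a given seed."""
    rng = random.Random(seed)
    return [rng.randrange(limit) for _ in range(count)]


def run_sampled_test(check, seed=None, count=1000):
    """Run check() on a random sample; report the seed on the first failure."""
    if seed is None:
        seed = random.randrange(2**32)  # fresh seed for each normal run
    for code_point in sample_code_points(seed, count):
        if not check(code_point):
            # Printing the seed lets the exact same sample be replayed.
            return f"FAILED at U+{code_point:04X}; rerun with seed={seed}"
    return None
```

Passing the reported seed back in as `seed` replays the identical sample, which is what makes a randomized failure reproducible.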
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I realized after the last commit that it might be faster to use a trie
when there are multiple scripts that a letter could be in, instead of
searching the table for each script. When there were 6 possible scripts
and the letter was found in the final one, the speed-up was a factor of
5. This also simplified things. The list of scripts can be stored as a
string like A|B|C instead of a stringified array, and the code just gets
simpler.
Also, the code was complicated by the need to avoid zapping the input
name, just in case it was needed for an error message. But I realized
that instead of using a shift to get the name, the code can just copy it
from $_[0]; on the error leg that needs the original, it is still in
$_[0]. If a user-defined alias maps to a character name to look up and
that one is invalid, we want to output the invalid one, so a further
variable, $save_input, is used to hold it.
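The shape of the first change can be sketched in Python, with the caveat that Python's `re` module does not build a trie for alternations the way the Perl regex engine can, so this shows only the structure and not the speed-up. The tiny table and both function names are assumptions made for the example.

```python
import re

# Hypothetical three-line stand-in for the large name table.
TABLE = "\n".join([
    "00041\tLATIN CAPITAL LETTER A",
    "00391\tGREEK CAPITAL LETTER ALPHA",
    "00410\tCYRILLIC CAPITAL LETTER A",
])


def lookup_per_script(letter, scripts):
    """Old shape: one search of the table per candidate script."""
    for script in scripts.split("|"):
        pattern = rf"^([0-9A-F]+)\t{re.escape(script)} CAPITAL LETTER {re.escape(letter)}$"
        match = re.search(pattern, TABLE, re.M)
        if match:
            return int(match.group(1), 16)
    return None


def lookup_alternation(letter, scripts):
    """New shape: the stored 'A|B|C' string becomes a single alternation."""
    alternatives = "|".join(re.escape(s) for s in scripts.split("|"))
    pattern = rf"^([0-9A-F]+)\t(?:{alternatives}) CAPITAL LETTER {re.escape(letter)}$"
    match = re.search(pattern, TABLE, re.M)
    return int(match.group(1), 16) if match else None
```

In this sketch the two functions can disagree when a letter exists in more than one listed script: the loop honors the order of the script list, while the single search returns whichever entry comes first in the table.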
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The :short option which looks like "greek:letter" is just a special case
of the option where a list of possible scripts is set up in the pragma
call. In this case, greek is the single script to look up. It also
turns out that, contrary to the prior code, :short is effectively
mutually exclusive of checking through that list of scripts. That is,
if "greek:letter" didn't match in the :short option, it won't match any
script option either, because ':' is not a legal character in a name. So
there is no need to execute both. I refactored the code to use an
if-then-else because of this.
Both branches also use the same complicated regex, which I may have to
change in future patches, so I refactored the code to share a single
regex.
Finally, I added a goto to eliminate a test.
|
|
|
|
| |
It makes more sense to me in light of patches coming up.
|
|
|
|
| |
It was getting painful to have tabs, so expand them to blanks.
|
| |
|
|
|
|
|
|
| |
The syntax for name lookups under :short is 'script:letter'. Allow
spaces adjacent to the colon and (while we're at it) at the beginning
and end.
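A minimal Python sketch of a pattern that tolerates this whitespace. The regex is an assumption written for illustration, not the one used in charnames.pm, and `split_short_name` is a hypothetical helper.

```python
import re

# script : letter, with optional blanks at the ends and around the colon.
# The letter part must start and end with a non-blank, non-colon character.
SHORT_NAME = re.compile(r"\s*([^\s:]+)\s*:\s*([^\s:](?:[^:]*[^\s:])?)\s*")


def split_short_name(name):
    """Return (script, letter) for a 'script:letter' name, else None."""
    match = SHORT_NAME.fullmatch(name)
    return (match.group(1), match.group(2)) if match else None
```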
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was done by moving what could be moved to %^H. Because data structures in
%^H get stringified at runtime, new serialized entries for them had to
be created and then unserialized on each runtime call. Also, because
%^H is read-only at runtime, some data structures couldn't be moved to
it. Things were set up so that these contain only things invariant
under scoping, and looked at only when the same scoped options are in
effect as when they were created. Further comments at declaration of
%full_names_cache.
I was well into this patch when it dawned on me that it was doing
unnecessary tests, so that
    if (! a) { conditionally set a }
    if (! a) {}
could be implemented more efficiently as
    if (! a) {
        conditionally set a
        if (! a) {}
    }
so I changed it, which messes up leading indentation for the diffs.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
An error leg in charnames.pm was returning the wrong type. This fixes
it. A later commit will change the .t to add "use warnings" so this fix
will be noticed.
|
| |
|
|
|
|
|
|
| |
Capturing parentheses greatly slow down regexes, at least here.
On my machine, viacode took 27 seconds for the 22K Unicode names without
capturing parens; 45s with.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
When vianame returns a chr, it now verifies that it is legal under 'use
bytes'. Update .t
Taking a substr of a huge string is needed only in an error leg. Move
it to that leg for performance.
And make the message a subroutine so it will be identical whenever
raised.
|
|
|
|
|
| |
This patch brings the charnames pod up to date, and rewords it to
hopefully be clearer.
|
|
|
|
|
| |
This changes viacode to accept aliases that the user has defined beyond
the Unicode range.
|
|
|
|
|
| |
This patch refactors charnames so that vianame and \N call the same
common subroutine so that they have as identical behavior as possible.
|
| |
|
|
|
|
|
| |
Other programs do this; I don't know why just hex() needs to be
protected from user override, but I'm just copying prior art.
|
|
|
|
|
|
|
|
| |
The BUGS section of the charnames pod said that it was a bug to return
undef for unassigned characters, whereas the real Unicode name is the
empty string. demerphq noted that undef stringifies to the empty
string, so we are in fact in compliance with the standard. This
clarifies the pod wording, removing the text from the BUGS section.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the ability of a user to create a custom alias that maps to a
numeric ordinal value, instead of an official Unicode name.
The number of hashes went up enough that it is better to refer to them
by name than by number, so I renamed them.
Also, viacode will return any user-defined alias for an otherwise
unnamed code point.
This change is principally so that private use characters can be named
so it is more convenient to use them in Perl.
|
|
|
|
|
| |
It's not clear to me what should be done about the problem of vianame
being bipolar.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the standard abbreviations for the control characters
(such as ACK, BEL, etc) to the repertoire that \N{} knows about. It
also adds a few common variants of their full names, and the old names
for the 4 controls that Unicode has chosen not to have any names at all
for.
The patch also adds all the abbreviations that Unicode lists in 5.2 for
longer characters, such as NBSP, SHY, LRE, ...
To preserve complete backward compatibility for these and future changes,
user-defined aliases are now checked first, before these are.
As a performance enhancement, these aliases are mapped to their actual
code values instead of their full names which then had to be looked up
in the large table. Now that is avoided, and the table is not loaded
at all until a name is encountered that is not one of these aliases.
The pod and .t are updated.
|
|
|
|
| |
viacode now works correctly for 0.
|
|
|
|
|
|
| |
The commit e10d7780a27dcfeb9c50ab28b66f2df8763d8016 introduced a bug in
which a parameter to viacode of the form U+... no longer worked. This
is fixed, and tests are added.
|
|
|
|
|
|
|
| |
Remove all uses of $[, both reads and writes, from library code.
Test code (which must test behaviour of $[) is unchanged, as is the
actual implementation of $[. Uses in CPAN libraries are also untouched:
I've opened tickets at rt.cpan.org regarding them.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The viacode() function contained the code from the _getcode() function from
Unicode::UCD, unchanged. However, the rest of viacode() requires that
the result be specially formatted to do a string match with leading
zeros inserted to bring the length up to 4 if less than that. The
original function only needs to get the number right, as a numerical
comparison is done, so it doesn't do this. This showed up with calling
viacode with 0, but the bug also affected any input that looked like a
hex number, or a U+ number, such as 'BEE' or 'U+EF'. These need to be
massaged into '0BEE' and '00EF' for the pattern match later in the
routine to succeed.
The patch also adds a test case to Unicode::UCD to verify that it really
does work ok on 0.
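The required massaging can be sketched in Python as below. This is an assumption-laden illustration rather than a port of viacode() or _getcode(): `normalize_code` is a hypothetical helper, integers are taken as-is, and every string is treated as hex with an optional U+ or 0x prefix.

```python
import re


def normalize_code(arg):
    """Return the code as uppercase hex, zero-padded to at least 4 digits."""
    if isinstance(arg, int):
        value = arg
    else:
        match = re.fullmatch(r"(?:[Uu]\+|0[xX])?([0-9A-Fa-f]+)", arg)
        if not match:
            return None  # not something that looks like a code point
        value = int(match.group(1), 16)
    # Padding matters because the later lookup is a string match, not a
    # numeric comparison, so 'BEE' and '0BEE' would otherwise differ.
    return f"{value:04X}"
```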
|
|
|
|
| |
List known bugs, mention new meaning of \N
|
|
|
|
|
| |
\N has a possible new meaning, and mention bug reports filed against
charnames
|
|
|
|
|
| |
* differ between 5.10.0 and maint-5.10, or
* differ between 5.8.9 and maint-5.10
|
| |
|