This formats all the tables in the lib/unicore/To directory that map one
code point to another so that the mapped-to code point is expressed in
hexadecimal.
This allows uniform treatment of these tables in utf8.c, and removes
the final use of strtol() in the (non-CPAN) core. strtol() should be
avoided because it is subject to locale rules, and some older libc
implementations have been buggy. It was used because Perl doesn't have
an efficient way of parsing a decimal number and advancing the parse
pointer past it; we do have such a method for hex numbers.
The input to mktables published by Unicode is also in hex, so the tables
now conform to that convention.
This will also facilitate the work currently under way to read in the
tables that find the closing bracket given an opening one.
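
As a rough illustration of why all-hex fields are convenient, here is a small Python sketch of a parser that consumes a hex number and returns the position just past it, so the caller can continue from there. It is not Perl's utf8.c code. The function names and the three-field, TAB-separated line layout are assumptions made for illustration, and the real To tables' layout may differ. The digit test is restricted to ASCII hex digits, so nothing depends on locale settings.

```python
# A minimal sketch, not Perl's C code: parse ASCII hex digits and report
# where parsing stopped, the way a parser "advances the parse pointer".
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_hex_advance(text, pos):
    """Parse hex digits starting at text[pos]; return (value, new_pos)."""
    start = pos
    value = 0
    while pos < len(text) and text[pos] in HEX_DIGITS:
        value = value * 16 + int(text[pos], 16)
        pos += 1
    if pos == start:
        raise ValueError(f"expected hex digits at offset {start}")
    return value, pos


def expect_tab(text, pos):
    """Require a TAB at text[pos]; return the offset just past it."""
    if not text.startswith("\t", pos):
        raise ValueError(f"expected TAB at offset {pos}")
    return pos + 1


def parse_mapping_line(line):
    """Parse an ASSUMED 'start<TAB>end<TAB>mapped' line, all fields hex.

    An empty 'end' field is taken to mean a single code point; this
    layout is hypothetical, chosen only to illustrate the parsing style.
    """
    low, pos = parse_hex_advance(line, 0)
    pos = expect_tab(line, pos)
    if line.startswith("\t", pos):      # empty 'end' field
        high = low
    else:
        high, pos = parse_hex_advance(line, pos)
    pos = expect_tab(line, pos)
    mapped, _ = parse_hex_advance(line, pos)
    return low, high, mapped
```

For example, `parse_mapping_line("0041\t005A\t0061")` returns `(0x41, 0x5A, 0x61)`. Because every field is parsed by the same routine, no separate decimal path is needed.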
I needed this in order to compare successive runs of this test.
In commit a9c9e371c40cf388593577cf577494e91793f62a, I forgot to update
the Unicode version in the file that states it.
Now that mktables generates native tables, it is a fairly simple matter
to get Unicode::UCD to work on those platforms.
This commit documents this function and removes the initial underscore
from its name. It also hardens input checking.
There are no "missing" values in inversion maps; each map has a default
value that is returned instead. So rename the example variables
accordingly, and reword another sentence for clarity.
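
The following Python sketch shows why an inversion map has no "missing" value. It models the map as two parallel lists: the start point of each range, and the value for that range. This mirrors the general shape of what Unicode::UCD's prop_invmap() returns, but the list contents and value names below are hypothetical. Because the first range in this sketch starts at code point 0, every code point falls in some range and so gets a value. Code points not otherwise specified simply get the default.

```python
from bisect import bisect_right

# Hypothetical inversion map: invlist[i] is the first code point of range i,
# and invmap[i] is the value for every code point in that range. Each range
# runs up to (but not including) invlist[i + 1]; the last one is unbounded.
invlist = [0x0000, 0x0041, 0x005B, 0x0061, 0x007B]
invmap = ["Other", "Upper", "Other", "Lower", "Other"]  # "Other" = default


def invmap_lookup(invlist, invmap, code_point):
    """Return the value the inversion map gives to code_point."""
    # Index of the last range start that is <= code_point.
    i = bisect_right(invlist, code_point) - 1
    # Since invlist[0] is 0, i >= 0 for any non-negative code point,
    # so a lookup always finds a value (possibly the default).
    return invmap[i]
```

For instance, `invmap_lookup(invlist, invmap, 0x7A)` returns `"Lower"`, and any code point past 0x7A falls into the final default range.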
This is in preparation for making this function public, and it should be
listed in the pod later than it otherwise would be.
This was erroneous. Extra clarifications are also added.
This only happens if Perl is compiled with the very first Unicode
release, which is extremely unlikely, but fix it anyway.
These were all uncovered by the new Pod::Checker, not yet in core.
Fixing these will speed up debugging the new Checker.
This function is undocumented mostly because I was afraid it would be
buggy, as many such implementations are. See:
http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
(I recommend reading that link; it is instructive, entertaining, and
humbling.)
And it turns out I was right that I was wrong (in my original code): one
test was inexplicably reversed, and another was missing.
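
The linked article describes binary searches that break because the midpoint `(low + high) / 2` overflows in fixed-width integer arithmetic. Besides that, comparisons and loop bounds are easy to get subtly wrong. Below is a hedged Python sketch, not the Unicode::UCD code, of a search that returns the index of the inversion-list range containing a code point, or -1 if the code point precedes the first element. The name `find_range_index` is hypothetical. Python integers do not overflow, but the sketch keeps the `low + (high - low) // 2` form the article recommends for C-like languages.

```python
def find_range_index(invlist, code_point):
    """Return the largest index i with invlist[i] <= code_point, or -1.

    invlist must be sorted in strictly increasing order.
    """
    low, high = 0, len(invlist) - 1
    result = -1
    while low <= high:
        # Overflow-safe midpoint form; (low + high) // 2 can overflow in
        # fixed-width integer arithmetic, though not in Python.
        mid = low + (high - low) // 2
        if invlist[mid] <= code_point:
            result = mid          # a candidate; look for a later one
            low = mid + 1
        else:
            high = mid - 1
    return result
```

With `invlist = [0x41, 0x5B, 0x61, 0x7B]`, a code point of 0x5A yields 0 (the range starting at 0x41), and 0x40 yields -1. Testing each boundary, as here, is the kind of check that catches a reversed comparison.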
These are supposedly the final data files for 6.2. Changes originally
proposed for 6.2 have been deferred until a later release. Thus the
general category of ASCII characters in these files is unchanged from
6.1 and earlier, unlike what had been proposed.
Unlike the previous experimental beta, code is now in place in Perl to
handle the revised definition of \X in 6.2. The current working draft
of that definition is at http://unicode.org/draft/reports/tr29/tr29.html
This reverts commit 5435c3759c4567a1bb51384f6641c04822ec6391.
A new beta has been released, and so we should use that instead.
This creates an optional undocumented parameter to this function to
allow it to return the inversion list of an internal-only Perl property.
This will be used by other functions in Perl, but should not be
documented, as we don't want to encourage the use of internal-only
properties, which are subject to change or removal without notice.
An extra word is removed. Also, the recipe should not eval lines
containing NaN and expect the answer to be NaN.
Unicode 6.2 is proposing some changes that may very well break some
CPAN modules. The timing of this nicely coincides with Perl's being
early in the release cycle. This commit takes the current 6.2 beta,
adds the proposed changes that aren't yet in it, and subtracts the
changes that would affect \X processing, as those turn out to have
errors and may have to be rethought. Unicode has been notified of
these problems.
This commit is meant to gather data on whether the proposed changes
cause us problems. The results will be presented to Unicode to aid in
its final decision on whether to go forward with the changes.
This commit should be reverted at some point, and the final 6.2 used
instead.
These tests fail in some earlier Unicode versions because $/ is being
set. Reset it to $/ around the reads and chomps this .t does.
This reverts commit 9a8e4a54f8b3668a7ebd8f229cdfb405a1dce77c. It turns
out that $/ was being set there deliberately, as a stress test: setting
$/ outside Unicode::UCD should not break it.
In Unicode 6.1, the only property stored in hex format that isn't
handled elsewhere is the bmg property, but earlier Unicode versions also
stored some of the Unihan properties (if those are being compiled) that
way. So make the handling more general.
This function returns the entire structure that casefold() builds. It
is useful for a .t.
Some of the functions defined in this module are needed for minitest,
where dclone is not available. This defines and uses a substitute
dclone when Storable::dclone is not available.
It also conditionally loads Unicode::Normalize. The function that uses
that module is not executed in minitest.
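
The following Python sketch shows the general pattern of preferring a library routine and falling back to a small hand-written substitute when it cannot be loaded. The module name `hypothetical_fast_clone` is invented and stands in for Storable. The fallback assumes the data consists only of nested dicts and lists with immutable leaves, which is what makes sharing the leaves safe.

```python
try:
    # Hypothetical optional module standing in for Storable's dclone.
    from hypothetical_fast_clone import dclone
except ImportError:
    def dclone(data):
        """Deep-copy nested dicts and lists; other values are shared.

        Assumes leaves are immutable (str, int, None, ...), so sharing
        them between the original and the copy is harmless.
        """
        if isinstance(data, dict):
            return {key: dclone(value) for key, value in data.items()}
        if isinstance(data, list):
            return [dclone(item) for item in data]
        return data
```

Callers always use `dclone`, whichever definition was loaded, so the rest of the code does not need to know whether the optional module is present.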
This converts this function to use the output of prop_invmap() to get
its casefolding definitions. This allows it to work on versions of
Unicode which don't have this file, means the file need not be
installed, and removes the need for this function to be different on
EBCDIC platforms (which hadn't been coded anyway).
This causes failures on early Unicode releases, and is not necessary.
Indent, because a previous commit enclosed this code in an 'if'.
Not all Unicode releases supported blocks.
There are no hangul syllables in early releases.
Some versions of Unicode did not have hangul syllables; and there is a
bug in handling them that doesn't show up in the latest versions.
Some versions of Unicode don't have the AHex property. Instead use
[:xdigit:] which is defined in all versions.
The scf property was originally known as the sfc property. This handles
both possibilities.
This has to do extra work for releases prior to 6.0.
This field had meaning in earlier Unicode versions.
This value will be used in future commits to make version comparisons
easier.
Unicode has published a correction to their data files for version 6.1.
This patch applies that correction.
The 'i' is an earlier name, and I overlooked changing it when the other
formats were changed. In Unicode 6.1, the only property that is
affected is Bmg.
The type of an 'a' table should not be changed to 's'. Currently, this
bug occurs only if someone changes mktables to output one of the
optional files.
These concatenated the package name with the beginning of the text, with
no intervening punctuation. Also add the name of the function within the
package.
This outdents some statements that are no longer enclosed in a block.