| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Commit d11155ec2b4e3f6cf952e2a25615aec506a8e296 changed the format of
some of the generated tables, but I left some of the old comments and
variable names the same in order to not make this already large commit
bigger. This updates these to reflect the new format.
It also refactors one 'if' statement to not use a block.
|
|
|
|
| |
This outdents some statements that are no longer enclosed in a block
|
|
|
|
|
| |
These were incorrectly stating that some tables are accessible via
Unicode::UCD, and giving the wrong name in some instances.
|
|
|
|
|
| |
By converting this property to requiring adjustments to get the proper
values, its storage size decreases by more than half.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thanks to Tony Cook for suggesting this.
The API is changed from returning deltas of code points, to storing the
actual correct values, but requiring adjustments for the non-initial
elements in a range, as explained in the pod.
This makes the data less confusing to look at, and gets rid of the
inconsistencies that would arise if we didn't make the same sort of
deltas for entries that were, e.g., arrays of code points.
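
To illustrate the adjustment for non-initial elements, here is a minimal
Python sketch. It is not the Unicode::UCD API: the table contents and the
`lookup` helper are hypothetical, and it assumes the adjustment is simply
adding the code point's offset from the start of its range.

```python
from bisect import bisect_right

# Hypothetical table: parallel lists of range starts and the stored value
# for the FIRST code point of each range. None marks "no mapping".
RANGE_STARTS = [0x00, 0x41, 0x5B]
FIRST_VALUES = [None, 0x61, None]


def lookup(code_point):
    """Return the mapped value, adjusted for position within the range."""
    i = bisect_right(RANGE_STARTS, code_point) - 1
    first = FIRST_VALUES[i]
    if first is None:
        return None
    # Assumed adjustment: add the offset from the start of the range.
    return first + (code_point - RANGE_STARTS[i])
```

The stored value for the range starting at 0x41 is the real mapping of its
first element, so the data reads naturally; only later elements need the
offset added.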
|
|
|
|
|
|
|
| |
All the files that should ever be read by the subroutine will be found
in the unicore directory, so it can be specified in the subroutine
instead of in each call to it. This makes things slightly easier in
future commits.
|
|
|
|
|
| |
One comment was outdated. This also moves a line of code so that the
comments flow better.
|
|
|
|
|
|
|
| |
This will be used to generate compile-time inversion lists in a C header
file that can be included in programs to speed up initialization.
Three simple inversion lists are included in this initial commit.
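
For readers unfamiliar with the data structure, here is a minimal Python
sketch of an inversion list lookup. It only illustrates the idea; the
generated C header is not shown here, and the list contents and function
name below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical inversion list for ASCII letters. Even-indexed elements start
# a range that is in the set; odd-indexed elements start a range that is not.
ASCII_ALPHA = [0x41, 0x5B, 0x61, 0x7B]


def in_inversion_list(inv_list, code_point):
    """Return True if code_point falls in a range that is in the set."""
    index = bisect_right(inv_list, code_point) - 1
    return index >= 0 and index % 2 == 0
```

Because the list is sorted, membership is a binary search, and a table
built at compile time needs no setup work at program start.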
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
* use variables for the names of temporary files
* use lexicals for file handles
* check the return value of close
* use is() rather than ok() with ==
[possibly still dubious that it's using unpack checksums for comparison,
instead of SHAs or simply File::Compare]
|
|
|
|
|
|
| |
eol.t gained code to clean up temporary files it generated as part of commit
0ec158f4b0db050a in 2002. The temporary file names used by Pod::Html were
changed by commit 33869856bc668ad8 in 2003, but eol.t had never been updated.
|
|
|
|
|
|
| |
Applied patch from John Peacock, but added whitespace fixes,
corrected pod link error and updated known Pod issues to reflect
a fix.
|
| |
|
|
|
|
|
| |
This merely moves a whole =item to another place, in preparation for
future commits.
|
|
|
|
|
|
| |
This changes the output of prop_invmap() for the Perl_Decimal_Digit
property to use code point deltas, similar to other properties. This
reduces the output to one tenth of its former size.
|
|
|
|
| |
Indent properly to account for these being in a newly formed block
|
|
|
|
|
|
|
| |
The file for this property is stored in the old-style format for
backward compatibility with any applications that might be reading it
directly. But the values should be returned through the Unicode::UCD
API as deltas for consistency with other, similar properties.
|
|
|
|
|
|
|
|
|
|
| |
Earlier commits caused prop_invmap() to return, for certain properties,
deltas from code points instead of the code points themselves, for
compactness of storage and speed of searching. This commit does the
same for the 'dm' property, for consistency with the others, even though
the space savings are not large for this one. Essentially the same code
can now be used for the two types, instead of an application having to
have special cases.
|
|
|
|
|
|
|
|
|
|
| |
This commit has the effect of changing the non-legacy tables for the lc,
uc, tc, and fc properties to use maps of deltas from the code points
instead of the code points themselves, thus shortening them
significantly, and hence the time required to search through them.
Note that these tables are new, and currently used only by Unicode::UCD.
A future commit will change the Perl core to use them.
|
|
|
|
|
|
|
|
|
| |
All the files that mktables generates for external-to-core use have now
been changed so that the code explicitly requests, for each one, the
comment that says it is for external use but that using it is
deprecated. That means that any file that hasn't been so explicitly set
should instead have the comment that says it is for internal use only.
|
|
|
|
|
|
|
|
| |
Future commits will cause tables that map to code points to, in general,
use deltas instead. This ensures that files that contain tables and
have been mentioned publicly in the past continue to have their current
contents and format, so that applications that read them (such as
Unicode::Normalize) are unaffected.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Delta tables are those in which the mapping is not stored as-is, but is
modified to be the delta between the actual mapping and the code point
it is for. This allows for smaller tables that are faster to search and
require less memory to store.
For example, consider the lower case mapping of A=>a, B=>b, ... Z=>z.
Prior to this patch, this required 26 entries in the table; now it
requires just one. This is because A=65 and a=97, so we store 97-65=32.
And 32 is the same delta for each of A-Z, so we can store these as a
single range, each with the same value, 32.
The delta tables tend to be half as large as the non-delta ones, or even
smaller.
This just enables the feature. No tables currently use it. For that,
changes in other Unicode::UCD need to be coordinated.
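
The A-Z example can be reproduced with a short Python sketch. This is not
the mktables code; `build_ranges` and its range layout are hypothetical and
only show why storing deltas lets adjacent entries merge.

```python
def build_ranges(mapping, use_delta):
    """Collapse a {code_point: mapped} dict into [start, end, value] ranges.

    With use_delta, the stored value is mapped - code_point, so adjacent
    code points that share an offset merge into a single range.
    """
    ranges = []
    for cp in sorted(mapping):
        value = mapping[cp] - cp if use_delta else mapping[cp]
        if ranges and ranges[-1][1] == cp - 1 and ranges[-1][2] == value:
            ranges[-1][1] = cp  # extend the current range
        else:
            ranges.append([cp, cp, value])
    return ranges


# Lower case mapping of A-Z: each letter maps to the letter 32 above it.
lower = {cp: cp + 32 for cp in range(ord("A"), ord("Z") + 1)}
plain = build_ranges(lower, use_delta=False)
delta = build_ranges(lower, use_delta=True)
```

Here `plain` keeps one entry per letter because every mapped value differs,
while `delta` collapses to the single range 65..90 with the value 32.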
|
|
|
|
|
|
|
|
|
|
|
| |
A previous commit has added two nested blocks surrounding the affected
code. This looks like a big change, but it is in fact only white space
plus reflowing things to fit in an 80 column window, plus slight changes
to comments.
I verified that there were no code changes by using a diff command that
can ignore leading white space changes, and hence gave a more accurate
difference listing.
|
|
|
|
|
|
|
| |
This is a slight refactoring to avoid using 'next' in the loop, and to
surround things with a bare block. Future commits will want to
do common code at the bottom of the loop, including a redo of the bare
block.
|
|
|
|
| |
This should speed up this test slightly
|
|
|
|
|
| |
Previous commits have removed all uses of these tables, so they are no
longer needed.
|
|
|
|
|
|
| |
Previous commits have expanded what's in the full case mapping tables
to include the simple maps as well. Thus the specially constructed
tables need no longer be used, leading to simplification.
|
|
|
|
| |
outdent now that surrounding block is removed
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the case change mapping tables to include the simple
mappings. This was done in 5.14 for the case folding table. The full
mappings are contained, as before, in a hash. Now the simple mappings
they override (when doing multi-char case changing) are added to the
main body of the table, to the already existing simple mappings that
aren't overridden.
If the caller wants to do full mapping, it should look first in the
hash, and only if not found, look in the main body. If the caller wants
only simple mapping, it ignores the hash.
This is already how the code in utf8.c that reads these tables is
constructed.
The .t is modified to take into account that these code points are now
in the main table body.
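
The lookup order described above can be sketched in Python as follows. The
names are hypothetical and the data is a tiny hand-picked sample, not the
generated tables: U+0130 has the simple lower case mapping U+0069 and the
full mapping U+0069 U+0307.

```python
# Main body of the table: simple mappings, including those that a full
# mapping overrides.
SIMPLE_LC = {0x41: 0x61, 0x130: 0x69}
# Hash of full (multi-character) mappings.
FULL_LC = {0x130: [0x69, 0x307]}


def lowercase(code_point, full=True):
    """Return the lower case mapping as a list of code points."""
    # Full mapping: look in the hash first.
    if full and code_point in FULL_LC:
        return list(FULL_LC[code_point])
    # Otherwise, or for simple mapping, use the main body only.
    return [SIMPLE_LC.get(code_point, code_point)]
```

A caller that wants only simple mappings never consults the hash, which is
why the overridden simple mappings must be present in the main body.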
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is for backwards compatibility. Future commits will change these
tables that are generated by mktables to be more efficient. But the
existence of them was advertised in v5.12 and v5.14, as something a Perl
program could use because the Perl core did not provide access to their
contents. We can't change the format of those without some notice.
The solution adopted is to have two versions of the tables: one, kept
under the original file name, has the original format; the other is free
to change formats at will.
This commit just creates copies of the original, with the same format.
Later commits will change the format to be more efficient.
We state in v5.16 that using these files is now deprecated, as the
information is now available through Unicode::UCD in a stable API. But
we don't test for whether someone is opening and reading these files; so
the deprecation cycle should be somewhat long; they will be unused, and
the only drawbacks to having them are some extra disk space and the time
spent in having to generate them at Perl build time.
This commit also changes the Perl core to use the original tables, so
that the new format can be gradually developed in a series of patches
without having to cut over the whole thing at once.
|
|
|
|
|
|
|
| |
The object is already known to us as the loop variable, so no need to
derive it again; and change the loop variable name and one other
variable name to distinguish the table as being the full map one from
the simple map one
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some property tables have multiple values per code point. These include
the final Name-equivalent property in which some code points have more
than one synonym; and the full case changing property tables that are
supersets of the simple case changing tables, in which some code points
have a full mapping that differs from the simple mapping.
Prior to this patch, these could not be initialized simply using the
Initialize parameter to the constructor, as it was unable to handle
multiple values per code point.
This also preserves the range type.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These three tables are handled alike; this creates a loop to execute the
same instructions on each of them. Currently there is so little to do,
that it wouldn't be worth it, except that future commits will add
complications, and this makes those easier to handle.
There is now a test that the input data is sane, and instead of
overwriting a value in a table with a known identical value, we skip
that. This doesn't save much effort, because most of the work is
looking up the value (which we can now check sanity for), but again will
be useful for future commits.
|
|
|
|
|
| |
When calculating the format of a table, assume that a decimal number has
no leading zeros; a leading zero means that the value is hex.
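
A rough Python sketch of such a heuristic follows. It is an assumption-laden
illustration, not the mktables logic: the function name, the returned
labels, and the choice to treat a lone "0" as decimal are all hypothetical.

```python
import re


def guess_format(value):
    """Classify a table value as 'decimal', 'hex', or 'string'."""
    # Digits with no leading zero are taken to be decimal.
    # Assumption: a lone "0" also counts as decimal.
    if re.fullmatch(r"[1-9][0-9]*|0", value):
        return "decimal"
    # Anything else made only of hex digits, including digit strings
    # with a leading zero, is taken to be hex.
    if re.fullmatch(r"[0-9A-Fa-f]+", value):
        return "hex"
    return "string"
```

Under these assumptions "41" is read as decimal, while "0041" and "1F" are
read as hex.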
|
|
|
|
|
|
| |
This table was used only by Unicode::UCD which no longer uses it, and it
turns out that the data in it are redundant. This is in preparation for
refactoring and removal of the table altogether.
|
|
|
|
|
| |
It turns out that currently in Unicode 6.0, this table is redundant.
This prepares for removing it altogether.
|
|
|
|
| |
As it is calling something in a different package
|
|
|
|
|
|
| |
This commit delivers the official Unicode character database files for
release 6.1, plus the final bits needed to cope with the changes in them
from release 6.0, including documentation.
|
| |
|
|
|
|
|
|
|
| |
This property has an extra field. So far, commits have allowed mktables
to cope with an extra field, with the value it would have if it had
existed in earlier Unicode releases. This adds code to deal with the
values it will have in 6.1.
|
|
|
|
|
| |
The format of this property in Unicode 6.1 is changing, so that the
previous algorithm for separating it out from Name.pl no longer works.
|
|
|
|
|
|
|
|
| |
T_DATAUNIT and T_CALLBACK are nowhere to be found in a CPAN module and
are not used in core. Their purpose is entirely unclear and they are
trivial. They'll always be available from CPAN from the
ExtUtils::Typemaps::Excommunicated module. See perlxstypemap for details
on how to use that if you need it.
|
|
|
|
|
|
|
|
|
|
| |
Sadly, the POD in Typemap.xs was not easily extractable into a POD file
at build time, so it now lives in a separate POD file from the start.
Makes keeping documentation and testing efforts in sync marginally
harder, but it's probably the right trade-off.
What's left to do is finding the right places in other POD files to
refer to this old/new documentation.
|
|
|
|
|
| |
not 2.22, as that was used by 5.13.10, but the only changes in 5.13.10
were reverted for 5.13.11+.
|
|
|
|
|
|
|
|
|
| |
These warnings exist to catch file operations on unchomped file names.
But File::Copy should not be triggering them, otherwise it produces
warnings for every copy("foo/bar\n", "baz/bar\n"), with no (easy) way
to suppress the warning, as warnings are lexical.
I don’t know how to test this portably.
|
|
|
|
|
| |
It only helps with very long strings. It actually slows things down
slightly when used on short strings.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pods are installed into privlib. Before Perl 5.005, privlib didn't contain
the version number in it, so 5.004 and 5.003 (etc) both installed to the same
directory. perldiag.pod differed in not-quite-compatible ways, so hacks were
put in to (a) also install it in privlib with the version number in the
filename and (b) hard link it to archlib, which always did contain the version
number.
5.005 changed privlib to contain the version number, solving the underlying
problem (strictly commit bfb7748a896459cc, described here:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1998-07/msg00136.html )
Commit a841533b5cf319b3 (Oct 2009) removed the first installation hack;
commit e8ea61279d90dbe9 (Jul 1998) removed the second.
Hence the code to search in the "hacked" locations is no longer needed,
and in the interests of clarity, should go.
|