| Commit message (Collapse) | Author | Age | Files | Lines |
|
IBM says that there are 13 characters whose code point varies depending
on the EBCDIC code page. They fail to mention that the \n character may
also vary. This commit adds checks for \n, in addition to the checks
for the 13 graphic variant ones.
|
These were introduced in c05125c57fd7868af65366bacb6fe40c04b1c719 in
July 2018, and would cause any EBCDIC compilations to fail.
That I found them by code inspection shows that we've lost all our EBCDIC
smokers again.
|
They are defined in the Perl library but referenced in the Perl
executable, which can't see them unless the library exports them.
Some linkers make a symbol a public export only if they've been told
to explicitly.
|
This replaces a complicated trie with a dfa. This should cut down the
number of conditionals encountered in parsing many code points.
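As a rough illustration of the technique (not the tables Perl actually
uses), a table-driven dfa for checking UTF-8 might look like the sketch
below. It assumes strict Unicode UTF-8 (no overlongs, no surrogates,
nothing above U+10FFFF), which is narrower than what Perl accepts, and
all the names in it (is_valid_utf8, byte_class, next_state, and so on)
are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* States.  REJECT is 0 so that omitted table entries default to it. */
enum { REJECT, ACCEPT, NEED1, NEED2, NEED3,
       AFTER_E0, AFTER_ED, AFTER_F0, AFTER_F4, NSTATES };

/* Byte classes: every input byte is first mapped to one of these. */
enum { C_ASCII, C_80_8F, C_90_9F, C_A0_BF, C_LEAD2, C_E0, C_LEAD3,
       C_ED, C_F0, C_LEAD4, C_F4, C_BAD, NCLASSES };

static uint8_t byte_class[256];

/* Fill the byte -> class table once, so the hot loop has no range tests. */
static void init_byte_classes(void)
{
    for (int b = 0; b < 256; b++) {
        uint8_t c;
        if      (b <= 0x7F) c = C_ASCII;
        else if (b <= 0x8F) c = C_80_8F;
        else if (b <= 0x9F) c = C_90_9F;
        else if (b <= 0xBF) c = C_A0_BF;
        else if (b <= 0xC1) c = C_BAD;     /* C0, C1: always overlong */
        else if (b <= 0xDF) c = C_LEAD2;
        else if (b == 0xE0) c = C_E0;
        else if (b == 0xED) c = C_ED;
        else if (b <= 0xEF) c = C_LEAD3;
        else if (b == 0xF0) c = C_F0;
        else if (b <= 0xF3) c = C_LEAD4;
        else if (b == 0xF4) c = C_F4;
        else                c = C_BAD;     /* F5..FF */
        byte_class[b] = c;
    }
}

/* next_state[state][class]; unlisted entries (and the REJECT row) are 0. */
static const uint8_t next_state[NSTATES][NCLASSES] = {
    [ACCEPT]   = { ACCEPT, REJECT, REJECT, REJECT, NEED1, AFTER_E0, NEED2,
                   AFTER_ED, AFTER_F0, NEED3, AFTER_F4, REJECT },
    [NEED1]    = { REJECT, ACCEPT, ACCEPT, ACCEPT },
    [NEED2]    = { REJECT, NEED1,  NEED1,  NEED1  },
    [NEED3]    = { REJECT, NEED2,  NEED2,  NEED2  },
    [AFTER_E0] = { REJECT, REJECT, REJECT, NEED1  },  /* A0..BF only */
    [AFTER_ED] = { REJECT, NEED1,  NEED1,  REJECT },  /* 80..9F only */
    [AFTER_F0] = { REJECT, REJECT, NEED2,  NEED2  },  /* 90..BF only */
    [AFTER_F4] = { REJECT, NEED2,  REJECT, REJECT },  /* 80..8F only */
};

/* Returns 1 if s[0..len-1] is well-formed strict UTF-8, else 0.
 * init_byte_classes() must have been called first. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    uint8_t state = ACCEPT;
    for (size_t i = 0; i < len; i++)
        state = next_state[state][byte_class[s[i]]];
    return state == ACCEPT;
}
```

The loop body is two table lookups per byte with no per-byte branching on
the byte's value, which is the property the commit message is after.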
|
This commit changes to using a dfa for translating from UTF-8 on EBCDIC
platforms. This makes for fewer #ifdefs; I realized while working on
the dfa that it wasn't difficult to do for EBCDIC.
|
This kind of table is used for the dfa for translating or verifying
UTF-8.
|
This adds code to declare and define the tables only under DOINIT, and
otherwise to just declare them. This allows the includer to not have to
deal with them at all.
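The general shape of that pattern can be sketched as follows. This is
only an illustration: the table name demo_skip and its contents are
hypothetical, and the header part is shown inline in one file so the
example compiles on its own.

```c
/* Only the one .c file that owns the tables defines DOINIT before
 * including the header; it is defined here so the sketch links. */
#define DOINIT

/* --- what would normally sit in a header (hypothetical) --- */
#ifdef DOINIT
/* The owning translation unit gets the actual definition. */
const unsigned char demo_skip[4] = { 1, 2, 3, 4 };
#else
/* Every other includer sees only a declaration. */
extern const unsigned char demo_skip[4];
#endif
/* --- end of header part --- */

/* Users of the table are the same either way. */
static unsigned demo_skip_for(unsigned idx)
{
    return demo_skip[idx & 3u];
}
```

Because the header decides between defining and declaring, a file that
includes it needs no table-specific code of its own.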
|
Previously, this omitted the headings on tables that just barely fit
into 80 columns. But future commits will create tables that can't fit
into 80 columns, and these headings are useful, so print them.
|
We changed to using guard symbols that non-Perl code is unlikely to use
(and so conflict with), and which have trailing underbars so they don't
look like a regular Perl #define.
See https://rt.perl.org/Ticket/Display.html?id=131110
There are many more header files which are not guarded.
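A guard of this style might look like the sketch below. The symbol
DEMO_PROJECT_WIDGET_H_ and the demo_widget type are hypothetical; the
guarded block is repeated in one file purely to simulate the header
being included twice.

```c
/* First inclusion: the guard symbol is not yet defined, so the body is
 * compiled.  The name carries a project prefix and a trailing
 * underscore to keep it out of the way of other code's macros. */
#ifndef DEMO_PROJECT_WIDGET_H_
#define DEMO_PROJECT_WIDGET_H_

typedef struct demo_widget { int id; } demo_widget;

#endif /* DEMO_PROJECT_WIDGET_H_ */

/* Second (simulated) inclusion: the guard is now defined, so the body
 * is skipped and the struct is not redefined. */
#ifndef DEMO_PROJECT_WIDGET_H_
#define DEMO_PROJECT_WIDGET_H_

typedef struct demo_widget { int id; } demo_widget;

#endif /* DEMO_PROJECT_WIDGET_H_ */
```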
|
This commit comments out the code that generates these tables. This is
trivially reversible. We don't believe anyone is using Perl on
POSIX-BC at this time. Not generating the tables saves time during
development whenever they would have to be regenerated, and makes the
resulting tarball smaller.
See thread beginning at
http://nntp.perl.org/group/perl.perl5.porters/233663
|
This uses for UTF-EBCDIC essentially the same mechanism that Perl
already uses for UTF-8 on ASCII platforms to extend it beyond what might
be its natural maximum. That is, when the UTF-8 start byte is 0xFF, it
adds a bunch more bytes to the character than it otherwise would,
bringing it to a total of 14 for UTF-EBCDIC. This is enough to handle
any code point that fits in a 64 bit word.
The downside of this is that this extension is not compatible with
previous perls for the range 2**30 up through the previous max,
2**31 - 1. A simple program could be written to convert files that were
written out using an older perl so that they can be read with newer
perls, and the perldelta says we will do this should anyone ask.
However, I strongly suspect that the number of such files in existence
is zero, as people in EBCDIC land don't seem to use Unicode much, and
these are very large code points, which are associated with a
portability warning every time they are output in some way.
This extension brings UTF-EBCDIC to parity with UTF-8, so that both can
cover a 64-bit word. It allows some removal of special cases for EBCDIC
in core code and core tests. And it is a necessary step to handle Perl
6's NFG, which I'd like eventually to bring to Perl 5.
This commit causes two implementations of a macro in utf8.h and
utfebcdic.h to become the same, and both are moved to a single one in
the portion of utf8.h common to both.
To illustrate, the I8 for U+3FFFFFFF (2**30-1) is
"\xFE\xBF\xBF\xBF\xBF\xBF\xBF" before and after this commit, but the I8
for the next code point, U+40000000 is now
"\xFF\xA0\xA0\xA0\xA0\xA0\xA0\xA1\xA0\xA0\xA0\xA0\xA0\xA0",
and before this commit it was "\xFF\xA0\xA0\xA0\xA0\xA0\xA0".
The I8 for 2**64-1 (U+FFFFFFFFFFFFFFFF) is
"\xFF\xAF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF", whereas
before this commit it was unrepresentable.
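The byte sequences above can be reproduced with a small encoder. The
sketch below is not Perl's code; it assumes the I8 layout described in
tr16 (continuation bytes of the form 0xA0 plus 5 payload bits, start
bytes 0xC0 through 0xFE for 2 to 7 bytes) together with the 14-byte
0xFF form this commit describes. The function name cp_to_i8 is made up
for the example, and it stops at I8; it does not apply the code-page
permutation that turns I8 into actual UTF-EBCDIC.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode cp as I8 into out[]; returns the number of bytes written
 * (at most 14). */
static size_t cp_to_i8(uint64_t cp, unsigned char out[14])
{
    size_t len;

    if (cp < 0xA0) {                    /* single-byte range */
        out[0] = (unsigned char)cp;
        return 1;
    }

    /* Each extra byte adds 5 payload bits; the start byte loses one. */
    if      (cp < (1ULL << 10)) len = 2;
    else if (cp < (1ULL << 14)) len = 3;
    else if (cp < (1ULL << 18)) len = 4;
    else if (cp < (1ULL << 22)) len = 5;
    else if (cp < (1ULL << 26)) len = 6;
    else if (cp < (1ULL << 30)) len = 7;
    else                        len = 14;   /* 0xFF start byte form */

    /* Fill continuation bytes from the end, 5 bits at a time. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0xA0 | (cp & 0x1F));
        cp >>= 5;
    }

    /* Start byte: 0xFF for the long form, otherwise a run of `len`
     * one bits followed by whatever payload bits remain. */
    if (len == 14)
        out[0] = 0xFF;
    else
        out[0] = (unsigned char)(((0xFFu << (8 - len)) & 0xFFu) | cp);

    return len;
}
```

With 13 continuation bytes the long form carries 65 payload bits, which
is why it is enough for any 64-bit value.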
Commit 7c560c3beefbb9946463c9f7b946a13f02f319d8 said in its message that
it was moving something that hadn't been needed on EBCDIC until the
"next commit". That statement turned out to be wrong, overtaken by
events. This is now the commit it was referring to; I had pushed that
commit prematurely.
|
When dealing with code points, it is easier to use the hex values. This
outputs the tables in hex, squeezing them so they barely fit in an 80
column window. They were not output in hex prior to this commit because
they did not fit.
The UTF8SKIP table continues to be output in decimal, as the values
aren't code points.
|
This causes the generated file ebcdic_tables.h to be #included by
utfebcdic.h instead of the hand-coded tables that were formerly there.
This makes it much easier to add or remove support for EBCDIC code
pages.
The UTF-EBCDIC-related tables for 037 and POSIX-BC are somewhat modified
from what they were before. They were minimally changed by hand a long
time ago to prevent segfaults, but in so doing, they lost an important
sorting characteristic of UTF-EBCDIC. The machine-generated versions
retain the sorting, while also avoiding the segfaults. utfebcdic.h has
more detail about this, regarding tr16.
|