path: root/ebcdic_tables.h
Commit message | Author | Age | Files | Lines
* Check for \n in EBCDIC code pages | Karl Williamson | 2019-03-06 | 1 | -2/+2
  IBM says that there are 13 characters whose code point varies depending on the EBCDIC code page. They fail to mention that the \n character may also vary. This commit adds checks for \n, in addition to the checks for the 13 graphic variant ones.
* ebcdic_tables.h: Remove alien '#' | Karl Williamson | 2019-03-04 | 1 | -48/+48
  These were introduced in c05125c57fd7868af65366bacb6fe40c04b1c719 in July 2018, and would cause any EBCDIC compilations to fail. That I found it by code inspection shows that we've lost all our EBCDIC smokers again.
* Make new EBCDIC tables global. | Craig A. Berry | 2018-07-08 | 1 | -12/+12
  They are defined in the Perl library but referenced in the Perl executable, which can't see them unless the library exports them, and some linkers only make a symbol a public export if explicitly told to.
* Make isC9_STRICT_UTF8_CHAR() an inline dfa | Karl Williamson | 2018-07-05 | 1 | -0/+78
  This replaces a complicated trie with a dfa. This should cut down the number of conditionals encountered in parsing many code points.
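  To illustrate the general technique, here is a minimal sketch of a table-driven dfa that validates standard UTF-8 on an ASCII platform. It is not the table Perl generates and it does not handle UTF-EBCDIC; the state names, the byte classes, and the function names `byte_class` and `utf8_is_valid` are all hypothetical. For brevity the byte class is computed with comparisons, whereas a generated table would more plausibly look it up in a 256-entry array so that the per-byte work is two array reads.

```c
#include <stddef.h>

/* Hypothetical dfa states: S_ACC means "at a character boundary". */
enum { S_ACC, S_REJ, S_C1, S_C2, S_C3, S_E0, S_ED, S_F0, S_F4, N_STATES };
enum { N_CLASSES = 12 };

/* Map a byte to one of 12 classes (illustrative; a real table would
 * precompute this for all 256 byte values). */
static int byte_class(unsigned char b)
{
    if (b <= 0x7F) return 0;    /* ASCII */
    if (b <= 0x8F) return 1;    /* continuation 80..8F */
    if (b <= 0x9F) return 2;    /* continuation 90..9F */
    if (b <= 0xBF) return 3;    /* continuation A0..BF */
    if (b <= 0xC1) return 11;   /* C0, C1: always overlong */
    if (b <= 0xDF) return 4;    /* 2-byte start */
    if (b == 0xE0) return 5;    /* 3-byte start, overlong check needed */
    if (b == 0xED) return 7;    /* 3-byte start, surrogate check needed */
    if (b <= 0xEF) return 6;    /* other 3-byte starts */
    if (b == 0xF0) return 8;    /* 4-byte start, overlong check needed */
    if (b <= 0xF3) return 9;    /* other 4-byte starts */
    if (b == 0xF4) return 10;   /* 4-byte start, upper bound check needed */
    return 11;                  /* F5..FF: never valid here */
}

#define R S_REJ
static const unsigned char dfa[N_STATES][N_CLASSES] = {
    /* S_ACC */ { S_ACC, R, R, R, S_C1, S_E0, S_C2, S_ED, S_F0, S_C3, S_F4, R },
    /* S_REJ */ { R, R, R, R, R, R, R, R, R, R, R, R },
    /* S_C1  */ { R, S_ACC, S_ACC, S_ACC, R, R, R, R, R, R, R, R },
    /* S_C2  */ { R, S_C1, S_C1, S_C1, R, R, R, R, R, R, R, R },
    /* S_C3  */ { R, S_C2, S_C2, S_C2, R, R, R, R, R, R, R, R },
    /* S_E0  */ { R, R, R, S_C1, R, R, R, R, R, R, R, R },
    /* S_ED  */ { R, S_C1, S_C1, R, R, R, R, R, R, R, R, R },
    /* S_F0  */ { R, R, S_C2, S_C2, R, R, R, R, R, R, R, R },
    /* S_F4  */ { R, S_C2, R, R, R, R, R, R, R, R, R, R },
};
#undef R

/* Return 1 if s[0..len-1] is well-formed UTF-8 (no surrogates, nothing
 * above U+10FFFF, no overlongs), else 0. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    unsigned state = S_ACC;
    size_t i;

    for (i = 0; i < len && state != S_REJ; i++)
        state = dfa[state][byte_class(s[i])];
    return state == S_ACC;
}
```

  The loop body has no per-sequence-length branching: every byte costs one class lookup and one transition lookup, which is the kind of saving over a hand-written trie of conditionals that the commit message describes.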
* Add dfa for strict translation from UTF-8 | Karl Williamson | 2018-07-05 | 1 | -0/+92
* Extend dfa for translation of UTF-8 to EBCDIC | Karl Williamson | 2018-07-05 | 1 | -1/+109
  This commit switches to using a dfa for translating from UTF-8 on EBCDIC platforms. This makes for fewer #ifdefs, and I realized while I was working on the dfa that it wasn't difficult to do for EBCDIC.
* regen/ebcdic.pl: Add capability to generate a dfa table | Karl Williamson | 2018-07-05 | 1 | -320/+324
  This kind of table is used for the dfa for translating or verifying UTF-8.
* regen/ebcdic.pl: Add declaration of generated tables | Karl Williamson | 2018-07-05 | 1 | -18/+90
  This adds code to declare and define the tables only under DOINIT, and otherwise to just declare them. This allows the includer to not have to deal with them at all.
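  The declare-versus-define split can be sketched as follows. This is an assumption-laden illustration, not the generated header: only the DOINIT symbol comes from the commit message, while `demo_table` and `demo_lookup` are hypothetical names, and DOINIT is defined inline here only so that the snippet compiles as a single file.

```c
#include <stddef.h>

/* Pretend this file is the single translation unit that owns the data.
 * In a real build only one .c file would define DOINIT before including
 * the header. */
#define DOINIT

/* ---- what a header following this pattern might contain ---- */
#ifdef DOINIT
/* Defining translation unit: emit the table contents. */
const unsigned char demo_table[4] = { 0x10, 0x20, 0x2A, 0x40 };
#else
/* Everyone else: only a declaration, so no duplicate definitions. */
extern const unsigned char demo_table[4];
#endif
/* ------------------------------------------------------------- */

/* Any includer can use the table without caring where it is defined. */
unsigned char demo_lookup(size_t i)
{
    return demo_table[i & 3];
}
```

  With this arrangement the includer never has to write its own extern declarations; whether it gets a definition or a declaration is decided by whether DOINIT is set.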
* regen/ebcdic.pl: Always print row headings | Karl Williamson | 2018-07-05 | 1 | -256/+256
  Previously, this omitted the headings on tables that just barely fit into 80 columns. But future commits will create tables that can't fit into 80 columns, and these headings are useful, so print them.
* ebcdic_tables.h: Add comments | Karl Williamson | 2018-07-05 | 1 | -8/+8
* Use new paradigm for hdr file double inclusion guard | Karl Williamson | 2017-06-02 | 1 | -3/+3
  We changed to using symbols that are unlikely to conflict with non-Perl code and that have trailing underbars, so they don't look like a regular Perl #define.
  See https://rt.perl.org/Ticket/Display.html?id=131110
  There are many more header files which are not guarded.
* Don't generate EBCDIC POSIX-BC tables | Karl Williamson | 2016-01-14 | 1 | -213/+0
  This commit comments out the code that generates these tables. This is trivially reversible. We don't believe anyone is using Perl and POSIX-BC at this time, and this saves time during development when having to regenerate these tables, and makes the resulting tar ball smaller.
  See thread beginning at http://nntp.perl.org/group/perl.perl5.porters/233663
* Extend UTF-EBCDIC to handle up to 2**64-1 | Karl Williamson | 2015-11-25 | 1 | -3/+3
  This uses for UTF-EBCDIC essentially the same mechanism that Perl already uses for UTF-8 on ASCII platforms to extend it beyond what might be its natural maximum. That is, when the UTF-8 start byte is 0xFF, it adds a bunch more bytes to the character than it otherwise would, bringing it to a total of 14 for UTF-EBCDIC. This is enough to handle any code point that fits in a 64 bit word.

  The downside of this is that this extension is not compatible with previous perls for the range 2**30 up through the previous max, 2**31 - 1. A simple program could be written to convert files that were written out using an older perl so that they can be read with newer perls, and the perldelta says we will do this should anyone ask. However, I strongly suspect that the number of such files in existence is zero, as people in EBCDIC land don't seem to use Unicode much, and these are very large code points, which are associated with a portability warning every time they are output in some way.

  This extension brings UTF-EBCDIC to parity with UTF-8, so that both can cover a 64-bit word. It allows some removal of special cases for EBCDIC in core code and core tests. And it is a necessary step to handle Perl 6's NFG, which I'd like eventually to bring to Perl 5.

  This commit causes two implementations of a macro in utf8.h and utfebcdic.h to become the same, and both are moved to a single one in the portion of utf8.h common to both.

  To illustrate, the I8 for U+3FFFFFFF (2**30-1) is "\xFE\xBF\xBF\xBF\xBF\xBF\xBF" before and after this commit, but the I8 for the next code point, U+40000000, is now "\xFF\xA0\xA0\xA0\xA0\xA0\xA0\xA1\xA0\xA0\xA0\xA0\xA0\xA0", and before this commit it was "\xFF\xA0\xA0\xA0\xA0\xA0\xA0". The I8 for 2**64-1 (U+FFFFFFFFFFFFFFFF) is "\xFF\xAF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF\xBF", whereas before this commit it was unrepresentable.

  Commit 7c560c3beefbb9946463c9f7b946a13f02f319d8 said in its message that it was moving something that hadn't been needed on EBCDIC until the "next commit". That statement turned out to be wrong, overtaken by events. This now is the commit it was referring to. I prematurely pushed that commit.
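  The I8 byte strings quoted in the message can be reproduced with a short sketch. It assumes that each I8 continuation byte carries 5 payload bits under a 0xA0 prefix, which is what the quoted examples imply, and it covers only the 7-byte 0xFE form and the new 14-byte 0xFF form. The function name `i8_encode_long` and the 2**26 lower cut-off are assumptions made for this illustration; this is not Perl's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode cp into I8 using only the two longest forms:
 *   7 bytes:  0xFE + 6 continuation bytes (cp up to 2**30 - 1)
 *   14 bytes: 0xFF + 13 continuation bytes (anything larger)
 * Returns the number of bytes written, or 0 for code points below the
 * assumed cut-off of 2**26, where a shorter form would be expected. */
size_t i8_encode_long(uint64_t cp, unsigned char out[14])
{
    size_t len, i;

    assert(out != NULL);
    if (cp < ((uint64_t)1 << 26))
        return 0;                       /* shorter forms are out of scope */

    if (cp <= 0x3FFFFFFFULL) {
        len = 7;
        out[0] = 0xFE;
    } else {
        len = 14;
        out[0] = 0xFF;
    }

    /* Fill continuation bytes from the end, 5 bits at a time. */
    for (i = len - 1; i >= 1; i--) {
        out[i] = (unsigned char)(0xA0 | (cp & 0x1F));
        cp >>= 5;
    }
    return len;
}
```

  Thirteen continuation bytes carry 65 payload bits, which is why the 14-byte form is enough for any 64-bit value and why the first continuation byte for 2**64-1 is 0xAF rather than 0xBF.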
* regen/ebcdic.pl: Output tables in hex | Karl Williamson | 2015-11-25 | 1 | -384/+432
  When dealing with code points, it is easier to use the hex values. This outputs the tables in hex, squeezing them so they barely fit in an 80 column window. They were not output in hex prior to this commit because they did not fit. The UTF8SKIP table continues to be output in decimal, as the values aren't code points.
* Make many EBCDIC tables generated instead of hand-coded | Karl Williamson | 2014-05-31 | 1 | -0/+607
  This causes the generated file ebcdic_tables.h to be #included by utfebcdic.h instead of the hand-coded tables that were formerly there. This makes it much easier to add or remove support for EBCDIC code pages. The UTF-EBCDIC-related tables for 037 and POSIX-BC are somewhat modified from what they were before. They were changed by hand minimally a long time ago to prevent segfaults, but in so doing, they lost an important sorting characteristic of UTF-EBCDIC. The machine-generated versions retain the sorting while also avoiding the segfaults. utfebcdic.h has more detail about this, regarding tr16.