author: Lorry Tar Creator <lorry-tar-importer@baserock.org> 2007-11-20 14:28:05 +0000
committer: <> 2013-08-08 17:01:04 +0000
commit: c97631728ce7d6d3f4692a56c3cda7476b42a968
tree: 8c00053771ccae41a737eecd072dbb3cd8b06fdd /Parser/Encodings
download: perl-xml-parser-c97631728ce7d6d3f4692a56c3cda7476b42a968.tar.gz

Imported from /home/lorry/working-area/delta_perl-xml-parser/XML-Parser-2.36.tar.gz.
(refs: HEAD, XML-Parser-2.36, master)
Diffstat (limited to 'Parser/Encodings')
-rw-r--r--  Parser/Encodings/Japanese_Encodings.msg  117
-rw-r--r--  Parser/Encodings/README                  51
-rw-r--r--  Parser/Encodings/big5.enc                bin 0 -> 40706 bytes
-rw-r--r--  Parser/Encodings/euc-kr.enc              bin 0 -> 45802 bytes
-rw-r--r--  Parser/Encodings/iso-8859-2.enc          bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/iso-8859-3.enc          bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/iso-8859-4.enc          bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/iso-8859-5.enc          bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/iso-8859-7.enc          bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/iso-8859-8.enc          bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/iso-8859-9.enc          bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/windows-1250.enc        bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/windows-1252.enc        bin 0 -> 1072 bytes
-rw-r--r--  Parser/Encodings/x-euc-jp-jisx0221.enc   bin 0 -> 37890 bytes
-rw-r--r--  Parser/Encodings/x-euc-jp-unicode.enc    bin 0 -> 37890 bytes
-rw-r--r--  Parser/Encodings/x-sjis-cp932.enc        bin 0 -> 20368 bytes
-rw-r--r--  Parser/Encodings/x-sjis-jdk117.enc       bin 0 -> 18202 bytes
-rw-r--r--  Parser/Encodings/x-sjis-jisx0221.enc     bin 0 -> 18202 bytes
-rw-r--r--  Parser/Encodings/x-sjis-unicode.enc      bin 0 -> 18202 bytes
19 files changed, 168 insertions, 0 deletions
diff --git a/Parser/Encodings/Japanese_Encodings.msg b/Parser/Encodings/Japanese_Encodings.msg
new file mode 100644
index 0000000..6912e70
--- /dev/null
+++ b/Parser/Encodings/Japanese_Encodings.msg
@@ -0,0 +1,117 @@
+Mapping files for Japanese encodings
+
+1998 12/25
+
+Fuji Xerox Information Systems
+MURATA Makoto
+
+1. Overview
+
+This version of XML::Parser and XML::Encoding does not come with map files for
+the charset "Shift_JIS" and the charset "euc-jp". Unfortunately, each of these
+charsets has more than one mapping to Unicode, and none of these mappings is
+considered authoritative.
+
+Therefore, we have come to believe that it is dangerous to provide map files
+for these charsets. Rather, we introduce several private charsets and map
+files for these private charsets. If IANA, the Unicode Consortium, and JIS
+eventually reach a consensus, we will be able to provide map files for
+"Shift_JIS" and "euc-jp".
+
+2. Different mappings from existing charsets to Unicode
+
+1) Different mappings in JIS X0221 and Unicode
+
+The mapping between JIS X0208:1990 and Unicode 1.1 and the mapping
+between JIS X0212:1990 and Unicode 1.1 are published by the Unicode
+Consortium. (They are available at
+ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT and
+ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0212.TXT,
+respectively.) These mapping files carry the following note:
+
+# The kanji mappings are a normative part of ISO/IEC 10646. The
+# non-kanji mappings are provisional, pending definition of
+# official mappings by Japanese standards bodies.
+
+Unfortunately, the non-kanji mappings in the Japanese standard for ISO 10646/1,
+namely JIS X 0221:1995, differ from the Unicode Consortium mapping, since
+0x213D of JIS X 0208 is mapped to U+2014 (em dash) rather than U+2015
+(horizontal bar). Furthermore, JIS X 0221 clearly says that the mapping is
+informational and non-normative. As a result, some companies (e.g., Microsoft and
+Apple) have introduced slightly different mappings. Therefore, neither the
+Unicode Consortium mapping nor the JIS X 0221 mapping is considered
+authoritative.
+
+2) Shift-JIS
+
+This charset is especially problematic, since its definition has been unclear
+since its inception.
+
+The current registration of the charset "Shift_JIS" is as below:
+
+>Name: Shift_JIS (preferred MIME name)
+>MIBenum: 17
+>Source: A Microsoft code that extends csHalfWidthKatakana to include
+> kanji by adding a second byte when the value of the first
+> byte is in the ranges 81-9F or E0-EF.
+>Alias: MS_Kanji
+>Alias: csShiftJIS
+
+First, this does not reference the mapping "Shift-JIS to Unicode"
+published by the Unicode consortium (available at
+ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/SHIFTJIS.TXT).
+
+Second, "kanji" in this registration can be interpreted in different ways.
+Does this "kanji" refer to JIS X0208:1978, JIS X0208:1983, or JIS
+X0208:1990 (== JIS X0208:1997)? These three standards are *incompatible* with
+each other. Moreover, we can even argue that "kanji" refers to JIS X0212 or
+ideographic characters in other countries.
+
+Third, each company has extended Shift JIS. For example, Microsoft introduced
+OEM extensions (NEC extensions and IBM extensions).
+
+Fourth, Shift JIS uses JIS X0201, which is almost, but not quite,
+upward-compatible with US-ASCII. 5C and 7E of JIS X 0201 are different from
+backslash and tilde, respectively. However, many programming languages (e.g.,
+Java) ignore this difference and assume that 5C and 7E of Shift JIS are
+backslash and tilde.
+
+
+3. Proposed charsets and mappings
+
+As a tentative solution, we introduce two private charsets for EUC-JP and four
+private charsets for Shift JIS.
+
+1) EUC-JP
+
+We have two charsets, namely "x-euc-jp-unicode" and "x-euc-jp-jisx0221". They
+differ in only one code point. The mapping for the former is based
+on the Unicode Consortium mapping, while the latter is based on the JIS X0221
+mapping.
+
+2) Shift JIS
+
+We have four charsets, namely x-sjis-unicode, x-sjis-jisx0221,
+x-sjis-jdk117, and x-sjis-cp932.
+
+The mapping for the charset x-sjis-unicode is the one published by the Unicode
+consortium. The mapping for x-sjis-jisx0221 is almost equivalent to
+x-sjis-unicode, but 0x213D of JIS X 0208 is mapped to U+2014 (em dash) rather
+than U+2015. The charset x-sjis-jdk117 is again almost equivalent to
+x-sjis-unicode, but 0x5C and 0x7E of JIS X0201 are mapped to backslash and
+tilde.
+
+The charset x-sjis-cp932 is used by Microsoft Windows, and its mapping is
+published by the Unicode Consortium (available at:
+ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.txt). The
+coded character set for this charset includes NEC-extensions and
+IBM-extensions. 0x5C and 0x7E of JIS X0201 are mapped to backslash and tilde;
+0x213D is mapped to U+2015; and 0x2140, 0x2141, 0x2142, and 0x215E of JIS X
+0208 are mapped to compatibility characters.
+
+Makoto
+
+Fuji Xerox Information Systems
+
+Tel: +81-44-812-7230 Fax: +81-44-812-7231
+E-mail: murata@apsdc.ksp.fujixerox.co.jp
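
The differences that Japanese_Encodings.msg describes among the four Shift JIS private charsets can be summarized as a small lookup table. The following is only an illustrative sketch in Python, not part of XML::Parser: the names `UNICODE_CONSORTIUM`, `OVERRIDES`, `mapping`, and `differences` are hypothetical, and the yen sign and overline targets for 0x5C and 0x7E are an assumption taken from the Unicode Consortium's SHIFTJIS.TXT rather than from the message itself. The compatibility-character differences of x-sjis-cp932 are left out because the message does not list their targets.

```python
# Unicode code points involved in the disagreements described in the message.
YEN_SIGN, OVERLINE = 0x00A5, 0x203E          # assumption: from SHIFTJIS.TXT, not the message
BACKSLASH, TILDE = 0x005C, 0x007E
EM_DASH, HORIZONTAL_BAR = 0x2014, 0x2015

# Keys are (source standard, code) pairs; values are Unicode code points.
# This baseline models x-sjis-unicode for the three contested codes only.
UNICODE_CONSORTIUM = {
    ("JIS X 0201", 0x5C): YEN_SIGN,
    ("JIS X 0201", 0x7E): OVERLINE,
    ("JIS X 0208", 0x213D): HORIZONTAL_BAR,
}

# Charsets that treat 0x5C and 0x7E as plain ASCII.
ASCII_STYLE = {("JIS X 0201", 0x5C): BACKSLASH, ("JIS X 0201", 0x7E): TILDE}

# Per-charset deviations from the baseline, as stated in the message.
OVERRIDES = {
    "x-sjis-unicode": {},
    "x-sjis-jisx0221": {("JIS X 0208", 0x213D): EM_DASH},
    "x-sjis-jdk117": ASCII_STYLE,
    "x-sjis-cp932": ASCII_STYLE,  # compatibility-character differences omitted
}


def mapping(charset):
    """Return the contested-code table for one private charset."""
    return {**UNICODE_CONSORTIUM, **OVERRIDES[charset]}


def differences(a, b):
    """Return {key: (value_in_a, value_in_b)} for codes where a and b disagree."""
    ma, mb = mapping(a), mapping(b)
    return {key: (ma[key], mb[key]) for key in ma if ma[key] != mb[key]}


if __name__ == "__main__":
    for name in OVERRIDES:
        diff = differences("x-sjis-unicode", name)
        print(name, {f"{std} 0x{code:X}": f"U+{pair[1]:04X}"
                     for (std, code), pair in diff.items()})
```

A table like this makes it easy to check which private charset matches the behaviour a given producer of Shift JIS documents assumed.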
diff --git a/Parser/Encodings/README b/Parser/Encodings/README
new file mode 100644
index 0000000..576323c
--- /dev/null
+++ b/Parser/Encodings/README
@@ -0,0 +1,51 @@
+This directory contains binary encoding maps for some selected encodings.
+If they are placed in a directory listed in @XML::Parser::Expat::Encoding_Path,
+then they are automatically loaded by the XML::Parser::Expat::load_encoding
+function as needed. Otherwise you may load what you need directly by
+explicitly calling this function.
+
+These maps were generated by a perl script that comes with the module
+XML::Encoding, compile_encoding, from XML formatted encoding maps that
+are distributed with that module. These XML encoding maps were generated
+in turn with a different script, domap, from mapping information contained
+on the Unicode version 2.0 CD-ROM. This CD-ROM comes with the Unicode
+Standard reference manual and can be ordered from the Unicode Consortium
+at http://www.unicode.org. The identical information is available on the
+internet at ftp://ftp.unicode.org/Public/MAPPINGS.
+
+See the encoding.h header in the Expat sub-directory for a description of
+the structure of these files.
+
+Clark Cooper
+December 12, 1998
+
+================================================================
+
+Contributed maps
+
+This distribution contains four contributed encodings from MURATA Makoto
+<murata@apsdc.ksp.fujixerox.co.jp> that are variations on the encoding
+commonly called Shift_JIS:
+
+x-sjis-cp932.enc
+x-sjis-jdk117.enc
+x-sjis-jisx0221.enc
+x-sjis-unicode.enc (This is the same encoding as the shift_jis.enc that
+ was distributed with this module in version 2.17)
+
+Please read his message (Japanese_Encodings.msg) about why these are here
+and why I've removed the shift_jis.enc encoding.
+
+We also have two contributed encodings that are variations of the EUC-JP
+encoding from Yoshida Masato <yoshidam@inse.co.jp>:
+
+x-euc-jp-jisx0221.enc
+x-euc-jp-unicode.enc
+
+The comments that MURATA Makoto made in his message apply to these
+encodings too.
+
+KangChan Lee <dolphin@comeng.chungnam.ac.kr> supplied the euc-kr encoding.
+
+Clark Cooper
+December 26, 1998
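
The README says that maps placed in a directory listed in @XML::Parser::Expat::Encoding_Path are found automatically. The search it implies can be sketched as below. This is a Python illustration of the idea, not the module's Perl code: `find_encoding_map` is a hypothetical name, and lower-casing the encoding name and appending `.enc` are assumptions about how the file name is derived.

```python
import os


def find_encoding_map(name, encoding_path):
    """Return the first '<name>.enc' file found along encoding_path, or None.

    Assumption: the map file is named after the lower-cased encoding name
    with an '.enc' suffix, matching the file names in this directory.
    """
    filename = name.lower()
    if not filename.endswith(".enc"):
        filename += ".enc"
    for directory in encoding_path:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical search path; adjust to wherever the .enc files live.
    print(find_encoding_map("x-sjis-cp932", ["Parser/Encodings", "."]))
```

Directories earlier in the list win, so a locally built map can shadow one shipped with the distribution if its directory is listed first.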
diff --git a/Parser/Encodings/big5.enc b/Parser/Encodings/big5.enc
new file mode 100644
index 0000000..94b2bd4
--- /dev/null
+++ b/Parser/Encodings/big5.enc
Binary files differ
diff --git a/Parser/Encodings/euc-kr.enc b/Parser/Encodings/euc-kr.enc
new file mode 100644
index 0000000..3da8a13
--- /dev/null
+++ b/Parser/Encodings/euc-kr.enc
Binary files differ
diff --git a/Parser/Encodings/iso-8859-2.enc b/Parser/Encodings/iso-8859-2.enc
new file mode 100644
index 0000000..d320d7f
--- /dev/null
+++ b/Parser/Encodings/iso-8859-2.enc
Binary files differ
diff --git a/Parser/Encodings/iso-8859-3.enc b/Parser/Encodings/iso-8859-3.enc
new file mode 100644
index 0000000..ba48378
--- /dev/null
+++ b/Parser/Encodings/iso-8859-3.enc
Binary files differ
diff --git a/Parser/Encodings/iso-8859-4.enc b/Parser/Encodings/iso-8859-4.enc
new file mode 100644
index 0000000..0294a24
--- /dev/null
+++ b/Parser/Encodings/iso-8859-4.enc
Binary files differ
diff --git a/Parser/Encodings/iso-8859-5.enc b/Parser/Encodings/iso-8859-5.enc
new file mode 100644
index 0000000..6dbd169
--- /dev/null
+++ b/Parser/Encodings/iso-8859-5.enc
Binary files differ
diff --git a/Parser/Encodings/iso-8859-7.enc b/Parser/Encodings/iso-8859-7.enc
new file mode 100644
index 0000000..02a4aee
--- /dev/null
+++ b/Parser/Encodings/iso-8859-7.enc
Binary files differ
diff --git a/Parser/Encodings/iso-8859-8.enc b/Parser/Encodings/iso-8859-8.enc
new file mode 100644
index 0000000..f211bd5
--- /dev/null
+++ b/Parser/Encodings/iso-8859-8.enc
Binary files differ
diff --git a/Parser/Encodings/iso-8859-9.enc b/Parser/Encodings/iso-8859-9.enc
new file mode 100644
index 0000000..fdc574b
--- /dev/null
+++ b/Parser/Encodings/iso-8859-9.enc
Binary files differ
diff --git a/Parser/Encodings/windows-1250.enc b/Parser/Encodings/windows-1250.enc
new file mode 100644
index 0000000..d4a64b5
--- /dev/null
+++ b/Parser/Encodings/windows-1250.enc
Binary files differ
diff --git a/Parser/Encodings/windows-1252.enc b/Parser/Encodings/windows-1252.enc
new file mode 100644
index 0000000..ab2d57c
--- /dev/null
+++ b/Parser/Encodings/windows-1252.enc
Binary files differ
diff --git a/Parser/Encodings/x-euc-jp-jisx0221.enc b/Parser/Encodings/x-euc-jp-jisx0221.enc
new file mode 100644
index 0000000..ca79c07
--- /dev/null
+++ b/Parser/Encodings/x-euc-jp-jisx0221.enc
Binary files differ
diff --git a/Parser/Encodings/x-euc-jp-unicode.enc b/Parser/Encodings/x-euc-jp-unicode.enc
new file mode 100644
index 0000000..34d4d0d
--- /dev/null
+++ b/Parser/Encodings/x-euc-jp-unicode.enc
Binary files differ
diff --git a/Parser/Encodings/x-sjis-cp932.enc b/Parser/Encodings/x-sjis-cp932.enc
new file mode 100644
index 0000000..c2a6bc4
--- /dev/null
+++ b/Parser/Encodings/x-sjis-cp932.enc
Binary files differ
diff --git a/Parser/Encodings/x-sjis-jdk117.enc b/Parser/Encodings/x-sjis-jdk117.enc
new file mode 100644
index 0000000..b6c2c07
--- /dev/null
+++ b/Parser/Encodings/x-sjis-jdk117.enc
Binary files differ
diff --git a/Parser/Encodings/x-sjis-jisx0221.enc b/Parser/Encodings/x-sjis-jisx0221.enc
new file mode 100644
index 0000000..cbb2db5
--- /dev/null
+++ b/Parser/Encodings/x-sjis-jisx0221.enc
Binary files differ
diff --git a/Parser/Encodings/x-sjis-unicode.enc b/Parser/Encodings/x-sjis-unicode.enc
new file mode 100644
index 0000000..6f88a06
--- /dev/null
+++ b/Parser/Encodings/x-sjis-unicode.enc
Binary files differ