Diffstat (limited to 'data/i18n_sdd.txt')
-rw-r--r-- data/i18n_sdd.txt 2337
1 files changed, 0 insertions, 2337 deletions
diff --git a/data/i18n_sdd.txt b/data/i18n_sdd.txt
deleted file mode 100644
index 5c6cbcedc..000000000
--- a/data/i18n_sdd.txt
+++ /dev/null
@@ -1,2337 +0,0 @@
-
-
- WORKING DRAFT Ira McDonald
- <i18n_sdd.txt> High North Inc
-
- Common UNIX Printing System ("CUPS")
- Internationalization Software Design Description v0.3
-
- Copyright (C) Easy Software Products (2002) - All Rights Reserved
-
-
- Status of this Document
-
- This document is an unapproved working draft and is incomplete in some
- sections (see 'Ed Note:' comments).
-
-
- Abstract
-
- This document provides general information and high-level design for the
- Internationalization extensions for the Common UNIX Printing System
- ("CUPS") Version 1.2. This document also provides C language header
- files and high-level pseudo-code for all new modules and external
- functions.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- McDonald June 20, 2002 [Page 1]
-
- CUPS Internationalization Software Design Description v0.3
-
- Table of Contents
-
- 1. Scope ...................................................... 4
- 1.1. Identification ......................................... 4
- 1.2. System Overview ........................................ 4
- 1.3. Document Overview ...................................... 4
- 2. References ................................................. 5
- 2.1. CUPS References ........................................ 5
- 2.2. Other Documents ........................................ 5
- 3. Design Overview ............................................ 7
- 3.1. Transcoding - New ...................................... 7
- 3.1.1. transcode.h - Transcoding header ................... 7
- 3.1.1.1. cups_cmap_t - SBCS Charmap Structure ........... 10
- 3.1.1.2. cups_dmap_t - DBCS Charmap Structure ........... 11
- 3.1.2. transcode.c - Transcoding module ................... 11
- 3.1.2.1. cupsUtf8ToCharset() ............................ 11
- 3.1.2.2. cupsCharsetToUtf8() ............................ 12
- 3.1.2.3. cupsUtf8ToUtf16() .............................. 12
- 3.1.2.4. cupsUtf16ToUtf8() .............................. 12
- 3.1.2.5. cupsUtf8ToUtf32() .............................. 12
- 3.1.2.6. cupsUtf32ToUtf8() .............................. 13
- 3.1.2.7. cupsUtf16ToUtf32() ............................. 13
- 3.1.2.8. cupsUtf32ToUtf16() ............................. 13
- 3.1.2.9. Transcoding Utility Functions .................. 13
- 3.1.2.9.1. cupsCharmapGet() ........................... 14
- 3.1.2.9.2. cupsCharmapFree() .......................... 14
- 3.1.2.9.3. cupsCharmapFlush() ......................... 14
- 3.2. Normalization - New .................................... 15
- 3.2.1. normalize.h - Normalization header ................. 15
- 3.2.1.1. cups_normmap_t - Normalize Map Structure ....... 22
- 3.2.1.2. cups_foldmap_t - Case Fold Map Structure ....... 22
- 3.2.1.3. cups_propmap_t - Char Property Map Structure ... 23
- 3.2.1.4. cups_prop_t - Char Property Structure .......... 23
- 3.2.1.5. cups_breakmap_t - Line Break Map Structure ..... 23
- 3.2.1.6. cups_combmap_t - Combining Class Map Structure . 24
- 3.2.1.7. cups_comb_t - Combining Class Structure ........ 24
- 3.2.2. normalize.c - Normalization module ................. 24
- 3.2.2.1. cupsUtf8Normalize() ............................ 24
- 3.2.2.2. cupsUtf32Normalize() ........................... 25
- 3.2.2.3. cupsUtf8CaseFold() ............................. 25
- 3.2.2.4. cupsUtf32CaseFold() ............................ 26
- 3.2.2.5. cupsUtf8CompareCaseless() ...................... 26
- 3.2.2.6. cupsUtf32CompareCaseless() ..................... 26
- 3.2.2.7. cupsUtf8CompareIdentifier() .................... 27
- 3.2.2.8. cupsUtf32CompareIdentifier() ................... 27
- 3.2.2.9. cupsUtf32CharacterProperty() ................... 27
- 3.2.2.10. Normalization Utility Functions ............... 28
- 3.2.2.10.1. cupsNormalizeMapsGet() .................... 28
- 3.2.2.10.2. cupsNormalizeMapsFree() ................... 28
- 3.2.2.10.3. cupsNormalizeMapsFlush() .................. 28
- 3.3. Language - Existing .................................... 29
- 3.3.1. language.h - Language header ....................... 29
- 3.3.2. language.c - Language module ....................... 29
- 3.3.2.1. cupsLangEncoding() - Existing .................. 29
- 3.3.2.2. cupsLangFlush() - Existing ..................... 29
- 3.3.2.3. cupsLangFree() - Existing ...................... 29
- 3.3.2.4. cupsLangGet() - Existing ....................... 30
- 3.3.2.5. cupsLangPrintf() - New ......................... 30
- 3.3.2.6. cupsLangPuts() - New ........................... 30
- 3.3.2.7. cupsEncodingName() - New ....................... 31
- 3.4. Common Text Filter - Existing .......................... 31
- 3.4.1. textcommon.h - Common text filter header ........... 31
- 3.4.1.1. lchar_t - Character/Attribute Structure ........ 31
- 3.4.2. textcommon.c - Common text filter .................. 32
- 3.4.2.1. TextMain() - Existing .......................... 32
- 3.4.2.2. compare_keywords() - Existing .................. 33
- 3.4.2.3. getutf8() - Existing ........................... 33
- 3.5. Text to PostScript Filter - Existing ................... 33
- 3.5.1. texttops.c - Text to PostScript filter ............. 33
- 3.5.1.1. main() - Existing .............................. 33
- 3.5.1.2. WriteEpilogue () - Existing .................... 34
- 3.5.1.3. WritePage () - Existing ........................ 34
- 3.5.1.4. WriteProlog () - Existing ...................... 34
- 3.5.1.5. write_line() - Existing ........................ 34
- 3.5.1.6. write_string() - Existing ...................... 34
- 3.5.1.7. write_text() - Existing ........................ 35
- A. Glossary ................................................... A-1
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 1. Scope
-
-
-
- 1.1. Identification
-
- This document provides general information and high-level design for the
- Internationalization extensions for the Common UNIX Printing System
- ("CUPS") Version 1.2. This document also provides C language header
- files and high-level pseudo-code for all new modules and external
- functions.
-
-
- 1.2. System Overview
-
- The CUPS Internationalization extensions provide multilingual support
- via Unicode 3.2:2002 [UNICODE3.2] / ISO-10646-1:2000 [ISO10646-1] and a
- suite of local character sets (including all adopted parts of ISO-8859
- and many MS Windows code pages) for CUPS 1.2.
-
- The CUPS Internationalization extensions support UTF-8 [RFC2279] as the
- common stream-oriented representation of all character data. UTF-8 is
- defined in [ISO10646-1] and is further constrained (for integrity and
- security) by [UNICODE3.2].
-
- UTF-8 is the native character set of LDAPv3 [RFC2251], SLPv2 [RFC2608],
- IPP/1.1 [RFC2910] [RFC2911], and many other Internet protocols.
-
-
- 1.3. Document Overview
-
-
- This software design description document is organized into the
- following sections:
-
- o 1 - Scope
- o 2 - References
- o 3 - Design Overview
- o A - Glossary
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 2. References
-
-
-
- 2.1. CUPS References
-
- See: Section 2.1 'CUPS Documentation' of CUPS Software Design
- Description.
-
-
- 2.2. Other Documents
-
- The following non-CUPS documents are referenced by this document.
-
- [ANSI-X3.4] ANSI Coded Character Set - 7-bit American National Standard
- Code for Information Interchange, ANSI X3.4, 1986 (aka US-ASCII).
-
- [GB2312] Code of Chinese Graphic Character Set for Information
- Interchange, Primary Set, GB 2312, 1980.
-
- [ISO639-1] Codes for the Representation of Names of Languages -- Part 1:
- Alpha-2 Code, ISO/IEC 639-1, 2000.
-
- [ISO639-2] Codes for the Representation of Names of Languages -- Part 2:
- Alpha-3 Code, ISO/IEC 639-2, 1998.
-
- [ISO646] Information Technology - ISO 7-bit Coded Character Set for
- Information Interchange, ISO/IEC 646, 1991.
-
- [ISO2022] Information Processing - ISO 7-bit and 8-bit Coded Character
- Sets - Code Extension Techniques, ISO/IEC 2022, 1994. (Technically
- identical to ECMA-35.)
-
- [ISO3166-1] Codes for the Representation of Names of Countries and their
- Subdivisions, Part 1: Country Codes, ISO 3166-1, 1997.
-
- [ISO8859] Information Processing - 8-bit Single-Byte Code Graphic
- Character Sets, ISO/IEC 8859-n, 1987-2001.
-
- [ISO10646-1] Information Technology - Universal Multiple-Octet Code
- Character Set (UCS) - Part 1: Architecture and Basic Multilingual
- Plane, ISO/IEC 10646-1, September 2000.
-
- [ISO10646-2] Information Technology - Universal Multiple-Octet Code
- Character Set (UCS) - Part 2: Supplemental Planes, ISO/IEC 10646-2,
- January 2001.
-
- [RFC2119] Bradner. Key words for use in RFCs to Indicate Requirement
- Levels, RFC 2119, March 1997.
-
-
-
-
- [RFC2251] Wahl, Howes, Kille. Lightweight Directory Access Protocol
- Version 3 (LDAPv3), RFC 2251, December 1997.
-
- [RFC2277] Alvestrand. IETF Policy on Character Sets and Languages, RFC
- 2277, January 1998.
-
- [RFC2279] Yergeau. UTF-8, a Transformation Format of ISO 10646, RFC
- 2279, January 1998.
-
- [RFC2608] Guttman, Perkins, Veizades, Day. Service Location Protocol
- Version 2 (SLPv2), RFC 2608, June 1999.
-
- [RFC2910] Herriot, Butler, Moore, Turner, Wenn. Internet Printing
- Protocol/1.1: Encoding and Transport, RFC 2910, September 2000.
-
- [RFC2911] Hastings, Herriot, deBry, Isaacson, Powell. Internet Printing
- Protocol/1.1: Model and Semantics, RFC 2911, September 2000.
-
- [UNICODE3.0] Unicode Consortium, Unicode Standard Version 3.0,
- Addison-Wesley Developers Press, ISBN 0-201-61633-5, 2000.
-
- [UNICODE3.1] Unicode Consortium, Unicode Standard Version 3.1 (UAX-27),
- May 2001.
-
- [UNICODE3.2] Unicode Consortium, Unicode Standard Version 3.2 (UAX-28),
- March 2002.
-
- [US-ASCII] See [ANSI-X3.4] above.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 3. Design Overview
-
- The CUPS Internationalization extensions are composed of several header
- files and modules which extend the Language functions in the existing
- CUPS Application Programmers Interface (API).
-
-
- 3.1. Transcoding - New
-
- Initially, the CUPS Internationalization extensions will only support
- SBCS (single-byte character set) transcoding. But the design allows
- future support for DBCS (double-byte character set) transcoding for CJK
- (Chinese/Japanese/Korean) languages and the MBCS (multiple-byte
- character set) compound sets that use escapes for charset switching.
-
- To reduce code size and increase performance, all conventional
- 'mapping files' (tables of values in legacy character sets with their
- corresponding Unicode scalar values) will ALSO be sorted and stored in
- memory as reverse maps (for efficient conversion from Unicode scalar
- values to their corresponding legacy character set values). Transcoding
- will be done directly by 2-level lookup (without any searching or
- sorting).
-
- [Ed Note: CJK languages will be fairly costly in mapping table sizes,
- because they have thousands (or tens of thousands) of codepoints.]
-
-
-
- 3.1.1. transcode.h - Transcoding header
-
- /*
- * "$Id: i18n_sdd.txt 2678 2002-08-19 01:15:26Z mike $"
- *
- * Transcoding support for the Common UNIX Printing System (CUPS).
- *
- * Copyright 1997-2002 by Easy Software Products.
- *
- * These coded instructions, statements, and computer programs are
- * the property of Easy Software Products and are protected by Federal
- * copyright law. Distribution and use rights are outlined in the
- * file "LICENSE.txt" which should have been included with this file.
- * If this file is missing or damaged please contact Easy Software
- * Products at:
- *
- * Attn: CUPS Licensing Information
- * Easy Software Products
- * 44141 Airport View Drive, Suite 204
- * Hollywood, Maryland 20636-3111 USA
- *
- * Voice: (301) 373-9603
- * EMail: cups-info@cups.org
- * WWW: http://www.cups.org
- */
-
- #ifndef _CUPS_TRANSCODE_H_
- # define _CUPS_TRANSCODE_H_
-
- /*
- * Include necessary headers...
- */
-
- # include "cups/language.h"
-
- # ifdef __cplusplus
- extern "C" {
- # endif /* __cplusplus */
-
- /*
- * Types...
- */
-
- typedef unsigned char utf8_t; /* UTF-8 Unicode/ISO-10646 code unit */
- typedef unsigned short utf16_t; /* UTF-16 Unicode/ISO-10646 code unit */
- typedef unsigned long utf32_t; /* UTF-32 Unicode/ISO-10646 code unit */
- typedef unsigned short ucs2_t; /* UCS-2 Unicode/ISO-10646 code unit */
- typedef unsigned long ucs4_t; /* UCS-4 Unicode/ISO-10646 code unit */
- typedef unsigned char sbcs_t; /* SBCS Legacy 8-bit code unit */
- typedef unsigned short dbcs_t; /* DBCS Legacy 16-bit code unit */
-
- /*
- * Structures...
- */
-
- typedef struct cups_cmap_str /**** SBCS Charmap Cache Structure ****/
- {
- struct cups_cmap_str *next; /* Next charmap in cache */
- int used; /* Number of times entry used */
- cups_encoding_t encoding; /* Legacy charset encoding */
- ucs2_t char2uni[256]; /* Map Legacy SBCS -> UCS-2 */
- sbcs_t *uni2char[256]; /* Map UCS-2 -> Legacy SBCS */
- } cups_cmap_t;
-
- #if 0
- typedef struct cups_dmap_str /**** DBCS Charmap Cache Structure ****/
- {
- struct cups_dmap_str *next; /* Next charmap in cache */
- int used; /* Number of times entry used */
- cups_encoding_t encoding; /* Legacy charset encoding */
- ucs2_t *char2uni[256]; /* Map Legacy DBCS -> UCS-2 */
- dbcs_t *uni2char[256]; /* Map UCS-2 -> Legacy DBCS */
- } cups_dmap_t;
- #endif
-
-
- /*
- * Constants...
- */
- #define CUPS_MAX_USTRING 1024 /* Maximum size of Unicode string */
-
- /*
- * Globals...
- */
-
- extern int TcFixMapNames; /* Fix map names to Unicode names */
- extern int TcStrictUtf8; /* Non-shortest-form is illegal */
- extern int TcStrictUtf16; /* Invalid surrogate pair is illegal */
- extern int TcStrictUtf32; /* Greater than 0x10FFFF is illegal */
- extern int TcRequireBOM; /* Require BOM for little/big-endian */
- extern int TcSupportBOM; /* Support BOM for little/big-endian */
- extern int TcSupport8859; /* Support ISO 8859-x repertoires */
- extern int TcSupportWin; /* Support Windows-x repertoires */
- extern int TcSupportCJK; /* Support CJK (Asian) repertoires */
-
- /*
- * Prototypes...
- */
-
- /*
- * Utility functions for character set maps
- */
- extern void *cupsCharmapGet(const cups_encoding_t encoding);
- /* I - Encoding */
- extern void cupsCharmapFree(const cups_encoding_t encoding);
- /* I - Encoding */
- extern void cupsCharmapFlush(void);
-
- /*
- * Convert UTF-8 to and from legacy character set
- */
- extern int cupsUtf8ToCharset(char *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- cups_encoding_t encoding); /* I - Encoding */
- extern int cupsCharsetToUtf8(utf8_t *dest, /* O - Target string */
- const char *src, /* I - Source string */
- const int maxout, /* I - Max output */
- cups_encoding_t encoding); /* I - Encoding */
-
- /*
- * Convert UTF-8 to and from UTF-16
- */
- extern int cupsUtf8ToUtf16(utf16_t *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
- extern int cupsUtf16ToUtf8(utf8_t *dest, /* O - Target string */
- const utf16_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- /*
- * Convert UTF-8 to and from UTF-32
- */
- extern int cupsUtf8ToUtf32(utf32_t *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
- extern int cupsUtf32ToUtf8(utf8_t *dest, /* O - Target string */
- const utf32_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- /*
- * Convert UTF-16 to and from UTF-32
- */
- extern int cupsUtf16ToUtf32(utf32_t *dest, /* O - Target string */
- const utf16_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
- extern int cupsUtf32ToUtf16(utf16_t *dest, /* O - Target string */
- const utf32_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- # ifdef __cplusplus
- }
- # endif /* __cplusplus */
-
- #endif /* !_CUPS_TRANSCODE_H_ */
-
- /*
- * End of "$Id: i18n_sdd.txt 2678 2002-08-19 01:15:26Z mike $"
- */
-
-
-
- 3.1.1.1. cups_cmap_t - SBCS Charmap Structure
-
- typedef struct cups_cmap_str /**** SBCS Charmap Cache Structure ****/
- {
- struct cups_cmap_str *next; /* Next charset map in cache */
- int used; /* Number of times entry used */
- cups_encoding_t encoding; /* Legacy charset encoding */
- ucs2_t char2uni[256]; /* Map Legacy SBCS -> UCS-2 */
- sbcs_t *uni2char[256]; /* Map UCS-2 -> Legacy SBCS */
- } cups_cmap_t;
-
- 'char2uni[]' is a (complete) array of UCS-2 values that supports direct
- one-level lookup from an input SBCS legacy charset code point, for use
- by 'cupsCharsetToUtf8()'.
-
- 'uni2char[]' is a (sparse) array of pointers to arrays of (256 each)
- SBCS values, that supports direct two-level lookup from an input UCS-2
- code point, for use by 'cupsUtf8ToCharset()'.
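The two lookups described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: 'demo_cmap_t', 'demo_to_unicode()', and 'demo_from_unicode()' are hypothetical names that do not appear in this design, and the convention that a NULL row or a zero entry means "no mapping" is an assumption, since the document does not say how unmapped code points are represented.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short ucs2_t;  /* UCS-2 code unit, as in transcode.h */
typedef unsigned char  sbcs_t;  /* Legacy 8-bit code unit, as in transcode.h */

/* Hypothetical cut-down charmap: only the two lookup tables. */
typedef struct
{
  ucs2_t  char2uni[256];        /* Legacy SBCS -> UCS-2 (complete) */
  sbcs_t  *uni2char[256];       /* UCS-2 high byte -> row of 256 (sparse) */
} demo_cmap_t;

/* One-level lookup: index directly by the legacy byte. */
static ucs2_t demo_to_unicode(const demo_cmap_t *map, sbcs_t ch)
{
  assert(map != NULL);
  return (map->char2uni[ch]);
}

/*
 * Two-level lookup: the high byte selects a row, the low byte selects
 * the entry.  Assumption: a NULL row or a zero entry (other than for
 * U+0000 itself) means "no mapping", and 'fallback' is returned.
 */
static sbcs_t demo_from_unicode(const demo_cmap_t *map, ucs2_t uni,
                                sbcs_t fallback)
{
  const sbcs_t *row;

  assert(map != NULL);
  row = map->uni2char[(uni >> 8) & 0xFF];
  if (row == NULL)
    return (fallback);
  if (row[uni & 0xFF] == 0 && uni != 0)
    return (fallback);
  return (row[uni & 0xFF]);
}
```

Because most SBCS repertoires touch only a handful of UCS-2 rows, the sparse first level keeps the reverse map small while still avoiding any search at conversion time.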
-
-
-
- 3.1.1.2. cups_dmap_t - DBCS Charmap Structure
-
- typedef struct cups_dmap_str /**** DBCS Charmap Cache Structure ****/
- {
- struct cups_dmap_str *next; /* Next charset map in cache */
- int used; /* Number of times entry used */
- cups_encoding_t encoding; /* Legacy charset encoding */
- ucs2_t *char2uni[256]; /* Map Legacy DBCS -> UCS-2 */
- dbcs_t *uni2char[256]; /* Map UCS-2 -> Legacy DBCS */
- } cups_dmap_t;
-
- 'char2uni[]' is a (sparse) array of pointers to arrays of (256 each)
- UCS-2 values that supports direct two-level lookup from an input DBCS
- legacy charset code point, for (future) use by 'cupsCharsetToUtf8()'.
-
- 'uni2char[]' is a (sparse) array of pointers to arrays of (256 each)
- DBCS values, that supports direct two-level lookup from an input UCS-2
- code point, for (future) use by 'cupsUtf8ToCharset()'.
-
-
-
- 3.1.2. transcode.c - Transcoding module
-
- All of the transcoding functions are modelled on the C standard library
- function 'strncpy()', except that they return the count of output, like
- 'strlen()', rather than the (redundant) pointer to the output.
-
- If the transcoding functions detect invalid input parameters or they
- detect an encoding error in their input, then they return '-1', rather
- than the count of output.
-
- All of the transcoding functions take an input parameter indicating the
- maximum output units (for safe operation). The functions that return
- 16-bit (UTF-16) or 32-bit (UTF-32/UCS-4) output always return the output
- string count (not including the final null) and NOT the memory size in
- bytes.
-
-
-
- 3.1.2.1. cupsUtf8ToCharset()
-
- extern int cupsUtf8ToCharset(char *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- cups_encoding_t encoding); /* I - Encoding */
-
- <Find charset map by calling 'cupsCharmapGet()'>
- <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'>
- <Convert internal UCS-4 to legacy charset via charset map>
- <Release charset map by calling 'cupsCharmapFree()'>
- <Return length of output legacy charset string -- size in bytes>
-
-
-
- 3.1.2.2. cupsCharsetToUtf8()
-
- extern int cupsCharsetToUtf8(utf8_t *dest, /* O - Target string */
- const char *src, /* I - Source string */
- const int maxout, /* I - Max output */
- cups_encoding_t encoding); /* I - Encoding */
-
- <Find charset map by calling 'cupsCharmapGet()'>
- <Convert input legacy charset to internal UCS-4 via charset map>
- <Convert internal UCS-4 to UTF-8 by calling 'cupsUtf32ToUtf8()'>
- <Release charset map by calling 'cupsCharmapFree()'>
- <Return length of output UTF-8 string -- size in bytes>
-
-
-
- 3.1.2.3. cupsUtf8ToUtf16()
-
- extern int cupsUtf8ToUtf16(utf16_t *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- <...to avoid duplicate code to handle surrogate pairs...>
- <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'>
- <Convert internal UCS-4 to UTF-16 by calling 'cupsUtf32ToUtf16()'>
- <Return count of output UTF-16 string -- NOT memory size in bytes>
-
-
-
- 3.1.2.4. cupsUtf16ToUtf8()
-
- extern int cupsUtf16ToUtf8(utf8_t *dest, /* O - Target string */
- const utf16_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- <...to avoid duplicate code to handle surrogate pairs...>
- <Convert input UTF-16 to internal UCS-4 by calling 'cupsUtf16ToUtf32()'>
- <Convert internal UCS-4 to UTF-8 by calling 'cupsUtf32ToUtf8()'>
- <Return length of output UTF-8 string -- size in bytes>
-
-
-
- 3.1.2.5. cupsUtf8ToUtf32()
-
- extern int cupsUtf8ToUtf32(utf32_t *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- <Convert input UTF-8 directly to output UCS-4...>
- <...checking for valid range, shortest-form, etc.>
- <Return count of output UTF-32 string -- NOT memory size in bytes>
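The pseudo-code above can be sketched as follows. This is a minimal illustration, not the CUPS implementation: 'demo_utf8_to_utf32()' is a hypothetical name, the checks are always strict (the 'TcStrictUtf8' and 'TcStrictUtf32' globals are not consulted), and two behaviours are assumptions: 'maxout' is taken to include room for the final null, and output that does not fit is silently truncated in the manner of 'strncpy()'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode a null-terminated UTF-8 string into UTF-32 code units.
 * Returns the count of output units (not including the final null),
 * or -1 on invalid parameters or an encoding error.
 */
static int demo_utf8_to_utf32(unsigned long *dest, const unsigned char *src,
                              int maxout)
{
  int count = 0;

  if (dest == NULL || src == NULL || maxout < 1)
    return (-1);

  while (*src != 0 && count < maxout - 1)
  {
    unsigned char lead = *src++;
    unsigned long ch, min;
    int           extra;

    /* Lead byte gives the sequence length and smallest legal value */
    if (lead < 0x80)                { ch = lead;        extra = 0; min = 0; }
    else if ((lead & 0xE0) == 0xC0) { ch = lead & 0x1F; extra = 1; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { ch = lead & 0x0F; extra = 2; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { ch = lead & 0x07; extra = 3; min = 0x10000; }
    else
      return (-1);                  /* Stray continuation or bad lead byte */

    while (extra-- > 0)
    {
      if ((*src & 0xC0) != 0x80)
        return (-1);                /* Also catches a premature null */
      ch = (ch << 6) | (unsigned long)(*src++ & 0x3F);
    }

    if (ch < min)
      return (-1);                  /* Non-shortest form */
    if (ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF))
      return (-1);                  /* Out of range or surrogate */

    dest[count++] = ch;
  }

  assert(count < maxout);
  dest[count] = 0;
  return (count);
}
```

Rejecting non-shortest forms here means every caller that funnels through this conversion inherits the same integrity check.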
-
-
-
- 3.1.2.6. cupsUtf32ToUtf8()
-
- extern int cupsUtf32ToUtf8(utf8_t *dest, /* O - Target string */
- const utf32_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- <Convert input UCS-4 directly to output UTF-8...>
- <...checking for valid range, etc.>
- <Return length of output UTF-8 string -- size in bytes>
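As a sketch of the range checking and byte layout named in the pseudo-code above, the following encodes a single code point. The name 'demo_encode_utf8()' is hypothetical, and rejecting surrogate code points is an assumption drawn from the Unicode definition of UTF-8 rather than from this design.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode one UCS-4 code point as UTF-8.
 * Returns the number of bytes written (1 to 4), or -1 if the value is
 * above 0x10FFFF or is a surrogate code point.
 */
static int demo_encode_utf8(unsigned long ch, unsigned char out[4])
{
  assert(out != NULL);

  if (ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF))
    return (-1);

  if (ch < 0x80)
  {
    out[0] = (unsigned char)ch;
    return (1);
  }
  if (ch < 0x800)
  {
    out[0] = (unsigned char)(0xC0 | (ch >> 6));
    out[1] = (unsigned char)(0x80 | (ch & 0x3F));
    return (2);
  }
  if (ch < 0x10000)
  {
    out[0] = (unsigned char)(0xE0 | (ch >> 12));
    out[1] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (ch & 0x3F));
    return (3);
  }

  out[0] = (unsigned char)(0xF0 | (ch >> 18));
  out[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3F));
  out[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
  out[3] = (unsigned char)(0x80 | (ch & 0x3F));
  return (4);
}
```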
-
-
-
- 3.1.2.7. cupsUtf16ToUtf32()
-
- extern int cupsUtf16ToUtf32(utf32_t *dest, /* O - Target string */
- const utf16_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- <Convert input UTF-16 directly to output UCS-4...>
- <...handling surrogate pairs decoding from UTF-16>
- <Return count of output UTF-32 string -- NOT memory size in bytes>
-
-
-
- 3.1.2.8. cupsUtf32ToUtf16()
-
- extern int cupsUtf32ToUtf16(utf16_t *dest, /* O - Target string */
- const utf32_t *src, /* I - Source string */
- const int maxout); /* I - Max output */
-
- <Convert input UCS-4 directly to output UTF-16...>
- <...handling surrogate pairs encoding to UTF-16>
- <Return count of output UTF-16 string -- NOT memory size in bytes>
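The surrogate pair handling mentioned in the two functions above can be sketched for a single code point as follows. The helper names 'demo_encode_utf16()' and 'demo_decode_utf16()' are hypothetical, and the sketch always treats an unpaired surrogate as an error, whereas the design makes that check depend on the 'TcStrictUtf16' global.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short utf16_t; /* UTF-16 code unit, as in transcode.h */
typedef unsigned long  utf32_t; /* UTF-32 code unit, as in transcode.h */

/* Encode one code point; returns units written (1 or 2) or -1. */
static int demo_encode_utf16(utf32_t ch, utf16_t out[2])
{
  assert(out != NULL);

  if (ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF))
    return (-1);

  if (ch < 0x10000)
  {
    out[0] = (utf16_t)ch;
    return (1);
  }

  ch -= 0x10000;                              /* 20 bits remain */
  out[0] = (utf16_t)(0xD800 | (ch >> 10));    /* High (lead) surrogate */
  out[1] = (utf16_t)(0xDC00 | (ch & 0x3FF));  /* Low (trail) surrogate */
  return (2);
}

/* Decode one code point; returns units consumed (1 or 2) or -1. */
static int demo_decode_utf16(const utf16_t *src, utf32_t *ch)
{
  assert(src != NULL && ch != NULL);

  if (src[0] >= 0xD800 && src[0] <= 0xDBFF)
  {
    if (src[1] < 0xDC00 || src[1] > 0xDFFF)
      return (-1);                            /* Lead without trail */
    *ch = 0x10000 + ((((utf32_t)src[0] & 0x3FF) << 10) |
                     ((utf32_t)src[1] & 0x3FF));
    return (2);
  }

  if (src[0] >= 0xDC00 && src[0] <= 0xDFFF)
    return (-1);                              /* Trail without lead */

  *ch = src[0];
  return (1);
}
```

Keeping both directions in the UTF-16/UTF-32 pair of functions is what lets the UTF-8/UTF-16 converters avoid duplicating this logic.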
-
-
-
- 3.1.2.9. Transcoding Utility Functions
-
- The transcoding utility functions are used to load (from a file into
- memory), free (logically, without freeing memory), and flush (actually
- free memory) character maps for SBCS (single-byte character set) and
- (future) DBCS (double-byte character set) transcoding to and from UTF-8.
-
-
-
-
-
-
-
- 3.1.2.9.1. cupsCharmapGet()
-
- extern void *cupsCharmapGet(const cups_encoding_t encoding);
- /* I - Encoding */
-
- <Find SBCS or DBCS charset map in cache>
- <...If found, increment 'used'>
- <...and return pointer to SBCS or DBCS charset map>
- <Get charset map file name by calling 'cupsEncodingName()'>
- <Open charset map file>
- <...If not found, return NULL>
- <Allocate memory for SBCS or DBCS charset map in cache>
- <...If no memory, return NULL>
- <Add to SBCS or DBCS cache by assigning 'next' field>
- <Assign 'encoding' field>
- <Increment 'used' field>
- <Read charset map file into memory in loop...>
- <If SBCS, then 'char2uni[]' is an array of 'ucs2_t' values>
- <...and 'uni2char[]' is an array of pointers to 'sbcs_t' arrays>
- <If DBCS, then 'char2uni[]' is an array of pointers to 'ucs2_t' arrays>
- <...and 'uni2char[]' is an array of pointers to 'dbcs_t' arrays>
- <Close charset map file>
- <Return pointer to SBCS or DBCS charset map>
-
-
-
- 3.1.2.9.2. cupsCharmapFree()
-
- extern void cupsCharmapFree(const cups_encoding_t encoding);
- /* I - Encoding */
-
- <Find SBCS or DBCS charset map in cache>
- <...If found, decrement 'used'>
- <Return void>
-
-
-
- 3.1.2.9.3. cupsCharmapFlush()
-
- extern void cupsCharmapFlush(void);
-
- <Loop through SBCS charset map cache...>
- <...Free 'uni2char[]' memory>
- <...Free SBCS charset map memory>
- <Loop through DBCS charset map cache...>
- <...Free 'char2uni[]' memory>
- <...Free 'uni2char[]' memory>
- <...Free DBCS charset map memory>
- <Return void>
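The get / free / flush life cycle described in section 3.1.2.9 can be sketched with a use-counted linked list. Everything here is illustrative: 'demo_map_t' and the 'demo_map_*()' functions are hypothetical, the encoding is a plain 'int' rather than 'cups_encoding_t', and loading the map file is left out entirely so that only the cache bookkeeping is shown.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_map_str     /**** Hypothetical cache entry ****/
{
  struct demo_map_str *next;    /* Next map in cache */
  int                 used;     /* Number of times entry used */
  int                 encoding; /* Stand-in for cups_encoding_t */
} demo_map_t;

static demo_map_t *demo_cache = NULL;

/* Find a map in the cache, or create it; bumps the use count. */
static demo_map_t *demo_map_get(int encoding)
{
  demo_map_t *map;

  for (map = demo_cache; map != NULL; map = map->next)
    if (map->encoding == encoding)
    {
      assert(map->used >= 0);
      map->used ++;
      return (map);
    }

  if ((map = calloc(1, sizeof(demo_map_t))) == NULL)
    return (NULL);

  map->encoding = encoding;
  map->used     = 1;
  map->next     = demo_cache;   /* Add to front of cache */
  demo_cache    = map;
  return (map);
}

/* Logical free: drop one use, but keep the entry in memory. */
static void demo_map_free(int encoding)
{
  demo_map_t *map;

  for (map = demo_cache; map != NULL; map = map->next)
    if (map->encoding == encoding)
    {
      if (map->used > 0)
        map->used --;
      return;
    }
}

/* Flush: actually release every cached entry. */
static void demo_map_flush(void)
{
  while (demo_cache != NULL)
  {
    demo_map_t *next = demo_cache->next;

    free(demo_cache);
    demo_cache = next;
  }
}
```

Separating the logical free from the flush means a filter that converts many strings with the same encoding pays the map-loading cost only once.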
-
-
-
-
-
-
- 3.2. Normalization - New
-
-
-
- 3.2.1. normalize.h - Normalization header
-
- /*
- * "$Id: i18n_sdd.txt 2678 2002-08-19 01:15:26Z mike $"
- *
- * Unicode normalization for the Common UNIX Printing System (CUPS).
- *
- * Copyright 1997-2002 by Easy Software Products.
- *
- * These coded instructions, statements, and computer programs are
- * the property of Easy Software Products and are protected by Federal
- * copyright law. Distribution and use rights are outlined in the
- * file "LICENSE.txt" which should have been included with this file.
- * If this file is missing or damaged please contact Easy Software
- * Products at:
- *
- * Attn: CUPS Licensing Information
- * Easy Software Products
- * 44141 Airport View Drive, Suite 204
- * Hollywood, Maryland 20636-3111 USA
- *
- * Voice: (301) 373-9603
- * EMail: cups-info@cups.org
- * WWW: http://www.cups.org
- */
-
- #ifndef _CUPS_NORMALIZE_H_
- # define _CUPS_NORMALIZE_H_
-
- /*
- * Include necessary headers...
- */
-
- # include "transcode.h"
-
- # ifdef __cplusplus
- extern "C" {
- # endif /* __cplusplus */
-
- /*
- * Types...
- */
-
- typedef enum /**** Normalization Types ****/
- {
- CUPS_NORM_NFD, /* Canonical Decomposition */
- CUPS_NORM_NFKD, /* Compatibility Decomposition */
- CUPS_NORM_NFC, /* NFD, then Canonical Composition */
- CUPS_NORM_NFKC /* NFKD, then Canonical Composition */
- } cups_normalize_t;
-
- typedef enum /**** Case Folding Types ****/
- {
- CUPS_FOLD_SIMPLE, /* Simple - no expansion in size */
- CUPS_FOLD_FULL /* Full - possible expansion in size */
- } cups_folding_t;
-
- typedef enum /**** Unicode Char Property Types ****/
- {
- CUPS_PROP_GENERAL_CATEGORY, /* See 'cups_gencat_t' enum */
- CUPS_PROP_BIDI_CATEGORY, /* See 'cups_bidicat_t' enum */
- CUPS_PROP_COMBINING_CLASS, /* See 'cups_combclass_t' type */
- CUPS_PROP_BREAK_CLASS /* See 'cups_breakclass_t' enum */
- } cups_property_t;
-
- /*
- * Note - parse Unicode char general category from 'UnicodeData.txt'
- * into sparse local table in 'normalize.c'.
- * Use major classes for logic optimizations throughout (by mask).
- */
-
- typedef enum /**** Unicode General Category ****/
- {
- CUPS_GENCAT_L = 0x10, /* Letter major class */
- CUPS_GENCAT_LU = 0x11, /* Lu Letter, Uppercase */
- CUPS_GENCAT_LL = 0x12, /* Ll Letter, Lowercase */
- CUPS_GENCAT_LT = 0x13, /* Lt Letter, Titlecase */
- CUPS_GENCAT_LM = 0x14, /* Lm Letter, Modifier */
- CUPS_GENCAT_LO = 0x15, /* Lo Letter, Other */
- CUPS_GENCAT_M = 0x20, /* Mark major class */
- CUPS_GENCAT_MN = 0x21, /* Mn Mark, Non-Spacing */
- CUPS_GENCAT_MC = 0x22, /* Mc Mark, Spacing Combining */
- CUPS_GENCAT_ME = 0x23, /* Me Mark, Enclosing */
- CUPS_GENCAT_N = 0x30, /* Number major class */
- CUPS_GENCAT_ND = 0x31, /* Nd Number, Decimal Digit */
- CUPS_GENCAT_NL = 0x32, /* Nl Number, Letter */
- CUPS_GENCAT_NO = 0x33, /* No Number, Other */
- CUPS_GENCAT_P = 0x40, /* Punctuation major class */
- CUPS_GENCAT_PC = 0x41, /* Pc Punctuation, Connector */
- CUPS_GENCAT_PD = 0x42, /* Pd Punctuation, Dash */
- CUPS_GENCAT_PS = 0x43, /* Ps Punctuation, Open (start) */
- CUPS_GENCAT_PE = 0x44, /* Pe Punctuation, Close (end) */
- CUPS_GENCAT_PI = 0x45, /* Pi Punctuation, Initial Quote */
- CUPS_GENCAT_PF = 0x46, /* Pf Punctuation, Final Quote */
- CUPS_GENCAT_PO = 0x47, /* Po Punctuation, Other */
- CUPS_GENCAT_S = 0x50, /* Symbol major class */
- CUPS_GENCAT_SM = 0x51, /* Sm Symbol, Math */
- CUPS_GENCAT_SC = 0x52, /* Sc Symbol, Currency */
- CUPS_GENCAT_SK = 0x53, /* Sk Symbol, Modifier */
- CUPS_GENCAT_SO = 0x54, /* So Symbol, Other */
- CUPS_GENCAT_Z = 0x60, /* Separator major class */
- CUPS_GENCAT_ZS = 0x61, /* Zs Separator, Space */
- CUPS_GENCAT_ZL = 0x62, /* Zl Separator, Line */
- CUPS_GENCAT_ZP = 0x63, /* Zp Separator, Paragraph */
- CUPS_GENCAT_C = 0x70, /* Other (miscellaneous) major class */
- CUPS_GENCAT_CC = 0x71, /* Cc Other, Control */
- CUPS_GENCAT_CF = 0x72, /* Cf Other, Format */
- CUPS_GENCAT_CS = 0x73, /* Cs Other, Surrogate */
- CUPS_GENCAT_CO = 0x74, /* Co Other, Private Use */
- CUPS_GENCAT_CN = 0x75 /* Cn Other, Not Assigned */
- } cups_gencat_t;
-
- /*
- * Note - parse Unicode char bidi category from 'UnicodeData.txt'
- * into sparse local table in 'normalize.c'.
- * Add bidirectional support to 'textcommon.c' - per Mike
- */
-
- typedef enum /**** Unicode Bidi Category ****/
- {
- CUPS_BIDI_L, /* Left-to-Right (Alpha, Syllabic, Ideographic) */
- CUPS_BIDI_LRE, /* Left-to-Right Embedding (explicit) */
- CUPS_BIDI_LRO, /* Left-to-Right Override (explicit) */
- CUPS_BIDI_R, /* Right-to-Left (Hebrew alphabet and most punct) */
- CUPS_BIDI_AL, /* Right-to-Left Arabic (Arabic, Thaana, Syriac) */
- CUPS_BIDI_RLE, /* Right-to-Left Embedding (explicit) */
- CUPS_BIDI_RLO, /* Right-to-Left Override (explicit) */
- CUPS_BIDI_PDF, /* Pop Directional Format */
- CUPS_BIDI_EN, /* Euro Number (Euro and East Arabic-Indic digits) */
- CUPS_BIDI_ES, /* Euro Number Separator (Slash) */
- CUPS_BIDI_ET, /* Euro Number Terminator (Plus, Minus, Degree, etc) */
- CUPS_BIDI_AN, /* Arabic Number (Arabic-Indic digits, separators) */
- CUPS_BIDI_CS, /* Common Number Separator (Colon, Comma, Dot, etc) */
- CUPS_BIDI_NSM, /* Non-Spacing Mark (category Mn / Me in UCD) */
- CUPS_BIDI_BN, /* Boundary Neutral (Formatting / Control chars) */
- CUPS_BIDI_B, /* Paragraph Separator */
- CUPS_BIDI_S, /* Segment Separator (Tab) */
- CUPS_BIDI_WS, /* Whitespace Space (Space, Line Separator, etc) */
- CUPS_BIDI_ON /* Other Neutrals */
- } cups_bidicat_t;
-
- /*
- * Note - parse Unicode line break class from 'DerivedLineBreak.txt'
- * into sparse local table (list of class ranges) in 'normalize.c'.
- * Note - add state table from UAX-14, section 7.3 - Ira
- * Remember to do BK and SP in outer loop (not in state table).
- * Consider optimization for CM (combining mark).
- * See 'LineBreak.txt' (12,875) and 'DerivedLineBreak.txt' (1,350).
- */
-
-
- typedef enum /**** Unicode Line Break Class ****/
- {
- /*
- * (A) - Allow Break AFTER
- * (XA) - Prevent Break AFTER
- * (B) - Allow Break BEFORE
- * (XB) - Prevent Break BEFORE
- * (P) - Allow Break For Pair
- * (XP) - Prevent Break For Pair
- */
- CUPS_BREAK_AI, /* Ambiguous (Alphabetic or Ideograph) */
- CUPS_BREAK_AL, /* Ordinary Alphabetic / Symbol Chars (XP) */
- CUPS_BREAK_BA, /* Break Opportunity After Chars (A) */
- CUPS_BREAK_BB, /* Break Opportunities Before Chars (B) */
- CUPS_BREAK_B2, /* Break Opportunity Before / After (B/A/XP) */
- CUPS_BREAK_BK, /* Mandatory Break (A) (normative) */
- CUPS_BREAK_CB, /* Contingent Break (B/A) (normative) */
- CUPS_BREAK_CL, /* Closing Punctuation (XB) */
- CUPS_BREAK_CM, /* Attached Chars / Combining (XB) (normative) */
- CUPS_BREAK_CR, /* Carriage Return (A) (normative) */
- CUPS_BREAK_EX, /* Exclamation / Interrogation (XB) */
- CUPS_BREAK_GL, /* Non-breaking ("Glue") (XB/XA) (normative) */
- CUPS_BREAK_HY, /* Hyphen (XA) */
- CUPS_BREAK_ID, /* Ideographic (B/A) */
- CUPS_BREAK_IN, /* Inseparable chars (XP) */
- CUPS_BREAK_IS, /* Numeric Separator (Infix) (XB) */
- CUPS_BREAK_LF, /* Line Feed (A) (normative) */
- CUPS_BREAK_NS, /* Non-starters (XB) */
- CUPS_BREAK_NU, /* Numeric (XP) */
- CUPS_BREAK_OP, /* Opening Punctuation (XA) */
- CUPS_BREAK_PO, /* Postfix (Numeric) (XB) */
- CUPS_BREAK_PR, /* Prefix (Numeric) (XA) */
- CUPS_BREAK_QU, /* Ambiguous Quotation (XB/XA) */
- CUPS_BREAK_SA, /* Context Dependent (South East Asian) (P) */
- CUPS_BREAK_SG, /* Surrogates (XP) (normative) */
- CUPS_BREAK_SP, /* Space (A) (normative) */
- CUPS_BREAK_SY, /* Symbols Allowing Break After (A) */
- CUPS_BREAK_XX, /* Unknown (XP) */
- CUPS_BREAK_ZW /* Zero Width Space (A) (normative) */
- } cups_breakclass_t;
-
- typedef int cups_combclass_t; /**** Unicode Combining Class ****/
- /* 0=base / 1..254=combining char */
-
- /*
- * Structures...
- */
-
- typedef struct cups_normmap_str /**** Normalize Map Cache Struct ****/
- {
- struct cups_normmap_str *next; /* Next normalize in cache */
- int used; /* Number of times entry used */
- cups_normalize_t normalize; /* Normalization type */
- int normcount; /* Count of Source Chars */
- ucs2_t *uni2norm; /* Char -> Normalization */
- /* ...only supports UCS-2 */
- } cups_normmap_t;
-
- typedef struct cups_foldmap_str /**** Case Fold Map Cache Struct ****/
- {
- struct cups_foldmap_str *next; /* Next case fold in cache */
- int used; /* Number of times entry used */
- cups_folding_t fold; /* Case folding type */
- int foldcount; /* Count of Source Chars */
- ucs2_t *uni2fold; /* Char -> Folded Char(s) */
- /* ...only supports UCS-2 */
- } cups_foldmap_t;
-
- typedef struct cups_prop_str /**** Char Property Struct ****/
- {
- ucs2_t ch; /* Unicode Char as UCS-2 */
- unsigned char gencat; /* General Category */
- unsigned char bidicat; /* Bidirectional Category */
- } cups_prop_t;
-
- typedef struct /**** Char Property Map Struct ****/
- {
- int used; /* Number of times entry used */
- int propcount; /* Count of Source Chars */
- cups_prop_t *uni2prop; /* Char -> Properties */
- } cups_propmap_t;
-
- typedef struct /**** Line Break Class Map Struct ****/
- {
- int used; /* Number of times entry used */
- int breakcount; /* Count of Source Chars */
- ucs2_t *uni2break; /* Char -> Line Break Class */
- } cups_breakmap_t;
-
- typedef struct cups_comb_str /**** Char Combining Class Struct ****/
- {
- ucs2_t ch; /* Unicode Char as UCS-2 */
- unsigned char combclass; /* Combining Class */
- unsigned char reserved; /* Reserved for alignment */
- } cups_comb_t;
-
- typedef struct /**** Combining Class Map Struct ****/
- {
- int used; /* Number of times entry used */
- int combcount; /* Count of Source Chars */
- cups_comb_t *uni2comb; /* Char -> Combining Class */
- } cups_combmap_t;
-
-
-
- /*
- * Globals...
- */
-
- extern int NzSupportUcs2; /* Support UCS-2 (16-bit) mapping */
- extern int NzSupportUcs4; /* Support UCS-4 (32-bit) mapping */
-
- /*
- * Prototypes...
- */
-
- /*
- * Utility functions for normalization module
- */
- extern int cupsNormalizeMapsGet(void);
- extern int cupsNormalizeMapsFree(void);
- extern void cupsNormalizeMapsFlush(void);
-
- /*
- * Normalize UTF-8 string to Unicode UAX-15 Normalization Form
- * Note - Compatibility Normalization Forms (NFKD/NFKC) are
- * unsafe for subsequent transcoding to legacy charsets
- */
- extern int cupsUtf8Normalize(utf8_t *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- const cups_normalize_t normalize);
- /* I - Normalization */
-
- /*
- * Normalize UTF-32 string to Unicode UAX-15 Normalization Form
- * Note - Compatibility Normalization Forms (NFKD/NFKC) are
- * unsafe for subsequent transcoding to legacy charsets
- */
- extern int cupsUtf32Normalize(utf32_t *dest,
- /* O - Target string */
- const utf32_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- const cups_normalize_t normalize);
- /* I - Normalization */
-
- /*
- * Case Fold UTF-8 string per Unicode UAX-21 Section 2.3
- * Note - Case folding output is
- * unsafe for subsequent transcoding to legacy charsets
- */
- extern int cupsUtf8CaseFold(utf8_t *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- const cups_folding_t fold); /* I - Fold Mode */
-
-
-
- /*
- * Case Fold UTF-32 string per Unicode UAX-21 Section 2.3
- * Note - Case folding output is
- * unsafe for subsequent transcoding to legacy charsets
- */
- extern int cupsUtf32CaseFold(utf32_t *dest,/* O - Target string */
- const utf32_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- const cups_folding_t fold); /* I - Fold Mode */
-
- /*
- * Compare UTF-8 strings after case folding
- */
- extern int cupsUtf8CompareCaseless(const utf8_t *s1,
- /* I - String1 */
- const utf8_t *s2); /* I - String2 */
-
- /*
- * Compare UTF-32 strings after case folding
- */
- extern int cupsUtf32CompareCaseless(const utf32_t *s1,
- /* I - String1 */
- const utf32_t *s2); /* I - String2 */
-
- /*
- * Compare UTF-8 strings after case folding and NFKC normalization
- */
- extern int cupsUtf8CompareIdentifier(const utf8_t *s1,
- /* I - String1 */
- const utf8_t *s2); /* I - String2 */
-
- /*
- * Compare UTF-32 strings after case folding and NFKC normalization
- */
- extern int cupsUtf32CompareIdentifier(const utf32_t *s1,
- /* I - String1 */
- const utf32_t *s2); /* I - String2 */
-
- /*
- * Get UTF-32 character property
- */
- extern int cupsUtf32CharacterProperty(const utf32_t ch,
- /* I - Source char */
- const cups_property_t property);
- /* I - Char Property */
-
- # ifdef __cplusplus
- }
- # endif /* __cplusplus */
-
- #endif /* !_CUPS_NORMALIZE_H_ */
-
-
- /*
- * End of "$Id: i18n_sdd.txt 2678 2002-08-19 01:15:26Z mike $"
- */
-
-
-
- 3.2.1.1. cups_normmap_t - Normalize Map Structure
-
- typedef struct cups_normmap_str /**** Normalize Map Cache Struct ****/
- {
- struct cups_normmap_str *next; /* Next normalize in cache */
- int used; /* Number of times entry used */
- cups_normalize_t normalize; /* Normalization type */
- int normcount; /* Count of Source Chars */
- ucs2_t *uni2norm; /* Char -> Normalization */
- /* ...only supports UCS-2 */
- } cups_normmap_t;
-
- 'uni2norm' is a pointer to an array of _triplets_ of UCS-2 values.
- 'normcount' is a count of _triplets_ in the 'uni2norm[]' array.
-
- For decompositions (NFD and NFKD), the triplets are: composed base
- character, decomposed base character, and decomposed accent character.
- These are used by 'cupsUtf8Normalize()' and 'cupsUtf32Normalize()' in
- performing canonical (NFD) or compatibility (NFKD) decomposition.
-
- For compositions (NFC and NFKC), the triplets are: decomposed base
- character, decomposed accent character, and composed base character.
- These are used by 'cupsUtf8Normalize()' and 'cupsUtf32Normalize()' in
- performing canonical composition (for NFC or NFKC).
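As a rough illustration of the composition triplets described above,
the following C sketch looks up a (base, accent) pair with 'bsearch()'.
It assumes the triplets are sorted by their first two values. The names
'demo_compare_compose()', 'demo_compose()', the sample table
'demo_norm[]', and the local 'ucs2_t' typedef are hypothetical stand-ins
for illustration, not the actual 'normalize.c' code.

```c
#include <stdlib.h>

typedef unsigned short ucs2_t;          /* Stand-in for 'transcode.h' type */

/* Sample composition triplets: base, accent, composed (sorted) */
static const ucs2_t demo_norm[] =
{
  0x0041, 0x0300, 0x00C0,               /* A + grave -> A-grave */
  0x0041, 0x0301, 0x00C1,               /* A + acute -> A-acute */
  0x0065, 0x0301, 0x00E9                /* e + acute -> e-acute */
};

/* Compare a (base, accent) key against one composition triplet */
static int
demo_compare_compose(const void *k, const void *t)
{
  const ucs2_t *key  = (const ucs2_t *)k;
  const ucs2_t *trip = (const ucs2_t *)t;

  if (key[0] != trip[0])
    return (key[0] < trip[0] ? -1 : 1);
  if (key[1] != trip[1])
    return (key[1] < trip[1] ? -1 : 1);
  return (0);
}

/* Return composed char for base + accent, or 0 if no triplet matches */
static ucs2_t
demo_compose(const ucs2_t *uni2norm, int normcount,
             ucs2_t base, ucs2_t accent)
{
  ucs2_t       key[2];
  const ucs2_t *hit;

  key[0] = base;
  key[1] = accent;
  hit    = (const ucs2_t *)bsearch(key, uni2norm, (size_t)normcount,
                                   3 * sizeof(ucs2_t),
                                   demo_compare_compose);
  return (hit ? hit[2] : 0);
}
```

Calling 'demo_compose(demo_norm, 3, 0x0041, 0x0301)' yields 0x00C1. A
decomposition lookup would work the same way, keyed on the first value
of each triplet only.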
-
-
-
- 3.2.1.2. cups_foldmap_t - Case Fold Map Structure
-
- typedef struct cups_foldmap_str /**** Case Fold Map Cache Struct ****/
- {
- struct cups_foldmap_str *next; /* Next case fold in cache */
- int used; /* Number of times entry used */
- cups_folding_t fold; /* Case folding type */
- int foldcount; /* Count of Source Chars */
- ucs2_t *uni2fold; /* Char -> Folded Char(s) */
- /* ...only supports UCS-2 */
- } cups_foldmap_t;
-
- 'uni2fold' is a pointer to an array of _quadruplets_ of UCS-2 values.
- 'foldcount' is a count of _quadruplets_ in the 'uni2fold[]' array.
-
- For simple case folding (without expansion of the size of the output
- string), the quadruplets are: input base character, output case folded
- character, zero (unused), and zero (unused).
-
- For full case folding (with possible expansion of the size of the output
- string), the quadruplets are: input base character, output case folded
- character, second output character or zero, third output character or
- zero.
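The quadruplet layout can be sketched in C as follows. This is only an
illustration under stated assumptions: the table is assumed sorted by
input character, and 'demo_fold[]', 'demo_compare_foldchar()',
'demo_fold_char()', and the local 'ucs2_t' typedef are hypothetical
names, not part of the CUPS API.

```c
#include <stdlib.h>

typedef unsigned short ucs2_t;          /* Stand-in for 'transcode.h' type */

/* Sample quadruplets: input, output 1, output 2 or 0, output 3 or 0 */
static const ucs2_t demo_fold[] =
{
  0x0041, 0x0061, 0x0000, 0x0000,       /* 'A' -> 'a' (simple) */
  0x00DF, 0x0073, 0x0073, 0x0000        /* sharp s -> "ss" (full) */
};

/* Compare an input char against the first value of a quadruplet */
static int
demo_compare_foldchar(const void *k, const void *q)
{
  ucs2_t key   = *(const ucs2_t *)k;
  ucs2_t first = *(const ucs2_t *)q;

  return (key < first ? -1 : key > first);
}

/* Fold one char into out[0..2]; return count of output chars */
static int
demo_fold_char(const ucs2_t *uni2fold, int foldcount,
               ucs2_t ch, ucs2_t out[3])
{
  int          n;
  const ucs2_t *hit;

  hit = (const ucs2_t *)bsearch(&ch, uni2fold, (size_t)foldcount,
                                4 * sizeof(ucs2_t),
                                demo_compare_foldchar);
  if (hit == NULL)
  {
    out[0] = ch;                        /* No mapping: copy unchanged */
    return (1);
  }

  for (n = 0; n < 3 && hit[n + 1] != 0; n ++)
    out[n] = hit[n + 1];
  return (n);
}
```

With full case folding one input character may expand to as many as
three output characters, which is why callers need the 'maxout' limit
on the output buffer.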
-
-
-
- 3.2.1.3. cups_propmap_t - Char Property Map Structure
-
- typedef struct /**** Char Property Map Struct ****/
- {
- int used; /* Number of times entry used */
- int propcount; /* Count of Source Chars */
- cups_prop_t *uni2prop; /* Char -> Properties */
- } cups_propmap_t;
-
- 'uni2prop' is a pointer to an array of 'cups_prop_t' (see below).
- 'propcount' is a count of elements in the 'uni2prop[]' array.
-
-
-
- 3.2.1.4. cups_prop_t - Char Property Structure
-
- typedef struct cups_prop_str /**** Char Property Struct ****/
- {
- ucs2_t ch; /* Unicode Char as UCS-2 */
- unsigned char gencat; /* General Category */
- unsigned char bidicat; /* Bidirectional Category */
- } cups_prop_t;
-
-
-
- 3.2.1.5. cups_breakmap_t - Line Break Map Structure
-
- typedef struct /**** Line Break Class Map Struct ****/
- {
- int used; /* Number of times entry used */
- int breakcount; /* Count of Source Chars */
- ucs2_t *uni2break; /* Char -> Line Break Class */
- } cups_breakmap_t;
-
- 'uni2break' is a pointer to an array of _triplets_ of UCS-2 values.
- 'breakcount' is a count of _triplets_ in the 'uni2break[]' array.
-
- The triplets in 'uni2break' are: first UCS-2 value in a range, last
- UCS-2 value in a range, and line break class stored as UCS-2.
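A range lookup over such triplets might look like the C sketch below.
It assumes the ranges are sorted and do not overlap. The 'DEMO_BREAK_*'
values, 'demo_break[]', 'demo_compare_breakchar()', and
'demo_break_class()' are hypothetical names used only for illustration.

```c
#include <stdlib.h>

typedef unsigned short ucs2_t;          /* Stand-in for 'transcode.h' type */

enum { DEMO_BREAK_AL, DEMO_BREAK_NU, DEMO_BREAK_SP, DEMO_BREAK_XX };

/* Sample range triplets: first char, last char, line break class */
static const ucs2_t demo_break[] =
{
  0x0020, 0x0020, DEMO_BREAK_SP,        /* Space */
  0x0030, 0x0039, DEMO_BREAK_NU,        /* Digits */
  0x0041, 0x005A, DEMO_BREAK_AL,        /* Uppercase letters */
  0x0061, 0x007A, DEMO_BREAK_AL         /* Lowercase letters */
};

/* Compare a char against the [first, last] range of one triplet */
static int
demo_compare_breakchar(const void *k, const void *t)
{
  ucs2_t       key    = *(const ucs2_t *)k;
  const ucs2_t *range = (const ucs2_t *)t;

  if (key < range[0])
    return (-1);
  if (key > range[1])
    return (1);
  return (0);
}

/* Return line break class of 'ch', or "unknown" if in no range */
static int
demo_break_class(const ucs2_t *uni2break, int breakcount, ucs2_t ch)
{
  const ucs2_t *hit;

  hit = (const ucs2_t *)bsearch(&ch, uni2break, (size_t)breakcount,
                                3 * sizeof(ucs2_t),
                                demo_compare_breakchar);
  return (hit ? (int)hit[2] : DEMO_BREAK_XX);
}
```

Storing ranges rather than one entry per character keeps the table
sparse, at the cost of a comparison function that tests both ends of
each range.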
-
-
-
- 3.2.1.6. cups_combmap_t - Combining Class Map Structure
-
- typedef struct /**** Combining Class Map Struct ****/
- {
- int used; /* Number of times entry used */
- int combcount; /* Count of Source Chars */
- cups_comb_t *uni2comb; /* Char -> Combining Class */
- } cups_combmap_t;
-
- 'uni2comb' is a pointer to an array of 'cups_comb_t' (see below).
- 'combcount' is a count of elements in the 'uni2comb[]' array.
-
-
-
- 3.2.1.7. cups_comb_t - Combining Class Structure
-
- typedef struct cups_comb_str /**** Char Combining Class Struct ****/
- {
- ucs2_t ch; /* Unicode Char as UCS-2 */
- unsigned char combclass; /* Combining Class */
- unsigned char reserved; /* Reserved for alignment */
- } cups_comb_t;
-
-
-
- 3.2.2. normalize.c - Normalization module
-
- The normalization function 'cupsUtf8Normalize()' and the case folding
- function 'cupsUtf8CaseFold()' are modelled on the C standard library
- function 'strncpy()', except that they return the count of the output,
- like 'strlen()', rather than the (redundant) pointer to the output.
-
- If the normalization or case folding functions detect invalid input
- parameters or they detect an encoding error in their input, then they
- return '-1', rather than the count of output.
-
- The normalization and case folding functions take an input parameter
- indicating the maximum output units (for safe operation).
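The calling convention described above (bounded output, count returned,
'-1' on error) can be sketched with a trivial stand-in function. Note
the assumptions: 'demo_utf32_copy()' is hypothetical, 'utf32_t' is a
local typedef, and this sketch treats 'maxout' as including the
terminating zero and always terminates the output, which the design
text does not state explicitly.

```c
#include <stddef.h>

typedef unsigned long utf32_t;          /* Stand-in for 'transcode.h' type */

/*
 * Copy a zero-terminated UTF-32 string with the same conventions:
 * returns count of output units (excluding the terminator), or -1
 * on invalid parameters or an invalid code point in the input.
 */
static int
demo_utf32_copy(utf32_t *dest, const utf32_t *src, const int maxout)
{
  int count;

  if (dest == NULL || src == NULL || maxout < 1)
    return (-1);                        /* Invalid input parameters */

  for (count = 0; src[count] != 0; count ++)
  {
    if (count >= maxout - 1)
      break;                            /* Output full: truncate */

    if (src[count] > 0x10FFFF ||
        (src[count] >= 0xD800 && src[count] <= 0xDFFF))
      return (-1);                      /* Encoding error in input */

    dest[count] = src[count];
  }

  dest[count] = 0;
  return (count);
}
```

A caller checks for a negative return before using the output, and can
compare the returned count with the input length to detect truncation.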
-
-
-
- 3.2.2.1. cupsUtf8Normalize()
-
- /*
- * Normalize UTF-8 string to Unicode UAX-15 Normalization Form
- * Note - Compatibility Normalization Forms (NFKD/NFKC) are
- * unsafe for subsequent transcoding to legacy charsets
- */
- extern int cupsUtf8Normalize(utf8_t *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- const cups_normalize_t normalize);
- /* I - Normalization */
-
- <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'>
- <Normalize by calling 'cupsUtf32Normalize()'>
- <Convert normalized UCS-4 to UTF-8 by calling 'cupsUtf32ToUtf8()'>
- <Return length of output UTF-8 string -- size in bytes>
-
-
-
- 3.2.2.2. cupsUtf32Normalize()
-
- extern int cupsUtf32Normalize(utf32_t *dest,
- /* O - Target string */
- const utf32_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- const cups_normalize_t normalize);
- /* I - Normalization */
-
- <Find normalize maps by calling 'cupsNormalizeMapsGet()'>
- <...if not found, return '-1'>
- <Repeatedly traverse internal UCS-4, decomposing (NFD or NFKD)...>
- <...with 'bsearch()' of 'uni2norm[]' using local 'compare_decompose()'>
- <...until one pass yields no further decomposition>
- <Repeatedly traverse internal UCS-4, doing canonical reordering>
- <...with 'bsearch()' of 'uni2comb[]' using local 'compare_combchar()'>
- <...until one pass yields no further canonical reordering>
- <If 'normalize' requests composition (NFC or NFKC)...>
- <...repeatedly traverse internal UCS-4, composing (NFC or NFKC)...>
- <...with 'bsearch()' of 'uni2norm[]' using local 'compare_compose()'>
- <...until one pass yields no further composition>
- <Release normalize maps by calling 'cupsNormalizeMapsFree()'>
- <Return count of output UTF-32 string -- NOT memory size in bytes>
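The canonical reordering step in the pseudo-code above can be sketched
in C as repeated passes that swap adjacent combining marks whose
combining classes are out of order. This is a minimal sketch: the
'demo_combclass()' switch stands in for the 'uni2comb[]' lookup, and
both it and 'demo_reorder()' are hypothetical names.

```c
typedef unsigned long utf32_t;          /* Stand-in for 'transcode.h' type */

/* Sample combining classes; 0 means base (non-combining) character */
static int
demo_combclass(utf32_t ch)
{
  switch (ch)
  {
    case 0x0301 : return (230);         /* COMBINING ACUTE ACCENT */
    case 0x0323 : return (220);         /* COMBINING DOT BELOW */
    case 0x0327 : return (202);         /* COMBINING CEDILLA */
    default     : return (0);
  }
}

/* Reorder 'len' chars in place until one pass makes no swap */
static void
demo_reorder(utf32_t *s, int len)
{
  int     i;
  int     swapped;
  int     c1, c2;
  utf32_t temp;

  do
  {
    swapped = 0;

    for (i = 0; i + 1 < len; i ++)
    {
      c1 = demo_combclass(s[i]);
      c2 = demo_combclass(s[i + 1]);

      /* Swap only when both are marks and the classes are descending */
      if (c2 != 0 && c1 > c2)
      {
        temp     = s[i];
        s[i]     = s[i + 1];
        s[i + 1] = temp;
        swapped  = 1;
      }
    }
  }
  while (swapped);
}
```

Base characters (class 0) are never moved, so marks are only reordered
within the run that follows a single base character.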
-
-
-
- 3.2.2.3. cupsUtf8CaseFold()
-
- /*
- * Case Fold UTF-8 string per Unicode UAX-21 Section 2.3
- * Note - Case folding output is
- * unsafe for subsequent transcoding to legacy charsets
- */
- extern int cupsUtf8CaseFold(utf8_t *dest, /* O - Target string */
- const utf8_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- const cups_folding_t fold); /* I - Fold Mode */
-
- <Find normalize maps by calling 'cupsNormalizeMapsGet()'>
- <...if not found, return '-1'>
- <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'>
- <Case fold internal UCS-4 by calling 'cupsUtf32CaseFold()'>
- <Convert internal UCS-4 to output UTF-8 by calling 'cupsUtf32ToUtf8()'>
- <Release normalize maps by calling 'cupsNormalizeMapsFree()'>
- <Return length of output UTF-8 string -- size in bytes>
-
-
-
- 3.2.2.4. cupsUtf32CaseFold()
-
- /*
- * Case Fold UTF-32 string per Unicode UAX-21 Section 2.3
- * Note - Case folding output is
- * unsafe for subsequent transcoding to legacy charsets
- */
- extern int cupsUtf32CaseFold(utf32_t *dest, /* O - Target string */
- const utf32_t *src, /* I - Source string */
- const int maxout, /* I - Max output */
- const cups_folding_t fold); /* I - Fold Mode */
-
- <Find case fold maps by calling 'cupsNormalizeMapsGet()'>
- <...if not found, return '-1'>
- <Traverse internal UCS-4 once, performing case folding...>
- <...with 'bsearch()' of 'uni2fold[]' using local 'compare_foldchar()'>
- <Copy internal UCS-4 to output UTF-32 string>
- <Release normalize maps by calling 'cupsNormalizeMapsFree()'>
- <Return count of output UTF-32 string -- NOT memory size in bytes>
-
-
-
- 3.2.2.5. cupsUtf8CompareCaseless()
-
- /*
- * Compare UTF-8 strings after case folding
- */
- extern int cupsUtf8CompareCaseless(const utf8_t *s1,
- /* I - String1 */
- const utf8_t *s2); /* I - String2 */
-
- <Case fold both input UTF-8 strings by calling 'cupsUtf8CaseFold()'>
- <Return compare of case folded first and second strings>
-
-
-
- 3.2.2.6. cupsUtf32CompareCaseless()
-
- /*
- * Compare UTF-32 strings after case folding
- */
- extern int cupsUtf32CompareCaseless(const utf32_t *s1,
- /* I - String1 */
- const utf32_t *s2); /* I - String2 */
-
- <Case fold both input UTF-32 strings by calling 'cupsUtf32CaseFold()'>
- <Return compare of case folded first and second strings>
-
-
-
- 3.2.2.7. cupsUtf8CompareIdentifier()
-
- /*
- * Compare UTF-8 strings after case folding and NFKC normalization
- */
- extern int cupsUtf8CompareIdentifier(const utf8_t *s1,
- /* I - String1 */
- const utf8_t *s2); /* I - String2 */
-
- <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'>
- <Case fold both strings by calling 'cupsUtf32CaseFold()'>
- <Normalize both strings to NFKC by calling 'cupsUtf32Normalize()'>
- <Return compare of case folded/normalized first and second strings>
-
-
-
- 3.2.2.8. cupsUtf32CompareIdentifier()
-
- /*
- * Compare UTF-32 strings after case folding and NFKC normalization
- */
- extern int cupsUtf32CompareIdentifier(const utf32_t *s1,
- /* I - String1 */
- const utf32_t *s2); /* I - String2 */
-
- <Case fold both strings by calling 'cupsUtf32CaseFold()'>
- <Normalize both strings to NFKC by calling 'cupsUtf32Normalize()'>
- <Return compare of case folded/normalized first and second strings>
-
-
-
- 3.2.2.9. cupsUtf32CharacterProperty()
-
- /*
- * Get UTF-32 character property
- */
- extern int cupsUtf32CharacterProperty(const utf32_t ch,
- /* I - Source char */
- const cups_property_t property);
- /* I - Char Property */
-
- <Lookup UTF-32 character property in appropriate map...>
- <...internal functions for each different map lookup>
-
-
-
- 3.2.2.10. Normalization Utility Functions
-
-
-
-
- 3.2.2.10.1. cupsNormalizeMapsGet()
-
- extern int cupsNormalizeMapsGet(void);
-
- <Find normalize maps in cache>
- <...If found, increment 'used'>
- <...and return '0'>
- <For each map (normalization, case fold, combining class, etc.)...>
- <Open (preprocessed form of) Unicode data file...>
- <...If not found, return '-1'>
- <Count lines in preprocessed form, for mapping memory alloc>
- <...Close (preprocessed form of) Unicode data file>
- <Open (preprocessed form of) Unicode data file...>
- <...If not found, return '-1'>
- <Allocate memory for appropriate map in cache...>
- <...If no memory, return '-1'>
- <Add to appropriate cache by assigning 'next' field>
- <Assign map type field and count field>
- <Increment 'used' field>
- <Read normalize map into memory in loop...>
- <...Add values to 'uni2xxx[]' array>
- <Close (preprocessed form of) Unicode data file>
- <Return '0'>
-
-
-
- 3.2.2.10.2. cupsNormalizeMapsFree()
-
- extern int cupsNormalizeMapsFree(void);
-
- <Find normalize maps in cache>
- <...If found, decrement 'used'>
- <Return '0'>
-
-
-
- 3.2.2.10.3. cupsNormalizeMapsFlush()
-
- extern void cupsNormalizeMapsFlush(void);
-
- <Loop through normalize maps cache...>
- <...Free 'uni2norm[]' memory>
- <...Free normalize map memory>
- <Loop through case folding cache...>
- <...Free 'uni2fold[]' memory>
- <...Free case folding memory>
- <Loop through char property map cache...>
- <...Free 'uni2prop[]' memory>
- <...Free char property map memory>
- <Loop through line break class map cache...>
- <...Free 'uni2break[]' memory>
- <...Free line break class map memory>
- <Loop through combining class map cache...>
- <...Free 'uni2comb[]' memory>
- <...Free combining class map memory>
- <Return void>
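The get / free / flush pattern used by the three utility functions can
be sketched as a small use-counted cache. This is an illustration only:
'demo_map_t' and the 'demo_map_*()' functions are hypothetical, and the
map data is simply zero-filled here instead of being read from a
preprocessed Unicode data file.

```c
#include <stdlib.h>

typedef struct demo_map_str             /* Hypothetical cached map */
{
  struct demo_map_str *next;            /* Next map in cache */
  int                 used;             /* Number of times entry used */
  int                 type;             /* Map type */
  int                 count;            /* Count of entries */
  unsigned short      *data;            /* Map entries */
} demo_map_t;

static demo_map_t *demo_cache = NULL;   /* Head of map cache */

/* Find map in cache (bumping 'used') or create it; NULL on error */
static demo_map_t *
demo_map_get(int type, int count)
{
  demo_map_t *map;

  for (map = demo_cache; map != NULL; map = map->next)
    if (map->type == type)
    {
      map->used ++;
      return (map);
    }

  if (count < 1 || (map = calloc(1, sizeof(demo_map_t))) == NULL)
    return (NULL);

  if ((map->data = calloc((size_t)count, sizeof(unsigned short))) == NULL)
  {
    free(map);
    return (NULL);
  }

  map->type  = type;
  map->count = count;
  map->used  = 1;
  map->next  = demo_cache;              /* Add to head of cache */
  demo_cache = map;
  return (map);
}

/* Release one use of a map; memory stays cached until flush */
static void
demo_map_free(demo_map_t *map)
{
  if (map != NULL && map->used > 0)
    map->used --;
}

/* Free all cached maps and their entry arrays */
static void
demo_map_flush(void)
{
  demo_map_t *map, *next;

  for (map = demo_cache; map != NULL; map = next)
  {
    next = map->next;
    free(map->data);
    free(map);
  }
  demo_cache = NULL;
}
```

Because "free" only decrements the use count, a second "get" for the
same map is cheap, and memory is returned only by an explicit flush.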
-
-
-
- 3.3. Language - Existing
-
-
-
- 3.3.1. language.h - Language header
-
- Required Changes:
-
- (1) Change definition of 'cups_lang_t' to correct length of 'language[]'
- to 32 characters per [RFC3066] and [ISO639-2] and [ISO3166-1].
-
-
-
- 3.3.2. language.c - Language module
-
-
-
- 3.3.2.1. cupsLangEncoding() - Existing
-
- [No Change]
-
-
-
- 3.3.2.2. cupsLangFlush() - Existing
-
- [No Change]
-
-
-
- 3.3.2.3. cupsLangFree() - Existing
-
- [No Change]
-
-
-
- 3.3.2.4. cupsLangGet() - Existing
-
- Required Changes:
-
- (1) Change length of 'langname[]' and 'real[]' to 64 characters per
- [RFC3066] and potential length of encoding (charset) names;
- (2) Change language string normalization to support:
- (a) 8-character language codes per [RFC3066] and 3-character
- language codes per [ISO639-2];
- (b) 8-character country codes per [RFC3066] and 3-character country
- codes per [ISO3166-1];
- (c) Support for 'i' (IANA registered) and 'x' (private) language
- prefixes per [RFC3066];
- (d) Invariant use of 'utf-8' for encoding in message catalog, but
- save actual requested encoding name for later use.
- (3) Correct broken do/while statement for message catalog lookup (while
- condition is _never_ satisfied).
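The language string normalization in change (2) might be sketched as
below. This is not the 'cupsLangGet()' code. 'demo_locale_t' and
'demo_locale_parse()' are hypothetical, the field sizes are
assumptions, and the sketch simply lowercases the language, uppercases
the country, keeps 'i-' and 'x-' tags whole, and saves the requested
encoding name.

```c
#include <ctype.h>
#include <string.h>

typedef struct                          /* Hypothetical parsed locale */
{
  char language[32];                    /* Language tag (lowercase) */
  char country[9];                      /* Country code (uppercase) */
  char encoding[32];                    /* Requested encoding name */
} demo_locale_t;

/* Parse "ll_CC.encoding" or "ll-CC"; return 0 on success, -1 on error */
static int
demo_locale_parse(const char *name, demo_locale_t *loc)
{
  size_t n = 0;
  int    priv;

  if (name == NULL || loc == NULL)
    return (-1);

  memset(loc, 0, sizeof(*loc));

  /* 'i-...' (IANA) and 'x-...' (private) tags stay in 'language' */
  priv = (name[0] == 'i' || name[0] == 'I' ||
          name[0] == 'x' || name[0] == 'X') && name[1] == '-';

  for (; *name && *name != '.'; name ++)
  {
    if (!priv && (*name == '-' || *name == '_'))
      break;
    if (n >= sizeof(loc->language) - 1)
      return (-1);
    loc->language[n ++] = (char)tolower((unsigned char)*name);
  }
  if (n == 0)
    return (-1);

  if (*name == '-' || *name == '_')
  {
    for (n = 0, name ++; *name && *name != '.'; name ++)
    {
      if (n >= sizeof(loc->country) - 1)
        return (-1);
      loc->country[n ++] = (char)toupper((unsigned char)*name);
    }
  }

  if (*name == '.')
  {
    for (n = 0, name ++; *name; name ++)
    {
      if (n >= sizeof(loc->encoding) - 1)
        return (-1);
      loc->encoding[n ++] = (char)tolower((unsigned char)*name);
    }
  }

  return (0);
}
```

For example, "EN_us.ISO8859-1" parses to language "en", country "US",
and encoding "iso8859-1". The saved encoding would then be used for
output transcoding while the message catalog itself stays in 'utf-8'.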
-
-
-
- 3.3.2.5. cupsLangPrintf() - New
-
- extern int cupsLangPrintf(FILE *fp, /* I - File to write */
- const cups_lang_t *lang, /* I - Language/locale*/
- const cups_msg_t msg, /* I - Msg to format */
- ...); /* I - Args to format */
-
- <Set up variable args by calling 'va_start()'>
- <Format CUPS message with variable args by calling 'vsnprintf()'>
- <Clean up variable args by calling 'va_end()'>
- <Transcode CUPS message by calling 'cupsUtf8ToCharset()'>
- <Write CUPS message by calling 'fputs()'>
- <Return transcoded output CUPS message length>
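The format / transcode / write sequence above can be sketched in C with
standard library calls. This is a simplified sketch: it takes a plain
format string instead of 'cups_lang_t' and 'cups_msg_t', and the
hypothetical 'demo_transcode()' merely copies its input where the real
function would call 'cupsUtf8ToCharset()'.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Placeholder for charset transcoding: copies UTF-8 unchanged */
static int
demo_transcode(char *dest, const char *src, int maxout)
{
  int len = (int)strlen(src);

  if (len >= maxout)
    return (-1);

  memcpy(dest, src, (size_t)len + 1);
  return (len);
}

/* Format, transcode, and write a message; return length or -1 */
static int
demo_lang_printf(FILE *fp, const char *format, ...)
{
  char    utf8[1024];                   /* Formatted UTF-8 message */
  char    output[1024];                 /* Transcoded message */
  int     len;
  va_list ap;

  va_start(ap, format);
  len = vsnprintf(utf8, sizeof(utf8), format, ap);
  va_end(ap);

  if (len < 0 || len >= (int)sizeof(utf8))
    return (-1);                        /* Format error or truncation */

  if ((len = demo_transcode(output, utf8, (int)sizeof(output))) < 0)
    return (-1);

  if (fputs(output, fp) == EOF)
    return (-1);

  return (len);
}
```

Formatting in UTF-8 first and transcoding the finished string once
keeps the message catalog encoding-independent.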
-
-
-
- 3.3.2.6. cupsLangPuts() - New
-
- extern int cupsLangPuts(FILE *fp, /* I - File to write */
- const cups_lang_t *lang, /* I - Language/locale*/
- const cups_msg_t msg); /* I - Msg to write */
-
- <Transcode CUPS message by calling 'cupsUtf8ToCharset()'>
- <Write CUPS message by calling 'fputs()'>
- <Return transcoded output CUPS message length>
-
-
-
- 3.3.2.7. cupsEncodingName() - New
-
- extern char *cupsEncodingName(cups_encoding_t encoding);
-
- <Lookup encoding name in static 'lang_encodings[]' array>
- <Return pointer to encoding name (charset map file name)>
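A static-array lookup of this kind might look like the C sketch below.
The enumeration values and names are hypothetical ('demo_encoding_t',
'demo_encodings[]', 'demo_encoding_name()'), and falling back to the
first entry for an out-of-range value is an assumption, not something
the design specifies.

```c
#include <stddef.h>

typedef enum                            /* Hypothetical encodings */
{
  DEMO_US_ASCII,
  DEMO_ISO8859_1,
  DEMO_UTF8
} demo_encoding_t;

static const char * const demo_encodings[] =
{
  "us-ascii",                           /* DEMO_US_ASCII */
  "iso-8859-1",                         /* DEMO_ISO8859_1 */
  "utf-8"                               /* DEMO_UTF8 */
};

/* Return encoding name, or first entry if value is out of range */
static const char *
demo_encoding_name(demo_encoding_t encoding)
{
  int    value = (int)encoding;
  size_t count = sizeof(demo_encodings) / sizeof(demo_encodings[0]);

  if (value < 0 || (size_t)value >= count)
    return (demo_encodings[0]);

  return (demo_encodings[value]);
}
```

Keeping the array in the same order as the enumeration makes the lookup
a single bounds-checked index.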
-
-
-
- 3.4. Common Text Filter - Existing
-
-
-
- 3.4.1. textcommon.h - Common text filter header
-
- Required changes:
-
- (1) Revise 'lchar_t' as specified below, adding 'attrx' bit-mask for
- selected Unicode character properties;
- (2) Revise 'lchar_t' as specified below, adding 'comblen' and 'combch[]'
- for Unicode combining/attached chars (accents);
- (3) Add 'COMBLEN_MAX' limit as specified below;
- (4) Add 'ATTRX_...' selected Unicode character properties as specified
- below.
-
-
-
- 3.4.1.1. lchar_t - Character/Attribute Structure
-
- typedef struct lchar_str /**** Character / Attribute Structure ****/
- {
- unsigned short ch; /* Unicode Char as UCS-2 */
- /* or 8/16-bit Legacy Char */
- unsigned short attr; /* Attributes of Char */
- unsigned short attrx; /* Extended Attributes */
- unsigned short comblen; /* Combining Char Count */
- unsigned short combch[8]; /* Combining Chars as UCS-2 */
- } lchar_t;
-
- 'ch' is a 16-bit UCS-2 character or an 8/16-bit legacy char. 'attr' is
- the character attributes defined for the existing 'lchar_t' structure
- (defined in 'textcommon.h'). 'attrx' is the extended character
- attributes defined for future selected Unicode character properties (see
- below). 'comblen' is the number of attached/combining characters.
- 'combch' is an array of 16-bit UCS-2 attached/combining characters.
-
- Add to 'textcommon.h' constants:
-
- COMBLEN_MAX 8
- ATTRX_RIGHT2LEFT 0x0001
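As a small illustration of how 'comblen', 'combch[]', and 'COMBLEN_MAX'
fit together, the C sketch below attaches a combining mark to a
character cell with a bounds check. The structure follows the
definition above. 'demo_add_combining()' is a hypothetical helper, not
part of 'textcommon.c'.

```c
#include <stddef.h>

#define COMBLEN_MAX 8

typedef struct lchar_str        /**** Character / Attribute Structure ****/
{
  unsigned short ch;                    /* Unicode Char as UCS-2 */
  unsigned short attr;                  /* Attributes of Char */
  unsigned short attrx;                 /* Extended Attributes */
  unsigned short comblen;               /* Combining Char Count */
  unsigned short combch[COMBLEN_MAX];   /* Combining Chars as UCS-2 */
} lchar_t;

/* Attach 'mark' to 'cp'; return new count, or -1 if cell is full */
static int
demo_add_combining(lchar_t *cp, unsigned short mark)
{
  if (cp == NULL || cp->comblen >= COMBLEN_MAX)
    return (-1);

  cp->combch[cp->comblen] = mark;
  cp->comblen ++;
  return (cp->comblen);
}
```

The check must be "greater than or equal", since 'combch[]' holds
exactly 'COMBLEN_MAX' entries at indices 0 through 7.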
-
-
-
- 3.4.2. textcommon.c - Common text filter
-
- Required Changes:
-
- (1) Revise 'TextMain()' function as described below.
-
-
-
- 3.4.2.1. TextMain() - Existing
-
- Required Changes:
-
- [Ed Note: Pseudo code below needs more work on bidi handling.]
-
- (1) In main loop at the _beginning_ of the 'default' clause, add the
- following code for combining marks:
- lchar_t *cp;
-
- cp = Page[line];
- cp += column;
- /*
- * Check for Unicode combining mark (accent)
- */
- if (UTF8 && cupsUtf32CombiningClass(ch) > 0)
- {
-
- /*
- * Save Unicode combining mark in SAME character
- */
- if (cp->comblen >= COMBLEN_MAX)
- break;
- cp->combch[cp->comblen] = ch;
- cp->comblen ++;
- break;
- }
-
- (2) In main loop _after_ combining chars section in 'default' clause,
- add the following code for Unicode bidi control characters:
- cups_bidicat_t bidicat;
-
- /*
- * Check for Unicode bidi control character
- */
- if (UTF8)
- {
- bidicat = (cups_bidicat_t)
- cupsUtf32CharacterProperty(ch, CUPS_PROP_BIDI_CATEGORY);
- if ((bidicat == CUPS_BIDI_LRE) /* Left-to-Right Embedding */
- || (bidicat == CUPS_BIDI_LRO) /* Left-to-Right Override */
- || (bidicat == CUPS_BIDI_RLE) /* Right-to-Left Embedding */
- || (bidicat == CUPS_BIDI_RLO) /* Right-to-Left Override */
- || (bidicat == CUPS_BIDI_PDF)) /* Pop Directional Format */
- {
- /* Do bidi stuff here with memory for NEXT char's direction */
- /* Discard bidi control character and break */
- }
- if ((bidicat == CUPS_BIDI_R) /* Right-to-Left Hebrew */
- || (bidicat == CUPS_BIDI_AL)) /* Right-to-Left Arabic */
- {
- /* Set attrx for right-to-left */
- cp->attrx |= ATTRX_RIGHT2LEFT;
- }
- }
-
-
-
- 3.4.2.2. compare_keywords() - Existing
-
- [No Change]
-
-
-
- 3.4.2.3. getutf8() - Existing
-
- [No Change]
-
- [Ed Note: Future - allow 21-bit UTF-32 code points - requires updates
- in both 'textcommon.c' and 'texttops.c' for extended PostScript.]
-
-
-
- 3.5. Text to PostScript Filter - Existing
-
-
-
- 3.5.1. texttops.c - Text to PostScript filter
-
- Required Changes:
-
- (1) Revise local 'write_string()' function as described below.
-
-
-
- 3.5.1.1. main() - Existing
-
- [No Change]
-
-
-
- 3.5.1.2. WriteEpilogue () - Existing
-
- [No Change]
-
-
-
- 3.5.1.3. WritePage () - Existing
-
- [No Change]
-
-
-
- 3.5.1.4. WriteProlog () - Existing
-
- [No Change]
-
-
-
- 3.5.1.5. write_line() - Existing
-
- [No Change]
-
-
-
- 3.5.1.6. write_string() - Existing
-
- Required Changes:
-
- (1) At the _beginning_ of Multiple Fonts section, _replace_ the while()
- loop and surrounding 'putchar()' calls with the following code:
-
- for (; len > 0; len --, s ++)
- {
- utf32_t decstr[COMBLEN_MAX * 2];
- utf32_t cmpstr[COMBLEN_MAX * 2];
- int cmplen;
- int i;
-
- if (s->comblen == 0)
- {
- printf("<%04x>", Chars[s->ch]);
- continue;
- }
-
- /*
- * Normalize decomposed Unicode character to NFKC
- * (compatibility decomposition, then canonical composition)
- */
- decstr[0] = (utf32_t) s->ch;
- for (i = 0; i < s->comblen; i ++)
- decstr[i + 1] = (utf32_t) s->combch[i];
- decstr[i + 1] = 0; /* Terminate AFTER last combining char */
- cmplen = cupsUtf32Normalize (&cmpstr[0],
- &decstr[0], COMBLEN_MAX * 2, CUPS_NORM_NFKC);
- if (cmplen < 1)
- continue;
-
- /*
- * Write combining chars, then composed base, to same location
- */
- for (i = 1; i < cmplen; i ++)
- {
- printf("<%04x>", Chars[(int) cmpstr[i]]);
- /*
- * Superimpose glyphs by backing up one column width
- */
- printf (" -%.3f ", (72.0f / (float) CharsPerInch));
- }
- printf("<%04x>", Chars[(int) cmpstr[0]]);
- }
-
- [Ed Note: Future - Bidi support - When writing Unicode characters
- (checking for explicit bidi) convert input string (lchar_t) to display
- order???]
-
-
-
- 3.5.1.7. write_text() - Existing
-
- [No Change]
-
-
-
- APPENDIX A
- Glossary
-
-
-
- A. Glossary
-
- Abstract Character: A unit of information used for the organization,
- control, or representation of textual data.
-
- Accent Mark: A mark placed above, below, or to the side of a character
- to alter its phonetic value (also 'diacritic').
-
- Alphabet: A collection of symbols that, in the context of a particular
- written language, represent the sounds of that language.
-
- Base Character: A character that does not graphically combine with
- preceding characters, and that is neither a control nor a format
- character.
-
- Basic Multilingual Plane: The Unicode (or UCS) code values 0x0000
- through 0xFFFF, specified by [ISO10646] (also 'Plane 0').
-
- BIDI: Abbreviation for Bidirectional, in reference to mixed
- left-to-right and right-to-left text.
-
- Bidirectional Display: The process or result of mixing left-to-right
- oriented text and right-to-left oriented text in a single line.
-
- Big-endian: A computer architecture that stores multiple-byte numerical
- values with the most significant byte (MSB) values first.
-
- BMP: Abbreviation for Basic Multilingual Plane.
-
- BOM: Acronym for byte order mark (also 'ZWNBSP').
-
- Byte Order Mark: The Unicode character U+FEFF Zero Width No-Break Space
- (ZWNBSP) when used to indicate the byte order of text.
-
- Canonical: (1) Conforming to the general rules for encoding -- that is,
- not compressed, compacted, or in any other form specified by a higher
- protocol. (2) Characteristic of a normative mapping and form of
- equivalence.
-
- Canonical Decomposition: The decomposition of a character that results
- from recursively applying the canonical mappings defined in the Unicode
- Character Database until no characters can be further decomposed, then
- reordering nonspacing marks according to section 3.10 of [UNICODE3.2].
-
- Canonical Equivalent: Two characters are canonical equivalents if their
- full canonical decompositions are identical.
-
- Case: (1) Feature of certain alphabets where the letters have two
- distinct forms. These variants are called the 'uppercase' letter (also
- known as 'capital' or 'majuscule') and the 'lowercase' letter (also
- known as 'small' or 'minuscule'). (2) Normative property of Unicode
- characters, consisting of uppercase, lowercase, and titlecase.
-
- Character: (1) The smallest component of written language that has
- semantic value; refers to the abstract meaning and/or shape, rather than
- a specific shape (see also 'glyph'). (2) Synonym for 'abstract
- character'. (3) The basic unit of encoding for the Unicode character
- encoding. (4) The English name for the ideographic written elements of
- Chinese origin (see 'ideograph').
-
- Character Encoding Form (CEF): Mapping from a character set definition
- to the actual bits used to represent the data.
-
- Character Encoding Scheme (CES): A 'character encoding form' plus byte
- serialization. [UNICODE3.2] defines seven character encoding schemes:
- UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE.
-
- Character Properties: A set of property names and property values
- associated with individual characters defined in [UNICODE3.2].
-
- Character Repertoire: (1) The collection of characters included in a
- character set. (2) The SUBSET of characters included in a large
- character set, e.g., [UNICODE3.2], that are necessary to support a
- complete mapping to another smaller character set, e.g., ISO8859-1 (also
- called 'Latin-1').
-
- Character Set: A collection of elements used to represent textual
- information.
-
- Coded Character Set: A character set in which each character is
- assigned a numeric code value. Frequently abbreviated as 'character
- set', 'charset', or 'code set'.
-
- Code Point: (1) A numerical index (or position) in an encoding table
- used for encoding characters. (2) Synonym for 'Unicode scalar value'.
-
- Collation: The process of ordering units of textual information.
- Collation is usually specific to a particular language. Also known as
- 'alphabetizing' or 'alphabetic sorting'.
-
- Combining Character: A character that graphically combines with a
- preceding 'base character'. The combining character is said to 'apply'
- to that base character. (See also 'nonspacing mark'.)
-
- Compatibility: (1) Consistency with existing practice or preexisting
- character encoding standards. (2) Characteristic of a normative
- mapping and form of equivalence (see 'compatibility decomposition').
-
-
- McDonald June 20, 2002 [Page A-2]
-
- CUPS Internationalization Software Design Description v0.3
- APPENDIX A
- Glossary
-
-
- Compatibility Character: A character that has a compatibility
- decomposition.
-
- Compatibility Decomposition: The decomposition of a character that
- results from recursively applying BOTH the compatibility mappings AND
- the canonical mappings found in the Unicode Character Database until no
- characters can be further decomposed, then reordering nonspacing marks
- according to section 3.10 of [UNICODE3.2].
-
- Compatibility Equivalent: Two characters are compatibility equivalents
- if their full compatibility decompositions are identical.
-
- Composed Character: (See 'decomposable character'.)
-
- DBCS: Acronym for 'double-byte character set'.
-
- Decomposable Character: A character that is equivalent to a sequence of
- one or more other characters, according to the decomposition mappings
- found in [UNICODE3.2]. It may also be known as a 'precomposed
- character' or a 'composite character'.
-
- Decomposition: (1) The process of separating or analyzing a text
- element into component units. (2) A sequence of one or more characters
- that is equivalent to a 'decomposable character'.
-
- Diacritic: (See 'accent mark'.)
-
- Double-Byte Character Set (DBCS): One of a number of character sets
- defined for representing Chinese, Japanese, or Korean text (for example,
- JIS X 0208-1990). These character sets are often encoded in such a way
- as to allow double-byte character encodings to be mixed with single-byte
- character encodings. (See also 'multiple-byte character set'.)
-
- Font: A collection of glyphs used for visual depiction of character
- data.
-
- FSS-UTF: Abbreviation for 'File System Safe UCS Transformation Format',
- originally published by X/Open. Now called 'UTF-8'.
-
- Fullwidth: Characters of East Asian character sets whose glyph image
- extends across the entire character display cell. In legacy character
- sets, fullwidth characters are normally encoded in two or three bytes.
-
- Glyph: (1) An abstract form that represents one or more glyph images.
- (2) A synonym for 'glyph image'.
-
- Glyph Image: The actual, concrete image of a glyph representation
- having been rasterized or otherwise imaged onto some display surface.
-
-
- Halfwidth: Characters of East Asian character sets whose glyph image
- occupies half of the character display cell. In legacy character sets,
- halfwidth characters are normally encoded in a single byte.
-
- Han Characters: Ideographic characters of Chinese origin.
-
- Hangul: The name of the script used to write the Korean language.
-
- High-Surrogate: A Unicode code value in the range U+D800 to U+DBFF.
-
- Hiragana: One of two standard syllabaries associated with the Japanese
- writing system. Used to write particles, grammatical affixes, and words
- that have no 'kanji' form.
-
- IANA: Internet Assigned Numbers Authority.
-
- Ideograph: (1) Any symbol that denotes an idea (or meaning) in contrast
- to a sound or pronunciation (for example, a 'smiley face'). (2) A
- common term used to refer to Han characters.
-
- IPA: International Phonetic Alphabet.
-
- IRG: Abbreviation for Ideographic Rapporteur Group, a subgroup of
- ISO/IEC JTC1/SC2/WG2 (who work on Han unification and submission of new
- Han characters for inclusion in revised versions of Unicode/ISO 10646).
-
- Jamo: The Korean name for a single letter of the Hangul script. Jamos
- are used to form Hangul syllables.
-
- Joiner: An invisible character that affects the joining behavior of
- surrounding characters.
-
- JTC1: Abbreviation for Joint Technical Committee 1 of ISO/IEC,
- responsible for information technology standardization.
-
- Kana: The name of a primarily syllabic script used by the Japanese
- writing system, composed of 'hiragana' and 'katakana'.
-
- Kanji: The Japanese name for Han characters; derived from the Chinese
- word 'hanzi'. Also romanized as 'kanzi'.
-
- Katakana: One of two standard syllabaries associated with the Japanese
- writing system, typically used in representation of borrowed vocabulary.
-
- Ligature: A glyph representing a combination of two or more characters,
- for example in the Latin script the ligature between 'f' and 'i' as
- 'fi'.
-
- Logical Order: The order in which text is typed on a keyboard. For the
- most part, logical order corresponds to phonetic order.
-
- Lowercase: (See 'case'.)
-
- Low-Surrogate: A Unicode code value in the range U+DC00 to U+DFFF.
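The two surrogate ranges work as a pair: one high-surrogate followed by one low-surrogate encodes a single scalar value above U+FFFF. The following is a minimal sketch of that arithmetic. The function names are hypothetical and are not part of the CUPS API.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not CUPS functions. */

/* Returns 1 if 'u' is a high-surrogate (U+D800..U+DBFF), else 0. */
int is_high_surrogate(uint16_t u)
{
  return (u >= 0xD800 && u <= 0xDBFF);
}

/* Returns 1 if 'u' is a low-surrogate (U+DC00..U+DFFF), else 0. */
int is_low_surrogate(uint16_t u)
{
  return (u >= 0xDC00 && u <= 0xDFFF);
}

/*
 * Combine a surrogate pair into a scalar value in U+10000..U+10FFFF.
 * Returns 0 if the pair is malformed (0 can never result from a valid pair).
 */
uint32_t surrogates_to_scalar(uint16_t hi, uint16_t lo)
{
  if (!is_high_surrogate(hi) || !is_low_surrogate(lo))
    return (0);

  return (0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
                     (uint32_t)(lo - 0xDC00)));
}

/*
 * Split a scalar value above U+FFFF into a surrogate pair.
 * Returns 1 on success, 0 if 'cp' is outside U+10000..U+10FFFF.
 */
int scalar_to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
  if (cp < 0x10000 || cp > 0x10FFFF)
    return (0);

  cp -= 0x10000;                           /* 20 significant bits remain */
  *hi = (uint16_t)(0xD800 + (cp >> 10));   /* upper 10 bits */
  *lo = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* lower 10 bits */
  return (1);
}
```

For example, U+1D11E splits into the pair 0xD834, 0xDD1E, and combining that pair yields U+1D11E again.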
-
- MBCS: Acronym for 'multiple-byte character set'.
-
- Multiple-Byte Character Set (MBCS): A character set encoded with a
- variable number of bytes per character. Many large character sets have
- been defined as MBCS so as to keep strict compatibility with the
- US-ASCII subset and/or [ISO2022].
-
- Normalization: Transformation of data to a normal form.
-
- Plain Text: Computer-encoded text that consists ONLY of a sequence of
- code values from a given standard, with no other formatting or
- structural information.
-
- Precomposed Character: (See 'decomposable character'.)
-
- Rendering: (1) The process of selecting and laying out glyphs for the
- purpose of depicting characters. (2) The process of making glyphs
- visible on a display device.
-
- Repertoire: (See 'character repertoire'.)
-
- Replacement Character: A character used as a substitute for an
- uninterpretable character from another encoding. [UNICODE3.2] defines
- U+FFFD REPLACEMENT CHARACTER for this function.
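As an illustration of how a replacement character is typically used, the sketch below converts 7-bit US-ASCII input to 32-bit code values and substitutes U+FFFD for any octet that US-ASCII cannot interpret. The function name and the choice of US-ASCII as the source encoding are assumptions made for the example, not CUPS behavior.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu

/*
 * Hypothetical helper (not a CUPS function): copy 'len' US-ASCII octets
 * from 'src' into 'dest' as 32-bit code values.  Octets above 0x7F are
 * not valid US-ASCII, so each one becomes U+FFFD.
 * 'dest' must have room for 'len' entries.
 * Returns the number of substitutions made.
 */
size_t ascii_to_ucs4(const unsigned char *src, size_t len, uint32_t *dest)
{
  size_t i;
  size_t replaced = 0;

  for (i = 0; i < len; i++)
  {
    if (src[i] < 0x80)
      dest[i] = src[i];
    else
    {
      dest[i] = REPLACEMENT_CHARACTER;
      replaced++;
    }
  }

  return (replaced);
}
```

Returning the substitution count lets a caller decide whether a lossy conversion is acceptable.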
-
- Rich Text: The result of adding information such as font data, color,
- formatting, phonetic annotations, etc. to 'plain text' (e.g., HTML).
-
- SBCS: Acronym for 'single-byte character set'.
-
- Scalar Value: (See 'Unicode scalar value'.)
-
- Script: A collection of symbols used to represent textual information
- in one or more writing systems.
-
- Single-Byte Character Set (SBCS): One of a number of one-byte character
- sets defined for representing (mostly) Western languages (for example,
- ISO 8859-1 'Latin-1'). These character sets are often encoded in such a
- way as to be strict supersets of 7-bit [US-ASCII].
-
- Sorting: (See 'collation'.)
-
- Transcoding: Conversion of character data between different character
- sets.
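A simple case of transcoding is ISO 8859-1 to UTF-8, because every ISO 8859-1 octet has the same numeric value as its Unicode code point. The sketch below is illustrative only. The function name and error convention are assumptions and do not reflect the transcoding functions designed in this document.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a CUPS function): transcode 'srclen' octets
 * of ISO 8859-1 into UTF-8.  Octets below 0x80 are copied unchanged;
 * octets 0x80..0xFF each become a two-octet UTF-8 sequence.
 * Returns the number of octets written, or -1 if 'dest' is too small.
 */
long latin1_to_utf8(const unsigned char *src, size_t srclen,
                    unsigned char *dest, size_t destsize)
{
  size_t i;
  size_t n = 0;

  for (i = 0; i < srclen; i++)
  {
    if (src[i] < 0x80)
    {
      if (n + 1 > destsize)
        return (-1);
      dest[n++] = src[i];
    }
    else
    {
      if (n + 2 > destsize)
        return (-1);
      dest[n++] = (unsigned char)(0xC0 | (src[i] >> 6));   /* lead octet  */
      dest[n++] = (unsigned char)(0x80 | (src[i] & 0x3F)); /* trail octet */
    }
  }

  return ((long)n);
}
```

The worst case doubles the input size, so a destination buffer of 2 * srclen octets is always sufficient.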
-
- Transformation Format: A mapping from a coded character sequence to a
- unique sequence of code values (typically octets).
-
- UCS: Abbreviation for Universal Character Set, specified by [ISO10646].
-
- UCS-2: UCS encoded in 2 octets, specified by [ISO10646].
-
- UCS-4: UCS encoded in 4 octets, specified by [ISO10646].
-
- Unicode Scalar Value: A number in the range 0 to 0x10FFFF.
-
- Uppercase: (See 'case'.)
-
- UTF: Abbreviation for Unicode (or UCS) Transformation Format.
-
- UTF-8: Unicode (or UCS) Transformation Format, 8-bit encoding form.
- Serializes a Unicode (or UCS) scalar value (code point) as a sequence of
- one to four octets. Does NOT suffer from byte-ordering ambiguities.
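The one-to-four octet serialization can be sketched as follows. This is a minimal example, not the CUPS implementation. The function name is hypothetical, and rejecting the surrogate range U+D800..U+DFFF is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a CUPS function): serialize scalar value
 * 'cp' as UTF-8 into 'out', which must hold at least 4 octets.
 * Returns the number of octets written (1 to 4), or 0 if 'cp' is
 * above 0x10FFFF or (by assumption here) in the surrogate range.
 */
int utf8_encode(uint32_t cp, unsigned char *out)
{
  if (cp < 0x80)
  {
    out[0] = (unsigned char)cp;                          /* 0xxxxxxx */
    return (1);
  }

  if (cp < 0x800)
  {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));          /* 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));        /* 10xxxxxx */
    return (2);
  }

  if (cp < 0x10000)
  {
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return (0);                                        /* surrogate */

    out[0] = (unsigned char)(0xE0 | (cp >> 12));         /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return (3);
  }

  if (cp <= 0x10FFFF)
  {
    out[0] = (unsigned char)(0xF0 | (cp >> 18));         /* 11110xxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return (4);
  }

  return (0);                                            /* out of range */
}
```

Because the output is a sequence of single octets in a fixed order, no byte-order mark is needed to interpret it.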
-
- UTF-16: Unicode (or UCS) Transformation Format, 16-bit encoding form.
- Serializes a Unicode (or UCS) scalar value (code point) as a sequence of
- two or four octets (one 16-bit code value, or a surrogate pair), in
- either big-endian or little-endian format. Uses an (optional) prefix
- of BOM to disambiguate byte-ordering.
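The role of the BOM (U+FEFF) in UTF-16 can be sketched as below: the first two octets of a stream are inspected to select the byte order used to read every later 16-bit code value. The type and function names are hypothetical and are not part of the CUPS API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration only. */
typedef enum
{
  UTF16_ORDER_UNKNOWN,  /* No BOM present; caller must choose a default */
  UTF16_ORDER_BE,       /* Octets FE FF: big-endian */
  UTF16_ORDER_LE        /* Octets FF FE: little-endian */
} utf16_order_t;

/* Inspect the first two octets of 'buf' for a UTF-16 BOM. */
utf16_order_t utf16_detect_bom(const unsigned char *buf, size_t len)
{
  if (len >= 2)
  {
    if (buf[0] == 0xFE && buf[1] == 0xFF)
      return (UTF16_ORDER_BE);
    if (buf[0] == 0xFF && buf[1] == 0xFE)
      return (UTF16_ORDER_LE);
  }

  return (UTF16_ORDER_UNKNOWN);
}

/*
 * Read one 16-bit code value from the two octets at 'p'.
 * Any order other than UTF16_ORDER_LE is read as big-endian.
 */
uint16_t utf16_read_unit(const unsigned char *p, utf16_order_t order)
{
  if (order == UTF16_ORDER_LE)
    return ((uint16_t)(p[0] | (p[1] << 8)));

  return ((uint16_t)((p[0] << 8) | p[1]));
}
```

Treating an unknown order as big-endian is a choice made in this sketch. A real reader would normally take the default from the declared charset (for example, UTF-16BE or UTF-16LE).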
-
- UTF-32: Unicode (or UCS) Transformation Format, 32-bit encoding form.
- Serializes a Unicode (or UCS) scalar value (code point) as a sequence of
- four octets, in either big-endian or little-endian format. Uses an
- (optional) prefix of BOM to disambiguate byte-ordering.
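The fixed four-octet serialization of UTF-32 differs between the two byte orders only in octet sequence, as this minimal sketch shows. The function name and the 'big_endian' flag are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a CUPS function): write scalar value 'cp'
 * into 'out' (4 octets) in UTF-32.  A nonzero 'big_endian' selects
 * most-significant-octet-first order; zero selects little-endian.
 */
void utf32_write(uint32_t cp, int big_endian, unsigned char *out)
{
  if (big_endian)
  {
    out[0] = (unsigned char)(cp >> 24);
    out[1] = (unsigned char)((cp >> 16) & 0xFF);
    out[2] = (unsigned char)((cp >> 8) & 0xFF);
    out[3] = (unsigned char)(cp & 0xFF);
  }
  else
  {
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    out[2] = (unsigned char)((cp >> 16) & 0xFF);
    out[3] = (unsigned char)(cp >> 24);
  }
}
```

Writing U+FEFF with this function at the start of a stream produces the BOM for the chosen byte order: 00 00 FE FF for big-endian, FF FE 00 00 for little-endian.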
-
- Zero Width: Characteristic of some spaces or format control characters
- that do not advance text along the horizontal baseline.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- McDonald June 20, 2002 [Page A-6]