Diffstat (limited to 'data/i18n_sdd.txt')
-rw-r--r-- | data/i18n_sdd.txt | 2337 |
1 files changed, 0 insertions, 2337 deletions
diff --git a/data/i18n_sdd.txt b/data/i18n_sdd.txt
deleted file mode 100644
index 5c6cbcedc..000000000
--- a/data/i18n_sdd.txt
+++ /dev/null
@@ -1,2337 +0,0 @@
-
- WORKING DRAFT                                            Ira McDonald
- <i18n_sdd.txt>                                         High North Inc
-
- Common UNIX Printing System ("CUPS")
- Internationalization Software Design Description v0.3
-
- Copyright (C) Easy Software Products (2002)
- All Rights Reserved
-
-
- Status of this Document
-
- This document is an unapproved working draft and is incomplete in some
- sections (see 'Ed Note:' comments).
-
-
- Abstract
-
- This document provides general information and high-level design for the
- Internationalization extensions for the Common UNIX Printing System
- ("CUPS") Version 1.2. This document also provides C language header
- files and high-level pseudo-code for all new modules and external
- functions.
-
- McDonald                    June 20, 2002                     [Page 1]
-
- CUPS Internationalization Software Design Description v0.3
-
- Table of Contents
-
- 1. Scope ...................................................... 4
-   1.1. Identification ......................................... 4
-   1.2. System Overview ........................................ 4
-   1.3. Document Overview ...................................... 4
- 2. References ................................................. 5
-   2.1. CUPS References ........................................ 5
-   2.2. Other Documents ........................................ 5
- 3. Design Overview ............................................ 7
-   3.1. Transcoding - New ...................................... 7
-     3.1.1. transcode.h - Transcoding header ................... 7
-       3.1.1.1. cups_cmap_t - SBCS Charmap Structure ........... 10
-       3.1.1.2. cups_dmap_t - DBCS Charmap Structure ........... 11
-     3.1.2. transcode.c - Transcoding module ................... 11
-       3.1.2.1. cupsUtf8ToCharset() ............................ 11
-       3.1.2.2. cupsCharsetToUtf8() ............................ 12
-       3.1.2.3. cupsUtf8ToUtf16() .............................. 12
-       3.1.2.4. cupsUtf16ToUtf8() .............................. 12
-       3.1.2.5. cupsUtf8ToUtf32() .............................. 12
-       3.1.2.6. cupsUtf32ToUtf8() .............................. 13
-       3.1.2.7. cupsUtf16ToUtf32() ............................. 13
-       3.1.2.8. cupsUtf32ToUtf16() ............................. 13
-       3.1.2.9. Transcoding Utility Functions .................. 13
-         3.1.2.9.1. cupsCharmapGet() ........................... 14
-         3.1.2.9.2. cupsCharmapFree() .......................... 14
-         3.1.2.9.3. cupsCharmapFlush() ......................... 14
-   3.2. Normalization - New .................................... 15
-     3.2.1. normalize.h - Normalization header ................. 15
-       3.2.1.1. cups_normmap_t - Normalize Map Structure ....... 22
-       3.2.1.2. cups_foldmap_t - Case Fold Map Structure ....... 22
-       3.2.1.3. cups_propmap_t - Char Property Map Structure ... 23
-       3.2.1.4. cups_prop_t - Char Property Structure .......... 23
-       3.2.1.5. cups_breakmap_t - Line Break Map Structure ..... 23
-       3.2.1.6. cups_combmap_t - Combining Class Map Structure . 24
-       3.2.1.7. cups_comb_t - Combining Class Structure ........ 24
-     3.2.2. normalize.c - Normalization module ................. 24
-       3.2.2.1. cupsUtf8Normalize() ............................ 24
-       3.2.2.2. cupsUtf32Normalize() ........................... 25
-       3.2.2.3. cupsUtf8CaseFold() ............................. 25
-       3.2.2.4. cupsUtf32CaseFold() ............................ 26
-       3.2.2.5. cupsUtf8CompareCaseless() ...................... 26
-       3.2.2.6. cupsUtf32CompareCaseless() ..................... 26
-       3.2.2.7. cupsUtf8CompareIdentifier() .................... 27
-       3.2.2.8. cupsUtf32CompareIdentifier() ................... 27
-       3.2.2.9. cupsUtf32CharacterProperty() ................... 27
-       3.2.2.10. Normalization Utility Functions ............... 28
-         3.2.2.10.1. cupsNormalizeMapsGet() .................... 28
-         3.2.2.10.2. cupsNormalizeMapsFree() ................... 28
-         3.2.2.10.3. cupsNormalizeMapsFlush() .................. 28
-   3.3. Language - Existing .................................... 29
-     3.3.1. language.h - Language header ....................... 29
-     3.3.2. language.c - Language module ....................... 29
-       3.3.2.1. cupsLangEncoding() - Existing .................. 29
-       3.3.2.2. cupsLangFlush() - Existing ..................... 29
-       3.3.2.3. cupsLangFree() - Existing ...................... 29
-       3.3.2.4. cupsLangGet() - Existing ....................... 30
-       3.3.2.5. cupsLangPrintf() - New ......................... 30
-       3.3.2.6. cupsLangPuts() - New ........................... 30
-       3.3.2.7. cupsEncodingName() - New ....................... 31
-   3.4. Common Text Filter - Existing .......................... 31
-     3.4.1. textcommon.h - Common text filter header ........... 31
-       3.4.1.1. lchar_t - Character/Attribute Structure ........ 31
-     3.4.2. textcommon.c - Common text filter .................. 32
-       3.4.2.1. TextMain() - Existing .......................... 32
-       3.4.2.2. compare_keywords() - Existing .................. 33
-       3.4.2.3. getutf8() - Existing ........................... 33
-   3.5. Text to PostScript Filter - Existing ................... 33
-     3.5.1. texttops.c - Text to PostScript filter ............. 33
-       3.5.1.1. main() - Existing .............................. 33
-       3.5.1.2. WriteEpilogue () - Existing .................... 34
-       3.5.1.3. WritePage () - Existing ........................ 34
-       3.5.1.4. WriteProlog () - Existing ...................... 34
-       3.5.1.5. write_line() - Existing ........................ 34
-       3.5.1.6. write_string() - Existing ...................... 34
-       3.5.1.7. write_text() - Existing ........................ 35
- A. Glossary ................................................... A-1
-
-
- 1. Scope
-
-
- 1.1. Identification
-
- This document provides general information and high-level design for the
- Internationalization extensions for the Common UNIX Printing System
- ("CUPS") Version 1.2. This document also provides C language header
- files and high-level pseudo-code for all new modules and external
- functions.
-
-
- 1.2. System Overview
-
- The CUPS Internationalization extensions provide multilingual support
- via Unicode 3.2:2002 [UNICODE3.2] / ISO-10646-1:2000 [ISO10646-1] and a
- suite of local character sets (including all adopted parts of ISO-8859
- and many MS Windows code pages) for CUPS 1.2.
-
- The CUPS Internationalization extensions support UTF-8 [RFC2279] as the
- common stream-oriented representation of all character data. UTF-8 is
- defined in [ISO10646-1] and is further constrained (for integrity and
- security) by [UNICODE3.2].
-
- UTF-8 is the native character set of LDAPv3 [RFC2251], SLPv2 [RFC2608],
- IPP/1.1 [RFC2910] [RFC2911], and many other Internet protocols.
-
-
- 1.3. Document Overview
-
- This software design description document is organized into the
- following sections:
-
- o 1 - Scope
- o 2 - References
- o 3 - Design Overview
- o A - Glossary
-
-
- 2. References
-
-
- 2.1. CUPS References
-
- See: Section 2.1 'CUPS Documentation' of CUPS Software Design
- Description.
-
-
- 2.2. Other Documents
-
- The following non-CUPS documents are referenced by this document.
-
- [ANSI-X3.4] ANSI Coded Character Set - 7-bit American National Standard
- Code for Information Interchange, ANSI X3.4, 1986 (aka US-ASCII).
-
- [GB2312] Code of Chinese Graphic Character Set for Information
- Interchange, Primary Set, GB 2312, 1980.
-
- [ISO639-1] Codes for the Representation of Names of Languages -- Part 1:
- Alpha-2 Code, ISO/IEC 639-1, 2000.
-
- [ISO639-2] Codes for the Representation of Names of Languages -- Part 2:
- Alpha-3 Code, ISO/IEC 639-2, 1998.
-
- [ISO646] Information Technology - ISO 7-bit Coded Character Set for
- Information Interchange, ISO/IEC 646, 1991.
-
- [ISO2022] Information Processing - ISO 7-bit and 8-bit Coded Character
- Sets - Code Extension Techniques, ISO/IEC 2022, 1994. (Technically
- identical to ECMA-35.)
-
- [ISO3166-1] Codes for the Representation of Names of Countries and their
- Subdivisions, Part 1: Country Codes, ISO 3166-1, 1997.
-
- [ISO8859] Information Processing - 8-bit Single-Byte Coded Graphic
- Character Sets, ISO/IEC 8859-n, 1987-2001.
-
- [ISO10646-1] Information Technology - Universal Multiple-Octet Coded
- Character Set (UCS) - Part 1: Architecture and Basic Multilingual
- Plane, ISO/IEC 10646-1, September 2000.
-
- [ISO10646-2] Information Technology - Universal Multiple-Octet Coded
- Character Set (UCS) - Part 2: Supplemental Planes, ISO/IEC 10646-2,
- January 2001.
-
- [RFC2119] Bradner. Key words for use in RFCs to Indicate Requirement
- Levels, RFC 2119, March 1997.
-
- [RFC2251] Wahl, Howes, Kille. Lightweight Directory Access Protocol
- Version 3 (LDAPv3), RFC 2251, December 1997.
-
- [RFC2277] Alvestrand. IETF Policy on Character Sets and Languages, RFC
- 2277, January 1998.
-
- [RFC2279] Yergeau. UTF-8, a Transformation Format of ISO 10646, RFC
- 2279, January 1998.
-
- [RFC2608] Guttman, Perkins, Veizades, Day. Service Location Protocol
- Version 2 (SLPv2), RFC 2608, June 1999.
-
- [RFC2910] Herriot, Butler, Moore, Turner, Wenn. Internet Printing
- Protocol/1.1: Encoding and Transport, RFC 2910, September 2000.
-
- [RFC2911] Hastings, Herriot, deBry, Isaacson, Powell. Internet Printing
- Protocol/1.1: Model and Semantics, RFC 2911, September 2000.
-
- [UNICODE3.0] Unicode Consortium, Unicode Standard Version 3.0,
- Addison-Wesley Developers Press, ISBN 0-201-61633-5, 2000.
-
- [UNICODE3.1] Unicode Consortium, Unicode Standard Version 3.1 (UAX-27),
- May 2001.
-
- [UNICODE3.2] Unicode Consortium, Unicode Standard Version 3.2 (UAX-28),
- March 2002.
-
- [US-ASCII] See [ANSI-X3.4] above.
-
-
- 3. Design Overview
-
- The CUPS Internationalization extensions are composed of several header
- files and modules which extend the Language functions in the existing
- CUPS Application Programmers Interface (API).
-
-
- 3.1. Transcoding - New
-
- Initially, the CUPS Internationalization extensions will only support
- SBCS (single-byte character set) transcoding. But the design allows
- future support for DBCS (double-byte character set) transcoding for CJK
- (Chinese/Japanese/Korean) languages and the MBCS (multiple-byte
- character set) compound sets that use escapes for charset switching.
-
- In order to reduce code size and increase performance, all conventional
- 'mapping files' (tables of values in legacy character sets with their
- corresponding Unicode scalar values) will ALSO be sorted and stored in
- memory as reverse maps (for efficient conversion from Unicode scalar
- values to their corresponding legacy character set values). Transcoding
- will be done directly by 2-level lookup (without any searching or
- sorting).
-
- [Ed Note: CJK languages will be fairly costly in mapping table sizes,
- because they have thousands (or tens of thousands) of codepoints.]
-
-
- 3.1.1. transcode.h - Transcoding header
-
- /*
-  * "$Id: i18n_sdd.txt 2678 2002-08-19 01:15:26Z mike $"
-  *
-  * Transcoding support for the Common UNIX Printing System (CUPS).
-  *
-  * Copyright 1997-2002 by Easy Software Products.
-  *
-  * These coded instructions, statements, and computer programs are
-  * the property of Easy Software Products and are protected by Federal
-  * copyright law. Distribution and use rights are outlined in the
-  * file "LICENSE.txt" which should have been included with this file.
-  * If this file is missing or damaged please contact Easy Software
-  * Products at:
-  *
-  * Attn: CUPS Licensing Information
-  * Easy Software Products
-  * 44141 Airport View Drive, Suite 204
-  * Hollywood, Maryland 20636-3111 USA
-  *
-  * Voice: (301) 373-9603
-  * EMail: cups-info@cups.org
-  * WWW: http://www.cups.org
-  */
-
- #ifndef _CUPS_TRANSCODE_H_
- # define _CUPS_TRANSCODE_H_
-
- /*
-  * Include necessary headers...
-  */
-
- # include "cups/language.h"
-
- # ifdef __cplusplus
- extern "C" {
- # endif /* __cplusplus */
-
- /*
-  * Types...
-  */
-
- typedef unsigned char  utf8_t;   /* UTF-8 Unicode/ISO-10646 code unit */
- typedef unsigned short utf16_t;  /* UTF-16 Unicode/ISO-10646 code unit */
- typedef unsigned long  utf32_t;  /* UTF-32 Unicode/ISO-10646 code unit */
- typedef unsigned short ucs2_t;   /* UCS-2 Unicode/ISO-10646 code unit */
- typedef unsigned long  ucs4_t;   /* UCS-4 Unicode/ISO-10646 code unit */
- typedef unsigned char  sbcs_t;   /* SBCS Legacy 8-bit code unit */
- typedef unsigned short dbcs_t;   /* DBCS Legacy 16-bit code unit */
-
- /*
-  * Structures...
-  */
-
- typedef struct cups_cmap_str       /**** SBCS Charmap Cache Structure ****/
- {
-   struct cups_cmap_str *next;      /* Next charmap in cache */
-   int used;                        /* Number of times entry used */
-   cups_encoding_t encoding;        /* Legacy charset encoding */
-   ucs2_t char2uni[256];            /* Map Legacy SBCS -> UCS-2 */
-   sbcs_t *uni2char[256];           /* Map UCS-2 -> Legacy SBCS */
- } cups_cmap_t;
-
- #if 0
- typedef struct cups_dmap_str       /**** DBCS Charmap Cache Structure ****/
- {
-   struct cups_dmap_str *next;      /* Next charmap in cache */
-   int used;                        /* Number of times entry used */
-   cups_encoding_t encoding;        /* Legacy charset encoding */
-   ucs2_t *char2uni[256];           /* Map Legacy DBCS -> UCS-2 */
-   dbcs_t *uni2char[256];           /* Map UCS-2 -> Legacy DBCS */
- } cups_dmap_t;
- #endif
-
- /*
-  * Constants...
-  */
- #define CUPS_MAX_USTRING 1024      /* Maximum size of Unicode string */
-
- /*
-  * Globals...
-  */
-
- extern int TcFixMapNames;          /* Fix map names to Unicode names */
- extern int TcStrictUtf8;           /* Non-shortest-form is illegal */
- extern int TcStrictUtf16;          /* Invalid surrogate pair is illegal */
- extern int TcStrictUtf32;          /* Greater than 0x10FFFF is illegal */
- extern int TcRequireBOM;           /* Require BOM for little/big-endian */
- extern int TcSupportBOM;           /* Support BOM for little/big-endian */
- extern int TcSupport8859;          /* Support ISO 8859-x repertoires */
- extern int TcSupportWin;           /* Support Windows-x repertoires */
- extern int TcSupportCJK;           /* Support CJK (Asian) repertoires */
-
- /*
-  * Prototypes...
-  */
-
- /*
-  * Utility functions for character set maps
-  */
- extern void *cupsCharmapGet(const cups_encoding_t encoding);
-                                         /* I - Encoding */
- extern void cupsCharmapFree(const cups_encoding_t encoding);
-                                         /* I - Encoding */
- extern void cupsCharmapFlush(void);
-
- /*
-  * Convert UTF-8 to and from legacy character set
-  */
- extern int cupsUtf8ToCharset(char *dest,        /* O - Target string */
-         const utf8_t *src,                      /* I - Source string */
-         const int maxout,                       /* I - Max output */
-         cups_encoding_t encoding);              /* I - Encoding */
- extern int cupsCharsetToUtf8(utf8_t *dest,      /* O - Target string */
-         const char *src,                        /* I - Source string */
-         const int maxout,                       /* I - Max output */
-         cups_encoding_t encoding);              /* I - Encoding */
-
- /*
-  * Convert UTF-8 to and from UTF-16
-  */
- extern int cupsUtf8ToUtf16(utf16_t *dest,       /* O - Target string */
-         const utf8_t *src,                      /* I - Source string */
-         const int maxout);                      /* I - Max output */
- extern int cupsUtf16ToUtf8(utf8_t *dest,        /* O - Target string */
-         const utf16_t *src,                     /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
- /*
-  * Convert UTF-8 to and from UTF-32
-  */
- extern int cupsUtf8ToUtf32(utf32_t *dest,       /* O - Target string */
-         const utf8_t *src,                      /* I - Source string */
-         const int maxout);                      /* I - Max output */
- extern int cupsUtf32ToUtf8(utf8_t *dest,        /* O - Target string */
-         const utf32_t *src,                     /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
- /*
-  * Convert UTF-16 to and from UTF-32
-  */
- extern int cupsUtf16ToUtf32(utf32_t *dest,      /* O - Target string */
-         const utf16_t *src,                     /* I - Source string */
-         const int maxout);                      /* I - Max output */
- extern int cupsUtf32ToUtf16(utf16_t *dest,      /* O - Target string */
-         const utf32_t *src,                     /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
- # ifdef __cplusplus
- }
- # endif /* __cplusplus */
-
- #endif /* !_CUPS_TRANSCODE_H_ */
-
- /*
-  * End of "$Id: i18n_sdd.txt 2678 2002-08-19 01:15:26Z mike $"
-  */
-
-
- 3.1.1.1. cups_cmap_t - SBCS Charmap Structure
-
- typedef struct cups_cmap_str       /**** SBCS Charmap Cache Structure ****/
- {
-   struct cups_cmap_str *next;      /* Next charset map in cache */
-   int used;                        /* Number of times entry used */
-   cups_encoding_t encoding;        /* Legacy charset encoding */
-   ucs2_t char2uni[256];            /* Map Legacy SBCS -> UCS-2 */
-   sbcs_t *uni2char[256];           /* Map UCS-2 -> Legacy SBCS */
- } cups_cmap_t;
-
- 'char2uni[]' is a (complete) array of UCS-2 values that supports direct
- one-level lookup from an input SBCS legacy charset code point, for use
- by 'cupsCharsetToUtf8()'.
-
- 'uni2char[]' is a (sparse) array of pointers to arrays of (256 each)
- SBCS values, that supports direct two-level lookup from an input UCS-2
- code point, for use by 'cupsUtf8ToCharset()'.
-
-
- 3.1.1.2. cups_dmap_t - DBCS Charmap Structure
-
- typedef struct cups_dmap_str       /**** DBCS Charmap Cache Structure ****/
- {
-   struct cups_dmap_str *next;      /* Next charset map in cache */
-   int used;                        /* Number of times entry used */
-   cups_encoding_t encoding;        /* Legacy charset encoding */
-   ucs2_t *char2uni[256];           /* Map Legacy DBCS -> UCS-2 */
-   dbcs_t *uni2char[256];           /* Map UCS-2 -> Legacy DBCS */
- } cups_dmap_t;
-
- 'char2uni[]' is a (sparse) array of pointers to arrays of (256 each)
- UCS-2 values that supports direct two-level lookup from an input DBCS
- legacy charset code point, for (future) use by 'cupsCharsetToUtf8()'.
-
- 'uni2char[]' is a (sparse) array of pointers to arrays of (256 each)
- DBCS values, that supports direct two-level lookup from an input UCS-2
- code point, for (future) use by 'cupsUtf8ToCharset()'.
-
-
- 3.1.2. transcode.c - Transcoding module
-
- All of the transcoding functions are modelled on the C standard library
- function 'strncpy()', except that they return the count of output, like
- 'strlen()', rather than the (redundant) pointer to the output.
-
- If the transcoding functions detect invalid input parameters or they
- detect an encoding error in their input, then they return '-1', rather
- than the count of output.
-
- All of the transcoding functions take an input parameter indicating the
- maximum output units (for safe operation). The functions that return
- 16-bit (UTF-16) or 32-bit (UTF-32/UCS-4) output always return the output
- string count (not including the final null) and NOT the memory size in
- bytes.
-
-
- 3.1.2.1. cupsUtf8ToCharset()
-
- extern int cupsUtf8ToCharset(char *dest,        /* O - Target string */
-         const utf8_t *src,                      /* I - Source string */
-         const int maxout,                       /* I - Max output */
-         cups_encoding_t encoding);              /* I - Encoding */
-
-     <Find charset map by calling 'cupsCharmapGet()'>
-     <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'>
-     <Convert internal UCS-4 to legacy charset via charset map>
-     <Release charset map by calling 'cupsCharmapFree()'>
-     <Return length of output legacy charset string -- size in bytes>
-
-
- 3.1.2.2. cupsCharsetToUtf8()
-
- extern int cupsCharsetToUtf8(utf8_t *dest,      /* O - Target string */
-         const char *src,                        /* I - Source string */
-         const int maxout,                       /* I - Max output */
-         cups_encoding_t encoding);              /* I - Encoding */
-
-     <Find charset map by calling 'cupsCharmapGet()'>
-     <Convert input legacy charset to internal UCS-4 via charset map>
-     <Convert internal UCS-4 to UTF-8 by calling 'cupsUtf32ToUtf8()'>
-     <Release charset map by calling 'cupsCharmapFree()'>
-     <Return length of output UTF-8 string -- size in bytes>
-
-
- 3.1.2.3. cupsUtf8ToUtf16()
-
- extern int cupsUtf8ToUtf16(utf16_t *dest,       /* O - Target string */
-         const utf8_t *src,                      /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
-     <...to avoid duplicate code to handle surrogate pairs...>
-     <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'>
-     <Convert internal UCS-4 to UTF-16 by calling 'cupsUtf32ToUtf16()'>
-     <Return count of output UTF-16 string -- NOT memory size in bytes>
-
-
- 3.1.2.4. cupsUtf16ToUtf8()
-
- extern int cupsUtf16ToUtf8(utf8_t *dest,        /* O - Target string */
-         const utf16_t *src,                     /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
-     <...to avoid duplicate code to handle surrogate pairs...>
-     <Convert input UTF-16 to internal UCS-4 by calling 'cupsUtf16ToUtf32()'>
-     <Convert internal UCS-4 to UTF-8 by calling 'cupsUtf32ToUtf8()'>
-     <Return length of output UTF-8 string -- size in bytes>
-
-
- 3.1.2.5. cupsUtf8ToUtf32()
-
- extern int cupsUtf8ToUtf32(utf32_t *dest,       /* O - Target string */
-         const utf8_t *src,                      /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
-     <Convert input UTF-8 directly to output UCS-4...>
-     <...checking for valid range, shortest-form, etc.>
-     <Return count of output UTF-32 string -- NOT memory size in bytes>
-
-
- 3.1.2.6. cupsUtf32ToUtf8()
-
- extern int cupsUtf32ToUtf8(utf8_t *dest,        /* O - Target string */
-         const utf32_t *src,                     /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
-     <Convert input UCS-4 directly to output UTF-8...>
-     <...checking for valid range, etc.>
-     <Return length of output UTF-8 string -- size in bytes>
-
-
- 3.1.2.7. cupsUtf16ToUtf32()
-
- extern int cupsUtf16ToUtf32(utf32_t *dest,      /* O - Target string */
-         const utf16_t *src,                     /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
-     <Convert input UTF-16 directly to output UCS-4...>
-     <...handling surrogate pairs decoding from UTF-16>
-     <Return count of output UTF-32 string -- NOT memory size in bytes>
-
-
- 3.1.2.8. cupsUtf32ToUtf16()
-
- extern int cupsUtf32ToUtf16(utf16_t *dest,      /* O - Target string */
-         const utf32_t *src,                     /* I - Source string */
-         const int maxout);                      /* I - Max output */
-
-     <Convert input UCS-4 directly to output UTF-16...>
-     <...handling surrogate pairs encoding to UTF-16>
-     <Return count of output UTF-16 string -- NOT memory size in bytes>
-
-
- 3.1.2.9. Transcoding Utility Functions
-
- The transcoding utility functions are used to load (from a file into
- memory), free (logically, without freeing memory), and flush (actually
- free memory) character maps for SBCS (single-byte character set) and
- (future) DBCS (double-byte character set) transcoding to and from UTF-8.
-
-
- 3.1.2.9.1. cupsCharmapGet()
-
- extern void *cupsCharmapGet(const cups_encoding_t encoding);
-                                         /* I - Encoding */
-
-     <Find SBCS or DBCS charset map in cache>
-     <...If found, increment 'used'>
-     <...and return pointer to SBCS or DBCS charset map>
-     <Get charset map file name by calling 'cupsEncodingName()'>
-     <Open charset map file>
-     <...If not found, return NULL>
-     <Allocate memory for SBCS or DBCS charset map in cache>
-     <...If no memory, return NULL>
-     <Add to SBCS or DBCS cache by assigning 'next' field>
-     <Assign 'encoding' field>
-     <Increment 'used' field>
-     <Read charset map file into memory in loop...>
-     <If SBCS, then 'char2uni[]' is an array of 'ucs2_t' values>
-     <...and 'uni2char[]' is an array of pointers to 'sbcs_t' arrays>
-     <If DBCS, then 'char2uni[]' is an array of pointers to 'ucs2_t' arrays>
-     <...and 'uni2char[]' is an array of pointers to 'dbcs_t' arrays>
-     <Close charset map file>
-     <Return pointer to SBCS or DBCS charset map>
-
-
- 3.1.2.9.2. cupsCharmapFree()
-
- extern void cupsCharmapFree(const cups_encoding_t encoding);
-                                         /* I - Encoding */
-
-     <Find SBCS or DBCS charset map in cache>
-     <...If found, decrement 'used'>
-     <Return void>
-
-
- 3.1.2.9.3. cupsCharmapFlush()
-
- extern void cupsCharmapFlush(void);
-
-     <Loop through SBCS charset map cache...>
-     <...Free 'uni2char[]' memory>
-     <...Free SBCS charset map memory>
-     <Loop through DBCS charset map cache...>
-     <...Free 'char2uni[]' memory>
-     <...Free 'uni2char[]' memory>
-     <...Free DBCS charset map memory>
-     <Return void>
-
-
- 3.2. Normalization - New
-
-
- 3.2.1. normalize.h - Normalization header
-
- /*
-  * "$Id: i18n_sdd.txt 2678 2002-08-19 01:15:26Z mike $"
-  *
-  * Unicode normalization for the Common UNIX Printing System (CUPS).
-  *
-  * Copyright 1997-2002 by Easy Software Products.
-  *
-  * These coded instructions, statements, and computer programs are
-  * the property of Easy Software Products and are protected by Federal
-  * copyright law. Distribution and use rights are outlined in the
-  * file "LICENSE.txt" which should have been included with this file.
-  * If this file is missing or damaged please contact Easy Software
-  * Products at:
-  *
-  * Attn: CUPS Licensing Information
-  * Easy Software Products
-  * 44141 Airport View Drive, Suite 204
-  * Hollywood, Maryland 20636-3111 USA
-  *
-  * Voice: (301) 373-9603
-  * EMail: cups-info@cups.org
-  * WWW: http://www.cups.org
-  */
-
- #ifndef _CUPS_NORMALIZE_H_
- # define _CUPS_NORMALIZE_H_
-
- /*
-  * Include necessary headers...
-  */
-
- # include "transcode.h"
-
- # ifdef __cplusplus
- extern "C" {
- # endif /* __cplusplus */
-
- /*
-  * Types...
-  */
-
- typedef enum                       /**** Normalization Types ****/
- {
-   CUPS_NORM_NFD,                   /* Canonical Decomposition */
-   CUPS_NORM_NFKD,                  /* Compatibility Decomposition */
-   CUPS_NORM_NFC,                   /* NFD, then Canonical Composition */
-   CUPS_NORM_NFKC                   /* NFKD, then Canonical Composition */
- } cups_normalize_t;
-
- typedef enum                       /**** Case Folding Types ****/
- {
-   CUPS_FOLD_SIMPLE,                /* Simple - no expansion in size */
-   CUPS_FOLD_FULL                   /* Full - possible expansion in size */
- } cups_folding_t;
-
- typedef enum                       /**** Unicode Char Property Types ****/
- {
-   CUPS_PROP_GENERAL_CATEGORY,      /* See 'cups_gencat_t' enum */
-   CUPS_PROP_BIDI_CATEGORY,         /* See 'cups_bidicat_t' enum */
-   CUPS_PROP_COMBINING_CLASS,       /* See 'cups_combclass_t' type */
-   CUPS_PROP_BREAK_CLASS            /* See 'cups_breakclass_t' enum */
- } cups_property_t;
-
- /*
-  * Note - parse Unicode char general category from 'UnicodeData.txt'
-  * into sparse local table in 'normalize.c'.
-  * Use major classes for logic optimizations throughout (by mask).
-  */
-
- typedef enum                       /**** Unicode General Category ****/
- {
-   CUPS_GENCAT_L = 0x10,            /* Letter major class */
-   CUPS_GENCAT_LU = 0x11,           /* Lu Letter, Uppercase */
-   CUPS_GENCAT_LL = 0x12,           /* Ll Letter, Lowercase */
-   CUPS_GENCAT_LT = 0x13,           /* Lt Letter, Titlecase */
-   CUPS_GENCAT_LM = 0x14,           /* Lm Letter, Modifier */
-   CUPS_GENCAT_LO = 0x15,           /* Lo Letter, Other */
-   CUPS_GENCAT_M = 0x20,            /* Mark major class */
-   CUPS_GENCAT_MN = 0x21,           /* Mn Mark, Non-Spacing */
-   CUPS_GENCAT_MC = 0x22,           /* Mc Mark, Spacing Combining */
-   CUPS_GENCAT_ME = 0x23,           /* Me Mark, Enclosing */
-   CUPS_GENCAT_N = 0x30,            /* Number major class */
-   CUPS_GENCAT_ND = 0x31,           /* Nd Number, Decimal Digit */
-   CUPS_GENCAT_NL = 0x32,           /* Nl Number, Letter */
-   CUPS_GENCAT_NO = 0x33,           /* No Number, Other */
-   CUPS_GENCAT_P = 0x40,            /* Punctuation major class */
-   CUPS_GENCAT_PC = 0x41,           /* Pc Punctuation, Connector */
-   CUPS_GENCAT_PD = 0x42,           /* Pd Punctuation, Dash */
-   CUPS_GENCAT_PS = 0x43,           /* Ps Punctuation, Open (start) */
-   CUPS_GENCAT_PE = 0x44,           /* Pe Punctuation, Close (end) */
-   CUPS_GENCAT_PI = 0x45,           /* Pi Punctuation, Initial Quote */
-   CUPS_GENCAT_PF = 0x46,           /* Pf Punctuation, Final Quote */
-   CUPS_GENCAT_PO = 0x47,           /* Po Punctuation, Other */
-   CUPS_GENCAT_S = 0x50,            /* Symbol major class */
-   CUPS_GENCAT_SM = 0x51,           /* Sm Symbol, Math */
-   CUPS_GENCAT_SC = 0x52,           /* Sc Symbol, Currency */
-   CUPS_GENCAT_SK = 0x53,           /* Sk Symbol, Modifier */
-   CUPS_GENCAT_SO = 0x54,           /* So Symbol, Other */
-   CUPS_GENCAT_Z = 0x60,            /* Separator major class */
-   CUPS_GENCAT_ZS = 0x61,           /* Zs Separator, Space */
-   CUPS_GENCAT_ZL = 0x62,           /* Zl Separator, Line */
-   CUPS_GENCAT_ZP = 0x63,           /* Zp Separator, Paragraph */
-   CUPS_GENCAT_C = 0x70,            /* Other (miscellaneous) major class */
-   CUPS_GENCAT_CC = 0x71,           /* Cc Other, Control */
-   CUPS_GENCAT_CF = 0x72,           /* Cf Other, Format */
-   CUPS_GENCAT_CS = 0x73,           /* Cs Other, Surrogate */
-   CUPS_GENCAT_CO = 0x74,           /* Co Other, Private Use */
-   CUPS_GENCAT_CN = 0x75            /* Cn Other, Not Assigned */
- } cups_gencat_t;
-
- /*
-  * Note - parse Unicode char bidi category from 'UnicodeData.txt'
-  * into sparse local table in 'normalize.c'.
-  * Add bidirectional support to 'textcommon.c' - per Mike
-  */
-
- typedef enum                       /**** Unicode Bidi Category ****/
- {
-   CUPS_BIDI_L,    /* Left-to-Right (Alpha, Syllabic, Ideographic) */
-   CUPS_BIDI_LRE,  /* Left-to-Right Embedding (explicit) */
-   CUPS_BIDI_LRO,  /* Left-to-Right Override (explicit) */
-   CUPS_BIDI_R,    /* Right-to-Left (Hebrew alphabet and most punct) */
-   CUPS_BIDI_AL,   /* Right-to-Left Arabic (Arabic, Thaana, Syriac) */
-   CUPS_BIDI_RLE,  /* Right-to-Left Embedding (explicit) */
-   CUPS_BIDI_RLO,  /* Right-to-Left Override (explicit) */
-   CUPS_BIDI_PDF,  /* Pop Directional Format */
-   CUPS_BIDI_EN,   /* Euro Number (Euro and East Arabic-Indic digits) */
-   CUPS_BIDI_ES,   /* Euro Number Separator (Slash) */
-   CUPS_BIDI_ET,   /* Euro Number Terminator (Plus, Minus, Degree, etc) */
-   CUPS_BIDI_AN,   /* Arabic Number (Arabic-Indic digits, separators) */
-   CUPS_BIDI_CS,   /* Common Number Separator (Colon, Comma, Dot, etc) */
-   CUPS_BIDI_NSM,  /* Non-Spacing Mark (category Mn / Me in UCD) */
-   CUPS_BIDI_BN,   /* Boundary Neutral (Formatting / Control chars) */
-   CUPS_BIDI_B,    /* Paragraph Separator */
-   CUPS_BIDI_S,    /* Segment Separator (Tab) */
-   CUPS_BIDI_WS,   /* Whitespace Space (Space, Line Separator, etc) */
-   CUPS_BIDI_ON    /* Other Neutrals */
- } cups_bidicat_t;
-
- /*
-  * Note - parse Unicode line break class from 'DerivedLineBreak.txt'
-  * into sparse local table (list of class ranges) in 'normalize.c'.
-  * Note - add state table from UAX-14, section 7.3 - Ira
-  * Remember to do BK and SP in outer loop (not in state table).
-  * Consider optimization for CM (combining mark).
-  * See 'LineBreak.txt' (12,875) and 'DerivedLineBreak.txt' (1,350).
-  */
-
- typedef enum                       /**** Unicode Line Break Class ****/
- {
-   /*
-    * (A) - Allow Break AFTER
-    * (XA) - Prevent Break AFTER
-    * (B) - Allow Break BEFORE
-    * (XB) - Prevent Break BEFORE
-    * (P) - Allow Break For Pair
-    * (XP) - Prevent Break For Pair
-    */
-   CUPS_BREAK_AI,  /* Ambiguous (Alphabetic or Ideograph) */
-   CUPS_BREAK_AL,  /* Ordinary Alphabetic / Symbol Chars (XP) */
-   CUPS_BREAK_BA,  /* Break Opportunity After Chars (A) */
-   CUPS_BREAK_BB,  /* Break Opportunities Before Chars (B) */
-   CUPS_BREAK_B2,  /* Break Opportunity Before / After (B/A/XP) */
-   CUPS_BREAK_BK,  /* Mandatory Break (A) (normative) */
-   CUPS_BREAK_CB,  /* Contingent Break (B/A) (normative) */
-   CUPS_BREAK_CL,  /* Closing Punctuation (XB) */
-   CUPS_BREAK_CM,  /* Attached Chars / Combining (XB) (normative) */
-   CUPS_BREAK_CR,  /* Carriage Return (A) (normative) */
-   CUPS_BREAK_EX,  /* Exclamation / Interrogation (XB) */
-   CUPS_BREAK_GL,  /* Non-breaking ("Glue") (XB/XA) (normative) */
-   CUPS_BREAK_HY,  /* Hyphen (XA) */
-   CUPS_BREAK_ID,  /* Ideographic (B/A) */
-   CUPS_BREAK_IN,  /* Inseparable chars (XP) */
-   CUPS_BREAK_IS,  /* Numeric Separator (Infix) (XB) */
-   CUPS_BREAK_LF,  /* Line Feed (A) (normative) */
-   CUPS_BREAK_NS,  /* Non-starters (XB) */
-   CUPS_BREAK_NU,  /* Numeric (XP) */
-   CUPS_BREAK_OP,  /* Opening Punctuation (XA) */
-   CUPS_BREAK_PO,  /* Postfix (Numeric) (XB) */
-   CUPS_BREAK_PR,  /* Prefix (Numeric) (XA) */
-   CUPS_BREAK_QU,  /* Ambiguous Quotation (XB/XA) */
-   CUPS_BREAK_SA,  /* Context Dependent (South East Asian) (P) */
-   CUPS_BREAK_SG,  /* Surrogates (XP) (normative) */
-   CUPS_BREAK_SP,  /* Space (A) (normative) */
-   CUPS_BREAK_SY,  /* Symbols Allowing Break After (A) */
-   CUPS_BREAK_XX,  /* Unknown (XP) */
-   CUPS_BREAK_ZW   /* Zero Width Space (A) (normative) */
- } cups_breakclass_t;
-
- typedef int cups_combclass_t;      /**** Unicode Combining Class ****/
-                                    /* 0=base / 1..254=combining char */
- - /* - * Structures... - */ - - typedef struct cups_normmap_str /**** Normalize Map Cache Struct ****/ - { - struct cups_normmap_str *next; /* Next normalize in cache */ - - McDonald June 20, 2002 [Page 18] - - CUPS Internationalization Software Design Description v0.3 - - int used; /* Number of times entry used */ - cups_normalize_t normalize; /* Normalization type */ - int normcount; /* Count of Source Chars */ - ucs2_t *uni2norm; /* Char -> Normalization */ - /* ...only supports UCS-2 */ - } cups_normmap_t; - - typedef struct cups_foldmap_str /**** Case Fold Map Cache Struct ****/ - { - struct cups_foldmap_str *next; /* Next case fold in cache */ - int used; /* Number of times entry used */ - cups_folding_t fold; /* Case folding type */ - int foldcount; /* Count of Source Chars */ - ucs2_t *uni2fold; /* Char -> Folded Char(s) */ - /* ...only supports UCS-2 */ - } cups_foldmap_t; - - typedef struct cups_prop_str /**** Char Property Struct ****/ - { - ucs2_t ch; /* Unicode Char as UCS-2 */ - unsigned char gencat; /* General Category */ - unsigned char bidicat; /* Bidirectional Category */ - } cups_prop_t; - - typedef struct /**** Char Property Map Struct ****/ - { - int used; /* Number of times entry used */ - int propcount; /* Count of Source Chars */ - cups_prop_t *uni2prop; /* Char -> Properties */ - } cups_propmap_t; - - typedef struct /**** Line Break Class Map Struct ****/ - { - int used; /* Number of times entry used */ - int breakcount; /* Count of Source Chars */ - ucs2_t *uni2break; /* Char -> Line Break Class */ - } cups_breakmap_t; - - typedef struct cups_comb_str /**** Char Combining Class Struct ****/ - { - ucs2_t ch; /* Unicode Char as UCS-2 */ - unsigned char combclass; /* Combining Class */ - unsigned char reserved; /* Reserved for alignment */ - } cups_comb_t; - - typedef struct /**** Combining Class Map Struct ****/ - { - int used; /* Number of times entry used */ - int combcount; /* Count of Source Chars */ - cups_comb_t *uni2comb; /* Char -> 
Combining Class */ - } cups_combmap_t; - - - McDonald June 20, 2002 [Page 19] - - CUPS Internationalization Software Design Description v0.3 - - - /* - * Globals... - */ - - extern int NzSupportUcs2; /* Support UCS-2 (16-bit) mapping */ - extern int NzSupportUcs4; /* Support UCS-4 (32-bit) mapping */ - - /* - * Prototypes... - */ - - /* - * Utility functions for normalization module - */ - extern int cupsNormalizeMapsGet(void); - extern int cupsNormalizeMapsFree(void); - extern void cupsNormalizeMapsFlush(void); - - /* - * Normalize UTF-8 string to Unicode UAX-15 Normalization Form - * Note - Compatibility Normalization Forms (NFKD/NFKC) are - * unsafe for subsequent transcoding to legacy charsets - */ - extern int cupsUtf8Normalize(utf8_t *dest, /* O - Target string */ - const utf8_t *src, /* I - Source string */ - const int maxout, /* I - Max output */ - const cups_normalize_t normalize); - /* I - Normalization */ - - /* - * Normalize UTF-32 string to Unicode UAX-15 Normalization Form - * Note - Compatibility Normalization Forms (NFKD/NFKC) are - * unsafe for subsequent transcoding to legacy charsets - */ - extern int cupsUtf32Normalize(utf32_t *dest, - /* O - Target string */ - const utf32_t *src, /* I - Source string */ - const int maxout, /* I - Max output */ - const cups_normalize_t normalize); - /* I - Normalization */ - - /* - * Case Fold UTF-8 string per Unicode UAX-21 Section 2.3 - * Note - Case folding output is - * unsafe for subsequent transcoding to legacy charsets - */ - extern int cupsUtf8CaseFold(utf8_t *dest, /* O - Target string */ - const utf8_t *src, /* I - Source string */ - const int maxout, /* I - Max output */ - const cups_folding_t fold); /* I - Fold Mode */ - - - McDonald June 20, 2002 [Page 20] - - CUPS Internationalization Software Design Description v0.3 - - - /* - * Case Fold UTF-32 string per Unicode UAX-21 Section 2.3 - * Note - Case folding output is - * unsafe for subsequent transcoding to legacy charsets - */ - extern int 
cupsUtf32CaseFold(utf32_t *dest,/* O - Target string */ - const utf32_t *src, /* I - Source string */ - const int maxout, /* I - Max output */ - const cups_folding_t fold); /* I - Fold Mode */ - - /* - * Compare UTF-8 strings after case folding - */ - extern int cupsUtf8CompareCaseless(const utf8_t *s1, - /* I - String1 */ - const utf8_t *s2); /* I - String2 */ - - /* - * Compare UTF-32 strings after case folding - */ - extern int cupsUtf32CompareCaseless(const utf32_t *s1, - /* I - String1 */ - const utf32_t *s2); /* I - String2 */ - - /* - * Compare UTF-8 strings after case folding and NFKC normalization - */ - extern int cupsUtf8CompareIdentifier(const utf8_t *s1, - /* I - String1 */ - const utf8_t *s2); /* I - String2 */ - - /* - * Compare UTF-32 strings after case folding and NFKC normalization - */ - extern int cupsUtf32CompareIdentifier(const utf32_t *s1, - /* I - String1 */ - const utf32_t *s2); /* I - String2 */ - - /* - * Get UTF-32 character property - */ - extern int cupsUtf32CharacterProperty(const utf32_t ch, - /* I - Source char */ - const cups_property_t property); - /* I - Char Property */ - - # ifdef __cplusplus - } - # endif /* __cplusplus */ - - #endif /* !_CUPS_NORMALIZE_H_ */ - - McDonald June 20, 2002 [Page 21] - - CUPS Internationalization Software Design Description v0.3 - - - /* - * End of "$Id: i18n_sdd.txt 2678 2002-08-19 01:15:26Z mike $" - */ - - - - 3.2.1.1. cups_normmap_t - Normalize Map Structure - - typedef struct cups_normmap_str /**** Normalize Map Cache Struct ****/ - { - struct cups_normmap_str *next; /* Next normalize in cache */ - int used; /* Number of times entry used */ - cups_normalize_t normalize; /* Normalization type */ - int normcount; /* Count of Source Chars */ - ucs2_t *uni2norm; /* Char -> Normalization */ - /* ...only supports UCS-2 */ - } cups_normmap_t; - - 'uni2norm' is a pointer to an array of _triplets_ of UCS-2 values. - 'normcount' is a count of _triplets_ in the 'uni2norm[]' array. 
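A minimal sketch of how a triplet array such as 'uni2norm[]' might be searched with 'bsearch()', keyed on the first value of each triplet. The 'ucs2_t' typedef, the requirement that the array is sorted by the first value of each triplet, and the names 'compare_triplet()' and 'find_triplet()' are assumptions for illustration only and are not part of this design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned short ucs2_t;          /* Assumption: 16-bit UCS-2 unit */

/*
 * Compare a UCS-2 key against the FIRST value of a triplet
 * (hypothetical helper, not defined by this document).
 */
static int
compare_triplet(const void *key, const void *elem)
{
  const ucs2_t *ch      = (const ucs2_t *)key;
  const ucs2_t *triplet = (const ucs2_t *)elem;

  return ((int)*ch - (int)triplet[0]);
}

/*
 * Find the triplet whose first value is 'ch', or NULL if there is none.
 * Assumes 'uni2norm' is sorted by the first value of each triplet and
 * that 'normcount' counts triplets (not individual UCS-2 values).
 */
static const ucs2_t *
find_triplet(const ucs2_t *uni2norm, int normcount, ucs2_t ch)
{
  assert(uni2norm != NULL && normcount >= 0);

  return ((const ucs2_t *)bsearch(&ch, uni2norm, (size_t)normcount,
                                  3 * sizeof(ucs2_t), compare_triplet));
}
```

With a map holding the two triplets (0x00C0, 0x0041, 0x0300) and (0x00E9, 0x0065, 0x0301), a lookup of 0x00E9 returns a pointer to the second triplet, and a lookup of 0x0041 returns NULL because only the first value of each triplet is used as the key.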
- - For decompositions (NFD and NFKD), the triplets are: composed base - character, decomposed base character, and decomposed accent character. - These are used by 'cupsUtf8Normalize()' and 'cupsUtf32Normalize()' in - performing canonical (NFD) or compatibility (NFKD) decomposition. - - For compositions (NFC and NFKC), the triplets are: decomposed base - character, decomposed accent character, and composed base character. - These are used by 'cupsUtf8Normalize()' and 'cupsUtf32Normalize()' in - performing canonical composition (for NFC or NFKC). - - - - 3.2.1.2. cups_foldmap_t - Case Fold Map Structure - - typedef struct cups_foldmap_str /**** Case Fold Map Cache Struct ****/ - { - struct cups_foldmap_str *next; /* Next case fold in cache */ - int used; /* Number of times entry used */ - cups_folding_t fold; /* Case folding type */ - int foldcount; /* Count of Source Chars */ - ucs2_t *uni2fold; /* Char -> Folded Char(s) */ - /* ...only supports UCS-2 */ - } cups_foldmap_t; - - 'uni2fold' is a pointer to an array of _quadruplets_ of UCS-2 values. - 'foldcount' is a count of _quadruplets_ in the 'uni2fold[]' array. - - For simple case folding (without expansion of the size of the output - string), the quadruplets are: input base character, output case folded - character, zero (unused), and zero (unused). - - - McDonald June 20, 2002 [Page 22] - - CUPS Internationalization Software Design Description v0.3 - - - For full case folding (with possible expansion of the size of the output - string), the quadruplets are: input base character, output case folded - character, second output character or zero, third output character or - zero. - - - - 3.2.1.3. cups_propmap_t - Char Property Map Structure - - typedef struct /**** Char Property Map Struct ****/ - { - int used; /* Number of times entry used */ - int propcount; /* Count of Source Chars */ - cups_prop_t *uni2prop; /* Char -> Properties */ - } cups_propmap_t; - - 'uni2prop' is a pointer to an array of 'cups_prop_t' (see below). 
- 'propcount' is a count of elements in the 'uni2prop[]' array. - - - - 3.2.1.4. cups_prop_t - Char Property Structure - - typedef struct cups_prop_str /**** Char Property Struct ****/ - { - ucs2_t ch; /* Unicode Char as UCS-2 */ - unsigned char gencat; /* General Category */ - unsigned char bidicat; /* Bidirectional Category */ - } cups_prop_t; - - - - 3.2.1.5. cups_breakmap_t - Line Break Map Structure - - typedef struct /**** Line Break Class Map Struct ****/ - { - int used; /* Number of times entry used */ - int breakcount; /* Count of Source Chars */ - ucs2_t *uni2break; /* Char -> Line Break Class */ - } cups_breakmap_t; - - 'uni2break' is a pointer to an array of _triplets_ of UCS-2 values. - 'breakcount' is a count of _triplets_ in the 'uni2break[]' array. - - The triplets in 'uni2break' are: first UCS-2 value in a range, last - UCS-2 value in a range, and line break class stored as UCS-2. - - - - - - - McDonald June 20, 2002 [Page 23] - - CUPS Internationalization Software Design Description v0.3 - - - - 3.2.1.6. cups_combmap_t - Combining Class Map Structure - - typedef struct /**** Combining Class Map Struct ****/ - { - int used; /* Number of times entry used */ - int combcount; /* Count of Source Chars */ - cups_comb_t *uni2comb; /* Char -> Combining Class */ - } cups_combmap_t; - - 'uni2comb' is a pointer to an array of 'cups_comb_t' (see below). - 'combcount' is a count of elements in the 'uni2comb[]' array. - - - - 3.2.1.7. cups_comb_t - Combining Class Structure - - typedef struct cups_comb_str /**** Char Combining Class Struct ****/ - { - unsigned short ch; /* Unicode Char as UCS-2 */ - unsigned char combclass; /* Combining Class */ - unsigned char reserved; /* Reserved for alignment */ - } cups_comb_t; - - - - 3.2.2. 
normalize.c - Normalization module - - The normalization function 'cupsUtf8Normalize()' and the case folding - function 'cupsUtf8CaseFold()' are modelled on the C standard library - function 'strncpy()', except that they return the count of the output, - like 'strlen()', rather than the (redundant) pointer to the output. - - If the normalization or case folding functions detect invalid input - parameters or they detect an encoding error in their input, then they - return '-1', rather than the count of output. - - The normalization and case folding functions take an input parameter - indicating the maximum output units (for safe operation). - - - - 3.2.2.1. cupsUtf8Normalize() - - /* - * Normalize UTF-8 string to Unicode UAX-15 Normalization Form - * Note - Compatibility Normalization Forms (NFKD/NFKC) are - * unsafe for subsequent transcoding to legacy charsets - */ - extern int cupsUtf8Normalize(utf8_t *dest, /* O - Target string */ - const utf8_t *src, /* I - Source string */ - - McDonald June 20, 2002 [Page 24] - - CUPS Internationalization Software Design Description v0.3 - - const int maxout, /* I - Max output */ - const cups_normalize_t normalize); - /* I - Normalization */ - - <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'> - <Normalize by calling 'cupsUtf32Normalize()'> - <Convert normalized UCS-4 to UTF-8 by calling 'cupsUtf32ToUtf8()'> - <Return length of output UTF-8 string -- size in bytes> - - - - 3.2.2.2. 
cupsUtf32Normalize() - - extern int cupsUtf32Normalize(utf32_t *dest, - /* O - Target string */ - const utf32_t *src, /* I - Source string */ - const int maxout, /* I - Max output */ - const cups_normalize_t normalize); - /* I - Normalization */ - - <Find normalize maps by calling 'cupsNormalizeMapsGet()'> - <...if not found, return '-1'> - <Repeatedly traverse internal UCS-4, decomposing (NFD or NFKD)...> - <...with 'bsearch()' of 'uni2norm[]' using local 'compare_decompose()'> - <...until one pass yields no further decomposition> - <Repeatedly traverse internal UCS-4, doing canonical reordering> - <...with 'bsearch()' of 'uni2comb[]' using local 'compare_combchar()'> - <...until one pass yields no further canonical reordering> - <If 'normalize' requests composition (NFC or NFKC)...> - <...repeatedly traverse internal UCS-4, composing (NFC or NFKC)...> - <...with 'bsearch()' of 'uni2norm[]' using local 'compare_compose()'> - <...until one pass yields no further composition> - <Release normalize maps by calling 'cupsNormalizeMapsFree()'> - <Return count of output UTF-32 string -- NOT memory size in bytes> - - - - 3.2.2.3. 
cupsUtf8CaseFold() - - /* - * Case Fold UTF-8 string per Unicode UAX-21 Section 2.3 - * Note - Case folding output is - * unsafe for subsequent transcoding to legacy charsets - */ - extern int cupsUtf8CaseFold(utf8_t *dest, /* O - Target string */ - const utf8_t *src, /* I - Source string */ - const int maxout, /* I - Max output */ - const cups_folding_t fold); /* I - Fold Mode */ - - <Find normalize maps by calling 'cupsNormalizeMapsGet()'> - <...if not found, return '-1'> - <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'> - - McDonald June 20, 2002 [Page 25] - - CUPS Internationalization Software Design Description v0.3 - - <Case fold internal UCS-4 by calling 'cupsUtf32CaseFold()'> - <Convert internal UCS-4 to output UTF-8 by calling 'cupsUtf32ToUtf8()'> - <Release normalize maps by calling 'cupsNormalizeMapsFree()'> - <Return length of output UTF-8 string -- size in bytes> - - - - 3.2.2.4. cupsUtf32CaseFold() - - /* - * Case Fold UTF-32 string per Unicode UAX-21 Section 2.3 - * Note - Case folding output is - * unsafe for subsequent transcoding to legacy charsets - */ - extern int cupsUtf32CaseFold(utf32_t *dest, /* O - Target string */ - const utf32_t *src, /* I - Source string */ - const int maxout, /* I - Max output */ - const cups_folding_t fold); /* I - Fold Mode */ - - <Find case fold maps by calling 'cupsNormalizeMapsGet()'> - <...if not found, return '-1'> - <Traverse internal UCS-4 once, performing case folding...> - <...with 'bsearch()' of 'uni2fold[]' using local 'compare_foldchar()'> - <Copy internal UCS-4 to output UTF-32 string> - <Release normalize maps by calling 'cupsNormalizeMapsFree()'> - <Return count of output UTF-32 string -- NOT memory size in bytes> - - - - 3.2.2.5. 
cupsUtf8CompareCaseless() - - /* - * Compare UTF-8 strings after case folding - */ - extern int cupsUtf8CompareCaseless(const utf8_t *s1, - /* I - String1 */ - const utf8_t *s2); /* I - String2 */ - - <Case fold both input UTF-8 strings by calling 'cupsUtf8CaseFold()'> - <Return compare of case folded first and second strings> - - - - 3.2.2.6. cupsUtf32CompareCaseless() - - /* - * Compare UTF-32 strings after case folding - */ - extern int cupsUtf32CompareCaseless(const utf32_t *s1, - /* I - String1 */ - const utf32_t *s2); /* I - String2 */ - - <Case fold both input UTF-32 strings by calling 'cupsUtf32CaseFold()'> - - McDonald June 20, 2002 [Page 26] - - CUPS Internationalization Software Design Description v0.3 - - <Return compare of case folded first and second strings> - - - - 3.2.2.7. cupsUtf8CompareIdentifier() - - /* - * Compare UTF-8 strings after case folding and NFKC normalization - */ - extern int cupsUtf8CompareIdentifier(const utf8_t *s1, - /* I - String1 */ - const utf8_t *s2); /* I - String2 */ - - <Convert input UTF-8 to internal UCS-4 by calling 'cupsUtf8ToUtf32()'> - <Case fold both strings by calling 'cupsUtf32CaseFold()'> - <Normalize both strings to NFKC by calling 'cupsUtf32Normalize()'> - <Return compare of case folded/normalized first and second strings> - - - - 3.2.2.8. cupsUtf32CompareIdentifier() - - /* - * Compare UTF-32 strings after case folding and NFKC normalization - */ - extern int cupsUtf32CompareIdentifier(const utf32_t *s1, - /* I - String1 */ - const utf32_t *s2); /* I - String2 */ - - <Case fold both strings by calling 'cupsUtf32CaseFold()'> - <Normalize both strings to NFKC by calling 'cupsUtf32Normalize()'> - <Return compare of case folded/normalized first and second strings> - - - - 3.2.2.9. 
cupsUtf32CharacterProperty() - - /* - * Get UTF-32 character property - */ - extern int cupsUtf32CharacterProperty(const utf32_t ch, - /* I - Source char */ - const cups_property_t property); - /* I - Char Property */ - - <Lookup UTF-32 character property in appropriate map...> <...internal - functions for each different map lookup> - - - - - - - McDonald June 20, 2002 [Page 27] - - CUPS Internationalization Software Design Description v0.3 - - - - 3.2.2.10. Normalization Utility Functions - - - - - 3.2.2.10.1. cupsNormalizeMapsGet() - - extern void cupsNormalizeMapsGet(void); - - <Find normalize maps in cache> - <...If found, increment 'used'> - <...and return void> - <For each map (normalization, case fold, combining class, etc.)...> - <Open (preprocessed form of) Unicode data file...> - <...If not found, return void> - <Count lines in preprocessed form, for mapping memory alloc> - <...Close (preprocessed form of) Unicode data file> - <Open (preprocessed form of) Unicode data file...> - <...If not found, return void> - <Allocate memory for appropriate map in cache...> - <...If no memory, return void> - <Add to appropriate cache by assigning 'next' field> - <Assign map type field and count field> - <Increment 'used' field> - <Read normalize map into memory in loop...> - <...Add values to 'uni2xxx[]' array> - <Close (preprocessed form of) Unicode data file> - <Return void> - - - - 3.2.2.10.2. cupsNormalizeMapsFree() - - extern void cupsNormalizeMapsFree(void); - - <Find normalize maps in cache> - <...If found, decrement 'used'> - <Return void> - - - - 3.2.2.10.3. 
cupsNormalizeMapsFlush() - - extern void cupsNormalizeMapsFlush(void); - - <Loop through normalize maps cache...> - <...Free 'uni2norm[]' memory> - <...Free normalize map memory> - <Loop through case folding cache...> - <...Free 'uni2fold[]' memory> - - McDonald June 20, 2002 [Page 28] - - CUPS Internationalization Software Design Description v0.3 - - <...Free case folding memory> - <Loop through char property map cache...> - <...Free 'uni2prop[]' memory> - <...Free char property map memory> - <Loop through line break class map cache...> - <...Free 'uni2break[]' memory> - <...Free line break class map memory> - <Loop through combining class map cache...> - <...Free 'uni2comb[]' memory> - <...Free combining class map memory> - <Return void> - - - - 3.3. Language - Existing - - - - 3.3.1. language.h - Language header - - Required Changes: - - (1) Change definition of 'cups_lang_t' to correct length of 'language[]' - to 32 characters per [RFC3066] and [ISO639-2] and [ISO3166-1]. - - - - 3.3.2. language.c - Language module - - - - 3.3.2.1. cupsLangEncoding() - Existing - - [No Change] - - - - 3.3.2.2. cupsLangFlush() - Existing - - [No Change] - - - - 3.3.2.3. cupsLangFree() - Existing - - [No Change] - - - - - - - - McDonald June 20, 2002 [Page 29] - - CUPS Internationalization Software Design Description v0.3 - - - - 3.3.2.4. cupsLangGet() - Existing - - Required Changes: - - (1) Change length of 'langname[]' and 'real[]' to 64 characters per - [RFC3066] and potential length of encoding (charset) names; - (2) Change language string normalization to support: - (a) 8-character language codes per [RFC3066] and 3-character - language codes per [ISO639-2]; - (b) 8-character country codes per [RFC3066] and 3-character country - codes per [ISO3166-1]; - (c) Support for 'i' (IANA registered) and 'x' (private) language - prefixes per [RFC3066]; - (d) Invariant use of 'utf-8' for encoding in message catalog, but - save actual requested encoding name for later use. 
- (3) Correct broken do/while statement for message catalog lookup (while - condition is _never_ satisfied). - - - - 3.3.2.5. cupsLangPrintf() - New - - extern int cupsLangPrintf(FILE *fp, /* I - File to write */ - const cups_lang_t *lang, /* I - Language/locale*/ - const cups_msg_t msg, /* I - Msg to format */ - ...); /* I - Args to format */ - - <Set up variable args by calling 'va_start()'> - <Format CUPS message with variable args by calling 'vsnprintf()'> - <Clean up variable args by calling 'va_end()'> - <Transcode CUPS message by calling 'cupsUtf8ToCharset()'> - <Write CUPS message by calling 'fputs()'> - <Return transcoded output CUPS message length> - - - - 3.3.2.6. cupsLangPuts() - New - - extern int cupsLangPuts(FILE *fp, /* I - File to write */ - const cups_lang_t *lang, /* I - Language/locale*/ - const cups_msg_t msg); /* I - Msg to write */ - - <Transcode CUPS message by calling 'cupsUtf8ToCharset()'> - <Write CUPS message by calling 'fputs()'> - <Return transcoded output CUPS message length> - - - - - - - McDonald June 20, 2002 [Page 30] - - CUPS Internationalization Software Design Description v0.3 - - - - 3.3.2.7. cupsEncodingName() - New - - extern char *cupsEncodingName(cups_encoding_t encoding); - - <Lookup encoding name in static 'lang_encodings[]' array> - <Return pointer to encoding name (charset map file name)> - - - - 3.4. Common Text Filter - Existing - - - - 3.4.1. textcommon.h - Common text filter header - - Required changes: - - (1) Revise 'lchar_t' as specified below, adding 'attrx' bit-mask for - selected Unicode character properties; - (2) Revise 'lchar_t' as specified below, adding 'comblen' and 'combch[]' - for Unicode combining/attached chars (accents); - (3) Add 'COMBLEN_MAX' limit as specified below; - (4) Add 'ATTRX_...' selected Unicode character properties as specified - below. - - - - 3.4.1.1. 
lchar_t - Character/Attribute Structure - - typedef struct lchar_str /**** Character / Attribute Structure ****/ - { - unsigned short ch; /* Unicode Char as UCS-2 */ - /* or 8/16-bit Legacy Char */ - unsigned short attr; /* Attributes of Char */ - unsigned short attrx; /* Extended Attributes */ - unsigned short comblen; /* Combining Char Count */ - unsigned short combch[8]; /* Combining Chars as UCS-2 */ - } lchar_t; - - 'ch' is a 16-bit UCS-2 character or an 8/16-bit legacy char. 'attr' is - the character attributes defined for the existing 'lchar_t' structure - (defined in 'textcommon.h'). 'attrx' is the extended character - attributes defined for future selected Unicode character properties (see - below). 'comblen' is the number of attached/combining characters. - 'combch' is an array of 16-bit UCS-2 attached/combining characters. - - Add to 'textcommon.h' constants: - - COMBLEN_MAX 8 - - - McDonald June 20, 2002 [Page 31] - - CUPS Internationalization Software Design Description v0.3 - - - ATTRX_RIGHT2LEFT 0x0001 - - - - 3.4.2. textcommon.c - Common text filter - - Required Changes: - - (1) Revise 'TextMain()' function as described below. - - - - 3.4.2.1. TextMain() - Existing - - Required Changes: - - [Ed Note: Pseudo code below needs more work on bidi handling.] 
- - (1) In main loop at the _beginning_ of the 'default' clause, add the - following code for combining marks: - lchar_t *cp; - - cp = Page[line]; - cp += column; - /* - * Check for Unicode combining mark (accent) - */ - if (UTF-8 && cupsUtf32CombiningClass(ch) > 0) - { - - /* - * Save Unicode combining mark in SAME character - * (discard mark if 'combch[]' is already full) - */ - if (cp->comblen >= COMBLEN_MAX) - break; - cp->combch[cp->comblen] = ch; - cp->comblen ++; - break; - } - - (2) In main loop _after_ combining chars section in 'default' clause, - add the following code for Unicode bidi control characters: - cups_bidicat_t bidicat; - - /* - * Check for Unicode bidi control character - */ - if (UTF-8) - { - bidicat = (cups_bidicat_t) - cupsUtf32CharacterProperty(ch, CUPS_PROP_BIDI_CATEGORY); - - McDonald June 20, 2002 [Page 32] - - CUPS Internationalization Software Design Description v0.3 - - if ((bidicat == CUPS_BIDI_LRE) /* Left-to-Right Embedding */ - || (bidicat == CUPS_BIDI_LRO) /* Left-to-Right Override */ - || (bidicat == CUPS_BIDI_RLE) /* Right-to-Left Embedding */ - || (bidicat == CUPS_BIDI_RLO) /* Right-to-Left Override */ - || (bidicat == CUPS_BIDI_PDF)) /* Pop Directional Format */ - { - /* Do bidi stuff here with memory for NEXT char's direction */ - /* Discard bidi control character and break */ - } - if ((bidicat == CUPS_BIDI_R) /* Right-to-Left Hebrew */ - || (bidicat == CUPS_BIDI_AL)) /* Right-to-Left Arabic */ - { - /* Set attrx for right-to-left */ - cp->attrx |= ATTRX_RIGHT2LEFT; - } - } - - - - 3.4.2.2. compare_keywords() - Existing - - [No Change] - - - - 3.4.2.3. getutf8() - Existing - - [No Change] - - [Ed Note: Future - allow 20-bit UTF-32 code points - requires updates - in both 'textcommon.c' and 'texttops.c' for extended PostScript.] - - - - 3.5. Text to PostScript Filter - Existing - - - - 3.5.1. texttops.c - Text to PostScript filter - - Required Changes: - - (1) Revise local 'write_string()' function as described below. - - - - 3.5.1.1. 
main() - Existing - - [No Change] - - - - - McDonald June 20, 2002 [Page 33] - - CUPS Internationalization Software Design Description v0.3 - - - - 3.5.1.2. WriteEpilogue () - Existing - - [No Change] - - - - 3.5.1.3. WritePage () - Existing - - [No Change] - - - - 3.5.1.4. WriteProlog () - Existing - - [No Change] - - - - 3.5.1.5. write_line() - Existing - - [No Change] - - - - 3.5.1.6. write_string() - Existing - - Required Changes: - - (1) At the _beginning_ of Multiple Fonts section, _replace_ the while() - loop and surrounding 'putchar()' calls with the following code: - - for (; len > 0; len --, s ++) - { - utf32_t decstr[COMBLEN_MAX * 2]; - utf32_t cmpstr[COMBLEN_MAX * 2]; - int cmplen; - int i; - - if (s->comblen == 0) - { - printf("<%04x>", Chars[s->ch]); - continue; - } - - /* - * Normalize decomposed Unicode character to NFKC - * (compatibility decomposition, then canonical composition) - */ - decstr[0] = (utf32_t) s->ch; - for (i = 0; i < s->comblen; i ++) - - McDonald June 20, 2002 [Page 34] - - CUPS Internationalization Software Design Description v0.3 - - decstr[i + 1] = (utf32_t) s->combch[i]; - decstr[i + 1] = 0; /* Terminate AFTER last combining char */ - cmplen = cupsUtf32Normalize (&cmpstr[0], - &decstr[0], COMBLEN_MAX * 2, CUPS_NORM_NFKC); - if (cmplen < 1) - continue; - - /* - * Write combining chars, then composed base, to same location - */ - for (i = 1; i < cmplen; i ++) - { - printf("<%04x>", Chars[(int) cmpstr[i]]); - /* - * Superimpose glyphs by backing up one column width - */ - printf (" -%.3f ", (72.0f / (float) CharsPerInch)); - } - printf("<%04x>", Chars[(int) cmpstr[0]]); - } - - [Ed Note: Future - Bidi support - When writing Unicode characters - (checking for explicit bidi) convert input string (lchar_t) to display - order???] - - - - 3.5.1.7. write_text() - Existing - - [No Change] - - - - - - - - - - - - - - - - - - - - - - - - McDonald June 20, 2002 [Page 35] - - CUPS Internationalization Software Design Description v0.3 - APPENDIX A - Glossary - - - - A. 
Glossary - - Abstract Character: A unit of information used for the organization, - control, or representation of textual data. - - Accent Mark: A mark placed above, below, or to the side of a character - to alter its phonetic value (also 'diacritic'). - - Alphabet: A collection of symbols that, in the context of a particular - written language, represent the sounds of that language. - - Base Character: A character that does not graphically combine with - preceding characters, and that is neither a control nor a format - character. - - Basic Multilingual Plane: The Unicode (or UCS) code values 0x0000 - through 0xFFFF, specified by [ISO10646] (also 'Plane 0'). - - BIDI: Abbreviation for Bidirectional, in reference to mixed - left-to-right and right-to-left text. - - Bidirectional Display: The process or result of mixing left-to-right - oriented text and right-to-left oriented text in a single line. - - Big-endian: A computer architecture that stores multiple-byte numerical - values with the most significant byte (MSB) values first. - - BMP: Abbreviation for Basic Multilingual Plane. - - BOM: Acronym for byte order mark (also 'ZWNBSP'). - - Byte Order Mark: The Unicode character U+FEFF Zero Width No-Break Space - (ZWNBSP) when used to indicate the byte order of text. - - Canonical: (1) Conforming to the general rules for encoding -- that is, - not compressed, compacted, or in any other form specified by a higher - protocol. (2) Characteristic of a normative mapping and form of - equivalence. - - Canonical Decomposition: The decomposition of a character that results - from recursively applying the canonical mappings defined in the Unicode - Character Database until no characters can be further decomposed, then - reordering nonspacing marks according to section 3.10 of [UNICODE3.2]. - - Canonical Equivalent: Two characters are canonical equivalents if their - full canonical decompositions are identical. 
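As an illustration of the 'Byte Order Mark' and 'Big-endian' entries above, the following is a minimal sketch of detecting the byte order of 16-bit Unicode text from its first two bytes. The names 'bom16_t' and 'detect_bom16()' are hypothetical and are not part of CUPS. The sketch considers only 16-bit text; the byte pair 0xFF 0xFE also begins the 32-bit little-endian form of U+FEFF, which is not distinguished here.

```c
#include <assert.h>
#include <stddef.h>

typedef enum                            /**** Hypothetical BOM result ****/
{
  BOM_NONE,                             /* No byte order mark found */
  BOM_BIG_ENDIAN,                       /* Bytes 0xFE 0xFF (MSB first) */
  BOM_LITTLE_ENDIAN                     /* Bytes 0xFF 0xFE (LSB first) */
} bom16_t;

/*
 * Inspect the first two bytes of a buffer for U+FEFF serialized as
 * 16-bit text in either byte order.
 */
static bom16_t
detect_bom16(const unsigned char *buf, size_t len)
{
  assert(buf != NULL || len == 0);

  if (len < 2)
    return (BOM_NONE);

  if (buf[0] == 0xFE && buf[1] == 0xFF)
    return (BOM_BIG_ENDIAN);

  if (buf[0] == 0xFF && buf[1] == 0xFE)
    return (BOM_LITTLE_ENDIAN);

  return (BOM_NONE);
}
```

A caller would typically skip the two BOM bytes once the byte order is known and then read the remaining 16-bit values in that order.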
- - Case: (1) Feature of certain alphabets where the letters have two - - McDonald June 20, 2002 [Page A-1] - - CUPS Internationalization Software Design Description v0.3 - APPENDIX A - Glossary - - distinct forms. These variants are called the 'uppercase' letter (also - known as 'capital' or 'majuscule') and the 'lowercase' letter (also - known as 'small' or 'minuscule'). (2) Normative property of Unicode - characters, consisting of uppercase, lowercase, and titlecase. - - Character: (1) The smallest component of written language that has - semantic value; refers to the abstract meaning and/or shape, rather than - a specific shape (see also 'glyph'). (2) Synonym for 'abstract - character'. (3) The basic unit of encoding for the Unicode character - encoding. (4) The English name for the ideographic written elements of - Chinese origin (see 'ideograph'). - - Character Encoding Form (CEF): Mapping from a character set definition - to the actual bits used to represent the data. - - Character Encoding Scheme (CES): A 'character encoding form' plus byte - serialization. [UNICODE3.2] defines seven character encoding schemes: - UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE. - - Character Properties: A set of property names and property values - associated with individual characters defined in [UNICODE3.2]. - - Character Repertoire: (1) The collection of characters included in a - character set. (2) The SUBSET of characters included in a large - character set, e.g., [UNICODE3.2], that are necessary to support a - complete mapping to another smaller character set, e.g., ISO8859-1 (also - called 'Latin-1'). - - Character Set: A collection of elements used to represent textual - information. - - Coded Character Set: A character set in which each character is - assigned a numeric code value. Frequently abbreviated as 'character - set', 'charset', or 'code set'. 
- - Code Point: (1) A numerical index (or position) in an encoding table - used for encoding characters. (2) Synonym for 'Unicode scalar value'. - - Collation: The process of ordering units of textual information. - Collation is usually specific to a particular language. Also known as - 'alphabetizing' or 'alphabetic sorting'. - - Combining Character: A character that graphically combines with a - preceding 'base character'. The combining character is said to 'apply' - to that base character. (See also 'nonspacing mark'.) - - Compatibility: (1) Consistency with existing practice or preexisting - character encoding standards. (2) Characteristic of a normative - mapping and form of equivalence (see 'compatibility decomposition'). - - - McDonald June 20, 2002 [Page A-2] - - CUPS Internationalization Software Design Description v0.3 - APPENDIX A - Glossary - - - Compatibility Character: A character that has a compatibility - decomposition. - - Compatibility Decomposition: The decomposition of a character that - results from recursively applying BOTH the compatibility mappings AND - the canonical mappings found in the Unicode Character Database until no - characters can be further decomposed, then reordering nonspacing marks - according to section 3.10 of [UNICODE3.2]. - - Compatibility Equivalent: Two characters are compatibility equivalents - if their full compatibility decompositions are identical. - - Composed Character: (See 'decomposable character'.) - - DBCS: Acronym for 'double-byte character set'. - - Decomposable Character: A character that is equivalent to a sequence of - one or more other characters, according to the decomposition mappings - found in [UNICODE3.2]. It may also be known as a 'precomposed - character' or a 'composite character'. - - Decomposition: (1) The process of separating or analyzing a text - element into component units. (2) A sequence of one or more characters - that is equivalent to a 'decomposable character'. 
- - Diacritic: (See 'accent mark'.) - - Double-Byte Character Set (DBCS): One of a number of character sets - defined for representing Chinese, Japanese, or Korean text (for example, - JIS X 0208-1990). These character sets are often encoded in such a way - as to allow double-byte character encodings to be mixed with single-byte - character encodings. (See also 'multiple-byte character set'.) - - Font: A collection of glyphs used for visual depiction of character - data. - - FSS-UTF: Abbreviation for 'File System Safe UCS Transformation Format', - originally published by X/Open. Now called 'UTF-8'. - - Fullwidth: Characters of East Asian character sets whose glyph image - extends across the entire character display cell. In legacy character - sets, fullwidth characters are normally encoded in two or three bytes. - - Glyph: (1) An abstract form that represents one or more glyph images. - (2) A synonym for 'glyph image'. - - Glyph Image: The actual, concrete image of a glyph representation - having been rasterized or otherwise imaged onto some display surface. - - Halfwidth: Characters of East Asian character sets whose glyph image - occupies half of the character display cell. In legacy character sets, - halfwidth characters are normally encoded in a single byte. - - Han Characters: Ideographic characters of Chinese origin. - - Hangul: The name of the script used to write the Korean language. - - High-Surrogate: A Unicode code value in the range U+D800 to U+DBFF. - - Hiragana: One of two standard syllabaries associated with the Japanese - writing system. Used to write particles, grammatical affixes, and words - that have no 'kanji' form. - - IANA: Internet Assigned Numbers Authority. - - Ideograph: (1) Any symbol that denotes an idea (or meaning) in contrast - to a sound or pronunciation (for example, a 'smiley face').
(2) A - common term used to refer to Han characters. - - IPA: International Phonetic Alphabet. - - IRG: Abbreviation for Ideographic Rapporteur Group, a subgroup of - ISO/IEC JTC1/SC2/WG2 (who work on Han unification and submission of new - Han characters for inclusion in revised versions of Unicode/ISO 10646). - - Jamo: The Korean name for a single letter of the Hangul script. Jamos - are used to form Hangul syllables. - - Joiner: An invisible character that affects the joining behavior of - surrounding characters. - - JTC1: Abbreviation for Joint Technical Committee 1 of ISO/IEC, - responsible for information technology standardization. - - Kana: The name of a primarily syllabic script used by the Japanese - writing system, composed of 'hiragana' and 'katakana'. - - Kanji: The Japanese name for Han characters; derived from the Chinese - word 'hanzi'. Also romanized as 'kanzi'. - - Katakana: One of two standard syllabaries associated with the Japanese - writing system, typically used in representation of borrowed vocabulary. - - Ligature: A glyph representing a combination of two or more characters, - for example in the Latin script the ligature between 'f' and 'i' as - 'fi'. - - Logical Order: The order in which text is typed on a keyboard. For the - most part, logical order corresponds to phonetic order. - - Lowercase: (See 'case'.) - - Low-Surrogate: A Unicode code value in the range U+DC00 to U+DFFF. - - MBCS: Acronym for 'multiple-byte character set'. - - Multiple-Byte Character Set (MBCS): A character set encoded with a - variable number of bytes per character. Many large character sets have - been defined as MBCS so as to keep strict compatibility with the - US-ASCII subset and/or [ISO2022]. - - Normalization: Transformation of data to a normal form.
- - Plain Text: Computer-encoded text that consists ONLY of a sequence of - code values from a given standard, with no other formatting or - structural information. - - Precomposed Character: (See 'decomposable character'.) - - Rendering: (1) The process of selecting and laying out glyphs for the - purpose of depicting characters. (2) The process of making glyphs - visible on a display device. - - Repertoire: (See 'character repertoire'.) - - Replacement Character: A character used as a substitute for an - uninterpretable character from another encoding. [UNICODE3.2] defines - U+FFFD REPLACEMENT CHARACTER for this function. - - Rich Text: The result of adding information such as font data, color, - formatting, phonetic annotations, etc. to 'plain text' (e.g., HTML). - - SBCS: Acronym for 'single-byte character set'. - - Scalar Value: (See 'Unicode scalar value'.) - - Script: A collection of symbols used to represent textual information - in one or more writing systems. - - Single-Byte Character Set (SBCS): One of a number of one-byte character - sets defined for representing (mostly) Western languages (for example, - ISO 8859-1 'Latin-1'). These character sets are often encoded in such a - way as to be strict supersets of 7-bit [US-ASCII]. - - Sorting: (See 'collation'.) - - Transcoding: Conversion of character data between different character - sets. - - Transformation Format: A mapping from a coded character sequence to a - unique sequence of code values (typically octets). - - UCS: Abbreviation for Universal Character Set, specified by [ISO10646]. - - UCS-2: UCS encoded in 2 octets, specified by [ISO10646]. - - UCS-4: UCS encoded in 4 octets, specified by [ISO10646]. - - Unicode Scalar Value: A number in the range 0 to 0x10FFFF. - - Uppercase: (See 'case'.) - - UTF: Abbreviation for Unicode (or UCS) Transformation Format.
- - UTF-8: Unicode (or UCS) Transformation Format, 8-bit encoding form. - Serializes a Unicode (or UCS) scalar value (code point) as a sequence of - one to four octets. Does NOT suffer from byte-ordering ambiguities. - - UTF-16: Unicode (or UCS) Transformation Format, 16-bit encoding form. - Serializes a Unicode (or UCS) scalar value (code point) as a sequence of - two or four octets (one 16-bit code value, or a 'high-surrogate' followed - by a 'low-surrogate'), in either big-endian or little-endian format. - Uses an (optional) prefix of BOM to disambiguate byte-ordering. - - UTF-32: Unicode (or UCS) Transformation Format, 32-bit encoding form. - Serializes a Unicode (or UCS) scalar value (code point) as a sequence of - four octets, in either big-endian or little-endian format. Uses an - (optional) prefix of BOM to disambiguate byte-ordering. - - Zero Width: Characteristic of some spaces or format control characters - that do not advance text along the horizontal baseline. |
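The UTF-8 and UTF-16 glossary entries above describe how a single scalar value is serialized. The following is a minimal sketch of both encodings in C, offered only as an illustration of those definitions. The function names sketch_utf8_encode() and sketch_utf16_encode() are hypothetical and are not part of the CUPS transcoding API described in this document; byte-order serialization and the BOM are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CUPS function): encode one Unicode scalar
 * value as UTF-8. Returns the number of octets written (1 to 4), or 0
 * if cp is a surrogate code value or is greater than 0x10FFFF. */
size_t sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
  assert(out != NULL);

  if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
    return 0;

  if (cp < 0x80) {                      /* 1 octet: 0xxxxxxx */
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {                     /* 2 octets: 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {                   /* 3 octets: 1110xxxx + 2 trailing */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {                 /* 4 octets: 11110xxx + 3 trailing */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;                             /* beyond the Unicode range */
}

/* Hypothetical helper (not a CUPS function): encode one Unicode scalar
 * value as UTF-16 code values. Returns the number of 16-bit values
 * written (1 or 2), or 0 if cp cannot be encoded. */
size_t sketch_utf16_encode(uint32_t cp, uint16_t out[2])
{
  assert(out != NULL);

  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return 0;

  if (cp < 0x10000) {                   /* fits in one 16-bit code value */
    out[0] = (uint16_t)cp;
    return 1;
  }

  cp -= 0x10000;                        /* 20 bits remain to be split */
  out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high-surrogate */
  out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low-surrogate */
  return 2;
}
```

Both helpers return 0 rather than emitting U+FFFD so that the caller can choose its own substitution policy; a transcoder that prefers the 'replacement character' behavior could encode 0xFFFD whenever 0 is returned.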