summaryrefslogtreecommitdiff
path: root/docs/howto/i18n.txt
blob: 4a5b1de5a9a33b426a824934cd375db8a9ea02d3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
================================
 Docutils_ Internationalization
================================

:Author: David Goodger
:Contact: goodger@users.sourceforge.net
:Date: $Date$
:Revision: $Revision$
:Copyright: This document has been placed in the public domain.


.. contents::


This document describes the internationalization facilities of the
Docutils_ project.  `Introduction to i18n`_ by Tomohiro KUBOTA is a
good general reference.  "Internationalization" is often abbreviated
as "i18n": "i" + 18 letters + "n".

.. Note::

   The i18n facilities of Docutils should be considered a "first
   draft".  They work so far, but improvements are welcome.
   Specifically, standard i18n facilities like "gettext" have yet to
   be explored.

Docutils is designed to work flexibly with text in multiple languages
(one language at a time).  Language-specific features are (or should
be [#]_) fully parameterized.  To enable a new language, two modules
have to be added to the project: one for Docutils itself (the
`Docutils Language Module`_) and one for the reStructuredText parser
(the `reStructuredText Language Module`_).

.. [#] If anything in Docutils is insufficiently parameterized, it
   should be considered a bug.  Please report bugs to the Docutils
   project bug tracker on SourceForge at
   http://sourceforge.net/tracker/?group_id=38414&atid=422030.

.. _Docutils: http://docutils.sourceforge.net/
.. _Introduction to i18n:
   http://www.debian.org/doc/manuals/intro-i18n/


Language Module Names
=====================

Language modules are named using a case-insensitive language
identifier as defined in `RFC 1766`_, converting hyphens to
underscores [#]_.  A typical language identifier consists of a
2-letter language code from `ISO 639`_ (3-letter codes can be used if
no 2-letter code exists; RFC 1766 is currently being revised to allow
3-letter codes).  The language identifier can have an optional subtag,
typically for variations based on country (from `ISO 3166`_ 2-letter
country codes).  If no language identifier is specified, the default
is "en" for English.  Examples of module names include ``en.py``,
``fr.py``, ``ja.py``, and ``pt_br.py``.

.. [#] Subtags are separated from primary tags by underscores instead
   of hyphens, to conform to Python naming rules.

.. _RFC 1766: http://www.faqs.org/rfcs/rfc1766.html
.. _ISO 639: http://www.loc.gov/standards/iso639-2/englangn.html
.. _ISO 3166: http://www.iso.ch/iso/en/prods-services/iso3166ma/
   02iso-3166-code-lists/index.html


Python Code
===========

All Python code in Docutils will be ASCII-only.  In language modules,
Unicode-escapes will have to be used for non-ASCII characters.
Although it may be possible for some developers to store non-ASCII
characters directly in strings, it will cause problems for other
developers whose locales are set up differently.

`PEP 263`_ introduces source code encodings to Python modules,
implemented beginning in Python 2.3.  Until PEP 263 is fully
implemented as a well-established convention, proven robust in daily
use, and the tools (editors, CVS, email, etc.) recognize this
convention, Docutils shall remain conservative.

As mentioned in the note above, developers are invited to explore
"gettext" and other i18n technologies.

.. _PEP 263: http://www.python.org/peps/pep-0263.html


Docutils Language Module
========================

Modules in ``docutils/languages`` contain language mappings for
markup-independent language-specific features of Docutils.  To make a
new language module, just copy the ``en.py`` file, rename it with the
code for your language (see `Language Module Names`_ above), and
translate the terms as described below.

Each Docutils language module contains three module attributes:

``labels``
    This is a mapping of node class names to language-dependent
    boilerplate label text.  The label text is used by Writer
    components when they encounter document tree elements whose class
    names are the mapping keys.

    The entry values (*not* the keys) should be translated to the
    target language.

``bibliographic_fields``
    This is a mapping of language-dependent field names (converted to
    lower case) to canonical field names (keys of
    ``DocInfo.biblio_notes`` in ``docutils.transforms.frontmatter``).
    It is used when transforming bibliographic fields.

    The keys should be translated to the target language.

``author_separators``
    This is a list of strings used to parse the 'Authors'
    bibliographic field.  They separate individual authors' names, and
    are tried in order (i.e., earlier items take priority, and the
    first item that matches wins).  The English-language module
    defines them as ``[';', ',']``; semi-colons can be used to
    separate names like "Arthur Pewtie, Esq.".

    Most languages won't have to "translate" this list.


reStructuredText Language Module
================================

Modules in ``docutils/parsers/rst/languages`` contain language
mappings for language-specific features of the reStructuredText
parser.  To make a new language module, just copy the ``en.py`` file,
rename it with the code for your language (see `Language Module
Names`_ above), and translate the terms as described below.

Each reStructuredText language module contains two module attributes:

``directives``
    This is a mapping from language-dependent directive names to
    canonical directive names.  The canonical directive names are
    registered in ``docutils/parsers/rst/directives/__init__.py``, in
    ``_directive_registry``.

    The keys should be translated to the target language.  Synonyms
    (multiple keys with the same values) are allowed; this is useful
    for abbreviations.

``roles``
    This is a mapping language-dependent role names to canonical role
    names for interpreted text.  The canonical directive names are
    registered in ``docutils/parsers/rst/states.py``, in
    ``Inliner._interpreted_roles`` (this may change).

    The keys should be translated to the target language.  Synonyms
    (multiple keys with the same values) are allowed; this is useful
    for abbreviations.


Testing the Language Modules
============================

Whenever a new language module is added or an existing one modified,
the unit tests should be run.  The test modules can be found in the
docutils/test directory from CVS_ or from the `latest CVS snapshot`_.

The ``test_language.py`` module can be run as a script.  With no
arguments, it will test all language modules.  With one or more
language codes, it will test just those languages.  For example::

    $ python test_language.py en
    ..
    ----------------------------------------
    Ran 2 tests in 0.095s

    OK

Use the "alltests.py" script to run all test modules, exhaustively
testing the parser and other parts of the Docutils system.

.. _CVS: http://sourceforge.net/cvs/?group_id=38414
.. _latest CVS snapshot: http://docutils.sf.net/docutils-snapshot.tgz


Submitting the Language Modules
===============================

If you do not have CVS write access and want to contribute your
language modules, feel free to submit them at the `SourceForge patch
tracker`__.

__ http://sourceforge.net/tracker/?group_id=38414&atid=422032