author: Armin Ronacher <armin.ronacher@active-4.com> 2013-07-04 15:41:16 +0200
committer: Armin Ronacher <armin.ronacher@active-4.com> 2013-07-04 15:41:16 +0200
commit: 874e39cb47cbba9b458b7bfe28fd4a4fc4d91844
tree: 4e92d12640104c57e0b7388714b3ea20ff5793de /docs/messages.rst
parent: a1318b5cd7640520a5a6ec5e88658e432f2438db
Moved doc to docs
Diffstat (limited to 'docs/messages.rst')
-rw-r--r-- docs/messages.rst 312
1 files changed, 312 insertions, 0 deletions
diff --git a/docs/messages.rst b/docs/messages.rst
new file mode 100644
index 0000000..9f4bfab
--- /dev/null
+++ b/docs/messages.rst
@@ -0,0 +1,312 @@
+.. -*- mode: rst; encoding: utf-8 -*-
+
+=============================
+Working with Message Catalogs
+=============================
+
+
+Introduction
+============
+
+The ``gettext`` translation system enables you to mark any strings used in your
+application as subject to localization, by wrapping them in functions such as
+``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
+``gettext`` function is often aliased to ``_(str)``, so you can write:
+
+.. code-block:: python
+
+    print(_("Hello"))
+
+instead of just:
+
+.. code-block:: python
+
+    print("Hello")
+
+to make the string "Hello" localizable.
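+
+As a minimal sketch of how the alias is typically set up, the following uses
+``gettext.NullTranslations`` from the standard library, which returns every
+message unchanged. A real application would load a compiled catalog instead;
+the message strings used here are purely illustrative.
+
+.. code-block:: python
+
+    import gettext
+
+    # NullTranslations is the fallback class: it performs no lookup and
+    # returns the original message as-is.
+    translations = gettext.NullTranslations()
+    _ = translations.gettext
+
+    greeting = _("Hello")
+    # ngettext picks the singular or plural form based on the number.
+    files = translations.ngettext("%d file", "%d files", 3) % 3
+    print(greeting)
+    print(files)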
+
+Message catalogs are collections of translations for such localizable messages
+used in an application. They are commonly stored in PO (Portable Object) and MO
+(Machine Object) files, the formats of which are defined by the GNU `gettext`_
+tools and the GNU `translation project`_.
+
+ .. _`gettext`: http://www.gnu.org/software/gettext/
+ .. _`translation project`: http://sourceforge.net/projects/translation
+
+The general procedure for building message catalogs looks something like this:
+
+  * use a tool (such as ``xgettext``) to extract localizable strings from the
+    code base and write them to a POT (PO Template) file
+  * make a copy of the POT file for a specific locale (for example, "en_US")
+    and start translating the messages
+  * use a tool such as ``msgfmt`` to compile the locale PO file into a binary
+    MO file
+  * later, when code changes make it necessary to update the translations,
+    regenerate the POT file and merge the changes into the various
+    locale-specific PO files, for example using ``msgmerge``
+
+Python provides the `gettext module`_ as part of the standard library, which
+enables applications to work with appropriately generated MO files.
+
+ .. _`gettext module`: http://docs.python.org/lib/module-gettext.html
+
+As ``gettext`` provides a solid and well supported foundation for translating
+application messages, Babel does not reinvent the wheel, but rather reuses this
+infrastructure, and makes it easier to build message catalogs for Python
+applications.
+
+
+Message Extraction
+==================
+
+Babel provides functionality similar to that of the ``xgettext`` program,
+except that only extraction from Python source files is built-in, while support
+for other file formats can be added using a simple extension mechanism.
+
+Unlike ``xgettext``, which is usually invoked once for every file, the routines
+for message extraction in Babel operate on directories. While the per-file
+approach of ``xgettext`` works nicely with projects using a ``Makefile``,
+Python projects rarely use ``make``, and thus a different mechanism is needed
+for extracting messages from the heterogeneous collection of source files that
+many Python projects are composed of.
+
+When message extraction is based on directories instead of individual files,
+there needs to be a way to configure which files should be treated in which
+manner. For example, while many projects may contain ``.html`` files, some of
+those files may be static HTML files that don't contain localizable messages,
+while others may be `Django`_ templates, and still others may contain `Genshi`_
+markup templates. Some projects may even mix HTML files for different template
+languages. Therefore the way in which messages are extracted from source files
+cannot depend on the file extension alone, but needs to be controllable in a
+precise manner.
+
+.. _`Django`: http://www.djangoproject.com/
+.. _`Genshi`: http://genshi.edgewall.org/
+
+Babel accepts a configuration file to specify this mapping of files to
+extraction methods, which is described below.
+
+
+.. _`frontends`:
+
+----------
+Front-Ends
+----------
+
+Babel provides two different front-ends to access its functionality for working
+with message catalogs:
+
+ * A `Command-line interface <cmdline.html>`_, and
+ * `Integration with distutils/setuptools <setup.html>`_
+
+Which one you choose depends on the nature of your project. For most modern
+Python projects, the distutils/setuptools integration is probably more
+convenient.
+
+
+.. _`mapping`:
+
+-------------------------------------------
+Extraction Method Mapping and Configuration
+-------------------------------------------
+
+The mapping of extraction methods to files in Babel is done via a configuration
+file. This file maps extended glob patterns to the names of the extraction
+methods, and can also set various options for each pattern (which options are
+available depends on the specific extraction method).
+
+For example, the following configuration adds extraction of messages from both
+Genshi markup templates and text templates:
+
+.. code-block:: ini
+
+    # Extraction from Python source files
+
+    [python: **.py]
+
+    # Extraction from Genshi HTML and text templates
+
+    [genshi: **/templates/**.html]
+    ignore_tags = script,style
+    include_attrs = alt title summary
+
+    [genshi: **/templates/**.txt]
+    template_class = genshi.template:TextTemplate
+    encoding = ISO-8859-15
+
+    # Extraction from JavaScript files
+
+    [javascript: **.js]
+    extract_messages = $._, jQuery._
+
+The configuration file syntax is based on the format commonly found in ``.INI``
+files on Windows systems, and as supported by the ``ConfigParser`` module in
+the Python standard library. Section names (the strings enclosed in square
+brackets) specify both the name of the extraction method, and the extended glob
+pattern to specify the files that this extraction method should be used for,
+separated by a colon. The options in the sections are passed to the extraction
+method. Which options are available is specific to the extraction method used.
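+
+To illustrate how section names split into a method and a pattern, here is a
+rough sketch built on the standard library ``configparser`` module. It is not
+Babel's own parser: the function name ``read_mapping`` and the shape of its
+return value are assumptions made for this example only.
+
+.. code-block:: python
+
+    import configparser
+
+    MAPPING = """
+    [python: **.py]
+
+    [genshi: **/templates/**.html]
+    ignore_tags = script,style
+    """.replace("\n    ", "\n")  # strip the indentation used in this listing
+
+    def read_mapping(text):
+        """Return (method_map, options_map) for a mapping configuration."""
+        parser = configparser.RawConfigParser()
+        parser.read_string(text)
+        method_map = []   # (pattern, method) pairs in file order
+        options_map = {}  # pattern -> dict of options for that section
+        for section in parser.sections():
+            if ":" not in section:
+                continue  # e.g. an [extractors] section, not handled here
+            method, pattern = [part.strip() for part in section.split(":", 1)]
+            method_map.append((pattern, method))
+            options_map[pattern] = dict(parser.items(section))
+        return method_map, options_map
+
+    method_map, options_map = read_mapping(MAPPING)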
+
+The extended glob patterns used in this configuration are similar to the glob
+patterns provided by most shells. A single asterisk (``*``) is a wildcard for
+any number of characters (except for the pathname component separator "/"),
+while a question mark (``?``) only matches a single character. In addition,
+two subsequent asterisk characters (``**``) can be used to make the wildcard
+match any directory level, so the pattern ``**.txt`` matches any file with the
+extension ``.txt`` in any directory.
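+
+The matching rules above can be approximated by translating a pattern into a
+regular expression. The following is a simplified sketch, not Babel's actual
+implementation; the helper names ``glob_to_regex`` and ``matches`` are
+hypothetical, and it assumes that ``?`` does not match the "/" separator.
+
+.. code-block:: python
+
+    import re
+
+    def glob_to_regex(pattern):
+        """Translate an extended glob pattern into a compiled regex."""
+        parts = []
+        i = 0
+        while i < len(pattern):
+            if pattern.startswith("**/", i):
+                parts.append(r"(?:.*/)?")  # zero or more directory levels
+                i += 3
+            elif pattern.startswith("**", i):
+                parts.append(r".*")        # any characters, including "/"
+                i += 2
+            elif pattern[i] == "*":
+                parts.append(r"[^/]*")     # stays within one path component
+                i += 1
+            elif pattern[i] == "?":
+                parts.append(r"[^/]")      # exactly one character
+                i += 1
+            else:
+                parts.append(re.escape(pattern[i]))
+                i += 1
+        return re.compile("".join(parts) + r"\Z")
+
+    def matches(pattern, path):
+        """Return True if the whole path matches the glob pattern."""
+        return glob_to_regex(pattern).match(path) is not None
+
+With these rules, ``matches("**.txt", "docs/notes/readme.txt")`` is true,
+while ``matches("*.txt", "docs/readme.txt")`` is false because a single
+asterisk does not cross a "/".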
+
+Lines that start with a ``#`` or ``;`` character are ignored and can be used
+for comments. Empty lines are ignored, too.
+
+.. note:: If you're performing message extraction using the command Babel
+          provides for integration into ``setup.py`` scripts, you can also
+          provide this configuration in a different way, namely as a keyword
+          argument to the ``setup()`` function. See `Distutils/Setuptools
+          Integration`_ for more information.
+
+.. _`distutils/setuptools integration`: setup.html
+
+
+Default Extraction Methods
+--------------------------
+
+Babel comes with a few builtin extractors: ``python`` (which extracts
+messages from Python source files), ``javascript``, and ``ignore`` (which
+extracts nothing).
+
+The ``python`` extractor is by default mapped to the glob pattern ``**.py``,
+meaning it'll be applied to all files with the ``.py`` extension in any
+directory. If you specify your own mapping configuration, this default mapping
+is discarded, so you need to explicitly add it to your mapping (as shown in the
+example above).
+
+
+.. _`referencing extraction methods`:
+
+Referencing Extraction Methods
+------------------------------
+
+To be able to use short extraction method names such as “genshi”, you need to
+have `pkg_resources`_ installed, and the package implementing that extraction
+method needs to have been installed with its metadata (the `egg-info`_).
+
+If this is not possible for some reason, you need to map the short names to
+fully qualified function names in an ``[extractors]`` section in the mapping
+configuration. For example:
+
+.. code-block:: ini
+
+    # Some custom extraction method
+
+    [extractors]
+    custom = mypackage.module:extract_custom
+
+    [custom: **.ctm]
+    some_option = foo
+
+Note that the builtin extraction methods ``python`` and ``ignore`` are available
+by default, even if `pkg_resources`_ is not installed. You should never need to
+explicitly define them in the ``[extractors]`` section.
+
+.. _`egg-info`: http://peak.telecommunity.com/DevCenter/PythonEggs
+.. _`pkg_resources`: http://peak.telecommunity.com/DevCenter/PkgResources
+
+
+--------------------------
+Writing Extraction Methods
+--------------------------
+
+Adding new methods for extracting localizable messages is easy. First, you'll
+need to implement a function that complies with the following interface:
+
+.. code-block:: python
+
+    def extract_xxx(fileobj, keywords, comment_tags, options):
+        """Extract messages from XXX files.
+
+        :param fileobj: the file-like object the messages should be extracted
+                        from
+        :param keywords: a list of keywords (i.e. function names) that should
+                         be recognized as translation functions
+        :param comment_tags: a list of translator tags to search for and
+                             include in the results
+        :param options: a dictionary of additional options (optional)
+        :return: an iterator over ``(lineno, funcname, message, comments)``
+                 tuples
+        :rtype: ``iterator``
+        """
+
+.. note:: Any strings in the tuples produced by this function must be either
+          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
+          That means that if sources contain strings using other encodings, it
+          is the job of the extractor implementation to do the decoding to
+          ``unicode`` objects.
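+
+As an illustration of this interface, here is a sketch of an extractor for a
+made-up line-based format in which a line such as ``_: Hello`` marks a
+message and ``#`` starts a comment. The format, the function name
+``extract_ctm`` and the ``encoding`` option are all assumptions of this
+example. It also assumes that ``fileobj`` yields bytes and that comments are
+reported with their tag still attached.
+
+.. code-block:: python
+
+    import io
+
+    def extract_ctm(fileobj, keywords, comment_tags, options):
+        """Extract messages from a hypothetical ``keyword: text`` format."""
+        encoding = options.get("encoding", "utf-8")
+        comments = []
+        for lineno, raw in enumerate(fileobj, 1):
+            # Decode here so that only text strings are ever yielded.
+            line = raw.decode(encoding).strip()
+            if line.startswith("#"):
+                text = line[1:].strip()
+                # Keep only comments that begin with a translator tag.
+                if any(text.startswith(tag) for tag in comment_tags):
+                    comments.append(text)
+                continue
+            name, sep, text = line.partition(":")
+            if sep and name.strip() in keywords:
+                yield lineno, name.strip(), text.strip(), comments
+            # Comments only apply to the line directly following them.
+            comments = []
+
+    source = io.BytesIO(b"# NOTE: greeting\n_: Hello\nother: skip\n_: Bye\n")
+    results = list(extract_ctm(source, ["_"], ["NOTE:"], {}))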
+
+Next, you should register that function as an entry point. This requires your
+``setup.py`` script to use `setuptools`_, and your package to be installed with
+the necessary metadata. If that's taken care of, add something like the
+following to your ``setup.py`` script:
+
+.. code-block:: python
+
+    from setuptools import setup
+
+    setup(
+        # ... name, version, packages and other arguments go here ...
+        entry_points="""
+        [babel.extractors]
+        xxx = your.package:extract_xxx
+        """,
+    )
+
+That is, add your extraction method to the entry point group
+``babel.extractors``, where the name of the entry point is the name that people
+will use to reference the extraction method, and the value is the module and
+the name of the function (separated by a colon) implementing the actual
+extraction.
+
+.. note:: As shown in `Referencing Extraction Methods`_, declaring an entry
+          point is not strictly required, as users can still reference the
+          extraction function directly. But whenever possible, the entry point
+          should be declared to make configuration more convenient.
+
+.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools
+
+
+-------------------
+Translator Comments
+-------------------
+
+Comment tags are short markers that the extractor looks for in code comments
+(and only in comments) placed immediately before `python gettext`_ calls, as
+shown in the following example:
+
+ .. _`python gettext`: http://docs.python.org/lib/module-gettext.html
+
+.. code-block:: python
+
+    # NOTE: This is a comment about `Foo Bar`
+    _('Foo Bar')
+
+The comment tag for the above example would be ``NOTE:``, and the translator
+comment for that tag would be ``This is a comment about `Foo Bar```.
+
+The resulting output in the catalog template would be something like::
+
+    #. This is a comment about `Foo Bar`
+    #: main.py:2
+    msgid "Foo Bar"
+    msgstr ""
+
+Now, you might ask, why would I need that?
+
+Consider this simple case: you have a menu item called “manual”. You know what
+it means, but when translators see it, they will wonder whether you meant:
+
+1. a document or help manual, or
+2. a manual process?
+
+This is the simplest case where a translator comment such as
+“The installation manual” helps to clarify the situation and makes a translator
+more productive.
+
+.. note:: Whether translator comments can be extracted depends on the extraction
+          method in use. The Python extractor provided by Babel does implement
+          this feature, but others may not.