author     Michael Bourke <michael@iter8ve.com>   2021-10-28 13:31:39 +1100
committer  Michael Bourke <michael@iter8ve.com>   2021-10-28 13:32:04 +1100
commit     9b79bf0e1e74dc9f95c934302ca606856d6c4470 (patch)
tree       e3fb771ae0938580ce2d99b6254e3a5a8e023ed0
parent     87c102ca4612109db54cdf95e7f7a044e087d25a (diff)
download   mako-9b79bf0e1e74dc9f95c934302ca606856d6c4470.tar.gz
Remove more Python 2 language from docs
Change-Id: Icdddf85b3ddf5d3e7172e318c9e75b3c9a857314
-rw-r--r--  doc/build/filtering.rst             |  22
-rw-r--r--  doc/build/unicode.rst               | 123
-rw-r--r--  doc/build/unreleased/cstring_io.rst |   4
3 files changed, 31 insertions(+), 118 deletions(-)
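The substance of the diff below is that Mako's default expression filter becomes the ``str`` built-in rather than Python 2's ``unicode``. As a rough sketch (illustrative names, not Mako's internals), a default filter chain such as ``default_filters=['str']`` applies each filter in order to every expression result, so the internal output stream stays pure Unicode and bytes appear only when the final output is encoded:

```python
def apply_default_filters(value, default_filters=(str,)):
    # Each ${...} expression result is passed through the default
    # filter chain in order; with default_filters=['str'] that chain
    # is just the str() built-in, yielding a Unicode string.
    for filt in default_filters:
        value = filt(value)
    return value

# Non-string expression results are rendered via str(); the joined
# buffer is pure str, and encoding happens only at render time.
parts = [apply_default_filters(42), apply_default_filters("voix m’a réveillé.")]
output = "".join(parts)
encoded = output.encode("utf-8")
```

This mirrors the docs' description of the generated ``context.write(str(...))`` call; the helper name here is hypothetical.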
diff --git a/doc/build/filtering.rst b/doc/build/filtering.rst
index 3d0225e..b4c3323 100644
--- a/doc/build/filtering.rst
+++ b/doc/build/filtering.rst
@@ -34,9 +34,15 @@ The built-in escape flags are:
 * ``trim`` : whitespace trimming, provided by ``string.strip()``
 * ``entity`` : produces HTML entity references for applicable
   strings, derived from ``htmlentitydefs``
-* ``unicode`` (``str`` on Python 3): produces a Python unicode
+* ``str`` : produces a Python unicode
   string (this function is applied by default)
-* ``decode.<some encoding>``: decode input into a Python
+* ``unicode`` : aliased to ``str`` above
+
+  .. versionchanged:: 1.2.0
+     Prior versions applied the ``unicode`` built-in when running in Python 2;
+     in 1.2.0 Mako applies the Python 3 ``str`` built-in.
+
+* ``decode.<some encoding>`` : decode input into a Python
   unicode with the specified encoding
 * ``n`` : disable all default filtering; only filters specified
   in the local expression tag will be applied.
@@ -101,13 +107,13 @@ In addition to the ``expression_filter`` argument, the
 :class:`.TemplateLookup` can specify filtering for all expression tags
 at the programmatic level. This array-based argument, when given its
 default argument of ``None``, will be internally set to
-``["unicode"]`` (or ``["str"]`` on Python 3):
+``["str"]``:
 
 .. sourcecode:: python
 
-    t = TemplateLookup(directories=['/tmp'], default_filters=['unicode'])
+    t = TemplateLookup(directories=['/tmp'], default_filters=['str'])
 
-To replace the usual ``unicode``/``str`` function with a
+To replace the usual ``str`` function with a
 specific encoding, the ``decode`` filter can be substituted:
 
 .. sourcecode:: python
@@ -128,7 +134,7 @@ applied first.
 
 .. sourcecode:: python
 
-    t = Template(templatetext, default_filters=['unicode', 'myfilter'])
+    t = Template(templatetext, default_filters=['str', 'myfilter'])
 
 To ease the usage of ``default_filters`` with custom filters,
 you can also add imports (or other code) to all templates using
@@ -137,7 +143,7 @@ the ``imports`` argument:
 
 .. sourcecode:: python
 
     t = TemplateLookup(directories=['/tmp'],
-                       default_filters=['unicode', 'myfilter'],
+                       default_filters=['str', 'myfilter'],
                        imports=['from mypackage import myfilter'])
 
 The above will generate templates something like this:
@@ -148,7 +154,7 @@ The above will generate templates something like this:
 
     from mypackage import myfilter
 
     def render_body(context):
-        context.write(myfilter(unicode("some text")))
+        context.write(myfilter(str("some text")))
 
 .. _expression_filtering_nfilter:
diff --git a/doc/build/unicode.rst b/doc/build/unicode.rst
index c48fce8..060e113 100644
--- a/doc/build/unicode.rst
+++ b/doc/build/unicode.rst
@@ -4,74 +4,12 @@ The Unicode Chapter
 ===================
 
-.. note:: this chapter was written many years ago and is very Python-2
-   centric. As of Mako 1.1.3, the default template encoding is ``utf-8``.
-
-The Python language supports two ways of representing what we
-know as "strings", i.e. series of characters. In Python 2, the
-two types are ``string`` and ``unicode``, and in Python 3 they are
-``bytes`` and ``string``. A key aspect of the Python 2 ``string`` and
-Python 3 ``bytes`` types are that they contain no information
-regarding what **encoding** the data is stored in. For this
-reason they were commonly referred to as **byte strings** on
-Python 2, and Python 3 makes this name more explicit. The
-origins of this come from Python's background of being developed
-before the Unicode standard was even available, back when
-strings were C-style strings and were just that, a series of
-bytes. Strings that had only values below 128 just happened to
-be **ASCII** strings and were printable on the console, whereas
-strings with values above 128 would produce all kinds of
-graphical characters and bells.
-
-Contrast the "byte-string" type with the "unicode/string" type.
-Objects of this latter type are created whenever you say something like
-``u"hello world"`` (or in Python 3, just ``"hello world"``). In this
-case, Python represents each character in the string internally
-using multiple bytes per character (something similar to
-UTF-16). What's important is that when using the
-``unicode``/``string`` type to store strings, Python knows the
-data's encoding; it's in its own internal format. Whereas when
-using the ``string``/``bytes`` type, it does not.
-
-When Python 2 attempts to treat a byte-string as a string, which
-means it's attempting to compare/parse its characters, to coerce
-it into another encoding, or to decode it to a unicode object,
-it has to guess what the encoding is. In this case, it will
-pretty much always guess the encoding as ``ascii``... and if the
-byte-string contains bytes above value 128, you'll get an error.
-Python 3 eliminates much of this confusion by just raising an
-error unconditionally if a byte-string is used in a
-character-aware context.
-
-There is one operation that Python *can* do with a non-ASCII
-byte-string, and it's a great source of confusion: it can dump the
-byte-string straight out to a stream or a file, with nary a care
-what the encoding is. To Python, this is pretty much like
-dumping any other kind of binary data (like an image) to a
-stream somewhere. In Python 2, it is common to see programs that
-embed all kinds of international characters and encodings into
-plain byte-strings (i.e. using ``"hello world"`` style literals)
-can fly right through their run, sending reams of strings out to
-wherever they are going, and the programmer, seeing the same
-output as was expressed in the input, is now under the illusion
-that his or her program is Unicode-compliant. In fact, their
-program has no unicode awareness whatsoever, and similarly has
-no ability to interact with libraries that *are* unicode aware.
-Python 3 makes this much less likely by defaulting to unicode as
-the storage format for strings.
-
-The "pass through encoded data" scheme is what template
-languages like Cheetah and earlier versions of Myghty do by
-default. In Python 3 Mako only allows
-usage of native, unicode strings.
-
 In normal Mako operation, all parsed template constructs and
-output streams are handled internally as Python ``unicode``
-objects. It's only at the point of :meth:`~.Template.render` that this unicode
-stream may be rendered into whatever the desired output encoding
+output streams are handled internally as Python 3 ``str`` (Unicode)
+objects. It's only at the point of :meth:`~.Template.render` that this stream of
+Unicode objects may be rendered into whatever the desired output encoding
 is. The implication here is that the template developer must
 ensure that :ref:`the encoding of all non-ASCII templates is explicit
-<set_template_file_encoding>` (still required in Python 3),
+<set_template_file_encoding>` (still required in Python 3,
+although Mako defaults to ``utf-8``),
 that :ref:`all non-ASCII-encoded expressions are in one way or another
 converted to unicode <handling_non_ascii_expressions>` (not much of a
 burden in Python 3), and that :ref:`the output stream of the
@@ -129,51 +67,34 @@ looks something like this:
 
 .. sourcecode:: python
 
-    context.write(unicode("hello world"))
-
-In Python 3, it's just:
-
-.. sourcecode:: python
-
     context.write(str("hello world"))
 
 That is, **the output of all expressions is run through the
-``unicode`` built-in**. This is the default setting, and can be
-modified to expect various encodings. The ``unicode`` step serves
+``str`` built-in**. This is the default setting, and can be
+modified to expect various encodings. The ``str`` step serves
 both the purpose of rendering non-string expressions into
 strings (such as integers or objects which contain ``__str()__``
 methods), and to ensure that the final output stream is
-constructed as a unicode object. The main implication of this is
+constructed as a Unicode object. The main implication of this is
 that **any raw byte-strings that contain an encoding other than
-ASCII must first be decoded to a Python unicode object**. It
-means you can't say this in Python 2:
-
-.. sourcecode:: mako
-
-    ${"voix m’a réveillé."}  ## error in Python 2!
-
-You must instead say this:
-
-.. sourcecode:: mako
-
-    ${u"voix m’a réveillé."}  ## OK !
+ASCII must first be decoded to a Python unicode object**.
 
 Similarly, if you are reading data from a file that is streaming
 bytes, or returning data from some object that is returning a
 Python byte-string containing a non-ASCII encoding, you have to
-explicitly decode to unicode first, such as:
+explicitly decode to Unicode first, such as:
 
 .. sourcecode:: mako
 
     ${call_my_object().decode('utf-8')}
 
 Note that filehandles acquired by ``open()`` in Python 3 default
-to returning "text", that is the decoding is done for you. See
+to returning "text": that is, the decoding is done for you. See
 Python 3's documentation for the ``open()`` built-in for details
 on this.
 
 If you want a certain encoding applied to *all* expressions,
-override the ``unicode`` builtin with the ``decode`` built-in at the
+override the ``str`` builtin with the ``decode`` built-in at the
 :class:`.Template` or :class:`.TemplateLookup` level:
 
 .. sourcecode:: python
@@ -181,7 +102,7 @@ override the ``unicode`` builtin with the ``decode`` built-in at the
 
     t = Template(templatetext, default_filters=['decode.utf8'])
 
 Note that the built-in ``decode`` object is slower than the
-``unicode`` function, since unlike ``unicode`` it's not a Python
+``str`` function, since unlike ``str`` it's not a Python
 built-in, and it also checks the type of the incoming data to
 determine if string conversion is needed first.
@@ -194,7 +115,7 @@ in :ref:`filtering_default_filters`.
 
 Defining Output Encoding
 ========================
 
-Now that we have a template which produces a pure unicode output
+Now that we have a template which produces a pure Unicode output
 stream, all the hard work is done. We can take the output and
 do anything with it.
@@ -218,7 +139,7 @@ encoding is specified. By default it performs no encoding and
 returns a native string.
 
 :meth:`~.Template.render_unicode` will return the template output as a Python
-``unicode`` object (or ``string`` in Python 3):
+``str`` object:
 
 .. sourcecode:: python
@@ -230,21 +151,3 @@ you can encode yourself by saying:
 
 .. sourcecode:: python
 
     print(mytemplate.render_unicode().encode('utf-8', 'replace'))
-
-Buffer Selection
-----------------
-
-Mako does play some games with the style of buffering used
-internally, to maximize performance. Since the buffer is by far
-the most heavily used object in a render operation, it's
-important!
-
-When calling :meth:`~.Template.render` on a template that does not specify any
-output encoding (i.e. it's ``ascii``), Python's ``cStringIO`` module,
-which cannot handle encoding of non-ASCII ``unicode`` objects
-(even though it can send raw byte-strings through), is used for
-buffering. Otherwise, a custom Mako class called
-``FastEncodingBuffer`` is used, which essentially is a super
-dumbed-down version of ``StringIO`` that gathers all strings into
-a list and uses ``''.join(elements)`` to produce the final output
--- it's markedly faster than ``StringIO``.
diff --git a/doc/build/unreleased/cstring_io.rst b/doc/build/unreleased/cstring_io.rst
new file mode 100644
index 0000000..2653564
--- /dev/null
+++ b/doc/build/unreleased/cstring_io.rst
@@ -0,0 +1,4 @@
+.. change::
+    :tags: py3k
+
+    With the removal of Python 2's ``cStringIO``, Mako now uses its own internal ``FastEncodingBuffer`` exclusively.
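The removed "Buffer Selection" section and the new changelog note both describe a join-based buffer strategy: gather Unicode chunks in a list, join once, and encode only at the end. A rough sketch of that idea follows; the class name and shape here are illustrative, not Mako's actual ``FastEncodingBuffer`` implementation:

```python
class JoinBuffer:
    # Illustrative stand-in for the FastEncodingBuffer idea: collect
    # str chunks in a list and join them once at the end, which is
    # faster than repeated string concatenation for render output.
    def __init__(self, encoding=None, errors="strict"):
        self.data = []
        self.encoding = encoding
        self.errors = errors

    def write(self, text):
        self.data.append(text)

    def getvalue(self):
        # Encode only at the very end, and only if an output
        # encoding was requested; otherwise return pure str.
        text = "".join(self.data)
        if self.encoding:
            return text.encode(self.encoding, self.errors)
        return text
```

Deferring the join and the encode to a single final step is what makes this markedly faster than accumulating into a ``StringIO``-style object on every write.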