author Stefan Behnel <stefan_ml@behnel.de> 2013-06-02 15:48:01 +0200
committer Stefan Behnel <stb@skoobe.de> 2013-07-14 13:42:48 +0200
commit 49e31be2c2170ac1045483cae0afdd399869aa40
tree 72509ca6248e51dd94ea1529851f610e3424eada
parent e510060aadf470ceda7e97a2b734c68708684d82
add explicit section on Cython's Python string types
-rw-r--r-- docs/src/tutorial/strings.rst 45
1 files changed, 37 insertions, 8 deletions
diff --git a/docs/src/tutorial/strings.rst b/docs/src/tutorial/strings.rst
index cea7d7660..0e6ef81c4 100644
--- a/docs/src/tutorial/strings.rst
+++ b/docs/src/tutorial/strings.rst
@@ -3,14 +3,43 @@
Unicode and passing strings
===========================
-Similar to the string semantics in Python 3, Cython also strictly
-separates byte strings and unicode strings. Above all, this means
-that by default there is no automatic conversion between byte strings
-and unicode strings (except for what Python 2 does in string operations).
-All encoding and decoding must pass through an explicit encoding/decoding
-step. For simple cases, the module-level ``c_string_type`` and
-``c_string_encoding`` directives can be used to implicitly insert these
-encoding/decoding steps to ease conversion between Python and C strings.
+Similar to the string semantics in Python 3, Cython strictly separates
+byte strings and unicode strings. Above all, this means that by default
+there is no automatic conversion between byte strings and unicode strings
+(except for what Python 2 does in string operations). All encoding and
+decoding must pass through an explicit encoding/decoding step. To ease
+conversion between Python and C strings in simple cases, the module-level
+``c_string_type`` and ``c_string_encoding`` directives can be used to
+implicitly insert these encoding/decoding steps.
+
+
+Python string types in Cython code
+----------------------------------
+
+Cython supports three Python string types: :type:`bytes`, :type:`str`
+and :type:`unicode`. The :type:`str` type is special in that it is the
+byte string in Python 2 and the Unicode string in Python 3 (for Cython
+code compiled with language level 2, i.e. the default). Thus, in Python
+2, both :type:`bytes` and :type:`str` represent the byte string type,
+whereas in Python 3, :type:`str` and :type:`unicode` represent the Python
+Unicode string type. The switch is made at C compile time; the Python
+version that is used to run Cython is not relevant.
+
+When compiling Cython code with language level 3, the :type:`str` type
+is identified with exactly the Unicode string type at Cython compile time,
+i.e. it does not identify with :type:`bytes` when running in Python 2.
+
+Note that the :type:`str` type is not compatible with the :type:`unicode`
+type in Python 2, i.e. you cannot assign a Unicode string to a variable
+or argument that is typed :type:`str`. The attempt will result in either
+a compile time error (if detectable) or a ``TypeError`` exception at
+runtime. You should therefore be careful when you statically type a
+string variable in code that must be compatible with Python 2, as this
+Python version allows a mix of byte strings and unicode strings for data
+and users normally expect code to be able to work with both. Code that
+only targets Python 3 can safely type variables and arguments as either
+:type:`bytes` or :type:`unicode`.
+
General notes about C strings
-----------------------------