summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorSeth Morton <seth.m.morton@gmail.com>2022-01-30 19:17:17 -0800
committerGitHub <noreply@github.com>2022-01-30 19:17:17 -0800
commit473348f820941cae18d377bde986e56ecc0ff37e (patch)
treedeb3521c2a0f727b0d43a7f55d0d226f9e3959d2
parent24d7a4c73d655a72baae1acd8a988837a6cc8cac (diff)
parent1606199f3e693d8e3f61447d8dd370af4435ee02 (diff)
downloadnatsort-473348f820941cae18d377bde986e56ecc0ff37e.tar.gz
Merge pull request #147 from SethMMorton/remove-python2-mentions-in-documentation
Remove python2 mentions in documentation
-rw-r--r--README.rst43
-rw-r--r--docs/api.rst4
-rw-r--r--docs/examples.rst9
-rw-r--r--docs/howitworks.rst108
4 files changed, 97 insertions, 67 deletions
diff --git a/README.rst b/README.rst
index fb9c081..81f4f8e 100644
--- a/README.rst
+++ b/README.rst
@@ -34,11 +34,10 @@ Simple yet flexible natural sorting in Python.
- `Installation`_
- `How to Run Tests`_
- `How to Build Documentation`_
- - `Deprecation Schedule`_
+ - `Dropped Deprecated APIs`_
- `History`_
-**NOTE**: Please see the `Deprecation Schedule`_ section for changes in
-``natsort`` version 7.0.0.
+**NOTE**: Please see the `Dropped Deprecated APIs`_ section for changes.
Quick Description
-----------------
@@ -103,7 +102,7 @@ Quick Examples
- `Locale-Aware Sorting (or "Human Sorting")`_
- `Further Customizing Natsort`_
- `Sorting Mixed Types`_
-- `Handling Bytes on Python 3`_
+- `Handling Bytes`_
- `Generating a Reusable Sorting Key and Sorting In-Place`_
- `Other Useful Things`_
@@ -241,26 +240,23 @@ when you sort:
>>> a = ['4.5', 6, 2.0, '5', 'a']
>>> natsorted(a)
[2.0, '4.5', '5', 6, 'a']
- >>> # On Python 2, sorted(a) would return [2.0, 6, '4.5', '5', 'a']
- >>> # On Python 3, sorted(a) would raise an "unorderable types" TypeError
+ >>> # sorted(a) would raise an "unorderable types" TypeError
-Handling Bytes on Python 3
-++++++++++++++++++++++++++
+Handling Bytes
+++++++++++++++
-``natsort`` does not officially support the `bytes` type on Python 3, but
+``natsort`` does not officially support the `bytes` type, but
convenience functions are provided that help you decode to `str` first:
.. code-block:: pycon
>>> from natsort import as_utf8
>>> a = [b'a', 14.0, 'b']
- >>> # On Python 2, natsorted(a) would would work as expected.
- >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
+ >>> # natsorted(a) would raise a TypeError (bytes() < str())
>>> natsorted(a, key=as_utf8) == [14.0, b'a', 'b']
True
>>> a = [b'a56', b'a5', b'a6', b'a40']
- >>> # On Python 2, natsorted(a) would would work as expected.
- >>> # On Python 3, natsorted(a) would return the same results as sorted(a)
+ >>> # natsorted(a) would return the same results as sorted(a)
>>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
True
@@ -446,27 +442,8 @@ use ``tox``:
This will place the documentation in ``build/sphinx/html``.
-Deprecation Schedule
---------------------
-
-Dropped Python 3.4 and Python 3.5 Support
-+++++++++++++++++++++++++++++++++++++++++
-
-``natsort`` version 8.0.0 dropped support for Python < 3.6.
-
-Dropped Python 2.7 Support
-++++++++++++++++++++++++++
-
-``natsort`` version 7.0.0 dropped support for Python 2.7.
-
-The version 6.X branch will remain as a "long term support" branch where bug
-fixes are applied so that users who cannot update from Python 2.7 will not be
-forced to use a buggy ``natsort`` version (bug fixes will need to be requested;
-by default only the 7.X branch will be updated).
-New features would not be added to version 6.X, only bug fixes.
-
Dropped Deprecated APIs
-+++++++++++++++++++++++
+-----------------------
In ``natsort`` version 6.0.0, the following APIs and functions were removed
diff --git a/docs/api.rst b/docs/api.rst
index 242b22d..ddf6964 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -83,8 +83,8 @@ Convenience Functions
.. _bytes_help:
-Help With Bytes On Python 3
-+++++++++++++++++++++++++++
+Help With Bytes
++++++++++++++++
The official stance of :mod:`natsort` is to not support `bytes` for
sorting; there is just too much that can go wrong when trying to automate
diff --git a/docs/examples.rst b/docs/examples.rst
index 8bc05c0..689983a 100644
--- a/docs/examples.rst
+++ b/docs/examples.rst
@@ -349,10 +349,10 @@ Just like the :func:`sorted` built-in function, you can supply the
>>> natsorted(a, reverse=True)
['a10', 'a9', 'a4', 'a2', 'a1']
-Sorting Bytes on Python 3
--------------------------
+Sorting Bytes
+-------------
-Python 3 is rather strict about comparing strings and bytes, and this
+Python is rather strict about comparing strings and bytes, and this
can make it difficult to deal with collections of both. Because of the
challenge of guessing which encoding should be used to decode a bytes
array to a string, :mod:`natsort` does *not* try to guess and automatically
@@ -368,8 +368,7 @@ array, so you can use the key on any arbitrary collection of data.
>>> from natsort import as_ascii
>>> a = [b'a', 14.0, 'b']
- >>> # On Python 2, natsorted(a) would would work as expected.
- >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
+ >>> # natsorted(a) would raise a TypeError (bytes() < str())
>>> natsorted(a, key=as_ascii) == [14.0, b'a', 'b']
True
diff --git a/docs/howitworks.rst b/docs/howitworks.rst
index c60bbeb..d1c4559 100644
--- a/docs/howitworks.rst
+++ b/docs/howitworks.rst
@@ -413,11 +413,18 @@ is done, we can see how comparisons can be done in the expected manner.
>>> a > b
True
-Comparing Different Types on Python 3
-+++++++++++++++++++++++++++++++++++++
+.. note::
+
+ The actual :meth:`decompose_path_into_components`-equivalent function in
+ :mod:`natsort` actually has a few more heuristics than shown here so that
+ it is not over-zealous in what it defines as a path suffix, but this has
+ been omitted in this how-to for clarity.
+
+Comparing Different Types
++++++++++++++++++++++++++
`The second major special case I encountered was sorting of different types`_.
-If you are on Python 2 (i.e. legacy Python), this mostly doesn't matter *too*
+On Python 2 (i.e. legacy Python), this mostly didnt't matter *too*
much since it uses an arbitrary heuristic to allow traditionally un-comparable
types to be compared (such as comparing ``'a'`` to ``1``). However, on Python 3
(i.e. Python) it simply won't let you perform such nonsense, raising a
@@ -662,9 +669,9 @@ These can be summed up as follows:
#. :mod:`locale` is a thin wrapper over your operating system's *locale*
library, so if *that* is broken (like it is on BSD and OSX) then
:mod:`locale` is broken in Python.
-#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform
+#. Because of a bug in legacy Python (i.e. Python 2), there was no uniform
way to use the :mod:`locale` sorting functionality between legacy Python
- and Python 3.
+ and Python (luckily this is no longer an issue now that Python 2 is EOL).
#. People have differing opinions of how capitalization should affect word
order.
#. There is no built-in way to handle locale-dependent thousands separators
@@ -715,23 +722,13 @@ easy... just call the :meth:`str.swapcase` method on the input.
>>> sorted(a, key=lambda x: x.swapcase())
['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
-The last (i call it *IGNORECASE*) should be super easy, right?
-Simply call :meth:`str.lowercase` on the input. This will work but may
-not always give the correct answer on non-latin character sets. It's
-a good thing that in Python 3.3
-:meth:`str.casefold` was introduced, which does a better job of removing
-all case information from unicode characters in
-non-latin alphabets.
+The last (i call it *IGNORECASE*) is pretty easy.
+Simply call :meth:`str.casefold` on the input (it's like :meth:`std.lowercase`
+but does a better job on non-latin character sets).
.. code-block:: pycon
- >>> def remove_case(x):
- ... try:
- ... return x.casefold()
- ... except AttributeError: # Legacy Python backwards compatibility
- ... return x.lowercase()
- ...
- >>> sorted(a, key=remove_case)
+ >>> sorted(a, key=lambda x: x.casefold())
['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
The middle case (I call it *GROUPLETTERS*) is less straightforward.
@@ -742,7 +739,7 @@ with its lowercase version and then the original character.
>>> import itertools
>>> def groupletters(x):
- ... return ''.join(itertools.chain.from_iterable((remove_case(y), y) for y in x))
+ ... return ''.join(itertools.chain.from_iterable((y.casefold(), y) for y in x))
...
>>> groupletters('Apple')
'aAppppllee'
@@ -904,13 +901,69 @@ characters; otherwise, numbers won't be parsed properly. Therefore, it must
be applied as part of the :func:`coerce_to_int`/:func:`coerce_to_float`
functions in a manner similar to :func:`groupletters`.
-As you might have guessed, there is a small problem.
-It turns out the there is a bug in the legacy Python implementation of
-:func:`locale.strxfrm` that causes it to outright fail for :func:`unicode`
-input (https://bugs.python.org/issue2481). :func:`locale.strcoll` works,
-but is intended for use with ``cmp``, which does not exist in current Python
-implementations. Luckily, the :func:`functools.cmp_to_key` function
-makes :func:`locale.strcoll` behave like :func:`locale.strxfrm`.
+Unicode Support With Local
+++++++++++++++++++++++++++
+
+Remember how in the `Basic Unicode Support`_ section I mentioned that we
+use the "decompressed" Unicode normalization form (e.g. NFD) on all inputs
+to ensure the order is as expected?
+
+If you have been following along so far, you probably expect that it is not
+that easy. You would be correct.
+
+It turns out that some locales (but not all) expect the input to be in
+"compressed form" (e.g. NFC) or the ordering is not as you might expect.
+`Check out this issue for a real-world example`_. Here's a relevant
+snippet of code
+
+.. code-block:: pycon
+
+ In [1]: import locale, unicodedata
+
+ In [2]: a = ['Aš', 'Cheb', 'Česko', 'Cibulov', 'Znojmo', 'Žilina']
+
+ In [3]: locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
+ Out[3]: 'en_US.UTF-8'
+
+ In [4]: sorted(a, key=locale.strxfrm)
+ Out[4]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+ In [5]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
+ Out[5]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+ In [6]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
+ Out[6]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+ In [7]: locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
+ Out[7]: 'de_DE.UTF-8'
+
+ In [8]: sorted(a, key=locale.strxfrm)
+ Out[8]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+ In [9]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
+ Out[9]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+ In [10]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
+ Out[10]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+ In [11]: locale.setlocale(locale.LC_ALL, 'cs_CZ.UTF-8')
+ Out[11]: 'cs_CZ.UTF-8'
+
+ In [12]: sorted(a, key=locale.strxfrm)
+ Out[12]: ['Aš', 'Cibulov', 'Česko', 'Cheb', 'Znojmo', 'Žilina']
+
+ In [13]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
+ Out[13]: ['Aš', 'Česko', 'Cibulov', 'Cheb', 'Žilina', 'Znojmo']
+
+ In [14]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
+ Out[14]: ['Aš', 'Cibulov', 'Česko', 'Cheb', 'Znojmo', 'Žilina']
+
+Two out of three locales sort the same data in the same order no matter how the unicode
+input was normalized, but Czech seems to care how the input is formatted!
+
+So, everthing mentioned in `Basic Unicode Support`_ is conditional on whether
+or not the user wants to use the :mod:`locale` library or not. If not, then
+"NFD" normalization is used. If they do, "NFC" normalization is used.
Handling Broken Locale On OSX
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1121,3 +1174,4 @@ what the rest of the world assumes.
.. _really good: https://hypothesis.readthedocs.io/en/latest/
.. _testing strategy: https://docs.pytest.org/en/latest/
.. _check out some official Unicode documentation: https://unicode.org/reports/tr15/
+.. _Check out this issue for a real-world example: https://github.com/SethMMorton/natsort/issues/140 \ No newline at end of file