-rw-r--r-- README.rst | 91
-rw-r--r-- docs/api.rst | 12
-rw-r--r-- docs/examples.rst | 18
-rw-r--r-- docs/howitworks.rst | 210
-rw-r--r-- docs/intro.rst | 94
-rw-r--r-- docs/locale_issues.rst | 16
-rw-r--r-- docs/shell.rst | 8
7 files changed, 239 insertions, 210 deletions
diff --git a/README.rst b/README.rst
index 91d5d35..df468c2 100644
--- a/README.rst
+++ b/README.rst
@@ -42,8 +42,8 @@ Quick Description
-----------------
When you try to sort a list of strings that contain numbers, the normal python
-sort algorithm sorts lexicographically, so you might not get the results that you
-expect:
+sort algorithm sorts lexicographically, so you might not get the results that
+you expect:
.. code-block:: pycon
@@ -73,10 +73,10 @@ naturally. Below are some other things you can do with ``natsort``
for a quick start guide, or the
`api <https://natsort.readthedocs.io/en/master/api.html>`_ for complete details).
-**Note**: ``natsorted`` is designed to be a drop-in replacement for the built-in
-``sorted`` function. Like ``sorted``, ``natsorted`` `does not sort in-place`.
-To sort a list and assign the output to the same variable, you must
-explicitly assign the output to a variable:
+**Note**: ``natsorted`` is designed to be a drop-in replacement for the
+built-in ``sorted`` function. Like ``sorted``, ``natsorted``
+*does not sort in-place*. To replace the contents of a variable with the
+sorted list, you must explicitly assign the output back to that variable:
.. code-block:: pycon
@@ -137,9 +137,9 @@ version < 4.0.0). Use the ``realsorted`` function:
Locale-Aware Sorting (or "Human Sorting")
+++++++++++++++++++++++++++++++++++++++++
-This is where the non-numeric characters are also ordered based on their meaning,
-not on their ordinal value, and a locale-dependent thousands separator and decimal
-separator is accounted for in the number.
+This is where the non-numeric characters are also ordered based on their
+meaning, not on their ordinal value, and a locale-dependent thousands
+separator and decimal separator are accounted for in the number.
This can be achieved with the ``humansorted`` function:
.. code-block:: pycon
@@ -185,8 +185,8 @@ bitwise OR operator (``|``). For example,
All of the available customizations can be found in the documentation for
`the ns enum <https://natsort.readthedocs.io/en/master/api.html#natsort.ns>`_.
-You can also add your own custom transformation functions with the ``key`` argument.
-These can be used with ``alg`` if you wish.
+You can also add your own custom transformation functions with the ``key``
+argument. These can be used with ``alg`` if you wish.
.. code-block:: pycon
@@ -248,8 +248,9 @@ method.
>>> a
['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
-All of the algorithm customizations mentioned in the `Further Customizing Natsort`_
-section can also be applied to ``natsort_keygen`` through the *alg* keyword option.
+All of the algorithm customizations mentioned in the
+`Further Customizing Natsort`_ section can also be applied to
+``natsort_keygen`` through the *alg* keyword option.
Other Useful Things
+++++++++++++++++++
@@ -266,18 +267,20 @@ FAQ
How do I debug ``natsort.natsorted()``?
The best way to debug ``natsorted()`` is to generate a key using ``natsort_keygen()``
with the same options being passed to ``natsorted``. One can take a look at
- exactly what is being done with their input using this key - it is highly recommended
+ exactly what is being done with their input using this key - it is highly
+ recommended
to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
for *how* to debug, and also to review the
`How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_
page for *why* ``natsort`` is doing that to your data.
- If you are trying to sort custom classes and running into trouble, please take a look at
- https://github.com/SethMMorton/natsort/issues/60. In short,
+ If you are trying to sort custom classes and running into trouble, please
+ take a look at https://github.com/SethMMorton/natsort/issues/60. In short,
custom classes are not likely to be sorted correctly if one relies
- on the behavior of ``__lt__`` and the other rich comparison operators in their
- custom class - it is better to use a ``key`` function with ``natsort``, or
- use the ``natsort`` key as part of your rich comparison operator definition.
+ on the behavior of ``__lt__`` and the other rich comparison operators in
+ their custom class - it is better to use a ``key`` function with
+ ``natsort``, or use the ``natsort`` key as part of your rich comparison
+ operator definition.
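A minimal sketch of the second suggestion, which uses a natural key inside the rich comparison operators. ``natural_key`` and ``Part`` are hypothetical names. ``natural_key`` is a simplified stand-in for the key that ``natsort_keygen()`` would return and handles unsigned integers only, so treat this as an illustration rather than ``natsort``'s own behaviour.

```python
import re
from functools import total_ordering


def natural_key(s):
    """Hypothetical stand-in for natsort_keygen(): text and unsigned-int chunks."""
    # re.split with a capture group puts the digit runs at odd indices.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(re.split(r"([0-9]+)", s)))


@total_ordering
class Part:
    """Hypothetical custom class whose ordering is defined by a natural key."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Part):
            return NotImplemented
        return natural_key(self.name) == natural_key(other.name)

    def __lt__(self, other):
        if not isinstance(other, Part):
            return NotImplemented
        return natural_key(self.name) < natural_key(other.name)

    def __repr__(self):
        return f"Part({self.name!r})"


parts = [Part("bolt10"), Part("bolt2"), Part("bolt1")]
by_operators = sorted(parts)  # relies on __lt__
by_key = sorted(parts, key=lambda p: natural_key(p.name))  # usually simpler
print(by_operators)  # [Part('bolt1'), Part('bolt2'), Part('bolt10')]
```

Passing a ``key`` leaves the class itself untouched. That is the first of the two options recommended above.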
How *does* ``natsort`` work?
If you don't want to read `How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_,
@@ -286,9 +289,9 @@ How *does* ``natsort`` work?
``natsort`` provides a `key function <https://docs.python.org/3/howto/sorting.html#key-functions>`_
that can be passed to `list.sort() <https://docs.python.org/3/library/stdtypes.html#list.sort>`_
or `sorted() <https://docs.python.org/3/library/functions.html#sorted>`_ in order to
- modify the default sorting behavior. This key is generated on-demand with the
- key generator ``natsort.natsort_keygen()``. ``natsort.natsorted()`` is essentially
- a wrapper for the following code:
+ modify the default sorting behavior. This key is generated on-demand with
+ the key generator ``natsort.natsort_keygen()``. ``natsort.natsorted()``
+ is essentially a wrapper for the following code:
.. code-block:: pycon
@@ -341,8 +344,8 @@ The most efficient sorting can occur if you install the
`fastnumbers <https://pypi.org/project/fastnumbers>`_ package
(version >=2.0.0); it helps with the string to number conversions.
``natsort`` will still run (efficiently) without the package, but if you need
-to squeeze out that extra juice it is recommended you include this as a dependency.
-``natsort`` will not require (or check) that
+to squeeze out that extra juice it is recommended you include this as a
+dependency. ``natsort`` will not require (or check) that
`fastnumbers <https://pypi.org/project/fastnumbers>`_ is installed
at installation.
@@ -381,17 +384,18 @@ How to Run Tests
Please note that ``natsort`` is NOT set-up to support ``python setup.py test``.
The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/latest/>`_.
-After installing ``tox``, running tests is as simple as executing the following in the
-``natsort`` directory:
+After installing ``tox``, running tests is as simple as executing the following
+in the ``natsort`` directory:
.. code-block:: console
$ tox
-``tox`` will create virtual a virtual environment for your tests and install all the
-needed testing requirements for you. You can specify a particular python version
-with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is done with ``tox -e flake8``.
-You can see all available testing environments with ``tox --listenvs``.
+``tox`` will create a virtual environment for your tests and install
+all the needed testing requirements for you. You can specify a particular
+Python version with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis
+is done with ``tox -e flake8``. You can see all available testing environments
+with ``tox --listenvs``.
If you do not wish to use ``tox``, you can install the testing dependencies with the
``dev/requirements.txt`` file and then run the tests manually using
@@ -408,7 +412,8 @@ Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this
How to Build Documentation
--------------------------
-If you want to build the documentation for ``natsort``, it is recommended to use ``tox``:
+If you want to build the documentation for ``natsort``, it is recommended to
+use ``tox``:
.. code-block:: console
@@ -430,10 +435,10 @@ Dropping Python 2.7 Support
``natsort`` version 7.0.0 will drop support for Python 2.7.
-The version 6.X branch will remain as a "long term support" branch where bug fixes
-are applied so that users who cannot update from Python 2.7 will not be forced to
-use a buggy ``natsort`` version. Once version 7.0.0 is released, new features
-will not be added to version 6.X, only bug fixes.
+The version 6.X branch will remain as a "long term support" branch where bug
+fixes are applied so that users who cannot update from Python 2.7 will not be
+forced to use a buggy ``natsort`` version. Once version 7.0.0 is released, new
+features will not be added to version 6.X, only bug fixes.
Deprecated APIs
+++++++++++++++
@@ -448,19 +453,21 @@ In ``natsort`` version 6.0.0, the following APIs and functions were removed
- ``ns.TYPESAFE`` (deprecated since version 5.0.0)
- ``ns.DIGIT`` (deprecated since version 5.0.0)
- ``ns.VERSION`` (deprecated since version 5.0.0)
- - ``versorted()`` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
- - ``index_versorted()`` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
+ - ``versorted()`` (discouraged since version 4.0.0,
+ officially deprecated since version 5.5.0)
+ - ``index_versorted()`` (discouraged since version 4.0.0,
+ officially deprecated since version 5.5.0)
-In general, if you want to determine if you are using deprecated APIs you can run your
-code with the following flag
+In general, if you want to determine if you are using deprecated APIs you
+can run your code with the following flag:
.. code-block:: console
$ python -Wdefault::DeprecationWarning my-code.py
-By default ``DeprecationWarnings`` are not shown, but this will cause them to be shown.
-Alternatively, you can just set the environment variable ``PYTHONWARNINGS`` to
-"default::DeprecationWarning" and then run your code.
+By default, ``DeprecationWarning`` messages are usually not shown, but this
+flag causes them to be shown. Alternatively, you can set the environment
+variable ``PYTHONWARNINGS`` to ``default::DeprecationWarning`` and then run
+your code.
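You can also turn on the same filter from inside Python with the standard ``warnings`` module, which is handy in a test suite. This is a small sketch. ``old_api`` is a hypothetical deprecated function that exists only to trigger a warning.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, used only to trigger a warning."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# record=True collects warnings in a list instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    # Same action as -Wdefault::DeprecationWarning: report the first
    # occurrence of each DeprecationWarning per location instead of ignoring it.
    warnings.simplefilter("default", DeprecationWarning)
    old_api()

for w in caught:
    print(f"{w.category.__name__}: {w.message}")
```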
Dropped Pipenv for Development
++++++++++++++++++++++++++++++
diff --git a/docs/api.rst b/docs/api.rst
index 8052606..64794ff 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -89,16 +89,16 @@ Help With Creating Function Keys
++++++++++++++++++++++++++++++++
If you need to create a complicated *key* argument to (for example)
-:func:`natsorted` that is actually multiple functions called one after the other,
-the following function can help you easily perform this action. It is
+:func:`natsorted` that is actually multiple functions called one after the
+other, the following function can help you easily perform this action. It is
used internally to :mod:`natsort`, and has been exposed publicly for
the convenience of the user.
.. autofunction:: chain_functions
-If you need to be able to search your input for numbers using the same definition
-as :mod:`natsort`, you can do so using the following function. Given your chosen
-algorithm (selected using the :class:`~natsort.ns` enum), the corresponding regular
-expression to locate numbers will be returned.
+If you need to be able to search your input for numbers using the same
+definition as :mod:`natsort`, you can do so using the following function.
+Given your chosen algorithm (selected using the :class:`~natsort.ns` enum),
+the corresponding regular expression to locate numbers will be returned.
.. autofunction:: numeric_regex_chooser
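To show what "multiple functions called one after the other" means for the ``chain_functions`` helper documented above, here is a hypothetical ``compose_left`` helper built on ``functools.reduce``. It sketches the idea only and is not ``natsort``'s implementation. You could pass the resulting callable as the ``key`` argument of :func:`natsorted` or :func:`sorted`.

```python
from functools import reduce


def compose_left(*functions):
    """Hypothetical helper: build one callable that applies *functions* in order."""

    def composed(value):
        # Feed the output of each function into the next one, left to right.
        return reduce(lambda acc, func: func(acc), functions, value)

    return composed


# Strip surrounding whitespace first, then lowercase, before comparing.
key = compose_left(str.strip, str.lower)
names = ["  banana", "Apple ", "cherry"]
print(sorted(names, key=key))  # ['Apple ', '  banana', 'cherry']
```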
diff --git a/docs/examples.rst b/docs/examples.rst
index 04ca632..f09f8e5 100644
--- a/docs/examples.rst
+++ b/docs/examples.rst
@@ -47,8 +47,8 @@ By default, if you wish to sort versions that are not as simple as
>>> natsorted(a)
['1.1', '1.2', '1.2.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.3']
-To make the '1.2' pre-releases come before '1.2.1', you need to use the following
-recipe:
+To make the '1.2' pre-releases come before '1.2.1', you need to use the
+following recipe:
.. code-block:: pycon
@@ -76,8 +76,8 @@ to assist in sorting. Some examples might be
`SemVer <https://python-semver.readthedocs.io/en/latest/api.html>`_.
If we are being honest, using these methods to parse a version means you don't
-need to use :mod:`natsort` - you should probably just use :func:`sorted` directly.
-Here's an example with SemVer:
+need to use :mod:`natsort` - you should probably just use :func:`sorted`
+directly. Here's an example with SemVer:
.. code-block:: pycon
@@ -253,8 +253,8 @@ Accounting for Units When Sorting
:mod:`natsort` does not come with any pre-built mechanism to sort units,
but you can write your own `key` to do this. Below, I will demonstrate sorting
imperial lengths (e.g. feet and inches), but of course you can extend this to any
-set of units you need. This example is based on code from
-`this issue <https://github.com/SethMMorton/natsort/issues/100#issuecomment-530659310>`_,
+set of units you need. This example is based on code
+`from this issue <https://github.com/SethMMorton/natsort/issues/100#issuecomment-530659310>`_,
and uses the function :func:`natsort.numeric_regex_chooser` to build a regular
expression that will parse numbers in the same manner as :mod:`natsort` itself.
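For comparison, a much simpler and less general approach skips :mod:`natsort` and :func:`natsort.numeric_regex_chooser` entirely and converts every length to a single number of inches. The ``INCHES_PER_UNIT`` table and the ``QUANTITY`` pattern below are illustrative assumptions that only understand ``ft`` and ``in``.

```python
import re

# Hypothetical conversion table; add entries for any other units you need.
INCHES_PER_UNIT = {"ft": 12, "in": 1}
# Simplified "<number> <unit>" pattern; NOT what numeric_regex_chooser returns.
QUANTITY = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*(ft|in)\b")


def length_in_inches(text):
    """Sum every '<number> <unit>' pair found in *text*, in inches."""
    return sum(float(num) * INCHES_PER_UNIT[unit] for num, unit in QUANTITY.findall(text))


lengths = ["7 ft 6 in", "2 ft 11 in", "1 ft 5 in", "10 ft 2 in", "2 ft 7 in"]
print(sorted(lengths, key=length_in_inches))
# ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
```

The trade-off is that all text outside the recognised quantities is ignored. Any string with no ``ft`` or ``in`` quantity gets a key of ``0``.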
@@ -426,9 +426,9 @@ If you need a codec different from ASCII or UTF-8, you can use
Sorting a Pandas DataFrame
--------------------------
-As of Pandas version 0.16.0, the sorting methods do not accept a ``key`` argument,
-so you cannot simply pass :func:`natsort_keygen` to a Pandas DataFrame and sort.
-This request has been made to the Pandas devs; see
+As of Pandas version 0.16.0, the sorting methods do not accept a ``key``
+argument, so you cannot simply pass :func:`natsort_keygen` to a Pandas
+DataFrame and sort. This request has been made to the Pandas devs; see
`issue 3942 <https://github.com/pydata/pandas/issues/3942>`_ if you are interested.
If you need to sort a Pandas DataFrame, please check out
`this answer on StackOverflow <https://stackoverflow.com/a/29582718/1399279>`_
diff --git a/docs/howitworks.rst b/docs/howitworks.rst
index a8176e3..fb157c6 100644
--- a/docs/howitworks.rst
+++ b/docs/howitworks.rst
@@ -36,7 +36,7 @@ If I want to compare '2 ft 7 in' to '2 ft 11 in', I might do the following
We as humans know that the above should be true, but why does Python think it
is false? Here is how it is performing the comparison:
-.. code-block:: none
+.. code-block::
'2' <=> '2' ==> equal, so keep going
' ' <=> ' ' ==> equal, so keep going
@@ -53,18 +53,18 @@ The best way to handle this is to break the string into sub-components
of numbers and non-numbers, and then convert the numeric parts into
:func:`float` or :func:`int` types. This will force Python to
actually understand the context of what it is sorting and then "do the
-right thing." Luckily, it handles sorting lists of strings right out-of-the-box,
-so the only hard part is actually making this string-to-list transformation
-and then Python will handle the rest.
+right thing." Luckily, it handles sorting lists of strings right
+out-of-the-box, so the only hard part is actually making this string-to-list
+transformation and then Python will handle the rest.
-.. code-block:: none
+.. code-block::
'2 ft 7 in' ==> (2, ' ft ', 7, ' in')
'2 ft 11 in' ==> (2, ' ft ', 11, ' in')
When Python compares the two, it roughly follows the below logic:
-.. code-block:: none
+.. code-block::
2 <=> 2 ==> equal, so keep going
' ft ' <=> ' ft ' ==> a string is a special type of sequence - evaluate each character individually
@@ -92,10 +92,10 @@ Natsort's Approach
Decomposing Strings Into Sub-Components
+++++++++++++++++++++++++++++++++++++++
-The first major hurtle to overcome is to decompose the string into sub-components.
-Remarkably, this turns out to be the easy part, owing mostly to Python's easy access
-to regular expressions. Breaking an arbitrary string based on a pattern is pretty
-straightforward.
+The first major hurdle to overcome is to decompose the string into
+sub-components. Remarkably, this turns out to be the easy part, owing mostly
+to Python's easy access to regular expressions. Breaking an arbitrary string
+based on a pattern is pretty straightforward.
.. code-block:: pycon
@@ -106,10 +106,11 @@ straightforward.
Clear (assuming you can read regular expressions) and concise.
The reason I began developing :mod:`natsort` in the first place was because I
-needed to handle the natural sorting of strings containing *real numbers*, not just
-unsigned integers as the above example contains. By real numbers, I mean those like
-``-45.4920E-23``. :mod:`natsort` can handle just about any number definition;
-to that end, here are all the regular expressions used in :mod:`natsort`:
+needed to handle the natural sorting of strings containing *real numbers*, not
+just unsigned integers as the above example contains. By real numbers, I mean
+those like ``-45.4920E-23``. :mod:`natsort` can handle just about any number
+definition; to that end, here are all the regular expressions used in
+:mod:`natsort`:
.. code-block:: pycon
@@ -120,9 +121,9 @@ to that end, here are all the regular expressions used in :mod:`natsort`:
>>> unsigned_float_no_exponent = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+))'
>>> signed_float_no_exponent = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))'
-Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float definition because you
-wouldn't want (for example) ``"banana"`` to be converted into ``['ba', 'nan', 'a']``,
-Let's see an example:
+Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float
+definition because you wouldn't want (for example) ``"banana"`` to be converted
+into ``['ba', 'nan', 'a']``. Let's see an example:
.. code-block:: pycon
@@ -135,21 +136,21 @@ Let's see an example:
actual code there is also handling for non-ASCII unicode characters (such as ⑦),
but I will ignore that aspect of :mod:`natsort` in this discussion.
-Now, when the user wants to change the definition of a number, it is as easy as changing
-the pattern supplied to the regular expression engine.
-
-Choosing the right default is hard, though (well, in this case it shouldn't have been
-but I was rather thick-headed).
-In retrospect, it should have been obvious that since essentially all the code examples
-I had/have seen for natural sorting were for *unsigned integers*, I should have made the default
-definition of a number an *unsigned integer*. But, in the brash days of my youth I assumed
-that since my use case was real numbers, everyone else would be happier sorting by real numbers;
-so, I made the default definition of a number a *signed float with exponent*.
-`This astonished`_ `a lot`_ `of people`_
+Now, when the user wants to change the definition of a number, it is as easy as
+changing the pattern supplied to the regular expression engine.
+
+Choosing the right default is hard, though (well, in this case it shouldn't
+have been but I was rather thick-headed). In retrospect, it should have been
+obvious that since essentially all the code examples I had/have seen for
+natural sorting were for *unsigned integers*, I should have made the default
+definition of a number an *unsigned integer*. But, in the brash days of my
+youth I assumed that since my use case was real numbers, everyone else would
+be happier sorting by real numbers; so, I made the default definition of a
+number a *signed float with exponent*. `This astonished`_ `a lot`_ `of people`_
(`and some people aren't very nice when they are astonished`_).
Starting with :mod:`natsort` version 4.0.0 the default number definition was
-changed to an *unsigned integer* which satisfies the "least astonishment" principle, and
-I have not heard a complaint since.
+changed to an *unsigned integer*, which satisfies the "least astonishment"
+principle, and I have not heard a complaint since.
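The difference between the two defaults is easy to reproduce. This sketch uses the signed-float (no exponent) pattern listed above and a plain unsigned-integer pattern. ``make_key`` is a hypothetical helper, and real ``natsort`` keys do much more. With the signed-float definition, the hyphen in ``img-1.png`` is read as a minus sign and the following dot as a decimal point, which reverses the expected order.

```python
import re

UNSIGNED_INT = r"([0-9]+)"
# Signed float without exponent, as listed above.
SIGNED_FLOAT = r"([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))"


def make_key(pattern, convert):
    """Hypothetical helper: split on *pattern* and convert the captured numbers."""
    regex = re.compile(pattern)

    def key(s):
        # With one capture group, re.split puts the matched numbers at odd indices.
        return tuple(convert(p) if i % 2 else p for i, p in enumerate(regex.split(s)))

    return key


files = ["img-1.png", "img-2.png", "img-10.png"]
print(sorted(files, key=make_key(UNSIGNED_INT, int)))
# ['img-1.png', 'img-2.png', 'img-10.png']
print(sorted(files, key=make_key(SIGNED_FLOAT, float)))
# ['img-10.png', 'img-2.png', 'img-1.png']  ('-1.', '-2.', '-10.' become negative floats)
```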
Coercing Strings Containing Numbers Into Numbers
++++++++++++++++++++++++++++++++++++++++++++++++
@@ -193,28 +194,29 @@ Here are some timing results run on my machine:
In [5]: %timeit [coerce_regex(x) for x in numbers]
10000 loops, best of 3: 123 µs per loop
-What can we learn from this? The ``try: except`` method (arguably the most "pythonic"
-of the solutions) is best for numeric input, but performs over 5X slower for non-numeric
-input. Conversely, the regular expression method, though slower than ``try: except`` for
-both input types, is more efficient for non-numeric input than for input that can be
-converted to an ``int``. Further, even though the regular expression method is slower
-for both input types, it is always at least twice as fast as the worst case for the
-``try: except``.
-
-Why do I care? Shouldn't I just pick a method and not worry about it? Probably. However,
-I am very conscious about the performance of :mod:`natsort`, and want it to be a true
-drop-in replacement for :func:`sorted` without having to incur a performance penalty.
-For the purposes of :mod:`natsort`, there is no clear winner between the two algorithms -
-the data being passed to this function will likely be a mix of numeric and non-numeric
-string content. Do I use the ``try: except`` method and hope the speed gains on
-numbers will offset the non-number performance, or do I use regular expressions and
-take the more stable performance?
+What can we learn from this? The ``try: except`` method (arguably the most
+"pythonic" of the solutions) is best for numeric input, but is over 5X
+slower for non-numeric input. Conversely, the regular expression method, though
+slower than ``try: except`` for both input types, is more efficient for
+non-numeric input than for input that can be converted to an ``int``. Further,
+even though the regular expression method is slower for both input types, it is
+always at least twice as fast as the worst case for the ``try: except``.
+
+Why do I care? Shouldn't I just pick a method and not worry about it? Probably.
+However, I am very conscious about the performance of :mod:`natsort`, and want
+it to be a true drop-in replacement for :func:`sorted` without having to incur
+a performance penalty. For the purposes of :mod:`natsort`, there is no clear
+winner between the two algorithms - the data being passed to this function will
+likely be a mix of numeric and non-numeric string content. Do I use the
+``try: except`` method and hope the speed gains on numbers will offset the
+non-number performance, or do I use regular expressions and take the more
+stable performance?
It turns out that within the context of :mod:`natsort`, some assumptions can be
made that make a hybrid approach attractive. Because all strings are pre-split
-into numeric and non-numeric content *before* being passed to this coercion function,
-the assumption can be made that *if a string begins with a digit or a sign, it
-can be coerced into a number*.
+into numeric and non-numeric content *before* being passed to this coercion
+function, the assumption can be made that *if a string begins with a digit or a
+sign, it can be coerced into a number*.
.. code-block:: pycon
@@ -238,9 +240,9 @@ So how does this perform compared to the standard coercion methods?
In [7]: %timeit [coerce_to_int(x) for x in not_numbers]
10000 loops, best of 3: 26.4 µs per loop
-The hybrid method eliminates most of the time wasted on numbers checking that it
-is in fact a number before passing to :func:`int`, and eliminates the time wasted
-in the exception stack for input that is not a number.
+For numeric input, the hybrid method eliminates most of the time wasted on
+checking that the input is in fact a number before passing it to :func:`int`,
+and it eliminates the time wasted in the exception stack for non-numeric input.
That's as fast as we can get, right? In pure Python, probably. At least, it's
close. But because I am crazy and a glutton for punishment, I decided to see
@@ -257,12 +259,12 @@ called :func:`fast_int`. How does it fair? Pretty well.
10000 loops, best of 3: 30 µs per loop
During development of :mod:`natsort`, I wanted to ensure that using it did not
-get in the way of a user's program by introducing a performance penalty to their code.
-To that end, I do not feel like my adventures down the rabbit hole of optimization
-of coercion functions was a waste; I can confidently look users in the eye and
-say I considered every option in ensuring :mod:`natsort` is as efficient as possible.
-This is why if `fastnumbers`_ is installed it will be used for this step,
-and otherwise the hybrid method will be used.
+get in the way of a user's program by introducing a performance penalty to
+their code. To that end, I do not feel like my adventures down the rabbit hole
+of optimization of coercion functions were a waste; I can confidently look users
+in the eye and say I considered every option in ensuring :mod:`natsort` is as
+efficient as possible. This is why if `fastnumbers`_ is installed it will be
+used for this step, and otherwise the hybrid method will be used.
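Here is a minimal sketch of that "use fastnumbers if it is installed, otherwise fall back to the hybrid method" strategy for unsigned integers. ``hybrid_int`` is a hypothetical name. The only ``fastnumbers`` call used is ``fast_int``, which returns its input unchanged when it cannot convert it. As described above, checking only the first character is safe solely because the input has already been split into numeric and non-numeric chunks.

```python
try:
    # Optional third-party package; fast_int returns its input unchanged
    # when the input cannot be converted to an int.
    from fastnumbers import fast_int
except ImportError:
    fast_int = None

DIGITS = "0123456789"


def hybrid_int(s):
    """Hypothetical hybrid coercion for chunks that are entirely digits or not.

    Only the first character is checked, which is safe only because the input
    was already split into numeric and non-numeric pieces.
    """
    if s and s[0] in DIGITS:
        return int(s)
    return s


# Prefer the C implementation when it is available.
coerce_to_int = fast_int if fast_int is not None else hybrid_int

chunks = ["2", " ft ", "11", " in"]
print([coerce_to_int(c) for c in chunks])  # [2, ' ft ', 11, ' in']
```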
.. note::
@@ -392,11 +394,11 @@ filename component as well. We can solve that nicely and quickly with
>>> sorted(paths, key=natsort_key_with_path_support)
['/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (10)/file.tar.gz']
-This works because in addition to breaking the input by path separators, the final
-filename component is separated from its extensions as well [#f1]_. *Then*, each of these
-separated components is sent to the :mod:`natsort` algorithm, so the result is
-a tuple of tuples. Once that is done, we can see how comparisons can be done in
-the expected manner.
+This works because in addition to breaking the input by path separators,
+the final filename component is separated from its extensions as well
+[#f1]_. *Then*, each of these separated components is sent to the
+:mod:`natsort` algorithm, so the result is a tuple of tuples. Once that
+is done, we can see how comparisons can be done in the expected manner.
.. code-block:: pycon
@@ -455,22 +457,24 @@ Let's break these down.
#. ``natsort_key_with_poor_real_number_support('12 apples') < natsort_key_with_poor_real_number_support('apples')``
is the same as ``(12.0, ' apples') < ('apples',)``, and thus a number gets
compared to a string [#f2]_ which also is a no-no.
-#. This one scores big on the astonishment scale, especially if one accidentally
- uses signed integers or real numbers when they mean to use unsigned integers.
+#. This one scores big on the astonishment scale, especially if one
+ accidentally uses signed integers or real numbers when they mean
+ to use unsigned integers.
``natsort_key_with_poor_real_number_support('version5.3.0') < natsort_key_with_poor_real_number_support('version5.3rc1')``
- is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``, so in the
- third element a number gets compared to a string, once again the same
- old no-no. (The same would happen with ``'version5-3'`` and ``'version5-a'``,
- which would be come ``('version', 5, -3)`` and ``('version', 5, '-a')``).
-
-As you might expect, the solution to the first issue is to wrap the ``re.split``
-call in a ``try: except:`` block and handle the number specially if a
-:exc:`TypeError` is raised. The second and third cases *could* be handled
+ is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``,
+ so in the third element a number gets compared to a string, once again
+ the same old no-no. (The same would happen with ``'version5-3'`` and
+ ``'version5-a'``, which would become ``('version', 5, -3)`` and
+ ``('version', 5, '-a')``).
+
+As you might expect, the solution to the first issue is to wrap the
+``re.split`` call in a ``try: except:`` block and handle the number specially
+if a :exc:`TypeError` is raised. The second and third cases *could* be handled
in a "special case" manner, meaning only respond and do something different
if these problems are detected. But a less error-prone method is to ensure
that the data is correct-by-construction, and this can be done by ensuring
that the returned tuples *always* start with a string, and then alternate
-in a string-number-string-number-string patter;n this can be achieved by
+in a string-number-string-number-string pattern; this can be achieved by
adding an empty string wherever the pattern is not followed [#f3]_. This ends
up working out pretty nicely because empty strings are always "less" than
any non-empty string, and we typically want numbers to come before strings.
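The following sketch shows the correct-by-construction idea. It splits on the signed-float pattern, drops empty text, coerces numbers, and then pads with ``''``, so the tuple always starts with a string and never has two numbers in a row. ``pad_with_empty_strings`` and ``alternating_key`` are hypothetical names, not the helpers ``natsort`` uses internally.

```python
import re

# Signed float without exponent, as in the regex list earlier on this page.
NUMBER = re.compile(r"([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))")


def pad_with_empty_strings(parts):
    """Insert '' so the result starts with a string and strings/numbers alternate."""
    out = []
    prev_was_number = True  # pretend a number came "before" the start
    for part in parts:
        is_number = not isinstance(part, str)
        if is_number and prev_was_number:
            out.append("")  # fill the missing string slot
        out.append(part)
        prev_was_number = is_number
    return tuple(out)


def alternating_key(s):
    """Hypothetical key: split, drop empty text, coerce numbers, then pad."""
    coerced = []
    for i, part in enumerate(NUMBER.split(s)):
        if i % 2:  # odd indices hold the captured numbers
            coerced.append(float(part))
        elif part:  # keep non-empty text only
            coerced.append(part)
    return pad_with_empty_strings(coerced)


print(alternating_key("12 apples"))     # ('', 12.0, ' apples')
print(alternating_key("version5.3.0"))  # ('version', 5.3, '', 0.0)
print(sorted(["apples", "12 apples"], key=alternating_key))
print(sorted(["version5.3rc1", "version5.3.0"], key=alternating_key))
```

Each comparison now pits a string against a string or a number against a number. The problem inputs from the list above therefore sort without a :exc:`TypeError`.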
@@ -501,7 +505,8 @@ Let's take a look at how this works out.
>>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_good_real_number_support)
['version5.3.0', 'version5.3rc1']
-How the "good" version works will be given in `TL;DR 2 - Handling Crappy, Real-World Input`_.
+How the "good" version works will be given in
+`TL;DR 2 - Handling Crappy, Real-World Input`_.
Handling NaN
++++++++++++
@@ -548,7 +553,8 @@ to know how **NaN** will behave in a sorting algorithm). The simplest way to
satisfy the "least astonishment" principle is to substitute **NaN** with
some other value. But what value is *least* astonishing? I chose to replace
**NaN** with :math:`-\infty` so that these poorly behaved elements always
-end up at the front where the users will most likely be alerted to their presence.
+end up at the front where the users will most likely be alerted to their
+presence.
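The substitution itself needs only a single check in a key function. This is a minimal sketch that assumes the values are already numbers. ``nan_to_neg_inf`` is a hypothetical helper name.

```python
import math


def nan_to_neg_inf(value):
    """Hypothetical key helper: send NaN to the front by treating it as -inf."""
    if isinstance(value, float) and math.isnan(value):
        return -math.inf
    return value


data = [5.0, float("nan"), 2.0, float("nan"), 10.0]
result = sorted(data, key=nan_to_neg_inf)
print(result)  # [nan, nan, 2.0, 5.0, 10.0]
```

Without the key, the result of ``sorted(data)`` depends on where the NaNs happen to sit in the input, because every comparison involving NaN is ``False``.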
.. code-block:: pycon
@@ -571,6 +577,8 @@ Let's see how our elegant key function from :ref:`TL;DR 1 <tldr1>` has
become bastardized in order to support handling mixed real-world data
and user customizations.
+.. code-block:: pycon
+
>>> def natsort_key(x, as_float=False, signed=False, as_path=False):
... if as_float:
... regex = signed_float if signed else unsigned_float
@@ -600,10 +608,10 @@ and user customizations.
... return tuple(sep_inserter(coerced_input, ''))
...
-And this doesn't even show handling :class:`bytes` type! Notice that we have
+And this doesn't even show handling :class:`bytes` input! Notice that we have
to do non-obvious things like modify the return form of numbers when ``as_path``
-is given, just to avoid comparing strings and numbers for the case in which a user provides
-input like ``['/home/me', 42]``.
+is given, just to avoid comparing strings and numbers for the case in which a
+user provides input like ``['/home/me', 42]``.
Let's take it out for a spin!
@@ -629,9 +637,10 @@ Probably the most challenging special case I had to handle was getting
:mod:`natsort` to handle sorting the non-numerical parts of input
correctly, and also allowing it to sort the numerical bits in different
locales. This was in no way what I originally set out to do with this
-library, so I was `caught a bit off guard when the request was initially made`_.
-I discovered the :mod:`locale` library, and assumed that if it's part of Python's
-StdLib there can't be too many dragons, right?
+library, so I was
+`caught a bit off guard when the request was initially made`_.
+I discovered the :mod:`locale` library, and assumed that if it's part of
+Python's StdLib there can't be too many dragons, right?
.. admonition:: INCOMPLETE LIST OF DRAGONS
@@ -653,9 +662,11 @@ These can be summed up as follows:
#. :mod:`locale` is a thin wrapper over your operating system's *locale*
library, so if *that* is broken (like it is on BSD and OSX) then
:mod:`locale` is broken in Python.
-#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform way to use
- the :mod:`locale` sorting functionality between legacy Python and Python 3.
-#. People have differing opinions of how capitalization should affect word order.
+#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform
+ way to use the :mod:`locale` sorting functionality between legacy Python
+ and Python 3.
+#. People have differing opinions of how capitalization should affect word
+ order.
#. There is no built-in way to handle locale-dependent thousands separators
and decimal points *robustly*.
#. Proper handling of Unicode is complicated.
@@ -692,7 +703,8 @@ so all capitalized words appear first. Not everyone agrees that this
is the correct order. Some believe that the capitalized words should
be last (``['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']``).
Some believe that both the lowercase and uppercase versions
-should appear together (``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``).
+should appear together
+(``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``).
Some believe that both should be true ☹. Some people don't care at all [#f4]_.
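The competing orderings are easier to see with a concrete example. Below is a minimal plain-Python sketch that reproduces each of them with :func:`sorted` and a ``key`` function. The helper names and the ``WORDS`` list are illustrative assumptions. The sketch assumes ASCII letters only, and it does not claim to be :mod:`natsort`'s own implementation.

```python
# Plain-Python sketch of the capitalization orderings discussed above.
# Assumes ASCII letters; helper names are hypothetical, not natsort's API.

WORDS = ["Apple", "corn", "Banana", "apple", "banana", "Corn"]


def default_order(words):
    # Ordinal order: every uppercase letter sorts before every lowercase one.
    return sorted(words)


def lowercase_first(words):
    # Swapping case makes lowercase words compare as if they were uppercase,
    # so they move ahead of the capitalized words.
    return sorted(words, key=str.swapcase)


def grouped(words):
    # Compare case-insensitively first, then break ties by ordinal value
    # so that "Apple" lands directly before "apple".
    return sorted(words, key=lambda w: (w.lower(), w))


def grouped_lowercase_first(words):
    # "Both should be true": keep each pair together, lowercase member first.
    return sorted(words, key=lambda w: (w.lower(), w.swapcase()))


if __name__ == "__main__":
    for func in (default_order, lowercase_first, grouped, grouped_lowercase_first):
        print(func.__name__, func(WORDS))
```

Each of the four orderings needs only a different key. This is why the number of combinations, rather than any single case, is what makes the problem hard.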
Solving the first case (I call it *LOWERCASEFIRST*) is actually pretty
@@ -787,7 +799,6 @@ Unicode is hard and complicated. Here's an example.
>>> sorted(a) # doctest: +SKIP
['a', 'e', 'é', 'f', 'z', 'é']
-
There are more than one way to represent the character 'é' in Unicode.
In fact, many characters have multiple representations. This is a challenge
because comparing the two representations would return ``False`` even though
@@ -806,12 +817,14 @@ The original approach that :mod:`natsort` took with respect to non-ASCII
Unicode characters was to say "just use
the :mod:`locale` or :mod:`PyICU` library" and then cross it's fingers
and hope those libraries take care of it. As you will find in the following
-sections, that comes with its own baggage, and turned out to not always work anyway
-(see https://stackoverflow.com/q/45734562/1399279). A more robust approach is to
-handle the Unicode out-of-the-box without invoking a heavy-handed library
-like :mod:`locale` or :mod:`PyICU`. To do this, we must use *normalization*.
-
-To fully understand Unicode normalization, `check out some official Unicode documentation`_.
+sections, that comes with its own baggage, and turned out to not always work
+anyway (see https://stackoverflow.com/q/45734562/1399279). A more robust
+approach is to handle Unicode out of the box without invoking a
+heavy-handed library like :mod:`locale` or :mod:`PyICU`.
+To do this, we must use *normalization*.
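Here is a minimal sketch of normalization that uses only the standard :mod:`unicodedata` module. It assumes NFD (canonical decomposition) as the target form, and ``nfd_key`` is a hypothetical helper name. It illustrates the idea and does not reproduce :mod:`natsort`'s code. The plain :func:`sorted` call reproduces the surprising output shown earlier. Normalizing inside the key places both spellings of 'é' next to each other.

```python
import unicodedata

# Two spellings of "é": one precomposed code point (U+00E9), and "e"
# followed by a combining acute accent (U+0301). They render the same
# but are different strings.
COMPOSED = "\u00e9"
DECOMPOSED = "e\u0301"


def nfd_key(s):
    # Normalize to NFD so every "é" becomes "e" + U+0301 before comparing.
    return unicodedata.normalize("NFD", s)


words = ["a", "e", DECOMPOSED, "f", "z", COMPOSED]

if __name__ == "__main__":
    print(COMPOSED == DECOMPOSED)                    # False
    print(nfd_key(COMPOSED) == nfd_key(DECOMPOSED))  # True
    print(sorted(words))               # precomposed "é" sorts after "z"
    print(sorted(words, key=nfd_key))  # both "é" sort between "e" and "f"
```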
+
+To fully understand Unicode normalization,
+`check out some official Unicode documentation`_.
Just kidding... that's too much text. The following StackOverflow answers do
a good job at explaining Unicode normalization in simple terms:
https://stackoverflow.com/a/7934397/1399279 and
@@ -1076,11 +1089,12 @@ what the rest of the world assumes.
:func:`sep_inserter` in `util.py`_.
.. [#f4]
Handling each of these is straightforward, but coupled with the rapidly
- fracturing execution paths presented in :ref:`TL;DR 2 <tldr2>` one can imagine
- this will get out of hand quickly. If you take a look at `natsort.py`_ and
- `util.py`_ you can observe that to avoid this I take a more functional approach
- to construting the :mod:`natsort` algorithm as opposed to the procedural approach
- illustrated in :ref:`TL;DR 1 <tldr1>` and :ref:`TL;DR 2 <tldr2>`.
+ fracturing execution paths presented in :ref:`TL;DR 2 <tldr2>`, one can
+ imagine this will get out of hand quickly. If you take a look at
+ `natsort.py`_ and `util.py`_, you can see that to avoid this I take
+ a more functional approach to constructing the :mod:`natsort` algorithm
+ as opposed to the procedural approach illustrated in
+ :ref:`TL;DR 1 <tldr1>` and :ref:`TL;DR 2 <tldr2>`.
.. _ASCII table: https://www.asciitable.com/
.. _getting sorting right is surprisingly hard: http://www.compciv.org/guides/python/fundamentals/sorting-collections-with-sorted/
diff --git a/docs/intro.rst b/docs/intro.rst
index e5905fc..0a25a35 100644
--- a/docs/intro.rst
+++ b/docs/intro.rst
@@ -17,11 +17,11 @@ Simple yet flexible natural sorting in Python.
**NOTE**: Please see the `Deprecation Schedule`_ section for changes in
:mod:`natsort` version 6.0.0 and in the upcoming version 7.0.0.
-:mod:`natsort` is a general utility for sorting lists *naturally*; the definition
-of "naturally" is not well-defined, but the most common definition is that numbers
-contained within the string should be sorted as numbers and not as you would
-other characters. If you need to present sorted output to a user, you probably
-want to sort it naturally.
+:mod:`natsort` is a general utility for sorting lists *naturally*; the
+meaning of "naturally" is not well-defined, but the most common definition
+is that numbers contained within a string should be sorted by their numeric
+value rather than character by character. If you need to present sorted
+output to a user, you probably want to sort it naturally.
:mod:`natsort` was initially created for sorting scientific output filenames that
contained signed floating point numbers in the names. There was a lack of
@@ -32,8 +32,9 @@ and its answers and links therein,
`this ActiveState forum <https://code.activestate.com/recipes/285264-natural-string-sorting/>`_,
and of course `this great article on natural sorting <https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/>`_
from CodingHorror.com for examples of what I mean.
-:mod:`natsort` was created to fill in this gap, but has since expanded to handle
-just about any definition of a number, as well as other sorting customizations.
+:mod:`natsort` was created to fill in this gap, but has since expanded to
+handle just about any definition of a number, as well as other sorting
+customizations.
Quick Description
-----------------
@@ -183,8 +184,8 @@ bitwise OR operator (``|``). For example,
All of the available customizations can be found in the documentation for
the :class:`~natsort.ns` enum.
-You can also add your own custom transformation functions with the ``key`` argument.
-These can be used with ``alg`` if you wish:
+You can also add your own custom transformation functions with the ``key``
+argument. These can be used with ``alg`` if you wish:
.. code-block:: pycon
@@ -246,8 +247,9 @@ method.
>>> a
['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
-All of the algorithm customizations mentioned in the `Further Customizing Natsort`_
-section can also be applied to :func:`~natsort_keygen` through the *alg* keyword option.
+All of the algorithm customizations mentioned in the
+`Further Customizing Natsort`_ section can also be applied to
+:func:`~natsort_keygen` through the *alg* keyword option.
Other Useful Things
+++++++++++++++++++
@@ -263,18 +265,19 @@ FAQ
How do I debug :func:`~natsorted`?
The best way to debug :func:`~natsorted` is to generate a key using :func:`~natsort_keygen`
- with the same options being passed to :func:`~natsorted`. One can take a look at
- exactly what is being done with their input using this key - it is highly recommended
- to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
+ with the same options being passed to :func:`~natsorted`. One can take a
+ look at exactly what is being done with their input using this key - it is
+ highly recommended to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
for *how* to debug, and also to review the :ref:`howitworks` page for *why*
:mod:`natsort` is doing that to your data.
- If you are trying to sort custom classes and running into trouble, please take a look at
- https://github.com/SethMMorton/natsort/issues/60. In short,
+ If you are trying to sort custom classes and running into trouble, please
+ take a look at https://github.com/SethMMorton/natsort/issues/60. In short,
custom classes are not likely to be sorted correctly if one relies
- on the behavior of ``__lt__`` and the other rich comparison operators in their
- custom class - it is better to use a ``key`` function with :mod:`natsort`, or
- use the :mod:`natsort` key as part of your rich comparison operator definition.
+ on the behavior of ``__lt__`` and the other rich comparison operators in
+ their custom class - it is better to use a ``key`` function with
+ :mod:`natsort`, or use the :mod:`natsort` key as part of your rich
+ comparison operator definition.
How *does* :mod:`natsort` work?
If you don't want to read :ref:`howitworks`, here is a quick primer.
@@ -318,8 +321,8 @@ How *does* :mod:`natsort` work?
Shell script
------------
-:mod:`natsort` comes with a shell script called :mod:`natsort`, or can also be called
-from the command line with ``python -m natsort``.
+:mod:`natsort` comes with a shell script called ``natsort``, and can also be
+called from the command line with ``python -m natsort``.
Requirements
------------
@@ -335,9 +338,9 @@ fastnumbers
The most efficient sorting can occur if you install the
`fastnumbers <https://pypi.org/project/fastnumbers>`_ package
(version >=2.0.0); it helps with the string to number conversions.
-:mod:`natsort` will still run (efficiently) without the package, but if you need
-to squeeze out that extra juice it is recommended you include this as a dependency.
-:mod:`natsort` will not require (or check) that
+:mod:`natsort` will still run (efficiently) without the package, but if you
+need to squeeze out that extra juice it is recommended you include this as a
+dependency. :mod:`natsort` will not require (or check) that
`fastnumbers <https://pypi.org/project/fastnumbers>`_ is installed
at installation.
@@ -373,20 +376,22 @@ at installation time to install those dependencies as well - use ``fast`` for
How to Run Tests
----------------
-Please note that :mod:`natsort` is NOT set-up to support ``python setup.py test``.
+Please note that :mod:`natsort` is NOT set up to support
+``python setup.py test``.
The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/latest/>`_.
-After installing ``tox``, running tests is as simple as executing the following in the
-``natsort`` directory:
+After installing ``tox``, running tests is as simple as executing the following
+in the ``natsort`` directory:
.. code-block:: sh
$ tox
-``tox`` will create virtual a virtual environment for your tests and install all the
-needed testing requirements for you. You can specify a particular python version
-with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is done with ``tox -e flake8``.
-You can see all available testing environments with ``tox --listenvs``.
+``tox`` will create a virtual environment for your tests and install all
+the needed testing requirements for you. You can specify a particular
+Python version with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is
+done with ``tox -e flake8``. You can see all available testing environments
+with ``tox --listenvs``.
If you do not wish to use ``tox``, you can install the testing dependencies with the
``dev/requirements.txt`` file and then run the tests manually using
@@ -403,7 +408,8 @@ Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this
How to Build Documentation
--------------------------
-If you want to build the documentation for :mod:`natsort`, it is recommended to use ``tox``:
+If you want to build the documentation for :mod:`natsort`, it is recommended to
+use ``tox``:
.. code-block:: console
@@ -425,10 +431,10 @@ Dropping Python 2.7 Support
:mod:`natsort` version 7.0.0 will drop support for Python 2.7.
-The version 6.X branch will remain as a "long term support" branch where bug fixes
-are applied so that users who cannot update from Python 2.7 will not be forced to
-use a buggy :mod:`natsort` version. Once version 7.0.0 is released, new features
-will not be added to version 6.X, only bug fixes.
+The version 6.X branch will remain as a "long term support" branch where bug
+fixes are applied so that users who cannot update from Python 2.7 will not be
+forced to use a buggy :mod:`natsort` version. Once version 7.0.0 is released,
+new features will not be added to version 6.X, only bug fixes.
Deprecated APIs
+++++++++++++++
@@ -443,19 +449,21 @@ In :mod:`natsort` version 6.0.0, the following APIs and functions were removed
- ``ns.TYPESAFE`` (deprecated since version 5.0.0)
- ``ns.DIGIT`` (deprecated since version 5.0.0)
- ``ns.VERSION`` (deprecated since version 5.0.0)
- - :func:`~natsort.versorted` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
- - :func:`~natsort.index_versorted` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
+ - :func:`~natsort.versorted` (discouraged since version 4.0.0,
+ officially deprecated since version 5.5.0)
+ - :func:`~natsort.index_versorted` (discouraged since version 4.0.0,
+ officially deprecated since version 5.5.0)
-In general, if you want to determine if you are using deprecated APIs you can run your
-code with the following flag
+In general, if you want to determine whether you are using deprecated APIs,
+you can run your code with the following flag:
.. code-block:: console
$ python -Wdefault::DeprecationWarning my-code.py
-By default :exc:`DeprecationWarnings` are not shown, but this will cause them to be shown.
-Alternatively, you can just set the environment variable ``PYTHONWARNINGS`` to
-"default::DeprecationWarning" and then run your code.
+By default, :exc:`DeprecationWarning` messages are not shown, but this will
+cause them to be shown. Alternatively, you can set the environment variable
+``PYTHONWARNINGS`` to ``default::DeprecationWarning`` and then run your code.
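If you cannot change the command line or the environment, you can install the same filter from inside your program with the standard :mod:`warnings` module. This is a minimal sketch. ``show_deprecation_warnings`` is a hypothetical helper name. The filter only affects warnings raised after it is installed.

```python
import warnings


def show_deprecation_warnings():
    # In-code counterpart of `python -Wdefault::DeprecationWarning`:
    # report the first occurrence of each DeprecationWarning per code location.
    warnings.simplefilter("default", DeprecationWarning)


if __name__ == "__main__":
    show_deprecation_warnings()
    # Any deprecated call made after this point will now be reported.
    warnings.warn("this API is deprecated", DeprecationWarning)
```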
Dropped Pipenv for Development
++++++++++++++++++++++++++++++
diff --git a/docs/locale_issues.rst b/docs/locale_issues.rst
index 88cf3b8..427eedc 100644
--- a/docs/locale_issues.rst
+++ b/docs/locale_issues.rst
@@ -9,9 +9,9 @@ Possible Issues with :func:`~natsort.humansorted` or ``ns.LOCALE``
Being Locale-Aware Means Both Numbers and Non-Numbers
-----------------------------------------------------
-In addition to modifying how characters are sorted, ``ns.LOCALE`` will take into
-account locale-dependent thousands separators (and locale-dependent decimal
-separators if ``ns.FLOAT`` is enabled). This means that if you are in a
+In addition to modifying how characters are sorted, ``ns.LOCALE`` will take
+into account locale-dependent thousands separators (and locale-dependent
+decimal separators if ``ns.FLOAT`` is enabled). This means that if you are in a
locale that uses commas as the thousands separator, a number like
``123,456`` will be interpreted as ``123456``. If this is not what you want,
you may consider using ``ns.LOCALEALPHA`` which will only enable locale-aware
@@ -52,8 +52,8 @@ installed, please keep the following known problems and issues in mind.
.. note:: Remember, if you have `PyICU`_ installed you shouldn't need to worry
about any of these.
-Explicitly Set the Locale Before Using :func:`~natsort.humansorted` or ``ns.LOCALE``
-++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+Explicitly Set the Locale Before Using ``ns.LOCALE``
+++++++++++++++++++++++++++++++++++++++++++++++++++++
I have found that unless you explicitly set a locale, the sorted order may not
be what you expect. Setting this is straightforward
@@ -90,8 +90,8 @@ install this there is some hope.
a built-in lookup table of thousands separators that are incorrect
on OS X/BSD (but is possible it is not complete... please file an
issue if you see it is not complete)
- 2. Use "\*.ISO8859-1" locale (i.e. 'en_US.ISO8859-1') rather than "\*.UTF-8"
- locale. I have found that these have fewer issues than "UTF-8", but
- your mileage may vary.
+ 2. Use "\*.ISO8859-1" locale (e.g. 'en_US.ISO8859-1') rather than
+ "\*.UTF-8" locale. I have found that these have fewer issues than
+ "UTF-8", but your mileage may vary.
.. _PyICU: https://pypi.org/project/PyICU
diff --git a/docs/shell.rst b/docs/shell.rst
index 8a17ccc..0d7d3c9 100644
--- a/docs/shell.rst
+++ b/docs/shell.rst
@@ -14,7 +14,7 @@ Below is the usage and some usage examples for the ``natsort`` shell script.
Usage
-----
-.. code-block:: none
+.. code-block::
usage: natsort [-h] [--version] [-p] [-f LOW HIGH] [-F LOW HIGH] [-e EXCLUDE]
[-r] [-t {digit,int,float,version,ver}] [--nosign] [--noexp]
@@ -81,7 +81,7 @@ named after the parameter used:
$ ls *.out
mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
-(Obviously, in reality there would be more files, but you get the idea.) Notice
+(Obviously, in reality there would be more files, but you get the idea.) Notice
that the shell sorts in lexicographical order. This is the behavior of programs like
``find`` as well as ``ls``. The problem is passing these files to an
analysis program causes them not to appear in numerical order, which can lead
@@ -114,8 +114,8 @@ To sort version numbers, use the default ``--number-type``:
prog-1.10.zip
prog-2.0.zip
-In general, all ``natsort`` shell script options mirror the :func:`~natsorted` API,
-with notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude``
+In general, all ``natsort`` shell script options mirror the :func:`~natsorted`
+API, with the notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude``
options. These three options are used as follows:
.. code-block:: console