From 48349a14c6a23d6924bf29ca8aaa06b6a401e551 Mon Sep 17 00:00:00 2001 From: Seth Morton Date: Sun, 26 Feb 2023 15:08:25 -0800 Subject: Move most documentation to the wiki The pages are left with a note about the redirection so as to not destroy the internet. --- docs/examples.rst | 420 +---------------- docs/howitworks.rst | 1169 +----------------------------------------------- docs/locale_issues.rst | 91 +--- docs/shell.rst | 152 +------ 4 files changed, 10 insertions(+), 1822 deletions(-) diff --git a/docs/examples.rst b/docs/examples.rst index 689983a..b372ca7 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -6,421 +6,5 @@ Examples and Recipes ==================== -If you want more detailed examples than given on this page, please see -https://github.com/SethMMorton/natsort/tree/master/tests. - -.. contents:: - :local: - -Basic Usage ------------ - -In the most basic use case, simply import :func:`~natsorted` and use -it as you would :func:`sorted`: - -.. code-block:: pycon - - >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in'] - >>> sorted(a) - ['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in'] - >>> from natsort import natsorted, ns - >>> natsorted(a) - ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in'] - -Sort Version Numbers --------------------- - -As of :mod:`natsort` version >= 4.0.0, :func:`~natsorted` will work for -well-behaved version numbers, like ``MAJOR.MINOR.PATCH``. - -.. _rc_sorting: - -Sorting More Expressive Versioning Schemes -++++++++++++++++++++++++++++++++++++++++++ - -By default, if you wish to sort versions that are not as simple as -``MAJOR.MINOR.PATCH`` (or similar), you may not get the results you expect: - -.. code-block:: pycon - - >>> a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3'] - >>> natsorted(a) - ['1.1', '1.2', '1.2.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.3'] - -To make the '1.2' pre-releases come before '1.2.1', you need to use the -following recipe: - -.. code-block:: pycon - - >>> natsorted(a, key=lambda x: x.replace('.', '~')) - ['1.1', '1.2', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2.1', '1.3'] - -If you also want '1.2' after all the alpha, beta, and rc candidates, you can -modify the above recipe: - -.. code-block:: pycon - - >>> natsorted(a, key=lambda x: x.replace('.', '~')+'z') - ['1.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2', '1.2.1', '1.3'] - -Please see `this issue `_ to -see why this works. - -Sorting Rigorously Defined Versioning Schemes (e.g. SemVer or PEP 440) -"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" - -If you know you are using a versioning scheme that follows a well-defined format -for which there is third-party module support, you should use those modules -to assist in sorting. Some examples might be -`PEP 440 `_ or -`SemVer `_. - -If we are being honest, using these methods to parse a version means you don't -need to use :mod:`natsort` - you should probably just use :func:`sorted` -directly. Here's an example with SemVer: - -.. code-block:: pycon - - >>> from semver import VersionInfo - >>> a = ['3.4.5-pre.1', '3.4.5', '3.4.5-pre.2+build.4'] - >>> sorted(a, key=VersionInfo.parse) - ['3.4.5-pre.1', '3.4.5-pre.2+build.4', '3.4.5'] - -.. _path_sort: - -Sort OS-Generated Paths ------------------------ - -In some cases when sorting file paths with OS-Generated names, the default -:mod:`~natsorted` algorithm may not be sufficient. In cases like these, -you may need to use the ``ns.PATH`` option: - -.. code-block:: pycon - - >>> a = ['./folder/file (1).txt', - ... './folder/file.txt', - ... './folder (1)/file.txt', - ... './folder (10)/file.txt'] - >>> natsorted(a) - ['./folder (1)/file.txt', './folder (10)/file.txt', './folder/file (1).txt', './folder/file.txt'] - >>> natsorted(a, alg=ns.PATH) - ['./folder/file.txt', './folder/file (1).txt', './folder (1)/file.txt', './folder (10)/file.txt'] - -Locale-Aware Sorting (Human Sorting) ------------------------------------- - -.. note:: - Please read :ref:`locale_issues` before using ``ns.LOCALE``, :func:`humansorted`, - or :func:`index_humansorted`. - -You can instruct :mod:`natsort` to use locale-aware sorting with the -``ns.LOCALE`` option. In addition to making this understand non-ASCII -characters, it will also properly interpret non-'.' decimal separators -and also properly order case. It may be more convenient to just use -the :func:`humansorted` function: - -.. code-block:: pycon - - >>> from natsort import humansorted - >>> import locale - >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') - 'en_US.UTF-8' - >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana'] - >>> natsorted(a, alg=ns.LOCALE) - ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn'] - >>> humansorted(a) - ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn'] - -You may find that if you do not explicitly set the locale your results may not -be as you expect... I have found that it depends on the system you are on. -If you use `PyICU `_ (see below) then -you should not need to do this. - -.. _case_sort: - -Controlling Case When Sorting ------------------------------ - -For non-numbers, by default :mod:`natsort` used ordinal sorting (i.e. -it sorts by the character's value in the ASCII table). For example: - -.. code-block:: pycon - - >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana'] - >>> natsorted(a) - ['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn'] - -There are times when you wish to ignore the case when sorting, -you can easily do this with the ``ns.IGNORECASE`` option: - -.. code-block:: pycon - - >>> natsorted(a, alg=ns.IGNORECASE) - ['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn'] - -Note that's since Python's sorting is stable, the order of equivalent -elements after lowering the case is the same order they appear in the -original list. - -Upper-case letters appear first in the ASCII table, but many natural -sorting methods place lower-case first. To do this, use -``ns.LOWERCASEFIRST``: - -.. code-block:: pycon - - >>> natsorted(a, alg=ns.LOWERCASEFIRST) - ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn'] - -It may be undesirable to have the upper-case letters grouped together -and the lower-case letters grouped together; most would expect all -"a"s to bet together regardless of case, and all "b"s, and so on. To -achieve this, use ``ns.GROUPLETTERS``: - -.. code-block:: pycon - - >>> natsorted(a, alg=ns.GROUPLETTERS) - ['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn'] - -You might combine this with ``ns.LOWERCASEFIRST`` to get what most -would expect to be "natural" sorting: - -.. code-block:: pycon - - >>> natsorted(a, alg=ns.G | ns.LF) - ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn'] - -Customizing Float Definition ----------------------------- - -You can make :func:`~natsorted` search for any float that would be -a valid Python float literal, such as 5, 0.4, -4.78, +4.2E-34, etc. -using the ``ns.FLOAT`` key. You can disable the exponential component -of the number with ``ns.NOEXP``. - -.. code-block:: pycon - - >>> a = ['a50', 'a51.', 'a+50.4', 'a5.034e1', 'a+50.300'] - >>> natsorted(a, alg=ns.FLOAT) - ['a50', 'a5.034e1', 'a51.', 'a+50.300', 'a+50.4'] - >>> natsorted(a, alg=ns.FLOAT | ns.SIGNED) - ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.'] - >>> natsorted(a, alg=ns.FLOAT | ns.SIGNED | ns.NOEXP) - ['a5.034e1', 'a50', 'a+50.300', 'a+50.4', 'a51.'] - -For convenience, the ``ns.REAL`` option is provided which is a shortcut -for ``ns.FLOAT | ns.SIGNED`` and can be used to sort on real numbers. -This can be easily accessed with the :func:`~realsorted` convenience -function. Please note that the behavior of the :func:`~realsorted` function -was the default behavior of :func:`~natsorted` for :mod:`natsort` -version < 4.0.0: - -.. code-block:: pycon - - >>> natsorted(a, alg=ns.REAL) - ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.'] - >>> from natsort import realsorted - >>> realsorted(a) - ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.'] - -.. _custom_sort: - -Using a Custom Sorting Key --------------------------- - -Like the built-in ``sorted`` function, ``natsorted`` can accept a custom -sort key so that: - -.. code-block:: pycon - - >>> from operator import attrgetter, itemgetter - >>> a = [['a', 'num4'], ['b', 'num8'], ['c', 'num2']] - >>> natsorted(a, key=itemgetter(1)) - [['c', 'num2'], ['a', 'num4'], ['b', 'num8']] - >>> class Foo: - ... def __init__(self, bar): - ... self.bar = bar - ... def __repr__(self): - ... return "Foo('{}')".format(self.bar) - >>> b = [Foo('num3'), Foo('num5'), Foo('num2')] - >>> natsorted(b, key=attrgetter('bar')) - [Foo('num2'), Foo('num3'), Foo('num5')] - -.. _unit_sorting: - -Accounting for Units When Sorting -+++++++++++++++++++++++++++++++++ - -:mod:`natsort` does not come with any pre-built mechanism to sort units, -but you can write your own `key` to do this. Below, I will demonstrate sorting -imperial lengths (e.g. feet an inches), but of course you can extend this to any -set of units you need. This example is based on code -`from this issue `_, -and uses the function :func:`natsort.numeric_regex_chooser` to build a regular -expression that will parse numbers in the same manner as :mod:`natsort` itself. - -.. code-block:: pycon - - >>> import re - >>> import natsort - >>> - >>> # Define how each unit will be transformed - >>> conversion_mapping = { - ... "in": 1, - ... "inch": 1, - ... "inches": 1, - ... "ft": 12, - ... "feet": 12, - ... "foot": 12, - ... } - >>> - >>> # This regular expression searches for numbers and units - >>> all_units = "|".join(conversion_mapping.keys()) - >>> float_re = natsort.numeric_regex_chooser(natsort.FLOAT | natsort.SIGNED) - >>> unit_finder = re.compile(r"({})\s*({})".format(float_re, all_units), re.IGNORECASE) - >>> - >>> def unit_replacer(matchobj): - ... """ - ... Given a regex match object, return a replacement string where units are modified - ... """ - ... number = matchobj.group(1) - ... unit = matchobj.group(2) - ... new_number = float(number) * conversion_mapping[unit] - ... return "{} in".format(new_number) - ... - >>> # Demo time! - >>> data = ['1 ft', '5 in', '10 ft', '2 in'] - >>> [unit_finder.sub(unit_replacer, x) for x in data] - ['12.0 in', '5.0 in', '120.0 in', '2.0 in'] - >>> - >>> natsort.natsorted(data, key=lambda x: unit_finder.sub(unit_replacer, x)) - ['2 in', '5 in', '1 ft', '10 ft'] - -Generating a Natsort Key ------------------------- - -If you need to sort a list in-place, you cannot use :func:`~natsorted`; you -need to pass a key to the :meth:`list.sort` method. The function -:func:`~natsort_keygen` is a convenient way to generate these keys for you: - -.. code-block:: pycon - - >>> from natsort import natsort_keygen - >>> a = ['a50', 'a51.', 'a50.4', 'a5.034e1', 'a50.300'] - >>> natsort_key = natsort_keygen(alg=ns.FLOAT) - >>> a.sort(key=natsort_key) - >>> a - ['a50', 'a50.300', 'a5.034e1', 'a50.4', 'a51.'] - -:func:`~natsort_keygen` has the same API as :func:`~natsorted` (minus the -`reverse` option). - -Sorting Multiple Lists According to a Single List -------------------------------------------------- - -Sometimes you have multiple lists, and you want to sort one of those -lists and reorder the other lists according to how the first was sorted. -To achieve this you could use the :func:`~index_natsorted` in combination -with the convenience function -:func:`~order_by_index`: - -.. code-block:: pycon - - >>> from natsort import index_natsorted, order_by_index - >>> a = ['a2', 'a9', 'a1', 'a4', 'a10'] - >>> b = [4, 5, 6, 7, 8] - >>> c = ['hi', 'lo', 'ah', 'do', 'up'] - >>> index = index_natsorted(a) - >>> order_by_index(a, index) - ['a1', 'a2', 'a4', 'a9', 'a10'] - >>> order_by_index(b, index) - [6, 4, 7, 5, 8] - >>> order_by_index(c, index) - ['ah', 'hi', 'do', 'lo', 'up'] - -Returning Results in Reverse Order ----------------------------------- - -Just like the :func:`sorted` built-in function, you can supply the -``reverse`` option to return the results in reverse order: - -.. code-block:: pycon - - >>> a = ['a2', 'a9', 'a1', 'a4', 'a10'] - >>> natsorted(a, reverse=True) - ['a10', 'a9', 'a4', 'a2', 'a1'] - -Sorting Bytes -------------- - -Python is rather strict about comparing strings and bytes, and this -can make it difficult to deal with collections of both. Because of the -challenge of guessing which encoding should be used to decode a bytes -array to a string, :mod:`natsort` does *not* try to guess and automatically -convert for you; in fact, the official stance of :mod:`natsort` is to -not support sorting bytes. Instead, some decoding convenience functions -have been provided to you (see :ref:`bytes_help`) that allow you to -provide a codec for decoding bytes through the ``key`` argument that -will allow :mod:`natsort` to convert byte arrays to strings for sorting; -these functions know not to raise an error if the input is not a byte -array, so you can use the key on any arbitrary collection of data. - -.. code-block:: pycon - - >>> from natsort import as_ascii - >>> a = [b'a', 14.0, 'b'] - >>> # natsorted(a) would raise a TypeError (bytes() < str()) - >>> natsorted(a, key=as_ascii) == [14.0, b'a', 'b'] - True - -Additionally, regular expressions cannot be run on byte arrays, making it -so that :mod:`natsort` cannot parse them for numbers. As a result, if you -run :mod:`natsort` on a list of bytes, you will get results that are like -Python's default sorting behavior. Of course, you can use the decoding -functions to solve this: - -.. code-block:: pycon - - >>> from natsort import as_utf8 - >>> a = [b'a56', b'a5', b'a6', b'a40'] - >>> natsorted(a) # doctest: +SKIP - [b'a40', b'a5', b'a56', b'a6'] - >>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56'] - True - -If you need a codec different from ASCII or UTF-8, you can use -:func:`decoder` to generate a custom key: - -.. code-block:: pycon - - >>> from natsort import decoder - >>> a = [b'a56', b'a5', b'a6', b'a40'] - >>> natsorted(a, key=decoder('latin1')) == [b'a5', b'a6', b'a40', b'a56'] - True - -Sorting a Pandas DataFrame --------------------------- - -Starting from Pandas version 1.1.0, the -`sorting methods accept a "key" argument `_, -so you can simply pass :func:`natsort_keygen` to the sorting methods and sort: - -.. code-block:: python - - import pandas as pd - from natsort import natsort_keygen - s = pd.Series(['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']) - s.sort_values(key=natsort_keygen()) - # 1 1 ft 5 in - # 0 2 ft 7 in - # 3 2 ft 11 in - # 4 7 ft 6 in - # 2 10 ft 2 in - # dtype: object - -Similarly, if you need to sort the index there is -`sort_index `_ -of a DataFrame. - -If you are on an older version of Pandas, check out please check out -`this answer on StackOverflow `_ -for ways to do this without the ``key`` argument to ``sort_values``. +This page has been moved to the +`natsort wiki `_. diff --git a/docs/howitworks.rst b/docs/howitworks.rst index d1c4559..4617a1b 100644 --- a/docs/howitworks.rst +++ b/docs/howitworks.rst @@ -6,1172 +6,11 @@ How Does Natsort Work? ====================== -.. contents:: - :local: - -:mod:`natsort` works by breaking strings into smaller sub-components (numbers -or everything else), and returning these components in a tuple. Sorting -tuples in Python is well-defined, and this fact is used to sort the input -strings properly. But how does one break a string into sub-components? -And what does one do to those components once they are split? Below I -will explain the algorithm that was chosen for the :mod:`natsort` module, -and some of the thinking that went into those design decisions. I will -also mention some of the stumbling blocks I ran into because -`getting sorting right is surprisingly hard`_. - -If you are impatient, you can skip to :ref:`tldr1` for the algorithm -in the simplest case, and :ref:`tldr2` -to see what extra code is needed to handle special cases. - -First, How Does Natural Sorting Work At a High Level? ------------------------------------------------------ - -If I want to compare '2 ft 7 in' to '2 ft 11 in', I might do the following - -.. code-block:: pycon - - >>> '2 ft 7 in' < '2 ft 11 in' - False - -We as humans know that the above should be true, but why does Python think it -is false? Here is how it is performing the comparison: - -:: - - '2' <=> '2' ==> equal, so keep going - ' ' <=> ' ' ==> equal, so keep going - 'f' <=> 'f' ==> equal, so keep going - 't' <=> 't' ==> equal, so keep going - ' ' <=> ' ' ==> equal, so keep going - '7' <=> '1' ==> different, use result of '7' < '1' - -'7' evaluates as greater than '1' so the statement is false. When sorting, if -a value is less than another it is placed first, so in our above example -'2 ft 11 in' would end up before '2 ft 7 in', which is not correct. What to do? - -The best way to handle this is to break the string into sub-components -of numbers and non-numbers, and then convert the numeric parts into -:func:`float` or :func:`int` types. This will force Python to -actually understand the context of what it is sorting and then "do the -right thing." Luckily, it handles sorting lists of strings right -out-of-the-box, so the only hard part is actually making this string-to-list -transformation and then Python will handle the rest. - -:: - - '2 ft 7 in' ==> (2, ' ft ', 7, ' in') - '2 ft 11 in' ==> (2, ' ft ', 11, ' in') - -When Python compares the two, it roughly follows the below logic: - -:: - - 2 <=> 2 ==> equal, so keep going - ' ft ' <=> ' ft ' ==> a string is a special type of sequence - evaluate each character individually - || - --> - ' ' <=> ' ' ==> equal, so keep going - 'f' <=> 'f' ==> equal, so keep going - 't' <=> 't' ==> equal, so keep going - ' ' <=> ' ' ==> equal, so keep going - <== Back to parent sequence - 7 <=> 11 ==> different, use the result of 7 < 11 - -Clearly, seven is less than eleven, so our comparison is as we expect, and we -would get the sorting order we wanted. - -At its heart, :mod:`natsort` is simply a tool to break strings into tuples, -turning numbers in strings (i.e. ``'79'``) into *ints* and *floats* as it does this. - -Natsort's Approach ------------------- - -.. contents:: - :local: - -Decomposing Strings Into Sub-Components -+++++++++++++++++++++++++++++++++++++++ - -The first major hurtle to overcome is to decompose the string into -sub-components. Remarkably, this turns out to be the easy part, owing mostly -to Python's easy access to regular expressions. Breaking an arbitrary string -based on a pattern is pretty straightforward. - -.. code-block:: pycon - - >>> import re - >>> re.split(r'(\d+)', '2 ft 11 in') - ['', '2', ' ft ', '11', ' in'] - -Clear (assuming you can read regular expressions) and concise. - -The reason I began developing :mod:`natsort` in the first place was because I -needed to handle the natural sorting of strings containing *real numbers*, not -just unsigned integers as the above example contains. By real numbers, I mean -those like ``-45.4920E-23``. :mod:`natsort` can handle just about any number -definition; to that end, here are all the regular expressions used in -:mod:`natsort`: - -.. code-block:: pycon - - >>> unsigned_int = r'([0-9]+)' - >>> signed_int = r'([-+]?[0-9]+)' - >>> unsigned_float = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)' - >>> signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)' - >>> unsigned_float_no_exponent = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+))' - >>> signed_float_no_exponent = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))' - -Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float -definition because you wouldn't want (for example) ``"banana"`` to be converted -into ``['ba', 'nan', 'a']``, Let's see an example: - -.. code-block:: pycon - - >>> re.split(signed_float, 'The mass of 3 electrons is 2.732815068E-30 kg') - ['The mass of ', '3', ' electrons is ', '2.732815068E-30', ' kg'] - -.. note:: - - It is a bit of a lie to say the above are the complete regular expressions. In the - actual code there is also handling for non-ASCII unicode characters (such as ⑦), - but I will ignore that aspect of :mod:`natsort` in this discussion. - -Now, when the user wants to change the definition of a number, it is as easy as -changing the pattern supplied to the regular expression engine. - -Choosing the right default is hard, though (well, in this case it shouldn't -have been but I was rather thick-headed). In retrospect, it should have been -obvious that since essentially all the code examples I had/have seen for -natural sorting were for *unsigned integers*, I should have made the default -definition of a number an *unsigned integer*. But, in the brash days of my -youth I assumed that since my use case was real numbers, everyone else would -be happier sorting by real numbers; so, I made the default definition of a -number a *signed float with exponent*. `This astonished`_ `a lot`_ `of people`_ -(`and some people aren't very nice when they are astonished`_). -Starting with :mod:`natsort` version 4.0.0 the default number definition was -changed to an *unsigned integer* which satisfies the "least astonishment" -principle, and I have not heard a complaint since. - -Coercing Strings Containing Numbers Into Numbers -++++++++++++++++++++++++++++++++++++++++++++++++ - -There has been some debate on Stack Overflow as to what method is best to -coerce a string to a number if it can be coerced, and leaving it alone otherwise -(see `this one for coercion`_ and `this one for checking`_ for some high traffic questions), -but it mostly boils down to two different solutions, shown here: - -.. code-block:: pycon - - >>> def coerce_try_except(x): - ... try: - ... return int(x) - ... except ValueError: - ... return x - ... - >>> def coerce_regex(x): - ... # Note that precompiling the regex is more performant, - ... # but I do not show that here for clarity's sake. - ... return int(x) if re.match(r'[-+]?\d+$', x) else x - ... - -Here are some timing results run on my machine: - -.. code-block:: pycon - - In [0]: numbers = list(map(str, range(100))) # A list of numbers as strings - - In [1]: not_numbers = ['banana' + x for x in numbers] - - In [2]: %timeit [coerce_try_except(x) for x in numbers] - 10000 loops, best of 3: 51.1 µs per loop - - In [3]: %timeit [coerce_try_except(x) for x in not_numbers] - 1000 loops, best of 3: 289 µs per loop - - In [4]: %timeit [coerce_regex(x) for x in not_numbers] - 10000 loops, best of 3: 67.6 µs per loop - - In [5]: %timeit [coerce_regex(x) for x in numbers] - 10000 loops, best of 3: 123 µs per loop - -What can we learn from this? The ``try: except`` method (arguably the most -"pythonic" of the solutions) is best for numeric input, but performs over 5X -slower for non-numeric input. Conversely, the regular expression method, though -slower than ``try: except`` for both input types, is more efficient for -non-numeric input than for input that can be converted to an ``int``. Further, -even though the regular expression method is slower for both input types, it is -always at least twice as fast as the worst case for the ``try: except``. - -Why do I care? Shouldn't I just pick a method and not worry about it? Probably. -However, I am very conscious about the performance of :mod:`natsort`, and want -it to be a true drop-in replacement for :func:`sorted` without having to incur -a performance penalty. For the purposes of :mod:`natsort`, there is no clear -winner between the two algorithms - the data being passed to this function will -likely be a mix of numeric and non-numeric string content. Do I use the -``try: except`` method and hope the speed gains on numbers will offset the -non-number performance, or do I use regular expressions and take the more -stable performance? - -It turns out that within the context of :mod:`natsort`, some assumptions can be -made that make a hybrid approach attractive. Because all strings are pre-split -into numeric and non-numeric content *before* being passed to this coercion -function, the assumption can be made that *if a string begins with a digit or a -sign, it can be coerced into a number*. - -.. code-block:: pycon - - >>> def coerce_to_int(x): - ... if x[0] in '0123456789+-': - ... try: - ... return int(x) - ... except ValueError: - ... return x - ... else: - ... return x - ... - -So how does this perform compared to the standard coercion methods? - -.. code-block:: pycon - - In [6]: %timeit [coerce_to_int(x) for x in numbers] - 10000 loops, best of 3: 71.6 µs per loop - - In [7]: %timeit [coerce_to_int(x) for x in not_numbers] - 10000 loops, best of 3: 26.4 µs per loop - -The hybrid method eliminates most of the time wasted on numbers checking -that it is in fact a number before passing to :func:`int`, and eliminates -the time wasted in the exception stack for input that is not a number. - -That's as fast as we can get, right? In pure Python, probably. At least, it's -close. But because I am crazy and a glutton for punishment, I decided to see -if I could get any faster writing a C extension. It's called -`fastnumbers`_ and contains a C implementation of the above coercion functions -called :func:`fast_int`. How does it fair? Pretty well. - -.. code-block:: pycon - - In [8]: %timeit [fast_int(x) for x in numbers] - 10000 loops, best of 3: 30.9 µs per loop - - In [9]: %timeit [fast_int(x) for x in not_numbers] - 10000 loops, best of 3: 30 µs per loop - -During development of :mod:`natsort`, I wanted to ensure that using it did not -get in the way of a user's program by introducing a performance penalty to -their code. To that end, I do not feel like my adventures down the rabbit hole -of optimization of coercion functions was a waste; I can confidently look users -in the eye and say I considered every option in ensuring :mod:`natsort` is as -efficient as possible. This is why if `fastnumbers`_ is installed it will be -used for this step, and otherwise the hybrid method will be used. - -.. note:: - - Modifying the hybrid coercion function for floats is straightforward. - - .. code-block:: pycon - - >>> def coerce_to_float(x): - ... if x[0] in '.0123456789+-' or x.lower().lstrip()[:3] in ('nan', 'inf'): - ... try: - ... return float(x) - ... except ValueError: - ... return x - ... else: - ... return x - ... - -.. _tldr1: - -TL;DR 1 - The Simple "No Special Cases" Algorithm -+++++++++++++++++++++++++++++++++++++++++++++++++ - -At this point, our :mod:`natsort` algorithm is essentially the following: - -.. code-block:: pycon - - >>> import re - >>> def natsort_key(x, as_float=False, signed=False): - ... if as_float: - ... regex = signed_float if signed else unsigned_float - ... else: - ... regex = signed_int if signed else unsigned_int - ... split_input = re.split(regex, x) - ... split_input = filter(None, split_input) # removes null strings - ... coerce = coerce_to_float if as_float else coerce_to_int - ... return tuple(coerce(s) for s in split_input) - ... - -I have written the above for clarity and not performance. -This pretty much matches `most natural sort solutions for python on Stack Overflow`_ -(except the above includes customization of the definition of a number). +This page has been moved to the +`natsort wiki `_. Special Cases Everywhere! ------------------------- -.. contents:: - :local: - -.. image:: special_cases_everywhere.jpg - -If what I described in :ref:`TL;DR 1 ` were -all that :mod:`natsort` needed to -do then there probably wouldn't be much need for a third-party module, right? -Probably. But it turns out that in real-world data there are a lot of -special cases that need to be handled, and in true `80%/20%`_ fashion, the -majority of the code in :mod:`natsort` is devoted to handling special cases -like those described below. - -Sorting Filesystem Paths -++++++++++++++++++++++++ - -`The first major special case I encountered was sorting filesystem paths`_ -(if you go to the link, you will see I didn't handle it well for a year... -this was before I fully realized how much functionality I could really add -to :mod:`natsort`). Let's apply the :func:`natsort_key` from above to some -filesystem paths that you might see being auto-generated from your operating -system: - -.. code-block:: pycon - - >>> paths = ['Folder (10)/file.tar.gz', - ... 'Folder/file.tar.gz', - ... 'Folder (1)/file (1).tar.gz', - ... 'Folder (1)/file.tar.gz'] - >>> sorted(paths, key=natsort_key) - ['Folder (1)/file (1).tar.gz', 'Folder (1)/file.tar.gz', 'Folder (10)/file.tar.gz', 'Folder/file.tar.gz'] - -Well that's not right! What is ``'Folder/file.tar.gz'`` doing at the end? -It has to do with the numerical ASCII code assigned to the space and -``/`` characters in the `ASCII table`_. According to the `ASCII table`_, the -space character (number 32) comes before the ``/`` character (number 47). If -we remove the common prefix in all of the above strings (``'Folder'``), we -can see why this happens: - -.. code-block:: pycon - - >>> ' (1)/file.tar.gz' < '/file.tar.gz' - True - >>> ' ' < '/' - True - -This isn't very convenient... how do we solve it? We can split the path -across the path separators and then sort. A convenient way do to this is -with the :data:`Path.parts ` property from -:mod:`pathlib`: - -.. code-block:: pycon - - >>> import pathlib - >>> sorted(paths, key=lambda x: tuple(natsort_key(s) for s in pathlib.Path(x).parts)) - ['Folder/file.tar.gz', 'Folder (1)/file (1).tar.gz', 'Folder (1)/file.tar.gz', 'Folder (10)/file.tar.gz'] - -Almost! It seems like there is some funny business going on in the final -filename component as well. We can solve that nicely and quickly with -:data:`Path.suffixes ` and :data:`Path.stem -`. - -.. code-block:: pycon - - >>> def decompose_path_into_components(x): - ... path_split = list(pathlib.Path(x).parts) - ... # Remove the final filename component from the path. - ... final_component = pathlib.Path(path_split.pop()) - ... # Split off all the extensions. - ... suffixes = final_component.suffixes - ... stem = final_component.name.replace(''.join(suffixes), '') - ... # Remove the '.' prefix of each extension, and make that - ... # final component a list of the stem and each suffix. - ... final_component = [stem] + [x[1:] for x in suffixes] - ... # Replace the split final filename component. - ... path_split.extend(final_component) - ... return path_split - ... - >>> def natsort_key_with_path_support(x): - ... return tuple(natsort_key(s) for s in decompose_path_into_components(x)) - ... - >>> sorted(paths, key=natsort_key_with_path_support) - ['Folder/file.tar.gz', 'Folder (1)/file.tar.gz', 'Folder (1)/file (1).tar.gz', 'Folder (10)/file.tar.gz'] - -This works because in addition to breaking the input by path separators, -the final filename component is separated from its extensions as well. -*Then*, each of these separated components is sent to the -:mod:`natsort` algorithm, so the result is a tuple of tuples. Once that -is done, we can see how comparisons can be done in the expected manner. - -.. code-block:: pycon - - >>> a = natsort_key_with_path_support('Folder (1)/file (1).tar.gz') - >>> a - (('Folder (', 1, ')'), ('file (', 1, ')'), ('tar',), ('gz',)) - >>> - >>> b = natsort_key_with_path_support('Folder/file.tar.gz') - >>> b - (('Folder',), ('file',), ('tar',), ('gz',)) - >>> - >>> a > b - True - -.. note:: - - The actual :meth:`decompose_path_into_components`-equivalent function in - :mod:`natsort` actually has a few more heuristics than shown here so that - it is not over-zealous in what it defines as a path suffix, but this has - been omitted in this how-to for clarity. - -Comparing Different Types -+++++++++++++++++++++++++ - -`The second major special case I encountered was sorting of different types`_. -On Python 2 (i.e. legacy Python), this mostly didnt't matter *too* -much since it uses an arbitrary heuristic to allow traditionally un-comparable -types to be compared (such as comparing ``'a'`` to ``1``). However, on Python 3 -(i.e. Python) it simply won't let you perform such nonsense, raising a -:exc:`TypeError` instead. - -You can imagine that a module that breaks strings into tuples of numbers and -strings is walking a dangerous line if it does not have special handling for -comparing numbers and strings. My imagination was not so great at first. -Let's take a look at all the ways this can fail with real-world data. - -.. code-block:: pycon - - >>> def natsort_key_with_poor_real_number_support(x): - ... split_input = re.split(signed_float, x) - ... split_input = filter(None, split_input) # removes null strings - ... return tuple(coerce_to_float(s) for s in split_input) - >>> - >>> sorted([5, '4'], key=natsort_key_with_poor_real_number_support) - Traceback (most recent call last): - ... - TypeError: ... - >>> - >>> sorted(['12 apples', 'apples'], key=natsort_key_with_poor_real_number_support) - Traceback (most recent call last): - ... - TypeError: ... - >>> - >>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_poor_real_number_support) - Traceback (most recent call last): - ... - TypeError: ... - -Let's break these down. - -#. The integer ``5`` is sent to ``re.split`` which expects only strings - or bytes, which is a no-no. -#. ``natsort_key_with_poor_real_number_support('12 apples') < natsort_key_with_poor_real_number_support('apples')`` - is the same as ``(12.0, ' apples') < ('apples',)``, and thus a number gets - compared to a string [#f1]_ which also is a no-no. -#. This one scores big on the astonishment scale, especially if one - accidentally uses signed integers or real numbers when they mean - to use unsigned integers. - ``natsort_key_with_poor_real_number_support('version5.3.0') < natsort_key_with_poor_real_number_support('version5.3rc1')`` - is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``, - so in the third element a number gets compared to a string, once again - the same old no-no. (The same would happen with ``'version5-3'`` and - ``'version5-a'``, which would become ``('version', 5, -3)`` and - ``('version', 5, '-a')``). - -As you might expect, the solution to the first issue is to wrap the -``re.split`` call in a ``try: except:`` block and handle the number specially -if a :exc:`TypeError` is raised. The second and third cases *could* be handled -in a "special case" manner, meaning only respond and do something different -if these problems are detected. But a less error-prone method is to ensure -that the data is correct-by-construction, and this can be done by ensuring -that the returned tuples *always* start with a string, and then alternate -in a string-number-string-number-string pattern; this can be achieved by -adding an empty string wherever the pattern is not followed [#f2]_. This ends -up working out pretty nicely because empty strings are always "less" than -any non-empty string, and we typically want numbers to come before strings. - -Let's take a look at how this works out. - -.. code-block:: pycon - - >>> from natsort.utils import sep_inserter - >>> list(sep_inserter(iter(['apples']), '')) - ['apples'] - >>> - >>> list(sep_inserter(iter([12, ' apples']), '')) - ['', 12, ' apples'] - >>> - >>> list(sep_inserter(iter(['version', 5, -3]), '')) - ['version', 5, '', -3] - >>> - >>> from natsort import natsort_keygen, ns - >>> natsort_key_with_good_real_number_support = natsort_keygen(alg=ns.REAL) - >>> - >>> sorted([5, '4'], key=natsort_key_with_good_real_number_support) - ['4', 5] - >>> - >>> sorted(['12 apples', 'apples'], key=natsort_key_with_good_real_number_support) - ['12 apples', 'apples'] - >>> - >>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_good_real_number_support) - ['version5.3.0', 'version5.3rc1'] - -How the "good" version works will be given in -`TL;DR 2 - Handling Crappy, Real-World Input`_. - -Handling NaN -++++++++++++ - -`A rather unexpected special case I encountered was sorting collections containing NaN`_. -Let's see what happens when you try to sort a plain old list of numbers when there -is a **NaN** floating around in there. - -.. code-block:: pycon - - >>> danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4] - >>> sorted(danger) - [7, nan, -14, 4, 19, 22.7, 59.123] - -Clearly that isn't correct, and for once it isn't my fault! -`It's hard to compare floating point numbers`_. By definition, **NaN** is unorderable -to any other number, and is never equal to any other number, including itself. - -.. code-block:: pycon - - >>> nan = float('nan') - >>> 5 > nan - False - >>> 5 < nan - False - >>> 5 == nan - False - >>> 5 != nan - True - >>> nan == nan - False - >>> nan != nan - True - -The implication of all this for us is that if there is an **NaN** in the -data-set we are trying to sort, the data-set will end up being sorted in -two separate yet individually sorted sequences - the one *before* the **NaN**, -and the one *after*. This is because the ``<`` operation that is used -to sort always returns :const:`False` with **NaN**. - -Because :mod:`natsort` aims to sort sequences in a way that does not surprise -the user, keeping this behavior is not acceptable (I don't require my users -to know how **NaN** will behave in a sorting algorithm). The simplest way to -satisfy the "least astonishment" principle is to substitute **NaN** with -some other value. But what value is *least* astonishing? I chose to replace -**NaN** with :math:`-\infty` so that these poorly behaved elements always -end up at the front where the users will most likely be alerted to their -presence. - -.. code-block:: pycon - - >>> def fix_nan(x): - ... if x != x: # only true for NaN - ... return float('-inf') - ... else: - ... return x - ... - -Let's check out :ref:`TL;DR 2 ` to see how this can be -incorporated into the simple key function from :ref:`TL;DR 1 `. - -.. _tldr2: - -TL;DR 2 - Handling Crappy, Real-World Input -+++++++++++++++++++++++++++++++++++++++++++ - -Let's see how our elegant key function from :ref:`TL;DR 1 ` has -become bastardized in order to support handling mixed real-world data -and user customizations. - -.. code-block:: pycon - - >>> def natsort_key(x, as_float=False, signed=False, as_path=False): - ... if as_float: - ... regex = signed_float if signed else unsigned_float - ... else: - ... regex = signed_int if signed else unsigned_int - ... try: - ... if as_path: - ... x = decompose_path_into_components(x) # Decomposes into list of strings - ... # If this raises a TypeError, input is not a string. - ... split_input = re.split(regex, x) - ... except TypeError: - ... try: - ... # Does this need to be applied recursively (list-of-list)? - ... return tuple(map(natsort_key, x)) - ... except TypeError: - ... # Must be a number - ... ret = ('', fix_nan(x)) # Maintain string-number-string pattern - ... return (ret,) if as_path else ret # as_path returns tuple-of-tuples - ... else: - ... split_input = filter(None, split_input) # removes null strings - ... # Note that the coerce_to_int/coerce_to_float functions - ... # are also modified to use the fix_nan function. - ... if as_float: - ... coerced_input = (coerce_to_float(s) for s in split_input) - ... else: - ... coerced_input = (coerce_to_int(s) for s in split_input) - ... return tuple(sep_inserter(coerced_input, '')) - ... - -And this doesn't even show handling :class:`bytes` type! Notice that we have -to do non-obvious things like modify the return form of numbers when ``as_path`` -is given, just to avoid comparing strings and numbers for the case in which a -user provides input like ``['/home/me', 42]``. - -Let's take it out for a spin! - -.. code-block:: pycon - - >>> danger = [7, float('nan'), 22.7, '19', '-14', '59.123', 4] - >>> sorted(danger, key=lambda x: natsort_key(x, as_float=True, signed=True)) - [nan, '-14', 4, 7, '19', 22.7, '59.123'] - >>> - >>> paths = ['Folder (1)/file.tar.gz', - ... 'Folder/file.tar.gz', - ... 123456] - >>> sorted(paths, key=lambda x: natsort_key(x, as_path=True)) - [123456, 'Folder/file.tar.gz', 'Folder (1)/file.tar.gz'] - -Here Be Dragons: Adding Locale Support --------------------------------------- - -.. contents:: - :local: - -Probably the most challenging special case I had to handle was getting -:mod:`natsort` to handle sorting the non-numerical parts of input -correctly, and also allowing it to sort the numerical bits in different -locales. This was in no way what I originally set out to do with this -library, so I was -`caught a bit off guard when the request was initially made`_. -I discovered the :mod:`locale` library, and assumed that if it's part of -Python's StdLib there can't be too many dragons, right? - -.. admonition:: INCOMPLETE LIST OF DRAGONS - - - https://github.com/SethMMorton/natsort/issues/21 - - https://github.com/SethMMorton/natsort/issues/22 - - https://github.com/SethMMorton/natsort/issues/23 - - https://github.com/SethMMorton/natsort/issues/36 - - https://github.com/SethMMorton/natsort/issues/44 - - https://bugs.python.org/issue2481 - - https://bugs.python.org/issue23195 - - https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help - - https://stackoverflow.com/questions/22203550/sort-dictionary-by-key-using-locale-collation - - https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm - - https://stackoverflow.com/questions/36431810/sort-numeric-lines-with-thousand-separators - - https://stackoverflow.com/questions/45734562/how-can-i-get-a-reasonable-string-sorting-with-python - -These can be summed up as follows: - -#. :mod:`locale` is a thin wrapper over your operating system's *locale* - library, so if *that* is broken (like it is on BSD and OSX) then - :mod:`locale` is broken in Python. -#. Because of a bug in legacy Python (i.e. Python 2), there was no uniform - way to use the :mod:`locale` sorting functionality between legacy Python - and Python (luckily this is no longer an issue now that Python 2 is EOL). -#. People have differing opinions of how capitalization should affect word - order. -#. There is no built-in way to handle locale-dependent thousands separators - and decimal points *robustly*. -#. Proper handling of Unicode is complicated. -#. Proper handling of :mod:`locale` is complicated. - -Easily over half of the code in :mod:`natsort` is in some way dealing with some -aspect of :mod:`locale` or basic case handling. It would have been impossible -to get right without a `really good`_ `testing strategy`_. - -Don't expect any more TL;DR's... if you want to see how all this is fully -incorporated into the :mod:`natsort` algorithm then please take a look -`at the code`_. However, I will hint at how specific steps are taken in -each section. - -Let's see how we can handle some of the dragons, one-by-one. - -Basic Case Control Support -++++++++++++++++++++++++++ - -Without even thinking about the mess that is adding :mod:`locale` support, -:mod:`natsort` can introduce support for controlling how case is interpreted. - -First, let's take a look at how it is sorted by default (due to -where characters lie on the `ASCII table`_). - -.. code-block:: pycon - - >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana'] - >>> sorted(a) - ['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn'] - -All uppercase letters come before lowercase letters in the `ASCII table`_, -so all capitalized words appear first. Not everyone agrees that this -is the correct order. Some believe that the capitalized words should -be last (``['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']``). -Some believe that both the lowercase and uppercase versions -should appear together -(``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``). -Some believe that both should be true ☹. Some people don't care at all [#f3]_. - -Solving the first case (I call it *LOWERCASEFIRST*) is actually pretty -easy... just call the :meth:`str.swapcase` method on the input. - -.. code-block:: pycon - - >>> sorted(a, key=lambda x: x.swapcase()) - ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn'] - -The last (i call it *IGNORECASE*) is pretty easy. -Simply call :meth:`str.casefold` on the input (it's like :meth:`std.lowercase` -but does a better job on non-latin character sets). - -.. code-block:: pycon - - >>> sorted(a, key=lambda x: x.casefold()) - ['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn'] - -The middle case (I call it *GROUPLETTERS*) is less straightforward. -The most efficient way to handle this is to duplicate each character -with its lowercase version and then the original character. - -.. code-block:: pycon - - >>> import itertools - >>> def groupletters(x): - ... return ''.join(itertools.chain.from_iterable((y.casefold(), y) for y in x)) - ... - >>> groupletters('Apple') - 'aAppppllee' - >>> groupletters('apple') - 'aappppllee' - >>> sorted(a, key=groupletters) - ['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn'] - -The effect of this is that both ``'Apple'`` and ``'apple'`` are -placed adjacent to each other because their transformations both begin -with ``'a'``, and then the second character can be used to order them -appropriately with respect to each other. - -There's a problem with this, though. Within the context of :mod:`natsort` -we are trying to correctly sort numbers and those should be left alone. - -.. code-block:: pycon - - >>> a = ['Apple5', 'apple', 'Apple4E10', 'Banana'] - >>> sorted(a, key=lambda x: natsort_key(x, as_float=True)) - ['Apple5', 'Apple4E10', 'Banana', 'apple'] - >>> sorted(a, key=lambda x: natsort_key(groupletters(x), as_float=True)) - ['Apple4E10', 'Apple5', 'apple', 'Banana'] - >>> groupletters('Apple4E10') - 'aAppppllee44eE1100' - -We messed up the numbers! Looks like :func:`groupletters` needs to be applied -*after* the strings are broken into their components. I'm not going to show -how this is done here, but basically it requires applying the function in -the ``else:`` block of :func:`coerce_to_int`/:func:`coerce_to_float`. - -.. code-block:: pycon - - >>> better_groupletters = natsort_keygen(alg=ns.GROUPLETTERS | ns.REAL) - >>> better_groupletters('Apple4E10') - ('aAppppllee', 40000000000.0) - >>> sorted(a, key=better_groupletters) - ['Apple5', 'Apple4E10', 'apple', 'Banana'] - -Of course, applying both *LOWERCASEFIRST* and *GROUPLETTERS* is just -a matter of turning on both functions. - -Basic Unicode Support -+++++++++++++++++++++ - -Unicode is hard and complicated. Here's an example. - -.. code-block:: pycon - - >>> b = [b'\x66', b'\x65', b'\xc3\xa9', b'\x65\xcc\x81', b'\x61', b'\x7a'] - >>> a = [x.decode('utf8') for x in b] - >>> a # doctest: +SKIP - ['f', 'e', 'é', 'é', 'a', 'z'] - >>> sorted(a) # doctest: +SKIP - ['a', 'e', 'é', 'f', 'z', 'é'] - -There are more than one way to represent the character 'é' in Unicode. -In fact, many characters have multiple representations. This is a challenge -because comparing the two representations would return ``False`` even though -they *look* the same. - -.. code-block:: pycon - - >>> a[2] == a[3] - False - -Alas, since characters are compared based on the numerical value of their -representation, sorting Unicode often gives unexpected results (like seeing -'é' come both *before* and *after* 'z'). - -The original approach that :mod:`natsort` took with respect to non-ASCII -Unicode characters was to say "just use -the :mod:`locale` or :mod:`PyICU` library" and then cross it's fingers -and hope those libraries take care of it. As you will find in the following -sections, that comes with its own baggage, and turned out to not always work -anyway (see https://stackoverflow.com/q/45734562/1399279). A more robust -approach is to handle the Unicode out-of-the-box without invoking a -heavy-handed library like :mod:`locale` or :mod:`PyICU`. -To do this, we must use *normalization*. - -To fully understand Unicode normalization, -`check out some official Unicode documentation`_. -Just kidding... that's too much text. The following StackOverflow answers do -a good job at explaining Unicode normalization in simple terms: -https://stackoverflow.com/a/7934397/1399279 and -https://stackoverflow.com/a/7931547/1399279. Put simply, normalization -ensures that Unicode characters with multiple representations are in -some canonical and consistent representation so that (for example) comparisons -of the characters can be performed in a sane way. The following discussion -assumes you at least read the StackOverflow answers. - -Looking back at our 'é' example, we can see that the two versions were -constructed with the byte strings ``b'\xc3\xa9'`` and ``b'\x65\xcc\x81'``. -The former representation is actually -`LATIN SMALL LETTER E WITH ACUTE `_ -and is a single character in the Unicode standard. This is known as the -*compressed form* and corresponds to the 'NFC' normalization scheme. -The latter representation is actually the letter 'e' followed by -`COMBINING ACUTE ACCENT `_ -and so is two characters in the Unicode standard. This is known as the -*decompressed form* and corresponds to the 'NFD' normalization scheme. -Since the first character in the decompressed form is actually the letter 'e', -when compared to other ASCII characters it fits where you might expect. -Unfortunately, all Unicode compressed form characters come after the -ASCII characters and so they always will be placed after 'z' when sorting. - -It seems that most Unicode data is stored and shared in the compressed form -which makes it challenging to sort. This can be solved by normalizing all -incoming Unicode data to the decompressed form ('NFD') and *then* sorting. - -.. code-block:: pycon - - >>> import unicodedata - >>> c = [unicodedata.normalize('NFD', x) for x in a] - >>> c # doctest: +SKIP - ['f', 'e', 'é', 'é', 'a', 'z'] - >>> sorted(c) # doctest: +SKIP - ['a', 'e', 'é', 'é', 'f', 'z'] - -Huzzah! Sane sorting without having to resort to :mod:`locale`! - -Using Locale to Compare Strings -+++++++++++++++++++++++++++++++ - -The :mod:`locale` module is actually pretty cool, and provides lowly -spare-time programmers like myself a way to handle the daunting task -of proper locale-dependent support of their libraries and utilities. -Having said that, it can be a bit of a bear to get right, -`although they do point out in the documentation that it will be painful to use`_. -Aside from the caveats spelled out in that link, it turns out that just -comparing strings with :mod:`locale` in a cross-platform and -cross-python-version manner is not as straightforward as one might hope. - -First, how to use :mod:`locale` to compare strings? It's actually -pretty straightforward. Simply run the input through the :mod:`locale` -transformation function :func:`locale.strxfrm`. - -.. code-block:: pycon - - >>> import locale, sys - >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') - 'en_US.UTF-8' - >>> a = ['a', 'b', 'ä'] - >>> sorted(a) - ['a', 'b', 'ä'] - >>> # The below fails on OSX, so don't run doctest on darwin. - >>> is_osx = sys.platform == 'darwin' - >>> sorted(a, key=locale.strxfrm) if not is_osx else ['a', 'ä', 'b'] - ['a', 'ä', 'b'] - >>> - >>> a = ['apple', 'Banana', 'banana', 'Apple'] - >>> sorted(a, key=locale.strxfrm) if not is_osx else ['apple', 'Apple', 'banana', 'Banana'] - ['apple', 'Apple', 'banana', 'Banana'] - -It turns out that locale-aware sorting groups numbers in the same -way as turning on *GROUPLETTERS* and *LOWERCASEFIRST*. -The trick is that you have to apply :func:`locale.strxfrm` only to non-numeric -characters; otherwise, numbers won't be parsed properly. Therefore, it must -be applied as part of the :func:`coerce_to_int`/:func:`coerce_to_float` -functions in a manner similar to :func:`groupletters`. - -Unicode Support With Local -++++++++++++++++++++++++++ - -Remember how in the `Basic Unicode Support`_ section I mentioned that we -use the "decompressed" Unicode normalization form (e.g. NFD) on all inputs -to ensure the order is as expected? - -If you have been following along so far, you probably expect that it is not -that easy. You would be correct. - -It turns out that some locales (but not all) expect the input to be in -"compressed form" (e.g. NFC) or the ordering is not as you might expect. -`Check out this issue for a real-world example`_. Here's a relevant -snippet of code - -.. code-block:: pycon - - In [1]: import locale, unicodedata - - In [2]: a = ['Aš', 'Cheb', 'Česko', 'Cibulov', 'Znojmo', 'Žilina'] - - In [3]: locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') - Out[3]: 'en_US.UTF-8' - - In [4]: sorted(a, key=locale.strxfrm) - Out[4]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo'] - - In [5]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x))) - Out[5]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo'] - - In [6]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x))) - Out[6]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo'] - - In [7]: locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8') - Out[7]: 'de_DE.UTF-8' - - In [8]: sorted(a, key=locale.strxfrm) - Out[8]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo'] - - In [9]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x))) - Out[9]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo'] - - In [10]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x))) - Out[10]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo'] - - In [11]: locale.setlocale(locale.LC_ALL, 'cs_CZ.UTF-8') - Out[11]: 'cs_CZ.UTF-8' - - In [12]: sorted(a, key=locale.strxfrm) - Out[12]: ['Aš', 'Cibulov', 'Česko', 'Cheb', 'Znojmo', 'Žilina'] - - In [13]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x))) - Out[13]: ['Aš', 'Česko', 'Cibulov', 'Cheb', 'Žilina', 'Znojmo'] - - In [14]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x))) - Out[14]: ['Aš', 'Cibulov', 'Česko', 'Cheb', 'Znojmo', 'Žilina'] - -Two out of three locales sort the same data in the same order no matter how the unicode -input was normalized, but Czech seems to care how the input is formatted! - -So, everthing mentioned in `Basic Unicode Support`_ is conditional on whether -or not the user wants to use the :mod:`locale` library or not. If not, then -"NFD" normalization is used. If they do, "NFC" normalization is used. - -Handling Broken Locale On OSX -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -But what if the underlying *locale* implementation that :mod:`locale` -relies upon is simply broken? It turns out that the *locale* library on -OSX (and other BSD systems) is broken (and for some reason has never been -fixed?), and so :mod:`locale` does not work as expected. - -How do I define doesn't work as expected? - -.. code-block:: pycon - - >>> a = ['apple', 'Banana', 'banana', 'Apple'] - >>> sorted(a) - ['Apple', 'Banana', 'apple', 'banana'] - >>> - >>> sorted(a, key=locale.strxfrm) if is_osx else sorted(a) - ['Apple', 'Banana', 'apple', 'banana'] - -IT'S SORTING AS IF :func:`locale.stfxfrm` WAS NEVER USED!! (and it's worse -once non-ASCII characters get thrown into the mix.) I'm really not -sure why this is considered OK for the OSX/BSD maintainers to not fix, -but it's more than frustrating for poor developers who have been dragged -into the *locale* game kicking and screaming. **. - -So, how to deal with this situation? There are two ways to do so. - -#. Detect if :mod:`locale` is sorting incorrectly (i.e. ``dumb``) by seeing - if ``'A'`` is sorted before ``'a'`` (incorrect) or not. - - .. code-block:: pycon - - >>> # This is genuinely the name of this function. - >>> # See natsort.compat.locale.py - >>> def dumb_sort(): - ... return locale.strxfrm('A') < locale.strxfrm('a') - ... - - If a ``dumb`` *locale* implementation is found, then automatically - turn on *LOWERCASEFIRST* and *GROUPLETTERS*. -#. Use an alternate library if installed. `ICU `_ - is a great and powerful library that has a pretty decent Python port - called (you guessed it) `PyICU `_. - If a user has this library installed on their computer, :mod:`natsort` - chooses to use that instead of :mod:`locale`. With a little bit of - planning, one can write a set of wrapper functions that call - the correct library under the hood such that the business logic never - has to know what library is being used (see `natsort.compat.locale.py`_). - -Let me tell you, this little complication really makes a challenge of testing -the code, since one must set up different environments on different operating -systems in order to test all possible code paths. Not to mention that -certain checks *will* fail for certain operating systems and environments -so one must be diligent in either writing the tests not to fail, or ignoring -those tests when on offending environments. - -Handling Locale-Aware Numbers -+++++++++++++++++++++++++++++ - -`Thousands separator support`_ is a problem that I knew would someday be -requested but had decided to push off until a rainy day. One day it finally -rained, and I decided to tackle the problem. - -So what is the problem? Consider the number ``1,234,567`` (assuming the -``','`` is the thousands separator). Try to run that through :func:`int` -and you will get a :exc:`ValueError`. To handle this properly the thousands -separators must be removed. - -.. code-block:: pycon - - >>> float('1,234,567'.replace(',', '')) - 1234567.0 - -What if, in our current locale, the thousands separator is ``'.'`` and -the ``','`` is the decimal separator (like for the German locale *de_DE*)? - -.. code-block:: pycon - - >>> float('1.234.567'.replace('.', '').replace(',', '.')) - 1234567.0 - >>> float('1.234.567,89'.replace('.', '').replace(',', '.')) - 1234567.89 - -This is pretty much what :func:`locale.atoi` and :func:`locale.atof` do -under the hood. So what's the problem? Why doesn't :mod:`natsort` just -use this method under its hood? -Well, let's take a look at what would happen if we send some possible -:mod:`natsort` input through our the above function: - -.. code-block:: pycon - - >>> natsort_key('1,234 apples, please.'.replace(',', '')) - ('', 1234, ' apples please.') - >>> natsort_key('Sir, €1.234,50 please.'.replace('.', '').replace(',', '.'), as_float=True) - ('Sir. €', 1234.5, ' please') - -Any character matching the thousands separator was dropped, and anything -matching the decimal separator was changed to ``'.'``! If these characters -were critical to how your data was ordered, this would break :mod:`natsort`. - -The first solution one might consider would be to first decompose the -input into sub-components (like we did for the *GROUPLETTERS* method -above) and then only apply these transformations on the number components. -This is a chicken-and-egg problem, though, because *we cannot appropriately -separate out the numbers because of the thousands separators and -non-'.' decimal separators* (well, at least not without making multiple -passes over the data which I do not consider to be a valid option). - -Regular expressions to the rescue! With regular expressions, we can -remove the thousands separators and change the decimal separator only -when they are actually within a number. Once the input has been -pre-processed with this regular expression, all the infrastructure -shown previously will work. - -Beware, these regular expressions will make your eyes bleed. - -.. code-block:: pycon - - >>> decimal = ',' # Assume German locale, so decimal separator is ',' - >>> # Look-behind assertions cannot accept range modifiers, so instead of i.e. - >>> # (?>> nodecimal = r'(?>> strip_thousands = r''' - ... (?<=[0-9]{{1}}) # At least 1 number - ... (?>> re.sub(strip_thousands, '', 'Sir, €1.234,50 please.', flags=re.X) - 'Sir, €1234,50 please.' - >>> - >>> # The decimal point must be preceded by a number or after - >>> # a number. This option only needs to be performed in the - >>> # case when the decimal separator for the locale is not '.'. - >>> switch_decimal = r'(?<=[0-9]){decimal}|{decimal}(?=[0-9])' - >>> switch_decimal = switch_decimal.format(decimal=decimal) - >>> re.sub(switch_decimal, '.', 'Sir, €1234,50 please.', flags=re.X) - 'Sir, €1234.50 please.' - >>> - >>> natsort_key('Sir, €1234.50 please.', as_float=True) - ('Sir, €', 1234.5, ' please.') - -Final Thoughts --------------- - -My hope is that users of :mod:`natsort` never have to think about or worry -about all the bookkeeping or any of the details described above, and that using -:mod:`natsort` seems to magically "just work". For those of you who -took the time to read this engineering description, I hope it has enlightened -you to some of the issues that can be encountered when code is released -into the wild and has to accept "real-world data", or to what happens -to developers who naïvely make bold assumptions that are counter to -what the rest of the world assumes. - -.. rubric:: Footnotes - -.. [#f1] - *"But if you hadn't removed the leading empty string from re.split this - wouldn't have happened!!"* I can hear you saying. Well, that's true. I don't - have a *great* reason for having done that except that in an earlier - non-optimal incarnation of the algorithm I needed to it, and it kind of - stuck, and it made other parts of the code easier if the assumption that - there were no empty strings was valid. -.. [#f2] - I'm not going to show how this is implemented in this document, - but if you are interested you can look at the code to - :func:`sep_inserter` in `util.py`_. -.. [#f3] - Handling each of these is straightforward, but coupled with the rapidly - fracturing execution paths presented in :ref:`TL;DR 2 ` one can - imagine this will get out of hand quickly. If you take a look at - `natsort.py`_ and `util.py`_ you can observe that to avoid this I take - a more functional approach to construting the :mod:`natsort` algorithm - as opposed to the procedural approach illustrated in - :ref:`TL;DR 1 ` and :ref:`TL;DR 2 `. - -.. _ASCII table: https://www.asciitable.com/ -.. _getting sorting right is surprisingly hard: http://www.compciv.org/guides/python/fundamentals/sorting-collections-with-sorted/ -.. _This astonished: https://github.com/SethMMorton/natsort/issues/19 -.. _a lot: https://stackoverflow.com/questions/29548742/python-natsort-sort-strings-recursively -.. _of people: https://stackoverflow.com/questions/24045348/sort-set-of-numbers-in-the-form-xx-yy-in-python -.. _and some people aren't very nice when they are astonished: - https://github.com/xolox/python-naturalsort/blob/ed3e6b6ffaca3bdea3b76e08acbb8bd2a5fee463/README.rst#why-another-natsort-module -.. _fastnumbers: https://github.com/SethMMorton/fastnumbers -.. _as part of my testing: https://github.com/SethMMorton/natsort/blob/master/test_natsort/slow_splitters.py -.. _this one for coercion: https://stackoverflow.com/questions/736043/checking-if-a-string-can-be-converted-to-float-in-python -.. _this one for checking: https://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-is-a-number-float -.. _most natural sort solutions for python on Stack Overflow: https://stackoverflow.com/q/4836710/1399279 -.. _80%/20%: https://en.wikipedia.org/wiki/Pareto_principle -.. _The first major special case I encountered was sorting filesystem paths: https://github.com/SethMMorton/natsort/issues/3 -.. _The second major special case I encountered was sorting of different types: https://github.com/SethMMorton/natsort/issues/7 -.. _A rather unexpected special case I encountered was sorting collections containing NaN: - https://github.com/SethMMorton/natsort/issues/27 -.. _It's hard to compare floating point numbers: http://www.drdobbs.com/cpp/its-hard-to-compare-floating-point-numbe/240149806 -.. _caught a bit off guard when the request was initially made: https://github.com/SethMMorton/natsort/issues/14 -.. _at the code: https://github.com/SethMMorton/natsort/tree/master/natsort -.. _natsort.py: https://github.com/SethMMorton/natsort/blob/master/natsort/natsort.py -.. _util.py: https://github.com/SethMMorton/natsort/blob/master/natsort/util.py -.. _although they do point out in the documentation that it will be painful to use: - https://docs.python.org/3/library/locale.html#background-details-hints-tips-and-caveats -.. _natsort.compat.locale.py: https://github.com/SethMMorton/natsort/blob/master/natsort/compat/locale.py -.. _Thousands separator support: https://github.com/SethMMorton/natsort/issues/36 -.. _really good: https://hypothesis.readthedocs.io/en/latest/ -.. _testing strategy: https://docs.pytest.org/en/latest/ -.. _check out some official Unicode documentation: https://unicode.org/reports/tr15/ -.. _Check out this issue for a real-world example: https://github.com/SethMMorton/natsort/issues/140 \ No newline at end of file +This page has been moved to the +`natsort wiki `_. diff --git a/docs/locale_issues.rst b/docs/locale_issues.rst index 56cd5a9..3539904 100644 --- a/docs/locale_issues.rst +++ b/docs/locale_issues.rst @@ -6,92 +6,5 @@ Possible Issues with :func:`~natsort.humansorted` or ``ns.LOCALE`` ================================================================== -Being Locale-Aware Means Both Numbers and Non-Numbers ------------------------------------------------------ - -In addition to modifying how characters are sorted, ``ns.LOCALE`` will take -into account locale-dependent thousands separators (and locale-dependent -decimal separators if ``ns.FLOAT`` is enabled). This means that if you are in a -locale that uses commas as the thousands separator, a number like -``123,456`` will be interpreted as ``123456``. If this is not what you want, -you may consider using ``ns.LOCALEALPHA`` which will only enable locale-aware -sorting for non-numbers (similarly, ``ns.LOCALENUM`` enables locale-aware -sorting only for numbers). - -Regenerate Key With :func:`~natsort.natsort_keygen` After Changing Locale -------------------------------------------------------------------------- - -When :func:`~natsort.natsort_keygen` is called it returns a key function that -hard-codes the provided settings. This means that the key returned when -``ns.LOCALE`` is used contains the settings specified by the locale -*loaded at the time the key is generated*. If you change the locale, -you should regenerate the key to account for the new locale. - -Corollary: Do Not Reuse :func:`~natsort.natsort_keygen` After Changing Locale -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - -If you change locale, the old function will not work as expected. -The :mod:`locale` library works with a global state. When -:func:`~natsort.natsort_keygen` is called it does the best job that it can to -make the returned function as static as possible and independent of the global -state, but the :func:`locale.strxfrm` function must access this global state to -work; therefore, if you change locale and use ``ns.LOCALE`` then you should -discard the old key. - -.. note:: If you use `PyICU`_ then you may be able to reuse keys after changing - locale. - -The :mod:`locale` Module From the StdLib Has Issues ---------------------------------------------------- - -:mod:`natsort` will use `PyICU`_ for :func:`~natsort.humansorted` or -``ns.LOCALE`` if it is installed. If not, it will fall back on the -:mod:`locale` library from the Python stdlib. If you do not have `PyICU`_ -installed, please keep the following known problems and issues in mind. - -.. note:: Remember, if you have `PyICU`_ installed you shouldn't need to worry - about any of these. - -Explicitly Set the Locale Before Using ``ns.LOCALE`` -++++++++++++++++++++++++++++++++++++++++++++++++++++ - -I have found that unless you explicitly set a locale, the sorted order may not -be what you expect. Setting this is straightforward -(in the below example I use 'en_US.UTF-8', but you should use your -locale): - -.. code-block:: pycon - - >>> import locale - >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8') - 'en_US.UTF-8' - -.. _bug_note: - -The :mod:`locale` Module Is Broken on Mac OS X -++++++++++++++++++++++++++++++++++++++++++++++ - -It's not Python's fault, but the OS... the locale library for BSD-based systems -(of which Mac OS X is one) is broken. See the following links: - - - https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help - - https://bugs.python.org/issue23195 - - https://github.com/SethMMorton/natsort/issues/21 (contains instructons on installing) - - https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm - - https://github.com/SethMMorton/natsort/issues/34 - -Of course, installing `PyICU`_ fixes this, but if you don't want to or cannot -install this there is some hope. - - 1. As of ``natsort`` version 4.0.0, ``natsort`` is configured - to compensate for a broken ``locale`` library. When sorting non-numbers - it will handle case as you expect, but it will still not be able to - comprehend non-ASCII characters properly. Additionally, it has - a built-in lookup table of thousands separators that are incorrect - on OS X/BSD (but is possible it is not complete... please file an - issue if you see it is not complete) - 2. Use "\*.ISO8859-1" locale (i.e. 'en_US.ISO8859-1') rather than - "\*.UTF-8" locale. I have found that these have fewer issues than - "UTF-8", but your mileage may vary. - -.. _PyICU: https://pypi.org/project/PyICU +This page has been moved to the +`natsort wiki `_. diff --git a/docs/shell.rst b/docs/shell.rst index 0d7d3c9..bf40874 100644 --- a/docs/shell.rst +++ b/docs/shell.rst @@ -6,153 +6,5 @@ Shell Script ============ -The ``natsort`` shell script is automatically installed when you install -:mod:`natsort` with pip. - -Below is the usage and some usage examples for the ``natsort`` shell script. - -Usage ------ - -.. code-block:: - - usage: natsort [-h] [--version] [-p] [-f LOW HIGH] [-F LOW HIGH] [-e EXCLUDE] - [-r] [-t {digit,int,float,version,ver}] [--nosign] [--noexp] - [--locale] - [entries [entries ...]] - - Performs a natural sort on entries given on the command-line. - A natural sort sorts numerically then alphabetically, and will sort - by numbers in the middle of an entry. - - positional arguments: - entries The entries to sort. Taken from stdin if nothing is - given on the command line. - - optional arguments: - -h, --help show this help message and exit - --version show program's version number and exit - -p, --paths Interpret the input as file paths. This is not - strictly necessary to sort all file paths, but in - cases where there are OS-generated file paths like - "Folder/" and "Folder (1)/", this option is needed to - make the paths sorted in the order you expect - ("Folder/" before "Folder (1)/"). - -f LOW HIGH, --filter LOW HIGH - Used for keeping only the entries that have a number - falling in the given range. - -F LOW HIGH, --reverse-filter LOW HIGH - Used for excluding the entries that have a number - falling in the given range. - -e EXCLUDE, --exclude EXCLUDE - Used to exclude an entry that contains a specific - number. - -r, --reverse Returns in reversed order. - -t {digit,int,float,version,ver,real,f,i,r,d}, - --number-type {digit,int,float,version,ver,real,f,i,r,d}, - --number_type {digit,int,float,version,ver,real,f,i,r,d} - Choose the type of number to search for. "float" will - search for floating-point numbers. "int" will only - search for integers. "digit", "version", and "ver" are - synonyms for "int"."real" is a shortcut for "float" - with --sign. "i" and "d" are synonyms for "int", "f" - is a synonym for "float", and "r" is a synonym for - "real".The default is int. - --nosign Do not consider "+" or "-" as part of a number, i.e. - do not take sign into consideration. This is the - default. - -s, --sign Consider "+" or "-" as part of a number, i.e. take - sign into consideration. The default is unsigned. - --noexp Do not consider an exponential as part of a number, - i.e. 1e4, would be considered as 1, "e", and 4, not as - 10000. This only effects the --number-type=float. - -l, --locale Causes natsort to use locale-aware sorting. You will - get the best results if you install PyICU. - -Description ------------ - -``natsort`` was originally written to aid in computational chemistry -research so that it would be easy to analyze large sets of output files -named after the parameter used: - -.. code-block:: console - - $ ls *.out - mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out - -(Obviously, in reality there would be more files, but you get the idea.) Notice -that the shell sorts in lexicographical order. This is the behavior of programs like -``find`` as well as ``ls``. The problem is passing these files to an -analysis program causes them not to appear in numerical order, which can lead -to bad analysis. To remedy this, use ``natsort``: - -.. code-block:: console - - $ natsort *.out - mode744.43.out - mode943.54.out - mode1000.35.out - mode1243.34.out - $ natsort -t r *.out | xargs your_program - -``-t r`` is short for ``--number-type real``. You can also place natsort in -the middle of a pipe: - -.. code-block:: console - - $ find . -name "*.out" | natsort -t r | xargs your_program - -To sort version numbers, use the default ``--number-type``: - -.. code-block:: console - - $ ls * - prog-1.10.zip prog-1.9.zip prog-2.0.zip - $ natsort * - prog-1.9.zip - prog-1.10.zip - prog-2.0.zip - -In general, all ``natsort`` shell script options mirror the :func:`~natsorted` -API, with notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude`` -options. These three options are used as follows: - -.. code-block:: console - - $ ls *.out - mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out - $ natsort -t r *.out -f 900 1100 # Select only numbers between 900-1100 - mode943.54.out - mode1000.35.out - $ natsort -t r *.out -F 900 1100 # Select only numbers NOT between 900-1100 - mode744.43.out - mode1243.34.out - $ natsort -t r *.out -e 1000.35 # Exclude 1000.35 from search - mode744.43.out - mode943.54.out - mode1243.34.out - -If you are sorting paths with OS-generated filenames, you may require the -``--paths``/``-p`` option: - -.. code-block:: console - - $ find . ! -path . -type f - ./folder/file (1).txt - ./folder/file.txt - ./folder (1)/file.txt - ./folder (10)/file.txt - ./folder (2)/file.txt - $ find . ! -path . -type f | natsort - ./folder (1)/file.txt - ./folder (2)/file.txt - ./folder (10)/file.txt - ./folder/file (1).txt - ./folder/file.txt - $ find . ! -path . -type f | natsort -p - ./folder/file.txt - ./folder/file (1).txt - ./folder (1)/file.txt - ./folder (2)/file.txt - ./folder (10)/file.txt +This page has been moved to the +`natsort wiki `_. -- cgit v1.2.1