From 95efc46795c2ec22d78c41cd81394728c0488cd1 Mon Sep 17 00:00:00 2001
From: Abdurrahmaan Iqbal
Date: Thu, 6 Feb 2020 20:07:10 +0000
Subject: Implement parse_pattern function

---
 docs/tutorial.rst | 50 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 41 insertions(+), 9 deletions(-)

diff --git a/docs/tutorial.rst b/docs/tutorial.rst
index 158d2ad..9693444 100644
--- a/docs/tutorial.rst
+++ b/docs/tutorial.rst
@@ -251,14 +251,14 @@ or
 .. note:: Pint´s rule for parsing strings with a mixture of numbers and units is that **units are treated with the same precedence as numbers**.
-
+
    For example, the unit of
 
    .. doctest::
 
       >>> Q_('3 l / 100 km')
-
+
    may be unexpected first but is a consequence of applying this rule. Use brackets to get the expected result:
@@ -272,6 +272,38 @@ brackets to get the expected result:
    exposed to when parsing information from untrusted sources.
 
 
+Strings containing values can be parsed using the ``ureg.parse_pattern`` function. The pattern is a ``format``-like string in which each pair of curly brackets contains the name of the unit of the value to capture:
+
+.. doctest::
+
+    >>> input_string = '10 feet 10 inches'
+    >>> pattern = '{feet} feet {inch} inches'
+    >>> ureg.parse_pattern(input_string, pattern)
+    [10.0 , 10.0 ]
+
+To search for multiple matches, set the ``many`` parameter to ``True``. The following example also shows that the parser can find matches amid filler characters:
+
+.. doctest::
+
+    >>> input_string = '10 feet - 20 feet ! 30 feet.'
+    >>> pattern = '{feet} feet'
+    >>> ureg.parse_pattern(input_string, pattern, many=True)
+    [[10.0 ], [20.0 ], [30.0 ]]
+
+Patterns can also use the full power of regular expressions:
+
+.. doctest::
+
+    >>> input_string = "10` - 20 feet ! 30 ft."
+    >>> pattern = r"{feet}(`| feet| ft)"
+    >>> ureg.parse_pattern(input_string, pattern, many=True)
+    [[10.0 ], [20.0 ], [30.0 ]]
+
+Note that the parser converts each curly-bracket placeholder (``{}``) into a float-matching pattern.
+
+This function is useful for bulk extraction of quantities, whether from thousands of uniformly formatted strings or from very large texts with units scattered throughout.
+
+
 .. _sec-string-formatting:
 
 String formatting
@@ -303,12 +335,12 @@ Pint supports float formatting for numpy arrays as well:
     >>> # scientific form formatting with unit pretty printing
     >>> print('The array is {:+.2E~P}'.format(accel))
     The array is [-1.10E+00 +1.00E-06 +1.25E+00 +1.30E+00] m/s²
-
+
 Pint also supports 'f-strings'_ from python>=3.6 :
 
 .. doctest::
 
-    >>> accel = 1.3 * ureg['meter/second**2']
+    >>> accel = 1.3 * ureg['meter/second**2']
     >>> print(f'The str is {accel}')
     The str is 1.3 meter / second ** 2
     >>> print(f'The str is {accel:.3e}')
@@ -318,7 +350,7 @@ Pint also supports 'f-strings'_ from python>=3.6 :
     >>> print(f'The str is {accel:~.3e}')
     The str is 1.300e+00 m / s ** 2
     >>> print(f'The str is {accel:~H}')
-    The str is 1.3 m/s²
+    The str is 1.3 m/s²
 
 But Pint also extends the standard formatting capabilities for unicode and LaTeX representations:
@@ -349,11 +381,11 @@ If you want to use abbreviated unit names, prefix the specification with `~`:
 The same is true for latex (`L`) and HTML (`H`) specs.
 
 .. note::
-   The abbreviated unit is drawn from the unit registry where the 3rd item in the
-   equivalence chain (ie 1 = 2 = **3**) will be returned when the prefix '~' is
+   The abbreviated unit is drawn from the unit registry where the 3rd item in the
+   equivalence chain (ie 1 = 2 = **3**) will be returned when the prefix '~' is
    used. The 1st item in the chain is the canonical name of the unit.
 
-The formatting specs (ie 'L', 'H', 'P') can be used with Python string 'formatting
+The formatting specs (ie 'L', 'H', 'P') can be used with Python string 'formatting
 syntax'_ for custom float representations. For example, scientific notation:
 
 ..doctest::
@@ -438,4 +470,4 @@ also define the registry as the application registry::
 .. _`serious security problems`: http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
 .. _`Babel`: http://babel.pocoo.org/
 .. _'formatting syntax': https://docs.python.org/3/library/string.html#format-specification-mini-language
-.. _'f-strings': https://www.python.org/dev/peps/pep-0498/
\ No newline at end of file
+.. _'f-strings': https://www.python.org/dev/peps/pep-0498/
--
cgit v1.2.1

From fa86d1af3cbb7904a53f08d4e0c08e4cec25519a Mon Sep 17 00:00:00 2001
From: Hernan
Date: Tue, 11 Feb 2020 00:45:11 -0300
Subject: Make `__str__` and `__format__` locale aware
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

With this commit, all string formatting operations are locale aware

>>> ureg.set_fmt_locale('fr_FR')
>>> str(accel)
'1.3 mètre par seconde²'
>>> "%s" % accel
'1.3 mètre par seconde²'
>>> "{}".format(accel)
'1.3 mètre par seconde²'

It should not break any code as the default locale value for the Registry is `None` (meaning do not localize).

Close #984
---
 docs/tutorial.rst | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/docs/tutorial.rst b/docs/tutorial.rst
index 9693444..ac9c2b3 100644
--- a/docs/tutorial.rst
+++ b/docs/tutorial.rst
@@ -419,10 +419,27 @@ Finally, if Babel_ is installed you can translate unit names to any language
     >>> accel.format_babel(locale='fr_FR')
     '1.3 mètre par seconde²'
 
-You can also specify the format locale at u
+You can also specify the format locale at the registry level, either at creation:
 
     >>> ureg = UnitRegistry(fmt_locale='fr_FR')
 
+or later:
+
+.. doctest::
+
+    >>> ureg.set_fmt_locale('fr_FR')
+
+and once it is set, all string formatting is localized:
+
+.. doctest::
+
+    >>> str(accel)
+    '1.3 mètre par seconde²'
+    >>> "%s" % accel
+    '1.3 mètre par seconde²'
+    >>> "{}".format(accel)
+    '1.3 mètre par seconde²'
+
 
 Using Pint in your projects
 ---------------------------
--
cgit v1.2.1
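The first patch says the parser turns each curly-bracket placeholder into a float-matching pattern and treats the rest of the pattern as a regular expression. The following is a minimal sketch of that idea in plain Python. It is not Pint's implementation: `compile_pattern`, `parse_pattern_sketch` and the float regex are hypothetical. It also returns raw `(value, unit_name)` pairs rather than Pint quantities, so the names stay exactly as written in the pattern (for example `feet`).

```python
import re

# Hypothetical float regex for this sketch; Pint's actual pattern may differ.
FLOAT_RE = r"[-+]?\d*\.?\d+"
# A placeholder is {identifier}. Braces holding digits, such as the regex
# quantifier {2}, do not match and are left untouched.
PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_]\w*)\}")


def compile_pattern(pattern):
    """Turn '{feet} feet' into a compiled regex plus the ordered unit names."""
    units = []

    def to_group(match):
        # Give every value its own named group, so that groups the user
        # writes, such as (`| feet| ft), cannot shift the captured values.
        units.append(match.group(1))
        return f"(?P<_v{len(units) - 1}>{FLOAT_RE})"

    regex = PLACEHOLDER_RE.sub(to_group, pattern)
    return re.compile(regex), units


def parse_pattern_sketch(input_string, pattern, many=False):
    """Return (value, unit) pairs, or a list of such lists when many=True."""
    regex, units = compile_pattern(pattern)

    def extract(match):
        return [(float(match.group(f"_v{i}")), unit) for i, unit in enumerate(units)]

    if many:
        # finditer skips filler text between successive matches.
        return [extract(m) for m in regex.finditer(input_string)]
    match = regex.search(input_string)
    return extract(match) if match else []
```

Run against the three doctest inputs from the patch, this sketch returns `[(10.0, 'feet'), (10.0, 'inch')]` for the first. For the other two, each match comes back as a single-element list. A fuller implementation would then map each name to a registry unit.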
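The second commit message says that `str()`, `%s` formatting and `format()` all become locale aware, and that a registry locale of `None` means no localization. One way to get that behaviour is to have `__str__` delegate to `__format__` and have `__format__` read a registry-level locale. The toy sketch below only illustrates that design and is not Pint's code. `ToyRegistry`, `ToyQuantity` and the `TRANSLATIONS` table are hypothetical, and the table stands in for Babel.

```python
# Hypothetical translation table standing in for Babel; not Pint data.
TRANSLATIONS = {
    "fr_FR": {"meter / second ** 2": "mètre par seconde²"},
}


class ToyRegistry:
    def __init__(self, fmt_locale=None):
        # None means "do not localize", matching the default in the commit message.
        self.fmt_locale = fmt_locale

    def set_fmt_locale(self, loc):
        self.fmt_locale = loc


class ToyQuantity:
    def __init__(self, magnitude, units, registry):
        self.magnitude = magnitude
        self.units = units
        self._registry = registry

    def __format__(self, spec):
        loc = self._registry.fmt_locale
        units = self.units
        if loc is not None:
            # Fall back to the untranslated name when no translation exists.
            units = TRANSLATIONS.get(loc, {}).get(units, units)
        return f"{format(self.magnitude, spec)} {units}"

    def __str__(self):
        # Routing __str__ through __format__ means str(), "%s" and "{}"
        # all read the registry's locale.
        return format(self, "")
```

Here `"%s" % q` calls `str(q)`, which in turn defers to `__format__`. As a result, only one method needs to know about the locale.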