author: Paul McGuire <ptmcg@users.noreply.github.com> 2020-05-28 00:33:51 -0500
committer: Paul McGuire <ptmcg@users.noreply.github.com> 2020-05-28 00:33:51 -0500
commit: 6fc6fa978bb65496becfba3c72039971f151ed70
tree: 27f946a2743a8d490b9e43b721edefb003e45f57 /docs/HowToUsePyparsing.rst
parent: cb7f7fda5604997a65c8efc958cc8dfc4985eaf7
Update HowTo doc, address comments in #213
Diffstat (limited to 'docs/HowToUsePyparsing.rst')
-rw-r--r-- docs/HowToUsePyparsing.rst | 114
1 files changed, 93 insertions, 21 deletions
diff --git a/docs/HowToUsePyparsing.rst b/docs/HowToUsePyparsing.rst
index ce954f2..8d8582c 100644
--- a/docs/HowToUsePyparsing.rst
+++ b/docs/HowToUsePyparsing.rst
@@ -5,10 +5,10 @@ Using the pyparsing module
:author: Paul McGuire
:address: ptmcg@users.sourceforge.net
-:revision: 2.0.1a
-:date: July, 2013 (minor update August, 2018)
+:revision: 2.4.7
+:date: June, 2020
-:copyright: Copyright |copy| 2003-2013 Paul McGuire.
+:copyright: Copyright |copy| 2003-2020 Paul McGuire.
.. |copy| unicode:: 0xA9
@@ -25,7 +25,7 @@ Using the pyparsing module
Note: While this content is still valid, there are more detailed
descriptions and examples at the online doc server at
-https://pythonhosted.org/pyparsing/pyparsing-module.html
+https://pyparsing-docs.readthedocs.io/en/latest/pyparsing.html
Steps to follow
===============
@@ -82,8 +82,8 @@ Usage notes
automatically converted to Literal objects. For example::
integer = Word(nums) # simple unsigned integer
- variable = Word(alphas, max=1) # single letter variable, such as x, z, m, etc.
- arithOp = Word("+-*/", max=1) # arithmetic operators
+ variable = Char(alphas) # single letter variable, such as x, z, m, etc.
+ arithOp = oneOf("+ - * /") # arithmetic operators
equation = variable + "=" + integer + arithOp + integer # will match "x=2+2", etc.
In the definition of ``equation``, the string ``"="`` will get added as
@@ -214,7 +214,7 @@ Usage notes
+ "MIN:" + realNum.setResultsName("min")
+ "MAX:" + realNum.setResultsName("max"))
- can now be written as this::
+ can more simply and cleanly be written as this::
stats = ("AVE:" + realNum("average")
+ "MIN:" + realNum("min")
@@ -272,14 +272,59 @@ methods for code to use are:
- ``setName(name)`` - associate a short descriptive name for this
element, useful in displaying exceptions and trace information
+- ``runTests(testsString)`` - useful development and testing method on
+ expressions, to pass a multiline string of sample strings to test against
+ the expression. Comment lines (beginning with ``#``) can be inserted
+ and they will be included in the test output:
+
+ digits = Word(nums).setName("numeric digits")
+ real_num = Combine(digits + '.' + digits)
+ real_num.runTests("""\
+ # valid number
+ 3.14159
+
+ # no integer part
+ .00001
+
+ # no decimal
+ 101
+
+ # no decimal value
+ 101.
+ """)
+
+ will print:
+
+ # valid number
+ 3.14159
+ ['3.14159']
+
+ # no integer part
+ .00001
+ ^
+ FAIL: Expected numeric digits, found '.' (at char 0), (line:1, col:1)
+
+ # no decimal
+ 101
+ ^
+ FAIL: Expected ".", found end of text (at char 3), (line:1, col:4)
+
+ # no decimal value
+ 101.
+ ^
+ FAIL: Expected numeric digits, found end of text (at char 4), (line:1, col:5)
+
- ``setResultsName(string, listAllMatches=False)`` - name to be given
to tokens matching
the element; if multiple tokens match within
a repetition group (such as ``ZeroOrMore`` or ``delimitedList``) the
default is to return only the last matching token - if listAllMatches
is set to True, then a list of all the matching tokens is returned.
- (New in 1.5.6 - a results name with a trailing '*' character will be
- interpreted as setting listAllMatches to True.)
+
+ ``expr.setResultsName("key")`` can also be written ``expr("key")``
+ (a results name with a trailing '*' character will be
+ interpreted as setting listAllMatches to True).
+
Note:
``setResultsName`` returns a *copy* of the element so that a single
basic element can be referenced multiple times and given
@@ -296,8 +341,17 @@ methods for code to use are:
- ``toks`` is the list of the matched tokens, packaged as a ParseResults_ object
- Multiple functions can be attached to a ParserElement by specifying multiple
- arguments to setParseAction, or by calling setParseAction multiple times.
+ Parse actions can have any of the following signatures:
+
+ fn(s, loc, tokens)
+ fn(loc, tokens)
+ fn(tokens)
+ fn()
+
+ Multiple functions can be attached to a ``ParserElement`` by specifying multiple
+ arguments to ``setParseAction``, or by calling ``addParseAction``. Calls to ``setParseAction``
+ will replace any previously defined parse actions. ``setParseAction(None)`` will clear
+ any previously defined parse action.
Each parse action function can return a modified ``toks`` list, to perform conversion, or
string modifications. For brevity, ``fn`` may also be a
@@ -306,8 +360,12 @@ methods for code to use are:
intNumber = Word(nums).setParseAction(lambda s,l,t: [int(t[0])])
- If ``fn`` does not modify the ``toks`` list, it does not need to return
- anything at all.
+ If ``fn`` modifies the ``toks`` list in-place, it does not need to return
+ anything; pyparsing will use the modified ``toks`` list.
+
+- ``addParseAction`` - similar to ``setParseAction``, but instead of replacing any
+ previously defined parse actions, will append the given action or actions to the
+ existing defined parse actions.
- ``setBreak(breakFlag=True)`` - if breakFlag is True, calls pdb.set_trace()
as this expression is about to be parsed
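
The difference between ``setParseAction`` and ``addParseAction`` described in this hunk can be illustrated with a minimal sketch. It is not part of the commit; the function names ``to_int`` and ``double`` are hypothetical and chosen only for illustration, and the example assumes a pyparsing release that still provides the camelCase method names.

```python
from pyparsing import Word, nums

def to_int(tokens):
    # one-argument signature: fn(tokens)
    return [int(tokens[0])]

def double(s, loc, tokens):
    # three-argument signature: fn(s, loc, tokens)
    return [tokens[0] * 2]

# setParseAction installs to_int; addParseAction appends double after it
integer = Word(nums).setParseAction(to_int)
integer.addParseAction(double)
chained = integer.parseString("21")[0]

# calling setParseAction again replaces both previously defined actions
integer.setParseAction(to_int)
replaced = integer.parseString("21")[0]

print(chained, replaced)
```

Here ``chained`` is produced by running both actions in order, while ``replaced`` reflects only ``to_int``, since the second ``setParseAction`` call discards the earlier list of actions.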
@@ -412,13 +470,18 @@ Basic ParserElement subclasses
If ``exact`` is specified, it will override any values for ``min`` or ``max``.
- New in 1.5.6 - Sometimes you want to define a word using all
+ Sometimes you want to define a word using all
characters in a range except for one or two of them; you can do this
with the new ``excludeChars`` argument. This is helpful if you want to define
a word with all printables except for a single delimiter character, such
as '.'. Previously, you would have to create a custom string to pass to Word.
With this change, you can just create ``Word(printables, excludeChars='.')``.
+- ``Char`` - a convenience form of ``Word`` that will match just a single character from
+ a string of matching characters
+
+ single_digit = Char(nums)
+
- ``CharsNotIn`` - similar to Word_, but matches characters not
in the given constructor string (accepts only one string for both
initial and body characters); also supports ``min``, ``max``, and ``exact``
@@ -460,6 +523,13 @@ Basic ParserElement subclasses
- ``failOn`` - if a literal string or expression is given for this argument, it defines an expression that
should cause the ``SkipTo`` expression to fail, and not skip over that expression
+ ``SkipTo`` can also be written using ``...``:
+
+ LBRACE, RBRACE = map(Literal, "{}")
+ brace_expr = LBRACE + SkipTo(RBRACE) + RBRACE
+ # can also be written as
+ brace_expr = LBRACE + ... + RBRACE
+
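
As a rough check of the equivalence claimed above, the following sketch (not part of the commit) parses the same input with both spellings. The sample text is an assumption made for illustration, and the ``...`` form assumes a pyparsing version that supports it (2.4.2 or later, as far as this editor is aware).

```python
from pyparsing import Literal, SkipTo

LBRACE, RBRACE = map(Literal, "{}")

# explicit SkipTo form
explicit_expr = LBRACE + SkipTo(RBRACE) + RBRACE
# shorthand form using the Ellipsis literal
ellipsis_expr = LBRACE + ... + RBRACE

text = "{ hello world }"  # hypothetical sample input
explicit_tokens = explicit_expr.parseString(text).asList()
ellipsis_tokens = ellipsis_expr.parseString(text).asList()

print(explicit_tokens)
print(ellipsis_tokens)
```

Both expressions should yield three tokens: the opening brace, the skipped text, and the closing brace.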
.. _White:
- ``White`` - also similar to Word_, but matches whitespace
@@ -525,10 +595,11 @@ Expression subclasses
parse element is not found in the input string; parse action will only
be called if a match is found, or if a default is specified
-- ``ZeroOrMore`` - similar to Optional, but can be repeated
+- ``ZeroOrMore`` - similar to Optional, but can be repeated; ``ZeroOrMore(expr)``
+ can also be written as ``expr[...]``.
- ``OneOrMore`` - similar to ZeroOrMore, but at least one match must
- be present
+ be present; ``OneOrMore(expr)`` can also be written as ``expr[1, ...]``.
- ``FollowedBy`` - a lookahead expression, requires matching of the given
expressions, but does not advance the parsing position within the input string
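
The slice-style repetition notation added for ``ZeroOrMore`` and ``OneOrMore`` can be sketched as follows. This example is not part of the commit; the variable names are hypothetical, and it assumes a pyparsing version that accepts ``expr[...]`` and ``expr[1, ...]``.

```python
from pyparsing import Word, nums, ParseException

integer = Word(nums)

zero_or_more = integer[...]     # same meaning as ZeroOrMore(integer)
one_or_more = integer[1, ...]   # same meaning as OneOrMore(integer)

many = zero_or_more.parseString("1 2 3").asList()
none = zero_or_more.parseString("").asList()

# OneOrMore requires at least one match, so empty input is an error
try:
    one_or_more.parseString("")
    empty_failed = False
except ParseException:
    empty_failed = True

print(many, none, empty_failed)
```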
@@ -566,8 +637,8 @@ Expression operators
- ``==`` - matching expression to string; returns True if the string matches the given expression
- ``<<=`` - inserts the expression following the operator as the body of the
- Forward expression before the operator
-
+ Forward expression before the operator (``<<`` can also be used, but ``<<=`` is preferred
+ to avoid operator precedence misinterpretation of the pyparsing expression)
Positional subclasses
@@ -633,7 +704,8 @@ Other classes
- total list of elements can be found using len()
- - individual elements can be found using [0], [1], [-1], etc.
+ - individual elements can be found using [0], [1], [-1], etc.,
+ or retrieved using slices
- elements can be deleted using ``del``
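
The list-like access to ``ParseResults`` described in this hunk can be sketched as below. This is an illustrative example rather than part of the commit, and the sample input is an assumption.

```python
from pyparsing import OneOrMore, Word, nums

result = OneOrMore(Word(nums)).parseString("10 20 30 40")

count = len(result)           # total number of elements
first = result[0]             # index from the front
last = result[-1]             # index from the end
middle = list(result[1:3])    # slice; list() normalizes the returned type

# elements can be deleted using del
del result[0]
remaining = result.asList()

print(count, first, last, middle, remaining)
```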
@@ -754,14 +826,14 @@ Helper methods
are returned as keyed tokens in the returned ParseResults. ``makeHTMLTags`` is less
restrictive than ``makeXMLTags``, especially with respect to case sensitivity.
-- ``infixNotation(baseOperand, operatorList)`` - (formerly named ``operatorPrecedence``)
+- ``infixNotation(baseOperand, operatorList)`` -
convenience function to define a grammar for parsing infix notation
expressions with a hierarchical precedence of operators. To use the ``infixNotation``
helper:
1. Define the base "atom" operand term of the grammar.
For this simple grammar, the smallest operand is either
- and integer or a variable. This will be the first argument
+ an integer or a variable. This will be the first argument
to the ``infixNotation`` method.
2. Define a list of tuples for each level of operator