Diffstat (limited to 'docs/HowToUsePyparsing.rst')
-rw-r--r--  docs/HowToUsePyparsing.rst  243
1 file changed, 137 insertions, 106 deletions
diff --git a/docs/HowToUsePyparsing.rst b/docs/HowToUsePyparsing.rst
index 61a0580..4fe8cf1 100644
--- a/docs/HowToUsePyparsing.rst
+++ b/docs/HowToUsePyparsing.rst
@@ -61,8 +61,8 @@ To parse an incoming data string, the client code must follow these steps:
When token matches occur, any defined parse action methods are
called.
-3. Process the parsed results, returned as a ParseResults object.
- The ParseResults object can be accessed as if it were a list of
+3. Process the parsed results, returned as a ParseResults_ object.
+ The ParseResults_ object can be accessed as if it were a list of
strings. Matching results may also be accessed as named attributes of
the returned results, if names are defined in the definition of
the token pattern, using ``set_results_name()``.
@@ -71,7 +71,7 @@ To parse an incoming data string, the client code must follow these steps:
Hello, World!
-------------
-The following complete Python program will parse the greeting "Hello, World!",
+The following complete Python program will parse the greeting ``"Hello, World!"``,
or any other greeting of the form "<salutation>, <addressee>!"::
import pyparsing as pp
@@ -106,8 +106,8 @@ Usage notes
- To keep up the readability of your code, use operators_ such as ``+``, ``|``,
``^``, and ``~`` to combine expressions. You can also combine
- string literals with ParseExpressions - they will be
- automatically converted to Literal objects. For example::
+ string literals with ``ParseExpressions`` - they will be
+ automatically converted to Literal_ objects. For example::
integer = Word(nums) # simple unsigned integer
variable = Char(alphas) # single letter variable, such as x, z, m, etc.
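
# plain strings combine directly with parse expressions and are
# converted to Literal objects automatically (illustrative sketch)
assignment = variable + "=" + integer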
@@ -188,18 +188,18 @@ Usage notes
occurrences. If this behavior is desired, then write
``expr[..., n] + ~expr``.
-- ``MatchFirst`` expressions are matched left-to-right, and the first
+- MatchFirst_ expressions are matched left-to-right, and the first
match found will skip all later expressions within, so be sure
to define less-specific patterns after more-specific patterns.
- If you are not sure which expressions are most specific, use Or
+ If you are not sure which expressions are most specific, use Or_
expressions (defined using the ``^`` operator) - they will always
match the longest expression, although they are more
compute-intensive.
-- ``Or`` expressions will evaluate all of the specified subexpressions
+- Or_ expressions will evaluate all of the specified subexpressions
to determine which is the "best" match, that is, which matches
the longest string in the input data. In case of a tie, the
- left-most expression in the ``Or`` list will win.
+ left-most expression in the Or_ list will win.
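
For instance, a small sketch of the difference (the grammar is illustrative)::

    from pyparsing import Literal

    first = Literal("1") | Literal("123")     # MatchFirst
    longest = Literal("1") ^ Literal("123")   # Or

    print(first.parse_string("123"))     # ['1'] - first alternative wins
    print(longest.parse_string("123"))   # ['123'] - longest match wins
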
- If parsing the contents of an entire file, pass it to the
``parse_file`` method using::
@@ -252,8 +252,8 @@ Usage notes
- Be careful when defining parse actions that modify global variables or
data structures (as in fourFn.py_), especially for low level tokens
- or expressions that may occur within an ``And`` expression; an early element
- of an ``And`` may match, but the overall expression may fail.
+ or expressions that may occur within an And_ expression; an early element
+ of an And_ may match, but the overall expression may fail.
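
A minimal sketch of the pitfall (the grammar and list are illustrative)::

    import pyparsing as pp

    seen = []
    integer = pp.Word(pp.nums).add_parse_action(lambda t: seen.append(t[0]))
    date = integer + "/" + integer + "/" + integer

    try:
        date.parse_string("1999/12")     # the overall And fails ...
    except pp.ParseException:
        pass
    print(seen)   # ... but the parse action already ran: ['1999', '12']
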
Classes
@@ -269,7 +269,7 @@ methods for code to use are:
matching pattern; returns a ParseResults_ object that makes the
matched tokens available as a list, and optionally as a dictionary,
or as an object with named attributes; if ``parse_all`` is set to True, then
- parse_string will raise a ParseException if the grammar does not process
+ ``parse_string`` will raise a ParseException_ if the grammar does not process
the complete input string.
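
For example (sketch)::

    from pyparsing import Word, nums, ParseException

    integer = Word(nums)
    print(integer.parse_string("123 abc"))   # ['123'] - trailing text ignored
    try:
        integer.parse_string("123 abc", parse_all=True)
    except ParseException as err:
        print(err)   # reports that the full input was not consumed
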
- ``parse_file(source_file)`` - a convenience function, that accepts an
@@ -348,12 +348,12 @@ methods for code to use are:
to tokens matching
the element; if multiple tokens within
a repetition group (such as ``ZeroOrMore`` or ``delimited_list``) the
- default is to return only the last matching token - if list_all_matches
+ default is to return only the last matching token - if ``list_all_matches``
is set to True, then a list of all the matching tokens is returned.
``expr.set_results_name("key")`` can also be written ``expr("key")``
(a results name with a trailing '*' character will be
- interpreted as setting ``list_all_matches`` to True).
+ interpreted as setting ``list_all_matches`` to ``True``).
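
A short sketch of results names (the grammar is illustrative)::

    import pyparsing as pp

    expr = pp.Word(pp.alphas)("lhs") + "=" + pp.Word(pp.nums).set_results_name("rhs")
    result = expr.parse_string("x = 100")
    print(result.lhs, result.rhs)   # x 100

    # a trailing '*' collects every match under the name
    values = pp.OneOrMore(pp.Word(pp.nums)("all*"))
    print(values.parse_string("1 2 3").all)   # ['1', '2', '3']
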
Note:
``set_results_name`` returns a *copy* of the element so that a single
@@ -373,9 +373,9 @@ methods for code to use are:
Parse actions can have any of the following signatures::
- fn(s, loc, tokens)
- fn(loc, tokens)
- fn(tokens)
+ fn(s: str, loc: int, tokens: ParseResults)
+ fn(loc: int, tokens: ParseResults)
+ fn(tokens: ParseResults)
fn()
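
For instance, a sketch of a parse action using the three-argument form::

    import pyparsing as pp

    def to_int(s: str, loc: int, tokens: pp.ParseResults):
        return int(tokens[0])

    integer = pp.Word(pp.nums).set_parse_action(to_int)
    print(integer.parse_string("42"))   # [42]
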
Multiple functions can be attached to a ``ParserElement`` by specifying multiple
@@ -406,7 +406,7 @@ methods for code to use are:
- ``set_break(break_flag=True)`` - if ``break_flag`` is ``True``, calls ``pdb.set_break()``
as this expression is about to be parsed
-- ``copy()`` - returns a copy of a ParserElement; can be used to use the same
+- ``copy()`` - returns a copy of a ``ParserElement``; can be used to use the same
parse expression in different places in a grammar, with different parse actions
attached to each; a short-form ``expr()`` is equivalent to ``expr.copy()``
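
For example (sketch)::

    import pyparsing as pp

    integer = pp.Word(pp.nums)
    meters = integer.copy().add_parse_action(lambda t: t[0] + "m")
    feet = integer().add_parse_action(lambda t: t[0] + "ft")   # expr() == expr.copy()

    print(meters.parse_string("10"))   # ['10m']
    print(feet.parse_string("10"))     # ['10ft']
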
@@ -415,7 +415,7 @@ methods for code to use are:
pyparsing module, rarely used by client code)
- ``set_whitespace_chars(chars)`` - define the set of chars to be ignored
- as whitespace before trying to match a specific ParserElement, in place of the
+ as whitespace before trying to match a specific ``ParserElement``, in place of the
default set of whitespace (space, tab, newline, and return)
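
For example, to keep newlines significant for one expression (sketch)::

    import pyparsing as pp

    word = pp.Word(pp.alphas)
    word_same_line = pp.Word(pp.alphas).set_whitespace_chars(" \t")

    print(pp.OneOrMore(word).parse_string("a b\nc"))            # ['a', 'b', 'c']
    print(pp.OneOrMore(word_same_line).parse_string("a b\nc"))  # ['a', 'b']
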
- ``set_default_whitespace_chars(chars)`` - class-level method to override
@@ -460,18 +460,24 @@ methods for code to use are:
Basic ParserElement subclasses
------------------------------
+.. _Literal:
+
- ``Literal`` - construct with a string to be matched exactly
+.. _CaselessLiteral:
+
- ``CaselessLiteral`` - construct with a string to be matched, but
without case checking; results are always returned as the
defining literal, NOT as they are found in the input string
-- ``Keyword`` - similar to Literal, but must be immediately followed by
+.. _Keyword:
+
+- ``Keyword`` - similar to Literal_, but must be immediately followed by
whitespace, punctuation, or other non-keyword characters; prevents
accidental matching of a non-keyword that happens to begin with a
defined keyword
-- ``CaselessKeyword`` - similar to Keyword, but with caseless matching
+- ``CaselessKeyword`` - similar to Keyword_, but with caseless matching
behavior
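
A quick sketch of the difference::

    from pyparsing import Literal, Keyword, ParseException

    print(Literal("if").parse_string("ifdef"))   # ['if'] - accidental match
    try:
        Keyword("if").parse_string("ifdef")
    except ParseException:
        print("Keyword('if') does not match 'ifdef'")
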
.. _Word:
@@ -479,30 +485,33 @@ Basic ParserElement subclasses
- ``Word`` - one or more contiguous characters; construct with a
string containing the set of allowed initial characters, and an
optional second string of allowed body characters; for instance,
- a common Word construct is to match a code identifier - in C, a
+ a common ``Word`` construct is to match a code identifier - in C, a
valid identifier must start with an alphabetic character or an
underscore ('_'), followed by a body that can also include numeric
digits. That is, ``a``, ``i``, ``MAX_LENGTH``, ``_a1``, ``b_109_``, and
``plan9FromOuterSpace``
are all valid identifiers; ``9b7z``, ``$a``, ``.section``, and ``0debug``
are not. To
- define an identifier using a Word, use either of the following:
+ define an identifier using a ``Word``, use either of the following::
- - ``Word(alphas+"_", alphanums+"_")``
+ Word(alphas+"_", alphanums+"_")
+ Word(srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]"))
- - ``Word(srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]"))``
+ Pyparsing also provides pre-defined strings ``identchars`` and
+ ``identbodychars`` so that you can also write::
+
+ Word(identchars, identbodychars)
If only one
string given, it specifies that the same character set defined
for the initial character is used for the word body; for instance, to
define an identifier that can only be composed of capital letters and
- underscores, use:
-
- - ``Word("ABCDEFGHIJKLMNOPQRSTUVWXYZ_")``
+ underscores, use one of::
- - ``Word(srange("[A-Z_]"))``
+ ``Word("ABCDEFGHIJKLMNOPQRSTUVWXYZ_")``
+ ``Word(srange("[A-Z_]"))``
- A Word may
+ A ``Word`` may
also be constructed with any of the following optional parameters:
- ``min`` - indicating a minimum length of matching characters
@@ -516,7 +525,7 @@ Basic ParserElement subclasses
Sometimes you want to define a word using all
characters in a range except for one or two of them; you can do this
with the new ``exclude_chars`` argument. This is helpful if you want to define
- a word with all printables except for a single delimiter character, such
+ a word with all ``printables`` except for a single delimiter character, such
as '.'. Previously, you would have to create a custom string to pass to Word.
With this change, you can just create ``Word(printables, exclude_chars='.')``.
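
For example (sketch)::

    import pyparsing as pp

    ip_field = pp.Word(pp.printables, exclude_chars=".")
    ip_address = pp.delimited_list(ip_field, delim=".")
    print(ip_address.parse_string("192.168.0.1"))   # ['192', '168', '0', '1']
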
@@ -551,7 +560,9 @@ Basic ParserElement subclasses
- ``unquote_results`` - boolean indicating whether the matched text should be unquoted (default=True)
- - ``end_quote_char`` - string of one or more characters defining the end of the quote delimited string (default=None => same as quote_char)
+ - ``end_quote_char`` - string of one or more characters defining the end of the quote delimited string (default=None => same as ``quote_char``)
+
+.. _SkipTo:
- ``SkipTo`` - skips ahead in the input string, accepting any
characters up to the specified pattern; may be constructed with
@@ -564,11 +575,12 @@ Basic ParserElement subclasses
to prevent false matches
- ``fail_on`` - if a literal string or expression is given for this argument, it defines an expression that
- should cause the ``SkipTo`` expression to fail, and not skip over that expression
+ should cause the SkipTo_ expression to fail, and not skip over that expression
``SkipTo`` can also be written using ``...``::
LBRACE, RBRACE = map(Literal, "{}")
+
brace_expr = LBRACE + SkipTo(RBRACE) + RBRACE
# can also be written as
brace_expr = LBRACE + ... + RBRACE
@@ -585,16 +597,18 @@ Basic ParserElement subclasses
- ``Empty`` - a null expression, requiring no characters - will always
match; useful for debugging and for specialized grammars
-- ``NoMatch`` - opposite of Empty, will never match; useful for debugging
+- ``NoMatch`` - opposite of ``Empty``, will never match; useful for debugging
and for specialized grammars
Expression subclasses
---------------------
+.. _And:
+
- ``And`` - construct with a list of ``ParserElements``, all of which must
- match for And to match; can also be created using the '+'
- operator; multiple expressions can be Anded together using the '*'
+ match for ``And`` to match; can also be created using the '+'
+ operator; multiple expressions can be ``Anded`` together using the '*'
operator as in::
ip_address = Word(nums) + ('.' + Word(nums)) * 3
@@ -618,18 +632,24 @@ Expression subclasses
the location where the incoming text does not match the specified
grammar.
+.. _Or:
+
- ``Or`` - construct with a list of ``ParserElements``, any of which must
- match for Or to match; if more than one expression matches, the
+ match for ``Or`` to match; if more than one expression matches, the
expression that makes the longest match will be used; can also
be created using the '^' operator
+.. _MatchFirst:
+
- ``MatchFirst`` - construct with a list of ``ParserElements``, any of
- which must match for MatchFirst to match; matching is done
+ which must match for ``MatchFirst`` to match; matching is done
left-to-right, taking the first expression that matches; can
also be created using the '|' operator
-- ``Each`` - similar to ``And``, in that all of the provided expressions
- must match; however, Each permits matching to be done in any order;
+.. _Each:
+
+- ``Each`` - similar to And_, in that all of the provided expressions
+ must match; however, ``Each`` permits matching to be done in any order;
can also be created using the '&' operator
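
For example, a sketch of ``Each`` matching fields in any order::

    import pyparsing as pp

    color = (pp.Keyword("red") | pp.Keyword("blue"))("color")
    size = pp.Word(pp.nums)("size")
    spec = color & size

    print(spec.parse_string("blue 12")["size"])    # 12
    print(spec.parse_string("12 blue")["color"])   # blue
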
- ``Opt`` - construct with a ``ParserElement``, but this element is
@@ -652,6 +672,8 @@ Expression subclasses
- ``FollowedBy`` - a lookahead expression, requires matching of the given
expressions, but does not advance the parsing position within the input string
+.. _NotAny:
+
- ``NotAny`` - a negative lookahead expression, prevents matching of named
expressions, does not advance the parsing position within the input string;
can also be created using the unary '~' operator
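
For example (sketch)::

    from pyparsing import Keyword, Word, alphas

    identifier = ~Keyword("end") + Word(alphas)   # any word except the keyword 'end'
    print(identifier.parse_string("endgame"))     # ['endgame']
    # identifier.parse_string("end") raises a ParseException
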
@@ -662,31 +684,31 @@ Expression subclasses
Expression operators
--------------------
-- ``~`` - creates ``NotAny`` using the expression after the operator
+- ``+`` - creates And_ using the expressions before and after the operator
-- ``+`` - creates ``And`` using the expressions before and after the operator
+- ``|`` - creates MatchFirst_ (first left-to-right match) using the expressions before and after the operator
-- ``|`` - creates ``MatchFirst`` (first left-to-right match) using the expressions before and after the operator
+- ``^`` - creates Or_ (longest match) using the expressions before and after the operator
-- ``^`` - creates ``Or`` (longest match) using the expressions before and after the operator
+- ``&`` - creates Each_ using the expressions before and after the operator
-- ``&`` - creates ``Each`` using the expressions before and after the operator
-
-- ``*`` - creates ``And`` by multiplying the expression by the integer operand; if
- expression is multiplied by a 2-tuple, creates an ``And`` of (min,max)
- expressions (similar to "{min,max}" form in regular expressions); if
- min is None, interpret as (0,max); if max is None, interpret as
+- ``*`` - creates And_ by multiplying the expression by the integer operand; if
+ expression is multiplied by a 2-tuple, creates an And_ of ``(min,max)``
+ expressions (similar to ``{min,max}`` form in regular expressions); if
+ ``min`` is ``None``, interpret as ``(0,max)``; if ``max`` is ``None``, interpret as
``expr*min + ZeroOrMore(expr)``
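
For example (sketch)::

    import pyparsing as pp

    word = pp.Word(pp.alphas)

    two_to_four = word * (2, 4)      # between 2 and 4 words
    print(two_to_four.parse_string("a b c"))       # ['a', 'b', 'c']

    at_least_two = word * (2, None)  # equivalent to word*2 + ZeroOrMore(word)
    print(at_least_two.parse_string("a b c d e"))  # ['a', 'b', 'c', 'd', 'e']
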
- ``-`` - like ``+`` but with no backup and retry of alternatives
-- ``==`` - matching expression to string; returns True if the string matches the given expression
+- ``~`` - creates NotAny_ using the expression after the operator
+
+- ``==`` - matching expression to string; returns ``True`` if the string matches the given expression
- ``<<=`` - inserts the expression following the operator as the body of the
- Forward expression before the operator (``<<`` can also be used, but ``<<=`` is preferred
+ ``Forward`` expression before the operator (``<<`` can also be used, but ``<<=`` is preferred
to avoid operator precedence misinterpretation of the pyparsing expression)
-- ``...`` - inserts a ``SkipTo`` expression leading to the next expression, as in
+- ``...`` - inserts a SkipTo_ expression leading to the next expression, as in
``Keyword("start") + ... + Keyword("end")``.
- ``[min, max]`` - specifies repetition similar to ``*`` with ``min`` and ``max`` specified
@@ -717,7 +739,7 @@ Converter subclasses
--------------------
- ``Combine`` - joins all matched tokens into a single string, using
- specified join_string (default ``join_string=""``); expects
+ specified ``join_string`` (default ``join_string=""``); expects
all matching tokens to be adjacent, with no intervening
whitespace (can be overridden by specifying ``adjacent=False`` in constructor)
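
For example (sketch)::

    from pyparsing import Combine, Word, nums

    real = Combine(Word(nums) + "." + Word(nums))
    print(real.parse_string("3.14159"))   # ['3.14159']
    # without Combine, the result would be ['3', '.', '14159']
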
@@ -734,13 +756,9 @@ Special subclasses
break up matched tokens into groups for each repeated pattern
- ``Dict`` - like ``Group``, but also constructs a dictionary, using the
- [0]'th elements of all enclosed token lists as the keys, and
+ ``[0]``'th elements of all enclosed token lists as the keys, and
each token list as the value
-- ``SkipTo`` - catch-all matching expression that accepts all characters
- up until the given pattern is found to match; useful for specifying
- incomplete grammars
-
- ``Forward`` - placeholder token used to define recursive token
patterns; when defining the actual expression later in the
program, insert it into the ``Forward`` object using the ``<<=``
@@ -753,18 +771,18 @@ Other classes
- ``ParseResults`` - class used to contain and manage the lists of tokens
created from parsing the input using the user-defined parse
- expression. ParseResults can be accessed in a number of ways:
+ expression. ``ParseResults`` can be accessed in a number of ways:
- as a list
- - total list of elements can be found using len()
+ - total list of elements can be found using ``len()``
- - individual elements can be found using [0], [1], [-1], etc.,
+ - individual elements can be found using ``[0], [1], [-1],`` etc.,
or retrieved using slices
- elements can be deleted using ``del``
- - the -1th element can be extracted and removed in a single operation
+ - the ``-1``th element can be extracted and removed in a single operation
using ``pop()``, or any element can be extracted and removed
using ``pop(n)``
@@ -774,8 +792,8 @@ Other classes
overall parse expression, then these fields can be referenced
as dictionary elements or as attributes
- - the Dict class generates dictionary entries using the data of the
- input text - in addition to ParseResults listed as ``[ [ a1, b1, c1, ...], [ a2, b2, c2, ...] ]``
+ - the ``Dict`` class generates dictionary entries using the data of the
+ input text - in addition to ParseResults_ listed as ``[ [ a1, b1, c1, ...], [ a2, b2, c2, ...] ]``
it also acts as a dictionary with entries defined as ``{ a1 : [ b1, c1, ... ] }, { a2 : [ b2, c2, ... ] }``;
this is especially useful when processing tabular data where the first column contains a key
value for that line of data
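
For instance, a sketch of this tabular usage::

    import pyparsing as pp

    row = pp.Group(pp.Word(pp.alphas) + pp.Word(pp.nums) + pp.Word(pp.nums))
    table = pp.Dict(pp.OneOrMore(row))

    result = table.parse_string("red 255 0  green 0 255")
    print(result["red"].as_list())   # ['255', '0']
    print(result.green.as_list())    # ['0', '255']
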
@@ -786,11 +804,12 @@ Other classes
- supports ``get()``, ``items()`` and ``keys()`` methods, similar to a dictionary
- a keyed item can be extracted and removed using ``pop(key)``. Here
- key must be non-numeric (such as a string), in order to use dict
+ ``key`` must be non-numeric (such as a string), in order to use dict
extraction instead of list extraction.
- new named elements can be added (in a parse action, for instance), using the same
- syntax as adding an item to a dict (``parse_results["X"] = "new item"``); named elements can be removed using ``del parse_results["X"]``
+ syntax as adding an item to a dict (``parse_results["X"] = "new item"``);
+ named elements can be removed using ``del parse_results["X"]``
- as a nested list
@@ -803,13 +822,13 @@ Other classes
- named elements can be accessed as if they were attributes of an object:
if an element is referenced that does not exist, it will return ``""``.
- ParseResults can also be converted to an ordinary list of strings
+ ParseResults_ can also be converted to an ordinary list of strings
by calling ``as_list()``. Note that this will strip the results of any
field names that have been defined for any embedded parse elements.
(The ``pprint`` module is especially good at printing out the nested contents
given by ``as_list()``.)
- Finally, ParseResults can be viewed by calling ``dump()``. ``dump()`` will first show
+ Finally, ParseResults_ can be viewed by calling ``dump()``. ``dump()`` will first show
the ``as_list()`` output, followed by an indented structure listing parsed tokens that
have been assigned results names.
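
For example (sketch; the exact ``dump()`` layout shown is indicative)::

    import pyparsing as pp

    date = pp.Word(pp.nums)("year") + "/" + pp.Word(pp.nums)("month")
    result = date.parse_string("1999/12")

    print(result.as_list())   # ['1999', '/', '12']
    print(result.dump())
    # ['1999', '/', '12']
    # - month: '12'
    # - year: '1999'
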
@@ -845,7 +864,7 @@ Exception classes and Troubleshooting
.. _ParseException:
- ``ParseException`` - exception returned when a grammar parse fails;
- ParseExceptions have attributes loc, msg, line, lineno, and column; to view the
+ ``ParseExceptions`` have attributes ``loc``, ``msg``, ``line``, ``lineno``, and ``column``; to view the
text line and location where the reported ParseException occurs, use::
except ParseException as err:
@@ -853,6 +872,11 @@ Exception classes and Troubleshooting
print(" " * (err.column - 1) + "^")
print(err)
+ ``ParseExceptions`` also have an ``explain()`` method that gives this same information::
+
+ except ParseException as err:
+ print(err.explain())
+
- ``RecursiveGrammarException`` - exception returned by ``validate()`` if
the grammar contains a recursive infinite loop, such as::
@@ -866,7 +890,7 @@ Exception classes and Troubleshooting
- ``ParseSyntaxException`` - subclass of ``ParseFatalException`` raised when a
syntax error is found, based on the use of the '-' operator when defining
- a sequence of expressions in an ``And`` expression.
+ a sequence of expressions in an And_ expression.
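
For example (sketch)::

    import pyparsing as pp

    LPAR, RPAR = map(pp.Suppress, "()")
    func_call = pp.Word(pp.alphas) + LPAR - pp.Word(pp.nums) + RPAR

    try:
        func_call.parse_string("sqrt(x)")
    except pp.ParseSyntaxException as err:
        print(err)   # error is reported at 'x', with no backtracking
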
- You can also get some insights into the parsing logic using diagnostic parse actions,
and ``set_debug()``, or test the matching of expression fragments by testing them using
@@ -876,7 +900,7 @@ Exception classes and Troubleshooting
one of the following enum values defined in ``pyparsing.Diagnostics``
- ``warn_multiple_tokens_in_named_alternation`` - flag to enable warnings when a results
- name is defined on a ``MatchFirst`` or ``Or`` expression with one or more ``And`` subexpressions
+ name is defined on a MatchFirst_ or Or_ expression with one or more And_ subexpressions
- ``warn_ungrouped_named_tokens_in_collection`` - flag to enable warnings when a results
name is defined on a containing expression with ungrouped subexpressions that also
@@ -930,22 +954,22 @@ Helper methods
parse results - the leading integer is suppressed from the results (although it
is easily reconstructed by using len on the returned array).
-- ``one_of(string, caseless=False, as_keyword=False)`` - convenience function for quickly declaring an
- alternative set of ``Literal`` expressions, by splitting the given string on
- whitespace boundaries. The expressions are sorted so that longer
- matches are attempted first; this ensures that a short expressions does
+- ``one_of(choices, caseless=False, as_keyword=False)`` - convenience function for quickly declaring an
+ alternative set of Literal_ expressions. ``choices`` can be passed as a list of strings
+ or as a single string of values separated by spaces. The values are sorted so that longer
+ matches are attempted first; this ensures that a short value does
not mask a longer one that starts with the same characters. If ``caseless=True``,
- will create an alternative set of CaselessLiteral tokens. If ``as_keyword=True``,
- ``one_of`` will declare ``Keyword`` expressions instead of ``Literal`` expressions.
+ will create an alternative set of CaselessLiteral_ tokens. If ``as_keyword=True``,
+ ``one_of`` will declare Keyword_ expressions instead of Literal_ expressions.
-- ``dict_off(key, value)`` - convenience function for quickly declaring a
+- ``dict_of(key, value)`` - convenience function for quickly declaring a
dictionary pattern of ``Dict(ZeroOrMore(Group(key + value)))``.
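
For example (sketch)::

    import pyparsing as pp

    compare_op = pp.one_of("< > <= >= == !=")   # '<=' is tried before '<'
    print(compare_op.parse_string("<="))        # ['<=']

    attr_dict = pp.dict_of(pp.Word(pp.alphas), pp.Word(pp.nums))
    print(attr_dict.parse_string("width 80 height 24").as_dict())
    # {'width': '80', 'height': '24'}
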
- ``make_html_tags(tag_str)`` and ``make_xml_tags(tag_str)`` - convenience
functions to create definitions of opening and closing tag expressions. Returns
a pair of expressions, for the corresponding ``<tag>`` and ``</tag>`` strings. Includes
support for attributes in the opening tag, such as ``<tag attr1="abc">`` - attributes
- are returned as named results in the returned ParseResults. ``make_html_tags`` is less
+ are returned as named results in the returned ParseResults_. ``make_html_tags`` is less
restrictive than ``make_xml_tags``, especially with respect to case sensitivity.
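
For example (sketch; the URL is illustrative)::

    import pyparsing as pp

    a_start, a_end = pp.make_html_tags("a")
    link = a_start + pp.SkipTo(a_end)("text") + a_end

    result = link.parse_string('<a href="https://example.com">click here</a>')
    print(result.href)   # https://example.com
    print(result.text)   # click here
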
- ``infix_notation(base_operand, operator_list)`` -
@@ -963,8 +987,8 @@ Helper methods
``(operand_expr, num_operands, right_left_assoc, parse_action)``, where:
- ``operand_expr`` - the pyparsing expression for the operator;
- may also be a string, which will be converted to a Literal; if
- None, indicates an empty operator, such as the implied
+ may also be a string, which will be converted to a Literal_; if
+ ``None``, indicates an empty operator, such as the implied
multiplication operation between 'm' and 'x' in "y = mx + b".
- ``num_operands`` - the number of terms for this operator (must
@@ -984,7 +1008,7 @@ Helper methods
this expression to parse input strings, or incorporate it
into a larger, more complex grammar.
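
For example, a sketch of a small arithmetic parser::

    import pyparsing as pp

    operand = pp.pyparsing_common.integer
    arith = pp.infix_notation(
        operand,
        [
            (pp.one_of("* /"), 2, pp.opAssoc.LEFT),
            (pp.one_of("+ -"), 2, pp.opAssoc.LEFT),
        ],
    )
    print(arith.parse_string("9 + 2 * 3"))   # [[9, '+', [2, '*', 3]]]
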
-- ``match_previous_literal`` and ``match_previous_expr`` - function to define and
+- ``match_previous_literal`` and ``match_previous_expr`` - functions to define an
expression that matches the same content
as was parsed in a previous parse expression. For instance::
@@ -1011,7 +1035,7 @@ Helper methods
- ``content`` - expression for items within the nested lists (default=None)
- - ``ignore_expr`` - expression for ignoring opening and closing delimiters (default=quoted_string)
+ - ``ignore_expr`` - expression for ignoring opening and closing delimiters (default=``quoted_string``)
If an expression is not provided for the content argument, the nested
expression will capture all whitespace-delimited content between delimiters
@@ -1019,10 +1043,10 @@ Helper methods
Use the ``ignore_expr`` argument to define expressions that may contain
opening or closing characters that should not be treated as opening
- or closing characters for nesting, such as quoted_string or a comment
- expression. Specify multiple expressions using an Or or MatchFirst.
- The default is quoted_string, but if no expressions are to be ignored,
- then pass None for this argument.
+ or closing characters for nesting, such as ``quoted_string`` or a comment
+ expression. Specify multiple expressions using an Or_ or MatchFirst_.
+ The default is ``quoted_string``, but if no expressions are to be ignored,
+ then pass ``None`` for this argument.
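
For example (sketch)::

    import pyparsing as pp

    parens = pp.nested_expr("(", ")")
    print(parens.parse_string("(a (b c) d)"))
    # [['a', ['b', 'c'], 'd']]

    # a ')' inside a quoted string does not close the nesting, because
    # quoted_string is ignored for nesting purposes by default
    print(parens.parse_string('(print ")" done)'))
    # [['print', '")"', 'done']]
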
- ``IndentedBlock(statement_expr, recursive=True)`` -
@@ -1046,8 +1070,8 @@ Helper methods
full_name = original_text_for(Word(alphas) + Word(alphas))
- ``ungroup(expr)`` - function to "ungroup" returned tokens; useful
- to undo the default behavior of And to always group the returned tokens, even
- if there is only one in the list. (New in 1.5.6)
+ to undo the default behavior of And_ to always group the returned tokens, even
+ if there is only one in the list.
- ``lineno(loc, string)`` - function to give the line number of the
location within the string; the first line is line 1, newlines
@@ -1064,7 +1088,7 @@ Helper methods
- ``srange(range_spec)`` - function to define a string of characters,
given a string of the form used by regexp string ranges, such as ``"[0-9]"`` for
all numeric digits, ``"[A-Z_]"`` for uppercase characters plus underscore, and
- so on (note that range_spec does not include support for generic regular
+ so on (note that ``range_spec`` does not include support for generic regular
expressions, just string range specs)
- ``trace_parse_action(fn)`` - decorator function to debug parse actions. Lists
@@ -1079,14 +1103,14 @@ Helper parse actions
useful to remove the delimiting quotes from quoted strings
- ``replace_with(repl_string)`` - returns a parse action that simply returns the
- repl_string; useful when using transform_string, or converting HTML entities, as in::
+ ``repl_string``; useful when using ``transform_string``, or converting HTML entities, as in::
nbsp = Literal("&nbsp;").set_parse_action(replace_with("<BLANK>"))
-- ``keepOriginalText``- (deprecated, use original_text_for_ instead) restores any internal whitespace or suppressed
+- ``original_text_for`` - restores any internal whitespace or suppressed
text within the tokens for a matched parse
expression. This is especially useful when defining expressions
- for scan_string or transform_string applications.
+ for ``scan_string`` or ``transform_string`` applications.
- ``with_attribute(*args, **kwargs)`` - helper to create a validating parse action to be used with start tags created
with ``make_xml_tags`` or ``make_html_tags``. Use ``with_attribute`` to qualify a starting tag
@@ -1110,7 +1134,7 @@ Helper parse actions
- ``match_only_at_col(column_number)`` - a parse action that verifies that
an expression was matched at a particular column, raising a
- ParseException if matching at a different column number; useful when parsing
+ ``ParseException`` if matching at a different column number; useful when parsing
tabular data
@@ -1165,17 +1189,13 @@ To generate a railroad diagram in pyparsing, you first have to install pyparsing
To do this, just run ``pip install pyparsing[diagrams]``, and make sure you add ``pyparsing[diagrams]`` to any
``setup.py`` or ``requirements.txt`` that specifies pyparsing as a dependency.
-Next, run ``pyparsing.diagrams.to_railroad`` to convert your grammar into a form understood by the
-`railroad-diagrams <https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md>`_ module, and
-then ``pyparsing.diagrams.railroad_to_html`` to convert that into an HTML document. For example::
+Create your parser as you normally would. Then call ``create_diagram()``, passing the name of an output HTML file::
- from pyparsing.diagram import to_railroad, railroad_to_html
+ street_address = Word(nums).set_name("house_number") + Word(alphas)[1, ...].set_name("street_name")
+ street_address.set_name("street_address")
+ street_address.create_diagram("street_address_diagram.html")
- with open('output.html', 'w') as fp:
- railroad = to_railroad(my_grammar)
- fp.write(railroad_to_html(railroad))
-
-This will result in the railroad diagram being written to ``output.html``
+This will result in the railroad diagram being written to ``street_address_diagram.html``.
Example
-------
@@ -1185,12 +1205,23 @@ SQL SELECT statements <_static/sql_railroad.html>`_.
Customization
-------------
You can customize the resulting diagram in a few ways.
+To do so, run ``pyparsing.diagrams.to_railroad`` to convert your grammar into a form understood by the
+`railroad-diagrams <https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md>`_ module, and
+then ``pyparsing.diagrams.railroad_to_html`` to convert that into an HTML document. For example::
+
+ from pyparsing.diagram import to_railroad, railroad_to_html
+
+ with open('output.html', 'w') as fp:
+ railroad = to_railroad(my_grammar)
+ fp.write(railroad_to_html(railroad))
+
+This will result in the railroad diagram being written to ``output.html``.
-Firstly, you can pass in additional keyword arguments to ``pyparsing.diagrams.to_railroad``, which will be passed
+You can then pass in additional keyword arguments to ``pyparsing.diagrams.to_railroad``, which will be passed
into the ``Diagram()`` constructor of the underlying library,
`as explained here <https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md#diagrams>`_.
-Secondly, you can edit global options in the underlying library, by editing constants::
+In addition, you can edit global options in the underlying library, by editing constants::
from pyparsing.diagram import to_railroad, railroad_to_html
import railroad