author     ptmcg <ptmcg@austin.rr.com>    2021-09-02 17:54:59 -0500
committer  ptmcg <ptmcg@austin.rr.com>    2021-09-02 17:54:59 -0500
commit     11fda2880df71ce6661807b3b5921bc09bd6e003 (patch)
tree       b4a2b54e574382c5c2bf2330adea62cfa7d86872
parent     1ed653a54dec0c0a81ba39139cfa4503a13fa956 (diff)
download   pyparsing-git-11fda2880df71ce6661807b3b5921bc09bd6e003.tar.gz
Docs cleanup
-rw-r--r--  docs/HowToUsePyparsing.rst  | 243
-rw-r--r--  docs/whats_new_in_3_0_0.rst |  28
2 files changed, 151 insertions(+), 120 deletions(-)
diff --git a/docs/HowToUsePyparsing.rst b/docs/HowToUsePyparsing.rst
index 61a0580..4fe8cf1 100644
--- a/docs/HowToUsePyparsing.rst
+++ b/docs/HowToUsePyparsing.rst
@@ -61,8 +61,8 @@ To parse an incoming data string, the client code must follow these steps:
    When token matches occur, any defined parse action methods are called.
 
-3. Process the parsed results, returned as a ParseResults object.
-   The ParseResults object can be accessed as if it were a list of
+3. Process the parsed results, returned as a ParseResults_ object.
+   The ParseResults_ object can be accessed as if it were a list of
    strings. Matching results may also be accessed as named attributes of
    the returned results, if names are defined in the definition of the token
    pattern, using ``set_results_name()``.
@@ -71,7 +71,7 @@ To parse an incoming data string, the client code must follow these steps:
 Hello, World!
 -------------
 
-The following complete Python program will parse the greeting "Hello, World!",
+The following complete Python program will parse the greeting ``"Hello, World!"``,
 or any other greeting of the form "<salutation>, <addressee>!"::
 
     import pyparsing as pp
@@ -106,8 +106,8 @@ Usage notes
 - To keep up the readability of your code, use operators_ such as ``+``, ``|``,
   ``^``, and ``~`` to combine expressions.  You can also combine
-  string literals with ParseExpressions - they will be
-  automatically converted to Literal objects.  For example::
+  string literals with ``ParseExpressions`` - they will be
+  automatically converted to Literal_ objects.  For example::
 
     integer = Word(nums)  # simple unsigned integer
     variable = Char(alphas)  # single letter variable, such as x, z, m, etc.
@@ -188,18 +188,18 @@ Usage notes
   occurrences.  If this behavior is desired, then write
   ``expr[..., n] + ~expr``.
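The "Hello, World!" grammar touched by the hunks above can be exercised directly; this is a minimal sketch assuming pyparsing 3.x and its snake_case API, illustrating how string literals combine with parse expressions and are auto-converted to ``Literal`` objects:

```python
import pyparsing as pp

# names attached with expr("name") become attributes of the results
salutation = pp.Word(pp.alphas)("salutation")
addressee = pp.Word(pp.alphas)("addressee")

# "," and "!" are plain strings, auto-converted to Literal objects
greeting = salutation + "," + addressee + "!"

result = greeting.parse_string("Hello, World!")
print(result.as_list())                      # list-style access
print(result.salutation, result.addressee)   # named-attribute access
```

The same grammar accepts any "<salutation>, <addressee>!" input, e.g. ``greeting.parse_string("Howdy, Pardner!")``.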
-- ``MatchFirst`` expressions are matched left-to-right, and the first
+- MatchFirst_ expressions are matched left-to-right, and the first
   match found will skip all later expressions within, so be sure
   to define less-specific patterns after more-specific patterns.
-  If you are not sure which expressions are most specific, use Or
+  If you are not sure which expressions are most specific, use Or_
   expressions (defined using the ``^`` operator) - they will always
   match the longest expression, although they are more
   compute-intensive.
 
-- ``Or`` expressions will evaluate all of the specified subexpressions
+- Or_ expressions will evaluate all of the specified subexpressions
   to determine which is the "best" match, that is, which matches
   the longest string in the input data.  In case of a tie, the
-  left-most expression in the ``Or`` list will win.
+  left-most expression in the Or_ list will win.
 
 - If parsing the contents of an entire file, pass it to the
   ``parse_file`` method using::
@@ -252,8 +252,8 @@ Usage notes
 - Be careful when defining parse actions that modify global variables or
   data structures (as in fourFn.py_), especially for low level tokens
-  or expressions that may occur within an ``And`` expression; an early element
-  of an ``And`` may match, but the overall expression may fail.
+  or expressions that may occur within an And_ expression; an early element
+  of an And_ may match, but the overall expression may fail.
 
 
 Classes
@@ -269,7 +269,7 @@ methods for code to use are:
   matching pattern; returns a ParseResults_ object that makes the
   matched tokens available as a list, and optionally as a dictionary,
   or as an object with named attributes; if ``parse_all`` is set to True, then
-  parse_string will raise a ParseException if the grammar does not process
+  ``parse_string`` will raise a ParseException_ if the grammar does not process
   the complete input string.
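The ``MatchFirst`` vs. ``Or`` caution in the hunk above can be demonstrated concretely; a sketch assuming pyparsing 3.x, with hypothetical ``int_num``/``real_num`` names:

```python
import pyparsing as pp

int_num = pp.Word(pp.nums)
real_num = pp.Combine(pp.Word(pp.nums) + "." + pp.Word(pp.nums))

# MatchFirst ('|') takes the first left-to-right match, so the more
# specific pattern must come first; Or ('^') always takes the longest.
number_first = real_num | int_num
number_longest = int_num ^ real_num

print(number_first.parse_string("3.14159")[0])
print(number_longest.parse_string("3.14159")[0])

# misordered MatchFirst stops after the shorter, less specific match
print((int_num | real_num).parse_string("3.14159")[0])
```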
 - ``parse_file(source_file)`` - a convenience function, that accepts an
@@ -348,12 +348,12 @@ methods for code to use are:
   to tokens matching the element; if multiple tokens within
   a repetition group (such as ``ZeroOrMore`` or ``delimited_list``) the
-  default is to return only the last matching token - if list_all_matches
+  default is to return only the last matching token - if ``list_all_matches``
   is set to True, then a list of all the matching tokens is returned.
 
   ``expr.set_results_name("key")`` can also be written ``expr("key")``
   (a results name with a trailing '*' character will be
-  interpreted as setting ``list_all_matches`` to True).
+  interpreted as setting ``list_all_matches`` to ``True``).
 
   Note:
   ``set_results_name`` returns a *copy* of the element so that a single
@@ -373,9 +373,9 @@ methods for code to use are:
   Parse actions can have any of the following signatures::
 
-    fn(s, loc, tokens)
-    fn(loc, tokens)
-    fn(tokens)
+    fn(s: str, loc: int, tokens: ParseResults)
+    fn(loc: int, tokens: ParseResults)
+    fn(tokens: ParseResults)
     fn()
 
   Multiple functions can be attached to a ``ParserElement`` by specifying multiple
@@ -406,7 +406,7 @@ methods for code to use are:
 - ``set_break(break_flag=True)`` - if ``break_flag`` is ``True``, calls
   ``pdb.set_break()`` as this expression is about to be parsed
 
-- ``copy()`` - returns a copy of a ParserElement; can be used to use the same
+- ``copy()`` - returns a copy of a ``ParserElement``; can be used to use the same
   parse expression in different places in a grammar, with different parse actions
   attached to each; a short-form ``expr()`` is equivalent to ``expr.copy()``
@@ -415,7 +415,7 @@ methods for code to use are:
   pyparsing module, rarely used by client code)
 
 - ``set_whitespace_chars(chars)`` - define the set of chars to be ignored
-  as whitespace before trying to match a specific ParserElement, in place of the
+  as whitespace before trying to match a specific ``ParserElement``, in place of the
   default set of whitespace (space, tab, newline, and return)
 
 - ``set_default_whitespace_chars(chars)`` - class-level method to override
@@ -460,18 +460,24 @@ methods for code to use are:
 Basic ParserElement subclasses
 ------------------------------
 
+.. _Literal:
+
 - ``Literal`` - construct with a string to be matched exactly
 
+.. _CaselessLiteral:
+
 - ``CaselessLiteral`` - construct with a string to be matched, but without
   case checking; results are always returned as the defining literal, NOT
   as they are found in the input string
 
-- ``Keyword`` - similar to Literal, but must be immediately followed by
+.. _Keyword:
+
+- ``Keyword`` - similar to Literal_, but must be immediately followed by
   whitespace, punctuation, or other non-keyword characters; prevents
   accidental matching of a non-keyword that happens to begin with a
   defined keyword
 
-- ``CaselessKeyword`` - similar to Keyword, but with caseless matching
+- ``CaselessKeyword`` - similar to Keyword_, but with caseless matching
   behavior
 
 .. _Word:
 
@@ -479,30 +485,33 @@ Basic ParserElement subclasses
 - ``Word`` - one or more contiguous characters; construct with a
   string containing the set of allowed initial characters, and an
   optional second string of allowed body characters; for instance,
-  a common Word construct is to match a code identifier - in C, a
+  a common ``Word`` construct is to match a code identifier - in C, a
   valid identifier must start with an alphabetic character or an
   underscore ('_'), followed by a body that can also include numeric
   digits.  That is, ``a``, ``i``, ``MAX_LENGTH``, ``_a1``, ``b_109_``, and
   ``plan9FromOuterSpace`` are all valid identifiers; ``9b7z``, ``$a``,
   ``.section``, and ``0debug`` are not.  To
-  define an identifier using a Word, use either of the following:
+  define an identifier using a ``Word``, use either of the following::
 
-  - ``Word(alphas+"_", alphanums+"_")``
+      Word(alphas+"_", alphanums+"_")
+      Word(srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]"))
 
-  - ``Word(srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]"))``
+  Pyparsing also provides pre-defined strings ``identchars`` and
+  ``identbodychars`` so that you can also write::
+
+      Word(identchars, identbodychars)
 
   If only one string given, it specifies that the same character set
   defined for the initial character is used for the word body; for
   instance, to define an identifier that can only be composed of capital
   letters and
-  underscores, use:
-
-  - ``Word("ABCDEFGHIJKLMNOPQRSTUVWXYZ_")``
+  underscores, use one of::
 
-  - ``Word(srange("[A-Z_]"))``
+      ``Word("ABCDEFGHIJKLMNOPQRSTUVWXYZ_")``
+      ``Word(srange("[A-Z_]"))``
 
-  A Word may
+  A ``Word`` may
   also be constructed with any of the following optional parameters:
 
   - ``min`` - indicating a minimum length of matching characters
@@ -516,7 +525,7 @@ Basic ParserElement subclasses
   Sometimes you want to define a word using all
   characters in a range except for one or two of them; you can do this
   with the new ``exclude_chars`` argument. This is helpful if you want to define
-  a word with all printables except for a single delimiter character, such
+  a word with all ``printables`` except for a single delimiter character, such
   as '.'. Previously, you would have to create a custom string to pass to Word.
   With this change, you can just create ``Word(printables, exclude_chars='.')``.
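The identifier definitions the hunk above converts to literal blocks are all equivalent; a sketch assuming pyparsing 3.x (where ``identchars``/``identbodychars`` are available, as the added lines state):

```python
import pyparsing as pp

# three equivalent identifier definitions
ident1 = pp.Word(pp.alphas + "_", pp.alphanums + "_")
ident2 = pp.Word(pp.srange("[a-zA-Z_]"), pp.srange("[a-zA-Z0-9_]"))
ident3 = pp.Word(pp.identchars, pp.identbodychars)

for ident in (ident1, ident2, ident3):
    print(ident.parse_string("plan9FromOuterSpace")[0])

# '9b7z' is rejected - digits are not in the initial character set
try:
    ident3.parse_string("9b7z")
except pp.ParseException as pe:
    print("rejected at loc", pe.loc)
```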
@@ -551,7 +560,9 @@ Basic ParserElement subclasses
   - ``unquote_results`` - boolean indicating whether the matched text
     should be unquoted (default=True)
 
-  - ``end_quote_char`` - string of one or more characters defining the end of the quote delimited string (default=None => same as quote_char)
+  - ``end_quote_char`` - string of one or more characters defining the end of the quote delimited string (default=None => same as ``quote_char``)
+
+.. _SkipTo:
 
 - ``SkipTo`` - skips ahead in the input string, accepting any
   characters up to the specified pattern; may be constructed with
@@ -564,11 +575,12 @@ Basic ParserElement subclasses
     to prevent false matches
 
   - ``fail_on`` - if a literal string or expression is given for this argument, it defines an expression that
-    should cause the ``SkipTo`` expression to fail, and not skip over that expression
+    should cause the SkipTo_ expression to fail, and not skip over that expression
 
   ``SkipTo`` can also be written using ``...``::
 
     LBRACE, RBRACE = map(Literal, "{}")
+
     brace_expr = LBRACE + SkipTo(RBRACE) + RBRACE
 
     # can also be written as
     brace_expr = LBRACE + ... + RBRACE
@@ -585,16 +597,18 @@ Basic ParserElement subclasses
 - ``Empty`` - a null expression, requiring no characters - will always
   match; useful for debugging and for specialized grammars
 
-- ``NoMatch`` - opposite of Empty, will never match; useful for debugging
+- ``NoMatch`` - opposite of ``Empty``, will never match; useful for debugging
   and for specialized grammars
 
 
 Expression subclasses
 ---------------------
 
+.. _And:
+
 - ``And`` - construct with a list of ``ParserElements``, all of which must
-  match for And to match; can also be created using the '+'
-  operator; multiple expressions can be Anded together using the '*'
+  match for ``And`` to match; can also be created using the '+'
+  operator; multiple expressions can be ``Anded`` together using the '*'
   operator as in::
 
     ip_address = Word(nums) + ('.' + Word(nums)) * 3
@@ -618,18 +632,24 @@ Expression subclasses
   the location where the incoming text does not match the specified
   grammar.
 
+.. _Or:
+
 - ``Or`` - construct with a list of ``ParserElements``, any of which must
-  match for Or to match; if more than one expression matches, the
+  match for ``Or`` to match; if more than one expression matches, the
   expression that makes the longest match will be used; can also
   be created using the '^' operator
 
+.. _MatchFirst:
+
 - ``MatchFirst`` - construct with a list of ``ParserElements``, any of
-  which must match for MatchFirst to match; matching is done
+  which must match for ``MatchFirst`` to match; matching is done
   left-to-right, taking the first expression that matches; can also
   be created using the '|' operator
 
-- ``Each`` - similar to ``And``, in that all of the provided expressions
-  must match; however, Each permits matching to be done in any order;
+.. _Each:
+
+- ``Each`` - similar to And_, in that all of the provided expressions
+  must match; however, ``Each`` permits matching to be done in any order;
   can also be created using the '&' operator
 
 - ``Opt`` - construct with a ``ParserElement``, but this element is
@@ -652,6 +672,8 @@ Expression subclasses
 - ``FollowedBy`` - a lookahead expression, requires matching of the given
   expressions, but does not advance the parsing position within the input string
 
+.. _NotAny:
+
 - ``NotAny`` - a negative lookahead expression, prevents matching of named
   expressions, does not advance the parsing position within the input string;
   can also be created using the unary '~' operator
@@ -662,31 +684,31 @@ Expression subclasses
 Expression operators
 --------------------
 
-- ``~`` - creates ``NotAny`` using the expression after the operator
+- ``+`` - creates And_ using the expressions before and after the operator
 
-- ``+`` - creates ``And`` using the expressions before and after the operator
+- ``|`` - creates MatchFirst_ (first left-to-right match) using the expressions before and after the operator
 
-- ``|`` - creates ``MatchFirst`` (first left-to-right match) using the expressions before and after the operator
+- ``^`` - creates Or_ (longest match) using the expressions before and after the operator
 
-- ``^`` - creates ``Or`` (longest match) using the expressions before and after the operator
+- ``&`` - creates Each_ using the expressions before and after the operator
 
-- ``&`` - creates ``Each`` using the expressions before and after the operator
-
-- ``*`` - creates ``And`` by multiplying the expression by the integer operand; if
-  expression is multiplied by a 2-tuple, creates an ``And`` of (min,max)
-  expressions (similar to "{min,max}" form in regular expressions); if
-  min is None, interpret as (0,max); if max is None, interpret as
+- ``*`` - creates And_ by multiplying the expression by the integer operand; if
+  expression is multiplied by a 2-tuple, creates an And_ of ``(min,max)``
+  expressions (similar to ``{min,max}`` form in regular expressions); if
+  ``min`` is ``None``, interpret as ``(0,max)``; if ``max`` is ``None``, interpret as
   ``expr*min + ZeroOrMore(expr)``
 
 - ``-`` - like ``+`` but with no backup and retry of alternatives
 
-- ``==`` - matching expression to string; returns True if the string matches the given expression
+- ``~`` - creates NotAny_ using the expression after the operator
+
+- ``==`` - matching expression to string; returns ``True`` if the string matches the given expression
 
 - ``<<=`` - inserts the expression following the operator as the body of the
-  Forward expression before the operator (``<<`` can also be used, but ``<<=`` is preferred
+  ``Forward`` expression before the operator (``<<`` can also be used, but ``<<=`` is preferred
   to avoid operator precedence misinterpretation of the pyparsing expression)
 
-- ``...`` - inserts a ``SkipTo`` expression leading to the next expression, as in
+- ``...`` - inserts a SkipTo_ expression leading to the next expression, as in
   ``Keyword("start") + ... + Keyword("end")``.
 
 - ``[min, max]`` - specifies repetition similar to ``*`` with ``min`` and ``max`` specified
@@ -717,7 +739,7 @@ Converter subclasses
 --------------------
 
 - ``Combine`` - joins all matched tokens into a single string, using
-  specified join_string (default ``join_string=""``); expects
+  specified ``join_string`` (default ``join_string=""``); expects
   all matching tokens to be adjacent, with no intervening
   whitespace (can be overridden by specifying ``adjacent=False`` in constructor)
@@ -734,13 +756,9 @@ Special subclasses
   break up matched tokens into groups for each repeated pattern
 
 - ``Dict`` - like ``Group``, but also constructs a dictionary, using the
-  [0]'th elements of all enclosed token lists as the keys, and
+  ``[0]``'th elements of all enclosed token lists as the keys, and
   each token list as the value
 
-- ``SkipTo`` - catch-all matching expression that accepts all characters
-  up until the given pattern is found to match; useful for specifying
-  incomplete grammars
-
 - ``Forward`` - placeholder token used to define recursive token
   patterns; when defining the actual expression later in the
   program, insert it into the ``Forward`` object using the ``<<=``
@@ -753,18 +771,18 @@ Other classes
 
 - ``ParseResults`` - class used to contain and manage the lists of tokens
   created from parsing the input using the user-defined parse
-  expression.  ParseResults can be accessed in a number of ways:
+  expression.  ``ParseResults`` can be accessed in a number of ways:
 
   - as a list
 
-    - total list of elements can be found using len()
+    - total list of elements can be found using ``len()``
 
-    - individual elements can be found using [0], [1], [-1], etc.,
+    - individual elements can be found using ``[0], [1], [-1],`` etc.,
       or retrieved using slices
 
     - elements can be deleted using ``del``
 
-    - the -1th element can be extracted and removed in a single operation
+    - the ``-1``th element can be extracted and removed in a single operation
       using ``pop()``, or any element can be extracted and removed
      using ``pop(n)``
 
@@ -774,8 +792,8 @@ Other classes
     overall parse expression, then these fields can be referenced as
     dictionary elements or as attributes
 
-  - the Dict class generates dictionary entries using the data of the
-    input text - in addition to ParseResults listed as ``[ [ a1, b1, c1, ...], [ a2, b2, c2, ...] ]``
+  - the ``Dict`` class generates dictionary entries using the data of the
+    input text - in addition to ParseResults_ listed as ``[ [ a1, b1, c1, ...], [ a2, b2, c2, ...] ]``
    it also acts as a dictionary with entries defined as ``{ a1 : [ b1, c1, ... ] }, { a2 : [ b2, c2, ... ] }``;
    this is especially useful when processing tabular data where the first column contains a key
    value for that line of data
 
@@ -786,11 +804,12 @@ Other classes
   - supports ``get()``, ``items()`` and ``keys()`` methods, similar to a dictionary
 
   - a keyed item can be extracted and removed using ``pop(key)``.  Here
-    key must be non-numeric (such as a string), in order to use dict
+    ``key`` must be non-numeric (such as a string), in order to use dict
    extraction instead of list extraction.
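The ``ParseResults`` access styles described in the hunks above can be shown in a few lines; a sketch assuming pyparsing 3.x, with a hypothetical date grammar for illustration:

```python
import pyparsing as pp

date_expr = (pp.Word(pp.nums)("year") + "/"
             + pp.Word(pp.nums)("month") + "/"
             + pp.Word(pp.nums)("day"))
result = date_expr.parse_string("1999/12/31")

# list-style access
print(len(result), result[0], result[-1])

# dict- and attribute-style access via results names
print(result["year"], result.month, result.get("day"))

# plain list of strings (strips results names)
print(result.as_list())
```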
  - new named elements can be added (in a parse action, for instance), using the same
-   syntax as adding an item to a dict (``parse_results["X"] = "new item"``); named elements can be removed using ``del parse_results["X"]``
+   syntax as adding an item to a dict (``parse_results["X"] = "new item"``);
+   named elements can be removed using ``del parse_results["X"]``
 
  - as a nested list
 
@@ -803,13 +822,13 @@ Other classes
  - named elements can be accessed as if they were attributes of an object:
    if an element is referenced that does not exist, it will return ``""``.
 
-  ParseResults can also be converted to an ordinary list of strings
+  ParseResults_ can also be converted to an ordinary list of strings
   by calling ``as_list()``.  Note that this will strip the results of any
   field names that have been defined for any embedded parse elements.
   (The ``pprint`` module is especially good at printing out the nested contents
   given by ``as_list()``.)
 
-  Finally, ParseResults can be viewed by calling ``dump()``. ``dump()`` will first show
+  Finally, ParseResults_ can be viewed by calling ``dump()``. ``dump()`` will first show
   the ``as_list()`` output, followed by an indented structure listing parsed tokens that
   have been assigned results names.
 
@@ -845,7 +864,7 @@ Exception classes and Troubleshooting
 
 .. _ParseException:
 
 - ``ParseException`` - exception returned when a grammar parse fails;
-  ParseExceptions have attributes loc, msg, line, lineno, and column; to view the
+  ``ParseExceptions`` have attributes ``loc``, ``msg``, ``line``, ``lineno``, and ``column``; to view the
   text line and location where the reported ParseException occurs, use::
 
     except ParseException as err:
@@ -853,6 +872,11 @@ Exception classes and Troubleshooting
         print(" " * (err.column - 1) + "^")
         print(err)
 
+  ``ParseExceptions`` also have an ``explain()`` method that gives this same information::
+
+    except ParseException as err:
+        print(err.explain())
+
 - ``RecursiveGrammarException`` - exception returned by ``validate()`` if
   the grammar contains a recursive infinite loop, such as::
 
@@ -866,7 +890,7 @@ Exception classes and Troubleshooting
 
 - ``ParseSyntaxException`` - subclass of ``ParseFatalException``
   raised when a syntax error is found, based on the use of the '-' operator when defining
-  a sequence of expressions in an ``And`` expression.
+  a sequence of expressions in an And_ expression.
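The ``ParseException`` attributes and the ``explain()`` method added by the hunk above can be exercised like this; a sketch assuming pyparsing 3.x, with a hypothetical two-token grammar:

```python
import pyparsing as pp

integer = pp.Word(pp.nums).set_name("integer")
line = pp.Keyword("value") + integer

caught = None
try:
    line.parse_string("value abc")
except pp.ParseException as err:
    # manual display of the failure location, as in the docs above
    print(err.line)
    print(" " * (err.column - 1) + "^")
    print(err)
    # single-call equivalent added in pyparsing 3.0
    print(err.explain())
    caught = err
```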
 - You can also get some insights into the parsing logic using diagnostic parse actions,
   and ``set_debug()``, or test the matching of expression fragments by testing them using
@@ -876,7 +900,7 @@ Exception classes and Troubleshooting
   one of the following enum values defined in ``pyparsing.Diagnostics``
 
   - ``warn_multiple_tokens_in_named_alternation`` - flag to enable warnings when a results
-    name is defined on a ``MatchFirst`` or ``Or`` expression with one or more ``And`` subexpressions
+    name is defined on a MatchFirst_ or Or_ expression with one or more And_ subexpressions
 
   - ``warn_ungrouped_named_tokens_in_collection`` - flag to enable warnings when a results
     name is defined on a containing expression with ungrouped subexpressions that also
@@ -930,22 +954,22 @@ Helper methods
   parse results - the leading integer is suppressed from the results (although it
   is easily reconstructed by using len on the returned array).
 
-- ``one_of(string, caseless=False, as_keyword=False)`` - convenience function for quickly declaring an
-  alternative set of ``Literal`` expressions, by splitting the given string on
-  whitespace boundaries.  The expressions are sorted so that longer
-  matches are attempted first; this ensures that a short expressions does
+- ``one_of(choices, caseless=False, as_keyword=False)`` - convenience function for quickly declaring an
+  alternative set of Literal_ expressions.  ``choices`` can be passed as a list of strings
+  or as a single string of values separated by spaces.  The values are sorted so that longer
+  matches are attempted first; this ensures that a short value does
   not mask a longer one that starts with the same characters.  If ``caseless=True``,
-  will create an alternative set of CaselessLiteral tokens.  If ``as_keyword=True``,
-  ``one_of`` will declare ``Keyword`` expressions instead of ``Literal`` expressions.
+  will create an alternative set of CaselessLiteral_ tokens.  If ``as_keyword=True``,
+  ``one_of`` will declare Keyword_ expressions instead of Literal_ expressions.
 
-- ``dict_off(key, value)`` - convenience function for quickly declaring a
+- ``dict_of(key, value)`` - convenience function for quickly declaring a
   dictionary pattern of ``Dict(ZeroOrMore(Group(key + value)))``.
 
 - ``make_html_tags(tag_str)`` and ``make_xml_tags(tag_str)`` - convenience
   functions to create definitions of opening and closing tag expressions.  Returns
   a pair of expressions, for the corresponding ``<tag>`` and ``</tag>`` strings.  Includes
   support for attributes in the opening tag, such as ``<tag attr1="abc">`` - attributes
-  are returned as named results in the returned ParseResults.  ``make_html_tags`` is less
+  are returned as named results in the returned ParseResults_.  ``make_html_tags`` is less
   restrictive than ``make_xml_tags``, especially with respect to case sensitivity.
 
 - ``infix_notation(base_operand, operator_list)`` -
@@ -963,8 +987,8 @@ Helper methods
   ``(operand_expr, num_operands, right_left_assoc, parse_action)``, where:
 
   - ``operand_expr`` - the pyparsing expression for the operator;
-    may also be a string, which will be converted to a Literal; if
-    None, indicates an empty operator, such as the implied
+    may also be a string, which will be converted to a Literal_; if
+    ``None``, indicates an empty operator, such as the implied
     multiplication operation between 'm' and 'x' in "y = mx + b".
 
   - ``num_operands`` - the number of terms for this operator (must
@@ -984,7 +1008,7 @@ Helper methods
   this expression to parse input strings, or incorporate it
   into a larger, more complex grammar.
 
-- ``match_previous_literal`` and ``match_previous_expr`` - function to define and
+- ``match_previous_literal`` and ``match_previous_expr`` - function to define an
   expression that matches the same content
   as was parsed in a previous parse expression.  For instance::
@@ -1011,7 +1035,7 @@ Helper methods
   - ``content`` - expression for items within the nested lists (default=None)
 
-  - ``ignore_expr`` - expression for ignoring opening and closing delimiters (default=quoted_string)
+  - ``ignore_expr`` - expression for ignoring opening and closing delimiters (default=``quoted_string``)
 
   If an expression is not provided for the content argument, the nested
   expression will capture all whitespace-delimited content between delimiters
@@ -1019,10 +1043,10 @@ Helper methods
 
   Use the ``ignore_expr`` argument to define expressions that may
   contain opening or closing characters that should not be treated as opening
-  or closing characters for nesting, such as quoted_string or a comment
-  expression.  Specify multiple expressions using an Or or MatchFirst.
-  The default is quoted_string, but if no expressions are to be ignored,
-  then pass None for this argument.
+  or closing characters for nesting, such as ``quoted_string`` or a comment
+  expression.  Specify multiple expressions using an Or_ or MatchFirst_.
+  The default is ``quoted_string``, but if no expressions are to be ignored,
+  then pass ``None`` for this argument.
 
 - ``IndentedBlock(statement_expr, recursive=True)`` -
@@ -1046,8 +1070,8 @@ Helper methods
     full_name = original_text_for(Word(alphas) + Word(alphas))
 
 - ``ungroup(expr)`` - function to "ungroup" returned tokens; useful
-  to undo the default behavior of And to always group the returned tokens, even
-  if there is only one in the list. (New in 1.5.6)
+  to undo the default behavior of And_ to always group the returned tokens, even
+  if there is only one in the list.
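The ``one_of`` sorting behavior and the ``dict_of`` pattern described in the hunks above can be sketched as follows, assuming pyparsing 3.x; the ``attr_key``/``attr_value`` names are hypothetical, for illustration only:

```python
import pyparsing as pp

# one_of sorts its alternatives so longer matches are tried first,
# so "<=" is not masked by "<"
comparison_op = pp.one_of("< > <= >= == != =")
print(comparison_op.parse_string("<=")[0])

# dict_of builds Dict(ZeroOrMore(Group(key + value))), keying each
# group by its first token
attr_key = pp.Word(pp.alphas)
attr_value = pp.Word(pp.nums)
attrs = pp.dict_of(attr_key, attr_value)
print(attrs.parse_string("width 100 height 50").as_dict())
```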
 - ``lineno(loc, string)`` - function to give the line number of the
   location within the string; the first line is line 1, newlines
@@ -1064,7 +1088,7 @@ Helper methods
 - ``srange(range_spec)`` - function to define a string of characters,
   given a string of the form used by regexp string ranges, such as ``"[0-9]"`` for
   all numeric digits, ``"[A-Z_]"`` for uppercase characters plus underscore, and
-  so on (note that range_spec does not include support for generic regular
+  so on (note that ``range_spec`` does not include support for generic regular
   expressions, just string range specs)
 
 - ``trace_parse_action(fn)`` - decorator function to debug parse actions.  Lists
@@ -1079,14 +1103,14 @@ Helper parse actions
   useful to remove the delimiting quotes from quoted strings
 
 - ``replace_with(repl_string)`` - returns a parse action that simply returns the
-  repl_string; useful when using transform_string, or converting HTML entities, as in::
+  ``repl_string``; useful when using ``transform_string``, or converting HTML entities, as in::
 
     nbsp = Literal("&nbsp;").set_parse_action(replace_with("<BLANK>"))
 
-- ``keepOriginalText``- (deprecated, use original_text_for_ instead) restores any internal whitespace or suppressed
+- ``original_text_for``- restores any internal whitespace or suppressed
   text within the tokens for a matched parse expression.  This is especially useful when defining expressions
-  for scan_string or transform_string applications.
+  for ``scan_string`` or ``transform_string`` applications.
 
 - ``with_attribute(*args, **kwargs)`` - helper to create a validating parse action to be used with start tags created
   with ``make_xml_tags`` or ``make_html_tags``.  Use ``with_attribute`` to qualify a starting tag
@@ -1110,7 +1134,7 @@ Helper parse actions
 - ``match_only_at_col(column_number)`` - a parse action that verifies that
   an expression was matched at a particular column, raising a
-  ParseException if matching at a different column number; useful when parsing
+  ``ParseException`` if matching at a different column number; useful when parsing
   tabular data
 
@@ -1165,17 +1189,13 @@ To generate a railroad diagram in pyparsing, you first have to install pyparsing
 To do this, just run ``pip install pyparsing[diagrams]``, and make sure you add ``pyparsing[diagrams]`` to any
 ``setup.py`` or ``requirements.txt`` that specifies pyparsing as a dependency.
 
-Next, run ``pyparsing.diagrams.to_railroad`` to convert your grammar into a form understood by the
-`railroad-diagrams <https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md>`_ module, and
-then ``pyparsing.diagrams.railroad_to_html`` to convert that into an HTML document. For example::
+Create your parser as you normally would. Then call ``create_diagram()``, passing the name of an output HTML file.::
 
-    from pyparsing.diagram import to_railroad, railroad_to_html
+    street_address = Word(nums).set_name("house_number") + Word(alphas)[1, ...].set_name("street_name")
+    street_address.set_name("street_address")
+    street_address.create_diagram("street_address_diagram.html")
 
-    with open('output.html', 'w') as fp:
-        railroad = to_railroad(my_grammar)
-        fp.write(railroad_to_html(railroad))
-
-This will result in the railroad diagram being written to ``output.html``
+This will result in the railroad diagram being written to ``street_address_diagram.html``.
 
 Example
 -------
@@ -1185,12 +1205,23 @@ SQL SELECT statements <_static/sql_railroad.html>`_.
 
 Customization
 -------------
 You can customize the resulting diagram in a few ways.
+To do so, run ``pyparsing.diagrams.to_railroad`` to convert your grammar into a form understood by the
+`railroad-diagrams <https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md>`_ module, and
+then ``pyparsing.diagrams.railroad_to_html`` to convert that into an HTML document. For example::
+
+    from pyparsing.diagram import to_railroad, railroad_to_html
+
+    with open('output.html', 'w') as fp:
+        railroad = to_railroad(my_grammar)
+        fp.write(railroad_to_html(railroad))
+
+This will result in the railroad diagram being written to ``output.html``
 
-Firstly, you can pass in additional keyword arguments to ``pyparsing.diagrams.to_railroad``, which will be passed
+You can then pass in additional keyword arguments to ``pyparsing.diagrams.to_railroad``, which will be passed
 into the ``Diagram()`` constructor of the underlying library,
 `as explained here <https://github.com/tabatkins/railroad-diagrams/blob/gh-pages/README-py.md#diagrams>`_.
 
-Secondly, you can edit global options in the underlying library, by editing constants::
+In addition, you can edit global options in the underlying library, by editing constants::
 
     from pyparsing.diagram import to_railroad, railroad_to_html
     import railroad
diff --git a/docs/whats_new_in_3_0_0.rst b/docs/whats_new_in_3_0_0.rst
index 91bfa67..5d4bfcd 100644
--- a/docs/whats_new_in_3_0_0.rst
+++ b/docs/whats_new_in_3_0_0.rst
@@ -52,8 +52,8 @@ generator for documenting pyparsing parsers. You need to install
     # define a simple grammar for parsing street addresses such
     # as "123 Main Street"
     #     number word...
-    number = pp.Word(pp.nums).setName("number")
-    name = pp.Word(pp.alphas).setName("word")[1, ...]
+    number = pp.Word(pp.nums).set_name("number")
+    name = pp.Word(pp.alphas).set_name("word")[1, ...]
     parser = number("house_number") + name("street")
     parser.set_name("street address")
@@ -73,7 +73,7 @@ the methods of the Python PEG parser, pyparsing uses a variation of packrat
 parsing to detect and handle left-recursion during parsing.::
 
     import pyparsing as pp
-    pp.ParserElement.enableLeftRecursion()
+    pp.ParserElement.enable_left_recursion()
 
     # a common left-recursion definition
     # define a list of items as 'list + item | item'
@@ -200,7 +200,7 @@ nesting level).  For this code::
 
     wd = Word(alphas)
-    for match in locatedExpr(wd).searchString("ljsdf123lksdjjf123lkkjj1222"):
+    for match in locatedExpr(wd).search_string("ljsdf123lksdjjf123lkkjj1222"):
         print(match)
 
 the docs for ``locaatedExpr`` show this output::
@@ -261,7 +261,7 @@ deprecated in a future release.
 Shortened tracebacks
 --------------------
 Cleaned up default tracebacks when getting a ``ParseException`` when calling
-``parseString``. Exception traces should now stop at the call in ``parseString``,
+``parse_string``. Exception traces should now stop at the call in ``parse_string``,
 and not include the internal pyparsing traceback frames. (If the full traceback
 is desired, then set ``ParserElement.verbose_traceback`` to ``True``.)
@@ -314,7 +314,7 @@ Other new features
 - Better exception messages to show full word where an exception occurred.::
 
-    Word(alphas)[...].parseString("abc 123", parseAll=True)
+    Word(alphas)[...].parse_string("abc 123", parse_all=True)
 
   Was::
@@ -331,15 +331,15 @@ Other new features
     start_marker = Keyword("START")
     end_marker = Keyword("END")
     find_body = Suppress(...) + start_marker + ... + end_marker
-    print(find_body.parseString(source).dump())
+    print(find_body.parse_string(source).dump())
 
   Prints::
 
     ['START', 'relevant text ', 'END']
     - _skipped: ['relevant text ']
 
-- Added ``ignoreWhitespace(recurse:bool = True)`` and added a
-  ``recurse`` argument to ``leaveWhitespace``, both added to provide finer
+- Added ``ignore_whitespace(recurse:bool = True)`` and added a
+  ``recurse`` argument to ``leave_whitespace``, both added to provide finer
   control over pyparsing's whitespace skipping.  Contributed by
   Michael Milton.
 
@@ -355,7 +355,7 @@ Other new features
   and was easily misinterpreted as a ``tuple`` containing a ``list`` and
   a ``dict``.
 
-- Minor reformatting of output from ``runTests`` to make embedded
+- Minor reformatting of output from ``run_tests`` to make embedded
   comments more visible.
 
 - New ``pyparsing_test`` namespace, assert methods and classes added to support writing
@@ -404,7 +404,7 @@ API Changes
   to ``True`` or ``False``).  ``enable_all_warnings()`` has
   also been added.
 
-- ``countedArray`` formerly returned its list of items nested
+- ``counted_array`` formerly returned its list of items nested
   within another list, so that accessing the items required
   indexing the 0'th element to get the actual list.  This
   extra nesting has been removed.  In addition, if there are
@@ -417,7 +417,7 @@ API Changes
     expr = pp.Word(pp.nums) * 3
     try:
-        expr.parseString("123 456 A789")
+        expr.parse_string("123 456 A789")
     except pp.ParseException as pe:
         print(pe.explain(depth=0))
@@ -443,10 +443,10 @@ API Changes
   will now always return ``True``.  This code will need to change to
   ``"if name in results and results[name]:"`` or just
   ``"if results[name]:"``.  Also, any parser unit tests that check the
-  ``asDict()`` contents will now see additional entries for parsers
+  ``as_dict()`` contents will now see additional entries for parsers
   having named ``ZeroOrMore`` expressions, whose values will be ``[]``.
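The ``Suppress(...) + start_marker + ... + end_marker`` idiom shown in the hunk above can be run as-is; a sketch assuming pyparsing 3.x, with a hypothetical ``source`` string:

```python
import pyparsing as pp

source = "unwanted prefix START relevant text END trailing junk"
start_marker = pp.Keyword("START")
end_marker = pp.Keyword("END")

# Suppress(...) skips (and drops) everything before START; the bare ...
# between the markers is shorthand for SkipTo(end_marker)
find_body = pp.Suppress(...) + start_marker + ... + end_marker
result = find_body.parse_string(source)
print(result.dump())
```

As the docs note, the skipped text between the markers is retained in the results (and echoed under the ``_skipped`` results name by ``dump()``).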
-- ``ParserElement.setDefaultWhitespaceChars`` will now update
+- ``ParserElement.set_default_whitespace_chars`` will now update
   whitespace characters on all built-in expressions defined in the
   pyparsing module.
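The whitespace-skipping behavior that ``set_default_whitespace_chars`` controls can be demonstrated briefly; a sketch assuming pyparsing 3.x:

```python
import pyparsing as pp

# default whitespace includes newlines, so matching crosses lines
words_default = pp.Word(pp.alphas)[1, ...]
print(words_default.parse_string("abc def\nghi").as_list())

# restrict skippable whitespace to spaces and tabs; expressions
# defined afterwards stop matching at the newline
pp.ParserElement.set_default_whitespace_chars(" \t")
words_no_newline = pp.Word(pp.alphas)[1, ...]
print(words_no_newline.parse_string("abc def\nghi").as_list())

# restore the usual default for any later expressions
pp.ParserElement.set_default_whitespace_chars(" \t\n")
```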