Diffstat (limited to 'doc/ply.md')
-rw-r--r-- | doc/ply.md | 2587
1 files changed, 2587 insertions, 0 deletions
diff --git a/doc/ply.md b/doc/ply.md new file mode 100644 index 0000000..29a16de --- /dev/null +++ b/doc/ply.md @@ -0,0 +1,2587 @@ +# PLY (Python Lex-Yacc) + +This document provides an overview of lexing and parsing with PLY. Given +the intrinsic complexity of parsing, I strongly advise that you read (or +at least skim) this entire document before jumping into a big +development project with PLY. + +The current version requires Python 3.6 or newer. If you\'re using an +older version of Python, use one of the historical releases. + +## Introduction + +PLY is a pure-Python implementation of the compiler construction tools +lex and yacc. The main goal of PLY is to stay fairly faithful to the way +in which traditional lex/yacc tools work. This includes supporting +LALR(1) parsing as well as providing extensive input validation, error +reporting, and diagnostics. Thus, if you\'ve used yacc in another +programming language, it should be relatively straightforward to use +PLY. + +Early versions of PLY were developed to support an Introduction to +Compilers Course I taught in 2001 at the University of Chicago. Since +PLY was primarily developed as an instructional tool, you will find it +to be fairly picky about token and grammar rule specification. In part, +this added formality is meant to catch common programming mistakes made +by novice users. However, advanced users will also find such features to +be useful when building complicated grammars for real programming +languages. It should also be noted that PLY does not provide much in the +way of bells and whistles (e.g., automatic construction of abstract +syntax trees, tree traversal, etc.). Nor would I consider it to be a +parsing framework. Instead, you will find a bare-bones, yet fully +capable lex/yacc implementation written entirely in Python. + +The rest of this document assumes that you are somewhat familiar with +parsing theory, syntax directed translation, and the use of compiler +construction tools such as lex and yacc in other programming languages. +If you are unfamiliar with these topics, you will probably want to +consult an introductory text such as \"Compilers: Principles, +Techniques, and Tools\", by Aho, Sethi, and Ullman. O\'Reilly\'s \"Lex +and Yacc\" by John Levine may also be handy. In fact, the O\'Reilly book +can be used as a reference for PLY as the concepts are virtually +identical. + +## PLY Overview + +PLY consists of two separate modules; `lex.py` and `yacc.py`, both of +which are found in a Python package called `ply`. The `lex.py` module is +used to break input text into a collection of tokens specified by a +collection of regular expression rules. `yacc.py` is used to recognize +language syntax that has been specified in the form of a context free +grammar. + +The two tools are meant to work together. Specifically, `lex.py` +provides an interface to produce tokens. `yacc.py` uses this retrieve +tokens and invoke grammar rules. The output of `yacc.py` is often an +Abstract Syntax Tree (AST). However, this is entirely up to the user. If +desired, `yacc.py` can also be used to implement simple one-pass +compilers. + +Like its Unix counterpart, `yacc.py` provides most of the features you +expect including extensive error checking, grammar validation, support +for empty productions, error tokens, and ambiguity resolution via +precedence rules. In fact, almost everything that is possible in +traditional yacc should be supported in PLY. 
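As a quick preview (a minimal sketch only; the `WORD` token and `greeting` rule are invented for illustration, and complete, realistic examples follow in the Lex and Yacc sections), a PLY program defines its lexer and parser rules as ordinary Python and wires the two modules together like this:

    import ply.lex as lex
    import ply.yacc as yacc

    # --- lexer half: token names plus t_* rules ---
    tokens = ('WORD',)

    t_WORD   = r'[A-Za-z]+'
    t_ignore = ' \t'

    def t_error(t):
        t.lexer.skip(1)

    # --- parser half: p_* grammar rules ---
    def p_greeting(p):
        'greeting : WORD WORD'
        p[0] = (p[1], p[2])

    def p_error(p):
        print("Syntax error")

    lexer = lex.lex()      # build the tokenizer from the t_* rules above
    parser = yacc.yacc()   # build the LALR parser from the p_* rules above
    print(parser.parse("hello world"))   # -> ('hello', 'world')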
+ +The primary difference between `yacc.py` and Unix `yacc` is that +`yacc.py` doesn\'t involve a separate code-generation process. Instead, +PLY relies on reflection (introspection) to build its lexers and +parsers. Unlike traditional lex/yacc which require a special input file +that is converted into a separate source file, the specifications given +to PLY *are* valid Python programs. This means that there are no extra +source files nor is there a special compiler construction step (e.g., +running yacc to generate Python code for the compiler). + +## Lex + +`lex.py` is used to tokenize an input string. For example, suppose +you\'re writing a programming language and a user supplied the following +input string: + + x = 3 + 42 * (s - t) + +A tokenizer splits the string into individual tokens: + + 'x','=', '3', '+', '42', '*', '(', 's', '-', 't', ')' + +Tokens are usually given names to indicate what they are. For example: + + 'ID','EQUALS','NUMBER','PLUS','NUMBER','TIMES', + 'LPAREN','ID','MINUS','ID','RPAREN' + +More specifically, the input is broken into pairs of token types and +values. For example: + + ('ID','x'), ('EQUALS','='), ('NUMBER','3'), + ('PLUS','+'), ('NUMBER','42), ('TIMES','*'), + ('LPAREN','('), ('ID','s'), ('MINUS','-'), + ('ID','t'), ('RPAREN',')' + +The specification of tokens is done by writing a series of regular +expression rules. The next section shows how this is done using +`lex.py`. + +### Lex Example + +The following example shows how `lex.py` is used to write a simple +tokenizer: + + # ------------------------------------------------------------ + # calclex.py + # + # tokenizer for a simple expression evaluator for + # numbers and +,-,*,/ + # ------------------------------------------------------------ + import ply.lex as lex + + # List of token names. This is always required + tokens = ( + 'NUMBER', + 'PLUS', + 'MINUS', + 'TIMES', + 'DIVIDE', + 'LPAREN', + 'RPAREN', + ) + + # Regular expression rules for simple tokens + t_PLUS = r'\+' + t_MINUS = r'-' + t_TIMES = r'\*' + t_DIVIDE = r'/' + t_LPAREN = r'\(' + t_RPAREN = r'\)' + + # A regular expression rule with some action code + def t_NUMBER(t): + r'\d+' + t.value = int(t.value) + return t + + # Define a rule so we can track line numbers + def t_newline(t): + r'\n+' + t.lexer.lineno += len(t.value) + + # A string containing ignored characters (spaces and tabs) + t_ignore = ' \t' + + # Error handling rule + def t_error(t): + print("Illegal character '%s'" % t.value[0]) + t.lexer.skip(1) + + # Build the lexer + lexer = lex.lex() + +To use the lexer, you first need to feed it some input text using its +`input()` method. After that, repeated calls to `token()` produce +tokens. The following code shows how this works: + + # Test it out + data = ''' + 3 + 4 * 10 + + -20 *2 + ''' + + # Give the lexer some input + lexer.input(data) + + # Tokenize + while True: + tok = lexer.token() + if not tok: + break # No more input + print(tok) + +When executed, the example will produce the following output: + + $ python example.py + LexToken(NUMBER,3,2,1) + LexToken(PLUS,'+',2,3) + LexToken(NUMBER,4,2,5) + LexToken(TIMES,'*',2,7) + LexToken(NUMBER,10,2,10) + LexToken(PLUS,'+',3,14) + LexToken(MINUS,'-',3,16) + LexToken(NUMBER,20,3,18) + LexToken(TIMES,'*',3,20) + LexToken(NUMBER,2,3,21) + +Lexers also support the iteration protocol. So, you can write the above +loop as follows: + + for tok in lexer: + print(tok) + +The tokens returned by `lexer.token()` are instances of `LexToken`. 
This +object has attributes `type`, `value`, `lineno`, and `lexpos`. The +following code shows an example of accessing these attributes: + + # Tokenize + while True: + tok = lexer.token() + if not tok: + break # No more input + print(tok.type, tok.value, tok.lineno, tok.lexpos) + +The `type` and `value` attributes contain the type and value of the +token itself. `lineno` and `lexpos` contain information about the +location of the token. `lexpos` is the index of the token relative to +the start of the input text. + +### The tokens list + +All lexers must provide a list `tokens` that defines all of the possible +token names that can be produced by the lexer. This list is always +required and is used to perform a variety of validation checks. The +tokens list is also used by the `yacc.py` module to identify terminals. + +In the example, the following code specified the token names: + + tokens = ( + 'NUMBER', + 'PLUS', + 'MINUS', + 'TIMES', + 'DIVIDE', + 'LPAREN', + 'RPAREN', + ) + +### Specification of tokens + +Each token is specified by writing a regular expression rule compatible +with Python\'s `re` module. Each of these rules are defined by making +declarations with a special prefix `t_` to indicate that it defines a +token. For simple tokens, the regular expression can be specified as +strings such as this (note: Python raw strings are used since they are +the most convenient way to write regular expression strings): + + t_PLUS = r'\+' + +In this case, the name following the `t_` must exactly match one of the +names supplied in `tokens`. If some kind of action needs to be +performed, a token rule can be specified as a function. For example, +this rule matches numbers and converts the string into a Python integer: + + def t_NUMBER(t): + r'\d+' + t.value = int(t.value) + return t + +When a function is used, the regular expression rule is specified in the +function documentation string. The function always takes a single +argument which is an instance of `LexToken`. This object has attributes +of `type` which is the token type (as a string), `value` which is the +lexeme (the actual text matched), `lineno` which is the current line +number, and `lexpos` which is the position of the token relative to the +beginning of the input text. By default, `type` is set to the name +following the `t_` prefix. The action function can modify the contents +of the `LexToken` object as appropriate. However, when it is done, the +resulting token should be returned. If no value is returned by the +action function, the token is discarded and the next token read. + +Internally, `lex.py` uses the `re` module to do its pattern matching. +Patterns are compiled using the `re.VERBOSE` flag which can be used to +help readability. However, be aware that unescaped whitespace is ignored +and comments are allowed in this mode. If your pattern involves +whitespace, make sure you use `\s`. If you need to match the `#` +character, use `[#]`. + +When building the master regular expression, rules are added in the +following order: + +1. All tokens defined by functions are added in the same order as they + appear in the lexer file. +2. Tokens defined by strings are added next by sorting them in order of + decreasing regular expression length (longer expressions are added + first). + +Without this ordering, it can be difficult to correctly match certain +types of tokens. For example, if you wanted to have separate tokens for +\"=\" and \"==\", you need to make sure that \"==\" is checked first. 
By +sorting regular expressions in order of decreasing length, this problem +is solved for rules defined as strings. For functions, the order can be +explicitly controlled since rules appearing first are checked first. + +To handle reserved words, you should write a single rule to match an +identifier and do a special name lookup in a function like this: + + reserved = { + 'if' : 'IF', + 'then' : 'THEN', + 'else' : 'ELSE', + 'while' : 'WHILE', + ... + } + + tokens = ['LPAREN','RPAREN',...,'ID'] + list(reserved.values()) + + def t_ID(t): + r'[a-zA-Z_][a-zA-Z_0-9]*' + t.type = reserved.get(t.value,'ID') # Check for reserved words + return t + +This approach greatly reduces the number of regular expression rules and +is likely to make things a little faster. + +Note: You should avoid writing individual rules for reserved words. For +example, if you write rules like this: + + t_FOR = r'for' + t_PRINT = r'print' + +those rules will be triggered for identifiers that include those words +as a prefix such as \"forget\" or \"printed\". This is probably not what +you want. + +### Token values + +When tokens are returned by lex, they have a value that is stored in the +`value` attribute. Normally, the value is the text that was matched. +However, the value can be assigned to any Python object. For instance, +when lexing identifiers, you may want to return both the identifier name +and information from some sort of symbol table. To do this, you might +write a rule like this: + + def t_ID(t): + ... + # Look up symbol table information and return a tuple + t.value = (t.value, symbol_lookup(t.value)) + ... + return t + +It is important to note that storing data in other attribute names is +*not* recommended. The `yacc.py` module only exposes the contents of the +`value` attribute. Thus, accessing other attributes may be unnecessarily +awkward. If you need to store multiple values on a token, assign a +tuple, dictionary, or instance to `value`. + +### Discarded tokens + +To discard a token, such as a comment, define a token rule that returns +no value. For example: + + def t_COMMENT(t): + r'\#.*' + pass + # No return value. Token discarded + +Alternatively, you can include the prefix `ignore_` in the token +declaration to force a token to be ignored. For example: + + t_ignore_COMMENT = r'\#.*' + +Be advised that if you are ignoring many different kinds of text, you +may still want to use functions since these provide more precise control +over the order in which regular expressions are matched (i.e., functions +are matched in order of specification whereas strings are sorted by +regular expression length). + +### Line numbers and positional information + +By default, `lex.py` knows nothing about line numbers. This is because +`lex.py` doesn\'t know anything about what constitutes a \"line\" of +input (e.g., the newline character or even if the input is textual +data). To update this information, you need to write a special rule. In +the example, the `t_newline()` rule shows how to do this: + + # Define a rule so we can track line numbers + def t_newline(t): + r'\n+' + t.lexer.lineno += len(t.value) + +Within the rule, the `lineno` attribute of the underlying lexer +`t.lexer` is updated. After the line number is updated, the token is +discarded since nothing is returned. + +`lex.py` does not perform any kind of automatic column tracking. +However, it does record positional information related to each token in +the `lexpos` attribute. 
Using this, it is usually possible to compute +column information as a separate step. For instance, just count +backwards until you reach a newline: + + # Compute column. + # input is the input text string + # token is a token instance + def find_column(input, token): + line_start = input.rfind('\n', 0, token.lexpos) + 1 + return (token.lexpos - line_start) + 1 + +Since column information is often only useful in the context of error +handling, calculating the column position can be performed when needed +as opposed to doing it for each token. Note: If you\'re parsing a +language where whitespace matters (i.e., Python), it\'s probably better +match whitespace as a token instead of ignoring it. + +### Ignored characters + +The special `t_ignore` rule is reserved by `lex.py` for characters that +should be completely ignored in the input stream. Usually this is used +to skip over whitespace and other non-essential characters. Although it +is possible to define a regular expression rule for whitespace in a +manner similar to `t_newline()`, the use of `t_ignore` provides +substantially better lexing performance because it is handled as a +special case and is checked in a much more efficient manner than the +normal regular expression rules. + +The characters given in `t_ignore` are not ignored when such characters +are part of other regular expression patterns. For example, if you had a +rule to capture quoted text, that pattern can include the ignored +characters (which will be captured in the normal way). The main purpose +of `t_ignore` is to ignore whitespace and other padding between the +tokens that you actually want to parse. + +### Literal characters + +Literal characters can be specified by defining a variable `literals` in +your lexing module. For example: + + literals = [ '+','-','*','/' ] + +or alternatively: + + literals = "+-*/" + +A literal character is a single character that is returned \"as is\" +when encountered by the lexer. Literals are checked after all of the +defined regular expression rules. Thus, if a rule starts with one of the +literal characters, it will always take precedence. + +When a literal token is returned, both its `type` and `value` attributes +are set to the character itself. For example, `'+'`. + +It\'s possible to write token functions that perform additional actions +when literals are matched. However, you\'ll need to set the token type +appropriately. For example: + + literals = [ '{', '}' ] + + def t_lbrace(t): + r'\{' + t.type = '{' # Set token type to the expected literal + return t + + def t_rbrace(t): + r'\}' + t.type = '}' # Set token type to the expected literal + return t + +### Error handling + +The `t_error()` function is used to handle lexing errors that occur when +illegal characters are detected. In this case, the `t.value` attribute +contains the rest of the input string that has not been tokenized. In +the example, the error function was defined as follows: + + # Error handling rule + def t_error(t): + print("Illegal character '%s'" % t.value[0]) + t.lexer.skip(1) + +In this case, we print the offending character and skip ahead one +character by calling `t.lexer.skip(1)`. + +### EOF Handling + +The `t_eof()` function is used to handle an end-of-file (EOF) condition +in the input. As input, it receives a token type `'eof'` with the +`lineno` and `lexpos` attributes set appropriately. The main use of this +function is provide more input to the lexer so that it can continue to +parse. 
Here is an example of how this works: + + # EOF handling rule + def t_eof(t): + # Get more input (Example) + more = input('... ') + if more: + t.lexer.input(more) + return t.lexer.token() + return None + +The EOF function should return the next available token (by calling +`t.lexer.token())` or `None` to indicate no more data. Be aware that +setting more input with the `t.lexer.input()` method does NOT reset +the lexer state or the `lineno` attribute used for position tracking. +The `lexpos` attribute is reset so be aware of that if you\'re using it +in error reporting. + +### Building and using the lexer + +To build the lexer, the function `lex.lex()` is used. For example: + + lexer = lex.lex() + +This function uses Python reflection (or introspection) to read the +regular expression rules out of the calling context and build the lexer. +Once the lexer has been built, two methods can be used to control the +lexer. + +`lexer.input(data)`. Reset the lexer and store a new input string. + +`lexer.token()`. Return the next token. Returns a special `LexToken` +instance on success or None if the end of the input text has been +reached. + +### The \@TOKEN decorator + +In some applications, you may want to define tokens as a series of more +complex regular expression rules. For example: + + digit = r'([0-9])' + nondigit = r'([_A-Za-z])' + identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)' + + def t_ID(t): + # want docstring to be identifier above. ????? + ... + +In this case, we want the regular expression rule for `ID` to be one of +the variables above. However, there is no way to directly specify this +using a normal documentation string. To solve this problem, you can use +the `@TOKEN` decorator. For example: + + from ply.lex import TOKEN + + @TOKEN(identifier) + def t_ID(t): + ... + +This will attach `identifier` to the docstring for `t_ID()` allowing +`lex.py` to work normally. Naturally, you could use `@TOKEN` on all +functions as an alternative to using docstrings. + +### Debugging + +For the purpose of debugging, you can run `lex()` in a debugging mode as +follows: + + lexer = lex.lex(debug=True) + +This will produce various sorts of debugging information including all +of the added rules, the master regular expressions used by the lexer, +and tokens generating during lexing. + +In addition, `lex.py` comes with a simple main function which will +either tokenize input read from standard input or from a file specified +on the command line. To use it, put this in your lexer: + + if __name__ == '__main__': + lex.runmain() + +Please refer to the \"Debugging\" section near the end for some more +advanced details of debugging. + +### Alternative specification of lexers + +As shown in the example, lexers are specified all within one Python +module. If you want to put token rules in a different module from the +one in which you invoke `lex()`, use the `module` keyword argument. + +For example, you might have a dedicated module that just contains the +token rules: + + # module: tokrules.py + # This module just contains the lexing rules + + # List of token names. 
This is always required + tokens = ( + 'NUMBER', + 'PLUS', + 'MINUS', + 'TIMES', + 'DIVIDE', + 'LPAREN', + 'RPAREN', + ) + + # Regular expression rules for simple tokens + t_PLUS = r'\+' + t_MINUS = r'-' + t_TIMES = r'\*' + t_DIVIDE = r'/' + t_LPAREN = r'\(' + t_RPAREN = r'\)' + + # A regular expression rule with some action code + def t_NUMBER(t): + r'\d+' + t.value = int(t.value) + return t + + # Define a rule so we can track line numbers + def t_newline(t): + r'\n+' + t.lexer.lineno += len(t.value) + + # A string containing ignored characters (spaces and tabs) + t_ignore = ' \t' + + # Error handling rule + def t_error(t): + print("Illegal character '%s'" % t.value[0]) + t.lexer.skip(1) + +Now, if you wanted to build a tokenizer from these rules from within a +different module, you would do the following (shown for Python +interactive mode): + + >>> import tokrules + >>> lexer = lex.lex(module=tokrules) + >>> lexer.input("3 + 4") + >>> lexer.token() + LexToken(NUMBER,3,1,1,0) + >>> lexer.token() + LexToken(PLUS,'+',1,2) + >>> lexer.token() + LexToken(NUMBER,4,1,4) + >>> lexer.token() + None + >>> + +The `module` option can also be used to define lexers from instances of +a class. For example: + + import ply.lex as lex + + class MyLexer(object): + # List of token names. This is always required + tokens = ( + 'NUMBER', + 'PLUS', + 'MINUS', + 'TIMES', + 'DIVIDE', + 'LPAREN', + 'RPAREN', + ) + + # Regular expression rules for simple tokens + t_PLUS = r'\+' + t_MINUS = r'-' + t_TIMES = r'\*' + t_DIVIDE = r'/' + t_LPAREN = r'\(' + t_RPAREN = r'\)' + + # A regular expression rule with some action code + # Note addition of self parameter since we're in a class + def t_NUMBER(self,t): + r'\d+' + t.value = int(t.value) + return t + + # Define a rule so we can track line numbers + def t_newline(self,t): + r'\n+' + t.lexer.lineno += len(t.value) + + # A string containing ignored characters (spaces and tabs) + t_ignore = ' \t' + + # Error handling rule + def t_error(self,t): + print("Illegal character '%s'" % t.value[0]) + t.lexer.skip(1) + + # Build the lexer + def build(self,**kwargs): + self.lexer = lex.lex(module=self, **kwargs) + + # Test it output + def test(self,data): + self.lexer.input(data) + while True: + tok = self.lexer.token() + if not tok: + break + print(tok) + + # Build the lexer and try it out + m = MyLexer() + m.build() # Build the lexer + m.test("3 + 4") # Test it + +When building a lexer from class, *you should construct the lexer from +an instance of the class*, not the class object itself. This is because +PLY only works properly if the lexer actions are defined by +bound-methods. + +When using the `module` option to `lex()`, PLY collects symbols from the +underlying object using the `dir()` function. There is no direct access +to the `__dict__` attribute of the object supplied as a module value. + +Finally, if you want to keep things nicely encapsulated, but don\'t want +to use a full-fledged class definition, lexers can be defined using +closures. For example: + + import ply.lex as lex + + # List of token names. 
This is always required + tokens = ( + 'NUMBER', + 'PLUS', + 'MINUS', + 'TIMES', + 'DIVIDE', + 'LPAREN', + 'RPAREN', + ) + + def MyLexer(): + # Regular expression rules for simple tokens + t_PLUS = r'\+' + t_MINUS = r'-' + t_TIMES = r'\*' + t_DIVIDE = r'/' + t_LPAREN = r'\(' + t_RPAREN = r'\)' + + # A regular expression rule with some action code + def t_NUMBER(t): + r'\d+' + t.value = int(t.value) + return t + + # Define a rule so we can track line numbers + def t_newline(t): + r'\n+' + t.lexer.lineno += len(t.value) + + # A string containing ignored characters (spaces and tabs) + t_ignore = ' \t' + + # Error handling rule + def t_error(t): + print("Illegal character '%s'" % t.value[0]) + t.lexer.skip(1) + + # Build the lexer from my environment and return it + return lex.lex() + +Important note: If you are defining a lexer using a class or closure, be +aware that PLY still requires you to only define a single lexer per +module (source file). There are extensive validation/error checking +parts of the PLY that may falsely report error messages if you don\'t +follow this rule. + +### Maintaining state + +In your lexer, you may want to maintain a variety of state information. +This might include mode settings, symbol tables, and other details. As +an example, suppose that you wanted to keep track of how many NUMBER +tokens had been encountered. + +One way to do this is to keep a set of global variables in the module +where you created the lexer. For example: + + num_count = 0 + def t_NUMBER(t): + r'\d+' + global num_count + num_count += 1 + t.value = int(t.value) + return t + +If you don\'t like the use of a global variable, another place to store +information is inside the Lexer object created by `lex()`. To this, you +can use the `lexer` attribute of tokens passed to the various rules. For +example: + + def t_NUMBER(t): + r'\d+' + t.lexer.num_count += 1 # Note use of lexer attribute + t.value = int(t.value) + return t + + lexer = lex.lex() + lexer.num_count = 0 # Set the initial count + +This latter approach has the advantage of being simple and working +correctly in applications where multiple instantiations of a given lexer +exist in the same application. However, this might also feel like a +gross violation of encapsulation to OO purists. Just to put your mind at +some ease, all internal attributes of the lexer (with the exception of +`lineno`) have names that are prefixed by `lex` (e.g., +`lexdata`,`lexpos`, etc.). Thus, it is perfectly safe to store +attributes in the lexer that don\'t have names starting with that prefix +or a name that conflicts with one of the predefined methods (e.g., +`input()`, `token()`, etc.). + +If you don\'t like assigning values on the lexer object, you can define +your lexer as a class as shown in the previous section: + + class MyLexer: + ... + def t_NUMBER(self,t): + r'\d+' + self.num_count += 1 + t.value = int(t.value) + return t + + def build(self, **kwargs): + self.lexer = lex.lex(object=self,**kwargs) + + def __init__(self): + self.num_count = 0 + +The class approach may be the easiest to manage if your application is +going to be creating multiple instances of the same lexer and you need +to manage a lot of state. + +State can also be managed through closures. For example: + + def MyLexer(): + num_count = 0 + ... + def t_NUMBER(t): + r'\d+' + nonlocal num_count + num_count += 1 + t.value = int(t.value) + return t + ... + +### Lexer cloning + +If necessary, a lexer object can be duplicated by invoking its `clone()` +method. 
For example: + + lexer = lex.lex() + ... + newlexer = lexer.clone() + +When a lexer is cloned, the copy is exactly identical to the original +lexer including any input text and internal state. However, the clone +allows a different set of input text to be supplied which may be +processed separately. This may be useful in situations when you are +writing a parser/compiler that involves recursive or reentrant +processing. For instance, if you needed to scan ahead in the input for +some reason, you could create a clone and use it to look ahead. Or, if +you were implementing some kind of preprocessor, cloned lexers could be +used to handle different input files. + +Creating a clone is different than calling `lex.lex()` in that PLY +doesn\'t regenerate any of the internal tables or regular expressions. + +Special considerations need to be made when cloning lexers that also +maintain their own internal state using classes or closures. Namely, you +need to be aware that the newly created lexers will share all of this +state with the original lexer. For example, if you defined a lexer as a +class and did this: + + m = MyLexer() + a = lex.lex(object=m) # Create a lexer + + b = a.clone() # Clone the lexer + +Then both `a` and `b` are going to be bound to the same object `m` and +any changes to `m` will be reflected in both lexers. It\'s important to +emphasize that `clone()` is only meant to create a new lexer that reuses +the regular expressions and environment of another lexer. If you need to +make a totally new copy of a lexer, then call `lex()` again. + +### Internal lexer state + +A Lexer object `lexer` has a number of internal attributes that may be +useful in certain situations. + +`lexer.lexpos` + +: This attribute is an integer that contains the current position + within the input text. If you modify the value, it will change the + result of the next call to `token()`. Within token rule functions, + this points to the first character *after* the matched text. If the + value is modified within a rule, the next returned token will be + matched at the new position. + +`lexer.lineno` + +: The current value of the line number attribute stored in the lexer. + PLY only specifies that the attribute exists\-\--it never sets, + updates, or performs any processing with it. If you want to track + line numbers, you will need to add code yourself (see the section on + line numbers and positional information). + +`lexer.lexdata` + +: The current input text stored in the lexer. This is the string + passed with the `input()` method. It would probably be a bad idea to + modify this unless you really know what you\'re doing. + +`lexer.lexmatch` + +: This is the raw `Match` object returned by the Python `re.match()` + function (used internally by PLY) for the current token. If you have + written a regular expression that contains named groups, you can use + this to retrieve those values. Note: This attribute is only updated + when tokens are defined and processed by functions. + +### Conditional lexing and start conditions + +In advanced parsing applications, it may be useful to have different +lexing states. For instance, you may want the occurrence of a certain +token or syntactic construct to trigger a different kind of lexing. PLY +supports a feature that allows the underlying lexer to be put into a +series of different states. Each state can have its own tokens, lexing +rules, and so forth. The implementation is based largely on the \"start +condition\" feature of GNU flex. 
Details of this can be found at +<http://flex.sourceforge.net/manual/Start-Conditions.html> + +To define a new lexing state, it must first be declared. This is done by +including a \"states\" declaration in your lex file. For example: + + states = ( + ('foo','exclusive'), + ('bar','inclusive'), + ) + +This declaration declares two states, `'foo'` and `'bar'`. States may be +of two types; `'exclusive'` and `'inclusive'`. An exclusive state +completely overrides the default behavior of the lexer. That is, lex +will only return tokens and apply rules defined specifically for that +state. An inclusive state adds additional tokens and rules to the +default set of rules. Thus, lex will return both the tokens defined by +default in addition to those defined for the inclusive state. + +Once a state has been declared, tokens and rules are declared by +including the state name in token/rule declaration. For example: + + t_foo_NUMBER = r'\d+' # Token 'NUMBER' in state 'foo' + t_bar_ID = r'[a-zA-Z_][a-zA-Z0-9_]*' # Token 'ID' in state 'bar' + + def t_foo_newline(t): + r'\n' + t.lexer.lineno += 1 + +A token can be declared in multiple states by including multiple state +names in the declaration. For example: + + t_foo_bar_NUMBER = r'\d+' # Defines token 'NUMBER' in both state 'foo' and 'bar' + +Alternative, a token can be declared in all states using the \'ANY\' in +the name: + + t_ANY_NUMBER = r'\d+' # Defines a token 'NUMBER' in all states + +If no state name is supplied, as is normally the case, the token is +associated with a special state `'INITIAL'`. For example, these two +declarations are identical: + + t_NUMBER = r'\d+' + t_INITIAL_NUMBER = r'\d+' + +States are also associated with the special `t_ignore`, `t_error()`, and +`t_eof()` declarations. For example, if a state treats these +differently, you can declare: + + t_foo_ignore = " \t\n" # Ignored characters for state 'foo' + + def t_bar_error(t): # Special error handler for state 'bar' + pass + +By default, lexing operates in the `'INITIAL'` state. This state +includes all of the normally defined tokens. For users who aren\'t using +different states, this fact is completely transparent. If, during lexing +or parsing, you want to change the lexing state, use the `begin()` +method. For example: + + def t_begin_foo(t): + r'start_foo' + t.lexer.begin('foo') # Starts 'foo' state + +To get out of a state, you use `begin()` to switch back to the initial +state. For example: + + def t_foo_end(t): + r'end_foo' + t.lexer.begin('INITIAL') # Back to the initial state + +The management of states can also be done with a stack. For example: + + def t_begin_foo(t): + r'start_foo' + t.lexer.push_state('foo') # Starts 'foo' state + + def t_foo_end(t): + r'end_foo' + t.lexer.pop_state() # Back to the previous state + +The use of a stack would be useful in situations where there are many +ways of entering a new lexing state and you merely want to go back to +the previous state afterwards. + +An example might help clarify. Suppose you were writing a parser and you +wanted to grab sections of arbitrary C code enclosed by curly braces. +That is, whenever you encounter a starting brace `'{'`, you want to read +all of the enclosed code up to the ending brace `'}'` and return it as a +string. Doing this with a normal regular expression rule is nearly (if +not actually) impossible. This is because braces can be nested and can +be included in comments and strings. Thus, matching up to the first +matching `'}'` character isn\'t good enough. 
Here is how you might use +lexer states to do this: + + # Declare the state + states = ( + ('ccode','exclusive'), + ) + + # Match the first {. Enter ccode state. + def t_ccode(t): + r'\{' + t.lexer.code_start = t.lexer.lexpos # Record the starting position + t.lexer.level = 1 # Initial brace level + t.lexer.begin('ccode') # Enter 'ccode' state + + # Rules for the ccode state + def t_ccode_lbrace(t): + r'\{' + t.lexer.level +=1 + + def t_ccode_rbrace(t): + r'\}' + t.lexer.level -=1 + + # If closing brace, return the code fragment + if t.lexer.level == 0: + t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1] + t.type = "CCODE" + t.lexer.lineno += t.value.count('\n') + t.lexer.begin('INITIAL') + return t + + # C or C++ comment (ignore) + def t_ccode_comment(t): + r'(/\*(.|\n)*?\*/)|(//.*)' + pass + + # C string + def t_ccode_string(t): + r'\"([^\\\n]|(\\.))*?\"' + + # C character literal + def t_ccode_char(t): + r'\'([^\\\n]|(\\.))*?\'' + + # Any sequence of non-whitespace characters (not braces, strings) + def t_ccode_nonspace(t): + r'[^\s\{\}\'\"]+' + + # Ignored characters (whitespace) + t_ccode_ignore = " \t\n" + + # For bad characters, we just skip over it + def t_ccode_error(t): + t.lexer.skip(1) + +In this example, the occurrence of the first \'{\' causes the lexer to +record the starting position and enter a new state `'ccode'`. A +collection of rules then match various parts of the input that follow +(comments, strings, etc.). All of these rules merely discard the token +(by not returning a value). However, if the closing right brace is +encountered, the rule `t_ccode_rbrace` collects all of the code (using +the earlier recorded starting position), stores it, and returns a token +\'CCODE\' containing all of that text. When returning the token, the +lexing state is restored back to its initial state. + +### Miscellaneous Issues + +- The lexer requires input to be supplied as a single input string. + Since most machines have more than enough memory, this rarely + presents a performance concern. However, it means that the lexer + currently can\'t be used with streaming data such as open files or + sockets. This limitation is primarily a side-effect of using the + `re` module. You might be able to work around this by implementing + an appropriate `def t_eof()` end-of-file handling rule. The main + complication here is that you\'ll probably need to ensure that data + is fed to the lexer in a way so that it doesn\'t split in in the + middle of a token. + +- If you need to supply optional flags to the re.compile() function, + use the reflags option to lex. For example: + + lex.lex(reflags=re.UNICODE | re.VERBOSE) + + Note: by default, `reflags` is set to `re.VERBOSE`. If you provide + your own flags, you may need to include this for PLY to preserve its + normal behavior. + +- If you are going to create a hand-written lexer and you plan to use + it with `yacc.py`, it only needs to conform to the following + requirements: + + 1. It must provide a `token()` method that returns the next token + or `None` if no more tokens are available. + 2. The `token()` method must return an object `tok` that has `type` + and `value` attributes. If line number tracking is being used, + then the token should also define a `lineno` attribute. + +## Parsing basics + +`yacc.py` is used to parse language syntax. Before showing an example, +there are a few important bits of background that must be mentioned. +First, *syntax* is usually specified in terms of a BNF grammar. 
For +example, if you wanted to parse simple arithmetic expressions, you might +first write an unambiguous grammar specification like this: + + expression : expression + term + | expression - term + | term + + term : term * factor + | term / factor + | factor + + factor : NUMBER + | ( expression ) + +In the grammar, symbols such as `NUMBER`, `+`, `-`, `*`, and `/` are +known as *terminals* and correspond to input tokens. Identifiers such as +`term` and `factor` refer to grammar rules comprised of a collection of +terminals and other rules. These identifiers are known as +*non-terminals*. + +The semantic behavior of a language is often specified using a technique +known as syntax directed translation. In syntax directed translation, +attributes are attached to each symbol in a given grammar rule along +with an action. Whenever a particular grammar rule is recognized, the +action describes what to do. For example, given the expression grammar +above, you might write the specification for a simple calculator like +this: + + Grammar Action + -------------------------------- -------------------------------------------- + expression0 : expression1 + term expression0.val = expression1.val + term.val + | expression1 - term expression0.val = expression1.val - term.val + | term expression0.val = term.val + + term0 : term1 * factor term0.val = term1.val * factor.val + | term1 / factor term0.val = term1.val / factor.val + | factor term0.val = factor.val + + factor : NUMBER factor.val = int(NUMBER.lexval) + | ( expression ) factor.val = expression.val + +A good way to think about syntax directed translation is to view each +symbol in the grammar as a kind of object. Associated with each symbol +is a value representing its \"state\" (for example, the `val` attribute +above). Semantic actions are then expressed as a collection of functions +or methods that operate on the symbols and associated values. + +Yacc uses a parsing technique known as LR-parsing or shift-reduce +parsing. LR parsing is a bottom up technique that tries to recognize the +right-hand-side of various grammar rules. Whenever a valid +right-hand-side is found in the input, the appropriate action code is +triggered and the grammar symbols are replaced by the grammar symbol on +the left-hand-side. + +LR parsing is commonly implemented by shifting grammar symbols onto a +stack and looking at the stack and the next input token for patterns +that match one of the grammar rules. The details of the algorithm can be +found in a compiler textbook, but the following example illustrates the +steps that are performed if you wanted to parse the expression +`3 + 5 * (10 - 20)` using the grammar defined above. 
In the example, the +special symbol `$` represents the end of input: + + Step Symbol Stack Input Tokens Action + ---- --------------------- --------------------- ------------------------------- + 1 3 + 5 * ( 10 - 20 )$ Shift 3 + 2 3 + 5 * ( 10 - 20 )$ Reduce factor : NUMBER + 3 factor + 5 * ( 10 - 20 )$ Reduce term : factor + 4 term + 5 * ( 10 - 20 )$ Reduce expr : term + 5 expr + 5 * ( 10 - 20 )$ Shift + + 6 expr + 5 * ( 10 - 20 )$ Shift 5 + 7 expr + 5 * ( 10 - 20 )$ Reduce factor : NUMBER + 8 expr + factor * ( 10 - 20 )$ Reduce term : factor + 9 expr + term * ( 10 - 20 )$ Shift * + 10 expr + term * ( 10 - 20 )$ Shift ( + 11 expr + term * ( 10 - 20 )$ Shift 10 + 12 expr + term * ( 10 - 20 )$ Reduce factor : NUMBER + 13 expr + term * ( factor - 20 )$ Reduce term : factor + 14 expr + term * ( term - 20 )$ Reduce expr : term + 15 expr + term * ( expr - 20 )$ Shift - + 16 expr + term * ( expr - 20 )$ Shift 20 + 17 expr + term * ( expr - 20 )$ Reduce factor : NUMBER + 18 expr + term * ( expr - factor )$ Reduce term : factor + 19 expr + term * ( expr - term )$ Reduce expr : expr - term + 20 expr + term * ( expr )$ Shift ) + 21 expr + term * ( expr ) $ Reduce factor : (expr) + 22 expr + term * factor $ Reduce term : term * factor + 23 expr + term $ Reduce expr : expr + term + 24 expr $ Reduce expr + 25 $ Success! + +When parsing the expression, an underlying state machine and the current +input token determine what happens next. If the next token looks like +part of a valid grammar rule (based on other items on the stack), it is +generally shifted onto the stack. If the top of the stack contains a +valid right-hand-side of a grammar rule, it is usually \"reduced\" and +the symbols replaced with the symbol on the left-hand-side. When this +reduction occurs, the appropriate action is triggered (if defined). If +the input token can\'t be shifted and the top of stack doesn\'t match +any grammar rules, a syntax error has occurred and the parser must take +some kind of recovery step (or bail out). A parse is only successful if +the parser reaches a state where the symbol stack is empty and there are +no more input tokens. + +It is important to note that the underlying implementation is built +around a large finite-state machine that is encoded in a collection of +tables. The construction of these tables is non-trivial and beyond the +scope of this discussion. However, subtle details of this process +explain why, in the example above, the parser chooses to shift a token +onto the stack in step 9 rather than reducing the rule +`expr : expr + term`. + +## Yacc + +The `ply.yacc` module implements the parsing component of PLY. The name +\"yacc\" stands for \"Yet Another Compiler Compiler\" and is borrowed +from the Unix tool of the same name. + +### An example + +Suppose you wanted to make a grammar for simple arithmetic expressions +as previously described. Here is how you would do it with `yacc.py`: + + # Yacc example + + import ply.yacc as yacc + + # Get the token map from the lexer. This is required. 
+ from calclex import tokens + + def p_expression_plus(p): + 'expression : expression PLUS term' + p[0] = p[1] + p[3] + + def p_expression_minus(p): + 'expression : expression MINUS term' + p[0] = p[1] - p[3] + + def p_expression_term(p): + 'expression : term' + p[0] = p[1] + + def p_term_times(p): + 'term : term TIMES factor' + p[0] = p[1] * p[3] + + def p_term_div(p): + 'term : term DIVIDE factor' + p[0] = p[1] / p[3] + + def p_term_factor(p): + 'term : factor' + p[0] = p[1] + + def p_factor_num(p): + 'factor : NUMBER' + p[0] = p[1] + + def p_factor_expr(p): + 'factor : LPAREN expression RPAREN' + p[0] = p[2] + + # Error rule for syntax errors + def p_error(p): + print("Syntax error in input!") + + # Build the parser + parser = yacc.yacc() + + while True: + try: + s = input('calc > ') + except EOFError: + break + if not s: continue + result = parser.parse(s) + print(result) + +In this example, each grammar rule is defined by a Python function where +the docstring to that function contains the appropriate context-free +grammar specification. The statements that make up the function body +implement the semantic actions of the rule. Each function accepts a +single argument `p` that is a sequence containing the values of each +grammar symbol in the corresponding rule. The values of `p[i]` are +mapped to grammar symbols as shown here: + + def p_expression_plus(p): + 'expression : expression PLUS term' + # ^ ^ ^ ^ + # p[0] p[1] p[2] p[3] + + p[0] = p[1] + p[3] + +For tokens, the \"value\" of the corresponding `p[i]` is the *same* as +the `p.value` attribute assigned in the lexer module. For non-terminals, +the value is determined by whatever is placed in `p[0]` when rules are +reduced. This value can be anything at all. However, it probably most +common for the value to be a simple Python type, a tuple, or an +instance. In this example, we are relying on the fact that the `NUMBER` +token stores an integer value in its value field. All of the other rules +perform various types of integer operations and propagate the result. + +Note: The use of negative indices have a special meaning in +yacc\-\--specially `p[-1]` does not have the same value as `p[3]` in +this example. Please see the section on \"Embedded Actions\" for further +details. + +The first rule defined in the yacc specification determines the starting +grammar symbol (in this case, a rule for `expression` appears first). +Whenever the starting rule is reduced by the parser and no more input is +available, parsing stops and the final value is returned (this value +will be whatever the top-most rule placed in `p[0]`). Note: an +alternative starting symbol can be specified using the `start` keyword +argument to `yacc()`. + +The `p_error(p)` rule is defined to catch syntax errors. See the error +handling section below for more detail. + +To build the parser, call the `yacc.yacc()` function. This function +looks at the module and attempts to construct all of the LR parsing +tables for the grammar you have specified. + +If any errors are detected in your grammar specification, `yacc.py` will +produce diagnostic messages and possibly raise an exception. Some of the +errors that can be detected include: + +- Duplicated function names (if more than one rule function have the + same name in the grammar file). +- Shift/reduce and reduce/reduce conflicts generated by ambiguous + grammars. +- Badly specified grammar rules. +- Infinite recursion (rules that can never terminate). 
+- Unused rules and tokens +- Undefined rules and tokens + +The next few sections discuss grammar specification in more detail. + +The final part of the example shows how to actually run the parser +created by `yacc()`. To run the parser, you have to call the `parse()` +with a string of input text. This will run all of the grammar rules and +return the result of the entire parse. This result return is the value +assigned to `p[0]` in the starting grammar rule. + +### Combining Grammar Rule Functions + +When grammar rules are similar, they can be combined into a single +function. For example, consider the two rules in our earlier example: + + def p_expression_plus(p): + 'expression : expression PLUS term' + p[0] = p[1] + p[3] + + def p_expression_minus(t): + 'expression : expression MINUS term' + p[0] = p[1] - p[3] + +Instead of writing two functions, you might write a single function like +this: + + def p_expression(p): + '''expression : expression PLUS term + | expression MINUS term''' + if p[2] == '+': + p[0] = p[1] + p[3] + elif p[2] == '-': + p[0] = p[1] - p[3] + +In general, the docstring for any given function can contain multiple +grammar rules. So, it would have also been legal (although possibly +confusing) to write this: + + def p_binary_operators(p): + '''expression : expression PLUS term + | expression MINUS term + term : term TIMES factor + | term DIVIDE factor''' + if p[2] == '+': + p[0] = p[1] + p[3] + elif p[2] == '-': + p[0] = p[1] - p[3] + elif p[2] == '*': + p[0] = p[1] * p[3] + elif p[2] == '/': + p[0] = p[1] / p[3] + +When combining grammar rules into a single function, it is usually a +good idea for all of the rules to have a similar structure (e.g., the +same number of terms). Otherwise, the corresponding action code may be +more complicated than necessary. However, it is possible to handle +simple cases using len(). For example: + + def p_expressions(p): + '''expression : expression MINUS expression + | MINUS expression''' + if (len(p) == 4): + p[0] = p[1] - p[3] + elif (len(p) == 3): + p[0] = -p[2] + +If parsing performance is a concern, you should resist the urge to put +too much conditional processing into a single grammar rule as shown in +these examples. When you add checks to see which grammar rule is being +handled, you are actually duplicating the work that the parser has +already performed (i.e., the parser already knows exactly what rule it +matched). You can eliminate this overhead by using a separate `p_rule()` +function for each grammar rule. + +### Character Literals + +If desired, a grammar may contain tokens defined as single character +literals. For example: + + def p_binary_operators(p): + '''expression : expression '+' term + | expression '-' term + term : term '*' factor + | term '/' factor''' + if p[2] == '+': + p[0] = p[1] + p[3] + elif p[2] == '-': + p[0] = p[1] - p[3] + elif p[2] == '*': + p[0] = p[1] * p[3] + elif p[2] == '/': + p[0] = p[1] / p[3] + +A character literal must be enclosed in quotes such as `'+'`. In +addition, if literals are used, they must be declared in the +corresponding `lex` file through the use of a special `literals` +declaration: + + # Literals. Should be placed in module given to lex() + literals = ['+','-','*','/' ] + +Character literals are limited to a single character. Thus, it is not +legal to specify literals such as `'<='` or `'=='`. For this, use the +normal lexing rules (e.g., define a rule such as `t_EQ = r'=='`). 
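Since character literals involve both halves of a PLY program, a short self-contained sketch may help (the `group` and `items` rules here are invented for illustration). The lexer declares the characters once in `literals`; the grammar then refers to those same characters in quotes:

    import ply.lex as lex
    import ply.yacc as yacc

    tokens = ('NUMBER',)
    literals = ['(', ')', ',']     # returned "as is"; type and value are the character

    t_ignore = ' \t'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    def p_group(p):
        "group : '(' items ')'"
        p[0] = p[2]

    def p_items_append(p):
        "items : items ',' NUMBER"
        p[0] = p[1] + [p[3]]

    def p_items_first(p):
        "items : NUMBER"
        p[0] = [p[1]]

    def p_error(p):
        print("Syntax error")

    lexer = lex.lex()
    parser = yacc.yacc()
    print(parser.parse("(1, 2, 3)"))   # -> [1, 2, 3]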
+ +### Empty Productions + +`yacc.py` can handle empty productions by defining a rule like this: + + def p_empty(p): + 'empty :' + pass + +Now to use the empty production, use \'empty\' as a symbol. For example: + + def p_optitem(p): + 'optitem : item' + ' | empty' + ... + +Note: You can write empty rules anywhere by specifying an empty right +hand side. However, I personally find that writing an \"empty\" rule and +using \"empty\" to denote an empty production is easier to read and more +clearly states your intentions. + +### Changing the starting symbol + +Normally, the first rule found in a yacc specification defines the +starting grammar rule (top level rule). To change this, supply a `start` +specifier in your file. For example: + + start = 'foo' + + def p_bar(p): + 'bar : A B' + + # This is the starting rule due to the start specifier above + def p_foo(p): + 'foo : bar X' + ... + +The use of a `start` specifier may be useful during debugging since you +can use it to have yacc build a subset of a larger grammar. For this +purpose, it is also possible to specify a starting symbol as an argument +to `yacc()`. For example: + + parser = yacc.yacc(start='foo') + +### Dealing With Ambiguous Grammars + +The expression grammar given in the earlier example has been written in +a special format to eliminate ambiguity. However, in many situations, it +is extremely difficult or awkward to write grammars in this format. A +much more natural way to express the grammar is in a more compact form +like this: + + expression : expression PLUS expression + | expression MINUS expression + | expression TIMES expression + | expression DIVIDE expression + | LPAREN expression RPAREN + | NUMBER + +Unfortunately, this grammar specification is ambiguous. For example, if +you are parsing the string \"3 \* 4 + 5\", there is no way to tell how +the operators are supposed to be grouped. For example, does the +expression mean \"(3 \* 4) + 5\" or is it \"3 \* (4+5)\"? + +When an ambiguous grammar is given to `yacc.py` it will print messages +about \"shift/reduce conflicts\" or \"reduce/reduce conflicts\". A +shift/reduce conflict is caused when the parser generator can\'t decide +whether or not to reduce a rule or shift a symbol on the parsing stack. +For example, consider the string \"3 \* 4 + 5\" and the internal parsing +stack: + + Step Symbol Stack Input Tokens Action + ---- --------------------- --------------------- ------------------------------- + 1 $ 3 * 4 + 5$ Shift 3 + 2 $ 3 * 4 + 5$ Reduce : expression : NUMBER + 3 $ expr * 4 + 5$ Shift * + 4 $ expr * 4 + 5$ Shift 4 + 5 $ expr * 4 + 5$ Reduce: expression : NUMBER + 6 $ expr * expr + 5$ SHIFT/REDUCE CONFLICT ???? + +In this case, when the parser reaches step 6, it has two options. One is +to reduce the rule `expr : expr * expr` on the stack. The other option +is to shift the token `+` on the stack. Both options are perfectly legal +from the rules of the context-free-grammar. + +By default, all shift/reduce conflicts are resolved in favor of +shifting. Therefore, in the above example, the parser will always shift +the `+` instead of reducing. Although this strategy works in many cases +(for example, the case of \"if-then\" versus \"if-then-else\"), it is +not enough for arithmetic expressions. In fact, in the above example, +the decision to shift `+` is completely wrong\-\--we should have reduced +`expr * expr` since multiplication has higher mathematical precedence +than addition. 
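To make the conflict concrete, here is a sketch of the ambiguous grammar written out as PLY rules (it reuses the tokens from the earlier `calclex` example; the rule names are illustrative). Building this parser reports shift/reduce conflicts, and the default shift-first resolution groups the input in the mathematically wrong way:

    # ambiguous expression grammar (sketch)
    import ply.yacc as yacc
    from calclex import tokens

    def p_expression_binop(p):
        '''expression : expression PLUS expression
                      | expression MINUS expression
                      | expression TIMES expression
                      | expression DIVIDE expression'''
        if p[2] == '+':
            p[0] = p[1] + p[3]
        elif p[2] == '-':
            p[0] = p[1] - p[3]
        elif p[2] == '*':
            p[0] = p[1] * p[3]
        else:
            p[0] = p[1] / p[3]

    def p_expression_group(p):
        'expression : LPAREN expression RPAREN'
        p[0] = p[2]

    def p_expression_number(p):
        'expression : NUMBER'
        p[0] = p[1]

    def p_error(p):
        print("Syntax error in input!")

    parser = yacc.yacc()               # warns about shift/reduce conflicts

    # '+' is shifted before 'expression TIMES expression' is reduced,
    # so the input is grouped as 3 * (4 + 5)
    print(parser.parse("3 * 4 + 5"))   # -> 27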
+ +To resolve ambiguity, especially in expression grammars, `yacc.py` +allows individual tokens to be assigned a precedence level and +associativity. This is done by adding a variable `precedence` to the +grammar file like this: + + precedence = ( + ('left', 'PLUS', 'MINUS'), + ('left', 'TIMES', 'DIVIDE'), + ) + +This declaration specifies that `PLUS`/`MINUS` have the same precedence +level and are left-associative and that `TIMES`/`DIVIDE` have the same +precedence and are left-associative. Within the `precedence` +declaration, tokens are ordered from lowest to highest precedence. Thus, +this declaration specifies that `TIMES`/`DIVIDE` have higher precedence +than `PLUS`/`MINUS` (since they appear later in the precedence +specification). + +The precedence specification works by associating a numerical precedence +level value and associativity direction to the listed tokens. For +example, in the above example you get: + + PLUS : level = 1, assoc = 'left' + MINUS : level = 1, assoc = 'left' + TIMES : level = 2, assoc = 'left' + DIVIDE : level = 2, assoc = 'left' + +These values are then used to attach a numerical precedence value and +associativity direction to each grammar rule. *This is always determined +by looking at the precedence of the right-most terminal symbol.* For +example: + + expression : expression PLUS expression # level = 1, left + | expression MINUS expression # level = 1, left + | expression TIMES expression # level = 2, left + | expression DIVIDE expression # level = 2, left + | LPAREN expression RPAREN # level = None (not specified) + | NUMBER # level = None (not specified) + +When shift/reduce conflicts are encountered, the parser generator +resolves the conflict by looking at the precedence rules and +associativity specifiers. + +1. If the current token has higher precedence than the rule on the + stack, it is shifted. +2. If the grammar rule on the stack has higher precedence, the rule is + reduced. +3. If the current token and the grammar rule have the same precedence, + the rule is reduced for left associativity, whereas the token is + shifted for right associativity. +4. If nothing is known about the precedence, shift/reduce conflicts are + resolved in favor of shifting (the default). + +For example, if \"expression PLUS expression\" has been parsed and the +next token is \"TIMES\", the action is going to be a shift because +\"TIMES\" has a higher precedence level than \"PLUS\". On the other +hand, if \"expression TIMES expression\" has been parsed and the next +token is \"PLUS\", the action is going to be reduce because \"PLUS\" has +a lower precedence than \"TIMES.\" + +When shift/reduce conflicts are resolved using the first three +techniques (with the help of precedence rules), `yacc.py` will report no +errors or conflicts in the grammar (although it will print some +information in the `parser.out` debugging file). + +One problem with the precedence specifier technique is that it is +sometimes necessary to change the precedence of an operator in certain +contexts. For example, consider a unary-minus operator in \"3 + 4 \* +-5\". Mathematically, the unary minus is normally given a very high +precedence\--being evaluated before the multiply. However, in our +precedence specifier, MINUS has a lower precedence than TIMES. 
To deal +with this, precedence rules can be given for so-called \"fictitious +tokens\" like this: + + precedence = ( + ('left', 'PLUS', 'MINUS'), + ('left', 'TIMES', 'DIVIDE'), + ('right', 'UMINUS'), # Unary minus operator + ) + +Now, in the grammar file, we can write our unary minus rule like this: + + def p_expr_uminus(p): + 'expression : MINUS expression %prec UMINUS' + p[0] = -p[2] + +In this case, `%prec UMINUS` overrides the default rule +precedence\--setting it to that of UMINUS in the precedence specifier. + +At first, the use of UMINUS in this example may appear very confusing. +UMINUS is not an input token or a grammar rule. Instead, you should +think of it as the name of a special marker in the precedence table. +When you use the `%prec` qualifier, you\'re telling yacc that you want +the precedence of the expression to be the same as for this special +marker instead of the usual precedence. + +It is also possible to specify non-associativity in the `precedence` +table. This would be used when you *don\'t* want operations to chain +together. For example, suppose you wanted to support comparison +operators like `<` and `>` but you didn\'t want to allow combinations +like `a < b < c`. To do this, specify a rule like this: + + precedence = ( + ('nonassoc', 'LESSTHAN', 'GREATERTHAN'), # Nonassociative operators + ('left', 'PLUS', 'MINUS'), + ('left', 'TIMES', 'DIVIDE'), + ('right', 'UMINUS'), # Unary minus operator + ) + +If you do this, the occurrence of input text such as `a < b < c` will +result in a syntax error. However, simple expressions such as `a < b` +will still be fine. + +Reduce/reduce conflicts are caused when there are multiple grammar rules +that can be applied to a given set of symbols. This kind of conflict is +almost always bad and is always resolved by picking the rule that +appears first in the grammar file. Reduce/reduce conflicts are almost +always caused when different sets of grammar rules somehow generate the +same set of symbols. For example: + + assignment : ID EQUALS NUMBER + | ID EQUALS expression + + expression : expression PLUS expression + | expression MINUS expression + | expression TIMES expression + | expression DIVIDE expression + | LPAREN expression RPAREN + | NUMBER + +In this case, a reduce/reduce conflict exists between these two rules: + + assignment : ID EQUALS NUMBER + expression : NUMBER + +For example, if you wrote \"a = 5\", the parser can\'t figure out if +this is supposed to be reduced as `assignment : ID EQUALS NUMBER` or +whether it\'s supposed to reduce the 5 as an expression and then reduce +the rule `assignment : ID EQUALS expression`. + +It should be noted that reduce/reduce conflicts are notoriously +difficult to spot looking at the input grammar. When a reduce/reduce +conflict occurs, `yacc()` will try to help by printing a warning message +such as this: + + WARNING: 1 reduce/reduce conflict + WARNING: reduce/reduce conflict in state 15 resolved using rule (assignment -> ID EQUALS NUMBER) + WARNING: rejected rule (expression -> NUMBER) + +This message identifies the two rules that are in conflict. However, it +may not tell you how the parser arrived at such a state. To try and +figure it out, you\'ll probably have to look at your grammar and the +contents of the `parser.out` debugging file with an appropriately high +level of caffeination. + +### The parser.out file + +Tracking down shift/reduce and reduce/reduce conflicts is one of the +finer pleasures of using an LR parsing algorithm. 
To assist in +debugging, `yacc.py` can create a debugging file called \'parser.out\'. +To create this file, use `yacc.yacc(debug=True)`. The contents of this +file look like the following: + + Unused terminals: + + + Grammar + + Rule 1 expression -> expression PLUS expression + Rule 2 expression -> expression MINUS expression + Rule 3 expression -> expression TIMES expression + Rule 4 expression -> expression DIVIDE expression + Rule 5 expression -> NUMBER + Rule 6 expression -> LPAREN expression RPAREN + + Terminals, with rules where they appear + + TIMES : 3 + error : + MINUS : 2 + RPAREN : 6 + LPAREN : 6 + DIVIDE : 4 + PLUS : 1 + NUMBER : 5 + + Nonterminals, with rules where they appear + + expression : 1 1 2 2 3 3 4 4 6 0 + + + Parsing method: LALR + + + state 0 + + S' -> . expression + expression -> . expression PLUS expression + expression -> . expression MINUS expression + expression -> . expression TIMES expression + expression -> . expression DIVIDE expression + expression -> . NUMBER + expression -> . LPAREN expression RPAREN + + NUMBER shift and go to state 3 + LPAREN shift and go to state 2 + + + state 1 + + S' -> expression . + expression -> expression . PLUS expression + expression -> expression . MINUS expression + expression -> expression . TIMES expression + expression -> expression . DIVIDE expression + + PLUS shift and go to state 6 + MINUS shift and go to state 5 + TIMES shift and go to state 4 + DIVIDE shift and go to state 7 + + + state 2 + + expression -> LPAREN . expression RPAREN + expression -> . expression PLUS expression + expression -> . expression MINUS expression + expression -> . expression TIMES expression + expression -> . expression DIVIDE expression + expression -> . NUMBER + expression -> . LPAREN expression RPAREN + + NUMBER shift and go to state 3 + LPAREN shift and go to state 2 + + + state 3 + + expression -> NUMBER . + + $ reduce using rule 5 + PLUS reduce using rule 5 + MINUS reduce using rule 5 + TIMES reduce using rule 5 + DIVIDE reduce using rule 5 + RPAREN reduce using rule 5 + + + state 4 + + expression -> expression TIMES . expression + expression -> . expression PLUS expression + expression -> . expression MINUS expression + expression -> . expression TIMES expression + expression -> . expression DIVIDE expression + expression -> . NUMBER + expression -> . LPAREN expression RPAREN + + NUMBER shift and go to state 3 + LPAREN shift and go to state 2 + + + state 5 + + expression -> expression MINUS . expression + expression -> . expression PLUS expression + expression -> . expression MINUS expression + expression -> . expression TIMES expression + expression -> . expression DIVIDE expression + expression -> . NUMBER + expression -> . LPAREN expression RPAREN + + NUMBER shift and go to state 3 + LPAREN shift and go to state 2 + + + state 6 + + expression -> expression PLUS . expression + expression -> . expression PLUS expression + expression -> . expression MINUS expression + expression -> . expression TIMES expression + expression -> . expression DIVIDE expression + expression -> . NUMBER + expression -> . LPAREN expression RPAREN + + NUMBER shift and go to state 3 + LPAREN shift and go to state 2 + + + state 7 + + expression -> expression DIVIDE . expression + expression -> . expression PLUS expression + expression -> . expression MINUS expression + expression -> . expression TIMES expression + expression -> . expression DIVIDE expression + expression -> . NUMBER + expression -> . 
LPAREN expression RPAREN + + NUMBER shift and go to state 3 + LPAREN shift and go to state 2 + + + state 8 + + expression -> LPAREN expression . RPAREN + expression -> expression . PLUS expression + expression -> expression . MINUS expression + expression -> expression . TIMES expression + expression -> expression . DIVIDE expression + + RPAREN shift and go to state 13 + PLUS shift and go to state 6 + MINUS shift and go to state 5 + TIMES shift and go to state 4 + DIVIDE shift and go to state 7 + + + state 9 + + expression -> expression TIMES expression . + expression -> expression . PLUS expression + expression -> expression . MINUS expression + expression -> expression . TIMES expression + expression -> expression . DIVIDE expression + + $ reduce using rule 3 + PLUS reduce using rule 3 + MINUS reduce using rule 3 + TIMES reduce using rule 3 + DIVIDE reduce using rule 3 + RPAREN reduce using rule 3 + + ! PLUS [ shift and go to state 6 ] + ! MINUS [ shift and go to state 5 ] + ! TIMES [ shift and go to state 4 ] + ! DIVIDE [ shift and go to state 7 ] + + state 10 + + expression -> expression MINUS expression . + expression -> expression . PLUS expression + expression -> expression . MINUS expression + expression -> expression . TIMES expression + expression -> expression . DIVIDE expression + + $ reduce using rule 2 + PLUS reduce using rule 2 + MINUS reduce using rule 2 + RPAREN reduce using rule 2 + TIMES shift and go to state 4 + DIVIDE shift and go to state 7 + + ! TIMES [ reduce using rule 2 ] + ! DIVIDE [ reduce using rule 2 ] + ! PLUS [ shift and go to state 6 ] + ! MINUS [ shift and go to state 5 ] + + state 11 + + expression -> expression PLUS expression . + expression -> expression . PLUS expression + expression -> expression . MINUS expression + expression -> expression . TIMES expression + expression -> expression . DIVIDE expression + + $ reduce using rule 1 + PLUS reduce using rule 1 + MINUS reduce using rule 1 + RPAREN reduce using rule 1 + TIMES shift and go to state 4 + DIVIDE shift and go to state 7 + + ! TIMES [ reduce using rule 1 ] + ! DIVIDE [ reduce using rule 1 ] + ! PLUS [ shift and go to state 6 ] + ! MINUS [ shift and go to state 5 ] + + state 12 + + expression -> expression DIVIDE expression . + expression -> expression . PLUS expression + expression -> expression . MINUS expression + expression -> expression . TIMES expression + expression -> expression . DIVIDE expression + + $ reduce using rule 4 + PLUS reduce using rule 4 + MINUS reduce using rule 4 + TIMES reduce using rule 4 + DIVIDE reduce using rule 4 + RPAREN reduce using rule 4 + + ! PLUS [ shift and go to state 6 ] + ! MINUS [ shift and go to state 5 ] + ! TIMES [ shift and go to state 4 ] + ! DIVIDE [ shift and go to state 7 ] + + state 13 + + expression -> LPAREN expression RPAREN . + + $ reduce using rule 6 + PLUS reduce using rule 6 + MINUS reduce using rule 6 + TIMES reduce using rule 6 + DIVIDE reduce using rule 6 + RPAREN reduce using rule 6 + +The different states that appear in this file are a representation of +every possible sequence of valid input tokens allowed by the grammar. +When receiving input tokens, the parser is building up a stack and +looking for matching rules. Each state keeps track of the grammar rules +that might be in the process of being matched at that point. Within each +rule, the \".\" character indicates the current location of the parse +within that rule. In addition, the actions for each valid input token +are listed. 
When a shift/reduce or reduce/reduce conflict arises, rules +*not* selected are prefixed with an !. For example: + + ! TIMES [ reduce using rule 2 ] + ! DIVIDE [ reduce using rule 2 ] + ! PLUS [ shift and go to state 6 ] + ! MINUS [ shift and go to state 5 ] + +By looking at these rules (and with a little practice), you can usually +track down the source of most parsing conflicts. It should also be +stressed that not all shift-reduce conflicts are bad. However, the only +way to be sure that they are resolved correctly is to look at +`parser.out`. + +### Syntax Error Handling + +If you are creating a parser for production use, the handling of syntax +errors is important. As a general rule, you don\'t want a parser to +throw up its hands and stop at the first sign of trouble. Instead, you +want it to report the error, recover if possible, and continue parsing +so that all of the errors in the input get reported to the user at once. +This is the standard behavior found in compilers for languages such as +C, C++, and Java. + +In PLY, when a syntax error occurs during parsing, the error is +immediately detected (i.e., the parser does not read any more tokens +beyond the source of the error). However, at this point, the parser +enters a recovery mode that can be used to try and continue further +parsing. As a general rule, error recovery in LR parsers is a delicate +topic that involves ancient rituals and black-magic. The recovery +mechanism provided by `yacc.py` is comparable to Unix yacc so you may +want consult a book like O\'Reilly\'s \"Lex and Yacc\" for some of the +finer details. + +When a syntax error occurs, `yacc.py` performs the following steps: + +1. On the first occurrence of an error, the user-defined `p_error()` + function is called with the offending token as an argument. However, + if the syntax error is due to reaching the end-of-file, `p_error()` + is called with an argument of `None`. Afterwards, the parser enters + an \"error-recovery\" mode in which it will not make future calls to + `p_error()` until it has successfully shifted at least 3 tokens onto + the parsing stack. +2. If no recovery action is taken in `p_error()`, the offending + lookahead token is replaced with a special `error` token. +3. If the offending lookahead token is already set to `error`, the top + item of the parsing stack is deleted. +4. If the entire parsing stack is unwound, the parser enters a restart + state and attempts to start parsing from its initial state. +5. If a grammar rule accepts `error` as a token, it will be shifted + onto the parsing stack. +6. If the top item of the parsing stack is `error`, lookahead tokens + will be discarded until the parser can successfully shift a new + symbol or reduce a rule involving `error`. + +#### Recovery and resynchronization with error rules + +The most well-behaved approach for handling syntax errors is to write +grammar rules that include the `error` token. For example, suppose your +language had a grammar rule for a print statement like this: + + def p_statement_print(p): + 'statement : PRINT expr SEMI' + ... + +To account for the possibility of a bad expression, you might write an +additional grammar rule like this: + + def p_statement_print_error(p): + 'statement : PRINT error SEMI' + print("Syntax error in print statement. Bad expression") + +In this case, the `error` token will match any sequence of tokens that +might appear up to the first semicolon that is encountered. 
Once the +semicolon is reached, the rule will be invoked and the `error` token +will go away. + +This type of recovery is sometimes known as parser resynchronization. +The `error` token acts as a wildcard for any bad input text and the +token immediately following `error` acts as a synchronization token. + +It is important to note that the `error` token usually does not appear +as the last token on the right in an error rule. For example: + + def p_statement_print_error(p): + 'statement : PRINT error' + print("Syntax error in print statement. Bad expression") + +This is because the first bad token encountered will cause the rule to +be reduced\--which may make it difficult to recover if more bad tokens +immediately follow. + +#### Panic mode recovery + +An alternative error recovery scheme is to enter a panic mode recovery +in which tokens are discarded to a point where the parser might be able +to recover in some sensible manner. + +Panic mode recovery is implemented entirely in the `p_error()` function. +For example, this function starts discarding tokens until it reaches a +closing \'}\'. Then, it restarts the parser in its initial state: + + def p_error(p): + print("Whoa. You are seriously hosed.") + if not p: + print("End of File!") + return + + # Read ahead looking for a closing '}' + while True: + tok = parser.token() # Get the next token + if not tok or tok.type == 'RBRACE': + break + parser.restart() + +This function discards the bad token and tells the parser that the error +was ok: + + def p_error(p): + if p: + print("Syntax error at token", p.type) + # Just discard the token and tell the parser it's okay. + parser.errok() + else: + print("Syntax error at EOF") + +More information on these methods is as follows: + +`parser.errok()` + +: This resets the parser state so it doesn\'t think it\'s in + error-recovery mode. This will prevent an `error` token from being + generated and will reset the internal error counters so that the + next syntax error will call `p_error()` again. + +`parser.token()` + +: This returns the next token on the input stream. + +`parser.restart()`. + +: This discards the entire parsing stack and resets the parser to its + initial state. + +To supply the next lookahead token to the parser, `p_error()` can return +a token. This might be useful if trying to synchronize on special +characters. For example: + + def p_error(p): + # Read ahead looking for a terminating ";" + while True: + tok = parser.token() # Get the next token + if not tok or tok.type == 'SEMI': break + parser.errok() + + # Return SEMI to the parser as the next lookahead token + return tok + +Keep in mind in that the above error handling functions, `parser` is an +instance of the parser created by `yacc()`. You\'ll need to save this +instance someplace in your code so that you can refer to it during error +handling. + +#### Signalling an error from a production + +If necessary, a production rule can manually force the parser to enter +error recovery. This is done by raising the `SyntaxError` exception like +this: + + def p_production(p): + 'production : some production ...' + raise SyntaxError + +The effect of raising `SyntaxError` is the same as if the last symbol +shifted onto the parsing stack was actually a syntax error. Thus, when +you do this, the last symbol shifted is popped off of the parsing stack +and the current lookahead token is set to an `error` token. The parser +then enters error-recovery mode where it tries to reduce rules that can +accept `error` tokens. 
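
For example, a calculator-style grammar might use this to reject a
division by zero that is detected while a rule is being evaluated. This
variant of the division rule is only a sketch:

    def p_expression_divide(p):
        'expression : expression DIVIDE expression'
        if p[3] == 0:
            # Report the problem and force the parser into
            # error-recovery mode.
            print("Division by zero")
            raise SyntaxError
        p[0] = p[1] / p[3]
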
The steps that follow from this point are exactly +the same as if a syntax error were detected and `p_error()` were called. + +One important aspect of manually setting an error is that the +`p_error()` function will NOT be called in this case. If you need to +issue an error message, make sure you do it in the production that +raises `SyntaxError`. + +Note: This feature of PLY is meant to mimic the behavior of the YYERROR +macro in yacc. + +#### When Do Syntax Errors Get Reported? + +In most cases, yacc will handle errors as soon as a bad input token is +detected on the input. However, be aware that yacc may choose to delay +error handling until after it has reduced one or more grammar rules +first. This behavior might be unexpected, but it\'s related to special +states in the underlying parsing table known as \"defaulted states.\" A +defaulted state is parsing condition where the same grammar rule will be +reduced regardless of what *valid* token comes next on the input. For +such states, yacc chooses to go ahead and reduce the grammar rule +*without reading the next input token*. If the next token is bad, yacc +will eventually get around to reading it and report a syntax error. +It\'s just a little unusual in that you might see some of your grammar +rules firing immediately prior to the syntax error. + +Usually, the delayed error reporting with defaulted states is harmless +(and there are other reasons for wanting PLY to behave in this way). +However, if you need to turn this behavior off for some reason. You can +clear the defaulted states table like this: + + parser = yacc.yacc() + parser.defaulted_states = {} + +Disabling defaulted states is not recommended if your grammar makes use +of embedded actions as described in Section 6.11. + +#### General comments on error handling + +For normal types of languages, error recovery with error rules and +resynchronization characters is probably the most reliable technique. +This is because you can instrument the grammar to catch errors at +selected places where it is relatively easy to recover and continue +parsing. Panic mode recovery is really only useful in certain +specialized applications where you might want to discard huge portions +of the input text to find a valid restart point. + +### Line Number and Position Tracking + +Position tracking is often a tricky problem when writing compilers. By +default, PLY tracks the line number and position of all tokens. This +information is available using the following functions: + +`p.lineno(num)`. Return the line number for symbol *num* + +`p.lexpos(num)`. Return the lexing position for symbol *num* + +For example: + + def p_expression(p): + 'expression : expression PLUS expression' + line = p.lineno(2) # line number of the PLUS token + index = p.lexpos(2) # Position of the PLUS token + +As an optional feature, `yacc.py` can automatically track line numbers +and positions for all of the grammar symbols as well. However, this +extra tracking requires extra processing and can significantly slow down +parsing. Therefore, it must be enabled by passing the `tracking=True` +option to `yacc.parse()`. For example: + + yacc.parse(data,tracking=True) + +Once enabled, the `lineno()` and `lexpos()` methods work for all grammar +symbols. In addition, two additional methods can be used: + +`p.linespan(num)`. Return a tuple (startline,endline) with the starting +and ending line number for symbol *num*. + +`p.lexspan(num)`. Return a tuple (start,end) with the starting and +ending positions for symbol *num*. 
+ +For example: + + def p_expression(p): + 'expression : expression PLUS expression' + p.lineno(1) # Line number of the left expression + p.lineno(2) # line number of the PLUS operator + p.lineno(3) # line number of the right expression + ... + start,end = p.linespan(3) # Start,end lines of the right expression + starti,endi = p.lexspan(3) # Start,end positions of right expression + +Note: The `lexspan()` function only returns the range of values up to +the start of the last grammar symbol. + +Although it may be convenient for PLY to track position information on +all grammar symbols, this is often unnecessary. For example, if you are +merely using line number information in an error message, you can often +just key off of a specific token in the grammar rule. For example: + + def p_bad_func(p): + 'funccall : fname LPAREN error RPAREN' + # Line number reported from LPAREN token + print("Bad function call at line", p.lineno(2)) + +Similarly, you may get better parsing performance if you only +selectively propagate line number information where it\'s needed using +the `p.set_lineno()` method. For example: + + def p_fname(p): + 'fname : ID' + p[0] = p[1] + p.set_lineno(0,p.lineno(1)) + +PLY doesn\'t retain line number information from rules that have already +been parsed. If you are building an abstract syntax tree and need to +have line numbers, you should make sure that the line numbers appear in +the tree itself. + +### AST Construction + +`yacc.py` provides no special functions for constructing an abstract +syntax tree. However, such construction is easy enough to do on your +own. + +A minimal way to construct a tree is to create and propagate a tuple or +list in each grammar rule function. There are many possible ways to do +this, but one example would be something like this: + + def p_expression_binop(p): + '''expression : expression PLUS expression + | expression MINUS expression + | expression TIMES expression + | expression DIVIDE expression''' + + p[0] = ('binary-expression',p[2],p[1],p[3]) + + def p_expression_group(p): + 'expression : LPAREN expression RPAREN' + p[0] = ('group-expression',p[2]) + + def p_expression_number(p): + 'expression : NUMBER' + p[0] = ('number-expression',p[1]) + +Another approach is to create a set of data structure for different +kinds of abstract syntax tree nodes and assign nodes to `p[0]` in each +rule. For example: + + class Expr: pass + + class BinOp(Expr): + def __init__(self,left,op,right): + self.left = left + self.right = right + self.op = op + + class Number(Expr): + def __init__(self,value): + self.value = value + + def p_expression_binop(p): + '''expression : expression PLUS expression + | expression MINUS expression + | expression TIMES expression + | expression DIVIDE expression''' + + p[0] = BinOp(p[1],p[2],p[3]) + + def p_expression_group(p): + 'expression : LPAREN expression RPAREN' + p[0] = p[2] + + def p_expression_number(p): + 'expression : NUMBER' + p[0] = Number(p[1]) + +The advantage to this approach is that it may make it easier to attach +more complicated semantics, type checking, code generation, and other +features to the node classes. + +To simplify tree traversal, it may make sense to pick a very generic +tree structure for your parse tree nodes. 
For example: + + class Node: + def __init__(self,type,children=None,leaf=None): + self.type = type + if children: + self.children = children + else: + self.children = [ ] + self.leaf = leaf + + def p_expression_binop(p): + '''expression : expression PLUS expression + | expression MINUS expression + | expression TIMES expression + | expression DIVIDE expression''' + + p[0] = Node("binop", [p[1],p[3]], p[2]) + +### Embedded Actions + +The parsing technique used by yacc only allows actions to be executed at +the end of a rule. For example, suppose you have a rule like this: + + def p_foo(p): + "foo : A B C D" + print("Parsed a foo", p[1],p[2],p[3],p[4]) + +In this case, the supplied action code only executes after all of the +symbols `A`, `B`, `C`, and `D` have been parsed. Sometimes, however, it +is useful to execute small code fragments during intermediate stages of +parsing. For example, suppose you wanted to perform some action +immediately after `A` has been parsed. To do this, write an empty rule +like this: + + def p_foo(p): + "foo : A seen_A B C D" + print("Parsed a foo", p[1],p[3],p[4],p[5]) + print("seen_A returned", p[2]) + + def p_seen_A(p): + "seen_A :" + print("Saw an A = ", p[-1]) # Access grammar symbol to left + p[0] = some_value # Assign value to seen_A + +In this example, the empty `seen_A` rule executes immediately after `A` +is shifted onto the parsing stack. Within this rule, `p[-1]` refers to +the symbol on the stack that appears immediately to the left of the +`seen_A` symbol. In this case, it would be the value of `A` in the `foo` +rule immediately above. Like other rules, a value can be returned from +an embedded action by assigning it to `p[0]` + +The use of embedded actions can sometimes introduce extra shift/reduce +conflicts. For example, this grammar has no conflicts: + + def p_foo(p): + """foo : abcd + | abcx""" + + def p_abcd(p): + "abcd : A B C D" + + def p_abcx(p): + "abcx : A B C X" + +However, if you insert an embedded action into one of the rules like +this: + + def p_foo(p): + """foo : abcd + | abcx""" + + def p_abcd(p): + "abcd : A B C D" + + def p_abcx(p): + "abcx : A B seen_AB C X" + + def p_seen_AB(p): + "seen_AB :" + +an extra shift-reduce conflict will be introduced. This conflict is +caused by the fact that the same symbol `C` appears next in both the +`abcd` and `abcx` rules. The parser can either shift the symbol (`abcd` +rule) or reduce the empty rule `seen_AB` (`abcx` rule). + +A common use of embedded rules is to control other aspects of parsing +such as scoping of local variables. For example, if you were parsing C +code, you might write code like this: + + def p_statements_block(p): + "statements: LBRACE new_scope statements RBRACE""" + # Action code + ... + pop_scope() # Return to previous scope + + def p_new_scope(p): + "new_scope :" + # Create a new scope for local variables + s = new_scope() + push_scope(s) + ... + +In this case, the embedded action `new_scope` executes immediately after +a `LBRACE` (`{`) symbol is parsed. This might adjust internal symbol +tables and other aspects of the parser. Upon completion of the rule +`statements_block`, code might undo the operations performed in the +embedded action (e.g., `pop_scope()`). + +### Miscellaneous Yacc Notes + +1. By default, `yacc.py` relies on `lex.py` for tokenizing. However, an + alternative tokenizer can be supplied as follows: + + parser = yacc.parse(lexer=x) + + in this case, `x` must be a Lexer object that minimally has a + `x.token()` method for retrieving the next token. 
    If an input string is given to `yacc.parse()`, the lexer must also
    have an `x.input()` method.

2.  To print copious amounts of debugging output during parsing, use:

        parser.parse(input_text, debug=True)

3.  Since LR parsing is driven by tables, the performance of the parser
    is largely independent of the size of the grammar. The biggest
    bottlenecks will be the lexer and the complexity of the code in your
    grammar rules.

4.  `yacc()` also allows parsers to be defined as classes and as
    closures (see the section on alternative specification of lexers).
    However, be aware that only one parser may be defined in a single
    module (source file). There are various error checks and validation
    steps that may issue confusing error messages if you try to define
    multiple parsers in the same source file.

## Multiple Parsers and Lexers

In advanced parsing applications, you may want to have multiple parsers
and lexers.

As a general rule, this isn\'t a problem. However, to make it work, you
need to carefully make sure everything gets hooked up correctly. First,
make sure you save the objects returned by `lex()` and `yacc()`. For
example:

    lexer = lex.lex()       # Return lexer object
    parser = yacc.yacc()    # Return parser object

Next, when parsing, make sure you give the `parse()` function a
reference to the lexer it should be using. For example:

    parser.parse(text, lexer=lexer)

If you forget to do this, the parser will use the last lexer
created\--which is not always what you want.

Within lexer and parser rule functions, these objects are also
available. In the lexer, the \"lexer\" attribute of a token refers to
the lexer object that triggered the rule. For example:

    def t_NUMBER(t):
        r'\d+'
        ...
        print(t.lexer)           # Show lexer object

In the parser, the \"lexer\" and \"parser\" attributes refer to the
lexer and parser objects respectively:

    def p_expr_plus(p):
        'expr : expr PLUS expr'
        ...
        print(p.parser)          # Show parser object
        print(p.lexer)           # Show lexer object

If necessary, arbitrary attributes can be attached to the lexer or
parser object. For example, if you wanted to have different parsing
modes, you could attach a mode attribute to the parser object and look
at it later.

## Advanced Debugging

Debugging a compiler is typically not an easy task. PLY provides some
diagnostic capabilities through the use of Python\'s `logging` module.
The next two sections describe this.

### Debugging the lex() and yacc() commands

Both the `lex()` and `yacc()` commands have a debugging mode that can be
enabled using the `debug` flag. For example:

    lex.lex(debug=True)
    yacc.yacc(debug=True)

Normally, the output produced by debugging is routed to either standard
error or, in the case of `yacc()`, to a file `parser.out`. This output
can be more carefully controlled by supplying a logging object. Here is
an example that adds information about where different debugging
messages are coming from:

    # Set up a logging object
    import logging
    logging.basicConfig(
        level = logging.DEBUG,
        filename = "parselog.txt",
        filemode = "w",
        format = "%(filename)10s:%(lineno)4d:%(message)s"
    )
    log = logging.getLogger()

    lex.lex(debug=True, debuglog=log)
    yacc.yacc(debug=True, debuglog=log)

If you supply a custom logger, the amount of debugging information
produced can be controlled by setting the logging level. Typically,
debugging messages are either issued at the `DEBUG`, `INFO`, or
`WARNING` levels.
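
For example, raising the level on the logger used above keeps only the
more serious messages. This is ordinary `logging` configuration rather
than a PLY-specific feature:

    import logging

    logging.basicConfig(filename="parselog.txt", filemode="w",
                        format="%(filename)10s:%(lineno)4d:%(message)s")
    log = logging.getLogger()
    log.setLevel(logging.WARNING)    # Drop DEBUG/INFO messages

    lex.lex(debug=True, debuglog=log)
    yacc.yacc(debug=True, debuglog=log)
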

PLY\'s error messages and warnings are also produced using the logging
interface. This can be controlled by passing a logging object using the
`errorlog` parameter:

    lex.lex(errorlog=log)
    yacc.yacc(errorlog=log)

If you want to completely silence warnings, you can either pass in a
logging object with an appropriate filter level or use the `NullLogger`
object defined in either `lex` or `yacc`. For example:

    yacc.yacc(errorlog=yacc.NullLogger())

### Run-time Debugging

To enable run-time debugging of a parser, use the `debug` option to
`parse()`. This option can either be an integer (which turns debugging
on or off) or an instance of a logger object. For example:

    log = logging.getLogger()
    parser.parse(input, debug=log)

If a logging object is passed, you can use its filtering level to
control how much output gets generated. The `INFO` level is used to
produce information about rule reductions. The `DEBUG` level will show
information about the parsing stack, token shifts, and other details.
The `ERROR` level shows information related to parsing errors.

For very complicated problems, you should pass in a logging object that
redirects to a file where you can more easily inspect the output after
execution.

## Using Python -OO Mode

Because of PLY\'s reliance on docstrings, it is not compatible with the
`-OO` mode of the interpreter (which strips docstrings). If you want to
support this, you\'ll need to write a decorator or some other tool to
attach docstrings to functions. For example:

    def _(doc):
        def decorate(func):
            func.__doc__ = doc
            return func
        return decorate

    @_("assignment : expr PLUS expr")
    def p_assignment(p):
        ...

PLY does not provide such a decorator by default.

## Where to go from here?

The `examples` directory of the PLY distribution contains several simple
examples. Please consult a compilers textbook for the theory and
underlying implementation details of LR parsing.