author     Arthur Milchior <arthur@milchior.fr>  2021-12-06 17:38:43 +0100
committer  GitHub <noreply@github.com>  2021-12-06 17:38:43 +0100
commit     73b77d5258dfd034b6daa8b6e9c49cb1ab072782
tree       1212b9fd8e990a2b4580def99bc254c454d0d85b /pygments/lexer.py
parent     48c5908977e57f7c8371e991da837eb8a7bf1345
Clarifying some documentation (#1928)
* NF: adding an example of use of simplefilter

  @simplefilter is great, but also not very intuitive. Indeed, the syntax seems to indicate that you define a function with four arguments, while in reality you define a class whose constructor takes arbitrary keyword arguments. I believe that in this case an example showing how to instantiate this filter is really necessary.

  Regarding simplefilter, I also believe that it could be improved in two simple ways:
  * accepting any method which takes lexer and stream as a filter. That would be sufficient as long as there is no option.
  * the @simplefilter decorator could deal with `self` so that the user does not have to add it themselves. Probably not worth doing now, as it would break compatibility with the current version, but it would be even simpler to use.

* NF: clarifying get_..._options

  get_bool_opt's documentation seems to indicate that the key is interpreted as a Boolean, while a quick look at the code clearly shows that it is the value associated with the key that is interpreted as a Boolean. I hope I made the code clearer to anyone who knows Python by indicating that it is essentially `.get` but with extra features.

* NF: clarifying Filter

  `filter` already has a specific meaning in general Python, and for anyone used to functional programming (and even in some DOM processors). So indicating that a filter is not something that merely removes some tokens seems really useful to explain what is going on.

* NF: adding details regarding states in lexer

  I found the state explanation confusing. I do know what a state machine is. However, reading the code, I first thought that there were two distinct variables:
  * the current state
  * the stack
  that are somehow related but distinct. Explaining that the current state is the top of the stack was lacking, in my opinion. That also helps explain #push. In particular, if you define in state "s" an operation whose new state is "#push", the behavior can be quite different than if the new state were "s".
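The first and third bullets (instantiating a @simplefilter filter, and a filter transforming rather than removing tokens) can be illustrated with a short sketch. It assumes a Pygments release where `simplefilter` and `Filter` live in `pygments.filter` and lexers accept filter instances through `add_filter`; `keyword_case` and its `upper` option are hypothetical names chosen for this example (Pygments already ships a built-in `KeywordCaseFilter` for real use).

```python
from pygments.filter import Filter, simplefilter
from pygments.lexers import PythonLexer
from pygments.token import Keyword


# Hypothetical example filter. Despite the four-parameter signature,
# @simplefilter turns this function into a Filter *class*: `self` is the
# filter instance and `options` holds the keyword arguments that were
# passed to the class constructor.
@simplefilter
def keyword_case(self, lexer, stream, options):
    upper = options.get('upper', True)
    # A filter rewrites the token stream; here it changes token values
    # instead of dropping tokens.
    for ttype, value in stream:
        if ttype in Keyword:  # true for Keyword and all of its subtypes
            value = value.upper() if upper else value.lower()
        yield ttype, value


# keyword_case is a class, so it is instantiated with keyword options.
assert isinstance(keyword_case, type) and issubclass(keyword_case, Filter)

lexer = PythonLexer()
lexer.add_filter(keyword_case(upper=True))
source = ''.join(value for _, value in lexer.get_tokens('def f(): pass\n'))
print(source)
```

Calling `keyword_case(upper=True)` is what the decorator syntax hides: the function body only runs later, once per token stream, when the lexer applies its filters.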
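The stack semantics from the last bullet can be modelled in a few lines. This is a hypothetical, simplified model for illustration only, not Pygments' own implementation: `transition` is an invented helper, and it leaves out `combined(...)` and the `'#pop:n'` form. It shows why `'#push'` differs from naming the state: a rule written in state `"s"` but reached through `include('s')` from state `"t"` duplicates `"t"`, not `"s"`.

```python
def transition(stack, new_state):
    """Return the state stack after applying a rule's ``new_state``.

    Simplified, hypothetical model of RegexLexer's stack handling:
    ``None`` means no transition, a string or tuple of strings pushes
    states, '#pop' pops one state (never the last one) and '#push'
    duplicates the current state, i.e. the top of the stack.
    """
    stack = list(stack)  # do not mutate the caller's list
    if new_state is None:
        return stack
    if isinstance(new_state, str):
        new_state = (new_state,)  # a single name acts like a 1-tuple
    for state in new_state:
        if state == '#pop':
            if len(stack) > 1:
                stack.pop()
        elif state == '#push':
            stack.append(stack[-1])
        else:
            stack.append(state)
    return stack


# The current state is always stack[-1].
stack = ['root']
stack = transition(stack, 'string')    # ['root', 'string']
stack = transition(stack, ('a', 'b'))  # ['root', 'string', 'a', 'b']; current state is 'b'
stack = transition(stack, '#pop')      # ['root', 'string', 'a']

# A rule defined in state 's' but matched while the current state is 't'
# (because 't' does include('s')):
print(transition(['root', 't'], '#push'))  # ['root', 't', 't']
print(transition(['root', 't'], 's'))      # ['root', 't', 's']
```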
Diffstat (limited to 'pygments/lexer.py')
-rw-r--r-- pygments/lexer.py 17
1 files changed, 11 insertions, 6 deletions
diff --git a/pygments/lexer.py b/pygments/lexer.py
index cf9ebdf4..33d738a8 100644
--- a/pygments/lexer.py
+++ b/pygments/lexer.py
@@ -590,19 +590,24 @@ class RegexLexer(Lexer, metaclass=RegexLexerMeta):
#: Defaults to MULTILINE.
flags = re.MULTILINE
+ #: At all times there is a stack of states. Initially, the stack contains
+ #: a single state 'root'. The top of the stack is called "the current state".
+ #:
#: Dict of ``{'state': [(regex, tokentype, new_state), ...], ...}``
#:
- #: The initial state is 'root'.
#: ``new_state`` can be omitted to signify no state transition.
- #: If it is a string, the state is pushed on the stack and changed.
- #: If it is a tuple of strings, all states are pushed on the stack and
- #: the current state will be the topmost.
- #: It can also be ``combined('state1', 'state2', ...)``
+ #: If ``new_state`` is a string, it is pushed on the stack. This ensures
+ #: the new current state is ``new_state``.
+ #: If ``new_state`` is a tuple of strings, all of those strings are pushed
+ #: on the stack and the current state will be the last element of the tuple.
+ #: ``new_state`` can also be ``combined('state1', 'state2', ...)``
#: to signify a new, anonymous state combined from the rules of two
#: or more existing ones.
#: Furthermore, it can be '#pop' to signify going back one step in
#: the state stack, or '#push' to push the current state on the stack
- #: again.
+ #: again. Note that if you push while in a combined state, the combined
+ #: state itself is pushed, and not only the state in which the rule is
+ #: defined.
#:
#: The tuple can also be replaced with ``include('state')``, in which
#: case the rules from the state named by the string are included in the