author | Arthur Milchior <arthur@milchior.fr> | 2021-12-06 17:38:43 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2021-12-06 17:38:43 +0100 |
commit | 73b77d5258dfd034b6daa8b6e9c49cb1ab072782 (patch) | |
tree | 1212b9fd8e990a2b4580def99bc254c454d0d85b /pygments/lexer.py | |
parent | 48c5908977e57f7c8371e991da837eb8a7bf1345 (diff) | |
download | pygments-git-73b77d5258dfd034b6daa8b6e9c49cb1ab072782.tar.gz |
Clarifying some documentation (#1928)
* NF: adding an example of use of simple filter
@simplefilter is great, but also not very intuitive. Indeed, the syntax seems
to indicate that you define a function with four arguments while in reality you
define a class whose constructor takes arbitrary keyword arguments. I believe in
this case an example to show how to instantiate this filter is really necessary.
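To make the point above concrete, here is a minimal stand-alone sketch of the pattern. It is not Pygments' actual code: a decorator (hypothetically named `simplefilter_sketch`) turns a four-argument function into a class whose constructor takes arbitrary keyword arguments. The names `FunctionFilterSketch` and `upper_keywords`, and the plain-string token types, are assumptions made for illustration.

```python
class FunctionFilterSketch:
    """Hypothetical base class: stores keyword options, delegates to `function`."""
    function = None

    def __init__(self, **options):
        self.options = options

    def filter(self, lexer, stream):
        # `function` is a class attribute, so it is bound to `self` here;
        # the wrapped function therefore receives (self, lexer, stream, options).
        yield from self.function(lexer, stream, self.options)


def simplefilter_sketch(f):
    """Hypothetical stand-in for the decorator: returns a class, not a function."""
    return type(f.__name__, (FunctionFilterSketch,),
                {"function": f, "__doc__": f.__doc__})


@simplefilter_sketch
def upper_keywords(self, lexer, stream, options):
    """Upper-case 'Keyword' tokens when the `enabled` option is truthy."""
    enabled = options.get("enabled", True)
    for ttype, value in stream:
        yield ttype, value.upper() if enabled and ttype == "Keyword" else value


# The decorated name is now a class; options are passed as keyword arguments.
flt = upper_keywords(enabled=True)
tokens = [("Keyword", "def"), ("Name", "f")]
result = list(flt.filter(None, tokens))
```

Although `upper_keywords` is written as a function, it is used as `upper_keywords(enabled=True)`, which is the source of the confusion described above.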
Regarding simplefilter, I also believe that it could be improved in two simple
ways:
* accepting any method which takes lexer and stream as a filter. That would be
sufficient as long as there is no option
* the @simplefilter decorator could deal with `self` so that the user does not
have to add it themselves. Probably not worth doing now, as it would break
compatibility with the current version, but it would be even simpler to use
* NF: clarifying get_..._options
get_bool_opt's documentation seems to indicate that the key is interpreted as a
Boolean, while a quick look at the code shows clearly that the value associated
with the key is what is interpreted as a Boolean. I hope I made the documentation
clearer to anyone who knows Python by indicating that it is essentially `.get`
but with extra features
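A rough sketch of the behaviour described above, written from scratch rather than copied from Pygments: the helper (hypothetically named `get_bool_opt_sketch`) looks the value up like `dict.get` and then interprets that value, not the key, as a Boolean. The accepted spellings and the use of `ValueError` are assumptions for illustration.

```python
def get_bool_opt_sketch(options, optname, default=None):
    """Like options.get(optname, default), but coerce the value to a bool."""
    value = options.get(optname, default)
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return bool(value)
    if isinstance(value, str):
        lowered = value.lower()
        if lowered in ("1", "yes", "true", "on"):
            return True
        if lowered in ("0", "no", "false", "off"):
            return False
    # Anything else cannot be read as a Boolean in this sketch.
    raise ValueError(f"invalid value {value!r} for option {optname!r}")


options = {"stripnl": "yes", "ensurenl": 0}
strip = get_bool_opt_sketch(options, "stripnl")          # value "yes" is read as a bool
ensure = get_bool_opt_sketch(options, "ensurenl")        # value 0 is read as a bool
missing = get_bool_opt_sketch(options, "absent", True)   # falls back to the default
```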
* NF: clarifying Filter
`filter` already has a specific meaning in general Python, and for anyone
used to functional programming (and even in some DOM processors). So indicating
that a filter is not something that removes some tokens seems really useful to
try to explain what is going on.
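The distinction can be shown in plain Python, without Pygments: the built-in `filter` drops items, whereas a filter in the sense described here is a generator that receives the token stream and yields a (possibly modified) stream. The function name `replace_tabs` and the plain-string token types are made up for this sketch.

```python
def replace_tabs(lexer, stream):
    """Stream transformer: every token is kept, only its text may change."""
    for ttype, value in stream:
        yield ttype, value.replace("\t", "    ")


tokens = [("Text", "\t"), ("Name", "x")]

# Built-in filter(): removes the tokens that fail the predicate.
kept = list(filter(lambda tok: tok[0] == "Name", tokens))

# Token-stream filter: same number of tokens, transformed values.
transformed = list(replace_tabs(None, tokens))
```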
* NF: adding details regarding states in lexer
I found the state explanation confusing. I do know what a state machine
is. However, reading the code, I first thought that there were two distinct
variables:
* the current state
* the stack
that are somehow related but distinct. Explaining that the current state is the
top of the stack was lacking in my opinion. That also helps explain #push. In
particular that if you define in state "s" an operation whose new state is
"#push", the behavior can be quite different than if the new state was "s".
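The difference between "#push" and naming the state explicitly can be illustrated with a deliberately simplified model of the state stack. This is a sketch under assumptions, not the lexer's real transition code; `apply_transition` and the state names "outer" and "s" are hypothetical.

```python
def apply_transition(stack, new_state):
    """Return the stack after a rule fires; the current state is stack[-1]."""
    stack = list(stack)  # work on a copy so the caller's stack is untouched
    if new_state is None:
        pass  # no state transition
    elif new_state == "#pop":
        if len(stack) > 1:
            stack.pop()  # go back one step, but never empty the stack
    elif new_state == "#push":
        stack.append(stack[-1])  # push the *current* state again
    elif isinstance(new_state, tuple):
        stack.extend(new_state)  # the last element becomes the current state
    else:
        stack.append(new_state)
    return stack


# Suppose a rule written in state "s" fires while the current state is
# "outer" (for example because "outer" reuses the rules of "s").
stack = ["root", "outer"]
after_push = apply_transition(stack, "#push")
after_named = apply_transition(stack, "s")
```

In this model "#push" duplicates whatever is on top of the stack ("outer"), while the explicit name always pushes "s", so the two rules leave the lexer in different current states.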
Diffstat (limited to 'pygments/lexer.py')
-rw-r--r-- | pygments/lexer.py | 17 |
1 files changed, 11 insertions, 6 deletions
diff --git a/pygments/lexer.py b/pygments/lexer.py
index cf9ebdf4..33d738a8 100644
--- a/pygments/lexer.py
+++ b/pygments/lexer.py
@@ -590,19 +590,24 @@ class RegexLexer(Lexer, metaclass=RegexLexerMeta):
     #: Defaults to MULTILINE.
     flags = re.MULTILINE
 
+    #: At all time there is a stack of states. Initially, the stack contains
+    #: a single state 'root'. The top of the stack is called "the current state".
+    #:
     #: Dict of ``{'state': [(regex, tokentype, new_state), ...], ...}``
     #:
-    #: The initial state is 'root'.
     #: ``new_state`` can be omitted to signify no state transition.
-    #: If it is a string, the state is pushed on the stack and changed.
-    #: If it is a tuple of strings, all states are pushed on the stack and
-    #: the current state will be the topmost.
-    #: It can also be ``combined('state1', 'state2', ...)``
+    #: If ``new_state`` is a string, it is pushed on the stack. This ensure
+    #: the new current state is ``new_state``.
+    #: If ``new_state`` is a tuple of strings, all of those strings are pushed
+    #: on the stack and the current state will be the last element of the list.
+    #: ``new_state`` can also be ``combined('state1', 'state2', ...)``
     #: to signify a new, anonymous state combined from the rules of two
     #: or more existing ones.
     #: Furthermore, it can be '#pop' to signify going back one step in
     #: the state stack, or '#push' to push the current state on the stack
-    #: again.
+    #: again. Note that if you push while in a combined state, the combined
+    #: state itself is pushed, and not only the state in which the rule is
+    #: defined.
     #:
     #: The tuple can also be replaced with ``include('state')``, in which
     #: case the rules from the state named by the string are included in the