.. -*- mode: rst -*-

==============
Builtin Tokens
==============

.. module:: pygments.token

In the :mod:`pygments.token` module, there is a special object called `Token`
that is used to create token types.

You can create a new token type by accessing an attribute of `Token`:

.. sourcecode:: pycon

    >>> from pygments.token import Token
    >>> Token.String
    Token.String
    >>> Token.String is Token.String
    True

Note that tokens are singletons, so you can use the ``is`` operator for
comparing token types.

As of Pygments 0.7, you can also use the ``in`` operator to perform set tests:

.. sourcecode:: pycon

    >>> from pygments.token import Comment
    >>> Comment.Single in Comment
    True
    >>> Comment in Comment.Multi
    False

This can be useful in :doc:`filters <filters>` and when you write lexers of
your own without using the base lexers.

You can also split a token type into its hierarchy, and get its parent:

.. sourcecode:: pycon

    >>> from pygments.token import String
    >>> String.split()
    [Token, Token.Literal, Token.Literal.String]
    >>> String.parent
    Token.Literal

In principle, you can create an unlimited number of token types, but there is
no guarantee that a style defines rules for every one of them. For that
reason, Pygments proposes a set of standard token types, defined in the
`pygments.token.STANDARD_TYPES` dict.
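
Each of these standard types maps to a short name; formatters can rely on
that, e.g. the HTML formatter uses the short names as CSS classes:

.. sourcecode:: pycon

    >>> from pygments.token import STANDARD_TYPES, Comment, Keyword
    >>> STANDARD_TYPES[Comment]
    'c'
    >>> STANDARD_TYPES[Keyword]
    'k'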

For some token types, aliases are already defined:

.. sourcecode:: pycon

    >>> from pygments.token import String
    >>> String
    Token.Literal.String

Inside the :mod:`pygments.token` module the following aliases are defined:

============= ============================ ====================================
`Text`        `Token.Text`                 for any type of text data
`Whitespace`  `Token.Text.Whitespace`      for specially highlighted whitespace
`Error`       `Token.Error`                represents lexer errors
`Other`       `Token.Other`                special token for data not
                                           matched by a parser (e.g. HTML
                                           markup in PHP code)
`Keyword`     `Token.Keyword`              any kind of keyword
`Name`        `Token.Name`                 variable/function names
`Literal`     `Token.Literal`              any literals
`String`      `Token.Literal.String`       string literals
`Number`      `Token.Literal.Number`       number literals
`Operator`    `Token.Operator`             operators (``+``, ``not``...)
`Punctuation` `Token.Punctuation`          punctuation (``[``, ``(``...)
`Comment`     `Token.Comment`              any kind of comment
`Generic`     `Token.Generic`              generic tokens (have a look at
                                           the explanation below)
============= ============================ ====================================

The `Whitespace` token type is new in Pygments 0.8. It is currently used only
by the `VisibleWhitespaceFilter`.

Normally you just create token types using the already defined aliases. For
each of those token aliases, a number of subtypes exist (excluding the special
tokens `Token.Text`, `Token.Error` and `Token.Other`).

The `is_token_subtype()` function in the `pygments.token` module can be used to
test if a token type is a subtype of another (such as `Name.Tag` and `Name`).
This is the same as ``Name.Tag in Name``. The overloaded ``in`` operator was
introduced in Pygments 0.7; the function still exists for backwards
compatibility.
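
For example:

.. sourcecode:: pycon

    >>> from pygments.token import Name, is_token_subtype
    >>> is_token_subtype(Name.Tag, Name)
    True
    >>> is_token_subtype(Name, Name.Tag)
    False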

As of Pygments 0.7, it's also possible to convert strings to token types (for
example, if you want to supply a token type from the command line):

.. sourcecode:: pycon

    >>> from pygments.token import String, string_to_tokentype
    >>> string_to_tokentype("String")
    Token.Literal.String
    >>> string_to_tokentype("Token.Literal.String")
    Token.Literal.String
    >>> string_to_tokentype(String)
    Token.Literal.String


Keyword Tokens
==============

`Keyword`
    For any kind of keyword (especially if it doesn't match any of the
    subtypes, of course).

`Keyword.Constant`
    For keywords that are constants (e.g. ``None`` in Python 3).

`Keyword.Declaration`
    For keywords used for variable declaration (e.g. ``var`` in some programming
    languages like JavaScript).

`Keyword.Namespace`
    For keywords used for namespace declarations (e.g. ``import`` in Python
    and Java, and ``package`` in Java).

`Keyword.Pseudo`
    For keywords that aren't really keywords (e.g. ``None`` in old Python
    versions, where it was a builtin name rather than a real keyword).

`Keyword.Reserved`
    For reserved keywords.

`Keyword.Type`
    For builtin types that can't be used as identifiers (e.g. ``int``,
    ``char`` etc. in C).
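
To see which subtype a lexer assigns, you can inspect its token stream
directly. A minimal sketch using the Python lexer (the exact token stream may
differ between Pygments versions):

.. sourcecode:: pycon

    >>> from pygments.lexers import PythonLexer
    >>> from pygments.token import Keyword
    >>> [(t, v) for t, v in PythonLexer().get_tokens('if x is None: pass')
    ...  if t in Keyword]
    [(Token.Keyword, 'if'), (Token.Keyword.Constant, 'None'), (Token.Keyword, 'pass')]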


Name Tokens
===========

`Name`
    For any name (variable names, function names, classes).

`Name.Attribute`
    For all attributes (e.g. in HTML tags).

`Name.Builtin`
    Builtin names; names that are available in the global namespace.

`Name.Builtin.Pseudo`
    Builtin names that are implicit (e.g. ``self`` in Ruby, ``this`` in Java).

`Name.Class`
    Class names. Because no lexer can know whether a name is a class, a
    function, or something else, this token is meant for class declarations.

`Name.Constant`
    Token type for constants. In some languages you can recognise a constant
    by the way it's defined (the value after a ``const`` keyword, for
    example). In other languages, constants are uppercase by definition
    (Ruby).

`Name.Decorator`
    Token type for decorators. Decorators are syntactic elements in the Python
    language. Similar syntax elements exist in C# and Java.

`Name.Entity`
    Token type for special entities (e.g. ``&nbsp;`` in HTML).

`Name.Exception`
    Token type for exception names (e.g. ``RuntimeError`` in Python). Some
    languages define exceptions in the function signature (Java); you can
    then highlight the name of that exception using this token.

`Name.Function`
    Token type for function names.

`Name.Function.Magic`
    Same as `Name.Function`, but for special function names that have an
    implicit use in a language (e.g. the ``__init__`` method in Python).

`Name.Label`
    Token type for label names (e.g. in languages that support ``goto``).

`Name.Namespace`
    Token type for namespaces (e.g. import paths in Java/Python, or names
    following the ``module``/``namespace`` keyword in other languages).

`Name.Other`
    Other names. Normally unused.

`Name.Tag`
    Tag names (in HTML/XML markup or configuration files).

`Name.Variable`
    Token type for variables. Some languages have prefixes for variable names
    (PHP, Ruby, Perl). You can highlight them using this token.

`Name.Variable.Class`
    Same as `Name.Variable`, but for class variables (also static variables).

`Name.Variable.Global`
    Same as `Name.Variable`, but for global variables (used in Ruby, for
    example).

`Name.Variable.Instance`
    Same as `Name.Variable`, but for instance variables.

`Name.Variable.Magic`
    Same as `Name.Variable`, but for special variable names that have an
    implicit use in a language (e.g. ``__doc__`` in Python).
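
As with keywords, which `Name` subtype a token gets is decided by the lexer.
A minimal sketch using the Python lexer (exact output may differ between
Pygments versions):

.. sourcecode:: pycon

    >>> from pygments.lexers import PythonLexer
    >>> from pygments.token import Name
    >>> [(t, v) for t, v in PythonLexer().get_tokens('len(self.data)')
    ...  if t in Name]
    [(Token.Name.Builtin, 'len'), (Token.Name.Builtin.Pseudo, 'self'), (Token.Name, 'data')]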


Literals
========

`Literal`
    For any literal (if not further defined).

`Literal.Date`
    For date literals (e.g. ``42d`` in Boo).


`String`
    For any string literal.

`String.Affix`
    Token type for affixes that further specify the type of the string they're
    attached to (e.g. the prefixes ``r`` and ``u8`` in ``r"foo"`` and ``u8"foo"``). 

`String.Backtick`
    Token type for strings enclosed in backticks.

`String.Char`
    Token type for single characters (e.g. Java, C).

`String.Delimiter`
    Token type for delimiting identifiers in "heredoc", raw and other similar
    strings (e.g. the word ``END`` in Perl code ``print <<'END';``).

`String.Doc`
    Token type for documentation strings (for example Python).

`String.Double`
    Double quoted strings.

`String.Escape`
    Token type for escape sequences in strings.

`String.Heredoc`
    Token type for "heredoc" strings (e.g. in Ruby or Perl).

`String.Interpol`
    Token type for interpolated parts in strings (e.g. ``#{foo}`` in Ruby).

`String.Other`
    Token type for any other strings (for example ``%q{foo}`` string constructs
    in Ruby).

`String.Regex`
    Token type for regular expression literals (e.g. ``/foo/`` in JavaScript).

`String.Single`
    Token type for single quoted strings.

`String.Symbol`
    Token type for symbols (e.g. ``:foo`` in LISP or Ruby).


`Number`
    Token type for any number literal.

`Number.Bin`
    Token type for binary literals (e.g. ``0b101010``).

`Number.Float`
    Token type for float literals (e.g. ``42.0``).

`Number.Hex`
    Token type for hexadecimal number literals (e.g. ``0xdeadbeef``).

`Number.Integer`
    Token type for integer literals (e.g. ``42``).

`Number.Integer.Long`
    Token type for long integer literals (e.g. ``42L`` in Python).

`Number.Oct`
    Token type for octal literals.
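
For example, the Python lexer distinguishes these subtypes as follows (a
minimal sketch, exact output may differ between Pygments versions):

.. sourcecode:: pycon

    >>> from pygments.lexers import PythonLexer
    >>> from pygments.token import Number
    >>> [(t, v) for t, v in PythonLexer().get_tokens('42.0 0xff') if t in Number]
    [(Token.Literal.Number.Float, '42.0'), (Token.Literal.Number.Hex, '0xff')]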


Operators
=========

`Operator`
    For any punctuation operator (e.g. ``+``, ``-``).

`Operator.Word`
    For any operator that is a word (e.g. ``not``).
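
The difference shows up directly in a lexer's token stream; a minimal sketch
using the Python lexer (exact output may differ between Pygments versions):

.. sourcecode:: pycon

    >>> from pygments.lexers import PythonLexer
    >>> from pygments.token import Operator
    >>> [(t, v) for t, v in PythonLexer().get_tokens('a + b not in c')
    ...  if t in Operator]
    [(Token.Operator, '+'), (Token.Operator.Word, 'not'), (Token.Operator.Word, 'in')]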


Punctuation
===========

.. versionadded:: 0.7

`Punctuation`
    For any punctuation which is not an operator (e.g. ``[``, ``(``...)


Comments
========

`Comment`
    Token type for any comment.

`Comment.Hashbang`
    Token type for hashbang comments (i.e. first lines of files that start with
    ``#!``).

`Comment.Multiline`
    Token type for multiline comments.

`Comment.Preproc`
    Token type for preprocessor comments (also ``<?php``/``<%`` constructs).

`Comment.Single`
    Token type for comments that end at the end of a line (e.g. ``# foo``).

`Comment.Special`
    Special data in comments. For example code tags, author and license
    information, etc.
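
For instance, the Python lexer separates hashbang comments from ordinary
single-line comments (a minimal sketch, exact output may differ between
Pygments versions):

.. sourcecode:: pycon

    >>> from pygments.lexers import PythonLexer
    >>> from pygments.token import Comment
    >>> code = '#!/usr/bin/env python\n# a comment\n'
    >>> [(t, v) for t, v in PythonLexer().get_tokens(code) if t in Comment]
    [(Token.Comment.Hashbang, '#!/usr/bin/env python'), (Token.Comment.Single, '# a comment')]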


Generic Tokens
==============

Generic tokens are for special lexers like the `DiffLexer`, which doesn't
really highlight a programming language but rather a patch file.
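
A minimal sketch of what that looks like (exact token values may differ
between Pygments versions):

.. sourcecode:: pycon

    >>> from pygments.lexers import DiffLexer
    >>> list(DiffLexer().get_tokens('+added line\n-removed line\n'))
    [(Token.Generic.Inserted, '+added line\n'), (Token.Generic.Deleted, '-removed line\n')]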


`Generic`
    A generic, unstyled token. Normally you don't use this token type.

`Generic.Deleted`
    Marks the token value as deleted.

`Generic.Emph`
    Marks the token value as emphasized.

`Generic.Error`
    Marks the token value as an error message.

`Generic.Heading`
    Marks the token value as a headline.

`Generic.Inserted`
    Marks the token value as inserted.

`Generic.Output`
    Marks the token value as program output (e.g. for the Python console
    lexer).

`Generic.Prompt`
    Marks the token value as a command prompt (e.g. for the Bash lexer).

`Generic.Strong`
    Marks the token value as bold (e.g. for the reST lexer).

`Generic.Subheading`
    Marks the token value as a subheadline.

`Generic.Traceback`
    Marks the token value as a part of an error traceback.