Licensing
=========

The code is distributed under the BSD 2-clause license. Contributors making pull
requests must agree that they are able and willing to put their contributions
under that license.

Goals & non-goals of Pygments
=============================

Python support
--------------

Pygments supports all Python versions that are themselves supported as per the [Python Developer's Guide](https://devguide.python.org/#status-of-python-branches). Additionally, the default Python versions of the latest stable releases of RHEL, Ubuntu LTS, and Debian are supported, even if those versions are officially EOL. Supporting other end-of-life versions is a non-goal of Pygments.

Validation
----------

Pygments does not attempt to validate its input. Accepting code that is not legal for a given language is acceptable if it simplifies the codebase and does not result in surprising behavior. For instance, when lexing C89, accepting `//`-based comments is fine because de facto all compilers supported them, and having a separate lexer just to reject them would not be worth it.
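
As a minimal sketch of this policy (using only the standard `re` module, not an actual Pygments lexer rule), a single comment pattern can accept `//` comments even when nominally targeting C89:

```python
import re

# One pattern covering both comment styles.  The '//' alternative is
# deliberately kept even for C89 input: validating the input is a non-goal.
comment = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

assert comment.match("// line comment").group() == "// line comment"
assert comment.match("/* block */ int x;").group() == "/* block */"
```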

Contribution checklist
======================

* Check the documentation for how to write
  [a new lexer](https://pygments.org/docs/lexerdevelopment/),
  [a new formatter](https://pygments.org/docs/formatterdevelopment/) or
  [a new filter](https://pygments.org/docs/filterdevelopment/)

* Make sure to add a test for your new functionality, and where applicable, 
  write documentation.

* When writing rules, try to merge simple rules. For instance, combine:

  ```python
  _PUNCTUATION = [
    (r"\(", token.Punctuation),
    (r"\)", token.Punctuation),
    (r"\[", token.Punctuation),
    (r"\]", token.Punctuation),
    ("{", token.Punctuation),
    ("}", token.Punctuation),
  ]
  ```

  into:

  ```python
  (r"[\(\)\[\]{}]", token.Punctuation)
  ```
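
  A quick sanity check with the standard `re` module shows the merged
  character class matches exactly the same characters as the six separate
  rules (the escapes like `\(` are harmless inside `[...]`):

  ```python
  import re

  merged = re.compile(r"[\(\)\[\]{}]")

  # Every punctuation character from the original six rules still matches...
  assert all(merged.fullmatch(ch) for ch in "()[]{}")
  # ...and nothing else does.
  assert merged.fullmatch("a") is None
  ```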

* Be careful with ``.*``. This matches greedily, as much as it can. For instance,
  a rule like ``@.*@`` will match the whole string ``@first@ second @third@``,
  instead of matching ``@first@`` and ``@third@``. You can use ``@.*?@`` in
  this case to stop early. The ``?`` tries to match _as few times_ as possible.
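
  The difference can be reproduced with the standard `re` module:

  ```python
  import re

  s = "@first@ second @third@"

  # Greedy: .* runs on to the last '@' it can still consume.
  assert re.findall(r"@.*@", s) == ["@first@ second @third@"]

  # Lazy: .*? stops at the earliest closing '@'.
  assert re.findall(r"@.*?@", s) == ["@first@", "@third@"]
  ```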

* Don't add imports of your lexer anywhere in the codebase. (In case you're
  curious about ``compiled.py`` -- this file exists for backwards compatibility
  reasons.)

* Use the standard importing convention: ``from pygments.token import Punctuation``

* For test cases that assert on the tokens produced by a lexer, use the following tools:

  * You can use the ``testcase`` formatter to produce a piece of code that
    can be pasted into a unittest file:
    ``python -m pygments -l lua -f testcase <<< "local a = 5"``

  * Most snippets should instead be put as a sample file under
    ``tests/snippets/<lexer_alias>/*.txt``. These files are automatically
    picked up as individual tests, asserting that the input produces the
    expected tokens.

    To add a new test, create a file with just your code snippet under a
    subdirectory based on your lexer's main alias. Then run
    ``pytest --update-goldens <filename.txt>`` to auto-populate the currently
    expected tokens. Check that they look good and check in the file.

    Rerun the same command to update the test whenever the actually produced
    tokens change (assuming the change is expected).

  * Large test files should go in ``tests/examplefiles``.  This works
    similarly to ``snippets``, but the token output is stored in a separate
    file.  Output can also be regenerated with ``--update-goldens``.