version: 0.2
text: rst
fix_inline_single_backquotes: true
--- |
*******
Details
*******



- support for simple lists as mapping keys by transforming these to tuples
- ``!!omap`` generates ordereddict (C) on Python 2, collections.OrderedDict
  on Python 3, and ``!!omap`` is generated for these types.
- Tests whether the C yaml library is installed as well as the header
  files. That library doesn't generate CommentTokens, so it cannot be used to
  do round trip editing on comments. It can be used to speed up normal
  processing (so you don't need to install ``ruamel.yaml`` and ``PyYAML``).
  See the section *Optional requirements*.
- Basic support for multiline strings with preserved newlines and
  chomping ( '``|``', '``|+``', '``|-``' ). As this subclasses the string type
  the information is lost on reassignment. (This might be changed
  in the future so that the preservation/folding/chomping is part of the
  parent container, like comments).
- anchor names that are hand-crafted (not of the form ``idNNN``) are preserved
- `merges <http://yaml.org/type/merge.html>`_ in dictionaries are preserved
- adding/replacing comments on block-style sequences and mappings
  with smart column positioning
- collection objects (when read in via RoundTripParser) have an ``lc``
  property that contains line and column info ``lc.line`` and ``lc.col``.
  Individual positions for mappings and sequences can also be retrieved
  (``lc.key('a')``, ``lc.value('a')`` resp. ``lc.item(3)``)
- preservation of blank lines after block scalars. Contributed by Sam Thursfield.
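
As a minimal sketch of the ``lc`` property mentioned in the list above (the
input string and the variable names are made up for illustration; the
reported lines and columns are zero-based)::

  from ruamel.yaml import YAML

  yaml = YAML()  # default is round-trip, so loaded collections carry lc
  data = yaml.load("a: 1\nb:\n- x\n- y\n")

  top = (data.lc.line, data.lc.col)  # where the top-level mapping starts
  key_b = data.lc.key('b')           # (line, column) of the key b
  val_a = data.lc.value('a')         # (line, column) of the value of a
  item_1 = data['b'].lc.item(1)      # (line, column) of the second list item
  print(top, key_b, val_a, item_1)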

*In the following examples it is assumed you have done something like:*::

  from ruamel.yaml import YAML
  yaml = YAML()

*if not explicitly specified.*

Indentation of block sequences
==============================

Although ruamel.yaml doesn't preserve individual indentations of block sequence
items, it does properly dump::

  x:
  - b: 1
  - 2

back to::

  x:
  -   b: 1
  -   2

if you specify ``yaml.indent(sequence=4)`` (indentation is counted to the
beginning of the sequence element).
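
A minimal sketch of this round trip, using the input shown above (``buf`` is
just an illustrative in-memory stream, any writable stream should do)::

  import io
  from ruamel.yaml import YAML

  yaml = YAML()
  yaml.indent(sequence=4)  # sequence elements start 4 positions in

  data = yaml.load("x:\n- b: 1\n- 2\n")
  buf = io.StringIO()
  yaml.dump(data, buf)
  print(buf.getvalue())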

PyYAML (and older versions of ruamel.yaml) gives you non-indented
scalars (when specifying default_flow_style=False)::

  x:
  -   b: 1
  - 2

You can use ``mapping=4`` to also have the mapping values indented.
The dump also observes an additional ``offset=2`` setting that
can be used to push the dash inwards, *within the space defined by* ``sequence``.

The above example with the often seen ``yaml.indent(mapping=2, sequence=4, offset=2)``
indentation::

  x:
    y:
      - b: 1
      - 2

The defaults are as if you specified ``yaml.indent(mapping=2, sequence=2, offset=0)``.
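
To compare the defaults with the ``mapping=2, sequence=4, offset=2`` setting,
something like the following sketch can be used. The helper ``dump_with`` is
hypothetical and only exists for this illustration::

  import io
  from ruamel.yaml import YAML

  def dump_with(text, **indent_args):
      """Round-trip text, optionally applying yaml.indent(**indent_args)."""
      yaml = YAML()
      if indent_args:
          yaml.indent(**indent_args)
      buf = io.StringIO()
      yaml.dump(yaml.load(text), buf)
      return buf.getvalue()

  text = "x:\n  y:\n  - b: 1\n  - 2\n"
  default_out = dump_with(text)  # dash stays in the column of the key y
  indented_out = dump_with(text, mapping=2, sequence=4, offset=2)
  print(default_out)
  print(indented_out)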

If the ``offset`` equals ``sequence``, there is not enough
room for the dash and the space that has to follow it. In that case the
element itself would normally be pushed to the next line (and older versions
of ruamel.yaml did so), but this is now prevented. However, the ``indent``
level is still what is used for calculating the cumulative indent for deeper
levels, so specifying ``sequence=3`` with ``offset=2`` might give correct
but counterintuitive results.

**It is best to always have** ``sequence >= offset + 2``
**but this is not enforced**. Depending on your structure, not following
this advice **might lead to invalid output**.
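
Since the library does not enforce this, you could guard the call yourself.
The following is only a sketch: ``set_checked_indent`` is a hypothetical
helper, not part of ruamel.yaml::

  from ruamel.yaml import YAML

  def set_checked_indent(yaml, mapping=2, sequence=2, offset=0):
      """Call yaml.indent() only if the dash and its space fit."""
      if sequence < offset + 2:
          raise ValueError(
              'sequence=%d leaves no room for "- " after offset=%d'
              % (sequence, offset)
          )
      yaml.indent(mapping=mapping, sequence=sequence, offset=offset)

  yaml = YAML()
  set_checked_indent(yaml, mapping=2, sequence=4, offset=2)  # accepted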

Inconsistently indented YAML
++++++++++++++++++++++++++++

If your input is inconsistently indented, such indentation cannot be preserved.
The first round-trip will make it consistent/normalize it. Here are some
inconsistently indented YAML examples.

``b`` indented 3, ``c`` indented 4 positions::

  a:
     b:
         c: 1

Top level sequence is indented 2 without offset, the other sequence 4 (with offset 2)::

  - key:
      - foo
      - bar
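
The normalization can be observed with a sketch like the following, which
round-trips the first example with the default settings (the helper
``round_trip`` is hypothetical)::

  import io
  from ruamel.yaml import YAML

  def round_trip(text):
      """Load text and dump it again with default indentation."""
      yaml = YAML()
      buf = io.StringIO()
      yaml.dump(yaml.load(text), buf)
      return buf.getvalue()

  inconsistent = "a:\n   b:\n       c: 1\n"
  normalized = round_trip(inconsistent)  # mappings now indented by 2
  stable = round_trip(normalized)        # a second round trip changes nothing
  print(normalized)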



Positioning ':' in top level mappings, prefixing ':'
====================================================

If you want your top-level mappings to look like::

  library version: 1
  comment        : |
      this is just a first try

then set ``yaml.top_level_colon_align = True``
(and ``yaml.indent = 4``). ``True`` causes calculation based on the longest key,
but you can also explicitly set a number.

If you want an extra space between a mapping key and the colon specify
``yaml.prefix_colon = ' '``::

  - https://myurl/abc.tar.xz : 23445
  #                         ^ extra space here
  - https://myurl/def.tar.xz : 944

If you combine ``prefix_colon`` with ``top_level_colon_align``, the
top level mapping doesn't get the extra prefix. If you want that
anyway, specify ``yaml.top_level_colon_align = 12`` where ``12`` has to be an
integer that is one more than the length of the widest key.
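
A minimal sketch of both settings, each on its own ``YAML`` instance. The
helper ``round_trip`` and the scalar value ``first try`` are made up for
illustration::

  import io
  from ruamel.yaml import YAML

  def round_trip(yaml, text):
      """Load text with the given YAML instance and dump it again."""
      buf = io.StringIO()
      yaml.dump(yaml.load(text), buf)
      return buf.getvalue()

  aligned = YAML()
  aligned.top_level_colon_align = True  # align on the longest top-level key
  out_aligned = round_trip(aligned, "library version: 1\ncomment: first try\n")

  spaced = YAML()
  spaced.prefix_colon = ' '  # extra space between key and colon
  out_spaced = round_trip(
      spaced,
      "- https://myurl/abc.tar.xz: 23445\n- https://myurl/def.tar.xz: 944\n",
  )
  print(out_aligned)
  print(out_spaced)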


Document version support
++++++++++++++++++++++++

In YAML a document version can be explicitly set by using::

   %YAML 1.x

before the document start (at the top or before a
``---``). For ``ruamel.yaml``, x has to be 1 or 2. If no explicit
version is set, `version 1.2 <http://www.yaml.org/spec/1.2/spec.html>`_
(released in 2009) is assumed.

The 1.2 version does **not** support:

- sexagesimals like ``12:34:56``
- octals that start with 0 only: like ``012`` for number 10 (``0o12`` **is**
  supported by YAML 1.2)
- unquoted ``Yes`` and ``On`` as alternatives for ``True``, and ``No`` and
  ``Off`` as alternatives for ``False``.

If you cannot change your YAML files and you need them to load as 1.1,
you can load with ``yaml.version = (1, 1)``,
or the equivalent ``yaml.version = "1.1"`` (version can be a tuple, list or string).

*If you cannot change your code, stick with ruamel.yaml==0.10.23 and let
me know if it would help to be able to set an environment variable.*

This does not affect dump, as ruamel.yaml never emitted sexagesimals or
octal numbers, and always emitted booleans as true resp. false.
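
The difference on loading can be sketched as follows. The keys ``flag``,
``octal`` and ``time`` are arbitrary names chosen for this illustration::

  from ruamel.yaml import YAML

  text = "flag: yes\noctal: 012\ntime: 12:34:56\n"

  yaml12 = YAML()          # no explicit version, so 1.2 rules apply
  v12 = yaml12.load(text)

  yaml11 = YAML()
  yaml11.version = (1, 1)  # load the same text with 1.1 rules
  v11 = yaml11.load(text)

  # with 1.2, 'yes' and '12:34:56' stay plain strings
  print(repr(v12['flag']), repr(v12['time']))
  # with 1.1, yes is a boolean, 012 is octal and 12:34:56 is
  # sexagesimal (12*3600 + 34*60 + 56)
  print(repr(v11['flag']), repr(v11['octal']), repr(v11['time']))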

Round trip including comments
+++++++++++++++++++++++++++++

The major motivation for this fork is the round-trip capability for
comments. The integration of the sources was just an initial step to
make this easier.

adding/replacing comments
^^^^^^^^^^^^^^^^^^^^^^^^^

Starting with version 0.8, you can add/replace comments on block style
collections (mappings/sequences resulting in Python dict/list). The basic
form for this is:
--- !python |
  from __future__ import print_function

  import sys
  import ruamel.yaml

  yaml = ruamel.yaml.YAML()  # defaults to round-trip

  inp = """\
  abc:
    - a     # comment 1
  xyz:
    a: 1    # comment 2
    b: 2
    c: 3
    d: 4
    e: 5
    f: 6 # comment 3
  """

  data = yaml.load(inp)
  data['abc'].append('b')
  data['abc'].yaml_add_eol_comment('comment 4', 1)  # takes column of comment 1
  data['xyz'].yaml_add_eol_comment('comment 5', 'c')  # takes column of comment 2
  data['xyz'].yaml_add_eol_comment('comment 6', 'e')  # takes column of comment 3
  data['xyz'].yaml_add_eol_comment('comment 7\n\n# that\'s all folks', 'd', column=20)

  yaml.dump(data, sys.stdout)
--- !stdout |
Resulting in::
--- !comment |
  abc:
  - a       # comment 1
  - b       # comment 4
  xyz:
    a: 1    # comment 2
    b: 2
    c: 3    # comment 5
    d: 4              # comment 7
    e: 5 # comment 6
    f: 6 # comment 3

--- |
If the comment doesn't start with '#', this will be added. The key is
the element index for lists and the actual key for dictionaries. As can be seen
from the example, the column to choose for a comment is derived
from the previous, next or preceding comment column (picking the first one
found).

Make sure that the added comment is correct, in the sense that when it
contains newlines, the following is either an empty line or a line with
only spaces, or the first non-space is a `#`.
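
The rule above can be checked before a comment is added. The following is a
sketch only: ``is_valid_added_comment`` is a hypothetical helper and not part
of ruamel.yaml::

  def is_valid_added_comment(comment):
      """Return True if every line after the first is empty, contains
      only spaces, or has '#' as its first non-space character."""
      follow_up_lines = comment.split('\n')[1:]
      return all(
          line.strip(' ') == '' or line.lstrip(' ').startswith('#')
          for line in follow_up_lines
      )

  ok = is_valid_added_comment("comment 7\n\n# that's all folks")
  bad = is_valid_added_comment("comment 7\nthat's all folks")
  print(ok, bad)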

Config file formats
+++++++++++++++++++

There are only a few configuration file formats that are easily
readable and editable: JSON, INI/ConfigParser, YAML (XML is too cluttered
to be called easily readable).

Unfortunately `JSON <http://www.json.org/>`_ doesn't support comments,
and although there are some solutions with pre-processed filtering of
comments, there are no libraries that support round trip updating of
such commented files.

INI files support comments, and the excellent `ConfigObj
<http://www.voidspace.org.uk/python/configobj.html>`_ library by Foord
and Larosa even supports round trip editing with comment preservation,
nesting of sections and limited lists (within a value). Retrieval of
a particular value format is explicit (and extensible).

YAML has basic mapping and sequence structures as well as support for
ordered mappings and sets. It supports scalars of various types
including dates and datetimes (missing in JSON).
YAML has comments, but these are normally thrown away.

Block structured YAML is a clean and very human readable
format. Extending the Python YAML parser to support round trip
preservation of comments makes YAML a very good choice for
configuration files that are human readable and editable while at
the same time interpretable and modifiable by a program.
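
As a small sketch of such a program-driven update (the configuration keys and
values are invented for this illustration), the comments survive the change::

  import io
  from ruamel.yaml import YAML

  yaml = YAML()
  config = yaml.load(
      "# server settings\n"
      "host: example.org\n"
      "port: 80    # default HTTP port\n"
  )
  config['port'] = 8080  # change made by the program, not by hand

  buf = io.StringIO()
  yaml.dump(config, buf)
  updated = buf.getvalue()
  print(updated)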

Extending
+++++++++

There are normally eight files involved when extending the roundtrip
capabilities: the reader, parser, composer and constructor to go from YAML to
Python and the resolver, representer, serializer and emitter to go the other
way.

Extending involves keeping extra data around for the next process step,
eventually resulting in a different Python object (subclass or alternative),
that should behave like the original, but on the way from Python to YAML
generates the original (or at least something much closer).

Smartening
++++++++++

When you use round-tripping, the complex data you get back are
already subclasses of the built-in types, so you can patch
in extra methods or override existing ones. Some methods are already
included, and you can do::

    yaml_str = """\
    a:
    - b:
      c: 42
    - d:
        f: 196
      e:
        g: 3.14
    """


    data = yaml.load(yaml_str)

    assert data.mlget(['a', 1, 'd', 'f'], list_ok=True) == 196
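
Patching in a method of your own could look like the following sketch. The
method ``top_level_keys`` is a made-up example, and attaching it to
``CommentedMap`` affects every mapping loaded by the round-trip loader::

  from ruamel.yaml import YAML
  from ruamel.yaml.comments import CommentedMap

  def top_level_keys(self):
      """Return the keys of this mapping as a plain list."""
      return list(self.keys())

  # attach the illustrative method to the round-trip mapping type
  CommentedMap.top_level_keys = top_level_keys

  yaml = YAML()
  data = yaml.load("a: 1\nb:\n- x\n- y\n")
  print(isinstance(data, dict), isinstance(data['b'], list))
  print(data.top_level_keys())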