<?xml version="1.0" encoding="utf-8"?>
<database name="ovn-sb" title="OVN Southbound Database">
  <p>
    This database holds logical and physical configuration and state for the
    Open Virtual Network (OVN) system to support virtual network abstraction.
    For an introduction to OVN, please see <code>ovn-architecture</code>(7).
  </p>

  <p>
    The OVN Southbound database sits at the center of the OVN
    architecture.  It is the one component that speaks both southbound
    directly to all the hypervisors and gateways, via
    <code>ovn-controller</code>, and northbound to the Cloud Management
    System, via <code>ovn-northd</code>.
  </p>

  <h2>Database Structure</h2>

  <p>
    The OVN Southbound database contains three classes of data with
    different properties, as described in the sections below.
  </p>

  <h3>Physical Network (PN) data</h3>

  <p>
    PN tables contain information about the chassis nodes in the system.  This
    contains all the information necessary to wire the overlay, such as IP
    addresses, supported tunnel types, and security keys.
  </p>

  <p>
    The amount of PN data is small (O(n) in the number of chassis) and it
    changes infrequently, so it can be replicated to every chassis.
  </p>

  <p>
    The <ref table="Chassis"/> table comprises the PN tables.
  </p>

  <h3>Logical Network (LN) data</h3>

  <p>
    LN tables contain the topology of logical switches and routers, ACLs,
    firewall rules, and everything needed to describe how packets traverse a
    logical network, represented as logical datapath flows (see Logical
    Datapath Flows, below).
  </p>

  <p>
    LN data may be large (O(n) in the number of logical ports, ACL rules,
    etc.).  Thus, to improve scaling, each chassis should receive only data
    related to logical networks in which that chassis participates.  Past
    experience shows that in the presence of large logical networks, even
    finer-grained partitioning of data, e.g. designing logical flows so that
    only the chassis hosting a logical port needs related flows, pays off
    scale-wise.  (This is not necessary initially but it is worth bearing in
    mind in the design.)
  </p>

  <p>
    The LN is a slave of the cloud management system running northbound of OVN.
    That CMS determines the entire OVN logical configuration and therefore the
    LN's content at any given time is a deterministic function of the CMS's
    configuration, although that happens indirectly via the OVN Northbound DB
    and <code>ovn-northd</code>.
  </p>

  <p>
    LN data is likely to change more quickly than PN data.  This is especially
    true in a container environment where VMs are created and destroyed (and
    therefore added to and deleted from logical switches) quickly.
  </p>

  <p>
    <ref table="Logical_Flow"/> and <ref table="Multicast_Group"/> contain LN
    data.
  </p>

  <h3>Bindings data</h3>

  <p>
    Bindings data link logical and physical components.  They show the current
    placement of logical components (such as VMs and VIFs) onto chassis, and
    map logical entities to the values that represent them in tunnel
    encapsulations.
  </p>

  <p>
    Bindings change frequently, at least every time a VM powers up or down
    or migrates, and especially quickly in a container environment.  The
    amount of data per VM (or VIF) is small.
  </p>

  <p>
    Each chassis is authoritative about the VMs and VIFs that it hosts at any
    given time and can efficiently flood that state to a central location, so
    the consistency needs are minimal.
  </p>

  <p>
    The <ref table="Port_Binding"/> and <ref table="Datapath_Binding"/> tables
    contain binding data.
  </p>

  <h2>Common Columns</h2>

  <p>
    Some tables contain a special column named <code>external_ids</code>.  This
    column has the same form and purpose each place that it appears, so we
    describe it here to save space later.
  </p>

  <dl>
    <dt><code>external_ids</code>: map of string-string pairs</dt>
    <dd>
      Key-value pairs for use by the software that manages the OVN Southbound
      database rather than by <code>ovn-controller</code>.  In particular,
      <code>ovn-northd</code> can use key-value pairs in this column to relate
      entities in the southbound database to higher-level entities (such as
      entities in the OVN Northbound database).  Individual key-value pairs in
      this column may be documented in some cases to aid in understanding and
      troubleshooting, but the reader should not mistake such documentation as
      comprehensive.
    </dd>
  </dl>

  <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
    <p>
      Each row in this table represents a hypervisor or gateway (a chassis) in
      the physical network (PN).  Each chassis, via
      <code>ovn-controller</code>, adds and updates its own row, and keeps a
      copy of the remaining rows to determine how to reach other hypervisors.
    </p>

    <p>
      When a chassis shuts down gracefully, it should remove its own row.
      (This is not critical because resources hosted on the chassis are equally
      unreachable regardless of whether the row is present.)  If a chassis
      shuts down permanently without removing its row, some kind of manual or
      automatic cleanup is eventually needed; we can devise a process for that
      as necessary.
    </p>

    <column name="name">
      A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
      column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
      database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table.  OVN does
      not prescribe a particular format for chassis names.
    </column>

    <group title="Encapsulation Configuration">
      <p>
        OVN uses encapsulation to transmit logical dataplane packets
        between chassis.
      </p>

      <column name="encaps">
        Points to supported encapsulation configurations to transmit
        logical dataplane packets to this chassis.  Each entry is a <ref
        table="Encap"/> record that describes the configuration.
      </column>
    </group>

    <group title="Gateway Configuration">
      <p>
        A <dfn>gateway</dfn> is a chassis that forwards traffic between the
        OVN-managed part of a logical network and a physical VLAN, extending a
        tunnel-based logical network into a physical network.  Gateways are
        typically dedicated nodes that do not host VMs.
      </p>

      <column name="vtep_logical_switches">
        Stores the names of all VTEP logical switches connected by this
        gateway chassis.
      </column>
    </group>
  </table>

  <table name="Encap" title="Encapsulation Types">
    <p>
      The <ref column="encaps" table="Chassis"/> column in the <ref
      table="Chassis"/> table refers to rows in this table to identify
      how OVN may transmit logical dataplane packets to this chassis.
      Each chassis, via <code>ovn-controller</code>(8), adds and updates
      its own rows and keeps a copy of the remaining rows to determine
      how to reach other chassis.
    </p>

    <column name="type">
      The encapsulation to use to transmit packets to this chassis.
      Hypervisors must use either <code>geneve</code> or
      <code>stt</code>.  Gateways may use <code>vxlan</code>,
      <code>geneve</code>, or <code>stt</code>.
    </column>

    <column name="options">
      Options for configuring the encapsulation, e.g. IPsec parameters when
      IPsec support is introduced.  No options are currently defined.
    </column>

    <column name="ip">
      The IPv4 address of the encapsulation tunnel endpoint.
    </column>
  </table>

  <table name="Logical_Flow" title="Logical Network Flows">
    <p>
      Each row in this table represents one logical flow.  The cloud management
      system, via its OVN integration, populates this table with logical flows
      that implement the L2 and L3 topology specified in the CMS configuration.
      Each hypervisor, via <code>ovn-controller</code>, translates the logical
      flows into OpenFlow flows specific to its hypervisor and installs them
      into Open vSwitch.
    </p>

    <p>
      Logical flows are expressed in an OVN-specific format, described here.  A
      logical datapath flow is much like an OpenFlow flow, except that the
      flows are written in terms of logical ports and logical datapaths instead
      of physical ports and physical datapaths.  Translation between logical
      and physical flows helps to ensure isolation between logical datapaths.
      (The logical flow abstraction also allows the CMS to do less work, since
      it does not have to separately compute and push out physical flows to each
      chassis.)
    </p>

    <p>
      The default action when no flow matches is to drop packets.
    </p>

    <p><em>Logical Life Cycle of a Packet</em></p>

    <p>
      The following description focuses on the life cycle of a packet through
      a logical datapath, ignoring physical details of the implementation.
      Please refer to <em>Life Cycle of a Packet</em> in
      <code>ovn-architecture</code>(7) for the physical information.
    </p>

    <p>
      The description here is written as if OVN itself executes these steps,
      but in fact OVN (that is, <code>ovn-controller</code>) programs Open
      vSwitch, via OpenFlow and OVSDB, to execute them on its behalf.
    </p>

    <p>
      At a high level, OVN passes each packet through the logical datapath's
      logical ingress pipeline, which may output the packet to one or more
      logical ports or logical multicast groups.  For each such logical output
      port, OVN passes the packet through the datapath's logical egress
      pipeline, which may either drop the packet or deliver it to the
      destination.  Between the two pipelines, outputs to logical multicast
      groups are expanded into logical ports, so that the egress pipeline only
      processes a single logical output port at a time.  Between the two
      pipelines is also where, when necessary, OVN encapsulates a packet in a
      tunnel (or tunnels) to transmit to remote hypervisors.
    </p>

    <p>
      In more detail, to start, OVN searches the <ref table="Logical_Flow"/>
      table for a row with correct <ref column="logical_datapath"/>, a <ref
      column="pipeline"/> of <code>ingress</code>, a <ref column="table_id"/>
      of 0, and a <ref column="match"/> that is true for the packet.  If none
      is found, OVN drops the packet.  If OVN finds more than one, it chooses
      the match with the highest <ref column="priority"/>.  Then OVN executes
      each of the actions specified in the row's <ref column="actions"/> column,
      in the order specified.  Some actions, such as those to modify packet
      headers, require no further details.  The <code>next</code> and
      <code>output</code> actions are special.
    </p>

    <p>
      The <code>next</code> action causes the above process to be repeated
      recursively, except that OVN searches for <ref column="table_id"/> of 1
      instead of 0.  Similarly, any <code>next</code> action in a row found in
      that table would cause a further search for a <ref column="table_id"/> of
      2, and so on.  When recursive processing completes, flow control returns
      to the action following <code>next</code>.
    </p>

    <p>
      The <code>output</code> action also introduces recursion.  Its effect
      depends on the current value of the <code>outport</code> field.  Suppose
      <code>outport</code> designates a logical port.  First, OVN compares
      <code>inport</code> to <code>outport</code>; if they are equal, it treats
      the <code>output</code> as a no-op.  In the common case, where they are
      different, the packet enters the egress pipeline.  This transition to the
      egress pipeline discards register data, e.g. <code>reg0</code>
      ... <code>reg5</code>, to achieve uniform behavior regardless of whether
      the egress pipeline is on a different hypervisor (because registers
      aren't preserved across tunnel encapsulation).
    </p>

    <p>
      To execute the egress pipeline, OVN again searches the <ref
      table="Logical_Flow"/> table for a row with correct <ref
      column="logical_datapath"/>, a <ref column="table_id"/> of 0, a <ref
      column="match"/> that is true for the packet, but now looking for a <ref
      column="pipeline"/> of <code>egress</code>.  If no matching row is found,
      the output becomes a no-op.  Otherwise, OVN executes the actions for the
      matching flow (which is chosen from multiple, if necessary, as already
      described).
    </p>

    <p>
      In the <code>egress</code> pipeline, the <code>next</code> action acts as
      already described, except that it, of course, searches for
      <code>egress</code> flows.  The <code>output</code> action, however, now
      directly outputs the packet to the output port (which is now fixed,
      because <code>outport</code> is read-only within the egress pipeline).
    </p>

    <p>
      The description earlier assumed that <code>outport</code> referred to a
      logical port.  If it instead designates a logical multicast group, then
      the description above still applies, with the addition of fan-out from
      the logical multicast group to each logical port in the group.  For each
      member of the group, OVN executes the logical pipeline as described, with
      the logical output port replaced by the group member.
    </p>
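    <p>
      As an illustration only, the life cycle described above can be modeled
      in a few lines of Python.  This is a simplified sketch, not how
      <code>ovn-controller</code> works: it assumes a single logical datapath,
      represents each match as a Python callable, and uses hypothetical names
      (<code>Flow</code>, <code>lookup</code>, <code>run</code>,
      <code>process</code>) and a tuple encoding of actions that are not part
      of OVN.
    </p>

<pre><![CDATA[
```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Flow:
    """One Logical_Flow row; a single logical datapath is assumed."""
    pipeline: str                    # "ingress" or "egress"
    table_id: int
    priority: int
    match: Callable[[dict], bool]    # stand-in for a match expression
    actions: List[Tuple]             # ("set", field, value), ("next",), ("output",)


def lookup(flows: List[Flow], pipeline: str, table_id: int,
           pkt: dict) -> Optional[Flow]:
    """Return the highest-priority matching flow, or None if nothing matches."""
    hits = [f for f in flows
            if f.pipeline == pipeline and f.table_id == table_id and f.match(pkt)]
    # Equal priorities are undefined in OVN; max() simply keeps the first one.
    return max(hits, key=lambda f: f.priority) if hits else None


def run(flows, groups, pipeline, table_id, pkt, delivered):
    """Execute one table of one pipeline, recursing for next and output."""
    flow = lookup(flows, pipeline, table_id, pkt)
    if flow is None:
        return                       # ingress: packet dropped; egress: no-op
    for action in flow.actions:
        if action[0] == "set":
            pkt[action[1]] = action[2]
        elif action[0] == "next":
            run(flows, groups, pipeline, table_id + 1, pkt, delivered)
        elif action[0] == "output" and pipeline == "egress":
            delivered.append(pkt["outport"])
        elif action[0] == "output":
            # Expand a multicast group into its member ports, if it is one.
            for port in groups.get(pkt["outport"], [pkt["outport"]]):
                if port == pkt["inport"]:
                    continue         # output to the input port is a no-op
                # Registers are discarded on the way into the egress pipeline.
                egress_pkt = {k: v for k, v in pkt.items()
                              if not k.startswith("reg")}
                egress_pkt["outport"] = port
                run(flows, groups, "egress", 0, egress_pkt, delivered)


def process(flows, groups, pkt):
    """Run a packet through ingress table 0 and return the ports it reaches."""
    delivered: List[str] = []
    run(flows, groups, "ingress", 0, dict(pkt), delivered)
    return delivered


# Hypothetical example data: flood broadcasts, send everything else to p2.
FLOWS = [
    Flow("ingress", 0, 100, lambda p: p["eth.dst"] == "ff:ff:ff:ff:ff:ff",
         [("set", "outport", "flood"), ("output",)]),
    Flow("ingress", 0, 50, lambda p: True,
         [("set", "outport", "p2"), ("output",)]),
    Flow("egress", 0, 0, lambda p: True, [("output",)]),
]
GROUPS = {"flood": ["p1", "p2", "p3"]}

print(process(FLOWS, GROUPS, {"inport": "p1", "eth.dst": "ff:ff:ff:ff:ff:ff"}))
print(process(FLOWS, GROUPS, {"inport": "p1", "eth.dst": "00:00:00:00:00:02"}))
```
]]></pre>

    <p>
      With this example data, a broadcast received on <code>p1</code> is
      delivered to <code>p2</code> and <code>p3</code> but not back to
      <code>p1</code>, and a unicast packet received on <code>p1</code> is
      delivered to <code>p2</code> only.  The sketch does not enforce that
      <code>outport</code> is read-only in the egress pipeline.
    </p>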

    <column name="logical_datapath">
      The logical datapath to which the logical flow belongs.
    </column>

    <column name="pipeline">
      <p>
        The primary flows used for deciding on a packet's destination are the
        <code>ingress</code> flows.  The <code>egress</code> flows implement
        ACLs.  See <em>Logical Life Cycle of a Packet</em>, above, for details.
      </p>
    </column>

    <column name="table_id">
      The stage in the logical pipeline, analogous to an OpenFlow table number.
    </column>

    <column name="priority">
      The flow's priority.  Flows with numerically higher priority take
      precedence over those with lower.  If two logical datapath flows with the
      same priority both match, then the one actually applied to the packet is
      undefined.
    </column>

    <column name="match">
      <p>
        A matching expression.  OVN provides a superset of OpenFlow matching
        capabilities, using a syntax similar to Boolean expressions in a
        programming language.
      </p>

      <p>
        The most important components of a match expression are
        <dfn>comparisons</dfn> between <dfn>symbols</dfn> and
        <dfn>constants</dfn>, e.g. <code>ip4.dst == 192.168.0.1</code>,
        <code>ip.proto == 6</code>, <code>arp.op == 1</code>, <code>eth.type ==
        0x800</code>.  The logical AND operator <code>&amp;&amp;</code> and
        logical OR operator <code>||</code> can combine comparisons into a
        larger expression.
      </p>

      <p>
        Matching expressions also support parentheses for grouping, the logical
        NOT prefix operator <code>!</code>, and literals <code>0</code> and
        <code>1</code> to express ``false'' or ``true,'' respectively.  The
        latter is useful by itself as a catch-all expression that matches every
        packet.
      </p>

      <p><em>Symbols</em></p>

      <p>
        <em>Type</em>.  Symbols have <dfn>integer</dfn> or <dfn>string</dfn>
        type.  Integer symbols have a <dfn>width</dfn> in bits.
      </p>

      <p>
        <em>Kinds</em>.  There are three kinds of symbols:
      </p>

      <ul>
        <li>
          <p>
            <dfn>Fields</dfn>.  A field symbol represents a packet header or
            metadata field.  For example, a field
            named <code>vlan.tci</code> might represent the VLAN TCI field in a
            packet.
          </p>

          <p>
            A field symbol can have integer or string type.  Integer fields can
            be nominal or ordinal (see <em>Level of Measurement</em>,
            below).
          </p>
        </li>

        <li>
          <p>
            <dfn>Subfields</dfn>.  A subfield represents a subset of bits from
            a larger field.  For example, a field <code>vlan.vid</code> might
            be defined as an alias for <code>vlan.tci[0..11]</code>.  Subfields
            are provided for syntactic convenience, because it is always
            possible to instead refer to a subset of bits from a field
            directly.
          </p>

          <p>
            Only ordinal fields (see <em>Level of Measurement</em>,
            below) may have subfields.  Subfields are always ordinal.
          </p>
        </li>

        <li>
          <p>
            <dfn>Predicates</dfn>.  A predicate is shorthand for a Boolean
            expression.  Predicates may be used much like 1-bit fields.  For
            example, <code>ip4</code> might expand to <code>eth.type ==
            0x800</code>.  Predicates are provided for syntactic convenience,
            because it is always possible to instead specify the underlying
            expression directly.
          </p>

          <p>
            A predicate whose expansion refers to any nominal field or
            predicate (see <em>Level of Measurement</em>, below) is nominal;
            other predicates have Boolean level of measurement.
          </p>
        </li>
      </ul>

      <p>
        <em>Level of Measurement</em>.  See
        http://en.wikipedia.org/wiki/Level_of_measurement for the statistical
        concept on which this classification is based.  There are three
        levels:
      </p>

      <ul>
        <li>
          <p>
            <dfn>Ordinal</dfn>.  In statistics, ordinal values can be ordered
            on a scale.  OVN considers a field (or subfield) to be ordinal if
            its bits can be examined individually.  This is true for the
            OpenFlow fields that OpenFlow or Open vSwitch makes ``maskable.''
          </p>

          <p>
            Any use of an ordinal field may specify a single bit or a range of
            bits, e.g. <code>vlan.tci[13..15]</code> refers to the PCP field
            within the VLAN TCI, and <code>eth.dst[40]</code> refers to the
            multicast bit in the Ethernet destination address.
          </p>

          <p>
            OVN supports all the usual arithmetic relations (<code>==</code>,
            <code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
            <code>&gt;</code>, and <code>&gt;=</code>) on ordinal fields and
            their subfields, because OVN can implement these in OpenFlow and
            Open vSwitch as collections of bitwise tests.
          </p>
        </li>

        <li>
          <p>
            <dfn>Nominal</dfn>.  In statistics, nominal values cannot be
            usefully compared except for equality.  OpenFlow port numbers,
            Ethernet types, and IP protocols are examples: all of these are
            just identifiers assigned arbitrarily with no deeper meaning.  In
            OpenFlow and Open vSwitch, bits in these fields generally aren't
            individually addressable.
          </p>

          <p>
            OVN only supports arithmetic tests for equality on nominal fields,
            because OpenFlow and Open vSwitch provide no way for a flow to
            efficiently implement other comparisons on them.  (A test for
            inequality can be approximated using two flows with different
            priorities, but OVN matching expressions always generate flows with
            a single priority.)
          </p>

          <p>
            String fields are always nominal.
          </p>
        </li>

        <li>
          <p>
            <dfn>Boolean</dfn>.  A nominal field that has only two values, 0
            and 1, is somewhat exceptional, since it is easy to support both
            equality and inequality tests on such a field: either one can be
            implemented as a test for 0 or 1.
          </p>

          <p>
            Only predicates (see above) have a Boolean level of measurement.
          </p>

          <p>
            This isn't a standard level of measurement.
          </p>
        </li>
      </ul>
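      <p>
        The bit-range notation used for ordinal fields above, such as
        <code>vlan.tci[13..15]</code> or <code>eth.dst[40]</code>, can be read
        as a shift followed by a mask, with bit 0 as the least significant
        bit.  The Python sketch below illustrates that reading; the helper
        names <code>subfield</code> and <code>parse_mac</code> are
        hypothetical and exist only for this example.
      </p>

<pre><![CDATA[
```python
def subfield(value, lo, hi=None):
    """Return bits lo..hi (inclusive, bit 0 = least significant) of value.

    With a single index, as in eth.dst[40], one bit is returned.
    """
    if hi is None:
        hi = lo
    if lo < 0 or hi < lo:
        raise ValueError("invalid bit range")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def parse_mac(text):
    """Convert a colon-separated Ethernet address to a 48-bit integer."""
    return int(text.replace(":", ""), 16)


# A VLAN TCI with PCP 5 in bits 13..15 and VID 100 in bits 0..11.
tci = (5 << 13) | 100

print(subfield(tci, 13, 15))                          # vlan.tci[13..15]
print(subfield(tci, 0, 11))                           # vlan.tci[0..11]
print(subfield(parse_mac("01:00:5e:00:00:01"), 40))   # eth.dst[40]
```
]]></pre>

      <p>
        Here bit 40 of a 48-bit Ethernet address is the least significant bit
        of the first octet, which is why it identifies multicast destinations.
      </p>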

      <p>
        <em>Prerequisites</em>.  Any symbol can have prerequisites, which are
        additional conditions implied by the use of the symbol.  For example,
        the <code>icmp4.type</code> symbol might have prerequisite
        <code>icmp4</code>, which would cause an expression <code>icmp4.type ==
        0</code> to be interpreted as <code>icmp4.type == 0 &amp;&amp;
        icmp4</code>, which would in turn expand to <code>icmp4.type == 0
        &amp;&amp; eth.type == 0x800 &amp;&amp; ip4.proto == 1</code> (assuming
        <code>icmp4</code> is a predicate defined as suggested under
        <em>Kinds</em> above).
      </p>
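      <p>
        The two-step expansion in the example above (first adding the
        prerequisite, then expanding the predicate) can be sketched in Python
        as follows.  The tables <code>PREREQS</code> and
        <code>PREDICATES</code> and both function names are assumptions made
        for this illustration, and the sketch handles only flat conjunctions
        represented as strings; a real implementation would operate on a
        parsed expression.
      </p>

<pre><![CDATA[
```python
# Hypothetical symbol tables mirroring the example in the text.
PREREQS = {"icmp4.type": "icmp4", "icmp4.code": "icmp4"}
PREDICATES = {"icmp4": "eth.type == 0x800 && ip4.proto == 1"}


def with_prerequisites(symbol, comparison):
    """Append the prerequisite of symbol, if any, to a comparison."""
    prereq = PREREQS.get(symbol)
    return comparison if prereq is None else f"{comparison} && {prereq}"


def expand_predicates(expr):
    """Replace each predicate term in a flat conjunction by its definition."""
    expanded = []
    for term in (t.strip() for t in expr.split("&&")):
        if term in PREDICATES:
            # A predicate's definition may itself mention predicates.
            expanded.append(expand_predicates(PREDICATES[term]))
        else:
            expanded.append(term)
    return " && ".join(expanded)


step1 = with_prerequisites("icmp4.type", "icmp4.type == 0")
step2 = expand_predicates(step1)
print(step1)
print(step2)
```
]]></pre>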

      <p><em>Relational operators</em></p>

      <p>
        All of the standard relational operators <code>==</code>,
        <code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>,
        <code>&gt;</code>, and <code>&gt;=</code> are supported.  Nominal
        fields support only <code>==</code> and <code>!=</code>, and only in a
        positive sense when outer <code>!</code> are taken into account,
        e.g. given string field <code>inport</code>, <code>inport ==
        "eth0"</code> and <code>!(inport != "eth0")</code> are acceptable, but
        not <code>inport != "eth0"</code>.
      </p>

      <p>
        The implementation of <code>==</code> (or <code>!=</code> when it is
        negated) is more efficient than that of the other relational
        operators.
      </p>

      <p><em>Constants</em></p>

      <p>
        Integer constants may be expressed in decimal, hexadecimal prefixed by
        <code>0x</code>, or as dotted-quad IPv4 addresses, IPv6 addresses in
        their standard forms, or Ethernet addresses as colon-separated hex
        digits.  A constant in any of these forms may be followed by a slash
        and a second constant (the mask) in the same form, to form a masked
        constant.  IPv4 and IPv6 masks may be given as integers, to express
        CIDR prefixes.
      </p>
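      <p>
        To make the constant forms above concrete, the following Python sketch
        converts each form to an integer value and an optional mask.  It is an
        approximation for illustration, not OVN's lexer: the function names
        are hypothetical, and requiring the mask to use exactly the same form
        as the value is a strict reading of the rule above.
      </p>

<pre><![CDATA[
```python
import ipaddress


def _parse_one(text):
    """Return (integer value, form name) for one unmasked constant."""
    if ":" in text:
        parts = text.split(":")
        if len(parts) == 6 and all(len(p) == 2 for p in parts):
            return int("".join(parts), 16), "ethernet"
        return int(ipaddress.IPv6Address(text)), "ipv6"
    if "." in text:
        return int(ipaddress.IPv4Address(text)), "ipv4"
    if text.lower().startswith("0x"):
        return int(text, 16), "hex"
    return int(text, 10), "decimal"


def parse_constant(text):
    """Return (value, mask); mask is None for an unmasked constant."""
    value_text, slash, mask_text = text.partition("/")
    value, form = _parse_one(value_text)
    if not slash:
        return value, None
    if form in ("ipv4", "ipv6") and mask_text.isdigit():
        # An integer mask on an IP address is a CIDR prefix length.
        width = 32 if form == "ipv4" else 128
        prefix = int(mask_text)
        if prefix > width:
            raise ValueError("prefix length too large")
        return value, ((1 << prefix) - 1) << (width - prefix)
    mask, mask_form = _parse_one(mask_text)
    if mask_form != form:
        raise ValueError("mask must use the same form as the value")
    return value, mask


print(parse_constant("0x800"))
print(parse_constant("10.0.0.0/8"))
print(parse_constant("01:00:00:00:00:00/01:00:00:00:00:00"))
```
]]></pre>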

      <p>
        String constants have the same syntax as quoted strings in JSON (thus,
        they are Unicode strings).
      </p>

      <p>
        Some operators support sets of constants written inside curly braces
        <code>{</code> ... <code>}</code>.  Commas between elements of a set,
        and after the last element, are optional.  With <code>==</code>,
        ``<code><var>field</var> == { <var>constant1</var>,
        <var>constant2</var>, </code>...<code> }</code>'' is syntactic sugar
        for ``<code><var>field</var> == <var>constant1</var> ||
        <var>field</var> == <var>constant2</var> || </code>...''.
        Similarly, ``<code><var>field</var> != { <var>constant1</var>,
        <var>constant2</var>, </code>...<code> }</code>'' is equivalent to
        ``<code><var>field</var> != <var>constant1</var> &amp;&amp;
        <var>field</var> != <var>constant2</var> &amp;&amp;
        </code>...''.
      </p>
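      <p>
        The set notation described above amounts to a mechanical rewrite,
        which the following Python sketch spells out.  The function names
        <code>parse_set</code> and <code>expand_set</code> are hypothetical,
        and the tokenizer is deliberately naive: it splits on commas and white
        space, so it does not handle quoted string constants that contain
        either.
      </p>

<pre><![CDATA[
```python
import re


def parse_set(text):
    """Split "{ a, b c, }" into its elements; commas are optional."""
    stripped = text.strip()
    if not (stripped.startswith("{") and stripped.endswith("}")):
        raise ValueError("a set must be enclosed in curly braces")
    return [tok for tok in re.split(r"[,\s]+", stripped[1:-1]) if tok]


def expand_set(field, op, constants):
    """Rewrite a set comparison as a chain of single comparisons."""
    if op == "==":
        joiner = " || "      # equal to any element
    elif op == "!=":
        joiner = " && "      # different from every element
    else:
        raise ValueError("only == and != are handled in this sketch")
    if not constants:
        raise ValueError("empty set")
    return joiner.join(f"{field} {op} {c}" for c in constants)


print(expand_set("tcp.dst", "==", parse_set("{ 80, 443 }")))
print(expand_set("eth.type", "!=", parse_set("{0x800 0x86dd,}")))
```
]]></pre>

      <p>
        When the result is combined with other terms, it should be wrapped in
        parentheses, because mixing the logical AND and OR operators requires
        explicit grouping.
      </p>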

      <p><em>Miscellaneous</em></p>

      <p>
        Comparisons may name the symbol or the constant first,
        e.g. <code>tcp.src == 80</code> and <code>80 == tcp.src</code> are both
        acceptable.
      </p>

      <p>
        Tests for a range may be expressed using a syntax like <code>1024 &lt;=
        tcp.src &lt;= 49151</code>, which is equivalent to <code>1024 &lt;=
        tcp.src &amp;&amp; tcp.src &lt;= 49151</code>.
      </p>

      <p>
        For a one-bit field or predicate, a mention of its name is equivalent
        to <code><var>symbol</var> == 1</code>, e.g. <code>vlan.present</code>
        is equivalent to <code>vlan.present == 1</code>.  The same is true for
        one-bit subfields, e.g. <code>vlan.tci[12]</code>.  There is no
        technical limitation to implementing the same for ordinal fields of all
        widths, but the implementation is expensive enough that the syntax
        parser requires writing an explicit comparison against zero to make
        mistakes less likely, e.g. in <code>tcp.src != 0</code> the comparison
        against 0 is required.
      </p>

      <p>
        <em>Operator precedence</em> is as shown below, from highest to lowest.
        There are two exceptions where parentheses are required even though the
        table would suggest that they are not: <code>&amp;&amp;</code> and
        <code>||</code> require parentheses when used together, and
        <code>!</code> requires parentheses when applied to a relational
        expression.  Thus, in <code>(eth.type == 0x800 || eth.type == 0x86dd)
        &amp;&amp; ip.proto == 6</code> or <code>!(arp.op == 1)</code>, the
        parentheses are mandatory.
      </p>

      <ul>
        <li><code>()</code></li>
        <li><code>==   !=   &lt;   &lt;=   &gt;   &gt;=</code></li>
        <li><code>!</code></li>
        <li><code>&amp;&amp;   ||</code></li>
      </ul>

      <p>
        <em>Comments</em> may be introduced by <code>//</code>, which extends
        to the next new-line.  Comments within a line may be bracketed by
        <code>/*</code> and <code>*/</code>.  Multiline comments are not
        supported.
      </p>

      <p><em>Symbols</em></p>

      <p>
        Most of the symbols below have integer type.  Only <code>inport</code>
        and <code>outport</code> have string type.  <code>inport</code> names a
        logical port.  Thus, its value is a <ref column="logical_port"/> name
        from the <ref table="Port_Binding"/> table.  <code>outport</code> may
        name a logical port, as <code>inport</code>, or a logical multicast
        group defined in the <ref table="Multicast_Group"/> table.  For both
        symbols, only names within the flow's logical datapath may be used.
      </p>

      <ul>
        <li><code>reg0</code>...<code>reg5</code></li>
        <li><code>inport</code> <code>outport</code></li>
        <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
        <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
        <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
        <li><code>ip4.src</code> <code>ip4.dst</code></li>
        <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
        <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
        <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
        <li><code>udp.src</code> <code>udp.dst</code></li>
        <li><code>sctp.src</code> <code>sctp.dst</code></li>
        <li><code>icmp4.type</code> <code>icmp4.code</code></li>
        <li><code>icmp6.type</code> <code>icmp6.code</code></li>
        <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
      </ul>

    </column>

    <column name="actions">
      <p>
        Logical datapath actions, to be executed when the logical flow
        represented by this row is the highest-priority match.
      </p>

      <p>
        Actions share lexical syntax with the <ref column="match"/> column.  An
        empty set of actions (or one that contains just white space or
        comments), or a set of actions that consists of just
        <code>drop;</code>, causes the matched packets to be dropped.
        Otherwise, the column should contain a sequence of actions, each
        terminated by a semicolon.
      </p>

      <p>
        The following actions are defined:
      </p>

      <dl>
        <dt><code>output;</code></dt>
        <dd>
          <p>
            In the ingress pipeline, this action executes the
            <code>egress</code> pipeline as a subroutine.  If
            <code>outport</code> names a logical port, the egress pipeline
            executes once; if it is a multicast group, the egress pipeline runs
            once for each logical port in the group.
          </p>

          <p>
            In the egress pipeline, this action performs the actual
            output to the <code>outport</code> logical port.  (In the egress
            pipeline, <code>outport</code> never names a multicast group.)
          </p>

          <p>
            Output to the input port is implicitly dropped; that is,
            <code>output</code> becomes a no-op if <code>outport</code> ==
            <code>inport</code>.
          </p>
        </dd>

        <dt><code>next;</code></dt>
        <dd>
          Executes the next logical datapath table as a subroutine.
        </dd>

        <dt><code><var>field</var> = <var>constant</var>;</code></dt>
        <dd>
          <p>
            Sets data or metadata field <var>field</var> to constant value
            <var>constant</var>, e.g. <code>outport = "vif0";</code> to set the
            logical output port.  To set only a subset of bits in a field,
            specify a subfield for <var>field</var> or a masked
            <var>constant</var>, e.g. one may use <code>vlan.pcp[2] = 1;</code>
            or <code>vlan.pcp = 4/4;</code> to set the most significant bit of
            the VLAN PCP.
          </p>

          <p>
            Assigning to a field with prerequisites implicitly adds those
            prerequisites to <ref column="match"/>; thus, for example, a flow
            that sets <code>tcp.dst</code> applies only to TCP flows,
            regardless of whether its <ref column="match"/> mentions any TCP
            field.
          </p>

          <p>
            Not all fields are modifiable (e.g. <code>eth.type</code> and
            <code>ip.proto</code> are read-only), and not all modifiable fields
            may be partially modified (e.g. <code>ip.ttl</code> must be assigned
            as a whole).  The <code>outport</code> field is modifiable in the
            <code>ingress</code> pipeline but not in the <code>egress</code>
            pipeline.
          </p>
        </dd>
      </dl>
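
      <p>
        As an illustration only, the actions defined above might be combined
        as follows.  The port name <code>vif0</code> and the register value
        are hypothetical.  The first line selects a logical output port and
        outputs the packet; the second sets a register and continues to the
        next table:
      </p>

      <pre>
outport = "vif0"; output;
reg0 = 1; next;
      </pre>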

      <p>
        The following actions will likely be useful later, but they have not
        been thought out carefully.
      </p>

      <dl>
        <dt><code><var>field1</var> = <var>field2</var>;</code></dt>
        <dd>
          Extends the assignment action to allow copying between fields.
        </dd>

        <dt><code>learn</code></dt>

        <dt><code>conntrack</code></dt>

        <dt><code>dec_ttl { <var>action</var>; </code>...<code> } { <var>action</var>; </code>...<code> };</code></dt>
        <dd>
          Decrements the TTL.  Executes the first set of actions if the
          decrement succeeds, or the second set if it fails.
        </dd>

        <dt><code>icmp_reply { <var>action</var>; </code>...<code> };</code></dt>
        <dd>Generates an ICMP reply from the packet and executes the <var>action</var>s.</dd>

        <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
        <dd>Generates an ARP from the packet and executes the <var>action</var>s.</dd>
      </dl>
    </column>

    <column name="external_ids" key="stage-name">
      Human-readable name for this flow's stage in the pipeline.
    </column>

    <group title="Common Columns">
      The overall purpose of these columns is described under <code>Common
      Columns</code> at the beginning of this document.

      <column name="external_ids"/>
    </group>
  </table>

  <table name="Multicast_Group" title="Logical Port Multicast Groups">
    <p>
      The rows in this table define multicast groups of logical ports.
      Multicast groups allow a single packet transmitted over a tunnel to a
      hypervisor to be delivered to multiple VMs on that hypervisor, which
      uses bandwidth more efficiently than sending a separate copy for each VM.
    </p>

    <p>
      Each row in this table defines a logical multicast group numbered <ref
      column="tunnel_key"/> within <ref column="datapath"/>, whose logical
      ports are listed in the <ref column="ports"/> column.
    </p>

    <column name="datapath">
      The logical datapath in which the multicast group resides.
    </column>

    <column name="tunnel_key">
      The value used to designate this logical egress port in tunnel
      encapsulations.  An index forces the key to be unique within the <ref
      column="datapath"/>.  The unusual range ensures that multicast group IDs
      do not overlap with logical port IDs.
    </column>

    <column name="name">
      <p>
        The logical multicast group's name.  An index forces the name to be
        unique within the <ref column="datapath"/>.  Logical flows in the
        ingress pipeline may output to the group just as for individual logical
        ports, by assigning the group's name to <code>outport</code> and
        executing an <code>output</code> action.
      </p>

      <p>
        Multicast group names and logical port names share a single namespace
        and thus should not overlap (but the database schema cannot enforce
        this).  To try to avoid conflicts, <code>ovn-northd</code> uses names
        that begin with <code>_MC_</code>.
      </p>
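
      <p>
        For example, assuming a multicast group hypothetically named
        <code>_MC_example</code>, a logical flow in the ingress pipeline could
        deliver a packet to every logical port in the group with the following
        actions:
      </p>

      <pre>
outport = "_MC_example"; output;
      </pre>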
    </column>

    <column name="ports">
      The logical ports included in the multicast group.  All of these ports
      must be in the <ref column="datapath"/> logical datapath (but the
      database schema cannot enforce this).
    </column>
  </table>

  <table name="Datapath_Binding" title="Physical-Logical Datapath Bindings">
    <p>
      Each row in this table identifies physical bindings of a logical
      datapath.  A logical datapath implements a logical pipeline among the
      ports in the <ref table="Port_Binding"/> table associated with it.  In
      practice, the pipeline in a given logical datapath implements either a
      logical switch or a logical router.
    </p>

    <column name="tunnel_key">
      The tunnel key value to which the logical datapath is bound.
      The <code>Tunnel Encapsulation</code> section in
      <code>ovn-architecture</code>(7) describes how tunnel keys are
      constructed for each supported encapsulation.
    </column>

    <column name="external_ids" key="logical-switch" type='{"type": "uuid"}'>
      Each row in <ref table="Datapath_Binding"/> is associated with some
      logical datapath.  <code>ovn-northd</code> uses this key to store the
      UUID of the corresponding <ref table="Logical_Switch"
      db="OVN_Northbound"/> row in the <ref db="OVN_Northbound"/> database.
    </column>

    <group title="Common Columns">
      The overall purpose of these columns is described under <code>Common
      Columns</code> at the beginning of this document.

      <column name="external_ids"/>
    </group>
  </table>

  <table name="Port_Binding" title="Physical-Logical Port Bindings">
    <p>
      Each row in this table identifies the physical location of a logical
      port.
    </p>

    <p>
      For every <code>Logical_Port</code> record in the
      <code>OVN_Northbound</code> database, <code>ovn-northd</code> creates a
      record in this table.
      <code>ovn-northd</code> populates and maintains every column except
      the <code>chassis</code> column, which it leaves empty in new records.
    </p>

    <p>
      <code>ovn-controller</code> populates the <code>chassis</code> column
      for the records that identify the logical ports located on its
      hypervisor.  It discovers these ports by monitoring the local
      hypervisor's Open_vSwitch database, which identifies logical ports via
      the conventions described in <code>IntegrationGuide.md</code>.
    </p>

    <p>
      When a chassis shuts down gracefully, it should clean up the
      <code>chassis</code> column that it previously had populated.
      (This is not critical because resources hosted on the chassis are equally
      unreachable regardless of whether their rows are present.)  To handle the
      case where a VM is shut down abruptly on one chassis, then brought up
      again on a different one, <code>ovn-controller</code> must overwrite the
      <code>chassis</code> column with new information.
    </p>

    <column name="datapath">
      The logical datapath to which the logical port belongs.
    </column>

    <column name="logical_port">
      A logical port, taken from <ref table="Logical_Port" column="name"
      db="OVN_Northbound"/> in the OVN_Northbound database's
      <ref table="Logical_Port" db="OVN_Northbound"/> table.  OVN does not
      prescribe a particular format for the logical port ID.
    </column>

    <column name="type">
      <p>
        A type for this logical port.  Logical ports can be used to model
        other types of connectivity into an OVN logical switch.  Leaving this
        column blank maintains the default logical port behavior.
      </p>

      <p>
        There are no other logical port types implemented yet.
      </p>
    </column>

    <column name="options">
      This column provides key/value settings specific to the logical port
      <ref column="type"/>.
    </column>

    <column name="tunnel_key">
      <p>
        A number that represents the logical port in the key (e.g. STT key or
        Geneve TLV) field carried within tunnel protocol packets.
      </p>

      <p>
        The tunnel ID must be unique within the scope of a logical datapath.
      </p>
    </column>

    <column name="parent_port">
      For containers created inside a VM, this is taken from
      <ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
      in the OVN_Northbound database's <ref table="Logical_Port"
      db="OVN_Northbound"/> table.  It is left empty if
      <ref column="logical_port"/> belongs to a VM or a container created
      in the hypervisor.
    </column>

    <column name="tag">
      When <ref column="logical_port"/> identifies the interface of a container
      spawned inside a VM, this column identifies the VLAN tag in
      the network traffic associated with that container's network interface.
      It is left empty if <ref column="logical_port"/> belongs to a VM or a
      container created in the hypervisor.
    </column>

    <column name="chassis">
      The physical location of the logical port.  To successfully identify a
      chassis, this column must be a <ref table="Chassis"/> record.  This is
      populated by <code>ovn-controller</code>.
    </column>

    <column name="mac">
      <p>
        The Ethernet address or addresses used as a source address on the
        logical port, each in the form
        <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
        The string <code>unknown</code> is also allowed to indicate that the
        logical port has an unknown set of (additional) source addresses.
      </p>

      <p>
        A VM interface would ordinarily have a single Ethernet address.  A
        gateway port might initially only have <code>unknown</code>, and then
        add MAC addresses to the set as it learns new source addresses.
      </p>
    </column>
  </table>
</database>