* Flow match expression handling library.
ovn-controller is the primary user of flow match expressions, but
the same syntax, and probably the same code, ought to be useful in
ovn-northd for ACL match expressions.
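For a rough sense of the kind of input this library needs to handle
(the concrete syntax is still to be decided, so the field names and
operators below are only a hypothetical sketch), a match expression
might look something like:

    inport == "lp1" && ip4 && (tcp.dst == 80 || tcp.dst == 443)

The library would parse such an expression, simplify it, and
eventually turn it into one or more OpenFlow matches.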
* ovn-controller
** Flow table handling in ovn-controller.
ovn-controller has to transform logical datapath flows from the
database into OpenFlow flows.
*** Definition (or choice) of data structure for flows and flow table.
It would be natural enough to use "struct flow" and "struct
classifier" for this. Maybe that is what we should do. However,
"struct classifier" is optimized for searches based on packet
headers, whereas all we care about here can be implemented with a
hash table. Also, we may want to make it easy to add and remove
support for fields without recompiling, which is not possible with
"struct flow" or "struct classifier".
On the other hand, we may find that it is difficult to decide that
two OXM flow matches are identical (to normalize them) without a
lot of domain-specific knowledge that is already embedded in struct
flow. It's also going to be a pain to come up with a way to make
anything other than "struct flow" work with the ofputil_*()
functions for encoding and decoding OpenFlow.
It's also possible we could use struct flow without struct
classifier.
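If we do end up with a plain hash table, a minimal sketch in C might
look like the following. It reuses the existing OVS "hmap" and
"match" libraries; the structure and function names themselves are
made up for illustration, not an actual design:

    #include <stdint.h>
    #include "hash.h"
    #include "hmap.h"
    #include "match.h"

    /* One OpenFlow flow desired in the switch, keyed on
     * (table_id, priority, match). */
    struct ovn_flow {
        struct hmap_node hmap_node; /* In a "struct hmap" flow table. */
        uint8_t table_id;
        uint16_t priority;
        struct match match;
        struct ofpact *ofpacts;     /* Encoded OpenFlow actions. */
        size_t ofpacts_len;
    };

    static uint32_t
    ovn_flow_hash(const struct ovn_flow *f)
    {
        return hash_2words((f->table_id << 16) | f->priority,
                           match_hash(&f->match, 0));
    }

    /* Returns the flow in 'flow_table' with the same key as 'target',
     * or NULL if there is none. */
    static struct ovn_flow *
    ovn_flow_lookup(struct hmap *flow_table, const struct ovn_flow *target)
    {
        struct ovn_flow *f;

        HMAP_FOR_EACH_WITH_HASH (f, hmap_node, ovn_flow_hash(target),
                                 flow_table) {
            if (f->table_id == target->table_id
                && f->priority == target->priority
                && match_equal(&f->match, &target->match)) {
                return f;
            }
        }
        return NULL;
    }

This sidesteps the classifier's priority-ordered lookup, which we do
not need here, but it does tie us to "struct match" (and therefore to
"struct flow") for normalizing and comparing matches.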
*** Assembling conjunctive flows from flow match expressions.
This transformation explodes logical datapath flows into multiple
OpenFlow flow table entries, since a flow match expression in
conjunction-of-disjunctions form requires several OpenFlow flow
table entries. It also requires merging together OpenFlow flow
table entries that contain "conjunction" actions (really just
concatenating their actions).
*** Translating logical datapath port names into port numbers.
Logical ports are specified by name in logical datapath flows, but
OpenFlow only works in terms of numbers.
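A minimal sketch of the lookup, assuming we keep a name-to-port-number
map using the existing OVS "simap" library (the function and variable
names are made up for illustration):

    #include "simap.h"

    /* 'lport_to_ofport' maps logical port names to OpenFlow port
     * numbers and would be kept up to date from the local bindings.
     * simap_get() returns 0 for an unknown name, which conveniently
     * is not a valid OpenFlow port number. */
    static unsigned int
    lport_to_ofp_port(const struct simap *lport_to_ofport,
                      const char *name)
    {
        return simap_get(lport_to_ofport, name);
    }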
*** Translating logical datapath actions into OpenFlow actions.
Some of the logical datapath actions do not have natural
representations as OpenFlow actions: they require
packet-in/packet-out round trips through ovn-controller. The
trickiest part is making sure that the packet-out resumes the
control flow that was broken off by the packet-in; we'll probably
have to either restrict control flow or add OVS features that make
resuming possible in general. Not sure which is better at this
point.
*** OpenFlow flow table synchronization.
The internal representation of the OpenFlow flow table has to be
synced across the controller connection to OVS. This probably
boils down to the "flow monitoring" feature of OF1.4, which was then
made available as a "standard extension" to OF1.3. (OVS hasn't
implemented this for OF1.4 yet, but the feature is based on an OVS
extension to OF1.0, so it should be straightforward to add.)
We probably need some way to catch cases where OVS and OVN don't
see eye-to-eye on what exactly constitutes a flow, so that OVN
doesn't waste a lot of CPU time hammering at OVS trying to install
something that OVS is never going to accept.
*** Logical/physical translation.
When a packet comes into the integration bridge, the first stage of
processing needs to translate it from a physical to a logical
context. When a packet leaves the integration bridge, the final
stage of processing needs to translate it back into a physical
context. ovn-controller needs to populate the OpenFlow flow
tables to do these translations.
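As a very rough sketch (the register assignments, table numbers, and
tunnel key layout below are all hypothetical and just for
illustration), the ingress translation might amount to one flow per
local VIF and per tunnel that stamps the logical context into
registers before handing off to the logical pipeline:

    # Packet from a local VIF: attach logical datapath and ingress port.
    in_port=5 actions=load:0x7->NXM_NX_REG0[],load:0x3->NXM_NX_REG1[],resubmit(,16)
    # Packet from a tunnel: recover the logical context from the tunnel key.
    in_port=10 actions=move:NXM_NX_TUN_ID[0..23]->NXM_NX_REG0[0..23],resubmit(,16)

The egress stage would do the reverse: match on the logical egress
port in a register and output to the corresponding local VIF or
tunnel, setting the tunnel key as needed.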
*** Determine how to split logical pipeline across physical nodes.
From the original OVN architecture document:
The pipeline processing is split between the ingress and egress
transport nodes. In particular, the logical egress processing may
occur at either hypervisor. Processing the logical egress on the
ingress hypervisor requires more state about the egress vif's
policies, but reduces traffic on the wire that would eventually be
dropped. Whereas, processing on the egress hypervisor can reduce
broadcast traffic on the wire by doing local replication. We
initially plan to process logical egress on the egress hypervisor
so that less state needs to be replicated. However, we may change
this behavior once we gain some experience writing the logical
flows.
This split in pipeline processing will influence how tunnel keys
are encoded.
** Interaction with Open_vSwitch and OVN databases:
*** Monitor Chassis table in OVN.
Populate Port records for tunnels to other chassis into the
Open_vSwitch database. As a scale optimization later on, one can
populate only records for tunnels to other chassis that have
logical networks in common with this one.
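For illustration (the port naming and choice of encapsulation here
are hypothetical), the record added for a peer chassis might be
equivalent to:

    ovs-vsctl add-port br-int ovn-<chassis> -- \
        set Interface ovn-<chassis> type=geneve \
            options:remote_ip=<chassis encap IP> options:key=flow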
*** Monitor Pipeline table in OVN, trigger flow table recomputation on change.
** ovn-controller parameters and configuration.
*** Tunnel encapsulation to publish.
Default: VXLAN? Geneve?
*** SSL configuration.
Can probably get this from Open_vSwitch database.
* ovn-northd
** Monitor OVN_Northbound database, trigger Pipeline recomputation on change.
** Translate each OVN_Northbound entity into Pipeline logical datapath flows.
We have to first sit down and figure out what the general
translation of each entity is. The original OVN architecture
description at
http://openvswitch.org/pipermail/dev/2015-January/050380.html had
some sketches of these, but they need to be completed and
elaborated.
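As a purely illustrative example (the logical flow syntax and table
layout are not settled), a logical port "lp1" with MAC
00:00:00:00:00:01 on a logical switch might produce Pipeline rows
conceptually like:

    table=0 priority=50 match="eth.src == 00:00:00:00:00:01 && inport == \"lp1\"" actions="next;"
    table=1 priority=50 match="eth.dst == 00:00:00:00:00:01" actions="outport = \"lp1\"; output;"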
Initially, the simplest way to do this is probably to write
straight C code to do a full translation of the entire
OVN_Northbound database into the format for the Pipeline table in
the OVN Southbound database. As scale increases, this will probably
be too inefficient since a small change in OVN_Northbound requires a
full recomputation. At that point, we probably want to adopt a more
systematic approach, such as something akin to the "nlog" system used
in NVP (see Koponen et al. "Network Virtualization in Multi-tenant
Datacenters", NSDI 2014).
** Push logical datapath flows to Pipeline table.
** Monitor OVN Southbound database Bindings table.
Sync rows in the OVN Bindings table to the "up" column in the
OVN_Northbound database.
* ovsdb-server
ovsdb-server should have adequate features for OVN but it probably
needs work for scale and possibly for availability as deployments
grow. Here are some thoughts.
Andy Zhou is looking at these issues.
** Scaling number of connections.
In typical use today a given ovsdb-server has only a single-digit
number of simultaneous connections. The OVN Southbound database will
have a connection from every hypervisor. This use case needs testing
and probably coding work. Here are some possible improvements.
*** Reducing amount of data sent to clients.
Currently, whenever a row monitored by a client changes,
ovsdb-server sends the client every monitored column in the row,
even if only one column changes. It might be valuable to send
only the columns that actually changed.
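For example (the table and column names here are made up), if a
client monitors columns "name", "up", and "addresses" and only "up"
changes, the notification still carries every monitored column in
"new":

    {"method": "update",
     "params": [null,
       {"Binding": {"<row uuid>": {"old": {"up": false},
                                   "new": {"name": "lp1", "up": true,
                                           "addresses": ["set", []]}}}}],
     "id": null}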
Also, whenever a column changes, ovsdb-server sends the entire
contents of the column. It might be valuable, for columns that
are sets or maps, to send only the added or removed values or
key-value pairs.
Currently, clients monitor the entire contents of a table. It
might make sense to allow clients to monitor only rows that
satisfy specific criteria, e.g. to allow an ovn-controller to
receive only Pipeline rows for logical networks on its hypervisor.
*** Reducing redundant data and code within ovsdb-server.
Currently, ovsdb-server separately composes database update
information to send to each of its clients. This is fine for a
small number of clients, but it wastes time and memory when
hundreds of clients all want the same updates (as will be the case
in OVN).
(This is somewhat opposed to the idea of letting a client monitor
only some rows in a table, since that would increase the diversity
among clients.)
*** Multithreading.
If it turns out that other changes don't let ovsdb-server scale
adequately, we can multithread ovsdb-server. Initially one might
only break protocol handling into separate threads, leaving the
actual database work serialized through a lock.
** Increasing availability.
Database availability might become an issue. The OVN system
shouldn't grind to a halt if the database becomes unavailable, but
it would become impossible to bring VIFs up or down, etc.
My current thought on how to increase availability is to add
clustering to ovsdb-server, probably via the Raft consensus
algorithm. As an experiment, I wrote an implementation of Raft
for Open vSwitch that you can clone from:
https://github.com/blp/ovs-reviews.git raft
** Reducing startup time.
As-is, if ovsdb-server restarts, every client will fetch a fresh
copy of the part of the database that it cares about. With
hundreds of clients, this could cause heavy CPU load on
ovsdb-server and use excessive network bandwidth. It would be
better to allow incremental updates even across connection loss.
One way might be to use "Difference Digests" as described in
Epstein et al., "What's the Difference? Efficient Set
Reconciliation Without Prior Context". (I'm not yet aware of
previous non-academic use of this technique.)
* Miscellaneous:
** Write ovn-nbctl utility.
The idea here is that we need a utility to act on the OVN_Northbound
database in a way similar to a CMS, so that we can do some testing
without an actual CMS in the picture.
No details yet.
** Init scripts for ovn-controller (on HVs), ovn-northd, OVN DB server.
** Distribution packaging.
* Not yet scoped:
** Neutron plugin.
This is being developed on OpenStack's development infrastructure
to sit alongside most of the other Neutron plugins.
http://git.openstack.org/cgit/stackforge/networking-ovn
http://git.openstack.org/cgit/stackforge/networking-ovn/tree/doc/source/todo.rst
** Gateways.