-*- outline -*-
* L3 support
** New OVN logical actions
*** icmp4 { action... }
Generates an ICMPv4 packet based on the current IPv4 packet and
processes it according to each nested action (and then pops back to
processing the original IPv4 packet). The intended use case is for
generating "time exceeded" and "destination unreachable" errors.
ovn-sb.xml includes a tentative specification for this action.
Tentatively, the icmp4 action sets a default icmp_type and icmp_code
and lets the nested actions override it. This means that we'd have to
make icmp_type and icmp_code writable. Because changing icmp_type and
icmp_code can change the interpretation of the rest of the data in the
ICMP packet, we would want to think this through carefully. If it
seems like a bad idea then we could instead make the type and code a
parameter to the action: icmp4(type, code) { action... }
It is worth considering what should be considered the ingress port for
the ICMPv4 packet. It's quite likely that the ICMPv4 packet is going
to go back out the ingress port. Maybe the icmp4 action, therefore,
should clear the inport, so that output to the original inport won't
be discarded.
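As a rough packet-level illustration of what "generates an ICMPv4 packet
based on the current IPv4 packet" involves, the sketch below builds an
ICMPv4 error body in the RFC 792 layout: an 8-byte ICMP header followed by
the original IP header and the first 8 bytes of its payload. The function
names are hypothetical and this is not OVN code. The type and code are
passed as parameters, in the spirit of the icmp4(type, code) alternative
mentioned above.

```python
import struct

ICMP_DEST_UNREACH = 3     # ICMPv4 "destination unreachable"
ICMP_TIME_EXCEEDED = 11   # ICMPv4 "time exceeded"


def inet_checksum(data: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp4_error(orig_ip_packet: bytes, icmp_type: int, icmp_code: int = 0) -> bytes:
    """Hypothetical helper: return the ICMP message (without an outer IP header)."""
    ihl = (orig_ip_packet[0] & 0x0F) * 4      # original IP header length in bytes
    quoted = orig_ip_packet[:ihl + 8]         # IP header + first 8 payload bytes
    unchecked = struct.pack("!BBHI", icmp_type, icmp_code, 0, 0) + quoted
    csum = inet_checksum(unchecked)
    return struct.pack("!BBHI", icmp_type, icmp_code, csum, 0) + quoted
```

A receiver can validate the result by checking that inet_checksum() over
the whole message is zero.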
*** tcp_reset
Transforms the current TCP packet into a RST reply.
ovn-sb.xml includes a tentative specification for this action.
*** Other actions for IPv6.
IPv6 will probably need an action or actions for ND that is similar to
the "arp" action, and an action for generating
** IPv6
*** Drop invalid source IPv6 addresses
*** Don't forward non-routable addresses
*** ICMPv6 action
OVN should generate ICMPv6 messages through an action similar to the
icmp4 action.
*** Neighbor Discovery
**** ND Router Advertisements
The router ports should periodically send out ND Router Advertisements
and respond to Router Solicitations.
**** Learn MAC bindings on ND Solicitations
**** Properly set RSO flags
** Dynamic IP to MAC bindings
OVN has basic support for establishing IP to MAC bindings dynamically,
using ARP.
*** Rate limiting.
From casual observation, Linux appears to generate at most one ARP per
second per destination.
This might be supported by adding a new OVN logical action for
rate-limiting.
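To make the "at most one ARP per second per destination" behavior concrete,
here is a minimal sketch of a per-destination limiter. The class name and
the injectable clock are assumptions made for illustration. A real OVN
logical action would be expressed in logical flows, not in Python.

```python
import time


class PerDestinationRateLimiter:
    """Hypothetical limiter: allow one request per destination per interval."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_sent = {}   # destination IP -> time of last allowed request

    def allow(self, dest_ip):
        now = self.clock()
        last = self.last_sent.get(dest_ip)
        if last is not None and now - last < self.min_interval:
            return False      # too soon after the previous request
        self.last_sent[dest_ip] = now
        return True
```

Each destination is limited independently, so a burst of lookups for one
address does not delay resolution of another.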
*** Tracking queries
It's probably best to only record in the database responses to queries
actually issued by an L3 logical router, so somehow they have to be
tracked, probably by putting a tentative binding without a MAC address
into the database.
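The tentative-binding idea can be sketched as follows. This is an in-memory
stand-in with hypothetical names, not the actual database schema: issuing a
query records a binding with no MAC address, and a reply is accepted only
if such a placeholder exists.

```python
class MacBindingTable:
    """Hypothetical table: (logical_port, ip) -> MAC, or None while tentative."""

    def __init__(self):
        self.bindings = {}

    def query_issued(self, port, ip):
        # Record a tentative binding; keep an existing MAC if we already have one.
        self.bindings.setdefault((port, ip), None)

    def reply_received(self, port, ip, mac):
        key = (port, ip)
        if key not in self.bindings:
            return False          # unsolicited reply: do not record it
        self.bindings[key] = mac
        return True

    def lookup(self, port, ip):
        return self.bindings.get((port, ip))
```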
*** Renewal and expiration.
Something needs to make sure that bindings remain valid and expire
those that become stale.
One way to do this might be to add some support for time to the
database server itself.
*** Table size limiting.
The table of MAC bindings must not be allowed to grow unreasonably
large.
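Expiration and size limiting could be combined in one structure. The sketch
below is only one possible policy (a fixed time-to-live plus eviction of
the least recently renewed entry), and every name in it is hypothetical.
Where the timekeeping would live (ovsdb-server, ovn-northd, or
ovn-controller) is left open, as above.

```python
import time
from collections import OrderedDict


class BoundedBindingCache:
    """Hypothetical MAC binding cache with a TTL and a maximum size."""

    def __init__(self, max_entries, ttl, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self.clock = clock
        self.entries = OrderedDict()   # ip -> (mac, expires_at), oldest first

    def put(self, ip, mac):
        # Re-inserting moves the entry to the "newest" end and renews its TTL.
        self.entries.pop(ip, None)
        self.entries[ip] = (mac, self.clock() + self.ttl)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)   # evict least recently renewed

    def get(self, ip):
        item = self.entries.get(ip)
        if item is None:
            return None
        mac, expires_at = item
        if self.clock() >= expires_at:
            del self.entries[ip]               # stale: expire on access
            return None
        return mac
```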
** MTU handling (fragmentation on output)
* ovn-controller
** ovn-controller parameters and configuration.
*** SSL configuration.
Can probably get this from Open_vSwitch database.
** Security
*** Limiting the impact of a compromised chassis.
Every instance of ovn-controller has the same full access to the central
OVN_Southbound database. This means that a compromised chassis can
interfere with the normal operation of the rest of the deployment. Some
specific examples include writing to the logical flow table to alter
traffic handling or updating the port binding table to claim ports that are
actually present on a different chassis. In practice, the compromised host
would be fighting against ovn-northd and other instances of ovn-controller
that would be trying to restore the correct state. The impact could include
at least temporarily redirecting traffic (so the compromised host could
receive traffic that it shouldn't) and potentially a more general denial of
service.
There are different potential improvements to this area. The first would be
to add some sort of ACL scheme to ovsdb-server. A proposal for this should
first include an ACL scheme for ovn-controller. An example policy would
be to make Logical_Flow read-only. Table-level control is needed, but is
not enough. For example, ovn-controller must be able to update the Chassis
and Encap tables, but should only be able to modify the rows associated with
that chassis and no others.
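A minimal sketch of the policy described above, combining table-level and
row-level checks. The function and the rule sets are hypothetical and
illustrate only the decision logic; they do not correspond to an existing
ovsdb-server interface.

```python
# Assumed example policy, taken from the text above.
READ_ONLY_TABLES = {"Logical_Flow"}          # ovn-controller may never write
OWN_ROWS_ONLY_TABLES = {"Chassis", "Encap"}  # may write only its own rows


def may_write(client_chassis, table, row_owner_chassis):
    """Return True if the given chassis may modify the given row."""
    if table in READ_ONLY_TABLES:
        return False
    if table in OWN_ROWS_ONLY_TABLES:
        return row_owner_chassis == client_chassis
    # Other tables (e.g. Port_Binding) are not covered by this sketch.
    return True
```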
A more complex example is the Port_Binding table. Currently, ovn-controller
is the source of truth of where a port is located. There seems to be no
policy that can prevent malicious behavior of a compromised host with this
table.
An alternative scheme for port bindings would be to provide an optional mode
where an external entity controls port bindings and make them read-only to
ovn-controller. This is actually how OpenStack works today, for example.
The part of OpenStack that manages VMs (Nova) tells the networking component
(Neutron) where a port will be located, as opposed to the networking
component discovering it.
* ovsdb-server
ovsdb-server should have adequate features for OVN but it probably
needs work for scale and possibly for availability as deployments
grow. Here are some thoughts.
Andy Zhou is looking at these issues.
*** Reducing amount of data sent to clients.
Currently, whenever a row monitored by a client changes,
ovsdb-server sends the client every monitored column in the row,
even if only one column changes. It might be valuable to send
only the columns that changed.
Also, whenever a column changes, ovsdb-server sends the entire
contents of the column. It might be valuable, for columns that
are sets or maps, to send only added or removed values or
key-value pairs.
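The two reductions above amount to computing differences. The sketch below
uses hypothetical helper names and plain Python dicts and sets in place of
the OVSDB wire format: it finds which columns of a row changed, and for a
set-valued column, which members were added or removed.

```python
_MISSING = object()


def changed_columns(old_row, new_row):
    """Return only the columns whose value differs between the two rows."""
    return {col: val for col, val in new_row.items()
            if old_row.get(col, _MISSING) != val}


def set_delta(old, new):
    """For a set-valued column, return the members added and removed."""
    return {"added": new - old, "removed": old - new}
```

For example, if a row's "up" column flips and one tag is swapped for
another, changed_columns() omits the unchanged "name" column and
set_delta() reports one added and one removed tag.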
Currently, clients monitor the entire contents of a table. It
might make sense to allow clients to monitor only rows that
satisfy specific criteria, e.g. to allow an ovn-controller to
receive only Logical_Flow rows for logical networks on its hypervisor.
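Row-level monitoring could be thought of as a server-side filter. In this
sketch, rows are plain dicts and the function name is hypothetical. It
keeps only Logical_Flow rows whose logical_datapath is one the hypervisor
cares about.

```python
def rows_for_client(rows, local_datapaths):
    """Return the rows whose logical datapath is relevant to this client."""
    return [row for row in rows if row["logical_datapath"] in local_datapaths]
```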
*** Reducing redundant data and code within ovsdb-server.
Currently, ovsdb-server separately composes database update
information to send to each of its clients. This is fine for a
small number of clients, but it wastes time and memory when
hundreds of clients all want the same updates (as will be the
case in OVN).
(This is somewhat opposed to the idea of letting a client monitor
only some rows in a table, since that would increase the diversity
among clients.)
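One way to picture the saving: group clients by what they monitor and
compose each update once per group instead of once per client. The sketch
below assumes clients are grouped by the tuple of columns they monitor and
uses JSON text as a stand-in for the real update format. All names are
hypothetical.

```python
import json


def broadcast(row_update, clients_by_columns, send):
    """Compose one payload per monitor group and send it to every member.

    clients_by_columns maps a tuple of monitored column names to the list
    of clients sharing that monitor.  Returns the number of payloads composed.
    """
    encodings = 0
    for columns, clients in clients_by_columns.items():
        visible = {c: row_update[c] for c in columns if c in row_update}
        payload = json.dumps(visible, sort_keys=True)   # composed once
        encodings += 1
        for client in clients:
            send(client, payload)                       # reused for each client
    return encodings
```

The more clients share identical monitors, the larger the saving, which is
why per-client row filtering works against this optimization.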
*** Multithreading.
If it turns out that other changes don't let ovsdb-server scale
adequately, we can multithread ovsdb-server. Initially one might
only break protocol handling into separate threads, leaving the
actual database work serialized through a lock.
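The initial split might look like the following sketch, written in Python
only for brevity (ovsdb-server itself is written in C) and with
hypothetical names. Each client gets its own thread for protocol handling,
while every database operation takes a single lock.

```python
import threading


class Database:
    """Toy database whose operations are serialized through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def transact(self, key, value):
        with self._lock:              # only the database work is serialized
            self._rows[key] = value

    def row_count(self):
        with self._lock:
            return len(self._rows)


def handle_client(db, client_id, requests):
    for key, value in requests:
        # Protocol parsing and reply encoding would run here, outside the lock.
        db.transact((client_id, key), value)


def run(n_clients, n_requests):
    db = Database()
    threads = [
        threading.Thread(target=handle_client,
                         args=(db, i, [(j, j) for j in range(n_requests)]))
        for i in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db
```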
** Increasing availability.
Database availability might become an issue. The OVN system
shouldn't grind to a halt if the database becomes unavailable, but
it would become impossible to bring VIFs up or down, etc.
My current thought on how to increase availability is to add
clustering to ovsdb-server, probably via the Raft consensus
algorithm. As an experiment, I wrote an implementation of Raft
for Open vSwitch that you can clone from:
https://github.com/blp/ovs-reviews.git raft
** Reducing startup time.
As-is, if ovsdb-server restarts, every client will fetch a fresh
copy of the part of the database that it cares about. With
hundreds of clients, this could cause heavy CPU load on
ovsdb-server and use excessive network bandwidth. It would be
better to allow incremental updates even across connection loss.
One way might be to use "Difference Digests" as described in
Epstein et al., "What's the Difference? Efficient Set
Reconciliation Without Prior Context". (I'm not yet aware of
previous non-academic use of this technique.)
** Support multiple tunnel encapsulations in Chassis.
So far, both ovn-controller and ovn-controller-vtep only allow
chassis to have one tunnel encapsulation entry. We should extend
the implementation to support multiple tunnel encapsulations.
** Update learned MAC addresses from VTEP to OVN
The VTEP gateway stores all MAC addresses learned from its
physical interfaces in the 'Ucast_Macs_Local' and the
'Mcast_Macs_Local' tables. ovn-controller-vtep should be
able to update that information back to ovn-sb database,
so that other chassis know where to send packets destined
to the extended external network instead of broadcasting.
** Translate ovn-sb Multicast_Group table into VTEP config
The ovn-controller-vtep daemon should be able to translate
the Multicast_Group table entry in ovn-sb database into
Mcast_Macs_Remote table configuration in VTEP database.
* Consider the use of BFD as tunnel monitor.
The use of BFD for hypervisor-to-hypervisor tunnels is probably not worth it,
since there's no alternative to switch to if a tunnel goes down. It could
make sense at a slow rate if someone does OVN monitoring system integration,
but not otherwise.
When OVN gets to supporting HA for gateways (see ovn/OVN-GW-HA.md), BFD is
likely needed as a part of that solution.
There's more commentary in this ML post:
http://openvswitch.org/pipermail/dev/2015-November/062385.html
* ACL
** Support FTP ALGs.
** Support reject action.
** Support log option.
** We currently use ct_label[0] to mark connections that have been blocked
by an ACL. It would be nice in the logical flow syntax to have a
symbolic name like "ct_label.blocked" to make things more clear, as
well as to change the implementation of "blocked" in the future
if needed.
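A symbolic name could be implemented as a simple alias table consulted
when logical flow expressions are parsed. The sketch below is a textual
substitution with hypothetical names, not how the OVN expression parser
actually defines symbols. It shows why the indirection helps: changing
which bit means "blocked" would touch only the table.

```python
# Hypothetical alias table: symbolic name -> (field, bit index).
SYMBOLS = {"ct_label.blocked": ("ct_label", 0)}


def expand_symbols(expr):
    """Replace each symbolic name in a match expression with field[bit]."""
    for name, (field, bit) in SYMBOLS.items():
        expr = expr.replace(name, "%s[%d]" % (field, bit))
    return expr
```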