From 98dc8dee9922cf931494d5baa799f4f19a713a93 Mon Sep 17 00:00:00 2001 From: Ben Pfaff Date: Thu, 19 Oct 2017 11:28:35 -0700 Subject: Documentation: Add Faucet tutorial. This is for a talk at the Faucet conference on Oct. 19: http://conference.faucet.nz/schedule/ Signed-off-by: Ben Pfaff --- Documentation/automake.mk | 1 + Documentation/tutorials/faucet.rst | 1461 ++++++++++++++++++++++++++++++++++++ Documentation/tutorials/index.rst | 1 + 3 files changed, 1463 insertions(+) create mode 100644 Documentation/tutorials/faucet.rst (limited to 'Documentation') diff --git a/Documentation/automake.mk b/Documentation/automake.mk index 6f38912f2..da482b419 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -23,6 +23,7 @@ DOC_SOURCE = \ Documentation/intro/install/windows.rst \ Documentation/intro/install/xenserver.rst \ Documentation/tutorials/index.rst \ + Documentation/tutorials/faucet.rst \ Documentation/tutorials/ovs-advanced.rst \ Documentation/tutorials/ovn-openstack.rst \ Documentation/tutorials/ovn-sandbox.rst \ diff --git a/Documentation/tutorials/faucet.rst b/Documentation/tutorials/faucet.rst new file mode 100644 index 000000000..dbdab95cc --- /dev/null +++ b/Documentation/tutorials/faucet.rst @@ -0,0 +1,1461 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +=================== +OVS Faucet Tutorial +=================== + +This tutorial demonstrates how Open vSwitch works with a controller, using the +Faucet controller as a simple way to get started. It was tested with the +"master" branch of Open vSwitch and version 1.6.7 of Faucet in October 2017. +It does not use advanced or recently added features in OVS or Faucet, so other +versions of both pieces of software are likely to work equally well. + +The goal of the tutorial is to demonstrate Open vSwitch and Faucet in an +end-to-end way, that is, to show how it works from the Faucet controller +configuration at the top, through the OpenFlow flow table, to the datapath +processing. Along the way, in addition to helping to understand the +architecture at each level, we discuss performance and troubleshooting issues. +We hope that this demonstration makes it easier for users and potential users +to understand how Open vSwitch works and how to debug and troubleshoot it. + +We provide enough details in the tutorial that you should be able to fully +follow along by following the instructions. + +Setting Up OVS +-------------- + +This section explains how to set up Open vSwitch for the purpose of using it +with Faucet for the tutorial. + +You might already have Open vSwitch installed on one or more computers or VMs, +perhaps set up to control a set of VMs or a physical network. This is +admirable, but we will be using Open vSwitch in a different way to set up a +simulation environment called the OVS "sandbox". The sandbox does not use +virtual machines or containers, which makes it more limited, but on the other +hand it is (in this writer's opinion) easier to set up. + +There are two ways to start a sandbox: one that uses the Open vSwitch that is +already installed on a system, and another that uses a copy of Open vSwitch +that has been built but not yet installed. The latter is more often used and +thus better tested, but both should work. The instructions below explain both +approaches: + +1. Get a copy of the Open vSwitch source repository using Git, then ``cd`` into + the new directory:: + + $ git clone https://github.com/openvswitch/ovs.git + $ cd ovs + + The default checkout is the master branch. You can check out a tag + (such as v2.8.0) or a branch (such as origin/branch-2.8), if you + prefer. + +2. If you do not already have an installed copy of Open vSwitch on your system, + or if you do not want to use it for the sandbox (the sandbox will not + disturb the functionality of any existing switches), then proceed to step 3. + If you do have an installed copy and you want to use it for the sandbox, try + to start the sandbox by running:: + + $ tutorial/ovs-sandbox + + If it is successful, you will find yourself in a subshell environment, which + is the sandbox (you can exit with ``exit`` or Control+D). If so, you're + finished and do not need to complete the rest of the steps. If it fails, + you can proceed to step 3 to build Open vSwitch anyway. + +3. Before you build, you might want to check that your system meets the build + requirements. Read :doc:`/intro/install/general` to find out. For this + tutorial, there is no need to compile the Linux kernel module, or to use any + of the optional libraries such as OpenSSL, DPDK, or libcap-ng. + +4. Configure and build Open vSwitch:: + + $ ./boot.sh + $ ./configure + $ make -j4 + +5. Try out the sandbox by running:: + + $ make sandbox + + You can exit the sandbox with ``exit`` or Control+D. + +Setting up Faucet +----------------- + +This section explains how to get a copy of Faucet and set it up +appropriately for the tutorial. There are many other ways to install +Faucet, but this simple approach worked well for me. It has the +advantage that it does not require modifying any system-level files or +directories on your machine. It does, on the other hand, require +Docker, so make sure you have it installed and working. + +It will be a little easier to go through the rest of the tutorial if +you run these instructions in a separate terminal from the one that +you're using for Open vSwitch, because it's often necessary to switch +between one and the other. + +1. Get a copy of the Faucet source repository using Git, then ``cd`` + into the new directory:: + + $ git clone https://github.com/faucetsdn/faucet.git + $ cd faucet + + At this point I checked out the latest tag:: + + $ git checkout v1.6.7 + +2. Build a docker container image:: + + $ docker build -t faucet/faucet . + + This will take a few minutes. + +3. Create an installation directory under the ``faucet`` directory for + the docker image to use:: + + $ mkdir inst + + The Faucet configuration will go in ``inst/faucet.yaml`` and its + main log will appear in ``inst/faucet.log``. (The official Faucet + installation instructions call to put these in ``/etc/ryu/faucet`` + and ``/var/log/ryu/faucet``, respectively, but we avoid modifying + these system directories.) + +4. Create a container and start Faucet:: + + $ docker run -d --name faucet -v `pwd`/inst/:/etc/ryu/faucet/ -v `pwd`/inst/:/var/log/ryu/faucet/ -p 6653:6653 faucet/faucet + +5. Look in ``inst/faucet.log`` to verify that Faucet started. It will + probably start with an exception and traceback because we have not + yet created ``inst/faucet.yaml``. + +6. Later on, to make a new or updated Faucet configuration take + effect quickly, you can run:: + + $ docker exec faucet pkill -HUP -f faucet.faucet + + Another way is to stop and start the Faucet container:: + + $ docker restart faucet + + You can also stop and delete the container; after this, to start it + again, you need to rerun the ``docker run`` command:: + + $ docker stop faucet + $ docker rm faucet + +Overview +-------- + +Now that Open vSwitch and Faucet are ready, here's an overview of what +we're going to do for the remainder of the tutorial: + +1. Switching: Set up an L2 network with Faucet. + +2. Routing: Route between multiple L3 networks with Faucet. + +3. ACLs: Add and modify access control rules. + +At each step, we will take a look at how the features in question work +from Faucet at the top to the data plane layer at the bottom. From +the highest to lowest level, these layers and the software components +that connect them are: + +* Faucet, which as the top level in the system is the authoritative + source of the network configuration. + + Faucet connects to a variety of monitoring and performance tools, + but we won't use them in this tutorial. Our main insights into the + system will be through ``faucet.yaml`` for configuration and + ``faucet.log`` to observe state, such as MAC learning and ARP + resolution, and to tell when we've screwed up configuration syntax + or semantics. + +* The OpenFlow subsystem in Open vSwitch. OpenFlow is the protocol, + standardized by the Open Networking Foundation, that controllers + like Faucet use to control how Open vSwitch and other switches treat + packets in the network. + + We will use ``ovs-ofctl``, a utility that comes with Open vSwitch, + to observe and occasionally modify Open vSwitch's OpenFlow behavior. + We will also use ``ovs-appctl``, a utility for communicating with + ``ovs-vswitchd`` and other Open vSwitch daemons, to ask "what-if?" + type questions. + + In addition, the OVS sandbox by default raises the Open vSwitch + logging level for OpenFlow high enough that we can learn a great + deal about OpenFlow behavior simply by reading its log file. + +* Open vSwitch datapath. This is essentially a cache designed to + accelerate packet processing. Open vSwitch includes a few different + datapaths, such as one based on the Linux kernel and a + userspace-only datapath (sometimes called the "DPDK" datapath). The + OVS sandbox uses the latter, but the principles behind it apply + equally well to other datapaths. + +At each step, we discuss how the design of each layer influences +performance. We demonstrate how Open vSwitch features can be used to +debug, troubleshoot, and understand the system as a whole. + +Switching +--------- + +Layer-2 (L2) switching is the basis of modern networking. It's also +very simple and a good place to start, so let's set up a switch with +some VLANs in Faucet and see how it works at each layer. Begin by +putting the following into ``inst/faucet.yaml``:: + + dps: + switch-1: + dp_id: 0x1 + timeout: 3600 + arp_neighbor_timeout: 3600 + interfaces: + 1: + native_vlan: 100 + 2: + native_vlan: 100 + 3: + native_vlan: 100 + 4: + native_vlan: 200 + 5: + native_vlan: 200 + vlans: + 100: + 200: + +This configuration file defines a single switch ("datapath" or "dp") +named ``switch-1``. The switch has five ports, numbered 1 through 5. +Ports 1, 2, and 3 are in VLAN 100, and ports 4 and 5 are in VLAN 2. +Faucet can identify the switch from its datapath ID, which is defined +to be 0x1. + +.. note:: + + This also sets high MAC learning and ARP timeouts. The defaults are + 5 minutes and about 8 minutes, which are fine in production but + sometimes too fast for manual experimentation. (Don't use a timeout + bigger than about 65000 seconds because it will crash Faucet.) + +Now restart Faucet so that the configuration takes effect, e.g.:: + + $ docker restart faucet + +Assuming that the configuration update is successful, you should now +see a new line at the end of ``inst/faucet.log``:: + + Oct 14 22:36:42 faucet INFO Add new datapath DPID 1 (0x1) + +Faucet is now waiting for a switch with datapath ID 0x1 to connect to +it over OpenFlow, so our next step is to create a switch with OVS and +make it connect to Faucet. To do that, switch to the terminal where +you checked out OVS and start a sandbox with ``make sandbox`` or +`tutorial/ovs-sandbox`` (as explained earlier under `Setting Up +OVS`_). You should see something like this toward the end of the +output:: + + ---------------------------------------------------------------------- + You are running in a dummy Open vSwitch environment. You can use + ovs-vsctl, ovs-ofctl, ovs-appctl, and other tools to work with the + dummy switch. + + Log files, pidfiles, and the configuration database are in the + "sandbox" subdirectory. + + Exit the shell to kill the running daemons. + blp@sigabrt:~/nicira/ovs/tutorial(0)$ + +Inside the sandbox, create a switch ("bridge") named ``br0``, set its +datapath ID to 0x1, add simulated ports to it named ``p1`` through +``p5``, and tell it to connect to the Faucet controller. To make it +easier to understand, we request for port ``p1`` to be assigned +OpenFlow port 1, ``p2`` port 2, and so on. As a final touch, +configure the controller to be "out-of-band" (this is mainly to avoid +some annoying messages in the ``ovs-vswitchd`` logs; for more +information, run ``man ovs-vswitchd.conf.db`` and search for +``connection_mode``):: + + $ ovs-vsctl add-br br0 \ + -- set bridge br0 other-config:datapath-id=0000000000000001 \ + -- add-port br0 p1 -- set interface p1 ofport_request=1 \ + -- add-port br0 p2 -- set interface p2 ofport_request=2 \ + -- add-port br0 p3 -- set interface p3 ofport_request=3 \ + -- add-port br0 p4 -- set interface p4 ofport_request=4 \ + -- add-port br0 p5 -- set interface p5 ofport_request=5 \ + -- set-controller br0 tcp:127.0.0.1:6653 \ + -- set controller br0 connection-mode=out-of-band + +.. note:: + + You don't have to run all of these as a single ``ovs-vsctl`` + invocation. It is a little more efficient, though, and since it + updates the OVS configuration in a single database transaction it + means that, for example, there is never a time when the controller + is set but it has not yet been configured as out-of-band. + +Now, if you look at ``inst/faucet.log`` again, you should see that +Faucet recognized and configured the new switch and its ports:: + + Oct 14 22:50:08 faucet.valve INFO DPID 1 (0x1) Cold start configuring DP + Oct 14 22:50:08 faucet.valve INFO DPID 1 (0x1) Configuring VLAN 100 vid:100 ports:Port 1,Port 2,Port 3 + Oct 14 22:50:08 faucet.valve INFO DPID 1 (0x1) Configuring VLAN 200 vid:200 ports:Port 4,Port 5 + Oct 14 22:50:08 faucet.valve INFO DPID 1 (0x1) Port Port 1 up, configuring + Oct 14 22:50:08 faucet.valve INFO DPID 1 (0x1) Port Port 2 up, configuring + Oct 14 22:50:08 faucet.valve INFO DPID 1 (0x1) Port Port 3 up, configuring + Oct 14 22:50:08 faucet.valve INFO DPID 1 (0x1) Port Port 4 up, configuring + Oct 14 22:50:08 faucet.valve INFO DPID 1 (0x1) Port Port 5 up, configuring + +Over on the Open vSwitch side, you can see a lot of related activity +if you take a look in ``sandbox/ovs-vswitchd.log``. For example, here +is the basic OpenFlow session setup and Faucet's probe of the switch's +ports and capabilities:: + + rconn|INFO|br0<->tcp:127.0.0.1:6653: connecting... + vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_HELLO (OF1.4) (xid=0x1): + version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05 + vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_HELLO (OF1.3) (xid=0x2f24810a): + version bitmap: 0x01, 0x02, 0x03, 0x04 + vconn|DBG|tcp:127.0.0.1:6653: negotiated OpenFlow version 0x04 (we support version 0x05 and earlier, peer supports version 0x04 and earlier) + rconn|INFO|br0<->tcp:127.0.0.1:6653: connected + vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_ECHO_REQUEST (OF1.3) (xid=0x2f24810b): 0 bytes of payload + vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_ECHO_REPLY (OF1.3) (xid=0x2f24810b): 0 bytes of payload + vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FEATURES_REQUEST (OF1.3) (xid=0x2f24810c): + vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_FEATURES_REPLY (OF1.3) (xid=0x2f24810c): dpid:0000000000000001 + n_tables:254, n_buffers:0 + capabilities: FLOW_STATS TABLE_STATS PORT_STATS GROUP_STATS QUEUE_STATS + vconn|DBG|tcp:127.0.0.1:6653: received: OFPST_PORT_DESC request (OF1.3) (xid=0x2f24810d): port=ANY + vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPST_PORT_DESC reply (OF1.3) (xid=0x2f24810d): + 1(p1): addr:aa:55:aa:55:00:14 + config: PORT_DOWN + state: LINK_DOWN + speed: 0 Mbps now, 0 Mbps max + 2(p2): addr:aa:55:aa:55:00:15 + config: PORT_DOWN + state: LINK_DOWN + speed: 0 Mbps now, 0 Mbps max + 3(p3): addr:aa:55:aa:55:00:16 + config: PORT_DOWN + state: LINK_DOWN + speed: 0 Mbps now, 0 Mbps max + 4(p4): addr:aa:55:aa:55:00:17 + config: PORT_DOWN + state: LINK_DOWN + speed: 0 Mbps now, 0 Mbps max + 5(p5): addr:aa:55:aa:55:00:18 + config: PORT_DOWN + state: LINK_DOWN + speed: 0 Mbps now, 0 Mbps max + LOCAL(br0): addr:c6:64:ff:59:48:41 + config: PORT_DOWN + state: LINK_DOWN + speed: 0 Mbps now, 0 Mbps max + +After that, you can see Faucet delete all existing flows and then +start adding new ones:: + + vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f24810e): DEL table:255 priority=0 actions=drop + vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_BARRIER_REQUEST (OF1.3) (xid=0x2f24810f): + vconn|DBG|tcp:127.0.0.1:6653: sent (Success): OFPT_BARRIER_REPLY (OF1.3) (xid=0x2f24810f): + vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f248110): ADD priority=0 cookie:0x5adc15c0 out_port:0 actions=drop + vconn|DBG|tcp:127.0.0.1:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x2f248111): ADD table:1 priority=0 cookie:0x5adc15c0 out_port:0 actions=drop + ... + +OpenFlow Layer +~~~~~~~~~~~~~~ + +Let's take a look at the OpenFlow tables that Faucet set up. Before +we do that, it's helpful to take a look at ``docs/architecture.rst`` +in the Faucet documentation to learn how Faucet structures its flow +tables. In summary, this document says: + +Table 0 + Port-based ACLs + +Table 1 + Ingress VLAN processing + +Table 2 + VLAN-based ACLs + +Table 3 + Ingress L2 processing, MAC learning + +Table 4 + L3 forwarding for IPv4 + +Table 5 + L3 forwarding for IPv6 + +Table 6 + Virtual IP processing, e.g. for router IP addresses implemented by Faucet + +Table 7 + Egress L2 processing + +Table 8 + Flooding + +With that in mind, let's dump the flow tables. The simplest way is to +just run plain ``ovs-ofctl dump-flows``:: + + $ ovs-ofctl dump-flows br0 + +If you run that bare command, it produces a lot of extra junk that +makes the output harder to read, like statistics and "cookie" values +that are all the same. In addition, for historical reasons +``ovs-ofctl`` always defaults to using OpenFlow 1.0 even though Faucet +and most modern controllers use OpenFlow 1.3, so it's best to force it +to use OpenFlow 1.3. We could throw in a lot of options to fix these, +but we'll want to do this more than once, so let's start by defining a +shell function for ourselves:: + + $ dump-flows () { + ovs-ofctl -OOpenFlow13 --names --no-stat dump-flows "$@" \ + | sed 's/cookie=0x5adc15c0, //' + } + +Let's also define ``save-flows`` and ``diff-flows`` functions for +later use:: + + $ save-flows () { + ovs-ofctl -OOpenFlow13 --no-names --sort dump-flows "$@" + } + $ diff-flows () { + ovs-ofctl -OOpenFlow13 diff-flows "$@" | sed 's/cookie=0x5adc15c0 //' + } + +Now let's take a look at the flows we've got and what they mean, like +this:: + + $ dump-flows br0 + +First, table 0 has a flow that just jumps to table 1 for each +configured port, and drops other unrecognized packets. Presumably it +will do more if we configured port-based ACLs:: + + priority=9099,in_port=p1 actions=goto_table:1 + priority=9099,in_port=p2 actions=goto_table:1 + priority=9099,in_port=p3 actions=goto_table:1 + priority=9099,in_port=p4 actions=goto_table:1 + priority=9099,in_port=p5 actions=goto_table:1 + priority=0 actions=drop + +Table 1, for ingress VLAN processing, has a bunch of flows that drop +inappropriate packets, like those that claim to be from a broadcast +source address (why not from all multicast source addresses, +though?):: + + table=1, priority=9099,dl_src=ff:ff:ff:ff:ff:ff actions=drop + table=1, priority=9001,dl_src=0e:00:00:00:00:01 actions=drop + table=1, priority=9099,dl_dst=01:80:c2:00:00:00 actions=drop + table=1, priority=9099,dl_dst=01:00:0c:cc:cc:cd actions=drop + table=1, priority=9099,dl_type=0x88cc actions=drop + +Table 1 also has some more interesting flows that recognize packets +without a VLAN header on each of our ports +(``vlan_tci=0x0000/0x1fff``), push on the VLAN configured for the +port, and proceed to table 3. Presumably these skip table 2 because +we did not configure any VLAN-based ACLs. There is also a fallback +flow to drop other packets, which in practice means that if any +received packet already has a VLAN header then it will be dropped:: + + table=1, priority=9000,in_port=p1,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3 + table=1, priority=9000,in_port=p2,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3 + table=1, priority=9000,in_port=p3,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4196->vlan_vid,goto_table:3 + table=1, priority=9000,in_port=p4,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4296->vlan_vid,goto_table:3 + table=1, priority=9000,in_port=p5,vlan_tci=0x0000/0x1fff actions=push_vlan:0x8100,set_field:4296->vlan_vid,goto_table:3 + table=1, priority=0 actions=drop + +.. note:: + + The syntax ``set_field:4196->vlan_vid`` is curious and somewhat + misleading. OpenFlow 1.3 defines the ``vlan_vid`` field as a 13-bit + field where bit 12 is set to 1 if the VLAN header is present. Thus, + since 4196 is 0x1064, this action sets VLAN value 0x64, which in + decimal is 100. + +Table 2 isn't used because there are no VLAN-based ACLs. It just has +a drop flow:: + + table=2, priority=0 actions=drop + +Table 3 is used for MAC learning but the controller hasn't learned any +MAC yet. We'll come back here later:: + + table=3, priority=0 actions=drop + table=3, priority=9000 actions=CONTROLLER:96,goto_table:7 + +Tables 4, 5, and 6 aren't used because we haven't configured any +routing:: + + table=4, priority=0 actions=drop + table=5, priority=0 actions=drop + table=6, priority=0 actions=drop + +Table 7 is used to direct packets to learned MACs but Faucet hasn't +learned any MACs yet, so it just sends all the packets along to table +8:: + + table=7, priority=0 actions=drop + table=7, priority=9000 actions=goto_table:8 + +Table 8 implements flooding, broadcast, and multicast. The flows for +broadcast and flood are easy to understand: if the packet came in on a +given port and needs to be flooded or broadcast, output it to all the +other ports in the same VLAN:: + + table=8, priority=9008,in_port=p1,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p2,output:p3 + table=8, priority=9008,in_port=p2,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p3 + table=8, priority=9008,in_port=p3,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p2 + table=8, priority=9008,in_port=p4,dl_vlan=200,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p5 + table=8, priority=9008,in_port=p5,dl_vlan=200,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p4 + table=8, priority=9000,in_port=p1,dl_vlan=100 actions=pop_vlan,output:p2,output:p3 + table=8, priority=9000,in_port=p2,dl_vlan=100 actions=pop_vlan,output:p1,output:p3 + table=8, priority=9000,in_port=p3,dl_vlan=100 actions=pop_vlan,output:p1,output:p2 + table=8, priority=9000,in_port=p4,dl_vlan=200 actions=pop_vlan,output:p5 + table=8, priority=9000,in_port=p5,dl_vlan=200 actions=pop_vlan,output:p4 + +.. note:: + + These flows could apparently be simpler because OpenFlow says that + ``output:`` is ignored if ```` is the input port. That + means that the first three flows above could apparently be collapsed + into just:: + + table=8, priority=9008,dl_vlan=100,dl_dst=ff:ff:ff:ff:ff:ff actions=pop_vlan,output:p1,output:p2,output:p3 + + There might be some reason why this won't work or isn't practical, + but that isn't obvious from looking at the flow table. + +There are also some flows for handling some standard forms of +multicast, and a fallback drop flow:: + + table=8, priority=9006,in_port=p1,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p2,output:p3 + table=8, priority=9006,in_port=p2,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p1,output:p3 + table=8, priority=9006,in_port=p3,dl_vlan=100,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p1,output:p2 + table=8, priority=9006,in_port=p4,dl_vlan=200,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p5 + table=8, priority=9006,in_port=p5,dl_vlan=200,dl_dst=33:33:00:00:00:00/ff:ff:00:00:00:00 actions=pop_vlan,output:p4 + table=8, priority=9002,in_port=p1,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p2,output:p3 + table=8, priority=9002,in_port=p2,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p3 + table=8, priority=9002,in_port=p3,dl_vlan=100,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p2 + table=8, priority=9004,in_port=p1,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p2,output:p3 + table=8, priority=9004,in_port=p2,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p3 + table=8, priority=9004,in_port=p3,dl_vlan=100,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p1,output:p2 + table=8, priority=9002,in_port=p4,dl_vlan=200,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p5 + table=8, priority=9002,in_port=p5,dl_vlan=200,dl_dst=01:80:c2:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p4 + table=8, priority=9004,in_port=p4,dl_vlan=200,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p5 + table=8, priority=9004,in_port=p5,dl_vlan=200,dl_dst=01:00:5e:00:00:00/ff:ff:ff:00:00:00 actions=pop_vlan,output:p4 + table=8, priority=0 actions=drop + +Tracing +~~~~~~~ + +Let's go a level deeper. So far, everything we've done has been +fairly general. We can also look at something more specific: the path +that a particular packet would take through Open vSwitch. We can use +OVN ``ofproto/trace`` command to play "what-if?" games. This command +is one that we send directly to ``ovs-vswitchd``, using the +``ovs-appctl`` utility. + +.. note:: + + ``ovs-appctl`` is actually a very simple-minded JSON-RPC client, so you could + also use some other utility that speaks JSON-RPC, or access it from a program + as an API. + +The ``ovs-vswitchd``\(8) manpage has a lot of detail on how to use +``ofproto/trace``, but let's just start by building up from a simple +example. You can start with a command that just specifies the +datapath (e.g. ``br0``), an input port, and nothing else; unspecified +fields default to all-zeros. Let's look at the full output for this +trivial example:: + + $ ovs-appctl ofproto/trace br0 in_port=p1 + Flow: in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 + + bridge("br0") + ------------- + 0. in_port=1, priority 9099, cookie 0x5adc15c0 + goto_table:1 + 1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 + push_vlan:0x8100 + set_field:4196->vlan_vid + goto_table:3 + 3. priority 9000, cookie 0x5adc15c0 + CONTROLLER:96 + goto_table:7 + 7. priority 9000, cookie 0x5adc15c0 + goto_table:8 + 8. in_port=1,dl_vlan=100, priority 9000, cookie 0x5adc15c0 + pop_vlan + output:2 + output:3 + + Final flow: unchanged + Megaflow: recirc_id=0,eth,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 + Datapath actions: push_vlan(vid=100,pcp=0),pop_vlan,2,3 + This flow is handled by the userspace slow path because it: + - Sends "packet-in" messages to the OpenFlow controller. + +The first line of output, beginning with ``Flow:``, just repeats our +request in a more verbose form, including the L2 fields that were +zeroed. + +Each of the numbered items under ``bridge("br0")`` shows what would +happen to our hypothetical packet in the table with the given number. +For example, we see in table 1 that the packet matches a flow that +push on a VLAN header, set the VLAN ID to 100, and goes on to further +processing in table 3. In table 3, the packet gets sent to the +controller to allow MAC learning to take place, and then table 8 +floods the packet to the other ports in the same VLAN. + +Summary information follows the numbered tables. The packet hasn't +been changed (overall, even though a VLAN was pushed and then popped +back off) since ingress, hence ``Final flow: unchanged``. We'll look +at the ``Megaflow`` information later. The ``Datapath actions`` +summarize what would actually happen to such a packet. Finally, the +note at the end gives a hint that this flow would not perform well for +large volumes of traffic, because it has to be handled in the switch's +slow path since it sends OpenFlow messages to the controller. + +.. note:: + + This performance limitation is probably not problematic in this case + because it is only used for MAC learning, so that most packets won't + encounter it. However, the Open vSwitch 2.9 release (which is + upcoming as of this writing) will likely remove this performance + limitation anyway. + +Triggering MAC Learning +~~~~~~~~~~~~~~~~~~~~~~~ + +We just saw how a packet gets sent to the controller to trigger MAC +learning. Let's actually send the packet and see what happens. But +before we do that, let's save a copy of the current flow tables for +later comparison:: + + $ save-flows br0 > flows1 + +Now use ``ofproto/trace``, as before, with a few new twists: we +specify the source and destination Ethernet addresses and append the +``-generate`` option so that side effects like sending a packet to the +controller actually happen:: + + $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:11:11:00:00:00,dl_dst=00:22:22:00:00:00 -generate + +The output is almost identical to that before, so it is not repeated +here. But, take a look at ``inst/faucet.log`` now. It should now +include a line at the end that says that it learned about our MAC +00:11:11:00:00:00, like this:: + + Oct 15 01:16:23 faucet.valve INFO DPID 1 (0x1) learned 00:11:11:00:00:00 on Port 1 on VLAN 100 (1 hosts total) + +Now compare the flow tables that we saved to the current ones:: + + diff-flows flows1 br0 + +The result should look like this, showing new flows for the learned +MACs:: + + +table=3 priority=9098,in_port=p1,dl_vlan=100,dl_src=00:11:11:00:00:00 cookie=0x5adc15c0 hard_timeout=3601 actions=resubmit(,7) + +table=7 priority=9099,dl_vlan=100,dl_dst=00:11:11:00:00:00 cookie=0x5adc15c0 idle_timeout=3601 actions=strip_vlan,output:p1 + +To demonstrate the usefulness of the learned MAC, try tracing (with +side effects) a packet arriving on ``p2`` (or ``p3``) and destined to +the address learned on ``p1``, like this:: + + $ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate + +The first time you run this command, you will notice that it sends the +packet to the controller, to learn ``p2``'s 00:22:22:00:00:00 source +address:: + + bridge("br0") + ------------- + 0. in_port=2, priority 9099, cookie 0x5adc15c0 + goto_table:1 + 1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 + push_vlan:0x8100 + set_field:4196->vlan_vid + goto_table:3 + 3. priority 9000, cookie 0x5adc15c0 + CONTROLLER:96 + goto_table:7 + 7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0 + pop_vlan + output:1 + +If you check ``inst/faucet.log``, you can see that ``p2``'s MAC has +been learned too:: + + Oct 15 01:24:01 faucet.valve INFO DPID 1 (0x1) learned 00:22:22:00:00:00 on Port 2 on VLAN 100 (2 hosts total) + +Similarly for ``diff-flows``:: + + $ diff-flows flows1 br0 + +table=3 priority=9098,in_port=p1,dl_vlan=100,dl_src=00:11:11:00:00:00 cookie=0x5adc15c0 hard_timeout=3601 actions=resubmit(,7) + +table=3 priority=9098,in_port=p2,dl_vlan=100,dl_src=00:22:22:00:00:00 cookie=0x5adc15c0 hard_timeout=3604 actions=resubmit(,7) + +table=7 priority=9099,dl_vlan=100,dl_dst=00:11:11:00:00:00 cookie=0x5adc15c0 idle_timeout=3601 actions=strip_vlan,output:p1 + +table=7 priority=9099,dl_vlan=100,dl_dst=00:22:22:00:00:00 cookie=0x5adc15c0 idle_timeout=3604 actions=strip_vlan,output:p2 + +Then, if you re-run either of the ``ofproto/trace`` commands (with or +without ``-generate``), you can see that the packets go back and forth +without any further MAC learning, e.g.:: + + $ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate + Flow: in_port=2,vlan_tci=0x0000,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000 + + bridge("br0") + ------------- + 0. in_port=2, priority 9099, cookie 0x5adc15c0 + goto_table:1 + 1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 + push_vlan:0x8100 + set_field:4196->vlan_vid + goto_table:3 + 3. in_port=2,dl_vlan=100,dl_src=00:22:22:00:00:00, priority 9098, cookie 0x5adc15c0 + goto_table:7 + 7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0 + pop_vlan + output:1 + + Final flow: unchanged + Megaflow: recirc_id=0,eth,in_port=2,vlan_tci=0x0000/0x1fff,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000 + +Performance +~~~~~~~~~~~ + +We've already seen one factor that can be important for performance: +Open vSwitch forces any flow that sends a packet to an OpenFlow +controller into its "slow path", which means that processing packets +in the flow will be orders of magnitude slower than otherwise. This +distinction between "slow path" and "fast path" is the key to making +sure that Open vSwitch performs as fast as possible. + +In addition to sending packets to a controller, some other factors can +force a flow or a packet to take the slow path. As one example, all +CFM, BFD, LACP, STP, and LLDP processing takes place in the slow path, +in the cases where Open vSwitch processes these protocols itself +instead of delegating to controller-written flows. As a second +example, any flow that modifies ARP fields is processed in the slow +path. These are corner cases that are unlikely to cause performance +problems in practice because these protocols send packets at a +relatively slow rate, and users and controller authors do not normally +need to be concerned about them. + +To understand what cases users and controller authors should consider, +we need to talk about how Open vSwitch optimizes for performance. The +Open vSwitch code is divided into two major components which, as +already mentioned, are called the "slow path" and "fast path" (aka +"datapath"). The slow path is embedded in the ``ovs-vswitchd`` +userspace program. It is the part of the Open vSwitch packet +processing logic that understands OpenFlow. Its job is to take a +packet and run it through the OpenFlow tables to determine what should +happen to it. It outputs a list of actions in a form similar to +OpenFlow actions but simpler, called "ODP actions" or "datapath +actions". It then passes the ODP actions to the datapath, which +applies them to the packet. + +.. note:: + + Open vSwitch contains a single slow path and multiple fast paths. + The difference between using Open vSwitch with the Linux kernel + versus with DPDK is the datapath. + +If every packet passed through the slow path and the fast path in this +way, performance would be terrible. The key to getting high +performance from this architecture is caching. Open vSwitch includes +a multi-level cache. It works like this: + +1. A packet initially arrives at the datapath. Some datapaths (such + as DPDK and the in-tree version of the OVS kernel module) have a + first-level cache called the "microflow cache". The microflow + cache is the key to performance for relatively long-lived, high + packet rate flows. If the datapath has a microflow cache, then it + consults it and, if there is a cache hit, the datapath executes the + associated actions. Otherwise, it proceeds to step 2. + +2. The datapath consults its second-level cache, called the "megaflow + cache". The megaflow cache is the key to performance for shorter + or low packet rate flows. If there is a megaflow cache hit, the + datapath executes the associated actions. Otherwise, it proceeds + to step 3. + +3. The datapath passes the packet to the slow path, which runs it + through the OpenFlow table to yield ODP actions, a process that is + often called "flow translation". It then passes the packet back to + the datapath to execute the actions and to, if possible, install a + megaflow cache entry so that subsequent similar packets can be + handled directly by the fast path. (We already described above + most of the cases where a cache entry cannot be installed.) + +The megaflow cache is the key cache to consider for performance +tuning. Open vSwitch provides tools for understanding and optimizing +its behavior. The ``ofproto/trace`` command that we have already been +using is the most common tool for this use. Let's take another look +at the most recent ``ofproto/trace`` output:: + + $ ovs-appctl ofproto/trace br0 in_port=p2,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00 -generate + Flow: in_port=2,vlan_tci=0x0000,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000 + + bridge("br0") + ------------- + 0. in_port=2, priority 9099, cookie 0x5adc15c0 + goto_table:1 + 1. in_port=2,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 + push_vlan:0x8100 + set_field:4196->vlan_vid + goto_table:3 + 3. in_port=2,dl_vlan=100,dl_src=00:22:22:00:00:00, priority 9098, cookie 0x5adc15c0 + goto_table:7 + 7. dl_vlan=100,dl_dst=00:11:11:00:00:00, priority 9099, cookie 0x5adc15c0 + pop_vlan + output:1 + + Final flow: unchanged + Megaflow: recirc_id=0,eth,in_port=2,vlan_tci=0x0000/0x1fff,dl_src=00:22:22:00:00:00,dl_dst=00:11:11:00:00:00,dl_type=0x0000 + +This time, it's the last line that we're interested in. This line +shows the entry that Open vSwitch would insert into the megaflow cache +given the particular packet with the current flow tables. The +megaflow entry includes: + +* ``recirc_id``. This is an implementation detail that users don't + normally need to understand. + +* ``eth``. This just indicates that the cache entry matches only + Ethernet packets; Open vSwitch also supports other types of packets, + such as IP packets not encapsulated in Ethernet. + +* All of the fields matched by any of the flows that the packet + visited: + + ``in_port`` + In tables 0, 1, and 3. + + ``vlan_tci`` + In tables 1, 3, and 7 (``vlan_tci`` includes the VLAN ID and PCP + fields and``dl_vlan`` is just the VLAN ID). + + ``dl_src`` + In table 3 + + ``dl_dst`` + In table 7. + +* All of the fields matched by flows that had to be ruled out to + ensure that the ones that actually matched were the highest priority + matching rules. + +The last one is important. Notice how the megaflow matches on +``dl_type=0x0000``, even though none of the tables matched on +``dl_type`` (the Ethernet type). One reason is because of this flow +in OpenFlow table 1 (which shows up in ``dump-flows`` output):: + + table=1, priority=9099,dl_type=0x88cc actions=drop + +This flow has higher priority than the flow in table 1 that actually +matched. This means that, to put it in the megaflow cache, +``ovs-vswitchd`` has to add a match on ``dl_type`` to ensure that the +cache entry doesn't match LLDP packets (with Ethertype 0x88cc). + +.. note:: + + In fact, in some cases ``ovs-vswitchd`` matches on fields that + aren't strictly required according to this description. ``dl_type`` + is actually one of those, so deleting the LLDP flow probably would + not have any effect on the megaflow. But the principle here is + sound. + +So why does any of this matter? It's because, the more specific a +megaflow is, that is, the more fields or bits within fields that a +megaflow matches, the less valuable it is from a caching viewpoint. A +very specific megaflow might match on L2 and L3 addresses and L4 port +numbers. When that happens, only packets in one (half-)connection +match the megaflow. If that connection has only a few packets, as +many connections do, then the high cost of the slow path translation +is amortized over only a few packets, so the average cost of +forwarding those packets is high. On the other hand, if a megaflow +only matches a relatively small number of L2 and L3 packets, then the +cache entry can potentially be used by many individual connections, +and the average cost is low. + +For more information on how Open vSwitch constructs megaflows, +including about ways that it can make megaflow entries less specific +than one would infer from the discussion here, please refer to the +2015 NSDI paper, "The Design and Implementation of Open vSwitch", +which focuses on this algorithm. + +Routing +------- + +We've looked at how Faucet implements switching in OpenFlow, and how +Open vSwitch implements OpenFlow through its datapath architecture. +Now let's start over, adding L3 routing into the picture. + +It's remarkably easy to enable routing. We just change our ``vlans`` +section in ``inst/faucet.yaml`` to specify a router IP address for +each VLAN and define a router between them. The ``dps`` section is +unchanged:: + + dps: + switch-1: + dp_id: 0x1 + timeout: 3600 + arp_neighbor_timeout: 3600 + interfaces: + 1: + native_vlan: 100 + 2: + native_vlan: 100 + 3: + native_vlan: 100 + 4: + native_vlan: 200 + 5: + native_vlan: 200 + vlans: + 100: + faucet_vips: ["10.100.0.254/24"] + 200: + faucet_vips: ["10.200.0.254/24"] + routers: + router-1: + vlans: [100, 200] + +Then we restart Faucet:: + + $ docker restart faucet + +.. note:: + + One should be able to tell Faucet to re-read its configuration file + without restarting it. I sometimes saw anomalous behavior when I + did this, although I didn't characterize it well enough to make a + quality bug report. I found restarting the container to be + reliable. + +OpenFlow Layer +~~~~~~~~~~~~~~ + +Back in the OVS sandbox, let's see how the flow table has changed, with:: + + $ diff-flows flows1 br0 + +First, table 3 has new flows to direct ARP packets to table 6 (the +virtual IP processing table), presumably to handle ARP for the router +IPs. New flows also send IP packets destined to a particular Ethernet +address to table 4 (the L3 forwarding table); we can make the educated +guess that the Ethernet address is the one used by the Faucet router:: + + +table=3 priority=9131,arp,dl_vlan=100 actions=resubmit(,6) + +table=3 priority=9131,arp,dl_vlan=200 actions=resubmit(,6) + +table=3 priority=9099,ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01 actions=resubmit(,4) + +table=3 priority=9099,ip,dl_vlan=200,dl_dst=0e:00:00:00:00:01 actions=resubmit(,4) + +The new flows in table 4 appear to be verifying that the packets are +indeed addressed to a network or IP address that Faucet knows how to +route:: + + +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.100.0.254 actions=resubmit(,6) + +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.200.0.254 actions=resubmit(,6) + +table=4 priority=9123,ip,dl_vlan=200,nw_dst=10.100.0.0/24 actions=resubmit(,6) + +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.100.0.0/24 actions=resubmit(,6) + +table=4 priority=9123,ip,dl_vlan=200,nw_dst=10.200.0.0/24 actions=resubmit(,6) + +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.200.0.0/24 actions=resubmit(,6) + +Table 6 has a few different things going on. It sends ARP requests +for the router IPs to the controller; presumably the controller will +generate replies and send them back to the requester. It switches +other ARP packets, either broadcasting them if they have a broadcast +destination or attempting to unicast them otherwise. It sends all +other IP packets to the controller:: + + +table=6 priority=9133,arp,arp_tpa=10.100.0.254 actions=CONTROLLER:96 + +table=6 priority=9133,arp,arp_tpa=10.200.0.254 actions=CONTROLLER:96 + +table=6 priority=9132,arp,dl_dst=ff:ff:ff:ff:ff:ff actions=resubmit(,8) + +table=6 priority=9131,arp actions=resubmit(,7) + +table=6 priority=9131,ip actions=CONTROLLER:96 + +table=6 priority=9131,icmp actions=CONTROLLER:96 + +.. note:: + + There's one oddity here in that ICMP packets can match either the + ``ip`` or ``icmp`` entry, which both have priority 9131. OpenFlow + says, "If there are multiple matching flow entries with the same + highest priority, the selected flow entry is explicitly undefined." + In this case, it probably doesn't matter, since both flows have the + same actions, but if Faucet wants to keep track of ICMP statistics + separately from other IP packets, then it should install the ``ip`` + flow with a lower priority than the ``icmp`` flow. + +Performance is clearly going to be poor if every packet that needs to +be routed has to go to the controller, but it's unlikely that's the +full story. In the next section, we'll take a closer look. + +Tracing +~~~~~~~ + +As in our switching example, we can play some "what-if?" games to +figure out how this works. Let's suppose that a machine with IP +10.100.0.1, on port ``p1``, wants to send a IP packet to a machine +with IP 10.200.0.1 on port ``p4``. Assuming that these hosts have not +been in communication recently, the steps to accomplish this are +normally the following: + +1. Host 10.100.0.1 sends an ARP request to router 10.100.0.254. + +2. The router sends an ARP reply to the host. + +3. Host 10.100.0.1 sends an IP packet to 10.200.0.1, via the router's + Ethernet address. + +4. The router broadcasts an ARP request to ``p4`` and ``p5``, the + ports that carry the 10.200.0. network. + +5. Host 10.200.0.1 sends an ARP reply to the router. + +6. Either the router sends the IP packet (which it buffered) to + 10.200.0.1, or eventually 10.100.0.1 times out and resends it. + +Let's use ``ofproto/trace`` to see whether Faucet and OVS follow this +procedure. + +Before we start, save a new snapshot of the flow tables for later +comparison:: + + $ save-flows br0 > flows2 + +Step 1: Host ARP for Router ++++++++++++++++++++++++++++ + +Let's simulate the ARP from 10.100.0.1 to its gateway router +10.100.0.254. This requires more detail than any of the packets we've +simulated previously:: + + $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x806,arp_spa=10.100.0.1,arp_tpa=10.100.0.254,arp_sha=00:01:02:03:04:05,arp_tha=ff:ff:ff:ff:ff:ff,arp_op=1 -generate + +The important part of the output is where it shows that the packet was +recognized as an ARP request destined to the router gateway and +therefore sent to the controller:: + + 6. arp,arp_tpa=10.100.0.254, priority 9133, cookie 0x5adc15c0 + CONTROLLER:96 + +The Faucet log shows that Faucet learned the host's MAC address, +its MAC-to-IP mapping, and responded to the ARP request:: + + Oct 15 19:01:16 faucet.valve INFO DPID 1 (0x1) Adding new route 10.100.0.1/32 via 10.100.0.1 (00:01:02:03:04:05) on VLAN 100 + Oct 15 19:01:16 faucet.valve INFO DPID 1 (0x1) Responded to ARP request for 10.100.0.254 from 10.100.0.1 (00:01:02:03:04:05) on VLAN 100 + Oct 15 19:01:16 faucet.valve INFO DPID 1 (0x1) learned 00:01:02:03:04:05 on Port 1 on VLAN 100 (1 hosts total) + +We can also look at the changes to the flow tables:: + + $ diff-flows flows2 br0 + +table=3 priority=9098,in_port=1,dl_vlan=100,dl_src=00:01:02:03:04:05 hard_timeout=3600 actions=goto_table:7 + +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.100.0.1 actions=set_field:4196->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:01:02:03:04:05->eth_dst,dec_ttl,goto_table:7 + +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.100.0.1 actions=set_field:4196->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:01:02:03:04:05->eth_dst,dec_ttl,goto_table:7 + +table=7 priority=9099,dl_vlan=100,dl_dst=00:01:02:03:04:05 idle_timeout=3600 actions=pop_vlan,output:1 + +The new flows include one in table 3 and one in table 7 for the +learned MAC, which have the same forms we saw before. The new flows +in table 4 are different. They matches packets directed to 10.100.0.1 +(in two VLANs) and forward them to the host by updating the Ethernet +source and destination addresses appropriately, decrementing the TTL, +and skipping ahead to unicast output in table 7. This means that +packets sent **to** 10.100.0.1 should now get to their destination. + +Step 2: Router Sends ARP Reply +++++++++++++++++++++++++++++++ + +``inst/faucet.log`` said that the router sent an ARP reply. How can +we see it? Simulated packets just get dropped by default. One way is +to configure the dummy ports to write the packets they receive to a +file. Let's try that. First configure the port:: + + $ ovs-vsctl set interface p1 options:pcap=p1.pcap + +Then re-run the "trace" command:: + + $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x806,arp_spa=10.100.0.1,arp_tpa=10.100.0.254,arp_sha=00:01:02:03:04:05,arp_tha=ff:ff:ff:ff:ff:ff,arp_op=1 -generate + +And dump the reply packet:: + + $ /usr/sbin/tcpdump -evvvr sandbox/p1.pcap + reading from file sandbox/x.pcap, link-type EN10MB (Ethernet) + 15:34:14.172222 0e:00:00:00:00:01 (oui Unknown) > 00:01:02:03:04:05 (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.100.0.254 is-at 0e:00:00:00:00:01 (oui Unknown), length 46 + +We clearly see the ARP reply, which tells us that the Faucet router's +Ethernet address is 0e:00:00:00:00:01 (as we guessed before from the +flow table. + +Let's configure the rest of our ports to log their packets, too:: + + $ for i in 2 3 4 5; do ovs-vsctl set interface p$i options:pcap=p$i.pcap; done + +Step 3: Host Sends IP Packet +++++++++++++++++++++++++++++ + +Now that host 10.100.0.1 has the MAC address for its router, it can +send an IP packet to 10.200.0.1 via the router's MAC address, like +this:: + + $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,udp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64 -generate + Flow: ip,in_port=1,vlan_tci=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_proto=17,nw_tos=0,nw_ecn=0,nw_ttl=64 + + bridge("br0") + ------------- + 0. in_port=1, priority 9099, cookie 0x5adc15c0 + goto_table:1 + 1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 + push_vlan:0x8100 + set_field:4196->vlan_vid + goto_table:3 + 3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0 + goto_table:4 + 4. ip,dl_vlan=100,nw_dst=10.200.0.0/24, priority 9123, cookie 0x5adc15c0 + goto_table:6 + 6. ip, priority 9131, cookie 0x5adc15c0 + CONTROLLER:96 + + Final flow: ip,in_port=1,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_proto=17,nw_tos=0,nw_ecn=0,nw_ttl=64 + Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_frag=no + Datapath actions: push_vlan(vid=100,pcp=0) + This flow is handled by the userspace slow path because it: + - Sends "packet-in" messages to the OpenFlow controller. + +Observe that the packet gets recognized as destined to the router, in +table 3, and then as properly destined to the 10.200.0.0/24 network, +in table 4. In table 6, however, it gets sent to the controller. +Presumably, this is because Faucet has not yet resolved an Ethernet +address for the destination host 10.200.0.1. It probably sent out an +ARP request. Let's take a look in the next step. + +Step 4: Router Broadcasts ARP Request ++++++++++++++++++++++++++++++++++++++ + +The router needs to know the Ethernet address of 10.200.0.1. It knows +that, if this machine exists, it's on port ``p4`` or ``p5``, since we +configured those ports as VLAN 200. + +Let's make sure:: + + $ /usr/sbin/tcpdump -evvvr sandbox/p4.pcap + reading from file sandbox/p4.pcap, link-type EN10MB (Ethernet) + 15:55:42.977504 0e:00:00:00:00:01 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.200.0.1 tell 10.200.0.254, length 46 + +and:: + + $ /usr/sbin/tcpdump -evvvr sandbox/p5.pcap + reading from file sandbox/p5.pcap, link-type EN10MB (Ethernet) + 15:55:42.977568 0e:00:00:00:00:01 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.200.0.1 tell 10.200.0.254, length 46 + +For good measure, let's make sure that it wasn't sent to ``p3``:: + + $ /usr/sbin/tcpdump -evvvr sandbox/p3.pcap + reading from file sandbox/p3.pcap, link-type EN10MB (Ethernet) + +Step 5: Host 2 Sends ARP Reply +++++++++++++++++++++++++++++++ + +The Faucet controller sent an ARP request, so we can send an ARP +reply:: + + $ ovs-appctl ofproto/trace br0 in_port=p4,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,dl_type=0x806,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01,arp_op=2 -generate + Flow: arp,in_port=4,vlan_tci=0x0000,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_op=2,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01 + + bridge("br0") + ------------- + 0. in_port=4, priority 9099, cookie 0x5adc15c0 + goto_table:1 + 1. in_port=4,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 + push_vlan:0x8100 + set_field:4296->vlan_vid + goto_table:3 + 3. arp,dl_vlan=200, priority 9131, cookie 0x5adc15c0 + goto_table:6 + 6. arp,arp_tpa=10.200.0.254, priority 9133, cookie 0x5adc15c0 + CONTROLLER:96 + + Final flow: arp,in_port=4,dl_vlan=200,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,arp_spa=10.200.0.1,arp_tpa=10.200.0.254,arp_op=2,arp_sha=00:10:20:30:40:50,arp_tha=0e:00:00:00:00:01 + Megaflow: recirc_id=0,eth,arp,in_port=4,vlan_tci=0x0000/0x1fff,dl_src=00:10:20:30:40:50,dl_dst=0e:00:00:00:00:01,arp_tpa=10.200.0.254 + Datapath actions: push_vlan(vid=200,pcp=0) + This flow is handled by the userspace slow path because it: + - Sends "packet-in" messages to the OpenFlow controller. + +It shows up in ``inst/faucet.log``:: + + Oct 16 23:02:22 faucet.valve INFO DPID 1 (0x1) ARP response 10.200.0.1 (00:10:20:30:40:50) on VLAN 200 + Oct 16 23:02:22 faucet.valve INFO DPID 1 (0x1) learned 00:10:20:30:40:50 on Port 4 on VLAN 200 (1 hosts total) + +and in the OVS flow tables:: + + $ diff-flows flows2 br0 + +table=3 priority=9098,in_port=4,dl_vlan=200,dl_src=00:10:20:30:40:50 hard_timeout=295 actions=goto_table:7 + ... + +table=4 priority=9131,ip,dl_vlan=200,nw_dst=10.200.0.1 actions=set_field:4296->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:10:20:30:40:50->eth_dst,dec_ttl,goto_table:7 + +table=4 priority=9131,ip,dl_vlan=100,nw_dst=10.200.0.1 actions=set_field:4296->vlan_vid,set_field:0e:00:00:00:00:01->eth_src,set_field:00:10:20:30:40:50->eth_dst,dec_ttl,goto_table:7 + ... + +table=4 priority=9123,ip,dl_vlan=100,nw_dst=10.200.0.0/24 actions=goto_table:6 + +table=7 priority=9099,dl_vlan=200,dl_dst=00:10:20:30:40:50 idle_timeout=295 actions=pop_vlan,output:4 + +Step 6: IP Packet Delivery +++++++++++++++++++++++++++ + +Now both the host and the router have everything they need to deliver +the packet. There are two ways it might happen. If Faucet's router +is smart enough to buffer the packet that trigger ARP resolution, then +it might have delivered it already. If so, then it should show up in +``p4.pcap``. Let's take a look:: + + $ /usr/sbin/tcpdump -evvvr sandbox/p4.pcap ip + reading from file sandbox/p4.pcap, link-type EN10MB (Ethernet) + +Nope. That leaves the other possibility, which is that Faucet waits +for the original sending host to re-send the packet. We can do that +by re-running the trace:: + + $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,udp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64 -generate + Flow: udp,in_port=1,vlan_tci=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=0 + + bridge("br0") + ------------- + 0. in_port=1, priority 9099, cookie 0x5adc15c0 + goto_table:1 + 1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 + push_vlan:0x8100 + set_field:4196->vlan_vid + goto_table:3 + 3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0 + goto_table:4 + 4. ip,dl_vlan=100,nw_dst=10.200.0.1, priority 9131, cookie 0x5adc15c0 + set_field:4296->vlan_vid + set_field:0e:00:00:00:00:01->eth_src + set_field:00:10:20:30:40:50->eth_dst + dec_ttl + goto_table:7 + 7. dl_vlan=200,dl_dst=00:10:20:30:40:50, priority 9099, cookie 0x5adc15c0 + pop_vlan + output:4 + + Final flow: udp,in_port=1,vlan_tci=0x0000,dl_src=0e:00:00:00:00:01,dl_dst=00:10:20:30:40:50,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=0,tp_dst=0 + Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_ttl=64,nw_frag=no + Datapath actions: set(eth(src=0e:00:00:00:00:01,dst=00:10:20:30:40:50)),set(ipv4(dst=10.200.0.1,ttl=63)),4 + +Finally, we have working IP packet forwarding! + +Performance +~~~~~~~~~~~ + +Take another look at the megaflow line above:: + + Megaflow: recirc_id=0,eth,ip,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_ttl=64,nw_frag=no + +This means that (almost) any packet between these Ethernet source and +destination hosts, destined to the given IP host, will be handled by +this single megaflow cache entry. So regardless of the number of UDP +packets or TCP connections that these hosts exchange, Open vSwitch +packet processing won't need to fall back to the slow path. It is +quite efficient. + +.. note:: + + The exceptions are packets with a TTL other than 64, and fragmented + packets. Most hosts use a constant TTL for outgoing packets, and + fragments are rare. If either of those did change, then that would + simply result in a new megaflow cache entry. + +The datapath actions might also be worth a look:: + + Datapath actions: set(eth(src=0e:00:00:00:00:01,dst=00:10:20:30:40:50)),set(ipv4(dst=10.200.0.1,ttl=63)),4 + +This just means that, to process these packets, the datapath changes +the Ethernet source and destination addresses and the IP TTL, and then +transmits the packet to port ``p4`` (also numbered 4). Notice in +particular that, despite the OpenFlow actions that pushed, modified, +and popped back off a VLAN, there is nothing in the datapath actions +about VLANs. This is because the OVS flow translation code "optimizes +out" redundant or unneeded actions, which saves time when the cache +entry is executed later. + +.. note:: + + It's not clear why the actions also re-set the IP destination + address to its original value. Perhaps this is a minor performance + bug. + +ACLs +---- + +Let's try out some ACLs, since they do a good job illustrating some of +the ways that OVS tries to optimize megaflows. Update +``inst/faucet.yaml`` to the following:: + + dps: + switch-1: + dp_id: 0x1 + timeout: 3600 + arp_neighbor_timeout: 3600 + interfaces: + 1: + native_vlan: 100 + acl_in: 1 + 2: + native_vlan: 100 + 3: + native_vlan: 100 + 4: + native_vlan: 200 + 5: + native_vlan: 200 + vlans: + 100: + faucet_vips: ["10.100.0.254/24"] + 200: + faucet_vips: ["10.200.0.254/24"] + routers: + router-1: + vlans: [100, 200] + acls: + 1: + - rule: + dl_type: 0x800 + nw_proto: 6 + tp_dst: 8080 + actions: + allow: 0 + - rule: + actions: + allow: 1 + +Then restart Faucet:: + + $ docker restart faucet + +On port 1, this new configuration blocks all traffic to TCP port 8080 +and allows all other traffic. The resulting change in the flow table +shows this clearly too:: + + $ diff-flows flows2 br0 + -priority=9099,in_port=1 actions=goto_table:1 + +priority=9098,in_port=1 actions=goto_table:1 + +priority=9099,tcp,in_port=1,tp_dst=8080 actions=drop + +The most interesting question here is performance. If you recall the +earlier discussion, when a packet through the flow table encounters a +match on a given field, the resulting megaflow has to match on that +field, even if the flow didn't actually match. This is expensive. + +In particular, here you can see that any TCP packet is going to +encounter the ACL flow, even if it is directed to a port other than +8080. If that means that every megaflow for a TCP packet is going to +have to match on the TCP destination, that's going to be bad for +caching performance because there will be a need for a separate +megaflow for every TCP destination port that actually appears in +traffic, which means a lot more megaflows than otherwise. (Really, in +practice, if such a simple ACL blew up performance, OVS wouldn't be a +very good switch!) + +Let's see what happens, by sending a packet to port 80 (instead of +8080):: + + $ ovs-appctl ofproto/trace br0 in_port=p1,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,tcp,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_ttl=64,tp_dst=80 -generate + + bridge("br0") + ------------- + 0. in_port=1, priority 9098, cookie 0x5adc15c0 + goto_table:1 + 1. in_port=1,vlan_tci=0x0000/0x1fff, priority 9000, cookie 0x5adc15c0 + push_vlan:0x8100 + set_field:4196->vlan_vid + goto_table:3 + 3. ip,dl_vlan=100,dl_dst=0e:00:00:00:00:01, priority 9099, cookie 0x5adc15c0 + goto_table:4 + 4. ip,dl_vlan=100,nw_dst=10.200.0.0/24, priority 9123, cookie 0x5adc15c0 + goto_table:6 + 6. ip, priority 9131, cookie 0x5adc15c0 + CONTROLLER:96 + + Final flow: tcp,in_port=1,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_src=10.100.0.1,nw_dst=10.200.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 + Megaflow: recirc_id=0,eth,tcp,in_port=1,vlan_tci=0x0000/0x1fff,dl_src=00:01:02:03:04:05,dl_dst=0e:00:00:00:00:01,nw_dst=10.200.0.1,nw_frag=no,tp_dst=0x0/0xf000 + Datapath actions: push_vlan(vid=100,pcp=0) + +Take a look at the Megaflow line and in particular the match on +``tp_dst``, which says ``tp_dst=0x0/0xf000``. What this means is that +the megaflow matches on only the top 4 bits of the TCP destination +port. That works because:: + + 80 (base 10) == 0001,1111,1001,0000 (base 2) + 8080 (base 10) == 0000,0000,0101,0000 (base 2) + +and so by matching on only the top 4 bits, rather than all 16, the OVS +fast path can distinguish port 80 from port 8080. This allows this +megaflow to match one-sixteenth of the TCP destination port address +space, rather than just 1/65536th of it. + +.. note:: + + The algorithm OVS uses for this purpose isn't perfect. In this + case, a single-bit match would work (e.g. tp_dst=0x0/0x1000), and + would be superior since it would only match half the port address + space instead of one-sixteenth. + +For details of this algorithm, please refer to ``lib/classifier.c`` in +the Open vSwitch source tree, or our 2015 NSDI paper "The Design and +Implementation of Open vSwitch". + +Finishing Up +------------ + +When you're done, you probably want to exit the sandbox session, with +Control+D or ``exit``, and stop the Faucet controller with ``docker +stop faucet; docker rm faucet``. + +Further Directions +------------------ + +We've looked a fair bit at how Faucet interacts with Open vSwitch. If +you still have some interest, you might want to explore some of these +directions: + +* Adding more than one switch. Faucet can control multiple switches + but we've only been simulating one of them. It's easy enough to + make a single OVS instance act as multiple switches (just + ``ovs-vsctl add-br`` another bridge), or you could use genuinely + separate OVS instances. + +* Additional features. Faucet has more features than we've + demonstrated, such as IPv6 routing and port mirroring. These should + also interact gracefully with Open vSwitch. + +* Real performance testing. We've looked at how flows and traces + **should** demonstrate good performance, but of course there's no + proof until it actually works in practice. We've also only tested + with trivial configurations. Open vSwitch can scale to millions of + OpenFlow flows, but the scaling in practice depends on the + particular flow tables and traffic patterns, so it's valuable to + test with large configurations, either in the way we've done it or + with real traffic. diff --git a/Documentation/tutorials/index.rst b/Documentation/tutorials/index.rst index e67209b0e..c2d343b11 100644 --- a/Documentation/tutorials/index.rst +++ b/Documentation/tutorials/index.rst @@ -39,6 +39,7 @@ vSwitch. .. toctree:: :maxdepth: 2 + faucet ovs-advanced ovn-sandbox ovn-openstack -- cgit v1.2.1