path: root/ovn
Commit message  Author  Age  Files  Lines
* ovn-northd: Fix the HA_Chassis sync issue in OVN SB DBNuman Siddique2019-04-251-18/+74
  ovn-northd deletes and recreates HA_Chassis rows (which belong to an HA_Chassis_Group) whenever the HA_Chassis_Group/Gateway_Chassis rows in the Northbound DB are out of sync. If a Chassis table row in the Southbound DB is deleted while it is still referenced by an HA_Chassis row (in the Southbound DB), the present code syncs the HA_Chassis rows continuously. This wakes up the ovn-controllers and results in 100% CPU usage.

  This is a simple case that commit 1be1e0e5e0d1 ("ovn: Add generic HA chassis group") failed to address. This patch fixes the issue.

  Fixes: 1be1e0e5e0d1 ("ovn: Add generic HA chassis group")
  Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048580.html
  Reported-by: Daniel Alvarez Sanchez (dalvarez@redhat.com)
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: Clarify docs about the default transport zoneLucas Alvares Gomes2019-04-231-5/+11
  This patch extends the documentation of the new transport zones feature to clarify that if no transport zones are set, the chassis belongs to a default group.

  Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: Add support for Transport ZonesLucas Alvares Gomes2019-04-228-10/+104
  This patch adds support for Transport Zones. Transport Zones (a.k.a. TZs) are a way for users of OVN to separate Chassis into different logical groups that only form tunnels between members of the same group. Each Chassis can belong to one or more Transport Zones. If none is set, the Chassis is considered part of a default group.

  Transport Zones are configured by creating a key called "ovn-transport-zones" in the external_ids column of the Open_vSwitch table of the local OVS instance. The value is a string with the name of the Transport Zone that this instance is part of. Multiple TZs can be specified as a comma-separated list. For example:

  $ sudo ovs-vsctl set open . external-ids:ovn-transport-zones=tz1

  or

  $ sudo ovs-vsctl set open . external-ids:ovn-transport-zones=tz1,tz2,tz3

  This configuration is also exposed in the Chassis table of the OVN Southbound Database in a new column called "transport_zones".

  Use cases for Transport Zones include, but are not limited to:

  * Edge computing: A way to prevent edge sites from trying to create tunnels with every node on every other edge site while still allowing these sites to create tunnels with the central node.

  * Extra security layer: Where users want to create "trust zones" and prevent compute nodes in a more secure zone from communicating with a less secure zone.

  This patch is also backward compatible, so the upgrade guide for OVN [0] is still valid and the ovn-controller service can be upgraded before the OVSDBs.

  [0] http://docs.openvswitch.org/en/latest/intro/install/ovn-upgrades/

  Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
  Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048255.html
  Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
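The grouping rule in this commit message can be illustrated with a small Python sketch. It is not the ovn-controller implementation: the function names and the `DEFAULT_ZONE` label are hypothetical, and it assumes that chassis with no zones configured only tunnel with other chassis in the same implicit default group.

```python
# Hypothetical label for the implicit default group (not an OVN identifier).
DEFAULT_ZONE = "<default>"


def parse_transport_zones(value):
    """Split an ovn-transport-zones value such as "tz1,tz2" into a set.

    An empty or missing value maps to the default group.
    """
    zones = {z.strip() for z in (value or "").split(",") if z.strip()}
    return zones or {DEFAULT_ZONE}


def should_tunnel(local_value, remote_value):
    """Return True if the two chassis share at least one transport zone."""
    local = parse_transport_zones(local_value)
    remote = parse_transport_zones(remote_value)
    return bool(local & remote)
```

For example, a chassis configured with `tz1` would tunnel with one configured with `tz1,tz2`, but not with one configured with `tz3` only.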
* ovn: Generate ICMPv4 packet in router pipeline for larger packetsNuman Siddique2019-04-222-6/+169
  This patch adds 2 stages in the router pipeline after ARP_RESOLVE and adds the logical flows to check the packet length and generate an ICMPv4 packet.

  * S_ROUTER_IN_CHK_PKT_LEN - Checks the packet length using the check_pkt_larger OVN action.
  * S_ROUTER_IN_LARGER_PKTS - Generates an ICMP packet with type 3 (Destination Unreachable), code 4 (Frag Needed and DF was Set), and icmp4.frag_mtu = gw_mtu.

  In order to add these logical flows, the CMS should set the option 'gateway_mtu' for the distributed logical router port.

  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Support OVS action 'check_pkt_larger' in OVNNuman Siddique2019-04-223-1/+77
  The previous commit added a new OVS action 'check_pkt_larger'. This patch supports that action in OVN. The syntax to use it is:

  reg0[0] = check_pkt_larger(LEN)

  An upcoming commit will make use of this action in ovn-northd and will generate an ICMPv4 packet if the packet length is greater than the specified length.

  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Add a new OVN action 'icmp4_error'Numan Siddique2019-04-224-7/+87
  This action is similar to the existing 'icmp4' OVN action, except that it is expected to be used to generate an ICMPv4 packet in response to an error in the original IP packet. When this action injects the ICMPv4 packet, it also copies the original IP datagram following the ICMPv4 header, as per RFC 1122: 3.2.2.

  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Add a new OVN field icmp4.frag_mtuNuman Siddique2019-04-2211-95/+216
  In order to support OVN-specific fields (which Open vSwitch does not yet support setting or modifying), this patch adds generic OVN field support. These OVN fields get translated to controller actions. This patch adds only one field for now: icmp4.frag_mtu. It should be fairly straightforward to add similar fields in the near future.

  Example usage:

  action=(icmp4 {"eth.dst <-> eth.src; "
                 "icmp4.type = 3; /* Destination Unreachable */ "
                 "icmp4.code = 4; /* Fragmentation Needed */ "
                 icmp4.frag_mtu = 1442;
                 ...
                 "next; };")

  action=(icmp4.frag_mtu = 1500; ..)

  The pinctrl module of ovn-controller will set the specified value in the low-order 16 bits of the ICMP4 header field that is labelled "unused" in the ICMP specification, as defined in RFC 1191.

  An upcoming patch will use it to send an ICMPv4 packet if the source IPv4 packet destined to go via the external gateway needs to be fragmented.

  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
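To make the header layout concrete, here is a minimal Python sketch of an ICMPv4 "Fragmentation Needed" message with the MTU stored in the low-order 16 bits of the otherwise unused 32-bit field, as RFC 1191 describes. It only illustrates the wire format; it is not the pinctrl code, and the helper names `inet_checksum` and `build_frag_needed` are made up for this example.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """Standard Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(mtu: int, payload: bytes) -> bytes:
    """Build ICMP type 3, code 4 with `mtu` in the low 16 bits of 'unused'.

    Header layout: type (1 byte), code (1), checksum (2), unused high
    half (2, left as zero), next-hop MTU (2).
    """
    header = struct.pack("!BBHHH", 3, 4, 0, 0, mtu)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 3, 4, csum, 0, mtu) + payload
```

Calling `build_frag_needed(1442, payload)` yields a message whose bytes 6 and 7 carry the value 1442, matching the `icmp4.frag_mtu = 1442` example above.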
* ovn-nbctl: Fix 32-bit build with gcc.Ilya Maximets2019-04-171-3/+3
  ovn/utilities/ovn-nbctl.c: In function 'print_routing_policy':
  ovn/utilities/ovn-nbctl.c:3620:23: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'int64_t'
                         policy->match, policy->action, next_hop);
                         ^
  ovn/utilities/ovn-nbctl.c:3624:23: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'int64_t'
                         policy->match, policy->action);
                         ^
  ovn/utilities/ovn-nbctl.c: In function 'cmd_ha_ch_grp_list':
  ovn/utilities/ovn-nbctl.c:5056:27: error: format '%lu' expects argument of type 'long unsigned int', but argument 10 has type 'int64_t'
                             ha_ch->priority);
                             ^
  cc1: all warnings being treated as errors
  make[2]: *** [ovn/utilities/ovn-nbctl.o] Error 1

  https://travis-ci.org/openvswitch/ovs/jobs/521015912

  CC: Numan Siddique <nusiddiq@redhat.com>
  CC: Mary Manohar <mary.manohar@nutanix.com>
  Fixes: 1be1e0e5e0d1 ("ovn: Add generic HA chassis group")
  Fixes: a64bb573468f ("Policy-based routing (PBR) in OVN.")
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: add the possibility to configure a static IPv4/IPv6 address and dynamic MACLorenzo Bianconi2019-04-164-22/+97
  Add the possibility to configure a static IPv4 and/or IPv6 address and get the MAC address dynamically allocated. This can be done using the following commands:

  $ ovn-nbctl ls-add sw0
  $ ovn-nbctl set Logical-Switch sw0 other_config:subnet=192.168.0.0/24
  $ ovn-nbctl set Logical-switch sw0 other_config:ipv6_prefix=2001::0
  $ ovn-nbctl lsp-add sw0 lsp0 -- lsp-set-addresses lsp0 "dynamic 192.168.0.1 2001::1"

  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* chassis.c: Return chassis record whenever available in chassis_run().Han Zhou2019-04-161-5/+4
  The ovn-controller main loop relies on the return value of chassis_run(). When ovnsb_idl_txn is NULL (i.e. there is a pending transaction for the SB), chassis_run() returns NULL, which unnecessarily blocks functions from being executed in the main loop.

  This patch updates chassis_run() so that it returns the chassis record whenever it is available. This change allows the xxx_run() functions to be executed whenever br_int and chassis are not NULL. For functions that need to update the SB DB, there are already additional checks making sure ovnsb_idl_txn is not NULL.

  Acked-by: Numan Siddique <nusiddiq@redhat.com>
  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-controller: Fix busy loop when sb disconnected.Han Zhou2019-04-161-46/+51
  In the main loop, if the SB DB is disconnected while there is a pending transaction, ovn-controller can enter a busy loop that uses 100% CPU until the SB DB is connected again.

  The root cause is that when a transaction is pending, ovsdb_idl_loop_run() returns NULL for ovnsb_idl_txn, and chassis_run() returns NULL when ovnsb_idl_txn is NULL, so the condition if (br_int && chassis) is not satisfied and ofctrl_run() is not executed in the main loop. If any message is pending from br-int.mgmt, such as OFPTYPE_BARRIER_REPLY or OFPTYPE_ECHO_REQUEST, the main loop is woken up again and again, because those messages are never processed while ofctrl_run() is not invoked.

  This patch fixes the problem by moving ofctrl_run() up and running it whenever br_int is not NULL, regardless of chassis, because this function does not depend on it. It also moves sbrec_chassis_set_nb_cfg() out of the "if (ovs_idl_txn)" block, purely to avoid indenting the whole block further and exceeding the 79-character line limit.

  Note: the changes in this patch are best viewed with "-w" because most of them are indentation changes.

  Acked-by: Numan Siddique <nusiddiq@redhat.com>
  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* Policy-based routing (PBR) in OVN.Mary Manohar2019-04-164-8/+396
  PBR provides a mechanism to configure permit/deny and reroute policies on the router. Permit/deny policies are similar to OVN ACLs, but exist on the logical router. Reroute policies are needed for service insertion and service chaining. Currently, policies are stateless.

  To achieve this, a new table is introduced in the ingress pipeline of the logical router. The new table sits between the ‘IP Routing’ and the ‘ARP/ND resolution’ tables. This way, PBR can override routing decisions and provide a different next hop.

  This patch:
  a. Changes the OVN NB schema to introduce a new table in the logical router.
  b. Adds commands to ovn-nbctl to add/delete/list routing policies.
  c. Changes ovn-northd to process routing-policy configurations.

  A new table 'Logical_Router_Policy' has been added to the northbound schema. The table has the following columns:
  * priority: Rules with numerically higher priority take precedence over those with lower.
  * match: Uses the same expression language as the 'match' column of the 'Logical_Flow' table in the OVN Southbound database.
  * action: allow/drop/reroute
  * nexthop: Nexthop IP address.

  Each row in this table represents one routing policy for a logical router. The 'action' column of the highest-priority matching row in this table determines a packet's treatment. If no row matches, packets are allowed by default.

  The new ovn-nbctl commands are as follows:

  1. Add a new ovn-nbctl command to add a routing policy.

     lr-policy-add ROUTER PRIORITY MATCH ACTION [NEXTHOP]

     Nexthop is an optional parameter. It needs to be provided only when 'action' is 'reroute'. A policy is uniquely identified by priority and match. Multiple policies can have the same priority.

  2. Add a new ovn-nbctl command to delete a routing policy.

     lr-policy-del ROUTER [PRIORITY [MATCH]]

     Takes priority and match as optional parameters. If priority and match are specified, the policy with the given priority and match is deleted. If priority is specified and match is not, all rules with that priority are deleted. If priority is not specified, all the rules are deleted.

  3. Add a new ovn-nbctl command to list routing policies in the logical router.

     lr-policy-list ROUTER

  The ovn-northd changes read routing policies from the northbound database and populate them as logical flows in the southbound database. A new table called 'POLICY' is introduced in the logical router's ingress pipeline. Each routing policy configured in the northbound database translates into a single logical flow in the new table.

  The columns from the Logical_Router_Policy table are used as follows:
  The priority column is used as the priority of the logical flow.
  The match column is used as the 'match' string of the logical flow.
  The action column is used to determine the action of the logical flow.

  When the 'action' is reroute, if the nexthop IP address is a connected router port or the IP address of a logical port, the logical flow is constructed to route the packet to the nexthop IP address.

  Signed-off-by: Mary Manohar <mary.manohar@nutanix.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
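The selection rule described in this commit message (highest-priority matching row wins, default allow) can be sketched in Python as follows. This is only a model of the semantics, not ovn-northd code: the `Policy` class and `evaluate` function are hypothetical, and a plain Python callable stands in for an OVN match expression.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    priority: int
    match: Callable[[dict], bool]  # stand-in for an OVN match expression
    action: str                    # "allow", "drop" or "reroute"
    nexthop: Optional[str] = None  # only meaningful for "reroute"


def evaluate(policies, packet):
    """Return (action, nexthop) of the highest-priority matching policy.

    If no policy matches, the packet is allowed by default.
    """
    for policy in sorted(policies, key=lambda p: p.priority, reverse=True):
        if policy.match(packet):
            return policy.action, policy.nexthop
    return "allow", None
```

The message does not say how ties between equal-priority policies that both match are resolved; this sketch simply keeps the input order in that case.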
* OVN: fix DVR Floating IP supportLorenzo Bianconi2019-04-162-0/+156
  When DVR is enabled, FIP traffic needs to be forwarded directly using the external connection to the underlay network rather than being distributed through Geneve tunnels. Fix this by adding new logical flows that take care of distributed DNAT/SNAT.

  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Support a new Logical_Switch_Port.type - 'external'Numan Siddique2019-04-168-15/+316
  In the case of OpenStack + OVN, when VMs are booted on hypervisors supporting SR-IOV NICs, there are no OVS ports for these VMs. When these VMs send DHCPv4, DHCPv6 or IPv6 Router Solicitation requests, the local ovn-controller cannot reply to these packets. The OpenStack Neutron DHCP agent service needs to be run to serve these requests.

  With the new logical port type 'external', OVN itself can handle these requests, avoiding the need to deploy any external services such as the Neutron DHCP agent.

  To make use of this feature, the CMS has to:
  - create a logical port for such VMs
  - set the type to 'external'
  - create an HA chassis group and associate the logical port with it, or associate an already existing HA chassis group
  - create a localnet port for the logical switch
  - configure the ovn-bridge-mappings option in the OVS DB

  The HA chassis with the highest priority becomes the master of the HA chassis group. The ovn-controller running on that chassis claims the Port_Binding for that logical port and adds the necessary DHCPv4/v6 OF flows. Since the packet enters the logical switch pipeline via the localnet port, the inport register (reg14) is set to the tunnel key of the localnet port in the match conditions.

  If the chassis goes down for some reason, the HA chassis with the next highest priority becomes the master and claims the port.

  When the VM with the external port sends an ARP request for the router IPs, only the chassis which has claimed the port replies to the ARP requests. The rest of the chassis, on receiving these packets, drop them in the ingress switch datapath stage S_SWITCH_IN_EXTERNAL_PORT, which is just before S_SWITCH_IN_L2_LKUP.

  This guarantees that only the chassis which has claimed the external ports runs the router datapath pipeline.

  Acked-by: Mark Michelson <mmichels@redhat.com>
  Acked-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
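The master election described in this commit message can be modelled with a short Python sketch. It assumes liveness is already known (the real system determines it by other means) and uses a hypothetical `pick_master` helper; it is not the ovn-controller implementation.

```python
def pick_master(ha_chassis, alive):
    """Pick the live chassis with the highest priority, or None.

    ha_chassis: dict mapping chassis name to priority.
    alive: set of chassis names currently considered reachable
           (how liveness is detected is outside this sketch).
    """
    candidates = [(prio, name) for name, prio in ha_chassis.items()
                  if name in alive]
    if not candidates:
        return None
    # max() on (priority, name) tuples selects the highest priority;
    # equal priorities fall back to name order, an arbitrary choice here.
    return max(candidates)[1]
```

With priorities `{"hv1": 30, "hv2": 20, "hv3": 10}`, the sketch selects `hv1` while it is alive and `hv2` once `hv1` drops out of the `alive` set.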
* ovn-northd: Delete the references to gateway_chasss in SB DBNuman Siddique2019-04-161-168/+163
  The previous patch in the series added support in ovn-controller for using the ha_chassis_group table in the SB DB, instead of the gateway_chassis table, to support HA chassis and establish BFD tunnels. There is no longer any need for ovn-northd to create gateway_chassis rows in the SB DB. This patch stops creating them and deletes the code which is no longer required.

  This patch also supports associating 'ha_chassis_group' with a distributed logical router port, and ignores 'gateway_chassis' and 'redirect-chassis' if they are set along with 'ha_chassis_group'.

  Acked-by: Mark Michelson <mmichels@redhat.com>
  Acked-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-controller: Make use of ha_chassis_group table to bind the chassisredirect portsNuman Siddique2019-04-1616-583/+427
  This patch uses the newly added ha_chassis_group table in the Southbound DB
  - to bind the chassisredirect ports.
  - to establish BFD sessions with the required chassis.

  The previous patch in this series sets the list of chassis which reference an HA chassis group in the 'ref_chassis' column of the 'ha_chassis_group' table (in ovn-northd). This patch uses that information to establish BFD sessions with only the required chassis. There is no need to traverse the local_datapath list to determine whether a local chassis has to establish a BFD session with another chassis.

  For example, if chassis HV1, HV2 and HV3 are part of a chassis group G1 and G1 is referenced by compute chassis C1 and C2, chassis C1 will establish BFD sessions with HV1, HV2 and HV3 since C1 references the group G1. The HA chassis HV1, HV2 and HV3 also establish BFD sessions amongst themselves and with C1 and C2.

  This patch also deletes the old code (which used the gateway_chassis table) to bind the chassisredirect port.

  The rationale behind the refactor is to make the HA chassis binding support generic, so that logical ports of type 'external' (which will be added in an upcoming patch) can also make use of it, and to simplify the gateway chassis support code in OVN. Functionally, this new approach is the same as the older one.

  Acked-by: Mark Michelson <mmichels@redhat.com>
  Acked-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Add generic HA chassis groupNuman Siddique2019-04-1610-35/+931
  This patch adds the tables 'HA_Chassis_Group' and 'HA_Chassis' to both the OVN Northbound and Southbound DBs to support generic HA chassis groups in OVN.

  The CMS can create a group of HA chassis with a priority assigned to each chassis in the group. An HA chassis group can be associated with a distributed logical router port. An upcoming patch will make use of it while supporting 'external' logical ports.

  An HA chassis group is similar to the existing gateway chassis support in OVN, which is used by the distributed gateway router ports. This patch abstracts it so that HA chassis support can be leveraged by more than just distributed gateway router ports.

  If a logical router port has a set of gateway chassis associated with it, ovn-northd will create an HA chassis group in the Southbound DB and add these gateway chassis to the group. ovn-northd still creates gateway chassis in the Southbound DB, as ovn-controller does not yet support using the HA chassis group. The next patch in the series will add support in ovn-controller for using the HA chassis group instead of gateway chassis. The patch following that will remove the creation of gateway chassis in the Southbound DB.

  The HA_Chassis_Group table in the Southbound DB has a column 'ref_chassis'. This column stores the list of chassis which reference the HA chassis group. This information will be used by ovn-controller in an upcoming patch to establish BFD sessions with the required chassis.

  Suppose there is an HA chassis group 'hagrp1' in the Southbound DB with the HA chassis list ha1, ha2 and ha3, and this HA chassis group is used by a distributed logical router port. Then ovn-northd will update 'ref_chassis' with the list of chassis which have claimed any of the logical switch ports connected to the logical router that has this distributed logical router port.

  Acked-by: Han Zhou <hzhou8@ebay.com>
  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-northd: Reuse the hmaps - datapaths and ports in ovnsb_db_run()Numan Siddique2019-04-161-61/+48
  We can reuse the datapaths and ports built during ovnnb_db_run() in ovnsb_db_run(). This way we avoid creating the logical port hash nodes during ovnsb_db_run(). An upcoming patch will make further use of these hashmaps during ovnsb_db_run(). This patch refactors the code accordingly.

  Acked-by: Mark Michelson <mmichels@redhat.com>
  Acked-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-nbctl: Support --no-shuffle-remotes.Han Zhou2019-04-152-2/+41
  Support the --no-shuffle-remotes option for ovn-nbctl. It is mainly for testing purposes: it lets us specify the order in which the client fails over when the connected node is down, making the test cases more predictable.

  Signed-off-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: Make periodic RAs consistent with RA responder.Mark Michelson2019-03-251-2/+3
  This commit makes periodic RAs from OVN consistent with the RAs sent in response to RSs. Specifically, this ensures that prefix flags are set correctly for each address mode.

  This commit also gets rid of some redundant definitions for RA prefix option flags from packets.h in favor of the ones in ovn-l7.h.

  Signed-off-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: Always send prefix option in RAsMark Michelson2019-03-253-15/+11
  OVN's behavior when sending router advertisements has been to include IP prefix information only if the address mode is set to "slaac" or "dhcp_stateless". In these modes, sending the prefix to the client is necessary so that it may automatically provision its IP address. We do not send the prefix option when the address mode is set to "dhcp_stateful" since there is no need for the client to automatically provision an IP address.

  This logic is flawed, however. When using dhcp_stateful, we provide a managed IPv6 address for a client. However, because we do not provide prefix information in our RAs, the client does not know the prefix length for the address it has been allocated. With dhclient, we have seen it assume either /64 or /128, depending on which version is being used. This may not accurately reflect the prefix length being used by the DHCP server, though.

  The fix here is to always send prefix information in our RAs, regardless of address mode. The key difference lies in how we set the A (autonomous addressing) flag. For the slaac and dhcp_stateless address modes, we will set this flag, indicating that the client should provision its own address based on the prefix we have sent. For dhcp_stateful, we will not set this flag. This way, it is clear the prefix is informational, and the client should not try to provision its own IPv6 address.

  Signed-off-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
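The flag decision described above can be written as a tiny Python sketch. The bit value 0x40 for the A flag in the Prefix Information option comes from RFC 4861; the constant and function names are hypothetical, and other prefix flags are deliberately left out because the commit message does not discuss them.

```python
# A (autonomous address-configuration) flag bit in the Prefix Information
# option's flags byte, per RFC 4861. The constant name is illustrative.
ND_PREFIX_AUTONOMOUS = 0x40

# Address modes named in the commit message.
AUTONOMOUS_MODES = {"slaac", "dhcp_stateless"}
ALL_MODES = AUTONOMOUS_MODES | {"dhcp_stateful"}


def autonomous_flag(address_mode: str) -> int:
    """Return the A flag bit to place in the prefix option for this mode."""
    if address_mode not in ALL_MODES:
        raise ValueError("unknown address mode: %r" % address_mode)
    return ND_PREFIX_AUTONOMOUS if address_mode in AUTONOMOUS_MODES else 0
```

In this model the prefix option is always emitted; only the A bit varies, so a dhcp_stateful client still learns the prefix length without self-assigning an address.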
* OVN: Use offset instead of pointer into ofpbufMark Michelson2019-03-251-2/+6
  In general, maintaining a pointer into an ofpbuf is risky. As the ofpbuf grows, it can reallocate its data. If this happens, then pointers into the data will become invalid. A safer practice is to track an offset into the ofpbuf's data where a structure you are interested in is kept. This way, if the ofpbuf data is reallocated, you can find your structure again by using the offset.

  In practice, this patch is not fixing any issues with OVN. Even though the ra pointer is pointing to ofpbuf data that can be reallocated, it will never actually happen. ovn-northd and all test cases always encode the address mode first, meaning we will only ever read from the ra pointer before the ofpbuf has a chance to expand.

  However, this base work is essential for an upcoming patch in this series.

  Signed-off-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-controller: Add a new thread in pinctrl module to handle packet-ins.Numan Siddique2019-03-251-136/+526
  Prior to this patch, ovn-controller was single-threaded, and every time the poll_block() at the end of the main while() loop woke up, it processed the whole SB DB and translated the logical flows to OF flows.

  There are a few issues with this:

  * For every packet-in received, ovn-controller does this translation, wasting CPU cycles.

  * If the translation takes a lot of time, packet-in handling is delayed. The delay in responses to DHCP requests could result in these requests being resent.

  This patch addresses these issues by creating a new pthread in the pinctrl module to handle packet-ins. This thread does not access the Southbound DB IDL object.

  Some of the OVN actions, such as dns_lookup, arp, put_arp and put_nd, require access to the Southbound DB contents, and gARPs and periodic IPv6 RA generation also require DB access. Therefore pinctrl_run(), called by the main ovn-controller thread, accesses the Southbound DB IDL and builds the local data structures. The pinctrl_handler thread accesses these data structures when handling such requests. An ovs_mutex shared between pinctrl_run() and the pinctrl_handler thread protects these data structures.

  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
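The division of labour between the main thread and the handler thread can be sketched in Python. This is an analogy only: the actual code is C using pthreads and an ovs_mutex, and the class, its methods, and the DNS-style cache below are invented for illustration.

```python
import queue
import threading


class PinctrlSketch:
    """Main thread fills a local cache; a handler thread serves packet-ins."""

    def __init__(self):
        self._lock = threading.Lock()      # plays the role of the ovs_mutex
        self._dns_cache = {}               # local copy of DB-derived data
        self._packets = queue.Queue()      # stand-in for incoming packet-ins
        self.replies = []
        self._thread = threading.Thread(target=self._handler, daemon=True)
        self._thread.start()

    def run(self, sb_dns_records):
        """Called from the main thread, which alone reads the database."""
        with self._lock:
            self._dns_cache = dict(sb_dns_records)

    def packet_in(self, name):
        self._packets.put(name)

    def _handler(self):
        """Handler thread: never touches the database, only the cache."""
        while True:
            name = self._packets.get()
            if name is None:               # shutdown marker
                break
            with self._lock:
                answer = self._dns_cache.get(name)
            self.replies.append((name, answer))

    def stop(self):
        self._packets.put(None)
        self._thread.join()
```

The point of the design is that a slow database pass in the main thread no longer delays replies, because the handler thread only needs the lock for a short cache lookup.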
* ovn pinctrl: Pass 'struct rconn *swconn' to all the functions which use itNuman Siddique2019-03-251-78/+106
  In pinctrl.c, many functions use the 'swconn' variable, which is declared as a global static. This patch passes 'swconn' as a parameter to these functions instead.

  This will help in an upcoming patch which processes packet-ins in a separate pthread.

  Suggested-by: Mark Michelson <mmichels@redhat.com>
  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-ctl: Make sure OVN_RUNDIR is created for central nodes.Han Zhou2019-03-221-0/+1
  When ovn-ctl tries to start ovsdb, it does not ensure that the rundir (e.g. /var/run/openvswitch) exists, because it does not call start_daemon(). Usually, if OVS was started earlier by ovs-ctl on the same node, the folder has already been created. However, on an OVN central node, OVS is usually not needed. If the folder has not been created (a common case after a system restart, because /var/run is usually tmpfs), ovn-ctl fails to start ovsdb.

  This patch ensures that OVN_RUNDIR is always created.

  Signed-off-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-ctl: Unify OVN_RUNDIR usage.Han Zhou2019-03-221-16/+17
  In this script, $rundir and $OVN_RUNDIR are used interchangeably, which can cause different folders to be used for different runtime files. This patch unifies the usage to the correct one.

  Signed-off-by: Han Zhou <hzhou8@ebay.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-nbctl: Don't segfault when ovn-northd doesn't configure dynamic addresses.Justin Pettit2019-03-131-1/+1
  When ovn-nbctl is used to configure a logical switch port's addresses, it does a sanity check to make sure that a duplicate address isn't being used. If a port is configured as "dynamic", ovn-northd is supposed to populate the "dynamic_addresses" column in the Logical_Switch_Port table. If it doesn't, ovn-nbctl would dereference a null pointer as part of the duplicate address check. This patch checks that "dynamic_addresses" is actually set first.

  Signed-off-by: Justin Pettit <jpettit@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* OVN: Add support for DHCP option 150 - TFTP server addressLucas Alvares Gomes2019-03-073-0/+12
  OpenStack Ironic relies on a few DHCP options [0] that were not yet supported in OVN. This patch adds the last one, option 150 (TFTP server address, RFC 5859 [1]).

  Note that this option is Cisco proprietary; the IETF standard option that covers this requirement is option 66. The difference is that option 150 allows multiple IPs to be specified, while option 66 allows only one.

  [0] https://github.com/openstack/ironic/blob/3f6d4c6a789b12512d6cc67cdbc93ba5fbf29848/ironic/common/pxe_utils.py#L44-L54
  [1] https://tools.ietf.org/html/rfc5859

  Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: Add support for DHCP option 210 - path prefixLucas Alvares Gomes2019-03-073-0/+10
  OpenStack Ironic relies on a few DHCP options [0] that are not yet supported in OVN. One of them is option 210 (PATH PREFIX, RFC 5071 [1]).

  [0] https://github.com/openstack/ironic/blob/3f6d4c6a789b12512d6cc67cdbc93ba5fbf29848/ironic/common/pxe_utils.py#L44-L54
  [1] https://tools.ietf.org/html/rfc5071#section-5

  Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: Add port addresses to IPAM after all ports are joined.Mark Michelson2019-03-061-0/+5
  Joining ports involves setting the peer field on ovn_ports. If a switch port is visited, and it is connected to a router port, then the switch port's peer is set to the router port and the router port's peer is set to the switch port.

  A router port's addresses are added to IPAM if it is peered with a switch that has dynamic addressing enabled.

  When visiting ports, if a router port is visited before its connected switch port, then the router port's peer is not set yet. Therefore the router port's addresses cannot be added to IPAM. The result is that duplicate addresses can be assigned by a logical switch.

  The fix for this is to wait until all ports have been joined and then add port addresses to IPAM. This way, we guarantee that all peer assignments have been set, and no duplicate IP addresses may be assigned by a switch.

  Reported-by: James Page <james.page@canonical.com>
  Signed-off-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: select a random mac_prefix if not providedLorenzo Bianconi2019-03-051-19/+17
  Select a random IPAM mac_prefix if one has not been provided by the user. With this patch, the admin no longer needs to configure mac_prefix to avoid L2 address collisions when multiple OVN deployments share the same broadcast domain. Remove the MAC_ADDR_PREFIX definitions/occurrences, since mac_prefix is now always provided to ovn-northd.

  Acked-by: Numan Siddique <nusiddiq@redhat.com>
  Tested-by: Miguel Duarte de Mora Barroso <mdbarroso@redhat.com>
  Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* OVN: update RA next_announce according to {min, max}_interval (Lorenzo Bianconi, 2019-03-04, 1 file, -0/+5)
| | | | | | | | | | | | Update RA next_announce whenever min_interval and/or max_interval are updated in the sbrec_port_binding options. In the current implementation, if ipv6_ra_configs:send_periodic is set to true before ipv6_ra_configs:{min,max}_interval is set, next_announce is computed from the default values and is not updated until the first IPv6 router advertisement is sent. Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Make DHCP log messages unique (Brian Haley, 2019-02-28, 1 file, -7/+8)
| | | | | | | | | Two messages were using the same string; add info to one to make it unique. Also clean up some of the others to make them consistent throughout. Signed-off-by: Brian Haley <haleyb.dev@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-nbctl: Add lsp-get-ls command (Lucas Alvares Gomes, 2019-02-27, 2 files, -0/+31)
| | | | | | | | | | | | This commit adds the following command: lsp-get-ls: Get the logical switch which the port belongs to. This command is handy for scripting since there's no logical switch id in the Logical_Switch_Port table. Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
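The reason such a lookup is awkward to script by hand is that the reference points from the switch to its ports and not the other way round. A minimal sketch of the reverse lookup, using a hypothetical in-memory dictionary in place of the Northbound database:

```python
# Hypothetical stand-in for the Logical_Switch table: each switch lists
# its ports, but a port carries no reference back to its switch.
logical_switches = {
    "sw0": ["sw0-port1", "sw0-port2"],
    "sw1": ["sw1-port1"],
}


def lsp_get_ls(port_name, switches):
    """Return the name of the switch that owns port_name, or None."""
    for switch_name, ports in switches.items():
        if port_name in ports:
            return switch_name
    return None


owner = lsp_get_ls("sw1-port1", logical_switches)
```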
* ovn: update ovn-ctl usage with status, promote and demote commands (Moshe Levi, 2019-02-26, 1 file, -0/+8)
| | | | | Signed-off-by: Moshe Levi <moshele@mellanox.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-controller: Provide the option to set the datapath-type of br-int (Numan Siddique, 2019-02-22, 2 files, -1/+18)
| | | | | | | | | | | | If the integration bridge is deleted, ovn-controller recreates it, but the previous datapath-type value is lost if it was set. This patch adds code to ovn-controller to set the datapath-type if the user has configured it in 'external_ids:ovn-bridge-datapath-type' of the Open_vSwitch table. Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-nbctl: Daemon mode should retry when IDL connection lost. (Han Zhou, 2019-02-22, 1 file, -4/+8)
| | | | | | | | | | | | | | | When creating the IDL, "retry" was set to false. However, in daemon mode, reconnecting upon DB server failure should be transparent to the user. This even impacts HA mode. E.g. in clustered mode, the IDL does try to connect to the next server, but at the first retry the server fail-over may not have completed yet, and the IDL stops retrying after N attempts (N = number of remotes). This patch makes sure that in daemon mode retry is set to true, so that the daemon automatically retries forever. Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
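The difference between the two retry behaviours can be sketched with a toy connection loop. The function `connect` and the `try_remote` callback are hypothetical names for illustration and do not correspond to the OVSDB IDL API; a real client would also wait between attempts.

```python
import itertools


def connect(remotes, try_remote, retry):
    """Try remotes in order and return (remote, attempts).

    With retry=False each remote is tried once, so the loop gives up
    after N attempts. With retry=True the list is cycled until a
    remote accepts the connection.
    """
    candidates = itertools.cycle(remotes) if retry else iter(remotes)
    for attempt, remote in enumerate(candidates, start=1):
        if try_remote(remote, attempt):
            return remote, attempt
    return None, len(remotes)


def failover_done_after_four(remote, attempt):
    """Simulated cluster whose fail-over completes only after four attempts."""
    return attempt >= 5


remotes = ["db1", "db2", "db3"]
one_shot = connect(remotes, failover_done_after_four, retry=False)
daemon = connect(remotes, failover_done_after_four, retry=True)
```

In this simulation the one-shot client exhausts its three remotes before the fail-over finishes, while the retrying client connects on its fifth attempt.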
* Support for multiple VTEP in OVN (venu iyer, 2019-02-22, 9 files, -44/+282)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OVN uses tunnels to achieve logical network connectivity. The tunnel IP to be used when communicating with a node is configured using an external_ids field called "ovn-encap-ip" (and "ovn-encap-type" to indicate the type of tunnel: geneve, vxlan, stt). The fact that "ovn-encap-ip" is a single IP is significantly limiting in certain scenarios. Primarily, if we have multiple NICs on a system and want to assign SR-IOV VFs from different NICs to a guest (as logical ports), then we'll still end up using the single "ovn-encap-ip" to encapsulate traffic from different VFs. This means we'll end up using only one physical NIC, thereby not maintaining the VF-PF association while also not using all the physical NICs. It is possible to bond all the NICs and use the bond IP as the "encap-ip", but bonding multiple NICs has its own limitations, i.e. NICs supporting OVS flow offload don't work with bonding. This severely undermines SR-IOV use with OVS (i.e. if all the processing needs to be done in the host despite giving VFs to guests). [Diagram: Hypervisor I (chassis-ID = HV1) hosts a guest with a VF (vf0) on each of nic1 (pf=IP1) and nic2 (pf=IP2). Each VF's representor (vf0_rep) is plugged into br-int, one with encap-ip=IP1 and one with encap-ip=IP2. On Hypervisor II, br-int has tunnel ports HV1@IP1 and HV1@IP2, the tunnel ports between hypervisors.]
Note: The above uses a NIC that supports OVS with SR-IOV (e.g. Mellanox CX-5), which uses a "representor" to plug a VF into the OVS bridge. This patch enables a list of comma-separated IP addresses to be specified in "ovn-encap-ip", thus allowing the node to be reached via any of those IPs combined with the "ovn-encap-type", assuming physical routing allows that. Additionally, it introduces a way to specify the encap IP to be used for a logical port (so that the VF-PF mapping is maintained when traversing the logical path over a tunnel). A new "encap-ip" external_ids key can be configured on an Interface to indicate this. On the SB these changes appear as an additional column in port_bindings called "encap". The encap record for a port points to an encap record on its chassis. If the port is not explicitly associated with an encap-ip (using external_ids), the encap record is empty, which means the preferred tunnel will be used to reach the port's chassis. The intention is also to have no functional changes in the default case, i.e. when there is only one "ovn-encap-ip". The changes have been tested with multiple encap-ip addresses, with SR-IOV, and for backwards compatibility (in the case where there is only one ovn-encap-ip) with an OVN SB that doesn't include these changes.
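The selection rule described above can be sketched as follows. The helper names `parse_encap_ips` and `encap_for_port` are hypothetical, and returning `None` to stand for "empty encap record, use the preferred tunnel" is a modelling assumption, not the ovn-controller implementation.

```python
def parse_encap_ips(value):
    """Split a comma-separated "ovn-encap-ip" value into a list of IPs."""
    return [ip.strip() for ip in value.split(",") if ip.strip()]


def encap_for_port(chassis_encap_ips, port_encap_ip=None):
    """Pick the encap IP for a logical port.

    Returns the port's own "encap-ip" when it is set and is one of the
    chassis encap IPs. Returns None otherwise, which models an empty
    encap record (the preferred tunnel to the chassis is used).
    """
    if port_encap_ip and port_encap_ip in chassis_encap_ips:
        return port_encap_ip
    return None


encaps = parse_encap_ips("192.0.2.1, 192.0.2.2")
vf_on_nic2 = encap_for_port(encaps, "192.0.2.2")
unpinned = encap_for_port(encaps)
```

With a single address in "ovn-encap-ip" and no per-port setting, every port falls into the `None` case, which matches the stated goal of no functional change in the default configuration.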
* Initialize the right database. (Ted Elhourani, 2019-02-14, 1 file, -1/+1)
| | | | | | | Use the value of the db parameter to initialize the correct database. Signed-off-by: Ted Elhourani <ted.elhourani@nutanix.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: change load balancer references to weak in NB schema (Daniel Alvarez, 2019-02-11, 1 file, -4/+4)
| | | | | | | | | | | | | | | | | | | | When a load balancer is added to multiple logical switches and routers, it has to be removed from all of them before it can be deleted, due to the current 'strong' reference in the NB schema. By changing it to 'weak', users can simply remove the load balancer without having to remove all the references manually. In particular, this will make things easier for networking-ovn, the OpenStack integration project, as it'll save some calculations upon load balancer deletion. The update path has been successfully tested from the previous version of the schema. Acked-by: Lucas Alvares Gomes <lucasagomes@gmail.com> Signed-off-by: Daniel Alvarez <dalvarez@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-controller: Fix chassisredirect port flapping when ovs-vswitchd crashes (Numan Siddique, 2019-02-04, 3 files, -1/+21)
| | | | | | | | | | | | | | | | | | | | | | | When ovs-vswitchd crashes on a chassis for some reason, the BFD status doesn't get updated in the OVS DB. ovn-controller keeps reading the old BFD status even though ovs-vswitchd has crashed. If ovs-vswitchd crashes on the master chassis, this results in the chassisredirect port claim flapping between the master chassis and the chassis with the next higher priority. All the other chassis notice that the BFD status with the master chassis is down, and hence the chassis with the next higher priority claims the port. But according to the master chassis, the BFD status is fine, and it claims the chassisredirect port back. This results in flapping. The issue resolves itself when ovs-vswitchd comes back, but until then it leads to a lot of SB DB transactions and high CPU usage in the ovn-controllers. This patch fixes the issue by checking the OpenFlow connection status of ovn-controller with ovs-vswitchd and calculating the active BFD tunnels only if it is connected. Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
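The guard can be sketched as a single check before the BFD status is consulted. The function name `active_bfd_tunnels` and the choice to return an empty set when disconnected are assumptions made for this illustration; the commit message only states that the active tunnels are calculated when the connection is up.

```python
def active_bfd_tunnels(of_connected, bfd_status):
    """Return the set of tunnels that BFD reports as up.

    bfd_status maps a tunnel name to True when its BFD session is up.
    If the OpenFlow connection to ovs-vswitchd is down, the recorded
    status may be stale, so it is ignored (assumption: no tunnel is
    then treated as active).
    """
    if not of_connected:
        return set()
    return {name for name, is_up in bfd_status.items() if is_up}


stale_status = {"ovn-hv2-0": True, "ovn-hv3-0": False}
while_crashed = active_bfd_tunnels(False, stale_status)
while_running = active_bfd_tunnels(True, stale_status)
```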
* ovn-nb.xml: Minor documentation corrections. (Ben Pfaff, 2019-01-17, 1 file, -2/+2)
| | | | | Acked-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Add DHCP support for option 67 - bootfile name (Numan Siddique, 2019-01-16, 3 files, -0/+9)
| | | | | | Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Add port addresses to IPAM later. (Mark Michelson, 2019-01-16, 1 file, -2/+2)
| | | | | | | | | | | ipam_add_port_addresses() needs to be called after the peer field is set on the ovn_port structures. This way, addresses taken by peered router ports will be added to the logical switch's IPAM and therefore will be barred from assignment to other ports. Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com> Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn: Clear dynamic_addresses when addresses are not "dynamic" (Mark Michelson, 2019-01-16, 1 file, -1/+1)
| | | | | | | | | When a logical switch port changes to no longer use "dynamic" addresses, then the dynamic_addresses should be cleared. Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com> Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* vconn: Allow timeout configuration for blocking connection. (Ilya Maximets, 2019-01-10, 2 files, -2/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some systems, when the remote is not responding, a socket can remain in the SYN_SENT state for a really long time without errors, waiting for the connection. This leads to situations where a vconn connection hangs for a few minutes waiting for a connection to the DOWN remote. For example, this situation is emulated by the "refuse-connection" vconn testcase. This leads to test failures because the Alarm signal arrives much faster than ETIMEDOUT from the socket: ./vconn.at:21: ovstest test-vconn refuse-connection tcp Alarm clock stderr: |socket_util|INFO|0:127.0.0.1: listening on port 63812 |poll_loop|DBG|wakeup due to 0-ms timeout |poll_loop|DBG|wakeup due to 10155-ms timeout |fatal_signal|WARN|terminating with signal 14 (Alarm clock) ./vconn.at:21: exit code was 142, expected 0 vconn.at:21: 535. tcp vconn - refuse connection (vconn.at:21): FAILED This patch allows a timeout value to be specified for vconn blocking connections. If the connection takes more time, the socket will be closed with the ETIMEDOUT error code. A negative value can be used to wait indefinitely. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
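The timeout semantics described above (give up with ETIMEDOUT after a bounded wait, or wait indefinitely for a negative value) can be illustrated with Python's standard socket module. This is an analogy only: `connect_blocking` is a hypothetical helper and not the vconn C API.

```python
import errno
import socket


def connect_blocking(host, port, timeout_s):
    """Connect to host:port and return (sock, 0) or (None, errno).

    A positive timeout_s bounds the connection attempt in seconds and
    yields ETIMEDOUT when it expires. A negative timeout_s waits
    indefinitely, mirroring the convention in the commit message.
    """
    timeout = None if timeout_s < 0 else timeout_s
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return None, errno.ETIMEDOUT
    except OSError as exc:
        return None, exc.errno or errno.ECONNREFUSED
    sock.settimeout(None)  # back to plain blocking I/O once connected
    return sock, 0
```

A caller that previously hung for minutes on a silent remote would, with a positive timeout, get an error code back after at most that many seconds.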
* OVN: add static IP support to IPAM (Lorenzo Bianconi, 2018-12-28, 3 files, -5/+45)
| | | | | | | | | | | | | Add the capability to the IPAM/MACAM framework to specify a static IP address and get the L2 address allocated dynamically, using the following syntax: $ ovn-nbctl lsp-set-addresses <port> "dynamic <IP>" The static IP address needs to belong to the subnet configured for the logical switch. Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
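A minimal sketch of how an address specification of this form could be validated against the switch subnet. The function `parse_dynamic_address` is hypothetical, and raising `ValueError` for an address outside the subnet is an assumption; the commit message does not say how ovn-northd reacts in that case.

```python
import ipaddress


def parse_dynamic_address(spec, subnet):
    """Parse 'dynamic' or 'dynamic <IP>' for a switch with the given subnet.

    Returns None for plain 'dynamic' (IP and MAC both allocated), or the
    requested static IP when it lies inside the subnet.
    """
    tokens = spec.split()
    if not tokens or tokens[0] != "dynamic" or len(tokens) > 2:
        raise ValueError(f"unsupported address spec: {spec!r}")
    if len(tokens) == 1:
        return None
    ip = ipaddress.ip_address(tokens[1])
    if ip not in ipaddress.ip_network(subnet):
        raise ValueError(f"{ip} is outside the switch subnet {subnet}")
    return ip


static_ip = parse_dynamic_address("dynamic 192.168.1.10", "192.168.1.0/24")
fully_dynamic = parse_dynamic_address("dynamic", "192.168.1.0/24")
```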
* OVN: add mac address only support to IPAM/MACAM (Lorenzo Bianconi, 2018-12-28, 2 files, -1/+12)
| | | | | | | | | | | Add the capability to assign just an L2 address through IPAM/MACAM, since in the current implementation either subnet or ipv6_prefix is mandatory to enable IPAM. Tested-by: Yossi Segev <ysegev@redhat.com> Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-sb.ovsschema: Avoid duplicated IPs in Encap table. (Han Zhou, 2018-12-27, 1 file, -3/+4)
| | | | | | | | | | | | | | | | | | | | When adding a new chassis, if an old chassis with the same IP already exists in the Encap table, the addition is allowed today. However, allowing it results in problems: 1. The new chassis cannot work, because none of the other chassis are able to create a tunnel to it, due to the IP conflict with the already existing tunnel to the old chassis. 2. All the other chassis will continuously retry creating the tunnel and complain about the error. So, instead of hiding the problem, it is better to expose it when trying to add the second chassis with the duplicated IP. This patch enforces that in the ovsdb schema. Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
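The effect of enforcing uniqueness at the schema level can be sketched with a toy table that rejects a second row with the same index key. The `EncapTable` class is hypothetical, and indexing on the `ip` column alone is an assumption for illustration; the commit message does not list the exact index columns.

```python
class EncapTable:
    """Toy table with a uniqueness constraint over index_columns."""

    def __init__(self, index_columns=("ip",)):
        self.index_columns = index_columns
        self.rows = []
        self._index = set()

    def insert(self, row):
        """Add a row, or raise ValueError if its index key already exists."""
        key = tuple(row[column] for column in self.index_columns)
        if key in self._index:
            raise ValueError(f"duplicate Encap entry for {key}")
        self._index.add(key)
        self.rows.append(row)


table = EncapTable()
table.insert({"chassis_name": "old-hv", "type": "geneve", "ip": "192.0.2.10"})
try:
    table.insert({"chassis_name": "new-hv", "type": "geneve", "ip": "192.0.2.10"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Rejecting the second insert surfaces the conflict at the moment the new chassis registers, which is the behaviour the commit message argues for.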
* ovn: Fix incorrect comparison of the NB and SB band action (Maks Naumov, 2018-12-27, 1 file, -1/+1)
| | | | Signed-off-by: Ben Pfaff <blp@ovn.org>