path: root/lib/odp-util.c
Commit message (Author, Age, Files, Lines)
* odp: Use struct in6_addr for IPv6 addresses. (Jarno Rajahalme, 2017-01-04, 1 file, -39/+22)
  Code is simplified when the ODP keys use the same type as the struct flow for the IPv6 addresses. As the change is facilitated by extract-odp-netlink-h, this change only affects the userspace. We already do the same for the ethernet addresses.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* mpls: Fix MPLS restoration after patch port and group bucket. (Jarno Rajahalme, 2016-12-02, 1 file, -2/+14)
  This patch fixes problems with MPLS handling related to patch ports and group buckets.

  If a group bucket or a peer bridge across a patch port pushes MPLS headers to a non-MPLS packet and outputs, the flow translation after returning from the group bucket or patch port would undo the packet transformations so that the processing could continue with the packet as it was before entering the patch port. There were two problems with this:

  1. As part of the first MPLS push on a non-MPLS packet, the flow translation would first clear the L3/4 headers of the 'flow' to mark those fields invalid. Later, when committing 'flow' changes to datapath actions before output, the necessary datapath MPLS actions are created and the corresponding changes updated to the 'base flow'. This was done using the same flow_push_mpls() function that clears the L3/4 headers, so the 'base flow' L3/4 headers were cleared as well.

     Then, when translation returns from a patch port or group bucket, the original 'flow' is restored, now showing no sign of the MPLS labels. Since the 'base flow' now has the MPLS labels, following translations know to issue MPLS POP actions before any output actions. However, as part of checking for changes to IP headers we test that the IP protocol type was not changed. But now the 'base flow's 'nw_proto' field is zero and an assert fail crashes OVS.

     This is solved by not clearing the L3/4 fields of the 'base flow'. This allows the processing after the patch port to continue with L3/4 fields as if no MPLS was done, after first issuing the necessary MPLS POP actions.

  2. IP header updates were done before the MPLS POP actions were issued. This caused incorrect packet output after, e.g., group action or patch port. For example, with actions:

       group 1234: all bucket=push_mpls,output:LOCAL
       ip actions=group:1234,dec_ttl,output:LOCAL,output:LOCAL

     the dec_ttl would only be executed before the last output to LOCAL, since at the time of committing IP changes after the group action the packet was still an MPLS packet.

     This is solved by checking the dl_type of both 'flow' and 'base flow' and issuing MPLS actions if they can transform the packet from an MPLS packet to a non-MPLS packet. For an IP packet the change in ttl can then be correctly committed before the last two output actions.

  Two test cases are added to prevent future regressions.

  Reported-by: Thomas Morin <thomas.morin@orange.com>
  Suggested-by: Takashi YAMAMOTO <yamamoto@ovn.org>
  Fixes: 8bfd0fdac ("Enhance userspace support for MPLS, for up to 3 labels.")
  Fixes: 1b035ef20 ("mpls: Allow l3 and l4 actions to prior to a push_mpls action")
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: YAMAMOTO Takashi <yamamoto@ovn.org>
* ofp-actions: Add "ingress" and "egress" options to "sample" action. (Ben Pfaff, 2016-11-30, 1 file, -2/+23)
  Before Open vSwitch 2.5.90, IPFIX reports from Open vSwitch didn't include whether the packet was ingressing or egressing the switch. Starting in OVS 2.5.90, this information was available but only accurate if the action included a port number that indicated a tunnel. Conflating these two does not always make sense (not every packet involves a tunnel!), so this patch makes it possible for the sample action to simply say whether it's for ingress or egress.

  This is difficult to test, since the "tests" directory of OVS does not have a proper IPFIX listener. This passes those tests, plus a couple that just verify that the actions are properly parsed and formatted. Benli did test it end-to-end in a VMware use case.

  Requested-by: Benli Ye <daniely@vmware.com>
  Tested-by: Benli Ye <daniely@vmware.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Simon Horman <simon.horman@netronome.com>
* lib: Use nl_attr_get_odp_port(). (Joe Stringer, 2016-11-16, 1 file, -1/+1)
  This helper is a little tidier than the alternative. Use it treewide.

  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Simon Horman <simon.horman@netronome.com>
* tun-metadata: Manage tunnel TLV mapping table on a per-bridge basis. (Jesse Gross, 2016-09-19, 1 file, -69/+19)
  When using tunnel TLVs (at the moment, this means Geneve options), a controller must first map the class and type onto an appropriate OXM field so that it can be used in OVS flow operations. This table is managed using OpenFlow extensions.

  The original code that added support for TLVs made the mapping table global as a simplification. However, this is not really logically correct as the OpenFlow management commands are operating on a per-bridge basis. This removes the original limitation to make the table per-bridge.

  One nice result of this change is that it is generally clearer whether the tunnel metadata is in datapath or OpenFlow format. Rather than allowing ad-hoc format changes and trying to handle both formats in the tunnel metadata functions, the format is more clearly separated by function. Datapaths (both kernel and userspace) use datapath format and it is not changed during the upcall process. At the beginning of action translation, tunnel metadata is converted to OpenFlow format and flows and wildcards are translated back at the end of the process.

  As an additional benefit, this change improves performance in some flow setup situations by keeping the tunnel metadata in the original packet format in more cases. This helps when copies need to be made as the amount of data touched is only what is present in the packet rather than the maximum amount of metadata supported.

  Co-authored-by: Madhu Challa <challa@noironetworks.com>
  Signed-off-by: Madhu Challa <challa@noironetworks.com>
  Signed-off-by: Jesse Gross <jesse@kernel.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* datapath: backport: libnl: nla_put_be64(): align on a 64-bit area (Pravin B Shelar, 2016-07-17, 1 file, -0/+2)
  Upstream commit:
      commit b46f6ded906ef0be52a4881ba50a084aeca64d7e
      Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>

      libnl: nla_put_be64(): align on a 64-bit area

      nla_data() is now aligned on a 64-bit area.

      A temporary version (nla_put_be64_32bit()) is added for nla_put_net64(). This function is removed in the next patch.

      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
  Acked-by: Jesse Gross <jesse@kernel.org>
* Introduce 128-bit xxregs. (Justin Pettit, 2016-07-12, 1 file, -0/+6)
  These are needed to handle IPv6 addresses.

  Signed-off-by: Justin Pettit <jpettit@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* flow: New function is_nd(). (Ben Pfaff, 2016-07-02, 1 file, -6/+2)
  This simplifies a few pieces of code and will acquire another user in an upcoming commit.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofp-actions: Add truncate action. (William Tu, 2016-06-24, 1 file, -0/+23)
  The patch adds a new action to support packet truncation. The new action is formatted as 'output(port=n,max_len=m)', meaning output to port n with the packet size being MIN(original_size, m).

  One use case is to enable port mirroring to send smaller packets to the destination port so that only useful packet information is mirrored/copied, saving some performance overhead of copying the entire packet payload. An example use case is below, and is also shown in the testcases:

  - Output to port 1 with max_len 100 bytes.
  - The output packet size on port 1 will be MIN(original_packet_size, 100).
      # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'

  - The scope of max_len is limited to the output action itself. The packet sizes of the following output:1 and output:2 will be intact.
      # ovs-ofctl add-flow br0 \
            'actions=output(port=1,max_len=100),output:1,output:2'
  - The datapath actions show:
      # Datapath actions: trunc(100),1,1,2

  Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
  Signed-off-by: William Tu <u9012063@gmail.com>
  Acked-by: Pravin B Shelar <pshelar@ovn.org>
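The max_len semantics described above can be sketched in a few lines of C. This is only a model of the sizing rule, not OVS code: `truncated_size()` and `model_three_outputs()` are hypothetical names, and the three-output case mirrors the `output(port=1,max_len=100),output:1,output:2` example from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OVS function): size of the packet emitted by
 * 'output(port=n,max_len=m)', i.e. MIN(original_size, max_len). */
static size_t
truncated_size(size_t original_size, uint32_t max_len)
{
    return original_size < max_len ? original_size : max_len;
}

/* Models 'output(port=1,max_len=100),output:1,output:2'.  Only the first
 * output is truncated; the two plain outputs see the full packet again,
 * because the scope of max_len is the output action that carries it. */
static void
model_three_outputs(size_t original_size, size_t sizes[3])
{
    assert(sizes != NULL);
    sizes[0] = truncated_size(original_size, 100);
    sizes[1] = original_size;
    sizes[2] = original_size;
}
```

For a 1500-byte packet this model yields sizes 100, 1500 and 1500, which lines up with the datapath actions `trunc(100),1,1,2` quoted above.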
* ipfix: Support tunnel information for Flow IPFIX. (Benli Ye, 2016-06-17, 1 file, -4/+9)
  Add support to export tunnel information for flow-based IPFIX. The original steps to configure flow level IPFIX:

  1) Create a new record in Flow_Sample_Collector_Set table:
     'ovs-vsctl -- create Flow_Sample_Collector_Set id=1 bridge="Bridge UUID"'
  2) Add IPFIX configuration which is referred by corresponding row in Flow_Sample_Collector_Set table:
     'ovs-vsctl -- set Flow_Sample_Collector_Set "Flow_Sample_Collector_Set UUID" ipfix=@i -- --id=@i create IPFIX targets=\"IP:4739\" obs_domain_id=123 obs_point_id=456 cache_active_timeout=60 cache_max_flows=13'
  3) Add sample action to the flows:
     'ovs-ofctl add-flow mybridge in_port=1,actions=sample'('probability=65535,collector_set_id=1,obs_domain_id=123,obs_point_id=456')',output:3'

  NXAST_SAMPLE action was used in step 3. In order to support exporting tunnel information, the NXAST_SAMPLE2 action was added, and with the NXAST_SAMPLE2 action in this patch, step 3 should be configured like below:
     'ovs-ofctl add-flow mybridge in_port=1,actions=sample'('probability=65535,collector_set_id=1,obs_domain_id=123,obs_point_id=456,sampling_port=3')',output:3'

  'sampling_port' can be equal to the ingress port or one of the egress ports. If the sampling port is equal to the output port and the output port is a tunnel port, OVS_USERSPACE_ATTR_EGRESS_TUN_PORT will be set in the datapath flow sample action. When the flow sample action upcall happens, tunnel information will be retrieved from the datapath and then IPFIX can export egress tunnel port information. If sampling_port=65535 (OFPP_NONE), flow-based IPFIX will keep the same behavior as before.

  This patch mainly does four tasks:

  1) Add a new flow sample action NXAST_SAMPLE2 to support exporting tunnel information. The NXAST_SAMPLE2 action has a newly added field 'sampling_port'.
  2) Use 'other_config:enable-tunnel-sampling' to enable or disable exporting tunnel information.
  3) If 'sampling_port' is equal to the output port and the output port is a tunnel port, the translation of the OpenFlow "sample" action should first emit set(tunnel(...)), then the sample action itself. This makes sure the egress tunnel information can be sampled.
  4) Add a test of flow-based IPFIX for tunnel set.

  How to test flow-based IPFIX:

  1) Set up a test environment with two Linux hosts with Docker support.
  2) Create a Docker container and a GRE tunnel port on each host.
  3) Use ovs-docker to add the container on the bridge.
  4) Listen on port 4739 on the collector machine and use wireshark to filter 'cflow' packets.
  5) Configure flow-based IPFIX:
     - 'ovs-vsctl -- create Flow_Sample_Collector_Set id=1 bridge="Bridge UUID"'
     - 'ovs-vsctl -- set Flow_Sample_Collector_Set "Flow_Sample_Collector_Set UUID" ipfix=@i -- --id=@i create IPFIX \
           targets=\"IP:4739\" cache_active_timeout=60 cache_max_flows=13 \
           other_config:enable-tunnel-sampling=true'
     - 'ovs-ofctl add-flow mybridge in_port=1,actions=sample'('probability=65535,collector_set_id=1,obs_domain_id=123,obs_point_id=456,sampling_port=3')',output:3'
     Note: The in-port is the container port. The output port and sampling_port are both OpenFlow ports and the output port is a GRE tunnel port.
  6) Ping from the container whose host enabled flow-based IPFIX.
  7) Get the IPFIX template packets and IPFIX information packets.

  Signed-off-by: Benli Ye <daniely@vmware.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* odp-util: Remove odp_in_port from struct odp_flow_key_parms. (Jesse Gross, 2016-06-13, 1 file, -3/+3)
  When calling odp_flow_key_from_flow (or _mask), the in_port included as part of the flow is ignored and must be explicitly passed as a separate parameter. This is because the assumption was that the flow's version would often be in OFP format, rather than ODP.

  However, at this point all flows that are ready for serialization in netlink format already have their in_port properly set to ODP format. As a result, every caller needs to explicitly initialize the extra parameter to the value that is in the flow. This switches to just use the value in the flow to simplify things and avoid the possibility of forgetting to initialize the extra parameter.

  Signed-off-by: Jesse Gross <jesse@kernel.org>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* ofproto-dpif-xlate: Fix IGMP megaflow matching. (Ben Pfaff, 2016-05-20, 1 file, -2/+2)
  IGMP translation wasn't setting enough bits in the wildcards to ensure different packets were handled differently.

  Reported-by: "O'Reilly, Darragh" <darragh.oreilly@hpe.com>
  Reported-at: http://openvswitch.org/pipermail/discuss/2016-April/021036.html
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* util: Pass 128-bit arguments directly instead of using pointers. (Justin Pettit, 2016-05-08, 1 file, -4/+4)
  Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.) introduced the ovs_u128_and() function. It directly takes ovs_u128 values as arguments instead of pointers to them. As this is a bit more direct way to deal with 128-bit values, modify the other utility functions to do the same.

  Signed-off-by: Justin Pettit <jpettit@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* hmap: Add HMAP_FOR_EACH_POP. (Daniele Di Proietto, 2016-04-26, 1 file, -4/+3)
  Makes popping each member of the hmap a bit easier.

  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
* odp-util: Fix build warning on flags_mask. (antonio.fischetti@intel.com, 2016-04-22, 1 file, -1/+1)
  Fix build warning: 'flags_mask' may be used uninitialized.

  Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* Move lib/ofpbuf.h to include/openvswitch directory (Ben Warren, 2016-03-30, 1 file, -1/+1)
  Signed-off-by: Ben Warren <ben@skyportsystems.com>
  Acked-by: Ryan Moats <rmoats@us.ibm.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* Move lib/dynamic-string.h to include/openvswitch directory (Ben Warren, 2016-03-19, 1 file, -1/+1)
  Signed-off-by: Ben Warren <ben@skyportsystems.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* odp-util: Use FLOW_MAX_MPLS_LABELS when parsing MPLS ODP keys. (Jarno Rajahalme, 2016-02-29, 1 file, -1/+1)
  Even though the number of supported MPLS labels may vary between a datapath and the OVS userspace, it is better to use the FLOW_MAX_MPLS_LABELS than a hard-coded '3' as the maximum number of labels to scan.

  Requested-by: Ben Pfaff <blp@ovn.org>
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
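As a rough illustration of scanning a label stack against a named limit rather than a literal 3, the sketch below walks MPLS label stack entries until the bottom-of-stack bit or the cap is reached. It is not the odp-util.c code: `MAX_MPLS_LABELS_SKETCH` stands in for FLOW_MAX_MPLS_LABELS, the function names are hypothetical, and entries are assumed to be in host byte order with the RFC 3032 layout (20-bit label, 3-bit TC, 1-bit S, 8-bit TTL).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MPLS_LABELS_SKETCH 3   /* Stand-in for FLOW_MAX_MPLS_LABELS. */

/* Label value of a host-byte-order label stack entry (top 20 bits). */
static uint32_t
mpls_lse_label(uint32_t lse)
{
    return lse >> 12;
}

/* Bottom-of-stack (S) bit of a host-byte-order label stack entry. */
static int
mpls_lse_bos(uint32_t lse)
{
    return (lse >> 8) & 1;
}

/* Returns how many entries would be scanned: scanning stops after the entry
 * with the S bit set, when the input runs out, or at the named cap. */
static size_t
count_mpls_labels(const uint32_t *lses, size_t n_avail)
{
    size_t n = 0;

    assert(lses != NULL || n_avail == 0);
    while (n < n_avail && n < MAX_MPLS_LABELS_SKETCH) {
        if (mpls_lse_bos(lses[n++])) {
            break;
        }
    }
    return n;
}
```

Keeping the limit in one macro means a stack deeper than the cap is cut off at the same depth everywhere the constant is used.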
* odp-util: Format and scan multiple MPLS labels. (Jarno Rajahalme, 2016-02-24, 1 file, -21/+56)
  So far we have been limited to including only one MPLS label in the textual datapath flow format. Allow up to 3 labels to be included so that testing with multiple labels becomes easier.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* odp-util: Fix formatting and parsing of 'frag' in tnl_push ipv4 argument. (Ben Pfaff, 2016-02-01, 1 file, -2/+4)
  ip_frag_off is an ovs_be16 so it must be converted between host and network byte order for parsing and formatting.

  Reported-by: Dimitri John Ledkov <xnox@ubuntu.com>
  Reported-at: http://openvswitch.org/pipermail/discuss/2016-January/020072.html
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Dimitri John Ledkov <xnox@ubuntu.com>
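The byte-order point can be shown with a small round trip. This is a sketch under assumptions, not the odp-util.c code: `format_frag()` and `scan_frag()` are hypothetical helpers, and the "frag=0x..." text form is assumed for illustration only. What matters is that the stored field is in network byte order while the printed and parsed number is in host order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a network-byte-order ip_frag_off as text.  The value is converted
 * to host order first so the printed number is the same on every CPU. */
static int
format_frag(uint16_t ip_frag_off_be, char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len, "frag=0x%x", (unsigned int) ntohs(ip_frag_off_be));
}

/* Parses the text form back.  The parsed host-order number is converted to
 * network order before being stored.  Returns 1 on success, 0 on failure. */
static int
scan_frag(const char *s, uint16_t *ip_frag_off_be)
{
    unsigned int host;

    if (sscanf(s, "frag=0x%x", &host) != 1 || host > UINT16_MAX) {
        return 0;
    }
    *ip_frag_off_be = htons((uint16_t) host);
    return 1;
}
```

Skipping either conversion would still appear to work on a big-endian host, which is why this class of bug tends to surface only on little-endian machines.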
* hash: New helper functions hash_bytes32() and hash_bytes64(). (Ben Pfaff, 2016-01-20, 1 file, -3/+2)
  All of the callers of hash_words() and hash_words64() actually find it easier to pass in the number of bytes instead of the number of 32-bit or 64-bit words. These new functions allow the callers to be a little simpler.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Jarno Rajahalme <jarno@ovn.org>
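The idea of a byte-count wrapper around a word-count hash can be sketched as follows. The hash itself is a toy stand-in and is not the algorithm OVS uses; `toy_hash_words()` and `toy_hash_bytes32()` are hypothetical names chosen to mirror hash_words() and hash_bytes32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy word hash standing in for hash_words(); NOT the real OVS algorithm. */
static uint32_t
toy_hash_words(const uint32_t *p, size_t n_words, uint32_t basis)
{
    uint32_t h = basis ^ 2166136261u;

    for (size_t i = 0; i < n_words; i++) {
        h = (h ^ p[i]) * 16777619u;
    }
    return h;
}

/* Byte-count wrapper in the spirit of hash_bytes32(): the caller passes
 * sizeof its key and the division by the word size happens in one place. */
static uint32_t
toy_hash_bytes32(const uint32_t *p, size_t n_bytes, uint32_t basis)
{
    assert(n_bytes % sizeof *p == 0);
    return toy_hash_words(p, n_bytes / sizeof *p, basis);
}
```

A caller holding `uint32_t key[3]` can then write `toy_hash_bytes32(key, sizeof key, basis)` instead of computing `sizeof key / sizeof key[0]` at every call site.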
* odp-util: Accept fields with zero mask (Haggai Eran, 2016-01-19, 1 file, -6/+3)
  It is possible to pass some fields to the kernel with a zero mask, but ovs-dpctl doesn't currently allow it. Change the code to allow it to mimic what vswitchd is allowed to do.

  Signed-off-by: Haggai Eran <haggaie@mellanox.com>
  Signed-off-by: Jesse Gross <jesse@kernel.org>
* odp-util: Fix memory leak reported by valgrind. (William Tu, 2016-01-04, 1 file, -9/+13)
  Test case: OVS datapath key parsing and formatting (377)
  Return without freeing buf:
      xmalloc(util.c:112)
      ofpbuf_init(ofpbuf.c:124)
      parse_odp_userspace_action(odp-util.c:987)
      parse_odp_action(odp-util.c:1552)
      odp_actions_from_string(odp-util.c:1721)
      parse_actions(test-odp.c:132)

  Test case: OVS datapath actions parsing and formatting (380)
  Exit without uninit in test-odp.c
      xrealloc(util.c:123)
      ofpbuf_resize__(ofpbuf.c:243)
      ofpbuf_put_uninit(ofpbuf.c:364)
      nl_msg_put_uninit(netlink.c:178)
      nl_msg_put_unspec_uninit(netlink.c:216)
      nl_msg_put_unspec(netlink.c:243)
      parse_odp_key_mask_attr(odp-util.c:3974)
      odp_flow_from_string(odp-util.c:4151)
      parse_keys(test-odp.c:49)
      test_odp_main(test-odp.c:237)
      ovstest_wrapper_test_odp_main__(test-odp.c:251)
      ovs_cmdl_run_command(command-line.c:121)
      main(ovstest.c:132)

  Signed-off-by: William Tu <u9012063@gmail.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Co-authored-by: Daniele Di Proietto <diproiettod@vmware.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* odp-util: Correctly [de]serialize mask for ND attributes. (Daniele Di Proietto, 2015-12-10, 1 file, -4/+13)
  When converting between ODP attributes and struct flow_wildcards, we check that all the prerequisites are exact matched on the mask.

  For ND (ICMPv6) attributes, an exact match on tp_src and tp_dst (which in this context are the ICMP type and code) should look like htons(0xff), not htons(0xffff). Fix this in two places.

  The consequences were that the ODP mask wouldn't include the ND attributes and the flow would be deleted by the revalidation.
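A minimal sketch of the distinction, assuming only what the message states: ICMP type and code are 8-bit values carried in the 16-bit tp_src and tp_dst members, so their exact-match mask is htons(0xff). The type and function names below are hypothetical and are not the odp-util.c helpers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t be16_sketch;   /* 16-bit value in network byte order. */

/* Exact-match test for a real TCP/UDP port: all 16 bits must be set. */
static bool
port_mask_is_exact(be16_sketch mask)
{
    return mask == htons(0xffff);
}

/* Exact-match test for an ICMP type or code stored in tp_src/tp_dst: only
 * the low 8 bits carry data, so the exact mask is htons(0xff). */
static bool
icmp_mask_is_exact(be16_sketch mask)
{
    return mask == htons(0xff);
}

/* ND prerequisite check: ICMPv6 type (tp_src) and code (tp_dst) both exact. */
static bool
nd_prereqs_exact(be16_sketch tp_src_mask, be16_sketch tp_dst_mask)
{
    return icmp_mask_is_exact(tp_src_mask) && icmp_mask_is_exact(tp_dst_mask);
}
```

Applying the port test to an ICMP mask of htons(0xff) returns false, which corresponds to the failure mode described above: the prerequisites look inexact and the ND attributes are left out of the mask.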
* odp-util: Return exact mask if netlink mask attribute is missing. (Daniele Di Proietto, 2015-12-10, 1 file, -6/+29)
  In the ODP context an empty mask netlink attribute usually means that the flow should be an exact match.

  odp_flow_key_to_mask{,_udpif}() instead return a struct flow_wildcards with matches only on recirc_id and vlan_tci.

  A more appropriate behavior is to handle a missing (zero length) netlink mask specially (like we do in userspace and Linux datapath) and create an exact match flow_wildcards from the original flow.

  This fixes a bug in revalidate_ukey(): every flow created with megaflows disabled would be revalidated away, because the mask would seem too generic. (Another possible fix would be to handle the special case of a missing mask in revalidate_ukey(), but this seems a more generic solution).
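The special case can be modelled with a toy flow structure. This is a simplified sketch, not the odp_flow_key_to_mask() logic: `struct toy_flow` and `toy_mask_from_attr()` are hypothetical, and the sketch sets every field to all-ones, whereas the commit builds the exact-match wildcards from the original flow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy flow with three fields; the real struct flow is far larger. */
struct toy_flow {
    uint32_t recirc_id;
    uint16_t vlan_tci;
    uint16_t tp_src;
};

/* Builds wildcards from a serialized mask.  A missing (zero-length) mask is
 * treated as "exact match on everything" instead of being parsed as an
 * empty, almost fully wildcarded mask. */
static void
toy_mask_from_attr(const void *mask_attr, size_t mask_len, struct toy_flow *wc)
{
    if (mask_len == 0) {
        memset(wc, 0xff, sizeof *wc);
    } else {
        assert(mask_len == sizeof *wc);
        memcpy(wc, mask_attr, sizeof *wc);
    }
}
```

With this rule, a flow installed without a mask compares as fully specified, so a check for overly generic masks no longer rejects it.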
* odp-util: Commit ICMP set only for ICMP packets. (Daniele Di Proietto, 2015-12-10, 1 file, -2/+8)
  commit_set_icmp_action() should do its job only if the packet is ICMP, otherwise there will be two problems:

  * A set ICMP action will be inserted in the ODP actions and the flow will be slow pathed.
  * The tp_src and tp_dst field will be unwildcarded.

  Normal TCP or UDP packets won't be impacted, because commit_set_icmp_action() is called after commit_set_port_action() and it will see the fields as already committed (TCP/UDP transport ports and ICMP code/type are stored in the same members in struct flow).

  MPLS packets though will hit the bug, causing a nonsensical set action (which will end up zeroing the transport source port) and an invalid mask to be generated.

  The commit also alters an MPLS testcase to trigger the bug.
* odp-util: Consider NAT bits in conversions and format. (Daniele Di Proietto, 2015-12-04, 1 file, -0/+16)
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* netdev-vport: Add IPv6 support for build/push/pop tunnel header (Thadeu Lima de Souza Cascardo, 2015-12-04, 1 file, -33/+74)
  This includes VXLAN, GRE and Geneve.

  Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* odp-util: Limit scope of vlan in format_odp_action(). (Simon Horman, 2015-12-01, 1 file, -3/+3)
  Limit the scope of the local vlan variable in format_odp_action() to where it is used. This is consistent with the treatment of mpls in the same function.

  Signed-off-by: Simon Horman <simon.horman@netronome.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* tunneling: extend flow_tnl with ipv6 addresses (Jiri Benc, 2015-11-30, 1 file, -4/+35)
  Note that because there's been no prerequisite on the outer protocol, we cannot add it now. Instead, treat the ipv4 and ipv6 dst fields in the way that either both are null, or at most one of them is non-null.

  [cascardo: abstract testing either dst with flow_tnl_dst_is_set]
  cascardo: using IPv4-mapped address is an exercise for the future, since this would require special handling of MFF_TUN_SRC and MFF_TUN_DST and OpenFlow messages.

  Signed-off-by: Jiri Benc <jbenc@redhat.com>
  Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
  Co-authored-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* lib: add format_in6_addr and scan_in6_addr (Jiri Benc, 2015-11-30, 1 file, -6/+22)
  Add in6_addr counterparts to the existing format and scan functions. Otherwise we'd need to recast all the time.

  Signed-off-by: Jiri Benc <jbenc@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* conntrack: Add support for NAT. (Jarno Rajahalme, 2015-11-25, 1 file, -15/+351)
  Extend OVS conntrack interface to cover NAT. New nested NAT action may be included with a CT action. A bare NAT action only mangles existing connections. If a NAT action with src or dst range attribute is included, new (non-committed) connections are mangled according to the NAT attributes.

  This work extends on a branch by Thomas Graf at https://github.com/tgraf/ovs/tree/nat.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* byte-order: Make hton128() and ntoh128() behave like their counterparts. (Justin Pettit, 2015-11-24, 1 file, -4/+4)
  Instead of taking the source and destination as arguments, make these functions act like their short and long counterparts.

  Signed-off-by: Justin Pettit <jpettit@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
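The calling convention being described, where the converted value is returned rather than written through a destination pointer, can be sketched like this. The types and function names are hypothetical stand-ins and do not reproduce the OVS ovs_u128 or ovs_be128 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for host-order and network-order 128-bit types. */
typedef struct { uint64_t hi, lo; } u128_sketch;
typedef struct { uint8_t bytes[16]; } be128_sketch;

static_assert(sizeof(be128_sketch) == 16, "be128_sketch must be 128 bits");

/* Returns 'x' in network byte order, in the style of htons() and htonl(). */
static be128_sketch
hton128_sketch(u128_sketch x)
{
    be128_sketch out;

    for (int i = 0; i < 8; i++) {
        out.bytes[i] = (uint8_t) (x.hi >> (56 - 8 * i));
        out.bytes[8 + i] = (uint8_t) (x.lo >> (56 - 8 * i));
    }
    return out;
}

/* Returns 'x' converted back to host representation. */
static u128_sketch
ntoh128_sketch(be128_sketch x)
{
    u128_sketch out = { 0, 0 };

    for (int i = 0; i < 8; i++) {
        out.hi = (out.hi << 8) | x.bytes[i];
        out.lo = (out.lo << 8) | x.bytes[8 + i];
    }
    return out;
}
```

Returning by value lets the calls nest in expressions, for example `ntoh128_sketch(hton128_sketch(v))`, just as `ntohl(htonl(v))` does for 32-bit values.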
* vswitchd: Allow modifying ICMP type and code. (Justin Pettit, 2015-11-09, 1 file, -3/+41)
  Signed-off-by: Justin Pettit <jpettit@nicira.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
* odp-util: Fix CT action formating. (Jarno Rajahalme, 2015-10-23, 1 file, -0/+1)
  Comma was missing after "label" attribute.

  Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
  Acked-by: Joe Stringer <joestringer@nicira.com>
* Add support for connection tracking helper/ALGs. (Joe Stringer, 2015-10-13, 1 file, -1/+25)
  This patch adds support for specifying a "helper" or ALG to assist connection tracking for protocols that consist of multiple streams. Initially, only support for FTP is included.

  Below is an example set of flows to allow FTP control connections from port 1->2 to establish active data connections in the reverse direction:

      table=0,priority=1,action=drop
      table=0,arp,action=normal
      table=0,in_port=1,tcp,action=ct(alg=ftp,commit),2
      table=0,in_port=2,tcp,ct_state=-trk,action=ct(table=1)
      table=1,in_port=2,tcp,ct_state=+trk+est,action=1
      table=1,in_port=2,tcp,ct_state=+trk+rel,action=ct(commit),1

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
  Acked-by: Ben Pfaff <blp@nicira.com>
* Add connection tracking label support. (Joe Stringer, 2015-10-13, 1 file, -1/+115)
  This patch adds a new 128-bit metadata field to the connection tracking interface. When a label is specified as part of the ct action and the connection is committed, the value is saved with the current connection. Subsequent ct lookups with the table specified will expose this metadata as the "ct_label" field in the flow.

  For example, to allow new TCP connections from port 1->2 and only allow established connections from port 2->1, and to associate a label with those connections:

      table=0,priority=1,action=drop
      table=0,arp,action=normal
      table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_label)),2
      table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
      table=1,in_port=2,ct_state=+trk,ct_label=1,tcp,action=1

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
  Acked-by: Ben Pfaff <blp@nicira.com>
* Add connection tracking mark support. (Joe Stringer, 2015-10-13, 1 file, -2/+54)
  This patch adds a new 32-bit metadata field to the connection tracking interface. When a mark is specified as part of the ct action and the connection is committed, the value is saved with the current connection. Subsequent ct lookups with the table specified will expose this metadata as the "ct_mark" field in the flow.

  For example, to allow new TCP connections from port 1->2 and only allow established connections from port 2->1, and to associate a mark with those connections:

      table=0,priority=1,action=drop
      table=0,arp,action=normal
      table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2
      table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1)
      table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
  Acked-by: Ben Pfaff <blp@nicira.com>
* Add support for connection tracking. (Joe Stringer, 2015-10-13, 1 file, -0/+296)
  This patch adds a new action and fields to OVS that allow connection tracking to be performed. This support works in conjunction with the Linux kernel support merged into the Linux-4.3 development cycle.

  Packets have two possible states with respect to connection tracking: Untracked packets have not previously passed through the connection tracker, while tracked packets have previously been through the connection tracker. For OpenFlow pipeline processing, untracked packets can become tracked, and they will remain tracked until the end of the pipeline. Tracked packets cannot become untracked.

  Connections can be unknown, uncommitted, or committed. Packets which are untracked have unknown connection state. To know the connection state, the packet must become tracked. Uncommitted connections have no connection state stored about them, so it is only possible for the connection tracker to identify whether they are a new connection or whether they are invalid. Committed connections have connection state stored beyond the lifetime of the packet, which allows later packets in the same connection to be identified as part of the same established connection, or related to an existing connection - for instance ICMP error responses.

  The new 'ct' action transitions the packet from "untracked" to "tracked" by sending this flow through the connection tracker. The following parameters are supported initially:

  - "commit": When commit is executed, the connection moves from uncommitted state to committed state. This signals that information about the connection should be stored beyond the lifetime of the packet within the pipeline. This allows future packets in the same connection to be recognized as part of the same "established" (est) connection, as well as identifying packets in the reply (rpl) direction, or packets related to an existing connection (rel).
  - "zone=[u16|NXM]": Perform connection tracking in the zone specified. Each zone is an independent connection tracking context. When the "commit" parameter is used, the connection will only be committed in the specified zone, and not in other zones. This is 0 by default.
  - "table=NUMBER": Fork pipeline processing in two. The original instance of the packet will continue processing the current actions list as an untracked packet. An additional instance of the packet will be sent to the connection tracker, which will be re-injected into the OpenFlow pipeline to resume processing in the specified table, with the ct_state and other ct match fields set. If the table is not specified, then the packet is submitted to the connection tracker, but the pipeline does not fork and the ct match fields are not populated. It is strongly recommended to specify a table later than the current table to prevent loops.

  When the "table" option is used, the packet that continues processing in the specified table will have the ct_state populated. The ct_state may have any of the following flags set:

  - Tracked (trk): Connection tracking has occurred.
  - Reply (rpl): The flow is in the reply direction.
  - Invalid (inv): The connection tracker couldn't identify the connection.
  - New (new): This is the beginning of a new connection.
  - Established (est): This is part of an already existing connection.
  - Related (rel): This connection is related to an existing connection.

  For more information, consult the ovs-ofctl(8) man pages.

  Below is a simple example flow table to allow outbound TCP traffic from port 1 and drop traffic from port 2 that was not initiated by port 1:

      table=0,priority=1,action=drop
      table=0,arp,action=normal
      table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2
      table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1)
      table=1,in_port=2,ct_state=+trk+est,tcp,action=1
      table=1,in_port=2,ct_state=+trk+new,tcp,action=drop

  Based on original design by Justin Pettit, contributions from Thomas Graf and Daniele Di Proietto.

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
  Acked-by: Ben Pfaff <blp@nicira.com>
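The `ct_state=+trk+est` and `ct_state=-trk` matches in the example flows can be read as value/mask pairs over the flag bits: a `+flag` puts the bit in both value and mask, a `-flag` puts it in the mask only. The sketch below shows that reading. The bit positions and all names are illustrative assumptions and should not be taken as the values OVS or the kernel use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments for the ct_state flags listed above. */
enum {
    CT_NEW_SK = 1 << 0,
    CT_EST_SK = 1 << 1,
    CT_REL_SK = 1 << 2,
    CT_RPL_SK = 1 << 3,
    CT_INV_SK = 1 << 4,
    CT_TRK_SK = 1 << 5
};

/* A flag match: bits in 'mask' must equal the corresponding bits in
 * 'value'; bits outside 'mask' are wildcarded. */
struct ct_state_match {
    uint32_t value;
    uint32_t mask;
};

static bool
ct_state_matches(uint32_t pkt_state, struct ct_state_match m)
{
    assert((m.value & ~m.mask) == 0);   /* Value bits must lie in the mask. */
    return (pkt_state & m.mask) == m.value;
}
```

Under this reading, `+trk+est` is `{ CT_TRK_SK | CT_EST_SK, CT_TRK_SK | CT_EST_SK }` and `-trk` is `{ 0, CT_TRK_SK }`, so an untracked packet matches the table 0 rules and a tracked, established packet matches the first table 1 rule.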
* dpif-netdev: Exact match non-presence of vlans. (Jarno Rajahalme, 2015-09-18, 1 file, -0/+1)
  The Netlink encoding of datapath flow keys cannot express wildcarding the presence of a VLAN tag. Instead, a missing VLAN tag is interpreted as exact match on the fact that there is no VLAN. This makes reading datapath flow dumps confusing, since for everything else, a missing key value means that the corresponding key was wildcarded. Unless we refactor a lot of code that translates between Netlink and struct flow representations, we have to do the same in the userspace datapath. This makes at least the flow install logs show that the vlan_tci field is matched to zero. However, the datapath flow dumps remain as they were before, as they are performed using the netlink format.

  Add a test to verify that packet with a vlan will not match a rule that may seem wildcarding the presence of the vlan tag. Applying this test without the userspace datapath modification showed that the userspace datapath failed to create a new datapath flow for the VLAN packet before this patch.

  Reported-by: Tony van der Peet <tony.vanderpeet@gmail.com>
  Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
  Acked-by: Ben Pfaff <blp@nicira.com>
* tunnel: Support matching on the presence of Geneve options. (Jesse Gross, 2015-08-28, 1 file, -1/+2)
  Sometimes it is useful to match only on whether a Geneve option is present even if the specific value is unimportant. A special case of this is zero length options where there is no value at all and the only information conveyed is whether the option was included in the packet.

  This operation was partially supported before but it was not consistent - in particular, options were never serialized through NXM/OXM unless they had a non-zero mask. Furthermore, zero length options were rejected altogether when they were installed through the Geneve map OpenFlow command.

  This adds support for these types of matches by making any NXM/OXM for tunnel metadata force a match on that field. In the case of a zero length option, both the value and mask of the NXM are ignored.

  Signed-off-by: Jesse Gross <jesse@nicira.com>
  Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
* userspace: Define and use struct eth_addr. | Jarno Rajahalme | 2015-08-28 | 1 file | -22/+22

Define struct eth_addr and use it instead of a uint8_t array for all ethernet addresses in OVS userspace. The struct is always the right size, and it can be assigned without an explicit memcpy, which makes code more readable.

"struct eth_addr" is a good type name for this as many utility functions are already named accordingly.

struct eth_addr can be accessed as bytes as well as ovs_be16's, which makes the struct 16-bit aligned. All use seems to be 16-bit aligned, so some algorithms on the ethernet addresses can be made a bit more efficient making use of this fact.

As the struct fits into a register (in 64-bit systems) we pass it by value when possible.

This patch also changes the few uses of Linux specific ETH_ALEN to OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no longer needed.

This work stemmed from a desire to make all struct flow members assignable for unrelated exploration purposes. However, I think this might be a nice code readability improvement by itself.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
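A minimal sketch of such a type follows, assuming a C11 compiler for the anonymous union. It uses plain `uint16_t` in place of OVS's `ovs_be16`, and the helper functions are illustrative rather than copied from the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Six address bytes that can also be read as three 16-bit words.  The
 * 16-bit member gives the struct 2-byte alignment. */
struct eth_addr {
    union {
        uint8_t ea[6];
        uint16_t be16[3];       /* Stand-in for ovs_be16 in this sketch. */
    };
};

/* Word-wise comparison: three 16-bit compares instead of a memcmp(). */
bool
eth_addr_equals(struct eth_addr a, struct eth_addr b)
{
    return a.be16[0] == b.be16[0]
           && a.be16[1] == b.be16[1]
           && a.be16[2] == b.be16[2];
}

/* True for the all-zeros address. */
bool
eth_addr_is_zero(struct eth_addr a)
{
    return !(a.be16[0] | a.be16[1] | a.be16[2]);
}
```

Because the address is a struct rather than an array, `dst = src;` copies it and it can be passed and returned by value, which is the readability gain the commit message describes.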
* odp-util: Fix put_nd_key(). | Jarno Rajahalme | 2015-08-20 | 1 file | -1/+1

Actually copy the 'nd_target' from the key. Found by inspection.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
* dpif-netdev: Translate Geneve options per-flow, not per-packet. | Jesse Gross | 2015-08-05 | 1 file | -18/+40

The kernel implementation of Geneve options stores the TLV option data in the flow exactly as received, without any further parsing. This is then translated to known options for the purposes of matching on flow setup (which will then install a datapath flow in the form the kernel is expecting).

The userspace implementation behaves a little bit differently - it looks up known options as each packet is received. The reason for this is there is a much tighter coupling between datapath and flow translation and the representation is generally expected to be the same. This works but it incurs work on a per-packet basis that could be done per-flow instead.

This introduces a small translation step for Geneve packets between datapath and flow lookup for the userspace datapath in order to allow the same kind of processing that the kernel does. A side effect of this is that unknown options are now shown when flows are dumped via ovs-appctl dpif/dump-flows, similar to the kernel.

There is a second benefit to this as well: for some operations it is preferable to keep the options exactly as they were received on the wire, which this enables. One example is that for packets that are executed from ofproto-dpif-upcall to the datapath, this avoids the translation of Geneve metadata. Since this conversion is potentially lossy (for unknown options), keeping everything in the same format removes the possibility of dropping options if the packet comes back up to userspace and the Geneve option translation table has changed. To help with these types of operations, most functions can understand both formats of data and seamlessly do the right thing.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
* Extend sFlow agent to report tunnel and MPLS structures | Neil McKee | 2015-07-21 | 1 file | -3/+24

Packets are still sampled at ingress only, so the egress tunnel and/or MPLS structures are only included when there is just 1 output port. The actions are either provided by the datapath in the sample upcall or looked up in the userspace cache. The former is preferred because it is more reliable and does not present any new demands or constraints on the userspace cache; however, the code falls back on the userspace lookup so that this solution can work with existing kernel datapath modules. If the lookup fails it is not critical: the compiled user-action-cookie is still available and provides the essential output port and output VLAN forwarding information just as before.

The openvswitch actions can express almost any tunneling/mangling so the only totally faithful representation would be to somehow encode the whole list of flow actions in the sFlow output. However the standard sFlow tunnel structures can express most common real-world scenarios, so in parsing the actions we look for those and skip the encoding if we see anything unusual. For example, a single set(tunnel()) or tnl_push() is interpreted, but if a second such action is encountered then the egress tunnel reporting is suppressed.

The sFlow standard allows "best effort" encoding so that if a field is not knowable or too onerous to look up then it can be left out. This is often the case for the layer-4 source port or even the source IP address of a tunnel. The assumption is that monitoring is enabled everywhere so a missing field can typically be seen at ingress to the next switch in the path.

This patch also adds unit tests to check the sFlow encoding of set(tunnel()), tnl_push() and push_mpls() actions.

The netlink attribute to request that actions be included in the upcall from the datapath is inserted for sFlow sampling only. Making that option explicit would require further changes to the printing and parsing of actions in lib/odp-util.c, and to scripts in the test suite.

Further enhancements to report on 802.1AD QinQ, 64-bit tunnel IDs, and NAT transformations can follow in future patches that make only incremental changes.

Signed-off-by: Neil McKee <neil.mckee@inmon.com>
[blp@nicira.com made stylistic and semantic changes]
Signed-off-by: Ben Pfaff <blp@nicira.com>
* flow: Factor out flag parsing and formatting routines. | Jesse Gross | 2015-07-15 | 1 file | -137/+18

There are several implementations of functions that parse/format flags and their binary representation. This factors them out into common routines. In addition to reducing code, it also makes things more consistent across different parts of OVS.

Signed-off-by: Jesse Gross <jesse@nicira.com>
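As an illustration of what a shared formatting routine can look like, here is a sketch in which each caller supplies only a bit-to-name callback. The names `format_flags_sketch`, `flag_name_fn`, `tunnel_flag_name`, and the `FLAG_*` constants are hypothetical and do not reproduce the OVS helpers or their signatures.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Maps a single flag bit to its name, or NULL if the bit is unknown. */
typedef const char *flag_name_fn(uint32_t bit);

/* Writes the names of the bits set in 'flags' into 'buf', separated by
 * 'sep'.  Unknown bits are printed in hex.  Returns 'buf'. */
char *
format_flags_sketch(uint32_t flags, flag_name_fn *name, char sep,
                    char *buf, size_t size)
{
    const char sep_str[2] = { sep, '\0' };
    size_t len = 0;

    if (!size) {
        return buf;
    }
    buf[0] = '\0';
    for (uint32_t bit = 1; bit; bit <<= 1) {
        const char *s;
        int n;

        if (!(flags & bit)) {
            continue;
        }
        s = name(bit);
        if (s) {
            n = snprintf(buf + len, size - len, "%s%s",
                         len ? sep_str : "", s);
        } else {
            n = snprintf(buf + len, size - len, "%s0x%" PRIx32,
                         len ? sep_str : "", bit);
        }
        if (n < 0 || (size_t) n >= size - len) {
            break;              /* Output truncated; stop here. */
        }
        len += (size_t) n;
    }
    return buf;
}

/* Example name table for one (made-up) set of flags. */
enum { FLAG_DF = 1 << 0, FLAG_CSUM = 1 << 1, FLAG_KEY = 1 << 2 };

const char *
tunnel_flag_name(uint32_t bit)
{
    switch (bit) {
    case FLAG_DF: return "df";
    case FLAG_CSUM: return "csum";
    case FLAG_KEY: return "key";
    default: return NULL;
    }
}
```

A second flag family would add only its own small name function and reuse the same formatter, which is the kind of duplication the commit removes.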
* odp-util: Share fields between odp and dpif_backer. | Joe Stringer | 2015-07-06 | 1 file | -2/+2

Datapath support for some flow key fields is used inside ofproto-dpif as well as odp-util. Share these fields using the same structure.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
* tunnels: Don't initialize unnecessary packet metadata. | Jesse Gross | 2015-07-01 | 1 file | -1/+2

The addition of Geneve options to packet metadata significantly expanded its size. It was reported that this can decrease performance for DPDK ports by up to 25% since we need to initialize the whole structure on each packet receive.

It is not really necessary to zero out the entire structure because miniflow_extract() only copies the tunnel metadata when particular fields indicate that it is valid. Therefore, as long as we zero out these fields when the metadata is initialized and ensure that the rest of the structure is correctly set in the presence of a tunnel, we can avoid touching the tunnel fields on packet reception.

Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
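The idea can be sketched as follows. The structure layout and names are simplified stand-ins for illustration, and the choice of the tunnel destination address as the validity indicator is an assumption of this sketch, not a statement of which fields the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the real metadata structures. */
struct tunnel_md {
    uint32_t ip_dst;            /* Assumed here: nonzero means "valid". */
    uint16_t flags;
    uint8_t opts[256];          /* Large option area, costly to clear. */
};

struct pkt_md {
    uint32_t in_port;
    uint32_t recirc_id;
    struct tunnel_md tunnel;
};

/* Initializes only what readers will look at.  Clearing the validity
 * indicator is enough, because the rest of 'tunnel' is ignored while
 * the indicator says "no tunnel". */
void
pkt_md_init(struct pkt_md *md, uint32_t in_port)
{
    md->in_port = in_port;
    md->recirc_id = 0;
    md->tunnel.ip_dst = 0;
}

/* Readers must check this before touching any other tunnel field. */
bool
pkt_md_has_tunnel(const struct pkt_md *md)
{
    return md->tunnel.ip_dst != 0;
}
```

The cost of this design is a contract on the write side: any code that marks the tunnel as valid must also fill in every other tunnel field, since stale bytes from a previous packet may still be present.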
* tunneling: Userspace datapath support for Geneve options. | Jesse Gross | 2015-06-26 | 1 file | -23/+71

Currently the userspace datapath only supports Geneve in a basic mode - without options - since the rest of userspace previously didn't support options either. This enables the userspace datapath to send and receive options as well.

The receive path for extracting the tunnel options isn't entirely optimal because it does a lookup on the options on a per-packet basis, rather than per-flow like the kernel does. This is not as straightforward to do in the userspace datapath since there is no translation step between packet formats used in packet vs. flow lookup. This can be optimized in the future and in the meantime option support is still useful for testing and simulation.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
* tunnel: Geneve TLV handling support for OpenFlow. | Jesse Gross | 2015-06-25 | 1 file | -45/+17

The current support for Geneve in OVS is exactly equivalent to VXLAN: it is possible to set and match on the VNI but not on any options contained in the header. This patch enables the use of options.

The goal for Geneve support is not to add support for any particular option but to allow end users or controllers to specify what they would like to match. That is, the full range of Geneve's capabilities should be exposed without modifying the code (the one exception being options that require per-packet computation in the fast path).

The main issue with supporting Geneve options is how to integrate the fields into the existing OpenFlow pipeline. All existing operations are referred to by their NXM/OXM field name - matches, action generation, arithmetic operations (i.e. transfer to a register). However, the Geneve option space is exactly the same as the OXM space, so a direct mapping is not feasible. Instead, we create a pool of 64 NXMs that are then dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped, these fields become first-class citizens in the OpenFlow pipeline.

An example of how to use Geneve options:

    ovs-ofctl add-geneve-map br0 {class=0xffff,type=0,len=4}->tun_metadata0
    ovs-ofctl add-flow br0 in_port=LOCAL,actions=set_field:0xffffffff->tun_metadata0,1

This will add a 4-byte option (filled with all 1's) to all packets coming from the LOCAL port and then send them out to port 1.

A limitation of this patch is that although the option table is specified for a particular switch over OpenFlow, it is currently global to all switches. This will be addressed in a future patch.

Based on work originally done by Madhu Challa. Ben Pfaff also significantly improved the comments.

Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
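The dynamic mapping can be pictured as a small table with one entry per tun_metadata field. The sketch below is a simplified illustration under that assumption; `struct geneve_map`, `geneve_map_add()`, and `geneve_map_lookup()` are hypothetical names, not the data structures this patch introduces.

```c
#include <stdbool.h>
#include <stdint.h>

#define TUN_METADATA_SLOTS 64   /* Pool size stated in the commit message. */

/* One slot: which Geneve option (class, type, length) it carries. */
struct geneve_opt_mapping {
    bool in_use;
    uint16_t opt_class;
    uint8_t type;
    uint8_t len;                /* Option data length in bytes. */
};

struct geneve_map {
    struct geneve_opt_mapping slot[TUN_METADATA_SLOTS];
};

/* Returns the slot index mapped to (opt_class, type), or -1 if none. */
int
geneve_map_lookup(const struct geneve_map *map, uint16_t opt_class,
                  uint8_t type)
{
    for (int i = 0; i < TUN_METADATA_SLOTS; i++) {
        const struct geneve_opt_mapping *m = &map->slot[i];

        if (m->in_use && m->opt_class == opt_class && m->type == type) {
            return i;
        }
    }
    return -1;
}

/* Maps slot 'idx' to an option.  Fails if the index is out of range,
 * the slot is taken, or the option is already mapped elsewhere. */
bool
geneve_map_add(struct geneve_map *map, unsigned int idx,
               uint16_t opt_class, uint8_t type, uint8_t len)
{
    if (idx >= TUN_METADATA_SLOTS || map->slot[idx].in_use
        || geneve_map_lookup(map, opt_class, type) >= 0) {
        return false;
    }
    map->slot[idx] = (struct geneve_opt_mapping) {
        true, opt_class, type, len
    };
    return true;
}
```

In these terms, the `add-geneve-map` command in the example corresponds to `geneve_map_add(&map, 0, 0xffff, 0, 4)`, after which slot 0 stands for that option wherever tun_metadata0 is matched or set.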