summaryrefslogtreecommitdiff
path: root/datapath
Commit message (Collapse)AuthorAgeFilesLines
* datapath: conntrack: mark expected switch fall-throughGustavo A. R. Silva2020-03-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 279badc2a85be83e0187b8c566e3b476b76a87a2 Author: Gustavo A. R. Silva <garsilva@embeddedor.com> Date: Thu Oct 19 12:55:03 2017 -0500 openvswitch: conntrack: mark expected switch fall-through In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case I placed a "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Use nla_parse deprecated functionsJohannes Berg2020-03-065-13/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 8cb081746c031fb164089322e2336a0bf5b3070c Author: Johannes Berg <johannes.berg@intel.com> Date: Fri Apr 26 14:07:28 2019 +0200 netlink: make validation more configurable for future strictness We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Backport portions of this commit applicable to openvswitch and added necessary compatibility layer changes to support older kernels. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Kbuild: Add kcompat.h header to front of NOSTDINCGreg Rose2020-03-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since this commit in the Linux upstream kernel: 'commit 9b9a3f20cbe0 ("kbuild: split final module linking out into Makefile.modfinal")' The openvswitch kernel module fails to build against the upstream Linux kernel. The cause of the build failure is that the include of the KBUILD_EXTMOD variable was dropped in Makefile.modfinal when it was split out from Makefile.modpost. Our Kbuild was setting the ccflags-y variable to include our kcompat.h header as the first header file. The Linux kernel maintainer has said that it is incorrect to rely on the ccflags-y variable for the modfinal phase of the build so that is why KBUILD_EXTMOD is not included. We fix this by breaking a different Linux kernel make rule. We add '-include $(builddir)/kcompat.h' to the front of the NOSTDINC variable setting in our Kbuild makefile. As noted already in the comment for the NOSTDINC setting: \# These include directories have to go before -I$(KSRC)/include. \# NOSTDINC_FLAGS just happens to be a variable that goes in the \# right place, even though it's conceptually incorrect. So we continue the misuse of the NOSTDINC variable to fix this issue as well. The assumption of the Linux kernel maintainers is that any local, out-of-tree build include files can be added to the end of the command line. In our case that is wrong of course, but there is nothing we can do about it that I know of other than using some utility like unifdef to strip out offending chunks of our compatibility layer code before invocation of Makefile.modfinal. That is a big change that would take a lot of work to implement. We could ask the Linux kernel maintainers to provide some way for out-of-tree kernel modules to include their own header files first in a proper manner. I consider that to be a very low probability of success but something we could ask about. For now we cheat and take the easy way out. Reported-by: David Ahern <dsahern@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Use sizeof_field macroPankaj Bharadiya2020-03-069-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit c593642c8be046915ca3a4a300243a68077cd207 Author: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Date: Mon Dec 9 10:31:43 2019 -0800 treewide: Use sizeof_field() macro Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net Also added a compatibility layer macro for older kernels that still use FIELD_SIZEOF Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Remove flex_array codeGreg Rose2020-03-061-18/+10
| | | | | | | | | Flex array support is removed since kernel 5.1. Convert to use kvmalloc_array instead. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Move genl_ops policy to genl_familyJohannes Berg2020-03-063-0/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 3b0f31f2b8c9fb348e4530b88f6b64f9621f83d6 Author: Johannes Berg <johannes.berg@intel.com> Date: Thu Mar 21 22:51:02 2019 +0100 genetlink: make policy common to family Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Since commit 3b0f31f2b8c9f ("genetlink: make policy common to family") the policy field of the genl_ops structure has been moved into the genl_family structure. Add necessary compat layer infrastructure to still support older kernels. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Fix up changes to inet frags in 5.1+Greg Rose2020-03-061-0/+14
| | | | | | | | | | | | Since Linux kernel release 5.1 the fragments field of the inet_frag_queue structure is removed and now only the rb_fragments structure with an rb_node pointer is used for both ipv4 and ipv6. In addition, the atomic_sub and atomic_add functions are replaced with their equivalent long counterparts. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Remove HAVE_BOOL_TYPEGreg Rose2020-01-312-11/+0
| | | | | | | | | | | OVS only supports Linux kernels since 3.10 and all kernels since then have the bool type. This check is unnecessary so remove it. Passes Travis: https://travis-ci.org/gvrose8192/ovs-experimental/builds/644103253 Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* userspace: Improved packet drop statistics.Anju Thomas2020-01-071-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently OVS maintains explicit packet drop/error counters only on port level. Packets that are dropped as part of normal OpenFlow processing are counted in flow stats of “drop” flows or as table misses in table stats. These can only be interpreted by controllers that know the semantics of the configured OpenFlow pipeline. Without that knowledge, it is impossible for an OVS user to obtain e.g. the total number of packets dropped due to OpenFlow rules. Furthermore, there are numerous other reasons for which packets can be dropped by OVS slow path that are not related to the OpenFlow pipeline. The generated datapath flow entries include a drop action to avoid further expensive upcalls to the slow path, but subsequent packets dropped by the datapath are not accounted anywhere. Finally, the datapath itself drops packets in certain error situations. Also, these drops are today not accounted for.This makes it difficult for OVS users to monitor packet drop in an OVS instance and to alert a management system in case of a unexpected increase of such drops. Also OVS trouble-shooters face difficulties in analysing packet drops. With this patch we implement following changes to address the issues mentioned above. 1. Identify and account all the silent packet drop scenarios 2. Display these drops in ovs-appctl coverage/show Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Co-authored-by: Keshav Gupta <keshugupta1@gmail.com> Signed-off-by: Anju Thomas <anju.thomas@ericsson.com> Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Signed-off-by: Keshav Gupta <keshugupta1@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* compat: Include confirm_neigh parameter if neededGreg Rose2020-01-072-0/+9
| | | | | | | | | | | | | | A change backported to the Linux 4.14.162 LTS kernel requires a boolean parameter. Check for the presence of the parameter and adjust the caller in that case. Passes check-kmod test with no regressions. Passes Travis build here: https://travis-ci.org/gvrose8192/ovs-experimental/builds/633461320 Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif: Add support to set user featuresPaul Blakey2019-12-221-0/+3
| | | | | | | | | | | | This enables user features on the kernel datapath via the DP_CMD_SET command, and also retrieves them to check for actual support and not just an older kernel ignoring the requested features. This will be used in next patch to enable recirc_id sharing with tc. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: make generic netlink group constGreg Rose2019-12-022-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 48e48a70c08a8a68f8697f8b30cb83775bda8001 Author: stephen hemminger <stephen@networkplumber.org> Date: Wed Jul 16 11:25:52 2014 -0700 openvswitch: make generic netlink group const Generic netlink tables can be const. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> The equivalent tables in meter.c and conntrack.c are constified so it should be safe to do the same for these and will improve security as well. Original patch slightly modified for out of tree module. Passes check-kmod. Passes Travis. https://travis-ci.org/gvrose8192/ovs-experimental/builds/616880002 Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-upcall: Echo HASH attribute back to datapath.Tonghao Zhang2019-11-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel datapath may sent upcall with hash info, ovs-vswitchd should get it from upcall and then send it back. The reason is that: | When using the kernel datapath, the upcall don't | include skb hash info relatived. That will introduce | some problem, because the hash of skb is important | in kernel stack. For example, VXLAN module uses | it to select UDP src port. The tx queue selection | may also use the hash in stack. | | Hash is computed in different ways. Hash is random | for a TCP socket, and hash may be computed in hardware, | or software stack. Recalculation hash is not easy. | | There will be one upcall, without information of skb | hash, to ovs-vswitchd, for the first packet of a TCP | session. The rest packets will be processed in Open vSwitch | modules, hash kept. If this tcp session is forward to | VXLAN module, then the UDP src port of first tcp packet | is different from rest packets. | | TCP packets may come from the host or dockers, to Open vSwitch. | To fix it, we store the hash info to upcall, and restore hash | when packets sent back. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=bd1903b7c4596ba6f7677d0dfefd05ba5876707d Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* Datapath: Change in openvswitch kernel module to support MPLS label depth of ↵Martin Varghese2019-11-224-32/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | 3 in ingress direction. Upstream commit: commit fbdcdd78da7c95f1b970d371e1b23cbd3aa990f3 Author: Martin Varghese <martin.varghese@nokia.com> Date: Mon Nov 4 07:27:44 2019 +0530 Change in Openvswitch to support MPLS label depth of 3 in ingress direction The openvswitch was supporting a MPLS label depth of 1 in the ingress direction though the userspace OVS supports a max depth of 3 labels.This change enables openvswitch module to support a max depth of 3 labels in the ingress. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Add missing inline keywordGreg Rose2019-11-201-1/+1
| | | | | | | | | | The missing inline keyword before the definition of the rpl_nf_ct_tmpl_free() function causes spurious warnings about the function not being used on some older kernels. Add the keyword to suppress the warning. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ip_gre: Remove even more unused codeGreg Rose2019-11-051-102/+0
| | | | | | | | | | | | | | There is a confusing mix of ipgre and gretap functions with some needed for gretap still having ipgre_ prefixes. This time though I think I got the rest of the unused ipgre code. Passes Travis here and this time I made sure the patch passing Travis is the same one I'm mailing. https://travis-ci.org/gvrose8192/ovs-experimental/builds/607296133 Fixes: d5822f428814 ("gre: Remove dead ipgre code") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* Revert "ip_gre: Remove even more unused code"Greg Rose2019-11-011-0/+38
| | | | | | | | | | | This reverts commit 42a059e02bf343787951be2824c579e1c9a26e12. Not all the necessary ipgre prefixed code was removed that should have been. Another patch will follow with the correct removed code. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* ip_gre: Remove even more unused codeGreg Rose2019-10-311-38/+0
| | | | | | | | | | There is a confusing mix of ipgre and gretap functions with some needed for gretap still having ipgre_ prefixes. This time though I think I got the rest of the unused ipgre code. Fixes: d5822f428814 ("gre: Remove dead ipgre code") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ip_gre: Removed unused ipgre netdev opsGreg Rose2019-10-311-16/+0
| | | | | | | | | When cleaning up unused ipgre code the ipgre_netdev_ops structure was missed. Get rid of it now. Fixes: d5822f428814 ("gre: Remove dead ipgre code") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Allow attaching helper in later commitYi-Hung Wei2019-10-181-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 248d45f1e1934f7849fbdc35ef1e57151cf063eb Author: Yi-Hung Wei <yihung.wei@gmail.com> Date: Fri Oct 4 09:26:44 2019 -0700 openvswitch: Allow attaching helper in later commit This patch allows to attach conntrack helper to a confirmed conntrack entry. Currently, we can only attach alg helper to a conntrack entry when it is in the unconfirmed state. This patch enables an use case that we can firstly commit a conntrack entry after it passed some initial conditions. After that the processing pipeline will further check a couple of packets to determine if the connection belongs to a particular application, and attach alg helper to the connection in a later stage. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Fix log message in ovs conntrackYi-Hung Wei2019-10-181-1/+1
| | | | | | | | | | | | | | | | | Upstream commit: commit 12c6bc38f99bb168b7f16bdb5e855a51a23ee9ec Author: Yi-Hung Wei <yihung.wei@gmail.com> Date: Wed Aug 21 17:16:10 2019 -0700 openvswitch: Fix log message in ovs conntrack Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)Yi-Hung Wei2019-10-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Backports the following upstream commit with some backward compatibility change. commit f319ca6557c10a711facc4dd60197470796d3ec1 Author: Geert Uytterhoeven <geert@linux-m68k.org> Date: Wed May 8 08:52:32 2019 +0200 openvswitch: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT) Commit 4806e975729f99c7 ("netfilter: replace NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)") removed CONFIG_NF_NAT_NEEDED, but a new user popped up afterwards. Fixes: fec9c271b8f1bde1 ("openvswitch: load and reference the NAT helper.") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Check for null pointer return from nla_nest_start_noflagColin Ian King2019-10-181-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | upstream commit: commit ca96534630e2edfd73121c487c957b17eca3b7d7 Author: Colin Ian King <colin.king@canonical.com> Date: Wed May 1 14:41:58 2019 +0100 openvswitch: check for null pointer return from nla_nest_start_noflag The call to nla_nest_start_noflag can return null in the unlikely event that nla_put returns -EMSGSIZE. Check for this condition to avoid a null pointer dereference on pointer nla_reply. Addresses-Coverity: ("Dereference null return value") Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Load and reference the NAT helper.Yi-Hung Wei2019-10-182-6/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit backports the following upstream commit, and two functions in nf_conntrack_helper.h. Upstream commit: commit fec9c271b8f1bde1086be5aa415cdb586e0dc800 Author: Flavio Leitner <fbl@redhat.com> Date: Wed Apr 17 11:46:17 2019 -0300 openvswitch: load and reference the NAT helper. This improves the original commit 17c357efe5ec ("openvswitch: load NAT helper") where it unconditionally tries to load the module for every flow using NAT, so not efficient when loading multiple flows. It also doesn't hold any references to the NAT module while the flow is active. This change fixes those problems. It will try to load the module only if it's not present. It grabs a reference to the NAT module and holds it while the flow is active. Finally, an error message shows up if either actions above fails. Fixes: 17c357efe5ec ("openvswitch: load NAT helper") Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: genetlink: optionally validate strictly/dumpsYi-Hung Wei2019-10-183-0/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch backports the following upstream commit within the openvswitch kernel module with some checks so that it also works in the older kernel. Upstream commit: commit ef6243acb4782df587a4d7d6c310fa5b5d82684b Author: Johannes Berg <johannes.berg@intel.com> Date: Fri Apr 26 14:07:31 2019 +0200 genetlink: optionally validate strictly/dumps Add options to strictly validate messages and dump messages, sometimes perhaps validating dump messages non-strictly may be required, so add an option for that as well. Since none of this can really be applied to existing commands, set the options everwhere using the following spatch: @@ identifier ops; expression X; @@ struct genl_ops ops[] = { ..., { .cmd = X, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, ... }, ... }; For new commands one should just not copy the .validate 'opt-out' flags and thus get strict validation. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Use nla_nest_start_noflag()Yi-Hung Wei2019-10-187-28/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch backports the openvswitch changes and update the compat layer for the following upstream patch. commit ae0be8de9a53cda3505865c11826d8ff0640237c Author: Michal Kubecek <mkubecek@suse.cz> Date: Fri Apr 26 11:13:06 2019 +0200 netlink: make nla_nest_start() add NLA_F_NESTED flag Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Handle NF_NAT_NEEDED replacementYi-Hung Wei2019-10-181-8/+17
| | | | | | | | | | | | | | | | | | | | | | | | | Starting from the following upstream commit, NF_NAT_NEEDED is replaced by IS_ENABLED(CONFIG_NF_NAT) in the upstream kernel. This patch makes some changes so that our in tree ovs kernel module is compatible to both old and new kernels. Upstream commit: commit 4806e975729f99c7908d1688a143f1e16d464e6c Author: Florian Westphal <fw@strlen.de> Date: Wed Mar 27 09:22:26 2019 +0100 netfilter: replace NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT) NF_NAT_NEEDED is true whenever nat support for either ipv4 or ipv6 is enabled. Now that the af-specific nat configuration switches have been removed, IS_ENABLED(CONFIG_NF_NAT) has the same effect. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: add seqadj extension when NAT is used.Flavio Leitner2019-10-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | upstream patch: commit fa7e428c6b7ed3281610511a2b2ec716d9894be8 Author: Flavio Leitner <fbl@sysclose.org> Date: Mon Mar 25 15:58:31 2019 -0300 openvswitch: add seqadj extension when NAT is used. When the conntrack is initialized, there is no helper attached yet so the nat info initialization (nf_nat_setup_info) skips adding the seqadj ext. A helper is attached later when the conntrack is not confirmed but is going to be committed. In this case, if NAT is needed then adds the seqadj ext as well. Fixes: 16ec3d4fbb96 ("openvswitch: Fix cached ct with helper.") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Detect upstream nf_nat changeYi-Hung Wei2019-10-181-1/+12
| | | | | | | | | | | | | | | | | | | | The following two upstream commits merge nf_nat_ipv4 and nf_nat_ipv6 into nf_nat core, and move some header files around. To handle these modifications, this patch detects the upstream changes, uses the header files and config symbols properly. Ideally, we should replace CONFIG_NF_NAT_IPV4 and CONFIG_NF_NAT_IPV6 with CONFIG_NF_NAT and CONFIG_IPV6. In order to keep backward compatibility, we keep the checking of CONFIG_NF_NAT_IPV4/6 as is for the old kernel, and replace them with marco for the new kernel. upstream commits: 3bf195ae6037 ("netfilter: nat: merge nf_nat_ipv4,6 into nat core") d2c5c103b133 ("netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Replace nf_ct_invert_tuplepr() with nf_ct_invert_tuple()Yi-Hung Wei2019-10-182-1/+15
| | | | | | | | | | | | | | | | | | After upstream net-next commit 303e0c558959 ("netfilter: conntrack: avoid unneeded nf_conntrack_l4proto lookups") nf_ct_invert_tuplepr() is no longer available in the kernel. Ideally, we should be in sync with upstream kernel by calling nf_ct_invert_tuple() directly in conntrack.c. However, nf_ct_invert_tuple() has different function signature in older kernel, and it would be hard to replace that in the compat layer. Thus, we use rpl_nf_ct_invert_tuple() in conntrack.c and maintain compatibility in the compat layer so that ovs kernel module runs smoothly in both new and old kernel. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Fix linking without CONFIG_NF_CONNTRACK_LABELSArnd Bergmann2019-10-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | upstream commit: commit a277d516de5f498c91d91189717ef7e01102ad27 Author: Arnd Bergmann <arnd@arndb.de> Date: Fri Nov 2 16:36:55 2018 +0100 openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS When CONFIG_CC_OPTIMIZE_FOR_DEBUGGING is enabled, the compiler fails to optimize out a dead code path, which leads to a link failure: net/openvswitch/conntrack.o: In function `ovs_ct_set_labels': conntrack.c:(.text+0x2e60): undefined reference to `nf_connlabels_replace' In this configuration, we can take a shortcut, and completely remove the contrack label code. This may also help the regular optimization. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: compat: drop bridge nf reset from nf_resetGreg Rose2019-10-182-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commmit: commit 895b5c9f206eb7d25dc1360a8ccfc5958895eb89 Author: Florian Westphal <fw@strlen.de> Date: Sun Sep 29 20:54:03 2019 +0200 netfilter: drop bridge nf reset from nf_reset commit 174e23810cd31 ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi recycle always drop skb extensions. The additional skb_ext_del() that is performed via nf_reset on napi skb recycle is not needed anymore. Most nf_reset() calls in the stack are there so queued skb won't block 'rmmod nf_conntrack' indefinitely. This removes the skb_ext_del from nf_reset, and renames it to a more fitting nf_reset_ct(). In a few selected places, add a call to skb_ext_reset to make sure that no active extensions remain. I am submitting this for "net", because we're still early in the release cycle. The patch applies to net-next too, but I think the rename causes needless divergence between those trees. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Added some compat layer fixups for nf_reset_ct. This is just a portion of the upstream commit that applies to openvswitch. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: rename flow_stats to sw_flow_statsPablo Neira Ayuso2019-10-183-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit aef833c58d321f09ae4ce4467723542842ba9faf Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Fri Jul 19 18:20:13 2019 +0200 net: openvswitch: rename flow_stats to sw_flow_stats There is a flow_stats structure defined in include/net/flow_offload.h and a follow up patch adds #include <net/flow_offload.h> to net/sch_generic.h. This breaks compilation since OVS codebase includes net/sock.h which pulls in linux/filter.h which includes net/sch_generic.h. In file included from ./include/net/sch_generic.h:18:0, from ./include/linux/filter.h:25, from ./include/net/sock.h:59, from ./include/linux/tcp.h:19, from net/openvswitch/datapath.c:24 This definition takes precedence on OVS since it is placed in the networking core, so rename flow_stats in OVS to sw_flow_stats since this structure is contained in sw_flow. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* compat: remove the incorrect mtu limit for erspanHaishuang Yan2019-10-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 0e141f757b2c78c983df893e9993313e2dc21e38 Author: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Date: Fri Sep 27 14:58:20 2019 +0800 erspan: remove the incorrect mtu limit for erspan erspan driver calls ether_setup(), after commit 61e84623ace3 ("net: centralize net_device min/max MTU checking"), the range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default. It causes the dev mtu of the erspan device to not be greater than 1500, this limit value is not correct for ipgre tap device. Tested: Before patch: # ip link set erspan0 mtu 1600 Error: mtu greater than device maximum. After patch: # ip link set erspan0 mtu 1600 # ip -d link show erspan0 21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0 Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: change type of UPCALL_PID attribute to NLA_UNSPECLi RongQing2019-10-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit ea8564c865299815095bebeb4b25bef474218e4c Author: Li RongQing <lirongqing@baidu.com> Date: Tue Sep 24 19:11:52 2019 +0800 openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC userspace openvswitch patch "(dpif-linux: Implement the API functions to allow multiple handler threads read upcall)" changes its type from U32 to UNSPEC, but leave the kernel unchanged and after kernel 6e237d099fac "(netlink: Relax attr validation for fixed length types)", this bug is exposed by the below warning [ 57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length. Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's") Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: beb1c69a3 ("datapath: Allow each vport to have an array of 'port_id's.") Cc: Li RongQing <lirongqing@baidu.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: hide clang frame-overflow warningsArnd Bergmann2019-10-181-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 260637903f47f20c5918bb5c1eea52b2a28ea863 Author: Arnd Bergmann <arnd@arndb.de> Date: Mon Jul 22 17:00:01 2019 +0200 ovs: datapath: hide clang frame-overflow warnings Some functions in the datapath code are factored out so that each one has a stack frame smaller than 1024 bytes with gcc. However, when compiling with clang, the functions are inlined more aggressively and combined again so we get net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=] Marking both get_flow_actions() and ovs_nla_init_match_and_action() as 'noinline_for_stack' gives us the same behavior that we see with gcc, and no warning. Note that this does not mean we actually use less stack, as the functions call each other, and we still get three copies of the large 'struct sw_flow_key' type on the stack. The comment tells us that this was previously considered safe, presumably since the netlink parsing functions are called with a known backchain that does not also use a lot of stack space. Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* compat: Fix small naming issueGreg Rose2019-10-171-3/+3
| | | | | | | | | | In commit 057772cf2477 the function is missing the rpl_ prefix and the define that replaces the original function should come after the function definition. Fixes: 057772cf2477 ("compat: Backport nf_ct_tmpl_alloc().") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
* datapath: Fix conntrack cache with timeoutYi-Hung Wei2019-09-301-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch is from the following upstream net-next commit along with an updated system traffic test to avoid regression. Upstream commit: commit 7177895154e6a35179d332f4a584d396c50d0612 Author: Yi-Hung Wei <yihung.wei@gmail.com> Date: Thu Aug 22 13:17:50 2019 -0700 openvswitch: Fix conntrack cache with timeout This patch addresses a conntrack cache issue with timeout policy. Currently, we do not check if the timeout extension is set properly in the cached conntrack entry. Thus, after packet recirculate from conntrack action, the timeout policy is not applied properly. This patch fixes the aforementioned issue. Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
* datapath: Add support for conntrack timeout policyYi-Hung Wei2019-09-262-1/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for specifying a timeout policy for a connection in connection tracking system in kernel datapath. The timeout policy will be attached to a connection when the connection is committed to conntrack. This patch introduces a new odp field OVS_CT_ATTR_TIMEOUT in the ct action that specifies the timeout policy in the datapath. In the following patch, during the upcall process, the vswitchd will use the ct_zone to look up the corresponding timeout policy and fill OVS_CT_ATTR_TIMEOUT if it is available. The datapath code is from the following two net-next upstream commits. Upstream commit: commit 06bd2bdf19d2f3d22731625e1a47fa1dff5ac407 Author: Yi-Hung Wei <yihung.wei@gmail.com> Date: Tue Mar 26 11:31:14 2019 -0700 openvswitch: Add timeout support to ct action Add support for fine-grain timeout support to conntrack action. The new OVS_CT_ATTR_TIMEOUT attribute of the conntrack action specifies a timeout to be associated with this connection. If no timeout is specified, it acts as is, that is the default timeout for the connection will be automatically applied. Example usage: $ nfct timeout add timeout_1 inet tcp syn_sent 100 established 200 $ ovs-ofctl add-flow br0 in_port=1,ip,tcp,action=ct(commit,timeout=timeout_1) CC: Pravin Shelar <pshelar@ovn.org> CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> commit 6d670497e01803b486aa72cc1a718401ab986896 Author: Dan Carpenter <dan.carpenter@oracle.com> Date: Tue Apr 2 09:53:14 2019 +0300 openvswitch: use after free in __ovs_ct_free_action() We free "ct_info->ct" and then use it on the next line when we pass it to nf_ct_destroy_timeout(). This patch swaps the order to avoid the use after free. Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
* datapath: compat: Backport nf_conntrack_timeout supportYi-Hung Wei2019-09-263-0/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch brings in nf_ct_timeout_put() and nf_ct_set_timeout() when it is not available in the kernel. Three symbols are created in acinclude.m4. * HAVE_NF_CT_SET_TIMEOUT is used to determine if upstream net-next commit 717700d183d65 ("netfilter: Export nf_ct_{set,destroy}_timeout()") is availabe. If it is defined, the kernel should have all the nf_conntrack_timeout support that OVS needs. * HAVE_NF_CT_TIMEOUT is used to check if upstream net-next commit 6c1fd7dc489d9 ("netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout object") is there. If it is not defined, we will use the old ctnl_timeout interface rather than the nf_ct_timeout interface that is introduced in this commit. * HAVE_NF_CT_TIMEOUT_FIND_GET_HOOK_NET is used to check if upstream commit 19576c9478682 ("netfilter: cttimeout: add netns support") is there, so that we pass different arguement based on whether the kernel has netns support. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
* datapath: compat: Backports bugfixes for nf_conncountYifeng Sun2019-08-284-163/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch backports several critical bug fixes related to locking and data consistency in nf_conncount code. This backport is based on the following upstream net-next upstream commits. a007232 ("netfilter: nf_conncount: fix argument order to find_next_bit") c80f10b ("netfilter: nf_conncount: speculative garbage collection on empty lists") 2f971a8 ("netfilter: nf_conncount: move all list iterations under spinlock") df4a902 ("netfilter: nf_conncount: merge lookup and add functions") e8cfb37 ("netfilter: nf_conncount: restart search when nodes have been erased") f7fcc98 ("netfilter: nf_conncount: split gc in two phases") 4cd273b ("netfilter: nf_conncount: don't skip eviction when age is negative") c78e781 ("netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS") d4e7df1 ("netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()") 53ca0f2 ("netfilter: nf_conncount: remove wrong condition check routine") 3c5cdb1 ("netfilter: nf_conncount: fix unexpected permanent node of list.") 31568ec ("netfilter: nf_conncount: fix list_del corruption in conn_free") fd3e71a ("netfilter: nf_conncount: use spin_lock_bh instead of spin_lock") This patch adds additional compat code so that it can build on all supported kernel versions. In addition, this patch helps OVS datapath to always choose bug-fixed nf_conncount code. If kernel already has these fixes, then kernel's nf_conncount is being used. Otherwise, OVS falls back to use compat nf_conncount functions. Travis tests are at https://travis-ci.org/yifsun/ovs-travis/builds/569056850 On latest RHEL kernel, 'make check-kmod' runs good. VMware-BZ: #2396471 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Clear the L4 portion of the key for "later" fragmentsJustin Pettit2019-08-281-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 0754b4e8cdf3eec6e4122e79af26ed9bab20f8f8 Author: Justin Pettit <jpettit@ovn.org> Date: Tue Aug 27 07:58:10 2019 -0700 openvswitch: Clear the L4 portion of the key for "later" fragments. Only the first fragment in a datagram contains the L4 headers. When the Open vSwitch module parses a packet, it always sets the IP protocol field in the key, but can only set the L4 fields on the first fragment. The original behavior would not clear the L4 portion of the key, so garbage values would be sent in the key for "later" fragments. This patch clears the L4 fields in that circumstance to prevent sending those garbage values as part of the upcall. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
* datapath: Properly set L4 keys on "later" IP fragmentsGreg Rose2019-08-283-68/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit ad06a566e118e57b852cab5933dbbbaebb141de3 Author: Greg Rose <gvrose8192@gmail.com> Date: Tue Aug 27 07:58:09 2019 -0700 openvswitch: Properly set L4 keys on "later" IP fragments When IP fragments are reassembled before being sent to conntrack, the key from the last fragment is used. Unless there are reordering issues, the last fragment received will not contain the L4 ports, so the key for the reassembled datagram won't contain them. This patch updates the key once we have a reassembled datagram. The handle_fragments() function works on L3 headers so we pull the L3/L4 flow key update code from key_extract into a new function 'key_extract_l3l4'. Then we add a another new function ovs_flow_key_update_l3l4() and export it so that it is accessible by handle_fragments() for conntrack packet reassembly. Co-authored-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
* datapath: fix csum updates for MPLS actionsGreg Rose2019-07-101-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 0e3183cd2a64843a95b62f8bd4a83605a4cf0615 Author: John Hurley <john.hurley@netronome.com> Date: Thu Jun 27 14:37:30 2019 +0100 net: openvswitch: fix csum updates for MPLS actions Skbs may have their checksum value populated by HW. If this is a checksum calculated over the entire packet then the CHECKSUM_COMPLETE field is marked. Changes to the data pointer on the skb throughout the network stack still try to maintain this complete csum value if it is required through functions such as skb_postpush_rcsum. The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when changes are made to packet data without a push or a pull. This occurs when the ethertype of the MAC header is changed or when MPLS lse fields are modified. The modification is carried out using the csum_partial function to get the csum of a buffer and add it into the larger checksum. The buffer is an inversion of the data to be removed followed by the new data. Because the csum is calculated over 16 bits and these values align with 16 bits, the effect is the removal of the old value from the CHECKSUM_COMPLETE and addition of the new value. However, the csum fed into the function and the outcome of the calculation are also inverted. This would only make sense if it was the new value rather than the old that was inverted in the input buffer. Fix the issue by removing the bit inverts in the csum_partial calculation. The bug was verified and the fix tested by comparing the folded value of the updated CHECKSUM_COMPLETE value with the folded value of a full software checksum calculation (reset skb->csum to 0 and run skb_checksum_complete(skb)). Prior to the fix the outcomes differed but after they produce the same result. Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Fixes: bc7cc5999fd3 ("openvswitch: update checksum in {push,pop}_mpls") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: ccf4378615e9 ("datapath: Add basic MPLS support to kernel") Fixes: b51367aad315 ("datapath: update checksum in {push,pop}_mpls") Cc: John Hurley <john.hurley@netronome.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: ip6_gre: fix possible use-after-free in ip6erspan_rcvGreg Rose2019-07-101-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 2a3cabae4536edbcb21d344e7aa8be7a584d2afb Author: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Date: Sat Apr 6 17:16:53 2019 +0200 net: ip6_gre: fix possible use-after-free in ip6erspan_rcv erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove erspan header. This can determine a possible use-after-free accessing pkt_md pointer in ip6erspan_rcv since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb (e.g if the packet has been sent though a veth device). Fix it resetting pkt_md pointer after __iptunnel_pull_header Fixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Fixes: c387d8177f20 ("compat: Add ipv6 GRE and IPV6 Tunneling") Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* tunnel: Add layer 2 IPv6 GRE encapsulation support.William Tu2019-07-031-1/+1
| | | | | | | | | | | | | | | | | The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type= legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet packets and encap with GRE header with outer IPv6 header. Encapsulation of layer 3 packet over IPv6 GRE, ip6gre, is not supported yet. I tested it by running: # make check-kernel TESTSUITEFLAGS='-k ip6gre' under kernel 5.2 and for userspace: # make check TESTSUITEFLAGS='-k ip6gre' Tested-by: Greg Rose <gvrose8192@gmail.com> Tested-at: https://travis-ci.org/gvrose8192/ovs-experimental/builds/552977116 Reviewed-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Clean up tunnel_id_to_keyGreg Rose2019-07-031-11/+1
| | | | | | | | | This function was just a duplicate of tunnel_id_to_key32 - I'm not sure why it was ever needed but let's dump it now. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Clean up gre_calc_hlenGreg Rose2019-07-034-43/+25
| | | | | | | | | | | | It's proliferated throughout three .c files so let's pull them all together in gre.h where the inline function belongs. This requires some adjustments to the compat layer so that the various iterations of gre_calc_hlen and ip_gre_calc_hlen since the 3.10 kernel are handled correctly. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Remove duplicate metadata destination codeGreg Rose2019-07-033-183/+133
| | | | | | | | | | ip_gre.c and ip6_gre.c both had duplicate code for handling the tunnel metadata destinations. Move the duplicate code over into the right header file, dst_metadata.h. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Fix compilation error on CentOS 7.6Yi-Hung Wei2019-06-261-0/+1
| | | | | | | | | | | This fix the compilation issue on CentOS 7.6 kernel (3.10.0-957.21.3.el7.x86_64). Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-June/360013.html Reported-by: Fred Neubauer <fred.neubauer@gmail.com> Fixes: 6660a9597a49 ("datapath: compat: Introduce static key support") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>