From bfcb813203e619a8960a819bf533ad2a108d8105 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 27 Mar 2020 21:55:42 +0200 Subject: net: dsa: configure the MTU for switch ports It is useful be able to configure port policers on a switch to accept frames of various sizes: - Increase the MTU for better throughput from the default of 1500 if it is known that there is no 10/100 Mbps device in the network. - Decrease the MTU to limit the latency of high-priority frames under congestion, or work around various network segments that add extra headers to packets which can't be fragmented. For DSA slave ports, this is mostly a pass-through callback, called through the regular ndo ops and at probe time (to ensure consistency across all supported switches). The CPU port is called with an MTU equal to the largest configured MTU of the slave ports. The assumption is that the user might want to sustain a bidirectional conversation with a partner over any switch port. The DSA master is configured the same as the CPU port, plus the tagger overhead. Since the MTU is by definition L2 payload (sans Ethernet header), it is up to each individual driver to figure out if it needs to do anything special for its frame tags on the CPU port (it shouldn't except in special cases). So the MTU does not contain the tagger overhead on the CPU port. However the MTU of the DSA master, minus the tagger overhead, is used as a proxy for the MTU of the CPU port, which does not have a net device. This is to avoid uselessly calling the .change_mtu function on the CPU port when nothing should change. So it is safe to assume that the DSA master and the CPU port MTUs are apart by exactly the tagger's overhead in bytes. Some changes were made around dsa_master_set_mtu(), function which was now removed, for 2 reasons: - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to do the same thing in DSA - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method That is to say, there's no need for this function in DSA, we can safely call dev_set_mtu() directly, take the rtnl lock when necessary, and just propagate whatever errors get reported (since the user probably wants to be informed). Some inspiration (mainly in the MTU DSA notifier) was taken from a vaguely similar patch from Murali and Florian, who are credited as co-developers down below. Co-developed-by: Murali Krishna Policharla Signed-off-by: Murali Krishna Policharla Co-developed-by: Florian Fainelli Signed-off-by: Florian Fainelli Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/dsa/dsa_priv.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'net/dsa/dsa_priv.h') diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index 760e6ea3178a..da3be60beefe 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -22,6 +22,7 @@ enum { DSA_NOTIFIER_MDB_DEL, DSA_NOTIFIER_VLAN_ADD, DSA_NOTIFIER_VLAN_DEL, + DSA_NOTIFIER_MTU, }; /* DSA_NOTIFIER_AGEING_TIME */ @@ -61,6 +62,14 @@ struct dsa_notifier_vlan_info { int port; }; +/* DSA_NOTIFIER_MTU */ +struct dsa_notifier_mtu_info { + bool propagate_upstream; + int sw_index; + int port; + int mtu; +}; + struct dsa_slave_priv { /* Copy of CPU port xmit for faster access in slave transmit hot path */ struct sk_buff * (*xmit)(struct sk_buff *skb, @@ -127,6 +136,8 @@ int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering, struct switchdev_trans *trans); int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock, struct switchdev_trans *trans); +int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu, + bool propagate_upstream); int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr, u16 vid); int dsa_port_fdb_del(struct dsa_port *dp, const unsigned char *addr, -- cgit v1.2.1 From bff33f7e2ae2e805a4b0af597b58422185c68900 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 27 Mar 2020 21:55:43 +0200 Subject: net: dsa: implement auto-normalization of MTU for bridge hardware datapath Many switches don't have an explicit knob for configuring the MTU (maximum transmission unit per interface). Instead, they do the length-based packet admission checks on the ingress interface, for reasons that are easy to understand (why would you accept a packet in the queuing subsystem if you know you're going to drop it anyway). So it is actually the MRU that these switches permit configuring. In Linux there only exists the IFLA_MTU netlink attribute and the associated dev_set_mtu function. The comments like to play blind and say that it's changing the "maximum transfer unit", which is to say that there isn't any directionality in the meaning of the MTU word. So that is the interpretation that this patch is giving to things: MTU == MRU. When 2 interfaces having different MTUs are bridged, the bridge driver MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it adjusts the MTU of the bridge net device itself (and not that of the slave net devices) to the minimum value of all slave interfaces, in order for forwarded packets to not exceed the MTU regardless of the interface they are received and send on. The idea behind this behavior, and why the slave MTUs are not adjusted, is that normal termination from Linux over the L2 forwarding domain should happen over the bridge net device, which _is_ properly limited by the minimum MTU. And termination over individual slave devices is possible even if those are bridged. But that is not "forwarding", so there's no reason to do normalization there, since only a single interface sees that packet. The problem with those switches that can only control the MRU is with the offloaded data path, where a packet received on an interface with MRU 9000 would still be forwarded to an interface with MRU 1500. And the br_mtu_auto_adjust() function does not really help, since the MTU configured on the bridge net device is ignored. In order to enforce the de-facto MTU == MRU rule for these switches, we need to do MTU normalization, which means: in order for no packet larger than the MTU configured on this port to be sent, then we need to limit the MRU on all ports that this packet could possibly come from. AKA since we are configuring the MRU via MTU, it means that all ports within a bridge forwarding domain should have the same MTU. And that is exactly what this patch is trying to do. >From an implementation perspective, we try to follow the intent of the user, otherwise there is a risk that we might livelock them (they try to change the MTU on an already-bridged interface, but we just keep changing it back in an attempt to keep the MTU normalized). So the MTU that the bridge is normalized to is either: - The most recently changed one: ip link set dev swp0 master br0 ip link set dev swp1 master br0 ip link set dev swp0 mtu 1400 This sequence will make swp1 inherit MTU 1400 from swp0. - The one of the most recently added interface to the bridge: ip link set dev swp0 master br0 ip link set dev swp1 mtu 1400 ip link set dev swp1 master br0 The above sequence will make swp0 inherit MTU 1400 as well. Suggested-by: Florian Fainelli Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/dsa/dsa_priv.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'net/dsa/dsa_priv.h') diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index da3be60beefe..904cc7c9b882 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -194,4 +194,8 @@ dsa_slave_to_master(const struct net_device *dev) /* switch.c */ int dsa_switch_register_notifier(struct dsa_switch *ds); void dsa_switch_unregister_notifier(struct dsa_switch *ds); + +/* dsa2.c */ +extern struct list_head dsa_tree_list; + #endif -- cgit v1.2.1