summaryrefslogtreecommitdiff
path: root/net/core/neighbour.c
Commit message (Collapse)AuthorAgeFilesLines
* net: neighbour: use source address of last enqueued packet for solicitationHannes Frederic Sowa2013-09-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Currently we always use the first member of the arp_queue to determine the sender ip address of the arp packet (or in case of IPv6 - source address of the ndisc packet). This skb is fixed as long as the queue is not drained by a complete purge because of a timeout or by a successful response. If the first packet enqueued on the arp_queue is from a local application with a manually set source address and the to be discovered system does some kind of uRPF checks on the source address in the arp packet the resolving process hangs until a timeout and restarts. This hurts communication with the participating network node. This could be mitigated a bit if we use the latest enqueued skb's source address for the resolving process, which is not as static as the arp_queue's head. This change of the source address could result in better recovery of a failed solicitation. Cc: "David S. Miller" <davem@davemloft.net> Cc: Julian Anastasov <ja@ssi.bg> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: neighbour: Remove CONFIG_ARPDTim Gardner2013-09-031-2/+0
| | | | | | | | | | | | | | | | | | | | | | | This config option is superfluous in that it only guards a call to neigh_app_ns(). Enabling CONFIG_ARPD by default has no change in behavior. There will now be call to __neigh_notify() for each ARP resolution, which has no impact unless there is a user space daemon waiting to receive the notification, i.e., the case for which CONFIG_ARPD was designed anyways. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Joe Perches <joe@perches.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-031-7/+22
|\ | | | | | | | | | | | | Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * neigh: prevent overflowing params in /proc/sys/net/ipv4/neigh/Francesco Fusco2013-07-261-7/+22
| | | | | | | | | | | | | | | | | | Without this patch, the fields app_solicit, gc_thresh1, gc_thresh2, gc_thresh3, proxy_qlen, ucast_solicit, mcast_solicit could have assumed negative values when setting large numbers. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | neighbour: populate neigh_parms on alloc before calling ndo_neigh_setupVeaceslav Falico2013-08-021-4/+6
|/ | | | | | | | dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neighbour: fix a race in neigh_destroy()Eric Dumazet2013-07-011-5/+7
| | | | | | | | | | | | | | | | | | There is a race in neighbour code, because neigh_destroy() uses skb_queue_purge(&neigh->arp_queue) without holding neighbour lock, while other parts of the code assume neighbour rwlock is what protects arp_queue Convert all skb_queue_purge() calls to the __skb_queue_purge() variant Use __skb_queue_head_init() instead of skb_queue_head_init() to make clear we do not use arp_queue.lock And hold neigh->lock in neigh_destroy() to close the race. Reported-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: disallow un-init_net to change thresh of neighGao feng2013-06-191-0/+6
| | | | | | | | thresh and interval are global resources, only init net can change them. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: only allow init_net to change the default neigh_parmsGao feng2013-06-191-1/+1
| | | | | | | | | | | | | Though we don't export the /proc/sys/net/ipv[4,6]/neigh/default/ directory to the un-init_net, but we can still use cmd such as "ip ntable change name arp_cache locktime 129" to change the locktime of default neigh_parms. This patch disallows the un-init_net to find out the neigh_table.parms. So the un-init_net will failed to influence the init_net. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: no need to call lookup_neigh_parms in neigh_parms_allocGao feng2013-06-191-6/+2
| | | | | | | | | neigh_table.parms always exist and is initialized,kmemdup can use it to create new neigh_parms, actually lookup_neigh_parms here will return neigh_table.parms too. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Convert uses of typedef ctl_table to struct ctl_tableJoe Perches2013-06-131-3/+3
| | | | | | | | | | | | | | | | Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net | \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-05-011-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
| * procfs: new helper - PDE_DATA(inode)Al Viro2013-04-091-1/+1
| | | | | | | | | | | | | | | | | | | | The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | neighbour: Convert NEIGH_PRINTK to neigh_dbgJoe Perches2013-04-161-29/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Update debugging messages to a more current style. Emit these debugging messages at KERN_DEBUG instead of KERN_DEFAULT. Add and use neigh_dbg(level, fmt, ...) macro Add dynamic_debug capability via pr_debug Convert embedded function names to "%s: ", __func__ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | rtnetlink: Remove passing of attributes into rtnl_doit functionsThomas Graf2013-03-221-3/+3
|/ | | | | | | | | | With decnet converted, we can finally get rid of rta_buf and its computations around it. It also gets rid of the minimal header length verification since all message handlers do that explicitly anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* net neigh: Optimize neighbor entry size calculation.YOSHIFUJI Hideaki / 吉藤英明2013-01-281-9/+7
| | | | | | | | | | | | | | | | | | | | | | When allocating memory for neighbour cache entry, if tbl->entry_size is not set, we always calculate sizeof(struct neighbour) + tbl->key_len, which is common in the same table. With this change, set tbl->entry_size during the table initialization phase, if it was not set, and use it in neigh_alloc() and neighbour_priv(). This change also allow us to have both of protocol private data and device priate data at tha same time. Note that the only user of prototcol private is DECnet and the only user of device private is ATM CLIP. Since those are exclusive, we have not been facing issues here. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: Keep neighbour cache entries if number of them is small enough.YOSHIFUJI Hideaki / 吉藤英明2013-01-221-0/+4
| | | | | | | | | | | | | | | Since we have removed NCE (Neighbour Cache Entry) reference from routing entries, the only refcnt holders of an NCE are its timer (if running) and its owner table, in usual cases. As a result, neigh_periodic_work() purges NCEs over and over again even for gateways. It does not make sense to purge entries, if number of them is very small, so keep them. The minimum number of entries to keep is specified by gc_thresh1. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix some compiler warning in net/core/neighbour.cCong Wang2012-12-051-3/+2
| | | | | | | | | | | | | net/core/neighbour.c:65:12: warning: 'zero' defined but not used [-Wunused-variable] net/core/neighbour.c:66:12: warning: 'unres_qlen_max' defined but not used [-Wunused-variable] These variables are only used when CONFIG_SYSCTL is defined, so move them under #ifdef CONFIG_SYSCTL. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: neighbour: prohibit negative value for unres_qlen_bytes parameterShan Wei2012-12-051-5/+12
| | | | | | | | | | | | | | | | | | | | | | unres_qlen_bytes and unres_qlen are int type. But multiple relation(unres_qlen_bytes = unres_qlen * SKB_TRUESIZE(ETH_FRAME_LEN)) will cause type overflow when seting unres_qlen. e.g. $ echo 1027506 > /proc/sys/net/ipv4/neigh/eth1/unres_qlen $ cat /proc/sys/net/ipv4/neigh/eth1/unres_qlen 1182657265 $ cat /proc/sys/net/ipv4/neigh/eth1/unres_qlen_bytes -2147479756 The gutted value is not that we setting。 But user/administrator don't know this is caused by int type overflow. what's more, it is meaningless and even dangerous that unres_qlen_bytes is set with negative number. Because, for unresolved neighbour address, kernel will cache packets without limit in __neigh_event_send()(e.g. (u32)-1 = 2GB). Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Enable a userns root rtnl calls that are safe for unprivilged usersEric W. Biederman2012-11-181-9/+0
| | | | | | | | | | | | | | | | | - Only allow moving network devices to network namespaces you have CAP_NET_ADMIN privileges over. - Enable creating/deleting/modifying interfaces - Enable adding/deleting addresses - Enable adding/setting/deleting neighbour entries - Enable adding/removing routes - Enable adding/removing fib rules - Enable setting the forwarding state - Enable adding/removing ipv6 address labels - Enable setting bridge parameter Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Push capable(CAP_NET_ADMIN) into the rtnl methodsEric W. Biederman2012-11-181-0/+9
| | | | | | | | | | | | | | | | | | - In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged users to make netlink calls to modify their local network namespace. - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so that calls that are not safe for unprivileged users are still protected. Later patches will remove the extra capable calls from methods that are safe for unprivilged users. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Don't export sysctls to unprivileged usersEric W. Biederman2012-11-181-0/+4
| | | | | | | | | | | | | In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix skb_under_panic oops in neigh_resolve_outputramesh.nagappa@gmail.com2012-10-071-4/+2
| | | | | | | | | | | | | The retry loop in neigh_resolve_output() and neigh_connected_output() call dev_hard_header() with out reseting the skb to network_header. This causes the retry to fail with skb_under_panic. The fix is to reset the network_header within the retry loop. Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com> Reviewed-by: Shawn Lu <shawn.lu@ericsson.com> Reviewed-by: Robert Coulson <robert.coulson@ericsson.com> Reviewed-by: Billie Alsup <billie.alsup@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2012-10-021-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...
| * netlink: Rename pid to portid to avoid confusionEric W. Biederman2012-09-101-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | workqueue: make deferrable delayed_work initializer names consistentTejun Heo2012-08-211-1/+1
|/ | | | | | | | | | | | | | | | | | Initalizers for deferrable delayed_work are confused. * __DEFERRED_WORK_INITIALIZER() * DECLARE_DEFERRED_WORK() * INIT_DELAYED_WORK_DEFERRABLE() Rename them to * __DEFERRABLE_WORK_INITIALIZER() * DECLARE_DEFERRABLE_WORK() * INIT_DEFERRABLE_WORK() This patch doesn't cause any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
* neigh: Convert over to dst_neigh_lookup_skb().David S. Miller2012-07-051-3/+16
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Make neigh lookups directly in output packet path.David S. Miller2012-07-051-5/+7
| | | | | | Do not use the dst cached neigh, we'll be getting rid of that. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: neighbour: fix neigh_dump_info()Eric Dumazet2012-06-071-8/+6
| | | | | | | | | Denys found out "ip neigh" output was truncated to about 54 neighbours. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: core: Use pr_<level>Joe Perches2012-05-171-6/+7
| | | | | | | | | | | | | Use the current logging style. This enables use of dynamic debugging as well. Convert printk(KERN_<LEVEL> to pr_<level>. Add pr_fmt. Remove embedded prefixes, use %s, __func__ instead. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net neighbour: Convert to use register_net_sysctlEric W. Biederman2012-04-201-27/+6
| | | | | | | | | | | | | Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Move all of the network sysctls without a namespace into init_net.Eric W. Biederman2012-04-201-1/+1
| | | | | | | | | | | | | | | | This makes it clearer which sysctls are relative to your current network namespace. This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces. This is the same way we handle all of our other network interfaces to userspace and I can't honestly remember why we didn't do this for sysctls right from the start. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cleanup unsigned to unsigned intEric Dumazet2012-04-151-1/+1
| | | | | | | Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neighbour: Make neigh_table_init_no_netlink() static.Hiroaki SHIMODA2012-04-131-2/+1
| | | | | | | neigh_table_init_no_netlink() is only used in net/core/neighbour.c file. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neighbour: Stop using NLA_PUT*().David S. Miller2012-04-021-35/+40
| | | | | | | These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-02-261-0/+2
|\ | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/sfc/rx.c Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change the rx_buf->is_page boolean into a set of u16 flags, and another to adjust how ->ip_summed is initialized. Signed-off-by: David S. Miller <davem@davemloft.net>
| * neighbour: Fixed race condition at tbl->nhtMichel Machado2012-02-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the fixed race condition happens: 1. While function neigh_periodic_work scans the neighbor hash table pointed by field tbl->nht, it unlocks and locks tbl->lock between buckets in order to call cond_resched. 2. Assume that function neigh_periodic_work calls cond_resched, that is, the lock tbl->lock is available, and function neigh_hash_grow runs. 3. Once function neigh_hash_grow finishes, and RCU calls neigh_hash_free_rcu, the original struct neigh_hash_table that function neigh_periodic_work was using doesn't exist anymore. 4. Once back at neigh_periodic_work, whenever the old struct neigh_hash_table is accessed, things can go badly. Signed-off-by: Michel Machado <michel@digirati.com.br> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Allow ipv6 proxies and arp proxies be shown with iproute2Tony Zelenoff2012-01-301-3/+87
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add ability to return neighbour proxies list to caller if it sent full ndmsg structure and has NTF_PROXY flag set. Before this patch (and before iproute2 patches): $ ip neigh add proxy 2001::1 dev eth0 $ ip -6 neigh show $ After it and with applied iproute2 patches: $ ip neigh add proxy 2001::1 dev eth0 $ ip -6 neigh show 2001::1 dev eth0 proxy $ Compatibility with old versions of iproute2 is not broken, kernel checks for incoming structure size and properly works if old structure is came. [v2] * changed comments style. * removed useless line with continue and curly bracket. * changed incoming message size check from equal to more or equal. CC: davem@davemloft.net CC: kuznet@ms2.inr.ac.ru CC: netdev@vger.kernel.org CC: xemul@parallels.com Signed-off-by: Tony Zelenoff <antonz@parallels.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Use universal hash for NDISC.David S. Miller2011-12-281-3/+10
| | | | | | | | | In order to perform a proper universal hash on a vector of integers, we have to use different universal hashes on each vector element. Which means we need 4 different hash randoms for ipv6. Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "net: Remove unused neighbour layer ops."David S. Miller2011-12-191-0/+10
| | | | | | | | | This reverts commit 5c3ddec73d01a1fae9409c197078cb02c42238c3. S390 qeth driver actually still uses the setup ops. Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Remove unused neighbour layer ops.David S. Miller2011-12-131-10/+0
| | | | | | | | It's simpler to just keep these things out until there is a real user of them, so we can see what the needs actually are, rather than keep these things around as useless overhead. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller2011-12-051-1/+1
| | | | | | | | To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
* neigh: Add device constructor/destructor capability.David Miller2011-11-301-1/+14
| | | | | | | If the neigh entry has device private state, it will need constructor/destructor ops. Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: Add infrastructure for allocating device neigh privates.David Miller2011-11-301-3/+11
| | | | | | | | | netdev->neigh_priv_len records the private area length. This will trigger for neigh_table objects which set tbl->entry_size to zero, and the first instances of this will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: Get rid of neigh_table->kmem_cachepDavid Miller2011-11-301-16/+2
| | | | | | | | | | | We are going to alloc for device specific private areas for neighbour entries, and in order to do that we have to move away from the fixed allocation size enforced by using neigh_table->kmem_cachep As a nice side effect we can now use kfree_rcu(). Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-11-261-1/+4
|\ | | | | | | | | Conflicts: net/ipv4/inet_diag.c
| * netns: fix proxy ARP entries listing on a netnsJorge Boncompte [DTI2]2011-11-251-1/+4
| | | | | | | | | | | | | | Skip entries from foreign network namespaces. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | neigh: new unresolved queue limitsEric Dumazet2011-11-141-51/+111
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit : > From: David Miller <davem@davemloft.net> > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST) > > > From: Eric Dumazet <eric.dumazet@gmail.com> > > Date: Wed, 09 Nov 2011 12:14:09 +0100 > > > >> unres_qlen is the number of frames we are able to queue per unresolved > >> neighbour. Its default value (3) was never changed and is responsible > >> for strange drops, especially if IP fragments are used, or multiple > >> sessions start in parallel. Even a single tcp flow can hit this limit. > > ... > > > > Ok, I've applied this, let's see what happens :-) > > Early answer, build fails. > > Please test build this patch with DECNET enabled and resubmit. The > decnet neigh layer still refers to the removed ->queue_len member. > > Thanks. Ouch, this was fixed on one machine yesterday, but not the other one I used this morning, sorry. [PATCH V5 net-next] neigh: new unresolved queue limits unres_qlen is the number of frames we are able to queue per unresolved neighbour. Its default value (3) was never changed and is responsible for strange drops, especially if IP fragments are used, or multiple sessions start in parallel. Even a single tcp flow can hit this limit. $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data. 8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: Kill bogus SMP protected debugging message.David S. Miller2011-11-011-5/+1
| | | | | | | | | | | | Whatever situations make this state legitimate when SMP also would be legitimate when !SMP and f.e. preemption is enabled. This is dubious enough that we should just delete it entirely. If we want to add debugging for neigh timer races, better more thorough mechanisms are needed. Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: fix rcu splat in neigh_update()roy.qing.li@gmail.com2011-10-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when use dst_get_neighbour to get neighbour, we need rcu_read_lock to protect, since dst_get_neighbour uses rcu_dereference. The bug was reported by Ari Savolainen <ari.m.savolainen@gmail.com> [ 105.612095] [ 105.612096] =================================================== [ 105.612100] [ INFO: suspicious rcu_dereference_check() usage. ] [ 105.612101] --------------------------------------------------- [ 105.612103] include/net/dst.h:91 invoked rcu_dereference_check() without protection! [ 105.612105] [ 105.612106] other info that might help us debug this: [ 105.612106] [ 105.612108] [ 105.612108] rcu_scheduler_active = 1, debug_locks = 0 [ 105.612110] 1 lock held by dnsmasq/2618: [ 105.612111] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815df8c7>] rtnl_lock+0x17/0x20 [ 105.612120] [ 105.612121] stack backtrace: [ 105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41 [ 105.612125] Call Trace: [ 105.612129] [<ffffffff810ccdcb>] lockdep_rcu_dereference+0xbb/0xc0 [ 105.612132] [<ffffffff815dc5a9>] neigh_update+0x4f9/0x5f0 [ 105.612135] [<ffffffff815da001>] ? neigh_lookup+0xe1/0x220 [ 105.612139] [<ffffffff81639298>] arp_req_set+0xb8/0x230 [ 105.612142] [<ffffffff8163a59f>] arp_ioctl+0x1bf/0x310 [ 105.612146] [<ffffffff810baa40>] ? lock_hrtimer_base.isra.26+0x30/0x60 [ 105.612150] [<ffffffff8163fb75>] inet_ioctl+0x85/0x90 [ 105.612154] [<ffffffff815b5520>] sock_do_ioctl+0x30/0x70 [ 105.612157] [<ffffffff815b55d3>] sock_ioctl+0x73/0x280 [ 105.612162] [<ffffffff811b7698>] do_vfs_ioctl+0x98/0x570 [ 105.612165] [<ffffffff811a5c40>] ? fget_light+0x340/0x3a0 [ 105.612168] [<ffffffff811b7bbf>] sys_ioctl+0x4f/0x80 [ 105.612172] [<ffffffff816fdcab>] system_call_fastpath+0x16/0x1b Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of github.com:davem330/netDavid S. Miller2011-09-221-2/+6
|\ | | | | | | | | | | | | | | | | | | | | | | Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c