summaryrefslogtreecommitdiff
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-07-228-11/+53
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits) sky2: Avoid races in sky2_down drivers/net/mlx4: Adjust constant drivers/net: Move a dereference below a NULL test drivers/net: Move a dereference below a NULL test connector: maintainer/mail update. USB host CDC Phonet network interface driver macsonic, jazzsonic: fix oops on module unload macsonic: move probe function to .devinit.text can: switch carrier on if device was stopped while in bus-off state can: restart device even if dev_alloc_skb() fails can: sja1000: remove duplicated includes New device ID for sc92031 [1088:2031] 3c589_cs: re-initialize the multicast in the tc589_reset Fix error return for setsockopt(SO_TIMESTAMPING) netxen: fix thermal check and shutdown netxen: fix deadlock on dev close netxen: fix context deletion sequence net: Micrel KS8851 SPI network driver tcp: Use correct peer adr when copying MD5 keys tcp: Fix MD5 signature checking on IPv4 mapped sockets ...
| * Fix error return for setsockopt(SO_TIMESTAMPING)Rémi Denis-Courmont2009-07-201-1/+1
| | | | | | | | | | | | | | | | I guess it should be -EINVAL rather than EINVAL. I have not checked when the bug came in. Perhaps a candidate for -stable? Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: Use correct peer adr when copying MD5 keysJohn Dykstra2009-07-202-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | When the TCP connection handshake completes on the passive side, a variety of state must be set up in the "child" sock, including the key if MD5 authentication is being used. Fix TCP for both address families to label the key with the peer's destination address, rather than the address from the listening sock, which is usually the wildcard. Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: Fix MD5 signature checking on IPv4 mapped socketsJohn Dykstra2009-07-203-1/+3
| | | | | | | | | | | | | | | | | | | | | | Fix MD5 signature checking so that an IPv4 active open to an IPv6 socket can succeed. In particular, use the correct address family's signature generation function for the SYN/ACK. Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: sock_copy() fixesEric Dumazet2009-07-161-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e912b1142be8f1e2c71c71001dc992c6e5eb2ec1 (net: sk_prot_alloc() should not blindly overwrite memory) took care of not zeroing whole new socket at allocation time. sock_copy() is another spot where we should be very careful. We should not set refcnt to a non null value, until we are sure other fields are correctly setup, or a lockless reader could catch this socket by mistake, while not fully (re)initialized. This patch puts sk_node & sk_refcnt to the very beginning of struct sock to ease sock_copy() & sk_prot_alloc() job. We add appropriate smp_wmb() before sk_refcnt initializations to match our RCU requirements (changes to sock keys should be committed to memory before sk_refcnt setting) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2009-07-162-5/+21
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
| | * netfilter: nf_conntrack: nf_conntrack_alloc() fixesEric Dumazet2009-07-161-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a slab cache uses SLAB_DESTROY_BY_RCU, we must be careful when allocating objects, since slab allocator could give a freed object still used by lockless readers. In particular, nf_conntrack RCU lookups rely on ct->tuplehash[xxx].hnnode.next being always valid (ie containing a valid 'nulls' value, or a valid pointer to next object in hash chain.) kmem_cache_zalloc() setups object with NULL values, but a NULL value is not valid for ct->tuplehash[xxx].hnnode.next. Fix is to call kmem_cache_alloc() and do the zeroing ourself. As spotted by Patrick, we also need to make sure lookup keys are committed to memory before setting refcount to 1, or a lockless reader could get a reference on the old version of the object. Its key re-check could then pass the barrier. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * netfilter: xt_osf: fix nf_log_packet() argumentsPatrick McHardy2009-07-161-2/+3
| | | | | | | | | | | | | | | | | | | | | The first argument is the address family, the second one the hook number. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | net/can: add module alias to can protocol driversLothar Waßmann2009-07-152-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Add appropriate MODULE_ALIAS() to facilitate autoloading of can protocol drivers Signed-off-by: Lothar Wassmann <LW@KARO-electronics.de> Acked-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/can bugfix: use after free bug in can protocol driversLothar Waßmann2009-07-152-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a use after free bug in can protocol drivers The release functions of the can protocol drivers lack a call to sock_orphan() which leads to referencing freed memory under certain circumstances. This patch fixes a bug reported here: https://lists.berlios.de/pipermail/socketcan-users/2009-July/000985.html Signed-off-by: Lothar Wassmann <LW@KARO-electronics.de> Acked-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | 9p: Possible regression in p9_client_statAbhishek Kulkarni2009-07-141-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a possible regression with p9_client_stat where it can try to kfree an ERR_PTR after an erroneous p9pdu_readf. Also remove an unnecessary data buffer increment in p9_client_read. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | 9p: default 9p transport module fixAbhishek Kulkarni2009-07-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | The default 9p transport module is not chosen unless an option parameter (any) is passed to mount, which thus returns a ENOPROTOSUPPORT. This fix moves the check out of parse_opts into p9_client_create. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-07-145-6/+20
|\ \ \ | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: Revert "NET: Fix locking issues in PPP, 6pack, mkiss and strip line disciplines." skbuff.h: Fix comment for NET_IP_ALIGN drivers/net: using spin_lock_irqsave() in net_send_packet() NET: phy_device, fix lock imbalance gre: fix ToS/DiffServ inherit bug igb: gcc-3.4.6 fix atlx: duplicate testing of MCAST flag NET: Fix locking issues in PPP, 6pack, mkiss and strip line disciplines. netdev: restore MTU change operation netdev: restore MAC address set and validate operations sit: fix regression: do not release skb->dst before xmit net: ip_push_pending_frames() fix net: sk_prot_alloc() should not blindly overwrite memory
| * | gre: fix ToS/DiffServ inherit bugAndreas Jaggi2009-07-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes two bugs: - ToS/DiffServ inheritance was unintentionally activated when using impair fixed ToS values - ECN bit was lost during ToS/DiffServ inheritance Signed-off-by: Andreas Jaggi <aj@open.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sit: fix regression: do not release skb->dst before xmitSascha Hlusiak2009-07-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sit module makes use of skb->dst in it's xmit function, so since 93f154b594fe47 ("net: release dst entry in dev_hard_start_xmit()") sit tunnels are broken, because the flag IFF_XMIT_DST_RELEASE is not unset. This patch unsets that flag for sit devices to fix this regression. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ip_push_pending_frames() fixEric Dumazet2009-07-112-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) we do not take any more references on sk->sk_refcnt on outgoing packets. I forgot to delete two __sock_put() from ip_push_pending_frames() and ip6_push_pending_frames(). Reported-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: sk_prot_alloc() should not blindly overwrite memoryEric Dumazet2009-07-111-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some sockets use SLAB_DESTROY_BY_RCU, and our RCU code correctness depends on sk->sk_nulls_node.next being always valid. A NULL value is not allowed as it might fault a lockless reader. Current sk_prot_alloc() implementation doesnt respect this hypothesis, calling kmem_cache_alloc() with __GFP_ZERO. Just call memset() around the forbidden field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | headers: smp_lock.h reduxAlexey Dobriyan2009-07-1210-3/+7
|/ / | | | | | | | | | | | | | | | | | | | | | | | | * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | net: adding memory barrier to the poll and receive callbacksJiri Olsa2009-07-099-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding memory barrier after the poll_wait function, paired with receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper to wrap the memory barrier. Without the memory barrier, following race can happen. The race fires, when following code paths meet, and the tp->rcv_nxt and __add_wait_queue updates stay in CPU caches. CPU1 CPU2 sys_select receive packet ... ... __add_wait_queue update tp->rcv_nxt ... ... tp->rcv_nxt check sock_def_readable ... { schedule ... if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) wake_up_interruptible(sk->sk_sleep) ... } If there was no cache the code would work ok, since the wait_queue and rcv_nxt are opposit to each other. Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already passed the tp->rcv_nxt check and sleeps, or will get the new value for tp->rcv_nxt and will return with new data mask. In both cases the process (CPU1) is being added to the wait queue, so the waitqueue_active (CPU2) call cannot miss and will wake up CPU1. The bad case is when the __add_wait_queue changes done by CPU1 stay in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then endup calling schedule and sleep forever if there are no more data on the socket. Calls to poll_wait in following modules were ommited: net/bluetooth/af_bluetooth.c net/irda/af_irda.c net/irda/irnet/irnet_ppp.c net/mac80211/rc80211_pid_debugfs.c net/phonet/socket.c net/rds/af_rds.c net/rfkill/core.c net/sunrpc/cache.c net/sunrpc/rpc_pipe.c net/tipc/socket.c Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netpoll: Fix carrier detection for drivers that are using phylibAnton Vorontsov2009-07-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using early netconsole and gianfar driver this error pops up: netconsole: timeout waiting for carrier It appears that net/core/netpoll.c:netpoll_setup() is using cond_resched() in a loop waiting for a carrier. The thing is that cond_resched() is a no-op when system_state != SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never scheduled, therefore link detection doesn't work. I belive that the main problem is in cond_resched()[1], but despite how the cond_resched() story ends, it might be a good idea to call msleep(1) instead of cond_resched(), as suggested by Andrew Morton. [1] http://lkml.org/lkml/2009/7/7/463 Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2009-07-084-3/+6
|\ \ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
| * | mac80211: minstrel: avoid accessing negative indices in rix_to_ndx()Luciano Coelho2009-07-071-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If rix is not found in mi->r[], i will become -1 after the loop. This value is eventually used to access arrays, so we were accessing arrays with a negative index, which is obviously not what we want to do. This patch fixes this potential problem. Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | cfg80211: fix refcount leakJohannes Berg2009-07-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in cfg80211's cfg80211_bss_update erroneously grabs a reference to the BSS, which means that it will never be freed. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: stable@kernel.org [2.6.29, 2.6.30] Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | mac80211: fix allocation in mesh_queue_preqAndrey Yurovsky2009-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We allocate a PREQ queue node in mesh_queue_preq, however the allocation may cause us to sleep. Use GFP_ATOMIC to prevent this. [ 1869.126498] BUG: scheduling while atomic: ping/1859/0x10000100 [ 1869.127164] Modules linked in: ath5k mac80211 ath [ 1869.128310] Pid: 1859, comm: ping Not tainted 2.6.30-wl #1 [ 1869.128754] Call Trace: [ 1869.129293] [<c1023a2b>] __schedule_bug+0x48/0x4d [ 1869.129866] [<c13b5533>] __schedule+0x77/0x67a [ 1869.130544] [<c1026f2e>] ? release_console_sem+0x17d/0x185 [ 1869.131568] [<c807cf47>] ? mesh_queue_preq+0x2b/0x165 [mac80211] [ 1869.132318] [<c13b5b3e>] schedule+0x8/0x1f [ 1869.132807] [<c1023c12>] __cond_resched+0x16/0x2f [ 1869.133478] [<c13b5bf0>] _cond_resched+0x27/0x32 [ 1869.134191] [<c108a370>] kmem_cache_alloc+0x1c/0xcf [ 1869.134714] [<c10273ae>] ? printk+0x15/0x17 [ 1869.135670] [<c807cf47>] mesh_queue_preq+0x2b/0x165 [mac80211] [ 1869.136731] [<c807d1f8>] mesh_nexthop_lookup+0xee/0x12d [mac80211] [ 1869.138130] [<c807417e>] ieee80211_xmit+0xe6/0x2b2 [mac80211] [ 1869.138935] [<c80be46d>] ? ath5k_hw_setup_rx_desc+0x0/0x66 [ath5k] [ 1869.139831] [<c80c97bc>] ? ath5k_tasklet_rx+0xba/0x506 [ath5k] [ 1869.140863] [<c8075191>] ieee80211_subif_start_xmit+0x6c9/0x6e4 [mac80211] [ 1869.141665] [<c105cf1c>] ? handle_level_irq+0x78/0x9d [ 1869.142390] [<c12e3f93>] dev_hard_start_xmit+0x168/0x1c7 [ 1869.143092] [<c12f1f17>] __qdisc_run+0xe1/0x1b7 [ 1869.143612] [<c12e25ff>] qdisc_run+0x18/0x1a [ 1869.144248] [<c12e62f4>] dev_queue_xmit+0x16a/0x25a [ 1869.144785] [<c13b6dcc>] ? _read_unlock_bh+0xe/0x10 [ 1869.145465] [<c12eacdb>] neigh_resolve_output+0x19c/0x1c7 [ 1869.146182] [<c130e2da>] ? ip_finish_output+0x0/0x51 [ 1869.146697] [<c130e2a0>] ip_finish_output2+0x182/0x1bc [ 1869.147358] [<c130e327>] ip_finish_output+0x4d/0x51 [ 1869.147863] [<c130e9d5>] ip_output+0x80/0x85 [ 1869.148515] [<c130cc49>] dst_output+0x9/0xb [ 1869.149141] [<c130dec6>] ip_local_out+0x17/0x1a [ 1869.149632] [<c130e0bc>] ip_push_pending_frames+0x1f3/0x255 [ 1869.150343] [<c13247ff>] raw_sendmsg+0x5e6/0x667 [ 1869.150883] [<c1033c55>] ? insert_work+0x6a/0x73 [ 1869.151834] [<c8071e00>] ? ieee80211_invoke_rx_handlers+0x17da/0x1ae8 [mac80211] [ 1869.152630] [<c132bd68>] inet_sendmsg+0x3b/0x48 [ 1869.153232] [<c12d7deb>] __sock_sendmsg+0x45/0x4e [ 1869.153740] [<c12d8537>] sock_sendmsg+0xb8/0xce [ 1869.154519] [<c80be46d>] ? ath5k_hw_setup_rx_desc+0x0/0x66 [ath5k] [ 1869.155289] [<c1036b25>] ? autoremove_wake_function+0x0/0x30 [ 1869.155859] [<c115992b>] ? __copy_from_user_ll+0x11/0xce [ 1869.156573] [<c1159d99>] ? copy_from_user+0x31/0x54 [ 1869.157235] [<c12df646>] ? verify_iovec+0x40/0x6e [ 1869.157778] [<c12d869a>] sys_sendmsg+0x14d/0x1a5 [ 1869.158714] [<c8072c40>] ? __ieee80211_rx+0x49e/0x4ee [mac80211] [ 1869.159641] [<c80c83fe>] ? ath5k_rxbuf_setup+0x6d/0x8d [ath5k] [ 1869.160543] [<c80be46d>] ? ath5k_hw_setup_rx_desc+0x0/0x66 [ath5k] [ 1869.161434] [<c80beba4>] ? ath5k_hw_get_rxdp+0xe/0x10 [ath5k] [ 1869.162319] [<c80c97bc>] ? ath5k_tasklet_rx+0xba/0x506 [ath5k] [ 1869.163063] [<c1005627>] ? enable_8259A_irq+0x40/0x43 [ 1869.163594] [<c101edb8>] ? __dequeue_entity+0x23/0x27 [ 1869.164793] [<c100187a>] ? __switch_to+0x2b/0x105 [ 1869.165442] [<c1021d5f>] ? finish_task_switch+0x5b/0x74 [ 1869.166129] [<c12d963a>] sys_socketcall+0x14b/0x17b [ 1869.166612] [<c1002b95>] syscall_call+0x7/0xb Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | Wireless: nl80211, fix lock imbalanceJiri Slaby2009-07-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Don't forget to unlock cfg80211_mutex in one fail path of nl80211_set_wiphy. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | | ipv4: Fix fib_trie rebalancing, part 4 (root thresholds)Jarek Poplawski2009-07-081-2/+2
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pawel Staszewski wrote: <blockquote> Some time ago i report this: http://bugzilla.kernel.org/show_bug.cgi?id=6648 and now with 2.6.29 / 2.6.29.1 / 2.6.29.3 and 2.6.30 it back dmesg output: oprofile: using NMI interrupt. Fix inflate_threshold_root. Now=15 size=11 bits ... Fix inflate_threshold_root. Now=15 size=11 bits cat /proc/net/fib_triestat Basic info: size of leaf: 40 bytes, size of tnode: 56 bytes. Main: Aver depth: 2.28 Max depth: 6 Leaves: 276539 Prefixes: 289922 Internal nodes: 66762 1: 35046 2: 13824 3: 9508 4: 4897 5: 2331 6: 1149 7: 5 9: 1 18: 1 Pointers: 691228 Null ptrs: 347928 Total size: 35709 kB </blockquote> It seems, the current threshold for root resizing is too aggressive, and it causes misleading warnings during big updates, but it might be also responsible for memory problems, especially with non-preempt configs, when RCU freeing is delayed long after call_rcu. It should be also mentioned that because of non-atomic changes during resizing/rebalancing the current lookup algorithm can miss valid leaves so it's additional argument to shorten these activities even at a cost of a minimally longer searching. This patch restores values before the patch "[IPV4]: fib_trie root node settings", commit: 965ffea43d4ebe8cd7b9fee78d651268dd7d23c5 from v2.6.22. Pawel's report: <blockquote> I dont see any big change of (cpu load or faster/slower routing/propagating routes from bgpd or something else) - in avg there is from 2% to 3% more of CPU load i dont know why but it is - i change from "preempt" to "no preempt" 3 times and check this my "mpstat -P ALL 1 30" always avg cpu load was from 2 to 3% more compared to "no preempt" [...] cat /proc/net/fib_triestat Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes. Main: Aver depth: 2.44 Max depth: 6 Leaves: 277814 Prefixes: 291306 Internal nodes: 66420 1: 32737 2: 14850 3: 10332 4: 4871 5: 2313 6: 942 7: 371 8: 3 17: 1 Pointers: 599098 Null ptrs: 254865 Total size: 18067 kB </blockquote> According to this and other similar reports average depth is slightly increased (~0.2), and root nodes are shorter (log 17 vs. 18), but there is no visible performance decrease. So, until memory handling is improved or added parameters for changing this individually, this patch resets to safer defaults. Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Reported-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: fix warning at inet_sock_destruct() while release sctp socketWei Yongjun2009-07-061-23/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 'net: Move rx skb_orphan call to where needed' broken sctp protocol with warning at inet_sock_destruct(). Actually, sctp can do this right with sctp_sock_rfree_frag() and sctp_skb_set_owner_r_frag() pair. sctp_sock_rfree_frag(skb); sctp_skb_set_owner_r_frag(skb, newsk); This patch not revert the commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f, instead remove the sctp_sock_rfree_frag() function. ------------[ cut here ]------------ WARNING: at net/ipv4/af_inet.c:151 inet_sock_destruct+0xe0/0x142() Modules linked in: sctp ipv6 dm_mirror dm_region_hash dm_log dm_multipath scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan] Pid: 1808, comm: sctp_test Not tainted 2.6.31-rc2 #40 Call Trace: [<c042dd06>] warn_slowpath_common+0x6a/0x81 [<c064a39a>] ? inet_sock_destruct+0xe0/0x142 [<c042dd2f>] warn_slowpath_null+0x12/0x15 [<c064a39a>] inet_sock_destruct+0xe0/0x142 [<c05fde44>] __sk_free+0x19/0xcc [<c05fdf50>] sk_free+0x18/0x1a [<ca0d14ad>] sctp_close+0x192/0x1a1 [sctp] [<c0649f7f>] inet_release+0x47/0x4d [<c05fba4d>] sock_release+0x19/0x5e [<c05fbab3>] sock_close+0x21/0x25 [<c049c31b>] __fput+0xde/0x189 [<c049c3de>] fput+0x18/0x1a [<c049988f>] filp_close+0x56/0x60 [<c042f422>] put_files_struct+0x5d/0xa1 [<c042f49f>] exit_files+0x39/0x3d [<c043086a>] do_exit+0x1a5/0x5dd [<c04a86c2>] ? d_kill+0x35/0x3b [<c0438fa4>] ? dequeue_signal+0xa6/0x115 [<c0430d05>] do_group_exit+0x63/0x8a [<c0439504>] get_signal_to_deliver+0x2e1/0x2f9 [<c0401d9e>] do_notify_resume+0x7c/0x6b5 [<c043f601>] ? autoremove_wake_function+0x0/0x34 [<c04a864e>] ? __d_free+0x3d/0x40 [<c04a867b>] ? d_free+0x2a/0x3c [<c049ba7e>] ? vfs_write+0x103/0x117 [<c05fc8fa>] ? sys_socketcall+0x178/0x182 [<c0402a56>] work_notifysig+0x13/0x19 ---[ end trace 9db92c463e789fba ]--- Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | dsa: fix 88e6xxx statistics counter snapshottingStephane Contri2009-07-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The bit that tells us whether a statistics counter snapshot operation has completed is located in the GLOBAL register block, not in the GLOBAL2 register block, so fix up mv88e6xxx_stats_wait() to poll the right register address. Signed-off-by: Stephane Contri <Stephane.Contri@grassvalley.com> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* | IPv6: preferred lifetime of address not getting updatedBrian Haley2009-07-031-3/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a bug in addrconf_prefix_rcv() where it won't update the preferred lifetime of an IPv6 address if the current valid lifetime of the address is less than 2 hours (the minimum value in the RA). For example, If I send a router advertisement with a prefix that has valid lifetime = preferred lifetime = 2 hours we'll build this address: 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic valid_lft 7175sec preferred_lft 7175sec If I then send the same prefix with valid lifetime = preferred lifetime = 0 it will be ignored since the minimum valid lifetime is 2 hours: 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic valid_lft 7161sec preferred_lft 7161sec But according to RFC 4862 we should always reset the preferred lifetime even if the valid lifetime is invalid, which would cause the address to immediately get deprecated. So with this patch we'd see this: 5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic valid_lft 7163sec preferred_lft 0sec The comment winds-up being 5x the size of the code to fix the problem. Update the preferred lifetime of IPv6 addresses derived from a prefix info option in a router advertisement even if the valid lifetime in the option is invalid, as specified in RFC 4862 Section 5.5.3e. Fixes an issue where an address will not immediately become deprecated. Reported by Jens Rosenboom. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | xfrm6: fix the proto and ports decode of sctp protocolWei Yongjun2009-07-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | The SCTP pushed the skb above the sctp chunk header, so the check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in _decode_session6() will never return 0 and the ports decode of sctp will always fail. (nh + offset + 1 - skb->data < 0) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | xfrm4: fix the ports decode of sctp protocolWei Yongjun2009-07-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | The SCTP pushed the skb data above the sctp chunk header, so the check of pskb_may_pull(skb, xprth + 4 - skb->data) in _decode_session4() will never return 0 because xprth + 4 - skb->data < 0, the ports decode of sctp will always fail. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/9p: Fix crash due to bad mount parameters.Abhishek Kulkarni2009-07-021-6/+8
| | | | | | | | | | | | | | | | | | It is not safe to use match_int without checking the token type returned by match_token (especially when the token type returned is Opt_err and args is empty). Fix it. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Revert "ipv4: arp announce, arp_proxy and windows ip conflict verification"Eric W. Biederman2009-06-301-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 73ce7b01b4496a5fbf9caf63033c874be692333f. After discovering that we don't listen to gratuitious arps in 2.6.30 I tracked the failure down to this commit. The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227. are all in agreement that an arp request with sip == 0 should be used for the probe (to prevent learning) and an arp request with sip == tip should be used for the gratitous announcement that people can learn from. It appears the author of the broken patch got those two cases confused and modified the code to drop all gratuitous arp traffic. Ouch! Cc: stable@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Fix fib_trie rebalancing, part 3Jarek Poplawski2009-06-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | Alas current delaying of freeing old tnodes by RCU in trie_rebalance is still not enough because we can free a top tnode before updating a t->trie pointer. Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: xmit sctp packet always return no route errorWei Yongjun2009-06-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | Commit 'net: skb->dst accessors'(adf30907d63893e4208dfe3f5c88ae12bc2f25d5) broken the sctp protocol stack, the sctp packet can never be sent out after Eric Dumazet's patch, which have typo in the sctp code. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Vlad Yasevich <vladisalv.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | xfrm: use xfrm_addr_cmp() instead of compare addresses directlyWei Yongjun2009-06-291-49/+8
| | | | | | | | | | | | | | | | Clean up to use xfrm_addr_cmp() instead of compare addresses directly. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Do not tack on TSO data to non-TSO packetHerbert Xu2009-06-291-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | If a socket starts out on a non-TSO route, and then switches to a TSO route, then we will tack on data to the tail of the tx queue even if it started out life as non-TSO. This is suboptimal because all of it will then be copied and checksummed unnecessarily. This patch fixes this by ensuring that skb->ip_summed is set to CHECKSUM_PARTIAL before appending extra data beyond the MSS. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Stop non-TSO packets morphing into TSOHerbert Xu2009-06-291-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If a socket starts out on a non-TSO route, and then switches to a TSO route, then the tail on the tx queue can morph into a TSO packet, causing mischief because the rest of the stack does not expect a partially linear TSO packet. This patch fixes this by ensuring that skb->ip_summed is set to CHECKSUM_PARTIAL before declaring a packet as TSO. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-linus' of ↵David S. Miller2009-06-291-0/+6
|\ \ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan
| * | nl802154: add module license and descriptionDmitry Eremin-Solenikov2009-06-291-0/+3
| | | | | | | | | | | | Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
| * | nl802154: fix Oops in ieee802154_nl_get_devDmitry Eremin-Solenikov2009-06-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | ieee802154_nl_get_dev() lacks check for the existance of the device that was returned by dev_get_XXX, thus resulting in Oops for non-existing devices. Fix it. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
* | | Merge branch 'master' of ↵David S. Miller2009-06-295-17/+78
|\ \ \ | | |/ | |/| | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
| * | netfilter: xtables: conntrack match revision 2Jan Engelhardt2009-06-291-6/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Philip, the UNTRACKED state bit does not fit within the 8-bit state_mask member. Enlarge state_mask and give status_mask a few more bits too. Reported-by: Philip Craig <philipc@snapgear.com> References: http://markmail.org/thread/b7eg6aovfh4agyz7 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | netfilter: tcp conntrack: fix unacknowledged data detection with NATPatrick McHardy2009-06-292-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When NAT helpers change the TCP packet size, the highest seen sequence number needs to be corrected. This is currently only done upwards, when the packet size is reduced the sequence number is unchanged. This causes TCP conntrack to falsely detect unacknowledged data and decrease the timeout. Fix by updating the highest seen sequence number in both directions after packet mangling. Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | nf_conntrack: Use rcu_barrier()Jesper Dangaard Brouer2009-06-252-2/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCU barriers, rcu_barrier(), is inserted two places. In nf_conntrack_expect.c nf_conntrack_expect_fini() before the kmem_cache_destroy(). Firstly to make sure the callback to the nf_ct_expect_free_rcu() code is still around. Secondly because I'm unsure about the consequence of having in flight nf_ct_expect_free_rcu/kmem_cache_free() calls while doing a kmem_cache_destroy() slab destroy. And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is invoked by __nf_ct_ext_add(). It might be more efficient to call rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but thats make it more difficult to read the code (as the callback code in located in nf_conntrack_extend.c). Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | gro: Flush GRO packets in napi_disable_pending pathHerbert Xu2009-06-261-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When NAPI is disabled while we're in net_rx_action, we end up calling __napi_complete without flushing GRO packets. This is a bug as it would cause the GRO packets to linger, of course it also literally BUGs to catch error like this :) This patch changes it to napi_complete, with the obligatory IRQ reenabling. This should be safe because we've only just disabled IRQs and it does not materially affect the test conditions in between. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Call skb_orphan before tproxy activatesHerbert Xu2009-06-262-0/+6
| | | | | | | | | | | | | | | | | | | | As transparent proxying looks up the socket early and assigns it to the skb for later processing, we must drop any existing socket ownership prior to that in order to distinguish between the case where tproxy is active and where it is not. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mac80211: Use rcu_barrier() on unload.Jesper Dangaard Brouer2009-06-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The mac80211 module uses rcu_call() thus it should use rcu_barrier() on module unload. The rcu_barrier() is placed in mech.c ieee80211_stop_mesh() which is invoked from ieee80211_stop() in case vif.type == NL80211_IFTYPE_MESH_POINT. Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sunrpc: Use rcu_barrier() on unload.Jesper Dangaard Brouer2009-06-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sunrpc module uses rcu_call() thus it should use rcu_barrier() on module unload. Have not verified that the possibility for new call_rcu() callbacks has been disabled. As a hint for checking, the functions calling call_rcu() (unx_destroy_cred and generic_destroy_cred) are registered as crdestroy function pointer in struct rpc_credops. Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bridge: Use rcu_barrier() instead of syncronize_net() on unload.Jesper Dangaard Brouer2009-06-261-1/+1
| | | | | | | | | | | | | | | | | | | | When unloading modules that uses call_rcu() callbacks, then we must use rcu_barrier(). This module uses syncronize_net() which is not enough to be sure that all callback has been completed. Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>