When CONFIG_MODULES is not set, lockdep fails to compile:
|kernel/locking/lockdep.c: In function 'look_up_lock_class':
|kernel/locking/lockdep.c:684:12: error: implicit declaration of function
| '__is_module_percpu_address' [-Werror=implicit-function-declaration]
If CONFIG_MODULES is set but CONFIG_SMP is not, then it compiles but
fails to link at the end:
|kernel/locking/lockdep.c:684: undefined reference to `__is_module_percpu_address'
|kernel/built-in.o:(.debug_addr+0x1e674): undefined reference to `__is_module_percpu_address'
This patch adds the function for both cases.
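A minimal sketch of the added fallback, assuming the usual pattern of
stubbing out the helper when it cannot match anything (the exact signature
is an assumption):

  /* sketch: without modules (or SMP) there is no per-CPU area to match */
  static inline bool __is_module_percpu_address(unsigned long addr,
                                                unsigned long *can_addr)
  {
          return false;
  }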
Signed-off-by: Dan Murphy <dmurphy@ti.com>
[bigeasy: merge the two patches from Dan into one, adapt changelog]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
What we have now should be enough, the EXPORT_SYMBOL statement for
rt_mutex_destroy() is not required.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If a PER_CPU struct which contains a spin_lock is statically initialized
via:
DEFINE_PER_CPU(struct foo, bla) = {
.lock = __SPIN_LOCK_UNLOCKED(bla.lock)
};
then lockdep assigns a separate key to each lock, because the logic for
assigning a key to statically initialized locks is to use the address as
the key. With per-CPU locks the address is obviously different on each CPU.
That's wrong, because all of these locks should share the same key.
To solve this the following modifications are required:
1) Extend the is_kernel/module_percpu_addr() functions to hand back the
canonical address of the per CPU address, i.e. the per CPU address
minus the per CPU offset.
2) Check the lock address with these functions and if the per CPU check
matches use the returned canonical address as the lock key, so all per
CPU locks have the same key.
3) Move the static_obj(key) check into look_up_lock_class() so this check
can be avoided for statically initialized per CPU locks. That's
required because the canonical address fails the static_obj(key) check
for obvious reasons.
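A sketch of the resulting lookup (simplified; the helper follows the
changelog, its exact signature is an assumption):

  /* sketch: all per-CPU instances of a lock map to the same key */
  unsigned long can_addr;

  if (__is_kernel_percpu_address((unsigned long)lock, &can_addr))
          key = (void *)can_addr;  /* per-CPU address minus per-CPU offset */
  else if (static_obj(lock))
          key = lock;              /* ordinary static lock: address is key */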
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Alex Goins reported that mutex_destroy() on RT pulls in a GPL-only symbol
which won't link and therefore fails on a non-GPL kernel module.
This does not happen on !RT and is a regression on RT which we would like to
avoid.
The easy fix would be to not use rt_mutex_destroy() if CONFIG_DEBUG_MUTEXES
is not enabled. That would still break for the DEBUG configs, so instead of
adding a wrapper around rt_mutex_destroy() (which we have for rt_mutex_lock(),
for instance) I am simply dropping the GPL part from the export.
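The change boils down to (sketch):

  -EXPORT_SYMBOL_GPL(rt_mutex_destroy);
  +EXPORT_SYMBOL(rt_mutex_destroy);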
Reported-by: Alex Goins <agoins@nvidia.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Although wbinvd() is faster than flushing many individual pages, it
blocks the memory bus for "long" periods of time (>100us), thus
directly causing unusually large latencies on all CPUs, regardless
of any CPU isolation features that may be active.
For 1024 pages, flushing those pages individually can take up to
2200us, but the task remains fully preemptible during that time.
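A sketch of the preemptible per-page variant (assuming a clflush-based
helper like clflush_cache_range()):

  /* sketch: flush page by page; every iteration is a preemption point */
  for (i = 0; i < numpages; i++)
          clflush_cache_range(page_address(pages[i]), PAGE_SIZE);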
Cc: stable-rt@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The preload functionality uses per-CPU variables and preempt_disable() to
ensure that it does not switch CPUs during its usage. This patch uses
local_lock() instead of preempt_disable() for the same purpose, while
remaining preemptible on -RT.
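A sketch of the pattern (the lock name is an assumption):

  static DEFINE_LOCAL_IRQ_LOCK(radix_tree_preloads_lock);

  /* sketch: pins us to the CPU's preload pool, yet preemptible on -RT */
  local_lock(radix_tree_preloads_lock);
  rtp = this_cpu_ptr(&radix_tree_preloads);
  ...
  local_unlock(radix_tree_preloads_lock);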
Cc: stable-rt@vger.kernel.org
Reported-and-debugged-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The RCU-sched protection was changed to plain RCU, and so all IRQ-off and
preempt-off disabled regions were changed to the relevant rcu-read-lock
primitives. One was missed and triggered:
|[ BUG: bad unlock balance detected! ]
|4.4.30-rt41 #51 Tainted: G W
|btattach/345 is trying to release lock (
|Unable to handle kernel paging request at virtual address 6b6b6bbb
|Backtrace:
|[<c016b5a0>] (lock_release) from [<c0804844>] (rt_spin_unlock+0x20/0x30)
|[<c0804824>] (rt_spin_unlock) from [<c0138954>] (put_pwq_unlocked+0xa4/0x118)
|[<c01388b0>] (put_pwq_unlocked) from [<c0138b2c>] (destroy_workqueue+0x164/0x1b0)
|[<c01389c8>] (destroy_workqueue) from [<c078e1ac>] (hci_unregister_dev+0x120/0x21c)
|[<c078e08c>] (hci_unregister_dev) from [<c054f658>] (hci_uart_tty_close+0x90/0xbc)
|[<c054f5c8>] (hci_uart_tty_close) from [<c03a2be8>] (tty_ldisc_close+0x50/0x58)
|[<c03a2b98>] (tty_ldisc_close) from [<c03a2cb4>] (tty_ldisc_kill+0x18/0x78)
|[<c03a2c9c>] (tty_ldisc_kill) from [<c03a3528>] (tty_ldisc_release+0x100/0x134)
|[<c03a3428>] (tty_ldisc_release) from [<c039cd68>] (tty_release+0x3bc/0x460)
|[<c039c9ac>] (tty_release) from [<c020cc08>] (__fput+0xe0/0x1b4)
|[<c020cb28>] (__fput) from [<c020cd3c>] (____fput+0x10/0x14)
|[<c020cd2c>] (____fput) from [<c013e0d4>] (task_work_run+0xa4/0xb8)
|[<c013e030>] (task_work_run) from [<c0121754>] (do_exit+0x40c/0x8b0)
|[<c0121348>] (do_exit) from [<c0122ff8>] (do_group_exit+0x54/0xc4)
Cc: stable-rt@vger.kernel.org
Reported-by: John Keeping <john@metanate.com>
Tested-by: John Keeping <john@metanate.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Debian started to build gcc with -fPIE by default, so the kernel build
fails before it properly starts with:
|kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Line up helper arrows to the right column.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: fixup function tracer header]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
It has been pointed out by tglx that on UP the non-RT task could spin for
its entire time slice because the lock owner is preempted. This won't
happen on !RT. So we fall back to "chill" if cond_resched() did not
work.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Upstream commit 47be61845c77 ("fs/dcache.c: avoid soft-lockup in
dput()") changed the condition _when_ cpu_relax() / cond_resched() was
invoked. This change was adapted in -RT into mostly the same thing
except that if cond_resched() did nothing we had to do cpu_chill() to
force the task off CPU for a tiny little bit in case the task had RT
priority and did not want to leave the CPU.
This change resulted in a performance regression (in my testcase the
build time on /dev/shm increased from 19min to 24min). The reason is
that with this change cpu_chill() was invoked even if dput() made progress
(dentry_kill() returned a different dentry), instead of only when we were
trying this operation on the same dentry over and over again.
This patch brings back the old behavior: cond_resched() & chill only if we
make no progress. A little improvement is to invoke cpu_chill() only if we
are an RT task (and avoid the sleep otherwise).
Otherwise the scheduler should remove us from the CPU if we make no
progress.
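A sketch of the resulting loop body (names simplified):

  /* sketch: only back off when stuck on the very same dentry */
  if (dentry == old_dentry) {
          if (!cond_resched() && rt_task(current))
                  cpu_chill();    /* briefly force the RT task off the CPU */
  }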
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
It looks like the this_cpu_ptr() access in icmp_sk() is protected with
local_bh_disable(). To avoid missing serialization in -RT I am adding
a local lock here. No crash has been observed; this is just a precaution.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Some time ago Sami Pietikäinen reported a crash on -RT in
ip_send_unicast_reply() which was later fixed by Nicholas Mc Guire
(v3.12.8-rt11). Later (v3.18.8) the code was reworked and I dropped the
patch. As it turns out, that was a mistake.
I have reports that the same crash is possible with a similar backtrace.
It seems that vanilla protects access to this_cpu_ptr() via
local_bh_disable(). This does not work on -RT since we can have
NET_RX and NET_TX running in parallel on the same CPU.
This brings back the old locks.
|Unable to handle kernel NULL pointer dereference at virtual address 00000010
|PC is at __ip_make_skb+0x198/0x3e8
|[<c04e39d8>] (__ip_make_skb) from [<c04e3ca8>] (ip_push_pending_frames+0x20/0x40)
|[<c04e3ca8>] (ip_push_pending_frames) from [<c04e3ff0>] (ip_send_unicast_reply+0x210/0x22c)
|[<c04e3ff0>] (ip_send_unicast_reply) from [<c04fbb54>] (tcp_v4_send_reset+0x190/0x1c0)
|[<c04fbb54>] (tcp_v4_send_reset) from [<c04fcc1c>] (tcp_v4_do_rcv+0x22c/0x288)
|[<c04fcc1c>] (tcp_v4_do_rcv) from [<c0474364>] (release_sock+0xb4/0x150)
|[<c0474364>] (release_sock) from [<c04ed904>] (tcp_close+0x240/0x454)
|[<c04ed904>] (tcp_close) from [<c0511408>] (inet_release+0x74/0x7c)
|[<c0511408>] (inet_release) from [<c0470728>] (sock_release+0x30/0xb0)
|[<c0470728>] (sock_release) from [<c0470abc>] (sock_close+0x1c/0x24)
|[<c0470abc>] (sock_close) from [<c0115ec4>] (__fput+0xe8/0x20c)
|[<c0115ec4>] (__fput) from [<c0116050>] (____fput+0x18/0x1c)
|[<c0116050>] (____fput) from [<c0058138>] (task_work_run+0xa4/0xb8)
|[<c0058138>] (task_work_run) from [<c0011478>] (do_work_pending+0xd0/0xe4)
|[<c0011478>] (do_work_pending) from [<c000e740>] (work_pending+0xc/0x20)
|Code: e3530001 8a000001 e3a00040 ea000011 (e5973010)
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
During master->rt merge, I stumbled across the buglet below.
Fix get_cpu()/put_cpu_light() imbalance.
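For reference, the -RT pairing (sketch):

  cpu = get_cpu_light();  /* migrate_disable() based, preemptible on -RT */
  ...
  put_cpu_light();        /* must match get_cpu_light(), not get_cpu() */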
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There should be no need to hold the base lock during the wakeup: there
should be no boosting involved, and the wakeup list has its own lock, so it
should be safe to do this without the base lock.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The base lock is dropped during the invocation of the timer. That means
it is possible that we have one waiter while timer1 is running, and once
that one has finished we get another waiter while timer2 is running. Since
we wake up only one waiter it is possible that we miss the other one.
This will probably heal itself over time because most of the time we
complete timers without an active wake up.
To avoid the scenario where we don't wake up all waiters at once,
wake_up_all() is used.
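The change boils down to (sketch; the waitqueue name is an assumption):

  -     wake_up(&base->wait_for_running_timer);
  +     wake_up_all(&base->wait_for_running_timer);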
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On some x86 systems an MCE interrupt would come in before the kernel
was ready for it. Looking at the latest RT code, it has similar
(but not quite the same) code, except it adds a bool that tells if
MCE handling is initialized. That was required because they had
switched to use swork instead of a kernel thread. Here, just
checking to see if the thread is NULL is good enough to see if
MCE handling is initialized.
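A sketch of the check (the thread variable name is hypothetical):

  /* sketch: the notify thread pointer doubles as the "initialized" flag */
  if (!mce_notify_thread)
          return;
  wake_up_process(mce_notify_thread);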
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Trace events like raw_syscalls always show a preempt count of one. The
reason is that on PREEMPT kernels rcu_read_lock_sched_notrace()
increases the preemption counter, and the function recording the counter
is called within the RCU section.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ Changed this to upstream version. See commit e947841c0dce ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
lru_add_drain_all() works by scheduling lru_add_drain_cpu() to run
on all CPUs that have non-empty LRU pagevecs and then waiting for
the scheduled work to complete. However, workqueue threads may never
have the chance to run on a CPU that's running a SCHED_FIFO task.
This causes lru_add_drain_all() to block forever.
This commit solves this problem by changing lru_add_drain_all()
to drain the LRU pagevecs of remote CPUs. This is done by grabbing
swapvec_lock and calling lru_add_drain_cpu().
PS: This is based on an idea and initial implementation by
Rik van Riel.
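A sketch of the remote drain (helper names follow the changelog; the exact
form is an assumption):

  /* sketch: drain a remote CPU's pagevecs under that CPU's local lock */
  for_each_online_cpu(cpu) {
          local_lock_on(swapvec_lock, cpu);
          lru_add_drain_cpu(cpu);
          local_unlock_on(swapvec_lock, cpu);
  }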
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If we get out of preempt_schedule_irq() then we check for NEED_RESCHED
and call the former function again if it is set, because the preemption
counter has to be zero at this point.
However, the counter for lazy-preempt might not be zero, therefore we have
to check the counter before looking at the need_resched_lazy flag.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On -RT we try to acquire sleeping locks, which might lead to warnings
from lockdep or a warn_on() from spin_try_lock() (which is an rtmutex on
RT).
We don't print in general from an IRQ-off region, so we should not try
this via console_unblank() / bust_spinlocks() either.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
At first glance, the use of 'static inline' seems appropriate for
INIT_HLIST_BL_HEAD().
However, when a 'static inline' function invocation is inlined by gcc,
all callers share any static local data declared within that inline
function.
This presents a problem for how lockdep classes are set up. raw_spinlocks,
for example, are initialized like this when CONFIG_DEBUG_SPINLOCK is set:
# define raw_spin_lock_init(lock) \
do { \
static struct lock_class_key __key; \
\
__raw_spin_lock_init((lock), #lock, &__key); \
} while (0)
When this macro is expanded into a 'static inline' caller, like
INIT_HLIST_BL_HEAD():
static inline void INIT_HLIST_BL_HEAD(struct hlist_bl_head *h)
{
h->first = NULL;
raw_spin_lock_init(&h->lock);
}
...the static local lock_class_key object is made function-static.
For compilation units which invoke INIT_HLIST_BL_HEAD() more than once,
all of the invocations then share this same static local object.
This can lead to some very confusing lockdep splats (example below).
Solve this problem by forcing the INIT_HLIST_BL_HEAD() to be a macro,
which prevents the lockdep class object sharing.
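A sketch of the macro form (assuming the -RT variant of hlist_bl_head,
which carries a raw spinlock):

  /* sketch: every expansion now gets its own static lock_class_key */
  #define INIT_HLIST_BL_HEAD(ptr)                 \
  do {                                            \
          (ptr)->first = NULL;                    \
          raw_spin_lock_init(&(ptr)->lock);       \
  } while (0)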
=============================================
[ INFO: possible recursive locking detected ]
4.4.4-rt11 #4 Not tainted
---------------------------------------------
kswapd0/59 is trying to acquire lock:
(&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
but task is already holding lock:
(&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&h->lock#2);
lock(&h->lock#2);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by kswapd0/59:
#0: (shrinker_rwsem){+.+...}, at: rt_down_read_trylock
#1: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Tested-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The root lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority, then the higher-priority task won't be
able to submit packets to the NIC directly; instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.
If we always take the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
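A sketch of the change (condition simplified):

  /* sketch: on -RT always contend on busylock so PI boosting can happen */
  contended = qdisc_is_running(q) || IS_ENABLED(CONFIG_PREEMPT_RT_FULL);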
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The async pagefault wake code can run from the idle task in exception
context, so everything here needs to be made non-preemptible.
Conversion to a simple wait queue and raw spinlock does the trick.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Drop 'success' arg from probe_wakeup_latency_hist_start().
Link: http://lkml.kernel.org/r/1457064246.3501.2.camel@gmail.com
Fixes: cf1dd658 ("sched: Introduce the trace_sched_waking tracepoint")
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
preempt_disable() invokes preempt_count_add(), which saves the caller in
current->preempt_disable_ip. It uses CALLER_ADDR1, which does not look for
its caller but for the parent of the caller. That means we get the correct
caller for something like spin_lock() unless the architecture inlines those
invocations. It is always wrong for preempt_disable() or local_bh_disable().
This patch adds the function get_parent_ip(), which tries CALLER_ADDR0, 1, 2
if the former is a locking function. This seems to record the caller
properly for preempt_disable() itself as well as for get_cpu_var() or
local_bh_disable().
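A sketch of the helper (simplified):

  /* sketch: walk up the call chain past known locking functions */
  unsigned long get_parent_ip(unsigned long addr)
  {
          if (in_lock_functions(addr)) {
                  addr = CALLER_ADDR2;
                  if (in_lock_functions(addr))
                          addr = CALLER_ADDR3;
          }
          return addr;
  }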
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
RT has dropped support for rcu_bh; comment it out in rcutorture.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When running -rt kernel with both PREEMPT_OFF_HIST and LOCKDEP enabled,
the below error is reported:
[ INFO: suspicious RCU usage. ]
4.4.1-rt6 #1 Not tainted
include/trace/events/hist.h:31 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.1-rt6-WR8.0.0.0_standard #1
Stack : 0000000000000006 0000000000000000 ffffffff81ca8c38 ffffffff81c8fc80
ffffffff811bdd68 ffffffff81cb0000 0000000000000000 ffffffff81cb0000
0000000000000000 0000000000000000 0000000000000004 0000000000000000
0000000000000004 ffffffff811bdf50 0000000000000000 ffffffff82b60000
0000000000000000 ffffffff812897ac ffffffff819f0000 000000000000000b
ffffffff811be460 ffffffff81b7c588 ffffffff81c8fc80 0000000000000000
0000000000000000 ffffffff81ec7f88 ffffffff81d70000 ffffffff81b70000
ffffffff81c90000 ffffffff81c3fb00 ffffffff81c3fc28 ffffffff815e6f98
0000000000000000 ffffffff81c8fa87 ffffffff81b70958 ffffffff811bf2c4
0707fe32e8d60ca5 ffffffff81126d60 0000000000000000 0000000000000000
...
Call Trace:
[<ffffffff81126d60>] show_stack+0xe8/0x108
[<ffffffff815e6f98>] dump_stack+0x88/0xb0
[<ffffffff8124b88c>] time_hardirqs_off+0x204/0x300
[<ffffffff811aa5dc>] trace_hardirqs_off_caller+0x24/0xe8
[<ffffffff811a4ec4>] cpu_startup_entry+0x39c/0x508
[<ffffffff81d7dc68>] start_kernel+0x584/0x5a0
Replace the regular trace_preemptoff_hist with the rcuidle version to avoid
the error.
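The change boils down to (sketch; tracepoints grow _rcuidle variants
automatically):

  -     trace_preemptoff_hist(trace_type, 0);
  +     trace_preemptoff_hist_rcuidle(trace_type, 0);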
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Cc: bigeasy@linutronix.de
Cc: rostedt@goodmis.org
Cc: linux-rt-users@vger.kernel.org
Link: http://lkml.kernel.org/r/1456262603-10075-1-git-send-email-yang.shi@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
homer: # nm kernel/sched/core.o|grep preemptible_lazy
00000000000000b5 t preemptible_lazy
echo wakeup_rt > current_tracer ==> Welcome to infinity.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Link: http://lkml.kernel.org/r/1456067490.3771.2.camel@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With completions using swait and thus raw locks we don't need this anymore.
Further, bisect thinks this patch is responsible for:
|BUG: unable to handle kernel NULL pointer dereference at (null)
|IP: [<ffffffff81082123>] sched_cpu_active+0x53/0x70
|PGD 0
|Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
|Dumping ftrace buffer:
| (ftrace buffer empty)
|Modules linked in:
|CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.1+ #330
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
|task: ffff88013ae64b00 ti: ffff88013ae74000 task.ti: ffff88013ae74000
|RIP: 0010:[<ffffffff81082123>] [<ffffffff81082123>] sched_cpu_active+0x53/0x70
|RSP: 0000:ffff88013ae77eb8 EFLAGS: 00010082
|RAX: 0000000000000001 RBX: ffffffff81c2cf20 RCX: 0000001050fb52fb
|RDX: 0000001050fb52fb RSI: 000000105117ca1e RDI: 00000000001c7723
|RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
|R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
|R13: ffffffff81c2cee0 R14: 0000000000000000 R15: 0000000000000001
|FS: 0000000000000000(0000) GS:ffff88013b200000(0000) knlGS:0000000000000000
|CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
|CR2: 0000000000000000 CR3: 0000000001c09000 CR4: 00000000000006e0
|Stack:
| ffffffff810c446d ffff88013ae77f00 ffffffff8107d8dd 000000000000000a
| 0000000000000001 0000000000000000 0000000000000000 0000000000000000
| 0000000000000000 ffff88013ae77f10 ffffffff8107d90e ffff88013ae77f20
|Call Trace:
| [<ffffffff810c446d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
| [<ffffffff8107d8dd>] ? notifier_call_chain+0x5d/0x80
| [<ffffffff8107d90e>] ? __raw_notifier_call_chain+0xe/0x10
| [<ffffffff810598a3>] ? cpu_notify+0x23/0x40
| [<ffffffff8105a7b8>] ? notify_cpu_starting+0x28/0x30
during hotplug. The raw locks need to remain, however.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
We unlock the lock while the interrupts are off. This isn't a problem
now but will become one, because migrate_disable() + enable are not
symmetrical in regard to the status of interrupts.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With interrupts off it makes no sense to do the long path since we can't
leave the CPU anyway. Also we might end up in a recursion with lockdep.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
At least on x86 we die a recursive death:
|CPU: 3 PID: 585 Comm: bash Not tainted 4.4.1-rt4+ #198
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
|task: ffff88007ab4cd00 ti: ffff88007ab94000 task.ti: ffff88007ab94000
|RIP: 0010:[<ffffffff81684870>] [<ffffffff81684870>] int3+0x0/0x10
|RSP: 0018:ffff88013c107fd8 EFLAGS: 00010082
|RAX: ffff88007ab4cd00 RBX: ffffffff8100ceab RCX: 0000000080202001
|RDX: 0000000000000000 RSI: ffffffff8100ceab RDI: ffffffff810c78b2
|RBP: ffff88007ab97c10 R08: ffffffffff57b000 R09: 0000000000000000
|R10: ffff88013bb64790 R11: ffff88007ab4cd68 R12: ffffffff8100ceab
|R13: ffffffff810c78b2 R14: ffffffff810f8158 R15: ffffffff810f9120
|FS: 0000000000000000(0000) GS:ffff88013c100000(0063) knlGS:00000000f74e3940
|CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
|CR2: 0000000008cf6008 CR3: 000000013b169000 CR4: 00000000000006e0
|Call Trace:
| <#DB>
| [<ffffffff810f8158>] ? trace_preempt_off+0x18/0x170
| <<EOE>>
| [<ffffffff81077745>] preempt_count_add+0xa5/0xc0
| [<ffffffff810c78b2>] on_each_cpu+0x22/0x90
| [<ffffffff8100ceab>] text_poke_bp+0x5b/0xc0
| [<ffffffff8100a29c>] arch_jump_label_transform+0x8c/0xf0
| [<ffffffff8111c77c>] __jump_label_update+0x6c/0x80
| [<ffffffff8111c83a>] jump_label_update+0xaa/0xc0
| [<ffffffff8111ca54>] static_key_slow_inc+0x94/0xa0
| [<ffffffff810e0d8d>] tracepoint_probe_register_prio+0x26d/0x2c0
| [<ffffffff810e0df3>] tracepoint_probe_register+0x13/0x20
| [<ffffffff810fca78>] trace_event_reg+0x98/0xd0
| [<ffffffff810fcc8b>] __ftrace_event_enable_disable+0x6b/0x180
| [<ffffffff810fd5b8>] event_enable_write+0x78/0xc0
| [<ffffffff8117a768>] __vfs_write+0x28/0xe0
| [<ffffffff8117b025>] vfs_write+0xa5/0x180
| [<ffffffff8117bb76>] SyS_write+0x46/0xa0
| [<ffffffff81002c91>] do_fast_syscall_32+0xa1/0x1d0
| [<ffffffff81684d57>] sysenter_flags_fixed+0xd/0x17
during
echo 1 > /sys/kernel/debug/tracing/events/hist/preemptirqsoff_hist/enable
Reported-By: Christoph Mathys <eraserix@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If NET_RX uses up all of its budget it moves the following NAPI
invocations into `ksoftirqd`. On -RT it does not do so; instead it
raises the NET_RX softirq in its current context again.
In order to get closer to mainline's behaviour this patch provides
__raise_softirq_irqoff_ksoft(), which raises the softirq in ksoftirqd.
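Usage at the end of net_rx_action(), schematically:

  /* sketch: budget exhausted, defer further NET_RX work to ksoftirqd */
  __raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);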
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The softirqd runs in -RT with SCHED_FIFO (prio 1) and deals mostly with
timer wakeups, which can not happen in hardirq context. The prio has been
raised from the normal SCHED_OTHER so the timer wakeup does not happen
too late.
With enough networking load it is possible that the system never goes
idle and schedules ksoftirqd and everything else with a higher priority.
One of the tasks left behind is one of RCU's threads and so we see stalls
and eventually run out of memory.
This patch moves the TIMER and HRTIMER softirqs out of the `ksoftirqd`
thread and into a new `ktimersoftd` thread. The former can now run at
SCHED_OTHER (same as mainline) and the latter at SCHED_FIFO due to the
wakeups.
From the networking point of view: the NAPI callback runs after the network
interrupt thread completes. If its run time takes too long the NAPI code
itself schedules `ksoftirqd`. In that thread it can run at SCHED_OTHER
priority and it won't defer RCU anymore.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Probably in the rebase onto v4.1 this check got moved into the less commonly
used preempt_schedule_notrace(). This patch ensures that both functions use it.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In the non-RT case the spin_lock_irq() here disables interrupts just as
raw_spin_lock_irq() does. So in the unlock path the interrupts are enabled
too early.
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Upstream commit fbd705a0c6184580d0e2fbcbd47a37b6e5822511
Mathieu reported that since 317f394160e9 ("sched: Move the second half
of ttwu() to the remote cpu") trace_sched_wakeup() can happen out of
context of the waker.
This is a problem when you want to analyse wakeup paths because it is
now very hard to correlate the wakeup event to whoever issued the
wakeup.
OTOH trace_sched_wakeup() is issued at the point where we set
p->state = TASK_RUNNING, which is right where we hand the task off to
the scheduler, so this is an important point when looking at
scheduling behaviour: up to here it's been the wakeup path, everything
hereafter is due to scheduler policy.
To bridge this gap, introduce a second tracepoint: trace_sched_waking.
It is guaranteed to be called in the waker context.
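The resulting pair, schematically:

  /* sketch: waking fires in the waker's context, wakeup at the handoff */
  trace_sched_waking(p);          /* in try_to_wake_up(), waker context */
  ...
  p->state = TASK_RUNNING;
  trace_sched_wakeup(p);          /* possibly on the remote CPU */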
[ Ported to linux-4.1.y-rt kernel by Mathieu Desnoyers. Resolved
conflict: try_to_wake_up_local() does not exist in -rt kernel. Removed
its instrumentation hunk. ]
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Julien Desfossez <jdesfossez@efficios.com>
CC: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Francis Giraldeau <francis.giraldeau@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20150609091336.GQ3644@twins.programming.kicks-ass.net
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On architectures where arch_irq_work_has_interrupt() returns false, we
end up running the irq safe work from the softirq context. That
results in a potential deadlock in the scheduler irq work which
expects that function to be called with interrupts disabled.
Split the irq_work_tick() function into a hard and soft variant. Call
the hard variant from the tick interrupt and add the soft variant to
the timer softirq.
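A sketch of the split (the soft variant's name is an assumption):

  void irq_work_tick(void)                /* hard irq context, from the tick */
  {
          irq_work_run_list(this_cpu_ptr(&raised_list));
  }

  void irq_work_tick_soft(void)           /* called from the timer softirq */
  {
          irq_work_run_list(this_cpu_ptr(&lazy_list));
  }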
Reported-and-tested-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
I can constantly see below error report with 4.1 RT-kernel on TI ARM dra7-evm
if I'm trying to unplug cpu1:
[ 57.737589] CPU1: shutdown
[ 57.767537] BUG: spinlock bad magic on CPU#0, sh/137
[ 57.767546] lock: 0xee994730, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 57.767552] CPU: 0 PID: 137 Comm: sh Not tainted 4.1.10-rt8-01700-g2c38702-dirty #55
[ 57.767555] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 57.767568] [<c001acd0>] (unwind_backtrace) from [<c001534c>] (show_stack+0x20/0x24)
[ 57.767579] [<c001534c>] (show_stack) from [<c075560c>] (dump_stack+0x84/0xa0)
[ 57.767593] [<c075560c>] (dump_stack) from [<c00aca48>] (spin_dump+0x84/0xac)
[ 57.767603] [<c00aca48>] (spin_dump) from [<c00acaa4>] (spin_bug+0x34/0x38)
[ 57.767614] [<c00acaa4>] (spin_bug) from [<c00acc10>] (do_raw_spin_lock+0x168/0x1c0)
[ 57.767624] [<c00acc10>] (do_raw_spin_lock) from [<c075b4cc>] (_raw_spin_lock+0x4c/0x54)
[ 57.767631] [<c075b4cc>] (_raw_spin_lock) from [<c07599fc>] (rt_spin_lock_slowlock+0x5c/0x374)
[ 57.767638] [<c07599fc>] (rt_spin_lock_slowlock) from [<c075bcf4>] (rt_spin_lock+0x38/0x70)
[ 57.767649] [<c075bcf4>] (rt_spin_lock) from [<c06333c0>] (skb_dequeue+0x28/0x7c)
[ 57.767662] [<c06333c0>] (skb_dequeue) from [<c06476ec>] (dev_cpu_callback+0x1b8/0x240)
[ 57.767673] [<c06476ec>] (dev_cpu_callback) from [<c007566c>] (notifier_call_chain+0x3c/0xb4)
The reason is that skb_dequeue is taking skb->lock, but RT changed the
core code to use a raw spinlock. The non-raw lock is not initialized
on purpose to catch exactly this kind of problem.
Fixes: 91df05da13a6 ("net: Use skbufhead with raw lock")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
While the use of synchronize_rcu_expedited() might make
synchronize_net() "faster", it does so at significant cost on RT
systems, as expediting a grace period forcibly preempts any
high-priority RT tasks (via the stop_machine() mechanism).
Without this change, we can observe a latency spike up to 30us with
cyclictest by rapidly unplugging/reestablishing an ethernet link.
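A sketch of the change (condition simplified):

  void synchronize_net(void)
  {
          might_sleep();
          /* sketch: never force-expedite grace periods on -RT */
          if (rtnl_is_locked() && !IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
                  synchronize_rcu_expedited();
          else
                  synchronize_rcu();
  }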
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Cc: bigeasy@linutronix.de
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20151027123153.GG8245@jcartwri.amer.corp.natinst.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
I see here large latencies during a stack dump on x86. The
preempt_disable() and get_cpu() should forbid moving the task to another
CPU during a stack dump and avoid two stack traces running in parallel on
the same CPU. However, a stack trace from a second CPU may still happen in
parallel. Also, nesting is allowed, so a stack trace can happen in
process context while we have another one from IRQ context. With
migrate_disable() we keep this code preemptible and allow a second
backtrace on the same CPU by another task.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In commit 8930ed80 ("rtmutex: Cleanup deadlock detector debug logic"),
chainwalking control enums were introduced to limit the deadlock
detection logic. One of the calls to task_blocks_on_rt_mutex() was
missed when converting to use the enums.
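The shape of the conversion (sketch; the chosen enum value is an
assumption):

  -     ret = task_blocks_on_rt_mutex(lock, waiter, task, detect_deadlock);
  +     ret = task_blocks_on_rt_mutex(lock, waiter, task,
  +                                   RT_MUTEX_FULL_CHAINWALK);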
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Yimin debugged that in case of a PI wakeup in progress when
rt_mutex_start_proxy_lock() calls task_blocks_on_rt_mutex() the latter
returns -EAGAIN and in consequence the remove_waiter() call runs into
a BUG_ON() because there is nothing to remove.
Guard it with rt_mutex_has_waiters(). This is a quick fix which is
easy to backport. The proper fix is to have a central check in
remove_waiter() so we can call it unconditionally.
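A sketch of the guard (simplified):

  if (unlikely(ret)) {
          /* sketch: only remove the waiter if it was actually enqueued */
          if (rt_mutex_has_waiters(lock))
                  remove_waiter(lock, waiter);
  }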
Reported-and-debugged-by: Yimin Deng <yimin11.deng@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When running with the RT-kernel (4.1.5-rt5) on TI OMAP dra7-evm and trying
to do Suspend to RAM, the following backtrace occurs:
Disabling non-boot CPUs ...
PM: noirq suspend of devices complete after 7.295 msecs
Disabling non-boot CPUs ...
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
INFO: lockdep is turned off.
irq event stamp: 122
hardirqs last enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
softirqs last enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
softirqs last disabled at (0): [< (null)>] (null)
Preemption disabled at:[< (null)>] (null)
CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
Hardware name: Generic DRA74X (Flattened Device Tree)
[<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
[<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
[<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
[<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
[<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
[<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
[<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
[<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
[<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
[<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
[<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
[<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
[<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
CPU1: shutdown
PM: Calling sched_clock_suspend+0x0/0x40
PM: Calling timekeeping_suspend+0x0/0x2e0
PM: Calling irq_gc_suspend+0x0/0x68
PM: Calling fw_suspend+0x0/0x2c
PM: Calling cpu_pm_suspend+0x0/0x28
Also, sometimes the system gets stuck right after displaying "Disabling
non-boot CPUs ...". The root cause of the above backtrace is task_lock(),
which takes a sleeping lock on -RT.
To fix the issue, move the clear_tasks_mm_cpumask() call from __cpu_disable()
to __cpu_die(), which is called on the thread which is asking for a target
CPU to be shut down. In addition, this change restores CPU hotplug
functionality on TI OMAP dra7-evm: CPU1 can be unplugged/plugged many times.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Austin Schuh <austin@peloton-tech.com>
Cc: <philipp@peloton-tech.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1441995683-30817-1-git-send-email-grygorii.strashko@ti.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cpufreq_rwsem was introduced in commit 6eed9404ab3c4 ("cpufreq: Use
rwsem for protecting critical sections") in order to replace
try_module_get() on the cpufreq driver. That try_module_get() worked
well until the refcount was so heavily used that module removal became
more or less impossible.
Though when looking at the various (undocumented) protection
mechanisms in that code, the randomly sprinkled cpufreq_rwsem
locking sites are superfluous.
The policy, which is acquired in cpufreq_cpu_get() and released in
cpufreq_cpu_put(), is sufficiently protected already:
  cpufreq_cpu_get(cpu)
          /* Protects against concurrent driver removal */
          read_lock_irqsave(&cpufreq_driver_lock, flags);
          policy = per_cpu(cpufreq_cpu_data, cpu);
          kobject_get(&policy->kobj);
          read_unlock_irqrestore(&cpufreq_driver_lock, flags);
The reference on the policy serializes versus module unload already:
  cpufreq_unregister_driver()
    subsys_interface_unregister()
      __cpufreq_remove_dev_finish()
        per_cpu(cpufreq_cpu_data) = NULL;
    cpufreq_policy_put_kobj()
If there is a reference held on the policy, i.e. obtained prior to the
unregister call, then cpufreq_policy_put_kobj() will wait until that
reference is dropped. So once subsys_interface_unregister() returns
there is no policy pointer in flight and no new reference can be
obtained. So that rwsem protection is useless.
The other usage of cpufreq_rwsem in show()/store() of the sysfs
interface is redundant as well because sysfs already does the proper
kobject_get()/put() pairs.
That leaves CPU hotplug versus module removal. The current
down_write() around the write_lock() in cpufreq_unregister_driver() is
silly at best, as it actually protects nothing.
The trivial solution to this is to prevent hotplug across
cpufreq_unregister_driver() completely.
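A sketch of that (using the hotplug lock API of this kernel generation):

  void cpufreq_unregister_driver(struct cpufreq_driver *driver)
  {
          ...
          /* sketch: hold off CPU hotplug instead of the rwsem dance */
          get_online_cpus();
          subsys_interface_unregister(&cpufreq_interface);
          ...
          put_online_cpus();
  }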
[upstream: rafael/linux-pm 454d3a2500a4eb33be85dde3bfba9e5f6b5efadc]
[fixes: "cpufreq_stat_notifier_trans: No policy found" since v4.0-rt]
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which loops
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that in all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both of these
numbers and cause a DoS.
This temporary fix is sent for two reasons. First is so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario does not involve
interrupts sent in directed delivery mode, and the number of possible pending
interrupts is kept small. Secondly, this should incentivize the development of a
proper openpic emulation that would be better suited for RT.
Cc: stable-rt@vger.kernel.org
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>