| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Luis captured the following:
| BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
| in_atomic(): 1, irqs_disabled(): 0, pid: 517, name: Xorg
| 2 locks held by Xorg/517:
| #0:
| (
| &dev->vbl_lock
| ){......}
| , at:
| [<ffffffffa0024c60>] drm_vblank_get+0x30/0x2b0 [drm]
| #1:
| (
| &dev->vblank_time_lock
| ){......}
| , at:
| [<ffffffffa0024ce1>] drm_vblank_get+0xb1/0x2b0 [drm]
| Preemption disabled at:
| [<ffffffffa008bc95>] i915_get_vblank_timestamp+0x45/0xa0 [i915]
| CPU: 3 PID: 517 Comm: Xorg Not tainted 3.10.10-rt7+ #5
| Call Trace:
| [<ffffffff8164b790>] dump_stack+0x19/0x1b
| [<ffffffff8107e62f>] __might_sleep+0xff/0x170
| [<ffffffff81651ac4>] rt_spin_lock+0x24/0x60
| [<ffffffffa0084e67>] i915_read32+0x27/0x170 [i915]
| [<ffffffffa008a591>] i915_pipe_enabled+0x31/0x40 [i915]
| [<ffffffffa008a6be>] i915_get_crtc_scanoutpos+0x3e/0x1b0 [i915]
| [<ffffffffa00245d4>] drm_calc_vbltimestamp_from_scanoutpos+0xf4/0x430 [drm]
| [<ffffffffa008bc95>] i915_get_vblank_timestamp+0x45/0xa0 [i915]
| [<ffffffffa0024998>] drm_get_last_vbltimestamp+0x48/0x70 [drm]
| [<ffffffffa0024db5>] drm_vblank_get+0x185/0x2b0 [drm]
| [<ffffffffa0025d03>] drm_wait_vblank+0x83/0x5d0 [drm]
| [<ffffffffa00212a2>] drm_ioctl+0x552/0x6a0 [drm]
| [<ffffffff811a0095>] do_vfs_ioctl+0x325/0x5b0
| [<ffffffff811a03a1>] SyS_ioctl+0x81/0xa0
| [<ffffffff8165a342>] tracesys+0xdd/0xe2
After a longer thread it was decided to drop the preempt_disable()/
enable() invocations which were meant for -RT and Mario Kleiner looks
for a replacement.
Cc: stable-rt@vger.kernel.org
Reported-By: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt disabled context,
replace the pair of get/put_cpu() to get/put_cpu_light().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
| |
If the user specified a threshold at module load time, use it.
Cc: stable-rt@vger.kernel.org
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit ee23871389 ("genirq: Set irq thread to RT priority on
creation") we moved the assigment of the thread's priority from the
thread's function into __setup_irq(). That function may run in user
context for instance if the user opens an UART node and then driver
calls requests in the ->open() callback. That user may not have
CAP_SYS_NICE and so the irq thread won't run with the SCHED_OTHER
policy.
This patch uses sched_setscheduler_nocheck() so we omit the CAP_SYS_NICE
check which is otherwise required for the SCHED_OTHER policy.
Cc: Ivo Sieben <meltedpianoman@gmail.com>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: rewrite the changelog]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
| |
Joe Korty reported, that __irq_set_affinity_locked() schedules a
workqueue while holding a rawlock which results in a might_sleep()
warning.
This patch moves the invokation into a process context so that we only
wakeup() a process while holding the lock.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no reason to use stop machine to search for hardware latency.
Simply disabling interrupts while running the loop will do enough to
check if something comes in that wasn't disabled by interrupts being
off, which is exactly what stop machine does.
Instead of using stop machine, just have the thread disable interrupts
while it checks for hardware latency.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
As ktime_get() calls into the timing code which does a read_seq(), it
may be affected by other CPUS that touch that lock. To remove this
dependency, use the trace_clock_local() which is already exported
for module use. If CONFIG_TRACING is enabled, use that as the clock,
otherwise use ktime_get().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
| |
The hwlat-detector needs a better clock than just ktime_get() as that
can induce its own latencies. The trace clock is perfect for it, but
it needs to be exported for use by modules.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hwlat_detector reads two timestamps in a row, then reports any
gap between those calls. The problem is, it misses everything between
the second reading of the time stamp to the first reading of the time stamp
in the next loop. That's were most of the time is spent, which means,
chances are likely that it will miss all hardware latencies. This
defeats the purpose.
By also testing the first time stamp from the previous loop second
time stamp (the outer loop), we are more likely to find a latency.
Setting the threshold to 1, here's what the report now looks like:
1347415723.0232202770 0 2
1347415725.0234202822 0 2
1347415727.0236202875 0 2
1347415729.0238202928 0 2
1347415731.0240202980 0 2
1347415734.0243203061 0 2
1347415736.0245203113 0 2
1347415738.0247203166 2 0
1347415740.0249203219 0 3
1347415742.0251203272 0 3
1347415743.0252203299 0 3
1347415745.0254203351 0 2
1347415747.0256203404 0 2
1347415749.0258203457 0 2
1347415751.0260203510 0 2
1347415754.0263203589 0 2
1347415756.0265203642 0 2
1347415758.0267203695 0 2
1347415760.0269203748 0 2
1347415762.0271203801 0 2
1347415764.0273203853 2 0
There's some hardware latency that takes 2 microseconds to run.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ntp code for notify_cmos_timer() is called from a hard interrupt
context. schedule_delayed_work() under PREEMPT_RT_FULL calls spinlocks
that have been converted to mutexes, thus calling schedule_delayed_work()
from interrupt is not safe.
Add a helper thread that does the call to schedule_delayed_work and wake
up that thread instead of calling schedule_delayed_work() directly.
This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls
schedule_delayed_work() directly in irq context.
Note: There's a few places in the kernel that do this. Perhaps the RT
code should have a dedicated thread that does the checks. Just register
a notifier on boot up for your check and wake up the thread when
needed. This will be a todo.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tracepoint is responsible for:
|[<814cc358>] __schedule_bug+0x4d/0x59
|[<814d24cc>] __schedule+0x88c/0x930
|[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50
|[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50
|[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250
|[<814d27d9>] schedule+0x29/0x70
|[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278
|[<814d3786>] rt_spin_lock+0x26/0x30
|[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915]
|[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915]
|[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915]
|[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915]
|[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60
|[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180
|[<8101d063>] ? native_sched_clock+0x13/0x80
|[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915]
|[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm]
|[<8101d0d9>] ? sched_clock+0x9/0x10
|[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915]
|[<810f1c18>] ? rb_commit+0x68/0xa0
|[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0
|[<81197467>] do_vfs_ioctl+0x97/0x540
|[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130
|[<811979a1>] sys_ioctl+0x91/0xb0
|[<814db931>] tracesys+0xe1/0xe6
Chris Wilson does not like to move i915_trace_irq_get() out of the macro
|No. This enables the IRQ, as well as making a number of
|very expensively serialised read, unconditionally.
so it is gone now on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Joakim Hernberg <jbh@alchemy.lu>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
| |
If a task which is allowed to run only on CPU X puts CPU Y down then it
will be allowed on all CPUs but the on CPU Y after it comes back from
kernel. This patch ensures that we don't lose the initial setting unless
the CPU the task is running is going down.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If kthread is pinned to CPUx and CPUx is going down then we get into
trouble:
- first the unplug thread is created
- it will set itself to hp->unplug. As a result, every task that is
going to take a lock, has to leave the CPU.
- the CPU_DOWN_PREPARE notifier are started. The worker thread will
start a new process for the "high priority worker".
Now kthread would like to take a lock but since it can't leave the CPU
it will never complete its task.
We could fire the unplug thread after the notifier but then the cpu is
no longer marked "online" and the unplug thread will run on CPU0 which
was fixed before :)
So instead the unplug thread is started and kept waiting until the
notfier complete their work.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
| |
wake_up should do nothing on the nort, so we should use wakeup_timer_waiters,
also fix a spell mistake.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
[bigeasy: s/CONFIG_PREEMPT_RT_BASE/CONFIG_PREEMPT_RT_FULL/]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch "list_bl.h: make list head locking RT safe" introduced
an unconditional
__set_bit(0, (unsigned long *)b);
in void hlist_bl_lock(struct hlist_bl_head *b). This clobbers the value
of b->first. When the value of b->first is retrieved using
hlist_bl_first the clobbering is undone using
(unsigned long)h->first & ~LIST_BL_LOCKMASK
and so depending on LIST_BL_LOCKMASK being one. But LIST_BL_LOCKMASK is
only one if at least on of CONFIG_SMP and CONFIG_DEBUG_SPINLOCK are
defined. Without these the value returned by hlist_bl_first has the
zeroth bit set which likely results in a crash.
So only do the clobbering in the cases where LIST_BL_LOCKMASK is one.
An alternative would be to always define LIST_BL_LOCKMASK to one with
CONFIG_PREEMPT_RT_BASE.
Cc: stable-rt@vger.kernel.org
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.
We use the non atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking. Now, if we were to implement it as a
standard non-raw spinlock, we would see:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
#0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
#1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
#2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
#3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
#4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
[<ffffffff810b9624>] __might_sleep+0x134/0x1f0
[<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
[<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
[<ffffffff811a1b2d>] __d_drop+0x1d/0x40
[<ffffffff811a24be>] __d_move+0x8e/0x320
[<ffffffff811a278e>] d_move+0x3e/0x60
[<ffffffff81199598>] vfs_rename+0x198/0x4c0
[<ffffffff8119b093>] sys_renameat+0x213/0x240
[<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
[<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
[<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
[<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8119b0db>] sys_rename+0x1b/0x20
[<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
Since we are only taking the lock during short lived list operations,
lets assume for now that it being raw won't be a significant latency
concern.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a threaded irq handler is installed the irq thread is initially
created on normal scheduling priority. Only after the irq thread is
woken up it sets its priority to RT_FIFO MAX_USER_RT_PRIO/2 itself.
This means that interrupts that occur directly after the irq handler
is installed will be handled on a normal scheduling priority instead
of the realtime priority that one would expect.
Fix this by setting the RT priority on creation of the irq_thread.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1370254322-17240-1-git-send-email-meltedpianoman@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Seems mce timer fire at the wrong frequency in -rt kernels since roughly
forever due to 32 bit overflow. 3.8-rt is also missing a multiplier.
Add missing us -> ns conversion and 32 bit overflow prevention.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: use ULL instead of u64 cast]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In -rt, most spin_locks() turn into mutexes. One of these spin_lock
conversions is performed on the workqueue gcwq->lock. When the idle
worker is worken, the first thing it will do is grab that same lock and
it too will block, possibly jumping into the same code, but because
nr_running would already be decremented it prevents an infinite loop.
But this is still a waste of CPU cycles, and it doesn't follow the method
of mainline, as new workers should only be woken when a worker thread is
truly going to sleep, and not just blocked on a spin_lock().
Check the saved_state too before waking up new workers.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From lib/Kconfig.debug on CONFIG_FORCE_WEAK_PER_CPU:
----
s390 and alpha require percpu variables in modules to be
defined weak to work around addressing range issue which
puts the following two restrictions on percpu variable
definitions.
1. percpu symbols must be unique whether static or not
2. percpu variables can't be defined inside a function
To ensure that generic code follows the above rules, this
option forces all percpu variables to be defined as weak.
----
The addition of the local IRQ swap_lock in mm/swap.c broke this config
as the name "swap_lock" is used through out the kernel. Just do a "git
grep swap_lock" to see, and the new swap_lock is a local lock which
defines the swap_lock for per_cpu.
The fix was to rename swap_lock to swapvec_lock which keeps it unique.
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had a customer report a lockup on a 3.0-rt kernel that had the
following backtrace:
[ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113
[ffff88107fca3f40] rt_spin_lock at ffffffff81499a56
[ffff88107fca3f50] __wake_up at ffffffff81043379
[ffff88107fca3f80] mce_notify_irq at ffffffff81017328
[ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508
[ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1
[ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853
It actually bugged because the lock was taken by the same owner that
already had that lock. What happened was the thread that was setting
itself on a wait queue had the lock when an MCE triggered. The MCE
interrupt does a wake up on its wait list and grabs the same lock.
NOTE: THIS IS NOT A BUG ON MAINLINE
Sorry for yelling, but as I Cc'd mainline maintainers I want them to
know that this is an PREEMPT_RT bug only. I only Cc'd them for advice.
On PREEMPT_RT the wait queue locks are converted from normal
"spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above).
These are not to be taken by hard interrupt context. This usually isn't
a problem as most all interrupts in PREEMPT_RT are converted into
schedulable threads. Unfortunately that's not the case with the MCE irq.
As wait queue locks are notorious for long hold times, we can not
convert them to raw_spin_locks without causing issues with -rt. But
Thomas has created a "simple-wait" structure that uses raw spin locks
which may have been a good fit.
Unfortunately, wait queues are not the only issue, as the mce_notify_irq
also does a schedule_work(), which grabs the workqueue spin locks that
have the exact same issue.
Thus, this patch I'm proposing is to move the actual work of the MCE
interrupt into a helper thread that gets woken up on the MCE interrupt
and does the work in a schedulable context.
NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET
Oops, sorry for yelling again, but I want to stress that I keep the same
behavior of mainline when PREEMPT_RT is not set. Thus, this only changes
the MCE behavior when PREEMPT_RT is configured.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy@linutronix: make mce_notify_work() a proper prototype, use
kthread_run()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
|
| |
Simple waitqueues can be handled from interrupt disabled contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
wait_queue is a swiss army knife and in most of the cases the
complexity is not needed. For RT waitqueues are a constant source of
trouble as we can't convert the head lock to a raw spinlock due to
fancy and long lasting callbacks.
Provide a slim version, which allows RT to replace wait queues. This
should go mainline as well, as it lowers memory consumption and
runtime overhead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[backport by: Tim Sander <tim.sander@hbm.com> ]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9ec1882df2 (tty: serial: imx: console write routing is unsafe
on SMP) introduced a recursive locking bug in imx_console_write().
The callchain is:
imx_rxint()
spin_lock_irqsave(&sport->port.lock,flags);
...
uart_handle_sysrq_char();
sysrq_function();
printk();
imx_console_write();
spin_lock_irqsave(&sport->port.lock,flags); <--- DEAD
The bad news is that the kernel debugging facilities can dectect the
problem, but the printks never surface on the serial console for
obvious reasons.
There is a similar issue with oops_in_progress. If the kernel crashes
we really don't want to be stuck on the lock and unable to tell what
happened.
In general most UP originated drivers miss these checks and nobody
ever notices because CONFIG_PROVE_LOCKING seems to be still ignored by
a large number of developers.
The solution is to avoid locking in the sysrq case and trylock in the
oops_in_progress case.
This scheme is used in other drivers as well and it would be nice if
we could move this to a common place, so the usual copy/paste/modify
bugs can be avoided.
Now there is another issue with this scheme:
CPU0 CPU1
printk()
rxint()
sysrq_detection() -> sets port->sysrq
return from interrupt
console_write()
if (port->sysrq)
avoid locking
port->sysrq is reset with the next receive character. So as long as
the port->sysrq is not reset and this can take an endless amount of
time if after the break no futher receive character follows, all
console writes happen unlocked.
While the current writer is protected against other console writers by
the console sem, it's unprotected against open/close or other
operations which fiddle with the port. That's what the above mentioned
commit tried to solve.
That's an issue in all drivers which use that scheme and unfortunately
there is no easy workaround. The only solution is to have a separate
indicator port->sysrq_cpu. uart_handle_sysrq_char() then sets it to
smp_processor_id() before calling into handle_sysrq() and resets it to
-1 after that. Then change the locking check to:
if (port->sysrq_cpu == smp_processor_id())
locked = 0;
else if (oops_in_progress)
locked = spin_trylock_irqsave(port->lock, flags);
else
spin_lock_irqsave(port->lock, flags);
That would force all other cpus into the spin_lock path. Problem
solved, but that's way beyond the scope of this fix and really wants
to be implemented in a common function which calls the uart specific
write function to avoid another gazillion of hard to debug
copy/paste/modify bugs.
Reported-and-tested-by: Tim Sander <tim@krieglstein.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Xinyu Chen <xinyu.chen@freescale.com>
Cc: Dirk Behme <dirk.behme@de.bosch.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Tim Sander <tim@krieglstein.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302142006050.22263@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 07354eb1a74d1 ("locking printk: Annotate logbuf_lock as raw")
reintroduced a lock inversion problem which was fixed in commit
0b5e1c5255 ("printk: Release console_sem after logbuf_lock"). This
happened probably when fixing up patch rejects.
Restore the ordering and unlock logbuf_lock before releasing
console_sem.
Signed-off-by: ybu <ybu@qti.qualcomm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/E807E903FE6CBE4D95E420FBFCC273B827413C@nasanexd01h.na.qualcomm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We hit the following bug with 3.6-rt:
[ 5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002
[ 5.898991] no locks held by swapper/3/0.
[ 5.898993] Modules linked in:
[ 5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[ 5.898997] Call Trace:
[ 5.899011] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[ 5.899028] [<ffffffff81577923>] __schedule+0x793/0x7a0
[ 5.899032] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[ 5.899034] [<ffffffff81577b89>] schedule+0x29/0x70
[ 5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002
[ 5.899037] no locks held by swapper/7/0.
[ 5.899039] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[ 5.899040] Modules linked in:
[ 5.899041]
[ 5.899045] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[ 5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[ 5.899047] Call Trace:
[ 5.899049] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[ 5.899052] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[ 5.899054] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[ 5.899056] [<ffffffff81577923>] __schedule+0x793/0x7a0
[ 5.899059] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[ 5.899062] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[ 5.899068] [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0
[ 5.899071] [<ffffffff81577b89>] schedule+0x29/0x70
[ 5.899072] [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51
[ 5.899074] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[ 5.899077] [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e
[ 5.899079] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[ 5.899081] [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30
[ 5.899083] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[ 5.899087] [<ffffffff8144c759>] cpuidle_enter+0x19/0x20
[ 5.899088] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[ 5.899090] [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50
[ 5.899092] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[ 5.899094] [<ffffffff8144d1a1>] cpuidle899101] [<ffffffff8130be13>] ?
As the acpi code disables interrupts in acpi_idle_enter_bm, and calls
code that grabs the acpi lock, it causes issues as the lock is currently
in RT a sleeping lock.
The lock was converted from a raw to a sleeping lock due to some
previous issues, and tests that showed it didn't seem to matter.
Unfortunately, it did matter for one of our boxes.
This patch converts the lock back to a raw lock. I've run this code on a
few of my own machines, one being my laptop that uses the acpi quite
extensively. I've been able to suspend and resume without issues.
[ tglx: Made the change exclusive for acpi_gbl_hardware_lock ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: John Kacur <jkacur@gmail.com>
Cc: Clark Williams <clark@redhat.com>
Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
| |
Even with CONFIG_HIGHMEM=n we need to take care of the "atomic"
mappings which are installed via iomap_atomic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
| |
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
| |
Idle is not allowed to call sleeping functions ever!
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
| |
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
| |
On !RT interrupt runs with interrupts disabled. On RT it's in a
thread, so no need to disable interrupts at all.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
| |
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a live lock when there was a
concurrent priority boosting going on.
Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a PI boosted task policy/priority is modified by a setscheduler()
call we unconditionally dequeue and requeue the task if it is on the
runqueue even if the new priority is lower than the current effective
boosted priority. This can result in undesired reordering of the
priority bucket list.
If the new priority is less or equal than the current effective we
just store the new parameters in the task struct and leave the
scheduler class and the runqueue untouched. This is handled when the
task deboosts itself. Only if the new priority is higher than the
effective boosted priority we apply the change immediately.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following scenario does not work correctly:
Runqueue of CPUx contains two runnable and pinned tasks:
T1: SCHED_FIFO, prio 80
T2: SCHED_FIFO, prio 80
T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):
sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
...
sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
...
Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!
The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.
This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.
We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.
It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
|
|
|
|
|
|
|
|
|
| |
If the policy and priority remain unchanged a possible modification of
sched_reset_on_fork gets lost in the early exit path.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
|
|
|
|
|
|
|
|
|
|
| |
The netfilter code relies only on the implicit semantics of
local_bh_disable() for serializing wt_write_recseq sections. RT breaks
that and needs explicit serialization here.
Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
|
|
|
|
|
|
| |
This uses a timer_list timer from the irq disabled guts of the idle
code. Disable it for now to prevent wreckage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
|
|
|
|
|
| |
When the hrtimer stall detection hits the softirq is not raised.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rcu_read_unlock_special() checks in_serving_softirq() and leaves early
when true. On RT this is obviously wrong as softirq processing context
can be preempted and therefor such a task can be on the gp_tasks
list. Leaving early here will leave the task on the list and therefor
block RCU processing forever.
This cannot happen on mainline because softirq processing context
cannot be preempted and therefor this can never happen at all.
In fact this check looks quite questionable in general. Neither irq
context nor softirq processing context in mainline can ever be
preempted in mainline so the special unlock case should not ever be
invoked in such context. Now the only explanation might be a
rcu_read_unlock() being interrupted and therefor leave the rcu nest
count at 0 before the special unlock bit has been cleared. That looks
fragile. At least it's missing a big fat comment. Paul ????
See mainline commits: ec433f0c5 and 8762705a for further enlightment.
Reported-by: Kristian Lehmann <krleit00@hs-esslingen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
| |
There was a stable fix that moved the init_lock_keys() to after
the enable_cpucache(). But -rt changed this function to
init_cachep_lock_keys(). This moves the init afterwards to
match the stable fix.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
| |
If the stop machinery is called from inactive CPU we cannot use
mutex_lock, because some other stomp machine invokation might be in
progress and the mutex can be contended. We cannot schedule from this
context, so trylock and loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
| |
might sleep can tell us where interrupts have been disabled, but we
have no idea what disabled preemption. Add some debug infrastructure.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
| |
rwlocks and rwsems on RT do not allow multiple readers. Annotate the
lockdep acquire functions accordingly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
| |
The plain spinlock while sufficient does not update the local_lock
internals. Use a proper local_lock function instead to ease debugging.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
=============================================
[ INFO: possible recursive locking detected ]
3.6.0-rt1+ #49 Not tainted
---------------------------------------------
swapper/0/1 is trying to acquire lock:
lock_slab_on+0x72/0x77
but task is already holding lock:
__local_lock_irq+0x24/0x77
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&per_cpu(slab_lock, __cpu).lock);
lock(&per_cpu(slab_lock, __cpu).lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by swapper/0/1:
kmem_cache_create+0x33/0x89
__local_lock_irq+0x24/0x77
stack backtrace:
Pid: 1, comm: swapper/0 Not tainted 3.6.0-rt1+ #49
Call Trace:
__lock_acquire+0x9a4/0xdc4
? __local_lock_irq+0x24/0x77
? lock_slab_on+0x72/0x77
lock_acquire+0xc4/0x108
? lock_slab_on+0x72/0x77
? unlock_slab_on+0x5b/0x5b
rt_spin_lock+0x36/0x3d
? lock_slab_on+0x72/0x77
? migrate_disable+0x85/0x93
lock_slab_on+0x72/0x77
do_ccupdate_local+0x19/0x44
slab_on_each_cpu+0x36/0x5a
do_tune_cpucache+0xc1/0x305
enable_cpucache+0x8c/0xb5
setup_cpu_cache+0x28/0x182
__kmem_cache_create+0x34b/0x380
? shmem_mount+0x1a/0x1a
kmem_cache_create+0x4a/0x89
? shmem_mount+0x1a/0x1a
shmem_init+0x3e/0xd4
kernel_init+0x11c/0x214
kernel_thread_helper+0x4/0x10
? retint_restore_args+0x13/0x13
? start_kernel+0x3bc/0x3bc
? gs_change+0x13/0x13
It's not a missing annotation. It's simply wrong code and needs to be
fixed. Instead of nesting the local and the remote cpu lock simply
acquire only the remote cpu lock, which is sufficient protection for
this procedure.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I discovered this bug when booting 3.4-rt on my powerpc box. It crashed
with the following report:
------------[ cut here ]------------
kernel BUG at /work/rt/stable-rt.git/kernel/rtmutex_common.h:75!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in:
NIP: c0000000004aa03c LR: c0000000004aa01c CTR: c00000000009b2ac
REGS: c00000003e8d7950 TRAP: 0700 Not tainted (3.4.11-test-rt19)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000082 XER: 20000000
SOFTE: 0
TASK = c00000003e8fdcd0[11] 'ksoftirqd/1' THREAD: c00000003e8d4000 CPU: 1
GPR00: 0000000000000001 c00000003e8d7bd0 c000000000d6cbb0 0000000000000000
GPR04: c00000003e8fdcd0 0000000000000000 0000000024004082 c000000000011454
GPR08: 0000000000000000 0000000080000001 c00000003e8fdcd1 0000000000000000
GPR12: 0000000024000084 c00000000fff0280 ffffffffffffffff 000000003ffffad8
GPR16: ffffffffffffffff 000000000072c798 0000000000000060 0000000000000000
GPR20: 0000000000642741 000000000072c858 000000003ffffaf0 0000000000000417
GPR24: 000000000072dcd0 c00000003e7ff990 0000000000000000 0000000000000001
GPR28: 0000000000000000 c000000000792340 c000000000ccec78 c000000001182338
NIP [c0000000004aa03c] .wakeup_next_waiter+0x44/0xb8
LR [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8
Call Trace:
[c00000003e8d7bd0] [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8 (unreliable)
[c00000003e8d7c60] [c0000000004a0320] .rt_spin_lock_slowunlock+0x8c/0xe4
[c00000003e8d7ce0] [c0000000004a07cc] .rt_spin_unlock+0x54/0x64
[c00000003e8d7d60] [c0000000000636bc] .__thread_do_softirq+0x130/0x174
[c00000003e8d7df0] [c00000000006379c] .run_ksoftirqd+0x9c/0x1a4
[c00000003e8d7ea0] [c000000000080b68] .kthread+0xa8/0xb4
[c00000003e8d7f90] [c00000000001c2f8] .kernel_thread+0x54/0x70
Instruction dump:
60000000 e86d01c8 38630730 4bff7061 60000000 ebbf0008 7c7c1b78 e81d0040
7fe00278 7c000074 7800d182 68000001 <0b000000> e88d01c8 387d0010 38840738
The rtmutex_common.h:75 is:
rt_mutex_top_waiter(struct rt_mutex *lock)
{
struct rt_mutex_waiter *w;
w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
list_entry);
BUG_ON(w->lock != lock);
return w;
}
Where the waiter->lock is corrupted. I saw various other random bugs
that all had to with the softirq lock and plist. As plist needs to be
initialized before it is used I investigated how this lock is
initialized. It's initialized with:
void __init softirq_early_init(void)
{
local_irq_lock_init(local_softirq_lock);
}
Where:
do { \
int __cpu; \
for_each_possible_cpu(__cpu) \
spin_lock_init(&per_cpu(lvar, __cpu).lock); \
} while (0)
As the softirq lock is a local_irq_lock, which is a per_cpu lock, the
initialization is done to all per_cpu versions of the lock. But lets
look at where the softirq_early_init() is called from.
In init/main.c: start_kernel()
/*
* Interrupts are still disabled. Do necessary setups, then
* enable them
*/
softirq_early_init();
tick_init();
boot_cpu_init();
page_address_init();
printk(KERN_NOTICE "%s", linux_banner);
setup_arch(&command_line);
mm_init_owner(&init_mm, &init_task);
mm_init_cpumask(&init_mm);
setup_command_line(command_line);
setup_nr_cpu_ids();
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
One of the first things that is called is the initialization of the
softirq lock. But if you look further down, we see the per_cpu areas
have not been set up yet. Thus initializing a local_irq_lock() before
the per_cpu section is set up, may not work as it is initializing the
per cpu locks before the per cpu exists.
By moving the softirq_early_init() right after setup_per_cpu_areas(),
the kernel boots fine.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <clark@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Carsten Emde <cbe@osadl.org>
Cc: vomlehn@texas.net
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1349362924.6755.18.camel@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
| |
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The leap-second backport broke RT, and a few changes had to be done.
1) The second_overflow now encompasses ntp_leap_second, and since
second_overflow is called with the xtime_lock held, we can not take that
lock either. (this update was done during the rebase).
2) Change ktime_get_update_offsets() to use read_seqcount_begin() instead
of read_seq_begin() (and retry).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
| |
The commit "cpu/rt: Rework cpu down for PREEMPT_RT" changed the double
meaning of the cpu_hotplug.lock, where it was a spinlock for RT and a
mutex for non-RT, to just a mutex for both. But the initialization of
the variable was not updated to reflect this change.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|