author     Sebastian Andrzej Siewior <bigeasy@linutronix.de>  2017-12-01 15:55:00 +0100
committer  Sebastian Andrzej Siewior <bigeasy@linutronix.de>  2017-12-01 15:55:00 +0100
commit     a9e44086cd67396bc1ad41455b86e3a60620c021 (patch)
tree       c4da378ce7d2b5f3d6ecfded281adb6ce8bbd522
parent     657d8cd9f93891840fb1cd1666a8e590d19e72ba (diff)
download   linux-rt-a9e44086cd67396bc1ad41455b86e3a60620c021.tar.gz

[ANNOUNCE] v4.14.3-rt4  (tag: v4.14.3-rt4-patches)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-rw-r--r--  patches/localversion.patch                                          2
-rw-r--r--  patches/sched-rt-Simplify-the-IPI-based-RT-balancing-logic.patch  564
-rw-r--r--  patches/series                                                      1
3 files changed, 1 insertions, 566 deletions
diff --git a/patches/localversion.patch b/patches/localversion.patch
index e36eb4b6666a..03a80b8b0e80 100644
--- a/patches/localversion.patch
+++ b/patches/localversion.patch
@@ -10,4 +10,4 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
--- /dev/null
+++ b/localversion-rt
@@ -0,0 +1 @@
-+-rt3
++-rt4
diff --git a/patches/sched-rt-Simplify-the-IPI-based-RT-balancing-logic.patch b/patches/sched-rt-Simplify-the-IPI-based-RT-balancing-logic.patch
deleted file mode 100644
index e40ee9b5e839..000000000000
--- a/patches/sched-rt-Simplify-the-IPI-based-RT-balancing-logic.patch
+++ /dev/null
@@ -1,564 +0,0 @@
-From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
-Date: Fri, 6 Oct 2017 14:05:04 -0400
-Subject: [PATCH] sched/rt: Simplify the IPI based RT balancing logic
-
-Upstream commit 4bdced5c9a2922521e325896a7bbbf0132c94e56
-
-When a CPU lowers its priority (schedules out a high priority task for a
-lower priority one), a check is made to see if any other CPU has overloaded
-RT tasks (more than one). It checks the rto_mask to determine this and if so
-it will request to pull one of those tasks to itself if the non running RT
-task is of higher priority than the new priority of the next task to run on
-the current CPU.
-
-When we deal with a large number of CPUs, the original pull logic suffered
-from large lock contention on a single CPU run queue, which caused a huge
-latency across all CPUs. This was caused by only having one CPU having
-overloaded RT tasks and a bunch of other CPUs lowering their priority. To
-solve this issue, commit:
-
- b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")
-
-changed the way to request a pull. Instead of grabbing the lock of the
-overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work.
-
-Although the IPI logic worked very well in removing the large latency build
-up, it still could suffer from a large number of IPIs being sent to a single
-CPU. On an 80 CPU box, I measured over 200us of processing IPIs. Worse yet,
-when I tested this on a 120 CPU box, with a stress test that had lots of
-RT tasks scheduling on all CPUs, it actually triggered the hard lockup
-detector! One CPU had so many IPIs sent to it, and due to the restart
-mechanism that is triggered when the source run queue has a priority status
-change, the CPU spent minutes! processing the IPIs.
-
-Thinking about this further, I realized there's no reason for each run queue
-to send its own IPI. As all CPUs with overloaded tasks must be scanned
-regardless if there's one or many CPUs lowering their priority, because
-there's no current way to find the CPU with the highest priority task that
-can schedule to one of these CPUs, there really only needs to be one IPI
-being sent around at a time.
-
-This greatly simplifies the code!
-
-The new approach is to have each root domain have its own irq work, as the
-rto_mask is per root domain. The root domain has the following fields
-attached to it:
-
- rto_push_work - the irq work to process each CPU set in rto_mask
- rto_lock - the lock to protect some of the other rto fields
- rto_loop_start - an atomic that keeps contention down on rto_lock
- the first CPU scheduling in a lower priority task
- is the one to kick off the process.
- rto_loop_next - an atomic that gets incremented for each CPU that
- schedules in a lower priority task.
- rto_loop - a variable protected by rto_lock that is used to
- compare against rto_loop_next
- rto_cpu - The cpu to send the next IPI to, also protected by
- the rto_lock.
-
-When a CPU schedules in a lower priority task and wants to make sure
-overloaded CPUs know about it, it increments the rto_loop_next. Then it
-atomically sets rto_loop_start with a cmpxchg. If the old value is not "0",
-then it is done, as another CPU is kicking off the IPI loop. If the old
-value is "0", then it will take the rto_lock to synchronize with a possible
-IPI being sent around to the overloaded CPUs.
-
-If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no
-IPI being sent around, or one is about to finish. Then rto_cpu is set to the
-first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set
-in rto_mask, then there's nothing to be done.
-
-When the CPU receives the IPI, it will first try to push any RT tasks that are
-queued on the CPU but can't run because a higher priority RT task is
-currently running on that CPU.
-
-Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it
-finds one, it simply sends an IPI to that CPU and the process continues.
-
-If there's no more CPUs in the rto_mask, then rto_loop is compared with
-rto_loop_next. If they match, everything is done and the process is over. If
-they do not match, then a CPU scheduled in a lower priority task as the IPI
-was being passed around, and the process needs to start again. The first CPU
-in rto_mask is sent the IPI.
-
-This change removes this duplication of work in the IPI logic, and greatly
-lowers the latency caused by the IPIs. This removed the lockup happening on
-the 120 CPU machine. It also simplifies the code tremendously. What else
-could anyone ask for?
-
-Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and
-supplying me with the rto_start_trylock() and rto_start_unlock() helper
-functions.
-
-Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
-Cc: Clark Williams <williams@redhat.com>
-Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
-Cc: John Kacur <jkacur@redhat.com>
-Cc: Linus Torvalds <torvalds@linux-foundation.org>
-Cc: Mike Galbraith <efault@gmx.de>
-Cc: Peter Zijlstra <peterz@infradead.org>
-Cc: Scott Wood <swood@redhat.com>
-Cc: Thomas Gleixner <tglx@linutronix.de>
-Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home
-Signed-off-by: Ingo Molnar <mingo@kernel.org>
-Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
----
- kernel/sched/rt.c | 316 +++++++++++++++++-------------------------------
- kernel/sched/sched.h | 24 ++-
- kernel/sched/topology.c | 6
- 3 files changed, 138 insertions(+), 208 deletions(-)
-
---- a/kernel/sched/rt.c
-+++ b/kernel/sched/rt.c
-@@ -74,10 +74,6 @@ static void start_rt_bandwidth(struct rt
- raw_spin_unlock(&rt_b->rt_runtime_lock);
- }
-
--#if defined(CONFIG_SMP) && defined(HAVE_RT_PUSH_IPI)
--static void push_irq_work_func(struct irq_work *work);
--#endif
--
- void init_rt_rq(struct rt_rq *rt_rq)
- {
- struct rt_prio_array *array;
-@@ -97,13 +93,6 @@ void init_rt_rq(struct rt_rq *rt_rq)
- rt_rq->rt_nr_migratory = 0;
- rt_rq->overloaded = 0;
- plist_head_init(&rt_rq->pushable_tasks);
--
--#ifdef HAVE_RT_PUSH_IPI
-- rt_rq->push_flags = 0;
-- rt_rq->push_cpu = nr_cpu_ids;
-- raw_spin_lock_init(&rt_rq->push_lock);
-- init_irq_work(&rt_rq->push_work, push_irq_work_func);
--#endif
- #endif /* CONFIG_SMP */
- /* We start is dequeued state, because no RT tasks are queued */
- rt_rq->rt_queued = 0;
-@@ -1876,241 +1865,166 @@ static void push_rt_tasks(struct rq *rq)
- }
-
- #ifdef HAVE_RT_PUSH_IPI
-+
- /*
-- * The search for the next cpu always starts at rq->cpu and ends
-- * when we reach rq->cpu again. It will never return rq->cpu.
-- * This returns the next cpu to check, or nr_cpu_ids if the loop
-- * is complete.
-+ * When a high priority task schedules out from a CPU and a lower priority
-+ * task is scheduled in, a check is made to see if there's any RT tasks
-+ * on other CPUs that are waiting to run because a higher priority RT task
-+ * is currently running on its CPU. In this case, the CPU with multiple RT
-+ * tasks queued on it (overloaded) needs to be notified that a CPU has opened
-+ * up that may be able to run one of its non-running queued RT tasks.
-+ *
-+ * All CPUs with overloaded RT tasks need to be notified as there is currently
-+ * no way to know which of these CPUs have the highest priority task waiting
-+ * to run. Instead of trying to take a spinlock on each of these CPUs,
-+ * which has shown to cause large latency when done on machines with many
-+ * CPUs, sending an IPI to the CPUs to have them push off the overloaded
-+ * RT tasks waiting to run.
-+ *
-+ * Just sending an IPI to each of the CPUs is also an issue, as on large
-+ * count CPU machines, this can cause an IPI storm on a CPU, especially
-+ * if it's the only CPU with multiple RT tasks queued, and a large number
-+ * of CPUs scheduling a lower priority task at the same time.
-+ *
-+ * Each root domain has its own irq work function that can iterate over
-+ * all CPUs with RT overloaded tasks. Since all CPUs with overloaded RT
-+ * tasks must be checked if there's one or many CPUs that are lowering
-+ * their priority, there's a single irq work iterator that will try to
-+ * push off RT tasks that are waiting to run.
-+ *
-+ * When a CPU schedules a lower priority task, it will kick off the
-+ * irq work iterator that will jump to each CPU with overloaded RT tasks.
-+ * As it only takes the first CPU that schedules a lower priority task
-+ * to start the process, the rto_start variable is incremented and if
-+ * the atomic result is one, then that CPU will try to take the rto_lock.
-+ * This prevents high contention on the lock as the process handles all
-+ * CPUs scheduling lower priority tasks.
-+ *
-+ * All CPUs that are scheduling a lower priority task will increment the
-+ * rt_loop_next variable. This will make sure that the irq work iterator
-+ * checks all RT overloaded CPUs whenever a CPU schedules a new lower
-+ * priority task, even if the iterator is in the middle of a scan. Incrementing
-+ * the rt_loop_next will cause the iterator to perform another scan.
- *
-- * rq->rt.push_cpu holds the last cpu returned by this function,
-- * or if this is the first instance, it must hold rq->cpu.
- */
- static int rto_next_cpu(struct rq *rq)
- {
-- int prev_cpu = rq->rt.push_cpu;
-+ struct root_domain *rd = rq->rd;
-+ int next;
- int cpu;
-
-- cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
--
- /*
-- * If the previous cpu is less than the rq's CPU, then it already
-- * passed the end of the mask, and has started from the beginning.
-- * We end if the next CPU is greater or equal to rq's CPU.
-+ * When starting the IPI RT pushing, the rto_cpu is set to -1,
-+ * rt_next_cpu() will simply return the first CPU found in
-+ * the rto_mask.
-+ *
-+ * If rto_next_cpu() is called with rto_cpu set to a valid cpu, it
-+ * will return the next CPU found in the rto_mask.
-+ *
-+ * If there are no more CPUs left in the rto_mask, then a check is made
-+ * against rto_loop and rto_loop_next. rto_loop is only updated with
-+ * the rto_lock held, but any CPU may increment the rto_loop_next
-+ * without any locking.
- */
-- if (prev_cpu < rq->cpu) {
-- if (cpu >= rq->cpu)
-- return nr_cpu_ids;
-+ for (;;) {
-
-- } else if (cpu >= nr_cpu_ids) {
-- /*
-- * We passed the end of the mask, start at the beginning.
-- * If the result is greater or equal to the rq's CPU, then
-- * the loop is finished.
-- */
-- cpu = cpumask_first(rq->rd->rto_mask);
-- if (cpu >= rq->cpu)
-- return nr_cpu_ids;
-- }
-- rq->rt.push_cpu = cpu;
-+ /* When rto_cpu is -1 this acts like cpumask_first() */
-+ cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
-
-- /* Return cpu to let the caller know if the loop is finished or not */
-- return cpu;
--}
-+ rd->rto_cpu = cpu;
-
--static int find_next_push_cpu(struct rq *rq)
--{
-- struct rq *next_rq;
-- int cpu;
-+ if (cpu < nr_cpu_ids)
-+ return cpu;
-
-- while (1) {
-- cpu = rto_next_cpu(rq);
-- if (cpu >= nr_cpu_ids)
-- break;
-- next_rq = cpu_rq(cpu);
-+ rd->rto_cpu = -1;
-+
-+ /*
-+ * ACQUIRE ensures we see the @rto_mask changes
-+ * made prior to the @next value observed.
-+ *
-+ * Matches WMB in rt_set_overload().
-+ */
-+ next = atomic_read_acquire(&rd->rto_loop_next);
-
-- /* Make sure the next rq can push to this rq */
-- if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
-+ if (rd->rto_loop == next)
- break;
-+
-+ rd->rto_loop = next;
- }
-
-- return cpu;
-+ return -1;
- }
-
--#define RT_PUSH_IPI_EXECUTING 1
--#define RT_PUSH_IPI_RESTART 2
-+static inline bool rto_start_trylock(atomic_t *v)
-+{
-+ return !atomic_cmpxchg_acquire(v, 0, 1);
-+}
-
--/*
-- * When a high priority task schedules out from a CPU and a lower priority
-- * task is scheduled in, a check is made to see if there's any RT tasks
-- * on other CPUs that are waiting to run because a higher priority RT task
-- * is currently running on its CPU. In this case, the CPU with multiple RT
-- * tasks queued on it (overloaded) needs to be notified that a CPU has opened
-- * up that may be able to run one of its non-running queued RT tasks.
-- *
-- * On large CPU boxes, there's the case that several CPUs could schedule
-- * a lower priority task at the same time, in which case it will look for
-- * any overloaded CPUs that it could pull a task from. To do this, the runqueue
-- * lock must be taken from that overloaded CPU. Having 10s of CPUs all fighting
-- * for a single overloaded CPU's runqueue lock can produce a large latency.
-- * (This has actually been observed on large boxes running cyclictest).
-- * Instead of taking the runqueue lock of the overloaded CPU, each of the
-- * CPUs that scheduled a lower priority task simply sends an IPI to the
-- * overloaded CPU. An IPI is much cheaper than taking an runqueue lock with
-- * lots of contention. The overloaded CPU will look to push its non-running
-- * RT task off, and if it does, it can then ignore the other IPIs coming
-- * in, and just pass those IPIs off to any other overloaded CPU.
-- *
-- * When a CPU schedules a lower priority task, it only sends an IPI to
-- * the "next" CPU that has overloaded RT tasks. This prevents IPI storms,
-- * as having 10 CPUs scheduling lower priority tasks and 10 CPUs with
-- * RT overloaded tasks, would cause 100 IPIs to go out at once.
-- *
-- * The overloaded RT CPU, when receiving an IPI, will try to push off its
-- * overloaded RT tasks and then send an IPI to the next CPU that has
-- * overloaded RT tasks. This stops when all CPUs with overloaded RT tasks
-- * have completed. Just because a CPU may have pushed off its own overloaded
-- * RT task does not mean it should stop sending the IPI around to other
-- * overloaded CPUs. There may be another RT task waiting to run on one of
-- * those CPUs that are of higher priority than the one that was just
-- * pushed.
-- *
-- * An optimization that could possibly be made is to make a CPU array similar
-- * to the cpupri array mask of all running RT tasks, but for the overloaded
-- * case, then the IPI could be sent to only the CPU with the highest priority
-- * RT task waiting, and that CPU could send off further IPIs to the CPU with
-- * the next highest waiting task. Since the overloaded case is much less likely
-- * to happen, the complexity of this implementation may not be worth it.
-- * Instead, just send an IPI around to all overloaded CPUs.
-- *
-- * The rq->rt.push_flags holds the status of the IPI that is going around.
-- * A run queue can only send out a single IPI at a time. The possible flags
-- * for rq->rt.push_flags are:
-- *
-- * (None or zero): No IPI is going around for the current rq
-- * RT_PUSH_IPI_EXECUTING: An IPI for the rq is being passed around
-- * RT_PUSH_IPI_RESTART: The priority of the running task for the rq
-- * has changed, and the IPI should restart
-- * circulating the overloaded CPUs again.
-- *
-- * rq->rt.push_cpu contains the CPU that is being sent the IPI. It is updated
-- * before sending to the next CPU.
-- *
-- * Instead of having all CPUs that schedule a lower priority task send
-- * an IPI to the same "first" CPU in the RT overload mask, they send it
-- * to the next overloaded CPU after their own CPU. This helps distribute
-- * the work when there's more than one overloaded CPU and multiple CPUs
-- * scheduling in lower priority tasks.
-- *
-- * When a rq schedules a lower priority task than what was currently
-- * running, the next CPU with overloaded RT tasks is examined first.
-- * That is, if CPU 1 and 5 are overloaded, and CPU 3 schedules a lower
-- * priority task, it will send an IPI first to CPU 5, then CPU 5 will
-- * send to CPU 1 if it is still overloaded. CPU 1 will clear the
-- * rq->rt.push_flags if RT_PUSH_IPI_RESTART is not set.
-- *
-- * The first CPU to notice IPI_RESTART is set, will clear that flag and then
-- * send an IPI to the next overloaded CPU after the rq->cpu and not the next
-- * CPU after push_cpu. That is, if CPU 1, 4 and 5 are overloaded when CPU 3
-- * schedules a lower priority task, and the IPI_RESTART gets set while the
-- * handling is being done on CPU 5, it will clear the flag and send it back to
-- * CPU 4 instead of CPU 1.
-- *
-- * Note, the above logic can be disabled by turning off the sched_feature
-- * RT_PUSH_IPI. Then the rq lock of the overloaded CPU will simply be
-- * taken by the CPU requesting a pull and the waiting RT task will be pulled
-- * by that CPU. This may be fine for machines with few CPUs.
-- */
--static void tell_cpu_to_push(struct rq *rq)
-+static inline void rto_start_unlock(atomic_t *v)
- {
-- int cpu;
-+ atomic_set_release(v, 0);
-+}
-
-- if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
-- raw_spin_lock(&rq->rt.push_lock);
-- /* Make sure it's still executing */
-- if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
-- /*
-- * Tell the IPI to restart the loop as things have
-- * changed since it started.
-- */
-- rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
-- raw_spin_unlock(&rq->rt.push_lock);
-- return;
-- }
-- raw_spin_unlock(&rq->rt.push_lock);
-- }
-+static void tell_cpu_to_push(struct rq *rq)
-+{
-+ int cpu = -1;
-
-- /* When here, there's no IPI going around */
-+ /* Keep the loop going if the IPI is currently active */
-+ atomic_inc(&rq->rd->rto_loop_next);
-
-- rq->rt.push_cpu = rq->cpu;
-- cpu = find_next_push_cpu(rq);
-- if (cpu >= nr_cpu_ids)
-+ /* Only one CPU can initiate a loop at a time */
-+ if (!rto_start_trylock(&rq->rd->rto_loop_start))
- return;
-
-- rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
-+ raw_spin_lock(&rq->rd->rto_lock);
-
-- irq_work_queue_on(&rq->rt.push_work, cpu);
-+ /*
-+ * The rto_cpu is updated under the lock, if it has a valid cpu
-+ * then the IPI is still running and will continue due to the
-+ * update to loop_next, and nothing needs to be done here.
-+ * Otherwise it is finishing up and an ipi needs to be sent.
-+ */
-+ if (rq->rd->rto_cpu < 0)
-+ cpu = rto_next_cpu(rq);
-+
-+ raw_spin_unlock(&rq->rd->rto_lock);
-+
-+ rto_start_unlock(&rq->rd->rto_loop_start);
-+
-+ if (cpu >= 0)
-+ irq_work_queue_on(&rq->rd->rto_push_work, cpu);
- }
-
- /* Called from hardirq context */
--static void try_to_push_tasks(void *arg)
-+void rto_push_irq_work_func(struct irq_work *work)
- {
-- struct rt_rq *rt_rq = arg;
-- struct rq *rq, *src_rq;
-- int this_cpu;
-+ struct rq *rq;
- int cpu;
-
-- this_cpu = rt_rq->push_cpu;
-+ rq = this_rq();
-
-- /* Paranoid check */
-- BUG_ON(this_cpu != smp_processor_id());
--
-- rq = cpu_rq(this_cpu);
-- src_rq = rq_of_rt_rq(rt_rq);
--
--again:
-+ /*
-+ * We do not need to grab the lock to check for has_pushable_tasks.
-+ * When it gets updated, a check is made if a push is possible.
-+ */
- if (has_pushable_tasks(rq)) {
- raw_spin_lock(&rq->lock);
-- push_rt_task(rq);
-+ push_rt_tasks(rq);
- raw_spin_unlock(&rq->lock);
- }
-
-- /* Pass the IPI to the next rt overloaded queue */
-- raw_spin_lock(&rt_rq->push_lock);
-- /*
-- * If the source queue changed since the IPI went out,
-- * we need to restart the search from that CPU again.
-- */
-- if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
-- rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
-- rt_rq->push_cpu = src_rq->cpu;
-- }
-+ raw_spin_lock(&rq->rd->rto_lock);
-
-- cpu = find_next_push_cpu(src_rq);
-+ /* Pass the IPI to the next rt overloaded queue */
-+ cpu = rto_next_cpu(rq);
-
-- if (cpu >= nr_cpu_ids)
-- rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
-- raw_spin_unlock(&rt_rq->push_lock);
-+ raw_spin_unlock(&rq->rd->rto_lock);
-
-- if (cpu >= nr_cpu_ids)
-+ if (cpu < 0)
- return;
-
-- /*
-- * It is possible that a restart caused this CPU to be
-- * chosen again. Don't bother with an IPI, just see if we
-- * have more to push.
-- */
-- if (unlikely(cpu == rq->cpu))
-- goto again;
--
- /* Try the next RT overloaded CPU */
-- irq_work_queue_on(&rt_rq->push_work, cpu);
--}
--
--static void push_irq_work_func(struct irq_work *work)
--{
-- struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
--
-- try_to_push_tasks(rt_rq);
-+ irq_work_queue_on(&rq->rd->rto_push_work, cpu);
- }
- #endif /* HAVE_RT_PUSH_IPI */
-
---- a/kernel/sched/sched.h
-+++ b/kernel/sched/sched.h
-@@ -502,7 +502,7 @@ static inline int rt_bandwidth_enabled(v
- }
-
- /* RT IPI pull logic requires IRQ_WORK */
--#ifdef CONFIG_IRQ_WORK
-+#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_SMP)
- # define HAVE_RT_PUSH_IPI
- #endif
-
-@@ -524,12 +524,6 @@ struct rt_rq {
- unsigned long rt_nr_total;
- int overloaded;
- struct plist_head pushable_tasks;
--#ifdef HAVE_RT_PUSH_IPI
-- int push_flags;
-- int push_cpu;
-- struct irq_work push_work;
-- raw_spinlock_t push_lock;
--#endif
- #endif /* CONFIG_SMP */
- int rt_queued;
-
-@@ -638,6 +632,19 @@ struct root_domain {
- struct dl_bw dl_bw;
- struct cpudl cpudl;
-
-+#ifdef HAVE_RT_PUSH_IPI
-+ /*
-+ * For IPI pull requests, loop across the rto_mask.
-+ */
-+ struct irq_work rto_push_work;
-+ raw_spinlock_t rto_lock;
-+ /* These are only updated and read within rto_lock */
-+ int rto_loop;
-+ int rto_cpu;
-+ /* These atomics are updated outside of a lock */
-+ atomic_t rto_loop_next;
-+ atomic_t rto_loop_start;
-+#endif
- /*
- * The "RT overload" flag: it gets set if a CPU has more than
- * one runnable RT task.
-@@ -655,6 +662,9 @@ extern void init_defrootdomain(void);
- extern int sched_init_domains(const struct cpumask *cpu_map);
- extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
-
-+#ifdef HAVE_RT_PUSH_IPI
-+extern void rto_push_irq_work_func(struct irq_work *work);
-+#endif
- #endif /* CONFIG_SMP */
-
- /*
---- a/kernel/sched/topology.c
-+++ b/kernel/sched/topology.c
-@@ -269,6 +269,12 @@ static int init_rootdomain(struct root_d
- if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
- goto free_dlo_mask;
-
-+#ifdef HAVE_RT_PUSH_IPI
-+ rd->rto_cpu = -1;
-+ raw_spin_lock_init(&rd->rto_lock);
-+ init_irq_work(&rd->rto_push_work, rto_push_irq_work_func);
-+#endif
-+
- init_dl_bw(&rd->dl_bw);
- if (cpudl_init(&rd->cpudl) != 0)
- goto free_rto_mask;
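
The changelog of the removed patch describes the rto_loop_start/rto_loop_next handshake only in prose; the real implementation is in the rt.c hunks above. As a reading aid, here is a minimal standalone C model of that handshake. It is a sketch under simplifying assumptions, not kernel code: it is single-threaded, the rto_lock and the acquire/release ordering used by the kernel are elided, the rto_* names are borrowed from the patch, and the IPI is replaced by a printf.

/*
 * Minimal userspace model of the rto_loop handshake described in the
 * removed changelog. Illustration only: rto_lock and memory ordering
 * are elided, and the "IPI" is just a printf.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

static atomic_int rto_loop_start;      /* 0/1 gate: one CPU starts the chain  */
static atomic_int rto_loop_next;       /* bumped by every prio-lowering CPU   */
static int rto_loop;                   /* last loop_next value acted upon     */
static int rto_cpu = -1;               /* next CPU to "IPI"; -1 = chain idle  */
static bool rto_mask[NR_CPUS];         /* which CPUs are RT overloaded        */

/* The kernel uses atomic_cmpxchg_acquire()/atomic_set_release() here. */
static bool rto_start_trylock(atomic_int *v)
{
        int zero = 0;
        return atomic_compare_exchange_strong(v, &zero, 1);
}

static void rto_start_unlock(atomic_int *v)
{
        atomic_store(v, 0);
}

/* Next overloaded CPU, or -1 once the mask is exhausted and nothing changed. */
static int rto_next_cpu(void)
{
        for (;;) {
                int cpu, next;

                for (cpu = rto_cpu + 1; cpu < NR_CPUS; cpu++)
                        if (rto_mask[cpu])
                                return rto_cpu = cpu;

                rto_cpu = -1;
                next = atomic_load(&rto_loop_next);
                if (rto_loop == next)
                        return -1;      /* no CPU lowered its prio meanwhile  */
                rto_loop = next;        /* someone did: scan the mask again   */
        }
}

/* What tell_cpu_to_push() boils down to in the new scheme. */
static void lower_prio_on(int this_cpu)
{
        int cpu = -1;

        atomic_fetch_add(&rto_loop_next, 1);    /* keep an active chain going */
        if (!rto_start_trylock(&rto_loop_start))
                return;                         /* another CPU is starting it */

        if (rto_cpu < 0)                        /* no chain running: start it */
                cpu = rto_next_cpu();

        rto_start_unlock(&rto_loop_start);

        if (cpu >= 0)
                printf("CPU%d sends the first IPI to CPU%d\n", this_cpu, cpu);
}

int main(void)
{
        rto_mask[3] = rto_mask[5] = true;  /* CPUs 3 and 5 are RT overloaded  */
        lower_prio_on(0);                  /* starts the chain, IPIs CPU 3    */
        lower_prio_on(1);                  /* chain active: only bumps next   */
        return 0;
}

The point of the protocol, as the changelog explains, is that only the first CPU to lower its priority starts an IPI chain (the trylock on rto_loop_start), while every later CPU merely bumps rto_loop_next so the already-running chain rescans the rto_mask before it stops.
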
diff --git a/patches/series b/patches/series
index d3d6c39f17f2..c06296eca4f4 100644
--- a/patches/series
+++ b/patches/series
@@ -6,7 +6,6 @@
# UPSTREAM changes queued
############################################################
rcu-Suppress-lockdep-false-positive-boost_mtx-compla.patch
-sched-rt-Simplify-the-IPI-based-RT-balancing-logic.patch
############################################################
# UPSTREAM FIXES, patches pending