author     Sebastian Andrzej Siewior <bigeasy@linutronix.de>   2021-10-08 23:18:03 +0200
committer  Sebastian Andrzej Siewior <bigeasy@linutronix.de>   2021-10-08 23:18:03 +0200
commit     8faf46dc8df161aa1b6be244e6d842094fc32212 (patch)
tree       4761b83d49b72a53caf36962e02efc407b335a9b
parent     6b69678d2e2b3c2109af22409c3aff3cbab3f239 (diff)
download   linux-rt-8faf46dc8df161aa1b6be244e6d842094fc32212.tar.gz
[ANNOUNCE] v5.15-rc4-rt8
Dear RT folks!
I'm pleased to announce the v5.15-rc4-rt8 patch set.
Changes since v5.15-rc4-rt7:
- Redo the i915 patches. Everyone with an i915 is welcome to help with testing. The
upstream thread is at
https://lore.kernel.org/all/20211005150046.1000285-1-bigeasy@linutronix.de/
I'm also curious to find out whether the uncore.lock can be made a
raw_spinlock_t or whether doing so pushes the latency through the roof.
See
https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/
Any help is welcome.
- During the i915 rework, a preempt_disable_rt() section in the radeon
driver has been dropped because the section contains a spinlock_t,
which is a sleeping lock on PREEMPT_RT and therefore must not be
acquired with preemption disabled. A short sketch of the problematic
pattern follows this list.
- The irq_work patch(es) have been reworked based on upstream's
feedback. irq_work items are no longer processed in softirq context
but by a per-CPU thread running at the lowest RT priority; only items
marked IRQ_WORK_HARD_IRQ still run in hardirq context. A sketch of
the resulting usage follows this list.
- The statistics accounting in networking has been reworked in order
to decouple the seqcount_t "try lock" usage from the statistics
updates and the qdisc's running state; the underlying u64_stats
read-retry pattern is sketched after this list. Patches by Ahmed S.
Darwish.
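For the radeon change above, here is a minimal sketch of the dropped
pattern and why it is invalid on PREEMPT_RT. The lock and function names
are hypothetical, not the actual radeon code:

#include <linux/preempt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(stats_lock);	/* hypothetical lock */

static void update_stats(void)		/* hypothetical function */
{
	preempt_disable();
	/*
	 * Invalid on PREEMPT_RT: spinlock_t is a sleeping lock there
	 * and must not be acquired inside a preempt-disabled region.
	 */
	spin_lock(&stats_lock);
	/* ... update the statistics ... */
	spin_unlock(&stats_lock);
	preempt_enable();
}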
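For the irq_work change, a minimal sketch of the new behaviour from a
user's point of view, using the IRQ_WORK_INIT()/IRQ_WORK_INIT_HARD()
initializers from <linux/irq_work.h>. The callbacks are hypothetical:

#include <linux/irq_work.h>

static void my_lazy_cb(struct irq_work *work)
{
	/*
	 * Not marked IRQ_WORK_HARD_IRQ: on PREEMPT_RT this now runs in
	 * the per-CPU irq_work/N thread and may therefore acquire
	 * spinlock_t locks or call kfree().
	 */
}

static void my_hard_cb(struct irq_work *work)
{
	/*
	 * Marked IRQ_WORK_HARD_IRQ: still invoked in hardirq context
	 * even on PREEMPT_RT; must not take sleeping locks.
	 */
}

static struct irq_work my_lazy_work = IRQ_WORK_INIT(my_lazy_cb);
static struct irq_work my_hard_work = IRQ_WORK_INIT_HARD(my_hard_cb);

/* Queueing is unchanged in both cases: irq_work_queue(&my_lazy_work); */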
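And for the statistics rework, a minimal sketch of the u64_stats
read-retry pattern it builds on, against the structures from
<net/gen_stats.h>. The helper name is made up:

#include <linux/u64_stats_sync.h>
#include <net/gen_stats.h>

/*
 * Read a consistent bytes/packets snapshot against concurrent writers.
 * On 64-bit kernels the fetch/retry pair compiles down to nothing.
 */
static void read_basic_stats(const struct gnet_stats_basic_cpu *bcpu,
			     u64 *bytes, u64 *packets)
{
	unsigned int start;

	do {
		start = u64_stats_fetch_begin_irq(&bcpu->syncp);
		*bytes = bcpu->bstats.bytes;
		*packets = bcpu->bstats.packets;
	} while (u64_stats_fetch_retry_irq(&bcpu->syncp, start));
}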
Known issues
- netconsole triggers a WARN().
- The "Memory controller" (CONFIG_MEMCG) has been disabled.
- Valentin Schneider reported a few splats on ARM64, see
https://lkml.kernel.org/r/20210810134127.1394269-1-valentin.schneider@arm.com
The delta patch against v5.15-rc4-rt7 is appended below and can be found here:
https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/incr/patch-5.15-rc4-rt7-rt8.patch.xz
You can get this release via the git tree at:
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v5.15-rc4-rt8
The RT patch against v5.15-rc4 can be found here:
https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/older/patch-5.15-rc4-rt8.patch.xz
The split quilt queue is available at:
https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/older/patches-5.15-rc4-rt8.tar.xz
Sebastian
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
40 files changed, 3429 insertions, 809 deletions
diff --git a/patches/0001-drm-i915-remember-to-call-i915_sw_fence_fini.patch b/patches/0001-drm-i915-remember-to-call-i915_sw_fence_fini.patch new file mode 100644 index 000000000000..96a65d66e1b2 --- /dev/null +++ b/patches/0001-drm-i915-remember-to-call-i915_sw_fence_fini.patch @@ -0,0 +1,35 @@ +From: Matthew Auld <matthew.auld@intel.com> +Date: Fri, 24 Sep 2021 15:46:46 +0100 +Subject: [PATCH 01/10] drm/i915: remember to call i915_sw_fence_fini +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Seems to fix some object-debug splat which appeared while debugging +something unrelated. + +v2: s/guc_blocked/guc_state.blocked/ + +[bigeasy: s/guc_state.blocked/guc_blocked ] + +Signed-off-by: Matthew Auld <matthew.auld@intel.com> +Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> +Cc: Matthew Brost <matthew.brost@intel.com> +Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> +Reviewed-by: Matthew Brost <matthew.brost@intel.com> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Link: https://lore.kernel.org/r/20210924144646.4096402-1-matthew.auld@intel.com +--- + drivers/gpu/drm/i915/gt/intel_context.c | 1 + + 1 file changed, 1 insertion(+) + +--- a/drivers/gpu/drm/i915/gt/intel_context.c ++++ b/drivers/gpu/drm/i915/gt/intel_context.c +@@ -421,6 +421,7 @@ void intel_context_fini(struct intel_con + + mutex_destroy(&ce->pin_mutex); + i915_active_fini(&ce->active); ++ i915_sw_fence_fini(&ce->guc_blocked); + } + + void i915_context_module_exit(void) diff --git a/patches/0001-mqprio-Correct-stats-in-mqprio_dump_class_stats.patch b/patches/0001-mqprio-Correct-stats-in-mqprio_dump_class_stats.patch new file mode 100644 index 000000000000..d16a5ade53b2 --- /dev/null +++ b/patches/0001-mqprio-Correct-stats-in-mqprio_dump_class_stats.patch @@ -0,0 +1,69 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Thu, 7 Oct 2021 18:06:03 +0200 +Subject: [PATCH 01/10] mqprio: Correct stats in mqprio_dump_class_stats(). + +It looks like with the introduction of subqueus the statics broke. +Before the change `bstats' and `qstats' on stack was fed and later this +was copied over to struct gnet_dump. + +After the change the `bstats' and `qstats' are only set to 0 and no +longer updated and that is then fed to gnet_dump. Additionally +qdisc->cpu_bstats and qdisc->cpu_qstats is destroeyd for global +stats. For per-CPU stats both __gnet_stats_copy_basic() and +__gnet_stats_copy_queue() add the values but for global stats the value +set and so the previous value is lost and only the last value from the +loop ends up in sch->[bq]stats. + +Use the on-stack [bq]stats variables again and add the stats manually in +the global case. 
+ +Fixes: ce679e8df7ed2 ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio") +Cc: John Fastabend <john.fastabend@gmail.com> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + net/sched/sch_mqprio.c | 30 ++++++++++++++++++------------ + 1 file changed, 18 insertions(+), 12 deletions(-) + +--- a/net/sched/sch_mqprio.c ++++ b/net/sched/sch_mqprio.c +@@ -529,22 +529,28 @@ static int mqprio_dump_class_stats(struc + for (i = tc.offset; i < tc.offset + tc.count; i++) { + struct netdev_queue *q = netdev_get_tx_queue(dev, i); + struct Qdisc *qdisc = rtnl_dereference(q->qdisc); +- struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL; +- struct gnet_stats_queue __percpu *cpu_qstats = NULL; + + spin_lock_bh(qdisc_lock(qdisc)); ++ + if (qdisc_is_percpu_stats(qdisc)) { +- cpu_bstats = qdisc->cpu_bstats; +- cpu_qstats = qdisc->cpu_qstats; +- } ++ qlen = qdisc_qlen_sum(qdisc); + +- qlen = qdisc_qlen_sum(qdisc); +- __gnet_stats_copy_basic(NULL, &sch->bstats, +- cpu_bstats, &qdisc->bstats); +- __gnet_stats_copy_queue(&sch->qstats, +- cpu_qstats, +- &qdisc->qstats, +- qlen); ++ __gnet_stats_copy_basic(NULL, &bstats, ++ qdisc->cpu_bstats, ++ &qdisc->bstats); ++ __gnet_stats_copy_queue(&qstats, ++ qdisc->cpu_qstats, ++ &qdisc->qstats, ++ qlen); ++ } else { ++ qlen += qdisc->q.qlen; ++ bstats.bytes += qdisc->bstats.bytes; ++ bstats.packets += qdisc->bstats.packets; ++ qstats.backlog += qdisc->qstats.backlog; ++ qstats.drops += qdisc->qstats.drops; ++ qstats.requeues += qdisc->qstats.requeues; ++ qstats.overlimits += qdisc->qstats.overlimits; ++ } + spin_unlock_bh(qdisc_lock(qdisc)); + } + diff --git a/patches/0001_sched_rt_annotate_the_rt_balancing_logic_irqwork_as_irq_work_hard_irq.patch b/patches/0001_sched_rt_annotate_the_rt_balancing_logic_irqwork_as_irq_work_hard_irq.patch index 7cdc7f9c970e..11c2f6c2a776 100644 --- a/patches/0001_sched_rt_annotate_the_rt_balancing_logic_irqwork_as_irq_work_hard_irq.patch +++ b/patches/0001_sched_rt_annotate_the_rt_balancing_logic_irqwork_as_irq_work_hard_irq.patch @@ -1,6 +1,6 @@ From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Subject: sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ -Date: Mon, 27 Sep 2021 23:19:15 +0200 +Date: Wed, 06 Oct 2021 13:18:49 +0200 The push-IPI logic for RT tasks expects to be invoked from hardirq context. One reason is that a RT task on the remote CPU would block the @@ -19,7 +19,7 @@ Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20210927211919.310855-2-bigeasy@linutronix.de +Link: https://lore.kernel.org/r/20211006111852.1514359-2-bigeasy@linutronix.de --- kernel/sched/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patches/0002-drm-Increase-DRM_OBJECT_MAX_PROPERTY-by-18.patch b/patches/0002-drm-Increase-DRM_OBJECT_MAX_PROPERTY-by-18.patch new file mode 100644 index 000000000000..2e0a3d61e418 --- /dev/null +++ b/patches/0002-drm-Increase-DRM_OBJECT_MAX_PROPERTY-by-18.patch @@ -0,0 +1,28 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Sat, 2 Oct 2021 12:03:48 +0200 +Subject: [PATCH 02/10] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18. + +The warning poped up, it says it increase it by the number of occurence. +I saw it 18 times so here it is. 
+It started to up since commit + 2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage property") + +Increase DRM_OBJECT_MAX_PROPERTY by 18. + +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Link: https://lkml.kernel.org/r/20211005065151.828922-1-bigeasy@linutronix.de +--- + include/drm/drm_mode_object.h | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/include/drm/drm_mode_object.h ++++ b/include/drm/drm_mode_object.h +@@ -60,7 +60,7 @@ struct drm_mode_object { + void (*free_cb)(struct kref *kref); + }; + +-#define DRM_OBJECT_MAX_PROPERTY 24 ++#define DRM_OBJECT_MAX_PROPERTY 42 + /** + * struct drm_object_properties - property tracking for &drm_mode_object + */ diff --git a/patches/0002-gen_stats-Add-instead-Set-the-value-in-__gnet_stats_.patch b/patches/0002-gen_stats-Add-instead-Set-the-value-in-__gnet_stats_.patch new file mode 100644 index 000000000000..59eb5799683b --- /dev/null +++ b/patches/0002-gen_stats-Add-instead-Set-the-value-in-__gnet_stats_.patch @@ -0,0 +1,50 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Thu, 7 Oct 2021 17:25:05 +0200 +Subject: [PATCH 02/10] gen_stats: Add instead Set the value in + __gnet_stats_copy_basic(). + +Since day one __gnet_stats_copy_basic() always assigned the value to the +bstats argument overwriting the previous value. + +Based on review there are five users of that function as of today: +- est_fetch_counters(), ___gnet_stats_copy_basic() + memsets() bstats to zero, single invocation. + +- mq_dump(), mqprio_dump(), mqprio_dump_class_stats() + memsets() bstats to zero, multiple invocation but does not use the + function due to !qdisc_is_percpu_stats(). + +It will probably simplify in percpu stats case if the value would be +added and not just stored. + +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + net/core/gen_stats.c | 9 +++++++-- + 1 file changed, 7 insertions(+), 2 deletions(-) + +--- a/net/core/gen_stats.c ++++ b/net/core/gen_stats.c +@@ -143,6 +143,8 @@ void + struct gnet_stats_basic_packed *b) + { + unsigned int seq; ++ __u64 bytes = 0; ++ __u64 packets = 0; + + if (cpu) { + __gnet_stats_copy_basic_cpu(bstats, cpu); +@@ -151,9 +153,12 @@ void + do { + if (running) + seq = read_seqcount_begin(running); +- bstats->bytes = b->bytes; +- bstats->packets = b->packets; ++ bytes = b->bytes; ++ packets = b->packets; + } while (running && read_seqcount_retry(running, seq)); ++ ++ bstats->bytes += bytes; ++ bstats->packets += packets; + } + EXPORT_SYMBOL(__gnet_stats_copy_basic); + diff --git a/patches/0003_irq_work_allow_irq_work_sync_to_sleep_if_irq_work_no_irq_support.patch b/patches/0002_irq_work_allow_irq_work_sync_to_sleep_if_irq_work_no_irq_support.patch index 356d65d05c2a..edf47f3a11d6 100644 --- a/patches/0003_irq_work_allow_irq_work_sync_to_sleep_if_irq_work_no_irq_support.patch +++ b/patches/0002_irq_work_allow_irq_work_sync_to_sleep_if_irq_work_no_irq_support.patch @@ -1,6 +1,6 @@ From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Subject: irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support. -Date: Mon, 27 Sep 2021 23:19:17 +0200 +Date: Wed, 06 Oct 2021 13:18:50 +0200 irq_work() triggers instantly an interrupt if supported by the architecture. Otherwise the work will be processed on the next timer @@ -15,7 +15,7 @@ Let irq_work_sync() synchronize with rcuwait if the architecture processes irqwork via the timer tick. 
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20210927211919.310855-4-bigeasy@linutronix.de +Link: https://lore.kernel.org/r/20211006111852.1514359-3-bigeasy@linutronix.de --- include/linux/irq_work.h | 3 +++ kernel/irq_work.c | 10 ++++++++++ diff --git a/patches/0002_irq_work_ensure_that_irq_work_runs_in_in_irq_context.patch b/patches/0002_irq_work_ensure_that_irq_work_runs_in_in_irq_context.patch deleted file mode 100644 index 89d0266ff28c..000000000000 --- a/patches/0002_irq_work_ensure_that_irq_work_runs_in_in_irq_context.patch +++ /dev/null @@ -1,31 +0,0 @@ -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Subject: irq_work: Ensure that irq_work runs in in-IRQ context. -Date: Mon, 27 Sep 2021 23:19:16 +0200 - -The irq-work callback should be invoked in hardirq context and some -callbacks rely on this behaviour. At the time irq_work_run_list() -interrupts should be disabled but the important part is that the -callback is invoked from a in-IRQ context. -The "disabled interrupts" check can be satisfied by disabling interrupts -from a kworker which is not the intended context. - -Ensure that the callback is invoked from hardirq context and not just -with disabled interrupts. - -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20210927211919.310855-3-bigeasy@linutronix.de ---- - kernel/irq_work.c | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) - ---- a/kernel/irq_work.c -+++ b/kernel/irq_work.c -@@ -167,7 +167,7 @@ static void irq_work_run_list(struct lli - struct irq_work *work, *tmp; - struct llist_node *llnode; - -- BUG_ON(!irqs_disabled()); -+ BUG_ON(!in_hardirq()); - - if (llist_empty(list)) - return; diff --git a/patches/0003-drm-i915-Use-preempt_disable-enable_rt-where-recomme.patch b/patches/0003-drm-i915-Use-preempt_disable-enable_rt-where-recomme.patch new file mode 100644 index 000000000000..fa8699c3b14a --- /dev/null +++ b/patches/0003-drm-i915-Use-preempt_disable-enable_rt-where-recomme.patch @@ -0,0 +1,55 @@ +From: Mike Galbraith <umgwanakikbuti@gmail.com> +Date: Sat, 27 Feb 2016 08:09:11 +0100 +Subject: [PATCH 03/10] drm/i915: Use preempt_disable/enable_rt() where + recommended + +Mario Kleiner suggest in commit + ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms driver.") + +a spots where preemption should be disabled on PREEMPT_RT. The +difference is that on PREEMPT_RT the intel_uncore::lock disables neither +preemption nor interrupts and so region remains preemptible. + +The area covers only register reads and writes. The part that worries me +is: +- __intel_get_crtc_scanline() the worst case is 100us if no match is + found. + +- intel_crtc_scanlines_since_frame_timestamp() not sure how long this + may take in the worst case. + +It was in the RT queue for a while and nobody complained. +Disable preemption on PREEPMPT_RT during timestamping. + +[bigeasy: patch description.] 
+ +Cc: Mario Kleiner <mario.kleiner.de@gmail.com> +Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> +Signed-off-by: Thomas Gleixner <tglx@linutronix.de> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + drivers/gpu/drm/i915/i915_irq.c | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) + +--- a/drivers/gpu/drm/i915/i915_irq.c ++++ b/drivers/gpu/drm/i915/i915_irq.c +@@ -886,7 +886,8 @@ static bool i915_get_crtc_scanoutpos(str + */ + spin_lock_irqsave(&dev_priv->uncore.lock, irqflags); + +- /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */ ++ if (IS_ENABLED(CONFIG_PREEMPT_RT)) ++ preempt_disable(); + + /* Get optional system timestamp before query. */ + if (stime) +@@ -950,7 +951,8 @@ static bool i915_get_crtc_scanoutpos(str + if (etime) + *etime = ktime_get(); + +- /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */ ++ if (IS_ENABLED(CONFIG_PREEMPT_RT)) ++ preempt_enable(); + + spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags); + diff --git a/patches/0003-gen_stats-Add-instead-Set-the-value-in-__gnet_stats_.patch b/patches/0003-gen_stats-Add-instead-Set-the-value-in-__gnet_stats_.patch new file mode 100644 index 000000000000..14a833469171 --- /dev/null +++ b/patches/0003-gen_stats-Add-instead-Set-the-value-in-__gnet_stats_.patch @@ -0,0 +1,45 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Thu, 7 Oct 2021 18:40:24 +0200 +Subject: [PATCH 03/10] gen_stats: Add instead Set the value in + __gnet_stats_copy_queue(). + +Based on review there are five users of __gnet_stats_copy_queue as of +today: +- qdisc_qstats_qlen_backlog(), gnet_stats_copy_queue(), + memsets() bstats to zero, single invocation. + +- mq_dump(), mqprio_dump(), mqprio_dump_class_stats(), + memsets() bstats to zero, multiple invocation but does not use the + function due to !qdisc_is_percpu_stats(). + +It will probably simplify in percpu stats case if the value would be +added and not just stored. + +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + net/core/gen_stats.c | 12 ++++++------ + 1 file changed, 6 insertions(+), 6 deletions(-) + +--- a/net/core/gen_stats.c ++++ b/net/core/gen_stats.c +@@ -312,14 +312,14 @@ void __gnet_stats_copy_queue(struct gnet + if (cpu) { + __gnet_stats_copy_queue_cpu(qstats, cpu); + } else { +- qstats->qlen = q->qlen; +- qstats->backlog = q->backlog; +- qstats->drops = q->drops; +- qstats->requeues = q->requeues; +- qstats->overlimits = q->overlimits; ++ qstats->qlen += q->qlen; ++ qstats->backlog += q->backlog; ++ qstats->drops += q->drops; ++ qstats->requeues += q->requeues; ++ qstats->overlimits += q->overlimits; + } + +- qstats->qlen = qlen; ++ qstats->qlen += qlen; + } + EXPORT_SYMBOL(__gnet_stats_copy_queue); + diff --git a/patches/0003_irq_work_handle_some_irq_work_in_a_per_cpu_thread_on_preempt_rt.patch b/patches/0003_irq_work_handle_some_irq_work_in_a_per_cpu_thread_on_preempt_rt.patch new file mode 100644 index 000000000000..4ce667fb66c8 --- /dev/null +++ b/patches/0003_irq_work_handle_some_irq_work_in_a_per_cpu_thread_on_preempt_rt.patch @@ -0,0 +1,234 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Subject: irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT +Date: Wed, 06 Oct 2021 13:18:51 +0200 + +The irq_work callback is invoked in hard IRQ context. 
By default all +callbacks are scheduled for invocation right away (given supported by +the architecture) except for the ones marked IRQ_WORK_LAZY which are +delayed until the next timer-tick. + +While looking over the callbacks, some of them may acquire locks +(spinlock_t, rwlock_t) which are transformed into sleeping locks on +PREEMPT_RT and must not be acquired in hard IRQ context. +Changing the locks into locks which could be acquired in this context +will lead to other problems such as increased latencies if everything +in the chain has IRQ-off locks. This will not solve all the issues as +one callback has been noticed which invoked kref_put() and its callback +invokes kfree() and this can not be invoked in hardirq context. + +Some callbacks are required to be invoked in hardirq context even on +PREEMPT_RT to work properly. This includes for instance the NO_HZ +callback which needs to be able to observe the idle context. + +The callbacks which require to be run in hardirq have already been +marked. Use this information to split the callbacks onto the two lists +on PREEMPT_RT: +- lazy_list + Work items which are not marked with IRQ_WORK_HARD_IRQ will be added + to this list. Callbacks on this list will be invoked from a per-CPU + thread. + The handler here may acquire sleeping locks such as spinlock_t and + invoke kfree(). + +- raised_list + Work items which are marked with IRQ_WORK_HARD_IRQ will be added to + this list. They will be invoked in hardirq context and must not + acquire any sleeping locks. + +The wake up of the per-CPU thread occurs from irq_work handler/ +hardirq context. The thread runs with lowest RT priority to ensure it +runs before any SCHED_OTHER tasks do. + +[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a + hard and soft variant. Collected fixes over time from Steven + Rostedt and Mike Galbraith. Move to per-CPU threads instead of + softirq as suggested by PeterZ.] + +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Link: https://lore.kernel.org/r/20211007092646.uhshe3ut2wkrcfzv@linutronix.de +--- + kernel/irq_work.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++++------ + 1 file changed, 106 insertions(+), 12 deletions(-) + +--- a/kernel/irq_work.c ++++ b/kernel/irq_work.c +@@ -18,11 +18,36 @@ + #include <linux/cpu.h> + #include <linux/notifier.h> + #include <linux/smp.h> ++#include <linux/smpboot.h> + #include <asm/processor.h> + #include <linux/kasan.h> + + static DEFINE_PER_CPU(struct llist_head, raised_list); + static DEFINE_PER_CPU(struct llist_head, lazy_list); ++static DEFINE_PER_CPU(struct task_struct *, irq_workd); ++ ++static void wake_irq_workd(void) ++{ ++ struct task_struct *tsk = __this_cpu_read(irq_workd); ++ ++ if (!llist_empty(this_cpu_ptr(&lazy_list)) && tsk) ++ wake_up_process(tsk); ++} ++ ++#ifdef CONFIG_SMP ++static void irq_work_wake(struct irq_work *entry) ++{ ++ wake_irq_workd(); ++} ++ ++static DEFINE_PER_CPU(struct irq_work, irq_work_wakeup) = ++ IRQ_WORK_INIT_HARD(irq_work_wake); ++#endif ++ ++static int irq_workd_should_run(unsigned int cpu) ++{ ++ return !llist_empty(this_cpu_ptr(&lazy_list)); ++} + + /* + * Claim the entry so that no one else will poke at it. 
+@@ -52,15 +77,29 @@ void __weak arch_irq_work_raise(void) + /* Enqueue on current CPU, work must already be claimed and preempt disabled */ + static void __irq_work_queue_local(struct irq_work *work) + { ++ struct llist_head *list; ++ bool rt_lazy_work = false; ++ bool lazy_work = false; ++ int work_flags; ++ ++ work_flags = atomic_read(&work->node.a_flags); ++ if (work_flags & IRQ_WORK_LAZY) ++ lazy_work = true; ++ else if (IS_ENABLED(CONFIG_PREEMPT_RT) && ++ !(work_flags & IRQ_WORK_HARD_IRQ)) ++ rt_lazy_work = true; ++ ++ if (lazy_work || rt_lazy_work) ++ list = this_cpu_ptr(&lazy_list); ++ else ++ list = this_cpu_ptr(&raised_list); ++ ++ if (!llist_add(&work->node.llist, list)) ++ return; ++ + /* If the work is "lazy", handle it from next tick if any */ +- if (atomic_read(&work->node.a_flags) & IRQ_WORK_LAZY) { +- if (llist_add(&work->node.llist, this_cpu_ptr(&lazy_list)) && +- tick_nohz_tick_stopped()) +- arch_irq_work_raise(); +- } else { +- if (llist_add(&work->node.llist, this_cpu_ptr(&raised_list))) +- arch_irq_work_raise(); +- } ++ if (!lazy_work || tick_nohz_tick_stopped()) ++ arch_irq_work_raise(); + } + + /* Enqueue the irq work @work on the current CPU */ +@@ -104,17 +143,34 @@ bool irq_work_queue_on(struct irq_work * + if (cpu != smp_processor_id()) { + /* Arch remote IPI send/receive backend aren't NMI safe */ + WARN_ON_ONCE(in_nmi()); ++ ++ /* ++ * On PREEMPT_RT the items which are not marked as ++ * IRQ_WORK_HARD_IRQ are added to the lazy list and a HARD work ++ * item is used on the remote CPU to wake the thread. ++ */ ++ if (IS_ENABLED(CONFIG_PREEMPT_RT) && ++ !(atomic_read(&work->node.a_flags) & IRQ_WORK_HARD_IRQ)) { ++ ++ if (!llist_add(&work->node.llist, &per_cpu(lazy_list, cpu))) ++ goto out; ++ ++ work = &per_cpu(irq_work_wakeup, cpu); ++ if (!irq_work_claim(work)) ++ goto out; ++ } ++ + __smp_call_single_queue(cpu, &work->node.llist); + } else { + __irq_work_queue_local(work); + } ++out: + preempt_enable(); + + return true; + #endif /* CONFIG_SMP */ + } + +- + bool irq_work_needs_cpu(void) + { + struct llist_head *raised, *lazy; +@@ -170,7 +226,12 @@ static void irq_work_run_list(struct lli + struct irq_work *work, *tmp; + struct llist_node *llnode; + +- BUG_ON(!irqs_disabled()); ++ /* ++ * On PREEMPT_RT IRQ-work which is not marked as HARD will be processed ++ * in a per-CPU thread in preemptible context. Only the items which are ++ * marked as IRQ_WORK_HARD_IRQ will be processed in hardirq context. 
++ */ ++ BUG_ON(!irqs_disabled() && !IS_ENABLED(CONFIG_PREEMPT_RT)); + + if (llist_empty(list)) + return; +@@ -187,7 +248,10 @@ static void irq_work_run_list(struct lli + void irq_work_run(void) + { + irq_work_run_list(this_cpu_ptr(&raised_list)); +- irq_work_run_list(this_cpu_ptr(&lazy_list)); ++ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) ++ irq_work_run_list(this_cpu_ptr(&lazy_list)); ++ else ++ wake_irq_workd(); + } + EXPORT_SYMBOL_GPL(irq_work_run); + +@@ -197,7 +261,11 @@ void irq_work_tick(void) + + if (!llist_empty(raised) && !arch_irq_work_has_interrupt()) + irq_work_run_list(raised); +- irq_work_run_list(this_cpu_ptr(&lazy_list)); ++ ++ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) ++ irq_work_run_list(this_cpu_ptr(&lazy_list)); ++ else ++ wake_irq_workd(); + } + + /* +@@ -219,3 +287,29 @@ void irq_work_sync(struct irq_work *work + cpu_relax(); + } + EXPORT_SYMBOL_GPL(irq_work_sync); ++ ++static void run_irq_workd(unsigned int cpu) ++{ ++ irq_work_run_list(this_cpu_ptr(&lazy_list)); ++} ++ ++static void irq_workd_setup(unsigned int cpu) ++{ ++ sched_set_fifo_low(current); ++} ++ ++static struct smp_hotplug_thread irqwork_threads = { ++ .store = &irq_workd, ++ .setup = irq_workd_setup, ++ .thread_should_run = irq_workd_should_run, ++ .thread_fn = run_irq_workd, ++ .thread_comm = "irq_work/%u", ++}; ++ ++static __init int irq_work_init_threads(void) ++{ ++ if (IS_ENABLED(CONFIG_PREEMPT_RT)) ++ BUG_ON(smpboot_register_percpu_thread(&irqwork_threads)); ++ return 0; ++} ++early_initcall(irq_work_init_threads); diff --git a/patches/drm_i915__Dont_disable_interrupts_on_PREEMPT_RT_during_atomic_updates.patch b/patches/0004-drm-i915-Don-t-disable-interrupts-on-PREEMPT_RT-duri.patch index c73d560af48f..61f6fe7a4aa9 100644 --- a/patches/drm_i915__Dont_disable_interrupts_on_PREEMPT_RT_during_atomic_updates.patch +++ b/patches/0004-drm-i915-Don-t-disable-interrupts-on-PREEMPT_RT-duri.patch @@ -1,8 +1,7 @@ -Subject: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates -From: Mike Galbraith <umgwanakikbuti@gmail.com> -Date: Sat Feb 27 09:01:42 2016 +0100 - From: Mike Galbraith <umgwanakikbuti@gmail.com> +Date: Sat, 27 Feb 2016 09:01:42 +0100 +Subject: [PATCH 04/10] drm/i915: Don't disable interrupts on PREEMPT_RT during + atomic updates Commit 8d7849db3eab7 ("drm/i915: Make sprite updates atomic") @@ -13,6 +12,17 @@ are sleeping locks on PREEMPT_RT. According to the comment the interrupts are disabled to avoid random delays and not required for protection or synchronisation. +If this needs to happen with disabled interrupts on PREEMPT_RT, and the +whole section is restricted to register access then all sleeping locks +need to be acquired before interrupts are disabled and some function +maybe moved after enabling interrupts again. +This includes: +- prepare_to_wait() + finish_wait() due its wake queue. +- drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock. +- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and + maybe others due to intel_uncore::lock +- drm_crtc_arm_vblank_event() due to drm_device::event_lock and + drm_device::vblank_time_lock. Don't disable interrupts on PREEMPT_RT during atomic updates. @@ -20,13 +30,10 @@ Don't disable interrupts on PREEMPT_RT during atomic updates. 
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Signed-off-by: Thomas Gleixner <tglx@linutronix.de> - - --- drivers/gpu/drm/i915/display/intel_crtc.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) ---- + --- a/drivers/gpu/drm/i915/display/intel_crtc.c +++ b/drivers/gpu/drm/i915/display/intel_crtc.c @@ -425,7 +425,8 @@ void intel_pipe_update_start(const struc diff --git a/patches/0004-mq-mqprio-Simplify-stats-copy.patch b/patches/0004-mq-mqprio-Simplify-stats-copy.patch new file mode 100644 index 000000000000..3d00d58e6c14 --- /dev/null +++ b/patches/0004-mq-mqprio-Simplify-stats-copy.patch @@ -0,0 +1,127 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Thu, 7 Oct 2021 18:53:41 +0200 +Subject: [PATCH 04/10] mq, mqprio: Simplify stats copy. + +__gnet_stats_copy_basic() and __gnet_stats_copy_queue() update the +statistics and don't overwritte them for both: global and per-CPU +statistics. + +Simplify the code by removing the else case. + +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + net/sched/sch_mq.c | 27 +++++++----------------- + net/sched/sch_mqprio.c | 55 +++++++++++++++---------------------------------- + 2 files changed, 25 insertions(+), 57 deletions(-) + +--- a/net/sched/sch_mq.c ++++ b/net/sched/sch_mq.c +@@ -145,26 +145,15 @@ static int mq_dump(struct Qdisc *sch, st + qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping; + spin_lock_bh(qdisc_lock(qdisc)); + +- if (qdisc_is_percpu_stats(qdisc)) { +- qlen = qdisc_qlen_sum(qdisc); +- __gnet_stats_copy_basic(NULL, &sch->bstats, +- qdisc->cpu_bstats, +- &qdisc->bstats); +- __gnet_stats_copy_queue(&sch->qstats, +- qdisc->cpu_qstats, +- &qdisc->qstats, qlen); +- sch->q.qlen += qlen; +- } else { +- sch->q.qlen += qdisc->q.qlen; +- sch->bstats.bytes += qdisc->bstats.bytes; +- sch->bstats.packets += qdisc->bstats.packets; +- sch->qstats.qlen += qdisc->qstats.qlen; +- sch->qstats.backlog += qdisc->qstats.backlog; +- sch->qstats.drops += qdisc->qstats.drops; +- sch->qstats.requeues += qdisc->qstats.requeues; +- sch->qstats.overlimits += qdisc->qstats.overlimits; +- } ++ qlen = qdisc_qlen_sum(qdisc); + ++ __gnet_stats_copy_basic(NULL, &sch->bstats, ++ qdisc->cpu_bstats, ++ &qdisc->bstats); ++ __gnet_stats_copy_queue(&sch->qstats, ++ qdisc->cpu_qstats, ++ &qdisc->qstats, qlen); ++ sch->q.qlen += qlen; + spin_unlock_bh(qdisc_lock(qdisc)); + } + +--- a/net/sched/sch_mqprio.c ++++ b/net/sched/sch_mqprio.c +@@ -399,28 +399,18 @@ static int mqprio_dump(struct Qdisc *sch + * qdisc totals are added at end. 
+ */ + for (ntx = 0; ntx < dev->num_tx_queues; ntx++) { ++ u32 qlen = qdisc_qlen_sum(qdisc); ++ + qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping; + spin_lock_bh(qdisc_lock(qdisc)); + +- if (qdisc_is_percpu_stats(qdisc)) { +- __u32 qlen = qdisc_qlen_sum(qdisc); +- +- __gnet_stats_copy_basic(NULL, &sch->bstats, +- qdisc->cpu_bstats, +- &qdisc->bstats); +- __gnet_stats_copy_queue(&sch->qstats, +- qdisc->cpu_qstats, +- &qdisc->qstats, qlen); +- sch->q.qlen += qlen; +- } else { +- sch->q.qlen += qdisc->q.qlen; +- sch->bstats.bytes += qdisc->bstats.bytes; +- sch->bstats.packets += qdisc->bstats.packets; +- sch->qstats.backlog += qdisc->qstats.backlog; +- sch->qstats.drops += qdisc->qstats.drops; +- sch->qstats.requeues += qdisc->qstats.requeues; +- sch->qstats.overlimits += qdisc->qstats.overlimits; +- } ++ __gnet_stats_copy_basic(NULL, &sch->bstats, ++ qdisc->cpu_bstats, ++ &qdisc->bstats); ++ __gnet_stats_copy_queue(&sch->qstats, ++ qdisc->cpu_qstats, ++ &qdisc->qstats, qlen); ++ sch->q.qlen += qlen; + + spin_unlock_bh(qdisc_lock(qdisc)); + } +@@ -532,25 +522,14 @@ static int mqprio_dump_class_stats(struc + + spin_lock_bh(qdisc_lock(qdisc)); + +- if (qdisc_is_percpu_stats(qdisc)) { +- qlen = qdisc_qlen_sum(qdisc); +- +- __gnet_stats_copy_basic(NULL, &bstats, +- qdisc->cpu_bstats, +- &qdisc->bstats); +- __gnet_stats_copy_queue(&qstats, +- qdisc->cpu_qstats, +- &qdisc->qstats, +- qlen); +- } else { +- qlen += qdisc->q.qlen; +- bstats.bytes += qdisc->bstats.bytes; +- bstats.packets += qdisc->bstats.packets; +- qstats.backlog += qdisc->qstats.backlog; +- qstats.drops += qdisc->qstats.drops; +- qstats.requeues += qdisc->qstats.requeues; +- qstats.overlimits += qdisc->qstats.overlimits; +- } ++ qlen = qdisc_qlen_sum(qdisc); ++ __gnet_stats_copy_basic(NULL, &bstats, ++ qdisc->cpu_bstats, ++ &qdisc->bstats); ++ __gnet_stats_copy_queue(&qstats, ++ qdisc->cpu_qstats, ++ &qdisc->qstats, ++ qlen); + spin_unlock_bh(qdisc_lock(qdisc)); + } + diff --git a/patches/0005_irq_work_also_rcuwait_for_irq_work_hard_irq_on_preempt_rt.patch b/patches/0004_irq_work_also_rcuwait_for_irq_work_hard_irq_on_preempt_rt.patch index 7eb2665d5042..c0bde89fb628 100644 --- a/patches/0005_irq_work_also_rcuwait_for_irq_work_hard_irq_on_preempt_rt.patch +++ b/patches/0004_irq_work_also_rcuwait_for_irq_work_hard_irq_on_preempt_rt.patch @@ -1,6 +1,6 @@ From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Subject: irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT -Date: Mon, 27 Sep 2021 23:19:19 +0200 +Date: Wed, 06 Oct 2021 13:18:52 +0200 On PREEMPT_RT most items are processed as LAZY via softirq context. Avoid to spin-wait for them because irq_work_sync() could have higher @@ -9,7 +9,7 @@ priority and not allow the irq-work to be completed. Wait additionally for !IRQ_WORK_HARD_IRQ irq_work items on PREEMPT_RT. 
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20210927211919.310855-6-bigeasy@linutronix.de +Link: https://lore.kernel.org/r/20211006111852.1514359-5-bigeasy@linutronix.de --- include/linux/irq_work.h | 5 +++++ kernel/irq_work.c | 6 ++++-- @@ -31,7 +31,7 @@ Link: https://lore.kernel.org/r/20210927211919.310855-6-bigeasy@linutronix.de --- a/kernel/irq_work.c +++ b/kernel/irq_work.c -@@ -181,7 +181,8 @@ void irq_work_single(void *arg) +@@ -217,7 +217,8 @@ void irq_work_single(void *arg) */ (void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY); @@ -41,7 +41,7 @@ Link: https://lore.kernel.org/r/20210927211919.310855-6-bigeasy@linutronix.de rcuwait_wake_up(&work->irqwait); } -@@ -245,7 +246,8 @@ void irq_work_sync(struct irq_work *work +@@ -277,7 +278,8 @@ void irq_work_sync(struct irq_work *work lockdep_assert_irqs_enabled(); might_sleep(); diff --git a/patches/0004_irq_work_handle_some_irq_work_in_softirq_on_preempt_rt.patch b/patches/0004_irq_work_handle_some_irq_work_in_softirq_on_preempt_rt.patch deleted file mode 100644 index 4c80f1413cc2..000000000000 --- a/patches/0004_irq_work_handle_some_irq_work_in_softirq_on_preempt_rt.patch +++ /dev/null @@ -1,183 +0,0 @@ -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Subject: irq_work: Handle some irq_work in SOFTIRQ on PREEMPT_RT -Date: Mon, 27 Sep 2021 23:19:18 +0200 - -The irq_work callback is invoked in hard IRQ context. By default all -callbacks are scheduled for invocation right away (given supported by -the architecture) except for the ones marked IRQ_WORK_LAZY which are -delayed until the next timer-tick. - -While looking over the callbacks, some of them may acquire locks -(spinlock_t, rwlock_t) which are transformed into sleeping locks on -PREEMPT_RT and must not be acquired in hard IRQ context. -Changing the locks into locks which could be acquired in this context -will lead to other problems such as increased latencies if everything -in the chain has IRQ-off locks. This will not solve all the issues as -one callback has been noticed which invoked kref_put() and its callback -invokes kfree() and this can not be invoked in hardirq context. - -Some callbacks are required to be invoked in hardirq context even on -PREEMPT_RT to work properly. This includes for instance the NO_HZ -callback which needs to be able to observe the idle context. - -The callbacks which require to be run in hardirq have already been -marked. Use this information to split the callbacks onto the two lists -on PREEMPT_RT: -- lazy_list - Work items which are not marked with IRQ_WORK_HARD_IRQ will be added - to this list. Callbacks on this list will be invoked from timer - softirq handler. The handler here may acquire sleeping locks such as - spinlock_t and invoke kfree(). - -- raised_list - Work items which are marked with IRQ_WORK_HARD_IRQ will be added to - this list. They will be invoked in hardirq context and must not - acquire any sleeping locks. - -[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a - hard and soft variant. Collected fixes over time from Steven - Rostedt and Mike Galbraith. 
] - -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20210927211919.310855-5-bigeasy@linutronix.de ---- - include/linux/irq_work.h | 6 ++++ - kernel/irq_work.c | 58 ++++++++++++++++++++++++++++++++++++++--------- - kernel/time/timer.c | 2 + - 3 files changed, 55 insertions(+), 11 deletions(-) - ---- a/include/linux/irq_work.h -+++ b/include/linux/irq_work.h -@@ -67,4 +67,10 @@ static inline void irq_work_run(void) { - static inline void irq_work_single(void *arg) { } - #endif - -+#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT) -+void irq_work_tick_soft(void); -+#else -+static inline void irq_work_tick_soft(void) { } -+#endif -+ - #endif /* _LINUX_IRQ_WORK_H */ ---- a/kernel/irq_work.c -+++ b/kernel/irq_work.c -@@ -18,6 +18,7 @@ - #include <linux/cpu.h> - #include <linux/notifier.h> - #include <linux/smp.h> -+#include <linux/interrupt.h> - #include <asm/processor.h> - #include <linux/kasan.h> - -@@ -52,13 +53,27 @@ void __weak arch_irq_work_raise(void) - /* Enqueue on current CPU, work must already be claimed and preempt disabled */ - static void __irq_work_queue_local(struct irq_work *work) - { -- /* If the work is "lazy", handle it from next tick if any */ -- if (atomic_read(&work->node.a_flags) & IRQ_WORK_LAZY) { -- if (llist_add(&work->node.llist, this_cpu_ptr(&lazy_list)) && -- tick_nohz_tick_stopped()) -- arch_irq_work_raise(); -- } else { -- if (llist_add(&work->node.llist, this_cpu_ptr(&raised_list))) -+ struct llist_head *list; -+ bool lazy_work; -+ int work_flags; -+ -+ work_flags = atomic_read(&work->node.a_flags); -+ if (work_flags & IRQ_WORK_LAZY) -+ lazy_work = true; -+ else if (IS_ENABLED(CONFIG_PREEMPT_RT) && -+ !(work_flags & IRQ_WORK_HARD_IRQ)) -+ lazy_work = true; -+ else -+ lazy_work = false; -+ -+ if (lazy_work) -+ list = this_cpu_ptr(&lazy_list); -+ else -+ list = this_cpu_ptr(&raised_list); -+ -+ if (llist_add(&work->node.llist, list)) { -+ /* If the work is "lazy", handle it from next tick if any */ -+ if (!lazy_work || tick_nohz_tick_stopped()) - arch_irq_work_raise(); - } - } -@@ -104,7 +119,13 @@ bool irq_work_queue_on(struct irq_work * - if (cpu != smp_processor_id()) { - /* Arch remote IPI send/receive backend aren't NMI safe */ - WARN_ON_ONCE(in_nmi()); -- __smp_call_single_queue(cpu, &work->node.llist); -+ -+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(atomic_read(&work->node.a_flags) & IRQ_WORK_HARD_IRQ)) { -+ if (llist_add(&work->node.llist, &per_cpu(lazy_list, cpu))) -+ arch_send_call_function_single_ipi(cpu); -+ } else { -+ __smp_call_single_queue(cpu, &work->node.llist); -+ } - } else { - __irq_work_queue_local(work); - } -@@ -121,7 +142,6 @@ bool irq_work_needs_cpu(void) - - raised = this_cpu_ptr(&raised_list); - lazy = this_cpu_ptr(&lazy_list); -- - if (llist_empty(raised) || arch_irq_work_has_interrupt()) - if (llist_empty(lazy)) - return false; -@@ -170,7 +190,11 @@ static void irq_work_run_list(struct lli - struct irq_work *work, *tmp; - struct llist_node *llnode; - -- BUG_ON(!in_hardirq()); -+ /* -+ * On PREEMPT_RT IRQ-work may run in SOFTIRQ context if it is not marked -+ * explicitly that it needs to run in hardirq context. 
-+ */ -+ BUG_ON(!in_hardirq() && !IS_ENABLED(CONFIG_PREEMPT_RT)); - - if (llist_empty(list)) - return; -@@ -187,7 +211,10 @@ static void irq_work_run_list(struct lli - void irq_work_run(void) - { - irq_work_run_list(this_cpu_ptr(&raised_list)); -- irq_work_run_list(this_cpu_ptr(&lazy_list)); -+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) -+ irq_work_run_list(this_cpu_ptr(&lazy_list)); -+ else if (!llist_empty(this_cpu_ptr(&lazy_list))) -+ raise_softirq(TIMER_SOFTIRQ); - } - EXPORT_SYMBOL_GPL(irq_work_run); - -@@ -197,8 +224,17 @@ void irq_work_tick(void) - - if (!llist_empty(raised) && !arch_irq_work_has_interrupt()) - irq_work_run_list(raised); -+ -+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) -+ irq_work_run_list(this_cpu_ptr(&lazy_list)); -+} -+ -+#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT) -+void irq_work_tick_soft(void) -+{ - irq_work_run_list(this_cpu_ptr(&lazy_list)); - } -+#endif - - /* - * Synchronize against the irq_work @entry, ensures the entry is not ---- a/kernel/time/timer.c -+++ b/kernel/time/timer.c -@@ -1744,6 +1744,8 @@ static __latent_entropy void run_timer_s - { - struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]); - -+ irq_work_tick_soft(); -+ - __run_timers(base); - if (IS_ENABLED(CONFIG_NO_HZ_COMMON)) - __run_timers(this_cpu_ptr(&timer_bases[BASE_DEF])); diff --git a/patches/drm_i915__disable_tracing_on_-RT.patch b/patches/0005-drm-i915-Disable-tracing-points-on-PREEMPT_RT.patch index cc3d17c6c6ee..b53679b5e75a 100644 --- a/patches/drm_i915__disable_tracing_on_-RT.patch +++ b/patches/0005-drm-i915-Disable-tracing-points-on-PREEMPT_RT.patch @@ -1,8 +1,6 @@ -Subject: drm/i915: disable tracing on -RT -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Thu Dec 6 09:52:20 2018 +0100 - From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Thu, 6 Dec 2018 09:52:20 +0100 +Subject: [PATCH 05/10] drm/i915: Disable tracing points on PREEMPT_RT Luca Abeni reported this: | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003 @@ -14,22 +12,23 @@ Luca Abeni reported this: | trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915] The tracing events use trace_i915_pipe_update_start() among other events -use functions acquire spin locks. A few trace points use +use functions acquire spinlock_t locks which are transformed into +sleeping locks on PREEMPT_RT. A few trace points use intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also -might acquire a sleeping lock. +might acquire a sleeping locks on PREEMPT_RT. +At the time the arguments are evaluated within trace point, preemption +is disabled and so the locks must not be acquired on PREEMPT_RT. -Based on this I don't see any other way than disable trace points on RT. +Based on this I don't see any other way than disable trace points on +PREMPT_RT. -Cc: stable-rt@vger.kernel.org Reported-by: Luca Abeni <lucabe72@gmail.com> +Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Signed-off-by: Thomas Gleixner <tglx@linutronix.de> - - --- drivers/gpu/drm/i915/i915_trace.h | 4 ++++ 1 file changed, 4 insertions(+) ---- + --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -2,6 +2,10 @@ diff --git a/patches/0005-u64_stats-Introduce-u64_stats_set.patch b/patches/0005-u64_stats-Introduce-u64_stats_set.patch new file mode 100644 index 000000000000..ff39a6f8e2c9 --- /dev/null +++ b/patches/0005-u64_stats-Introduce-u64_stats_set.patch @@ -0,0 +1,43 @@ +From: "Ahmed S. 
Darwish" <a.darwish@linutronix.de> +Date: Fri, 17 Sep 2021 13:31:37 +0200 +Subject: [PATCH 05/10] u64_stats: Introduce u64_stats_set() + +Allow to directly set a u64_stats_t value which is used to provide an init +function which sets it directly to zero intead of memset() the value. + +Add u64_stats_set() to the u64_stats API. + +[bigeasy: commit message. ] + +Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + include/linux/u64_stats_sync.h | 10 ++++++++++ + 1 file changed, 10 insertions(+) + +--- a/include/linux/u64_stats_sync.h ++++ b/include/linux/u64_stats_sync.h +@@ -83,6 +83,11 @@ static inline u64 u64_stats_read(const u + return local64_read(&p->v); + } + ++static inline void u64_stats_set(u64_stats_t *p, u64 val) ++{ ++ local64_set(&p->v, val); ++} ++ + static inline void u64_stats_add(u64_stats_t *p, unsigned long val) + { + local64_add(val, &p->v); +@@ -104,6 +109,11 @@ static inline u64 u64_stats_read(const u + return p->v; + } + ++static inline void u64_stats_set(u64_stats_t *p, u64 val) ++{ ++ p->v = val; ++} ++ + static inline void u64_stats_add(u64_stats_t *p, unsigned long val) + { + p->v += val; diff --git a/patches/drm_i915__skip_DRM_I915_LOW_LEVEL_TRACEPOINTS_with_NOTRACE.patch b/patches/0006-drm-i915-skip-DRM_I915_LOW_LEVEL_TRACEPOINTS-with-NO.patch index 111d12ca2f85..d014fd161968 100644 --- a/patches/drm_i915__skip_DRM_I915_LOW_LEVEL_TRACEPOINTS_with_NOTRACE.patch +++ b/patches/0006-drm-i915-skip-DRM_I915_LOW_LEVEL_TRACEPOINTS-with-NO.patch @@ -1,22 +1,20 @@ -Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Wed Dec 19 10:47:02 2018 +0100 - From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Wed, 19 Dec 2018 10:47:02 +0100 +Subject: [PATCH 06/10] drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with + NOTRACE The order of the header files is important. If this header file is included after tracepoint.h was included then the NOTRACE here becomes a nop. Currently this happens for two .c files which use the tracepoitns behind DRM_I915_LOW_LEVEL_TRACEPOINTS. +Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> - - --- drivers/gpu/drm/i915/i915_trace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) ---- + --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -826,7 +826,7 @@ DEFINE_EVENT(i915_request, i915_request_ diff --git a/patches/0006-net-sched-Protect-Qdisc-bstats-with-u64_stats.patch b/patches/0006-net-sched-Protect-Qdisc-bstats-with-u64_stats.patch new file mode 100644 index 000000000000..17a95078180e --- /dev/null +++ b/patches/0006-net-sched-Protect-Qdisc-bstats-with-u64_stats.patch @@ -0,0 +1,323 @@ +From: "Ahmed S. Darwish" <a.darwish@linutronix.de> +Date: Fri, 17 Sep 2021 13:31:38 +0200 +Subject: [PATCH 06/10] net: sched: Protect Qdisc::bstats with u64_stats + +The not-per-CPU variant of qdisc tc (traffic control) statistics, +Qdisc::gnet_stats_basic_packed bstats, is protected with Qdisc::running +sequence counter. + +This sequence counter is used for reliably protecting bstats reads from +parallel writes. Meanwhile, the seqcount's write section covers a much +wider area than bstats update: qdisc_run_begin() => qdisc_run_end(). + +That read/write section asymmetry can lead to needless retries of the +read section. 
To prepare for removing the Qdisc::running sequence +counter altogether, introduce a u64_stats sync point inside bstats +instead. + +Modify _bstats_update() to start/end the bstats u64_stats write +section. Introduce _bstats_set(); it is now needed since raw writes done +within the bigger qdisc_run_begin/end() section need a helper for +starting/ending the u64_stats write section. + +For bisectability, and finer commits granularity, the bstats read +section is still protected with a Qdisc::running read/retry loop and +qdisc_run_begin/end() still starts/ends that seqcount write section. +Once all call sites are modified to use _bstats_set/update(), the +Qdisc::running seqcount will be removed and bstats read/retry loop will +be modified to utilize the internal u64_stats sync point. + +Note, using u64_stats implies no sequence counter protection for 64-bit +architectures. This can lead to the statistics "packets" vs. "bytes" +values getting out of sync on rare occasions. The individual values will +still be valid. + +[bigeasy: Minor commit message edits, init all gnet_stats_basic_packed.] + +Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + include/net/gen_stats.h | 14 ++++++++++++++ + include/net/sch_generic.h | 2 ++ + net/core/gen_estimator.c | 2 +- + net/core/gen_stats.c | 16 ++++++++++++++-- + net/netfilter/xt_RATEEST.c | 1 + + net/sched/act_api.c | 2 ++ + net/sched/sch_atm.c | 1 + + net/sched/sch_cbq.c | 1 + + net/sched/sch_drr.c | 1 + + net/sched/sch_ets.c | 1 + + net/sched/sch_generic.c | 1 + + net/sched/sch_gred.c | 4 +++- + net/sched/sch_hfsc.c | 1 + + net/sched/sch_htb.c | 7 +++++-- + net/sched/sch_mq.c | 2 +- + net/sched/sch_mqprio.c | 5 +++-- + net/sched/sch_qfq.c | 1 + + 17 files changed, 53 insertions(+), 9 deletions(-) + +--- a/include/net/gen_stats.h ++++ b/include/net/gen_stats.h +@@ -11,6 +11,7 @@ + struct gnet_stats_basic_packed { + __u64 bytes; + __u64 packets; ++ struct u64_stats_sync syncp; + }; + + struct gnet_stats_basic_cpu { +@@ -18,6 +19,19 @@ struct gnet_stats_basic_cpu { + struct u64_stats_sync syncp; + } __aligned(2 * sizeof(u64)); + ++#ifdef CONFIG_LOCKDEP ++void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b); ++ ++#else ++ ++static inline void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b) ++{ ++ b->bytes = 0; ++ b->packets = 0; ++ u64_stats_init(&b->syncp); ++} ++#endif ++ + struct net_rate_estimator; + + struct gnet_dump { +--- a/include/net/sch_generic.h ++++ b/include/net/sch_generic.h +@@ -848,8 +848,10 @@ static inline int qdisc_enqueue(struct s + static inline void _bstats_update(struct gnet_stats_basic_packed *bstats, + __u64 bytes, __u32 packets) + { ++ u64_stats_update_begin(&bstats->syncp); + bstats->bytes += bytes; + bstats->packets += packets; ++ u64_stats_update_end(&bstats->syncp); + } + + static inline void bstats_update(struct gnet_stats_basic_packed *bstats, +--- a/net/core/gen_estimator.c ++++ b/net/core/gen_estimator.c +@@ -62,7 +62,7 @@ struct net_rate_estimator { + static void est_fetch_counters(struct net_rate_estimator *e, + struct gnet_stats_basic_packed *b) + { +- memset(b, 0, sizeof(*b)); ++ gnet_stats_basic_packed_init(b); + if (e->stats_lock) + spin_lock(e->stats_lock); + +--- a/net/core/gen_stats.c ++++ b/net/core/gen_stats.c +@@ -18,7 +18,7 @@ + #include <linux/gen_stats.h> + #include <net/netlink.h> + #include <net/gen_stats.h> +- ++#include <net/sch_generic.h> + + static inline int + gnet_stats_copy(struct 
gnet_dump *d, int type, void *buf, int size, int padattr) +@@ -114,6 +114,17 @@ gnet_stats_start_copy(struct sk_buff *sk + } + EXPORT_SYMBOL(gnet_stats_start_copy); + ++#ifdef CONFIG_LOCKDEP ++/* Must not be inlined, due to u64_stats seqcount_t lockdep key */ ++void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b) ++{ ++ b->bytes = 0; ++ b->packets = 0; ++ u64_stats_init(&b->syncp); ++} ++EXPORT_SYMBOL(gnet_stats_basic_packed_init); ++#endif ++ + static void + __gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats, + struct gnet_stats_basic_cpu __percpu *cpu) +@@ -169,8 +180,9 @@ static int + struct gnet_stats_basic_packed *b, + int type) + { +- struct gnet_stats_basic_packed bstats = {0}; ++ struct gnet_stats_basic_packed bstats; + ++ gnet_stats_basic_packed_init(&bstats); + __gnet_stats_copy_basic(running, &bstats, cpu, b); + + if (d->compat_tc_stats && type == TCA_STATS_BASIC) { +--- a/net/netfilter/xt_RATEEST.c ++++ b/net/netfilter/xt_RATEEST.c +@@ -143,6 +143,7 @@ static int xt_rateest_tg_checkentry(cons + if (!est) + goto err1; + ++ gnet_stats_basic_packed_init(&est->bstats); + strlcpy(est->name, info->name, sizeof(est->name)); + spin_lock_init(&est->lock); + est->refcnt = 1; +--- a/net/sched/act_api.c ++++ b/net/sched/act_api.c +@@ -490,6 +490,8 @@ int tcf_idr_create(struct tc_action_net + if (!p->cpu_qstats) + goto err3; + } ++ gnet_stats_basic_packed_init(&p->tcfa_bstats); ++ gnet_stats_basic_packed_init(&p->tcfa_bstats_hw); + spin_lock_init(&p->tcfa_lock); + p->tcfa_index = index; + p->tcfa_tm.install = jiffies; +--- a/net/sched/sch_atm.c ++++ b/net/sched/sch_atm.c +@@ -548,6 +548,7 @@ static int atm_tc_init(struct Qdisc *sch + pr_debug("atm_tc_init(sch %p,[qdisc %p],opt %p)\n", sch, p, opt); + INIT_LIST_HEAD(&p->flows); + INIT_LIST_HEAD(&p->link.list); ++ gnet_stats_basic_packed_init(&p->link.bstats); + list_add(&p->link.list, &p->flows); + p->link.q = qdisc_create_dflt(sch->dev_queue, + &pfifo_qdisc_ops, sch->handle, extack); +--- a/net/sched/sch_cbq.c ++++ b/net/sched/sch_cbq.c +@@ -1611,6 +1611,7 @@ cbq_change_class(struct Qdisc *sch, u32 + if (cl == NULL) + goto failure; + ++ gnet_stats_basic_packed_init(&cl->bstats); + err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack); + if (err) { + kfree(cl); +--- a/net/sched/sch_drr.c ++++ b/net/sched/sch_drr.c +@@ -106,6 +106,7 @@ static int drr_change_class(struct Qdisc + if (cl == NULL) + return -ENOBUFS; + ++ gnet_stats_basic_packed_init(&cl->bstats); + cl->common.classid = classid; + cl->quantum = quantum; + cl->qdisc = qdisc_create_dflt(sch->dev_queue, +--- a/net/sched/sch_ets.c ++++ b/net/sched/sch_ets.c +@@ -662,6 +662,7 @@ static int ets_qdisc_change(struct Qdisc + q->nbands = nbands; + for (i = nstrict; i < q->nstrict; i++) { + INIT_LIST_HEAD(&q->classes[i].alist); ++ gnet_stats_basic_packed_init(&q->classes[i].bstats); + if (q->classes[i].qdisc->q.qlen) { + list_add_tail(&q->classes[i].alist, &q->active); + q->classes[i].deficit = quanta[i]; +--- a/net/sched/sch_generic.c ++++ b/net/sched/sch_generic.c +@@ -892,6 +892,7 @@ struct Qdisc *qdisc_alloc(struct netdev_ + __skb_queue_head_init(&sch->gso_skb); + __skb_queue_head_init(&sch->skb_bad_txq); + qdisc_skb_head_init(&sch->q); ++ gnet_stats_basic_packed_init(&sch->bstats); + spin_lock_init(&sch->q.lock); + + if (ops->static_flags & TCQ_F_CPUSTATS) { +--- a/net/sched/sch_gred.c ++++ b/net/sched/sch_gred.c +@@ -364,9 +364,11 @@ static int gred_offload_dump_stats(struc + hw_stats->handle = sch->handle; + hw_stats->parent = sch->parent; + +- 
for (i = 0; i < MAX_DPs; i++) ++ for (i = 0; i < MAX_DPs; i++) { ++ gnet_stats_basic_packed_init(&hw_stats->stats.bstats[i]); + if (table->tab[i]) + hw_stats->stats.xstats[i] = &table->tab[i]->stats; ++ } + + ret = qdisc_offload_dump_helper(sch, TC_SETUP_QDISC_GRED, hw_stats); + /* Even if driver returns failure adjust the stats - in case offload +--- a/net/sched/sch_hfsc.c ++++ b/net/sched/sch_hfsc.c +@@ -1406,6 +1406,7 @@ hfsc_init_qdisc(struct Qdisc *sch, struc + if (err) + return err; + ++ gnet_stats_basic_packed_init(&q->root.bstats); + q->root.cl_common.classid = sch->handle; + q->root.sched = q; + q->root.qdisc = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, +--- a/net/sched/sch_htb.c ++++ b/net/sched/sch_htb.c +@@ -1311,7 +1311,7 @@ static void htb_offload_aggregate_stats( + struct htb_class *c; + unsigned int i; + +- memset(&cl->bstats, 0, sizeof(cl->bstats)); ++ gnet_stats_basic_packed_init(&cl->bstats); + + for (i = 0; i < q->clhash.hashsize; i++) { + hlist_for_each_entry(c, &q->clhash.hash[i], common.hnode) { +@@ -1357,7 +1357,7 @@ htb_dump_class_stats(struct Qdisc *sch, + if (cl->leaf.q) + cl->bstats = cl->leaf.q->bstats; + else +- memset(&cl->bstats, 0, sizeof(cl->bstats)); ++ gnet_stats_basic_packed_init(&cl->bstats); + cl->bstats.bytes += cl->bstats_bias.bytes; + cl->bstats.packets += cl->bstats_bias.packets; + } else { +@@ -1849,6 +1849,9 @@ static int htb_change_class(struct Qdisc + if (!cl) + goto failure; + ++ gnet_stats_basic_packed_init(&cl->bstats); ++ gnet_stats_basic_packed_init(&cl->bstats_bias); ++ + err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack); + if (err) { + kfree(cl); +--- a/net/sched/sch_mq.c ++++ b/net/sched/sch_mq.c +@@ -133,7 +133,7 @@ static int mq_dump(struct Qdisc *sch, st + __u32 qlen = 0; + + sch->q.qlen = 0; +- memset(&sch->bstats, 0, sizeof(sch->bstats)); ++ gnet_stats_basic_packed_init(&sch->bstats); + memset(&sch->qstats, 0, sizeof(sch->qstats)); + + /* MQ supports lockless qdiscs. However, statistics accounting needs +--- a/net/sched/sch_mqprio.c ++++ b/net/sched/sch_mqprio.c +@@ -390,7 +390,7 @@ static int mqprio_dump(struct Qdisc *sch + unsigned int ntx, tc; + + sch->q.qlen = 0; +- memset(&sch->bstats, 0, sizeof(sch->bstats)); ++ gnet_stats_basic_packed_init(&sch->bstats); + memset(&sch->qstats, 0, sizeof(sch->qstats)); + + /* MQ supports lockless qdiscs. 
However, statistics accounting needs +@@ -504,10 +504,11 @@ static int mqprio_dump_class_stats(struc + int i; + __u32 qlen = 0; + struct gnet_stats_queue qstats = {0}; +- struct gnet_stats_basic_packed bstats = {0}; ++ struct gnet_stats_basic_packed bstats; + struct net_device *dev = qdisc_dev(sch); + struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK]; + ++ gnet_stats_basic_packed_init(&bstats); + /* Drop lock here it will be reclaimed before touching + * statistics this is required because the d->lock we + * hold here is the look on dev_queue->qdisc_sleeping +--- a/net/sched/sch_qfq.c ++++ b/net/sched/sch_qfq.c +@@ -465,6 +465,7 @@ static int qfq_change_class(struct Qdisc + if (cl == NULL) + return -ENOBUFS; + ++ gnet_stats_basic_packed_init(&cl->bstats); + cl->common.classid = classid; + cl->deficit = lmax; + diff --git a/patches/drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch b/patches/0007-drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch index d944f7801b2c..35d8d1780147 100644 --- a/patches/drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch +++ b/patches/0007-drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch @@ -1,6 +1,6 @@ From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Wed, 8 Sep 2021 17:18:00 +0200 -Subject: [PATCH] drm/i915/gt: Queue and wait for the irq_work item. +Subject: [PATCH 07/10] drm/i915/gt: Queue and wait for the irq_work item. Disabling interrupts and invoking the irq_work function directly breaks on PREEMPT_RT. @@ -19,6 +19,7 @@ directly. Reported-by: Clark Williams <williams@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/patches/0007-net-sched-Use-_bstats_update-set-instead-of-raw-writ.patch b/patches/0007-net-sched-Use-_bstats_update-set-instead-of-raw-writ.patch new file mode 100644 index 000000000000..af9c13588dc0 --- /dev/null +++ b/patches/0007-net-sched-Use-_bstats_update-set-instead-of-raw-writ.patch @@ -0,0 +1,176 @@ +From: "Ahmed S. Darwish" <a.darwish@linutronix.de> +Date: Fri, 17 Sep 2021 13:31:39 +0200 +Subject: [PATCH 07/10] net: sched: Use _bstats_update/set() instead of raw + writes + +The Qdisc::running sequence counter, used to protect Qdisc::bstats reads +from parallel writes, is in the process of being removed. Qdisc::bstats +read/writes will synchronize using an internal u64_stats sync point +instead. + +Modify all bstats writes to use _bstats_update(). This ensures that +the internal u64_stats sync point is always acquired and released as +appropriate. + +Signed-off-by: Ahmed S. 
Darwish <a.darwish@linutronix.de> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + net/core/gen_stats.c | 9 +++++---- + net/sched/sch_cbq.c | 3 +-- + net/sched/sch_gred.c | 7 ++++--- + net/sched/sch_htb.c | 25 +++++++++++++++---------- + net/sched/sch_qfq.c | 3 +-- + 5 files changed, 26 insertions(+), 21 deletions(-) + +--- a/net/core/gen_stats.c ++++ b/net/core/gen_stats.c +@@ -129,6 +129,7 @@ static void + __gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats, + struct gnet_stats_basic_cpu __percpu *cpu) + { ++ u64 t_bytes = 0, t_packets = 0; + int i; + + for_each_possible_cpu(i) { +@@ -142,9 +143,10 @@ static void + packets = bcpu->bstats.packets; + } while (u64_stats_fetch_retry_irq(&bcpu->syncp, start)); + +- bstats->bytes += bytes; +- bstats->packets += packets; ++ t_bytes += bytes; ++ t_packets += packets; + } ++ _bstats_update(bstats, t_bytes, t_packets); + } + + void +@@ -168,8 +170,7 @@ void + packets = b->packets; + } while (running && read_seqcount_retry(running, seq)); + +- bstats->bytes += bytes; +- bstats->packets += packets; ++ _bstats_update(bstats, bytes, packets); + } + EXPORT_SYMBOL(__gnet_stats_copy_basic); + +--- a/net/sched/sch_cbq.c ++++ b/net/sched/sch_cbq.c +@@ -565,8 +565,7 @@ cbq_update(struct cbq_sched_data *q) + long avgidle = cl->avgidle; + long idle; + +- cl->bstats.packets++; +- cl->bstats.bytes += len; ++ _bstats_update(&cl->bstats, len, 1); + + /* + * (now - last) is total time between packet right edges. +--- a/net/sched/sch_gred.c ++++ b/net/sched/sch_gred.c +@@ -353,6 +353,7 @@ static int gred_offload_dump_stats(struc + { + struct gred_sched *table = qdisc_priv(sch); + struct tc_gred_qopt_offload *hw_stats; ++ u64 bytes = 0, packets = 0; + unsigned int i; + int ret; + +@@ -381,15 +382,15 @@ static int gred_offload_dump_stats(struc + table->tab[i]->bytesin += hw_stats->stats.bstats[i].bytes; + table->tab[i]->backlog += hw_stats->stats.qstats[i].backlog; + +- _bstats_update(&sch->bstats, +- hw_stats->stats.bstats[i].bytes, +- hw_stats->stats.bstats[i].packets); ++ bytes += hw_stats->stats.bstats[i].bytes; ++ packets += hw_stats->stats.bstats[i].packets; + sch->qstats.qlen += hw_stats->stats.qstats[i].qlen; + sch->qstats.backlog += hw_stats->stats.qstats[i].backlog; + sch->qstats.drops += hw_stats->stats.qstats[i].drops; + sch->qstats.requeues += hw_stats->stats.qstats[i].requeues; + sch->qstats.overlimits += hw_stats->stats.qstats[i].overlimits; + } ++ _bstats_update(&sch->bstats, bytes, packets); + + kfree(hw_stats); + return ret; +--- a/net/sched/sch_htb.c ++++ b/net/sched/sch_htb.c +@@ -1308,6 +1308,7 @@ static int htb_dump_class(struct Qdisc * + static void htb_offload_aggregate_stats(struct htb_sched *q, + struct htb_class *cl) + { ++ u64 bytes = 0, packets = 0; + struct htb_class *c; + unsigned int i; + +@@ -1323,14 +1324,15 @@ static void htb_offload_aggregate_stats( + if (p != cl) + continue; + +- cl->bstats.bytes += c->bstats_bias.bytes; +- cl->bstats.packets += c->bstats_bias.packets; ++ bytes += c->bstats_bias.bytes; ++ packets += c->bstats_bias.packets; + if (c->level == 0) { +- cl->bstats.bytes += c->leaf.q->bstats.bytes; +- cl->bstats.packets += c->leaf.q->bstats.packets; ++ bytes += c->leaf.q->bstats.bytes; ++ packets += c->leaf.q->bstats.packets; + } + } + } ++ _bstats_update(&cl->bstats, bytes, packets); + } + + static int +@@ -1358,8 +1360,9 @@ htb_dump_class_stats(struct Qdisc *sch, + cl->bstats = cl->leaf.q->bstats; + else + gnet_stats_basic_packed_init(&cl->bstats); +- cl->bstats.bytes += 
cl->bstats_bias.bytes; +- cl->bstats.packets += cl->bstats_bias.packets; ++ _bstats_update(&cl->bstats, ++ cl->bstats_bias.bytes, ++ cl->bstats_bias.packets); + } else { + htb_offload_aggregate_stats(q, cl); + } +@@ -1578,8 +1581,9 @@ static int htb_destroy_class_offload(str + WARN_ON(old != q); + + if (cl->parent) { +- cl->parent->bstats_bias.bytes += q->bstats.bytes; +- cl->parent->bstats_bias.packets += q->bstats.packets; ++ _bstats_update(&cl->parent->bstats_bias, ++ q->bstats.bytes, ++ q->bstats.packets); + } + + offload_opt = (struct tc_htb_qopt_offload) { +@@ -1925,8 +1929,9 @@ static int htb_change_class(struct Qdisc + htb_graft_helper(dev_queue, old_q); + goto err_kill_estimator; + } +- parent->bstats_bias.bytes += old_q->bstats.bytes; +- parent->bstats_bias.packets += old_q->bstats.packets; ++ _bstats_update(&parent->bstats_bias, ++ old_q->bstats.bytes, ++ old_q->bstats.packets); + qdisc_put(old_q); + } + new_q = qdisc_create_dflt(dev_queue, &pfifo_qdisc_ops, +--- a/net/sched/sch_qfq.c ++++ b/net/sched/sch_qfq.c +@@ -1235,8 +1235,7 @@ static int qfq_enqueue(struct sk_buff *s + return err; + } + +- cl->bstats.bytes += len; +- cl->bstats.packets += gso_segs; ++ _bstats_update(&cl->bstats, len, gso_segs); + sch->qstats.backlog += len; + ++sch->q.qlen; + diff --git a/patches/drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch b/patches/0008-drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch index 3147f4f9249a..ba915643b99a 100644 --- a/patches/drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch +++ b/patches/0008-drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch @@ -1,6 +1,6 @@ From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Wed, 8 Sep 2021 19:03:41 +0200 -Subject: [PATCH] drm/i915/gt: Use spin_lock_irq() instead of +Subject: [PATCH 08/10] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock() execlists_dequeue() is invoked from a function which uses @@ -20,6 +20,7 @@ anything that would acquire the lock again. Reported-by: Clark Williams <williams@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> --- drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/patches/0008-net-sched-Merge-Qdisc-bstats-and-Qdisc-cpu_bstats-da.patch b/patches/0008-net-sched-Merge-Qdisc-bstats-and-Qdisc-cpu_bstats-da.patch new file mode 100644 index 000000000000..31f3ef55c7d6 --- /dev/null +++ b/patches/0008-net-sched-Merge-Qdisc-bstats-and-Qdisc-cpu_bstats-da.patch @@ -0,0 +1,1002 @@ +From: "Ahmed S. Darwish" <a.darwish@linutronix.de> +Date: Fri, 17 Sep 2021 13:31:40 +0200 +Subject: [PATCH 08/10] net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats + data types + +The only factor differentiating per-CPU bstats data type (struct +gnet_stats_basic_cpu) from the packed non-per-CPU one (struct +gnet_stats_basic_packed) was a u64_stats sync point inside the former. +The two data types are now equivalent: earlier commits added a u64_stats +sync point to the latter. + +Combine both data types into "struct gnet_stats_basic_sync". This +eliminates redundancy and simplifies the bstats read/write APIs. + +Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit +architectures, u64_stats sync points do not use sequence counter +protection. + +Signed-off-by: Ahmed S. 
Darwish <a.darwish@linutronix.de> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + drivers/net/ethernet/netronome/nfp/abm/qdisc.c | 2 + include/net/act_api.h | 10 ++-- + include/net/gen_stats.h | 50 ++++++++++---------- + include/net/netfilter/xt_rateest.h | 2 + include/net/pkt_cls.h | 4 - + include/net/sch_generic.h | 34 +++---------- + net/core/gen_estimator.c | 36 ++++++++------ + net/core/gen_stats.c | 62 +++++++++++++------------ + net/netfilter/xt_RATEEST.c | 8 +-- + net/sched/act_api.c | 14 ++--- + net/sched/act_bpf.c | 2 + net/sched/act_ife.c | 4 - + net/sched/act_mpls.c | 2 + net/sched/act_police.c | 2 + net/sched/act_sample.c | 2 + net/sched/act_simple.c | 3 - + net/sched/act_skbedit.c | 2 + net/sched/act_skbmod.c | 2 + net/sched/sch_api.c | 2 + net/sched/sch_atm.c | 4 - + net/sched/sch_cbq.c | 4 - + net/sched/sch_drr.c | 4 - + net/sched/sch_ets.c | 4 - + net/sched/sch_generic.c | 4 - + net/sched/sch_gred.c | 10 ++-- + net/sched/sch_hfsc.c | 4 - + net/sched/sch_htb.c | 32 ++++++------ + net/sched/sch_mq.c | 2 + net/sched/sch_mqprio.c | 6 +- + net/sched/sch_qfq.c | 4 - + 30 files changed, 158 insertions(+), 163 deletions(-) + +--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c ++++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c +@@ -458,7 +458,7 @@ nfp_abm_qdisc_graft(struct nfp_abm_link + static void + nfp_abm_stats_calculate(struct nfp_alink_stats *new, + struct nfp_alink_stats *old, +- struct gnet_stats_basic_packed *bstats, ++ struct gnet_stats_basic_sync *bstats, + struct gnet_stats_queue *qstats) + { + _bstats_update(bstats, new->tx_bytes - old->tx_bytes, +--- a/include/net/act_api.h ++++ b/include/net/act_api.h +@@ -30,13 +30,13 @@ struct tc_action { + atomic_t tcfa_bindcnt; + int tcfa_action; + struct tcf_t tcfa_tm; +- struct gnet_stats_basic_packed tcfa_bstats; +- struct gnet_stats_basic_packed tcfa_bstats_hw; ++ struct gnet_stats_basic_sync tcfa_bstats; ++ struct gnet_stats_basic_sync tcfa_bstats_hw; + struct gnet_stats_queue tcfa_qstats; + struct net_rate_estimator __rcu *tcfa_rate_est; + spinlock_t tcfa_lock; +- struct gnet_stats_basic_cpu __percpu *cpu_bstats; +- struct gnet_stats_basic_cpu __percpu *cpu_bstats_hw; ++ struct gnet_stats_basic_sync __percpu *cpu_bstats; ++ struct gnet_stats_basic_sync __percpu *cpu_bstats_hw; + struct gnet_stats_queue __percpu *cpu_qstats; + struct tc_cookie __rcu *act_cookie; + struct tcf_chain __rcu *goto_chain; +@@ -206,7 +206,7 @@ static inline void tcf_action_update_bst + struct sk_buff *skb) + { + if (likely(a->cpu_bstats)) { +- bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(a->cpu_bstats), skb); + return; + } + spin_lock(&a->tcfa_lock); +--- a/include/net/gen_stats.h ++++ b/include/net/gen_stats.h +@@ -7,27 +7,29 @@ + #include <linux/rtnetlink.h> + #include <linux/pkt_sched.h> + +-/* Note: this used to be in include/uapi/linux/gen_stats.h */ +-struct gnet_stats_basic_packed { +- __u64 bytes; +- __u64 packets; +- struct u64_stats_sync syncp; +-}; +- +-struct gnet_stats_basic_cpu { +- struct gnet_stats_basic_packed bstats; ++/* Throughput stats. ++ * Must be initialized beforehand with gnet_stats_basic_sync_init(). ++ * ++ * If no reads can ever occur parallel to writes (e.g. stack-allocated ++ * bstats), then the internal stat values can be written to and read ++ * from directly. Otherwise, use _bstats_set/update() for writes and ++ * __gnet_stats_copy_basic() for reads. 
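++ *
++ * A minimal sketch of that pattern, assuming a datapath writer and a
++ * dump-path reader (the names `b', `start', `bytes' and `packets' are
++ * placeholders, not part of the API):
++ *
++ *	_bstats_update(b, skb->len, 1);
++ *
++ *	do {
++ *		start = u64_stats_fetch_begin_irq(&b->syncp);
++ *		bytes = u64_stats_read(&b->bytes);
++ *		packets = u64_stats_read(&b->packets);
++ *	} while (u64_stats_fetch_retry_irq(&b->syncp, start));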
++ */ ++struct gnet_stats_basic_sync { ++ u64_stats_t bytes; ++ u64_stats_t packets; + struct u64_stats_sync syncp; + } __aligned(2 * sizeof(u64)); + + #ifdef CONFIG_LOCKDEP +-void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b); ++void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b); + + #else + +-static inline void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b) ++static inline void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b) + { +- b->bytes = 0; +- b->packets = 0; ++ u64_stats_set(&b->bytes, 0); ++ u64_stats_set(&b->packets, 0); + u64_stats_init(&b->syncp); + } + #endif +@@ -58,16 +60,16 @@ int gnet_stats_start_copy_compat(struct + + int gnet_stats_copy_basic(const seqcount_t *running, + struct gnet_dump *d, +- struct gnet_stats_basic_cpu __percpu *cpu, +- struct gnet_stats_basic_packed *b); ++ struct gnet_stats_basic_sync __percpu *cpu, ++ struct gnet_stats_basic_sync *b); + void __gnet_stats_copy_basic(const seqcount_t *running, +- struct gnet_stats_basic_packed *bstats, +- struct gnet_stats_basic_cpu __percpu *cpu, +- struct gnet_stats_basic_packed *b); ++ struct gnet_stats_basic_sync *bstats, ++ struct gnet_stats_basic_sync __percpu *cpu, ++ struct gnet_stats_basic_sync *b); + int gnet_stats_copy_basic_hw(const seqcount_t *running, + struct gnet_dump *d, +- struct gnet_stats_basic_cpu __percpu *cpu, +- struct gnet_stats_basic_packed *b); ++ struct gnet_stats_basic_sync __percpu *cpu, ++ struct gnet_stats_basic_sync *b); + int gnet_stats_copy_rate_est(struct gnet_dump *d, + struct net_rate_estimator __rcu **ptr); + int gnet_stats_copy_queue(struct gnet_dump *d, +@@ -80,14 +82,14 @@ int gnet_stats_copy_app(struct gnet_dump + + int gnet_stats_finish_copy(struct gnet_dump *d); + +-int gen_new_estimator(struct gnet_stats_basic_packed *bstats, +- struct gnet_stats_basic_cpu __percpu *cpu_bstats, ++int gen_new_estimator(struct gnet_stats_basic_sync *bstats, ++ struct gnet_stats_basic_sync __percpu *cpu_bstats, + struct net_rate_estimator __rcu **rate_est, + spinlock_t *lock, + seqcount_t *running, struct nlattr *opt); + void gen_kill_estimator(struct net_rate_estimator __rcu **ptr); +-int gen_replace_estimator(struct gnet_stats_basic_packed *bstats, +- struct gnet_stats_basic_cpu __percpu *cpu_bstats, ++int gen_replace_estimator(struct gnet_stats_basic_sync *bstats, ++ struct gnet_stats_basic_sync __percpu *cpu_bstats, + struct net_rate_estimator __rcu **ptr, + spinlock_t *lock, + seqcount_t *running, struct nlattr *opt); +--- a/include/net/netfilter/xt_rateest.h ++++ b/include/net/netfilter/xt_rateest.h +@@ -6,7 +6,7 @@ + + struct xt_rateest { + /* keep lock and bstats on same cache line to speedup xt_rateest_tg() */ +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + spinlock_t lock; + + +--- a/include/net/pkt_cls.h ++++ b/include/net/pkt_cls.h +@@ -765,7 +765,7 @@ struct tc_cookie { + }; + + struct tc_qopt_offload_stats { +- struct gnet_stats_basic_packed *bstats; ++ struct gnet_stats_basic_sync *bstats; + struct gnet_stats_queue *qstats; + }; + +@@ -885,7 +885,7 @@ struct tc_gred_qopt_offload_params { + }; + + struct tc_gred_qopt_offload_stats { +- struct gnet_stats_basic_packed bstats[MAX_DPs]; ++ struct gnet_stats_basic_sync bstats[MAX_DPs]; + struct gnet_stats_queue qstats[MAX_DPs]; + struct red_stats *xstats[MAX_DPs]; + }; +--- a/include/net/sch_generic.h ++++ b/include/net/sch_generic.h +@@ -97,7 +97,7 @@ struct Qdisc { + struct netdev_queue *dev_queue; + + struct net_rate_estimator __rcu 
*rate_est; +- struct gnet_stats_basic_cpu __percpu *cpu_bstats; ++ struct gnet_stats_basic_sync __percpu *cpu_bstats; + struct gnet_stats_queue __percpu *cpu_qstats; + int pad; + refcount_t refcnt; +@@ -107,7 +107,7 @@ struct Qdisc { + */ + struct sk_buff_head gso_skb ____cacheline_aligned_in_smp; + struct qdisc_skb_head q; +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + seqcount_t running; + struct gnet_stats_queue qstats; + unsigned long state; +@@ -845,16 +845,16 @@ static inline int qdisc_enqueue(struct s + return sch->enqueue(skb, sch, to_free); + } + +-static inline void _bstats_update(struct gnet_stats_basic_packed *bstats, ++static inline void _bstats_update(struct gnet_stats_basic_sync *bstats, + __u64 bytes, __u32 packets) + { + u64_stats_update_begin(&bstats->syncp); +- bstats->bytes += bytes; +- bstats->packets += packets; ++ u64_stats_add(&bstats->bytes, bytes); ++ u64_stats_add(&bstats->packets, packets); + u64_stats_update_end(&bstats->syncp); + } + +-static inline void bstats_update(struct gnet_stats_basic_packed *bstats, ++static inline void bstats_update(struct gnet_stats_basic_sync *bstats, + const struct sk_buff *skb) + { + _bstats_update(bstats, +@@ -862,26 +862,10 @@ static inline void bstats_update(struct + skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1); + } + +-static inline void _bstats_cpu_update(struct gnet_stats_basic_cpu *bstats, +- __u64 bytes, __u32 packets) +-{ +- u64_stats_update_begin(&bstats->syncp); +- _bstats_update(&bstats->bstats, bytes, packets); +- u64_stats_update_end(&bstats->syncp); +-} +- +-static inline void bstats_cpu_update(struct gnet_stats_basic_cpu *bstats, +- const struct sk_buff *skb) +-{ +- u64_stats_update_begin(&bstats->syncp); +- bstats_update(&bstats->bstats, skb); +- u64_stats_update_end(&bstats->syncp); +-} +- + static inline void qdisc_bstats_cpu_update(struct Qdisc *sch, + const struct sk_buff *skb) + { +- bstats_cpu_update(this_cpu_ptr(sch->cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(sch->cpu_bstats), skb); + } + + static inline void qdisc_bstats_update(struct Qdisc *sch, +@@ -1314,7 +1298,7 @@ void psched_ppscfg_precompute(struct psc + struct mini_Qdisc { + struct tcf_proto *filter_list; + struct tcf_block *block; +- struct gnet_stats_basic_cpu __percpu *cpu_bstats; ++ struct gnet_stats_basic_sync __percpu *cpu_bstats; + struct gnet_stats_queue __percpu *cpu_qstats; + struct rcu_head rcu; + }; +@@ -1322,7 +1306,7 @@ struct mini_Qdisc { + static inline void mini_qdisc_bstats_cpu_update(struct mini_Qdisc *miniq, + const struct sk_buff *skb) + { +- bstats_cpu_update(this_cpu_ptr(miniq->cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(miniq->cpu_bstats), skb); + } + + static inline void mini_qdisc_qstats_cpu_drop(struct mini_Qdisc *miniq) +--- a/net/core/gen_estimator.c ++++ b/net/core/gen_estimator.c +@@ -40,10 +40,10 @@ + */ + + struct net_rate_estimator { +- struct gnet_stats_basic_packed *bstats; ++ struct gnet_stats_basic_sync *bstats; + spinlock_t *stats_lock; + seqcount_t *running; +- struct gnet_stats_basic_cpu __percpu *cpu_bstats; ++ struct gnet_stats_basic_sync __percpu *cpu_bstats; + u8 ewma_log; + u8 intvl_log; /* period : (250ms << intvl_log) */ + +@@ -60,9 +60,9 @@ struct net_rate_estimator { + }; + + static void est_fetch_counters(struct net_rate_estimator *e, +- struct gnet_stats_basic_packed *b) ++ struct gnet_stats_basic_sync *b) + { +- gnet_stats_basic_packed_init(b); ++ gnet_stats_basic_sync_init(b); + if (e->stats_lock) + spin_lock(e->stats_lock); + +@@ -76,14 +76,18 
@@ static void est_fetch_counters(struct ne + static void est_timer(struct timer_list *t) + { + struct net_rate_estimator *est = from_timer(est, t, timer); +- struct gnet_stats_basic_packed b; ++ struct gnet_stats_basic_sync b; ++ u64 b_bytes, b_packets; + u64 rate, brate; + + est_fetch_counters(est, &b); +- brate = (b.bytes - est->last_bytes) << (10 - est->intvl_log); ++ b_bytes = u64_stats_read(&b.bytes); ++ b_packets = u64_stats_read(&b.packets); ++ ++ brate = (b_bytes - est->last_bytes) << (10 - est->intvl_log); + brate = (brate >> est->ewma_log) - (est->avbps >> est->ewma_log); + +- rate = (b.packets - est->last_packets) << (10 - est->intvl_log); ++ rate = (b_packets - est->last_packets) << (10 - est->intvl_log); + rate = (rate >> est->ewma_log) - (est->avpps >> est->ewma_log); + + write_seqcount_begin(&est->seq); +@@ -91,8 +95,8 @@ static void est_timer(struct timer_list + est->avpps += rate; + write_seqcount_end(&est->seq); + +- est->last_bytes = b.bytes; +- est->last_packets = b.packets; ++ est->last_bytes = b_bytes; ++ est->last_packets = b_packets; + + est->next_jiffies += ((HZ/4) << est->intvl_log); + +@@ -121,8 +125,8 @@ static void est_timer(struct timer_list + * Returns 0 on success or a negative error code. + * + */ +-int gen_new_estimator(struct gnet_stats_basic_packed *bstats, +- struct gnet_stats_basic_cpu __percpu *cpu_bstats, ++int gen_new_estimator(struct gnet_stats_basic_sync *bstats, ++ struct gnet_stats_basic_sync __percpu *cpu_bstats, + struct net_rate_estimator __rcu **rate_est, + spinlock_t *lock, + seqcount_t *running, +@@ -130,7 +134,7 @@ int gen_new_estimator(struct gnet_stats_ + { + struct gnet_estimator *parm = nla_data(opt); + struct net_rate_estimator *old, *est; +- struct gnet_stats_basic_packed b; ++ struct gnet_stats_basic_sync b; + int intvl_log; + + if (nla_len(opt) < sizeof(*parm)) +@@ -164,8 +168,8 @@ int gen_new_estimator(struct gnet_stats_ + est_fetch_counters(est, &b); + if (lock) + local_bh_enable(); +- est->last_bytes = b.bytes; +- est->last_packets = b.packets; ++ est->last_bytes = u64_stats_read(&b.bytes); ++ est->last_packets = u64_stats_read(&b.packets); + + if (lock) + spin_lock_bh(lock); +@@ -222,8 +226,8 @@ EXPORT_SYMBOL(gen_kill_estimator); + * + * Returns 0 on success or a negative error code. 
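+ *
+ * For example, a sketch mirroring the cbq_change_class() call site,
+ * where the root qdisc's running seqcount is passed in:
+ *
+ *	err = gen_replace_estimator(&cl->bstats, NULL, &cl->rate_est,
+ *				    NULL,
+ *				    qdisc_root_sleeping_running(sch),
+ *				    tca[TCA_RATE]);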
+ */ +-int gen_replace_estimator(struct gnet_stats_basic_packed *bstats, +- struct gnet_stats_basic_cpu __percpu *cpu_bstats, ++int gen_replace_estimator(struct gnet_stats_basic_sync *bstats, ++ struct gnet_stats_basic_sync __percpu *cpu_bstats, + struct net_rate_estimator __rcu **rate_est, + spinlock_t *lock, + seqcount_t *running, struct nlattr *opt) +--- a/net/core/gen_stats.c ++++ b/net/core/gen_stats.c +@@ -116,31 +116,31 @@ EXPORT_SYMBOL(gnet_stats_start_copy); + + #ifdef CONFIG_LOCKDEP + /* Must not be inlined, due to u64_stats seqcount_t lockdep key */ +-void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b) ++void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b) + { +- b->bytes = 0; +- b->packets = 0; ++ u64_stats_set(&b->bytes, 0); ++ u64_stats_set(&b->packets, 0); + u64_stats_init(&b->syncp); + } +-EXPORT_SYMBOL(gnet_stats_basic_packed_init); ++EXPORT_SYMBOL(gnet_stats_basic_sync_init); + #endif + + static void +-__gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats, +- struct gnet_stats_basic_cpu __percpu *cpu) ++__gnet_stats_copy_basic_cpu(struct gnet_stats_basic_sync *bstats, ++ struct gnet_stats_basic_sync __percpu *cpu) + { + u64 t_bytes = 0, t_packets = 0; + int i; + + for_each_possible_cpu(i) { +- struct gnet_stats_basic_cpu *bcpu = per_cpu_ptr(cpu, i); ++ struct gnet_stats_basic_sync *bcpu = per_cpu_ptr(cpu, i); + unsigned int start; + u64 bytes, packets; + + do { + start = u64_stats_fetch_begin_irq(&bcpu->syncp); +- bytes = bcpu->bstats.bytes; +- packets = bcpu->bstats.packets; ++ bytes = u64_stats_read(&bcpu->bytes); ++ packets = u64_stats_read(&bcpu->packets); + } while (u64_stats_fetch_retry_irq(&bcpu->syncp, start)); + + t_bytes += bytes; +@@ -151,9 +151,9 @@ static void + + void + __gnet_stats_copy_basic(const seqcount_t *running, +- struct gnet_stats_basic_packed *bstats, +- struct gnet_stats_basic_cpu __percpu *cpu, +- struct gnet_stats_basic_packed *b) ++ struct gnet_stats_basic_sync *bstats, ++ struct gnet_stats_basic_sync __percpu *cpu, ++ struct gnet_stats_basic_sync *b) + { + unsigned int seq; + __u64 bytes = 0; +@@ -166,8 +166,8 @@ void + do { + if (running) + seq = read_seqcount_begin(running); +- bytes = b->bytes; +- packets = b->packets; ++ bytes = u64_stats_read(&b->bytes); ++ packets = u64_stats_read(&b->packets); + } while (running && read_seqcount_retry(running, seq)); + + _bstats_update(bstats, bytes, packets); +@@ -177,18 +177,22 @@ EXPORT_SYMBOL(__gnet_stats_copy_basic); + static int + ___gnet_stats_copy_basic(const seqcount_t *running, + struct gnet_dump *d, +- struct gnet_stats_basic_cpu __percpu *cpu, +- struct gnet_stats_basic_packed *b, ++ struct gnet_stats_basic_sync __percpu *cpu, ++ struct gnet_stats_basic_sync *b, + int type) + { +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; ++ u64 bstats_bytes, bstats_packets; + +- gnet_stats_basic_packed_init(&bstats); ++ gnet_stats_basic_sync_init(&bstats); + __gnet_stats_copy_basic(running, &bstats, cpu, b); + ++ bstats_bytes = u64_stats_read(&bstats.bytes); ++ bstats_packets = u64_stats_read(&bstats.packets); ++ + if (d->compat_tc_stats && type == TCA_STATS_BASIC) { +- d->tc_stats.bytes = bstats.bytes; +- d->tc_stats.packets = bstats.packets; ++ d->tc_stats.bytes = bstats_bytes; ++ d->tc_stats.packets = bstats_packets; + } + + if (d->tail) { +@@ -196,14 +200,14 @@ static int + int res; + + memset(&sb, 0, sizeof(sb)); +- sb.bytes = bstats.bytes; +- sb.packets = bstats.packets; ++ sb.bytes = bstats_bytes; ++ sb.packets = 
bstats_packets; + res = gnet_stats_copy(d, type, &sb, sizeof(sb), TCA_STATS_PAD); +- if (res < 0 || sb.packets == bstats.packets) ++ if (res < 0 || sb.packets == bstats_packets) + return res; + /* emit 64bit stats only if needed */ +- return gnet_stats_copy(d, TCA_STATS_PKT64, &bstats.packets, +- sizeof(bstats.packets), TCA_STATS_PAD); ++ return gnet_stats_copy(d, TCA_STATS_PKT64, &bstats_packets, ++ sizeof(bstats_packets), TCA_STATS_PAD); + } + return 0; + } +@@ -224,8 +228,8 @@ static int + int + gnet_stats_copy_basic(const seqcount_t *running, + struct gnet_dump *d, +- struct gnet_stats_basic_cpu __percpu *cpu, +- struct gnet_stats_basic_packed *b) ++ struct gnet_stats_basic_sync __percpu *cpu, ++ struct gnet_stats_basic_sync *b) + { + return ___gnet_stats_copy_basic(running, d, cpu, b, + TCA_STATS_BASIC); +@@ -248,8 +252,8 @@ EXPORT_SYMBOL(gnet_stats_copy_basic); + int + gnet_stats_copy_basic_hw(const seqcount_t *running, + struct gnet_dump *d, +- struct gnet_stats_basic_cpu __percpu *cpu, +- struct gnet_stats_basic_packed *b) ++ struct gnet_stats_basic_sync __percpu *cpu, ++ struct gnet_stats_basic_sync *b) + { + return ___gnet_stats_copy_basic(running, d, cpu, b, + TCA_STATS_BASIC_HW); +--- a/net/netfilter/xt_RATEEST.c ++++ b/net/netfilter/xt_RATEEST.c +@@ -94,11 +94,11 @@ static unsigned int + xt_rateest_tg(struct sk_buff *skb, const struct xt_action_param *par) + { + const struct xt_rateest_target_info *info = par->targinfo; +- struct gnet_stats_basic_packed *stats = &info->est->bstats; ++ struct gnet_stats_basic_sync *stats = &info->est->bstats; + + spin_lock_bh(&info->est->lock); +- stats->bytes += skb->len; +- stats->packets++; ++ u64_stats_add(&stats->bytes, skb->len); ++ u64_stats_inc(&stats->packets); + spin_unlock_bh(&info->est->lock); + + return XT_CONTINUE; +@@ -143,7 +143,7 @@ static int xt_rateest_tg_checkentry(cons + if (!est) + goto err1; + +- gnet_stats_basic_packed_init(&est->bstats); ++ gnet_stats_basic_sync_init(&est->bstats); + strlcpy(est->name, info->name, sizeof(est->name)); + spin_lock_init(&est->lock); + est->refcnt = 1; +--- a/net/sched/act_api.c ++++ b/net/sched/act_api.c +@@ -480,18 +480,18 @@ int tcf_idr_create(struct tc_action_net + atomic_set(&p->tcfa_bindcnt, 1); + + if (cpustats) { +- p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu); ++ p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_sync); + if (!p->cpu_bstats) + goto err1; +- p->cpu_bstats_hw = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu); ++ p->cpu_bstats_hw = netdev_alloc_pcpu_stats(struct gnet_stats_basic_sync); + if (!p->cpu_bstats_hw) + goto err2; + p->cpu_qstats = alloc_percpu(struct gnet_stats_queue); + if (!p->cpu_qstats) + goto err3; + } +- gnet_stats_basic_packed_init(&p->tcfa_bstats); +- gnet_stats_basic_packed_init(&p->tcfa_bstats_hw); ++ gnet_stats_basic_sync_init(&p->tcfa_bstats); ++ gnet_stats_basic_sync_init(&p->tcfa_bstats_hw); + spin_lock_init(&p->tcfa_lock); + p->tcfa_index = index; + p->tcfa_tm.install = jiffies; +@@ -1128,13 +1128,13 @@ void tcf_action_update_stats(struct tc_a + u64 drops, bool hw) + { + if (a->cpu_bstats) { +- _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets); ++ _bstats_update(this_cpu_ptr(a->cpu_bstats), bytes, packets); + + this_cpu_ptr(a->cpu_qstats)->drops += drops; + + if (hw) +- _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats_hw), +- bytes, packets); ++ _bstats_update(this_cpu_ptr(a->cpu_bstats_hw), ++ bytes, packets); + return; + } + +--- a/net/sched/act_bpf.c ++++ b/net/sched/act_bpf.c +@@ -41,7 
+41,7 @@ static int tcf_bpf_act(struct sk_buff *s + int action, filter_res; + + tcf_lastuse_update(&prog->tcf_tm); +- bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(prog->common.cpu_bstats), skb); + + filter = rcu_dereference(prog->filter); + if (at_ingress) { +--- a/net/sched/act_ife.c ++++ b/net/sched/act_ife.c +@@ -718,7 +718,7 @@ static int tcf_ife_decode(struct sk_buff + u8 *tlv_data; + u16 metalen; + +- bstats_cpu_update(this_cpu_ptr(ife->common.cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(ife->common.cpu_bstats), skb); + tcf_lastuse_update(&ife->tcf_tm); + + if (skb_at_tc_ingress(skb)) +@@ -806,7 +806,7 @@ static int tcf_ife_encode(struct sk_buff + exceed_mtu = true; + } + +- bstats_cpu_update(this_cpu_ptr(ife->common.cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(ife->common.cpu_bstats), skb); + tcf_lastuse_update(&ife->tcf_tm); + + if (!metalen) { /* no metadata to send */ +--- a/net/sched/act_mpls.c ++++ b/net/sched/act_mpls.c +@@ -59,7 +59,7 @@ static int tcf_mpls_act(struct sk_buff * + int ret, mac_len; + + tcf_lastuse_update(&m->tcf_tm); +- bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(m->common.cpu_bstats), skb); + + /* Ensure 'data' points at mac_header prior calling mpls manipulating + * functions. +--- a/net/sched/act_police.c ++++ b/net/sched/act_police.c +@@ -248,7 +248,7 @@ static int tcf_police_act(struct sk_buff + int ret; + + tcf_lastuse_update(&police->tcf_tm); +- bstats_cpu_update(this_cpu_ptr(police->common.cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(police->common.cpu_bstats), skb); + + ret = READ_ONCE(police->tcf_action); + p = rcu_dereference_bh(police->params); +--- a/net/sched/act_sample.c ++++ b/net/sched/act_sample.c +@@ -163,7 +163,7 @@ static int tcf_sample_act(struct sk_buff + int retval; + + tcf_lastuse_update(&s->tcf_tm); +- bstats_cpu_update(this_cpu_ptr(s->common.cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(s->common.cpu_bstats), skb); + retval = READ_ONCE(s->tcf_action); + + psample_group = rcu_dereference_bh(s->psample_group); +--- a/net/sched/act_simple.c ++++ b/net/sched/act_simple.c +@@ -36,7 +36,8 @@ static int tcf_simp_act(struct sk_buff * + * then it would look like "hello_3" (without quotes) + */ + pr_info("simple: %s_%llu\n", +- (char *)d->tcfd_defdata, d->tcf_bstats.packets); ++ (char *)d->tcfd_defdata, ++ u64_stats_read(&d->tcf_bstats.packets)); + spin_unlock(&d->tcf_lock); + return d->tcf_action; + } +--- a/net/sched/act_skbedit.c ++++ b/net/sched/act_skbedit.c +@@ -31,7 +31,7 @@ static int tcf_skbedit_act(struct sk_buf + int action; + + tcf_lastuse_update(&d->tcf_tm); +- bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(d->common.cpu_bstats), skb); + + params = rcu_dereference_bh(d->params); + action = READ_ONCE(d->tcf_action); +--- a/net/sched/act_skbmod.c ++++ b/net/sched/act_skbmod.c +@@ -31,7 +31,7 @@ static int tcf_skbmod_act(struct sk_buff + u64 flags; + + tcf_lastuse_update(&d->tcf_tm); +- bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb); ++ bstats_update(this_cpu_ptr(d->common.cpu_bstats), skb); + + action = READ_ONCE(d->tcf_action); + if (unlikely(action == TC_ACT_SHOT)) +--- a/net/sched/sch_api.c ++++ b/net/sched/sch_api.c +@@ -884,7 +884,7 @@ static void qdisc_offload_graft_root(str + static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, + u32 portid, u32 seq, u16 flags, int event) + { +- struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL; ++ struct 
gnet_stats_basic_sync __percpu *cpu_bstats = NULL; + struct gnet_stats_queue __percpu *cpu_qstats = NULL; + struct tcmsg *tcm; + struct nlmsghdr *nlh; +--- a/net/sched/sch_atm.c ++++ b/net/sched/sch_atm.c +@@ -52,7 +52,7 @@ struct atm_flow_data { + struct atm_qdisc_data *parent; /* parent qdisc */ + struct socket *sock; /* for closing */ + int ref; /* reference count */ +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + struct gnet_stats_queue qstats; + struct list_head list; + struct atm_flow_data *excess; /* flow for excess traffic; +@@ -548,7 +548,7 @@ static int atm_tc_init(struct Qdisc *sch + pr_debug("atm_tc_init(sch %p,[qdisc %p],opt %p)\n", sch, p, opt); + INIT_LIST_HEAD(&p->flows); + INIT_LIST_HEAD(&p->link.list); +- gnet_stats_basic_packed_init(&p->link.bstats); ++ gnet_stats_basic_sync_init(&p->link.bstats); + list_add(&p->link.list, &p->flows); + p->link.q = qdisc_create_dflt(sch->dev_queue, + &pfifo_qdisc_ops, sch->handle, extack); +--- a/net/sched/sch_cbq.c ++++ b/net/sched/sch_cbq.c +@@ -116,7 +116,7 @@ struct cbq_class { + long avgidle; + long deficit; /* Saved deficit for WRR */ + psched_time_t penalized; +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + struct gnet_stats_queue qstats; + struct net_rate_estimator __rcu *rate_est; + struct tc_cbq_xstats xstats; +@@ -1610,7 +1610,7 @@ cbq_change_class(struct Qdisc *sch, u32 + if (cl == NULL) + goto failure; + +- gnet_stats_basic_packed_init(&cl->bstats); ++ gnet_stats_basic_sync_init(&cl->bstats); + err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack); + if (err) { + kfree(cl); +--- a/net/sched/sch_drr.c ++++ b/net/sched/sch_drr.c +@@ -19,7 +19,7 @@ struct drr_class { + struct Qdisc_class_common common; + unsigned int filter_cnt; + +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + struct gnet_stats_queue qstats; + struct net_rate_estimator __rcu *rate_est; + struct list_head alist; +@@ -106,7 +106,7 @@ static int drr_change_class(struct Qdisc + if (cl == NULL) + return -ENOBUFS; + +- gnet_stats_basic_packed_init(&cl->bstats); ++ gnet_stats_basic_sync_init(&cl->bstats); + cl->common.classid = classid; + cl->quantum = quantum; + cl->qdisc = qdisc_create_dflt(sch->dev_queue, +--- a/net/sched/sch_ets.c ++++ b/net/sched/sch_ets.c +@@ -41,7 +41,7 @@ struct ets_class { + struct Qdisc *qdisc; + u32 quantum; + u32 deficit; +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + struct gnet_stats_queue qstats; + }; + +@@ -662,7 +662,7 @@ static int ets_qdisc_change(struct Qdisc + q->nbands = nbands; + for (i = nstrict; i < q->nstrict; i++) { + INIT_LIST_HEAD(&q->classes[i].alist); +- gnet_stats_basic_packed_init(&q->classes[i].bstats); ++ gnet_stats_basic_sync_init(&q->classes[i].bstats); + if (q->classes[i].qdisc->q.qlen) { + list_add_tail(&q->classes[i].alist, &q->active); + q->classes[i].deficit = quanta[i]; +--- a/net/sched/sch_generic.c ++++ b/net/sched/sch_generic.c +@@ -892,12 +892,12 @@ struct Qdisc *qdisc_alloc(struct netdev_ + __skb_queue_head_init(&sch->gso_skb); + __skb_queue_head_init(&sch->skb_bad_txq); + qdisc_skb_head_init(&sch->q); +- gnet_stats_basic_packed_init(&sch->bstats); ++ gnet_stats_basic_sync_init(&sch->bstats); + spin_lock_init(&sch->q.lock); + + if (ops->static_flags & TCQ_F_CPUSTATS) { + sch->cpu_bstats = +- netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu); ++ netdev_alloc_pcpu_stats(struct gnet_stats_basic_sync); + if (!sch->cpu_bstats) + goto errout1; + +--- 
a/net/sched/sch_gred.c ++++ b/net/sched/sch_gred.c +@@ -366,7 +366,7 @@ static int gred_offload_dump_stats(struc + hw_stats->parent = sch->parent; + + for (i = 0; i < MAX_DPs; i++) { +- gnet_stats_basic_packed_init(&hw_stats->stats.bstats[i]); ++ gnet_stats_basic_sync_init(&hw_stats->stats.bstats[i]); + if (table->tab[i]) + hw_stats->stats.xstats[i] = &table->tab[i]->stats; + } +@@ -378,12 +378,12 @@ static int gred_offload_dump_stats(struc + for (i = 0; i < MAX_DPs; i++) { + if (!table->tab[i]) + continue; +- table->tab[i]->packetsin += hw_stats->stats.bstats[i].packets; +- table->tab[i]->bytesin += hw_stats->stats.bstats[i].bytes; ++ table->tab[i]->packetsin += u64_stats_read(&hw_stats->stats.bstats[i].packets); ++ table->tab[i]->bytesin += u64_stats_read(&hw_stats->stats.bstats[i].bytes); + table->tab[i]->backlog += hw_stats->stats.qstats[i].backlog; + +- bytes += hw_stats->stats.bstats[i].bytes; +- packets += hw_stats->stats.bstats[i].packets; ++ bytes += u64_stats_read(&hw_stats->stats.bstats[i].bytes); ++ packets += u64_stats_read(&hw_stats->stats.bstats[i].packets); + sch->qstats.qlen += hw_stats->stats.qstats[i].qlen; + sch->qstats.backlog += hw_stats->stats.qstats[i].backlog; + sch->qstats.drops += hw_stats->stats.qstats[i].drops; +--- a/net/sched/sch_hfsc.c ++++ b/net/sched/sch_hfsc.c +@@ -111,7 +111,7 @@ enum hfsc_class_flags { + struct hfsc_class { + struct Qdisc_class_common cl_common; + +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + struct gnet_stats_queue qstats; + struct net_rate_estimator __rcu *rate_est; + struct tcf_proto __rcu *filter_list; /* filter list */ +@@ -1406,7 +1406,7 @@ hfsc_init_qdisc(struct Qdisc *sch, struc + if (err) + return err; + +- gnet_stats_basic_packed_init(&q->root.bstats); ++ gnet_stats_basic_sync_init(&q->root.bstats); + q->root.cl_common.classid = sch->handle; + q->root.sched = q; + q->root.qdisc = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, +--- a/net/sched/sch_htb.c ++++ b/net/sched/sch_htb.c +@@ -113,8 +113,8 @@ struct htb_class { + /* + * Written often fields + */ +- struct gnet_stats_basic_packed bstats; +- struct gnet_stats_basic_packed bstats_bias; ++ struct gnet_stats_basic_sync bstats; ++ struct gnet_stats_basic_sync bstats_bias; + struct tc_htb_xstats xstats; /* our special stats */ + + /* token bucket parameters */ +@@ -1312,7 +1312,7 @@ static void htb_offload_aggregate_stats( + struct htb_class *c; + unsigned int i; + +- gnet_stats_basic_packed_init(&cl->bstats); ++ gnet_stats_basic_sync_init(&cl->bstats); + + for (i = 0; i < q->clhash.hashsize; i++) { + hlist_for_each_entry(c, &q->clhash.hash[i], common.hnode) { +@@ -1324,11 +1324,11 @@ static void htb_offload_aggregate_stats( + if (p != cl) + continue; + +- bytes += c->bstats_bias.bytes; +- packets += c->bstats_bias.packets; ++ bytes += u64_stats_read(&c->bstats_bias.bytes); ++ packets += u64_stats_read(&c->bstats_bias.packets); + if (c->level == 0) { +- bytes += c->leaf.q->bstats.bytes; +- packets += c->leaf.q->bstats.packets; ++ bytes += u64_stats_read(&c->leaf.q->bstats.bytes); ++ packets += u64_stats_read(&c->leaf.q->bstats.packets); + } + } + } +@@ -1359,10 +1359,10 @@ htb_dump_class_stats(struct Qdisc *sch, + if (cl->leaf.q) + cl->bstats = cl->leaf.q->bstats; + else +- gnet_stats_basic_packed_init(&cl->bstats); ++ gnet_stats_basic_sync_init(&cl->bstats); + _bstats_update(&cl->bstats, +- cl->bstats_bias.bytes, +- cl->bstats_bias.packets); ++ u64_stats_read(&cl->bstats_bias.bytes), ++ u64_stats_read(&cl->bstats_bias.packets)); + } 
else { + htb_offload_aggregate_stats(q, cl); + } +@@ -1582,8 +1582,8 @@ static int htb_destroy_class_offload(str + + if (cl->parent) { + _bstats_update(&cl->parent->bstats_bias, +- q->bstats.bytes, +- q->bstats.packets); ++ u64_stats_read(&q->bstats.bytes), ++ u64_stats_read(&q->bstats.packets)); + } + + offload_opt = (struct tc_htb_qopt_offload) { +@@ -1853,8 +1853,8 @@ static int htb_change_class(struct Qdisc + if (!cl) + goto failure; + +- gnet_stats_basic_packed_init(&cl->bstats); +- gnet_stats_basic_packed_init(&cl->bstats_bias); ++ gnet_stats_basic_sync_init(&cl->bstats); ++ gnet_stats_basic_sync_init(&cl->bstats_bias); + + err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack); + if (err) { +@@ -1930,8 +1930,8 @@ static int htb_change_class(struct Qdisc + goto err_kill_estimator; + } + _bstats_update(&parent->bstats_bias, +- old_q->bstats.bytes, +- old_q->bstats.packets); ++ u64_stats_read(&old_q->bstats.bytes), ++ u64_stats_read(&old_q->bstats.packets)); + qdisc_put(old_q); + } + new_q = qdisc_create_dflt(dev_queue, &pfifo_qdisc_ops, +--- a/net/sched/sch_mq.c ++++ b/net/sched/sch_mq.c +@@ -133,7 +133,7 @@ static int mq_dump(struct Qdisc *sch, st + __u32 qlen = 0; + + sch->q.qlen = 0; +- gnet_stats_basic_packed_init(&sch->bstats); ++ gnet_stats_basic_sync_init(&sch->bstats); + memset(&sch->qstats, 0, sizeof(sch->qstats)); + + /* MQ supports lockless qdiscs. However, statistics accounting needs +--- a/net/sched/sch_mqprio.c ++++ b/net/sched/sch_mqprio.c +@@ -390,7 +390,7 @@ static int mqprio_dump(struct Qdisc *sch + unsigned int ntx, tc; + + sch->q.qlen = 0; +- gnet_stats_basic_packed_init(&sch->bstats); ++ gnet_stats_basic_sync_init(&sch->bstats); + memset(&sch->qstats, 0, sizeof(sch->qstats)); + + /* MQ supports lockless qdiscs. However, statistics accounting needs +@@ -504,11 +504,11 @@ static int mqprio_dump_class_stats(struc + int i; + __u32 qlen = 0; + struct gnet_stats_queue qstats = {0}; +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + struct net_device *dev = qdisc_dev(sch); + struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK]; + +- gnet_stats_basic_packed_init(&bstats); ++ gnet_stats_basic_sync_init(&bstats); + /* Drop lock here it will be reclaimed before touching + * statistics this is required because the d->lock we + * hold here is the look on dev_queue->qdisc_sleeping +--- a/net/sched/sch_qfq.c ++++ b/net/sched/sch_qfq.c +@@ -131,7 +131,7 @@ struct qfq_class { + + unsigned int filter_cnt; + +- struct gnet_stats_basic_packed bstats; ++ struct gnet_stats_basic_sync bstats; + struct gnet_stats_queue qstats; + struct net_rate_estimator __rcu *rate_est; + struct Qdisc *qdisc; +@@ -465,7 +465,7 @@ static int qfq_change_class(struct Qdisc + if (cl == NULL) + return -ENOBUFS; + +- gnet_stats_basic_packed_init(&cl->bstats); ++ gnet_stats_basic_sync_init(&cl->bstats); + cl->common.classid = classid; + cl->deficit = lmax; + diff --git a/patches/0009-drm-i915-Drop-the-irqs_disabled-check.patch b/patches/0009-drm-i915-Drop-the-irqs_disabled-check.patch new file mode 100644 index 000000000000..8fc17fbdd8f8 --- /dev/null +++ b/patches/0009-drm-i915-Drop-the-irqs_disabled-check.patch @@ -0,0 +1,38 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Fri, 1 Oct 2021 20:01:03 +0200 +Subject: [PATCH 09/10] drm/i915: Drop the irqs_disabled() check + +The !irqs_disabled() check triggers on PREEMPT_RT even with +i915_sched_engine::lock acquired. 
The reason is the lock is transformed +into a sleeping lock on PREEMPT_RT and does not disable interrupts. + +There is no need to check for disabled interrupts. The lockdep +annotation below already checks if the lock has been acquired by the +caller and will yell if the interrupts are not disabled. + +Remove the !irqs_disabled() check. + +Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + drivers/gpu/drm/i915/i915_request.c | 2 -- + 1 file changed, 2 deletions(-) + +--- a/drivers/gpu/drm/i915/i915_request.c ++++ b/drivers/gpu/drm/i915/i915_request.c +@@ -559,7 +559,6 @@ bool __i915_request_submit(struct i915_r + + RQ_TRACE(request, "\n"); + +- GEM_BUG_ON(!irqs_disabled()); + lockdep_assert_held(&engine->sched_engine->lock); + + /* +@@ -668,7 +667,6 @@ void __i915_request_unsubmit(struct i915 + */ + RQ_TRACE(request, "\n"); + +- GEM_BUG_ON(!irqs_disabled()); + lockdep_assert_held(&engine->sched_engine->lock); + + /* diff --git a/patches/0009-net-sched-Remove-Qdisc-running-sequence-counter.patch b/patches/0009-net-sched-Remove-Qdisc-running-sequence-counter.patch new file mode 100644 index 000000000000..cd48c16f0e5a --- /dev/null ++++ b/patches/0009-net-sched-Remove-Qdisc-running-sequence-counter.patch @@ -0,0 +1,822 @@ +From: "Ahmed S. Darwish" <a.darwish@linutronix.de> +Date: Fri, 17 Sep 2021 13:31:41 +0200 +Subject: [PATCH 09/10] net: sched: Remove Qdisc::running sequence counter + +The Qdisc::running sequence counter has two uses: + + 1. Reliably reading qdisc's tc statistics while the qdisc is running + (a seqcount read/retry loop at __gnet_stats_copy_basic()). + + 2. As a flag, indicating whether the qdisc in question is running + (without any retry loops). + +For the first usage, the Qdisc::running sequence counter write section, +qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what +is actually needed: the raw qdisc's bstats update. A u64_stats sync +point was thus introduced (in previous commits) inside the bstats +structure itself. A local u64_stats write section is then started and +stopped for the bstats updates. + +Use that u64_stats sync point mechanism for the bstats read/retry loop +at __gnet_stats_copy_basic(). + +For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag, +accessed with atomic bitops, is sufficient. Using a bit flag instead of +a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads +to the SMP barriers implicitly added through raw_read_seqcount() and +write_seqcount_begin/end() getting removed. All call sites have been +surveyed though, and no required ordering was identified. + +Now that the qdisc->running sequence counter is no longer used, remove +it. + +Note, using u64_stats implies no sequence counter protection for 64-bit +architectures. This can lead to the qdisc tc statistics "packets" vs. +"bytes" values getting out of sync on rare occasions. The individual +values will still be valid. + +Signed-off-by: Ahmed S. 
Darwish <a.darwish@linutronix.de> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + include/linux/netdevice.h | 4 --- + include/net/gen_stats.h | 22 +++++++++---------- + include/net/sch_generic.h | 33 ++++++++++++----------------- + net/core/gen_estimator.c | 16 +++++++++----- + net/core/gen_stats.c | 51 ++++++++++++++++++++++++++-------------------- + net/sched/act_api.c | 9 ++++---- + net/sched/act_police.c | 2 - + net/sched/sch_api.c | 16 ++------------ + net/sched/sch_atm.c | 3 -- + net/sched/sch_cbq.c | 9 ++------ + net/sched/sch_drr.c | 10 ++------- + net/sched/sch_ets.c | 3 -- + net/sched/sch_generic.c | 10 +-------- + net/sched/sch_hfsc.c | 8 ++----- + net/sched/sch_htb.c | 7 ++---- + net/sched/sch_mq.c | 8 ++----- + net/sched/sch_mqprio.c | 16 ++++++-------- + net/sched/sch_multiq.c | 3 -- + net/sched/sch_prio.c | 4 +-- + net/sched/sch_qfq.c | 7 ++---- + net/sched/sch_taprio.c | 2 - + 21 files changed, 106 insertions(+), 137 deletions(-) + +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -1916,7 +1916,6 @@ enum netdev_ml_priv_type { + * @sfp_bus: attached &struct sfp_bus structure. + * + * @qdisc_tx_busylock: lockdep class annotating Qdisc->busylock spinlock +- * @qdisc_running_key: lockdep class annotating Qdisc->running seqcount + * + * @proto_down: protocol port state information can be sent to the + * switch driver and used to set the phys state of the +@@ -2250,7 +2249,6 @@ struct net_device { + struct phy_device *phydev; + struct sfp_bus *sfp_bus; + struct lock_class_key *qdisc_tx_busylock; +- struct lock_class_key *qdisc_running_key; + bool proto_down; + unsigned wol_enabled:1; + unsigned threaded:1; +@@ -2360,13 +2358,11 @@ static inline void netdev_for_each_tx_qu + #define netdev_lockdep_set_classes(dev) \ + { \ + static struct lock_class_key qdisc_tx_busylock_key; \ +- static struct lock_class_key qdisc_running_key; \ + static struct lock_class_key qdisc_xmit_lock_key; \ + static struct lock_class_key dev_addr_list_lock_key; \ + unsigned int i; \ + \ + (dev)->qdisc_tx_busylock = &qdisc_tx_busylock_key; \ +- (dev)->qdisc_running_key = &qdisc_running_key; \ + lockdep_set_class(&(dev)->addr_list_lock, \ + &dev_addr_list_lock_key); \ + for (i = 0; i < (dev)->num_tx_queues; i++) \ +--- a/include/net/gen_stats.h ++++ b/include/net/gen_stats.h +@@ -58,18 +58,18 @@ int gnet_stats_start_copy_compat(struct + spinlock_t *lock, struct gnet_dump *d, + int padattr); + +-int gnet_stats_copy_basic(const seqcount_t *running, +- struct gnet_dump *d, ++int gnet_stats_copy_basic(struct gnet_dump *d, + struct gnet_stats_basic_sync __percpu *cpu, +- struct gnet_stats_basic_sync *b); +-void __gnet_stats_copy_basic(const seqcount_t *running, +- struct gnet_stats_basic_sync *bstats, ++ struct gnet_stats_basic_sync *b, ++ bool running); ++void __gnet_stats_copy_basic(struct gnet_stats_basic_sync *bstats, + struct gnet_stats_basic_sync __percpu *cpu, +- struct gnet_stats_basic_sync *b); +-int gnet_stats_copy_basic_hw(const seqcount_t *running, +- struct gnet_dump *d, ++ struct gnet_stats_basic_sync *b, ++ bool running); ++int gnet_stats_copy_basic_hw(struct gnet_dump *d, + struct gnet_stats_basic_sync __percpu *cpu, +- struct gnet_stats_basic_sync *b); ++ struct gnet_stats_basic_sync *b, ++ bool running); + int gnet_stats_copy_rate_est(struct gnet_dump *d, + struct net_rate_estimator __rcu **ptr); + int gnet_stats_copy_queue(struct gnet_dump *d, +@@ -86,13 +86,13 @@ int gen_new_estimator(struct gnet_stats_ + struct gnet_stats_basic_sync __percpu 
*cpu_bstats, + struct net_rate_estimator __rcu **rate_est, + spinlock_t *lock, +- seqcount_t *running, struct nlattr *opt); ++ bool running, struct nlattr *opt); + void gen_kill_estimator(struct net_rate_estimator __rcu **ptr); + int gen_replace_estimator(struct gnet_stats_basic_sync *bstats, + struct gnet_stats_basic_sync __percpu *cpu_bstats, + struct net_rate_estimator __rcu **ptr, + spinlock_t *lock, +- seqcount_t *running, struct nlattr *opt); ++ bool running, struct nlattr *opt); + bool gen_estimator_active(struct net_rate_estimator __rcu **ptr); + bool gen_estimator_read(struct net_rate_estimator __rcu **ptr, + struct gnet_stats_rate_est64 *sample); +--- a/include/net/sch_generic.h ++++ b/include/net/sch_generic.h +@@ -38,6 +38,10 @@ enum qdisc_state_t { + __QDISC_STATE_DEACTIVATED, + __QDISC_STATE_MISSED, + __QDISC_STATE_DRAINING, ++ /* Only for !TCQ_F_NOLOCK qdisc. Never access it directly. ++ * Use qdisc_run_begin/end() or qdisc_is_running() instead. ++ */ ++ __QDISC_STATE_RUNNING, + }; + + #define QDISC_STATE_MISSED BIT(__QDISC_STATE_MISSED) +@@ -108,7 +112,6 @@ struct Qdisc { + struct sk_buff_head gso_skb ____cacheline_aligned_in_smp; + struct qdisc_skb_head q; + struct gnet_stats_basic_sync bstats; +- seqcount_t running; + struct gnet_stats_queue qstats; + unsigned long state; + struct Qdisc *next_sched; +@@ -143,11 +146,15 @@ static inline struct Qdisc *qdisc_refcou + return NULL; + } + ++/* For !TCQ_F_NOLOCK qdisc: callers must either call this within a qdisc ++ * root_lock section, or provide their own memory barriers -- ordering ++ * against qdisc_run_begin/end() atomic bit operations. ++ */ + static inline bool qdisc_is_running(struct Qdisc *qdisc) + { + if (qdisc->flags & TCQ_F_NOLOCK) + return spin_is_locked(&qdisc->seqlock); +- return (raw_read_seqcount(&qdisc->running) & 1) ? true : false; ++ return test_bit(__QDISC_STATE_RUNNING, &qdisc->state); + } + + static inline bool nolock_qdisc_is_empty(const struct Qdisc *qdisc) +@@ -167,6 +174,9 @@ static inline bool qdisc_is_empty(const + return !READ_ONCE(qdisc->q.qlen); + } + ++/* For !TCQ_F_NOLOCK qdisc, qdisc_run_begin/end() must be invoked with ++ * the qdisc root lock acquired. ++ */ + static inline bool qdisc_run_begin(struct Qdisc *qdisc) + { + if (qdisc->flags & TCQ_F_NOLOCK) { +@@ -206,15 +216,8 @@ static inline bool qdisc_run_begin(struc + * after it releases the lock at the end of qdisc_run_end(). + */ + return spin_trylock(&qdisc->seqlock); +- } else if (qdisc_is_running(qdisc)) { +- return false; + } +- /* Variant of write_seqcount_begin() telling lockdep a trylock +- * was attempted. 
+- */ +- raw_write_seqcount_begin(&qdisc->running); +- seqcount_acquire(&qdisc->running.dep_map, 0, 1, _RET_IP_); +- return true; ++ return !test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state); + } + + static inline void qdisc_run_end(struct Qdisc *qdisc) +@@ -226,7 +229,7 @@ static inline void qdisc_run_end(struct + &qdisc->state))) + __netif_schedule(qdisc); + } else { +- write_seqcount_end(&qdisc->running); ++ clear_bit(__QDISC_STATE_RUNNING, &qdisc->state); + } + } + +@@ -590,14 +593,6 @@ static inline spinlock_t *qdisc_root_sle + return qdisc_lock(root); + } + +-static inline seqcount_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc) +-{ +- struct Qdisc *root = qdisc_root_sleeping(qdisc); +- +- ASSERT_RTNL(); +- return &root->running; +-} +- + static inline struct net_device *qdisc_dev(const struct Qdisc *qdisc) + { + return qdisc->dev_queue->dev; +--- a/net/core/gen_estimator.c ++++ b/net/core/gen_estimator.c +@@ -42,7 +42,7 @@ + struct net_rate_estimator { + struct gnet_stats_basic_sync *bstats; + spinlock_t *stats_lock; +- seqcount_t *running; ++ bool running; + struct gnet_stats_basic_sync __percpu *cpu_bstats; + u8 ewma_log; + u8 intvl_log; /* period : (250ms << intvl_log) */ +@@ -66,7 +66,7 @@ static void est_fetch_counters(struct ne + if (e->stats_lock) + spin_lock(e->stats_lock); + +- __gnet_stats_copy_basic(e->running, b, e->cpu_bstats, e->bstats); ++ __gnet_stats_copy_basic(b, e->cpu_bstats, e->bstats, e->running); + + if (e->stats_lock) + spin_unlock(e->stats_lock); +@@ -113,7 +113,9 @@ static void est_timer(struct timer_list + * @cpu_bstats: bstats per cpu + * @rate_est: rate estimator statistics + * @lock: lock for statistics and control path +- * @running: qdisc running seqcount ++ * @running: true if @bstats represents a running qdisc, thus @bstats' ++ * internal values might change during basic reads. Only used ++ * if @cpu_bstats is NULL + * @opt: rate estimator configuration TLV + * + * Creates a new rate estimator with &bstats as source and &rate_est +@@ -129,7 +131,7 @@ int gen_new_estimator(struct gnet_stats_ + struct gnet_stats_basic_sync __percpu *cpu_bstats, + struct net_rate_estimator __rcu **rate_est, + spinlock_t *lock, +- seqcount_t *running, ++ bool running, + struct nlattr *opt) + { + struct gnet_estimator *parm = nla_data(opt); +@@ -218,7 +220,9 @@ EXPORT_SYMBOL(gen_kill_estimator); + * @cpu_bstats: bstats per cpu + * @rate_est: rate estimator statistics + * @lock: lock for statistics and control path +- * @running: qdisc running seqcount (might be NULL) ++ * @running: true if @bstats represents a running qdisc, thus @bstats' ++ * internal values might change during basic reads. 
Only used ++ * if @cpu_bstats is NULL + * @opt: rate estimator configuration TLV + * + * Replaces the configuration of a rate estimator by calling +@@ -230,7 +234,7 @@ int gen_replace_estimator(struct gnet_st + struct gnet_stats_basic_sync __percpu *cpu_bstats, + struct net_rate_estimator __rcu **rate_est, + spinlock_t *lock, +- seqcount_t *running, struct nlattr *opt) ++ bool running, struct nlattr *opt) + { + return gen_new_estimator(bstats, cpu_bstats, rate_est, + lock, running, opt); +--- a/net/core/gen_stats.c ++++ b/net/core/gen_stats.c +@@ -150,42 +150,43 @@ static void + } + + void +-__gnet_stats_copy_basic(const seqcount_t *running, +- struct gnet_stats_basic_sync *bstats, ++__gnet_stats_copy_basic(struct gnet_stats_basic_sync *bstats, + struct gnet_stats_basic_sync __percpu *cpu, +- struct gnet_stats_basic_sync *b) ++ struct gnet_stats_basic_sync *b, ++ bool running) + { +- unsigned int seq; ++ unsigned int start; + __u64 bytes = 0; + __u64 packets = 0; + ++ WARN_ON_ONCE((cpu || running) && !in_task()); ++ + if (cpu) { + __gnet_stats_copy_basic_cpu(bstats, cpu); + return; + } + do { + if (running) +- seq = read_seqcount_begin(running); ++ start = u64_stats_fetch_begin_irq(&b->syncp); + bytes = u64_stats_read(&b->bytes); + packets = u64_stats_read(&b->packets); +- } while (running && read_seqcount_retry(running, seq)); ++ } while (running && u64_stats_fetch_retry_irq(&b->syncp, start)); + + _bstats_update(bstats, bytes, packets); + } + EXPORT_SYMBOL(__gnet_stats_copy_basic); + + static int +-___gnet_stats_copy_basic(const seqcount_t *running, +- struct gnet_dump *d, ++___gnet_stats_copy_basic(struct gnet_dump *d, + struct gnet_stats_basic_sync __percpu *cpu, + struct gnet_stats_basic_sync *b, +- int type) ++ int type, bool running) + { + struct gnet_stats_basic_sync bstats; + u64 bstats_bytes, bstats_packets; + + gnet_stats_basic_sync_init(&bstats); +- __gnet_stats_copy_basic(running, &bstats, cpu, b); ++ __gnet_stats_copy_basic(&bstats, cpu, b, running); + + bstats_bytes = u64_stats_read(&bstats.bytes); + bstats_packets = u64_stats_read(&bstats.packets); +@@ -214,10 +215,14 @@ static int + + /** + * gnet_stats_copy_basic - copy basic statistics into statistic TLV +- * @running: seqcount_t pointer + * @d: dumping handle + * @cpu: copy statistic per cpu + * @b: basic statistics ++ * @running: true if @b represents a running qdisc, thus @b's ++ * internal values might change during basic reads. ++ * Only used if @cpu is NULL ++ * ++ * Context: task; must not be run from IRQ or BH contexts + * + * Appends the basic statistics to the top level TLV created by + * gnet_stats_start_copy(). +@@ -226,22 +231,25 @@ static int + * if the room in the socket buffer was not sufficient. + */ + int +-gnet_stats_copy_basic(const seqcount_t *running, +- struct gnet_dump *d, ++gnet_stats_copy_basic(struct gnet_dump *d, + struct gnet_stats_basic_sync __percpu *cpu, +- struct gnet_stats_basic_sync *b) ++ struct gnet_stats_basic_sync *b, ++ bool running) + { +- return ___gnet_stats_copy_basic(running, d, cpu, b, +- TCA_STATS_BASIC); ++ return ___gnet_stats_copy_basic(d, cpu, b, TCA_STATS_BASIC, running); + } + EXPORT_SYMBOL(gnet_stats_copy_basic); + + /** + * gnet_stats_copy_basic_hw - copy basic hw statistics into statistic TLV +- * @running: seqcount_t pointer + * @d: dumping handle + * @cpu: copy statistic per cpu + * @b: basic statistics ++ * @running: true if @b represents a running qdisc, thus @b's ++ * internal values might change during basic reads. 
++ * Only used if @cpu is NULL ++ * ++ * Context: task; must not be run from IRQ or BH contexts + * + * Appends the basic statistics to the top level TLV created by + * gnet_stats_start_copy(). +@@ -250,13 +258,12 @@ EXPORT_SYMBOL(gnet_stats_copy_basic); + * if the room in the socket buffer was not sufficient. + */ + int +-gnet_stats_copy_basic_hw(const seqcount_t *running, +- struct gnet_dump *d, ++gnet_stats_copy_basic_hw(struct gnet_dump *d, + struct gnet_stats_basic_sync __percpu *cpu, +- struct gnet_stats_basic_sync *b) ++ struct gnet_stats_basic_sync *b, ++ bool running) + { +- return ___gnet_stats_copy_basic(running, d, cpu, b, +- TCA_STATS_BASIC_HW); ++ return ___gnet_stats_copy_basic(d, cpu, b, TCA_STATS_BASIC_HW, running); + } + EXPORT_SYMBOL(gnet_stats_copy_basic_hw); + +--- a/net/sched/act_api.c ++++ b/net/sched/act_api.c +@@ -501,7 +501,7 @@ int tcf_idr_create(struct tc_action_net + if (est) { + err = gen_new_estimator(&p->tcfa_bstats, p->cpu_bstats, + &p->tcfa_rate_est, +- &p->tcfa_lock, NULL, est); ++ &p->tcfa_lock, false, est); + if (err) + goto err4; + } +@@ -1173,9 +1173,10 @@ int tcf_action_copy_stats(struct sk_buff + if (err < 0) + goto errout; + +- if (gnet_stats_copy_basic(NULL, &d, p->cpu_bstats, &p->tcfa_bstats) < 0 || +- gnet_stats_copy_basic_hw(NULL, &d, p->cpu_bstats_hw, +- &p->tcfa_bstats_hw) < 0 || ++ if (gnet_stats_copy_basic(&d, p->cpu_bstats, ++ &p->tcfa_bstats, false ) < 0 || ++ gnet_stats_copy_basic_hw(&d, p->cpu_bstats_hw, ++ &p->tcfa_bstats_hw, false) < 0 || + gnet_stats_copy_rate_est(&d, &p->tcfa_rate_est) < 0 || + gnet_stats_copy_queue(&d, p->cpu_qstats, + &p->tcfa_qstats, +--- a/net/sched/act_police.c ++++ b/net/sched/act_police.c +@@ -125,7 +125,7 @@ static int tcf_police_init(struct net *n + police->common.cpu_bstats, + &police->tcf_rate_est, + &police->tcf_lock, +- NULL, est); ++ false, est); + if (err) + goto failure; + } else if (tb[TCA_POLICE_AVRATE] && +--- a/net/sched/sch_api.c ++++ b/net/sched/sch_api.c +@@ -942,8 +942,7 @@ static int tc_fill_qdisc(struct sk_buff + cpu_qstats = q->cpu_qstats; + } + +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(q), +- &d, cpu_bstats, &q->bstats) < 0 || ++ if (gnet_stats_copy_basic(&d, cpu_bstats, &q->bstats, true) < 0 || + gnet_stats_copy_rate_est(&d, &q->rate_est) < 0 || + gnet_stats_copy_queue(&d, cpu_qstats, &q->qstats, qlen) < 0) + goto nla_put_failure; +@@ -1264,26 +1263,17 @@ static struct Qdisc *qdisc_create(struct + rcu_assign_pointer(sch->stab, stab); + } + if (tca[TCA_RATE]) { +- seqcount_t *running; +- + err = -EOPNOTSUPP; + if (sch->flags & TCQ_F_MQROOT) { + NL_SET_ERR_MSG(extack, "Cannot attach rate estimator to a multi-queue root qdisc"); + goto err_out4; + } + +- if (sch->parent != TC_H_ROOT && +- !(sch->flags & TCQ_F_INGRESS) && +- (!p || !(p->flags & TCQ_F_MQROOT))) +- running = qdisc_root_sleeping_running(sch); +- else +- running = &sch->running; +- + err = gen_new_estimator(&sch->bstats, + sch->cpu_bstats, + &sch->rate_est, + NULL, +- running, ++ true, + tca[TCA_RATE]); + if (err) { + NL_SET_ERR_MSG(extack, "Failed to generate new estimator"); +@@ -1359,7 +1349,7 @@ static int qdisc_change(struct Qdisc *sc + sch->cpu_bstats, + &sch->rate_est, + NULL, +- qdisc_root_sleeping_running(sch), ++ true, + tca[TCA_RATE]); + } + out: +--- a/net/sched/sch_atm.c ++++ b/net/sched/sch_atm.c +@@ -653,8 +653,7 @@ atm_tc_dump_class_stats(struct Qdisc *sc + { + struct atm_flow_data *flow = (struct atm_flow_data *)arg; + +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), +- d, NULL, 
&flow->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &flow->bstats, true) < 0 || + gnet_stats_copy_queue(d, NULL, &flow->qstats, flow->q->q.qlen) < 0) + return -1; + +--- a/net/sched/sch_cbq.c ++++ b/net/sched/sch_cbq.c +@@ -1383,8 +1383,7 @@ cbq_dump_class_stats(struct Qdisc *sch, + if (cl->undertime != PSCHED_PASTPERFECT) + cl->xstats.undertime = cl->undertime - q->now; + +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), +- d, NULL, &cl->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 || + gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 || + gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0) + return -1; +@@ -1518,7 +1517,7 @@ cbq_change_class(struct Qdisc *sch, u32 + err = gen_replace_estimator(&cl->bstats, NULL, + &cl->rate_est, + NULL, +- qdisc_root_sleeping_running(sch), ++ true, + tca[TCA_RATE]); + if (err) { + NL_SET_ERR_MSG(extack, "Failed to replace specified rate estimator"); +@@ -1619,9 +1618,7 @@ cbq_change_class(struct Qdisc *sch, u32 + + if (tca[TCA_RATE]) { + err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est, +- NULL, +- qdisc_root_sleeping_running(sch), +- tca[TCA_RATE]); ++ NULL, true, tca[TCA_RATE]); + if (err) { + NL_SET_ERR_MSG(extack, "Couldn't create new estimator"); + tcf_block_put(cl->block); +--- a/net/sched/sch_drr.c ++++ b/net/sched/sch_drr.c +@@ -85,8 +85,7 @@ static int drr_change_class(struct Qdisc + if (tca[TCA_RATE]) { + err = gen_replace_estimator(&cl->bstats, NULL, + &cl->rate_est, +- NULL, +- qdisc_root_sleeping_running(sch), ++ NULL, true, + tca[TCA_RATE]); + if (err) { + NL_SET_ERR_MSG(extack, "Failed to replace estimator"); +@@ -119,9 +118,7 @@ static int drr_change_class(struct Qdisc + + if (tca[TCA_RATE]) { + err = gen_replace_estimator(&cl->bstats, NULL, &cl->rate_est, +- NULL, +- qdisc_root_sleeping_running(sch), +- tca[TCA_RATE]); ++ NULL, true, tca[TCA_RATE]); + if (err) { + NL_SET_ERR_MSG(extack, "Failed to replace estimator"); + qdisc_put(cl->qdisc); +@@ -268,8 +265,7 @@ static int drr_dump_class_stats(struct Q + if (qlen) + xstats.deficit = cl->deficit; + +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), +- d, NULL, &cl->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 || + gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 || + gnet_stats_copy_queue(d, cl_q->cpu_qstats, &cl_q->qstats, qlen) < 0) + return -1; +--- a/net/sched/sch_ets.c ++++ b/net/sched/sch_ets.c +@@ -325,8 +325,7 @@ static int ets_class_dump_stats(struct Q + struct ets_class *cl = ets_class_from_arg(sch, arg); + struct Qdisc *cl_q = cl->qdisc; + +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), +- d, NULL, &cl_q->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &cl_q->bstats, true) < 0 || + qdisc_qstats_copy(d, cl_q) < 0) + return -1; + +--- a/net/sched/sch_generic.c ++++ b/net/sched/sch_generic.c +@@ -304,8 +304,8 @@ static struct sk_buff *dequeue_skb(struc + + /* + * Transmit possibly several skbs, and handle the return status as +- * required. Owning running seqcount bit guarantees that +- * only one CPU can execute this function. ++ * required. Owning qdisc running bit guarantees that only one CPU ++ * can execute this function. 
+ * + * Returns to the caller: + * false - hardware queue frozen backoff +@@ -606,7 +606,6 @@ struct Qdisc noop_qdisc = { + .ops = &noop_qdisc_ops, + .q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock), + .dev_queue = &noop_netdev_queue, +- .running = SEQCNT_ZERO(noop_qdisc.running), + .busylock = __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock), + .gso_skb = { + .next = (struct sk_buff *)&noop_qdisc.gso_skb, +@@ -867,7 +866,6 @@ struct Qdisc_ops pfifo_fast_ops __read_m + EXPORT_SYMBOL(pfifo_fast_ops); + + static struct lock_class_key qdisc_tx_busylock; +-static struct lock_class_key qdisc_running_key; + + struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue, + const struct Qdisc_ops *ops, +@@ -917,10 +915,6 @@ struct Qdisc *qdisc_alloc(struct netdev_ + lockdep_set_class(&sch->seqlock, + dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); + +- seqcount_init(&sch->running); +- lockdep_set_class(&sch->running, +- dev->qdisc_running_key ?: &qdisc_running_key); +- + sch->ops = ops; + sch->flags = ops->static_flags; + sch->enqueue = ops->enqueue; +--- a/net/sched/sch_hfsc.c ++++ b/net/sched/sch_hfsc.c +@@ -965,7 +965,7 @@ hfsc_change_class(struct Qdisc *sch, u32 + err = gen_replace_estimator(&cl->bstats, NULL, + &cl->rate_est, + NULL, +- qdisc_root_sleeping_running(sch), ++ true, + tca[TCA_RATE]); + if (err) + return err; +@@ -1033,9 +1033,7 @@ hfsc_change_class(struct Qdisc *sch, u32 + + if (tca[TCA_RATE]) { + err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est, +- NULL, +- qdisc_root_sleeping_running(sch), +- tca[TCA_RATE]); ++ NULL, true, tca[TCA_RATE]); + if (err) { + tcf_block_put(cl->block); + kfree(cl); +@@ -1328,7 +1326,7 @@ hfsc_dump_class_stats(struct Qdisc *sch, + xstats.work = cl->cl_total; + xstats.rtwork = cl->cl_cumul; + +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), d, NULL, &cl->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 || + gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 || + gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0) + return -1; +--- a/net/sched/sch_htb.c ++++ b/net/sched/sch_htb.c +@@ -1368,8 +1368,7 @@ htb_dump_class_stats(struct Qdisc *sch, + } + } + +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), +- d, NULL, &cl->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 || + gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 || + gnet_stats_copy_queue(d, NULL, &qs, qlen) < 0) + return -1; +@@ -1865,7 +1864,7 @@ static int htb_change_class(struct Qdisc + err = gen_new_estimator(&cl->bstats, NULL, + &cl->rate_est, + NULL, +- qdisc_root_sleeping_running(sch), ++ true, + tca[TCA_RATE] ? 
: &est.nla); + if (err) + goto err_block_put; +@@ -1991,7 +1990,7 @@ static int htb_change_class(struct Qdisc + err = gen_replace_estimator(&cl->bstats, NULL, + &cl->rate_est, + NULL, +- qdisc_root_sleeping_running(sch), ++ true, + tca[TCA_RATE]); + if (err) + return err; +--- a/net/sched/sch_mq.c ++++ b/net/sched/sch_mq.c +@@ -147,9 +147,8 @@ static int mq_dump(struct Qdisc *sch, st + + qlen = qdisc_qlen_sum(qdisc); + +- __gnet_stats_copy_basic(NULL, &sch->bstats, +- qdisc->cpu_bstats, +- &qdisc->bstats); ++ __gnet_stats_copy_basic(&sch->bstats, qdisc->cpu_bstats, ++ &qdisc->bstats, false); + __gnet_stats_copy_queue(&sch->qstats, + qdisc->cpu_qstats, + &qdisc->qstats, qlen); +@@ -235,8 +234,7 @@ static int mq_dump_class_stats(struct Qd + struct netdev_queue *dev_queue = mq_queue_get(sch, cl); + + sch = dev_queue->qdisc_sleeping; +- if (gnet_stats_copy_basic(&sch->running, d, sch->cpu_bstats, +- &sch->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, sch->cpu_bstats, &sch->bstats, true) < 0 || + qdisc_qstats_copy(d, sch) < 0) + return -1; + return 0; +--- a/net/sched/sch_mqprio.c ++++ b/net/sched/sch_mqprio.c +@@ -404,9 +404,8 @@ static int mqprio_dump(struct Qdisc *sch + qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping; + spin_lock_bh(qdisc_lock(qdisc)); + +- __gnet_stats_copy_basic(NULL, &sch->bstats, +- qdisc->cpu_bstats, +- &qdisc->bstats); ++ __gnet_stats_copy_basic(&sch->bstats, qdisc->cpu_bstats, ++ &qdisc->bstats, false); + __gnet_stats_copy_queue(&sch->qstats, + qdisc->cpu_qstats, + &qdisc->qstats, qlen); +@@ -524,9 +523,8 @@ static int mqprio_dump_class_stats(struc + spin_lock_bh(qdisc_lock(qdisc)); + + qlen = qdisc_qlen_sum(qdisc); +- __gnet_stats_copy_basic(NULL, &bstats, +- qdisc->cpu_bstats, +- &qdisc->bstats); ++ __gnet_stats_copy_basic(&bstats, qdisc->cpu_bstats, ++ &qdisc->bstats, false); + __gnet_stats_copy_queue(&qstats, + qdisc->cpu_qstats, + &qdisc->qstats, +@@ -537,15 +535,15 @@ static int mqprio_dump_class_stats(struc + /* Reclaim root sleeping lock before completing stats */ + if (d->lock) + spin_lock_bh(d->lock); +- if (gnet_stats_copy_basic(NULL, d, NULL, &bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &bstats, false) < 0 || + gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0) + return -1; + } else { + struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl); + + sch = dev_queue->qdisc_sleeping; +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), d, +- sch->cpu_bstats, &sch->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, sch->cpu_bstats, ++ &sch->bstats, true) < 0 || + qdisc_qstats_copy(d, sch) < 0) + return -1; + } +--- a/net/sched/sch_multiq.c ++++ b/net/sched/sch_multiq.c +@@ -338,8 +338,7 @@ static int multiq_dump_class_stats(struc + struct Qdisc *cl_q; + + cl_q = q->queues[cl - 1]; +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), +- d, cl_q->cpu_bstats, &cl_q->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, cl_q->cpu_bstats, &cl_q->bstats, true) < 0 || + qdisc_qstats_copy(d, cl_q) < 0) + return -1; + +--- a/net/sched/sch_prio.c ++++ b/net/sched/sch_prio.c +@@ -361,8 +361,8 @@ static int prio_dump_class_stats(struct + struct Qdisc *cl_q; + + cl_q = q->queues[cl - 1]; +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), +- d, cl_q->cpu_bstats, &cl_q->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, cl_q->cpu_bstats, ++ &cl_q->bstats, true) < 0 || + qdisc_qstats_copy(d, cl_q) < 0) + return -1; + +--- a/net/sched/sch_qfq.c ++++ b/net/sched/sch_qfq.c +@@ -451,7 +451,7 @@ static int qfq_change_class(struct Qdisc + err = 
gen_replace_estimator(&cl->bstats, NULL, + &cl->rate_est, + NULL, +- qdisc_root_sleeping_running(sch), ++ true, + tca[TCA_RATE]); + if (err) + return err; +@@ -478,7 +478,7 @@ static int qfq_change_class(struct Qdisc + err = gen_new_estimator(&cl->bstats, NULL, + &cl->rate_est, + NULL, +- qdisc_root_sleeping_running(sch), ++ true, + tca[TCA_RATE]); + if (err) + goto destroy_class; +@@ -640,8 +640,7 @@ static int qfq_dump_class_stats(struct Q + xstats.weight = cl->agg->class_weight; + xstats.lmax = cl->agg->lmax; + +- if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), +- d, NULL, &cl->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 || + gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 || + qdisc_qstats_copy(d, cl->qdisc) < 0) + return -1; +--- a/net/sched/sch_taprio.c ++++ b/net/sched/sch_taprio.c +@@ -1973,7 +1973,7 @@ static int taprio_dump_class_stats(struc + struct netdev_queue *dev_queue = taprio_queue_get(sch, cl); + + sch = dev_queue->qdisc_sleeping; +- if (gnet_stats_copy_basic(&sch->running, d, NULL, &sch->bstats) < 0 || ++ if (gnet_stats_copy_basic(d, NULL, &sch->bstats, true) < 0 || + qdisc_qstats_copy(d, sch) < 0) + return -1; + return 0; diff --git a/patches/0010-drm-i915-Don-t-disable-interrupts-and-pretend-a-lock.patch b/patches/0010-drm-i915-Don-t-disable-interrupts-and-pretend-a-lock.patch new file mode 100644 index 000000000000..0021138be365 --- /dev/null +++ b/patches/0010-drm-i915-Don-t-disable-interrupts-and-pretend-a-lock.patch @@ -0,0 +1,176 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Tue, 7 Jul 2020 12:25:11 +0200 +Subject: [PATCH 10/10] drm/i915: Don't disable interrupts and pretend a lock + has been acquired in __timeline_mark_lock(). + +This is a revert of commits + d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe") + 6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by timeline->mutex") + +The existing code leads to a different behaviour depending on whether +lockdep is enabled or not. Any following lock that is acquired without +disabling interrupts (but needs to) will not be noticed by lockdep. + +This is not just a lockdep annotation: there is an actual mutex_t that +is properly used as a lock elsewhere, but in the case of +__timeline_mark_lock() lockdep is merely told that the lock was +acquired while no lock has actually been taken. + +It appears that its purpose is just to satisfy the lockdep_assert_held() +check in intel_context_mark_active(). The other problem with disabling +interrupts is that on PREEMPT_RT interrupts are also disabled which +leads to problems for instance later during memory allocation. + +Add an argument to intel_context_mark_active() which is true if the lock +must have been acquired, false if other magic is involved and the lock +is not needed. Use the `false' argument only from within +switch_to_kernel_context() and remove __timeline_mark_lock().
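As a rough illustration of the new calling convention, here is a minimal userspace sketch. This is not the i915 code; the timeline/mark_active names are invented for the example. The point is that the assertion is evaluated only when the caller states that the mutex must be held, instead of pretending an acquisition to the lock validator:

/* Build with: cc -pthread sketch.c */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct timeline {
	pthread_mutex_t mutex;	/* stands in for ce->timeline->mutex */
	bool held;		/* crude stand-in for lockdep's book-keeping */
	int active_count;	/* stands in for ce->active_count */
};

static void timeline_lock(struct timeline *tl)
{
	pthread_mutex_lock(&tl->mutex);
	tl->held = true;
}

static void timeline_unlock(struct timeline *tl)
{
	tl->held = false;
	pthread_mutex_unlock(&tl->mutex);
}

/* Shape of intel_context_mark_active() after the patch: the check is
 * skipped when the caller is serialized by other means.
 */
static void mark_active(struct timeline *tl, bool mutex_needed)
{
	if (mutex_needed)
		assert(tl->held);	/* ~ lockdep_assert_held() */
	tl->active_count++;
}

int main(void)
{
	struct timeline tl = { .mutex = PTHREAD_MUTEX_INITIALIZER };

	/* Regular request-creation path: mutex held, assertion armed. */
	timeline_lock(&tl);
	mark_active(&tl, true);
	timeline_unlock(&tl);

	/* switch_to_kernel_context()-like path: no mutex, no pretence. */
	mark_active(&tl, false);

	printf("active_count=%d\n", tl.active_count);
	return 0;
}

The design choice is that lying to the validator hides real ordering bugs from every later acquisition, while an explicit boolean keeps the assertion honest on the paths that can use it.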
+ +Cc: Peter Zijlstra <peterz@infradead.org> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + drivers/gpu/drm/i915/gt/intel_context.h | 6 ++- + drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 2 - + drivers/gpu/drm/i915/gt/intel_engine_pm.c | 38 ----------------------- + drivers/gpu/drm/i915/i915_request.c | 7 ++-- + drivers/gpu/drm/i915/i915_request.h | 3 + + 5 files changed, 12 insertions(+), 44 deletions(-) + +--- a/drivers/gpu/drm/i915/gt/intel_context.h ++++ b/drivers/gpu/drm/i915/gt/intel_context.h +@@ -161,9 +161,11 @@ static inline void intel_context_enter(s + ce->ops->enter(ce); + } + +-static inline void intel_context_mark_active(struct intel_context *ce) ++static inline void intel_context_mark_active(struct intel_context *ce, ++ bool timeline_mutex_needed) + { +- lockdep_assert_held(&ce->timeline->mutex); ++ if (timeline_mutex_needed) ++ lockdep_assert_held(&ce->timeline->mutex); + ++ce->active_count; + } + +--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c ++++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +@@ -42,7 +42,7 @@ heartbeat_create(struct intel_context *c + struct i915_request *rq; + + intel_context_enter(ce); +- rq = __i915_request_create(ce, gfp); ++ rq = __i915_request_create(ce, gfp, true); + intel_context_exit(ce); + + return rq; +--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c ++++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c +@@ -80,39 +80,6 @@ static int __engine_unpark(struct intel_ + return 0; + } + +-#if IS_ENABLED(CONFIG_LOCKDEP) +- +-static unsigned long __timeline_mark_lock(struct intel_context *ce) +-{ +- unsigned long flags; +- +- local_irq_save(flags); +- mutex_acquire(&ce->timeline->mutex.dep_map, 2, 0, _THIS_IP_); +- +- return flags; +-} +- +-static void __timeline_mark_unlock(struct intel_context *ce, +- unsigned long flags) +-{ +- mutex_release(&ce->timeline->mutex.dep_map, _THIS_IP_); +- local_irq_restore(flags); +-} +- +-#else +- +-static unsigned long __timeline_mark_lock(struct intel_context *ce) +-{ +- return 0; +-} +- +-static void __timeline_mark_unlock(struct intel_context *ce, +- unsigned long flags) +-{ +-} +- +-#endif /* !IS_ENABLED(CONFIG_LOCKDEP) */ +- + static void duration(struct dma_fence *fence, struct dma_fence_cb *cb) + { + struct i915_request *rq = to_request(fence); +@@ -159,7 +126,6 @@ static bool switch_to_kernel_context(str + { + struct intel_context *ce = engine->kernel_context; + struct i915_request *rq; +- unsigned long flags; + bool result = true; + + /* GPU is pointing to the void, as good as in the kernel context. */ +@@ -201,10 +167,9 @@ static bool switch_to_kernel_context(str + * engine->wakeref.count, we may see the request completion and retire + * it causing an underflow of the engine->wakeref. + */ +- flags = __timeline_mark_lock(ce); + GEM_BUG_ON(atomic_read(&ce->timeline->active_count) < 0); + +- rq = __i915_request_create(ce, GFP_NOWAIT); ++ rq = __i915_request_create(ce, GFP_NOWAIT, false); + if (IS_ERR(rq)) + /* Context switch failed, hope for the best! Maybe reset? 
*/ + goto out_unlock; +@@ -233,7 +198,6 @@ static bool switch_to_kernel_context(str + + result = false; + out_unlock: +- __timeline_mark_unlock(ce, flags); + return result; + } + +--- a/drivers/gpu/drm/i915/i915_request.c ++++ b/drivers/gpu/drm/i915/i915_request.c +@@ -833,7 +833,8 @@ static void __i915_request_ctor(void *ar + } + + struct i915_request * +-__i915_request_create(struct intel_context *ce, gfp_t gfp) ++__i915_request_create(struct intel_context *ce, gfp_t gfp, ++ bool timeline_mutex_needed) + { + struct intel_timeline *tl = ce->timeline; + struct i915_request *rq; +@@ -957,7 +958,7 @@ struct i915_request * + + rq->infix = rq->ring->emit; /* end of header; start of user payload */ + +- intel_context_mark_active(ce); ++ intel_context_mark_active(ce, timeline_mutex_needed); + list_add_tail_rcu(&rq->link, &tl->requests); + + return rq; +@@ -993,7 +994,7 @@ i915_request_create(struct intel_context + i915_request_retire(rq); + + intel_context_enter(ce); +- rq = __i915_request_create(ce, GFP_KERNEL); ++ rq = __i915_request_create(ce, GFP_KERNEL, true); + intel_context_exit(ce); /* active reference transferred to request */ + if (IS_ERR(rq)) + goto err_unlock; +--- a/drivers/gpu/drm/i915/i915_request.h ++++ b/drivers/gpu/drm/i915/i915_request.h +@@ -320,7 +320,8 @@ static inline bool dma_fence_is_i915(con + struct kmem_cache *i915_request_slab_cache(void); + + struct i915_request * __must_check +-__i915_request_create(struct intel_context *ce, gfp_t gfp); ++__i915_request_create(struct intel_context *ce, gfp_t gfp, ++ bool timeline_mutex_needed); + struct i915_request * __must_check + i915_request_create(struct intel_context *ce); + diff --git a/patches/0010-sch_htb-Use-helpers-to-read-stats-in-dump_stats.patch b/patches/0010-sch_htb-Use-helpers-to-read-stats-in-dump_stats.patch new file mode 100644 index 000000000000..aaf05a532980 --- /dev/null +++ b/patches/0010-sch_htb-Use-helpers-to-read-stats-in-dump_stats.patch @@ -0,0 +1,81 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Fri, 8 Oct 2021 20:31:49 +0200 +Subject: [PATCH 10/10] sch_htb: Use helpers to read stats in ->dump_stats(). + +The read of the packets/bytes statistics in htb_dump_class_stats() +appears not to be synchronized. htb_dump_class_stats() does not acquire +locks but I'm not sure if the other `bstats' that are read can be +modified or are stable while this callback is invoked. + +Add a helper to read the two members while synchronizing against +seqcount_t.
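For reference, the fetch/retry read side that a helper like this relies on looks roughly as follows. This is a self-contained userspace model assuming a single writer; the stats64_* names are made up and the memory ordering is deliberately simplified, whereas the kernel uses the u64_stats_* primitives with proper annotations:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct stats64 {
	atomic_uint seq;	/* odd while an update is in progress */
	uint64_t bytes;
	uint64_t packets;
};

/* Single writer: bump seq to odd, update, bump back to even.
 * (Ordering simplified for the sketch.)
 */
static void stats64_add(struct stats64 *s, uint64_t bytes, uint64_t packets)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
	s->bytes += bytes;
	s->packets += packets;
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/* Reader, shaped like the helper above: retry until the snapshot is
 * consistent, i.e. seq was even and unchanged across the data reads.
 */
static void stats64_read(struct stats64 *s, uint64_t *bytes, uint64_t *packets)
{
	unsigned int start;

	do {
		while ((start = atomic_load_explicit(&s->seq,
						     memory_order_acquire)) & 1)
			;	/* writer active, wait for it to finish */
		*bytes = s->bytes;
		*packets = s->packets;
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&s->seq, memory_order_relaxed) != start);
}

int main(void)
{
	struct stats64 s = { .seq = 0 };
	uint64_t b, p;

	stats64_add(&s, 1500, 1);
	stats64_add(&s, 9000, 6);
	stats64_read(&s, &b, &p);
	printf("bytes=%llu packets=%llu\n",
	       (unsigned long long)b, (unsigned long long)p);
	return 0;
}

On 64-bit kernels the u64_stats seqcount collapses away entirely and the reads are plain loads; the retry loop only costs anything on 32-bit, which is what makes this scheme cheap enough for per-packet counters.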
+ +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + include/net/sch_generic.h | 16 ++++++++++++++++ + net/sched/sch_htb.c | 18 +++++++++--------- + 2 files changed, 25 insertions(+), 9 deletions(-) + +--- a/include/net/sch_generic.h ++++ b/include/net/sch_generic.h +@@ -849,6 +849,22 @@ static inline void _bstats_update(struct + u64_stats_update_end(&bstats->syncp); + } + ++static inline void bstats_read_add(struct gnet_stats_basic_sync *bstats, ++ __u64 *bytes, __u64 *packets) ++{ ++ u64 t_bytes, t_packets; ++ unsigned int start; ++ ++ do { ++ start = u64_stats_fetch_begin_irq(&bstats->syncp); ++ t_bytes = u64_stats_read(&bstats->bytes); ++ t_packets = u64_stats_read(&bstats->packets); ++ } while (u64_stats_fetch_retry_irq(&bstats->syncp, start)); ++ ++ *bytes = t_bytes; ++ *packets = t_packets; ++} ++ + static inline void bstats_update(struct gnet_stats_basic_sync *bstats, + const struct sk_buff *skb) + { +--- a/net/sched/sch_htb.c ++++ b/net/sched/sch_htb.c +@@ -1324,12 +1324,10 @@ static void htb_offload_aggregate_stats( + if (p != cl) + continue; + +- bytes += u64_stats_read(&c->bstats_bias.bytes); +- packets += u64_stats_read(&c->bstats_bias.packets); +- if (c->level == 0) { +- bytes += u64_stats_read(&c->leaf.q->bstats.bytes); +- packets += u64_stats_read(&c->leaf.q->bstats.packets); +- } ++ bstats_read_add(&c->bstats_bias, &bytes, &packets); ++ if (c->level == 0) ++ bstats_read_add(&c->leaf.q->bstats, ++ &bytes, &packets); + } + } + _bstats_update(&cl->bstats, bytes, packets); +@@ -1356,13 +1354,15 @@ htb_dump_class_stats(struct Qdisc *sch, + + if (q->offload) { + if (!cl->level) { ++ u64 bytes = 0, packets = 0; ++ + if (cl->leaf.q) + cl->bstats = cl->leaf.q->bstats; + else + gnet_stats_basic_sync_init(&cl->bstats); +- _bstats_update(&cl->bstats, +- u64_stats_read(&cl->bstats_bias.bytes), +- u64_stats_read(&cl->bstats_bias.packets)); ++ ++ bstats_read_add(&cl->bstats_bias, &bytes, &packets); ++ _bstats_update(&cl->bstats, bytes, packets); + } else { + htb_offload_aggregate_stats(q, cl); + } diff --git a/patches/Add_localversion_for_-RT_release.patch b/patches/Add_localversion_for_-RT_release.patch index e58a29adc4af..c8061e5a5d82 100644 --- a/patches/Add_localversion_for_-RT_release.patch +++ b/patches/Add_localversion_for_-RT_release.patch @@ -15,4 +15,4 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- /dev/null +++ b/localversion-rt @@ -0,0 +1 @@ -+-rt7 ++-rt8 diff --git a/patches/drm_i915_gt__Only_disable_interrupts_for_the_timeline_lock_on_force-threaded.patch b/patches/drm_i915_gt__Only_disable_interrupts_for_the_timeline_lock_on_force-threaded.patch deleted file mode 100644 index 4224a7f07b19..000000000000 --- a/patches/drm_i915_gt__Only_disable_interrupts_for_the_timeline_lock_on_force-threaded.patch +++ /dev/null @@ -1,49 +0,0 @@ -Subject: drm/i915/gt: Only disable interrupts for the timeline lock on !force-threaded -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Tue Jul 7 12:25:11 2020 +0200 - -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> - -According to commit - d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe") - -the intrrupts are disabled the code may be called from an interrupt -handler and from preemptible context. -With `force_irqthreads' set the timeline mutex is never observed in IRQ -context so it is not neede to disable interrupts. - -Disable only interrupts if not in `force_irqthreads' mode. 
- -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Signed-off-by: Thomas Gleixner <tglx@linutronix.de> - - ---- - drivers/gpu/drm/i915/gt/intel_engine_pm.c | 8 +++++--- - 1 file changed, 5 insertions(+), 3 deletions(-) ---- ---- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c -+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c -@@ -84,9 +84,10 @@ static int __engine_unpark(struct intel_ - - static unsigned long __timeline_mark_lock(struct intel_context *ce) - { -- unsigned long flags; -+ unsigned long flags = 0; - -- local_irq_save(flags); -+ if (!force_irqthreads()) -+ local_irq_save(flags); - mutex_acquire(&ce->timeline->mutex.dep_map, 2, 0, _THIS_IP_); - - return flags; -@@ -96,7 +97,8 @@ static void __timeline_mark_unlock(struc - unsigned long flags) - { - mutex_release(&ce->timeline->mutex.dep_map, _THIS_IP_); -- local_irq_restore(flags); -+ if (!force_irqthreads()) -+ local_irq_restore(flags); - } - - #else diff --git a/patches/drmradeoni915__Use_preempt_disable_enable_rt_where_recommended.patch b/patches/drmradeoni915__Use_preempt_disable_enable_rt_where_recommended.patch deleted file mode 100644 index 175722566ae6..000000000000 --- a/patches/drmradeoni915__Use_preempt_disable_enable_rt_where_recommended.patch +++ /dev/null @@ -1,56 +0,0 @@ -Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended -From: Mike Galbraith <umgwanakikbuti@gmail.com> -Date: Sat Feb 27 08:09:11 2016 +0100 - -From: Mike Galbraith <umgwanakikbuti@gmail.com> - -DRM folks identified the spots, so use them. - -Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> -Signed-off-by: Thomas Gleixner <tglx@linutronix.de> -Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Cc: linux-rt-users <linux-rt-users@vger.kernel.org> -Signed-off-by: Thomas Gleixner <tglx@linutronix.de> - - ---- - drivers/gpu/drm/i915/i915_irq.c | 2 ++ - drivers/gpu/drm/radeon/radeon_display.c | 2 ++ - 2 files changed, 4 insertions(+) ---- ---- a/drivers/gpu/drm/i915/i915_irq.c -+++ b/drivers/gpu/drm/i915/i915_irq.c -@@ -887,6 +887,7 @@ static bool i915_get_crtc_scanoutpos(str - spin_lock_irqsave(&dev_priv->uncore.lock, irqflags); - - /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */ -+ preempt_disable_rt(); - - /* Get optional system timestamp before query. */ - if (stime) -@@ -951,6 +952,7 @@ static bool i915_get_crtc_scanoutpos(str - *etime = ktime_get(); - - /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */ -+ preempt_enable_rt(); - - spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags); - ---- a/drivers/gpu/drm/radeon/radeon_display.c -+++ b/drivers/gpu/drm/radeon/radeon_display.c -@@ -1814,6 +1814,7 @@ int radeon_get_crtc_scanoutpos(struct dr - struct radeon_device *rdev = dev->dev_private; - - /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */ -+ preempt_disable_rt(); - - /* Get optional system timestamp before query. */ - if (stime) -@@ -1906,6 +1907,7 @@ int radeon_get_crtc_scanoutpos(struct dr - *etime = ktime_get(); - - /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */ -+ preempt_enable_rt(); - - /* Decode into vertical and horizontal scanout position. 
*/ - *vpos = position & 0x1fff; diff --git a/patches/genirq__update_irq_set_irqchip_state_documentation.patch b/patches/genirq__update_irq_set_irqchip_state_documentation.patch index 18c9bbca489f..c3b062d4fd3c 100644 --- a/patches/genirq__update_irq_set_irqchip_state_documentation.patch +++ b/patches/genirq__update_irq_set_irqchip_state_documentation.patch @@ -17,7 +17,7 @@ Link: https://lkml.kernel.org/r/20210917103055.92150-1-bigeasy@linutronix.de --- --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c -@@ -2834,7 +2834,7 @@ EXPORT_SYMBOL_GPL(irq_get_irqchip_state) +@@ -2833,7 +2833,7 @@ EXPORT_SYMBOL_GPL(irq_get_irqchip_state) * This call sets the internal irqchip state of an interrupt, * depending on the value of @which. * diff --git a/patches/irq-Export-force_irqthreads_key.patch b/patches/irq-Export-force_irqthreads_key.patch deleted file mode 100644 index 43667d1a7c24..000000000000 --- a/patches/irq-Export-force_irqthreads_key.patch +++ /dev/null @@ -1,22 +0,0 @@ -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Mon, 27 Sep 2021 11:59:17 +0200 -Subject: [PATCH] irq: Export force_irqthreads_key - -Temporary add the EXPORT_SYMBOL_GPL for force_irqthreads_key until it is -settled if it is needed or not. - -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> ---- - kernel/irq/manage.c | 1 + - 1 file changed, 1 insertion(+) - ---- a/kernel/irq/manage.c -+++ b/kernel/irq/manage.c -@@ -26,6 +26,7 @@ - - #if defined(CONFIG_IRQ_FORCED_THREADING) && !defined(CONFIG_PREEMPT_RT) - DEFINE_STATIC_KEY_FALSE(force_irqthreads_key); -+EXPORT_SYMBOL_GPL(force_irqthreads_key); - - static int __init setup_forced_irqthreads(char *arg) - { diff --git a/patches/net-core-disable-NET_RX_BUSY_POLL-on-PREEMPT_RT.patch b/patches/net-core-disable-NET_RX_BUSY_POLL-on-PREEMPT_RT.patch new file mode 100644 index 000000000000..6124f8358491 --- /dev/null +++ b/patches/net-core-disable-NET_RX_BUSY_POLL-on-PREEMPT_RT.patch @@ -0,0 +1,36 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Fri, 1 Oct 2021 16:58:41 +0200 +Subject: [PATCH] net/core: disable NET_RX_BUSY_POLL on PREEMPT_RT + +napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire +sleeping locks with disabled preemption, which would be required while +__napi_poll() invokes the callback of the driver. + +A threaded interrupt performing the NAPI poll can be preempted on PREEMPT_RT. +An RT thread on another CPU may observe the NAPIF_STATE_SCHED bit set and +busy-spin until it is cleared or its spin time runs out. Given it is the task +with the highest priority it will never observe the NEED_RESCHED bit set. +In this case the time is better spent by simply sleeping. + +NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for +poll/read are set to zero). Disable NET_RX_BUSY_POLL on PREEMPT_RT to avoid +a wrong locking context in case it is used.
+ +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Link: https://lore.kernel.org/r/20211001145841.2308454-1-bigeasy@linutronix.de +Signed-off-by: Jakub Kicinski <kuba@kernel.org> +--- + net/Kconfig | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/net/Kconfig ++++ b/net/Kconfig +@@ -294,7 +294,7 @@ config CGROUP_NET_CLASSID + + config NET_RX_BUSY_POLL + bool +- default y ++ default y if !PREEMPT_RT + + config BQL + bool diff --git a/patches/net_Qdisc__use_a_seqlock_instead_seqcount.patch b/patches/net_Qdisc__use_a_seqlock_instead_seqcount.patch deleted file mode 100644 index a0ee4e6fd3da..000000000000 --- a/patches/net_Qdisc__use_a_seqlock_instead_seqcount.patch +++ /dev/null @@ -1,286 +0,0 @@ -Subject: net/Qdisc: use a seqlock instead seqcount -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Wed Sep 14 17:36:35 2016 +0200 - -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> - -The seqcount disables preemption on -RT while it is held which can't -remove. Also we don't want the reader to spin for ages if the writer is -scheduled out. The seqlock on the other hand will serialize / sleep on -the lock while writer is active. - -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Signed-off-by: Thomas Gleixner <tglx@linutronix.de> - - ---- - include/net/gen_stats.h | 11 ++++++----- - include/net/net_seq_lock.h | 24 ++++++++++++++++++++++++ - include/net/sch_generic.h | 18 ++++++++++++++++-- - net/core/gen_estimator.c | 6 +++--- - net/core/gen_stats.c | 12 ++++++------ - net/sched/sch_api.c | 2 +- - net/sched/sch_generic.c | 10 ++++++++++ - 7 files changed, 66 insertions(+), 17 deletions(-) - create mode 100644 include/net/net_seq_lock.h ---- ---- a/include/net/gen_stats.h -+++ b/include/net/gen_stats.h -@@ -6,6 +6,7 @@ - #include <linux/socket.h> - #include <linux/rtnetlink.h> - #include <linux/pkt_sched.h> -+#include <net/net_seq_lock.h> - - /* Note: this used to be in include/uapi/linux/gen_stats.h */ - struct gnet_stats_basic_packed { -@@ -42,15 +43,15 @@ int gnet_stats_start_copy_compat(struct - spinlock_t *lock, struct gnet_dump *d, - int padattr); - --int gnet_stats_copy_basic(const seqcount_t *running, -+int gnet_stats_copy_basic(net_seqlock_t *running, - struct gnet_dump *d, - struct gnet_stats_basic_cpu __percpu *cpu, - struct gnet_stats_basic_packed *b); --void __gnet_stats_copy_basic(const seqcount_t *running, -+void __gnet_stats_copy_basic(net_seqlock_t *running, - struct gnet_stats_basic_packed *bstats, - struct gnet_stats_basic_cpu __percpu *cpu, - struct gnet_stats_basic_packed *b); --int gnet_stats_copy_basic_hw(const seqcount_t *running, -+int gnet_stats_copy_basic_hw(net_seqlock_t *running, - struct gnet_dump *d, - struct gnet_stats_basic_cpu __percpu *cpu, - struct gnet_stats_basic_packed *b); -@@ -70,13 +71,13 @@ int gen_new_estimator(struct gnet_stats_ - struct gnet_stats_basic_cpu __percpu *cpu_bstats, - struct net_rate_estimator __rcu **rate_est, - spinlock_t *lock, -- seqcount_t *running, struct nlattr *opt); -+ net_seqlock_t *running, struct nlattr *opt); - void gen_kill_estimator(struct net_rate_estimator __rcu **ptr); - int gen_replace_estimator(struct gnet_stats_basic_packed *bstats, - struct gnet_stats_basic_cpu __percpu *cpu_bstats, - struct net_rate_estimator __rcu **ptr, - spinlock_t *lock, -- seqcount_t *running, struct nlattr *opt); -+ net_seqlock_t *running, struct nlattr *opt); - bool gen_estimator_active(struct net_rate_estimator __rcu **ptr); - bool gen_estimator_read(struct 
net_rate_estimator __rcu **ptr, - struct gnet_stats_rate_est64 *sample); ---- /dev/null -+++ b/include/net/net_seq_lock.h -@@ -0,0 +1,24 @@ -+#ifndef __NET_NET_SEQ_LOCK_H__ -+#define __NET_NET_SEQ_LOCK_H__ -+ -+#ifdef CONFIG_PREEMPT_RT -+# define net_seqlock_t seqlock_t -+# define net_seq_begin(__r) read_seqbegin(__r) -+# define net_seq_retry(__r, __s) read_seqretry(__r, __s) -+ -+static inline int try_write_seqlock(seqlock_t *sl) -+{ -+ if (spin_trylock(&sl->lock)) { -+ write_seqcount_begin(&sl->seqcount); -+ return 1; -+ } -+ return 0; -+} -+ -+#else -+# define net_seqlock_t seqcount_t -+# define net_seq_begin(__r) read_seqcount_begin(__r) -+# define net_seq_retry(__r, __s) read_seqcount_retry(__r, __s) -+#endif -+ -+#endif ---- a/include/net/sch_generic.h -+++ b/include/net/sch_generic.h -@@ -10,6 +10,7 @@ - #include <linux/percpu.h> - #include <linux/dynamic_queue_limits.h> - #include <linux/list.h> -+#include <net/net_seq_lock.h> - #include <linux/refcount.h> - #include <linux/workqueue.h> - #include <linux/mutex.h> -@@ -108,7 +109,7 @@ struct Qdisc { - struct sk_buff_head gso_skb ____cacheline_aligned_in_smp; - struct qdisc_skb_head q; - struct gnet_stats_basic_packed bstats; -- seqcount_t running; -+ net_seqlock_t running; - struct gnet_stats_queue qstats; - unsigned long state; - struct Qdisc *next_sched; -@@ -147,7 +148,11 @@ static inline bool qdisc_is_running(stru - { - if (qdisc->flags & TCQ_F_NOLOCK) - return spin_is_locked(&qdisc->seqlock); -+#ifdef CONFIG_PREEMPT_RT -+ return spin_is_locked(&qdisc->running.lock); -+#else - return (raw_read_seqcount(&qdisc->running) & 1) ? true : false; -+#endif - } - - static inline bool nolock_qdisc_is_empty(const struct Qdisc *qdisc) -@@ -209,12 +214,17 @@ static inline bool qdisc_run_begin(struc - } else if (qdisc_is_running(qdisc)) { - return false; - } -+ -+#ifdef CONFIG_PREEMPT_RT -+ return try_write_seqlock(&qdisc->running); -+#else - /* Variant of write_seqcount_begin() telling lockdep a trylock - * was attempted. 
- */ - raw_write_seqcount_begin(&qdisc->running); - seqcount_acquire(&qdisc->running.dep_map, 0, 1, _RET_IP_); - return true; -+#endif - } - - static inline void qdisc_run_end(struct Qdisc *qdisc) -@@ -226,7 +236,11 @@ static inline void qdisc_run_end(struct - &qdisc->state))) - __netif_schedule(qdisc); - } else { -+#ifdef CONFIG_PREEMPT_RT -+ write_sequnlock(&qdisc->running); -+#else - write_seqcount_end(&qdisc->running); -+#endif - } - } - -@@ -590,7 +604,7 @@ static inline spinlock_t *qdisc_root_sle - return qdisc_lock(root); - } - --static inline seqcount_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc) -+static inline net_seqlock_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc) - { - struct Qdisc *root = qdisc_root_sleeping(qdisc); - ---- a/net/core/gen_estimator.c -+++ b/net/core/gen_estimator.c -@@ -42,7 +42,7 @@ - struct net_rate_estimator { - struct gnet_stats_basic_packed *bstats; - spinlock_t *stats_lock; -- seqcount_t *running; -+ net_seqlock_t *running; - struct gnet_stats_basic_cpu __percpu *cpu_bstats; - u8 ewma_log; - u8 intvl_log; /* period : (250ms << intvl_log) */ -@@ -125,7 +125,7 @@ int gen_new_estimator(struct gnet_stats_ - struct gnet_stats_basic_cpu __percpu *cpu_bstats, - struct net_rate_estimator __rcu **rate_est, - spinlock_t *lock, -- seqcount_t *running, -+ net_seqlock_t *running, - struct nlattr *opt) - { - struct gnet_estimator *parm = nla_data(opt); -@@ -226,7 +226,7 @@ int gen_replace_estimator(struct gnet_st - struct gnet_stats_basic_cpu __percpu *cpu_bstats, - struct net_rate_estimator __rcu **rate_est, - spinlock_t *lock, -- seqcount_t *running, struct nlattr *opt) -+ net_seqlock_t *running, struct nlattr *opt) - { - return gen_new_estimator(bstats, cpu_bstats, rate_est, - lock, running, opt); ---- a/net/core/gen_stats.c -+++ b/net/core/gen_stats.c -@@ -137,7 +137,7 @@ static void - } - - void --__gnet_stats_copy_basic(const seqcount_t *running, -+__gnet_stats_copy_basic(net_seqlock_t *running, - struct gnet_stats_basic_packed *bstats, - struct gnet_stats_basic_cpu __percpu *cpu, - struct gnet_stats_basic_packed *b) -@@ -150,15 +150,15 @@ void - } - do { - if (running) -- seq = read_seqcount_begin(running); -+ seq = net_seq_begin(running); - bstats->bytes = b->bytes; - bstats->packets = b->packets; -- } while (running && read_seqcount_retry(running, seq)); -+ } while (running && net_seq_retry(running, seq)); - } - EXPORT_SYMBOL(__gnet_stats_copy_basic); - - static int --___gnet_stats_copy_basic(const seqcount_t *running, -+___gnet_stats_copy_basic(net_seqlock_t *running, - struct gnet_dump *d, - struct gnet_stats_basic_cpu __percpu *cpu, - struct gnet_stats_basic_packed *b, -@@ -204,7 +204,7 @@ static int - * if the room in the socket buffer was not sufficient. - */ - int --gnet_stats_copy_basic(const seqcount_t *running, -+gnet_stats_copy_basic(net_seqlock_t *running, - struct gnet_dump *d, - struct gnet_stats_basic_cpu __percpu *cpu, - struct gnet_stats_basic_packed *b) -@@ -228,7 +228,7 @@ EXPORT_SYMBOL(gnet_stats_copy_basic); - * if the room in the socket buffer was not sufficient. 
- */ - int --gnet_stats_copy_basic_hw(const seqcount_t *running, -+gnet_stats_copy_basic_hw(net_seqlock_t *running, - struct gnet_dump *d, - struct gnet_stats_basic_cpu __percpu *cpu, - struct gnet_stats_basic_packed *b) ---- a/net/sched/sch_api.c -+++ b/net/sched/sch_api.c -@@ -1258,7 +1258,7 @@ static struct Qdisc *qdisc_create(struct - rcu_assign_pointer(sch->stab, stab); - } - if (tca[TCA_RATE]) { -- seqcount_t *running; -+ net_seqlock_t *running; - - err = -EOPNOTSUPP; - if (sch->flags & TCQ_F_MQROOT) { ---- a/net/sched/sch_generic.c -+++ b/net/sched/sch_generic.c -@@ -606,7 +606,11 @@ struct Qdisc noop_qdisc = { - .ops = &noop_qdisc_ops, - .q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock), - .dev_queue = &noop_netdev_queue, -+#ifdef CONFIG_PREEMPT_RT -+ .running = __SEQLOCK_UNLOCKED(noop_qdisc.running), -+#else - .running = SEQCNT_ZERO(noop_qdisc.running), -+#endif - .busylock = __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock), - .gso_skb = { - .next = (struct sk_buff *)&noop_qdisc.gso_skb, -@@ -916,9 +920,15 @@ struct Qdisc *qdisc_alloc(struct netdev_ - lockdep_set_class(&sch->seqlock, - dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); - -+#ifdef CONFIG_PREEMPT_RT -+ seqlock_init(&sch->running); -+ lockdep_set_class(&sch->running.lock, -+ dev->qdisc_running_key ?: &qdisc_running_key); -+#else - seqcount_init(&sch->running); - lockdep_set_class(&sch->running, - dev->qdisc_running_key ?: &qdisc_running_key); -+#endif - - sch->ops = ops; - sch->flags = ops->static_flags; diff --git a/patches/net__Properly_annotate_the_try-lock_for_the_seqlock.patch b/patches/net__Properly_annotate_the_try-lock_for_the_seqlock.patch deleted file mode 100644 index ea946d2079ac..000000000000 --- a/patches/net__Properly_annotate_the_try-lock_for_the_seqlock.patch +++ /dev/null @@ -1,68 +0,0 @@ -Subject: net: Properly annotate the try-lock for the seqlock -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Tue Sep 8 16:57:11 2020 +0200 - -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> - -In patch - ("net/Qdisc: use a seqlock instead seqcount") - -the seqcount has been replaced with a seqlock to allow to reader to -boost the preempted writer. -The try_write_seqlock() acquired the lock with a try-lock but the -seqcount annotation was "lock". - -Opencode write_seqcount_t_begin() and use the try-lock annotation for -lockdep. 
- -Reported-by: Mike Galbraith <efault@gmx.de> -Cc: stable-rt@vger.kernel.org -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Signed-off-by: Thomas Gleixner <tglx@linutronix.de> - - ---- - include/net/net_seq_lock.h | 9 --------- - include/net/sch_generic.h | 13 ++++++++++++- - 2 files changed, 12 insertions(+), 10 deletions(-) ---- ---- a/include/net/net_seq_lock.h -+++ b/include/net/net_seq_lock.h -@@ -6,15 +6,6 @@ - # define net_seq_begin(__r) read_seqbegin(__r) - # define net_seq_retry(__r, __s) read_seqretry(__r, __s) - --static inline int try_write_seqlock(seqlock_t *sl) --{ -- if (spin_trylock(&sl->lock)) { -- write_seqcount_begin(&sl->seqcount); -- return 1; -- } -- return 0; --} -- - #else - # define net_seqlock_t seqcount_t - # define net_seq_begin(__r) read_seqcount_begin(__r) ---- a/include/net/sch_generic.h -+++ b/include/net/sch_generic.h -@@ -216,7 +216,18 @@ static inline bool qdisc_run_begin(struc - } - - #ifdef CONFIG_PREEMPT_RT -- return try_write_seqlock(&qdisc->running); -+ if (spin_trylock(&qdisc->running.lock)) { -+ seqcount_t *s = &qdisc->running.seqcount.seqcount; -+ -+ /* -+ * Variant of write_seqcount_t_begin() telling lockdep that -+ * a trylock was attempted. -+ */ -+ do_raw_write_seqcount_begin(s); -+ seqcount_acquire(&s->dep_map, 0, 1, _RET_IP_); -+ return true; -+ } -+ return false; - #else - /* Variant of write_seqcount_begin() telling lockdep a trylock - * was attempted. diff --git a/patches/net_core__disable_NET_RX_BUSY_POLL_on_RT.patch b/patches/net_core__disable_NET_RX_BUSY_POLL_on_RT.patch deleted file mode 100644 index 6de158b8c102..000000000000 --- a/patches/net_core__disable_NET_RX_BUSY_POLL_on_RT.patch +++ /dev/null @@ -1,42 +0,0 @@ -Subject: net/core: disable NET_RX_BUSY_POLL on RT -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Sat May 27 19:02:06 2017 +0200 - -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> - -napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire -sleeping locks with disabled preemption so we would have to work around this -and add explicit locking for synchronisation against ksoftirqd. -Without explicit synchronisation a low priority process would "own" the NAPI -state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no -preempt_disable() and BH is preemptible on RT). -In case a network packages arrives then the interrupt handler would set -NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI -would be scheduled in again. -Should a task with RT priority busy poll then it would consume the CPU instead -allowing tasks with lower priority to run. - -The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for -poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong -locking context on RT. Should this feature be considered useful on RT systems -then it could be enabled again with proper locking and synchronisation. 
- -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Signed-off-by: Thomas Gleixner <tglx@linutronix.de> - - ---- - net/Kconfig | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) ---- ---- a/net/Kconfig -+++ b/net/Kconfig -@@ -294,7 +294,7 @@ config CGROUP_NET_CLASSID - - config NET_RX_BUSY_POLL - bool -- default y -+ default y if !PREEMPT_RT - - config BQL - bool diff --git a/patches/sched__Add_support_for_lazy_preemption.patch b/patches/sched__Add_support_for_lazy_preemption.patch index bb4fc4c13dcd..346b0ac36849 100644 --- a/patches/sched__Add_support_for_lazy_preemption.patch +++ b/patches/sched__Add_support_for_lazy_preemption.patch @@ -469,7 +469,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> } static __always_inline -@@ -5511,7 +5511,7 @@ static void hrtick_start_fair(struct rq +@@ -5515,7 +5515,7 @@ static void hrtick_start_fair(struct rq if (delta < 0) { if (task_current(rq, p)) @@ -478,7 +478,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> return; } hrtick_start(rq, delta); -@@ -7201,7 +7201,7 @@ static void check_preempt_wakeup(struct +@@ -7205,7 +7205,7 @@ static void check_preempt_wakeup(struct return; preempt: @@ -487,7 +487,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> /* * Only set the backward buddy when the current task is still * on the rq. This can happen when a wakeup gets interleaved -@@ -11102,7 +11102,7 @@ static void task_fork_fair(struct task_s +@@ -11106,7 +11106,7 @@ static void task_fork_fair(struct task_s * 'current' within the tree based on its new key value. */ swap(curr->vruntime, se->vruntime); @@ -496,7 +496,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> } se->vruntime -= cfs_rq->min_vruntime; -@@ -11129,7 +11129,7 @@ prio_changed_fair(struct rq *rq, struct +@@ -11133,7 +11133,7 @@ prio_changed_fair(struct rq *rq, struct */ if (task_current(rq, p)) { if (p->prio > oldprio) diff --git a/patches/sched_introduce_migratable.patch b/patches/sched_introduce_migratable.patch index c31f03372f7a..f66b422e03ae 100644 --- a/patches/sched_introduce_migratable.patch +++ b/patches/sched_introduce_migratable.patch @@ -26,7 +26,7 @@ Link: https://lore.kernel.org/r/20210811201354.1976839-3-valentin.schneider@arm. 
--- a/include/linux/sched.h +++ b/include/linux/sched.h -@@ -1730,6 +1730,16 @@ static inline bool is_percpu_thread(void +@@ -1730,6 +1730,16 @@ static __always_inline bool is_percpu_th #endif } diff --git a/patches/series b/patches/series index 053976565ffc..9aed12373667 100644 --- a/patches/series +++ b/patches/series @@ -34,6 +34,10 @@ kthread-Move-prio-affinite-change-into-the-newly-cre.patch genirq-Move-prio-assignment-into-the-newly-created-t.patch genirq-Disable-irqfixup-poll-on-PREEMPT_RT.patch lockdep-Let-lock_is_held_type-detect-recursive-read-.patch +efi-Disable-runtime-services-on-RT.patch +efi-Allow-efi-runtime.patch +mm-Disable-zsmalloc-on-PREEMPT_RT.patch +net-core-disable-NET_RX_BUSY_POLL-on-PREEMPT_RT.patch # KCOV (akpm) 0001_documentation_kcov_include_types_h_in_the_example.patch @@ -46,10 +50,7 @@ lockdep-Let-lock_is_held_type-detect-recursive-read-.patch # Posted ########################################################################### crypto-testmgr-Only-disable-migration-in-crypto_disa.patch -mm-Disable-zsmalloc-on-PREEMPT_RT.patch irq_poll-Use-raise_softirq_irqoff-in-cpu_dead-notifi.patch -efi-Disable-runtime-services-on-RT.patch -efi-Allow-efi-runtime.patch smp_wake_ksoftirqd_on_preempt_rt_instead_do_softirq.patch x86-softirq-Disable-softirq-stacks-on-PREEMPT_RT.patch @@ -71,10 +72,9 @@ x86-softirq-Disable-softirq-stacks-on-PREEMPT_RT.patch # irqwork: Needs upstream consolidation 0001_sched_rt_annotate_the_rt_balancing_logic_irqwork_as_irq_work_hard_irq.patch -0002_irq_work_ensure_that_irq_work_runs_in_in_irq_context.patch -0003_irq_work_allow_irq_work_sync_to_sleep_if_irq_work_no_irq_support.patch -0004_irq_work_handle_some_irq_work_in_softirq_on_preempt_rt.patch -0005_irq_work_also_rcuwait_for_irq_work_hard_irq_on_preempt_rt.patch +0002_irq_work_allow_irq_work_sync_to_sleep_if_irq_work_no_irq_support.patch +0003_irq_work_handle_some_irq_work_in_a_per_cpu_thread_on_preempt_rt.patch +0004_irq_work_also_rcuwait_for_irq_work_hard_irq_on_preempt_rt.patch ########################################################################### # Post ########################################################################### mm__workingset__replace_IRQ-off_check_with_a_lockdep_assert..patch tcp__Remove_superfluous_BH-disable_around_listening_hash.patch samples_kfifo__Rename_read_lock_write_lock.patch +# Qdisc's seqcount removal.
+0001-mqprio-Correct-stats-in-mqprio_dump_class_stats.patch +0002-gen_stats-Add-instead-Set-the-value-in-__gnet_stats_.patch +0003-gen_stats-Add-instead-Set-the-value-in-__gnet_stats_.patch +0004-mq-mqprio-Simplify-stats-copy.patch +0005-u64_stats-Introduce-u64_stats_set.patch +0006-net-sched-Protect-Qdisc-bstats-with-u64_stats.patch +0007-net-sched-Use-_bstats_update-set-instead-of-raw-writ.patch +0008-net-sched-Merge-Qdisc-bstats-and-Qdisc-cpu_bstats-da.patch +0009-net-sched-Remove-Qdisc-running-sequence-counter.patch +0010-sch_htb-Use-helpers-to-read-stats-in-dump_stats.patch + ########################################################################### # Kconfig bits: ########################################################################### jump-label__disable_if_stop_machine_is_used.patch kconfig__Disable_config_options_which_are_not_RT_compatible.patch mm__Allow_only_SLUB_on_RT.patch -net_core__disable_NET_RX_BUSY_POLL_on_RT.patch ########################################################################### # Include fixes @@ -177,8 +188,6 @@ rcu__Delay_RCU-selftests.patch ########################################################################### # net: ########################################################################### -net_Qdisc__use_a_seqlock_instead_seqcount.patch -net__Properly_annotate_the_try-lock_for_the_seqlock.patch net_core__use_local_bh_disable_in_netif_rx_ni.patch net__Use_skbufhead_with_raw_lock.patch net__Dequeue_in_dev_cpu_dead_without_the_lock.patch @@ -194,14 +203,16 @@ random__Make_it_work_on_rt.patch ########################################################################### # DRM: ########################################################################### -irq-Export-force_irqthreads_key.patch -drmradeoni915__Use_preempt_disable_enable_rt_where_recommended.patch -drm_i915__Dont_disable_interrupts_on_PREEMPT_RT_during_atomic_updates.patch -drm_i915__disable_tracing_on_-RT.patch -drm_i915__skip_DRM_I915_LOW_LEVEL_TRACEPOINTS_with_NOTRACE.patch -drm_i915_gt__Only_disable_interrupts_for_the_timeline_lock_on_force-threaded.patch -drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch -drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch +0001-drm-i915-remember-to-call-i915_sw_fence_fini.patch +0002-drm-Increase-DRM_OBJECT_MAX_PROPERTY-by-18.patch +0003-drm-i915-Use-preempt_disable-enable_rt-where-recomme.patch +0004-drm-i915-Don-t-disable-interrupts-on-PREEMPT_RT-duri.patch +0005-drm-i915-Disable-tracing-points-on-PREEMPT_RT.patch +0006-drm-i915-skip-DRM_I915_LOW_LEVEL_TRACEPOINTS-with-NO.patch +0007-drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch +0008-drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch +0009-drm-i915-Drop-the-irqs_disabled-check.patch +0010-drm-i915-Don-t-disable-interrupts-and-pretend-a-lock.patch ########################################################################### # X86: diff --git a/patches/u64_stats__Disable_preemption_on_32bit-UP_SMP_with_RT_during_updates.patch b/patches/u64_stats__Disable_preemption_on_32bit-UP_SMP_with_RT_during_updates.patch index 79babdad61f0..417c3a241021 100644 --- a/patches/u64_stats__Disable_preemption_on_32bit-UP_SMP_with_RT_during_updates.patch +++ b/patches/u64_stats__Disable_preemption_on_32bit-UP_SMP_with_RT_during_updates.patch @@ -31,7 +31,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> seqcount_t seq; #endif }; -@@ -115,7 +115,7 @@ static inline void u64_stats_inc(u64_sta +@@ -125,7 +125,7 @@ static inline void u64_stats_inc(u64_sta } #endif @@ -40,7 +40,7 @@ 
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> #define u64_stats_init(syncp) seqcount_init(&(syncp)->seq) #else static inline void u64_stats_init(struct u64_stats_sync *syncp) -@@ -125,15 +125,19 @@ static inline void u64_stats_init(struct +@@ -135,15 +135,19 @@ static inline void u64_stats_init(struct static inline void u64_stats_update_begin(struct u64_stats_sync *syncp) { @@ -62,7 +62,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> #endif } -@@ -142,8 +146,11 @@ u64_stats_update_begin_irqsave(struct u6 +@@ -152,8 +156,11 @@ u64_stats_update_begin_irqsave(struct u6 { unsigned long flags = 0; @@ -76,7 +76,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> write_seqcount_begin(&syncp->seq); #endif return flags; -@@ -153,15 +160,18 @@ static inline void +@@ -163,15 +170,18 @@ static inline void u64_stats_update_end_irqrestore(struct u64_stats_sync *syncp, unsigned long flags) { @@ -98,7 +98,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> return read_seqcount_begin(&syncp->seq); #else return 0; -@@ -170,7 +180,7 @@ static inline unsigned int __u64_stats_f +@@ -180,7 +190,7 @@ static inline unsigned int __u64_stats_f static inline unsigned int u64_stats_fetch_begin(const struct u64_stats_sync *syncp) { @@ -107,7 +107,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> preempt_disable(); #endif return __u64_stats_fetch_begin(syncp); -@@ -179,7 +189,7 @@ static inline unsigned int u64_stats_fet +@@ -189,7 +199,7 @@ static inline unsigned int u64_stats_fet static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp, unsigned int start) { @@ -116,7 +116,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> return read_seqcount_retry(&syncp->seq, start); #else return false; -@@ -189,7 +199,7 @@ static inline bool __u64_stats_fetch_ret +@@ -199,7 +209,7 @@ static inline bool __u64_stats_fetch_ret static inline bool u64_stats_fetch_retry(const struct u64_stats_sync *syncp, unsigned int start) { @@ -125,7 +125,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> preempt_enable(); #endif return __u64_stats_fetch_retry(syncp, start); -@@ -203,7 +213,9 @@ static inline bool u64_stats_fetch_retry +@@ -213,7 +223,9 @@ static inline bool u64_stats_fetch_retry */ static inline unsigned int u64_stats_fetch_begin_irq(const struct u64_stats_sync *syncp) { @@ -136,7 +136,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> local_irq_disable(); #endif return __u64_stats_fetch_begin(syncp); -@@ -212,7 +224,9 @@ static inline unsigned int u64_stats_fet +@@ -222,7 +234,9 @@ static inline unsigned int u64_stats_fet static inline bool u64_stats_fetch_retry_irq(const struct u64_stats_sync *syncp, unsigned int start) { diff --git a/patches/x86__kvm_Require_const_tsc_for_RT.patch b/patches/x86__kvm_Require_const_tsc_for_RT.patch index 9b523254fc98..4508ca75cedf 100644 --- a/patches/x86__kvm_Require_const_tsc_for_RT.patch +++ b/patches/x86__kvm_Require_const_tsc_for_RT.patch @@ -18,7 +18,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c -@@ -8402,6 +8402,14 @@ int kvm_arch_init(void *opaque) +@@ -8416,6 +8416,14 @@ int kvm_arch_init(void *opaque) goto out; } |