author | Sebastian Andrzej Siewior <bigeasy@linutronix.de> | 2022-02-04 23:27:55 +0100
---|---|---
committer | Sebastian Andrzej Siewior <bigeasy@linutronix.de> | 2022-02-04 23:27:55 +0100
commit | 1dc85f15501cb306f5a52ca8c6b95f7ea6e34ac3 (patch) |
tree | f37012716c80ad39320ff9e08508cfdfba6b00bc |
parent | 20dfb35764a8c0c3ec89d7ac3c78e645cefa9a61 (diff) |
download | linux-rt-1dc85f15501cb306f5a52ca8c6b95f7ea6e34ac3.tar.gz |
[ANNOUNCE] v5.17-rc2-rt4
Dear RT folks!
I'm pleased to announce the v5.17-rc2-rt4 patch set.
Changes since v5.17-rc2-rt3:
- Replace Valentin Schneider's ARM64 patch regarding arch_faults_on_old_pte()
  with an alternative version, also written by him.
- Correct the tracing output. Due to a thinko in the preempt-lazy bits, the
  preempt-sched field always reported 'p', which was not true. Now that
  field is either empty (.) or shows the need-sched bit (n).
- Update the networking patches based on review feedback on the list.
- Replace the tty/random patches with an alternative approach kindly
  contributed by Jason A. Donenfeld. The new patches appear to work, but
  more testing is needed.
- Update John's printk series.
Known issues
- netconsole triggers WARN.
- Valentin Schneider reported a few splats on ARM64, see
https://lkml.kernel.org/r/20210810134127.1394269-1-valentin.schneider@arm.com
The delta patch against v5.17-rc2-rt3 is appended below and can be found here:
https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.17/incr/patch-5.17-rc2-rt3-rt4.patch.xz
You can get this release via the git tree at:
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v5.17-rc2-rt4
The RT patch against v5.17-rc2 can be found here:
https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.17/older/patch-5.17-rc2-rt4.patch.xz
The split quilt queue is available at:
https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.17/older/patches-5.17-rc2-rt4.tar.xz
Sebastian
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
41 files changed, 1370 insertions, 642 deletions
diff --git a/patches/0001-net-dev-Remove-preempt_disable-and-get_cpu-in-netif_.patch b/patches/0001-net-dev-Remove-preempt_disable-and-get_cpu-in-netif_.patch new file mode 100644 index 000000000000..9893806a2398 --- /dev/null +++ b/patches/0001-net-dev-Remove-preempt_disable-and-get_cpu-in-netif_.patch @@ -0,0 +1,64 @@ +From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Date: Wed, 15 Dec 2021 09:40:00 +0100 +Subject: [PATCH 1/3] net: dev: Remove preempt_disable() and get_cpu() in + netif_rx_internal(). + +The preempt_disable() () section was introduced in commit + cece1945bffcf ("net: disable preemption before call smp_processor_id()") + +and adds it in case this function is invoked from preemtible context and +because get_cpu() later on as been added. + +The get_cpu() usage was added in commit + b0e28f1effd1d ("net: netif_rx() must disable preemption") + +because ip_dev_loopback_xmit() invoked netif_rx() with enabled preemption +causing a warning in smp_processor_id(). The function netif_rx() should +only be invoked from an interrupt context which implies disabled +preemption. The commit + e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()") + +was addressing this and replaced netif_rx() with in netif_rx_ni() in +ip_dev_loopback_xmit(). + +Based on the discussion on the list, the former patch (b0e28f1effd1d) +should not have been applied only the latter (e30b38c298b55). + +Remove get_cpu() and preempt_disable() since the function is supposed to +be invoked from context with stable per-CPU pointers. Bottom halves have +to be disabled at this point because the function may raise softirqs +which need to be processed. 
+ +Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Reviewed-by: Eric Dumazet <edumazet@google.com> +--- + net/core/dev.c | 5 +---- + 1 file changed, 1 insertion(+), 4 deletions(-) + +--- a/net/core/dev.c ++++ b/net/core/dev.c +@@ -4796,7 +4796,6 @@ static int netif_rx_internal(struct sk_b + struct rps_dev_flow voidflow, *rflow = &voidflow; + int cpu; + +- preempt_disable(); + rcu_read_lock(); + + cpu = get_rps_cpu(skb->dev, skb, &rflow); +@@ -4806,14 +4805,12 @@ static int netif_rx_internal(struct sk_b + ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail); + + rcu_read_unlock(); +- preempt_enable(); + } else + #endif + { + unsigned int qtail; + +- ret = enqueue_to_backlog(skb, get_cpu(), &qtail); +- put_cpu(); ++ ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail); + } + return ret; + } diff --git a/patches/0001-net-dev-Remove-the-preempt_disable-in-netif_rx_inter.patch b/patches/0001-net-dev-Remove-the-preempt_disable-in-netif_rx_inter.patch deleted file mode 100644 index 40aec006655f..000000000000 --- a/patches/0001-net-dev-Remove-the-preempt_disable-in-netif_rx_inter.patch +++ /dev/null @@ -1,39 +0,0 @@ -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Wed, 15 Dec 2021 09:40:00 +0100 -Subject: [PATCH 1/4] net: dev: Remove the preempt_disable() in - netif_rx_internal(). - -The preempt_disable() and rcu_disable() section was introduced in commit - bbbe211c295ff ("net: rcu lock and preempt disable missing around generic xdp") - -The backtrace shows that bottom halves were disabled and so the usage of -smp_processor_id() would not trigger a warning. -The "suspicious RCU usage" warning was triggered because -rcu_dereference() was not used in rcu_read_lock() section (only -rcu_read_lock_bh()). A rcu_read_lock() is sufficient. - -Remove the preempt_disable() statement which is not needed. 
- -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> ---- - net/core/dev.c | 2 -- - 1 file changed, 2 deletions(-) - ---- a/net/core/dev.c -+++ b/net/core/dev.c -@@ -4796,7 +4796,6 @@ static int netif_rx_internal(struct sk_b - struct rps_dev_flow voidflow, *rflow = &voidflow; - int cpu; - -- preempt_disable(); - rcu_read_lock(); - - cpu = get_rps_cpu(skb->dev, skb, &rflow); -@@ -4806,7 +4805,6 @@ static int netif_rx_internal(struct sk_b - ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail); - - rcu_read_unlock(); -- preempt_enable(); - } else - #endif - { diff --git a/patches/0001-printk-rename-cpulock-functions.patch b/patches/0001-printk-rename-cpulock-functions.patch index dd16def21bda..3b537e3c3c10 100644 --- a/patches/0001-printk-rename-cpulock-functions.patch +++ b/patches/0001-printk-rename-cpulock-functions.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Tue, 28 Sep 2021 11:27:02 +0206 -Subject: [PATCH 01/15] printk: rename cpulock functions +Date: Fri, 4 Feb 2022 16:01:15 +0106 +Subject: [PATCH 01/16] printk: rename cpulock functions Since the printk cpulock is CPU-reentrant and since it is used in all contexts, its usage must be carefully considered and diff --git a/patches/0001-random-use-computational-hash-for-entropy-extraction.patch b/patches/0001-random-use-computational-hash-for-entropy-extraction.patch new file mode 100644 index 000000000000..067494403142 --- /dev/null +++ b/patches/0001-random-use-computational-hash-for-entropy-extraction.patch @@ -0,0 +1,498 @@ +From: "Jason A. Donenfeld" <Jason@zx2c4.com> +Date: Sun, 16 Jan 2022 14:23:10 +0100 +Subject: [PATCH 1/2] random: use computational hash for entropy extraction + +The current 4096-bit LFSR used for entropy collection had a few +desirable attributes for the context in which it was created. For +example, the state was huge, which meant that /dev/random would be able +to output quite a bit of accumulated entropy before blocking. 
It was +also, in its time, quite fast at accumulating entropy byte-by-byte, +which matters given the varying contexts in which mix_pool_bytes() is +called. And its diffusion was relatively high, which meant that changes +would ripple across several words of state rather quickly. + +However, it also suffers from a few security vulnerabilities. In +particular, inputs learned by an attacker can be undone, but more over, +if the state of the pool leaks, its contents can be controlled and +entirely zeroed out. I've demonstrated this attack with this SMT2 +script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in +a matter of seconds on a single core of my laptop, resulting in little +proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>. + +For basically all recent formal models of RNGs, these attacks represent +a significant cryptographic flaw. But how does this manifest +practically? If an attacker has access to the system to such a degree +that he can learn the internal state of the RNG, arguably there are +other lower hanging vulnerabilities -- side-channel, infoleak, or +otherwise -- that might have higher priority. On the other hand, seed +files are frequently used on systems that have a hard time generating +much entropy on their own, and these seed files, being files, often leak +or are duplicated and distributed accidentally, or are even seeded over +the Internet intentionally, where their contents might be recorded or +tampered with. Seen this way, an otherwise quasi-implausible +vulnerability is a bit more practical than initially thought. + +Another aspect of the current mix_pool_bytes() function is that, while +its performance was arguably competitive for the time in which it was +created, it's no longer considered so. This patch improves performance +significantly: on a high-end CPU, an i7-11850H, it improves performance +of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it +improves performance by 103%. 
+ +This commit replaces the LFSR of mix_pool_bytes() with a straight- +forward cryptographic hash function, BLAKE2s, which is already in use +for pool extraction. Universal hashing with a secret seed was considered +too, something along the lines of <https://eprint.iacr.org/2013/338>, +but the requirement for a secret seed makes for a chicken & egg problem. +Instead we go with a formally proven scheme using a computational hash +function, described in sections 5.1, 6.4, and B.1.8 of +<https://eprint.iacr.org/2019/198>. + +BLAKE2s outputs 256 bits, which should give us an appropriate amount of +min-entropy accumulation, and a wide enough margin of collision +resistance against active attacks. mix_pool_bytes() becomes a simple +call to blake2s_update(), for accumulation, while the extraction step +becomes a blake2s_final() to generate a seed, with which we can then do +a HKDF-like or BLAKE2X-like expansion, the first part of which we fold +back as an init key for subsequent blake2s_update()s, and the rest we +produce to the caller. This then is provided to our CRNG like usual. In +that expansion step, we make opportunistic use of 32 bytes of RDRAND +output, just as before. We also always reseed the crng with 32 bytes, +unconditionally, or not at all, rather than sometimes with 16 as before, +as we don't win anything by limiting beyond the 16 byte threshold. + +Going for a hash function as an entropy collector is a conservative, +proven approach. The result of all this is a much simpler and much less +bespoke construction than what's there now, which not only plugs a +vulnerability but also improves performance considerably. 
+ +[ bigeasy: commit 107307cbac3871a1b26088d4865de57ae9b50032 from + git.kernel.org/pub/scm/linux/kernel/git/crng/random.git ] + +Cc: Theodore Ts'o <tytso@mit.edu> +Cc: Dominik Brodowski <linux@dominikbrodowski.net> +Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> +Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> +Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +--- + drivers/char/random.c | 304 +++++++++----------------------------------------- + 1 file changed, 55 insertions(+), 249 deletions(-) + +--- a/drivers/char/random.c ++++ b/drivers/char/random.c +@@ -42,61 +42,6 @@ + */ + + /* +- * (now, with legal B.S. out of the way.....) +- * +- * This routine gathers environmental noise from device drivers, etc., +- * and returns good random numbers, suitable for cryptographic use. +- * Besides the obvious cryptographic uses, these numbers are also good +- * for seeding TCP sequence numbers, and other places where it is +- * desirable to have numbers which are not only random, but hard to +- * predict by an attacker. +- * +- * Theory of operation +- * =================== +- * +- * Computers are very predictable devices. Hence it is extremely hard +- * to produce truly random numbers on a computer --- as opposed to +- * pseudo-random numbers, which can easily generated by using a +- * algorithm. Unfortunately, it is very easy for attackers to guess +- * the sequence of pseudo-random number generators, and for some +- * applications this is not acceptable. So instead, we must try to +- * gather "environmental noise" from the computer's environment, which +- * must be hard for outside attackers to observe, and use that to +- * generate random numbers. In a Unix environment, this is best done +- * from inside the kernel. 
+- * +- * Sources of randomness from the environment include inter-keyboard +- * timings, inter-interrupt timings from some interrupts, and other +- * events which are both (a) non-deterministic and (b) hard for an +- * outside observer to measure. Randomness from these sources are +- * added to an "entropy pool", which is mixed using a CRC-like function. +- * This is not cryptographically strong, but it is adequate assuming +- * the randomness is not chosen maliciously, and it is fast enough that +- * the overhead of doing it on every interrupt is very reasonable. +- * As random bytes are mixed into the entropy pool, the routines keep +- * an *estimate* of how many bits of randomness have been stored into +- * the random number generator's internal state. +- * +- * When random bytes are desired, they are obtained by taking the BLAKE2s +- * hash of the contents of the "entropy pool". The BLAKE2s hash avoids +- * exposing the internal state of the entropy pool. It is believed to +- * be computationally infeasible to derive any useful information +- * about the input of BLAKE2s from its output. Even if it is possible to +- * analyze BLAKE2s in some clever way, as long as the amount of data +- * returned from the generator is less than the inherent entropy in +- * the pool, the output data is totally unpredictable. For this +- * reason, the routine decreases its internal estimate of how many +- * bits of "true randomness" are contained in the entropy pool as it +- * outputs random numbers. +- * +- * If this estimate goes to zero, the routine can still generate +- * random numbers; however, an attacker may (at least in theory) be +- * able to infer the future output of the generator from prior +- * outputs. This requires successful cryptanalysis of BLAKE2s, which is +- * not believed to be feasible, but there is a remote possibility. +- * Nonetheless, these numbers should be useful for the vast majority +- * of purposes. 
+- * + * Exported interfaces ---- output + * =============================== + * +@@ -298,23 +243,6 @@ + * + * mknod /dev/random c 1 8 + * mknod /dev/urandom c 1 9 +- * +- * Acknowledgements: +- * ================= +- * +- * Ideas for constructing this random number generator were derived +- * from Pretty Good Privacy's random number generator, and from private +- * discussions with Phil Karn. Colin Plumb provided a faster random +- * number generator, which speed up the mixing function of the entropy +- * pool, taken from PGPfone. Dale Worley has also contributed many +- * useful ideas and suggestions to improve this driver. +- * +- * Any flaws in the design are solely my responsibility, and should +- * not be attributed to the Phil, Colin, or any of authors of PGP. +- * +- * Further background information on this topic may be obtained from +- * RFC 1750, "Randomness Recommendations for Security", by Donald +- * Eastlake, Steve Crocker, and Jeff Schiller. + */ + + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +@@ -358,79 +286,15 @@ + + /* #define ADD_INTERRUPT_BENCH */ + +-/* +- * If the entropy count falls under this number of bits, then we +- * should wake up processes which are selecting or polling on write +- * access to /dev/random. +- */ +-static int random_write_wakeup_bits = 28 * (1 << 5); +- +-/* +- * Originally, we used a primitive polynomial of degree .poolwords +- * over GF(2). The taps for various sizes are defined below. They +- * were chosen to be evenly spaced except for the last tap, which is 1 +- * to get the twisting happening as fast as possible. +- * +- * For the purposes of better mixing, we use the CRC-32 polynomial as +- * well to make a (modified) twisted Generalized Feedback Shift +- * Register. (See M. Matsumoto & Y. Kurita, 1992. Twisted GFSR +- * generators. ACM Transactions on Modeling and Computer Simulation +- * 2(3):179-194. Also see M. Matsumoto & Y. Kurita, 1994. Twisted +- * GFSR generators II. 
ACM Transactions on Modeling and Computer +- * Simulation 4:254-266) +- * +- * Thanks to Colin Plumb for suggesting this. +- * +- * The mixing operation is much less sensitive than the output hash, +- * where we use BLAKE2s. All that we want of mixing operation is that +- * it be a good non-cryptographic hash; i.e. it not produce collisions +- * when fed "random" data of the sort we expect to see. As long as +- * the pool state differs for different inputs, we have preserved the +- * input entropy and done a good job. The fact that an intelligent +- * attacker can construct inputs that will produce controlled +- * alterations to the pool's state is not important because we don't +- * consider such inputs to contribute any randomness. The only +- * property we need with respect to them is that the attacker can't +- * increase his/her knowledge of the pool's state. Since all +- * additions are reversible (knowing the final state and the input, +- * you can reconstruct the initial state), if an attacker has any +- * uncertainty about the initial state, he/she can only shuffle that +- * uncertainty about, but never cause any collisions (which would +- * decrease the uncertainty). +- * +- * Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and +- * Videau in their paper, "The Linux Pseudorandom Number Generator +- * Revisited" (see: http://eprint.iacr.org/2012/251.pdf). In their +- * paper, they point out that we are not using a true Twisted GFSR, +- * since Matsumoto & Kurita used a trinomial feedback polynomial (that +- * is, with only three taps, instead of the six that we are using). +- * As a result, the resulting polynomial is neither primitive nor +- * irreducible, and hence does not have a maximal period over +- * GF(2**32). They suggest a slight change to the generator +- * polynomial which improves the resulting TGFSR polynomial to be +- * irreducible, which we have made here. 
+- */ + enum poolinfo { +- POOL_WORDS = 128, +- POOL_WORDMASK = POOL_WORDS - 1, +- POOL_BYTES = POOL_WORDS * sizeof(u32), +- POOL_BITS = POOL_BYTES * 8, ++ POOL_BITS = BLAKE2S_HASH_SIZE * 8, + POOL_BITSHIFT = ilog2(POOL_BITS), + + /* To allow fractional bits to be tracked, the entropy_count field is + * denominated in units of 1/8th bits. */ + POOL_ENTROPY_SHIFT = 3, + #define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT) +- POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT, +- +- /* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */ +- POOL_TAP1 = 104, +- POOL_TAP2 = 76, +- POOL_TAP3 = 51, +- POOL_TAP4 = 25, +- POOL_TAP5 = 1, +- +- EXTRACT_SIZE = BLAKE2S_HASH_SIZE / 2 ++ POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT + }; + + /* +@@ -438,6 +302,12 @@ enum poolinfo { + */ + static DECLARE_WAIT_QUEUE_HEAD(random_write_wait); + static struct fasync_struct *fasync; ++/* ++ * If the entropy count falls under this number of bits, then we ++ * should wake up processes which are selecting or polling on write ++ * access to /dev/random. 
++ */ ++static int random_write_wakeup_bits = POOL_BITS * 3 / 4; + + static DEFINE_SPINLOCK(random_ready_list_lock); + static LIST_HEAD(random_ready_list); +@@ -493,73 +363,31 @@ MODULE_PARM_DESC(ratelimit_disable, "Dis + * + **********************************************************************/ + +-static u32 input_pool_data[POOL_WORDS] __latent_entropy; +- + static struct { ++ struct blake2s_state hash; + spinlock_t lock; +- u16 add_ptr; +- u16 input_rotate; + int entropy_count; + } input_pool = { ++ .hash.h = { BLAKE2S_IV0 ^ (0x01010000 | BLAKE2S_HASH_SIZE), ++ BLAKE2S_IV1, BLAKE2S_IV2, BLAKE2S_IV3, BLAKE2S_IV4, ++ BLAKE2S_IV5, BLAKE2S_IV6, BLAKE2S_IV7 }, ++ .hash.outlen = BLAKE2S_HASH_SIZE, + .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock), + }; + +-static ssize_t extract_entropy(void *buf, size_t nbytes, int min); +-static ssize_t _extract_entropy(void *buf, size_t nbytes); ++static bool extract_entropy(void *buf, size_t nbytes, int min); ++static void _extract_entropy(void *buf, size_t nbytes); + + static void crng_reseed(struct crng_state *crng, bool use_input_pool); + +-static const u32 twist_table[8] = { +- 0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158, +- 0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 }; +- + /* + * This function adds bytes into the entropy "pool". It does not + * update the entropy estimate. The caller should call + * credit_entropy_bits if this is appropriate. +- * +- * The pool is stirred with a primitive polynomial of the appropriate +- * degree, and then twisted. We twist by three bits at a time because +- * it's cheap to do so and helps slightly in the expected case where +- * the entropy is concentrated in the low-order bits. 
+ */ + static void _mix_pool_bytes(const void *in, int nbytes) + { +- unsigned long i; +- int input_rotate; +- const u8 *bytes = in; +- u32 w; +- +- input_rotate = input_pool.input_rotate; +- i = input_pool.add_ptr; +- +- /* mix one byte at a time to simplify size handling and churn faster */ +- while (nbytes--) { +- w = rol32(*bytes++, input_rotate); +- i = (i - 1) & POOL_WORDMASK; +- +- /* XOR in the various taps */ +- w ^= input_pool_data[i]; +- w ^= input_pool_data[(i + POOL_TAP1) & POOL_WORDMASK]; +- w ^= input_pool_data[(i + POOL_TAP2) & POOL_WORDMASK]; +- w ^= input_pool_data[(i + POOL_TAP3) & POOL_WORDMASK]; +- w ^= input_pool_data[(i + POOL_TAP4) & POOL_WORDMASK]; +- w ^= input_pool_data[(i + POOL_TAP5) & POOL_WORDMASK]; +- +- /* Mix the result back in with a twist */ +- input_pool_data[i] = (w >> 3) ^ twist_table[w & 7]; +- +- /* +- * Normally, we add 7 bits of rotation to the pool. +- * At the beginning of the pool, add an extra 7 bits +- * rotation, so that successive passes spread the +- * input bits across the pool evenly. +- */ +- input_rotate = (input_rotate + (i ? 7 : 14)) & 31; +- } +- +- input_pool.input_rotate = input_rotate; +- input_pool.add_ptr = i; ++ blake2s_update(&input_pool.hash, in, nbytes); + } + + static void __mix_pool_bytes(const void *in, int nbytes) +@@ -954,15 +782,14 @@ static int crng_slow_load(const u8 *cp, + static void crng_reseed(struct crng_state *crng, bool use_input_pool) + { + unsigned long flags; +- int i, num; ++ int i; + union { + u8 block[CHACHA_BLOCK_SIZE]; + u32 key[8]; + } buf; + + if (use_input_pool) { +- num = extract_entropy(&buf, 32, 16); +- if (num == 0) ++ if (!extract_entropy(&buf, 32, 16)) + return; + } else { + _extract_crng(&primary_crng, buf.block); +@@ -1329,74 +1156,48 @@ static size_t account(size_t nbytes, int + } + + /* +- * This function does the actual extraction for extract_entropy. +- * +- * Note: we assume that .poolwords is a multiple of 16 words. 
++ * This is an HKDF-like construction for using the hashed collected entropy ++ * as a PRF key, that's then expanded block-by-block. + */ +-static void extract_buf(u8 *out) ++static void _extract_entropy(void *buf, size_t nbytes) + { +- struct blake2s_state state __aligned(__alignof__(unsigned long)); +- u8 hash[BLAKE2S_HASH_SIZE]; +- unsigned long *salt; + unsigned long flags; +- +- blake2s_init(&state, sizeof(hash)); +- +- /* +- * If we have an architectural hardware random number +- * generator, use it for BLAKE2's salt & personal fields. +- */ +- for (salt = (unsigned long *)&state.h[4]; +- salt < (unsigned long *)&state.h[8]; ++salt) { +- unsigned long v; +- if (!arch_get_random_long(&v)) +- break; +- *salt ^= v; ++ u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE]; ++ struct { ++ unsigned long rdrand[32 / sizeof(long)]; ++ size_t counter; ++ } block; ++ size_t i; ++ ++ for (i = 0; i < ARRAY_SIZE(block.rdrand); ++i) { ++ if (!arch_get_random_long(&block.rdrand[i])) ++ block.rdrand[i] = random_get_entropy(); + } + +- /* Generate a hash across the pool */ + spin_lock_irqsave(&input_pool.lock, flags); +- blake2s_update(&state, (const u8 *)input_pool_data, POOL_BYTES); +- blake2s_final(&state, hash); /* final zeros out state */ + +- /* +- * We mix the hash back into the pool to prevent backtracking +- * attacks (where the attacker knows the state of the pool +- * plus the current outputs, and attempts to find previous +- * outputs), unless the hash function can be inverted. By +- * mixing at least a hash worth of hash data back, we make +- * brute-forcing the feedback as hard as brute-forcing the +- * hash. +- */ +- __mix_pool_bytes(hash, sizeof(hash)); +- spin_unlock_irqrestore(&input_pool.lock, flags); ++ /* seed = HASHPRF(last_key, entropy_input) */ ++ blake2s_final(&input_pool.hash, seed); + +- /* Note that EXTRACT_SIZE is half of hash size here, because above +- * we've dumped the full length back into mixer. 
By reducing the +- * amount that we emit, we retain a level of forward secrecy. +- */ +- memcpy(out, hash, EXTRACT_SIZE); +- memzero_explicit(hash, sizeof(hash)); +-} ++ /* next_key = HASHPRF(key, RDRAND || 0) */ ++ block.counter = 0; ++ blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed)); ++ blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key)); + +-static ssize_t _extract_entropy(void *buf, size_t nbytes) +-{ +- ssize_t ret = 0, i; +- u8 tmp[EXTRACT_SIZE]; ++ spin_unlock_irqrestore(&input_pool.lock, flags); ++ memzero_explicit(next_key, sizeof(next_key)); + + while (nbytes) { +- extract_buf(tmp); +- i = min_t(int, nbytes, EXTRACT_SIZE); +- memcpy(buf, tmp, i); ++ i = min_t(size_t, nbytes, BLAKE2S_HASH_SIZE); ++ /* output = HASHPRF(key, RDRAND || ++counter) */ ++ ++block.counter; ++ blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed)); + nbytes -= i; + buf += i; +- ret += i; + } + +- /* Wipe data just returned from memory */ +- memzero_explicit(tmp, sizeof(tmp)); +- +- return ret; ++ memzero_explicit(seed, sizeof(seed)); ++ memzero_explicit(&block, sizeof(block)); + } + + /* +@@ -1404,13 +1205,18 @@ static ssize_t _extract_entropy(void *bu + * returns it in a buffer. + * + * The min parameter specifies the minimum amount we can pull before +- * failing to avoid races that defeat catastrophic reseeding. ++ * failing to avoid races that defeat catastrophic reseeding. If we ++ * have less than min entropy available, we return false and buf is ++ * not filled. 
+ */ +-static ssize_t extract_entropy(void *buf, size_t nbytes, int min) ++static bool extract_entropy(void *buf, size_t nbytes, int min) + { + trace_extract_entropy(nbytes, POOL_ENTROPY_BITS(), _RET_IP_); +- nbytes = account(nbytes, min); +- return _extract_entropy(buf, nbytes); ++ if (account(nbytes, min)) { ++ _extract_entropy(buf, nbytes); ++ return true; ++ } ++ return false; + } + + #define warn_unseeded_randomness(previous) \ +@@ -1674,7 +1480,7 @@ static void __init init_std_data(void) + unsigned long rv; + + mix_pool_bytes(&now, sizeof(now)); +- for (i = POOL_BYTES; i > 0; i -= sizeof(rv)) { ++ for (i = BLAKE2S_BLOCK_SIZE; i > 0; i -= sizeof(rv)) { + if (!arch_get_random_seed_long(&rv) && + !arch_get_random_long(&rv)) + rv = random_get_entropy(); diff --git a/patches/0004-net-dev-Make-rps_lock-disable-interrupts.patch b/patches/0002-net-dev-Make-rps_lock-disable-interrupts.patch index 74e36a936816..92b9927ec8c7 100644 --- a/patches/0004-net-dev-Make-rps_lock-disable-interrupts.patch +++ b/patches/0002-net-dev-Make-rps_lock-disable-interrupts.patch @@ -1,25 +1,30 @@ From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Thu, 16 Dec 2021 10:57:55 +0100 -Subject: [PATCH 4/4] net: dev: Make rps_lock() disable interrupts. +Subject: [PATCH 2/3] net: dev: Make rps_lock() disable interrupts. + +Disabling interrupts and in the RPS case locking input_pkt_queue is +split into local_irq_disable() and optional spin_lock(). -Interrupts disabling and in the RPS case locking input_pkt_queue case is split -into local_irq_disable() and optional spin_lock(). This breaks on PREEMPT_RT because the spinlock_t typed lock can not be acquired with disabled interrupts. The sections in which the lock is acquired is usually short in a sense that it is not causing long und unbounded latiencies. One exception is the -skb_flow_limit() invocation which may invoke a BPF program. +skb_flow_limit() invocation which may invoke a BPF program (and may +require sleeping locks). 
-By moving local_irq_disable()+spin_lock() into rps_lock(), we can keep -interrupts disabled on !RT kernels and enabled on RT kernels. Without -RPS, the needed synchronisation happens as part of local_bh_disable() on -the local CPU. Since interrupts remain enabled, enqueue_to_backlog() -needs to disable interrupts for ____napi_schedule(). +By moving local_irq_disable() + spin_lock() into rps_lock(), we can keep +interrupts disabled on !PREEMPT_RT and enabled on PREEMPT_RT kernels. +Without RPS on a PREEMPT_RT kernel, the needed synchronisation happens +as part of local_bh_disable() on the local CPU. +____napi_schedule() is only invoked if sd is from the local CPU. Replace +it with __napi_schedule_irqoff() which already disables interrupts on +PREEMPT_RT as needed. Move this call to rps_ipi_queued() and rename the +function to napi_schedule_rps as suggested by Jakub. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- - net/core/dev.c | 72 ++++++++++++++++++++++++++++++++++----------------------- - 1 file changed, 44 insertions(+), 28 deletions(-) + net/core/dev.c | 76 +++++++++++++++++++++++++++++++-------------------------- + 1 file changed, 42 insertions(+), 34 deletions(-) --- a/net/core/dev.c +++ b/net/core/dev.c @@ -70,7 +75,29 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } static struct netdev_name_node *netdev_name_node_alloc(struct net_device *dev, -@@ -4525,9 +4545,7 @@ static int enqueue_to_backlog(struct sk_ +@@ -4456,11 +4476,11 @@ static void rps_trigger_softirq(void *da + * If yes, queue it to our IPI list and return 1 + * If no, return 0 + */ +-static int rps_ipi_queued(struct softnet_data *sd) ++static int napi_schedule_rps(struct softnet_data *sd) + { +-#ifdef CONFIG_RPS + struct softnet_data *mysd = this_cpu_ptr(&softnet_data); + ++#ifdef CONFIG_RPS + if (sd != mysd) { + sd->rps_ipi_next = mysd->rps_ipi_list; + mysd->rps_ipi_list = sd; +@@ -4469,6 +4489,7 @@ static int rps_ipi_queued(struct softnet + return 
1; + } + #endif /* CONFIG_RPS */ ++ __napi_schedule_irqoff(&mysd->backlog); + return 0; + } + +@@ -4525,9 +4546,7 @@ static int enqueue_to_backlog(struct sk_ sd = &per_cpu(softnet_data, cpu); @@ -81,7 +108,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> if (!netif_running(skb->dev)) goto drop; qlen = skb_queue_len(&sd->input_pkt_queue); -@@ -4536,26 +4554,30 @@ static int enqueue_to_backlog(struct sk_ +@@ -4536,26 +4555,21 @@ static int enqueue_to_backlog(struct sk_ enqueue: __skb_queue_tail(&sd->input_pkt_queue, skb); input_queue_tail_incr_save(sd, qtail); @@ -93,18 +120,13 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> /* Schedule NAPI for backlog device * We can use non atomic operation since we own the queue lock -+ * PREEMPT_RT needs to disable interrupts here for -+ * synchronisation needed in napi_schedule. */ -+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) -+ local_irq_disable(); -+ - if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state)) { - if (!rps_ipi_queued(sd)) - ____napi_schedule(sd, &sd->backlog); - } -+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) -+ local_irq_enable(); +- if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state)) { +- if (!rps_ipi_queued(sd)) +- ____napi_schedule(sd, &sd->backlog); +- } ++ if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state)) ++ napi_schedule_rps(sd); goto enqueue; } @@ -117,7 +139,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> atomic_long_inc(&skb->dev->rx_dropped); kfree_skb(skb); -@@ -5617,8 +5639,7 @@ static void flush_backlog(struct work_st +@@ -5647,8 +5661,7 @@ static void flush_backlog(struct work_st local_bh_disable(); sd = this_cpu_ptr(&softnet_data); @@ -127,7 +149,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) { if (skb->dev->reg_state == NETREG_UNREGISTERING) { __skb_unlink(skb, &sd->input_pkt_queue); -@@ -5626,8 +5647,7 @@ static void flush_backlog(struct 
work_st +@@ -5656,8 +5669,7 @@ static void flush_backlog(struct work_st input_queue_head_incr(sd); } } @@ -137,7 +159,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> skb_queue_walk_safe(&sd->process_queue, skb, tmp) { if (skb->dev->reg_state == NETREG_UNREGISTERING) { -@@ -5645,16 +5665,14 @@ static bool flush_required(int cpu) +@@ -5675,16 +5687,14 @@ static bool flush_required(int cpu) struct softnet_data *sd = &per_cpu(softnet_data, cpu); bool do_flush; @@ -156,7 +178,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> return do_flush; #endif -@@ -5769,8 +5787,7 @@ static int process_backlog(struct napi_s +@@ -5799,8 +5809,7 @@ static int process_backlog(struct napi_s } @@ -166,7 +188,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> if (skb_queue_empty(&sd->input_pkt_queue)) { /* * Inline a custom version of __napi_complete(). -@@ -5786,8 +5803,7 @@ static int process_backlog(struct napi_s +@@ -5816,8 +5825,7 @@ static int process_backlog(struct napi_s skb_queue_splice_tail_init(&sd->input_pkt_queue, &sd->process_queue); } diff --git a/patches/0002-net-dev-Remove-get_cpu-in-netif_rx_internal.patch b/patches/0002-net-dev-Remove-get_cpu-in-netif_rx_internal.patch deleted file mode 100644 index 61ed5009aef8..000000000000 --- a/patches/0002-net-dev-Remove-get_cpu-in-netif_rx_internal.patch +++ /dev/null @@ -1,41 +0,0 @@ -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Date: Wed, 15 Dec 2021 11:28:09 +0100 -Subject: [PATCH 2/4] net: dev: Remove get_cpu() in netif_rx_internal(). - -The get_cpu() usage was added in commit - b0e28f1effd1d ("net: netif_rx() must disable preemption") - -because ip_dev_loopback_xmit() invoked netif_rx() with enabled preemtion -causing a warning in smp_processor_id(). The function netif_rx() should -only be invoked from an interrupt context which implies disabled -preemption. 
The commit - e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()") - -was addressing this and replaced replaced netif_rx() with in -netif_rx_ni() in ip_dev_loopback_xmit(). - -Based on the discussion on the list, the former patch (b0e28f1effd1d) -should not have been applied only the latter (e30b38c298b55). - -Remove get_cpu() since the function is supossed to be invoked from -context with stable per-CPU pointers (either by disabling preemption or -software interrupts). - -Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> ---- - net/core/dev.c | 3 +-- - 1 file changed, 1 insertion(+), 2 deletions(-) - ---- a/net/core/dev.c -+++ b/net/core/dev.c -@@ -4810,8 +4810,7 @@ static int netif_rx_internal(struct sk_b - { - unsigned int qtail; - -- ret = enqueue_to_backlog(skb, get_cpu(), &qtail); -- put_cpu(); -+ ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail); - } - return ret; - } diff --git a/patches/0002-printk-cpu-sync-always-disable-interrupts.patch b/patches/0002-printk-cpu-sync-always-disable-interrupts.patch index ac4fdc6c1ac5..87a9766a0e5f 100644 --- a/patches/0002-printk-cpu-sync-always-disable-interrupts.patch +++ b/patches/0002-printk-cpu-sync-always-disable-interrupts.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Tue, 3 Aug 2021 13:00:00 +0206 -Subject: [PATCH 02/15] printk: cpu sync always disable interrupts +Date: Fri, 4 Feb 2022 16:01:15 +0106 +Subject: [PATCH 02/16] printk: cpu sync always disable interrupts The CPU sync functions are a NOP for !CONFIG_SMP. But for !CONFIG_SMP they still need to disable interrupts in order to diff --git a/patches/0002-random-do-not-take-spinlocks-in-irq-handler.patch b/patches/0002-random-do-not-take-spinlocks-in-irq-handler.patch new file mode 100644 index 000000000000..a1f1550ed6dc --- /dev/null +++ b/patches/0002-random-do-not-take-spinlocks-in-irq-handler.patch @@ -0,0 +1,170 @@ +From: "Jason A. 
Donenfeld" <Jason@zx2c4.com> +Date: Fri, 4 Feb 2022 16:31:49 +0100 +Subject: [PATCH 2/2] random: do not take spinlocks in irq handler +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +On PREEMPT_RT, it's problematic to take spinlocks from hard IRQ +handlers. We can fix this by deferring to a work queue the dumping of +the fast pool into the input pool. + +We accomplish this by making `u8 count` an `atomic_t count`, with the +following rules: + + - When it's incremented to >= 64, we schedule the work. + - If the top bit is set, we never schedule the work, even if >= 64. + - The worker is responsible for setting it back to 0 when it's done. + - If we need to retry the worker later, we clear the top bit. + +In the worst case, an IRQ handler is mixing a new IRQ into the pool at +the same time as the worker is dumping it into the input pool. In this +case, we only ever set the count back to 0 _after_ we're done, so that +subsequent cycles will require a full 64 to dump it in again. In other +words, the result of this race is only ever adding a little bit more +information than normal, but never less, and never crediting any more +for this partial additional information. + +Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Cc: Thomas Gleixner <tglx@linutronix.de> +Cc: Peter Zijlstra <peterz@infradead.org> +Cc: Theodore Ts'o <tytso@mit.edu> +Cc: Sultan Alsawaf <sultan@kerneltoast.com> +Cc: Jonathan Neuschäfer <j.neuschaefer@gmx.net> +Signed-off-by: Jason A. 
Donenfeld <Jason@zx2c4.com> +Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> +Link: https://lkml.kernel.org/r/20220204153149.51428-1-Jason@zx2c4.com +--- + drivers/char/random.c | 67 ++++++++++++++++++++++-------------------- + include/trace/events/random.h | 6 --- + 2 files changed, 36 insertions(+), 37 deletions(-) + +--- a/drivers/char/random.c ++++ b/drivers/char/random.c +@@ -390,12 +390,6 @@ static void _mix_pool_bytes(const void * + blake2s_update(&input_pool.hash, in, nbytes); + } + +-static void __mix_pool_bytes(const void *in, int nbytes) +-{ +- trace_mix_pool_bytes_nolock(nbytes, _RET_IP_); +- _mix_pool_bytes(in, nbytes); +-} +- + static void mix_pool_bytes(const void *in, int nbytes) + { + unsigned long flags; +@@ -407,11 +401,13 @@ static void mix_pool_bytes(const void *i + } + + struct fast_pool { +- u32 pool[4]; ++ struct work_struct mix; + unsigned long last; ++ u32 pool[4]; ++ atomic_t count; + u16 reg_idx; +- u8 count; + }; ++#define FAST_POOL_MIX_INFLIGHT (1U << 31) + + /* + * This is a fast mixing routine used by the interrupt randomness +@@ -441,7 +437,6 @@ static void fast_mix(struct fast_pool *f + + f->pool[0] = a; f->pool[1] = b; + f->pool[2] = c; f->pool[3] = d; +- f->count++; + } + + static void process_random_ready_list(void) +@@ -1047,12 +1042,37 @@ static u32 get_reg(struct fast_pool *f, + return *ptr; + } + ++static void mix_interrupt_randomness(struct work_struct *work) ++{ ++ struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix); ++ ++ fast_pool->last = jiffies; ++ ++ /* Since this is the result of a trip through the scheduler, xor in ++ * a cycle counter. It can't hurt, and might help. 
++ */ ++ fast_pool->pool[3] ^= random_get_entropy(); ++ ++ if (unlikely(crng_init == 0)) { ++ if (crng_fast_load((u8 *)&fast_pool->pool, sizeof(fast_pool->pool)) > 0) ++ atomic_set(&fast_pool->count, 0); ++ else ++ atomic_and(~FAST_POOL_MIX_INFLIGHT, &fast_pool->count); ++ return; ++ } ++ ++ mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool)); ++ atomic_set(&fast_pool->count, 0); ++ credit_entropy_bits(1); ++} ++ + void add_interrupt_randomness(int irq) + { + struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness); + struct pt_regs *regs = get_irq_regs(); + unsigned long now = jiffies; + cycles_t cycles = random_get_entropy(); ++ unsigned int new_count; + u32 c_high, j_high; + u64 ip; + +@@ -1070,29 +1090,14 @@ void add_interrupt_randomness(int irq) + fast_mix(fast_pool); + add_interrupt_bench(cycles); + +- if (unlikely(crng_init == 0)) { +- if ((fast_pool->count >= 64) && +- crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) { +- fast_pool->count = 0; +- fast_pool->last = now; +- } +- return; ++ new_count = (unsigned int)atomic_inc_return(&fast_pool->count); ++ if (new_count >= 64 && new_count < FAST_POOL_MIX_INFLIGHT && ++ (time_after(now, fast_pool->last + HZ) || unlikely(crng_init == 0))) { ++ if (unlikely(!fast_pool->mix.func)) ++ INIT_WORK(&fast_pool->mix, mix_interrupt_randomness); ++ atomic_or(FAST_POOL_MIX_INFLIGHT, &fast_pool->count); ++ schedule_work(&fast_pool->mix); + } +- +- if ((fast_pool->count < 64) && !time_after(now, fast_pool->last + HZ)) +- return; +- +- if (!spin_trylock(&input_pool.lock)) +- return; +- +- fast_pool->last = now; +- __mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool)); +- spin_unlock(&input_pool.lock); +- +- fast_pool->count = 0; +- +- /* award one bit for the contents of the fast pool */ +- credit_entropy_bits(1); + } + EXPORT_SYMBOL_GPL(add_interrupt_randomness); + +--- a/include/trace/events/random.h ++++ b/include/trace/events/random.h +@@ -52,12 +52,6 @@ 
DEFINE_EVENT(random__mix_pool_bytes, mix + TP_ARGS(bytes, IP) + ); + +-DEFINE_EVENT(random__mix_pool_bytes, mix_pool_bytes_nolock, +- TP_PROTO(int bytes, unsigned long IP), +- +- TP_ARGS(bytes, IP) +-); +- + TRACE_EVENT(credit_entropy_bits, + TP_PROTO(int bits, int entropy_count, unsigned long IP), + diff --git a/patches/0003-net-dev-Makes-sure-netif_rx-can-be-invoked-in-any-co.patch b/patches/0003-net-dev-Makes-sure-netif_rx-can-be-invoked-in-any-co.patch index 5f926571a28c..1139916bce57 100644 --- a/patches/0003-net-dev-Makes-sure-netif_rx-can-be-invoked-in-any-co.patch +++ b/patches/0003-net-dev-Makes-sure-netif_rx-can-be-invoked-in-any-co.patch @@ -1,49 +1,68 @@ From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Wed, 15 Dec 2021 12:47:30 +0100 -Subject: [PATCH 3/4] net: dev: Makes sure netif_rx() can be invoked in any +Subject: [PATCH 3/3] net: dev: Makes sure netif_rx() can be invoked in any context. -Dave suggested a while ago (11y by now) "Let's make netif_rx() work in -all contexts and get rid of netif_rx_ni()". Eric agreed and pointed out -that modern devices should use netif_receive_skb() to avoid the -overhead. +Dave suggested a while ago (eleven years by now) "Let's make netif_rx() +work in all contexts and get rid of netif_rx_ni()". Eric agreed and +pointed out that modern devices should use netif_receive_skb() to avoid +the overhead. In the meantime someone added another variant, netif_rx_any_context(), which behaves as suggested. -netif_rx() must be invoked with disabled bottom halfs to ensure that to -ensure that pending softirqs, which were raised within the function, are -handled. -netif_rx_ni() can be invoked only from preemptible context because the -function handles pending softirqs without checking if bottom halfes were -disabled or not. +netif_rx() must be invoked with disabled bottom halves to ensure that +pending softirqs, which were raised within the function, are handled. 
+netif_rx_ni() can be invoked only from process context (bottom halves
+must be enabled) because the function handles pending softirqs without
+checking if bottom halves were disabled or not.
 netif_rx_any_context() invokes on the former functions by checking
 in_interrupts().
 
 netif_rx() could be taught to handle both cases (disabled and enabled
-bottom halves) by simply disabling bottom halfs while invoking
-netif_rx_internal(). The local_bh_enable() invokcation will then invoke
+bottom halves) by simply disabling bottom halves while invoking
+netif_rx_internal(). The local_bh_enable() invocation will then invoke
 pending softirqs only if the BH-disable counter drops to zero.
 
+Eric is concerned about the overhead of BH-disable+enable especially in
+regard to the loopback driver. As critical as this driver is, it will
+receive a shortcut to avoid the additional overhead which is not needed.
+
 Add a local_bh_disable() section in netif_rx() to ensure softirqs are
-handled if needed. Make netif_rx_ni() and netif_rx_any_context() invoke
-netif_rx() so they can be removed once they are no more users left.
+handled if needed. Provide the internal bits as __netif_rx() which can
+be used by the loopback driver. This function is not exported so it
+can't be used by modules.
+Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they
+can be removed once there are no more users left.
 
Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- - include/linux/netdevice.h | 13 +++++++++++-- - include/trace/events/net.h | 14 -------------- - net/core/dev.c | 34 ++-------------------------------- - 3 files changed, 13 insertions(+), 48 deletions(-) + drivers/net/loopback.c | 2 - + include/linux/netdevice.h | 14 ++++++++++- + include/trace/events/net.h | 14 ----------- + net/core/dev.c | 53 ++++++++++++--------------------------------- + 4 files changed, 28 insertions(+), 55 deletions(-) +--- a/drivers/net/loopback.c ++++ b/drivers/net/loopback.c +@@ -86,7 +86,7 @@ static netdev_tx_t loopback_xmit(struct + skb->protocol = eth_type_trans(skb, dev); + + len = skb->len; +- if (likely(netif_rx(skb) == NET_RX_SUCCESS)) ++ if (likely(__netif_rx(skb) == NET_RX_SUCCESS)) + dev_lstats_add(dev, len); + + return NETDEV_TX_OK; --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h -@@ -3668,8 +3668,17 @@ u32 bpf_prog_run_generic_xdp(struct sk_b +@@ -3669,8 +3669,18 @@ u32 bpf_prog_run_generic_xdp(struct sk_b void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog); int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb); int netif_rx(struct sk_buff *skb); -int netif_rx_ni(struct sk_buff *skb); -int netif_rx_any_context(struct sk_buff *skb); ++int __netif_rx(struct sk_buff *skb); + +static inline int netif_rx_ni(struct sk_buff *skb) +{ @@ -90,17 +109,48 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- a/net/core/dev.c +++ b/net/core/dev.c -@@ -4834,47 +4834,17 @@ int netif_rx(struct sk_buff *skb) +@@ -4829,6 +4829,16 @@ static int netif_rx_internal(struct sk_b + return ret; + } + ++int __netif_rx(struct sk_buff *skb) ++{ ++ int ret; ++ ++ trace_netif_rx_entry(skb); ++ ret = netif_rx_internal(skb); ++ trace_netif_rx_exit(ret); ++ return ret; ++} ++ + /** + * netif_rx - post buffer to the network code + * @skb: buffer to post 
+@@ -4837,58 +4847,25 @@ static int netif_rx_internal(struct sk_b
+ * the upper (protocol) levels to process. It always succeeds. The buffer
+ * may be dropped during processing for congestion control or by the
+ * protocol layers.
++ * This interface is considered legacy. Modern NIC drivers should use NAPI
++ * and GRO.
+ *
+ * return values:
+ * NET_RX_SUCCESS (no congestion)
+ * NET_RX_DROP (packet was dropped)
+ *
+ */
+-
+ int netif_rx(struct sk_buff *skb)
+ {
+ int ret;
+
+- trace_netif_rx_entry(skb);
+-
+- ret = netif_rx_internal(skb);
+- trace_netif_rx_exit(ret);
+-
+ local_bh_disable();
- trace_netif_rx_entry(skb);
-
- ret = netif_rx_internal(skb);
- trace_netif_rx_exit(ret);
++ ret = __netif_rx(skb);
+ local_bh_enable();
-
 return ret;
 }
 EXPORT_SYMBOL(netif_rx);
diff --git a/patches/0003-printk-use-percpu-flag-instead-of-cpu_online.patch b/patches/0003-printk-use-percpu-flag-instead-of-cpu_online.patch
index 6916cbda34f0..5d63c9856ab8 100644
--- a/patches/0003-printk-use-percpu-flag-instead-of-cpu_online.patch
+++ b/patches/0003-printk-use-percpu-flag-instead-of-cpu_online.patch
@@ -1,6 +1,6 @@
 From: John Ogness <john.ogness@linutronix.de>
-Date: Wed, 10 Nov 2021 17:19:25 +0106
-Subject: [PATCH 03/15] printk: use percpu flag instead of cpu_online()
+Date: Fri, 4 Feb 2022 16:01:15 +0106
+Subject: [PATCH 03/16] printk: use percpu flag instead of cpu_online()
 
 The CON_ANYTIME console flag is used to label consoles that will
 work correctly before percpu resources are allocated. To check
diff --git a/patches/0003_random_split_add_interrupt_randomness.patch b/patches/0003_random_split_add_interrupt_randomness.patch
deleted file mode 100644
index 4a29e3e4905b..000000000000
--- a/patches/0003_random_split_add_interrupt_randomness.patch
+++ /dev/null
@@ -1,88 +0,0 @@
-From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-Subject: random: Split add_interrupt_randomness().
-Date: Tue, 07 Dec 2021 13:17:35 +0100 - -Split add_interrupt_randomness() into two parts: -- add_interrupt_randomness() which collects the entropy on the - invocation of a hardware interrupt and it feeds into the fast_pool, - a per-CPU variable (irq_randomness). - -- process_interrupt_randomness_pool() which feeds the fast_pool/ - irq_randomness into the entropy_store if enough entropy has been - gathered. - -This is a preparations step to ease PREEMPT_RT support. - -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20211207121737.2347312-4-bigeasy@linutronix.de ---- - drivers/char/random.c | 51 +++++++++++++++++++++++++++----------------------- - 1 file changed, 28 insertions(+), 23 deletions(-) - ---- a/drivers/char/random.c -+++ b/drivers/char/random.c -@@ -1220,6 +1220,33 @@ static u32 get_reg(struct fast_pool *f, - return *ptr; - } - -+static void process_interrupt_randomness_pool(struct fast_pool *fast_pool) -+{ -+ if (unlikely(crng_init == 0)) { -+ if ((fast_pool->count >= 64) && -+ crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) { -+ fast_pool->count = 0; -+ fast_pool->last = jiffies; -+ } -+ return; -+ } -+ -+ if ((fast_pool->count < 64) && !time_after(jiffies, fast_pool->last + HZ)) -+ return; -+ -+ if (!spin_trylock(&input_pool.lock)) -+ return; -+ -+ fast_pool->last = jiffies; -+ __mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool)); -+ spin_unlock(&input_pool.lock); -+ -+ fast_pool->count = 0; -+ -+ /* award one bit for the contents of the fast pool */ -+ credit_entropy_bits(1); -+} -+ - void add_interrupt_randomness(int irq) - { - struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness); -@@ -1243,29 +1270,7 @@ void add_interrupt_randomness(int irq) - fast_mix(fast_pool); - add_interrupt_bench(cycles); - -- if (unlikely(crng_init == 0)) { -- if ((fast_pool->count >= 64) && -- crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) { -- fast_pool->count = 0; -- 
fast_pool->last = now; -- } -- return; -- } -- -- if ((fast_pool->count < 64) && !time_after(now, fast_pool->last + HZ)) -- return; -- -- if (!spin_trylock(&input_pool.lock)) -- return; -- -- fast_pool->last = now; -- __mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool)); -- spin_unlock(&input_pool.lock); -- -- fast_pool->count = 0; -- -- /* award one bit for the contents of the fast pool */ -- credit_entropy_bits(1); -+ process_interrupt_randomness_pool(fast_pool); - } - EXPORT_SYMBOL_GPL(add_interrupt_randomness); - diff --git a/patches/0004-printk-get-caller_id-timestamp-after-migration-disab.patch b/patches/0004-printk-get-caller_id-timestamp-after-migration-disab.patch index 784039a646fb..cf30e431c911 100644 --- a/patches/0004-printk-get-caller_id-timestamp-after-migration-disab.patch +++ b/patches/0004-printk-get-caller_id-timestamp-after-migration-disab.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Wed, 10 Nov 2021 17:26:21 +0106 -Subject: [PATCH 04/15] printk: get caller_id/timestamp after migration disable +Date: Fri, 4 Feb 2022 16:01:15 +0106 +Subject: [PATCH 04/16] printk: get caller_id/timestamp after migration disable Currently the local CPU timestamp and caller_id for the record are collected while migration is enabled. Since this information is diff --git a/patches/0004_random_move_the_fast_pool_reset_into_the_caller.patch b/patches/0004_random_move_the_fast_pool_reset_into_the_caller.patch deleted file mode 100644 index b3e5b858ff54..000000000000 --- a/patches/0004_random_move_the_fast_pool_reset_into_the_caller.patch +++ /dev/null @@ -1,71 +0,0 @@ -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Subject: random: Move the fast_pool reset into the caller. -Date: Tue, 07 Dec 2021 13:17:36 +0100 - -The state of the fast_pool (number of added entropy, timestamp of last -addition) is reset after entropy has been consumed. - -Move the reset of the fast_pool into the caller. 
-This is a preparations step to ease PREEMPT_RT support. - -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20211207121737.2347312-5-bigeasy@linutronix.de ---- - drivers/char/random.c | 25 +++++++++++++------------ - 1 file changed, 13 insertions(+), 12 deletions(-) - ---- a/drivers/char/random.c -+++ b/drivers/char/random.c -@@ -1220,31 +1220,29 @@ static u32 get_reg(struct fast_pool *f, - return *ptr; - } - --static void process_interrupt_randomness_pool(struct fast_pool *fast_pool) -+static bool process_interrupt_randomness_pool(struct fast_pool *fast_pool) - { - if (unlikely(crng_init == 0)) { -+ bool pool_reset = false; -+ - if ((fast_pool->count >= 64) && -- crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) { -- fast_pool->count = 0; -- fast_pool->last = jiffies; -- } -- return; -+ crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) -+ pool_reset = true; -+ return pool_reset; - } - - if ((fast_pool->count < 64) && !time_after(jiffies, fast_pool->last + HZ)) -- return; -+ return false; - - if (!spin_trylock(&input_pool.lock)) -- return; -+ return false; - -- fast_pool->last = jiffies; - __mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool)); - spin_unlock(&input_pool.lock); - -- fast_pool->count = 0; -- - /* award one bit for the contents of the fast pool */ - credit_entropy_bits(1); -+ return true; - } - - void add_interrupt_randomness(int irq) -@@ -1270,7 +1268,10 @@ void add_interrupt_randomness(int irq) - fast_mix(fast_pool); - add_interrupt_bench(cycles); - -- process_interrupt_randomness_pool(fast_pool); -+ if (process_interrupt_randomness_pool(fast_pool)) { -+ fast_pool->last = now; -+ fast_pool->count = 0; -+ } - } - EXPORT_SYMBOL_GPL(add_interrupt_randomness); - diff --git a/patches/0005-printk-call-boot_delay_msec-in-printk_delay.patch b/patches/0005-printk-call-boot_delay_msec-in-printk_delay.patch index 1502e7b8a998..a7c1e803a729 100644 --- 
a/patches/0005-printk-call-boot_delay_msec-in-printk_delay.patch +++ b/patches/0005-printk-call-boot_delay_msec-in-printk_delay.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Mon, 30 Nov 2020 01:42:04 +0106 -Subject: [PATCH 05/15] printk: call boot_delay_msec() in printk_delay() +Date: Fri, 4 Feb 2022 16:01:16 +0106 +Subject: [PATCH 05/16] printk: call boot_delay_msec() in printk_delay() boot_delay_msec() is always called immediately before printk_delay() so just call it from within printk_delay(). diff --git a/patches/0005_random_defer_processing_of_randomness_on_preempt_rt.patch b/patches/0005_random_defer_processing_of_randomness_on_preempt_rt.patch deleted file mode 100644 index 9f0d4c359ffd..000000000000 --- a/patches/0005_random_defer_processing_of_randomness_on_preempt_rt.patch +++ /dev/null @@ -1,111 +0,0 @@ -From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Subject: random: Defer processing of randomness on PREEMPT_RT. -Date: Tue, 07 Dec 2021 13:17:37 +0100 - -On interrupt invocation, add_interrupt_randomness() adds entropy to its -per-CPU state and if it gathered enough of it then it will mix it into a -entropy_store. In order to do so, it needs to lock the pool by acquiring -entropy_store::lock which is a spinlock_t. This lock can not be acquired -on PREEMPT_RT with disabled interrupts because it is a sleeping lock. - -This lock could be made a raw_spinlock_t which will then allow to -acquire it with disabled interrupts on PREEMPT_RT. The lock is usually -hold for short amount of cycles while entropy is added to the pool and -the invocation from the IRQ handler has a try-lock which avoids spinning -on the lock if contended. The extraction of entropy (extract_buf()) -needs a few cycles more because it performs additionally few -SHA1 transformations. This takes around 5-10us on a testing box (E5-2650 -32 Cores, 2way NUMA) and is negligible. 
- -The frequent invocation of the IOCTLs RNDADDTOENTCNT and RNDRESEEDCRNG -on multiple CPUs in parallel leads to filling and depletion of the pool -which in turn results in heavy contention on the lock. The spinning with -disabled interrupts on multiple CPUs leads to latencies of at least -100us on the same machine which is no longer acceptable. - -Collect only the IRQ randomness in IRQ-context on PREEMPT_RT. -In threaded-IRQ context, make a copy of the per-CPU state with disabled -interrupts to ensure that it is not modified while duplicated. Pass the -copy to process_interrupt_randomness_pool() and reset the per-CPU -afterwards if needed. - -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20211207121737.2347312-6-bigeasy@linutronix.de ---- - drivers/char/random.c | 34 +++++++++++++++++++++++++++++++++- - include/linux/random.h | 1 + - kernel/irq/manage.c | 3 +++ - 3 files changed, 37 insertions(+), 1 deletion(-) - ---- a/drivers/char/random.c -+++ b/drivers/char/random.c -@@ -1245,6 +1245,32 @@ static bool process_interrupt_randomness - return true; - } - -+#ifdef CONFIG_PREEMPT_RT -+void process_interrupt_randomness(void) -+{ -+ struct fast_pool *cpu_pool; -+ struct fast_pool fast_pool; -+ -+ lockdep_assert_irqs_enabled(); -+ -+ migrate_disable(); -+ cpu_pool = this_cpu_ptr(&irq_randomness); -+ -+ local_irq_disable(); -+ memcpy(&fast_pool, cpu_pool, sizeof(fast_pool)); -+ local_irq_enable(); -+ -+ if (process_interrupt_randomness_pool(&fast_pool)) { -+ local_irq_disable(); -+ cpu_pool->last = jiffies; -+ cpu_pool->count = 0; -+ local_irq_enable(); -+ } -+ memzero_explicit(&fast_pool, sizeof(fast_pool)); -+ migrate_enable(); -+} -+#endif -+ - void add_interrupt_randomness(int irq) - { - struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness); -@@ -1268,7 +1294,13 @@ void add_interrupt_randomness(int irq) - fast_mix(fast_pool); - add_interrupt_bench(cycles); - -- if 
(process_interrupt_randomness_pool(fast_pool)) { -+ /* -+ * On PREEMPT_RT the entropy can not be fed into the input_pool because -+ * it needs to acquire sleeping locks with disabled interrupts. -+ * This is deferred to the threaded handler. -+ */ -+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) && -+ process_interrupt_randomness_pool(fast_pool)) { - fast_pool->last = now; - fast_pool->count = 0; - } ---- a/include/linux/random.h -+++ b/include/linux/random.h -@@ -36,6 +36,7 @@ static inline void add_latent_entropy(vo - extern void add_input_randomness(unsigned int type, unsigned int code, - unsigned int value) __latent_entropy; - extern void add_interrupt_randomness(int irq) __latent_entropy; -+extern void process_interrupt_randomness(void); - - extern void get_random_bytes(void *buf, int nbytes); - extern int wait_for_random_bytes(void); ---- a/kernel/irq/manage.c -+++ b/kernel/irq/manage.c -@@ -1281,6 +1281,9 @@ static int irq_thread(void *data) - if (action_ret == IRQ_WAKE_THREAD) - irq_wake_secondary(desc, action); - -+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) -+ process_interrupt_randomness(); -+ - wake_threads_waitq(desc); - } - diff --git a/patches/0006-printk-refactor-and-rework-printing-logic.patch b/patches/0006-printk-refactor-and-rework-printing-logic.patch index af9f6e544b85..6a5411038a91 100644 --- a/patches/0006-printk-refactor-and-rework-printing-logic.patch +++ b/patches/0006-printk-refactor-and-rework-printing-logic.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Tue, 10 Aug 2021 16:32:52 +0206 -Subject: [PATCH 06/15] printk: refactor and rework printing logic +Date: Fri, 4 Feb 2022 16:01:16 +0106 +Subject: [PATCH 06/16] printk: refactor and rework printing logic Refactor/rework printing logic in order to prepare for moving to threaded console printing. 
diff --git a/patches/0007-printk-move-buffer-definitions-into-console_emit_nex.patch b/patches/0007-printk-move-buffer-definitions-into-console_emit_nex.patch index 74b33b24834a..9851f4c70872 100644 --- a/patches/0007-printk-move-buffer-definitions-into-console_emit_nex.patch +++ b/patches/0007-printk-move-buffer-definitions-into-console_emit_nex.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Mon, 22 Nov 2021 17:04:02 +0106 -Subject: [PATCH 07/15] printk: move buffer definitions into +Date: Fri, 4 Feb 2022 16:01:16 +0106 +Subject: [PATCH 07/16] printk: move buffer definitions into console_emit_next_record() caller Extended consoles print extended messages and do not print messages about diff --git a/patches/0008-printk-add-pr_flush.patch b/patches/0008-printk-add-pr_flush.patch index 972bb9ae69b0..28c790211bc3 100644 --- a/patches/0008-printk-add-pr_flush.patch +++ b/patches/0008-printk-add-pr_flush.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Wed, 15 Dec 2021 18:44:59 +0106 -Subject: [PATCH 08/15] printk: add pr_flush() +Date: Fri, 4 Feb 2022 16:01:16 +0106 +Subject: [PATCH 08/16] printk: add pr_flush() Provide a might-sleep function to allow waiting for threaded console printers to catch up to the latest logged message. diff --git a/patches/0009-printk-add-functions-to-allow-direct-printing.patch b/patches/0009-printk-add-functions-to-allow-direct-printing.patch new file mode 100644 index 000000000000..972888c7915d --- /dev/null +++ b/patches/0009-printk-add-functions-to-allow-direct-printing.patch @@ -0,0 +1,313 @@ +From: John Ogness <john.ogness@linutronix.de> +Date: Fri, 4 Feb 2022 16:01:16 +0106 +Subject: [PATCH 09/16] printk: add functions to allow direct printing + +Once kthread printing is introduced, console printing will no longer +occur in the context of the printk caller. However, there are some +special contexts where it is desirable for the printk caller to +directly print out kernel messages. 
Using pr_flush() to wait for +threaded printers is only possible if the caller is in a sleepable +context and the kthreads are active. That is not always the case. + +Introduce printk_direct_enter() and printk_direct_exit() functions +to explicitly (and globally) activate/deactivate direct console +printing. + +Activate direct printing for: + - sysrq + - emergency reboot/shutdown + - cpu and rcu stalls + - hard and soft lockups + - hung tasks + - stack dumps + +Signed-off-by: John Ogness <john.ogness@linutronix.de> +--- + drivers/tty/sysrq.c | 2 ++ + include/linux/printk.h | 11 +++++++++++ + kernel/hung_task.c | 11 ++++++++++- + kernel/printk/printk.c | 25 +++++++++++++++++++++++++ + kernel/rcu/tree_stall.h | 2 ++ + kernel/reboot.c | 14 +++++++++++++- + kernel/watchdog.c | 4 ++++ + kernel/watchdog_hld.c | 4 ++++ + lib/dump_stack.c | 2 ++ + lib/nmi_backtrace.c | 2 ++ + 10 files changed, 75 insertions(+), 2 deletions(-) + +--- a/drivers/tty/sysrq.c ++++ b/drivers/tty/sysrq.c +@@ -594,9 +594,11 @@ void __handle_sysrq(int key, bool check_ + * should not) and is the invoked operation enabled? 
+ */ + if (!check_mask || sysrq_on_mask(op_p->enable_mask)) { ++ printk_direct_enter(); + pr_info("%s\n", op_p->action_msg); + console_loglevel = orig_log_level; + op_p->handler(key); ++ printk_direct_exit(); + } else { + pr_info("This sysrq operation is disabled.\n"); + console_loglevel = orig_log_level; +--- a/include/linux/printk.h ++++ b/include/linux/printk.h +@@ -170,6 +170,9 @@ extern void __printk_safe_exit(void); + #define printk_deferred_enter __printk_safe_enter + #define printk_deferred_exit __printk_safe_exit + ++extern void printk_direct_enter(void); ++extern void printk_direct_exit(void); ++ + extern bool pr_flush(int timeout_ms, bool reset_on_progress); + + /* +@@ -222,6 +225,14 @@ static inline void printk_deferred_exit( + { + } + ++static inline void printk_direct_enter(void) ++{ ++} ++ ++static inline void printk_direct_exit(void) ++{ ++} ++ + static inline bool pr_flush(int timeout_ms, bool reset_on_progress) + { + return true; +--- a/kernel/hung_task.c ++++ b/kernel/hung_task.c +@@ -127,6 +127,8 @@ static void check_hung_task(struct task_ + * complain: + */ + if (sysctl_hung_task_warnings) { ++ printk_direct_enter(); ++ + if (sysctl_hung_task_warnings > 0) + sysctl_hung_task_warnings--; + pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n", +@@ -142,6 +144,8 @@ static void check_hung_task(struct task_ + + if (sysctl_hung_task_all_cpu_backtrace) + hung_task_show_all_bt = true; ++ ++ printk_direct_exit(); + } + + touch_nmi_watchdog(); +@@ -204,12 +208,17 @@ static void check_hung_uninterruptible_t + } + unlock: + rcu_read_unlock(); +- if (hung_task_show_lock) ++ if (hung_task_show_lock) { ++ printk_direct_enter(); + debug_show_all_locks(); ++ printk_direct_exit(); ++ } + + if (hung_task_show_all_bt) { + hung_task_show_all_bt = false; ++ printk_direct_enter(); + trigger_all_cpu_backtrace(); ++ printk_direct_exit(); + } + + if (hung_task_call_panic) +--- a/kernel/printk/printk.c ++++ b/kernel/printk/printk.c +@@ -349,6 +349,31 @@ static 
int console_msg_format = MSG_FORM + static DEFINE_MUTEX(syslog_lock); + + #ifdef CONFIG_PRINTK ++static atomic_t printk_direct = ATOMIC_INIT(0); ++ ++/** ++ * printk_direct_enter - cause console printing to occur in the context of ++ * printk() callers ++ * ++ * This globally effects all printk() callers. ++ * ++ * Context: Any context. ++ */ ++void printk_direct_enter(void) ++{ ++ atomic_inc(&printk_direct); ++} ++ ++/** ++ * printk_direct_exit - restore console printing behavior from direct ++ * ++ * Context: Any context. ++ */ ++void printk_direct_exit(void) ++{ ++ atomic_dec(&printk_direct); ++} ++ + DECLARE_WAIT_QUEUE_HEAD(log_wait); + /* All 3 protected by @syslog_lock. */ + /* the next printk record to read by syslog(READ) or /proc/kmsg */ +--- a/kernel/rcu/tree_stall.h ++++ b/kernel/rcu/tree_stall.h +@@ -587,6 +587,7 @@ static void print_cpu_stall(unsigned lon + * See Documentation/RCU/stallwarn.rst for info on how to debug + * RCU CPU stall warnings. + */ ++ printk_direct_enter(); + trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected")); + pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name); + raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags); +@@ -621,6 +622,7 @@ static void print_cpu_stall(unsigned lon + */ + set_tsk_need_resched(current); + set_preempt_need_resched(); ++ printk_direct_exit(); + } + + static void check_cpu_stall(struct rcu_data *rdp) +--- a/kernel/reboot.c ++++ b/kernel/reboot.c +@@ -447,9 +447,11 @@ static int __orderly_reboot(void) + ret = run_cmd(reboot_cmd); + + if (ret) { ++ printk_direct_enter(); + pr_warn("Failed to start orderly reboot: forcing the issue\n"); + emergency_sync(); + kernel_restart(NULL); ++ printk_direct_exit(); + } + + return ret; +@@ -462,6 +464,7 @@ static int __orderly_poweroff(bool force + ret = run_cmd(poweroff_cmd); + + if (ret && force) { ++ printk_direct_enter(); + pr_warn("Failed to start orderly shutdown: forcing the issue\n"); + + /* +@@ -471,6 +474,7 @@ static int 
__orderly_poweroff(bool force + */ + emergency_sync(); + kernel_power_off(); ++ printk_direct_exit(); + } + + return ret; +@@ -528,6 +532,8 @@ EXPORT_SYMBOL_GPL(orderly_reboot); + */ + static void hw_failure_emergency_poweroff_func(struct work_struct *work) + { ++ printk_direct_enter(); ++ + /* + * We have reached here after the emergency shutdown waiting period has + * expired. This means orderly_poweroff has not been able to shut off +@@ -544,6 +550,8 @@ static void hw_failure_emergency_powerof + */ + pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n"); + emergency_restart(); ++ ++ printk_direct_exit(); + } + + static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work, +@@ -582,11 +590,13 @@ void hw_protection_shutdown(const char * + { + static atomic_t allow_proceed = ATOMIC_INIT(1); + ++ printk_direct_enter(); ++ + pr_emerg("HARDWARE PROTECTION shutdown (%s)\n", reason); + + /* Shutdown should be initiated only once. */ + if (!atomic_dec_and_test(&allow_proceed)) +- return; ++ goto out; + + /* + * Queue a backup emergency shutdown in the event of +@@ -594,6 +604,8 @@ void hw_protection_shutdown(const char * + */ + hw_failure_emergency_poweroff(ms_until_forced); + orderly_poweroff(true); ++out: ++ printk_direct_exit(); + } + EXPORT_SYMBOL_GPL(hw_protection_shutdown); + +--- a/kernel/watchdog.c ++++ b/kernel/watchdog.c +@@ -424,6 +424,8 @@ static enum hrtimer_restart watchdog_tim + /* Start period for the next softlockup warning. */ + update_report_ts(); + ++ printk_direct_enter(); ++ + pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! 
[%s:%d]\n", + smp_processor_id(), duration, + current->comm, task_pid_nr(current)); +@@ -442,6 +444,8 @@ static enum hrtimer_restart watchdog_tim + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK); + if (softlockup_panic) + panic("softlockup: hung tasks"); ++ ++ printk_direct_exit(); + } + + return HRTIMER_RESTART; +--- a/kernel/watchdog_hld.c ++++ b/kernel/watchdog_hld.c +@@ -135,6 +135,8 @@ static void watchdog_overflow_callback(s + if (__this_cpu_read(hard_watchdog_warn) == true) + return; + ++ printk_direct_enter(); ++ + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", + this_cpu); + print_modules(); +@@ -155,6 +157,8 @@ static void watchdog_overflow_callback(s + if (hardlockup_panic) + nmi_panic(regs, "Hard LOCKUP"); + ++ printk_direct_exit(); ++ + __this_cpu_write(hard_watchdog_warn, true); + return; + } +--- a/lib/dump_stack.c ++++ b/lib/dump_stack.c +@@ -102,9 +102,11 @@ asmlinkage __visible void dump_stack_lvl + * Permit this cpu to perform nested stack dumps while serialising + * against other CPUs + */ ++ printk_direct_enter(); + printk_cpu_sync_get_irqsave(flags); + __dump_stack(log_lvl); + printk_cpu_sync_put_irqrestore(flags); ++ printk_direct_exit(); + } + EXPORT_SYMBOL(dump_stack_lvl); + +--- a/lib/nmi_backtrace.c ++++ b/lib/nmi_backtrace.c +@@ -99,6 +99,7 @@ bool nmi_cpu_backtrace(struct pt_regs *r + * Allow nested NMI backtraces while serializing + * against other CPUs. 
+ */ ++ printk_direct_enter(); + printk_cpu_sync_get_irqsave(flags); + if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) { + pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n", +@@ -111,6 +112,7 @@ bool nmi_cpu_backtrace(struct pt_regs *r + dump_stack(); + } + printk_cpu_sync_put_irqrestore(flags); ++ printk_direct_exit(); + cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask)); + return true; + } diff --git a/patches/0009-printk-add-kthread-console-printers.patch b/patches/0010-printk-add-kthread-console-printers.patch index 1fb69d5c3dd4..4b429e670f8d 100644 --- a/patches/0009-printk-add-kthread-console-printers.patch +++ b/patches/0010-printk-add-kthread-console-printers.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Mon, 13 Dec 2021 21:22:17 +0106 -Subject: [PATCH 09/15] printk: add kthread console printers +Date: Fri, 4 Feb 2022 16:01:17 +0106 +Subject: [PATCH 10/16] printk: add kthread console printers Create a kthread for each console to perform console printing. During normal operation (@system_state == SYSTEM_RUNNING), the kthread @@ -15,11 +15,10 @@ Console printers synchronize against each other and against console lockers by taking the console lock for each message that is printed. 
Signed-off-by: John Ogness <john.ogness@linutronix.de> -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- include/linux/console.h | 2 - kernel/printk/printk.c | 157 +++++++++++++++++++++++++++++++++++++++++++++++- - 2 files changed, 157 insertions(+), 2 deletions(-) + kernel/printk/printk.c | 159 +++++++++++++++++++++++++++++++++++++++++++++++- + 2 files changed, 159 insertions(+), 2 deletions(-) --- a/include/linux/console.h +++ b/include/linux/console.h @@ -34,7 +33,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> }; --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c -@@ -348,6 +348,20 @@ static int console_msg_format = MSG_FORM +@@ -348,6 +348,13 @@ static int console_msg_format = MSG_FORM /* syslog_lock protects syslog_* variables and write access to clear_seq. */ static DEFINE_MUTEX(syslog_lock); @@ -45,26 +44,34 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> + */ +static bool kthreads_started; + -+static inline bool kthread_printers_active(void) + #ifdef CONFIG_PRINTK + static atomic_t printk_direct = ATOMIC_INIT(0); + +@@ -374,6 +381,14 @@ void printk_direct_exit(void) + atomic_dec(&printk_direct); + } + ++static inline bool allow_direct_printing(void) +{ -+ return (kthreads_started && -+ system_state == SYSTEM_RUNNING && -+ !oops_in_progress); ++ return (!kthreads_started || ++ system_state != SYSTEM_RUNNING || ++ oops_in_progress || ++ atomic_read(&printk_direct)); +} + - #ifdef CONFIG_PRINTK DECLARE_WAIT_QUEUE_HEAD(log_wait); /* All 3 protected by @syslog_lock. */ -@@ -2201,7 +2215,7 @@ asmlinkage int vprintk_emit(int facility + /* the next printk record to read by syslog(READ) or /proc/kmsg */ +@@ -2226,7 +2241,7 @@ asmlinkage int vprintk_emit(int facility printed_len = vprintk_store(facility, level, dev_info, fmt, args); /* If called from the scheduler, we can not call up(). 
*/ - if (!in_sched) { -+ if (!in_sched && !kthread_printers_active()) { ++ if (!in_sched && allow_direct_printing()) { /* * Disable preemption to avoid being preempted while holding * console_sem which would prevent anyone from printing to -@@ -2242,6 +2256,8 @@ asmlinkage __visible int _printk(const c +@@ -2267,6 +2282,8 @@ asmlinkage __visible int _printk(const c } EXPORT_SYMBOL(_printk); @@ -73,15 +80,16 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> #else /* CONFIG_PRINTK */ #define CONSOLE_LOG_MAX 0 -@@ -2273,6 +2289,7 @@ static void call_console_driver(struct c +@@ -2298,6 +2315,8 @@ static void call_console_driver(struct c char *dropped_text) {} static bool suppress_message_printing(int level) { return false; } static void printk_delay(int level) {} +static void start_printk_kthread(struct console *con) {} ++static bool allow_direct_printing(void) { return true; } #endif /* CONFIG_PRINTK */ -@@ -2461,6 +2478,10 @@ void resume_console(void) +@@ -2486,6 +2505,10 @@ void resume_console(void) down_console_sem(); console_suspended = 0; console_unlock(); @@ -92,18 +100,18 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> pr_flush(1000, true); } -@@ -2676,6 +2697,10 @@ static bool console_flush_all(bool do_co +@@ -2701,6 +2724,10 @@ static bool console_flush_all(bool do_co *handover = false; do { + /* Let the kthread printers do the work if they can. 
*/ -+ if (kthread_printers_active()) -+ return false; ++ if (!allow_direct_printing()) ++ break; + any_progress = false; for_each_console(con) { -@@ -2884,6 +2909,10 @@ void console_start(struct console *conso +@@ -2909,6 +2936,10 @@ void console_start(struct console *conso console_lock(); console->flags |= CON_ENABLED; console_unlock(); @@ -114,7 +122,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> pr_flush(1000, true); } EXPORT_SYMBOL(console_start); -@@ -3088,6 +3117,8 @@ void register_console(struct console *ne +@@ -3113,6 +3144,8 @@ void register_console(struct console *ne /* Begin with next message. */ newcon->seq = prb_next_seq(prb); } @@ -123,7 +131,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> console_unlock(); console_sysfs_notify(); -@@ -3144,6 +3175,11 @@ int unregister_console(struct console *c +@@ -3169,6 +3202,11 @@ int unregister_console(struct console *c } } @@ -135,7 +143,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> if (res) goto out_disable_unlock; -@@ -3250,6 +3286,13 @@ static int __init printk_late_init(void) +@@ -3275,6 +3313,13 @@ static int __init printk_late_init(void) console_cpu_notify, NULL); WARN_ON(ret < 0); printk_sysctl_init(); @@ -149,7 +157,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> return 0; } late_initcall(printk_late_init); -@@ -3320,6 +3363,116 @@ bool pr_flush(int timeout_ms, bool reset +@@ -3345,6 +3390,116 @@ bool pr_flush(int timeout_ms, bool reset } EXPORT_SYMBOL(pr_flush); @@ -266,7 +274,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> /* * Delayed printk version, for scheduler-internal messages: */ -@@ -3339,7 +3492,7 @@ static void wake_up_klogd_work_func(stru +@@ -3364,7 +3519,7 @@ static void wake_up_klogd_work_func(stru } if (pending & PRINTK_PENDING_WAKEUP) diff --git a/patches/0010-printk-reimplement-console_lock-for-proper-kthread-s.patch 
b/patches/0011-printk-reimplement-console_lock-for-proper-kthread-s.patch index 71809f6aece9..350dcad42742 100644 --- a/patches/0010-printk-reimplement-console_lock-for-proper-kthread-s.patch +++ b/patches/0011-printk-reimplement-console_lock-for-proper-kthread-s.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Mon, 13 Dec 2021 21:24:23 +0106 -Subject: [PATCH 10/15] printk: reimplement console_lock for proper kthread +Date: Fri, 4 Feb 2022 16:01:17 +0106 +Subject: [PATCH 11/16] printk: reimplement console_lock for proper kthread support With non-threaded console printers preemption is disabled while @@ -196,7 +196,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> * This is used for debugging the mess that is the VT code by * keeping track if we have the console semaphore held. It's * definitely not the perfect debug tool (we don't know if _WE_ -@@ -2478,10 +2529,6 @@ void resume_console(void) +@@ -2505,10 +2556,6 @@ void resume_console(void) down_console_sem(); console_suspended = 0; console_unlock(); @@ -207,7 +207,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> pr_flush(1000, true); } -@@ -2519,6 +2566,7 @@ void console_lock(void) +@@ -2546,6 +2593,7 @@ void console_lock(void) down_console_sem(); if (console_suspended) return; @@ -215,7 +215,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> console_locked = 1; console_may_schedule = 1; } -@@ -2540,15 +2588,45 @@ int console_trylock(void) +@@ -2567,15 +2615,45 @@ int console_trylock(void) up_console_sem(); return 0; } @@ -262,7 +262,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } EXPORT_SYMBOL(is_console_locked); -@@ -2582,6 +2660,19 @@ static inline bool console_is_usable(str +@@ -2609,6 +2687,19 @@ static inline bool console_is_usable(str static void __console_unlock(void) { console_locked = 0; @@ -282,7 +282,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> up_console_sem(); } -@@ 
-2604,7 +2695,8 @@ static void __console_unlock(void) +@@ -2631,7 +2722,8 @@ static void __console_unlock(void) * * @handover will be set to true if a printk waiter has taken over the * console_lock, in which case the caller is no longer holding the @@ -292,7 +292,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> */ static bool console_emit_next_record(struct console *con, char *text, char *ext_text, char *dropped_text, bool *handover) -@@ -2612,12 +2704,14 @@ static bool console_emit_next_record(str +@@ -2639,12 +2731,14 @@ static bool console_emit_next_record(str struct printk_info info; struct printk_record r; unsigned long flags; @@ -308,7 +308,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> if (!prb_read_valid(prb, con->seq, &r)) return false; -@@ -2643,18 +2737,23 @@ static bool console_emit_next_record(str +@@ -2670,18 +2764,23 @@ static bool console_emit_next_record(str len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time); } @@ -344,7 +344,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> stop_critical_timings(); /* don't trace print latency */ call_console_driver(con, write_text, len, dropped_text); -@@ -2662,8 +2761,10 @@ static bool console_emit_next_record(str +@@ -2689,8 +2788,10 @@ static bool console_emit_next_record(str con->seq++; @@ -357,7 +357,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> printk_delay(r.info->level); skip: -@@ -2797,7 +2898,7 @@ void console_unlock(void) +@@ -2824,7 +2925,7 @@ void console_unlock(void) * Re-check if there is a new record to flush. If the trylock * fails, another context is already handling the printing. 
*/ @@ -366,7 +366,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } EXPORT_SYMBOL(console_unlock); -@@ -2828,6 +2929,10 @@ void console_unblank(void) +@@ -2855,6 +2956,10 @@ void console_unblank(void) if (oops_in_progress) { if (down_trylock_console_sem() != 0) return; @@ -377,7 +377,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } else { pr_flush(1000, true); console_lock(); -@@ -2909,10 +3014,6 @@ void console_start(struct console *conso +@@ -2936,10 +3041,6 @@ void console_start(struct console *conso console_lock(); console->flags |= CON_ENABLED; console_unlock(); @@ -388,7 +388,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> pr_flush(1000, true); } EXPORT_SYMBOL(console_start); -@@ -3107,7 +3208,11 @@ void register_console(struct console *ne +@@ -3134,7 +3235,11 @@ void register_console(struct console *ne if (newcon->flags & CON_EXTENDED) nr_ext_console_drivers++; @@ -400,7 +400,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> if (newcon->flags & CON_PRINTBUFFER) { /* Get a consistent copy of @syslog_seq. 
*/ mutex_lock(&syslog_lock); -@@ -3370,16 +3475,17 @@ static bool printer_should_wake(struct c +@@ -3397,16 +3502,17 @@ static bool printer_should_wake(struct c if (kthread_should_stop()) return true; @@ -422,7 +422,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> return prb_read_valid(prb, seq, NULL); } -@@ -3390,7 +3496,6 @@ static int printk_kthread_func(void *dat +@@ -3417,7 +3523,6 @@ static int printk_kthread_func(void *dat char *dropped_text = NULL; char *ext_text = NULL; bool progress; @@ -430,7 +430,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> u64 seq = 0; char *text; int error; -@@ -3423,9 +3528,17 @@ static int printk_kthread_func(void *dat +@@ -3450,9 +3555,17 @@ static int printk_kthread_func(void *dat continue; do { @@ -451,7 +451,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> break; } -@@ -3439,14 +3552,13 @@ static int printk_kthread_func(void *dat +@@ -3466,14 +3579,13 @@ static int printk_kthread_func(void *dat */ console_may_schedule = 0; progress = console_emit_next_record(con, text, ext_text, diff --git a/patches/0011-printk-remove-console_locked.patch b/patches/0012-printk-remove-console_locked.patch index bc829e47705a..ad98f119ae36 100644 --- a/patches/0011-printk-remove-console_locked.patch +++ b/patches/0012-printk-remove-console_locked.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Fri, 17 Dec 2021 12:29:13 +0106 -Subject: [PATCH 11/15] printk: remove @console_locked +Date: Fri, 4 Feb 2022 16:01:17 +0106 +Subject: [PATCH 12/16] printk: remove @console_locked The static global variable @console_locked is used to help debug VT code to make sure that certain code paths are running with the @@ -37,7 +37,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> /* * Array of consoles built from command line options (console=) -@@ -2567,7 +2559,6 @@ void console_lock(void) +@@ -2594,7 +2586,6 @@ void console_lock(void) if 
(console_suspended) return; pause_all_consoles(); @@ -45,7 +45,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> console_may_schedule = 1; } EXPORT_SYMBOL(console_lock); -@@ -2592,7 +2583,6 @@ int console_trylock(void) +@@ -2619,7 +2610,6 @@ int console_trylock(void) up_console_sem(); return 0; } @@ -53,7 +53,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> console_may_schedule = 0; return 1; } -@@ -2619,14 +2609,25 @@ static int console_trylock_sched(bool ma +@@ -2646,14 +2636,25 @@ static int console_trylock_sched(bool ma return 0; } pause_all_consoles(); @@ -81,7 +81,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } EXPORT_SYMBOL(is_console_locked); -@@ -2659,8 +2660,6 @@ static inline bool console_is_usable(str +@@ -2686,8 +2687,6 @@ static inline bool console_is_usable(str static void __console_unlock(void) { @@ -90,7 +90,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> /* * Depending on whether console_lock() or console_trylock() was used, * appropriately allow the kthread printers to continue. -@@ -2938,7 +2937,6 @@ void console_unblank(void) +@@ -2965,7 +2964,6 @@ void console_unblank(void) console_lock(); } diff --git a/patches/0012-console-introduce-CON_MIGHT_SLEEP-for-vt.patch b/patches/0013-console-introduce-CON_MIGHT_SLEEP-for-vt.patch index 4d6ec19a94c0..d6746a857504 100644 --- a/patches/0012-console-introduce-CON_MIGHT_SLEEP-for-vt.patch +++ b/patches/0013-console-introduce-CON_MIGHT_SLEEP-for-vt.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Thu, 16 Dec 2021 16:06:29 +0106 -Subject: [PATCH 12/15] console: introduce CON_MIGHT_SLEEP for vt +Date: Fri, 4 Feb 2022 16:01:17 +0106 +Subject: [PATCH 13/16] console: introduce CON_MIGHT_SLEEP for vt Deadlocks and the framebuffer console have been a recurring issue that is getting worse. 
Daniel Vetter suggested [0] that @@ -43,7 +43,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> char name[16]; --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c -@@ -2808,6 +2808,8 @@ static bool console_flush_all(bool do_co +@@ -2835,6 +2835,8 @@ static bool console_flush_all(bool do_co if (!console_is_usable(con)) continue; diff --git a/patches/0013-printk-add-infrastucture-for-atomic-consoles.patch b/patches/0014-printk-add-infrastucture-for-atomic-consoles.patch index b90c9d440f26..d8df9dd2c209 100644 --- a/patches/0013-printk-add-infrastucture-for-atomic-consoles.patch +++ b/patches/0014-printk-add-infrastucture-for-atomic-consoles.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Wed, 22 Dec 2021 13:44:40 +0106 -Subject: [PATCH 13/15] printk: add infrastucture for atomic consoles +Date: Fri, 4 Feb 2022 16:01:17 +0106 +Subject: [PATCH 14/16] printk: add infrastucture for atomic consoles Many times it is not possible to see the console output on panic because printing threads cannot be scheduled and/or the @@ -118,7 +118,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> #include <linux/sched/clock.h> #include <linux/sched/debug.h> #include <linux/sched/task_stack.h> -@@ -1942,21 +1943,30 @@ static int console_trylock_spinning(void +@@ -1968,21 +1969,30 @@ static int console_trylock_spinning(void * dropped, a dropped message will be written out first. 
*/ static void call_console_driver(struct console *con, const char *text, size_t len, @@ -155,7 +155,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } /* -@@ -2299,6 +2309,76 @@ asmlinkage __visible int _printk(const c +@@ -2325,6 +2335,76 @@ asmlinkage __visible int _printk(const c } EXPORT_SYMBOL(_printk); @@ -232,7 +232,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> static void start_printk_kthread(struct console *con); #else /* CONFIG_PRINTK */ -@@ -2311,6 +2391,8 @@ static void start_printk_kthread(struct +@@ -2337,6 +2417,8 @@ static void start_printk_kthread(struct #define prb_first_valid_seq(rb) 0 #define prb_next_seq(rb) 0 @@ -241,7 +241,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> static u64 syslog_seq; static size_t record_print_text(const struct printk_record *r, -@@ -2329,7 +2411,7 @@ static ssize_t msg_print_ext_body(char * +@@ -2355,7 +2437,7 @@ static ssize_t msg_print_ext_body(char * static void console_lock_spinning_enable(void) { } static int console_lock_spinning_disable_and_check(void) { return 0; } static void call_console_driver(struct console *con, const char *text, size_t len, @@ -250,7 +250,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> static bool suppress_message_printing(int level) { return false; } static void printk_delay(int level) {} static void start_printk_kthread(struct console *con) {} -@@ -2637,13 +2719,23 @@ EXPORT_SYMBOL(is_console_locked); +@@ -2664,13 +2746,23 @@ EXPORT_SYMBOL(is_console_locked); * * Requires the console_lock. 
*/ @@ -268,15 +268,15 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> + if (!con->atomic_data) + return false; +#else -+ return false; + return false; +#endif + } else if (!con->write) { - return false; ++ return false; + } /* * Console drivers may assume that per-cpu resources have been -@@ -2675,6 +2767,66 @@ static void __console_unlock(void) +@@ -2702,6 +2794,66 @@ static void __console_unlock(void) up_console_sem(); } @@ -343,7 +343,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> /* * Print one record for the given console. The record printed is whatever * record is the next available record for the given console. -@@ -2687,6 +2839,8 @@ static void __console_unlock(void) +@@ -2714,6 +2866,8 @@ static void __console_unlock(void) * If dropped messages should be printed, @dropped_text is a buffer of size * DROPPED_TEXT_MAX. Otherise @dropped_text must be NULL. * @@ -352,7 +352,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> * Requires the console_lock. * * Returns false if the given console has no next record to print, otherwise -@@ -2698,7 +2852,8 @@ static void __console_unlock(void) +@@ -2725,7 +2879,8 @@ static void __console_unlock(void) * to disable allowing the console_lock to be taken over by a printk waiter. 
*/ static bool console_emit_next_record(struct console *con, char *text, char *ext_text, @@ -362,7 +362,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> { struct printk_info info; struct printk_record r; -@@ -2706,23 +2861,27 @@ static bool console_emit_next_record(str +@@ -2733,23 +2888,27 @@ static bool console_emit_next_record(str bool allow_handover; char *write_text; size_t len; @@ -395,7 +395,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> goto skip; } -@@ -2755,10 +2914,10 @@ static bool console_emit_next_record(str +@@ -2782,10 +2941,10 @@ static bool console_emit_next_record(str } stop_critical_timings(); /* don't trace print latency */ @@ -408,7 +408,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> if (allow_handover) { *handover = console_lock_spinning_disable_and_check(); -@@ -2806,7 +2965,7 @@ static bool console_flush_all(bool do_co +@@ -2833,7 +2992,7 @@ static bool console_flush_all(bool do_co for_each_console(con) { bool progress; @@ -417,7 +417,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> continue; if ((con->flags & CON_MIGHT_SLEEP) && !do_cond_resched) continue; -@@ -2816,11 +2975,11 @@ static bool console_flush_all(bool do_co +@@ -2843,11 +3002,11 @@ static bool console_flush_all(bool do_co /* Extended consoles do not print "dropped messages". 
*/ progress = console_emit_next_record(con, &text[0], &ext_text[0], NULL, @@ -431,7 +431,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } if (*handover) return true; -@@ -2841,6 +3000,67 @@ static bool console_flush_all(bool do_co +@@ -2868,6 +3027,67 @@ static bool console_flush_all(bool do_co return any_usable; } @@ -499,7 +499,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> /** * console_unlock - unlock the console system * -@@ -2954,6 +3174,11 @@ void console_unblank(void) +@@ -2981,6 +3201,11 @@ void console_unblank(void) */ void console_flush_on_panic(enum con_flush_mode mode) { @@ -511,7 +511,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> /* * If someone else is holding the console lock, trylock will fail * and may_schedule may be set. Ignore and proceed to unlock so -@@ -2970,7 +3195,7 @@ void console_flush_on_panic(enum con_flu +@@ -2997,7 +3222,7 @@ void console_flush_on_panic(enum con_flu seq = prb_first_valid_seq(prb); for_each_console(c) @@ -520,7 +520,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } console_unlock(); } -@@ -3211,16 +3436,19 @@ void register_console(struct console *ne +@@ -3238,16 +3463,19 @@ void register_console(struct console *ne if (consoles_paused) newcon->flags |= CON_PAUSED; @@ -543,7 +543,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } if (kthreads_started) start_printk_kthread(newcon); -@@ -3302,6 +3530,10 @@ int unregister_console(struct console *c +@@ -3329,6 +3557,10 @@ int unregister_console(struct console *c console_unlock(); console_sysfs_notify(); @@ -554,7 +554,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> if (console->exit) res = console->exit(console); -@@ -3436,7 +3668,7 @@ bool pr_flush(int timeout_ms, bool reset +@@ -3463,7 +3695,7 @@ bool pr_flush(int timeout_ms, bool reset console_lock(); for_each_console(con) { @@ -563,7 +563,7 @@ Signed-off-by: Sebastian Andrzej Siewior 
<bigeasy@linutronix.de> continue; printk_seq = con->seq; if (printk_seq < seq) -@@ -3504,6 +3736,11 @@ static int printk_kthread_func(void *dat +@@ -3531,6 +3763,11 @@ static int printk_kthread_func(void *dat (con->flags & CON_BOOT) ? "boot" : "", con->name, con->index); @@ -575,7 +575,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> text = kmalloc(CONSOLE_LOG_MAX, GFP_KERNEL); if (!text) goto out; -@@ -3532,7 +3769,7 @@ static int printk_kthread_func(void *dat +@@ -3559,7 +3796,7 @@ static int printk_kthread_func(void *dat if (error) break; @@ -584,7 +584,7 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> mutex_unlock(&con->lock); break; } -@@ -3552,7 +3789,7 @@ static int printk_kthread_func(void *dat +@@ -3579,7 +3816,7 @@ static int printk_kthread_func(void *dat */ console_may_schedule = 0; progress = console_emit_next_record(con, text, ext_text, diff --git a/patches/0014-serial-8250-implement-write_atomic.patch b/patches/0015-serial-8250-implement-write_atomic.patch index 42af2d2ed00a..9193ef05bf2e 100644 --- a/patches/0014-serial-8250-implement-write_atomic.patch +++ b/patches/0015-serial-8250-implement-write_atomic.patch @@ -1,6 +1,6 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Mon, 30 Nov 2020 01:42:02 +0106 -Subject: [PATCH 14/15] serial: 8250: implement write_atomic +Date: Fri, 4 Feb 2022 16:01:17 +0106 +Subject: [PATCH 15/16] serial: 8250: implement write_atomic Implement a non-sleeping NMI-safe write_atomic() console function in order to support atomic console printing during a panic. 
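The recurring pattern in the delta above — bracketing emergency output with `printk_direct_enter()`/`printk_direct_exit()` and checking the counter in `allow_direct_printing()` — is a plain global atomic counter. A minimal user-space sketch of that gating logic follows; the names mirror the patch, but it is simplified: `kthreads_started` and `oops_in_progress` are ordinary parameters here (in the kernel they are global state, and the real check also consults `system_state`):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the printk_direct counter added by the delta: any context
 * may increment it to force printk() callers to print to the consoles
 * directly instead of deferring to the per-console kthreads. */
static atomic_int printk_direct;

static void printk_direct_enter(void)
{
	atomic_fetch_add(&printk_direct, 1);
}

static void printk_direct_exit(void)
{
	atomic_fetch_sub(&printk_direct, 1);
}

/* Mirrors allow_direct_printing(): direct printing is used until the
 * kthreads are started, while an oops is in progress, or while any
 * caller holds the direct counter. */
static bool allow_direct_printing(bool kthreads_started,
				  bool oops_in_progress)
{
	return !kthreads_started || oops_in_progress ||
	       atomic_load(&printk_direct) > 0;
}
```

A counter rather than a boolean is deliberate: direct-printing sections can nest (e.g. a hung-task warning firing while an RCU stall dump is already in direct mode), and only the outermost `printk_direct_exit()` may hand printing back to the kthreads.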
diff --git a/patches/0015-printk-avoid-preempt_disable-for-PREEMPT_RT.patch b/patches/0016-printk-avoid-preempt_disable-for-PREEMPT_RT.patch index dc615e9f661f..d0b421420b18 100644 --- a/patches/0015-printk-avoid-preempt_disable-for-PREEMPT_RT.patch +++ b/patches/0016-printk-avoid-preempt_disable-for-PREEMPT_RT.patch @@ -1,48 +1,43 @@ From: John Ogness <john.ogness@linutronix.de> -Date: Thu, 20 Jan 2022 16:53:56 +0106 -Subject: [PATCH 15/15] printk: avoid preempt_disable() for PREEMPT_RT +Date: Fri, 4 Feb 2022 16:01:17 +0106 +Subject: [PATCH 16/16] printk: avoid preempt_disable() for PREEMPT_RT During non-normal operation, printk() calls will attempt to write the messages directly to the consoles. This involves using console_trylock() to acquire @console_sem. -Since commit fd5f7cde1b85 ("printk: Never set -console_may_schedule in console_trylock()"), preemption is -disabled while directly printing to the consoles in order to -ensure that the printing task is not scheduled away while -holding @console_sem. +Preemption is disabled while directly printing to the consoles +in order to ensure that the printing task is not scheduled away +while holding @console_sem, thus blocking all other printers +and causing delays in printing. -On PREEMPT_RT systems, disabling preemption here is not allowed -because console drivers will acquire spin locks (which under -PREEMPT_RT is an rtmutex). +Commit fd5f7cde1b85 ("printk: Never set console_may_schedule in +console_trylock()") specifically reverted a previous attempt at +allowing preemption while printing. -For normal operation, direct printing is not used. In a panic -scenario, atomic consoles and spinlock busting are used to -handle direct printing. So the usefulness of disabling -preemption here is really restricted to early boot. - -For PREEMPT_RT systems, do not disable preemption during direct -console printing. This also means that console handovers cannot -take place. 
Console handovers are also something that is really -restricted to early boot. +However, on PREEMPT_RT systems, disabling preemption while +printing is not allowed because console drivers typically +acquire a spin lock (which under PREEMPT_RT is an rtmutex). +Since direct printing is only used during early boot and +non-panic dumps, the risks of delayed print output for these +scenarios will be accepted under PREEMPT_RT. Signed-off-by: John Ogness <john.ogness@linutronix.de> -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- kernel/printk/printk.c | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c -@@ -1873,6 +1873,7 @@ static int console_lock_spinning_disable +@@ -1899,6 +1899,7 @@ static int console_lock_spinning_disable return 1; } -+#if (!IS_ENABLED(CONFIG_PREEMPT_RT)) ++#if !IS_ENABLED(CONFIG_PREEMPT_RT) /** * console_trylock_spinning - try to get console_lock by busy waiting * -@@ -1936,6 +1937,7 @@ static int console_trylock_spinning(void +@@ -1962,6 +1963,7 @@ static int console_trylock_spinning(void return 1; } @@ -50,14 +45,14 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> /* * Call the specified console driver, asking it to write out the specified -@@ -2270,19 +2272,31 @@ asmlinkage int vprintk_emit(int facility +@@ -2296,19 +2298,31 @@ asmlinkage int vprintk_emit(int facility /* If called from the scheduler, we can not call up(). */ - if (!in_sched && !kthread_printers_active()) { + if (!in_sched && allow_direct_printing()) { /* + * Try to acquire and then immediately release the console + * semaphore. The release will print out buffers. + */ -+#if (IS_ENABLED(CONFIG_PREEMPT_RT)) ++#if IS_ENABLED(CONFIG_PREEMPT_RT) + /* + * Use the non-spinning trylock since PREEMPT_RT does not + * support console lock handovers. 
@@ -87,11 +82,11 @@ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> } wake_up_klogd(); -@@ -2895,8 +2909,13 @@ static bool console_emit_next_record(str +@@ -2922,8 +2936,13 @@ static bool console_emit_next_record(str len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time); } -+#if (IS_ENABLED(CONFIG_PREEMPT_RT)) ++#if IS_ENABLED(CONFIG_PREEMPT_RT) + /* PREEMPT_RT does not support console lock handovers. */ + allow_handover = false; +#else diff --git a/patches/ARM__Allow_to_enable_RT.patch b/patches/ARM__Allow_to_enable_RT.patch index 89993cc53b1a..ad7dffb7b4cb 100644 --- a/patches/ARM__Allow_to_enable_RT.patch +++ b/patches/ARM__Allow_to_enable_RT.patch @@ -24,7 +24,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF select ARCH_USE_MEMTEST -@@ -125,6 +126,7 @@ config ARM +@@ -126,6 +127,7 @@ config ARM select OLD_SIGSUSPEND3 select PCI_SYSCALL if PCI select PERF_USE_VMALLOC diff --git a/patches/Add_localversion_for_-RT_release.patch b/patches/Add_localversion_for_-RT_release.patch index 53b69a97ca19..41fc0b58e69e 100644 --- a/patches/Add_localversion_for_-RT_release.patch +++ b/patches/Add_localversion_for_-RT_release.patch @@ -15,4 +15,4 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- /dev/null +++ b/localversion-rt @@ -0,0 +1 @@ -+-rt3 ++-rt4 diff --git a/patches/arm64_mm_make_arch_faults_on_old_pte_check_for_migratability.patch b/patches/arm64-mm-Make-arch_faults_on_old_pte-check-for-migra.patch index 01b100a09d25..22f2191261e1 100644 --- a/patches/arm64_mm_make_arch_faults_on_old_pte_check_for_migratability.patch +++ b/patches/arm64-mm-Make-arch_faults_on_old_pte-check-for-migra.patch @@ -1,6 +1,7 @@ From: Valentin Schneider <valentin.schneider@arm.com> -Subject: arm64: mm: Make arch_faults_on_old_pte() check for migratability -Date: Wed, 11 Aug 2021 21:13:54 +0100 +Date: Thu, 27 Jan 2022 19:24:37 +0000 +Subject: [PATCH] arm64: mm: Make 
arch_faults_on_old_pte() check for + migratability arch_faults_on_old_pte() relies on the calling context being non-preemptible. CONFIG_PREEMPT_RT turns the PTE lock into a sleepable @@ -11,23 +12,25 @@ It does however disable migration, ensuring the task remains on the same CPU during the entirety of the critical section, making the read of cpu_has_hw_af() safe and stable. -Make arch_faults_on_old_pte() check migratable() instead of preemptible(). +Make arch_faults_on_old_pte() check cant_migrate() instead of preemptible(). +Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20210811201354.1976839-5-valentin.schneider@arm.com +Link: https://lore.kernel.org/r/20220127192437.1192957-1-valentin.schneider@arm.com +Acked-by: Catalin Marinas <catalin.marinas@arm.com> --- - arch/arm64/include/asm/pgtable.h | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + arch/arm64/include/asm/pgtable.h | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h -@@ -1001,7 +1001,7 @@ static inline void update_mmu_cache(stru +@@ -1001,7 +1001,8 @@ static inline void update_mmu_cache(stru */ static inline bool arch_faults_on_old_pte(void) { - WARN_ON(preemptible()); -+ WARN_ON(is_migratable()); ++ /* The register read below requires a stable CPU to make any sense */ ++ cant_migrate(); return !cpu_has_hw_af(); } diff --git a/patches/arm__Add_support_for_lazy_preemption.patch b/patches/arm__Add_support_for_lazy_preemption.patch index 5244fa44c144..8ceccc1acaf0 100644 --- a/patches/arm__Add_support_for_lazy_preemption.patch +++ b/patches/arm__Add_support_for_lazy_preemption.patch @@ -19,7 +19,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig -@@ -109,6 +109,7 @@ config ARM +@@ 
-110,6 +110,7 @@ config ARM select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP diff --git a/patches/fs_dcache__use_swait_queue_instead_of_waitqueue.patch b/patches/fs_dcache__use_swait_queue_instead_of_waitqueue.patch index 7566db2b8304..5939a8b77916 100644 --- a/patches/fs_dcache__use_swait_queue_instead_of_waitqueue.patch +++ b/patches/fs_dcache__use_swait_queue_instead_of_waitqueue.patch @@ -137,7 +137,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> struct dentry *dentry; struct dentry *alias; struct inode *inode; -@@ -1861,7 +1861,7 @@ int nfs_atomic_open(struct inode *dir, s +@@ -1883,7 +1883,7 @@ int nfs_atomic_open(struct inode *dir, s struct file *file, unsigned open_flags, umode_t mode) { @@ -218,7 +218,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> extern struct dentry * d_exact_alias(struct dentry *, struct inode *); --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h -@@ -1684,7 +1684,7 @@ struct nfs_unlinkdata { +@@ -1686,7 +1686,7 @@ struct nfs_unlinkdata { struct nfs_removeargs args; struct nfs_removeres res; struct dentry *dentry; diff --git a/patches/ptrace__fix_ptrace_vs_tasklist_lock_race.patch b/patches/ptrace__fix_ptrace_vs_tasklist_lock_race.patch index 839abf0c8750..54dec73215bb 100644 --- a/patches/ptrace__fix_ptrace_vs_tasklist_lock_race.patch +++ b/patches/ptrace__fix_ptrace_vs_tasklist_lock_race.patch @@ -49,7 +49,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> /* * Special states are those that do not use the normal wait-loop pattern. See * the comment with set_special_state(). 
-@@ -2011,6 +2007,81 @@ static inline int test_tsk_need_resched( +@@ -2007,6 +2003,81 @@ static inline int test_tsk_need_resched( return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED)); } diff --git a/patches/rcu__Delay_RCU-selftests.patch b/patches/rcu__Delay_RCU-selftests.patch index c865f28b93de..f10d54a8031d 100644 --- a/patches/rcu__Delay_RCU-selftests.patch +++ b/patches/rcu__Delay_RCU-selftests.patch @@ -44,7 +44,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> smp_init(); --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h -@@ -1657,7 +1657,7 @@ static void test_rcu_tasks_callback(stru +@@ -1661,7 +1661,7 @@ static void test_rcu_tasks_callback(stru rttd->notrun = true; } @@ -53,7 +53,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> { pr_info("Running RCU-tasks wait API self tests\n"); #ifdef CONFIG_TASKS_RCU -@@ -1694,9 +1694,7 @@ static int rcu_tasks_verify_self_tests(v +@@ -1698,9 +1698,7 @@ static int rcu_tasks_verify_self_tests(v return ret; } late_initcall(rcu_tasks_verify_self_tests); @@ -64,7 +64,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> void __init rcu_init_tasks_generic(void) { -@@ -1711,9 +1709,6 @@ void __init rcu_init_tasks_generic(void) +@@ -1715,9 +1713,6 @@ void __init rcu_init_tasks_generic(void) #ifdef CONFIG_TASKS_TRACE_RCU rcu_spawn_tasks_trace_kthread(); #endif diff --git a/patches/sched__Add_support_for_lazy_preemption.patch b/patches/sched__Add_support_for_lazy_preemption.patch index 646026aca3fa..b2a60232bcfd 100644 --- a/patches/sched__Add_support_for_lazy_preemption.patch +++ b/patches/sched__Add_support_for_lazy_preemption.patch @@ -67,8 +67,8 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> kernel/sched/sched.h | 9 ++++ kernel/trace/trace.c | 50 ++++++++++++++++---------- kernel/trace/trace_events.c | 1 - kernel/trace/trace_output.c | 16 +++++++- - 12 files changed, 258 insertions(+), 36 deletions(-) + kernel/trace/trace_output.c | 18 ++++++++- + 12 files changed, 260 insertions(+), 36 
deletions(-) --- --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -177,7 +177,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- a/include/linux/sched.h +++ b/include/linux/sched.h -@@ -2011,6 +2011,43 @@ static inline int test_tsk_need_resched( +@@ -2007,6 +2007,43 @@ static inline int test_tsk_need_resched( return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED)); } @@ -367,7 +367,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> #ifdef CONFIG_SMP plist_node_init(&p->pushable_tasks, MAX_PRIO); RB_CLEAR_NODE(&p->pushable_dl_tasks); -@@ -6262,6 +6307,7 @@ static void __sched notrace __schedule(u +@@ -6261,6 +6306,7 @@ static void __sched notrace __schedule(u next = pick_next_task(rq, prev, &rf); clear_tsk_need_resched(prev); @@ -375,7 +375,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG rq->last_seen_need_resched_ns = 0; -@@ -6473,6 +6519,30 @@ static void __sched notrace preempt_sche +@@ -6472,6 +6518,30 @@ static void __sched notrace preempt_sche } while (need_resched()); } @@ -406,7 +406,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> #ifdef CONFIG_PREEMPTION /* * This is the entry point to schedule() from in-kernel preemption -@@ -6486,7 +6556,8 @@ asmlinkage __visible void __sched notrac +@@ -6485,7 +6555,8 @@ asmlinkage __visible void __sched notrac */ if (likely(!preemptible())) return; @@ -416,7 +416,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> preempt_schedule_common(); } NOKPROBE_SYMBOL(preempt_schedule); -@@ -6519,6 +6590,9 @@ asmlinkage __visible void __sched notrac +@@ -6518,6 +6589,9 @@ asmlinkage __visible void __sched notrac if (likely(!preemptible())) return; @@ -426,7 +426,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> do { /* * Because the function tracer can trace preempt_count_sub() -@@ -8691,7 +8765,9 @@ void __init init_idle(struct task_struct +@@ -8684,7 +8758,9 @@ void __init init_idle(struct task_struct /* Set 
the preempt count _outside_ the spinlocks! */ init_idle_preempt_count(idle, cpu); @@ -439,7 +439,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> */ --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c -@@ -4393,7 +4393,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq +@@ -4427,7 +4427,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq ideal_runtime = sched_slice(cfs_rq, curr); delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime; if (delta_exec > ideal_runtime) { @@ -448,7 +448,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> /* * The current task ran long enough, ensure it doesn't get * re-elected due to buddy favours. -@@ -4417,7 +4417,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq +@@ -4451,7 +4451,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq return; if (delta > ideal_runtime) @@ -457,7 +457,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> } static void -@@ -4563,7 +4563,7 @@ entity_tick(struct cfs_rq *cfs_rq, struc +@@ -4597,7 +4597,7 @@ entity_tick(struct cfs_rq *cfs_rq, struc * validating it and just reschedule. 
*/ if (queued) { @@ -466,7 +466,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> return; } /* -@@ -4712,7 +4712,7 @@ static void __account_cfs_rq_runtime(str +@@ -4746,7 +4746,7 @@ static void __account_cfs_rq_runtime(str * hierarchy can be throttled */ if (!assign_cfs_rq_runtime(cfs_rq) && likely(cfs_rq->curr)) @@ -475,7 +475,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> } static __always_inline -@@ -5475,7 +5475,7 @@ static void hrtick_start_fair(struct rq +@@ -5509,7 +5509,7 @@ static void hrtick_start_fair(struct rq if (delta < 0) { if (task_current(rq, p)) @@ -484,7 +484,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> return; } hrtick_start(rq, delta); -@@ -7125,7 +7125,7 @@ static void check_preempt_wakeup(struct +@@ -7159,7 +7159,7 @@ static void check_preempt_wakeup(struct return; preempt: @@ -493,7 +493,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> /* * Only set the backward buddy when the current task is still * on the rq. This can happen when a wakeup gets interleaved -@@ -11160,7 +11160,7 @@ static void task_fork_fair(struct task_s +@@ -11196,7 +11196,7 @@ static void task_fork_fair(struct task_s * 'current' within the tree based on its new key value. 
*/ swap(curr->vruntime, se->vruntime); @@ -502,7 +502,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> } se->vruntime -= cfs_rq->min_vruntime; -@@ -11187,7 +11187,7 @@ prio_changed_fair(struct rq *rq, struct +@@ -11223,7 +11223,7 @@ prio_changed_fair(struct rq *rq, struct */ if (task_current(rq, p)) { if (p->prio > oldprio) @@ -637,19 +637,25 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> char irqs_off; int hardirq; int softirq; -@@ -465,9 +466,11 @@ int trace_print_lat_fmt(struct trace_seq +@@ -462,20 +463,27 @@ int trace_print_lat_fmt(struct trace_seq + + switch (entry->flags & (TRACE_FLAG_NEED_RESCHED | + TRACE_FLAG_PREEMPT_RESCHED)) { ++#ifndef CONFIG_PREEMPT_LAZY case TRACE_FLAG_NEED_RESCHED | TRACE_FLAG_PREEMPT_RESCHED: need_resched = 'N'; break; -+#ifndef CONFIG_PREEMPT_LAZY ++#endif case TRACE_FLAG_NEED_RESCHED: need_resched = 'n'; break; -+#endif ++#ifndef CONFIG_PREEMPT_LAZY case TRACE_FLAG_PREEMPT_RESCHED: need_resched = 'p'; break; -@@ -476,6 +479,9 @@ int trace_print_lat_fmt(struct trace_seq ++#endif + default: + need_resched = '.'; break; } @@ -659,7 +665,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> hardsoft_irq = (nmi && hardirq) ? 'Z' : nmi ? 'z' : -@@ -484,14 +490,20 @@ int trace_print_lat_fmt(struct trace_seq +@@ -484,14 +492,20 @@ int trace_print_lat_fmt(struct trace_seq softirq ? 's' : '.' ; diff --git a/patches/sched_introduce_migratable.patch b/patches/sched_introduce_migratable.patch deleted file mode 100644 index f9535b8944c4..000000000000 --- a/patches/sched_introduce_migratable.patch +++ /dev/null @@ -1,45 +0,0 @@ -From: Valentin Schneider <valentin.schneider@arm.com> -Subject: sched: Introduce migratable() -Date: Wed, 11 Aug 2021 21:13:52 +0100 - -Some areas use preempt_disable() + preempt_enable() to safely access -per-CPU data. The PREEMPT_RT folks have shown this can also be done by -keeping preemption enabled and instead disabling migration (and acquiring a -sleepable lock, if relevant). 
- -Introduce a helper which checks whether the current task can be migrated -elsewhere, IOW if it is pinned to its local CPU in the current -context. This can help determining if per-CPU properties can be safely -accessed. - -Note that CPU affinity is not checked here, as a preemptible task can have -its affinity changed at any given time (including if it has -PF_NO_SETAFFINITY, when hotplug gets involved). - -Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> -[bigeasy: Return false on UP, call it is_migratable().] -Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> -Link: https://lore.kernel.org/r/20210811201354.1976839-3-valentin.schneider@arm.com ---- - include/linux/sched.h | 10 ++++++++++ - 1 file changed, 10 insertions(+) - ---- a/include/linux/sched.h -+++ b/include/linux/sched.h -@@ -1739,6 +1739,16 @@ static __always_inline bool is_percpu_th - #endif - } - -+/* Is the current task guaranteed to stay on its current CPU? */ -+static inline bool is_migratable(void) -+{ -+#ifdef CONFIG_SMP -+ return preemptible() && !current->migration_disabled; -+#else -+ return false; -+#endif -+} -+ - /* Per-process atomic flags. */ - #define PFA_NO_NEW_PRIVS 0 /* May not gain new privileges. 
*/ - #define PFA_SPREAD_PAGE 1 /* Spread page cache over cpuset */ diff --git a/patches/series b/patches/series index c647129a82a0..000fa5282f02 100644 --- a/patches/series +++ b/patches/series @@ -11,13 +11,14 @@ 0006-printk-refactor-and-rework-printing-logic.patch 0007-printk-move-buffer-definitions-into-console_emit_nex.patch 0008-printk-add-pr_flush.patch -0009-printk-add-kthread-console-printers.patch -0010-printk-reimplement-console_lock-for-proper-kthread-s.patch -0011-printk-remove-console_locked.patch -0012-console-introduce-CON_MIGHT_SLEEP-for-vt.patch -0013-printk-add-infrastucture-for-atomic-consoles.patch -0014-serial-8250-implement-write_atomic.patch -0015-printk-avoid-preempt_disable-for-PREEMPT_RT.patch +0009-printk-add-functions-to-allow-direct-printing.patch +0010-printk-add-kthread-console-printers.patch +0011-printk-reimplement-console_lock-for-proper-kthread-s.patch +0012-printk-remove-console_locked.patch +0013-console-introduce-CON_MIGHT_SLEEP-for-vt.patch +0014-printk-add-infrastucture-for-atomic-consoles.patch +0015-serial-8250-implement-write_atomic.patch +0016-printk-avoid-preempt_disable-for-PREEMPT_RT.patch ########################################################################### # Posted and applied @@ -42,11 +43,6 @@ locking-local_lock-Make-the-empty-local_lock_-functi.patch 0007_kernel_fork_only_cache_the_vmap_stack_in_finish_task_switch.patch 0008_kernel_fork_use_is_enabled_in_account_kernel_stack.patch -# random -0003_random_split_add_interrupt_randomness.patch -0004_random_move_the_fast_pool_reset_into_the_caller.patch -0005_random_defer_processing_of_randomness_on_preempt_rt.patch - # cgroup 0001-mm-memcg-Disable-threshold-event-handlers-on-PREEMPT.patch 0002-mm-memcg-Protect-per-CPU-counter-by-disabling-preemp.patch @@ -65,6 +61,10 @@ locking-Enable-RT_MUTEXES-by-default-on-PREEMPT_RT.patch genirq-Provide-generic_handle_irq_safe.patch Use-generic_handle_irq_safe-where-it-makes-sense.patch +# Random, WIP 
+0001-random-use-computational-hash-for-entropy-extraction.patch +0002-random-do-not-take-spinlocks-in-irq-handler.patch + ########################################################################### # Kconfig bits: ########################################################################### @@ -76,10 +76,9 @@ jump-label__disable_if_stop_machine_is_used.patch sched-Make-preempt_enable_no_resched-behave-like-pre.patch # net -0001-net-dev-Remove-the-preempt_disable-in-netif_rx_inter.patch -0002-net-dev-Remove-get_cpu-in-netif_rx_internal.patch +0001-net-dev-Remove-preempt_disable-and-get_cpu-in-netif_.patch +0002-net-dev-Make-rps_lock-disable-interrupts.patch 0003-net-dev-Makes-sure-netif_rx-can-be-invoked-in-any-co.patch -0004-net-dev-Make-rps_lock-disable-interrupts.patch ########################################################################### # sched: @@ -157,14 +156,9 @@ arch_arm64__Add_lazy_preempt_support.patch ########################################################################### # ARM/ARM64 ########################################################################### -# Valentin's fixes -########################################################################### -sched_introduce_migratable.patch -arm64_mm_make_arch_faults_on_old_pte_check_for_migratability.patch - -########################################################################### ARM__enable_irq_in_translation_section_permission_fault_handlers.patch KVM__arm_arm64__downgrade_preempt_disabled_region_to_migrate_disable.patch +arm64-mm-Make-arch_faults_on_old_pte-check-for-migra.patch arm64-sve-Delay-freeing-memory-in-fpsimd_flush_threa.patch arm64-sve-Make-kernel-FPU-protection-RT-friendly.patch arm64-signal-Use-ARCH_RT_DELAYS_SIGNAL_SEND.patch diff --git a/patches/signal_x86__Delay_calling_signals_in_atomic.patch b/patches/signal_x86__Delay_calling_signals_in_atomic.patch index 7d0f64da6850..7483d87d40f1 100644 --- a/patches/signal_x86__Delay_calling_signals_in_atomic.patch +++ 
b/patches/signal_x86__Delay_calling_signals_in_atomic.patch @@ -66,7 +66,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> typedef sigset_t compat_sigset_t; --- a/include/linux/sched.h +++ b/include/linux/sched.h -@@ -1087,6 +1087,10 @@ struct task_struct { +@@ -1083,6 +1083,10 @@ struct task_struct { /* Restored if set_restore_sigmask() was used: */ sigset_t saved_sigmask; struct sigpending pending; diff --git a/patches/softirq__Check_preemption_after_reenabling_interrupts.patch b/patches/softirq__Check_preemption_after_reenabling_interrupts.patch index ce29ba9ec4a4..4d7a7c8900e4 100644 --- a/patches/softirq__Check_preemption_after_reenabling_interrupts.patch +++ b/patches/softirq__Check_preemption_after_reenabling_interrupts.patch @@ -61,7 +61,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> } EXPORT_SYMBOL(__dev_kfree_skb_irq); -@@ -5742,12 +5744,14 @@ static void net_rps_action_and_irq_enabl +@@ -5741,12 +5743,14 @@ static void net_rps_action_and_irq_enabl sd->rps_ipi_list = NULL; local_irq_enable(); @@ -76,7 +76,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> } static bool sd_has_rps_ipi_waiting(struct softnet_data *sd) -@@ -5823,6 +5827,7 @@ void __napi_schedule(struct napi_struct +@@ -5822,6 +5826,7 @@ void __napi_schedule(struct napi_struct local_irq_save(flags); ____napi_schedule(this_cpu_ptr(&softnet_data), n); local_irq_restore(flags); @@ -84,7 +84,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> } EXPORT_SYMBOL(__napi_schedule); -@@ -10647,6 +10652,7 @@ static int dev_cpu_dead(unsigned int old +@@ -10646,6 +10651,7 @@ static int dev_cpu_dead(unsigned int old raise_softirq_irqoff(NET_TX_SOFTIRQ); local_irq_enable(); diff --git a/patches/tty_serial_pl011__Make_the_locking_work_on_RT.patch b/patches/tty_serial_pl011__Make_the_locking_work_on_RT.patch index 49a0d9ab9779..d989e30aba19 100644 --- a/patches/tty_serial_pl011__Make_the_locking_work_on_RT.patch +++ b/patches/tty_serial_pl011__Make_the_locking_work_on_RT.patch @@ 
-16,7 +16,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- --- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c -@@ -2279,18 +2279,24 @@ pl011_console_write(struct console *co, +@@ -2270,18 +2270,24 @@ pl011_console_write(struct console *co, { struct uart_amba_port *uap = amba_ports[co->index]; unsigned int old_cr = 0, new_cr; @@ -45,7 +45,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> /* * First save the CR then disable the interrupts -@@ -2316,8 +2322,7 @@ pl011_console_write(struct console *co, +@@ -2307,8 +2313,7 @@ pl011_console_write(struct console *co, pl011_write(old_cr, uap, REG_CR); if (locked) diff --git a/patches/x86__Support_for_lazy_preemption.patch b/patches/x86__Support_for_lazy_preemption.patch index 362d9c5f3d3a..2bd7e8af4a0d 100644 --- a/patches/x86__Support_for_lazy_preemption.patch +++ b/patches/x86__Support_for_lazy_preemption.patch @@ -19,7 +19,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig -@@ -235,6 +235,7 @@ config X86 +@@ -236,6 +236,7 @@ config X86 select HAVE_PCI select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP diff --git a/patches/x86__kvm_Require_const_tsc_for_RT.patch b/patches/x86__kvm_Require_const_tsc_for_RT.patch index be11d3e3139f..ad1ccab7b8fa 100644 --- a/patches/x86__kvm_Require_const_tsc_for_RT.patch +++ b/patches/x86__kvm_Require_const_tsc_for_RT.patch @@ -18,7 +18,7 @@ Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c -@@ -8742,6 +8742,12 @@ int kvm_arch_init(void *opaque) +@@ -8811,6 +8811,12 @@ int kvm_arch_init(void *opaque) goto out; } |