From 5d5e4522a7f404d1a96fd6c703989d32a9c9568d Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Sun, 7 Nov 2021 14:51:16 +1000 Subject: printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces printk from NMI context relies on irq work being raised on the local CPU to print to console. This can be a problem if the NMI was raised by a lockup detector to print lockup stack and regs, because the CPU may not enable irqs (because it is locked up). Introduce printk_trigger_flush() that can be called another CPU to try to get those messages to the console, call that where printk_safe_flush was previously called. Fixes: 93d102f094be ("printk: remove safe buffers") Cc: stable@vger.kernel.org # 5.15 Signed-off-by: Nicholas Piggin Reviewed-by: Petr Mladek Reviewed-by: John Ogness Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20211107045116.1754411-1-npiggin@gmail.com --- lib/nmi_backtrace.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'lib') diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c index f9e89001b52e..199ab201d501 100644 --- a/lib/nmi_backtrace.c +++ b/lib/nmi_backtrace.c @@ -75,6 +75,12 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask, touch_softlockup_watchdog(); } + /* + * Force flush any remote buffers that might be stuck in IRQ context + * and therefore could not run their irq_work. + */ + printk_trigger_flush(); + clear_bit_unlock(0, &backtrace_flag); put_cpu(); } -- cgit v1.2.1 From ae8d67b2117f1ec6c8170d6e1af8ded17392bd2c Mon Sep 17 00:00:00 2001 From: Nick Terrell Date: Mon, 15 Nov 2021 19:08:19 -0800 Subject: lib: zstd: Fix unused variable warning The variable `litLengthSum` is only used by an `assert()`, so when asserts are disabled the compiler doesn't see any usage and warns. This issue is already fixed upstream by PR #2838 [0]. It was reported by the Kernel test robot in [1]. Another approach would be to change zstd's disabled `assert()` definition to use the argument in a disabled branch, instead of ignoring the argument. I've avoided this approach because there are some small changes necessary to get zstd to build, and I would want to thoroughly re-test for performance, since that is slightly changing the code in every function in zstd. It seems like a trivial change, but some functions are pretty sensitive to small changes. However, I think it is a valid approach that I would like to see upstream take, so I've opened Issue #2868 to attempt this upstream. Lastly, I've chosen not to use __maybe_unused because all code in lib/zstd/ must eventually be upstreamed. Upstream zstd can't use __maybe_unused because it isn't portable across all compilers. [0] https://github.com/facebook/zstd/pull/2838 [1] https://lore.kernel.org/linux-mm/202111120312.833wII4i-lkp@intel.com/T/ [2] https://github.com/facebook/zstd/issues/2868 Link: https://lore.kernel.org/r/20211117014949.1169186-2-nickrterrell@gmail.com/ Link: https://lore.kernel.org/r/20211117201459.1194876-2-nickrterrell@gmail.com/ Reported-by: kernel test robot Signed-off-by: Nick Terrell --- lib/zstd/compress/zstd_compress_superblock.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'lib') diff --git a/lib/zstd/compress/zstd_compress_superblock.c b/lib/zstd/compress/zstd_compress_superblock.c index ee03e0aedb03..b0610b255653 100644 --- a/lib/zstd/compress/zstd_compress_superblock.c +++ b/lib/zstd/compress/zstd_compress_superblock.c @@ -411,6 +411,8 @@ static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const seqDef* const seqDef* sp = sstart; size_t matchLengthSum = 0; size_t litLengthSum = 0; + /* Only used by assert(), suppress unused variable warnings in production. */ + (void)litLengthSum; while (send-sp > 0) { ZSTD_sequenceLength const seqLen = ZSTD_getSequenceLength(seqStore, sp); litLengthSum += seqLen.litLength; -- cgit v1.2.1 From 1974990cca43a6ba708a70b15862113eb9c2f399 Mon Sep 17 00:00:00 2001 From: Nick Terrell Date: Mon, 15 Nov 2021 20:33:08 -0800 Subject: lib: zstd: Don't inline functions in zstd_opt.c `zstd_opt.c` contains the match finder for the highest compression levels. These levels are already very slow, and are unlikely to be used in the kernel. If they are used, they shouldn't be used in latency sensitive workloads, so slowing them down shouldn't be a big deal. This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0]. I've also opened an issue upstream [1] so that we can properly tackle the code size issue in `zstd_opt.c` for all users, and can hopefully remove this hack in the next zstd version we import. Bloat-o-meter output on x86-64: ``` > ../scripts/bloat-o-meter vmlinux.old vmlinux add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266) Function old new delta ZSTD_compressBlock_opt_generic.constprop - 7559 +7559 ZSTD_insertBtAndGetAllMatches - 6304 +6304 ZSTD_insertBt1 - 1731 +1731 ZSTD_storeSeq - 693 +693 ZSTD_BtGetAllMatches - 255 +255 ZSTD_updateRep - 128 +128 ZSTD_updateTree 96 99 +3 ZSTD_insertAndFindFirstIndexHash3 81 - -81 ZSTD_setBasePrices.constprop 98 - -98 ZSTD_litLengthPrice.constprop 138 - -138 ZSTD_count 362 181 -181 ZSTD_count_2segments 1407 938 -469 ZSTD_insertBt1.constprop 2689 - -2689 ZSTD_compressBlock_btultra2 19990 423 -19567 ZSTD_compressBlock_btultra 19633 15 -19618 ZSTD_initStats_ultra 19825 - -19825 ZSTD_compressBlock_btopt 20374 12 -20362 ZSTD_compressBlock_btopt_extDict 29984 12 -29972 ZSTD_compressBlock_btultra_extDict 30718 15 -30703 ZSTD_compressBlock_btopt_dictMatchState 32689 12 -32677 ZSTD_compressBlock_btultra_dictMatchState 33574 15 -33559 Total: Before=6611828, After=6418562, chg -2.92% ``` [0] https://lkml.org/lkml/2021/11/14/189 [1] https://github.com/facebook/zstd/issues/2862 Link: https://lore.kernel.org/r/20211117014949.1169186-3-nickrterrell@gmail.com/ Link: https://lore.kernel.org/r/20211117201459.1194876-3-nickrterrell@gmail.com/ Reported-by: Geert Uytterhoeven Tested-by: Geert Uytterhoeven Reviewed-by: Geert Uytterhoeven Signed-off-by: Nick Terrell --- lib/zstd/common/compiler.h | 7 +++++++ lib/zstd/compress/zstd_opt.c | 12 ++++++++++++ 2 files changed, 19 insertions(+) (limited to 'lib') diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h index a1a051e4bce6..f5a9c70a228a 100644 --- a/lib/zstd/common/compiler.h +++ b/lib/zstd/common/compiler.h @@ -16,6 +16,7 @@ *********************************************************/ /* force inlining */ +#if !defined(ZSTD_NO_INLINE) #if (defined(__GNUC__) && !defined(__STRICT_ANSI__)) || defined(__cplusplus) || defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* C99 */ # define INLINE_KEYWORD inline #else @@ -24,6 +25,12 @@ #define FORCE_INLINE_ATTR __attribute__((always_inline)) +#else + +#define INLINE_KEYWORD +#define FORCE_INLINE_ATTR + +#endif /* On MSVC qsort requires that functions passed into it use the __cdecl calling conversion(CC). diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c index 04337050fe9a..dfc55e3e8119 100644 --- a/lib/zstd/compress/zstd_opt.c +++ b/lib/zstd/compress/zstd_opt.c @@ -8,6 +8,18 @@ * You may select, at your option, one of the above-listed licenses. */ +/* + * Disable inlining for the optimal parser for the kernel build. + * It is unlikely to be used in the kernel, and where it is used + * latency shouldn't matter because it is very slow to begin with. + * We prefer a ~180KB binary size win over faster optimal parsing. + * + * TODO(https://github.com/facebook/zstd/issues/2862): + * Improve the code size of the optimal parser in general, so we + * don't need this hack for the kernel build. + */ +#define ZSTD_NO_INLINE 1 + #include "zstd_compress_internal.h" #include "hist.h" #include "zstd_opt.h" -- cgit v1.2.1 From 7416cdc9b9c10968c57b1f73be5d48b3ecdaf3c8 Mon Sep 17 00:00:00 2001 From: Nick Terrell Date: Tue, 16 Nov 2021 15:11:39 -0800 Subject: lib: zstd: Don't add -O3 to cflags After the update to zstd-1.4.10 passing -O3 is no longer necessary to get good performance from zstd. Using the default optimization level -O2 is sufficient to get good performance. I've measured no significant change to compression speed, and a ~1% decompression speed loss, which is acceptable. This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating stacks that are ~3KB. With -O2 these same functions generate stacks in the < 100B, completely fixing the problem. Function size deltas are listed below: ZSTD_compressBlock_fast_extDict_generic: 3800 -> 68 ZSTD_compressBlock_fast: 2216 -> 40 ZSTD_compressBlock_fast_dictMatchState: 1848 -> 64 ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76 ZSTD_fillDoubleHashTable: 3252 -> 0 ZSTD_compressBlock_doubleFast: 5856 -> 36 ZSTD_compressBlock_doubleFast_dictMatchState: 5380 -> 84 ZSTD_copmressBlock_lazy2: 2420 -> 72 Additionally, this improves the reported code bloat [1]. With gcc-11 bloat-o-meter shows an 80KB code size improvement: ``` > ../scripts/bloat-o-meter vmlinux.old vmlinux add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190) Total: Before=6418562, After=6336372, chg -1.28% ``` Compared to before the zstd-1.4.10 update we see a total code size regression of 105KB, down from 374KB at v5.16-rc1: ``` > ../scripts/bloat-o-meter vmlinux.old vmlinux add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522) Total: Before=6228850, After=6336372, chg +1.73% ``` [0] https://lkml.org/lkml/2021/11/15/710 [1] https://lkml.org/lkml/2021/11/14/189 Link: https://lore.kernel.org/r/20211117014949.1169186-4-nickrterrell@gmail.com/ Link: https://lore.kernel.org/r/20211117201459.1194876-4-nickrterrell@gmail.com/ Reported-by: Geert Uytterhoeven Tested-by: Geert Uytterhoeven Reviewed-by: Geert Uytterhoeven Signed-off-by: Nick Terrell --- lib/zstd/Makefile | 2 -- 1 file changed, 2 deletions(-) (limited to 'lib') diff --git a/lib/zstd/Makefile b/lib/zstd/Makefile index 65218ec5b8f2..fc45339fc3a3 100644 --- a/lib/zstd/Makefile +++ b/lib/zstd/Makefile @@ -11,8 +11,6 @@ obj-$(CONFIG_ZSTD_COMPRESS) += zstd_compress.o obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd_decompress.o -ccflags-y += -O3 - zstd_compress-y := \ zstd_compress_module.o \ common/debug.o \ -- cgit v1.2.1 From cab71f7495f7aa639ca4b8508f4c3e426e9cb2f7 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 19 Nov 2021 16:43:46 -0800 Subject: kasan: test: silence intentional read overflow warnings As done in commit d73dad4eb5ad ("kasan: test: bypass __alloc_size checks") for __write_overflow warnings, also silence some more cases that trip the __read_overflow warnings seen in 5.16-rc1[1]: In file included from include/linux/string.h:253, from include/linux/bitmap.h:10, from include/linux/cpumask.h:12, from include/linux/mm_types_task.h:14, from include/linux/mm_types.h:5, from include/linux/page-flags.h:13, from arch/arm64/include/asm/mte.h:14, from arch/arm64/include/asm/pgtable.h:12, from include/linux/pgtable.h:6, from include/linux/kasan.h:29, from lib/test_kasan.c:10: In function 'memcmp', inlined from 'kasan_memcmp' at lib/test_kasan.c:897:2: include/linux/fortify-string.h:263:25: error: call to '__read_overflow' declared with attribute error: detected read beyond size of object (1st parameter) 263 | __read_overflow(); | ^~~~~~~~~~~~~~~~~ In function 'memchr', inlined from 'kasan_memchr' at lib/test_kasan.c:872:2: include/linux/fortify-string.h:277:17: error: call to '__read_overflow' declared with attribute error: detected read beyond size of object (1st parameter) 277 | __read_overflow(); | ^~~~~~~~~~~~~~~~~ [1] http://kisskb.ellerman.id.au/kisskb/buildresult/14660585/log/ Link: https://lkml.kernel.org/r/20211116004111.3171781-1-keescook@chromium.org Fixes: d73dad4eb5ad ("kasan: test: bypass __alloc_size checks") Signed-off-by: Kees Cook Reviewed-by: Andrey Konovalov Acked-by: Marco Elver Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/test_kasan.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'lib') diff --git a/lib/test_kasan.c b/lib/test_kasan.c index 67ed689a0b1b..0643573f8686 100644 --- a/lib/test_kasan.c +++ b/lib/test_kasan.c @@ -869,6 +869,7 @@ static void kasan_memchr(struct kunit *test) ptr = kmalloc(size, GFP_KERNEL | __GFP_ZERO); KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr); + OPTIMIZER_HIDE_VAR(size); KUNIT_EXPECT_KASAN_FAIL(test, kasan_ptr_result = memchr(ptr, '1', size + 1)); @@ -894,6 +895,7 @@ static void kasan_memcmp(struct kunit *test) KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr); memset(arr, 0, sizeof(arr)); + OPTIMIZER_HIDE_VAR(size); KUNIT_EXPECT_KASAN_FAIL(test, kasan_int_result = memcmp(ptr, arr, size+1)); kfree(ptr); -- cgit v1.2.1