delta/linux.git - git.kernel.org: pub/scm/linux/kernel/git/torvalds/linux.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	intel_idle: mark few variables as __read_mostly	Artem Bityutskiy	2023-04-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	The intention is to clean up the code and make it look a bit more consistent. Mark all unitialized module parameter variables as __read_mostly, not just one of them. The other parameters are read-mostly too. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: do not sprinkle module parameter definitions around	Artem Bityutskiy	2023-04-27	1	-3/+7
\| \| \| \| \| \| \| \| \| \|	This is a cleanup which improves code consistency. Move the force_irq_on module parameter variable and definition to the same place where we have variables and definitions for other module parameters. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: fix confusing message	Artem Bityutskiy	2023-04-27	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, all non-POLL C-states are entered with interrupts disabled. There are 2 ways to make 'intel_idle' enter C-states with interrupts enabled: 1. Mark the C-state with the CPUIDLE_FLAG_IRQ_ENABLE flag. 2. Use the force_irq_on module parameter. The former is the "proper" way of doing it, it is per-C-state and per-platform. The latter is for debugging purposes only. The problem is that intel_idle prints the "forced intel_idle_irq" message in both cases, even though the former case does not needed this message, because nothing is forced there. This patch addresses the problem. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: improve C-state flags handling robustness	Artem Bityutskiy	2023-04-27	1	-10/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following C-state flags are currently mutually-exclusive and should not be combined: * IRQ_ENABLE * IBRS * XSTATE There is a warning for the situation when the IRQ_ENABLE flag is combined with the IBRS flag, but no warnings for other combinations. This is inconsistent and prone to errors. Improve the situation by adding warnings for all the unexpected combinations. Add a couple of helpful commentaries too. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: further intel_idle_init_cstates_icpu() cleanup	Artem Bityutskiy	2023-04-27	1	-11/+13
\| \| \| \| \| \| \| \| \| \| \|	Introduce a temporary 'state' variable for referencing the currently processed C-state in the intel_idle_init_cstates_icpu() function. This makes code lines shorter and easier to read. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: clean up intel_idle_init_cstates_icpu()	Artem Bityutskiy	2023-04-27	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The intel_idle_init_cstates_icpu() function includes a loop that iterates over every C-state. Inside the loop, the same C-state data is referenced 2 ways: 1. as cpuidle_state_table[cstate] 2. as drv->states[drv->state_count] (but it is a copy of #1, not the same object). Make the code be more consistent and easier to read by using only the 2nd way. So the code structure would be as follows: 1. Use cpuidle_state_table[cstate] 2. Copy cpuidle_state_table[cstate] to drv->states[drv->state_count] 3. Use only drv->states[drv->state_count] from this point. Note, this change introduces a checkpatch.pl warning (too long line), but it will be addressed in the next patch. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: use pr_info() instead of printk()	Artem Bityutskiy	2023-04-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Substitute 'printk()' with 'pr_info()', because 'intel_idle' already uses 'pr_debug()', so using 'pr_info()' will be more consistent. In addition to this, this patch addresses the following checkpatch.pl warning: WARNING: printk() should include KERN_<LEVEL> facility level Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	Merge tag 'pm-6.3-rc1' of ↵	Linus Torvalds	2023-02-21	1	-0/+2
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add EPP support to the AMD P-state cpufreq driver, add support for new platforms to the Intel RAPL power capping driver, intel_idle and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194, drop the custom cpufreq driver for loongson1 that is not necessary any more (and the corresponding cpufreq platform device), fix assorted issues and clean up code. Specifics: - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes Karny, Arnd Bergmann, Bagas Sanjaya) - Drop the custom cpufreq driver for loongson1 that is not necessary any more and the corresponding cpufreq platform device (Keguang Zhang) - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig entries (Paul E. McKenney) - Enable thermal cooling for Tegra194 (Yi-Wei Wang) - Register module device table and add missing compatibles for cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss) - Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu (Christian Marangi) - Make kobj_type structure in the cpufreq core constant (Thomas Weißschuh) - Make cpufreq_unregister_driver() return void (Uwe Kleine-König) - Make the TEO cpuidle governor check CPU utilization in order to refine idle state selection (Kajetan Puchalski) - Make Kconfig select the haltpoll cpuidle governor when the haltpoll cpuidle driver is selected and replace a default_idle() call in that driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing) - Add Emerald Rapids Xeon support to the intel_idle driver (Artem Bityutskiy) - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to avoid randconfig build failures (Arnd Bergmann) - Make kobj_type structures used in the cpuidle sysfs interface constant (Thomas Weißschuh) - Make the cpuidle driver registration code update microsecond values of idle state parameters in accordance with their nanosecond values if they are provided (Rafael Wysocki) - Make the PSCI cpuidle driver prevent topology CPUs from being suspended on PREEMPT_RT (Krzysztof Kozlowski) - Document that pm_runtime_force_suspend() cannot be used with DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald) - Add EXPORT macros for exporting PM functions from drivers (Richard Fitzgerald) - Remove /** from non-kernel-doc comments in hibernation code (Randy Dunlap) - Fix possible name leak in powercap_register_zone() (Yang Yingliang) - Add Meteor Lake and Emerald Rapids support to the intel_rapl power capping driver (Zhang Rui) - Modify the idle_inject power capping facility to support 100% idle injection (Srinivas Pandruvada) - Fix large time windows handling in the intel_rapl power capping driver (Zhang Rui) - Fix memory leaks with using debugfs_lookup() in the generic PM domains and Energy Model code (Greg Kroah-Hartman) - Add missing 'cache-unified' property in the example for kryo OPP bindings (Rob Herring) - Fix error checking in opp_migrate_dentry() (Qi Zheng) - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad Dybcio) - Modify some power management utilities to use the canonical ftrace path (Ross Zwisler) - Correct spelling problems for Documentation/power/ as reported by codespell (Randy Dunlap)" * tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits) Documentation: amd-pstate: disambiguate user space sections cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables PM: Add EXPORT macros for exporting PM functions cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT MIPS: loongson32: Drop obsolete cpufreq platform device powercap: intel_rapl: Fix handling for large time window cpuidle: driver: Update microsecond values of state parameters as needed cpuidle: sysfs: make kobj_type structures constant cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies PM: EM: fix memory leak with using debugfs_lookup() PM: domains: fix memory leak with using debugfs_lookup() cpufreq: Make kobj_type structure constant cpufreq: davinci: Fix clk use after free cpufreq: amd-pstate: avoid uninitialized variable use cpufreq: Make cpufreq_unregister_driver() return void OPP: fix error checking in opp_migrate_dentry() dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible ...
\| *	intel_idle: add Emerald Rapids Xeon support	Artem Bityutskiy	2023-01-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Emerald Rapids (EMR) is the next Intel Xeon processor after Sapphire Rapids (SPR). EMR C-states are the same as SPR C-states, and we expect that EMR C-state characteristics (latency and target residency) will be the same as in SPR. Therefore, add EMR support by using SPR C-states table. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \|	intel_idle: Add force_irq_on module param	Peter Zijlstra	2023-01-13	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For testing purposes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195541.967699392@infradead.org
* \|	cpuidle, intel_idle: Fix CPUIDLE_FLAG_IBRS	Peter Zijlstra	2023-01-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	objtool to the rescue: vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() leaves .noinstr.text section Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.556912863@infradead.org
* \|	cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE again	Peter Zijlstra	2023-01-13	1	-7/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	So objtool found this bug: vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to trace_hardirqs_off() leaves .noinstr.text section As per commit 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE"): "must not have tracing in idle functions" Clearly people can't read and tinker along until splat dissapears. This straight up reverts commit d295ad34f236 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state"). It doesn't re-introduce the problem because preceding patches fixed it properly. Fixes: d295ad34f236 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.434302128@infradead.org
*	intel_idle: Add AlderLake-N support	Zhang Rui	2022-09-21	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to the other other AlderLake platforms, the C1 and C1E states on ADL-N are mutually exclusive. Only one of them can be enabled at a time. C1E is preferred on ADL-N for better energy efficiency. C6S is also supported on this platform. Its latency is far bigger than C6, but really close to C8 (PC8), thus it is not exposed as a separate state. Suggested-by: Baieswara Reddy Sagili <baieswara.reddy.sagili@intel.com> Suggested-by: Vinay Kumar <vinay.kumar@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: move from strlcpy() with unused retval to strscpy()	Wolfram Sang	2022-08-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	Merge tag 'pm-5.20-rc1' of ↵	Linus Torvalds	2022-08-02	1	-23/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These are mostly minor improvements all over including new CPU IDs for the Intel RAPL driver, an Energy Model rework to use micro-Watt as the power unit, cpufreq fixes and cleanus, cpuidle updates, devfreq updates, documentation cleanups and a new version of the pm-graph suite of utilities. Specifics: - Make cpufreq_show_cpus() more straightforward (Viresh Kumar). - Drop unnecessary CPU hotplug locking from store() used by cpufreq sysfs attributes (Viresh Kumar). - Make the ACPI cpufreq driver support the boost control interface on Zhaoxin/Centaur processors (Tony W Wang-oc). - Print a warning message on attempts to free an active cpufreq policy which should never happen (Viresh Kumar). - Fix grammar in the Kconfig help text for the loongson2 cpufreq driver (Randy Dunlap). - Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq governor (Zhao Liu). - Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll cpuidle driver (Eiichi Tsukata). - Modify intel_idle to treat C1 and C1E as independent idle states on Sapphire Rapids (Artem Bityutskiy). - Extend support for wakeirq to callback wrappers used during system suspend and resume (Ulf Hansson). - Defer waiting for device probe before loading a hibernation image till the first actual device access to avoid possible deadlocks reported by syzbot (Tetsuo Handa). - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn Helgaas). - Add Raptor Lake-P to the list of processors supported by the Intel RAPL driver (George D Sworo). - Add Alder Lake-N and Raptor Lake-P to the list of processors for which Power Limit4 is supported in the Intel RAPL driver (Sumeet Pawnikar). - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before attempting to remove it (Hsin-Yi Wang). - Change the Energy Model code to represent power in micro-Watts and adjust its users accordingly (Lukasz Luba). - Add new devfreq driver for Mediatek CCI (Cache Coherent Interconnect) (Johnson Wang). - Convert the Samsung Exynos SoC Bus bindings to DT schema of exynos-bus.c (Krzysztof Kozlowski). - Address kernel-doc warnings by adding the description for unused function parameters in devfreq core (Mauro Carvalho Chehab). - Use NULL to pass a null pointer rather than zero according to the function propotype in imx-bus.c (Colin Ian King). - Print error message instead of error interger value in tegra30-devfreq.c (Dmitry Osipenko). - Add checks to prevent setting negative frequency QoS limits for CPUs (Shivnandan Kumar). - Update the pm-graph suite of utilities to the latest revision 5.9 including multiple improvements (Todd Brandt). - Drop pme_interrupt reference from the PCI power management documentation (Mario Limonciello)" * tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits) powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P PM: QoS: Add check to make sure CPU freq is non-negative PM: hibernate: defer device probing when resuming from hibernation intel_idle: make SPR C1 and C1E be independent cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask cpufreq: loongson2: fix Kconfig "its" grammar pm-graph v5.9 cpufreq: Warn users while freeing active policy cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1 firmware: arm_scmi: Get detailed power scale from perf Documentation: EM: Switch to micro-Watts scale PM: EM: convert power field to micro-Watts precision and align drivers PM / devfreq: tegra30: Add error message for devm_devfreq_add_device() PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero PM / devfreq: shut up kernel-doc warnings dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver dt-bindings: interconnect: Add MediaTek CCI dt-bindings PM: domains: Ensure genpd_debugfs_dir exists before remove PM: runtime: Extend support for wakeirq for force_suspend\|resume ...
\| *	Merge back cpuidle material for 5.20.	Rafael J. Wysocki	2022-07-29	1	-23/+1
\| \|\
\| \| *	intel_idle: make SPR C1 and C1E be independent	Artem Bityutskiy	2022-07-25	1	-23/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch partially reverts the changes made by the following commit: da0e58c038e6 intel_idle: add 'preferred_cstates' module argument As that commit describes, on early Sapphire Rapids Xeon platforms the C1 and C1E states were mutually exclusive, so that users could only have either C1 and C6, or C1E and C6. However, Intel firmware engineers managed to remove this limitation and make C1 and C1E to be completely independent, just like on previous Xeon platforms. Therefore, this patch: * Removes commentary describing the old, and now non-existing SPR C1E limitation. * Marks SPR C1E as available by default. * Removes the 'preferred_cstates' parameter handling for SPR. Both C1 and C1E will be available regardless of 'preferred_cstates' value. We expect that all SPR systems are shipping with new firmware, which includes the C1/C1E improvement. Cc: v5.18+ <stable@vger.kernel.org> # v5.18+ Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \| \|	Merge tag 'x86_fpu_for_v6.0_rc1' of ↵	Linus Torvalds	2022-08-01	1	-2/+23
\|\ \ \ \| \|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu update from Borislav Petkov: - Add machinery to initialize AMX register state in order for AMX-capable CPUs to be able to enter deeper low-power state * tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Add a new flag to initialize the AMX state x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
\| * \|	intel_idle: Add a new flag to initialize the AMX state	Chang S. Bae	2022-07-19	1	-2/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The non-initialized AMX state can be the cause of C-state demotion from C6 to C1E. This low-power idle state may improve power savings and thus result in a higher available turbo frequency budget. This behavior is implementation-specific. Initialize the state for the C6 entrance of Sapphire Rapids as needed. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lkml.kernel.org/r/20220614164116.5196-1-chang.seok.bae@intel.com
* \| \|	intel_idle: Fix false positive RCU splats due to incorrect hardirqs state	Waiman Long	2022-07-25	1	-1/+7
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE") uses raw_local_irq_enable/local_irq_disable() around call to __intel_idle() in intel_idle_irq(). With interrupt enabled, timer tick interrupt can happen and a subsequently call to __do_softirq() may change the lockdep hardirqs state of a debug kernel back to 'on'. This will result in a mismatch between the cpu hardirqs state (off) and the lockdep hardirqs state (on) causing a number of false positive "WARNING: suspicious RCU usage" splats. Fix that by using local_irq_disable() to disable interrupt in intel_idle_irq(). Fixes: 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE") Signed-off-by: Waiman Long <longman@redhat.com> Cc: 5.16+ <stable@vger.kernel.org> # 5.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \|	intel_idle: Disable IBRS during long idle	Peter Zijlstra	2022-06-27	1	-6/+38
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Having IBRS enabled while the SMT sibling is idle unnecessarily slows down the running sibling. OTOH, disabling IBRS around idle takes two MSR writes, which will increase the idle latency. Therefore, only disable IBRS around deeper idle states. Shallow idle states are bounded by the tick in duration, since NOHZ is not allowed for them by virtue of their short target residency. Only do this for mwait-driven idle, since that keeps interrupts disabled across idle, which makes disabling IBRS vs IRQ-entry a non-issue. Note: C6 is a random threshold, most importantly C1 probably shouldn't disable IBRS, benchmarking needed. Suggested-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
*	cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE	Peter Zijlstra	2022-06-08	1	-7/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons") wrecked intel_idle in two ways: - must not have tracing in idle functions - must return with IRQs disabled Additionally, it added a branch for no good reason. Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ rjw: Moved the intel_idle() kerneldoc comment next to the function ] Cc: 5.16+ <stable@vger.kernel.org> # 5.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: Add AlderLake support	Zhang Rui	2022-04-28	1	-0/+133
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to SPR, the C1 and C1E states on ADL are mutually exclusive. Only one of them can be enabled at a time. But contrast to SPR, which usually has a strong latency requirement as a Xeon processor, C1E is preferred on ADL for better energy efficiency. Add custom C-state tables for ADL with both C1 and C1E, and 1. Enable the "C1E promotion" bit in MSR_IA32_POWER_CTL and mark C1 with the CPUIDLE_FLAG_UNUSABLE flag, so C1 is not available by default. 2. Add support for the "preferred_cstates" module parameter, so that users can choose to use C1 instead of C1E by booting with "intel_idle.preferred_cstates=2". Separate custom C-state tables are introduced for the ADL mobile and desktop processors, because of the exit latency differences between these two variants, especially with respect to PC10. Signed-off-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Changelog edits, code rearrangement ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: Fix SPR C6 optimization	Artem Bityutskiy	2022-04-27	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Sapphire Rapids (SPR) C6 optimization was added to the end of the 'spr_idle_state_table_update()' function. However, the function has a 'return' which may happen before the optimization has a chance to run. And this may prevent the optimization from happening. This is an unlikely scenario, but possible if user boots with, say, the 'intel_idle.preferred_cstates=6' kernel boot option. This patch fixes the issue by eliminating the problematic 'return' statement. Fixes: 3a9cf77b60dc ("intel_idle: add core C6 optimization for SPR") Suggested-by: Jan Beulich <jbeulich@suse.com> Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> [ rjw: Minor changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: Fix the 'preferred_cstates' module parameter	Artem Bityutskiy	2022-04-27	1	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem description. When user boots kernel up with the 'intel_idle.preferred_cstates=4' option, we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order for C1E to work on SPR, we have to enable the C1E promotion bit on all CPUs. However, we enable it only on one CPU. Fix description. The 'intel_idle' driver already has the infrastructure for disabling C1E promotion on every CPU. This patch uses the same infrastructure for enabling C1E promotion on every CPU. It changes the boolean 'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion' variable. Tested on a 2-socket SPR system. I verified the following combinations: * C1E promotion enabled and disabled in BIOS. * Booted with and without the 'intel_idle.preferred_cstates=4' kernel argument. In all 4 cases C1E promotion was correctly set on all CPUs. Also tested on an old Broadwell system, just to make sure it does not cause a regression. C1E promotion was correctly disabled on that system, both C1 and C1E were exposed (as expected). Fixes: da0e58c038e6 ("intel_idle: add 'preferred_cstates' module argument") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> [ rjw: Minor changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	cpuidle: intel_idle: Drop redundant backslash at line end	Rafael J. Wysocki	2022-03-17	1	-1/+1
\| \| \| \| \| \| \| \|	Drop a redundant backslash character at the end of a line in the spr_cstates[] definition. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
*	cpuidle: intel_idle: Update intel_idle() kerneldoc comment	Rafael J. Wysocki	2022-03-17	1	-3/+0
\| \| \| \| \| \| \| \| \|	Commit bf9282dc26e7 ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic") moved the leave_mm() call away from intel_idle(), but it didn't update its kerneldoc comment accordingly, so do that now. Fixes: bf9282dc26e7 ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: add core C6 optimization for SPR	Artem Bityutskiy	2022-03-04	1	-0/+15
\| \| \| \| \| \| \| \| \|	Add a Sapphire Rapids Xeon C6 optimization, similar to what we have for Sky Lake Xeon: if package C6 is disabled, adjust C6 exit latency and target residency to match core C6 values, instead of using the default package C6 values. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: add 'preferred_cstates' module argument	Artem Bityutskiy	2022-03-04	1	-0/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually exclusive - only one of them can be enabled. By default, 'intel_idle' driver enables C1 and disables C1E. However, some users prefer to use C1E instead of C1, because it saves more energy. This patch adds a new module parameter ('preferred_cstates') for enabling C1E and disabling C1. Here is the idea behind it. 1. This option has effect only for "mutually exclusive" C-states like C1 and C1E on SPR. 2. It does not have any effect on independent C-states, which do not require other C-states to be disabled (most states on most platforms as of today). 3. For mutually exclusive C-states, the 'intel_idle' driver always has a reasonable default, such as enabling C1 on SPR by default. On other platforms, the default may be different. 4. Users can override the default using the 'preferred_cstates' parameter. 5. The parameter accepts the preferred C-states bit-mask, similarly to the existing 'states_off' parameter. 6. This parameter is not limited to C1/C1E, and leaves room for supporting other mutually exclusive C-states, if they come in the future. Today 'intel_idle' can only be compiled-in, which means that on SPR, in order to disable C1 and enable C1E, users should boot with the following kernel argument: intel_idle.preferred_cstates=4 Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: add SPR support	Artem Bityutskiy	2022-03-04	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add Sapphire Rapids Xeon support. Up until very recently, the C1 and C1E C-states were independent, but this has changed in some new chips, including Sapphire Rapids Xeon (SPR). In these chips the C1 and C1E states cannot be enabled at the same time. The "C1E promotion" bit in 'MSR_IA32_POWER_CTL' also has its semantics changed a bit. Here are the C1, C1E, and "C1E promotion" bit rules on Xeons before SPR. 1. If C1E promotion bit is disabled. a. C1 requests end up with C1 C-state. b. C1E requests end up with C1E C-state. 2. If C1E promotion bit is enabled. a. C1 requests end up with C1E C-state. b. C1E requests end up with C1E C-state. Here are the C1, C1E, and "C1E promotion" bit rules on Sapphire Rapids Xeon. 1. If C1E promotion bit is disabled. a. C1 requests end up with C1 C-state. b. C1E requests end up with C1 C-state. 2. If C1E promotion bit is enabled. a. C1 requests end up with C1E C-state. b. C1E requests end up with C1E C-state. Before SPR Xeon, the 'intel_idle' driver was disabling C1E promotion and was exposing C1 and C1E as independent C-states. But on SPR, C1 and C1E cannot be enabled at the same time. This patch adds both C1 and C1E states. However, C1E is marked as with the "CPUIDLE_FLAG_UNUSABLE" flag, which means that in won't be registered by default. The C1E promotion bit will be cleared, which means that by default only C1 and C6 will be registered on SPR. The next patch will add an option for enabling C1E and disabling C1 on SPR. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: enable interrupts before C1 on Xeons	Artem Bityutskiy	2021-09-24	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable local interrupts before requesting C1 on the last two generations of Intel Xeon platforms: Sky Lake, Cascade Lake, Cooper Lake, Ice Lake. This decreases average C1 interrupt latency by about 5-10%, as measured with the 'wult' tool. The '->enter()' function of the driver enters C-states with local interrupts disabled by executing the 'monitor' and 'mwait' pair of instructions. If an interrupt happens, the CPU exits the C-state and continues executing instructions after 'mwait'. It does not jump to the interrupt handler, because local interrupts are disabled. The cpuidle subsystem enables interrupts a bit later, after doing some housekeeping. With this patch, we enable local interrupts before requesting C1. In this case, if the CPU wakes up because of an interrupt, it will jump to the interrupt handler right away. The cpuidle housekeeping will be done after the pending interrupt(s) are handled. Enabling interrupts before entering a C-state has measurable impact for faster C-states, like C1. Deeper, but slower C-states like C6 do not really benefit from this sort of change, because their latency is a lot higher comparing to the delay added by cpuidle housekeeping. This change was also tested with cyclictest and dbench. In case of Ice Lake, the average cyclictest latency decreased by 5.1%, and the average 'dbench' throughput increased by about 0.8%. Both tests were run for 4 hours with only C1 enabled (all other idle states, including 'POLL', were disabled). CPU frequency was pinned to HFM, and uncore frequency was pinned to the maximum value. The other platforms had similar single-digit percentage improvements. It is worth noting that this patch affects 'cpuidle' statistics a tiny bit. Before this patch, C1 residency did not include the interrupt handling time, but with this patch, it will include it. This is similar to what happens in case of the 'POLL' state, which also runs with interrupts enabled. Suggested-by: Len Brown <len.brown@intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: Adjust the SKX C6 parameters if PC6 is disabled	Chen Yu	2021-06-09	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because cpuidle assumes worst-case C-state parameters, PC6 parameters are used for describing C6, which is worst-case for requesting CC6. When PC6 is enabled, this is appropriate. But if PC6 is disabled in the BIOS, the exit latency and target residency should be adjusted accordingly. Exit latency: Previously the C6 exit latency was measured as the PC6 exit latency. With PC6 disabled, the C6 exit latency should be the one of CC6. Target residency: With PC6 disabled, the idle duration within [CC6, PC6) would make the idle governor choose C1E over C6. This would cause low energy-efficiency. We should lower the bar to request C6 when PC6 is disabled. To fill this gap, check if PC6 is disabled in the BIOS in the MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency for C6 and set target_residency to 3 times of the new exit latency. [This is consistent with how intel_idle driver uses _CST to calculate the target_residency.] As a result, the OS would be more likely to choose C6 over C1E when PC6 is disabled, which is reasonable, because if C6 is enabled, it implies that the user cares about energy, so choosing C6 more frequently makes sense. The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU model number as SkX, but their CC6 exit latencies are similar to the SKX one, 96us and 89us respectively, so reuse the SKX value for them. There is a concern that it might be better to use a more generic approach instead of optimizing every platform. However, if the required code complexity and different PC6 bit interpretation on different platforms are taken into account, tuning the code per platform seems to be an acceptable tradeoff. Link: https://intel.github.io/wult/ # [1] Suggested-by: Len Brown <len.brown@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: add Iclelake-D support	Artem Bityutskiy	2021-04-08	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	This patch adds Icelake Xeon D support to the intel_idle driver. Since Icelake D and Icelake SP C-state characteristics the same, we use Icelake SP C-states table for Icelake D as well. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: update ICX C6 data	Artem Bityutskiy	2021-03-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change IceLake Xeon C6 latency from 128 us to 170 us. The latency was measured with the "wult" tool and corresponds to the 99.99th percentile when measuring with the "nic" method. Note, the 128 us figure correspond to the median latency, but in intel_idle we use the "worst case" latency figure instead. C6 target residency was increased from 384 us to 600 us, which may result in less C6 residency in some workloads. This value was tested and compared to values 384, and 1000. Value 600 is a reasonable tradeoff between power and performance. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: remove definition of DEBUG	Tom Rix	2021-01-22	1	-1/+1
\| \| \| \| \| \| \| \|	Defining DEBUG should only be done in development. So remove DEBUG. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: add SnowRidge C-state table	Artem Bityutskiy	2020-12-30	1	-1/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville platforms. The following has been changed. 1. C1E latency changed from 10us to 15us. It was measured using the open source "wult" tool (the "nic" method, 15us is the 99.99th percentile). 2. C1E power break even changed from 20us to 25us, which may result in less C1E residency in some workloads. 3. C6 latency changed from 50us to 130us. Measured the same way as C1E. The C6 C-state is supported only by some SnowRidge revisions, so add a C-state table commentary about this. On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even though C6 is present in the table, the driver will only use it if the CPU does support it. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: Build fix	Peter Zijlstra	2020-12-03	1	-14/+14
\| \| \| \| \| \| \| \| \|	Because CONFIG_ soup. Fixes: 6e1d2bc675bd ("intel_idle: Fix intel_idle() vs tracing") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201130115402.GO3040@hirez.programming.kicks-ass.net
*	intel_idle: Fix intel_idle() vs tracing	Peter Zijlstra	2020-11-24	1	-17/+20
\| \| \| \| \| \| \| \| \| \| \|	cpuidle->enter() callbacks should not call into tracing because RCU has already been disabled. Instead of doing the broadcast thing itself, simply advertise to the cpuidle core that those states stop the timer. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20201123143510.GR3021@hirez.programming.kicks-ass.net
*	intel_idle: Fix max_cstate for processor models without C-state tables	Chen Yu	2020-10-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently intel_idle driver gets the c-state information from ACPI _CST if the processor model is not recognized by it. However the c-state in _CST starts with index 1 which is different from the index in intel_idle driver's internal c-state table. While intel_idle_max_cstate_reached() was previously introduced to deal with intel_idle driver's internal c-state table, re-using this function directly on _CST is incorrect. Fix this by subtracting 1 from the index when checking max_cstate in the _CST case. For example, append intel_idle.max_cstate=1 in boot command line, Before the patch: grep . /sys/devices/system/cpu/cpu0/cpuidle/state/name POLL After the patch: grep . /sys/devices/system/cpu/cpu0/cpuidle/state/name /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI Fixes: 18734958e9bf ("intel_idle: Use ACPI _CST for processor models without C-state tables") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Cc: 5.6+ <stable@vger.kernel.org> # 5.6+ Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: Ignore _CST if control cannot be taken from the platform	Mel Gorman	2020-10-16	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	e6d4f08a6776 ("intel_idle: Use ACPI _CST on server systems") avoids enabling c-states that have been disabled by the platform with the exception of C1E. Unfortunately, BIOS implementations are not always consistent in terms of how capabilities are advertised and control cannot always be handed over. If control cannot be handed over then intel_idle reports that "ACPI _CST not found or not usable" but does not clear acpi_state_table.count meaning the information is still partially used. This patch ignores ACPI information if CST control cannot be requested from the platform. This was only observed on a number of Haswell platforms that had identical CPUs but not identical BIOS versions. While this problem may be rare overall, 24 separate test cases bisected to this specific commit across 4 separate test machines and is worth addressing. If the situation occurs, the kernel behaves as it did before commit e6d4f08a6776 and uses any c-states that are discovered. The affected test cases were all ones that involved a small number of processes -- exec microbenchmark, pipe microbenchmark, git test suite, netperf, tbench with one client and system call microbenchmark. Each case benefits from being able to use turboboost which is prevented if the lower c-states are unavailable. This may mask real regressions specific to older hardware so it is worth addressing. C-state status before and after the patch 5.9.0-vanilla POLL latency:0 disabled:0 default:enabled 5.9.0-vanilla C1 latency:2 disabled:0 default:enabled 5.9.0-vanilla C1E latency:10 disabled:0 default:enabled 5.9.0-vanilla C3 latency:33 disabled:1 default:disabled 5.9.0-vanilla C6 latency:133 disabled:1 default:disabled 5.9.0-ignore-cst-v1r1 POLL latency:0 disabled:0 default:enabled 5.9.0-ignore-cst-v1r1 C1 latency:2 disabled:0 default:enabled 5.9.0-ignore-cst-v1r1 C1E latency:10 disabled:0 default:enabled 5.9.0-ignore-cst-v1r1 C3 latency:33 disabled:0 default:enabled 5.9.0-ignore-cst-v1r1 C6 latency:133 disabled:0 default:enabled Patch enables C3/C6. Netperf UDP_STREAM netperf-udp 5.5.0 5.9.0 vanilla ignore-cst-v1r1 Hmean send-64 193.41 ( 0.00%) 226.54 * 17.13%* Hmean send-128 392.16 ( 0.00%) 450.54 * 14.89%* Hmean send-256 769.94 ( 0.00%) 881.85 * 14.53%* Hmean send-1024 2994.21 ( 0.00%) 3468.95 * 15.85%* Hmean send-2048 5725.60 ( 0.00%) 6628.99 * 15.78%* Hmean send-3312 8468.36 ( 0.00%) 10288.02 * 21.49%* Hmean send-4096 10135.46 ( 0.00%) 12387.57 * 22.22%* Hmean send-8192 17142.07 ( 0.00%) 19748.11 * 15.20%* Hmean send-16384 28539.71 ( 0.00%) 30084.45 * 5.41%* Fixes: e6d4f08a6776 ("intel_idle: Use ACPI _CST on server systems") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: 5.6+ <stable@vger.kernel.org> # 5.6+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	intel_idle: mention assumption that WBINVD is not needed	Alexander Monakov	2020-10-16	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Intel SDM does not explicitly say that entering a C-state via MWAIT will implicitly flush CPU caches as appropriate for that C-state. However, documentation for individual Intel CPU generations does mention this behavior. Since intel_idle binds to any Intel CPU with MWAIT, list this assumption of MWAIT behavior. In passing, reword opening comment to make it clear that the driver can load on any old and future Intel CPU with MWAIT. Signed-off-by: Alexander Monakov <amonakov@ispras.ru> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic	Peter Zijlstra	2020-08-26	1	-16/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This allows moving the leave_mm() call into generic code before rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
*	Merge tag 'uninit-macro-v5.9-rc1' of ↵	Linus Torvalds	2020-08-04	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
\| *	treewide: Remove uninitialized_var() usage	Kees Cook	2020-07-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' \| cut -d: -f1 \| sort -u \| \ xargs perl -pi -e \ 's/\buninitialized_var$([^$]+)\)/\1/g; s:\s/\ (GCC be quiet\|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
* \|	intel_idle: Customize IceLake server support	Chen Yu	2020-07-30	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On ICX platform, the C1E auto-promotion is enabled by default. As a result, the CPU might fall into C1E more offen than previous platforms. Besides, the C1E is not exposed to sysfs on ICX, which is inconsistent with previous server platforms. So disable C1E auto-promotion and expose C1E as a separate idle state, so the C1E and C6 can be disabled via sysfs when necessary. Beside C1 and C1E, the exit latency of C6 was measured by a dedicated tool. However the exit latency(41us) exposed by _CST is much smaller than the one we measured(128us). This is probably due to the _CST uses the exit latency when woken up from PC0+C6, rather than PC6+C6 when C6 was measured. Choose the latter as we need the longest latency in theory. Reported-by: kernel test robot <lkp@intel.com> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \|	cpuidle: change enter_s2idle() prototype	Neal Liu	2020-07-29	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Control Flow Integrity(CFI) is a security mechanism that disallows changes to the original control flow graph of a compiled binary, making it significantly harder to perform such attacks. init_state_node() assign same function callback to different function pointer declarations. static int init_state_node(struct cpuidle_state idle_state, const struct of_device_id matches, struct device_node state_node) { ... idle_state->enter = match_id->data; ... idle_state->enter_s2idle = match_id->data; } Function declarations: struct cpuidle_state { ... int (enter) (struct cpuidle_device dev, struct cpuidle_driver drv, int index); void (enter_s2idle) (struct cpuidle_device dev, struct cpuidle_driver *drv, int index); }; In this case, either enter() or enter_s2idle() would cause CFI check failed since they use same callee. Align function prototype of enter() since it needs return value for some use cases. The return value of enter_s2idle() is no need currently. Signed-off-by: Neal Liu <neal.liu@mediatek.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \|	intel_idle: Eliminate redundant static variable	Rafael J. Wysocki	2020-06-29	1	-8/+3
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value of the lapic_timer_always_reliable static variable in the intel_idle driver reflects the boot_cpu_has(X86_FEATURE_ARAT) value and so it also reflects the static_cpu_has(X86_FEATURE_ARAT) value. Hence, the lapic_timer_always_reliable check in intel_idle() is redundant and apart from this lapic_timer_always_reliable is only used in two places in which boot_cpu_has(X86_FEATURE_ARAT) can be used directly. Eliminate the lapic_timer_always_reliable variable in accordance with the above observations. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	Merge branch 'perf-core-for-linus' of ↵	Linus Torvalds	2020-03-30	1	-41/+38
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle were: Kernel side changes: - A couple of x86/cpu cleanups and changes were grandfathered in due to patch dependencies. These clean up the set of CPU model/family matching macros with a consistent namespace and C99 initializer style. - A bunch of updates to various low level PMU drivers: * AMD Family 19h L3 uncore PMU * Intel Tiger Lake uncore support * misc fixes to LBR TOS sampling - optprobe fixes - perf/cgroup: optimize cgroup event sched-in processing - misc cleanups and fixes Tooling side changes are to: - perf {annotate,expr,record,report,stat,test} - perl scripting - libapi, libperf and libtraceevent - vendor events on Intel and S390, ARM cs-etm - Intel PT updates - Documentation changes and updates to core facilities - misc cleanups, fixes and other enhancements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits) cpufreq/intel_pstate: Fix wrong macro conversion x86/cpu: Cleanup the now unused CPU match macros hwrng: via_rng: Convert to new X86 CPU match macros crypto: Convert to new CPU match macros ASoC: Intel: Convert to new X86 CPU match macros powercap/intel_rapl: Convert to new X86 CPU match macros PCI: intel-mid: Convert to new X86 CPU match macros mmc: sdhci-acpi: Convert to new X86 CPU match macros intel_idle: Convert to new X86 CPU match macros extcon: axp288: Convert to new X86 CPU match macros thermal: Convert to new X86 CPU match macros hwmon: Convert to new X86 CPU match macros platform/x86: Convert to new CPU match macros EDAC: Convert to new X86 CPU match macros cpufreq: Convert to new X86 CPU match macros ACPI: Convert to new X86 CPU match macros x86/platform: Convert to new CPU match macros x86/kernel: Convert to new CPU match macros x86/kvm: Convert to new CPU match macros x86/perf/events: Convert to new CPU match macros ...
\| *	intel_idle: Convert to new X86 CPU match macros	Thomas Gleixner	2020-03-24	1	-41/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new macro set has a consistent namespace and uses C99 initializers instead of the grufty C89 ones. Get rid the of the local macro wrappers for consistency. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lkml.kernel.org/r/20200320131510.193755545@linutronix.de
* \|	intel_idle: Update copyright notice, known limitations and version	Rafael J. Wysocki	2020-02-11	1	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the copyright notice in intel_idle.c to cover the recent changes, drop the description of a "known limitation" that is not a limitation any more and bump up the driver version number. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>