summaryrefslogtreecommitdiff
path: root/arch/arm64/include
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'riscv-for-linus-5.12-mw0' of ↵Linus Torvalds2021-02-261-47/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "A handful of new RISC-V related patches for this merge window: - A check to ensure drivers are properly using uaccess. This isn't manifesting with any of the drivers I'm currently using, but may catch errors in new drivers. - Some preliminary support for the FU740, along with the HiFive Unleashed it will appear on. - NUMA support for RISC-V, which involves making the arm64 code generic. - Support for kasan on the vmalloc region. - A handful of new drivers for the Kendryte K210, along with the DT plumbing required to boot on a handful of K210-based boards. - Support for allocating ASIDs. - Preliminary support for kernels larger than 128MiB. - Various other improvements to our KASAN support, including the utilization of huge pages when allocating the KASAN regions. We may have already found a bug with the KASAN_VMALLOC code, but it's passing my tests. There's a fix in the works, but that will probably miss the merge window. * tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits) riscv: Improve kasan population by using hugepages when possible riscv: Improve kasan population function riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization riscv: Improve kasan definitions riscv: Get rid of MAX_EARLY_MAPPING_SIZE soc: canaan: Sort the Makefile alphabetically riscv: Disable KSAN_SANITIZE for vDSO riscv: Remove unnecessary declaration riscv: Add Canaan Kendryte K210 SD card defconfig riscv: Update Canaan Kendryte K210 defconfig riscv: Add Kendryte KD233 board device tree riscv: Add SiPeed MAIXDUINO board device tree riscv: Add SiPeed MAIX GO board device tree riscv: Add SiPeed MAIX DOCK board device tree riscv: Add SiPeed MAIX BiT board device tree riscv: Update Canaan Kendryte K210 device tree dt-bindings: add resets property to dw-apb-timer dt-bindings: fix sifive gpio properties dt-bindings: update sifive uart compatible string dt-bindings: update sifive clint compatible string ...
| * numa: Move numa implementation to common codeAtish Patra2021-01-141-47/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM64 numa implementation is generic enough that RISC-V can reuse that implementation with very minor cosmetic changes. This will help both ARM64 and RISC-V in terms of maintanace and feature improvement Move the numa implementation code to common directory so that both ISAs can reuse this. This doesn't introduce any function changes for ARM64. Signed-off-by: Atish Patra <atish.patra@wdc.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
| * arm64, numa: Change the numa init functions name to be genericAtish Patra2021-01-141-2/+2
| | | | | | | | | | | | | | | | | | | | This is a preparatory patch for unifying numa implementation between ARM64 & RISC-V. As the numa implementation will be moved to generic code, rename the arm64 related functions to a generic one. Signed-off-by: Atish Patra <atish.patra@wdc.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
* | Merge tag 'arm64-fixes' of ↵Linus Torvalds2021-02-261-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The big one is a fix for the VHE enabling path during early boot, where the code enabling the MMU wasn't necessarily in the identity map of the new page-tables, resulting in a consistent crash with 64k pages. In fixing that, we noticed some missing barriers too, so we added those for the sake of architectural compliance. Other than that, just the usual merge window trickle. There'll be more to come, too. Summary: - Fix lockdep false alarm on resume-from-cpuidle path - Fix memory leak in kexec_file - Fix module linker script to work with GDB - Fix error code when trying to use uprobes with AArch32 instructions - Fix late VHE enabling with 64k pages - Add missing ISBs after TLB invalidation - Fix seccomp when tracing syscall -1 - Fix stacktrace return code at end of stack - Fix inconsistent whitespace for pointer return values - Fix compiler warnings when building with W=1" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: stacktrace: Report when we reach the end of the stack arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL) arm64: Add missing ISB after invalidating TLB in enter_vhe arm64: Add missing ISB after invalidating TLB in __primary_switch arm64: VHE: Enable EL2 MMU from the idmap KVM: arm64: make the hyp vector table entries local arm64/mm: Fixed some coding style issues arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing kexec: move machine_kexec_post_load() to public interface arm64 module: set plt* section addresses to 0x0 arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
| * | arm64 module: set plt* section addresses to 0x0Shaoying Xu2021-02-191-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These plt* and .text.ftrace_trampoline sections specified for arm64 have non-zero addressses. Non-zero section addresses in a relocatable ELF would confuse GDB when it tries to compute the section offsets and it ends up printing wrong symbol addresses. Therefore, set them to zero, which mirrors the change in commit 5d8591bc0fba ("module: set ksymtab/kcrctab* section addresses to 0x0"). Reported-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Shaoying Xu <shaoyi@amazon.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210216183234.GA23876@amazon.com Signed-off-by: Will Deacon <will@kernel.org>
* | | arm64: kasan: simplify and inline MTE functionsAndrey Konovalov2021-02-265-11/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change provides a simpler implementation of mte_get_mem_tag(), mte_get_random_tag(), and mte_set_mem_tag_range(). Simplifications include removing system_supports_mte() checks as these functions are onlye called from KASAN runtime that had already checked system_supports_mte(). Besides that, size and address alignment checks are removed from mte_set_mem_tag_range(), as KASAN now does those. This change also moves these functions into the asm/mte-kasan.h header and implements mte_set_mem_tag_range() via inline assembly to avoid unnecessary functions calls. [vincenzo.frascino@arm.com: fix warning in mte_get_random_tag()] Link: https://lkml.kernel.org/r/20210211152208.23811-1-vincenzo.frascino@arm.com Link: https://lkml.kernel.org/r/a26121b294fdf76e369cb7a74351d1c03a908930.1612546384.git.andreyknvl@google.com Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | kfence: use pt_regs to generate stack trace on faultsMarco Elver2021-02-261-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of removing the fault handling portion of the stack trace based on the fault handler's name, just use struct pt_regs directly. Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it through to kfence_report_error() for out-of-bounds, use-after-free, or invalid access errors, where pt_regs is used to generate the stack trace. If the kernel is a DEBUG_KERNEL, also show registers for more information. Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | arm64, kfence: enable KFENCE for ARM64Marco Elver2021-02-261-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add architecture specific implementation details for KFENCE and enable KFENCE for the arm64 architecture. In particular, this implements the required interface in <asm/kfence.h>. KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the entire linear map to be mapped at page granularity. Doing so may result in extra memory allocated for page tables in case rodata=full is not set; however, currently CONFIG_RODATA_FULL_DEFAULT_ENABLED=y is the default, and the common case is therefore not affected by this change. [elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-3-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-4-elver@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2021-02-242-0/+13
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc updates from Andrew Morton: "A few small subsystems and some of MM. 172 patches. Subsystems affected by this patch series: hexagon, scripts, ntfs, ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap, memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, mempolicy, oom-kill, hugetlbfs, and migration)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (172 commits) mm/migrate: remove unneeded semicolons hugetlbfs: remove unneeded return value of hugetlb_vmtruncate() hugetlbfs: fix some comment typos hugetlbfs: correct some obsolete comments about inode i_mutex hugetlbfs: make hugepage size conversion more readable hugetlbfs: remove meaningless variable avoid_reserve hugetlbfs: correct obsolete function name in hugetlbfs_read_iter() hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr() hugetlbfs: remove special hugetlbfs_set_page_dirty() mm/hugetlb: change hugetlb_reserve_pages() to type bool mm, oom: fix a comment in dump_task() mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk() numa balancing: migrate on fault among multiple bound nodes mm, compaction: make fast_isolate_freepages() stay within zone mm/compaction: fix misbehaviors of fast_find_migrateblock() mm/compaction: correct deferral logic for proactive compaction mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked mm/compaction: remove rcu_read_lock during page compaction z3fold: simplify the zhdr initialization code in init_z3fold_page() ...
| * | | kasan, arm64: allow using KUnit tests with HW_TAGS modeAndrey Konovalov2021-02-242-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a high level, this patch allows running KUnit KASAN tests with the hardware tag-based KASAN mode. Internally, this change reenables tag checking at the end of each KASAN test that triggers a tag fault and leads to tag checking being disabled. Also simplify is_write calculation in report_tag_fault. With this patch KASAN tests are still failing for the hardware tag-based mode; fixes come in the next few patches. [andreyknvl@google.com: export HW_TAGS symbols for KUnit tests] Link: https://lkml.kernel.org/r/e7eeb252da408b08f0c81b950a55fb852f92000b.1613155970.git.andreyknvl@google.com Link: https://linux-review.googlesource.com/id/Id94dc9eccd33b23cda4950be408c27f879e474c8 Link: https://lkml.kernel.org/r/51b23112cf3fd62b8f8e9df81026fa2b15870501.1610733117.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'char-misc-5.12-rc1' of ↵Linus Torvalds2021-02-241-0/+11
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the large set of char/misc/whatever driver subsystem updates for 5.12-rc1. Over time it seems like this tree is collecting more and more tiny driver subsystems in one place, making it easier for those maintainers, which is why this is getting larger. Included in here are: - coresight driver updates - habannalabs driver updates - virtual acrn driver addition (proper acks from the x86 maintainers) - broadcom misc driver addition - speakup driver updates - soundwire driver updates - fpga driver updates - amba driver updates - mei driver updates - vfio driver updates - greybus driver updates - nvmeem driver updates - phy driver updates - mhi driver updates - interconnect driver udpates - fsl-mc bus driver updates - random driver fix - some small misc driver updates (rtsx, pvpanic, etc.) All of these have been in linux-next for a while, with the only reported issue being a merge conflict due to the dfl_device_id addition from the fpga subsystem in here" * tag 'char-misc-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits) spmi: spmi-pmic-arb: Fix hw_irq overflow Documentation: coresight: Add PID tracing description coresight: etm-perf: Support PID tracing for kernel at EL2 coresight: etm-perf: Clarify comment on perf options ACRN: update MAINTAINERS: mailing list is subscribers-only regmap: sdw-mbq: use MODULE_LICENSE("GPL") regmap: sdw: use no_pm routines for SoundWire 1.2 MBQ regmap: sdw: use _no_pm functions in regmap_read/write soundwire: intel: fix possible crash when no device is detected MAINTAINERS: replace my with email with replacements mhi: Fix double dma free uapi: map_to_7segment: Update example in documentation uio: uio_pci_generic: don't fail probe if pdev->irq equals to IRQ_NOTCONNECTED drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue firewire: replace tricky statement by two simple ones vme: make remove callback return void firmware: google: make coreboot driver's remove callback return void firmware: xilinx: Use explicit values for all enum values sample/acrn: Introduce a sample of HSM ioctl interface usage virt: acrn: Introduce an interface for Service VM to control vCPU ...
| * | | arm64: Add TRFCR_ELx definitionsJonathan Zhou2021-02-041-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add definitions for the Arm v8.4 SelfHosted trace extensions registers. [ split the register definitions to separate patch rename some of the symbols ] Link: https://lore.kernel.org/r/20210110224850.1880240-28-suzuki.poulose@arm.com Cc: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jonathan Zhou <jonathan.zhouwen@huawei.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20210201181351.1475223-30-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds2021-02-232-1/+3
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
| * | | | fs: add mount_setattr()Christian Brauner2021-01-242-1/+3
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements the missing mount_setattr() syscall. While the new mount api allows to change the properties of a superblock there is currently no way to change the properties of a mount or a mount tree using file descriptors which the new mount api is based on. In addition the old mount api has the restriction that mount options cannot be applied recursively. This hasn't changed since changing mount options on a per-mount basis was implemented in [1] and has been a frequent request not just for convenience but also for security reasons. The legacy mount syscall is unable to accommodate this behavior without introducing a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND | MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost mount. Changing MS_REC to apply to the whole mount tree would mean introducing a significant uapi change and would likely cause significant regressions. The new mount_setattr() syscall allows to recursively clear and set mount options in one shot. Multiple calls to change mount options requesting the same changes are idempotent: int mount_setattr(int dfd, const char *path, unsigned flags, struct mount_attr *uattr, size_t usize); Flags to modify path resolution behavior are specified in the @flags argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW, and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to restrict path resolution as introduced with openat2() might be supported in the future. The mount_setattr() syscall can be expected to grow over time and is designed with extensibility in mind. It follows the extensible syscall pattern we have used with other syscalls such as openat2(), clone3(), sched_{set,get}attr(), and others. The set of mount options is passed in the uapi struct mount_attr which currently has the following layout: struct mount_attr { __u64 attr_set; __u64 attr_clr; __u64 propagation; __u64 userns_fd; }; The @attr_set and @attr_clr members are used to clear and set mount options. This way a user can e.g. request that a set of flags is to be raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in @attr_set while at the same time requesting that another set of flags is to be lowered such as removing noexec from a mount tree by specifying MOUNT_ATTR_NOEXEC in @attr_clr. Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0, not a bitmap, users wanting to transition to a different atime setting cannot simply specify the atime setting in @attr_set, but must also specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in @attr_clr. The @propagation field lets callers specify the propagation type of a mount tree. Propagation is a single property that has four different settings and as such is not really a flag argument but an enum. Specifically, it would be unclear what setting and clearing propagation settings in combination would amount to. The legacy mount() syscall thus forbids the combination of multiple propagation settings too. The goal is to keep the semantics of mount propagation somewhat simple as they are overly complex as it is. The @userns_fd field lets user specify a user namespace whose idmapping becomes the idmapping of the mount. This is implemented and explained in detail in the next patch. [1]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount") Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com Cc: David Howells <dhowells@redhat.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-api@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2021-02-218-69/+57
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "x86: - Support for userspace to emulate Xen hypercalls - Raise the maximum number of user memslots - Scalability improvements for the new MMU. Instead of the complex "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent, but the code that can run against page faults is limited. Right now only page faults take the lock for reading; in the future this will be extended to some cases of page table destruction. I hope to switch the default MMU around 5.12-rc3 (some testing was delayed due to Chinese New Year). - Cleanups for MAXPHYADDR checks - Use static calls for vendor-specific callbacks - On AMD, use VMLOAD/VMSAVE to save and restore host state - Stop using deprecated jump label APIs - Workaround for AMD erratum that made nested virtualization unreliable - Support for LBR emulation in the guest - Support for communicating bus lock vmexits to userspace - Add support for SEV attestation command - Miscellaneous cleanups PPC: - Support for second data watchpoint on POWER10 - Remove some complex workarounds for buggy early versions of POWER9 - Guest entry/exit fixes ARM64: - Make the nVHE EL2 object relocatable - Cleanups for concurrent translation faults hitting the same page - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Simplification of the early init hypercall handling Non-KVM changes (with acks): - Detection of contended rwlocks (implemented only for qrwlocks, because KVM only needs it for x86) - Allow __DISABLE_EXPORTS from assembly code - Provide a saner follow_pfn replacements for modules" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits) KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes KVM: selftests: Don't bother mapping GVA for Xen shinfo test KVM: selftests: Fix hex vs. decimal snafu in Xen test KVM: selftests: Fix size of memslots created by Xen tests KVM: selftests: Ignore recently added Xen tests' build output KVM: selftests: Add missing header file needed by xAPIC IPI tests KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static locking/arch: Move qrwlock.h include after qspinlock.h KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path KVM: PPC: remove unneeded semicolon KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest KVM: PPC: Book3S HV: Fix radix guest SLB side channel KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR ...
| * \ \ \ Merge tag 'kvmarm-5.12' of ↵Paolo Bonzini2021-02-129-80/+62
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.12 - Make the nVHE EL2 object relocatable, resulting in much more maintainable code - Handle concurrent translation faults hitting the same page in a more elegant way - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Allow the disabling of symbol export from assembly code - Simplification of the early init hypercall handling
| | * \ \ \ Merge branch 'kvm-arm64/pmu-debug-fixes-5.11' into kvmarm-master/nextMarc Zyngier2021-02-121-0/+3
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | | | KVM: arm64: Upgrade PMU support to ARMv8.4Marc Zyngier2021-02-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upgrading the PMU code from ARMv8.1 to ARMv8.4 turns out to be pretty easy. All that is required is support for PMMIR_EL1, which is read-only, and for which returning 0 is a valid option as long as we don't advertise STALL_SLOT as an implemented event. Let's just do that and adjust what we return to the guest. Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * | | | | Merge branch 'kvm-arm64/rng-5.12' into kvmarm-master/nextMarc Zyngier2021-02-121-0/+2
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | | | | KVM: arm64: Implement the TRNG hypervisor callArd Biesheuvel2021-01-251-0/+2
| | | | |/ / / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a hypervisor implementation of the ARM architected TRNG firmware interface described in ARM spec DEN0098. All function IDs are implemented, including both 32-bit and 64-bit versions of the TRNG_RND service, which is the centerpiece of the API. The API is backed by the kernel's entropy pool only, to avoid guests draining more precious direct entropy sources. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> [Andre: minor fixes, drop arch_get_random() usage] Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210106103453.152275-6-andre.przywara@arm.com
| | * | | | | Merge branch 'kvm-arm64/hyp-reloc' into kvmarm-master/nextMarc Zyngier2021-02-124-73/+46
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | | | | KVM: arm64: Remove hyp_symbol_addrDavid Brazdil2021-01-231-26/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hyp code used the hyp_symbol_addr helper to force PC-relative addressing because absolute addressing results in kernel VAs due to the way hyp code is linked. This is not true anymore, so remove the helper and update all of its users. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-9-dbrazdil@google.com
| | | * | | | | KVM: arm64: Remove patching of fn pointers in hypDavid Brazdil2021-01-231-18/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Storing a function pointer in hyp now generates relocation information used at early boot to convert the address to hyp VA. The existing alternative-based conversion mechanism is therefore obsolete. Remove it and simplify its users. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-8-dbrazdil@google.com
| | | * | | | | KVM: arm64: Fix constant-pool users in hypDavid Brazdil2021-01-231-26/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hyp code uses absolute addressing to obtain a kimg VA of a small number of kernel symbols. Since the kernel now converts constant pool addresses to hyp VAs, this trick does not work anymore. Change the helpers to convert from hyp VA back to kimg VA or PA, as needed and rework the callers accordingly. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-7-dbrazdil@google.com
| | | * | | | | KVM: arm64: Apply hyp relocations at runtimeDavid Brazdil2021-01-232-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM nVHE code runs under a different VA mapping than the kernel, hence so far it avoided using absolute addressing because the VA in a constant pool is relocated by the linker to a kernel VA (see hyp_symbol_addr). Now the kernel has access to a list of positions that contain a kimg VA but will be accessed only in hyp execution context. These are generated by the gen-hyprel build-time tool and stored in .hyp.reloc. Add early boot pass over the entries and convert the kimg VAs to hyp VAs. Note that this requires for .hyp* ELF sections to be mapped read-write at that point. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-6-dbrazdil@google.com
| | | * | | | | KVM: arm64: Add symbol at the beginning of each hyp sectionDavid Brazdil2021-01-231-2/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generating hyp relocations will require referencing positions at a given offset from the beginning of hyp sections. Since the final layout will not be determined until the linking of `vmlinux`, modify the hyp linker script to insert a symbol at the first byte of each hyp section to use as an anchor. The linker of `vmlinux` will place the symbols together with the sections. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-4-dbrazdil@google.com
| | | * | | | | KVM: arm64: Set up .hyp.rodata ELF sectionDavid Brazdil2021-01-231-1/+1
| | | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will need to recognize pointers in .rodata specific to hyp, so establish a .hyp.rodata ELF section. Merge it with the existing .hyp.data..ro_after_init as they are treated the same at runtime. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-3-dbrazdil@google.com
| | * | | | | KVM: arm64: Filter out the case of only changing permissions from stage-2 ↵Yanan Wang2021-01-251-0/+5
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | map path (1) During running time of a a VM with numbers of vCPUs, if some vCPUs access the same GPA almost at the same time and the stage-2 mapping of the GPA has not been built yet, as a result they will all cause translation faults. The first vCPU builds the mapping, and the followed ones end up updating the valid leaf PTE. Note that these vCPUs might want different access permissions (RO, RW, RX, RWX, etc.). (2) It's inevitable that we sometimes will update an existing valid leaf PTE in the map path, and we perform break-before-make in this case. Then more unnecessary translation faults could be caused if the *break stage* of BBM is just catched by other vCPUS. With (1) and (2), something unsatisfactory could happen: vCPU A causes a translation fault and builds the mapping with RW permissions, vCPU B then update the valid leaf PTE with break-before-make and permissions are updated back to RO. Besides, *break stage* of BBM may trigger more translation faults. Finally, some useless small loops could occur. We can make some optimization to solve above problems: When we need to update a valid leaf PTE in the map path, let's filter out the case where this update only change access permissions, and don't update the valid leaf PTE here in this case. Instead, let the vCPU enter back the guest and it will exit next time to go through the relax_perms path without break-before-make if it still wants more permissions. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210114121350.123684-3-wangyanan55@huawei.com
| * | | | | locking/arch: Move qrwlock.h include after qspinlock.hWaiman Long2021-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | include/asm-generic/qrwlock.h was trying to get arch_spin_is_locked via asm-generic/qspinlock.h. However, this does not work because architectures might be using queued rwlocks but not queued spinlocks (csky), or because they might be defining their own queued_* macros before including asm/qspinlock.h. To fix this, ensure that asm/spinlock.h always includes qrwlock.h after defining arch_spin_is_locked (either directly for csky, or via asm/qspinlock.h for other architectures). The only inclusion elsewhere is in kernel/locking/qrwlock.c. That one is really unnecessary because the file is only compiled in SMP configurations (config QUEUED_RWLOCKS depends on SMP) and in that case linux/spinlock.h already includes asm/qrwlock.h if needed, via asm/spinlock.h. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Waiman Long <longman@redhat.com> Fixes: 26128cb6c7e6 ("locking/rwlocks: Add contention detection for rwlocks") Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Ben Gardon <bgardon@google.com> [Add arch/sparc and kernel/locking parts per discussion with Waiman. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | KVM: Raise the maximum number of user memslotsVitaly Kuznetsov2021-02-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86, 32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are allocated dynamically in KVM when added so the only real limitation is 'id_to_index' array which is 'short'. We don't have any other KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures. Low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations. In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC enabled and e.g. 256 vCPUs the limit is hit as SynIC requires two pages per vCPU and the guest is free to pick any GFN for each of them, this fragments memslots as QEMU wants to have a separate memslot for each of these pages (which are supposed to act as 'overlay' pages). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | | Merge tag 'arm64-upstream' of ↵Linus Torvalds2021-02-2119-76/+265
|\ \ \ \ \ \ | | |_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - vDSO build improvements including support for building with BSD. - Cleanup to the AMU support code and initialisation rework to support cpufreq drivers built as modules. - Removal of synthetic frame record from exception stack when entering the kernel from EL0. - Add support for the TRNG firmware call introduced by Arm spec DEN0098. - Cleanup and refactoring across the board. - Avoid calling arch_get_random_seed_long() from add_interrupt_randomness() - Perf and PMU updates including support for Cortex-A78 and the v8.3 SPE extensions. - Significant steps along the road to leaving the MMU enabled during kexec relocation. - Faultaround changes to initialise prefaulted PTEs as 'old' when hardware access-flag updates are supported, which drastically improves vmscan performance. - CPU errata updates for Cortex-A76 (#1463225) and Cortex-A55 (#1024718) - Preparatory work for yielding the vector unit at a finer granularity in the crypto code, which in turn will one day allow us to defer softirq processing when it is in use. - Support for overriding CPU ID register fields on the command-line. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (85 commits) drivers/perf: Replace spin_lock_irqsave to spin_lock mm: filemap: Fix microblaze build failure with 'mmu_defconfig' arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+ arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line arm64: Defer enabling pointer authentication on boot core arm64: cpufeatures: Allow disabling of BTI from the command-line arm64: Move "nokaslr" over to the early cpufeature infrastructure KVM: arm64: Document HVC_VHE_RESTART stub hypercall arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0 arm64: Add an aliasing facility for the idreg override arm64: Honor VHE being disabled from the command-line arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line arm64: cpufeature: Add an early command-line cpufeature override facility arm64: Extract early FDT mapping from kaslr_early_init() arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding() arm64: cpufeature: Add global feature override facility arm64: Move SCTLR_EL1 initialisation to EL-agnostic code arm64: Simplify init_el2_state to be non-VHE only arm64: Move VHE-specific SPE setup to mutate_to_vhe() arm64: Drop early setting of MDSCR_EL2.TPMS ...
| * | | | | Merge branch 'for-next/rng' into for-next/coreWill Deacon2021-02-121-10/+72
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the TRNG firmware call introduced by Arm spec DEN0098. * for-next/rng: arm64: Add support for SMCCC TRNG entropy source firmware: smccc: Introduce SMCCC TRNG framework firmware: smccc: Add SMCCC TRNG function call IDs
| | * | | | | arm64: Add support for SMCCC TRNG entropy sourceAndre Przywara2021-01-211-11/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARM architected TRNG firmware interface, described in ARM spec DEN0098, defines an ARM SMCCC based interface to a true random number generator, provided by firmware. This can be discovered via the SMCCC >=v1.1 interface, and provides up to 192 bits of entropy per call. Hook this SMC call into arm64's arch_get_random_*() implementation, coming to the rescue when the CPU does not implement the ARM v8.5 RNG system registers. For the detection, we piggy back on the PSCI/SMCCC discovery (which gives us the conduit to use (hvc/smc)), then try to call the ARM_SMCCC_TRNG_VERSION function, which returns -1 if this interface is not implemented. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | firmware: smccc: Introduce SMCCC TRNG frameworkAndre Przywara2021-01-211-0/+12
| | | |/ / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARM DEN0098 document describe an SMCCC based firmware service to deliver hardware generated random numbers. Its existence is advertised according to the SMCCC v1.1 specification. Add a (dummy) call to probe functions implemented in each architecture (ARM and arm64), to determine the existence of this interface. For now this return false, but this will be overwritten by each architecture's support patch. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
| * | | | | Merge branch 'for-next/perf' into for-next/coreWill Deacon2021-02-121-1/+8
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perf and PMU updates including support for Cortex-A78 and the v8.3 SPE extensions. * for-next/perf: drivers/perf: Replace spin_lock_irqsave to spin_lock dt-bindings: arm: add Cortex-A78 binding arm64: perf: add support for Cortex-A78 arm64: perf: Constify static attribute_group structs drivers/perf: Prevent forced unbinding of ARM_DMC620_PMU drivers perf/arm-cmn: Move IRQs when migrating context perf/arm-cmn: Fix PMU instance naming perf: Constify static struct attribute_group perf: hisi: Constify static struct attribute_group perf/imx_ddr: Constify static struct attribute_group perf: qcom: Constify static struct attribute_group drivers/perf: Add support for ARMv8.3-SPE
| | * | | | | drivers/perf: Add support for ARMv8.3-SPEWei Li2021-01-201-1/+8
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Armv8.3 extends the SPE by adding: - Alignment field in the Events packet, and filtering on this event using PMSEVFR_EL1. - Support for the Scalable Vector Extension (SVE). The main additions for SVE are: - Recording the vector length for SVE operations in the Operation Type packet. It is not possible to filter on vector length. - Incomplete predicate and empty predicate fields in the Events packet, and filtering on these events using PMSEVFR_EL1. Update the check of pmsevfr for empty/partial predicated SVE and alignment event in SPE driver. Signed-off-by: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201203141609.14148-1-liwei391@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
| * | | | | Merge branch 'for-next/misc' into for-next/coreWill Deacon2021-02-126-17/+37
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Miscellaneous arm64 changes for 5.12. * for-next/misc: arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+ arm64: vmlinux.ld.S: add assertion for tramp_pg_dir offset arm64: vmlinux.ld.S: add assertion for reserved_pg_dir offset arm64/ptdump:display the Linear Mapping start marker arm64: ptrace: Fix missing return in hw breakpoint code KVM: arm64: Move __hyp_set_vectors out of .hyp.text arm64: Include linux/io.h in mm/mmap.c arm64: cacheflush: Remove stale comment arm64: mm: Remove unused header file arm64/sparsemem: reduce SECTION_SIZE_BITS arm64/mm: Add warning for outside range requests in vmemmap_populate() arm64: Drop workaround for broken 'S' constraint with GCC 4.9
| | * | | | | arm64: vmlinux.ld.S: add assertion for tramp_pg_dir offsetJoey Gouly2021-02-031-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add TRAMP_SWAPPER_OFFSET and use that instead of hardcoding the offset between swapper_pg_dir and tramp_pg_dir. Then use TRAMP_SWAPPER_OFFSET to assert that the offset is correct at link time. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210202123658.22308-3-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: vmlinux.ld.S: add assertion for reserved_pg_dir offsetJoey Gouly2021-02-033-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add RESERVED_SWAPPER_OFFSET and use that instead of hardcoding the offset between swapper_pg_dir and reserved_pg_dir. Then use RESERVED_SWAPPER_OFFSET to assert that the offset is correct at link time. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210202123658.22308-2-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: cacheflush: Remove stale commentShaokun Zhang2021-01-261-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove stale comment since commit a7ba121215fa ("arm64: use asm-generic/cacheflush.h") Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/1611575753-36435-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64/sparsemem: reduce SECTION_SIZE_BITSSudarshan Rajagopalan2021-01-211-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | memory_block_size_bytes() determines the memory hotplug granularity i.e the amount of memory which can be hot added or hot removed from the kernel. The generic value here being MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS) for memory_block_size_bytes() on platforms like arm64 that does not override. Current SECTION_SIZE_BITS is 30 i.e 1GB which is large and a reduction here increases memory hotplug granularity, thus improving its agility. A reduced section size also reduces memory wastage in vmemmmap mapping for sections with large memory holes. So we try to set the least section size as possible. A section size bits selection must follow: (MAX_ORDER - 1 + PAGE_SHIFT) <= SECTION_SIZE_BITS CONFIG_FORCE_MAX_ZONEORDER is always defined on arm64 and so just following it would help achieve the smallest section size. SECTION_SIZE_BITS = (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) SECTION_SIZE_BITS = 22 (11 - 1 + 12) i.e 4MB for 4K pages SECTION_SIZE_BITS = 24 (11 - 1 + 14) i.e 16MB for 16K pages without THP SECTION_SIZE_BITS = 25 (12 - 1 + 14) i.e 32MB for 16K pages with THP SECTION_SIZE_BITS = 26 (11 - 1 + 16) i.e 64MB for 64K pages without THP SECTION_SIZE_BITS = 29 (14 - 1 + 16) i.e 512MB for 64K pages with THP But there are other problems in reducing SECTION_SIZE_BIT. Reducing it by too much would over populate /sys/devices/system/memory/ and also consume too many page->flags bits in the !vmemmap case. Also section size needs to be multiple of 128MB to have PMD based vmemmap mapping with CONFIG_ARM64_4K_PAGES. Given these constraints, lets just reduce the section size to 128MB for 4K and 16K base page size configs, and to 512MB for 64K base page size config. Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org> Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/43843c5e092bfe3ec4c41e3c8c78a7ee35b69bb0.1611206601.git.sudaraja@codeaurora.org Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: Drop workaround for broken 'S' constraint with GCC 4.9Marc Zyngier2021-01-201-7/+1
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since GCC < 5.1 has been shown to be unsuitable for the arm64 kernel, let's drop the workaround for the 'S' asm constraint that GCC 4.9 doesn't always grok. This is effectively a revert of 9fd339a45be5 ("arm64: Work around broken GCC 4.9 handling of "S" constraint"). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210118130129.2875949-1-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
| * | | | | Merge branch 'for-next/kexec' into for-next/coreWill Deacon2021-02-123-6/+45
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Significant steps along the road to leaving the MMU enabled during kexec relocation. * for-next/kexec: arm64: hibernate: add __force attribute to gfp_t casting arm64: kexec: arm64_relocate_new_kernel don't use x0 as temp arm64: kexec: arm64_relocate_new_kernel clean-ups and optimizations arm64: kexec: call kexec_image_info only once arm64: kexec: move relocation function setup arm64: trans_pgd: hibernate: idmap the single page that holds the copy page routines arm64: mm: Always update TCR_EL1 from __cpu_set_tcr_t0sz() arm64: trans_pgd: pass NULL instead of init_mm to *_populate functions arm64: trans_pgd: pass allocator trans_pgd_create_copy arm64: trans_pgd: make trans_pgd_map_page generic arm64: hibernate: move page handling function to new trans_pgd.c arm64: hibernate: variable pudp is used instead of pd4dp arm64: kexec: make dtb_mem always enabled
| | * | | | | arm64: kexec: move relocation function setupPavel Tatashin2021-01-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, kernel relocation function is configured in machine_kexec() at the time of kexec reboot by using control_code_page. This operation, however, is more logical to be done during kexec_load, and thus remove from reboot time. Move, setup of this function to newly added machine_kexec_post_load(). Because once MMU is enabled, kexec control page will contain more than relocation kernel, but also vector table, add pointer to the actual function within this page arch.kern_reloc. Currently, it equals to the beginning of page, we will add offsets later, when vector table is added. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20210125191923.1060122-10-pasha.tatashin@soleen.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: trans_pgd: hibernate: idmap the single page that holds the copy page ↵James Morse2021-01-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | routines To resume from hibernate, the contents of memory are restored from the swap image. This may overwrite any page, including the running kernel and its page tables. Hibernate copies the code it uses to do the restore into a single page that it knows won't be overwritten, and maps it with page tables built from pages that won't be overwritten. Today the address it uses for this mapping is arbitrary, but to allow kexec to reuse this code, it needs to be idmapped. To idmap the page we must avoid the kernel helpers that have VA_BITS baked in. Convert create_single_mapping() to take a single PA, and idmap it. The page tables are built in the reverse order to normal using pfn_pte() to stir in any bits between 52:48. T0SZ is always increased to cover 48bits, or 52 if the copy code has bits 52:48 in its PA. Signed-off-by: James Morse <james.morse@arm.com> [Adopted the original patch from James to trans_pgd interface, so it can be commonly used by both Kexec and Hibernate. Some minor clean-ups.] Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/linux-arm-kernel/20200115143322.214247-4-james.morse@arm.com/ Link: https://lore.kernel.org/r/20210125191923.1060122-9-pasha.tatashin@soleen.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: mm: Always update TCR_EL1 from __cpu_set_tcr_t0sz()James Morse2021-01-271-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because only the idmap sets a non-standard T0SZ, __cpu_set_tcr_t0sz() can check for platforms that need to do this using __cpu_uses_extended_idmap() before doing its work. The idmap is only built with enough levels, (and T0SZ bits) to map its single page. To allow hibernate, and then kexec to idmap their single page copy routines, __cpu_set_tcr_t0sz() needs to consider additional users, who may need a different number of levels/T0SZ-bits to the idmap. (i.e. VA_BITS may be enough for the idmap, but not hibernate/kexec) Always read TCR_EL1, and check whether any work needs doing for this request. __cpu_uses_extended_idmap() remains as it is used by KVM, whose idmap is also part of the kernel image. This mostly affects the cpuidle path, where we now get an extra system register read . CC: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com> CC: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/20210125191923.1060122-8-pasha.tatashin@soleen.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: trans_pgd: pass allocator trans_pgd_create_copyPavel Tatashin2021-01-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make trans_pgd_create_copy and its subroutines to use allocator that is passed as an argument Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20210125191923.1060122-6-pasha.tatashin@soleen.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: trans_pgd: make trans_pgd_map_page genericPavel Tatashin2021-01-271-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kexec is going to use a different allocator, so make trans_pgd_map_page to accept allocator as an argument, and also kexec is going to use a different map protection, so also pass it via argument. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Matthias Brugger <mbrugger@suse.com> Link: https://lore.kernel.org/r/20210125191923.1060122-5-pasha.tatashin@soleen.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: hibernate: move page handling function to new trans_pgd.cPavel Tatashin2021-01-271-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now, that we abstracted the required functions move them to a new home. Later, we will generalize these function in order to be useful outside of hibernation. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20210125191923.1060122-4-pasha.tatashin@soleen.com Signed-off-by: Will Deacon <will@kernel.org>
| | * | | | | arm64: kexec: make dtb_mem always enabledPavel Tatashin2021-01-271-2/+2
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, dtb_mem is enabled only when CONFIG_KEXEC_FILE is enabled. This adds ugly ifdefs to c files. Always enabled dtb_mem, when it is not used, it is NULL. Change the dtb_mem to phys_addr_t, as it is a physical address. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20210125191923.1060122-2-pasha.tatashin@soleen.com Signed-off-by: Will Deacon <will@kernel.org>