linux

mirror of https://github.com/torvalds/linux.git synced 2025-12-07 11:56:58 +00:00

Author	SHA1	Message	Date
Linus Torvalds	44fc84337b	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/resctrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory management: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set_*_tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...	2025-12-02 17:03:55 -08:00
Linus Torvalds	2547f79b0b	Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Provide a new interface for dynamic configuration and deconfiguration of hotplug memory, allowing with and without memmap_on_memory support. This makes the way memory hotplug is handled on s390 much more similar to other architectures - Remove compat support. There shouldn't be any compat user space around anymore, therefore get rid of a lot of code which also doesn't need to be tested anymore - Add stackprotector support. GCC 16 will get new compiler options, which allow to generate code required for kernel stackprotector support - Merge pai_crypto and pai_ext PMU drivers into a new driver. This removes a lot of duplicated code. The new driver is also extendable and allows to support new PMUs - Add driver override support for AP queues - Rework and extend zcrypt and AP trace events to allow for tracing of crypto requests - Support block sizes larger than 65535 bytes for CCW tape devices - Since the rework of the virtual kernel address space the module area and the kernel image are within the same 4GB area. This eliminates the need of weak per cpu variables. Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU - Various other small improvements and fixes * tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits) watchdog: diag288_wdt: Remove KMSG_COMPONENT macro s390/entry: Use lay instead of aghik s390/vdso: Get rid of -m64 flag handling s390/vdso: Rename vdso64 to vdso s390: Rename head64.S to head.S s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros s390: Add stackprotector support s390/modules: Simplify module_finalize() slightly s390: Remove KMSG_COMPONENT macro s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU s390/ap: Restrict driver_override versus apmask and aqmask use s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex s390/ap: Support driver_override for AP queue devices s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks s390/debug: Update description of resize operation s390/syscalls: Switch to generic system call table generation s390/syscalls: Remove system call table pointer from thread_struct s390/uapi: Remove 31 bit support from uapi header files s390: Remove compat support tools: Remove s390 compat support ...	2025-12-02 16:37:00 -08:00
Linus Torvalds	4a21d1b33f	Merge tag 'm68k-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k update from Geert Uytterhoeven: - defconfig update * tag 'm68k-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: defconfig: Update defconfigs for v6.18-rc1	2025-12-02 16:32:02 -08:00
Linus Torvalds	d61f1cc5db	Merge tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU feature updates from Dave Hansen: "The biggest thing of note here is Linear Address Space Separation (LASS). It represents the first time I can think of that the upper=>kernel/lower=>user address space convention is actually recognized by the hardware on x86. It ensures that userspace can not even get the hardware to _start_ page walks for the kernel address space. This, of course, is a really nice generic side channel defense. This is really only a down payment on LASS support. There are still some details to work out in its interaction with EFI calls and vsyscall emulation. For now, LASS is disabled if either of those features is compiled in (which is almost always the case). There's also one straggler commit in here which converts an under-utilized AMD CPU feature leaf into a generic Linux-defined leaf so more feature can be packed in there. Summary: - Enable Linear Address Space Separation (LASS) - Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined" * tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Enable LASS during CPU initialization selftests/x86: Update the negative vsyscall tests to expect a #GP x86/traps: Communicate a LASS violation in #GP message x86/kexec: Disable LASS during relocate kernel x86/alternatives: Disable LASS when patching kernel code x86/asm: Introduce inline memcpy and memset x86/cpu: Add an LASS dependency on SMAP x86/cpufeatures: Enumerate the LASS feature bits x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific	2025-12-02 14:48:08 -08:00
Linus Torvalds	a7610b8465	Merge tag 'x86_entry_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry update from Dave Hansen: "This one is pretty trivial: fix a badly-named FRED data structure member" * tag 'x86_entry_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fred: Fix 64bit identifier in fred_ss	2025-12-02 14:24:21 -08:00
Linus Torvalds	e2aa39b368	Merge tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Dave Hansen: "The most significant are some changes to ensure that symbols exported for KVM are used only by KVM modules themselves, along with some related cleanups. In true x86/misc fashion, the other patch is completely unrelated and just enhances an existing pr_warn() to make it clear to users how they have tainted their kernel when something is mucking with MSRs. Summary: - Make MSR-induced taint easier for users to track down - Restrict KVM-specific exports to KVM itself" * tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs" x86/mtrr: Drop unnecessary export of "mtrr_state" x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base" x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)	2025-12-02 14:16:42 -08:00
Linus Torvalds	54de197c9a	Merge tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX updates from Dave Hansen: "The main content here is adding support for the new EUPDATESVN SGX ISA. Before this, folks who updated microcode had to reboot before enclaves could attest to the new microcode. The new functionality lets them do this without a reboot. The rest are some nice, but relatively mundane comment and kernel-doc fixups. Summary: - Allow security version (SVN) updates so enclaves can attest to new microcode - Fix kernel docs typos" * tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Fix a typo in the kernel-doc comment for enum sgx_attribute x86/sgx: Remove superfluous asterisk from copyright comment in asm/sgx.h x86/sgx: Document structs and enums with '@', not '%' x86/sgx: Add kernel-doc descriptions for params passed to vDSO user handler x86/sgx: Add a missing colon in kernel-doc markup for "struct sgx_enclave_run" x86/sgx: Enable automatic SVN updates for SGX enclaves x86/sgx: Implement ENCLS[EUPDATESVN] x86/sgx: Define error codes for use by ENCLS[EUPDATESVN] x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag x86/sgx: Introduce functions to count the sgx_(vepc_)open()	2025-12-02 14:03:05 -08:00
Linus Torvalds	c76431e3b5	Merge tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Borislav Petkov: - Use the proper accessors when reading CR3 as part of the page level transitions (5-level to 4-level, the use case being kexec) so that only the physical address in CR3 is picked up and not flags which are above the physical mask shift - Clean up and unify __phys_addr_symbol() definitions * tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Fix page table access in 5-level to 4-level paging transition x86/boot: Fix page table access in 5-level to 4-level paging transition x86/mm: Unify __phys_addr_symbol()	2025-12-02 13:32:52 -08:00
Linus Torvalds	a9a10e920e	Merge tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU mitigation updates from Borislav Petkov: - Convert the tsx= cmdline parsing to use early_param() - Cleanup forward declarations gunk in bugs.c * tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Get rid of the forward declarations x86/tsx: Get the tsx= command line parameter with early_param() x86/tsx: Make tsx_ctrl_state static	2025-12-02 13:27:09 -08:00
Linus Torvalds	cb502f0e5e	Merge tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Largely cleanups along with a change to save XSS to the GHCB (Guest-Host Communication Block) in SEV-ES guests so that the hypervisor can determine the guest's XSAVES buffer size properly and thus support shadow stacks in AMD confidential guests * tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cc: Fix enum spelling to fix kernel-doc warnings x86/boot: Drop unused sev_enable() fallback x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled() x86/sev: Include XSS value in GHCB CPUID request x86/boot: Move boot_*msr helpers to asm/shared/msr.h	2025-12-02 13:07:53 -08:00
Linus Torvalds	d748981834	Merge tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - The mandatory pile of cleanups the cat drags in every merge window * tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Clean up whitespace in a20.c x86/mm: Delete disabled debug code x86/{boot,mtrr}: Remove unused function declarations x86/percpu: Use BIT_WORD() and BIT_MASK() macros x86/cpufeatures: Correct LKGS feature flag description x86/idtentry: Add missing '*' to kernel-doc lines	2025-12-02 12:17:47 -08:00
Linus Torvalds	2ae20d6510	Merge tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add support for AMD's Smart Data Cache Injection feature which allows for direct insertion of data from I/O devices into the L3 cache, thus bypassing DRAM and saving its bandwidth; the resctrl side of the feature allows the size of the L3 used for data injection to be controlled - Add Intel Clearwater Forest to the list of CPUs which support Sub-NUMA clustering - Other fixes and cleanups * tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Update bit_usage to reflect io_alloc fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID fs/resctrl: Introduce interface to display io_alloc CBMs fs/resctrl: Add user interface to enable/disable io_alloc feature fs/resctrl: Introduce interface to display "io_alloc" support x86,fs/resctrl: Implement "io_alloc" enable/disable handlers x86,fs/resctrl: Detect io_alloc feature x86/resctrl: Add SDCIAE feature in the command line options x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement fs/resctrl: Consider sparse masks when initializing new group's allocation x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest	2025-12-02 11:55:58 -08:00
Linus Torvalds	2a47c26e55	Merge tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Borislav Petkov: - Add microcode staging support on Intel: it moves the sole microcode blobs loading to a non-critical path so that microcode loading latencies are kept at minimum. The actual "directing" the hardware to load microcode is the only step which is done on the critical path. This scheme is also opportunistic as in: on a failure, the machinery falls back to normal loading - Add the capability to the AMD side of the loader to select one of two per-family/model/stepping patches: one is pre-Entrysign and the other is post-Entrysign; with the goal to take care of machines which haven't updated their BIOS yet - something they should absolutely do as this is the only proper Entrysign fix - Other small cleanups and fixlets * tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Mark early_parse_cmdline() as __init x86/microcode/AMD: Select which microcode patch to load x86/microcode/intel: Enable staging when available x86/microcode/intel: Support mailbox transfer x86/microcode/intel: Implement staging handler x86/microcode/intel: Define staging state struct x86/microcode/intel: Establish staging control logic x86/microcode: Introduce staging step to reduce late-loading time x86/cpu/topology: Make primary thread mask available with SMP=n	2025-12-02 11:35:49 -08:00
Linus Torvalds	a61288200e	Merge tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - The second part of the AMD MCA interrupts rework after the last-minute show-stopper from the last merge window was sorted out. After this, the AMD MCA deferred errors, thresholding and corrected errors interrupt handlers use common MCA code and are tightly integrated into the core MCA code, thereby getting rid of considerable duplication. All culminating into allowing CMCI error thresholding storms to be detected at AMD too, using the common infrastructure - Add support for two new MCA bank bits on AMD Zen6 which denote whether the error address logged is a system physical address, which obviates the need for it to be translated before further error recovery can be done * tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Handle AMD threshold interrupt storms x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systems x86/mce: Add support for physical address valid bit x86/mce: Save and use APEI corrected threshold limit x86/mce/amd: Define threshold restart function for banks x86/mce/amd: Remove redundant reset_block() x86/mce/amd: Support SMCA Corrected Error Interrupt x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems x86/mce: Unify AMD DFR handler with MCA Polling x86/mce: Unify AMD THR handler with MCA Polling	2025-12-02 11:04:37 -08:00
Linus Torvalds	49219bba01	Merge tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - imh_edac: Add a new EDAC driver for Intel Diamond Rapids and future incarnations of this memory controllers architecture - amd64_edac: Remove the legacy csrow sysfs interface which has been deprecated and unused (we assume) for at least a decade - Add the capability to fallback to BIOS-provided address translation functionality (ACPI PRM) which can be used on systems unsupported by the current AMD address translation library - The usual fixes, fixlets, cleanups and improvements all over the place * tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: RAS/AMD/ATL: Replace bitwise_xor_bits() with hweight16() EDAC/igen6: Fix error handling in igen6_edac driver EDAC/imh: Setup 'imh_test' debugfs testing node EDAC/{skx_comm,imh}: Detect 2-level memory configuration EDAC/skx_common: Extend the maximum number of DRAM chip row bits EDAC/{skx_common,imh}: Add EDAC driver for Intel Diamond Rapids servers EDAC/skx_common: Prepare for skx_set_hi_lo() EDAC/skx_common: Prepare for skx_get_edac_list() EDAC/{skx_common,skx,i10nm}: Make skx_register_mci() independent of pci_dev EDAC/ghes: Replace deprecated strcpy() in ghes_edac_report_mem_error() EDAC/ie31200: Fix error handling in ie31200_register_mci RAS/CEC: Replace use of system_wq with system_percpu_wq EDAC: Remove the legacy EDAC sysfs interface EDAC/amd64: Remove NUM_CONTROLLERS macro EDAC/amd64: Generate ctl_name string at runtime RAS/AMD/ATL: Require PRM support for future systems ACPI: PRM: Add acpi_prm_handler_available() RAS/AMD/ATL: Return error codes from helper functions	2025-12-02 10:45:50 -08:00
Linus Torvalds	7f8d5f70ff	Merge tag 'core-core-2025-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core irq cleanup from Thomas Gleixner: "Tree wide cleanup of the remaining users of in_irq() which got replaced by in_hardirq() and marked deprecated in 2020" * tag 'core-core-2025-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide: Remove in_irq()	2025-12-02 10:18:49 -08:00
Linus Torvalds	d42e504a55	Merge tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: - Prevent a thundering herd problem when the timekeeper CPU is delayed and a large number of CPUs compete to acquire jiffies_lock to do the update. Limit it to one CPU with a separate "uncontended" atomic variable. - A set of improvements for the timer migration mechanism: - Support imbalanced NUMA trees correctly - Support dynamic exclusion of CPUs from the migrator duty to allow the cpuset/isolation mechanism to exclude them from handling timers of remote idle CPUs - The usual small updates, cleanups and enhancements * tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/migration: Exclude isolated cpus from hierarchy cpumask: Add initialiser to use cleanup helpers sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() timers/migration: Use scoped_guard on available flag set/clear timers/migration: Add mask for CPUs available in the hierarchy timers/migration: Rename 'online' bit to 'available' selftests/timers/nanosleep: Add tests for return of remaining time selftests/timers: Clean up kernel version check in posix_timers time: Fix a few typos in time[r] related code comments time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc hrtimer: Store time as ktime_t in restart block timers/migration: Remove dead code handling idle CPU checking for remote timers timers/migration: Remove unused "cpu" parameter from tmigr_get_group() timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy timers/migration: Fix imbalanced NUMA trees timers/migration: Remove locking on group connection timers/migration: Convert "while" loops to use "for" tick/sched: Limit non-timekeeper CPUs calling jiffies update	2025-12-02 09:58:33 -08:00
Linus Torvalds	5028f42416	Merge tag 'timers-clocksource-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull clocksource updates from Thomas Gleixner: "Updates for clocksource and clockevent drivers: - A new driver for the Realtek system timer - Prevent the unbinding of timers when the drivers do not support that - Expand the timer counter readout for the SPRD driver to 64 bit to allow IOT devices suspend times of more than 36 hours, which is the current limit of the 32-bit readout - The usual small cleanups, fixes and enhancements all over the place" * tag 'timers-clocksource-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers: Add Realtek system timer driver dt-bindings: timer: Add Realtek SYSTIMER clocksource/drivers/stm32-lp: Drop unused module alias clocksource/drivers/rda: Add sched_clock_register for RDA8810PL SoC clocksource/drivers/nxp-stm: Prevent driver unbind clocksource/drivers/nxp-pit: Prevent driver unbind clocksource/drivers/arm_arch_timer_mmio: Prevent driver unbind clocksource/drivers/nxp-stm: Fix section mismatches clocksource/drivers/sh_cmt: Always leave device running after probe clocksource/drivers/stm: Fix double deregistration on probe failure clocksource/drivers/ralink: Fix resource leaks in init error path clocksource/drivers/timer-sp804: Fix read_current_timer() issue when clock source is not registered clocksource/drivers/sprd: Enable register for timer counter from 32 bit to 64 bit	2025-12-02 09:54:27 -08:00
Linus Torvalds	9ce62ebbb7	Merge tag 'irq-msi-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MSI updates from Thomas Gleixner: "Updates for [PCI] MSI related code: - Remove one variant of PCI/MSI management as all users have been converted to use per device domains. That reduces the variants to two: The modern and the real archaic legacy variant, which keeps the usual suspects in the museum category alive. - Rework the platform MSI device ID detection mechanism in the ARM GIC world to address resource leaks, duplicated code and other details. This requires a corresponding preparatory step in the PCI/iproc driver. - Trivial core code cleanups" * tag 'irq-msi-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-its: Rework platform MSI deviceID detection PCI: iproc: Implement MSI controller node detection with of_msi_xlate() genirq/msi: Slightly simplify msi_domain_alloc() PCI/MSI: Delete pci_msi_create_irq_domain()	2025-12-02 09:35:59 -08:00
Linus Torvalds	15b87bec89	Merge tag 'irq-drivers-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq driver updates from Thomas Gleixner: "Boring updates for interrupt drivers: - Support for a couple of new ARM64 and RISCV SoC variants and their magic interrupt controllers which either can reuse existing code or require quirks due to a botched hardware implementation - More section mismatch fixes - The usual cleanups and fixes all over the place" * tag 'irq-drivers-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) irqchip/meson-gpio: Add support for Amlogic S6 S7 and S7D SoCs dt-bindings: interrupt-controller: Add support for Amlogic S6 S7 and S7D SoCs dt-bindings: interrupt-controller: aspeed,ast2700: Correct #interrupt-cells and interrupts count irqchip/aclint-sswi: Add Nuclei UX900 support dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT SSWI dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT MSWI dt-bindings: interrupt-controller: Add Anlogic DR1V90 PLIC irqchip/irq-bcm7038-l1: Remove unused reg_mask_status() irqchip/sifive-plic: Fix call to __plic_toggle() in M-Mode code path irqchip/sifive-plic: Add support for UltraRISC DP1000 PLIC irqchip/sifive-plic: Cache the interrupt enable state dt-bindings: interrupt-controller: Add UltraRISC DP1000 PLIC dt-bindings: vendor-prefixes: Add UltraRISC irqchip/qcom-irq-combiner: Rename driver structure irqchip/riscv-imsic: Inline imsic_vector_from_local_id() irqchip/riscv-imsic: Embed the vector array in lpriv irqchip/riscv-imsic: Remove redundant irq_data lookups irqchip/ts4800: Drop unused module alias irqchip/mvebu-pic: Drop unused module alias irqchip/meson-gpio: Drop unused module alias ...	2025-12-02 09:32:53 -08:00
Linus Torvalds	6863c8385c	Merge tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core updates from Thomas Gleixner: "Updates for the interrupt core and treewide cleanups: - Rework of the Per Processor Interrupt (PPI) management on ARM[64] PPI support was built under the assumption that the systems are homogenous so that the same CPU local device types are connected to them. That's unfortunately wishful thinking and created horrible workarounds. This rework provides affinity management for PPIs so that they can be individually configured in the firmware tables and mops up the related drivers all over the place. - Prevent CPUSET/isolation changes to arbitrarily affine interrupt threads to random CPUs, which ignores user or driver settings. - Plug a harmless race in the interrupt affinity proc interface, which allows to see a half updated mask - Adjust the priority of secondary interrupt threads on RT, so that the combination of primary and secondary thread emulates the hardware interrupt plus thread scenario. Having them at the same priority can cause starvation issues in some drivers" * tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) genirq: Remove cpumask availability check on kthread affinity setting genirq: Fix interrupt threads affinity vs. cpuset isolated partitions genirq: Prevent early spurious wake-ups of interrupt threads genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier() genirq/manage: Reduce priority of forced secondary interrupt handler genirq/proc: Fix race in show_irq_affinity() genirq: Fix percpu_devid irq affinity documentation perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer irqdomain: Kill of_node_to_fwnode() helper genirq: Kill irq_{g,s}et_percpu_devid_partition() irqchip: Kill irq-partition-percpu irqchip/apple-aic: Drop support for custom PMU irq partitions irqchip/gic-v3: Drop support for custom PPI partitions coresight: trbe: Request specific affinities for per CPU interrupts perf: arm_spe_pmu: Request specific affinities for per CPU interrupts perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts genirq: Add request_percpu_irq_affinity() helper genirq: Allow per-cpu interrupt sharing for non-overlapping affinities genirq: Update request_percpu_nmi() to take an affinity genirq: Add affinity to percpu_devid interrupt requests ...	2025-12-02 09:14:26 -08:00
Linus Torvalds	312f5b1866	Merge tag 'core-debugobjects-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobjects update from Thomas Gleixner: "Two small updates for debugobjects: - Allow pool refill on RT enabled kernels before the scheduler is up and running to prevent pool exhaustion - Correct the lockdep override to prevent false positives" * tag 'core-debugobjects-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING	2025-12-02 09:07:48 -08:00
Linus Torvalds	2b09f480f0	Merge tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq updates from Thomas Gleixner: "A large overhaul of the restartable sequences and CID management: The recent enablement of RSEQ in glibc resulted in regressions which are caused by the related overhead. It turned out that the decision to invoke the exit to user work was not really a decision. More or less each context switch caused that. There is a long list of small issues which sums up nicely and results in a 3-4% regression in I/O benchmarks. The other detail which caused issues due to extra work in context switch and task migration is the CID (memory context ID) management. It also requires to use a task work to consolidate the CID space, which is executed in the context of an arbitrary task and results in sporadic uncontrolled exit latencies. The rewrite addresses this by: - Removing deprecated and long unsupported functionality - Moving the related data into dedicated data structures which are optimized for fast path processing. - Caching values so actual decisions can be made - Replacing the current implementation with a optimized inlined variant. - Separating fast and slow path for architectures which use the generic entry code, so that only fault and error handling goes into the TIF_NOTIFY_RESUME handler. - Rewriting the CID management so that it becomes mostly invisible in the context switch path. That moves the work of switching modes into the fork/exit path, which is a reasonable tradeoff. That work is only required when a process creates more threads than the cpuset it is allowed to run on or when enough threads exit after that. An artificial thread pool benchmarks which triggers this did not degrade, it actually improved significantly. The main effect in migration heavy scenarios is that runqueue lock held time and therefore contention goes down significantly" * tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/mmcid: Switch over to the new mechanism sched/mmcid: Implement deferred mode change irqwork: Move data struct to a types header sched/mmcid: Provide CID ownership mode fixup functions sched/mmcid: Provide new scheduler CID mechanism sched/mmcid: Introduce per task/CPU ownership infrastructure sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex sched/mmcid: Provide precomputed maximal value sched/mmcid: Move initialization out of line signal: Move MMCID exit out of sighand lock sched/mmcid: Convert mm CID mask to a bitmap cpumask: Cache num_possible_cpus() sched/mmcid: Use cpumask_weighted_or() cpumask: Introduce cpumask_weighted_or() sched/mmcid: Prevent pointless work in mm_update_cpus_allowed() sched/mmcid: Move scheduler code out of global header sched: Fixup whitespace damage sched/mmcid: Cacheline align MM CID storage sched/mmcid: Use proper data structures sched/mmcid: Revert the complex CID management ...	2025-12-02 08:48:53 -08:00
Linus Torvalds	1dce50698a	Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scoped user access updates from Thomas Gleixner: "Scoped user mode access and related changes: - Implement the missing u64 user access function on ARM when CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in generic code with [unsafe_]get_user(). All other architectures and ARM variants provide the relevant accessors already. - Ensure that ASM GOTO jump label usage in the user mode access helpers always goes through a local C scope label indirection inside the helpers. This is required because compilers are not supporting that a ASM GOTO target leaves a auto cleanup scope. GCC silently fails to emit the cleanup invocation and CLANG fails the build. [ Editor's note: gcc-16 will have fixed the code generation issue in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm goto jumps [PR122835]"). But we obviously have to deal with clang and older versions of gcc, so.. - Linus ] This provides generic wrapper macros and the conversion of affected architecture code to use them. - Scoped user mode access with auto cleanup Access to user mode memory can be required in hot code paths, but if it has to be done with user controlled pointers, the access is shielded with a speculation barrier, so that the CPU cannot speculate around the address range check. Those speculation barriers impact performance quite significantly. This cost can be avoided by "masking" the provided pointer so it is guaranteed to be in the valid user memory access range and otherwise to point to a guaranteed unpopulated address space. This has to be done without branches so it creates an address dependency for the access, which the CPU cannot speculate ahead. 
This results in repeating and error prone programming patterns: if (can_do_masked_user_access()) from = masked_user_read_access_begin((from)); else if (!user_read_access_begin(from, sizeof(from))) return -EFAULT; unsafe_get_user(val, from, Efault); user_read_access_end(); return 0; Efault: user_read_access_end(); return -EFAULT; which can be replaced with scopes and automatic cleanup: scoped_user_read_access(from, Efault) unsafe_get_user(val, from, Efault); return 0; Efault: return -EFAULT; - Convert code which implements the above pattern over to scope_user.*.access(). This also corrects a couple of imbalanced masked_*_begin() instances which are harmless on most architectures, but prevent PowerPC from implementing the masking optimization. - Add a missing speculation barrier in copy_from_user_iter()" * tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lib/strn,uaccess: Use masked_user_{read/write}_access_begin when required scm: Convert put_cmsg() to scoped user access iov_iter: Add missing speculation barrier to copy_from_user_iter() iov_iter: Convert copy_from_user_iter() to masked user access select: Convert to scoped user access x86/futex: Convert to scoped user access futex: Convert to get/put_user_inline() uaccess: Provide put/get_user_inline() uaccess: Provide scoped user access regions arm64: uaccess: Use unsafe wrappers for ASM GOTO s390/uaccess: Use unsafe wrappers for ASM GOTO riscv/uaccess: Use unsafe wrappers for ASM GOTO powerpc/uaccess: Use unsafe wrappers for ASM GOTO x86/uaccess: Use unsafe wrappers for ASM GOTO uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user() ARM: uaccess: Implement missing __get_user_asm_dword()	2025-12-02 08:01:39 -08:00
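The scope-with-automatic-cleanup idea described in the entry above can be illustrated outside the kernel. The following is only a userspace sketch, assuming a GCC or Clang compiler that supports the `cleanup` variable attribute; `scoped_access()`, `access_end()`, `access_end_calls` and `read_val()` are hypothetical names invented for this illustration and are not the kernel's `scoped_user_read_access()` implementation. It uses a plain `goto`, not ASM GOTO, so it does not reproduce the compiler issue mentioned in the pull message.

```c
#include <stddef.h>

/* Counts how often the stand-in "access end" hook ran (illustration only). */
static int access_end_calls;

/* Cleanup hook: the compiler calls this when the guard variable leaves scope. */
static void access_end(int *guard)
{
    (void)guard;
    access_end_calls++;
}

/*
 * Hypothetical macro: opens a scope that runs its body exactly once and
 * always invokes access_end() on exit, whether the body falls through
 * or jumps out with a plain goto.
 */
#define scoped_access() \
    for (int done __attribute__((cleanup(access_end))) = 0; !done; done = 1)

/* Reads *from into *val; returns 0 on success, -1 on a NULL source. */
static int read_val(const int *from, int *val)
{
    scoped_access() {
        if (from == NULL)
            goto Efault;
        *val = *from;
    }
    return 0;
Efault:
    return -1;
}
```

In this sketch both the success path and the `Efault` path leave the scope, so the cleanup hook runs once per call without an explicit "end" call on either path, which is the repetition the scoped form is meant to remove.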
Borislav Petkov (AMD)	e2349c5811	Merge remote-tracking branches 'ras/edac-amd-atl', 'ras/edac-drivers' and 'ras/edac-misc' into edac-updates Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2025-12-01 12:06:08 +01:00
Harry Fellowes	d911fe6e94	x86/boot: Clean up whitespace in a20.c Remove trailing whitespace on empty lines. No functional changes. [ bp: Massage commit message. ] Signed-off-by: Harry Fellowes <harryfellowes1@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20250825192832.6444-3-harryfellowes1@gmail.com	2025-11-28 20:29:52 +01:00
Catalin Marinas	edde060637	Merge branch 'for-next/set_memory' into for-next/core * for-next/set_memory: : Fix + documentation for the arm64 change_memory_common() arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common	2025-11-28 15:48:03 +00:00
Catalin Marinas	52c4d1d624	Merge branch 'for-next/sysreg' into for-next/core * for-next/sysreg: : arm64 sysreg updates/cleanups arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64/sysreg: Add ICH_VMCR_EL2 arm64/sysreg: Move generation of RES0/RES1/UNKN to function arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor arm64/sysreg: Fix checks for incomplete sysreg definitions arm64/sysreg: Replace TCR_EL1 field macros	2025-11-28 15:47:53 +00:00
Catalin Marinas	17c05cb0ef	Merge branches 'for-next/misc', 'for-next/kselftest', 'for-next/efi-preempt', 'for-next/assembler-macro', 'for-next/typos', 'for-next/sme-ptrace-disable', 'for-next/local-tlbi-page-reused', 'for-next/mpam', 'for-next/acpi' and 'for-next/documentation', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY perf/arm-ni: Fix and optimise register offset calculation perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister() perf/arm-ni: Add NoC S3 support perf/arm_cspmu: nvidia: Add pmevfiltr2 support perf/arm_cspmu: nvidia: Add revision id matching perf/arm_cspmu: Add pmpidr support perf/arm_cspmu: Add callback to reset filter config perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores * for-next/misc: : Miscellaneous patches arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index arm64: mm: make linear mapping permission update more robust for patial range arm64/mm: Elide TLB flush in certain pte protection transitions arm64/mm: Rename try_pgd_pgtable_alloc_init_mm arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors arm64: add unlikely hint to MTE async fault check in el0_svc_common arm64: acpi: add newline to deferred APEI warning arm64: entry: Clean out some indirection arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52 arm64/mm: Drop 
cpu_set_[default|idmap]_tcr_t0sz() arm64: remove unused ARCH_PFN_OFFSET arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack arm64: Remove assertion on CONFIG_VMAP_STACK * for-next/kselftest: : arm64 kselftest patches kselftest/arm64: Align zt-test register dumps * for-next/efi-preempt: : arm64: Make EFI calls preemptible arm64/efi: Call EFI runtime services without disabling preemption arm64/efi: Move uaccess en/disable out of efi_set_pgd() arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper arm64/fpsimd: Permit kernel mode NEON with IRQs off arm64/fpsimd: Don't warn when EFI execution context is preemptible efi/runtime-wrappers: Keep track of the efi_runtime_lock owner efi: Add missing static initializer for efi_mm::cpus_allowed_lock * for-next/assembler-macro: : arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers * for-next/typos: : Random typo/spelling fixes arm64: Fix double word in comments arm64: Fix typos and spelling errors in comments * for-next/sme-ptrace-disable: : Support disabling streaming mode via ptrace on SME only systems kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems arm64/sme: Support disabling streaming mode via ptrace on SME only systems * for-next/local-tlbi-page-reused: : arm64, mm: avoid TLBI broadcast if page reused in write fault arm64, tlbflush: don't TLBI broadcast if page reused in write fault mm: add spurious fault fixing support for huge pmd * for-next/mpam: (34 commits) : Basic Arm MPAM driver (more to follow) MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state arm_mpam: Use long MBWU counters if supported arm_mpam: Probe for long/lwd mbwu counters arm_mpam: Consider 
overflow in bandwidth counter state arm_mpam: Track bandwidth counter state for power management arm_mpam: Add mpam_msmon_read() to read monitor value arm_mpam: Add helpers to allocate monitors arm_mpam: Probe and reset the rest of the features arm_mpam: Allow configuration to be applied and restored during cpu online arm_mpam: Use a static key to indicate when mpam is enabled arm_mpam: Register and enable IRQs arm_mpam: Extend reset logic to allow devices to be reset any time arm_mpam: Add a helper to touch an MSC from any CPU arm_mpam: Reset MSC controls from cpuhp callbacks arm_mpam: Merge supported features during mpam_enable() into mpam_class arm_mpam: Probe the hardware features resctrl supports arm_mpam: Add helpers for managing the locking around the mon_sel registers ... * for-next/acpi: : arm64 acpi updates ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() * for-next/documentation: : arm64 Documentation updates Documentation/arm64: Fix the typo of register names	2025-11-28 15:47:12 +00:00
Dev Jain	0c2988aaa4	arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic Consider the following code path: (1) vmalloc -> (2) set_vm_flush_reset_perms -> (3) set_memory_ro/set_memory_rox -> .... (4) use the mapping .... -> (5) vfree -> (6) vm_reset_perms -> (7) set_area_direct_map. Or, it may happen that we encounter failure at (3) and directly jump to (5). In both cases, (7) may fail due to linear map split failure. But, we care about its success only for the region which got successfully changed by (3). Such a region is guaranteed to be pte-mapped. The TLDR is that (7) will surely succeed for the regions we care about. Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-28 15:36:40 +00:00
Dev Jain	e5efd56fa1	arm64/pageattr: Propagate return value from __change_memory_common The rodata=on security measure requires that any code path which does vmalloc -> set_memory_ro/set_memory_rox must protect the linear map alias too. Therefore, if such a call fails, we must abort set_memory_* and caller must take appropriate action; currently we are suppressing the error, and there is a real chance of such an error arising post commit `a166563e7e` ("arm64: mm: support large block mapping when rodata=full"). Therefore, propagate any error to the caller. Fixes: `a166563e7e` ("arm64: mm: support large block mapping when rodata=full") Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-28 15:36:40 +00:00
Ben Horgan	27abb1ee5a	arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS The define ARM64_FEATURE_FIELD_BITS is now unused and feature id fields don't always have 4 bits. Remove it. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-27 18:17:59 +00:00
Ben Horgan	4138cc63d3	KVM: arm64: selftests: Consider all 7 possible levels of cache In test_clidr() if an empty cache level is not found then the TEST_ASSERT will not fire. Fix this by considering all 7 possible levels when iterating through the hierarchy. Found by inspection. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-27 18:16:46 +00:00
Ben Horgan	bf09ee9180	KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user ARM64_FEATURE_FIELD_BITS is set to 4 but not all ID register fields are 4 bits. See for instance ID_AA64SMFR0_EL1. The last user of this define, ARM64_FEATURE_FIELD_BITS, is the set_id_regs selftest. Its logic assumes the fields aren't single bits; assert that's the case and stop using the define. As there are no more users, ARM64_FEATURE_FIELD_BITS is removed from the arm64 tools sysreg.h header. A separate commit removes this from the kernel version of the header. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-27 18:16:46 +00:00
Seongsu Park	c86d9f8764	arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros The ATOMIC_FETCH_OP_AND and ATOMIC64_FETCH_OP_AND macros accept 'mb' and 'cl' parameters but never use them in their implementation. These macros simply delegate to the corresponding andnot functions, which handle the actual atomic operations and memory barriers. Signed-off-by: Seongsu Park <sgsu.park@samsung.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-27 18:15:24 +00:00
Sebastian Andrzej Siewior	37de2dbc31	debugobjects: Use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP fill_pool_map is used to suppress nesting violations caused by acquiring a spinlock_t (from within the memory allocator) while holding a raw_spinlock_t. The used annotation is wrong. LD_WAIT_SLEEP is for always sleeping lock types such as mutex_t. LD_WAIT_CONFIG is for lock type which are sleeping while spinning on PREEMPT_RT such as spinlock_t. Use LD_WAIT_CONFIG as override. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251127153652.291697-3-bigeasy@linutronix.de	2025-11-27 16:55:34 +01:00
Sebastian Andrzej Siewior	06e0ae988f	debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING The pool of free objects is refilled on several occasions such as object initialisation. On PREEMPT_RT refilling is limited to preemptible sections due to sleeping locks used by the memory allocator. The system boots with disabled interrupts so the pool can not be refilled. If too many objects are initialized and the pool gets empty then debugobjects disables itself. Refilling can also happen early in the boot with disabled interrupts as long as the scheduler is not operational. If the scheduler can not preempt a task then a sleeping lock can not be contended. Allow to additionally refill the pool if the scheduler is not operational. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251127153652.291697-2-bigeasy@linutronix.de	2025-11-27 16:55:34 +01:00
Brendan Jackman	3d1f108845	x86/mm: Delete disabled debug code This code doesn't run. Since 2008: `4f9c11dd49` ("x86, 64-bit: adjust mapping of physical pagetables to work with Xen") the kernel has gained more flexible logging and tracing capabilities; presumably if anyone wanted to take advantage of this log message they would have got rid of the "if (0)" so they could use these capabilities. Since they haven't, just delete it. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251003-x86-init-cleanup-v1-1-f2b7994c2ad6@google.com	2025-11-27 14:32:16 +01:00
Heiko Carstens	283f90b50d	watchdog: diag288_wdt: Remove KMSG_COMPONENT macro The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message catalog" from 2008 [1] which never made it upstream. The macro was added to s390 code to allow for an out-of-tree patch which used this to generate unique message ids. Also this out-of-tree doesn't exist anymore. Remove the macro in order to get rid of a pointless indirection. [1] https://lwn.net/Articles/292650/ Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-26 17:34:52 +01:00
Thomas Gleixner	2437f79880	Merge tag 'timers-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/daniel.lezcano/linux into timers/clocksource Pull clocksource/event changes from Daniel Lezcano: - Use 64-bits for timer compensation for IoT usage where the suspend time is much longer than what 32-bits can provide (Enlin Mu) - Add delay support on sp804 for ARM32 platforms (Stephen Eta Zhou) - Fix missing resource release on error in the probe path of the ralink driver (Haotian Zhang) - Fix double deregistration on probe failure in the NXP STM driver (Johan Hovold) - Disable runtime PM for the Renesas SH CMT timer because it is incompatible with PREEMPT_RT=y (Niklas Söderlund) - Fix section mismatches in the NXP STM driver (Johan Hovold) - Prevent unbinding the NXP PIT, STM and MMIO ARM Arch timers as the code does not support bind/unbind (Johan Hovold) - Use the clocksource instead of ticks on the RDA8810PL platform (Enlin Mu) - Drop the unused module alias for the STM32-LP (Johan Hovold) - Add Realtek system timer driver (Hao-Wen Ting) Link: https://lore.kernel.org/all/9303b790-28d4-4bd9-b01d-28fb05493596@linaro.org	2025-11-26 15:36:52 +01:00
Heiko Carstens	1c93edfd50	s390/entry: Use lay instead of aghik Use the lay instruction instead of aghik. aghik is only available since z196, therefore compiling the kernel for z10 results in this error: arch/s390/kernel/entry.S: Assembler messages: arch/s390/kernel/entry.S:165: Error: Unrecognized opcode: `aghik' Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511261518.nBbQN5h7-lkp@intel.com/ Fixes: `f5730d44e0` ("s390: Add stackprotector support") Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-26 12:28:23 +01:00
Hao-Wen Ting	d1780dce95	clocksource/drivers: Add Realtek system timer driver Add a system timer driver for Realtek SoCs. This driver registers the 1 MHz global hardware counter on Realtek platforms as a clock event device. Since this hardware counter starts counting automatically after SoC power-on, no clock initialization is required. Because the counter does not stop or get affected by CPU power down, and it supports oneshot mode, it is typically used as a tick broadcast timer. Signed-off-by: Hao-Wen Ting <haowen.ting@realtek.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251126060110.198330-3-haowen.ting@realtek.com	2025-11-26 11:25:15 +01:00
Hao-Wen Ting	40caba2bd0	dt-bindings: timer: Add Realtek SYSTIMER The Realtek SYSTIMER (System Timer) is a 64-bit global hardware counter operating at a fixed 1MHz frequency. Thanks to its compare match interrupt capability, the timer natively supports oneshot mode for tick broadcast functionality. Signed-off-by: Hao-Wen Ting <haowen.ting@realtek.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://patch.msgid.link/20251126060110.198330-2-haowen.ting@realtek.com	2025-11-26 11:25:15 +01:00
Johan Hovold	ed92a968a9	clocksource/drivers/stm32-lp: Drop unused module alias The driver cannot be built as a module so drop the unused platform module alias. Note that platform aliases are not needed for OF probing should it ever become possible to build the driver as a module. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251111154516.1698-1-johan@kernel.org	2025-11-26 11:25:15 +01:00
Enlin Mu	627f3f3716	clocksource/drivers/rda: Add sched_clock_register for RDA8810PL SoC The current system log timestamp accuracy is tick based, which can not meet the usage requirements and needs to reach nanoseconds. Therefore, the sched_clock_register function needs to be added. [ dlezcano: Fixed typos ] Signed-off-by: Enlin Mu <enlin.mu@unisoc.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251107063347.3692-1-enlin.mu@linux.dev	2025-11-26 11:25:11 +01:00
Johan Hovold	6a2416892e	clocksource/drivers/nxp-stm: Prevent driver unbind Clockevents cannot be deregistered so suppress the bind attributes to prevent the driver from being unbound and releasing the underlying resources after registration. Even if the driver can currently only be built-in, also switch to builtin_platform_driver() to prevent it from being unloaded should modular builds ever be enabled. Fixes: `cec32ac758` ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251111153226.579-4-johan@kernel.org	2025-11-26 11:25:03 +01:00
Johan Hovold	e25f964cf4	clocksource/drivers/nxp-pit: Prevent driver unbind The driver does not support unbinding (e.g. as clockevents cannot be deregistered) so suppress the bind attributes to prevent the driver from being unbound and rebound after registration (and disabling the timer when reprobing fails). Even if the driver can currently only be built-in, also switch to builtin_platform_driver() to prevent it from being unloaded should modular builds ever be enabled. Fixes: `bee33f22d7` ("clocksource/drivers/nxp-pit: Add NXP Automotive s32g2 / s32g3 support") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251111153226.579-3-johan@kernel.org	2025-11-26 11:24:57 +01:00
Johan Hovold	6aa10f0e2e	clocksource/drivers/arm_arch_timer_mmio: Prevent driver unbind Clockevents cannot be deregistered so suppress the bind attributes to prevent the driver from being unbound and releasing the underlying resources after registration. Fixes: `4891f01527` ("clocksource/drivers/arm_arch_timer: Add standalone MMIO driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20251111153226.579-2-johan@kernel.org	2025-11-26 11:24:47 +01:00
Johan Hovold	b452d2c97e	clocksource/drivers/nxp-stm: Fix section mismatches Platform drivers can be probed after their init sections have been discarded (e.g. on probe deferral or manual rebind through sysfs) so the probe function must not live in init. Device managed resource actions similarly cannot be discarded. The "_probe" suffix of the driver structure name prevents modpost from warning about this so replace it to catch any similar future issues. Fixes: `cec32ac758` ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: stable@vger.kernel.org # 6.16 Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251017054943.7195-1-johan@kernel.org	2025-11-26 11:24:44 +01:00
Niklas Söderlund	62524f285c	clocksource/drivers/sh_cmt: Always leave device running after probe The CMT device can be used as both a clocksource and a clockevent provider. The driver tries to be smart and power itself on and off, as well as enabling and disabling its clock when it's not in operation. This behavior is slightly altered if the CMT is used as an early platform device in which case the device is left powered on after probe, but the clock is still enabled and disabled at runtime. This has worked for a long time, but recent improvements in PREEMPT_RT and PROVE_LOCKING have highlighted an issue. As the CMT registers itself as a clockevent provider, clockevents_register_device(), it needs to use raw spinlocks internally as this is the context of which the clockevent framework interacts with the CMT driver. However in the context of holding a raw spinlock the CMT driver can't really manage its power state or clock with calls to pm_runtime_*() and clk_*() as these calls end up in other platform drivers using regular spinlocks to control power and clocks. This mix of spinlock contexts trips a lockdep warning. 
============================= [ BUG: Invalid wait context ] 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 Not tainted ----------------------------- swapper/1/0 is trying to lock: ffff00000898d180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88 ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0 other info that might help us debug this: ccree e6601000.crypto: ARM ccree device initialized context-{5:5} 2 locks held by swapper/1/0: #0: ffff80008173c298 (tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8 #1: ffff0000089a5858 (&ch->lock){....}-{2:2} usbcore: registered new interface driver usbhid , at: sh_cmt_start+0x30/0x364 stack backtrace: CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 PREEMPT Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT) Call trace: show_stack+0x14/0x1c (C) dump_stack_lvl+0x6c/0x90 dump_stack+0x14/0x1c __lock_acquire+0x904/0x1584 lock_acquire+0x220/0x34c _raw_spin_lock_irqsave+0x58/0x80 __pm_runtime_resume+0x38/0x88 sh_cmt_start+0x54/0x364 sh_cmt_clock_event_set_oneshot+0x64/0xb8 clockevents_switch_state+0xfc/0x13c tick_broadcast_set_event+0x30/0xa4 __tick_broadcast_oneshot_control+0x1e0/0x3a8 tick_broadcast_oneshot_control+0x30/0x40 cpuidle_enter_state+0x40c/0x680 cpuidle_enter+0x30/0x40 do_idle+0x1f4/0x26c cpu_startup_entry+0x34/0x40 secondary_start_kernel+0x11c/0x13c __secondary_switched+0x74/0x78 For non-PREEMPT_RT builds this is not really an issue, but for PREEMPT_RT builds where normal spinlocks can sleep this might be an issue. Be cautious and always leave the power and clock running after probe. 
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20251016182022.1837417-1-niklas.soderlund+renesas@ragnatech.se	2025-11-26 11:24:40 +01:00
Johan Hovold	6b38a8b31e	clocksource/drivers/stm: Fix double deregistration on probe failure The purpose of the devm_add_action_or_reset() helper is to call the action function in case adding an action ever fails so drop the clock source deregistration from the error path to avoid deregistering twice. Fixes: `cec32ac758` ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251017055039.7307-1-johan@kernel.org	2025-11-26 11:24:37 +01:00
Haotian Zhang	2ba8e2aae1	clocksource/drivers/ralink: Fix resource leaks in init error path The ralink_systick_init() function does not release all acquired resources on its error paths. If irq_of_parse_and_map() or a subsequent call fails, the previously created I/O memory mapping and IRQ mapping are leaked. Add goto-based error handling labels to ensure that all allocated resources are correctly freed. Fixes: `1f2acc5a8a` ("MIPS: ralink: Add support for systick timer found on newer ralink SoC") Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251030090710.1603-1-vulab@iscas.ac.cn	2025-11-26 11:24:34 +01:00
Stephen Eta Zhou	640594a04f	clocksource/drivers/timer-sp804: Fix read_current_timer() issue when clock source is not registered Register a valid read_current_timer() function for the SP804 timer on ARM32. On ARM32 platforms, when the SP804 timer is selected as the clocksource, the driver does not register a valid read_current_timer() function. As a result, features that rely on this API—such as rdseed—consistently return incorrect values. To fix this, a delay_timer structure is registered during the SP804 driver's initialization. The read_current_timer() function is implemented using the existing sp804_read() logic, and the timer frequency is reused from the already-initialized clocksource. Signed-off-by: Stephen Eta Zhou <stephen.eta.zhou@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20250525-sp804-fix-read_current_timer-v4-1-87a9201fa4ec@gmail.com	2025-11-26 11:24:32 +01:00
Enlin Mu	576c564ec3	clocksource/drivers/sprd: Enable register for timer counter from 32 bit to 64 bit Using 32 bit for suspend compensation, the max compensation time is 36 hours (working clock is 32k). In some IoT devices, the suspend time may be long, even exceeding 36 hours. Therefore, a 64 bit timer counter is needed for counting. Signed-off-by: Enlin Mu <enlin.mu@unisoc.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Link: https://patch.msgid.link/20251106021830.34846-1-enlin.mu@linux.dev	2025-11-26 11:24:26 +01:00
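The 36 hour figure in the entry above follows from simple arithmetic, sketched here under the assumption that "32k" means a 32768 Hz clock (the commit message does not state the exact rate). The helper name `wrap_seconds()` is hypothetical and exists only for this illustration. A 32-bit counter wraps after 2^32 / 32768 = 131072 seconds, which is roughly 36.4 hours.

```c
#include <stdint.h>

/*
 * Whole seconds until a free-running counter of the given width wraps
 * when clocked at hz ticks per second. Illustration only; a width of
 * 64 or more is approximated with UINT64_MAX to avoid an undefined shift.
 */
static uint64_t wrap_seconds(unsigned int bits, uint64_t hz)
{
    if (bits >= 64)
        return UINT64_MAX / hz;
    return (UINT64_C(1) << bits) / hz;
}
```

With the same assumed clock, a 64-bit counter wraps only after about 2^49 seconds, so suspend duration stops being a practical limit.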
Thomas Gleixner	653fda7ae7	sched/mmcid: Switch over to the new mechanism Now that all pieces are in place, change the implementations of sched_mm_cid_fork() and sched_mm_cid_exit() to adhere to the new strict ownership scheme and switch context_switch() over to use the new mm_cid_schedin() functionality. The common case is that there is no mode change required, which makes fork() and exit() just update the user count and the constraints. In case that a new user would exceed the CID space limit the fork() context handles the transition to per CPU mode with mm::mm_cid::mutex held. exit() handles the transition back to per task mode when the user count drops below the switch back threshold. fork() might also be forced to handle a deferred switch back to per task mode, when an affinity change increased the number of allowed CPUs enough. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.280380631@linutronix.de	2025-11-25 19:45:42 +01:00
Thomas Gleixner	9da6ccbcea	sched/mmcid: Implement deferred mode change When affinity changes cause an increase of the number of CPUs allowed for tasks which are related to a MM, that might result in a situation where the ownership mode can go back from per CPU mode to per task mode. As affinity changes happen with runqueue lock held there is no way to do the actual mode change and required fixup right there. Add the infrastructure to defer it to a workqueue. The scheduled work can race with a fork() or exit(). Whatever happens first takes care of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.216484739@linutronix.de	2025-11-25 19:45:42 +01:00
Thomas Gleixner	c809f081fe	irqwork: Move data struct to a types header ... to avoid header recursion hell. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.152813625@linutronix.de	2025-11-25 19:45:41 +01:00
Thomas Gleixner	fbd0e71dc3	sched/mmcid: Provide CID ownership mode fixup functions CIDs are either owned by tasks or by CPUs. The ownership mode depends on the number of tasks related to a MM and the number of CPUs on which these tasks are theoretically allowed to run. Theoretically because that number is the superset of CPU affinities of all tasks which only grows and never shrinks. Switching to per CPU mode happens when the user count becomes greater than the maximum number of CIDs, which is calculated by: opt_cids = min(mm_cid::nr_cpus_allowed, mm_cid::users); max_cids = min(1.25 * opt_cids, nr_cpu_ids); The +25% allowance is useful for tight CPU masks in scenarios where only a few threads are created and destroyed to avoid frequent mode switches. Though this allowance shrinks, the closer opt_cids becomes to nr_cpu_ids, which is the (unfortunate) hard ABI limit. At the point of switching to per CPU mode the new user is not yet visible in the system, so the task which initiated the fork() runs the fixup function: mm_cid_fixup_tasks_to_cpus() walks the thread list and either transfers each task's owned CID to the CPU the task runs on or drops it into the CID pool if a task is not on a CPU at that point in time. Tasks which schedule in before the task walk reaches them do the handover in mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's guaranteed that no task related to that MM owns a CID anymore. Switching back to task mode happens when the user count goes below the threshold which was recorded on the per CPU mode switch: pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2); This threshold is updated when an affinity change increases the number of allowed CPUs for the MM, which might cause a switch back to per task mode. If the switch back was initiated by an exiting task, then that task runs the fixup function. 
If it was initiated by an affinity change, then it's run either in the deferred update function in context of a workqueue or by a task which forks a new one or by a task which exits. Whatever happens first. mm_cid_fixup_cpus_to_tasks() walks through the possible CPUs and either transfers the CPU owned CIDs to a related task which runs on the CPU or drops it into the pool. Tasks which schedule in on a CPU which the walk did not cover yet do the handover themselves. This transition from CPU to per task ownership happens in two phases: 1) mm:mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task CID and denotes that the CID is only temporarily owned by the task. When it schedules out the task drops the CID back into the pool if this bit is set. 2) The initiating context walks the per CPU space and after completion clears mm:mm_cid.transit. After that point the CIDs are strictly task owned again. This two phase transition is required to prevent CID space exhaustion during the transition as a direct transfer of ownership would fail if two tasks are scheduled in on the same CPU before the fixup freed per CPU CIDs. When mm_cid_fixup_cpus_to_tasks() completes it's guaranteed that no CID related to that MM is owned by a CPU anymore. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.088189028@linutronix.de	2025-11-25 19:45:41 +01:00
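The three limits quoted in the fbd0e71dc3 message (opt_cids, max_cids, pcpu_thrs) can be worked through with a small user-space sketch. This is not the kernel code: the function names are hypothetical, and expressing the 1.25 factor as `opt + opt / 4` in integer arithmetic is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical helper, not a kernel function. */
static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* opt_cids = min(mm_cid::nr_cpus_allowed, mm_cid::users) */
unsigned int cid_opt(unsigned int nr_cpus_allowed, unsigned int users)
{
	return min_u(nr_cpus_allowed, users);
}

/*
 * max_cids = min(1.25 * opt_cids, nr_cpu_ids)
 * Assumption: the +25% allowance is computed with integer division.
 */
unsigned int cid_max(unsigned int opt_cids, unsigned int nr_cpu_ids)
{
	assert(nr_cpu_ids > 0);
	return min_u(opt_cids + opt_cids / 4, nr_cpu_ids);
}

/* pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2) */
unsigned int cid_pcpu_thrs(unsigned int opt_cids, unsigned int nr_cpu_ids)
{
	assert(nr_cpu_ids > 0);
	return min_u(opt_cids - opt_cids / 4, nr_cpu_ids / 2);
}
```

With 8 allowed CPUs on a 64 CPU machine, opt_cids is 8, so under these assumptions the switch to per CPU mode happens above 10 users and the switch back below 6. With opt_cids at 60 the allowance is clamped by nr_cpu_ids, which matches the remark in the message that the allowance shrinks as opt_cids approaches the hard limit.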
Thomas Gleixner	9a723ed7fa	sched/mmcid: Provide new scheduler CID mechanism The MM CID management has two fundamental requirements: 1) It has to guarantee that at no given point in time the same CID is used by concurrent tasks in userspace. 2) The CID space must not exceed the number of possible CPUs in a system. While most allocators (glibc, tcmalloc, jemalloc) do not care about that, there seems to be at least some LTTng library depending on it. The CID space compaction itself is not a functional correctness requirement, it is only a useful optimization mechanism to reduce the memory footprint in unused user space pools. The optimal CID space is: min(nr_tasks, nr_cpus_allowed); Where @nr_tasks is the number of actual user space threads associated to the mm and @nr_cpus_allowed is the superset of all task affinities. It is growth only as it would be insane to take a racy snapshot of all task affinities when the affinity of one task changes just to redo it 2 milliseconds later when the next task changes its affinity. That means that as long as the number of tasks is lower than or equal to the number of CPUs allowed, each task owns a CID. If the number of tasks exceeds the number of CPUs allowed it switches to per CPU mode, where the CPUs own the CIDs and the tasks borrow them as long as they are scheduled in. For transition periods CIDs can go beyond the optimal space as long as they don't go beyond the number of possible CPUs. The current upstream implementation adds overhead into task migration to keep the CID with the task. It also has to do the CID space consolidation work from a task work in the exit to user space path. As that work is assigned to a random task related to a MM this can inflict unwanted exit latencies. Implement the context switch parts of a strict ownership mechanism to address this. This removes most of the work from the task which schedules out. 
Only during transitioning from per CPU to per task ownership it is required to drop the CID when leaving the CPU to prevent CID space exhaustion. Other than that scheduling out is just a single check and branch. The task which schedules in has to check whether: 1) The ownership mode changed 2) The CID is within the optimal CID space In stable situations this results in zero work. The only short disruption is when ownership mode changes or when the associated CID is not in the optimal CID space. The latter only happens when tasks exit and therefore the optimal CID space shrinks. That mechanism is strictly optimized for the common case where no change happens. The only case where it actually causes a temporary one time spike is on mode changes when and only when a lot of tasks related to a MM schedule exactly at the same time and have eventually to compete on allocating a CID from the bitmap. In the sysbench test case which triggered the spinlock contention in the initial CID code, __schedule() drops significantly in perf top on a 128 Core (256 threads) machine when running sysbench with 255 threads, which fits into the task mode limit of 256 together with the parent thread: Upstream rseq/perf branch +CID rework 0.42% 0.37% 0.32% [k] __schedule Increasing the number of threads to 256, which puts the test process into per CPU mode looks about the same. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.023984859@linutronix.de	2025-11-25 19:45:41 +01:00
Thomas Gleixner	23343b6b09	sched/mmcid: Introduce per task/CPU ownership infrastructure The MM CID management has two fundamental requirements: 1) It has to guarantee that at no given point in time the same CID is used by concurrent tasks in userspace. 2) The CID space must not exceed the number of possible CPUs in a system. While most allocators (glibc, tcmalloc, jemalloc) do not care about that, there seems to be at least librseq depending on it. The CID space compaction itself is not a functional correctness requirement, it is only a useful optimization mechanism to reduce the memory footprint in unused user space pools. The optimal CID space is: min(nr_tasks, nr_cpus_allowed); Where @nr_tasks is the number of actual user space threads associated to the mm and @nr_cpus_allowed is the superset of all task affinities. It is growth only as it would be insane to take a racy snapshot of all task affinities when the affinity of one task changes just to redo it 2 milliseconds later when the next task changes its affinity. That means that as long as the number of tasks is lower than or equal to the number of CPUs allowed, each task owns a CID. If the number of tasks exceeds the number of CPUs allowed it switches to per CPU mode, where the CPUs own the CIDs and the tasks borrow them as long as they are scheduled in. For transition periods CIDs can go beyond the optimal space as long as they don't go beyond the number of possible CPUs. The current upstream implementation adds overhead into task migration to keep the CID with the task. It also has to do the CID space consolidation work from a task work in the exit to user space path. As that work is assigned to a random task related to a MM this can inflict unwanted exit latencies. This can be done differently by implementing a strict CID ownership mechanism. Either the CIDs are owned by the tasks or by the CPUs. 
The latter provides less locality when tasks are heavily migrating, but there is no justification to optimize for overcommit scenarios and thereby penalizing everyone else. Provide the basic infrastructure to implement this: - Change the UNSET marker to BIT(31) from ~0U - Add the ONCPU marker as BIT(30) - Add the TRANSIT marker as BIT(29) That allows to check for ownership trivially and provides a simple check for UNSET as well. The TRANSIT marker is required to prevent CID space exhaustion when switching from per CPU to per task mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251119172549.960252358@linutronix.de	2025-11-25 19:45:41 +01:00
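The marker layout described in 23343b6b09 (UNSET as BIT(31), ONCPU as BIT(30), TRANSIT as BIT(29)) makes the ownership checks simple mask tests. The following user-space sketch illustrates that idea only; the `EX_` macro names and helper functions are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the commit message; names are hypothetical. */
#define EX_BIT(n)	(UINT32_C(1) << (n))
#define EX_CID_UNSET	EX_BIT(31)
#define EX_CID_ONCPU	EX_BIT(30)
#define EX_CID_TRANSIT	EX_BIT(29)
#define EX_CID_FLAGS	(EX_CID_UNSET | EX_CID_ONCPU | EX_CID_TRANSIT)

/* True if no CID is assigned at all. */
bool ex_cid_is_unset(uint32_t cid)
{
	return (cid & EX_CID_UNSET) != 0;
}

/* True if the CID is owned by a CPU and only borrowed by the task. */
bool ex_cid_on_cpu(uint32_t cid)
{
	return (cid & EX_CID_ONCPU) != 0;
}

/* True if the CID is only temporarily task owned during a mode switch. */
bool ex_cid_in_transit(uint32_t cid)
{
	return (cid & EX_CID_TRANSIT) != 0;
}

/* Strip the marker bits to recover the plain CID number. */
uint32_t ex_cid_value(uint32_t cid)
{
	return cid & ~EX_CID_FLAGS;
}

/* Mark a plain CID number as CPU owned. */
uint32_t ex_cid_to_cpu_owned(uint32_t cid)
{
	/* A plain CID number must not collide with the marker bits. */
	assert((cid & EX_CID_FLAGS) == 0);
	return cid | EX_CID_ONCPU;
}
```

Because the markers occupy the top three bits, a single AND distinguishes task owned, CPU owned, transitional and unset values, which is the "trivial" ownership check the message refers to.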
Thomas Gleixner	51dd92c71a	sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex Prepare for the new CID management scheme which puts the CID ownership transition into the fork() and exit() slow path by serializing sched_mm_cid_fork()/exit() with it, so task list and cpu mask walks can be done in interruptible and preemptible code. The contention on it is not worse than on other concurrency controls in the fork()/exit() machinery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.895826703@linutronix.de	2025-11-25 19:45:41 +01:00
Thomas Gleixner	b0c3d51b54	sched/mmcid: Provide precomputed maximal value Reading mm::mm_users and mm::mm_cid::nr_cpus_allowed every time to compute the maximal CID value is just wasteful as that value is only changing on fork(), exit() and eventually when the affinity changes. So it can be easily precomputed at those points and provided in mm::mm_cid for consumption in the hot path. But there is an issue with using mm::mm_users for accounting because that does not necessarily reflect the number of user space tasks as other kernel code can take temporary references on the MM which skew the picture. Solve that by adding a users counter to struct mm_mm_cid, which is modified by fork() and exit() and used for precomputing under mm_mm_cid::lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.832764634@linutronix.de	2025-11-25 19:45:40 +01:00
Thomas Gleixner	bf070520e3	sched/mmcid: Move initialization out of line It's getting bigger soon, so just move it out of line to the rest of the code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.769636491@linutronix.de	2025-11-25 19:45:40 +01:00
Thomas Gleixner	2b1642b881	signal: Move MMCID exit out of sighand lock There is no need anymore to keep this under sighand lock as the current code and the upcoming replacement are not depending on the exit state of a task anymore. That allows to use a mutex in the exit path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.706439391@linutronix.de	2025-11-25 19:45:40 +01:00
Thomas Gleixner	539115f08c	sched/mmcid: Convert mm CID mask to a bitmap This is truly a bitmap and just conveniently uses a cpumask because the maximum size of the bitmap is nr_cpu_ids. But that prevents searching for a zero bit in a limited range, which is helpful to provide an efficient mechanism to consolidate the CID space when the number of users decreases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Link: https://patch.msgid.link/20251119172549.642866767@linutronix.de	2025-11-25 19:45:40 +01:00
Thomas Gleixner	35a5c37cb9	cpumask: Cache num_possible_cpus() Reevaluating num_possible_cpus() over and over does not make sense. That becomes a constant after init as cpu_possible_mask is marked ro_after_init. Cache the value during initialization and provide that for consumption. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20251119172549.578653738@linutronix.de	2025-11-25 19:45:40 +01:00
Heiko Carstens	509c34924d	s390/vdso: Get rid of -m64 flag handling The compiler/assembler flag -m64 is added and removed at two locations. This pointless exercise is a leftover to keep the 31 and 64 bit vdso Makefiles as symmetrical as possible. Given that the 31 bit vdso code does not exist anymore, remove the -m64 flag handling. Suggested-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-25 15:28:08 +01:00
Heiko Carstens	c0087d807a	s390/vdso: Rename vdso64 to vdso Since compat is gone there is only a 64 bit vdso left. Remove the superfluous "64" suffix everywhere. Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-25 15:28:07 +01:00
Heiko Carstens	b3bdfdf1f9	s390: Rename head64.S to head.S All the code is 64 bit, therefore remove the superfluous suffix. Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-25 15:28:07 +01:00
Jens Remus	5e811b922e	s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros This simplifies the vDSO linker script. The ELF_DETAILS macro was not used in addition, as done on arm64 and powerpc, as that would introduce an empty .modinfo section. Note that this rearranges the .comment section to follow after all of the debug sections. Signed-off-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-25 15:28:07 +01:00
Zenon Xiu	4b7a59fa70	Documentation/arm64: Fix the typo of register names The register names 'HWFGWTR_EL2' and 'HWFGRTR_EL2' are wrong and should be 'HFGWTR_EL2' and 'HFGRTR_EL2'. The register descriptions are on the Arm website: https://developer.arm.com/documentation/ddi0601/2025-09/AArch64-Registers/HFGWTR-EL2--Hypervisor-Fine-Grained-Write-Trap-Register https://developer.arm.com/documentation/ddi0601/2025-09/AArch64-Registers/HFGRTR-EL2--Hypervisor-Fine-Grained-Read-Trap-Register?lang=en Signed-off-by: Zenon Xiu <zenonxiu@outlook.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-25 12:26:31 +00:00
Marc Zyngier	155f8d4ef0	ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() Since `0f67b56d84` ("clocksource/drivers/arm_arch_timer_mmio: Switch over to standalone driver"), acpi_arch_timer_mem_init() is unused. Remove it. Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-25 11:55:13 +00:00
Randy Dunlap	73029e73cc	x86/cc: Fix enum spelling to fix kernel-doc warnings Make the enum name in kernel-doc match the code to prevent kernel-doc warnings: Warning: include/linux/cc_platform.h:106 Enum value 'CC_ATTR_GUEST_SEV_SNP' not described in enum 'cc_attr' Warning: include/linux/cc_platform.h:106 Excess enum value '%CC_ATTR_SEV_SNP' description in 'cc_attr' Fixes: `f742b90e61` ("x86/mm: Extend cc_attr to include AMD SEV-SNP") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251125022730.3163679-1-rdunlap@infradead.org	2025-11-25 09:17:13 +01:00
Nikolay Borisov	69acbdbbef	RAS/AMD/ATL: Replace bitwise_xor_bits() with hweight16() Doing hweight16() and checking whether the lsb is set is functionally equivalent to what bitwise_xor_bits() does. In addition, it results in better generated code as before gcc would inline the function 4 times. With hweight16(), the resulting code boils down to 2 instructions - POPCNT and AND, and all relevant CPUs support POPCNT. An alternative would have been to use the __builtin_parity() function provided by both Clang/GCC, however under some circumstances the compiler can choose not to inline it but generate a library call which is unsupported in the kernel. No functional changes. [ bp: Massage commit message. ] Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251124142517.1708451-1-nik.borisov@suse.com	2025-11-24 17:00:37 +01:00
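The equivalence claimed in 69acbdbbef, that XOR-ing all bits of a 16 bit value gives the same result as taking the least significant bit of its population count, can be checked exhaustively in user space. This sketch assumes nothing about the original bitwise_xor_bits() beyond what the message states; `popcount16()` is a portable stand-in for the kernel's hweight16(), and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* XOR all 16 bits together: 1 if an odd number of bits is set, else 0. */
unsigned int xor_bits16(uint16_t v)
{
	unsigned int r = 0;

	for (int i = 0; i < 16; i++)
		r ^= (v >> i) & 1u;
	return r;
}

/* Portable population count, standing in for hweight16(). */
unsigned int popcount16(uint16_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= (uint16_t)(v - 1);	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Parity as described in the commit: popcount, then test the lsb. */
unsigned int parity16(uint16_t v)
{
	return popcount16(v) & 1u;
}

/* Compare both approaches over every possible 16 bit input. */
void check_parity_equivalence(void)
{
	for (uint32_t v = 0; v <= UINT16_MAX; v++)
		assert(xor_bits16((uint16_t)v) == parity16((uint16_t)v));
}
```

Calling `check_parity_equivalence()` walks all 65536 inputs, so a mismatch between the two formulations would trip the assertion.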
James Clark	e6a27290d8	perf: arm_spe: Add support for filtering on data source SPE_FEAT_FDS adds the ability to filter on the data source of packets. Like the other existing filters, enable filtering with PMSFCR_EL1.FDS when any of the filter bits are set. Each bit position of the 64 bit filter maps to numerical data sources 0-63 described by bits[0:5] in the data source packet (although the full range of data source is 16 bits so higher value data sources can't be filtered on). The filter is an OR of all the filter bits, so for example clearing filter bits 0 and 3 only includes packets from data sources 0 OR 3. Invert the filter given by userspace so that the default value of 0 is equivalent to including all values (no filtering). This allows us to skip adding a new format bit to enable filtering and still support excluding all data sources which would have been a filter value of 0 if not for the inversion. Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-24 15:59:18 +00:00
James Clark	cbbfba4847	perf: Add perf_event_attr::config4 Arm FEAT_SPE_FDS adds the ability to filter on the data source of a packet using another 64-bits of event filtering control. As the existing perf_event_attr::configN fields are all used up for SPE PMU, an additional field is needed. Add a new 'config4' field. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-24 15:59:18 +00:00
Joakim Zhang	11abb4e87b	perf/imx_ddr: Add support for PMU in DB (system interconnects) There is a PMU in DB, which has the same function with PMU in DDR subsystem, the difference is PMU in DB only supports cycles, axid-read, axid-write events. e.g. perf stat -a -e imx8_db0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD,axi_port=0xPP,axi_channel=0xH/ cmd perf stat -a -e imx8_db0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD,axi_port=0xPP,axi_channel=0xH/ cmd Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-24 15:39:05 +00:00
Frank Li	037e8cf671	perf/imx_ddr: Get and enable optional clks Get and enable optional clks because fsl,imx8dxl-db-pmu have two clocks. Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-24 15:39:05 +00:00
Frank Li	66db99ffdf	perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() Move ida_alloc() from helper ddr_perf_init() into ddr_perf_probe() to clarify why ida_free() must be called at the error path. Add return value check for ida_alloc(). Rename label 'cpuhp_state_err' to 'idr_free' to make the code clearer, since two error paths now jump to this label. Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-24 15:39:05 +00:00
Frank Li	de8209e554	dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL Add compatible string fsl,imx8qm-ddr-pmu, fsl,imx8qxp-ddr-pmu, which fallback to fsl,imx8-ddr-pmu and fsl,imx8dxl-db-pmu (for data bus fabric). Add clocks, clock-names for fsl,imx8dxl-db-pmu and keep the same restriction for existing compatible strings. Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-24 15:39:05 +00:00
Heiko Carstens	f5730d44e0	s390: Add stackprotector support Stackprotector support was previously unavailable on s390 because by default compilers generate code which is not suitable for the kernel: the canary value is accessed via thread local storage, where the address of thread local storage is within access registers 0 and 1. Using those registers also for the kernel would come with a significant performance impact and more complicated kernel entry/exit code, since access registers contents would have to be exchanged on every kernel entry and exit. With the upcoming gcc 16 release new compiler options will become available which allow to generate code suitable for the kernel. [1] Compiler option -mstack-protector-guard=global instructs gcc to generate stackprotector code that refers to a global stackprotector canary value via symbol __stack_chk_guard. Access to this value is guaranteed to occur via larl and lgrl instructions. Furthermore, compiler option -mstack-protector-guard-record generates a section containing all code addresses that reference the canary value. To allow for per task canary values the instructions which load the address of __stack_chk_guard are patched so they access a lowcore field instead: a per task canary value is available within the task_struct of each task, and is written to the per-cpu lowcore location on each context switch. Also add sanity checks and debugging option to be consistent with other kernel code patching mechanisms. Full debugging output can be enabled with the following kernel command line options: debug_stackprotector bootdebug ignore_loglevel earlyprintk dyndbg="file stackprotector.c +p" Example debug output: stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240 where "<insn address>: <old insn> -> <new insn>". [1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector") Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:45:21 +01:00
Heiko Carstens	1d7764cfe3	s390/modules: Simplify module_finalize() slightly Preinitialize the return value, and break out the for loop in module_finalize() in case of an error to get rid of an ifdef. This makes it easier to add additional code, which may also depend on config options. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:45:21 +01:00
Heiko Carstens	c3d17464f0	s390: Remove KMSG_COMPONENT macro The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message catalog" which never made it upstream. Remove the macro in order to get rid of a pointless indirection. Replace all users with the string it defines. In almost all cases this leads to a simple replacement like this: - #define KMSG_COMPONENT "appldata" - #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt + #define pr_fmt(fmt) "appldata: " fmt Except for some special cases this is just mechanical/scripted work. Acked-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:45:21 +01:00
Heiko Carstens	e950d1f84d	s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU Since the rework of the kernel virtual address space [1] the module area and the kernel image are within the same 4GB area. Therefore there is no need for the weak per cpu workaround for modules anymore. Remove it. [1] commit `c98d2ecae0` ("s390/mm: Uncouple physical vs virtual address spaces") Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:45:20 +01:00
Heiko Carstens	f555d885bf	Merge branch 'ap-driver-override' into features Harald Freudenberger says: ==================== Support for driver override on AP queues. Add a new sysfs attribute driver_override to the AP queue's directory. Writing in a string overrides the default driver determination and the drivers are matched against this string instead. This overrules the driver binding determined by the apmask/aqmask bitmask fields. With the write to the attribute a check is done if the queue is in use by an mdev device. If this is true, the write is aborted and EBUSY is returned. As there exists some tooling for this kind of driver_override (see package driverctl) the AP bus behavior for re-binding should be compatible to this. The steps for a driver_override are: 1) unbind the current driver from the device. For example echo "17.0005" > /sys/devices/ap/card17/17.0005/driver/unbind 2) set the new driver for this device in the sysfs driver_override attribute. For example echo "vfio_ap" > /sys/devices/ap/card17/17.0005/driver_override 3) trigger a bus reprobe of this device. For example echo "17.0005" > /sys/bus/ap/drivers_probe With the driverctl package this is more comfortable and the settings get persisted: driverctl -b ap set-override 17.0005 vfio_ap and unset with driverctl -b ap unset-override 17.0005 ==================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:43:19 +01:00
Harald Freudenberger	46030379f1	s390/ap: Restrict driver_override versus apmask and aqmask use Introduce a restriction for the driver_override feature versus apmask and aqmask: - driver_override is only allowed when the apmask and aqmask values both are default (=0xffff..ffff). - apmask and aqmask modifications are only allowed when there is no driver_override on any AP device active. So in the end the user is restricted to choose to either use apmask/aqmask to divide the AP devices into host owned and vfio owned or use the driver_override feature but not mix these two approaches. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:43:06 +01:00
Harald Freudenberger	8babcc2b6a	s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex The mutex ap_perms_mutex was already used not only for protection of the struct ap_perms ap_perms variable but also for a consistent update of the AP bus sysfs attributes apmask and aqmask. So rename this mutex to ap_attr_mutex which better reflects the current use. This is also a preparation for an upcoming patch which will use this mutex to lock updates on a new sysfs attribute. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:43:06 +01:00
Harald Freudenberger	d38a87d7c0	s390/ap: Support driver_override for AP queue devices Add a new sysfs attribute driver_override to the AP queue's directory. Writing in a string overrides the default driver determination and the drivers are matched against this string instead. This overrules the driver binding determined by the apmask/aqmask bitmask fields. According to the common understanding of how the driver_override behavior shall work, there is no further checking done. Neither about the string which is given as override driver nor if this device is currently in use by an mdev device. Another patch may limit this behavior to refuse a mixed usage of the driver_override and apmask/aqmask feature. As there exists some tooling for this kind of driver_override (see package driverctl) the AP bus behavior for re-binding should be compatible to this. The steps for a driver_override are: 1) unbind the current driver from the device. For example echo "17.0005" > /sys/devices/ap/card17/17.0005/driver/unbind 2) set the new driver for this device in the sysfs driver_override attribute. For example echo "vfio_ap" > /sys/devices/ap/card17/17.0005/driver_override 3) trigger a bus reprobe of this device. For example echo "17.0005" > /sys/bus/ap/drivers_probe With the driverctl package this is more comfortable and the settings get persisted: driverctl -b ap set-override 17.0005 vfio_ap and unset with driverctl -b ap unset-override 17.0005 Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:43:05 +01:00
Harald Freudenberger	6917f434fd	s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks For the in_use() check of an updated apmask the host's aqmask was provided to the vfio function. Similarly, on an update of the aqmask the host's apmask was provided to the vfio in_use() function. This led to false results on the check for apmask or aqmask updates. For example with only one APQN when exactly this card is tried to be re-assigned back to the host, the in_use() check did not complain. The correct behavior is achieved by providing a full mask for aqmask when an adapter is to be checked and similarly a full mask for apmask when a domain is to be checked for usage. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-24 11:43:05 +01:00
Geert Uytterhoeven	aaf4e92341	m68k: defconfig: Update defconfigs for v6.18-rc1 - Drop CONFIG_SCTP_COOKIE_HMAC_SHA1=y (removed in commit `2f3dd6ec90` ("sctp: Convert cookie authentication to use HMAC-SHA256")), - Drop CONFIG_BATMAN_ADV_NC=y (removed in commit `87b95082db` ("batman-adv: remove network coding support")), - Enable modular build of the SHA-1 secure hash algorithm (no longer auto-enabled since commit `2f3dd6ec90` ("sctp: Convert cookie authentication to use HMAC-SHA256")). Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://patch.msgid.link/65e00bcb7b2980278bb087986ee405627aa32d8b.1760360254.git.geert@linux-m68k.org	2025-11-24 11:03:50 +01:00
Yue Haibing	e6a11a526e	x86/{boot,mtrr}: Remove unused function declarations Commits `28be1b454c` ("x86/boot: Remove unused copy_*_gs() functions") `34d2819f20` ("x86, mtrr: Remove unused mtrr/state.c") removed the functions but left the prototypes. Remove them. [ bp: Merge into a single patch. ] Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251120121037.1479334-1-yuehaibing@huawei.com	2025-11-22 21:26:36 +01:00
Lorenzo Pieralisi	9c1fbc56ca	irqchip/gic-its: Rework platform MSI deviceID detection Current code retrieving platform devices MSI devID in the GIC ITS MSI parent helpers suffers from some minor issues: - It leaks a struct device_node reference - It is duplicated between GICv3 and GICv5 for no good reason - It does not use the OF phandle iterator code that simplifies the msi-parent property parsing Consolidate GIC v3 and v5 deviceID retrieval in a function that addresses the full set of issues in one go by merging GIC v3 and v5 code and converting the msi-parent parsing loop to the more modern OF phandle iterator API, fixing the struct device_node reference leak in the process. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20251021124103.198419-6-lpieralisi@kernel.org	2025-11-22 17:09:03 +01:00
Lorenzo Pieralisi	4f32612f6a	PCI: iproc: Implement MSI controller node detection with of_msi_xlate() The functionality implemented in the iproc driver in order to detect an OF MSI controller node is now fully implemented in of_msi_xlate(). Replace the current msi-map/msi-parent parsing code with of_msi_xlate(). Since of_msi_xlate() is also a deviceID mapping API, pass in a fictitious 0 as deviceID - the driver only requires detecting the OF MSI controller node not the deviceID mapping per-se (of_msi_xlate() return value is ignored for the same reason). Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frank Li <Frank.Li@nxp.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251021124103.198419-5-lpieralisi@kernel.org	2025-11-22 17:09:03 +01:00
Thomas Gleixner	ebb922c920	Merge tag 'v6.18-rc3' into irq/msi Pick up OF changes to resolve dependencies	2025-11-22 17:07:57 +01:00
Babu Moger	ac7de456a3	fs/resctrl: Update bit_usage to reflect io_alloc The "shareable_bits" and "bit_usage" resctrl files associated with cache resources give insight into how instances of a cache are used. Update the annotated capacity bitmasks displayed by "bit_usage" to include the cache portions allocated for I/O via the "io_alloc" feature. "shareable_bits" is a global bitmask of shareable cache with I/O and can thus not present the per-domain I/O allocations possible with the "io_alloc" feature. Revise the "shareable_bits" documentation to direct users to "bit_usage" for accurate cache usage information. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/e02a0d424129fd7f3e45822a559b1c614ae4652a.1762995456.git.babu.moger@amd.com	2025-11-22 14:30:34 +01:00
Babu Moger	28fa2cce7a	fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks The io_alloc feature in resctrl enables system software to configure the portion of the cache allocated for I/O traffic. When supported, the io_alloc_cbm file in resctrl provides access to capacity bitmasks (CBMs) allocated for I/O devices. Enable users to modify io_alloc CBMs by writing to the io_alloc_cbm resctrl file when the io_alloc feature is enabled. Mirror the CBMs between CDP_CODE and CDP_DATA when CDP is enabled to present consistent I/O allocation information to user space. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/67609641b03ccfba18a8ee0bf9dbd1f3dcbecda3.1762995456.git.babu.moger@amd.com	2025-11-22 14:28:31 +01:00
Babu Moger	af1242eeca	fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID parse_cbm() requires resource group mode and CLOSID to validate the capacity bitmask (CBM). It is passed via struct rdtgroup in struct rdt_parse_data. The io_alloc feature also uses CBMs to indicate which portions of cache are allocated for I/O traffic. The CBMs are provided by user space and need to be validated the same as CBMs provided for general (CPU) cache allocation. parse_cbm() cannot be used as-is since io_alloc does not have rdtgroup context. Pass the resource group mode and CLOSID directly to parse_cbm() via struct rdt_parse_data, instead of through the rdtgroup struct, to facilitate calling parse_cbm() to verify the CBM of the io_alloc feature. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/f8ec6ab5cf594d906a3fe75f56793d5fbd63f38f.1762995456.git.babu.moger@amd.com	2025-11-22 13:10:12 +01:00
Babu Moger	77b6623262	fs/resctrl: Introduce interface to display io_alloc CBMs Introduce the "io_alloc_cbm" resctrl file to display the capacity bitmasks (CBMs) that represent the portions of each cache instance allocated for I/O traffic on a cache resource that supports the "io_alloc" feature. io_alloc_cbm resides in the info directory of a cache resource, for example, /sys/fs/resctrl/info/L3/. Since the resource name is part of the path, it is not necessary to display the resource name as done in the schemata file. When CDP is enabled, io_alloc routes traffic using the highest CLOSID associated with the CDP_CODE resource and that CLOSID becomes unusable for the CDP_DATA resource. The highest CLOSID of CDP_CODE and CDP_DATA resources will be kept in sync to ensure consistent user interface. In preparation for this, access the CBMs for I/O traffic through highest CLOSID of either CDP_CODE or CDP_DATA resource. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/55a3ff66a70e7ce8239f022e62b334e9d64af604.1762995456.git.babu.moger@amd.com	2025-11-22 11:37:21 +01:00
Frederic Weisbecker	3de5e46e50	genirq: Remove cpumask availability check on kthread affinity setting Failing to allocate the affinity mask of an interrupt descriptor fails the whole descriptor initialization. It is then guaranteed that the cpumask is always available whenever the related interrupt objects are alive, such as the kthread handler. Therefore remove the superfluous check since it is merely a historical leftover. Get rid also of the comments above it that are obsolete and useless. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251121143500.42111-4-frederic@kernel.org	2025-11-22 09:26:18 +01:00
Frederic Weisbecker	801afdfbfc	genirq: Fix interrupt threads affinity vs. cpuset isolated partitions When a cpuset isolated partition is created / updated or destroyed, the interrupt threads are affined blindly to all the non-isolated CPUs. This happens without taking into account the interrupt threads initial affinity that becomes ignored. For example in a system with 8 CPUs, if an interrupt and its kthread are initially affine to CPU 5, creating an isolated partition with only CPU 2 inside will eventually end up affining the interrupt kthread to all CPUs but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for CPU 5. Besides the blind re-affining, this doesn't take care of the actual low level interrupt which isn't migrated. As of today the only way to isolate non managed interrupts, along with their kthreads, is to overwrite their affinity separately, for example through /proc/irq/ To avoid doing that manually, future development should focus on updating the interrupt's affinity whenever cpuset isolated partitions are updated. In the meantime, cpuset shouldn't fiddle with interrupt threads directly. To prevent from that, set the PF_NO_SETAFFINITY flag to them. This is done through kthread_bind_mask() by affining them initially to all possible CPUs as at that point the interrupt is not started up which means the affinity of the hard interrupt is not known. The thread will adjust that once it reaches the handler, which is guaranteed to happen after the initial affinity of the hard interrupt is established. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251121143500.42111-3-frederic@kernel.org	2025-11-22 09:26:18 +01:00
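The CPU arithmetic of the 8-CPU example can be checked with a short sketch. It is plain Python set logic rather than kernel code, and `blind_reaffine()` is a hypothetical name that only models the behaviour the changelog criticises.

```python
def blind_reaffine(all_cpus, isolated):
    """Model of the old behaviour: the interrupt kthread ends up affine to
    every non-isolated CPU, regardless of where it was affine before."""
    return set(all_cpus) - set(isolated)

all_cpus = range(8)
initial_affinity = {5}   # interrupt and kthread initially affine to CPU 5
isolated = {2}           # isolated partition containing only CPU 2

after = blind_reaffine(all_cpus, isolated)
preference_lost = after != initial_affinity
```

Here `after` is CPUs 0, 1, 3-7, so the kthread's original preference for CPU 5 is no longer reflected in its affinity.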
Frederic Weisbecker	68775ca79a	genirq: Prevent early spurious wake-ups of interrupt threads During initialization, the interrupt thread is created before the interrupt is enabled. The interrupt enablement happens before the actual kthread wake up point. Once the interrupt is enabled the hardware can raise an interrupt and once setup_irq() drops the descriptor lock an interrupt wake-up can happen. Even when such an interrupt can be considered premature, this is not a problem in general because at the point where the descriptor lock is dropped and the wakeup can happen, the data which is used by the thread is fully initialized. Though from the perspective of least surprise, the initial wakeup really should be performed by the setup code and not randomly by a premature interrupt. Prevent this by performing a wake-up only if the target is in state TASK_INTERRUPTIBLE, which the thread uses in wait_for_interrupt(). If the thread is still in state TASK_UNINTERRUPTIBLE, the wake-up is not lost because after the setup code completed the initial wake-up the thread will observe the IRQTF_RUNTHREAD and proceed with the handling. [ tglx: Simplified the changes and extended the changelog. ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251121143500.42111-2-frederic@kernel.org	2025-11-22 09:26:18 +01:00
Babu Moger	9445c7059c	fs/resctrl: Add user interface to enable/disable io_alloc feature AMD's SDCIAE forces all SDCI lines to be placed into the L3 cache portions identified by the highest-supported L3_MASK_n register, where n is the maximum supported CLOSID. To support this, when io_alloc resctrl feature is enabled, reserve the highest CLOSID exclusively for I/O allocation traffic making it no longer available for general CPU cache allocation. Introduce user interface to enable/disable io_alloc feature and encourage users to enable io_alloc only when running workloads that can benefit from this functionality. On enable, initialize the io_alloc CLOSID with all usable CBMs across all the domains. Since CLOSIDs are managed by resctrl fs, it is least invasive to make "io_alloc is supported by maximum supported CLOSID" part of the initial resctrl fs support for io_alloc. Take care to minimally (only in error messages) expose this use of CLOSID for io_alloc to user space so that this is not required from other architectures that may support io_alloc differently in the future. When resctrl is mounted with "-o cdp" to enable code/data prioritization, there are two L3 resources that can support I/O allocation: L3CODE and L3DATA. From resctrl fs perspective the two resources share a CLOSID and the architecture's available CLOSID are halved to support this. The architecture's underlying CLOSID used by SDCIAE when CDP is enabled is the CLOSID associated with the CDP_CODE resource, but from resctrl's perspective there is only one CLOSID for both CDP_CODE and CDP_DATA. CDP_DATA is thus not usable for general (CPU) cache allocation nor I/O allocation. Keep the CDP_CODE and CDP_DATA I/O alloc status in sync to avoid any confusion to user space. That is, enabling io_alloc on CDP_CODE does so on CDP_DATA and vice-versa, and keep the I/O allocation CBMs of CDP_CODE and CDP_DATA in sync. 
Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/c7d3037795e653e22b02d8fc73ca80d9b075031c.1762995456.git.babu.moger@amd.com	2025-11-21 23:01:54 +01:00
Babu Moger	48068e5650	fs/resctrl: Introduce interface to display "io_alloc" support Introduce the "io_alloc" resctrl file to the "info" area of a cache resource, for example /sys/fs/resctrl/info/L3/io_alloc. "io_alloc" indicates support for the "io_alloc" feature that allows direct insertion of data from I/O devices into the cache. Restrict exposing support for "io_alloc" to the L3 resource that is the only resource where this feature can be backed by AMD's L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE). With that, the "io_alloc" file is only visible to user space if the L3 resource supports "io_alloc". Doing so makes the file visible for all cache resources though, for example also L2 cache (if it supports cache allocation). As a consequence, add capability for file to report expected "enabled" and "disabled", as well as "not supported". Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/e8b116a8f424128b227734bb1d433c14af478d90.1762995456.git.babu.moger@amd.com	2025-11-21 22:49:42 +01:00
Babu Moger	556d2892aa	x86,fs/resctrl: Implement "io_alloc" enable/disable handlers "io_alloc" is the generic name of the new resctrl feature that enables system software to configure the portion of cache allocated for I/O traffic. On AMD systems, "io_alloc" resctrl feature is backed by AMD's L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE). Introduce the architecture-specific functions that resctrl fs should call to enable, disable, or check status of the "io_alloc" feature. Change SDCIAE state by setting (to enable) or clearing (to disable) bit 1 of MSR_IA32_L3_QOS_EXT_CFG on all logical processors within the cache domain. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/9e9070100c320eab5368e088a3642443dee95ed7.1762995456.git.babu.moger@amd.com	2025-11-21 22:35:22 +01:00
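The enable/disable step described above comes down to setting or clearing one bit. The following is a minimal sketch of that bit manipulation only, assuming the register value is available as an integer; it does not read or write an MSR, and `SDCIAE_ENABLE_BIT` and `set_sdciae()` are hypothetical names.

```python
SDCIAE_ENABLE_BIT = 1  # bit position taken from the changelog (bit 1)

def set_sdciae(msr_value, enable):
    """Return the register value with bit 1 set (enable) or cleared
    (disable); all other bits are left untouched."""
    mask = 1 << SDCIAE_ENABLE_BIT
    if enable:
        return msr_value | mask
    return msr_value & ~mask

enabled = set_sdciae(0b0000, True)    # 0b0010
disabled = set_sdciae(0b0111, False)  # 0b0101
```

According to the changelog, the kernel applies this change on all logical processors within the cache domain; that per-CPU step is not modelled here.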
Babu Moger	7923ae7698	x86,fs/resctrl: Detect io_alloc feature AMD's SDCIAE (SDCI Allocation Enforcement) PQE feature enables system software to control the portions of L3 cache used for direct insertion of data from I/O devices into the L3 cache. Introduce a generic resctrl cache resource property "io_alloc_capable" as the first part of the new "io_alloc" resctrl feature that will support AMD's SDCIAE. Any architecture can set a cache resource as "io_alloc_capable" if a portion of the cache can be allocated for I/O traffic. Set the "io_alloc_capable" property for the L3 cache resource on x86 (AMD) systems that support SDCIAE. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/df85a9a6081674fd3ef6b4170920485512ce2ded.1762995456.git.babu.moger@amd.com	2025-11-21 22:04:59 +01:00
Babu Moger	4d4840b125	x86/resctrl: Add SDCIAE feature in the command line options Add a kernel command-line parameter to enable or disable the exposure of the L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE) hardware feature to resctrl. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/c623edf7cb369ba9da966de47d9f1b666778a40e.1762995456.git.babu.moger@amd.com	2025-11-21 22:03:23 +01:00
Babu Moger	3767def18f	x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement Smart Data Cache Injection (SDCI) is a mechanism that enables direct insertion of data from I/O devices into the L3 cache. By directly caching data from I/O devices rather than first storing the I/O data in DRAM, SDCI reduces demands on DRAM bandwidth and reduces latency to the processor consuming the I/O data. The SDCIAE (SDCI Allocation Enforcement) PQE feature allows system software to control the portion of the L3 cache used for SDCI. When enabled, SDCIAE forces all SDCI lines to be placed into the L3 cache partitions identified by the highest-supported L3_MASK_n register, where n is the maximum supported CLOSID. Add CPUID feature bit that can be used to configure SDCIAE. The SDCIAE feature details are documented in: AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.4.7 L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE). available at https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/83ca10d981c48e86df2c3ad9658bb3ba3544c763.1762995456.git.babu.moger@amd.com	2025-11-21 22:03:07 +01:00
Smita Koralahalli	5c4663ed1e	x86/mce: Handle AMD threshold interrupt storms Extend the logic of handling CMCI storms to AMD threshold interrupts. Rely on the similar approach as of Intel's CMCI to mitigate storms per CPU and per bank. But, unlike CMCI, do not set thresholds and reduce interrupt rate on a storm. Rather, disable the interrupt on the corresponding CPU and bank. Re-enable back the interrupts if enough consecutive polls of the bank show no corrected errors (30, as programmed by Intel). Turning off the threshold interrupts would be a better solution on AMD systems as other error severities will still be handled even if the threshold interrupts are disabled. [ Tony: Small tweak because mce_handle_storm() isn't a pointer now ] [ Yazen: Rebase and simplify ] [ Avadhut: Remove check to not clear bank's bit in mce_poll_banks and fix checkpatch warnings. ] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251121190542.2447913-3-avadhut.naik@amd.com	2025-11-21 20:41:10 +01:00
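The disable and re-enable policy can be sketched as a small state machine. This is a simplified Python model, not the x86 MCE code: storm detection itself is left out, the class and constant names are hypothetical, and the only number taken from the changelog is the count of 30 consecutive clean polls.

```python
CLEAN_POLLS_TO_END_STORM = 30  # from the changelog; the name is hypothetical

class BankStormModel:
    """Per-CPU, per-bank model: the threshold interrupt is switched off
    during a storm and switched back on after enough clean polls."""

    def __init__(self):
        self.in_storm = False
        self.interrupt_enabled = True
        self.clean_polls = 0

    def begin_storm(self):
        # On a storm, disable the interrupt instead of raising a threshold.
        self.in_storm = True
        self.interrupt_enabled = False
        self.clean_polls = 0

    def poll(self, saw_corrected_error):
        if not self.in_storm:
            return
        if saw_corrected_error:
            self.clean_polls = 0  # the run of clean polls is broken
            return
        self.clean_polls += 1
        if self.clean_polls >= CLEAN_POLLS_TO_END_STORM:
            self.in_storm = False
            self.interrupt_enabled = True

bank = BankStormModel()
bank.begin_storm()
for _ in range(CLEAN_POLLS_TO_END_STORM):
    bank.poll(saw_corrected_error=False)
```

After the loop the modelled bank has left the storm and its interrupt is enabled again; a single poll that sees a corrected error restarts the count.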
Avadhut Naik	d7ac083f09	x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systems Currently, when a CMCI storm detected on a Machine Check bank subsides, the bank's corresponding bit in the mce_poll_banks per-CPU variable is cleared unconditionally by cmci_storm_end(). On AMD SMCA systems, this essentially disables polling on that particular bank on that CPU. Consequently, any subsequent correctable errors or storms will not be logged. Since AMD SMCA systems allow banks to be managed by both polling and interrupts, the polling banks bitmap for a CPU, i.e., mce_poll_banks, should not be modified when a storm subsides. Fixes: `7eae17c4ad` ("x86/mce: Add per-bank CMCI storm mitigation") Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251121190542.2447913-2-avadhut.naik@amd.com	2025-11-21 20:33:12 +01:00
Ma Ke	ef1b6d9049	EDAC/igen6: Fix error handling in igen6_edac driver The igen6_edac driver calls device_initialize() for all memory controllers in igen6_register_mci(), but misses corresponding put_device() calls in error paths and during normal shutdown in igen6_unregister_mcis(). Adding the missing put_device() calls improves code readability and ensures proper reference counting for the device structure. Found by code review. Signed-off-by: Ma Ke <make24@iscas.ac.cn> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20251105090244.23327-1-make24@iscas.ac.cn	2025-11-21 10:20:51 -08:00
Qiuxu Zhuo	5f40ea7f41	EDAC/imh: Setup 'imh_test' debugfs testing node Setup the following debugfs testing node to enable fake memory error address decoding tests for the imh_edac driver. /sys/kernel/debug/edac/imh_test/addr Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20251119134132.2389472-8-qiuxu.zhuo@intel.com	2025-11-21 10:20:51 -08:00
Qiuxu Zhuo	f619613f30	EDAC/{skx_comm,imh}: Detect 2-level memory configuration Detect 2-level memory configurations and notify the 'skx_common' library to enable ADXL 2-level memory error decoding. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20251119134132.2389472-7-qiuxu.zhuo@intel.com	2025-11-21 10:20:51 -08:00
Qiuxu Zhuo	39abdcbdad	EDAC/skx_common: Extend the maximum number of DRAM chip row bits The allowed maximum number of row bits for DRAM chips in the Diamond Rapids server processor is up to 19. Extend the current maximum row bits from 18 to 19. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20251119134132.2389472-6-qiuxu.zhuo@intel.com	2025-11-21 10:20:51 -08:00
Qiuxu Zhuo	9fc67b1170	EDAC/{skx_common,imh}: Add EDAC driver for Intel Diamond Rapids servers Intel Diamond Rapids CPUs include Integrated Memory and I/O Hubs (IMH). The memory controllers within the IMHs provide memory stacks to the processor. Create a new driver for these IMH-based memory controllers rather than applying additional patches to the existing i10nm_edac.c for the following reasons: 1) The memory controllers are not presented as PCI devices; instead, the detection and all their registers have been transitioned to MMIO-based memory spaces. 2) Validation processes are costly. Modifications to i10nm_edac would require extensive validation checks against multiple platforms, including Ice Lake, Sapphire Rapids, Emerald Rapids, Granite Rapids, Sierra Forest, and Grand Ridge. 3) Future Intel CPUs will likely only need patches on top of this new EDAC driver. Validation can be limited to Diamond Rapids servers and future Intel CPU generations. [Tony: Fix kerneldoc for struct local_reg] [randconfig: Added dependencies on NFIT and DMI] Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20251119134132.2389472-5-qiuxu.zhuo@intel.com	2025-11-21 10:19:43 -08:00
Avadhut Naik	821f5fe4db	x86/mce: Add support for physical address valid bit Starting with Zen6, AMD's Scalable MCA systems will incorporate two new bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a valid System Physical Address (SPA) is present in MCA_ADDR. PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or if it indicates validity of SPA in MCA_ADDR. PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid SPA or if it is implementation specific. Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251118191731.181269-1-avadhut.naik@amd.com	2025-11-21 10:32:28 +01:00
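The relationship between the two bits can be written down as a short check. This is a sketch under stated assumptions: the bit positions come from the changelog, while the constant names, the function name, and the convention of returning `None` when PhysAddrV is reserved (so that a caller would fall back to its existing checks) are hypothetical.

```python
MCA_CONFIG_PHYS_ADDR_VALID_SUPPORTED = 1 << 11  # MCA_CONFIG[11]
MCA_STATUS_PHYS_ADDR_V = 1 << 54                # MCA_STATUS[54]

def spa_validity(mca_config, mca_status):
    """Return True or False when PhysAddrV is architecturally defined,
    or None when the bit is reserved and carries no meaning."""
    if not (mca_config & MCA_CONFIG_PHYS_ADDR_VALID_SUPPORTED):
        return None  # PhysAddrV is reserved on this bank
    return bool(mca_status & MCA_STATUS_PHYS_ADDR_V)

supported = MCA_CONFIG_PHYS_ADDR_VALID_SUPPORTED
valid = spa_validity(supported, MCA_STATUS_PHYS_ADDR_V)
not_valid = spa_validity(supported, 0)
reserved = spa_validity(0, MCA_STATUS_PHYS_ADDR_V)
```

The three calls cover the cases the changelog distinguishes: a valid SPA, an implementation-specific MCA_ADDR, and a bank where the bit is reserved.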
Yazen Ghannam	eeb3f76d73	x86/mce: Save and use APEI corrected threshold limit The MCA threshold limit generally is not something that needs to change during runtime. It is common for a system administrator to decide on a policy for their managed systems. If MCA thresholding is OS-managed, then the threshold limit must be set at every boot. However, many systems allow the user to set a value in their BIOS. And this is reported through an APEI HEST entry even if thresholding is not in FW-First mode. Use this value, if available, to set the OS-managed threshold limit. Users can still override it through sysfs if desired for testing or debug. APEI is parsed after MCE is initialized. So reset the thresholding blocks later to pick up the threshold limit. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com	2025-11-21 10:32:28 +01:00
Ard Biesheuvel	a3e6907128	x86/boot: Drop unused sev_enable() fallback The misc.h header is not included by the EFI stub, which is the only C caller of sev_enable(). This means the fallback for cases where CONFIG_AMD_MEM_ENCRYPT is not set is never used, so it can be dropped. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/20250909080631.2867579-6-ardb+git@google.com	2025-11-20 21:12:48 +01:00
Gabriele Monaco	7dec062cfc	timers/migration: Exclude isolated cpus from hierarchy The timer migration mechanism allows active CPUs to pull timers from idle ones to improve the overall idle time. This is however undesired when CPU intensive workloads run on isolated cores, as the algorithm would move the timers from housekeeping to isolated cores, negatively affecting the isolation. Exclude isolated cores from the timer migration algorithm, extend the concept of unavailable cores, currently used for offline ones, to isolated ones: * A core is unavailable if isolated or offline; * A core is available if non isolated and online; A core is considered unavailable as isolated if it belongs to: * the isolcpus (domain) list * an isolated cpuset Except if it is: * in the nohz_full list (already idle for the hierarchy) * the nohz timekeeper core (must be available to handle global timers) CPUs are added to the hierarchy during late boot, excluding isolated ones, the hierarchy is also adapted when the cpuset isolation changes. Due to how the timer migration algorithm works, any CPU part of the hierarchy can have their global timers pulled by remote CPUs and have to pull remote timers, only skipping pulling remote timers would break the logic. For this reason, prevent isolated CPUs from pulling remote global timers, but also the other way around: any global timer started on an isolated CPU will run there. This does not break the concept of isolation (global timers don't come from outside the CPU) and, if considered inappropriate, can usually be mitigated with other isolation techniques (e.g. IRQ pinning). This effect was noticed on a 128 cores machine running oslat on the isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs, and the CPU with lowest count in a timer migration hierarchy (here 1 and 65) appears as always active and continuously pulls global timers, from the housekeeping CPUs. This ends up moving driver work (e.g. 
delayed work) to isolated CPUs and causes latency spikes: before the change: # oslat -c 1-31,33-63,65-95,97-127 -D 62s ... Maximum: 1203 10 3 4 ... 5 (us) after the change: # oslat -c 1-31,33-63,65-95,97-127 -D 62s ... Maximum: 10 4 3 4 3 ... 5 (us) The same behaviour was observed on a machine with as few as 20 cores / 40 threads with isolcpus set to: 1-9,11-39 with rtla-osnoise-top. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: John B. Wyatt IV <jwyatt@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/20251120145653.296659-8-gmonaco@redhat.com	2025-11-20 20:17:32 +01:00
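The availability rules listed in this changelog can be restated as a predicate. This is a plain Python sketch of the rules as written, not the timer migration code; the function name, its parameters and the example CPU sets are hypothetical.

```python
def tmigr_cpu_available(cpu, online, isolcpus_domain, cpuset_isolated,
                        nohz_full, timekeeper):
    """A CPU is available if it is online and not treated as isolated.
    A CPU in the isolcpus (domain) list or in an isolated cpuset counts as
    isolated unless it is nohz_full or is the nohz timekeeper."""
    if cpu not in online:
        return False
    isolated = cpu in isolcpus_domain or cpu in cpuset_isolated
    if isolated and (cpu in nohz_full or cpu == timekeeper):
        isolated = False
    return not isolated

online = {0, 1, 2, 3, 4, 5, 6}   # CPU 7 is offline
isolcpus_domain = {1, 2, 3}
cpuset_isolated = {4}
nohz_full = {3}
timekeeper = 1

available = {cpu for cpu in range(8)
             if tmigr_cpu_available(cpu, online, isolcpus_domain,
                                    cpuset_isolated, nohz_full, timekeeper)}
```

With these example sets the available CPUs are 0, 1, 3, 5 and 6: CPU 1 is kept as the timekeeper, CPU 3 because it is nohz_full, while CPUs 2 and 4 are isolated and CPU 7 is offline.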
Yury Norov	b56651007f	cpumask: Add initialiser to use cleanup helpers Now we can simplify code that allocates cpumasks for local needs. Automatic variables have to be initialized at declaration, or at least before any possibility for the logic to return, so that the compiler wouldn't try to call an associated destructor function on a random stack number. Because cpumask_var_t, depending on the CPUMASK_OFFSTACK config, is either a pointer or an array, we have to have a macro for initialization. So define a CPUMASK_VAR_NULL macro, which allows to init struct cpumask pointer with NULL when CPUMASK_OFFSTACK is enabled, and effectively a no-op when CPUMASK_OFFSTACK is disabled (initialisation optimised out with -O2). Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/20251120145653.296659-7-gmonaco@redhat.com	2025-11-20 20:17:32 +01:00
Gabriele Monaco	185bccc797	sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any Currently the user can set up isolcpus and nohz_full in such a way that leaves no housekeeping CPU (i.e. no CPU that is neither domain isolated nor nohz full). This can be a problem for other subsystems (e.g. the timer wheel migration). Prevent this configuration by invalidating the last setting in case the union of isolcpus (domain) and nohz_full covers all CPUs. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/20251120145653.296659-6-gmonaco@redhat.com	2025-11-20 20:17:31 +01:00
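The rule for rejecting a configuration can be sketched with set operations. This is an illustrative Python model, not the sched/isolation code; `apply_isolation_setting()` and the example CPU sets are hypothetical, and "invalidating the last setting" is modelled as ignoring the newest set.

```python
def apply_isolation_setting(all_cpus, earlier, latest):
    """Return the set of non-housekeeping CPUs to keep. If the union of
    both settings would cover every CPU, the latest setting is dropped so
    that at least one housekeeping CPU remains."""
    combined = set(earlier) | set(latest)
    if combined >= set(all_cpus):
        return set(earlier)  # invalidate the last setting
    return combined

all_cpus = range(4)

# isolcpus=1-3 alone leaves CPU 0 for housekeeping, so it is accepted.
kept = apply_isolation_setting(all_cpus, set(), {1, 2, 3})

# Adding nohz_full=0 would leave no housekeeping CPU, so it is ignored.
rejected = apply_isolation_setting(all_cpus, {1, 2, 3}, {0})
```

Both results are CPUs 1-3, which leaves CPU 0 as the housekeeping CPU in either case.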
Gabriele Monaco	22f8e41680	cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() update_unbound_workqueue_cpumask() updates unbound workqueues settings when there's a change in isolated CPUs, but it can be used for other subsystems requiring updates when isolated CPUs change. Generalise the name to update_isolation_cpumasks() to prepare for other functions unrelated to workqueues to be called in that spot. [longman: Change the function name to update_isolation_cpumasks()] Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Chen Ridong <chenridong@huaweicloud.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://patch.msgid.link/20251120145653.296659-5-gmonaco@redhat.com	2025-11-20 20:17:31 +01:00
Gabriele Monaco	4c2374ed86	timers/migration: Use scoped_guard on available flag set/clear Cleanup tmigr_clear_cpu_available() and tmigr_set_cpu_available() to prepare for easier checks on the available flag. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251120145653.296659-4-gmonaco@redhat.com	2025-11-20 20:17:31 +01:00
Gabriele Monaco	a048ca5f00	timers/migration: Add mask for CPUs available in the hierarchy Keep track of the CPUs available for timer migration in a cpumask. This prepares the ground to generalise the concept of unavailable CPUs. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251120145653.296659-3-gmonaco@redhat.com	2025-11-20 20:17:31 +01:00
Gabriele Monaco	8312cab5ff	timers/migration: Rename 'online' bit to 'available' The timer migration hierarchy excludes offline CPUs via the tmigr_is_not_available function, which is essentially checking the online bit for the CPU. Rename the online bit to available and all references in function names and tracepoint to generalise the concept of available CPUs. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251120145653.296659-2-gmonaco@redhat.com	2025-11-20 20:17:31 +01:00
Cai Xinchen	f20810157f	arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT The commit `e7bafbf717` ("arm64: mm: Add top-level dispatcher for internal mem_encrypt API") adds ARCH_HAS_MEM_ENCRYPT. And then the commit `42be24a417` ("arm64: Enable memory encrypt for Realms") adds duplicate config. Just remove it. Fixes: `42be24a417` ("arm64: Enable memory encrypt for Realms") Signed-off-by: Cai Xinchen <caixinchen1@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-20 17:57:23 +00:00
Yang Shi	a06494adb7	arm64: mm: use untagged address to calculate page index Nathan Chancellor reported the below bug: [ 0.149929] BUG: KASAN: invalid-access in change_memory_common+0x258/0x2d0 [ 0.151006] Read of size 8 at addr f96680000268a000 by task swapper/0/1 [ 0.152031] [ 0.152274] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc1-00012-g37cb0aab9068 #1 PREEMPT [ 0.152288] Hardware name: linux,dummy-virt (DT) [ 0.152292] Call trace: [ 0.152295] show_stack+0x18/0x30 (C) [ 0.152309] dump_stack_lvl+0x60/0x80 [ 0.152320] print_report+0x480/0x498 [ 0.152331] kasan_report+0xac/0xf0 [ 0.152343] kasan_check_range+0x90/0xb0 [ 0.152353] __hwasan_load8_noabort+0x20/0x34 [ 0.152364] change_memory_common+0x258/0x2d0 [ 0.152375] set_memory_ro+0x18/0x24 [ 0.152386] bpf_prog_pack_alloc+0x200/0x2e8 [ 0.152397] bpf_jit_binary_pack_alloc+0x78/0x188 [ 0.152409] bpf_int_jit_compile+0xa4c/0xc74 [ 0.152420] bpf_prog_select_runtime+0x1c0/0x2bc [ 0.152430] bpf_prepare_filter+0x5a4/0x7c0 [ 0.152443] bpf_prog_create+0xa4/0x100 [ 0.152454] ptp_classifier_init+0x80/0xd0 [ 0.152465] sock_init+0x12c/0x178 [ 0.152474] do_one_initcall+0xa0/0x260 [ 0.152484] kernel_init_freeable+0x2d8/0x358 [ 0.152495] kernel_init+0x20/0x140 [ 0.152510] ret_from_fork+0x10/0x20 It is because the KASAN tagged address was used when calculating the page index. The untagged address should be used. Fixes: `37cb0aab90` ("arm64: mm: make linear mapping permission update more robust for patial range") Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-20 17:57:19 +00:00
Thomas Gleixner	79c11fb3da	sched/mmcid: Use cpumask_weighted_or() Use cpumask_weighted_or() instead of cpumask_or() and cpumask_weight() on the result, which walks the same bitmap twice. Results in 10-20% less cycles, which reduces the runqueue lock hold time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Link: https://patch.msgid.link/20251119172549.511736272@linutronix.de	2025-11-20 12:14:54 +01:00
Thomas Gleixner	437cb3ded2	cpumask: Introduce cpumask_weighted_or() CID management OR's two cpumasks and then calculates the weight on the result. That's inefficient as that has to walk the same stuff twice. As this is done with runqueue lock held, there is a real benefit of speeding this up. Depending on the system this results in 10-20% less cycles spent with runqueue lock held for a 4K cpumask. Provide cpumask_weighted_or() and the corresponding bitmap functions which return the weight of the OR result right away. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.448263340@linutronix.de	2025-11-20 12:14:54 +01:00
Thomas Gleixner	0d032a43eb	sched/mmcid: Prevent pointless work in mm_update_cpus_allowed() mm_update_cpus_allowed() is not required to be invoked for affinity changes due to migrate_disable() and migrate_enable(). migrate_disable() restricts the task temporarily to a CPU on which the task was already allowed to run, so nothing changes. migrate_enable() restores the actual task affinity mask. If that mask changed between migrate_disable() and migrate_enable() then that change was already accounted for. Move the invocation to the proper place to avoid that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.385208276@linutronix.de	2025-11-20 12:14:54 +01:00
Thomas Gleixner	b08ef5fc8f	sched/mmcid: Move scheduler code out of global header This is only used in the scheduler core code, so there is no point to have it in a global header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Link: https://patch.msgid.link/20251119172549.321259077@linutronix.de	2025-11-20 12:14:53 +01:00
Thomas Gleixner	925b7847bb	sched: Fixup whitespace damage With whitespace checks enabled in the editor this makes eyes bleed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.258651925@linutronix.de	2025-11-20 12:14:53 +01:00
Thomas Gleixner	be4463fa2c	sched/mmcid: Cacheline align MM CID storage Both the per CPU storage and the data in mm_struct are heavily used in context switch. As they can end up next to other frequently modified data, they are subject to false sharing. Make them cache line aligned. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.194111661@linutronix.de	2025-11-20 12:14:53 +01:00
Thomas Gleixner	8cea569ca7	sched/mmcid: Use proper data structures Having a lot of CID functionality specific members in struct task_struct and struct mm_struct is not really making the code easier to read. Encapsulate the CID specific parts in data structures and keep them separate from the stuff they are embedded in. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.131573768@linutronix.de	2025-11-20 12:14:52 +01:00
Thomas Gleixner	77d7dc8bef	sched/mmcid: Revert the complex CID management The CID management is a complex beast, which affects both scheduling and task migration. The compaction mechanism forces random tasks of a process into task work on exit to user space, causing latency spikes. Revert to the initial simple bitmap allocation mechanics, which are known to have scalability issues, because that allows replacement functionality to be built up gradually in a reviewable way. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.068197830@linutronix.de	2025-11-20 12:14:52 +01:00
Qiuxu Zhuo	d4839582bc	EDAC/skx_common: Prepare for skx_set_hi_lo() The upcoming imh_edac driver for Intel Diamond Rapids servers cannot use skx_get_hi_lo() in skx_common to retrieve the TOHM (Top of High Memory) and TOLM (Top of Low Memory) parameters. Instead, it obtains these parameters within its own EDAC driver. To accommodate this, prepare skx_set_hi_lo() to allow the driver to notify skx_common of these parameters. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20251119134132.2389472-4-qiuxu.zhuo@intel.com	2025-11-19 12:11:40 -08:00
Qiuxu Zhuo	9529e69773	EDAC/skx_common: Prepare for skx_get_edac_list() The Intel EDAC library 'skx_common' maintains the Intel server EDAC device list for {skx, i10nm}_edac drivers, which use skx_get_all_bus_mappings() to build and retrieve the EDAC device list. However, the upcoming Intel EDAC driver, imh_edac, for Diamond Rapids servers is designed for memory controllers that are MMIO-based devices rather than PCI devices. Consequently, it can't use skx_get_all_bus_mappings() due to the absence of a PCI bus. To accommodate this, prepare skx_get_edac_list() to enable the upcoming imh_edac driver to obtain the EDAC device list from the skx_common library and build the EDAC device list independently. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20251119134132.2389472-3-qiuxu.zhuo@intel.com	2025-11-19 12:11:40 -08:00
Qiuxu Zhuo	b3d70059cb	EDAC/{skx_common,skx,i10nm}: Make skx_register_mci() independent of pci_dev Memory controllers in the new Intel server CPUs, such as Diamond Rapids, are presented as MMIO-based devices rather than PCI devices. Modify skx_register_mci() to be independent of 'pci_dev' and use a generic 'dev' of 'struct device' to prepare for support of such MMIO-based memory controllers. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20251119134132.2389472-2-qiuxu.zhuo@intel.com	2025-11-19 12:11:40 -08:00
Ben Horgan	ce1e1421f8	MAINTAINERS: new entry for MPAM Driver Create a maintainer entry for the new MPAM Driver. Add myself and James Morse as maintainers. James created the driver and I have taken up the later versions of his series. Cc: James Morse <james.morse@arm.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:24 +00:00
James Morse	2557e0eafe	arm_mpam: Add kunit tests for props_mismatch() When features are mismatched between MSC the way features are combined to the class determines whether resctrl can support this SoC. Add some tests to illustrate the sort of thing that is expected to work, and those that must be removed. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:24 +00:00
James Morse	e3565d1fd4	arm_mpam: Add kunit test for bitmap reset The bitmap reset code has been a source of bugs. Add a unit test. This currently has to be built in, as the rest of the driver is builtin. Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:24 +00:00
James Morse	201d96ca4c	arm_mpam: Add helper to reset saved mbwu state resctrl expects to reset the bandwidth counters when the filesystem is mounted. To allow this, add a helper that clears the saved mbwu state. Instead of cross calling to each CPU that can access the component MSC to write to the counter, set a flag that causes it to be zero'd on the next read. This is easily done by forcing a configuration update. Signed-off-by: James Morse <james.morse@arm.com> Cc: Peter Newman <peternewman@google.com> Reviewed-by: Fenghua Yu <fenghuay@nvdia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:24 +00:00
Rohit Mathew	9e5afb7c32	arm_mpam: Use long MBWU counters if supported Now that the larger counter sizes are probed, make use of them. Callers of mpam_msmon_read() may not know (or care!) about the different counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the driver pick the counter to use. Only 32bit accesses to the MSC are required to be supported by the spec, but these registers are 64bits. The lower half may overflow into the higher half between two 32bit reads. To avoid this, use a helper that reads the top half multiple times to check for overflow. Signed-off-by: Rohit Mathew <rohit.mathew@arm.com> [morse: merged multiple patches from Rohit, added explicit counter selection ] Signed-off-by: James Morse <james.morse@arm.com> Cc: Peter Newman <peternewman@google.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:24 +00:00
Rohit Mathew	fdc29a141d	arm_mpam: Probe for long/lwd mbwu counters mpam v0.1 and versions above v1.0 support optional long counter for memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields indicating support for long counters. Probe these feature bits. The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth monitors are supported, instead of muddling this with which size of bandwidth monitors, add an explicit 31 bit counter feature. Signed-off-by: Rohit Mathew <rohit.mathew@arm.com> [ morse: Added 31bit counter feature to simplify later logic ] Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:23 +00:00
Ben Horgan	b353637932	arm_mpam: Consider overflow in bandwidth counter state Use the overflow status bit to track overflow on each bandwidth counter read and add the counter size to the correction when overflow is detected. This assumes that only a single overflow has occurred since the last read of the counter. Overflow interrupts, on hardware that supports them could be used to remove this limitation. Cc: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:23 +00:00
James Morse	41e8a14950	arm_mpam: Track bandwidth counter state for power management Bandwidth counters need to run continuously to correctly reflect the bandwidth. Save the counter state when the hardware is reset due to CPU hotplug. Add struct mbwu_state to track the bandwidth counter. Support for tracking overflow with the same structure will be added in a subsequent commit. Cc: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:23 +00:00
James Morse	823e7c3712	arm_mpam: Add mpam_msmon_read() to read monitor value Reading a monitor involves configuring what you want to monitor, and reading the value. Components made up of multiple MSC may need values from each MSC. MSCs may take time to configure, returning 'not ready'. The maximum 'not ready' time should have been provided by firmware. Add mpam_msmon_read() to hide all this. If (one of) the MSC returns not ready, then wait the full timeout value before trying again. CC: Shanker Donthineni <sdonthineni@nvidia.com> Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:23 +00:00
James Morse	c891bae664	arm_mpam: Add helpers to allocate monitors MPAM's MSC support a number of monitors, each of which supports bandwidth counters, or cache-storage-utilisation counters. To use a counter, a monitor needs to be configured. Add helpers to allocate and free CSU or MBWU monitors. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:22 +00:00
James Morse	880df85d86	arm_mpam: Probe and reset the rest of the features MPAM supports more features than are going to be exposed to resctrl. For partid other than 0, the reset values of these controls isn't known. Discover the rest of the features so they can be reset to avoid any side effects when resctrl is in use. PARTID narrowing allows MSC/RIS to support less configuration space than is usable. If this feature is found on a class of device we are likely to use, then reduce the partid_max to make it usable. This allows us to map a PARTID to itself. CC: Rohit Mathew <Rohit.Mathew@arm.com> CC: Zeng Heng <zengheng4@huawei.com> CC: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:22 +00:00
James Morse	09b89d2a72	arm_mpam: Allow configuration to be applied and restored during cpu online When CPUs come online the MSC's original configuration should be restored. Add struct mpam_config to hold the configuration. For each component, this has a bitmap of features that have been changed from the reset values. The mpam_config is also used on RIS reset where all bits are set to ensure all features are reset. Once the maximum partid is known, allocate a configuration array for each component, and reprogram each RIS configuration from this. CC: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Cc: Fujitsu Fujitsu <Shaopeng Tan tan.shaopeng@fujitsu.com> Cc: Peter Newman peternewman@google.com Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:22 +00:00
James Morse	3796f75aa7	arm_mpam: Use a static key to indicate when mpam is enabled Once all the MSC have been probed, the system wide usable number of PARTID is known and the configuration arrays can be allocated. After this point, checking all the MSC have been probed is pointless, and the cpuhp callbacks should restore the configuration, instead of just resetting the MSC. Add a static key to enable this behaviour. This will also allow MPAM to be disabled in response to an error, and the architecture code to enable/disable the context switch of the MPAM system registers. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:22 +00:00
James Morse	49aa621c4d	arm_mpam: Register and enable IRQs Register and enable error IRQs. All the MPAM error interrupts indicate a software bug, e.g. out of range partid. If the error interrupt is ever signalled, attempt to disable MPAM. Only the irq handler accesses the MPAMF_ESR register, so no locking is needed. The work to disable MPAM after an error needs to happen in process context as it takes a mutex. It also unregisters the interrupts, meaning it can't be done from the threaded part of a threaded interrupt. Instead, mpam_disable() gets scheduled. Enabling the IRQs in the MSC may involve cross calling to a CPU that can access the MSC. Once the IRQ is requested, the mpam_disable() path can be called asynchronously, which will walk structures sized by max_partid. Ensure this size is fixed before the interrupt is requested. CC: Rohit Mathew <rohit.mathew@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Rohit Mathew <rohit.mathew@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:22 +00:00
James Morse	3bd04fe7d8	arm_mpam: Extend reset logic to allow devices to be reset any time cpuhp callbacks aren't the only time the MSC configuration may need to be reset. Resctrl has an API call to reset a class. If an MPAM error interrupt arrives it indicates the driver has misprogrammed an MSC. The safest thing to do is reset all the MSCs and disable MPAM. Add a helper to reset RIS via their class. Call this from mpam_disable(), which can be scheduled from the error interrupt handler. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:22 +00:00
James Morse	475228d15d	arm_mpam: Add a helper to touch an MSC from any CPU Resetting RIS entries from the cpuhp callback is easy as the callback occurs on the correct CPU. This won't be true for any other caller that wants to reset or configure an MSC. Add a helper that schedules the provided function if necessary. Callers should take the cpuhp lock to prevent the cpuhp callbacks from changing the MSC state. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:21 +00:00
James Morse	f188a36ca2	arm_mpam: Reset MSC controls from cpuhp callbacks When a CPU comes online, it may bring a newly accessible MSC with it. Only the default partid has its value reset by hardware, and even then the MSC might not have been reset since its config was previously dirtied. e.g. Kexec. Any in-use partid must have its configuration restored, or reset. In-use partids may be held in caches and evicted later. MSC are also reset when CPUs are taken offline to cover cases where firmware doesn't reset the MSC over reboot using UEFI, or kexec where there is no firmware involvement. If the configuration for a RIS has not been touched since it was brought online, it does not need resetting again. To reset, write the maximum values for all discovered controls. CC: Rohit Mathew <Rohit.Mathew@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:21 +00:00
James Morse	c10ca83a77	arm_mpam: Merge supported features during mpam_enable() into mpam_class To make a decision about whether to expose an mpam class as a resctrl resource we need to know its overall supported features and properties. Once we've probed all the resources, we can walk the tree and produce overall values by merging the bitmaps. This eliminates features that are only supported by some MSC that make up a component or class. If bitmap properties are mismatched within a component we cannot support the mismatched feature. Care has to be taken as vMSC may hold mismatched RIS. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:21 +00:00
James Morse	8c90dc68a5	arm_mpam: Probe the hardware features resctrl supports Expand the probing support with the control and monitor types we can use with resctrl. CC: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:21 +00:00
James Morse	d02beb06ca	arm_mpam: Add helpers for managing the locking around the mon_sel registers The MSC MON_SEL register needs to be accessed from hardirq for the overflow interrupt, and when taking an IPI to access these registers on platforms where MSC are not accessible from every CPU. This makes an irqsave spinlock the obvious lock to protect these registers. On systems with SCMI or PCC mailboxes it must be able to sleep, meaning a mutex must be used. The SCMI or PCC platforms can't support an overflow interrupt, and can't access the registers from hardirq context. Clearly these two can't exist for one MSC at the same time. Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and only support 'real' MMIO platforms. In the future this lock will be split in two allowing SCMI/PCC platforms to take a mutex. Because there are contexts where the SCMI/PCC platforms can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do this now, so that all the error handling on these paths is present. This allows the relevant paths to fail if they are needed on a platform where this isn't possible, instead of having to make explicit checks of the interface type. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:21 +00:00
James Morse	bd221f9f82	arm_mpam: Probe hardware to find the supported partid/pmg values CPUs can generate traffic with a range of PARTID and PMG values, but each MSC may also have its own maximum size for these fields. Before MPAM can be used, the driver needs to probe each RIS on each MSC, to find the system-wide smallest value that can be used. The limits from requestors (e.g. CPUs) also need taking into account. While doing this, RIS entries that firmware didn't describe are created under MPAM_CLASS_UNKNOWN. This adds the low level MSC write accessors. While we're here, implement the mpam_register_requestor() call for the arch code to register the CPU limits. Future callers of this will tell us about the SMMU and ITS. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:20 +00:00
James Morse	8f8d0ac1da	arm_mpam: Add cpuhp callbacks to probe MSC hardware Because an MSC can only be accessed from the CPUs in its cpu-affinity set we need to be running on one of those CPUs to probe the MSC hardware. Do this work in the cpuhp callback. Probing the hardware will only happen before MPAM is enabled, walk all the MSCs and probe those we can reach that haven't already been probed as each CPU's online call is made. This adds the low-level MSC register read accessors. Once all MSCs reported by the firmware have been probed from a CPU in their respective cpu-affinity set, the probe-time cpuhp callbacks are replaced. The replacement callbacks will ultimately need to handle save/restore of the runtime MSC state across power transitions, but for now there is nothing to do in them: so do nothing. The architecture's context switch code will be enabled by a static-key, this can be set by mpam_enable(), but must be done from process context, not a cpuhp callback because both take the cpuhp lock. Whenever a new MSC has been probed, the mpam_enable() work is scheduled to test if all the MSCs have been probed. If probing fails, mpam_disable() is scheduled to unregister the cpuhp callbacks and free memory. 
CC: Lecopzer Chen <lecopzerc@nvidia.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:20 +00:00
James Morse	aa64b9e110	arm_mpam: Add MPAM MSC register layout definitions Memory Partitioning and Monitoring (MPAM) has memory mapped devices (MSCs) with an identity/configuration page. Add the definitions for these registers as offset within the page(s). Link: https://developer.arm.com/documentation/ihi0099/aa/ Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:20 +00:00
James Morse	01fb4b8224	arm_mpam: Add the class and component structures for firmware described ris An MSC is a container of resources, each identified by their RIS index. Some RIS are described by firmware to provide their position in the system. Others are discovered when the driver probes the hardware. To configure a resource it needs to be found by its class, e.g. 'L2'. There are two kinds of grouping, a class is a set of components, which are visible to user-space as there are likely to be multiple instances of the L2 cache. (e.g. one per cluster or package) Add support for creating and destroying structures to allow a hierarchy of resources to be created. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:20 +00:00
James Morse	f04046f257	arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate Probing MPAM is convoluted. MSCs that are integrated with a CPU may only be accessible from those CPUs, and they may not be online. Touching the hardware early is pointless as MPAM can't be used until the system-wide common values for num_partid and num_pmg have been discovered. Start with driver probe/remove and mapping the MSC. Cc: Carl Worth <carl@os.amperecomputing.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:20 +00:00
James Morse	115c5325be	ACPI / MPAM: Parse the MPAM table Add code to parse the arm64 specific MPAM table, looking up the cache level from the PPTT and feeding the end result into the MPAM driver. This happens in two stages. Platform devices are created first for the MSC devices. Once the driver probes it calls acpi_mpam_parse_resources() to discover the RIS entries the MSC contains. For now the MPAM hook mpam_ris_create() is stubbed out, but will update the MPAM driver with optional discovered data about the RIS entries. CC: Carl Worth <carl@os.amperecomputing.com> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:20 +00:00
Ben Horgan	96f4a4d53e	ACPI: Define acpi_put_table cleanup handler and acpi_get_table_pointer() helper Define a cleanup helper for use with __free to release the acpi table when the pointer goes out of scope. Also, introduce the helper acpi_get_table_pointer() to simplify a commonly used pattern involving acpi_get_table(). These are first used in a subsequent commit. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:19 +00:00
Ben Horgan	f5915600cc	platform: Define platform_device_put cleanup handler Define a cleanup helper for use with __free to destroy platform devices automatically when the pointer goes out of scope. This is only intended to be used in error cases and so should be used with return_ptr() or no_free_ptr() directly to avoid the automatic destruction on success. A first use of this is introduced in a subsequent commit. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:19 +00:00
James Morse	d8bf01d809	arm64: kconfig: Add Kconfig entry for MPAM The bulk of the MPAM driver lives outside the arch code because it largely manages MMIO devices that generate interrupts. The driver needs a Kconfig symbol to enable it. As MPAM is only found on arm64 platforms, the arm64 tree is the most natural home for the Kconfig option. This Kconfig option will later be used by the arch code to enable or disable the MPAM context-switch code, and to register properties of CPUs with the MPAM driver. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> CC: Dave Martin <dave.martin@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:19 +00:00
James Morse	a39a723a6f	ACPI / PPTT: Add a helper to fill a cpumask from a cache_id MPAM identifies CPUs by the cache_id in the PPTT cache structure. The driver needs to know which CPUs are associated with the cache. The CPUs may not all be online, so cacheinfo does not have the information. Add a helper to pull this information out of the PPTT. CC: Rohit Mathew <Rohit.Mathew@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:14 +00:00
James Morse	41a7bb39fe	ACPI / PPTT: Find cache level by cache-id The MPAM table identifies caches by id. The MPAM driver also wants to know the cache level to determine if the platform is of the shape that can be managed via resctrl. Cacheinfo has this information, but only for CPUs that are online. Waiting for all CPUs to come online is a problem for platforms where CPUs are brought online late by user-space. Add a helper that walks every possible cache, until it finds the one identified by cache-id, then return the level. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:09 +00:00
Ben Horgan	cfc085af83	ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure In actbl2.h, acpi_pptt_cache describes the fields in the original Cache Type Structure. In PPTT table version 3 a new field was added at the end, cache_id. This is described in acpi_pptt_cache_v1 but rather than including all v1 fields it just includes this one. In lieu of this being fixed in acpica, introduce acpi_pptt_cache_v1_full to contain all the fields of the Cache Type Structure. Update the existing code to use this new struct. This simplifies the code and removes a non-standard use of ACPI_ADD_PTR. Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:34:01 +00:00
James Morse	eeec7845e9	ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels In acpi_count_levels(), the initial value of *levels passed by the caller is really an implementation detail of acpi_count_levels(), so it is unreasonable to expect the callers of this function to know what to pass in for this parameter. The only sensible initial value is 0, which is what the only upstream caller (acpi_get_cache_info()) passes. Use a local variable for the starting cache level in acpi_count_levels(), and pass the result back to the caller via the function return value. Get rid of the levels parameter, which has no remaining purpose. Fix acpi_get_cache_info() to match. Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:33:56 +00:00
James Morse	796e29b857	ACPI / PPTT: Add a helper to fill a cpumask from a processor container The ACPI MPAM table uses the UID of a processor container specified in the PPTT to indicate the subset of CPUs and cache topology that can access each MPAM System Component (MSC). This information is not directly useful to the kernel. The equivalent cpumask is needed instead. Add a helper to find the processor container by its id, then walk the possible CPUs to fill a cpumask with the CPUs that have this processor container as a parent. CC: Dave Martin <dave.martin@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 18:33:36 +00:00
Huang Ying	cb1fa2e999	arm64, tlbflush: don't TLBI broadcast if page reused in write fault A multi-thread customer workload with large memory footprint uses fork()/exec() to run some external programs every few tens of seconds. When running the workload on an arm64 server machine, it's observed that quite some CPU cycles are spent in the TLB flushing functions. While running the workload on the x86_64 server machine, it's not. This causes the performance on arm64 to be much worse than that on x86_64. During the workload running, after fork()/exec() write-protects all pages in the parent process, memory writing in the parent process will cause a write protection fault. Then the page fault handler will make the PTE/PDE writable if the page can be reused, which is almost always true in the workload. On arm64, to avoid the write protection fault on other CPUs, the page fault handler flushes the TLB globally with TLBI broadcast after changing the PTE/PDE. However, this isn't always necessary. Firstly, it's safe to leave some stale read-only TLB entries as long as they will be flushed finally. Secondly, it's quite possible that the original read-only PTE/PDEs aren't cached in remote TLB at all if the memory footprint is large. In fact, on x86_64, the page fault handler doesn't flush the remote TLB in this situation, which benefits the performance a lot. To improve the performance on arm64, make the write protection fault handler flush the TLB locally instead of globally via TLBI broadcast after making the PTE/PDE writable. If there are stale read-only TLB entries in the remote CPUs, the page fault handler on these CPUs will regard the page fault as spurious and flush the stale TLB entries. To test the patchset, make the usemem.c from vm-scalability (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git) support calling fork()/exec() periodically. 
To mimic the behavior of the customer workload, run usemem with 4 threads, access 100GB memory, and call fork()/exec() every 40 seconds. Test results show that with the patchset the score of usemem improves ~40.6%. The cycles% of TLB flush functions reduces from ~50.5% to ~0.3% in perf profile. Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 16:01:48 +00:00
Huang Ying	79301c7d60	mm: add spurious fault fixing support for huge pmd The page faults may be spurious because of the racy access to the page table. For example, a non-populated virtual page is accessed on 2 CPUs simultaneously, thus the page faults are triggered on both CPUs. However, it's possible that one CPU (say CPU A) cannot find the reason for the page fault if the other CPU (say CPU B) has changed the page table before the PTE is checked on CPU A. Most of the time, the spurious page faults can be ignored safely. However, if the page fault is for the write access, it's possible that a stale read-only TLB entry exists in the local CPU and needs to be flushed on some architectures. This is called the spurious page fault fixing. In the current kernel, there is spurious fault fixing support for pte, but not for huge pmd because no architectures need it. But in the next patch in the series, we will change the write protection fault handling logic on arm64, so that some stale huge pmd entries may remain in the TLB. These entries need to be flushed via the huge pmd spurious fault fixing mechanism. 
Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 16:01:48 +00:00
Reinette Chatre	5a88a6e92b	fs/resctrl: Consider sparse masks when initializing new group's allocation A new resource group is intended to be created with sane defaults. For a cache resource this means all cache portions the new group could possibly allocate into. This includes unused cache portions and shareable cache portions used by other groups and hardware. New resource group creation does not take sparse masks into account. After determining the bitmask reflecting the new group's possible allocations the bitmask is forced to be contiguous even if the system supports sparse masks. For example, a new group could by default allocate into a large portion of cache represented by 0xff0f, but it is instead created with a mask of 0xf. Do not force a contiguous allocation range if the system supports sparse masks. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/abbbb008bc09d982d715e79d3b885c10f92c64e0.1763426240.git.reinette.chatre@intel.com	2025-11-18 21:10:56 +01:00
Sohil Mehta	d5cb957439	x86/cpu: Enable LASS during CPU initialization Linear Address Space Separation (LASS) mitigates a class of side-channel attacks that rely on speculative access across the user/kernel boundary. Enable LASS along with similar security features if the platform supports it. While at it, remove the comment above the SMAP/SMEP/UMIP/LASS setup instead of updating it, as the whole sequence is quite self-explanatory. Some EFI runtime and boot services may rely on 1:1 mappings in the lower half during early boot and even after SetVirtualAddressMap(). To avoid tripping LASS, the initial CR4 programming would need to be delayed until EFI has completely finished entering virtual mode (including efi_free_boot_services()). Also, LASS would need to be temporarily disabled while switching to efi_mm to avoid potential faults on stray runtime accesses. Similarly, legacy vsyscall page accesses are flagged by LASS resulting in a #GP (instead of a #PF). Without LASS, the #PF handler emulates the accesses and returns the appropriate values. Equivalent emulation support is required in the #GP handler with LASS enabled. In case of vsyscall XONLY (execute only) mode, the faulting address is readily available in the RIP which would make it easier to reuse the #PF emulation logic. For now, keep it simple and disable LASS if either of those are compiled in. Though not ideal, this makes it easier to start testing LASS support in some environments. In future, LASS support can easily be expanded to support EFI and legacy vsyscalls. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-9-sohil.mehta%40intel.com	2025-11-18 10:38:27 -08:00
Sohil Mehta	c9129cf0f0	selftests/x86: Update the negative vsyscall tests to expect a #GP Some of the vsyscall selftests expect a #PF when vsyscalls are disabled. However, with LASS enabled, an invalid access results in a SIGSEGV due to a #GP instead of a #PF. One such negative test fails because it is expecting X86_PF_INSTR to be set. Update the failing test to expect either a #GP or a #PF. Also, update the printed messages to show the trap number (denoting the type of fault) instead of assuming a #PF. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-8-sohil.mehta%40intel.com	2025-11-18 10:38:26 -08:00
Alexander Shishkin	42fea0a3a7	x86/traps: Communicate a LASS violation in #GP message A LASS violation typically results in a #GP. With LASS active, any invalid access to user memory (including the first page frame) would be reported as a #GP, instead of a #PF. Unfortunately, the #GP error messages provide limited information about the cause of the fault. This could be confusing for kernel developers and users who are accustomed to the friendly #PF messages. To make the transition easier, enhance the #GP Oops message to include a hint about LASS violations. Also, add a special hint for kernel NULL pointer dereferences to match with the existing #PF message. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-7-sohil.mehta%40intel.com	2025-11-18 10:38:26 -08:00
Sohil Mehta	731d43750c	x86/kexec: Disable LASS during relocate kernel The relocate kernel mechanism uses an identity mapping to copy the new kernel, which leads to a LASS violation when executing from a low address. LASS must be disabled after the original CR4 value is saved because kexec paths that preserve context need to restore CR4.LASS. But, disabling it along with CET during identity_mapped() is too late. So, disable LASS immediately after saving CR4, along with PGE, and before jumping to the identity-mapped page. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-6-sohil.mehta%40intel.com	2025-11-18 10:38:26 -08:00
Sohil Mehta	b3a7e973ab	x86/alternatives: Disable LASS when patching kernel code For patching, the kernel initializes a temporary mm area in the lower half of the address range. LASS blocks these accesses because its enforcement relies on bit 63 of the virtual address as opposed to SMAP which depends on the _PAGE_BIT_USER bit in the page table. Disable LASS enforcement by toggling the RFLAGS.AC bit during patching to avoid triggering a #GP fault. Introduce LASS-specific STAC/CLAC helpers to set the AC bit only on platforms that need it. Name the wrappers as lass_stac()/_clac() instead of lass_disable()/_enable() because they only control the kernel data access enforcement. The entire LASS mechanism (including instruction fetch enforcement) is controlled by the CR4.LASS bit. Describe the usage of the new helpers in comparison to the ones used for SMAP. Also, add comments to explain when the existing stac()/clac() should be used. While at it, move the duplicated "barrier" comment to the same block. The Text poking functions use standard memcpy()/memset() while patching kernel code. However, objtool complains about calling such dynamic functions within an AC=1 region. See warning #9, regarding function calls with UACCESS enabled, in tools/objtool/Documentation/objtool.txt. To pacify objtool, one option is to add memcpy() and memset() to the list of allowed-functions. However, that would provide a blanket exemption for all usages of memcpy() and memset(). Instead, replace the standard calls in the text poking functions with their unoptimized, always-inlined versions. Considering that patching is usually small, there is no performance impact expected. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-5-sohil.mehta%40intel.com	2025-11-18 10:38:26 -08:00
Peter Zijlstra (Intel)	d9a96cc18b	x86/asm: Introduce inline memcpy and memset Provide inline memcpy and memset functions that can be used instead of the GCC builtins when necessary. The immediate use case is for the text poking functions to avoid the standard memcpy()/memset() calls because objtool complains about such dynamic calls within an AC=1 region. See tools/objtool/Documentation/objtool.txt, warning #9, regarding function calls with UACCESS enabled. Some user copy functions such as copy_user_generic() and __clear_user() have similar rep_{movs,stos} usages. But, those are highly specialized and hard to combine or reuse for other things. Define these new helpers for all other usages that need a completely unoptimized, strictly inline version of memcpy() or memset(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-4-sohil.mehta%40intel.com	2025-11-18 10:38:26 -08:00
Sohil Mehta	e39c5387ad	x86/cpu: Add an LASS dependency on SMAP With LASS enabled, any kernel data access to userspace typically results in a #GP, or a #SS in some stack-related cases. When the kernel needs to access user memory, it can suspend LASS enforcement by toggling the RFLAGS.AC bit. Most of these cases are already covered by the stac()/clac() pairs used to avoid SMAP violations. Even though LASS could potentially be enabled independently, it would be very painful without SMAP and the related stac()/clac() calls. There is no reason to support such a configuration because all future hardware with LASS is expected to have SMAP as well. Also, the STAC/CLAC instructions are architected to: #UD - If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. So, make LASS depend on SMAP to conveniently reuse the existing AC bit toggling already in place. Note: Additional STAC/CLAC would still be needed for accesses such as text poking which are not flagged by SMAP. This is because such mappings are in the lower half but do not have the _PAGE_USER bit set which SMAP uses for enforcement. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-3-sohil.mehta%40intel.com	2025-11-18 10:38:26 -08:00
Sohil Mehta	7baadd463e	x86/cpufeatures: Enumerate the LASS feature bits Linear Address Space Separation (LASS) is a security feature that mitigates a class of side-channel attacks relying on speculative access across the user/kernel boundary. Privilege mode based access protection already exists today with paging and features such as SMEP and SMAP. However, to enforce these protections, the processor must traverse the paging structures in memory. An attacker can use timing information resulting from this traversal to determine details about the paging structures, and to determine the layout of the kernel memory. LASS provides the same mode-based protections as paging but without traversing the paging structures. Because the protections are enforced prior to page-walks, an attacker will not be able to derive paging-based timing information from the various caching structures such as the TLBs, mid-level caches, page walker, data caches, etc. LASS enforcement relies on the kernel implementation to divide the 64-bit virtual address space into two halves: Addr[63]=0 -> User address space Addr[63]=1 -> Kernel address space Any data access or code execution across address spaces typically results in a #GP fault, with an #SS generated in some rare cases. The LASS enforcement for kernel data accesses is dependent on CR4.SMAP being set. The enforcement can be disabled by toggling the RFLAGS.AC bit similar to SMAP. Define the CPU feature bits to enumerate LASS. Also, disable the feature at compile time on 32-bit kernels. Use a direct dependency on X86_32 (instead of !X86_64) to make it easier to combine with similar 32-bit specific dependencies in the future. LASS mitigates a class of side-channel speculative attacks, such as Spectre LAM, described in the paper, "Leaky Address Masking: Exploiting Unmasked Spectre Gadgets with Noncanonical Address Translation". Add the "lass" flag to /proc/cpuinfo to indicate that the feature is supported by hardware and enabled by the kernel. 
This allows userspace to determine if the system is secure against such attacks. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Xin Li (Intel) <xin@zytor.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-2-sohil.mehta%40intel.com	2025-11-18 10:38:26 -08:00
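The address split described in the LASS commit above can be illustrated with a small user-space sketch. The names `lass_half_of()` and `lass_kernel_data_access_blocked()` are hypothetical and are not kernel identifiers. The sketch models only the Addr[63] classification and the RFLAGS.AC override mentioned in the commit text; real enforcement is done by the CPU and also depends on CR4.LASS and CR4.SMAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not kernel identifiers. */
enum lass_half { LASS_USER_HALF = 0, LASS_KERNEL_HALF = 1 };

/* Classify a 64-bit linear address by bit 63, as in the commit text. */
static inline enum lass_half lass_half_of(uint64_t addr)
{
	return (addr >> 63) ? LASS_KERNEL_HALF : LASS_USER_HALF;
}

/*
 * Simplified model (an assumption, not the architectural definition):
 * a kernel data access to the user half is blocked unless enforcement
 * is suspended via RFLAGS.AC, modelled here by the ac_set flag.
 */
static inline bool lass_kernel_data_access_blocked(uint64_t addr, bool ac_set)
{
	return lass_half_of(addr) == LASS_USER_HALF && !ac_set;
}
```

Under this model, a low-half address such as a text-poking mapping is blocked with AC clear and allowed with AC set, which is why the series adds STAC/CLAC-style helpers around such accesses.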
Thorsten Blum	cdf5ecc3f6	EDAC/ghes: Replace deprecated strcpy() in ghes_edac_report_mem_error() strcpy() has been deprecated¹ because it performs no bounds checking on the destination buffer, which can lead to buffer overflows. Use the safer strscpy() instead. ¹ https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://patch.msgid.link/20251118135621.101148-2-thorsten.blum@linux.dev	2025-11-18 16:50:32 +01:00
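To illustrate why a bounded copy is preferred over strcpy(), here is a minimal user-space sketch. `bounded_copy()` is a hypothetical stand-in written for this example; it is modelled on the documented strscpy() behaviour (always NUL-terminate, report truncation with -E2BIG) but is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * bounded_copy(): hypothetical helper, modelled on strscpy() semantics.
 * Copies at most size - 1 bytes from src, always NUL-terminates dst when
 * size > 0, and returns the number of bytes copied or -E2BIG if src did
 * not fit.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If the source still has bytes left, the copy was truncated. */
	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Unlike strcpy(), the destination size is part of the call, so an oversized source yields a truncated, terminated string and an error code instead of a buffer overflow.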
Chengkaitao	9d3faec60b	genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier() Since irq_set_affinity_notifier() may sleep, interrupts are enabled. So raw_spinlock_irqsave() can be replaced with raw_spinlock_irq(). Signed-off-by: Chengkaitao <chengkaitao@kylinos.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251118012754.61805-1-pilgrimtao@gmail.com	2025-11-18 16:19:40 +01:00
Dan Carpenter	80adaccf0e	rseq: Delete duplicate if statement in rseq_virt_userspace_exit() This if statement is indented weirdly. It's a duplicate and doesn't affect runtime (beyond wasting a little time). Delete it. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/aRxP3YcwscrP1BU_@stanley.mountain	2025-11-18 15:56:55 +01:00
Christophe Leroy	4322c8f81c	lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required Properly use masked_user_read_access_begin() and masked_user_write_access_begin() instead of masked_user_access_begin() in order to match user_read_access_end() and user_write_access_end(). This is important for architectures like PowerPC that enable user reads and user writes separately. That means masked_user_read_access_begin() is used when user memory is exclusively read during the window and masked_user_write_access_begin() is used when user memory is exclusively written during the window. masked_user_access_begin() remains and is used when both reads and writes are performed during the open window. Each of them is expected to be terminated by the matching user_read_access_end(), user_write_access_end() and user_access_end(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/cb5e4b0fa49ea9c740570949d5e3544423389757.1763396724.git.christophe.leroy@csgroup.eu	2025-11-18 15:27:35 +01:00
Christophe Leroy	1c204914bc	scm: Convert put_cmsg() to scoped user access Replace the open coded implementation with the scoped user access guard. That also corrects the imbalance between masked_user_access_begin() and user_write_access_end(), which would affect PowerPC when it gains masked user access support. No functional change intended. [ tglx: Amend change log ] Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/793219313f641eda09a892d06768d2837246bf9f.1763396724.git.christophe.leroy@csgroup.eu	2025-11-18 15:27:34 +01:00
Christophe Leroy	803abedbd5	iov_iter: Add missing speculation barrier to copy_from_user_iter() The results of "access_ok()" can be mis-speculated. The result is that the CPU can end speculatively: if (access_ok(from, size)) // Right here For the same reason as done in copy_from_user() in commit `74e19ef0ff` ("uaccess: Add speculation barrier to copy_from_user()"), add a speculation barrier to copy_from_user_iter(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/6b73e69cc7168c89df4eab0a216e3ed4cca36b0a.1763396724.git.christophe.leroy@csgroup.eu	2025-11-18 15:27:34 +01:00
Christophe Leroy	4db1df7a72	iov_iter: Convert copy_from_user_iter() to masked user access copy_from_user_iter() lacks a speculation barrier, and adding one will degrade performance on some architectures like x86, which would be unfortunate as copy_from_user_iter() is a critical hotpath function. Convert copy_from_user_iter() to use masked user access on architectures that support it. This allows adding the speculation barrier without impacting performance. This is similar to what was done for copy_from_user() in commit `0fc810ae3a` ("x86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()") [ tglx: Massage change log ] Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/58e4b07d469ca68a2b9477fe2c1ccc8a44cef131.1763396724.git.christophe.leroy@csgroup.eu	2025-11-18 15:27:34 +01:00
Mark Brown	a0245b42f8	kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace On a system which supports SME but not SVE we can now disable streaming mode via ptrace by writing FPSIMD formatted data through NT_ARM_SVE with a VL of 0. Extend fp-ptrace to cover rather than skip these cases, relax the check for SVE writes of FPSIMD format data to not skip if SME is supported and accept 0 as the VL when performing the ptrace write. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-17 20:11:54 +00:00
Mark Brown	eb9df6d69a	kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems In order to allow exiting streaming mode on systems with SME but not SVE we allow writes of FPSIMD format data via NT_ARM_SVE even when SVE is not supported, add a test case that covers this to sve-ptrace. We do not support reads. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-17 20:11:54 +00:00
Mark Brown	472800cd5e	arm64/sme: Support disabling streaming mode via ptrace on SME only systems Currently it is not possible to disable streaming mode via ptrace on SME only systems, the interface for doing this is to write via NT_ARM_SVE but such writes will be rejected on a system without SVE support. Enable this functionality by allowing userspace to write SVE_PT_REGS_FPSIMD format data via NT_ARM_SVE with the vector length set to 0 on SME only systems. Such writes currently error since we require that a vector length is specified which should minimise the risk that existing software is relying on current behaviour. Reads are not supported since I am not aware of any use case for this and there is some risk that an existing userspace application may be confused if it reads NT_ARM_SVE on a system without SVE. Existing kernels will return FPSIMD formatted register state from NT_ARM_SVE if full SVE state is not stored, for example if the task has not used SVE. Returning a vector length of 0 would create a risk that software would try to do things like allocate space for register state with zero sizes, while returning a vector length of 128 bits would look like SVE is supported. It seems safer to just not make the changes to add read support. It remains possible for userspace to detect a SME only system via the ptrace interface only since reads of NT_ARM_SSVE and NT_ARM_ZA will succeed while reads of NT_ARM_SVE will fail. Read/write access to the FPSIMD registers in non-streaming mode is available via REGSET_FPR. sve_set_common() already avoids allocating SVE storage when doing a FPSIMD formatted write and allocating SME storage when doing a NT_ARM_SVE write so we change the function to validate the new case and skip setting a vector length for it. The aim is to make a minimally invasive change, no operation that would previously have succeeded will be affected, and we use a previously defined interface in new circumstances rather than define completely new ABI. 
Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: David Spickett <david.spickett@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-17 20:11:54 +00:00
Peter Oberparleiter	2a2153a2ba	s390/debug: Update description of resize operation With commit `1204777867` ("s390/debug: keep debug data on resize") the behavior of a debug area resize operation was changed. Update the associated documentation to reflect this change. Fixes: `1204777867` ("s390/debug: keep debug data on resize") Reported-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:55:16 +01:00
Heiko Carstens	0d79affa31	Merge branch 'compat-removal' Heiko Carstens says: ==================== Remove s390 compat support to allow for code simplification and especially reduced test effort. To the best of our knowledge there aren't any 31 bit binaries out in the world anymore that would matter for newer kernels or newer distributions. Distributions do not provide compat packages since quite some time or even have CONFIG_COMPAT disabled. Instead of adding deprecation warnings to config option, or adding kernel messages, just remove the code. Deprecation warnings haven't proven to be useful. If it turns out there is still a reason to keep the compat support this series can be reverted at any time in the future. ==================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:11:48 +01:00
Heiko Carstens	4ac286c4a8	s390/syscalls: Switch to generic system call table generation The s390 syscall.tbl format differs slightly from most others, and therefore requires an s390 specific system call table generation script. With compat support gone use the opportunity to switch to generic system call table generation. The abi for all 64 bit system calls is now common, since there is no need to specify if system call entry points are only for 64 bit anymore. Furthermore create the system call table in C instead of assembler code in order to get type checking for all system call functions contained within the table. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:10:39 +01:00
Heiko Carstens	f4e1f1b137	s390/syscalls: Remove system call table pointer from thread_struct With compat support gone there is only one system call table left. Therefore remove the sys_call_table pointer from thread_struct and use the sys_call_table directly. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:10:39 +01:00
Heiko Carstens	3db5cf9354	s390/uapi: Remove 31 bit support from uapi header files Since the kernel does not support running 31 bit / compat binaries anymore, remove also the corresponding 31 bit support from uapi header files. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:10:38 +01:00
Heiko Carstens	8e0b986c59	s390: Remove compat support There shouldn't be any 31 bit code around anymore that matters. Remove the compat layer support required to run 31 bit code. Reason for removal is code simplification and reduced test effort. Note that this comes without any deprecation warnings added to config options, or kernel messages, since most likely those would be ignored anyway. If it turns out there is still a reason to keep the compat layer this can be reverted at any time in the future. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:10:38 +01:00
Heiko Carstens	169ebcbb90	tools: Remove s390 compat support Remove s390 compat support from everything within tools, since s390 compat support will be removed from the kernel. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Weißschuh <linux@weissschuh.net> # tools/nolibc selftests/nolibc Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> # selftests/vDSO Acked-by: Alexei Starovoitov <ast@kernel.org> # bpf bits Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:10:38 +01:00
Heiko Carstens	7afb095df3	s390/syscalls: Add pt_regs parameter to SYSCALL_DEFINE0() syscall wrapper All system call wrappers should match the sys_call_ptr_t type. This is not the case for system calls without parameters. Add the missing pt_regs parameter there too. Note: this is currently not a problem, since the parameter is unused. However, it prevents creating a correctly typed system call table in C. With the current assembler implementation this works because of missing type checking. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:10:38 +01:00
Heiko Carstens	b2da5f6400	s390/kvm: Use psw32_t instead of psw_compat_t kvm_s390_handle_lpsw() makes use of the psw_compat_t type even though the code has nothing to do with CONFIG_COMPAT, for which the type is supposed to be used. Use psw32_t instead. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:10:37 +01:00
Heiko Carstens	8c633c78c2	s390/ptrace: Rename psw_t32 to psw32_t Use a standard "_t" suffix for psw_t32 and rename it to psw32_t. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-17 11:10:37 +01:00
Sean Christopherson	f2f22721ac	x86/sgx: Fix a typo in the kernel-doc comment for enum sgx_attribute Use the exact enum name when documenting "enum sgx_attribute" to fix a warning if the file is fed into kernel-doc processing: WARNING: ./arch/x86/include/asm/sgx.h:139 expecting prototype for enum sgx_attributes. Prototype was for enum sgx_attribute instead Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112160708.1343355-6-seanjc%40google.com	2025-11-14 15:30:32 -08:00
Sean Christopherson	55bf13b612	x86/sgx: Remove superfluous asterisk from copyright comment in asm/sgx.h Drop an asterisk from a file-level copyright comment so that the comment isn't interpreted as a kernel-doc comment. E.g. if arch/x86/include/asm/sgx.h is fed into kernel-doc processing: WARNING: ./arch/x86/include/asm/sgx.h:2 This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112160708.1343355-5-seanjc%40google.com	2025-11-14 15:30:28 -08:00
Sean Christopherson	905885fdb1	x86/sgx: Document structs and enums with '@', not '%' Use '@' to document structure members and enum values in kernel-doc markup, as per Documentation/doc-guide/kernel-doc.rst and flagged by make htmldocs. WARNING: arch/x86/include/uapi/asm/sgx.h:17 Enum value 'SGX_PAGE_MEASURE' not described in enum 'sgx_page_flags' Opportunistically add a missing ':' for SGX_CHILD_PRESENT. Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112160708.1343355-4-seanjc%40google.com	2025-11-14 15:30:26 -08:00
Sean Christopherson	243ea511fe	x86/sgx: Add kernel-doc descriptions for params passed to vDSO user handler Add kernel-doc markup for the register parameters passed by the vDSO blob to the user handler to suppress build warnings, e.g. WARNING: arch/x86/include/uapi/asm/sgx.h:157 function parameter 'r8' not described in 'sgx_enclave_user_handler_t' Call out that except for RSP, the registers are undefined on asynchronous exits as far as the vDSO ABI is concerned. E.g. the vDSO's exception handler clobbers RDX, RDI, and RSI, and the kernel doesn't guarantee that R8 or R9 will be zero (the synthetic value loaded by the CPU). Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112160708.1343355-3-seanjc%40google.com	2025-11-14 15:30:22 -08:00
Sean Christopherson	75801ca620	x86/sgx: Add a missing colon in kernel-doc markup for "struct sgx_enclave_run" Add a missing ':' for the description of sgx_enclave_run.reserved so that documentation for the member is correctly generated: WARNING: arch/x86/include/uapi/asm/sgx.h:184 struct member 'reserved' not described in 'sgx_enclave_run' Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112160708.1343355-2-seanjc%40google.com	2025-11-14 15:30:13 -08:00
Thomas Weißschuh	308bc2e338	selftests/timers/nanosleep: Add tests for return of remaining time If interrupted by a signal clock_nanosleep() returns the remaining time into the structure pointed to by the rmtp parameter. So far this functionality was not tested by the timer selftests. Extend the nanosleep selftest to cover this feature. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251106-nanosleep-rtmp-selftest-v1-1-f9212fb295fe@linutronix.de	2025-11-14 20:34:50 +01:00
Wake Liu	05d89fe7e4	selftests/timers: Clean up kernel version check in posix_timers Several tests in the posix_timers selftest which test timer behavior related to SIG_IGN fail on kernels older than 6.13. This is due to a refactoring of signal handling in commit `caf77435dd` ("signal: Handle ignored signals in do_sigaction(action != SIG_IGN)"). A previous attempt to fix this by adding a kernel version check to each of the nine affected tests was suboptimal, as it resulted in emitting the same skip message nine times. Following the suggestion from Thomas Gleixner, this is refactored to perform a single version check in main(). To satisfy the kselftest framework's requirement for the test count to match the declared plan, the plan is now conditionally set to 10 (for older kernels) or 19. While setting the plan conditionally may seem complex, it is the better approach to avoid the alternatives: either running tests on unsupported kernels that are known to fail, or emitting a noisy series of nine identical skip messages. A single informational message is now printed instead when the tests are skipped. Signed-off-by: Wake Liu <wakel@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250807085042.1690931-1-wakel@google.com/ Link: https://patch.msgid.link/20251103114502.584940-1-wakel@google.com	2025-11-14 20:34:50 +01:00
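The single version check described in the posix_timers commit above can be sketched as a pure function. The helper names are hypothetical and the real selftest's structure may differ; only the 6.13 threshold and the plan sizes 10 and 19 come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if major.minor is at least want_major.want_minor. */
static bool kernel_at_least(int major, int minor, int want_major, int want_minor)
{
	return major > want_major ||
	       (major == want_major && minor >= want_minor);
}

/*
 * Hypothetical helper: choose the test plan once, up front.
 * Kernels older than 6.13 run only the 10 unaffected tests; newer
 * kernels run all 19, including the nine SIG_IGN-related ones.
 */
static int posix_timers_plan(int major, int minor)
{
	return kernel_at_least(major, minor, 6, 13) ? 19 : 10;
}
```

Deciding the plan in one place keeps the declared test count consistent with what actually runs, and lets the program print a single informational message instead of nine identical skips.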
Jianyun Gao	4518767be9	time: Fix a few typos in time[r] related code comments Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20250927093411.1509275-1-jianyungao89@gmail.com	2025-11-14 20:34:50 +01:00
Borislav Petkov (AMD)	e67997021f	x86/bugs: Get rid of the forward declarations Get rid of the forward declarations of the mitigation functions by moving their single caller below them. No functional changes. Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20251105200447.GBaQut3w4dLilZrX-z@fat_crate.local	2025-11-14 20:32:21 +01:00
Sunday Adelodun	e54dd0474c	time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc Several functions in kernel/time/tick-oneshot.c are missing parameter and return value descriptions in their kernel-doc comments. This causes warnings during doc generation. Update the kernel-doc blocks to include detailed @param and Return: descriptions for better clarity and to fix kernel-doc warnings. No functional code changes are made. Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251106113938.34693-3-adelodunolaoluwa@yahoo.com	2025-11-14 20:17:44 +01:00
Thomas Weißschuh	4702f4eceb	hrtimer: Store time as ktime_t in restart block The hrtimer core uses ktime_t to represent times, use that also for the restart block. CPU timers internally use nanoseconds instead of ktime_t but use the same restart block, so use the correct accessors for those. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251110-restart-block-expiration-v1-3-5d39cc93df4f@linutronix.de	2025-11-14 16:31:19 +01:00
Heiko Carstens	52a1f73d17	s390/fault: Print unmodified PSW address on protection exception In case of a kernel crash caused by a protection exception, print the unmodified PSW address as reported by the CPU. The protection exception handler modifies the PSW address in order to keep fault handling easy, however that leads to misleading call traces. Therefore restore the original PSW address before printing it. Before this change the output in case of a protection exception looks like this: Oops: 0004 ilc:2 [#1]SMP Krnl PSW : 0704c00180000000 000003ffe0b40d78 (sysrq_handle_crash+0x28/0x40) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 ... Krnl Code: 000003ffe0b40d66: e3e0f0980024 stg %r14,152(%r15) 000003ffe0b40d6c: c010fffffff2 larl %r1,000003ffe0b40d50 #000003ffe0b40d72: c0200046b6bc larl %r2,000003ffe1417aea >000003ffe0b40d78: 92021000 mvi 0(%r1),2 000003ffe0b40d7c: c0e5ffae03d6 brasl %r14,000003ffe0101528 With this change it looks like this: Oops: 0004 ilc:2 [#1]SMP Krnl PSW : 0704c00180000000 000003ffe0b40dfc (sysrq_handle_crash+0x2c/0x40) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 ... Krnl Code: 000003ffe0b40dec: c010fffffff2 larl %r1,000003ffe0b40dd0 000003ffe0b40df2: c0200046b67c larl %r2,000003ffe1417aea 000003ffe0b40df8: 92021000 mvi 0(%r1),2 >000003ffe0b40dfc: c0e5ffae03b6 brasl %r14,000003ffe0101568 000003ffe0b40e02: 0707 bcr 0,%r7 Note that with this change the PSW address points to the instruction behind the instruction which caused the exception like it is expected for protection exceptions. This also replaces the '#' marker in the disassembly with '', which allows to distinguish between new and old behavior. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:34:28 +01:00
Heiko Carstens	a603a00399	s390/uprobes: Use __forward_psw() instead of private implementation With adjust_psw_addr() the uprobes code contains more or less a private __forward_psw() implementation. Switch it to use __forward_psw(), and remove adjust_psw_addr(). Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:34:28 +01:00
Heiko Carstens	37450e0994	s390/processor: Add __forward_psw() helper Similar to __rewind_psw() add the counterpart __forward_psw(). This helps to make code more readable if a PSW address has to be forwarded, since it is more natural to write addr = __forward_psw(psw, ilen); instead of addr = __rewind_psw(psw, -ilen); This also renames the ilc parameter of __rewind_psw() to ilen, since the parameter reflects an instruction length, and not an instruction length code. Also change the type of ilen from unsigned long to long so it reflects that lengths can be negative or positive. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:34:28 +01:00
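The relationship between the two helpers in the commit above can be sketched in user space as follows. `struct psw_sketch` and both function names are hypothetical simplifications: the real psw_t derives the address mask from the PSW mask bits, whereas this sketch assumes the mask for the current addressing mode is supplied directly.

```c
#include <assert.h>

/* Simplified stand-in for the s390 PSW; field names are illustrative. */
struct psw_sketch {
	unsigned long addr;       /* instruction address */
	unsigned long addr_mask;  /* assumed: mask for the addressing mode */
};

/* Move the PSW address back by ilen bytes, wrapping within the mask. */
static inline unsigned long rewind_psw_sketch(struct psw_sketch psw, long ilen)
{
	return (psw.addr - (unsigned long)ilen) & psw.addr_mask;
}

/* Counterpart: move forward by ilen, i.e. rewind by -ilen. */
static inline unsigned long forward_psw_sketch(struct psw_sketch psw, long ilen)
{
	return rewind_psw_sketch(psw, -ilen);
}
```

Taking ilen as a signed long is what makes the negation well defined at the call site, matching the type change the commit describes.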
Aleksei Nikiforov	14e4e4175b	s390/fpu: Fix false-positive kmsan report in fpu_vstl() A false-positive kmsan report is detected when running ping command. An inline assembly instruction 'vstl' can write varied amount of bytes depending on value of 'index' argument. If 'index' > 0, 'vstl' writes at least 2 bytes. clang generates kmsan write helper call depending on inline assembly constraints. Constraints are evaluated compile-time, but value of 'index' argument is known only at runtime. clang currently generates call to __msan_instrument_asm_store with 1 byte as size. Manually call kmsan function to indicate correct amount of bytes written and fix false-positive report. This change fixes following kmsan reports: [ 36.563119] ===================================================== [ 36.563594] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70 [ 36.563852] virtqueue_add+0x35c6/0x7c70 [ 36.564016] virtqueue_add_outbuf+0xa0/0xb0 [ 36.564266] start_xmit+0x288c/0x4a20 [ 36.564460] dev_hard_start_xmit+0x302/0x900 [ 36.564649] sch_direct_xmit+0x340/0xea0 [ 36.564894] __dev_queue_xmit+0x2e94/0x59b0 [ 36.565058] neigh_resolve_output+0x936/0xb40 [ 36.565278] __neigh_update+0x2f66/0x3a60 [ 36.565499] neigh_update+0x52/0x60 [ 36.565683] arp_process+0x1588/0x2de0 [ 36.565916] NF_HOOK+0x1da/0x240 [ 36.566087] arp_rcv+0x3e4/0x6e0 [ 36.566306] __netif_receive_skb_list_core+0x1374/0x15a0 [ 36.566527] netif_receive_skb_list_internal+0x1116/0x17d0 [ 36.566710] napi_complete_done+0x376/0x740 [ 36.566918] virtnet_poll+0x1bae/0x2910 [ 36.567130] __napi_poll+0xf4/0x830 [ 36.567294] net_rx_action+0x97c/0x1ed0 [ 36.567556] handle_softirqs+0x306/0xe10 [ 36.567731] irq_exit_rcu+0x14c/0x2e0 [ 36.567910] do_io_irq+0xd4/0x120 [ 36.568139] io_int_handler+0xc2/0xe8 [ 36.568299] arch_cpu_idle+0xb0/0xc0 [ 36.568540] arch_cpu_idle+0x76/0xc0 [ 36.568726] default_idle_call+0x40/0x70 [ 36.568953] do_idle+0x1d6/0x390 [ 36.569486] cpu_startup_entry+0x9a/0xb0 [ 36.569745] rest_init+0x1ea/0x290 [ 
36.570029] start_kernel+0x95e/0xb90 [ 36.570348] startup_continue+0x2e/0x40 [ 36.570703] [ 36.570798] Uninit was created at: [ 36.571002] kmem_cache_alloc_node_noprof+0x9e8/0x10e0 [ 36.571261] kmalloc_reserve+0x12a/0x470 [ 36.571553] __alloc_skb+0x310/0x860 [ 36.571844] __ip_append_data+0x483e/0x6a30 [ 36.572170] ip_append_data+0x11c/0x1e0 [ 36.572477] raw_sendmsg+0x1c8c/0x2180 [ 36.572818] inet_sendmsg+0xe6/0x190 [ 36.573142] __sys_sendto+0x55e/0x8e0 [ 36.573392] __s390x_sys_socketcall+0x19ae/0x2ba0 [ 36.573571] __do_syscall+0x12e/0x240 [ 36.573823] system_call+0x6e/0x90 [ 36.573976] [ 36.574017] Byte 35 of 98 is uninitialized [ 36.574082] Memory access of size 98 starts at 0000000007aa0012 [ 36.574218] [ 36.574325] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G B N 6.17.0-dirty #16 NONE [ 36.574541] Tainted: [B]=BAD_PAGE, [N]=TEST [ 36.574617] Hardware name: IBM 3931 A01 703 (KVM/Linux) [ 36.574755] ===================================================== [ 63.532541] ===================================================== [ 63.533639] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70 [ 63.533989] virtqueue_add+0x35c6/0x7c70 [ 63.534940] virtqueue_add_outbuf+0xa0/0xb0 [ 63.535861] start_xmit+0x288c/0x4a20 [ 63.536708] dev_hard_start_xmit+0x302/0x900 [ 63.537020] sch_direct_xmit+0x340/0xea0 [ 63.537997] __dev_queue_xmit+0x2e94/0x59b0 [ 63.538819] neigh_resolve_output+0x936/0xb40 [ 63.539793] ip_finish_output2+0x1ee2/0x2200 [ 63.540784] __ip_finish_output+0x272/0x7a0 [ 63.541765] ip_finish_output+0x4e/0x5e0 [ 63.542791] ip_output+0x166/0x410 [ 63.543771] ip_push_pending_frames+0x1a2/0x470 [ 63.544753] raw_sendmsg+0x1f06/0x2180 [ 63.545033] inet_sendmsg+0xe6/0x190 [ 63.546006] __sys_sendto+0x55e/0x8e0 [ 63.546859] __s390x_sys_socketcall+0x19ae/0x2ba0 [ 63.547730] __do_syscall+0x12e/0x240 [ 63.548019] system_call+0x6e/0x90 [ 63.548989] [ 63.549779] Uninit was created at: [ 63.550691] kmem_cache_alloc_node_noprof+0x9e8/0x10e0 [ 63.550975] kmalloc_reserve+0x12a/0x470 
[ 63.551969] __alloc_skb+0x310/0x860 [ 63.552949] __ip_append_data+0x483e/0x6a30 [ 63.553902] ip_append_data+0x11c/0x1e0 [ 63.554912] raw_sendmsg+0x1c8c/0x2180 [ 63.556719] inet_sendmsg+0xe6/0x190 [ 63.557534] __sys_sendto+0x55e/0x8e0 [ 63.557875] __s390x_sys_socketcall+0x19ae/0x2ba0 [ 63.558869] __do_syscall+0x12e/0x240 [ 63.559832] system_call+0x6e/0x90 [ 63.560780] [ 63.560972] Byte 35 of 98 is uninitialized [ 63.561741] Memory access of size 98 starts at 0000000005704312 [ 63.561950] [ 63.562824] CPU: 3 UID: 0 PID: 192 Comm: ping Tainted: G B N 6.17.0-dirty #16 NONE [ 63.563868] Tainted: [B]=BAD_PAGE, [N]=TEST [ 63.564751] Hardware name: IBM 3931 A01 703 (KVM/Linux) [ 63.564986] ===================================================== Fixes: `dcd3e1de9d` ("s390/checksum: provide csum_partial_copy_nocheck()") Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:34:27 +01:00
Thomas Richter	d17901e8e8	s390/pai: Calculate size of reserved PAI extension control block area The PAI extension 1 control block area is 512 bytes in total. It currently contains three address pointers which refer to counter memory blocks followed by a reserved area. Calculate the reserved area instead of hardcoding its size. This makes the code more readable and maintainable. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Suggested-by: Jan Polensky <japo@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:34:27 +01:00
Heiko Carstens	b60d126c8e	s390/mm: Let dump_fault_info() print additional information Let dump_fault_info() print additional information to make debugging easier: Print "FSI" if the access-exception-fetch/store-indication facility is installed. If it is installed the TEID may also indicate if an exception happened because of a fetch or a store operation. Print "SOP", "ESOP-1", or "ESOP-2" depending on the type of the installed Suppression-on-Protection facility. This also gives additional information about the validity and meaning of the TEID bits. The output is changed from something like: Failing address: 0000000000000000 TEID: 0000000000000803 to Failing address: 0000000000000000 TEID: 0000000000000803 ESOP-2 FSI Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:34:27 +01:00
Heiko Carstens	76502abca2	s390/mm: Change comment and die() message if teid.b61 is zero The comments in do_protection() give the impression that a TEID, where bit 61 is zero, indicates a low address protection exception. This is not necessarily true; what it means depends on the type of Suppression-on-Protection facility of the machine (see Principles of Operation). Rework the comments and the die() message to reflect this. This may also help to avoid confusion. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:34:27 +01:00
Heiko Carstens	02310adcc6	s390/mm: Remove unused flush_tlb() flush_tlb() exists for historic reasons and was never used. Remove it. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:34:27 +01:00
Heiko Carstens	f518d469fe	Merge branch 'pai-pmu-merge' Thomas Richter says: ==================== The PAI PMUs pai_crypto and pai_ext both operate on memory mapped counters supported by z16 and follow on machines. These memory mapped counters have a lot in common, like: - validation, installing and removing events - starting and stopping events - retrieving counter values - collecting sample data. However both PMU drivers have slightly different parameters, for example: - different mapped memory size - different number of supported counters - different counter numbers and names - different bits in the CR0 register - different anchor address in lowcore Due to these different parameters, two independent PMUs have been developed. However both PMU drivers have very much in common and most of the PMU call back functions look very similar and are sometimes identical. This patch set combines both independent PMU device drivers perf_pai_crypto.c and perf_pai_ext.c into one device driver. The new device driver operates on a table which contains the different parameters and uses common functions for event operations. Result is one PAI PMU driver which supports both PMUs. It is also extendable to support new PAI PMUs. ==================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:50 +01:00
Thomas Richter	492578d3a2	s390/pai: Rename perf_pai_crypto.c to perf_pai.c Rename perf_pai_crypto.c to perf_pai.c. The new perf_pai.c contains both PAI device drivers: - pai_crypto for PAI crypto counter set - pai_ext for PAI NNPA counter set The rename reflects this common driver supporting both PMUs. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:08 +01:00
Thomas Richter	8b65b0ba35	s390/pai_crypto: Merge pai_ext PMU into pai_crypto Combine PAI cryptography and PAI extension (NNPA) PMUs in one driver. Remove file perf_pai_ext.c and registration of PMU "pai_ext" from perf_pai_crypto.c. Includes: - Shared alloc/free and sched_task handling - NNPA events with exclude_kernel enforced, exclude_user rejected - Setup CR0 bits for both PMUs Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:07 +01:00
Thomas Richter	3abb6b1675	s390/pai_crypto: Introduce PAI crypto specific event delete function Introduce PAI crypto specific event delete function to handle additional actions to be done at event removal. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:07 +01:00
Thomas Richter	35a27bad07	s390/pai_crypto: Make pai_root per-PMU and unify naming Prepare the common PAI PMU driver to handle multiple PMUs. Convert pai_root into an array indexed by PAI_PMU_IDX(event) so that per-CPU state becomes per-PMU. Adjust all call sites accordingly. Rename KMSG_COMPONENT and the s390dbf buffer from "pai_crypto" to "pai" for consistent naming. No functional change intended beyond log identifiers. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:07 +01:00
Thomas Richter	f124735413	s390/pai_crypto: Rename paicrypt_copy() to pai_copy() To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename paicrypt_copy() to pai_copy() to indicate its common usage. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:07 +01:00
Thomas Richter	42e6a0f6d2	s390/pai_crypto: Add common pai_del() function To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Add a common usable function pai_del() for the event on a CPU. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:07 +01:00
Thomas Richter	ac03223f07	s390/pai_crypto: Add common pai_stop() function To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Add a common usable function pai_stop() for the event on a CPU. Call this common pai_stop() from paicrypt_del(). Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:07 +01:00
Thomas Richter	a65a4d7e80	s390/pai_crypto: Add common pai_add() function To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Add a common usable function pai_add() for the event on a CPU. Call this common pai_add() from paicrypt_add(). Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:06 +01:00
Thomas Richter	6fe66b2157	s390/pai_crypto: Add common pai_start() function To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Add a common usable function pai_start() for the event on a CPU. The function expects a PAI PMU specific read function as second parameter to read out the start value for an event. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:06 +01:00
Thomas Richter	8f6116fd49	s390/pai_crypto: Add common pai_read() function To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Add a common usable function pai_read() to read counter values. The function expects a PAI PMU specific read function as second parameter. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:06 +01:00
Thomas Richter	74466e87e7	s390/pai_crypto: Unify sample push logic and update context handling Unify naming and logic for PAI PMU drivers to support both PMUs pai_crypto and pai_ext. Rename paicrypt_push_sample() to pai_push_sample() to reflect its common usage. Add detailed comments about invocation context and scheduler callbacks. Use struct pai_pmu to determine area_size instead of PAGE_SIZE for counter backup. Remove obsolete variable paicrypt_cnt. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:06 +01:00
Thomas Richter	0f1c0d754a	s390/pai_crypto: Rename paicrypt_have_samples() to pai_have_samples() To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename paicrypt_have_samples() to pai_have_samples() to reflect its common usage. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:06 +01:00
Thomas Richter	360e180d8b	s390/pai_crypto: Rename paicrypt_getctr() to pai_getctr() To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename paicrypt_getctr() to pai_getctr() to reflect its common purpose. pai_getctr() now uses pai_pmu table to extract PAI PMU characteristics such as kernel_offset inside the counter area page. Also rename paicrypt_have_sample() to pai_have_sample(). Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:06 +01:00
Thomas Richter	42cd0c8242	s390/pai_crypto: Rename paicrypt_getdata() to pai_getdata() To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename paicrypt_getdata() to pai_getdata(). Use the PAI PMU characteristics in the pai_pmu table to determine the number of counters to be extracted. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:05 +01:00
Thomas Richter	65b9831bd3	s390/pai_crypto: Rename some functions for common usage. To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename functions - paicrypt_free() -> pai_free() - paicrypt_destroy_event() -> pai_destroy_event() - paicrypt_destroy_event_cpu() -> pai_destroy_event_cpu() to reflect their future common usage. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:05 +01:00
Thomas Richter	413957980a	s390/pai_crypto: Introduce generic event init using pai_pmu[] To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rework PAI crypto event initialization. Add a common function for event initialization. It uses the PAI characteristics stored in the pai_pmu table instead of hardcoded values. Enlarge pai_event_valid() to check all event validation aspects. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:05 +01:00
Thomas Richter	a3f8423622	s390/pai_crypto: Add PAI crypto characteristics table for parameters Create and add a PMU characteristics table to store the parameters of the PAI crypto PMU. This table contains PMU details such as - number of available counters - name of these counters to export to /sysfs - Size of the memory mapped counter area - base number of first counter - etc Also define a PMU specific initialization function to be called when a PAI PMU feature is supported. At device driver initialization test these features and if available use instruction qpaci to retrieve the number of available counters. Also export these counter names to /sysfs and register this PMU. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:05 +01:00
Thomas Richter	387c7b5f04	s390/pai_crypto: Rename paicrypt_root_alloc() and paicrypt_root_free() To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename functions paicrypt_root_alloc() and paicrypt_root_free() to pai_root_alloc() and pai_root_free(). No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:05 +01:00
Thomas Richter	3f082c2e47	s390/pai_crypto: Rename structure paicrypt_root To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename structure paicrypt_root to pai_root. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:04 +01:00
Thomas Richter	a626e0d46a	s390/pai_crypto: Rename structure paicrypt_map to pai_map To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename structure paicrypt_map to pai_map. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:04 +01:00
Thomas Richter	2706ea193a	s390/pai_crypto: Rename structure paicrypt_mapptr to pai_mapptr To support one common PAI PMU device driver which handles both PMUs pai_crypto and pai_ext, use a common naming scheme for structures and variables suitable for both device drivers. Rename structure paicrypt_mapptr to pai_mapptr. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:04 +01:00
Thomas Richter	c124208b74	s390/pai_crypto: Rename member paicrypt_map::page Rename member page in struct paicrypt_map to area. This rename creates consistent naming for both PMU drivers paicrypto and PMU paiext. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:04 +01:00
Thomas Richter	abc524caa1	s390/pai_crypto: Rename variable cfm_dbg The global variable cfm_dbg points to the s390dbf debug buffer. Rename it to paidbg to better reflect its purpose. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-14 11:30:04 +01:00
Sascha Bischoff	a04fbfb8a1	arm64/sysreg: Add ICH_VMCR_EL2 Add the ICH_VMCR_EL2 register, which is required for the upcoming GICv5 KVM support. This register has two different field encodings, based on if it is used for GICv3 or GICv5-based VMs. The GICv5-specific field encodings are generated with a FEAT_GCIE prefix. This register is already described in the GICv3 KVM code directly. This will be ported across to use the generated encodings as part of an upcoming change. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 18:09:46 +00:00
Sascha Bischoff	a0b130eedd	arm64/sysreg: Move generation of RES0/RES1/UNKN to function The RESx and UNKN define generation happens in two places (EndSysreg and EndSysregFields), and was using nearly identical code. Split this out into a function, and call that instead, rather than keeping the duplicated code. There are no changes to the generated sysregs as part of this change. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 18:09:46 +00:00
Sascha Bischoff	fe2ef46995	arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor Some system register field encodings change based on, for example the in-use architecture features, or the context in which they are accessed. In order to support these different field encodings, introduce the Prefix descriptor (Prefix, EndPrefix) for describing such sysregs. The Prefix descriptor can be used in the following way: Sysreg EXAMPLE 0 1 2 3 4 Prefix FEAT_A Field 63:0 Foo EndPrefix Prefix FEAT_B Field 63:1 Bar Res0 0 EndPrefix Field 63:0 Baz EndSysreg This will generate a single set of system register encodings (REG_, SYS_, ...), and then generate three sets of field definitions for the system register called EXAMPLE. The first set is prefixed by FEAT_A, e.g. FEAT_A_EXAMPLE_Foo. The second set is prefixed by FEAT_B, e.g., FEAT_B_EXAMPLE_Bar. The third set is not given a prefix at all, e.g. EXAMPLE_Baz. For each set, a corresponding set of defines for Res0, Res1, and Unkn is generated. The intent for the final prefix-less fields is to describe default or legacy field encodings. This ensures that prefixed encodings can be added to already-present sysregs without affecting existing legacy code. Prefixed fields must be defined before those without a prefix, and this is checked by the generator. This ensures consistent ordering within the sysregs definitions. The Prefix descriptor can be used within Sysreg or SysregFields blocks. Field, Res0, Res1, Unkn, Rax, SignedEnum, Enum can all be used within a Prefix block. Fields and Mapping can not. Fields that vary with features must be described as part of a SysregFields block, instead. Mappings, which are just a code comment, make little sense in this context, and have hence not been included. There are no changes to the generated system register definitions as part of this change. 
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 18:09:46 +00:00
Sascha Bischoff	0aab5772a5	arm64/sysreg: Fix checks for incomplete sysreg definitions The checks for incomplete sysreg definitions were checking if the next_bit was greater than 0, which is incorrect and missed occasions where bit 0 hasn't been defined for a sysreg. The reason is that next_bit is -1 when all bits have been processed (LSB - 1). Change the checks to use >= 0, instead. Also, set next_bit in Mapping to -1 instead of 0 to match these new checks. There are no changes to the generated sysreg definitions as part of this change, and conveniently no definitions lack definitions for bit 0. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 18:09:46 +00:00
Yang Shi	37cb0aab90	arm64: mm: make linear mapping permission update more robust for partial range The commit `fcf8dda8cc` ("arm64: pageattr: Explicitly bail out when changing permissions for vmalloc_huge mappings") made permission update for partial range more robust. But the linear mapping permission update still assumes that the whole range is updated, iterating from the first page all the way to the last page of the area. Make it more robust by updating the linear mapping permission from the page mapped by start address, and update numpages accordingly. Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 18:07:21 +00:00
Dev Jain	c320dbb7c8	arm64/mm: Elide TLB flush in certain pte protection transitions Currently arm64 does an unconditional TLB flush in mprotect(). This is not required for some cases, for example, when changing from PROT_NONE to PROT_READ \| PROT_WRITE (a real usecase - glibc malloc does this to emulate growing into the non-main heaps), and unsetting uffd-wp in a range. Therefore, implement pte_needs_flush() for arm64, which is already implemented by some other arches as well. Running a userspace program changing permissions back and forth between PROT_NONE and PROT_READ \| PROT_WRITE, and measuring the average time taken for the none->rw transition, I get a reduction from 3.2 microseconds to 2.85 microseconds, giving a 12.3% improvement. Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 17:50:25 +00:00
Linu Cherian	1b214452b6	arm64/mm: Rename try_pgd_pgtable_alloc_init_mm With BUG_ON in pgd_pgtable_alloc_init_mm moved up to higher layer, gfp flags is the only difference between try_pgd_pgtable_alloc_init_mm and pgd_pgtable_alloc_init_mm. Hence rename the "try" version to pgd_pgtable_alloc_init_mm_gfp. Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Linu Cherian <linu.cherian@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 16:00:19 +00:00
Chaitanya S Prakash	bfc184cb1b	arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors arch_add_memory() is used to hotplug memory into a system but as a part of its implementation it calls __create_pgd_mapping(), which uses pgtable_alloc() in order to build intermediate page tables. As this path was initially only used during early boot pgtable_alloc() is designed to BUG_ON() on failure. However, in the event that memory hotplug is attempted when the system's memory is extremely tight and the allocation were to fail, it would lead to panicking the system, which is not desirable. Hence update __create_pgd_mapping and all its callers to be non void and propagate -ENOMEM on allocation failure to allow system to fail gracefully. But during early boot if there is an allocation failure, we want the system to panic, hence create a wrapper around __create_pgd_mapping() called early_create_pgd_mapping() which is designed to panic, if ret is non zero value. All the init calls are updated to use this wrapper rather than the modified __create_pgd_mapping() to restore functionality. Fixes: `4ab2150615` ("arm64: Add memory hotplug support") Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Signed-off-by: Linu Cherian <linu.cherian@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 16:00:19 +00:00
Anshuman Khandual	b0a3f0e894	arm64/sysreg: Replace TCR_EL1 field macros This just replaces all used TCR_EL1 field macros with tools sysreg variant based fields and subsequently drops them from the header (pgtable-hwdef.h), while retaining the ones used for KVM (represented via the sysreg tools format). Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 15:58:30 +00:00
Xianwei Zhao	fc584d871c	irqchip/meson-gpio: Add support for Amlogic S6 S7 and S7D SoCs The Amlogic S6/S7/S7D SoCs support GPIO interrupt lines: S6 IRQ Number: - 99:98 2 pins on bank CC - 97 1 pin on bank TESTN - 96:81 16 pins on bank A - 80:65 16 pins on bank Z - 64:45 20 pins on bank X - 44:37 8 pins on bank H offs H1 - 36:32 5 pins on bank F - 31:25 7 pins on bank D - 24:22 3 pins on bank E - 21:14 8 pins on bank C - 13:0 14 pins on bank B S7 IRQ Number: - 83:82 2 pins on bank CC - 81 1 pin on bank TESTN - 80:68 13 pins on bank Z - 67:48 20 pins on bank X - 47:36 12 pins on bank H - 35:24 12 pins on bank D - 23:22 2 pins on bank E - 21:14 8 pins on bank C - 13:0 14 pins on bank B S7D IRQ Number: - 83:82 2 pins on bank CC - 81:75 7 pins on bank DV - 74 1 pin on bank TESTN - 73:61 13 pins on bank Z - 60:41 20 pins on bank X - 40:29 12 pins on bank H - 28:24 5 pins on bank D - 23:22 2 pins on bank E - 21:14 8 pins on bank C - 13:0 14 pins on bank B Add the required compatibles and interrupt count initializers. Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-2-b4d1fe4781c1@amlogic.com	2025-11-13 14:04:16 +01:00
Xianwei Zhao	e4ca152008	dt-bindings: interrupt-controller: Add support for Amlogic S6 S7 and S7D SoCs Update the device tree binding document for GPIO interrupt controller of Amlogic S6 S7 and S7D SoCs. Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-1-b4d1fe4781c1@amlogic.com	2025-11-13 14:04:16 +01:00
Sean Christopherson	6276c67f2b	x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible Extend KVM's export macro framework to provide EXPORT_SYMBOL_FOR_KVM(), and use the helper macro to export symbols for KVM throughout x86 if and only if KVM will build one or more modules, and only for those modules. To avoid unnecessary exports when CONFIG_KVM=m but kvm.ko will not be built (because no vendor modules are selected), let arch code #define EXPORT_SYMBOL_FOR_KVM to suppress/override the exports. Note, the set of symbols to restrict to KVM was generated by manual search and audit; any "misses" are due to human error, not some grand plan. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112173944.1380633-5-seanjc%40google.com	2025-11-12 15:29:38 -08:00
Sean Christopherson	e6f2d5866c	x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs" Don't export "ptdump_walk_pgd_level_debugfs" as its sole user is arch/x86/mm/debug_pagetables.c, which can't be built as a module. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251112173944.1380633-4-seanjc%40google.com	2025-11-12 15:24:42 -08:00
Sean Christopherson	9c26c91e10	x86/mtrr: Drop unnecessary export of "mtrr_state" Don't export "mtrr_state" as usage is limited to arch/x86/kernel/cpu/mtrr (and nothing outside of that directory even includes the local mtrr.h). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251112173944.1380633-3-seanjc%40google.com	2025-11-12 15:24:42 -08:00
Sean Christopherson	ed02882460	x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base" Don't export x86_spec_ctrl_base as it's used only in bugs.c and process.c, neither of which can be built into a module. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251112173944.1380633-2-seanjc%40google.com	2025-11-12 15:24:42 -08:00
Bo Liu	337f7e3a4b	arm64: Fix double word in comments Remove the repeated word "the" in comments. Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-12 17:07:59 +00:00
mrigendrachaubey	96ac403ea2	arm64: Fix typos and spelling errors in comments This patch corrects several minor typographical and spelling errors in comments across multiple arm64 source files. No functional changes. Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-12 17:06:21 +00:00
Ryan Chen	7083e14225	dt-bindings: interrupt-controller: aspeed,ast2700: Correct #interrupt-cells and interrupts count Update the AST2700 interrupt controller binding to match the actual hardware and the irq-aspeed-intc driver behavior. - Interrupts: First-level INTC banks request multiple interrupt lines to the root GIC, with a maximum of 10 per bank. Second-level INTC banks request only one interrupt line to their parent INTC-IC. Therefore, set the interrupts property to allow a minimum of 1 and a maximum of 10 entries. - #interrupt-cells: Set '#interrupt-cells' to <1> since the aspeed intc driver does not support specifying a trigger type; only the interrupt index is used. Signed-off-by: Ryan Chen <ryan_chen@aspeedtech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20251030060155.2342604-2-ryan_chen@aspeedtech.com	2025-11-11 22:20:45 +01:00
Junhui Liu	47a4ebbf91	irqchip/aclint-sswi: Add Nuclei UX900 support Reuse the generic ACLINT SSWI probe for Nuclei UX900 since it is compliant with the ACLINT specification. Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-9-5478db4f664a@pigmoral.tech	2025-11-11 22:17:22 +01:00
Junhui Liu	a1c3a7d7ee	dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT SSWI Add SSWI support for Anlogic DR1V90 SoC, which uses Nuclei UX900 with a TIMER unit compliant with the ACLINT specification. Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-6-5478db4f664a@pigmoral.tech	2025-11-11 22:17:21 +01:00
Junhui Liu	579951da64	dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT MSWI Add MSWI support for Anlogic DR1V90 SoC, which uses Nuclei UX900 with a TIMER unit compliant with the ACLINT specification. Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-5-5478db4f664a@pigmoral.tech	2025-11-11 22:17:21 +01:00
Junhui Liu	b90ac5fe32	dt-bindings: interrupt-controller: Add Anlogic DR1V90 PLIC Add PLIC support for Anlogic DR1V90. Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-4-5478db4f664a@pigmoral.tech	2025-11-11 22:17:21 +01:00
Krzysztof Kozlowski	45cc441de7	irqchip/irq-bcm7038-l1: Remove unused reg_mask_status() reg_mask_status() is not referenced anywhere leading to W=1 warning: irq-bcm7038-l1.c:85:28: error: unused function 'reg_mask_status' [-Werror,-Wunused-function] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251106155200.337399-2-krzysztof.kozlowski@linaro.org	2025-11-11 22:11:17 +01:00
Charles Mirabile	a045359e72	irqchip/sifive-plic: Fix call to __plic_toggle() in M-Mode code path The code path for M-Mode linux that disables interrupts for other contexts was missed when refactoring __plic_toggle(). Since the new version caches updates to the state for the primary context, its use in this codepath is no longer desirable even if it could be made correct. Replace the calls to __plic_toggle() with a loop that simply disables all of the interrupts in groups of 32 with a direct mmio write. Fixes: `14ff9e54dd` ("irqchip/sifive-plic: Cache the interrupt enable state") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Charles Mirabile <cmirabil@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251103161813.2437427-1-cmirabil@redhat.com Closes: https://lore.kernel.org/oe-kbuild-all/202510271316.AQM7gCCy-lkp@intel.com/	2025-11-11 22:11:16 +01:00
Li Qiang	df717b9564	arm64: add unlikely hint to MTE async fault check in el0_svc_common Add unlikely() hint to the _TIF_MTE_ASYNC_FAULT flag check in el0_svc_common() since asynchronous MTE faults are expected to be rare occurrences during normal system call execution. This optimization helps the compiler to improve instruction caching and branch prediction for the common case where no asynchronous MTE faults are pending, while maintaining correct behavior for the exceptional case where such faults need to be handled prior to system call execution. Signed-off-by: Li Qiang <liqiang01@kylinos.cn> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 19:49:19 +00:00
Thomas Huth	287d163322	arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembly code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This can be very confusing when switching between userspace and kernelspace coding, or when dealing with uapi headers that rather should use __ASSEMBLER__ instead. So let's standardize now on the __ASSEMBLER__ macro that is provided by the compilers. This is a mostly mechanical patch (done with a simple "sed -i" statement), except for the following files where comments with mis-spelled macros were tweaked manually: arch/arm64/include/asm/stacktrace/frame.h arch/arm64/include/asm/kvm_ptrauth.h arch/arm64/include/asm/debug-monitors.h arch/arm64/include/asm/esr.h arch/arm64/include/asm/scs.h arch/arm64/include/asm/memory.h Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 19:35:59 +00:00
Thomas Huth	639f08fc20	arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers __ASSEMBLY__ is only defined by the Makefile of the kernel, so this is not really useful for uapi headers (unless the userspace Makefile defines it, too). Let's switch to __ASSEMBLER__ which gets set automatically by the compiler when compiling assembly code. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 19:35:58 +00:00
Osama Abdelkader	420cab0155	arm64: acpi: add newline to deferred APEI warning missing newline in pr_warn_ratelimited in apei_claim_sea Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 19:28:37 +00:00
Linus Walleij	555827a064	arm64: entry: Clean out some indirection The conversion to generic IRQ entry left some functions in the EL1 (kernel) IRQ entry path very shallow, so drop the __inner_functions() where appropriate, saving some time and stack. This is not a fix but an optimization. Drop stale comments about irqentry_enter/exit() while we are at it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 19:28:05 +00:00
Anshuman Khandual	e2e21a9757	arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52 Although the comment clearly states about PGD table's alignment requirement (when PA_BITS = 52) but the subsequent BUILD_BUG_ON() tests size comparison to 64 bytes instead. So change it as an actual alignment test. Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 19:13:03 +00:00
Ard Biesheuvel	a5baf582f4	arm64/efi: Call EFI runtime services without disabling preemption The only remaining reason why EFI runtime services are invoked with preemption disabled is the fact that the mm is swapped out behind the back of the context switching code. The kernel no longer disables preemption in kernel_neon_begin(). Furthermore, the EFI spec is being clarified to explicitly state that only baseline FP/SIMD is permitted in EFI runtime service implementations, and so the existing kernel mode NEON context switching code is sufficient to preserve and restore the execution context of an in-progress EFI runtime service call. Most EFI calls are made from the efi_rts_wq, which is serviced by a kthread. As kthreads never return to user space, they usually don't have an mm, and so we can use the existing infrastructure to swap in the efi_mm while the EFI call is in progress. This is visible to the scheduler, which will therefore reactivate the selected mm when switching out the kthread and back in again. Given that the EFI spec explicitly permits runtime services to be called with interrupts enabled, firmware code is already required to tolerate interruptions. So rather than disable preemption, disable only migration so that EFI runtime services are less likely to cause scheduling delays. To avoid potential issues where runtime services are interrupted while polling the secure firmware for async completions, keep migration disabled so that a runtime service invocation does not resume on a different CPU from the one it was started on. Note, though, that the firmware executes at the same privilege level as the kernel, and is therefore able to disable interrupts altogether. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 18:59:22 +00:00
Ard Biesheuvel	6b9c98e657	arm64/efi: Move uaccess en/disable out of efi_set_pgd() efi_set_pgd() will no longer be called when invoking EFI runtime services via the efi_rts_wq work queue, but the uaccess en/disable are still needed when using PAN emulation using TTBR0 switching. So move these into the callers. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 18:59:22 +00:00
Ard Biesheuvel	1068cb52e8	arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper Since commit `5894cf571e` ("acpi/prmt: Use EFI runtime sandbox to invoke PRM handlers") all EFI runtime calls on arm64 are routed via the EFI runtime wrappers, which are serialized using the efi_runtime_lock semaphore. This means the efi_rt_lock spinlock in the arm64 arch wrapper code has become redundant, and can be dropped. For robustness, replace it with an assert that the EFI runtime lock is in fact held by 'current'. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 18:59:22 +00:00
Ard Biesheuvel	7137a203b2	arm64/fpsimd: Permit kernel mode NEON with IRQs off Currently, may_use_simd() will return false when called from a context where IRQs are disabled. One notable case where this happens is when calling the ResetSystem() EFI runtime service from the reboot/poweroff code path. For this case alone, there is a substantial amount of FP/SIMD support code to handle the corner case where an EFI runtime service is invoked with IRQs disabled. The only reason kernel mode SIMD is not allowed when IRQs are disabled is that re-enabling softirqs in this case produces a noisy diagnostic when lockdep is enabled. The warning is valid, in the sense that delivering pending softirqs over the back of the call to local_bh_enable() is problematic when IRQs are disabled. While the API lacks a facility to simply mask and unmask softirqs without triggering their delivery, disabling softirqs is not needed to begin with when IRQs are disabled, given that softirqs are only ever taken asynchronously over the back of a hard IRQ. So dis/enable softirq processing conditionally, based on whether IRQs are enabled, and relax the check in may_use_simd(). Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 18:59:22 +00:00
Ard Biesheuvel	1d038e8018	arm64/fpsimd: Don't warn when EFI execution context is preemptible Kernel mode FP/SIMD no longer requires preemption to be disabled, so only warn on uses of FP/SIMD from preemptible context if the fallback path is taken for cases where kernel mode NEON would not be allowed otherwise. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 18:59:22 +00:00
Ard Biesheuvel	a286050120	efi/runtime-wrappers: Keep track of the efi_runtime_lock owner The EFI runtime wrappers use a file local semaphore to serialize access to the EFI runtime services. This means that any calls to the arch wrappers around the runtime services will also be serialized, removing the need for redundant locking. For robustness, add a facility that allows those arch wrappers to assert that the semaphore was taken by the current task. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 18:59:22 +00:00
Ard Biesheuvel	40374d308e	efi: Add missing static initializer for efi_mm::cpus_allowed_lock Initialize the cpus_allowed_lock struct member of efi_mm. Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 18:59:22 +00:00
Borislav Petkov (AMD)	b2c1dd6c6f	x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled() Drop one redundant definition, while at it. There should be no functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251031122122.GKaQSpwhLvkinKKbjG@fat_crate.local	2025-11-11 16:42:31 +01:00
Ma Ke	f18e71cd6c	EDAC/ie31200: Fix error handling in ie31200_register_mci ie31200_register_mci() calls device_initialize() for priv->dev unconditionally. However, in the error path, put_device() is not called, leading to an imbalance. Similarly, in the unload path, put_device() is missing. Although edac_mc_free() eventually frees the memory, it does not release the device initialized by device_initialize(). For code readability and proper pairing of device_initialize()/put_device(), add put_device() calls in both error and unload paths. Found by code review. Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://patch.msgid.link/20251106084735.35017-1-make24@iscas.ac.cn	2025-11-10 17:06:10 -08:00
Uros Bizjak	fd4e025526	x86/percpu: Use BIT_WORD() and BIT_MASK() macros Use BIT_WORD() and BIT_MASK() macros from <linux/bits.h> in <arch/x86/include/asm/percpu.h> instead of open-coding them. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20250907184915.78041-1-ubizjak@gmail.com	2025-11-10 11:55:54 +01:00
Anshuman Khandual	fc1abd4093	arm64/mm: Drop cpu_set_[default\|idmap]_tcr_t0sz() These TCR_El1 helpers don't have any other callers. Drop these redundant indirections completely thus making this code more compact and readable. No functional change. Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-07 20:01:34 +00:00
Mark Rutland	a7717cad61	kselftest/arm64: Align zt-test register dumps The zt-test output is awkward to read, as the 'Expected' value isn't dumped on its own line and isn't aligned with the 'Got' value beneath. For example: Mismatch: PID=5281, iteration=3270249 Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469] Got [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] SVCR: 2 Add a newline, matching the other FPSIMD/SVE/SME tests, so that we get output that can be read more easily: Mismatch: PID=5281, iteration=3270249 Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469] Got [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] SVCR: 2 Admittedly this isn't all that important when the 'Got' value is all zeroes, but otherwise this would be a major help for identifying which portion of the 'Got' value is not as expected. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kselftest@vger.kernel.org Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-07 20:00:49 +00:00
Omar Sandoval	bf6b3fed18	arm64: remove unused ARCH_PFN_OFFSET This is only relevant to the FLATMEM memory model, which isn't an option since commit `782276b4d0` ("arm64: Force SPARSEMEM_VMEMMAP as the only memory management model"). Signed-off-by: Omar Sandoval <osandov@fb.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-07 19:59:37 +00:00
Ryo Takakura	d3b570eba7	arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack For those architectures with HAVE_SOFTIRQ_ON_OWN_STACK use their dedicated softirq stack when !PREEMPT_RT. This condition is ensured by SOFTIRQ_ON_OWN_STACK. Let arm64 use SOFTIRQ_ON_OWN_STACK as well to select its usage of the stack. Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-07 19:55:52 +00:00
Dawei Li	4002068508	arm64: Remove assertion on CONFIG_VMAP_STACK CONFIG_VMAP_STACK is selected by arm64 arch unconditionally since commit `ef6861b8e6` ("arm64: Mandate VMAP_STACK"). Remove the redundant assertion and headers. Signed-off-by: Dawei Li <dawei.li@linux.dev> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-07 19:46:39 +00:00
Yicong Yang	7ab06ea41a	arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY The arm_pmu driver is using topology_core_has_smt() for retrieving the SMT implementation which depends on CONFIG_GENERIC_ARCH_TOPOLOGY. The config is optional on arm platforms so provide a !CONFIG_GENERIC_ARCH_TOPOLOGY stub for topology_core_has_smt(). Fixes: `c3d78c34ad` ("perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511041757.vuCGOmFc-lkp@intel.com/ Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Yicong Yang <yangyccccc@gmail.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-07 13:45:02 +00:00
Robin Murphy	2d7a824807	perf/arm-ni: Fix and optimise register offset calculation LKP points out an operator precedence oversight in the new NoC S3 support that, annoyingly, my local W=1 build didn't flag. In fixing that, we can also take the similarly-missed opportunity to cache the version check itself at event_init time. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511041749.ok8zDP6u-lkp@intel.com/ Fixes: `8fa08f8835` ("perf/arm-ni: Add NoC S3 support") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-07 13:42:46 +00:00
Marco Crivellari	24e3848a2e	RAS/CEC: Replace use of system_wq with system_percpu_wq Switch to using system_percpu_wq because system_wq is going away as part of a workqueue restructuring. Currently if a user enqueues a work item using schedule_delayed_work() the used workqueue is "system_wq" (per-cpu workqueue) while queue_delayed_work() uses WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use of WORK_CPU_UNBOUND again. This lack of consistency cannot be addressed without refactoring the API. For more details see those commits and the Link tag below. `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") [ bp: Massage commit message. ] Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de	2025-11-07 13:48:28 +01:00
Sumanth Korikkar	c1287d67c3	s390/sclp_mem: Consider global memory_hotplug.memmap_on_memory setting When the global kernel command line parameter memory_hotplug.memmap_on_memory is set to false, per-memory-block memmap_on_memory setting can still be set to true. However, when configuring memory block, add_memory_resource() would configure it without memmap_on_memory. i.e. Even if the MHP_MEMMAP_ON_MEMORY flag is set, mhp_supports_memmap_on_memory() returns false unless the kernel command line parameter "memory_hotplug.memmap_on_memory" is enabled. When both the flag and the cmdline parameter are set, the memory block can be configured with or without memmap_on_memory support. To ensure consistent behavior, permit configuring per-memory-block memmap_on_memory only when the memory_hotplug.memmap_on_memory kernel command line parameter is enabled. This is similar to commit `73954d379e` ("dax: add a sysfs knob to control memmap_on_memory behavior") Fixes: `ff18dcb19a` ("s390/sclp: Add support for dynamic (de)configuration of memory") Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:18:23 +01:00
Mete Durlu	8840cc4520	s390/hiperdispatch: Decrease steal time threshold Higher steal time thresholds favor low utilization scenarios, which is not the common case for s390. Set steal time threshold to a lower value to prioritize vertical high and medium CPUs sooner and allow high utilization scenarios to benefit from it. Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:17:28 +01:00
Thorsten Blum	eb3a9b405b	s390/smp: Mark pcpu_delegate() and smp_call_ipl_cpu() as __noreturn pcpu_delegate() never returns to its caller. If the target CPU is the current CPU, it calls __pcpu_delegate(), whose delegate function is not supposed to return. In any case, even if __pcpu_delegate() unexpectedly returns, pcpu_delegate() sends SIGP_STOP to the current CPU and waits in an infinite loop. Annotate pcpu_delegate() with the __noreturn attribute to improve compiler optimizations. Also annotate smp_call_ipl_cpu() accordingly since it always calls pcpu_delegate(). [hca: Merge two patches from Thorsten Blum] Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:17:28 +01:00
Thorsten Blum	f07ebfa5e4	s390/nmi: Annotate s390_handle_damage() with __noreturn s390_handle_damage() ends by calling the non-returning function disabled_wait() and therefore also never returns. Annotate it with the __noreturn compiler attribute to improve compiler optimizations. Remove the unreachable infinite while loop. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:17:28 +01:00
Bo Liu	858063c1ae	s390: Fix double word in comments Remove the repeated word "the" in comments. Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:17:27 +01:00
Heiko Carstens	547e9feb0e	Merge branch 'dat-enhancement-1' Heiko Carstens says: ==================== Add the Dat-Enhancement facility 1 to the list of facilities which are required to start the kernel. The facility provides the CSPG and IDTE instructions. In particular the CSPG instruction can be used to replace a valid page table entry with a different page table entry, which also differs in the page frame real address. Without the CSPG instruction it is possible to use the CSP instruction to change valid page table entries, however it only allows to change the lower or higher 32 bits of such entries, which means it cannot be used to change the page frame real address of valid page table entries. Given that there is code around (e.g. HugeTLB vmemmap optimization) which requires to change valid page table entries of the kernel mapping, without the detour over an invalid page table entry, make the CSPG instruction unconditionally available. The Dat-Enhancement facility 1 is available since z990, which is older than the currently supported minimum architecture (z10). Therefore adding this to the architecture level set shouldn't cause any problems. ==================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:14:00 +01:00
Heiko Carstens	68807a894f	s390/mm: Replace the CSP instruction with CSPG The CSPG instruction is part of the Dat-Enhancement facility 1, which is always available. Given that it can be used everywhere where also the CSP instruction can be used, replace CSP with CSPG everywhere. This allows to remove the csp() inline assembly. Also remove the unused gmap_pmdp_csp() function. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:12:31 +01:00
Heiko Carstens	220d8e10d6	s390/mm: Remove cpu_has_idte() Remove cpu_has_idte(). The IDTE instruction is part of the Dat-Enhancement facility 1, which is always available. Therefore remove the helper and now superfluous code. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:12:31 +01:00
Heiko Carstens	73c4b5d728	s390: Add Dat-Enhancement facility 1 to architecture level set Add the Dat-Enhancement facility 1 to the list of facilities which are required to start the kernel. The facility provides the CSPG and IDTE instructions. In particular the CSPG instruction can be used to replace a valid page table entry with a different page table entry, which also differs in the page frame real address. Without the CSPG instruction it is possible to use the CSP instruction to change valid page table entries, however it only allows to change the lower or higher 32 bits of such entries, which means it cannot be used to change the page frame real address of valid page table entries. Given that there is code around (e.g. HugeTLB vmemmap optimization) which requires to change valid page table entries of the kernel mapping, without the detour over an invalid page table entry, make the CSPG instruction unconditionally available. The Dat-Enhancement facility 1 is available since z990, which is older than the currently supported minimum architecture (z10). Therefore adding this to the architecture level set shouldn't cause any problems. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-11-06 14:12:30 +01:00
Avadhut Naik	8616025ae6	EDAC: Remove the legacy EDAC sysfs interface Commit `1997471069` ("edac: add a new per-dimm API and make the old per-virtual-rank API obsolete") introduced a new per-DIMM sysfs interface for EDAC making the old per-virtual-rank sysfs interface obsolete. Since this new sysfs interface was introduced more than a decade ago, remove the obsolete legacy interface. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com	2025-11-06 13:21:29 +01:00
Avadhut Naik	6a85796915	EDAC/amd64: Remove NUM_CONTROLLERS macro Currently, the NUM_CONTROLLERS macro is used to limit the amount of memory controllers (UMCs) available per node. The number of UMCs available per node, however, is already cached by the max_mcs variable of struct amd64_pvt. Allocate the relevant data structures dynamically using the variable instead of static allocation through the macro. The max_mcs variable is used for legacy systems too. These systems have a max of 2 controllers. Since the default value of max_mcs, set in per_family_init(), is 2, these legacy systems are also covered. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com	2025-11-06 12:51:33 +01:00
Avadhut Naik	e9abd990ae	EDAC/amd64: Generate ctl_name string at runtime Currently, the ctl_name string is statically assigned based on the family and model of the SOC when the amd64_edac module is loaded. The same, however, is not exactly needed as the string can be generated and assigned at runtime through scnprintf(). Remove all static assignments and generate the string at runtime. Also, cleanup the switch cases which became defunct and consolidate identical cases. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com	2025-11-06 12:35:59 +01:00
Yazen Ghannam	56f17be67a	x86/mce/amd: Define threshold restart function for banks Prepare for CMCI storm support by moving the common bank/block iterator code to a helper function. Include a parameter to switch the interrupt enable. This will be used by the CMCI storm handling function. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com	2025-11-05 22:38:31 +01:00
Yazen Ghannam	3206b41604	x86/mce/amd: Remove redundant reset_block() Many of the checks in reset_block() are done again in the block reset function. So drop the redundant checks. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com	2025-11-05 22:34:53 +01:00
Yazen Ghannam	4efaec6e16	x86/mce/amd: Support SMCA Corrected Error Interrupt AMD systems optionally support MCA thresholding which provides the ability for hardware to send an interrupt when a set error threshold is reached. This feature counts errors of all severities, but it is commonly used to report correctable errors with an interrupt rather than polling. Scalable MCA systems allow the platform to take control of this feature. In this case, the OS will not see the feature configuration and control bits in the MCA_MISC* registers. The OS will not receive the MCA thresholding interrupt, and it will need to poll for correctable errors. A "corrected error interrupt" will be available on Scalable MCA systems. This will be used in the same configuration where the platform controls MCA thresholding. However, the platform will now be able to send the MCA thresholding interrupt to the OS. Check for, and enable, this feature during per-CPU SMCA init. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com	2025-11-05 22:10:23 +01:00
Usama Arif	8436112341	efi/libstub: Fix page table access in 5-level to 4-level paging transition When transitioning from 5-level to 4-level paging, the existing code incorrectly accesses page table entries by directly dereferencing CR3 and applying PAGE_MASK. This approach has several issues: - __native_read_cr3() returns the raw CR3 register value, which on x86_64 includes not just the physical address but also flags. Bits above the physical address width of the system (i.e. above __PHYSICAL_MASK_SHIFT) are also not masked. - The pgd value is masked by PAGE_SIZE which doesn't take into account the higher bits such as _PAGE_BIT_NOPTISHADOW. Replace this with proper accessor functions: - native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out of CR3 (like SME or LAM bits). All remaining bits are real address bits or reserved and must be 0. - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above the max physical address are reserved and must be 0. Fixes: `cb1c9e02b0` ("x86/efistub: Perform 4/5 level paging switch from the stub") Reported-by: Michael van der Westhuizen <rmikey@meta.com> Reported-by: Tobias Fleig <tfleig@meta.com> Co-developed-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://patch.msgid.link/20251103141002.2280812-3-usamaarif642@gmail.com	2025-11-05 17:31:32 +01:00
Usama Arif	eb22663125	x86/boot: Fix page table access in 5-level to 4-level paging transition When transitioning from 5-level to 4-level paging, the existing code incorrectly accesses page table entries by directly dereferencing CR3 and applying PAGE_MASK. This approach has several issues: - __native_read_cr3() returns the raw CR3 register value, which on x86_64 includes not just the physical address but also flags. Bits above the physical address width of the system (i.e. above __PHYSICAL_MASK_SHIFT) are also not masked. - The PGD entry is masked by PAGE_MASK which doesn't take into account the higher bits such as _PAGE_BIT_NOPTISHADOW. Replace this with proper accessor functions: - native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out of CR3 (like SME or LAM bits). All remaining bits are real address bits or reserved and must be 0. - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above the max physical address are reserved and must be 0. Fixes: `e9d0e6330e` ("x86/boot/compressed/64: Prepare new top-level page table for trampoline") Reported-by: Michael van der Westhuizen <rmikey@meta.com> Reported-by: Tobias Fleig <tfleig@meta.com> Co-developed-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/a482fd68-ce54-472d-8df1-33d6ac9f6bb5@intel.com	2025-11-05 17:19:11 +01:00
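The masking difference described in the two paging fixes above can be illustrated with a small user-space sketch. Every constant here is an illustrative stand-in rather than the kernel's definition: a 4 KiB page, a 52-bit physical mask, and bit 58 as the assumed position of a software flag above bit 51. The helper names are hypothetical.

```c
#include <stdint.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_MASK	(~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))
#define SKETCH_PHYS_MASK_SHIFT	52
#define SKETCH_PHYS_MASK	((UINT64_C(1) << SKETCH_PHYS_MASK_SHIFT) - 1)
/* Address bits only: page aligned and below the physical mask shift. */
#define SKETCH_PFN_MASK		(SKETCH_PAGE_MASK & SKETCH_PHYS_MASK)
/* Assumed position of a software flag that lives above bit 51. */
#define SKETCH_HIGH_FLAG	(UINT64_C(1) << 58)

/* Old approach: clears only the low 12 bits, so high flags leak through. */
static inline uint64_t sketch_entry_to_pa_page_mask(uint64_t entry)
{
	return entry & SKETCH_PAGE_MASK;
}

/* Fixed approach: clears the low flag bits and everything above bit 51. */
static inline uint64_t sketch_entry_to_pa_pfn_mask(uint64_t entry)
{
	return entry & SKETCH_PFN_MASK;
}
```

With an entry that carries both low flag bits and the high flag, only the second helper yields a usable physical address; the first one returns a value with bit 58 still set.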
Yazen Ghannam	134b1eabe6	x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems Scalable MCA systems have a per-CPU register that gives the APIC LVT offset for the thresholding and deferred error interrupts. Currently, this register is read once to set up the deferred error interrupt and then read again for each thresholding block. Furthermore, the APIC LVT registers are configured each time, but they only need to be configured once per-CPU. Move the APIC LVT setup to the early part of CPU init, so that the registers are set up once. Also, this ensures that the kernel is ready to service the interrupts before the individual error sources (each MCA bank) are enabled. Apply this change only to SMCA systems to avoid breaking any legacy behavior. The deferred error interrupt is technically advertised by the SUCCOR feature. However, this was first made available on SMCA systems. Therefore, only set up the deferred error interrupt on SMCA systems and simplify the code. Guidance from hardware designers is that the LVT offsets provided from the platform should be used. The kernel should not try to enforce specific values. However, the kernel should check that an LVT offset is not reused for multiple sources. Therefore, remove the extra checking and value enforcement from the MCE code. The "reuse/conflict" case is already handled in setup_APIC_eilvt(). Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com	2025-11-05 16:51:27 +01:00
Yazen Ghannam	7cb735d7c0	x86/mce: Unify AMD DFR handler with MCA Polling AMD systems optionally support a deferred error interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how other MCA interrupts are handled. Deferred errors do not require any special handling related to the interrupt, e.g. resetting or rearming the interrupt, etc. However, Scalable MCA systems include a pair of registers, MCA_DESTAT and MCA_DEADDR, that should be checked for valid errors. This check should be done whenever MCA registers are polled. Currently, the deferred error interrupt does this check, but the MCA polling function does not. Call the MCA polling function when handling the deferred error interrupt. This keeps all "polling" cases in a common function. Add an SMCA status check helper. This will do the same status check and register clearing that the interrupt handler has done. And it extends the common polling flow to find AMD deferred errors. Clear the MCA_DESTAT register at the end of the handler rather than the beginning. This maintains the procedure that the 'status' register must be cleared as the final step. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com	2025-11-05 16:41:32 +01:00
Yazen Ghannam	34da4a5d68	x86/mce: Unify AMD THR handler with MCA Polling AMD systems optionally support an MCA thresholding interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how the Intel Corrected Machine Check interrupt (CMCI) is handled. AMD MCA thresholding is managed using the MCA_MISC registers within an MCA bank. The OS will need to modify the hardware error count field in order to reset the threshold limit and rearm the interrupt. Management of the MCA_MISC register should be done as a follow up to the basic MCA polling flow. It should not be the main focus of the interrupt handler. Furthermore, future systems will have the ability to send an MCA thresholding interrupt to the OS even when the OS does not manage the feature, i.e. MCA_MISC registers are Read-as-Zero/Locked. Call the common MCA polling function when handling the MCA thresholding interrupt. This will allow the OS to find any valid errors whether or not the MCA thresholding feature is OS-managed. Also, this allows the common MCA polling options and kernel parameters to apply to AMD systems. Add a callback to the MCA polling function to check and reset any threshold blocks that have reached their threshold limit. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com	2025-11-05 13:41:18 +01:00
Marc Herbert	41f4767000	x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg) While restricting access, `a7e1f67ed2` ("x86/msr: Filter MSR writes") also added a warning and started tainting the kernel. But the warning message never mentioned tainting. Moreover, this uses the "CPU_OUT_OF_SPEC" flag which is not clearly related to MSRs: that flag is overloaded by several, fairly different situations, including some much scarier ones. So, without an expert around (thank you Dave Hansen), it would have been practically impossible to root cause the tainting from just the log file at hand. So it would be prudent to explicitly mention in the logs when the tainting happens so that debugging crashes can be made easier. Fix this by simply appending the CPU_OUT_OF_SPEC flag to the warning message. This readability issue happened when staring at logs involving the Intel Memory Latency Checker (among many other things going on in that log). The MLC disables hardware prefetch. [ bp: Massage and extend commit message. ] Signed-off-by: Marc Herbert <marc.herbert@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20251101-tainted-msr-v1-1-e00658ba04d4@linux.intel.com	2025-11-05 13:14:42 +01:00
Borislav Petkov (AMD)	47955b58cf	x86/cpufeatures: Correct LKGS feature flag description Quotation marks in cpufeatures.h comments are special and when the comment begins with a quoted string, that string lands in /proc/cpuinfo, turning it into a user-visible one. The LKGS comment doesn't begin with a quoted string but just in case drop the quoted "kernel" in there to avoid confusion. And while at it, simply change the description into what the LKGS instruction does for more clarity. No functional changes. Reviewed-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20251015103548.10194-1-bp@kernel.org	2025-11-04 23:09:34 +01:00
Peter Zijlstra	1fe4002cf7	x86/ptrace: Always inline trivial accessors A KASAN build bloats these single load/store helpers such that it fails to inline them: vmlinux.o: error: objtool: irqentry_exit+0x5e8: call to instruction_pointer_set() with UACCESS enabled Make sure the compiler isn't allowed to do stupid. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251031105435.GU4068168@noisy.programming.kicks-ass.net	2025-11-04 08:36:20 +01:00
Peter Zijlstra	323d93f043	cleanup: Always inline everything KASAN bloat caused cleanup helper functions to not get inlined: vmlinux.o: error: objtool: irqentry_exit+0x323: call to class_user_rw_access_destructor() with UACCESS enabled Force inline all the cleanup helpers like they already are on normal builds. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251031105435.GU4068168@noisy.programming.kicks-ass.net	2025-11-04 08:35:58 +01:00
Thomas Gleixner	32034df66b	rseq: Switch to TIF_RSEQ if supported TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is suboptimal especially with the RSEQ fast path depending on it, but not really handling it. Define a separate TIF_RSEQ in the generic TIF space and enable the full separation of fast and slow path for architectures which utilize that. That avoids the hassle with invocations of resume_user_mode_work() from hypervisors, which clear TIF_NOTIFY_RESUME. It makes the therefore required re-evaluation at the end of vcpu_run() a NOOP on architectures which utilize the generic TIF space and have a separate TIF_RSEQ. The hypervisor TIF handling does not include the separate TIF_RSEQ as there is no point in doing so. The guest neither knows nor cares about the VMM host application's RSEQ state. That state is only relevant when the ioctl() returns to user space. The fastpath implementation still utilizes TIF_NOTIFY_RESUME for failure handling, but this only happens within exit_to_user_mode_loop(), so arguably the hypervisor ioctl() code is long done when this happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.903622031@linutronix.de	2025-11-04 08:35:37 +01:00
Thomas Gleixner	7a5201ea19	rseq: Split up rseq_exit_to_user_mode() Separate the interrupt and syscall exit handling. Syscall exit does not need to clear the user_irq bit as it can't be set. On interrupt exit it can be set when the interrupt did not result in a scheduling event and therefore the return path did not invoke the TIF work handling, which would have cleared it. The debug check for the event state is also not really required even when debug mode is enabled via the static key. Debug mode is largely aiding user space by enabling a larger amount of validation checks, which cause a segfault when a malformed critical section is detected. In production mode the critical section handling takes the content mostly as is and lets user space keep the pieces when it screwed up. On kernel changes in that area the state check is useful, but that can be done when lockdep is enabled, which is anyway a required test scenario for fundamental changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.842785700@linutronix.de	2025-11-04 08:35:30 +01:00
Thomas Gleixner	70fe25a3bc	entry: Split up exit_to_user_mode_prepare() exit_to_user_mode_prepare() is used for both interrupts and syscalls, but there is extra rseq work, which is only required in the interrupt exit case. Split up the function and provide wrappers for syscalls and interrupts, which allows separating the rseq exit work in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.782234789@linutronix.de	2025-11-04 08:35:17 +01:00
Thomas Gleixner	3db6b38dfe	rseq: Switch to fast path processing on exit to user Now that all bits and pieces are in place, hook the RSEQ handling fast path function into exit_to_user_mode_prepare() after the TIF work bits have been handled. In case of fast path failure, TIF_NOTIFY_RESUME has been raised and the caller needs to take another turn through the TIF handling slow path. This only works for architectures which use the generic entry code. Architectures which still have their own incomplete hacks are not supported and won't be. This results in the following improvements:

Kernel build         Before                  After                   Reduction
exit to user          80692981                80514451
signal checks:           32581                     121                99%
slowpath runs:         1201408    1.49%            198    0.00%      100%
fastpath runs:                                  675941    0.84%      N/A
id updates:            1233989    1.53%          50541    0.06%       96%
cs checks:             1125366    1.39%              0    0.00%      100%
  cs cleared:          1125366     100%              0               100%
  cs fixup:                  0       0%              0

RSEQ selftests       Before                  After                   Reduction
exit to user:        386281778               387373750
signal checks:        35661203                       0               100%
slowpath runs:       140542396   36.38%            100    0.00%      100%
fastpath runs:                                 9509789    2.51%      N/A
id updates:          176203599   45.62%        9087994    2.35%       95%
cs checks:           175587856   45.46%        4728394    1.22%       98%
  cs cleared:        172359544   98.16%        1319307   27.90%       99%
  cs fixup:            3228312    1.84%        3409087   72.10%

The 'cs cleared' and 'cs fixup' percentages are not relative to the exit to user invocations, they are relative to the actual 'cs check' invocations. While some of this could have been avoided in the original code, like the obvious clearing of CS when it's already clear, the main problem of going through TIF_NOTIFY_RESUME cannot be solved. In some workloads the RSEQ notify handler is invoked more than once before going out to user space. Doing this once when everything has stabilized is the only solution to avoid this. The initial attempt to completely decouple it from the TIF work turned out to be suboptimal for workloads, which do a lot of quick and short system calls. 
Even if the fast path decision is only 4 instructions (including a conditional branch), this adds up quickly and becomes measurable when the rate for actually having to handle rseq is in the low single digit percentage range of user/kernel transitions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.701201365@linutronix.de	2025-11-04 08:34:39 +01:00
Thomas Gleixner	05b44aef70	rseq: Implement fast path for exit to user Implement the actual logic for handling RSEQ updates in a fast path after handling the TIF work and at the point where the task is actually returning to user space. This is the right point to do that because at this point the CPU and the MM CID are stable and can no longer change due to yet another reschedule. That happens when the task is handling it via TIF_NOTIFY_RESUME in resume_user_mode_work(), which is invoked from the exit to user mode work loop. The function is invoked after the TIF work is handled and runs with interrupts disabled, which means it cannot resolve page faults. It therefore disables page faults and in case the access to the user space memory faults, it:
 - notes the failure in the event struct
 - raises TIF_NOTIFY_RESUME
 - returns false to the caller
The caller has to go back to the TIF work, which runs with interrupts enabled and therefore can resolve the page faults. This happens mostly on fork() when the memory is marked COW. If the user memory inspection finds invalid data, the function returns false as well and sets the fatal flag in the event struct along with TIF_NOTIFY_RESUME. The slow path notify handler has to evaluate that flag and terminate the task with SIGSEGV as documented. The initial decision to invoke any of this is based on one flag in the event struct: @sched_switch. The decision is in pseudo ASM:
	load	tsk::event::sched_switch
	jnz	inspect_user_space
	mov	$0, tsk::event::events
	...
	leave
So for the common case where the task was not scheduled out, this really boils down to three instructions before going out if the compiler is not completely stupid (and yes, some of them are). If the condition is true, then it checks, whether CPU ID or MM CID have changed. If so, then the CPU/MM IDs have to be updated and are thereby cached for the next round. The update unconditionally retrieves the user space critical section address to spare another user*begin/end() pair. 
If that's not zero and tsk::event::user_irq is set, then the critical section is analyzed and acted upon. If either zero or the entry came via syscall the critical section analysis is skipped. If the comparison is false then the critical section has to be analyzed because the event flag is then only true when entry from user was by interrupt. This is provided without the actual hookup to let reviewers focus on the implementation details. The hookup happens in the next step. Note: As with quite some other optimizations this depends on the generic entry infrastructure and is not enabled to be sucked into random architecture implementations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.638929615@linutronix.de	2025-11-04 08:34:18 +01:00
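The decision flow described in this commit message can be sketched in plain C. This is a user-space illustration only: the union layout, the `ids_changed` and `fatal` members, the enum, and the function name are hypothetical, and clearing the event word on every path is an assumption made to keep the sketch short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event layout: byte-sized flags overlaid with one word. */
union sketch_event {
	uint32_t all;
	struct {
		uint8_t sched_switch;	/* task was scheduled out */
		uint8_t ids_changed;	/* unused here, shown for layout only */
		uint8_t user_irq;	/* entry from user was by interrupt */
		uint8_t fatal;		/* unused here, shown for layout only */
	};
};

enum sketch_action {
	SKETCH_NOTHING,			/* common case, nothing to do */
	SKETCH_UPDATE_IDS,		/* write IDs, skip section analysis */
	SKETCH_UPDATE_IDS_AND_CHECK_CS,	/* write IDs, then analyze section */
	SKETCH_CHECK_CS,		/* IDs unchanged, analyze section */
};

/*
 * ids_differ: cached CPU ID / MM CID no longer match the current ones.
 * user_cs:    critical section address read back from user space.
 */
static enum sketch_action sketch_exit_decision(union sketch_event *ev,
					       bool ids_differ, uint64_t user_cs)
{
	enum sketch_action action;

	/* Fast exit: one load, one branch, one store. */
	if (!ev->sched_switch) {
		ev->all = 0;
		return SKETCH_NOTHING;
	}

	if (ids_differ)
		action = (user_cs && ev->user_irq) ?
			 SKETCH_UPDATE_IDS_AND_CHECK_CS : SKETCH_UPDATE_IDS;
	else
		action = SKETCH_CHECK_CS;

	/* Assumption: pending events are consumed once a decision is made. */
	ev->all = 0;
	return action;
}
```

Overlaying the byte flags with a single word is what makes the "clear everything" step a single store in this sketch.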
Thomas Gleixner	39a167560a	rseq: Optimize event setting After removing the various condition bits earlier it turns out that one extra piece of information is needed to avoid setting event::sched_switch and TIF_NOTIFY_RESUME unconditionally on every context switch. The update of the RSEQ user space memory is only required when either the task was interrupted in user space and schedules or the CPU or MM CID changes in schedule() independent of the entry mode. Right now only the interrupt from user information is available. Add an event flag, which is set when the CPU or MM CID or both change. Evaluate this event in the scheduler to decide whether the sched_switch event and the TIF bit need to be set. It's an extra conditional in context_switch(), but the downside of unconditionally handling RSEQ after a context switch to user is way more significant. The utilized boolean logic minimizes this to a single conditional branch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.578058898@linutronix.de	2025-11-04 08:34:03 +01:00
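One way to read the "single conditional branch" remark is that two byte-sized flags are combined with a bitwise OR before testing. The sketch below shows that idea in user-space C; the struct, its member names, and the function are hypothetical, and the actual kernel code may combine the flags differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task event flags, one byte each. */
struct sketch_sched_event {
	uint8_t user_irq;	/* entry from user space was via interrupt */
	uint8_t ids_changed;	/* CPU ID or MM CID changed in schedule() */
	uint8_t sched_switch;	/* exit path must update rseq user memory */
};

/* Returns true when the caller would also have to raise the TIF bit. */
static inline bool sketch_sched_switch_event(struct sketch_sched_event *ev)
{
	/* Bitwise OR of the two flags leaves one test and one branch. */
	if (ev->user_irq | ev->ids_changed) {
		ev->sched_switch = 1;
		return true;
	}
	return false;
}
```

A task that entered via syscall and kept its CPU ID and MM CID takes the false path and leaves the context switch without any rseq work queued.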
Thomas Gleixner	e2d4f42271	rseq: Rework the TIF_NOTIFY handler Replace the whole logic with a new implementation, which is shared with signal delivery and the upcoming exit fast path. Contrary to the original implementation, this ignores invocations from KVM/IO-uring, which invoke resume_user_mode_work() with the @regs argument set to NULL. The original implementation updated the CPU/Node/MM CID fields, but that was just a side effect, which was addressing the problem that this invocation cleared TIF_NOTIFY_RESUME, which in turn could cause an update on return to user space to be lost. This problem has been addressed differently, so that it's no longer required to do that update before entering the guest. That might be considered a user visible change, when the host's thread TLS memory is mapped into the guest, but as this was never intentionally supported, this abuse of kernel internal implementation details is not considered an ABI break. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.517640811@linutronix.de	2025-11-04 08:33:54 +01:00
Thomas Gleixner	9f6ffd4ceb	rseq: Separate the signal delivery path Completely separate the signal delivery path from the notify handler as they have different semantics versus the event handling. The signal delivery only needs to ensure that the interrupted user context was not in a critical section or the section is aborted before it switches to the signal frame context. The signal frame context does not have the original instruction pointer anymore, so that can't be handled on exit to user space. No point in updating the CPU/CID ids as they might change again before the task returns to user space for real. The fast path optimization, which checks for the 'entry from user via interrupt' condition is only available for architectures which use the generic entry code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.455429038@linutronix.de	2025-11-04 08:33:47 +01:00
Thomas Gleixner	0f085b4188	rseq: Provide and use rseq_set_ids() Provide a new and straightforward implementation to set the IDs (CPU ID, Node ID and MM CID), which can be later inlined into the fast path. It does all operations in one scoped_user_rw_access() section and also retrieves the critical section member (rseq::cs_rseq) from user space to avoid another user..begin/end() pair. This is in preparation for optimizing the fast path to avoid extra work when not required. On rseq registration set the CPU ID fields to RSEQ_CPU_ID_UNINITIALIZED and node and MM CID to zero. That's the same as the kernel internal reset values. That makes the debug validation in the exit code work correctly on the first exit to user space. Use it to replace the whole related zoo in rseq.c. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.393972266@linutronix.de	2025-11-04 08:33:33 +01:00
Thomas Gleixner	eaa9088d56	rseq: Use static branch for syscall exit debug when GENERIC_IRQ_ENTRY=y Make the syscall exit debug mechanism available via the static branch on architectures which utilize the generic entry code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.333440475@linutronix.de	2025-11-04 08:33:27 +01:00
Thomas Gleixner	c1cbad8f99	rseq: Make exit debugging static branch based Disconnect it from the config switch and use the static debug branch. This is a temporary measure for validating the rework. At the end this check needs to be hidden behind lockdep as it has nothing to do with the other debug infrastructure, which mainly aids user space debugging by enabling a zoo of checks which terminate misbehaving tasks instead of letting them keep the hard to diagnose pieces. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.272660745@linutronix.de	2025-11-04 08:33:20 +01:00
Thomas Gleixner	f7ee1964ac	rseq: Replace the original debug implementation Just utilize the new infrastructure and put the original one to rest. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.212510692@linutronix.de	2025-11-04 08:33:12 +01:00
Thomas Gleixner	abc850e761	rseq: Provide and use rseq_update_user_cs() Provide a straightforward implementation to check for and eventually clear/fixup critical sections in user space. The non-debug version does only the minimal sanity checks and aims for efficiency. There are two attack vectors, which are checked for: 1) An abort IP which is in the kernel address space. That would cause at least x86 to return to kernel space via IRET. 2) A rogue critical section descriptor with an abort IP pointing to some arbitrary address, which is not preceded by the RSEQ signature. If the section descriptors are invalid then the resulting misbehaviour of the user space application is not the kernel's problem. The kernel provides a run-time switchable debug slow path, which implements the full zoo of checks including termination of the task when one of the gazillion conditions is not met. Replace the zoo in rseq.c with it and invoke it from the TIF_NOTIFY_RESUME handler. Move the remainders into the CONFIG_DEBUG_RSEQ section, which will be replaced and removed in a subsequent step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.151465632@linutronix.de	2025-11-04 08:32:57 +01:00
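The two minimal checks named above (abort IP outside the kernel address space, abort IP preceded by the RSEQ signature) can be sketched against a simulated user memory buffer. This is a user-space illustration, not the kernel's code: the address limit, the memory descriptor struct, and both function names are assumptions, and the signature is taken to be the 32-bit value stored directly before the abort IP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed end of the user address range for this sketch. */
#define SKETCH_USER_LIMIT	UINT64_C(0x0000800000000000)

/* Simulated user memory: buf[0] lives at virtual address base. */
struct sketch_user_mem {
	uint64_t	base;
	const uint8_t	*buf;
	size_t		len;
};

/* Bounds-checked 32-bit read, standing in for a faulting user access. */
static bool sketch_read_u32(const struct sketch_user_mem *m, uint64_t addr,
			    uint32_t *val)
{
	uint64_t off;

	if (addr < m->base)
		return false;
	off = addr - m->base;
	if (off > m->len || m->len - off < sizeof(*val))
		return false;
	memcpy(val, m->buf + off, sizeof(*val));
	return true;
}

static bool sketch_abort_ip_valid(const struct sketch_user_mem *m,
				  uint64_t abort_ip, uint32_t sig)
{
	uint32_t found;

	/* Check 1: never hand out an abort IP in the kernel address range. */
	if (abort_ip >= SKETCH_USER_LIMIT || abort_ip < sizeof(found))
		return false;

	/* Check 2: the signature must directly precede the abort IP. */
	if (!sketch_read_u32(m, abort_ip - sizeof(found), &found))
		return false;
	return found == sig;
}
```

Anything beyond these two checks is left to the debug slow path in the design described above, which is why the sketch does not validate the rest of the descriptor.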
Thomas Gleixner	9c37cb6e80	rseq: Provide static branch for runtime debugging Config based debug is rarely turned on and is not available easily when things go wrong. Provide a static branch to allow permanent integration of debug mechanisms along with the usual toggles in Kconfig, command line and debugfs. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.089270547@linutronix.de	2025-11-04 08:32:49 +01:00
Thomas Gleixner	5412910487	rseq: Expose lightweight statistics in debugfs Analyzing the call frequency without actually using tracing is helpful for analysis of this infrastructure. The overhead is minimal as it just increments a per CPU counter associated to each operation. The debugfs readout provides a racy sum of all counters. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.027916598@linutronix.de	2025-11-04 08:32:41 +01:00
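The counting scheme described above, one counter per CPU per operation with an unsynchronized sum at readout, can be sketched as follows. The fixed CPU count, the struct, and the function names are hypothetical; the kernel would use its per-CPU infrastructure rather than a plain array.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS	4	/* assumed CPU count for the sketch */

/* Hypothetical counter set; one instance per CPU. */
struct sketch_rseq_stats {
	uint64_t exit;
	uint64_t slowpath;
};

static struct sketch_rseq_stats sketch_stats[SKETCH_NR_CPUS];

/* Hot path: bump the local CPU's counter, no locking or atomics. */
static inline void sketch_stat_inc_exit(unsigned int cpu)
{
	sketch_stats[cpu].exit++;
}

/* Readout: sums without stopping writers, so the total may be stale. */
static uint64_t sketch_stat_sum_exit(void)
{
	uint64_t sum = 0;

	for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += sketch_stats[cpu].exit;
	return sum;
}
```

The readout is "racy" in the sense that counters may keep changing while they are summed, which is acceptable for statistics and keeps the increment path cheap.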
Thomas Gleixner	dab344753e	rseq: Provide tracepoint wrappers for inline code Provide tracepoint wrappers for the upcoming RSEQ exit to user space inline fast path, so that the header can be safely included by code which defines actual trace points. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.967114316@linutronix.de	2025-11-04 08:32:35 +01:00
Thomas Gleixner	2fc0e4b412	rseq: Record interrupt from user space For RSEQ the only relevant reason to inspect and eventually fixup (abort) user space critical sections is when user space was interrupted and the task was scheduled out. If the user to kernel entry was from a syscall no fixup is required. If user space invokes a syscall from a critical section it can keep the pieces as documented. This is only supported on architectures which utilize the generic entry code. If your architecture does not use it, bad luck. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.905067101@linutronix.de	2025-11-04 08:32:23 +01:00
Thomas Gleixner	4b7de6df20	rseq: Cache CPU ID and MM CID values In preparation for rewriting RSEQ exit to user space handling provide storage to cache the CPU ID and MM CID values which were written to user space. That prepares for a quick check, which avoids the update when nothing changed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.841964081@linutronix.de	2025-11-04 08:32:14 +01:00
Thomas Gleixner	4fc9225d19	sched: Move MM CID related functions to sched.h There is nothing mm specific in that and including mm.h can cause header recursion hell. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.778457951@linutronix.de	2025-11-04 08:32:04 +01:00
Thomas Gleixner	7702a9c285	entry: Inline irqentry_enter/exit_from/to_user_mode() There is no point to have this as a function which just inlines enter_from_user_mode(). The function call overhead is larger than the function itself. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.715309918@linutronix.de	2025-11-04 08:31:47 +01:00
Thomas Gleixner	54a5ab5624	entry: Remove syscall_enter_from_user_mode_prepare() Open code the only user in the x86 syscall code and reduce the zoo of functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.652839989@linutronix.de	2025-11-04 08:31:37 +01:00
Thomas Gleixner	5204be1679	entry: Clean up header Clean up the include ordering, kernel-doc and other trivialities before making further changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.590338411@linutronix.de	2025-11-04 08:31:14 +01:00
Thomas Gleixner	faba9d250e	rseq: Introduce struct rseq_data In preparation for a major rewrite of this code, provide a data structure for rseq management. Put all the rseq related data into it (except for the debug part), which allows simplifying fork/execve by using memset() and memcpy() instead of adding new fields to initialize over and over. Create a storage struct for event management as well and put the sched_switch event and an indicator for RSEQ on a task into it as a start. That uses a union, which allows to mask and clear the whole lot efficiently. The indicators are explicitly not a bit field. Bit fields generate abysmal code. The boolean members are defined as u8 as that actually guarantees that it fits. There seem to be strange architecture ABIs which need more than 8 bits for a boolean. The has_rseq member is redundant vs. task::rseq, but it turns out that boolean operations and quick checks on the union generate better code than fiddling with separate entities and data types. This struct will be extended over time to carry more information. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.527086690@linutronix.de	2025-11-04 08:30:50 +01:00
Thomas Gleixner	566d8015f7	rseq: Avoid CPU/MM CID updates when no event pending There is no need to update these values unconditionally if there is no event pending. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.462964916@linutronix.de	2025-11-04 08:30:43 +01:00
Thomas Gleixner	83409986f4	rseq, virt: Retrigger RSEQ after vcpu_run() Hypervisors invoke resume_user_mode_work() before entering the guest, which clears TIF_NOTIFY_RESUME. The @regs argument is NULL as there is no user space context available to them, so the rseq notify handler skips inspecting the critical section, but updates the CPU/MM CID values unconditionally so that the eventual pending rseq event is not lost on the way to user space. This is a pointless exercise as the task might be rescheduled before actually returning to user space and it creates unnecessary work in the vcpu_run() loops. It's way more efficient to ignore that invocation based on @regs == NULL and let the hypervisors re-raise TIF_NOTIFY_RESUME after returning from the vcpu_run() loop before returning from the ioctl(). This ensures that a pending RSEQ update is not lost and the IDs are updated before returning to user space. Once the RSEQ handling is decoupled from TIF_NOTIFY_RESUME, this turns into a NOOP. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20251027084306.399495855@linutronix.de	2025-11-04 08:30:23 +01:00
Thomas Gleixner	d923739e2e	rseq: Simplify the event notification Since commit `0190e4198e` ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags") the bits in task::rseq_event_mask are meaningless and just extra work in terms of setting them individually. Aside from that, the only relevant point where an event has to be raised is context switch. Neither the CPU nor MM CID can change without going through a context switch. Collapse them all into a single boolean which simplifies the code a lot and remove the pointless invocations which have been sprinkled all over the place for no value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.336978188@linutronix.de	2025-11-04 08:30:09 +01:00
Thomas Gleixner	067b3b41b4	rseq: Simplify registration There is no point to read the critical section element in the newly registered user space RSEQ struct first in order to clear it. Just clear it and be done with it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.274661227@linutronix.de	2025-11-04 08:30:05 +01:00
Thomas Gleixner	41b43a6ba3	rseq: Remove the ksig argument from rseq_handle_notify_resume() There is no point for this being visible in the resume_to_user_mode() handling. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.211520245@linutronix.de	2025-11-04 08:30:01 +01:00
Thomas Gleixner	77f19e4d4f	rseq: Move algorithm comment to top Move the comment which documents the RSEQ algorithm to the top of the file, so it does not create horrible diffs later when the actual implementation is fed into the mincer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.149519580@linutronix.de	2025-11-04 08:29:52 +01:00
Thomas Gleixner	fdc0f39d28	rseq: Condense the inline stubs Scrolling over tons of pointless { } lines to find the actual code is annoying at best. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.085971048@linutronix.de	2025-11-04 08:29:08 +01:00
Thomas Gleixner	3ca59da7aa	rseq: Avoid pointless evaluation in __rseq_notify_resume() The RSEQ critical section mechanism only clears the event mask when a critical section is registered, otherwise it is stale and collects bits. That means once a critical section is installed the first invocation of that code when TIF_NOTIFY_RESUME is set will abort the critical section, even when the TIF bit was not raised by the rseq preempt/migrate/signal helpers. This also has a performance implication because TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by quite some infrastructure. That means every invocation of __rseq_notify_resume() goes unconditionally through the heavy lifting of user space access and consistency checks even if there is no reason to do so. Keeping the stale event mask around when exiting to user space also prevents it from being utilized by the upcoming time slice extension mechanism. Avoid this by reading and clearing the event mask before doing the user space critical section access with interrupts or preemption disabled, which ensures that the read and clear operation is CPU local atomic versus scheduling and the membarrier IPI. This is correct as after re-enabling interrupts/preemption any relevant event will set the bit again and raise TIF_NOTIFY_RESUME, which makes the user space exit code take another round of TIF bit clearing. If the event mask was non-zero, invoke the slow path. On debug kernels the slow path is invoked unconditionally and the result of the event mask evaluation is handed in. Add an exit path check after the TIF bit loop, which validates on debug kernels that the event mask is zero before exiting to user space. While at it reword the convoluted comment why the pt_regs pointer can be NULL under certain circumstances. 
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.022571576@linutronix.de	2025-11-04 08:28:38 +01:00
Thomas Gleixner	3ce17e6909	select: Convert to scoped user access Replace the open coded implementation with the scoped user access guard. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027083745.862419776@linutronix.de	2025-11-04 08:28:34 +01:00
Thomas Gleixner	e02718c986	x86/futex: Convert to scoped user access Replace the open coded implementation with the scoped user access guards. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251027083745.799714344@linutronix.de	2025-11-04 08:28:29 +01:00
Thomas Gleixner	e4e28fd698	futex: Convert to get/put_user_inline() Replace the open coded implementation with the new get/put_user_inline() helpers. This might be replaced by a regular get/put_user(), but that needs a proper performance evaluation. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251027083745.736737934@linutronix.de	2025-11-04 08:28:23 +01:00
Thomas Gleixner	b2cfc0cd68	uaccess: Provide put/get_user_inline() Provide convenience wrappers around scoped user access similar to put/get_user(), which reduce the usage sites to: if (!get_user_inline(val, ptr)) return -EFAULT; Should only be used if there is a demonstrable performance benefit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027083745.609031602@linutronix.de	2025-11-04 08:28:15 +01:00
Thomas Gleixner	e497310b4f	uaccess: Provide scoped user access regions User space access regions are tedious and require similar code patterns all over the place: if (!user_read_access_begin(from, sizeof(*from))) return -EFAULT; unsafe_get_user(val, from, Efault); user_read_access_end(); return 0; Efault: user_read_access_end(); return -EFAULT; This got worse with the recent addition of masked user access, which optimizes the speculation prevention: if (can_do_masked_user_access()) from = masked_user_read_access_begin((from)); else if (!user_read_access_begin(from, sizeof(*from))) return -EFAULT; unsafe_get_user(val, from, Efault); user_read_access_end(); return 0; Efault: user_read_access_end(); return -EFAULT; There have been issues with using the wrong user_*_access_end() variant in the error path and other typical Copy&Pasta problems, e.g. using the wrong fault label in the user accessor which ends up using the wrong access end variant. These patterns beg for scopes with automatic cleanup. The resulting outcome is: scoped_user_read_access(from, Efault) unsafe_get_user(val, from, Efault); return 0; Efault: return -EFAULT; The scope guarantees the proper cleanup for the access mode is invoked both in the success and the failure (fault) path. The scoped_user_$MODE_access() macros are implemented as self terminating nested for() loops. Thanks to Andrew Cooper for pointing me at them. The scope can therefore be left with 'break', 'goto' and 'return'. Even 'continue' "works" due to the self termination mechanism. 
Both GCC and clang optimize all the convoluted macro maze out and the above results with clang in: b80: f3 0f 1e fa endbr64 b84: 48 b8 ef cd ab 89 67 45 23 01 movabs $0x123456789abcdef,%rax b8e: 48 39 c7 cmp %rax,%rdi b91: 48 0f 47 f8 cmova %rax,%rdi b95: 90 nop b96: 90 nop b97: 90 nop b98: 31 c9 xor %ecx,%ecx b9a: 8b 07 mov (%rdi),%eax b9c: 89 06 mov %eax,(%rsi) b9e: 85 c9 test %ecx,%ecx ba0: 0f 94 c0 sete %al ba3: 90 nop ba4: 90 nop ba5: 90 nop ba6: c3 ret Which looks as compact as it gets. The NOPs are placeholders for STAC/CLAC. GCC emits the fault path separately: bf0: f3 0f 1e fa endbr64 bf4: 48 b8 ef cd ab 89 67 45 23 01 movabs $0x123456789abcdef,%rax bfe: 48 39 c7 cmp %rax,%rdi c01: 48 0f 47 f8 cmova %rax,%rdi c05: 90 nop c06: 90 nop c07: 90 nop c08: 31 d2 xor %edx,%edx c0a: 8b 07 mov (%rdi),%eax c0c: 89 06 mov %eax,(%rsi) c0e: 85 d2 test %edx,%edx c10: 75 09 jne c1b <afoo+0x2b> c12: 90 nop c13: 90 nop c14: 90 nop c15: b8 01 00 00 00 mov $0x1,%eax c1a: c3 ret c1b: 90 nop c1c: 90 nop c1d: 90 nop c1e: 31 c0 xor %eax,%eax c20: c3 ret The fault labels for the scoped() macros and the fault labels for the actual user space accessors can be shared and must be placed outside of the scope. If masked user access is enabled on an architecture, then the pointer handed in to scoped_user_$MODE_access() can be modified to point to a guaranteed faulting user address. This modification is only scope local as the pointer is aliased inside the scope. When the scope is left the alias is no longer in effect. IOW the original pointer value is preserved so it can be used e.g. for fixup or diagnostic purposes in the fault path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027083745.546420421@linutronix.de	2025-11-04 08:27:52 +01:00
Thomas Gleixner	2db48d8bf8	arm64: uaccess: Use unsafe wrappers for ASM GOTO Clang propagates a provided label, which is outside of a cleanup scope to ASM GOTO despite the fact that __raw_get_mem() has a local label for that purpose: "error: cannot jump from this asm goto statement to one of its possible targets" Using the unsafe wrapper with the extra local label indirection cures that. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-11-04 08:27:20 +01:00
Thomas Gleixner	43cc54d8db	s390/uaccess: Use unsafe wrappers for ASM GOTO ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope: bool foo(u32 __user *p, u32 val) { scoped_guard(pagefault) unsafe_put_user(val, p, efault); return true; efault: return false; } It ends up leaking the pagefault disable counter in the fault path. clang at least fails the build. S390 is not affected for unsafe_*_user() as it uses its own local label already, but __get/put_kernel_nofault() lack that. Rename them to arch_*_kernel_nofault() which makes the generic uaccess header wrap it with a local label that makes both compilers emit correct code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://patch.msgid.link/20251027083745.483079889@linutronix.de	2025-11-03 15:26:10 +01:00
Thomas Gleixner	0988ea18c6	riscv/uaccess: Use unsafe wrappers for ASM GOTO ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope: bool foo(u32 __user *p, u32 val) { scoped_guard(pagefault) unsafe_put_user(val, p, efault); return true; efault: return false; } It ends up leaking the pagefault disable counter in the fault path. clang at least fails the build. Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic uaccess header wrap it with a local label that makes both compilers emit correct code. Same for the kernel_nofault() variants. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027083745.419351819@linutronix.de	2025-11-03 15:26:10 +01:00
Thomas Gleixner	5002dd5314	powerpc/uaccess: Use unsafe wrappers for ASM GOTO ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope: bool foo(u32 __user *p, u32 val) { scoped_guard(pagefault) unsafe_put_user(val, p, efault); return true; efault: return false; } It ends up leaking the pagefault disable counter in the fault path. clang at least fails the build. Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic uaccess header wrap it with a local label that makes both compilers emit correct code. Same for the kernel_nofault() variants. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027083745.356628509@linutronix.de	2025-11-03 15:26:09 +01:00
Thomas Gleixner	14219398e3	x86/uaccess: Use unsafe wrappers for ASM GOTO ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope: bool foo(u32 __user *p, u32 val) { scoped_guard(pagefault) unsafe_put_user(val, p, efault); return true; efault: return false; } It ends up leaking the pagefault disable counter in the fault path. clang at least fails the build. Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic uaccess header wrap it with a local label that makes both compilers emit correct code. Same for the kernel_nofault() variants. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027083745.294359925@linutronix.de	2025-11-03 15:26:09 +01:00
Thomas Gleixner	3eb6660f26	uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user() ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope: bool foo(u32 __user *p, u32 val) { scoped_guard(pagefault) unsafe_put_user(val, p, efault); return true; efault: return false; } e80: e8 00 00 00 00 call e85 <foo+0x5> e85: 65 48 8b 05 00 00 00 00 mov %gs:0x0(%rip),%rax e8d: 83 80 04 14 00 00 01 addl $0x1,0x1404(%rax) // pf_disable++ e94: 89 37 mov %esi,(%rdi) e96: 83 a8 04 14 00 00 01 subl $0x1,0x1404(%rax) // pf_disable-- e9d: b8 01 00 00 00 mov $0x1,%eax // success ea2: e9 00 00 00 00 jmp ea7 <foo+0x27> // ret ea7: 31 c0 xor %eax,%eax // fail ea9: e9 00 00 00 00 jmp eae <foo+0x2e> // ret which is broken as it leaks the pagefault disable counter on failure. Clang at least fails the build. Linus suggested to add a local label into the macro scope and let that jump to the actual caller supplied error label. __label__ local_label; \ arch_unsafe_get_user(x, ptr, local_label); \ if (0) { \ local_label: \ goto label; \ That works for both GCC and clang. 
clang: c80: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) c85: 65 48 8b 0c 25 00 00 00 00 mov %gs:0x0,%rcx c8e: ff 81 04 14 00 00 incl 0x1404(%rcx) // pf_disable++ c94: 31 c0 xor %eax,%eax // set retval to false c96: 89 37 mov %esi,(%rdi) // write c98: b0 01 mov $0x1,%al // set retval to true c9a: ff 89 04 14 00 00 decl 0x1404(%rcx) // pf_disable-- ca0: 2e e9 00 00 00 00 cs jmp ca6 <foo+0x26> // ret The exception table entry points correctly to c9a GCC: f70: e8 00 00 00 00 call f75 <baz+0x5> f75: 65 48 8b 05 00 00 00 00 mov %gs:0x0(%rip),%rax f7d: 83 80 04 14 00 00 01 addl $0x1,0x1404(%rax) // pf_disable++ f84: 8b 17 mov (%rdi),%edx f86: 89 16 mov %edx,(%rsi) f88: 83 a8 04 14 00 00 01 subl $0x1,0x1404(%rax) // pf_disable-- f8f: b8 01 00 00 00 mov $0x1,%eax // success f94: e9 00 00 00 00 jmp f99 <baz+0x29> // ret f99: 83 a8 04 14 00 00 01 subl $0x1,0x1404(%rax) // pf_disable-- fa0: 31 c0 xor %eax,%eax // fail fa2: e9 00 00 00 00 jmp fa7 <baz+0x37> // ret The exception table entry points correctly to f99 So both compilers optimize out the extra goto and emit correct and efficient code. Provide a generic wrapper to do that to avoid modifying all the affected architecture specific implementation with that workaround. The only change required for architectures is to rename unsafe_*_user() to arch_unsafe_*_user(). That's done in subsequent changes. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/877bweujtn.ffs@tglx	2025-11-03 15:26:09 +01:00
Thomas Gleixner	44c5b6768e	ARM: uaccess: Implement missing __get_user_asm_dword() When CONFIG_CPU_SPECTRE=n then get_user() is missing the 8 byte ASM variant for no real good reason. This prevents using get_user(u64) in generic code. Implement it as a sequence of two 4-byte reads with LE/BE awareness and make the unsigned long (or long long) type for the intermediate variable to read into dependent on the target type. The __long_type() macro and idea was lifted from PowerPC. Thanks to Christophe for pointing it out. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509120155.pFgwfeUD-lkp@intel.com/ Link: https://patch.msgid.link/20251027083745.168468637@linutronix.de	2025-11-03 15:26:09 +01:00
Rob Herring (Arm)	989b40b757	perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs Add CPU PMU compatible strings for Cortex-A320, Cortex-A520AE, Cortex-A720AE, and C1 Nano/Premium/Pro/Ultra. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-03 14:21:55 +00:00
Ma Ke	970e1e4180	perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister() driver_find_device() calls get_device() to increment the reference count once a matching device is found. device_release_driver() releases the driver, but it does not decrease the reference count that was incremented by driver_find_device(). At the end of the loop, there is no put_device() to balance the reference count. To avoid reference count leakage, add put_device() to decrease the reference count. Found by code review. Cc: stable@vger.kernel.org Fixes: `bfc653aa89` ("perf: arm_cspmu: Separate Arm and vendor module") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-03 14:20:03 +00:00
Robin Murphy	8fa08f8835	perf/arm-ni: Add NoC S3 support NoC S3 and its SI L1 sibling look largely similar to their predecessors, but add the notion of subfeatures to the discovery process, which we now use to find the event muxes for each device node. Plus, as ever, more mildly annoying shuffling around of some of the PMU registers (this time it's the counters...) Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-03 14:19:17 +00:00
Besar Wicaksono	decc3684c2	perf/arm_cspmu: nvidia: Add pmevfiltr2 support Support NVIDIA PMU that utilizes the optional event filter2 register. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-03 13:35:07 +00:00
Besar Wicaksono	82dfd72bfb	perf/arm_cspmu: nvidia: Add revision id matching Distinguish NVIDIA devices by revision and variant bits in PMIIDR register in addition to product id. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-03 13:35:07 +00:00
Besar Wicaksono	04330be8dc	perf/arm_cspmu: Add pmpidr support The PMIIDR value is composed of the values in the PMPIDR registers. We can use the PMPIDR registers as an alternative for device identification on systems that do not implement PMIIDR. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-03 13:35:07 +00:00
Besar Wicaksono	a2573bc790	perf/arm_cspmu: Add callback to reset filter config Implementer may need to reset a filter config when stopping a counter, thus adding a callback for this. Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-03 13:35:07 +00:00
Yicong Yang	c3d78c34ad	perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if one of the SMT siblings is not idle on a multi-threaded implementation. So don't use it on SMT cores. Introduce topology_core_has_smt() for knowing the SMT implementation and cache it in arm_pmu::has_smt during allocation. When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this patch we'll get: [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1 --taskset 2 --timeout 1 [...] Performance counter stats for 'CPU(s) 2-3': CPU2 2880457316 cycles CPU3 2880459810 cycles 1.254688470 seconds time elapsed With this patch the idle state of CPU3 is observed as expected: [root@client1 ~]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1 --taskset 2 --timeout 1 [...] Performance counter stats for 'CPU(s) 2-3': CPU2 2558580492 cycles CPU3 305749 cycles 1.113626410 seconds time elapsed Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-03 13:28:48 +00:00
Lukas Wunner	51d0656959	genirq/manage: Reduce priority of forced secondary interrupt handler Crystal reports that the PCIe Advanced Error Reporting driver gets stuck in an infinite loop on PREEMPT_RT: Both the primary interrupt handler aer_irq() as well as the secondary handler aer_isr() are forced into threads with identical priority. Crystal writes that on the ARM system in question, the primary handler has to clear an error in the Root Error Status register... "before the next error happens, or else the hardware will set the Multiple ERR_COR Received bit. If that bit is set, then aer_isr() can't rely on the Error Source Identification register, so it scans through all devices looking for errors -- and for some reason, on this system, accessing the AER registers (or any Config Space above 0x400, even though there are capabilities located there) generates an Unsupported Request Error (but returns valid data). Since this happens more than once, without aer_irq() preempting, it causes another multi error and we get stuck in a loop." The issue does not show on non-PREEMPT_RT because the primary handler runs in hardirq context and thus can preempt the threaded secondary handler, clear the Root Error Status register and prevent the secondary handler from getting stuck. Emulate the same behavior on PREEMPT_RT by assigning a lower default priority to the secondary handler if the primary handler is forced into a thread. Reported-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Crystal Wood <crwood@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/f6dcdb41be2694886b8dbf4fe7b3ab89e9d5114c.1761569303.git.lukas@wunner.de Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/	2025-11-01 21:30:02 +01:00
Frederic Weisbecker	ba14500e4b	timers/migration: Remove dead code handling idle CPU checking for remote timers Idle migrators don't walk the whole tree in order to find out if there are timers to migrate because they recorded the next deadline to be verified within a single check in tmigr_requires_handle_remote(). Remove the related dead code and data. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251024132536.39841-7-frederic@kernel.org	2025-11-01 20:38:25 +01:00
Frederic Weisbecker	93643b90d6	timers/migration: Remove unused "cpu" parameter from tmigr_get_group() Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251024132536.39841-6-frederic@kernel.org	2025-11-01 20:38:25 +01:00
Frederic Weisbecker	3c8eb36e2a	timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy The CPU doing the prepare work for a remote target must be online from the tree point of view and its hierarchy must be active, otherwise propagating its active state up to the new root branch would be either incorrect or racy. Assert those conditions with more sanity checks. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251024132536.39841-5-frederic@kernel.org	2025-11-01 20:38:25 +01:00
Frederic Weisbecker	5eb579dfd4	timers/migration: Fix imbalanced NUMA trees When a CPU from a new node boots, the old root may happen to be connected to the new root even if their node mismatch, as depicted in the following scenario: 1) CPU 0 boots and creates the first group for node 0. [GRP0:0] node 0 \| CPU 0 2) CPU 1 from node 1 boots and creates a new top that corresponds to node 1, but it also connects the old root from node 0 to the new root from node 1 by mistake. [GRP1:0] node 1 / \ / \ [GRP0:0] [GRP0:1] node 0 node 1 \| \| CPU 0 CPU 1 3) This eventually leads to an imbalanced tree where some node 0 CPUs migrate node 1 timers (and vice versa) way before reaching the crossnode groups, resulting in more frequent remote memory accesses than expected. [GRP2:0] NUMA_NO_NODE / \ [GRP1:0] [GRP1:1] node 1 node 0 / \ \| / \ [...] [GRP0:0] [GRP0:1] node 0 node 1 \| \| CPU 0... CPU 1... A balanced tree should only contain groups having children that belong to the same node: [GRP2:0] NUMA_NO_NODE / \ [GRP1:0] [GRP1:0] node 0 node 1 / \ / \ / \ / \ [GRP0:0] [...] [...] [GRP0:1] node 0 node 1 \| \| CPU 0... CPU 1... In order to fix this, the hierarchy must be unfolded up to the crossnode level as soon as a node mismatch is detected. For example the stage 2 above should lead to this layout: [GRP2:0] NUMA_NO_NODE / \ [GRP1:0] [GRP1:1] node 0 node 1 / \ / \ [GRP0:0] [GRP0:1] node 0 node 1 \| \| CPU 0 CPU 1 This means that not only GRP1:0 must be created but also GRP1:1 and GRP2:0 in order to prepare a balanced tree for next CPUs to boot. Fixes: `7ee9887703` ("timers: Implement the hierarchical pull model") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251024132536.39841-4-frederic@kernel.org	2025-11-01 20:38:25 +01:00
Frederic Weisbecker	fa9620355d	timers/migration: Remove locking on group connection Initializing the tmc's group, the group's number of children and the group's parent can all be done without locking because: 1) Reading the group's parent and its group mask is done locklessly. 2) The connections prepared for a given CPU hierarchy are visible to the target CPU once online, thanks to the CPU hotplug enforced memory ordering. 3) In case of a newly created upper level, the new root and its connections and initialization are made visible by the CPU which made the connections. When that CPU goes idle in the future, the new link is published by tmigr_inactive_up() through the atomic RmW on ->migr_state. 4) If CPUs were still walking up the active hierarchy, they could observe the new root earlier. In this case the ordering is enforced by an early initialization of the group mask and by barriers that maintain address dependency as explained in: `b729cc1ec2` ("timers/migration: Fix another race between hotplug and idle entry/exit") `de3ced72a7` ("timers/migration: Enforce group initialization visibility to tree walkers") 5) Timers are propagated by a chain of group locking from the bottom to the top. And while doing so, the tree also propagates group links and initialization. Therefore remote expiration, which also relies on group locking, will observe those links and initialization while holding the root lock before walking the tree remotely and update remote timers. This is especially important for migrators in the active hierarchy that may observe the new root early. Therefore the locking is unnecessary at initialization. If anything, it just brings confusion. Remove it. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251024132536.39841-3-frederic@kernel.org	2025-11-01 20:38:25 +01:00
Frederic Weisbecker	6c181b5667	timers/migration: Convert "while" loops to use "for" Both the "do while" and "while" loops in tmigr_setup_groups() eventually mimic the behaviour of "for" loops. Simplify accordingly. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251024132536.39841-2-frederic@kernel.org	2025-11-01 20:38:24 +01:00
Steve Wahl	4138787408	tick/sched: Limit non-timekeeper CPUs calling jiffies update On large NUMA systems, while running a test program that saturates the inter-processor and inter-NUMA links, acquiring the jiffies_lock can be very expensive. If the cpu designated to do jiffies updates (tick_do_timer_cpu) gets delayed and other cpus decide to do the jiffies update themselves, a large number of them decide to do so at the same time. The inexpensive check against tick_next_period is far quicker than actually acquiring the lock, so most of these get in line to obtain the lock. If obtaining the lock is slow enough, this spirals into the vast majority of CPUs continuously being stuck waiting for this lock, just to obtain it and find out that time has already been updated by another cpu. For example, on one random entry to kdb by manually-injected NMI, 2912 of 3840 CPUs were observed to be stuck there. To avoid this, allow only one non-timekeeper CPU to call tick_do_update_jiffies64() at any given time, resetting ts->stalled jiffies only if the jiffies update function is actually called. With this change, manually interrupting the test at most two CPUs are observed to invoke tick_do_update_jiffies64() - the timekeeper and one other. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20251027183456.343407-1-steve.wahl@hpe.com	2025-11-01 20:25:53 +01:00
Muchun Song	9ea2b810d5	genirq/proc: Fix race in show_irq_affinity() Reading /proc/irq/N/smp_affinity* races with irq_set_affinity() and irq_move_masked_irq(), leading to old or torn output for users. After a user writes a new CPU mask to /proc/irq/N/affinity*, the syscall returns success, yet a subsequent read of the same file immediately returns a value different from what was just written. That's due to a race between show_irq_affinity() and irq_move_masked_irq() which lets the read observe a transient, inconsistent affinity mask. Cure it by guarding the read with irq_desc::lock. [ tglx: Massaged change log ] Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251028090408.76331-1-songmuchun@bytedance.com	2025-10-31 22:30:05 +01:00
Marc Zyngier	68c4c159a0	genirq: Fix percpu_devid irq affinity documentation Stephen points out that some of the percpu_devid irq affinity documentation is either missing or not matching the data structures. Address all the issues in one go. Fixes: `87b0031f7f` ("irqdomain: Add firmware info reporting interface") Fixes: `258e7d28a3` ("genirq: Add affinity to percpu_devid interrupt requests") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251030143032.2035987-1-maz@kernel.org	2025-10-31 22:25:34 +01:00
John Allen	92ad6505a4	x86/sev: Include XSS value in GHCB CPUID request When a guest issues a CPUID instruction for Fn0000000D_x01, the hypervisor may be intercepting the CPUID instruction and need to access the guest XSS value. For SEV-ES, the XSS value is encrypted and needs to be included in the GHCB to be visible to the hypervisor. Signed-off-by: John Allen <john.allen@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/all/20250924200852.4452-3-john.allen@amd.com/	2025-10-30 17:47:49 +01:00
John Allen	9249bcdea0	x86/boot: Move boot_msr helpers to asm/shared/msr.h The boot_{rdmsr,wrmsr}() helpers are just the barebones MSR access functionality, without any tracing or exception handling glue as it is done in kernel proper. Move these helpers to asm/shared/msr.h and rename to raw_{rdmsr,wrmsr}() to indicate what they are. [ bp: Correct the reason why those helpers exist. I should've caught that in the original patch that added them: `176db62257` ("x86/boot: Introduce helpers for MSR reads/writes") but oh well... - fixup include path delimiters to <> ] Signed-off-by: John Allen <john.allen@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/all/20250924200852.4452-2-john.allen@amd.com	2025-10-30 16:29:53 +01:00
Yu Peng	ca8313fd83	x86/microcode: Mark early_parse_cmdline() as __init Fix section mismatch warning reported by modpost: .text:early_parse_cmdline() -> .init.data:boot_command_line The function early_parse_cmdline() is only called during init and accesses init data, so mark it __init to match its usage. [ bp: This happens only when the toolchain fails to inline the function and I haven't been able to reproduce it with any toolchain I'm using. Patch is obviously correct regardless. ] Signed-off-by: Yu Peng <pengyu@kylinos.cn> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/all/20251030123757.1410904-1-pengyu@kylinos.cn	2025-10-30 14:33:31 +01:00
Borislav Petkov (AMD)	8d17104506	x86/microcode/AMD: Select which microcode patch to load All microcode patches up to the proper BIOS Entrysign fix are loaded only after the sha256 signature carried in the driver has been verified. Microcode patches after the Entrysign fix has been applied, do not need that signature verification anymore. In order to not abandon machines which haven't received the BIOS update yet, add the capability to select which microcode patch to load. The corresponding microcode container supplied through firmware-linux has been modified to carry two patches per CPU type (family/model/stepping) so that the proper one gets selected. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Waiman Long <longman@redhat.com> Link: https://patch.msgid.link/20251027133818.4363-1-bp@kernel.org	2025-10-30 14:29:54 +01:00
Yazen Ghannam	187d1b27a1	RAS/AMD/ATL: Require PRM support for future systems Currently, the AMD Address Translation Library will fail to load for new, unrecognized systems (based on Data Fabric revision). The intention is to prevent the code from executing on new systems and returning incorrect results. Recent AMD systems, however, may provide UEFI PRM handlers for address translation. This is code provided by the platform through BIOS tables. These are the preferred method for translation, and the Linux native code can be used as a fallback. Future AMD systems are expected to provide PRM handlers by default. And Linux native code will not be used. Adjust the ATL init code so that new, unrecognized systems will default to using PRM handlers only. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: "Mario Limonciello (AMD)" <superm1@kernel.org> Link: https://patch.msgid.link/all/20251017-wip-atl-prm-v2-2-7ab1df4a5fbc@amd.com	2025-10-27 19:56:41 +01:00
Marc Zyngier	fa9d277738	perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer Having removed the use of the cpu_armpmu per-CPU variable from the interrupt handling, the only user left is the BRBE scheduler hook. It is easy to drop the use of this variable by following the pointer to the generic PMU structure, and get the arm_pmu structure from there. Perform the conversion and kill cpu_armpmu altogether. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-27-maz@kernel.org	2025-10-27 17:16:37 +01:00
Marc Zyngier	ebac4649fc	irqdomain: Kill of_node_to_fwnode() helper There are no in-tree users of this helper since `b13b41cc3d` ("misc: ti_fpc202: Switch to of_fwnode_handle()"), and it is replaced by of_fwnode_handle(). Get rid of it. Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-26-maz@kernel.org	2025-10-27 17:16:37 +01:00
Marc Zyngier	ee2d50a9f5	genirq: Kill irq_{g,s}et_percpu_devid_partition() These two helpers do not have any user anymore, and can be removed, together with the affinity field kept in the irqdesc structure. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-25-maz@kernel.org	2025-10-27 17:16:37 +01:00
Marc Zyngier	c620438ef2	irqchip: Kill irq-partition-percpu This code is now completely unused, and nobody will ever miss it. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-24-maz@kernel.org	2025-10-27 17:16:36 +01:00
Marc Zyngier	7443813f10	irqchip/apple-aic: Drop support for custom PMU irq partitions Similarly to what has been done for GICv3, drop the irq partitioning support from the AIC driver, effectively merging the two per-cpu interrupts for the PMU. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Sven Peter <sven@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-23-maz@kernel.org	2025-10-27 17:16:36 +01:00
Marc Zyngier	64b9738eaa	irqchip/gic-v3: Drop support for custom PPI partitions The only thing getting in the way of correctly handling PPIs the way they were intended is the GICv3 hack that deals with PPI partitions. Remove that code, allowing the common code to kick in. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-22-maz@kernel.org	2025-10-27 17:16:36 +01:00
Marc Zyngier	4cdf4813f5	coresight: trbe: Request specific affinities for per CPU interrupts Let the TRBE driver request interrupts with an affinity mask matching the TRBE implementation affinity. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://patch.msgid.link/20251020122944.3074811-21-maz@kernel.org	2025-10-27 17:16:36 +01:00
Marc Zyngier	f8112d29ba	perf: arm_spe_pmu: Request specific affinities for per CPU interrupts Let the SPE driver request interrupts with an affinity mask matching the SPE implementation affinity. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-20-maz@kernel.org	2025-10-27 17:16:36 +01:00
Will Deacon	54b350fa8e	perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts Let the PMU driver request both NMIs and normal interrupts with an affinity mask matching the PMU affinity. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-19-maz@kernel.org	2025-10-27 17:16:35 +01:00
Marc Zyngier	c734af3b2b	genirq: Add request_percpu_irq_affinity() helper While it would be nice to simply make request_percpu_irq() take an affinity mask, the churn is likely to be on the irritating side given that most drivers do not give a damn about affinities. So take the more innocuous path to provide a helper that parallels request_percpu_irq(), with an affinity as a bonus argument. Yes, request_percpu_irq_affinity() is a bit of a mouthful. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-18-maz@kernel.org	2025-10-27 17:16:35 +01:00
Marc Zyngier	bdf4e2ac29	genirq: Allow per-cpu interrupt sharing for non-overlapping affinities Interrupt sharing for percpu-devid interrupts is forbidden, and for good reasons. These are interrupts generated from a CPU and handled by itself (timer, for example). Nobody in their right mind would put two devices on the same pin (and if they have, they get to keep the pieces...). But this also prevents more benign cases, where devices are connected to groups of CPUs, and for which the affinities are not overlapping. Effectively, the only thing they share is the interrupt number, and nothing else. Tweak the definition of IRQF_SHARED applied to percpu_devid interrupts to allow this particular use case. This results in extra validation at the point of the interrupt being setup and freed, as well as a tiny bit of extra complexity for interrupts at handling time (to pick the correct irqaction). Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-17-maz@kernel.org	2025-10-27 17:16:35 +01:00
Marc Zyngier	b9c6aa9efc	genirq: Update request_percpu_nmi() to take an affinity Continue spreading the notion of affinity to the per CPU interrupt request code by updating the call sites that use request_percpu_nmi() (all two of them) to take an affinity pointer. This pointer is firmly NULL for now. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-16-maz@kernel.org	2025-10-27 17:16:35 +01:00
Marc Zyngier	258e7d28a3	genirq: Add affinity to percpu_devid interrupt requests Add an affinity field to both the irqaction structure and the interrupt request primitives. Nothing is making use of it yet, and the only value used is NULL, which is used as a shorthand for cpu_possible_mask. This will shortly get used with actual affinities. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-15-maz@kernel.org	2025-10-27 17:16:34 +01:00
Marc Zyngier	9047a39daa	genirq: Factor-in percpu irqaction creation Move the code creating a per-cpu irqaction into its own helper, so that future changes to this code can be kept localised. At the same time, fix the documentation which appears to say the wrong thing when it comes to interrupts being automatically enabled (percpu_devid interrupts never are). Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251020122944.3074811-14-maz@kernel.org	2025-10-27 17:16:34 +01:00
Marc Zyngier	5c2b2cc472	genirq: Merge irqaction::{dev_id,percpu_dev_id} When irqaction::percpu_dev_id was introduced, it was hoped that it could be part of an anonymous union with dev_id, as the two fields are mutually exclusive. However, toolchains used at the time were often showing terrible support for anonymous unions, breaking the build on a number of architectures. It was therefore decided to keep the two fields separate and address this down the line. 14 years later, the compiler dark age is over, and there is universal support for anonymous unions. Get a whole pointer back that can immediately be spent on something else. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-13-maz@kernel.org	2025-10-27 17:16:34 +01:00
Marc Zyngier	5ff78c8de9	genirq: Kill handle_percpu_devid_fasteoi_nmi() There is no in-tree user of this flow handler anymore, so simply remove it. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-12-maz@kernel.org	2025-10-27 17:16:34 +01:00
Marc Zyngier	21bbbc50f3	irqchip/gic-v3: Switch high priority PPIs over to handle_percpu_devid_irq() It so appears that handle_percpu_devid_irq() is extremely similar to handle_percpu_devid_fasteoi_nmi(), and that the differences do not justify the horrid machinery in the GICv3 driver to handle the flow handler switch. Stick with the standard flow handler, even for NMIs. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-11-maz@kernel.org	2025-10-27 17:16:34 +01:00
Marc Zyngier	f6c8aced7c	perf: arm_spe_pmu: Convert to new interrupt affinity retrieval API Now that the relevant interrupt controllers are equipped with a callback returning the affinity of per-CPU interrupts, switch the ARM SPE driver over to this new method. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patch.msgid.link/20251020122944.3074811-10-maz@kernel.org	2025-10-27 17:16:33 +01:00
Marc Zyngier	663783e001	perf: arm_pmu: Convert to the new interrupt affinity retrieval API Now that the relevant interrupt controllers are equipped with a callback returning the affinity of per-CPU interrupts, switch the OF side of the ARM PMU driver over to this new method. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patch.msgid.link/20251020122944.3074811-9-maz@kernel.org	2025-10-27 17:16:33 +01:00
Marc Zyngier	541454dd20	coresight: trbe: Convert to the new interrupt affinity retrieval API Now that the relevant interrupt controllers are equipped with a callback returning the affinity of per-CPU interrupts, switch the TRBE driver over to this new method. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://patch.msgid.link/20251020122944.3074811-8-maz@kernel.org	2025-10-27 17:16:33 +01:00
Marc Zyngier	de575de83c	irqchip/apple-aic: Add FW info retrieval support Plug the new .get_fwspec_info() callback into the Apple AIC driver, using some of the existing FIQ affinity handling infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Acked-by: Sven Peter <sven@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-7-maz@kernel.org	2025-10-27 17:16:33 +01:00
Marc Zyngier	68905ea65c	irqchip/gic-v3: Add FW info retrieval support Plug the new .get_fwspec_info() callback into the GICv3 core driver, using some of the existing PPI affinity handling infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-6-maz@kernel.org	2025-10-27 17:16:33 +01:00
Marc Zyngier	0d5daa938c	platform: Add firmware-agnostic irq and affinity retrieval interface Expand platform_get_irq_optional() to also return an affinity if available, renaming it to platform_get_irq_affinity() in the process. platform_get_irq_optional() is preserved with its current semantics by calling into the new helper with a NULL affinity pointer. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251020122944.3074811-5-maz@kernel.org	2025-10-27 17:16:32 +01:00
Marc Zyngier	5404f5c06d	of/irq: Add interrupt affinity reporting interface Plug the irq_populate_fwspec_info() helper into the OF layer to offer an interrupt affinity reporting function. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251020122944.3074811-4-maz@kernel.org	2025-10-27 17:16:32 +01:00
Marc Zyngier	5324fe21ba	ACPI: irq: Add interrupt affinity reporting interface Plug the irq_populate_fwspec_info() helper into the ACPI layer to offer an interrupt affinity reporting function. This is currently only supported for the CONFIG_ACPI_GENERIC_GSI configurations, but could later be extended to legacy architectures if necessary. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-3-maz@kernel.org	2025-10-27 17:16:32 +01:00
Marc Zyngier	87b0031f7f	irqdomain: Add firmware info reporting interface Add an irqdomain callback to report firmware-provided information that is otherwise not available in a generic way. This is reported using a new data structure (struct irq_fwspec_info). This callback is optional and the only information that can be reported currently is the affinity of an interrupt. However, the containing structure is designed to be extensible, allowing other potentially relevant information to be reported in the future. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251020122944.3074811-2-maz@kernel.org	2025-10-27 17:16:32 +01:00
Yazen Ghannam	83be4bee57	ACPI: PRM: Add acpi_prm_handler_available() Add a helper function to check if a PRM handler/module is present. This can be used during init time by code that depends on a particular handler. If the handler is not present, then the code does not need to be loaded. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: "Mario Limonciello (AMD)" <superm1@kernel.org> Acked-by: "Rafael J. Wysocki (Intel)" <rafael@kernel.org> Link: https://patch.msgid.link/all/20251017-wip-atl-prm-v2-1-7ab1df4a5fbc@amd.com	2025-10-27 15:45:22 +01:00
Borislav Petkov (AMD)	4058386498	Merge tag 'x86_urgent_for_v6.18_rc3' into x86/microcode Pick up the below urgent upstream change in order to base more work ontop: - Correct the last Zen1 microcode revision for which Entrysign sha256 check is needed Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2025-10-27 14:06:38 +01:00
Charles Mirabile	539d147ef6	irqchip/sifive-plic: Add support for UltraRISC DP1000 PLIC Add a new compatible for the plic found in UltraRISC DP1000 with a quirk to work around a known hardware bug with IRQ claiming in the UR-CP100 cores. When claiming an interrupt on UR-CP100 cores, all other interrupts must be disabled before the claim register is accessed to prevent incorrect handling of the interrupt. This is a hardware bug in the CP100 core implementation, not specific to the DP1000 SoC. When the PLIC_QUIRK_CP100_CLAIM_REGISTER_ERRATUM flag is present, a specialized handler (plic_handle_irq_cp100) disables all interrupts except for the first pending one before reading the claim register, and then restores the interrupts before further processing of the claimed interrupt continues. This implementation leverages the enable_save optimization, which maintains the current interrupt enable state in memory, avoiding additional register reads during the workaround. The driver matches on "ultrarisc,cp100-plic" to apply the quirk to all SoCs using UR-CP100 cores, regardless of the specific SoC implementation. This has no impact on other platforms. [ tglx: Condensed the code a bit, massaged change log and comments ] Co-developed-by: Zhang Xincheng <zhangxincheng@ultrarisc.com> Signed-off-by: Zhang Xincheng <zhangxincheng@ultrarisc.com> Signed-off-by: Charles Mirabile <cmirabil@redhat.com> Signed-off-by: Lucas Zampieri <lzampier@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Samuel Holland <samuel.holland@sifive.com> Link: https://patch.msgid.link/20251024083647.475239-5-lzampier@redhat.com	2025-10-27 12:11:56 +01:00
Brendan Jackman	5385dec724	x86/mm: Unify __phys_addr_symbol() There are two implementations on 64-bit, depending on CONFIG_DEBUG_VIRTUAL, but they differ only regarding the presence of VIRTUAL_BUG_ON, which is already ifdef'd on CONFIG_DEBUG_VIRTUAL. To avoid adding a function call on non-LTO non-DEBUG_VIRTUAL builds, move the function into the header. (Note the function is already only used on 64-bit). Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/all/20250813-phys-addr-cleanup-v1-1-19e334b1c466@google.com/	2025-10-24 22:13:00 +02:00
Matthew Wilcox (Oracle)	70e0a80a1f	treewide: Remove in_irq() This old alias for in_hardirq() has been marked as deprecated since 2020; remove the stragglers. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251024180654.1691095-1-willy@infradead.org	2025-10-24 21:39:27 +02:00
Charles Mirabile	14ff9e54dd	irqchip/sifive-plic: Cache the interrupt enable state Optimize the PLIC driver by maintaining the interrupt enable state in the handler's enable_save array during normal operation rather than only during suspend/resume. This eliminates the need to read enable registers during suspend and makes the enable state immediately available for other purposes. Let __plic_toggle() update both the hardware registers and the cached enable_save state atomically within the existing enable_lock protection. That allows to remove the suspend-time enable register reading since handler::enable_save now always reflects the current state. [ tglx: Massaged change log ] Signed-off-by: Charles Mirabile <cmirabil@redhat.com> Signed-off-by: Lucas Zampieri <lzampier@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251024083647.475239-4-lzampier@redhat.com	2025-10-24 21:34:32 +02:00
Charles Mirabile	9dfb295a93	dt-bindings: interrupt-controller: Add UltraRISC DP1000 PLIC Add compatible strings for the PLIC found in UltraRISC DP1000 SoC. The PLIC is part of the UR-CP100 core and has a hardware bug requiring a workaround. Signed-off-by: Charles Mirabile <cmirabil@redhat.com> Signed-off-by: Lucas Zampieri <lzampier@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20251024083647.475239-3-lzampier@redhat.com	2025-10-24 21:34:32 +02:00
Lucas Zampieri	e95f66dd0e	dt-bindings: vendor-prefixes: Add UltraRISC Add vendor prefix for UltraRISC Technology Co., Ltd. Signed-off-by: Lucas Zampieri <lzampier@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20251024083647.475239-2-lzampier@redhat.com	2025-10-24 21:34:31 +02:00
Petr Tesarik	e9cc99142a	x86/tsx: Get the tsx= command line parameter with early_param() Use early_param() to get the value of the tsx= command line parameter. It is an early parameter, because it must be parsed before tsx_init(), which is called long before kernel_init(), where normal parameters are parsed. Although cmdline_find_option() from tsx_init() works fine, the option is later reported as unknown and passed to user space. The latter is not a real issue, but the former is confusing and makes people wonder if the tsx= parameter had any effect and double-check for typos unnecessarily. The behavior changes slightly if "tsx" is given without any argument (which is invalid syntax). Until now, the kernel logged an error message and disabled TSX. Now, the kernel still issues a warning (Malformed early option 'tsx'), but TSX state is unchanged. The new behavior is consistent with other parameters, e.g. "tsx_async_abort". [ bp: Fixup minor formatting request during review. ] Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/cover.1758906115.git.ptesarik@suse.com	2025-10-24 18:35:17 +02:00
Petr Tesarik	f018fca8f9	x86/tsx: Make tsx_ctrl_state static Move all definitions related to tsx_ctrl_state to tsx.c. They are never referenced outside this file. No functional change. Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/all/cover.1758906115.git.ptesarik@suse.com	2025-10-24 18:24:42 +02:00
Heiko Carstens	020d5dc578	s390/ap: Don't leak debug feature files if AP instructions are not available If no AP instructions are available the AP bus module leaks registered debug feature files. Change function call order to fix this. Fixes: `cccd85bfb7` ("s390/zcrypt: Rework debug feature invocations.") Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-24 15:25:56 +02:00
Jens Remus	f25d952ab6	s390/ptrace: Explicitly include <linux/typecheck.h> The psw_bits() macro makes use of typecheck() without that typecheck.h is included. Add the missing include to avoid potential future compile problems. [hca@linux.ibm.com: change commit message] Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-24 15:25:56 +02:00
Harald Freudenberger	51d921a613	s390/ap: Expose ap_bindings_complete_count counter via sysfs The AP bus udev event BINDINGS=complete is sent out when the first time all devices detected by the AP bus scan have been bound to device drivers. This is the ideal time to for example change the AP bus masks apmask and aqmask to re-establish a persistent change on the decision about which cards/domains should be available for the host and which should go into the pool for kvm guests. However, if exactly this initial udev event is sent out early in the boot process a udev rule may not have been established yet and thus this event will never be recognized. To have some indication about if the AP bus binding complete has already happened, the internal ap_bindings_complete_count counter is exposed via sysfs with this patch. Suggested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-23 16:11:38 +02:00
Heiko Carstens	07a75d08cf	s390/smp: Fix fallback CPU detection In case SCLP CPU detection does not work a fallback mechanism using SIGP is in place. Since a cleanup this does not work correctly anymore: new CPUs are only considered if their type matches the boot CPU. Before the cleanup the information if a CPU type should be considered was also part of a structure generated by the fallback mechanism and indicated that a CPU type should not be considered when adding CPUs. Since the rework a global SCLP state is used instead. If the global SCLP state indicates that the CPU type should be considered and the fallback mechanism is used, there may be a mismatch with CPU types if CPUs are added. This can lead to a system with only a single CPU even though there are many more CPUs. Address this by simply copying the boot cpu type into the generated data structure from the fallback mechanism. Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Fixes: `d08d94306e` ("s390/smp: cleanup core vs. cpu in the SCLP interface") Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-23 16:11:38 +02:00
Gerd Bayer	564ebcae6a	s390/pci: Highlight failure to enable PCI function Emit an error log when a PCI function cannot be enabled for use, despite being reported as configured to the system. This brings to attention situations where functions might go missing without notice. Going unnoticed is less likely when functions are added to the system through hotplug, but will produce the same error log. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-23 16:11:37 +02:00
Heiko Carstens	5c02c74dd4	Merge branch 'ap-bus-trace-events' Harald Freudenberger says: ==================== Investigations related to runtime of crypto requests has revealed a lack of performance or runtime information with crypto requests. There are the two zcrypt ioctl trace events covering the entry and exit of an ioctl with crypto requests giving the overall runtime within the kernel. However, there is no way to figure out the time where a request is enqueued into the AP bus queue but not pushed into the firmware queue. Then there is no information about the runtime of an request during processing in the firmware. And finally some info about pulling the reply from the firmware and delivering it into user space is missing. This series is aiming to provide a way to collect measurements which can be used to cover these runtime information for each crypto request/reply. ==================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 11:10:55 +02:00
Harald Freudenberger	9c11918040	s390/ap: Introduce new AP nqap and dqap trace events Introduce two new AP bus related tracepoint events: - There is a tracepoint s390_ap_nqap event immediately after a request has been pushed into the AP firmware queue with the NQAP AP command. - The other tracepoint s390_ap_dqap event fires immediately after a reply has been pulled out of the AP firmware queue via DQAP AP command. Both events are triggered unconditionally and may need filtering. Filtering can be done based on the status value which is part of the nqap and dqap trace. So for example an echo "!(status & 0x00ff0000)" >.../s390_ap_dqap/filter filters out all trace events which have a response_code != 0 leaving just the successful nqap and dqap invocations. The idea of these two trace events focuses on performance to measure the runtime of a crypto request/reply as close as possible at the firmware level. In combination with the two zcrypt tracepoints (see the zcrypt.h trace event definition file) this gives measurement data about the runtime of a request/reply within the zcrypt and AP bus layer. However, with having the status of these AP commands in hand also other usage may be possible. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 11:09:21 +02:00
Harald Freudenberger	7f124d78d4	s390/ap: Extend struct ap_queue_status with some convenience fields Sometimes there is a different view of the AP status word needed. So here is slight rework of the struct ap_queue_status to open up the possibility to have different ways of accessing the AP status bits and fields. The new struct ap_queue_status struct ap_queue_status { union { unsigned int value : 32; struct { unsigned int status_bits : 8; unsigned int rc : 8; unsigned int : 16; }; struct { unsigned int queue_empty : 1; unsigned int replies_waiting : 1; unsigned int queue_full : 1; unsigned int : 3; unsigned int async : 1; unsigned int irq_enabled : 1; unsigned int response_code : 8; unsigned int : 16; }; }; }; comprises the old struct ap_queue_status but extends it to have this also accessible as an unsigned int required for example for a simple print or trace of the whole value. Note that this rework is fully backward compatible to the existing code exploiting the struct ap_queue_status. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 11:09:21 +02:00
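The union in the commit above gives several views of the same 32-bit AP status word. Bit-field layout is ABI-specific, so the following is only a portable sketch that decodes the word with shifts and masks instead of bit-fields. It assumes the first declared field occupies the most significant bits (as on big-endian s390), which is consistent with the `0x00ff0000` response-code mask used in the trace filter example. The names `ap_status_view` and `ap_status_decode()` are hypothetical and not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of the 32-bit AP status word. */
struct ap_status_view {
	unsigned int queue_empty;
	unsigned int replies_waiting;
	unsigned int queue_full;
	unsigned int async;
	unsigned int irq_enabled;
	unsigned int response_code;
};

/*
 * Decode the status word, assuming MSB-first field allocation:
 * bit 31 = queue_empty, bit 30 = replies_waiting, bit 29 = queue_full,
 * bits 28..26 unused, bit 25 = async, bit 24 = irq_enabled,
 * bits 23..16 = response_code, bits 15..0 unused.
 */
static struct ap_status_view ap_status_decode(uint32_t value)
{
	struct ap_status_view v;

	v.queue_empty     = (value >> 31) & 1;
	v.replies_waiting = (value >> 30) & 1;
	v.queue_full      = (value >> 29) & 1;
	v.async           = (value >> 25) & 1;
	v.irq_enabled     = (value >> 24) & 1;
	v.response_code   = (value >> 16) & 0xff;
	return v;
}
```

With this layout, a trace filter such as `!(status & 0x00ff0000)` matches exactly the events whose decoded `response_code` is zero.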
Harald Freudenberger	507cff242a	s390/zcrypt: Rework zcrypt request and reply trace event definition This is a slight rework of the s390_zcrypt_req and s390_zcrypt_rep trace event: - the psmid has been added to the s390_zcrypt_rep - "dev" renamed to "card" - "domain" renamed to "dom" The motivation of these changes is to make these traces more aligned to new upcoming traces for AP bus related trace events. Additionally the psmid is needed to match the reply (and thus indirect the request) to AP bus related trace events where only the psmid is unique identifying AP messages. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 11:09:21 +02:00
Josephine Pfeiffer	215231deea	s390/ptdump: Use seq_puts() in pt_dump_seq_puts() macro The pt_dump_seq_puts() macro incorrectly uses seq_printf() instead of seq_puts(). This is both a performance issue and conceptually wrong, as the macro name suggests plain string output (puts) but the implementation uses formatted output (printf). The macro is used in dump_pagetables.c:67-68 and 131 to output constant strings. Using seq_printf() adds unnecessary overhead for format string parsing. Signed-off-by: Josephine Pfeiffer <hi@josie.lol> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:29:50 +02:00
Heiko Carstens	5e09c0a03e	Merge branch 'tape-block-sizes' Jan Höppner says: ==================== The tape device driver is limited to a block size of 65535 bytes since a single CCW can only transfer up to 64K-1 bytes (The count field is a 16bit value). This series introduces data chaining for all read/write functions to support block sizes larger than 65535. The tape device type 3490 (emulated) and 3590/3592 can handle up to 256K. [1] [1] https://www.ibm.com/docs/en/zos/3.1.0?topic=blksize-system-determined-block-size ==================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:26:26 +02:00
Jan Höppner	319d3d6653	s390/tape: Add support for bigger block sizes The tape device type 3590/3592 and emulated 3490 VTS can handle a block size of up to 256K bytes. Currently the tape device driver is limited to a block size of 65535 bytes (64K-1). This limitation stems from the maximum of 65535 bytes of data that can be transferred with one Channel-Command Word (CCW). To work around this limitation data chaining is used which uses several CCWs to transfer an entire 256K block of data. A single CCW holds a maximum of 65535 bytes of data. Set MAX_BLOCKSIZE to 262144 (= 256K) to allow for data transfers with larger block sizes. The read_block() and write_block() discipline functions calculate the number of CCWs required based on the IDAL buffer array size that was created for a given block size. If there is more than one CCW required for the data transfer, the new helper function tape_ccw_dc_idal() is used to build the data chain accordingly. The Interruption-Response Block (irb) is added to the tape_request struct so that the tapechar_read/write() functions can analyze what data was read or written accordingly. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:25:55 +02:00
Jan Höppner	574817d6c0	s390/tape: Introduce idal buffer array The tape device driver uses a single idal_buffer for I/O. While the buffer itself can be arbitrary big, the limit for data transfer for a single Channel-Command Word is at 65535 bytes (64K-1) since the count field specifying the amount of data designated by the CCW is a 16-bit unsigned value. Provide functionality that allocates an array of multiple IDAL buffer with the limitation mentioned above in mind. A call to idal_buffer_array_alloc() allocates an array with a certain amount of IDAL buffers which is determined based on the total size of @size. Each individual buffer is limited to a size of CCW_MAX_BYTE_COUNT (65535 bytes). Add helper functions that determine the size (# of elements) and the total data size covered by the array as well. Current users of the single IDAL buffer are adapted to use the new functions with one buffer to allocate. The single IDAL buffer is removed from the tape_char_data struct. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:25:55 +02:00
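The arithmetic behind the IDAL buffer array can be sketched as follows. Only `CCW_MAX_BYTE_COUNT` (65535 bytes) comes from the commit message; the helper names are hypothetical, and the sketch assumes the array is filled with full-size buffers followed by one remainder buffer, which the commit message does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Per-CCW limit named in the commit message: 16-bit count field. */
#define CCW_MAX_BYTE_COUNT 65535UL

/* Hypothetical helper: number of buffers (and CCWs) needed for size bytes. */
static size_t idal_buffers_needed(size_t size)
{
	return (size + CCW_MAX_BYTE_COUNT - 1) / CCW_MAX_BYTE_COUNT;
}

/*
 * Hypothetical helper: bytes covered by buffer number index, assuming
 * full buffers first and the remainder in the last one.
 */
static size_t idal_buffer_len(size_t size, size_t index)
{
	size_t offset = index * CCW_MAX_BYTE_COUNT;
	size_t rest;

	if (offset >= size)
		return 0;
	rest = size - offset;
	return rest < CCW_MAX_BYTE_COUNT ? rest : CCW_MAX_BYTE_COUNT;
}
```

Under these assumptions a 262144-byte (256K) block does not fit into four buffers, because 4 * 65535 = 262140; a fifth buffer carries the remaining 4 bytes.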
Jan Höppner	a5e2ca22c1	s390/tape: Move idal allocation to core functions Currently tapechar_check_idalbuffer() is part of tape_char.c and is used to ensure the idal buffer is big enough for the requested I/O and reallocates a new one if required. The same is done in tape_std.c when a fixed block size is set using the mtsetblk command. This is essentially duplicate code. The allocation of the buffer that is required for I/O can be considered core functionality. Move the idal buffer allocation to tape_core.c, make it generally available, and reduce code duplication. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:25:55 +02:00
Jan Höppner	e039400f75	s390/tape: Fix return value of ccw helper functions In contrast to all other helper functions used to build CCW chains, tape_ccw_cc_idal() and tape_ccw_end_idal() return values using post-increments, which results in returning the same CCW pointer. Though, the intent of the CCW helper functions is to return the _next_ CCW in the chain, which can then be processed. There is currently no actual issue, as tape_ccw_cc_idal() is not used yet and tape_ccw_end_idal() is only used at the end of a chain. Change both functions return statement to ccw + 1 and bring them in line with the other helper functions. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:25:55 +02:00
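The post-increment pitfall described in the commit above can be reproduced in isolation. This is a minimal sketch with a stand-in `struct ccw` and hypothetical helper names, not the driver's actual `tape_ccw_end_idal()` code: `return ccw++;` evaluates to the pointer value before the increment, so the caller gets the same CCW back, whereas `return ccw + 1;` yields the next slot in the chain.

```c
#include <assert.h>

/* Stand-in for a channel command word; the field layout is illustrative only. */
struct ccw {
	unsigned char cmd_code;
	unsigned char flags;
	unsigned short count;
};

/* Mirrors the old behaviour: the post-increment result is the old pointer. */
static struct ccw *ccw_end_postinc(struct ccw *ccw, unsigned char cmd_code)
{
	ccw->cmd_code = cmd_code;
	return ccw++;
}

/* Mirrors the fix: return the next CCW in the chain. */
static struct ccw *ccw_end_next(struct ccw *ccw, unsigned char cmd_code)
{
	ccw->cmd_code = cmd_code;
	return ccw + 1;
}
```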
Jan Höppner	83cff1b124	s390/tape: Remove extra CCW allocation for error recovery The Read Opposite error recovery code required 2 extra CCWs to be allocated in order to transform the request. As this error recovery code for both 34xx and 3590 was removed the additional allocation isn't required anymore. Reduce it to two. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:25:54 +02:00
Jan Höppner	a984d71277	s390/tape: Remove 3590 Read Opposite error recovery On old native type 3590 tape devices a Read Opposite error recovery procedure on Error Recovery Action Code (ERA) 26 was issued if a Read Forward command failed. This recovery procedure was implemented with the Read Backward command. This is no longer supported. Remove 3590 ERA 26 and Read Backward related recovery code. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:25:54 +02:00
Jan Höppner	1b9df1a28f	s390/tape: Remove 34xx Read Opposite error recovery On old native type 3490 tape devices a Read Opposite error recovery procedure on Error Recovery Action Code (ERA) 26 was issued if a Read Forward command failed. This recovery procedure was implemented with the Read Backward command. As a preparation for a subsequent commit, that adds support for bigger block sizes, remove the 34xx ERA 26 related recovery code. The recovery code would need to be adapted to the bigger block sizes, without any possibility to be tested, as modern Virtual Tape Servers (VTS) do neither report ERA 26 on a Read Forward command failure nor support the error recovery procedure anymore. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:25:54 +02:00
Jan Höppner	39376c77a5	s390/tape: Remove count parameter from read/write_block functions The count parameter of the read/write_block discipline functions was never used. Remove it. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:25:54 +02:00
Heiko Carstens	ec9b3b85ea	Merge branch 'memory-hotplug' Sumanth Korikkar says: ==================== Provide a new interface for dynamic configuration and deconfiguration of hotplug memory on s390, allowing with/without memmap_on_memory support. It is a follow up on the discussion with David when introducing memmap_on_memory support for s390 and support dynamic (de)configuration of memory: https://lore.kernel.org/all/ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com/ https://lore.kernel.org/all/20241202082732.3959803-1-sumanthk@linux.ibm.com/ The original motivation for introducing memmap_on_memory on s390 was to avoid using online memory to store struct pages metadata, particularly for standby memory blocks. This became critical in cases where there was an imbalance between standby and online memory, potentially leading to boot failures due to insufficient memory for metadata allocation. To address this, memmap_on_memory was utilized on s390. However, in its current form, it adds struct pages metadata at the start of each memory block at the time of addition (only standby memory), and this configuration is static. It cannot be changed at runtime (When the user needs contiguous physical memory). In order to provide more flexibility to the user and overcome the above limitation, add an option to dynamically configure and deconfigure hotpluggable memory block with/without memmap_on_memory. With the new interface, s390 will not add all possible hotplug memory in advance, like before, to make it visible in sysfs for online/offline actions. Instead, before memory block can be set online, it has to be configured via a new interface in /sys/firmware/memory/memoryX/config, which makes s390 similar to others. i.e. Adding of hotpluggable memory is controlled by the user instead of adding it at boottime. 
s390 kernel sysfs interface to configure/deconfigure memory with memmap_on_memory (with upcoming lsmem changes): * Initial memory layout: lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x7fffffff 2G online 0-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes * Configure memory echo 1 > /sys/firmware/memory/memory16/config lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x7fffffff 2G online 0-15 yes no 0x80000000-0x87ffffff 128M offline 16 yes yes 0x88000000-0xffffffff 1.9G offline 17-31 no yes * Deconfigure memory echo 0 > /sys/firmware/memory/memory16/config lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x7fffffff 2G online 0-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes * Enable memmap_on_memory and online it. (Deconfigure first) echo 0 > /sys/devices/system/memory/memory5/online echo 0 > /sys/firmware/memory/memory5/config lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x27ffffff 640M online 0-4 yes no 0x28000000-0x2fffffff 128M offline 5 no no 0x30000000-0x7fffffff 1.3G online 6-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes (Enable memmap_on_memory and online it) echo 1 > /sys/firmware/memory/memory5/memmap_on_memory echo 1 > /sys/firmware/memory/memory5/config echo 1 > /sys/devices/system/memory/memory5/online lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x27ffffff 640M online 0-4 yes no 0x28000000-0x2fffffff 128M online 5 yes yes 0x30000000-0x7fffffff 1.3G online 6-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes * Disable memmap_on_memory and online it. 
(Deconfigure first) echo 0 > /sys/devices/system/memory/memory5/online echo 0 > /sys/firmware/memory/memory5/config lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x27ffffff 640M online 0-4 yes no 0x28000000-0x2fffffff 128M offline 5 no yes 0x30000000-0x7fffffff 1.3G online 6-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes (Disable memmap_on_memory and online it) echo 0 > /sys/firmware/memory/memory5/memmap_on_memory echo 1 > /sys/firmware/memory/memory5/config echo 1 > /sys/devices/system/memory/memory5/online lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x7fffffff 2G online 0-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes * Userspace changes: lsmem/chmem tool is also changed to use the new interface. I will send it to util-linux soon. Patch 1 adds support for removal of boot-allocated memory blocks. Patch 2 provides option to dynamically configure and deconfigure memory with/without memmap_on_memory. Patch 3 removes MHP_OFFLINE_INACCESSIBLE from s390. The mhp flag was used to mark memory as not accessible until memory hotplug online phase begins. However, with patch 2, it is no longer essential. Memory can be brought to accessible state before adding memory, as the memory is added during runtime now instead of boottime. Patch 4 removes the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers. They are no longer needed. Memory can be brought to accessible state before adding memory now, with runtime (de)configuration of memory. Note: The patches apply to the linux-next branch. v3: Thanks David * Avoid goto label in create_standby_sclp_mems(). * Use unsigned long instead of u64. * Add Acked-by. v2: Thanks David * Rename struct mblock/mblock_arg with struct sclp_mem/sclp_mem_arg. * Rename all mblocks/mblock references with sclp_mems/sclp_mem - structures, functions. 
* Rename create_online_mblock() with create_configured_sclp_mem(). * Rename config_mblock_show()/config_mblock_store() with config_sclp_mem_show()/config_sclp_mem_store(). * Remove contains_standby_increment() and sclp_mem_notifier. sclp mem state change is performed when adding/removing memory. sclp memory notifier - no longer needed with this patchset. * Recover sclp mem state when add_memory() fails. * Refactor and add function init_sclp_mem(). * Use unsigned long instead of unsigned long long. * Simplify and correct kobj handling. Thanks Heiko. ==================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:19:30 +02:00
Heiko Carstens	c97689345c	s390/con3270: Use scnprintf() instead of sprintf() Use scnprintf() instead of sprintf() for those cases where the destination is an array and the size of the array is known at compile time. This prevents theoretical buffer overflows, but also avoids that people again and again spend time to figure out if the code is actually safe. Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:30 +02:00
Heiko Carstens	c769941de8	s390/tape: Use scnprintf() instead of sprintf() Use scnprintf() instead of sprintf() for those cases where the destination is an array and the size of the array is known at compile time. This prevents theoretical buffer overflows, but also avoids that people again and again spend time to figure out if the code is actually safe. Reviewed-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:30 +02:00
Heiko Carstens	ffb5d3af5e	s390/dcss: Use scnprintf() instead of sprintf() Use scnprintf() instead of sprintf() for those cases where the destination is an array and the size of the array is known at compile time. This prevents theoretical buffer overflows, but also avoids that people again and again spend time to figure out if the code is actually safe. Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:30 +02:00
Heiko Carstens	ba06238bbe	s390/cio: Use scnprintf() instead of sprintf() Use scnprintf() instead of sprintf() for those cases where the destination is an array and the size of the array is known at compile time. This prevents theoretical buffer overflows, but also avoids that people again and again spend time to figure out if the code is actually safe. Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:30 +02:00
Heiko Carstens	6850221116	s390/early: Use scnprintf() instead of sprintf() Use scnprintf() instead of sprintf() for those cases where the destination is an array and the size of the array is known at compile time. This prevents theoretical buffer overflows, but also avoids that people again and again spend time to figure out if the code is actually safe. Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:30 +02:00
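The point of the scnprintf() conversions above is the return value as much as the bounds check: snprintf() returns the length the output would have had, while the kernel's scnprintf() returns the number of characters actually stored. Since scnprintf() does not exist in userspace, the sketch below emulates that return-value behaviour on top of vsnprintf(); `my_scnprintf()` is a hypothetical stand-in written for illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace stand-in for scnprintf(): format into buf and return the
 * number of characters actually written, excluding the trailing NUL.
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;		/* formatting error: nothing usable written */
	if ((size_t)n < size)
		return n;		/* everything fit */
	return size ? (int)(size - 1) : 0;	/* output was truncated */
}
```

Code that appends repeatedly with `len += my_scnprintf(buf + len, size - len, ...)` therefore can never advance `len` past the end of the buffer, which is not guaranteed when the snprintf() return value is accumulated the same way.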
Thomas Richter	4d065f3c80	s390/pai_crypto: Adjust paicrypt_copy() return statement Adjust the return statement in paicrypt_copy() to the same statement as in paiext_copy(). Use one common style. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:29 +02:00
Josephine Pfeiffer	4738e11662	s390/sysinfo: Replace sprintf() with snprintf() for buffer safety Replace sprintf() with snprintf() when formatting symlink target name to prevent potential buffer overflow. The link_to buffer is only 10 bytes, and using snprintf() ensures proper bounds checking if the topology nesting limit value is unexpectedly large. Signed-off-by: Josephine Pfeiffer <hi@josie.lol> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:29 +02:00
Josephine Pfeiffer	5379879a76	s390/extmem: Replace sprintf() with snprintf() for buffer safety Replace unsafe sprintf() calls with snprintf() in segment_save() to prevent potential buffer overflows. The function builds command strings by repeatedly appending to a fixed-size buffer, which could overflow if segment ranges are numerous or values are large. Signed-off-by: Josephine Pfeiffer <hi@josie.lol> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:29 +02:00
Josephine Pfeiffer	dd7d1d34ae	s390/cmm: Replace sprintf() with scnprintf() for buffer safety Replace sprintf() with scnprintf() in cmm_timeout_handler() to prevent potential buffer overflow. The scnprintf() function ensures we don't write beyond the buffer size and provides safer string formatting. Signed-off-by: Josephine Pfeiffer <hi@josie.lol> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-21 10:17:20 +02:00
Christophe JAILLET	ac646f4495	genirq/msi: Slightly simplify msi_domain_alloc() The return value of irq_find_mapping() is only tested, not used for anything else. Replaced it by irq_resolve_mapping() which is internally used by irq_find_mapping() and allows a simple boolean decision. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/1ce680114cdb8d40b072c54d7f015696a540e5a6.1760863194.git.christophe.jaillet@wanadoo.fr	2025-10-20 20:18:48 +02:00
Johan Hovold	a7f25e00c4	irqchip/qcom-irq-combiner: Rename driver structure The "_probe" suffix of the driver structure name prevents modpost from warning about section mismatches so replace it to catch any future issues like the recently fixed probe function being incorrectly marked as __init. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-10-17 15:18:18 +02:00
Yazen Ghannam	6553c68bc7	RAS/AMD/ATL: Return error codes from helper functions Pass up error codes from helper functions rather than discarding them. Suggested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2025-10-17 14:38:42 +02:00
Elena Reshetova	0f2753efc5	x86/sgx: Enable automatic SVN updates for SGX enclaves == Background == ENCLS[EUPDATESVN] is a new SGX instruction [1] which allows enclave attestation to include information about updated microcode SVN without a reboot. Before an EUPDATESVN operation can be successful, all SGX memory (aka. EPC) must be marked as “unused” in the SGX hardware metadata (aka.EPCM). This requirement ensures that no compromised enclave can survive the EUPDATESVN procedure and provides an opportunity to generate new cryptographic assets. == Solution == Attempt to execute ENCLS[EUPDATESVN] every time the first file descriptor is obtained via sgx_(vepc_)open(). In the most common case the microcode SVN is already up-to-date, and the operation succeeds without updating SVN. Note: while in such cases the underlying crypto assets are regenerated, it does not affect enclaves' visible keys obtained via EGETKEY instruction. If it fails with any other error code than SGX_INSUFFICIENT_ENTROPY, this is considered unexpected and the open() returns an error. This should not happen in practice. On contrary, SGX_INSUFFICIENT_ENTROPY might happen due to a pressure on the system's DRNG (RDSEED) and therefore the open() can be safely retried to allow normal enclave operation. [1] Runtime Microcode Updates with Intel Software Guard Extensions, https://cdrdv2.intel.com/v1/dl/getContent/648682 Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nataliia Bondarevska <bondarn@google.com>	2025-10-16 14:42:09 -07:00
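The control flow of this series (attempt the SVN update only when the first user opens the device, and let the open fail in a retryable way on insufficient entropy) can be modelled in plain C. This is a simplified single-threaded sketch, not the kernel code: it has no locking, the `model_*` names and the `svn_result` enum are hypothetical, and the mapping to `-EAGAIN` and `-EIO` is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome codes standing in for the ENCLS[EUPDATESVN] result. */
enum svn_result { SVN_OK = 0, SVN_NO_ENTROPY, SVN_OTHER_ERROR };

static int usage_count;			/* number of open users in this model */
static enum svn_result next_result;	/* knob: what the next update attempt returns */

/* Stand-in for the update step; here it only reports the configured result. */
static enum svn_result model_update_svn(void)
{
	return next_result;
}

/* Called on open: only the 0 -> 1 transition attempts the update. */
static int model_inc_usage_count(void)
{
	if (usage_count == 0) {
		switch (model_update_svn()) {
		case SVN_OK:
			break;
		case SVN_NO_ENTROPY:
			return -EAGAIN;	/* assumed: caller may retry the open */
		default:
			return -EIO;	/* assumed: unexpected failure */
		}
	}
	usage_count++;
	return 0;
}

/* Called on release. */
static void model_dec_usage_count(void)
{
	usage_count--;
}
```

In this model a failed first open leaves the count at zero, so a later retry attempts the update again, while opens that arrive when a user already exists skip the update entirely.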
Elena Reshetova	4e75697faa	x86/sgx: Implement ENCLS[EUPDATESVN] All running enclaves and cryptographic assets (such as internal SGX encryption keys) are assumed to be compromised whenever an SGX-related microcode update occurs. To mitigate this assumed compromise the new supervisor SGX instruction ENCLS[EUPDATESVN] can generate fresh cryptographic assets. Before executing EUPDATESVN, all SGX memory must be marked as unused. This requirement ensures that no potentially compromised enclave survives the update and allows the system to safely regenerate cryptographic assets. Add the method to perform ENCLS[EUPDATESVN]. However, until the follow up patch that wires calling sgx_update_svn() from sgx_inc_usage_count(), this code is not reachable. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nataliia Bondarevska <bondarn@google.com>	2025-10-16 14:42:09 -07:00
Elena Reshetova	7b502832ee	x86/sgx: Define error codes for use by ENCLS[EUPDATESVN] Add error codes for ENCLS[EUPDATESVN], then SGX CPUSVN update process can know the execution state of EUPDATESVN and notify userspace. EUPDATESVN will be called when no active SGX users is guaranteed. Only add the error codes that can legally happen. E.g., it could also fail due to "SGX not ready" when there's SGX users but it wouldn't happen in this implementation. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nataliia Bondarevska <bondarn@google.com>	2025-10-16 14:42:09 -07:00
Elena Reshetova	6ffdb49101	x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag Add a flag indicating whether ENCLS[EUPDATESVN] SGX instruction is supported. This will be used by SGX driver to perform CPU SVN updates. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Nataliia Bondarevska <bondarn@google.com>	2025-10-16 14:42:08 -07:00
Elena Reshetova	483fc19e9c	x86/sgx: Introduce functions to count the sgx_(vepc_)open() Currently, when SGX is compromised and the microcode update fix is applied, the machine needs to be rebooted to invalidate old SGX crypto-assets and make SGX be in an updated safe state. It's not friendly for the cloud. To avoid having to reboot, a new ENCLS[EUPDATESVN] is introduced to update SGX environment at runtime. This process needs to be done when there's no SGX users to make sure no compromised enclaves can survive from the update and allow the system to regenerate crypto-assets. For now there's no counter to track the active SGX users of host enclave and virtual EPC. Introduce such counter mechanism so that the EUPDATESVN can be done only when there's no SGX users. Define placeholder functions sgx_inc/dec_usage_count() that are used to increment and decrement such a counter. Also, wire the call sites for these functions. Encapsulate the current sgx_(vepc_)open() to __sgx_(vepc_)open() to make the new sgx_(vepc_)open() easy to read. The definition of the counter itself and the actual implementation of sgx_inc/dec_usage_count() functions come next. Note: The EUPDATESVN, which may fail, will be done in sgx_inc_usage_count(). Make it return 'int' to make subsequent patches which implement EUPDATESVN easier to review. For now it always returns success. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nataliia Bondarevska <bondarn@google.com>	2025-10-16 14:42:08 -07:00
Nam Cao	dce7450093	PCI/MSI: Delete pci_msi_create_irq_domain() pci_msi_create_irq_domain() is now unused. Delete it. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com>	2025-10-16 21:09:52 +02:00
Samuel Holland	3a16b05384	irqchip/riscv-imsic: Inline imsic_vector_from_local_id() This function is only called from one place, which is in the interrupt handling hot path. Inline it to improve code generation and to take advantage of this_cpu operations. lpriv and imsic->base_domain can never be NULL because irq_set_chained_handler() is called after they are allocated. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-10-16 18:17:28 +02:00
Samuel Holland	79eaabc61d	irqchip/riscv-imsic: Embed the vector array in lpriv Reduce pointer chasing and the number of allocations by using a flexible array member for the vector array instead of a separate allocation. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-10-16 18:17:28 +02:00
Samuel Holland	c475c0b713	irqchip/riscv-imsic: Remove redundant irq_data lookups imsic_irq_set_affinity() already takes the irq_data pointer as a parameter, so it is pointless to look it up again from the IRQ number. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-10-16 18:17:28 +02:00
Johan Hovold	dcc31768ff	irqchip/ts4800: Drop unused module alias The driver has never supported anything but OF probing so drop the unused platform alias. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-10-16 18:17:28 +02:00
Johan Hovold	b03127a4e7	irqchip/mvebu-pic: Drop unused module alias The driver has never supported anything but OF probing so drop the unused platform alias. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch>	2025-10-16 18:17:28 +02:00
Johan Hovold	867c6aa283	irqchip/meson-gpio: Drop unused module alias The driver has never supported anything but OF probing so drop the unused platform alias that was erroneously added by commit `a947aa00ed` ("irqchip/meson-gpio: Make it possible to build as a module"). Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-10-16 18:17:27 +02:00
Johan Hovold	1230fbb225	irqchip: Enable compile testing of Broadcom drivers There seems to be nothing preventing the Broadcom drivers from being compile tested so enable that for wider build coverage. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>	2025-10-16 18:17:27 +02:00
Johan Hovold	1e3e330c07	irqchip: Pass platform device to platform drivers The IRQCHIP_PLATFORM_DRIVER macros can be used to convert OF irqchip drivers to platform drivers but currently reuse the OF init callback prototype that only takes OF nodes as arguments. This forces drivers to do reverse lookups of their struct devices during probe if they need them for things like dev_printk() and device managed resources. Half of the drivers doing reverse lookups also currently fail to release the additional reference taken during the lookup, while other drivers have had the reference leak plugged in various ways (e.g. using non-intuitive cleanup constructs which still confuse static checkers). Switch to using a probe callback that takes a platform device as its first argument to simplify drivers and plug the remaining (mostly benign) reference leaks. Fixes: `32c6c05466` ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller") Fixes: `70afdab904` ("irqchip: Add IMX MU MSI controller driver") Fixes: `a6199bb514` ("irqchip: Add Qualcomm MPM controller driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Changhuang Liang <changhuang.liang@starfivetech.com>	2025-10-16 18:17:27 +02:00
Randy Dunlap	762a3d1ca2	x86/idtentry: Add missing '*' to kernel-doc lines Fix kernel-doc warnings by adding the missing '*' to each line. Warning: include/asm/idtentry.h:395 bad line: when raised from kernel mode Warning: include/asm/idtentry.h:405 bad line: when raised from user mode Since this is in a kernel-doc block, these lines need a leading " *" on each line to prevent the warnings. Fixes: `a13644f3a5` ("x86/entry/64: Add entry code for #VC handler") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2025-10-16 17:45:42 +02:00
Johan Hovold	3540d99c03	irqchip: Drop leftover brackets Drop some unnecessary brackets in platform_irqchip_probe() mistakenly left by commit `9322d1915f` ("irqchip: Plug a OF node reference leak in platform_irqchip_probe()"). Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>	2025-10-16 11:30:38 +02:00
Johan Hovold	9b685058ca	irqchip/qcom-irq-combiner: Fix section mismatch Platform drivers can be probed after their init sections have been discarded so the probe callback must not live in init. Fixes: `f20cc9b00c` ("irqchip/qcom: Add IRQ combiner driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-10-16 11:30:38 +02:00
Johan Hovold	f798bdb9aa	irqchip/starfive-jh8100: Fix section mismatch Platform drivers can be probed after their init sections have been discarded so the irqchip init callback must not live in init. Fixes: `e4e5350361` ("irqchip: Add StarFive external interrupt controller") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Changhuang Liang <changhuang.liang@starfivetech.com>	2025-10-16 11:30:38 +02:00
Johan Hovold	5b338fbb2b	irqchip/renesas-rzg2l: Fix section mismatch Platform drivers can be probed after their init sections have been discarded so the irqchip init callbacks must not live in init. Fixes: `d011c022ef` ("irqchip/renesas-rzg2l: Add support for RZ/Five SoC") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>	2025-10-16 11:30:38 +02:00
Johan Hovold	64acfd8e68	irqchip/imx-mu-msi: Fix section mismatch Platform drivers can be probed after their init sections have been discarded so the irqchip init callbacks must not live in init. Fixes: `70afdab904` ("irqchip: Add IMX MU MSI controller driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-10-16 11:30:38 +02:00
Johan Hovold	bbe1775924	irqchip/irq-brcmstb-l2: Fix section mismatch Platform drivers can be probed after their init sections have been discarded so the irqchip init callbacks must not live in init. Fixes: `51d9db5c8f` ("irqchip/irq-brcmstb-l2: Switch to IRQCHIP_PLATFORM_DRIVER") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>	2025-10-16 11:30:37 +02:00
Johan Hovold	bfc0c5beab	irqchip/irq-bcm7120-l2: Fix section mismatch Platform drivers can be probed after their init sections have been discarded so the irqchip init callbacks must not live in init. Fixes: `3ac268d5ed` ("irqchip/irq-bcm7120-l2: Switch to IRQCHIP_PLATFORM_DRIVER") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>	2025-10-16 11:30:37 +02:00
Johan Hovold	e9db5332ca	irqchip/irq-bcm7038-l1: Fix section mismatch Platform drivers can be probed after their init sections have been discarded so the irqchip init callback must not live in init. Fixes: `c057c799e3` ("irqchip/irq-bcm7038-l1: Switch to IRQCHIP_PLATFORM_DRIVER") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>	2025-10-16 11:30:37 +02:00
Johan Hovold	a8452d1d59	irqchip/bcm2712-mip: Fix section mismatch Platform drivers can be probed after their init sections have been discarded so the irqchip init callback must not live in init. Fixes: `32c6c05466` ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>	2025-10-16 11:30:37 +02:00
Johan Hovold	0435bcc4e5	irqchip/bcm2712-mip: Fix OF node reference imbalance The init callback must not decrement the reference count of the provided irqchip OF node. This should not cause any trouble currently, but if the driver ever starts probe deferring it could lead to warnings about reference underflow and saturation. Fixes: `32c6c05466` ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>	2025-10-16 11:30:37 +02:00
Chang S. Bae	bffeb2fd0b	x86/microcode/intel: Enable staging when available With staging support implemented, enable it when the CPU reports the feature. [ bp: Sort in the MSR properly. ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com	2025-10-15 16:47:50 +02:00
Chang S. Bae	4ab410287b	x86/microcode/intel: Support mailbox transfer The functions for sending microcode data and retrieving the next offset were previously placeholders, as they need to handle a specific mailbox format. While the kernel supports similar mailboxes, none of them are compatible with this one. Attempts to share code led to unnecessary complexity, so add a dedicated implementation instead. [ bp: Sort the include properly. ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com	2025-10-15 16:47:43 +02:00
Chang S. Bae	afc3b50954	x86/microcode/intel: Implement staging handler Previously, per-package staging invocations and their associated state data were established. The next step is to implement the actual staging handler according to the specified protocol. Below are key aspects to note: (a) Each staging process must begin by resetting the staging hardware. (b) The staging hardware processes up to a page-sized chunk of the microcode image per iteration, requiring software to submit data incrementally. (c) Once a data chunk is processed, the hardware responds with an offset in the image for the next chunk. (d) The offset may indicate completion or request retransmission of an already transferred chunk. As long as the total transferred data remains within the predefined limit (twice the image size), retransmissions should be acceptable. Incorporate them in the handler, while data transmission and mailbox format handling are implemented separately. [ bp: Sort the headers in a reversed name-length order. ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com	2025-10-15 16:47:37 +02:00
Chang S. Bae	079b90d4ba	x86/microcode/intel: Define staging state struct Define a staging_state struct to simplify function prototypes by consolidating relevant data, instead of passing multiple local variables. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com	2025-10-15 16:47:31 +02:00
Chang S. Bae	740144bc6b	x86/microcode/intel: Establish staging control logic When microcode staging is initiated, operations are carried out through an MMIO interface. Each package has a unique interface specified by the IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers. Prepare staging with the following steps: 1. Ensure the microcode image is 32-bit aligned to match the MMIO register size. 2. Identify each MMIO interface based on its per-package scope. 3. Invoke the staging function for each identified interface, which will be implemented separately. [ bp: Improve error logging. ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/all/871pznq229.ffs@tglx	2025-10-15 16:47:20 +02:00
Chang S. Bae	7cdda85ed9	x86/microcode: Introduce staging step to reduce late-loading time As microcode patch sizes continue to grow, late-loading latency spikes can lead to timeouts and disruptions in running workloads. This trend of increasing patch sizes is expected to continue, so a foundational solution is needed to address the issue. To mitigate the problem, introduce a microcode staging feature. This option processes most of the microcode update (excluding activation) on a non-critical path, allowing CPUs to remain operational during the majority of the update. By offloading work from the critical path, staging can significantly reduce latency spikes. Integrate staging as a preparatory step in late-loading. Introduce a new callback for staging, which is invoked at the beginning of load_late_stop_cpus(), before CPUs enter the rendezvous phase. Staging follows an opportunistic model: * If successful, it reduces CPU rendezvous time * Even though it fails, the process falls back to the legacy path to finish the loading process but with potentially higher latency. Extend struct microcode_ops to incorporate staging properties, which will be implemented in the vendor code separately. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Anselm Busse <abusse@amazon.de> Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com	2025-10-15 16:46:58 +02:00
Chang S. Bae	ed44a5625f	x86/cpu/topology: Make primary thread mask available with SMP=n cpu_primary_thread_mask is only defined when CONFIG_SMP=y. However, even in UP kernels there is always exactly one CPU, which can reasonably be treated as the primary thread. Historically, topology_is_primary_thread() always returned true with CONFIG_SMP=n. A recent commit: `4b455f5994` ("cpu/SMT: Provide a default topology_is_primary_thread()") replaced it with a generic implementation with the note: "When disabling SMT, the primary thread of the SMT will remain enabled/active. Architectures that have a special primary thread (e.g. x86) need to override this function. ..." For consistency and clarity, make the primary thread mask available regardless of SMP, similar to cpu_possible_mask and cpu_present_mask. Move __cpu_primary_thread_mask into common code to prevent build issues. Let cpu_mark_primary_thread() configure the mask even for UP kernels, alongside other masks. Then, topology_is_primary_thread() can consistently reference it. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20250320234104.8288-1-chang.seok.bae@intel.com	2025-10-15 16:46:11 +02:00
Sumanth Korikkar	300709fbef	mm/memory_hotplug: Remove MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers were introduced to prepare the transition of memory to and from a physically accessible state. This enhancement was crucial for implementing the "memmap on memory" feature for s390. With introduction of dynamic (de)configuration of hotpluggable memory, memory can be brought to accessible state before add_memory(). Memory can be brought to inaccessible state before remove_memory(). Hence, there is no need of MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers anymore. This basically reverts commit `c5f1e2d189` ("mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers") Additionally, apply minor adjustments to the function parameters of move_pfn_range_to_zone() and mhp_supports_memmap_on_memory() to ensure compatibility with the latest branch. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-14 14:24:53 +02:00
Sumanth Korikkar	ce2071e02d	s390/sclp: Remove MHP_OFFLINE_INACCESSIBLE mhp_flag MHP_OFFLINE_INACCESSIBLE was used to mark memory as not accessible until memory hotplug online phase begins. Earlier, standby memory blocks were added upfront during boottime and MHP_OFFLINE_INACCESSIBLE flag avoided page_init_poison() on memmap during mhp addition phase. However with dynamic runtime configuration of memory, standby memory can be brought to accessible state before performing add_memory(). Hence, remove MHP_OFFLINE_INACCESSIBLE. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-14 14:24:53 +02:00
Sumanth Korikkar	ff18dcb19a	s390/sclp: Add support for dynamic (de)configuration of memory Provide a new interface for dynamic configuration and deconfiguration of hotplug memory, allowing with/without memmap_on_memory support. It is a follow up on the discussion with David when introducing memmap_on_memory support for s390 and support dynamic (de)configuration of memory: https://lore.kernel.org/all/ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com/ https://lore.kernel.org/all/20241202082732.3959803-1-sumanthk@linux.ibm.com/ The original motivation for introducing memmap_on_memory on s390 was to avoid using online memory to store struct pages metadata, particularly for standby memory blocks. This became critical in cases where there was an imbalance between standby and online memory, potentially leading to boot failures due to insufficient memory for metadata allocation. To address this, memmap_on_memory was utilized on s390. However, in its current form, it adds struct pages metadata at the start of each memory block at the time of addition and this configuration is static. It cannot be changed at runtime. (When the user needs continuous physical memory). Inorder to provide more flexibility to the user and overcome the above limitation, add option to dynamically configure and deconfigure hotpluggable memory block with/without memmap_on_memory. With the new interface, s390 will not add all possible hotplug memory in advance, like before, to make it visible in sysfs for online/offline actions. Instead, before memory block can be set online, it has to be configured via a new interface in /sys/firmware/memory/memoryX/config, which makes s390 similar to others. i.e. Adding of hotpluggable memory is controlled by the user instead of adding it at boottime. 
The s390 kernel sysfs interface to configure and deconfigure memory is as follows (considering the upcoming lsmem changes): * Initial memory layout: lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x7fffffff 2G online 0-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes * Configure memory sys="/sys" echo 1 > $sys/firmware/memory/memory16/config lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x7fffffff 2G online 0-15 yes no 0x80000000-0x87ffffff 128M offline 16 yes yes 0x88000000-0xffffffff 1.9G offline 17-31 no yes * Deconfigure memory echo 0 > $sys/firmware/memory/memory16/config lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x7fffffff 2G online 0-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes 3. Enable memmap_on_memory and online it. echo 0 > $sys/devices/system/memory/memory5/online echo 0 > $sys/firmware/memory/memory5/config lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x27ffffff 640M online 0-4 yes no 0x28000000-0x2fffffff 128M offline 5 no no 0x30000000-0x7fffffff 1.3G online 6-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes echo 1 > $sys/firmware/memory/memory5/memmap_on_memory echo 1 > $sys/firmware/memory/memory5/config echo 1 > $sys/devices/system/memory/memory5/online lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x27ffffff 640M online 0-4 yes no 0x28000000-0x2fffffff 128M online 5 yes yes 0x30000000-0x7fffffff 1.3G online 6-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes 4. Disable memmap_on_memory and online it. 
echo 0 > $sys/devices/system/memory/memory5/online echo 0 > $sys/firmware/memory/memory5/config lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x27ffffff 640M online 0-4 yes no 0x28000000-0x2fffffff 128M offline 5 no yes 0x30000000-0x7fffffff 1.3G online 6-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes echo 0 > $sys/firmware/memory/memory5/memmap_on_memory echo 1 > $sys/firmware/memory/memory5/config echo 1 > $sys/devices/system/memory/memory5/online lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY 0x00000000-0x7fffffff 2G online 0-15 yes no 0x80000000-0xffffffff 2G offline 16-31 no yes Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-14 14:24:53 +02:00
Sumanth Korikkar	d5e88d32de	s390/mm: Support removal of boot-allocated virtual memory map On s390, memory blocks are not currently removed via arch_remove_memory(). With upcoming dynamic memory (de)configuration support, runtime removal of memory blocks is possible. This internally involves tearing down identity mapping, virtual memory mappings and freeing the physical memory backing the struct pages metadata. During early boot, physical memory used to back the struct pages metadata in vmemmap is allocated through: setup_arch() -> sparse_init() -> sparse_init_nid() -> __populate_section_memmap() -> vmemmap_alloc_block_buf() -> sparse_buffer_alloc() -> memblock_alloc() Here, sparse_init_nid() sets up virtual-to-physical mapping for struct pages backed by memblock_alloc(). This differs from runtime addition of hotplug memory which uses the buddy allocator later. To correctly free identity mappings, vmemmap mappings during hot-remove, boot-time and runtime allocations must be distinguished using the PageReserved bit: * Boot-time memory, such as identity-mapped page tables allocated via boot_crst_alloc() and reserved via reserve_pgtables() is marked PageReserved in memmap_init_reserved_pages(). * Physical memory backing vmemmap (struct pages from memblock_alloc()) is also marked PageReserved similarly. During teardown, PageReserved bit is checked to distinguish between boot-time allocation or buddy allocation. This is similar to commit `645d5ce2f7` ("powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2025-10-14 14:24:53 +02:00
Andrew Cooper	4ab13be5ed	x86/fred: Fix 64bit identifier in fred_ss FRED can only be enabled in Long Mode. This is the 64bit mode (as opposed to compatibility mode) identifier, rather than being something hard-wired at 1. No functional change. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Xin Li (Intel) <xin@zytor.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>	2025-10-13 14:05:42 -07:00
Chen Yu	a0a0999507	x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest Clearwater Forest supports SNC mode. Add it to the snc_cpu_ids[] table. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Tony Luck <tony.luck@intel.com>	2025-10-13 16:59:55 +02:00
Borislav Petkov (AMD)	ddde4abaa0	x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific That cpuinfo_x86.x86_capability[] element was supposed to mirror CPUID flags from CPUID_0x80000007_EBX but that leaf has still to this day only three bits defined in it. So move those bits to scattered.c and free the capability element for synthetic flags. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2025-10-13 16:21:25 +02:00
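
The staging protocol described in commit afc3b50954 above (reset, page-sized chunks, a hardware-supplied next offset, and a transfer budget of twice the image size) can be sketched as a small user-space simulation. This is only an illustration under stated assumptions: `struct fake_hw`, `fake_hw_reset()`, `fake_hw_send()`, `stage_image()`, `CHUNK_SIZE` and `OFFSET_DONE` are hypothetical names that do not come from the kernel source, and the fake hardware merely asks for a configurable number of retransmissions.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE  4096u          /* assumed page size */
#define OFFSET_DONE ((size_t)-1)   /* hypothetical "image complete" marker */

/* Hypothetical stand-in for the staging hardware; not the real mailbox. */
struct fake_hw {
	size_t image_size;
	int resend_requests;   /* times it will ask for the same chunk again */
};

/* (a) Every staging run starts from a freshly reset device. */
static void fake_hw_reset(struct fake_hw *hw, size_t image_size, int resend_requests)
{
	hw->image_size = image_size;
	hw->resend_requests = resend_requests;
}

/* (c) Accept one chunk and answer with the offset of the chunk wanted next. */
static size_t fake_hw_send(struct fake_hw *hw, size_t offset, size_t len)
{
	if (hw->resend_requests > 0) {
		hw->resend_requests--;
		return offset;             /* (d) request retransmission */
	}
	if (offset + len >= hw->image_size)
		return OFFSET_DONE;        /* (d) whole image received */
	return offset + len;
}

/*
 * Drive the transfer. Returns 0 on success and stores the number of bytes
 * sent in *sent_out, or -1 once the budget of twice the image size would
 * be exceeded.
 */
static int stage_image(struct fake_hw *hw, size_t image_size,
		       int resend_requests, size_t *sent_out)
{
	size_t limit = 2 * image_size;
	size_t offset = 0, sent = 0;

	assert(hw != NULL && sent_out != NULL);
	fake_hw_reset(hw, image_size, resend_requests);

	while (offset != OFFSET_DONE) {
		size_t len = image_size - offset;

		if (len > CHUNK_SIZE)
			len = CHUNK_SIZE;  /* (b) at most one page per iteration */
		if (sent + len > limit)
			return -1;         /* (d) retransmissions used up the budget */
		sent += len;
		offset = fake_hw_send(hw, offset, len);
	}

	*sent_out = sent;
	return 0;
}
```

With a 10000-byte image and a single retransmission request, the loop sends 4096 + 4096 + 4096 + 1808 bytes and succeeds. With ten requests it gives up before exceeding the 20000-byte budget, which corresponds to the case where the loader would fall back to the legacy path.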

696 changed files with 16100 additions and 11418 deletions

									
										142

Documentation/admin-guide/RAS/main.rst
									
												
				@@ -406,24 +406,8 @@ index of the MC::

						   |->mc2

						   ....

				Under each ``mcX`` directory each ``csrowX`` is again represented by a

				``csrowX``, where ``X`` is the csrow index::

					.../mc/mc0/

						|

						|->csrow0

						|->csrow2

						|->csrow3

						....

				Notice that there is no csrow1, which indicates that csrow0 is composed

				of a single ranked DIMMs. This should also apply in both Channels, in

				order to have dual-channel mode be operational. Since both csrow2 and

				csrow3 are populated, this indicates a dual ranked set of DIMMs for

				channels 0 and 1.

				Within each of the ``mcX`` and ``csrowX`` directories are several EDAC

				control and attribute files.

				Within each of the ``mcX`` directory are several EDAC control and

				attribute files.

				``mcX`` directories

				-------------------

				@@ -569,7 +553,7 @@ this ``X`` memory module:

						- Unbuffered-DDR

				.. [#f5] On some systems, the memory controller doesn't have any logic

				  to identify the memory module. On such systems, the directory is called ``rankX`` and works on a similar way as the ``csrowX`` directories.

				  to identify the memory module. On such systems, the directory is called ``rankX``.

				  On modern Intel memory controllers, the memory controller identifies the

				  memory modules directly. On such systems, the directory is called ``dimmX``.

				@@ -577,126 +561,6 @@ this ``X`` memory module:

				  symlinks inside the sysfs mapping that are automatically created by

				  the sysfs subsystem. Currently, they serve no purpose.

				``csrowX`` directories

				----------------------

				When CONFIG_EDAC_LEGACY_SYSFS is enabled, sysfs will contain the ``csrowX``

				directories. As this API doesn't work properly for Rambus, FB-DIMMs and

				modern Intel Memory Controllers, this is being deprecated in favor of

				``dimmX`` directories.

				In the ``csrowX`` directories are EDAC control and attribute files for

				this ``X`` instance of csrow:

				- ``ue_count`` - Total Uncorrectable Errors count attribute file

					This attribute file displays the total count of uncorrectable

					errors that have occurred on this csrow. If panic_on_ue is set

					this counter will not have a chance to increment, since EDAC

					will panic the system.

				- ``ce_count`` - Total Correctable Errors count attribute file

					This attribute file displays the total count of correctable

					errors that have occurred on this csrow. This count is very

					important to examine. CEs provide early indications that a

					DIMM is beginning to fail. This count field should be

					monitored for non-zero values and report such information

					to the system administrator.

				- ``size_mb`` - Total memory managed by this csrow attribute file

					This attribute file displays, in count of megabytes, the memory

					that this csrow contains.

				- ``mem_type`` - Memory Type attribute file

					This attribute file will display what type of memory is currently

					on this csrow. Normally, either buffered or unbuffered memory.

					Examples:

						- Registered-DDR

						- Unbuffered-DDR

				- ``edac_mode`` - EDAC Mode of operation attribute file

					This attribute file will display what type of Error detection

					and correction is being utilized.

				- ``dev_type`` - Device type attribute file

					This attribute file will display what type of DRAM device is

					being utilized on this DIMM.

					Examples:

						- x1

						- x2

						- x4

						- x8

				- ``ch0_ce_count`` - Channel 0 CE Count attribute file

					This attribute file will display the count of CEs on this

					DIMM located in channel 0.

				- ``ch0_ue_count`` - Channel 0 UE Count attribute file

					This attribute file will display the count of UEs on this

					DIMM located in channel 0.

				- ``ch0_dimm_label`` - Channel 0 DIMM Label control file

					This control file allows this DIMM to have a label assigned

					to it. With this label in the module, when errors occur

					the output can provide the DIMM label in the system log.

					This becomes vital for panic events to isolate the

					cause of the UE event.

					DIMM Labels must be assigned after booting, with information

					that correctly identifies the physical slot with its

					silk screen label. This information is currently very

					motherboard specific and determination of this information

					must occur in userland at this time.

				- ``ch1_ce_count`` - Channel 1 CE Count attribute file

					This attribute file will display the count of CEs on this

					DIMM located in channel 1.

				- ``ch1_ue_count`` - Channel 1 UE Count attribute file

					This attribute file will display the count of UEs on this

					DIMM located in channel 0.

				- ``ch1_dimm_label`` - Channel 1 DIMM Label control file

					This control file allows this DIMM to have a label assigned

					to it. With this label in the module, when errors occur

					the output can provide the DIMM label in the system log.

					This becomes vital for panic events to isolate the

					cause of the UE event.

					DIMM Labels must be assigned after booting, with information

					that correctly identifies the physical slot with its

					silk screen label. This information is currently very

					motherboard specific and determination of this information

					must occur in userland at this time.

				System Logging

				--------------

6

Documentation/admin-guide/kernel-parameters.txt


@@ -6207,7 +6207,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec, abmc.
+			mba, smba, bmec, abmc, sdciae.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
@@ -6500,6 +6500,10 @@
 			Memory area to be used by remote processor image,
 			managed by CMA.
+	rseq_debug=	[KNL] Enable or disable restartable sequence
+			debug mode. Defaults to CONFIG_RSEQ_DEBUG_DEFAULT_ENABLE.
+			Format: <bool>
 	rt_group_sched=	[KNL] Enable or disable SCHED_RR/FIFO group scheduling
 			when CONFIG_RT_GROUP_SCHED=y. Defaults to
 			!CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED.
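
The `rdt=` syntax documented in the first hunk above (a comma-separated feature list in which a leading `!` turns a feature off) can be illustrated with a small user-space parser. This is a sketch only: `struct rdt_opt`, `rdt_find()` and `parse_rdt()` are hypothetical names and this is not the kernel's own parsing code. Only the feature names are taken from the documented list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-feature state; names follow the documented list. */
struct rdt_opt {
	const char *name;
	bool force_on;
	bool force_off;
};

static struct rdt_opt rdt_opts[] = {
	{ .name = "cmt" },   { .name = "mbmtotal" }, { .name = "mbmlocal" },
	{ .name = "l3cat" }, { .name = "l3cdp" },    { .name = "l2cat" },
	{ .name = "l2cdp" }, { .name = "mba" },      { .name = "smba" },
	{ .name = "bmec" },  { .name = "abmc" },     { .name = "sdciae" },
};

/* Look up a feature by a name of the given length; NULL if unknown. */
static struct rdt_opt *rdt_find(const char *name, size_t len)
{
	for (size_t i = 0; i < sizeof(rdt_opts) / sizeof(rdt_opts[0]); i++) {
		if (strlen(rdt_opts[i].name) == len &&
		    strncmp(rdt_opts[i].name, name, len) == 0)
			return &rdt_opts[i];
	}
	return NULL;
}

/*
 * Parse a value such as "cmt,!mba". A plain name forces the feature on,
 * a '!' prefix forces it off. Returns the number of unrecognised tokens;
 * empty tokens are skipped.
 */
static int parse_rdt(const char *arg)
{
	int unknown = 0;

	assert(arg != NULL);
	while (*arg) {
		size_t len = strcspn(arg, ",");
		bool off = len > 0 && arg[0] == '!';
		struct rdt_opt *o = rdt_find(off ? arg + 1 : arg,
					     off ? len - 1 : len);

		if (o) {
			o->force_on = !off;
			o->force_off = off;
		} else if (len > 0) {
			unknown++;
		}

		arg += len;
		if (*arg == ',')
			arg++;
	}
	return unknown;
}
```

Calling `parse_rdt("cmt,!mba")` marks `cmt` as forced on and `mba` as forced off, matching the example in the documentation text, while an unknown name is counted rather than silently accepted.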

									
										8

Documentation/arch/arm64/booting.rst
									
												
				@@ -391,13 +391,13 @@ Before jumping into the kernel, the following conditions must be met:

				    - SMCR_EL2.LEN must be initialised to the same value for all CPUs the

				      kernel will execute on.

				-    - HWFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.

				+    - HFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.

				-    - HWFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.

				+    - HFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.

				-    - HWFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.

				+    - HFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.

				-    - HWFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.

				+    - HFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.

				  For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64):

									
										5

Documentation/arch/arm64/sve.rst
									
												
				@@ -402,6 +402,11 @@ The regset data starts with struct user_sve_header, containing:

				  streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode

				  if the target was not in streaming mode.

				* On systems that do not support SVE it is permitted to use SETREGSET to

				  write SVE_PT_REGS_FPSIMD formatted data via NT_ARM_SVE, in this case the

				  vector length should be specified as 0. This allows streaming mode to be

				  disabled on systems with SME but not SVE.

				* If any register data is provided along with SVE_PT_VL_ONEXEC then the

				  registers data will be interpreted with the current vector length, not

				  the vector length configured for use on exec.

									
										5

Documentation/arch/s390/s390dbf.rst
									
												
				@@ -243,9 +243,8 @@ Examples:

				Changing the size of debug areas

				------------------------------------

				It is possible the change the size of debug areas through piping

				the number of pages to the debugfs file "pages". The resize request will

				also flush the debug areas.

				To resize a debug area, write the desired page count to the "pages" file.

				Existing data is preserved if it fits; otherwise, oldest entries are dropped.

				Example:

									
										3

Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml
									
												
				@@ -39,6 +39,9 @@ properties:

				              - amlogic,a4-gpio-ao-intc

				              - amlogic,a5-gpio-intc

				              - amlogic,c3-gpio-intc

				              - amlogic,s6-gpio-intc

				              - amlogic,s7-gpio-intc

				              - amlogic,s7d-gpio-intc

				              - amlogic,t7-gpio-intc

				          - const: amlogic,meson-gpio-intc

									
										13

Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2700-intc.yaml
									
												
				@@ -25,13 +25,14 @@ properties:

				  interrupt-controller: true

				  '#interrupt-cells':

				    const: 2

				    const: 1

				    description:

				      The first cell is the IRQ number, the second cell is the trigger

				      type as defined in interrupt.txt in this directory.

				  interrupts:

				    maxItems: 6

				    minItems: 1

				    maxItems: 10

				    description: |

				      Depend to which INTC0 or INTC1 used.

				      INTC0 and INTC1 are two kinds of interrupt controller with enable and raw

				@@ -74,13 +75,17 @@ examples:

				        interrupt-controller@12101b00 {

				            compatible = "aspeed,ast2700-intc-ic";

				            reg = <0 0x12101b00 0 0x10>;

				            #interrupt-cells = <2>;

				            #interrupt-cells = <1>;

				            interrupt-controller;

				            interrupts = <GIC_SPI 192 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 193 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 194 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 195 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 196 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 197 IRQ_TYPE_LEVEL_HIGH>;

				                         <GIC_SPI 197 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 198 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 199 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 200 IRQ_TYPE_LEVEL_HIGH>,

				                         <GIC_SPI 201 IRQ_TYPE_LEVEL_HIGH>;

				        };

				    };

									
										4

Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml
									
												
				@@ -58,6 +58,7 @@ properties:

				          - const: andestech,nceplic100

				      - items:

				          - enum:

				              - anlogic,dr1v90-plic

				              - canaan,k210-plic

				              - eswin,eic7700-plic

				              - sifive,fu540-c000-plic

				@@ -75,6 +76,9 @@ properties:

				              - sophgo,sg2044-plic

				              - thead,th1520-plic

				          - const: thead,c900-plic

				      - items:

				          - const: ultrarisc,dp1000-plic

				          - const: ultrarisc,cp100-plic

				      - items:

				          - const: sifive,plic-1.0.0

				          - const: riscv,plic0

									
										17

Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-mswi.yaml
									
												
				@@ -4,18 +4,23 @@

				$id: http://devicetree.org/schemas/interrupt-controller/thead,c900-aclint-mswi.yaml#

				$schema: http://devicetree.org/meta-schemas/core.yaml#

				title: Sophgo sg2042 CLINT Machine-level Software Interrupt Device

				title: ACLINT Machine-level Software Interrupt Device

				maintainers:

				  - Inochi Amaoto <inochiama@outlook.com>

				properties:

				  compatible:

				    items:

				      - enum:

				          - sophgo,sg2042-aclint-mswi

				          - sophgo,sg2044-aclint-mswi

				      - const: thead,c900-aclint-mswi

				    oneOf:

				      - items:

				          - enum:

				              - sophgo,sg2042-aclint-mswi

				              - sophgo,sg2044-aclint-mswi

				          - const: thead,c900-aclint-mswi

				      - items:

				          - enum:

				              - anlogic,dr1v90-aclint-mswi

				          - const: nuclei,ux900-aclint-mswi

				  reg:

				    maxItems: 1

									
										4

Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-sswi.yaml
									
												
				@@ -30,6 +30,10 @@ properties:

				          - const: thead,c900-aclint-sswi

				      - items:

				          - const: mips,p8700-aclint-sswi

				      - items:

				          - enum:

				              - anlogic,dr1v90-aclint-sswi

				          - const: nuclei,ux900-aclint-sswi

				  reg:

				    maxItems: 1

									
										29

Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
									
												
				@@ -14,6 +14,7 @@ properties:

				    oneOf:

				      - enum:

				          - fsl,imx8-ddr-pmu

				          - fsl,imx8dxl-db-pmu

				          - fsl,imx8m-ddr-pmu

				          - fsl,imx8mq-ddr-pmu

				          - fsl,imx8mm-ddr-pmu

				@@ -28,7 +29,10 @@ properties:

				              - fsl,imx8mp-ddr-pmu

				          - const: fsl,imx8m-ddr-pmu

				      - items:

				          - const: fsl,imx8dxl-ddr-pmu

				          - enum:

				              - fsl,imx8dxl-ddr-pmu

				              - fsl,imx8qm-ddr-pmu

				              - fsl,imx8qxp-ddr-pmu

				          - const: fsl,imx8-ddr-pmu

				      - items:

				          - enum:

				@@ -43,6 +47,14 @@ properties:

				  interrupts:

				    maxItems: 1

				  clocks:

				    maxItems: 2

				  clock-names:

				    items:

				      - const: ipg

				      - const: cnt

				required:

				  - compatible

				  - reg

				@@ -50,6 +62,21 @@ required:

				additionalProperties: false

				allOf:

				  - if:

				      properties:

				        compatible:

				          contains:

				            const: fsl,imx8dxl-db-pmu

				    then:

				      required:

				        - clocks

				        - clock-names

				    else:

				      properties:

				        clocks: false

				        clock-names: false

				examples:

				  - |

				    #include <dt-bindings/interrupt-controller/arm-gic.h>

									
										47

Documentation/devicetree/bindings/timer/realtek,rtd1625-systimer.yaml
									
										Normal file
									
												
				@@ -0,0 +1,47 @@

				# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

				%YAML 1.2

				---

				$id: http://devicetree.org/schemas/timer/realtek,rtd1625-systimer.yaml#

				$schema: http://devicetree.org/meta-schemas/core.yaml#

				title: Realtek System Timer

				maintainers:

				  - Hao-Wen Ting <haowen.ting@realtek.com>

				description:

				  The Realtek SYSTIMER (System Timer) is a 64-bit global hardware counter operating

				  at a fixed 1MHz frequency. Thanks to its compare match interrupt capability,

				  the timer natively supports oneshot mode for tick broadcast functionality.

				properties:

				  compatible:

				    oneOf:

				      - const: realtek,rtd1625-systimer

				      - items:

				          - const: realtek,rtd1635-systimer

				          - const: realtek,rtd1625-systimer

				  reg:

				    maxItems: 1

				  interrupts:

				    maxItems: 1

				required:

				  - compatible

				  - reg

				  - interrupts

				additionalProperties: false

				examples:

				  - |

				    #include <dt-bindings/interrupt-controller/arm-gic.h>

				    timer@89420 {

				        compatible = "realtek,rtd1635-systimer",

				                     "realtek,rtd1625-systimer";

				        reg = <0x89420 0x18>;

				        interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH>;

				    };

									
										2

Documentation/devicetree/bindings/vendor-prefixes.yaml
									
												
				@@ -1705,6 +1705,8 @@ patternProperties:

				    description: Universal Scientific Industrial Co., Ltd.

				  "^usr,.*":

				    description: U.S. Robotics Corporation

				  "^ultrarisc,.*":

				    description: UltraRISC Technology Co., Ltd.

				  "^ultratronik,.*":

				    description: Ultratronik GmbH

				  "^utoo,.*":

									
										134

Documentation/filesystems/resctrl.rst
									
												
				@@ -17,17 +17,18 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).

				This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo

				flag bits:

				===============================================	================================

				RDT (Resource Director Technology) Allocation	"rdt_a"

				CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"

				CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"

				CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"

				MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"

				MBA (Memory Bandwidth Allocation)		"mba"

				SMBA (Slow Memory Bandwidth Allocation)         ""

				BMEC (Bandwidth Monitoring Event Configuration) ""

				ABMC (Assignable Bandwidth Monitoring Counters) ""

				===============================================	================================

				=============================================================== ================================

				RDT (Resource Director Technology) Allocation			"rdt_a"

				CAT (Cache Allocation Technology)				"cat_l3", "cat_l2"

				CDP (Code and Data Prioritization)				"cdp_l3", "cdp_l2"

				CQM (Cache QoS Monitoring)					"cqm_llc", "cqm_occup_llc"

				MBM (Memory Bandwidth Monitoring)				"cqm_mbm_total", "cqm_mbm_local"

				MBA (Memory Bandwidth Allocation)				"mba"

				SMBA (Slow Memory Bandwidth Allocation)				""

				BMEC (Bandwidth Monitoring Event Configuration)			""

				ABMC (Assignable Bandwidth Monitoring Counters)			""

				SDCIAE (Smart Data Cache Injection Allocation Enforcement)	""

				=============================================================== ================================

				Historically, new features were made visible by default in /proc/cpuinfo. This

				resulted in the feature flags becoming hard to parse by humans. Adding a new

				@@ -72,6 +73,11 @@ The 'info' directory contains information about the enabled

				resources. Each resource has its own subdirectory. The subdirectory

				names reflect the resource names.

				Most of the files in the resource's subdirectory are read-only, and

				describe properties of the resource. Resources that support global

				configuration options also include writable files that can be used

				to modify those settings.

				Each subdirectory contains the following files with respect to

				allocation:

				@@ -90,12 +96,19 @@ related to allocation:

						must be set when writing a mask.

				"shareable_bits":

						Bitmask of shareable resource with other executing

						entities (e.g. I/O). User can use this when

						setting up exclusive cache partitions. Note that

						some platforms support devices that have their

						own settings for cache use which can over-ride

						these bits.

						Bitmask of shareable resource with other executing entities

						(e.g. I/O). Applies to all instances of this resource. User

						can use this when setting up exclusive cache partitions.

						Note that some platforms support devices that have their

						own settings for cache use which can over-ride these bits.

						When "io_alloc" is enabled, a portion of each cache instance can

						be configured for shared use between hardware and software.

						"bit_usage" should be used to see which portions of each cache

						instance is configured for hardware use via "io_alloc" feature

						because every cache instance can have its "io_alloc" bitmask

						configured independently via "io_alloc_cbm".

				"bit_usage":

						Annotated capacity bitmasks showing how all

						instances of the resource are used. The legend is:

				@@ -109,16 +122,16 @@ related to allocation:

							"H":

							      Corresponding region is used by hardware only

							      but available for software use. If a resource

							      has bits set in "shareable_bits" but not all

							      of these bits appear in the resource groups'

							      schematas then the bits appearing in

							      "shareable_bits" but no resource group will

							      be marked as "H".

							      has bits set in "shareable_bits" or "io_alloc_cbm"

							      but not all of these bits appear in the resource

							      groups' schemata then the bits appearing in

							      "shareable_bits" or "io_alloc_cbm" but no

							      resource group will be marked as "H".

							"X":

							      Corresponding region is available for sharing and

							      used by hardware and software. These are the

							      bits that appear in "shareable_bits" as

							      well as a resource group's allocation.

							      used by hardware and software. These are the bits

							      that appear in "shareable_bits" or "io_alloc_cbm"

							      as well as a resource group's allocation.

							"S":

							      Corresponding region is used by software

							      and available for sharing.

				@@ -136,6 +149,77 @@ related to allocation:

							"1":

							      Non-contiguous 1s value in CBM is supported.

				"io_alloc":

						"io_alloc" enables system software to configure the portion of

						the cache allocated for I/O traffic. File may only exist if the

						system supports this feature on some of its cache resources.

							"disabled":

							      Resource supports "io_alloc" but the feature is disabled.

							      Portions of cache used for allocation of I/O traffic cannot

							      be configured.

							"enabled":

							      Portions of cache used for allocation of I/O traffic

							      can be configured using "io_alloc_cbm".

							"not supported":

							      Support not available for this resource.

						The feature can be modified by writing to the interface, for example:

						To enable::

							# echo 1 > /sys/fs/resctrl/info/L3/io_alloc

						To disable::

							# echo 0 > /sys/fs/resctrl/info/L3/io_alloc

						The underlying implementation may reduce resources available to

						general (CPU) cache allocation. See architecture specific notes

						below. Depending on usage requirements the feature can be enabled

						or disabled.

						On AMD systems, io_alloc feature is supported by the L3 Smart

						Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for

						io_alloc is the highest CLOSID supported by the resource. When

						io_alloc is enabled, the highest CLOSID is dedicated to io_alloc and

						no longer available for general (CPU) cache allocation. When CDP is

						enabled, io_alloc routes I/O traffic using the highest CLOSID allocated

						for the instruction cache (CDP_CODE), making this CLOSID no longer

						available for general (CPU) cache allocation for both the CDP_CODE

						and CDP_DATA resources.

				"io_alloc_cbm":

						Capacity bitmasks that describe the portions of cache instances to

						which I/O traffic from supported I/O devices are routed when "io_alloc"

						is enabled.

						CBMs are displayed in the following format:

							<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

						Example::

							# cat /sys/fs/resctrl/info/L3/io_alloc_cbm

							0=ffff;1=ffff

						CBMs can be configured by writing to the interface.

						Example::

							# echo 1=ff > /sys/fs/resctrl/info/L3/io_alloc_cbm

							# cat /sys/fs/resctrl/info/L3/io_alloc_cbm

							0=ffff;1=00ff

							# echo "0=ff;1=f" > /sys/fs/resctrl/info/L3/io_alloc_cbm

							# cat /sys/fs/resctrl/info/L3/io_alloc_cbm

							0=00ff;1=000f

						When CDP is enabled "io_alloc_cbm" associated with the CDP_DATA and CDP_CODE

						resources may reflect the same values. For example, values read from and

						written to /sys/fs/resctrl/info/L3DATA/io_alloc_cbm may be reflected by

						/sys/fs/resctrl/info/L3CODE/io_alloc_cbm and vice versa.
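The "<cache_id0>=<cbm>;<cache_id1>=<cbm>;..." format described above can be split with a few lines of C. This is only a sketch for user-space tooling: parse_io_alloc_cbm() is a hypothetical helper, not part of resctrl or the kernel, and it assumes decimal cache ids and hexadecimal bitmasks as in the sample output shown in the documentation.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not a kernel or resctrl API): parse a line such as
 * "0=ffff;1=00ff" into parallel arrays of cache ids and capacity bitmasks.
 * Assumes ids are decimal and bitmasks are hexadecimal.
 * Returns the number of entries parsed, or -1 on malformed input or when
 * more than max_entries entries are present.
 */
static int parse_io_alloc_cbm(const char *line, unsigned long *ids,
                              unsigned long *cbms, int max_entries)
{
    const char *p = line;
    int n = 0;

    while (*p != '\0' && *p != '\n') {
        char *end;
        unsigned long id, cbm;

        if (n == max_entries)
            return -1;

        /* Cache id, terminated by '='. */
        id = strtoul(p, &end, 10);
        if (end == p || *end != '=')
            return -1;
        p = end + 1;

        /* Capacity bitmask in hex. */
        cbm = strtoul(p, &end, 16);
        if (end == p)
            return -1;

        ids[n] = id;
        cbms[n] = cbm;
        n++;

        /* Entries are separated by ';'. */
        p = end;
        if (*p == ';')
            p++;
        else if (*p != '\0' && *p != '\n')
            return -1;
    }
    return n;
}
```

A caller would typically read one line from the io_alloc_cbm file into a buffer and pass it to this function; the same string layout is what a write to the file expects, so the parsed values can be modified and formatted back in the same form.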

				Memory bandwidth(MB) subdirectory contains the following files

				with respect to allocation:

15

MAINTAINERS


@@ -17470,6 +17470,16 @@ S:	Maintained
 F:	Documentation/devicetree/bindings/leds/backlight/mps,mp3309c.yaml
 F:	drivers/video/backlight/mp3309c.c
 MPAM DRIVER
 M:	James Morse <james.morse@arm.com>
 M:	Ben Horgan <ben.horgan@arm.com>
 R:	Reinette Chatre <reinette.chatre@intel.com>
 R:	Fenghua Yu <fenghuay@nvidia.com>
 S:	Maintained
 F:	drivers/resctrl/mpam_*
 F:	drivers/resctrl/test_mpam_*
 F:	include/linux/arm_mpam.h
 MPS MP2869 DRIVER
 M:	Wensheng Wang <wenswang@yeah.net>
 L:	linux-hwmon@vger.kernel.org
@@ -21681,6 +21691,11 @@ S:	Maintained
 F:	Documentation/devicetree/bindings/spi/realtek,rtl9301-snand.yaml
 F:	drivers/spi/spi-realtek-rtl-snand.c
 REALTEK SYSTIMER DRIVER
 M:	Hao-Wen Ting <haowen.ting@realtek.com>
 S:	Maintained
 F:	drivers/clocksource/timer-realtek.c
 REALTEK WIRELESS DRIVER (rtlwifi family)
 M:	Ping-Ke Shih <pkshih@realtek.com>
 L:	linux-wireless@vger.kernel.org

									
										26

arch/arm/include/asm/uaccess.h
									
												
				@@ -283,10 +283,17 @@ extern int __put_user_8(void *, unsigned long long);

					__gu_err;							\

				})

				/*

				 * This is a type: either unsigned long, if the argument fits into

				 * that type, or otherwise unsigned long long.

				 */

				#define __long_type(x) \

					__typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
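The type selection done by __long_type() can be tried out in user space. The sketch below copies the same expression under the assumed name long_type() and adds a hypothetical long_type_id() probe built on C11 _Generic; it relies on the GCC/Clang extensions __typeof__ and __builtin_choose_expr. On a 64-bit host nothing scalar is wider than unsigned long, so a two-word struct stands in for the "does not fit" case that a 64-bit value hits on 32-bit Arm.

```c
/* Same expression as the macro in the hunk above, under an illustrative name. */
#define long_type(x) \
    __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))

/* Hypothetical probe: 1 if unsigned long was chosen, 2 if unsigned long long. */
#define long_type_id(x) \
    _Generic((long_type(x))0, unsigned long: 1, unsigned long long: 2)

/* A char always fits into unsigned long. */
static int id_for_char(void)
{
    char c = 0;

    return long_type_id(c);
}

/* Two unsigned longs are always larger than one, on any ABI. */
static int id_for_wide(void)
{
    struct { unsigned long a, b; } w = { 0, 0 };

    return long_type_id(w);
}
```

Only sizeof(x) is inspected, so the argument is never evaluated and any object type can be passed to the macro.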

				#define __get_user_err(x, ptr, err, __t)				\

				do {									\

					unsigned long __gu_addr = (unsigned long)(ptr);			\

					unsigned long __gu_val;						\

					__long_type(x) __gu_val;					\

					unsigned int __ua_flags;					\

					__chk_user_ptr(ptr);						\

					might_fault();							\

				@@ -295,6 +302,7 @@ do {									\

					case 1:	__get_user_asm_byte(__gu_val, __gu_addr, err, __t); break;	\

					case 2:	__get_user_asm_half(__gu_val, __gu_addr, err, __t); break;	\

					case 4:	__get_user_asm_word(__gu_val, __gu_addr, err, __t); break;	\

					case 8:	__get_user_asm_dword(__gu_val, __gu_addr, err, __t); break;	\

					default: (__gu_val) = __get_user_bad();				\

					}								\

					uaccess_restore(__ua_flags);					\

				@@ -353,6 +361,22 @@ do {									\

				#define __get_user_asm_word(x, addr, err, __t)			\

					__get_user_asm(x, addr, err, "ldr" __t)

				#ifdef __ARMEB__

				#define __WORD0_OFFS	4

				#define __WORD1_OFFS	0

				#else

				#define __WORD0_OFFS	0

				#define __WORD1_OFFS	4

				#endif

				#define __get_user_asm_dword(x, addr, err, __t)				\

					({								\

					unsigned long __w0, __w1;					\

					__get_user_asm(__w0, addr + __WORD0_OFFS, err, "ldr" __t);	\

					__get_user_asm(__w1, addr + __WORD1_OFFS, err, "ldr" __t);	\

					(x) = ((u64)__w1 << 32) | (u64) __w0;				\

				})
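The word-offset logic of __get_user_asm_dword() can be modelled in plain C. This is a sketch, not the kernel code: load_dword_as_two_words() is a hypothetical name, memcpy() stands in for the two "ldr" user accesses, and the compiler's __BYTE_ORDER__ macro is assumed to play the role of __ARMEB__ (a little-endian host is assumed when it is not defined).

```c
#include <stdint.h>
#include <string.h>

/* Offsets of the low (word 0) and high (word 1) halves of a 64-bit value. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define WORD0_OFFS 4
#define WORD1_OFFS 0
#else
#define WORD0_OFFS 0
#define WORD1_OFFS 4
#endif

/* Read a 64-bit value as two 32-bit loads and recombine the halves. */
static uint64_t load_dword_as_two_words(const void *addr)
{
    const unsigned char *p = addr;
    uint32_t w0, w1;

    memcpy(&w0, p + WORD0_OFFS, sizeof(w0));   /* low half  */
    memcpy(&w1, p + WORD1_OFFS, sizeof(w1));   /* high half */

    return ((uint64_t)w1 << 32) | (uint64_t)w0;
}
```

Because the offsets follow the byte order, the function returns the original value on either endianness, which is the property the macro needs for the 8-byte case.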

				#define __put_user_switch(x, ptr, __err, __fn)				\

					do {								\

						const __typeof__(*(ptr)) __user *__pu_ptr = (ptr);	\

26

arch/arm64/Kconfig


@@ -47,7 +47,6 @@ config ARM64
 	select ARCH_HAS_SETUP_DMA_OPS
 	select ARCH_HAS_SET_DIRECT_MAP
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_MEM_ENCRYPT
 	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
 	select ARCH_STACKWALK
 	select ARCH_HAS_STRICT_KERNEL_RWX
@@ -2023,6 +2022,31 @@ config ARM64_TLB_RANGE
 	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
 	  range of input addresses.
 config ARM64_MPAM
 	bool "Enable support for MPAM"
 	select ARM64_MPAM_DRIVER if EXPERT	# does nothing yet
 	select ACPI_MPAM if ACPI
 	help
 	  Memory System Resource Partitioning and Monitoring (MPAM) is an
 	  optional extension to the Arm architecture that allows each
 	  transaction issued to the memory system to be labelled with a
 	  Partition identifier (PARTID) and Performance Monitoring Group
 	  identifier (PMG).
 	  Memory system components, such as the caches, can be configured with
 	  policies to control how much of various physical resources (such as
 	  memory bandwidth or cache memory) the transactions labelled with each
 	  PARTID can consume.  Depending on the capabilities of the hardware,
 	  the PARTID and PMG can also be used as filtering criteria to measure
 	  the memory system resource consumption of different parts of a
 	  workload.
 	  Use of this extension requires CPU support, support in the
 	  Memory System Components (MSC), and a description from firmware
 	  of where the MSCs are in the address space.
 	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
 endmenu # "ARMv8.4 architectural features"
 menu "ARMv8.5 architectural features"

									
										8

arch/arm64/include/asm/alternative-macros.h
									
												
				@@ -19,7 +19,7 @@

				#error "cpucaps have overflown ARM64_CB_BIT"

				#endif

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/stringify.h>

				@@ -207,7 +207,7 @@ alternative_endif

				#define _ALTERNATIVE_CFG(insn1, insn2, cap, cfg, ...)	\

					alternative_insn insn1, insn2, cap, IS_ENABLED(cfg)

				#endif  /*  __ASSEMBLY__  */

				#endif  /*  __ASSEMBLER__  */

				/*

				 * Usage: asm(ALTERNATIVE(oldinstr, newinstr, cpucap));

				@@ -219,7 +219,7 @@ alternative_endif

				#define ALTERNATIVE(oldinstr, newinstr, ...)   \

					_ALTERNATIVE_CFG(oldinstr, newinstr, __VA_ARGS__, 1)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/types.h>

				@@ -263,6 +263,6 @@ l_yes:

					return true;

				}

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_ALTERNATIVE_MACROS_H */

									
										4

arch/arm64/include/asm/alternative.h
									
												
				@@ -4,7 +4,7 @@

				#include <asm/alternative-macros.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/init.h>

				#include <linux/types.h>

				@@ -37,5 +37,5 @@ static inline int apply_alternatives_module(void *start, size_t length)

				void alt_cb_patch_nops(struct alt_instr *alt, __le32 *origptr,

						       __le32 *updptr, int nr_inst);

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_ALTERNATIVE_H */

									
										4

arch/arm64/include/asm/arch_gicv3.h
									
												
				@@ -9,7 +9,7 @@

				#include <asm/sysreg.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/irqchip/arm-gic-common.h>

				#include <linux/stringify.h>

				@@ -188,5 +188,5 @@ static inline bool gic_has_relaxed_pmr_sync(void)

					return cpus_have_cap(ARM64_HAS_GIC_PRIO_RELAXED_SYNC);

				}

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_ARCH_GICV3_H */

									
										6

arch/arm64/include/asm/asm-extable.h
									
												
				@@ -27,7 +27,7 @@

				/* Data fields for EX_TYPE_UACCESS_CPY */

				#define EX_DATA_UACCESS_WRITE	BIT(0)

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

				#define __ASM_EXTABLE_RAW(insn, fixup, type, data)	\

					.pushsection	__ex_table, "a";		\

				@@ -77,7 +77,7 @@

					__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_UACCESS_CPY, \uaccess_is_write)

					.endm

				#else /* __ASSEMBLY__ */

				#else /* __ASSEMBLER__ */

				#include <linux/stringify.h>

				@@ -132,6 +132,6 @@

							    EX_DATA_REG(ADDR, addr)				\

							  ")")

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_ASM_EXTABLE_H */

									
										12

arch/arm64/include/asm/assembler.h
									
												
				@@ -5,7 +5,7 @@

				 * Copyright (C) 1996-2000 Russell King

				 * Copyright (C) 2012 ARM Ltd.

				 */

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#error "Only include this from assembly code"

				#endif

				@@ -325,14 +325,14 @@ alternative_cb_end

				 * tcr_set_t0sz - update TCR.T0SZ so that we can load the ID map

				 */

					.macro	tcr_set_t0sz, valreg, t0sz

					bfi	\valreg, \t0sz, #TCR_T0SZ_OFFSET, #TCR_TxSZ_WIDTH

					bfi	\valreg, \t0sz, #TCR_EL1_T0SZ_SHIFT, #TCR_EL1_T0SZ_WIDTH

					.endm

				/*

				 * tcr_set_t1sz - update TCR.T1SZ

				 */

					.macro	tcr_set_t1sz, valreg, t1sz

					bfi	\valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH

					bfi	\valreg, \t1sz, #TCR_EL1_T1SZ_SHIFT, #TCR_EL1_T1SZ_WIDTH

					.endm
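The "bfi" (bitfield insert) used by tcr_set_t0sz and tcr_set_t1sz replaces a field of the destination register with the low bits of the source. A C model of that operation, as a sketch: bfi64() is a hypothetical helper, and the shift and width values below are assumptions for illustration (T0SZ in bits [5:0], T1SZ in bits [21:16]) rather than values taken from this diff.

```c
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define T0SZ_SHIFT 0
#define T0SZ_WIDTH 6
#define T1SZ_SHIFT 16
#define T1SZ_WIDTH 6

/*
 * Model of "bfi dst, src, #lsb, #width": copy the low `width` bits of src
 * into dst starting at bit `lsb`, leaving all other bits of dst unchanged.
 * Requires lsb < 64 and lsb + width <= 64.
 */
static uint64_t bfi64(uint64_t dst, uint64_t src, unsigned int lsb,
                      unsigned int width)
{
    uint64_t field = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    uint64_t mask = field << lsb;

    return (dst & ~mask) | ((src << lsb) & mask);
}
```

For instance, bfi64(tcr, t0sz, T0SZ_SHIFT, T0SZ_WIDTH) corresponds to what tcr_set_t0sz does with \valreg and \t0sz, with no separate clear step needed because the old field contents are masked out.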

				/*

				@@ -371,7 +371,7 @@ alternative_endif

				 * [start, end) with dcache line size explicitly provided.

				 *

				 * 	op:		operation passed to dc instruction

				 * 	domain:		domain used in dsb instruciton

				 * 	domain:		domain used in dsb instruction

				 * 	start:          starting virtual address of the region

				 * 	end:            end virtual address of the region

				 *	linesz:		dcache line size

				@@ -412,7 +412,7 @@ alternative_endif

				 * [start, end)

				 *

				 * 	op:		operation passed to dc instruction

				 * 	domain:		domain used in dsb instruciton

				 * 	domain:		domain used in dsb instruction

				 * 	start:          starting virtual address of the region

				 * 	end:            end virtual address of the region

				 * 	fixup:		optional label to branch to on user fault

				@@ -589,7 +589,7 @@ alternative_endif

					.macro	offset_ttbr1, ttbr, tmp

				#if defined(CONFIG_ARM64_VA_BITS_52) && !defined(CONFIG_ARM64_LPA2)

					mrs	\tmp, tcr_el1

					and	\tmp, \tmp, #TCR_T1SZ_MASK

					and	\tmp, \tmp, #TCR_EL1_T1SZ_MASK

					cmp	\tmp, #TCR_T1SZ(VA_BITS_MIN)

					orr	\tmp, \ttbr, #TTBR1_BADDR_4852_OFFSET

					csel	\ttbr, \tmp, \ttbr, eq

									
										20

arch/arm64/include/asm/atomic_lse.h
									
												
				@@ -103,17 +103,17 @@ static __always_inline void __lse_atomic_and(int i, atomic_t *v)

					return __lse_atomic_andnot(~i, v);

				}

				#define ATOMIC_FETCH_OP_AND(name, mb, cl...)				\

				#define ATOMIC_FETCH_OP_AND(name)					\

				static __always_inline int						\

				__lse_atomic_fetch_and##name(int i, atomic_t *v)			\

				{									\

					return __lse_atomic_fetch_andnot##name(~i, v);			\

				}

				ATOMIC_FETCH_OP_AND(_relaxed,   )

				ATOMIC_FETCH_OP_AND(_acquire,  a, "memory")

				ATOMIC_FETCH_OP_AND(_release,  l, "memory")

				ATOMIC_FETCH_OP_AND(        , al, "memory")

				ATOMIC_FETCH_OP_AND(_relaxed)

				ATOMIC_FETCH_OP_AND(_acquire)

				ATOMIC_FETCH_OP_AND(_release)

				ATOMIC_FETCH_OP_AND(        )

				#undef ATOMIC_FETCH_OP_AND

				@@ -210,17 +210,17 @@ static __always_inline void __lse_atomic64_and(s64 i, atomic64_t *v)

					return __lse_atomic64_andnot(~i, v);

				}

				#define ATOMIC64_FETCH_OP_AND(name, mb, cl...)				\

				#define ATOMIC64_FETCH_OP_AND(name)					\

				static __always_inline long						\

				__lse_atomic64_fetch_and##name(s64 i, atomic64_t *v)			\

				{									\

					return __lse_atomic64_fetch_andnot##name(~i, v);		\

				}

				ATOMIC64_FETCH_OP_AND(_relaxed,   )

				ATOMIC64_FETCH_OP_AND(_acquire,  a, "memory")

				ATOMIC64_FETCH_OP_AND(_release,  l, "memory")

				ATOMIC64_FETCH_OP_AND(        , al, "memory")

				ATOMIC64_FETCH_OP_AND(_relaxed)

				ATOMIC64_FETCH_OP_AND(_acquire)

				ATOMIC64_FETCH_OP_AND(_release)

				ATOMIC64_FETCH_OP_AND(        )

				#undef ATOMIC64_FETCH_OP_AND
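The wrappers above express fetch-and in terms of fetch-andnot with an inverted operand, which is why the "mb" and "cl" macro arguments were unused and could be dropped: the ordering comes from the andnot variant selected by the name suffix. The identity can be checked in user space with C11 atomics. This is a sketch with hypothetical names; atomic_fetch_and() stands in for the bit-clearing primitive and is not how the kernel implements it.

```c
#include <stdatomic.h>

/*
 * Stand-in for a "clear bits" primitive: atomically clears the bits that
 * are set in `mask` and returns the previous value.
 */
static int fetch_andnot(int mask, atomic_int *v)
{
    return atomic_fetch_and(v, ~mask);
}

/* fetch-and built on top of fetch-andnot: x & i == x & ~(~i). */
static int fetch_and(int i, atomic_int *v)
{
    return fetch_andnot(~i, v);
}
```

Calling fetch_and(0x0f, &v) on a value of 0xff returns 0xff and leaves 0x0f stored in v, the same result a direct atomic AND would give.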

									
										4

arch/arm64/include/asm/barrier.h
									
												
				@@ -7,7 +7,7 @@

				#ifndef __ASM_BARRIER_H

				#define __ASM_BARRIER_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/kasan-checks.h>

				@@ -221,6 +221,6 @@ do {									\

				#include <asm-generic/barrier.h>

				#endif	/* __ASSEMBLY__ */

				#endif	/* __ASSEMBLER__ */

				#endif	/* __ASM_BARRIER_H */

									
										4

arch/arm64/include/asm/cache.h
									
												
				@@ -35,7 +35,7 @@

				#define ARCH_DMA_MINALIGN	(128)

				#define ARCH_KMALLOC_MINALIGN	(8)

				#if !defined(__ASSEMBLY__) && !defined(BUILD_VDSO)

				#if !defined(__ASSEMBLER__) && !defined(BUILD_VDSO)

				#include <linux/bitops.h>

				#include <linux/kasan-enabled.h>

				@@ -135,6 +135,6 @@ static inline u32 __attribute_const__ read_cpuid_effective_cachetype(void)

					return ctr;

				}

				#endif /* !defined(__ASSEMBLY__) && !defined(BUILD_VDSO) */

				#endif /* !defined(__ASSEMBLER__) && !defined(BUILD_VDSO) */

				#endif

									
										4

arch/arm64/include/asm/cpucaps.h
									
												
				@@ -5,7 +5,7 @@

				#include <asm/cpucap-defs.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/types.h>

				/*

				 * Check whether a cpucap is possible at compiletime.

				@@ -77,6 +77,6 @@ cpucap_is_possible(const unsigned int cap)

					return true;

				}

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_CPUCAPS_H */

									
										8

arch/arm64/include/asm/cpufeature.h
									
												
				@@ -19,7 +19,7 @@

				#define ARM64_SW_FEATURE_OVERRIDE_HVHE		4

				#define ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF	8

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/bug.h>

				#include <linux/jump_label.h>

				@@ -199,7 +199,7 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;

				 *    registers (e.g, SCTLR, TCR etc.) or patching the kernel via

				 *    alternatives. The kernel patching is batched and performed at later

				 *    point. The actions are always initiated only after the capability

				 *    is finalised. This is usally denoted by "enabling" the capability.

				 *    is finalised. This is usually denoted by "enabling" the capability.

				 *    The actions are initiated as follows :

				 *	a) Action is triggered on all online CPUs, after the capability is

				 *	finalised, invoked within the stop_machine() context from

				@@ -251,7 +251,7 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;

				#define ARM64_CPUCAP_SCOPE_LOCAL_CPU		((u16)BIT(0))

				#define ARM64_CPUCAP_SCOPE_SYSTEM		((u16)BIT(1))

				/*

				 * The capabilitiy is detected on the Boot CPU and is used by kernel

				 * The capability is detected on the Boot CPU and is used by kernel

				 * during early boot. i.e, the capability should be "detected" and

				 * "enabled" as early as possibly on all booting CPUs.

				 */

				@@ -1078,6 +1078,6 @@ static inline bool cpu_has_lpa2(void)

				#endif

				}

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif

									
										6

arch/arm64/include/asm/cputype.h
									
												
				@@ -247,9 +247,9 @@

				/* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */

				#define MIDR_FUJITSU_ERRATUM_010001		MIDR_FUJITSU_A64FX

				#define MIDR_FUJITSU_ERRATUM_010001_MASK	(~MIDR_CPU_VAR_REV(1, 0))

				#define TCR_CLEAR_FUJITSU_ERRATUM_010001	(TCR_NFD1 | TCR_NFD0)

				#define TCR_CLEAR_FUJITSU_ERRATUM_010001	(TCR_EL1_NFD1 | TCR_EL1_NFD0)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/sysreg.h>

				@@ -328,6 +328,6 @@ static inline u32 __attribute_const__ read_cpuid_cachetype(void)

				{

					return read_cpuid(CTR_EL0);

				}

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif

									
										4

arch/arm64/include/asm/current.h
									
												
				@@ -4,7 +4,7 @@

				#include <linux/compiler.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				struct task_struct;

				@@ -23,7 +23,7 @@ static __always_inline struct task_struct *get_current(void)

				#define current get_current()

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_CURRENT_H */

									
										4

arch/arm64/include/asm/debug-monitors.h
									
												
				@@ -48,7 +48,7 @@

				#define AARCH32_BREAK_THUMB2_LO	0xf7f0

				#define AARCH32_BREAK_THUMB2_HI	0xa000

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				struct task_struct;

				#define DBG_ARCH_ID_RESERVED	0	/* In case of ptrace ABI updates. */

				@@ -88,5 +88,5 @@ static inline bool try_step_suspended_breakpoints(struct pt_regs *regs)

				bool try_handle_aarch32_break(struct pt_regs *regs);

				#endif	/* __ASSEMBLY */

				#endif	/* __ASSEMBLER__ */

				#endif	/* __ASM_DEBUG_MONITORS_H */

									
										13

arch/arm64/include/asm/efi.h
									
												
				@@ -126,21 +126,14 @@ static inline void efi_set_pgd(struct mm_struct *mm)

						if (mm != current->active_mm) {

							/*

							 * Update the current thread's saved ttbr0 since it is

							 * restored as part of a return from exception. Enable

							 * access to the valid TTBR0_EL1 and invoke the errata

							 * workaround directly since there is no return from

							 * exception when invoking the EFI run-time services.

							 * restored as part of a return from exception.

							 */

							update_saved_ttbr0(current, mm);

							uaccess_ttbr0_enable();

							post_ttbr_update_workaround();

						} else {

							/*

							 * Defer the switch to the current thread's TTBR0_EL1

							 * until uaccess_enable(). Restore the current

							 * thread's saved ttbr0 corresponding to its active_mm

							 * Restore the current thread's saved ttbr0

							 * corresponding to its active_mm

							 */

							uaccess_ttbr0_disable();

							update_saved_ttbr0(current, current->active_mm);

						}

					}

									
										4

arch/arm64/include/asm/el2_setup.h
									
												
				@@ -7,7 +7,7 @@

				#ifndef __ARM_KVM_INIT_H__

				#define __ARM_KVM_INIT_H__

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#error Assembly-only header

				#endif

				@@ -24,7 +24,7 @@

					 * ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it

					 * can reset into an UNKNOWN state and might not read as 1 until it has

					 * been initialized explicitly.

					 * Initalize HCR_EL2.E2H so that later code can rely upon HCR_EL2.E2H

					 * Initialize HCR_EL2.E2H so that later code can rely upon HCR_EL2.E2H

					 * indicating whether the CPU is running in E2H mode.

					 */

					mrs_s	x1, SYS_ID_AA64MMFR4_EL1

									
										4

arch/arm64/include/asm/elf.h
									
												
				@@ -133,7 +133,7 @@

				#define ELF_ET_DYN_BASE		(2 * DEFAULT_MAP_WINDOW_64 / 3)

				#endif /* CONFIG_ARM64_FORCE_52BIT */

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <uapi/linux/elf.h>

				#include <linux/bug.h>

				@@ -293,6 +293,6 @@ static inline int arch_check_elf(void *ehdr, bool has_interp,

					return 0;

				}

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif

									
										4

arch/arm64/include/asm/esr.h
									
												
				@@ -431,7 +431,7 @@

				#define ESR_ELx_IT_GCSPOPCX		6

				#define ESR_ELx_IT_GCSPOPX		7

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/types.h>

				static inline unsigned long esr_brk_comment(unsigned long esr)

				@@ -534,6 +534,6 @@ static inline bool esr_iss_is_eretab(unsigned long esr)

				}

				const char *esr_get_class_string(unsigned long esr);

				#endif /* __ASSEMBLY */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_ESR_H */

									
										4

arch/arm64/include/asm/fixmap.h
									
												
				@@ -15,7 +15,7 @@

				#ifndef _ASM_ARM64_FIXMAP_H

				#define _ASM_ARM64_FIXMAP_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/kernel.h>

				#include <linux/math.h>

				#include <linux/sizes.h>

				@@ -117,5 +117,5 @@ extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t pr

				#include <asm-generic/fixmap.h>

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif /* _ASM_ARM64_FIXMAP_H */

									
										2

arch/arm64/include/asm/fpsimd.h
									
												
				@@ -12,7 +12,7 @@

				#include <asm/sigcontext.h>

				#include <asm/sysreg.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/bitmap.h>

				#include <linux/build_bug.h>

									
										6

arch/arm64/include/asm/ftrace.h
									
												
				@@ -37,7 +37,7 @@

				 */

				#define ARCH_FTRACE_SHIFT_STACK_TRACER 1

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/compat.h>

				extern void _mcount(unsigned long);

				@@ -217,9 +217,9 @@ static inline bool arch_syscall_match_sym_name(const char *sym,

					 */

					return !strcmp(sym + 8, name);

				}

				#endif /* ifndef __ASSEMBLY__ */

				#endif /* ifndef __ASSEMBLER__ */

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#ifdef CONFIG_FUNCTION_GRAPH_TRACER

				void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,

									
										6

arch/arm64/include/asm/gpr-num.h
									
												
				@@ -2,7 +2,7 @@

				#ifndef __ASM_GPR_NUM_H

				#define __ASM_GPR_NUM_H

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

					.irp	num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30

					.equ	.L__gpr_num_x\num, \num

				@@ -11,7 +11,7 @@

					.equ	.L__gpr_num_xzr, 31

					.equ	.L__gpr_num_wzr, 31

				#else /* __ASSEMBLY__ */

				#else /* __ASSEMBLER__ */

				#define __DEFINE_ASM_GPR_NUMS					\

				"	.irp	num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n" \

				@@ -21,6 +21,6 @@

				"	.equ	.L__gpr_num_xzr, 31\n"				\

				"	.equ	.L__gpr_num_wzr, 31\n"

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_GPR_NUM_H */
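
Note on the guard rename in the hunks above: GCC and Clang predefine `__ASSEMBLER__` when preprocessing assembly sources, whereas `__ASSEMBLY__` is a macro the kernel build passes on the command line. The following is a minimal sketch of a dual-use header guarded this way; `DEMO_STACK_SIZE`, `struct demo_ctx` and `demo_stack_top()` are hypothetical names used only for illustration and do not appear in the commit.

```c
#include <assert.h>

/* Plain constants are safe for both C and assembly (.S) consumers. */
#define DEMO_STACK_SIZE 4096

#ifndef __ASSEMBLER__
/*
 * C-only section: the assembler cannot parse type definitions or
 * inline functions, so they are hidden when __ASSEMBLER__ is defined.
 */
struct demo_ctx {
	unsigned long sp;
};

static inline unsigned long demo_stack_top(unsigned long base)
{
	return base + DEMO_STACK_SIZE;
}

/* Compile-time check that the C side sees the shared constant. */
static_assert(DEMO_STACK_SIZE == 4096, "shared constant visible to C");
#endif /* __ASSEMBLER__ */
```

When this is compiled as C, the compiler does not define `__ASSEMBLER__`, so the C-only section is included; when a `.S` file includes the same header, only the `#define` survives preprocessing.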

									
										2

arch/arm64/include/asm/hwcap.h
									
												
				@@ -46,7 +46,7 @@

				#define COMPAT_HWCAP2_SB	(1 << 5)

				#define COMPAT_HWCAP2_SSBS	(1 << 6)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/log2.h>

				/*

									
										4

arch/arm64/include/asm/image.h
									
												
				@@ -20,7 +20,7 @@

				#define ARM64_IMAGE_FLAG_PAGE_SIZE_64K		3

				#define ARM64_IMAGE_FLAG_PHYS_BASE		1

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#define arm64_image_flag_field(flags, field) \

								(((flags) >> field##_SHIFT) & field##_MASK)

				@@ -54,6 +54,6 @@ struct arm64_image_header {

					__le32 res5;

				};

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_IMAGE_H */

									
										4

arch/arm64/include/asm/insn.h
									
												
				@@ -12,7 +12,7 @@

				#include <asm/insn-def.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				enum aarch64_insn_hint_cr_op {

					AARCH64_INSN_HINT_NOP	= 0x0 << 5,

				@@ -730,6 +730,6 @@ u32 aarch32_insn_mcr_extract_crm(u32 insn);

				typedef bool (pstate_check_t)(unsigned long);

				extern pstate_check_t * const aarch32_opcode_cond_checks[16];

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif	/* __ASM_INSN_H */

									
										4

arch/arm64/include/asm/jump_label.h
									
												
				@@ -8,7 +8,7 @@

				#ifndef __ASM_JUMP_LABEL_H

				#define __ASM_JUMP_LABEL_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/types.h>

				#include <asm/insn.h>

				@@ -58,5 +58,5 @@ l_yes:

					return true;

				}

				#endif  /* __ASSEMBLY__ */

				#endif  /* __ASSEMBLER__ */

				#endif	/* __ASM_JUMP_LABEL_H */

									
										2

arch/arm64/include/asm/kasan.h
									
												
				@@ -2,7 +2,7 @@

				#ifndef __ASM_KASAN_H

				#define __ASM_KASAN_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/linkage.h>

				#include <asm/memory.h>

									
										4

arch/arm64/include/asm/kexec.h
									
												
				@@ -25,7 +25,7 @@

				#define KEXEC_ARCH KEXEC_ARCH_AARCH64

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				/**

				 * crash_setup_regs() - save registers for the panic kernel

				@@ -130,6 +130,6 @@ extern int load_other_segments(struct kimage *image,

						char *cmdline);

				#endif

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif

									
										4

arch/arm64/include/asm/kgdb.h
									
												
				@@ -14,7 +14,7 @@

				#include <linux/ptrace.h>

				#include <asm/debug-monitors.h>

				#ifndef	__ASSEMBLY__

				#ifndef	__ASSEMBLER__

				static inline void arch_kgdb_breakpoint(void)

				{

				@@ -36,7 +36,7 @@ static inline int kgdb_single_step_handler(struct pt_regs *regs,

				}

				#endif

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				/*

				 * gdb remote procotol (well most versions of it) expects the following

									
										4

arch/arm64/include/asm/kvm_asm.h
									
												
				@@ -46,7 +46,7 @@

				#define __KVM_HOST_SMCCC_FUNC___kvm_hyp_init			0

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/mm.h>

				@@ -303,7 +303,7 @@ void kvm_compute_final_ctr_el0(struct alt_instr *alt,

				void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt,

					u64 elr_phys, u64 par, uintptr_t vcpu, u64 far, u64 hpfar);

				#else /* __ASSEMBLY__ */

				#else /* __ASSEMBLER__ */

				.macro get_host_ctxt reg, tmp

					adr_this_cpu \reg, kvm_host_data, \tmp

									
										4

arch/arm64/include/asm/kvm_mmu.h
									
												
				@@ -49,7 +49,7 @@

				 * mappings, and none of this applies in that case.

				 */

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

				#include <asm/alternative.h>

				@@ -396,5 +396,5 @@ void kvm_s2_ptdump_create_debugfs(struct kvm *kvm);

				static inline void kvm_s2_ptdump_create_debugfs(struct kvm *kvm) {}

				#endif /* CONFIG_PTDUMP_STAGE2_DEBUGFS */

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ARM64_KVM_MMU_H__ */

									
										4

arch/arm64/include/asm/kvm_mte.h
									
												
				@@ -5,7 +5,7 @@

				#ifndef __ASM_KVM_MTE_H

				#define __ASM_KVM_MTE_H

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

				#include <asm/sysreg.h>

				@@ -62,5 +62,5 @@ alternative_else_nop_endif

				.endm

				#endif /* CONFIG_ARM64_MTE */

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_KVM_MTE_H */

									
										6

arch/arm64/include/asm/kvm_ptrauth.h
									
												
				@@ -8,7 +8,7 @@

				#ifndef __ASM_KVM_PTRAUTH_H

				#define __ASM_KVM_PTRAUTH_H

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

				#include <asm/sysreg.h>

				@@ -100,7 +100,7 @@ alternative_else_nop_endif

				.endm

				#endif /* CONFIG_ARM64_PTR_AUTH */

				#else  /* !__ASSEMBLY */

				#else  /* !__ASSEMBLER__ */

				#define __ptrauth_save_key(ctxt, key)					\

					do {								\

				@@ -120,5 +120,5 @@ alternative_else_nop_endif

						__ptrauth_save_key(ctxt, APGA);				\

					} while(0)

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_KVM_PTRAUTH_H */

									
										2

arch/arm64/include/asm/linkage.h
									
												
				@@ -1,7 +1,7 @@

				#ifndef __ASM_LINKAGE_H

				#define __ASM_LINKAGE_H

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

				#include <asm/assembler.h>

				#endif

									
										5

arch/arm64/include/asm/memory.h
									
												
				@@ -207,7 +207,7 @@

				 */

				#define TRAMP_SWAPPER_OFFSET	(2 * PAGE_SIZE)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/bitops.h>

				#include <linux/compiler.h>

				@@ -392,7 +392,6 @@ static inline unsigned long virt_to_pfn(const void *kaddr)

				 *  virt_to_page(x)	convert a _valid_ virtual address to struct page *

				 *  virt_addr_valid(x)	indicates whether a virtual address is valid

				 */

				#define ARCH_PFN_OFFSET		((unsigned long)PHYS_PFN_OFFSET)

				#if defined(CONFIG_DEBUG_VIRTUAL)

				#define page_to_virt(x)	({						\

				@@ -422,7 +421,7 @@ static inline unsigned long virt_to_pfn(const void *kaddr)

				})

				void dump_mem_limit(void);

				#endif /* !ASSEMBLY */

				#endif /* !__ASSEMBLER__ */

				/*

				 * Given that the GIC architecture permits ITS implementations that can only be

									
										4

arch/arm64/include/asm/mmu.h
									
												
				@@ -12,7 +12,7 @@

				#define USER_ASID_FLAG	(UL(1) << USER_ASID_BIT)

				#define TTBR_ASID_MASK	(UL(0xffff) << 48)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/refcount.h>

				#include <asm/cpufeature.h>

				@@ -112,5 +112,5 @@ void kpti_install_ng_mappings(void);

				static inline void kpti_install_ng_mappings(void) {}

				#endif

				#endif	/* !__ASSEMBLY__ */

				#endif	/* !__ASSEMBLER__ */

				#endif

									
										20

arch/arm64/include/asm/mmu_context.h
									
												
				@@ -8,7 +8,7 @@

				#ifndef __ASM_MMU_CONTEXT_H

				#define __ASM_MMU_CONTEXT_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/compiler.h>

				#include <linux/sched.h>

				@@ -61,11 +61,6 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)

					cpu_do_switch_mm(virt_to_phys(pgd),mm);

				}

				/*

				 * TCR.T0SZ value to use when the ID map is active.

				 */

				#define idmap_t0sz	TCR_T0SZ(IDMAP_VA_BITS)

				/*

				 * Ensure TCR.T0SZ is set to the provided value.

				 */

				@@ -73,18 +68,15 @@ static inline void __cpu_set_tcr_t0sz(unsigned long t0sz)

				{

					unsigned long tcr = read_sysreg(tcr_el1);

					if ((tcr & TCR_T0SZ_MASK) == t0sz)

					if ((tcr & TCR_EL1_T0SZ_MASK) == t0sz)

						return;

					tcr &= ~TCR_T0SZ_MASK;

					tcr &= ~TCR_EL1_T0SZ_MASK;

					tcr |= t0sz;

					write_sysreg(tcr, tcr_el1);

					isb();

				}

				#define cpu_set_default_tcr_t0sz()	__cpu_set_tcr_t0sz(TCR_T0SZ(vabits_actual))

				#define cpu_set_idmap_tcr_t0sz()	__cpu_set_tcr_t0sz(idmap_t0sz)

				/*

				 * Remove the idmap from TTBR0_EL1 and install the pgd of the active mm.

				 *

				@@ -103,7 +95,7 @@ static inline void cpu_uninstall_idmap(void)

					cpu_set_reserved_ttbr0();

					local_flush_tlb_all();

					cpu_set_default_tcr_t0sz();

					__cpu_set_tcr_t0sz(TCR_T0SZ(vabits_actual));

					if (mm != &init_mm && !system_uses_ttbr0_pan())

						cpu_switch_mm(mm->pgd, mm);

				@@ -113,7 +105,7 @@ static inline void cpu_install_idmap(void)

				{

					cpu_set_reserved_ttbr0();

					local_flush_tlb_all();

					cpu_set_idmap_tcr_t0sz();

					__cpu_set_tcr_t0sz(TCR_T0SZ(IDMAP_VA_BITS));

					cpu_switch_mm(lm_alias(idmap_pg_dir), &init_mm);

				}

				@@ -330,6 +322,6 @@ static inline void deactivate_mm(struct task_struct *tsk,

				#include <asm-generic/mmu_context.h>

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif /* !__ASM_MMU_CONTEXT_H */
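
Note on the T0SZ hunks above: `TCR_T0SZ(x)` encodes the field as 64 minus the virtual address width, and `__cpu_set_tcr_t0sz()` is a read-modify-write that skips the register write when the field already holds the requested value. The sketch below models that logic in user space under stated assumptions: the shift (0) and field width (6) are taken from the removed open-coded definitions in pgtable-hwdef.h, while `model_tcr` and `model_set_t0sz()` are hypothetical stand-ins for the real system register accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Field position and width as in the removed TCR_T0SZ_OFFSET / TCR_TxSZ_WIDTH. */
#define T0SZ_SHIFT 0
#define TXSZ_WIDTH 6
#define T0SZ_MASK  (((UINT64_C(1) << TXSZ_WIDTH) - 1) << T0SZ_SHIFT)

/* Same arithmetic as TCR_T0SZ(x): the field stores 64 minus the VA width. */
#define T0SZ(va_bits) ((UINT64_C(64) - (va_bits)) << T0SZ_SHIFT)

/* Hypothetical stand-in for TCR_EL1; the kernel uses read_sysreg()/write_sysreg(). */
static uint64_t model_tcr;

/*
 * Mirrors the shape of __cpu_set_tcr_t0sz(): leave the register alone if
 * the field already matches, otherwise clear the field and set the new value.
 * Returns 1 if a write was performed, 0 if it was skipped.
 */
static int model_set_t0sz(uint64_t t0sz)
{
	if ((model_tcr & T0SZ_MASK) == t0sz)
		return 0;

	model_tcr &= ~T0SZ_MASK;
	model_tcr |= t0sz;
	return 1;
}

/* A 48-bit address space is encoded as 64 - 48 = 16. */
static_assert(T0SZ(48) == 16, "T0SZ encodes 64 minus the VA width");
```

In this model, calling `model_set_t0sz(T0SZ(48))` twice performs a write only the first time, and bits outside the 6-bit field are preserved. The real function additionally issues an `isb()` after the write, which this sketch does not model.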

									
										4

arch/arm64/include/asm/mte-kasan.h
									
												
				@@ -9,7 +9,7 @@

				#include <asm/cputype.h>

				#include <asm/mte-def.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/types.h>

				@@ -259,6 +259,6 @@ static inline int mte_enable_kernel_store_only(void)

				#endif /* CONFIG_ARM64_MTE */

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_MTE_KASAN_H  */

									
										4

arch/arm64/include/asm/mte.h
									
												
				@@ -8,7 +8,7 @@

				#include <asm/compiler.h>

				#include <asm/mte-def.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/bitfield.h>

				#include <linux/kasan-enabled.h>

				@@ -282,5 +282,5 @@ static inline void mte_check_tfsr_exit(void)

				}

				#endif /* CONFIG_KASAN_HW_TAGS */

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_MTE_H  */

									
										4

arch/arm64/include/asm/page.h
									
												
				@@ -10,7 +10,7 @@

				#include <asm/page-def.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/personality.h> /* for READ_IMPLIES_EXEC */

				#include <linux/types.h> /* for gfp_t */

				@@ -45,7 +45,7 @@ int pfn_is_map_memory(unsigned long pfn);

				#include <asm/memory.h>

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#define VM_DATA_DEFAULT_FLAGS	(VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)

									
										125

arch/arm64/include/asm/pgtable-hwdef.h
									
												
				@@ -228,102 +228,53 @@

				/*

				 * TCR flags.

				 */

				#define TCR_T0SZ_OFFSET		0

				#define TCR_T1SZ_OFFSET		16

				#define TCR_T0SZ(x)		((UL(64) - (x)) << TCR_T0SZ_OFFSET)

				#define TCR_T1SZ(x)		((UL(64) - (x)) << TCR_T1SZ_OFFSET)

				#define TCR_TxSZ(x)		(TCR_T0SZ(x) | TCR_T1SZ(x))

				#define TCR_TxSZ_WIDTH		6

				#define TCR_T0SZ_MASK		(((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T0SZ_OFFSET)

				#define TCR_T1SZ_MASK		(((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T1SZ_OFFSET)

				#define TCR_T0SZ(x)		((UL(64) - (x)) << TCR_EL1_T0SZ_SHIFT)

				#define TCR_T1SZ(x)		((UL(64) - (x)) << TCR_EL1_T1SZ_SHIFT)

				#define TCR_EPD0_SHIFT		7

				#define TCR_EPD0_MASK		(UL(1) << TCR_EPD0_SHIFT)

				#define TCR_IRGN0_SHIFT		8

				#define TCR_IRGN0_MASK		(UL(3) << TCR_IRGN0_SHIFT)

				#define TCR_IRGN0_NC		(UL(0) << TCR_IRGN0_SHIFT)

				#define TCR_IRGN0_WBWA		(UL(1) << TCR_IRGN0_SHIFT)

				#define TCR_IRGN0_WT		(UL(2) << TCR_IRGN0_SHIFT)

				#define TCR_IRGN0_WBnWA		(UL(3) << TCR_IRGN0_SHIFT)

				#define TCR_T0SZ_MASK		TCR_EL1_T0SZ_MASK

				#define TCR_T1SZ_MASK		TCR_EL1_T1SZ_MASK

				#define TCR_EPD1_SHIFT		23

				#define TCR_EPD1_MASK		(UL(1) << TCR_EPD1_SHIFT)

				#define TCR_IRGN1_SHIFT		24

				#define TCR_IRGN1_MASK		(UL(3) << TCR_IRGN1_SHIFT)

				#define TCR_IRGN1_NC		(UL(0) << TCR_IRGN1_SHIFT)

				#define TCR_IRGN1_WBWA		(UL(1) << TCR_IRGN1_SHIFT)

				#define TCR_IRGN1_WT		(UL(2) << TCR_IRGN1_SHIFT)

				#define TCR_IRGN1_WBnWA		(UL(3) << TCR_IRGN1_SHIFT)

				#define TCR_EPD0_MASK		TCR_EL1_EPD0_MASK

				#define TCR_EPD1_MASK		TCR_EL1_EPD1_MASK

				#define TCR_IRGN_NC		(TCR_IRGN0_NC | TCR_IRGN1_NC)

				#define TCR_IRGN_WBWA		(TCR_IRGN0_WBWA | TCR_IRGN1_WBWA)

				#define TCR_IRGN_WT		(TCR_IRGN0_WT | TCR_IRGN1_WT)

				#define TCR_IRGN_WBnWA		(TCR_IRGN0_WBnWA | TCR_IRGN1_WBnWA)

				#define TCR_IRGN_MASK		(TCR_IRGN0_MASK | TCR_IRGN1_MASK)

				#define TCR_IRGN0_MASK		TCR_EL1_IRGN0_MASK

				#define TCR_IRGN0_WBWA		(TCR_EL1_IRGN0_WBWA << TCR_EL1_IRGN0_SHIFT)

				#define TCR_ORGN0_MASK		TCR_EL1_ORGN0_MASK

				#define TCR_ORGN0_WBWA		(TCR_EL1_ORGN0_WBWA << TCR_EL1_ORGN0_SHIFT)

				#define TCR_ORGN0_SHIFT		10

				#define TCR_ORGN0_MASK		(UL(3) << TCR_ORGN0_SHIFT)

				#define TCR_ORGN0_NC		(UL(0) << TCR_ORGN0_SHIFT)

				#define TCR_ORGN0_WBWA		(UL(1) << TCR_ORGN0_SHIFT)

				#define TCR_ORGN0_WT		(UL(2) << TCR_ORGN0_SHIFT)

				#define TCR_ORGN0_WBnWA		(UL(3) << TCR_ORGN0_SHIFT)

				#define TCR_SH0_MASK		TCR_EL1_SH0_MASK

				#define TCR_SH0_INNER		(TCR_EL1_SH0_INNER << TCR_EL1_SH0_SHIFT)

				#define TCR_ORGN1_SHIFT		26

				#define TCR_ORGN1_MASK		(UL(3) << TCR_ORGN1_SHIFT)

				#define TCR_ORGN1_NC		(UL(0) << TCR_ORGN1_SHIFT)

				#define TCR_ORGN1_WBWA		(UL(1) << TCR_ORGN1_SHIFT)

				#define TCR_ORGN1_WT		(UL(2) << TCR_ORGN1_SHIFT)

				#define TCR_ORGN1_WBnWA		(UL(3) << TCR_ORGN1_SHIFT)

				#define TCR_SH1_MASK		TCR_EL1_SH1_MASK

				#define TCR_ORGN_NC		(TCR_ORGN0_NC | TCR_ORGN1_NC)

				#define TCR_ORGN_WBWA		(TCR_ORGN0_WBWA | TCR_ORGN1_WBWA)

				#define TCR_ORGN_WT		(TCR_ORGN0_WT | TCR_ORGN1_WT)

				#define TCR_ORGN_WBnWA		(TCR_ORGN0_WBnWA | TCR_ORGN1_WBnWA)

				#define TCR_ORGN_MASK		(TCR_ORGN0_MASK | TCR_ORGN1_MASK)

				#define TCR_TG0_SHIFT		TCR_EL1_TG0_SHIFT

				#define TCR_TG0_MASK		TCR_EL1_TG0_MASK

				#define TCR_TG0_4K		(TCR_EL1_TG0_4K << TCR_EL1_TG0_SHIFT)

				#define TCR_TG0_64K		(TCR_EL1_TG0_64K << TCR_EL1_TG0_SHIFT)

				#define TCR_TG0_16K		(TCR_EL1_TG0_16K << TCR_EL1_TG0_SHIFT)

				#define TCR_SH0_SHIFT		12

				#define TCR_SH0_MASK		(UL(3) << TCR_SH0_SHIFT)

				#define TCR_SH0_INNER		(UL(3) << TCR_SH0_SHIFT)

				#define TCR_TG1_SHIFT		TCR_EL1_TG1_SHIFT

				#define TCR_TG1_MASK		TCR_EL1_TG1_MASK

				#define TCR_TG1_16K		(TCR_EL1_TG1_16K << TCR_EL1_TG1_SHIFT)

				#define TCR_TG1_4K		(TCR_EL1_TG1_4K << TCR_EL1_TG1_SHIFT)

				#define TCR_TG1_64K		(TCR_EL1_TG1_64K << TCR_EL1_TG1_SHIFT)

				#define TCR_SH1_SHIFT		28

				#define TCR_SH1_MASK		(UL(3) << TCR_SH1_SHIFT)

				#define TCR_SH1_INNER		(UL(3) << TCR_SH1_SHIFT)

				#define TCR_SHARED		(TCR_SH0_INNER | TCR_SH1_INNER)

				#define TCR_TG0_SHIFT		14

				#define TCR_TG0_MASK		(UL(3) << TCR_TG0_SHIFT)

				#define TCR_TG0_4K		(UL(0) << TCR_TG0_SHIFT)

				#define TCR_TG0_64K		(UL(1) << TCR_TG0_SHIFT)

				#define TCR_TG0_16K		(UL(2) << TCR_TG0_SHIFT)

				#define TCR_TG1_SHIFT		30

				#define TCR_TG1_MASK		(UL(3) << TCR_TG1_SHIFT)

				#define TCR_TG1_16K		(UL(1) << TCR_TG1_SHIFT)

				#define TCR_TG1_4K		(UL(2) << TCR_TG1_SHIFT)

				#define TCR_TG1_64K		(UL(3) << TCR_TG1_SHIFT)

				#define TCR_IPS_SHIFT		32

				#define TCR_IPS_MASK		(UL(7) << TCR_IPS_SHIFT)

				#define TCR_A1			(UL(1) << 22)

				#define TCR_ASID16		(UL(1) << 36)

				#define TCR_TBI0		(UL(1) << 37)

				#define TCR_TBI1		(UL(1) << 38)

				#define TCR_HA			(UL(1) << 39)

				#define TCR_HD			(UL(1) << 40)

				#define TCR_HPD0_SHIFT		41

				#define TCR_HPD0		(UL(1) << TCR_HPD0_SHIFT)

				#define TCR_HPD1_SHIFT		42

				#define TCR_HPD1		(UL(1) << TCR_HPD1_SHIFT)

				#define TCR_TBID0		(UL(1) << 51)

				#define TCR_TBID1		(UL(1) << 52)

				#define TCR_NFD0		(UL(1) << 53)

				#define TCR_NFD1		(UL(1) << 54)

				#define TCR_E0PD0		(UL(1) << 55)

				#define TCR_E0PD1		(UL(1) << 56)

				#define TCR_TCMA0		(UL(1) << 57)

				#define TCR_TCMA1		(UL(1) << 58)

				#define TCR_DS			(UL(1) << 59)

				#define TCR_IPS_SHIFT		TCR_EL1_IPS_SHIFT

				#define TCR_IPS_MASK		TCR_EL1_IPS_MASK

				#define TCR_A1			TCR_EL1_A1

				#define TCR_ASID16		TCR_EL1_AS

				#define TCR_TBI0		TCR_EL1_TBI0

				#define TCR_TBI1		TCR_EL1_TBI1

				#define TCR_HA			TCR_EL1_HA

				#define TCR_HD			TCR_EL1_HD

				#define TCR_HPD0		TCR_EL1_HPD0

				#define TCR_HPD1		TCR_EL1_HPD1

				#define TCR_TBID0		TCR_EL1_TBID0

				#define TCR_TBID1		TCR_EL1_TBID1

				#define TCR_E0PD0		TCR_EL1_E0PD0

				#define TCR_E0PD1		TCR_EL1_E0PD1

				#define TCR_DS			TCR_EL1_DS

				/*

				 * TTBR.

									
										6

arch/arm64/include/asm/pgtable-prot.h
									
												
				@@ -62,7 +62,7 @@

				#define _PAGE_READONLY_EXEC	(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN)

				#define _PAGE_EXECONLY		(_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/cpufeature.h>

				#include <asm/pgtable-types.h>

				@@ -84,7 +84,7 @@ extern unsigned long prot_ns_shared;

				#else

				static inline bool __pure lpa2_is_enabled(void)

				{

					return read_tcr() & TCR_DS;

					return read_tcr() & TCR_EL1_DS;

				}

				#define PTE_MAYBE_SHARED	(lpa2_is_enabled() ? 0 : PTE_SHARED)

				@@ -127,7 +127,7 @@ static inline bool __pure lpa2_is_enabled(void)

				#define PAGE_READONLY_EXEC	__pgprot(_PAGE_READONLY_EXEC)

				#define PAGE_EXECONLY		__pgprot(_PAGE_EXECONLY)

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#define pte_pi_index(pte) ( \

					((pte & BIT(PTE_PI_IDX_3)) >> (PTE_PI_IDX_3 - 3)) | \

									
										22

arch/arm64/include/asm/pgtable.h
									
												
				@@ -30,7 +30,7 @@

				#define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/cmpxchg.h>

				#include <asm/fixmap.h>

				@@ -130,12 +130,16 @@ static inline void arch_leave_lazy_mmu_mode(void)

				#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

				/*

				 * Outside of a few very special situations (e.g. hibernation), we always

				 * use broadcast TLB invalidation instructions, therefore a spurious page

				 * fault on one CPU which has been handled concurrently by another CPU

				 * does not need to perform additional invalidation.

				 * We use local TLB invalidation instruction when reusing page in

				 * write protection fault handler to avoid TLBI broadcast in the hot

				 * path.  This will cause spurious page faults if stale read-only TLB

				 * entries exist.

				 */

				#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)

				#define flush_tlb_fix_spurious_fault(vma, address, ptep)	\

					local_flush_tlb_page_nonotify(vma, address)

				#define flush_tlb_fix_spurious_fault_pmd(vma, address, pmdp)	\

					local_flush_tlb_page_nonotify(vma, address)

				/*

				 * ZERO_PAGE is a global shared page that is always zero: used

				@@ -433,7 +437,7 @@ bool pgattr_change_is_safe(pteval_t old, pteval_t new);

				 *   1      0      |   1           0          1

				 *   1      1      |   0           1          x

				 *

				 * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via

				 * When hardware DBM is not present, the software PTE_DIRTY bit is updated via

				 * the page fault mechanism. Checking the dirty status of a pte becomes:

				 *

				 *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)

				@@ -599,7 +603,7 @@ static inline int pte_protnone(pte_t pte)

					/*

					 * pte_present_invalid() tells us that the pte is invalid from HW

					 * perspective but present from SW perspective, so the fields are to be

					 * interpretted as per the HW layout. The second 2 checks are the unique

					 * interpreted as per the HW layout. The second 2 checks are the unique

					 * encoding that we use for PROT_NONE. It is insufficient to only use

					 * the first check because we share the same encoding scheme with pmds

					 * which support pmd_mkinvalid(), so can be present-invalid without

				@@ -1949,6 +1953,6 @@ static inline void clear_young_dirty_ptes(struct vm_area_struct *vma,

				#endif /* CONFIG_ARM64_CONTPTE */

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif /* __ASM_PGTABLE_H */

									
										4

arch/arm64/include/asm/proc-fns.h
									
												
				@@ -9,7 +9,7 @@

				#ifndef __ASM_PROCFNS_H

				#define __ASM_PROCFNS_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/page.h>

				@@ -21,5 +21,5 @@ extern u64 cpu_do_resume(phys_addr_t ptr, u64 idmap_ttbr);

				#include <asm/memory.h>

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_PROCFNS_H */

									
										4

arch/arm64/include/asm/processor.h
									
												
				@@ -25,7 +25,7 @@

				#define MTE_CTRL_STORE_ONLY		(1UL << 19)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/build_bug.h>

				#include <linux/cache.h>

				@@ -437,5 +437,5 @@ int set_tsc_mode(unsigned int val);

				#define GET_TSC_CTL(adr)        get_tsc_mode((adr))

				#define SET_TSC_CTL(val)        set_tsc_mode((val))

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_PROCESSOR_H */

									
										4

arch/arm64/include/asm/ptrace.h
									
												
				@@ -94,7 +94,7 @@

				 */

				#define NO_SYSCALL (-1)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/bug.h>

				#include <linux/types.h>

				@@ -361,5 +361,5 @@ static inline void procedure_link_pointer_set(struct pt_regs *regs,

				extern unsigned long profile_pc(struct pt_regs *regs);

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif

									
										4

arch/arm64/include/asm/rsi_smc.h
									
												
				@@ -122,7 +122,7 @@

				 */

				#define SMC_RSI_ATTESTATION_TOKEN_CONTINUE	SMC_RSI_FID(0x195)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				struct realm_config {

					union {

				@@ -142,7 +142,7 @@ struct realm_config {

					 */

				} __aligned(0x1000);

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				/*

				 * Read configuration for the current Realm.

									
										4

arch/arm64/include/asm/rwonce.h
									
												
				@@ -5,7 +5,7 @@

				#ifndef __ASM_RWONCE_H

				#define __ASM_RWONCE_H

				#if defined(CONFIG_LTO) && !defined(__ASSEMBLY__)

				#if defined(CONFIG_LTO) && !defined(__ASSEMBLER__)

				#include <linux/compiler_types.h>

				#include <asm/alternative-macros.h>

				@@ -62,7 +62,7 @@

				})

				#endif	/* !BUILD_VDSO */

				#endif	/* CONFIG_LTO && !__ASSEMBLY__ */

				#endif	/* CONFIG_LTO && !__ASSEMBLER__ */

				#include <asm-generic/rwonce.h>

									
										4

arch/arm64/include/asm/scs.h
									
												
				@@ -2,7 +2,7 @@

				#ifndef _ASM_SCS_H

				#define _ASM_SCS_H

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

				#include <asm/asm-offsets.h>

				#include <asm/sysreg.h>

				@@ -55,6 +55,6 @@ enum {

				int __pi_scs_patch(const u8 eh_frame[], int size, bool skip_dry_run);

				#endif /* __ASSEMBLY __ */

				#endif /* __ASSEMBLER__ */

				#endif /* _ASM_SCS_H */

									
										4

arch/arm64/include/asm/sdei.h
									
												
				@@ -9,7 +9,7 @@

				#define SDEI_STACK_SIZE		IRQ_STACK_SIZE

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/linkage.h>

				#include <linux/preempt.h>

				@@ -49,5 +49,5 @@ unsigned long do_sdei_event(struct pt_regs *regs,

				unsigned long sdei_arch_get_entry_point(int conduit);

				#define sdei_arch_get_entry_point(x)	sdei_arch_get_entry_point(x)

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif	/* __ASM_SDEI_H */

									
										2

arch/arm64/include/asm/simd.h
									
												
				@@ -29,7 +29,7 @@ static __must_check inline bool may_use_simd(void)

					 */

					return !WARN_ON(!system_capabilities_finalized()) &&

					       system_supports_fpsimd() &&

					       !in_hardirq() && !irqs_disabled() && !in_nmi();

					       !in_hardirq() && !in_nmi();

				}

				#else /* ! CONFIG_KERNEL_MODE_NEON */

									
										4

arch/arm64/include/asm/smp.h
									
												
				@@ -23,7 +23,7 @@

				#define CPU_STUCK_REASON_52_BIT_VA	(UL(1) << CPU_STUCK_REASON_SHIFT)

				#define CPU_STUCK_REASON_NO_GRAN	(UL(2) << CPU_STUCK_REASON_SHIFT)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/threads.h>

				#include <linux/cpumask.h>

				@@ -155,6 +155,6 @@ bool cpus_are_stuck_in_kernel(void);

				extern void crash_smp_send_stop(void);

				extern bool smp_crash_stop_failed(void);

				#endif /* ifndef __ASSEMBLY__ */

				#endif /* ifndef __ASSEMBLER__ */

				#endif /* ifndef __ASM_SMP_H */

									
										4

arch/arm64/include/asm/spectre.h
									
												
				@@ -12,7 +12,7 @@

				#define BP_HARDEN_EL2_SLOTS 4

				#define __BP_HARDEN_HYP_VECS_SZ	((BP_HARDEN_EL2_SLOTS - 1) * SZ_2K)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/smp.h>

				#include <asm/percpu.h>

				@@ -119,5 +119,5 @@ void spectre_bhb_patch_clearbhb(struct alt_instr *alt,

								__le32 *origptr, __le32 *updptr, int nr_inst);

				void spectre_print_disabled_mitigations(void);

				#endif	/* __ASSEMBLY__ */

				#endif	/* __ASSEMBLER__ */

				#endif	/* __ASM_SPECTRE_H */

									
										4

arch/arm64/include/asm/stacktrace/frame.h
									
												
				@@ -25,7 +25,7 @@

				#define FRAME_META_TYPE_FINAL		1

				#define FRAME_META_TYPE_PT_REGS		2

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				/* 

				 * A standard AAPCS64 frame record.

				 */

				@@ -43,6 +43,6 @@ struct frame_record_meta {

					struct frame_record record;

					u64 type;

				};

				#endif /* __ASSEMBLY */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_STACKTRACE_FRAME_H */

									
										2

arch/arm64/include/asm/suspend.h
									
												
				@@ -23,7 +23,7 @@ struct cpu_suspend_ctx {

				 * __cpu_suspend_enter()'s caller, and populated by __cpu_suspend_enter().

				 * This data must survive until cpu_resume() is called.

				 *

				 * This struct desribes the size and the layout of the saved cpu state.

				 * This struct describes the size and the layout of the saved cpu state.

				 * The layout of the callee_saved_regs is defined by the implementation

				 * of __cpu_suspend_enter(), and cpu_resume(). This struct must be passed

				 * in by the caller as __cpu_suspend_enter()'s stack-frame is gone once it

									
										12

arch/arm64/include/asm/sysreg.h
									
												
				@@ -52,7 +52,7 @@

				#ifndef CONFIG_BROKEN_GAS_INST

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

				// The space separator is omitted so that __emit_inst(x) can be parsed as

				// either an assembler directive or an assembler macro argument.

				#define __emit_inst(x)			.inst(x)

				@@ -71,11 +71,11 @@

									 (((x) >> 24) & 0x000000ff))

				#endif	/* CONFIG_CPU_BIG_ENDIAN */

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

				#define __emit_inst(x)			.long __INSTR_BSWAP(x)

				#else  /* __ASSEMBLY__ */

				#else  /* __ASSEMBLER__ */

				#define __emit_inst(x)			".long " __stringify(__INSTR_BSWAP(x)) "\n\t"

				#endif	/* __ASSEMBLY__ */

				#endif	/* __ASSEMBLER__ */

				#endif	/* CONFIG_BROKEN_GAS_INST */

				@@ -1129,9 +1129,7 @@

				#define gicr_insn(insn)			read_sysreg_s(GICV5_OP_GICR_##insn)

				#define gic_insn(v, insn)		write_sysreg_s(v, GICV5_OP_GIC_##insn)

				#define ARM64_FEATURE_FIELD_BITS	4

				#ifdef __ASSEMBLY__

				#ifdef __ASSEMBLER__

					.macro	mrs_s, rt, sreg

					 __emit_inst(0xd5200000|(\sreg)|(.L__gpr_num_\rt))
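The sysreg.h hunks above touch two ideas: `mrs_s` builds an instruction word by OR-ing a register encoding and a GPR number into a base opcode, and on big-endian builds `__emit_inst()` byte-swaps that word because A64 instructions are stored little-endian. The sketch below illustrates both in plain C. The `demo_*` helpers are hypothetical, and the byte-swap body is an assumption modelled on the one macro line visible in the hunk, not a copy of the kernel's `__INSTR_BSWAP()`.

```c
#include <assert.h>
#include <stdint.h>

/* Base opcode taken from the mrs_s macro line shown in the hunk. */
#define DEMO_MRS_BASE	0xd5200000u

/* Reverse the byte order of a 32-bit instruction word. */
static inline uint32_t demo_instr_bswap(uint32_t x)
{
	return ((x << 24) & 0xff000000u) |
	       ((x <<  8) & 0x00ff0000u) |
	       ((x >>  8) & 0x0000ff00u) |
	       ((x >> 24) & 0x000000ffu);
}

/* Combine a pre-shifted system register encoding with a GPR number. */
static inline uint32_t demo_encode_mrs_s(uint32_t sreg_enc, unsigned int rt)
{
	assert(rt < 32);	/* the Rt field is five bits wide */
	return DEMO_MRS_BASE | sreg_enc | rt;
}
```

Swapping twice returns the original word, which is a quick sanity check when adapting a helper like this.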

									
										4

arch/arm64/include/asm/system_misc.h
									
												
				@@ -7,7 +7,7 @@

				#ifndef __ASM_SYSTEM_MISC_H

				#define __ASM_SYSTEM_MISC_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/compiler.h>

				#include <linux/linkage.h>

				@@ -28,6 +28,6 @@ void arm64_notify_die(const char *str, struct pt_regs *regs,

				struct mm_struct;

				extern void __show_regs(struct pt_regs *);

				#endif	/* __ASSEMBLY__ */

				#endif	/* __ASSEMBLER__ */

				#endif	/* __ASM_SYSTEM_MISC_H */

									
										2

arch/arm64/include/asm/thread_info.h
									
												
				@@ -10,7 +10,7 @@

				#include <linux/compiler.h>

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				struct task_struct;

									
										85

arch/arm64/include/asm/tlbflush.h
									
												
				@@ -8,7 +8,7 @@

				#ifndef __ASM_TLBFLUSH_H

				#define __ASM_TLBFLUSH_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/bitfield.h>

				#include <linux/mm_types.h>

				@@ -249,6 +249,19 @@ static inline unsigned long get_trans_granule(void)

				 *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will

				 *		perform a non-hinted invalidation.

				 *

				 *	local_flush_tlb_page(vma, addr)

				 *		Local variant of flush_tlb_page().  Stale TLB entries may

				 *		remain in remote CPUs.

				 *

				 *	local_flush_tlb_page_nonotify(vma, addr)

				 *		Same as local_flush_tlb_page() except MMU notifier will not be

				 *		called.

				 *

				 *	local_flush_tlb_contpte(vma, addr)

				 *		Invalidate the virtual-address range

				 *		'[addr, addr+CONT_PTE_SIZE)' mapped with contpte on local CPU

				 *		for the user address space corresponding to 'vma->mm'.  Stale

				 *		TLB entries may remain in remote CPUs.

				 *

				 *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented

				 *	on top of these routines, since that is our interface to the mmu_gather

				@@ -282,6 +295,33 @@ static inline void flush_tlb_mm(struct mm_struct *mm)

					mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);

				}

				static inline void __local_flush_tlb_page_nonotify_nosync(struct mm_struct *mm,

											  unsigned long uaddr)

				{

					unsigned long addr;

					dsb(nshst);

					addr = __TLBI_VADDR(uaddr, ASID(mm));

					__tlbi(vale1, addr);

					__tlbi_user(vale1, addr);

				}

				static inline void local_flush_tlb_page_nonotify(struct vm_area_struct *vma,

										 unsigned long uaddr)

				{

					__local_flush_tlb_page_nonotify_nosync(vma->vm_mm, uaddr);

					dsb(nsh);

				}

				static inline void local_flush_tlb_page(struct vm_area_struct *vma,

									unsigned long uaddr)

				{

					__local_flush_tlb_page_nonotify_nosync(vma->vm_mm, uaddr);

					mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, uaddr & PAGE_MASK,

										(uaddr & PAGE_MASK) + PAGE_SIZE);

					dsb(nsh);

				}

				static inline void __flush_tlb_page_nosync(struct mm_struct *mm,

									   unsigned long uaddr)

				{

				@@ -472,6 +512,22 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,

					dsb(ish);

				}

				static inline void local_flush_tlb_contpte(struct vm_area_struct *vma,

									   unsigned long addr)

				{

					unsigned long asid;

					addr = round_down(addr, CONT_PTE_SIZE);

					dsb(nshst);

					asid = ASID(vma->vm_mm);

					__flush_tlb_range_op(vale1, addr, CONT_PTES, PAGE_SIZE, asid,

							     3, true, lpa2_is_enabled());

					mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, addr,

										    addr + CONT_PTE_SIZE);

					dsb(nsh);

				}

				static inline void flush_tlb_range(struct vm_area_struct *vma,

								   unsigned long start, unsigned long end)

				{

				@@ -524,6 +580,33 @@ static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *b

				{

					__flush_tlb_range_nosync(mm, start, end, PAGE_SIZE, true, 3);

				}

				static inline bool __pte_flags_need_flush(ptdesc_t oldval, ptdesc_t newval)

				{

					ptdesc_t diff = oldval ^ newval;

					/* invalid to valid transition requires no flush */

					if (!(oldval & PTE_VALID))

						return false;

					/* Transition in the SW bits requires no flush */

					diff &= ~PTE_SWBITS_MASK;

					return diff;

				}

				static inline bool pte_needs_flush(pte_t oldpte, pte_t newpte)

				{

					return __pte_flags_need_flush(pte_val(oldpte), pte_val(newpte));

				}

				#define pte_needs_flush pte_needs_flush

				static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)

				{

					return __pte_flags_need_flush(pmd_val(oldpmd), pmd_val(newpmd));

				}

				#define huge_pmd_needs_flush huge_pmd_needs_flush

				#endif

				#endif
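The new `__pte_flags_need_flush()` helper in the hunk above reduces to two rules: an old entry that was not valid never needs a flush, and a change confined to software-defined bits never needs one either. A user-space sketch of that decision is shown below. The bit positions are placeholders chosen for illustration; the real `PTE_VALID` and `PTE_SWBITS_MASK` come from the arm64 page-table headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_ptdesc_t;

/* Hypothetical layout, for illustration only. */
#define DEMO_PTE_VALID		(1ULL << 0)
#define DEMO_PTE_SWBITS_MASK	(0xfULL << 55)

static_assert((DEMO_PTE_VALID & DEMO_PTE_SWBITS_MASK) == 0,
	      "the valid bit must not be a software bit");

static inline bool demo_flags_need_flush(demo_ptdesc_t oldval,
					 demo_ptdesc_t newval)
{
	demo_ptdesc_t diff = oldval ^ newval;

	/* Invalid to valid transition: nothing to invalidate. */
	if (!(oldval & DEMO_PTE_VALID))
		return false;

	/* Ignore differences that only touch software bits. */
	diff &= ~DEMO_PTE_SWBITS_MASK;

	return diff != 0;
}
```

Both `pte_needs_flush()` and `huge_pmd_needs_flush()` in the diff feed raw descriptor values into the same check, so one helper of this shape can serve both levels.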

									
										4

arch/arm64/include/asm/uaccess.h
									
												
				@@ -422,9 +422,9 @@ static __must_check __always_inline bool user_access_begin(const void __user *pt

				}

				#define user_access_begin(a,b)	user_access_begin(a,b)

				#define user_access_end()	uaccess_ttbr0_disable()

				#define unsafe_put_user(x, ptr, label) \

				#define arch_unsafe_put_user(x, ptr, label) \

					__raw_put_mem("sttr", x, uaccess_mask_ptr(ptr), label, U)

				#define unsafe_get_user(x, ptr, label) \

				#define arch_unsafe_get_user(x, ptr, label) \

					__raw_get_mem("ldtr", x, uaccess_mask_ptr(ptr), label, U)

				/*

									
										4

arch/arm64/include/asm/vdso.h
									
												
				@@ -7,7 +7,7 @@

				#define __VDSO_PAGES    4

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <generated/vdso-offsets.h>

				@@ -19,6 +19,6 @@

				extern char vdso_start[], vdso_end[];

				extern char vdso32_start[], vdso32_end[];

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif /* __ASM_VDSO_H */

									
										4

arch/arm64/include/asm/vdso/compat_barrier.h
									
												
				@@ -5,7 +5,7 @@

				#ifndef __COMPAT_BARRIER_H

				#define __COMPAT_BARRIER_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				/*

				 * Warning: This code is meant to be used from the compat vDSO only.

				 */

				@@ -31,6 +31,6 @@

				#define smp_rmb()	aarch32_smp_rmb()

				#define smp_wmb()	aarch32_smp_wmb()

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif /* __COMPAT_BARRIER_H */

									
										4

arch/arm64/include/asm/vdso/compat_gettimeofday.h
									
												
				@@ -5,7 +5,7 @@

				#ifndef __ASM_VDSO_COMPAT_GETTIMEOFDAY_H

				#define __ASM_VDSO_COMPAT_GETTIMEOFDAY_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/barrier.h>

				#include <asm/unistd_compat_32.h>

				@@ -161,6 +161,6 @@ static inline bool vdso_clocksource_ok(const struct vdso_clock *vc)

				}

				#define vdso_clocksource_ok	vdso_clocksource_ok

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif /* __ASM_VDSO_COMPAT_GETTIMEOFDAY_H */

									
										4

arch/arm64/include/asm/vdso/getrandom.h
									
												
				@@ -3,7 +3,7 @@

				#ifndef __ASM_VDSO_GETRANDOM_H

				#define __ASM_VDSO_GETRANDOM_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/unistd.h>

				#include <asm/vdso/vsyscall.h>

				@@ -33,6 +33,6 @@ static __always_inline ssize_t getrandom_syscall(void *_buffer, size_t _len, uns

					return ret;

				}

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif /* __ASM_VDSO_GETRANDOM_H */

									
										4

arch/arm64/include/asm/vdso/gettimeofday.h
									
												
				@@ -7,7 +7,7 @@

				#ifdef __aarch64__

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/alternative.h>

				#include <asm/arch_timer.h>

				@@ -96,7 +96,7 @@ static __always_inline const struct vdso_time_data *__arch_get_vdso_u_time_data(

				#define __arch_get_vdso_u_time_data __arch_get_vdso_u_time_data

				#endif /* IS_ENABLED(CONFIG_CC_IS_GCC) && IS_ENABLED(CONFIG_PAGE_SIZE_64KB) */

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#else /* !__aarch64__ */

									
										4

arch/arm64/include/asm/vdso/processor.h
									
												
				@@ -5,13 +5,13 @@

				#ifndef __ASM_VDSO_PROCESSOR_H

				#define __ASM_VDSO_PROCESSOR_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				static inline void cpu_relax(void)

				{

					asm volatile("yield" ::: "memory");

				}

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* __ASM_VDSO_PROCESSOR_H */

									
										4

arch/arm64/include/asm/vdso/vsyscall.h
									
												
				@@ -2,7 +2,7 @@

				#ifndef __ASM_VDSO_VSYSCALL_H

				#define __ASM_VDSO_VSYSCALL_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <vdso/datapage.h>

				@@ -22,6 +22,6 @@ void __arch_update_vdso_clock(struct vdso_clock *vc)

				/* The asm-generic header needs to be included after the definitions above */

				#include <asm-generic/vdso/vsyscall.h>

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#endif /* __ASM_VDSO_VSYSCALL_H */

									
										4

arch/arm64/include/asm/virt.h
									
												
				@@ -56,7 +56,7 @@

				 */

				#define BOOT_CPU_FLAG_E2H	BIT_ULL(32)

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <asm/ptrace.h>

				#include <asm/sections.h>

				@@ -161,6 +161,6 @@ static inline bool is_hyp_nvhe(void)

					return is_hyp_mode_available() && !is_kernel_in_hyp_mode();

				}

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* ! __ASM__VIRT_H */

									
										4

arch/arm64/include/asm/vmap_stack.h
									
												
				@@ -3,9 +3,7 @@

				#ifndef __ASM_VMAP_STACK_H

				#define __ASM_VMAP_STACK_H

				#include <linux/bug.h>

				#include <linux/gfp.h>

				#include <linux/kconfig.h>

				#include <linux/vmalloc.h>

				#include <linux/pgtable.h>

				#include <asm/memory.h>

				@@ -19,8 +17,6 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node)

				{

					void *p;

					BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));

					p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,

							__builtin_return_address(0));

					return kasan_reset_tag(p);

									
										2

arch/arm64/include/uapi/asm/kvm.h
									
												
				@@ -31,7 +31,7 @@

				#define KVM_SPSR_FIQ	4

				#define KVM_NR_SPSR	5

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/psci.h>

				#include <linux/types.h>

				#include <asm/ptrace.h>

									
										4

arch/arm64/include/uapi/asm/ptrace.h
									
												
				@@ -80,7 +80,7 @@

				#define PTRACE_PEEKMTETAGS	  33

				#define PTRACE_POKEMTETAGS	  34

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				/*

				 * User structures for general purpose, floating point and debug registers.

				@@ -332,6 +332,6 @@ struct user_gcs {

					__u64 gcspr_el0;

				};

				#endif /* __ASSEMBLY__ */

				#endif /* __ASSEMBLER__ */

				#endif /* _UAPI__ASM_PTRACE_H */

									
										4

arch/arm64/include/uapi/asm/sigcontext.h
									
												
				@@ -17,7 +17,7 @@

				#ifndef _UAPI__ASM_SIGCONTEXT_H

				#define _UAPI__ASM_SIGCONTEXT_H

				#ifndef __ASSEMBLY__

				#ifndef __ASSEMBLER__

				#include <linux/types.h>

				@@ -192,7 +192,7 @@ struct gcs_context {

					__u64 reserved;

				};

				#endif /* !__ASSEMBLY__ */

				#endif /* !__ASSEMBLER__ */

				#include <asm/sve_context.h>

									
										4

arch/arm64/kernel/acpi.c
									
												
				@@ -133,7 +133,7 @@ static int __init acpi_fadt_sanity_check(void)

					/*

					 * FADT is required on arm64; retrieve it to check its presence

					 * and carry out revision and ACPI HW reduced compliancy tests

					 * and carry out revision and ACPI HW reduced compliance tests

					 */

					status = acpi_get_table(ACPI_SIG_FADT, 0, &table);

					if (ACPI_FAILURE(status)) {

				@@ -423,7 +423,7 @@ int apei_claim_sea(struct pt_regs *regs)

							irq_work_run();

							__irq_exit();

						} else {

							pr_warn_ratelimited("APEI work queued but not completed");

							pr_warn_ratelimited("APEI work queued but not completed\n");

							err = -EINPROGRESS;

						}

					}

									
										6

arch/arm64/kernel/cpufeature.c
									
												
				@@ -1003,7 +1003,7 @@ static void __init sort_ftr_regs(void)

				/*

				 * Initialise the CPU feature register from Boot CPU values.

				 * Also initiliases the strict_mask for the register.

				 * Also initialises the strict_mask for the register.

				 * Any bits that are not covered by an arm64_ftr_bits entry are considered

				 * RES0 for the system-wide value, and must strictly match.

				 */

				@@ -1970,7 +1970,7 @@ static struct cpumask dbm_cpus __read_mostly;

				static inline void __cpu_enable_hw_dbm(void)

				{

					u64 tcr = read_sysreg(tcr_el1) | TCR_HD;

					u64 tcr = read_sysreg(tcr_el1) | TCR_EL1_HD;

					write_sysreg(tcr, tcr_el1);

					isb();

				@@ -2256,7 +2256,7 @@ static bool has_generic_auth(const struct arm64_cpu_capabilities *entry,

				static void cpu_enable_e0pd(struct arm64_cpu_capabilities const *cap)

				{

					if (this_cpu_has_cap(ARM64_HAS_E0PD))

						sysreg_clear_set(tcr_el1, 0, TCR_E0PD1);

						sysreg_clear_set(tcr_el1, 0, TCR_EL1_E0PD1);

				}

				#endif /* CONFIG_ARM64_E0PD */

									
										46

arch/arm64/kernel/efi.c
									
												
				@@ -10,6 +10,7 @@

				#include <linux/efi.h>

				#include <linux/init.h>

				#include <linux/kmemleak.h>

				#include <linux/kthread.h>

				#include <linux/screen_info.h>

				#include <linux/vmalloc.h>

				@@ -165,20 +166,53 @@ asmlinkage efi_status_t efi_handle_corrupted_x18(efi_status_t s, const char *f)

					return s;

				}

				static DEFINE_RAW_SPINLOCK(efi_rt_lock);

				void arch_efi_call_virt_setup(void)

				{

					efi_virtmap_load();

					raw_spin_lock(&efi_rt_lock);

					efi_runtime_assert_lock_held();

					if (preemptible() && (current->flags & PF_KTHREAD)) {

						/*

						 * Disable migration to ensure that a preempted EFI runtime

						 * service call will be resumed on the same CPU. This avoids

						 * potential issues with EFI runtime calls that are preempted

						 * while polling for an asynchronous completion of a secure

						 * firmware call, which may not permit the CPU to change.

						 */

						migrate_disable();

						kthread_use_mm(&efi_mm);

					} else {

						efi_virtmap_load();

					}

					/*

					 * Enable access to the valid TTBR0_EL1 and invoke the errata

					 * workaround directly since there is no return from exception when

					 * invoking the EFI run-time services.

					 */

					uaccess_ttbr0_enable();

					post_ttbr_update_workaround();

					__efi_fpsimd_begin();

				}

				void arch_efi_call_virt_teardown(void)

				{

					__efi_fpsimd_end();

					raw_spin_unlock(&efi_rt_lock);

					efi_virtmap_unload();

					/*

					 * Defer the switch to the current thread's TTBR0_EL1 until

					 * uaccess_enable(). Do so before efi_virtmap_unload() updates the

					 * saved TTBR0 value, so the userland page tables are not activated

					 * inadvertently over the back of an exception.

					 */

					uaccess_ttbr0_disable();

					if (preemptible() && (current->flags & PF_KTHREAD)) {

						kthread_unuse_mm(&efi_mm);

						migrate_enable();

					} else {

						efi_virtmap_unload();

					}

				}

				asmlinkage u64 *efi_rt_stack_top __ro_after_init;

									
										30

arch/arm64/kernel/entry-common.c
									
												
				@@ -34,20 +34,12 @@

				 * Handle IRQ/context state management when entering from kernel mode.

				 * Before this function is called it is not safe to call regular kernel code,

				 * instrumentable code, or any code which may trigger an exception.

				 *

				 * This is intended to match the logic in irqentry_enter(), handling the kernel

				 * mode transitions only.

				 */

				static __always_inline irqentry_state_t __enter_from_kernel_mode(struct pt_regs *regs)

				{

					return irqentry_enter(regs);

				}

				static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)

				{

					irqentry_state_t state;

					state = __enter_from_kernel_mode(regs);

					state = irqentry_enter(regs);

					mte_check_tfsr_entry();

					mte_disable_tco_entry(current);

				@@ -58,21 +50,12 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)

				 * Handle IRQ/context state management when exiting to kernel mode.

				 * After this function returns it is not safe to call regular kernel code,

				 * instrumentable code, or any code which may trigger an exception.

				 *

				 * This is intended to match the logic in irqentry_exit(), handling the kernel

				 * mode transitions only, and with preemption handled elsewhere.

				 */

				static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,

										  irqentry_state_t state)

				{

					irqentry_exit(regs, state);

				}

				static void noinstr exit_to_kernel_mode(struct pt_regs *regs,

									irqentry_state_t state)

				{

					mte_check_tfsr_exit();

					__exit_to_kernel_mode(regs, state);

					irqentry_exit(regs, state);

				}

				/*

				@@ -80,17 +63,12 @@ static void noinstr exit_to_kernel_mode(struct pt_regs *regs,

				 * Before this function is called it is not safe to call regular kernel code,

				 * instrumentable code, or any code which may trigger an exception.

				 */

				static __always_inline void __enter_from_user_mode(struct pt_regs *regs)

				static __always_inline void arm64_enter_from_user_mode(struct pt_regs *regs)

				{

					enter_from_user_mode(regs);

					mte_disable_tco_entry(current);

				}

				static __always_inline void arm64_enter_from_user_mode(struct pt_regs *regs)

				{

					__enter_from_user_mode(regs);

				}

				/*

				 * Handle IRQ/context state management when exiting to user mode.

				 * After this function returns it is not safe to call regular kernel code,

				@@ -100,7 +78,7 @@ static __always_inline void arm64_enter_from_user_mode(struct pt_regs *regs)

				static __always_inline void arm64_exit_to_user_mode(struct pt_regs *regs)

				{

					local_irq_disable();

					exit_to_user_mode_prepare(regs);

					exit_to_user_mode_prepare_legacy(regs);

					local_daif_mask();

					mte_check_tfsr_exit();

					exit_to_user_mode();

									
										2

arch/arm64/kernel/entry-ftrace.S
									
												
				@@ -94,7 +94,7 @@ SYM_CODE_START(ftrace_caller)

					stp	x29, x30, [sp, #FREGS_SIZE]

					add	x29, sp, #FREGS_SIZE

					/* Prepare arguments for the the tracer func */

					/* Prepare arguments for the tracer func */

					sub	x0, x30, #AARCH64_INSN_SIZE		// ip (callsite's BL insn)

					mov	x1, x9					// parent_ip (callsite's LR)

					mov	x3, sp					// regs

									
										29

arch/arm64/kernel/fpsimd.c
									
												
				@@ -225,10 +225,21 @@ static void fpsimd_bind_task_to_cpu(void);

				 */

				static void get_cpu_fpsimd_context(void)

				{

					if (!IS_ENABLED(CONFIG_PREEMPT_RT))

						local_bh_disable();

					else

					if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {

						/*

						 * The softirq subsystem lacks a true unmask/mask API, and

						 * re-enabling softirq processing using local_bh_enable() will

						 * not only unmask softirqs, it will also result in immediate

						 * delivery of any pending softirqs.

						 * This is undesirable when running with IRQs disabled, but in

						 * that case, there is no need to mask softirqs in the first

						 * place, so only bother doing so when IRQs are enabled.

						 */

						if (!irqs_disabled())

							local_bh_disable();

					} else {

						preempt_disable();

					}

				}

				/*

				@@ -240,10 +251,12 @@ static void get_cpu_fpsimd_context(void)

				 */

				static void put_cpu_fpsimd_context(void)

				{

					if (!IS_ENABLED(CONFIG_PREEMPT_RT))

						local_bh_enable();

					else

					if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {

						if (!irqs_disabled())

							local_bh_enable();

					} else {

						preempt_enable();

					}

				}

				unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)

				@@ -1934,11 +1947,11 @@ void __efi_fpsimd_begin(void)

					if (!system_supports_fpsimd())

						return;

					WARN_ON(preemptible());

					if (may_use_simd()) {

						kernel_neon_begin();

					} else {

						WARN_ON(preemptible());

						/*

						 * If !efi_sve_state, SVE can't be in use yet and doesn't need

						 * preserving:
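The reworked `get_cpu_fpsimd_context()` / `put_cpu_fpsimd_context()` pair earlier in this file picks one of three behaviours: disable preemption on PREEMPT_RT, mask softirqs when IRQs are enabled, or do nothing when IRQs are already off. The toy model below tracks that choice with counters so the pairing can be checked in user space. Everything in it (`struct demo_cpu`, `demo_get()`, `demo_put()`) is hypothetical, and it assumes the IRQ state does not change between get and put.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; not a kernel structure. */
struct demo_cpu {
	bool preempt_rt;	/* stands in for IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool irqs_off;		/* stands in for irqs_disabled() */
	int bh_depth;		/* models local_bh_disable() nesting */
	int preempt_depth;	/* models preempt_disable() nesting */
};

static void demo_get(struct demo_cpu *c)
{
	if (!c->preempt_rt) {
		/* With IRQs off there is no need to mask softirqs. */
		if (!c->irqs_off)
			c->bh_depth++;
	} else {
		c->preempt_depth++;
	}
}

static void demo_put(struct demo_cpu *c)
{
	if (!c->preempt_rt) {
		if (!c->irqs_off) {
			assert(c->bh_depth > 0);
			c->bh_depth--;
		}
	} else {
		assert(c->preempt_depth > 0);
		c->preempt_depth--;
	}
}
```

The asserts in `demo_put()` make an unbalanced call fail loudly, which is the property the symmetric `irqs_disabled()` checks in the diff rely on.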

									
										2

arch/arm64/kernel/ftrace.c
									
												
				@@ -492,7 +492,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,

						return ret;

					/*

					 * When using mcount, callsites in modules may have been initalized to

					 * When using mcount, callsites in modules may have been initialized to

					 * call an arbitrary module PLT (which redirects to the _mcount stub)

					 * rather than the ftrace PLT we'll use at runtime (which redirects to

					 * the ftrace trampoline). We can ignore the old PLT when initializing

									
										2

arch/arm64/kernel/irq.c
									
												
				@@ -62,7 +62,7 @@ static void __init init_irq_stacks(void)

					}

				}

				#ifndef CONFIG_PREEMPT_RT

				#ifdef CONFIG_SOFTIRQ_ON_OWN_STACK

				static void ____do_softirq(struct pt_regs *regs)

				{

					__do_softirq();

									
										2

arch/arm64/kernel/machine_kexec.c
									
												
				@@ -251,7 +251,7 @@ void crash_post_resume(void)

				 * marked as Reserved as memory was allocated via memblock_reserve().

				 *

				 * In hibernation, the pages which are Reserved and yet "nosave" are excluded

				 * from the hibernation iamge. crash_is_nosave() does thich check for crash

				 * from the hibernation image. crash_is_nosave() does thich check for crash

				 * dump kernel and will reduce the total size of hibernation image.

				 */

									
										8

arch/arm64/kernel/pi/map_kernel.c
									
												
				@@ -141,13 +141,13 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)

				static void noinline __section(".idmap.text") set_ttbr0_for_lpa2(phys_addr_t ttbr)

				{

					u64 sctlr = read_sysreg(sctlr_el1);

					u64 tcr = read_sysreg(tcr_el1) | TCR_DS;

					u64 tcr = read_sysreg(tcr_el1) | TCR_EL1_DS;

					u64 mmfr0 = read_sysreg(id_aa64mmfr0_el1);

					u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,

											   ID_AA64MMFR0_EL1_PARANGE_SHIFT);

					tcr &= ~TCR_IPS_MASK;

					tcr |= parange << TCR_IPS_SHIFT;

					tcr &= ~TCR_EL1_IPS_MASK;

					tcr |= parange << TCR_EL1_IPS_SHIFT;

					asm("	msr	sctlr_el1, %0		;"

					    "	isb				;"

				@@ -263,7 +263,7 @@ asmlinkage void __init early_map_kernel(u64 boot_status, phys_addr_t fdt)

					}

					if (va_bits > VA_BITS_MIN)

						sysreg_clear_set(tcr_el1, TCR_T1SZ_MASK, TCR_T1SZ(va_bits));

						sysreg_clear_set(tcr_el1, TCR_EL1_T1SZ_MASK, TCR_T1SZ(va_bits));

					/*

					 * The virtual KASLR displacement modulo 2MiB is decided by the

									
										2

arch/arm64/kernel/probes/uprobes.c
									
												
				@@ -131,7 +131,7 @@ void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)

					struct uprobe_task *utask = current->utask;

					/*

					 * Task has received a fatal signal, so reset back to probbed

					 * Task has received a fatal signal, so reset back to probed

					 * address.

					 */

					instruction_pointer_set(regs, utask->vaddr);

									
										40

arch/arm64/kernel/ptrace.c
									
												
				@@ -912,13 +912,39 @@ static int sve_set_common(struct task_struct *target,

						return -EINVAL;

					/*

					 * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by

					 * vec_set_vector_length(), which will also validate them for us:

					 * On systems without SVE we accept FPSIMD format writes with

					 * a VL of 0 to allow exiting streaming mode, otherwise a VL

					 * is required.

					 */

					ret = vec_set_vector_length(target, type, header.vl,

						((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);

					if (ret)

						return ret;

					if (header.vl) {

						/*

						 * If the system does not support SVE we can't

						 * configure a SVE VL.

						 */

						if (!system_supports_sve() && type == ARM64_VEC_SVE)

							return -EINVAL;

						/*

						 * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are

						 * consumed by vec_set_vector_length(), which will

						 * also validate them for us:

						 */

						ret = vec_set_vector_length(target, type, header.vl,

									    ((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);

						if (ret)

							return ret;

					} else {

						/* If the system supports SVE we require a VL. */

						if (system_supports_sve())

							return -EINVAL;

						/*

						 * Only FPSIMD formatted data with no flags set is

						 * supported.

						 */

						if (header.flags != SVE_PT_REGS_FPSIMD)

							return -EINVAL;

					}

					/* Allocate SME storage if necessary, preserving any existing ZA/ZT state */

					if (type == ARM64_VEC_SME) {

				@@ -1016,7 +1042,7 @@ static int sve_set(struct task_struct *target,

						   unsigned int pos, unsigned int count,

						   const void *kbuf, const void __user *ubuf)

				{

					if (!system_supports_sve())

					if (!system_supports_sve() && !system_supports_sme())

						return -EINVAL;

					return sve_set_common(target, regset, pos, count, kbuf, ubuf,
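The new header validation in `sve_set_common()` above can be read as a small decision table over the vector length, the flags, and whether the system has SVE. The sketch below encodes only that table. `demo_check_header()` and the `DEMO_*` constants are hypothetical; the value 0 for the FPSIMD format is an assumption meant to mirror `SVE_PT_REGS_FPSIMD`, and the real code goes on to call `vec_set_vector_length()` in the non-zero VL case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_REGS_MASK		1u
#define DEMO_REGS_FPSIMD	0u	/* assumed to mirror SVE_PT_REGS_FPSIMD */

static_assert((DEMO_REGS_FPSIMD & ~DEMO_REGS_MASK) == 0,
	      "format value must fit in the format mask");

/* Return 0 if the header is acceptable, -EINVAL otherwise. */
static int demo_check_header(unsigned int vl, unsigned int flags,
			     bool has_sve, bool type_is_sve)
{
	if (vl) {
		/* Without SVE there is no SVE vector length to configure. */
		if (!has_sve && type_is_sve)
			return -EINVAL;
		return 0;
	}

	/* A VL of 0 is only accepted on systems without SVE. */
	if (has_sve)
		return -EINVAL;

	/* ... and only for FPSIMD-format data with no other flags. */
	if (flags != DEMO_REGS_FPSIMD)
		return -EINVAL;

	return 0;
}
```

The zero-VL branch is what lets a debugger write FPSIMD-format data on an SME-only system, matching the "disable streaming on SME-only systems" item in the merge summary.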

Compare commits

490 Commits 4a26e7032d ... 44fc84337b

142 Documentation/admin-guide/RAS/main.rst

6 Documentation/admin-guide/kernel-parameters.txt

8 Documentation/arch/arm64/booting.rst

5 Documentation/arch/arm64/sve.rst

5 Documentation/arch/s390/s390dbf.rst

3 Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml

13 Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2700-intc.yaml

4 Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml

17 Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-mswi.yaml

4 Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-sswi.yaml

29 Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml

47 Documentation/devicetree/bindings/timer/realtek,rtd1625-systimer.yaml Normal file

2 Documentation/devicetree/bindings/vendor-prefixes.yaml

134 Documentation/filesystems/resctrl.rst

15 MAINTAINERS

26 arch/arm/include/asm/uaccess.h

26 arch/arm64/Kconfig

8 arch/arm64/include/asm/alternative-macros.h

4 arch/arm64/include/asm/alternative.h

4 arch/arm64/include/asm/arch_gicv3.h

6 arch/arm64/include/asm/asm-extable.h

12 arch/arm64/include/asm/assembler.h

20 arch/arm64/include/asm/atomic_lse.h

4 arch/arm64/include/asm/barrier.h

4 arch/arm64/include/asm/cache.h

4 arch/arm64/include/asm/cpucaps.h

8 arch/arm64/include/asm/cpufeature.h

6 arch/arm64/include/asm/cputype.h

4 arch/arm64/include/asm/current.h

4 arch/arm64/include/asm/debug-monitors.h

13 arch/arm64/include/asm/efi.h

4 arch/arm64/include/asm/el2_setup.h

4 arch/arm64/include/asm/elf.h

4 arch/arm64/include/asm/esr.h

4 arch/arm64/include/asm/fixmap.h

2 arch/arm64/include/asm/fpsimd.h

6 arch/arm64/include/asm/ftrace.h

6 arch/arm64/include/asm/gpr-num.h

2 arch/arm64/include/asm/hwcap.h

4 arch/arm64/include/asm/image.h

4 arch/arm64/include/asm/insn.h

4 arch/arm64/include/asm/jump_label.h

2 arch/arm64/include/asm/kasan.h

4 arch/arm64/include/asm/kexec.h

4 arch/arm64/include/asm/kgdb.h

4 arch/arm64/include/asm/kvm_asm.h

4 arch/arm64/include/asm/kvm_mmu.h

4 arch/arm64/include/asm/kvm_mte.h

6 arch/arm64/include/asm/kvm_ptrauth.h

2 arch/arm64/include/asm/linkage.h

5 arch/arm64/include/asm/memory.h

4 arch/arm64/include/asm/mmu.h

20 arch/arm64/include/asm/mmu_context.h

4 arch/arm64/include/asm/mte-kasan.h

4 arch/arm64/include/asm/mte.h

4 arch/arm64/include/asm/page.h

125 arch/arm64/include/asm/pgtable-hwdef.h

6 arch/arm64/include/asm/pgtable-prot.h

22 arch/arm64/include/asm/pgtable.h

4 arch/arm64/include/asm/proc-fns.h

4 arch/arm64/include/asm/processor.h

4 arch/arm64/include/asm/ptrace.h

4 arch/arm64/include/asm/rsi_smc.h

4 arch/arm64/include/asm/rwonce.h

4 arch/arm64/include/asm/scs.h

4 arch/arm64/include/asm/sdei.h

2 arch/arm64/include/asm/simd.h

4 arch/arm64/include/asm/smp.h

4 arch/arm64/include/asm/spectre.h

4 arch/arm64/include/asm/stacktrace/frame.h Unescape Escape View File

2 arch/arm64/include/asm/suspend.h Unescape Escape View File

12 arch/arm64/include/asm/sysreg.h Unescape Escape View File

4 arch/arm64/include/asm/system_misc.h Unescape Escape View File

2 arch/arm64/include/asm/thread_info.h Unescape Escape View File

85 arch/arm64/include/asm/tlbflush.h Unescape Escape View File

4 arch/arm64/include/asm/uaccess.h Unescape Escape View File

4 arch/arm64/include/asm/vdso.h Unescape Escape View File

4 arch/arm64/include/asm/vdso/compat_barrier.h Unescape Escape View File

490 Commits

4a26e7032d ... 44fc84337b

Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml

View File

17

Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-mswi.yaml

View File

4

Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-sswi.yaml

View File

29

Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml

View File

47

Documentation/devicetree/bindings/timer/realtek,rtd1625-systimer.yaml Normal file

View File

2

Documentation/devicetree/bindings/vendor-prefixes.yaml

View File

134

Documentation/filesystems/resctrl.rst

View File

15

MAINTAINERS

View File

26

arch/arm/include/asm/uaccess.h

View File

26

arch/arm64/Kconfig

View File

8

arch/arm64/include/asm/alternative-macros.h

View File

4

arch/arm64/include/asm/alternative.h

View File

4

arch/arm64/include/asm/arch_gicv3.h

View File

6

arch/arm64/include/asm/asm-extable.h

View File

12

arch/arm64/include/asm/assembler.h

View File

20

arch/arm64/include/asm/atomic_lse.h

View File

4

arch/arm64/include/asm/barrier.h

View File

4

arch/arm64/include/asm/cache.h

View File

4

arch/arm64/include/asm/cpucaps.h

View File

8

arch/arm64/include/asm/cpufeature.h

View File

6

arch/arm64/include/asm/cputype.h

View File

4

arch/arm64/include/asm/current.h

View File

4

arch/arm64/include/asm/debug-monitors.h

View File

13

arch/arm64/include/asm/efi.h

View File

4

arch/arm64/include/asm/el2_setup.h

View File

4

arch/arm64/include/asm/elf.h

View File

4

arch/arm64/include/asm/esr.h

View File

4

arch/arm64/include/asm/fixmap.h

View File

2

arch/arm64/include/asm/fpsimd.h

View File

6

arch/arm64/include/asm/ftrace.h

View File

6

arch/arm64/include/asm/gpr-num.h

View File

2

arch/arm64/include/asm/hwcap.h

View File

4

arch/arm64/include/asm/image.h

View File

4

arch/arm64/include/asm/insn.h

View File

4

arch/arm64/include/asm/jump_label.h

View File

2

arch/arm64/include/asm/kasan.h

View File

4

arch/arm64/include/asm/kexec.h

View File

4

arch/arm64/include/asm/kgdb.h

View File

4

arch/arm64/include/asm/kvm_asm.h

View File

4

arch/arm64/include/asm/kvm_mmu.h

View File

4

arch/arm64/include/asm/kvm_mte.h

View File

6

arch/arm64/include/asm/kvm_ptrauth.h

View File

2

arch/arm64/include/asm/linkage.h

View File

5

arch/arm64/include/asm/memory.h

View File

4

arch/arm64/include/asm/mmu.h

View File

20

arch/arm64/include/asm/mmu_context.h

View File

4

arch/arm64/include/asm/mte-kasan.h

View File

4

arch/arm64/include/asm/mte.h

View File

4

arch/arm64/include/asm/page.h

View File

125

arch/arm64/include/asm/pgtable-hwdef.h

View File

6

arch/arm64/include/asm/pgtable-prot.h

View File

22

arch/arm64/include/asm/pgtable.h

View File

4

arch/arm64/include/asm/proc-fns.h

View File

4

arch/arm64/include/asm/processor.h

View File

4

arch/arm64/include/asm/ptrace.h

View File

4

arch/arm64/include/asm/rsi_smc.h

View File

4

arch/arm64/include/asm/rwonce.h

View File

4

arch/arm64/include/asm/scs.h

View File

4

arch/arm64/include/asm/sdei.h

View File

2

arch/arm64/include/asm/simd.h

View File

4

arch/arm64/include/asm/smp.h

View File

4

arch/arm64/include/asm/spectre.h

View File

4

arch/arm64/include/asm/stacktrace/frame.h

View File

2

arch/arm64/include/asm/suspend.h

View File

12

arch/arm64/include/asm/sysreg.h

View File

4

arch/arm64/include/asm/system_misc.h

View File

2

arch/arm64/include/asm/thread_info.h

View File

85

arch/arm64/include/asm/tlbflush.h

View File

4

arch/arm64/include/asm/uaccess.h

View File

4

arch/arm64/include/asm/vdso.h

View File

4

arch/arm64/include/asm/vdso/compat_barrier.h

View File

4 arch/arm64/include/asm/vdso/compat_gettimeofday.h