Compare commits

...

1855 Commits

Author SHA1 Message Date
Mike Snitzer
3af870aedb nfs/localio: fix regression due to out-of-order __put_cred
Commit f2060bdc21 ("nfs/localio: add refcounting for each iocb IO
associated with NFS pgio header") inadvertantly reintroduced the same
potential for __put_cred() triggering BUG_ON(cred == current->cred) that
commit 992203a1fb ("nfs/localio: restore creds before releasing pageio
data") fixed.

Fix this by saving and restoring the cred around each {read,write}_iter
call within the respective for loop of nfs_local_call_{read,write} using
scoped_with_creds().

NOTE: this fix started by first reverting the following commits:

 94afb627df ("nfs: use credential guards in nfs_local_call_read()")
 bff3c841f7 ("nfs: use credential guards in nfs_local_call_write()")
 1d18101a64 ("Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")

followed by narrowly fixing the cred lifetime issue by using
scoped_with_creds().  In doing so, this commit's changes appear more
extensive than they really are (as evidenced by comparing to v6.18's
fs/nfs/localio.c).

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/linux-next/20251205111942.4150b06f@canb.auug.org.au/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-05 17:53:56 -08:00
Linus Torvalds
11efc1cb70 Merge tag 'soc-drivers-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull more SoC driver updates from Arnd Bergmann:
 "These updates came a little late, or were based on a later 6.18-rc tag
  than the others:

   - A new driver for cache management on cxl devices with memory shared
     in a coherent cluster. This is part of the drivers/cache/ tree, but
     unlike the other drivers that back the dma-mapping interfaces, this
     one is needed only during CPU hotplug.

   - A shared branch for reset controllers using swnode infrastructure

   - Added support for new SoC variants in the Amlogic soc_device
     identification

   - Minor updates in Freescale, Microchip, Samsung, and Apple SoC
     drivers"

* tag 'soc-drivers-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
  soc: samsung: exynos-pmu: fix device leak on regmap lookup
  soc: samsung: exynos-pmu: Fix structure initialization
  soc: fsl: qbman: use kmalloc_array() instead of kmalloc()
  soc: fsl: qbman: add WQ_PERCPU to alloc_workqueue users
  MAINTAINERS: Update email address for Christophe Leroy
  MAINTAINERS: refer to intended file in STANDALONE CACHE CONTROLLER DRIVERS
  cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
  cache: Make top level Kconfig menu a boolean dependent on RISCV
  MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header
  arm64: Select GENERIC_CPU_CACHE_MAINTENANCE
  lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  soc: amlogic: meson-gx-socinfo: add new SoCs id
  dt-bindings: arm: amlogic: meson-gx-ao-secure: support more SoCs
  memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()
  memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion()
  dt-bindings: cache: sifive,ccache0: add a pic64gx compatible
  MAINTAINERS: rename Microchip RISC-V entry
  MAINTAINERS: add new soc drivers to Microchip RISC-V entry
  soc: microchip: add mfd drivers for two syscon regions on PolarFire SoC
  dt-bindings: soc: microchip: document the simple-mfd syscon on PolarFire SoC
  ...
2025-12-05 17:47:59 -08:00
Linus Torvalds
208eed95fc Merge tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
 "This is the first half of the driver changes:

   - A treewide interface change to the "syscore" operations for power
     management, as a preparation for future Tegra specific changes

   - Reset controller updates with added drivers for LAN969x, eic770 and
     RZ/G3S SoCs

   - Protection of system controller registers on Renesas and Google
     SoCs, to prevent trivially triggering a system crash from e.g.
     debugfs access

   - soc_device identification updates on Nvidia, Exynos and Mediatek

   - debugfs support in the ST STM32 firewall driver

   - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI

   - Cleanups for memory controller support on Nvidia and Renesas"

* tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits)
  memory: tegra186-emc: Fix missing put_bpmp
  Documentation: reset: Remove reset_controller_add_lookup()
  reset: fix BIT macro reference
  reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe
  reset: th1520: Support reset controllers in more subsystems
  reset: th1520: Prepare for supporting multiple controllers
  dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys
  dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets
  reset: remove legacy reset lookup code
  clk: davinci: psc: drop unused reset lookup
  reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC
  reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY
  dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support
  reset: eswin: Add eic7700 reset driver
  dt-bindings: reset: eswin: Documentation for eic7700 SoC
  reset: sparx5: add LAN969x support
  dt-bindings: reset: microchip: Add LAN969x support
  soc: rockchip: grf: Add select correct PWM implementation on RK3368
  soc/tegra: pmc: Add USB wake events for Tegra234
  amba: tegra-ahb: Fix device leak on SMMU enable
  ...
2025-12-05 17:29:04 -08:00
Linus Torvalds
66a1025f7f Merge tag 'soc-newsoc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull new SoC families update from Arnd Bergmann:
 "These three new families of SoC are split out into a separate branch
  because they touch multiple parts of the source tree and are better
  left separate for the initial merge.

   - Black Sesame Technologies C1200 is an automotive SoC using
     Cortex-A78 CPU cores

   - Anlogic dr1v90 (not to be confused with Amlogic) is an FPGA
     platform using a single nuclei ux900 RISC-V core

   - Tenstorrent Blackhole is a Neural Processing Unit using custom
     "Tensix" cores for computation offload managed by Linux running on
     SiFive X280 RISC-V cores.

  Support for all three is rather rudimentary at the moment and will get
  improved as device drivers are merged through other tree"

* tag 'soc-newsoc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
  MAINTAINERS: add Black Sesame Technologies (BST) ARM SoC support
  arm64: defconfig: enable BST platform support
  arm64: dts: bst: add support for Black Sesame Technologies C1200 CDCU1.0 board
  arm64: Kconfig: add ARCH_BST for Black Sesame Technologies SoCs
  dt-bindings: arm: add Black Sesame Technologies (bst) SoC
  dt-bindings: vendor-prefixes: Add Black Sesame Technologies Co., Ltd.
  MAINTAINERS: Setup support for Anlogic tree
  riscv: defconfig: Enable Anlogic SoC
  riscv: dts: anlogic: Add Milianke MLKPAI FS01 board
  riscv: dts: Add initial Anlogic DR1V90 SoC device tree
  riscv: Add Anlogic SoC famly Kconfig support
  dt-bindings: serial: snps-dw-apb-uart: Add Anlogic DR1V90 uart
  dt-bindings: timer: Add Anlogic DR1V90 ACLINT MTIMER
  dt-bindings: riscv: Add Anlogic DR1V90
  dt-bindings: riscv: Add Nuclei UX900 compatibles
  dt-bindings: vendor-prefixes: Add Anlogic, Milianke and Nuclei
  riscv: defconfig: Enable Tenstorrent SoCs
  riscv: Kconfig.socs: Add ARCH_TENSTORRENT for Tenstorrent SoCs
  riscv: dts: Add Tenstorrent Blackhole SoC PCIe cards
  dt-bindings: interrupt-controller: Add Tenstorrent Blackhole compatible
  ...
2025-12-05 17:27:12 -08:00
Linus Torvalds
0cac5ce06e Merge tag 'soc-dt-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC devicetree updates from Arnd Bergmann:
 "Three new SoCs got added in existing arm64 chip families:

   - Renesas R-Car X5H (R8A78000) is a new generation of automotive
     SoCs, based on 16 Cortex-A720 (Armv9.2) cores, which makes the the
     currently highest-perforance embedded SoC.

   - TI AM62L is a new variant of the AM62 family of industrial SoCs,
     this one comes without a GPU.

   - Qualcomm MSM8937 (Snapdragon 430) is an older mobile phone chip
     based on Cortex-A53, and closely related to MSM8917 (Snapdragn
     425), which we already support.

  In addition, there are a good number of newly supported machines
  across SoC families:

   - Two Aspeed AST2600 (Cortex-A7) based BMC setups for large servers

   - Mobile Phones and tables based on Mediatek MT6582, Nvidia Tegra124,
     Qualcomm MSM8937 and Qualcomm MSM8939,

   - Two Laptops based on Qualcomm SoCs: one using the older sdm850, the
     other using x1p42100.

   - One Router based on Rockchips RK3568

   - 24 variants of the Enclustra Mercury system-on-module, all based on
     32-bit Intel/Altera SocFPGA chips, plus two boards using 64-bit
     SocFPGA Agilex chips..

   - 30 industrial/embedded boards and single-board computers, using
     various chips from NXP, Rockchips, Mediatek, TI, Amlogic, Qualcomm,
     Spacemit, and Starfive.

  In total there are 783 commits here, the majority of these improving
  hardware support and cleaning up devicetree files across the tree,
  with the majority of the changes going into the Qualcomm, NXP, Renesas
  and Rockchips platforms"

* tag 'soc-dt-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (782 commits)
  arm64: dts: mediatek: mt8195: Fix address range for JPEG decoder core 1
  ARM: dts: samsung: exynos4412-midas: turn off SDIO WLAN chip during system suspend
  ARM: dts: samsung: exynos4210-trats: turn off SDIO WLAN chip during system suspend
  ARM: dts: samsung: exynos4210-i9100: turn off SDIO WLAN chip during system suspend
  ARM: dts: samsung: universal_c210: turn off SDIO WLAN chip during system suspend
  arm64: dts: amlogic: meson-g12b: Fix L2 cache reference for S922X CPUs
  arm64: dts: Add gpio_intc node for Amlogic S7D SoCs
  arm64: dts: Add gpio_intc node for Amlogic S7 SoCs
  arm64: dts: Add gpio_intc node for Amlogic S6 SoCs
  arm64: dts: amlogic: s7d: add ao secure node
  arm64: dts: amlogic: s7: add ao secure node
  arm64: dts: amlogic: s6: add ao secure node
  arm64: dts: amlogic: Fix the register name of the 'DBI' region
  dts: arm64: amlogic: add a5 pinctrl node
  arm64: dts: amlogic: s7d: add power domain controller node
  arm64: dts: amlogic: s7: add power domain controller node
  arm64: dts: amlogic: s6: add power domain controller node
  dts: arm64: amlogic: Add ISP related nodes for C3
  arm64: dts: meson: add initial device-tree for Tanix TX9 Pro
  dt-bindings: arm: amlogic: add support for Tanix TX9 Pro
  ...
2025-12-05 17:24:29 -08:00
Linus Torvalds
09cab48db9 Merge tag 'soc-arm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC ARM code updates from Arnd Bergmann:
 "These are very minimal changes for 32-bit Arm platform code, enabling
  SMP bringup for one more SoC variant (mt6582) among spelling changes
  and a build warning fix"

* tag 'soc-arm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: omap1: avoid symbol clashes in fiq handler
  ARM: gemini: fix typos in comments
  ARM: versatile: Fix typo in versatile.c
  ARM: OMAP2+: Fix falg->flag typo in omap_smc2()
  ARM: mediatek: add MT6582 smp bring up code
  ARM: mediatek: add board_dt_compat entry for the MT6582 SoC
2025-12-05 17:23:12 -08:00
Linus Torvalds
b4c6c76e40 Merge tag 'soc-defconfig-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC defconfig updates from Arnd Bergmann:
 "As usual, a number of newly added drivers get enabled in the arm64
  defconfig, in addition to minor housekeeping work on defconfig files
  for arm32, arm64 and riscv"

* tag 'soc-defconfig-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
  arm64: defconfig: enable Exynos ACPM clocks
  arm64: defconfig: Remove the redundant SCHED_MC/SCHED_SMT
  ARM: multi_v7_defconfig: Enable TI PRU Ethernet driver
  arm64: defconfig: enable i.MX AIPSTZ driver
  ARM: mxs_defconfig: enable sound drivers for imx28-amarula-rmm
  arm64: defconfig: Enable i.MX95 drivers for pinctrl, Ethernet and PCIe
  arm64: defconfig: enable rockchip camera interface
  ARM: tegra: Enable EXT4 for Tegra
  arm64: defconfig: Enable NVIDIA VRS PSEQ RTC
  arm64: defconfig: Enable SX150x GPIO expander driver
  riscv: defconfig: enable SPI_FSL_QUADSPI as a module
  ARM: at91: at91_dt_defconfig: set MMC_SPI to module
  arm64: defconfig: Build NSS clock controller driver for IPQ5424
  arm64: defconfig: Enable SCSI UFS Crypto and Block Inline encryption drivers
  arm64: defconfig: Add M31 eUSB2 PHY config
  arm64: defconfig: Enable configs for Fairphone 3, 4, 5 smartphones
  arm64: defconfig: Enable two Novatek display panels for MTP8750 and Tianma
  arm64: defconfig: Enable RZ/T2H / RZ/N2H ADC driver
  ARM: shmobile: defconfig: Refresh for v6.18-rc1
  arm64: defconfig: Enable DW HDMI QP CEC support
  ...
2025-12-05 17:22:09 -08:00
Linus Torvalds
51d90a15fe Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "ARM:

   - Support for userspace handling of synchronous external aborts
     (SEAs), allowing the VMM to potentially handle the abort in a
     non-fatal manner

   - Large rework of the VGIC's list register handling with the goal of
     supporting more active/pending IRQs than available list registers
     in hardware. In addition, the VGIC now supports EOImode==1 style
     deactivations for IRQs which may occur on a separate vCPU than the
     one that acked the IRQ

   - Support for FEAT_XNX (user / privileged execute permissions) and
     FEAT_HAF (hardware update to the Access Flag) in the software page
     table walkers and shadow MMU

   - Allow page table destruction to reschedule, fixing long
     need_resched latencies observed when destroying a large VM

   - Minor fixes to KVM and selftests

  Loongarch:

   - Get VM PMU capability from HW GCFG register

   - Add AVEC basic support

   - Use 64-bit register definition for EIOINTC

   - Add KVM timer test cases for tools/selftests

  RISC/V:

   - SBI message passing (MPXY) support for KVM guest

   - Give a new, more specific error subcode for the case when in-kernel
     AIA virtualization fails to allocate IMSIC VS-file

   - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
     in small chunks

   - Fix guest page fault within HLV* instructions

   - Flush VS-stage TLB after VCPU migration for Andes cores

  s390:

   - Always allocate ESCA (Extended System Control Area), instead of
     starting with the basic SCA and converting to ESCA with the
     addition of the 65th vCPU. The price is increased number of exits
     (and worse performance) on z10 and earlier processor; ESCA was
     introduced by z114/z196 in 2010

   - VIRT_XFER_TO_GUEST_WORK support

   - Operation exception forwarding support

   - Cleanups

  x86:

   - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
     SPTE caching is disabled, as there can't be any relevant SPTEs to
     zap

   - Relocate a misplaced export

   - Fix an async #PF bug where KVM would clear the completion queue
     when the guest transitioned in and out of paging mode, e.g. when
     handling an SMI and then returning to paged mode via RSM

   - Leave KVM's user-return notifier registered even when disabling
     virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
     keeping the notifier registered is ok; the kernel does not use the
     MSRs and the callback will run cleanly and restore host MSRs if the
     CPU manages to return to userspace before the system goes down

   - Use the checked version of {get,put}_user()

   - Fix a long-lurking bug where KVM's lack of catch-up logic for
     periodic APIC timers can result in a hard lockup in the host

   - Revert the periodic kvmclock sync logic now that KVM doesn't use a
     clocksource that's subject to NTP corrections

   - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
     latter behind CONFIG_CPU_MITIGATIONS

   - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
     path; the only reason they were handled in the fast path was to
     paper of a bug in the core #MC code, and that has long since been
     fixed

   - Add emulator support for AVX MOV instructions, to play nice with
     emulated devices whose guest drivers like to access PCI BARs with
     large multi-byte instructions

  x86 (AMD):

   - Fix a few missing "VMCB dirty" bugs

   - Fix the worst of KVM's lack of EFER.LMSLE emulation

   - Add AVIC support for addressing 4k vCPUs in x2AVIC mode

   - Fix incorrect handling of selective CR0 writes when checking
     intercepts during emulation of L2 instructions

   - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
     on VMRUN and #VMEXIT

   - Fix a bug where KVM corrupt the guest code stream when re-injecting
     a soft interrupt if the guest patched the underlying code after the
     VM-Exit, e.g. when Linux patches code with a temporary INT3

   - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
     to userspace, and extend KVM "support" to all policy bits that
     don't require any actual support from KVM

  x86 (Intel):

   - Use the root role from kvm_mmu_page to construct EPTPs instead of
     the current vCPU state, partly as worthwhile cleanup, but mostly to
     pave the way for tracking per-root TLB flushes, and elide EPT
     flushes on pCPU migration if the root is clean from a previous
     flush

   - Add a few missing nested consistency checks

   - Rip out support for doing "early" consistency checks via hardware
     as the functionality hasn't been used in years and is no longer
     useful in general; replace it with an off-by-default module param
     to WARN if hardware fails a check that KVM does not perform

   - Fix a currently-benign bug where KVM would drop the guest's
     SPEC_CTRL[63:32] on VM-Enter

   - Misc cleanups

   - Overhaul the TDX code to address systemic races where KVM (acting
     on behalf of userspace) could inadvertantly trigger lock contention
     in the TDX-Module; KVM was either working around these in weird,
     ugly ways, or was simply oblivious to them (though even Yan's
     devilish selftests could only break individual VMs, not the host
     kernel)

   - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
     TDX vCPU, if creating said vCPU failed partway through

   - Fix a few sparse warnings (bad annotation, 0 != NULL)

   - Use struct_size() to simplify copying TDX capabilities to userspace

   - Fix a bug where TDX would effectively corrupt user-return MSR
     values if the TDX Module rejects VP.ENTER and thus doesn't clobber
     host MSRs as expected

  Selftests:

   - Fix a math goof in mmu_stress_test when running on a single-CPU
     system/VM

   - Forcefully override ARCH from x86_64 to x86 to play nice with
     specifying ARCH=x86_64 on the command line

   - Extend a bunch of nested VMX to validate nested SVM as well

   - Add support for LA57 in the core VM_MODE_xxx macro, and add a test
     to verify KVM can save/restore nested VMX state when L1 is using
     5-level paging, but L2 is not

   - Clean up the guest paging code in anticipation of sharing the core
     logic for nested EPT and nested NPT

  guest_memfd:

   - Add NUMA mempolicy support for guest_memfd, and clean up a variety
     of rough edges in guest_memfd along the way

   - Define a CLASS to automatically handle get+put when grabbing a
     guest_memfd from a memslot to make it harder to leak references

   - Enhance KVM selftests to make it easer to develop and debug
     selftests like those added for guest_memfd NUMA support, e.g. where
     test and/or KVM bugs often result in hard-to-debug SIGBUS errors

   - Misc cleanups

  Generic:

   - Use the recently-added WQ_PERCPU when creating the per-CPU
     workqueue for irqfd cleanup

   - Fix a goof in the dirty ring documentation

   - Fix choice of target for directed yield across different calls to
     kvm_vcpu_on_spin(); the function was always starting from the first
     vCPU instead of continuing the round-robin search"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
  ...
2025-12-05 17:01:20 -08:00
Linus Torvalds
399ead3a6d Merge tag 'uml-for-linux-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Johannes Berg:
 "Apart from the usual small churn, we have

   - initial SMP support (only kernel)

   - major vDSO cleanups (and fixes for 32-bit)"

* tag 'uml-for-linux-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (33 commits)
  um: Disable KASAN_INLINE when STATIC_LINK is selected
  um: Don't rename vmap to kernel_vmap
  um: drivers: virtio: use string choices helper
  um: Always set up AT_HWCAP and AT_PLATFORM
  x86/um: Remove FIXADDR_USER_START and FIXADDR_USE_END
  um: Remove __access_ok_vsyscall()
  um: Remove redundant range check from __access_ok_vsyscall()
  um: Remove fixaddr_user_init()
  x86/um: Drop gate area handling
  x86/um: Do not inherit vDSO from host
  um: Split out default elf_aux_hwcap
  x86/um: Move ELF_PLATFORM fallback to x86-specific code
  um: Split out default elf_aux_platform
  um: Avoid circular dependency on asm-offsets in pgtable.h
  um: Enable SMP support on x86
  asm-generic: percpu: Add assembly guard
  um: vdso: Remove getcpu support on x86
  um: Add initial SMP support
  um: Define timers on a per-CPU basis
  um: Determine sleep based on need_resched()
  ...
2025-12-05 16:30:56 -08:00
Linus Torvalds
07025b51c1 Merge tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:

 - Enable parallel hotplug for RISC-V

 - Optimize vector regset allocation for ptrace()

 - Add a kernel selftest for the vector ptrace interface

 - Enable the userspace RAID6 test to build and run using RISC-V vectors

 - Add initial support for the Zalasr RISC-V ratified ISA extension

 - For the Zicbop RISC-V ratified ISA extension to userspace, expose
   hardware and kernel support to userspace and add a kselftest for
   Zicbop

 - Convert open-coded instances of 'asm goto's that are controlled by
   runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
   following arm64's alternative_has_cap_{un,}likely()

 - Remove an unnecessary mask in the GFP flags used in some calls to
   pagetable_alloc()

* tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  selftests/riscv: Add Zicbop prefetch test
  riscv: hwprobe: Expose Zicbop extension and its block size
  riscv: Introduce Zalasr instructions
  riscv: hwprobe: Export Zalasr extension
  dt-bindings: riscv: Add Zalasr ISA extension description
  riscv: Add ISA extension parsing for Zalasr
  selftests: riscv: Add test for the Vector ptrace interface
  riscv: ptrace: Optimize the allocation of vector regset
  raid6: test: Add support for RISC-V
  raid6: riscv: Allow code to be compiled in userspace
  raid6: riscv: Prevent compiler from breaking inline vector assembly code
  riscv: cmpxchg: Use riscv_has_extension_likely
  riscv: bitops: Use riscv_has_extension_likely
  riscv: hweight: Use riscv_has_extension_likely
  riscv: checksum: Use riscv_has_extension_likely
  riscv: pgtable: Use riscv_has_extension_unlikely
  riscv: Remove __GFP_HIGHMEM masking
  RISC-V: Enable HOTPLUG_PARALLEL for secondary CPUs
2025-12-05 16:26:57 -08:00
Linus Torvalds
ad952db4a8 Merge tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:

 - Restore clearing of MSR[RI] at interrupt/syscall exit on 32-bit

 - Fix unpaired stwcx on interrupt exit on 32-bit

 - Fix race condition leading to double list-add in
   mac_hid_toggle_emumouse()

 - Fix mprotect on book3s 32-bit

 - Fix SLB multihit issue during SLB preload with 64-bit hash MMU

 - Add support for crashkernel CMA reservation

 - Add die_id and die_cpumask for Power10 & later to expose chip
   hemispheres

 - A series of minor fixes and improvements to the hash SLB code

Thanks to Antonio Alvarez Feijoo, Ben Collins, Bhaskar Chowdhury,
Christophe Leroy, Daniel Thompson, Dave Vasilevsky, Donet Tom,
J. Neuschäfer, Kunwu Chan, Long Li, Naresh Kamboju, Nathan Chancellor,
Ritesh Harjani (IBM), Shirisha G, Shrikanth Hegde, Sourabh Jain, Srikar
Dronamraju, Stephen Rothwell, Thomas Zimmermann, Venkat Rao Bagalkote,
and Vishal Chourasia.

* tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (32 commits)
  macintosh/via-pmu-backlight: Include <linux/fb.h> and <linux/of.h>
  powerpc/powermac: backlight: Include <linux/of.h>
  powerpc/64s/slb: Add no_slb_preload early cmdline param
  powerpc/64s/slb: Make preload_add return type as void
  powerpc/ptdump: Dump PXX level info for kernel_page_tables
  powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash
  powerpc/64s/hash: Update directMap page counters for Hash
  powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU
  powerpc/64s/hash: Improve hash mmu printk messages
  powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize()
  powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format
  powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit
  powerpc/64s/slb: Fix SLB multihit issue during SLB preload
  powerpc, mm: Fix mprotect on book3s 32-bit
  powerpc/smp: Expose die_id and die_cpumask
  powerpc/83xx: Add a null pointer check to mcu_gpiochip_add
  arch:powerpc:tools This file was missing shebang line, so added it
  kexec: Include kernel-end even without crashkernel
  powerpc: p2020: Rename wdt@ nodes to watchdog@
  powerpc: 86xx: Rename wdt@ nodes to watchdog@
  ...
2025-12-05 16:18:21 -08:00
Christian Brauner
87c9e88ac4 ovl: pass original credentials, not mounter credentials during create
When creating new files the security layer expects the original
credentials to be passed.  When cleaning up the code this was accidently
changed to pass the mounter's credentials by relying on current->cred
which is already overriden at this point.  Pass the original credentials
directly.

Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Reported-by: Paul Moore <paul@paul-moore.com>
Fixes: e566bff963 ("ovl: port ovl_create_or_link() to new ovl_override_creator_creds")
Link: https://lore.kernel.org/CAFqZXNvL1ciLXMhHrnoyBmQu1PAApH41LkSWEhrcvzAAbFij8Q@mail.gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Tested-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-05 16:16:20 -08:00
Linus Torvalds
4b9d25b4d3 Merge tag 'vfs-6.19-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Fix a type conversion bug in the ipc subsystem

 - Fix per-dentry timeout warning in autofs

 - Drop the fd conversion from sockets

 - Move assert from iput_not_last() to iput()

 - Fix reversed check in filesystems_freeze_callback()

 - Use proper uapi types for new struct delegation definitions

* tag 'vfs-6.19-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: use UAPI types for new struct delegation definition
  mqueue: correct the type of ro to int
  Revert "net/socket: convert sock_map_fd() to FD_ADD()"
  autofs: fix per-dentry timeout warning
  fs: assert on I_FREEING not being set in iput() and iput_not_last()
  fs: PM: Fix reverse check in filesystems_freeze_callback()
2025-12-05 15:52:30 -08:00
Linus Torvalds
e40e023591 Merge tag 'exfat-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:

 - Fix a remount failure caused by differing process masks by inheriting
   the original mount options during the remount process

 - Fix a potential divide-by-zero error and system crash in
   exfat_allocate_bitmap that occurred when the readahead count was zero

 - Add validation for directory cluster bitmap bits to prevent directory
   and root cluster from being incorrectly zeroed out on corrupted
   images

 - Clear the post-EOF page cache when extending a file to prevent stale
   mmap data from becoming visible, addressing an generic/363 failure

 - Fix a reference count leak in exfat_find by properly releasing the
   dentry set in specific error paths

* tag 'exfat-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix remount failure in different process environments
  exfat: fix divide-by-zero in exfat_allocate_bitmap
  exfat: validate the cluster bitmap bits of directory
  exfat: zero out post-EOF page cache on file extension
  exfat: fix refcount leak in exfat_find
2025-12-05 15:48:09 -08:00
Linus Torvalds
4b6b432128 Merge tag 'fuse-update-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:

 - Add mechanism for cleaning out unused, stale dentries; controlled via
   a module option (Luis Henriques)

 - Fix various bugs

 - Cleanups

* tag 'fuse-update-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: Uninitialized variable in fuse_epoch_work()
  fuse: fix io-uring list corruption for terminated non-committed requests
  fuse: signal that a fuse inode should exhibit local fs behaviors
  fuse: Always flush the page cache before FOPEN_DIRECT_IO write
  fuse: Invalidate the page cache after FOPEN_DIRECT_IO write
  fuse: rename 'namelen' to 'namesize'
  fuse: use strscpy instead of strcpy
  fuse: refactor fuse_conn_put() to remove negative logic.
  fuse: new work queue to invalidate dentries from old epochs
  fuse: new work queue to periodically invalidate expired dentries
  dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused()
  fuse: add WARN_ON and comment for RCU revalidate
  fuse: Fix whitespace for fuse_uring_args_to_ring() comment
  fuse: missing copy_finish in fuse-over-io-uring argument copies
  fuse: fix readahead reclaim deadlock
2025-12-05 15:25:13 -08:00
Linus Torvalds
7cd122b552 Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull persistent dentry infrastructure and conversion from Al Viro:
 "Some filesystems use a kinda-sorta controlled dentry refcount leak to
  pin dentries of created objects in dcache (and undo it when removing
  those). A reference is grabbed and not released, but it's not actually
  _stored_ anywhere.

  That works, but it's hard to follow and verify; among other things, we
  have no way to tell _which_ of the increments is intended to be an
  unpaired one. Worse, on removal we need to decide whether the
  reference had already been dropped, which can be non-trivial if that
  removal is on umount and we need to figure out if this dentry is
  pinned due to e.g. unlink() not done. Usually that is handled by using
  kill_litter_super() as ->kill_sb(), but there are open-coded special
  cases of the same (consider e.g. /proc/self).

  Things get simpler if we introduce a new dentry flag
  (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
  claims responsibility for +1 in refcount.

  The end result this series is aiming for:

   - get these unbalanced dget() and dput() replaced with new primitives
     that would, in addition to adjusting refcount, set and clear
     persistency flag.

   - instead of having kill_litter_super() mess with removing the
     remaining "leaked" references (e.g. for all tmpfs files that hadn't
     been removed prior to umount), have the regular
     shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
     dropping the corresponding reference if it had been set. After that
     kill_litter_super() becomes an equivalent of kill_anon_super().

  Doing that in a single step is not feasible - it would affect too many
  places in too many filesystems. It has to be split into a series.

  This work has really started early in 2024; quite a few preliminary
  pieces have already gone into mainline. This chunk is finally getting
  to the meat of that stuff - infrastructure and most of the conversions
  to it.

  Some pieces are still sitting in the local branches, but the bulk of
  that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-12-05 14:36:21 -08:00
Linus Torvalds
7203ca412f Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

  "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
     Rework the vmalloc() code to support non-blocking allocations
     (GFP_ATOIC, GFP_NOWAIT)

  "ksm: fix exec/fork inheritance" (xu xin)
     Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
     inherited across fork/exec

  "mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
     Some light maintenance work on the zswap code

  "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
     Enhance the /sys/kernel/debug/page_owner debug feature by adding
     unique identifiers to differentiate the various stack traces so
     that userspace monitoring tools can better match stack traces over
     time

  "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
     Minor alterations to the page allocator's per-cpu-pages feature

  "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
     Address a scalability issue in userfaultfd's UFFDIO_MOVE operation

  "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)

  "drivers/base/node: fold node register and unregister functions" (Donet Tom)
     Clean up the NUMA node handling code a little

  "mm: some optimizations for prot numa" (Kefeng Wang)
     Cleanups and small optimizations to the NUMA allocation hinting
     code

  "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
     Address long lock hold times at boot on large machines. These were
     causing (harmless) softlockup warnings

  "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
     Remove some now-unnecessary work from page reclaim

  "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
     Enhance the DAMOS auto-tuning feature

  "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
     Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
     configuration

  "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
     Enhance the new(ish) file_operations.mmap_prepare() method and port
     additional callsites from the old ->mmap() over to ->mmap_prepare()

  "Fix stale IOTLB entries for kernel address space" (Lu Baolu)
     Fix a bug (and possible security issue on non-x86) in the IOMMU
     code. In some situations the IOMMU could be left hanging onto a
     stale kernel pagetable entry

  "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
     Clean up and optimize the folio splitting code

  "mm, swap: misc cleanup and bugfix" (Kairui Song)
     Some cleanups and a minor fix in the swap discard code

  "mm/damon: misc documentation fixups" (SeongJae Park)

  "mm/damon: support pin-point targets removal" (SeongJae Park)
     Permit userspace to remove a specific monitoring target in the
     middle of the current targets list

  "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
     A couple of cleanups related to mm header file inclusion

  "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
     improve the selection of swap devices for NUMA machines

  "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
     Change the memory block labels from macros to enums so they will
     appear in kernel debug info

  "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
     Address an inefficiency when KSM unmerges an address range

  "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
     Fix leaks and unhandled malloc() failures in DAMON userspace unit
     tests

  "some cleanups for pageout()" (Baolin Wang)
     Clean up a couple of minor things in the page scanner's
     writeback-for-eviction code

  "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
     Move hugetlb's sysfs/sysctl handling code into a new file

  "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
     Make the VMA guard regions available in /proc/pid/smaps and
     improves the mergeability of guarded VMAs

  "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
     Reduce mmap lock contention for callers performing VMA guard region
     operations

  "vma_start_write_killable" (Matthew Wilcox)
     Start work on permitting applications to be killed when they are
     waiting on a read_lock on the VMA lock

  "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
     Add additional userspace testing of DAMON's "commit" feature

  "mm/damon: misc cleanups" (SeongJae Park)

  "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
     Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
     VMA is merged with another

  "mm: support device-private THP" (Balbir Singh)
     Introduce support for Transparent Huge Page (THP) migration in zone
     device-private memory

  "Optimize folio split in memory failure" (Zi Yan)

  "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
     Some more cleanups in the folio splitting code

  "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
     Clean up our handling of pagetable leaf entries by introducing the
     concept of 'software leaf entries', of type softleaf_t

  "reparent the THP split queue" (Muchun Song)
     Reparent the THP split queue to its parent memcg. This is in
     preparation for addressing the long-standing "dying memcg" problem,
     wherein dead memcg's linger for too long, consuming memory
     resources

  "unify PMD scan results and remove redundant cleanup" (Wei Yang)
     A little cleanup in the hugepage collapse code

  "zram: introduce writeback bio batching" (Sergey Senozhatsky)
     Improve zram writeback efficiency by introducing batched bio
     writeback support

  "memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
     Clean up our handling of the interrupt safety of some memcg stats

  "make vmalloc gfp flags usage more apparent" (Vishal Moola)
     Clean up vmalloc's handling of incoming GFP flags

  "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
     Teach soft dirty and userfaultfd write protect tracking to use
     RISC-V's Svrsw60t59b extension

  "mm: swap: small fixes and comment cleanups" (Youngjun Park)
     Fix a small bug and clean up some of the swap code

  "initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
     Start work on converting the vma struct's flags to a bitmap, so we
     stop running out of them, especially on 32-bit

  "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
     Address a possible bug in the swap discard code and clean things
     up a little

[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
  register device memory for poison handling") because it looks
  broken to me, I've asked for clarification   - Linus ]

* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm: fix vma_start_write_killable() signal handling
  mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
  mm/swapfile: fix list iteration when next node is removed during discard
  fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
  mm/kfence: add reboot notifier to disable KFENCE on shutdown
  memcg: remove inc/dec_lruvec_kmem_state helpers
  selftests/mm/uffd: initialize char variable to Null
  mm: fix DEBUG_RODATA_TEST indentation in Kconfig
  mm: introduce VMA flags bitmap type
  tools/testing/vma: eliminate dependency on vma->__vm_flags
  mm: simplify and rename mm flags function for clarity
  mm: declare VMA flags by bit
  zram: fix a spelling mistake
  mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
  mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
  pagemap: update BUDDY flag documentation
  mm: swap: remove scan_swap_map_slots() references from comments
  mm: swap: change swap_alloc_slow() to void
  mm, swap: remove redundant comment for read_swap_cache_async
  mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
  ...
2025-12-05 13:52:43 -08:00
Linus Torvalds
ac20755937 Merge tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:

 - Move jiffies converters out of kernel/sysctl.c

   Move the jiffies converters into kernel/time/jiffies.c and replace
   the pipe-max-size proc_handler converter with a macro based version.
   This is all part of the effort to relocate non-sysctl logic out of
   kernel/sysctl.c into more relevant subsystems. No functional changes.

 - Generalize proc handler converter creation

   Remove duplicated sysctl converter logic by consolidating it in
   macros. These are used inside sysctl core as well as in pipe.c and
   jiffies.c. Converter kernel and user space pointer args are now
   automatically const qualified for the convenience of the caller. No
   functional changes.

 - Miscellaneous

   Fix kernel-doc format warnings, remove unnecessary __user
   qualifiers, and move the nmi_watchdog sysctl into .rodata.

 - Testing

   This series was run through sysctl selftests/kunit test suite in
   x86_64. It went into linux-next after rc2, giving it a good 4/5 weeks
   of testing.

* tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (21 commits)
  sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv
  sysctl: Create pipe-max-size converter using sysctl UINT macros
  sysctl: Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c
  sysctl: Move jiffies converters to kernel/time/jiffies.c
  sysctl: Move UINT converter macros to sysctl header
  sysctl: Move INT converter macros to sysctl header
  sysctl: Allow custom converters from outside sysctl
  sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument
  sysctl: Create macro for user-to-kernel uint converter
  sysctl: Add optional range checking to SYSCTL_UINT_CONV_CUSTOM
  sysctl: Create unsigned int converter using new macro
  sysctl: Add optional range checking to SYSCTL_INT_CONV_CUSTOM
  sysctl: Create integer converters with one macro
  sysctl: Create converter functions with two new macros
  sysctl: Discriminate between kernel and user converter params
  sysctl: Indicate the direction of operation with macro names
  sysctl: Remove superfluous __do_proc_* indirection
  sysctl: Remove superfluous tbl_data param from "dovec" functions
  sysctl: Replace void pointer with const pointer to ctl_table
  sysctl: fix kernel-doc format warning
  ...
2025-12-05 11:15:37 -08:00
Arnd Bergmann
79edb7f596 Merge tag 'samsung-dt-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/dt
Samsung DTS ARM changes for v6.19

Fix WiFi on Exynos4210 and Exynos4412 boards with Broadcom chip after
system suspend and resume, by using cap-power-off-card to power off the
WiFi during suspend.

* tag 'samsung-dt-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  ARM: dts: samsung: exynos4412-midas: turn off SDIO WLAN chip during system suspend
  ARM: dts: samsung: exynos4210-trats: turn off SDIO WLAN chip during system suspend
  ARM: dts: samsung: exynos4210-i9100: turn off SDIO WLAN chip during system suspend
  ARM: dts: samsung: universal_c210: turn off SDIO WLAN chip during system suspend

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-12-05 15:32:21 +01:00
Arnd Bergmann
68f9bbf4df Merge tag 'samsung-drivers-6.19-2-late' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers-late
Samsung SoC drivers for v6.19, part two

Two fixes for Exynos PMU (Power Management Unit) driver:

1. Silence lockdep warning being actually a false positive, but quite
   disturbing during testing.  Issue was introduced in v6.18.

2. Drop device refcount when requesting device regmap with
   exynos_get_pmu_regmap_by_phandle().  Issue was introduced much
   earlier (around v6.9), with code being rewritten in between.

* tag 'samsung-drivers-6.19-2-late' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  soc: samsung: exynos-pmu: fix device leak on regmap lookup
  soc: samsung: exynos-pmu: Fix structure initialization

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-12-05 15:31:33 +01:00
Arnd Bergmann
3ce8f4a501 ARM: omap1: avoid symbol clashes in fiq handler
The ams-delta-fiq-handler.S file has a number of symbols with fairly
generic names, including one named 'exit' that causes a compiler warning
in some configuration options:

vmlinux.o: error: exit() function name creates ambiguity with -ffunction-sections

Change all these symbols to use a .L prefix to make them local to
the fiq handler.

Reviewed-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Link: https://lore.kernel.org/r/20251204095355.1032786-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-12-05 15:25:13 +01:00
Thomas Weißschuh
fe93446b5e vfs: use UAPI types for new struct delegation definition
Using libc types and headers from the UAPI headers is problematic as it
introduces a dependency on a full C toolchain.

Use the fixed-width integer types provided by the UAPI headers instead.

Fixes: 1602bad16d ("vfs: expose delegation support to userland")
Fixes: 4be9e04ebf ("vfs: add needed headers for new struct delegation definition")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20251203-uapi-fcntl-v1-1-490c67bf3425@linutronix.de
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-05 13:57:39 +01:00
Edward Adam Davis
8cf01d0c43 mqueue: correct the type of ro to int
The ro variable, being of type bool, caused the -EROFS return value from
mnt_want_write() to be implicitly converted to 1. This prevented the file
from being correctly acquired, thus triggering the issue reported by
syzbot [1].

Changing the type of ro to int allows the system to correctly identify
the reason for the file open failure.

[1]
KASAN: null-ptr-deref in range [0x0000000000000040-0x0000000000000047]
Call Trace:
 do_mq_open+0x5a0/0x770 ipc/mqueue.c:932
 __do_sys_mq_open ipc/mqueue.c:945 [inline]
 __se_sys_mq_open ipc/mqueue.c:938 [inline]
 __x64_sys_mq_open+0x16a/0x1c0 ipc/mqueue.c:938

Fixes: f2573685bd ("ipc: convert do_mq_open() to FD_ADD()")
Reported-by: syzbot+40f42779048f7476e2e0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=40f42779048f7476e2e0
Tested-by: syzbot+40f42779048f7476e2e0@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Link: https://patch.msgid.link/tencent_369728EA76ED36CD98793A6D942C956C4C0A@qq.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-05 13:57:39 +01:00
Christian Brauner
afb9917d9b Revert "net/socket: convert sock_map_fd() to FD_ADD()"
This reverts commit 245f0d1c62.

When allocating a file sock_alloc_file() consumes the socket reference
unconditionally which isn't correctly handled in the conversion. This
can be fixed by massaging this appropriately but this is best left for
next cycle.

Reported-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/CADvbK_ewub4ZZK-tZg8GBQbDFHWhd9a48C+AFXZ93pMsssCrUg@mail.gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-05 13:57:39 +01:00
Ian Kent
b6cb3ccef6 autofs: fix per-dentry timeout warning
The check that determines if the message that warns about the per-dentry
timeout being greater than the super block timeout is not correct.

The initial value for this field is -1 and the type of the field is
unsigned long.

I could change the type to long but the message is in the wrong place
too, it should come after the timeout setting. So leave everything else
as it is and move the message and check the timeout is actually set
as an additional condition on issuing the message. Also fix the timeout
comparison.

Signed-off-by: Ian Kent <raven@themaw.net>
Link: https://patch.msgid.link/20251111060439.19593-2-raven@themaw.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-04 09:57:51 +01:00
Mateusz Guzik
aa8aba61d4 fs: assert on I_FREEING not being set in iput() and iput_not_last()
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251201132037.22835-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-03 11:14:50 +01:00
Rafael J. Wysocki
222047f68e fs: PM: Fix reverse check in filesystems_freeze_callback()
The freeze_all_ptr check in filesystems_freeze_callback() introduced by
commit a3f8f86627 ("power: always freeze efivarfs") is reverse which
quite confusingly causes all file systems to be frozen when
filesystem_freeze_enabled is false.

On my systems it causes the WARN_ON_ONCE() in __set_task_frozen() to
trigger, most likely due to an attempt to freeze a file system that is
not ready for that.

Add a logical negation to the check in question to reverse it as
appropriate.

Fixes: a3f8f86627 ("power: always freeze efivarfs")
Cc: 6.18+ <stable@vger.kernel.org> # 6.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12788397.O9o76ZdvQC@rafael.j.wysocki
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-03 11:10:12 +01:00
Yuezhang Mo
51fc7b4ce1 exfat: fix remount failure in different process environments
The kernel test robot reported that the exFAT remount operation
failed. The reason for the failure was that the process's umask
is different between mount and remount, causing fs_fmask and
fs_dmask are changed.

Potentially, both gid and uid may also be changed. Therefore, when
initializing fs_context for remount, inherit these mount options
from the options used during mount.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202511251637.81670f5c-lkp@intel.com
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:17 +09:00
Namjae Jeon
d70a5804c5 exfat: fix divide-by-zero in exfat_allocate_bitmap
The variable max_ra_count can be 0 in exfat_allocate_bitmap(),
which causes a divide-by-zero error in the subsequent modulo operation
(i % max_ra_count), leading to a system crash.
When max_ra_count is 0, it means that readahead is not used. This patch
load the bitmap without readahead.

Fixes: 9fd688678d ("exfat: optimize allocation bitmap loading time")
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:16 +09:00
Namjae Jeon
866cba3675 exfat: validate the cluster bitmap bits of directory
Syzbot created this issue by testing an image that did not have the root
cluster bitmap bit marked. After accessing a file through the root
directory via exfat_lookup, when creating a file again with mkdir,
the root cluster bit can be allocated for direcotry, which can cause
the root cluster to be zeroed out and the same entry can be allocated
in the same cluster. This patch improved this issue by adding
exfat_test_bitmap to validate the cluster bits of the root directory
and directory. And the first cluster bit of the root directory should
never be unset except when storage is corrupted. This bit is set to
allow operations after mount.

Reported-by: syzbot+5216036fc59c43d1ee02@syzkaller.appspotmail.com
Tested-by: syzbot+5216036fc59c43d1ee02@syzkaller.appspotmail.com
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:16 +09:00
Yuezhang Mo
4e163c39dd exfat: zero out post-EOF page cache on file extension
xfstests generic/363 was failing due to unzeroed post-EOF page
cache that allowed mmap writes beyond EOF to become visible
after file extension.

For example, in following xfs_io sequence, 0x22 should not be
written to the file but would become visible after the extension:

  xfs_io -f -t -c "pwrite -S 0x11 0 8" \
    -c "mmap 0 4096" \
    -c "mwrite -S 0x22 32 32" \
    -c "munmap" \
    -c "pwrite -S 0x33 512 32" \
    $testfile

This violates the expected behavior where writes beyond EOF via
mmap should not persist after the file is extended. Instead, the
extended region should contain zeros.

Fix this by using truncate_pagecache() to truncate the page cache
after the current EOF when extending the file.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:16 +09:00
Shuhao Fu
9aee8de970 exfat: fix refcount leak in exfat_find
Fix refcount leaks in `exfat_find` related to `exfat_get_dentry_set`.

Function `exfat_get_dentry_set` would increase the reference counter of
`es->bh` on success. Therefore, `exfat_put_dentry_set` must be called
after `exfat_get_dentry_set` to ensure refcount consistency. This patch
relocate two checks to avoid possible leaks.

Fixes: 82ebecdc74 ("exfat: fix improper check of dentry.stream.valid_size")
Fixes: 13940cef95 ("exfat: add a check for invalid data size")
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:16 +09:00
Paolo Bonzini
e0c26d47de Merge tag 'kvm-s390-next-6.19-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- SCA rework
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
2025-12-02 18:58:47 +01:00
Paolo Bonzini
f58e70cc31 Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.19

 - Support for userspace handling of synchronous external aborts (SEAs),
   allowing the VMM to potentially handle the abort in a non-fatal
   manner.

 - Large rework of the VGIC's list register handling with the goal of
   supporting more active/pending IRQs than available list registers in
   hardware. In addition, the VGIC now supports EOImode==1 style
   deactivations for IRQs which may occur on a separate vCPU than the
   one that acked the IRQ.

 - Support for FEAT_XNX (user / privileged execute permissions) and
   FEAT_HAF (hardware update to the Access Flag) in the software page
   table walkers and shadow MMU.

 - Allow page table destruction to reschedule, fixing long need_resched
   latencies observed when destroying a large VM.

 - Minor fixes to KVM and selftests
2025-12-02 18:36:26 +01:00
Paolo Bonzini
63a9b0bc65 Merge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.19

- SBI MPXY support for KVM guest
- New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel
  AIA virtualization fails to allocate IMSIC VS-file
- Support enabling dirty log gradually in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
2025-12-02 18:35:25 +01:00
Paolo Bonzini
8040280405 Merge tag 'loongarch-kvm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.19

1. Get VM PMU capability from HW GCFG register.
2. Add AVEC basic support.
3. Use 64-bit register definition for EIOINTC.
4. Add KVM timer test cases for tools/selftests.
2025-12-02 18:34:22 +01:00
Thomas Zimmermann
9b36c7fc5a macintosh/via-pmu-backlight: Include <linux/fb.h> and <linux/of.h>
Include <linux/fb.h> and <linux/of.h> to avoid dependency on backlight
header to include them. Declares of_machine_is_compatible() and defines
FB_BACKLIGHT_MAX.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 243ce64b2b ("backlight: Do not include <linux/fb.h> in header file")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
Closes: https://lore.kernel.org/linuxppc-dev/CA+G9fYs8fn5URQx2+s2oNxdUgZkSrdLC0P1tNBW_n-6BaBkK2Q@mail.gmail.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2025-12-02 21:55:07 +11:00
Thomas Zimmermann
8d39832496 powerpc/powermac: backlight: Include <linux/of.h>
Include <linux/of.h> to avoid dependency on backlight header to include
it. Declares of_find_node_by_name(), of_property_match_string() and
of_node_put().

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 243ce64b2b ("backlight: Do not include <linux/fb.h> in header file")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
Closes: https://lore.kernel.org/linuxppc-dev/CA+G9fYs8fn5URQx2+s2oNxdUgZkSrdLC0P1tNBW_n-6BaBkK2Q@mail.gmail.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2025-12-02 21:54:59 +11:00
Christophe Leroy (CS GROUP)
a3209bb94b um: Disable KASAN_INLINE when STATIC_LINK is selected
um doesn't support KASAN_INLINE together with STATIC_LINK.

Instead of failing the build, disable KASAN_INLINE when
STATIC_LINK is selected.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511290451.x9GZVJ1l-lkp@intel.com/
Fixes: 1e338f4d99 ("kasan: introduce ARCH_DEFER_KASAN and unify static key across modes")
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://patch.msgid.link/2620ab0bbba640b6237c50b9c0dca1c7d1142f5d.1764410067.git.chleroy@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-12-01 11:54:50 +01:00
Oliver Upton
3eef0c83c3 Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next
* kvm-arm64/nv-xnx-haf: (22 commits)
  : Support for FEAT_XNX and FEAT_HAF in nested
  :
  : Add support for a couple of MMU-related features that weren't
  : implemented by KVM's software page table walk:
  :
  :  - FEAT_XNX: Allows the hypervisor to describe execute permissions
  :    separately for EL0 and EL1
  :
  :  - FEAT_HAF: Hardware update of the Access Flag, which in the context of
  :    nested means software walkers must also set the Access Flag.
  :
  : The series also adds some basic support for testing KVM's emulation of
  : the AT instruction, including the implementation detail that AT sets the
  : Access Flag in KVM.
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
  ...

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:41 -08:00
Oliver Upton
938309b028 Merge branch 'kvm-arm64/vgic-lr-overflow' into kvmarm/next
* kvm-arm64/vgic-lr-overflow: (50 commits)
  : Support for VGIC LR overflows, courtesy of Marc Zyngier
  :
  : Address deficiencies in KVM's GIC emulation when a vCPU has more active
  : IRQs than can be represented in the VGIC list registers. Sort the AP
  : list to prioritize inactive and pending IRQs, potentially spilling
  : active IRQs outside of the LRs.
  :
  : Handle deactivation of IRQs outside of the LRs for both EOImode=0/1,
  : which involves special consideration for SPIs being deactivated from a
  : different vCPU than the one that acked it.
  KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
  KVM: arm64: selftests: vgic_irq: Add timer deactivation test
  KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
  KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
  KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
  KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
  KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
  KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
  KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
  KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
  KVM: arm64: selftests: gic_v3: Add irq group setting helper
  KVM: arm64: GICv2: Always trap GICV_DIR register
  KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
  KVM: arm64: GICv2: Handle LR overflow when EOImode==0
  KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
  KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
  KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
  KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
  KVM: arm64: GICv3: Handle in-LR deactivation when possible
  KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
  ...

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:32 -08:00
Oliver Upton
11b8e6edc1 Merge branch 'kvm-arm64/sea-user' into kvmarm/next
* kvm-arm64/sea-user:
  : Userspace handling of SEAs, courtesy of Jiaqi Yan
  :
  : Add support for processing external aborts in userspace in situations
  : where the host has failed to do so, allowing the VMM to potentially
  : reinject an external abort into the VM.
  Documentation: kvm: new UAPI for handling SEA
  KVM: selftests: Test for KVM_EXIT_ARM_SEA
  KVM: arm64: VM exit to userspace to handle SEA

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:20 -08:00
Oliver Upton
404c2027d5 Merge branch 'kvm-arm64/misc' into kvmarm/next
* kvm-arm64/misc:
  : Miscellaneous fixes/cleanups for KVM/arm64
  :
  :  - Fix for need_resched warnings on non-preemptible kernels when
  :    tearing down a VM's stage-2
  :
  :  - Improvements to KVM struct allocation, getting rid of pointless
  :    __GFP_HIGHMEM and switching to kvzalloc()
  :
  :  - SYNC ITS configuration before injecting LPIs in vgic_lpi_stress
  :    selftest
  KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
  KVM: arm64: Split kvm_pgtable_stage2_destroy()
  KVM: arm64: Only drop references on empty tables in stage2_free_walker
  KVM: selftests: SYNC after guest ITS setup in vgic_lpi_stress
  KVM: selftests: Assert GICR_TYPER.Processor_Number matches selftest CPU number
  KVM: arm64: Use kvzalloc() for kvm struct allocation
  KVM: arm64: Drop useless __GFP_HIGHMEM from kvm struct allocation

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:12 -08:00
Alexandru Elisei
d52aca1635 KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
A guest can write 1 to TCR_ELx.HA, making the KVM software walker update
the access flag in a table descriptor even if FEAT_HAFDBS is not present.
Avoid this by making wi->ha depend on FEAT_HAFDBS being enabled in the VM,
similar to how the software walker treats FEAT_HPDS.

This is not needed for VTCR_EL2.HA, since a guest will always write to
the in-memory copy of the register, where the HA bit is masked (set to
0) by KVM if the VM doesn't have FEAT_HAFDBS.

Fixes: c59ca4b5b0c3 ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://msgid.link/20251128100946.74210-5-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:55 -08:00
Alexandru Elisei
e88d60c0aa KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
According to ARM DDI 0487L.b, the HA bit in TCR_EL2 when the translation
regime is EL2 (or !ELIsInHost(EL2)) is bit 21, not 39.

Fixes: c59ca4b5b0c3 ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://msgid.link/20251128100946.74210-3-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:03 -08:00
Alexandru Elisei
93e8d99781 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
Commit 2608563b46 ("KVM: arm64: Add support for FEAT_XNX stage-2
permissions") added the KVM_PGTABLE_PROX_{UX,PX} permissions to stage 2 and
to EL2 translation regimes, but left them undocumented. Let's fix that.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://msgid.link/20251128100946.74210-2-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:03 -08:00
Colin Ian King
05474b7bc7 KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
There is a spelling mistake in a TEST_FAIL message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://msgid.link/20251128175124.319094-1-colin.i.king@gmail.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Nathan Chancellor
d98a04dc19 KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
Clang warns (or errors with CONFIG_WERROR=y / W=e):

  arch/arm64/kvm/hyp/pgtable.c:757:2: error: label at end of compound statement is a C23 extension [-Werror,-Wc23-extensions]
    757 |         }
        |         ^

With older versions of clang (15 and older) and GCC (at least the minimum
supported, 8.1), this is an unconditional hard error:

  arch/arm64/kvm/hyp/pgtable.c: In function 'kvm_pgtable_stage2_pte_prot':
  arch/arm64/kvm/hyp/pgtable.c:756:2: error: label at end of compound statement
    default:
    ^~~~~~~

  arch/arm64/kvm/hyp/pgtable.c:756:10: error: label at end of compound statement: expected statement
          default:
                  ^
                   ;

Add a break statement to this default case to clear up the error/warning.

Fixes: 2608563b46 ("KVM: arm64: Add support for FEAT_XNX stage-2 permissions")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251125-arm64-kvm-hyp-pgtable-fix-c23-ext-warn-v1-1-98b506ddefbf@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Marc Zyngier
b0fc8329ec KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
Keep sparse quiet by explicitly casting endianness conversion
when swapping S1 and S2 descriptors.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511260246.JQDGsQKa-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511260344.9XehvH5Q-lkp@intel.com/
Fixes: c59ca4b5b0c3f ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW")
Fixes: 39db933ba67f8 ("KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251125204848.1136383-1-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
36fe022f88 KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
__lse_swap_desc() is compiled unconditionally, even if LSE is disabled
using the config option. Align with the spirit of the config option and
fix some build errors due to __LSE_PREAMBLE being undefined with the
application of some ifdeffery.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511250700.kAutzJFm-lkp@intel.com/
Link: https://msgid.link/20251124235409.1731253-1-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
66f1888583 KVM: arm64: selftests: Add test for AT emulation
Add a basic test for AT emulation in the EL2&0 and EL1&0 translation
regimes.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-16-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
d5bbb76f44 KVM: arm64: nv: Expose hardware access flag management to NV guests
Everything is in place to update the access flag at S1 and S2. Expose
support for the access flag flavor of FEAT_HAFDBS to NV guests.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-15-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
e4c7dfac2f KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
Give the stage-2 walk similar treatment to stage-1: update the access
flag during the table walk and do so for any walk context.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-14-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
bff8aa213d KVM: arm64: Implement HW access flag management in stage-1 SW PTW
Atomically update the Access flag at stage-1 when the guest has
configured the MMU to do so. Make the implementation choice (and liberal
interpretation of speculation) that any access type updates the Access
flag, including AT and CMO instructions.

Restart the entire walk by returning to the exception-generating
instruction in the case of a failed Access flag update.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-13-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
92c6443222 KVM: arm64: Propagate PTW errors up to AT emulation
KVM's software PTW will soon support 'hardware' updates to the access
flag. Similar to fault handling, races to update the descriptor will be
handled by restarting the instruction. Prepare for this by propagating
errors up to the AT emulation, only retiring the instruction if the walk
succeeds.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-12-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
f6927b41d5 KVM: arm64: Add helper for swapping guest descriptor
Implementing FEAT_HAFDBS in KVM's software PTWs requires the ability to
CAS a descriptor to update the in-memory value. Add an accessor to do
exactly that, coping with the fact that guest descriptors are in user
memory (duh).

While FEAT_LSE required on any system that implements NV, KVM now uses
the stage-1 PTW for non-nested use cases meaning an LL/SC implementation
is necessary as well.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-11-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
590e694820 KVM: arm64: nv: Use pgtable definitions in stage-2 walk
Use the existing page table definitions instead of magic numbers for the
stage-2 table walk.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-10-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
fabf321cba KVM: arm64: Handle endianness in read helper for emulated PTW
Implementing FEAT_HAFDBS means adding another descriptor accessor that
needs to deal with the guest-configured endianness. Prepare by moving
the endianness handling into the read accessor and out of the main body
of the S1/S2 PTWs.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-9-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
977d1bf15c KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
The stage-2 table walker passes down the vCPU as a void pointer. That
might've made sense if the walker was generic although at this point it
is clear this will only ever be used in the context of a vCPU.

Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-8-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:01 -08:00
Oliver Upton
cdba9da34b KVM: arm64: Call helper for reading descriptors directly
Going through a function pointer doesn't serve much purpose when there's
only one implementation.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-7-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:01 -08:00
Oliver Upton
692650bd7b KVM: arm64: nv: Advertise support for FEAT_XNX
Everything is in place to support FEAT_XNX, advertise support.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-6-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:01 -08:00
Oliver Upton
6b49f70022 KVM: arm64: Teach ptdump about FEAT_XNX permissions
Although KVM doesn't make direct use of the feature, guest hypervisors
can use FEAT_XNX which influences the permissions of the shadow stage-2.
Update ptdump to separately print the privileged and unprivileged
execute permissions.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-5-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:43:47 -08:00
Matthew Wilcox (Oracle)
faf3c92352 mm: fix vma_start_write_killable() signal handling
If we get a signal, we need to restore the vm_refcnt.  We don't think that
the refcount can actually be decremented to zero here as it requires the
VMA to be detached, and the vma_mark_detached() uses TASK_UNINTERRUPTIBLE.
However, that's a bit subtle, so handle it as if the refcount was zero at
the start of this function.

Link: https://lkml.kernel.org/r/20251128040100.3022561-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
Fixes: 2197bb60f8 ("mm: add vma_start_write_killable()")
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:11 -08:00
Youngjun Park
b60a3ef784 mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
The loop breaks immediately after finding the first swap device and
never modifies the list. Replace plist_for_each_entry_safe() with
plist_for_each_entry() and remove the unused next variable.

Link: https://lkml.kernel.org/r/20251127100303.783198-3-youngjun.park@lge.com
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:11 -08:00
Youngjun Park
f9e82f99b3 mm/swapfile: fix list iteration when next node is removed during discard
Patch series "mm/swapfile: fix and cleanup swap list iterations", v2.

This series fixes a potential list iteration issue in swap_sync_discard()
when devices are removed, and includes a cleanup for
__folio_throttle_swaprate().


This patch (of 2):

When the next node is removed from the plist (e.g.  by swapoff),
plist_del() makes the node point to itself, causing the iteration to loop
on the same entry indefinitely.

Add a plist_node_empty() check to detect this case and restart iteration,
allowing swap_sync_discard() to continue processing remaining swap devices
that still have pending discard entries.

Additionally, switch from swap_avail_lock/swap_avail_head to
swap_lock/swap_active_head so that iteration is only affected by swapoff
operations rather than frequent availability changes, reducing exceptional
condition checks and lock contention.

Link: https://lkml.kernel.org/r/20251127100303.783198-1-youngjun.park@lge.com
Link: https://lkml.kernel.org/r/20251127100303.783198-2-youngjun.park@lge.com
Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Suggested-by: Kairui Song <kasong@tencent.com>
Acked-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:10 -08:00
Lorenzo Stoakes
12f0cd3933 fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
make_uffd_wp_huge_pte() should return after handling a huge_pte_none()
pte.

Link: https://lkml.kernel.org/r/66178124-ebdf-4e23-b8ca-ed3eb8030c81@lucifer.local
Fixes: 03bfbc3ad6 ("mm: remove is_hugetlb_entry_[migration, hwpoisoned]()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Closes: https://lkml.kernel.org/r/dc483db3-be4d-45f7-8b40-a28f5d8f5738@suse.cz
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:10 -08:00
Breno Leitao
ce2bba8956 mm/kfence: add reboot notifier to disable KFENCE on shutdown
During system shutdown, KFENCE can cause IPI synchronization issues if it
remains active through the reboot process.  To prevent this, register a
reboot notifier that disables KFENCE and cancels any pending timer work
early in the shutdown sequence.

This is only necessary when CONFIG_KFENCE_STATIC_KEYS is enabled, as this
configuration sends IPIs that can interfere with shutdown.  Without static
keys, no IPIs are generated and KFENCE can safely remain active.

The notifier uses maximum priority (INT_MAX) to ensure KFENCE shuts down
before other subsystems that might still depend on stable memory
allocation behavior.

This fixes a late kexec CSD lockup[1] when kfence is trying to IPI a CPU
that is busy in a IRQ-disabled context printing characters to the console.

Link: https://lkml.kernel.org/r/20251127-kfence-v2-1-daeccb5ef9aa@debian.org
Link: https://lkml.kernel.org/r/20251126-kfence-v1-1-5a6e1d7c681c@debian.org
Link: https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu/ [1]
Fixes: 0ce20dd840 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:10 -08:00
Chen Ridong
f3b566d726 memcg: remove inc/dec_lruvec_kmem_state helpers
The dec_lruvec_kmem_state helper is unused by any caller and can be safely
removed.  Meanwhile, the inc_lruvec_kmem_state helper is only referenced
by shadow_lru_isolate, retaining these two helpers is unnecessary.  This
patch removes both helper functions to eliminate redundant code.

Link: https://lkml.kernel.org/r/20251126020435.1511637-1-chenridong@huaweicloud.com
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Lu Jialin <lujialin4@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:10 -08:00
Ankit Khushwaha
0384c8ea96 selftests/mm/uffd: initialize char variable to Null
In "uffd-stress.c" & "uffd-unit-tests.c". address of char variable having
garbage value (uninitialized) is passed to 'write' syscall triggers
warning.

	uffd-stress.c:246:39: warning: variable 'c' is uninitialized when
	passed  as a const pointer argument here
	[-Wuninitialized-const-pointer]

	uffd-unit-tests.c:581:31: warning: variable 'c' is uninitialized
	when passed as a const pointer argument here
	[-Wuninitialized-const-pointer]

so the fix is to assign char variable to '\0' to prevent writing of
garbage value.

Link: https://lkml.kernel.org/r/20251126160830.52124-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:09 -08:00
Geert Uytterhoeven
f65372cd7a mm: fix DEBUG_RODATA_TEST indentation in Kconfig
Most of the DEBUG_RODATA_TEST section is indented by four spaces instead
of the customary single TAB.

Link: https://lkml.kernel.org/r/74f39b1bffc6ed802088cb3e7d17b4c82330e8b3.1764058676.git.geert@linux-m68k.org
Fixes: 2959a5f726 ("mm: add arch-independent testcases for RODATA")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:09 -08:00
Lorenzo Stoakes
9ea35a25d5 mm: introduce VMA flags bitmap type
It is useful to transition to using a bitmap for VMA flags so we can avoid
running out of flags, especially for 32-bit kernels which are constrained
to 32 flags, necessitating some features to be limited to 64-bit kernels
only.

By doing so, we remove any constraint on the number of VMA flags moving
forwards no matter the platform and can decide in future to extend beyond
64 if required.

We start by declaring an opaque types, vma_flags_t (which resembles
mm_struct flags of type mm_flags_t), setting it to precisely the same size
as vm_flags_t, and place it in union with vm_flags in the VMA declaration.

We additionally update struct vm_area_desc equivalently placing the new
opaque type in union with vm_flags.

This change therefore does not impact the size of struct vm_area_struct or
struct vm_area_desc.

In order for the change to be iterative and to avoid impacting
performance, we designate VM_xxx declared bitmap flag values as those
which must exist in the first system word of the VMA flags bitmap.

We therefore declare vma_flags_clear_all(), vma_flags_overwrite_word(),
vma_flags_overwrite_word(), vma_flags_overwrite_word_once(),
vma_flags_set_word() and vma_flags_clear_word() in order to allow us to
update the existing vm_flags_*() functions to utilise these helpers.

This is a stepping stone towards converting users to the VMA flags bitmap
and behaves precisely as before.

By doing this, we can eliminate the existing private vma->__vm_flags field
in the vma->vm_flags union and replace it with the newly introduced opaque
type vma_flags, which we call flags so we refer to the new bitmap field as
vma->flags.

We update vma_flag_[test, set]_atomic() to account for the change also.

We adapt vm_flags_reset_once() to only clear those bits above the first
system word providing write-once semantics to the first system word (which
it is presumed the caller requires - and in all current use cases this is
so).

As we currently only specify that the VMA flags bitmap size is equal to
BITS_PER_LONG number of bits, this is a noop, but is defensive in
preparation for a future change that increases this.

We additionally update the VMA userland test declarations to implement the
same changes there.

Finally, we update the rust code to reference vma->vm_flags on update
rather than vma->__vm_flags which has been removed.  This is safe for now,
albeit it is implicitly performing a const cast.

Once we introduce flag helpers we can improve this more.

No functional change intended.

Link: https://lkml.kernel.org/r/bab179d7b153ac12f221b7d65caac2759282cfe9.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com>	[rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:09 -08:00
Lorenzo Stoakes
4c613f518f tools/testing/vma: eliminate dependency on vma->__vm_flags
The userland VMA test code relied on an internal implementation detail -
the existence of vma->__vm_flags to directly access VMA flags.  There is
no need to do so when we have the vm_flags_*() helper functions available.

This is ugly, but also a subsequent commit will eliminate this field
altogether so this will shortly become broken.

This patch has us utilise the helper functions instead.

Link: https://lkml.kernel.org/r/6275c53a6bb20743edcbe92d3e130183b47d18d0.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com>	[rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:09 -08:00
Lorenzo Stoakes
58eac97a8b mm: simplify and rename mm flags function for clarity
The __mm_flags_set_word() function is slightly ambiguous - we use 'set' to
refer to setting individual bits (such as in mm_flags_set()) but here we
use it to refer to overwriting the value altogether.

Rename it to __mm_flags_overwrite_word() to eliminate this ambiguity.

We additionally simplify the functions, eliminating unnecessary
bitmap_xxx() operations (the compiler would have optimised these out but
it's worth being as clear as we can be here).

Link: https://lkml.kernel.org/r/8f0bc556e1b90eca8ea5eba41f8d5d3f9cd7c98a.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com>	[rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:08 -08:00
Lorenzo Stoakes
2b6a3f061f mm: declare VMA flags by bit
Patch series "initial work on making VMA flags a bitmap", v3.

We are in the rather silly situation that we are running out of VMA flags
as they are currently limited to a system word in size.

This leads to absurd situations where we limit features to 64-bit
architectures only because we simply do not have the ability to add a flag
for 32-bit ones.

This is very constraining and leads to hacks or, in the worst case, simply
an inability to implement features we want for entirely arbitrary reasons.

This also of course gives us something of a Y2K type situation in mm where
we might eventually exhaust all of the VMA flags even on 64-bit systems.

This series lays the groundwork for getting away from this limitation by
establishing VMA flags as a bitmap whose size we can increase in future
beyond 64 bits if required.

This is necessarily a highly iterative process given the extensive use of
VMA flags throughout the kernel, so we start by performing basic steps.

Firstly, we declare VMA flags by bit number rather than by value,
retaining the VM_xxx fields but in terms of these newly introduced
VMA_xxx_BIT fields.

While we are here, we use sparse annotations to ensure that, when dealing
with VMA bit number parameters, we cannot be passed values which are not
declared as such - providing some useful type safety.

We then introduce an opaque VMA flag type, much like the opaque mm_struct
flag type introduced in commit bb6525f2f8 ("mm: add bitmap mm->flags
field"), which we establish in union with vma->vm_flags (but still set at
system word size meaning there is no functional or data type size change).

We update the vm_flags_xxx() helpers to use this new bitmap, introducing
sensible helpers to do so.

This series lays the foundation for further work to expand the use of
bitmap VMA flags and eventually eliminate these arbitrary restrictions.


This patch (of 4):

In order to lay the groundwork for VMA flags being a bitmap rather than a
system word in size, we need to be able to consistently refer to VMA flags
by bit number rather than value.

Take this opportunity to do so in an enum which we which is additionally
useful for tooling to extract metadata from.

This additionally makes it very clear which bits are being used for what
at a glance.

We use the VMA_ prefix for the bit values as it is logical to do so since
these reference VMAs.  We consistently suffix with _BIT to make it clear
what the values refer to.

We declare bit values even when the flags that use them would not be
enabled by config options as this is simply clearer and clearly defines
what bit numbers are used for what, at no additional cost.

We declare a sparse-bitwise type vma_flag_t which ensures that users can't
pass around invalid VMA flags by accident and prepares for future work
towards VMA flags being a bitmap where we want to ensure bit values are
type safe.

To make life easier, we declare some macro helpers - DECLARE_VMA_BIT()
allows us to avoid duplication in the enum bit number declarations (and
maintaining the sparse __bitwise attribute), and INIT_VM_FLAG() is used to
assist with declaration of flags.

Unfortunately we can't declare both in the enum, as we run into issue with
logic in the kernel requiring that flags are preprocessor definitions, and
additionally we cannot have a macro which declares another macro so we
must define each flag macro directly.

Additionally, update the VMA userland testing vma_internal.h header to
include these changes.

We also have to fix the parameters to the vma_flag_*_atomic() functions
since VMA_MAYBE_GUARD_BIT is now of type vma_flag_t and sparse will
complain otherwise.

We have to update some rather silly if-deffery found in mm/task_mmu.c
which would otherwise break.

Finally, we update the rust binding helper as now it cannot auto-detect
the flags at all.

Link: https://lkml.kernel.org/r/cover.1764064556.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/3a35e5a0bcfa00e84af24cbafc0653e74deda64a.1764064556.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com>	[rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:08 -08:00
Chu Guangqing
8f4338b114 zram: fix a spelling mistake
The spelling of the word "relases" is incorrect; it should be "releases".

Link: https://lkml.kernel.org/r/20251125020522.1913-1-chuguangqing@inspur.com
Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:08 -08:00
fujunjie
a493c7a650 mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
calculate_totalreserve_pages() currently finds the maximum
lowmem_reserve[j] for a zone by scanning the full forward range [j =
zone_idx ..  MAX_NR_ZONES).  However, for a given zone i, the
lowmem_reserve[j] array (for j > i) is naturally expected to form a
monotonically non-decreasing sequence in j, not as an implementation
detail, but as a consequence that naturally arises from the semantics of
lowmem_reserve[].

For zone "i", lowmem_reserve[j] expresses how many pages in zone i must
effectively be kept in reserve when deciding whether an allocation class
that may allocate from zones up to j is allowed to fall back into i.  It
protects less flexible allocation classes (which cannot use higher zones)
from being starved by more flexible ones.

Viewed from this semantics, it is natural to expect a partial ordering in
j: as j increases, the allocation class gains access to a strictly larger
set of fallback zones.  Therefore lowmem_reserve[j] is expected to be
monotonically non-decreasing in j: more flexible allocation classes must
not be allowed to deplete low zones more aggressively than less flexible
ones.

In other words, if lowmem_reserve[j] were ever observed to *decrease* as j
grows, that would be unexpected from the reserve semantics' point of view
and would likely indicate a semantic change or a misconfiguration.

The current implementation in setup_per_zone_lowmem_reserve() reflects
this policy by accumulating managed pages from higher zones and applying
the configured ratio, which results in a non-decreasing sequence.  This
patch makes calculate_totalreserve_pages() rely on that monotonicity
explicitly and finds the maximum reserve value by scanning backward and
stopping at the first non-zero entry.  This avoids unnecessary iteration
and reflects the conceptual model more directly.  No functional behavior
changes.

To maintain this assumption explicitly, a comment is added next to
setup_per_zone_lowmem_reserve() documenting the monotonicity expectation
and noting that calculate_totalreserve_pages() relies on it.

Link: https://lkml.kernel.org/r/tencent_EB0FED91B01B1F8B6DAEE96719C5F5797F07@qq.com
Signed-off-by: fujunjie <fujunjie1@qq.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:08 -08:00
Jiayuan Chen
3cf41edc20 mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
We have a colocation cluster used for deploying both offline and online
services simultaneously.  In this environment, we encountered a
scenario where direct memory reclamation was triggered due to kswapd
not running.

1. When applications start up, rapidly consume memory, or experience
   network traffic bursts, the kernel reaches steal_suitable_fallback(),
   which sets watermark_boost and subsequently wakes kswapd.

2. In the core logic of kswapd thread (balance_pgdat()), when reclaim is
   triggered by watermark_boost, the maximum priority is 10. Higher
   priority values mean less aggressive LRU scanning, which can result in
   no pages being reclaimed during a single scan cycle:

   if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
       raise_priority = false;

3. Additionally, many of our pods are configured with memory.low, which
   prevents memory reclamation in certain cgroups, further increasing the
   chance of failing to reclaim memory.

4. This eventually causes pgdat->kswapd_failures to continuously
   accumulate, exceeding MAX_RECLAIM_RETRIES, and consequently kswapd
   stops working.  At this point, the system's available memory is still
   significantly above the high watermark -- it's inappropriate for kswapd
   to stop under these conditions.

The final observable issue is that a brief period of rapid memory
allocation causes kswapd to stop running, ultimately triggering direct
reclaim and making the applications unresponsive.

This problem leading to direct memory reclamation has been a
long-standing issue in our production environment.  We initially held
the simple assumption that it was caused by applications allocating
memory too rapidly for kswapd to keep up with reclamation.  However,
after we began monitoring kswapd's runtime behavior, we discovered a
different pattern:

kswapd initially exhibits very aggressive activity even when there is
still considerable free memory, but it subsequently stops running
entirely, even as memory levels approach the low watermark.

In summary, both boosted watermarks and memory.low increase the
probability of kswapd operation failures.

This patch specifically addresses the scenario involving boosted
watermarks by not incrementing kswapd_failures when reclamation fails. 
A more general solution, potentially addressing memory.low or other
cases, requires further discussion.

Link: https://lkml.kernel.org/r/53de0b3ee0b822418e909db29bfa6513faff9d36@linux.dev
Link: https://lkml.kernel.org/r/20251024022711.382238-1-jiayuan.chen@linux.dev
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:07 -08:00
Andrew Donnellan
2bd1337a12 KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
Switch to using the generic infrastructure to check for and handle pending
work before transitioning into guest mode.

xfer_to_guest_mode_handle_work() does a few more things than the current
code does when deciding whether or not to exit the __vcpu_run() loop. The
exittime tests from kvm-unit-tests, in my tests, were within a few percent
compared to before this series, which is within noise tolerance.

Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
[frankja@linux.ibm.com: Removed semicolon]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-11-28 10:11:14 +01:00
Bibo Mao
0f90fa6e2e KVM: LoongArch: selftests: Add time counter test case
With time counter test, it is to verify that time count starts from 0
and always grows up then.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-28 14:49:48 +08:00
Bibo Mao
4e88240940 KVM: LoongArch: selftests: Add SW emulated timer test case
This test case setup one-shot timer and execute idle instruction
immediately to indicate giving up CPU, hypervisor will emulate SW
hrtimer and wakeup vCPU when SW hrtimer is fired.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-28 14:49:47 +08:00
Bibo Mao
df41742343 KVM: LoongArch: selftests: Add timer interrupt test case
Add timer test case based on common arch_timer code, timer interrupt
with one-shot and period mode is tested.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-28 14:49:44 +08:00
Arnd Bergmann
00de4ef9d3 Merge tag 'riscv-config-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/defconfig
RISC-V config for v6.19

Spacemit:
The Spacemit k1 wants the freescale qspi driver enabled as a module
as they appear to be rather similar IPs.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'riscv-config-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  riscv: defconfig: enable SPI_FSL_QUADSPI as a module

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 23:03:34 +01:00
Arnd Bergmann
8e2baac0f2 Merge tag 'cache-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/drivers-late
standalone cache drivers for v6.19

ccache:
Add a compatible for the pic64gx SoC. No driver change needed, as it
falls back to the PolarFire SoC.

hisi hha/generic cpu cache maintenance:
Add support for a non-architectural mechanism for invalidating memory
regions, needed for some cxl implementations on arm64 (and probably
elsewhere in the future). The HiSilicon Hydra Home Agent is the first
driver to provide this support.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'cache-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  MAINTAINERS: refer to intended file in STANDALONE CACHE CONTROLLER DRIVERS
  cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
  cache: Make top level Kconfig menu a boolean dependent on RISCV
  MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header
  arm64: Select GENERIC_CPU_CACHE_MAINTENANCE
  lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()
  memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion()
  dt-bindings: cache: sifive,ccache0: add a pic64gx compatible

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 23:00:45 +01:00
Arnd Bergmann
909752d392 Merge tag 'soc-drivers-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/drivers-late
RISC-V soc-drivers for v6.19

Microchip:
Add bindings and mfd drivers for two syscon regions on PolarFire SoC,
needed as part of a rework of the devicetree to permit supporting,
among other things, pinctrl sanely and avoiding the "new" pic64gx SoC
ever using the original incorrect clock nodes. Fiddle with the Microchip
RISC-V MAINTAINERS entry to add these drivers and avoid branding it FPGA
only.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'soc-drivers-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  MAINTAINERS: rename Microchip RISC-V entry
  MAINTAINERS: add new soc drivers to Microchip RISC-V entry
  soc: microchip: add mfd drivers for two syscon regions on PolarFire SoC
  dt-bindings: soc: microchip: document the simple-mfd syscon on PolarFire SoC

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:59:41 +01:00
Arnd Bergmann
aa3294ff41 Merge tag 'apple-soc-drivers-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux into soc/drivers-late
Apple SoC driver updates for 6.18

Two small fixes:
- mailbox: Stop leaking a reference to the mbox platform device during
  lookup
- sart: drop device reference after lookup since it's no longer used
  afterwards

Signed-off-by: Sven Peter <sven@kernel.org>

* tag 'apple-soc-drivers-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux:
  soc: apple: sart: drop device reference after lookup
  soc: apple: mailbox: fix device leak on lookup

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:58:57 +01:00
Arnd Bergmann
16cc3988cc Merge tag 'soc_fsl-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux into soc/drivers-late
FSL SOC Changes for 6.19
- A couple misc changes to fsl/qbman
- Update email address for Christophe Leroy in MAINTAINERS

* tag 'soc_fsl-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux:
  soc: fsl: qbman: use kmalloc_array() instead of kmalloc()
  soc: fsl: qbman: add WQ_PERCPU to alloc_workqueue users
  MAINTAINERS: Update email address for Christophe Leroy

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:57:51 +01:00
Arnd Bergmann
e9191aa446 Merge tag 'amlogic-drivers-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into soc/drivers-late
Amlogic Drivers for v6.19:
- Canvas device leak fix and error handling simplification
- Add more SoCs definitions
- Suport more SoCs for meson-gx-ao-secure

* tag 'amlogic-drivers-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux:
  soc: amlogic: meson-gx-socinfo: add new SoCs id
  dt-bindings: arm: amlogic: meson-gx-ao-secure: support more SoCs
  soc: amlogic: canvas: simplify lookup error handling
  soc: amlogic: canvas: fix device leak on lookup

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:57:07 +01:00
Arnd Bergmann
3d892259d1 Merge tag 'reset-gpio-for-v6.19-2' of https://git.pengutronix.de/git/pza/linux into soc/drivers-late
Reset/GPIO/swnode changes for v6.19 (v2)

* Fix chtwc_int33fe build issue since commit d7cdbbc93c ("software
  node: allow referencing firmware nodes").

* tag 'reset-gpio-for-v6.19-2' of https://git.pengutronix.de/git/pza/linux:
  platform/x86: intel: chtwc_int33fe: don't dereference swnode args

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:54:36 +01:00
Arnd Bergmann
dc567f36f9 Merge tag 'reset-gpio-for-v6.19' of https://git.pengutronix.de/git/pza/linux into soc/drivers-late
Reset/GPIO/swnode changes for v6.19

* Extend software node implementation, allowing its properties to reference
  existing firmware nodes.
* Update the GPIO property interface to use reworked swnode macros.
* Rework reset-gpio code to use GPIO lookup via swnode.
* Fix spi-cs42l43 driver to work with swnode changes.

* tag 'reset-gpio-for-v6.19' of https://git.pengutronix.de/git/pza/linux:
  reset: gpio: use software nodes to setup the GPIO lookup
  reset: gpio: convert the driver to using the auxiliary bus
  reset: make the provider of reset-gpios the parent of the reset device
  reset: order includes alphabetically in reset/core.c
  gpio: swnode: allow referencing GPIO chips by firmware nodes
  spi: cs42l43: Use actual ACPI firmware node for chip selects
  software node: allow referencing firmware nodes
  software node: increase the reference of the swnode by its fwnode
  software node: read the reference args via the fwnode API

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:53:01 +01:00
Arnd Bergmann
3aa9940035 Merge tag 'riscv-dt-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/dt
RISC-V Devicetrees for v6.19

MAINTAINERS:
There's some re-jigging of things to reduce duplication, by moving me
into the StarFive entry and my tree into the Microchip one. The
other platforms that I look after (SiFive and Canaan) are marked as Odd
Fixes to better represent their status. Nothing functionally changes.

Microchip:
Add adc and mmc nodes for the Beagle-V Fire.

SiFive:
Add pwm fans to the unmatched board.

StarFive:
Add the Orange PI RV board, another VisionFive 2 derived SBC. This
required moving a mmc related nodes out of the common file, into
<board>.dts. Yet more things moved out of the common file when the
VisionFive 2 Lite boards were added, which use the JH7110S SoC instead of
the JH7110. The difference here between SoCs is just temperature and
frequency ranges, but the boards differ enough that the pool of common
nodes decreases a little further.  There's an eMMC and an SD variant
here, that are different SKUs, bringing the total new StarFive boards to
three.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'riscv-dt-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  riscv: dts: starfive: add Orange Pi RV
  dt-bindings: riscv: starfive: add xunlong,orangepi-rv
  riscv: dts: starfive: Add VisionFive 2 Lite eMMC board device tree
  riscv: dts: starfive: Add VisionFive 2 Lite board device tree
  riscv: dts: starfive: Add common board dtsi for VisionFive 2 Lite variants
  riscv: dts: starfive: jh7110-common: Move out some nodes to the board dts
  dt-bindings: riscv: Add StarFive JH7110S SoC and VisionFive 2 Lite board
  MAINTAINERS: degrade RISC-V MISC SOC SUPPORT to Odd Fixes
  MAINTAINERS: add tree to RISC-V Microchip entry
  MAINTAINERS: remove patchwork from RISC-V MISC SOC SUPPORT
  MAINTAINERS: add Conor to StarFive entry
  riscv: dts: sifive: unmatched: Add PWM controlled fans
  riscv: dts: microchip: enable qspi adc/mmc-spi-slot on BeagleV Fire
  dts: starfive: jh7110-common: split out mmc0 reset pins from common into boards

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:49:32 +01:00
Chen-Yu Tsai
ce48af13e6 arm64: dts: mediatek: mt8195: Fix address range for JPEG decoder core 1
The base address of JPEG decoder core 1 should start at 0x10000, and
have a size of 0x10000, i.e. it is right after core 0.

Instead the core has  the same base address as core 0, and with a crazy
large size. This looks like a mixup of address and size cells when the
ranges were converted.

This causes the kernel to fail to register the second core due to sysfs
name conflicts:

    sysfs: cannot create duplicate filename '/devices/platform/soc/soc:jpeg-decoder@1a040000/1a040000.jpgdec'

Fix up the address range.

Fixes: a9eac43d03 ("arm64: dts: mediatek: mt8195: Fix ranges for jpeg enc/decoder nodes")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20251127100044.612825-1-wenst@chromium.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:48:25 +01:00
Arnd Bergmann
6038f30aa8 Merge tag 'amlogic-arm64-dt-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into soc/dt
Amlogic ARM64 DT Changes for v6.19:
- Fix the PCIe DBI memory region name
- Add ISP nodes for Amlogic C3
- Add power controller nodes for Amlogic S6/S7/S7D
- Add Pinctrl node for Amlogic A4
- Add AO Secure node for Amlogic S6/S7/S7D
- Add GPIO Interrupt node for Amlogic S6/S7/S7D
- Fix S922X cache layout
- New Boards:
  - Tanix TX9 Pro

* tag 'amlogic-arm64-dt-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux:
  arm64: dts: amlogic: meson-g12b: Fix L2 cache reference for S922X CPUs
  arm64: dts: Add gpio_intc node for Amlogic S7D SoCs
  arm64: dts: Add gpio_intc node for Amlogic S7 SoCs
  arm64: dts: Add gpio_intc node for Amlogic S6 SoCs
  arm64: dts: amlogic: s7d: add ao secure node
  arm64: dts: amlogic: s7: add ao secure node
  arm64: dts: amlogic: s6: add ao secure node
  arm64: dts: amlogic: Fix the register name of the 'DBI' region
  dts: arm64: amlogic: add a5 pinctrl node
  arm64: dts: amlogic: s7d: add power domain controller node
  arm64: dts: amlogic: s7: add power domain controller node
  arm64: dts: amlogic: s6: add power domain controller node
  dts: arm64: amlogic: Add ISP related nodes for C3
  arm64: dts: meson: add initial device-tree for Tanix TX9 Pro
  dt-bindings: arm: amlogic: add support for Tanix TX9 Pro

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27 22:46:47 +01:00
Johan Hovold
990eb9a8eb soc: samsung: exynos-pmu: fix device leak on regmap lookup
Make sure to drop the reference taken when looking up the PMU device and
its regmap.

Note that holding a reference to a device does not prevent its regmap
from going away so there is no point in keeping the reference.

Fixes: 0b7c607502 ("soc: samsung: exynos-pmu: Add regmap support for SoCs that protect PMU regs")
Cc: stable@vger.kernel.org	# 6.9
Cc: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20251121121852.16825-1-johan@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2025-11-27 16:55:38 +01:00
Marek Szyprowski
2ff147fdfa ARM: dts: samsung: exynos4412-midas: turn off SDIO WLAN chip during system suspend
Commit 8c3170628a ("wifi: brcmfmac: keep power during suspend if board
requires it") changed default behavior of the BRCMFMAC driver, which now
keeps SDIO card powered during system suspend to enable optional support
for WOWL. This feature is not supported by the legacy Exynos4 based
boards and leads to WLAN disfunction after system suspend/resume cycle.
Fix this by annotating SDIO host used by WLAN chip with
'cap-power-off-card' property, which should have been there from the
beginning.

Fixes: f77cbb9a3e ("ARM: dts: exynos: Add bcm4334 device node to Trats2")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20251126102618.3103517-5-m.szyprowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2025-11-27 16:07:16 +01:00
Marek Szyprowski
97cc9c346b ARM: dts: samsung: exynos4210-trats: turn off SDIO WLAN chip during system suspend
Commit 8c3170628a ("wifi: brcmfmac: keep power during suspend if board
requires it") changed default behavior of the BRCMFMAC driver, which now
keeps SDIO card powered during system suspend to enable optional support
for WOWL. This feature is not supported by the legacy Exynos4 based
boards and leads to WLAN disfunction after system suspend/resume cycle.
Fix this by annotating SDIO host used by WLAN chip with
'cap-power-off-card' property, which should have been there from the
beginning.

Fixes: a19f6efc01 ("ARM: dts: exynos: Enable WLAN support for the Trats board")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20251126102618.3103517-4-m.szyprowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2025-11-27 16:07:06 +01:00
Marek Szyprowski
863d69923b ARM: dts: samsung: exynos4210-i9100: turn off SDIO WLAN chip during system suspend
Commit 8c3170628a ("wifi: brcmfmac: keep power during suspend if board
requires it") changed default behavior of the BRCMFMAC driver, which now
keeps SDIO card powered during system suspend to enable optional support
for WOWL. This feature is not supported by the legacy Exynos4 based
boards and leads to WLAN disfunction after system suspend/resume cycle.
Fix this by annotating SDIO host used by WLAN chip with
'cap-power-off-card' property, which should have been there from the
beginning.

Fixes: 8620cc2f99 ("ARM: dts: exynos: Add devicetree file for the Galaxy S2")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20251126102618.3103517-3-m.szyprowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2025-11-27 16:06:54 +01:00
Marek Szyprowski
97aee67e24 ARM: dts: samsung: universal_c210: turn off SDIO WLAN chip during system suspend
Commit 8c3170628a ("wifi: brcmfmac: keep power during suspend if board
requires it") changed default behavior of the BRCMFMAC driver, which now
keeps SDIO card powered during system suspend to enable optional support
for WOWL. This feature is not supported by the legacy Exynos4 based
boards and leads to WLAN disfunction after system suspend/resume cycle.
Fix this by annotating SDIO host used by WLAN chip with
'cap-power-off-card' property, which should have been there from the
beginning.

Fixes: f1b0ffaa68 ("ARM: dts: exynos: Enable WLAN support for the UniversalC210 board")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20251126102618.3103517-2-m.szyprowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2025-11-27 16:06:44 +01:00
Marek Szyprowski
2224ea67c7 soc: samsung: exynos-pmu: Fix structure initialization
Commit 78b72897a5 ("soc: samsung: exynos-pmu: Enable CPU Idle for
gs101") added system wide suspend/resume callbacks to Exynos PMU driver,
but some items used by these callbacks are initialized only on
GS101-compatible boards. Move that initialization to exynos_pmu_probe()
to avoid potential lockdep warnings like below observed during system
suspend/resume cycle:

INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 UID: 0 PID: 2134 Comm: rtcwake Not tainted 6.18.0-rc7-next-20251126-00039-g1d656a1af243 #11794 PREEMPT
Hardware name: Samsung Exynos (Flattened Device Tree)
Call trace:
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x68/0x88
 dump_stack_lvl from register_lock_class+0x970/0x988
 register_lock_class from __lock_acquire+0xc8/0x29ec
 __lock_acquire from lock_acquire+0x134/0x39c
 lock_acquire from _raw_spin_lock+0x38/0x48
 _raw_spin_lock from exynos_cpupm_suspend_noirq+0x18/0x34
 exynos_cpupm_suspend_noirq from dpm_run_callback+0x98/0x2b8
 dpm_run_callback from device_suspend_noirq+0x8c/0x310

Fixes: 78b72897a5 ("soc: samsung: exynos-pmu: Enable CPU Idle for gs101")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20251126110038.3326768-1-m.szyprowski@samsung.com
[krzk: include calltrace into commit msg]
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2025-11-27 16:02:59 +01:00
Joel Granados
564195c1a3 sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv
Make do_proc_douintvec static and export proc_douintvec_conv wrapper
function for external use. This is to keep with the design in sysctl.c.
Update fs/pipe.c to use the new public API.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:45:38 +01:00
Joel Granados
30baaeb685 sysctl: Create pipe-max-size converter using sysctl UINT macros
Create a converter for the pipe-max-size proc_handler using the
SYSCTL_UINT_CONV_CUSTOM. Move SYSCTL_CONV_IDENTITY macro to the sysctl
header to make it available for pipe size validation. Keep returning
-EINVAL when (val == 0) by using a range checking converter and setting
the minimal valid value (extern1) to SYSCTL_ONE. Keep round_pipe_size by
passing it as the operation for SYSCTL_USER_TO_KERN_INT_CONV.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:45:37 +01:00
Joel Granados
4639faaa60 sysctl: Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c
Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c. Create
a non static wrapper function proc_doulongvec_minmax_conv that
forwards the custom convmul and convdiv argument values to the internal
do_proc_doulongvec_minmax. Remove unused linux/times.h include from
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:45:37 +01:00
Joel Granados
54932988c4 sysctl: Move jiffies converters to kernel/time/jiffies.c
Move integer jiffies converters (proc_dointvec{_,_ms_,_userhz_}jiffies
and proc_dointvec_ms_jiffies_minmax) to kernel/time/jiffies.c. Error
stubs for when CONFIG_PRCO_SYSCTL is not defined are not reproduced
because all the jiffies converters go through proc_dointvec_conv which
is already stubbed. This is part of the greater effort to move sysctl
logic out of kernel/sysctl.c thereby reducing merge conflicts in
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:45:37 +01:00
Joel Granados
24a08eefdd sysctl: Move UINT converter macros to sysctl header
Move SYSCTL_USER_TO_KERN_UINT_CONV and SYSCTL_UINT_CONV_CUSTOM macros to
include/linux/sysctl.h. No need to embed sysctl_kern_to_user_uint_conv
in a macro as it will not need a custom kernel pointer operation. This
is a preparation commit to enable jiffies converter creation outside
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:45:37 +01:00
Joel Granados
e2e5dac304 sysctl: Move INT converter macros to sysctl header
Move direction macros (SYSCTL_{USER_TO_KERN,KERN_TO_USER}) and the
integer converter macros (SYSCTL_{USER_TO_KERN,KERN_TO_USER}_INT_CONV,
SYSCTL_INT_CONV_CUSTOM) into include/linux/sysctl.h. This is a
preparation commit to enable jiffies converter creation outside
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:45:37 +01:00
Joel Granados
c5b4c183f7 sysctl: Allow custom converters from outside sysctl
The new non-static proc_dointvec_conv forwards a custom converter
function to do_proc_dointvec from outside the sysctl scope. Rename the
do_proc_dointvec call points so any future changes to proc_dointvec_conv
are propagated in sysctl.c This is a preparation commit that allows the
integer jiffie converter functions to move out of kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:45:37 +01:00
Joel Granados
1aa53326e1 sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument
The buffer arg in proc handler functions have been void* (no __user
qualifier) since commit 32927393dc ("sysctl: pass kernel pointers to
->proc_handler"). The __user qualifier was erroneously brought back in
commit 0df8bdd5e3 ("stackleak: move stack_erasing sysctl to
stackleak.c"). This fixes the error by removing the __user qualifier.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510221719.3ggn070M-lkp@intel.com/
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:44:53 +01:00
Joel Granados
c3102febf4 sysctl: Create macro for user-to-kernel uint converter
Replace sysctl_user_to_kern_uint_conv function with
SYSCTL_USER_TO_KERN_UINT_CONV macro that accepts u_ptr_op parameter for
value transformation. Replacing sysctl_kern_to_user_uint_conv is not
needed as it will only be used from within sysctl.c. This is a
preparation commit for creating a custom converter in fs/pipe.c. No
Functional changes are intended.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
0c1d2dc7cc sysctl: Add optional range checking to SYSCTL_UINT_CONV_CUSTOM
Add k_ptr_range_check parameter to SYSCTL_UINT_CONV_CUSTOM macro to
enable range validation using table->extra1/extra2. Replace
do_proc_douintvec_minmax_conv with do_proc_uint_conv_minmax generated
by the updated macro.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
49d3288c1d sysctl: Create unsigned int converter using new macro
Pass sysctl_{user_to_kern,kern_to_user}_uint_conv (unsigned integer
uni-directional converters) to the new SYSCTL_UINT_CONV_CUSTOM macro
to create do_proc_douintvec_conv's replacement (do_proc_uint_conv).

This is a preparation commit to use the unsigned integer converter from
outside sysctl. No functional change is intended.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
54e77495a7 sysctl: Add optional range checking to SYSCTL_INT_CONV_CUSTOM
Extend the SYSCTL_INT_CONV_CUSTOM macro with a k_ptr_range_check
parameter to conditionally generate range validation code. When enabled,
validation is done against table->extra1 (min) and table->extra2 (max)
bounds before assignment. Add base minmax and ms_jiffies_minmax
converter instances that utilize the range checking functionality.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
796c481a4b sysctl: Create integer converters with one macro
New SYSCTL_INT_CONV_CUSTOM macro creates "bi-directional" converters
from a user-to-kernel and a kernel-to-user functions. Replace integer
versions of do_proc_*_conv functions with the ones from the new macro.
Rename "_dointvec_" to just "_int_" as these converters are not applied
to vectors and the "do" is already in the name.

Move the USER_HZ validation directly into proc_dointvec_userhz_jiffies()

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
2dc164a48e sysctl: Create converter functions with two new macros
Eight converter functions are created using two new macros
(SYSCTL_USER_TO_KERN_INT_CONV & SYSCTL_KERN_TO_USER_INT_CONV); they are
called from four pre-existing converter functions: do_proc_dointvec_conv
and do_proc_dointvec{,_userhz,_ms}_jiffies_conv. The function names
generated by the macros are differentiated by a string suffix passed as
the first macro argument.

The SYSCTL_USER_TO_KERN_INT_CONV macro first executes the u_ptr_op
operation, then checks for overflow, assigns sign (-, +) and finally
writes to the kernel var with WRITE_ONCE; it always returns an -EINVAL
when an overflow is detected. The SYSCTL_KERN_TO_USER_INT_CONV uses
READ_ONCE, casts to unsigned long, then executes the k_ptr_op before
assigning the value to the user space buffer.

The overflow check is always done against MAX_INT after applying
{k,u}_ptr_op. This approach avoids rounding or precision errors that
might occur when using the inverse operations.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
551bf18450 sysctl: Discriminate between kernel and user converter params
Rename converter parameter to indicate data flow direction: "lvalp" to
"u_ptr" indicating a user space parsed value pointer. "valp" to "k_ptr"
indicating a kernel storage value pointer. This facilitates the
identification of discrepancies between direction (copy to kernel or
copy to user space) and the modified variable. This is a preparation
commit for when the converter functions are exposed to the rest of the
kernel.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
5412f5b13d sysctl: Indicate the direction of operation with macro names
Replace the "write" integer parameter with SYSCTL_USER_TO_KERN() and
SYSCTL_KERN_TO_USER() that clearly indicate data flow direction in
sysctl operations.

"write" originates in proc_sysctl.c (proc_sys_{read,write}) and can take
one of two values: "0" or "1" when called from proc_sys_read and
proc_sys_write respectively. When write has a value of zero, data is
"written" to a user space buffer from a kernel variable (usually
ctl_table->data). Whereas when write has a value greater than zero, data
is "written" to an internal kernel variable from a user space buffer.
Remove this ambiguity by introducing macros that clearly indicate the
direction of the "write".

The write mode names in sysctl_writes_mode are left unchanged as these
directly relate to the sysctl_write_strict file in /proc/sys where the
word "write" unambiguously refers to writing to a file.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
610c9b6efb sysctl: Remove superfluous __do_proc_* indirection
Remove "__" from __do_proc_do{intvec,uintvec,ulongvec_minmax} internal
functions and delete their corresponding do_proc_do* wrappers. These
indirections are unnecessary as they do not add extra logic nor do they
indicate a layer separation.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
ee581c0e3a sysctl: Remove superfluous tbl_data param from "dovec" functions
Remove superfluous tbl_data param from do_proc_douintvec{,_r,_w}
and __do_proc_do{intvec,uintvec,ulongvec_minmax}. There is no need to
pass it as it is always contained within the ctl_table struct.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Joel Granados
6ca07a9b63 sysctl: Replace void pointer with const pointer to ctl_table
* Replace void* data in the converter functions with a const struct
  ctl_table* table as it was only getting forwarding values from
  ctl_table->extra{1,2}.
* Remove the void* data in the do_proc_* functions as they already had a
  pointer to the ctl_table.
* Remove min/max structures do_proc_do{uint,int}vec_minmax_conv_param;
  the min/max values get passed directly in ctl_table.
* Keep min/max initialization in extra{1,2} in proc_dou8vec_minmax.
* The do_proc_douintvec was adjusted outside sysctl.c as it is exported
  to fs/pipe.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 15:43:20 +01:00
Heiko Carstens
d0139059e3 KVM: s390: Enable and disable interrupts in entry code
Move enabling and disabling of interrupts around the SIE instruction to
entry code. Enabling interrupts only after the __TI_sie flag has been set
guarantees that the SIE instruction is not executed if an interrupt happens
between enabling interrupts and the execution of the SIE instruction.
Interrupt handlers and machine check handler forward the PSW to the
sie_exit label in such cases.

This is a prerequisite for VIRT_XFER_TO_GUEST_WORK to prevent that guest
context is entered when e.g. a scheduler IPI, indicating that a reschedule
is required, happens right before the SIE instruction, which could lead to
long delays.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-11-27 15:39:46 +01:00
Andrew Donnellan
c067847c52 KVM: s390: Add signal_exits counter
Add a signal_exits counter for s390, as exists on arm64, loongarch, mips,
powerpc, riscv and x86.

This is used by kvm_handle_signal_exit(), which we will use when we
later enable CONFIG_VIRT_XFER_TO_GUEST_WORK.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-11-27 15:39:46 +01:00
Bibo Mao
d84fe2f30b KVM: LoongArch: selftests: Add exception handler register interface
Add interrupt and exception handler register interface. When exception
happens, execute registered exception handler if exists, else report an
error.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Bibo Mao
1c5d3a1eab KVM: LoongArch: selftests: Add basic interfaces
Add some basic function interfaces such as CSR register access, local
irq enable or disable APIs.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Bibo Mao
985a96983b KVM: LoongArch: selftests: Add system registers save/restore on exception
When system returns from exception with ertn instruction, PC comes from
LOONGARCH_CSR_ERA, and CSR.CRMD comes LOONGARCH_CSR_PRMD.

Here save CSR register CSR.ERA and CSR.PRMD into stack, and then restore
them from stack. So it can be modified by exception handlers in future.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Bibo Mao
d3e43a1f34 LoongArch: KVM: Use 64-bit register definition for EIOINTC
With in-kernel emulated eiointc driver, hardware register can be
accessed by different size, there is reg_u8/reg_u16/reg_u32/reg_u64
union type with EIOINTC register.

Here use 64-bit type with register definition and remove union type
since most registers are accessed with 64-bit method. And this makes
EIOINTC emulated driver simpler.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Song Gao
7bcd8d0b22 LoongArch: KVM: Add AVEC basic support
Check whether the host CPU supported AVEC, and save/restore CSR_MSGIS0-
CSR_MSGIS3 when necessary.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Bibo Mao
74087611f0 LoongArch: KVM: Get VM PMU capability from HW GCFG register
Now VM PMU capability comes from host PMU capability directly, instead
bit 23 of HW GCFG CSR register also show PMU capability for VM. It will
be better if it comes from HW GCFG CSR register rather than just host
PMU capability, especially when LVZ feature is emulated in TCG mode, in
which case without PMU capability.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Dan Carpenter
8da059f2a4 fuse: Uninitialized variable in fuse_epoch_work()
The fuse_ilookup() function only sets *fm on the success path so this
"if (fm) {" NULL check doesn't work.  The "fm" pointer is either
uninitialized or valid.  Check the "inode" pointer instead.

Also, while it's not necessary, it is cleaner to move the iput(inode)
under the NULL check as well.

Fixes: 64becd224f ("fuse: new work queue to invalidate dentries from old epochs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-26 12:45:29 +01:00
Joanne Koong
95c39eef7c fuse: fix io-uring list corruption for terminated non-committed requests
When a request is terminated before it has been committed, the request
is not removed from the queue's list. This leaves a dangling list entry
that leads to list corruption and use-after-free issues.

Remove the request from the queue's list for terminated non-committed
requests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Fixes: c090c8abae ("fuse: Add io-uring sqe commit and fetch support")
Cc: stable@vger.kernel.org
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-26 12:38:40 +01:00
Gongwei Li
760b8eec2c soc: fsl: qbman: use kmalloc_array() instead of kmalloc()
Replace kmalloc() with kmalloc_array() to prevent potential
overflow, as recommended in Documentation/process/deprecated.rst.

Signed-off-by: Gongwei Li <ligongwei@kylinos.cn>
Reviewed-by: Fushuai Wang <wangfushuai@baidu.com>
Link: https://lore.kernel.org/r/20251121061022.114609-1-13875017792@163.com
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
2025-11-26 12:29:40 +01:00
Marco Crivellari
c181703a29 soc: fsl: qbman: add WQ_PERCPU to alloc_workqueue users
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://lore.kernel.org/r/20251107152950.293899-1-marco.crivellari@suse.com
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
2025-11-26 12:29:40 +01:00
Christophe Leroy (CS GROUP)
3fca89b775 MAINTAINERS: Update email address for Christophe Leroy
My address at csgroup.eu is redirected to the new one at
cs-soprasteria.com which is a Professionnal Microsoft account without
SMTP gateway. We still have the SMTP gateway for csgroup.eu but it is
not maintained anymore and might stop working at anytime. In addition
the DKIM signature is not performed allthough the domain has DMARC
set up.

Switch to kernel.org email address and add entries in mailmap.

Link: https://lore.kernel.org/r/d9b6758297d7dcddf79feb4459ceaedd7d6f1f2e.1764155757.git.chleroy@kernel.org
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
2025-11-26 12:28:48 +01:00
Paolo Bonzini
679fcce002 Merge tag 'kvm-x86-svm-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.19:

 - Fix a few missing "VMCB dirty" bugs.

 - Fix the worst of KVM's lack of EFER.LMSLE emulation.

 - Add AVIC support for addressing 4k vCPUs in x2AVIC mode.

 - Fix incorrect handling of selective CR0 writes when checking intercepts
   during emulation of L2 instructions.

 - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
   VMRUN and #VMEXIT.

 - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft
   interrupt if the guest patched the underlying code after the VM-Exit, e.g.
   when Linux patches code with a temporary INT3.

 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
   userspace, and extend KVM "support" to all policy bits that don't require
   any actual support from KVM.
2025-11-26 09:48:39 +01:00
Paolo Bonzini
d1e7b4613e Merge tag 'kvm-x86-vmx-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM VMX changes for 6.19:

 - Use the root role from kvm_mmu_page to construct EPTPs instead of the
   current vCPU state, partly as worthwhile cleanup, but mostly to pave the
   way for tracking per-root TLB flushes so that KVM can elide EPT flushes on
   pCPU migration if KVM has flushed the root at least once.

 - Add a few missing nested consistency checks.

 - Rip out support for doing "early" consistency checks via hardware as the
   functionality hasn't been used in years and is no longer useful in general,
   and replace it with an off-by-default module param to detected missed
   consistency checks (i.e. WARN if hardware finds a check that KVM does not).

 - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32]
   on VM-Enter.

 - Misc cleanups.
2025-11-26 09:44:52 +01:00
Paolo Bonzini
de8e8ebb1a Merge tag 'kvm-x86-tdx-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM TDX changes for 6.19:

 - Overhaul the TDX code to address systemic races where KVM (acting on behalf
   of userspace) could inadvertantly trigger lock contention in the TDX-Module,
   which KVM was either working around in weird, ugly ways, or was simply
   oblivious to (as proven by Yan tripping several KVM_BUG_ON()s with clever
   selftests).

 - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a vCPU if
   creating said vCPU failed partway through.

 - Fix a few sparse warnings (bad annotation, 0 != NULL).

 - Use struct_size() to simplify copying capabilities to userspace.
2025-11-26 09:36:37 +01:00
Paolo Bonzini
adc99a6cfc Merge tag 'kvm-x86-mmu-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MMU changes for 6.19:

 - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE
   caching is disabled, as there can't be any relevant SPTEs to zap.

 - Relocate a misplace export.
2025-11-26 09:36:01 +01:00
Guillaume La Roque
a7ab6f9466 arm64: dts: amlogic: meson-g12b: Fix L2 cache reference for S922X CPUs
The original addition of cache information for the Amlogic S922X SoC
used the wrong next-level cache node for CPU cores 100 and 101,
incorrectly referencing `l2_cache_l`. These cores actually belong to
the big cluster and should reference `l2_cache_b`. Update the device
tree accordingly.

Fixes: e7f85e6c15 ("arm64: dts: amlogic: Add cache information to the Amlogic S922X SoC")
Signed-off-by: Guillaume La Roque <glaroque@baylibre.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251123-fixkhadas-v1-1-045348f0a4c2@baylibre.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Xianwei Zhao
e1c246c641 arm64: dts: Add gpio_intc node for Amlogic S7D SoCs
Add GPIO interrupt controller device.

Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-5-b4d1fe4781c1@amlogic.com
[narmstrong: fixed applying on top as secure node]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Xianwei Zhao
303dad7af6 arm64: dts: Add gpio_intc node for Amlogic S7 SoCs
Add GPIO interrupt controller device.

Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-4-b4d1fe4781c1@amlogic.com
[narmstrong: fixed applying on top of ao secure node]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Xianwei Zhao
3ac37d522a arm64: dts: Add gpio_intc node for Amlogic S6 SoCs
Add GPIO interrupt controller device.

Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-3-b4d1fe4781c1@amlogic.com
[narmstrong: fixed applying on top of ao secure node]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Xianwei Zhao
2cab15a781 arm64: dts: amlogic: s7d: add ao secure node
Add node for board info registers, which allows getting SoC family and
board revision.

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20251119-soc-info-s6-s7-s7d-v3-5-1764c1995c04@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Xianwei Zhao
35e41e2a39 arm64: dts: amlogic: s7: add ao secure node
Add node for board info registers, which allows getting SoC family and
board revision.

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20251119-soc-info-s6-s7-s7d-v3-4-1764c1995c04@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Xianwei Zhao
6a46754317 arm64: dts: amlogic: s6: add ao secure node
Add node for board info registers, which allows getting SoC family and
board revision.

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20251119-soc-info-s6-s7-s7d-v3-3-1764c1995c04@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Manivannan Sadhasivam
032f2b83a6 arm64: dts: amlogic: Fix the register name of the 'DBI' region
DT incorrectly specifies the 'DBI' region as 'ELBI'. DBI is a must have
region for DWC controllers as it has the Root Port and controller specific
registers, while ELBI has optional registers.

Hence, fix the DT for both Meson platforms.

Cc: stable+noautosel@kernel.org # Driver dependency
Fixes: 5b3a9c2092 ("arm64: dts: meson-axg: add PCIe nodes")
Fixes: 1f8607d597 ("arm64: dts: meson-g12a: Add PCIe node")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251101-pci-meson-fix-v1-2-c50dcc56ed6a@oss.qualcomm.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Xianwei Zhao
dbb559cfda dts: arm64: amlogic: add a5 pinctrl node
Add pinctrl device to support Amlogic A5.

Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20251022-a5-pinctrl-node-v4-1-a71911852c4b@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
hongyu.chen1
7ee8fc4163 arm64: dts: amlogic: s7d: add power domain controller node
Add power domain controller node for Amlogic S7D SoC.

Signed-off-by: hongyu.chen1 <hongyu.chen1@amlogic.com>
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20250822-pm-s6-s7-s7d-v1-5-82e3f3aff327@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
hongyu.chen1
f46ac57752 arm64: dts: amlogic: s7: add power domain controller node
Add power domain controller node for Amlogic S7 SoC.

Signed-off-by: hongyu.chen1 <hongyu.chen1@amlogic.com>
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20250822-pm-s6-s7-s7d-v1-4-82e3f3aff327@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
hongyu.chen1
e5dde6ff48 arm64: dts: amlogic: s6: add power domain controller node
Add power domain controller node for Amlogic S6 SoC.

Signed-off-by: hongyu.chen1 <hongyu.chen1@amlogic.com>
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20250822-pm-s6-s7-s7d-v1-3-82e3f3aff327@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Keke Li
3f738dc33f dts: arm64: amlogic: Add ISP related nodes for C3
Add the IMX290 sensor node description to the device tree file,
which will be controlled via I2C bus with image data transmission
through MIPI CSI-2 interface.

Add CSI-2, adapter and ISP nodes for C3 family.

Signed-off-by: Keke Li <keke.li@amlogic.com>
Link: https://patch.msgid.link/20250918-b4-c3isp-v1-1-5f48db6516c9@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Christian Hewitt
af94dc5610 arm64: dts: meson: add initial device-tree for Tanix TX9 Pro
Oranth Tanix TX9 Pro is based on the Amlogic Q200 reference design with
an S912 chip and the following specs:

- 3GB DDR3 RAM
- 32GB eMMC
- 10/100/1000 Base-T Ethernet
- AP6356 Wireless (802.11 b/g/n/ac, BT 5.0)
- HDMI 2.0a video
- VFD for clock/status
- 2x USB 2.0 ports
- IR receiver
- 1x Power LED (white)
- 1x Update/Reset button (underside)
- 1x micro SD card slot

Signed-off-by: Christian Hewitt <christianshewitt@gmail.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://patch.msgid.link/20250927125006.824293-2-christianshewitt@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:42 +01:00
Christian Hewitt
ea1156e840 dt-bindings: arm: amlogic: add support for Tanix TX9 Pro
The Oranth Tanix TX9 Pro is an Android STB using the Amlogic S912 chip

Signed-off-by: Christian Hewitt <christianshewitt@gmail.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250927125006.824293-1-christianshewitt@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-26 09:35:41 +01:00
Paolo Bonzini
b0bf3d67a7 Merge tag 'kvm-x86-selftests-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.19:

 - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.

 - Forcefully override ARCH from x86_64 to x86 to play nice with specifying
   ARCH=x86_64 on the command line.

 - Extend a bunch of nested VMX to validate nested SVM as well.

 - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
   verify KVM can save/restore nested VMX state when L1 is using 5-level
   paging, but L2 is not.

 - Clean up the guest paging code in anticipation of sharing the core logic for
   nested EPT and nested NPT.
2025-11-26 09:35:40 +01:00
Paolo Bonzini
e64dcfab57 Merge tag 'kvm-x86-misc-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.19:

 - Fix an async #PF bug where KVM would clear the completion queue when the
   guest transitioned in and out of paging mode, e.g. when handling an SMI and
   then returning to paged mode via RSM.

 - Fix a bug where TDX would effectively corrupt user-return MSR values if the
   TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected.

 - Leave the user-return notifier used to restore MSRs registered when
   disabling virtualization, and instead pin kvm.ko.  Restoring host MSRs via
   IPI callback is either pointless (clean reboot) or dangerous (forced reboot)
   since KVM has no idea what code it's interrupting.

 - Use the checked version of {get,put}_user(), as Linus wants to kill them
   off, and they're measurably faster on modern CPUs due to the unchecked
   versions containing an LFENCE.

 - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC
   timers can result in a hard lockup in the host.

 - Revert the periodic kvmclock sync logic now that KVM doesn't use a
   clocksource that's subject to NPT corrections.

 - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter
   behind CONFIG_CPU_MITIGATIONS.

 - Context switch XCR0, XSS, and PKRU outside of the entry/exit fastpath as
   the only reason they were handled in the faspath was to paper of a bug in
   the core #MC code that has long since been fixed.

 - Add emulator support for AVX MOV instructions to play nice with emulated
   devices whose PCI BARs guest drivers like to access with large multi-byte
   instructions.
2025-11-26 09:34:21 +01:00
Paolo Bonzini
236831743c Merge tag 'kvm-x86-gmem-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM guest_memfd changes for 6.19:

 - Add NUMA mempolicy support for guest_memfd, and clean up a variety of
   rough edges in guest_memfd along the way.

 - Define a CLASS to automatically handle get+put when grabbing a guest_memfd
   from a memslot to make it harder to leak references.

 - Enhance KVM selftests to make it easer to develop and debug selftests like
   those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
   often result in hard-to-debug SIGBUS errors.

 - Misc cleanups.
2025-11-26 09:32:44 +01:00
Paolo Bonzini
9aca52b552 Merge tag 'kvm-x86-generic-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM generic changes for 6.19:

 - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for
   irqfd cleanup.

 - Fix a goof in the dirty ring documentation.
2025-11-26 09:22:45 +01:00
Marc Zyngier
64d67e7add KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
Suzuki notices that making the ICH_HCR_EL2_TDIR capability a system
one isn't a very good idea, should we end-up with CPUs that have
asymmetric TDIR support (somehow unlikely, but you never know what
level of stupidity vendors are up to). For this hypothetical setup,
making this an "EARLY_LOCAL_CPU_FEATURE" is a much better option.

This is actually consistent with what we already do with GICv5
legacy interface, so flip the capability over.

Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: 2a28810cbb ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping")
Link: https://lore.kernel.org/r/5df713d4-8b79-4456-8fd1-707ca89a61b6@arm.com
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://msgid.link/20251125160144.1086511-1-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-25 22:23:01 -08:00
Icenowy Zheng
5b70764e10 riscv: dts: starfive: add Orange Pi RV
Orange Pi RV is a SBC based on the StarFive VisionFive 2 board.

Orange Pi RV features:

- StarFive JH7110 SoC
- GbE port connected to JH7110 GMAC0 via YT8531 PHY
- 4x USB ports via VL805 PCIe USB controller connected to JH7110 pcie0
- M.2 M-key slot connected to JH7110 pcie1
- HDMI video output
- 3.5mm audio output
- Ampak AP6256 SDIO Wi-Fi/Bluetooth module on mmc0
- microSD slot on mmc1
- SPI NOR flash memory
- 24c02 EEPROM (read only by default)

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: E Shattow <e@freeshell.de>
[conor: amend comment to say what's missing]
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:20:54 +00:00
Icenowy Zheng
d94ebab404 dt-bindings: riscv: starfive: add xunlong,orangepi-rv
Add "xunlong,orangepi-rv" as a StarFive JH7110 SoC-based board.

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: E Shattow <e@freeshell.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:17:33 +00:00
Hal Feng
ae264ae124 riscv: dts: starfive: Add VisionFive 2 Lite eMMC board device tree
VisionFive 2 Lite eMMC board uses a non-removable onboard 64GiB eMMC
instead of the MicroSD slot.

Acked-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Tested-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Hal Feng <hal.feng@starfivetech.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:16:00 +00:00
Hal Feng
900b32fd60 riscv: dts: starfive: Add VisionFive 2 Lite board device tree
VisionFive 2 Lite is a mini SBC based on the StarFive JH7110S SoC.

Board features:
- JH7110S SoC
- 4/8 GiB LPDDR4 DRAM
- AXP15060 PMIC
- 40 pin GPIO header
- 1x USB 3.0 host port
- 3x USB 2.0 host port
- 1x M.2 M-Key (size: 2242)
- 1x MicroSD slot (optional non-removable 64GiB eMMC)
- 1x QSPI Flash
- 1x I2C EEPROM
- 1x 1Gbps Ethernet port
- SDIO-based Wi-Fi & UART-based Bluetooth
- 1x HDMI port
- 1x 2-lane DSI
- 1x 2-lane CSI

VisionFive 2 Lite schematics: https://doc-en.rvspace.org/VisionFive2Lite/PDF/VF2_LITE_V1.10_TF_20250818_SCH.pdf
VisionFive 2 Lite Quick Start Guide: https://doc-en.rvspace.org/VisionFive2Lite/VisionFive2LiteQSG/index.html
More documents: https://doc-en.rvspace.org/Doc_Center/visionfive_2_lite.html

Acked-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Tested-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Hal Feng <hal.feng@starfivetech.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:16:00 +00:00
Hal Feng
2ad6d71a0d riscv: dts: starfive: Add common board dtsi for VisionFive 2 Lite variants
Add a common board dtsi for use by VisionFive 2 Lite and
VisionFive 2 Lite eMMC.

Acked-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Tested-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Hal Feng <hal.feng@starfivetech.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:16:00 +00:00
Hal Feng
84853940a7 riscv: dts: starfive: jh7110-common: Move out some nodes to the board dts
Some node in this file are not used by the upcoming VisionFive 2 Lite
board. Move them to the board dts to prepare for adding the new
VisionFive 2 Lite device tree.

Tested-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Hal Feng <hal.feng@starfivetech.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:16:00 +00:00
Hal Feng
7a1e15b248 dt-bindings: riscv: Add StarFive JH7110S SoC and VisionFive 2 Lite board
Add device tree bindings for the StarFive JH7110S SoC
and the VisionFive 2 Lite board equipped with it.

JH7110S SoC is an industrial SoC which can run at -40~85 degrees centigrade
and up to 1.25GHz. Its CPU cores and peripherals are the same as
those of the JH7110 SoC.

VisionFive 2 Lite boards have MicroSD card version (default) and eMMC
version, which are called "VisionFive 2 Lite" and "VisionFive 2 Lite eMMC"
respectively.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: Matthias Brugger <mbrugger@suse.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Hal Feng <hal.feng@starfivetech.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:16:00 +00:00
Conor Dooley
56dfdf2da1 MAINTAINERS: degrade RISC-V MISC SOC SUPPORT to Odd Fixes
The SiFive and Canaan platforms are not being actively looked after at
this point, but fixes for them would be applied if/when the patches
appeared. Since they're now the only things in the RISC-V MISC SOC
SUPPORT, mark them as Odd Fixes. I don't believe this is a functional
change, it just represents what's actually happening - particularly
since the Canaan k230 never built up enough steam to get merged and the
new SiFive demo chips have been done in partnership with with other
companies, e.g. Eswin, and will reside in their directories instead.

Reviewed-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:12:59 +00:00
Conor Dooley
76cc0ba2af MAINTAINERS: add tree to RISC-V Microchip entry
In fairness to my own employer, lumping it in as "misc" is not quite
accurate when they do pay me to look after the platform. Move the tree
link for it to its entry, rather than having the RISC-V MISC SOC SUPPORT
entry cover it.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:12:59 +00:00
Conor Dooley
d794a761c7 MAINTAINERS: remove patchwork from RISC-V MISC SOC SUPPORT
I don't use the main riscv patchwork for anything to do with SoCs,
remove them from here to avoid confusion.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:12:59 +00:00
Conor Dooley
d15cd50d14 MAINTAINERS: add Conor to StarFive entry
I apply the patches for StarFive devicetrees, add me to the entry along
with my tree location etc. This is not a functional change, as this info
was in the "RISC-V MISC" entry but I'd rather not have the duplication
of entries covering the StarFive directory.

Acked-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-25 22:12:59 +00:00
Arnd Bergmann
4b1e815701 Merge tag 'mtk-soc-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/drivers
MediaTek soc driver updates

This adds socinfo entries for MT8189 Kompanio 540, an extra entry
for a variant of MT8391 (AV/AZA) Genio 720 SoC, and support for
the PMIC Wrapper (by adding a compatible string) in MT8189.

* tag 'mtk-soc-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux:
  dt-bindings: soc: mediatek: pwrap: Add compatible for MT8189 SoC
  soc: mediatek: mtk-socinfo: Add entry for MT8391AV/AZA Genio 720
  soc: mediatek: mtk-socinfo: Add extra entry for MT8189

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 14:52:14 +01:00
Arnd Bergmann
af058d5f32 Merge tag 'reset-for-v6.19' of https://git.pengutronix.de/git/pza/linux into soc/drivers
Reset controller updates for v6.19

* Add support for LAN969x, eic770 and RZ/G3S reset controllers,
  for the RZ/G3S USB-PHY reset controller, and for the remaining
  TH1520 reset controllers.
* Drop legacy reset control lookup code.
* Include linux/bits.h from linux/reset.h to make it self-contained.

* tag 'reset-for-v6.19' of https://git.pengutronix.de/git/pza/linux:
  Documentation: reset: Remove reset_controller_add_lookup()
  reset: fix BIT macro reference
  reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe
  reset: th1520: Support reset controllers in more subsystems
  reset: th1520: Prepare for supporting multiple controllers
  dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys
  dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets
  reset: remove legacy reset lookup code
  clk: davinci: psc: drop unused reset lookup
  reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC
  reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY
  dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support
  reset: eswin: Add eic7700 reset driver
  dt-bindings: reset: eswin: Documentation for eic7700 SoC
  reset: sparx5: add LAN969x support
  dt-bindings: reset: microchip: Add LAN969x support

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 14:11:21 +01:00
Arnd Bergmann
73147be308 Merge tag 'stm32-bus-firewall-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into soc/drivers
STM32 Firewall bus for v6.19, round 1

Highlights:
----------

The STM32MP21x platforms have a slightly different RIFSC. Add support
for these platforms.

Also, the RIF is a complex firewall framework which can be tricky
to debug. To facilitate the latter, add a debugfs entry that can
be used to display the whole RIFSC firewall configuration at runtime.

* tag 'stm32-bus-firewall-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
  bus: rifsc: add debugfs entry to dump the firewall configuration
  dt-bindings: bus: add stm32mp21 RIFSC compatible

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 14:04:50 +01:00
Arnd Bergmann
c0401dcbca Merge tag 'v6.19-rockchip-drivers1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/drivers
Some additional sane defaults for the oldish rk3368 soc.

* tag 'v6.19-rockchip-drivers1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  soc: rockchip: grf: Add select correct PWM implementation on RK3368
  soc: rockchip: grf: Set pwm2/xin32k pad default to xin32k for rk3368

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 14:02:26 +01:00
Arnd Bergmann
3d497bf8ef Merge tag 'qcom-drivers-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers
Qualcomm driver updates for v6.19

Support for hardware-keymanager v1 support for wrapped keys is introduce
in the ICE driver.

Support for the new Kaanapali mobile platform is added to last-level
cache controller, pd-mapper, and UBWC drivers.

UBWC driver gains support for the Monaco and Glymur platforms.

The PMIC GLINK driver is extended to handle the differences found in
targets where the related firmware runs on the SoCCP.

Support for running on targets without initialized SMEM is provided, by
reworking the SMEM driver to differentiate between "not yet probed" and
"probed but there was no SMEM". An unwanted WARN_ON() that triggered if
clients asked for a SMEM item beyond the currently running system's
limit, was removed, to allow new use cases to gracefully fail on old
targets.

The Qualcomm socinfo driver is extended with support for version 20
through 23 and support for providing version information about more than
32 remote processors. Identifiers for QCS6490 and SM8850 are also added.

Additionally, a number of smaller bug fixes and cleanups in PBS, OCMEM,
GSBI, TZMEM, and MDT-loader are included.

* tag 'qcom-drivers-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (31 commits)
  soc: qcom: mdt_loader: rename 'firmware' parameter of qcom_mdt_load()
  soc: qcom: mdt_loader: merge __qcom_mdt_load() and qcom_mdt_load_no_init()
  soc: qcom: socinfo: Add reserve field to support future extension
  soc: qcom: socinfo: Add support for new fields in revision 20
  dt-bindings: firmware: qcom,scm: Document SCM on Kaanapali SOC
  soc: qcom: socinfo: add support to extract more than 32 image versions
  soc: qcom: smem: drop the WARN_ON() on SMEM item validation
  soc: qcom: ubwc: Add config for Kaanapali
  soc: qcom: socinfo: Add SoC ID for QCS6490
  dt-bindings: arm: qcom,ids: Add SoC ID for QCS6490
  soc: qcom: ice: Add HWKM v1 support for wrapped keys
  soc: qcom: smem: better track SMEM uninitialized state
  err.h: add INIT_ERR_PTR() macro
  soc: qcom: smem: fix hwspinlock resource leak in probe error paths
  dt-bindings: soc: qcom,aoss-qmp: Document the Glymur AOSS side channel
  dt-bindings: soc: qcom,aoss-qmp: Document the Kaanapali AOSS channel
  soc: qcom: ubwc: Add QCS8300 UBWC cfg
  dt-bindings: firmware: qcom,scm: Document Glymur scm
  soc: qcom: socinfo: Add SM8850 SoC ID
  dt-bindings: arm: qcom,ids: Add SoC ID for SM8850
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 14:00:49 +01:00
Arnd Bergmann
a677d87689 Merge tag 'omap-for-v6.19/drivers-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into soc/drivers
ti-sysc: allow OMAP2 and OMAP4 timers to be reserved on AM33xx

* tag 'omap-for-v6.19/drivers-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
  ti-sysc: allow OMAP2 and OMAP4 timers to be reserved on AM33xx

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 13:59:13 +01:00
Arnd Bergmann
0363169ac7 Merge tag 'ti-driver-soc-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/drivers
TI SoC driver updates for v6.19

- ti_sci: Add Partial-IO poweroff support and sys_off handler integration
- ti_sci: Gate IO isolation programming on firmware capability flag
- ti_sci: cleanup by replacing ifdeffery in PM ops with pm_sleep_ptr() macro

* tag 'ti-driver-soc-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux:
  firmware: ti_sci: Partial-IO support
  firmware: ti_sci: Support transfers without response
  firmware: ti_sci: Set IO Isolation only if the firmware is capable
  firmware: ti_sci: Replace ifdeffery by pm_sleep_ptr() macro

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 11:33:10 +01:00
Arnd Bergmann
16e8af6c03 Merge tag 'imx-drivers-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/drivers
i.MX drivers update for 6.19:

- A series from Peng Fan to to improve i.MX SCU firmware drivers

* tag 'imx-drivers-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  firmware: imx: scu: Use devm_mutex_init
  firmware: imx: scu: Suppress bind attrs
  firmware: imx: scu: Update error code
  firmware: imx: scu-irq: Remove unused export of imx_scu_enable_general_irq_channel
  firmware: imx: scu-irq: Set mu_resource_id before get handle
  firmware: imx: scu-irq: Init workqueue before request mbox channel
  firmware: imx: scu-irq: Free mailbox client on failure at imx_scu_enable_general_irq_channel()
  firmware: imx: scu-irq: fix OF node leak in

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 11:32:27 +01:00
Arnd Bergmann
e7ec4df222 Merge tag 'sunxi-drivers-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/drivers
Allwinner driver changes for 6.19

Just one cleanup change that is part of tree wide cleanup of redundant
pm_runtime_mark_last_busy() calls.

* tag 'sunxi-drivers-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  bus: sunxi-rsb: Remove redundant pm_runtime_mark_last_busy() calls

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 11:26:00 +01:00
Arnd Bergmann
aee7ea4681 Merge tag 'tegra-for-6.19-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers
soc/tegra: Changes for v6.19-rc1

A couple of small fixes across the board: ACPI support on FUSE no longer
exposes duplicate SoC information, speedo IDs for Tegra210 are updated,
some comments see typo fixes or kerneldoc additions. Finally, support
for USB wake events is added on Tegra234, which allow these systems to
resume from suspend on USB activity.

* tag 'tegra-for-6.19-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  soc/tegra: pmc: Add USB wake events for Tegra234
  soc/tegra: pmc: Document tegra_pmc.syscore field
  soc/tegra: pmc: Don't fail if "aotag" is not present
  soc/tegra: fuse: speedo-tegra210: Add SoC speedo 2
  soc/tegra: fuse: speedo-tegra210: Update speedo IDs
  soc/tegra: Resolve a spelling error in the tegra194-cbb.c
  soc/tegra: fuse: Do not register SoC device on ACPI boot

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 10:49:10 +01:00
Arnd Bergmann
d34a71ba21 Merge tag 'tegra-for-6.19-syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers
syscore: Changes for v6.19-rc1

Add a parameter to syscore operations to allow passing contextual data,
which in turn enables refactoring of drivers to make them independent of
global data. This initially only contains the API changes along with the
updates for existing drivers. Subsequent work will make use of this to
improve drivers.

* tag 'tegra-for-6.19-syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  syscore: Pass context data to callbacks

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 10:25:56 +01:00
Arnd Bergmann
5d5b056ba3 Merge tag 'tegra-for-6.19-core' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers
amba: Fixes for v6.19-rc1

Fix a device leak. Could go into v6.18 as a fix, but since this problem
has existed for a long time and nobody has reported it before it doesn't
seem critical enough and sufficient to get it into 6.19 and then
backported.

* tag 'tegra-for-6.19-core' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  amba: tegra-ahb: Fix device leak on SMMU enable

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 10:24:21 +01:00
Arnd Bergmann
9b26d16d79 Merge tag 'renesas-drivers-for-v6.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/drivers
Renesas driver updates for v6.19 (take two)

  - Fix accessing forbidden registers from regmap debugfs on RZ/G3E,
    RZ/G3S, RZ/V2H, and RZ/V2N.

* tag 'renesas-drivers-for-v6.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
  soc: renesas: rz-sysc: Populate readable_reg/writeable_reg in regmap config
  soc: renesas: r9a09g056-sys: Populate max_register

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 10:23:26 +01:00
Arnd Bergmann
babf4c8841 Merge tag 'samsung-drivers-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers
Samsung SoC drivers for v6.19

1. ChipID driver: Add support for identifying Exynos8890 and Exynos9610.

2. PMU driver: Allow specifying list of valid registers for the custom
   regmap used on Google GS101 SoC.  The PMU (Power Management Unit) on
   that SoC uses more complex access to registers than simple MMIO and
   invalid registers trigger aborts halting the system.

3. Few minor cleanups.

4. Several new bindings for compatible devices.

* tag 'samsung-drivers-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  dt-bindings: soc: samsung: exynos-pmu: allow mipi-phy subnode for Exynos7870 PMU
  soc: samsung: exynos-chipid: use a local dev variable
  dt-bindings: soc: samsung: exynos-sysreg: add gs101 hsi0 and misc compatibles
  dt-bindings: soc: samsung: exynos-sysreg: add power-domains
  soc: samsung: gs101-pmu: implement access tables for read and write
  soc: samsung: exynos-pmu: move some gs101 related code into new file
  soc: samsung: exynos-pmu: allow specifying read & write access tables for secure regmap
  dt-bindings: samsung: exynos-sysreg: add exynos7870 sysregs
  soc: samsung: exynos-chipid: add exynos8890 SoC support
  dt-bindings: hwinfo: samsung,exynos-chipid: add exynos8890-chipid compatible
  dt-bindings: soc: samsung: exynos-pmu: add exynos8890 compatible
  soc: samsung: exynos-pmu: Annotate online/offline functions with __must_hold
  soc: samsung: exynos-chipid: Add exynos9610 SoC support
  dt-bindings: hwinfo: samsung,exynos-chipid: add exynos9610 compatible
  dt-bindings: soc: samsung: exynos-sysreg: Add Exynos990 PERIC0/1 compatibles

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 10:21:57 +01:00
David Gow
a74b6c0e53 um: Don't rename vmap to kernel_vmap
In order to work around the existence of a vmap symbol in libpcap, the
UML makefile unconditionally redefines vmap to kernel_vmap. However,
this not only affects the actual vmap symbol, but also anything else
named vmap, including a number of struct members in DRM.

This would not be too much of a problem, since all uses are also
updated, except we now have Rust DRM bindings, which expect the
corresponding Rust structs to have 'vmap' names. Since the redefinition
applies in bindgen, but not to Rust code, we end up with errors such as:

error[E0560]: struct `drm_gem_object_funcs` has no fields named `vmap`
  --> rust/kernel/drm/gem/mod.rs:210:9

Since libpcap support was removed in commit 12b8e7e69a ("um: Remove
obsolete pcap driver"), remove the, now unnecessary, define as well.

We also take this opportunity to update the comment.

Signed-off-by: David Gow <davidgow@google.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://patch.msgid.link/20251122083213.3996586-1-davidgow@google.com
Fixes: 12b8e7e69a ("um: Remove obsolete pcap driver")
[adjust commmit message a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-25 10:18:37 +01:00
Arnd Bergmann
bcae469769 Merge tag 'memory-controller-drv-6.19-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers
Memory controller drivers for v6.19

1. Tegra drivers: Several cleanups (dev_err_probe(), error messages).
2. Renesas RPC IF: Add system suspend support.

* tag 'memory-controller-drv-6.19-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
  memory: tegra186-emc: Fix missing put_bpmp
  memory: renesas-rpc-if: Add suspend/resume support
  memory: tegra30-emc: Add the SoC model prefix to functions
  memory: tegra20-emc: Add the SoC model prefix to functions
  memory: tegra186-emc: Add the SoC model prefix to functions
  memory: tegra124-emc: Add the SoC model prefix to functions
  memory: tegra124-emc: Simplify and handle deferred probe with dev_err_probe()
  memory: tegra186-emc: Simplify and handle deferred probe with dev_err_probe()
  memory: tegra20-emc: Simplify and handle deferred probe with dev_err_probe()
  memory: tegra30-emc: Simplify and handle deferred probe with dev_err_probe()
  memory: tegra30-emc: Do not print error on icc_node_create() failure
  memory: tegra20-emc: Do not print error on icc_node_create() failure
  memory: tegra186-emc: Do not print error on icc_node_create() failure
  memory: tegra124-emc: Do not print error on icc_node_create() failure
  memory: tegra124-emc: Simplify return of emc_init()

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 09:24:27 +01:00
Arnd Bergmann
d30c7cf491 Merge tag 'renesas-drivers-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/drivers
Renesas driver updates for v6.19

  - Keep the WDTRSTCR.RESBAR2S bit in the default state on R-Car Gen4.

* tag 'renesas-drivers-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
  soc: renesas: rcar-rst: Keep RESBAR2S in default state

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 09:23:02 +01:00
Wanpeng Li
32bd348be3 KVM: Fix last_boosted_vcpu index assignment bug
In kvm_vcpu_on_spin(), the loop counter 'i' is incorrectly written to
last_boosted_vcpu instead of the actual vCPU index 'idx'. This causes
last_boosted_vcpu to store the loop iteration count rather than the
vCPU index, leading to incorrect round-robin behavior in subsequent
directed yield operations.

Fix this by using 'idx' instead of 'i' in the assignment.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20251110033232.12538-7-kernellwp@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-25 09:15:38 +01:00
Arnd Bergmann
feae664d3a Merge tag 'v6.19-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
New boards: QNAP TS233 (2-bay variant of the RK3568 NAS series) and
Asus Tinkerboard 3 + 3S.
Additional peripherals enabled on 100ASK DshanPi A1, Orange Pi 3B,
Indiedroid Nova, QNAP-TSx33 series + LED states on Radxa boards,
power-domains for the previously added RK3368 display components.

* tag 'v6.19-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (22 commits)
  arm64: dts: rockchip: enable RTC for 100ASK DshanPi A1
  arm64: dts: rockchip: enable USB for 100ASK DshanPi A1
  arm64: dts: rockchip: enable button for 100ASK DshanPi A1
  arm64: dts: rockchip: add mmc aliases for 100ASK DshanPi A1
  arm64: dts: rockchip: remove mmc max-frequency for 100ASK DshanPi A1
  arm64: dts: rockchip: Enable i2c2 on Orange Pi 3B
  arm64: dts: rockchip: Use default-state for power LED for Radxa boards
  arm64: dts: rockchip: fix PCIe 3.3V regulator voltage on 9Tripod X3568 v4
  arm64: dts: rockchip: Add power-domain to RK3368 VOP controller
  arm64: dts: rockchip: Add power-domain to RK3368 DSI controller
  arm64: dts: rockchip: Add host wake pin for wifi on Indiedroid Nova
  arm64: dts: rockchip: Correct pinctrl for pcie for Indiedroid Nova
  arm64: dts: rockchip: Define regulator for pcie2x1l2 on Indiedroid Nova
  arm64: dts: rockchip: Add clk32k_in for Indiedroid Nova
  arm64: dts: rockchip: Add Asus Tinker Board 3 and 3S device tree
  dt-bindings: arm: rockchip: Add Asus Tinker Board 3/3S
  dt-bindings: arm: rockchip: merge Asus Tinker and Tinker S
  arm64: dts: rockchip: add QNAP TS233 devicetree
  dt-bindings: arm: rockchip: add TS233 to RK3568-based QNAP NAS devices
  arm64: dts: rockchip: move common qnap tsx33 parts to dtsi
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 09:13:43 +01:00
Rob Herring (Arm)
d86a4e6458 arm64: dts: sprd: sc9860: Simplify clock nodes
The various "syscon" nodes in SC9860 are only referenced by clock
provider nodes in a 1:1 relationship, and nothing else references the
"syscon" nodes. There's no apparent reason for this split. The 2 nodes
can simply be merged into 1 node. The clock driver has supported using
either "reg" or "sprd,syscon" to access registers from the start, so
there shouldn't be any compatibility issues.

With this, DT schema warnings for missing a specific compatible with
"syscon" and non-MMIO devices on "simple-bus" are fixed.

Reviewed-by: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20251124210031.767382-2-robh@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25 09:13:11 +01:00
Richard Weinberger
84a8d467cc pagemap: update BUDDY flag documentation
Since v4.6 the BUDDY flag is set for _all_ pages in the block
and no longer just for the first one.
This change was introduced by:
commit 832fc1de01 ("/proc/kpageflags: return KPF_BUDDY for "tail" buddy pages")

Strictly speaking, this was an ABI change, but as nobody has noticed
since 2016, let's just update the documentation.

Link: https://lkml.kernel.org/r/20251122211920.3410371-1-richard@nod.at
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:56 -08:00
Youngjun Park
b7dd80f8f9 mm: swap: remove scan_swap_map_slots() references from comments
The scan_swap_map_slots() helper has been removed, but several comments
still referred to it in swap allocation and reclaim paths.  This patch
cleans up those outdated references and reflows the affected comment
blocks to match kernel coding style.

Link: https://lkml.kernel.org/r/20251031065011.40863-6-youngjun.park@lge.com
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:56 -08:00
Youngjun Park
4c239d5f59 mm: swap: change swap_alloc_slow() to void
swap_alloc_slow() does not need to return a bool, as all callers handle
allocation results via the entry parameter.  Update the function signature
and remove return statements accordingly.

Link: https://lkml.kernel.org/r/20251031065011.40863-5-youngjun.park@lge.com
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Reviewed-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:56 -08:00
Youngjun Park
f1bae15c6a mm, swap: remove redundant comment for read_swap_cache_async
The function now manages get/put_swap_device() internally, making the
comment explaining this behavior to callers unnecessary.

Link: https://lkml.kernel.org/r/20251031065011.40863-4-youngjun.park@lge.com
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:56 -08:00
Youngjun Park
68f78bf55b mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
The current non rotational check is unreliable as the device's rotational
status can be changed by a user via sysfs.

Use the more reliable SWP_SOLIDSTATE flag which is set at swapon time, to
ensure the nr_rotate_swap count remains consistent.  Plus, it is easy to
read and simple.

Link: https://lkml.kernel.org/r/20251031065011.40863-3-youngjun.park@lge.com
Fixes: 81a0298bdf ("mm, swap: don't use VMA based swap readahead if HDD is used as swap")
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:56 -08:00
Youngjun Park
cb65082a0a mm, swap: fix memory leak in setup_clusters() error path
Patch series "mm: swap: small fixes and comment cleanups", v2.

This series provides a few small fixes and cleanups for the swap code.

The first patch fixes a memory leak in an error path that was recently
introduced. The subsequent patches include minor logic adjustments and
the removal of redundant comments.


This patch (of 5):

setup_clusters() could leak 'cluster_info' memory if an error occurred on
a path that did not jump to the 'err_free' label.

This patch simplifies the error handling by removing the goto label and
instead calling free_cluster_info() on all error exit paths.

The new logic is safe, as free_cluster_info() already handles NULL pointer
inputs.

Link: https://lkml.kernel.org/r/20251031065011.40863-1-youngjun.park@lge.com
Link: https://lkml.kernel.org/r/20251031065011.40863-2-youngjun.park@lge.com
Fixes: 07adc4cf1e ("mm, swap: implement dynamic allocation of swap table")
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Reviewed-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:56 -08:00
Youngjun Park
c230719523 mm/swap: fix wrong plist empty check in swap_alloc_slow()
swap_alloc_slow() was checking `si->avail_list` instead of
`next->avail_list` when verifying if the next swap device is still in the
list, which could cause unnecessary restarts during allocation.

Link: https://lkml.kernel.org/r/20251119114136.594108-1-youngjun.park@lge.com
Fixes: 8e689f8ea4 ("mm/swap: do not choose swap device according to numa node")
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Acked-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:56 -08:00
Dan Carpenter
a9ce09b157 mm/damon/tests/sysfs-kunit: fix use after free on error path
Re-order these frees to avoid dereferencing "sysfs_target" after it has
been freed.

Link: https://lkml.kernel.org/r/aSBq5uSPIqsqH8zO@stanley.mountain
Fixes: ee131696794c ("mm/damon/tests/sysfs-kunit: handle alloc failures on damon_sysfs_test_add_targets()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Gregory Price
348ced3da5 hugetlb: add __read_mostly to sysctl_hugetlb_shm_group
sysctl bits are mostly-read values.

Link: https://lkml.kernel.org/r/20251121194859.265259-2-gourry@gourry.net
Signed-off-by: Gregory Price <gourry@gourry.net>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Lorenzo Stoakes
ccf9eb326b tools/testing/vma: add missing stub
vm_flags_reset() is not available in the userland VMA tests, so add a stub
which const-casts vma->vm_flags and avoids the upcoming removal of the
vma->__vm_flags field.

Link: https://lkml.kernel.org/r/4aff8bf7-d367-4ba3-90ad-13eef7a063fa@lucifer.local
Fixes: c5c67c1de357 ("tools/testing/vma: eliminate dependency on vma->__vm_flags")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Yuwen Chen
04d31610a7 zram: fix the issue that the write - back limits might overflow
When the page size exceeds 4KB, if bd_wb_limit is set to a value that is
not aligned with the page size, it will cause a numerical wrap-around
issue for bd_wb_limit.  For example, when the page size is set to 16KB and
bd_wb_limit is set to 3, after one write-back operation, the value of
bd_wb_limit will become -1.  More seriously, since bd_wb_limit is an
unsigned number, its value may become as large as 2^64 - 1.

The core reason for this problem is that the unit of bd_wb_limit is 4KB. 
For example, when a write-back occurs on a system with a page size of
16KB, 4 needs to be subtracted from bd_wb_limit.  This operation takes
place in the zram_account_writeback_submit function.

This patch fixes the issue by limiting bd_wb_limit to be an integer
multiple of PAGE_SIZE / 4096.

Link: https://lkml.kernel.org/r/tencent_5936CFE72BAB2BA76887BB69DCC1B5E67C05@qq.com
Fixes: 1d69a3f8ae ("zram: idle writeback fixes and cleanup")
Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Richard Chang <richardycc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Matthew Wilcox (Oracle)
ecf371b2ca mm: tweak __vma_enter_locked()
Move the commentary on how __vma_enter_locked() behaves from the body of
__vma_start_write() to the head of __vma_enter_locked() and merge it with
the existing documentation.  Also add a call to
mmap_assert_write_locked().

Link: https://lkml.kernel.org/r/20251119042639.3937024-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Xie Yuanbin
31807483d3 mm/memory-failure: remove the selection of RAS
commit 97f0b13452 ("tracing: add trace event for
memory-failure") introduces the selection of RAS in memory-failure.  This
commit is just a tracing feature; in reality, there is no dependency
between memory-failure and RAS.  RAS increases the size of the bzImage
image by 8k, which is very valuable for embedded devices.

Move the memory-failure traceing code from ras_event.h to
memory-failure.h and remove the selection of RAS.

Link: https://lkml.kernel.org/r/20251119095943.67125-1-xieyuanbin1@huawei.com
Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Chunyan Zhang
519912bdae dt-bindings: riscv: Add Svrsw60t59b extension description
Add description for the Svrsw60t59b extension (PTE Reserved for SW
bits 60:59) extension which was ratified recently in
riscv-non-isa/riscv-iommu.

Link: https://lkml.kernel.org/r/20251113072806.795029-7-zhangchunyan@iscas.ac.cn
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Chunyan Zhang
c64da3950c riscv: mm: add userfaultfd write-protect support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59
for software, this patch uses bit 60 for uffd-wp tracking

Additionally for tracking the uffd-wp state as a PTE swap bit, we borrow
bit 4 which is not involved into swap entry computation.

Link: https://lkml.kernel.org/r/20251113072806.795029-6-zhangchunyan@iscas.ac.cn
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Chunyan Zhang
2a3ebad4db riscv: mm: add soft-dirty page tracking support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59
for software, this patch uses bit 59 for soft-dirty.

To add swap PTE soft-dirty tracking, we borrow bit 3 which is available
for swap PTEs on RISC-V systems.

Link: https://lkml.kernel.org/r/20251113072806.795029-5-zhangchunyan@iscas.ac.cn
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Chunyan Zhang
59f6acb4be riscv: add RISC-V Svrsw60t59b extension support
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59
for software to use.

Link: https://lkml.kernel.org/r/20251113072806.795029-4-zhangchunyan@iscas.ac.cn
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Chunyan Zhang
f59c0924d6 mm: userfaultfd: add pgtable_supports_uffd_wp()
Some platforms can customize the PTE/PMD entry uffd-wp bit making it
unavailable even if the architecture provides the resource.  This patch
adds a macro API pgtable_supports_uffd_wp() that allows architectures to
define their specific implementations to check if the uffd-wp bit is
available on which device the kernel is running.

Also this patch is removing "ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP" and
"ifdef CONFIG_PTE_MARKER_UFFD_WP" in favor of pgtable_supports_uffd_wp()
and uffd_supports_wp_marker() checks respectively that default to
IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP) and
"IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP) &&
IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP)" if not overridden by the
architecture, no change in behavior is expected.

Link: https://lkml.kernel.org/r/20251113072806.795029-3-zhangchunyan@iscas.ac.cn
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Chunyan Zhang
277a1ae387 mm: softdirty: add pgtable_supports_soft_dirty()
Patch series "mm: Add soft-dirty and uffd-wp support for RISC-V", v15.

This patchset adds support for Svrsw60t59b [1] extension which is ratified
now, also add soft dirty and userfaultfd write protect tracking for
RISC-V.

The patches 1 and 2 add macros to allow architectures to define their own
checks if the soft-dirty / uffd_wp PTE bits are available, in other words
for RISC-V, the Svrsw60t59b extension is supported on which device the
kernel is running.  Also patch1-2 are removing "ifdef
CONFIG_MEM_SOFT_DIRTY" "ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP" and "ifdef
CONFIG_PTE_MARKER_UFFD_WP" in favor of checks which if not overridden by
the architecture, no change in behavior is expected.

This patchset has been tested with kselftest mm suite in which soft-dirty,
madv_populate, test_unmerge_uffd_wp, and uffd-unit-tests run and pass, and
no regressions are observed in any of the other tests.


This patch (of 6):

Some platforms can customize the PTE PMD entry soft-dirty bit making it
unavailable even if the architecture provides the resource.

Add an API which architectures can define their specific implementations
to detect if soft-dirty bit is available on which device the kernel is
running.

This patch is removing "ifdef CONFIG_MEM_SOFT_DIRTY" in favor of
pgtable_supports_soft_dirty() checks that defaults to
IS_ENABLED(CONFIG_MEM_SOFT_DIRTY), if not overridden by the architecture,
no change in behavior is expected.

We make sure to never set VM_SOFTDIRTY if !pgtable_supports_soft_dirty(),
so we will never run into VM_SOFTDIRTY checks.

[lorenzo.stoakes@oracle.com: fix VMA selftests]
  Link: https://lkml.kernel.org/r/dac6ddfe-773a-43d5-8f69-021b9ca4d24b@lucifer.local
Link: https://lkml.kernel.org/r/20251113072806.795029-1-zhangchunyan@iscas.ac.cn
Link: https://lkml.kernel.org/r/20251113072806.795029-2-zhangchunyan@iscas.ac.cn
Link: https://github.com/riscv-non-isa/riscv-iommu/pull/543 [1]
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor@kernel.org>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Vishal Moola (Oracle)
d85b653f2c mm/vmalloc: cleanup gfp flag use in new_vmap_block()
The only caller, vb_alloc(), passes GFP_KERNEL into new_vmap_block() which
is a subset of GFP_RECLAIM_MASK.  Since there's no reason to use this mask
here, remove it.

Link: https://lkml.kernel.org/r/20251121094405.40628-5-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Vishal Moola (Oracle)
75f20b1744 mm/vmalloc: cleanup large_gfp in vm_area_alloc_pages()
Now that we have already checked for unsupported flags, we can use the
helper function to set the necessary gfp flags for the large order
allocation optimization.

Link: https://lkml.kernel.org/r/20251121094405.40628-4-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Vishal Moola (Oracle)
bb4d3c7686 mm/vmalloc: add a helper to optimize vmalloc allocation gfps
vm_area_alloc_pages() attempts to use different gfp flags as a way to
optimize allocations.  This has been done inline which makes things harder
to read.

Add a helper function to make the code more readable.

Link: https://lkml.kernel.org/r/20251121094405.40628-3-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Vishal Moola (Oracle)
07003531e0 mm/vmalloc: warn on invalid vmalloc gfp flags
Patch series "make vmalloc gfp flags usage more apparent", v4.

We should do a better job at enforcing gfp flags for vmalloc.  Right now,
we have a kernel-doc for __vmalloc_node_range(), and hope callers pass in
supported flags.  If a caller were to pass in an unsupported flag, we may
BUG, silently clear it, or completely ignore it.

If we are more proactive about enforcing gfp flags, we can making sure
callers know when they may be asking for unsupported behavior.

This patchset lets vmalloc control the incoming gfp flags, and cleans up
some hard to read gfp code.


This patch (of 4):

Vmalloc explicitly supports a list of flags, but we never enforce them. 
vmalloc has been trying to handle unsupported flags by clearing and
setting flags wherever necessary.  This is messy and makes the code harder
to understand, when we could simply check for a supported input
immediately instead.

Define a helper mask and function telling callers they have passed in
invalid flags, and clear those unsupported vmalloc flags.

Link: https://lkml.kernel.org/r/20251121094405.40628-1-vishal.moola@gmail.com
Link: https://lkml.kernel.org/r/20251121094405.40628-2-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Acked-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Shakeel Butt
c1bd09994c memcg: remove __lruvec_stat_mod_folio
__lruvec_stat_mod_folio() is already safe against irqs, so there is no
need to have a separate interface (i.e.  lruvec_stat_mod_folio) which
wraps calls to it with irq disabling and reenabling.  Let's rename
__lruvec_stat_mod_folio() to lruvec_stat_mod_folio().

Link: https://lkml.kernel.org/r/20251110232008.1352063-5-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Shakeel Butt
5b3eb779a2 memcg: remove __mod_lruvec_state
__mod_lruvec_state() is already safe against irqs, so there is no need to
have a separate interface (i.e.  mod_lruvec_state) which wraps calls to it
with irq disabling and reenabling.  Let's rename __mod_lruvec_state() to
mod_lruvec_state().

Link: https://lkml.kernel.org/r/20251110232008.1352063-4-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Shakeel Butt
469241fe76 memcg: remove __mod_lruvec_kmem_state
__mod_lruvec_kmem_state() is already safe against irqs, so there is no
need to have a separate interface (i.e.  mod_lruvec_kmem_state) which
wraps calls to it with irq disabling and reenabling.  Let's rename
__mod_lruvec_kmem_state() to mod_lruvec_kmem_state().

Link: https://lkml.kernel.org/r/20251110232008.1352063-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Shakeel Butt
7e44d00a13 memcg: use mod_node_page_state to update stats
Patch series "memcg: cleanup the memcg stats interfaces".

The memcg stats are safe against irq (and nmi) context and thus does not
require disabling irqs.  However for some stats which are also maintained
at node level, it is using irq unsafe interface and thus requiring the
users to still disables irqs or use interfaces which explicitly disables
irqs.  Let's move memcg code to use irq safe node level stats function
which is already optimized for architectures with HAVE_CMPXCHG_LOCAL (all
major ones), so there will not be any performance penalty for its usage.


This patch (of 4):

The memcg stats are safe against irq (and nmi) context and thus does not
require disabling irqs.  However some code paths for memcg stats also
update the node level stats and use irq unsafe interface and thus require
the users to disable irqs.  However node level stats, on architectures
with HAVE_CMPXCHG_LOCAL (all major ones), has interface which does not
require irq disabling.  Let's move memcg stats code to start using that
interface for node level stats.

Link: https://lkml.kernel.org/r/20251110232008.1352063-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20251110232008.1352063-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Peng Li
3e700b715e selftests/mm: gup_test: fix comment regarding origin of FOLL_WRITE
The 'FOLL_WRITE' of the copied source is located in mm_types.h of mm, not
mm.h, so fix it.

Link: https://lkml.kernel.org/r/20251117154012.197499-2-peng8420.li@gmail.com
Signed-off-by: Peng Li <peng8420.li@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Peng Li
218fbfad16 selftests/mm: gup_test: stop testing FOLL_TOUCH
commit 0f20bba168 ("mm/gup: explicitly define and check internal GUP
flags, disallow FOLL_TOUCH") marked FOLL_TOUCH as a GUP-internal flag.

This causes a warning to fire when running gup_test, for example:

    $ ./gup_test -L -r 100 -z

dmesg:
    WARNING: CPU: 1 PID: 117 at mm/gup.c:2512 is_valid_gup_args+0x66/0x8c

Therefore, remove the "FOLL_TOUCH" test code from gup_test.c.

Link: https://lkml.kernel.org/r/20251117154012.197499-1-peng8420.li@gmail.com
Signed-off-by: Peng Li <peng8420.li@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Balbir Singh
cab812d9c9 mm/huge_memory.c: introduce folio_split_unmapped
Unmapped was added as a parameter to __folio_split() and related call
sites to support splitting of folios already in the midst of a migration. 
This special case arose for device private folio migration since during
migration there could be a disconnect between source and destination on
the folio size.

Introduce folio_split_unmapped() to handle this special case.  Also
refactor code and add __folio_freeze_and_split_unmapped() helper that is
common to both __folio_split() and folio_split_unmapped().

This in turn removes the special casing introduced by the unmapped
parameter in __folio_split().

[balbirs@nvidia.com: v2]
  Link: https://lkml.kernel.org/r/20251115084041.3914728-1-balbirs@nvidia.com
[balbirs@nvidia.com: fix clang-20 build]
  Link: https://lkml.kernel.org/r/20251120134232.3588203-1-balbirs@nvidia.com
[akpm@linux-foundation.org: add `inline' to shmem_uncharge() stub, per Balbir]
Link: https://lkml.kernel.org/r/20251114012228.2634882-1-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Suggested-by: Zi Yan <ziy@nvidia.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Baolin Wang
8826f09616 mm: shmem: allow fallback to smaller large orders for tmpfs mmap() access
After commit 69e0a3b490 ("mm: shmem: fix the strategy for the tmpfs
'huge=' options"), we have fixed the large order allocation strategy for
tmpfs, which always tries PMD-sized large folios first, and if that fails,
falls back to smaller large folios.  For tmpfs large folio allocation via
mmap(), we should maintain the same strategy as well.  Let's unify the
large order allocation strategy for tmpfs.

There is no functional change for large folio allocation of anonymous shmem.

Link: https://lkml.kernel.org/r/283a0bdfd6ac7aa334a491422bcae70919c572bd.1763008453.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Sergey Senozhatsky
1b1a4e4d67 zram: read slot block idx under slot lock
Read slot's block id under slot-lock.  We release the slot-lock for bdev
read so, technically, slot still can get freed in the meantime, but at
least we will read bdev block (page) that holds previous know slot data,
not from slot->handle bdev block, which can be anything at that point.

Link: https://lkml.kernel.org/r/20251122074029.3948921-7-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Yuwen Chen <ywen.chen@foxmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Sergey Senozhatsky
e87ddea345 zram: rework bdev block allocation
First, writeback bdev ->bitmap bits are set only from one context, as we
can have only one single task performing writeback, so we cannot race with
anything else.  Remove retry path.

Second, we always check ZRAM_WB flag to distinguish writtenback slots, so
we should not confuse 0 bdev block index and 0 handle.  We can use first
bdev block (0 bit) for writeback as well.

While at it, give functions slightly more accurate names, as we don't
alloc/free anything there, we reserve a block for async writeback or
release the block.

Link: https://lkml.kernel.org/r/20251122074029.3948921-6-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Yuwen Chen <ywen.chen@foxmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Sergey Senozhatsky
a4f506c569 zram: drop wb_limit_lock
We don't need wb_limit_lock.  Writeback limit setters take an exclusive
write zram init_lock, while wb_limit modifications happen only from a
single task and under zram read init_lock.  No concurrent wb_limit
modifications are possible (we permit only one post-processing task at a
time).  Add lockdep assertions to wb_limit mutators.

While at it, fixup coding styles.

Link: https://lkml.kernel.org/r/20251122074029.3948921-5-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Yuwen Chen <ywen.chen@foxmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Sergey Senozhatsky
7c929664fd zram: take write lock in wb limit store handlers
Write device attrs handlers should take write zram init_lock.  While at
it, fixup coding styles.

Link: https://lkml.kernel.org/r/20251122074029.3948921-4-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Yuwen Chen <ywen.chen@foxmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Sergey Senozhatsky
e828cccb72 zram: add writeback batch size device attr
Introduce writeback_batch_size device attribute so that the maximum number
of in-flight writeback bio requests can be configured at run-time
per-device.  This essentially enables batched bio writeback.

Link: https://lkml.kernel.org/r/20251122074029.3948921-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Yuwen Chen <ywen.chen@foxmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Sergey Senozhatsky
f405066a1f zram: introduce writeback bio batching
Patch series "zram: introduce writeback bio batching", v6.

As writeback is becoming more and more common the longstanding limitations
of zram writeback throughput are becoming more visible.  Introduce
writeback bio batching so that multiple writeback bios can be processed
simultaneously.


This patch (of 6):

As was stated in a comment [1] a single page writeback IO is not
efficient, but it works.  It's time to address this throughput limitation
as writeback becomes used more often.  Introduce batched (multiple) bio
writeback support to take advantage of parallel requests processing and
better requests scheduling.

Approach used in this patch doesn't use a dedicated kthread like in [2],
or blk-plug like in [3].  Dedicated kthread adds complexity, which can be
avoided.  Apart from that not all zram setups use writeback, so having
numerous per-device kthreads (on systems that create multiple zram
devices) hanging around is not the most optimal thing to do.  blk-plug, on
the other hand, works best when request are sequential, which doesn't
particularly fit zram writebck IO patterns: zram writeback IO patterns are
expected to be random, due to how bdev block reservation/release are
handled.  blk-plug approach also works in cycles: idle IO, when zram sets
up requests in a batch, is followed by bursts of IO, when zram submits the
entire batch.

Instead we use a batch of requests and submit new bio as soon as one of
the in-flight requests completes.

For the time being the writeback batch size (maximum number of in-flight
bio requests) is set to 32 for all devices.  A follow up patch adds a
writeback_batch_size device attribute, so the batch size becomes run-time
configurable.

Link: https://lkml.kernel.org/r/20251122074029.3948921-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20251122074029.3948921-2-senozhatsky@chromium.org
Link: https://lore.kernel.org/all/20181203024045.153534-6-minchan@kernel.org/ [1]
Link: https://lore.kernel.org/all/20250731064949.1690732-1-richardycc@google.com/ [2]
Link: https://lore.kernel.org/all/tencent_78FC2C4FE16BA1EBAF0897DB60FCD675ED05@qq.com/ [3]
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Co-developed-by: Yuwen Chen <ywen.chen@foxmail.com>
Co-developed-by: Richard Chang <richardycc@google.com>
Suggested-by: Minchan Kim <minchan@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Richard Chang <richardycc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Wei Yang
9e01407708 mm/khugepaged: unify SCAN_PMD_NONE and SCAN_PMD_NULL into SCAN_NO_PTE_TABLE
The current hugepage collapse scan results include two separate values,
SCAN_PMD_NONE and SCAN_PMD_NULL, which are handled identically by the
consuming code.

To reduce confusion and improve long-term maintenance, this commit merges
these two functionally equivalent states into a single, clearer
identifier: SCAN_NO_PTE_TABLE

Link: https://lkml.kernel.org/r/20251114030028.7035-4-richard.weiyang@gmail.com
Suggested-by: "David Hildenbrand (Red Hat)" <david@kernel.org>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Wei Yang
f1040f8898 mm/khugepaged: continue to collapse on SCAN_PMD_NONE
SCAN_PMD_NONE means current pmd is empty, but we can still continue
collapse next pmd range.

Link: https://lkml.kernel.org/r/20251114030028.7035-3-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Wei Yang
eaa4c8063f mm/khugepaged: remove redundant clearing of struct collapse_control
Patch series "unify PMD scan results and remove redundant cleanup", v2.

This small series addresses two minor cleanup opportunities in the
hugepage collapse logic.

The initial motivation arose during a code review of madvise_collapse(),
where it was noted that the function was missing a handler for
SCAN_PMD_NONE.  This oversight exposed the inconsistent handling of
SCAN_PMD_NULL and SCAN_PMD_NONE.

Since both scan results are functionally identical (they indicate the
absence of a PTE table), the primary patch unifies them into a single,
clearer identifier, SCAN_NO_PTE_TABLE.

The series also takes the opportunity to remove a redundant clearing of
the struct collapse_control.


This patch (of 3):

The structure struct collapse_control is being unnecessarily cleared twice
during the huge page collapse process.

Both hpage_collapse_scan_file() and hpage_collapse_scan_pmd() currently
perform a clear operation on this structure.

Remove the redundant clear operation.

Link: https://lkml.kernel.org/r/20251114030028.7035-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20251114030028.7035-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Qi Zheng
46156dba32 mm: thp: reparent the split queue during memcg offline
Similar to list_lru, the split queue is relatively independent and does
not need to be reparented along with objcg and LRU folios (holding objcg
lock and lru lock).  So let's apply the similar mechanism as list_lru to
reparent the split queue separately when memcg is offine.

This is also a preparation for reparenting LRU folios.

Link: https://lkml.kernel.org/r/8703f907c4d1f7e8a2ef2bfed3036a84fa53028b.1762762324.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Muchun Song
776bde7caf mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
The maintenance of the folio->_deferred_list is intricate because it's
reused in a local list.

Here are some peculiarities:

   1) When a folio is removed from its split queue and added to a local
      on-stack list in deferred_split_scan(), the ->split_queue_len isn't
      updated, leading to an inconsistency between it and the actual
      number of folios in the split queue.

   2) When the folio is split via split_folio() later, it's removed from
      the local list while holding the split queue lock. At this time,
      the lock is not needed as it is not protecting anything.

   3) To handle the race condition with a third-party freeing or migrating
      the preceding folio, we must ensure there's always one safe (with
      raised refcount) folio before by delaying its folio_put(). More
      details can be found in commit e66f3185fa ("mm/thp: fix deferred
      split queue not partially_mapped"). It's rather tricky.

We can use the folio_batch infrastructure to handle this clearly.  In this
case, ->split_queue_len will be consistent with the real number of folios
in the split queue.  If list_empty(&folio->_deferred_list) returns false,
it's clear the folio must be in its split queue (not in a local list
anymore).

In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the split queue
to its parent first.  So this patch prepares for using
folio_split_queue_lock_irqsave() as the memcg may change then.

Link: https://lkml.kernel.org/r/59cb6b6fb5ffcff9d23b81890b252960139ad8e7.1762762324.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nico Pache <npache@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Muchun Song
ad7c7f4576 mm: thp: introduce folio_split_queue_lock and its variants
In future memcg removal, the binding between a folio and a memcg may
change, making the split lock within the memcg unstable when held.

A new approach is required to reparent the split queue to its parent. 
This patch starts introducing a unified way to acquire the split lock for
future work.

It's a code-only refactoring with no functional changes.

Link: https://lkml.kernel.org/r/a31a90bcac04dc754f775e87ae3205be3170b571.1762762324.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nico Pache <npache@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Muchun Song
fd603ae11e mm: thp: replace folio_memcg() with folio_memcg_charged()
Patch series "reparent the THP split queue", v6.

In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the THP split
queue to its parent memcg.

Similar to list_lru, the split queue is relatively independent and does
not need to be reparented along with objcg and LRU folios (holding objcg
lock and lru lock).  Therefore, we can apply the same mechanism as
list_lru to reparent the split queue first when memcg is offine.

The first three patches in this series are separated from the series
"Eliminate Dying Memory Cgroup" [1], mainly to do some cleanup and
preparatory work.

The last patch reparents the THP split queue to its parent memcg during
memcg offline.


This patch (of 4):

folio_memcg_charged() is intended for use when the user is unconcerned
about the returned memcg pointer.  It is more efficient than
folio_memcg().  Therefore, replace folio_memcg() with
folio_memcg_charged().

Link: https://lkml.kernel.org/r/56624d537520e33e5a6b3755238b3dfb959a52ee.1762762324.git.zhengqi.arch@bytedance.com
Link: https://lore.kernel.org/all/20250415024532.26632-1-songmuchun@bytedance.com/ [1]
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Lorenzo Stoakes
a3a3e215c9 mm: replace remaining pte_to_swp_entry() with softleaf_from_pte()
There are straggler invocations of pte_to_swp_entry() lying around,
replace all of these with the software leaf entry equivalent -
softleaf_from_pte().

With those removed, eliminate pte_to_swp_entry() altogether.

No functional change intended.

Link: https://lkml.kernel.org/r/d8ee5ccefe4c42d7c4fe1a2e46f285ac40421cd3.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Lorenzo Stoakes
93976a2034 mm: eliminate further swapops predicates
Having converted so much of the code base to software leaf entries, we can
mop up some remaining cases.

We replace is_pfn_swap_entry(), pfn_swap_entry_to_page(),
is_writable_device_private_entry(), is_device_exclusive_entry(),
is_migration_entry(), is_writable_migration_entry(),
is_readable_migration_entry(), swp_offset_pfn() and pfn_swap_entry_folio()
with softleaf equivalents.

No functional change intended.

Link: https://lkml.kernel.org/r/956bc9c031604811c0070d2f4bf2f1373f230213.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:52 -08:00
Lorenzo Stoakes
03bfbc3ad6 mm: remove is_hugetlb_entry_[migration, hwpoisoned]()
We do not need to have explicit helper functions for these, it adds a
level of confusion and indirection when we can simply use software leaf
entry logic here instead and spell out the special huge_pte_none() case we
must consider.

No functional change intended.

Link: https://lkml.kernel.org/r/0e92d6924d3de88cd014ce1c53e20edc08fc152e.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
9ff30bb9ab mm: remove non_swap_entry() and use softleaf helpers instead
There is simply no need for the hugely confusing concept of 'non-swap'
swap entries now we have the concept of softleaf entries and relevant
softleaf_xxx() helpers.

Adjust all callers to use these instead and remove non_swap_entry()
altogether.

No functional change intended.

Link: https://lkml.kernel.org/r/2562093f37f4a9cffea0447058014485eb50aaaf.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
c0a80c2ce6 mm: remove remaining is_swap_pmd() users and is_swap_pmd()
Update copy_huge_pmd() and change_huge_pmd() to use
pmd_is_valid_softleaf() - as this checks for the only valid non-present
huge PMD states.

Also update mm/debug_vm_pgtable.c to explicitly test for a valid leaf PMD
entry (which it was not before, which was incorrect), and have it test
against pmd_is_huge() and pmd_is_valid_softleaf() rather than
is_swap_pmd().

With these changes done there are no further users of is_swap_pmd(), so
remove it.

Link: https://lkml.kernel.org/r/1628b00b00c8498bbd2c20b82117ee87845fb738.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
15eabc898d mm: introduce pmd_is_huge() and use where appropriate
The leaf entry PMD case is confusing as only migration entries and device
private entries are valid at PMD level, not true swap entries.

We repeatedly perform checks of the form is_swap_pmd() || pmd_trans_huge()
which is itself confusing - it implies that leaf entries at PMD level
exist and are different from huge entries.

Address this confusion by introduced pmd_is_huge() which checks for either
case.  Sadly due to header dependency issues (huge_mm.h is included very
early on in headers and cannot really rely on much else) we cannot use
pmd_is_valid_softleaf() here.

However since these are the only valid, handled cases the function is
still achieving what it intends to do.

We then replace all instances of is_swap_pmd() || pmd_trans_huge() with
pmd_is_huge() invocations and adjust logic accordingly to accommodate
this.

No functional change intended.

Link: https://lkml.kernel.org/r/00f79db3b15293cac8f7040a48d69c52d00117e4.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
0ac881efe1 mm: replace pmd_to_swp_entry() with softleaf_from_pmd()
Introduce softleaf_from_pmd() to do the equivalent operation for PMDs that
softleaf_from_pte() fulfils, and cascade changes through code base
accordingly, introducing helpers as necessary.

We are then able to eliminate pmd_to_swp_entry(),
is_pmd_migration_entry(), is_pmd_device_private_entry() and
is_pmd_non_present_folio_entry().

This further establishes the use of leaf operations throughout the code
base and further establishes the foundations for eliminating
is_swap_pmd().

No functional change intended.

[lorenzo.stoakes@oracle.com: check writable, not readable/writable, per Vlastimil]
  Link: https://lkml.kernel.org/r/cd97b6ec-00f9-45a4-9ae0-8f009c212a94@lucifer.local
Link: https://lkml.kernel.org/r/3fb431699639ded8fdc63d2210aa77a38c8891f1.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: SeongJae Park <sj@kernel.org>\
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
5dfa791605 mm/huge_memory: refactor change_huge_pmd() non-present logic
Similar to copy_huge_pmd(), there is a large mass of open-coded logic for
the CONFIG_ARCH_ENABLE_THP_MIGRATION non-present entry case that does not
use thp_migration_supported() consistently.

Resolve this by separating out this logic and introduce
change_non_present_huge_pmd().

No functional change intended.

Link: https://lkml.kernel.org/r/451b85636ad711e307fdfbff19af699fdab4d05f.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
e244d82d02 mm/huge_memory: refactor copy_huge_pmd() non-present logic
Right now we are inconsistent in our use of thp_migration_supported():

static inline bool thp_migration_supported(void)
{
	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
}

And simply having arbitrary and ugly #ifdef
CONFIG_ARCH_ENABLE_THP_MIGRATION blocks in code.

This is exhibited in copy_huge_pmd(), which inserts a large #ifdef
CONFIG_ARCH_ENABLE_THP_MIGRATION block and an if-branch which is difficult
to follow

It's difficult to follow the logic of such a large function and the
non-present PMD logic is clearly separate as it sits in a giant if-branch.

Therefore this patch both separates out the logic and utilises
thp_migration_supported().

No functional change intended.

Link: https://lkml.kernel.org/r/6eaadc23ed512d370ede65561e34e96241c54b9d.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
aa62204cb6 mm: avoid unnecessary use of is_swap_pmd()
PMD 'non-swap' swap entries are currently used for PMD-level migration
entries and device private entries.

To add to the confusion in this terminology we use is_swap_pmd() in an
inconsistent way similar to how is_swap_pte() was being used - sometimes
adopting the convention that !pmd_none(), !pmd_present() implies PMD 'swap'
entry, sometimes not.

This patch handles the low-hanging fruit of cases where we can simply
substitute other predicates for is_swap_pmd().

No functional change intended.

Link: https://lkml.kernel.org/r/8a1704b36a009c18032d5bea4cb68e71448fbbe5.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
de4d6c9491 fs/proc/task_mmu: refactor pagemap_pmd_range()
Separate out THP logic so we can drop an indentation level and reduce the
amount of noise in this function.

We add pagemap_pmd_range_thp() for this purpose.

While we're here, convert the VM_BUG_ON() to a VM_WARN_ON_ONCE() at the
same time.

No functional change intended.

Link: https://lkml.kernel.org/r/f9ce7f3bb57e3627288225e23f2498cc5315f5ab.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:51 -08:00
Lorenzo Stoakes
fb410d8b89 mm: use leaf entries in debug pgtable + remove is_swap_pte()
Remove invocations of is_swap_pte() in mm/debug_vm_pgtable.c and use
softleaf_from_pte() and softleaf_is_swap() as necessary to replace this
usage.

We update the test code to use a 'true' swap entry throughout so we are
guaranteed this is not a non-swap entry, so all asserts continue to
operate correctly.

With this change in place, we no longer use is_swap_pte() anywhere, so
remove it.

Link: https://lkml.kernel.org/r/222f352e7a99191b4bdfa77e835f2fc0dd83fa72.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:50 -08:00
Lorenzo Stoakes
06fb61462b mm: eliminate is_swap_pte() when softleaf_from_pte() suffices
In cases where we can simply utilise the fact that softleaf_from_pte()
treats present entries as if they were none entries and thus eliminate
spurious uses of is_swap_pte(), do so.

No functional change intended.

Link: https://lkml.kernel.org/r/92ebab9567978155116804c67babc3c64636c403.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:50 -08:00
Lorenzo Stoakes
fb888710e2 mm: avoid unnecessary uses of is_swap_pte()
There's an established convention in the kernel that we treat PTEs as
containing swap entries (and the unfortunately named non-swap swap
entries) should they be neither empty (i.e.  pte_none() evaluating true)
nor present (i.e.  pte_present() evaluating true).

However, there is some inconsistency in how this is applied, as we also
have the is_swap_pte() helper which explicitly performs this check:

	/* check whether a pte points to a swap entry */
	static inline int is_swap_pte(pte_t pte)
	{
		return !pte_none(pte) && !pte_present(pte);
	}

As this represents a predicate, and it's logical to assume that in order
to establish that a PTE entry can correctly be manipulated as a
swap/non-swap entry, this predicate seems as if it must first be checked.

But we instead, we far more often utilise the established convention of
checking pte_none() / pte_present() before operating on entries as if they
were swap/non-swap.

This patch works towards correcting this inconsistency by removing all
uses of is_swap_pte() where we are already in a position where we perform
pte_none()/pte_present() checks anyway or otherwise it is clearly logical
to do so.

We also take advantage of the fact that pte_swp_uffd_wp() is only set on
swap entries.

Additionally, update comments referencing to is_swap_pte() and
non_swap_entry().

No functional change intended.

Link: https://lkml.kernel.org/r/17fd6d7f46a846517fd455fadd640af47fcd7c55.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:50 -08:00
Lorenzo Stoakes
68aa2fdbf5 mm: introduce leaf entry type and use to simplify leaf entry logic
The kernel maintains leaf page table entries which contain either:

The kernel maintains leaf page table entries which contain either:

 - Nothing ('none' entries)
 - Present entries*
 - Everything else that will cause a fault which the kernel handles

* Present entries are either entries the hardware can navigate without page
  fault or special cases like NUMA hint protnone or PMD with cleared
  present bit which contain hardware-valid entries modulo the present bit.

In the 'everything else' group we include swap entries, but we also
include a number of other things such as migration entries, device private
entries and marker entries.

Unfortunately this 'everything else' group expresses everything through a
swp_entry_t type, and these entries are referred to swap entries even
though they may well not contain a...  swap entry.

This is compounded by the rather mind-boggling concept of a non-swap swap
entry (checked via non_swap_entry()) and the means by which we twist and
turn to satisfy this.

This patch lays the foundation for reducing this confusion.

We refer to 'everything else' as a 'software-define leaf entry' or
'softleaf'.  for short And in fact we scoop up the 'none' entries into
this concept also so we are left with:

- Present entries.
- Softleaf entries (which may be empty).

This allows for radical simplification across the board - one can simply
convert any leaf page table entry to a leaf entry via softleaf_from_pte().

If the entry is present, we return an empty leaf entry, so it is assumed
the caller is aware that they must differentiate between the two
categories of page table entries, checking for the former via
pte_present().

As a result, we can eliminate a number of places where we would otherwise
need to use predicates to see if we can proceed with leaf page table entry
conversion and instead just go ahead and do it unconditionally.

We do so where we can, adjusting surrounding logic as necessary to
integrate the new softleaf_t logic as far as seems reasonable at this
stage.

We typedef swp_entry_t to softleaf_t for the time being until the
conversion can be complete, meaning everything remains compatible
regardless of which type is used.  We will eventually remove swp_entry_t
when the conversion is complete.

We introduce a new header file to keep things clear - leafops.h - this
imports swapops.h so can direct replace swapops imports without issue, and
we do so in all the files that require it.

Additionally, add new leafops.h file to core mm maintainers entry.

Link: https://lkml.kernel.org/r/c879383aac77d96a03e4d38f7daba893cd35fc76.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:50 -08:00
Lorenzo Stoakes
c093cf4510 mm: correctly handle UFFD PTE markers
Patch series "mm: remove is_swap_[pte, pmd]() + non-swap entries,
introduce leaf entries", v3.

There's an established convention in the kernel that we treat leaf page
tables (so far at the PTE, PMD level) as containing 'swap entries' should
they be neither empty (i.e.  p**_none() evaluating true) nor present (i.e.
p**_present() evaluating true).

However, at the same time we also have helper predicates - is_swap_pte(),
is_swap_pmd() - which are inconsistently used.

This is problematic, as it is logical to assume that should somebody wish
to operate upon a page table swap entry they should first check to see if
it is in fact one.

It also implies that perhaps, in future, we might introduce a non-present,
none page table entry that is not a swap entry.

This series resolves this issue by systematically eliminating all use of
the is_swap_pte() and is swap_pmd() predicates so we retain only the
convention that should a leaf page table entry be neither none nor present
it is a swap entry.

We also have the further issue that 'swap entry' is unfortunately a really
rather overloaded term and in fact refers to both entries for swap and for
other information such as migration entries, page table markers, and
device private entries.

We therefore have the rather 'unique' concept of a 'non-swap' swap entry.

This series therefore introduces the concept of 'software leaf entries',
of type softleaf_t, to eliminate this confusion.

A software leaf entry in this sense is any page table entry which is
non-present, and represented by the softleaf_t type.  That is - page table
leaf entries which are software-controlled by the kernel.

This includes 'none' or empty entries, which are simply represented by an
zero leaf entry value.

In order to maintain compatibility as we transition the kernel to this new
type, we simply typedef swp_entry_t to softleaf_t.

We introduce a number of predicates and helpers to interact with software
leaf entries in include/linux/leafops.h which, as it imports swapops.h,
can be treated as a drop-in replacement for swapops.h wherever leaf entry
helpers are used.

Since softleaf_from_[pte, pmd]() treats present entries as they were
empty/none leaf entries, this allows for a great deal of simplification of
code throughout the code base, which this series utilises a great deal.

We additionally change from swap entry to software leaf entry handling
where it makes sense to and eliminate functions from swapops.h where
software leaf entries obviate the need for the functions.


This patch (of 16):

PTE markers were previously only concerned with UFFD-specific logic - that
is, PTE entries with the UFFD WP marker set or those marked via
UFFDIO_POISON.

However since the introduction of guard markers in commit 7c53dfbdb0
("mm: add PTE_MARKER_GUARD PTE marker"), this has no longer been the case.

Issues have been avoided as guard regions are not permitted in conjunction
with UFFD, but it still leaves very confusing logic in place, most notably
the misleading and poorly named pte_none_mostly() and
huge_pte_none_mostly().

This predicate returns true for PTE entries that ought to be treated as
none, but only in certain circumstances, and on the assumption we are
dealing with H/W poison markers or UFFD WP markers.

This patch removes these functions and makes each invocation of these
functions instead explicitly check what it needs to check.

As part of this effort it introduces is_uffd_pte_marker() to explicitly
determine if a marker in fact is used as part of UFFD or not.

In the HMM logic we note that the only time we would need to check for a
fault is in the case of a UFFD WP marker, otherwise we simply encounter a
fault error (VM_FAULT_HWPOISON for H/W poisoned marker, VM_FAULT_SIGSEGV
for a guard marker), so only check for the UFFD WP case.

While we're here we also refactor code to make it easier to understand.

[akpm@linux-foundation.org: fix comment typo, per Mike]
Link: https://lkml.kernel.org/r/cover.1762812360.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/c38625fd9a1c1f1cf64ae8a248858e45b3dcdf11.1762812360.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:50 -08:00
Wei Yang
8a0e4bdddd mm/huge_memory: merge uniform_split_supported() and non_uniform_split_supported()
uniform_split_supported() and non_uniform_split_supported() share
significantly similar logic.

The only functional difference is that uniform_split_supported() includes
an additional check on the requested @new_order.

The reason for this check comes from the following two aspects:

  * some file system or swap cache just supports order-0 folio
  * the behavioral difference between uniform/non-uniform split

The behavioral difference between uniform split and non-uniform:

  * uniform split splits folio directly to @new_order
  * non-uniform split creates after-split folios with orders from
    folio_order(folio) - 1 to new_order.

This means for non-uniform split or !new_order split we should check the
file system and swap cache respectively.

This commit unifies the logic and merge the two functions into a single
combined helper, removing redundant code and simplifying the split
support checking mechanism.

Link: https://lkml.kernel.org/r/20251106034155.21398-3-richard.weiyang@gmail.com
Fixes: c010d47f10 ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:50 -08:00
Wei Yang
c467061fbb mm/huge_memory: introduce enum split_type for clarity
Patch series "mm/huge_memory: Define split_type and consolidate split
support checks", v3.

This two-patch series focuses on improving code clarity and removing
redundancy in the huge memory handling logic related to folio splitting.

The series is based on an original proposal to merge two significantly
identical functions that check folio split support[1].  During this
process, we found an opportunity to improve readability by explicitly
defining the split types.  

Patch 1: define split_type and use it
Patch 2: merge uniform_split_supported() and non_uniform_split_supported()


This patch (of 2):

We currently handle two distinct types of large folio splitting:
  * uniform split
  * non-uniform split

Differentiating between these types using a simple boolean variable is not
obvious and can harm code readability.

This commit introduces enum split_type to explicitly define these two
types.  Replacing the existing boolean variable with this enumeration
significantly improves code clarity and expressiveness when dealing with
folio splitting logic.

No functional change is expected.

[akpm@linux-foundation.org: tweak layout, per David]
Link: https://lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20251106034155.21398-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:50 -08:00
Ma Ke
fe9d31fd1a mm/hmm/test: fix error handling in dmirror_device_init
dmirror_device_init() calls device_initialize() which sets the device
reference count to 1, but fails to call put_device() when error occurs
after dev_set_name() or cdev_device_add() failures.  This results in
memory leaks of struct device objects.  Additionally,
dmirror_device_remove() lacks the final put_device() call to properly
release the device reference.

Found by code review.

Link: https://lkml.kernel.org/r/20251108115346.6368-1-make24@iscas.ac.cn
Fixes: 6a760f58c7 ("mm/hmm/test: use char dev with struct device to get device node")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Cc: Haoxiang Li <make24@iscas.ac.cn>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Mika Penttilä <mpenttil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:50 -08:00
Zi Yan
50d0598cf2 mm/huge_memory: fix kernel-doc comments for folio_split() and related
try_folio_split_to_order(), folio_split, __folio_split(), and
__split_unmapped_folio() do not have correct kernel-doc comment format. 
Fix them.

[ziy@nvidia.com: kernel-doc fixup]
  Link: https://lkml.kernel.org/r/BE7AC5F3-9E64-4923-861D-C2C4E0CB91EB@nvidia.com
[ziy@nvidia.com: add newline to fix an error and a warning from docutils]
  Link: https://lkml.kernel.org/r/040B38C0-23C6-4AEA-B069-69AE6DAA828B@nvidia.com
Link: https://lkml.kernel.org/r/20251031162001.670503-4-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Luis Chamberalin <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Pankaj Raghav <kernel@pankajraghav.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Zi Yan
689b898677 mm/memory-failure: improve large block size folio handling
Large block size (LBS) folios cannot be split to order-0 folios but
min_order_for_folio().  Current split fails directly, but that is not
optimal.  Split the folio to min_order_for_folio(), so that, after split,
only the folio containing the poisoned page becomes unusable instead.

For soft offline, do not split the large folio if its
min_order_for_folio() is not 0.  Since the folio is still accessible from
userspace and premature split might lead to potential performance loss.

Link: https://lkml.kernel.org/r/20251031162001.670503-3-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Suggested-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Pankaj Raghav <kernel@pankajraghav.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Zi Yan
a7ef12c64f mm/huge_memory: add split_huge_page_to_order()
Patch series "Optimize folio split in memory failure", v5.

This patchset optimizes folio split operations in memory failure code by
always splitting a folio to min_order_for_split() to minimize unusable
pages, even if min_order_for_split() is non zero and memory failure code
would take the failed path eventually for a successfully split folio. 

This means instead of making the entire original folio unusable memory
failure code would only make its after-split folio, which has order of
min_order_for_split() and contains HWPoison page, unusable.

For soft offline case, since the original folio is still accessible, no
split is performed if the folio cannot be split to order-0 to prevent
potential performance loss.

In addition, add split_huge_page_to_order() to improve code readability
and fix kernel-doc comment format for folio_split() and other related
functions.

Background
==========

This patchset is a follow-up of "[PATCH v3] mm/huge_memory: do not change
split_huge_page*() target order silently."[1] and [PATCH v4]
mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0
order[2], since both are separated out as hotfixes.  It improves how
memory failure code handles large block size(LBS) folios with
min_order_for_split() > 0.  By splitting a large folio containing HW
poisoned pages to min_order_for_split(), the after-split folios without HW
poisoned pages could be freed for reuse.  To achieve this, folio split
code needs to set has_hwpoisoned on after-split folios containing HW
poisoned pages and it is done in the hotfix in [2].

This patchset includes:
1. A patch adds split_huge_page_to_order(),
2. Patch 2 and Patch 3 of "[PATCH v2 0/3] Do not change split folio target
   order"[3],


This patch (of 3):

When the caller does not supply a list to
split_huge_page_to_list_to_order(), use split_huge_page_to_order()
instead.

Link: https://lkml.kernel.org/r/20251031162001.670503-1-ziy@nvidia.com
Link: https://lkml.kernel.org/r/20251031162001.670503-2-ziy@nvidia.com
Link: https://lore.kernel.org/all/20251017013630.139907-1-ziy@nvidia.com/ [1]
Link: https://lore.kernel.org/all/20251023030521.473097-1-ziy@nvidia.com/ [2]
Link: https://lore.kernel.org/all/20251016033452.125479-1-ziy@nvidia.com/ [3]
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Luis Chamberalin <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Pankaj Raghav <kernel@pankajraghav.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Wei Yang
d87f4a8f19 mm/huge_memory: only get folio_order() once during __folio_split()
Before splitting folio, its order keeps the same.

It is only necessary to get folio_order() once.

Also rename order to old_order to represent the original folio order.

Link: https://lkml.kernel.org/r/20251010141142.1349-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Lance Yang <lance.yang@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Barry Song <baohua@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Wei Yang
ac7756771a mm/khugepaged: unify pmd folio installation with map_anon_folio_pmd()
Currently we install pmd folio with map_anon_folio_pmd() in
__do_huge_pmd_anonymous_page() and do_huge_zero_wp_pmd().  While in
collapse_huge_page(), it is done with identical code except statistics
adjustment.

Unify the process with map_anon_folio_pmd() to install pmd folio.  Split
it to map_anon_folio_pmd_pf() and map_anon_folio_pmd_nopf() to be used in
page fault or not respectively.

No functional change is intended.

[akpm@linux-foundation.org: remove unneeded map_anon_folio_pmd_nopf() stub, per Wei & David]
Link: https://lkml.kernel.org/r/20251008095453.18772-3-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Lance Yang <lance.yang@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Wei Yang
2a1351cd41 mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
We add pmd folio into ds_queue on the first page fault in
__do_huge_pmd_anonymous_page(), so that we can split it in case of memory
pressure.  This should be the same for a pmd folio during wp page fault.

Commit 1ced09e033 ("mm: allocate THP on hugezeropage wp-fault") miss to
add it to ds_queue, which means system may not reclaim enough memory in
case of memory pressure even the pmd folio is under used.

Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
folio installation consistent.

Link: https://lkml.kernel.org/r/20251008095453.18772-2-richard.weiyang@gmail.com
Fixes: 1ced09e033 ("mm: allocate THP on hugezeropage wp-fault")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Balbir Singh
c322874710 gpu/drm/nouveau: enable THP support for GPU memory migration
Enable MIGRATE_VMA_SELECT_COMPOUND support in nouveau driver to take
advantage of THP zone device migration capabilities.

Update migration and eviction code paths to handle compound page sizes
appropriately, improving memory bandwidth utilization and reducing
migration overhead for large GPU memory allocations.

[balbirs@nvidia.com: fix sparse error]
  Link: https://lkml.kernel.org/r/20251115003333.3516870-1-balbirs@nvidia.com
Link: https://lkml.kernel.org/r/20251001065707.920170-17-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Balbir Singh
271a7b2e3c selftests/mm/hmm-tests: new throughput tests including THP
Add new benchmark style support to test transfer bandwidth for zone device
memory operations.

Link: https://lkml.kernel.org/r/20251001065707.920170-16-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Matthew Brost
24c2c5b8ff selftests/mm/hmm-tests: partial unmap, mremap and anon_write tests
Add partial unmap test case which munmaps memory while in the device.

Add tests exercising mremap on faulted-in memory (CPU and GPU) at various
offsets and verify correctness.

Update anon_write_child to read device memory after fork verifying this
flow works in the kernel.

Both THP and non-THP cases are updated.

Link: https://lkml.kernel.org/r/20251001065707.920170-15-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Balbir Singh
519071529d selftests/mm/hmm-tests: new tests for zone device THP migration
Add new tests for migrating anon THP pages, including anon_huge,
anon_huge_zero and error cases involving forced splitting of pages during
migration.

Link: https://lkml.kernel.org/r/20251001065707.920170-14-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Balbir Singh
aa3ade4295 lib/test_hmm: add large page allocation failure testing
Add HMM_DMIRROR_FLAG_FAIL_ALLOC flag to simulate large page allocation
failures, enabling testing of split migration code paths.

This test flag allows validation of the fallback behavior when destination
device cannot allocate compound pages.  This is useful for testing the
split migration functionality.

Link: https://lkml.kernel.org/r/20251001065707.920170-13-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
4265d67e40 mm/migrate_device: add THP splitting during migration
Implement migrate_vma_split_pages() to handle THP splitting during the
migration process when destination cannot allocate compound pages.

This addresses the common scenario where migrate_vma_setup() succeeds with
MIGRATE_PFN_COMPOUND pages, but the destination device cannot allocate
large pages during the migration phase.

Key changes:
- migrate_vma_split_pages(): Split already-isolated pages during migration
- Enhanced folio_split() and __split_unmapped_folio() with isolated
  parameter to avoid redundant unmap/remap operations

This provides a fallback mechansim to ensure migration succeeds even when
large page allocation fails at the destination.

[matthew.brost@intel.com: add THP splitting during migration]
  Link: https://lkml.kernel.org/r/20251120230825.181072-2-matthew.brost@intel.com
Link: https://lkml.kernel.org/r/20251001065707.920170-12-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
56ef398996 mm/memremap: add driver callback support for folio splitting
When a zone device page is split (via huge pmd folio split).  The driver
callback for folio_split is invoked to let the device driver know that the
folio size has been split into a smaller order.

Provide a default implementation for drivers that do not provide this
callback that copies the pgmap and mapping fields for the split folios.

Update the HMM test driver to handle the split.

Link: https://lkml.kernel.org/r/20251001065707.920170-11-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
775465fd26 lib/test_hmm: add zone device private THP test infrastructure
Enhance the hmm test driver (lib/test_hmm) with support for THP pages.

A new pool of free_folios() has now been added to the dmirror device,
which can be allocated when a request for a THP zone device private page
is made.

Add compound page awareness to the allocation function during normal
migration and fault based migration.  These routines also copy
folio_nr_pages() when moving data between system memory and device memory.

args.src and args.dst used to hold migration entries are now dynamically
allocated (as they need to hold HPAGE_PMD_NR entries or more).

Split and migrate support will be added in future patches in this series.

Link: https://lkml.kernel.org/r/20251001065707.920170-10-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
4964099163 mm/memory/fault: add THP fault handling for zone device private pages
Implement CPU fault handling for zone device THP entries through
do_huge_pmd_device_private(), enabling transparent migration of
device-private large pages back to system memory on CPU access.

When the CPU accesses a zone device THP entry, the fault handler calls the
device driver's migrate_to_ram() callback to migrate the entire large page
back to system memory.

Link: https://lkml.kernel.org/r/20251001065707.920170-9-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
a30b48bf1b mm/migrate_device: implement THP migration of zone device pages
MIGRATE_VMA_SELECT_COMPOUND will be used to select THP pages during
migrate_vma_setup() and MIGRATE_PFN_COMPOUND will make migrating device
pages as compound pages during device pfn migration.

migrate_device code paths go through the collect, setup and finalize
phases of migration.

The entries in src and dst arrays passed to these functions still remain
at a PAGE_SIZE granularity.  When a compound page is passed, the first
entry has the PFN along with MIGRATE_PFN_COMPOUND and other flags set
(MIGRATE_PFN_MIGRATE, MIGRATE_PFN_VALID), the remaining entries
(HPAGE_PMD_NR - 1) are filled with 0's.  This representation allows for
the compound page to be split into smaller page sizes.

migrate_vma_collect_hole(), migrate_vma_collect_pmd() are now THP page
aware.  Two new helper functions migrate_vma_collect_huge_pmd() and
migrate_vma_insert_huge_pmd_page() have been added.

migrate_vma_collect_huge_pmd() can collect THP pages, but if for some
reason this fails, there is fallback support to split the folio and
migrate it.

migrate_vma_insert_huge_pmd_page() closely follows the logic of
migrate_vma_insert_page()

Support for splitting pages as needed for migration will follow in later
patches in this series.

Link: https://lkml.kernel.org/r/20251001065707.920170-8-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
022a12deda mm/migrate_device: handle partially mapped folios during collection
Extend migrate_vma_collect_pmd() to handle partially mapped large folios
that require splitting before migration can proceed.

During PTE walk in the collection phase, if a large folio is only
partially mapped in the migration range, it must be split to ensure the
folio is correctly migrated.

[matthew.brost@intel.com: handle partially mapped folios during split]
  Link: https://lkml.kernel.org/r/20251120230825.181072-1-matthew.brost@intel.com
Link: https://lkml.kernel.org/r/20251001065707.920170-7-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
1462872900 mm/huge_memory: implement device-private THP splitting
Add support for splitting device-private THP folios, enabling fallback
to smaller page sizes when large page allocation or migration fails.

Key changes:
- split_huge_pmd(): Handle device-private PMD entries during splitting
- Preserve RMAP_EXCLUSIVE semantics for anonymous exclusive folios
- Skip RMP_USE_SHARED_ZEROPAGE for device-private entries as they
  don't support shared zero page semantics

Link: https://lkml.kernel.org/r/20251001065707.920170-6-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
65edfda6f3 mm/rmap: extend rmap and migration support device-private entries
Add device-private THP support to reverse mapping infrastructure, enabling
proper handling during migration and walk operations.

The key changes are:
- add_migration_pmd()/remove_migration_pmd(): Handle device-private
  entries during folio migration and splitting
- page_vma_mapped_walk(): Recognize device-private THP entries during
  VMA traversal operations

This change supports folio splitting and migration operations on
device-private entries.

[balbirs@nvidia.com: fix override of entry in remove_migration_pmd]
  Link: https://lkml.kernel.org/r/20251114012153.2634497-2-balbirs@nvidia.com
[balbirs@nvidia.com: follow pattern used in remove_migration_pte()]
  Link: https://lkml.kernel.org/r/20251115002835.3515194-1-balbirs@nvidia.com
Link: https://lkml.kernel.org/r/20251001065707.920170-5-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
368076f52e mm/huge_memory: add device-private THP support to PMD operations
Extend core huge page management functions to handle device-private THP
entries.  This enables proper handling of large device-private folios in
fundamental MM operations.

The following functions have been updated:

- copy_huge_pmd(): Handle device-private entries during fork/clone
- zap_huge_pmd(): Properly free device-private THP during munmap
- change_huge_pmd(): Support protection changes on device-private THP
- __pte_offset_map(): Add device-private entry awareness

Link: https://lkml.kernel.org/r/20251001065707.920170-4-balbirs@nvidia.com
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:48 -08:00
Balbir Singh
3a5a065545 mm/zone_device: rename page_free callback to folio_free
Change page_free to folio_free to make the folio support for
zone device-private more consistent. The PCI P2PDMA callback
has also been updated and changed to folio_free() as a result.

For drivers that do not support folios (yet), the folio is
converted back into page via &folio->page and the page is used
as is, in the current callback implementation.

Link: https://lkml.kernel.org/r/20251001065707.920170-3-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:47 -08:00
Balbir Singh
d245f9b4ab mm/zone_device: support large zone device private folios
Patch series "mm: support device-private THP", v7.

This patch series introduces support for Transparent Huge Page (THP)
migration in zone device-private memory.  The implementation enables
efficient migration of large folios between system memory and
device-private memory

Background

Current zone device-private memory implementation only supports PAGE_SIZE
granularity, leading to:
- Increased TLB pressure
- Inefficient migration between CPU and device memory

This series extends the existing zone device-private infrastructure to
support THP, leading to:
- Reduced page table overhead
- Improved memory bandwidth utilization
- Seamless fallback to base pages when needed

In my local testing (using lib/test_hmm) and a throughput test, the series
shows a 350% improvement in data transfer throughput and a 80% improvement
in latency

These patches build on the earlier posts by Ralph Campbell [1]

Two new flags are added in vma_migration to select and mark compound
pages.  migrate_vma_setup(), migrate_vma_pages() and
migrate_vma_finalize() support migration of these pages when
MIGRATE_VMA_SELECT_COMPOUND is passed in as arguments.

The series also adds zone device awareness to (m)THP pages along with
fault handling of large zone device private pages.  page vma walk and the
rmap code is also zone device aware.  Support has also been added for
folios that might need to be split in the middle of migration (when the
src and dst do not agree on MIGRATE_PFN_COMPOUND), that occurs when src
side of the migration can migrate large pages, but the destination has not
been able to allocate large pages.  The code supported and used
folio_split() when migrating THP pages, this is used when
MIGRATE_VMA_SELECT_COMPOUND is not passed as an argument to
migrate_vma_setup().

The test infrastructure lib/test_hmm.c has been enhanced to support THP
migration.  A new ioctl to emulate failure of large page allocations has
been added to test the folio split code path.  hmm-tests.c has new test
cases for huge page migration and to test the folio split path.  A new
throughput test has been added as well.

The nouveau dmem code has been enhanced to use the new THP migration
capability.  

mTHP support:

The patches hard code, HPAGE_PMD_NR in a few places, but the code has been
kept generic to support various order sizes.  With additional refactoring
of the code support of different order sizes should be possible.

The future plan is to post enhancements to support mTHP with a rough
design as follows:

1. Add the notion of allowable thp orders to the HMM based test driver
2. For non PMD based THP paths in migrate_device.c, check to see if
   a suitable order is found and supported by the driver
3. Iterate across orders to check the highest supported order for migration
4. Migrate and finalize

The mTHP patches can be built on top of this series, the key design
elements that need to be worked out are infrastructure and driver support
for multiple ordered pages and their migration.

HMM support for large folios was added in 10b9feee2d ("mm/hmm:
populate PFNs from PMD swap entry").


This patch (of 16)

Add routines to support allocation of large order zone device folios and
helper functions for zone device folios, to check if a folio is device
private and helpers for setting zone device data.

When large folios are used, the existing page_free() callback in pgmap is
called when the folio is freed, this is true for both PAGE_SIZE and higher
order pages.

Zone device private large folios do not support deferred split and scan
like normal THP folios.

Link: https://lkml.kernel.org/r/20251001065707.920170-1-balbirs@nvidia.com
Link: https://lkml.kernel.org/r/20251001065707.920170-2-balbirs@nvidia.com
Link: https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/ [1]
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:47 -08:00
Claudio Imbrenda
1452468447 KVM: s390: fix missing present bit for gmap puds
For hugetlbs, gmap puds have the present bit set.  For normal puds (which
point to ptes), the bit is not set.  This is in contrast to the normal
userspace puds, which always have the bit set for present pmds.

This causes issues when ___pte_offset_map() is modified to only check for
the present bit.

The solution to the problem is simply to always set the present bit for
present gmap pmds.

Link: https://lkml.kernel.org/r/20251028130150.57379-2-imbrenda@linux.ibm.com
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/lkml/20251017144924.10034-1-borntraeger@linux.ibm.com/
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Lyude <lyude@redhat.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:47 -08:00
Andrew Morton
87fcafc4e2 Merge branch 'mm-hotfixes-stable' into mm-stable in order to merge
"mm/huge_memory: only get folio_order() once during __folio_split()" into
mm-stable.
2025-11-24 15:07:34 -08:00
Marc Zyngier
de88423277 KVM: arm64: selftests: vgic_irq: Add timer deactivation test
Add a new test case that triggers the HW deactivation emulation path
when trapping ICV_DIR_EL1. This is obviously tied to the way KVM
works now, but the test follows the expected architectural behaviour.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-50-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
1c9c71ac1b KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
Add a new test case that inject a Group-0 interrupt together
with a bunch of Group-1 interrupts, Ack/EOI the G1 interrupts,
and only then enable G0, expecting to get the G0 interrupt.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-49-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
d2dee2e849 KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
Add a new test case that makes an interrupt pending on a vcpu,
activates it, do the priority drop, and then get *another* vcpu
to do the deactivation.

Special care is taken not to trigger an exit in the process, so
that we are sure that the active interrupt is in an LR. Joy.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-48-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
b6c68612ab KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
When EOImode==1, perform the deactivation in the order of activation,
just to make things a bit worse for KVM. Yes, I'm nasty.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-47-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
fd5fa1c8d0 KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
Good news: our GIC emulation is not completely broken, and we can
activate as many interrupts as we want.

Bump the test to cover all the SGIs, all the allowed PPIs, and
31 SPIs. Yes, 31, because we have 31 available priorities, and the
test is not happy with having two interrupts with the same priority.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-46-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
5053c2ab92 KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
The PPI injection API is clear that you can't inject the timer PPIs
from userspace, since they are controlled by the timers themselves.

Add an exclusion list for this purpose.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-45-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
8b7888c511 KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
The architecture is pretty clear that changing the configuration of
an enable interrupt is not OK. It doesn't really matter here, but
doing the right thing is not more expensive.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-44-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
27392612c8 KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
No, 0 is not a spurious INTID. Never been, never was.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-43-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
2366295c76 KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
Make sure G0 is disabled at the point of initialising the GIC.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-42-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
a1650de7c1 KVM: arm64: selftests: gic_v3: Add irq group setting helper
Being able to set the group of an interrupt is pretty useful.
Add such a helper.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-41-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
07bb1c5622 KVM: arm64: GICv2: Always trap GICV_DIR register
Since we can't decide to trap the DIR register on a per-vcpu basis,
always trap the second page of the GIC CPU interface. Yes, this is
costly. On the bright side, no sane SW should use EOImode==1 on
GICv2...

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-40-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
255de897e7 KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
Add the plumbing of GICv2 interrupt deactivation via GICV_DIR.
This requires adding a new device so that we can easily decode
the DIR address.

The deactivation itself is very similar to the GICv3 version.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-39-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
281c6c06e2 KVM: arm64: GICv2: Handle LR overflow when EOImode==0
Similarly to the GICv3 version, handle the EOIcount-driven deactivation
by walking the overflow list.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-38-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
78ffc28456 KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
FEAT_NV2 is pretty terrible for anything that tries to enforce immediate
effects, and writing to ICH_HCR_EL2 in the hope to disable a maintenance
interrupt is vain. This only hits memory, and the guest hasn't cleared
anything -- the MI will fire.

For example, running the vgic_irq test under NV results in about 800
maintenance interrupts being actually handled by the L1 guest,
when none were expected.

As a cheap workaround, read back ICH_MISR_EL2 after writing 0 to
ICH_HCR_EL2. This is very cheap on real HW, and causes a trap to
the host in NV, giving it the opportunity to retire the pending MI.
With this, the above test runs to completion without any MI being
actually handled.

Yes, this is really poor...

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-37-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
6dd333c894 KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
Pretty much like the rest of the LR handling, deactivation of an
L2 interrupt gets reflected in the L1 LRs, and therefore must be
propagated into the L1 shadow state if the interrupt is HW-bound.

Instead of directly handling the active state (which looks a bit
off as it ignores locking and L1->L0 HW propagation), use the new
deactivation primitive to perform the deactivation and deal with
the required maintenance.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-36-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
eb33ffa2bd KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
The current approach to nested GICv3 support is to not do anything
while L2 is running, wait a transition from L2 to L1 to resync
LRs, VMCR and HCR, and only then evaluate the state to decide
whether to generate a maintenance interrupt.

This doesn't provide a good quality of emulation, and it would be
far preferable to find out early that we need to perform a switch.

Move the LRs/VMCR and HCR resync into vgic_v3_sync_nested(), so
that we have most of the state available. As we turning the vgic
off at this stage to avoid a screaming host MI, add a new helper
vgic_v3_flush_nested() that switches the vgic on again. The MI can
then be directly injected as required.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-35-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
84792050e0 KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
CPUs lacking TDIR always trap ICV_DIR_EL1, no matter what, since
we have ICH_HCR_EL2.TC set permanently. For these CPUs, it is
useless to use a broadcast kick on SPI injection, as the sole
purpose of this is to set TDIR.

We can therefore skip this on these CPUs, which are challenged
enough not to be burdened by extra IPIs. As a consequence,
permanently set the TDIR bit in the shadow state to notify the
fast-path emulation code of the exit reason.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-34-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
ca3c34da36 KVM: arm64: GICv3: Handle in-LR deactivation when possible
Even when we have either an LR overflow or SPIs in flight, it is
extremely likely that the interrupt being deactivated is still in
the LRs, and that going all the way back to the the generic trap
handling code is a waste of time.

Instead, try and deactivate in place when possible, and only if
this fails, perform a full exit.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-33-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
1c3b3cadcd KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
SPIs are specially annpying, as they can be activated on a CPU and
deactivated on another. WHich means that when an SPI is in flight
anywhere, all CPUs need to have their TDIR trap bit set.

This translates into broadcasting an IPI across all CPUs to make sure
they set their trap bit, The number of in-flight SPIs is kept in
an atomic variable so that CPUs can turn the trap bit off as soon
as possible.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-32-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
70fd60bded KVM: arm64: GICv3: Set ICH_HCR_EL2.TDIR when interrupts overflow LR capacity
Now that we are ready to handle deactivation through ICV_DIR_EL1,
set the trap bit if we have active interrupts outside of the LRs.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-31-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
295b692165 KVM: arm64: GICv3: Add GICv2 SGI handling to deactivation primitive
The GICv2 SGIs require additional handling for deactivation, as they
are effectively multiple interrrupts muxed into one. Make sure we
check for the source CPU when deactivating.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-30-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
cd4f6ee99b KVM: arm64: GICv3: Handle deactivation via ICV_DIR_EL1 traps
Deactivation via ICV_DIR_EL1 is both relatively straightforward
(we have the interrupt that needs deactivation) and really awkward.

The main issue is that the interrupt may either be in an LR on
another CPU, or ourside of any LR.

In the former case, we process the deactivation is if ot was
a write to GICD_CACTIVERn, which is already implemented as a big
hammer IPI'ing all vcpus. In the latter case, we just perform
a normal deactivation, similar to what we do for EOImode==0.

Another annoying aspect is that we need to tell the CPU owning
the interrupt that its ap_list needs laudering. We use a brand new
vcpu request to that effect.

Note that this doesn't address deactivation via the GICV MMIO view,
which will be taken care of in a later change.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-29-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
3cfd59f81e KVM: arm64: GICv3: Handle LR overflow when EOImode==0
Now that we can identify interrupts that have not made it into the LRs,
it becomes relatively easy to use EOIcount to walk the overflow list.

What is a bit odd is that we compute a fake LR for the original
state of the interrupt, clear the active bit, and feed into the existing
logic for processing. In a way, this is what would have happened if
the interrupt was in an LR.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-28-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
a69e2d6f89 KVM: arm64: Use MI to detect groups being enabled/disabled
Add the maintenance interrupt to force an exit when the guest
enables/disables individual groups, so that we can resort the
ap_list accordingly.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-27-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
33c1f60b32 KVM: arm64: Move undeliverable interrupts to the end of ap_list
Interrupts in the ap_list that cannot be acted upon because they
are not enabled, or that their group is not enabled, shouldn't
make it into the LRs if we are space-constrained.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-26-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
05984ba67e KVM: arm64: Invert ap_list sorting to push active interrupts out
Having established that pending interrupts should have priority
to be moved into the LRs over the active interrupts, implement this
in the ap_list sorting.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-25-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
76b2eda65c KVM: arm64: Make vgic_target_oracle() globally available
Make the internal crystal ball global, so that implementation-specific
code can use it.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-24-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
f04b8a5a83 KVM: arm64: Turn kvm_vgic_vcpu_enable() into kvm_vgic_vcpu_reset()
Now that we always reconfigure the vgic HCR register on entry,
the "enable" part of kvm_vgic_vcpu_enable() is pretty useless.

Removing the enable bits from these functions makes it plain that
they are just about computing the reset state. Just rename the
functions accordingly.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-23-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
6780a75604 KVM: arm64: Revamp vgic maintenance interrupt configuration
We currently don't use the maintenance interrupt very much, apart
from EOI on level interrupts, and for LR underflow in limited cases.

However, as we are moving toward a setup where active interrupts
can live outside of the LRs, we need to use the MIs in a more
diverse set of cases.

Add a new helper that produces a digest of the ap_list, and use
that summary to set the various control bits as required.

This slightly changes the way v2 SGIs are handled, as they used to
count for more than one interrupt, but not anymore.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-22-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
cf72ee6371 KVM: arm64: Eagerly save VMCR on exit
We currently save/restore the VMCR register in a pretty lazy way
(on load/put, consistently with what we do with the APRs).

However, we are going to need the group-enable bits that are backed
by VMCR on each entry (so that we can avoid injecting interrupts for
disabled groups).

Move the synchronisation from put to sync, which results in some minor
churn in the nVHE hypercalls to simplify things.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-21-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:13 -08:00
Marc Zyngier
dd598fc113 KVM: arm64: Compute vgic state irrespective of the number of interrupts
As we are going to rely on the [G]ICH_HCR{,_EL2} register to be
programmed with MI information at all times, slightly de-optimise
the flush/sync code to always be called. This is rather lightweight
when no interrupts are in flight.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-20-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
0660bc4a2b KVM: arm64: GICv2: Extract LR computing primitive
Split vgic_v2_populate_lr() into two helpers, so that we have another
primitive that computes the LR from a vgic_irq, but doesn't update
anything in the shadow structure.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-19-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
3aa9a50c20 KVM: arm64: GICv2: Extract LR folding primitive
As we are going to need to handle deactivation for interrupts that
are not in the LRs, split vgic_v2_fold_lr_state() into a helper
that deals with a single interrupt, and the function that loops
over the used LRs.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-18-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
a00c88ac1f KVM: arm64: GICv2: Decouple GICH_HCR programming from LRs being loaded
Not programming GICH_HCR while no LRs are populated is a bit
of an issue, as we otherwise don't see any maintenance interrupt
when the guest interacts with the LRs.

Decouple the two and always program the control register, even when
we don't have to touch the LRs.

This is very similar to what we are already doing for GICv3.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-17-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
5ceb3dac80 KVM: arm64: GICv2: Preserve EOIcount on exit
EOIcount is how the virtual CPU interface signals that the guest
is deactivating interrupts outside of the LRs when EOImode==0.

We therefore need to preserve that information so that we can find
out what actually needs deactivating, just like we already do on
GICv3.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-16-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
1ae0448ca7 KVM: arm64: GICv3: Extract LR computing primitive
Split vgic_v3_populate_lr() into two, so that we have another
primitive that computes the LR from a vgic_irq, but doesn't
update anything in the shadow structure.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-15-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
438e47b697 KVM: arm64: GICv3: Extract LR folding primitive
As we are going to need to handle deactivation for interrupts that
are not in the LRs, split vgic_v3_fold_lr_state() into a helper
that deals with a single interrupt, and the function that loops
over the used LRs.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-14-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
00c6d0d4a8 KVM: arm64: GICv3: Decouple ICH_HCR_EL2 programming from LRs
Not programming ICH_HCR_EL2 while no LRs are populated is a bit
of an issue, as we otherwise don't see any maintenance interrupt
when the guest interacts with the LRs.

Decouple the two and always program the control register, even when
we don't have to touch the LRs.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-13-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
f4ded7b084 KVM: arm64: GICv3: Preserve EOIcount on exit
EOIcount is how the virtual CPU interface signals that the guest
is deactivating interrupts outside of the LRs when EOImode==0.

We therefore need to preserve that information so that we can find
out what actually needs deactivating.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-12-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
73c9726975 KVM: arm64: GICv3: Drop LPI active state when folding LRs
Despite LPIs not having an active state, *virtual* LPIs do have
one, which gets cleared on EOI. So far, so good.

However, this leads to a small problem: when an active LPI is not
in the LRs, that EOImode==0 and that the guest EOIs it, EOIcount
doesn't get bumped up. Which means that in these condition, the
LPI would stay active forever.

Clearly, we can't have that. So if we spot an active LPI, we drop
that state. It's pretty pointless anyway, and only serves as a way
to trip SW over.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-11-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
0dc433e79a KVM: arm64: Add LR overflow handling documentation
Add a bit of documentation describing how we are dealing with LR
overflow. This is mostly a braindump of how things are expected
to work. For now anyway.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-10-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
879a7fd4fd KVM: arm64: Add tracking of vgic_irq being present in a LR
We currently cannot identify whether an interrupt is queued into
a LR. It wasn't needed until now, but that's about to change.

Add yet another flag to track that state.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-9-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:12 -08:00
Marc Zyngier
a4413a7c31 KVM: arm64: Repack struct vgic_irq fields
struct vgic_irq has grown over the years, in a rather bad way.
Repack it using bitfields so that the individual flags, and move
things around a bit so that it a bit smaller.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-8-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Marc Zyngier
2a28810cbb KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping
A long time ago, an unsuspecting architect forgot to add a trap
bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but
what's a bit of spec between friends? Thankfully, this was fixed
in a later revision, and ARM "deprecates" the lack of trapping
ability.

Unfortuantely, a few (billion) CPUs went out with that defect,
anything ARMv8.0 from ARM, give or take. And on these CPUs,
you can't trap DIR on its own, full stop.

As the next best thing, we can trap everything in the common group,
which is a tad expensive, but hey ho, that's what you get. You can
otherwise recycle the HW in the neaby bin.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Marc Zyngier
567ebfedb5 KVM: arm64: vgic-v3: Fix GICv3 trapping in protected mode
As we are about to start trapping a bunch of extra things, augment
the pKVM trap description with all the registers trapped by ICH_HCR_EL2.TC,
making them legal instead of resulting in a UNDEF injection in the guest.

While we're at it, ensure that pKVM captures the vgic model so that it
can be checked by the emulation code.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-6-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Marc Zyngier
8d3dfab1d3 KVM: arm64: Turn vgic-v3 errata traps into a patched-in constant
The trap bits are currently only set to manage CPU errata. However,
we are about to make use of them for purposes beyond beating broken
CPUs into submission.

For this purpose, turn these errata-driven bits into a patched-in
constant that is merged with the KVM-driven value at the point of
programming the ICH_HCR_EL2 register, rather than being directly
stored with with the shadow value..

This allows the KVM code to distinguish between a trap being handled
for the purpose of an erratum workaround, or for KVM's own need.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-5-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Marc Zyngier
08f4f41c1e irqchip/apple-aic: Spit out ICH_MISR_EL2 value on spurious vGIC MI
It is all good and well to scream about spurious vGIC maintenance
interrupts. It would be even better to output the reason why, which
is already checked, but not printed out.

The unsuspecting kernel tinkerer thanks you.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-4-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Marc Zyngier
fa8f11e8e1 irqchip/gic: Expose CPU interface VA to KVM
Future changes will require KVM to be able to perform deactivations
by writing to the physical CPU interface. Add the corresponding
VA to the kvm_info structure, and let KVM stash it.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-3-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Marc Zyngier
8cb4ecec5e irqchip/gic: Add missing GICH_HCR control bits
The GICH_HCR description is missing a bunch of control bits that
control the maintenance interrupt. Add them.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-2-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Oliver Upton
d93febe2ed KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
Add support for FEAT_XNX to shadow stage-2 MMUs, being careful to only
evaluate XN[0] when the feature is actually exposed to the VM.
Restructure the layering of permissions in the fault handler to assume
pX and uX then restricting based on the guest's stage-2 afterwards.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-4-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:24:45 -08:00
Oliver Upton
2608563b46 KVM: arm64: Add support for FEAT_XNX stage-2 permissions
FEAT_XNX adds support for encoding separate execute permissions for EL0
and EL1 at stage-2. Add support for this to the page table library,
hiding the unintuitive encoding scheme behind generic pX and uX
permission flags.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-3-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:24:44 -08:00
Oliver Upton
dc31124379 arm64: Detect FEAT_XNX
Detect the feature in anticipation of using it in KVM.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-2-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:24:44 -08:00
Kuninori Morimoto
54618003a1 um: drivers: virtio: use string choices helper
Remove hard-coded strings by using the string helper functions

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/87h5uywtwp.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-24 14:12:09 +01:00
Hui Min Mina Chou
3239c52fd2 RISC-V: KVM: Flush VS-stage TLB after VCPU migration for Andes cores
Most implementations cache the combined result of two-stage translation,
but some, like Andes cores, use split TLBs that store VS-stage and
G-stage entries separately.

On such systems, when a VCPU migrates to another CPU, an additional
HFENCE.VVMA is required to avoid using stale VS-stage entries, which
could otherwise cause guest faults.

Introduce a static key to identify CPUs with split two-stage TLBs.
When enabled, KVM issues an extra HFENCE.VVMA on VCPU migration to
prevent stale VS-stage mappings.

Signed-off-by: Hui Min Mina Chou <minachou@andestech.com>
Signed-off-by: Ben Zong-You Xie <ben717@andestech.com>
Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://lore.kernel.org/r/20251117084555.157642-1-minachou@andestech.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Fangyu Yu
974555d6e4 RISC-V: KVM: Fix guest page fault within HLV* instructions
When executing HLV* instructions at the HS mode, a guest page fault
may occur when a g-stage page table migration between triggering the
virtual instruction exception and executing the HLV* instruction.

This may be a corner case, and one simpler way to handle this is to
re-execute the instruction where the virtual  instruction exception
occurred, and the guest page fault will be automatically handled.

Fixes: b91f0e4cb8 ("RISC-V: KVM: Factor-out instruction emulation into separate sources")
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251121133543.46822-1-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Dong Yang
df60cb2e67 KVM: riscv: Support enabling dirty log gradually in small chunks
There is already support of enabling dirty log gradually in small chunks
for x86 in commit 3c9bd4006b ("KVM: x86: enable dirty log gradually in
small chunks") and c862626 ("KVM: arm64: Support enabling dirty log
gradually in small chunks"). This adds support for riscv.

x86 and arm64 writes protect both huge pages and normal pages now, so
riscv protect also protects both huge pages and normal pages.

On a nested virtualization setup (RISC-V KVM running inside a QEMU VM
on an [Intel® Core™ i5-12500H] host), I did some tests with a 2G Linux
VM using different backing page sizes. The time taken for
memory_global_dirty_log_start in the L2 QEMU is listed below:

Page Size      Before    After Optimization
  4K            4490.23ms         31.94ms
  2M             48.97ms          45.46ms
  1G             28.40ms          30.93ms

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dong Yang <dayss1224@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251103062825.9084-1-dayss1224@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
BillXiang
a2483d5d1e RISC-V: KVM: Introduce KVM_EXIT_FAIL_ENTRY_NO_VSFILE
Currently, we return CSR_HSTATUS as hardware_entry_failure_reason when
kvm_riscv_aia_alloc_hgei failed in KVM_DEV_RISCV_AIA_MODE_HWACCEL
mode, which is vague so it is better to return a well defined value
KVM_EXIT_FAIL_ENTRY_NO_VSFILE provided via uapi/asm/kvm.h.

Signed-off-by: BillXiang <xiangwencheng@lanxincomputing.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250923053851.32863-1-xiangwencheng@lanxincomputing.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Anup Patel
d1c5620781 KVM: riscv: selftests: Add SBI MPXY extension to get-reg-list
The KVM RISC-V allows SBI MPXY extensions for Guest/VM so add
it to the get-reg-list test.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20251017155925.361560-5-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Anup Patel
7050f1d79f RISC-V: KVM: Add SBI MPXY extension support for Guest
The SBI MPXY extension is a platform-level functionality so KVM only
needs to forward SBI MPXY calls to KVM user-space.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20251017155925.361560-4-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Anup Patel
12fd6c62e9 RISC-V: KVM: Add separate source for forwarded SBI extensions
Add a separate source vcpu_sbi_forward.c for SBI extensions
which are entirely forwarded to KVM user-space.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20251017155925.361560-3-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Anup Patel
e2f3e2d37b RISC-V: KVM: Convert kvm_riscv_vcpu_sbi_forward() into extension handler
All uses of kvm_riscv_vcpu_sbi_forward() also updates retdata->uexit so
to further reduce code duplication move retdata->uexit assignment to
kvm_riscv_vcpu_sbi_forward() and convert it into SBI extension handler.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20251017155925.361560-2-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
René Rebe
123b5eb726 riscv: dts: sifive: unmatched: Add PWM controlled fans
This adds the two PWM-controlled fans of the HiFive Unmatched board to
the device tree.

Signed-off-by: René Rebe <rene@exactco.de>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-23 19:59:48 +00:00
Jon Hunter
1463299a22 memory: tegra186-emc: Fix missing put_bpmp
Commit a52ddb98a6 ("memory: tegra186-emc: Simplify and handle deferred
probe with dev_err_probe()") accidently dropped a call to 'put_bpmp' to
release a handle to the BPMP when getting the EMC clock fails. Fix this
by restoring the 'goto put_bpmp' if devm_clk_get() fails.

Fixes: a52ddb98a6 ("memory: tegra186-emc: Simplify and handle deferred probe with dev_err_probe()")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20251106190550.1776974-1-jonathanh@nvidia.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-22 13:41:24 +01:00
Chukun Pan
e34003dd2f arm64: dts: rockchip: enable RTC for 100ASK DshanPi A1
Enable RTC support for the 100ASK DshanPi A1 board.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Link: https://patch.msgid.link/20251120120011.279104-6-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-21 22:39:06 +01:00
Chukun Pan
d43935da2b arm64: dts: rockchip: enable USB for 100ASK DshanPi A1
Enable USB support for the 100ASK DshanPi A1 board.
Note that the HUSB311 Type-C chip is not supported.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Link: https://patch.msgid.link/20251120120011.279104-5-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-21 22:39:06 +01:00
Chukun Pan
80ec14d1d2 arm64: dts: rockchip: enable button for 100ASK DshanPi A1
The 100ASK DshanPi A1 board has three ADC buttons and one
GPIO button. Enable them.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Link: https://patch.msgid.link/20251120120011.279104-4-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-21 22:39:06 +01:00
Chukun Pan
98461b5e3c arm64: dts: rockchip: add mmc aliases for 100ASK DshanPi A1
Add missing MMC aliases for 100ASK DshanPi A1 board.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Link: https://patch.msgid.link/20251120120011.279104-3-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-21 22:39:06 +01:00
Chukun Pan
b120a9df26 arm64: dts: rockchip: remove mmc max-frequency for 100ASK DshanPi A1
The max-frequency property is already defined in the mmc
node of rk3576.dtsi. Remove the redundant definition.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/20251120120011.279104-2-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-21 22:39:05 +01:00
Michael Opdenacker
b92c4eae75 arm64: dts: rockchip: Enable i2c2 on Orange Pi 3B
Enable the "i2c2" bus on header pins 3 (I2C_SDA_M1)
and 5 (I2C2_SCL_M1) of the Orange Pi 3B board.

As documented on http://www.orangepi.org/img/pi3b/0719-pi3b-19.png,
such pins are the only ones offering I2C functionality
without conflicting with other SoC blocks.

Signed-off-by: Michael Opdenacker <michael.opdenacker@rootcommit.com>
Link: https://patch.msgid.link/20251120-orangepi3-enable-i2c2-v1-1-2e023a74012a@rootcommit.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-21 22:36:32 +01:00
Arnd Bergmann
ad2478e9fc Merge tag 'omap-for-v6.19/soc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into soc/arm
ARM: OMAP2+: Fix falg->flag typo in omap_smc2()

* tag 'omap-for-v6.19/soc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
  ARM: OMAP2+: Fix falg->flag typo in omap_smc2()

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 22:20:05 +01:00
Arnd Bergmann
a60997452a Merge tag 'imx-bindings-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/dt
i.MX dt-bindings update for 6.19:

- New board support: Protonic PRT8ML, Toradex SMARC iMX95, Skov Rev.C
  HDMI, i.MX 95 Verdin Evaluation KitPHYTEC phyBOARD-Segin-i.MX91 board,
  Skov i.MX8MP variant
- One imx-iomuxc-gpr update from Fabio Estevam to document CSI mux
- A couple of fpga-qixis bindings updates from Ioana Ciornei
- One embedded-controller update from Mathew McBride to add Traverse
  Ten64 board controller

* tag 'imx-bindings-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  dt-bindings: arm: fsl: add Toradex SMARC iMX95
  dt-bindings: arm: fsl: add Skov Rev.C HDMI support
  dt-bindings: arm: fsl: Add PHYTEC phyBOARD-Segin-i.MX91 board
  dt-bindings: fsl,fpga-qixis: describe the gpio child node found on LS1046AQDS
  dt-bindings: fsl,fpga-qixis-i2c: add support for LX2160ARDB FPGA
  dt-bindings: arm: fsl: Add Protonic PRT8ML
  dt-bindings: arm: imx: document i.MX 95 Verdin Evaluation Kit (EVK)
  dt-bindings: embedded-controller: add Traverse Ten64 board controller
  dt-bindings: soc: imx-iomuxc-gpr: Document the CSI mux
  dt-bindings: arm: fsl: add compatible for Skov i.MX8MP variant

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 22:18:35 +01:00
Julia Lawall
d6ecfc55d4 ARM: gemini: fix typos in comments
Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20251023204737.2716443-1-linus.walleij@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 22:16:39 +01:00
Arnd Bergmann
1a263d3a70 Merge tag 'v6.19-rockchip-defconfig64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/defconfig
Rockchip support for basic camera interface (CIF) and Synopsis DW-DP
driver, as well as the CEC extension to the DW-HDMI-QP driver.

* tag 'v6.19-rockchip-defconfig64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: defconfig: enable rockchip camera interface
  arm64: defconfig: Enable DW HDMI QP CEC support
  arm64: defconfig: Enable Rockchip extensions for Synopsys DW DP

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:43:36 +01:00
Arnd Bergmann
922ef7512d Merge tag 'qcom-arm64-defconfig-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/defconfig
Qualcomm Arm64 defconfig updates for v6.19

Enable config options for the hardare used across Fairphone 3, 4, and 5.
Then enable Novatek display panels founds on Xiaomi Pocophone F1, and
the SM8750 MTP, eUSB2 PHY found in SM8750, NSS clock controller found in
IPQ5424, the SX150x gpio expander used in QCS615 reference device, and
the support for UFS inline crypto.

* tag 'qcom-arm64-defconfig-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  arm64: defconfig: Enable SX150x GPIO expander driver
  arm64: defconfig: Build NSS clock controller driver for IPQ5424
  arm64: defconfig: Enable SCSI UFS Crypto and Block Inline encryption drivers
  arm64: defconfig: Add M31 eUSB2 PHY config
  arm64: defconfig: Enable configs for Fairphone 3, 4, 5 smartphones
  arm64: defconfig: Enable two Novatek display panels for MTP8750 and Tianma

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:42:44 +01:00
Arnd Bergmann
f65372ea2f Merge tag 'omap-for-v6.19/defconfig-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into soc/defconfig
multi_v7_defconfig: Enable TI PRU Ethernet driver

* tag 'omap-for-v6.19/defconfig-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
  ARM: multi_v7_defconfig: Enable TI PRU Ethernet driver

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:42:03 +01:00
Arnd Bergmann
8283edc8ba Merge tag 'at91-defconfig-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into soc/defconfig
Microchip AT91 defconfig updates for v6.19

This update includes:
- CONFIG_MMC_SPI is set to module for at91_dt_defconfig

* tag 'at91-defconfig-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: at91: at91_dt_defconfig: set MMC_SPI to module

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:41:15 +01:00
Arnd Bergmann
68bdc95066 Merge tag 'imx-defconfig-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/defconfig
i.MX defconfig changes for 6.19:

- Enable sound drivers for imx28-amarula-rmm in mxs_defconfig
- Enable i.MX AIPSTZ driver, i.MX95 pinctrl driver, Ethernet and PCIe
  support in arm64 defconfig

* tag 'imx-defconfig-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  arm64: defconfig: enable i.MX AIPSTZ driver
  ARM: mxs_defconfig: enable sound drivers for imx28-amarula-rmm
  arm64: defconfig: Enable i.MX95 drivers for pinctrl, Ethernet and PCIe

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:39:20 +01:00
Arnd Bergmann
74620292be Merge tag 'tegra-for-6.19-arm64-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/defconfig
arm64: tegra: Default configuration changes for v6.19-rc1

Enable the new driver for the VRS PSEQ RTC found on Tegra234 and later.

* tag 'tegra-for-6.19-arm64-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  arm64: defconfig: Enable NVIDIA VRS PSEQ RTC

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:37:54 +01:00
Arnd Bergmann
909d8d395f Merge tag 'tegra-for-6.19-arm-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/defconfig
ARM: tegra: Default configuration changes for v6.19-rc1

Enable ext4 by default on Tegra to restore systems booting from MMC.

* tag 'tegra-for-6.19-arm-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  ARM: tegra: Enable EXT4 for Tegra

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:36:57 +01:00
Arnd Bergmann
0719a58aac Merge tag 'mtk-defconfig-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/defconfig
MediaTek defconfig updates

As MediaTek boards with UFS appeared some time ago, this adds a
single commit enabling the MediaTek UFS driver, allowing those
boards to boot over UFS as primary storage.

* tag 'mtk-defconfig-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux:
  arm64: defconfig: Enable UFS support for MediaTek Genio 1200 EVK UFS board

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:35:44 +01:00
Tudor Ambarus
38c8787f1b arm64: defconfig: enable Exynos ACPM clocks
Enable the Exynos ACPM clocks driver. Samsung Exynos platforms
implement ACPM to provide support for clock configuration, PMIC
and temperature sensors.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Tested-by: Peter Griffin <peter.griffin@linaro.org> # on gs101-oriole
Link: https://patch.msgid.link/20251010-acpm-clk-v6-5-321ee8826fd4@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20251110121344.120785-5-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:34:59 +01:00
Arnd Bergmann
24a979a1fe Merge tag 'renesas-arm-defconfig-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/defconfig
Renesas ARM defconfig updates for v6.19

  - Enable support for the Renesas RZ/G3S and RZ/G3E thermal drivers,
    and the RZ/T2H and RZ/N2H ADC drivers in the ARM64 defconfig,
  - Refresh the ARM SH-Mobile defconfig for v6.18-rc1.

* tag 'renesas-arm-defconfig-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
  arm64: defconfig: Enable RZ/T2H / RZ/N2H ADC driver
  ARM: shmobile: defconfig: Refresh for v6.18-rc1
  arm64: defconfig: Enable the Renesas RZ/G3E thermal driver
  arm64: defconfig: Enable Renesas RZ/G3S thermal driver

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:33:06 +01:00
Huang Shijie
04392a1e81 arm64: defconfig: Remove the redundant SCHED_MC/SCHED_SMT
The patch "7bd291abe2d sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig"
has enabled the SCHED_MC/SCHED_SMT by default for arm64.

So remove the redundant code in defconfig.

Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20251021075704.527626-1-shijie@os.amperecomputing.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:32:31 +01:00
Arnd Bergmann
9b418a3bfd Merge tag 'anlogic-initial-6.19-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/newsoc
Initial Anlogic Platform Support

Add bindings for the serial and timer peripherals, and a basic soc dtsi
for the Anlogic dr1v90 SoC. The Milianke MLKPAI FS01 is the first board
for this SoC. Add myself as maintainer for this platform for the time
being.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'anlogic-initial-6.19-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
  MAINTAINERS: Setup support for Anlogic tree
  riscv: defconfig: Enable Anlogic SoC
  riscv: dts: anlogic: Add Milianke MLKPAI FS01 board
  riscv: dts: Add initial Anlogic DR1V90 SoC device tree
  riscv: Add Anlogic SoC famly Kconfig support
  dt-bindings: serial: snps-dw-apb-uart: Add Anlogic DR1V90 uart
  dt-bindings: timer: Add Anlogic DR1V90 ACLINT MTIMER
  dt-bindings: riscv: Add Anlogic DR1V90
  dt-bindings: riscv: Add Nuclei UX900 compatibles
  dt-bindings: vendor-prefixes: Add Anlogic, Milianke and Nuclei
2025-11-21 21:29:57 +01:00
Shivam Chaudhary
00465358f4 ARM: versatile: Fix typo in versatile.c
Corrected minor typo in versatile.c
- Fixed "documentaton" to "documentation"

Signed-off-by: Shivam Chaudhary <cvam0000@gmail.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20251023223258.3181274-1-linus.walleij@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:20:42 +01:00
Arnd Bergmann
9572838be3 Merge tag 'mtk-arm32-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/arm
MediaTek mach ARM32 updates

This adds support for the MT6582 SoC and its SMP bringup code.
This SoC is found in old smartphones and tablets from various
manufacturers.

* tag 'mtk-arm32-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux:
  ARM: mediatek: add MT6582 smp bring up code
  ARM: mediatek: add board_dt_compat entry for the MT6582 SoC

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:19:52 +01:00
Arnd Bergmann
c5dc31b2b4 Merge branch 'bst/newsoc' into soc/newsoc
This patch series introduces platform support for Black Sesame Technologies
(BST) C1200 SoC and CDCU1.0 ADAS 4C2G board. BST is a leading automotive-grade
computing SoC provider focusing on intelligent driving, computer vision, and AI
capabilities for ADAS and autonomous driving applications. You can find more
information about the SoC and related boards at: https://bst.ai

This series provides the foundational platform enablement including device tree
bindings, SoC and board device trees, platform configuration, and maintainer
information. MMC/SDHCI driver support will be submitted in a separate patch series.

* bst/newsoc:
  MAINTAINERS: add Black Sesame Technologies (BST) ARM SoC support
  arm64: defconfig: enable BST platform support
  arm64: dts: bst: add support for Black Sesame Technologies C1200 CDCU1.0 board
  arm64: Kconfig: add ARCH_BST for Black Sesame Technologies SoCs
  dt-bindings: arm: add Black Sesame Technologies (bst) SoC
  dt-bindings: vendor-prefixes: Add Black Sesame Technologies Co., Ltd.

Link: https://lore.kernel.org/all/20251016120558.2390960-1-yangzh0906@thundersoft.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:14:07 +01:00
Albert Yang
92bcb8cc7b MAINTAINERS: add Black Sesame Technologies (BST) ARM SoC support
Add a MAINTAINERS entry for Black Sesame Technologies (BST) ARM SoC
support. This entry covers device tree bindings, drivers, and board files
for BST SoCs, and platform support.

Signed-off-by: Albert Yang <yangzh0906@thundersoft.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:12:26 +01:00
Albert Yang
be64ae6fe2 arm64: defconfig: enable BST platform support
Enable support for Black Sesame Technologies (BST) platform
in the ARM64 defconfig:
- CONFIG_ARCH_BST: Enable BST SoC platform support

Signed-off-by: Albert Yang <yangzh0906@thundersoft.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:12:26 +01:00
Albert Yang
c03a37773b arm64: dts: bst: add support for Black Sesame Technologies C1200 CDCU1.0 board
Add device tree support for the Black Sesame Technologies (BST) C1200
CDCU1.0 ADAS 4C2G platform. This platform is based on the BST C1200 SoC
family.
The changes include:
- Adding a new BST device tree directory
- Adding Makefile entries to build the BST platform device trees
- Adding the device tree for the BST C1200 CDCU1.0 ADAS 4C2G board
This board features a quad-core Cortex-A78 CPU, and various peripherals
including UART, and interrupt controller.

Signed-off-by: Albert Yang <yangzh0906@thundersoft.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:12:26 +01:00
Albert Yang
1541219416 arm64: Kconfig: add ARCH_BST for Black Sesame Technologies SoCs
Add ARCH_BST configuration option to enable support for Black Sesame
Technologies SoC family. BST produces automotive-grade system-on-chips
for intelligent driving, focusing on computer vision and AI capabilities.
The BST C1200 family includes SoCs for ADAS and autonomous driving
applications.

Signed-off-by: Albert Yang <yangzh0906@thundersoft.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:12:26 +01:00
Albert Yang
fc77dc0620 dt-bindings: arm: add Black Sesame Technologies (bst) SoC
Add device tree bindings for Black Sesame Technologies Arm SoC,
it consists several SoC models like C1200, etc.

Signed-off-by: Albert Yang <yangzh0906@thundersoft.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:12:25 +01:00
Albert Yang
9371cd4183 dt-bindings: vendor-prefixes: Add Black Sesame Technologies Co., Ltd.
Black Sesame Technologies Co., Ltd.s a leading automotive-grade
computing SoC and SoC-based intelligent vehicle solution provider.
Link: https://bst.ai/.

Signed-off-by: Albert Yang <yangzh0906@thundersoft.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:12:25 +01:00
Arnd Bergmann
4079b4700d Merge tag 'ti-k3-dt-for-v6.19-part2' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/dt
TI K3 device tree updates for v6.19 part2

Late fixes and cleanups:
* Fix build warnings for unapplied overlays for PHYTEC, SA67 and certain TI EVM
* Fix pinmux of SD regulator control line on J721e SK
* Correct unit address of cbass_wakeup node for AM62L

* tag 'ti-k3-dt-for-v6.19-part2' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux:
  arm64: dts: ti: k3-am62l: Fix unit address of cbass_wakeup
  arm64: dts: ti: k3-j721e-sk: Fix pinmux for pin Y1 used by power regulator
  arm64: dts: ti: Add missing applied DT overlay targets
  arm64: dts: ti: sa67: add build time dtb for overlays
  arm64: dts: ti: Enable build testing of PHYTEC board overlays

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:06:29 +01:00
Arnd Bergmann
887bc88163 Merge tag 'mvebu-dt64-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into soc/dt
mvebu dt64 for 6.19 (part 1)

pinctrl node names cleanup from Rob on Marvell device tree files
Proper fix for pci errors on armada cp11x based platforms

* tag 'mvebu-dt64-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
  clk: mvebu: cp110 add CLK_IGNORE_UNUSED to pcie_x10, pcie_x11 & pcie_x4
  Revert "arm64: dts: marvell: cn9132-clearfog: fix multi-lane pci x2 and x4 ports"
  arm64/arm: dts: marvell: Rename "nand-rb" pinctrl node names

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:05:33 +01:00
Arnd Bergmann
314bfe59ec Merge tag 'riscv-sophgo-dt-for-v6.19' of https://github.com/sophgo/linux into soc/dt
RISC-V Devicetrees for v6.19

Sophgo:

For CV18xx serials:
Add top syscon device related DTS change, the top system
controller provides register access to configure some
misc modules, such as usb2 phy and a dma multiplexer.

For SG2042:
There are two changes. The first one is to add DTS
definition for PCIe controllers for SoC SG2042 and
boards such as Pioneerbox/EVB_V1/EVB_V2 uses SG2042.
The second one is to add DTS to support SPI-NOR flash
controllers for this SoC and the same for related boards.

Signed-off-by: Chen Wang <unicorn_wang@outlook.com>

* tag 'riscv-sophgo-dt-for-v6.19' of https://github.com/sophgo/linux:
  riscv: dts: sophgo: Enable SPI NOR node for SG2042_EVB_V2
  riscv: dts: sophgo: Enable SPI NOR node for SG2042_EVB_V1
  riscv: dts: sophgo: Enable SPI NOR node for PioneerBox
  riscv: dts: sophgo: Add SPI NOR node for SG2042
  riscv: dts: sophgo: Add USB support for cv18xx
  riscv: dts: sophgo: Add syscon node for cv18xx
  dt-bindings: soc: sophgo: add TOP syscon for CV18XX/SG200X series SoC
  riscv: sophgo: dts: enable PCIe for SG2042_EVB_V2.0
  riscv: sophgo: dts: enable PCIe for SG2042_EVB_V1.X
  riscv: sophgo: dts: enable PCIe for PioneerBox
  riscv: sophgo: dts: add PCIe controllers for SG2042

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 21:04:31 +01:00
Arnd Bergmann
f0218293ee Merge tag 'cix-dt-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/cix into soc/dt
CIX device tree changes for v6.19-rc1, add below new components support:
- PCIe
- Pinctrl
- SPI

* tag 'cix-dt-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/cix:
  arm64: dts: cix: add a compatible string for the cix sky1 SoC
  arm64: dts: cix: Enable PCIe on the Orion O6 board
  arm64: dts: cix: Add PCIe Root Complex on sky1
  arm64: dts: cix: Add pinctrl nodes for sky1
  arm64: dts: cix: add DT nodes for SPI

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 20:59:54 +01:00
Arnd Bergmann
5733ad88a5 Merge tag 'stm32-dt-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into soc/dt
STM32 DT for v6.19, round 1

Highlights:
-----------

- MPU:
  - STM32MP13:
    - Add and enable the ARM SMC watchdog to use IWDG1 in the secure
      world.

  - STMP32MP15:
    - Phytec SOM: Fix STMPE811 touchscreen
    - LXA: drop unnecessary vusb_d/a-supply as already defined by
           "phy-supply" and "vdda1v8-supply".

  - STM32MP23:
    - Use the RIFSC as an access controler (firewall) as it is done
      for STM32MP25 and STM32MP23.

  - STM32MP25:
    - Add OSPI memory region name.
    - Add I/O synchronization properties to satisfy RGMII
      specification.

* tag 'stm32-dt-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
  arm64: dts: st: set RIFSC as an access controller on stm32mp21x platforms
  ARM: dts: stm32: add the IWDG2 interrupt line in stm32mp131.dtsi
  ARM: dts: stm32: enable the ARM SMC watchdog node in stm32mp135f-dk
  ARM: dts: stm32: add the ARM SMC watchdog in stm32mp131.dtsi
  ARM: dts: stm32: add iwdg1 node in stm32mp131.dtsi
  arm64: dts: st: Add I/O sync to eth pinctrl in stm32mp25-pinctrl.dtsi
  arm64: dts: st: Add memory-region-names property for stm32mp257f-ev1
  ARM: dts: stm32: lxa: drop unnecessary vusb_d/a-supply
  ARM: dts: stm32: stm32mp157c-phycore: Fix STMPE811 touchscreen node properties

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 20:53:53 +01:00
Arnd Bergmann
bb4553b163 Merge tag 'v6.19-rockchip-dts64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
New boards: 9Tripod X3568, 100ASK DShanPi A1, LinkEase EasePi R1,
FriendlyElec NanoPi R76S

Interesting archeological addition: RK3368 (2015) gets display
output afterall.

New peripherals: vicap on px30 and rk356x, PCIe Gen2x1 on RK3528,
use actual clock-ids for SCMI clocks - not hardcoded numbers,
CQE support for the eMMC on RK3588.

As well as a number of enablements for individual boards.
For example enablement for the now usable NPU.

* tag 'v6.19-rockchip-dts64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (43 commits)
  arm64: dts: rockchip: add vicap node to rk356x
  arm64: dts: rockchip: add the vip node to px30
  arm64: dts: rockchip: fixes audio for 100ASK DshanPi A1
  arm64: dts: rockchip: fixes vcc3v3_s0 supply for 100ASK DshanPi A1
  arm64: dts: rockchip: fixes ethernet for 100ASK DshanPi A1
  arm64: dts: rockchip: fixes regulator for 100ASK DshanPi A1
  arm64: dts: rockchip: correct assigned-clock-rates spelling on 2 boards
  arm64: dts: rockchip: clean up devicetree for 9Tripod X3568 v4
  arm64: dts: rockchip: Enable USB-C DP Alt for Indiedroid Nova
  arm64: dts: rockchip: add eMMC CQE support for rk3588
  arm64: dts: rockchip: enable HDMI audio on Rock 5 ITX
  arm64: dts: rockchip: Add eeprom vcc-supply for Radxa ROCK 3C
  arm64: dts: rockchip: Add eeprom vcc-supply for Radxa ROCK 5A
  arm64: dts: rockchip: Move the EEPROM to correct I2C bus on Radxa ROCK 5A
  arm64: dts: rockchip: use SCMI clock id for gpu clock on rk356x
  arm64: dts: rockchip: Remove sdmmc max-frequency on RK3588S EVB1 board
  arm64: dts: rockchip: Remove sdmmc max-frequency for Radxa ROCK 5 ITX/5B/5B+/5T
  arm64: dts: rockchip: Switch microSD card detect to gpio on Radxa ROCK 5 ITX/5C
  arm64: dts: rockchip: Add devicetree for the 9Tripod X3568 v4
  dt-bindings: arm: rockchip: Add 9Tripod X3568 series
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 20:52:41 +01:00
Arnd Bergmann
d7eebe1098 Merge tag 'v6.19-rockchip-dts32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
A number of cleanups for older socs.

* tag 'v6.19-rockchip-dts32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  ARM: dts: rockchip: move edp assigned-clocks to edp node on rk3288
  ARM: dts: rockchip: Add spi_flash label to rk3288-veyron
  ARM: dts: rockchip: Remove mshc aliases from RK3288
  ARM: dts: rockchip: Adapt tps65910 nodes on RK3066 boards

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 20:51:06 +01:00
Lukas Bulwahn
055bcc552b MAINTAINERS: refer to intended file in STANDALONE CACHE CONTROLLER DRIVERS
Commit 23db6eed72bd ("MAINTAINERS: Add Jonathan Cameron to drivers/cache
and add lib/cache_maint.c + header") intends to add a file entry pointing
to the cache_coherency.h file, but messes up to name the right path.

Update the entry to the intended file.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-21 18:42:02 +00:00
Yushan Wang
2ec3b54a6f cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent
Hydra Home Agent is a device used to maintain cache coherency. Add support
for explicit cache maintenance operations using it. A system has multiple
of these agents. Whilst only one agent is responsible for a given cache
line, interleave means that for a range operation, responsibility for the
cache lines making up the range will typically be spread across multiple
instances.

Put this driver on a new Kconfig menu under drivers/cache. The short
description as memory hotplug like operations is intended to cover
the somewhat complex set of cases where this unit applies and differentiate
it clearly from typical non coherent DMA flows.

Co-developed-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-21 18:42:02 +00:00
Jonathan Cameron
4d1608d0ab cache: Make top level Kconfig menu a boolean dependent on RISCV
The next patch will add a new type of cache maintenance driver responsible
for flushing deeper than is necessary for non coherent DMA (current
use case of drivers/cache drivers), as needed when performing operations
such as memory hotplug and security unlocking of persistent memory. The two
types of operation are similar enough to share a drivers/cache directory
and MAINTAINERS but are otherwise currently unrelated.

To avoid confusion have two separate menus. Each has dependencies that are
implemented by making them boolean symbols, here CACHEMAINT_FOR_DMA
which is dependent on RISCV as all driver are currently for platforms of
that architecture. Set new symbol default to y to avoid breaking existing
configs. This has no affect on actual code built, just visibility of the
menu.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-21 18:42:02 +00:00
Jonathan Cameron
9b9de5a56a MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header
Seems unfair to inflict the cache-coherency drivers on Conor with out also
stepping up as a second maintainer for drivers/cache.

Include the library support for cache-coherency maintenance drivers to the
existing entry.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-21 18:42:02 +00:00
Jonathan Cameron
4d873c5dc3 arm64: Select GENERIC_CPU_CACHE_MAINTENANCE
The generic CPU cache maintenance framework provides a way to register
drivers for devices implementing the underlying support for
cpu_cache_has_invalidate_memregion(). Enable it for arm64 by selecting
GENERIC_CPU_CACHE_MAINTENANCE which provides the implementation for,
and in turn selects, ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-21 18:42:01 +00:00
Yicong Yang
c460697d34 lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
invalidating certain memory regions in a cache-incoherent manner. Currently
this is used by NVDIMM and CXL memory drivers in cases where it is
necessary to flush all data from caches by physical address range.
The operations in question are effectively memory hotplug, where stale
data might otherwise remain in the caches.

This is separate from the invalidates done to enable use of non-coherent
DMA masters, primarily in terms of when it is needed (not related to DMA
mappings) and how deep the flush must push data. The flushes done for
non-coherent DMA only need to reach the Point of Coherence of a single host
(which is often nearer CPUs and DMA masters than the physical storage).
This operation must push the data out of non architectural caches
(memory-side caches, write buffers etc) and typically all the way to the
memory device.

In some architectures these operations are supported by system components
that may become available only later in boot as they are either present
on a discoverable bus, or via a firmware description of an MMIO interface
(e.g. ACPI DSDT). Provide a framework to handle this case.

Architectures can opt in for this support via
CONFIG_GENERIC_CPU_CACHE_MAINTENANCE

Add a registration framework. Each driver provides an ops structure and
the first op is Write Back and Invalidate by PA Range. The driver may
over invalidate.

For systems that can perform this operation asynchronously an optional
completion check operation is also provided. If present that must be called
to ensure that the action has finished. This provides a considerable
performance advantage if multiple agents are involved in the maintenance
operation.

When multiple agents are present in the system each should register with
this framework and the core code will issue the invalidate to all of them
before checking for completion on each. This is done to avoid need for
filtering in the core code which can become complex when interleave,
potentially across different cache coherency hardware is going on, so it
is easier to tell everyone and let those who don't care do nothing.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-21 18:42:01 +00:00
Yosry Ahmed
d2e50389ab KVM: selftests: Make sure vm->vpages_mapped is always up-to-date
Call paths leading to __virt_pg_map() are currently:
(a) virt_pg_map() -> virt_arch_pg_map() -> __virt_pg_map()
(b) virt_map_level() -> __virt_pg_map()

For (a), calls to virt_pg_map() from kvm_util.c make sure they update
vm->vpages_mapped, but other callers do not. Move the sparsebit_set()
call into virt_pg_map() to make sure all callers are captured.

For (b), call sparsebit_set_num() from virt_map_level().

It's tempting to have a single the call inside __virt_pg_map(), however:
- The call path in (a) is not x86-specific, while (b) is. Moving the
  call into __virt_pg_map() would require doing something similar for
  other archs implementing virt_pg_map().

- Future changes will reusue __virt_pg_map() for nested PTEs, which should
  not update vm->vpages_mapped, i.e. a triple underscore version that does
  not update vm->vpages_mapped would need to be provided.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-12-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-21 10:17:05 -08:00
Yosry Ahmed
1de4dc15ba KVM: selftests: Stop using __virt_pg_map() directly in tests
Replace __virt_pg_map() calls in tests by high-level equivalent
functions, removing some loops in the process.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-11-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-21 10:15:14 -08:00
Arnd Bergmann
c1ce3348ae Merge tag 'qcom-arm64-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/dt
Qualcomm Arm64 DeviceTree updates for v6.19

Introduce support for the Redxa Dragon Q6A development board, the Huawei
MateBoot E 2019, the Asus ZenFone 2 Laser/Selfie, the MSM8937 platform
and the Xiaomo Redmi 3S device based on it.

SoC dtsi files for Agatti, Hamoa, Kodiak, Monaco, Purwa, and Talos, are
renamed in order to better facilitate the addition of new boards on the
various SKUs of these.

Cooling maps are introduced for the CPU cores in IPQ5424, and the
network subsystem clock controller is added.

On Lemans, RTC is enabled, the EVK fan controller is described and a
camera mezzanine overlay is introduced.

Touchscreen support is added to the BQ Aquaris M5, and the touchscreen
from Samsung Galaxy Core Prime is moved to the common platform to
benefit the other devices sharing common definitions.

On Agatti two more UARTs are described, as well as APR and the related audio
services, and the LPASS LPI pin controller. The RB1 board gets HDMI
autio playback support.

On Kodiak-based targets, Fairphone FP5 gains definitions of the UW camera
actuator, regulator for the ToF sensor, and haptic module. The SHIFT
SHIFTphone 8 gains RGB and flash LEDs, and Venus support. The Rb3Gen2
development board gets QUP firmware path defined, to support dynamic
loading of the serial engine firmware. Kodiak also gains Coresight
devices for AOSS and QDSS blocks.

Display support is added for the Talos platform, and enabled on the Ride
board. Talos also gains the definitions to scale DDR and L3
interconnects.

On SC8280XP, the camera privacy indicator on Lenovo Thinkpad X13s is
connected to the camera stack. Off-by-one GPI DMA channels are
corrected.

The SDM845-based LG and OnePlus custom defined rmtfs guard pages are
replaced with the inline-support for guard pages.

SDX75 DWC3 node is flattened and marked for USB role switching.

On SM8550, the camera subsystem and the S5K3M5 camera sensor is
introduced for the QRD, and an overlay for the "Rear Camera Card" for
the Hardware Development Kit (HDK) is introduced.

USB support is introduce for the SM8750 platform, and enabled in the MTP
and QRD devices.

On Hamoa, like on other devices the Asus Zenbook A14 definition of the
eDP panel is reworked to support both LCD and OLED configurations. WiFi
and Bluetooth is also enabled on the A14. The CRD gains support for
controlling charge limits.

The refgen regulator supplying DSI is defined and wired up on a variety
of platforms.

* tag 'qcom-arm64-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (138 commits)
  arm64: dts: qcom: sdx75: Add missing usb-role-switch property
  arm64: dts: qcom: sdx75: Flatten usb controller node
  arm64: dts: qcom: HAMOA-IOT-SOM: Unreserve GPIOs blocking SPI11 access
  arm64: dts: qcom: qrb2210-rb1: Fix UART3 wakeup IRQ storm
  Revert "arm64: dts: qcom: sc7280: Increase config size to 256MB for ECAM feature"
  arm64: dts: qcom: kodiak: add coresight nodes
  arm64: dts: qcom: sdm845-oneplus: Describe TE gpio
  arm64: dts: qcom: sdm845-oneplus: Implement panel sleep pinctrl
  arm64: dts: qcom: sdm845-oneplus: Group panel pinctrl
  arm64: dts: qcom: sdm845-oneplus: Update compatbible and add DDIC supplies
  arm64: dts: qcom: qcs6490-rb3gen2: Rename vph-pwr regulator node
  arm64: dts: qcom: qcm6490-fairphone-fp5: Add UW cam actuator
  arm64: dts: qcom: qcm6490-fairphone-fp5: Enable CCI pull-up
  arm64: dts: qcom: sm8750: Add USB support for SM8750 QRD platform
  arm64: dts: qcom: sm8750: Add USB support for SM8750 MTP platform
  arm64: dts: qcom: sm8750: Add USB support to SM8750 SoCs
  arm64: dts: qcom: rename x1p42100 to purwa
  arm64: dts: qcom: rename sc7280 to kodiak
  arm64: dts: qcom: rename qcm2290 to agatti
  arm64: dts: qcom: add gpu_zap_shader label
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 17:24:41 +01:00
Arnd Bergmann
c02dce25bc Merge tag 'qcom-arm32-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/dt
Qualcomm Arm32 DeviceTree updates for v6.19

In addition to a variety of cleanups and reordering of nodes, four GSBIs
are added to the MSM8960 platform.
On the MSM8226-based Samsung Galaxy Grand 2, a simple framebuffer is
defined.

* tag 'qcom-arm32-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  ARM: dts: qcom: msm8226-samsung-ms013g: add simple-framebuffer
  ARM: dts: qcom: msm8960: rename msmgpio node to tlmm
  ARM: dts: qcom: msm8960: add I2C nodes for gsbi1 and gsbi8
  ARM: dts: qcom: msm8960: add I2C nodes for gsbi10 and gsbi12
  ARM: dts: qcom: msm8960: inline qcom-msm8960-pins.dtsi
  ARM: dts: qcom: msm8960: reorder nodes and properties

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 17:21:12 +01:00
Arnd Bergmann
2ddb010a9e Merge tag 'omap-for-v6.19/dt-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into soc/dt
Misc. DT updates, mostly for AM3 platforms.

* tag 'omap-for-v6.19/dt-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
  ARM: dts: ti/omap: fix incorrect compatible string in internal eeprom node
  ARM: dts: omap: am335x-mba335x: Fix stray '/*' in comment
  ARM: dts: omap: am335x-tqma335x/mba335x: Fix MicIn routing
  dt-bindings: mmc: ti,omap2430-sdhci: convert to DT schema
  arm: dts: ti: omap: Drop unnecessary properties for SDHCI node
  arm: dts: ti: omap: am335x-pepper: Fix vmmc-supply property typo
  ARM: dts: omap3: n900: Correct obsolete TWL4030 power compatible
  ARM: dts: omap3: beagle-xm: Correct obsolete TWL4030 power compatible
  ARM: dts: omap: Add support for TQMa335x/MBa335x
  ARM: dts: omap: AM33xx: add cpu label
  dt-bindings: omap: add AM335x-based TQMa335x SOM and MBa335x board
  ARM: dts: am335x-netcom-plus-2xx: add missing GPIO labels
  ARM: dts: ti: omap: am335x-baltos: add a regulator for the mPCIe slot
  ARM: dts: omap: rework Baltos LED nodes
  ARM: dts: am33xx: Add missing serial console speed

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 17:19:45 +01:00
Arnd Bergmann
4033cc098c Merge tag 'at91-dt-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into soc/dt
Microchip AT91 device tree updates for v6.19

This update includes:
- FIFO size fixes for some Flexcom UARTs and SPIs

* tag 'at91-dt-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: dts: microchip: sama5d2: fix spi flexcom fifo size to 32
  ARM: dts: microchip: sama7g5: fix uart fifo size to 32
  ARM: dts: microchip: sama7d65: fix uart fifo size to 32

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 17:19:04 +01:00
Arnd Bergmann
287df51938 Merge tag 'ti-k3-dt-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/dt
TI K3 device tree updates for v6.19

Generic fixes and cleanups:
* Multiple SoCs: Disable CPSW in SoC files and
  enable them in board files for better board-level control
* Replace rgmii-rxid with rgmii-id for CPSW ports across multiple boards

New Boards/SoM:
* AM62L SoC and basic support for EVM
* Toradex Aquila AM69 board support
* Kontron SMARC-sAM67 module and ADS2 carrier board support

Platform wide:
* Define possible system states amd wakeup-source (AM62/AM62A/AM62P)

SoC/EVM specific changes:
AM62:
* Add RNG node
* Add OLDI support

AM62P:
* Move audio_refclk to common main dtsi (k3-am62p-j722s-common-main)
* Fix memory ranges for GPU

AM62D2:
* Enable PMIC support on EVM
* Misc fixes

AM64:
* Add DMA support for TSCADC on EVM

AM69:
* Add Aquila board support with Clover variant

J722S:
* Fix audio refclk source in main dtsi
* Explicitly use PLL1_HSDIV6 audio refclk for EVM

J784S4/J742S2:
* Add bootph-all tag to support PCIe boot

Variscite VAR-SOM-AM62P:
* Add support for ADS7846 touchscreen
* Add support for WM8904 audio codec

* tag 'ti-k3-dt-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux: (42 commits)
  arm64: dts: ti: k3-am62l: add initial reference board file
  arm64: dts: ti: k3-am62l: add initial infrastructure
  dt-bindings: arm: ti: Add binding for AM62L SoCs
  arm64: dts: ti: am69-aquila: Add Clover
  arm64: dts: ti: Add Aquila AM69 Support
  dt-bindings: arm: ti: add Toradex Aquila AM69
  arm64: dts: ti: k3-j721s2: disable "mcu_cpsw" in SoC file and enable in board files
  arm64: dts: ti: k3-j721e: disable "mcu_cpsw" in SoC file and enable it in board file
  arm64: dts: ti: k3-j7200: disable "mcu_cpsw" in SoC file and enable in board file
  arm64: dts: ti: k3-am65: disable "mcu_cpsw" in SoC file and enable in board file
  arm64: dts: ti: k3-am62: disable "cpsw3g" in SoC file and enable in board file
  arm64: dts: ti: k3-am62p5-sk: Set wakeup-source system-states
  arm64: dts: ti: k3-am62a7-sk: Set wakeup-source system-states
  arm64: dts: ti: k3-am62-lp-sk: Set wakeup-source system-states
  arm64: dts: ti: k3-am62p: Define possible system states
  arm64: dts: ti: k3-am62a: Define possible system states
  arm64: dts: ti: k3-am62: Define possible system states
  arm64: dts: ti: k3-am62p-j722s-common-main: move audio_refclk here
  arm64: dts: ti: k3-*: Replace rgmii-rxid with rgmii-id for CPSW ports
  arm64: dts: ti: k3-am642-tqma64xxl: add boot phase tags
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 17:15:08 +01:00
Arnd Bergmann
c7cdc91603 Merge tag 'socfpga_dts_updates_for_v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into soc/dt
SoCFPGA DTS updates for v6.19
- Add 4-bit SPI bus width(n5x, stratix10, agilex and agilex5)
- Agilex5 updates:
	- Add GMAC0 for NAND daughter card
	- Add SMMU support
	- Add VGIC maintenance interrupt
	- Add L2 and L3 cache
	- Add support for the 013b board
	- Add I3C support
- Add support for the Enclustra Mercury+ SA1 SoM based on Cyclone5
- Add support for Agilex3 board(a variant of the Agilex5 board)
- dt-bindings update:
	- Document iommu in cdns,hp-nfc, snps,dw-axi-dmac and Agilex5
	- Document Enclustra Mercury SA1 and AA1 boards
	- Document Agilex5 013b board
	- Document Agilex3 board
- Fix dtbs_check warnings:
	- stratix10-swvp
	- Agilex(NAND and Clock manager)
	- Move sdmmc-ecc to base DTSI file(Stratix10)

* tag 'socfpga_dts_updates_for_v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: (35 commits)
  arm64: dts: socfpga: agilex5: update qspi partitions for 013b board
  arm64: dts: socfpga: add Agilex3 board
  dt-bindings: intel: Add Agilex3 SoCFPGA board
  arm64: dts: intel: agilex5: Add Altera compatible for I3C controllers
  arm64: dts: socfpga: Add Agilex5 SVC node with memory region
  dt-bindings: firmware: svc: Add IOMMU support for Agilex5
  arm64: dts: socfpga: agilex5: Add SMMU-V3-PMCG nodes
  arm64: dts: socfpga: agilex5: Add L2 and L3 cache
  arm64: dts: socfpga: agilex5: fix CHECK_DTBS warning for NAND
  arm64: dts: socfpga: agilex5: add support for 013b board
  dt-bindings: intel: Add Agilex5 SoCFPGA 013b board
  arm64: dts: socfpga: agilex5: add VGIC maintenance interrupt
  arm64: dts: socfpga: agilex: fix dtbs_check warning for NAND
  arm64: dts: socfpga: agilex: fix dtbs_check warning for clock manager
  arm64: dts: socfpga: stratix10-swvp: fix dtbs_check warnings swvp
  arm64: dts: socfpga: move sdmmc-ecc to the base DTSI file
  ARM: dts: socfpga: add Enclustra SoM dts files
  dt-bindings: altera: removal of generic PE1 dts
  ARM: dts: socfpga: removal of generic PE1 dts
  dt-bindings: altera: add Mercury AA1 variants
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 17:13:50 +01:00
Arnd Bergmann
976e33268c Merge tag 'imx-dt64-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/dt
i.MX arm64 device tree changes for 6.19:

- New board support: Protonic PRT8ML, Toradex SMARC iMX95, Skov Rev.C
  HDMI, i.MX 95 Verdin Evaluation KitPHYTEC phyBOARD-Segin-i.MX91 board,
  Skov i.MX8MP variant
- A series from Alexander Stein to clean up and improve imx95-tqma9596sa
  board support
- Add MicIn routing support for mba8mx boards
- A couple of patch sets from Frank Li to clean up dt-schema warnings
  and add more device support for imx8dxl and imx8qxp boards
- A series from Ioana Ciornei to add FPGA based GPIO controller and SFP+
  cages for layerscape boards
- A change from Jan Petrous to add GMAC Ethernet for S32G2 EVB, RDB2 and
  S32G3 RDB3 boards
- A series from Markus Niebel to improve imx95-tqma9596sa board support
- A couple of changes from Martin Kepplinger-Novaković to enable cpuidle
  cooling device support for imx8mp
- A series from Max Krummenacher to clean up todo and add thermal
  support for imx8-apalis board
- A series from Primoz Fiser to add USB vbus regulators, jtag and
  pwm-fan overlay for imx93-phyboard
- A couple of series from Richard Zhu to add supports-clkreq property
  and vpcie3v3aux regulator for PCIe M.2 device
- A series from Stefano Radaelli to add WiFi, BT, PMIC, WM8904 audio,
  and ADS7846 touchscreen support for imx93-var-som
- A series from Tim Harvey to make some cleanups for imx8mm-venice
  boards
- A change from Xu Yang to add DDR Perf Monitor support for i.MX94
- Other small and random changes

* tag 'imx-dt64-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (122 commits)
  arm64: dts: freescale: add Toradex SMARC iMX95
  arm64: dts: freescale: tqma9352: Add vcc-supply for spi-nor
  arm64: dts: mb-smarc-2: Add MicIn routing
  arm64: dts: mba8xx: Add MicIn routing
  arm64: dts: mba8mx: Add MicIn routing
  arm64: dts: imx8mp: make 'dsp' node depend on 'aips5'
  arm64: dts: imx8mp: convert 'aips5' to 'aipstz5'
  arm64: dts: imx8mp-skov: add Rev.C HDMI support
  arm64: dts: imx8mp: Add missing LED enumerators for DH electronics i.MX8M Plus DHCOM on PDK2
  arm64: dts: freescale: Add GMAC Ethernet for S32G2 EVB and RDB2 and S32G3 RDB3
  arm64: dts: imx8qm-apalis: add pwm used by the backlight
  arm64: dts: imx95-tqma9596sa-mb-smarc-2: add aliases for SPI
  arm64: dts: imx95-tqma9596sa-mb-smarc-2: remove superfluous line
  arm64: dts: imx95-tqma9596sa-mb-smarc-2: mark LPUART1 as reserved
  arm64: dts: imx95-tqma9596sa-mb-smarc-2: Add MicIn routing
  arm64: dts: imx95-tqma9596sa: add EEPROM pagesize
  arm64: dts: imx95-tqma9596sa: whitespace fixes
  arm64: dts: imx95-tqma9596sa: add gpio bus recovery for i2c
  arm64: dts: imx95-tqma9596sa: remove superfluous pinmux for usdhci
  arm64: dts: imx95-tqma9596sa: remove superfluous pinmux for i2c
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 17:10:42 +01:00
Arnd Bergmann
e5060380e0 Merge tag 'imx-dt-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/dt
i.MX ARM device tree changes for 6.19:

- A bunch of dt-schema warning cleanup patches from Frank Li
- A couple of imx6dl-yapp4 board update from Michal Vokáč to enable
  pwm-beeper and model the RGB LED as a single multi-led part
- Enable PMIC RTC on imx53-qsrb board
- Correct rtc compatible for imx6q-evi board
- Add sy7636 support for e70k02 board
- Replace license text comment with SPDX identifier for imx53-usbarmory
  board
- Add I2S audio support for imx28-amarula-rmm board

* tag 'imx-dt-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (29 commits)
  ARM: dts: imx6qdl: make VAR-SOM SoM SoC-agnostic
  ARM: dts: imx6dl-yapp4: Model the RGB LED as a single multi-led part
  ARM: dts: imx6dl-yapp43: Enable pwm-beeper on boards with speaker
  ARM: dts: imx: e70k02: add sy7636
  ARM: dts: imx28-amarula-rmm: add I2S audio
  ARM: dts: imx: add vdd-supply and vddio-supply for fsl,mpl3115
  ARM: dts: imx7ulp: remove bias-pull-up
  ARM: dts: remove undocumented clock-names for ov5642
  ARM: dts: add device_type for memory node
  ARM: dts: Add bus type for parallel ov5640
  ARM: dts: imx6q-cm-fx6.dts: add supplies for wm8731
  ARM: dts: imx6qdl-skov-cpu fix typo interrupt
  ARM: dts: imx: remove redundant linux,phandle
  ARM: dts: imx6ull-dhcom-pdk2: rename power-supply to vcc-supply for touchscreen
  ARM: dts: imx: add power-supply for lcd panel
  ARM: dts: imx6qdl-nitrogen6_max: rename i2c<n>mux to i2c
  ARM: dts: imx6ull-phytec-tauri: remove extra space before jedec,spi-nor
  ARM: dts: imx6q-utilite-pro: add missing required property for pci
  ARM: dts: imx6-tbs2910: rename ir_recv to ir-receiver
  ARM: dts: imx6: remove pinctrl-name if pinctrl-0 doesn't exist
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 17:04:25 +01:00
Arnd Bergmann
e828dff381 Merge tag 'sunxi-dt-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/dt
Allwinner device tree changes for 6.19

The A523 family gains support for I2S and SPDIF audio interfaces, as
well as the GMAC200 Ethernet controller.

The H616 gains support for the NAND controller.

* tag 'sunxi-dt-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: dts: allwinner: a523: Add SPDIF TX pin on PB and PI pins
  arm64: dts: allwinner: a523: Add I2S2 pins on PI pin group
  arm64: dts: allwinner: a523: Add device nodes for I2S controllers
  arm64: dts: allwinner: a523: Add device node for SPDIF block
  arm64: dts: allwinner: a523: Add DMA controller device nodes
  dt-bindings: dma: allwinner,sun50i-a64-dma: Add compatibles for A523
  arm64: dts: allwinner: h616: add NAND controller
  arm64: dts: allwinner: t527: orangepi-4a: Enable Ethernet port
  arm64: dts: allwinner: t527: avaota-a1: enable second Ethernet port
  arm64: dts: allwinner: a527: cubie-a5e: Enable second Ethernet port
  arm64: dts: allwinner: a523: Add GMAC200 ethernet controller

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:58:34 +01:00
Arnd Bergmann
252f83d547 Merge tag 'tegra-for-6.19-arm64-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/dt
arm64: tegra: Device tree changes for v6.19-rc1

This contains a bunch of additions and improvements for older devices.
Tegra210 devices now have empty reserved-memory nodes to improve inter-
operability with certain bootloaders. These chips now also support more
multimedia engines. A new variant of the Jetson Nano is also added.

Jetson TX2 sees some improvements. PCI endpoint mode is improved for
Tegra234 so that reset interrupts are properly routed.

A new RTC device is added starting with Orin.

Rounding things off is a flurry of small fixes for DT validation and USB
OTG mode.

* tag 'tegra-for-6.19-arm64-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: (25 commits)
  arm64: tegra: Remove OTG ID GPIO from Jetson TX2 NX
  arm64: tegra: Set USB Micro-B port to OTG mode on P3450
  arm64: tegra: Add NVJPG node for Tegra210 platforms
  arm64: tegra: Add Tegra210 NVJPG power-domain node
  arm64: tegra: Add interrupts for Tegra234 USB wake events
  arm64: tegra: Add reserved-memory node for P2180
  arm64: tegra: Add reserved-memory node for P3450
  arm64: tegra: Enable NVDEC and NVENC on Tegra210
  arm64: tegra: Fix APB DMA controller node name
  arm64: tegra: Add default GIC address cells on Tegra210
  arm64: tegra: Add default GIC address cells on Tegra194
  arm64: tegra: Add default GIC address cells on Tegra186
  arm64: tegra: Add default GIC address cells on Tegra132
  arm64: tegra: Add OPP tables on Tegra210
  arm64: tegra: Add interconnect properties for Tegra210
  arm64: tegra: Add ACTMON on Tegra210
  arm64: tegra: Add device-tree node for NVVRS RTC
  arm64: tegra: Move avdd-dsi-csi-supply into CSI node
  arm64: tegra: Drop redundant clock and reset names from TSEC node
  arm64: tegra: Move HDA into the correct bus
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:54:51 +01:00
Arnd Bergmann
17aaeb67e1 Merge tag 'tegra-for-6.19-arm-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/dt
ARM: tegra: Device tree changes for v6.19-rc1

Add more host1x devices on Tegra114 and Tegra124, as well as CSI for
Tegra20 and Tegra30. Support for the Xiaomi Mi Pad is also added.

* tag 'tegra-for-6.19-arm-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  ARM: tegra: Add device-tree for Xiaomi Mi Pad (A0101)
  ARM: tegra: add CSI nodes for Tegra20 and Tegra30
  ARM: tegra: Add missing HOST1X device nodes on Tegra124
  ARM: tegra: Add missing HOST1X device nodes on Tegra114

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:50:42 +01:00
Arnd Bergmann
5d5b665e3e Merge tag 'tegra-for-6.19-dt-bindings' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/dt
dt-bindings: Changes for v6.19-rc1

Document various new IPs on older chips, as well as some existing
developer kits that were missing compatible strings. Add power domain
IDs on Tegra264 and wake-up support for the XUSB controller on Tegra234.

* tag 'tegra-for-6.19-dt-bindings' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  dt-bindings: usb: Add wake-up support for Tegra234 XUSB host controller
  dt-bindings: devfreq: tegra30-actmon: Add Tegra124 fallback for Tegra210
  dt-bindings: display: tegra: Document Tegra20 and Tegra30 CSI
  dt-bindings: display: tegra: document EPP, ISP, MPE and TSEC for Tegra114+
  dt-bindings: arm: tegra: Document Jetson Nano Devkits
  dt-bindings: power: Add power domain IDs for Tegra264

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:49:09 +01:00
Arnd Bergmann
9481f1ce9b Merge tag 'renesas-dts-for-v6.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/dt
Renesas DTS updates for v6.19 (take two)

  - Add Imagination Technologies PowerVR Series 7XE GE7800 GPU support
    for the R-Car M3-N and V3U SoCs,
  - Add Ethernet support for the RZ/T2H and RZ/N2H SoCs and their
    evaluation boards,
  - Add ADC support for the RZ/N1D SoC,
  - Add thermal, NMI pushbutton, and RTC support for the RZ/V2H SoC and
    the RZ/V2H EVK development board,
  - Add USB2.0 support for the RZ/G3S SoC and the RZ/G3S SMARC Carrier
    II board.

* tag 'renesas-dts-for-v6.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
  arm64: dts: renesas: rzt2h-n2h-evk: Enable Ethernet support
  arm64: dts: renesas: r9a09g057h44-rzv2h-evk: Enable RTC
  arm64: dts: renesas: r9a09g057: Add RTC node
  arm64: dts: renesas: r9a09g057h44-rzv2h-evk: Add NMI pushbutton support
  arm64: dts: renesas: rzg3s-smarc: Enable USB support
  arm64: dts: renesas: r9a08g045: Add USB support
  arm64: dts: renesas: r9a09g057: Add TSU nodes
  ARM: dts: renesas: r9a06g032: Add the ADC device
  arm64: dts: renesas: r9a09g087: Add GMAC nodes
  arm64: dts: renesas: r9a09g077: Add GMAC nodes
  arm64: dts: renesas: r9a09g087: Add ETHSS node
  arm64: dts: renesas: r9a09g077: Add ETHSS node
  arm64: dts: renesas: r8a779a0: Add GE7800 GPU node
  arm64: dts: renesas: r8a77965: Add GE7800 GPU node
  dt-bindings: clock: r8a779a0: Add ZG core clock

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:46:10 +01:00
Arnd Bergmann
441bd15680 Merge tag 'spacemit-dt-for-6.19-1' of https://github.com/spacemit-com/linux into soc/dt
RISC-V SpacemiT DT changes for 6.19

- Add Uart and I2C nodes
- Add P1 PMIC nodes
- Add MusePi Pro board support
- Add OrangePi R2S board support
- Enable eeprom for BPI-F3
- Enable QSPI on BPI-F3
- Enable Ethernet and PDMA on OrangePi RV2

* tag 'spacemit-dt-for-6.19-1' of https://github.com/spacemit-com/linux:
  riscv: dts: spacemit: define all missing I2C controller nodes
  riscv: dts: spacemit: reorder i2c2 node
  riscv: dts: spacemit: Add OrangePi R2S board device tree
  dt-bindings: riscv: spacemit: Add OrangePi R2S board
  riscv: dts: spacemit: enable K1 SoC QSPI on BPI-F3
  riscv: dts: spacemit: add MusePi Pro board device tree
  dt-bindings: riscv: spacemit: add MusePi Pro board
  riscv: dts: spacemit: add Ethernet and PDMA to OrangePi RV2
  riscv: dts: spacemit: add i2c aliases on BPI-F3
  riscv: dts: spacemit: add 24c02 eeprom on BPI-F3
  riscv: dts: spacemit: enable the i2c2 adapter on BPI-F3
  riscv: dts: spacemit: define regulator constraints
  riscv: dts: spacemit: define fixed regulators
  riscv: dts: spacemit: enable the i2c8 adapter
  riscv: dts: spacemit: add UART pinctrl combinations

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:41:52 +01:00
Arnd Bergmann
3350e81c62 Merge tag 'mtk-dts64-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/dt
MediaTek ARM64 Device Tree updates

This adds support for new boards and variants based on different
already supported MediaTek SoCs, and improves support for current
boards.

In particular:
 - New machines:
    - MT7988 BananaPi R4 Pro eMMC and SD router board with support
      for both Key-M and Key-E M.2 slots through DTB Overlays
    - MT8370 Grinn GenioSBC-510 (GenioSOM-510 + GenioBoard Edge AI)
    - MT8390 Grinn GenioSBC-700 (GenioSOM-700 + GenioBoard Edge AI)
 - New variant: MT8395 MediaTek Genio 1200 EVK with UFS

...preparation for new SoCs (MT8196 Kompanio Ultra, a clone of the
   MT6991 Dimensity 9400, and MT6878 Dimensity 7300) with the
   addition of GCE/PIO definitions

...improvements for already supported SoCs and machines:
 - MT7622/7981b/7986a/7988a gain support for reading SoC UUID from
   eFuse, used to generate a persistent MAC address on boards that
   don't have any factory-assigned addresses.
 - MT7986 BananaPi R3 gets changes to its default fan PWM speed to
   improve compatibility with cheaper fans (usually coming with the
   heatsink+fan combos)
 - The MT7981b OpenWRT One router sees general support improvements
   with the enablement of its UART-0 console and correct pinmuxing
   for the same, addition of reserved memory for Trusted Firmware A,
   its SPI NOR Flash (for recovery system, WiFi eeprom data and ETH
   MAC address from factory), and board LEDs.
 - MT8365 gets support for its Mali G52 MC1 GPU, which gets enabled
   in the MediaTek Genio 350 EVK board

...and a dt-bindings warning fix for MT8183 machines through trivial
changes to rename the audiosys and afe nodes to reflect bindings.

* tag 'mtk-dts64-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux: (27 commits)
  arm64: dts: mediatek: mt7981b-openwrt-one: Enable software leds
  arm64: dts: mediatek: mt7981b-openwrt-one: Enable SPI NOR
  arm64: dts: mediatek: mt7988a-bpi-r4pro: Add mmc overlays
  arm64: dts: mediatek: mt7988a-bpi-r4-pro: Add PCIe overlays
  arm64: dts: mediatek: mt7988: Add devicetree for BananaPi R4 Pro
  arm64: dts: mediatek: mt7988: Disable 2.5G phy and enable at board layer
  dt-bindings: arm: mediatek: add BPI-R4 Pro board
  arm64: dts: mediatek: Add GCE header for MT8196
  arm64: dts: mediatek: mt7981b: Add reserved memory for TF-A
  arm64: dts: mediatek: mt7981b: Configure UART0 pinmux
  arm64: dts: mediatek: mt8365-evk: Enable GPU support
  arm64: dts: mediatek: mt8365: Add GPU support
  arm64: dts: mediatek: mt8395-genio-1200-evk: Describe CPU supplies
  arm64: dts: mediatek: Add MT6878 pinmux macro header file
  arm64: dts: mediatek: mt7986-bpi-r3: Change fan PWM value for mid speed
  arm64: dts: mediatek: mt8370-grinn-genio-510-sbc: Add Grinn GenioSBC-510
  arm64: dts: mediatek: mt8390-genio-700-evk: Add Grinn GenioSBC-700
  arm64: dts: mediatek: mt7988a: add 'soc-uuid' cell to efuse
  arm64: dts: mediatek: mt7981b: add 'soc-uuid' cell to efuse
  arm64: dts: mediatek: mt7986a: add 'soc-uuid' cell to efuse
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:39:35 +01:00
Arnd Bergmann
c4aee1ad41 Merge tag 'mtk-dts32-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/dt
MediaTek ARM32 Device Tree updates

This performs a cleanup of the MT6582 devicetrees and adds support
for secondary cores bringup on this SoC.

This also introduces basic support for a new machine, the MT6582
Alcatel "yarisxl" Pop C7 (OT-7041D) smartphone, with support for
booting into a initramfs with UART console output.

* tag 'mtk-dts32-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux:
  ARM: dts: mediatek: drop wrong syscon hifsys compatible for MT2701/7623
  ARM: dts: mediatek: add basic support for Alcatel yarisxl board
  dt-bindings: arm: mediatek: Add MT6582 yarisxl
  ARM: dts: mediatek: mt6582: add enable-method property to cpus
  ARM: dts: mediatek: mt6582: add clock-names property to uart nodes
  ARM: dts: mediatek: mt6582: add mt6582 compatible to timer
  ARM: dts: mediatek: mt6582: remove compatible property from root node
  ARM: dts: mediatek: mt6582: sort nodes and properties
  ARM: dts: mediatek: mt6582: move MMIO devices under soc node

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:38:24 +01:00
Arnd Bergmann
ab07edaab6 Merge tag 'samsung-dt64-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/dt
Samsung DTS ARM64 changes for v6.19

1. ExynosAutov920:
 - Add more clock controller nodes.

2. Google GS101:
 - PMIC clock
 - Mark ACPM (Alive Clock and Power Manager) firmware node as clock
   provider and use its clocks.  Add also Devicetree binding headers
   with clock its clock indices used in DTS (kept as separate branch).
 - Add more SYSREG (syscon) regions.
 - Correct several blocks address space sizes and APM SYSREG's starting
   address.

3. Exynos7870:
 - Enable display over DSI and several display planels.
 - Few cleanups.

* tag 'samsung-dt64-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  arm64: dts: exynos7870-j6lte: enable display panel support
  arm64: dts: exynos7870-a2corelte: enable display panel support
  arm64: dts: exynos7870-on7xelte: enable display panel support
  arm64: dts: exynos7870: add DSI support
  arm64: dts: exynos: gs101: fix sysreg_apm reg property
  arm64: dts: exynos: gs101: fix clock module unit reg sizes
  arm64: dts: exynos: gs101: add sysreg_misc and sysreg_hsi0 nodes
  arm64: dts: exynos: gs101: add OPPs
  arm64: dts: exynos: gs101: add CPU clocks
  arm64: dts: exynos: gs101: add #clock-cells to the ACPM protocol node
  dt-bindings: firmware: google,gs101-acpm-ipc: add ACPM clocks
  arm64: dts: exynos: gs101-pixel-common: add node for s2mpg10 / clock
  arm64: dts: exynos990: Add sysreg nodes for PERIC0 and PERIC1
  arm64: dts: exynosautov920: add CMU_MFC clock DT nodes
  arm64: dts: exynosautov920: add CMU_M2M clock DT nodes
  arm64: dts: exynos7870-on7xelte: add bus-width to mmc0 node
  arm64: dts: exynos7870-j6lte: add bus-width to mmc0 node
  arm64: dts: exynos7870-a2corelte: add bus-width to mmc0 node

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:36:49 +01:00
Arnd Bergmann
7e90eede6b Merge tag 'thead-dt-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux into soc/dt
T-HEAD Devicetrees for v6.19

Add PWM controlled fan and it's associated thermal management for the
Lichee Pi 4A board.

Enable additional ISA extenstions supported by the T-Head C910 cores:
Zfh, Ziccrse, XTheadvector.

Add reset controllers of more TH1520 subsystems: AP, AO, DSP, MISC, VI.

Signed-off-by: Drew Fustini <fustini@kernel.org>

* tag 'thead-dt-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux:
  riscv: dts: thead: Add reset controllers of more subsystems for TH1520
  riscv: dts: thead: Add PWM fan and thermal control
  riscv: dts: thead: Add PWM controller node
  riscv: dts: thead: add zfh for th1520
  riscv: dts: thead: add ziccrse for th1520
  riscv: dts: thead: add xtheadvector to the th1520 devicetree

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:35:58 +01:00
Arnd Bergmann
156c42c7fc Merge tag 'renesas-dts-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/dt
Renesas DTS updates for v6.19

  - Add thermal support for the RZ/G3S and RZ/G3E SoCs,
  - Add DT overlay support for the Raspberry Pi Display 2 and Argon40
    fan hats on the Retronix Sparrow Hawk board,
  - Add eMMC support for the Eagle Function expansion board,
  - Add initial support for the R-Car X5H (R8A78000) SoC and the
    Ironhide development board,
  - Move interrupt-parent properties to root nodes,
  - Add system watchdog timer support for R-Car Gen3 and Gen4 SoCs,
    which is reserved for secure firmware on R-Car Gen3 boards,
  - Add ADC support for the RZ/T2H and RZ/N2H SoCs and their evaluation
    boards,
  - Add watchdog timer support on the R-Car V3M Starter Kit board,
  - Add Cortex-A55 PMU support on the RZ/V2H, RZ/V2N, RZ/T2H, and RZ/N2H
    SoCs,
  - Add Imagination Technologies PowerVR Series 6XT GX6250 GPU support
    on the R-Car M3-W and M3-W+ SoCs,
  - Miscellaneous fixes and improvements.

* tag 'renesas-dts-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel: (80 commits)
  arm64: dts: renesas: sparrow-hawk: Fix full-size DP connector node name and labels
  arm64: dts: renesas: r8a77961: Add GX6250 GPU node
  arm64: dts: renesas: r8a77960: Add GX6250 GPU node
  ARM: dts: renesas: kzm9g: Name interrupts for accelerometer
  arm64: dts: renesas: r9a09g087: Add Cortex-A55 PMU node
  arm64: dts: renesas: r9a09g077: Add Cortex-A55 PMU node
  arm64: dts: renesas: r9a09g056: Add Cortex-A55 PMU node
  arm64: dts: renesas: r9a09g057: Add Cortex-A55 PMU node
  ARM: dts: renesas: r9a06g032-rzn1d400-db: Drop invalid #cells properties
  arm64: dts: renesas: v3msk: Enable watchdog timer
  arm64: dts: renesas: r8a779h0: Add SWDT node
  arm64: dts: renesas: r8a779g0: Add SWDT node
  arm64: dts: renesas: r8a779f0: Add SWDT node
  arm64: dts: renesas: r8a779a0: Add SWDT node
  arm64: dts: renesas: rzt2h/rzn2h-evk: Enable ADCs
  arm64: dts: renesas: r9a09g087: Add ADCs support
  arm64: dts: renesas: r9a09g077: Add ADCs support
  ARM: dts: renesas: koelsch: Update ADV7180 binding
  ARM: dts: renesas: r9a06g032: Move interrupt-parent to root node
  ARM: dts: renesas: r8a7794: Move interrupt-parent to root node
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:30:48 +01:00
Arnd Bergmann
7ab4745858 Merge tag 'aspeed-6.19-devicetree-0' of https://git.kernel.org/pub/scm/linux/kernel/git/bmc/linux into soc/dt
First batch of ASPEED Arm devicetree changes for 6.19

Significant changes:

- The IBM Power11 FSI DTSIs have been rearranged to accommodate new systems

New platforms:

- IBM Balcones

  The Balcones system is similar to Bonnell but with a POWER11 processor.
  Like POWER10, the POWER11 is a dual-chip module, so a dual chip FSI
  tree is needed.

- Meta Yosemite5

  The Yosemite5 platform provides monitoring of voltages, power,
  temperatures, and other critical parameters across the motherboard,
  CXL board, E1.S expansion board, and NIC components.

Updated platforms:

- clemente (Meta): LEDs, shunt resistor configuration
- santabarbara (Meta): AMD APML, EEPROMs, LEDs, GPIO line names, MCTP for NICs

There are a scattering of one-off changes and devicetree cleanups for other
platforms as well.

* tag 'aspeed-6.19-devicetree-0' of https://git.kernel.org/pub/scm/linux/kernel/git/bmc/linux:
  ARM: dts: aspeed: santabarbara: Add eeprom device node for PRoT module
  ARM: dts: aspeed: santabarbara: Add AMD APML interface support
  ARM: dts: aspeed: santabarbara: Add gpio line name
  ARM: dts: aspeed: santabarbara: Add bmc_ready_noled Led
  ARM: dts: aspeed: santabarbara: Enable MCTP for frontend NIC
  ARM: dts: aspeed: santabarbara: Add sensor support for extension boards
  ARM: dts: aspeed: santabarbara: Add blank lines between nodes for readability
  ARM: dts: aspeed: yosemite5: Add Meta Yosemite5 BMC
  dt-bindings: arm: aspeed: add Meta Yosemite5 board
  ARM: dts: aspeed: clemente: Add HDD LED GPIO
  ARM: dts: aspeed: Fix max31785 fan properties
  ARM: dts: aspeed: Add Balcones system
  dt-bindings: arm: aspeed: add IBM Bonnell board
  dt-bindings: arm: aspeed: add IBM Balcones board
  ARM: dts: aspeed: harma: Add MCTP I2C controller node
  ARM: dts: aspeed: yosemite4: allocate ramoops for kernel panic
  ARM: dts: aspeed: clemente: add shunt-resistor-micro-ohms for LM5066i

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:28:43 +01:00
Arnd Bergmann
6ef9a6fca7 Merge tag 'pxa1908-dt-for-6.19' of https://gitlab.com/pxa1908-mainline/linux into soc/dt
PXA1908 DT changes for 6.19

Rollup of hardware support which has accumulated since support for the
SoC and coreprimevelte board was merged. This most notably includes
eMMC, PMIC, backlight and touchscreen. A few QoL fixes are also
included.

* tag 'pxa1908-dt-for-6.19' of https://gitlab.com/pxa1908-mainline/linux:
  arm64: dts: marvell: pxa1908: Add power domains
  arm64: dts: marvell: samsung,coreprimevelte: Add USB connector
  arm64: dts: marvell: samsung,coreprimevelte: Fill in memory node
  arm64: dts: marvell: samsung,coreprimevelte: Drop some reserved memory
  arm64: dts: marvell: pxa1908: Move ramoops to SoC dtsi
  arm64: dts: marvell: samsung,coreprimevelte: Add vibrator
  arm64: dts: marvell: pxa1908: Add PWMs
  arm64: dts: marvell: samsung,coreprimevelte: Enable eMMC
  arm64: dts: marvell: samsung,coreprimevelte: Correct CD GPIO
  arm64: dts: marvell: samsung,coreprimevelte: Add backlight
  arm64: dts: samsung,coreprimevelte: add SDIO
  arm64: dts: samsung,coreprimevelte: add touchscreen
  arm64: dts: samsung,coreprimevelte: add PMIC

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-21 16:26:26 +01:00
Arnd Bergmann
2bab99283d Merge tag 'tenstorrent-dt-for-v6.19' of https://github.com/tenstorrent/linux into soc/newsoc
Tenstorrent device tree for v6.19

Add Tenstorrent as a vendor and enable support for the Blackhole SoC
in Blackhole P100 and P150 PCIe cards. The SoC contains four RISC-V
CPU tiles consisting of 4x SiFive X280 cores.

There is a virtual UART implemented in OpenSBI firmware that allows a
console program on the PCIe host to communicate through shared memory
with Linux running on the Blackhole card.

Link: https://github.com/tenstorrent/tt-bh-linux
Link: https://github.com/tenstorrent/opensbi/
Signed-off-by: Drew Fustini <fustini@kernel.org>

* tag 'tenstorrent-dt-for-v6.19' of https://github.com/tenstorrent/linux:
  riscv: defconfig: Enable Tenstorrent SoCs
  riscv: Kconfig.socs: Add ARCH_TENSTORRENT for Tenstorrent SoCs
  riscv: dts: Add Tenstorrent Blackhole SoC PCIe cards
  dt-bindings: interrupt-controller: Add Tenstorrent Blackhole compatible
  dt-bindings: timers: Add Tenstorrent Blackhole compatible
  dt-bindings: riscv: cpus: Add SiFive X280 compatible
  dt-bindings: riscv: Add Tenstorrent Blackhole compatible
  dt-bindings: vendor-prefixes: Add Tenstorrent AI ULC
2025-11-21 16:22:33 +01:00
Eric Farman
44acac00be KVM: s390: vsie: Check alignment of BSCA header
The VSIE code currently checks that the BSCA struct fits within
a page, and returns a validity exception 0x003b if it doesn't.
The BSCA is pinned in memory rather than shadowed (see block
comment at end of kvm_s390_cpu_feat_init()), so enforcing the
CPU entries to be on the same pinned page makes sense.

Except those entries aren't going to be used below the guest,
and according to the definition of that validity exception only
the header of the BSCA (everything but the CPU entries) needs to
be within a page. Adjust the alignment check to account for that.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-11-21 10:33:57 +01:00
Janosch Frank
8e8678e740 KVM: s390: Add capability that forwards operation exceptions
Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation
exceptions to user space. This also includes the 0x0000 instructions
managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants
to emulate instructions which do not (yet) have an opcode.

While we're at it refine the documentation for
KVM_CAP_S390_USER_INSTR0.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-11-21 10:26:03 +01:00
Vignesh Raghavendra
56baa91909 arm64: dts: ti: k3-am62l: Fix unit address of cbass_wakeup
Fix the following warning with W=1:
arch/arm64/boot/dts/ti/k3-am62l.dtsi:101.30-112.5: Warning (simple_bus_reg): /bus@f0000/bus@43000000: simple-bus unit address format error, expected "a80000"

While at that, also remove extra space b/w label and node name.

Fixes: 5f016758b0 ("arm64: dts: ti: k3-am62l: add initial infrastructure")
Link: https://patch.msgid.link/20251120143419.223238-1-vigneshr@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-21 09:25:42 +05:30
Siddharth Vadapalli
51f89c488f arm64: dts: ti: k3-j721e-sk: Fix pinmux for pin Y1 used by power regulator
The SoC pin Y1 is incorrectly defined in the WKUP Pinmux device-tree node
(pinctrl@4301c000) leading to the following silent failure:

    pinctrl-single 4301c000.pinctrl: mux offset out of range: 0x1dc (0x178)

According to the datasheet for the J721E SoC [0], the pin Y1 belongs to the
MAIN Pinmux device-tree node (pinctrl@11c000). This is confirmed by the
address of the pinmux register for it on page 142 of the datasheet which is
0x00011C1DC.

Hence fix it.

[0]: https://www.ti.com/lit/ds/symlink/tda4vm.pdf

Fixes: 97b67cc102 ("arm64: dts: ti: k3-j721e-sk: Add DT nodes for power regulators")
Cc: stable@vger.kernel.org
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Yemike Abhilash Chandra <y-abhilashchandra@ti.com>
Link: https://patch.msgid.link/20251119160148.2752616-1-s-vadapalli@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-21 09:25:42 +05:30
Rob Herring (Arm)
56ca5a3030 arm64: dts: ti: Add missing applied DT overlay targets
It's a requirement that DT overlays be applied at build time in order to
validate them as overlays are not validated on their own.

Add the missing TI overlays. Some of the TI overlays have the first part
needed (a "*-dtbs" variable), but not the second part adding the target to
dtb-y/dtb- variable.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
[vigneshr@ti.com: create new target for J721e GESI EVM]
Link: https://patch.msgid.link/20251120141936.190796-1-vigneshr@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-21 09:25:24 +05:30
Jim Mattson
6a8818de21 KVM: selftests: Add a VMX test for LA57 nested state
Add a selftest that verifies KVM's ability to save and restore
nested state when the L1 guest is using 5-level paging and the L2
guest is using 4-level paging. Specifically, canonicality tests of
the VMCS12 host-state fields should accept 57-bit virtual addresses.

Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20251028225827.2269128-5-jmattson@google.com
[sean: rename to vmx_nested_la57_state_test to prep nested_<test> namespace]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:21:52 -08:00
Jim Mattson
ec5806639e KVM: selftests: Change VM_MODE_PXXV48_4K to VM_MODE_PXXVYY_4K
Use 57-bit addresses with 5-level paging on hardware that supports
LA57. Continue to use 48-bit addresses with 4-level paging on hardware
that doesn't support LA57.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20251028225827.2269128-4-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:59 -08:00
Jim Mattson
2103a8baf5 KVM: selftests: Use a loop to walk guest page tables
Walk the guest page tables via a loop when searching for a PTE,
instead of using unique variables for each level of the page tables.

This simplifies the code and makes it easier to support 5-level paging
in the future.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251028225827.2269128-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:59 -08:00
Jim Mattson
ae5b498b8d KVM: selftests: Use a loop to create guest page tables
Walk the guest page tables via a loop when creating new mappings,
instead of using unique variables for each level of the page tables.

This simplifies the code and makes it easier to support 5-level paging
in the future.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251028225827.2269128-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:59 -08:00
Yosry Ahmed
ff736dba47 KVM: selftests: Remove the unused argument to prepare_eptp()
eptp_memslot is unused, remove it. No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-10-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:57 -08:00
Yosry Ahmed
28b2dced8b KVM: selftests: Stop hardcoding PAGE_SIZE in x86 selftests
Use PAGE_SIZE instead of 4096.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-9-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:56 -08:00
Yosry Ahmed
3c40777f0e KVM: selftests: Extend vmx_tsc_adjust_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Reviewed-by: Jim Mattson <jmattson@google.com>

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-8-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:56 -08:00
Yosry Ahmed
91423b041d KVM: selftests: Extend nested_invalid_cr3_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-7-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:55 -08:00
Yosry Ahmed
4d256d00e4 KVM: selftests: Move nested invalid CR3 check to its own test
vmx_tsc_adjust_test currently verifies that a nested VMLAUNCH fails with
an invalid CR3. This is irrelevant to TSC scaling, move it to a
standalone test.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-6-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:54 -08:00
Yosry Ahmed
e6bcdd2122 KVM: selftests: Extend vmx_nested_tsc_scaling_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-5-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:54 -08:00
Yosry Ahmed
0a9eb2afa1 KVM: selftests: Extend vmx_close_while_nested_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-4-yosry.ahmed@linux.dev
[sean: rename to "nested_close_kvm_test" to provide nested_* sorting]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:53 -08:00
FUKAUMI Naoki
6ed597fdbc arm64: dts: rockchip: Use default-state for power LED for Radxa boards
Currently, on Radxa boards, the power LED is turned on immediately
after power-up, independent of software control. The heartbeat LED and
other available LEDs are subsequently turned on by the initial
software, such as U-Boot, to indicate software is running.

However, the device tree description for this behavior is inconsistent
and fragmented, with definitions split between the main Linux DTS
files and separate U-Boot files (u-boot/arch/arm/dts/*-u-boot.dtsi).

This patch addresses the inconsistency for the power LED by using
default-state = "on" instead of linux,default-trigger = "default-on".

Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Reviewed-by: Dragan Simic <dsimic@manjaro.org>
Link: https://patch.msgid.link/20251113124222.4691-2-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:15:28 +01:00
Coia Prant
c8ecda015e arm64: dts: rockchip: fix PCIe 3.3V regulator voltage on 9Tripod X3568 v4
The regulator type is "regulator-fixed" since its voltage always be
3.3v, min and max should be 3300000 make the regulator has a voltage

The regulator is supplied by dc_12v, so add the vin-supply.

Link: 17e9559f67
Signed-off-by: Coia Prant <coiaprant@gmail.com>
Link: https://patch.msgid.link/20251114092114.62664-2-coiaprant@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:11:07 +01:00
Heiko Stuebner
f54b09d46f arm64: dts: rockchip: Add power-domain to RK3368 VOP controller
The VOP is also part of the VIO power-domain and it definitely needs to
be on when accessing it to not cause SError faults, so add the
power-domains property to it.

Fixes: ef06b5ddee ("arm64: dts: rockchip: Add display subsystem for RK3368")
Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de>
Link: https://patch.msgid.link/20251021074254.87065-6-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:06:49 +01:00
Heiko Stuebner
8319be65e8 arm64: dts: rockchip: Add power-domain to RK3368 DSI controller
The DSI controller is also part of the VIO power-domain and it
definitely needs to be on when accessing it to not cause SError
faults, so add the power-domains property to it.

Fixes: 5023d0cd61 ("arm64: dts: rockchip: Add DSI for RK3368")
Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de>
Link: https://patch.msgid.link/20251021074254.87065-5-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:06:49 +01:00
Chris Morgan
932feac9d8 arm64: dts: rockchip: Add host wake pin for wifi on Indiedroid Nova
Add the pin definition for the host wake interrupt on the Indiedroid
Nova. This necessitates adding a node for the wifi controller to
properly define the interrupt. Additionally, we can consolidate both
pinctrl definitions under a wifi node to note their common functionality.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251118223048.4531-5-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:04:19 +01:00
Chris Morgan
6f563ebf0b arm64: dts: rockchip: Correct pinctrl for pcie for Indiedroid Nova
Correct the pin definitions of the PCIE controller on the Indiedroid
Nova according to the schematics. Since GPIO3 D1 is already defined
as a reset pin in the rk3588-base-pinctrl.dtsi file we do not need
a custom definition anymore.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251118223048.4531-4-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:04:19 +01:00
Chris Morgan
835c909e85 arm64: dts: rockchip: Define regulator for pcie2x1l2 on Indiedroid Nova
Add the correct regulator defined per the schematics to the PCIE
interface for the Indiedroid Nova.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251118223048.4531-3-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:04:13 +01:00
Chris Morgan
e28f44013d arm64: dts: rockchip: Add clk32k_in for Indiedroid Nova
The clk32k_in pin of the SoC is connected to the real time clock
according to the schematics. Set the pin definition on the real
time clock to reflect this.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251118223048.4531-2-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:03:11 +01:00
Michael Opdenacker
9f80b39523 arm64: dts: rockchip: Add Asus Tinker Board 3 and 3S device tree
Add initial device tree support for Asus Tinker Board 3 [1] and 3S [2],
which are SBCs based on the Rockchip 3566 SoC.

The "3S" version ("S" for "storage") just adds a 16 GB eMMC
and a "mask ROM" DIP switch (to mask the eMMC and enter "Mask ROM"
mode for recovery) to the "3" version.

This adds support for:
- Debug UART (/dev/ttyS2)
- SD card (/dev/mmcblk1)
- eMMC (/dev/mmcblk0, only on Tinker Board 3S)
- I2C:
  - i2c0 (internal bus with a PMIC and regulators)
  - i2c2 (internal bus with an at24 eeprom and an RTC device)
- USB 2.0 ports
- 2 GPIO LEDS

[1] https://tinker-board.asus.com/series/tinker-board-3.html
[2] https://tinker-board.asus.com/series/tinker-board-3s.html

Signed-off-by: Michael Opdenacker <michael.opdenacker@rootcommit.com>
Link: https://patch.msgid.link/20251118-tinker3-v3-2-2903693f2ebb@rootcommit.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 23:02:01 +01:00
Michael Opdenacker
f4e81d0b95 dt-bindings: arm: rockchip: Add Asus Tinker Board 3/3S
Document the compatible strings for Asus Tinker Board 3 [1] and 3S [2],
which are SBCs based on the Rockchip 3566 SoC.

The "3S" version ("S" for "storage") just adds a 16 GB eMMC
and a "mask ROM" DIP switch to the "3" version.

[1] https://tinker-board.asus.com/series/tinker-board-3.html
[2] https://tinker-board.asus.com/series/tinker-board-3s.html

Signed-off-by: Michael Opdenacker <michael.opdenacker@rootcommit.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://patch.msgid.link/20251118-tinker3-v3-1-2903693f2ebb@rootcommit.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 22:51:15 +01:00
Lorenzo Stoakes
c7ba92bcfe testing/selftests/mm: add soft-dirty merge self-test
Assert that we correctly merge VMAs containing VM_SOFTDIRTY flags now that
we correctly handle these as sticky.

In order to do so, we have to account for the fact the pagemap interface
checks soft dirty PTEs and additionally that newly merged VMAs are marked
VM_SOFTDIRTY.

We do this by using use unfaulted anon VMAs, establishing one and clearing
references on that one, before establishing another and merging the two
before checking that soft-dirty is propagated as expected.

We check that this functions correctly with mremap() and mprotect() as
sample cases, because VMA merge of adjacent newly mapped VMAs will
automatically be made soft-dirty due to existing logic which does so.

We are therefore exercising other means of merging VMAs.

Link: https://lkml.kernel.org/r/d5a0f735783fb4f30a604f570ede02ccc5e29be9.1763399675.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
Lorenzo Stoakes
6707915e03 mm: propagate VM_SOFTDIRTY on merge
Patch series "make VM_SOFTDIRTY a sticky VMA flag", v2.

Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by
establishing a new VMA, or via merge) as implemented in __mmap_complete()
and do_brk_flags().

However, when performing a merge of existing mappings such as when
performing mprotect(), we may lose the VM_SOFTDIRTY flag.

Now we have the concept of making VMA flags 'sticky', that is that they
both don't prevent merge and, importantly, are propagated to merged VMAs,
this seems a sensible alternative to the existing special-casing of
VM_SOFTDIRTY.

We additionally add a self-test that demonstrates that this logic behaves
as expected.


This patch (of 2):

Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by
establishing a new VMA, or via merge) as implemented in __mmap_complete()
and do_brk_flags().

However, when performing a merge of existing mappings such as when
performing mprotect(), we may lose the VM_SOFTDIRTY flag.

This is because currently we simply ignore VM_SOFTDIRTY for the purposes
of merge, so one VMA may possess the flag and another not, and whichever
happens to be the target VMA will be the one upon which the merge is
performed which may or may not have VM_SOFTDIRTY set.

Now we have the concept of 'sticky' VMA flags, let's make VM_SOFTDIRTY one
which solves this issue.

Additionally update VMA userland tests to propagate changes.

[akpm@linux-foundation.org: update comments, per Lorenzo]
  Link: https://lkml.kernel.org/r/0019e0b8-ee1e-4359-b5ee-94225cbe5588@lucifer.local
Link: https://lkml.kernel.org/r/cover.1763399675.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/955478b5170715c895d1ef3b7f68e0cd77f76868.1763399675.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Andrey Vagin <avagin@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
6e57c1ce81 Docs/mm/damon/maintainer-profile: fix grammatical errors
Fix a few grammatical errors on DAMON maintainer-profile.

Link: https://lkml.kernel.org/r/20251112154114.66053-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
7ad58e009d Docs/mm/damon/maintainer-profile: fix a typo on mm-untable link
Commit 0b473f9e6e ("Docs/mm/damon/maintainer-profile: update for mm-new
tree") mistakenly forgot putting a space between a link and the next word.
Fix it.

Link: https://lkml.kernel.org/r/20251112154114.66053-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
675774adbe selftests/damon/sysfs.py: merge DAMON status dumping into commitment assertion
For each test case, sysfs.py makes changes to DAMON, dumps DAMON internal
status and asserts the expectation is met.  The dumping part should be the
same for all cases, so it is duplicated for each test case.  Which means
it is easy to make mistakes.  Actually a few of those duplicates are not
turning DAMON off in case of the dumping failure.  It makes following
selftests that need to turn DAMON on fails with -EBUSY.  Merge the status
dumping into commitment assertion with proper dumping failure handling, to
deduplicate and avoid the unnecessary following tests failures.

Link: https://lkml.kernel.org/r/20251112154114.66053-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
10e8c7ba64 mm/damon/tests/core-kunit: remove DAMON_MIN_REGION redefinition
A few DAMON core functions including damon_set_regions() were hard-coded
to use DAMON_MIN_REGION as their regions management granularity.  For
simple and human-readable unit tests' expectations, DAMON core layer kunit
test re-defines DAMON_MIN_REGION to '1'.

A previous patch series [1] has removed the hard-coded part but kept the
redefinition and updated related function calls to explicitly use
DAMON_MIN_REGION.  Remove the unnecessary redefinition and update relevant
function calls to pass literals (number '1') instead of the
DAMON_MIN_REGION.

Link: https://lkml.kernel.org/r/20251112154114.66053-7-sj@kernel.org
Link: https://lore.kernel.org/20250828171242.59810-1-sj@kernel.org [1]
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
09efc56a3b mm/damon/vaddr: consistently use only pmd_entry for damos_migrate
For page table walks, it is usual [1] to have only one pmd entry function.
The vaddr.c code for DAMOS_MIGRATE_{HOT,COLD} is not following the
pattern.  Instead, it uses both pmd and pte entry functions without a
special reason.  Refactor it to use only the pmd entry function, to make
the code under mm/ more consistent.

Link: https://lkml.kernel.org/r/20251112154114.66053-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Suggested-by: David Hildenbrand <david@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
f0eb046cd3 mm/damon/vaddr: use vm_normal_folio{,_pmd}() instead of damon_get_folio()
A few page table walk entry callback functions in vaddr.c uses
damon_get_folio() with p{te,md}_pfn() to get the folio, and then
put_folio().  Simplify and drop unnecessary folio get/put by using
vm_normal_folio() and its friends instead.

Note that this cleanup was suggested by David Hildenbrand during a review
of another patch series [1] and the patch was updated following the
suggestion.  This patch further applies the cleanup to DAMON code that
merged before the patch.

Link: https://lkml.kernel.org/r/20251112154114.66053-5-sj@kernel.org
Link: https://lore.kernel.org/0cb3d5a5-683b-4dba-90a8-b45ab83eec53@redhat.com [1]
Signed-off-by: SeongJae Park <sj@kernel.org>
Suggested-by: David Hildenbrand <david@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
96549d56b8 mm/damon/vaddr: cleanup using pmd_trans_huge_lock()
Three pmd walk functions in vaddr.c are using pmd_trans_huge() and
pmd_lock() to handle THPs.  Simplify the code by replacing the two
function calls with a single pmd_trans_huge_lock() call.

Note that this cleanup is not only reducing the lines of code, but also
simplifies code execution flows for migration entries case, as kindly
explained [1] by Hugh, who suggested this cleanup.

[sj@kernel.org: provide lvalue to pmd_present()]
  Link: https://lkml.kernel.org/r/20251117154415.11041-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251112154114.66053-4-sj@kernel.org
Link: https://lore.kernel.org/296c2b3f-6748-158f-b85d-2952165c0588@google.com [1]
Signed-off-by: SeongJae Park <sj@kernel.org>
Suggested-by: Hugh Dickins <hughd@google.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
53298afe45 mm/damon: rename damos->filters to damos->core_filters
DAMOS filters that are handled by the ops layer are linked to
damos->ops_filters.  Owing to the ops_ prefix on the name, it is easy to
understand it is for ops layer handled filters.  The other types of
filters, which are handled by the core layer, are linked to
damos->filters.  Because of the name, it is easy to confuse the list is
there for not only core layer handled ones but all filters.  Avoid such
confusions by renaming the field to core_filters.

Link: https://lkml.kernel.org/r/20251112154114.66053-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
8b02baf373 mm/damon: rename damos core filter helpers to have word core
Patch series "mm/damon: misc cleanups".

Yet another batch of misc cleanups and refactoring for DAMON code, tests,
and documents.

First two patches (1and 2) rename DAMOS core filters related code for
readability.

Three following patches (3-5) refactor page table walk callback functions
in DAMON, as suggested by Hugh and David, and I promised.

Next two patches (6 and 7) refactor DAMON core layer kunit test and sysfs
interface selftest to be simple and deduplicated.

Final two patches (8 and 9) fix up sphinx and grammatical errors on
documents.


This patch (of 9):

DAMOS filters handled by the core layer are called core filters, while
those handled by the ops layer are called ops filters.  They share the
same type but are managed in different places since core filters are
evaluated before the ops filters.  They also have different helper
functions that depend on their managed places.

The helper functions for ops filters have '_ops_' keyword on their name,
so it is easy to know they are for ops filters.  Meanwhile, the helper
functions for core filters are not having the 'core' keyword on their
name.  This makes it easy to be mistakenly used for ops filters.  Actually
there was such a bug.

To avoid future mistakes from similar confusions, rename DAMOS core
filters helper functions to have a keyword 'core' on their names.

Link: https://lkml.kernel.org/r/20251112154114.66053-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251112154114.66053-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
Mehdi Ben Hadj Khelifa
1ec5d5810b selftests/mm/uffd: remove static address usage in shmem_allocate_area()
The current shmem_allocate_area() implementation uses a hardcoded virtual
base address (BASE_PMD_ADDR) as a hint for mmap() when creating
shmem-backed test areas.  This approach is fragile and may fail on systems
with ASLR or different virtual memory layouts, where the chosen address is
unavailable.

Replace the static base address with a dynamically reserved address range
obtained via mmap(NULL, ..., PROT_NONE).  The memfd-backed areas and their
alias are then mapped into that reserved region using MAP_FIXED,
preserving the original layout and aliasing semantics while avoiding
collisions with unrelated mappings.

This change improves robustness and portability of the test suite without
altering its behavior or coverage.

[mehdi.benhadjkhelifa@gmail.com: make cleanup code more clear, per Mike]
  Link: https://lkml.kernel.org/r/20251113142050.108638-1-mehdi.benhadjkhelifa@gmail.com
Link: https://lkml.kernel.org/r/20251111205739.420009-1-mehdi.benhadjkhelifa@gmail.com
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Hunter <david.hunter.linux@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
603f67eb91 mm/damon/tests/core-kunit: add damon_commit_target_regions() test
Add a new test for damon_commit_target_regions().

Link: https://lkml.kernel.org/r/20251111184415.141757-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
299a88f6ec mm/damon/tests/core-kunit: add damos_commit() test
Add a new unit test for damos_commit().

Link: https://lkml.kernel.org/r/20251111184415.141757-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
eec573b8dd mm/damon/tests/core-kunit: add damos_commit_dests() test
Add a new unit test for damos_commit_dests().

Link: https://lkml.kernel.org/r/20251111184415.141757-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
c1cefda776 mm/damon/core: pass migrate_dests to damos_commit_dests()
damos_commit_dests() receives 'struct damos' pointers, while it uses only
their ->migrate_dests fields.  This makes code unnecessarily difficult to
read.  It also makes unit tests writing complicated.  Refactor the
function to receive pointers to the ->migrate_dests fields.

Link: https://lkml.kernel.org/r/20251111184415.141757-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
11bb980d41 mm/damon/tests/core-kunit: add damos_commit_quota() test
Add a new unit test for damos_commit_quota().

Link: https://lkml.kernel.org/r/20251111184415.141757-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
d9adfb8a28 mm/damon/tests/core-kunit: add damos_commit_quota_goals() test
Add a new unit test for damos_commit_quota_goals().

Link: https://lkml.kernel.org/r/20251111184415.141757-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
99f89debaf mm/damon/tests/core-kunit: add damos_commit_quota_goal() test
Add a new unit test for damos_commit_quota_goal().

Link: https://lkml.kernel.org/r/20251111184415.141757-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
3caf767e21 mm/damon/tests/core-kunit: add test cases to damos_test_commit_filter()
damos_test_commit_filter() is covering only a single test case.  Extend it
to cover multiple combinations of inputs.

Link: https://lkml.kernel.org/r/20251111184415.141757-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
SeongJae Park
1b43b7950d mm/damon/tests/core-kunit: extend damos_test_commit_filter_for() for union fields
damos_commit_filter() also updates union fields of 'struct damos_filter'. 
Extend damos_test_commit_filter_for() to cover the expectations of the
union fields.

Link: https://lkml.kernel.org/r/20251111184415.141757-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
SeongJae Park
1968236f75 mm/damon/tests/core-kunit: split out damos_test_commit_filter() core logic
damos_test_commit_filter() is written for only a single test case.  Split
out the core logic of damos_test_commit_filter() as a general one so that
it can be reused for multiple test cases.

Link: https://lkml.kernel.org/r/20251111184415.141757-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
SeongJae Park
37104286f9 mm/damon/tests/core-kunit: remove dynamic allocs on damos_test_commit_filter()
Patch series "mm/damon/tests: add more tests for online parameters commit".

A DAMON feature called parameters "commit" allows DAMON API callers and
ABI users to update nearly every DAMON parameter while DAMON is running. 
This is being used for flexible DAMON use cases such as taking a snapshot
of the monitoring results with minimum overhead, or adjusting access-aware
system operations (DAMOS) for user-space driven auto-tuning or
investigations.

Compared to the usefulness of the feature and size of the implementation,
the test coverage is pretty small.  Only the filter commit part has a
single test case, namely damos_test_commit_filter().  Actually, we found
and fixed a few bugs of the feature in the past.  The single existing test
was also added to avoid reintroduction of a found bug.

Add more unit tests for the feature.

First four patches (1-4) refactor and extend the existing test for DAMOS
filter commit for multiple test cases.

Next three patches (5-7) add tests for DAMOS quota commit.

Next two patches (8 and 9) refactor damos_commit_dests() for ease of code
reading and test writing, and implement a new unit test of the function
that is being refactored in a test-friendly way.

Final two patches (10 and 11) further add new unit tests for
damos_commit() and damon_commit_target_regions().


This patch (of 11):

damos_test_commit_filter() is dynamically allocating test-purpose DAMOS
filters.  Allocation failure checks are making the code longer,
complicated, and difficult to extend for more test cases.  Refactor the
code to remove the dynamic allocation.

Link: https://lkml.kernel.org/r/20251111184415.141757-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251111184415.141757-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Matthew Wilcox (Oracle)
7370f8e1b3 mm: use vma_start_write_killable() in dup_mmap()
Allow waiting for the VMA write lock to be interrupted by fatal signals. 
The explicit check for fatal_signal_pending() can be removed as it is
checked during vma_start_write_killable().  Improves the latency of
killing the task as we do not wait for the reader to finish before
checking for signals.

Link: https://lkml.kernel.org/r/20251110203204.1454057-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Chris Li <chriscli@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Matthew Wilcox (Oracle)
2197bb60f8 mm: add vma_start_write_killable()
Patch series "vma_start_write_killable"", v2.

When we added the VMA lock, we made a major oversight in not adding a
killable variant.  That can run us into trouble where a thread takes the
VMA lock for read (eg handling a page fault) and then goes out to lunch
for an hour (eg doing reclaim).  Another thread tries to modify the VMA,
taking the mmap_lock for write, then attempts to lock the VMA for write. 
That blocks on the first thread, and ensures that every other page fault
now tries to take the mmap_lock for read.  Because everything's in an
uninterruptible sleep, we can't kill the task, which makes me angry.

This patchset just adds vma_start_write_killable() and converts one caller
to use it.  Most users are somewhat tricky to convert, so expect follow-up
individual patches per call-site which need careful analysis to make sure
we've done proper cleanup.


This patch (of 2):

The vma can be held read-locked for a substantial period of time, eg if
memory allocation needs to go into reclaim.  It's useful to be able to
send fatal signals to threads which are waiting for the write lock.

Link: https://lkml.kernel.org/r/20251110203204.1454057-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20251110203204.1454057-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Chris Li <chriscli@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Qi Zheng
3a47e8771c mm: vmstat: correct the comment above preempt_disable_nested()
The comment explaining why these parts use preempt_disable_nested() is in
__mod_zone_page_state(), not in __mod_node_page_state(), so we should see
__mod_zone_page_state(). Just correct it.

Link: https://lkml.kernel.org/r/20251110084437.46701-1-qi.zheng@linux.dev
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Lorenzo Stoakes
2ab7f1bbaf mm/madvise: allow guard page install/remove under VMA lock
We only need to keep the page table stable so we can perform this
operation under the VMA lock.  PTE installation is stabilised via the PTE
lock.

One caveat is that, if we prepare vma->anon_vma we must hold the mmap read
lock.  We can account for this by adapting the VMA locking logic to
explicitly check for this case and prevent a VMA lock from being acquired
should it be the case.

This check is safe, as while we might be raced on anon_vma installation,
this would simply make the check conservative, there's no way for us to
see an anon_vma and then for it to be cleared, as doing so requires the
mmap/VMA write lock.

We abstract the VMA lock validity logic to is_vma_lock_sufficient() for
this purpose, and add prepares_anon_vma() to abstract the anon_vma logic.

In order to do this we need to have a way of installing page tables
explicitly for an identified VMA, so we export walk_page_range_vma() in an
unsafe variant - walk_page_range_vma_unsafe() and use this should the VMA
read lock be taken.

We additionally update the comments in madvise_guard_install() to more
accurately reflect the cases in which the logic may be reattempted,
specifically THP huge pages being present.

Link: https://lkml.kernel.org/r/cca1edbd99cd1386ad20556d08ebdb356c45ef91.1762795245.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Lorenzo Stoakes
f4af67ff4f mm: rename walk_page_range_mm()
Patch series "mm: perform guard region install/remove under VMA lock", v2.

There is no reason why can't perform guard region operations under the VMA
lock, as long we take proper precautions to ensure that we do so in a safe
manner.

This is fine, as VMA lock acquisition is always best-effort, so if we are
unable to do so, we can simply fall back to using the mmap read lock.

Doing so will reduce mmap lock contention for callers performing guard
region operations and help establish a precedent of trying to use the VMA
lock where possible.

As part of this change we perform a trivial rename of page walk functions
which bypass safety checks (i.e.  whether or not mm_walk_ops->install_pte
is specified) in order that we can keep naming consistent with the mm
walk.

This is because we need to expose a VMA-specific walk that still allows us
to install PTE entries.


This patch (of 2):

Make it clear we're referencing an unsafe variant of this function
explicitly.

This is laying the foundation for exposing more such functions and
maintaining a consistent naming scheme.

As a part of this change, rename check_ops_valid() to check_ops_safe() for
consistency.

Link: https://lkml.kernel.org/r/cover.1762795245.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/c684d91464a438d6e31172c9450416a373f10649.1762795245.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Shakeel Butt
bc8e51c05a mm: memcg: dump memcg protection info on oom or alloc failures
Currently kernel dumps memory state on oom and allocation failures.  One
of the question usually raised on those dumps is why the kernel has not
reclaimed the reclaimable memory instead of triggering oom.  One potential
reason is the usage of memory protection provided by memcg.  So, let's
also dump the memory protected by the memcg in such reports to ease the
debugging.

Link: https://lkml.kernel.org/r/20251107234041.3632644-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Huacai Chen
05be028795 mm: remove unnecessary __GFP_HIGHMEM in __p*d_alloc_one_*()
__{pgd,p4d,pud,pmd,pte}_alloc_one_*() always allocate pages with GFP flag
GFP_PGTABLE_KERNEL/GFP_PGTABLE_USER.  These two macros are defined as
follows:

 #define GFP_PGTABLE_KERNEL	(GFP_KERNEL | __GFP_ZERO)
 #define GFP_PGTABLE_USER	(GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)

There is no __GFP_HIGHMEM in them, so we needn't to clear __GFP_HIGHMEM
explicitly.

Link: https://lkml.kernel.org/r/20251109021817.346181-1-chenhuacai@loongson.cn
Link: https://lkml.kernel.org/r/20251107095536.3101371-1-chenhuacai@loongson.cn
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Lorenzo Stoakes
c0ae966fac tools/testing/selftests/mm: add smaps visibility guard region test
Assert that we observe guard regions appearing in /proc/$pid/smaps as
expected, and when split/merge is performed too (with expected sticky
behaviour).

Also add handling for file systems which don't sanely handle mmap() VMA
merging so we don't incorrectly encounter a test failure in this
situation.

Link: https://lkml.kernel.org/r/059e62b8c67e55e6d849878206a95ea1d7c1e885.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
89330ec897 tools/testing/selftests/mm: add MADV_COLLAPSE test case
To ensure the retract_page_tables() logic functions correctly with the
introduction of VM_MAYBE_GUARD, add a test to assert that madvise collapse
fails when guard regions are established in the collapsed range in all
cases.

Unfortunately we cannot differentiate between e.g. 
CONFIG_READ_ONLY_THP_FOR_FS not being set vs.  a file-backed VMA having
collapse correctly disallowed, so in each instance we will get an assert
pass here.

We add an additional check to see whether guard regions are preserved
across collapse in case of a bug causing the collapse to succeed, which
will give us more data to debug with should this occur in future.

Link: https://lkml.kernel.org/r/0748beeb864525b8ddfa51adad7128dd32eb3ac4.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
29bef05e6d tools/testing/vma: add VMA sticky userland tests
Modify existing merge new/existing userland VMA tests to assert that
sticky VMA flags behave as expected.

We do so by generating every possible permutation of VMAs being
manipulated being sticky/not sticky and asserting that VMA flags with this
property retain are retained upon merge.

Link: https://lkml.kernel.org/r/5e2c7244485867befd052f8afc8188be6a4be670.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
49e14dabed mm: set the VM_MAYBE_GUARD flag on guard region install
Now we have established the VM_MAYBE_GUARD flag and added the capacity to
set it atomically, do so upon MADV_GUARD_INSTALL.

The places where this flag is used currently and matter are:

* VMA merge - performed under mmap/VMA write lock, therefore excluding
  racing writes.

* /proc/$pid/smaps - can race the write, however this isn't meaningful
  as the flag write is performed at the point of the guard region being
  established, and thus an smaps reader can't reasonably expect to avoid
  races.  Due to atomicity, a reader will observe either the flag being
  set or not.  Therefore consistency will be maintained.

In all other cases the flag being set is irrelevant and atomicity
guarantees other flags will be read correctly.

Note that non-atomic updates of unrelated flags do not cause an issue with
this flag being set atomically, as writes of other flags are performed
under mmap/VMA write lock, and these atomic writes are performed under
mmap/VMA read lock, which excludes the write, avoiding RMW races.

Note that we do not encounter issues with KCSAN by adjusting this flag
atomically, as we are only updating a single bit in the flag bitmap and
therefore we do not need to annotate these changes.

We intentionally set this flag in advance of actually updating the page
tables, to ensure that any racing atomic read of this flag will only
return false prior to page tables being updated, to allow for
serialisation via page table locks.

Note that we set vma->anon_vma for anonymous mappings.  This is because
the expectation for anonymous mappings is that an anon_vma is established
should they possess any page table mappings.  This is also consistent with
what we were doing prior to this patch (unconditionally setting anon_vma
on guard region installation).

We also need to update retract_page_tables() to ensure that madvise(...,
MADV_COLLAPSE) doesn't incorrectly collapse file-backed ranges contain
guard regions.

This was previously guarded by anon_vma being set to catch MAP_PRIVATE
cases, but the introduction of VM_MAYBE_GUARD necessitates that we check
this flag instead.

We utilise vma_flag_test_atomic() to do so - we first perform an
optimistic check, then after the PTE page table lock is held, we can check
again safely, as upon guard marker install the flag is set atomically
prior to the page table lock being taken to actually apply it.

So if the initial check fails either:

* Page table retraction acquires page table lock prior to VM_MAYBE_GUARD
  being set - guard marker installation will be blocked until page table
  retraction is complete.

OR:

* Guard marker installation acquires page table lock after setting
  VM_MAYBE_GUARD, which raced and didn't pick this up in the initial
  optimistic check, blocking page table retraction until the guard regions
  are installed - the second VM_MAYBE_GUARD check will prevent page table
  retraction.

Either way we're safe.

We refactor the retraction checks into a single
file_backed_vma_is_retractable(), there doesn't seem to be any reason that
the checks were separated as before.

Note that VM_MAYBE_GUARD being set atomically remains correct as
vma_needs_copy() is invoked with the mmap and VMA write locks held,
excluding any race with madvise_guard_install().

Link: https://lkml.kernel.org/r/e9e9ce95b6ac17497de7f60fc110c7dd9e489e8d.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
ab04b530e7 mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one
Gather all the VMA flags whose presence implies that page tables must be
copied on fork into a single bitmap - VM_COPY_ON_FORK - and use this
rather than specifying individual flags in vma_needs_copy().

We also add VM_MAYBE_GUARD to this list, as it being set on a VMA implies
that there may be metadata contained in the page tables (that is - guard
markers) which would will not and cannot be propagated upon fork.

This was already being done manually previously in vma_needs_copy(), but
this makes it very explicit, alongside VM_PFNMAP, VM_MIXEDMAP and
VM_UFFD_WP all of which imply the same.

Note that VM_STICKY flags ought generally to be marked VM_COPY_ON_FORK too
- because equally a flag being VM_STICKY indicates that the VMA contains
metadat that is not propagated by being faulted in - i.e.  that the VMA
metadata does not fully describe the VMA alone, and thus we must propagate
whatever metadata there is on a fork.

However, for maximum flexibility, we do not make this necessarily the case
here.

Link: https://lkml.kernel.org/r/5d41b24e7bc622cda0af92b6d558d7f4c0d1bc8c.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
64212ba02e mm: implement sticky VMA flags
It is useful to be able to designate that certain flags are 'sticky', that
is, if two VMAs are merged one with a flag of this nature and one without,
the merged VMA sets this flag.

As a result we ignore these flags for the purposes of determining VMA flag
differences between VMAs being considered for merge.

This patch therefore updates the VMA merge logic to perform this action,
with flags possessing this property being described in the VM_STICKY
bitmap.

Those flags which ought to be ignored for the purposes of VMA merge are
described in the VM_IGNORE_MERGE bitmap, which the VMA merge logic is also
updated to use.

As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it
already had this behaviour, alongside VM_STICKY as sticky flags by
implication must not disallow merge.

Ultimately it seems that we should make VM_SOFTDIRTY a sticky flag in its
own right, but this change is out of scope for this series.

The only sticky flag designated as such is VM_MAYBE_GUARD, so as a result
of this change, once the VMA flag is set upon guard region installation,
VMAs with guard ranges will now not have their merge behaviour impacted as
a result and can be freely merged with other VMAs without VM_MAYBE_GUARD
set.

Also update the comments for vma_modify_flags() to directly reference
sticky flags now we have established the concept.

We also update the VMA userland tests to account for the changes.

Link: https://lkml.kernel.org/r/22ad5269f7669d62afb42ce0c79bad70b994c58d.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
9119d6c209 mm: update vma_modify_flags() to handle residual flags, document
The vma_modify_*() family of functions each either perform splits, a merge
or no changes at all in preparation for the requested modification to
occur.

When doing so for a VMA flags change, we currently don't account for any
flags which may remain (for instance, VM_SOFTDIRTY) despite the requested
change in the case that a merge succeeded.

This is made more important by subsequent patches which will introduce the
concept of sticky VMA flags which rely on this behaviour.

This patch fixes this by passing the VMA flags parameter as a pointer and
updating it accordingly on merge and updating callers to accommodate for
this.

Additionally, while we are here, we add kdocs for each of the
vma_modify_*() functions, as the fact that the requested modification is
not performed is confusing so it is useful to make this abundantly clear.

We also update the VMA userland tests to account for this change.

Link: https://lkml.kernel.org/r/23b5b549b0eaefb2922625626e58c2a352f3e93c.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
5688225023 mm: add atomic VMA flags and set VM_MAYBE_GUARD as such
This patch adds the ability to atomically set VMA flags with only the mmap
read/VMA read lock held.

As this could be hugely problematic for VMA flags in general given that
all other accesses are non-atomic and serialised by the mmap/VMA locks, we
implement this with a strict allow-list - that is, only designated flags
are allowed to do this.

We make VM_MAYBE_GUARD one of these flags.

Link: https://lkml.kernel.org/r/97e57abed09f2663077ed7a36fb8206e243171a9.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
5dba5cc2e0 mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4.

Currently, guard regions are not visible to users except through
/proc/$pid/pagemap, with no explicit visibility at the VMA level.

This makes the feature less useful, as it isn't entirely apparent which
VMAs may have these entries present, especially when performing actions
which walk through memory regions such as those performed by CRIU.

This series addresses this issue by introducing the VM_MAYBE_GUARD flag
which fulfils this role, updating the smaps logic to display an entry for
these.

The semantics of this flag are that a guard region MAY be present if set
(we cannot be sure, as we can't efficiently track whether an
MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if
not set the VMA definitely does NOT have any guard regions present.

It's problematic to establish this flag without further action, because
that means that VMAs with guard regions in them become non-mergeable with
adjacent VMAs for no especially good reason.

To work around this, this series also introduces the concept of 'sticky'
VMA flags - that is flags which:

a. if set in one VMA and not in another still permit those VMAs to be
   merged (if otherwise compatible).

b. When they are merged, the resultant VMA must have the flag set.

The VMA logic is updated to propagate these flags correctly.

Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve
an issue with file-backed guard regions - previously these established an
anon_vma object for file-backed mappings solely to have vma_needs_copy()
correctly propagate guard region mappings to child processes.

We introduce a new flag alias VM_COPY_ON_FORK (which currently only
specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly
for this flag and to copy page tables if it is present, which resolves
this issue.

Additionally, we add the ability for allow-listed VMA flags to be
atomically writable with only mmap/VMA read locks held.

The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure
does not cause any races by being allowed to do so.

This allows us to maintain guard region installation as a read-locked
operation and not endure the overhead of obtaining a write lock here.

Finally we introduce extensive VMA userland tests to assert that the
sticky VMA logic behaves correctly as well as guard region self tests to
assert that smaps visibility is correctly implemented.


This patch (of 9):

Currently, if a user needs to determine if guard regions are present in a
range, they have to scan all VMAs (or have knowledge of which ones might
have guard regions).

Since commit 8e2f2aeb8b ("fs/proc/task_mmu: add guard region bit to
pagemap") and the related commit a516403787 ("fs/proc: extend the
PAGEMAP_SCAN ioctl to report guard regions"), users can use either
/proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this
operation at a virtual address level.

This is not ideal, and it gives no visibility at a /proc/$pid/smaps level
that guard regions exist in ranges.

This patch remedies the situation by establishing a new VMA flag,
VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is
uncertain because we cannot reasonably determine whether a
MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and
additionally VMAs may change across merge/split).

We utilise 0x800 for this flag which makes it available to 32-bit
architectures also, a flag that was previously used by VM_DENYWRITE, which
was removed in commit 8d0920bde5 ("mm: remove VM_DENYWRITE") and hasn't
bee reused yet.

We also update the smaps logic and documentation to identify these VMAs.

Another major use of this functionality is that we can use it to identify
that we ought to copy page tables on fork.

We do not actually implement usage of this flag in mm/madvise.c yet as we
need to allow some VMA flags to be applied atomically under mmap/VMA read
lock in order to avoid the need to acquire a write lock for this purpose.

Link: https://lkml.kernel.org/r/cover.1763460113.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Hui Zhu
cdcb53e1de mm/hugetlb: extract sysctl into hugetlb_sysctl.c
Following the extraction of sysfs code, this patch moves the sysctl
interface implementation into a dedicated file to further improve code
organization and maintainability of the hugetlb subsystem.

The following components are moved to mm/hugetlb_sysctl.c:
- proc_hugetlb_doulongvec_minmax()
- hugetlb_sysctl_handler_common()
- hugetlb_sysctl_handler()
- hugetlb_mempolicy_sysctl_handler() (CONFIG_NUMA)
- hugetlb_overcommit_handler()
- hugetlb_table[] sysctl table definition
- hugetlb_sysctl_init()

The hugetlb_internal.h header file is updated to declare the sysctl
initialization function with proper #ifdef guards for configurations
without CONFIG_SYSCTL support.

The Makefile is updated to compile hugetlb_sysctl.o when CONFIG_HUGETLBFS
is enabled.  This refactoring reduces the size of hugetlb.c and logically
separates the sysctl interface from core hugetlb management code.

MAINTAINERS is updated to add new file hugetlb_sysctl.c.

No functional changes are introduced; all code is moved as-is from
hugetlb.c with consistent formatting.

Link: https://lkml.kernel.org/r/5bbee7ab5be71d0bb1aebec38642d7e83526bb7a.1762398359.git.zhuhui@kylinos.cn
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Hui Zhu
ecd6703f64 mm/hugetlb: extract sysfs into hugetlb_sysfs.c
Patch series "mm/hugetlb: refactor sysfs/sysctl interfaces", v5.

hugetlb.c has grown significantly and become difficult to maintain.  This
patch series extracts the sysfs and sysctl interface code into separate
dedicated files to improve code organization.

The refactoring includes:
- Patch 1: Extract sysfs interface into mm/hugetlb_sysfs.c
- Patch 2: Extract sysctl interface into mm/hugetlb_sysctl.c

No functional changes are introduced in this series.  The code is moved
as-is, with only minor formatting adjustments for code style consistency. 
This should make future maintenance and enhancements to the hugetlb
subsystem easier.

Testing: The patch series has been compile-tested and maintains the same
functionality as the original code.


This patch (of 2):

Currently, hugetlb.c contains both core management logic and sysfs
interface implementations, making it difficult to maintain.  This patch
extracts the sysfs-related code into a dedicated file to improve code
organization.

The following components are moved to mm/hugetlb_sysfs.c:
- sysfs attribute definitions and handlers
- sysfs kobject management functions
- NUMA per-node hstate attribute registration

Several inline helper functions and macros are moved to
mm/hugetlb_internal.h:
- hstate_is_gigantic_no_runtime()
- next_node_allowed()
- get_valid_node_allowed()
- hstate_next_node_to_alloc()
- hstate_next_node_to_free()
- for_each_node_mask_to_alloc/to_free macros

To support code sharing, these functions are changed from static to
exported symbols:
- remove_hugetlb_folio()
- add_hugetlb_folio()
- init_new_hugetlb_folio()
- prep_and_add_allocated_folios()
- demote_pool_huge_page()
- __nr_hugepages_store_common()

The Makefile is updated to compile hugetlb_sysfs.o when CONFIG_HUGETLBFS
is enabled.  This maintains all existing functionality while improving
maintainability by separating concerns.

MAINTAINERS is updated to add new file hugetlb_sysfs.c.

Link: https://lkml.kernel.org/r/cover.1762398359.git.zhuhui@kylinos.cn
Link: https://lkml.kernel.org/r/656a03dff7e2bb20e24e841ede81fdca01d21410.1762398359.git.zhuhui@kylinos.cn
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Matthew Wilcox (Oracle)
98be155451 mm: constify __dump_folio() arguments
These arguments aren't modified by the function; mark them as const to
help the compiler.

Link: https://lkml.kernel.org/r/20251106203526.2368275-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Matthew Wilcox (Oracle)
20605eb5bb memory_hotplug: optimise try_offline_memory_block()
Extract the zone number directly from the page instead of using the page's
zone number to look up the zone and asking the zone what its number is.

Link: https://lkml.kernel.org/r/20251106201452.2292631-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Matthew Wilcox (Oracle)
c537f0dd30 migrate: optimise alloc_migration_target()
Extract the zone number directly from the folio instead of using the
folio's zone number to look up the zone and asking the zone what its
number is.

[ziy@nvidia.com: fix folio_zonenum() return type]
  Link: https://lkml.kernel.org/r/26E8FF35-503E-4F14-98F7-7B4FA25FBD37@nvidia.com
Link: https://lkml.kernel.org/r/20251106201452.2292631-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Matthew Wilcox (Oracle)
e24f66e87b hugetlb: optimise hugetlb_folio_init_tail_vmemmap()
Extract the zone number directly from the folio instead of using the
folio's zone number to look up the zone and asking the zone what its
number is.

Also we should use &folio->page instead of casting from folio to page

Link: https://lkml.kernel.org/r/20251106201452.2292631-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Zeng Chi
135e541ae8 lib/alloc_tag: use %pe format specifier
The %pe format specifier is designed to print error pointers.  It prints a
symbolic error name (eg.  -EINVAL) and it makes the code simpler by
omitting PTR_ERR();

This patch fixes this cocci report:
lib/alloc_tag.c:776:63-70: WARNING: Consider using %pe to print PTR_ERR()

Link: https://lkml.kernel.org/r/20251105023925.1447482-1-zeng_chi911@163.com
Signed-off-by: Zeng Chi <zengchi@kylinos.cn>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Kefeng Wang
340b59816b mm: kill mm_wr_locked from unmap_vmas() and unmap_single_vma()
Kill mm_wr_locked since commit f8e97613fe ("mm: convert VM_PFNMAP
tracking to pfnmap_track() + pfnmap_untrack()") remove the user.

Link: https://lkml.kernel.org/r/20251104085709.2688433-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Ankit Khushwaha
3b12a53b64 selftest/mm: fix pointer comparison in mremap_test
Pointer arthemitic with 'void * addr' and 'ulong dest_alignment' triggers
following warning:

mremap_test.c:1035:31: warning: pointer comparison always evaluates to
false [-Wtautological-compare]
 1035 |                 if (addr + c.dest_alignment < addr) {
      |                                             ^

this warning is raised from clang version 20.1.8 (Fedora 20.1.8-4.fc42).

use 'void *tmp_addr' to do the pointer arthemitic.

Link: https://lkml.kernel.org/r/20251108161829.25105-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Baolin Wang
4f8961b295 mm: vmscan: simplify the folio refcount check in pageout()
Since we no longer attempt to write back filesystem folios in pageout()
(they will be filtered out by the following check in pageout()), and only
tmpfs/shmem folios and anonymous swapcache folios can be written back, we
can remove the redundant folio_test_private() when checking the folio's
refcount, as tmpfs/shmem and swapcache folios do not use the PG_private
flag.

While we're at it, we can open-code the folio refcount check instead of
adding a simple helper that has only one user.

Link: https://lkml.kernel.org/r/4cbbec5bb92397aa4597105f1f499aabf7a1901c.1758166683.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:56 -08:00
Baolin Wang
d94d9293a1 mm: vmscan: remove folio_test_private() check in pageout()
Patch series "some cleanups for pageout()", v2.

Since we no longer attempt to write back filesystem folios in pageout(),
and only tmpfs/shmem folios and anonymous swapcache folios can be written
back, we can remove the redundant folio_test_private() related logic to
simplify the logic of pageout(), as tmpfs/shmem and swapcache folios do
not use the PG_private flag.


This patch (of 2):

The folio_test_private() check in pageout() was introduced by commit
ce91b575332b ("orphaned pagecache memleak fix") in 2005 (checked from a
history tree[1]).  As the commit message mentioned, it was to address the
issue where reiserfs pagecache may be truncated while still pinned.  To
further explain, the truncation removes the page->mapping, but the page is
still listed in the VM queues because it still has buffers.

In 2008, commit a2b345642f ("Fix dirty page accounting leak with ext3
data=journal") seems to be dealing with a similar issue, where the page
becomes dirty after truncation, and it provides a very useful call stack:

truncate_complete_page()
      cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
      do_invalidatepage()
        ext3_invalidatepage()
          journal_invalidatepage()
            journal_unmap_buffer()
              __dispose_buffer()
                __journal_unfile_buffer()
                  __journal_temp_unlink_buffer()
                    mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

In this commit a2b345642f, we forcefully clear the page's dirty flag
during truncation (in truncate_complete_page()).

Now it seems this was just a peculiar usage specific to reiserfs.  Maybe
reiserfs had some extra refcount on these pages, which caused them to pass
the is_page_cache_freeable() check.

With the fix provided by commit a2b345642f and reiserfs being removed
in 2024 by commit fb6f20ecb1 ("reiserfs: The last commit"), such a case
is unlikely to occur again.  So let's remove the redundant
folio_test_private() checks and related buffer_head release logic, and
just leave a warning here to catch such a bug.

[akpm@linux-foundation.org: redo comment, per David]
  Link: https://lkml.kernel.org/r/17d1b293-e393-4989-a357-7eea74b3c805@redhat.com
[baolin.wang@linux.alibaba.com: remove comment and WARNing, per Hugh and others]
  Link: https://lkml.kernel.org/r/392a9ca3-31ac-4447-bd44-3c656d63e4ca@linux.alibaba.com
Link: https://lkml.kernel.org/r/cover.1758166683.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/9ef0e560dc83650bc538eb5dcd1594e112c1369f.1758166683.git.baolin.wang@linux.alibaba.com
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git [1]
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:56 -08:00
Michael Opdenacker
605945281a dt-bindings: arm: rockchip: merge Asus Tinker and Tinker S
Merge sections for Asus Tinker Board and Asus Tinker Board S.

Asus Tinker Board S (Storage) is just a variant of Tinker Board
with an added eMMC chip.

Signed-off-by: Michael Opdenacker <michael.opdenacker@rootcommit.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://patch.msgid.link/20251119-merge-tinker-bindings-v1-1-4072b55750c1@rootcommit.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-20 22:06:24 +01:00
Heiko Stuebner
ce0b84e766 arm64: dts: rockchip: add QNAP TS233 devicetree
The TS233 is a 2 bay NAS similar to the TS433. Architecture-wise it really
seems to be the same minus the additional PCIe connected components the
TS433 has.

So it just uses two of the SoCs SATA ports and the SoC's gigabit ethernet.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251112214206.423244-6-heiko@sntech.de
2025-11-20 20:46:32 +01:00
Heiko Stuebner
a576b51e13 dt-bindings: arm: rockchip: add TS233 to RK3568-based QNAP NAS devices
QNAP builds a number of variants of the RK3568-based NAS design.

Add the 2-bay TS233 variant.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251112214206.423244-5-heiko@sntech.de
2025-11-20 20:46:32 +01:00
Heiko Stuebner
35202606ea arm64: dts: rockchip: move common qnap tsx33 parts to dtsi
The NAS series based around the rk3568 contains a number of models with
1-4 drives, that reuse most of the board structure.

Therefore move the shared parts to a dtsi, to be included by the devices.

As the smallest device is the 1-bay TS133, keep everything > slot1 in
the individual devicetree.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251112214206.423244-4-heiko@sntech.de
2025-11-20 20:46:32 +01:00
Heiko Stuebner
81f1169914 arm64: dts: rockchip: describe mcu eeprom cells on rk3568-ts433
The MCU's eeprom contains the unit's serial and a number of slots for
mac-addresses. As the MCU seems to be used in different devices, up to
8 mac addresses can live there and the unused slots are actually
initialized with empty mac-address strings like 00:00:00:00:05:09 .

Interestingly on the TS-433, the PCIe ethernet adapter brings its own
memory to hold its mac, and the gmac0 is supposed to get its mac from
the second mac-slot, while the first one stays empty.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251112214206.423244-3-heiko@sntech.de
2025-11-20 20:46:32 +01:00
Heiko Stuebner
15efa24146 arm64: dts: rockchip: move cpu_thermal node to the correct position
The &cpu_thermal node was added at the wrong position, move it to
the correctly sorted one.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251112214206.423244-2-heiko@sntech.de
2025-11-20 20:46:32 +01:00
Yue Haibing
c09816f2af KVM: x86: Remove unused declaration kvm_mmu_may_ignore_guest_pat()
Commit 3fee4837ef ("KVM: x86: remove shadow_memtype_mask")
removed the functions but leave this declaration.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20251120120930.1448593-1-yuehaibing@huawei.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 07:26:35 -08:00
Paolo Bonzini
ebec25438f KVM: x86: Enable support for emulating AVX MOV instructions
Some users of KVM have emulated devices (typically added to private
forks of QEMU) that execute AVX instructions on PCI BARs.  Whenever
the guest OS tries to do that, an illegal instruction exception or
emulation failure is triggered.

Add the Avx flag to move instructions:
- (66) 0f 10 - MOVUPS/MOVUPD from memory
- (66) 0f 11 - MOVUPS/MOVUPD to memory
- 66 0f 6f - MOVDQA from memory
- 66 0f 7f - MOVDQA to memory
- f3 0f 6f - MOVDQU from memory
- f3 0f 7f - MOVDQU to memory
- (66) 0f 28 - MOVAPS/MOVAPD from memory
- (66) 0f 29 - MOVAPS/MOVAPD to memory
- (66) 0f 2b - MOVNTPS/MOVNTPD to memory
- 66 0f e7 - MOVNTDQ to memory
- 66 0f 38 2a - MOVNTDQA to memory

Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/kvm/BD108C42-0382-4B17-B601-434A4BD038E7@fb.com/T/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://patch.msgid.link/20251114003633.60689-11-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 07:26:08 -08:00
Michael Walle
4ba96cd38f arm64: dts: ti: sa67: add build time dtb for overlays
Since commit d8c8a575f5 ("kbuild: Ensure .dtbo targets are applied to
a base .dtb") kbuild will throw a warning for any stray overlays. Add a
new .dtb which will only be build if CONFIG_OF_ALL_DTBS is enabled.

Signed-off-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20251120080138.2397851-1-mwalle@kernel.org
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-20 20:06:56 +05:30
Wadim Egorov
7026f224e2 arm64: dts: ti: Enable build testing of PHYTEC board overlays
Add missing PHYTEC overlay build targets so they are included in
CONFIG_OF_ALL_DTBS coverage. This ensures all PHYTEC board-overlay
combinations are apply-tested during build time testing.

Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Link: https://patch.msgid.link/20251119123216.1481420-1-w.egorov@phytec.de
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-20 20:06:14 +05:30
Conor Dooley
d5c3f49b75 MAINTAINERS: Setup support for Anlogic tree
Add myself as the maintainer of the Anlogic DR1V90 SoC tree, including
the corresponding DTS and DT bindings paths for Anlogic RISC-V-based
SoCs. I don't really want to look after this platform, but am due to
irritation of the vendor's behaviour towards the contributor of support.
Hence, Odd Fixes as the status.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-20 09:48:13 +00:00
Yue Haibing
1387cb78f1 Documentation: reset: Remove reset_controller_add_lookup()
Commit 205b261463ff ("reset: remove legacy reset lookup code") removed this
api, so update the document.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-20 10:02:37 +01:00
Paolo Bonzini
f0585a714a KVM: x86: Add emulator support for decoding VEX prefixes
After all the changes done in the previous patches, the only thing
left to support AVX MOV instructions is to expand the VEX prefix into
the appropriate REX, 66/F3/F2 and map prefixes.  Three-operand
instructions are not supported.

The Avx bit in this case is not cleared, in fact it is used as the
sign that the instruction does support VEX encoding.  Until it is
added to any instruction, however, the only functional change is
to change some not-implemented instructions to #UD if they correspond
to a VEX prefix with an invalid map.

Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://patch.msgid.link/20251114003633.60689-10-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:36:52 -08:00
Chang S. Bae
825f0aece0 KVM: x86: Refactor REX prefix handling in instruction emulation
Restructure how to represent and interpret REX fields, preparing
for handling of both REX2 and VEX.

REX uses the upper four bits of a single byte as a fixed identifier,
and the lower four bits containing the data. VEX and REX2 extends this so
that the first byte identifies the prefix and the rest encode additional
bits; and while VEX only has the same four data bits as REX, eight zero
bits are a valid value for the data bits of REX2.  So, stop storing the
REX byte as-is.  Instead, store only the low bits of the REX prefix and
track separately whether a REX-like prefix was used.

No functional changes intended.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Message-ID: <20251110180131.28264-11-chang.seok.bae@intel.com>
[Extracted from APX series; removed bitfields and REX2-specific default. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://patch.msgid.link/20251114003633.60689-9-pbonzini@redhat.com
[sean: name REX_{BXRW} enum "rex_bits"]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:36:38 -08:00
Paolo Bonzini
4cb21be4c3 KVM: x86: Add AVX support to the emulator's register fetch and writeback
Prepare struct operand for hosting AVX registers.  Remove the
existing, incomplete code that placed the Avx flag in the operand
alignment field, and repurpose the name for a separate bit that
indicates:

- after decode, whether an instruction supports the VEX prefix;

- before writeback, that the instruction did have the VEX prefix and
therefore 1) it can have op_bytes == 32; 2) t should clear high
bytes of XMM registers.

Right now the bit will never be set and the patch has no intended
functional change.  However, this is actually more vexing than the
decoder changes itself, and therefore worth separating.

Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://patch.msgid.link/20251114003633.60689-8-pbonzini@redhat.com
[sean: guard ymm[8-15] accesses with #ifdef CONFIG_X86_64]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:30:54 -08:00
Paolo Bonzini
f106797f81 KVM: x86: Add x86_emulate_ops.get_xcr() callback
This will be necessary in order to check whether AVX is enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://patch.msgid.link/20251114003633.60689-7-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:29:55 -08:00
Paolo Bonzini
7e11eec989 KVM: x86: Share emulator's common register decoding code
Remove all duplicate handling of register operands, including picking
the right register class and fetching it, by extracting a new function
that can be used for both REG and MODRM operands.

Centralize setting op->orig_val = op->val in fetch_register_operand()
as well.

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://patch.msgid.link/20251114003633.60689-6-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:28:53 -08:00
Paolo Bonzini
1a84b07aca KVM: x86: Move op_prefix to struct x86_emulate_ctxt (from x86_decode_insn())
VEX decode will need to set it based on the "pp" bits, so make it
a field in the struct rather than a local variable.

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://patch.msgid.link/20251114003633.60689-5-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:28:30 -08:00
Paolo Bonzini
3d8834a0d1 KVM: x86: Improve formatting of the emulator's flags table
Align a little better the comments on the right side and list
explicitly the bits used by multi-bit fields.

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://patch.msgid.link/20251114003633.60689-4-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:27:25 -08:00
Paolo Bonzini
3f3fc58df5 KVM: x86: Move Src2Shift up one bit (use bits 36:32 for Src2 in the emulator)
An irresistible microoptimization (changing accesses to Src2 to just an
AND :)) that also frees a bit for AVX in the low flags word.  This makes
it closer to SSE since both of them can access XMM registers, pointlessly
shaving another clock cycle or two (maybe).

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com
Link: https://patch.msgid.link/20251114003633.60689-3-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:25:59 -08:00
Paolo Bonzini
c57d9bafbd KVM: x86: Add support for emulating MOVNTDQA
MOVNTDQA is a simple MOV instruction, in fact it has the same
characteristics as 0F E7 (MOVNTDQ) other than the aligned-address
requirement.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://patch.msgid.link/20251114003633.60689-2-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 14:24:53 -08:00
Raghavendra Rao Ananta
4ddfab5436 KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
When a large VM, specifically one that holds a significant number of PTEs,
gets abruptly destroyed, the following warning is seen during the
page-table walk:

 sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule
 CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE
 Tainted: [O]=OOT_MODULE
 Call trace:
  show_stack+0x20/0x38 (C)
  dump_stack_lvl+0x3c/0xb8
  dump_stack+0x18/0x30
  resched_latency_warn+0x7c/0x88
  sched_tick+0x1c4/0x268
  update_process_times+0xa8/0xd8
  tick_nohz_handler+0xc8/0x168
  __hrtimer_run_queues+0x11c/0x338
  hrtimer_interrupt+0x104/0x308
  arch_timer_handler_phys+0x40/0x58
  handle_percpu_devid_irq+0x8c/0x1b0
  generic_handle_domain_irq+0x48/0x78
  gic_handle_irq+0x1b8/0x408
  call_on_irq_stack+0x24/0x30
  do_interrupt_handler+0x54/0x78
  el1_interrupt+0x44/0x88
  el1h_64_irq_handler+0x18/0x28
  el1h_64_irq+0x84/0x88
  stage2_free_walker+0x30/0xa0 (P)
  __kvm_pgtable_walk+0x11c/0x258
  __kvm_pgtable_walk+0x180/0x258
  __kvm_pgtable_walk+0x180/0x258
  __kvm_pgtable_walk+0x180/0x258
  kvm_pgtable_walk+0xc4/0x140
  kvm_pgtable_stage2_destroy+0x5c/0xf0
  kvm_free_stage2_pgd+0x6c/0xe8
  kvm_uninit_stage2_mmu+0x24/0x48
  kvm_arch_flush_shadow_all+0x80/0xa0
  kvm_mmu_notifier_release+0x38/0x78
  __mmu_notifier_release+0x15c/0x250
  exit_mmap+0x68/0x400
  __mmput+0x38/0x1c8
  mmput+0x30/0x68
  exit_mm+0xd4/0x198
  do_exit+0x1a4/0xb00
  do_group_exit+0x8c/0x120
  get_signal+0x6d4/0x778
  do_signal+0x90/0x718
  do_notify_resume+0x70/0x170
  el0_svc+0x74/0xd8
  el0t_64_sync_handler+0x60/0xc8
  el0t_64_sync+0x1b0/0x1b8

The warning is seen majorly on the host kernels that are configured
not to force-preempt, such as CONFIG_PREEMPT_NONE=y. To avoid this,
instead of walking the entire page-table in one go, split it into
smaller ranges, by checking for cond_resched() between each range.
Since the path is executed during VM destruction, after the
page-table structure is unlinked from the KVM MMU, relying on
cond_resched_rwlock_write() isn't necessary.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://msgid.link/20251113052452.975081-4-rananta@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 14:12:03 -08:00
Raghavendra Rao Ananta
d68d66e57e KVM: arm64: Split kvm_pgtable_stage2_destroy()
Split kvm_pgtable_stage2_destroy() into two:
  - kvm_pgtable_stage2_destroy_range(), that performs the
    page-table walk and free the entries over a range of addresses.
  - kvm_pgtable_stage2_destroy_pgd(), that frees the PGD.

This refactoring enables subsequent patches to free large page-tables
in chunks, calling cond_resched() between each chunk, to yield the
CPU as necessary.

Existing callers of kvm_pgtable_stage2_destroy(), that probably cannot
take advantage of this (such as nVMHE), will continue to function as is.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Suggested-by: Oliver Upton <oupton@kernel.org>
Link: https://msgid.link/20251113052452.975081-3-rananta@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 14:11:57 -08:00
Oliver Upton
156f70afcf KVM: arm64: Only drop references on empty tables in stage2_free_walker
A subsequent change to the way KVM frees stage-2s will invoke the free
walker on sub-ranges of the VM's IPA space, meaning there's potential
for only partially visiting a table's PTEs.

Split the leaf and table visitors and only drop references on a table
when the page count reaches 1, implying there are no valid PTEs that
need to be visited. Invalidate the table PTE to avoid traversing the
stale reference.

Link: https://msgid.link/20251113052452.975081-2-rananta@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 14:11:50 -08:00
Maximilian Dittgen
85f329df29 KVM: selftests: SYNC after guest ITS setup in vgic_lpi_stress
vgic_lpi_stress sends MAPTI and MAPC commands during guest GIC setup to
map interrupt events to ITT entries and collection IDs to
redistributors, respectively.

We have no guarantee that the ITS will finish handling these mapping
commands before the selftest calls KVM_SIGNAL_MSI to inject LPIs to the
guest. If LPIs are injected before ITS mapping completes, the ITS cannot
properly pass the interrupt on to the redistributor.

Fix by adding a SYNC command to the selftests ITS library, then calling
SYNC after ITS mapping to ensure mapping completes before signal_lpi()
writes to GITS_TRANSLATER.

Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-2-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 12:38:59 -08:00
Maximilian Dittgen
31df012da4 KVM: selftests: Assert GICR_TYPER.Processor_Number matches selftest CPU number
The selftests GIC library and tests assume that the
GICR_TYPER.Processor_number associated with a given CPU is the same as
the CPU's selftest index.

Since this assumption is not guaranteed by specification, add an assert
in gicv3_cpu_init() that validates this is true.

Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-1-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 12:38:59 -08:00
Oliver Upton
cb17d79ff5 KVM: arm64: Use kvzalloc() for kvm struct allocation
Physically-allocated KVM structs aren't necessary when in VHE mode as
there's no need to share with the hyp's address space. Of course, there
can still be a performance benefit from physical allocations.

Use kvzalloc() for opportunistic physical allocations.

Acked-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://msgid.link/20251119093822.2513142-3-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 12:20:57 -08:00
Oliver Upton
297877069b KVM: arm64: Drop useless __GFP_HIGHMEM from kvm struct allocation
A recent change on the receiving end of vmalloc() started warning about
unsupported GFP flags passed by the caller. Nathan reports that this
warning fires in kvm_arch_alloc_vm(), owing to the fact that KVM is
passing a meaningless __GFP_HIGHMEM.

Do as the warning says and fix the code.

Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/kvmarm/20251118224448.GA998046@ax162/
Acked-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://msgid.link/20251119093822.2513142-2-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 12:20:56 -08:00
Yao Zihong
a131fd6079 selftests/riscv: Add Zicbop prefetch test
Add selftests to cbo.c to verify Zicbop extension behavior, and split
the previous `--sigill` mode into two options so they can be tested
independently.

The test checks:
- That hwprobe correctly reports Zicbop presence and block size.
- That prefetch instructions execute without exception on valid and NULL
  addresses when Zicbop is present.

Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://patch.msgid.link/20251118162436.15485-3-zihong.plct@isrc.iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:29 -07:00
Yao Zihong
e0a504984a riscv: hwprobe: Expose Zicbop extension and its block size
- Add `RISCV_HWPROBE_EXT_ZICBOP` to report the presence of the
  Zicbop extension.
- Add `RISCV_HWPROBE_KEY_ZICBOP_BLOCK_SIZE` to expose the block
  size (in bytes) when Zicbop is supported.
- Update hwprobe.rst to document the new extension bit and block
  size key, following the existing Zicbom/Zicboz style.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20251118162436.15485-2-zihong.plct@isrc.iscas.ac.cn
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:29 -07:00
Xu Lu
ad1bb4b852 riscv: Introduce Zalasr instructions
Introduce l{b|h|w|d}.{aq|aqrl} and s{b|h|w|d}.{rl|aqrl} instruction
encodings.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://patch.msgid.link/20251020042056.30283-5-luxu.kernel@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Xu Lu
f4922b6916 riscv: hwprobe: Export Zalasr extension
Export the Zalasr extension to userspace using hwprobe.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Link: https://patch.msgid.link/20251020042056.30283-4-luxu.kernel@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Xu Lu
4640be2588 dt-bindings: riscv: Add Zalasr ISA extension description
Add description for the Zalasr ISA extension

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251020042056.30283-3-luxu.kernel@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Xu Lu
c9651fbc60 riscv: Add ISA extension parsing for Zalasr
Add parsing for Zalasr ISA extension.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Link: https://patch.msgid.link/20251020042056.30283-2-luxu.kernel@bytedance.com
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Yong-Xuan Wang
f0ae09a892 selftests: riscv: Add test for the Vector ptrace interface
Add a test case that does some basic verification of the Vector ptrace
interface. This forks a child process then using ptrace to inspect and
manipulate the v31 register of the child.

Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251013091318.467864-3-yongxuan.wang@sifive.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Yong-Xuan Wang
6efb1a9462 riscv: ptrace: Optimize the allocation of vector regset
The vector regset uses the maximum possible vlen value to estimate the
.n field. But not all the hardwares support the maximum vlen. Linux
might wastes time to prepare a large memory buffer(about 2^6 pages) for
the vector regset.

The regset can only copy vector registers when the process are using
vector. Add .active callback and determine the n field of vector regset
in riscv_v_setup_ctx_cache() doesn't affect the ptrace syscall and
coredump. It can avoid oversized allocations and better matches real
hardware limits.

Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251013091318.467864-2-yongxuan.wang@sifive.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Chunyan Zhang
3ac022bf38 raid6: test: Add support for RISC-V
Add RISC-V code to be compiled to allow the userspace raid6test program
to be built and run on RISC-V.

Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://patch.msgid.link/20250718072711.3865118-6-zhangchunyan@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Chunyan Zhang
3c58d7a513 raid6: riscv: Allow code to be compiled in userspace
To support userspace raid6test, this patch adds __KERNEL__ ifdef for kernel
header inclusions also userspace wrapper definitions to allow code to be
compiled in userspace.

This patch also drops the NSIZE macro, instead of using the vector length,
which can work for both kernel and user space.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Link: https://patch.msgid.link/20250718072711.3865118-5-zhangchunyan@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Chunyan Zhang
6fcce9f028 raid6: riscv: Prevent compiler from breaking inline vector assembly code
To prevent the compiler from breaking the inline vector assembly code,
this code must be built without compiler support for vector.

Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Link: https://patch.msgid.link/20250718072711.3865118-4-zhangchunyan@iscas.ac.cn
[pjw@kernel.org: cleaned up commit message]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Vivian Wang
724c694479 riscv: cmpxchg: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZAWRS,
replacing the use of asm goto with ALTERNATIVE.

The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[no_zawrs]", "nop", ...).

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-5-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Vivian Wang
6b85e9ac4a riscv: bitops: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.

The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[legacy]", "nop", ...).

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-4-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Vivian Wang
8261a9d167 riscv: hweight: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.

The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[legacy]", "nop", ...).

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-3-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:27 -07:00
Vivian Wang
1c7d491d86 riscv: checksum: Use riscv_has_extension_likely
Use riscv_has_extension_likely() to check for RISCV_ISA_EXT_ZBB,
replacing the use of asm goto with ALTERNATIVE.

The "likely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("j %l[no_zbb]", "nop", ...).

While we're at it, also remove bogus comment about Zbb being likely
available. We have to choose between "likely" and "unlikely" due to
limitations of the asm goto feature, but that does not mean we should
put a bad comment on why we pick "likely" over "unlikely".

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-2-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:27 -07:00
Vivian Wang
0a067ae21b riscv: pgtable: Use riscv_has_extension_unlikely
Use riscv_has_extension_unlikely() to check for RISCV_ISA_EXT_SVVPTC,
replacing the use of asm goto with ALTERNATIVE.

The "unlikely" variant is used to match the behavior of the original
implementation using ALTERNATIVE("nop", "j %l[svvptc]", ...).

Note that this makes the check for RISCV_ISA_EXT_SVVPTC a runtime one if
RISCV_ALTERNATIVE=n, but it should still be worthwhile to do so given
that TLB flushes are relatively slow.

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Link: https://patch.msgid.link/20251020-riscv-altn-helper-wip-v4-1-ef941c87669a@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:27 -07:00
Vishal Moola (Oracle)
c9a7161044 riscv: Remove __GFP_HIGHMEM masking
Remove unnecessary __GFP_HIGHMEM masking, which was introduced with
commit 380f2c1ae9 ("riscv: convert alloc_{pmd, pte}_late() to use
ptdescs"). GFP_KERNEL doesn't contain __GFP_HIGHMEM.

Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Link: https://patch.msgid.link/20251107182620.95844-1-vishal.moola@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:27 -07:00
Anup Patel
231fb999a9 RISC-V: Enable HOTPLUG_PARALLEL for secondary CPUs
The core kernel already supports parallel bringup of secondary
CPUs (aka HOTPLUG_PARALLEL). The x86 and MIPS architectures
already use HOTPLUG_PARALLEL and ARM is also moving toward it.

On RISC-V, there is no arch specific global data accessed in the
RISC-V secondary CPU bringup path so enabling HOTPLUG_PARALLEL for
RISC-V would only require:
1) Providing RISC-V specific arch_cpuhp_kick_ap_alive()
2) Calling cpuhp_ap_sync_alive() from smp_callin()

This patch is tested natively with OpenSBI on QEMU RV64 virt machine
with 64 cores and also tested with KVM RISC-V guest with 32 VCPUs.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://patch.msgid.link/20250905122512.71684-1-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:27 -07:00
Xianwei Zhao
ba8abbdfd0 soc: amlogic: meson-gx-socinfo: add new SoCs id
Add new definition for Amlogic SoCs, include S6, S7, S7D.

Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20251119-soc-info-s6-s7-s7d-v3-2-1764c1995c04@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-19 15:47:15 +01:00
Xianwei Zhao
1d80bed4e3 dt-bindings: arm: amlogic: meson-gx-ao-secure: support more SoCs
Add new compatible for ao-secure of Amlogic SoCs(S6,S7,S7D).

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Link: https://patch.msgid.link/20251119-soc-info-s6-s7-s7d-v3-1-1764c1995c04@amlogic.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-11-19 15:47:15 +01:00
Binbin Wu
0b28f21ad4 KVM: x86: Add a helper to dedup loading guest/host XCR0 and XSS
Add and use a helper, kvm_load_xfeatures(), to dedup the code that loads
guest/host xfeatures.

Opportunistically return early if X86_CR4_OSXSAVE is not set to reduce
indentations.

No functional change intended.

Suggested-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://patch.msgid.link/20251110050539.3398759-1-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 05:41:12 -08:00
Sean Christopherson
7649412af3 KVM: x86: Load guest/host PKRU outside of the fastpath run loop
Move KVM's swapping of PKRU outside of the fastpath loop, as there is no
KVM code anywhere in the fastpath that accesses guest/userspace memory,
i.e. that can consume protection keys.

As documented by commit 1be0e61c1f ("KVM, pkeys: save/restore PKRU when
guest/host switches"), KVM just needs to ensure the host's PKRU is loaded
when KVM (or the kernel at-large) may access userspace memory.  And at the
time of commit 1be0e61c1f, KVM didn't have a fastpath, and PKU was
strictly contained to VMX, i.e. there was no reason to swap PKRU outside
of vmx_vcpu_run().

Over time, the "need" to swap PKRU close to VM-Enter was likely falsely
solidified by the association with XFEATUREs in commit 37486135d3
("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c"),
and XFEATURE swapping was in turn moved close to VM-Enter/VM-Exit as a
KVM hack-a-fix ution for an #MC handler bug by commit 1811d979c7
("x86/kvm: move kvm_load/put_guest_xcr0 into atomic context").

Deferring the PKRU loads shaves ~40 cycles off the fastpath for Intel,
and ~60 cycles for AMD.  E.g. using INVD in KVM-Unit-Test's vmexit.c,
with extra hacks to enable CR4.PKE and PKRU=(-1u & ~0x3), latency numbers
for AMD Turin go from ~1560 => ~1500, and for Intel Emerald Rapids, go
from ~810 => ~770.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Jon Kohler <jon@nutanix.com>
Link: https://patch.msgid.link/20251118222328.2265758-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 05:41:11 -08:00
Sean Christopherson
75c69c82f2 KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run loop
Move KVM's swapping of XFEATURE masks, i.e. XCR0 and XSS, out of the
fastpath loop now that the guts of the #MC handler runs in task context,
i.e. won't invoke schedule() with preemption disabled and clobber state
(or crash the kernel) due to trying to context switch XSTATE with a mix
of host and guest state.

For all intents and purposes, this reverts commit 1811d979c7 ("x86/kvm:
move kvm_load/put_guest_xcr0 into atomic context"), which papered over an
egregious bug/flaw in the #MC handler where it would do schedule() even
though IRQs are disabled.  E.g. the call stack from the commit:

  kvm_load_guest_xcr0
  ...
  kvm_x86_ops->run(vcpu)
    vmx_vcpu_run
      vmx_complete_atomic_exit
        kvm_machine_check
          do_machine_check
            do_memory_failure
              memory_failure
                lock_page

Commit 1811d979c7 "fixed" the immediate issue of XRSTORS exploding, but
completely ignored that scheduling out a vCPU task while IRQs and
preemption is wildly broken.  Thankfully, commit 5567d11c21 ("x86/mce:
Send #MC singal from task work") (somewhat incidentally?) fixed that flaw
by pushing the meat of the work to the user-return path, i.e. to task
context.

KVM has also hardened itself against #MC goofs by moving #MC forwarding to
kvm_x86_ops.handle_exit_irqoff(), i.e. out of the fastpath.  While that's
by no means a robust fix, restoring as much state as possible before
handling the #MC will hopefully provide some measure of protection in the
event that #MC handling goes off the rails again.

Note, KVM always intercepts XCR0 writes for vCPUs without protected state,
e.g. there's no risk of consuming a stale XCR0 when determining if a PKRU
update is needed; kvm_load_host_xfeatures() only reads, and never writes,
vcpu->arch.xcr0.

Deferring the XCR0 and XSS loads shaves ~300 cycles off the fastpath for
Intel, and ~500 cycles for AMD.  E.g. using INVD in KVM-Unit-Test's
vmexit.c, which an extra hack to enable CR4.OXSAVE, latency numbers for
AMD Turin go from ~2000 => 1500, and for Intel Emerald Rapids, go from
~1300 => ~1000.

Cc: Jon Kohler <jon@nutanix.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Jon Kohler <jon@nutanix.com>
Link: https://patch.msgid.link/20251118222328.2265758-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 05:41:10 -08:00
Sean Christopherson
63669bd1d5 KVM: VMX: Handle #MCs on VM-Enter/TD-Enter outside of the fastpath
Handle Machine Checks (#MC) that happen on VM-Enter (VMX or TDX) outside
of KVM's fastpath so that as much host state as possible is re-loaded
before invoking the kernel's #MC handler.  The only requirement is that
KVM invokes the #MC handler before enabling IRQs (and even that could
_probably_ be related to handling #MCs before enabling preemption).

Waiting to handle #MCs until "more" host state is loaded hardens KVM
against flaws in the #MC handler, which has historically been quite
brittle. E.g. prior to commit 5567d11c21 ("x86/mce: Send #MC singal from
task work"), the #MC code could trigger a schedule() with IRQs and
preemption disabled.  That led to a KVM hack-a-fix in commit 1811d979c7
("x86/kvm: move kvm_load/put_guest_xcr0 into atomic context").

Note, vmx_handle_exit_irqoff() is common to VMX and TDX guests.

Cc: Tony Lindgren <tony.lindgren@linux.intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Jon Kohler <jon@nutanix.com>
Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
Link: https://patch.msgid.link/20251118222328.2265758-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19 05:40:52 -08:00
Sean Christopherson
ebd1a33655 KVM: SVM: Handle #MCs in guest outside of fastpath
Handle Machine Checks (#MC) that happen in the guest (by forwarding them
to the host) outside of KVM's fastpath so that as much host state as
possible is re-loaded before invoking the kernel's #MC handler.  The only
requirement is that KVM invokes the #MC handler before enabling IRQs (and
even that could _probably_ be relaxed to handling #MCs before enabling
preemption).

Waiting to handle #MCs until "more" host state is loaded hardens KVM
against flaws in the #MC handler, which has historically been quite
brittle. E.g. prior to commit 5567d11c21 ("x86/mce: Send #MC singal from
task work"), the #MC code could trigger a schedule() with IRQs and
preemption disabled.  That led to a KVM hack-a-fix in commit 1811d979c7
("x86/kvm: move kvm_load/put_guest_xcr0 into atomic context").

Note, except for #MCs on VM-Enter, VMX already handles #MCs outside of the
fastpath.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Jon Kohler <jon@nutanix.com>
Link: https://patch.msgid.link/20251118222328.2265758-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:46 -08:00
Brendan Jackman
38ee66cb18 KVM: x86: Unify L1TF flushing under per-CPU variable
Currently the tracking of the need to flush L1D for L1TF is tracked by
two bits: one per-CPU and one per-vCPU.

The per-vCPU bit is always set when the vCPU shows up on a core, so
there is no interesting state that's truly per-vCPU. Indeed, this is a
requirement, since L1D is a part of the physical CPU.

So simplify this by combining the two bits.

The vCPU bit was being written from preemption-enabled regions.  To play
nice with those cases, wrap all calls from KVM and use a raw write so that
request a flush with preemption enabled doesn't trigger what would
effectively be DEBUG_PREEMPT false positives.  Preemption doesn't need to
be disabled, as kvm_arch_vcpu_load() will mark the new CPU as needing a
flush if the vCPU task is migrated, or if userspace runs the vCPU on a
different task.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
[sean: put raw write in KVM instead of in a hardirq.h variant]
Link: https://patch.msgid.link/20251113233746.1703361-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:45 -08:00
Sean Christopherson
05bd63959a KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n
Disable support for flushing the L1 data cache to mitigate L1TF if CPU
mitigations are disabled for the entire kernel.  KVM's mitigation of L1TF
is in no way special enough to justify ignoring CONFIG_CPU_MITIGATIONS=n.

Deliberately use CPU_MITIGATIONS instead of the more precise
MITIGATION_L1TF, as MITIGATION_L1TF only controls the default behavior,
i.e. CONFIG_MITIGATION_L1TF=n doesn't completely disable L1TF mitigations
in the kernel.

Keep the vmentry_l1d_flush module param to avoid breaking existing setups,
and leverage the .set path to alert the user to the fact that
vmentry_l1d_flush will be ignored.  Don't bother validating the incoming
value; if an admin misconfigures vmentry_l1d_flush, the fact that the bad
configuration won't be detected when running with CONFIG_CPU_MITIGATIONS=n
is likely the least of their worries.

Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://patch.msgid.link/20251113233746.1703361-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:44 -08:00
Sean Christopherson
0abd9610d6 KVM: VMX: Bundle all L1 data cache flush mitigation code together
Move vmx_l1d_flush(), vmx_cleanup_l1d_flush(), and the vmentry_l1d_flush
param code up in vmx.c so that all of the L1 data cache flushing code is
bundled together.  This will allow conditioning the mitigation code on
CONFIG_CPU_MITIGATIONS=y with minimal #ifdefs.

No functional change intended.

Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://patch.msgid.link/20251113233746.1703361-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:44 -08:00
Sean Christopherson
fc704b5789 x86/bugs: KVM: Move VM_CLEAR_CPU_BUFFERS into SVM as SVM_CLEAR_CPU_BUFFERS
Now that VMX encodes its own sequence for clearing CPU buffers, move
VM_CLEAR_CPU_BUFFERS into SVM to minimize the chances of KVM botching a
mitigation in the future, e.g. using VM_CLEAR_CPU_BUFFERS instead of
checking multiple mitigation flags.

No functional change intended.

Reviewed-by: Brendan Jackman <jackmanb@google.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251113233746.1703361-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:43 -08:00
Sean Christopherson
e6ff1d61de KVM: VMX: Handle MMIO Stale Data in VM-Enter assembly via ALTERNATIVES_2
Rework the handling of the MMIO Stale Data mitigation to clear CPU buffers
immediately prior to VM-Enter, i.e. in the same location that KVM emits a
VERW for unconditional (at runtime) clearing.  Co-locating the code and
using a single ALTERNATIVES_2 makes it more obvious how VMX mitigates the
various vulnerabilities.

Deliberately order the alternatives as:

 0. Do nothing
 1. Clear if vCPU can access MMIO
 2. Clear always

since the last alternative wins in ALTERNATIVES_2(), i.e. so that KVM will
honor the strictest mitigation (always clear CPU buffers) if multiple
mitigations are selected.  E.g. even if the kernel chooses to mitigate
MMIO Stale Data via X86_FEATURE_CLEAR_CPU_BUF_VM_MMIO, another mitigation
may enable X86_FEATURE_CLEAR_CPU_BUF_VM, and that other thing needs to win.

Note, decoupling the MMIO mitigation from the L1TF mitigation also fixes
a mostly-benign flaw where KVM wouldn't do any clearing/flushing if the
L1TF mitigation is configured to conditionally flush the L1D, and the MMIO
mitigation but not any other "clear CPU buffers" mitigation is enabled.
For that specific scenario, KVM would skip clearing CPU buffers for the
MMIO mitigation even though the kernel requested a clear on every VM-Enter.

Note #2, the flaw goes back to the introduction of the MDS mitigation.  The
MDS mitigation was inadvertently fixed by commit 43fb862de8 ("KVM/VMX:
Move VERW closer to VMentry for MDS mitigation"), but previous kernels
that flush CPU buffers in vmx_vcpu_enter_exit() are affected (though it's
unlikely the flaw is meaningfully exploitable even older kernels).

Fixes: 650b68a062 ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://patch.msgid.link/20251113233746.1703361-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:42 -08:00
Sean Christopherson
f6106d41ec x86/bugs: Use an x86 feature to track the MMIO Stale Data mitigation
Convert the MMIO Stale Data mitigation tracking from a static branch into
an x86 feature flag so that it can be used via ALTERNATIVE_2 in KVM.

No functional change intended.

Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://patch.msgid.link/20251113233746.1703361-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:42 -08:00
Sean Christopherson
afb99ffbd5 x86/bugs: Decouple ALTERNATIVE usage from VERW macro definition
Decouple the use of ALTERNATIVE from the encoding of VERW to clear CPU
buffers so that KVM can use ALTERNATIVE_2 to handle "always clear buffers"
and "clear if guest can access host MMIO" in a single statement.

No functional change intended.

Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://patch.msgid.link/20251113233746.1703361-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:41 -08:00
Pawan Gupta
aba7de6088 x86/bugs: Use VM_CLEAR_CPU_BUFFERS in VMX as well
TSA mitigation:

  d8010d4ba4 ("x86/bugs: Add a Transient Scheduler Attacks mitigation")

introduced VM_CLEAR_CPU_BUFFERS for guests on AMD CPUs. Currently on Intel
CLEAR_CPU_BUFFERS is being used for guests which has a much broader scope
(kernel->user also).

Make mitigations on Intel consistent with TSA. This would help handling the
guest-only mitigations better in future.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
[sean: make CLEAR_CPU_BUF_VM mutually exclusive with the MMIO mitigation]
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://patch.msgid.link/20251113233746.1703361-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:40 -08:00
Sean Christopherson
844afc1af3 KVM: VMX: Use on-stack copy of @flags in __vmx_vcpu_run()
When testing for VMLAUNCH vs. VMRESUME, use the copy of @flags from the
stack instead of first moving it to EBX, and then propagating
VMX_RUN_VMRESUME to RFLAGS.CF (because RBX is clobbered with the guest
value prior to the conditional branch to VMLAUNCH).  Stashing information
in RFLAGS is gross, especially with the writer and reader being bifurcated
by yet more gnarly assembly code.

Opportunistically drop the SHIFT macros as they existed purely to allow
the VM-Enter flow to use Bit Test.

Suggested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://patch.msgid.link/20251113233746.1703361-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 16:22:35 -08:00
Chao Gao
11d984633f KVM: x86: Allocate/free user_return_msrs at kvm.ko (un)loading time
Move user_return_msrs allocation/free from vendor modules (kvm-intel.ko and
kvm-amd.ko) (un)loading time to kvm.ko's to make it less risky to access
user_return_msrs in kvm.ko. Tying the lifetime of user_return_msrs to
vendor modules makes every access to user_return_msrs prone to
use-after-free issues as vendor modules may be unloaded at any time.

Opportunistically turn the per-CPU variable into full structs, as there's
no practical difference between statically allocating the memory and
allocating it unconditionally during module_init().

Zero out kvm_nr_uret_msrs on vendor module exit to further minimize the
chances of consuming stale data, and WARN on vendor module load if KVM
thinks there are existing user-return MSRs.

Note!  The user-return MSRs also need to be "destroyed" if
ops->hardware_setup() fails, as both SVM and VMX expect common KVM to
clean up (because common code, not vendor code, is responsible for
kvm_nr_uret_msrs).

Signed-off-by: Chao Gao <chao.gao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251108013601.902918-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18 15:53:54 -08:00
Encrow Thorne
f3d8b64ee4 reset: fix BIT macro reference
RESET_CONTROL_FLAGS_BIT_* macros use BIT(), but reset.h does not
include bits.h. This causes compilation errors when including
reset.h standalone.

Include bits.h to make reset.h self-contained.

Suggested-by: Troy Mitchell <troy.mitchell@linux.dev>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.dev>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Encrow Thorne <jyc0019@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Dan Carpenter
cea5d43b63 reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe
The devm_regmap_field_alloc() function never returns NULL, it returns
error pointers.  Update the error checking to match.

Fixes: 58128aa88867 ("reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Yao Zi
da91533c2b reset: th1520: Support reset controllers in more subsystems
Introduce reset controllers for AP, MISC, VI, VP and DSP subsystems and
add their reset signal mappings.

Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Yao Zi
0040d9eac3 reset: th1520: Prepare for supporting multiple controllers
TH1520 SoC is divided into several subsystems, shipping distinct reset
controllers with similar control logic. Let's make reset signal mapping
a data structure specific to one compatible to prepare for introduction
of more reset controllers in the future.

Signed-off-by: Yao Zi <ziyao@disroot.org>
Acked-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Yao Zi
a35ac6f3bd dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys
TH1520 SoC is divided into several subsystems, most of them have
distinct reset controllers. Let's document reset controllers other than
the one for VO subsystem and IDs for their reset signals.

Signed-off-by: Yao Zi <ziyao@disroot.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Yao Zi
5334eb9de7 dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets
Registers in control of TH1520_RESET_ID_{NPU,WDT0,WDT1} belong to AP
reset controller, not the VO one which is documented as
"thead,th1520-reset" and is the only reset controller supported for
TH1520 for now.

Let's remove the IDs, leaving them to be implemented by AP-subsystem
reset controller in the future.

Fixes: 30e7573bab ("dt-bindings: reset: Add T-HEAD TH1520 SoC Reset Controller")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Bartosz Golaszewski
8bffbfdc01 reset: remove legacy reset lookup code
There are no more users of this code. Let's remove the exported symbols
and the implementation from reset core.

Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
[p.zabel@pengutronix.de: folded in 8e6ec20e-8965-4b42-99fc-0462269ff2f1@paulmck-laptop]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Bartosz Golaszewski
a86aaf2b62 clk: davinci: psc: drop unused reset lookup
We no longer support any non-DT DaVinci boards so there are no more
users of legacy reset lookup.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: David Lechner <david@lechnology.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Claudiu Beznea
819ac6b055 reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC
The Renesas RZ/G3S SoC USB PHY HW block receives as input the USB PWRRDY
signal from the system controller. Add support for the Renesas RZ/G3S SoC.

Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Claudiu Beznea
78f2d64e48 reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY
On the Renesas RZ/G3S SoC, the USB PHY block has an input signal called
PWRRDY. This signal is managed by the system controller and must be
de-asserted after powering on the area where USB PHY resides and asserted
before powering it off.

On power-on/resume the USB PWRRDY signal need to be de-asserted before
enabling clock and switching the module to normal state (through MSTOP
support). The power-on/resume configuration sequence must be:

1/ PWRRDY=0
2/ CLK_ON=1
3/ MSTOP=0

On power-off/suspend the configuration sequence should be:

1/ MSTOP=1
2/ CLK_ON=0
3/ PWRRDY=1

The CLK_ON and MSTOP functionalities are controlled by clock drivers.
The suspend/resume support will be handled by different patches.

After long discussions with the internal HW team, it has been confirmed
that the HW connection b/w USB PHY block, the USB channels, the system
controller, clock, MSTOP, PWRRDY signal is as follows:

                               ┌──────────────────────────────┐
                               │                              │◄── CPG_CLKON_USB.CLK0_ON
                               │     USB CH0                  │
┌──────────────────────────┐   │┌───────────────────────────┐ │◄── CPG_CLKON_USB.CLK2_ON
│                 ┌────────┐   ││host controller registers  │ │
│                 │        │   ││function controller registers│
│                 │ PHY0   │◄──┤└───────────────────────────┘ │
│     USB PHY     │        │   └────────────▲─────────────────┘
│                 └────────┘                │
│                          │    CPG_BUS_PERI_COM_MSTOP.MSTOP{6, 5}_ON
│┌──────────────┐ ┌────────┐
││USHPHY control│ │        │
││  registers   │ │ PHY1   │   ┌──────────────────────────────┐
│└──────────────┘ │        │◄──┤     USB CH1                  │
│                 └────────┘   │┌───────────────────────────┐ │◄── CPG_CLKON_USB.CLK1_ON
└─▲───────▲─────────▲──────┘   ││ host controller registers │ │
  │       │         │          │└───────────────────────────┘ │
  │       │         │          └────────────▲─────────────────┘
  │       │         │                       │
  │       │         │           CPG_BUS_PERI_COM_MSTOP.MSTOP7_ON
  │PWRRDY │         │
  │       │   CPG_CLK_ON_USB.CLK3_ON
  │       │
  │  CPG_BUS_PERI_COM_MSTOP.MSTOP4_ON
  │
┌────┐
│SYSC│
└────┘

where:
- CPG_CLKON_USB.CLK.CLKX_ON is the register bit controlling the clock X
  of different USB blocks, X in {0, 1, 2, 3}
- CPG_BUS_PERI_COM_MSTOP.MSTOPX_ON is the register bit controlling the
  MSTOP of different USB blocks, X in {4, 5, 6, 7}
- USB PHY is the USB PHY block exposing 2 ports, port0 and port1, used
  by the USB CH0, USB CH1
- SYSC is the system controller block controlling the PWRRDY signal
- USB CHx are individual USB block with host and function capabilities
  (USB CH0 have both host and function capabilities, USB CH1 has only
  host capabilities)

The USBPHY control registers are controlled though the
reset-rzg2l-usbphy-ctrl driver. The USB PHY ports are controlled by
phy_rcar_gen3_usb2 (drivers/phy/renesas/phy-rcar-gen3-usb2.c file). The
USB PHY ports request resets from the reset-rzg2l-usbphy-ctrl driver.

The connection b/w the system controller and the USB PHY CTRL driver is
implemented through the renesas,sysc-pwrrdy device tree property
proposed in this patch. This property specifies the register offset and the
bitmask required to control the PWRRDY signal.

Since the USB PHY CTRL driver needs to be probed before any other
USB-specific driver on RZ/G3S, control of PWRRDY is passed exclusively
to it. This guarantees the correct configuration sequence between clocks,
MSTOP bits, and the PWRRDY bit on probe/resume and remove/suspend. At the
same time, changes are kept minimal by avoiding modifications to the USB
PHY driver to also handle the PWRRDY itself.

Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Claudiu Beznea
20eee0f69c dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support
The Renesas USB PHY hardware block needs to have the PWRRDY bit in the
system controller set before applying any other settings. The PWRRDY bit
must be controlled during power-on, power-off, and system suspend/resume
sequences as follows:
- during power-on/resume, it must be set to zero before enabling clocks and
  modules
- during power-off/suspend, it must be set to one after disabling clocks
  and modules

Add the renesas,sysc-pwrrdy device tree property, which allows the
reset-rzg2l-usbphy-ctrl driver to parse, map, and control the system
controller PWRRDY bit at the appropriate time. Along with it add a new
compatible for the RZ/G3S SoC.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Xuyang Dong
0884bd97c0 reset: eswin: Add eic7700 reset driver
Add support for reset controller in eic7700 series chips.
Provide functionality for asserting and deasserting resets
on the chip.

Signed-off-by: Yifeng Huang <huangyifeng@eswincomputing.com>
Signed-off-by: Xuyang Dong <dongxuyang@eswincomputing.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Xuyang Dong
23818ebb9c dt-bindings: reset: eswin: Documentation for eic7700 SoC
Add device tree binding documentation and header file for the ESWIN
eic7700 reset controller module.

Signed-off-by: Yifeng Huang <huangyifeng@eswincomputing.com>
Signed-off-by: Xuyang Dong <dongxuyang@eswincomputing.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:54 +01:00
Robert Marko
bf919ccfce reset: sparx5: add LAN969x support
LAN969x uses the same reset configuration as LAN966x, but we need to
allow compiling it when ARCH_LAN969X is selected.

A fallback compatible to LAN966x will be used.

Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:53 +01:00
Robert Marko
fc80e57c51 dt-bindings: reset: microchip: Add LAN969x support
LAN969x also uses the Microchip reset driver, it reuses the LAN966x
support so use a fallback compatible.

Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18 17:52:53 +01:00
Darrick J. Wong
28fec8b95e fuse: signal that a fuse inode should exhibit local fs behaviors
Create a new fuse inode flag that indicates that the kernel should
implement various local filesystem behaviors instead of passing vfs
commands straight through to the fuse server and expecting the server to
do all the work.  For example, this means that we'll use the kernel to
transform some ACL updates into mode changes, and later to do
enforcement of the immutable and append iflags.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-18 16:29:42 +01:00
Ritesh Harjani (IBM)
5b3a426aff powerpc/64s/slb: Add no_slb_preload early cmdline param
no_slb_preload cmdline can come useful in quickly disabling and/or
testing the performance impact of userspace slb preloads. Recently there
was a slb multi-hit issue due to slb preload cache which was very
difficult to triage. This cmdline option allows to quickly disable
preloads and verify if the issue exists in preload cache or somewhere
else. This can also be a useful option to see the effect of slb preloads
for any application workload e.g. number of slb faults with or w/o slb
preloads.

with slb_preload:
slb_faults (minimal initrd boot): 15
slb_faults (full systemd boot): 300

with no_slb_preload:
slb_faults (minimal initrd boot): 33
slb_faults (full systemd boot): 138180

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/de484b55c45d831bc2db63945f455153c89a9a65.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:53 +05:30
Ritesh Harjani (IBM)
2a492d6b38 powerpc/64s/slb: Make preload_add return type as void
We dropped preload_new_slb_context() & slb_setup_new_exec() in a
previous patch. That means we don't really need preload_add() return
type anymore. So let's make its return type as void.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/88b07223e93b0fc56c1119f6889b7d9c83e20109.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:53 +05:30
Ritesh Harjani (IBM)
3d44be297e powerpc/ptdump: Dump PXX level info for kernel_page_tables
This patch adds PGD/PUD/PMD/PTE level information while dumping kernel
page tables. Before this patch it was hard to identify which entries
belongs to which page table level e.g.

~ # dmesg |grep -i radix
[0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000005400000 with 2.00 MiB pages (exec)
[0.000000] radix-mmu: Mapped 0x0000000005400000-0x0000000040000000 with 2.00 MiB pages
[0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
[0.000000] radix-mmu: Initializing Radix MMU

Before:
---[ Start of kernel VM ]---
0xc000000000000000-0xc000000003ffffff  XXX   64M   r      X   pte  valid  present  dirty  accessed
0xc000000004000000-0xc00000003fffffff  XXX  960M   r  w       pte  valid  present  dirty  accessed
0xc000000040000000-0xc0000000ffffffff  XXX    3G   r  w       pte  valid  present  dirty  accessed
...
---[ vmemmap start ]---
0xc00c000000000000-0xc00c0000003fffff  XXX    4M   r  w       pte  valid  present  dirty  accessed

After:
---[ Start of kernel VM ]---
0xc000000000000000-0xc000000003ffffff  XXX   64M PMD  r      X   pte  valid  present  dirty  accessed
0xc000000004000000-0xc00000003fffffff  XXX  960M PMD  r  w       pte  valid  present  dirty  accessed
0xc000000040000000-0xc0000000ffffffff  XXX    3G PUD  r  w       pte  valid  present  dirty  accessed
...
---[ vmemmap start ]---
0xc00c000000000000-0xc00c0000003fffff  XXX    4M PMD  r  w       pte  valid  present  dirty  accessed

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/95defb675ee5607ef3923a1e6aeac39311b8fad4.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:53 +05:30
Ritesh Harjani (IBM)
6394f0e8ab powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash
This patch enables the directMap counters to be printed in proc/meminfo
for Hash mmu. With this patch on a system with 8G of DRAM we can see the
entire RAM mapped with 16M pagesize:

cat /proc/meminfo |grep -i direct
DirectMap4k:           0 kB
DirectMap64k:          0 kB
DirectMap16M:    8388608 kB
DirectMap16G:          0 kB

Tested with devdax too:
root@buildroot:/# ndctl create-namespace -r region0 -m devdax -s 2G
{
  "dev":"namespace0.0",
  "mode":"devdax",
  "map":"dev",
  "size":"2032.00 MiB (2130.71 MB)",
  "uuid":"aa383ded-cd99-43a0-979f-5225467cfb40",
  "daxregion":{
    "id":0,
    "size":"2032.00 MiB (2130.71 MB)",
    "align":16777216,
    "devices":[
      {
        "chardev":"dax0.0",
        "size":"2032.00 MiB (2130.71 MB)",
        "target_node":0,
        "align":"16.00 MiB (16.78 MB)",
        "mode":"devdax"
      }
    ]
  },
  "align":16777216
}
root@buildroot:/# cat /proc/meminfo |grep -i direct
DirectMap4k:           0 kB
DirectMap64k:          0 kB
DirectMap16M:   10485760 kB
DirectMap16G:          0 kB

root@buildroot:/# ndctl destroy-namespace -f all
destroyed 1 namespace
root@buildroot:/# cat /proc/meminfo |grep -i direct
DirectMap4k:           0 kB
DirectMap64k:          0 kB
DirectMap16M:    8388608 kB
DirectMap16G:          0 kB

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/ffe47912e17075649b6ce5b2ee5d7f3eb5352e0b.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:53 +05:30
Ritesh Harjani (IBM)
b296fda58d powerpc/64s/hash: Update directMap page counters for Hash
Update the directMap page counters for Hash. Hash by default always uses
mmu_linear_psize only, for it's directMap. However, once the kernel has
booted and the dmesg log is wrapped over there is no way of knowing the
kernel linear pagesize with Hash mmu. Features like debug_page_alloc can
make mmu_linear_psize to be PAGE_SIZE instead of PMD / PUD mappings. It
would be easier if we have this info printed in proc meminfo similar to
Radix for debugging purposes.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/208e6f946d2ba9c1e2b8b4f665728abe5c891e7c.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:53 +05:30
Ritesh Harjani (IBM)
b80691e25e powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU
This disables creating hpt_order debugfs entry with radix mode.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/99237176a51c73e85f4a7edd60a2460017882d69.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:52 +05:30
Ritesh Harjani (IBM)
fec40fe7e6 powerpc/64s/hash: Improve hash mmu printk messages
Let's use pr_info() instead of printk() in order to utilize the pr_fmt
set to "hash-mmu:". This improves the debug messages that are spitted
out during kernel bootup.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/a2af842f85c099cdbd19bf468606960c5226a079.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:52 +05:30
Ritesh Harjani (IBM)
178dd2ee2b powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize()
We get below errors when we try to enable debug logs in book3s64/hash_utils.c
This patch fixes these errors related to phys_addr_t printf format.

arch/powerpc/mm/book3s64/hash_utils.c: In function ‘htab_initialize’:
arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
 1401 |                 DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
cc1: all warnings being treated as errors
make[6]: *** [../scripts/Makefile.build:287: arch/powerpc/mm/book3s64/hash_utils.o] Error 1

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/4873e9692fc4411099c9741005d218d5e734c345.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:52 +05:30
Ritesh Harjani (IBM)
eae40a6da6 powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format
HPTE format was changed since Power9 (ISA 3.0) onwards. While dumping
kernel hash page tables, nothing gets printed on powernv P9+. This patch
utilizes the helpers added in the patch tagged as fixes, to convert new
format to old format and dump the hptes. This fix is only needed for
native_find() (powernv), since pseries continues to work fine with the
old format.

Fixes: 6b243fcfb5 ("powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/4c2bb9e5b3cfbc0dd80b61b67cdd3ccfc632684c.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:52 +05:30
Ritesh Harjani (IBM)
17b45ccf09 powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit
When HV=0 & IR/DR=0, the Hash MMU is said to be in Virtual Real
Addressing Mode during early boot. During this, we should ensure that
memory region allocations for stress_hpt_struct should happen from
within RMA region as otherwise the boot might get stuck while doing
memset of this region.

History behind why do we have RMA region limitation is better explained
in these 2 patches [1] & [2]. This patch ensures that memset to
stress_hpt_struct during early boot does not cross ppc64_rma_size
boundary.

[1]: https://lore.kernel.org/all/20190710052018.14628-1-sjitindarsingh@gmail.com/
[2]: https://lore.kernel.org/all/87wp54usvj.fsf@linux.vnet.ibm.com/

Fixes: 6b34a099fa ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/ada1173933ea7617a994d6ee3e54ced8797339fc.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:52 +05:30
Donet Tom
00312419f0 powerpc/64s/slb: Fix SLB multihit issue during SLB preload
On systems using the hash MMU, there is a software SLB preload cache that
mirrors the entries loaded into the hardware SLB buffer. This preload
cache is subject to periodic eviction — typically after every 256 context
switches — to remove old entry.

To optimize performance, the kernel skips switch_mmu_context() in
switch_mm_irqs_off() when the prev and next mm_struct are the same.
However, on hash MMU systems, this can lead to inconsistencies between
the hardware SLB and the software preload cache.

If an SLB entry for a process is evicted from the software cache on one
CPU, and the same process later runs on another CPU without executing
switch_mmu_context(), the hardware SLB may retain stale entries. If the
kernel then attempts to reload that entry, it can trigger an SLB
multi-hit error.

The following timeline shows how stale SLB entries are created and can
cause a multi-hit error when a process moves between CPUs without a
MMU context switch.

CPU 0                                   CPU 1
-----                                    -----
Process P
exec                                    swapper/1
 load_elf_binary
  begin_new_exc
    activate_mm
     switch_mm_irqs_off
      switch_mmu_context
       switch_slb
       /*
        * This invalidates all
        * the entries in the HW
        * and setup the new HW
        * SLB entries as per the
        * preload cache.
        */
context_switch
sched_migrate_task migrates process P to cpu-1

Process swapper/0                       context switch (to process P)
(uses mm_struct of Process P)           switch_mm_irqs_off()
                                         switch_slb
                                           load_slb++
                                            /*
                                            * load_slb becomes 0 here
                                            * and we evict an entry from
                                            * the preload cache with
                                            * preload_age(). We still
                                            * keep HW SLB and preload
                                            * cache in sync, that is
                                            * because all HW SLB entries
                                            * anyways gets evicted in
                                            * switch_slb during SLBIA.
                                            * We then only add those
                                            * entries back in HW SLB,
                                            * which are currently
                                            * present in preload_cache
                                            * (after eviction).
                                            */
                                        load_elf_binary continues...
                                         setup_new_exec()
                                          slb_setup_new_exec()

                                        sched_switch event
                                        sched_migrate_task migrates
                                        process P to cpu-0

context_switch from swapper/0 to Process P
 switch_mm_irqs_off()
  /*
   * Since both prev and next mm struct are same we don't call
   * switch_mmu_context(). This will cause the HW SLB and SW preload
   * cache to go out of sync in preload_new_slb_context. Because there
   * was an SLB entry which was evicted from both HW and preload cache
   * on cpu-1. Now later in preload_new_slb_context(), when we will try
   * to add the same preload entry again, we will add this to the SW
   * preload cache and then will add it to the HW SLB. Since on cpu-0
   * this entry was never invalidated, hence adding this entry to the HW
   * SLB will cause a SLB multi-hit error.
   */
load_elf_binary continues...
 START_THREAD
  start_thread
   preload_new_slb_context
   /*
    * This tries to add a new EA to preload cache which was earlier
    * evicted from both cpu-1 HW SLB and preload cache. This caused the
    * HW SLB of cpu-0 to go out of sync with the SW preload cache. The
    * reason for this was, that when we context switched back on CPU-0,
    * we should have ideally called switch_mmu_context() which will
    * bring the HW SLB entries on CPU-0 in sync with SW preload cache
    * entries by setting up the mmu context properly. But we didn't do
    * that since the prev mm_struct running on cpu-0 was same as the
    * next mm_struct (which is true for swapper / kernel threads). So
    * now when we try to add this new entry into the HW SLB of cpu-0,
    * we hit a SLB multi-hit error.
    */

WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62
assert_slb_presence+0x2c/0x50(48 results) 02:47:29 [20157/42149]
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12
VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected)
0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP:  c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700   Not tainted  (6.16.0-rc3-dirty)
MSR:  8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 28888482  XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
  0x7fffceb5ffff (unreliable)
  preload_new_slb_context+0x100/0x1a0
  start_thread+0x26c/0x420
  load_elf_binary+0x1b04/0x1c40
  bprm_execve+0x358/0x680
  do_execveat_common+0x1f8/0x240
  sys_execve+0x58/0x70
  system_call_exception+0x114/0x300
  system_call_common+0x160/0x2c4

>From the above analysis, during early exec the hardware SLB is cleared,
and entries from the software preload cache are reloaded into hardware
by switch_slb. However, preload_new_slb_context and slb_setup_new_exec
also attempt to load some of the same entries, which can trigger a
multi-hit. In most cases, these additional preloads simply hit existing
entries and add nothing new. Removing these functions avoids redundant
preloads and eliminates the multi-hit issue. This patch removes these
two functions.

We tested process switching performance using the context_switch
benchmark on POWER9/hash, and observed no regression.

Without this patch: 129041 ops/sec
With this patch:    129341 ops/sec

We also measured SLB faults during boot, and the counts are essentially
the same with and without this patch.

SLB faults without this patch: 19727
SLB faults with this patch:    19786

Fixes: 5434ae7462 ("powerpc/64s/hash: Add a SLB preload cache")
cc: stable@vger.kernel.org
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/0ac694ae683494fe8cadbd911a1a5018d5d3c541.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:52 +05:30
Dave Vasilevsky
78fc63ffa7 powerpc, mm: Fix mprotect on book3s 32-bit
On 32-bit book3s with hash-MMUs, tlb_flush() was a no-op. This was
unnoticed because all uses until recently were for unmaps, and thus
handled by __tlb_remove_tlb_entry().

After commit 4a18419f71 ("mm/mprotect: use mmu_gather") in kernel 5.19,
tlb_gather_mmu() started being used for mprotect as well. This caused
mprotect to simply not work on these machines:

  int *ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
                  MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  *ptr = 1; // force HPTE to be created
  mprotect(ptr, 4096, PROT_READ);
  *ptr = 2; // should segfault, but succeeds

Fixed by making tlb_flush() actually flush TLB pages. This finally
agrees with the behaviour of boot3s64's tlb_flush().

Fixes: 4a18419f71 ("mm/mprotect: use mmu_gather")
Cc: stable@vger.kernel.org
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251116-vasi-mprotect-g3-v3-1-59a9bd33ba00@vasilevsky.ca
2025-11-18 12:34:17 +05:30
Al Viro
eb028c3345 d_make_discardable(): warn if given a non-persistent dentry
At this point there are very few call chains that might lead to
d_make_discardable() on a dentry that hadn't been made persistent:
calls of simple_unlink() and simple_rmdir() in configfs and
apparmorfs.

Both filesystems do pin (part of) their contents in dcache, but
they are currently playing very unusual games with that.  Converting
them to more usual patterns might be possible, but it's definitely
going to be a long series of changes in both cases.

For now the easiest solution is to have both stop using simple_unlink()
and simple_rmdir() - that allows to make d_make_discardable() warn
when given a non-persistent dentry.

Rather than giving them full-blown private copies (with calls of
d_make_discardable() replaced with dput()), let's pull the parts of
simple_unlink() and simple_rmdir() that deal with timestamps and link
counts into separate helpers (__simple_unlink() and __simple_rmdir()
resp.) and have those used by configfs and apparmorfs.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
ca459ca70f kill securityfs_recursive_remove()
it's an unused alias for securityfs_remove()

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
2026c6f8eb convert securityfs
securityfs uses simple_recursive_removal(), but does not bother to mark
dentries persistent.  This is the only place where it still happens; get
rid of that irregularity.

* use simple_{start,done}_creating() and d_make_persitent(); kill_litter_super()
use was already gone, since we empty the filesystem instance before it gets
shut down.

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
fc45aee662 get rid of kill_litter_super()
Not used anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
4433d8e25d convert rust_binderfs
Parallel to binderfs stuff:
	* use simple_start_creating()/simple_done_creating()/d_make_persistent()
instead of manual inode_lock()/lookup_noperm()/d_instanitate()/inode_unlock().
	* allocate inode first - simpler cleanup that way.
	* use simple_recursive_removal() instead of open-coding it.
	* switch to kill_anon_super()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
0c9a266868 convert nfsctl
One instance per net-ns.  There's a fixed subset (several files in root,
an optional symlink in root + initially empty /clients/) + per-client
subdirectory in /clients/.  Clients can appear only after the filesystem
is there and they are all gone before it gets through ->kill_sb().

Fixed subset created in fill_super(), regular files by simple_fill_super(),
then a subdirectory and a symlink - manually.  It is removed by
kill_litter_super().

Per-client subdirectories are created by nfsd_client_mkdir() (populated
with client-supplied list of files in them).  Removed by nfsd_client_rmdir(),
which is simple_recursive_removal().

All dentries except for the ones from simple_fill_super() come from
	* nfsd_mkdir() (subdirectory, dentry from simple_start_creating()).
	  Called from fill_super() (creates initially empty /clients)
	  and from nfsd_client_mkdir (creates a per-client subdirectory
	  in /clients).
	* _nfsd_symlink() (symlink, dentry from simple_start_creating()), called
	  from fill_super().
	* nfsdfs_create_files() (regulars, dentry from simple_start_creating()),
	  called only from nfsd_client_mkdir().

Turn d_instatiate() + inode_unlock() into d_make_persistent() + simple_done_creating()
in nfsd_mkdir(), _nfsd_symlink() and nfsdfs_create_files() and we are done.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
946e225677 convert rpc_pipefs
Just use d_make_persistent() + dput() (and fold the latter into
simple_finish_creating()) and that's it...

NOTE: pipe->dentry is a borrowed reference - it does not contribute
to dentry refcount.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
2a3d40476b convert hypfs
just have hypfs_create_file() do the usual simple_start_creating()/
d_make_persistent()/simple_done_creating() and that's it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
723c2ba859 hypfs: swich hypfs_create_u64() to returning int
same story as for hypfs_create_str()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
63f76f51fe hypfs: switch hypfs_create_str() to returning int
Every single caller only cares about PTR_ERR_OR_ZERO() of return value...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
781716cd4a hypfs: don't pin dentries twice
hypfs dentries end up with refcount 2 when they are not busy.
Refcount 1 is enough to keep them pinned, and going that way
allows to simplify things nicely:
	* don't need to drop an extra reference before the
call of kill_litter_super() in ->kill_sb(); all we need
there is to reset the cleanup list - everything on it will
be taken out automatically.
	* we can make use of simple_recursive_removal() on
tree rebuilds; just make sure that only children of root
end up in the cleanup list and hypfs_delete_tree() becomes
much simpler

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
57db9d428b convert gadgetfs
same as functionfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
b65e3b11e9 gadgetfs: switch to simple_remove_by_name()
No need to return dentry from gadgetfs_create_file() or keep it around
afterwards.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Al Viro
6ca67378d0 convert functionfs
All files are regular; ep0 is there all along, other ep* may appear
and go away during the filesystem lifetime; all of those are guaranteed
to be gone by the time we umount it.

Object creation is in ffs_sb_create_file(), removals - at ->kill_sb()
time (for ep0) or by simple_remove_by_name() from ffs_epfiles_destroy()
(for the rest of them).

Switch ffs_sb_create_file() to simple_start_creating()/d_make_persistent()/
simple_done_creating() and that's it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:05 -05:00
Al Viro
c7747fafab functionfs: switch to simple_remove_by_name()
No need to return dentry from ffs_sb_create_file() or keep it around
afterwards.

To avoid subtle issues with getting to ffs from epfiles in
ffs_epfiles_destroy(), pass the superblock as explicit argument.
Callers have it anyway.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:57:39 -05:00
Al Viro
e5bf5ee266 functionfs: fix the open/removal races
ffs_epfile_open() can race with removal, ending up with file->private_data
pointing to freed object.

There is a total count of opened files on functionfs (both ep0 and
dynamic ones) and when it hits zero, dynamic files get removed.
Unfortunately, that removal can happen while another thread is
in ffs_epfile_open(), but has not incremented the count yet.
In that case open will succeed, leaving us with UAF on any subsequent
read() or write().

The root cause is that ffs->opened is misused; atomic_dec_and_test() vs.
atomic_add_return() is not a good idea, when object remains visible all
along.

To untangle that
	* serialize openers on ffs->mutex (both for ep0 and for dynamic files)
	* have dynamic ones use atomic_inc_not_zero() and fail if we had
zero ->opened; in that case the file we are opening is doomed.
	* have the inodes of dynamic files marked on removal (from the
callback of simple_recursive_removal()) - clear ->i_private there.
	* have open of dynamic ones verify they hadn't been already removed,
along with checking that state is FFS_ACTIVE.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:54:42 -05:00
Al Viro
1544775687 functionfs: need to cancel ->reset_work in ->kill_sb()
... otherwise we just might free ffs with ffs->reset_work
still on queue.  That needs to be done after ffs_data_reset() -
that's the cutoff point for configfs accesses (serialized
on gadget_info->lock), which is where the schedule_work()
would come from.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:53:51 -05:00
Al Viro
fe47466282 functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
A reference is held by the superblock (it's dropped in ffs_kill_sb())
and filesystem will not get to ->kill_sb() while there are any opened
files, TYVM...

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:52:57 -05:00
Al Viro
fcb8985143 functionfs: don't abuse ffs_data_closed() on fs shutdown
ffs_data_closed() has a seriously confusing logics in it: in addition
to the normal "decrement a counter and do some work if it hits zero"
there's "... and if it has somehow become negative, do that" bit.

It's not a race, despite smelling rather fishy.  What really happens
is that in addition to "call that on close of files there, to match
the increments of counter on opens" there's one call in ->kill_sb().
Counter starts at 0 and never goes negative over the lifetime of
filesystem (or we have much worse problems everywhere - ->release()
call of some file somehow unpaired with successful ->open() of the
same).  At the filesystem shutdown it will be 0 or, again, we have
much worse problems - filesystem instance destroyed with files on it
still open.  In other words, at that call and at that call alone
the decrement would go from 0 to -1, hitting that chunk (and not
hitting the "if it hits 0" part).

So that check is a weirdly spelled "called from ffs_kill_sb()".
Just expand the call in the latter and kill the misplaced chunk
in ffs_data_closed().

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:52:20 -05:00
Zixian Zeng
af5eb17ff8 riscv: dts: sophgo: Enable SPI NOR node for SG2042_EVB_V2
Enable SPI NOR node for SG2042_EVB_V2 device tree

According to SG2042_EVB_V2 schematic, SPI-NOR Flash cannot support QSPI
due to hardware design. Thus spi-(tx|rx)-bus-width must be set to 1.

Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Signed-off-by: Zixian Zeng <sycamoremoon376@gmail.com>
Link: https://lore.kernel.org/r/20250916-sfg-spidts-v2-4-b5d9024fe1c8@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-18 09:17:55 +08:00
Zixian Zeng
11f4d84c9f riscv: dts: sophgo: Enable SPI NOR node for SG2042_EVB_V1
Enable SPI NOR node for SG2042_EVB_V1 device tree

According to SG2042_EVB_V1 schematic, SPI-NOR Flash cannot support QSPI
due to hardware design. Thus spi-(tx|rx)-bus-width must be set to 1.

Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Signed-off-by: Zixian Zeng <sycamoremoon376@gmail.com>
Link: https://lore.kernel.org/r/20250916-sfg-spidts-v2-3-b5d9024fe1c8@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-18 09:17:55 +08:00
Zixian Zeng
f49314cbbc riscv: dts: sophgo: Enable SPI NOR node for PioneerBox
Enable SPI NOR node for PioneerBox device tree

According to PioneerBox schematic, SPI-NOR Flash cannot support QSPI
due to hardware design. Thus spi-(tx|rx)-bus-width must be set to 1.

Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Tested-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Zixian Zeng <sycamoremoon376@gmail.com>
Link: https://lore.kernel.org/r/20250916-sfg-spidts-v2-2-b5d9024fe1c8@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-18 09:17:55 +08:00
Zixian Zeng
59dc89fdfe riscv: dts: sophgo: Add SPI NOR node for SG2042
Add SPI NOR controller node for SG2042

Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Tested-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Zixian Zeng <sycamoremoon376@gmail.com>
Link: https://lore.kernel.org/r/20250916-sfg-spidts-v2-1-b5d9024fe1c8@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-18 09:17:55 +08:00
Yicong Yang
b43652d867 memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()
Extend cpu_cache_invalidate_memregion() to support invalidating a
particular range of memory by introducing start and length parameters.
Control of types of invalidation is left for when use cases turn up. For
now everything is Clean and Invalidate.

Where the range is unknown, use the provided cpu_cache_invalidate_all()
helper to act as documentation of intent in a fashion that is clearer than
passing (0, -1) to cpu_cache_invalidate_memregion().

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-17 23:45:45 +00:00
Jonathan Cameron
f49ae86483 memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion()
The res_desc parameter was originally introduced for documentation purposes
and with the idea that with HDM-DB CXL invalidation could be triggered from
the device. That has not come to pass and the continued existence of the
option is confusing when we add a range in the following patch which might
not be a strict subset of the res_desc. So avoid that confusion by dropping
the parameter.

Link: https://lore.kernel.org/linux-mm/686eedb25ed02_24471002e@dwillia2-xfh.jf.intel.com.notmuch/
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-17 23:45:45 +00:00
Pierre-Henry Moussay
d52341da4d dt-bindings: cache: sifive,ccache0: add a pic64gx compatible
The pic64gx use the same IP than mpfs, therefore add compatibility with
mpfs as fallback.

Signed-off-by: Pierre-Henry Moussay <pierre-henry.moussay@microchip.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-17 23:44:12 +00:00
Heiko Stuebner
048213a38e soc: rockchip: grf: Add select correct PWM implementation on RK3368
Similar to the RK3288, the RK3368 has two different implementations of
the PWM block inside the SoC - the newer ones that we have a driver for
and that is used on every SoC and a previous variant that was likely
left as a fallback if the new one creates problems.

The devicetree is already set up for the new variant, so make sure
we actually use it - similar to the RK3288.

Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de>
Link: https://patch.msgid.link/20251021074254.87065-4-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-17 22:34:12 +01:00
Parvathi Pudi
393f40ef96 ARM: multi_v7_defconfig: Enable TI PRU Ethernet driver
The Programmable Real-time Unit and Industrial Communication Subsystem
Megabit (ICSSM) is a microcontroller subsystem in TI SoCs such as
AM57x, AM437x, and AM335x. It provides real-time processing
capabilities for industrial communication and custom peripheral interfaces.

Currently, EVMs based on AM57x, AM437x, and AM335x use the ICSSM driver
for PRU-based Ethernet functionality.

This patch enables TI_PRUSS and TI_PRUETH as a module for TI SoCs.

Signed-off-by: Parvathi Pudi <parvathi@couthit.com>
Link: https://lore.kernel.org/r/20251103125451.1679404-1-parvathi@couthit.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-17 08:53:20 -08:00
Lei Chen
e78fb96b41 KVM: x86: remove comment about ntp correction sync for
Since vcpu local clock is no longer affected by ntp,
remove comment about ntp correction sync for function
kvm_gen_kvmclock_update.

Signed-off-by: Lei Chen <lei.chen@smartx.com>
Link: https://patch.msgid.link/20250819152027.1687487-4-lei.chen@smartx.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-17 07:50:25 -08:00
Lei Chen
446fcce2a5 Revert "x86: kvm: rate-limit global clock updates"
This reverts commit 7e44e4495a.

Commit 7e44e4495a ("x86: kvm: rate-limit global clock updates")
intends to use a kvmclock_update_work to sync ntp corretion
across all vcpus kvmclock, which is based on commit 0061d53daf
("KVM: x86: limit difference between kvmclock updates")

Since kvmclock has been switched to mono raw, this commit can be
reverted.

Signed-off-by: Lei Chen <lei.chen@smartx.com>
Link: https://patch.msgid.link/20250819152027.1687487-3-lei.chen@smartx.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-17 07:50:24 -08:00
Lei Chen
43ddbf16ed Revert "x86: kvm: introduce periodic global clock updates"
This reverts commit 332967a3ea.

Commit 332967a3ea ("x86: kvm: introduce periodic global clock
updates") introduced a 300s interval work to sync ntp corrections
across all vcpus.

Since commit 53fafdbb8b ("KVM: x86: switch KVMCLOCK base to
monotonic raw clock"), kvmclock switched to mono raw clock,
we can no longer take ntp into consideration.

Signed-off-by: Lei Chen <lei.chen@smartx.com>
Link: https://patch.msgid.link/20250819152027.1687487-2-lei.chen@smartx.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-17 07:50:23 -08:00
Sean Christopherson
a091fe60c2 KVM: x86: Grab lapic_timer in a local variable to cleanup periodic code
Stash apic->lapic_timer in a local "ktimer" variable in
advance_periodic_target_expiration() to eliminate a few unaligned wraps,
and to make the code easier to read overall.

No functional change intended.

Link: https://patch.msgid.link/20251113205114.1647493-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-17 07:50:23 -08:00
fuqiang wang
18ab3fc8e8 KVM: x86: Fix VM hard lockup after prolonged inactivity with periodic HV timer
When advancing the target expiration for the guest's APIC timer in periodic
mode, set the expiration to "now" if the target expiration is in the past
(similar to what is done in update_target_expiration()).  Blindly adding
the period to the previous target expiration can result in KVM generating
a practically unbounded number of hrtimer IRQs due to programming an
expired timer over and over.  In extreme scenarios, e.g. if userspace
pauses/suspends a VM for an extended duration, this can even cause hard
lockups in the host.

Currently, the bug only affects Intel CPUs when using the hypervisor timer
(HV timer), a.k.a. the VMX preemption timer.  Unlike the software timer,
a.k.a. hrtimer, which KVM keeps running even on exits to userspace, the
HV timer only runs while the guest is active.  As a result, if the vCPU
does not run for an extended duration, there will be a huge gap between
the target expiration and the current time the vCPU resumes running.
Because the target expiration is incremented by only one period on each
timer expiration, this leads to a series of timer expirations occurring
rapidly after the vCPU/VM resumes.

More critically, when the vCPU first triggers a periodic HV timer
expiration after resuming, advancing the expiration by only one period
will result in a target expiration in the past.  As a result, the delta
may be calculated as a negative value.  When the delta is converted into
an absolute value (tscdeadline is an unsigned u64), the resulting value
can overflow what the HV timer is capable of programming.  I.e. the large
value will exceed the VMX Preemption Timer's maximum bit width of
cpu_preemption_timer_multi + 32, and thus cause KVM to switch from the
HV timer to the software timer (hrtimers).

After switching to the software timer, periodic timer expiration callbacks
may be executed consecutively within a single clock interrupt handler,
because hrtimers honors KVM's request for an expiration in the past and
immediately re-invokes KVM's callback after reprogramming.  And because
the interrupt handler runs with IRQs disabled, restarting KVM's hrtimer
over and over until the target expiration is advanced to "now" can result
in a hard lockup.

E.g. the following hard lockup was triggered in the host when running a
Windows VM (only relevant because it used the APIC timer in periodic mode)
after resuming the VM from a long suspend (in the host).

  NMI watchdog: Watchdog detected hard LOCKUP on cpu 45
  ...
  RIP: 0010:advance_periodic_target_expiration+0x4d/0x80 [kvm]
  ...
  RSP: 0018:ff4f88f5d98d8ef0 EFLAGS: 00000046
  RAX: fff0103f91be678e RBX: fff0103f91be678e RCX: 00843a7d9e127bcc
  RDX: 0000000000000002 RSI: 0052ca4003697505 RDI: ff440d5bfbdbd500
  RBP: ff440d5956f99200 R08: ff2ff2a42deb6a84 R09: 000000000002a6c0
  R10: 0122d794016332b3 R11: 0000000000000000 R12: ff440db1af39cfc0
  R13: ff440db1af39cfc0 R14: ffffffffc0d4a560 R15: ff440db1af39d0f8
  FS:  00007f04a6ffd700(0000) GS:ff440db1af380000(0000) knlGS:000000e38a3b8000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000d5651feff8 CR3: 000000684e038002 CR4: 0000000000773ee0
  PKRU: 55555554
  Call Trace:
   <IRQ>
   apic_timer_fn+0x31/0x50 [kvm]
   __hrtimer_run_queues+0x100/0x280
   hrtimer_interrupt+0x100/0x210
   ? ttwu_do_wakeup+0x19/0x160
   smp_apic_timer_interrupt+0x6a/0x130
   apic_timer_interrupt+0xf/0x20
   </IRQ>

Moreover, if the suspend duration of the virtual machine is not long enough
to trigger a hard lockup in this scenario, since commit 98c25ead5e
("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86"), KVM
will continue using the software timer until the guest reprograms the APIC
timer in some way.  Since the periodic timer does not require frequent APIC
timer register programming, the guest may continue to use the software
timer in perpetuity.

Fixes: d8f2f498d9 ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode")
Cc: stable@vger.kernel.org
Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com>
[sean: massage comments and changelog]
Link: https://patch.msgid.link/20251113205114.1647493-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-17 07:50:22 -08:00
fuqiang wang
9633f180ce KVM: x86: Explicitly set new periodic hrtimer expiration in apic_timer_fn()
When restarting an hrtimer to emulate a the guest's APIC timer in periodic
mode, explicitly set the expiration using the target expiration computed
by advance_periodic_target_expiration() instead of adding the period to
the existing timer.  This will allow making adjustments to the expiration,
e.g. to deal with expirations far in the past, without having to implement
the same logic in both advance_periodic_target_expiration() and
apic_timer_fn().

Cc: stable@vger.kernel.org
Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com>
[sean: split to separate patch, write changelog]
Link: https://patch.msgid.link/20251113205114.1647493-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-17 07:50:21 -08:00
Sean Christopherson
0ea9494be9 KVM: x86: WARN if hrtimer callback for periodic APIC timer fires with period=0
WARN and don't restart the hrtimer if KVM's callback runs with the guest's
APIC timer in periodic mode but with a period of '0', as not advancing the
hrtimer's deadline would put the CPU into an infinite loop of hrtimer
events.  Observing a period of '0' should be impossible, even when the
hrtimer is running on a different CPU than the vCPU, as KVM is supposed to
cancel the hrtimer before changing (or zeroing) the period, e.g. when
switching from periodic to one-shot.

Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251113205114.1647493-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-17 07:50:21 -08:00
Sean Christopherson
b3e5b670c9 KVM: x86: Use "checked" versions of get_user() and put_user()
Use the normal, checked versions for get_user() and put_user() instead of
the double-underscore versions that omit range checks, as the checked
versions are actually measurably faster on modern CPUs (12%+ on Intel,
25%+ on AMD).

The performance hit on the unchecked versions is almost entirely due to
the added LFENCE on CPUs where LFENCE is serializing (which is effectively
all modern CPUs), which was added by commit 304ec1b050 ("x86/uaccess:
Use __uaccess_begin_nospec() and uaccess_try_nospec").  The small
optimizations done by commit b19b74bc99 ("x86/mm: Rework address range
check in get_user() and put_user()") likely shave a few cycles off, but
the bulk of the extra latency comes from the LFENCE.

Don't bother trying to open-code an equivalent for performance reasons, as
the loss of inlining (e.g. see commit ea6f043fc9 ("x86: Make __get_user()
generate an out-of-line call") is largely a non-factor (ignoring setups
where RET is something entirely different),

As measured across tens of millions of calls of guest PTE reads in
FNAME(walk_addr_generic):

              __get_user()  get_user()  open-coded  open-coded, no LFENCE
Intel (EMR)           75.1        67.6        75.3                   65.5
AMD (Turin)           68.1        51.1        67.5                   49.3

Note, Hyper-V MSR emulation is not a remotely hot path, but convert it
anyways for consistency, and because there is a general desire to remove
__{get,put}_user() entirely.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/all/CAHk-=wimh_3jM9Xe8Zx0rpuf8CPDu6DkRCGb44azk0Sz5yqSnw@mail.gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Link: https://patch.msgid.link/20251106210206.221558-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-17 07:50:20 -08:00
Niravkumar L Rabara
38eff72f2d arm64: dts: socfpga: agilex5: update qspi partitions for 013b board
Update qspi flash partitions to support Remote System Update (RSU).

Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-17 05:37:59 -06:00
Max Krummenacher
90bbe88e0e arm64: dts: freescale: add Toradex SMARC iMX95
Add DT support for Toradex SMARC iMX95 SoM and Development carrier
board.

The module consists of an NXP i.MX95 family SoC, up to 16GB of LPDDR5
RAM and up to 128GB of storage, a USB 3.0 Host Hub and 2.0 OTG, two
Gigabit Ethernet PHYs, a 10 Gigabit Ethernet interface, an I2C EEPROM
and Temperature Sensor, an RX8130 RTC, a Quad/Dual lane CSI interface,
and some optional addons: TPM 2.0, DSI, LVDS, DisplayPort (through a
DSI-DP bridge), and Wi-Fi/BT module.

Link: https://www.toradex.com/computer-on-modules/smarc-arm-family/nxp-imx95
Link: https://www.toradex.com/products/carrier-board/smarc-development-board-kit
Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Co-developed-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Co-developed-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Co-developed-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Signed-off-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Co-developed-by: Vitor Soares <vitor.soares@toradex.com>
Signed-off-by: Vitor Soares <vitor.soares@toradex.com>
Co-developed-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 16:39:27 +08:00
João Paulo Gonçalves
5ffac985b5 dt-bindings: arm: fsl: add Toradex SMARC iMX95
Add DT compatible strings for Toradex SMARC iMX95 SoM and
Toradex SMARC Development carrier board.

Link: https://www.toradex.com/computer-on-modules/smarc-arm-family/nxp-imx95
Link: https://www.toradex.com/products/carrier-board/smarc-development-board-kit
Signed-off-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 16:37:55 +08:00
Longbin Li
e307248a3c riscv: dts: sophgo: Add USB support for cv18xx
Add USB controller node for cv18xx and enable it for Huashan Pi, milkv-duo.

Co-developed-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Longbin Li <looong.bin@gmail.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Link: https://lore.kernel.org/r/20251101014329.18439-4-looong.bin@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-17 15:51:55 +08:00
Longbin Li
126a1b3c61 riscv: dts: sophgo: Add syscon node for cv18xx
Add top syscon node and all subdevice nodes for cv18xx series SoC.

Co-developed-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Longbin Li <looong.bin@gmail.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Link: https://lore.kernel.org/r/20251101014329.18439-3-looong.bin@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-17 15:51:55 +08:00
Longbin Li
565c450124 dt-bindings: soc: sophgo: add TOP syscon for CV18XX/SG200X series SoC
The Sophgo CV18XX/SG200X SoC top misc system controller provides register
access to configure related modules. It includes a usb2 phy and a dma
multiplexer.

Co-developed-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Longbin Li <looong.bin@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20251101014329.18439-2-looong.bin@gmail.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-17 15:51:55 +08:00
Jun Guo
7dfe67ab5a arm64: dts: cix: add a compatible string for the cix sky1 SoC
The SPI IP design for the cix sky1 SoC uses a FIFO with a data width
of 32 bits, instead of the default 8 bits. Therefore, a compatible
string is added to specify the FIFO data width configuration for the
cix sky1 SoC.

Signed-off-by: Jun Guo <jun.guo@cixtech.com>
Link: https://lore.kernel.org/r/20251031073003.3289573-4-jun.guo@cixtech.com
Signed-off-by: Peter Chen <peter.chen@cixtech.com>
2025-11-17 12:51:05 +08:00
Hans Zhang
b2bc5a821b arm64: dts: cix: Enable PCIe on the Orion O6 board
Add PCIe RC support on Orion O6 board.

The Orion O6 board includes multiple PCIe root complexes. The current
device tree configuration enables detection and basic operation of PCIe
endpoints on this platform.

GPIO and pinctrl subsystems for this platform are not yet ready for
upstream inclusion. Consequently, attributes such as reset-gpios and
pinctrl configurations are temporarily omitted from the PCIe node
definitions.

Endpoint detection and functionality are confirmed to be operational with
this basic configuration. The missing GPIO and pinctrl support will be
added incrementally in future patches as the dependent subsystems become
available upstream.

Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Hans Zhang <hans.zhang@cixtech.com>
Link: https://lore.kernel.org/r/20251108140305.1120117-11-hans.zhang@cixtech.com
Signed-off-by: Peter Chen <peter.chen@cixtech.com>
2025-11-17 12:49:42 +08:00
Hans Zhang
0b014cd8f1 arm64: dts: cix: Add PCIe Root Complex on sky1
Add pcie_x*_rc node to support Sky1 PCIe driver based on the
Cadence PCIe core.

Supports Gen1/Gen2/Gen3/Gen4, 1/2/4/8 lane, MSI/MSI-x interrupts
using the ARM GICv3.

Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Hans Zhang <hans.zhang@cixtech.com>
Link: https://lore.kernel.org/r/20251108140305.1120117-10-hans.zhang@cixtech.com
Signed-off-by: Peter Chen <peter.chen@cixtech.com>
2025-11-17 12:46:59 +08:00
SeongJae Park
80d725f96c mm/damon/tests/core-kunit: remove unused ctx in damon_test_split_regions_of()
damon_test_split_regions_of() dynamically allocates a 'struct damon_ctx'
object, but it is not really being used in the code other than handling
the allocation failure and deallocating it at the end of the function. 
Remove the unnecessary allocation and deallocation of the object.

Link: https://lkml.kernel.org/r/20251101182021.74868-23-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:35 -08:00
SeongJae Park
40b11d1eb1 mm/damon/tests/core-kunit: remove unnecessary damon_ctx variable on damon_test_split_at()
damon_test_split_at() dynamically allocates a 'struct damon_ctx' object,
but it is not really being used in the code other than handling the
allocation failure and deallocating it at the end of the function.  Remove
the unnecessary allocation and deallocation of the object.

Link: https://lkml.kernel.org/r/20251101182021.74868-22-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:35 -08:00
SeongJae Park
7d808bf139 mm/damon/tests/sysfs-kunit: handle alloc failures on damon_sysfs_test_add_targets()
damon_sysfs_test_add_targets() is assuming all dynamic memory allocation
in it will succeed.  Those are indeed likely in the real use cases since
those allocations are too small to fail, but theoretically those could
fail.  In the case, inappropriate memory access can happen.  Fix it by
appropriately cleanup pre-allocated memory and skip the execution of the
remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-21-sj@kernel.org
Fixes: b8ee5575f7 ("mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:35 -08:00
SeongJae Park
0a63a0e757 mm/damon/tests/vaddr-kunit: handle alloc failures on damon_test_split_evenly_succ()
damon_test_split_evenly_succ() is assuming all dynamic memory allocation
in it will succeed.  Those are indeed likely in the real use cases since
those allocations are too small to fail, but theoretically those could
fail.  In the case, inappropriate memory access can happen.  Fix it by
appropriately cleanup pre-allocated memory and skip the execution of the
remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-20-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:34 -08:00
SeongJae Park
7890e5b5bb mm/damon/tests/vaddr-kunit: handle alloc failures in damon_test_split_evenly_fail()
damon_test_split_evenly_fail() is assuming all dynamic memory allocation
in it will succeed.  Those are indeed likely in the real use cases since
those allocations are too small to fail, but theoretically those could
fail.  In the case, inappropriate memory access can happen.  Fix it by
appropriately cleanup pre-allocated memory and skip the execution of the
remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-19-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:34 -08:00
SeongJae Park
2b22d0fcc6 mm/damon/tests/vaddr-kunit: handle alloc failures on damon_do_test_apply_three_regions()
damon_do_test_apply_three_regions() is assuming all dynamic memory
allocation in it will succeed.  Those are indeed likely in the real use
cases since those allocations are too small to fail, but theoretically
those could fail.  In the case, inappropriate memory access can happen. 
Fix it by appropriately cleanup pre-allocated memory and skip the
execution of the remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-18-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:34 -08:00
SeongJae Park
84be856cc8 mm/damon/tests/core-kunit: handle alloc failures on damon_test_set_filters_default_reject()
damon_test_set_filters_default_reject() is assuming all dynamic memory
allocation in it will succeed.  Those are indeed likely in the real use
cases since those allocations are too small to fail, but theoretically
those could fail.  In the case, inappropriate memory access can happen. 
Fix it by appropriately cleanup pre-allocated memory and skip the
execution of the remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-17-sj@kernel.org
Fixes: 094fb14913 ("mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:34 -08:00
SeongJae Park
d14d5671e7 mm/damon/tests/core-kunit: handle alloc failures on damos_test_filter_out()
damon_test_filter_out() is assuming all dynamic memory allocation in it
will succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-16-sj@kernel.org
Fixes: 26713c8908 ("mm/damon/core-test: add a unit test for __damos_filter_out()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:33 -08:00
SeongJae Park
3e5c4a1a17 mm/damon/tests/core-kunit: handle alloc failure on damos_test_commit_filter()
damon_test_commit_filter() is assuming all dynamic memory allocation in it
will succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-15-sj@kernel.org
Fixes: f6a4a150f1 ("mm/damon/tests/core-kunit: add damos_commit_filter test")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:33 -08:00
SeongJae Park
28ab2265e9 mm/damon/tests/core-kunit: handle alloc failres in damon_test_new_filter()
damon_test_new_filter() is assuming all dynamic memory allocation in it
will succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-14-sj@kernel.org
Fixes: 2a158e956b ("mm/damon/core-test: add a test for damos_new_filter()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:33 -08:00
SeongJae Park
915a2453d8 mm/damon/tests/core-kunit: handle alloc failure on damon_test_set_attrs()
damon_test_set_attrs() is assuming all dynamic memory allocation in it
will succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-13-sj@kernel.org
Fixes: aa13779be6 ("mm/damon/core-test: add a test for damon_set_attrs()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:33 -08:00
SeongJae Park
8cf298c01b mm/damon/tests/core-kunit: handle alloc failures in damon_test_update_monitoring_result()
damon_test_update_monitoring_result() is assuming all dynamic memory
allocation in it will succeed.  Those are indeed likely in the real use
cases since those allocations are too small to fail, but theoretically
those could fail.  In the case, inappropriate memory access can happen. 
Fix it by appropriately cleanup pre-allocated memory and skip the
execution of the remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-12-sj@kernel.org
Fixes: f4c978b659 ("mm/damon/core-test: add a test for damon_update_monitoring_results()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:32 -08:00
SeongJae Park
74d5969995 mm/damon/tests/core-kunit: handle alloc failures in damon_test_set_regions()
damon_test_set_regions() is assuming all dynamic memory allocation in it
will succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-11-sj@kernel.org
Fixes: 62f409560e ("mm/damon/core-test: test damon_set_regions")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:32 -08:00
SeongJae Park
4f835f4e8c mm/damon/tests/core-kunit: handle alloc failures in damon_test_ops_registration()
damon_test_ops_registration() is assuming all dynamic memory allocation in
it will succeed.  Those are indeed likely in the real use cases since
those allocations are too small to fail, but theoretically those could
fail.  In the case, inappropriate memory access can happen.  Fix it by
appropriately cleanup pre-allocated memory and skip the execution of the
remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-10-sj@kernel.org
Fixes: 4f540f5ab4 ("mm/damon/core-test: add a kunit test case for ops registration")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:32 -08:00
SeongJae Park
eded254cb6 mm/damon/tests/core-kunit: handle alloc failures on damon_test_split_regions_of()
damon_test_split_regions_of() is assuming all dynamic memory allocation in
it will succeed.  Those are indeed likely in the real use cases since
those allocations are too small to fail, but theoretically those could
fail.  In the case, inappropriate memory access can happen.  Fix it by
appropriately cleanup pre-allocated memory and skip the execution of the
remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-9-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:32 -08:00
SeongJae Park
0998d27572 mm/damon/tests/core-kunit: handle alloc failures on dasmon_test_merge_regions_of()
damon_test_merge_regions_of() is assuming all dynamic memory allocation in
it will succeed.  Those are indeed likely in the real use cases since
those allocations are too small to fail, but theoretically those could
fail.  In the case, inappropriate memory access can happen.  Fix it by
appropriately cleanup pre-allocated memory and skip the execution of the
remaining tests in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-8-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:31 -08:00
SeongJae Park
3d443dd29a mm/damon/tests/core-kunit: handle alloc failures on damon_test_merge_two()
damon_test_merge_two() is assuming all dynamic memory allocation in it
will succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-7-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:31 -08:00
SeongJae Park
5e80d73f22 mm/damon/tests/core-kunit: handle alloc failures on damon_test_split_at()
damon_test_split_at() is assuming all dynamic memory allocation in it will
succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-6-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:31 -08:00
SeongJae Park
f79f2fc44e mm/damon/tests/core-kunit: handle memory alloc failure from damon_test_aggregate()
damon_test_aggregate() is assuming all dynamic memory allocation in it
will succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-5-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:31 -08:00
SeongJae Park
fafe953de2 mm/damon/tests/core-kunit: handle memory failure from damon_test_target()
damon_test_target() is assuming all dynamic memory allocation in it will
succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-4-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:30 -08:00
SeongJae Park
e16fdd4f75 mm/damon/tests/core-kunit: handle allocation failures in damon_test_regions()
damon_test_regions() is assuming all dynamic memory allocation in it will
succeed.  Those are indeed likely in the real use cases since those
allocations are too small to fail, but theoretically those could fail.  In
the case, inappropriate memory access can happen.  Fix it by appropriately
cleanup pre-allocated memory and skip the execution of the remaining tests
in the failure cases.

Link: https://lkml.kernel.org/r/20251101182021.74868-3-sj@kernel.org
Fixes: 17ccae8bb5 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[5.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:30 -08:00
SeongJae Park
b5ab490d85 mm/damon/tests/core-kunit: fix memory leak in damon_test_set_filters_default_reject()
Patch series "mm/damon/tests: fix memory bugs in kunit tests".

DAMON kunit tests were initially written assuming those will be run on
environments that are well controlled and therefore tolerant to transient
test failures and bugs in the test code itself.  The user-mode linux based
manual run of the tests is one example of such an environment.  And the
test code was written for adding more test coverage as fast as possible,
over making those safe and reliable.

As a result, the tests resulted in having a number of bugs including real
memory leaks, theoretical unhandled memory allocation failures, and unused
memory allocations.  The allocation failures that are not handled well are
unlikely in the real world, since those allocations are too small to fail.
But in theory, it can happen and cause inappropriate memory access.

It is arguable if bugs in test code can really harm users.  But, anyway
bugs are bugs that need to be fixed.  Fix the bugs one by one.  Also Cc
stable@ for the fixes of memory leak and unhandled memory allocation
failures.  The unused memory allocations are only a matter of memory
efficiency, so not Cc-ing stable@.

The first patch fixes memory leaks in the test code for the DAMON core
layer.

Following fifteen, three, and one patches respectively fix unhandled
memory allocation failures in the test code for DAMON core layer, virtual
address space DAMON operation set, and DAMON sysfs interface, one by one
per test function.

Final two patches remove memory allocations that are correctly deallocated
at the end, but not really being used by any code.


This patch (of 22):

Kunit test function for damos_set_filters_default_reject() allocates two
'struct damos_filter' objects and not deallocates those, so that the
memory for the two objects are leaked for every time the test runs.  Fix
this by deallocating those objects at the end of the test code.

Link: https://lkml.kernel.org/r/20251101182021.74868-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251101182021.74868-2-sj@kernel.org
Fixes: 094fb14913 ("mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>	[6.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:30 -08:00
Ankit Agrawal
ebb9aeb980 vfio/nvgrace-gpu: register device memory for poison handling
The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
(Qemu) using remap_pfn_range() without adding the memory to the kernel. 
The device memory pages are not backed by struct page.  The previous patch
implements the mechanism to handle ECC/poison on memory page without
struct page.  This new mechanism is being used here.

The module registers its memory region and the address_space with the
kernel MM for ECC handling using the register_pfn_address_space()
registration API exposed by the kernel.

Link: https://lore.kernel.org/all/20240220115055.23546-1-ankita@nvidia.com/ [1]
Link: https://lkml.kernel.org/r/20251102184434.2406-4-ankita@nvidia.com
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Acked-by: Alex Williamson <alex@shazbot.org>
Cc: Aniket Agashe <aniketa@nvidia.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:30 -08:00
Ankit Agrawal
2ec4196718 mm: handle poisoning of pfn without struct pages
Poison (or ECC) errors can be very common on a large size cluster.  The
kernel MM currently does not handle ECC errors / poison on a memory region
that is not backed by struct pages.  If a memory region mapped using
remap_pfn_range() for example, but not added to the kernel, MM will not
have associated struct pages.  Add a new mechanism to handle memory
failure on such memory.

Make kernel MM expose a function to allow modules managing the device
memory to register the device memory SPA and the address space associated
it.  MM maintains this information as an interval tree.  On poison, MM can
search for the range that the poisoned PFN belong and use the
address_space to determine the mapping VMA.

In this implementation, kernel MM follows the following sequence that is
largely similar to the memory_failure() handler for struct page backed
memory:

1. memory_failure() is triggered on reception of a poison error.  An
   absence of struct page is detected and consequently
   memory_failure_pfn() is executed.

2. memory_failure_pfn() collects the processes mapped to the PFN.

3. memory_failure_pfn() sends SIGBUS to all the processes mapping the
   faulty PFN using kill_procs().

Note that there is one primary difference versus the handling of the
poison on struct pages, which is to skip unmapping to the faulty PFN. 
This is done to handle the huge PFNMAP support added recently [1] that
enables VM_PFNMAP vmas to map at PMD or PUD level.  A poison to a PFN
mapped in such as way would need breaking the PMD/PUD mapping into PTEs
that will get mirrored into the S2.  This can greatly increase the cost of
table walks and have a major performance impact.

Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20251102184434.2406-3-ankita@nvidia.com
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Cc: Aniket Agashe <aniketa@nvidia.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:29 -08:00
Ankit Agrawal
30d0a12910 mm: change ghes code to allow poison of non-struct pfn
Poison (or ECC) errors can be very common on a large size cluster.  The
kernel MM currently handles ECC errors / poison only on memory page backed
by struct page.  The handling is currently missing for the PFNMAP memory
that does not have struct pages.  The series adds such support.

Implement a new ECC handling for memory without struct pages.  Kernel MM
expose registration APIs to allow modules that are managing the device to
register its device memory region.  MM then tracks such regions using
interval tree.

The mechanism is largely similar to that of ECC on pfn with struct pages. 
If there is an ECC error on a pfn, all the mapping to it are identified
and a SIGBUS is sent to the user space processes owning those mappings. 
Note that there is one primary difference versus the handling of the
poison on struct pages, which is to skip unmapping to the faulty PFN. 
This is done to handle the huge PFNMAP support added recently [1] that
enables VM_PFNMAP vmas to map at PMD or PUD level.  A poison to a PFN
mapped in such as way would need breaking the PMD/PUD mapping into PTEs
that will get mirrored into the S2.  This can greatly increase the cost of
table walks and have a major performance impact.

nvgrace-gpu-vfio-pci module maps the device memory to user VA (Qemu) using
remap_pfn_range without being added to the kernel [2].  These device
memory PFNs are not backed by struct page.  So make nvgrace-gpu-vfio-pci
module make use of the mechanism to get poison handling support on the
device memory.


This patch (of 3):

The GHES code allows calling of memory_failure() on the PFNs that pass the
pfn_valid() check.  This contract is broken for the remapped PFNs which
fails the check and ghes_do_memory_failure() returns without triggering
memory_failure().

Update code to allow memory_failure() call on PFNs failing pfn_valid().

Link: https://lkml.kernel.org/r/20251102184434.2406-1-ankita@nvidia.com
Link: https://lkml.kernel.org/r/20251102184434.2406-2-ankita@nvidia.com
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Aniket Agashe <aniketa@nvidia.com>
Cc: Ankit Agrawal <ankita@nvidia.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:29 -08:00
Baokun Li
ee040cbd6e mm/page_alloc: don't warn about large allocations with __GFP_NOFAIL
Filesystems use __GFP_NOFAIL to allocate block-sized folios for metadata
reads at critical points, since they cannot afford to go read-only, shut
down, or enter an inconsistent state due to memory pressure.

Currently, attempting to allocate page units greater than order-1 with the
__GFP_NOFAIL flag triggers a WARN_ON() in __alloc_pages_slowpath(). 
However, filesystems supporting large block sizes (blocksize > PAGE_SIZE)
can easily require allocations larger than order-1.

As Matthew Wilcox noted in [1], if we have a filesystem with 64KiB
sectors, there will be many clean folios in the page cache that are 64KiB
or larger.  He also explained in [2] why kvmalloc isn't a valid approach
here.

With gfp flags and order already included in the OOM report, both
Vlastimil Babka and Michal Hocko suggested that we can take the risk of
removing this warning first and then observe whether a large number of
related OOM reports appear.

If that happens, we can consider adding special handling in other places.

Link: https://lkml.kernel.org/r/20251105085652.4081123-1-libaokun@huaweicloud.com
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/all/aQPX1-XWQjKaMTZB@casper.infradead.org [1]
Link: https://lore.kernel.org/all/aQTHMI3t5mNXp0M1@casper.infradead.org [2]
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/all/188a95ba-6384-4319-bb74-c0d9ec6c4079@suse.cz
Suggested-by: Michal Hocko <mhocko@suse.com>
Link: https://lore.kernel.org/all/aQotQBjnDDeL_wHx@tiehlicka
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: ErKun Yang <yangerkun@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "zhangyi (F)" <yi.zhang@huawei.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:29 -08:00
Zhang Chujun
77a7cfd96c mm/debug: fix missing space in case statement
In setup_vm_debug() , the case statement for 'p' option is written as
'case'p':' without a space between 'case' and the character constant. 
While this is syntactically valid C, it violates the Linux kernel coding
style, which requires a space after 'case'.  This patch adds the missing
space to comply with coding standards.

Link: https://lkml.kernel.org/r/20251103065910.2196-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:29 -08:00
Pedro Demarchi Gomes
05c3fa9c9f ksm: replace function unmerge_ksm_pages with break_ksm
Function unmerge_ksm_pages() is unnecessary since now break_ksm() walks an
address range.  So replace it with break_ksm().

Link: https://lkml.kernel.org/r/20251105184912.186329-4-pedrodemargomes@gmail.com
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:28 -08:00
Pedro Demarchi Gomes
5d4939fc22 ksm: perform a range-walk in break_ksm
Make break_ksm() receive an address range and change break_ksm_pmd_entry()
to perform a range-walk and return the address of the first ksm page
found.

This change allows break_ksm() to skip unmapped regions instead of
iterating every page address.  When unmerging large sparse VMAs, this
significantly reduces runtime.

In a benchmark unmerging a 32 TiB sparse virtual address space where only
one page was populated, the runtime dropped from 9 minutes to less then 5
seconds.

Link: https://lkml.kernel.org/r/20251105184912.186329-3-pedrodemargomes@gmail.com
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:28 -08:00
Pedro Demarchi Gomes
912aa82595 Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
Patch series "ksm: perform a range-walk to jump over holes in break_ksm",
v4.

When unmerging an address range, unmerge_ksm_pages function walks every
page address in the specified range to locate ksm pages.  This becomes
highly inefficient when scanning large virtual memory areas that contain
mostly unmapped regions, causing the process to get blocked for several
minutes.

This patch makes break_ksm, function called by unmerge_ksm_pages for every
page in an address range, perform a range walk, allowing it to skip over
entire unmapped holes in a VMA, avoiding unnecessary lookups.

As pointed out by David Hildenbrand in [1], unmerge_ksm_pages() is called
from:

* ksm_madvise() through madvise(MADV_UNMERGEABLE).  There are not a lot
  of users of that function.

* __ksm_del_vma() through ksm_del_vmas().  Effectively called when
  disabling KSM for a process either through the sysctl or from s390x gmap
  code when enabling storage keys for a VM.

Consider the following test program which creates a 32 TiB mapping in the
virtual address space but only populates a single page:

#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>

/* 32 TiB */
const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;

int main() {
        char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);

        if (area == MAP_FAILED) {
                perror("mmap() failed\n");
                return -1;
        }

        /* Populate a single page such that we get an anon_vma. */
        *area = 0;

        /* Enable KSM. */
        madvise(area, size, MADV_MERGEABLE);
        madvise(area, size, MADV_UNMERGEABLE);
        return 0;
}


Without this patch, this program takes 9 minutes to finish, while with
this patch it finishes in less then 5 seconds.


This patch (of 3):

This reverts commit e317a8d8b4 and changes
function break_ksm_pmd_entry() to use folios.

This reverts break_ksm() to use walk_page_range_vma() instead of
folio_walk_start().

Change break_ksm_pmd_entry() to call is_ksm_zero_pte() only if we know the
folio is present, and also rename variable ret to found.  This will make
it easier to later modify break_ksm() to perform a proper range walk.

Link: https://lkml.kernel.org/r/20251105184912.186329-1-pedrodemargomes@gmail.com
Link: https://lkml.kernel.org/r/20251105184912.186329-2-pedrodemargomes@gmail.com
Link: https://lore.kernel.org/linux-mm/e0886fdf-d198-4130-bd9a-be276c59da37@redhat.com/ [1]
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:28 -08:00
Israel Batista
ed1f8855dd mm: change type of parameter for memory_notify
memory_notify() is responsible for sending events related to memory
hotplugging to a notification queue.  Since all the events must match one
of the values from the enum memory_block_state, it is appropriate to
change the function parameter type to make this condition explicit at
compile time.

Link: https://lkml.kernel.org/r/20251029195617.2210700-4-linux@israelbatista.dev.br
Signed-off-by: Israel Batista <linux@israelbatista.dev.br>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:28 -08:00
Israel Batista
8bc7ba3d26 mm: change type of state in struct memory_block
The state of a memory block should be restricted to values specified in
the documentation of the memory hotplug API.  However, since the state
field in the memory_block struct was defined as an unsigned long, this
restriction was not enforced at compile time.

With the introduction of the enum memory_block_state, it is now possible
to incorporate the desired semantics in the field declaration and enforce
these restrictions at compile time.

[akpm@linux-foundation.org: fix whitespace, per Randy]
Link: https://lkml.kernel.org/r/20251029195617.2210700-3-linux@israelbatista.dev.br
Signed-off-by: Israel Batista <linux@israelbatista.dev.br>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:27 -08:00
Israel Batista
1a4f70f685 mm: convert memory block states (MEM_*) macros to enum
Patch series "mm: Convert memory block states (MEM_*) macros to enums", v2.

The MEM_* constants indicating the state of a memory block are currently
defined as macros, meaning their definitions will be omitted from the
debuginfo on most kernel builds.  This makes it harder for debuggers to
correctly map the block state at runtime, which can be quite useful when
analysing errors related to memory hot plugging and unplugging with tools
such as drgn.

Converting the constants to an enum ensures the correct information is
emitted by the compiler and available for the debugger, without needing to
hard-code them into the debugger and track their changes.

This patch series aims to replace the current macros with a newly created
enum named memory_block_state, while also taking advantage of the compile
time guarantees that we get when using enums.

The first patch does the conversion of the macros to an enum, while the
2nd and 3rd patches use this enum to clean up some type declarations and
make sure that only valid values are used.


This patch (of 3):

Converting the MEM_* constants from macros to an enum ensures that their
values will be correctly emitted in the debug symbols, making it easier to
trace the meaning of each value when debugging with tools such as drgn,
without the need to hard-code the values.

Since the values are mutually exclusive and they are not exposed directly
to userspace, I also dropped the misleading pattern (1<<X) that made it
look like they were combinable flags.

Link: https://lkml.kernel.org/r/20251029195617.2210700-1-linux@israelbatista.dev.br
Link: https://lkml.kernel.org/r/20251029195617.2210700-2-linux@israelbatista.dev.br
Signed-off-by: Israel Batista <linux@israelbatista.dev.br>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:27 -08:00
Baoquan He
52f37efc59 mm/swap: select swap device with default priority round robin
Swap devices are assumed to have similar accessing speed when swapon if no
priority is specified.  It's unfair and doesn't make sense just because
one swap device is swapped on firstly, its priority will be higher than
the one swapped on later.

Here, set all swap devicess to have priority '-1' by default.  With this
change, swap device with default priority will be selected round robin
when swapping out.  This can improve the swapping efficiency a lot among
multiple swap devices with default priority.

Below are swapon output during the processes when high pressure
vm-scability test is being taken:

1) This is pre-commit a2468cc9bf, swap device is selectd one by one by
   priority from high to low when one swap device is exhausted:
------------------------------------
[root@hp-dl385g10-03 ~]# swapon
NAME       TYPE      SIZE   USED PRIO
/dev/zram0 partition  16G    16G   -1
/dev/zram1 partition  16G 966.2M   -2
/dev/zram2 partition  16G     0B   -3
/dev/zram3 partition  16G     0B   -4

2) This is behaviour with commit a2468cc9bf, on node, swap device
   sharing the same node id is selected firstly until exhausted; while
   on node no swap device sharing the node id it selects the one with
   highest priority until exhaustd:
------------------------------------
[root@hp-dl385g10-03 ~]# swapon
NAME       TYPE      SIZE  USED PRIO
/dev/zram0 partition  16G 15.7G   -2
/dev/zram1 partition  16G  3.4G   -3
/dev/zram2 partition  16G  3.4G   -4
/dev/zram3 partition  16G  2.6G   -5

3) After this patch applied, swap devices with default priority are selectd
   round robin:
------------------------------------
[root@hp-dl385g10-03 block]# swapon
NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition  16G 6.6G   -1
/dev/zram1 partition  16G 6.6G   -1
/dev/zram2 partition  16G 6.6G   -1
/dev/zram3 partition  16G 6.6G   -1

With the change, about 18% efficiency promotion relative to node based
way as below. (Surely, the pre-commit a2468cc9bf way is the worst.)

vm-scability test:
==================
Test with:
usemem --init-time -O -y -x -n 31 2G (4G memcg, zram as swap)
                            one by one:      node based:      round robin:
System time:                1087.38 s        637.92 s         526.74 s     (lower is better)
Sum Throughput:             2036.55 MB/s     3546.56 MB/s     4207.56 MB/s (higher is better)
Single process Throughput:  65.69 MB/s       114.40 MB/s      135.72 MB/s  (high is better)
free latency:               15769409.48 us   10138455.99 us   6810119.01 us(lower is better)

Link: https://lkml.kernel.org/r/20251028034308.929550-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Suggested-by: Chris Li <chrisl@kernel.org>
Acked-by: Chris Li <chrisl@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:27 -08:00
Baoquan He
8e689f8ea4 mm/swap: do not choose swap device according to numa node
Patch series "mm/swapfile.c: select swap devices of default priority round
robin", v5.

Currently, on system with multiple swap devices, swap allocation will
select one swap device according to priority.  The swap device with the
highest priority will be chosen to allocate firstly.

People can specify a priority from 0 to 32767 when swapon a swap device,
or the system will set it from -2 then downwards by default.  Meanwhile,
on NUMA system, the swap device with node_id will be considered first on
that NUMA node of the node_id.

In the current code, an array of plist, swap_avail_heads[nid], is used to
organize swap devices on each NUMA node.  For each NUMA node, there is a
plist organizing all swap devices.  The 'prio' value in the plist is the
negated value of the device's priority due to plist being sorted from low
to high.  The swap device owning one node_id will be promoted to the front
position on that NUMA node, then other swap devices are put in order of
their default priority.

E.g I got a system with 8 NUMA nodes, and I setup 4 zram partition as
swap devices.

Current behaviour:
their priorities will be(note that -1 is skipped):
NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition  16G   0B   -2
/dev/zram1 partition  16G   0B   -3
/dev/zram2 partition  16G   0B   -4
/dev/zram3 partition  16G   0B   -5

And their positions in the 8 swap_avail_lists[nid] will be:
swap_avail_lists[0]: /* node 0's available swap device list */
zram0   -> zram1   -> zram2   -> zram3
prio:1     prio:3     prio:4     prio:5
swap_avali_lists[1]: /* node 1's available swap device list */
zram1   -> zram0   -> zram2   -> zram3
prio:1     prio:2     prio:4     prio:5
swap_avail_lists[2]: /* node 2's available swap device list */
zram2   -> zram0   -> zram1   -> zram3
prio:1     prio:2     prio:3     prio:5
swap_avail_lists[3]: /* node 3's available swap device list */
zram3   -> zram0   -> zram1   -> zram2
prio:1     prio:2     prio:3     prio:4
swap_avail_lists[4-7]: /* node 4,5,6,7's available swap device list */
zram0   -> zram1   -> zram2   -> zram3
prio:2     prio:3     prio:4     prio:5

The adjustment for swap device with node_id intended to decrease the
pressure of lock contention for one swap device by taking different swap
device on different node.  The adjustment was introduced in commit
a2468cc9bf ("swap: choose swap device according to numa node"). 
However, the adjustment is a little coarse-grained.  On the node, the swap
device sharing the node's id will always be selected firstly by node's
CPUs until exhausted, then next one.  And on other nodes where no swap
device shares its node id, swap device with priority '-2' will be selected
firstly until exhausted, then next with priority '-3'.

This is the swapon output during the process high pressure vm-scability
test is being taken.  It's clearly showing zram0 is heavily exploited
until exhausted.

===================================
[root@hp-dl385g10-03 ~]# swapon
NAME       TYPE      SIZE  USED PRIO
/dev/zram0 partition  16G 15.7G   -2
/dev/zram1 partition  16G  3.4G   -3
/dev/zram2 partition  16G  3.4G   -4
/dev/zram3 partition  16G  2.6G   -5

The node based strategy on selecting swap device is much better then the
old way one by one selecting swap device.  However it is still
unreasonable because swap devices are assumed to have similar accessing
speed if no priority is specified when swapon.  It's unfair and doesn't
make sense just because one swap device is swapped on firstly, its
priority will be higher than the one swapped on later.

So in this patchset, change is made to select the swap device round robin
if default priority.  In code, the plist array swap_avail_heads[nid] is
replaced with a plist swap_avail_head which reverts commit a2468cc9bf. 
Meanwhile, on top of the revert, further change is taken to make any
device w/o specified priority get the same default priority '-1'.  Surely,
swap device with specified priority are always put foremost, this is not
impacted.  If you care about their different accessing speed, then use
'swapon -p xx' to deploy priority for your swap devices.

New behaviour:

swap_avail_list: /* one global available swap device list */
zram0   -> zram1   -> zram2   -> zram3
prio:1     prio:1     prio:1     prio:1

This is the swapon output during the process high pressure vm-scability
being taken, all is selected round robin:
=======================================
[root@hp-dl385g10-03 linux]# swapon
NAME       TYPE      SIZE  USED PRIO
/dev/zram0 partition  16G 12.6G   -1
/dev/zram1 partition  16G 12.6G   -1
/dev/zram2 partition  16G 12.6G   -1
/dev/zram3 partition  16G 12.6G   -1

With the change, we can see about 18% efficiency promotion as below:

vm-scability test:
==================
Test with:
usemem --init-time -O -y -x -n 31 2G (4G memcg, zram as swap)
                           Before:          After:
System time:               637.92 s         526.74 s      (lower is better)
Sum Throughput:            3546.56 MB/s     4207.56 MB/s  (higher is better)
Single process Throughput: 114.40 MB/s      135.72 MB/s   (higher is better)
free latency:              10138455.99 us   6810119.01 us (low is better)


This patch (of 2):

This reverts commit a2468cc9bf ("swap: choose swap device according to
numa node").

After this patch, the behaviour will change back to pre-commit
a2468cc9bf.  Means the priority will be set from -1 then downwards by
default, and when swapping, it will exhault swap device one by one
according to priority from high to low.  This is preparation work for
later change.

[root@hp-dl385g10-03 ~]# swapon
NAME       TYPE      SIZE   USED PRIO
/dev/zram0 partition  16G    16G   -1
/dev/zram1 partition  16G 966.2M   -2
/dev/zram2 partition  16G     0B   -3
/dev/zram3 partition  16G     0B   -4

Link: https://lkml.kernel.org/r/20251028034308.929550-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20251028034308.929550-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Suggested-by: Chris Li <chrisl@kernel.org>
Acked-by: Chris Li <chrisl@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Kairui Song <kasong@tencent.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:27 -08:00
Jiayuan Chen
6af766c86b mm: vmstat: output reserved_highatomic and free_highatomic in zoneinfo
The nr_free_highatomic is a key factor in calculating watermarks as it
affects the free pages count.  Adding this metric, along with
nr_reserved_highatomic, to /proc/zoneinfo facilitates easier diagnosis
memory watermark calculations and memory pressure states.

Sample output:
cat /proc/zoneinfo
......
pagesets
cpu: 0
		  count:    52069
		  high:     52675
		  batch:    63
		  high_min: 13971
		  high_max: 62284
vm stats threshold: 10
node_unreclaimable:  0
start_pfn:           4096
reserved_highatomic: 5120
free_highatomic:     2081

Link: https://lkml.kernel.org/r/20251027141818.283587-1-jiayuan.chen@linux.dev
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:26 -08:00
Anshuman Khandual
272239dc8f mm: make INVALID_PHYS_ADDR a generic macro
INVALID_PHYS_ADDR has very similar definitions across the code base. 
Hence just move that inside header <liux/mm.h> for more generic usage. 
Also drop the now redundant ones which are no longer required.

Link: https://lkml.kernel.org/r/20251021025638.2420216-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>	[s390]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:26 -08:00
Lorenzo Stoakes
b734b9d973 mm/vma: small VMA lock cleanups
We declare vma_start_read() as a static function in mm/mmap_lock.c, so
there is no need to provide a stub for !CONFIG_PER_VMA_LOCK.

__is_vma_write_locked() is declared in a header and should therefore be
static inline.

Put parens around (refcnt & VMA_LOCK_OFFSET) in is_vma_writer_only() to
make precedence clear.

Link: https://lkml.kernel.org/r/20251024090902.1118174-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:26 -08:00
Harry Yoo
ad8b2e0961 treewide: include linux/pgalloc.h instead of asm/pgalloc.h
For now, including <asm/pgalloc.h> instead of <linux/pgalloc.h> is
technically fine unless the .c file calls p*d_populate_kernel() helper
functions.

But it is a better practice to always include <linux/pgalloc.h>.  Include
<linux/pgalloc.h> instead of <asm/pgalloc.h> outside arch/.

Link: https://lkml.kernel.org/r/20251024113047.119058-3-harry.yoo@oracle.com
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:25 -08:00
Harry Yoo
5e0fa7ed98 MAINTAINERS: add include/linux/pgalloc.h to MM CORE section
Patch series "mm: MISC follow-up patches for linux/pgalloc.h", v2.

This is a follow-up patch series for the patch series named: "[PATCH V5
mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and
make it harder to miss".


This patch (of 2):

Since include/linux/pgtable.h is already listed in the MM CORE section,
add it to the section as well to keep it maintained by the appropriate
maintainers.

Link: https://lkml.kernel.org/r/20251024113047.119058-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20251024113047.119058-2-harry.yoo@oracle.com
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:25 -08:00
SeongJae Park
809ba69f9f selftests/damon/sysfs: add obsolete_target test
A new DAMON sysfs file for pin-point target removal, namely
obsolete_target, has been added.  Add a test for the functionality.  It
starts DAMON with three monitoring target processes, mark one in the
middle as obsolete, commit it, and confirm the internal DAMON status is
updated to remove the target in the middle.

Link: https://lkml.kernel.org/r/20251023012535.69625-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:25 -08:00
SeongJae Park
65a9033db7 sysfs.py: extend assert_ctx_committed() for monitoring targets
assert_ctx_committed() is not asserting monitoring targets commitment,
since all existing callers of the function assume no target changes. 
Extend it for future usage.

Link: https://lkml.kernel.org/r/20251023012535.69625-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:25 -08:00
SeongJae Park
a00f18abef drgn_dump_damon_status: dump damon_target->obsolete
A new field of damon_target for pin-point target removal, namely obsolete,
has newly been added.  Extend drgn_dump_damon_status.py to dump it, for
easily writing a future DAMON selftests of it.

Link: https://lkml.kernel.org/r/20251023012535.69625-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:24 -08:00
SeongJae Park
badfa4361c selftests/damon/_damon_sysfs: support obsolete_target file
A DAMON sysfs file, namely obsolete_target, has been newly introduced. 
Add a support of that file to _damon_sysfs.py so that DAMON selftests for
the file can be easily written.

Link: https://lkml.kernel.org/r/20251023012535.69625-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:24 -08:00
SeongJae Park
9abe8d0519 Docs/ABI/damon: document obsolete_target sysfs file
Update DAMON ABI document for the newly added obsolete_target DAMON sysfs
file.

Link: https://lkml.kernel.org/r/20251023012535.69625-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:24 -08:00
SeongJae Park
e06469cdf1 Docs/admin-guide/mm/damon/usage: document obsolete_target file
Document the newly added obsolete_target DAMON sysfs file.

Link: https://lkml.kernel.org/r/20251023012535.69625-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:24 -08:00
SeongJae Park
e35afdf228 mm/damon/sysfs: implement obsolete_target file
There is no good way to remove DAMON targets in the middle of the existing
targets list.  It restricts efficient and flexible DAMON use cases. 
Improve the usability by implementing a new DAMON sysfs interface file,
namely obsolete_target, under each target directory.  It is connected to
the obsolete field of parameters commit-source targets, so allows removing
arbitrary targets in the middle of existing targets list.

Note that the sysfs files are not automatically updated.  For example,
let's suppose there are three targets in the running context, and a user
removes the third target using this feature.  If the user writes 'commit'
to the kdamond 'state' file again, DAMON sysfs interface will again try to
remove the third target.  But because there is no matching target in the
running context, the commit will fail.  It is the user's responsibility to
understand resulting DAMON internal targets list change, and construct
sysfs files (using nr_targets and other sysfs files) to correctly
represent it.

Also note that this is arguably an improvement rather than a fix of broken
things.

Link: https://lkml.kernel.org/r/20251023012535.69625-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Bijan Tabatabai <bijan311@gmail.com>
Closes: https://github.com/damonitor/damo/issues/36
Reviewed-by: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:23 -08:00
SeongJae Park
60bd24f272 mm/damon/sysfs: test commit input against realistic destination
DAMON sysfs interface tests if given online parameters update request is
valid, by committing those using the DAMON kernel API, to a test-purpose
destination context.  The test-purpose destination context is constructed
using damon_new_ctx(), so it has no target, no scheme.

If a source target has the obsolete field set, the test-purpose commit
will fail because damon_commit_targets() fails when there is a source
obsolete target that cannot find its matching destination target.  DAMON
sysfs interface is not letting users set the field for now, so there is no
problem.  However, the following commit will support that.  Also there
could be similar future changes that making commit fails based on current
context structure.

Make the test purpose commit destination context similar to the current
running one, by committing the running one to the test purpose context,
before doing the real test-purpose commit.

Link: https://lkml.kernel.org/r/20251023012535.69625-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:23 -08:00
SeongJae Park
adf7d6cdd7 mm/damon/core: add damon_target->obsolete for pin-point removal
Patch series "mm/damon: support pin-point targets removal".

DAMON maintains the targets in a list, and allows committing only an
entire list of targets having the new parameters.  Targets having same
index on the lists are treated as matching source and destination
targets.  If an existing target cannot find a matching one in the
sources list, the target is removed.  This means that there is no way to
remove only a specific monitoring target in the middle of the current
targets list.

Such pin-point target removal is really needed in some use cases,
though.  Monitoring access patterns on virtual address spaces of
processes that spawned from the same ancestor is one example.  If a
process of the group is terminated, the user may want to remove the
matching DAMON target as soon as possible, to save in-kernel memory
usage for the unnecessary target data.  The user may also want to do
that without turning DAMON off or removing unnecessary targets, to keep
the current monitoring results for other active processes.

Extend DAMON kernel API and sysfs ABI to support the pin-point removal
in the following way.  For API, add a new damon_target field, namely
'obsolete'.  If the field on parameters commit source target is set, it
means the matching destination target is obsolete.  Then the parameters
commit logic removes the destination target from the existing targets
list.  For sysfs ABI, add a new file under the target directory, namely
'obsolete_target'.  It is connected with the 'obsolete' field of the
commit source targets, so internally using the new API.

Also add a selftest for the new feature.  The related helper scripts for
manipulating the sysfs interface and dumping in-kernel DAMON status are
also extended for this.  Note that the selftest part was initially
posted as an individual RFC series [1], but now merged into this one.

Bijan Tabatabai has originally reported this issue, and participated in
this solution design on a GitHub issue [1] for DAMON user-space tool.


This patch (of 9):

DAMON's monitoring targets parameters update function,
damon_commit_targets(), is not providing a way to remove a target in the
middle of the existing targets list.  Extend the API by adding a field to
struct damon_target.  If the field of a damon_commit_targets() source
target is set, it indicates the matching target on the existing targets
list is obsolete.  damon_commit_targets() understands that and removes
those from the list, while respecting the index based matching for other
non-obsolete targets.

Link: https://lkml.kernel.org/r/20251023012535.69625-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251023012535.69625-2-sj@kernel.org
Link: https://github.com/damonitor/damo/issues/36 [1]
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:23 -08:00
Dmitry Ilvokhin
e97d7c5165 mm: shmem/tmpfs hugepage defaults config choice
Allow to override defaults for shemem and tmpfs at config time.  This is
consistent with how transparent hugepages can be configured.

Same results can be achieved with the existing
'transparent_hugepage_shmem' and 'transparent_hugepage_tmpfs' settings in
the kernel command line, but it is more convenient to define basic
settings at config time instead of changing kernel command line later.

Defaults for shmem and tmpfs were not changed.  They are remained the same
as before: 'never' for both cases.  Options 'deny' and 'force' are omitted
intentionally since these are special values and supposed to be used for
emergencies or testing and are not expected to be permanent ones.

Primary motivation for adding config option is to enable policy
enforcement at build time.  In large-scale production environments (Meta's
for example), the kernel configuration is often maintained centrally close
to the kernel code itself and owned by the kernel engineers, while boot
parameters are managed independently (e.g.  by provisioning systems).  In
such setups, the kernel build defines the supported and expected behavior
in a single place, but there is no reliable or uniform control over the
kernel command line options.

A build-time default allows kernel integrators to enforce a predictable
hugepage policy for shmem/tmpfs on a base layer, ensuring reproducible
behavior and avoiding configuration drift caused by possible boot-time
differences.

In short, primary benefit is mostly operational: it provides a way to
codify preferred policy in the kernel configuration, which is versioned,
reviewed, and tested as part of the kernel build process, rather than
depending on potentially variable boot parameters.

[d@ilvokhin.com: v2]
  Link: https://lkml.kernel.org/r/aQECPpjd-fU_TC79@shell.ilvokhin.com
Link: https://lkml.kernel.org/r/aPpv8sAa2sYgNu3L@shell.ilvokhin.com
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:23 -08:00
SeongJae Park
f46dbea0d9 Docs/admin-guide/mm/damon/stat: document negative idle time
Commit a983a26d52 ("mm/damon/stat: expose negative idle time")
introduced the negative idle time feature for DAMON_STAT.  But it is not
documented.  Document it on the usage document.

Link: https://lkml.kernel.org/r/20251026182216.118200-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:22 -08:00
SeongJae Park
da8644a476 Docs/admin-guide/mm/damon/stat: document aggr_interval_us parameter
Commit cc7ceb1d14 ("mm/damon/stat: expose the current tuned aggregation
interval"), has introduced 'aggr_interval_us' parameter for DAMON_STAT. 
But the new parameter is not yet documented.  Document it on the usage
document for the module.

Link: https://lkml.kernel.org/r/20251026182216.118200-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:22 -08:00
SeongJae Park
448666e418 Docs/admin-guide/mm/damon/lru_sort: document addr_unit parameter
Commit 2e0fe9245d ("mm/damon/lru_sort: support addr_unit for
DAMON_LRU_SORT") introduced the 'addr_unit' parameter for DAMON_LRU_SORT. 
But the usage document is not updated for that.  Update the document.

Link: https://lkml.kernel.org/r/20251026182216.118200-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:22 -08:00
SeongJae Park
bb01656e00 Docs/admin-guide/mm/damon/reclaim: document addr_unit parameter
Commit 7db551fcfb ("mm/damon/reclaim: support addr_unit for
DAMON_RECLAIM") introduced the 'addr_unit' parameter for DAMON_RECLAIM. 
But the usage document is not updated for that.  Update the document.

Link: https://lkml.kernel.org/r/20251026182216.118200-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:22 -08:00
SeongJae Park
29221406f0 Docs/admin-guide/mm/damon/usage: document empty target regions commit behavior
Committing a monitoring target with empty target regions is for keeping
the current monitoring results.  This behavior was introduced by commit
9732336006 ("mm/damon/sysfs: update monitoring target regions for online
input commit").  The behavior is not documented, though.  Update the usage
document for clarifying this behavior.

Link: https://lkml.kernel.org/r/20251026182216.118200-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:21 -08:00
SeongJae Park
a01386c16d Docs/admin-guide/mm/damon/stat: fix a typo: s/sampling events/sampling interval/
It is a contextual typo.  Fix it.

Link: https://lkml.kernel.org/r/20251026182216.118200-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:21 -08:00
SeongJae Park
d7484f6edd Docs/mm/damon/design: fix wrong link to intervals goal section
Commit b243d666d1 ("Docs/admin-guide/mm/damon/usage: add intervals_goal
directory on the hierarchy") mistakenly added a wrong reference for
intervals goal usage documentation on the design document.  Fix it.

Link: https://lkml.kernel.org/r/20251026182216.118200-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:21 -08:00
SeongJae Park
50ca642364 mm/damon/core: fix wrong comment of damon_call() return timing
Patch series "mm/damon: misc documentation fixups".

First three patches fix up issues in the documents, including wrong
explanation of a behavior, wrong link, and a contextual typo.  Following
five patches update documents for not yet documented features and
behaviors.


This patch (of 8):

damon_call() works asynchronously and synchronously for repeat and
non-repeat mode requests, respectively.  The comment about the behavior is
wrong, though.  Fix it.

The wrong comment was introduced together with the repeat mode, by commit
43df7676e5 ("mm/damon/core: introduce repeat mode damon_call()").

Link: https://lkml.kernel.org/r/20251026182216.118200-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251026182216.118200-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:21 -08:00
Kairui Song
4fd58b51ef mm, swap: remove redundant argument for isolating a cluster
The order argument was introduced by an intermediate commit and was then
never used, just remove it.

Link: https://lkml.kernel.org/r/20251024-swap-clean-after-swap-table-p1-v2-5-a709469052e7@tencent.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:20 -08:00
Kairui Song
ab61de9b78 mm/migrate, swap: drop usage of folio_index
This helper was used when swap cache was mixed with page cache.  Now they
are completely separate from each other, access to the swap cache is all
wrapped by the swap_cache_* helpers, which expect the folio's swap entry
as a parameter.

This helper is no longer used, remove the last redundant user and drop it.

Link: https://lkml.kernel.org/r/20251024-swap-clean-after-swap-table-p1-v2-4-a709469052e7@tencent.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:20 -08:00
Kairui Song
a983471cfc mm, swap: cleanup swap entry allocation parameter
We no longer need this GFP parameter after commit 8578e0c00d ("mm, swap:
use the swap table for the swap cache and switch API").  Before that
commit the GFP parameter is already almost identical for all callers, so
nothing changed by that commit.  Swap table just moved the GFP to lower
layer and make it more defined and changes depend on atomic or sleep
allocation.

Now this parameter is no longer used, just remove it.  No behavior change.

Link: https://lkml.kernel.org/r/20251024-swap-clean-after-swap-table-p1-v2-3-a709469052e7@tencent.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:20 -08:00
Kairui Song
e4adea27b9 mm, swap: rename helper for setup bad slots
The name inc_cluster_info_page is very confusing, as this helper is only
used during swapon to mark bad slots.  Rename it properly and turn the
VM_BUG_ON in it into WARN_ON to expose more potential issues.  Swapon is a
cold path, so adding more checks should be a good idea.

No feature change except new WARN_ON.

Link: https://lkml.kernel.org/r/20251024-swap-clean-after-swap-table-p1-v2-2-a709469052e7@tencent.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:20 -08:00
Kairui Song
9fb749cd15 mm, swap: do not perform synchronous discard during allocation
Patch series "mm, swap: misc cleanup and bugfix", v2.

A few cleanups and a bugfix that are either suitable after the swap table
phase I or found during code review.

Patch 1 is a bugfix and needs to be included in the stable branch, the
rest have no behavioral change.


This patch (of 5):

Since commit 1b7e90020e ("mm, swap: use percpu cluster as allocation
fast path"), swap allocation is protected by a local lock, which means we
can't do any sleeping calls during allocation.

However, the discard routine is not taken well care of.  When the swap
allocator failed to find any usable cluster, it would look at the pending
discard cluster and try to issue some blocking discards.  It may not
necessarily sleep, but the cond_resched at the bio layer indicates this is
wrong when combined with a local lock.  And the bio GFP flag used for
discard bio is also wrong (not atomic).

It's arguable whether this synchronous discard is helpful at all.  In most
cases, the async discard is good enough.  And the swap allocator is doing
very differently at organizing the clusters since the recent change, so it
is very rare to see discard clusters piling up.

So far, no issues have been observed or reported with typical SSD setups
under months of high pressure.  This issue was found during my code
review.  But by hacking the kernel a bit: adding a mdelay(500) in the
async discard path, this issue will be observable with WARNING triggered
by the wrong GFP and cond_resched in the bio layer for debug builds.

So now let's apply a hotfix for this issue: remove the synchronous discard
in the swap allocation path.  And when order 0 is failing with all cluster
list drained on all swap devices, try to do a discard following the swap
device priority list.  If any discards released some cluster, try the
allocation again.  This way, we can still avoid OOM due to swap failure if
the hardware is very slow and memory pressure is extremely high.

This may cause more fragmentation issues if the discarding hardware is
really slow.  Ideally, we want to discard pending clusters before
continuing to iterate the fragment cluster lists.  This can be implemented
in a cleaner way if we clean up the device list iteration part first.

Link: https://lkml.kernel.org/r/20251024-swap-clean-after-swap-table-p1-v2-0-a709469052e7@tencent.com
Link: https://lkml.kernel.org/r/20251024-swap-clean-after-swap-table-p1-v2-1-c5b0e1092927@tencent.com
Fixes: 1b7e90020e ("mm, swap: use percpu cluster as allocation fast path")
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:19 -08:00
Wei Yang
f0b1602871 mm/huge_memory: optimize old_order derivation during folio splitting
Folio splitting requires both the folio's original order (@old_order) and
the new target order (@split_order).

In the current implementation, @old_order is repeatedly retrieved using
folio_order().

However, for every iteration after the first, the folio being split is the
result of the previous split, meaning its order is already known to be
equal to the previous iteration's @split_order.

This commit optimizes the logic:

  * Instead of calling folio_order(), we now set @old_order directly to
    the value of @split_order from the previous iteration.

This change avoids unnecessary function calls and simplifies the loop
setup.

Also it removes a check for non-existent case, since for uniform splitting
we only do split when @split_order == @new_order.

Link: https://lkml.kernel.org/r/20251021212142.25766-5-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:19 -08:00
Wei Yang
fc4f15ee0b mm/huge_memory: optimize and simplify folio stat update after split
The loop executed after a successful folio split currently has two
combined responsibilities:

  * updating statistics for the new folios
  * determining the folio for the next split iteration.

This commit refactors the logic to directly calculate and update folio
statistics, eliminating the need for the iteration step.

We can do this because all necessary information is already available:

  * All resulting new folios have the same order, which is @split_order.
  * The exact number of new folios can be calculated directly using
    @old_order and @split_order.
  * The folio for the subsequent split is simply the one containing
    @split_at.

By leveraging this knowledge, we can achieve the stat update more cleanly
and efficiently without the looping logic.

Link: https://lkml.kernel.org/r/20251021212142.25766-4-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:19 -08:00
Wei Yang
092ef38997 mm/huge_memory: update folio stat after successful split
The current implementation complicates this process:

  * It iterates over the resulting new folios.
  * It uses a flag (@stop_split) to conditionally skip updating the stat
    for the folio at @split_at during the loop.
  * It then attempts to update the skipped stat on a subsequent failure
    path.

This logic is unnecessarily hard to follow.

This commit refactors the code to update the folio statistics only after a
successful split.  This makes the logic much cleaner and sets the stage
for further simplification of the stat-handling code.

Link: https://lkml.kernel.org/r/20251021212142.25766-3-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:19 -08:00
Wei Yang
ef0258857d mm/huge_memory: avoid reinvoking folio_test_anon()
Patch series "mm/huge_memory: cleanup __split_unmapped_folio()", v3.

This patch series cleans up and optimizes the internal logic of the
__split_unmapped_folio() function.

The goal is to improve clarity and efficiency by eliminating redundant
checks, caching stable attribute values, and simplifying the iteration
logic used for updating folio statistics.

These changes make the code easier to follow and maintain.

The split_huge_page_test selftest pass.


This patch (of 4):

During the execution of __split_unmapped_folio(), the folio's anon/!anon
attribute is invariant (not expected to change).

Therefore, it is safe and more efficient to retrieve this attribute once
at the start and reuse it throughout the function.

Link: https://lkml.kernel.org/r/20251021212142.25766-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20251016004613.514-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20251016004613.514-2-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20251021212142.25766-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:18 -08:00
Anshuman Khandual
eca1fba233 mm/debug_vm_pgtable: add [pte|pmd]_mkwrite_novma() tests
Add some [pte|pmd]_mkwrite_novma() relevant tests.

[anshuman.khandual@arm.com: add a new test combination per Huang Ying]
  Link: https://lkml.kernel.org/r/20251024013137.136926-1-anshuman.khandual@arm.com
Link: https://lkml.kernel.org/r/20251022032951.3498553-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:18 -08:00
Lu Baolu
e37d5a2d60 iommu/sva: invalidate stale IOTLB entries for kernel address space
Introduce a new IOMMU interface to flush IOTLB paging cache entries for
the CPU kernel address space.  This interface is invoked from the x86
architecture code that manages combined user and kernel page tables,
specifically before any kernel page table page is freed and reused.

This addresses the main issue with vfree() which is a common occurrence
and can be triggered by unprivileged users.  While this resolves the
primary problem, it doesn't address some extremely rare case related to
memory unplug of memory that was present as reserved memory at boot, which
cannot be triggered by unprivileged users.  The discussion can be found at
the link below.

Enable SVA on x86 architecture since the IOMMU can now receive
notification to flush the paging cache before freeing the CPU kernel page
table pages.

Link: https://lkml.kernel.org/r/20251022082635.2462433-9-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/linux-iommu/04983c62-3b1d-40d4-93ae-34ca04b827e5@intel.com/
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Jann Horn <jannh@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:18 -08:00
Dave Hansen
5ba2f0a155 mm: introduce deferred freeing for kernel page tables
This introduces a conditional asynchronous mechanism, enabled by
CONFIG_ASYNC_KERNEL_PGTABLE_FREE.  When enabled, this mechanism defers the
freeing of pages that are used as page tables for kernel address mappings.
These pages are now queued to a work struct instead of being freed
immediately.

This deferred freeing allows for batch-freeing of page tables, providing a
safe context for performing a single expensive operation (TLB flush) for a
batch of kernel page tables instead of performing that expensive operation
for each page table.

Link: https://lkml.kernel.org/r/20251022082635.2462433-8-baolu.lu@linux.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:18 -08:00
Lu Baolu
bf9e4e30f3 x86/mm: use pagetable_free()
The kernel's memory management subsystem provides a dedicated interface,
pagetable_free(), for freeing page table pages.  Updates two call sites to
use pagetable_free() instead of the lower-level __free_page() or
free_pages().  This improves code consistency and clarity, and ensures the
correct freeing mechanism is used.

Link: https://lkml.kernel.org/r/20251022082635.2462433-7-baolu.lu@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:17 -08:00
Dave Hansen
0189429567 mm: introduce pure page table freeing function
The pages used for ptdescs are currently freed back to the allocator in a
single location.  They will shortly be freed from a second location.

Create a simple helper that just frees them back to the allocator.

Link: https://lkml.kernel.org/r/20251022082635.2462433-6-baolu.lu@linux.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:17 -08:00
Dave Hansen
412d000346 x86/mm: use 'ptdesc' when freeing PMD pages
There are a billion ways to refer to a physical memory address.  One of
the x86 PMD freeing code location chooses to use a 'pte_t *' to point to a
PMD page and then call a PTE-specific freeing function for it.  That's a
bit wonky.

Just use a 'struct ptdesc *' instead.  Its entire purpose is to refer to
page table pages.  It also means being able to remove an explicit cast.

Right now, pte_free_kernel() is a one-liner that calls
pagetable_dtor_free().  Effectively, all this patch does is remove one
superfluous __pa(__va(paddr)) conversion and then call
pagetable_dtor_free() directly instead of through a helper.

Link: https://lkml.kernel.org/r/20251022082635.2462433-5-baolu.lu@linux.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:17 -08:00
Dave Hansen
977870522a mm: actually mark kernel page table pages
Now that the API is in place, mark kernel page table pages just after they
are allocated.  Unmark them just before they are freed.

Note: Unconditionally clearing the 'kernel' marking (via
ptdesc_clear_kernel()) would be functionally identical to what is here. 
But having the if() makes it logically clear that this function can be
used for kernel and non-kernel page tables.

Link: https://lkml.kernel.org/r/20251022082635.2462433-4-baolu.lu@linux.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:17 -08:00
Dave Hansen
27bfafac65 mm: add a ptdesc flag to mark kernel page tables
The page tables used to map the kernel and userspace often have very
different handling rules.  There are frequently *_kernel() variants of
functions just for kernel page tables.  That's not great and has lead to
code duplication.

Instead of having completely separate call paths, allow a 'ptdesc' to be
marked as being for kernel mappings.  Introduce helpers to set and clear
this status.

Note: this uses the PG_referenced bit.  Page flags are a great fit for
this since it is truly a single bit of information.  Use PG_referenced
itself because it's a fairly benign flag (as opposed to things like
PG_lock).  It's also (according to Willy) unlikely to go away any time
soon.

PG_referenced is not in PAGE_FLAGS_CHECK_AT_FREE.  It does not need to be
cleared before freeing the page, and pages coming out of the allocator
should have it cleared.  Regardless, introduce an API to clear it anyway. 
Having symmetry in the API makes it easier to change the underlying
implementation later, like if there was a need to move to a
PAGE_FLAGS_CHECK_AT_FREE bit.

Link: https://lkml.kernel.org/r/20251022082635.2462433-3-baolu.lu@linux.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:16 -08:00
Lu Baolu
72f98ef9a4 iommu: disable SVA when CONFIG_X86 is set
Patch series "Fix stale IOTLB entries for kernel address space", v7.

This proposes a fix for a security vulnerability related to IOMMU Shared
Virtual Addressing (SVA).  In an SVA context, an IOMMU can cache kernel
page table entries.  When a kernel page table page is freed and
reallocated for another purpose, the IOMMU might still hold stale,
incorrect entries.  This can be exploited to cause a use-after-free or
write-after-free condition, potentially leading to privilege escalation or
data corruption.

This solution introduces a deferred freeing mechanism for kernel page
table pages, which provides a safe window to notify the IOMMU to
invalidate its caches before the page is reused.


This patch (of 8):

In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware
shares and walks the CPU's page tables.  The x86 architecture maps the
kernel's virtual address space into the upper portion of every process's
page table.  Consequently, in an SVA context, the IOMMU hardware can walk
and cache kernel page table entries.

The Linux kernel currently lacks a notification mechanism for kernel page
table changes, specifically when page table pages are freed and reused. 
The IOMMU driver is only notified of changes to user virtual address
mappings.  This can cause the IOMMU's internal caches to retain stale
entries for kernel VA.

Use-After-Free (UAF) and Write-After-Free (WAF) conditions arise when
kernel page table pages are freed and later reallocated.  The IOMMU could
misinterpret the new data as valid page table entries.  The IOMMU might
then walk into attacker-controlled memory, leading to arbitrary physical
memory DMA access or privilege escalation.  This is also a
Write-After-Free issue, as the IOMMU will potentially continue to write
Accessed and Dirty bits to the freed memory while attempting to walk the
stale page tables.

Currently, SVA contexts are unprivileged and cannot access kernel
mappings.  However, the IOMMU will still walk kernel-only page tables all
the way down to the leaf entries, where it realizes the mapping is for the
kernel and errors out.  This means the IOMMU still caches these
intermediate page table entries, making the described vulnerability a real
concern.

Disable SVA on x86 architecture until the IOMMU can receive notification
to flush the paging cache before freeing the CPU kernel page table pages.

Link: https://lkml.kernel.org/r/20251022082635.2462433-1-baolu.lu@linux.intel.com
Link: https://lkml.kernel.org/r/20251022082635.2462433-2-baolu.lu@linux.intel.com
Fixes: 26b25a2b98 ("iommu: Bind process address spaces to devices")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:16 -08:00
Shakeel Butt
5ff592bec7 memcg: manually uninline __memcg_memory_event
__memcg_memory_event() has been unnecessarily marked inline even when it
is not really performance critical.  It is usually called to track extreme
conditions.  Over the time, it has evolved to include more functionality
and inlining it is causing more harm.

Before the patch:
$ size mm/memcontrol.o net/ipv4/tcp_input.o net/ipv4/tcp_output.o
   text    data     bss     dec     hex filename
  35645   10574    4192   50411    c4eb mm/memcontrol.o
  54738    1658       0   56396    dc4c net/ipv4/tcp_input.o
  34644    1065       0   35709    8b7d net/ipv4/tcp_output.o

After the patch:
$ size mm/memcontrol.o net/ipv4/tcp_input.o net/ipv4/tcp_output.o
   text    data     bss     dec     hex filename
  35137   10446    4192   49775    c26f mm/memcontrol.o
  54322    1562       0   55884    da4c net/ipv4/tcp_input.o
  34492    1017       0   35509    8ab5 net/ipv4/tcp_output.o

[akpm@linux-foundation.org: use EXPORT_SYMBOL_GPL for __memcg_memory_event, per Michal and Christoph]
Link: https://lkml.kernel.org/r/20251021234425.1885471-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:16 -08:00
Vishal Moola (Oracle)
a061578043 mm/vmalloc: request large order pages from buddy allocator
Sometimes, vm_area_alloc_pages() will want many pages from the buddy
allocator.  Rather than making requests to the buddy allocator for at most
100 pages at a time, we can eagerly request large order pages a smaller
number of times.

We still split the large order pages down to order-0 as the rest of the
vmalloc code (and some callers) depend on it.  We still defer to the bulk
allocator and fallback path in case of order-0 pages or failure.

Running 1000 iterations of allocations on a small 4GB system finds:

1000 2mb allocations:
	[Baseline]			[This patch]
	real    46.310s			real    0m34.582
	user    0.001s			user    0.006s
	sys     46.058s			sys     0m34.365s

10000 200kb allocations:
	[Baseline]			[This patch]
	real    56.104s			real    0m43.696
	user    0.001s			user    0.003s
	sys     55.375s			sys     0m42.995s

Link: https://lkml.kernel.org/r/20251021194455.33351-2-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:15 -08:00
Jing Su
645a3c4243 mm/vmstat: fix indentation in fold_diff function
Adjust misaligned braces in fold_diff() to improve code readability and
maintain consistent coding style.

[akpm@linux-foundation.org: add braces, per Vlastimil & Liam]
Link: https://lkml.kernel.org/r/aPc4I/8zXCGyiapN@pilot-ThinkCentre-M930t-N000
Signed-off-by: Jing Su <jingsusu@didiglobal.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:15 -08:00
William Kucharski
fe62415c9b mm: remove reference to destructor in comment in calculate_sizes()
The commit that removed support for destructors from kmem_cache_alloc()
never removed the comment regarding destructors in the explanation of the
possible relocation of the free pointer in calculate_sizes().

Link: https://lkml.kernel.org/r/20251021110004.2209008-1-william.kucharski@oracle.com
Fixes: 20c2df83d2 ("mm: Remove slab destructors from kmem_cache_create().")
Signed-off-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:15 -08:00
Leon Hwang
2da6fe91c2 mm/khugepaged: factor out common logic in [scan,alloc]_sleep_millisecs_store()
Both scan_sleep_millisecs_store() and alloc_sleep_millisecs_store()
perform the same operations: parse the input value, update their
respective sleep interval, reset khugepaged_sleep_expire, and wake up the
khugepaged thread.

Factor out this duplicated logic into a helper function
__sleep_millisecs_store(), and simplify both store functions.

No functional change intended.

Link: https://lkml.kernel.org/r/20251021134431.26488-1-leon.hwang@linux.dev
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Nico Pache <npache@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:15 -08:00
Swaraj Gaikwad
fae4e86024 mm/damon/sysfs: remove misleading todo comment in nid_show()
The TODO comment in nid_show() suggested returning an error if the goal
was not using nid.  However, this comment was found to be inaccurate and
misleading.This patch removes the TODO comment without changing any
existing behavior.

This change follows feedback from SJ who pointed out [1] that wiring-order
independence is expected and the function should simply show the last set
value.  and [2] checkpatch.pl complain about number of chars per line

No functional code changes were made.

Tested with KUnit:
- Built kernel with KUnit and DAMON sysfs tests enabled.
- Executed KUnit tests:
  ./tools/testing/kunit/kunit.py run --kunitconfig ./mm/damon/tests/
- All 25 tests passed, including damon_sysfs_test_add_targets.

Link: https://lkml.kernel.org/r/20251021215323.29734-2-swarajgaikwad1925@gmail.com
Link: https://lore.kernel.org/lkml/20251020151315.66260-1-sj@kernel.org/ [1]
Link: https://lore.kernel.org/lkml/20251021010847.68473-1-sj@kernel.org/ [2]
Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
Suggested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: David Hunter <david.hunter.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:14 -08:00
Mehdi Ben Hadj Khelifa
f0c74b6cb9 mm/vmalloc: use kmalloc_array() instead of kmalloc()
The number of NUMA nodes (nr_node_ids) is bounded, so overflow is not a
practical concern here.  However, using kmalloc_array() better reflects
the intent to allocate an array of unsigned ints, and improves consistency
with other NUMA-related allocations.

No functional change intended.

Link: https://lkml.kernel.org/r/20251018201207.27441-1-mehdi.benhadjkhelifa@gmail.com
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Khalid Aziz <khalid@kernel.org>
Cc: David Hunter <david.hunter.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:14 -08:00
Bagas Sanjaya
184c753342 vmalloc: separate gfp_mask adjunctive parentheses in __vmalloc_node_noprof() kernel-doc comment
Sphinx reports htmldocs warning on __vmalloc_node() comment:

Documentation/core-api/mm-api:52: ./mm/vmalloc.c:4036: WARNING: Inline strong start-string without end-string. [docutils]

Fix it by separating adjunctive parentheses from preceding gfp_mask
formatting markup.

Link: https://lkml.kernel.org/r/20251020044933.15222-1-bagasdotme@gmail.com
Fixes: 32904ba6f5ef ("vmalloc: update __vmalloc_node_noprof() documentation")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20251020134902.3a11107e@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Stehen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:14 -08:00
Lorenzo Stoakes
8247e2600e mm: update resctl to use mmap_prepare
Make use of the ability to specify a remap action within mmap_prepare to
update the resctl pseudo-lock to use mmap_prepare in favour of the
deprecated mmap hook.

Link: https://lkml.kernel.org/r/95b28b066f37ca25f56fa9460a9367f1a866f88b.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:14 -08:00
Lorenzo Stoakes
ab04945f91 mm: update mem char driver to use mmap_prepare
Update the mem char driver (backing /dev/mem and /dev/zero) to use
f_op->mmap_prepare hook rather than the deprecated f_op->mmap.

The /dev/zero implementation has a very unique and rather concerning
characteristic in that it converts MAP_PRIVATE mmap() mappings anonymous
when they are, in fact, not.

The new f_op->mmap_prepare() can support this, but rather than introducing
a helper function to perform this hack (and risk introducing other users),
utilise the success hook to do so.

We utilise the newly introduced shmem_zero_setup_desc() to allow for the
shared mapping case via an f_op->mmap_prepare() hook.

We also use the desc->action_error_hook to filter the remap error to
-EAGAIN to keep behaviour consistent.

Link: https://lkml.kernel.org/r/48f60764d7a6901819d1af778fa33b775d2e8c77.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:13 -08:00
Lorenzo Stoakes
89646d9c74 mm: add shmem_zero_setup_desc()
Add the ability to set up a shared anonymous mapping based on a VMA
descriptor rather than a VMA.

This is a prerequisite for converting to the char mm driver to use the
mmap_prepare hook.

Link: https://lkml.kernel.org/r/d9181517a7e3d6b014a5697c6990d3722c2c9fcd.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:13 -08:00
Lorenzo Stoakes
ea52cb24cd mm/hugetlbfs: update hugetlbfs to use mmap_prepare
Since we can now perform actions after the VMA is established via
mmap_prepare, use desc->action_success_hook to set up the hugetlb lock
once the VMA is setup.

We also make changes throughout hugetlbfs to make this possible.

Note that we must hide newly established hugetlb VMAs from the rmap until
the operation is entirely complete as we establish a hugetlb lock during
VMA setup that can be raced by rmap users.

Link: https://lkml.kernel.org/r/b1afa16d3cfa585a03df9ae215ae9f905b3f0ed7.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:13 -08:00
Lorenzo Stoakes
da003453dc doc: update porting, vfs documentation for mmap_prepare actions
Now we have introduced the ability to specify that actions should be taken
after a VMA is established via the vm_area_desc->action field as specified
in mmap_prepare, update both the VFS documentation and the porting guide
to describe this.

Link: https://lkml.kernel.org/r/472ce3da7662ed1065cc299d14bffb70b1a845e7.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:13 -08:00
Lorenzo Stoakes
ac0a3fc9c0 mm: add ability to take further action in vm_area_desc
Some drivers/filesystems need to perform additional tasks after the VMA is
set up.  This is typically in the form of pre-population.

The forms of pre-population most likely to be performed are a PFN remap
or the insertion of normal folios and PFNs into a mixed map.

We start by implementing the PFN remap functionality, ensuring that we
perform the appropriate actions at the appropriate time - that is setting
flags at the point of .mmap_prepare, and performing the actual remap at the
point at which the VMA is fully established.

This prevents the driver from doing anything too crazy with a VMA at any
stage, and we retain complete control over how the mm functionality is
applied.

Unfortunately callers still do often require some kind of custom action,
so we add an optional success/error _hook to allow the caller to do
something after the action has succeeded or failed.

This is done at the point when the VMA has already been established, so
the harm that can be done is limited.

The error hook can be used to filter errors if necessary.

There may be cases in which the caller absolutely must hold the file rmap
lock until the operation is entirely complete. It is an edge case, but
certainly the hugetlbfs mmap hook requires it.

To accommodate this, we add the hide_from_rmap_until_complete flag to the
mmap_action type. In this case, if a new VMA is allocated, we will hold the
file rmap lock until the operation is entirely completed (including any
success/error hooks).

Note that we do not need to update __compat_vma_mmap() to accommodate this
flag, as this function will be invoked from an .mmap handler whose VMA is
not yet visible, so we implicitly hide it from the rmap.

If any error arises on these final actions, we simply unmap the VMA
altogether.

Also update the stacked filesystem compatibility layer to utilise the
action behaviour, and update the VMA tests accordingly.

While we're here, rename __compat_vma_mmap_prepare() to __compat_vma_mmap()
as we are now performing actions invoked by the mmap_prepare in addition to
just the mmap_prepare hook.

Link: https://lkml.kernel.org/r/2601199a7b2eaeadfcd8ab6e199c6d1706650c94.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:12 -08:00
Lorenzo Stoakes
db91b78329 mm: introduce io_remap_pfn_range_[prepare, complete]()
We introduce the io_remap*() equivalents of remap_pfn_range_prepare() and
remap_pfn_range_complete() to allow for I/O remapping via mmap_prepare.

Make these internal to mm, as they should only be used by internal helpers.

Link: https://lkml.kernel.org/r/4065134f13a24a3e14691b7443bcee7490b18a5c.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:12 -08:00
Lorenzo Stoakes
c707a68f94 mm: abstract io_remap_pfn_range() based on PFN
The only instances in which we customise this function are ones in which we
customise the PFN used.

Instances where architectures were not passing the pgprot value through
pgprot_decrypted() are ones where pgprot_decrypted() was a no-op anyway, so
we can simply always pass pgprot through this function.

Use this fact to simplify the use of io_remap_pfn_range(), by abstracting
the PFN via io_remap_pfn_range_pfn() and using this instead of providing a
general io_remap_pfn_range() function per-architecture.

Link: https://lkml.kernel.org/r/d086191bf431b58ce3b231b4f4f555d080f60327.1760959442.git.lorenzo.stoakes@oracle.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:12 -08:00
Lorenzo Stoakes
51e38e7d40 mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
We need the ability to split PFN remap between updating the VMA and
performing the actual remap, in order to do away with the legacy f_op->mmap
hook.

To do so, update the PFN remap code to provide shared logic, and also make
remap_pfn_range_notrack() static, as its one user, io_mapping_map_user()
was removed in commit 9a4f90e246 ("mm: remove mm/io-mapping.c").

Then, introduce remap_pfn_range_prepare(), which accepts VMA descriptor
and PFN parameters, and remap_pfn_range_complete() which accepts the same
parameters as remap_pfn_rangte().

remap_pfn_range_prepare() will set the cow vma->vm_pgoff if necessary, so
it must be supplied with a correct PFN to do so.

While we're here, also clean up the duplicated #ifdef
__HAVE_PFNMAP_TRACKING check and put into a single #ifdef/#else block.

We keep these internal to mm as they should only be used by internal
helpers.

Link: https://lkml.kernel.org/r/75b55de63249b3aa0fd5b3b08ed1d3ff19255d0d.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:12 -08:00
Lorenzo Stoakes
2bcd9207de mm/vma: rename __mmap_prepare() function to avoid confusion
Now we have the f_op->mmap_prepare() hook, having a static function called
__mmap_prepare() that has nothing to do with it is confusing, so rename
the function to __mmap_setup().

Link: https://lkml.kernel.org/r/d25a22c60ca0f04091697ef9cda0d72ce0cf8af3.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:11 -08:00
Lorenzo Stoakes
651fdda840 relay: update relay to use mmap_prepare
It is relatively trivial to update this code to use the f_op->mmap_prepare
hook in favour of the deprecated f_op->mmap hook, so do so.

Link: https://lkml.kernel.org/r/7c9e82cdddf8b573ea3edb8cdb697363e3ccb5d7.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:11 -08:00
Lorenzo Stoakes
54c58a2f5f mm: add vma_desc_size(), vma_desc_pages() helpers
It's useful to be able to determine the size of a VMA descriptor range
used on f_op->mmap_prepare, expressed both in bytes and pages, so add
helpers for both and update code that could make use of it to do so.

Link: https://lkml.kernel.org/r/74ef338203c9ff08a9ace73a8f1f6116a79112a0.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:11 -08:00
Lorenzo Stoakes
cf1d98f44d mm/vma: remove unused function, make internal functions static
unlink_file_vma() is not used by anything, so remove it.

vma_link() and vma_link_file() are only used within mm/vma.c, so make them
static.

Link: https://lkml.kernel.org/r/f2ab9ea051225a02e6d1d45a7608f4e149220117.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:10 -08:00
Lorenzo Stoakes
8e18a7f435 device/dax: update devdax to use mmap_prepare
The devdax driver does nothing special in its f_op->mmap hook, so
straightforwardly update it to use the mmap_prepare hook instead.

Link: https://lkml.kernel.org/r/1e8665d052ac8cf2f7ff92b6c7862614f7fd306c.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:10 -08:00
Lorenzo Stoakes
ab3c8e7b86 mm/shmem: update shmem to use mmap_prepare
Patch series "expand mmap_prepare functionality, port more users", v5.

Since commit c84bf6dd2b ("mm: introduce new .mmap_prepare() file
callback"), The f_op->mmap hook has been deprecated in favour of
f_op->mmap_prepare.

This was introduced in order to make it possible for us to eventually
eliminate the f_op->mmap hook which is highly problematic as it allows
drivers and filesystems raw access to a VMA which is not yet correctly
initialised.

This hook also introduced complexity for the memory mapping operation, as
we must correctly unwind what we do should an error arises.

Overall this interface being so open has caused significant problems for
us, including security issues, it is important for us to simply eliminate
this as a source of problems.

Therefore this series continues what was established by extending the
functionality further to permit more drivers and filesystems to use
mmap_prepare.

We start by udpating some existing users who can use the mmap_prepare
functionality as-is.

We then introduce the concept of an mmap 'action', which a user, on
mmap_prepare, can request to be performed upon the VMA:

* Nothing - default, we're done
* Remap PFN - perform PFN remap with specified parameters
* I/O remap PFN - perform I/O PFN remap with specified parameters

By setting the action in mmap_prepare, this allows us to dynamically
decide what to do next, so if a driver/filesystem needs to determine
whether to e.g.  remap or use a mixed map, it can do so then change which
is done.

This significantly expands the capabilities of the mmap_prepare hook,
while maintaining as much control as possible in the mm logic.

We split [io_]remap_pfn_range*() functions which allow for PFN remap (a
typical mapping prepopulation operation) split between a prepare/complete
step, as well as io_mremap_pfn_range_prepare, complete for a similar
purpose.

From there we update various mm-adjacent logic to use this functionality
as a first set of changes.

We also add success and error hooks for post-action processing for e.g. 
output debug log on success and filtering error codes.


This patch (of 15):

This simply assigns the vm_ops so is easily updated - do so.

Link: https://lkml.kernel.org/r/cover.1760959441.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/7b93b1e89028e39507dac5ca01991e1374d5bbe8.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:10 -08:00
Quanmin Yan
dfc02531f4 mm/damon/reclaim: use min_sz_region for core address alignment when setting regions
When setting regions in DAMON_RECLAIM, DAMON_MIN_REGION will be applied as
the core address alignment, and the monitoring target address ranges would
be aligned on DAMON_MIN_REGION * addr_unit.  When users 1) set addr_unit
to a value larger than 1, and 2) set the monitoring target address range
as not aligned on DAMON_MIN_REGION * addr_unit, it will cause
DAMON_RECLAIM to operate on unexpectedly large physical address ranges.

For example, if the user sets the monitoring target address range to [4,
8) and addr_unit as 1024, the aimed monitoring target address range is [4
KiB, 8 KiB).  Assuming DAMON_MIN_REGION is 4096, so resulting target
address range will be [0, 4096) in the DAMON core layer address system,
and [0, 4 MiB) in the physical address space, which is an unexpected
range.

To fix the issue, use min_sz_region for core address alignment when
setting regions.

Link: https://lkml.kernel.org/r/20251020130125.2875164-3-yanquanmin1@huawei.com
Fixes: 7db551fcfb ("mm/damon/reclaim: support addr_unit for DAMON_RECLAIM")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:10 -08:00
Quanmin Yan
e859a224fa mm/damon: add a min_sz_region parameter to damon_set_region_biggest_system_ram_default()
Patch series "mm/damon: fixes for address alignment issues in
DAMON_LRU_SORT and DAMON_RECLAIM", v2.

In DAMON_LRU_SORT and DAMON_RECLAIM, damon_set_regions() will apply
DAMON_MIN_REGION as the core address alignment, and the monitoring target
address ranges would be aligned on DAMON_MIN_REGION * addr_unit.  When
users 1) set addr_unit to a value larger than 1, and 2) set the monitoring
target address range as not aligned on DAMON_MIN_REGION * addr_unit, it
will cause DAMON_LRU_SORT and DAMON_RECLAIM to operate on unexpectedly
large physical address ranges.

For example, if the user sets the monitoring target address range to [4,
8) and addr_unit as 1024, the aimed monitoring target address range is [4
KiB, 8 KiB).  Assuming DAMON_MIN_REGION is 4096, so resulting target
address range will be [0, 4096) in the DAMON core layer address system,
and [0, 4 MiB) in the physical address space, which is an unexpected
range.

To fix the issue, add a min_sz_region parameter to
damon_set_region_biggest_system_ram_default() and use it when calling
damon_set_regions(), replacing the direct use of DAMON_MIN_REGION.


This patch (of 2):

In DAMON_LRU_SORT, damon_set_regions() will apply DAMON_MIN_REGION as the
core address alignment, and the monitoring target address ranges would be
aligned on DAMON_MIN_REGION * addr_unit.  When users 1) set addr_unit to a
value larger than 1, and 2) set the monitoring target address range as not
aligned on DAMON_MIN_REGION * addr_unit, it will cause DAMON_LRU_SORT to
operate on unexpectedly large physical address ranges.

For example, if the user sets the monitoring target address range to [4,
8) and addr_unit as 1024, the aimed monitoring target address range is [4
KiB, 8 KiB).  Assuming DAMON_MIN_REGION is 4096, so resulting target
address range will be [0, 4096) in the DAMON core layer address system,
and [0, 4 MiB) in the physical address space, which is an unexpected
range.

To fix the issue, add a min_sz_region parameter to
damon_set_region_biggest_system_ram_default() and use it when calling
damon_set_regions(), replacing the direct use of DAMON_MIN_REGION.

Link: https://lkml.kernel.org/r/20251020130125.2875164-1-yanquanmin1@huawei.com
Link: https://lkml.kernel.org/r/20251020130125.2875164-2-yanquanmin1@huawei.com
Fixes: 2e0fe9245d ("mm/damon/lru_sort: support addr_unit for DAMON_LRU_SORT")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:09 -08:00
Lance Yang
074f027d15 mm/khugepaged: guard is_zero_pfn() calls with pte_present()
A non-present entry, like a swap PTE, contains completely different data
(swap type and offset).  pte_pfn() doesn't know this, so if we feed it a
non-present entry, it will spit out a junk PFN.

What if that junk PFN happens to match the zeropage's PFN by sheer chance?
While really unlikely, this would be really bad if it did.

So, let's fix this potential bug by ensuring all calls to is_zero_pfn() in
khugepaged.c are properly guarded by a pte_present() check.

Link: https://lkml.kernel.org/r/20251020151111.53561-1-lance.yang@linux.dev
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:09 -08:00
SeongJae Park
40d923acfa Docs/ABI/damon: document DAMOS quota goal path file
A DAMON sysfs interface file for DAMOS quota goal's optional path argument
has been added.  Document it on the ABI doc.

Link: https://lkml.kernel.org/r/20251017212706.183502-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:09 -08:00
SeongJae Park
87b8351580 Docs/admin-guide/mm/damon/usage: document DAMOS quota goal path file
A new DAMON sysfs interface file, namely 'path' has been added under DAMOS
quota goal directory, for specifying the cgroup for
DAMOS_QUOTA_NODE_MEMCG_{USED,FREE}_BP metrics.  Document it on the usage
document.

Link: https://lkml.kernel.org/r/20251017212706.183502-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:09 -08:00
SeongJae Park
4cc00d41c6 Docs/mm/damon/design: document DAMOS_QUOTA_NODE_MEMCG_{USED,FREE}_BP
Update design doc for the newly added two DAMOS quota auto-tuning target
goal metrics, DAMOS_QUOTA_NODE_MEMCG_{USED,FREE}_BP.

Link: https://lkml.kernel.org/r/20251017212706.183502-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:08 -08:00
SeongJae Park
c2fbf2da4c mm/damon/sysfs-schemes: support DAMOS_QUOTA_NODE_MEMCG_FREE_BP
Extend DAMON sysfs to support DAMOS_QUOTA_NODE_MEMCG_FREE_BP.

Link: https://lkml.kernel.org/r/20251017212706.183502-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:08 -08:00
SeongJae Park
98fdce76fb mm/damon/core: add DAMOS quota gaol metric for per-memcg per-numa free memory
Add a variant of DAMOS_QUOTA_NODE_MEMCG_USED_BP, for the free memory
portion.  The value of the metric is implemented as the entire memory of
the given NUMA node subtracted by the given cgroup's usage.  So from a
perspective, "unused" could be a better term than "free".  But arguably it
is not very clear what is better, so use the term "free".

Link: https://lkml.kernel.org/r/20251017212706.183502-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:08 -08:00
SeongJae Park
a1d1df78ac mm/damon/sysfs-schemes: support DAMOS_QUOTA_NODE_MEMCG_USED_BP
Add support of DAMOS_QUOTA_NODE_MEMCG_USED_BP.  For this, extend quota
goal metric inputs for the new metric, and update DAMOS core layer request
construction logic to set the target cgroup, which is specified by the
user, via the 'path' file.

Link: https://lkml.kernel.org/r/20251017212706.183502-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:08 -08:00
SeongJae Park
c41e253a41 mm/damon/sysfs-schemes: implement path file under quota goal directory
Add a DAMOS sysfs file for specifying the cgroup of the interest for
DAMOS_QUOTA_NODE_MEMCG_USED_BP.

Link: https://lkml.kernel.org/r/20251017212706.183502-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:07 -08:00
SeongJae Park
b74a120bcf mm/damon/core: implement DAMOS_QUOTA_NODE_MEMCG_USED_BP
Implement the handling of the new DAMOS quota goal metric for per-memcg
per-node memory usage, namely DAMOS_QUOTA_NODE_MEMCG_USED_BP.  The metric
value is calculated as the sum of active/inactive anon/file pages of the
given cgroup for a given NUMA node.

Link: https://lkml.kernel.org/r/20251017212706.183502-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:07 -08:00
SeongJae Park
6a18bbe483 mm/damon: add DAMOS quota goal type for per-memcg per-node memory usage
Define a new DAMOS quota auto-tuning target metric for per-cgroup per-node
memory usage.  For specifying the cgroup of the interest, add a field,
namely memcg_id, to damos_quota_goal struct.

Note that this commit is only implementing the interface.  The handling of
the interface (the metric value calculation) will be implemented in the
following commit.

Link: https://lkml.kernel.org/r/20251017212706.183502-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:07 -08:00
SeongJae Park
d3946c5f4c mm/damon: document damos_quota_goal->nid use case
Patch series "mm/damon: allow DAMOS auto-tuned for per-memcg per-node
memory usage".

Introduce two new DAMOS quota auto-tuning target metrics for per-cgroup
per-NUMA node memory utilization.  Expected use cases are cgroup level
access-aware NUMA memory managements, such as memory tiering or proactive
reclamation on cgroup-based multi-tenant NUMA systems.

Background
==========

The aim-oriented aggressiveness auto-tuning feature of DAMOS is a highly
recommended way for modern DAMOS use cases.  Using it, users can specify
what system status they want to achieve with what access-aware system
operations.  For example, reclaim cold memory aiming for 0.5 percent of
memory pressure (proactive reclaim), or migrate hot and cold memory
between NUMA nodes having different speed (memory tiering).  Then DAMOS
automatically adjusts the aggressiveness of the system operation (e.g.,
increase/decrease reclaim target coldness threshold) based on current
status of the system.

The use case is limited by the supported system status metrics for
specifying the target system status.  Two new system metrics for per-node
memory usage ratio, namely DAMOS_QUOTA_NODE_MEM_{USED,FREE}_BP, were
recently added to extend the use cases for access-aware NUMA nodes
management, such as memory tiering.  Those are expected to be useful for
not only memory tiering but also general access-aware inter-NUMA node page
migration, though.

Limitation
----------

The per-node memory usage based auto-tuning can be applied only
system-wide.  For cgroups-based multi-tenant systems, it could arguably
harm the fairness.  For example, a cgroup may use faster NUMA node memory
more than other cgroup, depending on their access pattern.  If the user of
each cgroup are promised to get the same quality and amount of the system
resource, this can arguably be an unfair situation.

DAMOS supports cgroup level system operations via DAMOS filter.  But the
quota auto-tuning system is not aware of cgroups.

New DAMOS Quota Tuning Metrics for Per-Cgroup Per-NUMA Memory Usage
===================================================================

To overcome the limitation, introduce two new DAMOS quota auto-tuning goal
metrics, namely DAMOS_QUOTA_NODE_MEMCG_{USED,FREE}_BP.  Those can be
thought of as a variant of DAMOS_QUOTA_NODE_MEM_{USED,FREE}_BP that
extended for cgroups.

The two metrics specifies per-cgroup, per-node amount of used and unused
memory in ratio to the total memory of the node.  For example, let's
assume a system has two NUMA nodes of size 100 GiB and 50 GiB.  And two
cgroups are using 40 GiB and 60 GiB of node 0, 20 GiB and 10 GiB of node
1, respectively, as illustrated by the below table.

                     node-0    node-1
    Total memory     100 GiB   50 GiB
    Cgroup A usage   40 GiB    20 GiB
    Cgroup B usage   60 GiB    10 GiB

Then, DAMOS_QUOTA_NODE_MEMCG_USED_BP for the cgroups for the first node
are, 40 GiB / 100 GiB = 4,000 bp (40 percent) and 60 GiB / 100 GiB = 6,000
bp (60 percent), respectively.  Those for the second node are, 20 GiB / 50
GiB = 4000 bp (40 percent) and 10 GiB / 50 GiB = 2000 bp (20 percent),
respectively.

DAMOS_QUOTA_NODE_MEMCG_FREE_BP for the four cases are, 60 GiB /100 GiB =
6000 bp, 40 GiB / 100 GiB = 4000 bp, 30 GiB / 50 GiB = 6000 bp, and 40 GiB
/ 50 GiB = 8000 bp, respectively.

    DAMOS_QUOTA_NODE_MEMCG_USED_BP for cgroup A node-0: 4000 bp
    DAMOS_QUOTA_NODE_MEMCG_USED_BP for cgroup B node-0: 6000 bp
    DAMOS_QUOTA_NODE_MEMCG_USED_BP for cgroup A node-1: 4000 bp
    DAMOS_QUOTA_NODE_MEMCG_USED_BP for cgroup B node-1: 2000 bp

    DAMOS_QUOTA_NODE_MEMCG_FREE_BP for cgroup A node-0: 6000 bp
    DAMOS_QUOTA_NODE_MEMCG_FREE_BP for cgroup B node-0: 4000 bp
    DAMOS_QUOTA_NODE_MEMCG_FREE_BP for cgroup A node-1: 6000 bp
    DAMOS_QUOTA_NODE_MEMCG_FREE_BP for cgroup B node-1: 8000 bp

Using these, users can specify how much [un]used amount of memory for
per-cgroup and per-node DAMOS should make as a result of the auto-tuning.

Example Usecase: Cgroup Level Memory Tiering
============================================

Let's suppose a typical and simple tiered memory system.  The system
equips two NUMA nodes.  The first node (node 0) is CPU-attached and fast. 
The second node (node 1) is CPU-unattached and slow.  It runs two cgroups
that desire to use about 30 percent and 70 percent of the faster node as
much as possible for their hot data, respectively.  Then, the user can
implement DAMOS-based memory tiering for the system using the DAMON
user-space tool (damo), like below.

    # ./damo start \
    	`# kdamond for node 1 (slow)` \
        --numa_node 1 --monitoring_intervals_goal 4% 3 5ms 10s \
	    `# promotion scheme for cgroup a` \
            --damos_action migrate_hot 0 --damos_access_rate 5% max \
            --damos_apply_interval 1s \
	    --damos_filter allow memcg /workloads/a \
            --damos_filter allow young \
            --damos_quota_interval 1s --damos_quota_space 200MB \
            --damos_quota_goal node_memcg_used_bp 29.7% 0 /workloads/a \
	    \
	    `# promotion scheme for cgroup b` \
            --damos_action migrate_hot 0 --damos_access_rate 5% max \
            --damos_apply_interval 1s \
	    --damos_filter allow memcg /workloads/b \
            --damos_filter allow young \
            --damos_quota_interval 1s --damos_quota_space 200MB \
            --damos_quota_goal node_memcg_used_bp 69.7% 0 workloads/b \
	    \
    	`# kdamond for node 0 (fast)` \
        --numa_node 0 --monitoring_intervals_goal 4% 3 5ms 10s \
            `# demotion scheme for cgroup a` \
            --damos_action migrate_cold 1 --damos_access_rate 0% 0% \
            --damos_apply_interval 1s \
	    --damos_filter allow memcg /workloads/a \
            --damos_filter reject young \
            --damos_quota_interval 1s --damos_quota_space 200MB \
            --damos_quota_goal node_memcg_free_bp 70.5% 0 \
	    \
            `# demotion scheme for cgroup b` \
            --damos_action migrate_cold 1 --damos_access_rate 0% 0% \
            --damos_apply_interval 1s \
	    --damos_filter allow memcg /workloads/a \
            --damos_filter reject young \
            --damos_quota_interval 1s --damos_quota_space 200MB \
            --damos_quota_goal node_memcg_free_bp 30.5% 0 \
	    \
            --damos_nr_quota_goals 1 1 1 1 --damos_nr_filters 1 1 1 1 \
        --nr_targets 1 1 --nr_schemes 2 2 --nr_ctxs 1 1

With the command, the user-space tool will ask DAMON to spawn two kernel
threads, each for monitoring accesses to node 1 (slow) and node 0 (fast),
respectively.  It installs two DAMOS schemes on each thread.  Let's call
them "promotion scheme for cgroup a/b", and "demotion scheme for cgroup
a/b" in the order.  The promotion schemes are installed on the DAMON
thread for node 1 (slow), and demotion schemes are installed on the DAMON
thread for node 0 (fast).

Cgroup Level Hot Pages Migration (Promotion)
--------------------------------------------

Promotion schemes will find memory regions on node 1 (slow), that some
access was detected.  The schemes will then migrate the found memory to
node 0 (fast), hottest pages first.

For accurate and effective migration, these schemes use two page level
filters.  First, the migration will be filtered for only cgroup A and
cgroup B.  That is, "promotion scheme for cgroup B" will not do the
migration if the page is for cgroup A.  Secondly, the schemes will ignore
pages that having their page table's Accessed bits unset.  The per-page
Accessed bit check logic will also unset the bit if it was set, for the
next check.

For controlled amounts of system resource consumption and aiming on the
target memory usage, the schemes use quotas setup.  The migration is
limited to be done only up to 200 MiB per second, to limit the peak system
resource usage.  And DAMOS_QUOTA_NODE_MEMCG_USED_BP target is set for
29.7% and 69.7% of node 0 (fast), respectively.  The target value is lower
than the high level goal (30% and 70% system memory), to give headroom on
node 0 (fast).  DAMOS will adjust the speed of the pages migration based
on the target and current per-cgroup node 0 memory usage.  For example, if
cgroup A is utilizing only 10% of node 0, DAMOS will try to migrate more
of cgroup A hot pages from node 1 to node 0, up to 200 MiB per second.  If
cgroup A utilizes more than 29.7% of node 0 memory, the cgroup A hot pages
migration from node 1 to node 0 will be slowed and eventually stopped.

Cgroup Level Cold Pages Migration (Demotion)
--------------------------------------------

Demotion schemes are similar to promotion schemes, but differ in filtering
setup and quota tuning setup.  Those filter out pages having their page
table Accessed bits set.  And set 70.5% and 30.5% of node 0 memory free
rate for the cgroup A and B, respectively.  Hence, if promotion schemes or
something made cgroup A and/or B uses more than 29.5% and 69.5% of node 0,
demotion schemes will start migrating cold pages of appropriate cgroups in
node 0 to node 1, under the 200 MiB per second speed cap, while adjusting
the speed based on how much more than wanted memory is being used.

The quota target values are set to overlap with promotion targets, to keep
a minimum level of page exchanges between the nodes.  This is to avoid a
case that the target memory utilization is met, and then access pattern
changes (pages in node 1 become hotter than pages in node 0) while the
memory utilization is unchanged.  Without the overlap, neither promotion
of hotter pages in node 1, nor demotion of colder pages in node 0 will
happen since both goals are met.  As a result, the faster and slower node
will unexpectedly serve cold and hot data.

Test: Per-cgroup Memory Tiering
===============================

I ran a simplified cgroup level memory tiering using the feature, and
confirmed it works as intended.

Setup
-----

I configured a QEMU virtual machine representing a simplified version of
the system that described on the above cgroup level memory tiering example
use case.  The system equips 40 CPU cores and two NUMA nodes each having
30 GiB physical memory.  The first node (node 0) represents the faster
NUMA node, and the second node (node 1) represents the slower NUMA node. 
In specific, below qemu command line options are used.

    [...]
    -object memory-backend-ram,size=30G,id=m0 \
    -object memory-backend-ram,size=30G,id=m1 \
    -numa node,cpus=0-39,memdev=m0 \
    -numa node,memdev=m1 \
    [...]

I booted the virtual machine with a kernel that this patch series is
applied.  On the virtual machine, I created two cgroups, namely workload_a
and workload_b.  And ran a test program in each cgroup, resulting in one
process per cgroup.  The test program allocates 10 GiB memory and evenly
split it into 10 regions.  After the allocation, it repeatedly access the
first region for one minute, than the second one for one minute, and so
on.  After the one minute repeated access for the 10-th region is done, it
repeats the access from the first region.  So the process has 10 GiB of
data in total, but only 1 GiB of it is hot at a given moment, and the hot
data is gradually changed.

While the processes are running, run DAMON for a simple access-aware
memory tiering using below script.  It migrates hot and cold data of the
cgroups into node 0 and node 1, aiming the first and the second cgroups
(workload_a and workload_b, respectively) utilizing about 9.7 percent and
19.7 percent of node 0, respectively.

Note that this setup is a simplified version of the above example use
case, for ease of test.  Also note that we assigned 30 GiB physical memory
to node 0, but DAMON in this setup works for only 27 GiB of the memory. 
It is due to an internal implementation detail of DAMON user-space tool
that not really important for this test.

    #!/bin/bash
    damo start \
        --numa_node 1 \
            --damos_action migrate_hot 0 --damos_access_rate 5% max \
                --damos_apply_interval 1s \
                --damos_filter allow memcg /workload_a \
                --damos_filter allow young \
                --damos_quota_interval 1s \
                --damos_quota_goal node_memcg_used_bp 9.7% 0 /workload_a \
            --damos_action migrate_hot 0 --damos_access_rate 5% max \
                --damos_apply_interval 1s \
                --damos_filter allow memcg /workload_b \
                --damos_filter allow young \
                --damos_quota_interval 1s \
                --damos_quota_goal node_memcg_used_bp 19.7% 0 /workload_b \
        --numa_node 0 \
            --damos_action migrate_cold 1 --damos_access_rate 0% 0% \
                --damos_apply_interval 1s \
                --damos_filter allow memcg /workload_a \
                --damos_filter reject young \
                --damos_quota_interval 1s \
                --damos_quota_goal node_memcg_free_bp 90.5% 0 /workload_a \
            --damos_action migrate_cold 1 --damos_access_rate 0% 0% \
                --damos_apply_interval 1s \
                --damos_filter allow memcg /workload_b \
                --damos_filter reject young \
                --damos_quota_interval 1s \
                --damos_quota_goal node_memcg_free_bp 80.5% 0 /workload_b \
                --damos_nr_quota_goals 1 1 1 1 --damos_nr_filters 2 2 2 2 \
        --nr_targets 1 1 --nr_schemes 2 2 --nr_ctxs 1 1

After starting DAMON, the pages continuously be migrated across nodes.  A
few minutes later, the memory usage of the cgroups converges into the
aimed amounts, and keeps the level, as expected.  To confirm the status is
kept in the target level as expected, I collected the memory usage stat of
the cgroups using memory.numa_stat file, after the stats are converged.  I
repeat the stat collection 42 times with 5 seconds delay between each of
the collections.  The results are as below:

    node0_memory_usage  average  stdev
    workload_a          2.79GiB  522.06MiB
    workload_b          5.15GiB  739.10MiB

The average values are quite close to the targeted values: 27 GiB * 9.7% =
2.619 GiB for workload_a, and 27 GiB * 19.7% = 5.319 GiB.  A level of
variances are expected, given the overlap of the promotion/demotion
targets, and dynamic data access pattern of the workloads.  Give that, the
measured variances are at a reasonable level.

Patches Sequence
================

The first patch (patch 1) updates the kernel-doc comment of
damos_quota_goal struct to clarify usage of optional fields of the struct,
since later patches will add such optional fields.

Following four patches (patches 2-5) implement a new DAMOS quota goal
metric for per-cgroup per-node memory usage.  Those extends the core layer
interface for the new metric (patch 2), implement the metric value
calculation on the core layer (patch 3), add DAMON sysfs interface file
for the target cgroup specification (patch 4), and implement support of
the new metric on DAMON sysfs interface (patch 5).

Next two patches implment the second new DAMOS quota goal metric for
per-cgroup per-node free (or, unused) memory.  Those implement it in the
core layer (patch 6) and DAMON sysfs interface (patch 7), extending the
existing implementation for memory usage metric.

Final three patches update the design (patch 8), the usage (patch 9), and
the ABI (patch 10) documents for the changes that are introduced by this
patch series.


This patch (of 10):

damos_quota_goal kerneldoc comment is not explaining when @metric is used.
Update the comment for that.

Link: https://lkml.kernel.org/r/20251017212706.183502-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251017212706.183502-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:07 -08:00
Baolin Wang
2f05435df9 mm: vmscan: simplify the logic for activating dirty file folios
After commit 6b0dfabb35 ("fs: Remove aops->writepage"), we no longer
attempt to write back filesystem folios through reclaim.

However, in the shrink_folio_list() function, there still remains some
logic related to writeback control of dirty file folios.  The original
logic was that, for direct reclaim, or when folio_test_reclaim() is false,
or the PGDAT_DIRTY flag is not set, the dirty file folios would be
directly activated to avoid being scanned again; otherwise, it will try to
writeback the dirty file folios.  However, since we can no longer perform
writeback on dirty folios, the dirty file folios will still be activated.

Additionally, under the original logic, if we continue to try writeback
dirty file folios, we will also check the references flag,
sc->may_writepage, and may_enter_fs(), which may result in dirty file
folios being left in the inactive list.  This is unreasonable.  Even if
these dirty folios are scanned again, we still cannot clean them.

Therefore, the checks on these dirty file folios appear to be redundant
and can be removed.  Dirty file folios should be directly moved to the
active list to avoid being scanned again.  Since we set the PG_reclaim
flag for the dirty folios, once the writeback is completed, they will be
moved back to the tail of the inactive list to be retried for quick
reclaim.

Link: https://lkml.kernel.org/r/ba5c49955fd93c6850bcc19abf0e02e1573768aa.1760687075.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:06 -08:00
Baolin Wang
b34619af9c mm: vmscan: filter out the dirty file folios for node_reclaim()
Patch series "optimize the logic for handling dirty file folios during
reclaim", v2.

Since we no longer attempt to write back filesystem folios during reclaim,
some logic for handling dirty file folios in the reclaim process also
needs to be updated.  Please check the details in each patch.  


This patch (of 2):

After commit 6b0dfabb35 ("fs: Remove aops->writepage"), we no longer
attempt to write back filesystem folios in pageout(), and only tmpfs/shmem
folios and anonymous swapcache folios can be written back.  Therefore, we
should also filter out the dirty filesystem folios for node_reclaim() to
avoid unnecessary LRU scans.

Link: https://lkml.kernel.org/r/cover.1760687075.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/c91f5ecc5152b647904c7503618a01885d913928.1760687075.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:06 -08:00
Ye Liu
5bf65d4a8d tools/mm/page_owner_sort: add help option support
Add -h/--help option to display usage information and improve code style.

Link: https://lkml.kernel.org/r/20251016054927.138510-1-ye.liu@linux.dev
Signed-off-by: Ye Liu <liuye@kylinos.cn>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:06 -08:00
Shakeel Butt
d929525c2e memcg: net: track network throttling due to memcg memory pressure
The kernel can throttle network sockets if the memory cgroup associated
with the corresponding socket is under memory pressure.  The throttling
actions include clamping the transmit window, failing to expand receive or
send buffers, aggressively prune out-of-order receive queue, FIN deferred
to a retransmitted packet and more.  Let's add memcg metric to track such
throttling actions.

At the moment memcg memory pressure is defined through vmpressure and in
future it may be defined using PSI or we may add more flexible way for the
users to define memory pressure, maybe through ebpf.  However the
potential throttling actions will remain the same, so this newly
introduced metric will continue to track throttling actions irrespective
of how memcg memory pressure is defined.

Link: https://lkml.kernel.org/r/20251016161035.86161-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Daniel Sedlak <daniel.sedlak@cdn77.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:06 -08:00
wang lian
a059ad48b4 mm/khugepaged: fix comment for default scan sleep duration
The comment for khugepaged_scan_sleep_millisecs incorrectly states the
default scan period is 30 seconds.  The actual default value in the code
is 10000ms (10 seconds).

This patch corrects the comment to match the code, preventing potential
confusion.  The incorrect comment has existed since the feature was first
introduced.  While at it, replace the magic value 512 by HPAGE_PMD_NR and
use 'ptes'.

Link: https://lkml.kernel.org/r/20251015092957.37432-1-lianux.mm@gmail.com
Signed-off-by: wang lian <lianux.mm@gmail.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Nico Pache <npache@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:05 -08:00
Ye Liu
2f79ddb64b tools/mm: use <stdbool.h> in page_owner_sort.c
Use standard <stdbool.h> instead of manually defining bool, true and false.

Link: https://lkml.kernel.org/r/20251015093851.109663-1-ye.liu@linux.dev
Signed-off-by: Ye Liu <liuye@kylinos.cn>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:05 -08:00
Vlastimil Babka
0f21b91101 mm/page_alloc: simplify and cleanup pcp locking
The pcp locking relies on pcp_spin_trylock() which has to be used together
with pcp_trylock_prepare()/pcp_trylock_finish() to work properly on !SMP
!RT configs.  This is tedious and error-prone.

We can remove pcp_spin_lock() and underlying pcpu_spin_lock() because we
don't use it.  Afterwards pcp_spin_unlock() is only used together with
pcp_spin_trylock().  Therefore we can add the UP_flags parameter to them
both and handle pcp_trylock_prepare()/finish() within.

Additionally for the configs where pcp_trylock_prepare()/finish() are
no-op (SMP || RT) make them pass &UP_flags to a no-op inline function. 
This ensures typechecking and makes the local variable "used" so we can
remove the __maybe_unused attributes.

In my compile testing, bloat-o-meter reported no change on SMP config, so
the compiler is capable of optimizing away the no-ops same as before, and
we have simplified the code using pcp_spin_trylock().

Link: https://lkml.kernel.org/r/20251015-b4-pcp-lock-cleanup-v2-1-740d999595d5@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:05 -08:00
Joshua Hahn
91e6912966 mm/page_alloc: batch page freeing in free_frozen_page_commit
Before returning, free_frozen_page_commit calls free_pcppages_bulk using
nr_pcp_free to determine how many pages can appropritately be freed, based
on the tunable parameters stored in pcp.  While this number is an accurate
representation of how many pages should be freed in total, it is not an
appropriate number of pages to free at once using free_pcppages_bulk,
since we have seen the value consistently go above 2000 in the Meta fleet
on larger machines.

As such, perform batched page freeing in free_pcppages_bulk by using
pcp->batch.  In order to ensure that other processes are not starved of
the zone lock, free both the zone lock and pcp lock to yield to other
threads.

Note that because free_frozen_page_commit now performs a spinlock inside
the function (and can fail), the function may now return with a freed pcp.
To handle this, return true if the pcp is locked on exit and false
otherwise.

In addition, since free_frozen_page_commit must now be aware of what UP
flags were stored at the time of the spin lock, and because we must be
able to report new UP flags to the callers, add a new unsigned long*
parameter UP_flags to keep track of this.

The following are a few synthetic benchmarks, made on three machines.  The
first is a large machine with 754GiB memory and 316 processors.  The
second is a relatively smaller machine with 251GiB memory and 176
processors.  The third and final is the smallest of the three, which has
62GiB memory and 36 processors.

On all machines, I kick off a kernel build with -j$(nproc).  Negative
delta is better (faster compilation)

Large machine (754GiB memory, 316 processors)
make -j$(nproc)
+------------+---------------+-----------+
| Metric (s) | Variation (%) | Delta(%)  |
+------------+---------------+-----------+
| real       |        0.8070 |  - 1.4865 |
| user       |        0.2823 |  + 0.4081 |
| sys        |        5.0267 |  -11.8737 |
+------------+---------------+-----------+

Medium machine (251GiB memory, 176 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.2806 |  +0.0351 |
| user       |        0.0994 |  +0.3170 |
| sys        |        0.6229 |  -0.6277 |
+------------+---------------+----------+

Small machine (62GiB memory, 36 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.1503 |  -2.6585 |
| user       |        0.0431 |  -2.2984 |
| sys        |        0.1870 |  -3.2013 |
+------------+---------------+----------+

Here, variation is the coefficient of variation, i.e.  standard deviation
/ mean.

[joshua.hahnjy@gmail.com: simplify checks]
  Link: https://lkml.kernel.org/r/20251014192827.851389-1-joshua.hahnjy@gmail.com
Link: https://lkml.kernel.org/r/20251014145011.3427205-4-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Suggested-by: Chris Mason <clm@fb.com>
Co-developed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:05 -08:00
Joshua Hahn
fc4b909c36 mm/page_alloc: batch page freeing in decay_pcp_high
It is possible for pcp->count - pcp->high to exceed pcp->batch by a lot. 
When this happens, we should perform batching to ensure that
free_pcppages_bulk isn't called with too many pages to free at once and
starve out other threads that need the pcp or zone lock.

Since we are still only freeing the difference between the initial
pcp->count and pcp->high values, there should be no change to how many
pages are freed.

Link: https://lkml.kernel.org/r/20251014145011.3427205-3-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Suggested-by: Chris Mason <clm@fb.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Co-developed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:04 -08:00
Joshua Hahn
0acc67c403 mm/page_alloc/vmstat: simplify refresh_cpu_vm_stats change detection
Patch series "mm/page_alloc: Batch callers of free_pcppages_bulk", v5.

Motivation & Approach
=====================

While testing workloads with high sustained memory pressure on large
machines in the Meta fleet (1Tb memory, 316 CPUs), we saw an unexpectedly
high number of softlockups.  Further investigation showed that the zone
lock in free_pcppages_bulk was being held for a long time, and was called
to free 2k+ pages over 100 times just during boot.

This causes starvation in other processes for the zone lock, which can
lead to the system stalling as multiple threads cannot make progress
without the locks.  We can see these issues manifesting as warnings:

[ 4512.591979] rcu: INFO: rcu_sched self-detected stall on CPU
[ 4512.604370] rcu:     20-....: (9312 ticks this GP) idle=a654/1/0x4000000000000000 softirq=309340/309344 fqs=5426
[ 4512.626401] rcu:              hardirqs   softirqs   csw/system
[ 4512.638793] rcu:      number:        0        145            0
[ 4512.651177] rcu:     cputime:       30      10410          174   ==> 10558(ms)
[ 4512.666657] rcu:     (t=21077 jiffies g=783665 q=1242213 ncpus=316)

While these warnings don't indicate a crash or a kernel panic, they do
point to the underlying issue of lock contention.  To prevent starvation
in both locks, batch the freeing of pages using pcp->batch.

Because free_pcppages_bulk is called with the pcp lock and acquires the
zone lock, relinquishing and reacquiring the locks are only effective when
both of them are broken together (unless the system was built with queued
spinlocks).  Thus, instead of modifying free_pcppages_bulk to break both
locks, batch the freeing from its callers instead.

A similar fix has been implemented in the Meta fleet, and we have seen
significantly less softlockups.

Testing
=======
The following are a few synthetic benchmarks, made on three machines. The
first is a large machine with 754GiB memory and 316 processors.
The second is a relatively smaller machine with 251GiB memory and 176
processors. The third and final is the smallest of the three, which has 62GiB
memory and 36 processors.

On all machines, I kick off a kernel build with -j$(nproc).
Negative delta is better (faster compilation).

Large machine (754GiB memory, 316 processors)
make -j$(nproc)
+------------+---------------+-----------+
| Metric (s) | Variation (%) | Delta(%)  |
+------------+---------------+-----------+
| real       |        0.8070 |  - 1.4865 |
| user       |        0.2823 |  + 0.4081 |
| sys        |        5.0267 |  -11.8737 |
+------------+---------------+-----------+

Medium machine (251GiB memory, 176 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.2806 |  +0.0351 |
| user       |        0.0994 |  +0.3170 |
| sys        |        0.6229 |  -0.6277 |
+------------+---------------+----------+

Small machine (62GiB memory, 36 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.1503 |  -2.6585 |
| user       |        0.0431 |  -2.2984 |
| sys        |        0.1870 |  -3.2013 |
+------------+---------------+----------+

Here, variation is the coefficient of variation, i.e.  standard deviation
/ mean.

Based on these results, it seems like there are varying degrees to how
much lock contention this reduces.  For the largest and smallest machines
that I ran the tests on, it seems like there is quite some significant
reduction.  There is also some performance increases visible from
userspace.

Interestingly, the performance gains don't scale with the size of the
machine, but rather there seems to be a dip in the gain there is for the
medium-sized machine.  One possible theory is that because the high
watermark depends on both memory and the number of local CPUs, what
impacts zone contention the most is not these individual values, but
rather the ratio of mem:processors.


This patch (of 5):

Currently, refresh_cpu_vm_stats returns an int, indicating how many
changes were made during its updates.  Using this information, callers
like vmstat_update can heuristically determine if more work will be done
in the future.

However, all of refresh_cpu_vm_stats's callers either (a) ignore the
result, only caring about performing the updates, or (b) only care about
whether changes were made, but not *how many* changes were made.

Simplify the code by returning a bool instead to indicate if updates
were made.

In addition, simplify fold_diff and decay_pcp_high to return a bool
for the same reason.

Link: https://lkml.kernel.org/r/20251014145011.3427205-1-joshua.hahnjy@gmail.com
Link: https://lkml.kernel.org/r/20251014145011.3427205-2-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:04 -08:00
Kefeng Wang
f66e2727dd mm: huge_memory: use folio_can_map_prot_numa() for pmd folio
The folio_can_map_prot_numa() checks whether the folio can map prot numa,
which skips unsuitable folio, i.e.  zone device, shared folios (KSM, CoW),
non-movable dma pinned, dirty file folio and folios that already have the
expected node affinity.  Although the ksm only applies to small folios, an
extra test was added for large folios, but the other policies should be
applied to pmd folio, which helps to avoid unnecessary pmd change and
folio migration attempts.

Link: https://lkml.kernel.org/r/20251023113737.3572790-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:04 -08:00
Kefeng Wang
ca43034cdb mm: mprotect: convert to folio_can_map_prot_numa()
The prot_numa_skip() naming is not good since it updates the folio access
time except checking whether to skip prot NUMA, so rename it to
folio_can_map_prot_numa(), and cleanup it a bit, remove ret by directly
return value instead of goto style.

Adding a new helper vma_is_single_threaded_private() to check whether it's
a single threaded private VMA, and make folio_can_map_prot_numa() a
non-static function so that they could be reused in change_huge_pmd(),
since folio_can_map_prot_numa() will be shared in different paths, let's
move it near change_prot_numa() in mempolicy.c.

Link: https://lkml.kernel.org/r/20251023113737.3572790-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:03 -08:00
Kefeng Wang
6e97624dac mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
If the pte_protnone() is true, we could avoid unnecessary struct page
accessing and reduce cache footprint when scanning page tables for prot
numa, there was a similar change before, see more commit a818f5363a
("autonuma: reduce cache footprint when scanning page tables").

Link: https://lkml.kernel.org/r/20251023113737.3572790-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:03 -08:00
Kefeng Wang
03aa8e4f27 mm: mprotect: always skip dma pinned folio in prot_numa_skip()
Patch series "mm: some optimizations for prot numa", v5.


This patch (of 4):

If the folio (even not CoW folio) is dma pinned, it can't be migrated due
to the elevated reference count.  So always skip a pinned folio to avoid
wasting cycles when folios are migrated.

Link: https://lkml.kernel.org/r/20251023113737.3572790-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20251023113737.3572790-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:03 -08:00
Donet Tom
d945667dcb drivers/base/node: fold unregister_node() into unregister_one_node()
unregister_node() is only called from unregister_one_node().  This patch
folds unregister_node() into its only caller and renames
unregister_one_node() to unregister_node().

This reduces unnecessary indirection and simplifies the code structure. 
No functional changes are introduced.

[donettom@linux.ibm.com: remove extra spaces before @nid and "All"]
  Link: https://lkml.kernel.org/r/cff01514-9074-4c97-bcf1-d4e3594e48b0@linux.ibm.com
Link: https://lkml.kernel.org/r/32b7d5d8f0f30d313c3e1d8798f591459c8746f9.1760097208.git.donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Aboorva Devarajan <aboorvad@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:03 -08:00
Donet Tom
eb8762dc22 drivers/base/node: fold register_node() into register_one_node()
Patch series "drivers/base/node: fold node register and unregister
functions", v2.

The first patch merges register_one_node() and register_node(), leaving a
single register_node() function.

The second patch merges unregister_one_node() and unregister_node(),
leaving a single unregister_node() function.

There are no functional changes in these patches.


This patch (of 2):

register_node() is only called from register_one_node().  This patch folds
register_node() into its only caller and renames register_one_node() to
register_node().

This reduces unnecessary indirection and simplifies the code structure. 
No functional changes are introduced.

[akpm@linux-foundation.org: fix kerneldoc, per David]
Link: https://lkml.kernel.org/r/cover.1760097207.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/910853c9dd61f7a2190a56cba101e73e9c6859be.1760097207.git.donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aboorva Devarajan <aboorvad@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:02 -08:00
Huacai Chen
900fcf00e1 mm: remove the BOUNCE config option
Commit eeadd68e2a ("block: remove bounce buffering support") remove
block/bounce.c but left the BOUNCE config option.  Now this option has no
users, so remove it.

Link: https://lkml.kernel.org/r/20251013095620.1111061-1-chenhuacai@loongson.cn
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:02 -08:00
Uladzislau Rezki (Sony)
8cb290dd4b vmalloc: update __vmalloc_node_noprof() documentation
The kernel-doc for __vmalloc_node_noprof() incorrectly states that
__GFP_NOFAIL reclaim modifier is not supported.  In fact it has been
supported since commit 9376130c39 ("mm/vmalloc: add support for
__GFP_NOFAIL").

To avoid duplication and future drift, point this helper's doc to
__vmalloc_node_range_noprof() for details and the full description.

Link: https://lkml.kernel.org/r/20251013174222.90123-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:02 -08:00
Yadong Qi
a739e6b557 mm: vmalloc: WARN_ON if mapping size is not PAGE_SIZE aligned
In mm/vmalloc.c, the function vmap_pte_range() assumes that the mapping
size is aligned to PAGE_SIZE.  If this assumption is violated, the loop
will become infinite because the termination condition (`addr != end`)
will never be met.  This can lead to overwriting other VA ranges and/or
random pages physically follow the page table.

It's the caller's responsibility to ensure that the mapping size is
aligned to PAGE_SIZE.  However, the memory corruption is hard to root
cause.  To identify the programming error in the caller easier, check
whether the mapping size is PAGE_SIZE aligned with WARN_ON_ONCE().

[yadong.qi@linux.alibaba.com: fix uninitialized value issue]
  Closes: https://lore.kernel.org/r/202510110050.VG9YKMRK-lkp@intel.com/
Link: https://lkml.kernel.org/r/20251010014311.1689-1-yadong.qi@linux.alibaba.com
Signed-off-by: Yadong Qi <yadong.qi@linux.alibaba.com>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:02 -08:00
Song Hu
ca30ac479e mm/page_owner: simplify zone iteration logic in init_early_allocated_pages()
The current implementation uses nested loops: first iterating over all
online nodes, then over zones within each node.  This can be simplified by
using the for_each_populated_zone() macro which directly iterates through
all populated zones.

This change:
1. Removes the intermediate init_zones_in_node() function
2. Simplifies init_early_allocated_pages() to use direct zone iteration
3. Updates init_pages_in_zone() to take only zone parameter and access
   node_id via zone->zone_pgdat

The functionality remains identical, but the code is cleaner and more
maintainable.

Link: https://lkml.kernel.org/r/20250930092153.843109-2-husong@kylinos.cn
Signed-off-by: Song Hu <husong@kylinos.cn>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Ye Liu <liuye@kylinos.cn>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:01 -08:00
Song Hu
9686080d62 mm/page_owner: rename proc-prefixed variables for clarity
`proc_page_owner_operations` and related variables were renamed to
`page_owner_fops` to better reflect their association with `debugfs`
rather than `/proc`.  This improves code clarity and aligns with kernel
naming conventions.

Link: https://lkml.kernel.org/r/20250930092153.843109-1-husong@kylinos.cn
Signed-off-by: Song Hu <husong@kylinos.cn>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Ye Liu <liuye@kylinos.cn>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:01 -08:00
Sabyrzhan Tasbolatov
ada5cbe33a kasan: cleanup of kasan_enabled() checks
Deduplication of kasan_enabled() checks which are already used by callers.

* Altered functions:

check_page_allocation
	Delete the check because callers have it already in __wrappers in
	include/linux/kasan.h:
		__kasan_kfree_large
		__kasan_mempool_poison_pages
		__kasan_mempool_poison_object

kasan_populate_vmalloc, kasan_release_vmalloc
	Add __wrappers in include/linux/kasan.h.
	They are called externally in mm/vmalloc.c.

__kasan_unpoison_vmalloc, __kasan_poison_vmalloc
	Delete checks because there're already kasan_enabled() checks
	in respective __wrappers in include/linux/kasan.h.

release_free_meta -- Delete the check because the higher caller path
	has it already. See the stack trace:

	__kasan_slab_free -- has the check already
	__kasan_mempool_poison_object -- has the check already
		poison_slab_object
			kasan_save_free_info
				release_free_meta
					kasan_enabled() -- Delete here

Link: https://lkml.kernel.org/r/20251009155403.1379150-3-snovitoll@gmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:01 -08:00
Sabyrzhan Tasbolatov
27109f5703 kasan: remove __kasan_save_free_info wrapper
Patch series "kasan: cleanups for kasan_enabled() checks".

This patch series is the continuation of [1] the previous discussion
related to the KASAN internal refactoring.

Here we remove kasan_enabled() checks which are duplicated by higher
callers.  These checks deduplication are also related to the separate
patch series [2].


This patch (of 2):

We don't need a kasan_enabled() check in kasan_save_free_info() at all. 
Both the higher level paths (kasan_slab_free and
kasan_mempool_poison_object) already contain this check.  Therefore,
remove the __wrapper.

Link: https://lkml.kernel.org/r/20251009155403.1379150-1-snovitoll@gmail.com
Link: https://lkml.kernel.org/r/20251009155403.1379150-2-snovitoll@gmail.com
Link: https://lore.kernel.org/all/CA+fCnZce3AR+pUesbDkKMtMJ+iR8eDrcjFTbVpAcwjBoZ=gJnQ@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/aNTfPjS2buXMI46D@MiWiFi-R3L-srv/ [2]
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:01 -08:00
Lokesh Gidra
cc22b99785 mm/userfaultfd: don't lock anon_vma when performing UFFDIO_MOVE
Now that rmap_walk() is guaranteed to be called with the folio lock held,
we can stop serializing on the src VMA anon_vma lock when moving an
exclusive folio from a src VMA to a dst VMA in UFFDIO_MOVE ioctl.

When moving a folio, we modify folio->mapping through
folio_move_anon_rmap() and adjust folio->index accordingly.  Doing that
while we could have concurrent RMAP walks would be dangerous.  Therefore,
to avoid that, we had to acquire anon_vma of src VMA in write-mode.  That
meant that when multiple threads called UFFDIO_MOVE concurrently on
distinct pages of the same src VMA, they would serialize on it, hurting
scalability.

In addition to avoiding the scalability bottleneck, this patch also
simplifies the complicated lock dance that UFFDIO_MOVE has to go through
between RCU, folio-lock, ptl, and anon_vma.

folio_move_anon_rmap() already enforces that the folio is locked.  So when
we have the folio locked we can no longer race with concurrent rmap_walk()
as used by folio_referenced() and others who call it on unlocked non-KSM
anon folios, and therefore the anon_vma lock is no longer required.

Note that this handling is now the same as for other
folio_move_anon_rmap() users that also do not hold the anon_vma lock --
namely COW reuse handling (do_wp_page()->wp_can_reuse_anon_folio(),
do_huge_pmd_wp_page(), and hugetlb_wp()).  These users never required the
anon_vma lock as they are only moving the anon VMA closer to the anon_vma
leaf of the VMA, for example, from an anon_vma root to a leaf of that
root.  rmap walks were always able to tolerate that scenario.

Link: https://lkml.kernel.org/r/20250923071019.775806-3-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:00 -08:00
Lokesh Gidra
95b34d6648 mm: always call rmap_walk() on locked folios
Patch series "Improve UFFDIO_MOVE scalability by removing anon_vma lock", v2.

Userfaultfd has a scalability issue in its UFFDIO_MOVE ioctl, which is
heavily used in Android as its java garbage collector uses it for
concurrent heap compaction.

The issue arises because UFFDIO_MOVE updates folio->mapping to an anon_vma
with a different root, in order to move the folio from a src VMA to dst
VMA.  It performs the operation with the folio locked, but this is
insufficient, because rmap_walk() can be performed on non-KSM anonymous
folios without folio lock.

This means that UFFDIO_MOVE has to acquire the anon_vma write lock of the
root anon_vma belonging to the folio it wishes to move.

This causes scalability bottleneck when multiple threads perform
UFFDIO_MOVE simultanously on distinct pages of the same src VMA.  In field
traces of arm64 android devices, we have observed janky user interactions
due to long (sometimes over ~50ms) uninterruptible sleeps on main UI
thread caused by anon_vma lock contention in UFFDIO_MOVE.  This is
particularly severe during the beginning of GC's compaction phase when it
is likely to have multiple threads involved.

This patch resolves the issue by removing the exception in rmap_walk() for
non-KSM anon folios by ensuring that all folios are locked during rmap
walk.  This is less problematic than it might seem, as the only major
caller which utilises this mode is shrink_active_list(), which is covered
in detail in the first patch of this series.

As a result of changing our approach to locking, we can remove all the
code that took steps to acquire an anon_vma write lock instead of a folio
lock.  This results in a significant simplification and scalability
improvement of the code (currently only in UFFDIO_MOVE).  Furthermore, as
a side-effect, folio_lock_anon_vma_read() gets simpler as we don't need to
worry that folio->mapping may have changed under us.


This patch (of 2):

Guarantee that rmap_walk() is called on locked folios so that threads
changing folio->mapping and folio->index for non-KSM anon folios can
serialize on fine-grained folio lock rather than anon_vma lock.  Other
folio types are already always locked before rmap_walk().  With this, we
are going from 'not necessarily' locking the non-KSM anon folio to
'definitely' locking it during rmap walks.

This patch is in preparation for removing anon_vma write-lock from
UFFDIO_MOVE.

With this patch, three functions are now expected to be called with a
locked folio.  To be careful of not missing any case, here is the
exhaustive list of all their callers.

1) rmap_walk() is called from:

a) folio_referenced()
b) damon_folio_mkold()
c) damon_folio_young()
d) page_idle_clear_pte_refs()
e) try_to_unmap()
f) try_to_migrate()
g) folio_mkclean()
h) remove_migration_ptes()

In the above list, first 4 are changed in this patch to try-lock non-KSM
anon folios, similar to other types of folios.  The remaining functions in
the list already hold folio lock when calling rmap_walk().

2) folio_lock_anon_vma_read() is called from following functions:

a) collect_procs_anon()
b) page_idle_clear_pte_refs()
c) damon_folio_mkold()
d) damon_folio_young()
e) folio_referenced()
f) try_to_unmap()
g) try_to_migrate()

All the functions in above list, except collect_procs_anon(), are covered
by the rmap_walk() list above.  For collect_procs_anon(), with
kill_procs_now() changed to take folio lock in this patch ensures that all
callers of folio_lock_anon_vma_read() now hold the lock.

3) folio_get_anon_vma() is called from following functions, all of which
   already hold the folio lock:

a) move_pages_huge_pmd()
b) __folio_split()
c) move_pages_ptes()
d) migrate_folio_unmap()
e) unmap_and_move_huge_page()

Functionally, this patch doesn't break the logic because rmap walkers
generally do some other check to see if what is expected to mapped did
happen so it's fine, or otherwise treat things as best-effort.

Among the 4 functions changed in this patch, folio_referenced() is the
only core-mm function, and is also frequently accessed.  To assess the
impact of locking non-KSM anon folios in
shrink_active_list()->folio_referenced() path, we performed an app cycle
test on an arm64 android device.  During the whole duration of the test
there were over 140k invocations of shrink_active_list(), out of which
over 29k had at least one non-KSM anon folio on which folio_referenced()
was called.  In none of these invocations folio_trylock() failed.

Of course, we now take a lock where we wouldn't previously have.  In the
past it would have had a major impact in causing a CoW write fault to copy
a page in do_wp_page(), as commit 09854ba94c ("mm: do_wp_page()
simplification") caused a failure to obtain folio lock to result in a page
copy even if one wasn't necessary.

However, since commit 6c287605fd ("mm: remember exclusively mapped
anonymous pages with PG_anon_exclusive"), and the introduction of the
folio anon exclusive flag, this issue is significantly mitigated.

The only case remaining that we might worry about from this perspective is
that of read-only folios immediately after fork where the anon exclusive
bit will not have been set yet.

We note however in the case of read-only just-forked folios that
wp_can_reuse_anon_folio() will notice the raised reference count
established by shrink_active_list() via isolate_lru_folios() and refuse to
reuse in any case, so this will in fact have no impact - the folio lock is
ultimately immaterial here.

All-in-all it appears that there is little opportunity for meaningful
negative impact from this change.

Link: https://lkml.kernel.org/r/20250923071019.775806-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20250923071019.775806-2-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Barry Song <baohua@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:00 -08:00
Usama Arif
eb02f14c4a mm/hugetlb: allow overcommitting gigantic hugepages
Currently, gigantic hugepages cannot use the overcommit mechanism
(nr_overcommit_hugepages), forcing users to permanently reserve memory via
nr_hugepages even when pages might not be actively used.

The restriction was added in 2011 [1], which was before there was support
for reserving 1G hugepages at runtime.  Remove this blanket restriction on
gigantic hugepage overcommit.  This will bring the same benefits to
gigantic pages as hugepages:

- Memory is only taken out of regular use when actually needed
- Unused surplus pages can be returned to the system
- Better memory utilization, especially with CMA backing which can
  significantly increase the changes of hugepage allocation

Without this patch:
echo 3 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
bash: echo: write error: Invalid argument

With this patch:
echo 3 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
./mmap_hugetlb_test
Successfully allocated huge pages at address: 0x7f9d40000000

cat mmap_hugetlb_test.c
...
    unsigned long ALLOC_SIZE = 3 * (unsigned long) HUGE_PAGE_SIZE;
    addr = mmap(NULL,
                ALLOC_SIZE, // 3GB
                PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                -1,
                0);

    if (addr == MAP_FAILED) {
        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
        return 1;
    }
    printf("Successfully allocated huge pages at address: %p\n", addr);
...

Link: https://lkml.kernel.org/r/20251009172433.4158118-2-usamaarif642@gmail.com
Link: https://git.zx2c4.com/linux-rng/commit/mm/hugetlb.c?id=adbe8726dc2a3805630d517270db17e3af86e526 [1]
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Rik van Riel <riel@surriel.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:00 -08:00
Usama Arif
a743e0af50 mm/hugetlb: create hstate_is_gigantic_no_runtime helper
This is a common condition used to skip operations that cannot be
performed on gigantic pages when runtime support is disabled.  This helper
is introduced as the condition will exist even more when allowing
"overcommit" of gigantic hugepages.  No functional change intended with
this patch.

Link: https://lkml.kernel.org/r/20251009172433.4158118-1-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Rik van Riel <riel@surriel.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:00 -08:00
Joshua Hahn
2783088ef2 mm/page_alloc: prevent reporting pcp->batch = 0
zone_batchsize returns the appropriate value that should be used for
pcp->batch. If it finds a zone with less than 4096 pages or PAGE_SIZE >
1M, however, it leads to some incorrect math.

In the above case, we will get an intermediary value of 1, which is then
rounded down to the nearest power of two, and 1 is subtracted from it.
Since 1 is already a power of two, we will get batch = 1-1 = 0:

	batch = rounddown_pow_of_two(batch + batch/2) - 1;

A pcp->batch value of 0 is nonsensical. If this were actually set, then
functions like drain_zone_pages would become no-ops, since they could
only free 0 pages at a time.

Of the two callers of zone_batchsize, the one that is actually used to
set pcp->batch works around this by setting pcp->batch to the maximum
of 1 and zone_batchsize. However, the other caller, zone_pcp_init,
incorrectly prints out the batch size of the zone to be 0.

This is probably rare in a typical zone, but the DMA zone can often have
less than 4096 pages, which means it will print out "LIFO batch:0".

Before: [    0.001216]   DMA zone: 3998 pages, LIFO batch:0
After:  [    0.001210]   DMA zone: 3998 pages, LIFO batch:1

Instead of dealing with the error handling and the mismatch between the
reported and actual zone batchsize, just return 1 if the zone_batchsize
is 1 page or less before the rounding.

Link: https://lkml.kernel.org/r/20251009192933.3756712-3-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:59 -08:00
Joshua Hahn
4dcf65bf5b mm/page_alloc: clarify batch tuning in zone_batchsize
Patch series "mm/page_alloc: pcp->batch cleanups", v2.

Two small cleanups for mm/page_alloc.

Patch 1 cleans up a misleading comment about how pcp->batch is calculated,
and folds in the calculation to increase clarity. No functional change
intended.

Patch 2 corrects zones from reporting that their pcp->batch is 0 when it
is actually 1. Namely, corrects ZONE_DMA from reporting that its batch
size is 0.


This patch (of 2):

Recently while working on another patch about batching free_pcppages_bulk
[1], I was curious why pcp->batch was always 63 on my machine.  This led
me to zone_batchsize(), where I found this set of lines to determine what
the batch size should be for the host:

	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
	batch /= 4;		/* We effectively *= 4 below */
	if (batch < 1)
		batch = 1;

All of this is good, except the comment above which says "We effectively
*= 4 below".  Nowhere else in the function zone_batchsize(), is there a
corresponding multipliation by 4.  Looking into the history of this, it
seems like Dave Hansen had also noticed this back in 2013 [1].  Turns out
there *used* to be a corresponding *= 4, which was turned into a *= 6
later on to be used in pageset_setup_from_batch_size(), which no longer
exists.

Despite this mismatch not being corrected in the comments, it seems that
getting rid of the /= 4 leads to a performance regression on machines with
less than 250G memory and 176 processors.  As such, let us preserve the
functionality but clean up the comments.

Fold the /= 4 into the calculation above: bitshift by 10+2=12, and instead
of dividing 1MB, divide 256KB and adjust the comments accordingly.  No
functional change intended.

Link: https://lkml.kernel.org/r/20251009192933.3756712-1-joshua.hahnjy@gmail.com
Link: https://lkml.kernel.org/r/20251009192933.3756712-2-joshua.hahnjy@gmail.com
Link: https://lore.kernel.org/all/20251002204636.4016712-1-joshua.hahnjy@gmail.com/ [1]
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:59 -08:00
Mauricio Faria de Oliveira
0de9a442ee mm/page_owner: update Documentation with 'show_handles' and 'show_stacks_handles'
Describe and provide examples for 'show_handles' and 'show_stacks_handles'.

Link: https://lkml.kernel.org/r/20251001175611.575861-6-mfo@igalia.com
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:59 -08:00
Mauricio Faria de Oliveira
5513cfbcf4 mm/page_owner: add debugfs file 'show_stacks_handles'
Add the file 'show_stacks_handles' to show just stack traces and their
handles, in order to resolve stack traces and handles (i.e., to identify
the stack traces for handles in previous reads from 'show_handles').

All stacks/handles must show up, regardless of their number of pages, that
might have become zero or no longer make 'count_threshold', but made it in
previous reads from 'show_handles' -- and need to be resolved later.

P.S.: now, print the extra newline independently of the number of pages.

Link: https://lkml.kernel.org/r/20251001175611.575861-5-mfo@igalia.com
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:59 -08:00
Mauricio Faria de Oliveira
3b52b9e31a mm/page_owner: add debugfs file 'show_handles'
Add the flag STACK_PRINT_FLAG_HANDLE to print a stack's handle number from
stackdepot, and add the file 'show_handles' to show just handles and their
number of pages.

This is similar to 'show_stacks', with handles instead of stack traces.

Link: https://lkml.kernel.org/r/20251001175611.575861-4-mfo@igalia.com
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:58 -08:00
Mauricio Faria de Oliveira
5c8ca473d5 mm/page_owner: add struct stack_print_ctx.flags
Add the flags field to stack_print_ctx, and define two flags for current
behavior (printing stack traces and their number of base pages).

The plumbing of flags is debugfs_create_file(data) -> inode.i_private ->
page_owner_stack_open() -> stack_print_ctx.flags -> stack_print().

No behavior change intended.

Link: https://lkml.kernel.org/r/20251001175611.575861-3-mfo@igalia.com
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:58 -08:00
Mauricio Faria de Oliveira
156c0c5d14 mm/page_owner: introduce struct stack_print_ctx
Patch series "mm/page_owner: add debugfs files 'show_handles' and
'show_stacks_handles'", v2.

Context:

The page_owner debug feature can help understand a particular situation in
in a point in time (e.g., identify biggest memory consumers; verify memory
counters that do not add up).

Another useful usecase is to collect data repeatedly over time, and use it
for profiling, monitoring, and even comparing different kernel versions,
at the stack trace level (e.g., watch for trends, leaks, correlations, and
regressions).

For this usecase, userspace periorically collects the data from page_owner
and organizes it in data structures appropriate for access per-stack
trace.

Problem:

The usecase of tracking memory usage per stack trace (or tracking it for a
particular stack trace) requires uniquely identifying each stack trace
(i.e., keys to store their memory usage over periodic data collections).

This has to be done for every stack trace in every sample/data collection,
even if tracking only one stack trace (to identify it among all others).

Therefore, an approach like hashing the stack traces in userspace to
create unique keys/identifiers for them during post-processing can quickly
become expensive, considering the repetition and a growing number of stack
traces.

Solution:

Fortunately, the kernel can provide a unique identifier for stack traces
in page_owner, which is the handle number in stackdepot.  This eliminates
the need for creating keys (hashing) in userspace during post-processing.

Additionally, with that information, the stack traces themselves are not
needed until the memory usage should be resolved from a handle to a stack
trace (say, to look at the stack traces of a few top consumers).  This can
reduce the amount of text emitted/copied by the kernel to userspace, and
save userspace from matching and discarding stack traces when not needed.

Changes:

This patchset adds 2 files to provide information, like 'show_stacks':
 - show_handles: print handle number and number of pages (no stack traces)
 - show_stacks_handles: print handle numbers and stack traces (no pages)

Now, it's possible to periodically collect data with handle numbers (keys)
and without stack traces (lower overhead) from 'show_handles', and later
do a final collection with handles and stack traces from
'show_stacks_handles' to resolve the handles to their stack traces.

The output format follows the existing 'show_stacks' file, for simplicity,
but it can certainly be changed if a different format is more convenient.

Example:

The number of base pages collected can be stored per-handle number over
the periodic data collections, and finally resolved to stack traces
per-handle number as well with a final collection.

Later, one can, for example, identify the biggest consumers and watch
their trends or correlate increases/decreases with other events in the
system, or watch a particular stack trace(s) of interest during
development.

Testing:

Tested on next-20250929.

 - show_stacks:
	
	 register_dummy_stack+0x32/0x70
	 init_page_owner+0x29/0x2f0
	 page_ext_init+0x27c/0x2b0
	 mm_core_init+0xdc/0x110
	nr_base_pages: 47
		
 - show_handles:

	handle: 1
	nr_base_pages: 47

 - show_stacks_handles:

	 register_dummy_stack+0x32/0x70
	 init_page_owner+0x29/0x2f0
	 page_ext_init+0x27c/0x2b0
	 mm_core_init+0xdc/0x110
	handle: 1
	
 - count_threshold:

	# echo 100 >/sys/kernel/debug/page_owner_stacks/count_threshold
	# grep register_dummy_stack show_stacks		# not present
	# grep -B4 '^handle: 1$' show_handles  		# not present 
	# grep -B4 '^handle: 1$' show_stacks_handles	# present
	 register_dummy_stack+0x32/0x70
	 init_page_owner+0x29/0x2f0
	 page_ext_init+0x27c/0x2b0
	 mm_core_init+0xdc/0x110
	handle: 1


This patch (of 5):

Currently, struct seq_file.private is used as an iterator in stack_list by
stack_start|next(), for stack_print().

Create a context struct for this, in order to add another field next.

No behavior change intended.

P.S.: page_owner_stack_open() is expanded with separate statements for
variable definition and return just in preparation for the next patch.

Link: https://lkml.kernel.org/r/20251001175611.575861-1-mfo@igalia.com
Link: https://lkml.kernel.org/r/20251001175611.575861-2-mfo@igalia.com
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:58 -08:00
Anshuman Khandual
b4e53984f2 mm/dirty: replace READ_ONCE() with pudp_get()
Replace READ_ONCE() with a standard page table accessor i.e pudp_get() that
anyways defaults into READ_ONCE() in cases where platform does not override

Link: https://lkml.kernel.org/r/20251006055214.1845342-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:58 -08:00
Ryan Roberts
9ac09bb9fe mm: consistently use current->mm in mm_get_unmapped_area()
mm_get_unmapped_area() is a wrapper around arch_get_unmapped_area() /
arch_get_unmapped_area_topdown(), both of which search current->mm for
some free space.  Neither take an mm_struct - they implicitly operate on
current->mm.

But the wrapper takes an mm_struct and uses it to decide whether to search
bottom up or top down.  All callers pass in current->mm for this, so
everything is working consistently.  But it feels like an accident waiting
to happen; eventually someone will call that function with a different mm,
expecting to find free space in it, but what gets returned is free space
in the current mm.

So let's simplify by removing the parameter and have the wrapper use
current->mm to decide which end to start at.  Now everything is consistent
and self-documenting.

Link: https://lkml.kernel.org/r/20251003155306.2147572-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:57 -08:00
SeongJae Park
0fdaa13ee9 Docs/admin-guide/mm/zswap: s/red-black tree/xarray/
The change from commit 796c2c23e1 ("zswap: replace RB tree with xarray")
is not reflected on the document.  Update the document.

Link: https://lkml.kernel.org/r/20251003203851.43128-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:57 -08:00
SeongJae Park
1f52f3de4b mm/zswap: s/red-black tree/xarray/
Changes made by commit 796c2c23e1 ("zswap: replace RB tree with xarray")
are not reflected on a comment.  Update the comment.

Link: https://lkml.kernel.org/r/20251003203851.43128-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:57 -08:00
SeongJae Park
f7ed6bf237 mm/zswap: fix typos: s/zwap/zswap/
As the subject says.

Link: https://lkml.kernel.org/r/20251003203851.43128-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:57 -08:00
SeongJae Park
138336d674 mm/zswap: remove unnecessary dlen writes for incompressible pages
Patch series "mm/zswap: misc cleanup of code and documentations".

Clean up an unnecessary local variable write in incompressible pages
handling, typos (s/zwap/zswap/) and outdated comments/documentations about
the zswap's red-black tree, which is replaced by xarray.


This patch (of 4):

Incompressible pages handling logic in zswap_compress() is setting 'dlen'
as PAGE_SIZE twice.  Once before deciding whether to save the content as
is, and once again after it is decided to save it as is.  But the value of
'dlen' is used only if it is decided to save the content as is, so the
first write is unnecessary.  It is not causing real user issues, but
making code confusing to read.  Remove the unnecessary write operation.

Link: https://lkml.kernel.org/r/20251003203851.43128-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251003203851.43128-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Chris Li <chrisl@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:56 -08:00
Fushuai Wang
bd63d0fde2 mm/vmscan: remove redundant __GFP_NOWARN
The __GFP_NOWARN flag was included in GFP_NOWAIT since commit 16f5dfbc85
("gfp: include __GFP_NOWARN in GFP_NOWAIT").  So remove the redundant
__GFP_NOWARN flag.

Link: https://lkml.kernel.org/r/20251006014948.44695-1-wangfushuai@baidu.com
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:56 -08:00
Roman Gushchin
9f1edf1aed mm: readahead: make thp readahead conditional to mmap_miss logic
Commit 4687fdbb80 ("mm/filemap: Support VM_HUGEPAGE for file mappings")
introduced a special handling for VM_HUGEPAGE mappings: even if the
readahead is disabled, 1 or 2 HPAGE_PMD_ORDER pages are allocated.

This change causes a significant regression for containers with a tight
memory.max limit, if VM_HUGEPAGE is widely used.  Prior to this commit,
mmap_miss logic would eventually lead to the readahead disablement,
effectively reducing the memory pressure in the cgroup.  With this change
the kernel is trying to allocate 1-2 huge pages for each fault, no matter
if these pages are used or not before being evicted, increasing the memory
pressure multi-fold.

To fix the regression, let's make the new VM_HUGEPAGE conditional to the
mmap_miss check, but keep independent from the ra->ra_pages.  This way the
main intention of commit 4687fdbb80 ("mm/filemap: Support VM_HUGEPAGE
for file mappings") stays intact, but the regression is resolved.

The logic behind this changes is simple: even if a user explicitly
requests using huge pages to back the file mapping (using VM_HUGEPAGE
flag), under a very strong memory pressure it's better to fall back to
ordinary pages.

Link: https://lkml.kernel.org/r/20251006175106.377411-1-roman.gushchin@linux.dev
Fixes: 4687fdbb80 ("mm/filemap: Support VM_HUGEPAGE for file mappings")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:56 -08:00
Anshuman Khandual
c0efdb373c mm: replace READ_ONCE() with standard page table accessors
Replace all READ_ONCE() with a standard page table accessors i.e
pxdp_get() that defaults into READ_ONCE() in cases where platform does not
override.

Link: https://lkml.kernel.org/r/20251007063100.2396936-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:56 -08:00
xu xin
bda7bf0684 selftests: update ksm inheritance tests for prctl fork/exec
To reproduce the issue mentioned by [1], this add a setting of
pages_to_scan and sleep_millisecs at the start of test_prctl_fork_exec(). 
The main change is just raise the scanning frequency of ksmd.

[1] https://lore.kernel.org/all/202510012256278259zrhgATlLA2C510DMD3qI@zte.com.cn/

Link: https://lkml.kernel.org/r/20251007182935207jm31wCIgLpZg5XbXQY64S@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:55 -08:00
xu xin
590c03ca6a mm/ksm: fix exec/fork inheritance support for prctl
Patch series "ksm: fix exec/fork inheritance", v2.

This series fixes exec/fork inheritance.  See the detailed description of
the issue below.


This patch (of 2):

Background
==========

commit d7597f59d1 ("mm: add new api to enable ksm per process")
introduced MMF_VM_MERGE_ANY for mm->flags, and allowed user to set it by
prctl() so that the process's VMAs are forcibly scanned by ksmd. 

Subsequently, the 3c6f33b727 ("mm/ksm: support fork/exec for prctl")
supported inheriting the MMF_VM_MERGE_ANY flag when a task calls execve().

Finally, commit 3a9e567ca4 ("mm/ksm: fix ksm exec support for prctl")
fixed the issue that ksmd doesn't scan the mm_struct with MMF_VM_MERGE_ANY
by adding the mm_slot to ksm_mm_head in __bprm_mm_init().

Problem
=======

In some extreme scenarios, however, this inheritance of MMF_VM_MERGE_ANY
during exec/fork can fail.  For example, when the scanning frequency of
ksmd is tuned extremely high, a process carrying MMF_VM_MERGE_ANY may
still fail to pass it to the newly exec'd process.  This happens because
ksm_execve() is executed too early in the do_execve flow (prematurely
adding the new mm_struct to the ksm_mm_slot list).

As a result, before do_execve completes, ksmd may have already performed a
scan and found that this new mm_struct has no VM_MERGEABLE VMAs, thus
clearing its MMF_VM_MERGE_ANY flag.  Consequently, when the new program
executes, the flag MMF_VM_MERGE_ANY inheritance missed.

Root reason
===========

commit d7597f59d1 ("mm: add new api to enable ksm per process") clear
the flag MMF_VM_MERGE_ANY when ksmd found no VM_MERGEABLE VMAs.

Solution
========

Firstly, Don't clear MMF_VM_MERGE_ANY when ksmd found no VM_MERGEABLE
VMAs, because perhaps their mm_struct has just been added to ksm_mm_slot
list, and its process has not yet officially started running or has not
yet performed mmap/brk to allocate anonymous VMAS.

Secondly, recheck MMF_VM_MERGEABLE again if a process takes
MMF_VM_MERGE_ANY, and create a mm_slot and join it into ksm_scan_list
again.

Link: https://lkml.kernel.org/r/20251007182504440BJgK8VXRHh8TD7IGSUIY4@zte.com.cn
Link: https://lkml.kernel.org/r/20251007182821572h_SoFqYZXEP1mvWI4n9VL@zte.com.cn
Fixes: 3c6f33b727 ("mm/ksm: support fork/exec for prctl")
Fixes: d7597f59d1 ("mm: add new api to enable ksm per process")
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:55 -08:00
Uladzislau Rezki (Sony)
c6307674ed mm: kvmalloc: add non-blocking support for vmalloc
Extend __kvmalloc_node_noprof() to handle non-blocking GFP flags
(GFP_NOWAIT and GFP_ATOMIC).  Previously such flags were rejected,
returning NULL.  With this change:

- kvmalloc() can fall back to vmalloc() if non-blocking contexts;
- for non-blocking allocations the VM_ALLOW_HUGE_VMAP option is
  disabled, since the huge mapping path still contains might_sleep();
- documentation update to reflect that GFP_NOWAIT and GFP_ATOMIC
  are now supported.

Link: https://lkml.kernel.org/r/20251007122035.56347-11-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:55 -08:00
Uladzislau Rezki (Sony)
0667b209e9 mm/vmalloc: update __vmalloc_node_range() documentation
__vmalloc() now supports non-blocking flags such as GFP_ATOMIC and
GFP_NOWAIT.  Update the documentation accordingly.

Link: https://lkml.kernel.org/r/20251007122035.56347-10-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:55 -08:00
Uladzislau Rezki (Sony)
7241bb2ea3 mm: skip might_alloc() warnings when PF_MEMALLOC is set
might_alloc() catches invalid blocking allocations in contexts where
sleeping is not allowed.

However when PF_MEMALLOC is set, the page allocator already skips reclaim
and other blocking paths.  In such cases, a blocking gfp_mask does not
actually lead to blocking, so triggering might_alloc() splats is
misleading.

Adjust might_alloc() to skip warnings when the current task has
PF_MEMALLOC set, matching the allocator's actual blocking behaviour.

Link: https://lkml.kernel.org/r/20251007122035.56347-9-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:54 -08:00
Uladzislau Rezki (Sony)
b186a94227 kmsan: remove hard-coded GFP_KERNEL flags
kmsan_vmap_pages_range_noflush() allocates its temp s_pages/o_pages arrays
with GFP_KERNEL, which may sleep.  This is inconsistent with vmalloc() as
it will support non-blocking requests later.

Plumb gfp_mask through the kmsan_vmap_pages_range_noflush(), so it can use
it internally for its demand.

Please note, the subsequent __vmap_pages_range_noflush() still uses
GFP_KERNEL and can sleep.  If a caller runs under reclaim constraints,
sleeping is forbidden, it must establish the appropriate memalloc scope
API.

Link: https://lkml.kernel.org/r/20251007122035.56347-8-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:54 -08:00
Uladzislau Rezki (Sony)
ad435e79f8 mm/kasan: support non-blocking GFP in kasan_populate_vmalloc()
A "gfp_mask" is already passed to kasan_populate_vmalloc() as an argument
to respect GFPs from callers and KASAN uses it for its internal
allocations.

But apply_to_page_range() function ignores GFP flags due to a hard-coded
mask.

Wrap the call with memalloc_apply_gfp_scope()/memalloc_restore_scope() so
that non-blocking GFP flags(GFP_ATOMIC, GFP_NOWAIT) are respected.

Link: https://lkml.kernel.org/r/20251007122035.56347-7-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:54 -08:00
Uladzislau Rezki (Sony)
8da89ba18e mm/vmalloc: handle non-blocking GFP in __vmalloc_area_node()
Make __vmalloc_area_node() respect non-blocking GFP masks such as
GFP_ATOMIC and GFP_NOWAIT.

- Add memalloc_apply_gfp_scope()/memalloc_restore_scope()
  helpers to apply a proper scope.
- Apply memalloc_apply_gfp_scope()/memalloc_restore_scope()
  around vmap_pages_range() for page table setup.
- Set "nofail" to false if a non-blocking mask is used, as
  they are mutually exclusive.

This is particularly important for page table allocations that internally
use GFP_PGTABLE_KERNEL, which may sleep unless such scope restrictions are
applied.  For example:

<snip>
__pte_alloc_kernel()
  pte_alloc_one_kernel(&init_mm);
    pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
<snip>

Note: in most cases, PTE entries are established only up to the level
required by current vmap space usage, meaning the page tables are
typically fully populated during the mapping process.

Link: https://lkml.kernel.org/r/20251007122035.56347-6-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:54 -08:00
Uladzislau Rezki (Sony)
9c47753167 mm/vmalloc: defer freeing partly initialized vm_struct
__vmalloc_area_node() may call free_vmap_area() or vfree() on error paths,
both of which can sleep.  This becomes problematic if the function is
invoked from an atomic context, such as when GFP_ATOMIC or GFP_NOWAIT is
passed via gfp_mask.

To fix this, unify error paths and defer the cleanup of partly initialized
vm_struct objects to a workqueue.  This ensures that freeing happens in a
process context and avoids invalid sleeps in atomic regions.

Link: https://lkml.kernel.org/r/20251007122035.56347-5-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:53 -08:00
Uladzislau Rezki (Sony)
86e968d8ca mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()
alloc_vmap_area() currently assumes that sleeping is allowed during
allocation.  This is not true for callers which pass non-blocking GFP
flags, such as GFP_ATOMIC or GFP_NOWAIT.

This patch adds logic to detect whether the given gfp_mask permits
blocking.  It avoids invoking might_sleep() or falling back to reclaim
path if blocking is not allowed.

This makes alloc_vmap_area() safer for use in non-sleeping contexts, where
previously it could hit unexpected sleeps, trigger warnings.

It is a preparation and adjustment step to later allow both GFP_ATOMIC and
GFP_NOWAIT allocations in this series.

Link: https://lkml.kernel.org/r/20251007122035.56347-4-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:53 -08:00
Uladzislau Rezki (Sony)
e781c1c0a9 lib/test_vmalloc: remove xfail condition check
A test marked with "xfail = true" is expected to fail but that does not
mean it is predetermined to fail.  Remove "xfail" condition check for
tests which pass successfully.

Link: https://lkml.kernel.org/r/20251007122035.56347-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:53 -08:00
Uladzislau Rezki (Sony)
9ff86ca1cc lib/test_vmalloc: add no_block_alloc_test case
Patch series "__vmalloc()/kvmalloc() and no-block support", v4.


This patch (of 10):

Introduce a new test case "no_block_alloc_test" that verifies non-blocking
allocations using __vmalloc() with GFP_ATOMIC and GFP_NOWAIT flags.

It is recommended to build kernel with CONFIG_DEBUG_ATOMIC_SLEEP enabled
to help catch "sleeping while atomic" issues.  This test ensures that
memory allocation logic under atomic constraints does not inadvertently
sleep.

Link: https://lkml.kernel.org/r/20251007122035.56347-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:52 -08:00
Anshuman Khandual
11119b19f6 mm/ptdump: replace READ_ONCE() with standard page table accessors
Replace READ_ONCE() with standard page table accessors i.e pxdp_get()
which anyways default into READ_ONCE() in cases where platform does not
override.  Also convert ptep_get_lockless() into ptep_get() as well.

Link: https://lkml.kernel.org/r/20251001042502.1400726-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: Lance Yang <lance.yang@linux.dev>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:52 -08:00
jianyun.gao
b6c46600bf mm: fix some typos in mm module
Below are some typos in the code comments:

  intevals ==> intervals
  addesses ==> addresses
  unavaliable ==> unavailable
  facor ==> factor
  droping ==> dropping
  exlusive ==> exclusive
  decription ==> description
  confict ==> conflict
  desriptions ==> descriptions
  otherwize ==> otherwise
  vlaue ==> value
  cheching ==> checking
  exisitng ==> existing
  modifed ==> modified
  differenciate ==> differentiate
  refernece ==> reference
  permissons ==> permissions
  indepdenent ==> independent
  spliting ==> splitting

Just fix it.

Link: https://lkml.kernel.org/r/20250929002608.1633825-1-jianyungao89@gmail.com
Signed-off-by: jianyun.gao <jianyungao89@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Chris Li <chrisl@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:52 -08:00
Anshuman Khandual
37d1792548 mm/thp: drop follow_devmap_pmd() default stub
follow_devmap_pmd() has already been dropped by the commit fd2825b076
("mm/gup: remove pXX_devmap usage from get_user_pages()").  The fallback
stub in the header which is now redundant, can be dropped off as well.

Link: https://lkml.kernel.org/r/20250929104643.1100421-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:52 -08:00
Andrew Morton
3407caccb8 Merge branch 'mm-hotfixes-stable' into mm-stable to be able to merge "mm:
introduce deferred freeing for kernel page tables" into mm-stable.
2025-11-16 17:24:35 -08:00
Alexander Stein
691d5ccb42 arm64: dts: freescale: tqma9352: Add vcc-supply for spi-nor
(Q)SPI NOR flash is supplied by 1.8V. Add the corresponding supply.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Alexander Stein
e9c909657d arm64: dts: mb-smarc-2: Add MicIn routing
MicIn is connected to IN3_L. Add routing including the Mic Bias.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Alexander Stein
c3184d2ecf arm64: dts: mba8xx: Add MicIn routing
MicIn is connected to IN3_L. Add routing including the Mic Bias.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Alexander Stein
9a9396de05 arm64: dts: mba8mx: Add MicIn routing
MicIn is connected to IN3_L. Add routing including the Mic Bias.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Laurentiu Mihalcea
9ceb1cf5b2 arm64: dts: imx8mp: make 'dsp' node depend on 'aips5'
The DSP needs to access peripherals on AIPSTZ5 (to communicate with
the AP using AUDIOMIX MU, for instance). To do so, the security-related
registers of the bridge have to be configured before the DSP is started.
Enforce a dependency on AIPSTZ5 by adding the 'access-controllers'
property to the 'dsp' node.

Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Laurentiu Mihalcea
286e622bf4 arm64: dts: imx8mp: convert 'aips5' to 'aipstz5'
Change the programming model of the "aips5" node to allow configuring
the security-related registers exposed by the AIPSTZ5 bridge. Without
this, masters such as the HIFI4 DSP will have their access to the
peripherals connected to the bridge denied after power cycling the
AUDIOMIX domain.

Co-developed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Marco Felsch
45011b7633 arm64: dts: imx8mp-skov: add Rev.C HDMI support
From software perspective, Rev.C HDMI and Rev.B HDMI don't differ since
the panel is connected via HDMI and the touchscreen is connected via
USB. However, the bootloader firmware expects to find a dts with the
correct revc-hdmi compatible.

Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Marek Vasut
d1da1068bc arm64: dts: imx8mp: Add missing LED enumerators for DH electronics i.MX8M Plus DHCOM on PDK2
The LED enumerators are missing, which prevents the LEDs from being
accurately told apart by label. Fill in the enumerators the same way
they are already present on PDK3.

Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Jan Petrous (OSS)
40c687a4f2 arm64: dts: freescale: Add GMAC Ethernet for S32G2 EVB and RDB2 and S32G3 RDB3
Add support for the Ethernet connection over GMAC controller connected to
the Micrel KSZ9031 Ethernet RGMII PHY located on the boards.

The mentioned GMAC controller is one of two network controllers
embedded on the NXP Automotive SoCs S32G2 and S32G3.

The supported boards:
 * EVB:  S32G-VNP-EVB with S32G2 SoC
 * RDB2: S32G-VNP-RDB2
 * RDB3: S32G-VNP-RDB3

Tested-by: Enric Balletbo i Serra <eballetb@redhat.com>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:56 +08:00
Max Krummenacher
cb92737919 arm64: dts: imx8qm-apalis: add pwm used by the backlight
Add pwm node used by the backlight output pin BKL1_PWM and
reference it from the pwm-backlight node.

Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Markus Niebel
916e798303 arm64: dts: imx95-tqma9596sa-mb-smarc-2: add aliases for SPI
Add missing alias for SPI controllers.

Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Markus Niebel
f19950eb8a arm64: dts: imx95-tqma9596sa-mb-smarc-2: remove superfluous line
Remove empty line, no functional changes.

Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Alexander Stein
dc4290f5fa arm64: dts: imx95-tqma9596sa-mb-smarc-2: mark LPUART1 as reserved
LPUART1 is not disabled, but used by system manager (SM) and should not
be used by Linux.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Alexander Stein
2d226e5404 arm64: dts: imx95-tqma9596sa-mb-smarc-2: Add MicIn routing
MicIn is connected to IN3_L. Add routing including the Mic Bias.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Markus Niebel
c0a5960113 arm64: dts: imx95-tqma9596sa: add EEPROM pagesize
The special ID device of the EEPROMS needs pagesize, too.
Otherwise only slow byte write is possible.

Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Alexander Stein
e0ac00be7a arm64: dts: imx95-tqma9596sa: whitespace fixes
Use tabs instead of spaces for indentation.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Alexander Stein
b5a07fe96d arm64: dts: imx95-tqma9596sa: add gpio bus recovery for i2c
Add pinctrl group for GPIO based bus recovery.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Alexander Stein
35eecf5d8c arm64: dts: imx95-tqma9596sa: remove superfluous pinmux for usdhci
A sleep pin mux is not useful if it is the same as the normal pin mux.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Alexander Stein
b997a3acdc arm64: dts: imx95-tqma9596sa: remove superfluous pinmux for i2c
A sleep pin mux is not useful if it is the same as the normal pin mux.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:55 +08:00
Alexander Stein
88b02dc673 arm64: dts: imx95-tqma9596sa: remove superfluous pinmux for flexspi
A sleep pin mux is not useful if it is the same as the normal pin mux.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Alexander Stein
0df2b380e5 arm64: dts: imx95-tqma9596sa: update pcie config
Fix pcie clock config and switch from deprecated reset-gpio to
reset-gpios.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Alexander Stein
ab5b4a9158 arm64: dts: imx95-tqma9596sa: move pcie config to SOM
The muxing and other features are mostly determined by SOM, so add it
at this level.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Alexander Stein
8b452dddd8 arm64: dts: imx95-tqma9596sa: move sai config to SOM
The muxing and other features are mostly determined by SOM, so add it
at this level.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Markus Niebel
16471a3234 arm64: dts: imx95-tqma9596sa: move USDHC2 config to SOM
The muxing and other features are mostly determined by SOM,
so add it at this level.

Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Alexander Stein
68b7d5383b arm64: dts: imx95-tqma9596sa: move lpspi3 pinctrl to SOM
The muxing is determined by SOM, so add it at this level.

Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Alexander Stein
438a4d0cd5 arm64: dts: imx95-tqma9596sa: move flexcan pinctrl to SOM
The muxing for flexcan is determined by SOM, so add it at this level.

Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Alexander Stein
310b0aaa65 arm64: dts: imx95-tqma9596sa: increase flexspi slew rate
Switch to fast slew rate.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Alexander Stein
461be38025 arm64: dts: imx95-tqma9596sa: reduce maximum FlexSPI frequency to 66MHz
66 MHz is the maximum FlexPI clock frequency in normal/overdrive mode
when RXCLKSRC = 0 (Default)

Fixes: 91d1ff322c ("arm64: dt: imx95: Add TQMa95xxSA")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:54 +08:00
Markus Niebel
046cb64923 arm64: dts: imx95-tqma9596sa: fix TPM5 pinctrl node name
tpm4grp will be overwritten. Fix node name

Fixes: 91d1ff322c ("arm64: dt: imx95: Add TQMa95xxSA")
Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-17 08:56:36 +08:00
Stefan Prisacariu
6f813b7814 ARM: dts: imx6qdl: make VAR-SOM SoM SoC-agnostic
Make SoM .dtsi SoC-agnostic by moving SoC include to board level

imx6qdl-var-som.dtsi currently includes imx6q.dtsi, which makes this SoM
description Quad/Dual specific and prevents reuse from i.MX6DL boards.

Changes:
 - Move imx6q.dtsi include from imx6qdl-var-som.dtsi to
imx6q-var-mx6customboard.dts.
 - Remove /dts-v1/; from imx6qdl-var-som.dtsi (dtsi files should not declare
version)

This keeps the SoM .dtsi SoC-agnostic (it already relies on imx6qdl.dtsi for
family-common parts) and allows boards using the DualLite or Solo to include
imx6dl.dtsi instead.

Why this is needed:
To reuse imx6qdl-var-som.dtsi on i.MX6DL board.

No functional changes for imx6q-var-mx6customboard are intended.

Signed-off-by: Stefan Prisacariu <stefan.prisacariu@prevas.dk>
Link: https://lore.kernel.org/all/20250925104942.4148376-1-stefan.prisacariu@prevas.dk/
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 21:03:49 +08:00
Laurentiu Mihalcea
910ab02fbb arm64: defconfig: enable i.MX AIPSTZ driver
Enable the i.MX AIPSTZ driver, which is used for i.MX8MP-based boards such
as NXP's IMX8MP-EVK.

Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 20:56:27 +08:00
Marco Felsch
88f717c042 dt-bindings: arm: fsl: add Skov Rev.C HDMI support
From software perspective, Rev.C HDMI and Rev.B HDMI don't differ since
the panel is connected via HDMI and the touchscreen is connected via
USB. However, the bootloader firmware expects to find a dts with the
correct revc-hdmi compatible.

Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 20:50:29 +08:00
Michal Vokáč
35d7230ceb ARM: dts: imx6dl-yapp4: Model the RGB LED as a single multi-led part
Describe the RGB LED indicator according to the reality - it is a single
part containing all the three R,G and B LEDs in one package.
With this description the chan-name property becomes useless, remove it.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 20:35:35 +08:00
Michal Vokáč
c8bf7740aa ARM: dts: imx6dl-yapp43: Enable pwm-beeper on boards with speaker
Lynx, Pegasus and Pegasus+ boards have a speaker connected to the PWM3.
Enable a pwm-beeper on these boards so the system can produce simple
sounds.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 20:35:35 +08:00
Stefano Radaelli
5651eff270 arm64: dts: freescale: imx93-var-som: Add support for ADS7846 touchscreen
The VAR-SOM-MX93 integrates an ADS7846 resistive touchscreen controller.
The controller is physically located on the SOM, and its signals are
routed to the SOM pins, allowing carrier boards to make use of it.

This patch adds the ADS7846 node and the appropriate SPI controller.

Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 18:01:30 +08:00
Stefano Radaelli
c815711e27 arm64: dts: freescale: imx93-var-som: Add support for WM8904 audio codec
The VAR-SOM-MX93 can integrate the WM8904, a high-performance
ultra-low-power stereo codec optimized for portable audio applications.

This patch adds the WM8904 device to the appropriate I2C bus, enables
the SAI peripheral, and introduces the sound node to expose the
sound card to the system.

Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 18:01:30 +08:00
Stefano Radaelli
2e2a8aa9fc arm64: dts: freescale: imx93-var-som: Add PMIC support
The VAR-SOM-MX93 features Dual Freescale/NXP PCA9541 chip as a Power
Management Integrated circuit (PMIC).
The PMIC is programmable via the I2C interface and its associated
register map, and this patch adds its support.

Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 18:01:30 +08:00
Stefano Radaelli
5d0c788263 arm64: dts: freescale: imx93-var-som: Add WiFi and Bluetooth support
Add device tree nodes for the WiFi and Bluetooth module mounted on the
VAR-SOM-MX93. The module can be based on either the NXP IW612 or IW611
chipset, depending on the configuration chosen by the customer.

Regardless of the chipset used, WiFi communicates over SDIO and Bluetooth
over UART.

Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 18:01:30 +08:00
Frank Li
d59ae05e3d arm64: dts: imx8qxp-mek: change space with tab
Change space with tab to align with code style.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:32 +08:00
Sherry Sun
5d82ec7472 arm64: dts: imx8qxp-mek: Add lpuart1 to support the M.2 PCIE9098 bluetooth
Add the lpuart1 dts node to support the PCIE9098 bluetooth on M.2
connector.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:32 +08:00
Joy Zou
b85104933a arm64: dts: imx8: add edma error interrupt support
Add edma error interrupt for i.MX8QM, i.MX8QXP and i.MX8DXL.

Signed-off-by: Joy Zou <joy.zou@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:32 +08:00
Frank Li
0e2dbdcfad arm64: dts: imx8qxp-mek: add fec2 support
Add fec2 and related nodes.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:32 +08:00
Frank Li
f4dcaab01b arm64: dts: imx8qxp-mek: add phandle ocotp mac-address for fec
Add phandle to the OCOTP mac-address nodes so the FEC can obtain a fixed
MAC address specific to each board.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:32 +08:00
Frank Li
339cd3056b arm64: dts: imx8qxp-mek: add flexspi and flash
Add flexspi and flash node.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:32 +08:00
Frank Li
00f69e5f6d arm64: dts: imx8qxp-mek: update usdhc1 clock to 400Mhz
Update usdhc1 clock to 400Mhz to support eMMC HS400.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:32 +08:00
Frank Li
1c1e81ef39 arm64: dts: imx8qxp-mek: add state_100mhz and state_200mhz for usdhc
default, state_100mhz and state_200mhz use the same settings. But current
driver use these to indicate if sd3.0 support.

Add SD gpio pin group (Reset, CD, WP) for usdhc2.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:32 +08:00
Frank Li
6f90786d73 arm64: dts: imx8qxp: add wakeup source for power-key
Add wakeup source property for power-key.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:57:31 +08:00
Frank Li
a29a5a5596 arm64: dts: imx8qxp: add MAC address in ocotp
Add MAC address nodes in ocotp.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:56:55 +08:00
Frank Li
05a14dc3f6 arm64: dts: imx8qm-mek: replace space with tab
Replace space with tab.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:46:12 +08:00
Frank Li
9c70313efe arm64: dts: imx8qm-mek: add usbotg1 and related nodes
Add usbotg1 and related nodes.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:45:53 +08:00
Frank Li
5c9f27cd7e arm64: dts: imx8qm-mek: add pmic thermal-zones
Add pmic thermal-zones.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:45:53 +08:00
Frank Li
70051810d0 arm64: dts: imx8qm: add label thermal_zones
Add label thermal_zones to prepare add pmic thermal zones.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:45:53 +08:00
Frank Li
a8eebe6195 arm64: dts: imx8qm-mek: add lpuart1 and bluetooth node
Add lpuart1 and bluetooth support.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:45:53 +08:00
Frank Li
64b60a7280 arm64: dts: imx8qm-mek: assign double SD bus frequency for usdhc1
Assign double SD bus frequency to support SDR104 mode, where the operating
clock runs at 208 MHz.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:45:52 +08:00
Frank Li
2984af7932 arm64: dts: imx8qm-mek: add state_100mhz and state_200mhz for usdhc
default, state_100mhz and state_200mhz use the same settings. But current
driver use these to indicate if sd3.0 support.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:45:52 +08:00
Andreas Kemnade
e58a429680 ARM: dts: imx: e70k02: add sy7636
Add the EPD PMIC for the e70k02 based devices as a step towards full EPD
support.

Acked-by: Alistair Francis <alistair@alistair23.me>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Andreas Kemnade <akemnade@kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:42:41 +08:00
Primoz Fiser
09b33db32e arm64: dts: freescale: imx93-phyboard-nash: Add pwm-fan overlay
Add overlay to support PWM fan on the phyBOARD-Nash-i.MX93 board. Fan
can be connected to the FAN (X48) connector on the board and will be
controlled according to the following CPU temperature trips table:

 - bellow 50 degrees - fan is off (<1% duty cycle)
 - between 50 and 58 degrees - low fan speed (~35% duty cycle)
 - between 58 and 65 degrees - fan medium speed (~60% duty cycle)
 - above 65 degrees - fan at full speed (>99% duty cycle)

The output frequency of PWM signal is set to 25 kHz.

Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
Reviewed-by: Alberto Merciai <alb3rt0.m3rciai@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:41:18 +08:00
Primoz Fiser
af21bfd1f7 arm64: dts: freescale: imx93-phyboard-nash: Add jtag overlay
Add overlay to enable JTAG pins on the board's JTAG (X41) connector.

Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:41:18 +08:00
Teresa Remmet
3b0bf06595 arm64: dts: imx8mm-phyboard-polis-peb-av-10: Fix audio codec reset pin ctl
Enable internal pull up of the active low audio codec reset pin.
Otherwise the audio codec does not reset properly and is not working.

Signed-off-by: Teresa Remmet <t.remmet@phytec.de>
Signed-off-by: Jan Remmet <j.remmet@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:36:53 +08:00
Jan Remmet
f7a77fb14c arm64: dts: imx8mm-phyboard-polis-peb-av-10-ph128800t006
Add support for powertip,ph128800t006-zhc01 connected via peb-av-10

Signed-off-by: Jan Remmet <j.remmet@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:36:53 +08:00
Jan Remmet
4f03fd6d95 arm64: dts: imx8mm-phyboard-polis-peb-av-10: split display configuration
The PEB-AV-10 board can be used with different displays or in audio-only
mode.
Split the device tree overlays to reflect these use cases. To use the
board with the EDT ETML1010G3DRA display, the overlay
imx8mm-phyboard-polis-peb-av-10-etml1010g3dra.dtbo must now be used
instead of imx8mm-phyboard-polis-peb-av-10.dtbo.

Signed-off-by: Jan Remmet <j.remmet@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:36:53 +08:00
Jan Remmet
c277c95959 arm64: dts: imx8mm-phyboard-polis-peb-av-10: reorder properties to match dts coding style
Sort properties. Rename regulator label to match schematics.

Signed-off-by: Jan Remmet <j.remmet@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:36:52 +08:00
Jan Remmet
4b73a1bd86 arm64: dts: imx8mm-phyboard-polis: move mipi bridge to som
sn65dsi83 is mounted on som. Add the static configuration there.
So it can be used by other boards too.
Use mipi_dsi_out from imx8mm.dtsi directly.

Signed-off-by: Jan Remmet <j.remmet@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:36:52 +08:00
Jan Remmet
aab882780a arm64: dts: imx8mm-phyboard-polis: Use GPL-2.0-or-later OR MIT
Update license and remove individual authorship.

Signed-off-by: Jan Remmet <j.remmet@phytec.de>
Acked-by: Teresa Remmet <t.remmet@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:36:52 +08:00
Primoz Fiser
33583fa58e arm64: dts: freescale: Add phyBOARD-Segin-i.MX91 support
Add initial support for the PHYTEC phyBOARD-Segin-i.MX91 board [1] based
on the PHYTEC phyCORE-i.MX91 SoM (System-on-Module) [2].

Supported features:
* Audio
* CAN
* eMMC
* Ethernet
* I2C
* RTC
* SD-Card
* UART
* USB

For more details see the product pages for the development board and the
SoM:

[1] https://www.phytec.eu/en/produkte/development-kits/phyboard-segin-kit/
[2] https://www.phytec.eu/en/produkte/system-on-modules/phycore-imx-91-93/

Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:35:26 +08:00
Primoz Fiser
21906e6f52 dt-bindings: arm: fsl: Add PHYTEC phyBOARD-Segin-i.MX91 board
Add device-tree bindings for PHYTEC phyBOARD-Segin-i.MX91 board based on
the PHYTEC phyCORE-i.MX91 SoM (System-on-Module).

Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 17:35:08 +08:00
Al Viro
cd08d17f39 convert selinuxfs
Tree has invariant part + two subtrees that get replaced upon each
policy load.  Invariant parts stay for the lifetime of filesystem,
these two subdirs - from policy load to policy load (serialized
on lock_rename(root, ...)).

All object creations are via d_alloc_name()+d_add() inside selinuxfs,
all removals are via simple_recursive_removal().

Turn those d_add() into d_make_persistent()+dput() and that's mostly it.

Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
d1e4a99358 selinuxfs: new helper for attaching files to tree
allocating dentry after the inode has been set up reduces the amount
of boilerplate - "attach this inode under that name and this parent
or drop inode in case of failure" simplifies quite a few places.

Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
d297622875 selinuxfs: don't stash the dentry of /policy_capabilities
Don't bother to store the dentry of /policy_capabilities - it belongs
to invariant part of tree and we only use it to populate that directory,
so there's no reason to keep it around afterwards.

Same situation as with /avc, /ss, etc.  There are two directories that
get replaced on policy load - /class and /booleans.  These we need to
stash (and update the pointers on policy reload); /policy_capabilities
is not in the same boat.

Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
7beafd51c4 convert binfmt_misc
removals are done with locked_recursive_removal(); switch creations to
simple_start_creating()/d_make_persistent()/simple_done_creating() and
take them to a helper (add_entry()), while we are at it - simpler control
flow that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
b85d6b2419 convert autofs
creation/removal is via normal VFS paths; make ->mkdir() and ->symlink()
use d_make_persistent(); ->rmdir() and ->unlink() - d_make_discardable()
instead of dput() and that's it.

d_make_persistent() works for unhashed just fine...

Note that only persistent dentries are ever hashed there; unusual absense
of ->d_delete() in dentry_operations is due to that - anything that has
refcount reach 0 will be unhashed there, so it won't get to checking
->d_delete anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
4c7d25094f autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
b89aa54482 convert binderfs
Objects are created either by d_alloc_name()+d_add() (in
binderfs_ctl_create()) or by simple_start_creating()+d_instantiate().
Removals are by simple_recurisive_removal().

Switch d_add()/d_instantiate() to d_make_persistent() + dput().
Voila - kill_litter_super() is not needed anymore.

Fold dput()+unlocking the parent into simple_done_creating(), while
we are at it.

NOTE: return value of binderfs_create_file() is borrowed; it may get
stored in proc->binderfs_entry.  See binder_release()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
02da8d2c09 binderfs_binder_ctl_create(): kill a bogus check
It's called once, during binderfs mount, right after allocating
root dentry.  Checking that it hadn't been already called is
only obfuscating things.

Looks like that bogosity had been copied from devpts...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Al Viro
185d241c88 binderfs: use simple_start_creating()
binderfs_binder_device_create() gets simpler, binderfs_create_dentry() simply
goes away...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Al Viro
b4a76faf9d convert devpts
Two kinds of objects there - ptmx and everything else (pty).  The former is
created on mount and kept until the fs shutdown; the latter get created
and removed by tty layer (the references are borrowed into tty->driver_data).
The reference to ptmx dentry is also kept, but we only ever use it to
find ptmx inode on remount.

* turn d_add() into d_make_persistent() + dput()  both in mknod_ptmx() and
in devpts_pty_new().

* turn dput() to d_make_discardable() in devpts_pty_kill().

* switch mknod_ptmx() to simple_{start,done}_creating().

* instead of storing in pts_fs_info a reference to ptmx dentry, store a reference
to its inode, seeing that this is what we use it for.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Al Viro
b83431564d ibmasmfs: get rid of ibmasmfs_dir_ops
it is always equal (and always had been equal) to &simple_dir_operations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Al Viro
e6ef35deec convert ibmasmfs
static contents for each "service processor", whatever the fuck it is.
Congruent subdirectories of root, created at mount time, taken out
by kill_litter_super().  All dentries created with d_alloc_name() and are
left pinned.  The odd part is that the list of service providers is
assumed to be unchanging - no locking, nothing to handle removals or
extra elements added later on.

... and it's a PCI device.  If you ever tell it to remove an instance,
you are fucked - it doesn't bother with removing its directory from filesystem,
it has a strange check that presumably wanted to be a check for removed
devices, but it had never been fleshed out.

Anyway, d_add() -> d_make_persistent()+dput() in ibmasmfs_create_dir() and
ibmasmfs_create_file(), and make the latter return int - no need to even
borrow that dentry, callers completely ignore it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Al Viro
ea800a515f convert spufs
have spufs_new_file() use d_make_persistent() instead of d_add() and
do an uncondition dput() in the caller; the rest is completely
straightforward.

[a braino in spufs_mkgang() fixed]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Al Viro
e11e247608 convert efivarfs
Initially filesystem is populated with d_alloc_name() + d_add().
That becomes d_alloc_name() + d_make_persistent() + dput().
Dynamic creation is switched to d_make_persistent();
removal - to simple_unlink() (no point open-coding it in
efivarfs_unlink(), better call it there)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Al Viro
0703b36e4e debugfs: remove duplicate checks in callers of start_creating()
we'd already verified that DEBUGFS_ALLOW_API was there in
start_creating() - it would've failed otherwise

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:03 -05:00
Al Viro
ebb54484c3 convert debugfs
similar to tracefs - simulation of normal codepath for creation,
simple_recursive_removal() for removal.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:03 -05:00
Al Viro
50d7fd3c3a convert tracefs
A mix of persistent and non-persistent dentries in there.  Strictly
speaking, no need for kill_litter_super() anyway - it pins an internal
mount whenever a persistent dentry is created, so at fs shutdown time
there won't be any to deal with.

However, let's make it explicit - replace d_instantiate() with
d_make_persistent() + dput() (the latter in tracefs_end_creating(),
where it folds with inode_unlock() into simple_done_creating())
for dentries we want persistent and have d_make_discardable() done
either by simple_recursive_removal() (used by tracefs_remove())
or explicitly in eventfs_remove_events_dir().

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:03 -05:00
Al Viro
eec876df5e convert pstore
object creation by d_alloc_name()+d_add() in pstore_mkfile(), removal -
via normal VFS codepaths (with ->unlink() using simple_unlink()) or
in pstore_put_backend_records() via locked_recursive_removal()

Replace d_add() with d_make_persistent()+dput() - that's what really
happens there.  The reference that goes into record->dentry is valid
only until the unlink (and explicitly cleared by pstore_unlink()).

Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:03 -05:00
Al Viro
5a8993a15a convert fuse_ctl
objects are created in fuse_ctl_add_dentry() by d_alloc_name()+d_add(),
removed by simple_remove_by_name().

What we return is a borrowed reference - it is valid until the call of
fuse_ctl_remove_conn() and we depend upon the exclusion (on fuse_mutex)
for safety.  Return value is used only within the caller
(fuse_ctl_add_conn()).

Replace d_add() with d_make_persistent() + dput().  dput() is paired
with d_alloc_name() and return value is the result of d_make_persistent().

Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:03 -05:00
Al Viro
ae1fde9038 convert dlmfs
All modifications via normal VFS codepaths; just take care of making
persistent in ->create() and ->mkdir() and that's it (removal side
doesn't need any changes, since it uses simple_rmdir() for ->rmdir()
and calls simple_unlink() from ->unlink()).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:03 -05:00
Al Viro
c5055286f8 convert bpf
object creation goes through the normal VFS paths or approximation
thereof (user_path_create()/done_path_create() in case of bpf_obj_do_pin(),
open-coded simple_{start,done}_creating() in bpf_iter_link_pin_kernel()
at mount time), removals go entirely through the normal VFS paths (and
->unlink() is simple_unlink() there).

Enough to have bpf_dentry_finalize() use d_make_persistent() instead
of dget() and we are done.

Convert bpf_iter_link_pin_kernel() to simple_{start,done}_creating(),
while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:03 -05:00
Al Viro
1e508e05dd convert mqueue
All modifications via normal VFS codepaths; just take care of making
persistent in in mqueue_create_attr() and discardable in mqueue_unlink()
and it doesn't need kill_litter_super() at all.

mqueue_unlink() side is best handled by having it call simple_unlink()
rather than duplicating its guts...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Al Viro
50889f533f convert hugetlbfs
Very much ramfs-like; dget()+d_instantiate() -> d_make_persistent()
(in two places) is all it takes.

NB: might make sense to turn its ->put_super() into ->kill_sb().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Al Viro
bdd80b5c1b convert smackfs
Entirely static tree populated by simple_fill_super().  Can use
kill_anon_super() as-is.

Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Al Viro
153f99a45a convert xenfs
entirely static tree, populated by simple_fill_super().  Can switch
to kill_anon_super() without any other changes.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Al Viro
b1494e6bc4 configfs, securityfs: kill_litter_super() not needed
These are guaranteed to be empty by the time they are shut down;
both are single-instance and there is an internal mount maintained
for as long as there is any contents.

Both have that internal mount pinned by every object in root.

In other words, kill_litter_super() boils down to kill_anon_super()
for those.

Reviewed-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Paul Moore <paul@paul-moore> (LSM)
Acked-by: Andreas Hindborg <a.hindborg@kernel.org> (configfs)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Al Viro
23cbc7a795 procfs: make /self and /thread_self dentries persistent
... and there's no need to remember those pointers anywhere - ->kill_sb()
no longer needs to bother since kill_anon_super() will take care of
them anyway and proc_pid_readdir() only wants the inumbers, which
we had in a couple of static variables all along.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Al Viro
2313598222 convert ramfs and tmpfs
Quite a bit is already done by infrastructure changes (simple_link(),
simple_unlink()) - all that is left is replacing d_instantiate() +
pinning dget() (in ->symlink() and ->mknod()) with d_make_persistent(),
and, in case of shmem, using simple_unlink() and simple_link() in
->unlink() and ->link() resp., instead of open-coding those there.
Since d_make_persistent() accepts (and hashes) unhashed ones, shmem
situation gets simpler - we no longer care whether ->lookup() has hashed
the sucker.

With that done, we don't need kill_litter_super() for these filesystems
anymore - by the umount time all remaining dentries will be marked
persistent and kill_litter_super() will boil down to call of
kill_anon_super().

The same goes for devtmpfs and rootfs - they are handled by
ramfs or by shmem, depending upon config.

NB: strictly speaking, both devtmpfs and rootfs ought to use
ramfs_kill_sb() if they end up using ramfs; that's a separate
story and the only impact of "just use kill_{litter,anon}_super()"
is that we fail to free their sb->s_fs_info... on reboot.
That's orthogonal to the changes in this series - kill_litter_super()
is identical to kill_anon_super() for those at this point.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Al Viro
e49ce25855 convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
Note that simple_unlink() et.al. are used by many filesystems; for now
they can not assume that persistency mark will have been set back
when the object got created.  Once all conversions are done we'll
have them complain if called for something that had not been marked
persistent.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:01 -05:00
Al Viro
bacdf1d70b primitives for maintaining persisitency
* d_make_persistent(dentry, inode) - bump refcount, mark persistent and
make hashed positive.  Return value is a borrowed reference to dentry;
it can be used until something removes persistency (at the very least,
until the parent gets unlocked, but some filesystems may have stronger
exclusion).

* d_make_discardable() - remove persistency mark and drop reference.

d_make_persistent() is similar to combination of d_instantiate(), dget()
and setting flag.  The only difference is that unlike d_instantiate()
it accepts hashed and unhashed negatives alike.  It is always called in
strong locking environment (parent held exclusive, or, in some cases,
dentry coming from d_alloc_name()); if we ever start using it with parent
held only shared and dentry coming from d_alloc_parallel(), we'll need
to copy the in-lookup logics from __d_add().

d_make_discardable() is eqiuvalent to combination of removing flag and
dput(); since flag removal requires ->d_lock, there's no point trying
to avoid taking that for refcount decrement as fast_dput() does.
The slow path of dput() has been taken into a helper and reused in
d_make_discardable() instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:01 -05:00
Al Viro
8a210cacf5 introduce a flag for explicitly marking persistently pinned dentries
Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
Reference is grabbed and not released, but it's not actually _stored_
anywhere.  That works, but it's hard to follow and verify; among other
things, we have no way to tell _which_ of the increments is intended
to be an unpaired one.  Worse, on removal we need to decide whether
the reference had already been dropped, which can be non-trivial if
that removal is on umount and we need to figure out if this dentry is
pinned due to e.g. unlink() not done.  Usually that is handled by using
kill_litter_super() as ->kill_sb(), but there are open-coded special
cases of the same (consider e.g. /proc/self).

Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
marking those "leaked" dentries.  Having it set claims responsibility
for +1 in refcount.

The end result this series is aiming for:

* get these unbalanced dget() and dput() replaced with new primitives that
  would, in addition to adjusting refcount, set and clear persistency flag.
* instead of having kill_litter_super() mess with removing the remaining
  "leaked" references (e.g. for all tmpfs files that hadn't been removed
  prior to umount), have the regular shrink_dcache_for_umount() strip
  DCACHE_PERSISTENT of all dentries, dropping the corresponding
  reference if it had been set.  After that kill_litter_super() becomes
  an equivalent of kill_anon_super().

Doing that in a single step is not feasible - it would affect too many places
in too many filesystems.  It has to be split into a series.

Here we
	* introduce the new flag
	* teach shrink_dcache_for_umount() to handle it (i.e. remove
and drop refcount on anything that survives to umount with that flag
still set)
	* teach kill_litter_super() that anything with that flag does
*not* need to be unpinned.

Next commits will add primitives for maintaing that flag and convert the
common helpers to those.  After that - a long series of per-filesystem
patches converting to those primitives.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:01 -05:00
Al Viro
1552ddc7fa new helper: simple_done_creating()
should be paired with simple_start_creating() - unlocks parent and
drops dentry reference.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:01 -05:00
Al Viro
4051a9115a new helper: simple_remove_by_name()
simple_recursive_removal(), but instead of victim dentry it takes
parent + name.

Used to be open-coded in fs/fuse/control.c, but there's no need to expose
the guts of that thing there and there are other potential users, so
let's lift it into libfs...

Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:01 -05:00
Al Viro
798a401660 tracefs: fix a leak in eventfs_create_events_dir()
If we have LOCKDOWN_TRACEFS, the function bails out - *after*
having locked the parent directory and without bothering to
undo that.  Just check it before tracefs_start_creating()...

Fixes: e24709454c "tracefs/eventfs: Add missing lockdown checks"
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:01 -05:00
Al Viro
c460192aae fuse_ctl_add_conn(): fix nlink breakage in case of early failure
fuse_ctl_remove_conn() used to decrement the link count of root
manually; that got subsumed by simple_recursive_removal(), but
in case when subdirectory creation has failed the latter won't
get called.

Just move the modification of parent's link count into
fuse_ctl_add_dentry() to keep the things simple.  Allows to
get rid of the nlink argument as well...

Fixes: fcaac5b427 "fuse_ctl: use simple_recursive_removal()"
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:01 -05:00
Dario Binacchi
2f185905da ARM: dts: imx28-amarula-rmm: add I2S audio
Add support for I2S audio on Amarula rmm board. Audio codec
TLV320AIC3X is connected as slave to SAIF0, which provides
bitclock, frame and MCLK.

Co-developed-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 12:03:08 +08:00
Dario Binacchi
193c1baab0 ARM: mxs_defconfig: enable sound drivers for imx28-amarula-rmm
Enable TLV320AIC3X I2C codec and simple-card support used on Amarula
rmm board.

Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 12:02:53 +08:00
João Paulo Gonçalves
ef2429fe38 arm64: defconfig: Enable i.MX95 drivers for pinctrl, Ethernet and PCIe
Enable i.MX95 pinctrl driver necessary for booting. Also enable the
missing drivers required for Ethernet and PCIe functionality. These
drivers are used on i.MX95 boards, including the NXP i.MX95 19x19 EVK.

The below configurations were enabled (listed with their DT nodes on
imx95.dtsi):

* CONFIG_PINCTRL_IMX_SCMI for the `scmi_iomuxc` pinctrl.
* CONFIG_CLK_IMX95_BLK_CTL for the HSIO domain clock controller
  (`hsio_blk_ctl`) used by the PCIe controller.
* CONFIG_NXP_NETC_BLK_CTRL for the NETC hardware domain controller
  (`netc_blk_ctrl`).
* CONFIG_NXP_ENETC4 for the Ethernet controller (`enetc_port*`).

Signed-off-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 12:01:44 +08:00
Stefan Eichenberger
4c9cf0b3f1 arm64: dts: imx8-apalis: use startup-delay-us for wifi regulator
We used regulator-settling-time-us for the wifi regulator which is
wrong for regulator-fixed. We have to use startup-delay-us instead.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com>
Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 11:55:56 +08:00
Stefan Eichenberger
aa6f6799f9 arm64: dts: imx8-apalis: rename wifi regulator
Rename the wifi regulator to what is the net name in the schematic.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com>
Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 11:55:56 +08:00
Max Krummenacher
4a59771555 arm64: dts: imx8-apalis: specify adc reference voltage regulator
This sets in_voltage_scale to calculate the measured voltage from the
raw digital value of the ADC.

Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 11:55:56 +08:00
Max Krummenacher
a2c79d55bd arm64: dts: imx8-apalis: add thermal nodes
Add the thermal-zones and cooling-maps nodes for the PMIC device.

Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 11:55:56 +08:00
Max Krummenacher
53948d7f54 arm64: dts: imx8-apalis: cleanup todo
Functionality has been added without removing the associated TODO
comments.
Clean that up by removing TODOs no longer applicable.

Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 11:55:56 +08:00
Shengjiu Wang
b21f87b832 arm64: dts: imx8mp-evk: enable hdmi_pai device
Enable hdmi_pai device.

Aud2htx module, hdmi_pai and hdmi controller compose the hdmi audio
pipeline.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 11:48:36 +08:00
Shengjiu Wang
4677a3f84c arm64: dts: imx8mp: Add hdmi parallel audio interface node
The HDMI TX Parallel Audio Interface (HTX_PAI) is a bridge between the
Audio Subsystem to the HDMI TX Controller.

Shrink register map size of hdmi_pvi to avoid overlapped hdmi_pai device.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-16 11:48:25 +08:00
Michael Riesch
78f316e909 arm64: dts: rockchip: add vicap node to rk356x
Add the device tree node for the RK356x Video Capture (VICAP) unit.

Signed-off-by: Michael Riesch <michael.riesch@wolfvision.net>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Michael Riesch <michael.riesch@collabora.com>
Link: https://patch.msgid.link/20240220-rk3568-vicap-v15-13-8f4915ee365d@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-16 00:39:47 +01:00
Mehdi Djait
d2da7e98c0 arm64: dts: rockchip: add the vip node to px30
Add the device tree node for the PX30 Video Input Processor (VIP).

Signed-off-by: Mehdi Djait <mehdi.djait@bootlin.com>
[added cosmetic changes]
Signed-off-by: Michael Riesch <michael.riesch@wolfvision.net>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Michael Riesch <michael.riesch@collabora.com>
Link: https://patch.msgid.link/20240220-rk3568-vicap-v15-12-8f4915ee365d@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-16 00:39:47 +01:00
Michael Riesch
60705b039f arm64: defconfig: enable rockchip camera interface
The Rockchip Camera Interface (CIF) is featured in many Rockchip SoCs
in different variations.
Enable the driver for it in the default configuration.

Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Mehdi Djait <mehdi.djait@linux.intel.com>
Signed-off-by: Michael Riesch <michael.riesch@collabora.com>
Link: https://patch.msgid.link/20240220-rk3568-vicap-v15-11-8f4915ee365d@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-16 00:34:27 +01:00
Chukun Pan
4e8cb8e032 arm64: dts: rockchip: fixes audio for 100ASK DshanPi A1
Correct the i2c address of the ES8388 chip and the gpio of hp_det pin.

According to the schematic, the ES8388 chip is connected to the SoC's
sai2 controller instead of sai1, fix it.

Fixes: d809417c5a ("arm64: dts: rockchip: add DTs for 100ASK DShanPi A1")
Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Link: https://patch.msgid.link/20251101120010.41729-5-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-15 20:50:29 +01:00
Chukun Pan
eaf661552b arm64: dts: rockchip: fixes vcc3v3_s0 supply for 100ASK DshanPi A1
From the schematic, the supply of ES8388 chip and the vmmc-supply
of SD card both originate from the VCC3V3_S0 regulator, fix it.

Fixes: d809417c5a ("arm64: dts: rockchip: add DTs for 100ASK DShanPi A1")
Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Link: https://patch.msgid.link/20251101120010.41729-4-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-15 20:50:25 +01:00
Nicolas Ferre
7d5864dc5d ARM: dts: microchip: sama5d2: fix spi flexcom fifo size to 32
Unlike standalone spi peripherals, on sama5d2, the flexcom spi have fifo
size of 32 data. Fix flexcom/spi nodes where this property is wrong.

Fixes: 6b9a3584c7 ("ARM: dts: at91: sama5d2: Add missing flexcom definitions")
Cc: stable@vger.kernel.org # 5.8+
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/20251114140225.30372-1-nicolas.ferre@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2025-11-15 15:53:56 +02:00
Aaron Kling
0936fa8015 arm64: tegra: Remove OTG ID GPIO from Jetson TX2 NX
The P3509 carrier board does not connect the ID GPIO. Prior to this, the
GPIO role switch driver could not detect the mode of the OTG port.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:12 +01:00
Aaron Kling
a72c3372ad arm64: tegra: Set USB Micro-B port to OTG mode on P3450
The USB Micro-B port on p3450 is capable of OTG and doesn't need to be
hardcoded to peripheral. No other supported Tegra device is set up like
this, so align for consistency.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:12 +01:00
Diogo Ivo
7beff596dd arm64: tegra: Add NVJPG node for Tegra210 platforms
The Tegra X1 chip contains a NVJPG accelerator capable of
encoding/decoding JPEG files in hardware. Complete its DT node
and enable it.

Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:12 +01:00
Diogo Ivo
172de4d681 arm64: tegra: Add Tegra210 NVJPG power-domain node
Add the NVJPG power-domain node in order to support the NVJPG
accelerator in Tegra210 platforms.

Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:12 +01:00
Haotien Hsu
11c52a226e arm64: tegra: Add interrupts for Tegra234 USB wake events
Add interrupts for Tegra234 USB wake events to support the USB wake-up
function.

Signed-off-by: Haotien Hsu <haotienh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:12 +01:00
Aaron Kling
3da2ea1933 arm64: tegra: Add reserved-memory node for P2180
The Tegra210 L4T bootloader RAM training will corrupt the in-RAM kernel
DT if no reserved-memory node exists. This prevents said bootloader from
being able to boot a kernel without this node, unless a chainloaded
bootloader loads the DT. Add the node to eliminate the requirement for
extra boot stages.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:11 +01:00
Aaron Kling
28f917d1a8 arm64: tegra: Add reserved-memory node for P3450
The Tegra210 L4T bootloader RAM training will corrupt the in-RAM kernel
DT if no reserved-memory node exists. This prevents said bootloader from
being able to boot a kernel without this node, unless a chainloaded
bootloader loads the DT. Add the node to eliminate the requirement for
extra boot stages.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:11 +01:00
Aaron Kling
961e69a5dc arm64: tegra: Enable NVDEC and NVENC on Tegra210
The other engines are already enabled, finish filling out the media
engine nodes and power domains.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:11 +01:00
Nino Zhang
feee7f5ae2 arm64: tegra: Fix APB DMA controller node name
The APB DMA controller node is currently named "dma@60020000", but
according to the DT bindings the node name should be "dma-controller".

Update the node name to match the binding and fix dtbs_check warnings.

Signed-off-by: Nino Zhang <ninozhang001@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:11 +01:00
Krzysztof Kozlowski
6db753b921 arm64: tegra: Add default GIC address cells on Tegra210
Add missing address-cells 0 to GIC interrupt node to silence W=1
warning:

  tegra210.dtsi:31.3-41: Warning (interrupt_map): /pcie@1003000:interrupt-map:
    Missing property '#address-cells' in node /interrupt-controller@50041000, using 0 as fallback

Value '0' is correct because:
1. GIC interrupt controller does not have children,
2. interrupt-map property (in PCI node) consists of five components and
   the fourth component "parent unit address", which size is defined by
   '#address-cells' of the node pointed to by the interrupt-parent
   component, is not used (=0)

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:11 +01:00
Krzysztof Kozlowski
7d14d53bc8 arm64: tegra: Add default GIC address cells on Tegra194
Add missing address-cells 0 to GIC interrupt node to silence W=1
warning:

  tegra194.dtsi:2391.4-42: Warning (interrupt_map): /bus@0/pcie@14100000:interrupt-map:
    Missing property '#address-cells' in node /bus@0/interrupt-controller@3881000, using 0 as fallback

Value '0' is correct because:
1. GIC interrupt controller does not have children,
2. interrupt-map property (in PCI node) consists of five components and
   the fourth component "parent unit address", which size is defined by
   '#address-cells' of the node pointed to by the interrupt-parent
   component, is not used (=0)

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:11 +01:00
Krzysztof Kozlowski
311cbb9c13 arm64: tegra: Add default GIC address cells on Tegra186
Add missing address-cells 0 to GIC interrupt node to silence W=1
warning:

  tegra186.dtsi:1355.3-41: Warning (interrupt_map): /pcie@10003000:interrupt-map:
    Missing property '#address-cells' in node /interrupt-controller@3881000, using 0 as fallback

Value '0' is correct because:
1. GIC interrupt controller does not have children,
2. interrupt-map property (in PCI node) consists of five components and
   the fourth component "parent unit address", which size is defined by
   '#address-cells' of the node pointed to by the interrupt-parent
   component, is not used (=0)

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:10 +01:00
Krzysztof Kozlowski
cddc94edf1 arm64: tegra: Add default GIC address cells on Tegra132
Add missing address-cells 0 to GIC interrupt node to silence W=1
warning:

  tegra132.dtsi:32.3-41: Warning (interrupt_map): /pcie@1003000:interrupt-map:
    Missing property '#address-cells' in node /interrupt-controller@50041000, using 0 as fallback

Value '0' is correct because:
1. GIC interrupt controller does not have children,
2. interrupt-map property (in PCI node) consists of five components and
   the fourth component "parent unit address", which size is defined by
   '#address-cells' of the node pointed to by the interrupt-parent
   component, is not used (=0)

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:10 +01:00
Aaron Kling
654427e0b9 arm64: tegra: Add OPP tables on Tegra210
This adds OPP tables for ACTMON and EMC, enabling dynamic frequency
scaling for system memory.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:10 +01:00
Aaron Kling
3cad436939 arm64: tegra: Add interconnect properties for Tegra210
Add interconnect properties to the Memory Controller, External Memory
Controller and the Display Controller nodes in order to describe the
hardware interconnection.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:10 +01:00
Aaron Kling
60bb55de5d arm64: tegra: Add ACTMON on Tegra210
This enables the action monitor to facilitate dynamic frequency scaling.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 22:55:10 +01:00
Haotien Hsu
905f0dcc38 dt-bindings: usb: Add wake-up support for Tegra234 XUSB host controller
Populate USB wake events for Tegra234 XUSB host controller.
These wake-up events are optional to maintain backward compatibility and
because the USB controller does not require them for normal operation.

Signed-off-by: Haotien Hsu <haotienh@nvidia.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 19:31:10 +01:00
Tom Lendacky
275d6d1189 KVM: SEV: Add known supported SEV-SNP policy bits
Add to the known supported SEV-SNP policy bits that don't require any
implementation support from KVM in order to successfully use them.

At this time, this includes:
  - CXL_ALLOW
  - MEM_AES_256_XTS
  - RAPL_DIS
  - CIPHERTEXT_HIDING_DRAM
  - PAGE_SWAP_DISABLE

Arguably, RAPL_DIS and CIPHERTEXT_HIDING_DRAM require KVM and the CCP
driver to enable these features in order for the setting of the policy
bits to be successfully handled. But, a guest owner may not wish their
guest to run on a system that doesn't provide support for those features,
so allowing the specification of these bits accomplishes that. Whether
or not the bit is supported by SEV firmware, a system that doesn't support
these features will either fail during the KVM validation of supported
policy bits before issuing the LAUNCH_START or fail during the
LAUNCH_START.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/ec040de9864099cf592a97c201dc4cc110b2b0cf.1761593632.git.thomas.lendacky@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-14 10:30:12 -08:00
Tom Lendacky
7a61d61396 KVM: SEV: Publish supported SEV-SNP policy bits
Define the set of policy bits that KVM currently knows as not requiring
any implementation support within KVM. Provide this value to userspace
via the KVM_GET_DEVICE_ATTR ioctl.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/c596f7529518f3f826a57970029451d9385949e5.1761593632.git.thomas.lendacky@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-14 10:30:12 -08:00
Tom Lendacky
c9434e64e8 crypto: ccp - Add an API to return the supported SEV-SNP policy bits
Supported policy bits are dependent on the level of SEV firmware that is
currently running. Create an API to return the supported policy bits for
the current level of firmware.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://patch.msgid.link/e3f711366ddc22e3dd215c987fd2e28dc1c07f54.1761593632.git.thomas.lendacky@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-14 10:30:10 -08:00
Haotien Hsu
dd03d34d6d soc/tegra: pmc: Add USB wake events for Tegra234
Add USB wake events for Tegra234 so that system can be woken up from
suspend when USB devices hot-plug/unplug event is detected.

Signed-off-by: Haotien Hsu <haotienh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 19:29:50 +01:00
Tom Lendacky
ce62118a2e KVM: SEV: Consolidate the SEV policy bits in a single header file
Consolidate SEV policy bit definitions into a single file. Use
include/linux/psp-sev.h to hold the definitions and remove the current
definitions from the arch/x86/kvm/svm/sev.c and arch/x86/include/svm.h
files.

No functional change intended.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/d9639f88a0b521a1a67aeac77cc609fdea1f90bd.1761593632.git.thomas.lendacky@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-14 10:29:43 -08:00
Jon Hunter
23a29a81b4 ARM: tegra: Enable EXT4 for Tegra
After commit d6ace46c82 ("ext4: remove obsolete EXT3 config options")
was added, when using the 'tegra_defconfig' kernel configuration,
mounting an MMC device on Tegra20, Tegra30 and Tegra124 boards is
failing with "unknown filesystem type 'ext4'". Fix this by updating the
'tegra_defconfig' to use the EXT4 config options and remove the
obselete EXT2/3 options.

Fixes: d6ace46c82 ("ext4: remove obsolete EXT3 config options")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 18:05:14 +01:00
Aaron Kling
22c788aec2 dt-bindings: devfreq: tegra30-actmon: Add Tegra124 fallback for Tegra210
The Tegra210 actmon is compatible with the existing Tegra124 driver.
Describe the compatibles as such.

Acked-by: Chanwoo Choi <cw00c.choi@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 18:00:47 +01:00
Johan Hovold
500e1368e4 amba: tegra-ahb: Fix device leak on SMMU enable
Make sure to drop the reference taken to the AHB platform device when
looking up its driver data while enabling the SMMU.

Note that holding a reference to a device does not prevent its driver
data from going away.

Fixes: 89c788bab1 ("ARM: tegra: Add SMMU enabler in AHB")
Cc: stable@vger.kernel.org	# 3.5
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 17:49:12 +01:00
Svyatoslav Ryhel
f224e936da ARM: tegra: Add device-tree for Xiaomi Mi Pad (A0101)
The Mi Pad is a tablet computer based on Nvidia Tegra K1 SoC which
originally ran the Android operating system. The Mi Pad has a 7.9" IPS
display with 1536 x 2048 (324 ppi) resolution. 2 GB of RAM and 16/64 GB
of internal memory that can be supplemented with a microSDXC card giving
up to 128 GB of additional storage.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 17:43:09 +01:00
Thierry Reding
aaca2e9933 soc/tegra: pmc: Document tegra_pmc.syscore field
This eliminates a warning from the documentation build targets.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 17:42:08 +01:00
Prathamesh Shete
1c458bbe4b soc/tegra: pmc: Don't fail if "aotag" is not present
The "aotog" is an optional aperture, so if that aperture is not defined
for a given device, then initialise the 'aotag' pointer to NULL instead
of returning an error. Note that the PMC driver will not use 'aotag'
pointer if initialised to NULL.

Co-developed-by: Shardar Mohammed <smohammed@nvidia.com>
Signed-off-by: Shardar Mohammed <smohammed@nvidia.com>
Signed-off-by: Prathamesh Shete <pshete@nvidia.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 17:35:36 +01:00
Aaron Kling
688dfe40b4 soc/tegra: fuse: speedo-tegra210: Add SoC speedo 2
The Jetson Nano series of modules only have 2 EMC table entries,
different from other SoC SKUs. As the EMC driver uses the SoC speedo ID
to populate the EMC OPP tables, add a new speedo ID to uniquely identify
this.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 17:18:25 +01:00
Aaron Kling
ce27c9c212 soc/tegra: fuse: speedo-tegra210: Update speedo IDs
Existing code only sets CPU and GPU speedo IDs 0 and 1. The CPU DVFS
code supports 11 IDs and nouveau supports 5. This aligns with what the
downstream vendor kernel supports. Align SKUs with the downstream list.

The Tegra210 CVB tables were added in the first referenced fixes commit.
Since then, all Tegra210 SoCs have tried to scale to 1.9 GHz, when the
supported devkits are only supposed to scale to 1.5 or 1.7 GHZ.
Overclocking should not be the default state.

Fixes: 2b2dbc2f94 ("clk: tegra: dfll: add CVB tables for Tegra210")
Fixes: 579db6e5d9 ("arm64: tegra: Enable DFLL support on Jetson Nano")
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 17:17:33 +01:00
Shubhi Garg
145cde94a5 arm64: defconfig: Enable NVIDIA VRS PSEQ RTC
Enable NVIDIA VRS (Voltage Regulator Specification) RTC device module.
It provides functionality to get/set system time, retain system time
across boot, wake system from suspend and shutdown state.

Supported platforms:
- NVIDIA Jetson AGX Orin Developer Kit
- NVIDIA IGX Orin Development Kit
- NVIDIA Jetson Orin NX Developer Kit
- NVIDIA Jetson Orin Nano Developer Kit

Signed-off-by: Shubhi Garg <shgarg@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 16:29:44 +01:00
Shubhi Garg
b1806f2b4e arm64: tegra: Add device-tree node for NVVRS RTC
Add NVIDIA VRS (Voltage Regulator Specification) RTC device tree node for
Tegra234 P3701 and P3767 platforms. Assign VRS RTC as primary RTC (rtc0).

Signed-off-by: Shubhi Garg <shgarg@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 16:28:51 +01:00
Krishna Kurapati
f481e772e0 arm64: dts: qcom: sdx75: Add missing usb-role-switch property
Add missing usb-role-switch property to usb controller node.

Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251114061553.512441-3-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-14 09:23:44 -06:00
Krishna Kurapati
101dae743d arm64: dts: qcom: sdx75: Flatten usb controller node
Flatten usb controller node and update to using latest bindings
and flattened driver approach.

Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251114061553.512441-2-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-14 09:23:17 -06:00
Bruno Sobreira França
e13c1f34aa soc/tegra: Resolve a spelling error in the tegra194-cbb.c
Fix a typo spotted during code reading.

Signed-off-by: Bruno Sobreira França <brunofrancadevsec@gmail.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 16:16:20 +01:00
Kartik Rajput
c87f820bc4 soc/tegra: fuse: Do not register SoC device on ACPI boot
On Tegra platforms using ACPI, the SMCCC driver already registers the
SoC device. This makes the registration performed by the Tegra fuse
driver redundant.

When booted via ACPI, skip registering the SoC device and suppress
printing SKU information from the Tegra fuse driver, as this information
is already provided by the SMCCC driver.

Fixes: 972167c690 ("soc/tegra: fuse: Add ACPI support for Tegra194 and Tegra234")
Cc: stable@vger.kernel.org
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 16:11:19 +01:00
Svyatoslav Ryhel
f797e587eb arm64: tegra: Move avdd-dsi-csi-supply into CSI node
avdd-dsi-csi-supply belongs in CSI node, not VI.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> # tegra20, parallel camera
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 16:01:56 +01:00
Svyatoslav Ryhel
dc3ec7cbe9 arm64: tegra: Drop redundant clock and reset names from TSEC node
Clock and reset names are not needed if node contains only one clock and
one reset.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 16:01:56 +01:00
Thierry Reding
8911ee2543 arm64: tegra: Move HDA into the correct bus
HDA is part of the DISP_USB bus, so move it into that and drop the
address prefix accordingly.

Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 16:01:46 +01:00
Svyatoslav Ryhel
e7dc9c3a6f dt-bindings: display: tegra: Document Tegra20 and Tegra30 CSI
Document CSI HW block found in Tegra20 and Tegra30 SoC.

The #nvidia,mipi-calibrate-cells is not an introduction of property, such
property already exists in nvidia,tegra114-mipi.yaml and is used in
multiple device trees. In case of Tegra30 and Tegra20 CSI block combines
mipi calibration function and CSI function, in Tegra114+ mipi calibration
got a dedicated hardware block which is already supported. This property
here is used to align with mipi-calibration logic used by Tegra114+.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 15:54:57 +01:00
Svyatoslav Ryhel
2cf244db79 ARM: tegra: add CSI nodes for Tegra20 and Tegra30
Add CSI node to Tegra20 and Tegra30 device trees.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 15:54:02 +01:00
Xueyao An
aabd179bcb arm64: dts: qcom: HAMOA-IOT-SOM: Unreserve GPIOs blocking SPI11 access
GPIOs 44-47 were previously reserved, preventing Linux from accessing
SPI11 (qupv1_se3). Since there is no TZ use case for these pins on Linux,
they can be safely unreserved. Removing them from the reserved list
resolves the SPI11 access issue for Linux.

Signed-off-by: Xueyao An <xueyao.an@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251105054548.2347569-1-xueyao.an@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-14 08:51:16 -06:00
Praveen Talari
9c92d36b0b arm64: dts: qcom: qrb2210-rb1: Fix UART3 wakeup IRQ storm
For BT use cases, pins are configured with pull-up state in sleep state
to avoid noise. If IRQ type is configured as level high and the GPIO line
is also in a high state, it causes continuous interrupt assertions leading
to an IRQ storm when wakeup irq enables at system suspend/runtime suspend.

Switching to edge-triggered interrupt (IRQ_TYPE_EDGE_FALLING) resolves
this by only triggering on state transitions (high-to-low) rather than
maintaining sensitivity to the static level state, effectively preventing
the continuous interrupt condition and eliminating the wakeup IRQ storm.

Fixes: 9380e0a1d4 ("arm64: dts: qcom: qrb2210-rb1: add Bluetooth support")
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251110101043.2108414-2-praveen.talari@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-14 08:51:15 -06:00
Bjorn Andersson
bba4562adc Revert "arm64: dts: qcom: sc7280: Increase config size to 256MB for ECAM feature"
The author failed to document the dependencies of this commit, resulting
in a regression.

This reverts commit 03e928442d.

Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-14 08:51:14 -06:00
Svyatoslav Ryhel
ad5eb81cbd ARM: tegra: Add missing HOST1X device nodes on Tegra124
Add nodes for devices on the HOST1X bus: VI, ISP, ISPB, MSENC and TSEC.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 15:42:28 +01:00
Svyatoslav Ryhel
cead96a67e ARM: tegra: Add missing HOST1X device nodes on Tegra114
Add nodes for devices on the HOST1X bus: VI, EPP, ISP, MSENC and TSEC.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 15:42:16 +01:00
Svyatoslav Ryhel
a0c70244e5 dt-bindings: display: tegra: document EPP, ISP, MPE and TSEC for Tegra114+
The current EPP, ISP and MPE schemas are largely compatible with Tegra114+,
requiring only minor adjustments. Additionally, the TSEC schema for the
Security engine, which is available from Tegra114 onwards, is included.

Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 15:39:39 +01:00
Niravkumar L Rabara
44964e81d1 arm64: dts: socfpga: add Agilex3 board
Agilex3 SoCFPGA development kit is a small form factor board similar to
Agilex5 013b board. Agilex3 is derived from Agilex5 SoCFPGA, with the main
difference of CPU cores — Agilex3 has 2 cores compared to 4 in Agilex5.

Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-14 07:00:22 -06:00
Niravkumar L Rabara
425c16c37d dt-bindings: intel: Add Agilex3 SoCFPGA board
Add compatible string for Agilex3 SoCFPGA board, which shares the same
architecture as Agilex5 but with two fewer CPU cores.

Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-14 06:59:24 -06:00
Nicolas Ferre
5654889a94 ARM: dts: microchip: sama7g5: fix uart fifo size to 32
On some flexcom nodes related to uart, the fifo sizes were wrong: fix
them to 32 data.

Fixes: 7540629e2f ("ARM: dts: at91: add sama7g5 SoC DT and sama7g5-ek")
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/20251114103313.20220-2-nicolas.ferre@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2025-11-14 14:13:07 +02:00
Nicolas Ferre
1f591be0a0 ARM: dts: microchip: sama7d65: fix uart fifo size to 32
On some flexcom nodes related to uart, the fifo sizes were wrong: fix
them to 32 data.  Note that product datasheet is being reviewed to fix
inconsistency, but this value is validated by product's designers.

Fixes: 261dcfad1b ("ARM: dts: microchip: add sama7d65 SoC DT")
Fixes: b51e4aea3e ("ARM: dts: microchip: sama7d65: Add FLEXCOMs to sama7d65 SoC")
Cc: stable@vger.kernel.org # 6.16+
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/20251114103313.20220-1-nicolas.ferre@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2025-11-14 14:13:06 +02:00
Thierry Reding
a97fbc3ee3 syscore: Pass context data to callbacks
Several drivers can benefit from registering per-instance data along
with the syscore operations. To achieve this, move the modifiable fields
out of the syscore_ops structure and into a separate struct syscore that
can be registered with the framework. Add a void * driver data field for
drivers to store contextual data that will be passed to the syscore ops.

Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 10:01:52 +01:00
Gatien Chevallier
3bae4748c1 bus: rifsc: add debugfs entry to dump the firewall configuration
RIFSC configuration can be difficult to debug. Add a debugfs entry
that dumps the configuration of the RISUPs, the RISALs and the RIMUs.
This will allow to display the whole RIFSC firewall configuration at
runtime.

While there, fix a bug on the computation of firewall entries in the
probe function.

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Link: https://lore.kernel.org/r/20251106-rifsc_debugfs-v2-3-f90e94ae756d@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:49:09 +01:00
Gatien Chevallier
a6961e2ced arm64: dts: st: set RIFSC as an access controller on stm32mp21x platforms
Similarly to stm32mp23x/25x platforms, the RIFSC is a firewall
controller. Declare it as an access controller, keep the "simple-bus"
compatible in case CONFIG_STM32_FIREWALL is not set and update the
child nodes.

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Link: https://lore.kernel.org/r/20251106-rifsc_debugfs-v2-2-f90e94ae756d@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:48:40 +01:00
Gatien Chevallier
d50807fee6 dt-bindings: bus: add stm32mp21 RIFSC compatible
The STM32MP21x platforms have a slightly different RIFSC. While its
core functionalities are similar, the wiring is not the same. Hence,
declare a new compatible.

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20251106-rifsc_debugfs-v2-1-f90e94ae756d@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:46:12 +01:00
Gatien Chevallier
af9c7a6c5d ARM: dts: stm32: add the IWDG2 interrupt line in stm32mp131.dtsi
Add the interrupt line that can be used for the early interrupt of
the IWDG2 to the IWDG2 node

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Link: https://lore.kernel.org/r/20251031-iwdg1-v2-4-2dc6e0116725@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:39:50 +01:00
Gatien Chevallier
5da23908a3 ARM: dts: stm32: enable the ARM SMC watchdog node in stm32mp135f-dk
On the stm32mp135f-dk board, the IWDG1 is secured and used to monitor
the cortex-A7. Use the ARM SMC watchdog to communicate with it.

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Link: https://lore.kernel.org/r/20251031-iwdg1-v2-3-2dc6e0116725@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:39:50 +01:00
Gatien Chevallier
5512618b58 ARM: dts: stm32: add the ARM SMC watchdog in stm32mp131.dtsi
Add the arm_wdt node in the stm32mp131.dtsi SoC device tree file. When
the platform watchdog is managed by the secure world, SMC calls are used
to interact with it.

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Link: https://lore.kernel.org/r/20251031-iwdg1-v2-2-2dc6e0116725@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:39:50 +01:00
Gatien Chevallier
0b1aa90e2d ARM: dts: stm32: add iwdg1 node in stm32mp131.dtsi
Add the IWDG1 node in the stm32mp131.dtsi SoC device tree file. It can
be used by Linux as the Cortex-A7 watchdog when it's configured as
non-secure.

Signed-off-by: Gatien Chevallier <gatien.chevallier@foss.st.com>
Link: https://lore.kernel.org/r/20251031-iwdg1-v2-1-2dc6e0116725@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:39:50 +01:00
Antonio Borneo
e613ef5c15 arm64: dts: st: Add I/O sync to eth pinctrl in stm32mp25-pinctrl.dtsi
On board stm32mp257f-ev1, the propagation delay between eth1/eth2
and the external PHY requires a compensation to guarantee that no
packet get lost in all the working conditions.

Add I/O synchronization properties in pinctrl on all the RGMII
data pins, activating re-sampling on both edges of the clock.

Co-developed-by: Christophe Roullier <christophe.roullier@foss.st.com>
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20251023132700.1199871-13-antonio.borneo@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:33:12 +01:00
Patrice Chotard
22f0ae971c arm64: dts: st: Add memory-region-names property for stm32mp257f-ev1
In order to set the AMCR register, which configures the
memory-region split between ospi1 and ospi2, we need to
identify the ospi instance.

By using memory-region-names, it allows to identify the
ospi instance this memory-region belongs to.

Fixes: cad2492de9 ("arm64: dts: st: Add SPI NOR flash support on stm32mp257f-ev1 board")
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20251031-upstream_fix_dts_omm-v4-1-e4a059a50074@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-14 09:25:56 +01:00
Srikar Dronamraju
fb2ff9fa72 powerpc/smp: Expose die_id and die_cpumask
>From Power10 processors onwards, each chip has 2 hemispheres. For LPARs
running on PowerVM Hypervisor, hypervisor determines the allocation of
CPU groups to each LPAR, resulting in two LPARs with the same number of
CPUs potentially having different numbers of CPUs from each hemisphere.
Additionally, it is not feasible to ascertain the hemisphere based
solely on the CPU number.

Users wishing to assign their workload to all CPUs, or a subset of CPUs
within a specific hemisphere, encounter difficulties in identifying the
cpumask. To address this, it is proposed to expose hemisphere
information as a die in sysfs. This aligns with other architectures
and facilitates the identification of CPUs within the same hemisphere.
Tools such as lstopo can also access this information.

Please note: The hypervisor reveals the locality of the CPUs to
hemispheres only in dedicated mode. Consequently, in systems where
hemisphere information is unavailable, such as shared LPARs, the
die_cpus information in sysfs will mirror package_cpus, with
die_id set to -1.

Without this change.
$ grep . /sys/devices/system/cpu/cpu16/topology/{die*,package*} 2>/dev/null
/sys/devices/system/cpu/cpu16/topology/package_cpus:000000,000000ff,ffff0000
/sys/devices/system/cpu/cpu16/topology/package_cpus_list:16-39

With this change.
$ grep . /sys/devices/system/cpu/cpu16/topology/{die*,package*} 2>/dev/null
/sys/devices/system/cpu/cpu16/topology/die_cpus:000000,00000000,00ff0000
/sys/devices/system/cpu/cpu16/topology/die_cpus_list:16-23
/sys/devices/system/cpu/cpu16/topology/die_id:2
/sys/devices/system/cpu/cpu16/topology/package_cpus:000000,000000ff,ffff0000
/sys/devices/system/cpu/cpu16/topology/package_cpus_list:16-39

snipped lstopo-no-graphics o/p
  Group0 L#0 (total=8747584KB)
    Package L#0 (total=3564096KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
      NUMANode L#0 (P#0 local=3564096KB total=3564096KB)
      Die L#0 (P#0)
        Core L#0 (P#0)
<snipped>
    Package L#1 (total=5183488KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
      NUMANode L#1 (P#1 local=5183488KB total=5183488KB)
      Die L#2 (P#2)
        Core L#2 (P#16)
          L3Cache L#4 (size=4096KB linesize=128 ways=16)
            L2Cache L#4 (size=1024KB linesize=128 ways=8)
              L1dCache L#4 (size=32KB linesize=128 ways=8)
                L1iCache L#4 (size=48KB linesize=128 ways=6)
                  PU L#16 (P#16)
                  PU L#17 (P#18)
                  PU L#18 (P#20)
                  PU L#19 (P#22)
          L3Cache L#5 (size=4096KB linesize=128 ways=16)
            L2Cache L#5 (size=1024KB linesize=128 ways=8)
              L1dCache L#5 (size=32KB linesize=128 ways=8)
                L1iCache L#5 (size=48KB linesize=128 ways=6)
                  PU L#20 (P#17)
                  PU L#21 (P#19)
                  PU L#22 (P#21)
                  PU L#23 (P#23)
      Die L#3 (P#3)
        Core L#3 (P#24)
          L3Cache L#6 (size=4096KB linesize=128 ways=16)
            L2Cache L#6 (size=1024KB linesize=128 ways=8)
              L1dCache L#6 (size=32KB linesize=128 ways=8)
                L1iCache L#6 (size=48KB linesize=128 ways=6)
                  PU L#24 (P#24)
                  PU L#25 (P#26)
                  PU L#26 (P#28)
                  PU L#27 (P#30)
          L3Cache L#7 (size=4096KB linesize=128 ways=16)
            L2Cache L#7 (size=1024KB linesize=128 ways=8)
              L1dCache L#7 (size=32KB linesize=128 ways=8)
                L1iCache L#7 (size=48KB linesize=128 ways=6)
                  PU L#28 (P#25)
                  PU L#29 (P#27)
                  PU L#30 (P#29)
                  PU L#31 (P#31)
        Core L#4 (P#32)
          L3Cache L#8 (size=4096KB linesize=128 ways=16)
            L2Cache L#8 (size=1024KB linesize=128 ways=8)
              L1dCache L#8 (size=32KB linesize=128 ways=8)
                L1iCache L#8 (size=48KB linesize=128 ways=6)
                  PU L#32 (P#32)
                  PU L#33 (P#34)
                  PU L#34 (P#36)
                  PU L#35 (P#38)
          L3Cache L#9 (size=4096KB linesize=128 ways=16)
            L2Cache L#9 (size=1024KB linesize=128 ways=8)
              L1dCache L#9 (size=32KB linesize=128 ways=8)
                L1iCache L#9 (size=48KB linesize=128 ways=6)
                  PU L#36 (P#33)
                  PU L#37 (P#35)
                  PU L#38 (P#37)
                  PU L#39 (P#39)
  Group0 L#1 (total=7736896KB)
    Package L#2 (total=5170880KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
      NUMANode L#2 (P#2 local=5170880KB total=5170880KB)
      Die L#4 (P#4)
<snipped>

Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251112074859.814087-1-srikar@linux.ibm.com
2025-11-14 11:12:56 +05:30
Chukun Pan
9c79c9e035 arm64: dts: rockchip: fixes ethernet for 100ASK DshanPi A1
Currently, Ethernet is unusable due to an incorrect PHY address.
This commit fixes this, removes the incorrect 25M clock pinctrl,
and adds the missing PHY supply.

Fixes: d809417c5a ("arm64: dts: rockchip: add DTs for 100ASK DShanPi A1")
Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Link: https://patch.msgid.link/20251101120010.41729-3-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-14 00:02:06 +01:00
Chukun Pan
0b822c59c8 arm64: dts: rockchip: fixes regulator for 100ASK DshanPi A1
Referencing the schematic [1], correct the names of the USB
regulator, remove these non-existent RTC and UFS regulators.

[1] https://dl.100ask.net/Hardware/MPU/RK3576-DshanPi-A1/DshanPi-A1-RK3576-SCH_V1.1.pdf

Fixes: d809417c5a ("arm64: dts: rockchip: add DTs for 100ASK DShanPi A1")
Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Link: https://patch.msgid.link/20251101120010.41729-2-amadeus@jmu.edu.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-14 00:02:06 +01:00
Johan Jonker
76f4f7e04c arm64: dts: rockchip: correct assigned-clock-rates spelling on 2 boards
Due to some copy and paste from the manufacturer tree the property
assigned-clock-rate is missing a letter "s". Correct spelling to
reduce dtbs_check output.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://patch.msgid.link/92714b6c-6c0d-4a10-afe4-73ed313c87c0@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:54:37 +01:00
Johan Jonker
10712ce694 ARM: dts: rockchip: move edp assigned-clocks to edp node on rk3288
The rk3288 power-controller node contains an assigned-clocks property
that conflicts with the bindings. From the git history it shows that they
wanted to assign the rk3288 EDP_24M clock input centrally before an edp
node was available. Move the edp assigned-clocks property to the edp node
to reduce dtbs_check output.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://patch.msgid.link/7d6fa223-ab90-4c44-9180-54df78467ea5@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:52:04 +01:00
Coia Prant
c1ffe49930 arm64: dts: rockchip: clean up devicetree for 9Tripod X3568 v4
Fix indentation, remove unused SDIO properties, and drop the GMAC clock
that was used for input direction. The board uses the clock as
output, so the input clock is not needed.

Signed-off-by: Coia Prant <coiaprant@gmail.com>
Link: https://patch.msgid.link/20251107133839.300252-1-coiaprant@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:50:09 +01:00
Chris Morgan
7230cdc1fc arm64: dts: rockchip: Enable USB-C DP Alt for Indiedroid Nova
Enable the Display Port alt-mode for the USB-C port on the Indiedroid
Nova.

Note that while ROCKCHIP_VOP2_EP_DP0 is defined as 10 we need to
set the address to "a" or else we receive a dtc warning.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251107214724.878955-1-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:48:37 +01:00
Sebastian Reichel
9d856aa1c8 arm64: dts: rockchip: add eMMC CQE support for rk3588
The RK3588 eMMC controller supports CQE, so add the missing
DT flag.

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://patch.msgid.link/20251031-rockchip-emmc-cqe-support-v2-2-958171f5edad@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:46:02 +01:00
Torsten Duwe
3906f85588 arm64: dts: rockchip: enable HDMI audio on Rock 5 ITX
The Rock 5 ITX only needs enablement for 2 nodes in order to send audio
on HDMI1, the connector closer to the 12V barrel jack and farther from
S/PDIF.  It is sufficient to declare the audio injection as okay, and
to activate I2S6.

Note that for the other HDMI output it is not that trivial, as the video
data there originates from the SoC's DisplayPort output DP1 and is only
converted to HDMI in U7 (an RA620).

Signed-off-by: Torsten Duwe <duwe@lst.de>
[fixed commit subject prefixes]
Link: https://patch.msgid.link/20251110181153.CC62B6732A@verein.lst.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:43:02 +01:00
FUKAUMI Naoki
260316d35c arm64: dts: rockchip: Add eeprom vcc-supply for Radxa ROCK 3C
The VCC supply for the BL24C16 EEPROM chip found on Radxa ROCK 3C is
vcca1v8_pmu. [1] Describe this supply.

[1] https://dl.radxa.com/rock3/docs/hw/3c/v1400/radxa_rock_3c_v1400_schematic.pdf p.13

Fixes: ee219017dd ("arm64: dts: rockchip: Add Radxa ROCK 3C")
Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Link: https://patch.msgid.link/20251112035133.28753-4-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:28:24 +01:00
FUKAUMI Naoki
3069ff1930 arm64: dts: rockchip: Add eeprom vcc-supply for Radxa ROCK 5A
The VCC supply for the BL24C16 EEPROM chip found on Radxa ROCK 5A is
vcc_3v3_pmu, which is routed to vcc_3v3_s3 via a zero-ohm resistor. [1]
Describe this supply.

[1] https://dl.radxa.com/rock5/5a/docs/hw/radxa_rock5a_V1.1_sch.pdf p.4, p.19

Fixes: 89c880808c ("arm64: dts: rockchip: add I2C EEPROM to rock-5a")
Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Link: https://patch.msgid.link/20251112035133.28753-3-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:28:24 +01:00
FUKAUMI Naoki
92e6e0b0e5 arm64: dts: rockchip: Move the EEPROM to correct I2C bus on Radxa ROCK 5A
The BL24C16 EEPROM chip found on Radxa ROCK 5A is connected to the
i2c0 bus, [1] so move the eeprom node from the i2c2 bus to the i2c0
bus.

[1] Link: https://dl.radxa.com/rock5/5a/docs/hw/radxa_rock5a_V1.1_sch.pdf p.19

Fixes: 89c880808c ("arm64: dts: rockchip: add I2C EEPROM to rock-5a")
Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Link: https://patch.msgid.link/20251112035133.28753-2-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:28:23 +01:00
Heiko Stuebner
4c60942598 arm64: dts: rockchip: use SCMI clock id for gpu clock on rk356x
Instead of hard-coding 1, use the more descriptive ID from the binding
to reference the SCMI clock for the gpu on rk356x.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251105235408.163282-1-heiko@sntech.de
2025-11-13 23:25:24 +01:00
Shawn Lin
f151b74ead arm64: dts: rockchip: Remove sdmmc max-frequency on RK3588S EVB1 board
sdmmc on RK3588S EVB1 could work fine under 200Mhz, no need to limit it
to 150Mhz.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/1762844673-123776-1-git-send-email-shawn.lin@rock-chips.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:23:10 +01:00
FUKAUMI Naoki
57bd160290 arm64: dts: rockchip: Remove sdmmc max-frequency for Radxa ROCK 5 ITX/5B/5B+/5T
Default max-frequency for sdmmc is "200000000"[1]. Remove redundant
definition.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi?h=v6.17#n2013

Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Link: https://patch.msgid.link/20251111071730.126238-4-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:22:35 +01:00
FUKAUMI Naoki
c7a9549ce4 arm64: dts: rockchip: Switch microSD card detect to gpio on Radxa ROCK 5 ITX/5C
Due to the discussion about cd-gpios and sdmmmc_det pin functionality [1],
it would be better to use cd-gpios for now.

When the sdmmc controller runtime-suspends, the detection logic inside the
controller cannot detect anything anymore, which using the gpio variant
fixes.

The Rock 5B/5B+/5T already uses cd-gpios, so only get the pinctrl added.

[1] https://lore.kernel.org/linux-rockchip/20240912152538.1.I858c2a0bf83606c8b59ba1ab6944978a398d2ac5@changeid/

Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
[amended commit description a bit and squashed the pinctrl patch]
Link: https://patch.msgid.link/20251111071730.126238-2-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-13 23:22:35 +01:00
Yosry Ahmed
9f4ce48788 KVM: x86: Document a virtualization gap for GIF on AMD CPUs
According to the APM Volume #2, Section 15.17, Table 15-10 (24593—Rev.
3.42—March 2024), When "GIF==0", an "Debug exception or trap, due to
breakpoint register match" should be "Ignored and discarded".

KVM lacks any handling of this. Even when vGIF is enabled and vGIF==0,
the CPU does not ignore #DBs and relies on the VMM to do so.

Handling this is possible, but the complexity is unjustified given the
rarity of using HW breakpoints when GIF==0 (e.g. near VMRUN). KVM would
need to intercept the #DB, temporarily disable the breakpoint,
singe-step over the instruction (probably reusing NMI singe-stepping),
and re-enable the breakpoint.

Instead, document this as an erratum.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20251030223757.2950309-1-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-13 13:15:15 -08:00
Omar Sandoval
4da3768e18 KVM: SVM: Don't skip unrelated instruction if INT3/INTO is replaced
When re-injecting a soft interrupt from an INT3, INT0, or (select) INTn
instruction, discard the exception and retry the instruction if the code
stream is changed (e.g. by a different vCPU) between when the CPU
executes the instruction and when KVM decodes the instruction to get the
next RIP.

As effectively predicted by commit 6ef88d6e36 ("KVM: SVM: Re-inject
INT3/INTO instead of retrying the instruction"), failure to verify that
the correct INTn instruction was decoded can effectively clobber guest
state due to decoding the wrong instruction and thus specifying the
wrong next RIP.

The bug most often manifests as "Oops: int3" panics on static branch
checks in Linux guests.  Enabling or disabling a static branch in Linux
uses the kernel's "text poke" code patching mechanism.  To modify code
while other CPUs may be executing that code, Linux (temporarily)
replaces the first byte of the original instruction with an int3 (opcode
0xcc), then patches in the new code stream except for the first byte,
and finally replaces the int3 with the first byte of the new code
stream.  If a CPU hits the int3, i.e. executes the code while it's being
modified, then the guest kernel must look up the RIP to determine how to
handle the #BP, e.g. by emulating the new instruction.  If the RIP is
incorrect, then this lookup fails and the guest kernel panics.

The bug reproduces almost instantly by hacking the guest kernel to
repeatedly check a static branch[1] while running a drgn script[2] on
the host to constantly swap out the memory containing the guest's TSS.

[1]: https://gist.github.com/osandov/44d17c51c28c0ac998ea0334edf90b5a
[2]: https://gist.github.com/osandov/10e45e45afa29b11e0c7209247afc00b

Fixes: 6ef88d6e36 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
Cc: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Link: https://patch.msgid.link/1cc6dcdf36e3add7ee7c8d90ad58414eeb6c3d34.1762278762.git.osandov@fb.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-13 13:03:19 -08:00
Claudiu Beznea
c432180a7d soc: renesas: rz-sysc: Populate readable_reg/writeable_reg in regmap config
Not all system controller registers are accessible from Linux. Accessing
such registers generates synchronous external abort. Populate the
readable_reg and writeable_reg members of the regmap config to inform the
regmap core which registers can be accessed. The list will need to be
updated whenever new system controller functionality is exported through
regmap.

Fixes: 2da2740fb9 ("soc: renesas: rz-sysc: Add syscon/regmap support")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251105070526.264445-3-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:21:30 +01:00
Claudiu Beznea
4ff787433b soc: renesas: r9a09g056-sys: Populate max_register
Populate max_register to avoid external aborts.

Fixes: 2da2740fb9 ("soc: renesas: rz-sysc: Add syscon/regmap support")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251105070526.264445-2-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:21:30 +01:00
Lad Prabhakar
b272b94fd2 arm64: dts: renesas: rzt2h-n2h-evk: Enable Ethernet support
Enable Ethernet support on the RZ/T2H and RZ/N2H EVKs.

Configure the MIIC converter in mode 0x6:
  Port 0 <-> ETHSW Port 0
  Port 1 <-> ETHSW Port 1
  Port 2 <-> GMAC2
  Port 3 <-> GMAC1

Enable the ETHSS, GMAC1 and GMAC2 nodes. ETHSW support will be added
once the switch driver is available.

Configure the MIIC converters to map ports according to the selected
switching mode, with converters 0 and 1 mapped to switch ports and
converters 2 and 3 mapped to GMAC ports.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251110203926.692242-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:19:22 +01:00
Ovidiu Panait
1eb61aa4d8 arm64: dts: renesas: r9a09g057h44-rzv2h-evk: Enable RTC
Enable RTC.

Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251107210706.45044-5-ovidiu.panait.rb@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:19:22 +01:00
Ovidiu Panait
cfc733da4e arm64: dts: renesas: r9a09g057: Add RTC node
Add RTC node to Renesas RZ/V2H ("R9A09G057") SoC DTSI.

Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251107210706.45044-4-ovidiu.panait.rb@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:19:22 +01:00
Ovidiu Panait
f61c497a91 arm64: dts: renesas: r9a09g057h44-rzv2h-evk: Add NMI pushbutton support
RZ/V2H EVK has a user pushbutton connected to the SoC NMI pin, which
can be used to wake up the system from suspend to idle. Add a DT node
in the device tree to instantiate the gpio-keys driver for this button.

Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251027140651.18367-1-ovidiu.panait.rb@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:19:22 +01:00
Claudiu Beznea
0926e6e698 arm64: dts: renesas: rzg3s-smarc: Enable USB support
Enable USB support (host, device, USB PHYs).

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20251023135810.1688415-8-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:19:22 +01:00
Claudiu Beznea
c837ad879e arm64: dts: renesas: r9a08g045: Add USB support
Add USB nodes for the Renesas RZ/G3S SoC. This consists of PHY reset,
host and device support.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20251023135810.1688415-7-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:19:22 +01:00
Ovidiu Panait
1f77aced8c arm64: dts: renesas: r9a09g057: Add TSU nodes
The Renesas RZ/V2H SoC includes a Thermal Sensor Unit (TSU) block designed
to measure the junction temperature. The device provides real-time
temperature measurements for thermal management, utilizing two dedicated
channels for temperature sensing:
- TSU0, which is located near the DRP-AI block
- TSU1, which is located near the CPU and DRP-AI block

Since TSU1 is physically closer the CPU and the highest temperature
spot, it is used for CPU throttling through a passive trip and cooling
map. TSU0 is configured only with a critical trip.

Add TSU nodes along with thermal zones and keep them enabled in the SoC
DTSI.

Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251020143107.13974-4-ovidiu.panait.rb@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-13 21:19:22 +01:00
Markus Schneider-Pargmann (TI.com)
e0431ff998 firmware: ti_sci: Partial-IO support
Add support for Partial-IO poweroff. In Partial-IO pins of a few
hardware units can generate system wakeups while DDR memory is not
powered resulting in a fresh boot of the system. These hardware units in
the SoC are always powered so that some logic can detect pin activity.

If the system supports Partial-IO as described in the fw capabilities, a
sys_off handler is added. This sys_off handler decides if the poweroff
is executed by entering normal poweroff or Partial-IO instead. The
decision is made by checking if wakeup is enabled on all devices that
may wake up the SoC from Partial-IO.

The possible wakeup devices are found by checking which devices
reference a "Partial-IO" system state in the list of wakeup-source
system states. Only devices that are actually enabled by the user will
be considered as an active wakeup source. If none of the wakeup sources
is enabled the system will do a normal poweroff. If at least one wakeup
source is enabled it will instead send a TI_SCI_MSG_PREPARE_SLEEP
message from the sys_off handler. Sending this message will result in an
immediate shutdown of the system. No execution is expected after this
point. The code will wait for 5s and do an emergency_restart afterwards
if Partial-IO wasn't entered at that point.

A short documentation about Partial-IO can be found in section 6.2.4.5
of the TRM at
  https://www.ti.com/lit/pdf/spruiv7

Signed-off-by: Markus Schneider-Pargmann (TI.com) <msp@baylibre.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kendall Willis <k-willis@ti.com>
Reviewed-by: Sebin Francis <sebin.francis@ti.com>
Link: https://patch.msgid.link/20251103-topic-am62-partialio-v6-12-b4-v10-2-0557e858d747@baylibre.com
Signed-off-by: Nishanth Menon <nm@ti.com>
2025-11-13 13:03:55 -06:00
Markus Schneider-Pargmann (TI.com)
170a3ef605 firmware: ti_sci: Support transfers without response
Check the header flags if an response is expected or not. If it is not
expected skip the receive part of ti_sci_do_xfer(). This prepares the
driver for one-way messages as prepare_sleep for Partial-IO.

Reviewed-by: Kendall Willis <k-willis@ti.com>
Signed-off-by: Markus Schneider-Pargmann (TI.com) <msp@baylibre.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Sebin Francis <sebin.francis@ti.com>
Link: https://patch.msgid.link/20251103-topic-am62-partialio-v6-12-b4-v10-1-0557e858d747@baylibre.com
Signed-off-by: Nishanth Menon <nm@ti.com>
2025-11-13 13:03:55 -06:00
Vignesh Raghavendra
00fb4c73b6 arm64: dts: ti: k3-am62l: add initial reference board file
Add the initial board file for the AM62L3's Evaluation Module.

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Bryan Brattlof <bb@ti.com>
Link: https://patch.msgid.link/20251105-am62lx-v8-3-496f353e8237@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-13 22:01:07 +05:30
Vignesh Raghavendra
5f016758b0 arm64: dts: ti: k3-am62l: add initial infrastructure
Add the initial infrastructure needed for the AM62L. ALl of which can be
found in the Technical Reference Manual (TRM) located here:

    https://www.ti.com/lit/pdf/sprujb4

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Bryan Brattlof <bb@ti.com>
Link: https://patch.msgid.link/20251105-am62lx-v8-2-496f353e8237@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-13 22:01:07 +05:30
Sean Christopherson
398180f93c KVM: TDX: Use struct_size to simplify tdx_get_capabilities()
Use struct_size() instead of manually calculating the number of bytes to
allocate for 'caps', including the nested flexible array, and copy all of
'caps' to user space with a single copy_to_user() call (thanks to the full
size being provided by struct_size()).

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://patch.msgid.link/20251017213914.167301-1-thorsten.blum@linux.dev
[sean: separate from swap of get_user() vs. kzalloc() ordering]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-13 08:30:07 -08:00
Thorsten Blum
11b79f8318 KVM: TDX: Check size of user's kvm_tdx_capabilities array before allocating
When userspace is getting TDX capabilities, retrieve and check the number
of user entries before allocating kernel scratch space to avoid having to
unwind the allocation if get_user() fails or if 'user_caps' is too small
to fit 'caps'.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://patch.msgid.link/20251017213914.167301-1-thorsten.blum@linux.dev
[sean: split to separate patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-13 08:29:59 -08:00
Bryan Brattlof
b70d9d7dac dt-bindings: arm: ti: Add binding for AM62L SoCs
Add the binding for TI's AM62L family of devices.

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bryan Brattlof <bb@ti.com>
Link: https://patch.msgid.link/20251105-am62lx-v8-1-496f353e8237@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-13 21:48:30 +05:30
João Paulo Gonçalves
9f748a6177 arm64: dts: ti: am69-aquila: Add Clover
Add support for Aquila AM69 mated with Clover carrier board.

Link: https://www.toradex.com/computer-on-modules/aquila-arm-family/ti-am69
Link: https://www.toradex.com/products/carrier-board/clover
Signed-off-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://patch.msgid.link/20251111175502.8847-4-francesco@dolcini.it
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-13 21:45:30 +05:30
Parth Pancholi
39ac6623b1 arm64: dts: ti: Add Aquila AM69 Support
Add support for the Toradex Aquila AM69 and its Development Carrier
Board.

The Aquila AM69 SoM is based on the TI AM69 SoC from the Jacinto 7
family and is designed for high-end embedded computing, featuring up to
32GB of LPDDR4 and 256GB eMMC storage, extensive multimedia support (3x
Quad CSI, 2x Quad DSI, DisplayPort, 5x Audio I2S/TDM), six Ethernet
interfaces (1x 1G, 4x 2.5G SGMII, 1x 10G), USB 3.2 Host/DRD support, and
a Wi-Fi 7/BT 5.3 module, alongside an RX8130 RTC, I2C EEPROM and
Temperature Sensor, and optional TPM 2.0 module.

Various nodes, inherited from the SoC dtsi, are explicitly disabled in
the SoM dtsi file (`status = disabled`) even if already disabled. These
nodes need to be disabled in the SoM, given that the node is not
complete there, explicitly disabling it limits the dependency on the SoC
dtsi allowing for refactoring without no impact on this file.

Link: https://www.toradex.com/computer-on-modules/aquila-arm-family/ti-am69
Link: https://www.toradex.com/products/carrier-board/aquila-development-board-kit
Signed-off-by: Parth Pancholi <parth.pancholi@toradex.com>
Co-developed-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Co-developed-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com>
Co-developed-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Signed-off-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Co-developed-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://patch.msgid.link/20251111175502.8847-3-francesco@dolcini.it
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-13 21:45:30 +05:30
João Paulo Gonçalves
2f6ef830a7 dt-bindings: arm: ti: add Toradex Aquila AM69
Add DT compatible strings for the Toradex Aquila AM69 SoM and its
supported carrier boards: the Aquila Development Board and the Clover
carrier board.

Link: https://www.toradex.com/computer-on-modules/aquila-arm-family/ti-am69
Link: https://www.toradex.com/products/carrier-board/aquila-development-board-kit
Link: https://www.toradex.com/products/carrier-board/clover
Signed-off-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251111175502.8847-2-francesco@dolcini.it
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-13 21:45:29 +05:30
Dave Hansen
27376465e9 KVM: TDX: Fix sparse warnings from using 0 for NULL
Stop using 0 for NULL.

sparse moans:

	... arch/x86/kvm/vmx/tdx.c:859:38: warning: Using plain integer as NULL pointer

for several TDX pointer initializations. While I love a good ptr=0
now and then, it's good to have quiet sparse builds.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: a50f673f25 ("KVM: TDX: Do TDX specific vcpu initialization")
Fixes: 8d032b683c ("KVM: TDX: create/destroy VM structure")
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kas@kernel.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Link: https://patch.msgid.link/20251103234439.DC8227E4@davehans-spike.ostc.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-13 07:10:40 -08:00
Dave Hansen
228add34dc KVM: TDX: Remove __user annotation from kernel pointer
Separate __user pointer variable declaration from kernel one.

There are two 'kvm_cpuid2' pointers involved here. There's an "input"
side: 'td_cpuid' which is a normal kernel pointer and an 'output'
side. The output here is userspace and there is an attempt at properly
annotating the variable with __user:

	struct kvm_cpuid2 __user *output, *td_cpuid;

But, alas, this is wrong. The __user in the definition applies to both
'output' and 'td_cpuid'. Sparse notices the address space mismatch and
will complain about it.

Fix it up by completely separating the two definitions so that it is
obviously correct without even having to know what the C syntax rules
even are.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 488808e682 ("KVM: x86: Introduce KVM_TDX_GET_CPUID")
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kas@kernel.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Link: https://patch.msgid.link/20251103234437.A0532420@davehans-spike.ostc.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-13 07:10:39 -08:00
Rick Edgecombe
9a89894f30 KVM: TDX: Take MMU lock around tdh_vp_init()
Take MMU lock around tdh_vp_init() in KVM_TDX_INIT_VCPU to prevent
meeting contention during retries in some no-fail MMU paths.

The TDX module takes various try-locks internally, which can cause
SEAMCALLs to return an error code when contention is met. Dealing with
an error in some of the MMU paths that make SEAMCALLs is not straight
forward, so KVM takes steps to ensure that these will meet no contention
during a single BUSY error retry. The whole scheme relies on KVM to take
appropriate steps to avoid making any SEAMCALLs that could contend while
the retry is happening.

Unfortunately, there is a case where contention could be met if userspace
does something unusual. Specifically, hole punching a gmem fd while
initializing the TD vCPU. The impact would be triggering a KVM_BUG_ON().

The resource being contended is called the "TDR resource" in TDX docs
parlance. The tdh_vp_init() can take this resource as exclusive if the
'version' passed is 1, which happens to be version the kernel passes. The
various MMU operations (tdh_mem_range_block(), tdh_mem_track() and
tdh_mem_page_remove()) take it as shared.

There isn't a KVM lock that maps conceptually and in a lock order friendly
way to the TDR lock. So to minimize infrastructure, just take MMU lock
around tdh_vp_init(). This makes the operations we care about mutually
exclusive. Since the other operations are under a write mmu_lock, the code
could just take the lock for read, however this is weirdly inverted from
the actual underlying resource being contended. Since this is covering an
edge case that shouldn't be hit in normal usage, be a little less weird
and take the mmu_lock for write around the call.

Fixes: 02ab57707b ("KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table")
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://patch.msgid.link/20251028002824.1470939-1-rick.p.edgecombe@intel.com
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
[sean: tweak comment and capture PUNCH_HOLE interaction]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-13 07:10:30 -08:00
Bernd Schubert
1ce120dcef fuse: Always flush the page cache before FOPEN_DIRECT_IO write
This was done as condition on direct_io_allow_mmap, but I believe
this is not right, as a file might be open two times - once with
write-back enabled another time with FOPEN_DIRECT_IO.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-13 14:54:05 +01:00
Bernd Schubert
b359af8275 fuse: Invalidate the page cache after FOPEN_DIRECT_IO write
generic_file_direct_write() also does this and has a large
comment about.

Reproducer here is xfstest's generic/209, which is exactly to
have competing DIO write and cached IO read.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-13 14:54:05 +01:00
Miquel Sabaté Solà
47781ee71f fuse: rename 'namelen' to 'namesize'
By "length of a string" usually the number of non-null chars is
meant (i.e. strlen(str)). So the variable 'namelen' was confusingly
named, whereas 'namesize' refers more to what's being done in
'get_security_context'.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-13 10:38:01 +01:00
Miquel Sabaté Solà
c755a09b52 fuse: use strscpy instead of strcpy
As pointed out in [1], strcpy() is deprecated in favor of strscpy().

Furthermore, the size of the buffer for the name to be copied is well known
at this point since we are going to move the pointer by that much on the
next line.  Hence, it's safe to assume 'namelen' for the size of the string
to be copied.

[1] https://github.com/KSPP/linux/issues/88

Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-13 10:36:44 +01:00
Fange Zhang
fdc7da1733 arm64: defconfig: Enable SX150x GPIO expander driver
The ANX7625 bridge on the Qualcomm QCS615 Ride reference board is connected
to a Semtech SX150x GPIO expander. Enable the SX150x driver to make the
display on boards built following this design functional.

Signed-off-by: Fange Zhang <fange.zhang@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251105-enable-sx150x-gpio-expander-v3-1-2ec8dfde2c9e@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-12 14:49:45 -06:00
Adrian Ng Ho Yin
5e7235d122 arm64: dts: intel: agilex5: Add Altera compatible for I3C controllers
Add the "altr,agilex5-dw-i3c-master" compatible string to the
I3C controller nodes on the Agilex5 SoCFPGA platform.

Signed-off-by: Adrian Ng Ho Yin <adrianhoyin.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-12 12:57:02 -06:00
Junhui Liu
fa9311d949 riscv: defconfig: Enable Anlogic SoC
Enable Anlogic SoC config in defconfig to allow the default upstream
kernel booting on Milianke MLKPAI-FS01 board.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:57 +00:00
Junhui Liu
7e6fd69c12 riscv: dts: anlogic: Add Milianke MLKPAI FS01 board
Add support for the Milianke MLKPAI FS01 board based on the Anlogic
DR1V90 SoC. The board features 512MB of onboard memory, USB-C UART, 1GbE
RJ45 Ethernet, USB-A 2.0 port, TF card slot, and 256Mbit Quad-SPI flash.

Currently, the board can boot to a console via UART1, which is connected
to the onboard serial chip and routed to the Type-C interface.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:57 +00:00
Junhui Liu
77874ebd40 riscv: dts: Add initial Anlogic DR1V90 SoC device tree
DR1V90 is a FPSoC from Anlogic, which features a RISC-V core as the PS
part and 94,464 LUTs for the PL part.

The PS part integrates a Nuclei UX900 RISC-V core with 32KB L1 icache
and 32KB L1 dcache. It also provides two "snps,dw-apb-uart" compatible
UART controllers.

Some basic information of the processor can be obtained by running a
simple application from nuclei-sdk [1]:

-----Nuclei RISC-V CPU Configuration Information-----
         MARCHID: 0xc900
          MIMPID: 0x20300
             ISA: RV64 A B C D F I M P S U
            MCFG: TEE ECC ECLIC PLIC PPI ILM DLM ICACHE DCACHE IREGION No-Safety-Mechanism DLEN=VLEN/2
             ILM: 256 KB has-ecc
             DLM: 256 KB has-ecc
          ICACHE: 32 KB(set=256,way=2,lsize=64,ecc=1)
          DCACHE: 32 KB(set=256,way=2,lsize=64,ecc=1)
             TLB: MainTLB(set=32,way=2,entry=1,ecc=1) ITLB(entry=8) DTLB(entry=8)
         IREGION: 0x68000000 128 MB
                  Unit        Size        Address
                  INFO        64KB        0x68000000
                  DEBUG       64KB        0x68010000
                  ECLIC       64KB        0x68020000
                  TIMER       64KB        0x68030000
                  PLIC        64MB        0x6c000000
     INFO-Detail:
                  mpasize : 0
             PPI: 0xf8000000 128 MB
-----End of Nuclei CPU INFO-----

Link: https://github.com/Nuclei-Software/nuclei-sdk/blob/master/application/baremetal/cpuinfo/main.c [1]
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:56 +00:00
Junhui Liu
9c96219602 riscv: Add Anlogic SoC famly Kconfig support
The first SoC in the Anlogic series is DR1V90, which contains a RISC-V
core from Nuclei.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:56 +00:00
Junhui Liu
a94f9be294 dt-bindings: serial: snps-dw-apb-uart: Add Anlogic DR1V90 uart
The Anlogic DR1V90 SoC integrates a UART controller compatible with
snps,dw-apb-uart, operating at a 50 MHz clock.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:56 +00:00
Junhui Liu
ccc3fd3ebe dt-bindings: timer: Add Anlogic DR1V90 ACLINT MTIMER
Add MTIMER support for Anlogic DR1V90 SoC, which uses Nuclei UX900 with
a TIMER unit compliant with the ACLINT specification.

Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:56 +00:00
Junhui Liu
4689d4422a dt-bindings: riscv: Add Anlogic DR1V90
Add Anlogic DR1V90 FPSoC, featuring a UX900 RISC-V core as the
processing system (PS) and 94,464 LUTs programmable logic (PL). It is
used by the Milianke MLKPAI-FS01 board, a SBC equipped with 512MB DDR3
memory, USB-C UART, 1GbE RJ45 Ethernet, USB-A 2.0 port, TF card slot,
and 256Mbit Quad-SPI flash.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:56 +00:00
Junhui Liu
66c2a3173c dt-bindings: riscv: Add Nuclei UX900 compatibles
The UX900 is a RISC-V core from Nuclei, used in the Anlogic DR1V90 SoC.
It features a 64-bit architecture and dual-issue, 9-stage pipeline, with
lots of optional extensions including V, K, Zc, and more.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:56 +00:00
Junhui Liu
c86ee66e14 dt-bindings: vendor-prefixes: Add Anlogic, Milianke and Nuclei
Add vendor prefixes for "anlogic", "milianke" and "nuclei". These are
required for describing the Milianke MLKPAI-FS01 board with DR1V90 SoC
from Anlogic, which uses a processor core designed by Nuclei.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 17:06:56 +00:00
Alex Elder
3187e25eb2 riscv: defconfig: enable SPI_FSL_QUADSPI as a module
The SpacemiT K1 SoC QSPI IP uses the Freescale driver.  Enable it as a
module in the default kernel configuration for RISC-V.

Acked-by: Paul Walmsley <pjw@kernel.org>  # for arch/riscv
Signed-off-by: Alex Elder <elder@riscstar.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-12 11:26:51 +00:00
Troy Mitchell
5a97a38c22 riscv: dts: spacemit: define all missing I2C controller nodes
SpacemiT K1 SoC is equipped with a total of nine I2C controllers,
ranging from I2C0 to I2C8.

Prior to this change, only I2C2 and I2C8 were explicitly defined
within the device tree. This patch comprehensively adds the
device tree node definitions for I2C controller 0, 1, 4 to 7.

The I2C3 node is not added because it belongs exclusively to the
secure domain which not used in the linux realm.

All newly added I2C nodes are set to "disabled" status by default,
allowing future board-specific device tree to enable and configure.

Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://lore.kernel.org/r/20251105-k1-add-i2c-node-v1-2-d18dae246137@linux.spacemit.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-11-12 18:53:26 +08:00
Luis Henriques
b4909ae8d4 fuse: refactor fuse_conn_put() to remove negative logic.
There is no functional change with this patch.  It simply refactors
function fuse_conn_put() to not use negative logic, which makes it more
easier to read.

Signed-off-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12 11:45:03 +01:00
Luis Henriques
64becd224f fuse: new work queue to invalidate dentries from old epochs
With the infrastructure introduced to periodically invalidate expired
dentries, it is now possible to add an extra work queue to invalidate
dentries when an epoch is incremented.  This work queue will only be
triggered when the 'inval_wq' parameter is set.

Signed-off-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12 11:45:03 +01:00
Luis Henriques
ab84ad5973 fuse: new work queue to periodically invalidate expired dentries
This patch adds the necessary infrastructure to keep track of all dentries
created for FUSE file systems.  A set of rbtrees, protected by hashed
locks, will be used to keep all these dentries sorted by expiry time.

A new module parameter 'inval_wq' is also added.  When set, it will start
a work queue which will periodically invalidate expired dentries.  The
value of this new parameter is the period, in seconds, for this work
queue.  Once this parameter is set, every new dentry will be added to one
of the rbtrees.

When the work queue is executed, it will check all the rbtrees and will
invalidate those dentries that have timed-out.

The work queue period can not be smaller than 5 seconds, but can be
disabled by setting 'inval_wq' to zero (which is the default).

Signed-off-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12 11:45:03 +01:00
Luis Henriques
395b955303 dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused()
Add and export a new helper d_dispose_if_unused() which is simply a wrapper
around to_shrink_list(), to add an entry to a dispose list if it's not used
anymore.

Also export shrink_dentry_list() to kill all dentries in a dispose list.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12 11:45:03 +01:00
Miklos Szeredi
66c6a77e00 fuse: add WARN_ON and comment for RCU revalidate
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12 11:45:03 +01:00
Bernd Schubert
2a36511609 fuse: Fix whitespace for fuse_uring_args_to_ring() comment
The function comment accidentally got wrong indentation.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12 11:45:03 +01:00
Cheng Ding
6e0d7f7f4a fuse: missing copy_finish in fuse-over-io-uring argument copies
Fix a possible reference count leak of payload pages during
fuse argument copies.

[Joanne: simplified error cleanup]

Fixes: c090c8abae ("fuse: Add io-uring sqe commit and fetch support")
Cc: stable@vger.kernel.org # v6.14
Signed-off-by: Cheng Ding <cding@ddn.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12 11:45:03 +01:00
Troy Mitchell
3b70f972b6 riscv: dts: spacemit: reorder i2c2 node
Reorder the i2c2 node to its correct position according to its
register address.This improves the readability and maintainability
of the device tree file by adhering to the established ordering convention.

No functional change is introduced by this reordering.

Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://lore.kernel.org/r/20251105-k1-add-i2c-node-v1-1-d18dae246137@linux.spacemit.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-11-12 18:38:12 +08:00
Herve Codina (Schneider Electric)
8c95f813d7 ARM: dts: renesas: r9a06g032: Add the ADC device
The ADC available in the r9a06g032 SoC can use up to two internal ADC
cores (ADC1 and ADC2) those internal cores are handled through ADC
controller virtual channels.

Describe this device.

Signed-off-by: Herve Codina (Schneider Electric) <herve.codina@bootlin.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251103141834.71677-4-herve.codina@bootlin.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-12 11:28:30 +01:00
Michael Opdenacker
63e572b114 riscv: dts: spacemit: Add OrangePi R2S board device tree
Add initial device tree support for the OrangePi RV2 board [1], which is
marketed as using the Ky X1 SoC but is identical in die and package
to the SpacemiT K1 SoC [2].

Enable UART0, to boot into a serial console

Two Gigabit Ethernet ports with RGMII interface standard support
are enabled, each port is connected to an external
Motorcomm YT8531C PHY chip which uses the GPIO for reset control.

Enable PDMA.

Enable 8 GB eMMC chip for storage.

Link: http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/details/Orange-Pi-R2S.html [1]
Link: https://www.spacemit.com/en/key-stone-k1 [2]
Signed-off-by: Michael Opdenacker <michael.opdenacker@rootcommit.com>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20251112044426.2351999-3-michael.opdenacker@rootcommit.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-11-12 18:28:17 +08:00
Michael Opdenacker
323256d11e dt-bindings: riscv: spacemit: Add OrangePi R2S board
Document the compatible string for the OrangePi R2S board [1], which
is marketed as using the Ky X1 SoC but is in fact identical in die
and package to the SpacemiT K1 SoC [2].

Link: http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/details/Orange-Pi-R2S.html [1]
Link: https://www.spacemit.com/en/key-stone-k1 [2]
Signed-off-by: Michael Opdenacker <michael.opdenacker@rootcommit.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20251112044426.2351999-2-michael.opdenacker@rootcommit.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-11-12 18:25:31 +08:00
Lad Prabhakar
c4698a3499 arm64: dts: renesas: r9a09g087: Add GMAC nodes
Add Ethernet MAC (GMAC) device nodes to the RZ/N2H (R9A09G087) SoC DTSI.
The RZ/N2H integrates three GMAC interfaces based on the Synopsys
DesignWare MAC (version 5.20).

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251028175458.1037397-5-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-12 11:17:15 +01:00
Lad Prabhakar
394c1e24a4 arm64: dts: renesas: r9a09g077: Add GMAC nodes
Add Ethernet MAC (GMAC) device nodes to the RZ/T2H (R9A09G077) SoC DTSI.
The RZ/T2H integrates three GMAC interfaces based on the Synopsys
DesignWare MAC (version 5.20).

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251028175458.1037397-4-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-12 11:17:13 +01:00
Lad Prabhakar
5d06389a05 arm64: dts: renesas: r9a09g087: Add ETHSS node
Add an Ethernet Switch Subsystem (ETHSS) device node to the RZ/N2H
(R9A09G087) SoC. The ETHSS IP block is responsible for handling MII
pass-through or conversion to RMII/RGMII.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251028175458.1037397-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-12 11:17:03 +01:00
Lad Prabhakar
fea7a8b7d7 arm64: dts: renesas: r9a09g077: Add ETHSS node
Add an Ethernet Switch Subsystem (ETHSS) device node to the RZ/T2H
(R9A09G077) SoC. The ETHSS IP block is responsible for handling MII
pass-through or conversion to RMII/RGMII.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251028175458.1037397-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-12 11:16:58 +01:00
Jiaqi Yan
4debb5e895 Documentation: kvm: new UAPI for handling SEA
Document the new userspace-visible features and APIs for handling
synchronous external abort (SEA)
- KVM_CAP_ARM_SEA_TO_USER: How userspace enables the new feature.
- KVM_EXIT_ARM_SEA: exit userspace gets when it needs to handle SEA
  and what userspace gets while taking the SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-4-jiaqiyan@google.com
[ oliver: make documentation concise, remove implementation detail ]
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-12 01:52:37 -08:00
Jiaqi Yan
feee9ef7ac KVM: selftests: Test for KVM_EXIT_ARM_SEA
Test how KVM handles guest SEA when APEI is unable to claim it, and
KVM_CAP_ARM_SEA_TO_USER is enabled.

The behavior is triggered by consuming recoverable memory error (UER)
injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and
   has provided expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to
   guest and KVM injects expected SEA into the VCPU.

Tested on a data center server running Siryn AmpereOne processor
that has RAS support.

Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
  inject UER. Otherwise the test will be skipped.
- The under-test platform's APEI should be unable to claim the SEA.
  Otherwise the test will be skipped.
- Some platform doesn't support notrigger in EINJ, which may cause
  APEI and GHES to offline the memory before guest can consume
  injected UER, and making test unable to trigger SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-12 01:27:16 -08:00
Jiaqi Yan
ad9c62bd89 KVM: arm64: VM exit to userspace to handle SEA
When APEI fails to handle a stage-2 synchronous external abort (SEA),
today KVM injects an asynchronous SError to the VCPU then resumes it,
which usually results in unpleasant guest kernel panic.

One major situation of guest SEA is when vCPU consumes recoverable
uncorrected memory error (UER). Although SError and guest kernel panic
effectively stops the propagation of corrupted memory, guest may
re-use the corrupted memory if auto-rebooted; in worse case, guest
boot may run into poisoned memory. So there is room to recover from
an UER in a more graceful manner.

Alternatively KVM can redirect the synchronous SEA event to VMM to
- Reduce blast radius if possible. VMM can inject a SEA to VCPU via
  KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
  consumption or fault is not from guest kernel, blast radius can be
  limited to the triggering thread in guest userspace, so VM can
  keep running.
- Allow VMM to protect from future memory poison consumption by
  unmapping the page from stage-2, or to interrupt guest of the
  poisoned page so guest kernel can unmap it from stage-1 page table.
- Allow VMM to track SEA events that VM customers care about, to restart
  VM when certain number of distinct poison events have happened,
  to provide observability to customers in log management UI.

Introduce an userspace-visible feature to enable VMM handle SEA:
- KVM_CAP_ARM_SEA_TO_USER. As the alternative fallback behavior
  when host APEI fails to claim a SEA, userspace can opt in this new
  capability to let KVM exit to userspace during SEA if it is not
  owned by host.
- KVM_EXIT_ARM_SEA. A new exit reason is introduced for this.
  KVM fills kvm_run.arm_sea with as much as possible information about
  the SEA, enabling VMM to emulate SEA to guest by itself.
  - Sanitized ESR_EL2. The general rule is to keep only the bits
    useful for userspace and relevant to guest memory.
  - Flags indicating if faulting guest physical address is valid.
  - Faulting guest physical and virtual addresses if valid.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://msgid.link/20251013185903.1372553-2-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-12 01:27:12 -08:00
Richard Zhu
db82f6cda5 arm64: dts: imx95-19x19-evk: Add vpcie3v3aux regulator for PCIe[0,1]
Refer to PCI Express M.2 Specification r5.1 sec3.1.1 Power Sources and
Grounds.

PCI Express M.2 Socket 1 utilizes a 3.3 V power source. The voltage
source, 3.3 V, is expected to be available during the system’s
stand-by/suspend state to support wake event processing on the
communications card.

Add vpcie3v3aux regulator to let this 3.3 V power source always on for
PCIe M.2 Key E connector(PCIe0) on i.MX95 19x19 EVK board.

PCIe1 uses one standard PCIe slot connector, but combines the +3.3v and
+3.3Vaux into a same 3.3v power source, and intends to let it always on.
Add vpcie3v3aux regulator to let this 3.3 V power source always on for
PCIe1 on i.MX95 19x19 EVK board too.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:57:54 +08:00
Richard Zhu
4737f7c742 arm64: dts: imx95-15x15-evk: Add vpcie3v3aux regulator for PCIe M.2 connector
Refer to PCI Express M.2 Specification r5.1 sec3.1.1 Power Sources and
Grounds.

PCI Express M.2 Socket 1 utilizes a 3.3 V power source. The voltage
source, 3.3 V, is expected to be available during the system’s
stand-by/suspend state to support wake event processing on the
communications card.

Add vpcie3v3aux regulator to let this 3.3 V power source always on for
PCIe M.2 Key E connector on i.MX95 15x15 EVK board.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:57:52 +08:00
Richard Zhu
b64620022d arm64: dts: imx8qxp-mek: Add vpcie3v3aux regulator for PCIe M.2 connector
Refer to PCI Express M.2 Specification r5.1 sec3.1.1 Power Sources and
Grounds.

PCI Express M.2 Socket 1 utilizes a 3.3 V power source. The voltage
source, 3.3 V, is expected to be available during the system’s
stand-by/suspend state to support wake event processing on the
communications card.

Add vpcie3v3aux regulator to let this 3.3 V power source always on for
PCIe M.2 Key E connector on i.MX8QXP MEK board.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:57:49 +08:00
Richard Zhu
f4303fa035 arm64: dts: imx8qm-mek: Add vpcie3v3aux regulator for PCIe M.2 connector
Refer to PCI Express M.2 Specification r5.1 sec3.1.1 Power Sources and
Grounds.

PCI Express M.2 Socket 1 utilizes a 3.3 V power source. The voltage
source, 3.3 V, is expected to be available during the system’s
stand-by/suspend state to support wake event processing on the
communications card.

Add vpcie3v3aux regulator to let this 3.3 V power source always on for
PCIe M.2 Key E connector on i.MX8QM MEK board.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:57:47 +08:00
Richard Zhu
6fa16090ef arm64: dts: imx8mq-evk: Add vpcie3v3aux regulator for PCIe M.2 connector
Refer to PCI Express M.2 Specification r5.1 sec3.1.1 Power Sources and
Grounds.

PCI Express M.2 Socket 1 utilizes a 3.3 V power source. The voltage
source, 3.3 V, is expected to be available during the system’s
stand-by/suspend state to support wake event processing on the
communications card.

Add vpcie3v3aux regulator to let this 3.3 V power source always on for
PCIe M.2 Key E connector on i.MX8MQ EVK board.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:57:44 +08:00
Richard Zhu
caf08e2d49 arm64: dts: imx8mp-evk: Add vpcie3v3aux regulator for PCIe M.2 connector
Refer to PCI Express M.2 Specification r5.1 sec3.1.1 Power Sources and
Grounds.

PCI Express M.2 Socket 1 utilizes a 3.3 V power source. The voltage
source, 3.3 V, is expected to be available during the system’s
stand-by/suspend state to support wake event processing on the
communications card.

Add vpcie3v3aux regulator to let this 3.3 V power source always on for
PCIe M.2 Key E connector on i.MX8MP EVK board.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:57:41 +08:00
Richard Zhu
55a79e599a arm64: dts: imx8dxl-evk: Add vpcie3v3aux regulator for PCIe M.2 connector
Refer to PCI Express M.2 Specification r5.1 sec3.1.1 Power Sources and
Grounds.

PCI Express M.2 Socket 1 utilizes a 3.3 V power source. The voltage
source, 3.3 V, is expected to be available during the system’s
stand-by/suspend state to support wake event processing on the
communications card.

Add vpcie3v3aux regulator to let this 3.3 V power source always on for
PCIe M.2 Key E connector on i.MX8DXL EVK board.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:57:26 +08:00
Richard Zhu
1a79e70e9d arm64: dts: imx8qxp-mek: Add supports-clkreq property to PCIe M.2 port
According to PCIe r6.1, sec 5.5.1.

The following rules define how the L1.1 and L1.2 substates are entered:
Both the Upstream and Downstream Ports must monitor the logical state of
the CLKREQ# signal.

Typical implement is using open drain, which connect RC's clkreq# to
EP's clkreq# together and pull up clkreq#.

imx8qxp-mek matches this requirement, so add supports-clkreq to allow
PCIe device enter ASPM L1 Sub-State.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:51:29 +08:00
Richard Zhu
44f9bcc943 arm64: dts: imx8qm-mek: Add supports-clkreq property to PCIe M.2 port
According to PCIe r6.1, sec 5.5.1.

The following rules define how the L1.1 and L1.2 substates are entered:
Both the Upstream and Downstream Ports must monitor the logical state of
the CLKREQ# signal.

Typical implement is using open drain, which connect RC's clkreq# to
EP's clkreq# together and pull up clkreq#.

imx8qm-mek matches this requirement, so add supports-clkreq to allow
PCIe device enter ASPM L1 Sub-State.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:51:29 +08:00
Richard Zhu
053ee55576 arm64: dts: imx8mq-evk: Add supports-clkreq property to PCIe M.2 port
According to PCIe r6.1, sec 5.5.1.

The following rules define how the L1.1 and L1.2 substates are entered:
Both the Upstream and Downstream Ports must monitor the logical state of
the CLKREQ# signal.

Typical implement is using open drain, which connect RC's clkreq# to
EP's clkreq# together and pull up clkreq#.

imx8mq-evk matches this requirement, so add supports-clkreq to allow
PCIe device enter ASPM L1 Sub-State.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:51:28 +08:00
Richard Zhu
45b68efd9b arm64: dts: imx8mp-evk: Add supports-clkreq property to PCIe M.2 port
According to PCIe r6.1, sec 5.5.1.

The following rules define how the L1.1 and L1.2 substates are entered:
Both the Upstream and Downstream Ports must monitor the logical state of
the CLKREQ# signal.

Typical implement is using open drain, which connect RC's clkreq# to
EP's clkreq# together and pull up clkreq#.

imx8mp-evk matches this requirement, so add supports-clkreq to allow
PCIe device enter ASPM L1 Sub-State.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:51:28 +08:00
Richard Zhu
798825c11e arm64: dts: imx8mm-evk: Add supports-clkreq property to PCIe M.2 port
According to PCIe r6.1, sec 5.5.1.

The following rules define how the L1.1 and L1.2 substates are entered:
Both the Upstream and Downstream Ports must monitor the logical state of
the CLKREQ# signal.

Typical implement is using open drain, which connect RC's clkreq# to
EP's clkreq# together and pull up clkreq#.

imx8mm-evk matches this requirement, so add supports-clkreq to allow
PCIe device enter ASPM L1 Sub-State.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:51:28 +08:00
Richard Zhu
756d0ef76e arm64: dts: imx95-19x19-evk: Add supports-clkreq property to PCIe M.2 port
According to PCIe r6.1, sec 5.5.1.

The following rules define how the L1.1 and L1.2 substates are entered:
Both the Upstream and Downstream Ports must monitor the logical state of
the CLKREQ# signal.

Typical implement is using open drain, which connect RC's clkreq# to
EP's clkreq# together and pull up clkreq#.

imx95-19x19-evk matches this requirement, so add supports-clkreq to
allow PCIe device enter ASPM L1 Sub-State.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:51:28 +08:00
Richard Zhu
dd93ee01a5 arm64: dts: imx95-15x15-evk: Add supports-clkreq property to PCIe M.2 port
According to PCIe r6.1, sec 5.5.1.

The following rules define how the L1.1 and L1.2 substates are entered:
Both the Upstream and Downstream Ports must monitor the logical state of
the CLKREQ# signal.

Typical implement is using open drain, which connect RC's clkreq# to
EP's clkreq# together and pull up clkreq#.

imx95-15x15-evk matches this requirement, so add supports-clkreq to
allow PCIe device enter ASPM L1 Sub-State.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-12 09:51:28 +08:00
George Kelly
73f0769ebf ARM: dts: ti/omap: fix incorrect compatible string in internal eeprom node
While the Beaglebone capes have the Atmel AT24C256 chip (256kbit or 32kB),
the internal Beaglebone eeprom chip (i2c bus 0, addr 0x50), is an AT24C32
(32kbit or 4kB). Yet the device tree lists AT24C256 as the compatible chip
prior to this patch. You can confirm this by running
`sudo hexdump -C /sys/bus/nvmem/devices/0-00500/nvmem`. You can see the
factory data is repeated every 0x1000 addresses (every 4096 bytes or 32768
bits). This is because the read command wraps around to reading 0x0000 when
a user requests address 0x1000.

This is not a huge issue for reading, but it is for writing to the EEPROM
for two reasons:

1) If a user writes to addresses 0x1000 - 0x104e, they'll accidentally
overwrite the factory data stored at 0x0000 - 0x104e. This also is an issue
for writing to 0x2000 - 0x204e, and so on.
2) AT24C256 has 64-byte pages, but AT24C32 only has 32 byte pages. Thus, if
you attempt to write more than 32 bytes, bytes 32-64 will wrap around. This
causes your data in the actual EEPROM chip's bytes 0-32 to be overwritten by
the data in your request's bytes 32-64, while the EEPROM chip's bytes 32-64
remain 0xFF (unwritten). Lastly, the Beaglebone Black's user manual does
correctly mention that the internal EEPROM is 4kB (while capes are 32kB or
256kbit). It's just this bit of code that does not match.

Signed-off-by: George Kelly <george.kelly1097@gmail.com>
Link: https://lore.kernel.org/r/20251108102741.47628-1-george.kelly1097@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-11 12:18:23 -08:00
Conor Dooley
1088d49b62 riscv: dts: microchip: enable qspi adc/mmc-spi-slot on BeagleV Fire
The BeagleV Fire has an SD card slot and an ADC connected to the QSPI
controller.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-11 16:58:53 +00:00
Conor Dooley
66c6ceb41e MAINTAINERS: rename Microchip RISC-V entry
There's now non-FPGA RISC-V SoCs from Microchip, so rename the entry
to reflect that.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-11 16:52:30 +00:00
Conor Dooley
587c0a5e81 MAINTAINERS: add new soc drivers to Microchip RISC-V entry
Add the two new syscon drivers to the RISC-V entry for Microchip
platforms.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-11 16:52:30 +00:00
Gabor Juhos
186b8f8fcc soc: qcom: mdt_loader: rename 'firmware' parameter of qcom_mdt_load()
In the 'mdt_loader.h' header, both the prototype and the inline
version of the qcom_mdt_load() function uses 'fw_name' as name for
the firmware name parameter. Additionally, the other qcom_mdt_*
functions are using that as well.

For consistency, rename the 'firmware' parameter in the implementation
of the qcom_mdt_load() to 'fw_name' and update the function accordingly.

No functional changes.

Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251111-mdt-loader-cleanup-v1-2-71afee094dce@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-11 09:48:57 -06:00
Gabor Juhos
0cda8823b1 soc: qcom: mdt_loader: merge __qcom_mdt_load() and qcom_mdt_load_no_init()
The qcom_mdt_load_no_init() function is just a simple wrapper around
of __qcom_mdt_load(). Since commit 0daf35da39 ("soc: qcom: mdt_loader:
Remove pas id parameter") both functions are using the same type of
parameters and providing the same functionality.

Keeping two functions for the same purpose is superfluous, so rename
the __qcom_mdt_load() function to qcom_mdt_load_no_init() and remove
the wrapper.

No functional changes.

Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251111-mdt-loader-cleanup-v1-1-71afee094dce@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-11 09:48:56 -06:00
Sean Christopherson
dfd1572a64 KVM: VMX: Make loaded_vmcs_clear() static in vmx.c
Make loaded_vmcs_clear() local to vmx.c as there are no longer any
external callers.

No functional change intended.

Link: https://patch.msgid.link/20251106205114.218226-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-11 07:41:16 -08:00
Joanne Koong
bd5603eaae fuse: fix readahead reclaim deadlock
Commit e26ee4efbc ("fuse: allocate ff->release_args only if release is
needed") skips allocating ff->release_args if the server does not
implement open. However in doing so, fuse_prepare_release() now skips
grabbing the reference on the inode, which makes it possible for an
inode to be evicted from the dcache while there are inflight readahead
requests. This causes a deadlock if the server triggers reclaim while
servicing the readahead request and reclaim attempts to evict the inode
of the file being read ahead. Since the folio is locked during
readahead, when reclaim evicts the fuse inode and fuse_evict_inode()
attempts to remove all folios associated with the inode from the page
cache (truncate_inode_pages_range()), reclaim will block forever waiting
for the lock since readahead cannot relinquish the lock because it is
itself blocked in reclaim:

>>> stack_trace(1504735)
 folio_wait_bit_common (mm/filemap.c:1308:4)
 folio_lock (./include/linux/pagemap.h:1052:3)
 truncate_inode_pages_range (mm/truncate.c:336:10)
 fuse_evict_inode (fs/fuse/inode.c:161:2)
 evict (fs/inode.c:704:3)
 dentry_unlink_inode (fs/dcache.c:412:3)
 __dentry_kill (fs/dcache.c:615:3)
 shrink_kill (fs/dcache.c:1060:12)
 shrink_dentry_list (fs/dcache.c:1087:3)
 prune_dcache_sb (fs/dcache.c:1168:2)
 super_cache_scan (fs/super.c:221:10)
 do_shrink_slab (mm/shrinker.c:435:9)
 shrink_slab (mm/shrinker.c:626:10)
 shrink_node (mm/vmscan.c:5951:2)
 shrink_zones (mm/vmscan.c:6195:3)
 do_try_to_free_pages (mm/vmscan.c:6257:3)
 do_swap_page (mm/memory.c:4136:11)
 handle_pte_fault (mm/memory.c:5562:10)
 handle_mm_fault (mm/memory.c:5870:9)
 do_user_addr_fault (arch/x86/mm/fault.c:1338:10)
 handle_page_fault (arch/x86/mm/fault.c:1481:3)
 exc_page_fault (arch/x86/mm/fault.c:1539:2)
 asm_exc_page_fault+0x22/0x27

Fix this deadlock by allocating ff->release_args and grabbing the
reference on the inode when preparing the file for release even if the
server does not implement open. The inode reference will be dropped when
the last reference on the fuse file is dropped (see fuse_file_put() ->
fuse_release_end()).

Fixes: e26ee4efbc ("fuse: allocate ff->release_args only if release is needed")
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reported-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-11 16:04:45 +01:00
Laurent Pinchart
8f76686bbd arm64: dts: imx8mp-debix-model-a: Fix ethernet PHY address
The RTL8211E ethernet PHY on the Debix Model A board it located at
address 1. Replace the broadcast address with the correct unicast
address.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 21:49:16 +08:00
Kunwu Chan
2617bd81ae powerpc/83xx: Add a null pointer check to mcu_gpiochip_add
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20240115094330.33014-1-chentao@kylinos.cn
2025-11-11 14:39:25 +05:30
Bhaskar Chowdhury
f90d28443b arch:powerpc:tools This file was missing shebang line, so added it
This file was missing the shebang line, so added it.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250722220043.14862-1-unixbhaskar@gmail.com
2025-11-11 14:36:24 +05:30
Ben Collins
38c64dfe0a kexec: Include kernel-end even without crashkernel
Certain versions of kexec don't even work without kernel-end being
added to the device-tree. Add it even if crash-kernel is disabled.

Signed-off-by: Ben Collins <bcollins@kernel.org>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/2025042122-inescapable-mandrill-8a5ff2@boujee-and-buff
2025-11-11 14:34:15 +05:30
J. Neuschäfer
af6850ac9e powerpc: p2020: Rename wdt@ nodes to watchdog@
The watchdog.yaml schema prescribes a node name of "timer" or "watchdog"
rather than the abbreviation "wdt".

Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250418-watchdog-v1-4-987ff2046272@posteo.net
2025-11-11 14:32:47 +05:30
J. Neuschäfer
39fe29e7f2 powerpc: 86xx: Rename wdt@ nodes to watchdog@
The watchdog.yaml schema prescribes a node name of "timer" or "watchdog"
rather than the abbreviation "wdt".

Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250418-watchdog-v1-3-987ff2046272@posteo.net
2025-11-11 14:32:47 +05:30
J. Neuschäfer
cc156be1e7 powerpc: 83xx: Rename wdt@ nodes to watchdog@
The watchdog.yaml schema prescribes a node name of "timer" or "watchdog"
rather than the abbreviation "wdt".

Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250418-watchdog-v1-2-987ff2046272@posteo.net
2025-11-11 14:32:47 +05:30
J. Neuschäfer
0070b2cbfe powerpc: 512x: Rename wdt@ node to watchdog@
The watchdog.yaml schema prescribes a node name of "timer" or "watchdog"
rather than the abbreviation "wdt".

Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250418-watchdog-v1-1-987ff2046272@posteo.net
2025-11-11 14:32:47 +05:30
Ben Collins
825ce89a3e powerpc/addnote: Fix overflow on 32-bit builds
The PUT_64[LB]E() macros need to cast the value to unsigned long long
like the GET_64[LB]E() macros. Caused lots of warnings when compiled
on 32-bit, and clobbered addresses (36-bit P4080).

Signed-off-by: Ben Collins <bcollins@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/2025042122-mustard-wrasse-694572@boujee-and-buff
2025-11-11 14:32:32 +05:30
Antonio Alvarez Feijoo
cbc8bd1b7d powerpc/boot: Add missing compression methods to usage
lzma and lzo are also supported.

Signed-off-by: Antonio Alvarez Feijoo <antonio.feijoo@suse.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250916061840.5492-1-antonio.feijoo@suse.com
2025-11-11 14:25:57 +05:30
Nathan Chancellor
d2be62d585 powerpc/vmlinux.lds: Drop .interp description
Commit da30705c46 ("arch/powerpc: Remove .interp section in vmlinux")
intended to drop the .interp section from vmlinux but even with this
change, relocatable kernels linked with ld.lld contain an empty .interp
section, which ends up causing crashes in GDB [1].

  $ make -skj"$(nproc)" ARCH=powerpc LLVM=1 clean pseries_le_defconfig vmlinux

  $ llvm-readelf -S vmlinux | grep interp
    [44] .interp           PROGBITS        c0000000021ddb34 21edb34 000000 00   A  0   0  1

There appears to be a subtle difference between GNU ld and ld.lld when
it comes to discarding sections that specify load addresses [2].

Since '--no-dynamic-linker' prevents emission of the .interp section,
there is no need to describe it in the output sections of the vmlinux
linker script. Drop the .interp section description from vmlinux.lds.S
to avoid this issue altogether.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=33481 [1]
Link: https://github.com/ClangBuiltLinux/linux/issues/2137 [2]
Reported-by: Vishal Chourasia <vishalc@linux.ibm.com>
Closes: https://lore.kernel.org/20251013040148.560439-1-vishalc@linux.ibm.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251018-ppc-fix-lld-interp-v1-1-a083de6dccc9@kernel.org
2025-11-11 14:23:27 +05:30
Long Li
1e4b207ffe macintosh/mac_hid: fix race condition in mac_hid_toggle_emumouse
The following warning appears when running syzkaller, and this issue also
exists in the mainline code.

 ------------[ cut here ]------------
 list_add double add: new=ffffffffa57eee28, prev=ffffffffa57eee28, next=ffffffffa5e63100.
 WARNING: CPU: 0 PID: 1491 at lib/list_debug.c:35 __list_add_valid_or_report+0xf7/0x130
 Modules linked in:
 CPU: 0 PID: 1491 Comm: syz.1.28 Not tainted 6.6.0+ #3
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
 RIP: 0010:__list_add_valid_or_report+0xf7/0x130
 RSP: 0018:ff1100010dfb7b78 EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffffffffa57eee18 RCX: ffffffff97fc9817
 RDX: 0000000000040000 RSI: ffa0000002383000 RDI: 0000000000000001
 RBP: ffffffffa57eee28 R08: 0000000000000001 R09: ffe21c0021bf6f2c
 R10: 0000000000000001 R11: 6464615f7473696c R12: ffffffffa5e63100
 R13: ffffffffa57eee28 R14: ffffffffa57eee28 R15: ff1100010dfb7d48
 FS:  00007fb14398b640(0000) GS:ff11000119600000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000010d096005 CR4: 0000000000773ef0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 80000000
 Call Trace:
  <TASK>
  input_register_handler+0xb3/0x210
  mac_hid_start_emulation+0x1c5/0x290
  mac_hid_toggle_emumouse+0x20a/0x240
  proc_sys_call_handler+0x4c2/0x6e0
  new_sync_write+0x1b1/0x2d0
  vfs_write+0x709/0x950
  ksys_write+0x12a/0x250
  do_syscall_64+0x5a/0x110
  entry_SYSCALL_64_after_hwframe+0x78/0xe2

The WARNING occurs when two processes concurrently write to the mac-hid
emulation sysctl, causing a race condition in mac_hid_toggle_emumouse().
Both processes read old_val=0, then both try to register the input handler,
leading to a double list_add of the same handler.

  CPU0                             CPU1
  -------------------------        -------------------------
  vfs_write() //write 1            vfs_write()  //write 1
    proc_sys_write()                 proc_sys_write()
      mac_hid_toggle_emumouse()          mac_hid_toggle_emumouse()
        old_val = *valp // old_val=0
                                           old_val = *valp // old_val=0
                                           mutex_lock_killable()
                                           proc_dointvec() // *valp=1
                                           mac_hid_start_emulation()
                                             input_register_handler()
                                           mutex_unlock()
        mutex_lock_killable()
        proc_dointvec()
        mac_hid_start_emulation()
          input_register_handler() //Trigger Warning
        mutex_unlock()

Fix this by moving the old_val read inside the mutex lock region.

Fixes: 99b089c3c3 ("Input: Mac button emulation - implement as an input filter")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250819091035.2263329-1-leo.lilong@huaweicloud.com
2025-11-11 14:18:05 +05:30
Christophe Leroy
10e1c77c36 powerpc/32: Fix unpaired stwcx. on interrupt exit
Commit b96bae3ae2 ("powerpc/32: Replace ASM exception exit by C
exception exit from ppc64") erroneouly copied to powerpc/32 the logic
from powerpc/64 based on feature CPU_FTR_STCX_CHECKS_ADDRESS which is
always 0 on powerpc/32.

Re-instate the logic implemented by commit b64f87c16f ("[POWERPC]
Avoid unpaired stwcx. on some processors") which is based on
CPU_FTR_NEED_PAIRED_STWCX feature.

Fixes: b96bae3ae2 ("powerpc/32: Replace ASM exception exit by C exception exit from ppc64")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/6040b5dbcf5cdaa1cd919fcf0790f12974ea6e5a.1757666244.git.christophe.leroy@csgroup.eu
2025-11-11 14:14:07 +05:30
Christophe Leroy
2997876c4a powerpc/32: Restore clearing of MSR[RI] at interrupt/syscall exit
Commit 13799748b9 ("powerpc/64: use interrupt restart table to speed
up return from interrupt") removed the inconditional clearing of
MSR[RI] when returning from interrupt into kernel. But powerpc/32
doesn't implement interrupt restart table hence still need MSR[RI]
to be cleared.

It could be added back in interrupt_exit_kernel_prepare() but it is
easier and better to add it back in entry_32.S for following reasons:
- Writing to MSR must be followed by a synchronising instruction
- The smaller the non recoverable section is the better it is

So add a macro called clr_ri and use it in the three places that play
up with SRR0/SRR1. Use it just before another mtspr for synchronisation
to avoid having to add an isync.

Now that's done in entry_32.S, exit_must_hard_disable() can return
false for non book3s/64, taking into account that BOOKE doesn't have
MSR_RI.

Also add back blacklisting syscall_exit_finish for kprobe. This was
initially added by commit 7cdf440138 ("powerpc/entry32: Blacklist
syscall exit points for kprobe.") then lost with
commit 6f76a01173 ("powerpc/syscall: implement system call
entry/exit logic in C for PPC32").

Fixes: 6f76a01173 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32")
Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/66d0ab070563ad460ed481328ab0887c27f21a2c.1757593807.git.christophe.leroy@csgroup.eu
2025-11-11 14:13:37 +05:30
Christophe Leroy
98fa236044 powerpc/8xx: Remove specific code from fast_exception_return
The label 2: in fast_exception_return is a leftover from
commit b96bae3ae2 ("powerpc/32: Replace ASM exception exit by C
exception exit from ppc64"). Once removed, we see that
fast_exception_return is a standalone function that is called only
from pieces of assembly dedicated to book3s/32 or booke, never by
common code or 8xx code.

So remove the clear of MSR[RI] enclosed in #ifdef CONFIG_PPC_8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/39de3e0f0122b571474b1ba352a2dc3ad8cb71dd.1756304318.git.christophe.leroy@csgroup.eu
2025-11-11 14:12:48 +05:30
Sourabh Jain
7afe2383ef powerpc/kdump: Fix size calculation for hot-removed memory ranges
The elfcorehdr segment in the kdump image stores information about the
memory regions (called crash memory ranges) that the kdump kernel must
capture.

When a memory hot-remove event occurs, the kernel regenerates the
elfcorehdr for the currently loaded kdump image to remove the
hot-removed memory from the crash memory ranges.

Call chain:
remove_mem_range()
update_crash_elfcorehdr()
arch_crash_handle_hotplug_event()
crash_handle_hotplug_event()

While removing the hot-removed memory from the crash memory ranges in
remove_mem_range(), if the removed memory lies within an existing crash
range, that range is split into two. During this split, the size of the
second range was being calculated incorrectly.

This leads to dump capture failure with makedumpfile with below error:

$ makedumpfile -l -d 31 /proc/vmcore /tmp/vmcore

readpage_elf: Attempt to read non-existent page at 0xbbdab0000.
readmem: type_addr: 0, addr:c000000bbdab7f00, size:16
validate_mem_section: Can't read mem_section array.
readpage_elf: Attempt to read non-existent page at 0xbbdab0000.
readmem: type_addr: 0, addr:c000000bbdab7f00, size:8
get_mm_sparsemem: Can't get the address of mem_section.

The updated crash memory range in PT_LOAD entry is holding incorrect
data (checkout FileSiz and MemSiz):

readelf -a /proc/vmcore
<snip...>
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000b013d0000 0xc000000b80000000 0x0000000b80000000
                 0xffffffffc0000000 0xffffffffc0000000  RWE    0x0
<snip...>

Update the size calculation for the new crash memory range to fix this
issue.

Note: This problem will not occur if the kdump kernel is loaded or
reloaded after a memory hot-remove operation.

Fixes: 849599b702 ("powerpc/crash: add crash memory hotplug support")
Reported-by: Shirisha G <shirisha@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251105033941.1752287-1-sourabhjain@linux.ibm.com
2025-11-11 14:11:55 +05:30
Sourabh Jain
b4a96ab50f powerpc/kdump: Add support for crashkernel CMA reservation
Commit 35c18f2933 ("Add a new optional ",cma" suffix to the
crashkernel= command line option") and commit ab475510e0 ("kdump:
implement reserve_crashkernel_cma") added CMA support for kdump
crashkernel reservation.

Extend crashkernel CMA reservation support to powerpc.

The following changes are made to enable CMA reservation on powerpc:

- Parse and obtain the CMA reservation size along with other crashkernel
  parameters
- Call reserve_crashkernel_cma() to allocate the CMA region for kdump
- Include the CMA-reserved ranges in the usable memory ranges for the
  kdump kernel to use.
- Exclude the CMA-reserved ranges from the crash kernel memory to
  prevent them from being exported through /proc/vmcore.

With the introduction of the CMA crashkernel regions,
crash_exclude_mem_range() needs to be called multiple times to exclude
both crashk_res and crashk_cma_ranges from the crash memory ranges. To
avoid repetitive logic for validating mem_ranges size and handling
reallocation when required, this functionality is moved to a new wrapper
function crash_exclude_mem_range_guarded().

To ensure proper CMA reservation, reserve_crashkernel_cma() is called
after pageblock_order is initialized.

Update kernel-parameters.txt to document CMA support for crashkernel on
powerpc architecture.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251107080334.708028-1-sourabhjain@linux.ibm.com
2025-11-11 14:11:08 +05:30
Srikar Dronamraju
8127c4fdf1 pseries/lparcfg: Add resource group monitoring
Systems can now be partitioned into resource groups. By default all
systems will be part of default resource group. Once a resource group is
created, and resources allocated to the resource group, those resources
will be removed from the default resource group. If a LPAR moved to a
resource group, then it can only use resources in the resource group.

So maximum processors that can be allocated to a LPAR can be equal or
smaller than the resources in the resource group.

lparcfg can now exposes the resource group id to which this LPAR belongs
to. It also exposes the number of processors in the current resource
group. The default resource group id happens to be 0. These would be
documented in the upcoming PAPR update.

Example of an LPAR in a default resource group
root@ltcp11-lp3 $ grep resource_group /proc/powerpc/lparcfg
resource_group_number=0
resource_group_active_processors=50
root@ltcp11-lp3 $

Example of an LPAR in a non-default resource group
root@ltcp11-lp5 $ grep resource_group /proc/powerpc/lparcfg
resource_group_number=1
resource_group_active_processors=30
root@ltcp11-lp5 $

Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250716104600.59102-1-srikar@linux.ibm.com
2025-11-11 14:07:43 +05:30
Frank Li
adf6a5a275 ARM: dts: imx: add vdd-supply and vddio-supply for fsl,mpl3115
Add vdd-supply and vddio-supply for fsl,mpl3115 to fix below CHECK_DTBS
warnings:
  arch/arm/boot/dts/nxp/imx/imx53-ppd.dtb: pressure-sensor@60 (fsl,mpl3115): 'vdd-supply' is a required property

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 15:32:11 +08:00
Frank Li
6b416cb376 arm64: dts: imx8: add vdd-supply and vddio-supply for fsl,mpl3115
Add vdd-supply and vddio-supply for fsl,mpl3115 to fix CHECK_DTBS warning:
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: pressure-sensor@60 (fsl,mpl3115): 'vdd-supply' is a required property
  from schema $id: http://devicetree.org/schemas/iio/pressure/fsl,mpl3115.yaml#

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 15:31:52 +08:00
Frank Li
7753a51e61 arm64: dts: imx8dxl-ss-conn: delete usb3_lpcg node
Delete usb3_lpcg node for imx8dxl because not exist at such hardware.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 15:24:56 +08:00
Frank Li
dadcd4b92a arm64: dts: imx8-ss-conn: add missed clock enet_2x_txclk for fec[1,2]
Add missed clock enet_2x_txclk for fec[1,2].

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 15:24:55 +08:00
Frank Li
d8ce633518 arm64: dts: imx8-ss-conn: add fsl,tuning-step for usdhc1 and usdhc2
Add fsl,tuning-step for usdhc1 and usdhc2 to improve card compatibility.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 15:24:53 +08:00
Shenwei Wang
1ddf34137c arm64: dts: imx8: add default clock rate for usdhc
Add default clock rate for usdhc nodes to support higher transfer speed.

Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 15:24:50 +08:00
Frank Li
7db8963633 arm64: dts: imx8dxl-evk: add state_100mhz and state_200mhz for usdhc
Default, state_100mhz and state_200mhz use the same settings. But current
kernel driver use these to indicate if sd3.0 support.

Add max-frequency for usdhc2 because board design limitation.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 15:24:48 +08:00
Frank Li
1b9b4e6645 arm64: dts: imx8dxl-evk: add bt information for lpuart1
Add BT information for lpuart1.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-11 15:24:37 +08:00
Khairul Anuar Romli
1aa4ee5338 arm64: dts: socfpga: Add Agilex5 SVC node with memory region
Introduce the Stratix10 SoC Service Layer (SVC) node for Agilex5 SoCs. This
node includes the compatible string "intel,agilex5-svc" and references a
reserved memory region used for communication with the Secure Device
Manager (SDM).

Agilex5 introduces changes in how reserved memory is mapped and accessed
compared to previous SoC generations. This commit updates the device tree
structure to support Agilex5-specific handling of the SVC interface.

Signed-off-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-10 21:00:37 -06:00
Josua Mayer
f0e6bc0c3e clk: mvebu: cp110 add CLK_IGNORE_UNUSED to pcie_x10, pcie_x11 & pcie_x4
CP110 based platforms rely on the bootloader for pci port
initialization.
TF-A actively prevents non-uboot re-configuration of pci lanes, and many
boards do not have software control over the pci card reset.

If a pci port had link at boot-time and the clock is stopped at a later
point, the link fails and can not be recovered.

PCI controller driver probe - and by extension ownership of a driver for
the pci clocks - may be delayed especially on large modular kernels,
causing the clock core to start disabling unused clocks.

Add the CLK_IGNORE_UNUSED flag to the three pci port's clocks to ensure
they are not stopped before the pci controller driver has taken
ownership and tested for an existing link.

This fixes failed pci link detection when controller driver probes late,
e.g. with arm64 defconfig and CONFIG_PHY_MVEBU_CP110_COMPHY=m.

Closes: https://lore.kernel.org/r/b71596c7-461b-44b6-89ab-3cfbd492639f@solid-run.com
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2025-11-10 15:19:00 +01:00
Josua Mayer
c9b6a83670 Revert "arm64: dts: marvell: cn9132-clearfog: fix multi-lane pci x2 and x4 ports"
This reverts commit 794a066688 because it
misunderstood interworking between arm trusted firmware and the common
phy driver, and does not consistently resolve the issue it was intended
to address.

Further diagnostics have revealed the root cause for the reported system
lock-up in a race condition between pci driver probe and clock core
disabling unused clocks.

Revert the wrong change restoring driver control over all pci lanes.
As a temporary workaround for the original issue, users can boot with
"clk_ignore_unused".

Signed-off-by: Josua Mayer <josua@solid-run.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2025-11-10 15:19:00 +01:00
Rob Herring (Arm)
da69aeac10 arm64/arm: dts: marvell: Rename "nand-rb" pinctrl node names
Update the "nand-rb" pinctrl child node names to use the defined "-pins"
suffix fixing DT schema warnings.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2025-11-10 15:13:08 +01:00
Niklas Söderlund
7d41b3085b arm64: dts: renesas: r8a779a0: Add GE7800 GPU node
Describe Imagination Technologies PowerVR Rogue GE7800 BNVC 15.5.1.64
present in Renesas R-Car R8A779A0 V3U SoC.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251106212342.2771579-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-10 11:09:49 +01:00
Geert Uytterhoeven
3bef06115a Merge tag 'renesas-r8a779a0-dt-binding-defs-tag2' into renesas-dts-for-v6.19
Renesas R-Car V3U Graphics Clock DT Binding Definition

Graphics Clock DT binding definition for the Renesas R-Car V3U
(R8A779A0) SoC, shared by driver and DT source files.
2025-11-10 11:09:40 +01:00
Marek Vasut
303a5185e0 arm64: dts: renesas: r8a77965: Add GE7800 GPU node
Describe Imagination Technologies PowerVR Rogue GE7800 BNVC 15.5.1.64
present in Renesas R-Car R8A77965 M3-N SoC.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Acked-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20251104135716.12497-3-marek.vasut+renesas@mailbox.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-10 11:08:15 +01:00
Niklas Söderlund
1de3d9e2cd dt-bindings: clock: r8a779a0: Add ZG core clock
Add the core clock used by the GPU on the Renesas R-Car V3U
(R8A779A0) SoC.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251106211604.2766465-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-11-10 10:56:47 +01:00
Ahmad Fatoum
c23cb4565a ARM: dts: stm32: lxa: drop unnecessary vusb_d/a-supply
The LXA device trees are the only STM32MP1 device tree that specify
vusb_d/usb_a-supply and apparently not for good reason:

- vusb_d-supply (vdd_usb) is the same as the phy-supply for usbphyc_port1
- vusb_a-supply (reg18) is the same as vdda1v8-supply for usbphyc_port1

and usbphyc_port1 is linked to the usbotg_hs node via the phys property.

Specifying the regulators in the &usbotg_hs node is thus superfluous and
has been even found to be harmful in one instance:
Linux v6.10 was found to lock up every 50-125 or so reboots on the LXA
TAC when the DWC2 driver probe enables the regulators in bulk, unless
both properties were removed.

This issue was so far not reproducible on v6.17 (> 500 reboots), but as
these properties are unnecessary and different from other STM32MP1
boards, remove them anyway.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Link: https://lore.kernel.org/r/20251007-lxa-usb-dt-v1-1-cacde8088bb9@pengutronix.de
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-10 09:51:25 +01:00
Jihed Chaibi
e40b061cd3 ARM: dts: stm32: stm32mp157c-phycore: Fix STMPE811 touchscreen node properties
Move st,adc-freq, st,mod-12b, st,ref-sel, and st,sample-time properties
from the touchscreen subnode to the parent touch@44 node. These properties
are defined in the st,stmpe.yaml schema for the parent node, not the
touchscreen subnode, resolving the validation error about unevaluated
properties.

Fixes: 27538a18a4 ("ARM: dts: stm32: add STM32MP1-based Phytec SoM")
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Link: https://lore.kernel.org/r/20250915224611.169980-1-jihed.chaibi.dev@gmail.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
2025-11-10 09:40:46 +01:00
Laurent Pinchart
0d1250c6af arm64: dts: imx8mp: Specify the number of channels for CSI-2 receivers
The CSI-2 receivers in the i.MX8MP have 3 output channels. Specify this
in the device tree, to enable support for more than one channel.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-11-10 16:09:16 +08:00
Khairul Anuar Romli
dd94481408 dt-bindings: firmware: svc: Add IOMMU support for Agilex5
In Agilex5, the TBU (Translation Buffer Unit) can now operate in non-secure
mode, enabling Linux to utilize it through the IOMMU framework. This allows
improved memory management capabilities in non-secure environments. With
Agilex5 lifting this restriction, we are now extending the device tree
bindings to support IOMMU for the Agilex5 SVC.

Signed-off-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-09 22:39:41 -06:00
Manikandan Muralidharan
c1c654b5da ARM: at91: at91_dt_defconfig: set MMC_SPI to module
Change CONFIG_MMC_SPI from built-in to module in the
at91_dt_defconfig. This allows the MMC over SPI driver to
be loaded only when needed for WILC subsystem.

Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Link: https://lore.kernel.org/r/20250923094552.430595-1-manikandan.m@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2025-11-08 11:06:23 +02:00
Alex Elder
41d34e0b54 riscv: dts: spacemit: enable K1 SoC QSPI on BPI-F3
Define DTS nodes to enable support for QSPI on the K1 SoC, including the
pin control configuration used.  Enable QSPI on the Banana Pi BPI-F3 board.

Signed-off-by: Alex Elder <elder@riscstar.com>
Link: https://lore.kernel.org/r/20251027133008.360237-9-elder@riscstar.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-11-08 09:37:40 +08:00
Hou Wenlong
995d504100 KVM: x86: Don't disable IRQs when unregistering user-return notifier
Remove the code to disable IRQs when unregistering KVM's user-return
notifier now that KVM doesn't invoke kvm_on_user_return() when disabling
virtualization via IPI function call, i.e. now that there's no need to
guard against re-entrancy via IPI callback.

Note, disabling IRQs has largely been unnecessary since commit
a377ac1cd9 ("x86/entry: Move user return notifier out of loop") moved
fire_user_return_notifiers() into the section with IRQs disabled.  In doing
so, the commit somewhat inadvertently fixed the underlying issue that
was papered over by commit 1650b4ebc9 ("KVM: Disable irq while
unregistering user notifier").  I.e. in practice, the code and comment
has been stale since commit a377ac1cd9.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
[sean: rewrite changelog after rebasing, drop lockdep assert]
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030191528.3380553-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-07 10:59:47 -08:00
Sean Christopherson
2baa33a8dd KVM: x86: Leave user-return notifier registered on reboot/shutdown
Leave KVM's user-return notifier registered in the unlikely case that the
notifier is registered when disabling virtualization via IPI callback in
response to reboot/shutdown.  On reboot/shutdown, keeping the notifier
registered is ok as far as MSR state is concerned (arguably better then
restoring MSRs at an unknown point in time), as the callback will run
cleanly and restore host MSRs if the CPU manages to return to userspace
before the system goes down.

The only wrinkle is that if kvm.ko module unload manages to race with
reboot/shutdown, then leaving the notifier registered could lead to
use-after-free due to calling into unloaded kvm.ko module code.  But such
a race is only possible on --forced reboot/shutdown, because otherwise
userspace tasks would be frozen before kvm_shutdown() is called, i.e. on a
"normal" reboot/shutdown, it should be impossible for the CPU to return to
userspace after kvm_shutdown().

Furthermore, on a --forced reboot/shutdown, unregistering the user-return
hook from IRQ context doesn't fully guard against use-after-free, because
KVM could immediately re-register the hook, e.g. if the IRQ arrives before
kvm_user_return_register_notifier() is called.

Rather than trying to guard against the IPI in the "normal" user-return
code, which is difficult and noisy, simply leave the user-return notifier
registered on a reboot, and bump the kvm.ko module refcount to defend
against a use-after-free due to kvm.ko unload racing against reboot.

Alternatively, KVM could allow kvm.ko and try to drop the notifiers during
kvm_x86_exit(), but that's also a can of worms as registration is per-CPU,
and so KVM would need to blast an IPI, and doing so while a reboot/shutdown
is in-progress is far risky than preventing userspace from unloading KVM.

Link: https://patch.msgid.link/20251030191528.3380553-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-07 10:59:46 -08:00
Sean Christopherson
b371174d2f KVM: x86: WARN if user-return MSR notifier is registered on exit
When freeing the per-CPU user-return MSRs structures, WARN if any CPU has
a registered notifier to help detect and/or debug potential use-after-free
issues.  The lifecycle of the notifiers is rather convoluted, and has
several non-obvious paths where notifiers are unregistered, i.e. isn't
exactly the most robust code possible.

The notifiers they are registered on-demand in KVM, on the first WRMSR to
a tracked register.  _Usually_ the notifier is unregistered whenever the
CPU returns to userspace.  But because any given CPU isn't guaranteed to
return to userspace, e.g. the CPU could be offlined before doing so, KVM
also "drops", a.k.a. unregisters, the notifiers when virtualization is
disabled on the CPU.

Further complicating the unregister path is the fact that the calls to
disable virtualization come from common KVM, and the per-CPU calls are
guarded by a per-CPU flag (to harden _that_ code against bugs, e.g. due to
mishandling reboot).  Reboot/shutdown in particular is problematic, as KVM
disables virtualization via IPI function call, i.e. from IRQ context,
instead of using the cpuhp framework, which runs in task context.  I.e. on
reboot/shutdown, drop_user_return_notifiers() is called asynchronously.

Forced reboot/shutdown is the most problematic scenario, as userspace tasks
are not frozen before kvm_shutdown() is invoked, i.e. KVM could be actively
manipulating the user-return MSR lists and/or notifiers when the IPI
arrives.  To a certain extent, all bets are off when userspace forces a
reboot/shutdown, but KVM should at least avoid a use-after-free, e.g. to
avoid crashing the kernel when trying to reboot.

Link: https://patch.msgid.link/20251030191528.3380553-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-07 10:59:46 -08:00
Sean Christopherson
c0711f8c61 KVM: TDX: Explicitly set user-return MSRs that *may* be clobbered by the TDX-Module
Set all user-return MSRs to their post-TD-exit value when preparing to run
a TDX vCPU to ensure the value that KVM expects to be loaded after running
the vCPU is indeed the value that's loaded in hardware.  If the TDX-Module
doesn't actually enter the guest, i.e. doesn't do VM-Enter, then it won't
"restore" VMM state, i.e. won't clobber user-return MSRs to their expected
post-run values, in which case simply updating KVM's "cached" value will
effectively corrupt the cache due to hardware still holding the original
value.

In theory, KVM could conditionally update the current user-return value if
and only if tdh_vp_enter() succeeds, but in practice "success" doesn't
guarantee the TDX-Module actually entered the guest, e.g. if the TDX-Module
synthesizes an EPT Violation because it suspects a zero-step attack.

Force-load the expected values instead of trying to decipher whether or
not the TDX-Module restored/clobbered MSRs, as the risk doesn't justify
the benefits.  Effectively avoiding four WRMSRs once per run loop (even if
the vCPU is scheduled out, user-return MSRs only need to be reloaded if
the CPU exits to userspace or runs a non-TDX vCPU) is likely in the noise
when amortized over all entries, given the cost of running a TDX vCPU.
E.g. the cost of the WRMSRs is somewhere between ~300 and ~500 cycles,
whereas the cost of a _single_ roundtrip to/from a TDX guest is thousands
of cycles.

Fixes: e0b4f31a3c ("KVM: TDX: restore user ret MSRs")
Cc: stable@vger.kernel.org
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://patch.msgid.link/20251030191528.3380553-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-07 10:59:45 -08:00
Maxim Levitsky
ab4e41eb9f KVM: x86: Don't clear async #PF queue when CR0.PG is disabled (e.g. on #SMI)
Fix an interaction between SMM and PV asynchronous #PFs where an #SMI can
cause KVM to drop an async #PF ready event, and thus result in guest tasks
becoming permanently stuck due to the task that encountered the #PF never
being resumed.  Specifically, don't clear the completion queue when paging
is disabled, and re-check for completed async #PFs if/when paging is
enabled.

Prior to commit 2635b5c4a0 ("KVM: x86: interrupt based APF 'page ready'
event delivery"), flushing the APF queue without notifying the guest of
completed APF requests when paging is disabled was "necessary", in that
delivering a #PF to the guest when paging is disabled would likely confuse
and/or crash the guest.  And presumably the original async #PF development
assumed that a guest would only disable paging when there was no intent to
ever re-enable paging.

That assumption fails in several scenarios, most visibly on an emulated
SMI, as entering SMM always disables CR0.PG (i.e. initially runs with
paging disabled).  When the SMM handler eventually executes RSM, the
interrupted paging-enabled is restored, and the async #PF event is lost.

Similarly, invoking firmware, e.g. via EFI runtime calls, might require a
transition through paging modes and thus also disable paging with valid
entries in the competion queue.

To avoid dropping completion events, drop the "clear" entirely, and handle
paging-enable transitions in the same way KVM already handles APIC
enable/disable events: if a vCPU's APIC is disabled, APF completion events
are not kept pending and not injected while APIC is disabled.  Once a
vCPU's APIC is re-enabled, KVM raises KVM_REQ_APF_READY so that the vCPU
recognizes any pending pending #APF ready events.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251015033258.50974-4-mlevitsk@redhat.com
[sean: rework changelog to call out #PF injection, drop "real mode"
       references, expand the code comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-07 10:59:44 -08:00
Maxim Levitsky
68c35f89d0 KVM: x86: Fix a semi theoretical bug in kvm_arch_async_page_present_queued()
Fix a semi theoretical race condition related to a lack of memory barriers
when dealing with vcpu->arch.apf.pageready_pending.  In theory, the "ready"
side could see a stale pageready_pending and neglect to kick the vCPU, and
thus allow the vCPU to enter the guest with a pending KVM_REQ_APF_READY
and no kick/IPI on the way, in which case the KVM would fail to deliver a
completed async #PF event to the guest in a timely manner as the request
would be recognized only on the next (coincidental) VM-Exit.

kvm_arch_async_page_present_queued() running in workqueue context:

    kvm_make_request(KVM_REQ_APF_READY, vcpu);
    /* memory barrier is missing here*/
    if (!vcpu->arch.apf.pageready_pending)
        kvm_vcpu_kick(vcpu);

kvm_set_msr_common() running in task context:

    vcpu->arch.apf.pageready_pending = false;
    /* memory barrier is missing here*/

And later, vcpu_enter_guest() running in task context:

    if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
         kvm_check_async_pf_completion(vcpu)

Add missing full memory barriers in both cases to avoid theoretical
case of not kicking the vCPU thread.

Note that the bug is mostly theoretical because kvm_make_request()
uses an atomic operation, which is always serializing on x86, requiring
only for documentation purposes the smp_mb__after_atomic() after it
(smp_mb__after_atomic() is a NOP on x86).

The second missing barrier, between kvm_set_msr_common() and
vcpu_enter_guest(), isn't strictly needed because KVM executes several
barriers in between calling these functions, however it still makes
sense to have an explicit barrier to be on the safe side and to document
the ordering dependencies.

Finally, also use READ_ONCE/WRITE_ONCE.

Thanks a lot to Paolo for the help with this patch.

Link: https://lore.kernel.org/all/7c7a5a75-a786-4a05-a836-4368582ca4c2@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://patch.msgid.link/20251015033258.50974-3-mlevitsk@redhat.com
[sean: explain the race and its impact in more detail]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-07 10:59:43 -08:00
Coia Prant
156a530ed5 arm64: dts: rockchip: Add devicetree for the 9Tripod X3568 v4
The 9Tripod X3568 v4 is an RK3568-based SBC, just like the RK3568-EVB.
It always uses soldered connections between the X3568CV2/X3568CV3/X3568CV4 core board
and the X3568bv4 I/O board.

The differences between the core boards
- PCB size, layout
- CPU (RKK3568B2/RK3568J)
- Memory type (DDR4/LPDDR4/LPDDR4X) and size
- eMMC size
- DSI/EDP resistor values
Although the components vary, they maintain full compatibility.

The X3568 board has multiple hardware revisions, and we currently support v4 (I/O board).

Specification:
- SoC: RockChip RK3568 ARM64 (4 cores)
- eMMC: 16-128 GB
- RAM: 2-16 GB
- Power: DC 12V 2A
- Ethernet: 2x YT8521SC RGMII (10/100/1000 Mbps)
- Wireless radio: 802.11b/g/n/ac/ax dual-band
- LED:
  Power: AlwaysOn
  User: GPIO
- Button:
  VOL+: SARADC/0 <35k µV>
  VOL-: SARADC/0 <450k µV>
  Power/Reset: PMIC RK809
- CAN
  CAN/1: 4-pin (PH 2.0)
- PWM
  PWM/4: Backlight DSI/0 DSI/1
  PWM/7: IR Receiver [may not install]
- UART:
  UART/2: Debug TTL - 1500000 8N1 (1.25mm)
  UART/3: TTL (PH 2.0)
  UART/4: TTL (PH 2.0)
  UART/8: AP6275S Bluetooth
  UART/9: TTL (PH 2.0)
- I2C:
  I2C/0: PMIC RK809
  I2C/1: Touchscreen DSI/0 DSI/1
  I2C/4: Camera
  I2C/5: RTC@51 PCF8563
- I2S:
  I2S/0: miniHDMI Sound
  I2S/1: RK809 Audio Codec
  I2S/3: AP6275S Bluetooth Sound
- SDMMC:
  SDMMC/0: microSD (TF) slot
  SDMMC/2: AP6275S SDIO WiFi card
- Camera: 1x CSI
- Video: miniHDMI / DSI0 (MIPI/LVDS) / DSI1 (MIPI/EDP)
- Audio: miniHDMI / MIC on-board / Speaker / SPDIF / 3.5mm Headphones / AP6275S Bluetooth
- USB:
  USB 2.0 HOST x2
  USB 2.0 HOST x3 (4-pin)
  USB 2.0 OTG x1 (shared with USB 3.0 OTG/HOST) [slot may not install]
  USB 3.0 HOST x1
  USB 3.0 OTG/HOST x1
- SATA: 1x SATA 3.0 with Power/4-pin [slot may not install]
- PCIe: 1x PCIe 3.0 x2 (x4 connecter) [power/clock/slot may not install]

Link:
- https://appletsapi.52solution.com/media/X3568V4%E5%BC%80%E5%8F%91%E6%9D%BF%E7%A1%AC%E4%BB%B6%E6%89%8B%E5%86%8C.pdf
- https://blog.gov.cooking/archives/research-ninetripod-x3568-v4-and-flash.html

Signed-off-by: Coia Prant <coiaprant@gmail.com>
Link: https://patch.msgid.link/20251103171702.1518730-4-coiaprant@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:55:21 +01:00
Coia Prant
a12838aab0 dt-bindings: arm: rockchip: Add 9Tripod X3568 series
This documents 9Tripod X3568 v4 which is a SBC based on RK3568 SoC.

Link: http://www.9tripod.com/showpro.php?id=117
Link: https://appletsapi.52solution.com/media/X3568V4%E5%BC%80%E5%8F%91%E6%9D%BF%E7%A1%AC%E4%BB%B6%E6%89%8B%E5%86%8C.pdf

Signed-off-by: Coia Prant <coiaprant@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251103171702.1518730-3-coiaprant@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:36:18 +01:00
Coia Prant
25802f8d16 dt-bindings: vendor-prefixes: Add 9Tripod
Add 9Tripod to the vendor prefixes.

Signed-off-by: Coia Prant <coiaprant@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251103171702.1518730-2-coiaprant@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:36:18 +01:00
FUKAUMI Naoki
fbf90d1b69 arm64: dts: rockchip: Fix USB Type-C host mode for Radxa ROCK 5B+/5T
The Radxa ROCK 5B+/5T USB Type-C port supports Dual Role Data and
should also act as a host. However, currently, when acting as a host,
only self-powered devices work.

Since the ROCK 5B+ supports Dual Role Power, set the power-role
property to "dual" and the try-power-role property to "sink". (along
with related properties)

The ROCK 5T should only support the "source" power-role.

This allows the port to act as a host, supply power to the port, and
allow bus-powered devices to work.

Note that on the ROCK 5T, with this patch applied, it has been
observed that some bus-powered devices do not work correctly. Also,
it has been observed that after connecting a device (and the data-role
switches to host), connecting a host device does not switch the
data-role back to the device role. These issues should be addressed
separately.

Note that there is a separate known issue where USB 3.0 SuperSpeed
devices do not work when oriented in reverse. This issue should also be
addressed separately. (USB 2.0/1.1 devices work in both orientations)

Fixes: 67b2c15d8f ("arm64: dts: rockchip: add USB-C support for ROCK 5B/5B+/5T")
Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Link: https://patch.msgid.link/20251104085227.820-1-naoki@radxa.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:32:21 +01:00
Chris Morgan
c6c76445c3 arm64: dts: rockchip: Fix DMA for Indiedroid Nova Bluetooth
When the device was first added, there was a problem with the bluetooth
controller that manifested when DMA was enabled for the underlying UART
interface. At some point in the intervening time the problem appears to
have been resolved. Add the UART rx and tx channels back to re-enable
UART.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251105205708.732125-6-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:26:45 +01:00
Chris Morgan
cce837b7a3 arm64: dts: rockchip: Enable HS400 for Indiedroid Nova
Once the DLL_STRBIN_TAPNUM_DEFAULT is set to the correct value [1], the
workaround to ensure stable eMMC operation is no longer needed. I
have observed testing this that hdparm went from:
"Timing buffered disk reads: 502 MB in  3.00 seconds = 167.14 MB/sec"
to the following:
"Timing buffered disk reads: 832 MB in  3.00 seconds = 277.32 MB/sec"

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/mmc/host/sdhci-of-dwcmshc.c?id=a28352cf2d2f8380e7aca8cb61682396dca7a991

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251105205708.732125-5-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:26:45 +01:00
Chris Morgan
dfa4f7f5cb arm64: dts: rockchip: enable NPU on Indiedroid Nova
Enable the NPU on the Indiedroid Nova.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251105205708.732125-4-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:26:45 +01:00
Chris Morgan
318bc4b5ad arm64: dts: rockchip: Add device type for Indiedroid Nova.
Update the devicetree node to note the device category of the
Indiedroid Nova as an embedded device.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251105205708.732125-2-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:26:45 +01:00
Chris Morgan
8a429c5835 arm64: dts: rockchip: enable NPU on Gameforce Ace
Enable the NPU on the Gameforce Ace.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://patch.msgid.link/20251106182436.771805-1-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 23:25:20 +01:00
Uros Bizjak
c331b400e2 KVM: SVM: Ensure SPEC_CTRL[63:32] is context switched between guest and host
SPEC_CTRL is an MSR, i.e. a 64-bit value, but the VMRUN assembly code
assumes bits 63:32 are always zero.  The bug is _currently_ benign because
neither KVM nor the kernel support setting any of bits 63:32, but it's
still a bug that needs to be fixed.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251106191230.182393-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-06 12:35:51 -08:00
Mukesh Ojha
6918667af5 soc: qcom: socinfo: Add reserve field to support future extension
Some of the new field added to socinfo structure with version 21, 22
and 23 which is only used by boot firmware and it is of no use for
Linux.Add reserve field in socinfo so that the structure remain
updated and prepared if we get any new field in future which could
be used by Linux. While at it, also updates switch case for backward
compatibility if the SoC runs with boot firmware which has these
new version added.

Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251104130906.167666-2-mukesh.ojha@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-06 14:11:54 -06:00
Mukesh Ojha
6a571d762c soc: qcom: socinfo: Add support for new fields in revision 20
Add support for socinfo version 20. Version 20 adds a new field
package id and its zeroth bit contain information that can be
can be used to tune temperature thresholds on devices which might
be able to withstand higher temperatures. Zeroth bit value 1 means
that its heat dissipation is better and more relaxed thermal
scheme can be put in place and 0 means a more aggressive scheme
may be needed.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251104130906.167666-1-mukesh.ojha@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-06 14:11:54 -06:00
Jie Gan
35ddab2845 arm64: dts: qcom: kodiak: add coresight nodes
Add TPDM, TPDA, CTI and funnel coresight devices for AOSS and QDSS
blocks.

Signed-off-by: Jie Gan <jie.gan@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251104-add-coresight-nodes-for-sc7280-v2-1-c67fa3890c2a@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-06 14:04:26 -06:00
Uros Bizjak
32ed0bc2f0 KVM: VMX: Ensure guest's SPEC_CTRL[63:32] is loaded on VM-Enter
SPEC_CTRL is an MSR, i.e. a 64-bit value, but the assembly code that loads
the guest's value assumes bits 63:32 are always zero.  The bug is
_currently_ benign because neither KVM nor the kernel support setting any
of bits 63:32, but it's still a bug that needs to be fixed.

Note, the host's value is restored in C code and is unaffected.

Fixes: 07853adc29 ("KVM: VMX: Prevent RSB underflow before vmenter")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://patch.msgid.link/20250820100007.356761-1-ubizjak@gmail.com
[sean: call out that only the guest's value is affected]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-06 06:23:52 -08:00
Troy Mitchell
0ee5993466 riscv: dts: spacemit: add MusePi Pro board device tree
Add initial device tree support for the MusePi Pro board [1].
The board is using the SpacemiT K1/M1 SoC.

This device tree is adapted from the SpacemiT vendor tree [2] and
enables basic board functionality, including UART console, LED, eMMC,
Ethernet, and PDMA.

Link: https://developer.spacemit.com/documentation?token=YJtdwnvvViPVcmkoPDpcvwfVnrh&type=pdf [1]
Link: https://gitee.com/bianbu-linux/linux-6.6/blob/k1-bl-v2.2.y/arch/riscv/boot/dts/spacemit/k1-x_MUSE-Pi-Pro.dts [2]
Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://lore.kernel.org/r/20251023-k1-musepi-pro-dts-v4-2-01836303e10f@linux.spacemit.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-11-06 20:07:52 +08:00
Troy Mitchell
2cc2289063 dt-bindings: riscv: spacemit: add MusePi Pro board
Document the compatible string for the MusePi Pro [1]. It is a 1.8-inch
single board computer based on the SpacemiT K1/M1 RISC-V SoC [2].

Here's a refined list of its core features:
  - SoC: SpacemiT M1/K1, 8-core 64-bit RISC-V.
  - Memory: LPDDR4X @ 2400MT/s, available in 8GB & 16GB options.
  - Storage: Onboard eMMC 5.1 (64GB/128GB options), M.2 M-Key for NVMe
             SSD (2230 size), and a microSD slot (UHS-II) for expansion.
  - Display: HDMI 1.4 (1080P@60Hz) and 2-lane MIPI DSI FPC (1080P@60Hz).
  - Connectivity: Onboard Wi-Fi 6 & Bluetooth 5.2, single Gigabit Ethernet
                  port (RJ45).
  - USB: 4x USB 3.0 Type-A (host) and 1x USB 2.0 Type-C (device/OTG).
  - Expansion: Full-size miniPCIe slot and a second M.2 M-Key (2230).
  - GPIO: Standard 40-pin GPIO interface.
  - MIPI: 1x 4-lane MIPI CSI FPC and 2x MIPI DSI FPC interfaces.
  - Clock: Onboard RTC with battery support.

Link: https://developer.spacemit.com/documentation?token=YJtdwnvvViPVcmkoPDpcvwfVnrh&type=pdf [1]
Link: https://www.spacemit.com/en/key-stone-k1 [2]
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://lore.kernel.org/r/20251023-k1-musepi-pro-dts-v4-1-01836303e10f@linux.spacemit.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-11-06 20:07:40 +08:00
Thomas Weißschuh
293f71435d um: Always set up AT_HWCAP and AT_PLATFORM
Historically the code to set up AT_HWCAP and AT_PLATFORM was only built
for 32bit x86 as it was intermingled with the vDSO passthrough code.

Now that vDSO passthrough has been removed, always pass through AT_HWCAP
and AT_PLATFORM.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-10-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:34 +01:00
Thomas Weißschuh
117e796fc5 x86/um: Remove FIXADDR_USER_START and FIXADDR_USE_END
There are no users left.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-9-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:34 +01:00
Thomas Weißschuh
f20e32ffda um: Remove __access_ok_vsyscall()
FIXADDR_USER_START and FIXADDR_USER_END are now always zero.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-8-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:34 +01:00
Thomas Weißschuh
880f615bf9 um: Remove redundant range check from __access_ok_vsyscall()
The only caller __access_ok() is already doing the same check through
__addr_range_nowrap().

Remove the redundant check.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-7-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:33 +01:00
Thomas Weißschuh
78fdfc9fc4 um: Remove fixaddr_user_init()
With the removal of the vDSO passthrough from the host,
FIXADDR_USER_START is always 0 and fixaddr_user_init() is dead code.

Remove it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-6-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:33 +01:00
Thomas Weißschuh
dbd7cf408a x86/um: Drop gate area handling
With the removal of the vDSO passthrough from the host,
FIXADDR_USER_START is always 0 and the gate area setup code is dead.

Remove it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-5-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:33 +01:00
Thomas Weißschuh
70d52694b6 x86/um: Do not inherit vDSO from host
Inheriting the vDSO from the host is problematic. The values read
from the time functions will not be correct for the UML kernel.
Furthermore the start and end of the vDSO are not stable or
detectable by userspace. Specifically the vDSO datapages start
before AT_SYSINFO_EHDR and the vDSO itself is larger than a single page.

This codepath is only used on 32bit x86 UML. In my testing with both
32bit and 64bit hosts the passthrough functionality has always been
disabled anyways due to the checks against envp in scan_elf_aux().

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-4-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:33 +01:00
Thomas Weißschuh
83b4b44a2b um: Split out default elf_aux_hwcap
Setting all auxiliary vector values to default values if one of them
was not provided by the host will discard perfectly fine values.

Remove the elf_aux_platform fallback from the vDSO ones.
As zero is the correct fallback anyways, don't create a new conditional.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-3-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:33 +01:00
Thomas Weißschuh
c1b0775151 x86/um: Move ELF_PLATFORM fallback to x86-specific code
The generic UM code should not have references to x86-specific value.

Move the fallback into the x86-specific header.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-2-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:33 +01:00
Thomas Weißschuh
74d438872d um: Split out default elf_aux_platform
Setting all auxiliary vector values to default values if one of them
was not provided by the host will discard perfectly fine values.

Move the elf_aux_platform fallback to its own conditional.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251028-uml-remove-32bit-pseudo-vdso-v1-1-e930063eff5f@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-06 13:02:32 +01:00
Gary Yang
1f0de24c54 arm64: dts: cix: Add pinctrl nodes for sky1
Add the pin-controller nodes for Sky1 platform.

Signed-off-by: Gary Yang <gary.yang@cixtech.com>
Link: https://lore.kernel.org/r/20251021070410.3585997-4-gary.yang@cixtech.com
Signed-off-by: Peter Chen <peter.chen@cixtech.com>
2025-11-06 19:26:53 +08:00
Jun Guo
74178bb23c arm64: dts: cix: add DT nodes for SPI
Add the device tree node for the spi controller of the CIX SKY1 SoC.

Signed-off-by: Jun Guo <jun.guo@cixtech.com>
Link: https://lore.kernel.org/r/20250919013118.853078-1-jun.guo@cixtech.com
Signed-off-by: Peter Chen <peter.chen@cixtech.com>
2025-11-06 19:26:53 +08:00
Sjoerd Simons
9b2d2beaef arm64: dts: mediatek: mt7981b-openwrt-one: Enable software leds
The openwrt has 3 status leds at the front:
* red: Used as failsafe led by openwrt
* white: Used as boot led by openwrt
* green: Used as running/upgrade led by openwrt

On the back each RJ45 jack has the typical amber/green leds. For the WAN
jack this is hardware controlled by the phy, for LAN these are under
software control and enabled by this patch.

Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-06 10:06:52 +01:00
Sjoerd Simons
869b3bb5ad arm64: dts: mediatek: mt7981b-openwrt-one: Enable SPI NOR
The openwrt one has a SPI NOR flash which from factory is used for:
* Recovery system
* WiFi eeprom data
* ethernet Mac addresses

Describe this following the same partitions as the openwrt configuration
uses.

Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-06 10:06:52 +01:00
Frank Wunderlich
a58c368067 arm64: dts: mediatek: mt7988a-bpi-r4pro: Add mmc overlays
Add MMC overlays for BPI-R4 Pro.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-06 10:02:58 +01:00
Frank Wunderlich
dec929e61a arm64: dts: mediatek: mt7988a-bpi-r4-pro: Add PCIe overlays
Add overlays to switch between key-m and key-e slots.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-06 10:02:58 +01:00
Frank Wunderlich
f397471a6a arm64: dts: mediatek: mt7988: Add devicetree for BananaPi R4 Pro
Add devicetree for Bpi-R4-Pro.

BananaPi R4 Pro is a MT7988A based board which exists in 2 different
hardware versions:

- 4E: 4 GB RAM and using internal 2.5G Phy for WAN-Combo
- 8X: 8 GB RAM and 2x Aeonsemi AS21010P 10G phys

common parts:

- MediaTek MT7988A Quad-core Arm Corex-A73,1.8GHz processor
- 8GB eMMC flash
- 256MB SPI-NAND Flash
- Micro SD card slot
- 1x 10G SFP+ WAN
- 1x 10G SFP+ LAN
- 4x 2.5G RJ45 LAN (MxL86252C)
- 1x 1G RJ45 LAN (MT7988 internal switch)
- 2x miniPCIe slots with PCIe3.0 2lane interface for Wi-Fi NIC
- 2x M.2 M-KEY slots with PCIe3.0 1lane interface for NVME SSD
- 3x M.2 B-KEY slots with USB3.2 for 5G Module (PCIe shared with key-m)
- 1x USB3.2 slot
- 1x USB2.0 slot
- 1x USB TypeC Debug Console
- 2x13 PIN Header for expanding application

https://docs.banana-pi.org/en/BPI-R4_Pro/BananaPi_BPI-R4_Pro

The PCIe is per default in key-m state and can be changed to key-b with
the pcie-overlays.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-06 10:02:58 +01:00
Frank Wunderlich
0e5d9e5293 arm64: dts: mediatek: mt7988: Disable 2.5G phy and enable at board layer
The internal 2.5G phy of mt7988 is only used by some specific board
variants.
Disable it by default and enable it where needed.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-06 10:02:57 +01:00
Siddharth Vadapalli
c984dd0ecd arm64: dts: ti: k3-j721s2: disable "mcu_cpsw" in SoC file and enable in board files
Following the existing convention of disabling nodes in the SoC file and
enabling only the required ones in the board file, disable "mcu_cpsw" node
in the SoC file "k3-j721s2-mcu-wakeup.dtsi" and enable it in the board
files:
a) k3-am68-phyboard-izar.dts
b) k3-am68-sk-base-board.dts
c) k3-j721s2-common-proc-board.dts

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20251015111344.3639415-6-s-vadapalli@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:29 +05:30
Siddharth Vadapalli
ee90abbed7 arm64: dts: ti: k3-j721e: disable "mcu_cpsw" in SoC file and enable it in board file
Following the existing convention of disabling nodes in the SoC file and
enabling only the required ones in the board file, disable "mcu_cpsw" node
in the SoC file "k3-j721e-mcu-wakeup.dtsi" and enable it in the board
files:
a) k3-j721e-beagleboneai64.dts
b) k3-j721e-common-proc-board.dts
c) k3-j721e-sk.dts

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20251015111344.3639415-5-s-vadapalli@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:29 +05:30
Siddharth Vadapalli
5a74aa002c arm64: dts: ti: k3-j7200: disable "mcu_cpsw" in SoC file and enable in board file
Following the existing convention of disabling nodes in the SoC file and
enabling only the required ones in the board file, disable "mcu_cpsw" node
in the SoC file "k3-j7200-mcu-wakeup.dtsi" and enable it in the board file
"k3-j7200-common-proc-board.dts".

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20251015111344.3639415-4-s-vadapalli@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:29 +05:30
Siddharth Vadapalli
6d2138b85e arm64: dts: ti: k3-am65: disable "mcu_cpsw" in SoC file and enable in board file
Following the existing convention of disabling nodes in the SoC file and
enabling only the required ones in the board file, disable "mcu_cpsw" node
in the SoC file "k3-am65-mcu.dtsi" and enable it in the board file
"k3-am654-base-board.dts". Also, now that "mcu_cpsw" is disabled in the
SoC file, disabling it in "k3-am65-iot2050-common.dtsi" is no longer
required. Hence, remove the section corresponding to this change.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20251015111344.3639415-3-s-vadapalli@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:29 +05:30
Siddharth Vadapalli
e8535e2b27 arm64: dts: ti: k3-am62: disable "cpsw3g" in SoC file and enable in board file
Following the existing convention of disabling nodes in the SoC file and
enabling only the required ones in the board file, disable "cpsw3g" node
in the SoC file "k3-am62-main.dtsi" and enable it in the board (or board
include) files:
a) k3-am62-lp-sk.dts
b) k3-am62-phycore-som.dtsi
c) k3-am625-beagleplay.dts
d) k3-am625-sk-common.dtsi

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20251015111344.3639415-2-s-vadapalli@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:29 +05:30
Markus Schneider-Pargmann (TI.com)
1581a732f1 arm64: dts: ti: k3-am62p5-sk: Set wakeup-source system-states
The CANUART pins of mcu_mcan0, mcu_mcan1, mcu_uart0 and wkup_uart0 are
powered during Partial-IO and I/O Only + DDR and are capable of waking
up the system in these states. Specify the states in which these units
can do a wakeup on this board.

Note that the UARTs are not capable of wakeup in Partial-IO because of
of a UART mux on the board not being powered during Partial-IO.

Add pincontrol definitions for mcu_mcan0 and mcu_mcan1 for wakeup from
Partial-IO. Add these as wakeup pinctrl entries for both devices.

Signed-off-by: Markus Schneider-Pargmann (TI.com) <msp@baylibre.com>
Link: https://patch.msgid.link/20251103-topic-am62-dt-partialio-v6-15-v5-6-b8d9ff5f2742@baylibre.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:29 +05:30
Markus Schneider-Pargmann (TI.com)
afb919a6b3 arm64: dts: ti: k3-am62a7-sk: Set wakeup-source system-states
The CANUART pins of mcu_mcan0, mcu_mcan1, mcu_uart0 and wkup_uart0 are
powered during Partial-IO and I/O Only + DDR and are capable of waking
up the system in these states. Specify the states in which these units
can do a wakeup on this board.

Note that the UARTs are not capable of wakeup in Partial-IO because of
of a UART mux on the board not being powered during Partial-IO.

Add pincontrol definitions for mcu_mcan0 and mcu_mcan1 for wakeup from
Partial-IO. Add these as wakeup pinctrl entries for both devices.

Signed-off-by: Markus Schneider-Pargmann (TI.com) <msp@baylibre.com>
Link: https://patch.msgid.link/20251103-topic-am62-dt-partialio-v6-15-v5-5-b8d9ff5f2742@baylibre.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:29 +05:30
Markus Schneider-Pargmann (TI.com)
1f2f34b619 arm64: dts: ti: k3-am62-lp-sk: Set wakeup-source system-states
The CANUART pins of mcu_mcan0, mcu_mcan1, mcu_uart0 and wkup_uart0 are
powered during Partial-IO and I/O Only + DDR and are capable of waking
up the system in these states. Specify the states in which these units
can do a wakeup on this board.

Note that the UARTs are not capable of wakeup in Partial-IO because of
of a UART mux on the board not being powered during Partial-IO. As I/O
Only + DDR is not supported on AM62x, the UARTs are not added in this
patch.

Add pincontrol definitions for mcu_mcan0 and mcu_mcan1 for wakeup from
Partial-IO. Add these as wakeup pinctrl entries for both devices.

Signed-off-by: Markus Schneider-Pargmann (TI.com) <msp@baylibre.com>
Link: https://patch.msgid.link/20251103-topic-am62-dt-partialio-v6-15-v5-4-b8d9ff5f2742@baylibre.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:28 +05:30
Markus Schneider-Pargmann (TI.com)
4060cf6015 arm64: dts: ti: k3-am62p: Define possible system states
Add the system states that are available on TI AM62P SoCs.

Signed-off-by: Markus Schneider-Pargmann (TI.com) <msp@baylibre.com>
Link: https://patch.msgid.link/20251103-topic-am62-dt-partialio-v6-15-v5-3-b8d9ff5f2742@baylibre.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:28 +05:30
Markus Schneider-Pargmann (TI.com)
6992c72c3a arm64: dts: ti: k3-am62a: Define possible system states
Add the system states that are available on TI AM62A SoCs.

Signed-off-by: Markus Schneider-Pargmann (TI.com) <msp@baylibre.com>
Link: https://patch.msgid.link/20251103-topic-am62-dt-partialio-v6-15-v5-2-b8d9ff5f2742@baylibre.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:28 +05:30
Markus Schneider-Pargmann (TI.com)
1b45cec18f arm64: dts: ti: k3-am62: Define possible system states
Add the system states that are available on TI AM62 SoCs.

Signed-off-by: Markus Schneider-Pargmann (TI.com) <msp@baylibre.com>
Link: https://patch.msgid.link/20251103-topic-am62-dt-partialio-v6-15-v5-1-b8d9ff5f2742@baylibre.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:28 +05:30
Michael Walle
2fc9f6f112 arm64: dts: ti: k3-am62p-j722s-common-main: move audio_refclk here
Since commit 9dee9cb2df08 ("arm64: dts: ti: k3-j722s-main: fix the audio
refclk source") the clock nodes of the am62p and j722 are the same. Move
them into the commit dtsi.

Please note, that for the j722s the nodes are renamed from clock@ to
clock-controller@.

Suggested-by: Udit Kumar <u-kumar1@ti.com>
Signed-off-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20251103152826.1608309-1-mwalle@kernel.org
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 11:48:28 +05:30
Siddharth Vadapalli
1446fc4dc0 arm64: dts: ti: k3-*: Replace rgmii-rxid with rgmii-id for CPSW ports
The MAC Ports across all of the CPSW instances (CPSW2G, CPSW3G, CPSW5G and
CPSW9G) present in various K3 SoCs only support the 'RGMII-ID' mode. This
correction has been implemented/enforced by the updates to:
a) Device-Tree binding for CPSW [0]
b) Driver for CPSW [1]
c) Driver for CPSW MAC Port's GMII [2]

To complete the transition from 'RGMII-RXID' to 'RGMII-ID', update the
'phy-mode' property for all CPSW ports by replacing 'rgmii-rxid' with
'rgmii-id'.

[0]: commit 9b357ea525 ("dt-bindings: net: ti: k3-am654-cpsw-nuss: update phy-mode in example")
[1]: commit ca13b249f2 ("net: ethernet: ti: am65-cpsw: fixup PHY mode for fixed RGMII TX delay")
[2]: commit a22d3b0d49 ("phy: ti: gmii-sel: Always write the RGMII ID setting")

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Tested-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> # k3-am642-tqma64xxl-mbax4xxl
Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Verdin AM62P
Link: https://patch.msgid.link/20251025073802.1790437-1-s-vadapalli@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 10:59:21 +05:30
Matthias Schiffer
c9836bf7c3 arm64: dts: ti: k3-am642-tqma64xxl: add boot phase tags
Similar to other AM64x-based boards, add boot phase tags to make the
Device Trees usable for firmware/bootloaders without modification.

Supported boot devices are eMMC/SD card, SPI-NOR and USB (both mass
storage and DFU). The I2C EEPROM is included to allow the firmware to
select the correct RAM configuration for different TQMa64xxL variants.

Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://patch.msgid.link/20251105141726.39579-1-matthias.schiffer@ew.tq-group.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-06 10:57:39 +05:30
Nathan Chancellor
35c973187f ARM: dts: omap: am335x-mba335x: Fix stray '/*' in comment
When preprocessing arch/arm/boot/dts/ti/omap/am335x-mba335x.dts with
clang, there are a couple of warnings about '/*' within a block comment.

  arch/arm/boot/dts/ti/omap/am335x-mba335x.dts:260:7: warning: '/*' within block comment [-Wcomment]
    260 |                         /* /* gpmc_csn3.gpio2_0 - interrupt */
        |                            ^
  arch/arm/boot/dts/ti/omap/am335x-mba335x.dts:267:7: warning: '/*' within block comment [-Wcomment]
    267 |                         /* /* gpmc_ben1.gpio1_28 - interrupt */
        |                            ^

Remove the duplicate '/*' to clear up the warning.

Fixes: 5267fcd180 ("ARM: dts: omap: Add support for TQMa335x/MBa335x")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20251105-omap-mba335x-fix-clang-comment-warning-v2-1-f8a0003e1003@kernel.org
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-05 16:34:40 -08:00
Alexander Stein
c1d5c20269 ARM: dts: omap: am335x-tqma335x/mba335x: Fix MicIn routing
'Mic Jack' is connected to IN3_L and 'Mic Bias' is connected to 'Mic Jack'
Adjust routing accordingly.

Fixes: 5267fcd180 ("ARM: dts: omap: Add support for TQMa335x/MBa335x")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20251105083422.1010825-1-alexander.stein@ew.tq-group.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-05 16:33:57 -08:00
Heiko Stuebner
e06a419eaa arm64: dts: rockchip: add missing clocks for cpu cores on rk356x
All cpu cores are supplied by the same clock, but all except the first
core are missing that clocks reference - add the missing ones.

Reviewed-by: Diederik de Haas <diederik@cknow-tech.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251103234926.416137-4-heiko@sntech.de
2025-11-06 00:24:48 +01:00
Heiko Stuebner
ff8912700f arm64: dts: rockchip: use SCMI clock id for cpu clock on rk356x
Instead of hard-coding 0, use the more descriptive ID from the binding
to reference the SCMI clock for the cpu on rk356x.

Reviewed-by: Diederik de Haas <diederik@cknow-tech.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20251103234926.416137-3-heiko@sntech.de
2025-11-06 00:24:48 +01:00
Heiko Stuebner
6c30efe540 Merge branch 'v6.19-shared/clkids' into v6.19-armsoc/dts64 2025-11-06 00:24:17 +01:00
Heiko Stuebner
88b6a93af4 dt-bindings: clock: rk3568: Add SCMI clock ids
The Trusted Firmware on RK3568 exposes 3 clocks via the SCMI clock
interface. Add descriptive IDs for them.

The clock ids are used in both the older vendor-binary TF-A, as well
as the recently merged upstream SCMI clock implementation.

Link: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/31265
Reviewed-by: Diederik de Haas <diederik@cknow-tech.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251103234926.416137-2-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-11-06 00:02:47 +01:00
Yosry Ahmed
3d80f4c93d KVM: nSVM: Avoid incorrect injection of SVM_EXIT_CR0_SEL_WRITE
When emulating L2 instructions, svm_check_intercept() checks whether a
write to CR0 should trigger a synthesized #VMEXIT with
SVM_EXIT_CR0_SEL_WRITE. However, it does not check whether L1 enabled
the intercept for SVM_EXIT_WRITE_CR0, which has higher priority
according to the APM (24593—Rev.  3.42—March 2024, Table 15-7):

  When both selective and non-selective CR0-write intercepts are active at
  the same time, the non-selective intercept takes priority. With respect
  to exceptions, the priority of this intercept is the same as the generic
  CR0-write intercept.

Make sure L1 does NOT intercept SVM_EXIT_WRITE_CR0 before checking if
SVM_EXIT_CR0_SEL_WRITE needs to be injected.

Opportunistically tweak the "not CR0" logic to explicitly bail early so
that it's more obvious that only CR0 has a selective intercept, and that
modifying icpt_info.exit_code is functionally necessary so that the call
to nested_svm_exit_handled() checks the correct exit code.

Fixes: cfec82cb7d ("KVM: SVM: Add intercept check for emulated cr accesses")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251024192918.3191141-4-yosry.ahmed@linux.dev
[sean: isolate non-CR0 write logic, tweak comments accordingly]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 13:33:46 -08:00
Yosry Ahmed
5674a76db0 KVM: nSVM: Propagate SVM_EXIT_CR0_SEL_WRITE correctly for LMSW emulation
When emulating L2 instructions, svm_check_intercept() checks whether a
write to CR0 should trigger a synthesized #VMEXIT with
SVM_EXIT_CR0_SEL_WRITE. For MOV-to-CR0, SVM_EXIT_CR0_SEL_WRITE is only
triggered if any bit other than CR0.MP and CR0.TS is updated. However,
according to the APM (24593—Rev.  3.42—March 2024, Table 15-7):

  The LMSW instruction treats the selective CR0-write
  intercept as a non-selective intercept (i.e., it intercepts
  regardless of the value being written).

Skip checking the changed bits for x86_intercept_lmsw and always inject
SVM_EXIT_CR0_SEL_WRITE.

Fixes: cfec82cb7d ("KVM: SVM: Add intercept check for emulated cr accesses")
Cc: stable@vger.kernel.org
Reported-by: Matteo Rizzo <matteorizzo@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251024192918.3191141-3-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 13:32:36 -08:00
Yosry Ahmed
3d31bdf9cc KVM: nSVM: Remove redundant cases in nested_svm_intercept()
Both the CRx and DRx cases are doing exactly what the default case is
doing, remove them.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251024192918.3191141-2-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 13:32:36 -08:00
Yan Zhao
1e3a825c9e KVM: TDX: Fix list_add corruption during vcpu_load()
During vCPU creation, a vCPU may be destroyed immediately after
kvm_arch_vcpu_create() (e.g., due to vCPU id confiliction). However, the
vcpu_load() inside kvm_arch_vcpu_create() may have associate the vCPU to
pCPU via "list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu))"
before invoking tdx_vcpu_free().

Though there's no need to invoke tdh_vp_flush() on the vCPU, failing to
dissociate the vCPU from pCPU (i.e., "list_del(&to_tdx(vcpu)->cpu_list)")
will cause list corruption of the per-pCPU list associated_tdvcpus.

Then, a later list_add() during vcpu_load() would detect list corruption
and print calltrace as shown below.

Dissociate a vCPU from its associated pCPU in tdx_vcpu_free() for the vCPUs
destroyed immediately after creation which must be in
VCPU_TD_STATE_UNINITIALIZED state.

kernel BUG at lib/list_debug.c:29!
Oops: invalid opcode: 0000 [#2] SMP NOPTI
RIP: 0010:__list_add_valid_or_report+0x82/0xd0

Call Trace:
 <TASK>
 tdx_vcpu_load+0xa8/0x120
 vt_vcpu_load+0x25/0x30
 kvm_arch_vcpu_load+0x81/0x300
 vcpu_load+0x55/0x90
 kvm_arch_vcpu_create+0x24f/0x330
 kvm_vm_ioctl_create_vcpu+0x1b1/0x53
 kvm_vm_ioctl+0xc2/0xa60
  __x64_sys_ioctl+0x9a/0xf0
 x64_sys_call+0x10ee/0x20d0
 do_syscall_64+0xc3/0x470
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: d789fa6efa ("KVM: TDX: Handle vCPU dissociation")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-29-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:17:33 -08:00
Sean Christopherson
ad44aa4c5d KVM: TDX: Bug the VM if extending the initial measurement fails
WARN and terminate the VM if TDH_MR_EXTEND fails, as extending the
measurement should fail if and only if there is a KVM bug, or if the S-EPT
mapping is invalid.  Now that KVM makes all state transitions mutually
exclusive via tdx_vm_state_guard, it should be impossible for S-EPT
mappings to be removed between kvm_tdp_mmu_map_private_pfn() and
tdh_mr_extend().

Holding slots_lock prevents zaps due to memslot updates,
filemap_invalidate_lock() prevents zaps due to guest_memfd PUNCH_HOLE,
vcpu->mutex locks prevents updates from other vCPUs, kvm->lock prevents
VM-scoped ioctls from creating havoc (e.g. by creating new vCPUs), and all
usage of kvm_zap_gfn_range() is mutually exclusive with S-EPT entries that
can be used for the initial image.

For kvm_zap_gfn_range(), the call from sev.c is obviously mutually
exclusive, TDX disallows KVM_X86_QUIRK_IGNORE_GUEST_PAT so the same goes
for kvm_noncoherent_dma_assignment_start_or_stop(), and
__kvm_set_or_clear_apicv_inhibit() is blocked by virtue of holding all
VM and vCPU mutexes (and the APIC page has its own KVM-internal memslot
that is never created for TDX VMs, and so can't possibly be used for the
initial image, which means that too is mutually exclusive irrespective of
locking).

Opportunistically return early if the region doesn't need to be measured
in order to reduce line lengths and avoid wraps.  Similarly, immediately
and explicitly return if TDH_MR_EXTEND fails to make it clear that KVM
needs to bail entirely if extending the measurement fails.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-28-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:17:33 -08:00
Sean Christopherson
15945e9ec1 KVM: TDX: Guard VM state transitions with "all" the locks
Acquire kvm->lock, kvm->slots_lock, and all vcpu->mutex locks when
servicing ioctls that (a) transition the TD to a new state, i.e. when
doing INIT or FINALIZE or (b) are only valid if the TD is in a specific
state, i.e. when initializing a vCPU or memory region.  Acquiring "all"
the locks fixes several KVM_BUG_ON() situations where a SEAMCALL can fail
due to racing actions, e.g. if tdh_vp_create() contends with either
tdh_mr_extend() or tdh_mr_finalize().

For all intents and purposes, the paths in question are fully serialized,
i.e. there's no reason to try and allow anything remotely interesting to
happen.  Smack 'em with a big hammer instead of trying to be "nice".

Acquire kvm->lock to prevent VM-wide things from happening, slots_lock to
prevent kvm_mmu_zap_all_fast(), and _all_ vCPU mutexes to prevent vCPUs
from interefering.  Use the recently-renamed kvm_arch_vcpu_unlocked_ioctl()
to service the vCPU-scoped ioctls to avoid a lock inversion problem, e.g.
due to taking vcpu->mutex outside kvm->lock.

See also commit ecf371f8b0 ("KVM: SVM: Reject SEV{-ES} intra host
migration if vCPU creation is in-flight"), which fixed a similar bug with
SEV intra-host migration where an in-flight vCPU creation could race with
a VM-wide state transition.

Define a fancy new CLASS to handle the lock+check => unlock logic with
guard()-like syntax:

        CLASS(tdx_vm_state_guard, guard)(kvm);
        if (IS_ERR(guard))
                return PTR_ERR(guard);

to simplify juggling the many locks.

Note!  Take kvm->slots_lock *after* all vcpu->mutex locks, as per KVM's
soon-to-be-documented lock ordering rules[1].

Link: https://lore.kernel.org/all/20251016235538.171962-1-seanjc@google.com [1]
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/aLFiPq1smdzN3Ary@yzhao56-desk.sh.intel.com
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-27-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:17:32 -08:00
Sean Christopherson
f26061fe2c KVM: TDX: Don't copy "cmd" back to userspace for KVM_TDX_CAPABILITIES
Don't copy the kvm_tdx_cmd structure back to userspace when handling
KVM_TDX_CAPABILITIES, as tdx_get_capabilities() doesn't modify hw_error or
any other fields.

Opportunistically hoist the call to tdx_get_capabilities() outside of the
kvm->lock critical section, as getting the capabilities doesn't touch the
VM in any way, e.g. doesn't even take @kvm.

Suggested-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-26-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:17:31 -08:00
Sean Christopherson
0b76e827b2 KVM: TDX: Use guard() to acquire kvm->lock in tdx_vm_ioctl()
Use guard() in tdx_vm_ioctl() to tidy up the code a small amount, but more
importantly to minimize the diff of a future change, which will use
guard-like semantics to acquire and release multiple locks.

No functional change intended.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-25-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:17:30 -08:00
Sean Christopherson
94428e3ba3 KVM: TDX: Convert INIT_MEM_REGION and INIT_VCPU to "unlocked" vCPU ioctl
Handle the KVM_TDX_INIT_MEM_REGION and KVM_TDX_INIT_VCPU vCPU sub-ioctls
in the unlocked variant, i.e. outside of vcpu->mutex, in anticipation of
taking kvm->lock along with all other vCPU mutexes, at which point the
sub-ioctls _must_ start without vcpu->mutex held.

No functional change intended.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-24-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:17:30 -08:00
Sean Christopherson
59d5c1ed6d KVM: TDX: Add tdx_get_cmd() helper to get and validate sub-ioctl command
Add a helper to copy a kvm_tdx_cmd structure from userspace and verify
that must-be-zero fields are indeed zero.

No functional change intended.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-23-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:17:29 -08:00
Sean Christopherson
3d626ce5a8 KVM: TDX: Add macro to retry SEAMCALLs when forcing vCPUs out of guest
Add a macro to handle kicking vCPUs out of the guest and retrying
SEAMCALLs on TDX_OPERAND_BUSY instead of providing small helpers to be
used by each SEAMCALL.  Wrapping the SEAMCALLs in a macro makes it a
little harder to tease out which SEAMCALL is being made, but
significantly reduces the amount of copy+paste code, and makes it all but
impossible to leave an elevated wait_for_sept_zap.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-22-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:17:07 -08:00
Sean Christopherson
2ff1411698 KVM: TDX: Assert that mmu_lock is held for write when removing S-EPT entries
Unconditionally assert that mmu_lock is held for write when removing S-EPT
entries, not just when removing S-EPT entries triggers certain conditions,
e.g. needs to do TDH_MEM_TRACK or kick vCPUs out of the guest.
Conditionally asserting implies that it's safe to hold mmu_lock for read
when those paths aren't hit, which is simply not true, as KVM doesn't
support removing S-EPT entries under read-lock.

Only two paths lead to remove_external_spte(), and both paths asserts that
mmu_lock is held for write (tdp_mmu_set_spte() via lockdep, and
handle_removed_pt() via KVM_BUG_ON()).

Deliberately leave lockdep assertions in the "no vCPUs" helpers to document
that wait_for_sept_zap is guarded by holding mmu_lock for write, and keep
the conditional assert in tdx_track() as well, but with a comment to help
explain why holding mmu_lock for write matters (above and beyond why
tdx_sept_remove_private_spte()'s requirements).

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:07:04 -08:00
Sean Christopherson
55560b6be5 KVM: TDX: Derive error argument names from the local variable names
When printing SEAMCALL errors, use the name of the variable holding an
error parameter instead of the register from whence it came, so that flows
which use descriptive variable names will similarly print descriptive
error messages.

Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:07:03 -08:00
Sean Christopherson
597d706870 KVM: TDX: Combine KVM_BUG_ON + pr_tdx_error() into TDX_BUG_ON()
Add TDX_BUG_ON() macros (with varying numbers of arguments) to deduplicate
the myriad flows that do KVM_BUG_ON()/WARN_ON_ONCE() followed by a call to
pr_tdx_error().  In addition to reducing boilerplate copy+paste code, this
also helps ensure that KVM provides consistent handling of SEAMCALL errors.

Opportunistically convert a handful of bare WARN_ON_ONCE() paths to the
equivalent of KVM_BUG_ON(), i.e. have them terminate the VM.  If a SEAMCALL
error is fatal enough to WARN on, it's fatal enough to terminate the TD.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:07:02 -08:00
Sean Christopherson
14c9938619 KVM: TDX: Fold tdx_sept_zap_private_spte() into tdx_sept_remove_private_spte()
Do TDH_MEM_RANGE_BLOCK directly in tdx_sept_remove_private_spte() instead
of using a one-off helper now that the nr_premapped tracking is gone.

Opportunistically drop the WARN on hugepages, which was dead code (see the
KVM_BUG_ON() in tdx_sept_remove_private_spte()).

No functional change intended.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:07:02 -08:00
Sean Christopherson
6b5b71ffab KVM: TDX: ADD pages to the TD image while populating mirror EPT entries
When populating the initial memory image for a TDX guest, ADD pages to the
TD as part of establishing the mappings in the mirror EPT, as opposed to
creating the mappings and then doing ADD after the fact.  Doing ADD in the
S-EPT callbacks eliminates the need to track "premapped" pages, as the
mirror EPT (M-EPT) and S-EPT are always synchronized, e.g. if ADD fails,
KVM reverts to the previous M-EPT entry (guaranteed to be !PRESENT).

Eliminating the hole where the M-EPT can have a mapping that doesn't exist
in the S-EPT in turn obviates the need to handle errors that are unique to
encountering a missing S-EPT entry (see tdx_is_sept_zap_err_due_to_premap()).

Keeping the M-EPT and S-EPT synchronized also eliminates the need to check
for unconsumed "premap" entries during tdx_td_finalize(), as there simply
can't be any such entries.  Dropping that check in particular reduces the
overall cognitive load, as the management of nr_premapped with respect
to removal of S-EPT is _very_ subtle.  E.g. successful removal of an S-EPT
entry after it completed ADD doesn't adjust nr_premapped, but it's not
clear why that's "ok" but having half-baked entries is not (it's not truly
"ok" in that removing pages from the image will likely prevent the guest
from booting, but from KVM's perspective it's "ok").

Doing ADD in the S-EPT path requires passing an argument via a scratch
field, but the current approach of tracking the number of "premapped"
pages effectively does the same.  And the "premapped" counter is much more
dangerous, as it doesn't have a singular lock to protect its usage, since
nr_premapped can be modified as soon as mmu_lock is dropped, at least in
theory.  I.e. nr_premapped is guarded by slots_lock, but only for "happy"
paths.

Note, this approach was used/tried at various points in TDX development,
but was ultimately discarded due to a desire to avoid stashing temporary
state in kvm_tdx.  But as above, KVM ended up with such state anyways,
and fully committing to using temporary state provides better access
rules (100% guarded by slots_lock), and makes several edge cases flat out
impossible.

Note #2, continue to extend the measurement outside of mmu_lock, as it's
a slow operation (typically 16 SEAMCALLs per page whose data is included
in the measurement), and doesn't *need* to be done under mmu_lock, e.g.
for consistency purposes.  However, MR.EXTEND isn't _that_ slow, e.g.
~1ms latency to measure a full page, so if it needs to be done under
mmu_lock in the future, e.g. because KVM gains a flow that can remove
S-EPT entries during KVM_TDX_INIT_MEM_REGION, then extending the
measurement can also be moved into the S-EPT mapping path (again, only if
absolutely necessary).  P.S. _If_ MR.EXTEND is moved into the S-EPT path,
take care not to return an error up the stack if TDH_MR_EXTEND fails, as
removing the M-EPT entry but not the S-EPT entry would result in
inconsistent state!

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:06:59 -08:00
Sean Christopherson
b4b2b6eda5 KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as
providing a one-off helper for effectively three lines of code is at best a
wash, and splitting the code makes the comment for smp_rmb()  _extremely_
confusing as the comment talks about reading kvm->arch.pre_fault_allowed
before kvm_tdx->state, but the immediately visible code does the exact
opposite.

Opportunistically rewrite the comments to more explicitly explain who is
checking what, as well as _why_ the ordering matters.

No functional change intended.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:05:53 -08:00
Sean Christopherson
af96d5452e KVM: TDX: Use atomic64_dec_return() instead of a poor equivalent
Use atomic64_dec_return() when decrementing the number of "pre-mapped"
S-EPT pages to ensure that the count can't go negative without KVM
noticing.  In theory, checking for '0' and then decrementing in a separate
operation could miss a 0=>-1 transition.  In practice, such a condition is
impossible because nr_premapped is protected by slots_lock, i.e. doesn't
actually need to be an atomic (that wart will be addressed shortly).

Don't bother trying to keep the count non-negative, as the KVM_BUG_ON()
ensures the VM is dead, i.e. there's no point in trying to limp along.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:05:52 -08:00
Sean Christopherson
24adff3970 KVM: TDX: Avoid a double-KVM_BUG_ON() in tdx_sept_zap_private_spte()
Return -EIO immediately from tdx_sept_zap_private_spte() if the number of
to-be-added pages underflows, so that the following "KVM_BUG_ON(err, kvm)"
isn't also triggered.  Isolating the check from the "is premap error"
if-statement will also allow adding a lockdep assertion that premap errors
are encountered if and only if slots_lock is held.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:05:51 -08:00
Sean Christopherson
b9d5cf6de0 KVM: TDX: WARN if mirror SPTE doesn't have full RWX when creating S-EPT mapping
Pass in the mirror_spte to kvm_x86_ops.set_external_spte() to provide
symmetry with .remove_external_spte(), and assert in TDX that the mirror
SPTE is shadow-present with full RWX permissions (the TDX-Module doesn't
allow the hypervisor to control protections).

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:05:51 -08:00
Sean Christopherson
7139c86065 KVM: x86/mmu: Drop the return code from kvm_x86_ops.remove_external_spte()
Drop the return code from kvm_x86_ops.remove_external_spte(), a.k.a.
tdx_sept_remove_private_spte(), as KVM simply does a KVM_BUG_ON() failure,
and that KVM_BUG_ON() is redundant since all error paths in TDX also do a
KVM_BUG_ON().

Opportunistically pass the spte instead of the pfn, as the API is clearly
about removing an spte.

Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:05:50 -08:00
Sean Christopherson
b836503300 KVM: TDX: Fold tdx_sept_drop_private_spte() into tdx_sept_remove_private_spte()
Fold tdx_sept_drop_private_spte() into tdx_sept_remove_private_spte() as a
step towards having "remove" be the one and only function that deals with
removing/zapping/dropping a SPTE, e.g. to avoid having to differentiate
between "zap", "drop", and "remove".  Eliminating the "drop" helper also
gets rid of what is effectively dead code due to redundant checks, e.g. on
an HKID being assigned.

No functional change intended.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:05:42 -08:00
Sean Christopherson
e6348c90dd KVM: TDX: Return -EIO, not -EINVAL, on a KVM_BUG_ON() condition
Return -EIO when a KVM_BUG_ON() is tripped, as KVM's ABI is to return -EIO
when a VM has been killed due to a KVM bug, not -EINVAL.  Note, many (all?)
of the affected paths never propagate the error code to userspace, i.e.
this is about internal consistency more than anything else.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:16 -08:00
Yan Zhao
ce7b569539 KVM: TDX: Drop superfluous page pinning in S-EPT management
Don't explicitly pin pages when mapping pages into the S-EPT, guest_memfd
doesn't support page migration in any capacity, i.e. there are no migrate
callbacks because guest_memfd pages *can't* be migrated.  See the WARN in
kvm_gmem_migrate_folio().

Eliminating TDX's explicit pinning will also enable guest_memfd to support
in-place conversion between shared and private memory[1][2].  Because KVM
cannot distinguish between speculative/transient refcounts and the
intentional refcount for TDX on private pages[3], failing to release
private page refcount in TDX could cause guest_memfd to indefinitely wait
on decreasing the refcount for the splitting.

Under normal conditions, not holding an extra page refcount in TDX is safe
because guest_memfd ensures pages are retained until its invalidation
notification to KVM MMU is completed. However, if there're bugs in KVM/TDX
module, not holding an extra refcount when a page is mapped in S-EPT could
result in a page being released from guest_memfd while still mapped in the
S-EPT.  But, doing work to make a fatal error slightly less fatal is a net
negative when that extra work adds complexity and confusion.

Several approaches were considered to address the refcount issue, including
  - Attempting to modify the KVM unmap operation to return a failure,
    which was deemed too complex and potentially incorrect[4].
  - Increasing the folio reference count only upon S-EPT zapping failure[5].
  - Use page flags or page_ext to indicate a page is still used by TDX[6],
    which does not work for HVO (HugeTLB Vmemmap Optimization).
  - Setting HWPOISON bit or leveraging folio_set_hugetlb_hwpoison()[7].

Due to the complexity or inappropriateness of these approaches, and the
fact that S-EPT zapping failure is currently only possible when there are
bugs in the KVM or TDX module, which is very rare in a production kernel,
a straightforward approach of simply not holding the page reference count
in TDX was chosen[8].

When S-EPT zapping errors occur, KVM_BUG_ON() is invoked to kick off all
vCPUs and mark the VM as dead. Although there is a potential window that a
private page mapped in the S-EPT could be reallocated and used outside the
VM, the loud warning from KVM_BUG_ON() should provide sufficient debug
information. To be robust against bugs, the user can enable panic_on_warn
as normal.

Link: https://lore.kernel.org/all/cover.1747264138.git.ackerleytng@google.com [1]
Link: https://youtu.be/UnBKahkAon4 [2]
Link: https://lore.kernel.org/all/CAGtprH_ypohFy9TOJ8Emm_roT4XbQUtLKZNFcM6Fr+fhTFkE0Q@mail.gmail.com [3]
Link: https://lore.kernel.org/all/aEEEJbTzlncbRaRA@yzhao56-desk.sh.intel.com [4]
Link: https://lore.kernel.org/all/aE%2Fq9VKkmaCcuwpU@yzhao56-desk.sh.intel.com [5]
Link: https://lore.kernel.org/all/aFkeBtuNBN1RrDAJ@yzhao56-desk.sh.intel.com [6]
Link: https://lore.kernel.org/all/diqzy0tikran.fsf@ackerleytng-ctop.c.googlers.com [7]
Link: https://lore.kernel.org/all/53ea5239f8ef9d8df9af593647243c10435fd219.camel@intel.com [8]
Suggested-by: Vishal Annapurve <vannapurve@google.com>
Suggested-by: Ackerley Tng <ackerleytng@google.com>
Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
[sean: extract out of hugepage series, massage changelog accordingly]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:15 -08:00
Sean Christopherson
6de2fb089b KVM: x86/mmu: Rename kvm_tdp_map_page() to kvm_tdp_page_prefault()
Rename kvm_tdp_map_page() to kvm_tdp_page_prefault() now that it's used
only by kvm_arch_vcpu_pre_fault_memory().

No functional change intended.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:14 -08:00
Sean Christopherson
fe7413e398 Revert "KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU"
Remove the helper and exports that were added to allow TDX code to reuse
kvm_tdp_map_page() for its gmem post-populate flow now that a dedicated
TDP MMU API is provided to install a mapping given a gfn+pfn pair.

This reverts commit 2608f10576.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:14 -08:00
Sean Christopherson
c1f173fb33 KVM: x86/mmu: WARN if KVM attempts to map into an invalid TDP MMU root
When mapping into the TDP MMU, WARN (if KVM_PROVE_MMU=y) if the root is
invalid, e.g. if KVM is attempting to insert a mapping without checking if
the information and MMU context is fresh.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:13 -08:00
Sean Christopherson
3ab3283dbb KVM: x86/mmu: Add dedicated API to map guest_memfd pfn into TDP MMU
Add and use a new API for mapping a private pfn from guest_memfd into the
TDP MMU from TDX's post-populate hook instead of partially open-coding the
functionality into the TDX code.  Sharing code with the pre-fault path
sounded good on paper, but it's fatally flawed as simulating a fault loses
the pfn, and calling back into gmem to re-retrieve the pfn creates locking
problems, e.g. kvm_gmem_populate() already holds the gmem invalidation
lock.

Providing a dedicated API will also removing several MMU exports that
ideally would not be exposed outside of the MMU, let alone to vendor code.
On that topic, opportunistically drop the kvm_mmu_load() export.  Leave
kvm_tdp_mmu_gpa_is_mapped() alone for now; the entire commit that added
kvm_tdp_mmu_gpa_is_mapped() will be removed in the near future.

Gate the API on CONFIG_KVM_GUEST_MEMFD=y as private memory _must_ be backed
by guest_memfd.  Add a lockdep-only assert to that the incoming pfn is
indeed backed by guest_memfd, and that the gmem instance's invalidate lock
is held (which, combined with slots_lock being held, obviates the need to
check for a stale "fault").

Cc: Michael Roth <michael.roth@amd.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20250709232103.zwmufocd3l7sqk7y@amd.com
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:12 -08:00
Sean Christopherson
5294a4b93e KVM: TDX: Drop PROVE_MMU=y sanity check on to-be-populated mappings
Drop TDX's sanity check that a mirror EPT mapping isn't zapped between
creating said mapping and doing TDH.MEM.PAGE.ADD, as the check is
simultaneously superfluous and incomplete.  Per commit 2608f10576
("KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU"), the
justification for introducing kvm_tdp_mmu_gpa_is_mapped() was to check
that the target gfn was pre-populated, with a link that points to this
snippet:

 : > One small question:
 : >
 : > What if the memory region passed to KVM_TDX_INIT_MEM_REGION hasn't been pre-
 : > populated?  If we want to make KVM_TDX_INIT_MEM_REGION work with these regions,
 : > then we still need to do the real map.  Or we can make KVM_TDX_INIT_MEM_REGION
 : > return error when it finds the region hasn't been pre-populated?
 :
 : Return an error.  I don't love the idea of bleeding so many TDX details into
 : userspace, but I'm pretty sure that ship sailed a long, long time ago.

But that justification makes little sense for the final code, as the check
on nr_premapped after TDH.MEM.PAGE.ADD will detect and return an error if
KVM attempted to zap a S-EPT entry (tdx_sept_zap_private_spte() will fail
on TDH.MEM.RANGE.BLOCK due lack of a valid S-EPT entry).  And as evidenced
by the "is mapped?" code being guarded with CONFIG_KVM_PROVE_MMU=y, KVM is
NOT relying on the check for general correctness.

The sanity check is also incomplete in the sense that mmu_lock is dropped
between the check and TDH.MEM.PAGE.ADD, i.e. will only detect KVM bugs that
zap SPTEs in a very specific window (note, this also applies to the check
on nr_premapped).

Removing the sanity check will allow removing kvm_tdp_mmu_gpa_is_mapped(),
which has no business being exposed to vendor code, and more importantly
will pave the way for eliminating the "pre-map" approach entirely in favor
of doing TDH.MEM.PAGE.ADD under mmu_lock.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:11 -08:00
Sean Christopherson
50efc2340a KVM: Rename kvm_arch_vcpu_async_ioctl() to kvm_arch_vcpu_unlocked_ioctl()
Rename the "async" ioctl API to "unlocked" so that upcoming usage in x86's
TDX code doesn't result in a massive misnomer.  To avoid having to retry
SEAMCALLs, TDX needs to acquire kvm->lock *and* all vcpu->mutex locks, and
acquiring all of those locks after/inside the current vCPU's mutex is a
non-starter.  However, TDX also needs to acquire the vCPU's mutex and load
the vCPU, i.e. the handling is very much not async to the vCPU.

No functional change intended.

Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:11 -08:00
Sean Christopherson
0a0da3f921 KVM: Make support for kvm_arch_vcpu_async_ioctl() mandatory
Implement kvm_arch_vcpu_async_ioctl() "natively" in x86 and arm64 instead
of relying on an #ifdef'd stub, and drop HAVE_KVM_VCPU_ASYNC_IOCTL in
anticipation of using the API on x86.  Once x86 uses the API, providing a
stub for one architecture and having all other architectures opt-in
requires more code than simply implementing the API in the lone holdout.

Eliminating the Kconfig will also reduce churn if the API is renamed in
the future (spoiler alert).

No functional change intended.

Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030200951.3402865-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05 11:03:10 -08:00
Niklas Cassel
21ef26d0e7 arm64: tegra: Add pinctrl definitions for pcie-ep nodes
When the PCIe controller is running in endpoint mode, the controller
initialization is triggered by a PERST# (PCIe reset) GPIO deassertion.

The driver has configured an IRQ to trigger when the PERST# GPIO changes
state. Without the pinctrl definition, we do not get an IRQ when PERST#
is deasserted, so the PCIe controller never gets initialized.

Add the missing definitions, so that the controller actually gets
initialized.

Fixes: ec142c44b0 ("arm64: tegra: Add P2U and PCIe controller nodes to Tegra234 DT")
Fixes: 0580286d0d ("arm64: tegra: Add Tegra234 PCIe C4 EP definition")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
[treding@nvidia.com: add blank lines to separate blocks]
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-05 14:36:12 +01:00
Frank Wunderlich
b88827cb0b dt-bindings: arm: mediatek: add BPI-R4 Pro board
Add compatibles for Bananapi R4 Pro boards.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-05 13:56:18 +01:00
Jason-JH Lin
355531a5ff arm64: dts: mediatek: Add GCE header for MT8196
Add GCE header define for GCE Thread Priority and GCE Event IDs
that used in the MT8196 dtsi.

Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-05 10:14:15 +01:00
Sjoerd Simons
c6dcf32830 arm64: dts: mediatek: mt7981b: Add reserved memory for TF-A
Add memory range handled by ARM Trusted Firmware

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-05 10:02:27 +01:00
Sjoerd Simons
8bc650405f arm64: dts: mediatek: mt7981b: Configure UART0 pinmux
Add explicit pinctrl configuration for UART0

Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-11-05 10:02:27 +01:00
Kaustabh Chakraborty
cad767a9af arm64: dts: exynos7870-j6lte: enable display panel support
Enable DECON and DSI nodes, and add the compatible display panel and
appropriate panel timings for this device. Also, remove the
simple-framebuffer node in favor of the panel.

This device has a 720x1480 AMOLED Samsung AMS561RA01 panel with
S6E8AA5X01 controller.

Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-5-c1f77fb16b87@disroot.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05 08:23:56 +01:00
Kaustabh Chakraborty
ba2cd209cf arm64: dts: exynos7870-a2corelte: enable display panel support
Enable DECON and DSI nodes, and add the compatible display panel and
appropriate panel timings for this device. Also, remove the
simple-framebuffer node in favor of the panel.

This device has a 540x960 Synaptics TD4101 display panel.

Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-4-c1f77fb16b87@disroot.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05 08:23:53 +01:00
Kaustabh Chakraborty
35e69cc6bd arm64: dts: exynos7870-on7xelte: enable display panel support
Enable DECON and DSI nodes, and add the compatible display panel and
appropriate panel timings for this device. Also, remove the
simple-framebuffer node in favor of the panel.

This device has a 1080x1920 Synaptics TD4300 display panel.

Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-3-c1f77fb16b87@disroot.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05 08:23:48 +01:00
Kaustabh Chakraborty
fe02894a19 arm64: dts: exynos7870: add DSI support
Add devicetree nodes for MIPI PHYs, Samsung's DECON and DSIM blocks, and
DECON IOMMU devicetree nodes. Enables SoC support for hardware to be
able to drive a MIPI DSI display.

Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-2-c1f77fb16b87@disroot.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05 08:23:35 +01:00
Kaustabh Chakraborty
81280d39a2 dt-bindings: soc: samsung: exynos-pmu: allow mipi-phy subnode for Exynos7870 PMU
Exynos7870 PMU is already documented in schema. Add Exynos7870's PMU
compatible to the list of nodes which allow a MIPI PHY driver.

Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251031-exynos7870-drm-dts-v4-1-c1f77fb16b87@disroot.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-05 08:20:21 +01:00
Charan Pedumuru
333fa35fbd dt-bindings: mmc: ti,omap2430-sdhci: convert to DT schema
Convert TI OMAP SDHCI Controller binding to YAML format.
Changes during Conversion:
- Define new properties like "clocks", "clock-names",
  "pbias-supply" and "power-domains" to resolve dtb_check errors.
- Remove "pinctrl-names" and "pinctrl-<n>"
  from required as they are not necessary for all DTS files.
- Remove "ti,hwmods" property entirely from the YAML as the
  DTS doesn't contain this property for the given compatibles and the
  text binding is misleading.
- Add "clocks", "clock-names" and "max-frequency" to the required
  properties based on the compatible and the text binding doesn't mention
  these properties as required.
- Add missing strings like "default-rev11", "sdr12-rev11", "sdr25-rev11",
  "hs-rev11", "sdr25-rev11" and "sleep" to pinctrl-names string array
  to resolve errors detected by dtb_check.

Signed-off-by: Charan Pedumuru <charan.pedumuru@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20251024-ti-sdhci-omap-v5-3-df5f6f033a38@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:58:07 -08:00
Charan Pedumuru
9485ba8ab3 arm: dts: ti: omap: Drop unnecessary properties for SDHCI node
Remove the "ti,needs-special-reset", "ti,needs-special-hs-handling", and
"cap-mmc-dual-data-rate" properties from the DTS for the sdhci nodes,
as the sdhci-omap driver does not depend on these properties.

Signed-off-by: Charan Pedumuru <charan.pedumuru@gmail.com>
Link: https://lore.kernel.org/r/20251024-ti-sdhci-omap-v5-2-df5f6f033a38@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:58:07 -08:00
Charan Pedumuru
bb2ffb47d8 arm: dts: ti: omap: am335x-pepper: Fix vmmc-supply property typo
Rectify a typo for the property "vmmc-supply" to resolve the errors
detected by dtb_check.

Signed-off-by: Charan Pedumuru <charan.pedumuru@gmail.com>
Link: https://lore.kernel.org/r/20251024-ti-sdhci-omap-v5-1-df5f6f033a38@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:58:07 -08:00
Jihed Chaibi
3862123e9b ARM: dts: omap3: n900: Correct obsolete TWL4030 power compatible
The "ti,twl4030-power-n900" compatible string is obsolete and is not
supported by any in-kernel driver. Currently, the kernel falls back to
the second entry, "ti,twl4030-power-idle-osc-off", to bind a driver to
this node.

Make this fallback explicit by removing the obsolete board-specific
compatible. This preserves the existing functionality while making the
DTS compliant with the new, stricter 'ti,twl.yaml' binding.

Fixes: daebabd578 ("mfd: twl4030-power: Fix PM idle pin configuration to not conflict with regulators")
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Link: https://lore.kernel.org/r/20250914192516.164629-4-jihed.chaibi.dev@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:55:07 -08:00
Jihed Chaibi
f7f3bc1830 ARM: dts: omap3: beagle-xm: Correct obsolete TWL4030 power compatible
The "ti,twl4030-power-beagleboard-xm" compatible string is obsolete and
is not supported by any in-kernel driver. Currently, the kernel falls
back to the second entry, "ti,twl4030-power-idle-osc-off", to bind a
driver to this node.

Make this fallback explicit by removing the obsolete board-specific
compatible. This preserves the existing functionality while making the
DTS compliant with the new, stricter 'ti,twl.yaml' binding.

Fixes: 9188883fd6 ("ARM: dts: Enable twl4030 off-idle configuration for selected omaps")
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Link: https://lore.kernel.org/r/20250914192516.164629-3-jihed.chaibi.dev@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:55:07 -08:00
Matthias Schiffer
5267fcd180 ARM: dts: omap: Add support for TQMa335x/MBa335x
TQMa335x[L] is a SoM family using TI AM335x CPU family.
MBa335x is an evaluation mainboard for this SoM.

Signed-off-by: Matthias Schiffer <matthias.schiffer@tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20250826140853.2570528-4-alexander.stein@ew.tq-group.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:50:40 -08:00
Matthias Schiffer
c579ced456 ARM: dts: omap: AM33xx: add cpu label
Board Device Trees often want to set the cpu0-supply. Provide a label to
reference the cpu@0 node.

Signed-off-by: Matthias Schiffer <matthias.schiffer@tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20250826140853.2570528-3-alexander.stein@ew.tq-group.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:50:40 -08:00
Matthias Schiffer
6cf18454d7 dt-bindings: omap: add AM335x-based TQMa335x SOM and MBa335x board
TQMa335x[L] is a SOM family using TI AM335x CPU family.
MBa335x is an evaluation mainboard for this SOM.

Signed-off-by: Matthias Schiffer <matthias.schiffer@tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20250826140853.2570528-2-alexander.stein@ew.tq-group.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:50:40 -08:00
Matthias Schiffer
3f61783920 ti-sysc: allow OMAP2 and OMAP4 timers to be reserved on AM33xx
am33xx.dtsi has the same clock setup as am35xx.dtsi, setting
ti,no-reset-on-init and ti,no-idle on timer1_target and timer2_target,
so AM33 needs the same workaround as AM35 to avoid ti-sysc probe
failing on certain target modules.

Signed-off-by: Matthias Schiffer <matthias.schiffer@tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20250825131114.2206804-1-alexander.stein@ew.tq-group.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:48:47 -08:00
Ali Khan
ae18c465b3 ARM: OMAP2+: Fix falg->flag typo in omap_smc2()
Fix a spelling error in the omap_smc2() function declaration
and in a corresponding comment within the assembly source.
This was reported via bugzilla in 2019.

Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205891
Signed-off-by: Ali Khan <abdulalikhan1337@gmail.com>
Link: https://lore.kernel.org/r/20251007-omap-falg-fix-v1-1-545a1de74a0a@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:47:46 -08:00
Yegor Yefremov
d0c4b1723c ARM: dts: am335x-netcom-plus-2xx: add missing GPIO labels
Fixes: 8e9d75fd2e ("ARM: dts: am335x-netcom: add GPIO names for NetCom Plus 2-port devices")

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20251007103851.3765678-1-yegorslists@googlemail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:47:03 -08:00
Yegor Yefremov
473de5d346 ARM: dts: ti: omap: am335x-baltos: add a regulator for the mPCIe slot
Baltos device provide a mPCIe slot that can be power cycled via a GPIO.

Add a userspace consumer mpcie-power-switch that references a fixed
regulator attached to the GPIO3_4.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20251007085037.3605676-1-yegorslists@googlemail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:47:00 -08:00
Yegor Yefremov
905664e276 ARM: dts: omap: rework Baltos LED nodes
Add labels to the Baltos LED nodes.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20251006093517.3075431-1-yegorslists@googlemail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:46:58 -08:00
Geert Uytterhoeven
9c95fc710b ARM: dts: am33xx: Add missing serial console speed
Without a serial console speed specified in chosen/stdout-path in the
DTB, the serial console uses the default speed of the serial driver,
unless explicitly overridden in a legacy console= kernel command-line
parameter.

After dropping "ti,omap3-uart" from the list of compatible values in DT,
AM33xx serial ports can no longer be used with the legacy OMAP serial
driver, but only with the OMAP-flavored 8250 serial driver (which is
mutually-exclusive with the former).  However, replacing
CONFIG_SERIAL_OMAP=y by CONFIG_SERIAL_8250_OMAP=y (with/without enabling
CONFIG_SERIAL_8250_OMAP_TTYO_FIXUP) may not be sufficient to restore
serial console functionality: the legacy OMAP serial driver defaults to
115200 bps, while the 8250 serial driver defaults to 9600 bps, causing
no visible output on the serial console when no appropriate console=
kernel command-line parameter is specified.

Fix this for all AM33xx boards by adding ":115200n8" to
chosen/stdout-path.  This requires replacing the "&uartN" reference by
the corresponding "serialN" DT alias.

Fixes: ca8be8fc2c ("ARM: dts: am33xx-l4: fix UART compatible")
Fixes: 077e1cde78 ("ARM: omap2plus_defconfig: Enable 8250_OMAP")
Closes: https://lore.kernel.org/CAMuHMdUb7Jb2=GqK3=Rn+Gv5G9KogcQieqDvjDCkJA4zyX4VcA@mail.gmail.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
Tested-by: Matti Vaittinen <mazziesaccount@gmail.com>
Reviewed-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Link: https://lore.kernel.org/r/63cef5c3643d359e8ec13366ca79377f12dd73b1.1759398641.git.geert+renesas@glider.be
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-04 14:46:49 -08:00
Adrian Ng Ho Yin
aef9703dcb arm64: dts: socfpga: agilex5: Add SMMU-V3-PMCG nodes
Add SMMU-V3 Performance Monitoring Counter Group (PMCG) nodes for
Agilex5 to support SMMU performance event monitoring.

Signed-off-by: Adrian Ng Ho Yin <adrianhoyin.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-04 15:25:44 -06:00
Adrian Ng Ho Yin
3e99d51aaa arm64: dts: socfpga: agilex5: Add L2 and L3 cache
Add L2 and L3 cache nodes to the device tree to resolve the
"unable to detect cache hierarchy" warning reported by cacheinfo.

Signed-off-by: Adrian Ng Ho Yin <adrianhoyin.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-04 15:25:44 -06:00
Kai Huang
6422060aa9 KVM: x86/mmu: Move the misplaced export of kvm_zap_gfn_range()
Currently, the export of kvm_zap_gfn_range() is misplaced, i.e., it's
not placed right after the kvm_zap_gfn_range() function body but after
kvm_mmu_zap_collapsible_spte().  Move it to the right place.

No functional change intended.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251021114345.159372-1-kai.huang@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:51:06 -08:00
Sean Christopherson
65a70164ab KVM: x86: Add a helper to dedup reporting of unhandled VM-Exits
Add and use a helper, kvm_prepare_unexpected_reason_exit(), to dedup the
code that fills the exit reason and CPU when KVM encounters a VM-Exit that
KVM doesn't know how to handle.

Reviewed-by: yaoyuan@linux.alibaba.com
Reviewed-by: Yao Yuan <yaoyuan@linux.alibaba.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251030185004.3372256-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-04 09:14:47 -08:00
Dinh Nguyen
2f6da95cfb arm64: dts: socfpga: agilex5: fix CHECK_DTBS warning for NAND
Add the required clock-names property NAND controller. This change corrects
the warning:

socfpga_agilex5_socdk_nand.dtb: nand-controller@10b80000 (cdns,hp-nfc):
 'clock-names' is a required property

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-11-04 10:03:32 -06:00
Jingyi Wang
682921ab33 dt-bindings: firmware: qcom,scm: Document SCM on Kaanapali SOC
Document SCM compatible for the Qualcomm Kaanapali SoC.

Reviewed-by: Eugen Hristev <eugen.hristev@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251102-knp-soc-binding-v3-2-11255ec4a535@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 21:18:21 -06:00
Kathiravan Thirumoorthy
db252c1056 soc: qcom: socinfo: add support to extract more than 32 image versions
SMEM_IMAGE_VERSION_TABLE contains the version of the first 32 images.
Add images beyond that and read these from SMEM_IMAGE_VERSION_TABLE_2.

Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251031-image-crm-part2-v2-2-c224c45c381a@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 21:14:17 -06:00
Kathiravan Thirumoorthy
d403276969 soc: qcom: smem: drop the WARN_ON() on SMEM item validation
When a SMEM item is allocated or retrieved, sanity check on the SMEM item
is performed and backtrace is printed if it is invalid. But there is no
benefit in dumping that information in the logs. Lets drop it.

Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251031-image-crm-part2-v2-1-c224c45c381a@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 21:14:16 -06:00
Luo Jie
4e13c6aed8 arm64: defconfig: Build NSS clock controller driver for IPQ5424
NSS clock controller is needed for supplying clocks and resets to the
networking blocks for the Ethernet functions on the IPQ5424 platforms.

All boards based on the IPQ5424 SoC will require this driver to be enabled.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://lore.kernel.org/r/20251014-qcom_ipq5424_nsscc-v7-10-081f4956be02@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 21:12:57 -06:00
Akhil P Oommen
85d55d8cc3 soc: qcom: ubwc: Add config for Kaanapali
Add the ubwc configuration for Kaanapali chipset. This chipset brings
support for UBWC v6 version. The rest of the configurations remains
as usual.

Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250930-kaana-gpu-support-v1-1-73530b0700ed@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 21:11:17 -06:00
David Heidelberg
a3da84c36b arm64: dts: qcom: sdm845-oneplus: Describe TE gpio
Describe panel Tearing Effect (TE) GPIO line.

Signed-off-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251103-s6e3fc2x01-v6-4-d4eb4abaefa4@ixit.cz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 19:03:56 -06:00
David Heidelberg
8dda2fecf7 arm64: dts: qcom: sdm845-oneplus: Implement panel sleep pinctrl
We can DSI pin from 8mA to 2mA while suspend, do it.

In theory, should give us extra 2 hours of idle battery life.

cosmetic: sort pinctrl properties.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251103-s6e3fc2x01-v6-3-d4eb4abaefa4@ixit.cz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 19:03:56 -06:00
David Heidelberg
6c55c3c261 arm64: dts: qcom: sdm845-oneplus: Group panel pinctrl
As these pins won't be used outside the group, let's group them.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251103-s6e3fc2x01-v6-2-d4eb4abaefa4@ixit.cz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 19:03:55 -06:00
Casey Connolly
69b8bbde23 arm64: dts: qcom: sdm845-oneplus: Update compatbible and add DDIC supplies
Update the compatible to reflect combination of DDIC and panel.

Original compatible describing only the DDIC used, but omit describing
the panel used (Samsung AMS641RW), which we have no way to detect.

There are two additional supplies used by the panel, both are GPIO
controlled and are left enabled by the bootloader for continuous splash.

Previously these were (incorrectly) modelled as pinctrl. Describe them
properly so that the panel can control them.

Fixes: 288ef8a426 ("arm64: dts: sdm845: add oneplus6/6t devices")
Signed-off-by: Casey Connolly <casey.connolly@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Co-developed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251103-s6e3fc2x01-v6-1-d4eb4abaefa4@ixit.cz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 19:03:55 -06:00
Komal Bajaj
4036080362 soc: qcom: socinfo: Add SoC ID for QCS6490
Add SoC ID table entry for Qualcomm QCS6490.

Signed-off-by: Komal Bajaj <komal.bajaj@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251103-qcs6490_soc_id-v1-2-c139dd1e32c8@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 18:28:52 -06:00
Komal Bajaj
98c92de40f dt-bindings: arm: qcom,ids: Add SoC ID for QCS6490
Add unique ID for Qualcomm QCS6490 SoC.

Signed-off-by: Komal Bajaj <komal.bajaj@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251103-qcs6490_soc_id-v1-1-c139dd1e32c8@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 18:28:52 -06:00
Neeraj Soni
d92ebadda5 soc: qcom: ice: Add HWKM v1 support for wrapped keys
HWKM v1 and v2 differ slightly in wrapped key size and the bit fields for
certain status registers and operating mode (legacy or standard).

Add support to select HWKM version based on the major and minor revisions.
Use this HWKM version to select wrapped key size and to configure the bit
fields in registers for operating modes and hardware status.

Support for SCM calls for wrapped keys is being added in the TrustZone for
few SoCs with HWKM v1. Existing check of qcom_scm_has_wrapped_key_support()
API ensures that HWKM is used only if these SCM calls are supported in
TrustZone for that SoC.

Signed-off-by: Neeraj Soni <neeraj.soni@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251030161012.3391239-1-neeraj.soni@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 18:21:56 -06:00
Bjorn Andersson
72a63169bf arm64: dts: qcom: qcs6490-rb3gen2: Rename vph-pwr regulator node
When fixed regulators are not named with "regulator-" prefix, they can
not be neatly grouped and sorted together.

Rename the vph-pwr-regulator, to facilitate the incoming addition of
additional fixed regulators.

Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251102-rb3gen2-regulator-sort-v1-1-908879d240be@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-03 18:13:31 -06:00
Sean Christopherson
83e0e12219 KVM: selftests: Rename "guest_paddr" variables to "gpa"
Rename "guest_paddr" variables in vm_userspace_mem_region_add() and
vm_mem_add() to KVM's de facto standard "gpa", both for consistency and
to shorten line lengths.

Opportunistically fix the indentation of the
vm_userspace_mem_region_add() declaration.

Link: https://patch.msgid.link/20251007223625.369939-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-03 12:54:21 -08:00
Thomas Richard (TI.com)
999e9bc953 firmware: ti_sci: Set IO Isolation only if the firmware is capable
Prevent calling ti_sci_cmd_set_io_isolation() on firmware
that does not support the IO_ISOLATION capability. Add the
MSG_FLAG_CAPS_IO_ISOLATION capability flag and check it before
attempting to set IO isolation during suspend/resume operations.

Without this check, systems with older firmware may experience
undefined behavior or errors when entering/exiting suspend states.

Fixes: ec24643bdd ("firmware: ti_sci: Add system suspend and resume call")
Signed-off-by: Thomas Richard (TI.com) <thomas.richard@bootlin.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Link: https://patch.msgid.link/20251031-ti-sci-io-isolation-v2-1-60d826b65949@bootlin.com
Signed-off-by: Nishanth Menon <nm@ti.com>
2025-11-03 09:09:02 -06:00
Thomas Richard (TI.com)
7a4cd87e76 firmware: ti_sci: Replace ifdeffery by pm_sleep_ptr() macro
Using pm_sleep_ptr() macro allows to remove ifdeffery and
'__maybe_unused' annotations.

Signed-off-by: Thomas Richard (TI.com) <thomas.richard@bootlin.com>
Link: https://patch.msgid.link/20251014-ti-sci-pm-ops-cleanup-v1-1-70b50b73ac85@bootlin.com
Signed-off-by: Nishanth Menon <nm@ti.com>
2025-11-03 09:08:53 -06:00
Tudor Ambarus
bea18c67a0 soc: samsung: exynos-chipid: use a local dev variable
Use a local variable for struct device to avoid dereferencing.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Link: https://patch.msgid.link/20251031-gs101-chipid-v1-2-d78d1076b210@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-11-03 11:02:01 +01:00
Hrushikesh Salunke
56bf2702ca arm64: dts: ti: k3-j784s4-j742s2-evm-common: Add bootph-all tag to "pcie1_ctrl"
J784S4 SoC has two instances of PCIe which are PCIe0 and PCIe1. J784S4
SoC uses PCIe1 instance for PCIe boot process. To configure PCIe1 at
all boot stages "pcie1_ctrl" also needs to be present at all boot
stages. Thus add the "bootph-all" boot phase tag to "pcie1_ctrl" device
tree node.

Signed-off-by: Hrushikesh Salunke <h-salunke@ti.com>
Link: https://patch.msgid.link/20251017084654.2929945-4-h-salunke@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-03 14:33:59 +05:30
Hrushikesh Salunke
cadd9234ae arm64: dts: ti: k3-j784s4-j742s2-evm-common: Add bootph-all tag to SERDES0
J784S4 SoC has two instances of PCIe which are PCIe0 and PCIe1. PCIe1
instance is used for PCIe boot process. J784S4 SoC has four instances
of 4-lane SERDES. Out of which SERDES0 is used as PHY for PCIe1. So it
needs to be functional at all stages of PCIe boot process. Thus add the
"bootph-all" boot phase tag to nodes required to enable SERDES0 at all
boot stages.

Signed-off-by: Hrushikesh Salunke <h-salunke@ti.com>
Link: https://patch.msgid.link/20251017084654.2929945-3-h-salunke@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-03 14:33:59 +05:30
Hrushikesh Salunke
1f03b9e71e arm64: dts: ti: k3-j784s4-evm-pcie0-pcie1-ep: Add boot phase tag to "pcie1_ep"
J784S4 SoC has two instances of PCIe which are PCIe0 and PCIe1. J784S4
SoC uses PCIe1 instance for PCIe boot process. So it needs to be in
endpoint mode and it needs to be functional at all stages of PCIe boot
process. Thus add the "bootph-all" boot phase tag to "pcie1_ep" device
tree node.

Signed-off-by: Hrushikesh Salunke <h-salunke@ti.com>
Link: https://patch.msgid.link/20251017084654.2929945-2-h-salunke@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-11-03 14:33:58 +05:30
Raymond Hackley
84df51667a ARM: dts: qcom: msm8226-samsung-ms013g: add simple-framebuffer
Add simple-framebuffer, necessary power-domains and clocks for ms013g.

Signed-off-by: Raymond Hackley <raymondhackley@protonmail.com>
Link: https://lore.kernel.org/r/20251011185750.743615-1-raymondhackley@protonmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 12:07:08 -06:00
Griffin Kroah-Hartman
1cf6be79a8 arm64: dts: qcom: qcm6490-fairphone-fp5: Add UW cam actuator
Add a node for the Dongwoon DW9800K actuator, used for focus of the
ultra-wide camera sensor.

Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Griffin Kroah-Hartman <griffin.kroah@fairphone.com>
Link: https://lore.kernel.org/r/20251016-dw9800-driver-v3-4-d7058f72ead4@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 11:29:54 -06:00
Griffin Kroah-Hartman
6678d5cf48 arm64: dts: qcom: qcm6490-fairphone-fp5: Enable CCI pull-up
Enable vreg_l6p, which is the voltage source for the pull-up resistor of
the CCI busses.

This ensures that I2C communication works as expected.

Reviewed-by: Bryan O'Donoghue <bod@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Griffin Kroah-Hartman <griffin.kroah@fairphone.com>
Link: https://lore.kernel.org/r/20251016-dw9800-driver-v3-3-d7058f72ead4@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 11:29:54 -06:00
Antony Kurniawan Soemardi
d35984295d ARM: dts: qcom: msm8960: rename msmgpio node to tlmm
Rename the GPIO controller node from "msmgpio" to "tlmm" to match the
convention used by other Qualcomm SoCs.

Suggested-by: Shinjo Park <peremen@gmail.com>
Signed-off-by: Antony Kurniawan Soemardi <linux@smankusors.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-msm8960-reorder-v2-5-26c478366d21@smankusors.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 11:27:08 -06:00
Antony Kurniawan Soemardi
47f46fa732 ARM: dts: qcom: msm8960: add I2C nodes for gsbi1 and gsbi8
These are present on msm8960 and are required for devices such as the
Casio G'zOne, which has NFC wired to gsbi1 and audio amplifier wired to
gsbi8.

The nodes are added disabled by default.

Co-developed-by: Shinjo Park <peremen@gmail.com>
Signed-off-by: Shinjo Park <peremen@gmail.com>
Signed-off-by: Antony Kurniawan Soemardi <linux@smankusors.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-msm8960-reorder-v2-4-26c478366d21@smankusors.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 11:27:08 -06:00
Antony Kurniawan Soemardi
f239a394f9 ARM: dts: qcom: msm8960: add I2C nodes for gsbi10 and gsbi12
These are present on msm8960 and are required for devices such as the
Sony Xperia SP, which has NFC wired to gsbi10 and various motion
sensors wired to gsbi12.

The nodes are added disabled by default.

Signed-off-by: Antony Kurniawan Soemardi <linux@smankusors.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-msm8960-reorder-v2-3-26c478366d21@smankusors.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 11:27:08 -06:00
Antony Kurniawan Soemardi
34fc20c484 ARM: dts: qcom: msm8960: inline qcom-msm8960-pins.dtsi
Inline qcom-msm8960-pins.dtsi into the main SoC dtsi. Most Qualcomm
SoCs embed their TLMM definitions directly, with only msm8960 and
apq8064 using a separate pins file. After this change, only apq8064
remains split.

This is a cosmetic change only, with no functional impact.

Tested-by: Rudraksha Gupta <guptarud@gmail.com>
Tested-by: Shinjo Park <peremen@gmail.com>
Signed-off-by: Antony Kurniawan Soemardi <linux@smankusors.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-msm8960-reorder-v2-2-26c478366d21@smankusors.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 11:27:08 -06:00
Antony Kurniawan Soemardi
df41d58048 ARM: dts: qcom: msm8960: reorder nodes and properties
Reorder the nodes in qcom-msm8960.dtsi by unit address and sort
properties, as recommended in the Devicetree style guide. This is a
cosmetic change only, with no functional impact.

Tested-by: Rudraksha Gupta <guptarud@gmail.com>
Tested-by: Shinjo Park <peremen@gmail.com>
Signed-off-by: Antony Kurniawan Soemardi <linux@smankusors.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-msm8960-reorder-v2-1-26c478366d21@smankusors.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 11:27:08 -06:00
Wesley Cheng
530f824a4e arm64: dts: qcom: sm8750: Add USB support for SM8750 QRD platform
Enable USB support on SM8750 QRD variant. Add the PMIC glink node with
connector to enable role switch support.

Signed-off-by: Wesley Cheng <wesley.cheng@oss.qualcomm.com>
[Konrad: Provided diff to flatten USB node on MTP]
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Co-developed-by: Jishnu Prakash <jishnu.prakash@oss.qualcomm.com>
Signed-off-by: Jishnu Prakash <jishnu.prakash@oss.qualcomm.com>
Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251101174437.1267998-4-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 09:58:02 -06:00
Wesley Cheng
2340f29e2b arm64: dts: qcom: sm8750: Add USB support for SM8750 MTP platform
Enable USB support on SM8750 MTP variant. Add the PMIC glink node with
connector to enable role switch support.

Signed-off-by: Wesley Cheng <wesley.cheng@oss.qualcomm.com>
[Konrad: Provided diff to flatten USB node on MTP]
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Co-developed-by: Jishnu Prakash <jishnu.prakash@oss.qualcomm.com>
Signed-off-by: Jishnu Prakash <jishnu.prakash@oss.qualcomm.com>
Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251101174437.1267998-3-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 09:58:02 -06:00
Wesley Cheng
7a5bb9f605 arm64: dts: qcom: sm8750: Add USB support to SM8750 SoCs
SM8750 chipset contains a single DWC3 USB3 controller (rev. 200a), SS
QMP PHY (rev. v8) and M31 eUSB2 PHY.  The major difference for SM8750 is
the transition to using the M31 eUSB2 PHY compared to previous SoCs.

Enable USB support on SM8750 MTP and QRD variants. SM8750 has a QMP combo
PHY for the SSUSB path, and a M31 eUSB2 PHY for the HSUSB path.

Signed-off-by: Wesley Cheng <wesley.cheng@oss.qualcomm.com>
[Konrad: Provided diff to flattened USB node]
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251101174437.1267998-2-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-02 09:58:02 -06:00
Chen Wang
579d6526aa riscv: sophgo: dts: enable PCIe for SG2042_EVB_V2.0
Enable PCIe controllers for Sophgo SG2042_EVB_V2.0 board,
which uses SG2042 SoC.

Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/r/50a753f9b8cbd5a90b5b2df737f87fc77a9b33a7.1760929111.git.unicorn_wang@outlook.com
Tested-by: Han Gao <rabenda.cn@gmail.com>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-02 07:42:29 +08:00
Chen Wang
c6c215099e riscv: sophgo: dts: enable PCIe for SG2042_EVB_V1.X
Enable PCIe controllers for Sophgo SG2042_EVB_V1.X board,
which uses SG2042 SoC.

Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/r/1ad96631cc9d9d7403a2bed5585d856fa101a2ef.1760929111.git.unicorn_wang@outlook.com
Tested-by: Han Gao <rabenda.cn@gmail.com>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-02 07:42:08 +08:00
Chen Wang
b85ad0d06a riscv: sophgo: dts: enable PCIe for PioneerBox
Enable PCIe controllers for PioneerBox, which uses SG2042 SoC.

Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/r/ec474c5eefb79626dd6a4d65454da9109aaf2f4d.1760929111.git.unicorn_wang@outlook.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-02 07:42:06 +08:00
Chen Wang
4e27aca488 riscv: sophgo: dts: add PCIe controllers for SG2042
Add PCIe controller nodes in DTS for Sophgo SG2042.
Default they are disabled.

Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/r/b34d819cd763482e0ecbc5c5ea721f0101d1f844.1760929111.git.unicorn_wang@outlook.com
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
2025-11-02 07:42:02 +08:00
Christian Marangi
7a94d5f31b soc: qcom: smem: better track SMEM uninitialized state
There is currently a problem where, in the specific case of SMEM not
initialized by SBL, any SMEM API wrongly returns PROBE_DEFER
communicating wrong info to any user of this API.

A better way to handle this would be to track the SMEM state and return
a different kind of error than PROBE_DEFER.

Rework the __smem handle to always init it to the error pointer
-EPROBE_DEFER following what is already done by the SMEM API.
If we detect that the SBL didn't initialized SMEM, set the __smem handle
to the error pointer -ENODEV.
Also rework the SMEM API to handle the __smem handle to be an error
pointer and return it appropriately.

This way user of the API can react and return a proper error or use
fallback way for the failing API.

While at it, change the return error when SMEM is not initialized by SBL
also to -ENODEV to make it consistent with the __smem handle and use
dev_err_probe() helper to return the message.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20251031130835.7953-3-ansuelsmth@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-01 12:46:12 -05:00
Christian Marangi
652a86b24c err.h: add INIT_ERR_PTR() macro
Add INIT_ERR_PTR() macro to initialize static variables with error
pointers. This might be useful for specific case where there is a static
variable initialized to an error condition and then later set to the
real handle once probe finish/completes.

This is to handle compilation problems like:

error: initializer element is not constant

where ERR_PTR() can't be used.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20251031130835.7953-2-ansuelsmth@gmail.com
[bjorn: Added () suffix on macro references]
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-11-01 12:44:49 -05:00
Johan Jonker
ac7116a626 ARM: dts: rockchip: Add spi_flash label to rk3288-veyron
The u-boot,spl-boot-order property requires a label at a boot
device node. In order to migrate to OF_UPSTREAM more easier
add a spi_flash label to the rk3288-veyron.dtsi file.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://patch.msgid.link/8a423a6f-bfdc-4947-aef9-35ee7c4f6ca2@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-31 23:45:35 +01:00
Johan Jonker
cafaf99e4a ARM: dts: rockchip: Remove mshc aliases from RK3288
The use of mshc aliases is deprecated for some while,
so remove them from the rk3288.dtsi file.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://patch.msgid.link/3e461bce-e8e6-48f1-a296-34040447ff25@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-31 23:44:12 +01:00
Johan Jonker
130daaaf10 ARM: dts: rockchip: Adapt tps65910 nodes on RK3066 boards
The binding for tps65910 has been converted to yaml and instead of the
deprecated regulator-compatible, the node-names are now used to identify
the individual regulators. Also some additional required properties
were added.

Adapt the tps65910 nodes on Rockchip boards to adhere to the updated
binding, which also allows us to drop the tps65910.dtsi include.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://patch.msgid.link/b3d05df4-a916-48e1-8d9e-590782806bd5@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-31 23:42:54 +01:00
Niravkumar L Rabara
95853aaab9 arm64: dts: socfpga: agilex5: add support for 013b board
Agilex5 SoCFPGA 013b is a small form factor development kit.
Supports both tabletop and PCIe add-in card operation. It features
expansion headers for Raspberry Pi 4/5 HATs and Digilent Pmod modules,
enabling integration with popular ecosystems.

Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-31 10:13:46 -05:00
Niravkumar L Rabara
58676e819b dt-bindings: intel: Add Agilex5 SoCFPGA 013b board
Add compatible for Agilex5 SoCFPGA 013b board.

Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-31 10:12:05 -05:00
Yao Zi
d8a174babf riscv: dts: thead: Add reset controllers of more subsystems for TH1520
Describe reset controllers for VI, MISC, AP, DSP and AO subsystems. The
one for AO subsystem is marked as reserved, since it may be used by AON
firmware.

Reviewed-by: Drew Fustini <fustini@kernel.org>
Signed-off-by: Yao Zi <ziyao@disroot.org>
Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-31 10:17:11 +00:00
Marek Vasut
9d22a34a01 arm64: dts: renesas: sparrow-hawk: Fix full-size DP connector node name and labels
The DisplayPort connector on Retronix R-Car V4H Sparrow Hawk board
is a full-size DisplayPort connector. Fix the copy-paste error and
update the DT node name and labels accordingly. No functional change.

Fixes: a719915e76 ("arm64: dts: renesas: r8a779g3: Add Retronix R-Car V4H Sparrow Hawk board support")
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251027184604.34550-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-31 10:16:03 +01:00
Niravkumar L Rabara
d07eddcd45 arm64: dts: socfpga: agilex5: add VGIC maintenance interrupt
Add VGIC maintenance interrupt and interrupt-parent property for
interrupt controller, required to run Linux in virtualized environment.

Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-30 16:52:00 -05:00
Dinh Nguyen
d37c471666 arm64: dts: socfpga: agilex: fix dtbs_check warning for NAND
nand-controller@ffb90000 (altr,socfpga-denali-nand): Unevaluated
properties are not allowed ('flash@0' was unexpected)

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-30 16:51:59 -05:00
Dinh Nguyen
4bb2d0f87c arm64: dts: socfpga: agilex: fix dtbs_check warning for clock manager
clock-controller@ffd10000 (intel,agilex-clkmgr): 'clocks' is a required
property

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-30 16:51:59 -05:00
Dinh Nguyen
b8fb4cbe0a arm64: dts: socfpga: stratix10-swvp: fix dtbs_check warnings swvp
Unevaluated properties are not allowed ('phy-addr' was unexpected)

socfpga_stratix10_swvp.dtb: sysmgr@ffd12000 (altr,sys-mgr-s10):
'interrupts' does not match any of the regexes:

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-30 16:51:59 -05:00
Aaron Kling
ba97758a63 arm64: tegra: Add NVIDIA Jetson Nano 2GB Developer Kit support
This devkit is very similar to P3450, except it has less RAM, no display
port, and only 3 USB host ports. Derive from P3450 and disable the
hardware that is unavailable.

GPIO PA6 is used to control the HDMI power rail and needs to be on for
hotplug detect to work. This is mapped to the 3.3V USB hub on P3450.
That USB rail is not used here, so delete the regulator to avoid
conflicts.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-10-30 20:32:03 +01:00
sheetal
0867951a1c arm64: tegra: Add Tegra264 audio support
- Add the audio devices for the Tegra264 SoC in the tegra264.dtsi file,
  which includes sound, HDA and APE(Audio Processing Engine) subsystem
  nodes.
  APE subsystem includes,
   - I/O interfaces such as I2S, DMIC and DSPK (all the available
     instances).
   - HW accelerators such as ASRC, OPE, MVC, SFC, AMX, ADX and Mixer (all
     the available instances).
   - ADMA controller and Interrupt controllers.

- Enable the audio nodes in tegra264-p3971.dtsi platform DT file.

Signed-off-by: sheetal <sheetal@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-10-30 20:31:11 +01:00
Aaron Kling
700c48a438 arm64: tegra: Add Tegra186 pin controllers
Add the device tree nodes for the MAIN and AON pin controllers found on
the Tegra186 family of SoCs.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-10-30 20:31:11 +01:00
Thierry Reding
c0ff284584 Merge branch 'for-6.19/dt-bindings' into for-6.19/arm64/dt 2025-10-30 20:31:03 +01:00
Aaron Kling
ccec106924 dt-bindings: arm: tegra: Document Jetson Nano Devkits
Add compatibles for the Tegra210 Jetson Nano Developer Kits

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-10-30 20:29:47 +01:00
Dmitry Baryshkov
ef659a5bd9 arm64: dts: qcom: rename x1p42100 to purwa
Follow the example of other platforms and rename X1P42100 to purwa.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251030-rename-dts-2-v1-3-80c0b81c4d77@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 14:01:09 -05:00
Dmitry Baryshkov
dd6edcd7d3 arm64: dts: qcom: rename sc7280 to kodiak
SC7280, QCM6490 and QCS6490 are three names for the same die,
collectively known as 'kodiak'. Follow the example of other platforms
and rename SC7280 to kodiak.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251030-rename-dts-2-v1-2-80c0b81c4d77@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 14:01:09 -05:00
Dmitry Baryshkov
021df9dee9 arm64: dts: qcom: rename qcm2290 to agatti
QCM2290 and QRB2210 are two names for the same die, collectively known
as 'agatti'. Follow the example of other platforms and rename QCM2290 to
agatti.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251030-rename-dts-2-v1-1-80c0b81c4d77@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 14:01:08 -05:00
Bjorn Andersson
041c12988a Merge branch 'arm64-fixes-for-6.18' into HEAD
Merge the v6.18-fixes branch before we rename talos.dtsi, to avoid merge
issues throughout the rest of the cycle.
2025-10-30 13:59:52 -05:00
Dmitry Baryshkov
2377626fd2 arm64: dts: qcom: add gpu_zap_shader label
Patching existing DT nodes based on full path is error prone and
generally not recommended. Follow the pattern introduced in the last
platforms, add gpu_zap_shader label to the ZAP node and use it in the
board files.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251028-dt-zap-shader-v1-7-7eccb823b986@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 13:43:10 -05:00
Dmitry Baryshkov
d994ae0427 arm64: dts: qcom: sc7180: add gpu_zap_shader label
Patching existing DT nodes based on full path is error prone and
generally not recommended. Add a generic zap-shader subnode to the GPU
node on SC7180, delete it on trogdor and IDP, two platforms which don't
use ZAP and patch it with the firmware-name on all other platforms by
using the label.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251028-dt-zap-shader-v1-6-7eccb823b986@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 13:43:10 -05:00
Dmitry Baryshkov
00d3f7b053 arm64: dts: qcom: sm8250: drop duplicate memory-region defs
The base file, sm8250.dtsi, alread includes memory-region under the
GPU's zap-shader node. Drop duplicates from the individual board files.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251028-dt-zap-shader-v1-5-7eccb823b986@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 13:43:09 -05:00
Dmitry Baryshkov
6e9612ced0 arm64: dts: qcom: sc8280xp: create common zap-shader node
In order to reduce duplication, mMove common GPU memory configuration
from individual board files to sc8280xp.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251028-dt-zap-shader-v1-4-7eccb823b986@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 13:43:09 -05:00
Dmitry Baryshkov
e3f81bdd46 arm64: dts: qcom: sc8180x: create common zap-shader node
In order to reduce duplication, move common GPU memory configuration
from individual board files to sc8180x.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251028-dt-zap-shader-v1-3-7eccb823b986@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 13:43:09 -05:00
Dmitry Baryshkov
8464b804bd arm64: dts: qcom: sdm845: create common zap-shader node
In order to reduce duplication, move common GPU memory configuration
from individual board files to sdm845.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251028-dt-zap-shader-v1-2-7eccb823b986@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 13:43:09 -05:00
Dmitry Baryshkov
f85592223d arm64: dts: qcom: sdm670: create common zap-shader node
In order to reduce duplication, move common GPU memory configuration
from individual board files to sdm670.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20251028-dt-zap-shader-v1-1-7eccb823b986@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 13:43:09 -05:00
Manivannan Sadhasivam
5f37788ade arm64: defconfig: Enable SCSI UFS Crypto and Block Inline encryption drivers
These drivers will allow using the crypto functionalities of the UFS
device, like Inline Crypto Encryption. Both of these drivers are of type
'bool', so they cannot be built as modules.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20251030095509.5877-1-manivannan.sadhasivam@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-30 12:12:03 -05:00
Paresh Bhagat
da84d094de arm64: dts: ti: k3-am62d2-evm: Remove unused ospi0 chip select pins
Since only a single flash device is connected to ospi0 retain only the
OSPI0_CSn0 pin configuration and remove the unused CSn1-CSn3 pins from
the default pinctrl. This simplifies the ospi0 pin configuration without
affecting functionality.

Signed-off-by: Paresh Bhagat <p-bhagat@ti.com>
Reviewed-by: Andrew Davis <afd@ti.com>
Link: https://patch.msgid.link/20251029032144.502603-1-p-bhagat@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:42:55 +05:30
Aniket Limaye
671c852fc5 arm64: dts: ti: k3-j784s4: Fix I2C pinmux pull configuration
The I2C pins for some of the instances on J784S4/J742S2/AM69 are
configured as PIN_INPUT_PULLUP while these pins are open-drain type and
do not support internal pull-ups [0][1][2]. The pullup configuration
bits in the corresponding padconfig registers are reserved and any
writes to them have no effect and readback checks on those bits fail.

Update the pinmux settings to use PIN_INPUT instead of PIN_INPUT_PULLUP
to reflect the correct hardware behaviour.

[0]: https://www.ti.com/lit/gpn/tda4ah-q1 (J784S4 Datasheet: Table 5-1. Pin Attributes)
[1]: https://www.ti.com/lit/gpn/tda4ape-q1 (J742S2 Datasheet: Table 5-1. Pin Attributes)
[2]: https://www.ti.com/lit/gpn/am69a (AM69 Datasheet: Table 5-1. Pin Attributes)

Fixes: e20a06aca5 ("arm64: dts: ti: Add support for J784S4 EVM board")
Fixes: 635fb18ba0 ("arch: arm64: dts: Add support for AM69 Starter Kit")
Fixes: 0ec1a48d99 ("arm64: dts: ti: k3-am69-sk: Add pinmux for RPi Header")
Signed-off-by: Aniket Limaye <a-limaye@ti.com>
Reviewed-by: Udit Kumar <u-kumar1@ti.com>
Link: https://patch.msgid.link/20251022122638.234367-1-a-limaye@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Dominik Haller
33a2c98cab arm64: dts: ti: k3-am68-phycore-som: Add pmic label
Add a label to the pmic node which is necessary to configure the PMIC
ESM in the SPL devicetree.

Signed-off-by: Dominik Haller <d.haller@phytec.de>
Link: https://patch.msgid.link/20251014002240.85045-1-d.haller@phytec.de
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Sai Sree Kartheek Adivi
5ccb63373d arm64: dts: ti: k3-am642-evm: Add DMA support for TSCADC
Add DMA support for tscadc0

Signed-off-by: Sai Sree Kartheek Adivi <s-adivi@ti.com>
Link: https://patch.msgid.link/20251013115225.3668641-1-s-adivi@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Stefano Radaelli
397dae3cec arm64: dts: ti: var-som-am62p: Add support for ADS7846 touchscreen
The VAR-SOM-AM62P integrates an ADS7846 resistive touchscreen controller.
The controller is physically located on the SOM, and its signals are
routed to the SOM pins, allowing carrier boards to make use of it.

This patch adds the ADS7846 node under the appropriate SPI controller.

Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com>
Link: https://patch.msgid.link/20251003125031.30539-4-stefano.radaelli21@gmail.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Stefano Radaelli
6f41007cf8 arm64: dts: ti: var-som-am62p: Add support for WM8904 audio codec
The VAR-SOM-AM62P can integrate the WM8904, a high-performance
ultra-low-power stereo codec optimized for portable audio applications.

This patch adds the WM8904 device to the appropriate I2C bus, enables
the McASP1 peripheral, and introduces the sound node to expose the
sound card to the system.

Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com>
Link: https://patch.msgid.link/20251003125031.30539-3-stefano.radaelli21@gmail.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Stefano Radaelli
1d10e0e78c arm64: dts: ti: var-som-am62p: Refactor IPC configuration into common dtsi
Update the VAR-SOM-AM62P dtsi to align with the refactor introduced by
k3-am62p-ti-ipc-firmware.dtsi common file, allowing to remove the IPC
nodes from the board file including the new common dtsi.

No functional changes intended.

Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com>
Link: https://patch.msgid.link/20251003125031.30539-2-stefano.radaelli21@gmail.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Paresh Bhagat
50856649d6 arm64: dts: ti: k3-am62d2-evm: Enable PMIC
Add support for TPS65224 PMIC family on wakeup I2C0 bus. This
device provides regulators (bucks and LDOs), along with GPIOs,
and monitors SOC's MCU error signal.

Signed-off-by: Paresh Bhagat <p-bhagat@ti.com>
Reviewed-by: Shree Ramamoorthy <s-ramamoorthy@ti.com>
Link: https://patch.msgid.link/20251028213645.437957-3-p-bhagat@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Paresh Bhagat
394b02210a arm64: dts: ti: k3-am62d2-evm: Fix PMIC padconfig
Fix the PMIC padconfig for AM62D. PMIC's INT pin is connected to the
SoC's EXTINTn input.

Reference Docs
Datasheet - https://www.ti.com/lit/ug/sprujd4/sprujd4.pdf
Schematics - https://www.ti.com/lit/zip/sprcal5

Fixes: 1544bca2f1 ("arm64: dts: ti: Add support for AM62D2-EVM")
Cc: stable@vger.kernel.org
Signed-off-by: Paresh Bhagat <p-bhagat@ti.com>
Link: https://patch.msgid.link/20251028213645.437957-2-p-bhagat@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Paresh Bhagat
0103435072 arm64: dts: ti: k3-am62d2-evm: Fix regulator properties
Fix missing supply for regulators TLV7103318QDSERQ1 and TPS22918DBVR.
Correct padconfig and gpio for TLV7103318QDSERQ1.

Reference Docs
Datasheet - https://www.ti.com/lit/ug/sprujd4/sprujd4.pdf
Schematics - https://www.ti.com/lit/zip/sprcal5

Fixes: 1544bca2f1 ("arm64: dts: ti: Add support for AM62D2-EVM")
Cc: stable@vger.kernel.org
Signed-off-by: Paresh Bhagat <p-bhagat@ti.com>
Reviewed-by: Shree Ramamoorthy <s-ramamoorthy@ti.com>
Link: https://patch.msgid.link/20251028210153.420473-1-p-bhagat@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:26 +05:30
Michael Walle
0114330e7f arm64: dts: ti: sa67: add overlay for the ADS2 carrier
The SMARC module can be used on the Kontron SMARC 2.2 Evaluation carrier
(ads2). Add an overlay to enable all the devices found on the carrier
and enable the corresponding peripherals of the SoC.

Signed-off-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20251017135116.548236-4-mwalle@kernel.org
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:25 +05:30
Michael Walle
1c3c4df06f arm64: dts: ti: Add support for Kontron SMARC-sAM67
Add device tree support for the Kontron SMARC-sAM67 module, which is
based on a TI AM67A SoC.

The module features:
 * Quad-core AM67A94 at 1.4GHz with 8 GiB RAM
 * 64 GiB eMMC, 4 MiB SPI flash for failsafe booting
 * Dedicated RTC
 * Multiple interfaces: 4x UART, 2x USB 2.0/USB 3.2, 2x GBE, QSPI,
        7x I2C,
 * Display support: 2x LVDS, 1x DSI (*), 1x DP (*)
 * Camera support: 4x CSI (*)
 * Onboard microcontroller for boot control, failsafe booting and
   external watchdog

(*) not yet supported by the kernel

There is a base device tree and overlays which will add optional
features. At the moment there is one full featured variant of that
board whose device tree is generated during build by merging all the
device tree overlays.

Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Udit Kumar <u-kumar1@ti.com>
Link: https://patch.msgid.link/20251017135116.548236-3-mwalle@kernel.org
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:25 +05:30
Michael Walle
22e1d0d8cd dt-bindings: arm: ti: Add Kontron SMARC-sAM67 module
Add devicetree bindings for the AM67 based Kontron SMARC-sAM67
module.

Signed-off-by: Michael Walle <mwalle@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251017135116.548236-2-mwalle@kernel.org
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:25 +05:30
Michael Walle
a3a74f9b15 arm64: dts: ti: k3-j722s-main: fix the audio refclk source
At the moment the clock parent of the audio extclk output is PLL1_HSDIV6
of the main domain. This very clock output is also used among various IP
cores, for example for the USB1 LPM clock. The audio extclock being an
external clock output with a variable frequency, it is likely that a
user of this clock will try to set it's frequency to a different value,
i.e. an audio codec. Because that clock output is used also for other IP
cores, bad things will happen.

Instead of using PLL1_HSDIV6 use the PLL2_HSDIV8 as a sane default, as
this output is exclusively used among other audio peripherals.

Signed-off-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20251017102228.530517-2-mwalle@kernel.org
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:25 +05:30
Michael Walle
b2f45bdc18 arm64: dts: ti: k3-j722s-evm: explicitly use PLL1_HSDIV6 audio refclk
The parent of the audio_refclk0/1 is about to change to a sane default
value. Thus, move the (soon to be) non-default value into the board
device tree.

Signed-off-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20251017102228.530517-1-mwalle@kernel.org
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-30 19:13:17 +05:30
Randy Dunlap
74a7b4f183 sysctl: fix kernel-doc format warning
Describe the "type" struct member using '@type' and move it together
with the rest of the doc for ctl_table_header to avoid a kernel-doc
warning:

Warning: include/linux/sysctl.h:178 Incorrect use of kernel-doc format:
 * enum type - Enumeration to differentiate between ctl target types

Fixes: 2f2665c13a ("sysctl: replace child with an enumeration")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-10-30 13:23:09 +01:00
Louis-Alexis Eyraud
4459d667a3 dt-bindings: soc: mediatek: pwrap: Add compatible for MT8189 SoC
Add compatible string for the PWRAP block on MT8189 SoC, which is
compatible with the one used on MT8195.

Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-30 10:25:05 +01:00
Louis-Alexis Eyraud
781bb149da soc: mediatek: mtk-socinfo: Add entry for MT8391AV/AZA Genio 720
Add an entry for the MT8391 SoC with commercial name Genio 720.

Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-30 10:12:05 +01:00
Li Liu
0b4bbf0514 arm64: dts: qcom: Add display support for QCS615 RIDE board
Add display MDSS and DSI configuration for QCS615 RIDE board.
QCS615 has a DP port, and DP support will be added in a later patch.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Li Liu <li.liu@oss.qualcomm.com>
Signed-off-by: Fange Zhang <fange.zhang@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250903-add-display-support-for-qcs615-platform-v8-2-7971c05d1262@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-29 11:57:52 -05:00
Li Liu
bf9f0bb2ec arm64: dts: qcom: Add display support for QCS615
Add display MDSS and DSI configuration for QCS615 platform.
QCS615 has a DP port, and DP support will be added in a later patch.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Li Liu <li.liu@oss.qualcomm.com>
Signed-off-by: Fange Zhang <fange.zhang@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250903-add-display-support-for-qcs615-platform-v8-1-7971c05d1262@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-29 11:57:51 -05:00
Sarthak Garg
c3398456f6 arm64: dts: qcom: sm8550: Limit max SD HS mode frequency by default
Due to an implementation detail in this SoC, additional passive
electrical components are required to achieve the maximum rated speed
of the SD controller when paired with a High-Speed SD Card. Without
them, the clock frequency must be limited to 37.5 MHz for link stability.

Because the reference design does not contain these components, most
(derivative) boards do not have them either. To accommodate for that,
apply the frequency limit by default and delegate lifting it to the
odd boards that do contain the necessary onboard hardware.

Signed-off-by: Sarthak Garg <quic_sartgarg@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250908104122.2062653-5-quic_sartgarg@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-29 11:56:11 -05:00
Xin Liu
500d3d0e88 arm64: dts: qcom: qcs615-ride: Update 'model' string for qcs615 ride
Update the 'model' property in the QCS615-ride device tree to include
the public board name "IQ-615 Beta EVK". This ensures consistency with
official documentation and release notes.

Signed-off-by: Xin Liu <xin.liu@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251029023137.381386-1-xin.liu@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-29 11:49:23 -05:00
Haotian Zhang
dc5db35073 soc: qcom: smem: fix hwspinlock resource leak in probe error paths
The hwspinlock acquired via hwspin_lock_request_specific() is not
released on several error paths. This results in resource leakage
when probe fails.

Switch to devm_hwspin_lock_request_specific() to automatically
handle cleanup on probe failure. Remove the manual hwspin_lock_free()
in qcom_smem_remove() as devm handles it automatically.

Fixes: 20bb6c9de1 ("soc: qcom: smem: map only partitions used by local HOST")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251029022733.255-1-vulab@iscas.ac.cn
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-29 11:46:57 -05:00
Conor Dooley
4aac11c9a6 soc: microchip: add mfd drivers for two syscon regions on PolarFire SoC
The control-scb and mss-top-sysreg regions on PolarFire SoC both fulfill
multiple purposes. The former is used for mailbox functions in addition
to the temperature & voltage sensor while the latter is used for clocks,
resets, interrupt muxing and pinctrl.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-10-29 16:22:37 +00:00
Sibi Sankar
df8764c2cb dt-bindings: soc: qcom,aoss-qmp: Document the Glymur AOSS side channel
Document the Always-on Subsystem side channel on the Glymur SoC.

Signed-off-by: Sibi Sankar <sibi.sankar@oss.qualcomm.com>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20251022-knp-soc-binding-v2-4-3cd3f390f3e2@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-29 11:16:05 -05:00
Jingyi Wang
0fb35ecee0 dt-bindings: soc: qcom,aoss-qmp: Document the Kaanapali AOSS channel
Document the Always-On Subsystem side channel on the Qualcomm Kaanapali
platform for communication with client found on the SoC such as
remoteprocs.

Reviewed-by: Eugen Hristev <eugen.hristev@linaro.org>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20251022-knp-soc-binding-v2-2-3cd3f390f3e2@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-29 11:15:53 -05:00
Yongxing Mou
58fbc8208e soc: qcom: ubwc: Add QCS8300 UBWC cfg
The QCS8300 supports UBWC 4.0 and 4 channels LP5 memory interface. Use
the SC8280XP data structure for QCS8300 according to the specification.

Acked-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251029-qcs8300_mdss-v13-4-e8c8c4f82da2@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-29 11:02:48 -05:00
Sirius Wang
dd1d7af420 soc: mediatek: mtk-socinfo: Add extra entry for MT8189
The MT8189 has a different socinfo match for MT8189 SoC
(commercial name Kompanio 540), so add it the driver.

Signed-off-by: Sirius Wang <sirius.wang@mediatek.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-29 14:12:09 +01:00
Shiva Tripathi
67106d2171 arm64: dts: ti: k3-am62: Add RNG node
Add EIP76 Random Number Generator (RNG) node within crypto engine for
AM62 and AM62A SoCs. The RNG hardware is integrated in crypto
subsystem at address 0x40910000.

Mark the RNG node with status "reserved" as it is intended for use by
OP-TEE for secure random number generation. If required, this hardware
can also be used through Linux kernel by enabling this node.

Signed-off-by: Shiva Tripathi <s-tripathi1@ti.com>
Link: https://patch.msgid.link/20250926100229.923547-1-s-tripathi1@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-29 11:06:40 +05:30
Luo Jie
e7a1bf542c arm64: dts: qcom: ipq5424: Add NSS clock controller node
NSS clock controller provides the clocks and resets to the networking
hardware blocks on the IPQ5424, such as PPE (Packet Process Engine) and
UNIPHY (PCS) blocks.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://lore.kernel.org/r/20251014-qcom_ipq5424_nsscc-v7-9-081f4956be02@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-28 16:44:43 -05:00
Bjorn Andersson
12fe71e70f Merge branch '20251014-qcom_ipq5424_nsscc-v7-2-081f4956be02@quicinc.com' into HEAD
Merge IPQ5424 DeviceTree bindings for the Network Subsystem clock
controller from topic branch, to gain access to binding constants.
2025-10-28 16:43:57 -05:00
Sakari Ailus
122df8416e bus: sunxi-rsb: Remove redundant pm_runtime_mark_last_busy() calls
pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(),
pm_runtime_autosuspend() and pm_request_autosuspend() now include a call
to pm_runtime_mark_last_busy(). Remove the now-reduntant explicit call to
pm_runtime_mark_last_busy().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251027133359.392984-1-sakari.ailus@linux.intel.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2025-10-29 01:10:10 +08:00
Chen-Yu Tsai
ae0d3f1e6d arm64: dts: allwinner: a523: Add SPDIF TX pin on PB and PI pins
The SPDIF TX (called OWA OUT in the datasheet) is available on three
pins. Of those, the PH pin is unlikely to be used since it conflicts
with the first Ethernet controller.

The Radxa Cubie A5E exposes SPDIF TX through the PI pin group on the
40-pin GPIO header.

The Orange Pi 4A exposes SPDIF TX through both the PB and PI pin
groups on the 40-pin GPIO header. The PB pin alternatively would be
used for I2S0 though.

Add pinmux settings for both options so potential users can directly
reference either one.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251027125655.793277-10-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2025-10-29 01:07:33 +08:00
Chen-Yu Tsai
a9050236f8 arm64: dts: allwinner: a523: Add I2S2 pins on PI pin group
The Radxa Cubie A5E exposes I2S2 through the PI pin group on the 40-pin
GPIO header.

Add a pinmux setting for it so potential users can directly reference
it.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251027125655.793277-9-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2025-10-29 01:07:33 +08:00
Chen-Yu Tsai
1fe1e9b671 arm64: dts: allwinner: a523: Add device nodes for I2S controllers
The A523 family of SoCs have four I2S controllers capable of both
playback and capture. The user manual also implies that I2S2 also
outputs to the eDP interface controller.

Add device nodes for all of them.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251027125655.793277-8-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2025-10-29 01:07:33 +08:00
Chen-Yu Tsai
e51b773798 arm64: dts: allwinner: a523: Add device node for SPDIF block
The A523 has a SPDIF interface that is capable of both playback and
capture.

Add a node for it.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251027125655.793277-7-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2025-10-29 01:07:33 +08:00
Chen-Yu Tsai
55d43ef777 arm64: dts: allwinner: a523: Add DMA controller device nodes
The A523 has two DMA controllers. Add device nodes for both. Also hook
up DMA for existing devices.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251027125655.793277-6-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2025-10-29 01:07:33 +08:00
Chen-Yu Tsai
697fbb43ab dt-bindings: dma: allwinner,sun50i-a64-dma: Add compatibles for A523
There are two DMA controllers on the A523, one in the main system area
and the other for the MCU. These are the same as the one found on the
A100. The only difference is the DMA endpoint (DRQ) layout.

Since the number of channels and endpoints are described with additional
generic properties, just add new A523-specific compatible strings and
fallback to the A100 one.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251027125655.793277-2-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2025-10-29 01:07:33 +08:00
Richard Genoud
f55a252631 arm64: dts: allwinner: h616: add NAND controller
The H616 has a NAND controller quite similar to the A10/A23 ones, but
with some register differences, more clocks (for ECC and MBUS), more ECC
strengths, so this requires a new compatible string.

Add the NAND controller node and pins in the device tree.

Signed-off-by: Richard Genoud <richard.genoud@bootlin.com>
Link: https://patch.msgid.link/20251028073534.526992-17-richard.genoud@bootlin.com
[wens@kernel.org: Fixed alignment of clocks in nand-controller node]
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2025-10-29 01:00:56 +08:00
Thomas Weißschuh
8e03c195cc um: Avoid circular dependency on asm-offsets in pgtable.h
Recent changes have added an include of as-layout.h to pgtable.h.
However this introduces a circular dependency during asm-offsets
generation as as-layout.h depends on asm-offsets and pgtable.h is an
input for asm-offsets.

Building from a clean state results in the following error:

  CC      arch/um/kernel/asm-offsets.s
In file included from arch/um/include/asm/pgtable.h:48,
                 from include/linux/pgtable.h:6,
                 from include/linux/mm.h:31,
                 from include/linux/pid_namespace.h:7,
                 from include/linux/ptrace.h:10,
                 from include/linux/audit.h:13,
                 from arch/um/kernel/asm-offsets.c:8:
arch/um/include/shared/as-layout.h:9:10: fatal error: generated/asm-offsets.h: No such file or directory
    9 | #include <generated/asm-offsets.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[4]: *** [scripts/Makefile.build:182: arch/um/kernel/asm-offsets.s] Error 1

As the inclusion of as-layout.h in pgtable.h is not yet needed while
asm-offsets are generated, break the dependency here.

Fixes: a7f7dbae94 ("um: Remove file-based iomem emulation support")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251028-uml-offsets-circular-v1-1-601c363cfaaa@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-28 11:40:49 +01:00
Michal Wilczynski
a9fb9c7e3b riscv: dts: thead: Add PWM fan and thermal control
Add Device Tree nodes to enable a PWM controlled fan and it's associated
thermal management for the Lichee Pi 4A board.

This enables temperature-controlled active cooling for the Lichee Pi 4A
board based on SoC temperature.

Reviewed-by: Drew Fustini <fustini@kernel.org>
Tested-by: Drew Fustini <fustini@kernel.org>
Reviewed-by: Elle Rhumsaa <elle@weathered-steel.dev>
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-28 09:38:44 +00:00
Michal Wilczynski
b6b00b220d riscv: dts: thead: Add PWM controller node
Add the Device Tree node for the T-HEAD TH1520 SoC's PWM controller.

Reviewed-by: Drew Fustini <fustini@kernel.org>
Tested-by: Drew Fustini <fustini@kernel.org>
Reviewed-by: Elle Rhumsaa <elle@weathered-steel.dev>
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-28 09:38:41 +00:00
Frank Li
7b204755eb arm64: dts: layerscape: add dma-coherent for usb node
Add SOC special compatible string, remove fallback snps,dwc3 to let flatten
dwc3-layerscape driver to be probed and enable dma-coherence for usb node
since commit add layerscape dwc3 support, which set correct gsbustcfg0
value.

Add iommus property to run at old uboot, which use fixup add iommus by
check compatible string snsp,dwc3 compatible string.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-28 16:54:11 +08:00
Marek Vasut
6e20a9d94a arm64: dts: renesas: r8a77961: Add GX6250 GPU node
Describe Imagination Technologies PowerVR Rogue GX6250 BNVC 4.45.2.58
present in Renesas R-Car R8A77961 M3-W+ SoC.

Acked-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251022033847.471106-3-marek.vasut+renesas@mailbox.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:47 +01:00
Marek Vasut
73100fa8e4 arm64: dts: renesas: r8a77960: Add GX6250 GPU node
Describe Imagination Technologies PowerVR Rogue GX6250 BNVC 4.45.2.58
present in Renesas R-Car R8A77960 M3-W SoC.

Acked-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251022033847.471106-2-marek.vasut+renesas@mailbox.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:47 +01:00
Wolfram Sang
82e6de6ebc ARM: dts: renesas: kzm9g: Name interrupts for accelerometer
Name the interrupts to make them descriptive.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://patch.msgid.link/20251011212358.3347-2-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:47 +01:00
Lad Prabhakar
5c036f6fe8 arm64: dts: renesas: r9a09g087: Add Cortex-A55 PMU node
Enable the performance monitor unit for the Cortex-A55 cores on the
RZ/N2H (R9A09G087) SoC.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251007121508.1595889-5-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:47 +01:00
Lad Prabhakar
06414e30fa arm64: dts: renesas: r9a09g077: Add Cortex-A55 PMU node
Enable the performance monitor unit for the Cortex-A55 cores on the
RZ/T2H (R9A09G077) SoC.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251007121508.1595889-4-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Lad Prabhakar
19bbd91790 arm64: dts: renesas: r9a09g056: Add Cortex-A55 PMU node
Enable the performance monitor unit for the Cortex-A55 cores on the
RZ/V2N (R9A09G056) SoC.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251007121508.1595889-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Lad Prabhakar
32bd03f255 arm64: dts: renesas: r9a09g057: Add Cortex-A55 PMU node
Enable the performance monitor unit for the Cortex-A55 cores on the
RZ/V2H(P) (R9A09G057) SoC.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251007121508.1595889-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Wolfram Sang
ca7fffb6e9 ARM: dts: renesas: r9a06g032-rzn1d400-db: Drop invalid #cells properties
The 'ethernet-ports' node in the SoC DTSI handles them already. Fixes:

    arch/arm/boot/dts/renesas/r9a06g032-rzn1d400-db.dtb: switch@44050000 (renesas,r9a06g032-a5psw): Unevaluated properties are not allowed ('#address-cells', '#size-cells' were unexpected)
	    from schema $id: http://devicetree.org/schemas/net/dsa/renesas,rzn1-a5psw.yaml#

Fixes: 5b6d7c3c58 ("ARM: dts: r9a06g032-rzn1d400-db: Add switch description")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251007104624.19786-2-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
5820baefba arm64: dts: renesas: v3msk: Enable watchdog timer
Enable the watchdog timer on the R-Car V3M Starter Kit board.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://patch.msgid.link/e30fb396d73307f2538a638cdda06ca58a1a4e60.1759830182.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Wolfram Sang
4651760fb2 arm64: dts: renesas: r8a779h0: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251006082520.10570-19-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Wolfram Sang
639ddf8220 arm64: dts: renesas: r8a779g0: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251006082520.10570-16-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Wolfram Sang
da07140e04 arm64: dts: renesas: r8a779f0: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251006082520.10570-14-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Wolfram Sang
91c8012077 arm64: dts: renesas: r8a779a0: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251006082520.10570-12-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Cosmin Tanislav
62fb11fba0 arm64: dts: renesas: rzt2h/rzn2h-evk: Enable ADCs
The ADCs on RZ/T2H and RZ/N2H are exposed on the evaluation kit boards.

Enable them.

Signed-off-by: Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251005111323.804638-6-cosmin-gabriel.tanislav.xa@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Cosmin Tanislav
4ed27b4fde arm64: dts: renesas: r9a09g087: Add ADCs support
Renesas RZ/T2H (R9A09G087) includes three 12-Bit successive
approximation A/D converters, two 4-channel ADCs, and one 15-channel
ADC.

Add support for all of them.

Signed-off-by: Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251005111323.804638-5-cosmin-gabriel.tanislav.xa@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Cosmin Tanislav
a82a42963c arm64: dts: renesas: r9a09g077: Add ADCs support
Renesas RZ/T2H (R9A09G077) includes three 12-Bit successive
approximation A/D converters, two 4-channel ADCs, and one 6-channel ADC.

Add support for all of them.

Signed-off-by: Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251005111323.804638-4-cosmin-gabriel.tanislav.xa@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Niklas Söderlund
69daad87d3 ARM: dts: renesas: koelsch: Update ADV7180 binding
Use the more modern adi,adv7180cp compatible for the CVBS input found on
R-Car Gen2 Koelsch boards. This aligns the bindings with the other Gen2
board with the same setup Gose.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251003215318.39757-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
bc8fd8995e ARM: dts: renesas: r9a06g032: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" properties from the soc and timer
nodes to the root node, to reduce duplication.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/8416011a488aa5ba883fca2647d09e21cad26351.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
cdf3f05840 ARM: dts: renesas: r8a7794: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/e0fd5e98d27c266e9498350a44747d314ce87e71.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
68b5a2072e ARM: dts: renesas: r8a7793: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/a561c3ee412df8e6fd293a91fa0aa5d303143d22.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
f53816bded ARM: dts: renesas: r8a7792: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/3fc9ca6fd1469ec76c6c820a8c966b0a6652fbad.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
06f429c9f0 ARM: dts: renesas: r8a7791: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/32809538c2ceedcd142fc419918c6928870bbb6c.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
13bb95c7e8 ARM: dts: renesas: r8a7790: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/d98bdf49a93db0e17a73b9be6cae5cbc8da76488.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
f07b2b42c8 ARM: dts: renesas: r8a77470: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/13edb8c780f21366343268a0c8f1ab5d54032c66.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
48ccd2949d ARM: dts: renesas: r8a7745: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/fc23a6b5b7c8d92334089770854535f088201d58.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
7a7ab7c3a7 ARM: dts: renesas: r8a7744: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/412460167747bd26e962b5cb022a85dcac31a00c.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
e291e4c000 ARM: dts: renesas: r8a7743: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/0c96651b9b7307cad03c42da88e4115629c5ae60.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:46 +01:00
Geert Uytterhoeven
07e1e027c4 ARM: dts: renesas: r8a7742: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/eab2e4860569e877e66b2f35940ba00e5ec7ff55.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:45 +01:00
Geert Uytterhoeven
336ade03f7 ARM: dts: renesas: r7s9210: Remove duplicate interrupt-parent
There are two identical interrupt-parent properties: one at the top
level, and one under the soc node.  Remove the latter, as it is
unneeded.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/ef9e56dfb55da092bdc489309309bf4262651042.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:45 +01:00
Geert Uytterhoeven
aaee68616c ARM: dts: renesas: r7s72100: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/6aaabd73f6732f932b5708b1036a9c398c44cd19.1759414774.git.geert+renesas@glider.be
2025-10-28 09:23:45 +01:00
Wolfram Sang
00df14f346 ARM: dts: renesas: gose: Remove superfluous port property
'bus-width' is defined for the corresponding vin input port already.
No need to declare it in the output port again. Fixes:

    arch/arm/boot/dts/renesas/r8a7793-gose.dtb: composite-in@20 (adi,adv7180cp): ports:port@3:endpoint: Unevaluated properties are not allowed ('bus-width' was unexpected)
    from schema $id: http://devicetree.org/schemas/media/i2c/adi,adv7180.yaml#

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250929093616.17679-2-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
502679d256 arm64: dts: renesas: eagle/v3msk: Mark SWDT as reserved
This watchdog can't be used with Linux because the firmware needs it on
V3M boards. Sadly, it doesn't mark the node as reserved, so this is
added manually here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-26-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
d3034fa6a2 arm64: dts: renesas: ebisu: Mark SWDT as reserved
This watchdog can't be used with Linux because the firmware needs it on
Ebisu boards. Sadly, it doesn't mark the node as reserved, so this is
added manually here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-25-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
f30dbf65a1 arm64: dts: renesas: draak: Mark SWDT as reserved
This watchdog can't be used with Linux because the firmware needs it on
Draak boards. Sadly, it doesn't mark the node as reserved, so this is
added manually here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-24-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
48aba08e5e arm64: dts: renesas: ulcb: Mark SWDT as reserved
This watchdog can't be used with Linux because the firmware needs it on
ULCB boards. Sadly, it doesn't mark the node as reserved, so this is
added manually here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-23-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
249e6bb275 arm64: dts: renesas: salvator-common: Mark SWDT as reserved
This watchdog can't be used with Linux because the firmware needs it on
Salvator boards. Sadly, it doesn't mark the node as reserved, so this is
added manually here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-22-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
55af20f55c arm64: dts: renesas: r8a77995: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-21-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
8d3348b115 arm64: dts: renesas: r8a77990: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-20-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
5fb22fc1d0 arm64: dts: renesas: r8a77970: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-19-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
64d0f44a42 arm64: dts: renesas: r8a77965: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-18-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
adab61b448 arm64: dts: renesas: r8a77961: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-17-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
eb254eb1f4 arm64: dts: renesas: r8a77960: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-16-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Wolfram Sang
f8a66f7c5a arm64: dts: renesas: r8a77951: Add SWDT node
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250925093941.8800-15-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Kuninori Morimoto
db5a848a2c arm64: dts: renesas: r9a09g087: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87bjn08o7z.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Kuninori Morimoto
ad58d1078a arm64: dts: renesas: r9a09g077: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87cy7g8o83.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Kuninori Morimoto
098da100b3 arm64: dts: renesas: r9a09g057: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87ecrw8o87.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Kuninori Morimoto
e45e76a02b arm64: dts: renesas: r9a09g056: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87frcc8o8b.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Kuninori Morimoto
e57389d554 arm64: dts: renesas: r9a09g047: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87h5ws8o8g.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Kuninori Morimoto
0deef14e7e arm64: dts: renesas: r9a09g011: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87ikh88o8k.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Kuninori Morimoto
1342f314c4 arm64: dts: renesas: r9a08g045: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87jz1o8o8o.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:45 +01:00
Kuninori Morimoto
3a7b120a64 arm64: dts: renesas: r9a07g054: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87ldm48o8s.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
7f65d9f7c9 arm64: dts: renesas: r9a07g044: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87ms6k8o8x.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
3fbaac745e arm64: dts: renesas: r9a07g043u: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87o6r08o91.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
26564e4031 arm64: dts: renesas: r8a779h0: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87plbg8o96.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
7ba09f8b44 arm64: dts: renesas: r8a779g0: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87qzvw8o9v.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
c0811acfa6 arm64: dts: renesas: r8a779f0: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87segc8oa0.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
e82821bf63 arm64: dts: renesas: r8a779a0: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87tt0s8oa4.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
ac0db59999 arm64: dts: renesas: r8a77995: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87v7l88oa9.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
31fe8ed6ea arm64: dts: renesas: r8a77990: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87wm5o8oae.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
ee578ced71 arm64: dts: renesas: r8a77980: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87y0q48oaj.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
26bed1290c arm64: dts: renesas: r8a77970: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87zfak8oap.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
7b71ddcf5d arm64: dts: renesas: r8a77965: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/871pnwa2vb.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
7b76c923f5 arm64: dts: renesas: r8a77961: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87348ca2vi.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
157acc5043 arm64: dts: renesas: r8a77960: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/874issa2vn.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
e281af623d arm64: dts: renesas: r8a77951: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/875xd8a2vs.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
a17efe3ea0 arm64: dts: renesas: r8a774e1: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/877bxoa2vy.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
734b69c420 arm64: dts: renesas: r8a774c0: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/878qi4a2w4.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
9ac98796cd arm64: dts: renesas: r8a774b1: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87a52ka2wa.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Kuninori Morimoto
ef3db38c5d arm64: dts: renesas: r8a774a1: Move interrupt-parent to root node
Move the "interrupt-parent = <&gic>" property from the soc node to the
root node, and simplify "interrupts-extended = <&gic ...>" to
"interrupts = <...>".

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87bjn0a2wk.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Hai Pham
ad142a4ef7 arm64: dts: renesas: r8a78000: Add initial Ironhide board support
Add initial support for the Renesas Ironhide board, which is based on
the R-Car X5H (R8A78000) SoC.

Signed-off-by: Hai Pham <hai.pham.ud@renesas.com>
Signed-off-by: Vinh Nguyen <vinh.nguyen.xz@renesas.com>
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
Signed-off-by: Khanh Le <khanh.le.xr@renesas.com>
Signed-off-by: Huy Bui <huy.bui.wm@renesas.com>
Signed-off-by: Phong Hoang <phong.hoang.wz@renesas.com>
[Kuninori: tidyup for upstreaming]
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87ecrwa3fj.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Hai Pham
63500d12cf arm64: dts: renesas: Add R8A78000 SoC support
Add initial support for the Renesas R-Car X5H (R8A78000) SoC.

Signed-off-by: Hai Pham <hai.pham.ud@renesas.com>
Signed-off-by: Vinh Nguyen <vinh.nguyen.xz@renesas.com>
Signed-off-by: Minh Le <minh.le.aj@renesas.com>
Signed-off-by: Huy Bui <huy.bui.wm@renesas.com>
Signed-off-by: Khanh Le <khanh.le.xr@renesas.com>
Signed-off-by: Phong Hoang <phong.hoang.wz@renesas.com>
[Kuninori: tidyup for upstreaming]
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/87frcca3fn.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-28 09:23:44 +01:00
Fenglin Wu
74c2c1e0d0 arm64: dts: qcom: x1e80100-crd: Add charge limit nvmem
Add nvmem cells for getting charge control thresholds if they have
been set previously.

Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on Thinkpad T14S OLED
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Fenglin Wu <fenglin.wu@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250919-qcom_battmgr_update_new-v6-1-ed5c38867614@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 22:45:39 -05:00
Loic Poulain
67445dc8a8 arm64: dts: qcom: qcm2290: Fix camss register prop ordering
The qcm2290 CAMSS node has been applied from the V4 series, but a later
version changed the order of the register property, fix it to prevent
dtb check error.

Fixes: 2b3aef30dd ("arm64: dts: qcom: qcm2290: Add CAMSS node")
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250918155456.1158691-1-loic.poulain@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 22:45:02 -05:00
Pankaj Patil
6d49c6ede8 dt-bindings: firmware: qcom,scm: Document Glymur scm
Document the SCM compatible for Qualcomm Glymur SoC.
Secure Channel Manager(SCM) is used to communicate
with secure firmware.

Signed-off-by: Pankaj Patil <pankaj.patil@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250918141738.2524269-1-pankaj.patil@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 22:43:51 -05:00
Yu Zhang(Yuriy)
7cb69f8970 arm64: dts: qcom: qcs615-ride: Set drive strength for wlan-en-state pin
Set the drive-strength to 16mA for gpio98 used as wlan-en-state in the
QCS615 ride platform device tree. This ensures sufficient output
strength for controlling the WLAN enable signal reliably.

Signed-off-by: Yu Zhang (Yuriy) <yu.zhang@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250918112729.3512516-1-yu.zhang@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 22:43:40 -05:00
Aleksandrs Vinarskis
b54c412b51 arm64: dts: qcom: sc8280xp-x13s: enable camera privacy indicator
Leverage newly introduced 'leds' and 'led-names' properties to pass
indicator's phandle and function to v4l2 subnode. The latter supports
privacy led since couple of years ago under 'privacy-led' designation.
Unlike initially proposed trigger-source based approach, this solution
cannot be easily bypassed from userspace, thus reducing privacy
concerns.

Signed-off-by: Aleksandrs Vinarskis <alex@vinarskis.com>
Tested-by: Steev Klimaszewski <threeway@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250910-leds-v5-4-bb90a0f897d5@vinarskis.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 22:39:34 -05:00
Manikanta Mylavarapu
d5e86096fe arm64: dts: qcom: ipq5424: add cooling maps for CPU thermal zones
Add cooling-maps to the cpu1, cpu2, and cpu3 thermal zones to associate
passive trip points with CPU cooling devices. This enables proper
thermal mitigation by allowing the thermal framework to throttle CPUs
based on temperature thresholds. Also, label the trip points to allow
referencing them in the cooling maps.

Signed-off-by: Manikanta Mylavarapu <quic_mmanikan@quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251023043838.1603673-1-quic_mmanikan@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 14:45:22 -05:00
Luca Weiss
c1a45887a3 arm64: dts: qcom: sm6350: Add interconnect support to UFS
Define the two NoC paths used by UFS: ufs-ddr and cpu-ufs.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://lore.kernel.org/r/20251023-sm6350-ufs-things-v3-3-b68b74e29d35@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:50:18 -05:00
Luca Weiss
06d262bcdb arm64: dts: qcom: sm6350: Add OPP table support to UFSHC
UFS host controller, when scaling gears, should choose appropriate
performance state of RPMh power domain controller along with clock
frequency. So let's add the OPP table support to specify both clock
frequency and RPMh performance states replacing the old "freq-table-hz"
property.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://lore.kernel.org/r/20251023-sm6350-ufs-things-v3-2-b68b74e29d35@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:50:18 -05:00
Luca Weiss
ec9d588391 arm64: dts: qcom: sm6350: Fix wrong order of freq-table-hz for UFS
During upstreaming the order of clocks was adjusted to match the
upstream sort order, but mistakently freq-table-hz wasn't re-ordered
with the new order.

Fix that by moving the entry for the ICE clk to the last place.

Fixes: 5a814af5fc ("arm64: dts: qcom: sm6350: Add UFS nodes")
Cc: stable@vger.kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://lore.kernel.org/r/20251023-sm6350-ufs-things-v3-1-b68b74e29d35@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:50:18 -05:00
Jingyi Wang
4648c70f2e soc: qcom: socinfo: Add SM8850 SoC ID
Add SoC ID for Qualcomm SM8850 which represents the Kaanapali platform.

Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251022-knp-socid-v2-2-d147eadd09ee@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:48:01 -05:00
Jingyi Wang
457129aa36 dt-bindings: arm: qcom,ids: Add SoC ID for SM8850
Add the ID for the Qualcomm SM8850 SoC which represents the Kaanapali
platform.

Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20251022-knp-socid-v2-1-d147eadd09ee@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:48:01 -05:00
Melody Olvera
f5474a3408 arm64: defconfig: Add M31 eUSB2 PHY config
The Qualcomm SM8750 SoCs use an eUSB2 PHY driver different from the
already existing M31 USB driver because it requires a connection
to an eUSB2 repeater. Thus, for USB to probe and work properly on
the Qualcomm SM8750 SoCs, enable the additional driver.

Signed-off-by: Melody Olvera <melody.olvera@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251021-knp-usb-v2-4-a2809fffcfab@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:30:42 -05:00
Alexey Klimov
acb854eba8 arm64: dts: qcom: qrb2210-rb1: add HDMI/I2S audio playback support
Add sound node and aDSP-related pieces to enable HDMI+I2S audio playback
support on Qualcomm QR2210 RB1 board. That is the only sound output
supported for now.

The audio playback is verified using the following commands:
amixer -c0 cset iface=MIXER,name='SEC_MI2S_RX Audio Mixer MultiMedia1' 1
aplay -D hw:0,0 /usr/share/sounds/alsa/Front_Center.wav

Cc: Srinivas Kandagatla <srini@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Link: https://lore.kernel.org/r/20251022-rb1_hdmi_audio-v3-3-0d38f777a547@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:29:58 -05:00
Alexey Klimov
1fc3073156 arm64: dts: qcom: qcm2290: add LPASS LPI pin controller
Add the Low Power Audio SubSystem Low Power Island (LPASS LPI) pin
controller device node required for audio subsystem on Qualcomm
QRB2210 RB1. QRB2210 is based on qcm2290 which is based on sm6115.

While at this, also add description of lpi_i2s2 pins (active state)
required for audio playback via HDMI/I2S.

Cc: Srinivas Kandagatla <srini@kernel.org>
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Link: https://lore.kernel.org/r/20251022-rb1_hdmi_audio-v3-2-0d38f777a547@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:29:57 -05:00
Alexey Klimov
f8d21b5e4c arm64: dts: qcom: qcm2290: add APR and its services
Add APR (asynchronous packet router) node and its associated services
required to enable audio on QRB2210 RB1 board.

Cc: Srinivas Kandagatla <srini@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Link: https://lore.kernel.org/r/20251022-rb1_hdmi_audio-v3-1-0d38f777a547@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:29:57 -05:00
Joel Selvaraj
e19dc81263 arm64: dts: qcom: sdm845-shift-axolotl: fix touchscreen properties
The touchscreen properties previously upstreamed was based on downstream
touchscreen driver. We ended up adapting upstream edt_ft5x06 driver to
support the touchscreen controller used in this device. Update the
touchscreen properties to match with the upstream edt_ft5x06
driver.

Also, the touchscreen controller used in this device is ft5452 and not
fts8719. Fix the compatible string accordingly.

The wakeup-source property was removed as it prevents the touchscreen's
regulators and irq from being disabled when the device is suspended and
could lead to unexpected battery drain. Once low power mode and
tap-to-wake functionality is properly implemented and tested to be
working, we can add it back, if needed.

Signed-off-by: Joel Selvaraj <foss@joelselvaraj.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251021-shift-axolotl-fix-touchscreen-dts-v2-1-e94727f0aa7e@joelselvaraj.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 13:26:58 -05:00
David Heidelberg
ab9a2c821a arm64: dts: qcom: sdm845: Define guard pages within the rmtfs region
Use qcom,use-guard-pages property instead of polluting device-tree with
lower and upper rmtfs guard nodes.

No functional change intended.

cosmetic: set name the node rmtfs-region.

Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Tested-By: Paul Sajna <sajattack@postmarketos.org>
Link: https://lore.kernel.org/r/20251020-sdm845-use-guard-pages-v1-1-64d714f8bd73@ixit.cz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:31:20 -05:00
Guido Günther
720ebcc3e6 arm64: dts: qcom: sdm845-shift-axolotl: Drop address and size cells from panel
They're set in the parent to describe the panel's reg property already.

Fixes the

linux/arch/arm64/boot/dts/qcom/sdm845-shift-axolotl.dtb: panel@0 (visionox,rm69299-shift): '#address-cells', '#size-cells' do not match any of the regexes: '^pinctrl-[0-9]+$'

warning.

Signed-off-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251020-shift6mq-dt-v2-1-d8fc3ec71281@sigxcpu.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:30:51 -05:00
Konrad Dybcio
222c975e06 arm64: dts: qcom: sdx75: Fix the USB interrupt entry order
The DP and DM interrupts are expected to come in a different order.
Reorder them to align with bindings.

Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20251020-topic-sdx75_usb-v1-1-1a96d5de19c9@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:30:45 -05:00
Konrad Dybcio
4b6e99e488 arm64: dts: qcom: lemans: Align ethernet interconnect-names with schema
Reshuffle the entries to match the expected order.

Fixes the following warnings:

(qcom,sa8775p-ethqos): interconnect-names:0: 'cpu-mac' was expected
(qcom,sa8775p-ethqos): interconnect-names:1: 'mac-mem' was expected

Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251020-topic-lemans_eth_dt-v1-1-25f4532addb2@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:30:37 -05:00
Gaurav Kohli
8053174aac arm64: dts: qcom: lemans-evk: Enable AMC6821 fan controller
Enable AMC6821 fan controller for lemans-evk platform and
configure pwm polarity as inverted.

Signed-off-by: Gaurav Kohli <gaurav.kohli@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251017050256.987660-1-gaurav.kohli@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:29:29 -05:00
Tingguo Cheng
84ff999ae4 arm64: dts: qcom: lemans-pmics: enable rtc
Add RTC node, the RTC is controlled by PMIC device via spmi bus.

Signed-off-by: Tingguo Cheng <tingguo.cheng@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251017-add-rtc-for-lemans-v2-1-0aaf174b25b9@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:27:01 -05:00
Eric Gonçalves
42e56b53a1 arm64: dts: qcom: sm8250-samsung-common: correct reserved pins
The S20 series has additional reserved pins for the fingerprint sensor,
GPIO 20-23. Correct it by adding them into gpio-reserved-ranges.

Fixes: 6657fe9e9f ("arm64: dts: qcom: add initial support for Samsung Galaxy S20 FE")
Signed-off-by: Eric Gonçalves <ghatto404@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251016202129.226449-1-ghatto404@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:27:01 -05:00
Konrad Dybcio
6030fa0636 arm64: dts: qcom: sdm845-starqltechn: Fix i2c-gpio node name
Fix the following DT checker warning:

$nodename:0: 'i2c21' does not match '^i2c(@.+|-[a-z0-9]+)?$'

Fixes: 3a4600448b ("arm64: dts: qcom: sdm845-starqltechn: add display PMIC")
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251015-topic-starltechn_i2c_gpio-v1-1-6d303184ee87@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:27:00 -05:00
Rakesh Kota
bc303efddf arm64: dts: qcom: lemans-evk: Add resin key code for PMM8654AU
Update the PMM8654AU resin input code to KEY_VOLUMEDOWN
and enable it.

Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251015-add_pon_resin-v2-1-44e2e45de5f1@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:21:49 -05:00
Barnabás Czémán
2144f6d57d arm64: dts: qcom: Add Xiaomi Redmi 3S
Add initial support for Xiaomi Redmi 3S (land).

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
Link: https://lore.kernel.org/r/20251014-msm8937-v10-3-b3e8da82e968@mainlining.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:19:35 -05:00
Barnabás Czémán
1a61426728 dt-bindings: arm: qcom: Add Xiaomi Redmi 3S
Document Xiaomi Redmi 3S (land).
Add qcom,msm8937 for msm-id, board-id allow-list.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
Link: https://lore.kernel.org/r/20251014-msm8937-v10-2-b3e8da82e968@mainlining.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:18:42 -05:00
Dang Huynh
a829f6f2e8 arm64: dts: qcom: Add initial support for MSM8937
Add initial support for MSM8937 SoC.

Signed-off-by: Dang Huynh <danct12@riseup.net>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Co-developed-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
Link: https://lore.kernel.org/r/20251014-msm8937-v10-1-b3e8da82e968@mainlining.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:18:42 -05:00
Abel Vesa
9b21c3bd24 soc: qcom: ubwc: Add configuration Glymur platform
Describe the Universal Bandwidth Compression (UBWC) configuration
for the new Glymur platform.

Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251014-glymur-display-v2-7-ff935e2f88c5@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:17:45 -05:00
Luca Weiss
4158379863 arm64: defconfig: Enable configs for Fairphone 3, 4, 5 smartphones
Enable base options for MSM8953 and SM6350 SoCs and device-specific
options which are used on the Fairphone 3, Fairphone 4 and Fairphone 5.

* MSM8953 SoC
    MSM_GCC_8953, INTERCONNECT_QCOM_MSM8953
* Fairphone 3
    NFC_NXP_NCI*, TOUCHSCREEN_HIMAX_HX83112B, DRM_PANEL_HIMAX_HX83112B

* SM6350 SoC
    SM_CAMCC_6350, SM_DISPCC_6350, SM_GCC_6350, SM_GPUCC_6350,
    SM_VIDEOCC_6350, INTERCONNECT_QCOM_SM6350
* Fairphone 4
    DRM_PANEL_HIMAX_HX83112A

* Fairphone 5
    DRM_PANEL_RAYDIUM_RM692E5, TYPEC_MUX_PTN36502, INPUT_AW86927

* QCOM PMICs (used on multiple of the devices)
    BACKLIGHT_QCOM_WLED, MFD_QCOM_PM8008, REGULATOR_QCOM_PM8008,
    LEDS_CLASS_FLASH, LEDS_QCOM_FLASH

* USB audio offloading (enabled on Fairphone 4 & 5)
    SND_USB_AUDIO*, SND_SOC_USB, SND_SOC_QDSP6_USB, USB_XHCI_SIDEBAND

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251014-arm64-defconfig-fp345-v1-1-53fe1eeb598d@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:15:57 -05:00
Vladimir Zapolskiy
08ce527403 arm64: dts: qcom: sm8550-hdk: Add SM8550-HDK Rear Camera Card overlay
Lantronix SM8550-HDK board may be equipped with a Rear Camera Card PCB
which contains:
* Samsung S3K33D time-of-fligt image sensor connected to CSIPHY0 (TOF),
* Omnivision OV64B40 image sensor connected to CSIPHY1 (uWide),
* Sony IMX766 image sensor connected to CSIPHY2 (Wide),
* Samsung S5K3M5 image sensor connected to CSIPHY3 (Tele),
* two flash leds.

The change adds support of a Samsung S5K3M5 camera image sensor and
two flash leds on the external camera card module.

Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20251013235500.1883847-4-vladimir.zapolskiy@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:06:35 -05:00
Vladimir Zapolskiy
3f85737757 arm64: dts: qcom: sm8550-qrd: Enable CAMSS and S5K3M5 camera sensor
Enable CAMSS IP and Samsung S5K3M5 camera sensor on SM8550-QRD board.

Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20251013235500.1883847-3-vladimir.zapolskiy@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:06:35 -05:00
Vladimir Zapolskiy
78db965913 arm64: dts: qcom: sm8550: Add description of MCLK pins
Add MCLK pin descriptions for all pins with such supported function
on SM8550 SoC.

Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20251013235500.1883847-2-vladimir.zapolskiy@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:06:35 -05:00
Pengyu Luo
fb48d3f3ab arm64: dts: qcom: sc8280xp: Fix shifted GPI DMA channels
The GPI DMA channels in sc8280xp.dtsi are wrong. Let's fix it.

Origianl patch was rebased to the linux-next and formated to a new
patch again later, then it got the GPI DMA channels in the new patch
shifted.

Fixes: 71b12166a2 ("arm64: dts: qcom: sc8280xp: Describe GPI DMA controller nodes")
Signed-off-by: Pengyu Luo <mitltlatltl@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251013115506.103649-1-mitltlatltl@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:06:13 -05:00
Krishna Chaitanya Chundru
cfd8f45ddf arm64: dts: qcom: x1e80100: Add opp-level to indicate PCIe data rates
The existing OPP table for PCIe is shared across different link
configurations such as data rates 8GT/s x2 and 16GT/s x1. These
configurations often operate at the same frequency, allowing them
to reuse the same OPP entries. However, 8GT/s and 16 GT/s may have
different RPMh votes which cannot be represented accurately when
sharing a single OPP.

To address this, introduce an `opp-level` to indicate the PCIe data
rate and uniquely differentiate OPP entries even when the frequenc
is the same.

Although this platform does not currently suffer from this issue, the
change is introduced to support unification across platforms.

Append the opp level to name of the opp node to indicate both frequency
and level.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20251013-opp_pcie-v5-4-eb64db2b4bd3@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:05:54 -05:00
Krishna Chaitanya Chundru
860d514f09 arm64: dts: qcom: sm8650: Add opp-level to indicate PCIe data rates
The existing OPP table for PCIe is shared across different link
configurations such as data rates 8GT/s x2 and 16GT/s x1. These
configurations often operate at the same frequency, allowing them
to reuse the same OPP entries. However, 8GT/s and 16 GT/s may have
different RPMh votes which cannot be represented accurately when
sharing a single OPP.

To address this, introduce an `opp-level` to indicate the PCIe data
rate and uniquely differentiate OPP entries even when the frequenc
is the same.

Although this platform does not currently suffer from this issue, the
change is introduced to support unification across platforms.

Append the opp level to name of the opp node to indicate both frequency
and level.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20251013-opp_pcie-v5-3-eb64db2b4bd3@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:05:40 -05:00
Krishna Chaitanya Chundru
fc0ed54869 arm64: dts: qcom: sm8550: Add opp-level to indicate PCIe data rates
The existing OPP table for PCIe is shared across different link
configurations such as data rates 8GT/s x2 and 16GT/s x1. These
configurations often operate at the same frequency, allowing them
to reuse the same OPP entries. However, 8GT/s and 16 GT/s may have
different RPMh votes which cannot be represented accurately when
sharing a single OPP.

To address this, introduce an `opp-level` to indicate the PCIe data
rate and uniquely differentiate OPP entries even when the frequency
is the same.

Although this platform does not currently suffer from this issue, the
change is introduced to support unification across platforms.

Append the opp level to name of the opp node to indicate both frequency
and level.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20251013-opp_pcie-v5-2-eb64db2b4bd3@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:05:40 -05:00
Krishna Chaitanya Chundru
367c2f473f arm64: dts: qcom: sm8450: Add opp-level to indicate PCIe data rates
The existing OPP table for PCIe is shared across different link
configurations such as data rates 8GT/s x2 and 16GT/s x1. These
configurations often operate at the same frequency, allowing them
to reuse the same OPP entries. However, 8GT/s and 16 GT/s may have
different RPMh votes which cannot be represented accurately when
sharing a single OPP.

To address this, introduce an `opp-level` to indicate the PCIe data
rate and uniquely differentiate OPP entries even when the frequency
is the same.

Although this platform does not currently suffer from this issue, the
change is introduced to support unification across platforms.

Append the opp level to name of the opp node to indicate both frequency
and level.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20251013-opp_pcie-v5-1-eb64db2b4bd3@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 12:05:40 -05:00
Val Packett
147d5eefab arm64: dts: qcom: x1-dell-thena: remove dp data-lanes
The commit 458de58424 ("arm64: dts: qcom: x1e80100: move dp0/1/2
data-lanes to SoC dtsi") has landed before this file was added, so
the data-lanes lines here remained.

Remove them to enable 4-lane DP on the X1E Dell Inspiron/Latitude.

Fixes: e7733b4211 ("arm64: dts: qcom: Add support for Dell Inspiron 7441 / Latitude 7455")
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Val Packett <val@packett.cool>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251012224909.14988-1-val@packett.cool
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:49:40 -05:00
Val Packett
1bdfe3edd4 arm64: dts: qcom: x1-dell-thena: Add missing pinctrl for eDP HPD
The commit a41d23142d ("arm64: dts: qcom: x1e80100-dell-xps13-9345:
Add missing pinctrl for eDP HPD") has applied this change to a very
similar machine, so apply it here too.

This allows us not to rely on the boot firmware to set up the pinctrl
for the eDP HPD line of the internal display.

Fixes: e7733b4211 ("arm64: dts: qcom: Add support for Dell Inspiron 7441 / Latitude 7455")
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Val Packett <val@packett.cool>
Link: https://lore.kernel.org/r/20251012224706.14311-1-val@packett.cool
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:49:21 -05:00
Konrad Dybcio
5b5014f667 arm64: dts: qcom: x1e80100: Move CPU idle states to their respective PSCI PDs
To make things uniform with other Qualcomm platforms, move the CPU idle
states under their PSCI power domains. No functional change.

Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251010-topic-x1e_dt_idle-v1-1-b1c8d558e635@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:48:36 -05:00
Le Qi
210d525d9c arm64: dts: qcom: hamoa-iot-evk: Fix 4-speaker playback support
On the HAMOA-IOT-EVK board only 2 out of 4 speakers were functional.
Unlike the CRD, which shares a single GPIO reset line for WSA1/2,
this board provides a dedicated GPIO reset for each WSA, resulting
in 4 separate reset lines.

Update the device tree accordingly so that all 4 speakers can
playback audio as expected.

Signed-off-by: Le Qi <le.qi@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251010033728.1808133-1-le.qi@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:47:05 -05:00
Konrad Dybcio
752c3765a9 arm64: dts: qcom: x1e80100: Describe the full 'link' region of DP hosts
The regions are larger than currently described. Rather inconveniently,
some control registers, including some related to USB4, are in that
left-out chunk.

Extend it to cover the entire region, as per the hw specification.

Fixes: 1940c25eaa ("arm64: dts: qcom: x1e80100: Add display nodes")
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20251009-topic-hamoa_dp_reg-v1-1-4c70afa5f029@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:56 -05:00
Wojciech Slenska
89e4902ac7 arm64: dts: qcom: qcm2290: Add uart1 and uart5 nodes
Add nodes to support uart1 and uart5.

Signed-off-by: Wojciech Slenska <wojciech.slenska@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251009090858.32911-1-wojciech.slenska@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:32 -05:00
Wojciech Slenska
cdf9756037 arm64: dts: qcom: qcm2290: Fix uart3 QUP interconnect
The config_noc interconnect should use SLAVE_QUP_0.

Signed-off-by: Wojciech Slenska <wojciech.slenska@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251009090718.32503-1-wojciech.slenska@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:23 -05:00
Alexander Martinz
6e66efe16b arm64: dts: qcom: qcm6490-shift-otter: Enable venus node
Enable the venus node so that the video encoder/decoder will start
working.

Signed-off-by: Alexander Martinz <amartinz@shiftphones.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251009-otter-further-bringup-v2-6-5bb2f4a02cea@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:12 -05:00
Luca Weiss
66e7483966 arm64: dts: qcom: qcm6490-shift-otter: Enable RGB LED
Enable the RGB LED connected to the PM7350C (PM8350C).

Signed-off-by: Luca Weiss <luca@lucaweiss.eu>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251009-otter-further-bringup-v2-5-5bb2f4a02cea@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:12 -05:00
Casey Connolly
2fd302ea31 arm64: dts: qcom: qcm6490-shift-otter: Enable flash LED
Describe the flash LED on this phone.

Signed-off-by: Casey Connolly <casey.connolly@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Link: https://lore.kernel.org/r/20251009-otter-further-bringup-v2-4-5bb2f4a02cea@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:12 -05:00
Alexander Martinz
f404fdcb50 arm64: dts: qcom: qcm6490-shift-otter: Add missing reserved-memory
It seems we also need to reserve a region of 81 MiB called "removed_mem"
otherwise we can easily hit memory errors with higher RAM usage.

Fixes: 249666e34c ("arm64: dts: qcom: add QCM6490 SHIFTphone 8")
Signed-off-by: Alexander Martinz <amartinz@shiftphones.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251009-otter-further-bringup-v2-3-5bb2f4a02cea@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:12 -05:00
Alexander Martinz
a206ee34db arm64: dts: qcom: qcm6490-shift-otter: Remove thermal zone polling delays
As with all other devices in commit 7747a49db7 ("arm64: dts: qcom:
sc7280-*: Remove thermal zone polling delays"), apply the same change to
this device as the delays are assumed to be equal to "0" if not set.

Signed-off-by: Alexander Martinz <amartinz@shiftphones.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251009-otter-further-bringup-v2-2-5bb2f4a02cea@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:12 -05:00
Alexander Martinz
03eb18495d arm64: dts: qcom: qcm6490-shift-otter: Fix sorting and indentation
Make sure the nodes are sorted correctly, and the indentation is
correct.

Signed-off-by: Alexander Martinz <amartinz@shiftphones.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251009-otter-further-bringup-v2-1-5bb2f4a02cea@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:46:12 -05:00
Krzysztof Kozlowski
94c3d824c5 arm64: defconfig: Enable two Novatek display panels for MTP8750 and Tianma
Enable drivers for Novatek NT36672A panel (used on Xiaomi Pocophone F1
Tianma, Qualcomm SDM845 SoC) and Novatek NT37801 panel (used on Qualcomm
MTP8750 development board).

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20251009021507.175290-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:43:05 -05:00
Erikas Bitovtas
42621cbb3a arm64: dts: qcom: msm8939-asus-z00t: add initial device tree
Add an initial device tree for Asus ZenFone 2 Laser/Selfie. This
includes support for:
- UART
- USB
- Internal storage
- MicroSD
- Volume keys
- Touchscreen: Focaltech FT5306
- Accelerometer: Invensense MPU6515
- Magnetometer: Asahi Kasei AK09911
- Vibrator
- Audio input and output
- Modem

Signed-off-by: Erikas Bitovtas <xerikasxx@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251008182106.217340-3-xerikasxx@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:42:26 -05:00
Erikas Bitovtas
bc42d98593 dt-bindings: arm: qcom: Add Asus ZenFone 2 Laser/Selfie
Add a compatible for Asus ZenFone 2 Laser/Selfie (1080p)

Signed-off-by: Erikas Bitovtas <xerikasxx@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20251008182106.217340-2-xerikasxx@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:42:26 -05:00
Jingzhou Zhu
aab69794b5 arm64: dts: qcom: Add support for Huawei MateBook E 2019
Add device tree for Huawei MateBook E 2019, which is a 2-in-1 tablet based
on Qualcomm's sdm850 platform.

Supported features:
 - ADSP, CDSP and SLPI
 - Volume Key
 - Power Key
 - Tablet Mode Switching
 - Display
 - Touchscreen
 - Stylus
 - WiFi [1]
 - Bluetooth [2]
 - GPU
 - USB
 - Keyboard
 - Touchpad
 - UFS
 - SD Card
 - Audio (right internal mic and headphone mic not working)
 - Mobile Network

[1] WiFi probing log:
ath10k_snoc 18800000.wifi: Adding to iommu group 12
ath10k_snoc 18800000.wifi: qmi chip_id 0x30214 chip_family 0x4001 board_id 0xff soc_id 0x40030001
ath10k_snoc 18800000.wifi: qmi fw_version 0x2009856b fw_build_timestamp 2018-07-19 12:28 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HL.2.0-01387-QCAHLSWMTPLZ-1
ath10k_snoc 18800000.wifi: wcn3990 hw1.0 target 0x00000008 chip_id 0x00000000 sub 0000:0000
ath10k_snoc 18800000.wifi: kconfig debug 1 debugfs 1 tracing 1 dfs 0 testmode 0
ath10k_snoc 18800000.wifi: firmware ver  api 5 features wowlan,mgmt-tx-by-reference,non-bmi crc32 b3d4b790
ath10k_snoc 18800000.wifi: htt-ver 3.53 wmi-op 4 htt-op 3 cal file max-sta 32 raw 0 hwcrypto 1
ath10k_snoc 18800000.wifi: invalid MAC address; choosing random

[2] Bluetooth probing log:
Bluetooth: hci0: setting up wcn399x
Bluetooth: hci0: QCA Product ID   :0x0000000a
Bluetooth: hci0: QCA SOC Version  :0x40010214
Bluetooth: hci0: QCA ROM Version  :0x00000201
Bluetooth: hci0: QCA Patch Version:0x00000001
Bluetooth: hci0: QCA controller version 0x02140201
Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
Bluetooth: hci0: QCA Downloading qca/crnv21.bin
Bluetooth: hci0: QCA setup on UART is completed

Features not supported yet:
 - Panel Backlight
 - Lid Detection
 - Battery
 - EFI Variable Access
 - Cameras

1. Panel backlight, lid detection and battery will be supported with the
   EC driver upstreamed.
2. EFI variables can only be read with the QSEECOM driver, and will be
   enabled when the driver is fixed.
3. Cameras are tested to work with modified downstream driver, and once
   drivers for these camera modules are included in the tree, cameras can
   be enabled.

Features won't be supported:
 - External Display
 - Fingerprint

1. To make external display work, more reverse engineering may be required,
   but it's beyond my ability.
2. Fingerprint is controlled by TrustZone, meaning direct access to it
   isn't possible.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Jingzhou Zhu <newwheatzjz@zohomail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251008130052.11427-3-newwheatzjz@zohomail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:31:25 -05:00
Jingzhou Zhu
bfc5cabaa4 dt-bindings: arm: qcom: Document Huawei MateBook E 2019
Add compatible for the sdm850-based tablet Huawei MateBook E 2019 using
its codename "planck".

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jingzhou Zhu <newwheatzjz@zohomail.com>
Link: https://lore.kernel.org/r/20251008130052.11427-2-newwheatzjz@zohomail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:31:24 -05:00
Krishna Chaitanya Chundru
cc8056a164 arm64: dts: qcom: sm8750-mtp: move PCIe GPIOs to pcieport0 node
Relocate the wake-gpios and perst-gpios properties from the pcie0
controller node to the pcieport0 node. These GPIOs are associated with
the PCIe root port and should reside under the pcieport0 node.

Also rename perst-gpios to reset-gpios to match the expected property name
in the PCIe port node.

Fixes: 141714e163 ("arm64: dts: qcom: sm8750-mtp: Add WiFi and Bluetooth")
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20251008-sm8750-v1-1-daeadfcae980@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:30:46 -05:00
Neil Armstrong
c2703c9016 arm64: dts: qcom: sm8650: set ufs as dma coherent
The UFS device is ovbiously dma coherent like the other IOMMU devices
like usb, mmc, ... let's fix this by adding the flag.

To be sure an extensive test has been performed to be sure it's
safe, as downstream uses this flag for UFS as well.

As an experiment, I checked how the dma-coherent could impact
the UFS bandwidth, and it happens the max bandwidth on cached
write is slighly highter (up to 10%) while using less cpu time
since cache sync/flush is skipped.

Fixes: 10e0246712 ("arm64: dts: qcom: sm8650: add interconnect dependent device nodes")
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251007-topic-sm8650-upstream-ufs-dma-coherent-v1-1-f3cfeaee04ce@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:30:29 -05:00
Luca Weiss
037f0f59bb arm64: dts: qcom: sm7325-nothing-spacewar: Use correct compatible for audiocc
Use the correct compatible for this phone with standard Qualcomm
firmware and remove references to power-domains from a 'reserved' node.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250930-sc7280-dts-misc-v1-5-5a45923ef705@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:28:50 -05:00
Luca Weiss
1a3051614f arm64: dts: qcom: qcm6490-shift-otter: Use correct compatible for audiocc
Use the correct compatible for this phone with standard Qualcomm
firmware and remove references to power-domains from a 'reserved' node.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250930-sc7280-dts-misc-v1-4-5a45923ef705@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:28:50 -05:00
Luca Weiss
c207f5319d arm64: dts: qcom: qcm6490-fairphone-fp5: Use correct compatible for audiocc
Use the correct compatible for this phone with standard Qualcomm
firmware and remove references to power-domains from a 'reserved' node.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250930-sc7280-dts-misc-v1-3-5a45923ef705@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:28:50 -05:00
Luca Weiss
99dc57012d arm64: dts: qcom: qcm6490-fairphone-fp5: Add VTOF_LDO_2P8 regulator
Describe yet another regulator-fixed on this board, powering the ToF
sensor.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://lore.kernel.org/r/20250930-sc7280-dts-misc-v1-2-5a45923ef705@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:28:50 -05:00
Luca Weiss
3d4142cac4 arm64: dts: qcom: qcm6490-fairphone-fp5: Add supplies to simple-fb node
Add the OLED power supplies to the simple-framebuffer node, so that
the regulators don't get turned off while the simple-fb is being used.

Fixes: c365a02615 ("arm64: dts: qcom: qcm6490-fairphone-fp5: Enable display")
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250930-sc7280-dts-misc-v1-1-5a45923ef705@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:28:50 -05:00
Luca Weiss
8b4faf419d arm64: dts: qcom: sm8250: Add MDSS_CORE reset to mdss
Like on other platforms, if the OS does not support recovering the state
left by the bootloader it needs access to MDSS_CORE, so that it can
clear the MDSS configuration. Add a reference to the relevant reset.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> # RB5
Link: https://lore.kernel.org/r/20250930-sm8250-mdss-reset-v1-1-a64522d91f12@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:28:03 -05:00
Xilin Wu
ef254b12ec arm64: dts: qcom: qcs6490: Introduce Radxa Dragon Q6A
Radxa Dragon Q6A is a single board computer, based on the Qualcomm
QCS6490 platform.

Features enabled and working:

- Configurable I2C/SPI/UART from 40-Pin GPIO
- Three USB-A 2.0 ports
- RTL8111K Ethernet connected to PCIe0
- eMMC module
- SD card
- M.2 M-Key 2230 PCIe 3.0 x2
- Headphone jack
- Onboard thermal sensors
- QSPI controller for updating boot firmware
- ADSP remoteproc (Type-C and charging features disabled in firmware)
- CDSP remoteproc (for AI applications using QNN)
- Venus video encode and decode accelerator

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Xilin Wu <sophon@radxa.com>
Link: https://lore.kernel.org/r/20250929-radxa-dragon-q6a-v5-2-aa96ffc352f8@radxa.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:27:25 -05:00
Xilin Wu
8388ebac98 dt-bindings: arm: qcom: Add Radxa Dragon Q6A
Radxa Dragon Q6A is a single board computer, based on the Qualcomm
QCS6490 platform.

Document the top-level compatible for this board.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Xilin Wu <sophon@radxa.com>
Link: https://lore.kernel.org/r/20250929-radxa-dragon-q6a-v5-1-aa96ffc352f8@radxa.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:27:25 -05:00
Aleksandrs Vinarskis
c2ca1cc0d4 arm64: dts: qcom: x1e80100-asus-zenbook-a14: Enable WiFi, Bluetooth
Unlike UX3407QA with WCN6855, UX3407RA comes with WCN7850. Definitions
were not added during initial bringup due to lack of hardware to test
it. Add missing definitions that were now confirmed to work.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Aleksandrs Vinarskis <alex@vinarskis.com>
Link: https://lore.kernel.org/r/20250927-zenbook-improvements-v3-3-d46c7368dc70@vinarskis.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:20:49 -05:00
Aleksandrs Vinarskis
462b39931c arm64: dts: qcom: Rework X1-based Asus Zenbook A14's displays
The laptop comes in two variants:

* UX3407RA, higher end, FHD+ OLED or WOXGA+ OLED panels
* UX3407QA, lower end, FHD+ OLED or FHD+ LCD panels

Even though all three panels work with "edp-panel", unfortunately the
brightness adjustmenet of LCD panel is PWM based, requiring a dedicated
device-tree. Convert "x1p42100-asus-zenbook-a14.dts" into ".dtsi" to
allow for this split, introduce new LCD variant. Leave current variant
without postfix and with the unchanged model name, as some distros
(eg. Ubuntu) rely on this for automatic device-tree detection during
kernel installation/upgrade.

As dedicated device-tree is required, update compatibles of OLED
variants to correct ones. Keep "edp-panel" as fallback, since it is
enough to make the panels work.

While at it moving .dts, .dtsi around, drop 'model' from the top level
x1-asus-zenbook-a14.dtsi as well.

Co-developed-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz>
Signed-off-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Aleksandrs Vinarskis <alex@vinarskis.com>
Link: https://lore.kernel.org/r/20250927-zenbook-improvements-v3-2-d46c7368dc70@vinarskis.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:20:49 -05:00
Aleksandrs Vinarskis
1ade4b89d8 dt-bindings: arm: qcom: Add Asus Zenbook A14 UX3407QA LCD/OLED variants
X1/X1 Plus variant of the said device comes in either FHD+ OLED or FHD+
LCD panel, and shares the same model number UX3407QA. It appears LCD
panel's brightness adjustment is PWM backlight controlled, so a
dedicated device-tree is required. Introduce dedicated compatibles with
fallback to 'asus,zenbook-a14-ux3407qa' as they are otherwise the same.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Aleksandrs Vinarskis <alex@vinarskis.com>
Link: https://lore.kernel.org/r/20250927-zenbook-improvements-v3-1-d46c7368dc70@vinarskis.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:20:12 -05:00
Gergo Koteles
d7ec7d3423 arm64: dts: qcom: sdm845-oneplus: Correct gpio used for slider
The previous GPIO numbers were wrong. Update them to the correct
ones and fix the label.

Fixes: 288ef8a426 ("arm64: dts: sdm845: add oneplus6/6t devices")
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250927-slider-correct-v1-1-fb8cc7fdcedf@ixit.cz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:19:33 -05:00
Dzmitry Sankouski
4372b15d89 arm64: dts: qcom: sdm845-starqltechn: fix max77705 interrupts
Since max77705 has a register, which indicates interrupt source, it acts
as an interrupt controller.

Direct MAX77705's subdevices to use the IC's internal interrupt
controller, instead of listening to every interrupt fired by the
chip towards the host device.

Fixes: 7a88a931d0 ("arm64: dts: qcom: sdm845-starqltechn: add max77705 PMIC")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250926-starqltechn-correct_max77705_nodes-v5-2-c6ab35165534@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:19:03 -05:00
Dzmitry Sankouski
4133486382 arm64: dts: qcom: sdm845-starqltechn: remove (address|size)-cells
Drop the unused address/size-cells properties to silence the DT
checker warning:

pmic@66 (maxim,max77705): '#address-cells', '#size-cells' do not
match any of the regexes: '^pinctrl-[0-9]+$'

Fixes: 7a88a931d0 ("arm64: dts: qcom: sdm845-starqltechn: add max77705 PMIC")
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250926-starqltechn-correct_max77705_nodes-v5-1-c6ab35165534@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:19:00 -05:00
Griffin Kroah-Hartman
ebb14a39c0 arm64: dts: qcom: qcm6490-fairphone-fp5: Add vibrator support
Add the required node for haptic playback (Awinic AW86927).

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Griffin Kroah-Hartman <griffin.kroah@fairphone.com>
Link: https://lore.kernel.org/r/20250925-aw86927-v3-3-1fc6265b42de@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:05:02 -05:00
Viken Dadhaniya
30b5167b80 arm64: dts: qcom: monaco-evk: Add firmware-name to QUPv3 nodes
Traditionally, firmware loading for Serial Engines (SE) in the QUP hardware
of Qualcomm SoCs has been managed by TrustZone (TZ). While this approach
ensures secure SE assignment and access control, it limits flexibility for
developers who need to enable various protocols on different SEs.

Add the firmware-name property to QUPv3 nodes in the device tree to enable
firmware loading from the Linux environment. Handle SE assignments and
access control permissions directly within Linux, removing the dependency
on TrustZone.

Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250925042605.1388951-1-viken.dadhaniya@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 11:01:48 -05:00
Viken Dadhaniya
3f9fa03b7e arm64: dts: qcom: lemans-evk: Add firmware-name to QUPv3 nodes
Traditionally, firmware loading for Serial Engines (SE) in the QUP hardware
of Qualcomm SoCs has been managed by TrustZone (TZ). While this approach
ensures secure SE assignment and access control, it limits flexibility for
developers who need to enable various protocols on different SEs.

Add the firmware-name property to QUPv3 nodes in the device tree to enable
firmware loading from the Linux environment. Handle SE assignments and
access control permissions directly within Linux, removing the dependency
on TrustZone.

Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Acked-by: Mukesh Kumar Savaliya <mukesh.savaliya@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250924035409.3976652-1-viken.dadhaniya@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:57:40 -05:00
Viken Dadhaniya
5af8a9e843 arm64: dts: qcom: qcs6490-rb3gen2: Add firmware-name to QUPv3 nodes
Traditionally, firmware loading for Serial Engines (SE) in the QUP hardware
of Qualcomm SoCs has been managed by TrustZone (TZ). While this approach
ensures secure SE assignment and access control, it limits flexibility for
developers who need to enable various protocols on different SEs.

Add the firmware-name property to QUPv3 nodes in the device tree to enable
firmware loading from the Linux environment. Handle SE assignments and
access control permissions directly within Linux, removing the dependency
on TrustZone.

Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250923161107.3541698-1-viken.dadhaniya@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:57:31 -05:00
Jonathan Albrieux
b91f5d73b6 arm64: dts: qcom: msm8916-longcheer-l8910: Add touchscreen
The BQ Aquaris X5 (Longcheer L8910) has a Himax HX852x-ES touchscreen,
which can now be described with the bindings recently added to linux-next.
Add it to the device tree to allow using the touchscreen.

Signed-off-by: Jonathan Albrieux <jonathan.albrieux@gmail.com>
Co-developed-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250919-msm8916-l8910-touchscreen-v1-1-c46e56ec0a3b@gerhold.net
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:48:31 -05:00
Rakesh Kota
172ca2d802 arm64: dts: qcom: qcs6490-rb3gen2: Update regulator settings
Update min/max voltage settings for regulators below to align
with the HW specification
vreg_l3b_0p504
vreg_l6b_1p2
vreg_l11b_1p504
vreg_l14b_1p08
vreg_l16b_1p1
vreg_l17b_1p7
vreg_s1c_2p19
vreg_l8c_1p62
vreg_l9c_2p96
vreg_l12c_1p65.

While at it, remove RPMH regulator rails (listed below) as
these are not to be used on APPS, and any client accidently
voting on it can potentially cause issues.
vreg_s2b_0p876
vreg_s2c_0p752
vreg_s5c_0p752
vreg_s7c_0p752
vreg_s10c_0p752
vreg_l4b_0p752
vreg_l5b_0p752.

Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250919-b4-rb3gen2-update-regulator-v1-1-1ea9e70d01cb@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:48:25 -05:00
Luca Weiss
f618fef3f1 arm64: dts: qcom: sm6350: Add MDSS_CORE reset to mdss
Like on other platforms, if the OS does not support recovering the state
left by the bootloader it needs access to MDSS_CORE, so that it can
clear the MDSS configuration. Add a reference to the relevant reset.

This also fixes display init on Linux v6.17.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250919-sm6350-mdss-reset-v1-3-48dcac917c73@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:47:45 -05:00
Tiwei Bie
aa3e6faf62 um: Enable SMP support on x86
Implement spinlock support for SMP on UML/x86, leveraging x86's
spinlock implementation. In addition, to support SMP on CPUs that
do not support CX8, some additional work is required. However,
considering that such CPUs are already very outdated, and x86 is
also removing support for them [1], let's enable SMP support only
on CPUs that support CX8.

[1] https://lore.kernel.org/lkml/20250515085708.2510123-1-mingo@kernel.org/

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-9-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:41:53 +01:00
Tiwei Bie
8d74895527 asm-generic: percpu: Add assembly guard
Currently, asm/percpu.h is directly or indirectly included by
some assembly files on x86. Some of them (e.g., checksum_32.S)
are also used on um. But x86 and um provide different versions
of asm/percpu.h -- um uses asm-generic/percpu.h directly.

When SMP is enabled, asm-generic/percpu.h will introduce C code
that cannot be assembled. Since asm-generic/percpu.h currently
is not designed for use in assembly, and these assembly files
do not actually need asm/percpu.h on um, let's add the assembly
guard in asm-generic/percpu.h to fix this issue.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20251027001815.1666872-8-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:41:53 +01:00
Tiwei Bie
37f847b794 um: vdso: Remove getcpu support on x86
We are going to support SMP on UML/x86, so we can't hard code
the CPU and NUMA node in __vdso_getcpu() anymore. Let's just
remove it and let applications fall back to the syscall.

Suggested-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-7-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:41:53 +01:00
Tiwei Bie
1e4ee5135d um: Add initial SMP support
Add initial symmetric multi-processing (SMP) support to UML. With
this support enabled, users can tell UML to start multiple virtual
processors, each represented as a separate host thread.

In UML, kthreads and normal threads (when running in kernel mode)
can be scheduled and executed simultaneously on different virtual
processors. However, the userspace code of normal threads still
runs within their respective single-threaded stubs.

That is, SMP support is currently available both within the kernel
and across different processes, but still remains limited within
threads of the same process in userspace.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-6-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:41:15 +01:00
Tiwei Bie
9c82de55d4 um: Define timers on a per-CPU basis
Define timers on a per-CPU basis to enable each CPU to have its
own timer. This is a preparation for adding SMP support.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-5-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:41:15 +01:00
Tiwei Bie
2670917c2f um: Determine sleep based on need_resched()
With SMP and NO_HZ enabled, the CPU may still need to sleep even
if the timer is disarmed. Switch to deciding whether to sleep based
on pending resched. Additionally, because disabling IRQs does not
block SIGALRM, it is also necessary to check for any pending timer
alarms. This is a preparation for adding SMP support.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-4-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:41:15 +01:00
Tiwei Bie
9e5a9f1c9b um: Turn signals_* into thread-local variables
Turn signals_enabled, signals_pending and signals_active into
thread-local variables. This enables us to control and track
signals independently on each CPU thread. This is a preparation
for adding SMP support.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-3-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:41:15 +01:00
Tiwei Bie
6aaf00d14e um: Do not disable kmalloc in initial_thread_cb()
Currently, initial_thread_cb() temporarily disables kmalloc when
it invokes the callback, allowing the callback to bypass kmalloc.
This is unnecessary for the current users of initial_thread_cb(),
and we should avoid memory allocations that are not under the
control of the UML kernel. Therefore, let's stop temporarily
disabling kmalloc in initial_thread_cb().

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027001815.1666872-2-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:41:14 +01:00
Tiwei Bie
a7f7dbae94 um: Remove file-based iomem emulation support
The file-based iomem emulation was introduced to support writing
paravirtualized drivers based on emulated iomem regions. However,
the only driver that makes use of it is an example driver called
mmapper, which was written over two decades ago.

We now have several modern device emulation mechanisms, such as
vhost-user-based virtio-uml. Remove the file-based iomem emulation
support to reduce the maintenance burden.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027054519.1996090-5-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:37:12 +01:00
Tiwei Bie
9c84022c1d um: Replace UML_ROUND_UP() with PAGE_ALIGN()
Although UML_ROUND_UP() is defined in a shared header file, it
depends on the PAGE_SIZE and PAGE_MASK macros, so it can only be
used in kernel code. Considering its name is not very clear and
its functionality is the same as PAGE_ALIGN(), replace its usages
with a direct call to PAGE_ALIGN() and remove it.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027054519.1996090-4-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:37:12 +01:00
Tiwei Bie
de20326748 um: Use PAGE_ALIGN() for address alignment
Use PAGE_ALIGN() instead of open-coded calculations.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027054519.1996090-3-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:37:12 +01:00
Tiwei Bie
691ff59148 um: Make host_task_size a local variable
Currently, host_task_size is a global variable, but it is only used
in linux_main() to compute stub_start and task_size. Make it a local
variable to limit its scope to where it is actually needed.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20251027054519.1996090-2-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 16:37:12 +01:00
Tingguo Cheng
4654433409 arm64: dts: qcom: qcs8300-pmics: Remove 'allow-set-time' property
Remove the 'allow-set-time' property from the rtc node because APPS
is prohibited from setting the hardware RTC time.

Signed-off-by: Tingguo Cheng <tingguo.cheng@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250926-remove-rtc-allow-set-time-v1-1-76895a666939@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:13:00 -05:00
Dmitry Baryshkov
9a5b294dcc arm64: dts: qcom: rename sm6150 to talos
SM6150 and QCS615 are two names for the same die, collectively known as
'talos'. Follow the example of other platforms and rename SM6150 to
talos.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250923-rename-dts-v1-3-21888b68c781@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:11:52 -05:00
Dmitry Baryshkov
8c0b058ab5 arm64: dts: qcom: rename x1e80100 to hamoa
The X1E80100 and several other similar names (X1E78100, X1E001DE) all
belong to the platform now known as 'hamoa'. Follow the example of
'lemans' and rename the x1e80100.dtsi to hamoa.dtsi and
x1e80100-pmics.dtsi to hamoa-pmics.dtsi.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250923-rename-dts-v1-2-21888b68c781@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:11:52 -05:00
Dmitry Baryshkov
e50e601ef5 arm64: dts: qcom: rename qcs8300 to monaco
The QCS8300 and QCS8275 are two variants of the same die with no
difference visible to the Linux kernel, which are collectively named as
'monaco'. Rather than trying to using the name, which is not always
relevant, follow the example of 'lemans' and rename qcs8300.dtsi to
monaco.dtsi (and qcs8300-pmics.dtsi to monaco-pmics.dtsi).

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250923-rename-dts-v1-1-21888b68c781@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:11:52 -05:00
Dmitry Baryshkov
afec70ac03 arm64: dts: qcom: sm6375: add refgen regulator
Add the refgen regulator block. It should be used for DSI controllers
once they are added.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-14-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:10:13 -05:00
Dmitry Baryshkov
d4d1e799e9 arm64: dts: qcom: sc8280xp: add refgen regulator
Add the refgen regulator block. It should be used for DSI controllers
once they are added.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-13-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:10:13 -05:00
Dmitry Baryshkov
2c9e4d7c68 arm64: dts: qcom: qcs8300: add refgen regulator
Add the refgen regulator block. It should be used for DSI controllers
once they are added.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-12-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:10:13 -05:00
Dmitry Baryshkov
3aedde1859 arm64: dts: qcom: sm8250: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controller.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-11-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
0aa588760d arm64: dts: qcom: sm8150: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controller.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-10-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
e53107df09 arm64: dts: qcom: sm6350: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controller.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-9-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
ca031c24bf arm64: dts: qcom: sdm845: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controllers.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-8-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
7223744176 arm64: dts: qcom: sdm670: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controllers.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-7-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
4be2ab8c4e arm64: dts: qcom: sc8180x: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controllers.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-6-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
f8cfb1932c arm64: dts: qcom: sc7280: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controller.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-5-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
ac44b60f5d arm64: dts: qcom: sc7180: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controller.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-4-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
7522c9ffaa arm64: dts: qcom: lemans: add refgen regulator and use it for DSI
Add the refgen regulator block and use it for the DSI controllers.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-3-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:58 -05:00
Dmitry Baryshkov
883e20433f arm64: dts: qcom: lemans: move USB PHYs to a proper place
Sort the lemans.dtsi, moving USB1 and USB2 PHYs to a proper place,
making the DT file sorted by the address.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250921-refgen-v1-2-9d93e64133ea@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:09:57 -05:00
Krishna Chaitanya Chundru
03e928442d arm64: dts: qcom: sc7280: Increase config size to 256MB for ECAM feature
PCIe ECAM(Enhanced Configuration Access Mechanism) feature requires
maximum of 256MB configuration space.

To enable this feature increase configuration space size to 256MB. If
the config space is increased, the BAR space needs to be truncated as
it resides in the same location. To avoid the bar space truncation move
config space, DBI, ELBI, iATU to upper PCIe region and use lower PCIe
iregion entirely for BAR region.

This depends on the commit: '10ba0854c5e6 ("PCI: qcom: Disable mirroring
of DBI and iATU register space in BAR region")'

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250828-ecam_v4-v8-1-92a30e0fa02d@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:04:04 -05:00
Raviteja Laggyshetty
44562f5918 arm64: dts: qcom: qcs615: Add OSM l3 interconnect provider node and CPU OPP tables to scale DDR/L3
Add Operation State Manager (OSM) L3 interconnect provide node and OPP
tables required to scale DDR and L3 per freq-domain on QCS615 SoC.
As QCS615 and SM8150 SoCs have same OSM hardware, added SM8150
compatible as fallback for QCS615 OSM device node.

Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Signed-off-by: Raviteja Laggyshetty <raviteja.laggyshetty@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250819-talos-l3-icc-v3-2-04529e85dac7@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 10:00:33 -05:00
Wenmeng Liu
fe9829de17 arm64: dts: qcom: lemans-evk-camera: Add DT overlay
Enable IMX577 via CCI1 on Lemans EVK.

Signed-off-by: Wenmeng Liu <quic_wenmliu@quicinc.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Link: https://lore.kernel.org/r/20250815-rb8_camera-v2-3-6806242913ed@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 09:58:20 -05:00
Thomas Weißschuh
3c9b904f90 x86/um/vdso: Drop VDSO64-y from Makefile
This symbol is unnecessary, remove it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251013-uml-vdso-cleanup-v1-4-a079c7adcc69@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 15:12:06 +01:00
Thomas Weißschuh
12fe820fae x86/um/vdso: Panic when vDSO can not be allocated
The vDSO address is added to the userspace auxiliary vectors even if the
vDSO was not allocated. When accessing the page, userspace processes
will crash.

Enforce that the allocation works.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251013-uml-vdso-cleanup-v1-3-a079c7adcc69@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 15:12:06 +01:00
Thomas Weißschuh
8c0fbd6ae4 x86/um/vdso: Use prototypes from generic vDSO headers
The generic vDSO library provides a convenient header for the vDSO
function prototypes, use it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251013-uml-vdso-cleanup-v1-2-a079c7adcc69@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 15:12:06 +01:00
Thomas Weißschuh
019cde8fc9 x86/um/vdso: Fix prototype of clock_gettime()
The clock_gettime() system call takes a pointer to
'struct __kernel_timespec', not 'struct __kernel_old_timespec'.
Right now this is not an issue as the vDSO never works with the
actual struct but only passes it through to the kernel.

Fix the prototype for consistency with the system call.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251013-uml-vdso-cleanup-v1-1-a079c7adcc69@weissschuh.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 15:12:06 +01:00
Johannes Berg
6e3fc802ab um: move asm-offsets generation into a single file
There's nothing subarch dependent here, and it's odd
that includes need to be done in the subarch, and then
entries defined in the common file.

Simplify the whole thing from three files into one.

Link: https://patch.msgid.link/20251007071452.367989-4-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 15:07:44 +01:00
Johannes Berg
f11839c16c um/hostfs: define HOSTFS_ATTR_* via asm-offsets
The HOSTFS_ATTR_* values were meant to be standalone for
communication between hostfs's kernel and user code parts.
However, it's easy to forget that HOSTFS_ATTR_* should be
used even on the kernel side, and that wasn't consistently
done. As a result, the values need to match ATTR_* values,
which is not useful to maintain by hand. Instead, generate
them via asm-offsets like other constants that UML needs
in user-side code that aren't otherwise available in any
header files that can be included there.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://patch.msgid.link/20251007071452.367989-3-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 15:07:43 +01:00
Johannes Berg
7b5d441696 um: init cpu_tasks[] earlier
This is currently done in uml_finishsetup(), but e.g. with
KCOV enabled we'll crash because some init code can call
into e.g. memparse(), which has coverage annotations, and
then the checks in check_kcov_mode() crash because current
is NULL.

Simply initialize the cpu_tasks[] array statically, which
fixes the crash. For the later SMP work, it seems to have
not really caused any problems yet, but initialize all of
the entries anyway.

Link: https://patch.msgid.link/20250924113214.c76cd74d0583.I974f691ebb1a2b47915bd2b04cc38e5263b9447f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27 15:05:50 +01:00
Krishna Kurapati
0903296efd arm64: dts: qcom: lemans: Add missing quirk for HS only USB controller
The PIPE clock is provided by the USB3 PHY, which is predictably not
connected to the HS-only controller. Add "qcom,select-utmi-as-pipe-clk"
quirk to  HS only USB controller to disable pipe clock requirement.

Fixes: de1001525c ("arm64: dts: qcom: sa8775p: add USB nodes")
Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20251024105019.2220832-3-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 09:02:51 -05:00
Krishna Kurapati
6b3e8a5d6c arm64: dts: qcom: x1e80100: Add missing quirk for HS only USB controller
The PIPE clock is provided by the USB3 PHY, which is predictably not
connected to the HS-only controller. Add "qcom,select-utmi-as-pipe-clk"
quirk to  HS only USB controller to disable pipe clock requirement.

Fixes: 4af46b7bd6 ("arm64: dts: qcom: x1e80100: Add USB nodes")
Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20251024105019.2220832-2-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 09:02:50 -05:00
Krishna Kurapati
0dab10c382 arm64: dts: qcom: x1e80100: Fix compile warnings for USB HS controller
With W=1, the following error comes up:

Warning (graph_child_address): /soc@0/usb@a2f8800/usb@a200000/ports: graph node has single child node 'port@0', #address-cells/#size-cells are not necessary

This could be since the controller is only HS capable and only one port
node is added.

Fixes: 4af46b7bd6 ("arm64: dts: qcom: x1e80100: Add USB nodes")
Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251019115630.2222720-1-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-27 09:02:50 -05:00
Louis-Alexis Eyraud
c8dd72788f arm64: dts: mediatek: mt8365-evk: Enable GPU support
Enable for the Mediatek Genio 350-EVK board the support of the
Arm Mali G52 MC1 GPU integrated in the MT8365 SoC.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-27 11:32:34 +01:00
Louis-Alexis Eyraud
499af66b8f arm64: dts: mediatek: mt8365: Add GPU support
The Mediatek MT8365 SoC has an integrated Arm Mali G52 MC1 GPU
(Bifrost).
Add gpu, OPP table, and MFG clock driver nodes in mt8365.dtsi to support
it using the Panfrost driver.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-27 11:32:34 +01:00
Nícolas F. R. A. Prado
eb98462478 arm64: dts: mediatek: mt8395-genio-1200-evk: Describe CPU supplies
The Genio 1200 EVK board has the big and little CPU clusters fed by the
same regulators as MT8195-Cherry boards, so describe them in the same
way as commit 17b33dd9e4 ("arm64: dts: mediatek: cherry: Describe CPU
supplies").

This prevents the system from hanging during boot in the case that the
cpufreq-mediatek-hw driver tries to probe before the drivers for the
regulators have probed (which happens when using the current defconfig).

Fixes: f2b543a191 ("arm64: dts: mediatek: add device-tree for Genio 1200 EVK board")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-27 11:31:23 +01:00
Aradhya Bhatia
a00ee8014d arm64: dts: ti: k3-am625: Add OLDI support
The AM625 SoC has 2 OLDI TXes under the DSS. Add their support.

Signed-off-by: Aradhya Bhatia <a-bhatia1@ti.com>
Signed-off-by: Swamil Jain <s-jain1@ti.com>
Reviewed-by: Devarsh Thakkar <devarsht@ti.com>
Link: https://patch.msgid.link/20250913064205.4152249-3-s-jain1@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-27 15:19:28 +05:30
Aradhya Bhatia
779ea073db arm64: dts: ti: k3-am62: Add support for AM625 OLDI IO Control
Add TI DSS OLDI-IO control registers for AM625 DSS. This is a region of
12 32bit registers found in the TI AM625 CTRL_MMR0 register space[0].
They are used to control the characteristics of the OLDI DATA/CLK IO as
needed by the DSS display controller node.

[0]: https://www.ti.com/lit/pdf/spruiv7

Signed-off-by: Aradhya Bhatia <a-bhatia1@ti.com>
Signed-off-by: Swamil Jain <s-jain1@ti.com>
Reviewed-by: Devarsh Thakkar <devarsht@ti.com>
Link: https://patch.msgid.link/20250913064205.4152249-2-s-jain1@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-27 15:19:28 +05:30
Frank Li
203a6a7e92 ARM: dts: imx7ulp: remove bias-pull-up
i.MX7ULP pinctrl don't support bias-pull-up property. So remove it to fix
below CHECK_DTBS warnings:
  arch/arm/boot/dts/nxp/imx/imx7ulp-evk.dtb: pinctrl@40ac0000 (fsl,imx7ulp-iomuxc1): lpuart4grp: 'bias-pull-up' does not match any of the regexes: '^pinctrl-[0-9]+$'

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:43 +08:00
Frank Li
0978cbfe5b ARM: dts: remove undocumented clock-names for ov5642
Remove undocumented clock-names for ov5642 to fix below CHECK_DTBS warnings:
  arch/arm/boot/dts/nxp/imx/imx6q-sabresd.dtb: camera@3c (ovti,ov5642): 'clock-names' does not match any of the regexes: '^pinctrl-[0-9]+$'
	from schema $id: http://devicetree.org/schemas/media/i2c/ovti,ov5642.yaml#

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:43 +08:00
Frank Li
b20ba9e502 ARM: dts: add device_type for memory node
Add device_type for memory node to fix below CHECK_DTB warnings:
  arch/arm/boot/dts/nxp/imx/imx6dl-b105pv2.dtb: / (ge,imx6dl-b105pv2): memory@10000000: 'device_type' is a required property

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:43 +08:00
Frank Li
0a1178155a ARM: dts: Add bus type for parallel ov5640
Add bus type for parallel ov5640 to fix below CHECK_DTBS warnings:
  arch/arm/boot/dts/nxp/imx/imx6q-sabrelite.dtb: camera@42 (ovti,ov5642): port:endpoint:hsync-active: False schema does not allow 1

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:43 +08:00
Frank Li
b29ba0669c ARM: dts: imx6q-cm-fx6.dts: add supplies for wm8731
Add supplies for wm8731 to fix below CHECK_DTB warnings:
arch/arm/boot/dts/nxp/imx/imx6q-cm-fx6.dtb: codec@1a (wlf,wm8731): 'AVDD-supply' is a required property
        from schema $id: http://devicetree.org/schemas/sound/wlf,wm8731.yaml#

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:43 +08:00
Frank Li
ebaec0e02f ARM: dts: imx6qdl-skov-cpu fix typo interrupt
Fix typo interrupt, which should be 'interrupts'. Fix below CHECK_DTBS
warnings.

arch/arm/boot/dts/nxp/imx/imx6dl-skov-revc-lt2.dtb: switch@0 (microchip,ksz8873): Unevaluated properties are not allowed ('interrupt', 'pinctrl-names' were unexpected)
        from schema $id: http://devicetree.org/schemas/net/dsa/microchip,ksz.yaml

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:43 +08:00
Frank Li
6ad6ade587 ARM: dts: imx: remove redundant linux,phandle
Remove redundant linux,phandle to fix below CHECK_DTBS warnings:
  arch/arm/boot/dts/nxp/imx/imx6dl-gw560x.dtb: pmic@3c (lltc,ltc3676): regulators:sw3: Unevaluated properties are not allowed ('linux,phandle' was unexpected)

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:43 +08:00
Frank Li
c3145218f3 ARM: dts: imx6ull-dhcom-pdk2: rename power-supply to vcc-supply for touchscreen
Rename power-supply to vcc-supply for touchscreen to fix below CHECK_DTB
warnings:
  arch/arm/boot/dts/nxp/imx/imx6ull-dhcom-pdk2.dtb: touchscreen@38 (edt,edt-ft5406): Unevaluated properties are not allowed ('power-supply' was unexpected)

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:42 +08:00
Frank Li
819908313c ARM: dts: imx: add power-supply for lcd panel
Add power-supply for lcd panel to fix below CHECK_DTBS warnings:
  arch/arm/boot/dts/nxp/imx/imx6q-evi.dtb: panel (sharp,lq101k1ly04): 'power-supply' is a required property

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:59:42 +08:00
Yannic Moog
fcb2626dc4 arm64: dts: imx8mp pollux: add displays for expansion board
The same displays that can be connected directly to the
imx8mp-phyboard-pollux can also be connected to the expansion board
PEB-AV-10. For displays connected to the expansion board, a second LVDS
channel of the i.MX 8M Plus SoC is used and only a single display
connected to the SoC LVDS display bridge at a given time is supported.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Yannic Moog <y.moog@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:50:41 +08:00
Yannic Moog
7012f55108 arm64: dts: imx8mp pollux: add expansion board overlay
An expansion board (PEB-AV-10) may be connected to the
imx8mp-phyboard-pollux. Its main purpose is to provide multimedia
interfaces, featuring a 3.5mm headphone jack, a USB-A port and LVDS as
well as backlight connectors.
Introduce the expansion board as dtsi, as it may be used standalone as
an expansion board, as well as in combination with display panels. These
display panels will include the dtsi.

Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Yannic Moog <y.moog@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:50:38 +08:00
Yannic Moog
edfdb7bce8 arm64: dts: imx8mp pollux: add display overlays
imx8mp-phyboard-pollux had a display baked into its board dts file.
However this approach does not truly discribe the hardware and is not
suitable when using different displays.
Move display specific description into an overlay and add the successor
display for the phyboard-pollux as an additional overlay.

Reviewed-by: Teresa Remmet <t.remmet@phytec.de>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Yannic Moog <y.moog@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:50:36 +08:00
Yannic Moog
a6d681b7d1 arm64: dts: im8mp-phy{board,core}: update license
Change license from GPL-2.0 to GPL-2.0-or-later OR MIT.
Use syntax as defined in the SPDX standard. Also remove individual
authorship.

Acked-by: Teresa Remmet <t.remmet@phytec.de>
Signed-off-by: Yannic Moog <y.moog@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:50:24 +08:00
Peng Fan
97a07dd2b5 firmware: imx: scu: Use devm_mutex_init
In normal case, there is no need to invoke mutex_destroy in error path,
but it is useful when CONFIG_DEBUG_MUTEXES, so use devm_mutex_init().

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:43:48 +08:00
Peng Fan
ff79af939d firmware: imx: scu: Suppress bind attrs
The SCU driver is critical for system working properly, it should
never be removed and binded again. So suppress the bind attrs

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:43:47 +08:00
Peng Fan
27d408697f firmware: imx: scu: Update error code
IMX_SC_ERR_NOTFOUND should map with -ENOENT, not -EEXIST. -ENODEV makes
more sense for IMX_SC_ERR_NOPOWER, and -ECOMM makes more sense for
IMX_SC_ERR_IPC.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:43:47 +08:00
Peng Fan
ea2f83c6aa firmware: imx: scu-irq: Remove unused export of imx_scu_enable_general_irq_channel
Since its introduction, this symbol has not been used by any loadable
modules. It remains only referenced within imx-scu.c, which is always built
together with imx-scu-irq.c

As such, exporting imx_scu_enable_general_irq_channel is unnecessary, so
remove the export.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:43:47 +08:00
Peng Fan
ff3f9913bc firmware: imx: scu-irq: Set mu_resource_id before get handle
mu_resource_id is referenced in imx_scu_irq_get_status() and
imx_scu_irq_group_enable() which could be used by other modules, so
need to set correct value before using imx_sc_irq_ipc_handle in
SCU API call.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:43:47 +08:00
Peng Fan
81fb53feb6 firmware: imx: scu-irq: Init workqueue before request mbox channel
With mailbox channel requested, there is possibility that interrupts may
come in, so need to make sure the workqueue is initialized before
the queue is scheduled by mailbox rx callback.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:43:47 +08:00
Peng Fan
62c740fb11 firmware: imx: scu-irq: Free mailbox client on failure at imx_scu_enable_general_irq_channel()
The IRQ mailbox is an optional channel and does not need to be kept until
driver removal when an error occurs. Free the allocated memory in the
error path.

Add 'goto free_cl' when mbox_request_channel_byname() fails, to keep free
at one place.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:43:47 +08:00
Peng Fan
ee67247843 firmware: imx: scu-irq: fix OF node leak in
imx_scu_enable_general_irq_channel() calls of_parse_phandle_with_args(),
but does not release the OF node reference. Add a of_node_put() call
to release the reference.

Fixes: 851826c756 ("firmware: imx: enable imx scu general irq function")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:43:47 +08:00
Frank Li
6d97ac47de ARM: dts: imx6qdl-nitrogen6_max: rename i2c<n>mux to i2c
Rename i2c<n>mux to i2c to fix below CHECK_DTBS warnings:
arch/arm/boot/dts/nxp/imx/imx6q-nitrogen6_max.dtb: i2c2mux (i2c-mux-gpio): $nodename:0: 'i2c2mux' does not match '^(i2c-?)?mux'

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
bbe7f27581 ARM: dts: imx6ull-phytec-tauri: remove extra space before jedec,spi-nor
Remove extra space in " jedec,spi-nor" to fix below CHECK_DTBS warnings:
arch/arm/boot/dts/nxp/imx/imx6ull-phytec-tauri-emmc.dtb: /soc/bus@2000000/spba-bus@2000000/spi@2008000/flash@2: failed to match any schema with compatible: [' jedec,spi-nor']

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
aaf857946e ARM: dts: imx6q-utilite-pro: add missing required property for pci
Add device_type, bus-range, ranges for pci nodes. Rename intel,i211 to
ethernet to fix below CHECK_DTBS warnings:
  arch/arm/boot/dts/nxp/imx/imx6q-utilite-pro.dtb: pcie@0,0: 'device_type' is a required property
        from schema $id: http://devicetree.org/schemas/pci/pci-bus-common.yaml#
  arch/arm/boot/dts/nxp/imx/imx6q-utilite-pro.dtb: pcie@0,0: 'ranges' is a required property
        from schema $id: http://devicetree.org/schemas/pci/pci-bus-common.yaml
  arm/boot/dts/nxp/imx/imx6q-utilite-pro.dtb: pcie@0,0: 'intel,i211@pcie0,0' does not match any of the regexes: '.*-names$', '.*-supply$', '^#.*-cells$', '^#[a-zA-Z0-9,+\\-._]{0,63}$', '^[a-zA-Z0-9][a-zA-Z0-9#,+\\-._]{0,63}$', '^[a-zA-Z0-9][a-zA-Z0-9,+\\-._]{0,63}@[0-9a-fA-F]+(,[0-9a-fA-F]+)*$', '^__.*__$', 'pinctrl-[0-9]+'
        from schema $id: http://devicetree.org/schemas/dt-core.yaml

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
7ea1e6df5b ARM: dts: imx6-tbs2910: rename ir_recv to ir-receiver
Rename ir_recv to ir-receiver to fix below CHECK_DTBS warnings:
arm/boot/dts/nxp/imx/imx6q-tbs2910.dtb: ir_recv (gpio-ir-receiver): $nodename:0: 'ir_recv' does not match '^ir(-receiver)?(@[a-f0-9]+)?$'
        from schema $id: http://devicetree.org/schemas/media/gpio-ir-receiver.yaml

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
f2a7629c30 ARM: dts: imx6: remove pinctrl-name if pinctrl-0 doesn't exist
Remove redundant pinctrl-name since pinctrl-0 doesn't exist to fix below
CHECK_DTBS warnings:
arch/arm/boot/dts/nxp/imx/imx6q-pistachio.dtb: pinctrl@20e0000 (fsl,imx6q-iomuxc): 'pinctrl-0' is a dependency of 'pinctrl-names'
        from schema $id: http://devicetree.org/schemas/pinctrl/pinctrl-consumer.yaml

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
81da2a3245 ARM: dts: imx6: change voltage-divider's io-channel-cells to 1
Change voltage-divider's io-channel-cells to 1 to fix below CHECK_DTBS
warnings:
arch/arm/boot/dts/nxp/imx/imx6dl-plym2m.dtb: voltage-divider-vaccu (voltage-divider): #io-channel-cells: 1 was expected
        from schema $id: http://devicetree.org/schemas/iio/afe/voltage-divider.yaml

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
5889eb175a ARM: dts: imx6: remove gpio suffix for regulator
Remove gpio suffix for regulator to fix below CHECK_DTBS warnings:
 (fsl,imx6sx-sdb): regulator-pcie-gpio: {'compatible': ['regulator-fixed'], 'phandle': 69} is not of type 'array'
        from schema $id: http://devicetree.org/schemas/gpio/gpio-consumer.yaml

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
698390d701 ARM: dts: imx6qdl-ts4900: rename ngpio to ngpios
Rename ngpio to ngpios to fix below CHECK_DTBS warnings:
arch/arm/boot/dts/nxp/imx/imx6dl-ts4900.dtb: gpio@28 (technologic,ts4900-gpio): 'ngpio' does not match any of the regexes: '^(hog-[0-9]+|.+-hog(-[0-9]+)?)$', '^pinctrl-[0-9]+$'
        from schema $id: http://devicetree.org/schemas/trivial-gpio.yaml

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
e39f072d2a ARM: dts: imx6: rename m95m02 to eeprom
rename m95m02 to eeprom to fix below CHECK_DTBS warnings:

arch/arm/boot/dts/nxp/imx/imx6q-evi.dtb: m95m02@1 (st,m95m02): $nodename: 'anyOf' conditional failed, one must be fixed:
        'm95m02@1' does not match '^eeprom@[0-9a-f]{1,2}$'
        'm95m02@1' does not match '^fram@[0-9a-f]{1,2}$'

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:54 +08:00
Frank Li
c052a9349a ARM: dts: imx6: rename touch-thermal0 to touch-0-thermal
Rename touch-thermal0 to touch-0-thermal to fix below CHECK_DTBS warnings:
arch/arm/boot/dts/nxp/imx/imx6dl-plym2m.dtb: thermal-zones: 'touch-thermal0', 'touch-thermal1' do not match any of the regexes: '^[a-zA-Z][a-zA-Z0-9\\-]{1,10}-thermal$', 'pinctrl-[0-9]+'
        from schema $id: http://devicetree.org/schemas/thermal/thermal-zones.yaml

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:53 +08:00
Frank Li
664625e4c2 ARM: dts: imx6: rename stmpgpio to gpio
Rename stmpgpio to gpio and add gpio-controller and interrupt-controller.
Rename stmpe_adc to adc. Move interrupt-controller and gpio-controller
under gpio node.

to fix below CHECK_DTBS warnings:

/home/lizhi/source/linux-upstream-pci/arch/arm/boot/dts/nxp/imx/imx6q-dmo-edmqmx6.dtb: stmpe1601@40 (st,stmpe1601): gpio: 'interrupt-controller' is a required property
	from schema $id: http://devicetree.org/schemas/mfd/st,stmpe.yaml#
/home/lizhi/source/linux-upstream-pci/arch/arm/boot/dts/nxp/imx/imx6q-dmo-edmqmx6.dtb: gpio (st,stmpe-gpio): 'interrupt-controller' is a required property
	from schema $id: http://devicetree.org/schemas/gpio/st,stmpe-gpio.yaml#
/home/lizhi/source/linux-upstream-pci/arch/arm/boot/dts/nxp/imx/imx6q-dmo-edmqmx6.dtb: stmpe1601@44 (st,stmpe1601): gpio: 'interrupt-controller' is a required property
	from schema $id: http://devicetree.org/schemas/mfd/st,stmpe.yaml#
/home/lizhi/source/linux-upstream-pci/arch/arm/boot/dts/nxp/imx/imx6q-dmo-edmqmx6.dtb: gpio (st,stmpe-gpio): 'interrupt-controller' is a required property
	from schema $id: http://devicetree.org/schemas/gpio/st,stmpe-gpio.yaml#

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:42:53 +08:00
Ioana Ciornei
8cd60e1daf arm64: dts: ls1046a-qds: describe the two on-board SFP+ cages
Describe the two SFP+ cages present on the LS1046AQDS board and their
associated I2C buses and GPIO lines.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:28:50 +08:00
Ioana Ciornei
7c9f33d0ab arm64: dts: lx2160a-rdb: fully describe the two SFP+ cages
Describe the two SFP+ cages found on the LX2160ARDB board with their
respective I2C buses and GPIO lines.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:28:39 +08:00
Ioana Ciornei
dc49682902 arm64: dts: ls1046a-qds: describe the FPGA based GPIO controller
The QIXIS FPGA node is extended so that it describes the GPIO controller
responsible for all the status presence lines on both SFP+ cages as well
as the IO SLOTs present on the board.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:26:47 +08:00
Ioana Ciornei
6499815172 arm64: dts: lx2160a-rdb: describe the QIXIS FPGA and two child GPIO controllers
Describe the FPGA present on the LX2160ARDB board as a simple-mfd I2C
device. The FPGA presents registers that deal with power-on-reset
timing, muxing, SFP cage monitoring and control etc.

Also add the two GPIO controllers responsible for monitoring and
controlling the SFP+ cages used for MAC5 and MAC6.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:26:04 +08:00
Ioana Ciornei
2346a408b4 dt-bindings: fsl,fpga-qixis: describe the gpio child node found on LS1046AQDS
Extend the list of accepted child nodes with the QIXIS FPGA based GPIO
controller and explicitly list its compatible string
fsl,ls1046aqds-fpga-gpio-stat-pres2 as the only one accepted.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:21:05 +08:00
Ioana Ciornei
b8ac5ceef2 dt-bindings: fsl,fpga-qixis-i2c: add support for LX2160ARDB FPGA
Extend the list of supported compatible strings with fsl,lx2160ardb-fpga.

Since the register map exposed by the LX2160ARDB's FPGA also contains
two GPIO controllers, accept the necessary GPIO pattern property.
At the same time, add the #address-cells and #size-cells properties as
valid ones so that the child nodes of the fsl,lx2160ardb-fpga node are
addressable.

This is necessary because when defining child devices such as the GPIO
controller described in the added example, the child device needs a the
reg property to properly identify its register location in the parent
I2C device address space.

Impose this restriction for the new compatible through an if-statement.

The feature set exposed by these QIXIS FPGA devices is highly dependent
on the board type, meaning that even though the FPGA found on the
LX2160AQDS board (fsl,lx2160aqds-fpga) works in the same way in terms of
access over I2C as the one found on the LX2160ARDB (fsl,lx2160ardb-fpga
added here), the register map inside the device space is different since
there are different on-board devices to be controlled.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:20:55 +08:00
Jonas Rebmann
0bb66f6df1 arm64: dts: add Protonic PRT8ML board
Add devicetree for the Protonic PRT8ML.

The board is similar to the Protonic PRT8MM but i.MX8MP based.

Some features have been removed as the drivers haven't been mainlined
yet or other issues where encountered:
 - Stepper motors to be controlled using motion control subsystem
 - MIPI/DSI to eDP USB alt-mode
 - Onboard T1 ethernet (10BASE-T1L+PoDL, 100BASE-T1+PoDL, 1000BASE-T1)

Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Jonas Rebmann <jre@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:05:54 +08:00
Jonas Rebmann
e252582624 dt-bindings: arm: fsl: Add Protonic PRT8ML
Add DT compatible string for Protonic PRT8ML board.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Jonas Rebmann <jre@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 14:04:47 +08:00
Martin Kepplinger-Novaković
7a88b98809 arm64: dts: imx8mp: add cpuidle cooling device to the alert trip point
Idle-inject up to 50% of all cpu's time in order to help cpufreq
to keep the temperature below the trip points.

Signed-off-by: Martin Kepplinger-Novaković <martink@posteo.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 13:54:48 +08:00
Martin Kepplinger-Novaković
ead2dafd66 arm64: dts: imx8mp: add idle cooling devices to cpu core
The thermal framework can use the cpu-idle-states as
described for imx8mp as an alternative or in parallel to
cpufreq.

Add the DT node to the cpu so the cooling devices will be present
and the thermal zone descriptions can use them.

Signed-off-by: Martin Kepplinger-Novaković <martink@posteo.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 13:54:48 +08:00
Alexander Kurz
c5efaf2e88 ARM: dts: imx53: enable PMIC RTC on imx53-qsrb
The RTC inside mc34708 is supported by RTC_DRV_MC13XXX since v3.6-rc1.
Enable the PMIC RTC on the imx53-qsrb. Without a battery the RTC may be
powered via the micro-USB connector when main 5V power is not available.

Signed-off-by: Alexander Kurz <akurz@blala.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 13:49:39 +08:00
Randolph Sapp
76546090b1 arm64: dts: ti: k3-am62p: Fix memory ranges for GPU
Update the memory region listed in the k3-am62p.dtsi for the BXS-4-64
GPU to match the Main Memory Map described in the TRM [1].

[1] https://www.ti.com/lit/ug/spruj83b/spruj83b.pdf

Fixes: 29075cc09f ("arm64: dts: ti: Introduce AM62P5 family of SoCs")
Signed-off-by: Randolph Sapp <rs@ti.com>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20250919193341.707660-2-rs@ti.com
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
2025-10-27 10:45:24 +05:30
Alexandre Belloni
ae3b64675f ARM: dts: imx6q-evi: fix rtc compatible
The correct compatible is nxp,pcf2123

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 09:23:08 +08:00
Teresa Remmet
052290cdea arm64: dts: imx8mm-phygate-tauri-l: Update pad ctl for USB OC pin
Disable Hysteresis Enable Field in pad ctl register for USB OC pin
as this is more appropriate for the signal form in our case.

Signed-off-by: Teresa Remmet <t.remmet@phytec.de>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 09:22:13 +08:00
Teresa Remmet
2a5f92d11f arm64: dts: imx8mm-phyboard-polis-rdk: Add USB1 OC pin configuration
Add USB1 OC pin configuration for proper over-current detection.

Signed-off-by: Teresa Remmet <t.remmet@phytec.de>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 09:22:13 +08:00
Marek Vasut
2a11955082 arm64: dts: freescale: add initial support for i.MX 95 Verdin Evaluation Kit (EVK)
Add initial support for i.MX 95 Verdin Evaluation Kit (EVK), which
used to be the Titan EVK. Currently supported is lpuart1 as console,
SDHC1/2/3 as storage, WM8904 Audio, USB3.0 and ENETC ethernet RGMII
Gigabit port.

Note that the SoM used in this EVK is a derivative SoM from Verdin
line of SoMs, an actual i.MX95 Verdin SoM is under development.

[1] https://www.toradex.com/computer-on-modules/verdin-arm-family/nxp-imx95-evaluation-kit

Reviewed-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Tested-by: João Paulo Gonçalves <joao.goncalves@toradex.com> # i.MX95 Verdin EVK
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 09:01:23 +08:00
Marek Vasut
5a0236736e dt-bindings: arm: imx: document i.MX 95 Verdin Evaluation Kit (EVK)
Document support for i.MX 95 Verdin Evaluation Kit (EVK), which
used to be the Titan EVK.

Note that the SoM used in this EVK is a derivative SoM from Verdin
line of SoMs, an actual i.MX95 Verdin SoM is under development.

[1] https://www.toradex.com/computer-on-modules/verdin-arm-family/nxp-imx95-evaluation-kit

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-27 09:00:53 +08:00
Joel Granados
18c4e02884 watchdog: move nmi_watchdog sysctl into .rodata
Move nmi_watchdog into the watchdog_sysctls array to prevent it from
unnecessary modification. This move effectively moves it inside the
.rodata section.

Initially moved out into its own non-const array in commit 9ec272c586
("watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe
fails"), which made it writable only when watchdog_hardlockup_available
was true. Moving it back to watchdog_sysctl keeps this behavior as
writing to nmi_watchdog still fails when watchdog_hardlockup_available
is false.

Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-10-24 15:35:11 +02:00
Michael Opdenacker
9813395078 riscv: dts: spacemit: add Ethernet and PDMA to OrangePi RV2
The OrangePi RV2 board ships two RGMII ethernet ports.
Each has an external Motorcomm YT8531C PHY attached, the PHY uses GPIO
for reset pin control.

Enable PDMA for the SpacemiT K1-based SoC in the OrangePi RV2 board.

Signed-off-by: Michael Opdenacker <michael.opdenacker@rootcommit.com>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20251022201807.1474789-2-michael.opdenacker@rootcommit.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-24 14:18:47 +08:00
Dinh Nguyen
06b0f1c336 arm64: dts: socfpga: move sdmmc-ecc to the base DTSI file
The ECC manager entry for sdmmc should be a chip level entry, not a
board entry.

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-23 15:30:24 -05:00
Cosmin Tanislav
e7d3465774 arm64: defconfig: Enable RZ/T2H / RZ/N2H ADC driver
Renesas RZ/T2H (R9A09G077) and RZ/N2H (R9A09G087) SoCs include three
12-Bit successive approximation A/D converters.

RZ/T2H has two ADCs with 4 channels and one with 6.
RZ/N2H has two ADCs with 4 channels and one with 15.

Enable the driver for them, as used on the RZ/T2H and RZ/N2H EVKs.

Signed-off-by: Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251005111323.804638-7-cosmin-gabriel.tanislav.xa@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-23 16:22:41 +02:00
Wolfram Sang
4765d59fcf soc: renesas: rcar-rst: Keep RESBAR2S in default state
Unlike Gen2, Gen4 has bit 15 of WDTRSTCR register also used. Keep it in
the default state for the V3U firmware workaround.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251017114234.2968-2-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-23 16:03:30 +02:00
Biju Das
42ec0bc61f memory: renesas-rpc-if: Add suspend/resume support
On RZ/G3E using PSCI, s2ram powers down the SoC. Add suspend/resume
callbacks to control spi/spix2 clocks.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20251019180940.157088-1-biju.das.jz@bp.renesas.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-23 15:28:15 +02:00
Haotian Zhang
2286e18e39 soc: qcom: gsbi: fix double disable caused by devm
In the commit referenced by the Fixes tag, devm_clk_get_enabled() was
introduced to replace devm_clk_get() and clk_prepare_enable(). While
the clk_disable_unprepare() call in the error path was correctly
removed, the one in the remove function was overlooked, leading to a
double disable issue.

Remove the redundant clk_disable_unprepare() call from gsbi_remove()
to fix this issue. Since all resources are now managed by devres
and will be automatically released, the remove function serves no purpose
and can be deleted entirely.

Fixes: 489d7a8cc2 ("soc: qcom: use devm_clk_get_enabled() in gsbi_probe()")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/stable/20251020160215.523-1-vulab%40iscas.ac.cn
Link: https://lore.kernel.org/r/20251020160215.523-1-vulab@iscas.ac.cn
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 17:24:52 -05:00
Kathiravan Thirumoorthy
dcbce2c23e soc: qcom: socinfo: add the missing entries to the smem image table
Add the missing entries to the SMEM image table to ensure completeness,
rather than adding support for one image at a time.

Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250929-image_crm-v1-2-e06530c42357@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 17:23:29 -05:00
Kathiravan Thirumoorthy
abac241e1d soc: qcom: socinfo: arrange the socinfo_image_names array in alphabetical order
The socinfo_image_names array is currently neither arranged alphabetically
nor by image index values, making it harder to maintain. Reorder the array
alphabetically to improve readability and simplify the addition of new
entries.

Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250929-image_crm-v1-1-e06530c42357@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 17:23:29 -05:00
Johan Hovold
94124bf253 soc: qcom: pbs: fix device leak on lookup
Make sure to drop the reference taken to the pbs platform device when
looking up its driver data.

Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.

Fixes: 5b2dd77be1 ("soc: qcom: add QCOM PBS driver")
Cc: stable@vger.kernel.org	# 6.9
Cc: Anjelique Melendez <quic_amelende@quicinc.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20250926143511.6715-3-johan@kernel.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 17:23:14 -05:00
Johan Hovold
b5c16ea57b soc: qcom: ocmem: fix device leak on lookup
Make sure to drop the reference taken to the ocmem platform device when
looking up its driver data.

Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.

Also note that commit 0ff027027e ("soc: qcom: ocmem: Fix missing
put_device() call in of_get_ocmem") fixed the leak in a lookup error
path, but the reference is still leaking on success.

Fixes: 88c1e9404f ("soc: qcom: add OCMEM driver")
Cc: stable@vger.kernel.org	# 5.5: 0ff027027e
Cc: Brian Masney <bmasney@redhat.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20250926143511.6715-2-johan@kernel.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 17:23:14 -05:00
Jingyi Wang
c88b6ee3ba soc: qcom: llcc-qcom: Add support for Kaanapali
Add system cache table and configs for Kaanapali SoC.

Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250924-knp-llcc-v1-2-ae6a016e5138@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 17:21:30 -05:00
Jingyi Wang
f5f1e5abb6 dt-bindings: cache: qcom,llcc: Document the Kaanapali LLCC
Document the Last Level Cache Controller on Kaanapali platform.

Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250924-knp-llcc-v1-1-ae6a016e5138@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 17:21:30 -05:00
Anjelique Melendez
6773cb33e7 soc: qcom: pmic_glink: Add support for SOCCP remoteproc channels
System On Chip Control Processor (SOCCP) is a subsystem that can have
battery management firmware running on it to support Type-C/PD and
battery charging. SOCCP does not have multiple PDs and hence PDR is not
supported. So, if the subsystem comes up/down, rpmsg driver would be
probed or removed. Use that for notifying clients of pmic_glink for
PDR events.

Add support for battery management FW running on SOCCP by adding the
"PMIC_RTR_SOCCP_APPS" channel name to the rpmsg_match list and
updating notify_clients logic.

Signed-off-by: Anjelique Melendez <anjelique.melendez@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250919175025.2988948-1-anjelique.melendez@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 17:02:19 -05:00
Luo Jie
06ac2566e7 dt-bindings: clock: qcom: Add NSS clock controller for IPQ5424 SoC
NSS clock controller provides the clocks and resets to the networking
blocks such as PPE (Packet Process Engine) and UNIPHY (PCS) on IPQ5424
devices.

Add support for the compatible string "qcom,ipq5424-nsscc" based on the
existing IPQ9574 NSS clock controller Device Tree binding. Additionally,
update the clock names for PPE and NSS for newer SoC additions like
IPQ5424 to use generic and reusable identifiers "nss" and "ppe" without
the clock rate suffix.

Also add master/slave ids for IPQ5424 networking interfaces, which is
used by nss-ipq5424 driver for providing interconnect services using
icc-clk framework.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://lore.kernel.org/r/20251014-qcom_ipq5424_nsscc-v7-7-081f4956be02@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 16:57:22 -05:00
Luo Jie
60c8b7569c dt-bindings: clock: gcc-ipq5424: Add definition for GPLL0_OUT_AUX
The GCC clock GPLL0_OUT_AUX is one of source clocks for IPQ5424 NSS clock
controller.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://lore.kernel.org/r/20251014-qcom_ipq5424_nsscc-v7-5-081f4956be02@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 16:57:22 -05:00
Luo Jie
2985e76c66 dt-bindings: interconnect: Add Qualcomm IPQ5424 NSSNOC IDs
Add the NSSNOC master/slave ids for Qualcomm IPQ5424 network subsystem
(NSS) hardware blocks. These will be used by the gcc-ipq5424 driver
that provides the interconnect services by using the icc-clk framework.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20251014-qcom_ipq5424_nsscc-v7-3-081f4956be02@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 16:57:22 -05:00
Luo Jie
fbfbc68852 dt-bindings: clock: Add "#interconnect-cells" property in IPQ9574 example
The Networking Subsystem (NSS) clock controller acts as both a clock
provider and an interconnect provider. The #interconnect-cells property
is needed in the Device Tree Source (DTS) to ensure that client drivers
such as the PPE driver can correctly acquire ICC clocks from the NSS ICC
provider.

Add the #interconnect-cells property to the IPQ9574 Device Tree binding
example to complete it.

Fixes: 28300ecedc ("dt-bindings: clock: Add ipq9574 NSSCC clock and reset definitions")
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://lore.kernel.org/r/20251014-qcom_ipq5424_nsscc-v7-2-081f4956be02@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 16:57:22 -05:00
Bjorn Andersson
c352f94002 Merge branch '20250919-sm6350-mdss-reset-v1-1-48dcac917c73@fairphone.com' into arm64-for-6.19
Merge the topic branch adding MDSS reset constants for SM6350, so they
can be referenced from the MDSS node.
2025-10-22 16:36:49 -05:00
Luca Weiss
e090dc10c6 dt-bindings: clock: dispcc-sm6350: Add MDSS_CORE & MDSS_RSCC resets
Add the indexes for two resets inside the dispcc on SM6350 SoC.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250919-sm6350-mdss-reset-v1-1-48dcac917c73@fairphone.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-22 16:35:57 -05:00
Peter Griffin
af17f9f0bb dt-bindings: soc: samsung: exynos-sysreg: add gs101 hsi0 and misc compatibles
Add dedicated compatibles for gs101 hsi0 and misc sysreg controllers to the
documentation.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: André Draszik <andre.draszik@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251013-automatic-clocks-v1-1-72851ee00300@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-22 07:52:01 +02:00
André Draszik
8c79c80ff3 dt-bindings: soc: samsung: exynos-sysreg: add power-domains
On gs101 only, sysreg can be part of a power domain, so we need to
allow the relevant property 'power-domains' for the relevant
compatibles google,gs101-*-sysreg.

Signed-off-by: André Draszik <andre.draszik@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251010-power-domains-dt-bindings-soc-samsung-exynos-sysreg-v2-1-552f5787a3f3@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-22 07:51:57 +02:00
Peter Griffin
4348c22a4f arm64: dts: exynos: gs101: fix sysreg_apm reg property
Both the start address and size are incorrect for the apm_sysreg DT
node. Update to match the TRM (rather than how it was defined
downstream).

Fixes: ea89fdf24f ("arm64: dts: exynos: google: Add initial Google gs101 SoC support")
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20251013-automatic-clocks-v1-5-72851ee00300@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-21 21:26:05 +02:00
Peter Griffin
ddb2a16804 arm64: dts: exynos: gs101: fix clock module unit reg sizes
The memory map lists each clock module unit as having a size of
0x10000. Additionally there are some undocumented registers in this region
that need to be used for automatic clock gating mode. Some of those
registers also need to be saved/restored on suspend & resume.

Fixes: 86124c7668 ("arm64: dts: exynos: gs101: enable cmu-hsi2 clock controller")
Fixes: 4982a4a209 ("arm64: dts: exynos: gs101: enable cmu-hsi0 clock controller")
Fixes: 7d66d98b5b ("arm64: dts: exynos: gs101: enable cmu-peric1 clock controller")
Fixes: e62c706f3a ("arm64: dts: exynos: gs101: enable cmu-peric0 clock controller")
Fixes: ea89fdf24f ("arm64: dts: exynos: google: Add initial Google gs101 SoC support")
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20251013-automatic-clocks-v1-4-72851ee00300@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-21 21:25:49 +02:00
Peter Griffin
08d9d0d9ae arm64: dts: exynos: gs101: add sysreg_misc and sysreg_hsi0 nodes
Add syscon DT node for the hsi0 and misc sysreg controllers. These will
be referenced by their respective CMU nodes in future patches.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20251013-automatic-clocks-v1-3-72851ee00300@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-21 21:25:42 +02:00
Conor Dooley
feaa716adc dt-bindings: soc: microchip: document the simple-mfd syscon on PolarFire SoC
"mss-top-sysreg" contains clocks, pinctrl, resets, an interrupt controller
and more. At this point, only the reset controller child is described as
that's all that is described by the existing bindings.
The clock controller already has a dedicated node, and will retain it as
there are other clock regions, so like the mailbox, a compatible-based
lookup of the syscon is sufficient to keep the clock driver working as
before, so no child is needed. There's also an interrupt multiplexing
service provided by this syscon, for which there is work in progress at
[1].

Link: https://lore.kernel.org/linux-gpio/20240723-uncouple-enforcer-7c48e4a4fefe@wendy/ [1]
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-10-21 14:29:34 +01:00
Johan Hovold
075daf2264 soc: amlogic: canvas: simplify lookup error handling
Simplify the canvas lookup error handling by dropping the OF node
reference sooner.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://patch.msgid.link/20250926142454.5929-3-johan@kernel.org
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-10-21 14:20:29 +02:00
Johan Hovold
32200f4828 soc: amlogic: canvas: fix device leak on lookup
Make sure to drop the reference taken to the canvas platform device when
looking up its driver data.

Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.

Also note that commit 28f851e6af ("soc: amlogic: canvas: add missing
put_device() call in meson_canvas_get()") fixed the leak in a lookup
error path, but the reference is still leaking on success.

Fixes: d4983983d9 ("soc: amlogic: add meson-canvas driver")
Cc: stable@vger.kernel.org	# 4.20: 28f851e6af
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://patch.msgid.link/20250926142454.5929-2-johan@kernel.org
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-10-21 14:20:29 +02:00
WeiHao Li
fc3d701f0d soc: rockchip: grf: Set pwm2/xin32k pad default to xin32k for rk3368
PWM2 and xin32k share the same pad, but some peripheral need the xin32k
clock to run properly, such as tsadc. I have observed that this pad is
used as xin32k by default on some existing board [1], so it maybe more
appropriate to set it to xin32k by default.

I also tested it on another rk3368 based board [2], without this adjust,
tsadc does not work properly.

[1] https://rockchip.fr/geekbox/Geekbox_V1.23.pdf
[2] https://ieiao.github.io/wiki/embedded-dev/rockchip/rk3368

Signed-off-by: WeiHao Li <cn.liweihao@gmail.com>
Link: https://patch.msgid.link/20250906142125.7602-1-cn.liweihao@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-21 11:19:53 +02:00
Haibo Chen
04dba84fba arm64: dts: imx: correct the flexspi compatible string to match with yaml
According to Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml,
imx93/imx95 should use it's own compatible string and fallback
compatible with imx8mm.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:35:17 +08:00
Joy Zou
93b2fac5cd arm64: dts: imx95-15x15-evk: add fan-supply property for pwm-fan
Add fan-supply regulator to pwm-fan node to specify power source.

Fixes: e3e8b199af ("arm64: dts: imx95: Add imx95-15x15-evk support")
Signed-off-by: Joy Zou <joy.zou@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:32:39 +08:00
Tim Harvey
9db04b310e arm64: dts: imx8mp-venice-gw702x: remove off-board sdhc1
SDHC1 on the GW702x SOM routes to a connector for use on a baseboard
and as such are defined in the baseboard device-trees.

Remove it from the gw702x SOM device-tree.

Fixes: 0d5b288c21 ("arm64: dts: freescale: Add imx8mp-venice-gw7905-2x")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:29:40 +08:00
Tim Harvey
effe98060f arm64: dts: imx8mp-venice-gw702x: remove off-board uart
UART1 and UART3 go to a connector for use on a baseboard and as such are
defined in the baseboard device-trees. Remove them from the gw702x SOM
device-tree.

Fixes: 0d5b288c21 ("arm64: dts: freescale: Add imx8mp-venice-gw7905-2x")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:29:40 +08:00
Tim Harvey
d949b8d12d arm64: dts: imx8mm-venice-gw72xx: remove unused sdhc1 pinctrl
The SDHC1 interface is not used on the imx8mm-venice-gw72xx. Remove the
unused pinctrl_usdhc1 iomux node.

Fixes: 6f30b27c5e ("arm64: dts: imx8mm: Add Gateworks i.MX 8M Mini Development Kits")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:29:40 +08:00
Tim Harvey
2c4d3066a5 arm64: dts: imx8mm-venice-gw700x: reduce RGMII CLK drive strength
The i.MX8M Mini FEC RGMII tracelength is less than 1in and does not
require a x6 drive strength. Reduce the CLK drive strength to x1 for
lower emissions. Additionally since TXC is not a high frequency clock,
use slow slew rate (FSEL=0) for lower emmissions and improved signal
quality.

Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:29:40 +08:00
Tim Harvey
0a138a2cfd arm64: dts: imx8mp-venice-gw702x: reduce RGMII CLK drive strength
The i.MX8M Plus EQOS RGMII tracelength is less than 1in and does not
require a x6 drive strength. Reduce the CLK drive strength to x1 for
lower emissions. Additionally since TXC is not a high frequency clock,
use slow slew rate (FSEL=0) for lower emmissions and improved signal
quality.

Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:29:40 +08:00
Tim Harvey
69bb376d5e arm64: dts: imx8m{m,n,p}-venice: disable unused clk output for TI PHY
Disable the unused refclk output for the TI DP83867 PHY used on
Gateworks Venice boards.

Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:29:40 +08:00
Tim Harvey
8b7e58ab4a arm64: dts: freescale: imx8mp-venice-gw7905-2x: remove duplicate usdhc1 props
Remove the un-intended duplicate properties from usdhc1.

Fixes: 0d5b288c21 ("arm64: dts: freescale: Add imx8mp-venice-gw7905-2x")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 16:29:40 +08:00
Mathew McBride
410349084e arm64: dts: ten64: add board controller binding
The board (micro)controller[1] is responsible for functions such as
power supply sequencing, SoC reset as well as serial/MAC address storage,
bootcount and scratch registers.

There is currently no Linux kernel driver for this controller, however,
there is a driver in U-Boot which utilises this binding.

[1] https://ten64doc.traverse.com.au/hardware/microcontroller/

Signed-off-by: Mathew McBride <matt@traverse.com.au>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 15:57:46 +08:00
Mathew McBride
e260e8114e dt-bindings: embedded-controller: add Traverse Ten64 board controller
Add device tree binding for the board (micro)controller on Ten64 family
boards[1].

The schema is simple and is (presently) only consumed by U-Boot, but it
is possible nvmem, watchdog and other features could be described in
the future, as well as extension to future Traverse boards.

[1] https://ten64doc.traverse.com.au/hardware/microcontroller/

Signed-off-by: Mathew McBride <matt@traverse.com.au>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-21 15:57:08 +08:00
E Shattow
fa939a2872 dts: starfive: jh7110-common: split out mmc0 reset pins from common into boards
Prepare for Orange Pi RV using jh7110-common.dtsi having GPIO62 assignment
different than mmc0 reset by splitting this out into each board dts.

Signed-off-by: E Shattow <e@freeshell.de>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-10-20 18:36:20 +01:00
Lothar Rubusch
91b97ca3d4 ARM: dts: socfpga: add Enclustra SoM dts files
Add the approach to set up a combination of Enclustra's SoM on a carrier
board and corresponding boot-mode as single device-tree target.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:26:34 -05:00
Lothar Rubusch
558417387b dt-bindings: altera: removal of generic PE1 dts
Remove the binding for the generic Mercury+ AA1 on PE1 carrier board.

The removed Mercury+ AA1 on PE1 carrier board is just a particular
setup case, which is actually replaced by the set of generic Mercury+
AA1 combinations patch.

In other words a combination of a Mercury+ AA1 on a PE1 base board,
with boot mode SD card is already covered by the generic AA1
combinations. There is no further reason to keep this particular case
now in a redundantly. Thus the redundant DT setup is removed.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:25:55 -05:00
Lothar Rubusch
adacc09225 ARM: dts: socfpga: removal of generic PE1 dts
Remove the older socfpga_arria10_mercury_pe1.dts, since it is duplicate,
the hardware is covered by the combination of Enclustra's .dtsi files.

The older .dts was limited to only the case of having an Enclustra
Mercury+ AA1 on a Mercury+ PE1 base board, booting from sdmmc. This
functionality is provided also by the generic Enclustra dtsi and dts
files, in particular socfpga_arria10_mercury_aa1_pe1_sdmmc.dts. Since
both .dts files cover the same, the older one is to e replaced in
favor of the more modularized approach.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Acked-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:25:43 -05:00
Lothar Rubusch
a31736de52 dt-bindings: altera: add Mercury AA1 variants
Update binding with combined .dts for the Mercury+ PE1, PE3 and ST1
carrier boards with the Mercury+ AA1 SoM.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:25:00 -05:00
Lothar Rubusch
26a0157f4c ARM: dts: socfpga: add Mercury AA1 variants
Introduce support for Enclustra's Mercury+ AA1 SoM, based on Intel
Arria10. This is a flexible approach to allow for combining SoM
with carrier board .dtsi and boot-mode .dtsi in a device-tree file.

Signed-off-by: Andreas Buerkler <andreas.buerkler@enclustra.com>
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:24:03 -05:00
Lothar Rubusch
729ef23972 dt-bindings: altera: add binding for Mercury+ SA2
Update the device-tree binding for the Enclustra Mercury+ SA2 SoM.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:23:43 -05:00
Lothar Rubusch
9fbc22b3c5 ARM: dts: socfpga: add Enclustra Mercury+ SA2
Introduce Enclustra's Mercury+ SA2 SoM based on Intel Cyclone5
technology as a .dtsi file.

Signed-off-by: Andreas Buerkler <andreas.buerkler@enclustra.com>
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:23:29 -05:00
Lothar Rubusch
72459d9079 dt-bindings: altera: add Enclustra Mercury SA1
Update the DT binding for the Enclustra Mercury+ SA1 SoM

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:22:56 -05:00
Lothar Rubusch
66243445fd ARM: dts: socfpga: add Enclustra Mercury SA1
Introduce support for Enclustra's Mercury SA1 SoM based on Intel Cyclone5
technology as a .dtsi file.

Signed-off-by: Andreas Buerkler <andreas.buerkler@enclustra.com>
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:22:36 -05:00
Lothar Rubusch
2d7ef02f1e ARM: dts: socfpga: add Enclustra base-board dtsi
Add generic Enclustra base-board support for the Mercury+ PE1, the
Mercury+ PE3 and the Mercury+ ST1 board. The carrier boards can be
freely combined with the SoMs Mercury+ AA1, Mercury SA1 and
Mercury+ SA2.

Signed-off-by: Andreas Buerkler <andreas.buerkler@enclustra.com>
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:16:49 -05:00
Lothar Rubusch
fad0dcaf37 ARM: dts: socfpga: add Enclustra boot-mode dtsi
Add generic boot-mode support to Enclustra Arria10 and Cyclone5 boards.
Some Enclustra carrier boards need hardware adjustments specific to the
selected boot-mode.

Enclustra's Arria10 SoMs allow for booting from different media. By
muxing certain IO pins, the media can be selected. This muxing can be
done by gpios at runtime e.g. when flashing QSPI from off the
bootloader. But also to have statically certain boot media available,
certain adjustments to the DT are needed:
- SD: QSPI must be disabled
- eMMC: QSPI must be disabled, bus width can be doubled to 8 byte
- QSPI: any mmc is disabled, QSPI then defaults to be enabled

The boot media must be accessible to the bootloader, e.g. to load a
bitstream file, but also to the system to mount the rootfs and to use
the specific performance.

Signed-off-by: Andreas Buerkler <andreas.buerkler@enclustra.com>
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 11:15:25 -05:00
Sean Christopherson
9e4ce7a89e KVM: selftests: Use "gpa" and "gva" for local variable names in pre-fault test
Rename guest_test_{phys,virt}_mem to g{p,v}a in the pre-fault memory test
to shorten line lengths and to use standard terminology.

Opportunsitically use "base_gva" in the guest code instead of "base_gpa"
to match the host side code, which now passes in "gva" (and because
referencing the virtual address avoids having to know that the data is
identity mapped).

No functional change intended.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20251007224515.374516-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 08:59:30 -07:00
Sean Christopherson
17e5a9b777 KVM: selftests: Forcefully override ARCH from x86_64 to x86
Forcefully override ARCH from x86_64 to x86 to handle the scenario where
the user specifies ARCH=x86_64 on the command line.

Fixes: 9af04539d4 ("KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIR")
Cc: stable@vger.kernel.org
Reported-by: David Matlack <dmatlack@google.com>
Closes: https://lore.kernel.org/all/20250724213130.3374922-1-dmatlack@google.com
Link: https://lore.kernel.org/r/20251007223057.368082-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 08:55:24 -07:00
Brendan Jackman
b146b289f7 KVM: selftests: Don't fall over in mmu_stress_test when only one CPU is present
Running mmu_stress_test on a system with only one CPU is not a recipe for
success. However, there's no clear-cut reason why it absolutely
shouldn't work, so the test shouldn't completely reject such a platform.

At present, the *3/4 calculation will return zero on these platforms and
the test fails. So, instead just skip that calculation.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/r/20251007-b4-kvm-mmu-stresstest-1proc-v1-1-8c95aa0e30b6@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 08:55:23 -07:00
Khairul Anuar Romli
2fab055251 arm64: dts: socfpga: agilex5: Add SMMU nodes
Agilex5 includes an ARM SMMU v3 (System Memory Management Unit) to provide
address translation and memory protection for DMA-capable devices such as
PCIe, USB, and other peripherals.

This commit adds the SMMU node to the Agilex5 device tree with compatible
string "arm,smmu-v3", along with its register space and interrupts.

The SMMU is required to:
- Enable DMA address translation for devices that cannot directly access
  the full physical memory space.
- Provide isolation and memory protection by restricting device access
  to specific regions of memory, improving system security.
- Support virtualization use cases by enabling safe and isolated device
  passthrough to guest VMs.
- Align with ARM platform architecture requirements for IOMMU support.

By describing the SMMU in the device tree, the Linux IOMMU framework
can probe and initialize it during boot. Devices in the system can then
bind to the SMMU via the `iommus` property, enabling memory translation
and protection features as expected.

The following devices are updated to reference the SMMU:
- NAND controller
- DMA controller
- SPI controller

This change is a necessary step toward full enablement high-speed
peripherals on Agilex5.

Signed-off-by: Adrian Ng Ho Yin <adrianhoyin.ng@altera.com>
Signed-off-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 10:21:30 -05:00
Khairul Anuar Romli
2c83769b2f dt-bindings: dma: snps,dw-axi-dmac: Add iommu property
Agilex5 integrates an ARM SMMU v3 (System Memory Management Unit) with
dedicated Translation Buffer Units (TBUs) assigned to various peripherals,
including the Synopsys DesignWare AXI DMA controller.

Each TBU handles address translation for its associated device by mapping
stream IDs to memory access permissions and virtual-to-physical address
mappings via the SMMU core.

The DesignWare AXI DMAC instances on Agilex5 are connected to their
respective TBUs. These TBUs forward DMA transactions from the controller
through the SMMU, enabling IOMMU-based features such as:
- Address translation for DMA operations
- Isolation and protection of memory regions accessed by the DMA controller
- Support for secure and virtualized environments through enforced access
  control

To support this configuration, the `iommus` property must be added to the
binding schema for `snps,dw-axi-dmac`. This allows the device tree to
associate each DMA controller with the correct SMMU stream ID, enabling
the Linux IOMMU framework to configure translation contexts at runtime.

This change documents the IOMMU support for the DMA controller on Agilex5
and allows proper integration with the SMMUv3 hardware.

Signed-off-by: Adrian Ng Ho Yin <adrianhoyin.ng@altera.com>
Signed-off-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 10:20:53 -05:00
Khairul Anuar Romli
4430d52cd7 dt-bindings: mtd: cdns,hp-nfc: Add iommu property
Agilex5 integrates an ARM SMMU (System Memory Management Unit) with
Translation Buffer Units (TBUs) assigned to various peripherals,
including the NAND controller.

The Cadence HP NAND controller ("cdns,hp-nfc") on Agilex5 is behind a
TBU connected to the system's SMMUv3. To support this, the controller
requires an `iommus` property in the device tree to properly configure
address translation through the IOMMU framework.

Adding the `iommus` property to the binding schema allows the OS
to associate the NAND controller with its corresponding SMMU stream ID.
This enables:
- DMA address translation between the controller and system memory
- Memory protection for NAND operations
- Proper functioning of the IOMMU framework in secure or virtualized
  environments

This change documents the IOMMU integration for the NAND controller
on platforms like Agilex5 where such hardware is present.

Signed-off-by: Adrian Ng Ho Yin <adrianhoyin.ng@altera.com>
Signed-off-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-20 10:19:36 -05:00
Hsun Lai
d809417c5a arm64: dts: rockchip: add DTs for 100ASK DShanPi A1
Add device tree for 100ASK DShanPi A1 with Rockchip RK3576 SoC
(4x Cortex-A72, 4x Cortex-A53, Mali-G52 MC3 GPU, 6 TOPS NPU). Enables
basic booting and connectivity.

Link: https://wiki.dshanpi.org/en/docs/DshanPi-A1/intro/

The device contains the following hardware that is tested/working:
 - 32 or 64GB eMMC
 - SDMMC card slot
 - M.2 WiFi slot
 - 4 or 8GB of RAM
 - 2x 1Gbps Ethernet
 - 2x USB 3.2 Gen 1 Type-A ports
 - USB 3.0 port
 - HDMI port

Signed-off-by: Hsun Lai <i@chainsx.cn>
Link: https://patch.msgid.link/77BB123A1F1E3D11+20251014021623.286121-3-i@chainsx.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-20 16:55:17 +02:00
Hsun Lai
ff29a83cda dt-bindings: arm: rockchip: Add 100ASK DShanPi A1
This documents 100ASK DShanPi A1 which is a SBC based on RK3576 SoC.

Link: https://wiki.dshanpi.org/en/docs/DshanPi-A1/intro/

Signed-off-by: Hsun Lai <i@chainsx.cn>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/0601AB836AE5F348+20251014021623.286121-2-i@chainsx.cn
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-20 16:55:17 +02:00
Liangbin Lian
deaefeaf3d arm64: dts: rockchip: add LinkEase EasePi R1
LinkEase EasePi R1 [1] is a high-performance mini router.

Specification:
- Rockchip RK3568
- 2GB/4GB LPDDR4 RAM
- 16GB on-board eMMC
- 1x M.2 key for 2280 NVMe (PCIe 3.0)
- 1x USB 3.0 Type-A
- 1x USB 2.0 Type-C (for USB flashing)
- 2x 1000 Base-T (native, RTL8211F)
- 2x 2500 Base-T (PCIe, RTL8125B)
- 1x HDMI 2.0 Output
- 12v DC Jack
- 1x Power key connected to PMIC
- 2x LEDs (one static power supplied, one GPIO controlled)

[1] https://doc.linkease.com/zh/guide/easepi-r1/hardware.html

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Liangbin Lian <jjm2473@gmail.com>
Link: https://patch.msgid.link/20251014051226.64255-4-jjm2473@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-20 16:50:40 +02:00
Liangbin Lian
fc3cd4021e dt-bindings: arm: rockchip: Add LinkEase EasePi R1
LinkEase EasePi R1 is a high-performance mini router based on RK3568.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Liangbin Lian <jjm2473@gmail.com>
Link: https://patch.msgid.link/20251014051226.64255-3-jjm2473@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-20 16:50:40 +02:00
Liangbin Lian
9e38dc1abc dt-bindings: vendor-prefixes: Document LinkEase
LinkEase is a company focusing on the research and development of
network equipment and related software and hardware from Shenzhen.

Add vendor prefix for it.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Liangbin Lian <jjm2473@gmail.com>
Link: https://patch.msgid.link/20251014051226.64255-2-jjm2473@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-20 16:50:40 +02:00
Sean Christopherson
0bb4d9c39b KVM: guest_memfd: Define a CLASS to get+put guest_memfd file from a memslot
Add a CLASS to handle getting and putting a guest_memfd file given a
memslot to reduce the amount of related boilerplate, and more importantly
to minimize the chances of forgetting to put the file (thankfully the bug
that prompted this didn't escape initial testing).

Define a CLASS instead of using __free(fput) as _free() comes with subtle
caveats related to FILO ordering (objects are freed in the order in which
they are declared), and the recommended solution/workaround (declare file
pointers exactly when they are initialized) is visually jarring relative
to KVM's (and the kernel's) overall strict adherence to not mixing
declarations and code.  E.g. the use in kvm_gmem_populate() would be:

	slot = gfn_to_memslot(kvm, start_gfn);
	if (!kvm_slot_has_gmem(slot))
		return -EINVAL;

	struct file *file __free(fput) = kvm_gmem_get_file(slot;
	if (!file)
		return -EFAULT;

	filemap_invalidate_lock(file->f_mapping);

Note, using CLASS() still declares variables in the middle of code, but
the syntactic sugar obfuscates the declaration, i.e. hides the anomaly to
a large extent.

No functional change intended.

Link: https://lore.kernel.org/r/20251007222356.348349-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:46 -07:00
Sean Christopherson
e66438bb81 KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private
Track a guest_memfd instance's flags in gmem_inode instead of burying them
in i_private.  Burning an extra 8 bytes per inode is well worth the added
clarity provided by explicit tracking.

Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:46 -07:00
Shivank Garg
38ccc50ac0 KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support
Add tests for NUMA memory policy binding and NUMA aware allocation in
guest_memfd. This extends the existing selftests by adding proper
validation for:
  - KVM GMEM set_policy and get_policy() vm_ops functionality using
    mbind() and get_mempolicy()
  - NUMA policy application before and after memory allocation

Run the NUMA mbind() test with and without INIT_SHARED, as KVM should allow
doing mbind(), madvise(), etc. on guest-private memory, e.g. so that
userspace can set NUMA policy for CoCo VMs.

Run the NUMA allocation test only for INIT_SHARED, i.e. if the host can't
fault-in memory (via direct access, madvise(), etc.) as move_pages()
returns -ENOENT if the page hasn't been faulted in (walks the host page
tables to find the associated folio)

[sean: don't skip entire test when running on non-NUMA system, test mbind()
       with private memory, provide more info in assert messages]

Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:45 -07:00
Shivank Garg
e698e89b3e KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems
Add NUMA helpers to probe for support/availability and to check if the
test is running on a multi-node system.  The APIs will be used to verify
guest_memfd NUMA support.

Signed-off-by: Shivank Garg <shivankg@amd.com>
[sean: land helpers in numaif.h, add comments, tweak names]
Link: https://lore.kernel.org/r/20251016172853.52451-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:44 -07:00
Sean Christopherson
fe7baebb99 KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions
Drop the KVM's re-definitions of MPOL_xxx flags in numaif.h as they are
defined by the already-included, kernel-provided mempolicy.h.  The only
reason the duplicate definitions don't cause compiler warnings is because
they are identical, but only on x86-64!  The syscall numbers in particular
are subtly x86_64-specific, i.e. will cause problems if/when numaif.h is
used outsize of x86.

Opportunistically clean up the file comment as the license information is
covered by the SPDX header, the path is superfluous, and as above the
comment about the contents is flat out wrong.

Fixes: 346b59f220 ("KVM: selftests: Add missing header file needed by xAPIC IPI tests")
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:44 -07:00
Sean Christopherson
2189d78269 KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h
Add APIs for all syscalls defined in the kernel's mm/mempolicy.c to match
those that would be provided by linking to libnuma.  Opportunistically use
the recently inroduced KVM_SYSCALL_DEFINE() builders to take care of the
boilerplate, and to fix a flaw where the two existing wrappers would
generate multiple symbols if numaif.h were to be included multiple times.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:43 -07:00
Sean Christopherson
29dc539d74 KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default
Register handlers for signals for all selftests that are likely happen due
to test (or kernel) bugs, and explicitly fail tests on unexpected signals
so that users get a stack trace, i.e. don't have to go spelunking to do
basic triage.

Register the handlers as early as possible, to catch as many unexpected
signals as possible, and also so that the common code doesn't clobber a
handler that's installed by test (or arch) code.

Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:42 -07:00
Sean Christopherson
3223560c93 KVM: selftests: Define wrappers for common syscalls to assert success
Add kvm_<sycall> wrappers for munmap(), close(), fallocate(), and
ftruncate() to cut down on boilerplate code when a sycall is expected
to succeed, and to make it easier for developers to remember to assert
success.

Implement and use a macro framework similar to the kernel's SYSCALL_DEFINE
infrastructure to further cut down on boilerplate code, and to drastically
reduce the probability of typos as the kernel's syscall definitions can be
copy+paste almost verbatim.

Provide macros to build the raw <sycall>() wrappers as well, e.g. to
replace hand-coded wrappers (NUMA) or pure open-coded calls.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:42 -07:00
Shivank Garg
ed1ffa810b KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
Previously, guest-memfd allocations followed local NUMA node id in absence
of process mempolicy, resulting in arbitrary memory allocation.
Moreover, mbind() couldn't be used  by the VMM as guest memory wasn't
mapped into userspace when allocation occurred.

Enable NUMA policy support by implementing vm_ops for guest-memfd mmap
operation.  This allows the VMM to use mmap()+mbind() to set the desired
NUMA policy for a range of memory, and provides fine-grained control over
guest memory allocation across NUMA nodes.

Note, using mmap()+mbind() works even for PRIVATE memory, as mbind()
doesn't require the memory to be faulted in.  However, get_mempolicy()
and other paths that require the userspace page tables to be populated
may return incorrect information for PRIVATE memory (though under the hood,
KVM+guest_memfd will still behave correctly).

Store the policy in the inode structure, gmem_inode, as a shared memory
policy, so that the policy is a property of the physical memory itself,
i.e. not bound to the VMA.  In guest_memfd, KVM is the primary MMU and any
VMAs are secondary, i.e. using mbind() on a VMA to set policy is a means
to an end, e.g. to avoid having to add a file-based equivalent to mbind().

Similarly, retrieve the policy via mpol_shared_policy_lookup(), not
get_vma_policy(), even when allocating to fault in memory for userspace
mappings, so that the policy stored in gmem_inode is always the source of
true.

Apply policy changes only to future allocations, i.e. do not migrate
existing memory in the guest_memfd instance.  This matches mbind(2)'s
default behavior, which affects only new allocations unless overridden
with MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags (which are not supported by
guest_memfd as guest_memfd memory is unmovable).

Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/all/e9d43abc-bcdb-4f9f-9ad7-5644f714de19@amd.com
[sean: fold in fixup (see Link above), massage changelog]
Link: https://lore.kernel.org/r/20251016172853.52451-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:41 -07:00
Shivank Garg
f609e89ae8 KVM: guest_memfd: Add slab-allocated inode cache
Add a dedicated gmem_inode structure and a slab-allocated inode cache for
guest memory backing, similar to how shmem handles inodes.

This adds the necessary allocation/destruction functions and prepares
for upcoming guest_memfd NUMA policy support changes.  Using a dedicated
structure will also allow for additional cleanups, e.g. to track flags in
gmem_inode instead of i_private.

Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
[sean: s/kvm_gmem_inode_info/gmem_inode, name init_once()]
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Link: https://lore.kernel.org/r/20251016172853.52451-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:40 -07:00
Ackerley Tng
a63ca4236e KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes
guest_memfd's inode represents memory the guest_memfd is
providing. guest_memfd's file represents a struct kvm's view of that
memory.

Using a custom inode allows customization of the inode teardown
process via callbacks. For example, ->evict_inode() allows
customization of the truncation process on file close, and
->destroy_inode() and ->free_inode() allow customization of the inode
freeing process.

Customizing the truncation process allows flexibility in management of
guest_memfd memory and customization of the inode freeing process
allows proper cleanup of memory metadata stored on the inode.

Memory metadata is more appropriately stored on the inode (as opposed
to the file), since the metadata is for the memory and is not unique
to a specific binding and struct kvm.

Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
[sean: drop helpers, open code logic in __kvm_gmem_create()]
Link: https://lore.kernel.org/r/20251016172853.52451-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:40 -07:00
Sean Christopherson
392dd9d948 KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode
Add a kvm_gmem_for_each_file() to make it more obvious that KVM is
iterating over guest_memfd _files_, not guest_memfd instances, as could
be assumed given the name "gmem_list".

No functional change intended.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-3-seanjc@google.com
[sean: drop .clang-format change]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:39 -07:00
Sean Christopherson
497b1dfbca KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file"
Rename the "kvm_gmem" structure to "gmem_file" in anticipation of using
dedicated guest_memfd inodes instead of anonyomous inodes, at which point
the "kvm_gmem" nomenclature becomes quite misleading.  In guest_memfd,
inodes are effectively the raw underlying physical storage, and will be
used to track properties of the physical memory, while each gmem file is
effectively a single VM's view of that storage, and is used to track assets
specific to its associated VM, e.g. memslots=>gmem bindings.

Using "kvm_gmem" suggests that the per-VM/per-file structures are _the_
guest_memfd instance, which almost the exact opposite of reality.

Opportunistically rename local variables from "gmem" to "f", again to
avoid confusion once guest_memfd specific inodes come along.

No functional change intended.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:38 -07:00
Sean Christopherson
5f3e10797a KVM: guest_memfd: Drop a superfluous local var in kvm_gmem_fault_user_mapping()
Drop the local "int err" that's buried in the middle guest_memfd's user
fault handler to avoid the potential for variable shadowing, e.g. if an
"err" variable were also declared at function scope.

No functional change intended.

Link: https://lore.kernel.org/r/20251007222733.349460-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:29 -07:00
Pedro Demarchi Gomes
765fcd7c07 KVM: guest_memfd: use folio_nr_pages() instead of shift operation
folio_nr_pages() is a faster helper function to get the number of pages when
NR_PAGES_IN_LARGE_FOLIO is enabled.

Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Link: https://lore.kernel.org/r/20251004030210.49080-1-pedrodemargomes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:28 -07:00
Shivank Garg
3f1078a445 KVM: guest_memfd: remove redundant gmem variable initialization
Remove redundant initialization of gmem in __kvm_gmem_get_pfn() as it is
already initialized at the top of the function.

No functional change intended.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251012071607.17646-2-shivankg@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:27 -07:00
Shivank Garg
049e560d4f KVM: guest_memfd: move kvm_gmem_get_index() and use in kvm_gmem_prepare_folio()
Move kvm_gmem_get_index() to the top of the file so that it can be used
in kvm_gmem_prepare_folio() to replace the open-coded calculation.

No functional change intended.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251012071607.17646-1-shivankg@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:27 -07:00
Shivank Garg
f634f10809 mm/mempolicy: Export memory policy symbols
KVM guest_memfd wants to implement support for NUMA policies just like
shmem already does using the shared policy infrastructure. As
guest_memfd currently resides in KVM module code, we have to export the
relevant symbols.

In the future, guest_memfd might be moved to core-mm, at which point the
symbols no longer would have to be exported. When/if that happens is
still unclear.

Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/r/20250827175247.83322-6-shivankg@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:26 -07:00
Matthew Wilcox
16a542e223 mm/filemap: Extend __filemap_get_folio() to support NUMA memory policies
Extend __filemap_get_folio() to support NUMA memory policies by
renaming the implementation to __filemap_get_folio_mpol() and adding
a mempolicy parameter. The original function becomes a static inline
wrapper that passes NULL for the mempolicy.

This infrastructure will enable future support for NUMA-aware page cache
allocations in guest_memfd memory backend KVM guests.

Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/r/20250827175247.83322-5-shivankg@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:25 -07:00
Matthew Wilcox
7f3779a3ac mm/filemap: Add NUMA mempolicy support to filemap_alloc_folio()
Add a mempolicy parameter to filemap_alloc_folio() to enable NUMA-aware
page cache allocations. This will be used by upcoming changes to
support NUMA policies in guest-memfd, where guest_memory need to be
allocated NUMA policy specified by VMM.

All existing users pass NULL maintaining current behavior.

Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/r/20250827175247.83322-4-shivankg@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:25 -07:00
Janosch Frank
182a258b5e Documentation: kvm: Fix ordering
7.43 has been assigned twice, make
KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 7.44.

Fixes: f55ce5a6cd ("KVM: arm64: Expose new KVM cap for cacheable PFNMAP")
Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-10-20 12:44:04 +00:00
Josephine Pfeiffer
f5a6fa189a KVM: s390: Replace sprintf with snprintf for buffer safety
Replace sprintf() with snprintf() when formatting debug names to prevent
potential buffer overflow. The debug_name buffer is 16 bytes, and while
unlikely to overflow with current PIDs, using snprintf() provides proper
bounds checking.

Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
[frankja@linux.ibm.com: Fixed subject prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-10-20 12:43:18 +00:00
Kieran Bingham
2ad83e016e arm64: dts: freescale: debix-som-a-bmb-08: Enable HDMI output
Enable the HDMI output on the Debix SOM A board, using the HDMI encoder
present in the i.MX8MP SoC.

Enable and configure all nodes required for the HDMI port usage.

Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 20:34:45 +08:00
Primoz Fiser
3ae88e453e arm64: dts: freescale: imx93-phyboard-segin: Add USB vbus regulators
Add USB vbus regulators to silence the following kernel warnings:

  usb_phy_generic usbphynop1: dummy supplies not allowed for exclusive requests (id=vbus)
  usb_phy_generic usbphynop2: dummy supplies not allowed for exclusive requests (id=vbus)

Because generic USB PHY driver requires exclusive vbus regulators since
commit 75fd6485cc ("usb: phy: generic: Get the vbus supply").

Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 20:31:44 +08:00
Primoz Fiser
fec2d8fcde arm64: dts: freescale: imx93-phyboard-nash: Add USB vbus regulators
Add USB vbus regulators to silence the following kernel warnings:

  usb_phy_generic usbphynop1: dummy supplies not allowed for exclusive requests (id=vbus)
  usb_phy_generic usbphynop2: dummy supplies not allowed for exclusive requests (id=vbus)

Because generic USB PHY driver requires exclusive vbus regulators since
commit 75fd6485cc ("usb: phy: generic: Get the vbus supply").

Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 20:31:44 +08:00
Alexander Stein
606f1ecaa4 arm64: dts: tqma8mpql-mba8mpxl: Add MicIn routing
MicIn is connected to IN3_L. Add routing including the Mic Bias.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 20:28:49 +08:00
Fabio Estevam
f679e54e67 dt-bindings: soc: imx-iomuxc-gpr: Document the CSI mux
On i.MX6Q/6DL the following subnodes exist to describe the CSI port muxing:

- ipu1_csi0_mux
- ipu1_csi1_mux
- ipu2_csi0_mux
- ipu2_csi1_mux

As they were not documented, dt-schema emits warnings like:

  'ipu1_csi0_mux', 'ipu1_csi1_mux' do not match any of the regexes:
  '^pinctrl-[0-9]+$'

Add a top-level patternProperties entry for these CSI mux subnodes
and restrict it to i.MX6Q.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 19:57:43 +08:00
Yao Zi
047bac0be3 arm64: dts: rockchip: Enable PCIe controller on Radxa E20C
Radxa E20C provides one of its GbE ports through RTL8111H connected to
SoC's PCIe controller. Let's enable the controller and the PHY used by
it to allow usage of the port.

Signed-off-by: Yao Zi <ziyao@disroot.org>
Link: https://patch.msgid.link/20250918153057.56023-4-ziyao@disroot.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-20 13:03:18 +02:00
Yao Zi
263fac6b09 arm64: dts: rockchip: Add PCIe Gen2x1 controller for RK3528
Describes the PCIe Gen2x1 controller integrated in RK3528 SoC. The SoC
doesn't provide a separate MSI controller, thus the one integrated in
designware PCIe IP must be used.

Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://patch.msgid.link/20250918153057.56023-3-ziyao@disroot.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-20 13:03:18 +02:00
Xu Yang
e918e5f847 arm64: dts: imx94: add DDR Perf Monitor node
Add DDR Perf Monitor for i.MX94.

Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 18:00:54 +08:00
Steffen Trumtrar
b26fdab2e1 arm64: dts: imx8mp-skov: support new 10" panel board
This board is similar to the already upstream
imx8mp-skov-recv-tian-g07017.dts but uses a different 10" panel with a
different touch controller.

Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 17:57:59 +08:00
Steffen Trumtrar
cf3c07f4da dt-bindings: arm: fsl: add compatible for Skov i.MX8MP variant
In preparation for adding a new device tree variant with a different
panel, describe the DT compatible in the binding.

Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 17:57:41 +08:00
Bence Csókás
faa6baa364 ARM: dts: imx53-usbarmory: Replace license text comment with SPDX identifier
Replace verbatim license text with a `SPDX-License-Identifier`.

The comment header mis-attributes this license to be "X11", but the
license text does not include the last line "Except as contained in this
notice, the name of the X Consortium shall not be used in advertising or
otherwise to promote the sale, use or other dealings in this Software
without prior written authorization from the X Consortium.". Therefore,
this license is actually equivalent to the SPDX "MIT" license (confirmed
by text diffing).

Cc: Andrej Rosano <andrej@inversepath.com>
Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Acked-by: Andrej Rosano <andrej.rosano@reversec.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 17:47:17 +08:00
Frank Li
e22ff6f193 arm64: dts: fsl-lx2160a: include rev2 chip's dts
The mass production lx2160 rev2 use designware PCIe Controller. Old Rev1
which use mobivel PCIe controller was not supported. Although uboot
fixup can change compatible string fsl,lx2160a-pcie to fsl,ls2088a-pcie
since 2019, it is quite confused and should correctly reflect hardware
status in dtb. Change freescale's board to use rev2's dtsi firstly.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2025-10-20 17:28:30 +08:00
Thorsten Blum
7d5136ed1b KVM: s390: Remove unused return variable in kvm_arch_vcpu_ioctl_set_fpu
kvm_arch_vcpu_ioctl_set_fpu() always returns 0 and the local return
variable 'ret' is not used anymore. Remove it.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-10-20 08:47:58 +00:00
Christoph Schlameuss
14542a0a54 KVM: S390: Remove sca_lock
Since we are no longer switching from a BSCA to a ESCA we can completely
get rid of the sca_lock. The write lock was only taken for that
conversion.

After removal of the lock some local code cleanups are possible.

Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Suggested-by: Janosch Frank <frankja@linux.ibm.com>
[frankja@linux.ibm.com: Added suggested-by tag as discussed on list]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-10-20 08:47:58 +00:00
Christoph Schlameuss
e72753ed12 KVM: s390: Use ESCA instead of BSCA at VM init
All modern IBM Z and Linux One machines do offer support for the
Extended System Control Area (ESCA). The ESCA is available since the
z114/z196 released in 2010.
KVM needs to allocate and manage the SCA for guest VMs. Prior to this
change the SCA was setup as Basic SCA only supporting a maximum of 64
vCPUs when initializing the VM. With addition of the 65th vCPU the SCA
was needed to be converted to a ESCA.

Instead of allocating a BSCA and upgrading it for PV or when adding the
65th cpu we can always allocate the ESCA directly upon VM creation
simplifying the code in multiple places as well as completely removing
the need to convert an existing SCA.

In cases where the ESCA is not supported (z10 and earlier) the use of
the SCA entries and with that SIGP interpretation are disabled for VMs.
This increases the number of exits from the VM in multiprocessor
scenarios and thus decreases performance.
The same is true for VSIE where SIGP is currently disabled and thus no
SCA entries are used.

The only downside of the change is that we will always allocate 4 pages
for a 248 cpu ESCA instead of a single page for the BSCA per VM.
In return we can delete a bunch of checks and special handling depending
on the SCA type as well as the whole BSCA to ESCA conversion.

With that behavior change we are no longer referencing a bsca_block in
kvm->arch.sca. This will always be esca_block instead.
By specifying the type of the sca as esca_block we can simplify access
to the sca and get rid of some helpers while making the code clearer.

KVM_MAX_VCPUS is also moved to kvm_host_types to allow using this in
future type definitions.

Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-10-20 08:47:43 +00:00
Geert Uytterhoeven
a73b6603b4 ARM: shmobile: defconfig: Refresh for v6.18-rc1
Refresh the defconfig for Renesas ARM systems:
  - Drop CONFIG_SCHED_MC=y (auto-enabled since commit 7bd291abe2
    ("sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig")),
  - Disable CONFIG_SCHED_SMT (auto-enabled since commit 7bd291abe2
    ("sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig")),
  - Restore CONFIG_ARM_GT_INITIAL_PRESCALER_VAL=1 (default changed to
    zero (auto-detect) in commit 1c4b87c921
    ("clocksource/drivers/arm_global_timer: Add auto-detection for
    initial prescaler values")),
  - Disable CONFIG_RPCSEC_GSS_KRB5 (auto-enabled since commit
    d8e97cc476 ("SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO
    instead of depending on it")).

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/d0fcc82fb294021bf96f8a490234165e15aadb43.1760530468.git.geert+renesas@glider.be
2025-10-20 10:01:35 +02:00
Tudor Ambarus
bb103f6c73 arm64: dts: exynos: gs101: add OPPs
Add operating performance points (OPPs).

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Tested-by: Peter Griffin <peter.griffin@linaro.org> # on gs101-oriole
Link: https://patch.msgid.link/20250924-acpm-dvfs-dt-v4-3-3106d49e03f5@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-20 09:05:35 +02:00
Tudor Ambarus
025707fa26 arm64: dts: exynos: gs101: add CPU clocks
Add the GS101 CPU clocks exposed through the ACPM protocol.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Tested-by: Peter Griffin <peter.griffin@linaro.org> # on gs101-oriole
Link: https://patch.msgid.link/20250924-acpm-dvfs-dt-v4-2-3106d49e03f5@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-20 09:05:34 +02:00
Tudor Ambarus
2e96df3200 arm64: dts: exynos: gs101: add #clock-cells to the ACPM protocol node
Make the ACPM node a clock provider by adding the mandatory
"#clock-cells" property, which allows devices to reference
its clock outputs.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Tested-by: Peter Griffin <peter.griffin@linaro.org> # on gs101-oriole
Link: https://patch.msgid.link/20250924-acpm-dvfs-dt-v4-1-3106d49e03f5@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-20 09:05:34 +02:00
Krzysztof Kozlowski
c06dce26c2 Merge branch 'for-v6.19/clk-dt-bindings' into next/dt64
Merge topic branch with binding headers
(include/dt-bindings/clock/google,gs101-acpm.h) which are used in DTS.
2025-10-20 09:03:54 +02:00
Tudor Ambarus
83c4e3c39b dt-bindings: firmware: google,gs101-acpm-ipc: add ACPM clocks
The firmware exposes clocks that can be controlled via the
Alive Clock and Power Manager (ACPM) interface.

Make the ACPM node a clock provider by adding the mandatory
"#clock-cells" property, which allows devices to reference its
clock outputs.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Tested-by: Peter Griffin <peter.griffin@linaro.org> # on gs101-oriole
Link: https://patch.msgid.link/20251010-acpm-clk-v6-1-321ee8826fd4@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-20 08:46:28 +02:00
Randy Dunlap
edd548dc64 firmware: qcom: tzmem: fix qcom_tzmem_policy kernel-doc
Fix kernel-doc warnings by using correct kernel-doc syntax and
formatting to prevent warnings:

Warning: include/linux/firmware/qcom/qcom_tzmem.h:25 Enum value
 'QCOM_TZMEM_POLICY_STATIC' not described in enum 'qcom_tzmem_policy'
Warning: ../include/linux/firmware/qcom/qcom_tzmem.h:25 Enum value
 'QCOM_TZMEM_POLICY_MULTIPLIER' not described in enum 'qcom_tzmem_policy'
Warning: ../include/linux/firmware/qcom/qcom_tzmem.h:25 Enum value
 'QCOM_TZMEM_POLICY_ON_DEMAND' not described in enum 'qcom_tzmem_policy'

Fixes: 84f5a7b67b ("firmware: qcom: add a dedicated TrustZone buffer allocator")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20251017191323.1820167-1-rdunlap@infradead.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-19 19:29:11 -05:00
Drew Fustini
a71e6e8eea riscv: defconfig: Enable Tenstorrent SoCs
Enable support for Tenstorrent SoCs in the default configuration.

Reviewed-by: Joel Stanley <jms@oss.tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@oss.tenstorrent.com>
2025-10-18 10:44:15 -07:00
Drew Fustini
16eb4e02a6 riscv: Kconfig.socs: Add ARCH_TENSTORRENT for Tenstorrent SoCs
Add Kconfig option ARCH_TENSTORRENT to enable support for SoCs like the
Blackhole.

Reviewed-by: Joel Stanley <jms@oss.tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@oss.tenstorrent.com>
2025-10-18 10:44:15 -07:00
Drew Fustini
8e88602e99 riscv: dts: Add Tenstorrent Blackhole SoC PCIe cards
Add device tree source describing the Tenstorrent Blackhole SoC and the
Blackhole P100 and P150 PCIe cards. There are no differences between
the P100 and P150 cards from the perspective of an OS kernel like Linux
running on the X280 cores.

There is a virtual UART implemented in OpenSBI firmware that allows a
console program on the PCIe host to communicate through shared memory
with Linux running on the Blackhole card. CONFIG_HVC_RISCV_SBI needs to
be enabled. The boot script on the host adds 'console=hvc0' so that the
full boot output appears in the console program on the host.

Link: https://github.com/tenstorrent/opensbi/
Link: https://github.com/tenstorrent/tt-bh-linux
Reviewed-by: Joel Stanley <jms@oss.tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@oss.tenstorrent.com>
2025-10-18 10:44:15 -07:00
Drew Fustini
d6133f79da dt-bindings: interrupt-controller: Add Tenstorrent Blackhole compatible
Document compatible for the PLIC in the Tenstorrent Blackhole SoC.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Joel Stanley <jms@oss.tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@oss.tenstorrent.com>
2025-10-18 10:44:14 -07:00
Drew Fustini
b5951b25d7 dt-bindings: timers: Add Tenstorrent Blackhole compatible
Document clint compatible for the Tenstorrent Blackhole SoC.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Joel Stanley <jms@oss.tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@oss.tenstorrent.com>
2025-10-18 10:44:14 -07:00
Drew Fustini
571e42a119 dt-bindings: riscv: cpus: Add SiFive X280 compatible
Document compatible for the SiFive X280 RISC-V core.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Joel Stanley <jms@oss.tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@oss.tenstorrent.com>
2025-10-18 10:44:14 -07:00
Drew Fustini
4de28f1edc dt-bindings: riscv: Add Tenstorrent Blackhole compatible
Add compatibles for the Tenstorrent Blackhole SoC PCIe card.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Joel Stanley <jms@oss.tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@oss.tenstorrent.com>
2025-10-18 10:44:14 -07:00
Drew Fustini
fbf76e4bc2 dt-bindings: vendor-prefixes: Add Tenstorrent AI ULC
Document vendor prefix for Tenstorrent in DT bindings.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Joel Stanley <jms@oss.tenstorrent.com>
Signed-off-by: Drew Fustini <dfustini@oss.tenstorrent.com>
2025-10-18 10:44:14 -07:00
André Draszik
8b9cd112f1 soc: samsung: gs101-pmu: implement access tables for read and write
Accessing non-existent PMU registers causes an SError, halting the
system.

Implement read and write access tables for the gs101-PMU to specify
which registers are read- and/or writable to avoid that SError.

Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20251009-gs101-pmu-regmap-tables-v2-3-2d64f5261952@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-18 19:16:45 +02:00
André Draszik
b320711e4c soc: samsung: exynos-pmu: move some gs101 related code into new file
To avoid cluttering common code, move most of the gs101 code into a new
file, gs101-pmu.c

More code is going to be added for gs101 - having it all in one file
helps keeping the common code (file) more readable. While at it, rename
variables 'ctx' to 'context' for consistency.

No functional change.

Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20251009-gs101-pmu-regmap-tables-v2-2-2d64f5261952@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-18 19:16:44 +02:00
André Draszik
1fce7e4d6c soc: samsung: exynos-pmu: allow specifying read & write access tables for secure regmap
Accessing non-existent PMU registers causes an SError, halting the
system.

regmap can help us with that by allowing to pass the list of valid
registers as part of the config during creation. When this driver
creates a new regmap itself rather than relying on
syscon_node_to_regmap(), it's therefore easily possible to hook in
custom access tables for valid read and write registers.

Specifying access tables avoids SErrors for invalid registers and
instead the regmap core can just return an error. Outside drivers, this
is also helpful when using debugfs to access the regmap.

Make it possible for drivers to specify read and write tables to be
used on creation of the secure regmap by adding respective fields to
struct exynos_pmu_data. Also add kerneldoc to same struct while
updating it.

Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20251009-gs101-pmu-regmap-tables-v2-1-2d64f5261952@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-18 19:16:44 +02:00
Kaustabh Chakraborty
3abd9b087a dt-bindings: samsung: exynos-sysreg: add exynos7870 sysregs
Add sysreg compatible strings for the Exynos7870 SoC. Two sysregs are
added, used for the SoC MIPI PHY's CSIS and DSIM blocks.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-18 18:48:22 +02:00
Ivaylo Ivanov
aaf9a2f487 soc: samsung: exynos-chipid: add exynos8890 SoC support
Add exynos8890 information to soc_ids tables. This SoC product id
is "0xE8890000".

Signed-off-by: Ivaylo Ivanov <ivo.ivanov.ivanov1@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-18 18:14:35 +02:00
Ivaylo Ivanov
7319872fe0 dt-bindings: hwinfo: samsung,exynos-chipid: add exynos8890-chipid compatible
Document the samsung,exynos8890-chipid compatible. The registers are
entirely compatible with "samsung,exynos4210-chipid".

Signed-off-by: Ivaylo Ivanov <ivo.ivanov.ivanov1@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-18 18:14:35 +02:00
Ivaylo Ivanov
a1d5955b28 dt-bindings: soc: samsung: exynos-pmu: add exynos8890 compatible
Add exynos8890-pmu compatible to the bindings documentation. Since
Samsung, as usual, reuses devices from older designs, use the
samsung,exynos7-pmu compatible.

Signed-off-by: Ivaylo Ivanov <ivo.ivanov.ivanov1@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-18 18:14:35 +02:00
Valentine Burley
30f89840f1 arm64: dts: qcom: apq8096-db820c: Specify zap shader location
The zap shader was previously loaded from "qcom/a530_zap.mdt", which is a
symlink to "qcom/apq8096/a530_zap.mbn". Update the DTS to reference the
actual firmware file in linux-firmware directly.

This avoids relying on the symlink and ensures a more robust firmware load
path.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251014084808.112097-1-valentine.burley@collabora.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 15:39:21 -07:00
Antony Kurniawan Soemardi
c26a725087 arm64: dts: qcom: pmi8950: Fix VADC channel scaling factors
Fix USBIN/DCIN scaling to match the downstream implementation [1].

Downstream defines the following scaling mappings [2], corresponding
to mainline pre-scaling values:

  <4>  ->  <1 20>
  <1>  ->  <1 3>

[1] e6b46fc6f5/arch/arm64/boot/dts/qcom/pmi8950.dtsi (L55-L86)
[2] e6b46fc6f5/include/linux/qpnp/qpnp-adc.h (L342-L357)

Signed-off-by: Antony Kurniawan Soemardi <linux@smankusors.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251004-fix-pmi8950-vadc-v1-2-3143ecab99e9@smankusors.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 15:39:21 -07:00
Antony Kurniawan Soemardi
78473494a7 arm64: dts: qcom: pmi8950: Add missing VADC channels
When booting msm8953-based devices, the following kernel message
appears:
[ 13.090800] qcom-spmi-vadc 200f000.spmi:pmic@2:adc@3100: Please define VDD channel

It turns out the pmi8950 uses same VDD and GND channels as other
Qualcomm's PMICs, so we can simply copy the channel definition from
the other Qualcomm's PMIC dtsi.

Signed-off-by: Antony Kurniawan Soemardi <linux@smankusors.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251004-fix-pmi8950-vadc-v1-1-3143ecab99e9@smankusors.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 15:39:21 -07:00
Raymond Hackley
e38c92b877 arm64: dts: qcom: msm8916-samsung-rossa: Move touchscreen to common device tree
Every Core Prime uses an Imagis IST3038 touchscreen that is connected to
&blsp_i2c5. Move it to the common device tree.

Signed-off-by: Raymond Hackley <raymondhackley@protonmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251004123907.84270-1-raymondhackley@protonmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 15:39:21 -07:00
Konrad Dybcio
fcf8517850 arm64: dts: qcom: x1e80100: Extend the gcc input clock list
With the recent dt-bindings update, the missing USB4 clocks have been
added.

Extend the existing list to make sure the DT contains the expected
amount of 'clocks' entries.

Reviewed-by: Bryan O'Donoghue <bod@kernel.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251003-topic-hamoa_gcc_usb4-v2-3-61d27a14ee65@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 15:39:21 -07:00
Naveen N Rao
940fc47cfb KVM: SVM: Add AVIC support for 4k vCPUs in x2AVIC mode
With AVIC support for 4k vCPUs, the maximum supported physical ID in
x2AVIC mode is 4095. Since this is no longer fixed, introduce a variable
(x2avic_max_physical_id) to capture the maximum supported physical ID on
the current platform and use that in place of the existing macro
(X2AVIC_MAX_PHYSICAL_ID).

With AVIC support for 4k vCPUs, the AVIC Physical ID table is no
longer a single page and can occupy up to 8 contiguous 4k pages. Since
AVIC hardware accesses of the physical ID table are limited by the
physical max index programmed in the VMCB, it is sufficient to allocate
only as many pages as are required to have a physical table entry for
the max guest APIC ID. Since the guest APIC mode is not available at
this point, provision for the maximum possible x2AVIC ID. For this
purpose, add a variant of avic_get_max_physical_id() that works with a
NULL vCPU pointer and returns the max x2AVIC ID. Wrap this in a new
helper for obtaining the allocation order.

To make it easy to identify support for 4k vCPUs in x2AVIC mode, update
the message printed to the kernel log to print the maximum number of
vCPUs supported. Do this on all platforms supporting x2AVIC since it is
useful to know what is supported on a specific platform.

Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/7fc5962f6da028f7dd3c79dbbd5c574fa02c99dd.1757009416.git.naveen@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:24:02 -07:00
Naveen N Rao
5d0316e25d x86/cpufeatures: Add X86_FEATURE_X2AVIC_EXT
Add CPUID feature bit for x2AVIC extension that enables AMD SVM to
support up to 4096 vCPUs in x2AVIC mode. The primary change is in the
size of the AVIC Physical ID table, which can now go up to 8 contiguous
4k pages. The number of pages allocated is controlled by the maximum
APIC ID for a guest, and that controls the number of pages to allocate
for the AVIC Physical ID table. AVIC hardware is enhanced to look up
Physical ID table entries for vCPUs > 512 for locating the target APIC
backing page and the host APIC ID of the physical core on which the
guest vCPU is running.

Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/e5c9c471ab99a130bf9b728b77050ab308cf8624.1757009416.git.naveen@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:23:45 -07:00
Naveen N Rao
54ffe74cc4 KVM: SVM: Move AVIC Physical ID table allocation to vcpu_precreate()
With support for 4k vCPUs in x2AVIC, the size of the AVIC Physical ID
table is expanded from a single 4k page to a maximum of 8 contiguous 4k
pages. The actual number of pages allocated depends on the maximum
possible APIC ID in the guest, which is only known by the time the first
vCPU is created. In preparation for supporting a dynamic AVIC Physical
ID table size, move its allocation to vcpu_precreate().

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/7dc764e0af7f01440bbac3d9215ed174027c2384.1757009416.git.naveen@kernel.org
[sean: drop enable_apicv check from svm_vcpu_precreate()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:23:24 -07:00
Naveen N Rao
ca11d9d35e KVM: SVM: Expand AVIC_PHYSICAL_MAX_INDEX_MASK to be a 12-bit field
In the latest APM describing AVIC support for 4k vCPUs, VMCB
AVIC_PHYSICAL_MAX_INDEX (Offset 0xF8) and EXITINFO2.Index are both
updated from 9-bit wide to 12-bit wide fields unconditionally (i.e.,
regardless of AVIC support for 4k vCPUs). Expand
AVIC_PHYSICAL_MAX_INDEX_MASK accordingly.

While AVIC_PHYSICAL_MAX_INDEX_MASK is updated to a 12-bit field, KVM
will limit the max vCPU/APIC ID based on the maximum supported on a
specific processor and enforce that limit during vCPU creation. I.e.,
KVM doesn't need to rely on the mask to ensure that the max APIC ID being
programmed in the VMCB is in range. The additional bits (11:9) were
previously marked reserved and were never set/read by older processors.

Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/a24ae953cea716bf9c56c136f7ca4bf5e97b1080.1757009416.git.naveen@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:23:07 -07:00
Naveen N Rao
83f3cbcd3a KVM: SVM: Replace hard-coded value 0x1FF with the corresponding macro
The lower 9-bit field in EXITINFO2 represents an index into the AVIC
Physical/Logical APIC ID table for a AVIC_INCOMPLETE_IPI #VMEXIT. Since
the index into the Logical APIC ID table is just 8 bits, this field is
actually bound by the bit-width of the index into the AVIC Physical ID
table which is represented by AVIC_PHYSICAL_MAX_INDEX_MASK. So, use that
macro to mask EXITINFO2.Index instead of hard coding 0x1FF in
avic_incomplete_ipi_interception().

Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/95795f449c68bffcb3e1789ee2b0b7393711d37d.1757009416.git.naveen@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:22:10 -07:00
Naveen N Rao
f2f6e67a56 KVM: SVM: Add a helper to look up the max physical ID for AVIC
To help with a future change, add a helper to look up the maximum
physical ID depending on the vCPU AVIC mode. No functional change
intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/0ab9bf5e20a3463a4aa3a5ea9bbbac66beedf1d1.1757009416.git.naveen@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:21:54 -07:00
Naveen N Rao
574ef752d4 KVM: SVM: Limit AVIC physical max index based on configured max_vcpu_ids
KVM allows VMMs to specify the maximum possible APIC ID for a virtual
machine through KVM_CAP_MAX_VCPU_ID capability so as to limit data
structures related to APIC/x2APIC. Utilize the same to set the AVIC
physical max index in the VMCB, similar to VMX. This helps hardware
limit the number of entries to be scanned in the physical APIC ID table
speeding up IPI broadcasts for virtual machines with smaller number of
vCPUs.

Unlike VMX, SVM AVIC requires a single page to be allocated for the
Physical APIC ID table and the Logical APIC ID table, so retain the
existing approach of allocating those during VM init.

Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/adb07ccdb3394cd79cb372ba6bcc69a4e4d4ef54.1757009416.git.naveen@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:21:42 -07:00
Sean Christopherson
1100e4910a KVM: nVMX: Add an off-by-default module param to WARN on missed consistency checks
Add an off-by-default param, "warn_on_missed_cc", to have KVM WARN on a
missed VMX Consistency Check on nested VM-Enter, specifically so that KVM
developers and maintainers can more easily detect missing checks.  KVM's
goal/intent is that KVM detect *all* VM-Fail conditions in software, as
relying on hardware leads to false passes when KVM's nested support is a
subset of hardware support, e.g. see commit 095686e6fc ("KVM: nVMX:
Check vmcs12->guest_ia32_debugctl on nested VM-Enter").

With one notable exception, KVM now detects all VM-Fail scenarios for
which there is known test coverage, i.e. KVM developers can enable the
param and expect a clean run, and thus can use the param to detect missed
checks, e.g. when enabling new features, when writing new tests, etc.

The one exception is an unfortunate consistency check on vTPR.  Because
the vTPR for L2 comes from the virtual APIC page provided by L1, L2's vTPR
is fully writable at all times, i.e. is inherently subject to TOCTOU
issues with respect to checks in software versus consumption in hardware.
Further complicating matters is KVM's deferred handling of vmcs12 pages
when loading nested state; KVM flat out cannot check vTPR during
KVM_SET_NESTED_STATE without breaking setups that do on-demand paging,
e.g. for live migration and/or live update.

To fudge around the vTPR issue, add a "late" controls check for vTPR and
also treat an invalid virtual APIC as VM-Fail, but gate the check on
warn_on_missed_cc being enabled to avoid unwanted false positives, i.e. to
avoid breaking KVM in production.

Cc: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250919005955.1366256-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:27 -07:00
Sean Christopherson
a175da6d43 KVM: nVMX: Remove support for "early" consistency checks via hardware
Remove nested_early_check and all associated code, as it's quite obviously
not being used or tested (it's been broken for 4+ years without a single
bug report).  More importantly, KVM's software-based consistency checks
have matured since the option to do hardware-based checks was added; KVM
appears to be missing only _one_ consistency check, on vTPR.  And even
*more* importantly, that consistency check can't be prevented by an early
hardware check due to L1 being able to modify the virtual APIC at any
time, i.e. there's an inherent TOCTOU flaw that could cause KVM to "miss"
a consistency check VM-Fail, regardless of whether the check is performed
by software or by hardware.

In other words, KVM _must_ be able to unwind from a late VM-Fail (which
was a big motivation for doing early checks).  I.e. now that KVM provides
(almost) all necessary consistency checks, what's really needed is a way
to detect missing checks in KVM, not a way to avoid having to unwind from
a late VM-Fail.  And that can be done much more simply, e.g. by an simple
module param to guard a WARN (which, sadly, must be off-by-default to
avoid splats due to the aforementioned TOCTOU issue).

For all intents and purposes, this reverts commit 52017608da ("KVM:
nVMX: add option to perform early consistency checks via H/W").

Link: https://lore.kernel.org/r/20250919005955.1366256-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:27 -07:00
Sean Christopherson
f91699d569 KVM: nVMX: Stuff vmcs02.TSC_MULTIPLIER early on for nested early checks
If KVM is doing "early" nested VM-Enter consistency checks and TSC scaling
is supported, stuff vmcs02's TSC Multiplier early on to avoid getting a
false positive VM-Fail due to trying to do VM-Enter with TSC_MULTIPLIER=0.
To minimize complexity around L1 vs. L2 TSC, KVM sets the actual TSC
Multiplier rather late during VM-Entry, i.e. may have '0' at the time of
early consistency checks.

If vmcs12 has TSC Scaling enabled, use the multiplier from vmcs12 so that
nested early checks actually check vmcs12 state, otherwise throw in an
arbitrary value of '1' (anything non-zero is legal).

Fixes: d041b5ea93 ("KVM: nVMX: Enable nested TSC scaling")
Link: https://lore.kernel.org/r/20250919005955.1366256-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:27 -07:00
Sean Christopherson
ae8e6ad841 KVM: nVMX: Add consistency check for TSC_MULTIPLIER=0
Add a missing consistency check on the TSC Multiplier being '0'.  Per the
SDM:

  If the "use TSC scaling" VM-execution control is 1, the TSC-multiplier
  must not be zero.

Fixes: d041b5ea93 ("KVM: nVMX: Enable nested TSC scaling")
Link: https://lore.kernel.org/r/20250919005955.1366256-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:26 -07:00
Sean Christopherson
15fe455dd1 KVM: nVMX: Add consistency check for TPR_THRESHOLD[31:4]!=0 without VID
Add a missing consistency check on the TPR Threshold.  Per the SDM

  If the "use TPR shadow" VM-execution control is 1 and the "virtual-
  interrupt delivery" VM-execution control is 0, bits 31:4 of the TPR
  threshold VM-execution control field must be 0.

Note, nested_vmx_check_tpr_shadow_controls() bails early if "use TPR
shadow" is 0.

Link: https://lore.kernel.org/r/20250919005955.1366256-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:26 -07:00
Sean Christopherson
2f723a8634 KVM: VMX: Use kvm_mmu_page role to construct EPTP, not current vCPU state
Use the role for the to-be-loaded/invalidated EPT root to compute the
root's level and A/D enablement instead of pulling the information from
the vCPU (e.g. by passing in the root level and querying vmcs12).  Not
making unnecessary assumptions about the root will allow invalidating
arbitrary EPT roots (which sadly requires a full EPTP) at any given time.

No functional change intended (the end result should be the same).

Link: https://lore.kernel.org/r/20250919005955.1366256-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:26 -07:00
Sean Christopherson
a10f5cc3ac KVM: x86/mmu: Move "dummy root" helpers to spte.h
Move the helpers to get/query a dummy root from mmu_internal.h to spte.h
so that VMX can detect and handle dummy roots when constructing EPTPs.
This will allow using the root's role to build the EPTP instead of pulling
equivalent information out of the vCPU structure.

No functional change intended.

Link: https://lore.kernel.org/r/20250919005955.1366256-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:26 -07:00
Sean Christopherson
a8749281e4 KVM: nVMX: Hardcode dummy EPTP used for early nested consistency checks
Hardcode the dummy EPTP used for "early" consistency checks as there's no
need to use 5-level EPT based on the guest.MAXPHYADDR (the EPTP just needs
to be valid, it's never truly consumed).

This will allow breaking construct_eptp()'s dependency on having access to
the vCPU, which in turn will (much further in the future) allow for eliding
per-root TLB flushes when a vCPU is migrated between pCPUs (a flush is
need if and only if that particular pCPU hasn't already flushed the vCPU's
roots).

Link: https://lore.kernel.org/r/20250919005955.1366256-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:26 -07:00
Sean Christopherson
f48888bb8a KVM: VMX: Hoist construct_eptp() "up" in vmx.c
Move construct_eptp() further up in vmx.c so that it's above
vmx_flush_tlb_current(), its "first" user in vmx.c.  This will allow a
future patch to opportunistically make construct_eptp() local to vmx.c.

No functional change intended.

Link: https://lore.kernel.org/r/20250919005955.1366256-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-17 15:11:26 -07:00
Kathiravan Thirumoorthy
28803705b5 arm64: dts: qcom: ipq5424: correct the TF-A reserved memory to 512K
Correct the reserved memory size for TF-A to 512K, as it was mistakenly
marked as 500K. Update the reserved memory node accordingly.

Fixes: 8517204c98 ("arm64: dts: qcom: ipq5424: Add reserved memory for TF-A")
Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251014-tfa-reserved-mem-v1-1-48c82033c8a7@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 14:59:32 -07:00
Prasad Kumpatla
5e88dfc52f soc: qcom: pd-mapper: Add Kaanapali compatible
Add support for the Qualcomm Kaanapali SoC to the protection
domain mapper. Kaanapali shares the same protection domain
configuration as SM8550, except charger_pd as it move to SoCCP.

Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Signed-off-by: Prasad Kumpatla <prasad.kumpatla@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251014-knp-pdmapper-v2-v2-1-ba44422ac503@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 14:58:36 -07:00
Konrad Dybcio
e4c4f5a1ae dt-bindings: clock: qcom,x1e80100-gcc: Add missing USB4 clocks/resets
Some of the USB4 muxes, RCGs and resets were not initially described.

Add indices for them to allow extending the driver.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Bryan O'Donoghue <bod@kernel.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251003-topic-hamoa_gcc_usb4-v2-1-61d27a14ee65@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-10-17 14:50:55 -07:00
Han Gao
fac4be7b3d riscv: dts: thead: add zfh for th1520
th1520 support Zfh ISA extension.
It supports the same RISC-V extensions as SG2042.

commit cb074bed11 ("riscv: dts: sophgo: add zfh for sg2042")

Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-17 11:32:41 -07:00
Han Gao
bcc3b9c5de riscv: dts: thead: add ziccrse for th1520
Existing rv64 hardware conforms to the rva20 profile.

Ziccrse is an additional extension required by the rva20 profile, so
th1520 has this extension.

Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-17 11:32:41 -07:00
Han Gao
aef6dc0066 riscv: dts: thead: add xtheadvector to the th1520 devicetree
The th1520 support xtheadvector [1] so it can be included in the
devicetree. Also include vlenb for the cpu. And set vlenb=16 [2].

This can be tested by passing the "mitigations=off" kernel parameter.

Link: https://lore.kernel.org/linux-riscv/20241113-xtheadvector-v11-4-236c22791ef9@rivosinc.com/ [1]
Link: https://lore.kernel.org/linux-riscv/aCO44SAoS2kIP61r@ghost/ [2]

Signed-off-by: Han Gao <rabenda.cn@gmail.com>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Signed-off-by: Drew Fustini <fustini@kernel.org>
2025-10-17 11:32:41 -07:00
Arnd Bergmann
18caf68d2d Merge tag 'zynqmp-soc-for-6.18' of https://github.com/Xilinx/linux-xlnx into soc/drivers
arm64: Xilinx SOC changes for 6.18

firmware:
- Add debugfs interface
- Wire versal-net compatible string
- Change SOC family detection

* tag 'zynqmp-soc-for-6.18' of https://github.com/Xilinx/linux-xlnx:
  drivers: firmware: xilinx: Switch to new family code in zynqmp_pm_get_family_info()
  drivers: firmware: xilinx: Add unique family code for all platforms
  firmware: xilinx: Add Versal NET platform compatible string
  firmware: xilinx: Add debugfs support for PM_GET_NODE_STATUS
2025-10-17 15:24:00 +02:00
Fred Chen
6953afcd81 ARM: dts: aspeed: santabarbara: Add eeprom device node for PRoT module
Add eeprom device node for PRot module FRU.

Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 16:44:13 +10:30
Fred Chen
ba317bdb79 ARM: dts: aspeed: santabarbara: Add AMD APML interface support
Enable AMD APML related features
 - add amd sbrmi node for SoC power reading
 - add amd sbtsi node for SoC temperature reading
 - rename the P0_I3C_APML_ALERT_L GPIO to align with the naming
   convention expected by the AMD APML tool

Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 16:44:13 +10:30
Fred Chen
4db26c65d2 ARM: dts: aspeed: santabarbara: Add gpio line name
Add GPIO line name for userspace control or monitoring
- Add leak-related line names to report chassis leak event
- Add debug-card-mux to control debug card access
- Add FM_MAIN_PWREN_RMC_EN_ISO_R to receive RMC power control signal

Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 16:44:13 +10:30
Fred Chen
816d369ebd ARM: dts: aspeed: santabarbara: Add bmc_ready_noled Led
Add a 'bmc_ready_noled' LED on GPIOB3 with GPIO_TRANSITORY to ensure its
state resets on BMC reboot.

Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 16:44:13 +10:30
Fred Chen
5941b4239f ARM: dts: aspeed: santabarbara: Enable MCTP for frontend NIC
Add the mctp-controller property and MCTP node to enable frontend NIC
management via PLDM over MCTP.

Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 16:44:13 +10:30
Fred Chen
89c51b7066 ARM: dts: aspeed: santabarbara: Add sensor support for extension boards
add power monitor and temperature sensors for extension boards in bus 6,
8, 10 and 13.

Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 16:44:13 +10:30
Fred Chen
76de084503 ARM: dts: aspeed: santabarbara: Add blank lines between nodes for readability
Add missing blank lines between DT nodes to follow the devicetree coding
style and improve readability.

No functional changes.

Signed-off-by: Fred Chen <fredchen.openbmc@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 16:44:13 +10:30
Kevin Tung
a5c59a2923 ARM: dts: aspeed: yosemite5: Add Meta Yosemite5 BMC
Add device tree for the Meta (Facebook) Yosemite5 compute node,
based on the AST2600 BMC.

The Yosemite5 platform provides monitoring of voltages, power,
temperatures, and other critical parameters across the motherboard,
CXL board, E1.S expansion board, and NIC components. The BMC also
logs relevant events and performs appropriate system actions in
response to abnormal conditions.

Signed-off-by: Kevin Tung <kevin.tung.openbmc@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 15:28:34 +10:30
Kevin Tung
8b811220eb dt-bindings: arm: aspeed: add Meta Yosemite5 board
Document the new compatibles used on Meta Yosemite5.

Signed-off-by: Kevin Tung <kevin.tung.openbmc@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-17 15:28:27 +10:30
Thierry Reding
25937d399b dt-bindings: power: Add power domain IDs for Tegra264
Add the set of power domain IDs available on the Tegra264 SoC so that
they can be used in device tree files.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-10-16 11:23:57 +02:00
Cristian Ciocaltea
ae753d769a arm64: defconfig: Enable DW HDMI QP CEC support
Enable support for the CEC interface of the Synopsys DesignWare HDMI QP
IP block.

This is used by all boards based on RK3588 & RK3576 SoCs.

Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-15 22:33:03 +02:00
Jim Mattson
c53c632592 KVM: SVM: Disallow EFER.LMSLE when not supported by hardware
Modern AMD CPUs do not support segment limit checks in 64-bit mode
(i.e. EFER.LMSLE must be zero). Do not allow a guest to set EFER.LMSLE
on a CPU that requires the bit to be zero.

For backwards compatibility, allow EFER.LMSLE to be set on CPUs that
support segment limit checks in 64-bit mode, even though KVM's
implementation of the feature is incomplete (e.g. KVM's emulator does
not enforce segment limits in 64-bit mode).

Fixes: eec4b140c9 ("KVM: SVM: Allow EFER.LMSLE to be set with nested svm")

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://lore.kernel.org/r/20251001001529.1119031-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-15 10:57:21 -07:00
Boon Khai Ng
0f1fd7319c arm64: dts: agilex5: Add GMAC0 node for NAND daughter card
Enable the GMAC0 node for the Agilex5 device when using the NAND
daughter card.

Signed-off-by: Boon Khai Ng <boon.khai.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-14 22:13:09 -05:00
Fong, Yan Kei
a025e1fb35 arm64: dts: socfpga: agilex5: Add 4-bit SPI bus width
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the
agilex5 device tree. This update configures the SPI controller to use a
4-bit bus width for both transmission and reception, potentially improving
SPI throughput and matching the hardware capabilities more closely.

Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com>
Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-14 22:07:56 -05:00
Fong, Yan Kei
e928e15a3e arm64: dts: socfpga: agilex: Add 4-bit SPI bus width
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the
agilex device tree. This update configures the SPI controller to use a
4-bit bus width for both transmission and reception, potentially improving
SPI throughput and matching the hardware capabilities more closely.

Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com>
Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-14 22:07:20 -05:00
Fong, Yan Kei
76297a4a93 arm64: dts: socfpga: stratix10: Add 4-bit SPI bus width
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the
stratix10 device tree. This update configures the SPI controller to use a
4-bit bus width for both transmission and reception, potentially improving
SPI throughput and matching the hardware capabilities more closely.

Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com>
Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-14 22:06:22 -05:00
Fong, Yan Kei
b24ecccd1a arm64: dts: socfpga: n5x: Add 4-bit SPI bus width
Add spi-tx-bus-width and spi-rx-bus-width properties with value 4 to the
n5x device tree. This update configures the SPI controller to use a 4-bit
bus width for both transmission and reception, potentially improving SPI
throughput and matching the hardware capabilities more closely.

Signed-off-by: Fong, Yan Kei <yan.kei.fong@altera.com>
Reviewed-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-10-14 22:05:20 -05:00
Jim Mattson
4793f990ea KVM: x86: Advertise EferLmsleUnsupported to userspace
CPUID.80000008H:EBX.EferLmsleUnsupported[bit 20] is a defeature
bit. When this bit is clear, EFER.LMSLE is supported. When this bit is
set, EFER.LMLSE is unsupported. KVM has never _emulated_ EFER.LMSLE, so
KVM cannot truly support a 0-setting of this bit.

However, KVM has allowed the guest to enable EFER.LMSLE in hardware
since commit eec4b140c9 ("KVM: SVM: Allow EFER.LMSLE to be set with
nested svm"), i.e. KVM partially virtualizes long-mode segment limits _if_
they are supported by the underlying hardware.

Pass through the bit in KVM_GET_SUPPORTED_CPUID to advertise the
unavailability of EFER.LMSLE to userspace based on the raw underlying
hardware.  Attempting to enable EFER.LSMLE on such CPUs simply doesn't
work, e.g. immediately crashes on VMRUN.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://lore.kernel.org/r/20251001001529.1119031-2-jmattson@google.com
[sean: add context about partial virtualization, use PASSTHROUGH_F]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14 15:49:10 -07:00
Leonardo Bras
04fd067b77 KVM: Fix VM exit code for full dirty ring in API documentation
While reading the documentation, I saw a exit code I could not grep for, to
figure out it has a slightly different name.

Fix that name in documentation so it points to the right exit code.

Signed-off-by: Leonardo Bras <leo.bras@arm.com>
Link: https://lore.kernel.org/r/20251014152802.13563-1-leo.bras@arm.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14 15:19:05 -07:00
Jim Mattson
7c8b465a1c KVM: SVM: Mark VMCB_NPT as dirty on nested VMRUN
Mark the VMCB_NPT bit as dirty in nested_vmcb02_prepare_save()
on every nested VMRUN.

If L1 changes the PAT MSR between two VMRUN instructions on the same
L1 vCPU, the g_pat field in the associated vmcb02 will change, and the
VMCB_NPT clean bit should be cleared.

Fixes: 4bb170a543 ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit")
Cc: stable@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250922162935.621409-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14 12:50:57 -07:00
Jim Mattson
93c9e10738 KVM: SVM: Mark VMCB_PERM_MAP as dirty on nested VMRUN
Mark the VMCB_PERM_MAP bit as dirty in nested_vmcb02_prepare_control()
on every nested VMRUN.

If L1 changes MSR interception (INTERCEPT_MSR_PROT) between two VMRUN
instructions on the same L1 vCPU, the msrpm_base_pa in the associated
vmcb02 will change, and the VMCB_PERM_MAP clean bit should be cleared.

Fixes: 4bb170a543 ("KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit")
Reported-by: Matteo Rizzo <matteorizzo@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250922162935.621409-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-14 12:50:48 -07:00
Chaoyi Chen
134fae98cf arm64: dts: rockchip: Enable DisplayPort for rk3588-evb2
The rk3588 evb2 board has a full size DisplayPort connector, enable
for it.

Signed-off-by: Chaoyi Chen <chaoyi.chen@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14 19:51:42 +02:00
Tianling Shen
7fee888827 arm64: dts: rockchip: Add devicetree for the FriendlyElec NanoPi R76S
The NanoPi R76S (as "R76S") is an open-sourced mini IoT gateway
device with two 2.5G, designed and developed by FriendlyElec.

Specification:
- Rockchip RK3576
- 2/4GB LPDDR4X RAM
- 2x 2500Base-T (PCIe, rtl8125b)
- 3x LEDs (Power, LAN, WAN)
- 32GB eMMC
- MicroSD Slot
- MDMI 1.4/2.0 OUT
- M.2 E-Key SDIO slot
- USB 3.0 Port
- USB Type-C 5V Power

Signed-off-by: Tianling Shen <cnsztl@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14 19:48:16 +02:00
Tianling Shen
75e700fa35 dt-bindings: arm: rockchip: Add FriendlyElec NanoPi R76S
The NanoPi R76S (as "R76S") is an open-sourced mini IoT gateway
device with two 2.5G, designed and developed by FriendlyElec.

Add devicetree binding documentation for the FriendlyElec NanoPi R76S
board.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Tianling Shen <cnsztl@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14 19:48:16 +02:00
Andy Yan
642f287981 arm64: defconfig: Enable Rockchip extensions for Synopsys DW DP
Enable Rockchip specific extensions for Synopsys DesignWare DisplayPort
driver. This is used to provide DisplayPort output support for many boards
based on RK3588 SoC.

Signed-off-by: Andy Yan <andyshrk@163.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14 18:17:00 +02:00
WeiHao Li
5023d0cd61 arm64: dts: rockchip: Add DSI for RK3368
Add the Designware MIPI DSI controller and it's port nodes.

Signed-off-by: WeiHao Li <cn.liweihao@gmail.com>
[removed endpoint address, as there is only one vop leading to DSI]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14 17:24:31 +02:00
WeiHao Li
d62e2ad29b arm64: dts: rockchip: Add D-PHY for RK3368
RK3368 has a InnoSilicon D-PHY which supports DSI/LVDS/TTL with maximum
trasnfer rate of 1 Gbps per lane.

Signed-off-by: WeiHao Li <cn.liweihao@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14 17:24:28 +02:00
WeiHao Li
ef06b5ddee arm64: dts: rockchip: Add display subsystem for RK3368
Add vop and display-subsystem nodes to RK3368's device tree.

Signed-off-by: WeiHao Li <cn.liweihao@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-10-14 17:24:25 +02:00
Christian Marangi
5416aeee4e ARM: dts: mediatek: drop wrong syscon hifsys compatible for MT2701/7623
The syscon compatible for the hifsys node for Mediatek MT2701/MT7623 SoC
was wrongly added following the pattern of other clock node but it's
actually not needed as the register are not used by other device on the
SoC.

On top of this it's against the schema for hifsys YAML and causes a
dtbs_check warning.

Drop the "syscon" compatible to mute the warning and reflect the
compatible property described in the mediatek,mt2701-hifsys.yaml schema.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-14 13:47:13 +02:00
Thorsten Blum
0bd0a4a142 KVM: TDX: Replace kmalloc + copy_from_user with memdup_user in tdx_td_init()
Use get_user() to retrieve the number of entries instead of allocating
memory for 'init_vm' with the maximum size, copying 'cmd->data' to it,
only to then read the actual entry count 'cpuid.nent' from the copy.

Use memdup_user() to allocate just enough memory to fit all entries and
to copy 'cmd->data' from userspace. Use struct_size() instead of
manually calculating the number of bytes to allocate and copy.

No functional changes intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20250916213129.2535597-2-thorsten.blum@linux.dev
[sean: s/user_init_vm/user_data]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-13 15:11:06 -07:00
Marco Crivellari
9259607ec7 KVM: Explicitly allocate/setup irqfd cleanup as per-CPU workqueue
Explicitly request the use of per-CPU queues for the irqfd cleanup
workqueue in preparation for changing the default behavior of
alloc_workqueue() from per-CPU to unbound, which will in turn allow for
the removal of WQ_UNBOUND.  See commit 930c2ea566 ("workqueue: Add new
WQ_PERCPU flag") for details.

No functional change intended.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://lore.kernel.org/r/20250905091139.110677-2-marco.crivellari@suse.com
[sean: rewrite changelog to tailor it to the KVM]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-13 15:02:20 -07:00
Xin Li
f505c7b16f KVM: nVMX: Use vcpu instead of vmx->vcpu when vcpu is available
Prefer using vcpu directly when available, instead of accessing it
through vmx->vcpu.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20250924145421.2046822-1-xin@zytor.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-13 14:52:18 -07:00
Dmytro Maluka
0152e049bd KVM: VMX: Remove stale vmx_set_dr6() declaration
Remove leftover after commit 80c64c7afe ("KVM: x86: Drop
kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag") which removed
vmx_set_dr6().

Signed-off-by: Dmytro Maluka <dmaluka@chromium.org>
Link: https://lore.kernel.org/r/20250926155724.1619716-1-dmaluka@chromium.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-13 14:50:56 -07:00
Dmytro Maluka
b850841a53 KVM: x86/mmu: Skip MMIO SPTE invalidation if enable_mmio_caching=0
If MMIO caching is disabled, there are no MMIO SPTEs to invalidate, so
the costly zapping of all pages is unnecessary even in the unlikely case
when the MMIO generation number has wrapped.

Signed-off-by: Dmytro Maluka <dmaluka@chromium.org>
Link: https://lore.kernel.org/r/20250926135139.1597781-1-dmaluka@chromium.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-13 14:50:46 -07:00
Johan Hovold
f95f3bcead soc: apple: sart: drop device reference after lookup
Holding a reference to a device does not prevent its driver data from
going away so there is no point in keeping the reference after looking
up the sart device.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Sven Peter <sven@kernel.org>
2025-10-13 18:32:53 +02:00
Johan Hovold
f401671e90 soc: apple: mailbox: fix device leak on lookup
Make sure to drop the reference taken to the mbox platform device when
looking up its driver data.

Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.

Fixes: 6e1457fcad ("soc: apple: mailbox: Add ASC/M3 mailbox driver")
Cc: stable@vger.kernel.org	# 6.8
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Sven Peter <sven@kernel.org>
2025-10-13 18:32:53 +02:00
Aurelien Jarno
859ce3828f riscv: dts: spacemit: add i2c aliases on BPI-F3
Add i2c aliases for i2c2 and i2c8 on BPI-F3. This is useful to keep a
stable number for the /dev entries after loading the i2c-dev module.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20250926175833.3048516-4-aurelien@aurel32.net
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13 21:52:14 +08:00
Aurelien Jarno
bfce75e234 riscv: dts: spacemit: add 24c02 eeprom on BPI-F3
The BPI-F3 board includes a 24c02 eeprom, that stores the MAC addresses
of the two network interfaces and the board's serial number. These
values are also exposed via an onie,tlv-layout nvmem layout.

The eeprom is marked as read-only since its contents are not supposed to
be modified.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20250926175833.3048516-3-aurelien@aurel32.net
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13 21:52:11 +08:00
Aurelien Jarno
dcca228777 riscv: dts: spacemit: enable the i2c2 adapter on BPI-F3
Define properties for the I2C adapter, and enable it on the BPI-F3. It
will be used by the 24c02 eeprom.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Link: https://lore.kernel.org/r/20250926175833.3048516-2-aurelien@aurel32.net
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13 21:52:07 +08:00
Alex Elder
09a412d397 riscv: dts: spacemit: define regulator constraints
Define basic constraints for the regulators in the SpacemiT P1 PMIC,
as implemented in the Banana Pi BPI-F3.

Signed-off-by: Alex Elder <elder@riscstar.com>
Link: https://lore.kernel.org/r/20250825172057.163883-8-elder@riscstar.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13 21:20:40 +08:00
Alex Elder
1df07a4045 riscv: dts: spacemit: define fixed regulators
Define the DC power input and the 4v power as fixed supplies in the
Banana Pi BPI-F3.

Signed-off-by: Alex Elder <elder@riscstar.com>
Link: https://lore.kernel.org/r/20250825172057.163883-7-elder@riscstar.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13 21:20:36 +08:00
Alex Elder
3e8d7309e6 riscv: dts: spacemit: enable the i2c8 adapter
Define properties for the I2C adapter that provides access to the
SpacemiT P1 PMIC.  Enable this adapter on the Banana Pi BPI-F3.

Signed-off-by: Alex Elder <elder@riscstar.com>
Link: https://lore.kernel.org/r/20250825172057.163883-6-elder@riscstar.com
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13 21:20:32 +08:00
Hendrik Hamerlinck
1187f9b3f6 riscv: dts: spacemit: add UART pinctrl combinations
Add UART pinctrl configurations based on the SoC datasheet and the
downstream Bianbu Linux tree. The drive strength values were taken from
the downstream implementation, which uses medium drive strength.
CTS/RTS are moved to separate *-cts-rts-cfg states so boards can enable
hardware flow control conditionally.

Signed-off-by: Hendrik Hamerlinck <hendrik.hamerlinck@hammernet.be>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://lore.kernel.org/r/20250917065907.160615-1-hendrik.hamerlinck@hammernet.be
Signed-off-by: Yixun Lan <dlan@gentoo.org>
2025-10-13 20:50:14 +08:00
Igor Belwon
07a9bd3079 arm64: dts: mediatek: Add MT6878 pinmux macro header file
Add the required macros for the pinmux nodes of the MediaTek MT6878 SoC.

Signed-off-by: Igor Belwon <igor.belwon@mentallysanemainliners.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 12:12:14 +02:00
Duje Mihanović
0e53b0bcad arm64: dts: marvell: pxa1908: Add power domains
Update the APMU clock controller's compatible to allow the new power
domain driver to probe. Also add the first two power domain consumers:
IOMMU (fixes probing) and framebuffer.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:38 +02:00
Duje Mihanović
b3061b16d5 arm64: dts: marvell: samsung,coreprimevelte: Add USB connector
Add a node for the micro-USB connector found on the board.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:38 +02:00
Duje Mihanović
5698566012 arm64: dts: marvell: samsung,coreprimevelte: Fill in memory node
The board is known to have 1 GiB of DRAM with the first 16 MiB unusable.
Instead of relying on the bootloader to fill in the memory node, do it
ourselves.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:38 +02:00
Duje Mihanović
502c952160 arm64: dts: marvell: samsung,coreprimevelte: Drop some reserved memory
Most of the memory marked as reserved is actually usable. Delete its
reserved-memory nodes so that the memory can be used.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:38 +02:00
Duje Mihanović
35aa5733ac arm64: dts: marvell: pxa1908: Move ramoops to SoC dtsi
The ramoops memory region is the same for all boards based on the SoC.
Move its node to the appropriate dtsi.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:38 +02:00
Duje Mihanović
412f548663 arm64: dts: marvell: samsung,coreprimevelte: Add vibrator
The board has a vibrator hooked up to PWM3. Add a node for it and its
associated pinctrl configuration.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:37 +02:00
Duje Mihanović
1e49943914 arm64: dts: marvell: pxa1908: Add PWMs
The SoC has 4 onboard PWMs. Add a node for each of them.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:37 +02:00
Duje Mihanović
dc0e140e2b arm64: dts: marvell: samsung,coreprimevelte: Enable eMMC
Commit a41fcca4b3 ("mmc: sdhci-pxav3: set NEED_RSP_BUSY capability")
fixed eMMC probing on this board. Enable the eMMC and add its pinctrl.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:37 +02:00
Duje Mihanović
e7a644f1f9 arm64: dts: marvell: samsung,coreprimevelte: Correct CD GPIO
Right now, the CD GPIO is defined as active high with a cd-inverted
property. Just define the GPIO as active low instead.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:37 +02:00
Duje Mihanović
3d7590419a arm64: dts: marvell: samsung,coreprimevelte: Add backlight
Add a node for the KTD2801 backlight chip found on the board.

Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:37 +02:00
Karel Balej
645d81b886 arm64: dts: samsung,coreprimevelte: add SDIO
Set some basic properties of the SDIO card of the samsung,coreprimevelte
smartphone.

The SDIO is used as an interface for WiFi, Bluetooth and FM radio
serviced by the Marvell 88W8777 (SD8777) chipset. Support for this
chipset is currently not in-tree because the firmware is not available
in linux-firmware, however it is possible to trivially run it
out-of-tree using the mwifiex and Marvell Bluetooth drivers with some
caveats.

Link: https://lore.kernel.org/r/20231029111807.19261-1-balejk@matfyz.cz/
Signed-off-by: Karel Balej <balejk@matfyz.cz>
Reviewed-by: Duje Mihanović <duje@dujemihanovic.xyz>
[Duje: fix formatting of pins_0 and fast_pins_1 pin arrays]
Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:37 +02:00
Karel Balej
ec958b5b18 arm64: dts: samsung,coreprimevelte: add touchscreen
Bind touchscreen for the samsung,coreprimevelte smartphone. The
downstream code sets the VDD voltage to the exact value of 3.1 V,
however it's been empirically verified that the lower bound used here
sufficies for the proper operation of the chip and is thus used for
power-saving purposes.

Signed-off-by: Karel Balej <balejk@matfyz.cz>
Reviewed-by: Duje Mihanović <duje@dujemihanovic.xyz>
Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:36 +02:00
Karel Balej
66595c1ad8 arm64: dts: samsung,coreprimevelte: add PMIC
Bind power management chip to the samsung,coreprimevelte smartphone.
This enables support for onkey and RTC as well as for regulators two of
which are explicitly bound here to the SD card.

Signed-off-by: Karel Balej <balejk@matfyz.cz>
Reviewed-by: Duje Mihanović <duje@dujemihanovic.xyz>
Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz>
2025-10-13 12:11:34 +02:00
Cristian Cozzolino
12fd34e32d ARM: dts: mediatek: add basic support for Alcatel yarisxl board
This smartphone uses a MediaTek MT6582 system-on-chip with 512MB of RAM.
It can currently boot into initramfs with a working UART and
Simple Framebuffer using already initialized panel by the bootloader.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:48:05 +02:00
Cristian Cozzolino
10dee355bd dt-bindings: arm: mediatek: Add MT6582 yarisxl
Add an entry for Alcatel Pop C7 (OT-7041D) smartphone board, named
yarisxl, based on MT6582 SoC.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:48:05 +02:00
Cristian Cozzolino
246475ee9b ARM: dts: mediatek: mt6582: add enable-method property to cpus
Add missing property for SMP.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:48:05 +02:00
Cristian Cozzolino
a97a57c68d ARM: dts: mediatek: mt6582: add clock-names property to uart nodes
Other SoC DTSI files define this property for their uart nodes.
Do the same here too, to be consistent.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:48:05 +02:00
Cristian Cozzolino
f6b85f1cec ARM: dts: mediatek: mt6582: add mt6582 compatible to timer
Bindings already have mt6582 compatible for timer, so add that to DTSI.
While at it, remove clock-names property (which is not required by timer
node) and fix node address.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:48:05 +02:00
Cristian Cozzolino
5f3204e33a ARM: dts: mediatek: mt6582: remove compatible property from root node
SoC's DTSI does not require compatible property. Drop that.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:48:05 +02:00
Cristian Cozzolino
e898d7a2e2 ARM: dts: mediatek: mt6582: sort nodes and properties
Sort fixed clocks nodes by clock frequency and memory mapped device
nodes by reg address. Also, sort properties as shown in dt-bindings
examples.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:48:04 +02:00
Cristian Cozzolino
1e955255a8 ARM: dts: mediatek: mt6582: move MMIO devices under soc node
Memory mapped devices are part of soc. So, add such node and move those
devices into it.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:48:04 +02:00
Cristian Cozzolino
7f637d7187 ARM: mediatek: add MT6582 smp bring up code
Add support for booting secondary CPUs on MT6582.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:46:58 +02:00
Cristian Cozzolino
a9ad357fb2 ARM: mediatek: add board_dt_compat entry for the MT6582 SoC
Add a compatible string for the MT6582 SoC.

Signed-off-by: Cristian Cozzolino <cristian_ci@protonmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:46:58 +02:00
Mikhail Kshevetskiy
8365a3e1fd arm64: dts: mediatek: mt7986-bpi-r3: Change fan PWM value for mid speed
Popular cheap PWM fans for this machine, like the ones coming in
heatsink+fan combos will not work properly at the currently defined
medium speed. Trying different pwm setting using a command

  echo $value > /sys/devices/platform/pwm-fan/hwmon/hwmon1/pwm1

I found:

  pwm1 value     fan rotation speed   cpu temperature     notes
  -----------------------------------------------------------------
    0            maximal              31.5 Celsius        too noisy
   40            optimal              35.2 Celsius        no noise hearable
   95            minimal
   above 95      does not rotate      55.5 Celsius
  -----------------------------------------------------------------

Thus only cpu-active-high and cpu-active-low modes are usable.
I think this is wrong.

This patch fixes cpu-active-medium settings for bpi-r3 board.

I know, the patch is not ideal as it can break pwm fan for some users.
Likely this is the only official mt7986-bpi-r3 heatsink+fan solution
available on the market.

This patch may not be enough. Users may wants to tweak their thermal_zone0
trip points, thus tuning fan rotation speed depending on cpu temperature.
That can be done on the base of the following example:

  === example =========
  # cpu temperature below 25 Celsius degrees, no rotation
  echo 25000 > /sys/class/thermal/thermal_zone0/trip_point_4_temp
  # cpu temperature in [25..32] Celsius degrees, normal rotation speed
  echo 32000 > /sys/class/thermal/thermal_zone0/trip_point_3_temp
  # cpu temperature above 50 Celsius degrees, max rotation speed
  echo 50000 > /sys/class/thermal/thermal_zone0/trip_point_2_temp
  =====================

Signed-off-by: Mikhail Kshevetskiy <mikhail.kshevetskiy@iopsys.eu>
Acked-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:42:17 +02:00
Bartosz Bilas
bbfb1118fe arm64: dts: mediatek: mt8370-grinn-genio-510-sbc: Add Grinn GenioSBC-510
Add support for Grinn GenioSBC-510. Based on the commit introducing
support for the Grinn GenioSBC-700, this change adds support for the
Grinn GenioSBC-510, a single-board computer based on the MediaTek Genio
510 SoC.

More details about the hardware:
- https://grinn-global.com/products/grinn-geniosom-510
- https://grinn-global.com/products/grinn-genioboard-edge-ai-sbc

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Bartosz Bilas <b.bilas@grinn-global.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:39:43 +02:00
Mateusz Koza
f4d1eace54 arm64: dts: mediatek: mt8390-genio-700-evk: Add Grinn GenioSBC-700
Add support for Grinn GenioSBC-700. The Grinn GenioSBC-700 is a
single-board computer based on the MediaTek Genio 700 SoC. Its device
tree is split into separate SoM (.dtsi) and SBC (.dtsi) files, which are
combined in the SoC-specific .dts file.

More details about the hardware:
- https://grinn-global.com/products/grinn-geniosom-700
- https://grinn-global.com/products/grinn-genioboard-edge-ai-sbc

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Mateusz Koza <mateusz.koza@grinn-global.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:38:45 +02:00
Wolfram Sang
8895b0e600 arm64: dts: renesas: eagle-function-expansion: Add eMMC support
Add pinmuxing and configuration of the MMC-capable SDHI instance to make
use of the eMMC.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250923161709.3110-2-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:35:40 +02:00
Macpaul Lin
b59a508c1f arm64: defconfig: Enable UFS support for MediaTek Genio 1200 EVK UFS board
Enable the UFS related settings to support Genio 1200 EVK UFS board.
This board uses UFS as the boot device and also the main storage.
This includes support for:
 - CONFIG_SCSI_UFS_MEDIATEK

Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:35:35 +02:00
Daniel Golle
6bb220964d arm64: dts: mediatek: mt7988a: add 'soc-uuid' cell to efuse
The efuse of the MediaTek MT7988 contains a 16-byte unique identifier.
Add a 'soc-uuid' cell covering those 16 bytes to the nvmem defininition
of the efuse to allow easy access from userspace, eg. to generate a
persistent random MAC address on boards like the BananaPi R4 which
doesn't have any factory-assigned addresses.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:34:37 +02:00
Daniel Golle
d261557f7a arm64: dts: mediatek: mt7981b: add 'soc-uuid' cell to efuse
The efuse of the MediaTek MT7981 contains a 16-byte unique identifier.
Add a 'soc-uuid' cell covering those 16 bytes to the nvmem defininition
of the efuse to allow easy access from userspace, eg. to generate a
persistent random MAC address on boards which don't have any
factory-assigned addresses.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:34:37 +02:00
Daniel Golle
18d0f98a17 arm64: dts: mediatek: mt7986a: add 'soc-uuid' cell to efuse
The efuse of the MediaTek MT7986 contains an 8-byte unique identifier.
Add a 'soc-uuid' cell covering those 8 bytes to the nvmem defininition
of the efuse to allow easy access from userspace, eg. to generate a
persistent random MAC address on boards like the BananaPi R3 which
doesn't have any factory-assigned addresses.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:34:37 +02:00
Daniel Golle
79fb9654a2 arm64: dts: mediatek: mt7622: add 'soc-uuid' cell to efuse
The efuse of the MediaTek MT7622 contains an 8-byte unique identifier.
Add a 'soc-uuid' cell covering those 8 bytes to the nvmem defininition
of the efuse to allow easy access from userspace, eg. to generate a
persistent random MAC address on boards like the BananaPi R64 which
doesn't have any factory-assigned addresses.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:34:37 +02:00
Bryan Hinton
36712c5cf9 arm64: dts: mediatek: mt7981b: OpenWrt One: set console UART to 115200
Use the new uart0 label for the console and make the speed explicit by
setting stdout-path = "serial0:115200n8" under /chosen. This keeps the
DTS OS-agnostic: no bootargs or distribution-specific properties are
added.

Drop the 'current-speed' property from uart0 as it is not allowed by the
mediatek UART binding. The baud is already provided via stdout-path.

Verification: Boot-tested with mainline Image+DTB via U-Boot on OpenWrt
One (MT7981B). Serial console active at 115200, DTB decompile
confirms serial0 alias and stdout-path set correctly.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202509211032.0rJjPoYE-lkp@intel.com/
Signed-off-by: Bryan Hinton <bryan@bryanhinton.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:34:15 +02:00
Bryan Hinton
c0a824edb6 arm64: dts: mediatek: mt7981b: add labels for UART0/1/2 nodes
Add stable labels (uart0, uart1, uart2) to the MT7981B SoC UART nodes so
board DTS files can reference them directly. This change is purely
cosmetic and introduces no functional differences.

Verification: Built dtbs and boot-tested mainline Image+DTB via U-Boot on
MT7981B hardware; decompiled DT shows the uart0 label present and the
serial0 alias (or absolute path) resolves to serial@11002000.

Signed-off-by: Bryan Hinton <bryan@bryanhinton.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:34:15 +02:00
Macpaul Lin
e6bd92bb20 arm64: dts: mediatek: add device-tree for Genio 1200 EVK UFS board
Add a basic device-tree (mt8395-genio-1200-evk-ufs.dts) in order to be able
to use UFS storage as the main storage on Genio 1200 EVK board.

This board is the origin Genio 1200 EVK already mounted two main storages,
one is eMMC, and the other is UFS. The system automatically prioritizes
between eMMC and UFS via BROM detection, so user could not use both storage
types simultaneously. As a result, mt8395-evk-ufs must be treated as a
separate board.

It use mt8395-genio-common.dtsi file to use common definitions.

Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:33:51 +02:00
Macpaul Lin
6af8936fd7 arm64: dts: mediatek: mt8395-genio-1200-evk: Move common parts to dtsi
In preparation for introducing the Genio 1200 EVK UFS board support, split
mt8395-genio-1200-evk.dts file in two to create mt8395-genio-common.dtsi
file, containing common definitions for both boards.

Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:33:51 +02:00
Macpaul Lin
3d37117ab4 dt-bindings: arm64: mediatek: add mt8395-evk-ufs board
Add a compatible string for the MediaTek mt8395-evk-ufs board.
This board is the origin Genio 1200 EVK already mounted two main storages,
one is eMMC, and the other is UFS. The system automatically prioritizes
between eMMC and UFS via BROM detection, so user could not use both storage
types simultaneously. As a result, mt8395-evk-ufs must be treated as a
separate board.

Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:33:51 +02:00
Wolfram Sang
43e7b3c713 arm64: dts: renesas: v3[mh]sk: Remove wrong sound property in HDMI encoder node
'#sound-dai-cells' is not mentioned in the encoder bindings doc, so
dtbs_check rightfully complains. Remove the property.

.../renesas-v8/arch/arm64/boot/dts/renesas/r8a77980-v3hsk.dtb: hdmi@39 (adi,adv7511w): '#sound-dai-cells' does not match any of the regexes: 'pinctrl-[0-9]+'
	from schema $id: http://devicetree.org/schemas/display/bridge/adi,adv7511.yaml#

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250920070433.8229-2-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:32:25 +02:00
John Madieu
1ad66039be arm64: dts: renesas: r9a09g047: Add TSU node
Add TSU node along with thermal zones and keep it enabled in the SoC DTSI.

Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250917170202.197929-4-john.madieu.xa@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:32:25 +02:00
Marek Vasut
45447550f8 arm64: dts: renesas: sparrow-hawk: Add overlay for Argon40 fan HAT
Add a DT overlay to bind the Argon40 fan HAT on the Retronix R-Car V4H
Sparrow Hawk board.  Fan RPM control and full RPM on reboot has been
tested.

Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250907161130.218470-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:32:25 +02:00
Marek Vasut
38ef5be6e1 arm64: dts: renesas: sparrow-hawk: Add overlay for RPi Display 2
Add a DT overlay to bind the Raspberry Pi Display 2 (both 5" and 7"
variants), on the Retronix R-Car V4H Sparrow Hawk board.  All of display
output, touch controller, and backlight control have been tested.

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250904202838.172579-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:32:25 +02:00
Laurent Pinchart
f16068832a arm64: dts: renesas: rzg2l-smarc: Drop clock-frequency from camera sensor node
The clock-frequency for camera sensors has been deprecated in favour of
the assigned-clocks and assigned-clock-rates properties. As the clock
source for the sensor is a fixed-frequency oscillator, simply drop the
clock-frequency.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Mehdi Djait <mehdi.djait@linux.intel.com>
Link: https://patch.msgid.link/20250812214620.30425-73-laurent.pinchart@ideasonboard.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:32:25 +02:00
Laurent Pinchart
6f83835f56 arm64: dts: renesas: aistarvision-mipi-adapter-2.1: Drop clock-frequency from camera sensor node
The clock-frequency for camera sensors has been deprecated in favour of
the assigned-clocks and assigned-clock-rates properties. As the clock
source for the sensor is a fixed-frequency oscillator, simply drop the
clock-frequency.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Mehdi Djait <mehdi.djait@linux.intel.com>
Link: https://patch.msgid.link/20250812214620.30425-72-laurent.pinchart@ideasonboard.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:32:25 +02:00
Julien Massot
872fa3ea0c arm64: dts: mt8183: Rename nodes to match audiosys DT schema
The DT schema for "mediatek,mt8183-audiosys" expects an
audio-controller node inside the audiosys block. Rename
the nested AFE node from "mt8183-afe-pcm" to
"audio-controller" accordingly.

Also rename the audiosys node itself from "audio-controller" to
"clock-controller" to better reflect its function.

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Julien Massot <julien.massot@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2025-10-13 11:31:24 +02:00
Claudiu Beznea
ee9bfab464 arm64: dts: renesas: r9a08g045: Add TSU node
Add TSU node along with thermal zones and keep it enabled in the SoC DTSI.
The temperature reported by the TSU can only be read through channel 8 of
the ADC. Therefore, enable the ADC by default.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20250810122125.792966-4-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:15:42 +02:00
John Madieu
caf433bb59 arm64: defconfig: Enable the Renesas RZ/G3E thermal driver
Enable the Renesas RZ/G3E thermal driver, as used on the Renesas
RZ/G3E SMARC EVK board.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250917170202.197929-5-john.madieu.xa@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:09:48 +02:00
Claudiu Beznea
823062d153 arm64: defconfig: Enable Renesas RZ/G3S thermal driver
Enable the CONFIG_RZG3S_THERMAL flag for the RZ/G3S SoC.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20250810122125.792966-5-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-13 11:08:59 +02:00
Chen-Yu Tsai
a3606e8a78 arm64: dts: allwinner: t527: orangepi-4a: Enable Ethernet port
On the Orangepi 4A board, the second Ethernet controller, aka the GMAC200,
is connected to an external Motorcomm YT8531 PHY. The PHY uses an external
25MHz crystal, has the SoC's PI15 pin connected to its reset pin, and
the PI16 pin for its interrupt pin.

Enable it.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250923140247.2622602-7-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-10-13 15:52:14 +08:00
Chen-Yu Tsai
2e5d147ba9 arm64: dts: allwinner: t527: avaota-a1: enable second Ethernet port
On the Avaota A1 board, the second Ethernet controller, aka the GMAC200,
is connected to a second external RTL8211F-CG PHY. The PHY uses an
external 25MHz crystal, and has the SoC's PJ16 pin connected to its
reset pin.

Enable the second Ethernet port. Also fix up the label for the existing
external PHY connected to the first Ethernet port.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250923140247.2622602-6-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-10-13 15:52:14 +08:00
Chen-Yu Tsai
7076938d20 arm64: dts: allwinner: a527: cubie-a5e: Enable second Ethernet port
On the Radxa Cubie A5E board, the second Ethernet controller, aka the
GMAC200, is connected to a second external Maxio MAE0621A PHY. The PHY
uses an external 25MHz crystal, and has the SoC's PJ16 pin connected to
its reset pin.

Enable the second Ethernet port. Also fix up the label for the existing
external PHY connected to the first Ethernet port. An enable delay for the
PHY supply regulator is added to make sure the PHY's internal regulators
are fully powered and the PHY is operational.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250923140247.2622602-5-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-10-13 15:52:14 +08:00
Chen-Yu Tsai
460a71b564 arm64: dts: allwinner: a523: Add GMAC200 ethernet controller
The A523 SoC family has a second ethernet controller, called the
GMAC200. It is not exposed on all the SoCs in the family.

Add a device node for it. All the hardware specific settings are from
the vendor BSP.

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20250923140247.2622602-4-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-10-13 15:52:13 +08:00
André Draszik
f409bd20f0 arm64: dts: exynos: gs101-pixel-common: add node for s2mpg10 / clock
On Pixel 6 (and Pro), a Samsung S2MPG10 is used as main PMIC, which
contains the following functional blocks:
    * common / speedy interface
    * regulators
    * 3 clock outputs
    * RTC
    * power meters

This change adds a node for the clock outputs which are used as inputs
as follows:
* RTC clock for AP
* GNSS receiver, WLAN, Bluetooth
* vibrator, modem

The names have been chosen to match the schematic.

Signed-off-by: André Draszik <andre.draszik@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 04:37:46 +02:00
Krzysztof Kozlowski
3263127074 soc: samsung: exynos-pmu: Annotate online/offline functions with __must_hold
Annotate functions writing to PMU registers to online and offline CPUs
as __must_hold() the necessary spinlock for code correctness.  These are
static functions so possibility of mistakes is low here, but
__must_hold() serves as self-documenting code.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:58:51 +02:00
Alexandru Chimac
fe31e894c2 soc: samsung: exynos-chipid: Add exynos9610 SoC support
Exynos9610's product ID is "0xE9610000". Add this ID to the IDs
table along with the name of the SoC.

Signed-off-by: Alexandru Chimac <alex@chimac.ro>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:58:07 +02:00
Alexandru Chimac
944d40232e dt-bindings: hwinfo: samsung,exynos-chipid: add exynos9610 compatible
Add a compatible for the "samsung,exynos9610-chipid" node, used by
Exynos9610 platforms.

Signed-off-by: Alexandru Chimac <alex@chimac.ro>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:58:07 +02:00
Denzeel Oliva
1e6daba7a3 arm64: dts: exynos990: Add sysreg nodes for PERIC0 and PERIC1
Add syscon nodes for PERIC0 and PERIC1 blocks.
These are required for configuring the USI, SPI and I2C controllers.

Signed-off-by: Denzeel Oliva <wachiturroxd150@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:57:09 +02:00
Denzeel Oliva
ca4bd675ec dt-bindings: soc: samsung: exynos-sysreg: Add Exynos990 PERIC0/1 compatibles
Add compatible strings for Exynos990 PERIC0 and PERIC1 system register
controllers.

Signed-off-by: Denzeel Oliva <wachiturroxd150@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:56:27 +02:00
Leo Wang
b5dd162282 ARM: dts: aspeed: clemente: Add HDD LED GPIO
Define a GPIO expander pin for the HDD LED and expose it via the
LED subsystem. This allows the BMC to control the front panel
HDD activity LED.

Signed-off-by: Leo Wang <leo.jt.wang@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-13 11:21:14 +10:30
Eddie James
5ee7313022 ARM: dts: aspeed: Fix max31785 fan properties
Remove non-existant fan properties from max31785 nodes.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-13 11:21:14 +10:30
Eddie James
c0377c18f4 ARM: dts: aspeed: Add Balcones system
The Balcones system is similar to Bonnell but with a POWER11 processor.
Like POWER10, the POWER11 is a dual-chip module, so a dual chip FSI
tree is needed. Therefore, split up the quad chip FSI tree.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-13 11:21:14 +10:30
Eddie James
fa2d3aa180 dt-bindings: arm: aspeed: add IBM Bonnell board
Document the existing AST2600 BMC board for IBM P10 server.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-13 11:21:14 +10:30
Eddie James
7d970b893e dt-bindings: arm: aspeed: add IBM Balcones board
Document a new AST2600 BMC board for IBM P11 server.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-13 11:21:14 +10:30
Daniel Hsu
3b223bd48f ARM: dts: aspeed: harma: Add MCTP I2C controller node
The Facebook Harma BMC uses I2C1 as an MCTP (Management Component
Transport Protocol) bus. This patch enables the controller by
adding the `mctp-i2c-controller` node under I2C1, with multi-master
support.

Signed-off-by: Daniel Hsu <Daniel-Hsu@quantatw.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-13 11:21:14 +10:30
Zane Li
f4c7b62be2 ARM: dts: aspeed: yosemite4: allocate ramoops for kernel panic
Reserve a ramoops memory region in the Yosemite4 device tree so that
kernel panic logs can be preserved across reboots. This helps with
post-mortem debugging and crash analysis.

Signed-off-by: Zane Li <zane_li@wiwynn.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-13 11:21:14 +10:30
Leo Wang
7474ec9300 ARM: dts: aspeed: clemente: add shunt-resistor-micro-ohms for LM5066i
Add the 'shunt-resistor-micro-ohms' property to the LM5066i power
monitors on I2C1 for the Meta Clemente BMC board. This accurately
describes the hardware and is required for proper power monitoring.

Signed-off-by: Leo Wang <leo.jt.wang@gmail.com>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
2025-10-13 11:21:13 +10:30
Raghav Sharma
e3be50e829 arm64: dts: exynosautov920: add CMU_MFC clock DT nodes
Add required dt node for CMU_MFC block, which provides
clocks for MFC IP

Signed-off-by: Raghav Sharma <raghav.s@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:37:52 +02:00
Raghav Sharma
7b0ccabb87 arm64: dts: exynosautov920: add CMU_M2M clock DT nodes
Add required dt node for CMU_M2M block, which provides
clocks for M2M IP

Signed-off-by: Raghav Sharma <raghav.s@samsung.com>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:37:22 +02:00
Kaustabh Chakraborty
2bb07378ba arm64: dts: exynos7870-on7xelte: add bus-width to mmc0 node
Add the bus-width property in &mmc0 node. The Exynos DWMMC driver
assumes bus width to be 8 if not present in devicetree, so at least with
respect to the Linux kernel, this doesn't introduce any functional
changes. But other drivers referring to it may not. Either way, without
the bus-width property the hardware description remains incomplete.

Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:24:51 +02:00
Kaustabh Chakraborty
9e3e379393 arm64: dts: exynos7870-j6lte: add bus-width to mmc0 node
Add the bus-width property in &mmc0 node. The Exynos DWMMC driver
assumes bus width to be 8 if not present in devicetree, so at least with
respect to the Linux kernel, this doesn't introduce any functional
changes. But other drivers referring to it may not. Either way, without
the bus-width property the hardware description remains incomplete.

Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:24:51 +02:00
Kaustabh Chakraborty
5b33bce0bd arm64: dts: exynos7870-a2corelte: add bus-width to mmc0 node
Add the bus-width property in &mmc0 node. The Exynos DWMMC driver
assumes bus width to be 8 if not present in devicetree, so at least with
respect to the Linux kernel, this doesn't introduce any functional
changes. But other drivers referring to it may not. Either way, without
the bus-width property the hardware description remains incomplete.

Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:24:46 +02:00
Krzysztof Kozlowski
50c833c5cd memory: tegra30-emc: Add the SoC model prefix to functions
Replace "tegra_emc" with "tegra30_emc" in all functions to:
1. Avoid name clashing with other Tegra EMC drivers which makes it
   easier to jump to function definitions,
2. Decode the calltraces a bit easier,
3. Unify with other Tegra MC and EMC drivers, which use the SoC model
   prefixes.

No functional impact.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:23:03 +02:00
Krzysztof Kozlowski
5c8c19417c memory: tegra20-emc: Add the SoC model prefix to functions
Replace "tegra_emc" with "tegra20_emc" in all functions to:
1. Avoid name clashing with other Tegra EMC drivers which makes it
   easier to jump to function definitions,
2. Decode the calltraces a bit easier,
3. Unify with other Tegra MC and EMC drivers, which use the SoC model
   prefixes.

No functional impact.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:23:02 +02:00
Krzysztof Kozlowski
4ebcacbb44 memory: tegra186-emc: Add the SoC model prefix to functions
Replace "tegra_emc" with "tegra186_emc" in all functions to:
1. Avoid name clashing with other Tegra EMC drivers which makes it
   easier to jump to function definitions,
2. Decode the calltraces a bit easier,
3. Unify with other Tegra MC and EMC drivers, which use the SoC model
   prefixes.

No functional impact.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:23:02 +02:00
Krzysztof Kozlowski
e6e50496b7 memory: tegra124-emc: Add the SoC model prefix to functions
Replace "tegra_emc" with "tegra124_emc" in all functions to:
1. Avoid name clashing with other Tegra EMC drivers which makes it
   easier to jump to function definitions,
2. Decode the calltraces a bit easier,
3. Unify with other Tegra MC and EMC drivers, which use the SoC model
   prefixes.

No functional impact.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:23:01 +02:00
Krzysztof Kozlowski
f398631b76 memory: tegra124-emc: Simplify and handle deferred probe with dev_err_probe()
Certain calls, like clk_get, can cause probe deferral and driver should
handle it.  Use dev_err_probe() to fix that and also change other
non-deferred errors cases to make the code simpler.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:23:01 +02:00
Krzysztof Kozlowski
a52ddb98a6 memory: tegra186-emc: Simplify and handle deferred probe with dev_err_probe()
Certain calls, like clk_get, can cause probe deferral and driver should
handle it.  Use dev_err_probe() to fix that and also change other
non-deferred errors cases to make the code simpler.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:23:00 +02:00
Krzysztof Kozlowski
57c9f6e29c memory: tegra20-emc: Simplify and handle deferred probe with dev_err_probe()
Certain calls, like clk_get, can cause probe deferral and driver should
handle it.  Use dev_err_probe() to fix that and also change other
non-deferred errors cases to make the code simpler.

Also fix missing new line in error message of devm_devfreq_add_device().

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:23:00 +02:00
Krzysztof Kozlowski
db2bd7ab1a memory: tegra30-emc: Simplify and handle deferred probe with dev_err_probe()
Certain calls, like clk_get, can cause probe deferral and driver should
handle it.  Use dev_err_probe() to fix that and also change other
non-deferred errors cases to make the code simpler.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:23:00 +02:00
Krzysztof Kozlowski
c0ca941c93 memory: tegra30-emc: Do not print error on icc_node_create() failure
icc_node_create() is alloc-like function, so no need to print error
messages on its failure.  Dropping one label makes the code a bit
simpler.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:22:59 +02:00
Krzysztof Kozlowski
e215d91d66 memory: tegra20-emc: Do not print error on icc_node_create() failure
icc_node_create() is alloc-like function, so no need to print error
messages on its failure.  Dropping one label makes the code a bit
simpler.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:22:59 +02:00
Krzysztof Kozlowski
515498a3f5 memory: tegra186-emc: Do not print error on icc_node_create() failure
icc_node_create() is alloc-like function, so no need to print error
messages on its failure.  Dropping one label makes the code a bit
simpler.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:22:58 +02:00
Krzysztof Kozlowski
1c9cce8a0e memory: tegra124-emc: Do not print error on icc_node_create() failure
icc_node_create() is alloc-like function, so no need to print error
messages on its failure.  Dropping one label makes the code a bit
simpler.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:22:58 +02:00
Krzysztof Kozlowski
da722f1c9d memory: tegra124-emc: Simplify return of emc_init()
emc_init() returns always success, so just drop the return value to
simplify it.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-10-13 02:22:53 +02:00
Jay Buddhabhatti
25e3ae0ce3 drivers: firmware: xilinx: Switch to new family code in zynqmp_pm_get_family_info()
Currently, the family code and subfamily code are derived from the
PMC_TAP_IDCODE register. Versal, Versal NET share the same family
code. Also some platforms share the same subfamily code, making it
difficult to distinguish between platforms. Update
zynqmp_pm_get_family_info() to use IDs derived from the compatible
string instead of silicon ID codes derived from PMC_TAP_IDCODE register.

Signed-off-by: Jay Buddhabhatti <jay.buddhabhatti@amd.com>
Link: https://lore.kernel.org/r/20250701123851.1314531-4-jay.buddhabhatti@amd.com
Signed-off-by: Michal Simek <michal.simek@amd.com>
2025-08-29 09:58:45 +02:00
Jay Buddhabhatti
e66f4c35e3 drivers: firmware: xilinx: Add unique family code for all platforms
The family code is currently derived from the PMC_TAP_IDCODE register
value, but there are issues where Versal, Versal NET, and future
platforms share the same family code. Additionally for some platforms
have identical subfamily code, making it challenging to differentiate
between platforms based on the family and subfamily codes. To resolve
this, a new family code member is added to the platform data, initialized
with unique values. This change enables better platform distinction via
the compatible string.

Signed-off-by: Jay Buddhabhatti <jay.buddhabhatti@amd.com>
Link: https://lore.kernel.org/r/20250701123851.1314531-3-jay.buddhabhatti@amd.com
Signed-off-by: Michal Simek <michal.simek@amd.com>
2025-08-29 09:58:43 +02:00
Jay Buddhabhatti
ff1c629cea firmware: xilinx: Add Versal NET platform compatible string
Introduce a compatible string for the Versal NET platform which enables
platform identification based on the compatible property for platform
specific configurations.

Signed-off-by: Jay Buddhabhatti <jay.buddhabhatti@amd.com>
Link: https://lore.kernel.org/r/20250701123851.1314531-2-jay.buddhabhatti@amd.com
Signed-off-by: Michal Simek <michal.simek@amd.com>
2025-08-29 09:57:31 +02:00
Madhav Bhatt
548fe51740 firmware: xilinx: Add debugfs support for PM_GET_NODE_STATUS
Add new debug interface to support PM_GET_NODE_STATUS to get the node
information like requirements and usage.

The debugfs firmware driver interface is only meant for testing and
debugging EEMI APIs. Hence, it is by-default disabled in production
systems.

Signed-off-by: Madhav Bhatt <madhav.bhatt@amd.com>
Link: https://lore.kernel.org/r/20250417094543.3873507-1-madhav.bhatt@amd.com
Signed-off-by: Michal Simek <michal.simek@amd.com>
2025-08-29 09:57:31 +02:00
1609 changed files with 80888 additions and 17917 deletions

View File

@@ -140,8 +140,8 @@ ForEachMacros:
- 'damon_for_each_scheme_safe'
- 'damon_for_each_target'
- 'damon_for_each_target_safe'
- 'damos_for_each_filter'
- 'damos_for_each_filter_safe'
- 'damos_for_each_core_filter'
- 'damos_for_each_core_filter_safe'
- 'damos_for_each_ops_filter'
- 'damos_for_each_ops_filter_safe'
- 'damos_for_each_quota_goal'

View File

@@ -186,6 +186,9 @@ Christian Brauner <brauner@kernel.org> <christian@brauner.io>
Christian Brauner <brauner@kernel.org> <christian.brauner@canonical.com>
Christian Brauner <brauner@kernel.org> <christian.brauner@ubuntu.com>
Christian Marangi <ansuelsmth@gmail.com>
Christophe Leroy <chleroy@kernel.org> <christophe.leroy@c-s.fr>
Christophe Leroy <chleroy@kernel.org> <christophe.leroy@csgroup.eu>
Christophe Leroy <chleroy@kernel.org> <christophe.leroy2@cs-soprasteria.com>
Christophe Ricard <christophe.ricard@gmail.com>
Christopher Obbard <christopher.obbard@linaro.org> <chris.obbard@collabora.com>
Christoph Hellwig <hch@lst.de>

View File

@@ -164,6 +164,13 @@ Description: Writing to and reading from this file sets and gets the pid of
the target process if the context is for virtual address spaces
monitoring, respectively.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/obsolete_target
Date: Oct 2025
Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the
obsoleteness of the matching parameters commit destination
target.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/nr_regions
Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>
@@ -303,6 +310,12 @@ Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the nid
parameter of the goal.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/path
Date: Oct 2025
Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the path
parameter of the goal.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>

View File

@@ -1513,6 +1513,10 @@ The following nested keys are defined.
oom_group_kill
The number of times a group OOM has occurred.
sock_throttled
The number of times network sockets associated with
this cgroup are throttled.
memory.events.local
Similar to memory.events but the fields in the file are local
to the cgroup i.e. not hierarchical. The file modified event

View File

@@ -1111,7 +1111,7 @@ Kernel parameters
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
crashkernel=size[KMG],cma
[KNL, X86] Reserve additional crash kernel memory from
[KNL, X86, ppc] Reserve additional crash kernel memory from
CMA. This reservation is usable by the first system's
userspace memory and kernel movable allocations (memory
balloon, zswap). Pages allocated from this memory range
@@ -7304,6 +7304,9 @@ Kernel parameters
them frequently to increase the rate of SLB faults
on kernel addresses.
no_slb_preload [PPC,EARLY]
Disables slb preloading for userspace.
sunrpc.min_resvport=
sunrpc.max_resvport=
[NFS,SUNRPC]

View File

@@ -211,6 +211,28 @@ End of target memory region in physical address.
The end physical address of memory region that DAMON_LRU_SORT will do work
against. By default, biggest System RAM is used as the region.
addr_unit
---------
A scale factor for memory addresses and bytes.
This parameter is for setting and getting the :ref:`address unit
<damon_design_addr_unit>` parameter of the DAMON instance for DAMON_RECLAIM.
``monitor_region_start`` and ``monitor_region_end`` should be provided in this
unit. For example, let's suppose ``addr_unit``, ``monitor_region_start`` and
``monitor_region_end`` are set as ``1024``, ``0`` and ``10``, respectively.
Then DAMON_LRU_SORT will work for 10 KiB length of physical address range that
starts from address zero (``[0 * 1024, 10 * 1024)`` in bytes).
Stat parameters having ``bytes_`` prefix are also in this unit. For example,
let's suppose values of ``addr_unit``, ``bytes_lru_sort_tried_hot_regions`` and
``bytes_lru_sorted_hot_regions`` are ``1024``, ``42``, and ``32``,
respectively. Then it means DAMON_LRU_SORT tried to LRU-sort 42 KiB of hot
memory and successfully LRU-sorted 32 KiB of the memory in total.
If unsure, use only the default value (``1``) and forget about this.
kdamond_pid
-----------

View File

@@ -232,6 +232,28 @@ The end physical address of memory region that DAMON_RECLAIM will do work
against. That is, DAMON_RECLAIM will find cold memory regions in this region
and reclaims. By default, biggest System RAM is used as the region.
addr_unit
---------
A scale factor for memory addresses and bytes.
This parameter is for setting and getting the :ref:`address unit
<damon_design_addr_unit>` parameter of the DAMON instance for DAMON_RECLAIM.
``monitor_region_start`` and ``monitor_region_end`` should be provided in this
unit. For example, let's suppose ``addr_unit``, ``monitor_region_start`` and
``monitor_region_end`` are set as ``1024``, ``0`` and ``10``, respectively.
Then DAMON_RECLAIM will work for 10 KiB length of physical address range that
starts from address zero (``[0 * 1024, 10 * 1024)`` in bytes).
``bytes_reclaim_tried_regions`` and ``bytes_reclaimed_regions`` are also in
this unit. For example, let's suppose values of ``addr_unit``,
``bytes_reclaim_tried_regions`` and ``bytes_reclaimed_regions`` are ``1024``,
``42``, and ``32``, respectively. Then it means DAMON_RECLAIM tried to reclaim
42 KiB memory and successfully reclaimed 32 KiB memory in total.
If unsure, use only the default value (``1``) and forget about this.
skip_anon
---------

View File

@@ -10,6 +10,8 @@ on the system's entire physical memory using DAMON, and provides simplified
access monitoring results statistics, namely idle time percentiles and
estimated memory bandwidth.
.. _damon_stat_monitoring_accuracy_overhead:
Monitoring Accuracy and Overhead
================================
@@ -17,9 +19,11 @@ DAMON_STAT uses monitoring intervals :ref:`auto-tuning
<damon_design_monitoring_intervals_autotuning>` to make its accuracy high and
overhead minimum. It auto-tunes the intervals aiming 4 % of observable access
events to be captured in each snapshot, while limiting the resulting sampling
events to be 5 milliseconds in minimum and 10 seconds in maximum. On a few
interval to be 5 milliseconds in minimum and 10 seconds in maximum. On a few
production server systems, it resulted in consuming only 0.x % single CPU time,
while capturing reasonable quality of access patterns.
while capturing reasonable quality of access patterns. The tuning-resulting
intervals can be retrieved via ``aggr_interval_us`` :ref:`parameter
<damon_stat_aggr_interval_us>`.
Interface: Module Parameters
============================
@@ -41,6 +45,18 @@ You can enable DAMON_STAT by setting the value of this parameter as ``Y``.
Setting it as ``N`` disables DAMON_STAT. The default value is set by
``CONFIG_DAMON_STAT_ENABLED_DEFAULT`` build config option.
.. _damon_stat_aggr_interval_us:
aggr_interval_us
----------------
Auto-tuned aggregation time interval in microseconds.
Users can read the aggregation interval of DAMON that is being used by the
DAMON instance for DAMON_STAT. It is :ref:`auto-tuned
<damon_stat_monitoring_accuracy_overhead>` and therefore the value is
dynamically changed.
estimated_memory_bandwidth
--------------------------
@@ -58,12 +74,13 @@ memory_idle_ms_percentiles
Per-byte idle time (milliseconds) percentiles of the system.
DAMON_STAT calculates how long each byte of the memory was not accessed until
now (idle time), based on the current DAMON results snapshot. If DAMON found a
region of access frequency (nr_accesses) larger than zero, every byte of the
region gets zero idle time. If a region has zero access frequency
(nr_accesses), how long the region was keeping the zero access frequency (age)
becomes the idle time of every byte of the region. Then, DAMON_STAT exposes
the percentiles of the idle time values via this read-only parameter. Reading
the parameter returns 101 idle time values in milliseconds, separated by comma.
now (idle time), based on the current DAMON results snapshot. For regions
having access frequency (nr_accesses) larger than zero, how long the current
access frequency level was kept multiplied by ``-1`` becomes the idlee time of
every byte of the region. If a region has zero access frequency (nr_accesses),
how long the region was keeping the zero access frequency (age) becomes the
idle time of every byte of the region. Then, DAMON_STAT exposes the
percentiles of the idle time values via this read-only parameter. Reading the
parameter returns 101 idle time values in milliseconds, separated by comma.
Each value represents 0-th, 1st, 2nd, 3rd, ..., 99th and 100th percentile idle
times.

View File

@@ -67,7 +67,7 @@ comma (",").
│ │ │ │ │ │ │ intervals_goal/access_bp,aggrs,min_sample_us,max_sample_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ :ref:`targets <sysfs_targets>`/nr_targets
│ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target
│ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target,obsolete_target
│ │ │ │ │ │ │ :ref:`regions <sysfs_regions>`/nr_regions
│ │ │ │ │ │ │ │ :ref:`0 <sysfs_region>`/start,end
│ │ │ │ │ │ │ │ ...
@@ -81,7 +81,7 @@ comma (",").
│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms,effective_bytes
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value,nid
│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value,nid,path
│ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ :ref:`{core_,ops_,}filters <sysfs_filters>`/nr_filters
│ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max
@@ -134,7 +134,8 @@ Users can write below commands for the kdamond to the ``state`` file.
- ``on``: Start running.
- ``off``: Stop running.
- ``commit``: Read the user inputs in the sysfs files except ``state`` file
again.
again. Monitoring :ref:`target region <sysfs_regions>` inputs are also be
ignored if no target region is specified.
- ``update_tuned_intervals``: Update the contents of ``sample_us`` and
``aggr_us`` files of the kdamond with the auto-tuning applied ``sampling
interval`` and ``aggregation interval`` for the files. Please refer to
@@ -264,13 +265,20 @@ to ``N-1``. Each directory represents each monitoring target.
targets/<N>/
------------
In each target directory, one file (``pid_target``) and one directory
(``regions``) exist.
In each target directory, two files (``pid_target`` and ``obsolete_target``)
and one directory (``regions``) exist.
If you wrote ``vaddr`` to the ``contexts/<N>/operations``, each target should
be a process. You can specify the process to DAMON by writing the pid of the
process to the ``pid_target`` file.
Users can selectively remove targets in the middle of the targets array by
writing non-zero value to ``obsolete_target`` file and committing it (writing
``commit`` to ``state`` file). DAMON will remove the matching targets from its
internal targets array. Users are responsible to construct target directories
again, so that those correctly represent the changed internal targets array.
.. _sysfs_regions:
targets/<N>/regions
@@ -289,6 +297,11 @@ In the beginning, this directory has only one file, ``nr_regions``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each initial monitoring target region.
If ``nr_regions`` is zero when committing new DAMON parameters online (writing
``commit`` to ``state`` file of :ref:`kdamond <sysfs_kdamond>`), the commit
logic ignores the target regions. In other words, the current monitoring
results for the target are preserved.
.. _sysfs_region:
regions/<N>/
@@ -402,9 +415,9 @@ number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each goal and current achievement.
Among the multiple feedback, the best one is used.
Each goal directory contains four files, namely ``target_metric``,
``target_value``, ``current_value`` and ``nid``. Users can set and get the
four parameters for the quota auto-tuning goals that specified on the
Each goal directory contains five files, namely ``target_metric``,
``target_value``, ``current_value`` ``nid`` and ``path``. Users can set and
get the five parameters for the quota auto-tuning goals that specified on the
:ref:`design doc <damon_design_damos_quotas_auto_tuning>` by writing to and
reading from each of the files. Note that users should further write
``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond

View File

@@ -39,7 +39,6 @@ the Linux memory management.
shrinker_debugfs
slab
soft-dirty
swap_numa
transhuge
userfaultfd
zswap

View File

@@ -115,7 +115,8 @@ Short descriptions to the page flags
A free memory block managed by the buddy system allocator.
The buddy system organizes free memory in blocks of various orders.
An order N block has 2^N physically contiguous pages, with the BUDDY flag
set for and _only_ for the first page.
set for all pages.
Before 4.6 only the first page of the block had the flag set.
15 - COMPOUND_HEAD
A compound page with order N consists of 2^N physically contiguous pages.
A compound page with order 2 takes the form of "HTTT", where H donates its

View File

@@ -1,78 +0,0 @@
===========================================
Automatically bind swap device to numa node
===========================================
If the system has more than one swap device and swap device has the node
information, we can make use of this information to decide which swap
device to use in get_swap_pages() to get better performance.
How to use this feature
=======================
Swap device has priority and that decides the order of it to be used. To make
use of automatically binding, there is no need to manipulate priority settings
for swap devices. e.g. on a 2 node machine, assume 2 swap devices swapA and
swapB, with swapA attached to node 0 and swapB attached to node 1, are going
to be swapped on. Simply swapping them on by doing::
# swapon /dev/swapA
# swapon /dev/swapB
Then node 0 will use the two swap devices in the order of swapA then swapB and
node 1 will use the two swap devices in the order of swapB then swapA. Note
that the order of them being swapped on doesn't matter.
A more complex example on a 4 node machine. Assume 6 swap devices are going to
be swapped on: swapA and swapB are attached to node 0, swapC is attached to
node 1, swapD and swapE are attached to node 2 and swapF is attached to node3.
The way to swap them on is the same as above::
# swapon /dev/swapA
# swapon /dev/swapB
# swapon /dev/swapC
# swapon /dev/swapD
# swapon /dev/swapE
# swapon /dev/swapF
Then node 0 will use them in the order of::
swapA/swapB -> swapC -> swapD -> swapE -> swapF
swapA and swapB will be used in a round robin mode before any other swap device.
node 1 will use them in the order of::
swapC -> swapA -> swapB -> swapD -> swapE -> swapF
node 2 will use them in the order of::
swapD/swapE -> swapA -> swapB -> swapC -> swapF
Similaly, swapD and swapE will be used in a round robin mode before any
other swap devices.
node 3 will use them in the order of::
swapF -> swapA -> swapB -> swapC -> swapD -> swapE
Implementation details
======================
The current code uses a priority based list, swap_avail_list, to decide
which swap device to use and if multiple swap devices share the same
priority, they are used round robin. This change here replaces the single
global swap_avail_list with a per-numa-node list, i.e. for each numa node,
it sees its own priority based list of available swap devices. Swap
device's priority can be promoted on its matching node's swap_avail_list.
The current swap device's priority is set as: user can set a >=0 value,
or the system will pick one starting from -1 then downwards. The priority
value in the swap_avail_list is the negated value of the swap device's
due to plist being sorted from low to high. The new policy doesn't change
the semantics for priority >=0 cases, the previous starting from -1 then
downwards now becomes starting from -2 then downwards and -1 is reserved
as the promoted value. So if multiple swap devices are attached to the same
node, they will all be promoted to priority -1 on that node's plist and will
be used round robin before any other swap devices.

View File

@@ -381,6 +381,11 @@ hugepage allocation policy for the tmpfs mount by using the kernel parameter
four valid policies for tmpfs (``always``, ``within_size``, ``advise``,
``never``). The tmpfs mount default policy is ``never``.
Additionally, Kconfig options are available to set the default hugepage
policies for shmem (``CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_*``) and tmpfs
(``CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_*``) at build time. Refer to the
Kconfig help for more details.
In the same manner as ``thp_anon`` controls each supported anonymous THP
size, ``thp_shmem`` controls each supported shmem THP size. ``thp_shmem``
has the same format as ``thp_anon``, but also supports the policy

View File

@@ -59,11 +59,11 @@ returned by the allocation routine and that handle must be mapped before being
accessed. The compressed memory pool grows on demand and shrinks as compressed
pages are freed. The pool is not preallocated.
When a swap page is passed from swapout to zswap, zswap maintains a mapping
of the swap entry, a combination of the swap type and swap offset, to the
zsmalloc handle that references that compressed swap page. This mapping is
achieved with a red-black tree per swap type. The swap offset is the search
key for the tree nodes.
When a swap page is passed from swapout to zswap, zswap maintains a mapping of
the swap entry, a combination of the swap type and swap offset, to the zsmalloc
handle that references that compressed swap page. This mapping is achieved
with an xarray per swap type. The swap offset is the search key for the xarray
nodes.
During a page fault on a PTE that is a swap entry, the swapin code calls the
zswap load function to decompress the page into the page allocated by the page

View File

@@ -249,6 +249,9 @@ The following keys are defined:
defined in the in the RISC-V ISA manual starting from commit e87412e621f1
("integrate Zaamo and Zalrsc text (#1304)").
* :c:macro:`RISCV_HWPROBE_EXT_ZALASR`: The Zalasr extension is supported as
frozen at commit 194f0094 ("Version 0.9 for freeze") of riscv-zalasr.
* :c:macro:`RISCV_HWPROBE_EXT_ZALRSC`: The Zalrsc extension is supported as
defined in the in the RISC-V ISA manual starting from commit e87412e621f1
("integrate Zaamo and Zalrsc text (#1304)").
@@ -275,6 +278,9 @@ The following keys are defined:
ratified in commit 49f49c842ff9 ("Update to Rafified state") of
riscv-zabha.
* :c:macro:`RISCV_HWPROBE_EXT_ZICBOP`: The Zicbop extension is supported, as
ratified in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: Deprecated. Returns similar values to
:c:macro:`RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF`, but the key was
mistakenly classified as a bitmask rather than a value.
@@ -369,4 +375,7 @@ The following keys are defined:
* :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XSFVFWMACCQQQ`: The Xsfvfwmaccqqq
vendor extension is supported in version 1.0 of Matrix Multiply Accumulate
Instruction Extensions Specification.
Instruction Extensions Specification.
* :c:macro:`RISCV_HWPROBE_KEY_ZICBOP_BLOCK_SIZE`: An unsigned int which
represents the size of the Zicbop block in bytes.

View File

@@ -31,7 +31,9 @@ properties:
- description: Mercury+ AA1 boards
items:
- enum:
- enclustra,mercury-pe1
- enclustra,mercury-aa1-pe1
- enclustra,mercury-aa1-pe3
- enclustra,mercury-aa1-st1
- google,chameleon-v3
- const: enclustra,mercury-aa1
- const: altr,socfpga-arria10
@@ -52,6 +54,26 @@ properties:
- const: altr,socfpga-cyclone5
- const: altr,socfpga
- description: Mercury SA1 boards
items:
- enum:
- enclustra,mercury-sa1-pe1
- enclustra,mercury-sa1-pe3
- enclustra,mercury-sa1-st1
- const: enclustra,mercury-sa1
- const: altr,socfpga-cyclone5
- const: altr,socfpga
- description: Mercury+ SA2 boards
items:
- enum:
- enclustra,mercury-sa2-pe1
- enclustra,mercury-sa2-pe3
- enclustra,mercury-sa2-st1
- const: enclustra,mercury-sa2
- const: altr,socfpga-cyclone5
- const: altr,socfpga
- description: Stratix 10 boards
items:
- enum:

View File

@@ -134,6 +134,7 @@ properties:
- libretech,aml-s912-pc
- minix,neo-u9h
- nexbox,a1
- oranth,tx9-pro
- tronsmart,vega-s96
- ugoos,am3
- videostrong,gxm-kiii-pro

View File

@@ -34,6 +34,9 @@ properties:
- amlogic,a4-ao-secure
- amlogic,c3-ao-secure
- amlogic,s4-ao-secure
- amlogic,s6-ao-secure
- amlogic,s7-ao-secure
- amlogic,s7d-ao-secure
- amlogic,t7-ao-secure
- const: amlogic,meson-gx-ao-secure
- const: syscon

View File

@@ -93,7 +93,10 @@ properties:
- facebook,minerva-cmc
- facebook,santabarbara-bmc
- facebook,yosemite4-bmc
- facebook,yosemite5-bmc
- ibm,balcones-bmc
- ibm,blueridge-bmc
- ibm,bonnell-bmc
- ibm,everest-bmc
- ibm,fuji-bmc
- ibm,rainier-bmc

View File

@@ -0,0 +1,31 @@
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
%YAML 1.2
---
$id: http://devicetree.org/schemas/arm/bst.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: BST platforms
description:
Black Sesame Technologies (BST) is a semiconductor company that produces
automotive-grade system-on-chips (SoCs) for intelligent driving, focusing
on computer vision and AI capabilities. The BST C1200 family includes SoCs
for ADAS (Advanced Driver Assistance Systems) and autonomous driving
applications.
maintainers:
- Ge Gordon <gordon.ge@bst.ai>
properties:
$nodename:
const: '/'
compatible:
oneOf:
- description: BST C1200 CDCU1.0 ADAS 4C2G board
items:
- const: bst,c1200-cdcu1.0-adas-4c2g
- const: bst,c1200
additionalProperties: true
...

View File

@@ -1106,11 +1106,14 @@ properties:
- gateworks,imx8mp-gw75xx-2x # i.MX8MP Gateworks Board
- gateworks,imx8mp-gw82xx-2x # i.MX8MP Gateworks Board
- gocontroll,moduline-display # GOcontroll Moduline Display controller
- prt,prt8ml # Protonic PRT8ML
- skov,imx8mp-skov-basic # SKOV i.MX8MP baseboard without frontplate
- skov,imx8mp-skov-revb-hdmi # SKOV i.MX8MP climate control without panel
- skov,imx8mp-skov-revb-lt6 # SKOV i.MX8MP climate control with 7” panel
- skov,imx8mp-skov-revb-mi1010ait-1cp1 # SKOV i.MX8MP climate control with 10.1" panel
- skov,imx8mp-skov-revc-hdmi # SKOV i.MX8MP climate control without panel
- skov,imx8mp-skov-revc-bd500 # SKOV i.MX8MP climate control with LED frontplate
- skov,imx8mp-skov-revc-jutouch-jt101tm023 # SKOV i.MX8MP climate control with 10" JuTouch panel
- skov,imx8mp-skov-revc-tian-g07017 # SKOV i.MX8MP climate control with 7" panel
- ultratronik,imx8mp-ultra-mach-sbc # Ultratronik SBC i.MX8MP based board
- ysoft,imx8mp-iota2-lumpy # Y Soft i.MX8MP IOTA2 Lumpy Board
@@ -1430,6 +1433,7 @@ properties:
- enum:
- fsl,imx95-15x15-evk # i.MX95 15x15 EVK Board
- fsl,imx95-19x19-evk # i.MX95 19x19 EVK Board
- toradex,verdin-imx95-19x19-evk # i.MX95 Verdin Evaluation Kit (EVK)
- const: fsl,imx95
- description: PHYTEC i.MX 95 FPSC based Boards
@@ -1439,6 +1443,12 @@ properties:
- const: phytec,imx95-phycore-fpsc # phyCORE-i.MX 95 FPSC
- const: fsl,imx95
- description: Toradex Boards with SMARC iMX95 Modules
items:
- const: toradex,smarc-imx95-dev # Toradex SMARC iMX95 on Toradex SMARC Development Board
- const: toradex,smarc-imx95 # Toradex SMARC iMX95 Module
- const: fsl,imx95
- description: i.MXRT1050 based Boards
items:
- enum:
@@ -1492,6 +1502,13 @@ properties:
- const: tq,imx93-tqma9352 # TQ-Systems GmbH i.MX93 TQMa93xxCA/LA SOM
- const: fsl,imx93
- description: PHYTEC phyCORE-i.MX91 SoM based boards
items:
- enum:
- phytec,imx91-phyboard-segin # phyBOARD-Segin with i.MX91
- const: phytec,imx91-phycore-som # phyCORE-i.MX91 SoM
- const: fsl,imx91
- description: PHYTEC phyCORE-i.MX93 SoM based boards
items:
- enum:

View File

@@ -21,10 +21,17 @@ properties:
- intel,socfpga-agilex-n6000
- intel,socfpga-agilex-socdk
- const: intel,socfpga-agilex
- description: Agilex3 boards
items:
- enum:
- intel,socfpga-agilex3-socdk
- const: intel,socfpga-agilex3
- const: intel,socfpga-agilex5
- description: Agilex5 boards
items:
- enum:
- intel,socfpga-agilex5-socdk
- intel,socfpga-agilex5-socdk-013b
- intel,socfpga-agilex5-socdk-nand
- const: intel,socfpga-agilex5

View File

@@ -38,6 +38,7 @@ properties:
- const: mediatek,mt6580
- items:
- enum:
- alcatel,yarisxl
- prestigio,pmt5008-3g
- const: mediatek,mt6582
- items:
@@ -113,6 +114,12 @@ properties:
- const: bananapi,bpi-r4-2g5
- const: bananapi,bpi-r4
- const: mediatek,mt7988a
- items:
- enum:
- bananapi,bpi-r4-pro-4e
- bananapi,bpi-r4-pro-8x
- const: bananapi,bpi-r4-pro
- const: mediatek,mt7988a
- items:
- enum:
- mediatek,mt8127-moose
@@ -445,6 +452,7 @@ properties:
- enum:
- kontron,3-5-sbc-i1200
- mediatek,mt8395-evk
- mediatek,mt8395-evk-ufs
- radxa,nio-12l
- const: mediatek,mt8395
- const: mediatek,mt8195

View File

@@ -88,6 +88,7 @@ properties:
- items:
- enum:
- asus,z00t
- huawei,kiwi
- longcheer,l9100
- samsung,a7
@@ -191,6 +192,11 @@ properties:
- xiaomi,riva
- const: qcom,msm8917
- items:
- enum:
- xiaomi,land
- const: qcom,msm8937
- items:
- enum:
- flipkart,rimob
@@ -340,6 +346,7 @@ properties:
- particle,tachyon
- qcom,qcm6490-idp
- qcom,qcs6490-rb3gen2
- radxa,dragon-q6a
- shift,otter
- const: qcom,qcm6490
@@ -893,6 +900,7 @@ properties:
- items:
- enum:
- huawei,planck
- lenovo,yoga-c630
- lg,judyln
- lg,judyp
@@ -1083,7 +1091,13 @@ properties:
- items:
- enum:
- asus,zenbook-a14-ux3407qa
- asus,zenbook-a14-ux3407qa-lcd
- asus,zenbook-a14-ux3407qa-oled
- const: asus,zenbook-a14-ux3407qa
- const: qcom,x1p42100
- items:
- enum:
- hp,omnibook-x14-fe1
- lenovo,thinkbook-16
- qcom,x1p42100-crd
@@ -1167,6 +1181,7 @@ allOf:
- qcom,apq8094
- qcom,apq8096
- qcom,msm8917
- qcom,msm8937
- qcom,msm8939
- qcom,msm8953
- qcom,msm8956

View File

@@ -15,6 +15,11 @@ properties:
compatible:
oneOf:
- description: 100ASK DshanPi A1 board
items:
- const: 100ask,dshanpi-a1
- const: rockchip,rk3576
- description: 96boards RK3399 Ficus (ROCK960 Enterprise Edition)
items:
- const: vamrs,ficus
@@ -25,6 +30,12 @@ properties:
- const: vamrs,rock960
- const: rockchip,rk3399
- description: 9Tripod X3568 series board
items:
- enum:
- 9tripod,x3568-v4
- const: rockchip,rk3568
- description: Amarula Vyasa RK3288
items:
- const: amarula,vyasa-rk3288
@@ -78,13 +89,17 @@ properties:
- description: Asus Tinker board
items:
- const: asus,rk3288-tinker
- enum:
- asus,rk3288-tinker
- asus,rk3288-tinker-s
- const: rockchip,rk3288
- description: Asus Tinker board S
- description: Asus Tinker Board 3/3S
items:
- const: asus,rk3288-tinker-s
- const: rockchip,rk3288
- enum:
- asus,rk3566-tinker-board-3
- asus,rk3566-tinker-board-3s
- const: rockchip,rk3566
- description: Beelink A1
items:
@@ -330,6 +345,11 @@ properties:
- friendlyarm,nanopi-r6s
- const: rockchip,rk3588s
- description: FriendlyElec NanoPi R76S
items:
- const: friendlyarm,nanopi-r76s
- const: rockchip,rk3576
- description: FriendlyElec NanoPi Zero2
items:
- const: friendlyarm,nanopi-zero2
@@ -748,6 +768,11 @@ properties:
- const: lckfb,tspi-rk3566
- const: rockchip,rk3566
- description: LinkEase EasePi R1
items:
- const: linkease,easepi-r1
- const: rockchip,rk3568
- description: Luckfox Core3576 Module based boards
items:
- enum:
@@ -868,9 +893,11 @@ properties:
- const: prt,mecsbc
- const: rockchip,rk3568
- description: QNAP TS-433-4G 4-Bay NAS
- description: QNAP TS-x33 NAS devices
items:
- const: qnap,ts433
- enum:
- qnap,ts233
- qnap,ts433
- const: rockchip,rk3568
- description: Radxa Compute Module 3 (CM3)

View File

@@ -189,6 +189,11 @@ properties:
- nvidia,p2371-2180
- nvidia,p2571
- nvidia,p2894-0050-a08
- nvidia,p3450-0000
- const: nvidia,tegra210
- items:
- const: nvidia,p3541-0000
- const: nvidia,p3450-0000
- const: nvidia,tegra210
- description: Jetson TX2 Developer Kit
items:

View File

@@ -37,6 +37,12 @@ properties:
- const: phytec,am62a-phycore-som
- const: ti,am62a7
- description: K3 AM62L3 SoC and Boards
items:
- enum:
- ti,am62l3-evm
- const: ti,am62l3
- description: K3 AM62P5 SoC and Boards
items:
- enum:
@@ -158,6 +164,14 @@ properties:
- ti,am654-evm
- const: ti,am654
- description: K3 AM69 SoC Toradex Aquila Modules and Carrier Boards
items:
- enum:
- toradex,aquila-am69-clover # Aquila AM69 Module on Clover Board
- toradex,aquila-am69-dev # Aquila AM69 Module on Aquila Development Board
- const: toradex,aquila-am69 # Aquila AM69 Module
- const: ti,j784s4
- description: K3 J7200 SoC
oneOf:
- const: ti,j7200
@@ -194,6 +208,7 @@ properties:
items:
- enum:
- beagle,am67a-beagley-ai
- kontron,sa67 # Kontron SMARC-sAM67 board
- ti,j722s-evm
- const: ti,j722s

View File

@@ -129,6 +129,13 @@ properties:
- const: phytec,am335x-phycore-som
- const: ti,am33xx
- description: TQ-Systems TQMa335x[L] SoM
items:
- enum:
- tq,tqma3359-mba335x # MBa335x carrier board
- const: tq,tqma3359
- const: ti,am33xx
- description: TI OMAP4430 SoC based platforms
items:
- enum:

View File

@@ -22,6 +22,13 @@ properties:
- fsl,lx2160aqds-fpga
- const: fsl,fpga-qixis-i2c
- const: simple-mfd
- const: fsl,lx2160ardb-fpga
"#address-cells":
const: 1
"#size-cells":
const: 0
interrupts:
maxItems: 1
@@ -32,10 +39,37 @@ properties:
mux-controller:
$ref: /schemas/mux/reg-mux.yaml
patternProperties:
"^gpio@[0-9a-f]+$":
type: object
additionalProperties: true
properties:
compatible:
contains:
enum:
- fsl,lx2160ardb-fpga-gpio-sfp
required:
- compatible
- reg
allOf:
- if:
properties:
compatible:
contains:
enum:
- fsl,lx2160ardb-fpga
then:
required:
- "#address-cells"
- "#size-cells"
else:
properties:
"#address-cells": false
"#size-cells": false
additionalProperties: false
examples:
@@ -68,3 +102,27 @@ examples:
};
};
- |
i2c {
#address-cells = <1>;
#size-cells = <0>;
board-control@66 {
compatible = "fsl,lx2160ardb-fpga";
reg = <0x66>;
#address-cells = <1>;
#size-cells = <0>;
gpio@19 {
compatible = "fsl,lx2160ardb-fpga-gpio-sfp";
reg = <0x19>;
gpio-controller;
#gpio-cells = <2>;
gpio-line-names =
"SFP2_TX_EN", "",
"", "",
"SFP2_RX_LOS", "SFP2_TX_FAULT",
"", "SFP2_MOD_ABS";
};
};
};

View File

@@ -57,6 +57,16 @@ patternProperties:
'^mdio-mux@[a-f0-9,]+$':
$ref: /schemas/net/mdio-mux-mmioreg.yaml
'^gpio@[0-9a-f]+$':
type: object
additionalProperties: true
properties:
compatible:
contains:
enum:
- fsl,ls1046aqds-fpga-gpio-stat-pres2
required:
- compatible
- reg

View File

@@ -33,14 +33,18 @@ select:
properties:
compatible:
contains:
const: st,stm32mp25-rifsc
enum:
- st,stm32mp21-rifsc
- st,stm32mp25-rifsc
required:
- compatible
properties:
compatible:
items:
- const: st,stm32mp25-rifsc
- enum:
- st,stm32mp21-rifsc
- st,stm32mp25-rifsc
- const: simple-bus
reg:

View File

@@ -21,6 +21,7 @@ properties:
compatible:
enum:
- qcom,ipq5424-llcc
- qcom,kaanapali-llcc
- qcom,qcs615-llcc
- qcom,qcs8300-llcc
- qcom,qdu1000-llcc
@@ -272,6 +273,7 @@ allOf:
compatible:
contains:
enum:
- qcom,kaanapali-llcc
- qcom,sm8450-llcc
- qcom,sm8550-llcc
- qcom,sm8650-llcc

View File

@@ -48,6 +48,11 @@ properties:
- const: microchip,mpfs-ccache
- const: sifive,fu540-c000-ccache
- const: cache
- items:
- const: microchip,pic64gx-ccache
- const: microchip,mpfs-ccache
- const: sifive,fu540-c000-ccache
- const: cache
cache-block-size:
const: 64

View File

@@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/clock/qcom,ipq9574-nsscc.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Qualcomm Networking Sub System Clock & Reset Controller on IPQ9574
title: Qualcomm Networking Sub System Clock & Reset Controller on IPQ9574 and IPQ5424
maintainers:
- Bjorn Andersson <andersson@kernel.org>
@@ -12,21 +12,29 @@ maintainers:
description: |
Qualcomm networking sub system clock control module provides the clocks,
resets on IPQ9574
resets on IPQ9574 and IPQ5424
See also::
See also:
include/dt-bindings/clock/qcom,ipq5424-nsscc.h
include/dt-bindings/clock/qcom,ipq9574-nsscc.h
include/dt-bindings/reset/qcom,ipq5424-nsscc.h
include/dt-bindings/reset/qcom,ipq9574-nsscc.h
properties:
compatible:
const: qcom,ipq9574-nsscc
enum:
- qcom,ipq5424-nsscc
- qcom,ipq9574-nsscc
clocks:
items:
- description: Board XO source
- description: CMN_PLL NSS 1200MHz (Bias PLL cc) clock source
- description: CMN_PLL PPE 353MHz (Bias PLL ubi nc) clock source
- description: CMN_PLL NSS (Bias PLL cc) clock source. This clock rate
can vary for different IPQ SoCs. For example, it is 1200 MHz on the
IPQ9574 and 300 MHz on the IPQ5424.
- description: CMN_PLL PPE (Bias PLL ubi nc) clock source. The clock
rate can vary for different IPQ SoCs. For example, it is 353 MHz
on the IPQ9574 and 375 MHz on the IPQ5424.
- description: GCC GPLL0 OUT AUX clock source
- description: Uniphy0 NSS Rx clock source
- description: Uniphy0 NSS Tx clock source
@@ -42,8 +50,12 @@ properties:
clock-names:
items:
- const: xo
- const: nss_1200
- const: ppe_353
- enum:
- nss_1200
- nss
- enum:
- ppe_353
- ppe
- const: gpll0_out
- const: uniphy0_rx
- const: uniphy0_tx
@@ -60,6 +72,40 @@ required:
allOf:
- $ref: qcom,gcc.yaml#
- if:
properties:
compatible:
const: qcom,ipq9574-nsscc
then:
properties:
clock-names:
items:
- const: xo
- const: nss_1200
- const: ppe_353
- const: gpll0_out
- const: uniphy0_rx
- const: uniphy0_tx
- const: uniphy1_rx
- const: uniphy1_tx
- const: uniphy2_rx
- const: uniphy2_tx
- const: bus
else:
properties:
clock-names:
items:
- const: xo
- const: nss
- const: ppe
- const: gpll0_out
- const: uniphy0_rx
- const: uniphy0_tx
- const: uniphy1_rx
- const: uniphy1_tx
- const: uniphy2_rx
- const: uniphy2_tx
- const: bus
unevaluatedProperties: false
@@ -94,5 +140,6 @@ examples:
"bus";
#clock-cells = <1>;
#reset-cells = <1>;
#interconnect-cells = <1>;
};
...

View File

@@ -32,9 +32,36 @@ properties:
- description: PCIe 5 pipe clock
- description: PCIe 6a pipe clock
- description: PCIe 6b pipe clock
- description: USB QMP Phy 0 clock source
- description: USB QMP Phy 1 clock source
- description: USB QMP Phy 2 clock source
- description: USB4_0 QMPPHY clock source
- description: USB4_1 QMPPHY clock source
- description: USB4_2 QMPPHY clock source
- description: USB4_0 PHY DP0 GMUX clock source
- description: USB4_0 PHY DP1 GMUX clock source
- description: USB4_0 PHY PCIE PIPEGMUX clock source
- description: USB4_0 PHY PIPEGMUX clock source
- description: USB4_0 PHY SYS PCIE PIPEGMUX clock source
- description: USB4_1 PHY DP0 GMUX 2 clock source
- description: USB4_1 PHY DP1 GMUX 2 clock source
- description: USB4_1 PHY PCIE PIPEGMUX clock source
- description: USB4_1 PHY PIPEGMUX clock source
- description: USB4_1 PHY SYS PCIE PIPEGMUX clock source
- description: USB4_2 PHY DP0 GMUX 2 clock source
- description: USB4_2 PHY DP1 GMUX 2 clock source
- description: USB4_2 PHY PCIE PIPEGMUX clock source
- description: USB4_2 PHY PIPEGMUX clock source
- description: USB4_2 PHY SYS PCIE PIPEGMUX clock source
- description: USB4_0 PHY RX 0 clock source
- description: USB4_0 PHY RX 1 clock source
- description: USB4_1 PHY RX 0 clock source
- description: USB4_1 PHY RX 1 clock source
- description: USB4_2 PHY RX 0 clock source
- description: USB4_2 PHY RX 1 clock source
- description: USB4_0 PHY PCIE PIPE clock source
- description: USB4_0 PHY max PIPE clock source
- description: USB4_1 PHY PCIE PIPE clock source
- description: USB4_1 PHY max PIPE clock source
- description: USB4_2 PHY PCIE PIPE clock source
- description: USB4_2 PHY max PIPE clock source
power-domains:
description:
@@ -67,7 +94,34 @@ examples:
<&pcie6b_phy>,
<&usb_1_ss0_qmpphy 0>,
<&usb_1_ss1_qmpphy 1>,
<&usb_1_ss2_qmpphy 2>;
<&usb_1_ss2_qmpphy 2>,
<&usb4_0_phy_dp0_gmux_clk>,
<&usb4_0_phy_dp1_gmux_clk>,
<&usb4_0_phy_pcie_pipegmux_clk>,
<&usb4_0_phy_pipegmux_clk>,
<&usb4_0_phy_sys_pcie_pipegmux_clk>,
<&usb4_1_phy_dp0_gmux_2_clk>,
<&usb4_1_phy_dp1_gmux_2_clk>,
<&usb4_1_phy_pcie_pipegmux_clk>,
<&usb4_1_phy_pipegmux_clk>,
<&usb4_1_phy_sys_pcie_pipegmux_clk>,
<&usb4_2_phy_dp0_gmux_2_clk>,
<&usb4_2_phy_dp1_gmux_2_clk>,
<&usb4_2_phy_pcie_pipegmux_clk>,
<&usb4_2_phy_pipegmux_clk>,
<&usb4_2_phy_sys_pcie_pipegmux_clk>,
<&usb4_0_phy_rx_0_clk>,
<&usb4_0_phy_rx_1_clk>,
<&usb4_1_phy_rx_0_clk>,
<&usb4_1_phy_rx_1_clk>,
<&usb4_2_phy_rx_0_clk>,
<&usb4_2_phy_rx_1_clk>,
<&usb4_0_phy_pcie_pipe_clk>,
<&usb4_0_phy_max_pipe_clk>,
<&usb4_1_phy_pcie_pipe_clk>,
<&usb4_1_phy_max_pipe_clk>,
<&usb4_2_phy_pcie_pipe_clk>,
<&usb4_2_phy_max_pipe_clk>;
power-domains = <&rpmhpd RPMHPD_CX>;
#clock-cells = <1>;
#reset-cells = <1>;

View File

@@ -19,11 +19,14 @@ description: |
properties:
compatible:
enum:
- nvidia,tegra30-actmon
- nvidia,tegra114-actmon
- nvidia,tegra124-actmon
- nvidia,tegra210-actmon
oneOf:
- enum:
- nvidia,tegra30-actmon
- nvidia,tegra114-actmon
- nvidia,tegra124-actmon
- items:
- const: nvidia,tegra210-actmon
- const: nvidia,tegra124-actmon
reg:
maxItems: 1

View File

@@ -0,0 +1,68 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/display/tegra/nvidia,tegra114-tsec.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: NVIDIA Tegra Security co-processor
maintainers:
- Svyatoslav Ryhel <clamor95@gmail.com>
- Thierry Reding <thierry.reding@gmail.com>
description: Tegra Security co-processor, an embedded security processor used
mainly to manage the HDCP encryption and keys on the HDMI link.
properties:
compatible:
oneOf:
- enum:
- nvidia,tegra114-tsec
- nvidia,tegra124-tsec
- nvidia,tegra210-tsec
- items:
- const: nvidia,tegra132-tsec
- const: nvidia,tegra124-tsec
reg:
maxItems: 1
interrupts:
maxItems: 1
clocks:
maxItems: 1
resets:
maxItems: 1
iommus:
maxItems: 1
operating-points-v2: true
power-domains:
maxItems: 1
additionalProperties: false
required:
- compatible
- reg
- interrupts
- clocks
- resets
examples:
- |
#include <dt-bindings/clock/tegra114-car.h>
#include <dt-bindings/interrupt-controller/arm-gic.h>
tsec@54500000 {
compatible = "nvidia,tegra114-tsec";
reg = <0x54500000 0x00040000>;
interrupts = <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&tegra_car TEGRA114_CLK_TSEC>;
resets = <&tegra_car TEGRA114_CLK_TSEC>;
};

View File

@@ -0,0 +1,138 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/display/tegra/nvidia,tegra20-csi.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: NVIDIA Tegra20 CSI controller
maintainers:
- Svyatoslav Ryhel <clamor95@gmail.com>
properties:
compatible:
enum:
- nvidia,tegra20-csi
- nvidia,tegra30-csi
reg:
maxItems: 1
clocks:
minItems: 1
items:
- description: module clock
- description: PAD A clock
- description: PAD B clock
clock-names:
items:
- const: csi
- const: csia-pad
- const: csib-pad
avdd-dsi-csi-supply:
description: DSI/CSI power supply. Must supply 1.2 V.
power-domains:
maxItems: 1
"#nvidia,mipi-calibrate-cells":
description:
The number of cells in a MIPI calibration specifier. Should be 1.
The single cell specifies an id of the pad that need to be
calibrated for a given device. Valid pad ids for receiver would be
0 for CSI-A; 1 for CSI-B; 2 for DSI-A and 3 for DSI-B.
$ref: /schemas/types.yaml#/definitions/uint32
const: 1
"#address-cells":
const: 1
"#size-cells":
const: 0
patternProperties:
"^channel@[0-1]$":
type: object
description: channel 0 represents CSI-A and 1 represents CSI-B
additionalProperties: false
properties:
reg:
maximum: 1
nvidia,mipi-calibrate:
description: Should contain a phandle and a specifier specifying
which pad is used by this CSI channel and needs to be calibrated.
$ref: /schemas/types.yaml#/definitions/phandle-array
"#address-cells":
const: 1
"#size-cells":
const: 0
port@0:
$ref: /schemas/graph.yaml#/$defs/port-base
unevaluatedProperties: false
description: port receiving the video stream from the sensor
properties:
endpoint:
$ref: /schemas/media/video-interfaces.yaml#
unevaluatedProperties: false
required:
- data-lanes
port@1:
$ref: /schemas/graph.yaml#/properties/port
description: port sending the video stream to the VI
required:
- reg
- "#address-cells"
- "#size-cells"
- port@0
- port@1
allOf:
- if:
properties:
compatible:
contains:
enum:
- nvidia,tegra20-csi
then:
properties:
clocks:
maxItems: 1
clock-names: false
- if:
properties:
compatible:
contains:
enum:
- nvidia,tegra30-csi
then:
properties:
clocks:
minItems: 3
clock-names:
minItems: 3
additionalProperties: false
required:
- compatible
- reg
- clocks
- power-domains
- "#address-cells"
- "#size-cells"
# see nvidia,tegra20-vi.yaml for an example

View File

@@ -15,10 +15,16 @@ properties:
pattern: "^epp@[0-9a-f]+$"
compatible:
enum:
- nvidia,tegra20-epp
- nvidia,tegra30-epp
- nvidia,tegra114-epp
oneOf:
- enum:
- nvidia,tegra20-epp
- nvidia,tegra30-epp
- nvidia,tegra114-epp
- nvidia,tegra124-epp
- items:
- const: nvidia,tegra132-epp
- const: nvidia,tegra124-epp
reg:
maxItems: 1

View File

@@ -12,10 +12,17 @@ maintainers:
properties:
compatible:
enum:
- nvidia,tegra20-isp
- nvidia,tegra30-isp
- nvidia,tegra210-isp
oneOf:
- enum:
- nvidia,tegra20-isp
- nvidia,tegra30-isp
- nvidia,tegra114-isp
- nvidia,tegra124-isp
- nvidia,tegra210-isp
- items:
- const: nvidia,tegra132-isp
- const: nvidia,tegra124-isp
reg:
maxItems: 1

View File

@@ -12,13 +12,21 @@ maintainers:
properties:
$nodename:
pattern: "^mpe@[0-9a-f]+$"
oneOf:
- pattern: "^mpe@[0-9a-f]+$"
- pattern: "^msenc@[0-9a-f]+$"
compatible:
enum:
- nvidia,tegra20-mpe
- nvidia,tegra30-mpe
- nvidia,tegra114-mpe
oneOf:
- enum:
- nvidia,tegra20-mpe
- nvidia,tegra30-mpe
- nvidia,tegra114-msenc
- nvidia,tegra124-msenc
- items:
- const: nvidia,tegra132-msenc
- const: nvidia,tegra124-msenc
reg:
maxItems: 1

View File

@@ -29,7 +29,10 @@ properties:
- const: allwinner,sun8i-r40-dma
- const: allwinner,sun50i-a64-dma
- items:
- const: allwinner,sun50i-h616-dma
- enum:
- allwinner,sun50i-h616-dma
- allwinner,sun55i-a523-dma
- allwinner,sun55i-a523-mcu-dma
- const: allwinner,sun50i-a100-dma
reg:

View File

@@ -42,6 +42,9 @@ properties:
minItems: 1
maxItems: 8
iommus:
maxItems: 1
clocks:
items:
- description: Bus Clock

View File

@@ -0,0 +1,40 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/embedded-controller/traverse,ten64-controller.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Traverse Ten64 board microcontroller
maintainers:
- Mathew McBride <matt@traverse.com.au>
description: |
The board microcontroller on the Ten64 board family is responsible for
management of power sources on the board, as well as signalling the SoC
to power on and reset.
properties:
compatible:
const: traverse,ten64-controller
reg:
const: 0x7e
required:
- compatible
- reg
additionalProperties: false
examples:
- |
i2c {
#address-cells = <1>;
#size-cells = <0>;
board-controller@7e {
compatible = "traverse,ten64-controller";
reg = <0x7e>;
};
};

View File

@@ -24,6 +24,15 @@ properties:
compatible:
const: google,gs101-acpm-ipc
"#clock-cells":
const: 1
description:
Clocks that are variable and index based. These clocks don't provide
an entire range of values between the limits but only discrete points
within the range. The firmware also manages the voltage scaling
appropriately with the clock scaling. The argument is the ID of the
clock contained by the firmware messages.
mboxes:
maxItems: 1
@@ -45,6 +54,7 @@ properties:
required:
- compatible
- "#clock-cells"
- mboxes
- shmem
@@ -56,6 +66,7 @@ examples:
power-management {
compatible = "google,gs101-acpm-ipc";
#clock-cells = <1>;
mboxes = <&ap2apm_mailbox>;
shmem = <&apm_sram>;

View File

@@ -34,6 +34,7 @@ properties:
enum:
- intel,stratix10-svc
- intel,agilex-svc
- intel,agilex5-svc
method:
description: |
@@ -54,6 +55,9 @@ properties:
reserved memory region for the service layer driver to
communicate with the secure device manager.
iommus:
maxItems: 1
fpga-mgr:
$ref: /schemas/fpga/intel,stratix10-soc-fpga-mgr.yaml
description: Optional child node for fpga manager to perform fabric configuration.
@@ -63,6 +67,17 @@ required:
- method
- memory-region
allOf:
- if:
properties:
compatible:
contains:
enum:
- intel,agilex5-svc
then:
required:
- iommus
additionalProperties: false
examples:

View File

@@ -23,6 +23,7 @@ properties:
- enum:
- qcom,scm-apq8064
- qcom,scm-apq8084
- qcom,scm-glymur
- qcom,scm-ipq4019
- qcom,scm-ipq5018
- qcom,scm-ipq5332
@@ -31,6 +32,7 @@ properties:
- qcom,scm-ipq806x
- qcom,scm-ipq8074
- qcom,scm-ipq9574
- qcom,scm-kaanapali
- qcom,scm-mdm9607
- qcom,scm-milos
- qcom,scm-msm8226
@@ -202,6 +204,7 @@ allOf:
compatible:
contains:
enum:
- qcom,scm-kaanapali
- qcom,scm-milos
- qcom,scm-sm8450
- qcom,scm-sm8550

View File

@@ -20,12 +20,14 @@ properties:
- samsung,exynos5433-chipid
- samsung,exynos7-chipid
- samsung,exynos7870-chipid
- samsung,exynos8890-chipid
- const: samsung,exynos4210-chipid
- items:
- enum:
- samsung,exynos2200-chipid
- samsung,exynos7885-chipid
- samsung,exynos8895-chipid
- samsung,exynos9610-chipid
- samsung,exynos9810-chipid
- samsung,exynos990-chipid
- samsung,exynosautov9-chipid

View File

@@ -66,6 +66,7 @@ properties:
- spacemit,k1-plic
- starfive,jh7100-plic
- starfive,jh7110-plic
- tenstorrent,blackhole-plic
- const: sifive,plic-1.0.0
- items:
- enum:

View File

@@ -1,43 +0,0 @@
* TI OMAP SDHCI Controller
Refer to mmc.txt for standard MMC bindings.
For UHS devices which require tuning, the device tree should have a "cpu_thermal" node which maps to the appropriate thermal zone. This is used to get the temperature of the zone during tuning.
Required properties:
- compatible: Should be "ti,omap2430-sdhci" for omap2430 controllers
Should be "ti,omap3-sdhci" for omap3 controllers
Should be "ti,omap4-sdhci" for omap4 and ti81 controllers
Should be "ti,omap5-sdhci" for omap5 controllers
Should be "ti,dra7-sdhci" for DRA7 and DRA72 controllers
Should be "ti,k2g-sdhci" for K2G
Should be "ti,am335-sdhci" for am335x controllers
Should be "ti,am437-sdhci" for am437x controllers
- ti,hwmods: Must be "mmc<n>", <n> is controller instance starting 1
(Not required for K2G).
- pinctrl-names: Should be subset of "default", "hs", "sdr12", "sdr25", "sdr50",
"ddr50-rev11", "sdr104-rev11", "ddr50", "sdr104",
"ddr_1_8v-rev11", "ddr_1_8v" or "ddr_3_3v", "hs200_1_8v-rev11",
"hs200_1_8v",
- pinctrl-<n> : Pinctrl states as described in bindings/pinctrl/pinctrl-bindings.txt
Optional properties:
- dmas: List of DMA specifiers with the controller specific format as described
in the generic DMA client binding. A tx and rx specifier is required.
- dma-names: List of DMA request names. These strings correspond 1:1 with the
DMA specifiers listed in dmas. The string naming is to be "tx"
and "rx" for TX and RX DMA requests, respectively.
Deprecated properties:
- ti,non-removable: Compatible with the generic non-removable property
Example:
mmc1: mmc@4809c000 {
compatible = "ti,dra7-sdhci";
reg = <0x4809c000 0x400>;
ti,hwmods = "mmc1";
bus-width = <4>;
vmmc-supply = <&vmmc>; /* phandle to regulator node */
dmas = <&sdma 61 &sdma 62>;
dma-names = "tx", "rx";
};

View File

@@ -0,0 +1,169 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/mmc/ti,omap2430-sdhci.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: TI OMAP SDHCI Controller
maintainers:
- Kishon Vijay Abraham I <kishon@ti.com>
description:
For UHS devices which require tuning, the device tree should have a
cpu_thermal node which maps to the appropriate thermal zone. This
is used to get the temperature of the zone during tuning.
properties:
compatible:
enum:
- ti,omap2430-sdhci
- ti,omap3-sdhci
- ti,omap4-sdhci
- ti,omap5-sdhci
- ti,dra7-sdhci
- ti,k2g-sdhci
- ti,am335-sdhci
- ti,am437-sdhci
reg:
maxItems: 1
interrupts:
maxItems: 1
clocks:
maxItems: 2
clock-names:
items:
- const: fck
- const: mmchsdb_fck
dmas:
maxItems: 2
dma-names:
items:
- const: tx
- const: rx
pinctrl-names:
minItems: 1
maxItems: 14
items:
enum:
- default
- default-rev11
- hs
- sdr12
- sdr12-rev11
- sdr25
- sdr25-rev11
- sdr50
- ddr50-rev11
- sdr104-rev11
- ddr50
- sdr104
- ddr_1_8v-rev11
- ddr_1_8v
- ddr_3_3v
- hs-rev11
- hs200_1_8v-rev11
- hs200_1_8v
- sleep
pinctrl-0:
maxItems: 1
pinctrl-1:
maxItems: 1
pinctrl-2:
maxItems: 1
pinctrl-3:
maxItems: 1
pinctrl-4:
maxItems: 1
pinctrl-5:
maxItems: 1
pinctrl-6:
maxItems: 1
pinctrl-7:
maxItems: 1
pinctrl-8:
maxItems: 1
power-domains:
maxItems: 1
pbias-supply:
description:
It is used to specify the voltage regulator that provides the bias
voltage for certain analog or I/O pads.
ti,non-removable:
description:
It indicates that a component is not meant to be easily removed or
replaced by the user, such as an embedded battery or a non-removable
storage slot like eMMC.
type: boolean
deprecated: true
clock-frequency:
$ref: /schemas/types.yaml#/definitions/uint32
description:
It represents the speed at which a clock signal associated with a device
or bus operates, measured in Hertz (Hz). This value is crucial for configuring
hardware components that require a specific clock speed.
required:
- compatible
- reg
- interrupts
allOf:
- $ref: sdhci-common.yaml#
- if:
properties:
compatible:
contains:
enum:
- ti,dra7-sdhci
- ti,k2g-sdhci
then:
required:
- max-frequency
- if:
properties:
compatible:
contains:
const: ti,k2g-sdhci
then:
required:
- clocks
- clock-names
unevaluatedProperties: false
examples:
- |
#include <dt-bindings/interrupt-controller/arm-gic.h>
mmc@4809c000 {
compatible = "ti,dra7-sdhci";
reg = <0x4809c000 0x400>;
interrupts = <GIC_SPI 64 IRQ_TYPE_LEVEL_HIGH>;
max-frequency = <192000000>;
sdhci-caps-mask = <0x0 0x400000>;
bus-width = <4>;
vmmc-supply = <&vmmc>; /* phandle to regulator node */
dmas = <&sdma 61>, <&sdma 62>;
dma-names = "tx", "rx";
};
...

View File

@@ -40,6 +40,9 @@ properties:
dmas:
maxItems: 1
iommus:
maxItems: 1
cdns,board-delay-ps:
description: |
Estimated Board delay. The value includes the total round trip

View File

@@ -0,0 +1,42 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/reset/eswin,eic7700-reset.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ESWIN EIC7700 SoC reset controller
maintainers:
- Yifeng Huang <huangyifeng@eswincomputing.com>
- Xuyang Dong <dongxuyang@eswincomputing.com>
description:
The system reset controller can be used to reset various peripheral
controllers in ESWIN eic7700 SoC.
properties:
compatible:
const: eswin,eic7700-reset
reg:
maxItems: 1
'#reset-cells':
const: 1
required:
- compatible
- reg
- '#reset-cells'
additionalProperties: false
examples:
- |
#include <dt-bindings/reset/eswin,eic7700-reset.h>
reset-controller@51828300 {
compatible = "eswin,eic7700-reset";
reg = <0x51828300 0x200>;
#reset-cells = <1>;
};

View File

@@ -20,9 +20,14 @@ properties:
pattern: "^reset-controller@[0-9a-f]+$"
compatible:
enum:
- microchip,sparx5-switch-reset
- microchip,lan966x-switch-reset
oneOf:
- enum:
- microchip,sparx5-switch-reset
- microchip,lan966x-switch-reset
- items:
- enum:
- microchip,lan9691-switch-reset
- const: microchip,lan966x-switch-reset
reg:
items:

View File

@@ -15,12 +15,14 @@ description:
properties:
compatible:
items:
- enum:
- renesas,r9a07g043-usbphy-ctrl # RZ/G2UL and RZ/Five
- renesas,r9a07g044-usbphy-ctrl # RZ/G2{L,LC}
- renesas,r9a07g054-usbphy-ctrl # RZ/V2L
- const: renesas,rzg2l-usbphy-ctrl
oneOf:
- items:
- enum:
- renesas,r9a07g043-usbphy-ctrl # RZ/G2UL and RZ/Five
- renesas,r9a07g044-usbphy-ctrl # RZ/G2{L,LC}
- renesas,r9a07g054-usbphy-ctrl # RZ/V2L
- const: renesas,rzg2l-usbphy-ctrl
- const: renesas,r9a08g045-usbphy-ctrl # RZ/G3S
reg:
maxItems: 1
@@ -48,6 +50,20 @@ properties:
$ref: /schemas/regulator/regulator.yaml#
unevaluatedProperties: false
renesas,sysc-pwrrdy:
description:
The system controller PWRRDY indicates to the USB PHY if the power supply
is ready. PWRRDY needs to be set during power-on before applying any
other settings. It also needs to be set before powering off the USB.
$ref: /schemas/types.yaml#/definitions/phandle-array
items:
- items:
- description:
System controller phandle required by USB PHY CTRL driver to set
PWRRDY
- description: Register offset associated with PWRRDY
- description: Register bitmask associated with PWRRDY
required:
- compatible
- reg
@@ -57,6 +73,19 @@ required:
- '#reset-cells'
- regulator-vbus
allOf:
- if:
properties:
compatible:
contains:
const: renesas,r9a08g045-usbphy-ctrl
then:
required:
- renesas,sysc-pwrrdy
else:
properties:
renesas,sysc-pwrrdy: false
additionalProperties: false
examples:

View File

@@ -16,7 +16,13 @@ maintainers:
properties:
compatible:
enum:
- thead,th1520-reset
- thead,th1520-reset # Reset controller for VO subsystem
- thead,th1520-reset-ao
- thead,th1520-reset-ap
- thead,th1520-reset-dsp
- thead,th1520-reset-misc
- thead,th1520-reset-vi
- thead,th1520-reset-vp
reg:
maxItems: 1

View File

@@ -0,0 +1,27 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/riscv/anlogic.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Anlogic SoC-based boards
maintainers:
- Junhui Liu <junhui.liu@pigmoral.tech>
description:
Anlogic SoC-based boards
properties:
$nodename:
const: '/'
compatible:
oneOf:
- items:
- enum:
- milianke,mlkpai-fs01
- const: anlogic,dr1v90
additionalProperties: true
...

View File

@@ -48,6 +48,7 @@ properties:
- amd,mbv64
- andestech,ax45mp
- canaan,k210
- nuclei,ux900
- sifive,bullet0
- sifive,e5
- sifive,e7
@@ -70,6 +71,7 @@ properties:
- enum:
- sifive,e51
- sifive,u54-mc
- sifive,x280
- const: sifive,rocket0
- const: riscv
- const: riscv # Simulator only

View File

@@ -217,6 +217,12 @@ properties:
memory types as ratified in the 20191213 version of the privileged
ISA specification.
- const: svrsw60t59b
description:
The Svrsw60t59b extension for providing two more bits[60:59] to
PTE/PMD entry as ratified at commit 28bde925e7a7 ("PTE Reserved
for SW bits 60:59") of riscv-non-isa/riscv-iommu.
- const: svvptc
description:
The standard Svvptc supervisor-level extension for
@@ -242,6 +248,11 @@ properties:
is supported as ratified at commit 5059e0ca641c ("update to
ratified") of the riscv-zacas.
- const: zalasr
description: |
The standard Zalasr extension for load-acquire/store-release as frozen
at commit 194f0094 ("Version 0.9 for freeze") of riscv-zalasr.
- const: zalrsc
description: |
The standard Zalrsc extension for load-reserved/store-conditional as

View File

@@ -22,6 +22,8 @@ properties:
- enum:
- bananapi,bpi-f3
- milkv,jupiter
- spacemit,musepi-pro
- xunlong,orangepi-r2s
- xunlong,orangepi-rv2
- const: spacemit,k1

View File

@@ -33,8 +33,15 @@ properties:
- pine64,star64
- starfive,visionfive-2-v1.2a
- starfive,visionfive-2-v1.3b
- xunlong,orangepi-rv
- const: starfive,jh7110
- items:
- enum:
- starfive,visionfive-2-lite
- starfive,visionfive-2-lite-emmc
- const: starfive,jh7110s
additionalProperties: true
...

View File

@@ -0,0 +1,28 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/riscv/tenstorrent.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Tenstorrent SoC-based boards
maintainers:
- Drew Fustini <dfustini@oss.tenstorrent.com>
- Joel Stanley <jms@oss.tenstorrent.com>
description:
Tenstorrent SoC-based boards
properties:
$nodename:
const: '/'
compatible:
oneOf:
- description: Tenstorrent Blackhole PCIe card
items:
- const: tenstorrent,blackhole-card
- const: tenstorrent,blackhole
additionalProperties: true
...

View File

@@ -51,6 +51,7 @@ properties:
- const: renesas,rzn1-uart
- items:
- enum:
- anlogic,dr1v90-uart
- brcm,bcm11351-dw-apb-uart
- brcm,bcm21664-dw-apb-uart
- rockchip,px30-uart

View File

@@ -51,6 +51,22 @@ properties:
type: object
$ref: /schemas/mux/reg-mux.yaml
patternProperties:
"^ipu[12]_csi[01]_mux$":
type: object
$ref: /schemas/media/video-mux.yaml
allOf:
- if:
properties:
compatible:
not:
contains:
const: fsl,imx6q-iomuxc-gpr
then:
patternProperties:
'^ipu[12]_csi[01]_mux$': false
additionalProperties: false
required:

View File

@@ -52,6 +52,7 @@ properties:
- items:
- enum:
- mediatek,mt8188-pwrap
- mediatek,mt8189-pwrap
- const: mediatek,mt8195-pwrap
- const: syscon

View File

@@ -0,0 +1,47 @@
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
%YAML 1.2
---
$id: http://devicetree.org/schemas/soc/microchip/microchip,mpfs-mss-top-sysreg.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Microchip PolarFire SoC Microprocessor Subsystem (MSS) sysreg register region
maintainers:
- Conor Dooley <conor.dooley@microchip.com>
description:
An wide assortment of registers that control elements of the MSS on PolarFire
SoC, including pinmuxing, resets and clocks among others.
properties:
compatible:
items:
- const: microchip,mpfs-mss-top-sysreg
- const: syscon
reg:
maxItems: 1
'#reset-cells':
description:
The AHB/AXI peripherals on the PolarFire SoC have reset support, so
from CLK_ENVM to CLK_CFM. The reset consumer should specify the
desired peripheral via the clock ID in its "resets" phandle cell.
See include/dt-bindings/clock/microchip,mpfs-clock.h for the full list
of PolarFire clock/reset IDs.
const: 1
required:
- compatible
- reg
additionalProperties: false
examples:
- |
syscon@20002000 {
compatible = "microchip,mpfs-mss-top-sysreg", "syscon";
reg = <0x20002000 0x1000>;
#reset-cells = <1>;
};

View File

@@ -25,6 +25,8 @@ properties:
compatible:
items:
- enum:
- qcom,glymur-aoss-qmp
- qcom,kaanapali-aoss-qmp
- qcom,milos-aoss-qmp
- qcom,qcs615-aoss-qmp
- qcom,qcs8300-aoss-qmp

View File

@@ -55,6 +55,7 @@ properties:
- samsung,exynos2200-pmu
- samsung,exynos7870-pmu
- samsung,exynos7885-pmu
- samsung,exynos8890-pmu
- samsung,exynos8895-pmu
- samsung,exynos9810-pmu
- samsung,exynos990-pmu
@@ -172,6 +173,7 @@ allOf:
- samsung,exynos5250-pmu
- samsung,exynos5420-pmu
- samsung,exynos5433-pmu
- samsung,exynos7870-pmu
then:
properties:
mipi-phy: true

View File

@@ -15,7 +15,9 @@ properties:
- items:
- enum:
- google,gs101-apm-sysreg
- google,gs101-hsi0-sysreg
- google,gs101-hsi2-sysreg
- google,gs101-misc-sysreg
- google,gs101-peric0-sysreg
- google,gs101-peric1-sysreg
- samsung,exynos2200-cmgp-sysreg
@@ -26,10 +28,14 @@ properties:
- samsung,exynos3-sysreg
- samsung,exynos4-sysreg
- samsung,exynos5-sysreg
- samsung,exynos7870-cam0-sysreg
- samsung,exynos7870-disp-sysreg
- samsung,exynos8895-fsys0-sysreg
- samsung,exynos8895-fsys1-sysreg
- samsung,exynos8895-peric0-sysreg
- samsung,exynos8895-peric1-sysreg
- samsung,exynos990-peric0-sysreg
- samsung,exynos990-peric1-sysreg
- samsung,exynosautov920-hsi2-sysreg
- samsung,exynosautov920-peric0-sysreg
- samsung,exynosautov920-peric1-sysreg
@@ -73,6 +79,9 @@ properties:
clocks:
maxItems: 1
power-domains:
maxItems: 1
required:
- compatible
- reg
@@ -83,7 +92,9 @@ allOf:
compatible:
contains:
enum:
- google,gs101-hsi0-sysreg
- google,gs101-hsi2-sysreg
- google,gs101-misc-sysreg
- google,gs101-peric0-sysreg
- google,gs101-peric1-sysreg
- samsung,exynos850-cmgp-sysreg
@@ -93,6 +104,8 @@ allOf:
- samsung,exynos8895-fsys1-sysreg
- samsung,exynos8895-peric0-sysreg
- samsung,exynos8895-peric1-sysreg
- samsung,exynos990-peric0-sysreg
- samsung,exynos990-peric1-sysreg
then:
required:
- clocks
@@ -100,6 +113,16 @@ allOf:
properties:
clocks: false
- if:
properties:
compatible:
not:
contains:
pattern: "^google,gs101-[^-]+-sysreg$"
then:
properties:
power-domains: false
additionalProperties: false
examples:

View File

@@ -0,0 +1,80 @@
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/soc/sophgo/sophgo,cv1800b-top-syscon.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Sophgo CV18XX/SG200X SoC top system controller
maintainers:
- Inochi Amaoto <inochiama@outlook.com>
description:
The Sophgo CV18XX/SG200X SoC top misc system controller provides
register access to configure related modules.
properties:
compatible:
oneOf:
- items:
- const: sophgo,cv1800b-top-syscon
- const: syscon
- const: simple-mfd
reg:
maxItems: 1
"#address-cells":
const: 1
"#size-cells":
const: 1
ranges: true
dma-router@154:
$ref: /schemas/dma/sophgo,cv1800b-dmamux.yaml#
unevaluatedProperties: false
phy@48:
$ref: /schemas/phy/sophgo,cv1800b-usb2-phy.yaml#
unevaluatedProperties: false
required:
- compatible
- reg
- "#address-cells"
- "#size-cells"
additionalProperties: false
examples:
- |
#include <dt-bindings/clock/sophgo,cv1800.h>
syscon@3000000 {
compatible = "sophgo,cv1800b-top-syscon", "syscon", "simple-mfd";
reg = <0x03000000 0x1000>;
#address-cells = <1>;
#size-cells = <1>;
phy@48 {
compatible = "sophgo,cv1800b-usb2-phy";
reg = <0x48 0x4>;
#phy-cells = <0>;
clocks = <&clk CLK_USB_125M>,
<&clk CLK_USB_33K>,
<&clk CLK_USB_12M>;
clock-names = "app", "stb", "lpm";
resets = <&rst 58>;
};
dma-router@154 {
compatible = "sophgo,cv1800b-dmamux";
reg = <0x154 0x8>, <0x298 0x4>;
#dma-cells = <2>;
dma-masters = <&dmac>;
};
};
...

View File

@@ -36,6 +36,7 @@ properties:
- starfive,jh7100-clint # StarFive JH7100
- starfive,jh7110-clint # StarFive JH7110
- starfive,jh8100-clint # StarFive JH8100
- tenstorrent,blackhole-clint # Tenstorrent Blackhole
- const: sifive,clint0 # SiFive CLINT v0 IP block
- items:
- {}

View File

@@ -4,18 +4,23 @@
$id: http://devicetree.org/schemas/timer/thead,c900-aclint-mtimer.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Sophgo CLINT Timer
title: ACLINT Machine-level Timer Device
maintainers:
- Inochi Amaoto <inochiama@outlook.com>
properties:
compatible:
items:
- enum:
- sophgo,sg2042-aclint-mtimer
- sophgo,sg2044-aclint-mtimer
- const: thead,c900-aclint-mtimer
oneOf:
- items:
- enum:
- sophgo,sg2042-aclint-mtimer
- sophgo,sg2044-aclint-mtimer
- const: thead,c900-aclint-mtimer
- items:
- enum:
- anlogic,dr1v90-aclint-mtimer
- const: nuclei,ux900-aclint-mtimer
reg:
items:

View File

@@ -32,9 +32,35 @@ properties:
- const: bar2
interrupts:
minItems: 2
items:
- description: xHCI host interrupt
- description: mailbox interrupt
- description: USB wake event 0
- description: USB wake event 1
- description: USB wake event 2
- description: USB wake event 3
- description: USB wake event 4
- description: USB wake event 5
- description: USB wake event 6
description: |
The first two interrupts are required for the USB host controller. The
remaining USB wake event interrupts are optional. Each USB wake event is
independent; it is not necessary to use all of these events on a
platform. The USB host controller can function even if no wake-up events
are defined. The USB wake event interrupts are handled by the Tegra PMC;
hence, the interrupt controller for these is the PMC and the interrupt
IDs correspond to the PMC wake event IDs. A complete list of wake event
IDs is provided below, and this information is also present in the Tegra
TRM document.
PMC wake-up 76 for USB3 port 0 wakeup
PMC wake-up 77 for USB3 port 1 wakeup
PMC wake-up 78 for USB3 port 2 and port 3 wakeup
PMC wake-up 79 for USB2 port 0 wakeup
PMC wake-up 80 for USB2 port 1 wakeup
PMC wake-up 81 for USB2 port 2 wakeup
PMC wake-up 82 for USB2 port 3 wakeup
clocks:
items:
@@ -127,8 +153,9 @@ examples:
<0x03650000 0x10000>;
reg-names = "hcd", "fpci", "bar2";
interrupts = <GIC_SPI 163 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 164 IRQ_TYPE_LEVEL_HIGH>;
interrupts-extended = <&gic GIC_SPI 163 IRQ_TYPE_LEVEL_HIGH>,
<&gic GIC_SPI 164 IRQ_TYPE_LEVEL_HIGH>,
<&pmc 76 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&bpmp TEGRA234_CLK_XUSB_CORE_HOST>,
<&bpmp TEGRA234_CLK_XUSB_FALCON>,

View File

@@ -30,6 +30,8 @@ patternProperties:
description: 70mai Co., Ltd.
"^8dev,.*":
description: 8devices, UAB
"^9tripod,.*":
description: Shenzhen 9Tripod Innovation and Development CO., LTD.
"^abb,.*":
description: ABB
"^abilis,.*":
@@ -132,6 +134,8 @@ patternProperties:
description: Anbernic
"^andestech,.*":
description: Andes Technology Corporation
"^anlogic,.*":
description: Shanghai Anlogic Infotech Co., Ltd.
"^anvo,.*":
description: Anvo-Systems Dresden GmbH
"^aoly,.*":
@@ -253,6 +257,8 @@ patternProperties:
description: Shanghai Broadmobi Communication Technology Co.,Ltd.
"^bsh,.*":
description: BSH Hausgeraete GmbH
"^bst,.*":
description: Black Sesame Technologies Co., Ltd.
"^bticino,.*":
description: Bticino International
"^buffalo,.*":
@@ -913,6 +919,8 @@ patternProperties:
description: Lincoln Technology Solutions
"^lineartechnology,.*":
description: Linear Technology
"^linkease,.*":
description: Shenzhen LinkEase Network Technology Co., Ltd.
"^linksprite,.*":
description: LinkSprite Technologies, Inc.
"^linksys,.*":
@@ -1029,6 +1037,8 @@ patternProperties:
description: MikroElektronika d.o.o.
"^mikrotik,.*":
description: MikroTik
"^milianke,.*":
description: Changzhou Milianke Electronic Technology Co., Ltd
"^milkv,.*":
description: MilkV Technology Co., Ltd
"^miniand,.*":
@@ -1146,6 +1156,8 @@ patternProperties:
description: Novatek
"^novtech,.*":
description: NovTech, Inc.
"^nuclei,.*":
description: Nuclei System Technology
"^numonyx,.*":
description: Numonyx (deprecated, use micron)
deprecated: true
@@ -1618,6 +1630,8 @@ patternProperties:
description: Tempo Semiconductor
"^tenda,.*":
description: Shenzhen Tenda Technology Co., Ltd.
"^tenstorrent,.*":
description: Tenstorrent AI ULC
"^terasic,.*":
description: Terasic Inc.
"^tesla,.*":

View File

@@ -218,4 +218,3 @@ devm_reset_controller_register().
reset_controller_register
reset_controller_unregister
devm_reset_controller_register
reset_controller_add_lookup

View File

@@ -24,7 +24,7 @@
| s390: | ok |
| sh: | ok |
| sparc: | ok |
| um: | TODO |
| um: | ok |
| x86: | ok |
| xtensa: | ok |
-----------------------

View File

@@ -1286,6 +1286,11 @@ The vm_area_desc provides the minimum required information for a filesystem
to initialise state upon memory mapping of a file-backed region, and output
parameters for the file system to set this state.
In nearly all cases, this is all that is required for a filesystem. However, if
a filesystem needs to perform an operation such a pre-population of page tables,
then that action can be specified in the vm_area_desc->action field, which can
be configured using the mmap_action_*() helpers.
---
**mandatory**
@@ -1322,3 +1327,10 @@ When vfs_mkdir() returns an error, and so both dputs() the original
dentry and doesn't provide a replacement, it also unlocks the parent.
Consequently the return value from vfs_mkdir() can be passed to
end_creating() and the parent will be unlocked precisely when necessary.
---
**mandatory**
kill_litter_super() is gone; convert to DCACHE_PERSISTENT use (as all
in-tree filesystems have done).

View File

@@ -553,7 +553,7 @@ otherwise.
kernel flags associated with the particular virtual memory area in two letter
encoded manner. The codes are the following:
== =======================================
== =============================================================
rd readable
wr writeable
ex executable
@@ -591,7 +591,8 @@ encoded manner. The codes are the following:
sl sealed
lf lock on fault pages
dp always lazily freeable mapping
== =======================================
gu maybe contains guard regions (if not set, definitely doesn't)
== =============================================================
Note that there is no guarantee that every flag and associated mnemonic will
be present in all further kernel releases. Things get changed, the flags may

View File

@@ -1213,6 +1213,10 @@ otherwise noted.
file-backed memory mapping, most notably establishing relevant
private state and VMA callbacks.
If further action such as pre-population of page tables is required,
this can be specified by the vm_area_desc->action field and related
parameters.
Note that the file operations are implemented by the specific
filesystem in which the inode resides. When opening a device node
(character or block special) most filesystems will call special

View File

@@ -381,8 +381,8 @@ That is, assumes 4% (20% of 20%) DAMON-observed access events ratio (source)
to capture 64% (80% multipled by 80%) real access events (outcomes).
To know how user-space can use this feature via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`intervals_goal <sysfs_scheme>` part of
the documentation.
<sysfs_interface>`, refer to :ref:`intervals_goal
<damon_usage_sysfs_monitoring_intervals_goal>` part of the documentation.
.. _damon_design_damos:
@@ -564,9 +564,9 @@ aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
is under achieving the goal, DAMOS automatically increases the quota. If DAMOS
is over achieving the goal, it decreases the quota.
The goal can be specified with four parameters, namely ``target_metric``,
``target_value``, ``current_value`` and ``nid``. The auto-tuning mechanism
tries to make ``current_value`` of ``target_metric`` be same to
The goal can be specified with five parameters, namely ``target_metric``,
``target_value``, ``current_value``, ``nid`` and ``path``. The auto-tuning
mechanism tries to make ``current_value`` of ``target_metric`` be same to
``target_value``.
- ``user_input``: User-provided value. Users could use any metric that they
@@ -581,9 +581,18 @@ tries to make ``current_value`` of ``target_metric`` be same to
set by users at the initial time. In other words, DAMOS does self-feedback.
- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).
- ``node_memcg_used_bp``: Specific cgroup's node used memory ratio for a
specific NUMA node, in bp (1/10,000).
- ``node_memcg_free_bp``: Specific cgroup's node unused memory ratio for a
specific NUMA node, in bp (1/10,000).
``nid`` is optionally required for only ``node_mem_used_bp`` and
``node_mem_free_bp`` to point the specific NUMA node.
``nid`` is optionally required for only ``node_mem_used_bp``,
``node_mem_free_bp``, ``node_memcg_used_bp`` and ``node_memcg_free_bp`` to
point the specific NUMA node.
``path`` is optionally required for only ``node_memcg_used_bp`` and
``node_memcg_free_bp`` to point the path to the cgroup. The value should be
the path of the memory cgroup from the cgroups mount point.
To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to

View File

@@ -27,8 +27,8 @@ maintainer.
Note again the patches for `mm-new tree
<https://git.kernel.org/akpm/mm/h/mm-new>`_ are queued by the memory management
subsystem maintainer. If the patches requires some patches in `damon/next tree
<https://git.kernel.org/sj/h/damon/next>`_ which not yet merged in mm-new,
subsystem maintainer. If the patches require some patches in `damon/next tree
<https://git.kernel.org/sj/h/damon/next>`_ which have not yet merged in mm-new,
please make sure the requirement is clearly specified.
Submit checklist addendum
@@ -57,7 +57,7 @@ Key cycle dates
Patches can be sent anytime. Key cycle dates of the `mm-new
<https://git.kernel.org/akpm/mm/h/mm-new>`_, `mm-unstable
<https://git.kernel.org/akpm/mm/h/mm-unstable>`_and `mm-stable
<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ and `mm-stable
<https://git.kernel.org/akpm/mm/h/mm-stable>`_ trees depend on the memory
management subsystem maintainer.
@@ -99,5 +99,5 @@ Schedules and reservation status are available at the Google `doc
<https://docs.google.com/document/d/1v43Kcj3ly4CYqmAkMaZzLiM2GEnWfgdGbZAH3mi2vpM/edit?usp=sharing>`_.
There is also a public Google `calendar
<https://calendar.google.com/calendar/u/0?cid=ZDIwOTA4YTMxNjc2MDQ3NTIyMmUzYTM5ZmQyM2U4NDA0ZGIwZjBiYmJlZGQxNDM0MmY4ZTRjOTE0NjdhZDRiY0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t>`_
that has the events. Anyone can subscribe it. DAMON maintainer will also
provide periodic reminder to the mailing list (damon@lists.linux.dev).
that has the events. Anyone can subscribe to it. DAMON maintainer will also
provide periodic reminders to the mailing list (damon@lists.linux.dev).

View File

@@ -165,7 +165,7 @@ The users of `ZONE_DEVICE` are:
* pmem: Map platform persistent memory to be used as a direct-I/O target
via DAX mappings.
* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->folio_free()`
event callbacks to allow a device-driver to coordinate memory management
events related to device-memory, typically GPU memory. See
Documentation/mm/hmm.rst.

View File

@@ -27,7 +27,10 @@ enabled. Other usages are more than welcome.
It can also be used to show all the stacks and their current number of
allocated base pages, which gives us a quick overview of where the memory
is going without the need to screen through all the pages and match the
allocation and free operation.
allocation and free operation. It's also possible to show only a numeric
identifier of all the stacks (without stack traces) and their number of
allocated base pages (faster to read and parse, eg, for monitoring) that
can be matched with stacks later (show_handles and show_stacks_handles).
page owner is disabled by default. So, if you'd like to use it, you need
to add "page_owner=on" to your boot cmdline. If the kernel is built
@@ -116,6 +119,33 @@ Usage
nr_base_pages: 20824
...
cat /sys/kernel/debug/page_owner_stacks/show_handles > handles_7000.txt
cat handles_7000.txt
handle: 42
nr_base_pages: 20824
...
cat /sys/kernel/debug/page_owner_stacks/show_stacks_handles > stacks_handles.txt
cat stacks_handles.txt
post_alloc_hook+0x177/0x1a0
get_page_from_freelist+0xd01/0xd80
__alloc_pages+0x39e/0x7e0
alloc_pages_mpol+0x22e/0x490
folio_alloc+0xd5/0x110
filemap_alloc_folio+0x78/0x230
page_cache_ra_order+0x287/0x6f0
filemap_get_pages+0x517/0x1160
filemap_read+0x304/0x9f0
xfs_file_buffered_read+0xe6/0x1d0 [xfs]
xfs_file_read_iter+0x1f0/0x380 [xfs]
__kernel_read+0x3b9/0x730
kernel_read_file+0x309/0x4d0
__do_sys_finit_module+0x381/0x730
do_syscall_64+0x8d/0x150
entry_SYSCALL_64_after_hwframe+0x62/0x6a
handle: 42
...
cat /sys/kernel/debug/page_owner > page_owner_full.txt
./page_owner_sort page_owner_full.txt sorted_page_owner.txt

View File

@@ -48,7 +48,8 @@ Terminology
* **VMA locks** - The VMA lock is at VMA granularity (of course) which behaves
as a read/write semaphore in practice. A VMA read lock is obtained via
:c:func:`!lock_vma_under_rcu` (and unlocked via :c:func:`!vma_end_read`) and a
write lock via :c:func:`!vma_start_write` (all VMA write locks are unlocked
write lock via vma_start_write() or vma_start_write_killable()
(all VMA write locks are unlocked
automatically when the mmap write lock is released). To take a VMA write lock
you **must** have already acquired an :c:func:`!mmap_write_lock`.
* **rmap locks** - When trying to access VMAs through the reverse mapping via a
@@ -907,3 +908,9 @@ Stack expansion
Stack expansion throws up additional complexities in that we cannot permit there
to be racing page faults, as a result we invoke :c:func:`!vma_start_write` to
prevent this in :c:func:`!expand_downwards` or :c:func:`!expand_upwards`.
------------------------
Functions and structures
------------------------
.. kernel-doc:: include/linux/mmap_lock.h

View File

@@ -7286,6 +7286,41 @@ exit, even without calls to ``KVM_ENABLE_CAP`` or similar. In this case,
it will enter with output fields already valid; in the common case, the
``unknown.ret`` field of the union will be ``TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED``.
Userspace need not do anything if it does not wish to support a TDVMCALL.
::
/* KVM_EXIT_ARM_SEA */
struct {
#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID (1ULL << 0)
__u64 flags;
__u64 esr;
__u64 gva;
__u64 gpa;
} arm_sea;
Used on arm64 systems. When the VM capability ``KVM_CAP_ARM_SEA_TO_USER`` is
enabled, a KVM exits to userspace if a guest access causes a synchronous
external abort (SEA) and the host APEI fails to handle the SEA.
``esr`` is set to a sanitized value of ESR_EL2 from the exception taken to KVM,
consisting of the following fields:
- ``ESR_EL2.EC``
- ``ESR_EL2.IL``
- ``ESR_EL2.FnV``
- ``ESR_EL2.EA``
- ``ESR_EL2.CM``
- ``ESR_EL2.WNR``
- ``ESR_EL2.FSC``
- ``ESR_EL2.SET`` (when FEAT_RAS is implemented for the VM)
``gva`` is set to the value of FAR_EL2 from the exception taken to KVM when
``ESR_EL2.FnV == 0``. Otherwise, the value of ``gva`` is unknown.
``gpa`` is set to the faulting IPA from the exception taken to KVM when
the ``KVM_EXIT_ARM_SEA_FLAG_GPA_VALID`` flag is set. Otherwise, the value of
``gpa`` is unknown.
::
/* Fix the size of the union. */
@@ -7820,7 +7855,7 @@ where 0xff represents CPUs 0-7 in cluster 0.
:Architectures: s390
:Parameters: none
With this capability enabled, all illegal instructions 0x0000 (2 bytes) will
With this capability enabled, the illegal instruction 0x0000 (2 bytes) will
be intercepted and forwarded to user space. User space can use this
mechanism e.g. to realize 2-byte software breakpoints. The kernel will
not inject an operating exception for these instructions, user space has
@@ -8028,7 +8063,7 @@ will be initialized to 1 when created. This also improves performance because
dirty logging can be enabled gradually in small chunks on the first call
to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on
KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
x86 and arm64 for now).
x86, arm64 and riscv for now).
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
@@ -8524,7 +8559,7 @@ Therefore, the ioctl must be called *before* reading the content of
the dirty pages.
The dirty ring can get full. When it happens, the KVM_RUN of the
vcpu will return with exit reason KVM_EXIT_DIRTY_LOG_FULL.
vcpu will return with exit reason KVM_EXIT_DIRTY_RING_FULL.
The dirty ring interface has a major difference comparing to the
KVM_GET_DIRTY_LOG interface in that, when reading the dirty ring from
@@ -8692,7 +8727,7 @@ given VM.
When this capability is enabled, KVM resets the VCPU when setting
MP_STATE_INIT_RECEIVED through IOCTL. The original MP_STATE is preserved.
7.43 KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED
7.44 KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED
-------------------------------------------
:Architectures: arm64
@@ -8703,6 +8738,33 @@ This capability indicate to the userspace whether a PFNMAP memory region
can be safely mapped as cacheable. This relies on the presence of
force write back (FWB) feature support on the hardware.
7.45 KVM_CAP_ARM_SEA_TO_USER
----------------------------
:Architecture: arm64
:Target: VM
:Parameters: none
:Returns: 0 on success, -EINVAL if unsupported.
When this capability is enabled, KVM may exit to userspace for SEAs taken to
EL2 resulting from a guest access. See ``KVM_EXIT_ARM_SEA`` for more
information.
7.46 KVM_CAP_S390_USER_OPEREXEC
-------------------------------
:Architectures: s390
:Parameters: none
When this capability is enabled KVM forwards all operation exceptions
that it doesn't handle itself to user space. This also includes the
0x0000 instructions managed by KVM_CAP_S390_USER_INSTR0. This is
helpful if user space wants to emulate instructions which are not
(yet) implemented in hardware.
This capability can be enabled dynamically even if VCPUs were already
created and are running.
8. Other capabilities.
======================

View File

@@ -48,7 +48,14 @@ versus "has_error_code", i.e. KVM's ABI follows AMD behavior.
Nested virtualization features
------------------------------
TBD
On AMD CPUs, when GIF is cleared, #DB exceptions or traps due to a breakpoint
register match are ignored and discarded by the CPU. The CPU relies on the VMM
to fully virtualize this behavior, even when vGIF is enabled for the guest
(i.e. vGIF=0 does not cause the CPU to drop #DBs when the guest is running).
KVM does not virtualize this behavior as the complexity is unjustified given
the rarity of the use case. One way to handle this would be for KVM to
intercept the #DB, temporarily disable the breakpoint, single-step over the
instruction, then re-enable the breakpoint.
x2APIC
------

View File

@@ -2575,6 +2575,14 @@ S: Maintained
F: Documentation/devicetree/bindings/arm/blaize.yaml
F: arch/arm64/boot/dts/blaize/
ARM/BST SOC SUPPORT
M: Ge Gordon <gordon.ge@bst.ai>
R: BST Linux Kernel Upstream Group <bst-upstream@bstai.top>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Supported
F: Documentation/devicetree/bindings/arm/bst.yaml
F: arch/arm64/boot/dts/bst/
ARM/CALXEDA HIGHBANK ARCHITECTURE
M: Andre Przywara <andre.przywara@arm.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
@@ -4582,7 +4590,7 @@ F: drivers/net/ethernet/netronome/nfp/bpf/
BPF JIT for POWERPC (32-BIT AND 64-BIT)
M: Hari Bathini <hbathini@linux.ibm.com>
M: Christophe Leroy <christophe.leroy@csgroup.eu>
M: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
R: Naveen N Rao <naveen@kernel.org>
L: bpf@vger.kernel.org
S: Supported
@@ -10074,7 +10082,7 @@ F: drivers/spi/spi-fsl-qspi.c
FREESCALE QUICC ENGINE LIBRARY
M: Qiang Zhao <qiang.zhao@nxp.com>
M: Christophe Leroy <christophe.leroy@csgroup.eu>
M: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
L: linuxppc-dev@lists.ozlabs.org
S: Maintained
F: drivers/soc/fsl/qe/
@@ -10127,7 +10135,7 @@ S: Maintained
F: drivers/tty/serial/ucc_uart.c
FREESCALE SOC DRIVERS
M: Christophe Leroy <christophe.leroy@csgroup.eu>
M: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
L: linuxppc-dev@lists.ozlabs.org
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
@@ -10671,6 +10679,7 @@ F: Documentation/devicetree/bindings/clock/google,gs101-clock.yaml
F: Documentation/devicetree/bindings/soc/google/google,gs101-pmu-intr-gen.yaml
F: arch/arm64/boot/dts/exynos/google/
F: drivers/clk/samsung/clk-gs101.c
F: drivers/soc/samsung/gs101-pmu.c
F: drivers/phy/samsung/phy-gs101-ufs.c
F: include/dt-bindings/clock/google,gs101.h
K: [gG]oogle.?[tT]ensor
@@ -11604,6 +11613,8 @@ F: mm/hugetlb.c
F: mm/hugetlb_cgroup.c
F: mm/hugetlb_cma.c
F: mm/hugetlb_cma.h
F: mm/hugetlb_sysctl.c
F: mm/hugetlb_sysfs.c
F: mm/hugetlb_vmemmap.c
F: mm/hugetlb_vmemmap.h
F: tools/testing/selftests/cgroup/test_hugetlb_memcg.c
@@ -11621,6 +11632,8 @@ M: Miaohe Lin <linmiaohe@huawei.com>
R: Naoya Horiguchi <nao.horiguchi@gmail.com>
L: linux-mm@kvack.org
S: Maintained
F: include/linux/memory-failure.h
F: include/trace/events/memory-failure.h
F: mm/hwpoison-inject.c
F: mm/memory-failure.c
@@ -14387,7 +14400,7 @@ LINUX FOR POWERPC (32-BIT AND 64-BIT)
M: Madhavan Srinivasan <maddy@linux.ibm.com>
M: Michael Ellerman <mpe@ellerman.id.au>
R: Nicholas Piggin <npiggin@gmail.com>
R: Christophe Leroy <christophe.leroy@csgroup.eu>
R: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
L: linuxppc-dev@lists.ozlabs.org
S: Supported
W: https://github.com/linuxppc/wiki/wiki
@@ -14443,7 +14456,7 @@ F: Documentation/devicetree/bindings/powerpc/fsl/
F: arch/powerpc/platforms/85xx/
LINUX FOR POWERPC EMBEDDED PPC8XX AND PPC83XX
M: Christophe Leroy <christophe.leroy@csgroup.eu>
M: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
L: linuxppc-dev@lists.ozlabs.org
S: Maintained
F: arch/powerpc/platforms/8xx/
@@ -16346,6 +16359,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
F: include/linux/gfp.h
F: include/linux/gfp_types.h
F: include/linux/highmem.h
F: include/linux/leafops.h
F: include/linux/memory.h
F: include/linux/mm.h
F: include/linux/mm_*.h
@@ -16353,6 +16367,7 @@ F: include/linux/mmzone.h
F: include/linux/mmdebug.h
F: include/linux/mmu_notifier.h
F: include/linux/pagewalk.h
F: include/linux/pgalloc.h
F: include/linux/pgtable.h
F: include/linux/ptdump.h
F: include/linux/vmpressure.h
@@ -22269,6 +22284,14 @@ F: Documentation/devicetree/bindings/riscv/andes.yaml
F: Documentation/devicetree/bindings/timer/andestech,plmt0.yaml
F: arch/riscv/boot/dts/andes/
RISC-V ANLOGIC SoC SUPPORT
M: Conor Dooley <conor@kernel.org>
T: git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
L: linux-riscv@lists.infradead.org
S: Odd Fixes
F: Documentation/devicetree/bindings/riscv/anlogic.yaml
F: arch/riscv/boot/dts/anlogic/
RISC-V ARCHITECTURE
M: Paul Walmsley <pjw@kernel.org>
M: Palmer Dabbelt <palmer@dabbelt.com>
@@ -22293,11 +22316,12 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
F: Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
F: drivers/iommu/riscv/
RISC-V MICROCHIP FPGA SUPPORT
RISC-V MICROCHIP SUPPORT
M: Conor Dooley <conor.dooley@microchip.com>
M: Daire McNamara <daire.mcnamara@microchip.com>
L: linux-riscv@lists.infradead.org
S: Supported
T: git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/ (dts, soc, firmware)
F: Documentation/devicetree/bindings/clock/microchip,mpfs*.yaml
F: Documentation/devicetree/bindings/gpio/microchip,mpfs-gpio.yaml
F: Documentation/devicetree/bindings/i2c/microchip,corei2c.yaml
@@ -22319,6 +22343,8 @@ F: drivers/pci/controller/plda/pcie-microchip-host.c
F: drivers/pwm/pwm-microchip-core.c
F: drivers/reset/reset-mpfs.c
F: drivers/rtc/rtc-mpfs.c
F: drivers/soc/microchip/mpfs-control-scb.c
F: drivers/soc/microchip/mpfs-mss-top-sysreg.c
F: drivers/soc/microchip/mpfs-sys-controller.c
F: drivers/spi/spi-microchip-core-qspi.c
F: drivers/spi/spi-mpfs.c
@@ -22328,13 +22354,10 @@ F: include/soc/microchip/mpfs.h
RISC-V MISC SOC SUPPORT
M: Conor Dooley <conor@kernel.org>
L: linux-riscv@lists.infradead.org
S: Maintained
Q: https://patchwork.kernel.org/project/linux-riscv/list/
S: Odd Fixes
T: git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
F: arch/riscv/boot/dts/canaan/
F: arch/riscv/boot/dts/microchip/
F: arch/riscv/boot/dts/sifive/
F: arch/riscv/boot/dts/starfive/
RISC-V PMU DRIVERS
M: Atish Patra <atish.patra@linux.dev>
@@ -22372,6 +22395,15 @@ F: arch/riscv/boot/dts/spacemit/
N: spacemit
K: spacemit
RISC-V TENSTORRENT SoC SUPPORT
M: Drew Fustini <dfustini@oss.tenstorrent.com>
M: Joel Stanley <jms@oss.tenstorrent.com>
L: linux-riscv@lists.infradead.org
S: Maintained
T: git https://github.com/tenstorrent/linux.git
F: Documentation/devicetree/bindings/riscv/tenstorrent.yaml
F: arch/riscv/boot/dts/tenstorrent/
RISC-V THEAD SoC SUPPORT
M: Drew Fustini <fustini@kernel.org>
M: Guo Ren <guoren@kernel.org>
@@ -24666,10 +24698,13 @@ F: drivers/staging/
STANDALONE CACHE CONTROLLER DRIVERS
M: Conor Dooley <conor@kernel.org>
M: Jonathan Cameron <jonathan.cameron@huawei.com>
S: Maintained
T: git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
F: Documentation/devicetree/bindings/cache/
F: drivers/cache
F: include/linux/cache_coherency.h
F: lib/cache_maint.c
STARFIRE/DURALAN NETWORK DRIVER
M: Ion Badulescu <ionut@badula.org>
@@ -24694,7 +24729,10 @@ F: drivers/crypto/starfive/
STARFIVE DEVICETREES
M: Emil Renner Berthing <kernel@esmil.dk>
M: Conor Dooley <conor@kernel.org>
L: linux-riscv@lists.infradead.org
S: Maintained
T: git https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/
F: arch/riscv/boot/dts/starfive/
STARFIVE DWMAC GLUE LAYER

View File

@@ -39,6 +39,8 @@ dtb-$(CONFIG_ARCH_ASPEED) += \
aspeed-bmc-facebook-yamp.dtb \
aspeed-bmc-facebook-yosemitev2.dtb \
aspeed-bmc-facebook-yosemite4.dtb \
aspeed-bmc-facebook-yosemite5.dtb \
aspeed-bmc-ibm-balcones.dtb \
aspeed-bmc-ibm-blueridge.dtb \
aspeed-bmc-ibm-bonnell.dtb \
aspeed-bmc-ibm-everest.dtb \

View File

@@ -95,6 +95,11 @@
label = "bmc_ready_cpld_noled";
gpios = <&gpio0 ASPEED_GPIO(P, 5) (GPIO_ACTIVE_HIGH|GPIO_TRANSITORY)>;
};
led-hdd {
label = "hdd_led";
gpios = <&io_expander13 1 GPIO_ACTIVE_LOW>;
};
};
memory@80000000 {
@@ -642,12 +647,14 @@
power-monitor@12 {
compatible = "ti,lm5066i";
reg = <0x12>;
shunt-resistor-micro-ohms = <183>;
};
// PDB
power-monitor@14 {
compatible = "ti,lm5066i";
reg = <0x14>;
shunt-resistor-micro-ohms = <183>;
};
// Module 0
@@ -1197,7 +1204,7 @@
#gpio-cells = <2>;
gpio-line-names =
"rmc_en_dc_pwr_on",
"",
"HDD_LED_N",
"",
"",
"",

View File

@@ -240,6 +240,14 @@
&i2c1 {
status = "okay";
mctp-controller;
multi-master;
mctp@10 {
compatible = "mctp-i2c-controller";
reg = <(0x10 | I2C_OWN_SLAVE_ADDRESS)>;
};
temperature-sensor@4b {
compatible = "ti,tmp75";
reg = <0x4b>;

File diff suppressed because it is too large Load Diff

View File

@@ -49,6 +49,20 @@
reg = <0x80000000 0x80000000>;
};
reserved-memory {
#address-cells = <1>;
#size-cells = <1>;
ranges;
ramoops@b8dfa000 {
compatible = "ramoops";
reg = <0xb8dfa000 0x6000>;
record-size = <0x2000>;
console-size = <0x2000>;
pmsg-size = <0x2000>;
max-reason = <1>;
};
};
iio-hwmon {
compatible = "iio-hwmon";
io-channels = <&adc0 0>, <&adc0 1>, <&adc0 2>, <&adc0 3>,

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,609 @@
// SPDX-License-Identifier: GPL-2.0-or-later
// Copyright 2025 IBM Corp.
/dts-v1/;
#include <dt-bindings/gpio/aspeed-gpio.h>
#include <dt-bindings/i2c/i2c.h>
#include <dt-bindings/leds/leds-pca955x.h>
#include "aspeed-g6.dtsi"
#include "ibm-power11-dual.dtsi"
/ {
model = "Balcones";
compatible = "ibm,balcones-bmc", "aspeed,ast2600";
aliases {
serial4 = &uart5;
i2c16 = &i2c11mux0chn0;
i2c17 = &i2c11mux0chn1;
i2c18 = &i2c11mux0chn2;
i2c19 = &i2c11mux0chn3;
};
chosen {
stdout-path = &uart5;
};
gpio-keys-polled {
compatible = "gpio-keys-polled";
poll-interval = <1000>;
event-fan0-presence {
gpios = <&gpio0 ASPEED_GPIO(F, 4) GPIO_ACTIVE_LOW>;
label = "fan0-presence";
linux,code = <6>;
};
event-fan1-presence {
gpios = <&gpio0 ASPEED_GPIO(F, 5) GPIO_ACTIVE_LOW>;
label = "fan1-presence";
linux,code = <7>;
};
};
iio-hwmon {
compatible = "iio-hwmon";
io-channels = <&adc1 7>;
};
leds {
compatible = "gpio-leds";
led-fan0 {
gpios = <&gpio0 ASPEED_GPIO(G, 0) GPIO_ACTIVE_LOW>;
};
led-fan1 {
gpios = <&gpio0 ASPEED_GPIO(G, 1) GPIO_ACTIVE_LOW>;
};
led-rear-enc-id0 {
gpios = <&gpio0 ASPEED_GPIO(H, 2) GPIO_ACTIVE_LOW>;
};
led-rear-enc-fault0 {
gpios = <&gpio0 ASPEED_GPIO(H, 3) GPIO_ACTIVE_LOW>;
};
};
memory@80000000 {
device_type = "memory";
reg = <0x80000000 0x40000000>;
};
reserved-memory {
#address-cells = <1>;
#size-cells = <1>;
ranges;
event_log: region@b3d00000 {
reg = <0xb3d00000 0x100000>;
no-map;
};
ramoops@b3e00000 {
compatible = "ramoops";
reg = <0xb3e00000 0x200000>; /* 16 * (4 * 0x8000) */
record-size = <0x8000>;
console-size = <0x8000>;
ftrace-size = <0x8000>;
pmsg-size = <0x8000>;
max-reason = <3>; /* KMSG_DUMP_EMERG */
};
/* LPC FW cycle bridge region requires natural alignment */
flash_memory: region@b4000000 {
reg = <0xb4000000 0x04000000>; /* 64M */
no-map;
};
/* VGA region is dictated by hardware strapping */
vga_memory: region@bf000000 {
compatible = "shared-dma-pool";
reg = <0xbf000000 0x01000000>; /* 16M */
no-map;
};
};
};
&adc1 {
aspeed,int-vref-microvolt = <2500000>;
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_adc8_default &pinctrl_adc9_default
&pinctrl_adc10_default &pinctrl_adc11_default
&pinctrl_adc12_default &pinctrl_adc13_default
&pinctrl_adc14_default &pinctrl_adc15_default>;
status = "okay";
};
&ehci1 {
status = "okay";
};
&emmc {
clk-phase-mmc-hs200 = <180>, <180>;
status = "okay";
};
&emmc_controller {
status = "okay";
};
&gpio0 {
gpio-line-names =
/*A0-A7*/ "","","","","","","","",
/*B0-B7*/ "","","","","","","checkstop","",
/*C0-C7*/ "","","","","","","","",
/*D0-D7*/ "","","","","","","","",
/*E0-E7*/ "","","","","","","","",
/*F0-F7*/ "","fan-ctlr-reset","rtc-battery-voltage-read-enable",
"reset-cause-pinhole","","","","",
/*G0-G7*/ "fan0","fan1","","","","","","",
/*H0-H7*/ "","","rear-enc-id0","rear-enc-fault0","","","","",
/*I0-I7*/ "","","","","","","bmc-secure-boot","",
/*J0-J7*/ "","","","","","","","",
/*K0-K7*/ "","","","","","","","",
/*L0-L7*/ "","","","","","","","",
/*M0-M7*/ "","","","","","","","",
/*N0-N7*/ "","","","","","","","",
/*O0-O7*/ "","","","usb-power","","","","",
/*P0-P7*/ "","","","","","","","",
/*Q0-Q7*/ "cfam-reset","","regulator-standby-faulted","","","","","",
/*R0-R7*/ "bmc-tpm-reset","power-chassis-control","power-chassis-good","","",
"","","",
/*S0-S7*/ "presence-ps0","presence-ps1","","","power-ffs-sync-history","","",
"",
/*T0-T7*/ "","","","","","","","",
/*U0-U7*/ "","","","","","","","",
/*V0-V7*/ "","","","","","","","",
/*W0-W7*/ "","","","","","","","",
/*X0-X7*/ "","","","","","","","",
/*Y0-Y7*/ "","","","","","","","",
/*Z0-Z7*/ "","","","","","","","";
usb-power-hog {
gpio-hog;
gpios = <ASPEED_GPIO(O, 3) GPIO_ACTIVE_LOW>;
output-high;
};
};
&i2c0 {
status = "okay";
gpio@20 {
compatible = "ti,tca9554";
reg = <0x20>;
gpio-controller;
#gpio-cells = <2>;
gpio-line-names =
"",
"RUSSEL_FW_I2C_ENABLE_N",
"RUSSEL_OPPANEL_PRESENCE_N",
"BLYTH_OPPANEL_PRESENCE_N",
"CPU_TPM_CARD_PRESENT_N",
"",
"",
"DASD_BP_PRESENT_N";
};
eeprom@51 {
compatible = "atmel,24c64";
reg = <0x51>;
};
};
&i2c1 {
status = "okay";
};
&i2c2 {
status = "okay";
pmic@64 {
compatible = "ti,ucd90160";
reg = <0x64>;
};
};
&i2c3 {
status = "okay";
power-supply@5a {
compatible = "acbel,fsg032";
reg = <0x5a>;
};
power-supply@5b {
compatible = "acbel,fsg032";
reg = <0x5b>;
};
};
&i2c4 {
status = "okay";
};
&i2c5 {
status = "okay";
eeprom@52 {
compatible = "atmel,24c64";
reg = <0x52>;
};
led-controller@62 {
compatible = "nxp,pca9551";
reg = <0x62>;
#address-cells = <1>;
#size-cells = <0>;
gpio-controller;
#gpio-cells = <2>;
led@0 {
reg = <0>;
default-state = "keep";
label = "cablecard2-cxp-top";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
led@1 {
reg = <1>;
default-state = "keep";
label = "cablecard2-cxp-bot";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
};
};
&i2c6 {
status = "okay";
};
&i2c7 {
multi-master;
status = "okay";
temperature-sensor@48 {
compatible = "ti,tmp275";
reg = <0x48>;
};
eeprom@50 {
compatible = "atmel,24c64";
reg = <0x50>;
};
eeprom@51 {
compatible = "atmel,24c64";
reg = <0x51>;
};
pwm@53 {
compatible = "maxim,max31785a";
reg = <0x53>;
};
led-controller@60 {
compatible = "nxp,pca9551";
reg = <0x60>;
#address-cells = <1>;
#size-cells = <0>;
gpio-controller;
#gpio-cells = <2>;
led@0 {
reg = <0>;
default-state = "keep";
label = "front-sys-id0";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
led@1 {
reg = <1>;
default-state = "keep";
label = "front-check-log0";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
led@2 {
reg = <2>;
default-state = "keep";
label = "front-enc-fault1";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
led@3 {
reg = <3>;
default-state = "keep";
label = "front-sys-pwron0";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
};
lcd-controller@62 {
compatible = "ibm,op-panel";
reg = <(0x62 | I2C_OWN_SLAVE_ADDRESS)>;
};
pressure-sensor@76 {
compatible = "infineon,dps310";
reg = <0x76>;
#io-channel-cells = <0>;
};
};
&i2c8 {
status = "okay";
rtc@32 {
compatible = "epson,rx8900";
reg = <0x32>;
};
eeprom@50 {
compatible = "atmel,24c128";
reg = <0x50>;
};
led-controller@60 {
compatible = "nxp,pca9551";
reg = <0x60>;
#address-cells = <1>;
#size-cells = <0>;
gpio-controller;
#gpio-cells = <2>;
gpio-line-names =
"",
"APSS_RESET_N",
"",
"N_MODE_CPU_N",
"",
"",
"P10_DCM_PRESENT",
"";
};
led-controller@61 {
compatible = "nxp,pca9552";
reg = <0x61>;
#address-cells = <1>;
#size-cells = <0>;
gpio-controller;
#gpio-cells = <2>;
gpio-line-names =
"",
"",
"SLOT2_PRSNT_EN_RSVD",
"",
"",
"",
"",
"SLOT2_EXPANDER_PRSNT_N",
"",
"",
"",
"",
"",
"",
"",
"";
};
};
&i2c9 {
status = "okay";
temperature-sensor@4c {
compatible = "ti,tmp423";
reg = <0x4c>;
};
};
&i2c10 {
status = "okay";
};
&i2c11 {
status = "okay";
gpio@20 {
compatible = "ti,tca9554";
reg = <0x20>;
gpio-controller;
#gpio-cells = <2>;
gpio-line-names =
"BOOT_RCVRY_TWI",
"BOOT_RCVRY_UART",
"",
"",
"",
"",
"",
"PE_SWITCH_RSTB_N";
};
temperature-sensor@4c {
compatible = "ti,tmp435";
reg = <0x4c>;
};
i2c-mux@75 {
compatible = "nxp,pca9849";
reg = <0x75>;
#address-cells = <1>;
#size-cells = <0>;
i2c-mux-idle-disconnect;
i2c11mux0chn0: i2c@0 {
reg = <0>;
#address-cells = <1>;
#size-cells = <0>;
};
i2c11mux0chn1: i2c@1 {
reg = <1>;
#address-cells = <1>;
#size-cells = <0>;
};
i2c11mux0chn2: i2c@2 {
reg = <2>;
#address-cells = <1>;
#size-cells = <0>;
};
i2c11mux0chn3: i2c@3 {
reg = <3>;
#address-cells = <1>;
#size-cells = <0>;
};
};
};
&i2c12 {
status = "okay";
tpm@2e {
compatible = "nuvoton,npct75x", "tcg,tpm-tis-i2c";
reg = <0x2e>;
memory-region = <&event_log>;
};
eeprom@50 {
compatible = "atmel,24c64";
reg = <0x50>;
};
};
&i2c13 {
status = "okay";
eeprom@50 {
compatible = "atmel,24c64";
reg = <0x50>;
};
led-controller@60 {
compatible = "nxp,pca9551";
reg = <0x60>;
#address-cells = <1>;
#size-cells = <0>;
gpio-controller;
#gpio-cells = <2>;
led@0 {
reg = <0>;
default-state = "keep";
label = "nvme3";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
led@1 {
reg = <1>;
default-state = "keep";
label = "nvme2";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
led@2 {
reg = <2>;
default-state = "keep";
label = "nvme1";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
led@3 {
reg = <3>;
default-state = "keep";
label = "nvme0";
retain-state-shutdown;
type = <PCA955X_TYPE_LED>;
};
};
};
&i2c14 {
status = "okay";
};
&i2c15 {
status = "okay";
};
&ibt {
status = "okay";
};
&kcs2 {
aspeed,lpc-io-reg = <0xca8 0xcac>;
status = "okay";
};
&kcs3 {
aspeed,lpc-io-reg = <0xca2>;
aspeed,lpc-interrupts = <11 IRQ_TYPE_LEVEL_LOW>;
status = "okay";
};
&lpc_ctrl {
memory-region = <&flash_memory>;
status = "okay";
};
&mac2 {
clocks = <&syscon ASPEED_CLK_GATE_MAC3CLK>,
<&syscon ASPEED_CLK_MAC3RCLK>;
clock-names = "MACCLK", "RCLK";
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_rmii3_default>;
use-ncsi;
status = "okay";
};
&pinctrl_emmc_default {
bias-disable;
};
&uart2 {
status = "okay";
};
&uhci {
status = "okay";
};
&vuart1 {
status = "okay";
};
&vuart2 {
status = "okay";
};
&wdt1 {
aspeed,reset-type = "none";
aspeed,external-signal;
aspeed,ext-push-pull;
aspeed,ext-active-high;
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_wdtrst1_default>;
};
&wdt2 {
status = "okay";
};

View File

@@ -277,15 +277,11 @@
#size-cells = <0>;
fan0: fan@0 {
compatible = "pmbus-fan";
reg = <0>;
tach-pulses = <2>;
};
fan1: fan@1 {
compatible = "pmbus-fan";
reg = <1>;
tach-pulses = <2>;
};
};

View File

@@ -2066,27 +2066,19 @@
reg = <0x52>;
fan@0 {
compatible = "pmbus-fan";
reg = <0>;
tach-pulses = <2>;
};
fan@1 {
compatible = "pmbus-fan";
reg = <1>;
tach-pulses = <2>;
};
fan@2 {
compatible = "pmbus-fan";
reg = <2>;
tach-pulses = <2>;
};
fan@3 {
compatible = "pmbus-fan";
reg = <3>;
tach-pulses = <2>;
};
};

View File

@@ -1080,39 +1080,27 @@
#size-cells = <0>;
fan0: fan@0 {
compatible = "pmbus-fan";
reg = <0>;
tach-pulses = <2>;
};
fan1: fan@1 {
compatible = "pmbus-fan";
reg = <1>;
tach-pulses = <2>;
};
fan2: fan@2 {
compatible = "pmbus-fan";
reg = <2>;
tach-pulses = <2>;
};
fan3: fan@3 {
compatible = "pmbus-fan";
reg = <3>;
tach-pulses = <2>;
};
fan4: fan@4 {
compatible = "pmbus-fan";
reg = <4>;
tach-pulses = <2>;
};
fan5: fan@5 {
compatible = "pmbus-fan";
reg = <5>;
tach-pulses = <2>;
};
};

View File

@@ -481,55 +481,19 @@
#size-cells = <0>;
fan@0 {
compatible = "pmbus-fan";
reg = <0>;
tach-pulses = <2>;
maxim,fan-rotor-input = "tach";
maxim,fan-pwm-freq = <25000>;
maxim,fan-dual-tach;
maxim,fan-no-watchdog;
maxim,fan-no-fault-ramp;
maxim,fan-ramp = <2>;
maxim,fan-fault-pin-mon;
};
fan@1 {
compatible = "pmbus-fan";
reg = <1>;
tach-pulses = <2>;
maxim,fan-rotor-input = "tach";
maxim,fan-pwm-freq = <25000>;
maxim,fan-dual-tach;
maxim,fan-no-watchdog;
maxim,fan-no-fault-ramp;
maxim,fan-ramp = <2>;
maxim,fan-fault-pin-mon;
};
fan@2 {
compatible = "pmbus-fan";
reg = <2>;
tach-pulses = <2>;
maxim,fan-rotor-input = "tach";
maxim,fan-pwm-freq = <25000>;
maxim,fan-dual-tach;
maxim,fan-no-watchdog;
maxim,fan-no-fault-ramp;
maxim,fan-ramp = <2>;
maxim,fan-fault-pin-mon;
};
fan@3 {
compatible = "pmbus-fan";
reg = <3>;
tach-pulses = <2>;
maxim,fan-rotor-input = "tach";
maxim,fan-pwm-freq = <25000>;
maxim,fan-dual-tach;
maxim,fan-no-watchdog;
maxim,fan-no-fault-ramp;
maxim,fan-ramp = <2>;
maxim,fan-fault-pin-mon;
};
};

View File

@@ -0,0 +1,779 @@
// SPDX-License-Identifier: GPL-2.0-or-later
// Copyright 2025 IBM Corp.
/ {
aliases {
i2c100 = &cfam0_i2c0;
i2c101 = &cfam0_i2c1;
i2c110 = &cfam0_i2c10;
i2c111 = &cfam0_i2c11;
i2c112 = &cfam0_i2c12;
i2c113 = &cfam0_i2c13;
i2c114 = &cfam0_i2c14;
i2c115 = &cfam0_i2c15;
i2c202 = &cfam1_i2c2;
i2c203 = &cfam1_i2c3;
i2c210 = &cfam1_i2c10;
i2c211 = &cfam1_i2c11;
i2c214 = &cfam1_i2c14;
i2c215 = &cfam1_i2c15;
i2c216 = &cfam1_i2c16;
i2c217 = &cfam1_i2c17;
sbefifo100 = &sbefifo100;
sbefifo101 = &sbefifo101;
sbefifo110 = &sbefifo110;
sbefifo111 = &sbefifo111;
sbefifo112 = &sbefifo112;
sbefifo113 = &sbefifo113;
sbefifo114 = &sbefifo114;
sbefifo115 = &sbefifo115;
sbefifo202 = &sbefifo202;
sbefifo203 = &sbefifo203;
sbefifo210 = &sbefifo210;
sbefifo211 = &sbefifo211;
sbefifo214 = &sbefifo214;
sbefifo215 = &sbefifo215;
sbefifo216 = &sbefifo216;
sbefifo217 = &sbefifo217;
scom100 = &scom100;
scom101 = &scom101;
scom110 = &scom110;
scom111 = &scom111;
scom112 = &scom112;
scom113 = &scom113;
scom114 = &scom114;
scom115 = &scom115;
scom202 = &scom202;
scom203 = &scom203;
scom210 = &scom210;
scom211 = &scom211;
scom214 = &scom214;
scom215 = &scom215;
scom216 = &scom216;
scom217 = &scom217;
spi10 = &cfam0_spi0;
spi11 = &cfam0_spi1;
spi12 = &cfam0_spi2;
spi13 = &cfam0_spi3;
spi20 = &cfam1_spi0;
spi21 = &cfam1_spi1;
spi22 = &cfam1_spi2;
spi23 = &cfam1_spi3;
};
};
&fsim0 {
bus-frequency = <100000000>;
#address-cells = <2>;
#size-cells = <0>;
cfam-reset-gpios = <&gpio0 ASPEED_GPIO(Q, 0) GPIO_ACTIVE_HIGH>;
status = "okay";
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom@1000 {
compatible = "ibm,p9-scom";
reg = <0x1000 0x400>;
};
i2c@1800 {
compatible = "ibm,i2c-fsi";
reg = <0x1800 0x400>;
#address-cells = <1>;
#size-cells = <0>;
cfam0_i2c0: i2c-bus@0 {
reg = <0>; /* OMI01 */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom100: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo100: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c1: i2c-bus@1 {
reg = <1>; /* OMI23 */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom101: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo101: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c10: i2c-bus@a {
reg = <10>; /* OP3A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom110: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo110: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c11: i2c-bus@b {
reg = <11>; /* OP3B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom111: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo111: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c12: i2c-bus@c {
reg = <12>; /* OP4A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom112: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo112: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c13: i2c-bus@d {
reg = <13>; /* OP4B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom113: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo113: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c14: i2c-bus@e {
reg = <14>; /* OP5A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom114: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo114: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c15: i2c-bus@f {
reg = <15>; /* OP5B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom115: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo115: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
};
fsi2spi@1c00 {
compatible = "ibm,fsi2spi";
reg = <0x1c00 0x400>;
#address-cells = <1>;
#size-cells = <0>;
cfam0_spi0: spi@0 {
compatible = "ibm,spi-fsi";
reg = <0x0>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam0_spi1: spi@20 {
compatible = "ibm,spi-fsi";
reg = <0x20>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam0_spi2: spi@40 {
compatible = "ibm,spi-fsi";
reg = <0x40>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam0_spi3: spi@60 {
compatible = "ibm,spi-fsi";
reg = <0x60>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
};
sbefifo@2400 {
compatible = "ibm,p9-sbefifo";
reg = <0x2400 0x400>;
occ {
compatible = "ibm,p10-occ";
hwmon {
compatible = "ibm,p10-occ-hwmon";
ibm,no-poll-on-init;
};
};
};
fsi_hub0: fsi@3400 {
compatible = "ibm,p9-fsi-controller";
reg = <0x3400 0x400>;
#address-cells = <2>;
#size-cells = <0>;
};
};
};
&fsi_hub0 {
cfam@1,0 {
reg = <1 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <1>;
scom@1000 {
compatible = "ibm,p9-scom";
reg = <0x1000 0x400>;
};
i2c@1800 {
compatible = "ibm,i2c-fsi";
reg = <0x1800 0x400>;
#address-cells = <1>;
#size-cells = <0>;
cfam1_i2c2: i2c-bus@2 {
reg = <2>; /* OMI45 */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom202: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo202: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c3: i2c-bus@3 {
reg = <3>; /* OMI67 */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom203: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo203: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c10: i2c-bus@a {
reg = <10>; /* OP3A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom210: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo210: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c11: i2c-bus@b {
reg = <11>; /* OP3B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom211: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo211: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c14: i2c-bus@e {
reg = <14>; /* OP5A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom214: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo214: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c15: i2c-bus@f {
reg = <15>; /* OP5B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom215: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo215: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c16: i2c-bus@10 {
reg = <16>; /* OP6A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom216: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo216: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c17: i2c-bus@11 {
reg = <17>; /* OP6B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom217: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo217: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
};
fsi2spi@1c00 {
compatible = "ibm,fsi2spi";
reg = <0x1c00 0x400>;
#address-cells = <1>;
#size-cells = <0>;
cfam1_spi0: spi@0 {
compatible = "ibm,spi-fsi";
reg = <0x0>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam1_spi1: spi@20 {
compatible = "ibm,spi-fsi";
reg = <0x20>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam1_spi2: spi@40 {
compatible = "ibm,spi-fsi";
reg = <0x40>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam1_spi3: spi@60 {
compatible = "ibm,spi-fsi";
reg = <0x60>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
};
sbefifo@2400 {
compatible = "ibm,p9-sbefifo";
reg = <0x2400 0x400>;
occ {
compatible = "ibm,p10-occ";
hwmon {
compatible = "ibm,p10-occ-hwmon";
ibm,no-poll-on-init;
};
};
};
fsi@3400 {
compatible = "ibm,p9-fsi-controller";
reg = <0x3400 0x400>;
#address-cells = <2>;
#size-cells = <0>;
no-scan-on-init;
};
};
};

View File

@@ -1,24 +1,10 @@
// SPDX-License-Identifier: GPL-2.0-or-later
// Copyright 2024 IBM Corp.
#include "ibm-power11-dual.dtsi"
/ {
aliases {
i2c100 = &cfam0_i2c0;
i2c101 = &cfam0_i2c1;
i2c110 = &cfam0_i2c10;
i2c111 = &cfam0_i2c11;
i2c112 = &cfam0_i2c12;
i2c113 = &cfam0_i2c13;
i2c114 = &cfam0_i2c14;
i2c115 = &cfam0_i2c15;
i2c202 = &cfam1_i2c2;
i2c203 = &cfam1_i2c3;
i2c210 = &cfam1_i2c10;
i2c211 = &cfam1_i2c11;
i2c214 = &cfam1_i2c14;
i2c215 = &cfam1_i2c15;
i2c216 = &cfam1_i2c16;
i2c217 = &cfam1_i2c17;
i2c300 = &cfam2_i2c0;
i2c301 = &cfam2_i2c1;
i2c310 = &cfam2_i2c10;
@@ -36,22 +22,6 @@
i2c416 = &cfam3_i2c16;
i2c417 = &cfam3_i2c17;
sbefifo100 = &sbefifo100;
sbefifo101 = &sbefifo101;
sbefifo110 = &sbefifo110;
sbefifo111 = &sbefifo111;
sbefifo112 = &sbefifo112;
sbefifo113 = &sbefifo113;
sbefifo114 = &sbefifo114;
sbefifo115 = &sbefifo115;
sbefifo202 = &sbefifo202;
sbefifo203 = &sbefifo203;
sbefifo210 = &sbefifo210;
sbefifo211 = &sbefifo211;
sbefifo214 = &sbefifo214;
sbefifo215 = &sbefifo215;
sbefifo216 = &sbefifo216;
sbefifo217 = &sbefifo217;
sbefifo300 = &sbefifo300;
sbefifo301 = &sbefifo301;
sbefifo310 = &sbefifo310;
@@ -69,22 +39,6 @@
sbefifo416 = &sbefifo416;
sbefifo417 = &sbefifo417;
scom100 = &scom100;
scom101 = &scom101;
scom110 = &scom110;
scom111 = &scom111;
scom112 = &scom112;
scom113 = &scom113;
scom114 = &scom114;
scom115 = &scom115;
scom202 = &scom202;
scom203 = &scom203;
scom210 = &scom210;
scom211 = &scom211;
scom214 = &scom214;
scom215 = &scom215;
scom216 = &scom216;
scom217 = &scom217;
scom300 = &scom300;
scom301 = &scom301;
scom310 = &scom310;
@@ -102,14 +56,6 @@
scom416 = &scom416;
scom417 = &scom417;
spi10 = &cfam0_spi0;
spi11 = &cfam0_spi1;
spi12 = &cfam0_spi2;
spi13 = &cfam0_spi3;
spi20 = &cfam1_spi0;
spi21 = &cfam1_spi1;
spi22 = &cfam1_spi2;
spi23 = &cfam1_spi3;
spi30 = &cfam2_spi0;
spi31 = &cfam2_spi1;
spi32 = &cfam2_spi2;
@@ -121,718 +67,7 @@
};
};
&fsim0 {
#address-cells = <2>;
#size-cells = <0>;
status = "okay";
bus-frequency = <100000000>;
cfam-reset-gpios = <&gpio0 ASPEED_GPIO(Q, 0) GPIO_ACTIVE_HIGH>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom@1000 {
compatible = "ibm,p9-scom";
reg = <0x1000 0x400>;
};
i2c@1800 {
compatible = "ibm,i2c-fsi";
reg = <0x1800 0x400>;
#address-cells = <1>;
#size-cells = <0>;
cfam0_i2c0: i2c-bus@0 {
reg = <0>; /* OMI01 */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom100: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo100: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c1: i2c-bus@1 {
reg = <1>; /* OMI23 */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom101: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo101: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c10: i2c-bus@a {
reg = <10>; /* OP3A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom110: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo110: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c11: i2c-bus@b {
reg = <11>; /* OP3B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom111: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo111: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c12: i2c-bus@c {
reg = <12>; /* OP4A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom112: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo112: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c13: i2c-bus@d {
reg = <13>; /* OP4B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom113: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo113: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c14: i2c-bus@e {
reg = <14>; /* OP5A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom114: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo114: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam0_i2c15: i2c-bus@f {
reg = <15>; /* OP5B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom115: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo115: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
};
fsi2spi@1c00 {
compatible = "ibm,fsi2spi";
reg = <0x1c00 0x400>;
#address-cells = <1>;
#size-cells = <0>;
cfam0_spi0: spi@0 {
compatible = "ibm,spi-fsi";
reg = <0x0>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam0_spi1: spi@20 {
compatible = "ibm,spi-fsi";
reg = <0x20>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam0_spi2: spi@40 {
compatible = "ibm,spi-fsi";
reg = <0x40>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam0_spi3: spi@60 {
compatible = "ibm,spi-fsi";
reg = <0x60>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
};
sbefifo@2400 {
compatible = "ibm,p9-sbefifo";
reg = <0x2400 0x400>;
occ {
compatible = "ibm,p10-occ";
hwmon {
compatible = "ibm,p10-occ-hwmon";
ibm,no-poll-on-init;
};
};
};
fsi_hub0: fsi@3400 {
compatible = "ibm,p9-fsi-controller";
reg = <0x3400 0x400>;
#address-cells = <2>;
#size-cells = <0>;
};
};
};
&fsi_hub0 {
cfam@1,0 {
reg = <1 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <1>;
scom@1000 {
compatible = "ibm,p9-scom";
reg = <0x1000 0x400>;
};
i2c@1800 {
compatible = "ibm,i2c-fsi";
reg = <0x1800 0x400>;
#address-cells = <1>;
#size-cells = <0>;
cfam1_i2c2: i2c-bus@2 {
reg = <2>; /* OMI45 */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom202: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo202: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c3: i2c-bus@3 {
reg = <3>; /* OMI67 */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom203: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo203: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c10: i2c-bus@a {
reg = <10>; /* OP3A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom210: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo210: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c11: i2c-bus@b {
reg = <11>; /* OP3B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom211: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo211: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c14: i2c-bus@e {
reg = <14>; /* OP5A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom214: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo214: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c15: i2c-bus@f {
reg = <15>; /* OP5B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom215: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo215: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c16: i2c-bus@10 {
reg = <16>; /* OP6A */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom216: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo216: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
cfam1_i2c17: i2c-bus@11 {
reg = <17>; /* OP6B */
#address-cells = <1>;
#size-cells = <0>;
fsi@20 {
compatible = "ibm,i2cr-fsi-master";
reg = <0x20>;
#address-cells = <2>;
#size-cells = <0>;
cfam@0,0 {
reg = <0 0>;
#address-cells = <1>;
#size-cells = <1>;
chip-id = <0>;
scom217: scom@1000 {
compatible = "ibm,i2cr-scom";
reg = <0x1000 0x400>;
};
sbefifo217: sbefifo@2400 {
compatible = "ibm,odyssey-sbefifo";
reg = <0x2400 0x400>;
};
};
};
};
};
fsi2spi@1c00 {
compatible = "ibm,fsi2spi";
reg = <0x1c00 0x400>;
#address-cells = <1>;
#size-cells = <0>;
cfam1_spi0: spi@0 {
compatible = "ibm,spi-fsi";
reg = <0x0>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam1_spi1: spi@20 {
compatible = "ibm,spi-fsi";
reg = <0x20>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam1_spi2: spi@40 {
compatible = "ibm,spi-fsi";
reg = <0x40>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
cfam1_spi3: spi@60 {
compatible = "ibm,spi-fsi";
reg = <0x60>;
#address-cells = <1>;
#size-cells = <0>;
eeprom@0 {
compatible = "atmel,at25";
reg = <0>;
address-width = <24>;
pagesize = <256>;
size = <0x80000>;
spi-max-frequency = <10000000>;
};
};
};
sbefifo@2400 {
compatible = "ibm,p9-sbefifo";
reg = <0x2400 0x400>;
occ {
compatible = "ibm,p10-occ";
hwmon {
compatible = "ibm,p10-occ-hwmon";
ibm,no-poll-on-init;
};
};
};
fsi@3400 {
compatible = "ibm,p9-fsi-controller";
reg = <0x3400 0x400>;
#address-cells = <2>;
#size-cells = <0>;
no-scan-on-init;
};
};
cfam@2,0 {
reg = <2 0>;
#address-cells = <1>;

Some files were not shown because too many files have changed in this diff Show More