linux

mirror of https://github.com/torvalds/linux.git synced 2025-12-07 20:06:24 +00:00

Author	SHA1	Message	Date
Linus Torvalds	51d90a15fe	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...	2025-12-05 17:01:20 -08:00
Linus Torvalds	399ead3a6d	Merge tag 'uml-for-linux-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Johannes Berg: "Apart from the usual small churn, we have - initial SMP support (only kernel) - major vDSO cleanups (and fixes for 32-bit)" * tag 'uml-for-linux-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (33 commits) um: Disable KASAN_INLINE when STATIC_LINK is selected um: Don't rename vmap to kernel_vmap um: drivers: virtio: use string choices helper um: Always set up AT_HWCAP and AT_PLATFORM x86/um: Remove FIXADDR_USER_START and FIXADDR_USE_END um: Remove __access_ok_vsyscall() um: Remove redundant range check from __access_ok_vsyscall() um: Remove fixaddr_user_init() x86/um: Drop gate area handling x86/um: Do not inherit vDSO from host um: Split out default elf_aux_hwcap x86/um: Move ELF_PLATFORM fallback to x86-specific code um: Split out default elf_aux_platform um: Avoid circular dependency on asm-offsets in pgtable.h um: Enable SMP support on x86 asm-generic: percpu: Add assembly guard um: vdso: Remove getcpu support on x86 um: Add initial SMP support um: Define timers on a per-CPU basis um: Determine sleep based on need_resched() ...	2025-12-05 16:30:56 -08:00
Linus Torvalds	07025b51c1	Merge tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: - Enable parallel hotplug for RISC-V - Optimize vector regset allocation for ptrace() - Add a kernel selftest for the vector ptrace interface - Enable the userspace RAID6 test to build and run using RISC-V vectors - Add initial support for the Zalasr RISC-V ratified ISA extension - For the Zicbop RISC-V ratified ISA extension to userspace, expose hardware and kernel support to userspace and add a kselftest for Zicbop - Convert open-coded instances of 'asm goto's that are controlled by runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(), following arm64's alternative_has_cap_{un,}likely() - Remove an unnecessary mask in the GFP flags used in some calls to pagetable_alloc() * tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: selftests/riscv: Add Zicbop prefetch test riscv: hwprobe: Expose Zicbop extension and its block size riscv: Introduce Zalasr instructions riscv: hwprobe: Export Zalasr extension dt-bindings: riscv: Add Zalasr ISA extension description riscv: Add ISA extension parsing for Zalasr selftests: riscv: Add test for the Vector ptrace interface riscv: ptrace: Optimize the allocation of vector regset raid6: test: Add support for RISC-V raid6: riscv: Allow code to be compiled in userspace raid6: riscv: Prevent compiler from breaking inline vector assembly code riscv: cmpxchg: Use riscv_has_extension_likely riscv: bitops: Use riscv_has_extension_likely riscv: hweight: Use riscv_has_extension_likely riscv: checksum: Use riscv_has_extension_likely riscv: pgtable: Use riscv_has_extension_unlikely riscv: Remove __GFP_HIGHMEM masking RISC-V: Enable HOTPLUG_PARALLEL for secondary CPUs	2025-12-05 16:26:57 -08:00
Linus Torvalds	ad952db4a8	Merge tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Restore clearing of MSR[RI] at interrupt/syscall exit on 32-bit - Fix unpaired stwcx on interrupt exit on 32-bit - Fix race condition leading to double list-add in mac_hid_toggle_emumouse() - Fix mprotect on book3s 32-bit - Fix SLB multihit issue during SLB preload with 64-bit hash MMU - Add support for crashkernel CMA reservation - Add die_id and die_cpumask for Power10 & later to expose chip hemispheres - A series of minor fixes and improvements to the hash SLB code Thanks to Antonio Alvarez Feijoo, Ben Collins, Bhaskar Chowdhury, Christophe Leroy, Daniel Thompson, Dave Vasilevsky, Donet Tom, J. Neuschäfer, Kunwu Chan, Long Li, Naresh Kamboju, Nathan Chancellor, Ritesh Harjani (IBM), Shirisha G, Shrikanth Hegde, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Thomas Zimmermann, Venkat Rao Bagalkote, and Vishal Chourasia. * tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (32 commits) macintosh/via-pmu-backlight: Include <linux/fb.h> and <linux/of.h> powerpc/powermac: backlight: Include <linux/of.h> powerpc/64s/slb: Add no_slb_preload early cmdline param powerpc/64s/slb: Make preload_add return type as void powerpc/ptdump: Dump PXX level info for kernel_page_tables powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash powerpc/64s/hash: Update directMap page counters for Hash powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU powerpc/64s/hash: Improve hash mmu printk messages powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize() powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit powerpc/64s/slb: Fix SLB multihit issue during SLB preload powerpc, mm: Fix mprotect on book3s 32-bit powerpc/smp: Expose die_id and die_cpumask powerpc/83xx: Add a null pointer check to mcu_gpiochip_add arch:powerpc:tools This file was missing shebang line, so added it kexec: Include kernel-end even without crashkernel powerpc: p2020: Rename wdt@ nodes to watchdog@ powerpc: 86xx: Rename wdt@ nodes to watchdog@ ...	2025-12-05 16:18:21 -08:00
Christian Brauner	87c9e88ac4	ovl: pass original credentials, not mounter credentials during create When creating new files the security layer expects the original credentials to be passed. When cleaning up the code this was accidently changed to pass the mounter's credentials by relying on current->cred which is already overriden at this point. Pass the original credentials directly. Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Reported-by: Paul Moore <paul@paul-moore.com> Fixes: `e566bff963` ("ovl: port ovl_create_or_link() to new ovl_override_creator_creds") Link: https://lore.kernel.org/CAFqZXNvL1ciLXMhHrnoyBmQu1PAApH41LkSWEhrcvzAAbFij8Q@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Tested-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-12-05 16:16:20 -08:00
Linus Torvalds	4b9d25b4d3	Merge tag 'vfs-6.19-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a type conversion bug in the ipc subsystem - Fix per-dentry timeout warning in autofs - Drop the fd conversion from sockets - Move assert from iput_not_last() to iput() - Fix reversed check in filesystems_freeze_callback() - Use proper uapi types for new struct delegation definitions * tag 'vfs-6.19-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: use UAPI types for new struct delegation definition mqueue: correct the type of ro to int Revert "net/socket: convert sock_map_fd() to FD_ADD()" autofs: fix per-dentry timeout warning fs: assert on I_FREEING not being set in iput() and iput_not_last() fs: PM: Fix reverse check in filesystems_freeze_callback()	2025-12-05 15:52:30 -08:00
Linus Torvalds	e40e023591	Merge tag 'exfat-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Fix a remount failure caused by differing process masks by inheriting the original mount options during the remount process - Fix a potential divide-by-zero error and system crash in exfat_allocate_bitmap that occurred when the readahead count was zero - Add validation for directory cluster bitmap bits to prevent directory and root cluster from being incorrectly zeroed out on corrupted images - Clear the post-EOF page cache when extending a file to prevent stale mmap data from becoming visible, addressing an generic/363 failure - Fix a reference count leak in exfat_find by properly releasing the dentry set in specific error paths * tag 'exfat-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix remount failure in different process environments exfat: fix divide-by-zero in exfat_allocate_bitmap exfat: validate the cluster bitmap bits of directory exfat: zero out post-EOF page cache on file extension exfat: fix refcount leak in exfat_find	2025-12-05 15:48:09 -08:00
Linus Torvalds	4b6b432128	Merge tag 'fuse-update-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add mechanism for cleaning out unused, stale dentries; controlled via a module option (Luis Henriques) - Fix various bugs - Cleanups * tag 'fuse-update-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Uninitialized variable in fuse_epoch_work() fuse: fix io-uring list corruption for terminated non-committed requests fuse: signal that a fuse inode should exhibit local fs behaviors fuse: Always flush the page cache before FOPEN_DIRECT_IO write fuse: Invalidate the page cache after FOPEN_DIRECT_IO write fuse: rename 'namelen' to 'namesize' fuse: use strscpy instead of strcpy fuse: refactor fuse_conn_put() to remove negative logic. fuse: new work queue to invalidate dentries from old epochs fuse: new work queue to periodically invalidate expired dentries dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused() fuse: add WARN_ON and comment for RCU revalidate fuse: Fix whitespace for fuse_uring_args_to_ring() comment fuse: missing copy_finish in fuse-over-io-uring argument copies fuse: fix readahead reclaim deadlock	2025-12-05 15:25:13 -08:00
Linus Torvalds	7cd122b552	Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull persistent dentry infrastructure and conversion from Al Viro: "Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it. Some pieces are still sitting in the local branches, but the bulk of that stuff is here" * tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) d_make_discardable(): warn if given a non-persistent dentry kill securityfs_recursive_remove() convert securityfs get rid of kill_litter_super() convert rust_binderfs convert nfsctl convert rpc_pipefs convert hypfs hypfs: swich hypfs_create_u64() to returning int hypfs: switch hypfs_create_str() to returning int hypfs: don't pin dentries twice convert gadgetfs gadgetfs: switch to simple_remove_by_name() convert functionfs functionfs: switch to simple_remove_by_name() functionfs: fix the open/removal races functionfs: need to cancel ->reset_work in ->kill_sb() functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() functionfs: don't abuse ffs_data_closed() on fs shutdown convert selinuxfs ...	2025-12-05 14:36:21 -08:00
Linus Torvalds	7203ca412f	Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit `ebb9aeb980` ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...	2025-12-05 13:52:43 -08:00
Maurice Hieronymus	c5108c58b9	tracing: Fix typo in trace_seq.c Fix typo "wont" to "won't". Link: https://patch.msgid.link/20251121221835.28032-15-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:41 -05:00
Maurice Hieronymus	0f17df72a7	tracing: Fix typo in trace_probe.c Fix typo "separater" to "separator". Link: https://patch.msgid.link/20251121221835.28032-14-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:41 -05:00
Maurice Hieronymus	fa3f733d97	tracing: Fix multiple typos in trace_osnoise.c Fix multiple typos in comments: "Anotate" -> "Annotate" "infor" -> "info" "timestemp" -> "timestamp" "tread" -> "thread" "varaibles" -> "variables" "wast" -> "waste" Link: https://patch.msgid.link/20251121221835.28032-13-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:41 -05:00
Maurice Hieronymus	6ce5725d73	tracing: Fix multiple typos in trace_events_user.c Fix multiple typos in comments: "ambigious" -> "ambiguous" "explictly" -> "explicitly" "Uknown" -> "Unknown" Link: https://patch.msgid.link/20251121221835.28032-12-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:41 -05:00
Maurice Hieronymus	0166d3e31a	tracing: Fix typo in trace_events_trigger.c Fix typo "componenents" to "components". Link: https://patch.msgid.link/20251121221835.28032-11-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:41 -05:00
Maurice Hieronymus	c29e75532e	tracing: Fix typo in trace_events_hist.c Fix typo "tigger" to "trigger". Link: https://patch.msgid.link/20251121221835.28032-10-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:40 -05:00
Maurice Hieronymus	86f320904e	tracing: Fix typo in trace_events_filter.c Fix typo "singe" to "single". Link: https://patch.msgid.link/20251121221835.28032-9-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:40 -05:00
Maurice Hieronymus	d4290963d5	tracing: Fix multiple typos in trace_events.c Fix multiple typos in comments: "appened" -> "appended" "paranthesis" -> "parenthesis" "parethesis" -> "parenthesis" "wont" -> "won't" Link: https://patch.msgid.link/20251121221835.28032-8-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:40 -05:00
Maurice Hieronymus	8d4cdbd45c	tracing: Fix multiple typos in trace.c Fix multiple typos in comments: "alse" -> "also" "enabed" -> "enabled" "instane" -> "instance" "outputing" -> "outputting" "seperated" -> "separated" Link: https://patch.msgid.link/20251121221835.28032-7-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:40 -05:00
Maurice Hieronymus	81354f6335	tracing: Fix typo in ring_buffer_benchmark.c Fix typo "overwite" to "overwrite". Link: https://patch.msgid.link/20251121221835.28032-6-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:40 -05:00
Maurice Hieronymus	1edb820ae9	tracing: Fix multiple typos in ring_buffer.c Fix multiple typos in comments: "ording" -> "ordering" "scatch" -> "scratch" "wont" -> "won't" Link: https://patch.msgid.link/20251121221835.28032-5-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:40 -05:00
Maurice Hieronymus	2ec7345c2d	tracing: Fix typo in fprobe.c Fix typo "funciton" to "function". Link: https://patch.msgid.link/20251121221835.28032-4-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:39 -05:00
Maurice Hieronymus	9c3f3b8fea	tracing: Fix typo in fpgraph.c Fix typo "reservered" to "reserved". Link: https://patch.msgid.link/20251121221835.28032-3-mhi@mailbox.org Signed-off-by: Maurice Hieronymus <mhi@mailbox.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:43:39 -05:00
Steven Rostedt	47ef834209	tracing: Fix fixed array of synthetic event The commit `4d38328eb4` ("tracing: Fix synth event printk format for str fields") replaced "%.*s" with "%s" but missed removing the number size of the dynamic and static strings. The commit `e1a453a57b` ("tracing: Do not add length to print format in synthetic events") fixed the dynamic part but did not fix the static part. That is, with the commands: # echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events # echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger # echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger That caused the output of: <idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155 sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58 <idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91 The commit `e1a453a57b` fixed the part where the synthetic event had "char[] wakee". But if one were to replace that with a static size string: # echo 's:wake_lat char[16] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events Where "wakee" is defined as "char[16]" and not "char[]" making it a static size, the code triggered the "(efaul)" again. Remove the added STR_VAR_LEN_MAX size as the string is still going to be nul terminated. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Link: https://patch.msgid.link/20251204151935.5fa30355@gandalf.local.home Fixes: `e1a453a57b` ("tracing: Do not add length to print format in synthetic events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:38:10 -05:00
Steven Rostedt	02e7769e38	tracing: Fix enabling of tracing on file release The trace file will pause tracing if the tracing instance has the "pause-on-trace" option is set. This happens when the file is opened, and it is unpaused when the file is closed. When this was first added, there was only one user that paused tracing. On open, the check to pause was: if (!iter->snapshot && (tr->trace_flags & TRACE_ITER(PAUSE_ON_TRACE))) Where if it is not the snapshot tracer and the "pause-on-trace" option is set, then it increments a "stop_count" of the trace instance. On close, the check is: if (!iter->snapshot && tr->stop_count) That is, if it is not the snapshot buffer and it was stopped, it will re-enable tracing. Now there's more places that stop tracing. This means, if something else stops tracing the tr->stop_count will be non-zero, and that means if the trace file is closed, it will decrement the stop_count even though it never incremented it. This causes a warning because when the user that stopped tracing enables it again, the stop_count goes below zero. Instead of relying on the stop_count being set to know if the close of the trace file should enable tracing again, add a new flag to the trace iterator. The trace iterator is unique per open of the trace file, and if the open stops tracing set the trace iterator PAUSE flag. On close, if the PAUSE flag is set, then re-enable it again. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251202161751.24abaaf1@gandalf.local.home Fixes: `06e0a548ba` ("tracing: Do not disable tracing when reading the trace file") Reported-by: syzbot+ccdec3bfe0beec58a38d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/692f44a5.a70a0220.2ea503.00c8.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-12-05 15:17:56 -05:00
Linus Torvalds	ac20755937	Merge tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move jiffies converters out of kernel/sysctl.c Move the jiffies converters into kernel/time/jiffies.c and replace the pipe-max-size proc_handler converter with a macro based version. This is all part of the effort to relocate non-sysctl logic out of kernel/sysctl.c into more relevant subsystems. No functional changes. - Generalize proc handler converter creation Remove duplicated sysctl converter logic by consolidating it in macros. These are used inside sysctl core as well as in pipe.c and jiffies.c. Converter kernel and user space pointer args are now automatically const qualified for the convenience of the caller. No functional changes. - Miscellaneous Fix kernel-doc format warnings, remove unnecessary __user qualifiers, and move the nmi_watchdog sysctl into .rodata. - Testing This series was run through sysctl selftests/kunit test suite in x86_64. It went into linux-next after rc2, giving it a good 4/5 weeks of testing. * tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (21 commits) sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv sysctl: Create pipe-max-size converter using sysctl UINT macros sysctl: Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c sysctl: Move jiffies converters to kernel/time/jiffies.c sysctl: Move UINT converter macros to sysctl header sysctl: Move INT converter macros to sysctl header sysctl: Allow custom converters from outside sysctl sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument sysctl: Create macro for user-to-kernel uint converter sysctl: Add optional range checking to SYSCTL_UINT_CONV_CUSTOM sysctl: Create unsigned int converter using new macro sysctl: Add optional range checking to SYSCTL_INT_CONV_CUSTOM sysctl: Create integer converters with one macro sysctl: Create converter functions with two new macros sysctl: Discriminate between kernel and user converter params sysctl: Indicate the direction of operation with macro names sysctl: Remove superfluous __do_proc_* indirection sysctl: Remove superfluous tbl_data param from "dovec" functions sysctl: Replace void pointer with const pointer to ctl_table sysctl: fix kernel-doc format warning ...	2025-12-05 11:15:37 -08:00
Linus Torvalds	d1d36025a6	Merge tag 'probes-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: "fprobe performance enhancement using rhltable: - use rhltable for fprobe_ip_table. The fprobe IP table has been converted to use an rhltable for improved performance when dealing with a large number of probed functions - Fix a suspicious RCU usage warning of the above change in the fprobe entry handler - Remove an unused local variable of the above change - Fix to initialize fprobe_ip_table in core_initcall() Performance optimization of fprobe by ftrace: - Use ftrace instead of fgraph for entry only probes. This avoids the unneeded overhead of fgraph stack setup - Also update fprobe selftest for entry-only probe - fprobe: Use ftrace only if CONFIG_DYNAMIC_FTRACE_WITH_ARGS or WITH_REGS is defined Cleanup probe event subsystems: - Allocate traceprobe_parse_context per probe instead of each probe argument parsing. This reduce memory allocation/free of temporary working memory - Cleanup code using __free() - Replace strcpy() with memcpy() in __trace_probe_log_err()" * tag 'probes-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: fprobe: use ftrace if CONFIG_DYNAMIC_FTRACE_WITH_ARGS lib/test_fprobe: add testcase for mixed fprobe tracing: fprobe: optimization for entry only case tracing: fprobe: Fix to init fprobe_ip_table earlier tracing: fprobe: Remove unused local variable tracing: probes: Replace strcpy() with memcpy() in __trace_probe_log_err() tracing: fprobe: fix suspicious rcu usage in fprobe_entry tracing: uprobe: eprobes: Allocate traceprobe_parse_context per probe tracing: uprobes: Cleanup __trace_uprobe_create() with __free() tracing: eprobe: Cleanup eprobe event using __free() tracing: probes: Use __free() for trace_probe_log tracing: fprobe: use rhltable for fprobe_ip_table	2025-12-05 10:55:47 -08:00
Linus Torvalds	2e8c1c6a50	Merge tag 'ktest-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest Pull ktest fix from Steven Rostedt: - Fix incorrect variable in error message in config-bisect.pl If the old config file fails to get copied as the last good or bad config file, then it fails the program and prints an error message. But the variable used to print what the old config's name was incorrect. It was $config when it should have been $output_config. * tag 'ktest-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: ktest.pl: Fix uninitialized var in config-bisect.pl	2025-12-05 10:53:43 -08:00
Linus Torvalds	2ba59045fb	Merge tag 'trace-ringbuffer-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull trace ring-buffer cleanup from Steven Rostedt: - Add helper functions for allocations The allocation of the per CPU buffer descriptor, the buffer page descriptors and the buffer page data itself can be pretty ugly. Add some helper macros and a function to have the code that allocates buffer pages and such look a little cleaner. * tag 'trace-ringbuffer-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Add helper functions for allocations	2025-12-05 10:50:24 -08:00
Arnaldo Carvalho de Melo	2eeb09fe1c	libperf: Use 'extern' in LIBPERF_API visibility macro Use 'extern' on LIBPERF_API to address this issue that started appearing with gcc 15, first seen in ubuntu 25.10: evlist.c: In function 'perf_evlist__purge': evlist.c:202:17: error: implicit declaration of function 'perf_evsel__delete'; did you mean 'perf_evsel__exit'? [-Wimplicit-function-declaration] 202 \| perf_evsel__delete(pos); \| ^~~~~~~~~~~~~~~~~~ \| perf_evsel__exit evlist.c:202:17: error: nested extern declaration of 'perf_evsel__delete' [-Werror=nested-externs] evlist.c: In function 'perf_evlist__open': evlist.c:261:23: error: implicit declaration of function 'perf_evsel__open'; did you mean 'perf_evsel__exit'? [-Wimplicit-function-declaration] 261 \| err = perf_evsel__open(evsel, evsel->cpus, evsel->threads); \| ^~~~~~~~~~~~~~~~ \| perf_evsel__exit evlist.c:261:23: error: nested extern declaration of 'perf_evsel__open' [-Werror=nested-externs] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-05 10:31:32 -08:00
Linus Torvalds	0b1b4a3d8e	Merge tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verifier updates from Steven Rostedt: - Adapt the ftracetest script to be run from a different folder This uses the already existing OPT_TEST_DIR but extends it further to run independent tests, then add an --rv flag to allow using the script for testing RV (mostly) independently on ftrace. - Add basic RV selftests in selftests/verification for more validations Add more validations for available/enabled monitors and reactors. This could have caught the bug introducing kernel panic solved above. Tests use ftracetest. - Convert react() function in reactor to use va_list directly Use a central helper to handle the variadic arguments. Clean up macros and mark functions as static. - Add lockdep annotations to reactors to have lockdep complain of errors If the reactors are called from improper context. Useful to develop new reactors. This highlights a warning in the panic reactor that is related to the printk subsystem and not to RV. - Convert core RV code to use lock guards and __free helpers This completely removes goto statements. - Fix compilation if !CONFIG_RV_REACTORS Fix the warning by keeping LTL monitor variable as always static. * tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rv: Fix compilation if !CONFIG_RV_REACTORS rv: Convert to use __free rv: Convert to use lock guard rv: Add explicit lockdep context for reactors rv: Make rv_reacting_on() static rv: Pass va_list to reactors selftests/verification: Add initial RV tests selftest/ftrace: Generalise ftracetest to use with RV	2025-12-05 10:17:00 -08:00
Linus Torvalds	0771cee974	Merge tag 'ftrace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Fix regression of pid filtering of function graph tracer When the function graph tracer allowed multiple instances of graph tracing using subops, the filtering by pid broke. The ftrace_ops->private that was used for pid filtering wasn't updated on creation. The wrong function entry callback was used when pid filtering was enabled when the function graph tracer started, which meant that the pid filtering wasn't happening. - Remove no longer needed ftrace_trace_task() With PID filtering working via ftrace_pids_enabled() and fgraph_pid_func(), the coarse-grained ftrace_trace_task() check in graph_entry() is obsolete. It was only a fallback for uninitialized op->private (now fixed), and its removal ensures consistent PID filtering with standard function tracing. * tag 'ftrace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: fgraph: Remove coarse PID filtering from graph_entry() fgraph: Check ftrace_pids_enabled on registration for early filtering fgraph: Initialize ftrace_ops->private for function graph ops	2025-12-05 10:13:04 -08:00
Linus Torvalds	69c5079b49	Merge tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Extend tracing option mask to 64 bits The trace options were defined by a 32 bit variable. This limits the tracing instances to have a total of 32 different options. As that limit has been hit, and more options are being added, increase the option mask to a 64 bit number, doubling the number of options available. As this is required for the kprobe topic branches as well as the tracing topic branch, a separate branch was created and merged into both. - Make trace_user_fault_read() available for the rest of tracing The function trace_user_fault_read() is used by trace_marker file read to allow reading user space to be done fast and without locking or allocations. Make this available so that the system call trace events can use it too. - Have system call trace events read user space values Now that the system call trace events callbacks are called in a faultable context, take advantage of this and read the user space buffers for various system calls. For example, show the path name of the openat system call instead of just showing the pointer to that path name in user space. Also show the contents of the buffer of the write system call. Several system call trace events are updated to make tracing into a light weight strace tool for all applications in the system. - Update perf system call tracing to do the same - And a config and syscall_user_buf_size file to control the size of the buffer Limit the amount of data that can be read from user space. The default size is 63 bytes but that can be expanded to 165 bytes. - Allow the persistent ring buffer to print system calls normally The persistent ring buffer prints trace events by their type and ignores the print_fmt. This is because the print_fmt may change from kernel to kernel. As the system call output is fixed by the system call ABI itself, there's no reason to limit that. This makes reading the system call events in the persistent ring buffer much nicer and easier to understand. - Add options to show text offset to function profiler The function profiler that counts the number of times a function is hit currently lists all functions by its name and offset. But this becomes ambiguous when there are several functions with the same name. Add a tracing option that changes the output to be that of '_text+offset' instead. Now a user space tool can use this information to map the '_text+offset' to the unique function it is counting. - Report bad dynamic event command If a bad command is passed to the dynamic_events file, report it properly in the error log. - Clean up tracer options Clean up the tracer option code a bit, by removing some useless code and also using switch statements instead of a series of if statements. - Have tracing options be instance specific Tracers can have their own options (function tracer, irqsoff tracer, function graph tracer, etc). But now that the same tracer can be enabled in multiple trace instances, their options are still global. The API is per instance, thus changing one affects other instances. This isn't even consistent, as the option take affect differently depending on when an tracer started in an instance. Make the options for instances only affect the instance it is changed under. - Optimize pid_list lock contention Whenever the pid_list is read, it uses a spin lock. This happens at every sched switch. Taking the lock at sched switch can be removed by instead using a seqlock counter. - Clean up the trace trigger structures The trigger code uses two different structures to implement a single tigger. This was due to trying to reuse code for the two different types of triggers (always on trigger, and count limited trigger). But by adding a single field to one structure, the other structure could be absorbed into the first structure making he code easier to understand. - Create a bulk garbage collector for trace triggers If user space has triggers for several hundreds of events and then removes them, it can take several seconds to complete. This is because each removal calls tracepoint_synchronize_unregister() that can take hundreds of milliseconds to complete. Instead, create a helper thread that will do the clean up. When a trigger is removed, it will create the kthread if it isn't already created, and then add the trigger to a llist. The kthread will take the items off the llist, call tracepoint_synchronize_unregister(), and then remove the items it took off. It will then check if there's more items to free before sleeping. This makes user space removing all these triggers to finish in less than a second. - Allow function tracing of some of the tracing infrastructure code Because the tracing code can cause recursion issues if it is traced by the function tracer the entire tracing directory disables function tracing. But not all of tracing causes issues if it is traced. Namely, the event tracing code. Add a config that enables some of the tracing code to be traced to help in debugging it. Note, when this is enabled, it does add noise to general function tracing, especially if events are enabled as well (which is a common case). - Add boot-time backup instance for persistent buffer The persistent ring buffer is used mostly for kernel crash analysis in the field. One issue is that if there's a crash, the data in the persistent ring buffer must be read before tracing can begin using it. This slows down the boot process. Once tracing starts in the persistent ring buffer, the old data must be freed and the addresses no longer match and old events can't be in the buffer with new events. Create a way to create a backup buffer that copies the persistent ring buffer at boot up. Then after a crash, the always on tracer can begin immediately as well as the normal boot process while the crash analysis tooling uses the backup buffer. After the backup buffer is finished being read, it can be removed. - Enable function graph args and return address options at the same time Currently the when reading of arguments in the function graph tracer is enabled, the option to record the parent function in the entry event can not be enabled. Update the code so that it can. - Add new struct_offset() helper macro Add a new macro that takes a pointer to a structure and a name of one of its members and it will return the offset of that member. This allows the ring buffer code to simplify the following: From: size = struct_size(entry, buf, cnt - sizeof(entry->id)); To: size = struct_offset(entry, id) + cnt; There should be other simplifications that this macro can help out with as well * tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (42 commits) overflow: Introduce struct_offset() to get offset of member function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously tracing: Add boot-time backup of persistent ring buffer ftrace: Allow tracing of some of the tracing code tracing: Use strim() in trigger_process_regex() instead of skip_spaces() tracing: Add bulk garbage collection of freeing event_trigger_data tracing: Remove unneeded event_mutex lock in event_trigger_regex_release() tracing: Merge struct event_trigger_ops into struct event_command tracing: Remove get_trigger_ops() and add count_func() from trigger ops tracing: Show the tracer options in boot-time created instance ftrace: Avoid redundant initialization in register_ftrace_direct tracing: Remove unused variable in tracing_trace_options_show() fgraph: Make fgraph_no_sleep_time signed tracing: Convert function graph set_flags() to use a switch() statement tracing: Have function graph tracer option sleep-time be per instance tracing: Move graph-time out of function graph options tracing: Have function graph tracer option funcgraph-irqs be per instance trace/pid_list: optimize pid_list->lock contention tracing: Have function graph tracer define options per instance tracing: Have function tracer define options per instance ...	2025-12-05 09:51:37 -08:00
Linus Torvalds	36492b7141	Merge tag 'tracepoints-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull unused tracepoints update from Steven Rostedt: "Detect unused tracepoints. If a tracepoint is defined but never used (TRACE_EVENT() created but no trace_<tracepoint>() called), it can take up to or more than 5K of memory each. This can add up as there are around a hundred unused tracepoints with various configs. That is 500K of wasted memory. Add a make build parameter of "UT=1" to have the build warn if an unused tracepoint is detected in the build. This allows detection of unused tracepoints to be upstream so that outreachy and the mentoring project can have new developers look for fixing them, without having these warnings suddenly show up when someone upgrades their kernel. When all known unused tracepoints are removed, then the "UT=1" build parameter can be removed and unused tracepoints will always warn. This will catch new unused tracepoints after the current ones have been removed. Summary: - Separate out elf functions from sorttable.c Move out the ELF parsing functions from sorttable.c so that the tracing tooling can use it. - Add a tracepoint verifier tool to the build process If "UT=1" is added to the kernel command line, any unused tracepoints will trigger a warning at build time. - Do not warn about unused tracepoints for tracepoints that are exported There are sever cases where a tracepoint is created by the kernel and used by modules. Since there's no easy way to detect if these are truly unused since the users are in modules, if a tracepoint is exported, assume it will eventually be used by a module. Note, there's not many exported tracepoints so this should not be a problem to ignore them. - Have building of modules also detect unused tracepoints Do not only check the main vmlinux for unused tracepoints, also check modules. If a module is defining a tracepoint it should be using it. - Add the tracepoint-update program to the ignore file The new tracepoint-update program needs to be ignored by git" * tag 'tracepoints-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: scripts: add tracepoint-update to the list of ignores files tracing: Add warnings for unused tracepoints for modules tracing: Allow tracepoint-update.c to work with modules tracepoint: Do not warn for unused event that is exported tracing: Add a tracepoint verification check at build time sorttable: Move ELF parsing into scripts/elf-parse.[ch]	2025-12-05 09:37:41 -08:00
Linus Torvalds	5779de8d36	Merge tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull rtla trace tooling updates from Steven Rostedt: - Officially add Tomas Glozar as a maintainer to RTLA tool - Add for_each_monitored_cpu() helper In multiple places, RTLA tools iterate over the list of CPUs running tracer threads. Use single helper instead of repeating the for/if combination. - Remove unused variable option_index in argument parsing RTLA tools use getopt_long() for argument parsing. For its last argument, an unused variable "option_index" is passed. Remove the variable and pass NULL to getopt_long() to shorten the naturally long parsing functions, and make them more readable. - Fix unassigned nr_cpus after code consolidation In recent code consolidation, timerlat tool cleanup, previously implemented separately for each tool, was moved to a common function timerlat_free(). The cleanup relies on nr_cpus being set. This was not done in the new function, leaving the variable uninitialized. Initialize the variable properly, and remove silencing of compiler warning for uninitialized variables. - Stop tracing on user latency in BPF mode Despite the name, rtla-timerlat's -T/--thread option sets timerlat's stop_tracing_total_us option, which also stops tracing on return-from-user latency, not only on thread latency. Implement the same behavior also in BPF sample collection stop tracing handler to avoid a discrepancy and restore correspondence of behavior with the equivalent option of cyclictest. - Fix threshold actions always triggering A bug in threshold action logic caused the action to execute even if tracing did not stop because of threshold. Fix the logic to stop correctly. - Fix few minor issues in tests Extend tests that were shown to need it to 5s, fix osnoise test calling timerlat by mistake, and use new, more reliable output checking in timerlat's "top stop at failed action" test. - Do not print usage on argument parsing error RTLA prints the entire usage message on encountering errors in argument parsing, like a malformed CPU list. The usage message has gotten too long. Instead of printing it, use newly added fatal() helper function to simply exit with the error message, excluding the usage. - Fix unintuitive -C/--cgroup interface "-C cgroup" and "--cgroup cgroup" are invalid syntax, despite that being a common way to specify an option with argument. Moreover, using them fails silently and no cgroup is set. Create new helper function to unify the handling of all such options and allow all of: -Xsomething -X=something -X something as well as the equivalent for the long option. - Fix -a overriding -t argument filename Fix a bug where -a following -t custom_file.txt overrides the custom filename with the default timerlat_trace.txt. - Stop tracing correctly on multiple events at once In some race scenarios, RTLA BPF sample collection might send multiple stop tracing events via the BPF ringbuffer at once. Compare the number of events for != 0 instead of == 1 to cover for this scenario and stop tracing properly. * tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla/timerlat: Exit top main loop on any non-zero wait_retval rtla/tests: Don't rely on matching ^1ALL rtla: Fix -a overriding -t argument rtla: Fix -C/--cgroup interface tools/rtla: Replace osnoise_hist_usage("...") with fatal("...") tools/rtla: Replace osnoise_top_usage("...") with fatal("...") tools/rtla: Replace timerlat_hist_usage("...") with fatal("...") tools/rtla: Replace timerlat_top_usage("...") with fatal("...") tools/rtla: Add fatal() and replace error handling pattern rtla/tests: Fix osnoise test calling timerlat rtla/tests: Extend action tests to 5s tools/rtla: Fix --on-threshold always triggering rtla/timerlat_bpf: Stop tracing on user latency tools/rtla: Fix unassigned nr_cpus tools/rtla: Remove unused optional option_index tools/rtla: Add for_each_monitored_cpu() helper MAINTAINERS: Add Tomas Glozar as a maintainer to RTLA tool	2025-12-05 09:34:01 -08:00
Linus Torvalds	ed1b409137	Merge tag 'hardening-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - string: Add missing kernel-doc return descriptions (Kriish Sharma) - Update some mis-typed allocations These correct some accidentally wrong types used in allocations (that didn't affect the resulting size) that never got picked up from the batch I sent a few months ago. - Enable GCC diagnostic context for value-tracking warnings This results in better GCC diagnostics for the value range tracking, so we can get better visibility into where those values are coming from when we get out-of-bounds warnings at compile time. * tag 'hardening-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kbuild: Enable GCC diagnostic context for value-tracking warnings string: Add missing kernel-doc return descriptions media: iris: Cast iris_hfi_gen2_get_instance() allocation type drm/plane: Remove const qualifier from plane->modifiers allocation type comedi: Adjust range_table_list allocation type	2025-12-05 09:11:02 -08:00
Linus Torvalds	3ee37abbbd	Merge tag 'pstore-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: - pstore/ram: Update module parameters from platform data (Tzung-Bi Shih) * tag 'pstore-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Update module parameters from platform data	2025-12-05 09:08:13 -08:00
Linus Torvalds	5d45c729ed	Merge tag 'configfs-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux Pull configfs updates from Andreas Hindborg: "Two commits changing constness of the configfs vtable pointers. We plan to follow up with changes at call sites down the road" * tag 'configfs-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux: configfs: Constify ct_item_ops in struct config_item_type configfs: Constify ct_group_ops in struct config_item_type	2025-12-05 08:59:41 -08:00
Arnd Bergmann	79edb7f596	Merge tag 'samsung-dt-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/dt Samsung DTS ARM changes for v6.19 Fix WiFi on Exynos4210 and Exynos4412 boards with Broadcom chip after system suspend and resume, by using cap-power-off-card to power off the WiFi during suspend. * tag 'samsung-dt-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: ARM: dts: samsung: exynos4412-midas: turn off SDIO WLAN chip during system suspend ARM: dts: samsung: exynos4210-trats: turn off SDIO WLAN chip during system suspend ARM: dts: samsung: exynos4210-i9100: turn off SDIO WLAN chip during system suspend ARM: dts: samsung: universal_c210: turn off SDIO WLAN chip during system suspend Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-12-05 15:32:21 +01:00
Arnd Bergmann	68f9bbf4df	Merge tag 'samsung-drivers-6.19-2-late' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers-late Samsung SoC drivers for v6.19, part two Two fixes for Exynos PMU (Power Management Unit) driver: 1. Silence lockdep warning being actually a false positive, but quite disturbing during testing. Issue was introduced in v6.18. 2. Drop device refcount when requesting device regmap with exynos_get_pmu_regmap_by_phandle(). Issue was introduced much earlier (around v6.9), with code being rewritten in between. * tag 'samsung-drivers-6.19-2-late' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: soc: samsung: exynos-pmu: fix device leak on regmap lookup soc: samsung: exynos-pmu: Fix structure initialization Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-12-05 15:31:33 +01:00
Arnd Bergmann	3ce8f4a501	ARM: omap1: avoid symbol clashes in fiq handler The ams-delta-fiq-handler.S file has a number of symbols with fairly generic names, including one named 'exit' that causes a compiler warning in some configuration options: vmlinux.o: error: exit() function name creates ambiguity with -ffunction-sections Change all these symbols to use a .L prefix to make them local to the fiq handler. Reviewed-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Link: https://lore.kernel.org/r/20251204095355.1032786-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-12-05 15:25:13 +01:00
Thomas Weißschuh	fe93446b5e	vfs: use UAPI types for new struct delegation definition Using libc types and headers from the UAPI headers is problematic as it introduces a dependency on a full C toolchain. Use the fixed-width integer types provided by the UAPI headers instead. Fixes: `1602bad16d` ("vfs: expose delegation support to userland") Fixes: `4be9e04ebf` ("vfs: add needed headers for new struct delegation definition") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20251203-uapi-fcntl-v1-1-490c67bf3425@linutronix.de Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-05 13:57:39 +01:00
Edward Adam Davis	8cf01d0c43	mqueue: correct the type of ro to int The ro variable, being of type bool, caused the -EROFS return value from mnt_want_write() to be implicitly converted to 1. This prevented the file from being correctly acquired, thus triggering the issue reported by syzbot [1]. Changing the type of ro to int allows the system to correctly identify the reason for the file open failure. [1] KASAN: null-ptr-deref in range [0x0000000000000040-0x0000000000000047] Call Trace: do_mq_open+0x5a0/0x770 ipc/mqueue.c:932 __do_sys_mq_open ipc/mqueue.c:945 [inline] __se_sys_mq_open ipc/mqueue.c:938 [inline] __x64_sys_mq_open+0x16a/0x1c0 ipc/mqueue.c:938 Fixes: `f2573685bd` ("ipc: convert do_mq_open() to FD_ADD()") Reported-by: syzbot+40f42779048f7476e2e0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=40f42779048f7476e2e0 Tested-by: syzbot+40f42779048f7476e2e0@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Link: https://patch.msgid.link/tencent_369728EA76ED36CD98793A6D942C956C4C0A@qq.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-05 13:57:39 +01:00
Christian Brauner	afb9917d9b	Revert "net/socket: convert sock_map_fd() to FD_ADD()" This reverts commit `245f0d1c62`. When allocating a file sock_alloc_file() consumes the socket reference unconditionally which isn't correctly handled in the conversion. This can be fixed by massaging this appropriately but this is best left for next cycle. Reported-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/CADvbK_ewub4ZZK-tZg8GBQbDFHWhd9a48C+AFXZ93pMsssCrUg@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-05 13:57:39 +01:00
Eric Sandeen	3e281113f8	9p: fix new mount API cache option handling After commit `4eb3117888`, 9p needs to be able to accept numerical cache= mount options as well as the string "shortcuts" because the option is printed numerically in /proc/mounts rather than by string. This was missed in the mount API conversion, which used an enum for the shortcuts and therefore could not handle a numeric equivalent as an argument to the cache option. Fix this by removing the enum and reverting to the slightly more open-coded option handling for Opt_cache, with the reinstated get_cache_mode() helper. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Message-ID: <48cdeec9-5bb9-4c7a-a203-39bb8e0ef443@redhat.com> Tested-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-12-05 12:54:05 +00:00
Eric Sandeen	f044561331	9p: fix cache/debug options printing in v9fs_show_options commit `4eb3117888` changed the cache= option to accept either string shortcuts or bitfield values. It also changed /proc/mounts to emit the option as the hexadecimal numeric value rather than the shortcut string. However, by printing "cache=%x" without the leading 0x, shortcuts such as "cache=loose" will emit "cache=f" and 'f' is not a string that is parseable by kstrtoint(), so remounting may fail if a remount with "cache=f" is attempted. debug=%x has had the same problem since options have been displayed in `c4fac91004` ("9p: Implement show_options") Fix these by adding the 0x prefix to the hexadecimal value shown in /proc/mounts. Fixes: `4eb3117888` ("fs/9p: Rework cache modes and add new options to Documentation") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Message-ID: <54b93378-dcf1-4b04-922d-c8b4393da299@redhat.com> [Dominique: use %#x at Al Viro's suggestion, also handle debug] Tested-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-12-05 12:53:16 +00:00
Abdun Nihaal	164312662a	fbdev: ssd1307fb: fix potential page leak in ssd1307fb_probe() The page allocated for vmem using __get_free_pages() is not freed on the error paths after it. Fix that by adding a corresponding __free_pages() call to the error path. Fixes: `facd94bc45` ("fbdev: ssd1307fb: Allocate page aligned video memory.") Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in> Signed-off-by: Helge Deller <deller@gmx.de>	2025-12-05 11:29:14 +01:00
David Laight	150215b89b	drivers/xen: use min() instead of min_t() min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'. Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long' and so cannot discard significant bits. In this case the 'unsigned long' value is small enough that the result is ok. Detected by an extra check added to min_t(). Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20251119224140.8616-30-david.laight.linux@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2025-12-05 08:46:07 +01:00
Jarkko Sakkinen	b7960b9048	tpm2-sessions: Open code tpm_buf_append_hmac_session() Open code 'tpm_buf_append_hmac_session_opt' to the call site, as it only masks a call sequence and does otherwise nothing particularly useful. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com> Reviewed-by: Jonathan McDowell <noodles@meta.com>	2025-12-05 06:42:51 +02:00
Jarkko Sakkinen	bc677a9216	tpm2-sessions: Remove 'attributes' parameter from tpm_buf_append_auth Remove 'attributes' parameter from 'tpm_buf_append_auth', as it is not used by the function. Fixes: `27184f8905` ("tpm: Opt-in in disable PCR integrity protection") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com> Reviewed-by: Jonathan McDowell <noodles@meta.com>	2025-12-05 06:42:51 +02:00

1 2 3 4 5 ...

1410962 Commits