Pull ARM fix from Russell King:
"Just one fix to the module freeing function that was declared __weak
when it should not have been. Thanks to Petr Pavlu for spotting this"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9458/1: module: Ensure the override of module_arch_freeing_init()
Pull tracing fixes from Steven Rostedt:
- Fix buffer overflow in osnoise_cpu_write()
The allocated buffer to read user space did not add a nul terminating
byte after copying from user the string. It then reads the string,
and if user space did not add a nul byte, the read will continue
beyond the string.
Add a nul terminating byte after reading the string.
- Fix missing check for lockdown on tracing
There's a path from kprobe events or uprobe events that can update
the tracing system even if lockdown on tracing is activate. Add a
check in the dynamic event path.
- Add a recursion check for the function graph return path
Now that fprobes can hook to the function graph tracer and call
different code between the entry and the exit, the exit code may now
call functions that are not called in entry. This means that the exit
handler can possibly trigger recursion that is not caught and cause
the system to crash.
Add the same recursion checks in the function exit handler as exists
in the entry handler path.
* tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: fgraph: Protect return handler from recursion loop
tracing: dynevent: Add a missing lockdown check on dynevent
tracing/osnoise: Fix slab-out-of-bounds in _parse_integer_limit()
Pull spi fixes from Mark Brown:
"A few final driver specific fixes that have been sitting in -next for
a bit.
The OMAP issue is likely to come up very infrequently since mixed
configuration SPI buses are rare and the Cadence issue is specific to
SoCFPGA systems"
* tag 'spi-fix-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: omap2-mcspi: drive SPI_CLK on transfer_setup()
spi: cadence-qspi: defer runtime support on socfpga if reset bit is enabled
Pull misc fixes from Andrew Morton:
"7 hotfixes. 4 are cc:stable and the remainder address post-6.16 issues
or aren't considered necessary for -stable kernels. 6 of these fixes
are for MM.
All singletons, please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines
mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()
mailmap: add entry for Bence Csókás
fs/proc/task_mmu: check p->vec_buf for NULL
kmsan: fix out-of-bounds access to shadow memory
mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
mm/hugetlb: fix folio is still mapped when deleted
While applying the patch for commit ede965fd55 ("i2c: rtl9300: remove
broken SMBus Quick operation support"), a conflict was incorrectly solved
by adding the I2C_FUNC_SMBUS_I2C_BLOCK feature flag. But the code to handle
I2C_SMBUS_I2C_BLOCK_DATA requests will be added by a separate commit.
Fixes: ede965fd55 ("i2c: rtl9300: remove broken SMBus Quick operation support")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
The email address bounced. I couldn't find a newer one in recent git history,
so delete this email entry.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Pull rtla tool fixes from Steven Rostedt:
- Fix a buffer overflow in actions_parse()
The "trigger_c" variable did not account for the nul byte when
determining its size
- Fix a compare that had the values reversed
actions_destroy() is supposed to reallocate when len is greater than
the current size, but the compare was testing if size is greater than
the new length
* tag 'trace-tools-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rtla/actions: Fix condition for buffer reallocation
rtla: Fix buffer overflow in actions_parse
function_graph_enter_regs() prevents itself from recursion by
ftrace_test_recursion_trylock(), but __ftrace_return_to_handler(),
which is called at the exit, does not prevent such recursion.
Therefore, while it can prevent recursive calls from
fgraph_ops::entryfunc(), it is not able to prevent recursive calls
to fgraph from fgraph_ops::retfunc(), resulting in a recursive loop.
This can lead an unexpected recursion bug reported by Menglong.
is_endbr() is called in __ftrace_return_to_handler -> fprobe_return
-> kprobe_multi_link_exit_handler -> is_endbr.
To fix this issue, acquire ftrace_test_recursion_trylock() in the
__ftrace_return_to_handler() after unwind the shadow stack to mark
this section must prevent recursive call of fgraph inside user-defined
fgraph_ops::retfunc().
This is essentially a fix to commit 4346ba1604 ("fprobe: Rewrite
fprobe on function-graph tracer"), because before that fgraph was
only used from the function graph tracer. Fprobe allowed user to run
any callbacks from fgraph after that commit.
Reported-by: Menglong Dong <menglong8.dong@gmail.com>
Closes: https://lore.kernel.org/all/20250918120939.1706585-1-dongml2@chinatelecom.cn/
Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/175852292275.307379.9040117316112640553.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Menglong Dong <menglong8.dong@gmail.com>
Acked-by: Menglong Dong <menglong8.dong@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull RISC-V fixes from Paul Walmsley:
- A race-free implementation of pudp_huge_get_and_clear() (based on the
x86 code)
- A MAINTAINERS update to my E-mail address
* tag 'riscv-for-linus-v6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Update Paul Walmsley's E-mail address
riscv: Use an atomic xchg in pudp_huge_get_and_clear()
Pull x86 fixes from Ingo Molnar:
"Fix a CPU topology code regression that caused the mishandling of
certain boot command line options, and re-enable CONFIG_PTDUMP on i386
that was mistakenly turned off in the Kconfig"
* tag 'x86-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Implement topology_is_core_online() to address SMT regression
x86/Kconfig: Reenable PTDUMP on i386
Pull scheduler fixes from Ingo Molnar:
"Fix two dl_server regressions: a race that can end up leaving the
dl_server stuck, and a dl_server throttling bug causing lag to fair
tasks"
* tag 'sched-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix dl_server behaviour
sched/deadline: Fix dl_server getting stuck
Pull locking fixes from Ingo Molnar:
"Fix a PI-futexes race, and fix a copy_process() futex cleanup bug"
* tag 'locking-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Use correct exit on failure from futex_hash_allocate_default()
futex: Prevent use-after-free during requeue-PI
Pull core fix from Ingo Molnar:
"Fix a CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y bug on older Clang versions"
* tag 'core-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17
Pull smb client fixes from Steve French:
- Fix unlink bug
- Fix potential out of bounds access in processing compound requests
* tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix wrong index reference in smb2_compound_op()
smb: client: handle unlink(2) of files open by different clients
Pull vfs fixes from Christian Brauner:
- Prevent double unlock in netfs
- Fix a NULL pointer dereference in afs_put_server()
- Fix a reference leak in netfs
* tag 'vfs-6.17-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs: fix reference leak
afs: Fix potential null pointer dereference in afs_put_server
netfs: Prevent duplicate unlocking
Pull pmdomain fix from Ulf Hansson:
- mediatek: Make sure MT8195 AUDIO power domain isn't left powered-on
* tag 'pmdomain-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: mediatek: set default off flag for MT8195 AUDIO power domain
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and New HW Supoort
- amd/pmc: Use 8042 quirk for Stellaris Slim Gen6 AMD
- dell: Set USTT mode according to BIOS after reboot
- dell-lis3lv02d: Add Latitude E6530
- lg-laptop: Fix setting the fan mode"
* tag 'platform-drivers-x86-v6.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: lg-laptop: Fix WMAB call in fan_mode_store()
platform/x86: dell-lis3lv02d: Add Latitude E6530
platform/x86/dell: Set USTT mode according to BIOS after reboot
platform/x86/amd/pmc: Add Stellaris Slim Gen6 AMD to spurious 8042 quirks list
Pull gpio fixes from Bartosz Golaszewski:
- allow looking up GPIOs by the secondary firmware node too
- fix memory leak in gpio-regmap
* tag 'gpio-fixes-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: regmap: fix memory leak of gpio_regmap structure
gpiolib: Extend software-node support to support secondary software-nodes
Pull block fixes from Jens Axboe:
"A regression fix for this series where an attempt to silence an EOD
error got messed up a bit, and then a change of git trees for the
block and io_uring trees.
Switching the git trees to kernel.org now, as I've just about had it
trying to battle AI bots that bring the box to its knees, continually.
At least I don't have to maintain the kernel.org side"
* tag 'block-6.17-20250925' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
MAINTAINERS: update io_uring and block tree git trees
block: fix EOD return for device with nr_sectors == 0
Pull drm fixes from Dave Airlie:
"Weekly fixes, some fbcon font handling fixes, then amdgpu/xe/i915 with
a few, and a few misc fixes for other drivers. Seems about right for
this stage, and I don't know of anything outstanding.
fbcon:
- fix OOB access in font allocation
- fix integer overflow in font handling
amdgpu:
- Backlight fix
- DC preblend fix
- DCN 3.5 fix
- Cleanup output_tf_change
xe:
- Don't expose sysfs attributes not applicable for VFs
- Fix build with CONFIG_MODULES=n
- Don't copy pinned kernel bos twice on suspend
i915:
- Set O_LARGEFILE in __create_shmem()
- Guard reg_val against a INVALID_TRANSCODER [ddi]
ast:
- sleeps causing cpu stall fix
panthor:
- scheduler race condition fix
gma500:
- NULL ptr deref in hdmi teardown fix"
* tag 'drm-fixes-2025-09-26' of https://gitlab.freedesktop.org/drm/kernel:
drm/panthor: Defer scheduler entitiy destruction to queue release
drm/amd/display: remove output_tf_change flag
drm/amd/display: Init DCN35 clocks from pre-os HW values
drm/amd/display: Use mpc.preblend flag to indicate preblend
drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume
fbcon: Fix OOB access in font allocation
drm/i915/ddi: Guard reg_val against a INVALID_TRANSCODER
drm/i915: set O_LARGEFILE in __create_shmem()
drm/xe: Don't copy pinned kernel bos twice on suspend
drm/xe: Fix build with CONFIG_MODULES=n
drm/xe/vf: Don't expose sysfs attributes not applicable for VFs
fbcon: fix integer overflow in fbcon_do_set_font
drm/gma500: Fix null dereference in hdmi teardown
drm/ast: Use msleep instead of mdelay for edid read
In smb2_compound_op(), the loop that processes each command's response
uses wrong indices when accessing response bufferes.
This incorrect indexing leads to improper handling of command results.
Also, if incorrectly computed index is greather than or equal to
MAX_COMPOUND, it can cause out-of-bounds accesses.
Fixes: 3681c74d34 ("smb: client: handle lack of EA support in smb2_query_path_info()") # 6.14
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Commit 20d72b00ca ("netfs: Fix the request's work item to not
require a ref") modified netfs_alloc_request() to initialize the
reference counter to 2 instead of 1. The rationale was that the
requet's "work" would release the second reference after completion
(via netfs_{read,write}_collection_worker()). That works most of the
time if all goes well.
However, it leaks this additional reference if the request is released
before the I/O operation has been submitted: the error code path only
decrements the reference counter once and the work item will never be
queued because there will never be a completion.
This has caused outages of our whole server cluster today because
tasks were blocked in netfs_wait_for_outstanding_io(), leading to
deadlocks in Ceph (another bug that I will address soon in another
patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call
which failed in fscache_begin_write_operation(). The leaked
netfs_io_request was never completed, leaving `netfs_inode.io_count`
with a positive value forever.
All of this is super-fragile code. Finding out which code paths will
lead to an eventual completion and which do not is hard to see:
- Some functions like netfs_create_write_req() allocate a request, but
will never submit any I/O.
- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()
and then netfs_put_request(); however, netfs_unbuffered_read() can
also fail early before submitting the I/O request, therefore another
netfs_put_request() call must be added there.
A rule of thumb is that functions that return a `netfs_io_request` do
not submit I/O, and all of their callers must be checked.
For my taste, the whole netfs code needs an overhaul to make reference
counting easier to understand and less fragile & obscure. But to fix
this bug here and now and produce a patch that is adequate for a
stable backport, I tried a minimal approach that quickly frees the
request object upon early failure.
I decided against adding a second netfs_put_request() each time
because that would cause code duplication which obscures the code
further. Instead, I added the function netfs_put_failed_request()
which frees such a failed request synchronously under the assumption
that the reference count is exactly 2 (as initially set by
netfs_alloc_request() and never touched), verified by a
WARN_ON_ONCE(). It then deinitializes the request object (without
going through the "cleanup_work" indirection) and frees the allocation
(with RCU protection to protect against concurrent access by
netfs_requests_seq_start()).
All code paths that fail early have been changed to call
netfs_put_failed_request() instead of netfs_put_request().
Additionally, I have added a netfs_put_request() call to
netfs_unbuffered_read() as explained above because the
netfs_put_failed_request() approach does not work there.
Fixes: 20d72b00ca ("netfs: Fix the request's work item to not require a ref")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
commit c519c3c0a1 ("mm/kasan: avoid lazy MMU mode hazards") introduced
the use of arch_enter_lazy_mmu_mode(), which results in the compiler
complaining about "statement has no effect", when
__HAVE_ARCH_LAZY_MMU_MODE is not defined in include/linux/pgtable.h
The exact warning/error is:
In file included from ./include/linux/kasan.h:37,
from mm/kasan/shadow.c:14:
mm/kasan/shadow.c: In function kasan_populate_vmalloc_pte:
./include/linux/pgtable.h:247:41: error: statement with no effect [-Werror=unused-value]
247 | #define arch_enter_lazy_mmu_mode() (LAZY_MMU_DEFAULT)
| ^
mm/kasan/shadow.c:322:9: note: in expansion of macro arch_enter_lazy_mmu_mode> 322 | arch_enter_lazy_mmu_mode();
| ^~~~~~~~~~~~~~~~~~~~~~~~
switching these "functions" to static inlines fixes this up.
Fixes: c519c3c0a1 ("mm/kasan: avoid lazy MMU mode hazards")
Reported-by: Balbir Singh <balbirs@nvidia.com>
Closes: https://lkml.kernel.org/r/20250912235515.367061-1-balbirs@nvidia.com
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When the PAGEMAP_SCAN ioctl is invoked with vec_len = 0 reaches
pagemap_scan_backout_range(), kernel panics with null-ptr-deref:
[ 44.936808] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.937797] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 44.938391] CPU: 1 UID: 0 PID: 2480 Comm: reproducer Not tainted 6.17.0-rc6 #22 PREEMPT(none)
[ 44.939062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.939935] RIP: 0010:pagemap_scan_thp_entry.isra.0+0x741/0xa80
<snip registers, unreliable trace>
[ 44.946828] Call Trace:
[ 44.947030] <TASK>
[ 44.949219] pagemap_scan_pmd_entry+0xec/0xfa0
[ 44.952593] walk_pmd_range.isra.0+0x302/0x910
[ 44.954069] walk_pud_range.isra.0+0x419/0x790
[ 44.954427] walk_p4d_range+0x41e/0x620
[ 44.954743] walk_pgd_range+0x31e/0x630
[ 44.955057] __walk_page_range+0x160/0x670
[ 44.956883] walk_page_range_mm+0x408/0x980
[ 44.958677] walk_page_range+0x66/0x90
[ 44.958984] do_pagemap_scan+0x28d/0x9c0
[ 44.961833] do_pagemap_cmd+0x59/0x80
[ 44.962484] __x64_sys_ioctl+0x18d/0x210
[ 44.962804] do_syscall_64+0x5b/0x290
[ 44.963111] entry_SYSCALL_64_after_hwframe+0x76/0x7e
vec_len = 0 in pagemap_scan_init_bounce_buffer() means no buffers are
allocated and p->vec_buf remains set to NULL.
This breaks an assumption made later in pagemap_scan_backout_range(), that
page_region is always allocated for p->vec_buf_index.
Fix it by explicitly checking p->vec_buf for NULL before dereferencing.
Other sites that might run into same deref-issue are already (directly or
transitively) protected by checking p->vec_buf.
Note:
From PAGEMAP_SCAN man page, it seems vec_len = 0 is valid when no output
is requested and it's only the side effects caller is interested in,
hence it passes check in pagemap_scan_get_args().
This issue was found by syzkaller.
Link: https://lkml.kernel.org/r/20250922082206.6889-1-acsjakub@amazon.de
Fixes: 52526ca7fd ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Penglei Jiang <superman.xpt@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Running sha224_kunit on a KMSAN-enabled kernel results in a crash in
kmsan_internal_set_shadow_origin():
BUG: unable to handle page fault for address: ffffbc3840291000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G N 6.17.0-rc3 #10 PREEMPT(voluntary)
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
[...]
Call Trace:
<TASK>
__msan_memset+0xee/0x1a0
sha224_final+0x9e/0x350
test_hash_buffer_overruns+0x46f/0x5f0
? kmsan_get_shadow_origin_ptr+0x46/0xa0
? __pfx_test_hash_buffer_overruns+0x10/0x10
kunit_try_run_case+0x198/0xa00
This occurs when memset() is called on a buffer that is not 4-byte aligned
and extends to the end of a guard page, i.e. the next page is unmapped.
The bug is that the loop at the end of kmsan_internal_set_shadow_origin()
accesses the wrong shadow memory bytes when the address is not 4-byte
aligned. Since each 4 bytes are associated with an origin, it rounds the
address and size so that it can access all the origins that contain the
buffer. However, when it checks the corresponding shadow bytes for a
particular origin, it incorrectly uses the original unrounded shadow
address. This results in reads from shadow memory beyond the end of the
buffer's shadow memory, which crashes when that memory is not mapped.
To fix this, correctly align the shadow address before accessing the 4
shadow bytes corresponding to each origin.
Link: https://lkml.kernel.org/r/20250911195858.394235-1-ebiggers@kernel.org
Fixes: 2ef3cec44c ("kmsan: do not wipe out origin when doing partial unpoisoning")
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Migration may be raced with fallocating hole. remove_inode_single_folio
will unmap the folio if the folio is still mapped. However, it's called
without folio lock. If the folio is migrated and the mapped pte has been
converted to migration entry, folio_mapped() returns false, and won't
unmap it. Due to extra refcount held by remove_inode_single_folio,
migration fails, restores migration entry to normal pte, and the folio is
mapped again. As a result, we triggered BUG in filemap_unaccount_folio.
The log is as follows:
BUG: Bad page cache in process hugetlb pfn:156c00
page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00
head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file"
flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f4(hugetlb)
page dumped because: still mapped when deleted
CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
<TASK>
dump_stack_lvl+0x4f/0x70
filemap_unaccount_folio+0xc4/0x1c0
__filemap_remove_folio+0x38/0x1c0
filemap_remove_folio+0x41/0xd0
remove_inode_hugepages+0x142/0x250
hugetlbfs_fallocate+0x471/0x5a0
vfs_fallocate+0x149/0x380
Hold folio lock before checking if the folio is mapped to avold race with
migration.
Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com
Fixes: 4aae8d1c05 ("mm/hugetlbfs: unmap pages if page fault raced with hole punch")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth, IPsec and CAN.
No known regressions at this point.
Current release - regressions:
- xfrm: xfrm_alloc_spi shouldn't use 0 as SPI
Previous releases - regressions:
- xfrm: fix offloading of cross-family tunnels
- bluetooth: fix several races leading to UaFs
- dsa: lantiq_gswip: fix FDB entries creation for the CPU port
- eth:
- tun: update napi->skb after XDP process
- mlx: fix UAF in flow counter release
Previous releases - always broken:
- core: forbid FDB status change while nexthop is in a group
- smc: fix warning in smc_rx_splice() when calling get_page()
- can: provide missing ndo_change_mtu(), to prevent buffer overflow.
- eth:
- i40e: fix VF config validation
- broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl"
* tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()
net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port
net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()
libie: fix string names for AQ error codes
net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD
net/mlx5: HWS, ignore flow level for multi-dest table
net/mlx5: fs, fix UAF in flow counter release
selftests: fib_nexthops: Add test cases for FDB status change
selftests: fib_nexthops: Fix creation of non-FDB nexthops
nexthop: Forbid FDB status change while nexthop is in a group
net: allow alloc_skb_with_frags() to use MAX_SKB_FRAGS
bnxt_en: correct offset handling for IPv6 destination address
ptp: document behavior of PTP_STRICT_FLAGS
broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl
broadcom: fix support for PTP_PEROUT_DUTY_CYCLE
Bluetooth: MGMT: Fix possible UAFs
Bluetooth: hci_event: Fix UAF in hci_acl_create_conn_sync
Bluetooth: hci_event: Fix UAF in hci_conn_tx_dequeue
Bluetooth: hci_sync: Fix hci_resume_advertising_sync
Bluetooth: Fix build after header cleanup
...
Pull virtio fixes from Michael Tsirkin:
"virtio,vhost: last minute fixes
More small fixes. Most notably this fixes crashes and hangs in
vhost-net"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
MAINTAINERS, mailmap: Update address for Peter Hilber
virtio_config: clarify output parameters
uapi: vduse: fix typo in comment
vhost: Take a reference on the task in struct vhost_task.
vhost-net: flush batched before enabling notifications
Revert "vhost/net: Defer TX queue re-enable until after sendmsg"
vhost-net: unbreak busy polling
vhost-scsi: fix argument order in tport allocation error message
When WMAB is called to set the fan mode, the new mode is read from either
bits 0-1 or bits 4-5 (depending on the value of some other EC register).
Thus when WMAB is called with bits 4-5 zeroed and called again with
bits 0-1 zeroed, the second call undoes the effect of the first call.
This causes writes to /sys/devices/platform/lg-laptop/fan_mode to have
no effect (and causes reads to always report a status of zero).
Fix this by calling WMAB once, with the mode set in bits 0,1 and 4,5.
When the fan mode is returned from WMAB it always has this form, so
there is no need to preserve the other bits. As a bonus, the driver
now supports the "Performance" fan mode seen in the LG-provided Windows
control app, which provides less aggressive CPU throttling but louder
fan noise and shorter battery life.
Also, correct the documentation to reflect that 0 corresponds to the
default mode (what the Windows app calls "Optimal") and 1 corresponds
to the silent mode.
Fixes: dbf0c5a6b1 ("platform/x86: Add LG Gram laptop special features driver")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204913#c4
Signed-off-by: Daniel Lee <dany97@live.ca>
Link: https://patch.msgid.link/MN2PR06MB55989CB10E91C8DA00EE868DDC1CA@MN2PR06MB5598.namprd06.prod.outlook.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
This code calls kfree_rcu(new_node, rcu) and then dereferences "new_node"
and then dereferences it on the next line. Two lines later, we take
a mutex so I don't think this is an RCU safe region. Re-order it to do
the dereferences before queuing up the free.
Fixes: 68fbff68db ("octeontx2-pf: Add police action for TC flower")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/aNKCL1jKwK8GRJHh@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit de85488138 ("drm/panthor: Add the scheduler logical block")
handled destruction of a group's queues' drm scheduler entities early
into the group destruction procedure.
However, that races with the group submit ioctl, because by the time
entities are destroyed (through the group destroy ioctl), the submission
procedure might've already obtained a group handle, and therefore the
ability to push jobs into entities. This is met with a DRM error message
within the drm scheduler core as a situation that should never occur.
Fix by deferring drm scheduler entity destruction to queue release time.
Fixes: de85488138 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250919164436.531930-1-adrian.larumbe@collabora.com
The blamed commit and others in that patch set started the trend
of reusing existing DSA driver API for a new purpose: calling
ds->ops->port_fdb_add() on the CPU port.
The lantiq_gswip driver was not prepared to handle that, as can be seen
from the many errors that Daniel presents in the logs:
[ 174.050000] gswip 1e108000.switch: port 2 failed to add fa:aa:72:f4:8b:1e vid 1 to fdb: -22
[ 174.060000] gswip 1e108000.switch lan2: entered promiscuous mode
[ 174.070000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 0 to fdb: -22
[ 174.090000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 1 to fdb: -22
[ 174.090000] gswip 1e108000.switch: port 2 failed to delete fa:aa:72:f4:8b:1e vid 1 from fdb: -2
The errors are because gswip_port_fdb() wants to get a handle to the
bridge that originated these FDB events, to associate it with a FID.
Absolutely honourable purpose, however this only works for user ports.
To get the bridge that generated an FDB entry for the CPU port, one
would need to look at the db.bridge.dev argument. But this was
introduced in commit c26933639b ("net: dsa: request drivers to perform
FDB isolation"), first appeared in v5.18, and when the blamed commit was
introduced in v5.14, no such API existed.
So the core DSA feature was introduced way too soon for lantiq_gswip.
Not acting on these host FDB entries and suppressing any errors has no
other negative effect, and practically returns us to not supporting the
host filtering feature at all - peacefully, this time.
Fixes: 10fae4ac89 ("net: dsa: include bridge addresses which are local in the host fdb list")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Closes: https://lore.kernel.org/netdev/aJfNMLNoi1VOsPrN@pidgin.makrotopia.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250918072142.894692-3-vladimir.oltean@nxp.com
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A port added to a "single port bridge" operates as standalone, and this
is mutually exclusive to being part of a Linux bridge. In fact,
gswip_port_bridge_join() calls gswip_add_single_port_br() with
add=false, i.e. removes the port from the "single port bridge" to enable
autonomous forwarding.
The blamed commit seems to have incorrectly thought that ds->ops->port_enable()
is called one time per port, during the setup phase of the switch.
However, it is actually called during the ndo_open() implementation of
DSA user ports, which is to say that this sequence of events:
1. ip link set swp0 down
2. ip link add br0 type bridge
3. ip link set swp0 master br0
4. ip link set swp0 up
would cause swp0 to join back the "single port bridge" which step 3 had
just removed it from.
The correct DSA hook for one-time actions per port at switch init time
is ds->ops->port_setup(). This is what seems to match the coder's
intention; also see the comment at the beginning of the file:
* At the initialization the driver allocates one bridge table entry for
~~~~~~~~~~~~~~~~~~~~~
* each switch port which is used when the port is used without an
* explicit bridge.
Fixes: 8206e0ce96 ("net: dsa: lantiq: Add VLAN unaware bridge offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250918072142.894692-2-vladimir.oltean@nxp.com
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
John reported undesirable behaviour with the dl_server since commit:
cccb45d7c4 ("sched/deadline: Less agressive dl_server handling").
When starving fair tasks on purpose (starting spinning FIFO tasks),
his fair workload, which often goes (briefly) idle, would delay fair
invocations for a second, running one invocation per second was both
unexpected and terribly slow.
The reason this happens is that when dl_se->server_pick_task() returns
NULL, indicating no runnable tasks, it would yield, pushing any later
jobs out a whole period (1 second).
Instead simply stop the server. This should restore behaviour in that
a later wakeup (which restarts the server) will be able to continue
running (subject to the CBS wakeup rules).
Notably, this does not re-introduce the behaviour cccb45d7c4 set
out to solve, any start/stop cycle is naturally throttled by the timer
period (no active cancel).
Fixes: cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>
John found it was easy to hit lockup warnings when running locktorture
on a 2 CPU VM, which he bisected down to: commit cccb45d7c4
("sched/deadline: Less agressive dl_server handling").
While debugging it seems there is a chance where we end up with the
dl_server dequeued, with dl_se->dl_server_active. This causes
dl_server_start() to return without enqueueing the dl_server, thus it
fails to run when RT tasks starve the cpu.
When this happens, dl_server_timer() catches the
'!dl_se->server_has_tasks(dl_se)' case, which then calls
replenish_dl_entity() and dl_server_stopped() and finally return
HRTIMER_NO_RESTART.
This ends in no new timer and also no enqueue, leaving the dl_server
'dead', allowing starvation.
What should have happened is for the bandwidth timer to start the
zero-laxity timer, which in turn would enqueue the dl_server and cause
dl_se->server_pick_task() to be called -- which will stop the
dl_server if no fair tasks are observed for a whole period.
IOW, it is totally irrelevant if there are fair tasks at the moment of
bandwidth refresh.
This removes all dl_se->server_has_tasks() users, so remove the whole
thing.
Fixes: cccb45d7c4 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>
afs_put_server() accessed server->debug_id before the NULL check, which
could lead to a null pointer dereference. Move the debug_id assignment,
ensuring we never dereference a NULL server pointer.
Fixes: 2757a4dc18 ("afs: Fix access after dec in put functions")
Cc: stable@vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Pull probes fixes from Masami Hiramatsu:
- fprobe: Even if there is a memory allocation failure, try to remove
the addresses recorded until then from the filter. Previously we just
skipped it.
- tracing: dynevent: Add a missing lockdown check on dynevent. This
dynevent is the interface for all probe events. Thus if there is no
check, any probe events can be added after lock down the tracefs.
* tag 'probes-fixes-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: dynevent: Add a missing lockdown check on dynevent
tracing: fprobe: Fix to remove recorded module addresses from filter
The LIBIE_AQ_STR macro() introduced by commit 5feaa7a07b ("libie: add
adminq helper for converting err to str") is used in order to generate
strings for printing human readable error codes. Its definition is missing
the separating underscore ('_') character which makes the resulting strings
difficult to read. Additionally, the string won't match the source code,
preventing search tools from working properly.
Add the missing underscore character, fixing the error string names.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 5feaa7a07b ("libie: add adminq helper for converting err to str")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250923205657.846759-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 1b34cbbf4f ("crypto: af_alg - Disallow concurrent writes in
af_alg_sendmsg") changed some fields from bool to 1-bit bitfields of
type u32.
However, some assignments to these fields, specifically 'more' and
'merge', assign values greater than 1. These relied on C's implicit
conversion to bool, such that zero becomes false and nonzero becomes
true.
With a 1-bit bitfields of type u32 instead, mod 2 of the value is taken
instead, resulting in 0 being assigned in some cases when 1 was intended.
Fix this by restoring the bool type.
Fixes: 1b34cbbf4f ("crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SoC fixes from Arnd Bergmann:
"There are a few minor code fixes for tegra firmware, i.MX firmware
and the eyeq reset controller, and a MAINTAINERS update as Alyssa
Rosenzweig moves on to non-kernel projects.
The other changes are all for devicetree files:
- Multiple Marvell Armada SoCs need changes to fix PCIe, audio and
SATA
- A socfpga board fails to probe the ethernet phy
- The two temperature sensors on i.MX8MP are swapped
- Allwinner devicetree files cause build-time warnings
- Two Rockchip based boards need corrections for headphone detection
and SPI flash"
* tag 'soc-fixes-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: remove Alyssa Rosenzweig
firmware: tegra: Do not warn on missing memory-region property
arm64: dts: marvell: cn9132-clearfog: fix multi-lane pci x2 and x4 ports
arm64: dts: marvell: cn9132-clearfog: disable eMMC high-speed modes
arm64: dts: marvell: cn913x-solidrun: fix sata ports status
ARM: dts: kirkwood: Fix sound DAI cells for OpenRD clients
arm64: dts: imx8mp: Correct thermal sensor index
ARM: imx: Kconfig: Adjust select after renamed config option
firmware: imx: Add stub functions for SCMI CPU API
firmware: imx: Add stub functions for SCMI LMM API
firmware: imx: Add stub functions for SCMI MISC API
riscv: dts: allwinner: rename devterm i2c-gpio node to comply with binding
arm64: dts: rockchip: Fix the headphone detection on the orangepi 5
arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6
ARM: dts: socfpga: sodia: Fix mdio bus probe and PHY address
reset: eyeq: fix OF node leak
ARM64: dts: mcbin: fix SATA ports on Macchiatobin
ARM: dts: armada-370-db: Fix stereo audio input routing on Armada 370
ARM: dts: allwinner: Minor whitespace cleanup
Pull power management fix from Rafael Rafael:
"Fix a locking issue in the cpufreq core introduced recently and caught
by lockdep (Christian Loehle)"
* tag 'pm-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Initialize cpufreq-based invariance before subsys
Pull btrfs fix from David Sterba:
"One more regression fix for a problem in zoned mode: mounting would
fail if the number of open and active zones reached a common limit
that didn't use to be checked"
* tag 'for-6.17-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: don't fail mount needlessly due to too many active zones
Pull smb server fixes from Steve French:
- free_transport fix for disconnect races
- minor delayed work fix
* tag '6.17-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
smb: server: use disable_work_sync in transport_rdma.c
smb: server: don't use delayed_work for post_recv_credits_work
Even if there is a memory allocation failure in fprobe_addr_list_add(),
there is a partial list of module addresses. So remove the recorded
addresses from filter if exists.
This also removes the redundant ret local variable.
Fixes: a3dc2983ca ("tracing: fprobe: Cleanup fprobe hash when module unloading")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Menglong Dong <menglong8.dong@gmail.com>
clang < 17 fails to use scope local labels with CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y:
{
__label__ local_lbl;
...
unsafe_get_user(uval, uaddr, local_lbl);
...
return 0;
local_lbl:
return -EFAULT;
}
when two such scopes exist in the same function:
error: cannot jump from this asm goto statement to one of its possible targets
There are other failure scenarios. Shuffling code around slightly makes it
worse and fail even with one instance.
That issue prevents using local labels for a cleanup based user access
mechanism.
After failed attempts to provide a simple enough test case for the 'depends
on' test in Kconfig, the initial cure was to mark ASM goto broken on clang
versions < 17 to get this road block out of the way.
But Nathan pointed out that this is a known clang issue and indeed affects
clang < version 17 in combination with cleanup(). It's not even required to
use local labels for that.
The clang issue tracker has a small enough test case, which can be used as
a test in the 'depends on' section of CC_HAS_ASM_GOTO_OUTPUT:
void bar(void **);
void* baz(void);
int foo (void) {
{
asm goto("jmp %l0"::::l0);
return 0;
l0:
return 1;
}
void *x __attribute__((cleanup(bar))) = baz();
{
asm goto("jmp %l0"::::l1);
return 42;
l1:
return 0xff;
}
}
Add another dependency to config CC_HAS_ASM_GOTO_OUTPUT for it and use the
clang issue tracker test case for detection by condensing it to obfuscated
C-code contest format. This reliably catches the problem on clang < 17 and
did not show any issues on the non broken GCC versions.
That test might be sufficient to catch all issues and therefore could
replace the existing test, but keeping that around does no harm either.
Thanks to Nathan for pointing to the relevant clang issue!
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1886
Link: f023f5cdb2
copy_process() uses the wrong error exit path from futex_hash_allocate_default().
After exiting from futex_hash_allocate_default(), neither tasklist_lock
nor siglock has been acquired. The exit label bad_fork_core_free unlocks
both of these locks which is wrong.
The next exit label, bad_fork_cancel_cgroup, is the correct exit.
sched_cgroup_fork() did not allocate any resources that need to freed.
Use bad_fork_cancel_cgroup on error exit from futex_hash_allocate_default().
Fixes: 7c4f75a21f ("futex: Allow automatic allocation of process wide futex hash")
Reported-by: syzbot+80cb3cc5c14fad191a10@syzkaller.appspotmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/68cb1cbd.050a0220.2ff435.0599.GAE@google.com
My experiment with using corporate Gmail for Linux kernel list
interaction has come to an end. For my MAINTAINERS entries that
use that E-mail address, let's switch those to use the k.org E-mail
forwarding.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
When HWS creates multi-dest FW table and adds rules to
forward to other tables, ignore the flow level enforcement
in FW, because HWS is responsible for table levels.
This fixes the following error:
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:818:(pid 192306):
SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0x6ae84c), err(-22)
Fixes: 504e536d90 ("net/mlx5: HWS, added actions handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758525094-816583-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a kernel trace [1] caused by releasing an HWS action of a local flow
counter in mlx5_cmd_hws_delete_fte(), where the HWS action refcount and
mutex were not initialized and the counter struct could already be freed
when deleting the rule.
Fix it by adding the missing initializations and adding refcount for the
local flow counter struct.
[1] Kernel log:
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
mlx5_fs_put_hws_action.part.0.cold+0x21/0x94 [mlx5_core]
mlx5_fc_put_hws_action+0x96/0xad [mlx5_core]
mlx5_fs_destroy_fs_actions+0x8b/0x152 [mlx5_core]
mlx5_cmd_hws_delete_fte+0x5a/0xa0 [mlx5_core]
del_hw_fte+0x1ce/0x260 [mlx5_core]
mlx5_del_flow_rules+0x12d/0x240 [mlx5_core]
? ttwu_queue_wakelist+0xf4/0x110
mlx5_ib_destroy_flow+0x103/0x1b0 [mlx5_ib]
uverbs_free_flow+0x20/0x50 [ib_uverbs]
destroy_hw_idr_uobject+0x1b/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x34/0x1a0 [ib_uverbs]
uobj_destroy+0x3c/0x80 [ib_uverbs]
ib_uverbs_run_method+0x23e/0x360 [ib_uverbs]
? uverbs_finalize_object+0x60/0x60 [ib_uverbs]
ib_uverbs_cmd_verbs+0x14f/0x2c0 [ib_uverbs]
? do_tty_write+0x1a9/0x270
? file_tty_write.constprop.0+0x98/0xc0
? new_sync_write+0xfc/0x190
ib_uverbs_ioctl+0xd7/0x160 [ib_uverbs]
__x64_sys_ioctl+0x87/0xc0
do_syscall_64+0x59/0x90
Fixes: b581f42669 ("net/mlx5: fs, manage flow counters HWS action sharing by refcount")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758525094-816583-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
nexthop: Various fixes
Patch #1 fixes a NPD that was recently reported by syzbot.
Patch #2 fixes an issue in the existing FIB nexthop selftest.
Patch #3 extends the selftest with test cases for the bug that was fixed
in the first patch.
====================
Link: https://patch.msgid.link/20250921150824.149157-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the following test cases for both IPv4 and IPv6:
* Can change from FDB nexthop to non-FDB nexthop and vice versa.
* Can change FDB nexthop address while in a group.
* Cannot change from FDB nexthop to non-FDB nexthop and vice versa while
in a group.
Output without "nexthop: Forbid FDB status change while nexthop is in a
group":
# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"
IPv6 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop [ OK ]
TEST: Replace FDB nexthop address while in a group [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group [FAIL]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group [FAIL]
[...]
IPv4 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop [ OK ]
TEST: Replace FDB nexthop address while in a group [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group [FAIL]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group [FAIL]
[...]
Tests passed: 36
Tests failed: 4
Tests skipped: 0
Output with "nexthop: Forbid FDB status change while nexthop is in a
group":
# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"
IPv6 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop [ OK ]
TEST: Replace FDB nexthop address while in a group [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group [ OK ]
[...]
IPv4 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop [ OK ]
TEST: Replace FDB nexthop address while in a group [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group [ OK ]
[...]
Tests passed: 40
Tests failed: 0
Tests skipped: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test creates non-FDB nexthops without a nexthop device which leads
to the expected failure, but for the wrong reason:
# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v
IPv6 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-nRsN3E nexthop add id 63 via 2001:db8:91::4
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 64 via 2001:db8:91::5
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 63/64 fdb
Error: Invalid nexthop id.
TEST: Fdb Nexthop group with non-fdb nexthops [ OK ]
[...]
IPv4 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-nRsN3E nexthop add id 14 via 172.16.1.2
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 15 via 172.16.1.3
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 14/15 fdb
Error: Invalid nexthop id.
TEST: Fdb Nexthop group with non-fdb nexthops [ OK ]
COMMAND: ip -netns me-nRsN3E nexthop add id 16 via 172.16.1.2 fdb
COMMAND: ip -netns me-nRsN3E nexthop add id 17 via 172.16.1.3 fdb
COMMAND: ip -netns me-nRsN3E nexthop add id 104 group 14/15
Error: Invalid nexthop id.
TEST: Non-Fdb Nexthop group with fdb nexthops [ OK ]
[...]
COMMAND: ip -netns me-0dlhyd ro add 172.16.0.0/22 nhid 15
Error: Nexthop id does not exist.
TEST: Route add with fdb nexthop [ OK ]
In addition, as can be seen in the above output, a couple of IPv4 test
cases used the non-FDB nexthops (14 and 15) when they intended to use
the FDB nexthops (16 and 17). These test cases only passed because
failure was expected, but they failed for the wrong reason.
Fix the test to create the non-FDB nexthops with a nexthop device and
adjust the IPv4 test cases to use the FDB nexthops instead of the
non-FDB nexthops.
Output after the fix:
# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v
IPv6 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-lNzfHP nexthop add id 63 via 2001:db8:91::4 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 64 via 2001:db8:91::5 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 63/64 fdb
Error: FDB nexthop group can only have fdb nexthops.
TEST: Fdb Nexthop group with non-fdb nexthops [ OK ]
[...]
IPv4 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-lNzfHP nexthop add id 14 via 172.16.1.2 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 15 via 172.16.1.3 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 14/15 fdb
Error: FDB nexthop group can only have fdb nexthops.
TEST: Fdb Nexthop group with non-fdb nexthops [ OK ]
COMMAND: ip -netns me-lNzfHP nexthop add id 16 via 172.16.1.2 fdb
COMMAND: ip -netns me-lNzfHP nexthop add id 17 via 172.16.1.3 fdb
COMMAND: ip -netns me-lNzfHP nexthop add id 104 group 16/17
Error: Non FDB nexthop group cannot have fdb nexthops.
TEST: Non-Fdb Nexthop group with fdb nexthops [ OK ]
[...]
COMMAND: ip -netns me-lNzfHP ro add 172.16.0.0/22 nhid 16
Error: Route cannot point to a fdb nexthop.
TEST: Route add with fdb nexthop [ OK ]
[...]
Tests passed: 30
Tests failed: 0
Tests skipped: 0
Fixes: 0534c5489c ("selftests: net: add fdb nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The kernel forbids the creation of non-FDB nexthop groups with FDB
nexthops:
# ip nexthop add id 1 via 192.0.2.1 fdb
# ip nexthop add id 2 group 1
Error: Non FDB nexthop group cannot have fdb nexthops.
And vice versa:
# ip nexthop add id 3 via 192.0.2.2 dev dummy1
# ip nexthop add id 4 group 3 fdb
Error: FDB nexthop group can only have fdb nexthops.
However, as long as no routes are pointing to a non-FDB nexthop group,
the kernel allows changing the type of a nexthop from FDB to non-FDB and
vice versa:
# ip nexthop add id 5 via 192.0.2.2 dev dummy1
# ip nexthop add id 6 group 5
# ip nexthop replace id 5 via 192.0.2.2 fdb
# echo $?
0
This configuration is invalid and can result in a NPD [1] since FDB
nexthops are not associated with a nexthop device:
# ip route add 198.51.100.1/32 nhid 6
# ping 198.51.100.1
Fix by preventing nexthop FDB status change while the nexthop is in a
group:
# ip nexthop add id 7 via 192.0.2.2 dev dummy1
# ip nexthop add id 8 group 7
# ip nexthop replace id 7 via 192.0.2.2 fdb
Error: Cannot change nexthop FDB status while in a group.
[1]
BUG: kernel NULL pointer dereference, address: 00000000000003c0
[...]
Oops: Oops: 0000 [#1] SMP
CPU: 6 UID: 0 PID: 367 Comm: ping Not tainted 6.17.0-rc6-virtme-gb65678cacc03 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:fib_lookup_good_nhc+0x1e/0x80
[...]
Call Trace:
<TASK>
fib_table_lookup+0x541/0x650
ip_route_output_key_hash_rcu+0x2ea/0x970
ip_route_output_key_hash+0x55/0x80
__ip4_datagram_connect+0x250/0x330
udp_connect+0x2b/0x60
__sys_connect+0x9c/0xd0
__x64_sys_connect+0x18/0x20
do_syscall_64+0xa4/0x2a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Fixes: 38428d6871 ("nexthop: support for fdb ecmp nexthops")
Reported-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c9a4d2.050a0220.3c6139.0e63.GAE@google.com/
Tested-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, alloc_skb_with_frags() will only fill (MAX_SKB_FRAGS - 1)
slots. I think it should use all MAX_SKB_FRAGS slots, as callers of
alloc_skb_with_frags() will size their allocation of frags based
on MAX_SKB_FRAGS.
This issue was discovered via a test patch that sets 'order' to 0
in alloc_skb_with_frags(), which effectively tests/simulates high
fragmentation. In this case sendmsg() on unix sockets will fail every
time for large allocations. If the PAGE_SIZE is 4K, then data_len will
request 68K or 17 pages, but alloc_skb_with_frags() can only allocate
64K in this case or 16 pages.
Fixes: 09c2c90705 ("net: allow alloc_skb_with_frags() to allocate bigger packets")
Signed-off-by: Jason Baron <jbaron@akamai.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250922191957.2855612-1-jbaron@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2025-09-23
The 1st patch is by Chen Yufeng and fixes a potential NULL pointer
deref in the hi311x driver.
Duy Nguyen contributes a patch for the rcar_canfd driver to fix the
controller mode setting.
The next 4 patches are by Vincent Mailhol and populate the
ndo_change_mtu(( callback in the etas_es58x, hi311x, sun4i_can and
mcba_usb driver to prevent buffer overflows.
Stéphane Grosjean's patch for the peak_usb driver fixes a
shift-out-of-bounds issue.
* tag 'linux-can-fixes-for-6.17-20250923' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: peak_usb: fix shift-out-of-bounds issue
can: mcba_usb: populate ndo_change_mtu() to prevent buffer overflow
can: sun4i_can: populate ndo_change_mtu() to prevent buffer overflow
can: hi311x: populate ndo_change_mtu() to prevent buffer overflow
can: etas_es58x: populate ndo_change_mtu() to prevent buffer overflow
can: rcar_canfd: Fix controller mode setting
can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled
====================
Link: https://patch.msgid.link/20250923073427.493034-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
firmware: tegra: Fixes for v6.17
This contains a simple patch to avoid a warning in the case where the
optional memory-region property is missing.
* tag 'tegra-for-6.17-firmware-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
firmware: tegra: Do not warn on missing memory-region property
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Another missing supply and a wrong headphone gpio level.
* tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Fix the headphone detection on the orangepi 5
arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6
The MR1.CKS field is 3 bits wide and all the possible values (from 0 to
7) are valid. This is true for all the SoCs currently integrated in
upstream Linux. Take into account CKS=7 which allows setting bus
frequencies lower than 50KHz. This may be useful at least for debugging.
Fixes: d982d66514 ("i2c: riic: remove clock and frequency restrictions")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
My address is going to bounce soon and I won't have access to the
Synopsys datasheets so I'm going step down being a maintainer for this
driver.
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Remove this flag as the driver stopped managing it individually since
commit a4056c2a63 ("drm/amd/display: use HW hdr mult for brightness
boost"). After some back and forth it was reintroduced as a condition to
`set_output_transfer_func()` in [1]. Without direct management, this
flag only changes value when all surface update flags are set true on
UPDATE_TYPE_FULL with no output TF status meaning.
Fixes: bb622e0c00 ("drm/amd/display: program output tf when required") [1]
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 752e6f283e)
[Why]
We did not initialize dc clocks with boot-time hw values during init.
This lead to incorrect clock values in dc, causing `dcn35_update_clocks`
to make incorrect updates.
[How]
Correctly initialize DC with pre-os clk values from HW.
s/dump/save/ as that accurately reflects the purpose of the functions.
Fixes: 8774029f76 ("drm/amd/display: Add DCN35 CLK_MGR")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d43cc4ea1f)
On clients that utilize AMD_PRIVATE_COLOR properties for HDR support,
brightness sliders can include a hardware controlled portion and a
gamma-based portion. This is the case on the Steam Deck OLED when using
gamescope with Steam as a client.
When a user sets a brightness level while HDR is active, the gamma-based
portion and/or hardware portion are adjusted to achieve the desired
brightness. However, when a modeset takes place while the gamma-based
portion is in-use, restoring the hardware brightness level overrides the
user's overall brightness level and results in a mismatch between what
the slider reports and the display's current brightness.
To avoid overriding gamma-based brightness, only restore HW backlight
level after boot or resume. This ensures that the backlight level is
set correctly after the DC layer resets it while avoiding interference
with subsequent modesets.
Fixes: 7875afafba ("drm/amd/display: Fix brightness level not retained over reboot")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4551
Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a490c8d77d)
Cc: stable@vger.kernel.org
When config osnoise cpus by write() syscall, the following KASAN splat may
be observed:
BUG: KASAN: slab-out-of-bounds in _parse_integer_limit+0x103/0x130
Read of size 1 at addr ffff88810121e3a1 by task test/447
CPU: 1 UID: 0 PID: 447 Comm: test Not tainted 6.17.0-rc6-dirty #288 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x55/0x70
print_report+0xcb/0x610
kasan_report+0xb8/0xf0
_parse_integer_limit+0x103/0x130
bitmap_parselist+0x16d/0x6f0
osnoise_cpus_write+0x116/0x2d0
vfs_write+0x21e/0xcc0
ksys_write+0xee/0x1c0
do_syscall_64+0xa8/0x2a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
This issue can be reproduced by below code:
const char *cpulist = "1";
int fd=open("/sys/kernel/debug/tracing/osnoise/cpus", O_WRONLY);
write(fd, cpulist, strlen(cpulist));
Function bitmap_parselist() was called to parse cpulist, it require that
the parameter 'buf' must be terminated with a '\0' or '\n'. Fix this issue
by adding a '\0' to 'buf' in osnoise_cpus_write().
Cc: <mhiramat@kernel.org>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <tglozar@redhat.com>
Link: https://lore.kernel.org/20250916063948.3154627-1-wangliang74@huawei.com
Fixes: 17f89102fe ("tracing/osnoise: Allow arbitrarily long CPU string")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
In MT8195 power domain data array, set the KEEP_DEFAULT_OFF and
ACTIVE_WAKEUP flags for the AUDIO power domain entry to avoid
having this domain being on during boot sequence when unneeded.
Fixes: 0e789b491b ("pmdomain: core: Leave powered-on genpds on until sync_state")
Fixes: 13a4b7fb62 ("pmdomain: core: Leave powered-on genpds on until late_initcall_sync")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Previously BTRFS did not look at a device's reported max_open_zones limit,
but starting with commit 04147d8394 ("btrfs: zoned: limit active zones
to max_open_zones"), zoned BTRFS limited the number of concurrently used
block-groups to the number of max_open_zones a device reported, if it
hadn't already reported a number of max_active_zones.
Starting with commit 04147d8394 the number of open zones is treated the
same way as active zones. But this leads to mount failures on filesystems
which have been used before 04147d8394 because too many zones are in an
open state.
Ignore the new limitations on these filesystems, so zones can be finished
or evacuated.
Reported-by: Yuwei Han <hrx@bupt.moe>
Link: https://lore.kernel.org/all/2F48A90AF7DDF380+1790bcfd-cb6f-456b-870d-7982f21b5eae@bupt.moe/
Fixes: 04147d8394 ("btrfs: zoned: limit active zones to max_open_zones")
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In bnxt_tc_parse_pedit(), the code incorrectly writes IPv6
destination values to the source address field (saddr) when
processing pedit offsets within the destination address range.
This patch corrects the assignment to use daddr instead of saddr,
ensuring that pedit operations on IPv6 destination addresses are
applied correctly.
Fixes: 9b9eb518e3 ("bnxt_en: Add support for NAT(L3/L4 rewrite)")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250920121157.351921-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steffen Klassert says:
====================
pull request (net): ipsec 2025-09-22
1) Fix 0 assignment for SPIs. 0 is not a valid SPI,
it means no SPI assigned.
2) Fix offloading for inter address family tunnels.
* tag 'ipsec-2025-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix offloading of cross-family tunnels
xfrm: xfrm_alloc_spi shouldn't use 0 as SPI
====================
Link: https://patch.msgid.link/20250922073512.62703-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
i40e: virtchnl improvements
Przemek Kitszel says:
Improvements hardening PF-VF communication for i40e driver.
This patchset targets several issues that can cause undefined behavior
or be exploited in some other way.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: improve VF MAC filters accounting
i40e: add mask to apply valid bits for itr_idx
i40e: add max boundary check for VF filters
i40e: fix validation of VF state in get resources
i40e: fix input validation logic for action_meta
i40e: fix idx validation in config queues msg
i40e: fix idx validation in i40e_validate_queue_map
i40e: add validation for ring_len param
====================
Link: https://patch.msgid.link/20250919184959.656681-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christian reported that commit a430c11f40 ("intel_idle: Rescan "dead" SMT
siblings during initialization") broke the use case in which both 'nosmt'
and 'maxcpus' are on the kernel command line because it onlines primary
threads, which were offline due to the maxcpus limit.
The initially proposed fix to skip primary threads in the loop is
inconsistent. While it prevents the primary thread to be onlined, it then
onlines the corresponding hyperthread(s), which does not really make sense.
The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
threads of a core, when the primary thread is offline. The default
implementation is a NOOP and therefore not effective on x86.
Implement topology_is_core_online() on x86 to address this issue. This
makes the behaviour consistent between x86 and PowerPC.
Fixes: a430c11f40 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
Reported-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki
Jacob Keller says:
====================
broadcom: report the supported flags for ancillary features
James Clark reported off list that the broadcom PHY PTP driver was
incorrectly handling PTP_EXTTS_REQUEST and PTP_PEROUT_REQUEST ioctls since
the conversion to the .supported_*_flags fields. This series fixes the
driver to correctly report its flags through the .supported_perout_flags
and .supported_extts_flags fields. It also contains an update to comment
the behavior of the PTP_STRICT_FLAGS being always enabled for
PTP_EXTTS_REQUEST2.
I plan to follow up this series with some improvements to the PTP
documentation better explaining each flag and the expectation of the driver
APIs.
====================
Link: https://patch.msgid.link/20250918-jk-fix-bcm-phy-supported-flags-v1-0-747b60407c9c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 6138e687c7 ("ptp: Introduce strict checking of external time stamp
options.") added the PTP_STRICT_FLAGS to the set of flags supported for the
external timestamp request ioctl.
It is only supported by PTP_EXTTS_REQUEST2, as it was introduced the
introduction of the new ioctls. Further, the kernel has always set this
flag for PTP_EXTTS_REQUEST2 regardless of whether or not the user requested
the behavior.
This effectively means that the flag is not useful for userspace. If the
user issues a PTP_EXTTS_REQUEST ioctl, the flag is ignored due to not being
supported on the old ioctl. If the user issues a PTP_EXTTS_REQUEST2 ioctl,
the flag will be set by the kernel regardless of whether the user set the
flag in their structure.
Add a comment documenting this behavior in the uAPI header file.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Tested-by: James Clark <jjc@jclark.com>
Link: https://patch.msgid.link/20250918-jk-fix-bcm-phy-supported-flags-v1-3-747b60407c9c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 7c571ac57d ("net: ptp: introduce .supported_extts_flags to
ptp_clock_info") modified the PTP core kernel logic to validate the
supported flags for the PTP_EXTTS_REQUEST ioctls, rather than relying on
each individual driver correctly checking its flags.
The bcm_ptp_enable() function implements support for PTP_CLK_REQ_EXTTS, but
does not check the flags, and does not forward the request structure into
bcm_ptp_extts_locked().
When originally converting the bcm-phy-ptp.c code, it was unclear what
edges the hardware actually timestamped. Thus, no flags were initialized in
the .supported_extts_flags field. This results in the kernel automatically
rejecting all userspace requests for the PTP_EXTTS_REQUEST2 ioctl.
This occurs because the PTP_STRICT_FLAGS is always assumed when operating
under PTP_EXTTS_REQUEST2. This has been the case since the flags
introduction by commit 6138e687c7 ("ptp: Introduce strict checking of
external time stamp options.").
The bcm-phy-ptp.c logic never properly supported strict flag validation,
as it previously ignored all flags including both PTP_STRICT_FLAGS and the
PTP_FALLING_EDGE and PTP_RISING_EDGE flags.
Reports from users in the field prove that the hardware timestamps the
rising edge. Encode this in the .supported_extts_flags field. This
re-enables support for the PTP_EXTTS_REQUEST2 ioctl.
Reported-by: James Clark <jjc@jclark.com>
Fixes: 7c571ac57d ("net: ptp: introduce .supported_extts_flags to ptp_clock_info")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Tested-by: James Clark <jjc@jclark.com>
Link: https://patch.msgid.link/20250918-jk-fix-bcm-phy-supported-flags-v1-2-747b60407c9c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bcm_ptp_perout_locked() function has support for handling
PTP_PEROUT_DUTY_CYCLE, but its not listed in the supported_perout_flags.
Attempts to use the duty cycle support will be rejected since commit
d9f3e9ecc4 ("net: ptp: introduce .supported_perout_flags to
ptp_clock_info"), as this flag accidentally missed while doing the
conversion.
Drop the unnecessary supported flags check from the bcm_ptp_perout_locked()
function and correctly set the supported_perout_flags. This fixes use of
the PTP_PEROUT_DUTY_CYCLE support for the broadcom driver.
Reported-by: James Clark <jjc@jclark.com>
Fixes: d9f3e9ecc4 ("net: ptp: introduce .supported_perout_flags to ptp_clock_info")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Tested-by: James Clark <jjc@jclark.com>
Link: https://patch.msgid.link/20250918-jk-fix-bcm-phy-supported-flags-v1-1-747b60407c9c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull sched_ext fix from jun Heo:
"This contains a fix for sched_ext idle CPU selection that likely fixes
a substantial performance regression.
The scx_bpf_select_cpu_dfl/and() kfuncs were incorrectly detecting all
tasks as migration-disabled when called outside ops.select_cpu(),
causing them to always return -EBUSY instead of finding idle CPUs.
The fix properly distinguishes between genuinely migration-disabled
tasks vs. the current task whose migration is temporarily disabled by
BPF execution"
* tag 'sched_ext-for-6.17-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: idle: Handle migration-disabled tasks in BPF code
Pull iommufd fixes from Jason Gunthorpe:
"Fix two user triggerable use-after-free issues:
- Possible race UAF setting up mmaps
- Syzkaller found UAF when erroring an file descriptor creation ioctl
due to the fput() work queue"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd/selftest: Update the fail_nth limit
iommufd: WARN if an object is aborted with an elevated refcount
iommufd: Fix race during abort for file descriptors
iommufd: Fix refcounting race during mmap
Pull rdma fix from Jason Gunthorpe:
"Just a one line change, was expecting more rc stuff, but it has been
quiet.
- Fix mlx5 devx event delivery to userspace for certain kinds of SRQs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/mlx5: Fix obj_type mismatch for SRQ event subscriptions
Pull HID fixes from Jiri Kosina:
- work data memory corruption fix in amd_sfh (Basavaraj Natikar)
- fix for regression in cp2112 where setting a GPIO value would always
fail (Sébastien Szymanski)
- fix for regression in hid-lenovo causing driver to fail on non-ACPI
systems (Janne Grunau)
- a couple device ID additions and tiny device-specific quirks
* tag 'hid-for-linus-2025092201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: amd_sfh: Add sync across amd sfh work functions
HID: asus: add support for missing PX series fn keys
HID: cp2112: fix setter callbacks return value
HID: lenovo: Use KEY_PERFORMANCE instead of ACPI's platform_profile
HID: intel-thc-hid: intel-quickspi: Add WCL Device IDs
HID: intel-thc-hid: intel-quicki2c: Add WCL Device IDs
Pull pin control fixes from Linus Walleij:
"Two small driver fixes for the Airhoa driver:
- Correct a PHY LED mux value so the PHY LED will blink as it should
- Fix the MDIO function bitmasks, working around a HW bug to
force-enable the MDIO pins"
* tag 'pinctrl-v6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: airoha: fix wrong MDIO function bitmaks
pinctrl: airoha: fix wrong PHY LED mux value for LED1 GPIO46
When scx_bpf_select_cpu_dfl()/and() kfuncs are invoked outside of
ops.select_cpu() we can't rely on @p->migration_disabled to determine if
migration is disabled for the task @p.
In fact, migration is always disabled for the current task while running
BPF code: __bpf_prog_enter() disables migration and __bpf_prog_exit()
re-enables it.
To handle this, when @p->migration_disabled == 1, check whether @p is
the current task. If so, migration was not disabled before entering the
callback, otherwise migration was disabled.
This ensures correct idle CPU selection in all cases. The behavior of
ops.select_cpu() remains unchanged, because this callback is never
invoked for the current task and migration-disabled tasks are always
excluded.
Example: without this change scx_bpf_select_cpu_and() called from
ops.enqueue() always returns -EBUSY; with this change applied, it
correctly returns idle CPUs.
Fixes: 06efc9fe0b ("sched_ext: idle: Handle migration-disabled tasks in idle selection")
Cc: stable@vger.kernel.org # v6.16+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We were copying the bo content the bos on the list
"xe->pinned.late.kernel_bo_present" twice on suspend.
Presumingly the intent is to copy the pinned external bos on
the first pass.
This is harmless since we (currently) should have no pinned
external bos needing copy since
a) exernal system bos don't have compressed content,
b) We do not (yet) allow pinning of VRAM bos.
Still, fix this up so that we copy pinned external bos on
the first pass. We're about to allow bos pinned in VRAM.
Fixes: c6a4d46ec1 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250918092207.54472-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit 9e69bafece)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE
(already guarded by the info.skip_pcode flag) so we shouldn't
expose attributes that require any of them to avoid errors like:
[] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \
inaccessible register 0x138340+0x0
[] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe]
[] Call Trace:
[] xe_mmio_read32+0x110/0x280 [xe]
[] auto_link_downgrade_capable_show+0x2e/0x70 [xe]
[] dev_attr_show+0x1a/0x70
[] sysfs_kf_seq_show+0xaa/0x120
[] kernfs_seq_show+0x41/0x60
Fixes: 0e414bf7ad ("drm/xe: Expose PCIe link downgrade attributes")
Fixes: cdc36b66cd ("drm/xe: Expose fan control and voltage regulator version")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com
(cherry picked from commit a2d6223d22)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
A recent commit skipped dumping the usual "attempt to access beyond end
of device" message if the device size is 0 sectors, as that's a common
pattern for devices that have been hot removed. But while it stopped
that message, it also prevented returning -EIO for that condition.
Reinstate the -EIO return, while retaining the quiet operation for
triggering EOD for a device with 0 sectors.
Reported-by: syzbot+4b12286339fe4c2700c1@syzkaller.appspotmail.com
Reported-by: Sahil Chandna <chandna.linuxkernel@gmail.com>
Fixes: d0a2b527d8 ("block: tone down bio_check_eod")
Tested-by: Sahil Chandna <chandna.linuxkernel@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When a software-node gets added to a device which already has another
fwnode as primary node it will become the secondary fwnode for that
device.
Currently if a software-node with GPIO properties ends up as the secondary
fwnode then gpiod_find_by_fwnode() will fail to find the GPIOs.
Add a new gpiod_fwnode_lookup() helper which falls back to calling
gpiod_find_by_fwnode() with the secondary fwnode if the GPIO was not
found in the primary fwnode.
Fixes: e7f9ff5dc9 ("gpiolib: add support for software nodes")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/20250920200955.20403-1-hansg@kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
The commit
f9aad62200 ("mm: rename GENERIC_PTDUMP and PTDUMP_CORE")
has broken PTDUMP and the Kconfig options that use it on ARCH=i386, including
CONFIG_DEBUG_WX.
CONFIG_GENERIC_PTDUMP was renamed into CONFIG_ARCH_HAS_PTDUMP, but it was
mistakenly moved from "config X86" to "config X86_64". That made PTDUMP
unavailable for i386.
Move CONFIG_ARCH_HAS_PTDUMP back to "config X86" to fix it.
[ bp: Massage commit message. ]
Fixes: f9aad62200 ("mm: rename GENERIC_PTDUMP and PTDUMP_CORE")
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
If the cached contents of the CHCONF register doesn't have the FORCE bit
set, the setup() function failed to set the relevant idle state of the
SPI_CLK pin. In such case, the SPI_CLK's idle state is reached later with
set_cs(), but it's too late for the first SPI transfer which fails since
the CS is asserted before the clock reaching its idle state.
Add a first write in setup() that always sets the FORCE bit.
Keep the current write afterwards to ensure the FORCE bit won't stay in
the cached contents of the CHCONF register unless it's intended.
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20250912-omap-spi-fix-v1-1-f925b0d27ede@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In order to identify whether a certain file is open by a different
client, start the unlink process by sending a compound request of
CREATE(DELETE_ON_CLOSE) + CLOSE with only FILE_SHARE_DELETE bit set in
smb2_create_req::ShareAccess. If the file is currently open, then the
server will fail the request with STATUS_SHARING_VIOLATION, in which
case we'll map it to -EBUSY, so __cifs_unlink() will fall back to
silly-rename the file.
This fixes the following case where open(O_CREAT) fails with
-ENOENT (STATUS_DELETE_PENDING) due to file still open by a different
client.
* Before patch
$ mount.cifs //srv/share /mnt/1 -o ...,nosharesock
$ mount.cifs //srv/share /mnt/2 -o ...,nosharesock
$ cd /mnt/1
$ touch foo
$ exec 3<>foo
$ cd /mnt/2
$ rm foo
$ touch foo
touch: cannot touch 'foo': No such file or directory
$ exec 3>&-
* After patch
$ mount.cifs //srv/share /mnt/1 -o ...,nosharesock
$ mount.cifs //srv/share /mnt/2 -o ...,nosharesock
$ cd /mnt/1
$ touch foo
$ exec 3<>foo
$ cd /mnt/2
$ rm foo
$ touch foo
$ exec 3>&-
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Frank Sorenson <sorenson@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
vhost_task_create() creates a task and keeps a reference to its
task_struct. That task may exit early via a signal and its task_struct
will be released.
A pending vhost_task_wake() will then attempt to wake the task and
access a task_struct which is no longer there.
Acquire a reference on the task_struct while creating the thread and
release the reference while the struct vhost_task itself is removed.
If the task exits early due to a signal, then the vhost_task_wake() will
still access a valid task_struct. The wake is safe and will be skipped
in this case.
Fixes: f9010dbdce ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Reported-by: Sean Christopherson <seanjc@google.com>
Closes: https://lore.kernel.org/all/aKkLEtoDXKxAAWju@google.com/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Message-Id: <20250918181144.Ygo8BZ-R@linutronix.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Pull clk fixes from Stephen Boyd:
"Fixes to the Allwinner and Renesas clk drivers:
- Do the math properly in Allwinner's ccu_mp_recalc_rate() so clk
rates aren't bogus
- Fix a clock domain regression on Renesas R-Car M1A, R-Car H1,
and RZ/A1 by registering the domain after the pmdomain bus is
registered instead of before"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: mp: Fix dual-divider clock rate readback
clk: renesas: mstp: Add genpd OF provider at postcore_initcall()
Pull a few more btrfs fixes from David Sterba:
- in tree-checker, fix wrong size of check for inode ref item
- in ref-verify, handle combination of mount options that allow
partially damaged extent tree (reported by syzbot)
- additional validation of compression mount option to catch invalid
string as level
* tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: reject invalid compression level
btrfs: ref-verify: handle damaged extent root tree
btrfs: tree-checker: fix the incorrect inode ref size check
Pull SCSI fix from James Bottomley:
"One driver fix for a dma error checking thinko"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: mcq: Fix memory allocation checks for SQE and CQE
Pull firewire fix from Takashi Sakamoto:
"When new structures and events were added to UAPI in v6.5 kernel, the
required update to the subsystem ABI version returned to userspace
client was overlooked. The version is now updated"
* tag 'firewire-fixes-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: fix overlooked update of subsystem ABI version
Pull x86 fix from Ingo Molnar:
"Fix a SEV-SNP regression when CONFIG_KVM_AMD_SEV is disabled"
* tag 'x86-urgent-2025-09-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Guard sev_evict_cache() with CONFIG_AMD_MEM_ENCRYPT
syzbot managed to trigger the following race:
T1 T2
futex_wait_requeue_pi()
futex_do_wait()
schedule()
futex_requeue()
futex_proxy_trylock_atomic()
futex_requeue_pi_prepare()
requeue_pi_wake_futex()
futex_requeue_pi_complete()
/* preempt */
* timeout/ signal wakes T1 *
futex_requeue_pi_wakeup_sync() // Q_REQUEUE_PI_LOCKED
futex_hash_put()
// back to userland, on stack futex_q is garbage
/* back */
wake_up_state(q->task, TASK_NORMAL);
In this scenario futex_wait_requeue_pi() is able to leave without using
futex_q::lock_ptr for synchronization.
This can be prevented by reading futex_q::task before updating the
futex_q::requeue_state. A reference on the task_struct is not needed
because requeue_pi_wake_futex() is invoked with a spinlock_t held which
implies a RCU read section.
Even if T1 terminates immediately after, the task_struct will remain valid
during T2's wake_up_state(). A READ_ONCE on futex_q::task before
futex_requeue_pi_complete() is enough because it ensures that the variable
is read before the state is updated.
Read futex_q::task before updating the requeue state, use it for the
following wakeup.
Fixes: 07d91ef510 ("futex: Prevent requeue_pi() lock nesting issue on RT")
Reported-by: syzbot+034246a838a10d181e78@syzkaller.appspotmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/all/68b75989.050a0220.3db4df.01dd.GAE@google.com/
hci_resume_advertising_sync is suppose to resume all instance paused by
hci_pause_advertising_sync, this logic is used for procedures are only
allowed when not advertising, but instance 0x00 was not being
re-enabled.
Fixes: ad383c2c65 ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Some Kconfig dependencies are needed after my recent cleanup, since
the core code has its own option.
Since btmtksdio does not actually call h4_recv_buf(), move the
definitions it uses outside the BT_HCIUART_H4 gate in hci_uart.h to
avoid adding a dependency for btmtksdio.
The rest I touched (bpa10x, btmtkuart, and btnxpuart) do really call
h4_recv_buf(), so the dependency is required, add it for them.
Fixes: 0e272fc7e17d ("Bluetooth: remove duplicate h4_recv_buf() in header")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508300413.OnIedvRh-lkp@intel.com/
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
commit 2a6c727387 ("cpufreq: Initialize cpufreq-based
frequency-invariance later") postponed the frequency invariance
initialization to avoid disabling it in the error case.
This isn't locking safe, instead move the initialization up before
the subsys interface is registered (which will rebuild the
sched_domains) and add the corresponding disable on the error path.
Observed lockdep without this patch:
[ 0.989686] ======================================================
[ 0.989688] WARNING: possible circular locking dependency detected
[ 0.989690] 6.17.0-rc4-cix-build+ #31 Tainted: G S
[ 0.989691] ------------------------------------------------------
[ 0.989692] swapper/0/1 is trying to acquire lock:
[ 0.989693] ffff800082ada7f8 (sched_energy_mutex){+.+.}-{4:4}, at: rebuild_sched_domains_energy+0x30/0x58
[ 0.989705]
but task is already holding lock:
[ 0.989706] ffff000088c89bc8 (&policy->rwsem){+.+.}-{4:4}, at: cpufreq_online+0x7f8/0xbe0
[ 0.989713]
which lock already depends on the new lock.
Fixes: 2a6c727387 ("cpufreq: Initialize cpufreq-based frequency-invariance later")
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull an Allwinner clk driver fix from Chen-Yu Tsai:
- One fix for the clock rate readback on the recently added dual
divider clocks
* tag 'sunxi-clk-fixes-for-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: mp: Fix dual-divider clock rate readback
In kernel v6.5, several functions were added to the cdev layer. This
required updating the default version of subsystem ABI up to 6, but
this requirement was overlooked.
This commit updates the version accordingly.
Fixes: 6add87e976 ("firewire: cdev: add new version of ABI to notify time stamp at request/response subaction of transaction#")
Link: https://lore.kernel.org/r/20250920025148.163402-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Pull smb client fixes from Steve French:
- Two unlink fixes: one for rename and one for deferred close
- Four smbdirect/RDMA fixes: fix buffer leak in negotiate, two fixes
for races in smbd_destroy, fix offset and length checks in recv_done
* tag '6.17-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix smbdirect_recv_io leak in smbd_negotiate() error path
smb: client: fix file open check in __cifs_unlink()
smb: client: let smbd_destroy() call disable_work_sync(&info->post_send_credits_work)
smb: client: use disable[_delayed]_work_sync in smbdirect.c
smb: client: fix filename matching of deferred files
smb: client: let recv_done verify data_offset, data_length and remaining_data_length
Pull iommu fixes from Joerg Roedel:
- Fixes for memory leak and memory corruption bugs on S390 and AMD-Vi
- Race condition fix in AMD-Vi page table code and S390 device attach
code
- Intel VT-d: Fix alignment checks in __domain_mapping()
- AMD-Vi: Fix potentially incorrect DTE settings when device has
aliases
* tag 'iommu-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/amd/pgtbl: Fix possible race while increase page table level
iommu/amd: Fix alias device DTE setting
iommu/s390: Make attach succeed when the device was surprise removed
iommu/vt-d: Fix __domain_mapping()'s usage of switch_to_super_page()
iommu/s390: Fix memory corruption when using identity domain
iommu/amd: Fix ivrs_base memleak in early_amd_iommu_init()
Pull block fixes from Jens Axboe:
"A set of fixes for an issue with md array assembly and drbd for
devices supporting write zeros"
* tag 'block-6.17-20250918' of git://git.kernel.dk/linux:
drbd: init queue_limits->max_hw_wzeroes_unmap_sectors parameter
md: init queue_limits->max_hw_wzeroes_unmap_sectors parameter
Pull io_uring fixes from Jens Axboe:
- Fix for a regression introduced in the io-wq worker creation logic.
- Remove the allocation cache for the msg_ring io_kiocb allocations. I
have a suspicion that there's a bug there, and since we just fixed
one in that area, let's just yank the use of that cache entirely.
It's not that important, and it kills some code.
- Treat a closed ring like task exiting in that any requests that
trigger post that condition should just get canceled. Doesn't fix any
real issues, outside of having tasks being able to rely on that
guarantee.
- Fix for a bug in the network zero-copy notification mechanism, where
a comparison for matching tctx/ctx for notifications was buggy in
that it didn't correctly compare with the previous notification.
* tag 'io_uring-6.17-20250919' of git://git.kernel.dk/linux:
io_uring: fix incorrect io_kiocb reference in io_link_skb
io_uring/msg_ring: kill alloc_cache for io_kiocb allocations
io_uring: include dying ring in task_work "should cancel" state
io_uring/io-wq: fix `max_workers` breakage and `nr_workers` underflow
Pull gpio fixes from Bartosz Golaszewski:
- fix an ACPI I2C HID driver breakage due to not initializing a
structure on the stack and passing garbage down to GPIO core
- ignore touchpad wakeup on GPD G1619-05
- fix debouncing configuration when looking up GPIOs in ACPI
* tag 'gpio-fixes-for-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: acpi: initialize acpi_gpio_info struct
gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-05
gpiolib: acpi: Program debounce when finding GPIO
Pull MMC host fixes from Ulf Hansson:
- mvsdio: Fix dma_unmap_sg() nents value
- sdhci: Fix clock management for UHS-II
- sdhci-pci-gli: Fix initialization of UHS-II for GL9767
* tag 'mmc-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-pci-gli: GL9767: Fix initializing the UHS-II interface during a power-on
mmc: sdhci-uhs2: Fix calling incorrect sdhci_set_clock() function
mmc: sdhci: Move the code related to setting the clock from sdhci_set_ios_common() into sdhci_set_ios()
mmc: mvsdio: Fix dma_unmap_sg() nents value
Pull LoongArch fixes from Huacai Chen:
"Fix some build warnings for RUST-enabled objtool check, align ACPI
structures for ARCH_STRICT_ALIGN, fix an unreliable stack for live
patching, add some NULL pointer checkings, and fix some bugs around
KVM"
* tag 'loongarch-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_pch_pic_regs_access()
LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_sw_status_access()
LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_regs_access()
LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()
LoongArch: KVM: Fix VM migration failure with PTW enabled
LoongArch: KVM: Remove unused returns and semicolons
LoongArch: vDSO: Check kcalloc() result in init_vdso()
LoongArch: Fix unreliable stack for live patching
LoongArch: Replace sprintf() with sysfs_emit()
LoongArch: Check the return value when creating kobj
LoongArch: Align ACPI structures if ARCH_STRICT_ALIGN enabled
LoongArch: Update help info of ARCH_STRICT_ALIGN
LoongArch: Handle jump tables options for RUST
LoongArch: Make LTO case independent in Makefile
objtool/LoongArch: Mark special atomic instruction as INSN_BUG type
objtool/LoongArch: Mark types based on break immediate code
Vincent Mailhol <mailhol@kernel.org> says:
Four drivers, namely etas_es58x, hi311x, sun4i_can and mcba_usb forgot
to populate their net_device_ops->ndo_change_mtu(). Because of that,
the user is free to configure any MTU on these interfaces.
This can be abused by an attacker who could craft some skbs and send
them through PF_PACKET to perform a buffer overflow of up to 247 bytes
in each of these drivers.
This series contains four patches, one for each of the drivers, to add
the missing ndo_change_mtu() callback. The descriptions contain
detailed explanations of how the buffer overflow could be triggered.
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-0-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sending an PF_PACKET allows to bypass the CAN framework logic and to
directly reach the xmit() function of a CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.
Unfortunately, because the mcba_usb driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:
$ ip link set can0 mtu 9999
After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))
to inject a malicious CAN XL frames. For example:
struct canxl_frame frame = {
.flags = 0xff,
.len = 2048,
};
The CAN drivers' xmit() function are calling can_dev_dropped_skb() to
check that the skb is valid, unfortunately under above conditions, the
malicious packet is able to go through can_dev_dropped_skb() checks:
1. the skb->protocol is set to ETH_P_CANXL which is valid (the
function does not check the actual device capabilities).
2. the length is a valid CAN XL length.
And so, mcba_usb_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN frame.
This can result in a buffer overflow. The driver will consume cf->len
as-is with no further checks on these lines:
usb_msg.dlc = cf->len;
memcpy(usb_msg.data, cf->data, usb_msg.dlc);
Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs!
Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.
Fixes: 51f3baad7d ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-4-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sending an PF_PACKET allows to bypass the CAN framework logic and to
directly reach the xmit() function of a CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.
Unfortunately, because the sun4i_can driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:
$ ip link set can0 mtu 9999
After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))
to inject a malicious CAN XL frames. For example:
struct canxl_frame frame = {
.flags = 0xff,
.len = 2048,
};
The CAN drivers' xmit() function are calling can_dev_dropped_skb() to
check that the skb is valid, unfortunately under above conditions, the
malicious packet is able to go through can_dev_dropped_skb() checks:
1. the skb->protocol is set to ETH_P_CANXL which is valid (the
function does not check the actual device capabilities).
2. the length is a valid CAN XL length.
And so, sun4ican_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN frame.
This can result in a buffer overflow. The driver will consume cf->len
as-is with no further checks on this line:
dlc = cf->len;
Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs a
couple line below when doing:
for (i = 0; i < dlc; i++)
writel(cf->data[i], priv->base + (dreg + i * 4));
Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.
Fixes: 0738eff14d ("can: Allwinner A10/A20 CAN Controller support - Kernel module")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-3-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sending an PF_PACKET allows to bypass the CAN framework logic and to
directly reach the xmit() function of a CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.
Unfortunately, because the sun4i_can driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:
$ ip link set can0 mtu 9999
After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))
to inject a malicious CAN XL frames. For example:
struct canxl_frame frame = {
.flags = 0xff,
.len = 2048,
};
The CAN drivers' xmit() function are calling can_dev_dropped_skb() to
check that the skb is valid, unfortunately under above conditions, the
malicious packet is able to go through can_dev_dropped_skb() checks:
1. the skb->protocol is set to ETH_P_CANXL which is valid (the
function does not check the actual device capabilities).
2. the length is a valid CAN XL length.
And so, hi3110_hard_start_xmit() receives a CAN XL frame which it is
not able to correctly handle and will thus misinterpret it as a CAN
frame. The driver will consume frame->len as-is with no further
checks.
This can result in a buffer overflow later on in hi3110_hw_tx() on
this line:
memcpy(buf + HI3110_FIFO_EXT_DATA_OFF,
frame->data, frame->len);
Here, frame->len corresponds to the flags field of the CAN XL frame.
In our previous example, we set canxl_frame->flags to 0xff. Because
the maximum expected length is 8, a buffer overflow of 247 bytes
occurs!
Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.
Fixes: 57e83fb9b7 ("can: hi311x: Add Holt HI-311x CAN driver")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-2-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sending an PF_PACKET allows to bypass the CAN framework logic and to
directly reach the xmit() function of a CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.
Unfortunately, because the etas_es58x driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:
$ ip link set can0 mtu 9999
After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL));
to inject a malicious CAN XL frames. For example:
struct canxl_frame frame = {
.flags = 0xff,
.len = 2048,
};
The CAN drivers' xmit() function are calling can_dev_dropped_skb() to
check that the skb is valid, unfortunately under above conditions, the
malicious packet is able to go through can_dev_dropped_skb() checks:
1. the skb->protocol is set to ETH_P_CANXL which is valid (the
function does not check the actual device capabilities).
2. the length is a valid CAN XL length.
And so, es58x_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN(FD)
frame.
This can result in a buffer overflow. For example, using the es581.4
variant, the frame will be dispatched to es581_4_tx_can_msg(), go
through the last check at the beginning of this function:
if (can_is_canfd_skb(skb))
return -EMSGSIZE;
and reach this line:
memcpy(tx_can_msg->data, cf->data, cf->len);
Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs!
Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU or
CANFD_MTU (depending on the device capabilities). By fixing the root
cause, this prevents the buffer overflow.
Fixes: 8537257874 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-1-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This issue is similar to the vulnerability in the `mcp251x` driver,
which was fixed in commit 03c427147b ("can: mcp251x: fix resume from
sleep before interface was brought up").
In the `hi311x` driver, when the device resumes from sleep, the driver
schedules `priv->restart_work`. However, if the network interface was
not previously enabled, the `priv->wq` (workqueue) is not allocated and
initialized, leading to a null pointer dereference.
To fix this, we move the allocation and initialization of the workqueue
from the `hi3110_open` function to the `hi3110_can_probe` function.
This ensures that the workqueue is properly initialized before it is
used during device resume. And added logic to destroy the workqueue
in the error handling paths of `hi3110_can_probe` and in the
`hi3110_can_remove` function to prevent resource leaks.
Signed-off-by: Chen Yufeng <chenyufeng@iie.ac.cn>
Link: https://patch.msgid.link/20250911150820.250-1-chenyufeng@iie.ac.cn
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull crypto fixes from Herbert Xu:
"This fixes a NULL pointer dereference in ccp and a couple of bugs in
the af_alg interface"
* tag 'v6.17-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg
crypto: af_alg - Set merge to zero early in af_alg_sendmsg
crypto: ccp - Always pass in an error pointer to __sev_platform_shutdown_locked()
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. The volume became higher than wished, but
nothing really stands out -- all small, nice and smooth.
A slightly large change is found in qcom USB-audio offload stuff, but
this is a regression fix specific to this device, hence it should be
safe to apply at this late stage.
- Various small fixes for ASoC Cirrus, Realtek, lpass, Intel and
Qualcomm drivers
- ASoC SoundWire fixes
- A few TAS2781 HD-audio side-codec driver fixes
- A fix for Qualcomm USB-audio offload breakage
- Usual a few HD-audio quirks"
* tag 'sound-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (35 commits)
ALSA: hda/realtek: Fix mute led for HP Laptop 15-dw4xx
ALSA: hda: intel-dsp-config: Prevent SEGFAULT if ACPI_HANDLE() is NULL
ALSA: usb: qcom: Fix false-positive address space check
ASoC: rt5682s: Adjust SAR ADC button mode to fix noise issue
ASoC: Intel: PTL: Add entry for HDMI-In capture support to non-I2S codec boards.
ASoC: amd: acp: Fix incorrect retrival of acp_chip_info
ASoC: Intel: sof_sdw: use PRODUCT_FAMILY for Fatcat series
ASoC: qcom: sc8280xp: Fix sound card driver name match data for QCS8275
ALSA: hda/realtek: Fix volume control on Lenovo Thinkbook 13x Gen 4
ALSA: hda/realtek: Support Lenovo Thinkbook 13x Gen 5
ALSA: hda: cs35l41: Support Lenovo Thinkbook 13x Gen 5
ALSA: hda/realtek: Add ALC295 Dell TAS2781 I2C fixup
ALSA: hda/tas2781: Fix a potential race condition that causes a NULL pointer in case no efi.get_variable exsits
ASoC: qcom: sc8280xp: Enable DAI format configuration for MI2S interfaces
ASoC: qcom: q6apm-lpass-dais: Fix missing set_fmt DAI op for I2S
ASoC: qcom: audioreach: Fix lpaif_type configuration for the I2S interface
ASoC: Intel: catpt: Expose correct bit depth to userspace
ALSA: hda/tas2781: Fix the order of TAS2781 calibrated-data
ASoC: codecs: lpass-wsa-macro: Fix speaker quality distortion
ASoC: codecs: lpass-rx-macro: Fix playback quality distortion
...
Pull drm fixes from Dave Airlie:
"Weekly fixes for drm, it's a bit busier than I'd like on the xe side
this week, but otherwise amdgpu and some smaller fixes for i915/bridge
and a revert on docs.
docs:
- fix docs build regression
i915:
- Honor VESA eDP backlight luminance control capability
bridge:
- anx7625: Fix NULL pointer dereference with early IRQ
- cdns-mhdp8546: Fix missing mutex unlock on error path
xe:
- Release kobject for the failure path
- SRIOV PF: Drop rounddown_pow_of_two fair
- Remove type casting on hwmon
- Defer free of NVM auxiliary container to device release
- Fix a NULL vs IS_ERR
- Add cleanup action in xe_device_sysfs_init
- Fix error handling if PXP fails to start
- Set GuC RCS/CCS yield policy
amdgpu:
- GC 11.0.1/4 cleaner shader support
- DC irq fix
- OD fix
amdkfd:
- S0ix fix"
* tag 'drm-fixes-2025-09-19' of https://gitlab.freedesktop.org/drm/kernel:
drm/amdgpu: suspend KFD and KGD user queues for S0ix
drm/amdkfd: add proper handling for S0ix
drm/xe/guc: Set RCS/CCS yield policy
drm/xe: Fix error handling if PXP fails to start
drm/xe/sysfs: Add cleanup action in xe_device_sysfs_init
drm/amd: Only restore cached manual clock settings in restore if OD enabled
drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()
drm: bridge: cdns-mhdp8546: Fix missing mutex unlock on error path
drm/i915/backlight: Honor VESA eDP backlight luminance control capability
drm/amd/display: Allow RX6xxx & RX7700 to invoke amdgpu_irq_get/put
drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs
drm: bridge: anx7625: Fix NULL pointer dereference with early IRQ
drm/xe: defer free of NVM auxiliary container to device release callback
drm/xe/hwmon: Remove type casting
drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation
drm/xe/tile: Release kobject for the failure path
Revert "drm: Add directive to format code in comment"
If something holds a refcount then it is at risk of UAFing. For abort
paths we expect the caller to never share the object with a parallel
thread and to clean up any refcounts it obtained on its own.
Add the missing dec inside iommufd_hwpt_paging_alloc() during error unwind
by making iommufd_hw_pagetable_attach/detach() proper pairs.
Link: https://patch.msgid.link/r/2-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
fput() doesn't actually call file_operations release() synchronously, it
puts the file on a work queue and it will be released eventually.
This is normally fine, except for iommufd the file and the iommufd_object
are tied to gether. The file has the object as it's private_data and holds
a users refcount, while the object is expected to remain alive as long as
the file is.
When the allocation of a new object aborts before installing the file it
will fput() the file and then go on to immediately kfree() the obj. This
causes a UAF once the workqueue completes the fput() and tries to
decrement the users refcount.
Fix this by putting the core code in charge of the file lifetime, and call
__fput_sync() during abort to ensure that release() is called before
kfree. __fput_sync() is a bit too tricky to open code in all the object
implementations. Instead the objects tell the core code where the file
pointer is and the core will take care of the life cycle.
If the object is successfully allocated then the file will hold a users
refcount and the iommufd_object cannot be destroyed.
It is worth noting that close(); ioctl(IOMMU_DESTROY); doesn't have an
issue because close() is already using a synchronous version of fput().
The UAF looks like this:
BUG: KASAN: slab-use-after-free in iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
Write of size 4 at addr ffff888059c97804 by task syz.0.46/6164
CPU: 0 UID: 0 PID: 6164 Comm: syz.0.46 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xcd/0x630 mm/kasan/report.c:482
kasan_report+0xe0/0x110 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:189
instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:400 [inline]
__refcount_dec include/linux/refcount.h:455 [inline]
refcount_dec include/linux/refcount.h:476 [inline]
iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
__fput+0x402/0xb70 fs/file_table.c:468
task_work_run+0x14d/0x240 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop+0xeb/0x110 kernel/entry/common.c:43
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x41c/0x4c0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Link: https://patch.msgid.link/r/1-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 07838f7fd5 ("iommufd: Add iommufd fault object")
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reported-by: syzbot+80620e2d0d0a33b09f93@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/68c8583d.050a0220.2ff435.03a2.GAE@google.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The owner object of the imap can be destroyed while the imap remains in
the mtree. So access to the imap pointer without holding locks is racy
with destruction.
The imap is safe to access outside the lock once a users refcount is
obtained, the owner object cannot start destruction until users is 0.
Thus the users refcount should not be obtained at the end of
iommufd_fops_mmap() but instead inside the mtree lock held around the
mtree_load(). Move the refcount there and use refcount_inc_not_zero() as
we can have a 0 refcount inside the mtree during destruction races.
Link: https://patch.msgid.link/r/0-v1-e6faace50971+3cc-iommufd_mmap_fix_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 56e9a0d8e5 ("iommufd: Add mmap interface")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In io_link_skb function, there is a bug where prev_notif is incorrectly
assigned using 'nd' instead of 'prev_nd'. This causes the context
validation check to compare the current notification with itself instead
of comparing it with the previous notification.
Fix by using the correct prev_nd parameter when obtaining prev_notif.
Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: 6fe4220912 ("io_uring/notif: implement notification stacking")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 8c2e6b26ff ("vhost/net: Defer TX queue re-enable until after
sendmsg") tries to defer the notification enabling by moving the logic
out of the loop after the vhost_tx_batch() when nothing new is spotted.
This caused unexpected side effects as the new logic is reused for
several other error conditions.
A previous patch reverted 8c2e6b26ff. Now, bring the performance
back up by flushing batched buffers before enabling notifications.
Reported-by: Jon Kohler <jon@nutanix.com>
Cc: stable@vger.kernel.org
Fixes: 8c2e6b26ff ("vhost/net: Defer TX queue re-enable until after sendmsg")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20250917063045.2042-3-jasowang@redhat.com>
This reverts commit 8c2e6b26ff. It tries
to defer the notification enabling by moving the logic out of the loop
after the vhost_tx_batch() when nothing new is spotted. This will
bring side effects as the new logic would be reused for several other
error conditions.
One example is the IOTLB: when there's an IOTLB miss, get_tx_bufs()
might return -EAGAIN and exit the loop and see there's still available
buffers, so it will queue the tx work again until userspace feed the
IOTLB entry correctly. This will slowdown the tx processing and
trigger the TX watchdog in the guest as reported in
https://lkml.org/lkml/2025/9/10/1596.
To fix, revert the change. A follow up patch will bring the performance
back in a safe way.
Reported-by: Jon Kohler <jon@nutanix.com>
Cc: stable@vger.kernel.org
Fixes: 8c2e6b26ff ("vhost/net: Defer TX queue re-enable until after sendmsg")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20250917063045.2042-2-jasowang@redhat.com>
Commit 67a873df0c ("vhost: basic in order support") pass the number
of used elem to vhost_net_rx_peek_head_len() to make sure it can
signal the used correctly before trying to do busy polling. But it
forgets to clear the count, this would cause the count run out of sync
with handle_rx() and break the busy polling.
Fixing this by passing the pointer of the count and clearing it after
the signaling the used.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 67a873df0c ("vhost: basic in order support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20250917063045.2042-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The AMD IOMMU host page table implementation supports dynamic page table levels
(up to 6 levels), starting with a 3-level configuration that expands based on
IOVA address. The kernel maintains a root pointer and current page table level
to enable proper page table walks in alloc_pte()/fetch_pte() operations.
The IOMMU IOVA allocator initially starts with 32-bit address and onces its
exhuasted it switches to 64-bit address (max address is determined based
on IOMMU and device DMA capability). To support larger IOVA, AMD IOMMU
driver increases page table level.
But in unmap path (iommu_v1_unmap_pages()), fetch_pte() reads
pgtable->[root/mode] without lock. So its possible that in exteme corner case,
when increase_address_space() is updating pgtable->[root/mode], fetch_pte()
reads wrong page table level (pgtable->mode). It does compare the value with
level encoded in page table and returns NULL. This will result is
iommu_unmap ops to fail and upper layer may retry/log WARN_ON.
CPU 0 CPU 1
------ ------
map pages unmap pages
alloc_pte() -> increase_address_space() iommu_v1_unmap_pages() -> fetch_pte()
pgtable->root = pte (new root value)
READ pgtable->[mode/root]
Reads new root, old mode
Updates mode (pgtable->mode += 1)
Since Page table level updates are infrequent and already synchronized with a
spinlock, implement seqcount to enable lock-free read operations on the read path.
Fixes: 754265bcab ("iommu/amd: Fix race in increase_address_space()")
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: stable@vger.kernel.org
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When adding new VM MAC, driver checks only *active* filters in
vsi->mac_filter_hash. Each MAC, even in non-active state is using resources.
To determine number of MACs VM uses, count VSI filters in *any* state.
Add i40e_count_all_filters() to simply count all filters, and rename
i40e_count_filters() to i40e_count_active_filters() to avoid ambiguity.
Fixes: cfb1d572c9 ("i40e: Add ensurance of MacVlan resources for every trusted VF")
Cc: stable@vger.kernel.org
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ITR index (itr_idx) is only 2 bits wide. When constructing the
register value for QINT_RQCTL, all fields are ORed together. Without
masking, higher bits from itr_idx may overwrite adjacent fields in the
register.
Apply I40E_QINT_RQCTL_ITR_INDX_MASK to ensure only the intended bits are
set.
Fixes: 5c3c48ac6b ("i40e: implement virtual device interface")
Cc: stable@vger.kernel.org
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
VF state I40E_VF_STATE_ACTIVE is not the only state in which
VF is actually active so it should not be used to determine
if a VF is allowed to obtain resources.
Use I40E_VF_STATE_RESOURCES_LOADED that is set only in
i40e_vc_get_vf_resources_msg() and cleared during reset.
Fixes: 61125b8be8 ("i40e: Fix failed opcode appearing if handling messages from VF")
Cc: stable@vger.kernel.org
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The `ring_len` parameter provided by the virtual function (VF)
is assigned directly to the hardware memory context (HMC) without
any validation.
To address this, introduce an upper boundary check for both Tx and Rx
queue lengths. The maximum number of descriptors supported by the
hardware is 8k-32.
Additionally, enforce alignment constraints: Tx rings must be a multiple
of 8, and Rx rings must be a multiple of 32.
Fixes: 5c3c48ac6b ("i40e: implement virtual device interface")
Cc: stable@vger.kernel.org
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull runtime verifier fixes from Steven Rostedt:
- Fix build in some RISC-V flavours
Some system calls only are available for the 64bit RISC-V machines.
#ifdef out the cases of clock_nanosleep and futex in the sleep
monitor if they are not supported by the architecture.
- Fix wrong cast, obsolete after refactoring
Use container_of() to get to the rv_monitor structure from the
enable_monitors_next() 'p' pointer. The assignment worked only
because the list field used happened to be the first field of the
structure.
- Remove redundant include files
Some include files were listed twice. Remove the extra ones and sort
the includes.
- Fix missing unlock on failure
There was an error path that exited the rv_register_monitor()
function without releasing a lock. Change that to goto the lock
release.
- Add Gabriele Monaco to be Runtime Verifier maintainer
Gabriele is doing most of the work on RV as well as collecting
patches. Add him to the maintainers file for Runtime Verification.
* tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Add Gabriele Monaco as maintainer for Runtime Verification
rv: Fix missing mutex unlock in rv_register_monitor()
include/linux/rv.h: remove redundant include file
rv: Fix wrong type cast in enabled_monitors_next()
rv: Support systems with time64-only syscalls
A recent commit:
fc582cd26e ("io_uring/msg_ring: ensure io_kiocb freeing is deferred for RCU")
fixed an issue with not deferring freeing of io_kiocb structs that
msg_ring allocates to after the current RCU grace period. But this only
covers requests that don't end up in the allocation cache. If a request
goes into the alloc cache, it can get reused before it is sane to do so.
A recent syzbot report would seem to indicate that there's something
there, however it may very well just be because of the KASAN poisoning
that the alloc_cache handles manually.
Rather than attempt to make the alloc_cache sane for that use case, just
drop the usage of the alloc_cache for msg_ring request payload data.
Fixes: 50cf5f3842 ("io_uring/msg_ring: add an alloc cache for io_kiocb entries")
Link: https://lore.kernel.org/io-uring/68cc2687.050a0220.139b6.0005.GAE@google.com/
Reported-by: syzbot+baa2e0f4e02df602583e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This laptop uses the ALC236 codec with COEF 0x7 and idx 1 to
control the mute LED. Enable the existing quirk for this device.
Signed-off-by: Praful Adiga <praful.adiga@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless. No known regressions at this point.
Current release - fix to a fix:
- eth: Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
- wifi: iwlwifi: pcie: fix byte count table for 7000/8000 devices
- net: clear sk->sk_ino in sk_set_socket(sk, NULL), fix CRIU
Previous releases - regressions:
- bonding: set random address only when slaves already exist
- rxrpc: fix untrusted unsigned subtract
- eth:
- ice: fix Rx page leak on multi-buffer frames
- mlx5: don't return mlx5_link_info table when speed is unknown
Previous releases - always broken:
- tls: make sure to abort the stream if headers are bogus
- tcp: fix null-deref when using TCP-AO with TCP_REPAIR
- dpll: fix skipping last entry in clock quality level reporting
- eth: qed: don't collect too many protection override GRC elements,
fix memory corruption"
* tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
cnic: Fix use-after-free bugs in cnic_delete_task
devlink rate: Remove unnecessary 'static' from a couple places
MAINTAINERS: update sundance entry
net: liquidio: fix overflow in octeon_init_instr_queue()
net: clear sk->sk_ino in sk_set_socket(sk, NULL)
Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
selftests: tls: test skb copy under mem pressure and OOB
tls: make sure to abort the stream if headers are bogus
selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
octeon_ep: fix VF MAC address lifecycle handling
selftests: bonding: add vlan over bond testing
bonding: don't set oif to bond dev when getting NS target destination
net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
net/mlx5e: Add a miss level for ipsec crypto offload
net/mlx5e: Harden uplink netdev access against device unbind
MAINTAINERS: make the DPLL entry cover drivers
doc/netlink: Fix typos in operation attributes
igc: don't fail igc_probe() on LED setup error
...
Pull kvm fixes from Paolo Bonzini:
"These are mostly Oliver's Arm changes: lock ordering fixes for the
vGIC, and reverts for a buggy attempt to avoid RCU stalls on large
VMs.
Arm:
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for
managing the VCNR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly
programming at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding
scenarios where an xarray's spinlock was nested with a *raw*
spinlock
- Permit stage-2 read permission aborts which are possible in the
case of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
- Reverted a couple of fixes for RCU stalls when destroying a stage-2
page table.
There appears to be some nasty refcounting / UAF issues lurking in
those patches and the band-aid we tried to apply didn't hold.
s390:
- mm fixes, including userfaultfd bug fix
x86:
- Sync the vTPR from the local APIC to the VMCB even when AVIC is
active.
This fixes a bug where host updates to the vTPR, e.g. via
KVM_SET_LAPIC or emulation of a guest access, are lost and result
in interrupt delivery issues in the guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()"
Revert "KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables"
KVM: arm64: vgic: fix incorrect spinlock API usage
KVM: arm64: Remove stage 2 read fault check
KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment
KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation
KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
KVM: arm64: Spin off release helper from vgic_put_irq()
KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
KVM: arm64: vgic: Drop stale comment on IRQ active state
KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly
KVM: arm64: Initialize PMSCR_EL1 when in VHE
KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries
KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
KVM: s390: Fix incorrect usage of mmu_notifier_register()
KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
KVM: arm64: Mark freed S2 MMUs as invalid
When running task_work for an exiting task, rather than perform the
issue retry attempt, the task_work is canceled. However, this isn't
done for a ring that has been closed. This can lead to requests being
successfully completed post the ring being closed, which is somewhat
confusing and surprising to an application.
Rather than just check the task exit state, also include the ring
ref state in deciding whether or not to terminate a given request when
run from task_work.
Cc: stable@vger.kernel.org # 6.1+
Link: https://github.com/axboe/liburing/discussions/1459
Reported-by: Benedek Thaler <thaler@thaler.hu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and new HW support:
- amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list
- amd/pmf: Support new ACPI ID AMDI0108
- asus-wmi: Re-add extra keys to ignore_key_wlan quirk
- oxpec: Add support for AOKZOE A1X and OneXPlayer X1Pro EVA-02"
* tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
platform/x86/amd/pmf: Support new ACPI ID AMDI0108
platform/x86: oxpec: Add support for AOKZOE A1X
platform/x86: oxpec: Add support for OneXPlayer X1Pro EVA-02
platform/x86/amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list
Pull UML fixes from Johannes Berg:
"A few fixes for UML, which I'd meant to send earlier but then forgot.
All of them are pretty long-standing issues that are either not really
happening (the UAF), in rarely used code (the FD buffer issue), or an
issue only for some host configurations (the executable stack):
- mark stack not executable to work on more modern systems with
selinux
- fix use-after-free in a virtio error path
- fix stack buffer overflow in external unix socket FD receive
function"
* tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Fix FD copy size in os_rcv_fd_msg()
um: virtio_uml: Fix use-after-free after put_device in probe
um: Don't mark stack executable
The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, the synctstamp_work is cyclic, the likelihood of triggering
the bug is nonnegligible.
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback)
otx2_remove() |
otx2_ptp_destroy() | otx2_sync_tstamp()
cancel_delayed_work() |
kfree(ptp) |
| ptp = container_of(...); //UAF
| ptp-> //UAF
This is confirmed by a KASAN report:
BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800aa09a18 by task bash/136
...
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_report+0xcf/0x610
? __run_timer_base.part.0+0x7d7/0x8c0
kasan_report+0xb8/0xf0
? __run_timer_base.part.0+0x7d7/0x8c0
__run_timer_base.part.0+0x7d7/0x8c0
? __pfx___run_timer_base.part.0+0x10/0x10
? __pfx_read_tsc+0x10/0x10
? ktime_get+0x60/0x140
? lapic_next_event+0x11/0x20
? clockevents_program_event+0x1d4/0x2a0
run_timer_softirq+0xd1/0x190
handle_softirqs+0x16a/0x550
irq_exit_rcu+0xaf/0xe0
sysvec_apic_timer_interrupt+0x70/0x80
</IRQ>
...
Allocated by task 1:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x7f/0x90
otx2_ptp_init+0xb1/0x860
otx2_probe+0x4eb/0xc30
local_pci_probe+0xdc/0x190
pci_device_probe+0x2fe/0x470
really_probe+0x1ca/0x5c0
__driver_probe_device+0x248/0x310
driver_probe_device+0x44/0x120
__driver_attach+0xd2/0x310
bus_for_each_dev+0xed/0x170
bus_add_driver+0x208/0x500
driver_register+0x132/0x460
do_one_initcall+0x89/0x300
kernel_init_freeable+0x40d/0x720
kernel_init+0x1a/0x150
ret_from_fork+0x10c/0x1a0
ret_from_fork_asm+0x1a/0x30
Freed by task 136:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3a/0x60
__kasan_slab_free+0x3f/0x50
kfree+0x137/0x370
otx2_ptp_destroy+0x38/0x80
otx2_remove+0x10d/0x4c0
pci_device_remove+0xa6/0x1d0
device_release_driver_internal+0xf8/0x210
pci_stop_bus_device+0x105/0x150
pci_stop_and_remove_bus_device_locked+0x15/0x30
remove_store+0xcc/0xe0
kernfs_fop_write_iter+0x2c3/0x440
vfs_write+0x871/0xd70
ksys_write+0xee/0x1c0
do_syscall_64+0xac/0x280
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.
This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.
Fixes: 2958d17a89 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempt to
dereference cnic_dev in cnic_delete_task().
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback)
cnic_netdev_event() |
cnic_stop_hw() | cnic_delete_task()
cnic_cm_stop_bnx2x_hw() | ...
cancel_delayed_work() | /* the queue_delayed_work()
flush_workqueue() | executes after flush_workqueue()*/
| queue_delayed_work()
cnic_free_dev(dev)//free | cnic_delete_task() //new instance
| dev = cp->dev; //use
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.
This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays — such as inserting calls to ssleep()
within the cnic_delete_task() function — to increase the likelihood
of triggering the bug.
Fixes: fdf24086f4 ("cnic: Defer iscsi connection cleanup")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
devlink_rate_node_get_by_name() and devlink_rate_nodes_destroy() have a
couple of unnecessary static variables for iterating over devlink rates.
This could lead to races/corruption/unhappiness if two concurrent
operations execute the same function.
Remove 'static' from both. It's amazing this was missed for 4+ years.
While at it, I confirmed there are no more examples of this mistake in
net/ with 1, 2 or 3 levels of indentation.
Fixes: a8ecb93ef0 ("devlink: Introduce rate nodes")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The expression `(conf->instr_type == 64) << iq_no` can overflow because
`iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the
operand to `u64` ensures correct 64-bit arithmetic.
Fixes: f21fb3ed36 ("Add support of Cavium Liquidio ethernet adapters")
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Function copy_from_user() and copy_to_user() may sleep because of page
fault, and they cannot be called in spin_lock hold context. Here move
function calling of copy_from_user() and copy_to_user() out of spinlock
context in function kvm_pch_pic_regs_access().
Otherwise there will be possible warning such as:
BUG: sleeping function called from invalid context at include/linux/uaccess.h:192
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full)
Tainted: [W]=WARN
Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000
9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8
9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001
0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880
00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe
000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0
0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000
0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0
0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40
00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d
Call Trace:
[<9000000004c2827c>] show_stack+0x5c/0x180
[<9000000004c20fac>] dump_stack_lvl+0x94/0xe4
[<9000000004c99c7c>] __might_resched+0x26c/0x290
[<9000000004f68968>] __might_fault+0x20/0x88
[<ffff800002311de0>] kvm_pch_pic_regs_access.isra.0+0x88/0x380 [kvm]
[<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm]
[<900000000506b0d8>] sys_ioctl+0x388/0x1010
[<90000000063ed210>] do_syscall+0xb0/0x2d8
[<9000000004c25ef8>] handle_syscall+0xb8/0x158
Cc: stable@vger.kernel.org
Fixes: d206d95148 ("LoongArch: KVM: Add PCHPIC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
With PTW disabled system, bit _PAGE_DIRTY is a HW bit for page writing.
However with PTW enabled system, bit _PAGE_WRITE is also a "HW bit" for
page writing, because hardware synchronizes _PAGE_WRITE to _PAGE_DIRTY
automatically. Previously, _PAGE_WRITE is treated as a SW bit to record
the page writeable attribute for the fast page fault handling in the
secondary MMU, however with PTW enabled machine, this bit is used by HW
already (so setting it will silence the TLB modify exception).
Here define KVM_PAGE_WRITEABLE with the SW bit _PAGE_MODIFIED, so that
it can work on both PTW disabled and enabled machines. And for HW write
bits, both _PAGE_DIRTY and _PAGE_WRITE are set or clear together.
Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add a NULL-pointer check after the kcalloc() call in init_vdso(). If
allocation fails, return -ENOMEM to prevent a possible dereference of
vdso_info.code_mapping.pages when it is NULL.
Cc: stable@vger.kernel.org
Fixes: 2ed119aef6 ("LoongArch: Set correct size for vDSO code mapping")
Signed-off-by: Guangshuo Li <202321181@mail.sdu.edu.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When testing the kernel live patching with "modprobe livepatch-sample",
there is a timeout over 15 seconds from "starting patching transition"
to "patching complete". The dmesg command shows "unreliable stack" for
user tasks in debug mode, here is one of the messages:
livepatch: klp_try_switch_task: bash:1193 has an unreliable stack
The "unreliable stack" is because it can not unwind from do_syscall()
to its previous frame handle_syscall(). It should use fp to find the
original stack top due to secondary stack in do_syscall(), but fp is
not used for some other functions, then fp can not be restored by the
next frame of do_syscall(), so it is necessary to save fp if task is
not current, in order to get the stack top of do_syscall().
Here are the call chains:
klp_enable_patch()
klp_try_complete_transition()
klp_try_switch_task()
klp_check_and_switch_task()
klp_check_stack()
stack_trace_save_tsk_reliable()
arch_stack_walk_reliable()
When executing "rmmod livepatch-sample", there exists a similar issue.
With this patch, it takes a short time for patching and unpatching.
Before:
# modprobe livepatch-sample
# dmesg -T | tail -3
[Sat Sep 6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
[Sat Sep 6 11:00:35 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
rmmod: ERROR: Module livepatch_sample is in use
# rmmod livepatch_sample
# dmesg -T | tail -3
[Sat Sep 6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Sat Sep 6 11:06:20 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete
After:
# modprobe livepatch-sample
# dmesg -T | tail -2
[Tue Sep 16 16:19:30 2025] livepatch: 'livepatch_sample': starting patching transition
[Tue Sep 16 16:19:31 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
# dmesg -T | tail -2
[Tue Sep 16 16:19:36 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Tue Sep 16 16:19:37 2025] livepatch: 'livepatch_sample': unpatching complete
Cc: stable@vger.kernel.org # v6.9+
Fixes: 199cc14cb4 ("LoongArch: Add kernel livepatching support")
Reported-by: Xi Zhang <zhangxi@kylinos.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
As Documentation/filesystems/sysfs.rst suggested, show() should only use
sysfs_emit() or sysfs_emit_at() when formatting the value to be returned
to user space.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
ARCH_STRICT_ALIGN is used for hardware without UAL, now it only control
the -mstrict-align flag. However, ACPI structures are packed by default
so will cause unaligned accesses.
To avoid this, define ACPI_MISALIGNMENT_NOT_SUPPORTED in asm/acenv.h to
align ACPI structures if ARCH_STRICT_ALIGN enabled.
Cc: stable@vger.kernel.org
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Suggested-by: Xi Ruoyao <xry111@xry111.site>
Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Loongson-3A6000 and 3C6000 CPUs also support unaligned memory access, so
the current description is out of date to some extent.
Actually, all of Loongson-3 series processors based on LoongArch support
unaligned memory access, this hardware capability is indicated by the bit
20 (UAL) of CPUCFG1 register, update the help info to reflect the reality.
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When compiling with LLVM and CONFIG_RUST is set, there exist objtool
warnings in rust/core.o and rust/kernel.o, like this:
rust/core.o: warning: objtool:
_RNvXs1_NtNtCs5QSdWC790r4_4core5ascii10ascii_charNtB5_9AsciiCharNtNtB9_3fmt5Debug3fmt+0x54:
sibling call from callable instruction with modified stack frame
For this special case, the related object file shows that there is no
generated relocation section '.rela.discard.tablejump_annotate' for the
table jump instruction jirl, thus objtool can not know that what is the
actual destination address.
If rustc has the option "-Cllvm-args=--loongarch-annotate-tablejump",
pass the option to enable jump tables for objtool, otherwise it should
pass "-Zno-jump-tables" to keep compatibility with older rustc.
How to test:
$ rustup component add rust-src
$ make LLVM=1 rustavailable
$ make ARCH=loongarch LLVM=1 clean defconfig
$ scripts/config -d MODVERSIONS \
-e RUST -e SAMPLES -e SAMPLES_RUST \
-e SAMPLE_RUST_CONFIGFS -e SAMPLE_RUST_MINIMAL \
-e SAMPLE_RUST_MISC_DEVICE -e SAMPLE_RUST_PRINT \
-e SAMPLE_RUST_DMA -e SAMPLE_RUST_DRIVER_PCI \
-e SAMPLE_RUST_DRIVER_PLATFORM -e SAMPLE_RUST_DRIVER_FAUX \
-e SAMPLE_RUST_DRIVER_AUXILIARY -e SAMPLE_RUST_HOSTPROGS
$ make ARCH=loongarch LLVM=1 olddefconfig all
Cc: stable@vger.kernel.org
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://lore.kernel.org/rust-for-linux/CANiq72mNeCuPkCDrG2db3w=AX+O-zYrfprisDPmRac_qh65Dmg@mail.gmail.com/
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LTO is not only used for Clang, but maybe also used for Rust, make LTO
case out of CONFIG_CC_HAS_ANNOTATE_TABLEJUMP in Makefile.
This is preparation for later patch, no function changes.
Cc: stable@vger.kernel.org
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When compiling with LLVM and CONFIG_RUST is set, there exists the
following objtool warning:
rust/compiler_builtins.o: warning: objtool: __rust__unordsf2(): unexpected end of section .text.unlikely.
objdump shows that the end of section .text.unlikely is an atomic
instruction:
amswap.w $zero, $ra, $zero
According to the LoongArch Reference Manual, if the amswap.w atomic
memory access instruction has the same register number as rd and rj,
the execution will trigger an Instruction Non-defined Exception, so
mark the above instruction as INSN_BUG type to fix the warning.
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
If the break immediate code is 0, it should mark the type as
INSN_TRAP. If the break immediate code is 1, it should mark the
type as INSN_BUG.
While at it, format the code style and add the code comment for nop.
Cc: stable@vger.kernel.org
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Inspired by recent changes to compression level parsing in
6db1df415d ("btrfs: accept and ignore compression level for lzo")
it turns out that we do not do any extra validation for compression
level input string, thus allowing things like "compress=lzo:invalid" to
be accepted without warnings.
Although we accept levels that are beyond the supported algorithm
ranges, accepting completely invalid level specification is not correct.
Fix the too loose checks for compression level, by doing proper error
handling of kstrtoint(), so that we will reject not only too large
values (beyond int range) but also completely wrong levels like
"lzo:invalid".
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Normally we wait for the socket to buffer up the whole record
before we service it. If the socket has a tiny buffer, however,
we read out the data sooner, to prevent connection stalls.
Make sure that we abort the connection when we find out late
that the record is actually invalid. Retrying the parsing is
fine in itself but since we copy some more data each time
before we parse we can overflow the allocated skb space.
Constructing a scenario in which we're under pressure without
enough data in the socket to parse the length upfront is quite
hard. syzbot figured out a way to do this by serving us the header
in small OOB sends, and then filling in the recvbuf with a large
normal send.
Make sure that tls_rx_msg_size() aborts strp, if we reach
an invalid record there's really no way to recover.
Reported-by: Lee Jones <lee@kernel.org>
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250917002814.1743558-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Issuing two writes to the same af_alg socket is bogus as the
data will be interleaved in an unpredictable fashion. Furthermore,
concurrent writes may create inconsistencies in the internal
socket state.
Disallow this by adding a new ctx->write field that indiciates
exclusive ownership for writing.
Fixes: 8ff590903d ("crypto: algif_skcipher - User-space interface for skcipher operations")
Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If an error causes af_alg_sendmsg to abort, ctx->merge may contain
a garbage value from the previous loop. This may then trigger a
crash on the next entry into af_alg_sendmsg when it attempts to do
a merge that can't be done.
Fix this by setting ctx->merge to zero near the start of the loop.
Fixes: 8ff590903d ("crypto: algif_skcipher - User-space interface for skcipher operations")
Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ASoC: Fixes for v6.17
A pile of fixes that accumilated over the past few -rcs, this is all
driver specifics including a small pile of quirks for new systems.
Pull misc fixes from Andrew Morton:
"15 hotfixes. 11 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 13 of these
fixes are for MM.
The usual shower of singletons, plus
- fixes from Hugh to address various misbehaviors in get_user_pages()
- patches from SeongJae to address a quite severe issue in DAMON
- another series also from SeongJae which completes some fixes for a
DAMON startup issue"
* tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
zram: fix slot write race condition
nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
samples/damon/mtier: avoid starting DAMON before initialization
samples/damon/prcl: avoid starting DAMON before initialization
samples/damon/wsse: avoid starting DAMON before initialization
MAINTAINERS: add Lance Yang as a THP reviewer
MAINTAINERS: add Jann Horn as rmap reviewer
mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control
mm/damon/core: introduce damon_call_control->dealloc_on_cancel
mm: folio_may_be_lru_cached() unless folio_test_large()
mm: revert "mm: vmscan.c: fix OOM on swap stress test"
mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch"
mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
mm/gup: check ref_count instead of lru before migration
[BUG]
Inside check_inode_ref(), we need to make sure every structure,
including the btrfs_inode_extref header, is covered by the item. But
our code is incorrectly using "sizeof(iref)", where @iref is just a
pointer.
This means "sizeof(iref)" will always be "sizeof(void *)", which is much
smaller than "sizeof(struct btrfs_inode_extref)".
This will allow some bad inode extrefs to sneak in, defeating tree-checker.
[FIX]
Fix the typo by calling "sizeof(*iref)", which is the same as
"sizeof(struct btrfs_inode_extref)", and will be the correct behavior we
want.
Fixes: 71bf92a9b8 ("btrfs: tree-checker: Add check for INODE_REF")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fix the following case where the client would end up closing both
deferred files (foo.tmp & foo) after unlink(foo) due to strstr() call
in cifs_close_deferred_file_under_dentry():
fd1 = openat(AT_FDCWD, "foo", O_WRONLY|O_CREAT|O_TRUNC, 0666);
fd2 = openat(AT_FDCWD, "foo.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666);
close(fd1);
close(fd2);
unlink("foo");
Fixes: e3fc065682 ("cifs: Deferred close performance improvements")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Cc: Frank Sorenson <sorenson@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull smb server fixes from Steve French:
- Two fixes for remaining_data_length and offset checks in receive path
- Don't go over max SGEs which caused smbdirect send to fail (and
trigger disconnect)
* tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size
ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer
smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES
All recent platforms (including all the ones officially supported by the
Xe driver) do not allow concurrent execution of RCS and CCS workloads
from different address spaces, with the HW blocking the context switch
when it detects such a scenario.
The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a
context it knows will not be able to execute. This, however, causes a new
problem: if RCS and CCS queues have pending workloads from different
address spaces, the GuC needs to choose from which of the 2 queues to
pick the next workload to execute. By default, the GuC prioritizes RCS
submissions over CCS ones, which can lead to CCS workloads being
significantly (or completely) starved of execution time.
The driver can tune this by setting a dedicated scheduling policy KLV;
this KLV allows the driver to specify a quantum (in ms) and a ratio
(percentage value between 0 and 100), and the GuC will prioritize the CCS
for that percentage of each quantum.
Given that we want to guarantee enough RCS throughput to avoid missing
frames, we set the yield policy to 20% of each 80ms interval.
v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable
in gt_sanitize
Fixes: d9a1ae0d17 ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
(cherry picked from commit 8843444843)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo added #include "xe_guc_submit.h" while backporting]
Pull probe fix from Masami Hiramatsu:
- kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal(),
by handling NULL return of kmemdup() correctly
* tag 'probes-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal()
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-09-16 (ice, i40e, ixgbe, igc)
For ice:
Jake resolves leaking pages with multi-buffer frames when a 0-sized
descriptor is encountered.
For i40e:
Maciej removes a redundant, and incorrect, memory barrier.
For ixgbe:
Jedrzej adjusts lifespan of ACI lock to ensure uses are while it is
valid.
For igc:
Kohei Enju does not fail probe on LED setup failure which resolves a
kernel panic in the cleanup path, if we were to fail.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: don't fail igc_probe() on LED setup error
ixgbe: destroy aci.lock later within ixgbe_remove path
ixgbe: initialize aci.lock before it's used
i40e: remove redundant memory barrier when cleaning Tx descs
ice: fix Rx page leak on multi-buffer frames
====================
Link: https://patch.msgid.link/20250916212801.2818440-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima says:
====================
tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
syzbot reported a warning in tcp_retransmit_timer() for TCP Fast
Open socket.
Patch 1 fixes the issue and Patch 2 adds a test for the scenario.
====================
Link: https://patch.msgid.link/20250915175800.118793-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test reproduces the scenario explained in the previous patch.
Without the patch, the test triggers the warning and cannot see the last
retransmitted packet.
# ./ksft_runner.sh tcp_fastopen_server_reset-after-disconnect.pkt
TAP version 13
1..2
[ 29.229250] ------------[ cut here ]------------
[ 29.231414] WARNING: CPU: 26 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer+0x32/0x9f0
...
tcp_fastopen_server_reset-after-disconnect.pkt:26: error handling packet: Timed out waiting for packet
not ok 1 ipv4
tcp_fastopen_server_reset-after-disconnect.pkt:26: error handling packet: Timed out waiting for packet
not ok 2 ipv6
# Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250915175800.118793-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, VF MAC address info is not updated when the MAC address is
configured from VF, and it is not cleared when the VF is removed. This
leads to stale or missing MAC information in the PF, which may cause
incorrect state tracking or inconsistencies when VFs are hot-plugged
or reassigned.
Fix this by:
- storing the VF MAC address in the PF when it is set from VF
- clearing the stored VF MAC address when the VF is removed
This ensures that the PF always has correct VF MAC state.
Fixes: cde29af9e6 ("octeon_ep: add PF-VF mailbox communication")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250916133207.21737-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.
Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.
Reported-by: David Wilder <wilder@us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv@jvosburgh.net>
Tested-by: David Wilder <wilder@us.ibm.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull sched_ext fixes from Tejun Heo:
- Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED
- Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
which was causing issues with per-CPU task scheduling and reenqueuing
behavior
* tag 'sched_ext-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED
Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
Pull cgroup fixes from Tejun Heo:
"This contains two cgroup changes. Both are pretty low risk.
- Fix deadlock in cgroup destruction when repeatedly
mounting/unmounting perf_event and net_prio controllers.
The issue occurs because cgroup_destroy_wq has max_active=1, causing
root destruction to wait for CSS offline operations that are queued
behind it.
The fix splits cgroup_destroy_wq into three separate workqueues to
eliminate the blocking.
- Set of->priv to NULL upon file release to make potential bugs to
manifest as NULL pointer dereferences rather than use-after-free
errors"
* tag 'cgroup-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/psi: Set of->priv to NULL upon file release
cgroup: split cgroup_destroy_wq into 3 workqueues
KVM x86 fix for 6.17-rcN
Sync the vTPR from the local APIC to the VMCB even when AVIC is active, to fix
a bug where host updates to the vTPR, e.g. via KVM_SET_LAPIC or emulation of a
guest access, effectively get lost and result in interrupt delivery issues in
the guest.
KVM/arm64 changes for 6.17, round #3
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for
managing the VCNR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly programming
at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
where an xarray's spinlock was nested with a *raw* spinlock
- Permit stage-2 read permission aborts which are possible in the case
of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
Since the PXP start comes after __xe_exec_queue_init() has completed,
we need to cleanup what was done in that function in case of a PXP
start error.
__xe_exec_queue_init calls the submission backend init() function,
so we need to introduce an opposite for that. Unfortunately, while
we already have a fini() function pointer, it performs other
operations in addition to cleaning up what was done by the init().
Therefore, for clarity, the existing fini() has been renamed to
destroy(), while a new fini() has been added to only clean up what was
done by the init(), with the latter being called by the former (via
xe_exec_queue_fini).
Fixes: 72d479601d ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com
(cherry picked from commit 626667321d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Check in snd_intel_dsp_check_soundwire() that the pointer returned by
ACPI_HANDLE() is not NULL, before passing it on to other functions.
The original code assumed a non-NULL return, but if it was unexpectedly
NULL it would end up passed to acpi_walk_namespace() as the start
point, and would result in
[ 3.219028] BUG: kernel NULL pointer dereference, address:
0000000000000018
[ 3.219029] #PF: supervisor read access in kernel mode
[ 3.219030] #PF: error_code(0x0000) - not-present page
[ 3.219031] PGD 0 P4D 0
[ 3.219032] Oops: Oops: 0000 [#1] SMP NOPTI
[ 3.219035] CPU: 2 UID: 0 PID: 476 Comm: (udev-worker) Tainted: G S
AW E 6.17.0-rc5-test #1 PREEMPT(voluntary)
[ 3.219038] Tainted: [S]=CPU_OUT_OF_SPEC, [A]=OVERRIDDEN_ACPI_TABLE,
[W]=WARN, [E]=UNSIGNED_MODULE
[ 3.219040] RIP: 0010:acpi_ns_walk_namespace+0xb5/0x480
This problem was triggered by a bugged DSDT that the kernel couldn't parse.
But it shouldn't be possible to SEGFAULT the kernel just because of some
bugs in ACPI.
Fixes: 0650857570 ("ALSA: hda: add autodetection for SoundWire")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull device mapper fixes from Mikulas Patocka:
- fix integer overflow in dm-stripe
- limit tag size in dm-integrity to 255 bytes
- fix 'alignment inconsistency' warning in dm-raid
* tag 'for-6.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-raid: don't set io_min and io_opt for raid1
dm-integrity: limit MAX_TAG_SIZE to 255
dm-stripe: fix a possible integer overflow
Pull btrfs fixes from David Sterba:
- in zoned mode, turn assertion to proper code when reserving space in
relocation block group
- fix search key of extended ref (hardlink) when replaying log
- fix initialization of file extent tree on filesystems without
no-holes feature
- add harmless data race annotation to block group comparator
* tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: annotate block group access with data_race() when sorting for reclaim
btrfs: initialize inode::file_extent_tree after i_mode has been set
btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()
btrfs: fix invalid extref key setup when replaying dentry
The parameter max_hw_wzeroes_unmap_sectors in queue_limits should be
equal to max_write_zeroes_sectors if it is set to a non-zero value.
However, when the backend bdev is specified, this parameter is
initialized to UINT_MAX during the call to blk_set_stacking_limits(),
while only max_write_zeroes_sectors is adjusted. Therefore, this
discrepancy triggers a value check failure in blk_validate_limits().
Since the drvd driver doesn't yet support unmap write zeroes, so fix
this failure by explicitly setting max_hw_wzeroes_unmap_sectors to
zero.
Fixes: 0c40d7cb5e ("block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These commands
modprobe brd rd_size=1048576
vgcreate vg /dev/ram*
lvcreate -m4 -L10 -n lv vg
trigger the following warnings:
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
The warnings are caused by the fact that io_min is 512 and physical block
size is 4096.
If there's chunk-less raid, such as raid1, io_min shouldn't be set to zero
because it would be raised to 512 and it would trigger the warning.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Pull MD fixes from Yu Kuai:
"For 6.17 on drivers supporting write zeros, raid{0,1,10,5} are broken
and can't be assembled."
* tag 'md-6.17-20250917' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
md: init queue_limits->max_hw_wzeroes_unmap_sectors parameter
The sanity check previously added to uaudio_transfer_buffer_setup()
assumed the allocated buffer being linear-mapped. But the buffer
allocated via usb_alloc_coherent() isn't always so, rather to be used
with (SG-)DMA API. This leaded to a false-positive warning and the
driver failed to work.
Actually uaudio_transfer_buffer_setup() deals only with the DMA-API
addresses for MEM_XFER_BUF type, while other callers of
uaudio_iommu_map() are with pages with physical addresses for
MEM_EVENT_RING and MEM_XFER_RING types. So this patch splits the
mapping helper function to two different ones, uaudio_iommu_map() for
the DMA pages and uaudio_iommu_map_pa() for the latter, in order to
handle mapping differently for each type. Along with it, the
unnecessary address check that caused probe error is dropped, too.
Fixes: 3335a1bbd6 ("ALSA: qc_audio_offload: try to reduce address space confusion")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reported-and-tested-by: Luca Weiss <luca.weiss@fairphone.com>
Closes: https://lore.kernel.org/DBR2363A95M1.L9XBNC003490@fairphone.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since commit 7d5e9737ef ("net: rfkill: gpio: get the name and type from
device property") rfkill_find_type() gets called with the possibly
uninitialized "const char *type_name;" local variable.
On x86 systems when rfkill-gpio binds to a "BCM4752" or "LNV4752"
acpi_device, the rfkill->type is set based on the ACPI acpi_device_id:
rfkill->type = (unsigned)id->driver_data;
and there is no "type" property so device_property_read_string() will fail
and leave type_name uninitialized, leading to a potential crash.
rfkill_find_type() does accept a NULL pointer, fix the potential crash
by initializing type_name to NULL.
Note likely sofar this has not been caught because:
1. Not many x86 machines actually have a "BCM4752"/"LNV4752" acpi_device
2. The stack happened to contain NULL where type_name is stored
Fixes: 7d5e9737ef ("net: rfkill: gpio: get the name and type from device property")
Cc: stable@vger.kernel.org
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20250913113515.21698-1-hansg@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Miri Korenblit says:
====================
iwlwifi fix
====================
The fix is for byte count tables in 7000/8000 family devices.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
While collecting SCX related fields in struct task_group into struct
scx_task_group, 6e6558a6bc ("sched_ext, sched/core: Factor out struct
scx_task_group") forgot update tg->scx_weight usage in tg_weight(), which
leads to build failure when CONFIG_FAIR_GROUP_SCHED is disabled but
CONFIG_EXT_GROUP_SCHED is enabled. Fix it.
Fixes: 6e6558a6bc ("sched_ext, sched/core: Factor out struct scx_task_group")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509170230.MwZsJSWa-lkp@intel.com/
Tested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cited commit adds a miss table for switchdev mode. But it
uses the same level as policy table. Will hit the following error
when running command:
# ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto \
esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \
0x3a189a7f9374955d3817886c8587f1da3df387ff 128 \
mode tunnel offload dev enp8s0f0 dir in
Error: mlx5_core: Device failed to offload this state.
The dmesg error is:
mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22
Fix it by adding a new miss level to avoid the error.
Fixes: 7d9e292ecd ("net/mlx5e: Move IPSec policy check after decryption")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function mlx5_uplink_netdev_get() gets the uplink netdevice
pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can
be removed and its pointer cleared when unbound from the mlx5_core.eth
driver. This results in a NULL pointer, causing a kernel panic.
BUG: unable to handle page fault for address: 0000000000001300
at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core]
Call Trace:
<TASK>
mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core]
esw_offloads_enable+0x593/0x910 [mlx5_core]
mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core]
devlink_nl_eswitch_set_doit+0x60/0xd0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x183/0x290
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x255/0x380
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
do_syscall_64+0x53/0x1d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Ensure the pointer is valid before use by checking it for NULL. If it
is valid, immediately call netdev_hold() to take a reference, and
preventing the netdevice from being freed while it is in use.
Fixes: 7a9fb35e8c ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I'm trying to generate Rust bindings for netlink using the yaml spec.
It looks like there's a typo in conntrack spec: attribute set conntrack-attrs
defines attributes "counters-{orig,reply}" (plural), while get operation
references "counter-{orig,reply}" (singular). The latter should be fixed, as it
denotes multiple counters (packet and byte). The corresonding C define is
CTA_COUNTERS_ORIG.
Also, dump request references "nfgen-family" attribute, which neither exists in
conntrack-attrs attrset nor ctattr_type enum. There's member of nfgenmsg struct
with the same name, which is where family value is actually taken from.
> static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
> struct sk_buff *skb,
> const struct nlmsghdr *nlh,
> const struct nlattr * const cda[],
> struct netlink_ext_ack *extack)
> {
> int err;
> struct nfgenmsg *nfmsg = nlmsg_data(nlh);
> u_int8_t u3 = nfmsg->nfgen_family;
^^^^^^^^^^^^
Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Fixes: 23fc9311a5 ("netlink: specs: add conntrack dump and stats dump support")
Link: https://patch.msgid.link/20250913140515.1132886-1-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull perf tools fixes from Namhyung Kim:
"A small set of fixes for crashes in different commands and conditions"
* tag 'perf-tools-fixes-for-v6.17-2025-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf maps: Ensure kmap is set up for all inserts
perf lock: Provide a host_env for session new
perf subcmd: avoid crash in exclude_cmds when excludes is empty
There's another issue with aci.lock and previous patch uncovers it.
aci.lock is being destroyed during removing ixgbe while some of the
ixgbe closing routines are still ongoing. These routines use Admin
Command Interface which require taking aci.lock which has been already
destroyed what leads to call trace.
[ +0.000004] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ +0.000007] WARNING: CPU: 12 PID: 10277 at kernel/locking/mutex.c:155 mutex_lock+0x5f/0x70
[ +0.000002] Call Trace:
[ +0.000003] <TASK>
[ +0.000006] ixgbe_aci_send_cmd+0xc8/0x220 [ixgbe]
[ +0.000049] ? try_to_wake_up+0x29d/0x5d0
[ +0.000009] ixgbe_disable_rx_e610+0xc4/0x110 [ixgbe]
[ +0.000032] ixgbe_disable_rx+0x3d/0x200 [ixgbe]
[ +0.000027] ixgbe_down+0x102/0x3b0 [ixgbe]
[ +0.000031] ixgbe_close_suspend+0x28/0x90 [ixgbe]
[ +0.000028] ixgbe_close+0xfb/0x100 [ixgbe]
[ +0.000025] __dev_close_many+0xae/0x220
[ +0.000005] dev_close_many+0xc2/0x1a0
[ +0.000004] ? kernfs_should_drain_open_files+0x2a/0x40
[ +0.000005] unregister_netdevice_many_notify+0x204/0xb00
[ +0.000006] ? __kernfs_remove.part.0+0x109/0x210
[ +0.000006] ? kobj_kset_leave+0x4b/0x70
[ +0.000008] unregister_netdevice_queue+0xf6/0x130
[ +0.000006] unregister_netdev+0x1c/0x40
[ +0.000005] ixgbe_remove+0x216/0x290 [ixgbe]
[ +0.000021] pci_device_remove+0x42/0xb0
[ +0.000007] device_release_driver_internal+0x19c/0x200
[ +0.000008] driver_detach+0x48/0x90
[ +0.000003] bus_remove_driver+0x6d/0xf0
[ +0.000006] pci_unregister_driver+0x2e/0xb0
[ +0.000005] ixgbe_exit_module+0x1c/0xc80 [ixgbe]
Same as for the previous commit, the issue has been highlighted by the
commit 337369f8ce ("locking/mutex: Add MUTEX_WARN_ON() into fast path").
Move destroying aci.lock to the end of ixgbe_remove(), as this
simply fixes the issue.
Fixes: 4600cdf9f5 ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently aci.lock is initialized too late. A bunch of ACI callbacks
using the lock are called prior it's initialized.
Commit 337369f8ce ("locking/mutex: Add MUTEX_WARN_ON() into fast path")
highlights that issue what results in call trace.
[ 4.092899] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 4.092910] WARNING: CPU: 0 PID: 578 at kernel/locking/mutex.c:154 mutex_lock+0x6d/0x80
[ 4.098757] Call Trace:
[ 4.098847] <TASK>
[ 4.098922] ixgbe_aci_send_cmd+0x8c/0x1e0 [ixgbe]
[ 4.099108] ? hrtimer_try_to_cancel+0x18/0x110
[ 4.099277] ixgbe_aci_get_fw_ver+0x52/0xa0 [ixgbe]
[ 4.099460] ixgbe_check_fw_error+0x1fc/0x2f0 [ixgbe]
[ 4.099650] ? usleep_range_state+0x69/0xd0
[ 4.099811] ? usleep_range_state+0x8c/0xd0
[ 4.099964] ixgbe_probe+0x3b0/0x12d0 [ixgbe]
[ 4.100132] local_pci_probe+0x43/0xa0
[ 4.100267] work_for_cpu_fn+0x13/0x20
[ 4.101647] </TASK>
Move aci.lock mutex initialization to ixgbe_sw_init() before any ACI
command is sent. Along with that move also related SWFW semaphore in
order to reduce size of ixgbe_probe() and that way all locks are
initialized in ixgbe_sw_init().
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Fixes: 4600cdf9f5 ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e has a feature which writes to memory location last descriptor
successfully sent. Memory barrier in i40e_clean_tx_irq() was used to
avoid forward-reading descriptor fields in case DD bit was not set.
Having mentioned feature in place implies that such situation will not
happen as we know in advance how many descriptors HW has dealt with.
Besides, this barrier placement was wrong. Idea is to have this
protection *after* reading DD bit from HW descriptor, not before.
Digging through git history showed me that indeed barrier was before DD
bit check, anyways the commit introducing i40e_get_head() should have
wiped it out altogether.
Also, there was one commit doing s/read_barrier_depends/smp_rmb when get
head feature was already in place, but it was only theoretical based on
ixgbe experiences, which is different in these terms as that driver has
to read DD bit from HW descriptor.
Fixes: 1943d8ba95 ("i40e/i40evf: enable hardware feature head write back")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each
buffer in the current frame. This function was introduced as part of
handling multi-buffer XDP support in the ice driver.
It works by iterating over the buffers from first_desc up to 1 plus the
total number of fragments in the frame, cached from before the XDP program
was executed.
If the hardware posts a descriptor with a size of 0, the logic used in
ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added
as fragments in ice_add_xdp_frag. Since the buffer isn't counted as a
fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't
call ice_put_rx_buf().
Because we don't call ice_put_rx_buf(), we don't attempt to re-use the
page or free it. This leaves a stale page in the ring, as we don't
increment next_to_alloc.
The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented
properly, and that it always points to a buffer with a NULL page. Since
this function doesn't check, it will happily recycle a page over the top
of the next_to_alloc buffer, losing track of the old page.
Note that this leak only occurs for multi-buffer frames. The
ice_put_rx_mbuf() function always handles at least one buffer, so a
single-buffer frame will always get handled correctly. It is not clear
precisely why the hardware hands us descriptors with a size of 0 sometimes,
but it happens somewhat regularly with "jumbo frames" used by 9K MTU.
To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on
all buffers between first_desc and next_to_clean. Borrow the logic of a
similar function in i40e used for this same purpose. Use the same logic
also in ice_get_pgcnts().
Instead of iterating over just the number of fragments, use a loop which
iterates until the current index reaches to the next_to_clean element just
past the current frame. Unlike i40e, the ice_put_rx_mbuf() function does
call ice_put_rx_buf() on the last buffer of the frame indicating the end of
packet.
For non-linear (multi-buffer) frames, we need to take care when adjusting
the pagecnt_bias. An XDP program might release fragments from the tail of
the frame, in which case that fragment page is already released. Only
update the pagecnt_bias for the first descriptor and fragments still
remaining post-XDP program. Take care to only access the shared info for
fragmented buffers, as this avoids a significant cache miss.
The xdp_xmit value only needs to be updated if an XDP program is run, and
only once per packet. Drop the xdp_xmit pointer argument from
ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function
directly. This avoids needing to pass the argument and avoids an extra
bit-wise OR for each buffer in the frame.
Move the increment of the ntc local variable to ensure its updated *before*
all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic
requires the index of the element just after the current frame.
Now that we use an index pointer in the ring to identify the packet, we no
longer need to track or cache the number of fragments in the rx_ring.
Cc: Christoph Petrausch <christoph.petrausch@deepl.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/
Fixes: 743bbd93cf ("ice: put Rx buffers after being done with current frame")
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Priya Singh <priyax.singh@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
scx_bpf_reenqueue_local() can be called from ops.cpu_release() when a
CPU is taken by a higher scheduling class to give tasks queued to the
CPU's local DSQ a chance to be migrated somewhere else, instead of
waiting indefinitely for that CPU to become available again.
In doing so, we decided to skip migration-disabled tasks, under the
assumption that they cannot be migrated anyway.
However, when a higher scheduling class preempts a CPU, the running task
is always inserted at the head of the local DSQ as a migration-disabled
task. This means it is always skipped by scx_bpf_reenqueue_local(), and
ends up being confined to the same CPU even if that CPU is heavily
contended by other higher scheduling class tasks.
As an example, let's consider the following scenario:
$ schedtool -a 0,1, -e yes > /dev/null
$ sudo schedtool -F -p 99 -a 0, -e \
stress-ng -c 1 --cpu-load 99 --cpu-load-slice 1000
The first task (SCHED_EXT) can run on CPU0 or CPU1. The second task
(SCHED_FIFO) is pinned to CPU0 and consumes ~99% of it. If the SCHED_EXT
task initially runs on CPU0, it will remain there because it always sees
CPU0 as "idle" in the short gaps left by the RT task, resulting in ~1%
utilization while CPU1 stays idle:
0[||||||||||||||||||||||100.0%] 8[ 0.0%]
1[ 0.0%] 9[ 0.0%]
2[ 0.0%] 10[ 0.0%]
3[ 0.0%] 11[ 0.0%]
4[ 0.0%] 12[ 0.0%]
5[ 0.0%] 13[ 0.0%]
6[ 0.0%] 14[ 0.0%]
7[ 0.0%] 15[ 0.0%]
PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command
1067 root RT 0 R 0 99.0 0.2 0:31.16 stress-ng-cpu [run]
975 arighi 20 0 R 0 1.0 0.0 0:26.32 yes
By allowing scx_bpf_reenqueue_local() to re-enqueue migration-disabled
tasks, the scheduler can choose to migrate them to other CPUs (CPU1 in
this case) via ops.enqueue(), leading to better CPU utilization:
0[||||||||||||||||||||||100.0%] 8[ 0.0%]
1[||||||||||||||||||||||100.0%] 9[ 0.0%]
2[ 0.0%] 10[ 0.0%]
3[ 0.0%] 11[ 0.0%]
4[ 0.0%] 12[ 0.0%]
5[ 0.0%] 13[ 0.0%]
6[ 0.0%] 14[ 0.0%]
7[ 0.0%] 15[ 0.0%]
PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command
577 root RT 0 R 0 100.0 0.2 0:23.17 stress-ng-cpu [run]
555 arighi 20 0 R 1 100.0 0.0 0:28.67 yes
It's debatable whether per-CPU tasks should be re-enqueued as well, but
doing so is probably safer: the scheduler can recognize re-enqueued
tasks through the %SCX_ENQ_REENQ flag, reassess their placement, and
either put them back at the head of the local DSQ or let another task
attempt to take the CPU.
This also prevents giving per-CPU tasks an implicit priority boost,
which would otherwise make them more likely to reclaim CPUs preempted by
higher scheduling classes.
Fixes: 97e13ecb02 ("sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The parameter max_hw_wzeroes_unmap_sectors in queue_limits should be
equal to max_write_zeroes_sectors if it is set to a non-zero value.
However, the stacked md drivers call md_init_stacking_limits() to
initialize this parameter to UINT_MAX but only adjust
max_write_zeroes_sectors when setting limits. Therefore, this
discrepancy triggers a value check failure in blk_validate_limits().
$ modprobe scsi_debug num_parts=2 dev_size_mb=8 lbprz=1 lbpws=1
$ mdadm --create /dev/md0 --level=0 --raid-device=2 /dev/sda1 /dev/sda2
mdadm: Defaulting to version 1.2 metadata
mdadm: RUN_ARRAY failed: Invalid argument
Fix this failure by explicitly setting max_hw_wzeroes_unmap_sectors to
max_write_zeroes_sectors. Since the linear and raid0 drivers support
write zeroes, so they can support unmap write zeroes operation if all of
the backend devices support it. However, the raid1/10/5 drivers don't
support write zeroes, so we have to set it to zero.
Fixes: 0c40d7cb5e ("block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits")
Reported-by: John Garry <john.g.garry@oracle.com>
Closes: https://lore.kernel.org/linux-block/803a2183-a0bb-4b7a-92f1-afc5097630d2@oracle.com/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Li Nan <linan122@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/linux-raid/20250910111107.3247530-2-yi.zhang@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Merge series from Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>:
Fix the lpaif_type configuration for the I2S interface.
The proper lpaif interface type required to allow DSP to vote
appropriate clock setting for I2S interface and also Add support
for configuring the DAI format on MI2S interfaces to allow setting
the appropriate bit clock and frame clock polarity, ensuring correct
audio data transmissionover MI2S.
intel-gpio fixes for v6.17-rc7
* Fix a regression to make GpioInt() by index work again
* Ingnore spurious wakeups from touchpad on GPD G1619-05
* Accept debounce from GpioIo() resources
`netif_rx()` already increments `rx_dropped` core stat when it fails.
The driver was also updating `ndev->stats.rx_dropped` in the same path.
Since both are reported together via `ip -s -s` command, this resulted
in drops being counted twice in user-visible stats.
Keep the driver update on `if (unlikely(!skb))`, but skip it after
`netif_rx()` errors.
Fixes: caf586e5f2 ("net: add a core netdev->rx_dropped counter")
Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250913060135.35282-3-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: pm: nl: announce deny-join-id0 flag
During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.
When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't,
not letting the userspace daemon able to respect the RFC8684.
Here are a few fixes related to this 'C' flag (aka 'deny-join-id0'):
- Patch 1: add remote_deny_join_id0 info on passive connections. A fix
for v5.14.
- Patch 2: let the userspace PM daemon know about the deny_join_id0
attribute, so when set, it can avoid creating new subflows to the
initial IP address and port. A fix for v5.19.
- Patch 3: a validation for the previous commit.
- Patch 4: record the deny_join_id0 info when TFO is used. A fix for
v6.2.
- Patch 5: not related to deny-join-id0, but it fixes errors messages in
the sockopt selftests, not to create confusions. A fix for v6.5.
====================
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-0-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch fixes several issues in the error reporting of the MPTCP sockopt
selftest:
1. Fix diff not printed: The error messages for counter mismatches had
the actual difference ('diff') as argument, but it was missing in the
format string. Displaying it makes the debugging easier.
2. Fix variable usage: The error check for 'mptcpi_bytes_acked' incorrectly
used 'ret2' (sent bytes) for both the expected value and the difference
calculation. It now correctly uses 'ret' (received bytes), which is the
expected value for bytes_acked.
3. Fix off-by-one in diff: The calculation for the 'mptcpi_rcv_delta' diff
was 's.mptcpi_rcv_delta - ret', which is off-by-one. It has been
corrected to 's.mptcpi_rcv_delta - (ret + 1)' to match the expected
value in the condition above it.
Fixes: 5dcff89e14 ("selftests: mptcp: explicitly tests aggregate counters")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-5-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When TFO is used, the check to see if the 'C' flag (deny join id0) was
set was bypassed.
This flag can be set when TFO is used, so the check should also be done
when TFO is used.
Note that the set_fully_established label is also used when a 4th ACK is
received. In this case, deny_join_id0 will not be set.
Fixes: dfc8d06030 ("mptcp: implement delayed seq generation for passive fastopen")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-4-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The previous commit adds the MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 flag. Make
sure it is correctly announced by the other peer when it has been
received.
pm_nl_ctl will now display 'deny_join_id0:1' when monitoring the events,
and when this flag was set by the other peer.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-3-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.
When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't.
The RFC8684 [1] is strict about that:
(...) therefore the receiver MUST NOT try to open any additional
subflows toward this address and port.
So it is important to tell the userspace about that as it is responsible
for the respect of this flag.
When a new connection is created and established, the Netlink events
now contain the existing but not currently used 'flags' attribute. When
MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 is set, it means no other subflows
to the initial IP address and port -- info that are also part of the
event -- can be established.
Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6 [1]
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/532
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-2-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a SYN containing the 'C' flag (deny join id0) was received, this
piece of information was not propagated to the path-manager.
Even if this flag is mainly set on the server side, a client can also
tell the server it cannot try to establish new subflows to the client's
initial IP address and port. The server's PM should then record such
info when received, and before sending events about the new connection.
Fixes: df377be387 ("mptcp: add deny_join_id0 in mptcp_options_received")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-1-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
selftests: mptcp: avoid spurious errors on TCP disconnect
This series should fix the recent instabilities seen by MPTCP and NIPA
CIs where the 'mptcp_connect.sh' tests fail regularly when running the
'disconnect' subtests with "plain" TCP sockets, e.g.
# INFO: disconnect
# 63 ns1 MPTCP -> ns1 (10.0.1.1:20001 ) MPTCP (duration 996ms) [ OK ]
# 64 ns1 MPTCP -> ns1 (10.0.1.1:20002 ) TCP (duration 851ms) [ OK ]
# 65 ns1 TCP -> ns1 (10.0.1.1:20003 ) MPTCP Unexpected revents: POLLERR/POLLNVAL(19)
# (duration 896ms) [FAIL] file received by server does not match (in, out):
# -rw-r--r-- 1 root root 11112852 Aug 19 09:16 /tmp/tmp.hlJe5DoMoq.disconnect
# Trailing bytes are:
# /{ga 6@=#.8:-rw------- 1 root root 10085368 Aug 19 09:16 /tmp/tmp.blClunilxx
# Trailing bytes are:
# /{ga 6@=#.8:66 ns1 MPTCP -> ns1 (dead:beef:1::1:20004) MPTCP (duration 987ms) [ OK ]
# 67 ns1 MPTCP -> ns1 (dead:beef:1::1:20005) TCP (duration 911ms) [ OK ]
# 68 ns1 TCP -> ns1 (dead:beef:1::1:20006) MPTCP (duration 980ms) [ OK ]
# [FAIL] Tests of the full disconnection have failed
These issues started to be visible after some behavioural changes in
TCP, where too quick re-connections after a shutdown() can now be more
easily rejected. Patch 3 modifies the selftests to wait, but this
resolution revealed an issue in MPTCP which is fixed by patch 1 (a fix
for v5.9 kernel).
Patches 2 and 4 improve some errors reported by the selftests, and patch
5 helps with the debugging of such issues.
====================
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-0-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is better than printing random bytes in the terminal.
Note that Jakub suggested 'hexdump', but Mat found out this tool is not
often installed by default. 'od' can do a similar job, and it is in the
POSIX specs and available in coreutils, so it should be on more systems.
While at it, display a few more bytes, just to fill in the two lines.
And no need to display the 3rd only line showing the next number of
bytes: 0000040.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-4-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The disconnect test-case, with 'plain' TCP sockets generates spurious
errors, e.g.
07 ns1 TCP -> ns1 (dead:beef:1::1:10006) MPTCP
read: Connection reset by peer
read: Connection reset by peer
(duration 155ms) [FAIL] client exit code 3, server 3
netns ns1-FloSdv (listener) socket stat for 10006:
TcpActiveOpens 2 0.0
TcpPassiveOpens 2 0.0
TcpEstabResets 2 0.0
TcpInSegs 274 0.0
TcpOutSegs 276 0.0
TcpOutRsts 3 0.0
TcpExtPruneCalled 2 0.0
TcpExtRcvPruned 1 0.0
TcpExtTCPPureAcks 104 0.0
TcpExtTCPRcvCollapsed 2 0.0
TcpExtTCPBacklogCoalesce 42 0.0
TcpExtTCPRcvCoalesce 43 0.0
TcpExtTCPChallengeACK 1 0.0
TcpExtTCPFromZeroWindowAdv 42 0.0
TcpExtTCPToZeroWindowAdv 41 0.0
TcpExtTCPWantZeroWindowAdv 13 0.0
TcpExtTCPOrigDataSent 164 0.0
TcpExtTCPDelivered 165 0.0
TcpExtTCPRcvQDrop 1 0.0
In the failing scenarios (TCP -> MPTCP), the involved sockets are
actually plain TCP ones, as fallbacks for passive sockets at 2WHS time
cause the MPTCP listeners to actually create 'plain' TCP sockets.
Similar to commit 218cc16632 ("selftests: mptcp: avoid spurious errors
on disconnect"), the root cause is in the user-space bits: the test
program tries to disconnect as soon as all the pending data has been
spooled, generating an RST. If such option reaches the peer before the
connection has reached the closed status, the TCP socket will report an
error to the user-space, as per protocol specification, causing the
above failure. Note that it looks like this issue got more visible since
the "tcp: receiver changes" series from commit 06baf9bfa6 ("Merge
branch 'tcp-receiver-changes'").
Address the issue by explicitly waiting for the TCP sockets (-t) to
reach a closed status before performing the disconnect. More precisely,
the test program now waits for plain TCP sockets or TCP subflows in
addition to the MPTCP sockets that were already monitored.
While at it, use 'ss' with '-n' to avoid resolving service names, which
is not needed here.
Fixes: 218cc16632 ("selftests: mptcp: avoid spurious errors on disconnect")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-3-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IO errors were correctly printed to stderr, and propagated up to the
main loop for the server side, but the returned value was ignored. As a
consequence, the program for the listener side was no longer exiting
with an error code in case of IO issues.
Because of that, some issues might not have been seen. But very likely,
most issues either had an effect on the client side, or the file
transfer was not the expected one, e.g. the connection got reset before
the end. Still, it is better to fix this.
The main consequence of this issue is the error that was reported by the
selftests: the received and sent files were different, and the MIB
counters were not printed. Also, when such errors happened during the
'disconnect' tests, the program tried to continue until the timeout.
Now when an IO error is detected, the program exits directly with an
error.
Fixes: 05be5e273c ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-2-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the MPTCP DATA FIN have been ACKed, there is no more MPTCP related
metadata to exchange, and all subflows can be safely shutdown.
Before this patch, the subflows were actually terminated at 'close()'
time. That's certainly fine most of the time, but not when the userspace
'shutdown()' a connection, without close()ing it. When doing so, the
subflows were staying in LAST_ACK state on one side -- and consequently
in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP
socket.
Now, when the DATA FIN have been ACKed, all subflows are shutdown. A
consequence of this is that the TCP 'FIN' flag can be set earlier now,
but the end result is the same. This affects the packetdrill tests
looking at the end of the MPTCP connections, but for a good reason.
Note that tcp_shutdown() will check the subflow state, so no need to do
that again before calling it.
Fixes: 3721b9b646 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Cc: stable@vger.kernel.org
Fixes: 16a9a9da17 ("mptcp: Add helper to process acks of DATA_FIN")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-1-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 5c3bf6cba7 ("bonding: assign random address if device
address is same as bond"), bonding will erroneously randomize the MAC
address of the first interface added to the bond if fail_over_mac =
follow.
Correct this by additionally testing for the bond being empty before
randomizing the MAC.
Fixes: 5c3bf6cba7 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren <qren@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to increment i_fastreg_wrs before we bail out from
rds_ib_post_reg_frmr().
We have a fixed budget of how many FRWR operations that can be
outstanding using the dedicated QP used for memory registrations and
de-registrations. This budget is enforced by the atomic_t
i_fastreg_wrs. If we bail out early in rds_ib_post_reg_frmr(), we will
"leak" the possibility of posting an FRWR operation, and if that
accumulates, no FRWR operation can be carried out.
Fixes: 1659185fb4 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Fixes: 3a2886cca7 ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown")
Cc: stable@vger.kernel.org
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250911133336.451212-1-haakon.bugge@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The error log in vhost_scsi_make_tport() prints the arguments in the
wrong order, producing confusing output. For example, when creating a
target with a name in WWNN format such as "fc.port1234", the log
looks like:
Emulated fc.port1234 Address: FCP, exceeds max: 64
Instead, the message should report the emulated protocol type first,
followed by the configfs name as:
Emulated FCP Address: fc.port1234, exceeds max: 64
Fix the argument order so the error log is consistent and clear.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Message-Id: <20250913154106.3995856-1-alok.a.tiwari@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Enable the cleaner shader for additional GFX11.0.1/11.0.4 series GPUs to
ensure data isolation among GPU tasks. The cleaner shader is tasked with
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid
data leakage and guarantees the accuracy of computational results.
This update extends cleaner shader support to GFX11.0.1/11.0.4 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.
Cc: Wasee Alam <wasee.alam@amd.com>
Cc: Mario Sopena-Novales <mario.novales@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a71ceb27f)
Refactor: eliminate type casts by using proper u32
declarations.
v2:
- Address review comments. (Karthik)
v3:
- Use the proper u32 type and drop cast. (Lucas De Marchi)
- Modify variable when actually using u64 value.
- Change r value to reg_value with u32 type.
v4:
- Remove newline between trailer and Signed-off-by. (Lucas De Marchi)
- Change reg_val to val for more user-friendly logging.
- Use mul_u32_u32 function since both values are u32.
v5:
- mul_u32_u32 function with shift. (Lucas De Marchi)
Fixes: 7596d839f6 ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4e1d3b5e64)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
__maps__fixup_overlap_and_insert may split or directly insert a map,
when doing this the map may need to have a kmap set up for the sake of
the kmaps. The missing kmap set up fails the check_invariants test in
maps, later "Internal error" reports from map__kmap and ultimately
causes segfaults.
Similar fixes were added in commit e0e4e0b8b7 ("perf maps: Add
missing map__set_kmap_maps() when replacing a kernel map") and commit
25d9c0301d ("perf maps: Set the kmaps for newly created/added kernel
maps") but they missed cases. To try to reduce the risk of this,
update the kmap directly following any manual insert. This identified
another problem in maps__copy_from.
Fixes: e0e4e0b8b7 ("perf maps: Add missing map__set_kmap_maps() when replacing a kernel map")
Fixes: 25d9c0301d ("perf maps: Set the kmaps for newly created/added kernel maps")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Commit 88e6c42e40de ("io_uring/io-wq: add check free worker before
create new worker") reused the variable `do_create` for something
else, abusing it for the free worker check.
This caused the value to effectively always be `true` at the time
`nr_workers < max_workers` was checked, but it should really be
`false`. This means the `max_workers` setting was ignored, and worse:
if the limit had already been reached, incrementing `nr_workers` was
skipped even though another worker would be created.
When later lots of workers exit, the `nr_workers` field could easily
underflow, making the problem worse because more and more workers
would be created without incrementing `nr_workers`.
The simple solution is to use a different variable for the free worker
check instead of using one variable for two different things.
Cc: stable@vger.kernel.org
Fixes: 88e6c42e40de ("io_uring/io-wq: add check free worker before create new worker")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Fengnan Chang <changfengnan@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The sev_evict_cache() is guest-related code and should be guarded by
CONFIG_AMD_MEM_ENCRYPT, not CONFIG_KVM_AMD_SEV.
CONFIG_AMD_MEM_ENCRYPT=y is required for a guest to run properly as an SEV-SNP
guest, but a guest kernel built with CONFIG_KVM_AMD_SEV=n would get the stub
function of sev_evict_cache() instead of the version that performs the actual
eviction. Move the function declarations under the appropriate #ifdef.
Fixes: 7b306dfa32 ("x86/sev: Evict cache lines during SNP memory validation")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@kernel.org # 6.16.x
Link: https://lore.kernel.org/r/70e38f2c4a549063de54052c9f64929705313526.1757708959.git.thomas.lendacky@amd.com
The IPC shared memory can reside in system memory or SRAM. In the latter
case the memory-region property is expected not to be present, so do not
warn about it.
Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Fixes: dbe4efea38 ("firmware: tegra: bpmp: Use of_reserved_mem_region_to_resource() for "memory-region"")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Pull power supply fixes from Sebastian Reichel:
- bq27xxx: avoid spamming the log for missing bq27000 battery
* tag 'for-v6.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: bq27xxx: restrict no-battery detection to bq27000
power: supply: bq27xxx: fix error return in case of no bq27000 hdq battery
Use dev_get_drvdata(dev->parent) instead of dev_get_platdata(dev)
to correctly obtain acp_chip_info members in the acp I2S driver.
Previously, some members were not updated properly due to incorrect
data access, which could potentially lead to null pointer
dereferences.
This issue was missed in the earlier commit
("ASoC: amd: acp: Fix NULL pointer deref in acp_i2s_set_tdm_slot"),
which only addressed set_tdm_slot(). This change ensures that all
relevant functions correctly retrieve acp_chip_info, preventing
further null pointer dereference issues.
Fixes: e3933683b2 ("ASoC: amd: acp: Remove redundant acp_dev_data structure")
Signed-off-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20250910171419.3682468-1-venkataprasad.potturu@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The QCS8275 board is based on Qualcomm's QCS8300 SoC family, and all
supported firmware files are located in the qcs8300 directory. The
sound topology and ALSA UCM configuration files have also been migrated
from the qcs8275 directory to the actual SoC qcs8300 directory in
linux-firmware. With the current setup, the sound topology fails
to load, resulting in sound card registration failure.
This patch updates the driver match data to use the correct driver name
qcs8300 for the qcs8275-sndcard, ensuring that the sound card driver
correctly loads the sound topology and ALSA UCM configuration files
from the qcs8300 directory.
Fixes: 34d340d48e ("ASoC: qcom: sc8280xp: Add support for QCS8275")
Cc: stable@vger.kernel.org
Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20250914131549.1198740-1-mohammad.rafi.shaik@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Xiumei reported a regression in IPsec offload tests over xfrmi, where
the traffic for IPv6 over IPv4 tunnels is processed in SW instead of
going through crypto offload, after commit
cc18f482e8 ("xfrm: provide common xdo_dev_offload_ok callback
implementation").
Commit cc18f482e8 added a generic version of existing checks
attempting to prevent packets with IPv4 options or IPv6 extension
headers from being sent to HW that doesn't support offloading such
packets. The check mistakenly uses x->props.family (the outer family)
to determine the inner packet's family and verify if
options/extensions are present.
In the case of IPv6 over IPv4, the check compares some of the traffic
class bits to the expected no-options ihl value (5). The original
check was introduced in commit 2ac9cfe782 ("net/mlx5e: IPSec, Add
Innova IPSec offload TX data path"), and then duplicated in the other
drivers. Before commit cc18f482e8, the loose check (ihl > 5) passed
because those traffic class bits were not set to a value that
triggered the no-offload codepath. Packets with options/extension
headers that should have been handled in SW went through the offload
path, and were likely dropped by the NIC or incorrectly
processed. Since commit cc18f482e8, the check is now strict (ihl !=
5), and in a basic setup (no traffic class configured), all packets go
through the no-offload codepath.
The commits that introduced the incorrect family checks in each driver
are:
2ac9cfe782 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
8362ea16f6 ("crypto: chcr - ESN for Inline IPSec Tx")
859a497fe8 ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
32188be805 ("cn10k-ipsec: Allow ipsec crypto offload for skb with SA")
[ixgbe/ixgbevf commits are ignored, as that HW does not support tunnel
mode, thus no cross-family setups are possible]
Fixes: cc18f482e8 ("xfrm: provide common xdo_dev_offload_ok callback implementation")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The issue was caused by incorrect configuration in the driver,
which prevented proper volume control on certain systems.
Signed-off-by: Bou-Saan Che <yungmeat@inboxia.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The laptop does not contain valid _DSD for these amps, so requires
entries into the CS35L41 configuration table to function correctly.
Signed-off-by: Bou-Saan Che <yungmeat@inboxia.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This laptop does not contain _DSD so needs to be supported using the
configuration table.
Signed-off-by: Bou-Saan Che <yungmeat@inboxia.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since commit 7c010d4633 ("gpiolib: acpi: Make sure we fill struct
acpi_gpio_info"), uninitialized acpi_gpio_info struct are passed to
__acpi_find_gpio() and later in the call stack info->quirks is used in
acpi_populate_gpio_lookup. This breaks the i2c_hid_cpi driver:
[ 58.122916] i2c_hid_acpi i2c-UNIW0001:00: HID over i2c has not been provided an Int IRQ
[ 58.123097] i2c_hid_acpi i2c-UNIW0001:00: probe with driver i2c_hid_acpi failed with error -22
Fix this by initializing the acpi_gpio_info pass to __acpi_find_gpio()
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220388
Fixes: 7c010d4633 ("gpiolib: acpi: Make sure we fill struct acpi_gpio_info")
Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Tested-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-By: Calvin Owens <calvin@wbinvd.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
If create_monitor_dir() fails, the function returns directly without
releasing rv_interface_lock. This leaves the mutex locked and causes
subsequent monitor registration attempts to deadlock.
Fix it by making the error path jump to out_unlock, ensuring that the
mutex is always released before returning.
Fixes: 24cbfe18d5 ("rv: Merge struct rv_monitor_def into struct rv_monitor")
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20250903065112.1878330-1-zhen.ni@easystack.cn
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Argument 'p' of enabled_monitors_next() is not a pointer to struct
rv_monitor, it is actually a pointer to the list_head inside struct
rv_monitor. Therefore it is wrong to cast 'p' to struct rv_monitor *.
This wrong type cast has been there since the beginning. But it still
worked because the list_head was the first field in struct rv_monitor_def.
This is no longer true since commit 24cbfe18d5 ("rv: Merge struct
rv_monitor_def into struct rv_monitor") moved the list_head, and this wrong
type cast became a functional problem.
Properly use container_of() instead.
Fixes: 24cbfe18d5 ("rv: Merge struct rv_monitor_def into struct rv_monitor")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250806120911.989365-1-namcao@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
When sorting the block group list for reclaim we are using a block group's
used bytes counter without taking the block group's spinlock, so we can
race with a concurrent task updating it (at btrfs_update_block_group()),
which makes tools like KCSAN unhappy and report a race.
Since the sorting is not strictly needed from a functional perspective
and such races should rarely cause any ordering changes (only load/store
tearing could cause them), not to mention that after the sorting the
ordering may no longer be accurate due to concurrent allocations and
deallocations of extents in a block group, annotate the accesses to the
used counter with data_race() to silence KCSAN and similar tools.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_init_file_extent_tree() uses S_ISREG() to determine if the file is
a regular file. In the beginning of btrfs_read_locked_inode(), the i_mode
hasn't been read from inode item, then file_extent_tree won't be used at
all in volumes without NO_HOLES.
Fix this by calling btrfs_init_file_extent_tree() after i_mode is
initialized in btrfs_read_locked_inode().
Fixes: 3d7db6e8bd ("btrfs: don't allocate file extent tree for non regular files")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: austinchang <austinchang@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When moving a block-group to the dedicated data relocation space-info in
btrfs_zoned_reserve_data_reloc_bg() it is asserted that the newly
created block group for data relocation does not contain any
zone_unusable bytes.
But on disks with zone_capacity < zone_size, the difference between
zone_size and zone_capacity is accounted as zone_unusable.
Instead of asserting that the block-group does not contain any
zone_unusable bytes, remove them from the block-groups total size.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/CAHj4cs8-cS2E+-xQ-d2Bj6vMJZ+CwT_cbdWBTju4BV35LsvEYw@mail.gmail.com/
Fixes: daa0fde322 ("btrfs: zoned: fix data relocation block group reservation")
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The offset for an extref item's key is not the object ID of the parent
dir, otherwise we would not need the extref item and would use plain ref
items. Instead the offset is the result of a hash computation that uses
the object ID of the parent dir and the name associated to the entry.
So fix this by setting the key offset at replay_one_name() to be the
result of calling btrfs_extref_hash().
Fixes: 725af92a62 ("btrfs: Open-code name_in_log_ref in replay_one_name")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If data_offset and data_length of smb_direct_data_transfer struct are
invalid, out of bounds issue could happen.
This patch validate data_offset and data_length field in recv_done.
Cc: stable@vger.kernel.org
Fixes: 2ea086e35c ("ksmbd: add buffer validation for smb direct")
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Reported-by: Luigino Camastra, Aisle Research <luigino.camastra@aisle.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
We should not use more sges for ib_post_send() than we told the rdma
device in rdma_create_qp()!
Otherwise ib_post_send() will return -EINVAL, so we disconnect the
connection. Or with the current siw.ko we'll get 0 from ib_post_send(),
but will never ever get a completion for the request. I've already sent a
fix for siw.ko...
So we need to make sure smb_direct_writev() limits the number of vectors
we pass to individual smb_direct_post_send_data() calls, so that we
don't go over the queue pair limits.
Commit 621433b7e2 ("ksmbd: smbd: relax the count of sges required")
was very strange and I guess only needed because
SMB_DIRECT_MAX_SEND_SGES was 8 at that time. It basically removed the
check that the rdma device is able to handle the number of sges we try
to use.
While the real problem was added by commit ddbdc861e3 ("ksmbd: smbd:
introduce read/write credits for RDMA read/write") as it used the
minumun of device->attrs.max_send_sge and device->attrs.max_sge_rd, with
the problem that device->attrs.max_sge_rd is always 1 for iWarp. And
that limitation should only apply to RDMA Read operations. For now we
keep that limitation for RDMA Write operations too, fixing that is a
task for another day as it's not really required a bug fix.
Commit 2b4eeeaa90 ("ksmbd: decrease the number of SMB3 smbdirect
server SGEs") lowered SMB_DIRECT_MAX_SEND_SGES to 6, which is also used
by our client code. And that client code enforces
device->attrs.max_send_sge >= 6 since commit d2e81f92e5 ("Decrease the
number of SMB3 smbdirect client SGEs") and (briefly looking) only the
i40w driver provides only 3, see I40IW_MAX_WQ_FRAGMENT_COUNT. But
currently we'd require 4 anyway, so that would not work anyway, but now
it fails early.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: linux-rdma@vger.kernel.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Fixes: ddbdc861e3 ("ksmbd: smbd: introduce read/write credits for RDMA read/write")
Fixes: 621433b7e2 ("ksmbd: smbd: relax the count of sges required")
Fixes: 2b4eeeaa90 ("ksmbd: decrease the number of SMB3 smbdirect server SGEs")
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ilya Maximets says:
====================
net: dst_metadata: fix DF flag extraction on tunnel rx
Two patches here, first fixes the issue where tunnel core doesn't
actually extract DF bit from the outer IP header, even though both
OVS and TC flower allow matching on it. More details in the commit
message.
The second is a selftest for openvswitch that reproduces the issue,
but also just adds some basic coverage for the tunnel metadata
extraction and related openvswitch uAPI.
====================
Link: https://patch.msgid.link/20250909165440.229890-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This test ensures that upon receiving decapsulated packets from a
tunnel interface in openvswitch, the tunnel metadata fields are
properly populated. This partially covers interoperability of the
kernel tunnel ports and openvswitch tunnels (LWT) and parsing and
formatting of the tunnel metadata fields of the openvswitch netlink
uAPI. Doing so, this test also ensures that fields and flags are
properly extracted during decapsulation by the tunnel core code,
serving as a regression test for the previously fixed issue with the
DF bit not being extracted from the outer IP header.
The ovs-dpctl.py script already supports all that is necessary for
the tunnel ports for this test, so we only need to adjust the
ovs_add_if() function to pass the '-t' port type argument in order
to be able to create tunnel ports in the openvswitch datapath.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://patch.msgid.link/20250909165440.229890-3-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both OVS and TC flower allow extracting and matching on the DF bit of
the outer IP header via OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT in the
OVS_KEY_ATTR_TUNNEL and TCA_FLOWER_KEY_FLAGS_TUNNEL_DONT_FRAGMENT in
the TCA_FLOWER_KEY_ENC_FLAGS respectively. Flow dissector extracts
this information as FLOW_DIS_F_TUNNEL_DONT_FRAGMENT from the tunnel
info key.
However, the IP_TUNNEL_DONT_FRAGMENT_BIT in the tunnel key is never
actually set, because the tunneling code doesn't actually extract it
from the IP header. OAM and CRIT_OPT are extracted by the tunnel
implementation code, same code also sets the KEY flag, if present.
UDP tunnel core takes care of setting the CSUM flag if the checksum
is present in the UDP header, but the DONT_FRAGMENT is not handled at
any layer.
Fix that by checking the bit and setting the corresponding flag while
populating the tunnel info in the IP layer where it belongs.
Not using __assign_bit as we don't really need to clear the bit in a
just initialized field. It also doesn't seem like using __assign_bit
will make the code look better.
Clearly, users didn't rely on this functionality for anything very
important until now. The reason why this doesn't break OVS logic is
that it only matches on what kernel previously parsed out and if kernel
consistently reports this bit as zero, OVS will only match on it to be
zero, which sort of works. But it is still a bug that the uAPI reports
and allows matching on the field that is not actually checked in the
packet. And this is causing misleading -df reporting in OVS datapath
flows, while the tunnel traffic actually has the bit set in most cases.
This may also cause issues if a hardware properly implements support
for tunnel flag matching as it will disagree with the implementation
in a software path of TC flower.
Fixes: 7d5437c709 ("openvswitch: Add tunneling interface.")
Fixes: 1d17568e74 ("net/sched: cls_flower: add support for matching tunnel control flags")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250909165440.229890-2-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the protection override dump path, the firmware can return far too
many GRC elements, resulting in attempting to write past the end of the
previously-kmalloc'ed dump buffer.
This will result in a kernel panic with reason:
BUG: unable to handle kernel paging request at ADDRESS
where "ADDRESS" is just past the end of the protection override dump
buffer. The start address of the buffer is:
p_hwfn->cdev->dbg_features[DBG_FEATURE_PROTECTION_OVERRIDE].dump_buf
and the size of the buffer is buf_size in the same data structure.
The panic can be arrived at from either the qede Ethernet driver path:
[exception RIP: qed_grc_dump_addr_range+0x108]
qed_protection_override_dump at ffffffffc02662ed [qed]
qed_dbg_protection_override_dump at ffffffffc0267792 [qed]
qed_dbg_feature at ffffffffc026aa8f [qed]
qed_dbg_all_data at ffffffffc026b211 [qed]
qed_fw_fatal_reporter_dump at ffffffffc027298a [qed]
devlink_health_do_dump at ffffffff82497f61
devlink_health_report at ffffffff8249cf29
qed_report_fatal_error at ffffffffc0272baf [qed]
qede_sp_task at ffffffffc045ed32 [qede]
process_one_work at ffffffff81d19783
or the qedf storage driver path:
[exception RIP: qed_grc_dump_addr_range+0x108]
qed_protection_override_dump at ffffffffc068b2ed [qed]
qed_dbg_protection_override_dump at ffffffffc068c792 [qed]
qed_dbg_feature at ffffffffc068fa8f [qed]
qed_dbg_all_data at ffffffffc0690211 [qed]
qed_fw_fatal_reporter_dump at ffffffffc069798a [qed]
devlink_health_do_dump at ffffffff8aa95e51
devlink_health_report at ffffffff8aa9ae19
qed_report_fatal_error at ffffffffc0697baf [qed]
qed_hw_err_notify at ffffffffc06d32d7 [qed]
qed_spq_post at ffffffffc06b1011 [qed]
qed_fcoe_destroy_conn at ffffffffc06b2e91 [qed]
qedf_cleanup_fcport at ffffffffc05e7597 [qedf]
qedf_rport_event_handler at ffffffffc05e7bf7 [qedf]
fc_rport_work at ffffffffc02da715 [libfc]
process_one_work at ffffffff8a319663
Resolve this by clamping the firmware's return value to the maximum
number of legal elements the firmware should return.
Fixes: d52c89f120 ("qed*: Utilize FW 8.37.2.0")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/f8e1182934aa274c18d0682a12dbaf347595469c.1757485536.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull generic phy driver fixes from Vinod Koul:
- Qualcomm repeater override properties, qmp pcie bindings fix for
clocks and initialization sequence for firmware power down case
- Marvell comphy bindings clock and child node constraints
- Tegra xusb device reference leaks fix
- TI omap usb device ref leak on unbind and RGMII IS settings fix
* tag 'phy-fix-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: qcom: qmp-pcie: Fix PHY initialization when powered down by firmware
phy: ti: gmii-sel: Always write the RGMII ID setting
dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Update pcie phy bindings
phy: ti-pipe3: fix device leak at unbind
phy: ti: omap-usb2: fix device leak at unbind
phy: tegra: xusb: fix device and OF node leak at probe
dt-bindings: phy: marvell,comphy-cp110: Fix clock and child node constraints
phy: qualcomm: phy-qcom-eusb2-repeater: fix override properties
Add a helper to validate the VF ID and use it in the VF ndo ops to
prevent accessing out-of-range entries.
Without this check, users can run commands such as:
# ip link show dev enp135s0
2: enp135s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:00:00:01:01:00 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off
vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off
# ip link set dev enp135s0 vf 4 mac 00:00:00:00:00:14
# echo $?
0
even though VF 4 does not exist, which results in silent success instead
of returning an error.
Fixes: 8a241ef9b9 ("octeon_ep: add ndo ops for VFs in PF driver")
Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250911223610.1803144-1-kheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The DPLL_CLOCK_QUALITY_LEVEL_ITU_OPT1_EPRC is not reported via netlink
due to bug in dpll_msg_add_clock_quality_level(). The usage of
DPLL_CLOCK_QUALITY_LEVEL_MAX for both DECLARE_BITMAP() and
for_each_set_bit() is not correct because these macros requires bitmap
size and not the highest valid bit in the bitmap.
Use correct bitmap size to fix this issue.
Fixes: a1afb959ad ("dpll: add clock quality level attribute and op")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250912093331.862333-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A NULL pointer dereference can occur in tcp_ao_finish_connect() during a
connect() system call on a socket with a TCP-AO key added and TCP_REPAIR
enabled.
The function is called with skb being NULL and attempts to dereference it
on tcp_hdr(skb)->seq without a prior skb validation.
Fix this by checking if skb is NULL before dereferencing it.
The commentary is taken from bpf_skops_established(), which is also called
in the same flow. Unlike the function being patched,
bpf_skops_established() validates the skb before dereferencing it.
int main(void){
struct sockaddr_in sockaddr;
struct tcp_ao_add tcp_ao;
int sk;
int one = 1;
memset(&sockaddr,'\0',sizeof(sockaddr));
memset(&tcp_ao,'\0',sizeof(tcp_ao));
sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
sockaddr.sin_family = AF_INET;
memcpy(tcp_ao.alg_name,"cmac(aes128)",12);
memcpy(tcp_ao.key,"ABCDEFGHABCDEFGH",16);
tcp_ao.keylen = 16;
memcpy(&tcp_ao.addr,&sockaddr,sizeof(sockaddr));
setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &tcp_ao,
sizeof(tcp_ao));
setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &one, sizeof(one));
sockaddr.sin_family = AF_INET;
sockaddr.sin_port = htobe16(123);
inet_aton("127.0.0.1", &sockaddr.sin_addr);
connect(sk,(struct sockaddr *)&sockaddr,sizeof(sockaddr));
return 0;
}
$ gcc tcp-ao-nullptr.c -o tcp-ao-nullptr -Wall
$ unshare -Urn
BUG: kernel NULL pointer dereference, address: 00000000000000b6
PGD 1f648d067 P4D 1f648d067 PUD 1982e8067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:tcp_ao_finish_connect (net/ipv4/tcp_ao.c:1182)
Fixes: 7c2ffaf21b ("net/tcp: Calculate TCP-AO traffic keys")
Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250911230743.2551-3-anderson@allelesecurity.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.17-rc6 that
resolve some reported problems. Included in here are:
- 8250 driver dt bindings fixes
- broadcom serial driver binding fixes
- hvc_console bugfix
- xilinx serial driver bugfix
- sc16is7xx serial driver bugfix
All of these have been in linux-next for the past week with no
reported issues"
* tag 'tty-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: xilinx_uartps: read reg size from DTS
tty: hvc_console: Call hvc_kick in hvc_write unconditionally
dt-bindings: serial: 8250: allow "main" and "uart" as clock names
dt-bindings: serial: 8250: move a constraint
dt-bindings: serial: brcm,bcm7271-uart: Constrain clocks
serial: sc16is7xx: fix bug in flow control levels init
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes and new device ids for 6.17-rc6.
Included in here are:
- new usb-serial driver device ids
- dummy-hcd locking bugfix for rt-enabled systems (which is crazy,
but people have odd testing requirements at times...)
- xhci driver bugfixes for reported issues
- typec driver bugfix
- midi2 gadget driver bugfixes
- usb core sysfs file regression fix from -rc1
All of these, except for the last usb sysfs file fix, have been in
linux-next with no reported issues. The sysfs fix was added to the
tree on Friday, and is "obviously correct" and should not have any
problems either, it just didn't have any time for linux-next to pick
up (0-day had no problems with it)"
* tag 'usb-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: core: remove the move buf action
usb: gadget: midi2: Fix MIDI2 IN EP max packet size
usb: gadget: midi2: Fix missing UMP group attributes initialization
usb: typec: tcpm: properly deliver cable vdms to altmode drivers
USB: gadget: dummy-hcd: Fix locking bug in RT-enabled kernels
xhci: fix memory leak regression when freeing xhci vdev devices depth first
xhci: dbc: Fix full DbC transfer ring after several reconnects
xhci: dbc: decouple endpoint allocation from initialization
USB: serial: option: add Telit Cinterion LE910C4-WWX new compositions
USB: serial: option: add Telit Cinterion FN990A w/audio compositions
Pull x86 fixes from Ingo Molnar:
"Fix a CPU topology parsing bug on AMD guests, and address
a lockdep warning in the resctrl filesystem"
* tag 'x86-urgent-2025-09-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Eliminate false positive lockdep warning when reading SNC counters
x86/cpu/topology: Always try cpu_parse_topology_ext() on AMD/Hygon
Pull timer fix from Ingo Molnar:
"Fix a lost-timeout CPU hotplug bug in the hrtimer code, which can
trigger with certain hardware configs and regular HZ"
* tag 'timers-urgent-2025-09-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimers: Unconditionally update target CPU base after offline timer migration
Pull input fixes from Dmitry Torokhov:
- a quirk to i8042 for yet another TUXEDO laptop
- a fix to mtk-pmic-keys driver to properly handle MT6359
- a fix to iqs7222 driver to only enable proximity interrupt
if it is mapped to a key or a switch event
- an update to xpad controller driver to recognize Flydigi Apex 5
controller
- an update to maintainers file to drop bounding entry for Melfas
touch controller
* tag 'input-for-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
MAINTAINERS: Input: Drop melfas-mip4 section
Input: mtk-pmic-keys - MT6359 has a specific release irq
Input: i8042 - add TUXEDO InfinityBook Pro Gen10 AMD to i8042 quirk table
Input: iqs7222 - avoid enabling unused interrupts
Input: xpad - add support for Flydigi Apex 5
This patch adds a new fixup for the ALC295 codec on some Dell
laptops that use the TAS2781 I2C amplifier.
The fixup correctly initializes the amplifier and pins, allowing
sound to work on all speakers of these devices.
The fixup chain is added to the relevant quirk entries for
Dell Polaris models.
[ adjusted for 6.17 kernel code by tiwai ]
Fixes: 1e9c708dc3 ("ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1249575
Signed-off-by: Donald Menig <djmenig@outlook.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull erofs fixes from Gao Xiang:
- Fix invalid algorithm dereference in encoded extents
- Add missing dax_break_layout_final(), since recent FSDAX fixes
didn't cover EROFS
- Arrange long xattr name prefixes more properly
* tag 'erofs-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix long xattr name prefix placement
erofs: fix runtime warning on truncate_folio_batch_exceptionals()
erofs: fix invalid algorithm for encoded extents
Pull a Renesas clk driver fix from Geert Uytterhoeven:
- Fix a Clock Domain regression on R-Car M1A, R-Car H1, and RZ/A1
* tag 'renesas-clk-fixes-for-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers:
clk: renesas: mstp: Add genpd OF provider at postcore_initcall()
When accessing one of the files under /sys/fs/nilfs2/features when
CONFIG_CFI_CLANG is enabled, there is a CFI violation:
CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d)
...
Call Trace:
<TASK>
sysfs_kf_seq_show+0x2a6/0x390
? __cfi_kobj_attr_show+0x10/0x10
kernfs_seq_show+0x104/0x15b
seq_read_iter+0x580/0xe2b
...
When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype
is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops. When
nilfs_feature_attr_group is added to that kobject via
sysfs_create_group(), the kernfs_ops of each files is sysfs_file_kfops_rw,
which will call sysfs_kf_seq_show() when ->seq_show() is called.
sysfs_kf_seq_show() in turn calls kobj_attr_show() through
->sysfs_ops->show(). kobj_attr_show() casts the provided attribute out to
a 'struct kobj_attribute' via container_of() and calls ->show(), resulting
in the CFI violation since neither nilfs_feature_revision_show() nor
nilfs_feature_README_show() match the prototype of ->show() in 'struct
kobj_attribute'.
Resolve the CFI violation by adjusting the second parameter in
nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct
kobj_attribute' to match the expected prototype.
Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com
Fixes: aebe17f684 ("nilfs2: add /sys/fs/nilfs2/features group")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "samples/damon: fix boot time enable handling fixup merge
mistakes".
First three patches of the patch series "mm/damon: fix misc bugs in DAMON
modules" [1] were trying to fix boot time DAMON sample modules enabling
issues. The issues are the modules can crash if those are enabled before
DAMON is enabled, like using boot time parameter options. The three
patches were fixing the issues by avoiding starting DAMON before the
module initialization phase.
However, probably by a mistake during a merge, only half of the change is
merged, and the part for avoiding the starting of DAMON before the module
initialized is missed. So the problem is not solved and thus the modules
can still crash if enabled before DAMON is initialized. Fix those by
applying the unmerged parts again.
Note that the broken commits are merged into 6.17-rc1, but also backported
to relevant stable kernels. So this series also needs to be merged into
the stable kernels. Hence Cc-ing stable@.
This patch (of 3):
Commit 0ed1165c37 ("samples/damon/wsse: fix boot time enable handling")
is somehow incompletely applying the origin patch [2]. It is missing the
part that avoids starting DAMON before module initialization. Probably a
mistake during a merge has happened. Fix it by applying the missed part
again.
Link: https://lkml.kernel.org/r/20250909022238.2989-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250909022238.2989-2-sj@kernel.org
Link: https://lkml.kernel.org/r/20250706193207.39810-1-sj@kernel.org [1]
Link: https://lore.kernel.org/20250706193207.39810-2-sj@kernel.org [2]
Fixes: 0ed1165c37 ("samples/damon/wsse: fix boot time enable handling")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/damon/sysfs: fix refresh_ms control overwriting on
multi-kdamonds usages".
Automatic esssential DAMON/DAMOS status update feature of DAMON sysfs
interface (refresh_ms) is broken [1] for multiple DAMON contexts
(kdamonds) use case, since it uses a global single damon_call_control
object for all created DAMON contexts. The fields of the object,
particularly the list field is over-written for the contexts and it makes
unexpected results including user-space hangup and kernel crashes [2].
Fix it by extending damon_call_control for the use case and updating the
usage on DAMON sysfs interface to use per-context dynamically allocated
damon_call_control object.
This patch (of 2):
When damon_call_control->repeat is set, damon_call() is executed
asynchronously, and is eventually canceled when kdamond finishes. If the
damon_call_control object is dynamically allocated, finding the place to
deallocate the object is difficult. Introduce a new damon_call_control
field, namely dealloc_on_cancel, to ask the kdamond deallocates those
dynamically allocated objects when those are canceled.
Link: https://lkml.kernel.org/r/20250908201513.60802-3-sj@kernel.org
Link: https://lkml.kernel.org/r/20250908201513.60802-2-sj@kernel.org
Fixes: d809a7c64b ("mm/damon/sysfs: implement refresh_ms file internal work")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Yunjeong Mun <yunjeong.mun@sk.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm: better GUP pin lru_add_drain_all()", v2.
Series of lru_add_drain_all()-related patches, arising from recent mm/gup
migration report from Will Deacon.
This patch (of 5):
Will Deacon reports:-
When taking a longterm GUP pin via pin_user_pages(),
__gup_longterm_locked() tries to migrate target folios that should not be
longterm pinned, for example because they reside in a CMA region or
movable zone. This is done by first pinning all of the target folios
anyway, collecting all of the longterm-unpinnable target folios into a
list, dropping the pins that were just taken and finally handing the list
off to migrate_pages() for the actual migration.
It is critically important that no unexpected references are held on the
folios being migrated, otherwise the migration will fail and
pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is
relatively easy to observe migration failures when running pKVM (which
uses pin_user_pages() on crosvm's virtual address space to resolve stage-2
page faults from the guest) on a 6.15-based Pixel 6 device and this
results in the VM terminating prematurely.
In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its
mapping of guest memory prior to the pinning. Subsequently, when
pin_user_pages() walks the page-table, the relevant 'pte' is not present
and so the faulting logic allocates a new folio, mlocks it with
mlock_folio() and maps it in the page-table.
Since commit 2fbb0c10d1 ("mm/munlock: mlock_page() munlock_page() batch
by pagevec"), mlock/munlock operations on a folio (formerly page), are
deferred. For example, mlock_folio() takes an additional reference on the
target folio before placing it into a per-cpu 'folio_batch' for later
processing by mlock_folio_batch(), which drops the refcount once the
operation is complete. Processing of the batches is coupled with the LRU
batch logic and can be forcefully drained with lru_add_drain_all() but as
long as a folio remains unprocessed on the batch, its refcount will be
elevated.
This deferred batching therefore interacts poorly with the pKVM pinning
scenario as we can find ourselves in a situation where the migration code
fails to migrate a folio due to the elevated refcount from the pending
mlock operation.
Hugh Dickins adds:-
!folio_test_lru() has never been a very reliable way to tell if an
lru_add_drain_all() is worth calling, to remove LRU cache references to
make the folio migratable: the LRU flag may be set even while the folio is
held with an extra reference in a per-CPU LRU cache.
5.18 commit 2fbb0c10d1 may have made it more unreliable. Then 6.11
commit 33dfe9204f ("mm/gup: clear the LRU flag of a page before adding
to LRU batch") tried to make it reliable, by moving LRU flag clearing; but
missed the mlock/munlock batches, so still unreliable as reported.
And it turns out to be difficult to extend 33dfe9204f29's LRU flag
clearing to the mlock/munlock batches: if they do benefit from batching,
mlock/munlock cannot be so effective when easily suppressed while !LRU.
Instead, switch to an expected ref_count check, which was more reliable
all along: some more false positives (unhelpful drains) than before, and
never a guarantee that the folio will prove migratable, but better.
Note on PG_private_2: ceph and nfs are still using the deprecated
PG_private_2 flag, with the aid of netfs and filemap support functions.
Although it is consistently matched by an increment of folio ref_count,
folio_expected_ref_count() intentionally does not recognize it, and ceph
folio migration currently depends on that for PG_private_2 folios to be
rejected. New references to the deprecated flag are discouraged, so do
not add it into the collect_longterm_unpinnable_folios() calculation: but
longterm pinning of transiently PG_private_2 ceph and nfs folios (an
uncommon case) may invoke a redundant lru_add_drain_all(). And this makes
easy the backport to earlier releases: up to and including 6.12, btrfs
also used PG_private_2, but without a ref_count increment.
Note for stable backports: requires 6.16 commit 86ebd50224 ("mm:
add folio_expected_ref_count() for reference count calculation").
Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com
Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com
Fixes: 9a4e9f3b2d ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull ceph fixes from Ilya Dryomov:
"A fix for a race condition around r_parent tracking that took a long
time to track down from Alex and some fixes for potential crashes on
accessing invalid memory from Max and myself.
All marked for stable"
* tag 'ceph-for-6.17-rc6' of https://github.com/ceph/ceph-client:
libceph: fix invalid accesses to ceph_connection_v1_info
ceph: fix crash after fscrypt_encrypt_pagecache_blocks() error
ceph: always call ceph_shift_unused_folios_left()
ceph: fix race condition where r_parent becomes stale before sending message
ceph: fix race condition validating r_parent before applying state
Pull regulator fix from Mark Brown:
"One fix for sy7636a which got confused about which device to use to
manage the lifecycle of the power good GPIO because it's looked up
from the parent device due to the way DT bindings work"
* tag 'regulator-fix-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: sy7636a: fix lifecycle of power good gpio
Pull driver core fixes from Danilo Krummrich:
- Fix UAF in cgroup pressure polling by using kernfs_get_active_of()
to prevent operations on released file descriptors
- Fix unresolved intra-doc link in the documentation of struct Device
when CONFIG_DRM != y
- Update the DMA Rust MAINTAINERS entry
* tag 'driver-core-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
MAINTAINERS: Update the DMA Rust entry
kernfs: Fix UAF in polling when open file is released
rust: device: fix unresolved link to drm::Device
The function module_arch_freeing_init() defined in arch/arm/kernel/module.c
is supposed to override a weak function of the same name defined in
kernel/module/main.c. However, the ARM override is also marked as weak,
which means that selecting the correct function unnecessarily depends on
the order in which object files with both functions are passed to the
linker. Although it happens to be correct at the moment, the proper pattern
is to make the ARM override a strong definition.
Fixes: cdcb07e45a ("ARM: 8975/1: module: fix handling of unwind init sections")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Commit 6cc44e9618 ("drm: Add directive to format code in comment")
fixes original Sphinx indentation warning as introduced in
471920ce25 ("drm/gpuvm: Add locking helpers"), by means of using
code-block:: directive. It semantically conflicts with earlier
bb324f85f7 ("drm/gpuvm: Wrap drm_gpuvm_sm_map_exec_lock() expected
usage in literal code block") that did the same using double colon
syntax instead. These duplicated literal code block directives causes
the original warnings not being fixed.
Revert 6cc44e9618 to keep things rolling without these warnings.
Fixes: 6cc44e9618 ("drm: Add directive to format code in comment")
Fixes: 471920ce25 ("drm/gpuvm: Add locking helpers")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Commit 7bea695ada restructured DTE flag handling but inadvertently changed
the alias device configuration logic. This may cause incorrect DTE settings
for certain devices.
Add alias flag check before calling set_dev_entry_from_acpi(). Also move the
device iteration loop inside the alias check to restrict execution to cases
where alias devices are present.
Fixes: 7bea695ada ("iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags")
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When
9770b428b1 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")
moved the error messages dumping so that they don't need to be issued by
the callers, it missed the case where __sev_firmware_shutdown() calls
__sev_platform_shutdown_locked() with a NULL argument which leads to
a NULL ptr deref on the shutdown path, during suspend to disk:
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 983 Comm: hib.sh Not tainted 6.17.0-rc4+ #1 PREEMPT(voluntary)
Hardware name: Supermicro Super Server/H12SSL-i, BIOS 2.5 09/08/2022
RIP: 0010:__sev_platform_shutdown_locked.cold+0x0/0x21 [ccp]
That rIP is:
00000000000006fd <__sev_platform_shutdown_locked.cold>:
6fd: 8b 13 mov (%rbx),%edx
6ff: 48 8b 7d 00 mov 0x0(%rbp),%rdi
703: 89 c1 mov %eax,%ecx
Code: 74 05 31 ff 41 89 3f 49 8b 3e 89 ea 48 c7 c6 a0 8e 54 a0 41 bf 92 ff ff ff e8 e5 2e 09 e1 c6 05 2a d4 38 00 01 e9 26 af ff ff <8b> 13 48 8b 7d 00 89 c1 48 c7 c6 18 90 54 a0 89 44 24 04 e8 c1 2e
RSP: 0018:ffffc90005467d00 EFLAGS: 00010282
RAX: 00000000ffffff92 RBX: 0000000000000000 RCX: 0000000000000000
^^^^^^^^^^^^^^^^
and %rbx is nice and clean.
Call Trace:
<TASK>
__sev_firmware_shutdown.isra.0
sev_dev_destroy
psp_dev_destroy
sp_destroy
pci_device_shutdown
device_shutdown
kernel_power_off
hibernate.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
entry_SYSCALL_64_after_hwframe
Pass in a pointer to the function-local error var in the caller.
With that addressed, suspending the ccp shows the error properly at
least:
ccp 0000:47:00.1: sev command 0x2 timed out, disabling PSP
ccp 0000:47:00.1: SEV: failed to SHUTDOWN error 0x0, rc -110
SEV-SNP: Leaking PFN range 0x146800-0x146a00
SEV-SNP: PFN 0x146800 unassigned, dumping non-zero entries in 2M PFN region: [0x146800 - 0x146a00]
...
ccp 0000:47:00.1: SEV-SNP firmware shutdown failed, rc -16, error 0x0
ACPI: PM: Preparing to enter system sleep state S5
kvm: exiting hardware virtualization
reboot: Power down
Btw, this driver is crying to be cleaned up to pass in a proper I/O
struct which can be used to store information between the different
functions, otherwise stuff like that will happen in the future again.
Fixes: 9770b428b1 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Reviewed-by: Ashish Kalra <ashish.kalra@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When "perf lock con" is run in a live mode, with no data file, a host
environment must be provided. Testing missed this as a failing assert
was creating the 1 line of expected stderr output.
$ sudo perf lock con -ab true
perf: util/session.c:195: __perf_session__new: Assertion `host_env != NULL' failed.
Aborted
Fixes: 525a599bad ("perf env: Remove global perf_env")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When cross-compiling the perf tool for ARM64, `perf help` may crash
with the following assertion failure:
help.c:122: exclude_cmds: Assertion `cmds->names[ci] == NULL' failed.
This happens when the perf binary is not named exactly "perf" or when
multiple "perf-*" binaries exist in the same directory. In such cases,
the `excludes` command list can be empty, which leads to the final
assertion in exclude_cmds() being triggered.
Add a simple guard at the beginning of exclude_cmds() to return early
if excludes->cnt is zero, preventing the crash.
Signed-off-by: hupu <hupu.gm@gmail.com>
Reported-by: Guilherme Amadio <amadio@gentoo.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250909094953.106706-1-amadio@gentoo.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Network drivers sometimes return -EOPNOTSUPP from their get_ts_info()
method, and this should not cause the reporting of PHY timestamping
information to be prohibited. Handle this error code, and also
arrange for ethtool_net_get_ts_info_by_phc() to return -EOPNOTSUPP
when the method is not implemented.
This allows e.g. PHYs connected to DSA switches which support
timestamping to report their timestamping capabilities.
Fixes: b9e3f7dc9e ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/E1uwiW3-00000004jRF-3CnC@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update email address to @oss.qualcomm.com for both the maintainers
from qualcomm, Viken dadhani and Mukesh Kumar Savaliya.
Signed-off-by: Mukesh Kumar Savaliya <mukesh.savaliya@oss.qualcomm.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Pull pci fix from Bjorn Helgaas:
- Fix mvebu PCI enumeration regression caused by converting to
for_each_of_range() iterator (Klaus Kudielka)
* tag 'pci-v6.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: mvebu: Fix use of for_each_of_range() iterator
Pull drm fixes from Dave Airlie:
"Weekly pull fixes for drm, mostly amdgpu and xe, with a revert for
nouveau and some maintainers updates, and misc bits, doesn't seem too
out of the normal.
MAINTAINERS:
- add rust tree to MAINTAINERS
- fix X entries for nova/nouveau
nova:
- depend on 64-bit
i915:
- Fix size for for_each_set_bit() in abox iteration
xe:
- Don't touch survivability_mode on fini
- Fixes around eviction and suspend
- Extend Wa_13011645652 to PTL-H, WCL
amdgpu:
- PSP 11.x fix
- DPCD quirk handing fix
- DCN 3.5 PG fix
- Audio suspend fix
- OEM i2c clean up fix
- Module unload memory leak fix
- DC delay fix
- ISP firmware fix
- VCN fixes
amdkfd:
- P2P topology fix
- APU mem limit calculation fix
mediatek:
- fix potential OF node use-after-free
panthor:
- out-of-bounds check
nouveau:
- revert waitqueue removal for sched teardown
* tag 'drm-fixes-2025-09-12' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
MAINTAINERS: drm-misc: fix X: entries for nova/nouveau
drm/mediatek: clean up driver data initialisation
drm/mediatek: fix potential OF node use-after-free
drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any time
drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packages
drm/amd/amdgpu: Declare isp firmware binary file
drm/amd/display: use udelay rather than fsleep
drm/amdgpu: fix a memory leak in fence cleanup when unloading
drm/xe: Extend Wa_13011645652 to PTL-H, WCL
drm/xe: Block exec and rebind worker while evicting for suspend / hibernate
drm/xe: Allow the pm notifier to continue on failure
drm/xe: Attempt to bring bos back to VRAM after eviction
drm/xe/configfs: Don't touch survivability_mode on fini
amd/amdkfd: correct mem limit calculation for small APUs
drm/amdkfd: fix p2p links bug in topology
drm/amd/display: remove oem i2c adapter on finish
drm/amd/display: Drop dm_prepare_suspend() and dm_complete()
drm/amd/display: Correct sequences and delays for DCN35 PG & RCG
drm/amd/display: Disable DPCD Probe Quirk
drm/i915/power: fix size for for_each_set_bit() in abox iteration
...
Pull smb client fixes from Steve French:
"Two smb3 client fixes, both for stable:
- Fix encryption problem with multiple compounded ops
- Fix rename error cases that could lead to data corruption"
* tag 'v6.17-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix data loss due to broken rename(2)
smb: client: fix compound alignment with encryption
SoCFPGA DTS fix for v6.17
- Fix midio bus probe and PHY address for cylone5 sodia board
* tag 'socfpga_dts_fix_for_v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
ARM: dts: socfpga: sodia: Fix mdio bus probe and PHY address
Link: https://lore.kernel.org/r/20250907123058.175447-1-dinguyen@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
mvebu fixes for 6.17 (part 1)
Fix SATA ports on various boards: Macchiatobin, CN913x-solidrun.
Fix audio on Armada 370 DB and OpenRD.
Disable eMMC high-speed modes on the CN9132 CEX-7 module.
Disable runtime reconfiguration for PCIe lanes on the CN9132 CEX-7 module.
* tag 'mvebu-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
arm64: dts: marvell: cn9132-clearfog: fix multi-lane pci x2 and x4 ports
arm64: dts: marvell: cn9132-clearfog: disable eMMC high-speed modes
arm64: dts: marvell: cn913x-solidrun: fix sata ports status
ARM: dts: kirkwood: Fix sound DAI cells for OpenRD clients
ARM64: dts: mcbin: fix SATA ports on Macchiatobin
ARM: dts: armada-370-db: Fix stereo audio input routing on Armada 370
Link: https://lore.kernel.org/r/87ikhnn1pl.fsf@BLaptop.bootlin.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add support for missing hotkey keycodes affecting Asus PX13 and PX16 families
so userspace can use them.
Signed-off-by: Amit Chaudhari <amitchaudhari@mac.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Since commit 6485543488 ("HID: cp2112: use new line value setter
callbacks"), setting a GPIO value always fails with error -EBADE.
That's because the returned value by the setter callbacks is the
returned value by the hid_hw_raw_request() function which is the number of
bytes sent on success or a negative value on error. The function
gpiochip_set() returns -EBADE if the setter callbacks return a value >
0.
Fix this by making the setter callbacks return 0 on success or a negative
value on error.
While at it, use the returned value by cp2112_gpio_set_unlocked() in the
direction_output callback.
Fixes: 6485543488 ("HID: cp2112: use new line value setter callbacks")
Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Commit 84c9d2a968 ("HID: lenovo: Support for ThinkPad-X12-TAB-1/2 Kbd
Fn keys") added a dependency on ACPI's platform_profile. This should not
be done for generic USB devices as this prevents using the devices on
non ACPI devices like Apple silicon Macs and other non-ACPI arm64
systems. An attempt to allow using platform_profile on non-ACPI systems
was rejected in [1] and instead platform_profile was made to fail during
init in commit dd133162c9 ("ACPI: platform_profile: Avoid initializing
on non-ACPI platforms").
So remove the broken dependency and instead let's user space handle this
keycode by sending the new KEY_PERFORMANCE. Stable backport depends on
commit 89c5214639 ("Input: add keycode for performance mode key").
[1]: https://lore.kernel.org/linux-acpi/CAJZ5v0icRdTSToaKbdf=MdRin4NyB2MstUVaQo8VR6-n7DkVMQ@mail.gmail.com/
Cc: regressions@lists.linux.dev
Cc: stable@vger.kernel.org
Fixes: 84c9d2a968 ("HID: lenovo: Support for ThinkPad-X12-TAB-1/2 Kbd Fn keys")
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
According to the power structure of IC hardware design for UHS-II
interface, reset control and timing must be added to the initialization
process of powering on the UHS-II interface.
Fixes: 27dd3b8255 ("mmc: sdhci-pci-gli: enable UHS-II mode for GL9767")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The sdhci_set_clock() is called in sdhci_set_ios_common() and
__sdhci_uhs2_set_ios(). According to Section 3.13.2 "Card Interface
Detection Sequence" of the SD Host Controller Standard Specification
Version 7.00, the SD clock is supplied after power is supplied, so we only
need one in __sdhci_uhs2_set_ios(). Let's move the code related to setting
the clock from sdhci_set_ios_common() into sdhci_set_ios() and modify
the parameters passed to sdhci_set_clock() in __sdhci_uhs2_set_ios().
Fixes: 10c8298a05 ("mmc: sdhci-uhs2: add set_ios()")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Ben Chuang <ben.chuang@genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The mvebu-comphy driver does not currently know how to pass correct
lane-count to ATF while configuring the serdes lanes.
This causes the system to hard reset during reconfiguration, if a pci
card is present and has established a link during bootloader.
Remove the comphy handles from the respective pci nodes to avoid runtime
reconfiguration, relying solely on bootloader configuration - while
avoiding the hard reset.
When bootloader has configured the lanes correctly, the pci ports are
functional under Linux.
This issue may be addressed in the comphy driver at a future point.
Fixes: e9ff907f40 ("arm64: dts: add description for solidrun cn9132 cex7 module and clearfog board")
Cc: stable@vger.kernel.org
Signed-off-by: Josua Mayer <josua@solid-run.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Similar to MacchiatoBIN the high-speed modes are unstable on the CN9132
CEX-7 module, leading to failed transactions under normal use.
Disable all high-speed modes including UHS.
Additionally add no-sdio and non-removable properties as appropriate for
eMMC.
Fixes: e9ff907f40 ("arm64: dts: add description for solidrun cn9132 cex7 module and clearfog board")
Cc: stable@vger.kernel.org
Signed-off-by: Josua Mayer <josua@solid-run.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Commit "arm64: dts: marvell: only enable complete sata nodes" changed
armada-cp11x.dtsi disabling all sata ports status by default.
The author missed some dts which relied on the dtsi enabling all ports,
and just disabled unused ones instead.
Update dts for SolidRun cn913x based boards to enable the available
ports, rather than disabling the unvavailable one.
Further according to dt bindings the serdes phys are to be specified in
the port node, not the controller node.
Move those phys properties accordingly in clearfog base/pro/solidwan.
Fixes: 30023876ae ("arm64: dts: marvell: only enable complete sata nodes")
Cc: stable@vger.kernel.org
Signed-off-by: Josua Mayer <josua@solid-run.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Johan writes:
USB serial device ids for 6.17-rc6
Here are some new modem device ids.
Everything has been in linux-next with no reported issues.
* tag 'usb-serial-6.17-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit Cinterion LE910C4-WWX new compositions
USB: serial: option: add Telit Cinterion FN990A w/audio compositions
A previous commit changed the '#sound-dai-cells' property for the
kirkwood audio controller from 1 to 0 in the kirkwood.dtsi file,
but did not update the corresponding 'sound-dai' property in the
kirkwood-openrd-client.dts file.
This created a mismatch, causing a dtbs_check validation error where
the dts provides one cell (<&audio0 0>) while the .dtsi expects zero.
Remove the extraneous cell from the 'sound-dai' property to fix the
schema validation warning and align with the updated binding.
Fixes: e662e70fa4 ("arm: dts: kirkwood: fix error in #sound-dai-cells size")
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Starting with commit c50e747596 ("dpaa2-switch: Fix error checking in
dpaa2_switch_seed_bp()"), the probing of a second DPSW object errors out
like below.
fsl_dpaa2_switch dpsw.1: fsl_mc_driver_probe failed: -12
fsl_dpaa2_switch dpsw.1: probe with driver fsl_dpaa2_switch failed with error -12
The aforementioned commit brought to the surface the fact that seeding
buffers into the buffer pool destined for control traffic is not
successful and an access violation recoverable error can be seen in the
MC firmware log:
[E, qbman_rec_isr:391, QBMAN] QBMAN recoverable event 0x1000000
This happens because the driver incorrectly used the ID of the DPBP
object instead of the hardware buffer pool ID when trying to release
buffers into it.
This is because any DPSW object uses two buffer pools, one managed by
the Linux driver and destined for control traffic packet buffers and the
other one managed by the MC firmware and destined only for offloaded
traffic. And since the buffer pool managed by the MC firmware does not
have an external facing DPBP equivalent, any subsequent DPBP objects
created after the first DPSW will have a DPBP id different to the
underlying hardware buffer ID.
The issue was not caught earlier because these two numbers can be
identical when all DPBP objects are created before the DPSW objects are.
This is the case when the DPL file is used to describe the entire DPAA2
object layout and objects are created at boot time and it's also true
for the first DPSW being created dynamically using ls-addsw.
Fix this by using the buffer pool ID instead of the DPBP id when
releasing buffers into the pool.
Fixes: 2877e4f7e1 ("staging: dpaa2-switch: setup buffer pool and RX path rings")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20250910144825.2416019-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
napi_stop_kthread waits for the NAPI_STATE_SCHED_THREADED to be unset
before stopping the kthread. But it uses test_bit with the
NAPIF_STATE_SCHED_THREADED and that might stop the kthread early before
the flag is unset.
Use the NAPI_* variant of the NAPI state bits in test_bit instead.
Tested:
./tools/testing/selftests/net/nl_netdev.py
TAP version 13
1..7
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
ok 5 nl_netdev.dev_set_threaded
ok 6 nl_netdev.napi_set_threaded
ok 7 nl_netdev.nsim_rxq_reset_down
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
./tools/testing/selftests/drivers/net/napi_threaded.py
TAP version 13
1..2
ok 1 napi_threaded.change_num_queues
ok 2 napi_threaded.enable_dev_threaded_disable_napi_threaded
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Fixes: 689883de94 ("net: stop napi kthreads when THREADED napi is disabled")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250910203716.1016546-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update the DMA Rust maintainers entry in the following two aspects:
(1) Change Abdiel's entry to 'Reviewer'.
(2) Take patches through the driver-core tree.
Abdiel won't do any more maintainer work on the DMA (or scatterlist)
infrastructure, but he'd like to be kept in the loop, hence change is
entry to 'R:'.
Analogous to [1], the DMA (and scatterlist) helpers are closely coupled
with the core device infrastructure and the device lifecycle, hence take
patches through the driver-core tree by default.
Cc: Abdiel Janulgue <abdiel.janulgue@gmail.com>
Link: https://lore.kernel.org/r/20250725202840.2251768-1-ojeda@kernel.org [1]
Acked-by: Abdiel Janulgue <abdiel.janulgue@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Currently, xattr name prefixes are forcibly placed into the packed
inode if the fragments feature is enabled, and users have no option
to put them in plain form directly on disk.
This is inflexible. First, as mentioned above, users should be able
to store unwrapped long xattr name prefixes unconditionally
(COMPAT_PLAIN_XATTR_PFX). Second, since we now have the new metabox
inode to store metadata, it should be used when available instead
of the packed inode.
Fixes: 414091322c ("erofs: implement metadata compression")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
A a potential race condition reported by one of my customers that leads to
a NULL pointer dereference, where the call to efi.get_variable should be
guarded with efi_rt_services_supported() to ensure that function exists.
Fixes: 4fe2385134 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib")
Signed-off-by: Shenghao Ding <shenghao-ding@ti.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN, netfilter and wireless.
We have an IPv6 routing regression with the relevant fix still a WiP.
This includes a last-minute revert to avoid more problems.
Current release - new code bugs:
- wifi: nl80211: completely disable per-link stats for now
Previous releases - regressions:
- dev_ioctl: take ops lock in hwtstamp lower paths
- netfilter:
- fix spurious set lookup failures
- fix lockdep splat due to missing annotation
- genetlink: fix genl_bind() invoking bind() after -EPERM
- phy: transfer phy_config_inband() locking responsibility to phylink
- can: xilinx_can: fix use-after-free of transmitted SKB
- hsr: fix lock warnings
- eth:
- igb: fix NULL pointer dereference in ethtool loopback test
- i40e: fix Jumbo Frame support after iPXE boot
- macsec: sync features on RTM_NEWLINK
Previous releases - always broken:
- tunnels: reset the GSO metadata before reusing the skb
- mptcp: make sync_socket_options propagate SOCK_KEEPOPEN
- can: j1939: implement NETDEV_UNREGISTER notification hanidler
- wifi: ath12k: fix WMI TLV header misalignment"
* tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
hsr: hold rcu and dev lock for hsr_get_port_ndev
hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr
hsr: use rtnl lock when iterating over ports
wifi: nl80211: completely disable per-link stats for now
net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups
net: ethtool: fix wrong type used in struct kernel_ethtool_ts_info
MAINTAINERS: add Phil as netfilter reviewer
netfilter: nf_tables: restart set lookup on base_seq change
netfilter: nf_tables: make nft_set_do_lookup available unconditionally
netfilter: nf_tables: place base_seq in struct net
netfilter: nft_set_rbtree: continue traversal if element is inactive
netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
can: rcar_can: rcar_can_resume(): fix s2ram with PSCI
can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
can: j1939: implement NETDEV_UNREGISTER notification handler
selftests: can: enable CONFIG_CAN_VCAN as a module
...
Pull s390 fixes from Alexander Gordeev:
- ptep_modify_prot_start() may be called in a loop, which might lead to
the preempt_count overflow due to the unnecessary preemption
disabling. Do not disable preemption to prevent the overflow
- Events of type PERF_TYPE_HARDWARE are not tested for sampling and
return -EOPNOTSUPP eventually.
Instead, deny all sampling events by CPUMF counter facility and
return -ENOENT to allow other PMUs to be tried
- The PAI PMU driver returns -EINVAL if an event out of its range. That
aborts a search for an alternative PMU driver.
Instead, return -ENOENT to allow other PMUs to be tried
* tag 's390-6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_cf: Deny all sampling events by counter PMU
s390/pai: Deny all events not handled by this PMU
s390/mm: Prevent possible preempt_count overflow
Pull power management fixes from Rafael Wysocki:
"These fix a nasty hibernation regression introduced during the 6.16
cycle, an issue related to energy model management occurring on Intel
hybrid systems where some CPUs are offline to start with, and two
regressions in the amd-pstate driver:
- Restore a pm_restrict_gfp_mask() call in hibernation_snapshot()
that was removed incorrectly during the 6.16 development cycle
(Rafael Wysocki)
- Introduce a function for registering a perf domain without
triggering a system-wide CPU capacity update and make the
intel_pstate driver use it to avoid reocurring unsuccessful
attempts to update capacities of all CPUs in the system (Rafael
Wysocki)
- Fix setting of CPPC.min_perf in the active mode with performance
governor in the amd-pstate driver to restore its expected behavior
changed recently (Gautham Shenoy)
- Avoid mistakenly setting EPP to 0 in the amd-pstate driver after
system resume as a result of recent code changes (Mario
Limonciello)"
* tag 'pm-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: Restrict GFP mask in hibernation_snapshot()
PM: EM: Add function for registering a PD without capacity update
cpufreq/amd-pstate: Fix a regression leading to EPP 0 after resume
cpufreq/amd-pstate: Fix setting of CPPC.min_perf in active mode for performance governor
Pull btrfs fixes from David Sterba:
- fix delayed inode tracking in xarray, eviction can race with
insertion and leave behind a disconnected inode
- on systems with large page (64K) and small block size (4K) fix
compression read that can return partially filled folio
- slightly relax compression option format for backward compatibility,
allow to specify level for LZO although there's only one
- fix simple quota accounting of compressed extents
- validate minimum device size in 'device add'
- update maintainers' entry
* tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't allow adding block device of less than 1 MB
MAINTAINERS: update btrfs entry
btrfs: fix subvolume deletion lockup caused by inodes xarray race
btrfs: fix corruption reading compressed range when block size is smaller than page size
btrfs: accept and ignore compression level for lzo
btrfs: fix squota compressed stats leak
Pull bpf fixes from Alexei Starovoitov:
"A number of fixes accumulated due to summer vacations
- Fix out-of-bounds dynptr write in bpf_crypto_crypt() kfunc which
was misidentified as a security issue (Daniel Borkmann)
- Update the list of BPF selftests maintainers (Eduard Zingerman)
- Fix selftests warnings with icecc compiler (Ilya Leoshkevich)
- Disable XDP/cpumap direct return optimization (Jesper Dangaard
Brouer)
- Fix unexpected get_helper_proto() result in unusual configuration
BPF_SYSCALL=y and BPF_EVENTS=n (Jiri Olsa)
- Allow fallback to interpreter when JIT support is limited (KaFai
Wan)
- Fix rqspinlock and choose trylock fallback for NMI waiters. Pick
the simplest fix. More involved fix is targeted bpf-next (Kumar
Kartikeya Dwivedi)
- Fix cleanup when tcp_bpf_send_verdict() fails to allocate
psock->cork (Kuniyuki Iwashima)
- Disallow bpf_timer in PREEMPT_RT for now. Proper solution is being
discussed for bpf-next. (Leon Hwang)
- Fix XSK cq descriptor production (Maciej Fijalkowski)
- Tell memcg to use allow_spinning=false path in bpf_timer_init() to
avoid lockup in cgroup_file_notify() (Peilin Ye)
- Fix bpf_strnstr() to handle suffix match cases (Rong Tao)"
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Skip timer cases when bpf_timer is not supported
bpf: Reject bpf_timer for PREEMPT_RT
tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork.
bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
bpf: Allow fall back to interpreter for programs with stack size <= 512
rqspinlock: Choose trylock fallback for NMI waiters
xsk: Fix immature cq descriptor production
bpf: Update the list of BPF selftests maintainers
selftests/bpf: Add tests for bpf_strnstr
selftests/bpf: Fix "expression result unused" warnings with icecc
bpf: Fix bpf_strnstr() to handle suffix match cases better
selftests/bpf: Extend crypto_sanity selftest with invalid dst buffer
bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt
bpf: Check the helper function is valid in get_helper_proto
bpf, cpumap: Disable page_pool direct xdp_return need larger scope
Merge a hibernation regression fix and an fix related to energy model
management for 6.17-rc6
* pm-sleep:
PM: hibernate: Restrict GFP mask in hibernation_snapshot()
* pm-em:
PM: EM: Add function for registering a PD without capacity update
Johannes Berg says:
====================
Some more fixes:
- iwlwifi: fix 130/1030 devices
- ath12k: fix alignment, power save
- virt_wifi: fix crash
- cfg80211: disable per-link stats due
to buffer size issues
* tag 'wireless-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: nl80211: completely disable per-link stats for now
wifi: virt_wifi: Fix page fault on connect
wifi: cfg80211: Fix "no buffer space available" error in nl80211_get_station() for MLO
wifi: iwlwifi: fix 130/1030 configs
wifi: ath12k: fix WMI TLV header misalignment
wifi: ath12k: Fix missing station power save configuration
====================
Link: https://patch.msgid.link/20250911100345.20025-3-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The recent changes to genpd makes a genpd OF provider that is powered-on at
initialization to stay powered-on, until the ->sync_state() callback is
invoked for it.
This may not happen at all, if we wait for a consumer device to be probed,
leading to wasting energy. There are ways to enforce the ->sync_state()
callback to be invoked, through sysfs or via the probe-defer-timeout, but
none of them in its current form are a good fit for rmobile-sysc PM
domains.
Let's therefore opt-out from this behaviour of genpd for now, by using the
GENPD_FLAG_NO_STAY_ON.
Link: https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: 0e789b491b ("pmdomain: core: Leave powered-on genpds on until sync_state")
Fixes: 13a4b7fb62 ("pmdomain: core: Leave powered-on genpds on until late_initcall_sync")
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The recent changes to genpd makes a genpd OF provider that is powered-on at
initialization to stay powered-on, until the ->sync_state() callback is
invoked for it.
This may not happen at all, if we wait for a consumer device to be probed,
leading to wasting energy. There are ways to enforce the ->sync_state()
callback to be invoked, through sysfs or via the probe-defer-timeout, but
none of them in its current form are a good fit for rcar-gen4-sysc PM
domains.
Let's therefore opt-out from this behaviour of genpd for now, by using the
GENPD_FLAG_NO_STAY_ON.
Link: https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: 0e789b491b ("pmdomain: core: Leave powered-on genpds on until sync_state")
Fixes: 13a4b7fb62 ("pmdomain: core: Leave powered-on genpds on until late_initcall_sync")
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The recent changes to genpd makes a genpd OF provider that is powered-on at
initialization to stay powered-on, until the ->sync_state() callback is
invoked for it.
This may not happen at all, if we wait for a consumer device to be probed,
leading to wasting energy. There are ways to enforce the ->sync_state()
callback to be invoked, through sysfs or via the probe-defer-timeout, but
none of them in its current form are a good fit for rcar-sysc PM domains.
Let's therefore opt-out from this behaviour of genpd for now, by using the
GENPD_FLAG_NO_STAY_ON.
Link: https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: 0e789b491b ("pmdomain: core: Leave powered-on genpds on until sync_state")
Fixes: 13a4b7fb62 ("pmdomain: core: Leave powered-on genpds on until late_initcall_sync")
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The deferred regulator retrieval for Rockchip PM domains are causing some
weird dependencies. More precisely, if the power-domain is powered-on from
the HW perspective, its corresponding regulator must not be powered-off via
regulator_init_complete(), which is a late_initcall_sync.
Even on platforms that don't have the domain-supply regulator specified for
the power-domain provider, may suffer from these problems.
More precisely, things just happen to work before, because
genpd_power_off_unused() (also a late_initcall_sync) managed to power-off
the PM domain before regulator_init_complete() powered-off the regulator.
Ideally this fragile dependency must be fixed properly for the Rockchip PM
domains, but until then, let's fallback to the previous behaviour by using
the GENPD_FLAG_NO_STAY_ON flag.
Link: https://lore.kernel.org/all/20250902-rk3576-lockup-regression-v1-1-c4a0c9daeb00@collabora.com/
Reported-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Fixes: 0e789b491b ("pmdomain: core: Leave powered-on genpds on until sync_state")
Fixes: 13a4b7fb62 ("pmdomain: core: Leave powered-on genpds on until late_initcall_sync")
Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Recent changes to genpd prevents those PM domains being powered-on during
initialization from being powered-off during the boot sequence. Based upon
whether CONFIG_PM_CONFIG_PM_GENERIC_DOMAINS_OF is set of not, genpd relies
on the sync_state mechanism or the genpd_power_off_unused() (which is a
late_initcall_sync), to understand when it's okay to allow these PM domains
to be powered-off.
This new behaviour in genpd has lead to problems on different platforms.
Let's therefore restore the behavior of genpd_power_off_unused().
Moreover, let's introduce GENPD_FLAG_NO_STAY_ON, to allow genpd OF
providers to opt-out from the new behaviour.
Link: https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/all/20250902-rk3576-lockup-regression-v1-1-c4a0c9daeb00@collabora.com/
Reported-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Fixes: 0e789b491b ("pmdomain: core: Leave powered-on genpds on until sync_state")
Fixes: 13a4b7fb62 ("pmdomain: core: Leave powered-on genpds on until late_initcall_sync")
Tested-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
hsr_port_get_hsr() iterates over ports using hsr_for_each_port(),
but many of its callers do not hold the required RCU lock.
Switch to hsr_for_each_port_rtnl(), since most callers already hold
the rtnl lock. After review, all callers are covered by either the rtnl
lock or the RCU lock, except hsr_dev_xmit(). Fix this by adding an
RCU read lock there.
Fixes: c5a7591172 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250905091533.377443-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
hsr_for_each_port is called in many places without holding the RCU read
lock, this may trigger warnings on debug kernels. Most of the callers
are actually hold rtnl lock. So add a new helper hsr_for_each_port_rtnl
to allow callers in suitable contexts to iterate ports safely without
explicit RCU locking.
This patch only fixed the callers that is hold rtnl lock. Other caller
issues will be fixed in later patches.
Fixes: c5a7591172 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250905091533.377443-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
After commit 8cc71fc3b8 ("wifi: cfg80211: Fix "no buffer
space available" error in nl80211_get_station() for MLO"),
the per-link data is only included in station dumps, where
the size limit is somewhat less of an issue. However, it's
still an issue, depending on how many links a station has
and how much per-link data there is. Thus, for now, disable
per-link statistics entirely.
A complete fix will need to take this into account, make it
opt-in by userspace, and change the dump format to be able
to split a single station's data across multiple netlink
dump messages, which all together is too much development
for a fix.
Fixes: 82d7f841d9 ("wifi: cfg80211: extend to embed link level statistics in NL message")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull misc fixes from Andrew Morton:
"20 hotfixes. 15 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 14 of these
fixes are for MM.
This includes
- kexec fixes from Breno for a recently introduced
use-uninitialized bug
- DAMON fixes from Quanmin Yan to avoid div-by-zero crashes
which can occur if the operator uses poorly-chosen insmod
parameters
and misc singleton fixes"
* tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add tree entry to numa memblocks and emulation block
mm/damon/sysfs: fix use-after-free in state_show()
proc: fix type confusion in pde_set_flags()
compiler-clang.h: define __SANITIZE_*__ macros only when undefined
mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()
ocfs2: fix recursive semaphore deadlock in fiemap call
mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
mm/mremap: fix regression in vrm->new_addr check
percpu: fix race on alloc failed warning limit
mm/memory-failure: fix redundant updates for already poisoned pages
s390: kexec: initialize kexec_buf struct
riscv: kexec: initialize kexec_buf struct
arm64: kexec: initialize kexec_buf struct in load_other_segments()
mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
mm/damon/core: set quota->charged_from to jiffies at first charge window
mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
init/main.c: fix boot time tracing crash
mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range()
mm/khugepaged: fix the address passed to notifier on testing young
Pull vmescape mitigation fixes from Dave Hansen:
"Mitigate vmscape issue with indirect branch predictor flushes.
vmscape is a vulnerability that essentially takes Spectre-v2 and
attacks host userspace from a guest. It particularly affects
hypervisors like QEMU.
Even if a hypervisor may not have any sensitive data like disk
encryption keys, guest-userspace may be able to attack the
guest-kernel using the hypervisor as a confused deputy.
There are many ways to mitigate vmscape using the existing Spectre-v2
defenses like IBRS variants or the IBPB flushes. This series focuses
solely on IBPB because it works universally across vendors and all
vulnerable processors. Further work doing vendor and model-specific
optimizations can build on top of this if needed / wanted.
Do the normal issue mitigation dance:
- Add the CPU bug boilerplate
- Add a list of vulnerable CPUs
- Use IBPB to flush the branch predictors after running guests"
* tag 'vmscape-for-linus-20250904' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vmscape: Add old Intel CPUs to affected list
x86/vmscape: Warn when STIBP is disabled with SMT
x86/bugs: Move cpu_bugs_smt_update() down
x86/vmscape: Enable the mitigation
x86/vmscape: Add conditional IBPB mitigation
x86/vmscape: Enumerate VMSCAPE bug
Documentation/hw-vuln: Add VMSCAPE documentation
The TMU has two temperature measurement sites located on the chip. The
probe 0 is located inside of the ANAMIX, while the probe 1 is located near
the ARM core. This has been confirmed by checking with HW design team and
checking RTL code.
So correct the {cpu,soc}-thermal sensor index.
Fixes: 30cdd62dce ("arm64: dts: imx8mp: Add thermal zones support")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Florian Westpha says:
====================
netfilter pull request nf-25-09-10
First patch adds a lockdep annotation for a false-positive splat.
Last patch adds formal reviewer tag for Phil Sutter to MAINTAINERS.
Rest of the patches resolve spurious false negative results during set
lookups while another CPU is processing a transaction.
This has been broken at least since v4.18 when an unconditional
synchronize_rcu call was removed from the commit phase of nf_tables.
Quoting from Stefan Hanreichs original report:
It seems like we've found an issue with atomicity when reloading
nftables rulesets. Sometimes there is a small window where rules
containing sets do not seem to apply to incoming traffic, due to the set
apparently being empty for a short amount of time when flushing / adding
elements.
Exanple ruleset:
table ip filter {
set match {
type ipv4_addr
flags interval
elements = { 0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
}
chain pre {
type filter hook prerouting priority filter; policy accept;
ip saddr @match accept
counter comment "must never match"
}
}
Reproducer transaction:
while true:
nft -f -<<EOF
flush set ip filter match
create element ip filter match { \
0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
EOF
done
Then create traffic. to/from e.g. 192.168.2.1 to 192.168.3.10.
Once in a while the counter will increment even though the
'ip saddr @match' rule should have accepted the packet.
See individual patches for details.
Thanks to Stefan Hanreich for an initial description and reproducer for
this bug and to Pablo Neira Ayuso for reviewing earlier iterations of
the patchset.
* tag 'nf-25-09-10-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
MAINTAINERS: add Phil as netfilter reviewer
netfilter: nf_tables: restart set lookup on base_seq change
netfilter: nf_tables: make nft_set_do_lookup available unconditionally
netfilter: nf_tables: place base_seq in struct net
netfilter: nft_set_rbtree: continue traversal if element is inactive
netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
====================
Link: https://patch.msgid.link/20250910190308.13356-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2025-09-10
The 1st patch is by Alex Tran and fixes the Documentation of the
struct bcm_msg_head.
Davide Caratti's patch enabled the VCAN driver as a module for the
Linux self tests.
Tetsuo Handa contributes 3 patches that fix various problems in the
CAN j1939 protocol.
Anssi Hannula's patch fixes a potential use-after-free in the
xilinx_can driver.
Geert Uytterhoeven's patch fixes the rcan_can's suspend to RAM on
R-Car Gen3 using PSCI.
* tag 'linux-can-fixes-for-6.17-20250910' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: rcar_can: rcar_can_resume(): fix s2ram with PSCI
can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
can: j1939: implement NETDEV_UNREGISTER notification handler
selftests: can: enable CONFIG_CAN_VCAN as a module
docs: networking: can: change bcm_msg_head frames member to support flexible array
====================
Link: https://patch.msgid.link/20250910162907.948454-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-09-09 (igb, i40e)
For igb:
Tianyu Xu removes passing of, no longer needed, NAPI id to avoid NULL
pointer dereference on ethtool loopback testing.
Kohei Enju corrects reporting/testing of link state when interface is
down.
For i40e:
Michal Schmidt corrects value being passed to free_irq().
Jake sets hardware maximum frame size on probe to ensure
expected/consistent state.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: fix Jumbo Frame support after iPXE boot
i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error path
igb: fix link test skipping when interface is admin down
igb: Fix NULL pointer dereference in ethtool loopback test
====================
Link: https://patch.msgid.link/20250909203236.3603960-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 3f490a74a8a1 ("clocksource/drivers/vf-pit: Rename the VF PIT to NXP
PIT") renames the config VF_PIT_TIMER to NXP_PIT_TIMER, but it misses
adjusting a reference to VF_PIT_TIMER in arch/arm/mach-imx/Kconfig.
Adjust the config reference to the new name.
Fixes: 3f490a74a8a1 ("clocksource/drivers/vf-pit: Rename the VF PIT to NXP PIT")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
To ensure successful builds when CONFIG_IMX_SCMI_CPU_DRV is not enabled,
this patch adds static inline stub implementations for the following
functions:
- scmi_imx_cpu_start()
- scmi_imx_cpu_started()
- scmi_imx_cpu_reset_vector_set()
These stubs return -EOPNOTSUPP to indicate that the functionality is not
supported in the current configuration. This avoids potential build or
link errors in code that conditionally calls these functions based on
feature availability.
Fixes: 1055faa5d6 ("firmware: imx: Add i.MX95 SCMI CPU driver")
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
To ensure successful builds when CONFIG_IMX_SCMI_LMM_DRV is not enabled,
this patch adds static inline stub implementations for the following
functions:
- scmi_imx_lmm_operation()
- scmi_imx_lmm_info()
- scmi_imx_lmm_reset_vector_set()
These stubs return -EOPNOTSUPP to indicate that the functionality is not
supported in the current configuration. This avoids potential build or
link errors in code that conditionally calls these functions based on
feature availability.
Fixes: 7242bbf418 ("firmware: imx: Add i.MX95 SCMI LMM driver")
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
To ensure successful builds when CONFIG_IMX_SCMI_MISC_DRV is not enabled,
this patch adds static inline stub implementations for the following
functions:
- scmi_imx_misc_ctrl_get()
- scmi_imx_misc_ctrl_set()
These stubs return -EOPNOTSUPP to indicate that the functionality is not
supported in the current configuration. This avoids potential build or
link errors in code that conditionally calls these functions based on
feature availability.
This patch also drops the changes in commit 540c830212 ("firmware: imx:
remove duplicate scmi_imx_misc_ctrl_get()").
The original change aimed to simplify the handling of optional features by
removing conditional stubs. However, the use of conditional stubs is
necessary when CONFIG_IMX_SCMI_MISC_DRV is n, while consumer driver is
set to y.
This is not a matter of preserving legacy patterns, but rather to ensure
that there is no link error whether for module or built-in.
Fixes: 0b4f8a68b2 ("firmware: imx: Add i.MX95 MISC driver")
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to
wake the device if it is suspended. Such wake attempts must not happen
from PM callbacks while the device PM lock is held. Since phylink
{sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
Fixes: e0bffe3e68 ("net: asix: ax88772: migrate to phylink")
Reported-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://patch.msgid.link/20250908112619.2900723-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In C, enumerated types do not have a defined size, apart from being
compatible with one of the standard types. This allows an ABI /
compiler to choose the type of an enum depending on the values it
needs to store, and storing larger values in it can lead to undefined
behaviour.
The tx_type and rx_filters members of struct kernel_ethtool_ts_info
are defined as enumerated types, but are bit arrays, where each bit
is defined by the enumerated type. This means they typically store
values in excess of the maximum value of the enumerated type, in
fact (1 << max_value) and thus must not be declared using the
enumated type.
Fix both of these to use u32, as per the corresponding __u32 UAPI type.
Fixes: 2111375b85 ("net: Add struct kernel_ethtool_ts_info")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/E1uvMEK-00000003Amd-2pWR@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull NFS client fixes from Trond Myklebust:
"Stable patches:
- Revert "SUNRPC: Don't allow waiting for exiting tasks" as it is
breaking ltp tests
Bugfixes:
- Another set of fixes to the tracking of NFSv4 server capabilities
when crossing filesystem boundaries
- Localio fix to restore credentials and prevent triggering a
BUG_ON()
- Fix to prevent flapping of the localio on/off trigger
- Protections against 'eof page pollution' as demonstrated in
xfstests generic/363
- Series of patches to ensure correct ordering of O_DIRECT i/o and
truncate, fallocate and copy functions
- Fix a NULL pointer check in flexfiles reads that regresses 6.17
- Correct a typo that breaks flexfiles layout segment processing"
* tag 'nfs-for-6.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4/flexfiles: Fix layout merge mirror check.
SUNRPC: call xs_sock_process_cmsg for all cmsg
Revert "SUNRPC: Don't allow waiting for exiting tasks"
NFS: Fix the marking of the folio as up to date
NFS: nfs_invalidate_folio() must observe the offset and size arguments
NFSv4.2: Serialise O_DIRECT i/o and copy range
NFSv4.2: Serialise O_DIRECT i/o and clone range
NFSv4.2: Serialise O_DIRECT i/o and fallocate()
NFS: Serialise O_DIRECT i/o and truncate()
NFSv4.2: Protect copy offload and clone against 'eof page pollution'
NFS: Protect against 'eof page pollution'
flexfiles/pNFS: fix NULL checks on result of ff_layout_choose_ds_for_read
nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
nfs/localio: restore creds before releasing pageio data
NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server
NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported
NFSv4: Clear the NFS_CAP_FS_LOCATIONS flag if it is not set
NFSv4: Don't clear capabilities that won't be reset
Leon Hwang says:
====================
bpf: Reject bpf_timer for PREEMPT_RT
While running './test_progs -t timer' to validate the test case from
"selftests/bpf: Introduce experimental bpf_in_interrupt()"[0] for
PREEMPT_RT, I encountered a kernel warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
To address this, reject bpf_timer usage in the verifier when
PREEMPT_RT is enabled, and skip the corresponding timer selftests.
Changes:
v2 -> v3:
* Drop skipping test case 'timer_interrupt'.
* Address comments from Alexei:
* Respin targeting bpf tree.
* Trim commit log.
v1 -> v2:
* Skip test case 'timer_interrupt'.
Links:
[0] https://lore.kernel.org/bpf/20250903140438.59517-1-leon.hwang@linux.dev/
====================
Link: https://patch.msgid.link/20250910125740.52172-1-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When enable CONFIG_PREEMPT_RT, the kernel will warn when run timer
selftests by './test_progs -t timer':
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
In order to avoid such warning, reject bpf_timer in verifier when
PREEMPT_RT is enabled.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250910125740.52172-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There is a place where generic code in messenger.c is reading and
another place where it is writing to con->v1 union member without
checking that the union member is active (i.e. msgr1 is in use).
On 64-bit systems, con->v1.auth_retry overlaps with con->v2.out_iter,
so such a read is almost guaranteed to return a bogus value instead of
0 when msgr2 is in use. This ends up being fairly benign because the
side effect is just the invalidation of the authorizer and successive
fetching of new tickets.
con->v1.connect_seq overlaps with con->v2.conn_bufs and the fact that
it's being written to can cause more serious consequences, but luckily
it's not something that happens often.
Cc: stable@vger.kernel.org
Fixes: cd1a677cad ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Commit 3bbf3565f4 ("svm: Do not intercept CR8 when enable AVIC")
inhibited pre-VMRUN sync of TPR from LAPIC into VMCB::V_TPR in
sync_lapic_to_cr8() when AVIC is active.
AVIC does automatically sync between these two fields, however it does
so only on explicit guest writes to one of these fields, not on a bare
VMRUN.
This meant that when AVIC is enabled host changes to TPR in the LAPIC
state might not get automatically copied into the V_TPR field of VMCB.
This is especially true when it is the userspace setting LAPIC state via
KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
VMCB.
Practice shows that it is the V_TPR that is actually used by the AVIC to
decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
so any leftover value in V_TPR will cause serious interrupt delivery issues
in the guest when AVIC is enabled.
Fix this issue by doing pre-VMRUN TPR sync from LAPIC into VMCB::V_TPR
even when AVIC is enabled.
Fixes: 3bbf3565f4 ("svm: Do not intercept CR8 when enable AVIC")
Cc: stable@vger.kernel.org
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/c231be64280b1461e854e1ce3595d70cde3a2e9d.1756139678.git.maciej.szmigiero@oracle.com
[sean: tag for stable@]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Pull tracing fixes from Steven Rostedt:
- Remove redundant __GFP_NOWARN flag is kmalloc
As now __GFP_NOWARN is part of __GFP_NOWAIT, it can be removed from
kmalloc as it is redundant.
- Use copy_from_user_nofault() instead of _inatomic() for trace markers
The trace_marker files are written to to allow user space to quickly
write into the tracing ring buffer.
Back in 2016, the get_user_pages_fast() and the kmap() logic was
replaced by a __copy_from_user_inatomic(), but didn't properly
disable page faults around it.
Since the time this was added, copy_from_user_nofault() was added
which does the required page fault disabling for us.
- Fix the assembly markup in the ftrace direct sample code
The ftrace direct sample code (which is also used for selftests), had
the size directive between the "leave" and the "ret" instead of after
the ret. This caused objtool to think the code was unreachable.
- Only call unregister_pm_notifier() on outer most fgraph registration
There was an error path in register_ftrace_graph() that did not call
unregister_pm_notifier() on error, so it was added in the error path.
The problem with that fix, is that register_pm_notifier() is only
called by the initial user of fgraph. If that succeeds, but another
fgraph registration were to fail, then unregister_pm_notifier() would
be called incorrectly.
- Fix a crash in osnoise when zero size cpumask is passed in
If a zero size CPU mask is passed in, the kmalloc() would return
ZERO_SIZE_PTR which is not checked, and the code would continue
thinking it had real memory and crash. If zero is passed in as the
size of the write, simply return 0.
- Fix possible warning in trace_pid_write()
If while processing a series of numbers passed to the "set_event_pid"
file, and one of the updates fails to allocate (triggered by a fault
injection), it can cause a warning to trigger. Check the return value
of the call to trace_pid_list_set() and break out early with an error
code if it fails.
* tag 'trace-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Silence warning when chunk allocation fails in trace_pid_write
tracing/osnoise: Fix null-ptr-deref in bitmap_parselist()
trace/fgraph: Fix error handling
ftrace/samples: Fix function size computation
tracing: Fix tracing_marker may trigger page fault during preempt_disable
trace: Remove redundant __GFP_NOWARN
Commit 12ffc3b151 ("PM: Restrict swap use to later in the suspend
sequence") incorrectly removed a pm_restrict_gfp_mask() call from
hibernation_snapshot(), so memory allocations involving swap are not
prevented from being carried out in this code path any more which may
lead to serious breakage.
The symptoms of such breakage have become visible after adding a
shrink_shmem_memory() call to hibernation_snapshot() in commit
2640e81947 ("PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()")
which caused this problem to be much more likely to manifest itself.
However, since commit 2640e81947 was initially present in the DRM
tree that did not include commit 12ffc3b151, the symptoms of this
issue were not visible until merge commit 260f6f4fda ("Merge tag
'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel")
that exposed it through an entirely reasonable merge conflict
resolution.
Fixes: 12ffc3b151 ("PM: Restrict swap use to later in the suspend sequence")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220555
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Phil has contributed to netfilter with features, fixes and patch reviews
for a long time. Make this more formal and add Reviewer tag.
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
The hash, hash_fast, rhash and bitwise sets may indicate no result even
though a matching element exists during a short time window while other
cpu is finalizing the transaction.
This happens when the hash lookup/bitwise lookup function has picked up
the old genbit, right before it was toggled by nf_tables_commit(), but
then the same cpu managed to unlink the matching old element from the
hash table:
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
A) observes old genbit
increments base_seq
I) increments the genbit
II) removes old element from the set
B) finds matching element
C) returns no match: found
element is not valid in old
generation
Next lookup observes new genbit and
finds matching e2.
Consider a packet matching element e1, e2.
cpu0 processes following transaction:
1. remove e1
2. adds e2, which has same key as e1.
P matches both e1 and e2. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 observed the old genbit. e2 will not be considered once it is found.
The element e1 is not found anymore if cpu0 managed to unlink it from the
hlist before cpu1 found it during list traversal.
The situation only occurs for a brief time period, lookups happening
after I) observe new genbit and return e2.
This problem exists in all set types except nft_set_pipapo, so fix it once
in nft_lookup rather than each set ops individually.
Sample the base sequence counter, which gets incremented right before the
genbit is changed.
Then, if no match is found, retry the lookup if the base sequence was
altered in between.
If the base sequence hasn't changed:
- No update took place: no-match result is expected.
This is the common case. or:
- nf_tables_commit() hasn't progressed to genbit update yet.
Old elements were still visible and nomatch result is expected, or:
- nf_tables_commit updated the genbit:
We picked up the new base_seq, so the lookup function also picked
up the new genbit, no-match result is expected.
If the old genbit was observed, then nft_lookup also picked up the old
base_seq: nft_lookup_should_retry() returns true and relookup is performed
in the new generation.
This problem was added when the unconditional synchronize_rcu() call
that followed the current/next generation bit toggle was removed.
Thanks to Pablo Neira Ayuso for reviewing an earlier version of this
patchset, for suggesting re-use of existing base_seq and placement of
the restart loop in nft_set_do_lookup().
Fixes: 0cbc06b3fa ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
Signed-off-by: Florian Westphal <fw@strlen.de>
This function was added for retpoline mitigation and is replaced by a
static inline helper if mitigations are not enabled.
Enable this helper function unconditionally so next patch can add a lookup
restart mechanism to fix possible false negatives while transactions are
in progress.
Adding lookup restarts in nft_lookup_eval doesn't work as nft_objref would
then need the same copypaste loop.
This patch is separate to ease review of the actual bug fix.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
This will soon be read from packet path around same time as the gencursor.
Both gencursor and base_seq get incremented almost at the same time, so
it makes sense to place them in the same structure.
This doesn't increase struct net size on 64bit due to padding.
Signed-off-by: Florian Westphal <fw@strlen.de>
When the rbtree lookup function finds a match in the rbtree, it sets the
range start interval to a potentially inactive element.
Then, after tree lookup, if the matching element is inactive, it returns
NULL and suppresses a matching result.
This is wrong and leads to false negative matches when a transaction has
already entered the commit phase.
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
I) increments the genbit
A) observes new genbit
B) finds matching range
C) returns no match: found
range invalid in new generation
II) removes old elements from the tree
C New nft_lookup happening now
will find matching element,
because it is no longer
obscured by old, inactive one.
Consider a packet matching range r1-r2:
cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating
the range has been removed. It does NOT test for further matches.
The situation persists for all lookups until after cpu0 hits II) after
which r1-r3 range start node is tested for the first time.
Move the "interval start is valid" check ahead so that tree traversal
continues if the starting interval is not valid in this generation.
Thanks to Stefan Hanreich for providing an initial reproducer for this
bug.
Reported-by: Stefan Hanreich <s.hanreich@proxmox.com>
Fixes: c1eda3c639 ("netfilter: nft_rbtree: ignore inactive matching element with no descendants")
Signed-off-by: Florian Westphal <fw@strlen.de>
The pipapo set type is special in that it has two copies of its
datastructure: one live copy containing only valid elements and one
on-demand clone used during transaction where adds/deletes happen.
This clone is not visible to the datapath.
This is unlike all other set types in nftables, those all link new
elements into their live hlist/tree.
For those sets, the lookup functions must skip the new elements while the
transaction is ongoing to ensure consistency.
As the clone is shallow, removal does have an effect on the packet path:
once the transaction enters the commit phase the 'gencursor' bit that
determines which elements are active and which elements should be ignored
(because they are no longer valid) is flipped.
This causes the datapath lookup to ignore these elements if they are found
during lookup.
This opens up a small race window where pipapo has an inconsistent view of
the dataset from when the transaction-cpu flipped the genbit until the
transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers:
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
I) increments the genbit
A) observes new genbit
removes elements from the clone so
they won't be found anymore
B) lookup in datastructure
can't see new elements yet,
but old elements are ignored
-> Only matches elements that
were not changed in the
transaction
II) calls nft_pipapo_commit(), clone
and live pointers are swapped.
C New nft_lookup happening now
will find matching elements.
Consider a packet matching range r1-r2:
cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating
the range has been removed.
At the same time, r1-r3 is not visible yet, because it can only be found
in the clone.
The situation persists for all lookups until after cpu0 hits II).
The fix is easy: Don't check the genbit from pipapo lookup functions.
This is possible because unlike the other set types, the new elements are
not reachable from the live copy of the dataset.
The clone/live pointer swap is enough to avoid matching on old elements
while at the same time all new elements are exposed in one go.
After this change, step B above returns a match in r1-r2.
This is fine: r1-r2 only becomes truly invalid the moment they get freed.
This happens after a synchronize_rcu() call and rcu read lock is held
via netfilter hook traversal (nf_hook_slow()).
Cc: Stefano Brivio <sbrivio@redhat.com>
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Running new 'set_flush_add_atomic_bitmap' test case for nftables.git
with CONFIG_PROVE_RCU_LIST=y yields:
net/netfilter/nft_set_bitmap.c:231 RCU-list traversed in non-reader section!!
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by nft/4008:
#0: ffff888147f79cd8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_valid_genid+0x2f/0xd0
lockdep_rcu_suspicious+0x116/0x160
nft_bitmap_walk+0x22d/0x240
nf_tables_delsetelem+0x1010/0x1a00
..
This is a false positive, the list cannot be altered while the
transaction mutex is held, so pass the relevant argument to the iterator.
Fixes tag intentionally wrong; no point in picking this up if earlier
false-positive-fixups were not applied.
Fixes: 28b7a6b84c ("netfilter: nf_tables: avoid false-positive lockdep splats in set walker")
Signed-off-by: Florian Westphal <fw@strlen.de>
When dual-divider clock support was introduced, the P divider offset was
left out of the .recalc_rate readback function. This causes the clock
rate to become bogus or even zero (possibly due to the P divider being
1, leading to a divide-by-zero).
Fix this by incorporating the P divider offset into the calculation.
Fixes: 45717804b7 ("clk: sunxi-ng: mp: introduce dual-divider clock")
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://patch.msgid.link/20250830170901.1996227-4-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
can_put_echo_skb() takes ownership of the SKB and it may be freed
during or after the call.
However, xilinx_can xcan_write_frame() keeps using SKB after the call.
Fix that by only calling can_put_echo_skb() after the code is done
touching the SKB.
The tx_lock is held for the entire xcan_write_frame() execution and
also on the can_get_echo_skb() side so the order of operations does not
matter.
An earlier fix commit 3d3c817c3a ("can: xilinx_can: Fix usage of skb
memory") did not move the can_put_echo_skb() call far enough.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Fixes: 1598efe57b ("can: xilinx_can: refactor code in preparation for CAN FD support")
Link: https://patch.msgid.link/20250822095002.168389-1-anssi.hannula@bitwise.fi
[mkl: add "commit" in front of sha1 in patch description]
[mkl: fix indention]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Commit 25fe97cb76 ("can: j1939: move j1939_priv_put() into sk_destruct
callback") expects that a call to j1939_priv_put() can be unconditionally
delayed until j1939_sk_sock_destruct() is called. But a refcount leak will
happen when j1939_sk_bind() is called again after j1939_local_ecu_get()
from previous j1939_sk_bind() call returned an error. We need to call
j1939_priv_put() before j1939_sk_bind() returns an error.
Fixes: 25fe97cb76 ("can: j1939: move j1939_priv_put() into sk_destruct callback")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/4f49a1bc-a528-42ad-86c0-187268ab6535@I-love.SAKURA.ne.jp
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
syzbot is reporting
unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
problem, for j1939 protocol did not have NETDEV_UNREGISTER notification
handler for undoing changes made by j1939_sk_bind().
Commit 25fe97cb76 ("can: j1939: move j1939_priv_put() into sk_destruct
callback") expects that a call to j1939_priv_put() can be unconditionally
delayed until j1939_sk_sock_destruct() is called. But we need to call
j1939_priv_put() against an extra ref held by j1939_sk_bind() call
(as a part of undoing changes made by j1939_sk_bind()) as soon as
NETDEV_UNREGISTER notification fires (i.e. before j1939_sk_sock_destruct()
is called via j1939_sk_release()). Otherwise, the extra ref on "struct
j1939_priv" held by j1939_sk_bind() call prevents "struct net_device" from
dropping the usage count to 1; making it impossible for
unregister_netdevice() to continue.
Reported-by: syzbot <syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Tested-by: syzbot <syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com>
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Fixes: 25fe97cb76 ("can: j1939: move j1939_priv_put() into sk_destruct callback")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/ac9db9a4-6c30-416e-8b94-96e6559d55b2@I-love.SAKURA.ne.jp
[mkl: remove space in front of label]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
syzbot reported the splat below. [0]
The repro does the following:
1. Load a sk_msg prog that calls bpf_msg_cork_bytes(msg, cork_bytes)
2. Attach the prog to a SOCKMAP
3. Add a socket to the SOCKMAP
4. Activate fault injection
5. Send data less than cork_bytes
At 5., the data is carried over to the next sendmsg() as it is
smaller than the cork_bytes specified by bpf_msg_cork_bytes().
Then, tcp_bpf_send_verdict() tries to allocate psock->cork to hold
the data, but this fails silently due to fault injection + __GFP_NOWARN.
If the allocation fails, we need to revert the sk->sk_forward_alloc
change done by sk_msg_alloc().
Let's call sk_msg_free() when tcp_bpf_send_verdict fails to allocate
psock->cork.
The "*copied" also needs to be updated such that a proper error can
be returned to the caller, sendmsg. It fails to allocate psock->cork.
Nothing has been corked so far, so this patch simply sets "*copied"
to 0.
[0]:
WARNING: net/ipv4/af_inet.c:156 at inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156, CPU#1: syz-executor/5983
Modules linked in:
CPU: 1 UID: 0 PID: 5983 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156
Code: 0f 0b 90 e9 62 fe ff ff e8 7a db b5 f7 90 0f 0b 90 e9 95 fe ff ff e8 6c db b5 f7 90 0f 0b 90 e9 bb fe ff ff e8 5e db b5 f7 90 <0f> 0b 90 e9 e1 fe ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 9f fc
RSP: 0018:ffffc90000a08b48 EFLAGS: 00010246
RAX: ffffffff8a09d0b2 RBX: dffffc0000000000 RCX: ffff888024a23c80
RDX: 0000000000000100 RSI: 0000000000000fff RDI: 0000000000000000
RBP: 0000000000000fff R08: ffff88807e07c627 R09: 1ffff1100fc0f8c4
R10: dffffc0000000000 R11: ffffed100fc0f8c5 R12: ffff88807e07c380
R13: dffffc0000000000 R14: ffff88807e07c60c R15: 1ffff1100fc0f872
FS: 00005555604c4500(0000) GS:ffff888125af1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555604df5c8 CR3: 0000000032b06000 CR4: 00000000003526f0
Call Trace:
<IRQ>
__sk_destruct+0x86/0x660 net/core/sock.c:2339
rcu_do_batch kernel/rcu/tree.c:2605 [inline]
rcu_core+0xca8/0x1770 kernel/rcu/tree.c:2861
handle_softirqs+0x286/0x870 kernel/softirq.c:579
__do_softirq kernel/softirq.c:613 [inline]
invoke_softirq kernel/softirq.c:453 [inline]
__irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1052
</IRQ>
Fixes: 4f738adba3 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Reported-by: syzbot+4cabd1d2fa917a456db8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c0b6b5.050a0220.3c6139.0013.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250909232623.4151337-1-kuniyu@google.com
The i2c controller binding does not permit permit the node name to
contain "gpio", resulting in two warnings:
i2c-gpio-0 (i2c-gpio): $nodename:0: 'i2c-gpio-0' does not match '^i2c(@.+|-[a-z0-9]+)?$'
i2c-gpio-0 (i2c-gpio): Unevaluated properties are not allowed ('#address-cells', '#size-cells', 'adc@54' were unexpected)
Drop it to satisfy dtbs_check.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20250909-frown-wrinkle-f16df243a970@spud
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
When copying FDs, the copy size should not include the control
message header (cmsghdr). Fix it.
Fixes: 5cde6096a4 ("um: generalize os_rcv_fd")
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When register_virtio_device() fails in virtio_uml_probe(),
the code sets vu_dev->registered = 1 even though
the device was not successfully registered.
This can lead to use-after-free or other issues.
Fixes: 04e5b1fb01 ("um: virtio: Remove device on disconnect")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On one of my machines UML failed to start after enabling
SELinux.
UML failed to start because SELinux's execmod rule denies
executable pages on a modified file mapping.
Historically UML marks it's stack rwx.
AFAICT, these days this is no longer needed, so let's remove
PROT_EXEC.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The intel_pstate driver manages CPU capacity changes itself and it does
not need an update of the capacity of all CPUs in the system to be
carried out after registering a PD.
Moreover, in some configurations (for instance, an SMT-capable
hybrid x86 system booted with nosmt in the kernel command line) the
em_check_capacity_update() call at the end of em_dev_register_perf_domain()
always fails and reschedules itself to run once again in 1 s, so
effectively it runs in vain every 1 s forever.
To address this, introduce a new variant of em_dev_register_perf_domain(),
called em_dev_register_pd_no_update(), that does not invoke
em_check_capacity_update(), and make intel_pstate use it instead of the
original.
Fixes: 7b010f9b90 ("cpufreq: intel_pstate: EAS support for hybrid platforms")
Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function vgic_flush_lr_state() is calling _raw_spin_unlock()
instead of the proper raw_spin_unlock().
_raw_spin_unlock() is an internal low-level API and should not
be used directly; using raw_spin_unlock() ensures proper locking
semantics in the vgic code.
Fixes: 8fa3adb8c6 ("KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlock")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Message-ID: <20250908180413.3655546-1-alok.a.tiwari@oracle.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
In the non-NV case, read permission is always granted when mapping
stage-2, so checking for it doesn't bring much. On the other hand,
shadow stage-2 for NV guests could potentially have non-readable
mappings when we align the permissions with those that L1 set for L2, we
shouldn't be checking for read faults in this case either.
So just remove this check.
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Link: https://lore.kernel.org/r/20250908064806.4093081-1-r09922117@csie.ntu.edu.tw
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The __vcpu_assign_sys_reg() helper expects the register ID as the second
argument and the value to be assigned as the third. However, the
existing code was passing these parameters in the incorrect order.
Fix the function call to properly read the live value of VBAR_EL1 from
the guest and update the vCPU value immediately before pending the
exception. This ensures the vCPU's value is the same as the guest's and
that the exception will be handled at the correct address upon resuming
the guest.
Fixes: 798eb59787 ("KVM: arm64: Sync protected guest VBAR_EL1 on injecting an undef exception")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250908163557.2419780-1-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The code for invalidating VNCR entries in both kvm_invalidate_vncr_ipa()
and invalidate_vncr_va() incorrectly uses a bitwise AND with `(size - 1)`
instead of `~(size - 1)` to align the start address. This results
in masking the address bits instead of aligning them down to the start
of the block.
This bug may cause stale VNCR TLB entries to remain valid even after a
TLBI or MMU notifier, leading to incorrect memory translation and
unexpected guest behavior.
Credit to Team 0xB6 in bob14: DongHa Lee, Gyujeong Jin, Daehyeon Ko,
Geonha Lee, Hyungyu Oh, and Jaewon Yang.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Dongha Lee <p@sswd.pw>
Link: https://lore.kernel.org/r/20250906040724.72960-1-p@sswd.pw
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Now that releases of LPIs have been unnested from the ap_list_lock there
are no xarray writers that exist in a context where IRQs are already
disabled. As such we can relax the locking to the non-IRQ disabling
spinlock to guard the LPI xarray.
Note that there are still readers of the LPI xarray where IRQs are
disabled however the readers rely on RCU protection instead of the lock.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250905100531.282980-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
syzkaller has caught us red-handed once more, this time nesting regular
spinlocks behind raw spinlocks:
=============================
[ BUG: Invalid wait context ]
6.16.0-rc3-syzkaller-g7b8346bd9fce #0 Not tainted
-----------------------------
syz.0.29/3743 is trying to lock:
a3ff80008e2e9e18 (&xa->xa_lock#20){....}-{3:3}, at: vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137
other info that might help us debug this:
context-{5:5}
3 locks held by syz.0.29/3743:
#0: a3ff80008e2e90a8 (&kvm->slots_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x50/0x624 arch/arm64/kvm/vgic/vgic-init.c:499
#1: a3ff80008e2e9fa0 (&kvm->arch.config_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x5c/0x624 arch/arm64/kvm/vgic/vgic-init.c:500
#2: 58f0000021be1428 (&vgic_cpu->ap_list_lock){....}-{2:2}, at: vgic_flush_pending_lpis+0x3c/0x31c arch/arm64/kvm/vgic/vgic.c:150
stack backtrace:
CPU: 0 UID: 0 PID: 3743 Comm: syz.0.29 Not tainted 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 PREEMPT
Hardware name: linux,dummy-virt (DT)
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
__dump_stack+0x30/0x40 lib/dump_stack.c:94
dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
dump_stack+0x1c/0x28 lib/dump_stack.c:129
print_lock_invalid_wait_context kernel/locking/lockdep.c:4833 [inline]
check_wait_context kernel/locking/lockdep.c:4905 [inline]
__lock_acquire+0x978/0x299c kernel/locking/lockdep.c:5190
lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5871
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162
vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137
vgic_flush_pending_lpis+0x24c/0x31c arch/arm64/kvm/vgic/vgic.c:158
__kvm_vgic_vcpu_destroy+0x44/0x500 arch/arm64/kvm/vgic/vgic-init.c:455
kvm_vgic_destroy+0x100/0x624 arch/arm64/kvm/vgic/vgic-init.c:505
kvm_arch_destroy_vm+0x80/0x138 arch/arm64/kvm/arm.c:244
kvm_destroy_vm virt/kvm/kvm_main.c:1308 [inline]
kvm_put_kvm+0x800/0xff8 virt/kvm/kvm_main.c:1344
kvm_vm_release+0x58/0x78 virt/kvm/kvm_main.c:1367
__fput+0x4ac/0x980 fs/file_table.c:465
____fput+0x20/0x58 fs/file_table.c:493
task_work_run+0x1bc/0x254 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
do_notify_resume+0x1b4/0x270 arch/arm64/kernel/entry-common.c:151
exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
el0_svc+0xb4/0x160 arch/arm64/kernel/entry-common.c:768
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
This is of course no good, but is at odds with how LPI refcounts are
managed. Solve the locking mess by deferring the release of unreferenced
LPIs after the ap_list_lock is released. Mark these to-be-released LPIs
specially to avoid racing with vgic_put_irq() and causing a double-free.
Since references can only be taken on LPIs with a nonzero refcount,
extending the lifetime of freed LPIs is still safe.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reported-by: syzbot+cef594105ac7e60c6d93@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/kvmarm/68acd0d9.a00a0220.33401d.048b.GAE@google.com/
Link: https://lore.kernel.org/r/20250905100531.282980-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Spin off the release implementation from vgic_put_irq() to prepare for a
more involved fix for lock ordering such that it may be unnested from
raw spinlocks. This has the minor functional change of doing call_rcu()
behind the xa_lock although it shouldn't be consequential.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250905100531.282980-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Prior to commit 75a5fbaf66 ("KVM: arm64: Compute MDCR_EL2 at
vcpu_load()"), host MDCR_EL2 was saved correctly:
kvm_arch_vcpu_load()
kvm_vcpu_load_debug() /* Doesn't touch hardware MDCR_EL2. */
kvm_vcpu_load_vhe()
__activate_traps_common()
/* Saves host MDCR_EL2. */
*host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2)
/* Writes VCPU MDCR_EL2. */
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2)
The MDCR_EL2 value saved previously was restored in
kvm_arch_vcpu_put() -> kvm_vcpu_put_vhe().
After the aforementioned commit, host MDCR_EL2 is never saved:
kvm_arch_vcpu_load()
kvm_vcpu_load_debug() /* Writes VCPU MDCR_EL2 */
kvm_vcpu_load_vhe()
__activate_traps_common()
/* Saves **VCPU** MDCR_EL2. */
*host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2)
/* Writes VCPU MDCR_EL2 a second time. */
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2)
kvm_arch_vcpu_put() -> kvm_vcpu_put_vhe() then restores the VCPU MDCR_EL2
value. Also VCPU's MDCR_EL2 value gets written to hardware twice now.
Fix this by saving the host MDCR_EL2 in kvm_arch_vcpu_load() before it gets
overwritten by the VCPU's MDCR_EL2 value, and restore it on VCPU put.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250902130833.338216-3-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
According to the pseudocode for StatisticalProfilingEnabled() from Arm
DDI0487L.b, PMSCR_EL1 controls profiling at EL1 and EL0:
- PMSCR_EL1.E1SPE controls profiling at EL1.
- PMSCR_EL1.E0SPE controls profiling at EL0 if HCR_EL2.TGE=0.
These two fields reset to UNKNOWN values.
When KVM runs in VHE mode and profiling is enabled in the host, before
entering a guest, KVM does not touch any of the SPE registers, leaving the
buffer enabled, and it clears HCR_EL2.TGE. As a result, depending on the
reset value for the E1SPE and E0SPE fields, KVM might unintentionally
profile a guest. Make the behaviour consistent and predictable by clearing
PMSCR_EL1 when KVM initialises the host debug configuration.
Note that this is not a problem for nVHE, because KVM clears
PMSCR_EL1.{E1SPE,E0SPE} before entering the guest.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20250902130833.338216-2-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
kvm_vncr_tlb_lookup() is supposed to return true when the cached VNCR
TLB entry is valid for the current context. For non-Global entries, that
means the entry’s ASID must match the current ASID.
The current code returns true when the ASIDs do *not* match, which
inverts the logic. This is a potential vulnerability:
- Valid entries are ignored and we fall back to kvm_translate_vncr(),
hurting performance.
- Mismatched entries are treated as permission faults (-EPERM) instead
of triggering a fresh translation.
- This can also cause stale translations to be (wrongly) considered
valid across address spaces.
Flip the predicate so non-Global entries only hit when ASIDs match.
Special credit to Team 0xB6 for reporting: DongHa Lee, Gyujeong Jin,
Daehyeon Ko, Geonha Lee, Hyungyu Oh, and Jaewon Yang.
Signed-off-by: Geonha Lee <w1nsom3gna@korea.ac.kr>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250903150421.90752-1-w1nsom3gna@korea.ac.kr
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Currently, nl80211_get_station() allocates a fixed buffer size using
NLMSG_DEFAULT_SIZE. In multi-link scenarios - particularly when the
number of links exceeds two - this buffer size is often insufficient
to accommodate complete station statistics, resulting in "no buffer
space available" errors.
To address this, modify nl80211_get_station() to return only
accumulated station statistics and exclude per link stats.
Pass a new flag (link_stats) to nl80211_send_station() to control
the inclusion of per link statistics. This allows retaining
detailed output with per link data in dump commands, while
excluding it from other commands where it is not needed.
This change modifies the handling of per link stats introduced in
commit 82d7f841d9 ("wifi: cfg80211: extend to embed link level
statistics in NL message") to enable them only for
nl80211_dump_station().
Apply the same fix to cfg80211_del_sta_sinfo() by skipping per link
stats to avoid buffer issues. cfg80211_new_sta() doesn't include
stats and is therefore not impacted.
Fixes: 82d7f841d9 ("wifi: cfg80211: extend to embed link level statistics in NL message")
Signed-off-by: Nithyanantham Paramasivam <nithyanantham.paramasivam@oss.qualcomm.com>
Link: https://patch.msgid.link/20250905124800.1448493-1-nithyanantham.paramasivam@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Miri Korenblit says:
====================
iwlwifi fix
====================
Which is a fix for (old) 130/1030 devices to work again.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jeff Johnson says:
==================
ath.git update for v6.17-rc6
==================
There's a firmware API alignment fix, and a fix for powersave,
both for ath12k.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Previous checks incorrectly tested the DMA addresses (dma_handle) for
NULL. Since dma_alloc_coherent() returns the CPU (virtual) address, the
NULL check should be performed on the *_base_addr pointer to correctly
detect allocation failures.
Update the checks to validate sqe_base_addr and cqe_base_addr instead of
sqe_dma_addr and cqe_dma_addr.
Fixes: 4682abfae2 ("scsi: ufs: core: mcq: Allocate memory for MCQ mode")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Matthieu Baerts says:
====================
mptcp: misc fixes for v6.17-rc6
Here are various unrelated fixes:
- Patch 1: Fix a wrong attribute type in the MPTCP Netlink specs. A fix
for v6.7.
- Patch 2: Avoid mentioning a deprecated MPTCP sysctl knob in the doc. A
fix for v6.15.
- Patch 3: Handle new warnings from ShellCheck v0.11.0. This prevents
some warnings reported by some CIs. If it is not a good material for
'net', please drop.
====================
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-0-5f2168a66079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Users reported a scenario where MPTCP connections that were configured
with SO_KEEPALIVE prior to connect would fail to enable their keepalives
if MTPCP fell back to TCP mode.
After investigating, this affects keepalives for any connection where
sync_socket_options is called on a socket that is in the closed or
listening state. Joins are handled properly. For connects,
sync_socket_options is called when the socket is still in the closed
state. The tcp_set_keepalive() function does not act on sockets that
are closed or listening, hence keepalive is not immediately enabled.
Since the SO_KEEPOPEN flag is absent, it is not enabled later in the
connect sequence via tcp_finish_connect. Setting the keepalive via
sockopt after connect does work, but would not address any subsequently
created flows.
Fortunately, the fix here is straight-forward: set SOCK_KEEPOPEN on the
subflow when calling sync_socket_options.
The fix was valdidated both by using tcpdump to observe keepalive
packets not being sent before the fix, and being sent after the fix. It
was also possible to observe via ss that the keepalive timer was not
enabled on these sockets before the fix, but was enabled afterwards.
Fixes: 1b3e7ede13 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Cc: stable@vger.kernel.org
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/aL8dYfPZrwedCIh9@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Syzkaller managed to lock the lower device via ETHTOOL_SFEATURES:
netdev_lock include/linux/netdevice.h:2761 [inline]
netdev_lock_ops include/net/netdev_lock.h:42 [inline]
netdev_sync_lower_features net/core/dev.c:10649 [inline]
__netdev_update_features+0xcb1/0x1be0 net/core/dev.c:10819
netdev_update_features+0x6d/0xe0 net/core/dev.c:10876
macsec_notify+0x2f5/0x660 drivers/net/macsec.c:4533
notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
call_netdevice_notifiers net/core/dev.c:2281 [inline]
netdev_features_change+0x85/0xc0 net/core/dev.c:1570
__dev_ethtool net/ethtool/ioctl.c:3469 [inline]
dev_ethtool+0x1536/0x19b0 net/ethtool/ioctl.c:3502
dev_ioctl+0x392/0x1150 net/core/dev_ioctl.c:759
It happens because lower features are out of sync with the upper:
__dev_ethtool (real_dev)
netdev_lock_ops(real_dev)
ETHTOOL_SFEATURES
__netdev_features_change
netdev_sync_upper_features
disable LRO on the lower
if (old_features != dev->features)
netdev_features_change
fires NETDEV_FEAT_CHANGE
macsec_notify
NETDEV_FEAT_CHANGE
netdev_update_features (for each macsec dev)
netdev_sync_lower_features
if (upper_features != lower_features)
netdev_lock_ops(lower) # lower == real_dev
stuck
...
netdev_unlock_ops(real_dev)
Per commit af5f54b0ef ("net: Lock lower level devices when updating
features"), we elide the lock/unlock when the upper and lower features
are synced. Makes sure the lower (real_dev) has proper features after
the macsec link has been created. This makes sure we never hit the
situation where we need to sync upper flags to the lower.
Reported-by: syzbot+7e0f89fb6cae5d002de0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7e0f89fb6cae5d002de0
Fixes: 7e4d784f58 ("net: hold netdev instance lock during rtnetlink operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250908173614.3358264-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ndo hwtstamp callbacks are expected to run under the per-device ops
lock. Make the lower get/set paths consistent with the rest of ndo
invocations.
Kernel log:
WARNING: CPU: 13 PID: 51364 at ./include/net/netdev_lock.h:70 __netdev_update_features+0x4bd/0xe60
...
RIP: 0010:__netdev_update_features+0x4bd/0xe60
...
Call Trace:
<TASK>
netdev_update_features+0x1f/0x60
mlx5_hwtstamp_set+0x181/0x290 [mlx5_core]
mlx5e_hwtstamp_set+0x19/0x30 [mlx5_core]
dev_set_hwtstamp_phylib+0x9f/0x220
dev_set_hwtstamp_phylib+0x9f/0x220
dev_set_hwtstamp+0x13d/0x240
dev_ioctl+0x12f/0x4b0
sock_ioctl+0x171/0x370
__x64_sys_ioctl+0x3f7/0x900
? __sys_setsockopt+0x69/0xb0
do_syscall_64+0x6f/0x2e0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
...
</TASK>
....
---[ end trace 0000000000000000 ]---
Note that the mlx5_hwtstamp_set and mlx5e_hwtstamp_set functions shown
in the trace come from an in progress patch converting the legacy ioctl
to ndo_hwtstamp_get/set and are not present in mainline.
Fixes: ffb7ed19ac ("net: hold netdev instance lock during ioctl operations")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20250907080821.2353388-1-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename of open files in SMB2+ has been broken for a very long time,
resulting in data loss as the CIFS client would fail the rename(2)
call with -ENOENT and then removing the target file.
Fix this by implementing ->rename_pending_delete() for SMB2+, which
will rename busy files to random filenames (e.g. silly rename) during
unlink(2) or rename(2), and then marking them to delete-on-close.
Besides, introduce a FIND_WR_NO_PENDING_DELETE flag to prevent open(2)
from reusing open handles that had been marked as delete pending.
Handle it in cifs_get_readable_path() as well.
Reported-by: Jean-Baptiste Denis <jbdenis@pasteur.fr>
Closes: https://marc.info/?i=16aeb380-30d4-4551-9134-4e7d1dc833c0@pasteur.fr
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: Frank Sorenson <sorenson@redhat.com>
Cc: Olga Kornievskaia <okorniev@redhat.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Scott Mayhew <smayhew@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
The blamed commit changed the conditions which phylib uses to stop
and start the state machine in the suspend and resume paths, and
while improving it, has caused two issues.
The original code used this test:
phydev->attached_dev && phydev->adjust_link
and if true, the paths would handle the PHY state machine. This test
evaluates true for normal drivers that are using phylib directly
while the PHY is attached to the network device, but false in all
other cases, which include the following cases:
- when the PHY has never been attached to a network device.
- when the PHY has been detached from a network device (as phy_detach()
sets phydev->attached_dev to NULL, phy_disconnect() calls
phy_detach() and additionally sets phydev->adjust_link NULL.)
- when phylink is using the driver (as phydev->adjust_link is NULL.)
Only the third case was incorrect, and the blamed commit attempted to
fix this by changing this test to (simplified for brevity, see
phy_uses_state_machine()):
phydev->phy_link_change == phy_link_change ?
phydev->attached_dev && phydev->adjust_link : true
However, this also incorrectly evaluates true in the first two cases.
Fix the first case by ensuring that phy_uses_state_machine() returns
false when phydev->phy_link_change is NULL.
Fix the second case by ensuring that phydev->phy_link_change is set to
NULL when phy_detach() is called.
Reported-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
Fixes: fc75ea20ff ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/E1uvMEz-00000003Aoe-3qWe@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The encryption layer can't handle the padding iovs, so flatten the
compound request into a single buffer with required padding to prevent
the server from dropping the connection when finding unaligned
compound requests.
Fixes: bc925c1216 ("smb: client: improve compound padding in encryption")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:
...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up (5) double-acquiring the same
[10.011575] kick_pool rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf849e) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.
Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/
v0 approach s/bpf_map_kmalloc_node/bpf_mem_alloc/
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/
Fixes: b00628b1c7 ("bpf: Introduce bpf timers.")
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/20250909095222.2121438-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
OpenWRT users reported regression on ARMv6 devices after updating to latest
HEAD, where tcpdump filter:
tcpdump "not ether host 3c37121a2b3c and not ether host 184ecbca2a3a \
and not ether host 14130b4d3f47 and not ether host f0f61cf440b7 \
and not ether host a84b4dedf471 and not ether host d022be17e1d7 \
and not ether host 5c497967208b and not ether host 706655784d5b"
fails with warning: "Kernel filter failed: No error information"
when using config:
# CONFIG_BPF_JIT_ALWAYS_ON is not set
CONFIG_BPF_JIT_DEFAULT_ON=y
The issue arises because commits:
1. "bpf: Fix array bounds error with may_goto" changed default runtime to
__bpf_prog_ret0_warn when jit_requested = 1
2. "bpf: Avoid __bpf_prog_ret0_warn when jit fails" returns error when
jit_requested = 1 but jit fails
This change restores interpreter fallback capability for BPF programs with
stack size <= 512 bytes when jit fails.
Reported-by: Felix Fietkau <nbd@nbd.name>
Closes: https://lore.kernel.org/bpf/2e267b4b-0540-45d8-9310-e127bf95fc63@nbd.name/
Fixes: 6ebc5030e0 ("bpf: Fix array bounds error with may_goto")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250909144614.2991253-1-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, out of all 3 types of waiters in the rqspinlock slow path
(i.e., pending bit waiter, wait queue head waiter, and wait queue
non-head waiter), only the pending bit waiter and wait queue head
waiters apply deadlock checks and a timeout on their waiting loop. The
assumption here was that the wait queue head's forward progress would be
sufficient to identify cases where the lock owner or pending bit waiter
is stuck, and non-head waiters relying on the head waiter would prove to
be sufficient for their own forward progress.
However, the head waiter itself can be preempted by a non-head waiter
for the same lock (AA) or a different lock (ABBA) in a manner that
impedes its forward progress. In such a case, non-head waiters not
performing deadlock and timeout checks becomes insufficient, and the
system can enter a state of lockup.
This is typically not a concern with non-NMI lock acquisitions, as lock
holders which in run in different contexts (IRQ, non-IRQ) use "irqsave"
variants of the lock APIs, which naturally excludes such lock holders
from preempting one another on the same CPU.
It might seem likely that a similar case may occur for rqspinlock when
programs are attached to contention tracepoints (begin, end), however,
these tracepoints either precede the enqueue into the wait queue, or
succeed it, therefore cannot be used to preempt a head waiter's waiting
loop.
We must still be careful against nested kprobe and fentry programs that
may attach to the middle of the head's waiting loop to stall forward
progress and invoke another rqspinlock acquisition that proceeds as a
non-head waiter. To this end, drop CC_FLAGS_FTRACE from the rqspinlock.o
object file.
For now, this issue is resolved by falling back to a repeated trylock on
the lock word from NMI context, while performing the deadlock checks to
break out early in case forward progress is impossible, and use the
timeout as a final fallback.
A more involved fix to terminate the queue when such a condition occurs
will be made as a follow up. A selftest to stress this aspect of nested
NMI/non-NMI locking attempts will be added in a subsequent patch to the
bpf-next tree when this fix lands and trees are synchronized.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes: 164c246571 ("rqspinlock: Protect waiters in queue from stalls")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250909184959.3509085-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Eryk reported an issue that I have put under Closes: tag, related to
umem addrs being prematurely produced onto pool's completion queue.
Let us make the skb's destructor responsible for producing all addrs
that given skb used.
Commit from fixes tag introduced the buggy behavior, it was not broken
from day 1, but rather when xsk multi-buffer got introduced.
In order to mitigate performance impact as much as possible, mimic the
linear and frag parts within skb by storing the first address from XSK
descriptor at sk_buff::destructor_arg. For fragments, store them at ::cb
via list. The nodes that will go onto list will be allocated via
kmem_cache. xsk_destruct_skb() will consume address stored at
::destructor_arg and optionally go through list from ::cb, if count of
descriptors associated with this particular skb is bigger than 1.
Previous approach where whole array for storing UMEM addresses from XSK
descriptors was pre-allocated during first fragment processing yielded
too big performance regression for 64b traffic. In current approach
impact is much reduced on my tests and for jumbo frames I observed
traffic being slower by at most 9%.
Magnus suggested to have this way of processing special cased for
XDP_SHARED_UMEM, so we would identify this during bind and set different
hooks for 'backpressure mechanism' on CQ and for skb destructor, but
given that results looked promising on my side I decided to have a
single data path for XSK generic Tx. I suppose other auxiliary stuff
would have to land as well in order to make it work.
Fixes: b7f72a30e9 ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
Reported-by: Eryk Kubanski <e.kubanski@partner.samsung.com>
Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20250904194907.2342177-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Rong Tao says:
====================
Fix bpf_strnstr() wrong 'len' parameter, bpf_strnstr("open", "open", 4)
should return 0 instead of -ENOENT. And fix a more general case when s2
is a suffix of the first len characters of s1.
====================
Link: https://patch.msgid.link/tencent_E72A37AF03A3B18853066E421B5969976208@qq.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ilya Leoshkevich says:
====================
selftests/bpf: Fix "expression result unused" warnings with icecc
v3: https://lore.kernel.org/bpf/20250827194929.416969-1-iii@linux.ibm.com/
v3 -> v4: Go back to the original solution (Yonghong, Alexei).
v2: https://lore.kernel.org/bpf/20250827130519.411700-1-iii@linux.ibm.com/
v2 -> v3: Do not touch libbpf, explain how having two function
declarations works (Andrii).
Fix bpf-gcc build (CI).
v1: https://lore.kernel.org/bpf/20250508113804.304665-1-iii@linux.ibm.com/
v1 -> v2: Annotate bpf_obj_new_impl() with __must_check (Alexei).
Add an explanation about icecc.
I took another look at the "expression result unused" warnings I've
been seeing, and it turned out that the root cause was the icecc
compiler wrapper and what I consider a clang bug. Back then I've
reported that the problem was reproducible with plain clang, but now
I see that it was clearly a mixup, sorry about that.
The solution is to add a few awkward (void) casts. I've added a
detailed explanation of why they are helpful to the commit message.
====================
Link: https://patch.msgid.link/20250829030017.102615-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
icecc is a compiler wrapper that distributes compile jobs over a build
farm [1]. It works by sending toolchain binaries and preprocessed
source code to remote machines.
Unfortunately using it with BPF selftests causes build failures due to
a clang bug [2]. The problem is that clang suppresses the
-Wunused-value warning if the unused expression comes from a macro
expansion. Since icecc compiles preprocessed source code, this
information is not available. This leads to -Wunused-value false
positives.
obj_new_no_struct() and obj_new_acq() use the bpf_obj_new() macro and
discard the result. arena_spin_lock_slowpath() uses two macros that
produce values and ignores the results. Add (void) casts to explicitly
indicate that this is intentional and suppress the warning.
An alternative solution is to change the macros to not produce values.
This would work today for the arena_spin_lock_slowpath() issue, but in
the future there may appear users who need them. Another potential
solution is to replace these macros with functions. Unfortunately this
would not work, because these macros work with unknown types and
control flow.
[1] https://github.com/icecc/icecream
[2] https://github.com/llvm/llvm-project/issues/142614
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250829030017.102615-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_strnstr() should not treat the ending '\0' of s2 as a matching character
if the parameter 'len' equal to s2 string length, for example:
1. bpf_strnstr("openat", "open", 4) = -ENOENT
2. bpf_strnstr("openat", "open", 5) = 0
This patch makes (1) return 0, fix just the `len == strlen(s2)` case.
And fix a more general case when s2 is a suffix of the first len
characters of s1.
Fixes: e91370550f ("bpf: Add kfuncs for read-only string operations")
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_17DC57B9D16BC443837021BEACE84B7C1507@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Small cleanup and test extension to probe the bpf_crypto_{encrypt,decrypt}()
kfunc when a bad dst buffer is passed in to assert that an error is returned.
Also, encrypt_sanity() and skb_crypto_setup() were explicit to set the global
status variable to zero before any test, so do the same for decrypt_sanity().
Do not explicitly zero the on-stack err before bpf_crypto_ctx_create() given
the kfunc is expected to do it internally for the success case.
Before kernel fix:
# ./vmtest.sh -- ./test_progs -t crypto
[...]
[ 1.531200] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.533388] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#87/1 crypto_basic/crypto_release:OK
#87/2 crypto_basic/crypto_acquire:OK
#87 crypto_basic:OK
test_crypto_sanity:PASS:skel open 0 nsec
test_crypto_sanity:PASS:ip netns add crypto_sanity_ns 0 nsec
test_crypto_sanity:PASS:ip -net crypto_sanity_ns -6 addr add face::1/128 dev lo nodad 0 nsec
test_crypto_sanity:PASS:ip -net crypto_sanity_ns link set dev lo up 0 nsec
test_crypto_sanity:PASS:open_netns 0 nsec
test_crypto_sanity:PASS:AF_ALG init fail 0 nsec
test_crypto_sanity:PASS:if_nametoindex lo 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup fd 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup retval 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup status 0 nsec
test_crypto_sanity:PASS:create qdisc hook 0 nsec
test_crypto_sanity:PASS:make_sockaddr 0 nsec
test_crypto_sanity:PASS:attach encrypt filter 0 nsec
test_crypto_sanity:PASS:encrypt socket 0 nsec
test_crypto_sanity:PASS:encrypt send 0 nsec
test_crypto_sanity:FAIL:encrypt status unexpected error: -5 (errno 95)
#88 crypto_sanity:FAIL
Summary: 1/2 PASSED, 0 SKIPPED, 1 FAILED
After kernel fix:
# ./vmtest.sh -- ./test_progs -t crypto
[...]
[ 1.540963] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.542404] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#87/1 crypto_basic/crypto_release:OK
#87/2 crypto_basic/crypto_acquire:OK
#87 crypto_basic:OK
#88 crypto_sanity:OK
Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav reported that in bpf_crypto_crypt() the destination dynptr's
size is not validated to be at least as large as the source dynptr's
size before calling into the crypto backend with 'len = src_len'. This
can result in an OOB write when the destination is smaller than the
source.
Concretely, in mentioned function, psrc and pdst are both linear
buffers fetched from each dynptr:
psrc = __bpf_dynptr_data(src, src_len);
[...]
pdst = __bpf_dynptr_data_rw(dst, dst_len);
[...]
err = decrypt ?
ctx->type->decrypt(ctx->tfm, psrc, pdst, src_len, piv) :
ctx->type->encrypt(ctx->tfm, psrc, pdst, src_len, piv);
The crypto backend expects pdst to be large enough with a src_len length
that can be written. Add an additional src_len > dst_len check and bail
out if it's the case. Note that these kfuncs are accessible under root
privileges only.
Fixes: 3e1c6f3540 ("bpf: make common crypto API for TC/XDP programs")
Reported-by: Stanislav Fort <disclosure@aisle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There is no reason to require this to happen on first submitted IB only.
We need to wait for the queue to be idle, but it can be done at any
time (including when there are multiple video sessions active).
Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8908fdce06)
Cc: stable@vger.kernel.org
There can be multiple engine info packages in one IB and the first one
may be common engine, not decode/encode.
We need to parse the entire IB instead of stopping after finding first
engine info.
Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dc8f9f0f45)
Cc: stable@vger.kernel.org
Commit b61badd20b ("drm/amdgpu: fix usage slab after free")
reordered when amdgpu_fence_driver_sw_fini() was called after
that patch, amdgpu_fence_driver_sw_fini() effectively became
a no-op as the sched entities we never freed because the
ring pointers were already set to NULL. Remove the NULL
setting.
Reported-by: Lin.Cao <lincao12@amd.com>
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Fixes: b61badd20b ("drm/amdgpu: fix usage slab after free")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a525fa37aa)
Cc: stable@vger.kernel.org
Pull dma-mapping fix from Marek Szyprowski:
- one more fix for DMA API debugging infrastructure (Baochen Qiang)
* tag 'dma-mapping-6.17-2025-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-debug: don't enforce dma mapping check on noncoherent allocations
The i40e hardware has multiple hardware settings which define the Maximum
Frame Size (MFS) of the physical port. The firmware has an AdminQ command
(0x0603) to configure the MFS, but the i40e Linux driver never issues this
command.
In most cases this is no problem, as the NVM default value has the device
configured for its maximum value of 9728. Unfortunately, recent versions of
the iPXE intelxl driver now issue the 0x0603 Set Mac Config command,
modifying the MFS and reducing it from its default value of 9728.
This occurred as part of iPXE commit 6871a7de705b ("[intelxl] Use admin
queue to set port MAC address and maximum frame size"), a prerequisite
change for supporting the E800 series hardware in iPXE. Both the E700 and
E800 firmware support the AdminQ command, and the iPXE code shares much of
the logic between the two device drivers.
The ice E800 Linux driver already issues the 0x0603 Set Mac Config command
early during probe, and is thus unaffected by the iPXE change.
Since commit 3a2c6ced90 ("i40e: Add a check to see if MFS is set"), the
i40e driver does check the I40E_PRTGL_SAH register, but it only logs a
warning message if its value is below the 9728 default. This register also
only covers received packets and not transmitted packets. A warning can
inform system administrators, but does not correct the issue. No
interactions from userspace cause the driver to write to PRTGL_SAH or issue
the 0x0603 AdminQ command. Only a GLOBR reset will restore the value to its
default value. There is no obvious method to trigger a GLOBR reset from
user space.
To fix this, introduce the i40e_aq_set_mac_config() function, similar to
the one from the ice driver. Call this during early probe to ensure that
the device configuration matches driver expectation. Unlike E800, the E700
firmware also has a bit to control whether the MAC should append CRC data.
It is on by default, but setting a 0 to this bit would disable CRC. The
i40e implementation must set this bit to ensure CRC will be appended by the
MAC.
In addition to the AQ command, instead of just checking the I40E_PRTGL_SAH
register, update its value to the 9728 default and write it back. This
ensures that the hardware is in the expected state, regardless of whether
the iPXE (or any other early boot driver) has modified this state.
This is a better user experience, as we now fix the issues with larger MTU
instead of merely warning. It also aligns with the way the ice E800 series
driver works.
A final note: The Fixes tag provided here is not strictly accurate. The
issue occurs as a result of an external entity (the iPXE intelxl driver),
and this is not a regression specifically caused by the mentioned change.
However, I believe the original change to just warn about PRTGL_SAH being
too low was an insufficient fix.
Fixes: 3a2c6ced90 ("i40e: Add a check to see if MFS is set")
Link: 6871a7de70
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the xe pm_notifier evicts for suspend / hibernate, there might be
racing tasks trying to re-validate again. This can lead to suspend taking
excessive time or get stuck in a live-lock. This behaviour becomes
much worse with the fix that actually makes re-validation bring back
bos to VRAM rather than letting them remain in TT.
Prevent that by having exec and the rebind worker waiting for a completion
that is set to block by the pm_notifier before suspend and is signaled
by the pm_notifier after resume / wakeup.
It's probably still possible to craft malicious applications that block
suspending. More work is pending to fix that.
v3:
- Avoid wait_for_completion() in the kernel worker since it could
potentially cause work item flushes from freezable processes to
wait forever. Instead terminate the rebind workers if needed and
re-launch at resume. (Matt Auld)
v4:
- Fix some bad naming and leftover debug printouts.
- Fix kerneldoc.
- Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld).
- Rework the interface of xe_vm_rebind_resume_worker (Matt Auld).
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
Fixes: c6a4d46ec1 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel.com
(cherry picked from commit 599334572a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
VRAM+TT bos that are evicted from VRAM to TT may remain in
TT also after a revalidation following eviction or suspend.
This manifests itself as applications becoming sluggish
after buffer objects get evicted or after a resume from
suspend or hibernation.
If the bo supports placement in both VRAM and TT, and
we are on DGFX, mark the TT placement as fallback. This means
that it is tried only after VRAM + eviction.
This flaw has probably been present since the xe module was
upstreamed but use a Fixes: commit below where backporting is
likely to be simple. For earlier versions we need to open-
code the fallback algorithm in the driver.
v2:
- Remove check for dgfx. (Matthew Auld)
- Update the xe_dma_buf kunit test for the new strategy (CI)
- Allow dma-buf to pin in current placement (CI)
- Make xe_bo_validate() for pinned bos a NOP.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995
Fixes: a78a8da51b ("drm/ttm: replace busy placement with flags v6")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit cb3d7b3b46)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration
later than the first, the error path wants to free the IRQs requested
so far. However, it uses the wrong dev_id argument for free_irq(), so
it does not free the IRQs correctly and instead triggers the warning:
Trying to free already-free IRQ 173
WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0
Modules linked in: i40e(+) [...]
CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
Hardware name: [...]
RIP: 0010:__free_irq+0x192/0x2c0
[...]
Call Trace:
<TASK>
free_irq+0x32/0x70
i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e]
i40e_vsi_request_irq+0x79/0x80 [i40e]
i40e_vsi_open+0x21f/0x2f0 [i40e]
i40e_open+0x63/0x130 [i40e]
__dev_open+0xfc/0x210
__dev_change_flags+0x1fc/0x240
netif_change_flags+0x27/0x70
do_setlink.isra.0+0x341/0xc70
rtnl_newlink+0x468/0x860
rtnetlink_rcv_msg+0x375/0x450
netlink_rcv_skb+0x5c/0x110
netlink_unicast+0x288/0x3c0
netlink_sendmsg+0x20d/0x430
____sys_sendmsg+0x3a2/0x3d0
___sys_sendmsg+0x99/0xe0
__sys_sendmsg+0x8a/0xf0
do_syscall_64+0x82/0x2c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[...]
</TASK>
---[ end trace 0000000000000000 ]---
Use the same dev_id for free_irq() as for request_irq().
I tested this with inserting code to fail intentionally.
Fixes: 493fb30011 ("i40e: Move q_vectors from pointer to array to array of pointers")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The igb driver incorrectly skips the link test when the network
interface is admin down (if_running == false), causing the test to
always report PASS regardless of the actual physical link state.
This behavior is inconsistent with other drivers (e.g. i40e, ice, ixgbe,
etc.) which correctly test the physical link state regardless of admin
state.
Remove the if_running check to ensure link test always reflects the
physical link state.
Fixes: 8d420a1b3e ("igb: correct link test not being run when link is down")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The igb driver currently causes a NULL pointer dereference when executing
the ethtool loopback test. This occurs because there is no associated
q_vector for the test ring when it is set up, as interrupts are typically
not added to the test rings.
Since commit 5ef44b3cb4 removed the napi_id assignment in
__xdp_rxq_info_reg(), there is no longer a need to pass a napi_id to it.
Therefore, simply use 0 as the last parameter.
Fixes: 2c6196013f ("igb: Add AF_XDP zero-copy Rx support")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Tianyu Xu <tianyxu@cisco.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Current mem limit check leaks some GTT memory (reserved_for_pt
reserved_for_ras + adev->vram_pin_size) for small APUs.
Since carveout VRAM is tunable on APUs, there are three case
regarding the carveout VRAM size relative to GTT:
1. 0 < carveout < gtt
apu_prefer_gtt = true, is_app_apu = false
2. carveout > gtt / 2
apu_prefer_gtt = false, is_app_apu = false
3. 0 = carveout
apu_prefer_gtt = true, is_app_apu = true
It doesn't make sense to check below limitation in case 1
(default case, small carveout) because the values in the below
expression are mixed with carveout and gtt.
adev->kfd.vram_used[xcp_id] + vram_needed >
vram_size - reserved_for_pt - reserved_for_ras -
atomic64_read(&adev->vram_pin_size)
gtt: kfd.vram_used, vram_needed, vram_size
carveout: reserved_for_pt, reserved_for_ras, adev->vram_pin_size
In case 1, vram allocation will go to gtt domain, skip vram check
since ttm_mem_limit check already cover this allocation.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fa7c99f04f)
When creating p2p links, KFD needs to check XGMI link
with two conditions, hive_id and is_sharing_enabled,
but it is missing to check is_sharing_enabled, so add
it to fix the error.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 36cc7d1317)
When testing softirq based hrtimers on an ARM32 board, with high resolution
mode and NOHZ inactive, softirq based hrtimers fail to expire after being
moved away from an offline CPU:
CPU0 CPU1
hrtimer_start(..., HRTIMER_MODE_SOFT);
cpu_down(CPU1) ...
hrtimers_cpu_dying()
// Migrate timers to CPU0
smp_call_function_single(CPU0, returgger_next_event);
retrigger_next_event()
if (!highres && !nohz)
return;
As retrigger_next_event() is a NOOP when both high resolution timers and
NOHZ are inactive CPU0's hrtimer_cpu_base::softirq_expires_next is not
updated and the migrated softirq timers never expire unless there is a
softirq based hrtimer queued on CPU0 later.
Fix this by removing the hrtimer_hres_active() and tick_nohz_active() check
in retrigger_next_event(), which enforces a full update of the CPU base.
As this is not a fast path the extra cost does not matter.
[ tglx: Massaged change log ]
Fixes: 5c0930ccaa ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250805081025.54235-1-wangxiongfeng2@huawei.com
If a GSO skb is sent through a Geneve tunnel and if Geneve options are
added, the split GSO skb might not fit in the MTU anymore and an ICMP
frag needed packet can be generated. In such case the ICMP packet might
go through the segmentation logic (and dropped) later if it reaches a
path were the GSO status is checked and segmentation is required.
This is especially true when an OvS bridge is used with a Geneve tunnel
attached to it. The following set of actions could lead to the ICMP
packet being wrongfully segmented:
1. An skb is constructed by the TCP layer (e.g. gso_type SKB_GSO_TCPV4,
segs >= 2).
2. The skb hits the OvS bridge where Geneve options are added by an OvS
action before being sent through the tunnel.
3. When the skb is xmited in the tunnel, the split skb does not fit
anymore in the MTU and iptunnel_pmtud_build_icmp is called to
generate an ICMP fragmentation needed packet. This is done by reusing
the original (GSO!) skb. The GSO metadata is not cleared.
4. The ICMP packet being sent back hits the OvS bridge again and because
skb_is_gso returns true, it goes through queue_gso_packets...
5. ...where __skb_gso_segment is called. The skb is then dropped.
6. Note that in the above example on re-transmission the skb won't be a
GSO one as it would be segmented (len > MSS) and the ICMP packet
should go through.
Fix this by resetting the GSO information before reusing an skb in
iptunnel_pmtud_build_icmp and iptunnel_pmtud_build_icmpv6.
Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Reported-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://patch.msgid.link/20250904125351.159740-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The function move_dirty_folio_in_page_array() was created by commit
ce80b76dd3 ("ceph: introduce ceph_process_folio_batch() method") by
moving code from ceph_writepages_start() to this function.
This new function is supposed to return an error code which is checked
by the caller (now ceph_process_folio_batch()), and on error, the
caller invokes redirty_page_for_writepage() and then breaks from the
loop.
However, the refactoring commit has gone wrong, and it by accident, it
always returns 0 (= success) because it first NULLs the pointer and
then returns PTR_ERR(NULL) which is always 0. This means errors are
silently ignored, leaving NULL entries in the page array, which may
later crash the kernel.
The simple solution is to call PTR_ERR() before clearing the pointer.
Cc: stable@vger.kernel.org
Fixes: ce80b76dd3 ("ceph: introduce ceph_process_folio_batch() method")
Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The function ceph_process_folio_batch() sets folio_batch entries to
NULL, which is an illegal state. Before folio_batch_release() crashes
due to this API violation, the function ceph_shift_unused_folios_left()
is supposed to remove those NULLs from the array.
However, since commit ce80b76dd3 ("ceph: introduce
ceph_process_folio_batch() method"), this shifting doesn't happen
anymore because the "for" loop got moved to ceph_process_folio_batch(),
and now the `i` variable that remains in ceph_writepages_start()
doesn't get incremented anymore, making the shifting effectively
unreachable much of the time.
Later, commit 1551ec61dc ("ceph: introduce ceph_submit_write()
method") added more preconditions for doing the shift, replacing the
`i` check (with something that is still just as broken):
- if ceph_process_folio_batch() fails, shifting never happens
- if ceph_move_dirty_page_in_page_array() was never called (because
ceph_process_folio_batch() has returned early for some of various
reasons), shifting never happens
- if `processed_in_fbatch` is zero (because ceph_process_folio_batch()
has returned early for some of the reasons mentioned above or
because ceph_move_dirty_page_in_page_array() has failed), shifting
never happens
Since those two commits, any problem in ceph_process_folio_batch()
could crash the kernel, e.g. this way:
BUG: kernel NULL pointer dereference, address: 0000000000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
Workqueue: writeback wb_workfn (flush-ceph-1)
RIP: 0010:folios_put_refs+0x85/0x140
Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
ceph_writepages_start+0xeb9/0x1410
The crash can be reproduced easily by changing the
ceph_check_page_before_write() return value to `-E2BIG`.
(Interestingly, the crash happens only if `huge_zero_folio` has
already been allocated; without `huge_zero_folio`,
is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL
entries instead of dereferencing them. That makes reproducing the bug
somewhat unreliable. See
https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com
for a discussion of this detail.)
My suggestion is to move the ceph_shift_unused_folios_left() to right
after ceph_process_folio_batch() to ensure it always gets called to
fix up the illegal folio_batch state.
Cc: stable@vger.kernel.org
Fixes: ce80b76dd3 ("ceph: introduce ceph_process_folio_batch() method")
Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When the parent directory's i_rwsem is not locked, req->r_parent may become
stale due to concurrent operations (e.g. rename) between dentry lookup and
message creation. Validate that r_parent matches the encoded parent inode
and update to the correct inode if a mismatch is detected.
[ idryomov: folded a follow-up fix from Alex to drop extra reference
from ceph_get_reply_dir() in ceph_fill_trace():
ceph_get_reply_dir() may return a different, referenced inode when
r_parent is stale and the parent directory lock is not held.
ceph_fill_trace() used that inode but failed to drop the reference
when it differed from req->r_parent, leaking an inode reference.
Keep the directory inode in a local variable and iput() it at
function end if it does not match req->r_parent. ]
Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add validation to ensure the cached parent directory inode matches the
directory info in MDS replies. This prevents client-side race conditions
where concurrent operations (e.g. rename) cause r_parent to become stale
between request initiation and reply processing, which could lead to
applying state changes to incorrect directory inodes.
[ idryomov: folded a kerneldoc fixup and a follow-up fix from Alex to
move CEPH_CAP_PIN reference when r_parent is updated:
When the parent directory lock is not held, req->r_parent can become
stale and is updated to point to the correct inode. However, the
associated CEPH_CAP_PIN reference was not being adjusted. The
CEPH_CAP_PIN is a reference on an inode that is tracked for
accounting purposes. Moving this pin is important to keep the
accounting balanced. When the pin was not moved from the old parent
to the new one, it created two problems: The reference on the old,
stale parent was never released, causing a reference leak.
A reference for the new parent was never acquired, creating the risk
of a reference underflow later in ceph_mdsc_release_request(). This
patch corrects the logic by releasing the pin from the old parent and
acquiring it for the new parent when r_parent is switched. This
ensures reference accounting stays balanced. ]
Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Running resctrl_tests on an SNC-2 system with lockdep debugging enabled
triggers several warnings with following trace:
WARNING: CPU: 0 PID: 1914 at kernel/cpu.c:528 lockdep_assert_cpus_held
...
Call Trace:
__mon_event_count
? __lock_acquire
? __pfx___mon_event_count
mon_event_count
? __pfx_smp_mon_event_count
smp_mon_event_count
smp_call_on_cpu_callback
get_cpu_cacheinfo_level() called from __mon_event_count() requires CPU hotplug
lock to be held. The hotplug lock is indeed held during this time, as
confirmed by the lockdep_assert_cpus_held() within mon_event_read() that calls
mon_event_count() via IPI, but the lockdep tracking is not able to follow the
IPI.
Fresh CPU cache information via get_cpu_cacheinfo_level() from
__mon_event_count() was added to support the fix for the issue where resctrl
inappropriately maintained links to L3 cache information that will be stale in
the case when the associated CPU goes offline.
Keep the cacheinfo ID in struct rdt_mon_domain to ensure that resctrl does not
maintain stale cache information while CPUs can go offline. Return to using
a pointer to the L3 cache information (struct cacheinfo) in struct rmid_read,
rmid_read::ci. Initialize rmid_read::ci before the IPI where it is used. CPU
hotplug lock is held across rmid_read::ci initialization and use to ensure
that it points to accurate cache information.
Fixes: 594902c986 ("x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Pass the right type of flag to vcpu_dat_fault_handler(); it expects a
FOLL_* flag (in particular FOLL_WRITE), but FAULT_FLAG_WRITE is passed
instead.
This still works because they happen to have the same integer value,
but it's a mistake, thus the fix.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: 05066cafa9 ("s390/mm/fault: Handle guest-related program interrupts in KVM")
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
If mmu_notifier_register() fails, for example because a signal was
pending, the mmu_notifier will not be registered. But when the VM gets
destroyed, it will get unregistered anyway and that will cause one
extra mmdrop(), which will eventually cause the mm of the process to
be freed too early, and cause a use-after free.
This bug happens rarely, and only when secure guests are involved.
The solution is to check the return value of mmu_notifier_register()
and return it to the caller (ultimately it will be propagated all the
way to userspace). In case of -EINTR, userspace will try again.
Fixes: ca2fd0609b ("KVM: s390: pv: add mmu_notifier")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
When you run a KVM guest with vhost-net and migrate that guest to
another host, and you immediately enable postcopy after starting the
migration, there is a big chance that the network connection of the
guest won't work anymore on the destination side after the migration.
With a debug kernel v6.16.0, there is also a call trace that looks
like this:
FAULT_FLAG_ALLOW_RETRY missing 881
CPU: 6 UID: 0 PID: 549 Comm: kworker/6:2 Kdump: loaded Not tainted 6.16.0 #56 NONE
Hardware name: IBM 3931 LA1 400 (LPAR)
Workqueue: events irqfd_inject [kvm]
Call Trace:
[<00003173cbecc634>] dump_stack_lvl+0x104/0x168
[<00003173cca69588>] handle_userfault+0xde8/0x1310
[<00003173cc756f0c>] handle_pte_fault+0x4fc/0x760
[<00003173cc759212>] __handle_mm_fault+0x452/0xa00
[<00003173cc7599ba>] handle_mm_fault+0x1fa/0x6a0
[<00003173cc73409a>] __get_user_pages+0x4aa/0xba0
[<00003173cc7349e8>] get_user_pages_remote+0x258/0x770
[<000031734be6f052>] get_map_page+0xe2/0x190 [kvm]
[<000031734be6f910>] adapter_indicators_set+0x50/0x4a0 [kvm]
[<000031734be7f674>] set_adapter_int+0xc4/0x170 [kvm]
[<000031734be2f268>] kvm_set_irq+0x228/0x3f0 [kvm]
[<000031734be27000>] irqfd_inject+0xd0/0x150 [kvm]
[<00003173cc00c9ec>] process_one_work+0x87c/0x1490
[<00003173cc00dda6>] worker_thread+0x7a6/0x1010
[<00003173cc02dc36>] kthread+0x3b6/0x710
[<00003173cbed2f0c>] __ret_from_fork+0xdc/0x7f0
[<00003173cdd737ca>] ret_from_fork+0xa/0x30
3 locks held by kworker/6:2/549:
#0: 00000000800bc958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ee/0x1490
#1: 000030f3d527fbd0 ((work_completion)(&irqfd->inject)){+.+.}-{0:0}, at: process_one_work+0x81c/0x1490
#2: 00000000f99862b0 (&mm->mmap_lock){++++}-{3:3}, at: get_map_page+0xa8/0x190 [kvm]
The "FAULT_FLAG_ALLOW_RETRY missing" indicates that handle_userfaultfd()
saw a page fault request without ALLOW_RETRY flag set, hence userfaultfd
cannot remotely resolve it (because the caller was asking for an immediate
resolution, aka, FAULT_FLAG_NOWAIT, while remote faults can take time).
With that, get_map_page() failed and the irq was lost.
We should not be strictly in an atomic environment here and the worker
should be sleepable (the call is done during an ioctl from userspace),
so we can allow adapter_indicators_set() to just sleep waiting for the
remote fault instead.
Link: https://issues.redhat.com/browse/RHEL-42486
Signed-off-by: Peter Xu <peterx@redhat.com>
[thuth: Assembled patch description and fixed some cosmetical issues]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: f65470661f ("KVM: s390/interrupt: do not pin adapter interrupt pages")
[frankja: Added fixes tag]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
state_show() reads kdamond->damon_ctx without holding damon_sysfs_lock.
This allows a use-after-free race:
CPU 0 CPU 1
----- -----
state_show() damon_sysfs_turn_damon_on()
ctx = kdamond->damon_ctx; mutex_lock(&damon_sysfs_lock);
damon_destroy_ctx(kdamond->damon_ctx);
kdamond->damon_ctx = NULL;
mutex_unlock(&damon_sysfs_lock);
damon_is_running(ctx); /* ctx is freed */
mutex_lock(&ctx->kdamond_lock); /* UAF */
(The race can also occur with damon_sysfs_kdamonds_rm_dirs() and
damon_sysfs_kdamond_release(), which free or replace the context under
damon_sysfs_lock.)
Fix by taking damon_sysfs_lock before dereferencing the context, mirroring
the locking used in pid_show().
The bug has existed since state_show() first accessed kdamond->damon_ctx.
Link: https://lkml.kernel.org/r/20250905101046.2288-1-disclosure@aisle.com
Fixes: a61ea561c8 ("mm/damon/sysfs: link DAMON for virtual address spaces monitoring")
Signed-off-by: Stanislav Fort <disclosure@aisle.com>
Reported-by: Stanislav Fort <disclosure@aisle.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Clang 22 recently added support for defining __SANITIZE__ macros similar
to GCC [1], which causes warnings (or errors with CONFIG_WERROR=y or W=e)
with the existing defines that the kernel creates to emulate this behavior
with existing clang versions.
In file included from <built-in>:3:
In file included from include/linux/compiler_types.h:171:
include/linux/compiler-clang.h:37:9: error: '__SANITIZE_THREAD__' macro redefined [-Werror,-Wmacro-redefined]
37 | #define __SANITIZE_THREAD__
| ^
<built-in>:352:9: note: previous definition is here
352 | #define __SANITIZE_THREAD__ 1
| ^
Refactor compiler-clang.h to only define the sanitizer macros when they
are undefined and adjust the rest of the code to use these macros for
checking if the sanitizers are enabled, clearing up the warnings and
allowing the kernel to easily drop these defines when the minimum
supported version of LLVM for building the kernel becomes 22.0.0 or newer.
Link: https://lkml.kernel.org/r/20250902-clang-update-sanitize-defines-v1-1-cf3702ca3d92@kernel.org
Link: 568c23bbd3 [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan_populate_vmalloc() and its helpers ignore the caller's gfp_mask and
always allocate memory using the hardcoded GFP_KERNEL flag. This makes
them inconsistent with vmalloc(), which was recently extended to support
GFP_NOFS and GFP_NOIO allocations.
Page table allocations performed during shadow population also ignore the
external gfp_mask. To preserve the intended semantics of GFP_NOFS and
GFP_NOIO, wrap the apply_to_page_range() calls into the appropriate
memalloc scope.
xfs calls vmalloc with GFP_NOFS, so this bug could lead to deadlock.
There was a report here
https://lkml.kernel.org/r/686ea951.050a0220.385921.0016.GAE@google.com
This patch:
- Extends kasan_populate_vmalloc() and helpers to take gfp_mask;
- Passes gfp_mask down to alloc_pages_bulk() and __get_free_page();
- Enforces GFP_NOFS/NOIO semantics with memalloc_*_save()/restore()
around apply_to_page_range();
- Updates vmalloc.c and percpu allocator call sites accordingly.
Link: https://lkml.kernel.org/r/20250831121058.92971-1-urezki@gmail.com
Fixes: 451769ebb7 ("mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reported-by: syzbot+3470c9ffee63e4abafeb@syzkaller.appspotmail.com
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 3215eaceca ("mm/mremap: refactor initial parameter sanity
checks") moved the sanity check for vrm->new_addr from mremap_to() to
check_mremap_params().
However, this caused a regression as vrm->new_addr is now checked even
when MREMAP_FIXED and MREMAP_DONTUNMAP flags are not specified. In this
case, vrm->new_addr can be garbage and create unexpected failures.
Fix this by moving the new_addr check after the vrm_implies_new_addr()
guard. This ensures that the new_addr is only checked when the user has
specified one explicitly.
Link: https://lkml.kernel.org/r/20250828142657.770502-1-cmllamas@google.com
Fixes: 3215eaceca ("mm/mremap: refactor initial parameter sanity checks")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The 'allocation failed, ...' warning messages can cause unlimited log
spam, contrary to the implementation's intent.
The warn_limit variable is accessed without synchronization. If more than
<warn_limit> threads enter the warning path at the same time, the variable
will get decremented past 0. Once it becomes negative, the non-zero check
will always return true leading to unlimited log spam.
Use atomic operation to access warn_limit and change condition to test for
non-negative (>= 0) - atomic_dec_if_positive will return -1 once
warn_limit becomes 0. Continue to print disable message alongside the
last warning.
While the change cited in Fixes is only adjacent, the warning limit
implementation was correct before it. Only non-atomic allocations were
considered for warnings, and those happened to hold pcpu_alloc_mutex while
accessing warn_limit.
[vdumitrescu@nvidia.com: prevent warn_limit from going negative, per Christoph Lameter]
Link: https://lkml.kernel.org/r/ee87cc59-2717-4dbb-8052-1d2692c5aaaa@nvidia.com
Link: https://lkml.kernel.org/r/ab22061a-a62f-4429-945b-744e5cc4ba35@nvidia.com
Fixes: f7d77dfc91 ("mm/percpu.c: print error message too if atomic alloc failed")
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For some reason Broadcom decided that BCM53101 uses 0.5s increments for
the ageing time register, but kept the field width the same [1]. Due to
this, the actual ageing time was always half of what was configured.
Fix this by adapting the limits and value calculation for BCM53101.
So far it looks like this is the only chip with the increased tick
speed:
$ grep -l -r "Specifies the aging time in 0.5 seconds" cdk/PKG/chip | sort
cdk/PKG/chip/bcm53101/bcm53101_a0_defs.h
$ grep -l -r "Specifies the aging time in seconds" cdk/PKG/chip | sort
cdk/PKG/chip/bcm53010/bcm53010_a0_defs.h
cdk/PKG/chip/bcm53020/bcm53020_a0_defs.h
cdk/PKG/chip/bcm53084/bcm53084_a0_defs.h
cdk/PKG/chip/bcm53115/bcm53115_a0_defs.h
cdk/PKG/chip/bcm53118/bcm53118_a0_defs.h
cdk/PKG/chip/bcm53125/bcm53125_a0_defs.h
cdk/PKG/chip/bcm53128/bcm53128_a0_defs.h
cdk/PKG/chip/bcm53134/bcm53134_a0_defs.h
cdk/PKG/chip/bcm53242/bcm53242_a0_defs.h
cdk/PKG/chip/bcm53262/bcm53262_a0_defs.h
cdk/PKG/chip/bcm53280/bcm53280_a0_defs.h
cdk/PKG/chip/bcm53280/bcm53280_b0_defs.h
cdk/PKG/chip/bcm53600/bcm53600_a0_defs.h
cdk/PKG/chip/bcm89500/bcm89500_a0_defs.h
[1] a5d3fc9b12/cdk/PKG/chip/bcm53101/bcm53101_a0_defs.h (L28966)
Fixes: e39d14a760 ("net: dsa: b53: implement setting ageing time")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250905124507.59186-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Per family bind/unbind callbacks were introduced to allow families
to track multicast group consumer presence, e.g. to start or stop
producing events depending on listeners.
However, in genl_bind() the bind() callback was invoked even if
capability checks failed and ret was set to -EPERM. This means that
callbacks could run on behalf of unauthorized callers while the
syscall still returned failure to user space.
Fix this by only invoking bind() after "if (ret) break;" check
i.e. after permission checks have succeeded.
Fixes: 3de21a8990 ("genetlink: Add per family bind/unbind callbacks")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250905135731.3026965-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With further testing with an attached Aeonsemi it was discovered that
the pinctrl MDIO function applied the wrong bitmask. The error was
probably caused by the confusing documentation related to these bits.
Inspecting what the bootloader actually configure, the SGMII_MDIO_MODE
is never actually set but instead it's set force enable to the 2 GPIO
(gpio 1-2) for MDC and MDIO pin.
The usage of GPIO might be confusing but this is just to instruct the
SoC to not mess with those 2 PIN and as Benjamin reported it's also an
Errata of 7581. The FORCE_GPIO_EN doesn't set them as GPIO function
(that is configured by a different register) but it's really to actually
""enable"" those lines.
Normally the SoC should autodetect this by HW but it seems AN7581 have
problem with this and require this workaround to force enable the 2 pin.
Applying this configuration permits correct functionality of any
externally attached PHY.
Cc: stable@vger.kernel.org
Fixes: 1c8ace2d07 ("pinctrl: airoha: Add support for EN7581 SoC")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Benjamin Larsson <benjamin.larsson@genexis.eu>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
5da3d94a23 ("PCI: mvebu: Use for_each_of_range() iterator for parsing
"ranges"") simplified code by using the for_each_of_range() iterator, but
it broke PCI enumeration on Turris Omnia (and probably other mvebu
targets).
Issue #1:
To determine range.flags, of_pci_range_parser_one() uses bus->get_flags(),
which resolves to of_bus_pci_get_flags(), which already returns an
IORESOURCE bit field, and NOT the original flags from the "ranges"
resource.
Then mvebu_get_tgt_attr() attempts the very same conversion again. Remove
the misinterpretation of range.flags in mvebu_get_tgt_attr(), to restore
the intended behavior.
Issue #2:
The driver needs target and attributes, which are encoded in the raw
address values of the "/soc/pcie/ranges" resource. According to
of_pci_range_parser_one(), the raw values are stored in range.bus_addr and
range.parent_bus_addr, respectively. range.cpu_addr is a translated version
of range.parent_bus_addr, and not relevant here.
Use the correct range structure member, to extract target and attributes.
This restores the intended behavior.
Fixes: 5da3d94a23 ("PCI: mvebu: Use for_each_of_range() iterator for parsing "ranges"")
Reported-by: Jan Palus <jpalus@fastmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220479
Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Tony Dinh <mibodhi@gmail.com>
Tested-by: Jan Palus <jpalus@fastmail.com>
Link: https://patch.msgid.link/20250907102303.29735-1-klaus.kudielka@gmail.com
Syzkaller trigger a fault injection warning:
WARNING: CPU: 1 PID: 12326 at tracepoint_add_func+0xbfc/0xeb0
Modules linked in:
CPU: 1 UID: 0 PID: 12326 Comm: syz.6.10325 Tainted: G U 6.14.0-rc5-syzkaller #0
Tainted: [U]=USER
Hardware name: Google Compute Engine/Google Compute Engine
RIP: 0010:tracepoint_add_func+0xbfc/0xeb0 kernel/tracepoint.c:294
Code: 09 fe ff 90 0f 0b 90 0f b6 74 24 43 31 ff 41 bc ea ff ff ff
RSP: 0018:ffffc9000414fb48 EFLAGS: 00010283
RAX: 00000000000012a1 RBX: ffffffff8e240ae0 RCX: ffffc90014b78000
RDX: 0000000000080000 RSI: ffffffff81bbd78b RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffffffffef
R13: 0000000000000000 R14: dffffc0000000000 R15: ffffffff81c264f0
FS: 00007f27217f66c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2e80dff8 CR3: 00000000268f8000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tracepoint_probe_register_prio+0xc0/0x110 kernel/tracepoint.c:464
register_trace_prio_sched_switch include/trace/events/sched.h:222 [inline]
register_pid_events kernel/trace/trace_events.c:2354 [inline]
event_pid_write.isra.0+0x439/0x7a0 kernel/trace/trace_events.c:2425
vfs_write+0x24c/0x1150 fs/read_write.c:677
ksys_write+0x12b/0x250 fs/read_write.c:731
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
We can reproduce the warning by following the steps below:
1. echo 8 >> set_event_notrace_pid. Let tr->filtered_pids owns one pid
and register sched_switch tracepoint.
2. echo ' ' >> set_event_pid, and perform fault injection during chunk
allocation of trace_pid_list_alloc. Let pid_list with no pid and
assign to tr->filtered_pids.
3. echo ' ' >> set_event_pid. Let pid_list is NULL and assign to
tr->filtered_pids.
4. echo 9 >> set_event_pid, will trigger the double register
sched_switch tracepoint warning.
The reason is that syzkaller injects a fault into the chunk allocation
in trace_pid_list_alloc, causing a failure in trace_pid_list_set, which
may trigger double register of the same tracepoint. This only occurs
when the system is about to crash, but to suppress this warning, let's
add failure handling logic to trace_pid_list_set.
Link: https://lore.kernel.org/20250908024658.2390398-1-pulehui@huaweicloud.com
Fixes: 8d6e90983a ("tracing: Create a sparse bitmask for pid filtering")
Reported-by: syzbot+161412ccaeff20ce4dde@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67cb890e.050a0220.d8275.022e.GAE@google.com
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Typo in ff_lseg_match_mirrors makes the diff ineffective. This results
in merge happening all the time. Merge happening all the time is
problematic because it marks lsegs invalid. Marking lsegs invalid
causes all outstanding IO to get restarted with EAGAIN and connections
to get closed.
Closing connections constantly triggers race conditions in the RDMA
implementation...
Fixes: 660d1eb223 ("pNFS/flexfile: Don't merge layout segments if the mirrors don't match")
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When buf_len is not 4-byte aligned in ath12k_wmi_mgmt_send(), the
firmware asserts and triggers a recovery. The following error
messages are observed:
ath12k_pci 0004:01:00.0: failed to submit WMI_MGMT_TX_SEND_CMDID cmd
ath12k_pci 0004:01:00.0: failed to send mgmt frame: -108
ath12k_pci 0004:01:00.0: failed to tx mgmt frame, vdev_id 0 :-108
ath12k_pci 0004:01:00.0: waiting recovery start...
This issue was observed when running 'iw wlanx set power_save off/on'
in MLO station mode, which triggers the sending of an SMPS action frame
with a length of 27 bytes to the AP. To resolve the misalignment, use
buf_len_aligned instead of buf_len when constructing the WMI TLV header.
Tested-on: WCN7850 hw2.0 PCI WLAN.IOE_HMT.1.1-00011-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Fixes: d889913205 ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20250908015139.1301437-1-miaoqing.pan@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Commit afbab6e4e8 ("wifi: ath12k: modify ath12k_mac_op_bss_info_changed()
for MLO") replaced the bss_info_changed() callback with vif_cfg_changed()
and link_info_changed() to support Multi-Link Operation (MLO). As a result,
the station power save configuration is no longer correctly applied in
ath12k_mac_bss_info_changed().
Move the handling of 'BSS_CHANGED_PS' into ath12k_mac_op_vif_cfg_changed()
to align with the updated callback structure introduced for MLO, ensuring
proper power-save behavior for station interfaces.
Tested-on: WCN7850 hw2.0 PCI WLAN.IOE_HMT.1.1-00011-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Fixes: afbab6e4e8 ("wifi: ath12k: modify ath12k_mac_op_bss_info_changed() for MLO")
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20250908015025.1301398-1-miaoqing.pan@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead
of checking for C2PMSG_33 status, add the callback wait_for_bootloader.
Wait for bootloader to be back to steady state is already part of the
generic mode-1 reset flow. Increase the retry count for bootloader wait
and also fix the mask to prevent fake pass.
Fixes: 8345a71fc5 ("drm/amdgpu: Add more checks to PSP mailbox")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4531
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 32f73741d6)
Pull vfs fixes from Christian Brauner:
"fuse:
- Prevent opening of non-regular backing files.
Fuse doesn't support non-regular files anyway.
- Check whether copy_file_range() returns a larger size than
requested.
- Prevent overflow in copy_file_range() as fuse currently only
supports 32-bit sized copies.
- Cache the blocksize value if the server returned a new value as
inode->i_blkbits isn't modified directly anymore.
- Fix i_blkbits handling for iomap partial writes.
By default i_blkbits is set to PAGE_SIZE which causes iomap to mark
the whole folio as uptodate even on a partial write. But fuseblk
filesystems support choosing a blocksize smaller than PAGE_SIZE
risking data corruption. Simply enforce PAGE_SIZE as blocksize for
fuseblk's internal inode for now.
- Prevent out-of-bounds acces in fuse_dev_write() when the number of
bytes to be retrieved is truncated to the fc->max_pages limit.
virtiofs:
- Fix page faults for DAX page addresses.
Misc:
- Tighten file handle decoding from userns.
Check that the decoded dentry itself has a valid idmapping in the
user namespace.
- Fix mount-notify selftests.
- Fix some indentation errors.
- Add an FMODE_ flag to indicate IOCB_HAS_METADATA availability.
This will be moved to an FOP_* flag with a bit more rework needed
for that to happen not suitable for a fix.
- Don't silently ignore metadata for sync read/write.
- Don't pointlessly log warning when reading coredump sysctls"
* tag 'vfs-6.17-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fuse: virtio_fs: fix page fault for DAX page address
selftests/fs/mount-notify: Fix compilation failure.
fhandle: use more consistent rules for decoding file handle from userns
fuse: Block access to folio overlimit
fuse: fix fuseblk i_blkbits for iomap partial writes
fuse: reflect cached blocksize if blocksize was changed
fuse: prevent overflow in copy_file_range return value
fuse: check if copy_file_range() returns larger than requested size
fuse: do not allow mapping a non-regular backing file
coredump: don't pointlessly check and spew warnings
fs: fix indentation style
block: don't silently ignore metadata for sync read/write
fs: add a FMODE_ flag to indicate IOCB_HAS_METADATA availability
Please enter a commit message to explain why this merge is necessary,
especially if it merges an updated upstream into a topic branch.
Merge amd-pstate content for 6.17 (09/04/25) from Mario Limonciello:
"Fixes for regressions found from refactor around
EPP handling at suspend/resume and minimum frequency
while using the performance governor."
* tag 'amd-pstate-v6.17-2025-09-04' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Fix a regression leading to EPP 0 after resume
cpufreq/amd-pstate: Fix setting of CPPC.min_perf in active mode for performance governor
MAX_TAG_SIZE was 0x1a8 and it may be truncated in the "bi->metadata_size
= ic->tag_size" assignment. We need to limit it to 255.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
In all the MUX value for LED1 GPIO46 there is a Copy-Paste error where
the MUX value is set to LED0_MODE_MASK instead of LED1_MODE_MASK.
This wasn't notice as there were no board that made use of the
secondary PHY LED but looking at the internal Documentation the actual
value should be LED1_MODE_MASK similar to the other GPIO entry.
Fix the wrong value to apply the correct MUX configuration.
Cc: stable@vger.kernel.org
Fixes: 1c8ace2d07 ("pinctrl: airoha: Add support for EN7581 SoC")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Support for parsing the topology on AMD/Hygon processors using CPUID leaf 0xb
was added in
3986a0a805 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available").
In an effort to keep all the topology parsing bits in one place, this commit
also introduced a pseudo dependency on the TOPOEXT feature to parse the CPUID
leaf 0xb.
The TOPOEXT feature (CPUID 0x80000001 ECX[22]) advertises the support for
Cache Properties leaf 0x8000001d and the CPUID leaf 0x8000001e EAX for
"Extended APIC ID" however support for 0xb was introduced alongside the x2APIC
support not only on AMD [1], but also historically on x86 [2].
Similar to 0xb, the support for extended CPU topology leaf 0x80000026 too does
not depend on the TOPOEXT feature.
The support for these leaves is expected to be confirmed by ensuring
leaf <= {extended_}cpuid_level
and then parsing the level 0 of the respective leaf to confirm EBX[15:0]
(LogProcAtThisLevel) is non-zero as stated in the definition of
"CPUID_Fn0000000B_EAX_x00 [Extended Topology Enumeration]
(Core::X86::Cpuid::ExtTopEnumEax0)" in Processor Programming Reference (PPR)
for AMD Family 19h Model 01h Rev B1 Vol1 [3] Sec. 2.1.15.1 "CPUID Instruction
Functions".
This has not been a problem on baremetal platforms since support for TOPOEXT
(Fam 0x15 and later) predates the support for CPUID leaf 0xb (Fam 0x17[Zen2]
and later), however, for AMD guests on QEMU, the "x2apic" feature can be
enabled independent of the "topoext" feature where QEMU expects topology and
the initial APICID to be parsed using the CPUID leaf 0xb (especially when
number of cores > 255) which is populated independent of the "topoext" feature
flag.
Unconditionally call cpu_parse_topology_ext() on AMD and Hygon processors to
first parse the topology using the XTOPOLOGY leaves (0x80000026 / 0xb) before
using the TOPOEXT leaf (0x8000001e).
While at it, break down the single large comment in parse_topology_amd() to
better highlight the purpose of each CPUID leaf.
Fixes: 3986a0a805 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available")
Suggested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org # Only v6.9 and above; depends on x86 topology rewrite
Link: https://lore.kernel.org/lkml/1529686927-7665-1-git-send-email-suravee.suthikulpanit@amd.com/ [1]
Link: https://lore.kernel.org/lkml/20080818181435.523309000@linux-os.sc.intel.com/ [2]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 [3]
A bug reported by one of my customers that the order of TAS2781
calibrated-data is incorrect, the correct way is to move R0_Low
and insert it between R0 and InvR0.
Fixes: 4fe2385134 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib")
Signed-off-by: Shenghao Ding <shenghao-ding@ti.com>
Link: https://patch.msgid.link/20250907222728.988-1-shenghao-ding@ti.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull i2c fixes from Wolfram Sang:
- i801: drop superfluous WDT entry for Birch
- rtl9300:
- fix channel number check in probe
- check data length boundaries in xfer
- drop broken SMBus quick operation
* tag 'i2c-for-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rtl9300: remove broken SMBus Quick operation support
i2c: rtl9300: ensure data length is within supported range
i2c: rtl9300: fix channel number bound check
i2c: i801: Hide Intel Birch Stream SoC TCO WDT
The logic of the headphone detect pin seems to be inverted, with this
change headphones actually output sound when plugged in.
Does not need workaround of using pin-switches to enable output.
Verified by checking /sys/kernel/debug/gpio.
Fixes: ae46756faf ("arm64: dts: rockchip: analog audio on Orange Pi 5")
Signed-off-by: Jimmy Hon <honyuenkwun@gmail.com>
Link: https://lore.kernel.org/r/20250904030150.986042-1-honyuenkwun@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Pull EDAC fix from Borislav Petkov:
- Remove a misplaced dma_free_coherent() call in altera_edac
* tag 'edac_urgent_for_v6.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/altera: Delete an inappropriate dma_free_coherent() call
Pull timer fix from Ingo Molnar:
"Fix a severe slowdown regression in the timer vDSO code related to the
while() loop in __iter_div_u64_rem(), when the AUX-clock is enabled"
* tag 'timers-urgent-2025-09-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vdso/vsyscall: Avoid slow division loop in auxiliary clock update
Pull locking fix from Ingo Molnar:
"Fix an 'allocation from atomic context' regression in the futex
vmalloc variant"
* tag 'locking-urgent-2025-09-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Move futex_hash_free() back to __mmput()
Pull perf event fix from Ingo Molnar:
"Fix regression where PERF_EVENT_IOC_REFRESH counters miss a PMU-stop"
* tag 'perf-urgent-2025-09-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix the POLL_HUP delivery breakage
Pull RISC-V fixes from Paul Walmsley:
- LTO fix for clang when building with CONFIG_CMODEL_MEDLOW
- Fix for ACPI CPPC CSR read/write return values
- Several fixes for incorrect access widths in thread_info.cpu reads
- Fix an issue in __put_user_nocheck() that was causing the glibc
tst-socket-timestamp test to fail
- Initialize struct kexec_buf records in several kexec-related
functions, which were generating UBSAN warnings
- Two fixes for sparse warnings
* tag 'riscv-for-linus-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix sparse warning about different address spaces
riscv: Fix sparse warning in __get_user_error()
riscv: kexec: Initialize kexec_buf struct
riscv: use lw when reading int cpu in asm_per_cpu
riscv, bpf: use lw when reading int cpu in bpf_get_smp_processor_id
riscv, bpf: use lw when reading int cpu in BPF_MOV64_PERCPU_REG
riscv: uaccess: fix __put_user_nocheck for unaligned accesses
riscv: use lw when reading int cpu in new_vmalloc_check
ACPI: RISC-V: Fix FFH_CPPC_CSR error handling
riscv: Only allow LTO with CMODEL_MEDANY
On SoCFPGA/Sodia board, mdio bus cannot be probed, so the PHY cannot be
found and the network device does not work.
```
stmmaceth ff702000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -19)
```
To probe the mdio bus, add "snps,dwmac-mdio" as compatible string of the
mdio bus. Also the PHY address connected to this board is 4. Therefore,
change to 4.
Cc: stable@vger.kernel.org # 6.3+
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
xs_sock_recv_cmsg was failing to call xs_sock_process_cmsg for any cmsg
type other than TLS_RECORD_TYPE_ALERT (TLS_RECORD_TYPE_DATA, and other
values not handled.) Based on my reading of the previous commit
(cc5d5908: sunrpc: fix client side handling of tls alerts), it looks
like only iov_iter_revert should be conditional on TLS_RECORD_TYPE_ALERT
(but that other cmsg types should still call xs_sock_process_cmsg). On
my machine, I was unable to connect (over mtls) to an NFS share hosted
on FreeBSD. With this patch applied, I am able to mount the share again.
Fixes: cc5d59081f ("sunrpc: fix client side handling of tls alerts")
Signed-off-by: Justin Worrell <jworrell@gmail.com>
Reviewed-and-tested-by: Scott Mayhew <smayhew@redhat.com>
Link: https://lore.kernel.org/r/20250904211038.12874-3-jworrell@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Since all callers of nfs_page_group_covers_page() have already ensured
that there is only one group member, all that is required is to check
that the entire folio contains dirty data.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we're truncating part of the folio, then we need to write out the
data on the part that is not covered by the cancellation.
Fixes: d47992f86b ("mm: change invalidatepage prototype to accept length")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that all O_DIRECT reads and writes complete before copying a file
range, so that the destination is up to date.
Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that all O_DIRECT reads and writes complete before cloning a file
range, so that both the source and destination are up to date.
Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that all O_DIRECT reads and writes complete before calling
fallocate so that we don't race w.r.t. attribute updates.
Fixes: 99f2378322 ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that all O_DIRECT reads and writes are complete, and prevent the
initiation of new i/o until the setattr operation that will truncate the
file is complete.
Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The NFSv4.2 copy offload and clone functions can also end up extending
the size of the destination file, so they too need to call
nfs_truncate_last_folio().
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This commit fixes the failing xfstest 'generic/363'.
When the user mmaps() an area that extends beyond the end of file, and
proceeds to write data into the folio that straddles that eof, we're
required to discard that folio data if the user calls some function that
extends the file length.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Recent commit f06bedfa62 ("pNFS/flexfiles: don't attempt pnfs on fatal DS
errors") has changed the error return type of ff_layout_choose_ds_for_read() from
NULL to an error pointer. However, not all code paths have been updated
to match the change. Thus, some non-NULL checks will accept error pointers
as a valid return value.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f06bedfa62 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors")
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Previously nfs_local_probe() was made to disable and then attempt to
re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was
called and LOCALIO already enabled.
Vague memory for _why_ this was the case is that this was useful
if/when a local NFS server were to be restarted with a local NFS
client connected to it.
But as it happens this causes an absurd amount of LOCALIO flapping
which has a side-effect of too much IO being needlessly sent to NFSD
(using RPC over the loopback network interface). This is the
definition of "serious performance loss" (that negates the point of
having LOCALIO).
So remove this mis-optimization for re-enabling LOCALIO if/when an NFS
server is restarted (which is an extremely rare thing to do). Will
revisit testing that scenario again but in the meantime this patch
restores the full benefit of LOCALIO.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull rust fixes from Miguel Ojeda:
- Two changes to prepare for the future Rust 1.91.0 release (expected
2025-10-30, currently in nightly): a target specification format
change and a renamed, soon-to-be-stabilized 'core' function.
* tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: support Rust >= 1.91.0 target spec
rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
Support for MT6359 PMIC keys has been added recently. However, the key
release event is not properly handled: only key press events are
generated, leaving key states stuck in "pressed".
This patch ensures that both key press and key release events are
properly emitted by handling the release logic correctly.
Introduce a 'key_release_irq' member to the 'mtk_pmic_regs' to identify
the devices that have a separate irq for the release event.
Fixes: bc25e6bf03 ("Input: mtk-pmic-keys - add support for MT6359 PMIC keys")
Signed-off-by: Julien Massot <julien.massot@collabora.com>
Link: https://lore.kernel.org/r/20250905-radxa-nio-12-l-gpio-v3-1-40f11377fb55@collabora.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
A crash was observed with the following output:
BUG: kernel NULL pointer dereference, address: 0000000000000010
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 92 Comm: osnoise_cpus Not tainted 6.17.0-rc4-00201-gd69eb204c255 #138 PREEMPT(voluntary)
RIP: 0010:bitmap_parselist+0x53/0x3e0
Call Trace:
<TASK>
osnoise_cpus_write+0x7a/0x190
vfs_write+0xf8/0x410
? do_sys_openat2+0x88/0xd0
ksys_write+0x60/0xd0
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
This issue can be reproduced by below code:
fd=open("/sys/kernel/debug/tracing/osnoise/cpus", O_WRONLY);
write(fd, "0-2", 0);
When user pass 'count=0' to osnoise_cpus_write(), kmalloc() will return
ZERO_SIZE_PTR (16) and cpulist_parse() treat it as a normal value, which
trigger the null pointer dereference. Add check for the parameter 'count'.
Cc: <mhiramat@kernel.org>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <tglozar@redhat.com>
Link: https://lore.kernel.org/20250906035610.3880282-1-wangliang74@huawei.com
Fixes: 17f89102fe ("tracing/osnoise: Allow arbitrarily long CPU string")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Current implementation uses `CDNS_UART_REGISTER_SPACE(0x1000)`
for request_mem_region() and ioremap() in cdns_uart_request_port() API.
The cadence/xilinx IP has register space defined from offset 0x0 to 0x48.
It also mentions that the register map is defined as [6:0]. So, the upper
region may/maynot be used based on the IP integration.
In Axiado AX3000 SoC two UART instances are defined
0x100 apart. That is creating issue in some other instance due to overlap
with addresses.
Since, this address space is already being defined in the
devicetree, use the same when requesting the register space.
Fixes: 1f70557790 ("arm64: dts: axiado: Add initial support for AX3000 SoC and eval board")
Acked-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Harshit Shah <hshah@axiado.com>
Link: https://lore.kernel.org/r/20250902-xilinx-uartps-reg-size-v3-1-d11cfa7258e3@axiado.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The EP-IN of MIDI2 (altset 1) wasn't initialized in
f_midi2_create_usb_configs() as it's an INT EP unlike others BULK
EPs. But this leaves rather the max packet size unchanged no matter
which speed is used, resulting in the very slow access.
And the wMaxPacketSize values set there look legit for INT EPs, so
let's initialize the MIDI2 EP-IN there for achieving the equivalent
speed as well.
Fixes: 8b645922b2 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20250905133240.20966-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The gadget card driver forgot to call snd_ump_update_group_attrs()
after adding FBs, and this leaves the UMP group attributes
uninitialized. As a result, -ENODEV error is returned at opening a
legacy rawmidi device as an inactive group.
This patch adds the missing call to address the behavior above.
Fixes: 8b645922b2 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20250904153932.13589-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tcpm_handle_vdm_request delivers messages to the partner altmode or the
cable altmode depending on the SVDM response type, which is incorrect.
The partner or cable should be chosen based on the received message type
instead.
Also add this filter to ADEV_NOTIFY_USB_AND_QUEUE_VDM, which is used when
the Enter Mode command is responded to by a NAK on SOP or SOP' and when
the Exit Mode command is responded to by an ACK on SOP.
Fixes: 7e7877c55e ("usb: typec: tcpm: add alt mode enter/exit/vdm support for sop'")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20250821203759.1720841-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pending requests will be flushed on disconnect, and the corresponding
TRBs will be turned into No-op TRBs, which are ignored by the xHC
controller once it starts processing the ring.
If the USB debug cable repeatedly disconnects before ring is started
then the ring will eventually be filled with No-op TRBs.
No new transfers can be queued when the ring is full, and driver will
print the following error message:
"xhci_hcd 0000:00:14.0: failed to queue trbs"
This is a normal case for 'in' transfers where TRBs are always enqueued
in advance, ready to take on incoming data. If no data arrives, and
device is disconnected, then ring dequeue will remain at beginning of
the ring while enqueue points to first free TRB after last cancelled
No-op TRB.
s
Solve this by reinitializing the rings when the debug cable disconnects
and DbC is leaving the configured state.
Clear the whole ring buffer and set enqueue and dequeue to the beginning
of ring, and set cycle bit to its initial state.
Cc: stable@vger.kernel.org
Fixes: dfba2174dc ("usb: xhci: Add DbC support in xHCI driver")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250902105306.877476-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Decouple allocation of endpoint ring buffer from initialization
of the buffer, and initialization of endpoint context parts from
from the rest of the contexts.
It allows driver to clear up and reinitialize endpoint rings
after disconnect without reallocating everything.
This is a prerequisite for the next patch that prevents the transfer
ring from filling up with cancelled (no-op) TRBs if a debug cable is
reconnected several times without transferring anything.
Cc: stable@vger.kernel.org
Fixes: dfba2174dc ("usb: xhci: Add DbC support in xHCI driver")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250902105306.877476-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
i2c-host-fixes for v6.17-rc5
- i801: fix device IDs
- in rtl9300:
- fix channel number check in probe
- check data length boundaries in xfer
- drop unsupported SMBus quick operation
Problem description
===================
Lockdep reports a possible circular locking dependency (AB/BA) between
&pl->state_mutex and &phy->lock, as follows.
phylink_resolve() // acquires &pl->state_mutex
-> phylink_major_config()
-> phy_config_inband() // acquires &pl->phydev->lock
whereas all the other call sites where &pl->state_mutex and
&pl->phydev->lock have the locking scheme reversed. Everywhere else,
&pl->phydev->lock is acquired at the top level, and &pl->state_mutex at
the lower level. A clear example is phylink_bringup_phy().
The outlier is the newly introduced phy_config_inband() and the existing
lock order is the correct one. To understand why it cannot be the other
way around, it is sufficient to consider phylink_phy_change(), phylink's
callback from the PHY device's phy->phy_link_change() virtual method,
invoked by the PHY state machine.
phy_link_up() and phy_link_down(), the (indirect) callers of
phylink_phy_change(), are called with &phydev->lock acquired.
Then phylink_phy_change() acquires its own &pl->state_mutex, to
serialize changes made to its pl->phy_state and pl->link_config.
So all other instances of &pl->state_mutex and &phydev->lock must be
consistent with this order.
Problem impact
==============
I think the kernel runs a serious deadlock risk if an existing
phylink_resolve() thread, which results in a phy_config_inband() call,
is concurrent with a phy_link_up() or phy_link_down() call, which will
deadlock on &pl->state_mutex in phylink_phy_change(). Practically
speaking, the impact may be limited by the slow speed of the medium
auto-negotiation protocol, which makes it unlikely for the current state
to still be unresolved when a new one is detected, but I think the
problem is there. Nonetheless, the problem was discovered using lockdep.
Proposed solution
=================
Practically speaking, the phy_config_inband() requirement of having
phydev->lock acquired must transfer to the caller (phylink is the only
caller). There, it must bubble up until immediately before
&pl->state_mutex is acquired, for the cases where that takes place.
Solution details, considerations, notes
=======================================
This is the phy_config_inband() call graph:
sfp_upstream_ops :: connect_phy()
|
v
phylink_sfp_connect_phy()
|
v
phylink_sfp_config_phy()
|
| sfp_upstream_ops :: module_insert()
| |
| v
| phylink_sfp_module_insert()
| |
| | sfp_upstream_ops :: module_start()
| | |
| | v
| | phylink_sfp_module_start()
| | |
| v v
| phylink_sfp_config_optical()
phylink_start() | |
| phylink_resume() v v
| | phylink_sfp_set_config()
| | |
v v v
phylink_mac_initial_config()
| phylink_resolve()
| | phylink_ethtool_ksettings_set()
v v v
phylink_major_config()
|
v
phy_config_inband()
phylink_major_config() caller #1, phylink_mac_initial_config(), does not
acquire &pl->state_mutex nor do its callers. It must acquire
&pl->phydev->lock prior to calling phylink_major_config().
phylink_major_config() caller #2, phylink_resolve() acquires
&pl->state_mutex, thus also needs to acquire &pl->phydev->lock.
phylink_major_config() caller #3, phylink_ethtool_ksettings_set(), is
completely uninteresting, because it only calls phylink_major_config()
if pl->phydev is NULL (otherwise it calls phy_ethtool_ksettings_set()).
We need to change nothing there.
Other solutions
===============
The lock inversion between &pl->state_mutex and &pl->phydev->lock has
occurred at least once before, as seen in commit c718af2d00 ("net:
phylink: fix ethtool -A with attached PHYs"). The solution there was to
simply not call phy_set_asym_pause() under the &pl->state_mutex. That
cannot be extended to our case though, where the phy_config_inband()
call is much deeper inside the &pl->state_mutex section.
Fixes: 5fd0f1a02e ("net: phylink: add negotiation of in-band capabilities")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250904125238.193990-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently phylink_resolve() protects itself against concurrent
phylink_bringup_phy() or phylink_disconnect_phy() calls which modify
pl->phydev by relying on pl->state_mutex.
The problem is that in phylink_resolve(), pl->state_mutex is in a lock
inversion state with pl->phydev->lock. So pl->phydev->lock needs to be
acquired prior to pl->state_mutex. But that requires dereferencing
pl->phydev in the first place, and without pl->state_mutex, that is
racy.
Hence the reason for the extra lock. Currently it is redundant, but it
will serve a functional purpose once mutex_lock(&phy->lock) will be
moved outside of the mutex_lock(&pl->state_mutex) section.
Another alternative considered would have been to let phylink_resolve()
acquire the rtnl_mutex, which is also held when phylink_bringup_phy()
and phylink_disconnect_phy() are called. But since phylink_disconnect_phy()
runs under rtnl_lock(), it would deadlock with phylink_resolve() when
calling flush_work(&pl->resolve). Additionally, it would have been
undesirable because it would have unnecessarily blocked many other call
paths as well in the entire kernel, so the smaller-scoped lock was
preferred.
Link: https://lore.kernel.org/netdev/aLb6puGVzR29GpPx@shell.armlinux.org.uk/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250904125238.193990-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are fuel gauges in the bq27xxx series (e.g. bq27z561) which may in some
cases report 0xff as the value of BQ27XXX_REG_FLAGS that should not be
interpreted as "no battery" like for a disconnected battery with some built
in bq27000 chip.
So restrict the no-battery detection originally introduced by
commit 3dd843e1c2 ("bq27000: report missing device better.")
to the bq27000.
There is no need to backport further because this was hidden before
commit f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")
Fixes: f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")
Suggested-by: Jerry Lv <Jerry.Lv@axis.com>
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/dd979fa6855fd051ee5117016c58daaa05966e24.1755945297.git.hns@goldelico.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Since commit
commit f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")
the console log of some devices with hdq enabled but no bq27000 battery
(like e.g. the Pandaboard) is flooded with messages like:
[ 34.247833] power_supply bq27000-battery: driver failed to report 'status' property: -1
as soon as user-space is finding a /sys entry and trying to read the
"status" property.
It turns out that the offending commit changes the logic to now return the
value of cache.flags if it is <0. This is likely under the assumption that
it is an error number. In normal errors from bq27xxx_read() this is indeed
the case.
But there is special code to detect if no bq27000 is installed or accessible
through hdq/1wire and wants to report this. In that case, the cache.flags
are set historically by
commit 3dd843e1c2 ("bq27000: report missing device better.")
to constant -1 which did make reading properties return -ENODEV. So everything
appeared to be fine before the return value was passed upwards.
Now the -1 is returned as -EPERM instead of -ENODEV, triggering the error
condition in power_supply_format_property() which then floods the console log.
So we change the detection of missing bq27000 battery to simply set
cache.flags = -ENODEV
instead of -1.
Fixes: f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")
Cc: Jerry Lv <Jerry.Lv@axis.com>
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/692f79eb6fd541adb397038ea6e750d4de2deddf.1755945297.git.hns@goldelico.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Pull perf tools fixes from Namhyung Kim:
"Fixes for use-after-free that resulted in segfaults after merging the
bpf tree.
Also a couple of build and test fixes"
* tag 'perf-tools-fixes-for-v6.17-2025-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf symbol-elf: Add support for the block argument for libbfd
perf test: Checking BPF metadata collection fails on version string
perf tests: Fix "PE file support" test build
perf bpf-utils: Harden get_bpf_prog_info_linear
perf bpf-utils: Constify bpil_array_desc
perf bpf-event: Fix use-after-free in synthesis
The kexec_buf structure was previously declared without initialization.
commit bf454ec31a ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.
This is also triggering a UBSAN warning when the uninitialized data was
accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are
cleanly set, preventing future instances of uninitialized memory being
used.
Fixes: bf454ec31a ("kexec_file: allow to place kexec_buf randomly")
Signed-off-by: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250827-kbuf_all-v1-2-1df9882bb01a@debian.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
The type of the value to write should be determined by the size of the
destination, not by the value itself, which may be a constant. This
aligns the behavior with x86_64, where __typeof__(*(__gu_ptr)) is used
to infer the correct type.
This fixes an issue in put_cmsg, which was only writing 4 out of 8
bytes to the cmsg_len field, causing the glibc tst-socket-timestamp test
to fail.
Fixes: ca1a66cdd6 ("riscv: uaccess: do not do misaligned accesses in get/put_user()")
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250724220853.1969954-1-aurelien@aurel32.net
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Pull SCSI fixes from James Bottomley:
"Obvious driver patch plus update to sr to add back rotational media
flag since CDROMS are rotational"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sr: Reinstate rotational media flag
scsi: lpfc: Fix buffer free/clear order in deferred receive path
Pull spi fixes from Mark Brown:
"The largest batch of fixes here is a series of fixes for the Freescale
LPSPI driver which James Clark pulled out of their BSP while looking
at support for the NXP S32G version of the controller.
The majority of this turned out to be bug fixes that affect existing
systems with the actual S32G support being just a small quirk that
would be unremarkable by itself, the whole series has had a good
amount of testing and review and the individual patches are all pretty
straightforward by themselves.
We also have a few other driver specific fixes, including a relatively
large but simple one for the Cadence QuadSPI driver"
* tag 'spi-fix-v6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-qpic-snand: unregister ECC engine on probe error and device remove
spi: cadence-quadspi: Implement refcount to handle unbind during busy
spi: spi-fsl-lpspi: Add compatible for S32G
spi: spi-fsl-lpspi: Parameterize reading num-cs from hardware
spi: spi-fsl-lpspi: Treat prescale_max == 0 as no erratum
spi: spi-fsl-lpspi: Constify devtype datas
dt-bindings: lpspi: Document support for S32G
spi: spi-fsl-lpspi: Clear status register after disabling the module
spi: spi-fsl-lpspi: Reset FIFO and disable module on transfer abort
spi: spi-fsl-lpspi: Set correct chip-select polarity bit
spi: spi-fsl-lpspi: Fix transmissions when using CONT
spi: microchip-core-qspi: stop checking viability of op->max_freq in supports_op callback
Pull arm64 fixes from Catalin Marinas:
- Incorrect __BITS_PER_LONG as 64 when compiling the compat vDSO
- Unreachable PLT for ftrace_caller() in a module's .init.text
following past reworking of the module VA range selection
- Memory leak in the ACPI iort_rmr_alloc_sids() after a failed
krealloc_array()
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: ftrace: fix unreachable PLT for ftrace_caller in init_module with CONFIG_DYNAMIC_FTRACE
ACPI/IORT: Fix memory leak in iort_rmr_alloc_sids()
arm64: uapi: Provide correct __BITS_PER_LONG for the compat vDSO
Pull audit fix from Paul Moore:
"A single small audit patch to fix a potential out-of-bounds read
caused by a negative array index when comparing paths"
* tag 'audit-pr-20250905' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix out-of-bounds read in audit_compare_dname_path()
Pull smb client fixes from Steve French:
- Fix two potential NULL pointer references
- Two debugging improvements (to help debug recent issues) a new
tracepoint, and minor improvement to DebugData
- Trivial comment cleanup
* tag '6.17-RC4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: prevent NULL pointer dereference in UTF16 conversion
smb: client: show negotiated cipher in DebugData
smb: client: add new tracepoint to trace lease break notification
smb: client: fix spellings in comments
smb: client: Fix NULL pointer dereference in cifs_debug_dirs_proc_show()
Pull hwmon fixes from Guenter Roeck:
- ina238: Various value range fixes when writing limit attributes
- mlxreg-fan: Prevent fans from getting stuck at 0 RPM
* tag 'hwmon-for-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ina238) Correctly clamp power limits
hwmon: (ina238) Correctly clamp shunt voltage limit
hwmon: (ina238) Correctly clamp temperature
hwmon: mlxreg-fan: Prevent fans from getting stuck at 0 RPM
Commit 15ae0410c37a79 ("btrfs-progs: add error handling for
device_get_partition_size_fd_stat()") in btrfs-progs inadvertently
changed it so that if the BLKGETSIZE64 ioctl on a block device returned
a size of 0, this was no longer seen as an error condition.
Unfortunately this is how disconnected NBD devices behave, meaning that
with btrfs-progs 6.16 it's now possible to add a device you can't
remove:
# btrfs device add /dev/nbd0 /root/temp
# btrfs device remove /dev/nbd0 /root/temp
ERROR: error removing device '/dev/nbd0': Invalid argument
This check should always have been done kernel-side anyway, so add a
check in btrfs_init_new_device() that the new device doesn't have a size
less than BTRFS_DEVICE_RANGE_RESERVED (i.e. 1 MB).
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull gpio fixes from Bartosz Golaszewski:
- fix GPIO submenu regression in Kconfig
- fix make clean under tools/gpio/
* tag 'gpio-fixes-for-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
tools: gpio: remove the include directory on make clean
gpio: fix GPIO submenu in Kconfig
Pull x86 platform driver fixes from Ilpo Järvinen:
- acer-wmi: Stop using ACPI bitmap for platform profile choices
- amd/hfi: Fix pcct_tbl leak
- amd/pmc: Add TUXEDO IB Pro Gen10 AMD to spurious 8042 quirks
- asus-wmi:
- Fix registration races
- Fix ROG button mapping, tablet mode on ASUS ROG Z13
- Support more keys on ExpertBook B9
- hp-wmi: Add support for Fn+P hotkey
- intel/pmc: Add Bartlett Lake support
- intel/power-domains: Use topology_logical_package_id() for package ID
* tag 'platform-drivers-x86-v6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel: power-domains: Use topology_logical_package_id() for package ID
platform/x86: acer-wmi: Stop using ACPI bitmap for platform profile choices
platform/x86: hp-wmi: Add support for Fn+P hotkey
platform/x86/intel/pmc: Add Bartlett Lake support to intel_pmc_core
platform/x86: asus-wmi: Fix racy registrations
platform/x86/amd/pmc: Add TUXEDO IB Pro Gen10 AMD to spurious 8042 quirks list
platform/x86: asus-wmi: map more keys on ExpertBook B9
platform/x86: asus-wmi: Fix ROG button mapping, tablet mode on ASUS ROG Z13
platform/x86: asus-wmi: Remove extra keys from ignore_key_wlan quirk
platform/x86/amd: hfi: Fix pcct_tbl leak in amd_hfi_metadata_parser()
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith
- Fix protection information ref tag for device side gen/strip
(Christoph)
- MD pull request via Yu
- fix data loss for writemostly in raid1 (Yu Kuai)
- fix potentional data loss by skipping recovery (Li Nan)
* tag 'block-6.17-20250905' of git://git.kernel.dk/linux:
md: prevent incorrect update of resync/recovery offset
md/raid1: fix data lost for writemostly rdev
nvme: fix PI insert on write
This is an update to reflect reality, not a signal of any seismic
change. Dave Sterba has been the acting maintainer for almost a decade,
I've simply been here as a backstop in case he gets hit by a bus. The
fact is we have a strong and thriving community with any number of more
active developers that can take on that role if it's necessary. I'm
exceedingly happy and proud of the work that Dave has done in keeping us
all in line, and know that if further changes need to be made it'll be
with the development community we've built throughout the lifetime of
btrfs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On arm64, it has been possible for a module's sections to be placed more
than 128M away from each other since commit:
commit 3e35d303ab ("arm64: module: rework module VA range selection")
Due to this, an ftrace callsite in a module's .init.text section can be
out of branch range for the module's ftrace PLT entry (in the module's
.text section). Any attempt to enable tracing of that callsite will
result in a BRK being patched into the callsite, resulting in a fatal
exception when the callsite is later executed.
Fix this by adding an additional trampoline for .init.text, which will
be within range.
No additional trampolines are necessary due to the way a given
module's executable sections are packed together. Any executable
section beginning with ".init" will be placed in MOD_INIT_TEXT,
and any other executable section, including those beginning with ".exit",
will be placed in MOD_TEXT.
Fixes: 3e35d303ab ("arm64: module: rework module VA range selection")
Cc: <stable@vger.kernel.org> # 6.5.x
Signed-off-by: panfan <panfan@qti.qualcomm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250905032236.3220885-1-panfan@qti.qualcomm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull drm fixes from Dave Airlie:
"Weekly drm fixes roundup, nouveau has two fixes for fence/irq racing
problems that should fix a bunch of instability in userspace.
Otherwise amdgpu along with some single fixes to bridge, xe, ivpu.
Looks about usual for this time in the release.
scheduler:
- fix race in unschedulable tracepoint
bridge:
- ti-sn65dsi86: fix REFCLK setting
xe:
- Fix incorrect migration of backed-up object to VRAM
amdgpu:
- UserQ fixes
- MES 11 fix
- eDP/LVDS fix
- Fix non-DC audio clean up
- Fix duplicate cursor issue
- Fix error path in PSP init
nouveau:
- fix nonstall interrupt handling
- fix race on fence vs irq emission
- update MAINTAINERS entry
ivpu:
- prevent recovery work during device remove"
* tag 'drm-fixes-2025-09-05' of https://gitlab.freedesktop.org/drm/kernel:
drm/amd/amdgpu: Fix missing error return on kzalloc failure
drm/bridge: ti-sn65dsi86: fix REFCLK setting
MAINTAINERS: Update git entry for nouveau
drm/xe: Fix incorrect migration of backed-up object to VRAM
drm/sched: Fix racy access to drm_sched_entity.dependency
accel/ivpu: Prevent recovery work from being queued during device removal
nouveau: Membar before between semaphore writes and the interrupt
nouveau: fix disabling the nonstall irq due to storm code
drm/amd/display: Clear the CUR_ENABLE register on DCN314 w/out DPP PG
drm/amdgpu: drop hw access in non-DC audio fini
drm/amd: Re-enable common modes for eDP and LVDS
drm/amdgpu/mes11: make MES_MISC_OP_CHANGE_CONFIG failure non-fatal
drm/amdgpu/sdma: bump firmware version checks for user queue support
Pull crypto library fixes from Eric Biggers:
"Fix a regression caused by my commits that reimplemented the sha1,
sha256, and sha512 crypto_shash algorithms on top of the library API.
Specifically, the export_core and import_core methods stopped being
supported, which broke some hardware offload drivers (such as qat)
that recently started depending on these for fallback functionality.
Later I'd like to make these drivers just use the library API for
their fallback. Then these methods won't be needed anymore. But for
now, this fixes the regression for 6.17"
* tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
crypto: sha512 - Implement export_core() and import_core()
crypto: sha256 - Implement export_core() and import_core()
crypto: sha1 - Implement export_core() and import_core()
Pull PCMCIA fixes and cleanups from Dominik Brodowski:
"A number of minor PCMCIA bugfixes and cleanups, including the removal
of unused code paths"
[ Dominik suggested this might be 6.18 material, but having looked
through this, it looks appropriate early: minor trivial fixes and then
one slightly bigger patch that removes dead code - Linus ]
* tag 'pcmcia-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
pcmcia: Add error handling for add_interval() in do_validate_mem()
pcmcia: cs: Remove unused pcmcia_get_socket_by_nr
pcmcia: omap: Add missing check for platform_get_resource
pcmcia: Use str_off_on() and str_yes_no() helpers
pcmcia: remove PCCARD_IODYN
pcmcia: ds: Emphasize "really" epizeuxis
pcmcia: Fix a NULL pointer dereference in __iodyn_find_io_region()
pcmcia: omap_cf: Mark driver struct with __refdata to prevent section mismatch
When a PCI device is removed with surprise hotplug, there may still be
attempts to attach the device to the default domain as part of tear down
via (__iommu_release_dma_ownership()), or because the removal happens
during probe (__iommu_probe_device()). In both cases zpci_register_ioat()
fails with a cc value indicating that the device handle is invalid. This
is because the device is no longer part of the instance as far as the
hypervisor is concerned.
Currently this leads to an error return and s390_iommu_attach_device()
fails. This triggers the WARN_ON() in __iommu_group_set_domain_nofail()
because attaching to the default domain must never fail.
With the device fenced by the hypervisor no DMAs to or from memory are
possible and the IOMMU translations have no effect. Proceed as if the
registration was successful and let the hotplug event handling clean up
the device.
This is similar to how devices in the error state are handled since
commit 59bbf59679 ("iommu/s390: Make attach succeed even if the device
is in error state") except that for removal the domain will not be
registered later. This approach was also previously discussed at the
link.
Handle both cases, error state and removal, in a helper which checks if
the error needs to be propagated or ignored. Avoid magic number
condition codes by using the pre-existing, but never used, defines for
PCI load/store condition codes and rename them to reflect that they
apply to all PCI instructions.
Cc: stable@vger.kernel.org # v6.2
Link: https://lore.kernel.org/linux-iommu/20240808194155.GD1985367@ziepe.ca/
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Link: https://lore.kernel.org/r/20250904-iommu_succeed_attach_removed-v1-1-e7f333d2f80f@linux.ibm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
switch_to_super_page() assumes the memory range it's working on is aligned
to the target large page level. Unfortunately, __domain_mapping() doesn't
take this into account when using it, and will pass unaligned ranges
ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400,
0x4c0600), which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
to switch_to_super_page() at a 1 GiB granularity, which will in turn
free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to
switch_to_super_page() in __domain_mapping()
to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
Signed-off-by: Eugene Koira <eugkoira@amazon.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20250826143816.38686-1-eugkoira@amazon.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
zpci_get_iommu_ctrs() returns counter information to be reported as part
of device statistics; these counters are stored as part of the s390_domain.
The problem, however, is that the identity domain is not backed by an
s390_domain and so the conversion via to_s390_domain() yields a bad address
that is zero'd initially and read on-demand later via a sysfs read.
These counters aren't necessary for the identity domain; just return NULL
in this case.
This issue was discovered via KASAN with reports that look like:
BUG: KASAN: global-out-of-bounds in zpci_fmb_enable_device
when using the identity domain for a device on s390.
Cc: stable@vger.kernel.org
Fixes: 64af12c6ec ("iommu/s390: implement iommu passthrough via identity domain")
Reported-by: Cam Miller <cam@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Cam Miller <cam@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20250827210828.274527-1-mjrosato@linux.ibm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Pull MD fixes from Yu:
"- fix data loss for writemostly in raid1, by Yu Kuai;
- fix potentional data lost by skipping recovery, by Li Nan;"
* tag 'md-6.17-20250905' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
md: prevent incorrect update of resync/recovery offset
md/raid1: fix data lost for writemostly rdev
When freeing an S2 MMU, we free the associated pgd, but omit to
mark the structure as invalid. Subsequently, a call to
kvm_nested_s2_unmap() would pick these invalid S2 MMUs and
pass them down the teardown path.
This ends up with a nasty warning as we try to unmap an unallocated
set of page tables.
Fix this by making the S2 MMU invalid on freeing the pgd by calling
kvm_init_nested_s2_mmu().
Fixes: 4f128f8e1a ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250905072859.211369-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Multiple DRM Rust drivers (e.g. nova-core, nova-drm, Tyr, rvkms) are in
development, with at least Nova and (soon) Tyr already upstream. Having a
shared tree will ease and accelerate development, since all drivers can
consume new infrastructure in the same release cycle.
This includes infrastructure shared with other subsystem trees (e.g. Rust
or driver-core). By consolidating in drm-rust, we avoid adding extra
burden to drm-misc maintainers, e.g. dealing with cross-tree topic
branches.
The drm-misc tree is not a good fit for this stage of development, since
its documented scope is small drivers with occasional large series.
Rust drivers in development upstream, however, regularly involve large
patch series, new infrastructure, and shared topic branches, which may
not align well with drm-misc at this stage.
The drm-rust tree may not be a permanent solution. Once the core Rust,
DRM, and KMS infrastructure have stabilized, drivers and infrastructure
changes are expected to transition into drm-misc or standalone driver
trees respectively. Until then, drm-rust provides a dedicated place to
coordinate development without disrupting existing workflows too much.
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Janne Grunau <j@jannau.net>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Daniel Almeida <daniel.almeida@collabora.com>
Acked-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250901202850.208116-1-dakr@kernel.org
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
commit edf2cadf01 ("perf test: add test for BPF metadata collection")
fails consistently on the version string check. The perf version
string on some of the constant integration test machines contains
characters with special meaning in grep's extended regular expression
matching algorithm. The output of perf version is:
# perf version
perf version 6.17.0-20250814.rc1.git20.24ea63ea3877.63.fc42.s390x+git
#
and the '+' character has special meaning in egrep command.
Also the use of egrep is deprecated.
Change the perf version string check to fixed character matching
and get rid of egrep's warning being deprecated. Use grep -F instead.
Output before:
# perf test -F 102
Checking BPF metadata collection
egrep: warning: egrep is obsolescent; using grep -E
Basic BPF metadata test [Failed invalid output]
102: BPF metadata collection test : FAILED!
#
Output after:
# perf test -F 102
Checking BPF metadata collection
Basic BPF metadata test [Success]
102: BPF metadata collection test : Ok
#
Fixes: edf2cadf01 ("perf test: add test for BPF metadata collection")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Blake Jones <blakejones@google.com>
Link: https://lore.kernel.org/r/20250822122540.4104658-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Pull NVMe fix from Keith:
"nvme fixes for 6.17
- Fix protection information ref tag for device side gen/strip
(Christoph)"
* tag 'nvme-6.17-2025-09-04' of git://git.infradead.org/nvme:
nvme: fix PI insert on write
When building with CONFIG_CMODEL_MEDLOW and CONFIG_LTO_CLANG, there is a
series of errors due to some files being unconditionally compiled with
'-mcmodel=medany', mismatching with the rest of the kernel built with
'-mcmodel=medlow':
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values: 'i32 3' from vmlinux.a(init.o at 899908), and 'i32 1' from vmlinux.a(net-traces.o at 1014628)
Only allow LTO to be performed when CONFIG_CMODEL_MEDANY is enabled to
ensure there will be no code model mismatch errors. An alternative
solution would be disabling LTO for the files with a different code
model than the main kernel like some specialized areas of the kernel do
but doing that for individual files is not as sustainable than
forbidding the combination altogether.
Cc: stable@vger.kernel.org
Fixes: 021d23428b ("RISC-V: build: Allow LTO to be selected")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506290255.KBVM83vZ-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250710-riscv-restrict-lto-to-medany-v1-1-b1dac9871ecf@kernel.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>:
Just some minor SDCA bug fixes and some minor structure reordering to
improve the padding.
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and Bluetooth.
We're reverting the removal of a Sundance driver, a user has appeared.
This makes the PR rather large in terms of LoC.
There's a conspicuous absence of real, user-reported 6.17 issues.
Slightly worried that the summer distracted people from testing.
Previous releases - regressions:
- ax25: properly unshare skbs in ax25_kiss_rcv()
Previous releases - always broken:
- phylink: disable autoneg for interfaces that have no inband, fix
regression on pcs-lynx (NXP LS1088)
- vxlan: fix null-deref when using nexthop objects
- batman-adv: fix OOB read/write in network-coding decode
- icmp: icmp_ndo_send: fix reversing address translation for replies
- tcp: fix socket ref leak in TCP-AO failure handling for IPv6
- mctp:
- mctp_fraq_queue should take ownership of passed skb
- usb: initialise mac header in RX path, avoid WARN
- wifi: mac80211: do not permit 40 MHz EHT operation on 5/6 GHz,
respect device limitations
- wifi: wilc1000: avoid buffer overflow in WID string configuration
- wifi: mt76:
- fix regressions from mt7996 MLO support rework
- fix offchannel handling issues on mt7996
- fix multiple wcid linked list corruption issues
- mt7921: don't disconnect when AP requests switch to a channel
which requires radar detection
- mt7925u: use connac3 tx aggr check in tx complete
- wifi: intel:
- improve validation of ACPI DSM data
- cfg: restore some 1000 series configs
- wifi: ath:
- ath11k: a fix for GTK rekeying
- ath12k: a missed WiFi7 capability (multi-link EMLSR)
- eth: intel:
- ice: fix races in "low latency" firmware interface for Tx timestamps
- idpf: set mac type when adding and removing MAC filters
- i40e: remove racy read access to some debugfs files
Misc:
- Revert "eth: remove the DLink/Sundance (ST201) driver"
- netfilter: conntrack: helper: Replace -EEXIST by -EBUSY, avoid
confusing modprobe"
* tag 'net-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
phy: mscc: Stop taking ts_lock for tx_queue and use its own lock
selftest: net: Fix weird setsockopt() in bind_bhash.c.
MAINTAINERS: add Sabrina to TLS maintainers
gve: update MAINTAINERS
ppp: fix memory leak in pad_compress_skb
net: xilinx: axienet: Add error handling for RX metadata pointer retrieval
net: atm: fix memory leak in atm_register_sysfs when device_register fail
netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
selftests: netfilter: fix udpclash tool hang
ax25: properly unshare skbs in ax25_kiss_rcv()
mctp: return -ENOPROTOOPT for unknown getsockopt options
net/smc: Remove validation of reserved bits in CLC Decline message
ipv4: Fix NULL vs error pointer check in inet_blackhole_dev_init()
net: thunder_bgx: decrement cleanup index before use
net: thunder_bgx: add a missing of_node_put
net: phylink: move PHY interrupt request to non-fail path
net: lockless sock_i_ino()
tools: ynl-gen: fix nested array counting
wifi: wilc1000: avoid buffer overflow in WID string configuration
wifi: cfg80211: sme: cap SSID length in __cfg80211_connect_result()
...
Pull slab fixes from Vlastimil Babka:
- Stable fix to make slub_debug code not access invalid pointers in the
process of reporting issues (Li Qiong)
- Stable fix to make object tracking pass gfp flags to stackdepot to
avoid deadlock in contexts that can't even wake up kswapd due to e.g.
timers debugging enabled (yangshiguang)
* tag 'slab-for-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm: slub: avoid wake up kswapd in set_track_prepare
mm/slub: avoid accessing metadata when pointer is invalid in object_err()
Commit bb4a0f497b ("ASoC: codecs: lpass: Drop unused
AIF_INVALID first DAI identifier") removed first entry in enum with DAI
identifiers, because it looked unused. Turns out that there is a
relation between DAI ID and "WSA RX0 Mux"-like kcontrols (which use
"rx_mux_text" array). That "rx_mux_text" array used first three entries
of DAI IDs enum, with value '0' being invalid.
The value passed tp "WSA RX0 Mux"-like kcontrols was used as DAI ID and
set to configure active channel count and mask, which are arrays indexed
by DAI ID.
After removal of first AIF_INVALID DAI identifier, this kcontrol was
updating wrong entries in active channel count and mask arrays which was
visible in reduced quality (distortions) during speaker playback on
several boards like Lenovo T14s laptop and Qualcomm SM8550-based boards.
Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
Fixes: bb4a0f497b ("ASoC: codecs: lpass: Drop unused AIF_INVALID first DAI identifier")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Message-ID: <20250831151401.30897-2-krzysztof.kozlowski@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit bb4a0f497b ("ASoC: codecs: lpass: Drop unused
AIF_INVALID first DAI identifier") removed first entry in enum with DAI
identifiers, because it looked unused. Turns out that there is a
relation between DAI ID and "RX_MACRO RX0 MUX"-like kcontrols which use
"rx_macro_mux_text" array. That "rx_macro_mux_text" array used first
three entries of DAI IDs enum, with value '0' being invalid.
The value passed tp "RX_MACRO RX0 MUX"-like kcontrols was used as DAI ID
and set to configure active channel count and mask, which are arrays
indexed by DAI ID.
After removal of first AIF_INVALID DAI identifier, this kcontrol was
updating wrong entries in active channel count and mask arrays which was
visible in reduced quality (distortions) during headset playback on the
Qualcomm SM8750 MTP8750 board. It seems it also fixes recording silence
(instead of actual sound) via headset, even though that's different
macro codec.
Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
Fixes: bb4a0f497b ("ASoC: codecs: lpass: Drop unused AIF_INVALID first DAI identifier")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Message-ID: <20250901074403.137263-2-krzysztof.kozlowski@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
There can be a NULL pointer dereference bug here. NULL is passed to
__cifs_sfu_make_node without checks, which passes it unchecked to
cifs_strndup_to_utf16, which in turn passes it to
cifs_local_to_utf16_bytes where '*from' is dereferenced, causing a crash.
This patch adds a check for NULL 'src' in cifs_strndup_to_utf16 and
returns NULL early to prevent dereferencing NULL pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE
Signed-off-by: Makar Semyonov <m.semenov@tssltd.ru>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
In md_do_sync(), when md_sync_action returns ACTION_FROZEN, subsequent
call to md_sync_position() will return MaxSector. This causes
'curr_resync' (and later 'recovery_offset') to be set to MaxSector too,
which incorrectly signals that recovery/resync has completed, even though
disk data has not actually been updated.
To fix this issue, skip updating any offset values when the sync action
is FROZEN. The same holds true for IDLE.
Fixes: 7d9f107a4e ("md: use new helpers in md_do_sync()")
Signed-off-by: Li Nan <linan122@huawei.com>
Link: https://lore.kernel.org/linux-raid/20250904073452.3408516-1-linan666@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
When transmitting a PTP frame which is timestamp using 2 step, the
following warning appears if CONFIG_PROVE_LOCKING is enabled:
=============================
[ BUG: Invalid wait context ]
6.17.0-rc1-00326-ge6160462704e #427 Not tainted
-----------------------------
ptp4l/119 is trying to lock:
c2a44ed4 (&vsc8531->ts_lock){+.+.}-{3:3}, at: vsc85xx_txtstamp+0x50/0xac
other info that might help us debug this:
context-{4:4}
4 locks held by ptp4l/119:
#0: c145f068 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x58/0x1440
#1: c29df974 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x5c4/0x1440
#2: c2aaaad0 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x108/0x350
#3: c2aac170 (&lan966x->tx_lock){+.-.}-{2:2}, at: lan966x_port_xmit+0xd0/0x350
stack backtrace:
CPU: 0 UID: 0 PID: 119 Comm: ptp4l Not tainted 6.17.0-rc1-00326-ge6160462704e #427 NONE
Hardware name: Generic DT based system
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x7c/0xac
dump_stack_lvl from __lock_acquire+0x8e8/0x29dc
__lock_acquire from lock_acquire+0x108/0x38c
lock_acquire from __mutex_lock+0xb0/0xe78
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from vsc85xx_txtstamp+0x50/0xac
vsc85xx_txtstamp from lan966x_fdma_xmit+0xd8/0x3a8
lan966x_fdma_xmit from lan966x_port_xmit+0x1bc/0x350
lan966x_port_xmit from dev_hard_start_xmit+0xc8/0x2c0
dev_hard_start_xmit from sch_direct_xmit+0x8c/0x350
sch_direct_xmit from __dev_queue_xmit+0x680/0x1440
__dev_queue_xmit from packet_sendmsg+0xfa4/0x1568
packet_sendmsg from __sys_sendto+0x110/0x19c
__sys_sendto from sys_send+0x18/0x20
sys_send from ret_fast_syscall+0x0/0x1c
Exception stack(0xf0b05fa8 to 0xf0b05ff0)
5fa0: 00000001 0000000e 0000000e 0004b47a 0000003a 00000000
5fc0: 00000001 0000000e 00000000 00000121 0004af58 00044874 00000000 00000000
5fe0: 00000001 bee9d420 00025a10 b6e75c7c
So, instead of using the ts_lock for tx_queue, use the spinlock that
skb_buff_head has.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Fixes: 7d272e63e0 ("net: phy: mscc: timestamping and PHC support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250902121259.3257536-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bind_bhash.c passes (SO_REUSEADDR | SO_REUSEPORT) to setsockopt().
In the asm-generic definition, the value happens to match with the
bare SO_REUSEPORT, (2 | 15) == 15, but not on some arch.
arch/alpha/include/uapi/asm/socket.h:18:#define SO_REUSEADDR 0x0004
arch/alpha/include/uapi/asm/socket.h:24:#define SO_REUSEPORT 0x0200
arch/mips/include/uapi/asm/socket.h:24:#define SO_REUSEADDR 0x0004 /* Allow reuse of local addresses. */
arch/mips/include/uapi/asm/socket.h:33:#define SO_REUSEPORT 0x0200 /* Allow local address and port reuse. */
arch/parisc/include/uapi/asm/socket.h:12:#define SO_REUSEADDR 0x0004
arch/parisc/include/uapi/asm/socket.h:18:#define SO_REUSEPORT 0x0200
arch/sparc/include/uapi/asm/socket.h:13:#define SO_REUSEADDR 0x0004
arch/sparc/include/uapi/asm/socket.h:20:#define SO_REUSEPORT 0x0200
include/uapi/asm-generic/socket.h:12:#define SO_REUSEADDR 2
include/uapi/asm-generic/socket.h:27:#define SO_REUSEPORT 15
Let's pass SO_REUSEPORT only.
Fixes: c35ecb95c4 ("selftests/net: Add test for timing a bind request to a port with a populated bhash entry")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250903222938.2601522-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If alloc_skb() fails in pad_compress_skb(), it returns NULL without
releasing the old skb. The caller does:
skb = pad_compress_skb(ppp, skb);
if (!skb)
goto drop;
drop:
kfree_skb(skb);
When pad_compress_skb() returns NULL, the reference to the old skb is
lost and kfree_skb(skb) ends up doing nothing, leading to a memory leak.
Align pad_compress_skb() semantics with realloc(): only free the old
skb if allocation and compression succeed. At the call site, use the
new_skb variable so the original skb is not lost when pad_compress_skb()
fails.
Fixes: b3f9b92a6e ("[PPP]: add PPP MPPE encryption module")
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250903100726.269839-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add proper error checking for dmaengine_desc_get_metadata_ptr() which
can return an error pointer and lead to potential crashes or undefined
behaviour if the pointer retrieval fails.
Properly handle the error by unmapping DMA buffer, freeing the skb and
returning early to prevent further processing with invalid data.
Fixes: 6a91b846af ("net: axienet: Introduce dmaengine support")
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://patch.msgid.link/20250903025213.3120181-1-abin.joseph@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal says:
====================
netfilter: updates for net
1) Fix a silly bug in conntrack selftest, busyloop may get optimized to
for (;;), reported by Yi Chen.
2) Introduce new NFTA_DEVICE_PREFIX attribute in nftables netlink api,
re-using old NFTA_DEVICE_NAME led to confusion with different
kernel/userspace versions. This refines the wildcard interface
support added in 6.16 release. From Phil Sutter.
* tag 'nf-25-09-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
selftests: netfilter: fix udpclash tool hang
====================
Link: https://patch.msgid.link/20250904072548.3267-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a proximity event node is defined so as to specify the wake-up
properties of the touch surface, the proximity event interrupt is
enabled unconditionally. This may result in unwanted interrupts.
Solve this problem by enabling the interrupt only if the event is
mapped to a key or switch code.
Signed-off-by: Jeff LaBundy <jeff@labundy.com>
Link: https://lore.kernel.org/r/aKJxxgEWpNaNcUaW@nixie71
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The MBQ size function returns an integer representing the size of a
Control. Currently if the Control is not found the function will return
false which makes little sense. Correct this typo to return -EINVAL.
Fixes: e3f7caf74b ("ASoC: SDCA: Add generic regmap SDCA helpers")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Message-ID: <20250820163717.1095846-2-ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
If earlier opening of source graph fails (e.g. ADSP rejects due to
incorrect audioreach topology), the graph is closed and
"dai_data->graph[dai->id]" is assigned NULL. Preparing the DAI for sink
graph continues though and next call to q6apm_lpass_dai_prepare()
receives dai_data->graph[dai->id]=NULL leading to NULL pointer
exception:
qcom-apm gprsvc:service:2:1: Error (1) Processing 0x01001002 cmd
qcom-apm gprsvc:service:2:1: DSP returned error[1001002] 1
q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: fail to start APM port 78
q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: ASoC: error at snd_soc_pcm_dai_prepare on TX_CODEC_DMA_TX_3: -22
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
...
Call trace:
q6apm_graph_media_format_pcm+0x48/0x120 (P)
q6apm_lpass_dai_prepare+0x110/0x1b4
snd_soc_pcm_dai_prepare+0x74/0x108
__soc_pcm_prepare+0x44/0x160
dpcm_be_dai_prepare+0x124/0x1c0
Fixes: 30ad723b93 ("ASoC: qdsp6: audioreach: add q6apm lpass dai support")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Message-ID: <20250904101849.121503-2-krzysztof.kozlowski@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
When device_register() return error in atm_register_sysfs(), which can be
triggered by kzalloc fail in device_private_init() or other reasons,
kmemleak reports the following memory leaks:
unreferenced object 0xffff88810182fb80 (size 8):
comm "insmod", pid 504, jiffies 4294852464
hex dump (first 8 bytes):
61 64 75 6d 6d 79 30 00 adummy0.
backtrace (crc 14dfadaf):
__kmalloc_node_track_caller_noprof+0x335/0x450
kvasprintf+0xb3/0x130
kobject_set_name_vargs+0x45/0x120
dev_set_name+0xa9/0xe0
atm_register_sysfs+0xf3/0x220
atm_dev_register+0x40b/0x780
0xffffffffa000b089
do_one_initcall+0x89/0x300
do_init_module+0x27b/0x7d0
load_module+0x54cd/0x5ff0
init_module_from_file+0xe4/0x150
idempotent_init_module+0x32c/0x610
__x64_sys_finit_module+0xbd/0x120
do_syscall_64+0xa8/0x270
entry_SYSCALL_64_after_hwframe+0x77/0x7f
When device_create_file() return error in atm_register_sysfs(), the same
issue also can be triggered.
Function put_device() should be called to release kobj->name memory and
other device resource, instead of kfree().
Fixes: 1fa5ae857b ("driver core: get rid of struct device's bus_id string array")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901063537.1472221-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This new attribute is supposed to be used instead of NFTA_DEVICE_NAME
for simple wildcard interface specs. It holds a NUL-terminated string
representing an interface name prefix to match on.
While kernel code to distinguish full names from prefixes in
NFTA_DEVICE_NAME is simpler than this solution, reusing the existing
attribute with different semantics leads to confusion between different
versions of kernel and user space though:
* With old kernels, wildcards submitted by user space are accepted yet
silently treated as regular names.
* With old user space, wildcards submitted by kernel may cause crashes
since libnftnl expects NUL-termination when there is none.
Using a distinct attribute type sanitizes these situations as the
receiving part detects and rejects the unexpected attribute nested in
*_HOOK_DEVS attributes.
Fixes: 6d07a28950 ("netfilter: nf_tables: Support wildcard netdev hook specs")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Yi Chen reports that 'udpclash' loops forever depending on compiler
(and optimization level used); while (x == 1) gets optimized into
for (;;). Add volatile qualifier to avoid that.
While at it, also run it under timeout(1) and fix the resize script
to not ignore the timeout passed as second parameter to insert_flood.
Reported-by: Yi Chen <yiche@redhat.com>
Suggested-by: Yi Chen <yiche@redhat.com>
Fixes: 78a5883635 ("selftests: netfilter: add conntrack clash resolution test case")
Signed-off-by: Florian Westphal <fw@strlen.de>
Pull smb server fix from Steve French:
- fix handling filenames with ":" (colon) in them
* tag 'v6.17-rc4-ksmbd-fix' of git://git.samba.org/ksmbd:
ksmbd: allow a filename to contain colons on SMB3.1.1 posix extensions
Patch series "mm/damon: avoid divide-by-zero in DAMON module's parameters
application".
DAMON's RECLAIM and LRU_SORT modules perform no validation on
user-configured parameters during application, which may lead to
division-by-zero errors.
Avoid the divide-by-zero by adding validation checks when DAMON modules
attempt to apply the parameters.
This patch (of 2):
During the calculation of 'hot_thres' and 'cold_thres', either
'sample_interval' or 'aggr_interval' is used as the divisor, which may
lead to division-by-zero errors. Fix it by directly returning -EINVAL
when such a case occurs. Additionally, since 'aggr_interval' is already
required to be set no smaller than 'sample_interval' in damon_set_attrs(),
only the case where 'sample_interval' is zero needs to be checked.
Link: https://lkml.kernel.org/r/20250827115858.1186261-2-yanquanmin1@huawei.com
Fixes: 40e983cca9 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org> [6.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kernel initializes the "jiffies" timer as 5 minutes below zero, as shown
in include/linux/jiffies.h
/*
* Have the 32 bit jiffies value wrap 5 minutes after boot
* so jiffies wrap bugs show up earlier.
*/
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
And jiffies comparison help functions cast unsigned value to signed to
cover wraparound
#define time_after_eq(a,b) \
(typecheck(unsigned long, a) && \
typecheck(unsigned long, b) && \
((long)((a) - (b)) >= 0))
When quota->charged_from is initialized to 0, time_after_eq() can
incorrectly return FALSE even after reset_interval has elapsed. This
occurs when (jiffies - reset_interval) produces a value with MSB=1, which
is interpreted as negative in signed arithmetic.
This issue primarily affects 32-bit systems because: On 64-bit systems:
MSB=1 values occur after ~292 million years from boot (assuming HZ=1000),
almost impossible.
On 32-bit systems: MSB=1 values occur during the first 5 minutes after
boot, and the second half of every jiffies wraparound cycle, starting from
day 25 (assuming HZ=1000)
When above unexpected FALSE return from time_after_eq() occurs, the
charging window will not reset. The user impact depends on esz value at
that time.
If esz is 0, scheme ignores configured quotas and runs without any limits.
If esz is not 0, scheme stops working once the quota is exhausted. It
remains until the charging window finally resets.
So, change quota->charged_from to jiffies at damos_adjust_quota() when it
is considered as the first charge window. By this change, we can avoid
unexpected FALSE return from time_after_eq()
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com
Fixes: 2b8a248d58 ("mm/damon/schemes: implement size quota for schemes application speed control") # 5.16
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When restoring a reservation for an anonymous page, we need to check to
freeing a surplus. However, __unmap_hugepage_range() causes data race
because it reads h->surplus_huge_pages without the protection of
hugetlb_lock.
And adjust_reservation is a boolean variable that indicates whether
reservations for anonymous pages in each folio should be restored.
Therefore, it should be initialized to false for each round of the loop.
However, this variable is not initialized to false except when defining
the current adjust_reservation variable.
This means that once adjust_reservation is set to true even once within
the loop, reservations for anonymous pages will be restored
unconditionally in all subsequent rounds, regardless of the folio's state.
To fix this, we need to add the missing hugetlb_lock, unlock the
page_table_lock earlier so that we don't lock the hugetlb_lock inside the
page_table_lock lock, and initialize adjust_reservation to false on each
round within the loop.
Link: https://lkml.kernel.org/r/20250823182115.1193563-1-aha310510@gmail.com
Fixes: df7a6d1f64 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reported-by: syzbot+417aeb05fd190f3a6da9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In do_migrate_range(), the hwpoisoned folio may be large folio, which
can't be handled by unmap_poisoned_folio().
I can reproduce this issue in qemu after adding delay in memory_failure()
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:try_to_unmap_one+0x16a/0xfc0
<TASK>
rmap_walk_anon+0xda/0x1f0
try_to_unmap+0x78/0x80
? __pfx_try_to_unmap_one+0x10/0x10
? __pfx_folio_not_mapped+0x10/0x10
? __pfx_folio_lock_anon_vma_read+0x10/0x10
unmap_poisoned_folio+0x60/0x140
do_migrate_range+0x4d1/0x600
? slab_memory_callback+0x6a/0x190
? notifier_call_chain+0x56/0xb0
offline_pages+0x3e6/0x460
memory_subsys_offline+0x130/0x1f0
device_offline+0xba/0x110
acpi_bus_offline+0xb7/0x130
acpi_scan_hot_remove+0x77/0x290
acpi_device_hotplug+0x1e0/0x240
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x186/0x340
Besides, do_migrate_range() may be called between memory_failure set
hwpoison flag and isolate the folio from lru, so remove WARN_ON(). In other
places, unmap_poisoned_folio() is called when the folio is isolated, obey
it in do_migrate_range() too.
[david@redhat.com: don't abort offlining, fixed typo, add comment]
Link: https://lkml.kernel.org/r/3c214dff-9649-4015-840f-10de0e03ebe4@redhat.com
Fixes: b15c87263a ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Luis Chamberalin <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Raghav <kernel@pankajraghav.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In mctp_getsockopt(), unrecognized options currently return -EINVAL.
In contrast, mctp_setsockopt() returns -ENOPROTOOPT for unknown
options.
Update mctp_getsockopt() to also return -ENOPROTOOPT for unknown
options. This aligns the behavior of getsockopt() and setsockopt(),
and matches the standard kernel socket API convention for handling
unsupported options.
Fixes: 99ce45d5e7 ("mctp: Implement extended addressing")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250902102059.1370008-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The blamed commit added code which could return an error after we
requested the PHY interrupt. When we return an error, the caller
will call phy_detach() which fails to free the interrupt.
Rearrange the code such that failing operations happen before the
interrupt is requested, thereby allowing phy_detach() to be used.
Note that replacing phy_detach() with phy_disconnect() in these
paths could lead to freeing an interrupt which was never requested.
Fixes: 1942b1c6f6 ("net: phylink: make configuring clock-stop dependent on MAC support")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1ut35k-00000001UEl-0iq6@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-09-02 (ice, idpf, i40e, ixgbe, e1000e)
For ice:
Jake adds checks for initialization of Tx timestamp tracking structure
to prevent NULL pointer dereferences.
For idpf:
Josh moves freeing of auxiliary device id to prevent use-after-free issue.
Emil sets, expected, MAC type value when sending virtchnl add/delete MAC
commands.
For i40e:
Jake removes read debugfs access as 'netdev_ops' has the possibility to
overflow.
Zhen Ni adds handling for when MAC list is empty.
For ixgbe:
Alok Tiwari corrects bitmap being used for link speeds.
For e1000e:
Vitaly adds check to ensure overflow does not occur in
e1000_set_eeprom().
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: fix heap overflow in e1000_set_eeprom
ixgbe: fix incorrect map used in eee linkmode
i40e: Fix potential invalid access when MAC list is empty
i40e: remove read access to debugfs files
idpf: set mac type when adding and removing MAC filters
idpf: fix UAF in RDMA core aux dev deinitialization
ice: fix NULL access of tx->in_use in ice_ll_ts_intr
ice: fix NULL access of tx->in_use in ice_ptp_ts_irq
====================
Link: https://patch.msgid.link/20250902232131.2739555-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the SMBus Quick operation from this driver because it is not
natively supported by the hardware and is wrongly implemented in the
driver.
The I2C controllers in Realtek RTL9300 and RTL9310 are SMBus-compliant
but there doesn't seem to be native support for the SMBus Quick
operation. It is not explicitly mentioned in the documentation but
looking at the registers which configure an SMBus transaction, one can
see that the data length cannot be set to 0. This suggests that the
hardware doesn't allow any SMBus message without data bytes (except for
those it does on it's own, see SMBus Block Read).
The current implementation of SMBus Quick operation passes a length of
0 (which is actually invalid). Before the fix of a bug in a previous
commit, this led to a read operation of 16 bytes from any register (the
one of a former transaction or any other value.
This caused issues like soft-bricked SFP modules after a simple probe
with i2cdetect which uses Quick by default. Running this with SFP
modules whose EEPROM isn't write-protected, some of the initial bytes
are overwritten because a 16-byte write operation is executed instead of
a Quick Write. (This temporarily soft-bricked one of my DAC cables.)
Because SMBus Quick operation is obviously not supported on these
controllers (because a length of 0 cannot be set, even when no register
address is set), remove that instead of claiming there is support. There
also shouldn't be any kind of emulated 'Quick' which just does another
kind of operation in the background. Otherwise, specific issues occur
in case of a 'Quick' Write which actually writes unknown data to an
unknown register.
Fixes: c366be7202 ("i2c: Add driver for the RTL9300 I2C controller")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
Tested-by: Sven Eckelmann <sven@narfation.org>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> # On RTL9302C based board
Tested-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250831100457.3114-4-jelonek.jonas@gmail.com
Add an explicit check for the xfer length to 'rtl9300_i2c_config_xfer'
to ensure the data length isn't within the supported range. In
particular a data length of 0 is not supported by the hardware and
causes unintended or destructive behaviour.
This limitation becomes obvious when looking at the register
documentation [1]. 4 bits are reserved for DATA_WIDTH and the value
of these 4 bits is used as N + 1, allowing a data length range of
1 <= len <= 16.
Affected by this is the SMBus Quick Operation which works with a data
length of 0. Passing 0 as the length causes an underflow of the value
due to:
(len - 1) & 0xf
and effectively specifying a transfer length of 16 via the registers.
This causes a 16-byte write operation instead of a Quick Write. For
example, on SFP modules without write-protected EEPROM this soft-bricks
them by overwriting some initial bytes.
For completeness, also add a quirk for the zero length.
[1] https://svanheule.net/realtek/longan/register/i2c_mst1_ctrl2
Fixes: c366be7202 ("i2c: Add driver for the RTL9300 I2C controller")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
Tested-by: Sven Eckelmann <sven@narfation.org>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> # On RTL9302C based board
Tested-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250831100457.3114-3-jelonek.jonas@gmail.com
The blamed commit introduced the concept of split attribute
counting, and later allocating an array to hold them, however
TypeArrayNest wasn't updated to use the new counting variable.
Abbreviated example from tools/net/ynl/generated/nl80211-user.c:
nl80211_if_combination_attributes_parse(...):
unsigned int n_limits = 0;
[...]
ynl_attr_for_each(attr, nlh, yarg->ys->family->hdr_len)
if (type == NL80211_IFACE_COMB_LIMITS)
ynl_attr_for_each_nested(attr2, attr)
dst->_count.limits++;
if (n_limits) {
dst->_count.limits = n_limits;
/* allocate and parse attributes */
}
In the above example n_limits is guaranteed to always be 0,
hence the conditional is unsatisfiable and is optimized out.
This patch changes the attribute counting to use n_limits++ in the
attribute counting loop in the above example.
Fixes: 58da455b31 ("tools: ynl-gen: improve unwind on parsing errors")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://patch.msgid.link/20250902160001.760953-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Just a few updates:
- a set of buffer overflow fixes
- ath11k: a fix for GTK rekeying
- ath12k: a missed WiFi7 capability
* tag 'wireless-2025-09-03' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: wilc1000: avoid buffer overflow in WID string configuration
wifi: cfg80211: sme: cap SSID length in __cfg80211_connect_result()
wifi: libertas: cap SSID len in lbs_associate()
wifi: cw1200: cap SSID length in cw1200_do_join()
wifi: ath11k: fix group data packet drops during rekey
wifi: ath12k: Set EMLSR support flag in MLO flags for EML-capable stations
====================
Link: https://patch.msgid.link/20250903075602.30263-4-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull SoC fixes from Arnd Bergmann:
"These are mainly devicetree fixes for the rockchip and nxp platforms
on arm64, addressing mistakes in the board and soc specific
descriptions.
In particular the newly added Rock 5T board required multiple bugfixes
for PCIe and USB, while on the i.MX platform there are a number of
regulator related fixes. The only other platforms with devicetree
fixes are at91 with a fixup for SD/MMC and a change to enable all the
available UARTS on the Axiado reference board.
Also on the at91 platform, a Kconfig change addresses a regression
that stopped the DMA engine from working in 6.17-rc.
Three drivers each have a simple bugfix, stopping incorrect behavior
in op-tee firmware, the tee subsystem and the qualcomm mdt_loader.
Two trivial MAINTAINERS file changes are needed to make sure that
patches reach the correct maintainer, but don't change the actual
responsibilities"
* tag 'soc-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (27 commits)
ARM: dts: microchip: sama7d65: Force SDMMC Legacy mode
ARM: at91: select ARCH_MICROCHIP
arm64: dts: rockchip: fix second M.2 slot on ROCK 5T
arm64: dts: rockchip: fix USB on RADXA ROCK 5T
MAINTAINERS: exclude defconfig from ARM64 PORT
arm64: dts: axiado: Add missing UART aliases
MAINTAINERS: Update Nobuhiro Iwamatsu's email address
arm64: dts: rockchip: Add vcc-supply to SPI flash on Pinephone Pro
arm64: dts: rockchip: fix es8388 address on rk3588s-roc-pc
arm64: dts: rockchip: Fix Bluetooth interrupts flag on Neardi LBA3368
arm64: dts: rockchip: correct network description on Sige5
arm64: dts: rockchip: Minor whitespace cleanup
ARM: dts: rockchip: Minor whitespace cleanup
arm64: dts: rockchip: Add supplies for eMMC on rk3588-orangepi-5
arm64: dts: rockchip: Fix the headphone detection on the orangepi 5 plus
arm64: dts: imx95: Fix JPEG encoder node assigned clock
arm64: dts: imx95-19x19-evk: correct the phy setting for flexcan1/2
arm64: dts: imx8mp: Fix missing microSD slot vqmmc on Data Modul i.MX8M Plus eDM SBC
arm64: dts: imx8mp: Fix missing microSD slot vqmmc on DH electronics i.MX8M Plus DHCOM
arm64: dts: imx8mp-tqma8mpql: remove virtual 3.3V regulator
...
This reverts:
commit bead880022 ("drm/nouveau: Remove waitque for sched teardown")
commit 5f46f5c7af ("drm/nouveau: Add new callback for scheduler teardown")
from the drm/sched teardown leak fix series:
https://lore.kernel.org/dri-devel/20250710125412.128476-2-phasta@kernel.org/
The aforementioned series removed a blocking waitqueue from
nouveau_sched_fini(). It was mistakenly assumed that this waitqueue only
prevents jobs from leaking, which the series fixed.
The waitqueue, however, also guarantees that all VM_BIND related jobs
are finished in order, cleaning up mappings in the GPU's MMU. These jobs
must be executed sequentially. Without the waitqueue, this is no longer
guaranteed, because entity and scheduler teardown can race with each
other.
Revert all patches related to the waitqueue removal.
Fixes: bead880022 ("drm/nouveau: Remove waitque for sched teardown")
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250901083107.10206-2-phasta@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Microchip AT91 fixes for v6.17
This update includes:
- adaptation to the SDHCI capabilities on sama7d65 curiosity board DT as
SDHCI quirks are not in place yet. SD/MMC don't work without these
- addition of one Kconfig symbol that is already used in DMA tree for
6.17. XDMA cannot be selected if not present.
* tag 'at91-fixes-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: dts: microchip: sama7d65: Force SDMMC Legacy mode
ARM: at91: select ARCH_MICROCHIP
Link: https://lore.kernel.org/r/20250903173403.113604-1-nicolas.ferre@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When a watch on dir=/ is combined with an fsnotify event for a
single-character name directly under / (e.g., creating /a), an
out-of-bounds read can occur in audit_compare_dname_path().
The helper parent_len() returns 1 for "/". In audit_compare_dname_path(),
when parentlen equals the full path length (1), the code sets p = path + 1
and pathlen = 1 - 1 = 0. The subsequent loop then dereferences
p[pathlen - 1] (i.e., p[-1]), causing an out-of-bounds read.
Fix this by adding a pathlen > 0 check to the while loop condition
to prevent the out-of-bounds access.
Cc: stable@vger.kernel.org
Fixes: e92eebb0d6 ("audit: fix suffixed '/' filename matching")
Reported-by: Stanislav Fort <disclosure@aisle.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Stanislav Fort <stanislav.fort@aisle.com>
[PM: subject tweak, sign-off email fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Currently the kzalloc failure check just sets reports the failure
and sets the variable ret to -ENOMEM, which is not checked later
for this specific error. Fix this by just returning -ENOMEM rather
than setting ret.
Fixes: 4fb9307154 ("drm/amd/amdgpu: remove redundant host to psp cmd buf allocations")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1ee9d1a096)
Pull bitmap fix from Yury Norov:
"Fix sched_numa_find_nth_cpu() if mask offline
sched_numa_find_nth_cpu() uses a bsearch to look for the 'closest' CPU
in sched_domains_numa_masks and given cpus mask. However they might
not intersect if all CPUs in the cpus mask are offline.
bsearch will return NULL in that case, bail out instead of
dereferencing a bogus pointer"
* tag 'bitmap-for-6.17-rc5' of https://github.com/norov/linux:
sched: Fix sched_numa_find_nth_cpu() if mask offline
Add a quirk to include the codec amplifier function
for Dell SKU's listed in quirk table.
Note: In these SKU's, the RT722 codec amplifier is excluded,
and an external amplifier is used instead.
Signed-off-by: Syed Saba Kareem <syed.sabakareem@amd.com>
Message-ID: <20250903171817.2549507-1-syed.sabakareem@amd.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The newly added Rock 5T board needed slightly bigger fixes to make the
PCIe and USB actually work, because the PCIe does share its lanes between
two ports and the usb needs to toggle a gpio to supply power.
The other interesting fix is the headphone detection on the Orange Pi 5+.
The rest are some added supplies to make the boot log less scary and a
number of styling fixes.
* tag 'v6.17-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: fix second M.2 slot on ROCK 5T
arm64: dts: rockchip: fix USB on RADXA ROCK 5T
arm64: dts: rockchip: Add vcc-supply to SPI flash on Pinephone Pro
arm64: dts: rockchip: fix es8388 address on rk3588s-roc-pc
arm64: dts: rockchip: Fix Bluetooth interrupts flag on Neardi LBA3368
arm64: dts: rockchip: correct network description on Sige5
arm64: dts: rockchip: Minor whitespace cleanup
ARM: dts: rockchip: Minor whitespace cleanup
arm64: dts: rockchip: Add supplies for eMMC on rk3588-orangepi-5
arm64: dts: rockchip: Fix the headphone detection on the orangepi 5 plus
arm64: dts: rockchip: Add vcc-supply to SPI flash on rk3399-pinebook-pro
arm64: dts: rockchip: mark eeprom as read-only for Radxa E52C
Link: https://lore.kernel.org/r/5909239.Y6S9NjorxK@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The on-host hardware ECC engine remains registered both when
the spi_register_controller() function returns with an error
and also on device removal.
Change the qcom_spi_probe() function to unregister the engine
on the error path, and add the missing unregistering call to
qcom_spi_remove() to avoid possible use-after-free issues.
Fixes: 7304d19090 ("spi: spi-qpic: add driver for QCOM SPI NAND flash Interface")
Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
Message-ID: <20250903-qpic-snand-unregister-ecceng-v1-1-ef5387b0abdc@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The call to __iter_div_u64_rem() in vdso_time_update_aux() is a wrapper
around subtraction. It cannot be used to divide large numbers, as that
introduces long, computationally expensive delays. A regular u64 division
is also not possible in the timekeeper update path as it can be too slow.
Instead of splitting the ktime_t offset into into second and subsecond
components during the timekeeper update fast-path, do it together with the
adjustment of tk->offs_aux in the slow-path. Equivalent to the handling of
offs_boot and monotonic_to_boot.
Reuse the storage of monotonic_to_boot for the new field, as it is not used
by auxiliary timekeepers.
Fixes: 380b84e168 ("vdso/vsyscall: Update auxiliary clock data in the datapage")
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250825-vdso-auxclock-division-v1-1-a1d32a16a313@linutronix.de
Closes: https://lore.kernel.org/lkml/aKwsNNWsHJg8IKzj@localhost/
Jeff Johnson says:
==================
ath.git update for v6.17-rc5
Fix a long-standing issue with ath11k dropping group data packets
during GTK rekey, and fix an omission in the ath12k multi-link EMLSR
support introduced in v6.16.
==================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add smb3_lease_break_enter to trace lease break notifications,
recording lease state, flags, epoch, and lease key. Align
smb3_lease_not_found to use the same payload and print format.
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix multiple fwnode reference leaks:
1. The function calls fwnode_get_named_child_node() to get the "leds" node,
but never calls fwnode_handle_put(leds) to release this reference.
2. Within the fwnode_for_each_child_node() loop, the early return
paths that don't properly release the "led" fwnode reference.
This fix follows the same pattern as commit d029edefed
("net dsa: qca8k: fix usages of device_get_named_child_node()")
Fixes: 94a2a84f5e ("net: dsa: mv88e6xxx: Support LED control")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20250901073224.2273103-1-linmq006@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
vxlan: Fix NPDs when using nexthop objects
With FDB nexthop groups, VXLAN FDB entries do not necessarily point to
a remote destination but rather to an FDB nexthop group. This means that
first_remote_{rcu,rtnl}() can return NULL and a few places in the driver
were not ready for that, resulting in NULL pointer dereferences.
Patches #1-#2 fix these NPDs.
Note that vxlan_fdb_find_uc() still dereferences the remote returned by
first_remote_rcu() without checking that it is not NULL, but this
function is only invoked by a single driver which vetoes the creation of
FDB nexthop groups. I will patch this in net-next to make the code less
fragile.
Patch #3 adds a selftests which exercises these code paths and tests
basic Tx functionality with FDB nexthop groups. I verified that the test
crashes the kernel without the first two patches.
====================
Link: https://patch.msgid.link/20250901065035.159644-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add test cases for VXLAN with FDB nexthop groups, testing both IPv4 and
IPv6. Test basic Tx functionality as well as some corner cases.
Example output:
# ./test_vxlan_nh.sh
TEST: VXLAN FDB nexthop: IPv4 basic Tx [ OK ]
TEST: VXLAN FDB nexthop: IPv6 basic Tx [ OK ]
TEST: VXLAN FDB nexthop: learning [ OK ]
TEST: VXLAN FDB nexthop: IPv4 proxy [ OK ]
TEST: VXLAN FDB nexthop: IPv6 proxy [ OK ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250901065035.159644-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the "proxy" option is enabled on a VXLAN device, the device will
suppress ARP requests and IPv6 Neighbor Solicitation messages if it is
able to reply on behalf of the remote host. That is, if a matching and
valid neighbor entry is configured on the VXLAN device whose MAC address
is not behind the "any" remote (0.0.0.0 / ::).
The code currently assumes that the FDB entry for the neighbor's MAC
address points to a valid remote destination, but this is incorrect if
the entry is associated with an FDB nexthop group. This can result in a
NPD [1][3] which can be reproduced using [2][4].
Fix by checking that the remote destination exists before dereferencing
it.
[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
CPU: 4 UID: 0 PID: 365 Comm: arping Not tainted 6.17.0-rc2-virtme-g2a89cb21162c #2 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:vxlan_xmit+0xb58/0x15f0
[...]
Call Trace:
<TASK>
dev_hard_start_xmit+0x5d/0x1c0
__dev_queue_xmit+0x246/0xfd0
packet_sendmsg+0x113a/0x1850
__sock_sendmsg+0x38/0x70
__sys_sendto+0x126/0x180
__x64_sys_sendto+0x24/0x30
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[2]
#!/bin/bash
ip address add 192.0.2.1/32 dev lo
ip nexthop add id 1 via 192.0.2.2 fdb
ip nexthop add id 10 group 1 fdb
ip link add name vx0 up type vxlan id 10010 local 192.0.2.1 dstport 4789 proxy
ip neigh add 192.0.2.3 lladdr 00:11:22:33:44:55 nud perm dev vx0
bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10
arping -b -c 1 -s 192.0.2.1 -I vx0 192.0.2.3
[3]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
CPU: 13 UID: 0 PID: 372 Comm: ndisc6 Not tainted 6.17.0-rc2-virtmne-g6ee90cb26014 #3 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1v996), BIOS 1.17.0-4.fc41 04/01/2x014
RIP: 0010:vxlan_xmit+0x803/0x1600
[...]
Call Trace:
<TASK>
dev_hard_start_xmit+0x5d/0x1c0
__dev_queue_xmit+0x246/0xfd0
ip6_finish_output2+0x210/0x6c0
ip6_finish_output+0x1af/0x2b0
ip6_mr_output+0x92/0x3e0
ip6_send_skb+0x30/0x90
rawv6_sendmsg+0xe6e/0x12e0
__sock_sendmsg+0x38/0x70
__sys_sendto+0x126/0x180
__x64_sys_sendto+0x24/0x30
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f383422ec77
[4]
#!/bin/bash
ip address add 2001:db8:1::1/128 dev lo
ip nexthop add id 1 via 2001:db8:1::1 fdb
ip nexthop add id 10 group 1 fdb
ip link add name vx0 up type vxlan id 10010 local 2001:db8:1::1 dstport 4789 proxy
ip neigh add 2001:db8:1::3 lladdr 00:11:22:33:44:55 nud perm dev vx0
bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10
ndisc6 -r 1 -s 2001:db8:1::1 -w 1 2001:db8:1::3 vx0
Fixes: 1274e1cc42 ("vxlan: ecmp support for mac fdb entries")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250901065035.159644-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VXLAN FDB entries can point to either a remote destination or an FDB
nexthop group. The latter is usually used in EVPN deployments where
learning is disabled.
However, when learning is enabled, an incoming packet might try to
refresh an FDB entry that points to an FDB nexthop group and therefore
does not have a remote. Such packets should be dropped, but they are
only dropped after dereferencing the non-existent remote, resulting in a
NPD [1] which can be reproduced using [2].
Fix by dropping such packets earlier. Remove the misleading comment from
first_remote_rcu().
[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
CPU: 13 UID: 0 PID: 361 Comm: mausezahn Not tainted 6.17.0-rc1-virtme-g9f6b606b6b37 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:vxlan_snoop+0x98/0x1e0
[...]
Call Trace:
<TASK>
vxlan_encap_bypass+0x209/0x240
encap_bypass_if_local+0xb1/0x100
vxlan_xmit_one+0x1375/0x17e0
vxlan_xmit+0x6b4/0x15f0
dev_hard_start_xmit+0x5d/0x1c0
__dev_queue_xmit+0x246/0xfd0
packet_sendmsg+0x113a/0x1850
__sock_sendmsg+0x38/0x70
__sys_sendto+0x126/0x180
__x64_sys_sendto+0x24/0x30
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[2]
#!/bin/bash
ip address add 192.0.2.1/32 dev lo
ip address add 192.0.2.2/32 dev lo
ip nexthop add id 1 via 192.0.2.3 fdb
ip nexthop add id 10 group 1 fdb
ip link add name vx0 up type vxlan id 10010 local 192.0.2.1 dstport 12345 localbypass
ip link add name vx1 up type vxlan id 10020 local 192.0.2.2 dstport 54321 learning
bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 192.0.2.2 port 54321 vni 10020
bridge fdb add 00:aa:bb:cc:dd:ee dev vx1 self static nhid 10
mausezahn vx0 -a 00:aa:bb:cc:dd:ee -b 00:11:22:33:44:55 -c 1 -q
Fixes: 1274e1cc42 ("vxlan: ecmp support for mac fdb entries")
Reported-by: Marlin Cremers <mcremers@cloudbear.nl>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250901065035.159644-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: fix optical SFP failures
A regression was reported back in April concerning pcs-lynx and 10G
optical SFPs. This patch series addresses that regression, and likely
similar unreported regressions.
These patches:
- Add phy_interface_weight() which will be used in the solution.
- Split out the code that determines the inband "type" for an
interface mode.
- Clear the Autoneg bit in the advertising mask, or the Autoneg bit
in the support mask and the entire advertising mask if the selected
interface mode has no inband capabilties.
Tested with the mvpp2 patch posted earlier today.
====================
Link: https://patch.msgid.link/aLSHmddAqiCISeK3@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mathew reports that as a result of commit 6561f0e547 ("net: pcs:
pcs-lynx: implement pcs_inband_caps() method"), 10G SFP modules no
longer work with the Lynx PCS.
This problem is not specific to the Lynx PCS, but is caused by commit
df874f9e52 ("net: phylink: add pcs_inband_caps() method") which added
validation of the autoneg state to the optical SFP configuration path.
Fix this by handling interface modes that fundamentally have no
inband negotiation more correctly - if we only have a single interface
mode, clear the Autoneg support bit and the advertising mask. If the
module can operate with several different interface modes, autoneg may
be supported for other modes, so leave the support mask alone and just
clear the Autoneg bit in the advertising mask.
This restores 10G optical module functionality with PCS that supply
their inband support, and makes ethtool output look sane.
Reported-by: Mathew McBride <matt@traverse.com.au>
Closes: https://lore.kernel.org/r/025c0ebe-5537-4fa3-b05a-8b835e5ad317@app.fastmail.com
Fixes: df874f9e52 ("net: phylink: add pcs_inband_caps() method")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/E1uslwx-00000001SPB-2kiM@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When tcp_ao_copy_all_matching() fails in tcp_v6_syn_recv_sock() it just
exits the function. This ends up causing a memory-leak:
unreferenced object 0xffff0000281a8200 (size 2496):
comm "softirq", pid 0, jiffies 4295174684
hex dump (first 32 bytes):
7f 00 00 06 7f 00 00 06 00 00 00 00 cb a8 88 13 ................
0a 00 03 61 00 00 00 00 00 00 00 00 00 00 00 00 ...a............
backtrace (crc 5ebdbe15):
kmemleak_alloc+0x44/0xe0
kmem_cache_alloc_noprof+0x248/0x470
sk_prot_alloc+0x48/0x120
sk_clone_lock+0x38/0x3b0
inet_csk_clone_lock+0x34/0x150
tcp_create_openreq_child+0x3c/0x4a8
tcp_v6_syn_recv_sock+0x1c0/0x620
tcp_check_req+0x588/0x790
tcp_v6_rcv+0x5d0/0xc18
ip6_protocol_deliver_rcu+0x2d8/0x4c0
ip6_input_finish+0x74/0x148
ip6_input+0x50/0x118
ip6_sublist_rcv+0x2fc/0x3b0
ipv6_list_rcv+0x114/0x170
__netif_receive_skb_list_core+0x16c/0x200
netif_receive_skb_list_internal+0x1f0/0x2d0
This is because in tcp_v6_syn_recv_sock (and the IPv4 counterpart), when
exiting upon error, inet_csk_prepare_forced_close() and tcp_done() need
to be called. They make sure the newsk will end up being correctly
free'd.
tcp_v4_syn_recv_sock() makes this very clear by having the put_and_exit
label that takes care of things. So, this patch here makes sure
tcp_v4_syn_recv_sock and tcp_v6_syn_recv_sock have similar
error-handling and thus fixes the leak for TCP-AO.
Fixes: 06b22ef295 ("net/tcp: Wire TCP-AO to request sockets")
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250830-tcpao_leak-v1-1-e5878c2c3173@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 8401a108a6.
I got a report from an (anonymous) Sundance user:
Ethernet controller: Sundance Technology Inc / IC Plus Corp IC Plus IP100A Integrated 10/100 Ethernet MAC + PHY (rev 31)
Revert the driver back in. Make following changes:
- update Denis's email address in MAINTAINERS
- adjust to timer API renames:
- del_timer_sync() -> timer_delete_sync()
- from_timer() -> timer_container_of()
Fixes: 8401a108a6 ("eth: remove the DLink/Sundance (ST201) driver")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901210818.1025316-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During GTK rekey, mac80211 issues a clear key (if the old key exists)
followed by an install key operation in the same context. This causes
ath11k to send two WMI commands in quick succession: one to clear the
old key and another to install the new key in the same slot.
Under certain conditions—especially under high load or time sensitive
scenarios, firmware may process these commands asynchronously in a way
that firmware assumes the key is cleared whereas hardware has a valid key.
This inconsistency between hardware and firmware leads to group addressed
packet drops. Only setting the same key again can restore a valid key in
firmware and allow packets to be transmitted.
This issue remained latent because the host's clear key commands were
not effective in firmware until commit 436a4e8865 ("ath11k: clear the
keys properly via DISABLE_KEY"). That commit enabled the host to
explicitly clear group keys, which inadvertently exposed the race.
To mitigate this, restrict group key clearing across all modes (AP, STA,
MESH). During rekey, the new key can simply be set on top of the previous
one, avoiding the need for a clear followed by a set.
However, in AP mode specifically, permit group key clearing when no
stations are associated. This exception supports transitions from secure
modes (e.g., WPA2/WPA3) to open mode, during which all associated peers
are removed and the group key is cleared as part of the transition.
Add a per-BSS station counter to track the presence of stations during
set key operations. Also add a reset_group_keys flag to track the key
re-installation state and avoid repeated installation of the same key
when the number of connected stations transitions to non-zero within a
rekey period.
Additionally, for AP and Mesh modes, when the first station associates,
reinstall the same group key that was last set. This ensures that the
firmware recovers from any race that may have occurred during a previous
key clear when no stations were associated.
This change ensures that key clearing is permitted only when no clients
are connected, avoiding packet loss while enabling dynamic security mode
transitions.
Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.9.0.1-02146-QCAHKSWPL_SILICONZ-1
Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Reported-by: Steffen Moser <lists@steffen-moser.de>
Closes: https://lore.kernel.org/linux-wireless/c6366409-9928-4dd7-bf7b-ba7fcf20eabf@steffen-moser.de
Fixes: 436a4e8865 ("ath11k: clear the keys properly via DISABLE_KEY")
Signed-off-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Tested-by: Nicolas Escande <nico.escande@gmail.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250810170018.1124014-1-rameshkumar.sundaram@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Currently, when updating EMLSR capabilities of a multi-link (ML) station,
only the EMLSR parameters (e.g., padding delay, transition delay, and
timeout) are sent to firmware. However, firmware also requires the
EMLSR support flag to be set in the MLO flags of the peer assoc WMI
command to properly handle EML operating mode notification frames.
Set the ATH12K_WMI_FLAG_MLO_EMLSR_SUPPORT flag in the peer assoc WMI
command when the ML station is EMLSR-capable, so that the firmware can
respond to EHT EML action frames from associated stations.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Fixes: 4bcf9525bc ("wifi: ath12k: update EMLSR capabilities of ML Station")
Signed-off-by: Ramya Gnanasekar <ramya.gnanasekar@oss.qualcomm.com>
Signed-off-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250801104920.3326352-1-rameshkumar.sundaram@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
In get_bpf_prog_info_linear two calls to bpf_obj_get_info_by_fd are
made, the first to compute memory requirements for a struct perf_bpil
and the second to fill it in. Previously the code would warn when the
second call didn't match the first. Such races can be common place in
things like perf test, whose perf trace tests will frequently load BPF
programs. Rather than a debug message, return actual errors for this
case. Out of paranoia also validate the read bpf_prog_info array
value. Change the type of ptr to avoid mismatched pointer type
compiler warnings. Add some additional debug print outs and sanity
asserts.
Closes: https://lore.kernel.org/lkml/CAP-5=fWJQcmUOP7MuCA2ihKnDAHUCOBLkQFEkQES-1ZZTrgf8Q@mail.gmail.com/
Fixes: 6ac22d036f ("perf bpf: Pull in bpf_program__get_prog_info_linear()")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250902181713.309797-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Calls to perf_env__insert_bpf_prog_info may fail as a sideband thread
may already have inserted the bpf_prog_info. Such failures may yield
info_linear being freed which then causes use-after-free issues with
the internal bpf_prog_info info struct. Make it so that
perf_env__insert_bpf_prog_info trigger early non-error paths and fix
the use-after-free in perf_event__synthesize_one_bpf_prog. Add proper
return error handling to perf_env__add_bpf_info (that calls
perf_env__insert_bpf_prog_info) and propagate the return value in its
callers.
Closes: https://lore.kernel.org/lkml/CAP-5=fWJQcmUOP7MuCA2ihKnDAHUCOBLkQFEkQES-1ZZTrgf8Q@mail.gmail.com/
Fixes: 03edb7020b ("perf bpf: Fix two memory leakages when calling perf_env__insert_bpf_prog_info()")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250902181713.309797-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Pull sound fixes from Takashi Iwai:
"A collection of small changes including a few regression fixes:
- Regression fix for Intel SKL/KBL HD-audio bindings
- Regression fix for missing Nvidia HDMI codec entries after the
recent code reorganization
- A few TAS2781 codec regression fixes
- Fix for ASoC component lookup breakage
- Usual HD-audio, USB-audio and SOF quirk entries"
* tag 'sound-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/hdmi: Add pin fix for another HP EliteDesk 800 G4 model
ALSA: usb-audio: Allow Focusrite devices to use low samplerates
ALSA: hda: tas2781: reorder tas2563 calibration variables
ALSA: hda: tas2781: fix tas2563 EFI data endianness
ALSA: firewire-motu: drop EPOLLOUT from poll return values as write is not supported
ALSA: docs: Add documents for recently changes in snd-usb-audio
ALSA: usb-audio: Add mute TLV for playback volumes on more devices
ASoC: SOF: Intel: WCL: Add the sdw_process_wakeen op
ALSA: hda: Avoid binding with SOF for SKL/KBL platforms
ASoC: rsnd: tidyup direction name on rsnd_dai_connect()
ALSA: hda/tas2781: Fix EFI name for calibration beginning with 1 instead of 0
ALSA: usb-audio: move mixer_quirks' min_mute into common quirk
ALSA: hda/realtek: Fix headset mic for TongFang X6[AF]R5xxY
ALSA: hda/hdmi: Restore missing HDMI codec entries
ASoC: codecs: idt821034: fix wrong log in idt821034_chip_direction_output()
ASoC: soc-core: tidyup snd_soc_lookup_component_nolocked()
ASoC: soc-core: care NULL dirver name on snd_soc_lookup_component_nolocked()
ALSA: hda: intel-dsp-config: Select SOF driver on MTL Chromebooks
ALSA: usb-audio: Add mute TLV for playback volumes on some devices
Pull misc fixes from Andrew Morton:
"17 hotfixes. 13 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 11 of these
fixes are for MM.
This includes a three-patch series from Harry Yoo which fixes an
intermittent boot failure which can occur on x86 systems. And a
two-patch series from Alexander Gordeev which fixes a KASAN crash on
S390 systems"
* tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: fix possible deadlock in kmemleak
x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
mm: introduce and use {pgd,p4d}_populate_kernel()
mm: move page table sync declarations to linux/pgtable.h
proc: fix missing pde_set_flags() for net proc files
mm: fix accounting of memmap pages
mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota()
kexec: add KEXEC_FILE_NO_CMA as a legal flag
kasan: fix GCC mem-intrinsic prefix with sw tags
mm/kasan: avoid lazy MMU mode hazards
mm/kasan: fix vmalloc shadow memory (de-)population races
kunit: kasan_test: disable fortify string checker on kasan_strings() test
selftests/mm: fix FORCE_READ to read input value correctly
mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
ocfs2: prevent release journal inode after journal shutdown
rust: mm: mark VmaNew as transparent
of_numa: fix uninitialized memory nodes causing kernel panic
Pull btrfs fixes from David Sterba:
- fix a few races related to inode link count
- fix inode leak on failure to add link to inode
- move transaction aborts closer to where they happen
* tag 'for-6.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: avoid load/store tearing races when checking if an inode was logged
btrfs: fix race between setting last_dir_index_offset and inode logging
btrfs: fix race between logging inode and checking if it was logged before
btrfs: simplify error handling logic for btrfs_link()
btrfs: fix inode leak on failure to add link to inode
btrfs: abort transaction on failure to add link to inode
I recently ran into an issue where the PI generated using the block layer
integrity code differs from that from a kernel using the PRACT fallback
when the block layer integrity code is disabled, and I tracked this down
to us using PRACT incorrectly.
The NVM Command Set Specification (section 5.33 in 1.2, similar in older
versions) specifies the PRACT insert behavior as:
Inserted protection information consists of the computed CRC for the
protection information format (refer to section 5.3.1) in the Guard
field, the LBAT field value in the Application Tag field, the LBST
field value in the Storage Tag field, if defined, and the computed
reference tag in the Logical Block Reference Tag.
Where the computed reference tag is defined as following for type 1 and
type 2 using the text below that is duplicated in the respective bullet
points:
the value of the computed reference tag for the first logical block of
the command is the value contained in the Initial Logical Block
Reference Tag (ILBRT) or Expected Initial Logical Block Reference Tag
(EILBRT) field in the command, and the computed reference tag is
incremented for each subsequent logical block.
So we need to set ILBRT field, but we currently don't. Interestingly
this works fine on my older type 1 formatted SSD, but Qemu trips up on
this. We already set ILBRT for Write Same since commit aeb7bb061be5
("nvme: set the PRACT bit when using Write Zeroes with T10 PI").
To ease this, move the PI type check into nvme_set_ref_tag.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
There is a race condition between inode eviction and inode caching that
can cause a live struct btrfs_inode to be missing from the root->inodes
xarray. Specifically, there is a window during evict() between the inode
being unhashed and deleted from the xarray. If btrfs_iget() is called
for the same inode in that window, it will be recreated and inserted
into the xarray, but then eviction will delete the new entry, leaving
nothing in the xarray:
Thread 1 Thread 2
---------------------------------------------------------------
evict()
remove_inode_hash()
btrfs_iget_path()
btrfs_iget_locked()
btrfs_read_locked_inode()
btrfs_add_inode_to_root()
destroy_inode()
btrfs_destroy_inode()
btrfs_del_inode_from_root()
__xa_erase
In turn, this can cause issues for subvolume deletion. Specifically, if
an inode is in this lost state, and all other inodes are evicted, then
btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely.
If the lost inode has a delayed_node attached to it, then when
btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(),
it will loop forever because the delayed_nodes xarray will never become
empty (unless memory pressure forces the inode out). We saw this
manifest as soft lockups in production.
Fix it by only deleting the xarray entry if it matches the given inode
(using __xa_cmpxchg()).
Fixes: 310b2f5d5a ("btrfs: use an xarray to track open inodes in a root")
Cc: stable@vger.kernel.org # 6.11+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Co-authored-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With 64K page size (aarch64 with 64K page size config) and 4K btrfs
block size, the following workload can easily lead to a corrupted read:
mkfs.btrfs -f -s 4k $dev > /dev/null
mount -o compress $dev $mnt
xfs_io -f -c "pwrite -S 0xff 0 64k" $mnt/base > /dev/null
echo "correct result:"
od -Ad -t x1 $mnt/base
xfs_io -f -c "reflink $mnt/base 32k 0 32k" \
-c "reflink $mnt/base 0 32k 32k" \
-c "pwrite -S 0xff 60k 4k" $mnt/new > /dev/null
echo "incorrect result:"
od -Ad -t x1 $mnt/new
umount $mnt
This shows the following result:
correct result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
incorrect result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0032768 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0061440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
Notice the zero in the range [32K, 60K), which is incorrect.
[CAUSE]
With extra trace printk, it shows the following events during od:
(some unrelated info removed like CPU and context)
od-3457 btrfs_do_readpage: enter r/i=5/258 folio=0(65536) prev_em_start=0000000000000000
The "r/i" is indicating the root and inode number. In our case the file
"new" is using ino 258 from fs tree (root 5).
Here notice the @prev_em_start pointer is NULL. This means the
btrfs_do_readpage() is called from btrfs_read_folio(), not from
btrfs_readahead().
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=0 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=4096 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=8192 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=12288 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=16384 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=20480 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=24576 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=28672 got em start=0 len=32768
These above 32K blocks will be read from the first half of the
compressed data extent.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=32768 got em start=32768 len=32768
Note here there is no btrfs_submit_compressed_read() call. Which is
incorrect now.
Although both extent maps at 0 and 32K are pointing to the same compressed
data, their offsets are different thus can not be merged into the same
read.
So this means the compressed data read merge check is doing something
wrong.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=36864 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=40960 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=45056 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=49152 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=53248 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=57344 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=61440 skip uptodate
od-3457 btrfs_submit_compressed_read: cb orig_bio: file off=0 len=61440
The function btrfs_submit_compressed_read() is only called at the end of
folio read. The compressed bio will only have an extent map of range [0,
32K), but the original bio passed in is for the whole 64K folio.
This will cause the decompression part to only fill the first 32K,
leaving the rest untouched (aka, filled with zero).
This incorrect compressed read merge leads to the above data corruption.
There were similar problems that happened in the past, commit 808f80b467
("Btrfs: update fix for read corruption of compressed and shared
extents") is doing pretty much the same fix for readahead.
But that's back to 2015, where btrfs still only supports bs (block size)
== ps (page size) cases.
This means btrfs_do_readpage() only needs to handle a folio which
contains exactly one block.
Only btrfs_readahead() can lead to a read covering multiple blocks.
Thus only btrfs_readahead() passes a non-NULL @prev_em_start pointer.
With v5.15 kernel btrfs introduced bs < ps support. This breaks the above
assumption that a folio can only contain one block.
Now btrfs_read_folio() can also read multiple blocks in one go.
But btrfs_read_folio() doesn't pass a @prev_em_start pointer, thus the
existing bio force submission check will never be triggered.
In theory, this can also happen for btrfs with large folios, but since
large folio is still experimental, we don't need to bother it, thus only
bs < ps support is affected for now.
[FIX]
Instead of passing @prev_em_start to do the proper compressed extent
check, introduce one new member, btrfs_bio_ctrl::last_em_start, so that
the existing bio force submission logic will always be triggered.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The compression level is meaningless for lzo, but before commit
3f093ccb95 ("btrfs: harden parsing of compression mount options"),
it was silently ignored if passed.
After that commit, passing a level with lzo fails to mount:
BTRFS error: unrecognized compression value lzo:1
It seems reasonable for users to expect that lzo would permit a numeric
level option, as all the other algos do, even though the kernel's
implementation of LZO currently only supports a single level. Because it
has always worked to pass a level, it seems likely to me that users in
the real world are relying on doing so.
This patch restores the old behavior, giving "lzo:N" the same semantics
as all of the other compression algos.
To be clear, silly variants like "lzo:one", "lzo:the_first_option", or
"lzo:armageddon" also used to work. This isn't meant to suggest that
any possible mis-interpretation of mount options that once worked must
continue to work forever. This is an exceptional case where it makes
sense to preserve compatibility, both because the mis-interpretation is
reasonable, and because nothing tangible is sacrificed.
Finally update btrfs_show_options() to ignore the level of LZO, as it
is only the default level without any extra meaning.
Fixes: 3f093ccb95 ("btrfs: harden parsing of compression mount options")
Reviewed-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The following workload on a squota enabled fs:
btrfs subvol create mnt/subvol
# ensure subvol extents get accounted
sync
btrfs qgroup create 1/1 mnt
btrfs qgroup assign mnt/subvol 1/1 mnt
btrfs qgroup delete mnt/subvol
# make the cleaner thread run
btrfs filesystem sync mnt
sleep 1
btrfs filesystem sync mnt
btrfs qgroup destroy 1/1 mnt
will fail with EBUSY. The reason is that 1/1 does the quick accounting
when we assign subvol to it, gaining its exclusive usage as excl and
excl_cmpr. But then when we delete subvol, the decrement happens via
record_squota_delta() which does not update excl_cmpr, as squotas does
not make any distinction between compressed and normal extents. Thus,
we increment excl_cmpr but never decrement it, and are unable to delete
1/1. The two possible fixes are to make squota always mirror excl and
excl_cmpr or to make the fast accounting separately track the plain and
cmpr numbers. The latter felt cleaner to me so that is what I opted for.
Fixes: 1e0e9d5771 ("btrfs: add helper for recording simple quota deltas")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Fix a possible heap overflow in e1000_set_eeprom function by adding
input validation for the requested length of the change in the EEPROM.
In addition, change the variable type from int to size_t for better
code practices and rearrange declarations to RCT.
Cc: stable@vger.kernel.org
Fixes: bc7f75fa97 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)")
Co-developed-by: Mikael Wessel <post@mikaelkw.online>
Signed-off-by: Mikael Wessel <post@mikaelkw.online>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
incorrectly used ixgbe_lp_map in loops intended to populate the
supported and advertised EEE linkmode bitmaps based on ixgbe_ls_map.
This results in incorrect bit setting and potential out-of-bounds
access, since ixgbe_lp_map and ixgbe_ls_map have different sizes
and purposes.
ixgbe_lp_map[i] -> ixgbe_ls_map[i]
Use ixgbe_ls_map for supported and advertised linkmodes, and keep
ixgbe_lp_map usage only for link partner (lp_advertised) mapping.
Fixes: 9356b6db9d ("net: ethernet: ixgbe: Convert EEE to use linkmodes")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
list_first_entry() never returns NULL - if the list is empty, it still
returns a pointer to an invalid object, leading to potential invalid
memory access when dereferenced.
Fix this by using list_first_entry_or_null instead of list_first_entry.
Fixes: e3219ce6a7 ("i40e: Add support for client interface for IWARP driver")
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The 'command' and 'netdev_ops' debugfs files are a legacy debugging
interface supported by the i40e driver since its early days by commit
02e9c29081 ("i40e: debugfs interface").
Both of these debugfs files provide a read handler which is mostly useless,
and which is implemented with questionable logic. They both use a static
256 byte buffer which is initialized to the empty string. In the case of
the 'command' file this buffer is literally never used and simply wastes
space. In the case of the 'netdev_ops' file, the last command written is
saved here.
On read, the files contents are presented as the name of the device
followed by a colon and then the contents of their respective static
buffer. For 'command' this will always be "<device>: ". For 'netdev_ops',
this will be "<device>: <last command written>". But note the buffer is
shared between all devices operated by this module. At best, it is mostly
meaningless information, and at worse it could be accessed simultaneously
as there doesn't appear to be any locking mechanism.
We have also recently received multiple reports for both read functions
about their use of snprintf and potential overflow that could result in
reading arbitrary kernel memory. For the 'command' file, this is definitely
impossible, since the static buffer is always zero and never written to.
For the 'netdev_ops' file, it does appear to be possible, if the user
carefully crafts the command input, it will be copied into the buffer,
which could be large enough to cause snprintf to truncate, which then
causes the copy_to_user to read beyond the length of the buffer allocated
by kzalloc.
A minimal fix would be to replace snprintf() with scnprintf() which would
cap the return to the number of bytes written, preventing an overflow. A
more involved fix would be to drop the mostly useless static buffers,
saving 512 bytes and modifying the read functions to stop needing those as
input.
Instead, lets just completely drop the read access to these files. These
are debug interfaces exposed as part of debugfs, and I don't believe that
dropping read access will break any script, as the provided output is
pretty useless. You can find the netdev name through other more standard
interfaces, and the 'netdev_ops' interface can easily result in garbage if
you issue simultaneous writes to multiple devices at once.
In order to properly remove the i40e_dbg_netdev_ops_buf, we need to
refactor its write function to avoid using the static buffer. Instead, use
the same logic as the i40e_dbg_command_write, with an allocated buffer.
Update the code to use this instead of the static buffer, and ensure we
free the buffer on exit. This fixes simultaneous writes to 'netdev_ops' on
multiple devices, and allows us to remove the now unused static buffer
along with removing the read access.
Fixes: 02e9c29081 ("i40e: debugfs interface")
Reported-by: Kunwu Chan <chentao@kylinos.cn>
Closes: https://lore.kernel.org/intel-wired-lan/20231208031950.47410-1-chentao@kylinos.cn/
Reported-by: Wang Haoran <haoranwangsec@gmail.com>
Closes: https://lore.kernel.org/all/CANZ3JQRRiOdtfQJoP9QM=6LS1Jto8PGBGw6y7-TL=BcnzHQn1Q@mail.gmail.com/
Reported-by: Amir Mohammad Jahangirzad <a.jahangirzad@gmail.com>
Closes: https://lore.kernel.org/all/20250722115017.206969-1-a.jahangirzad@gmail.com/
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
On control planes that allow changing the MAC address of the interface,
the driver must provide a MAC type to avoid errors such as:
idpf 0000:0a:00.0: Transaction failed (op 535)
idpf 0000:0a:00.0: Received invalid MAC filter payload (op 535) (len 0)
idpf 0000:0a:00.0: Transaction failed (op 536)
These errors occur during driver load or when changing the MAC via:
ip link set <iface> address <mac>
Add logic to set the MAC type when sending ADD/DEL (opcodes 535/536) to
the control plane. Since only one primary MAC is supported per vport, the
driver only needs to send an ADD opcode when setting it. Remove the old
address by calling __idpf_del_mac_filter(), which skips the message and
just clears the entry from the internal list. This avoids an error on DEL
as it attempts to remove an address already cleared by the preceding ADD
opcode.
Fixes: ce1b75d063 ("idpf: add ptypes and MAC filter support")
Reported-by: Jian Liu <jianliu@redhat.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Recent versions of the E810 firmware have support for an extra interrupt to
handle report of the "low latency" Tx timestamps coming from the
specialized low latency firmware interface. Instead of polling the
registers, software can wait until the low latency interrupt is fired.
This logic makes use of the Tx timestamp tracking structure, ice_ptp_tx, as
it uses the same "ready" bitmap to track which Tx timestamps complete.
Unfortunately, the ice_ll_ts_intr() function does not check if the
tracker is initialized before its first access. This results in NULL
dereference or use-after-free bugs similar to the issues fixed in the
ice_ptp_ts_irq() function.
Fix this by only checking the in_use bitmap (and other fields) if the
tracker is marked as initialized. The reset flow will clear the init field
under lock before it tears the tracker down, thus preventing any
use-after-free or NULL access.
Fixes: 82e71b226e ("ice: Enable SW interrupt from FW for LL TS")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The E810 device has support for a "low latency" firmware interface to
access and read the Tx timestamps. This interface does not use the standard
Tx timestamp logic, due to the latency overhead of proxying sideband
command requests over the firmware AdminQ.
The logic still makes use of the Tx timestamp tracking structure,
ice_ptp_tx, as it uses the same "ready" bitmap to track which Tx
timestamps complete.
Unfortunately, the ice_ptp_ts_irq() function does not check if the tracker
is initialized before its first access. This results in NULL dereference or
use-after-free bugs similar to the following:
[245977.278756] BUG: kernel NULL pointer dereference, address: 0000000000000000
[245977.278774] RIP: 0010:_find_first_bit+0x19/0x40
[245977.278796] Call Trace:
[245977.278809] ? ice_misc_intr+0x364/0x380 [ice]
This can occur if a Tx timestamp interrupt races with the driver reset
logic.
Fix this by only checking the in_use bitmap (and other fields) if the
tracker is marked as initialized. The reset flow will clear the init field
under lock before it tears the tracker down, thus preventing any
use-after-free or NULL access.
Fixes: f9472aaabd ("ice: Process TSYN IRQ in a separate function")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The bridge has three bootstrap pins which are sampled to determine the
frequency of the external reference clock. The driver will also
(over)write that setting. But it seems this is racy after the bridge is
enabled. It was observed that although the driver write the correct
value (by sniffing on the I2C bus), the register has the wrong value.
The datasheet states that the GPIO lines have to be stable for at least
5us after asserting the EN signal. Thus, there seems to be some logic
which samples the GPIO lines and this logic appears to overwrite the
register value which was set by the driver. Waiting 20us after
asserting the EN line resolves this issue.
Fixes: a095f15c00 ("drm/bridge: add support for sn65dsi86 bridge driver")
Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250821122341.1257286-1-mwalle@kernel.org
Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>:
Minor bug fixes for a couple of older devices reported by some users.
Mostly this centers around the automatic PLL configuration getting the
wrong values due to rounding.
Both tracing_mark_write and tracing_mark_raw_write call
__copy_from_user_inatomic during preempt_disable. But in some case,
__copy_from_user_inatomic may trigger page fault, and will call schedule()
subtly. And if a task is migrated to other cpu, the following warning will
be trigger:
if (RB_WARN_ON(cpu_buffer,
!local_read(&cpu_buffer->committing)))
An example can illustrate this issue:
process flow CPU
---------------------------------------------------------------------
tracing_mark_raw_write(): cpu:0
...
ring_buffer_lock_reserve(): cpu:0
...
cpu = raw_smp_processor_id() cpu:0
cpu_buffer = buffer->buffers[cpu] cpu:0
...
...
__copy_from_user_inatomic(): cpu:0
...
# page fault
do_mem_abort(): cpu:0
...
# Call schedule
schedule() cpu:0
...
# the task schedule to cpu1
__buffer_unlock_commit(): cpu:1
...
ring_buffer_unlock_commit(): cpu:1
...
cpu = raw_smp_processor_id() cpu:1
cpu_buffer = buffer->buffers[cpu] cpu:1
As shown above, the process will acquire cpuid twice and the return values
are not the same.
To fix this problem using copy_from_user_nofault instead of
__copy_from_user_inatomic, as the former performs 'access_ok' before
copying.
Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com
Fixes: 656c7f0d2d ("tracing: Replace kmap with copy_from_user() in trace_marker writing")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
In the TX completion packet stage of TI SoCs with CPSW2G instance, which
has single external ethernet port, ndev is accessed without being
initialized if no TX packets have been processed. It results into null
pointer dereference, causing kernel to crash. Fix this by having a check
on the number of TX packets which have been processed.
Fixes: 9a369ae3d1 ("net: ethernet: ti: am65-cpsw: remove am65_cpsw_nuss_tx_compl_packets_2g()")
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250829121051.2031832-1-c-vankar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The drm_sched_job_unschedulable trace point can access
entity->dependency after it was cleared by the callback
installed in drm_sched_entity_add_dependency_cb, causing:
BUG: kernel NULL pointer dereference, address: 0000000000000020
[...]
Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched]
RIP: 0010:trace_event_raw_event_drm_sched_job_unschedulable+0x70/0xd0 [gpu_sched]
To fix this we either need to keep a reference to the fence before
setting up the callbacks, or move the trace_drm_sched_job_unschedulable
calls into drm_sched_entity_add_dependency_cb where they can be
done earlier.
Fixes: 76d97c870f ("drm/sched: Trace dependencies for GPU jobs")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250901124032.1955-1-pierre-eric.pelloux-prayer@amd.com
(cherry picked from commit b2b8af21fe)
Signed-off-by: Maxime Ripard <mripard@kernel.org>
The sma1307->set.header_size is how many integers are in the header
(there are 8 of them) but instead of allocating space of 8 integers
we allocate 8 bytes. This leads to memory corruption when we copy data
it on the next line:
memcpy(sma1307->set.header, data,
sma1307->set.header_size * sizeof(int));
Also since we're immediately copying over the memory in ->set.header,
there is no need to zero it in the allocator. Use devm_kmalloc_array()
to allocate the memory instead.
Fixes: 576c57e6b4 ("ASoC: sma1307: Add driver for Iron Device SMA1307")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aLGjvjpueVstekXP@stanley.mountain
Signed-off-by: Mark Brown <broonie@kernel.org>
drm::Device is only available when CONFIG_DRM=y, which we have to
consider for intra-doc links, otherwise the rustdoc make target produces
the following warning.
>> warning: unresolved link to `kernel::drm::Device`
--> rust/kernel/device.rs:154:22
|
154 | /// [`drm::Device`]: kernel::drm::Device
| ^^^^^^^^^^^^^^^^^^^ no item named `drm` in module `kernel`
|
= note: `#[warn(rustdoc::broken_intra_doc_links)]` on by default
Fix this by making the intra-doc link conditional on CONFIG_DRM being enabled.
Fixes: d6e26c1ae4 ("device: rust: expand documentation for Device")
Suggested-by: Alice Ryhl <aliceryhl@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508261644.9LclwUgt-lkp@intel.com/
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250829195745.31174-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Fix a critical memory allocation bug in edma_setup_from_hw() where
queue_priority_map was allocated with insufficient memory. The code
declared queue_priority_map as s8 (*)[2] (pointer to array of 2 s8),
but allocated memory using sizeof(s8) instead of the correct size.
This caused out-of-bounds memory writes when accessing:
queue_priority_map[i][0] = i;
queue_priority_map[i][1] = i;
The bug manifested as kernel crashes with "Oops - undefined instruction"
on ARM platforms (BeagleBoard-X15) during EDMA driver probe, as the
memory corruption triggered kernel hardening features on Clang.
Change the allocation to use sizeof(*queue_priority_map) which
automatically gets the correct size for the 2D array structure.
Fixes: 2b6b3b7420 ("ARM/dmaengine: edma: Merge the two drivers under drivers/dma/")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20250830094953.3038012-1-anders.roxell@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Merge series from James Clark <james.clark@linaro.org>:
Various fixes for LPSI along with some refactorings. None of the fixes
are strictly related to S32G, however these changes all originate from
the work to support S32G devices. The only commits that are strictly
related are for the new s32g2 and s32g3 compatible strings.
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- fix OOB read/write in network-coding decode, by Stanislav Fort
* tag 'batadv-net-pullrequest-20250901' of https://git.open-mesh.org/linux-merge:
batman-adv: fix OOB read/write in network-coding decode
====================
Link: https://patch.msgid.link/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The code currently reads both U32 attributes and U64 attributes as
U64, so when a U32 attribute is provided by userspace (ie, when not
using XPN), on big endian systems, we'll load that value into the
upper 32bits of the next_pn field instead of the lower 32bits. This
means that the value that userspace provided is ignored (we only care
about the lower 32bits for non-XPN), and we'll start using PNs from 0.
Switch to nla_get_uint, which will read the value correctly on all
arches, whether it's 32b or 64b.
Fixes: 48ef50fa86 ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1c1df1661b89238caf5beefb84a10ebfd56c66ea.1756459839.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 2677010e77 ("Add support to set NAPI threaded for individual
NAPI") introduced threaded NAPI configuration per individual NAPI
instance, however obsolete description that threaded NAPI is per device
has remained.
Remove the old description and clarify that only NAPI instances running
in threaded mode spawn kernel threads by changing "Each NAPI instance"
to "Each threaded NAPI instance".
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250829064857.51503-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The icmp_ndo_send function was originally introduced to ensure proper
rate limiting when icmp_send is called by a network device driver,
where the packet's source address may have already been transformed
by SNAT.
However, the original implementation only considers the
IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
source address with that of the original-direction tuple. This causes
two problems:
1. For SNAT:
Reply-direction packets were incorrectly translated using the source
address of the CT original direction, even though no translation is
required.
2. For DNAT:
Reply-direction packets were not handled at all. In DNAT, the original
direction's destination is translated. Therefore, in the reply
direction the source address must be set to the reply-direction
source, so rate limiting works as intended.
Fix this by using the connection direction to select the correct tuple
for source address translation, and adjust the pre-checks to handle
reply-direction packets in case of DNAT.
Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids
possible KCSAN reports about concurrent updates to `ct->status`.
Fixes: 0b41713b60 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Fabian Bläse <fabian@blaese.de>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- vhci: Prevent use-after-free by removing debugfs files early
- L2CAP: Fix use-after-free in l2cap_sock_cleanup_listen()
* tag 'for-net-2025-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: Fix use-after-free in l2cap_sock_cleanup_listen()
Bluetooth: vhci: Prevent use-after-free by removing debugfs files early
====================
Link: https://patch.msgid.link/20250829191210.1982163-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 0cc22f5a86 ("phy: qcom: qmp-pcie: Add PHY register retention
support") added support for using the "no_csr" reset to skip configuration
of the PHY if the init sequence was already applied by the boot firmware.
The expectation is that the PHY is only turned on/off by using the "no_csr"
reset, instead of powering it down and re-programming it after a full
reset.
The boot firmware on X1E does not fully conform to this expectation: If the
PCIe3 link fails to come up (e.g. because no PCIe card is inserted), the
firmware powers down the PHY using the QPHY_PCS_POWER_DOWN_CONTROL
register. The QPHY_START_CTRL register is kept as-is, so the driver assumes
the PHY is already initialized and skips the configuration/power up
sequence. The PHY won't come up again without clearing the
QPHY_PCS_POWER_DOWN_CONTROL, so eventually initialization fails:
qcom-qmp-pcie-phy 1be0000.phy: phy initialization timed-out
phy phy-1be0000.phy.0: phy poweron failed --> -110
qcom-pcie 1bd0000.pcie: cannot initialize host
qcom-pcie 1bd0000.pcie: probe with driver qcom-pcie failed with error -110
This can be reliably reproduced on the X1E CRD, QCP and Devkit when no card
is inserted for PCIe3.
Fix this by checking the QPHY_PCS_POWER_DOWN_CONTROL register in addition
to QPHY_START_CTRL. If the PHY is powered down with the register, it
doesn't conform to the expectations for using the "no_csr" reset, so we
fully re-initialize with the normal reset sequence.
Also check the register more carefully to ensure all of the bits we expect
are actually set. A simple !!(readl()) is not enough, because the PHY might
be only partially set up with some of the expected bits set.
Cc: stable@vger.kernel.org
Fixes: 0cc22f5a86 ("phy: qcom: qmp-pcie: Add PHY register retention support")
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250821-phy-qcom-qmp-pcie-nocsr-fix-v3-1-4898db0cc07c@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Some SoCs are just validated with the TX delay enabled. With commit
ca13b249f2 ("net: ethernet: ti: am65-cpsw: fixup PHY mode for fixed
RGMII TX delay"), the network driver will patch the delay setting on the
fly assuming that the TX delay setting is fixed. In reality, the TX
delay is configurable and just skipped in the documentation. There are
bootloaders, which will disable the TX delay and this will lead to a
transmit path which doesn't add any delays at all.
Fix that by always writing the RGMII_ID setting and report an error for
unsupported RGMII delay modes.
This is safe to do and shouldn't break any boards in mainline because
the fixed delay is only introduced for gmii-sel compatibles which are
used together with the am65-cpsw-nuss driver and also contains the
commit above.
Fixes: ca13b249f2 ("net: ethernet: ti: am65-cpsw: fixup PHY mode for fixed RGMII TX delay")
Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://lore.kernel.org/r/20250819065622.1019537-1-mwalle@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
ina238_write_power() was attempting to clamp the user input but was
throwing away the result. Ensure that we clamp the value to the
appropriate range before it is converted into a register value.
Fixes: 0d9f596b1f ("hwmon: (ina238) Modify the calculation formula to adapt to different chips")
Cc: Wenliang Yan <wenliang202407@163.com>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
When clamping a register value, the result needs to be masked against the
register size. This was missing, resulting in errors when trying to write
negative limits. Fix by masking the clamping result against the register
size.
Fixes: eacb52f010 ("hwmon: Driver for Texas Instruments INA238")
Cc: Nathan Rossi <nathan.rossi@digi.com>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
driver support indirect read and indirect write operation with
assumption no force device removal(unbind) operation. However
force device removal(removal) is still available to root superuser.
Unbinding driver during operation causes kernel crash. This changes
ensure driver able to handle such operation for indirect read and
indirect write by implementing refcount to track attached devices
to the controller and gracefully wait and until attached devices
remove operation completed before proceed with removal operation.
Signed-off-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Reviewed-by: Niravkumar L Rabara <nirav.rabara@altera.com>
Link: https://patch.msgid.link/8704fd6bd2ff4d37bba4a0eacf5eba3ba001079e.1756168074.git.khairul.anuar.romli@altera.com
Signed-off-by: Mark Brown <broonie@kernel.org>
This erratum only ever results in a max value of 1, otherwise the full 3
bits are available. To avoid repeating the same default prescale value
for every new device's devdata, treat 0 as no limit (7) and only set a
value when the erratum is present.
Change the field to be 3 bits to catch out of range definitions.
No functionality change.
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250828-james-nxp-lpspi-v2-7-6262b9aa9be4@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Clear the error flags after disabling the module to avoid the case when
a flag is set again between flag clear and module disable. And use
SR_CLEAR_MASK to replace hardcoded value for improved readability.
Although fsl_lpspi_reset() was only introduced in commit a15dc3d657
("spi: lpspi: Fix CLK pin becomes low before one transfer"), the
original driver only reset SR in the interrupt handler, making it
vulnerable to the same issue. Therefore the fixes commit is set at the
introduction of the driver.
Fixes: 5314987de5 ("spi: imx: add lpspi bus driver")
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: Ciprian Marian Costea <ciprianmarian.costea@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://patch.msgid.link/20250828-james-nxp-lpspi-v2-4-6262b9aa9be4@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit 6a13044849 ("spi: lpspi: Fix wrong transmission when don't use
CONT") breaks transmissions when CONT is used. The TDIE interrupt should
not be disabled in all cases. If CONT is used and the TX transfer is not
yet completed yet, but the interrupt handler is called because there are
characters to be received, TDIE is replaced with FCIE. When the transfer
is finally completed, SR_TDF is set but the interrupt handler isn't
called again.
Fixes: 6a13044849 ("spi: lpspi: Fix wrong transmission when don't use CONT")
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250828-james-nxp-lpspi-v2-1-6262b9aa9be4@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit 05f254a636 ("ALSA: usb-audio:
Improve filtering of sample rates on Focusrite devices") changed the
check for max_rate in a way which was overly restrictive, forcing
devices to use very high samplerates if they support them, despite
support existing for lower rates as well.
This maintains the intended outcome (ensuring samplerates selected are
supported) while allowing devices with higher maximum samplerates to be
opened at all supported samplerates.
This patch was tested with a Clarett+ 8Pre USB
Fixes: 05f254a636 ("ALSA: usb-audio: Improve filtering of sample rates on Focusrite devices")
Signed-off-by: Tina Wuest <tina@wuest.me>
Link: https://patch.msgid.link/20250901092024.140993-1-tina@wuest.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
fuse fixes for 6.17-rc5
* tag 'fuse-fixes-6.17-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (6 commits)
fuse: Block access to folio overlimit
fuse: fix fuseblk i_blkbits for iomap partial writes
fuse: reflect cached blocksize if blocksize was changed
fuse: prevent overflow in copy_file_range return value
fuse: check if copy_file_range() returns larger than requested size
fuse: do not allow mapping a non-regular backing file
Link: https://lore.kernel.org/CAJfpeguEVMMyw_zCb+hbOuSxdE2Z3Raw=SJsq=Y56Ae6dn2W3g@mail.gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use disable_work_sync() instead of cancel_work_sync() in ivpu_dev_fini()
to ensure that no new recovery work items can be queued after device
removal has started. Previously, recovery work could be scheduled even
after canceling existing work, potentially leading to use-after-free
bugs if recovery accessed freed resources.
Rename ivpu_pm_cancel_recovery() to ivpu_pm_disable_recovery() to better
reflect its new behavior.
Fixes: 58cde80f45 ("accel/ivpu: Use dedicated work for job timeout detection")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250808110939.328366-1-jacek.lawrynowicz@linux.intel.com
Patches for the arm64 defconfig are supposed to be sent to the
SoC maintainers (e.g. a change in the generic arm64 defconfig
required for Rockchip devices should be send to Heiko Stübner
as he is listed as maintainer for "ARM/Rockchip SoC support")
and not the ARM64 PORT maintainers.
While we cannot easily describe this in MAINTAINERS, we can at
least stop it from giving false information and make it behave
the same way as for the MAINTAINERS file itself (which basically
has the same rules), so that it just outputs the LKML for the
ARM64 defconfig.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250818-arm64-defconfig-v1-1-f589553c3d72@collabora.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Axiado AX3000 EVK has total of 4 UART ports. Add missing alias for uart0,
uart1, uart2.
This fixes the probe failures on the remaining UARTs.
Fixes: 1f70557790 ("arm64: dts: axiado: Add initial support for AX3000 SoC and eval board")
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Harshit Shah <hshah@axiado.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
set_track_prepare() can incur lock recursion.
The issue is that it is called from hrtimer_start_range_ns
holding the per_cpu(hrtimer_bases)[n].lock, but when enabled
CONFIG_DEBUG_OBJECTS_TIMERS, may wake up kswapd in set_track_prepare,
and try to hold the per_cpu(hrtimer_bases)[n].lock.
Avoid deadlock caused by implicitly waking up kswapd by passing in
allocation flags, which do not contain __GFP_KSWAPD_RECLAIM in the
debug_objects_fill_pool() case. Inside stack depot they are processed by
gfp_nested_mask().
Since ___slab_alloc() has preemption disabled, we mask out
__GFP_DIRECT_RECLAIM from the flags there.
The oops looks something like:
BUG: spinlock recursion on CPU#3, swapper/3/0
lock: 0xffffff8a4bf29c80, .magic: dead4ead, .owner: swapper/3/0, .owner_cpu: 3
Hardware name: Qualcomm Technologies, Inc. Popsicle based on SM8850 (DT)
Call trace:
spin_bug+0x0
_raw_spin_lock_irqsave+0x80
hrtimer_try_to_cancel+0x94
task_contending+0x10c
enqueue_dl_entity+0x2a4
dl_server_start+0x74
enqueue_task_fair+0x568
enqueue_task+0xac
do_activate_task+0x14c
ttwu_do_activate+0xcc
try_to_wake_up+0x6c8
default_wake_function+0x20
autoremove_wake_function+0x1c
__wake_up+0xac
wakeup_kswapd+0x19c
wake_all_kswapds+0x78
__alloc_pages_slowpath+0x1ac
__alloc_pages_noprof+0x298
stack_depot_save_flags+0x6b0
stack_depot_save+0x14
set_track_prepare+0x5c
___slab_alloc+0xccc
__kmalloc_cache_noprof+0x470
__set_page_owner+0x2bc
post_alloc_hook[jt]+0x1b8
prep_new_page+0x28
get_page_from_freelist+0x1edc
__alloc_pages_noprof+0x13c
alloc_slab_page+0x244
allocate_slab+0x7c
___slab_alloc+0x8e8
kmem_cache_alloc_noprof+0x450
debug_objects_fill_pool+0x22c
debug_object_activate+0x40
enqueue_hrtimer[jt]+0xdc
hrtimer_start_range_ns+0x5f8
...
Signed-off-by: yangshiguang <yangshiguang@xiaomi.com>
Fixes: 5cf909c553 ("mm/slub: use stackdepot to save stack trace in objects")
Cc: stable@vger.kernel.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
If built on architectures with CONFIG_ARCH_DMA_ADDR_T_64BIT=y nova-core
produces that following build failures:
error[E0308]: mismatched types
--> drivers/gpu/nova-core/fb.rs:49:59
|
49 | hal::fb_hal(chipset).write_sysmem_flush_page(bar, page.dma_handle())?;
| ----------------------- ^^^^^^^^^^^^^^^^^ expected `u64`, found `u32`
| |
| arguments to this method are incorrect
|
note: method defined here
--> drivers/gpu/nova-core/fb/hal.rs:19:8
|
19 | fn write_sysmem_flush_page(&self, bar: &Bar0, addr: u64) -> Result;
| ^^^^^^^^^^^^^^^^^^^^^^^
help: you can convert a `u32` to a `u64`
|
49 | hal::fb_hal(chipset).write_sysmem_flush_page(bar, page.dma_handle().into())?;
| +++++++
error[E0308]: mismatched types
--> drivers/gpu/nova-core/fb.rs:65:47
|
65 | if hal.read_sysmem_flush_page(bar) == self.page.dma_handle() {
| ------------------------------- ^^^^^^^^^^^^^^^^^^^^^^ expected `u64`, found `u32`
| |
| expected because this is `u64`
|
help: you can convert a `u32` to a `u64`
|
65 | if hal.read_sysmem_flush_page(bar) == self.page.dma_handle().into() {
| +++++++
error: this arithmetic operation will overflow
--> drivers/gpu/nova-core/falcon.rs:469:23
|
469 | .set_base((dma_start >> 40) as u16)
| ^^^^^^^^^^^^^^^^^ attempt to shift right by `40_i32`, which would overflow
|
= note: `#[deny(arithmetic_overflow)]` on by default
This is due to the code making assumptions on the width of dma_addr_t to
be 64 bit.
While this could technically be handled, it is rather painful to deal
with, as the following example illustrates:
pub(super) fn read_sysmem_flush_page_ga100(bar: &Bar0) -> DmaAddress {
let addr = u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::read(bar).adr_39_08())
<< FLUSH_SYSMEM_ADDR_SHIFT
| u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::read(bar).adr_63_40())
<< FLUSH_SYSMEM_ADDR_SHIFT_HI;
addr.try_into().unwrap_or_else(|_| {
kernel::warn_on!(true);
0
})
}
At the same time there's not much value for nova-core to support 32-bit,
given that the supported GPU architectures are Turing and later, hence
depend on CONFIG_64BIT.
Cc: John Hubbard <jhubbard@nvidia.com>
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://lore.kernel.org/lkml/20250828160247.37492-1-ojeda@kernel.org/
Fixes: 6554ad65b5 ("gpu: nova-core: register sysmem flush page")
Fixes: 69f5cd67ce ("gpu: nova-core: add falcon register definitions and base code")
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: https://lore.kernel.org/r/20250828223954.351348-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The company's email address has been changed, so update my email
address in MAINTAINERS and .mailmap files.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.x90@mail.toshiba>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
If the client sends SMB2_CREATE_POSIX_CONTEXT to ksmbd, allow the filename
to contain a colon (':'). This requires disabling the support for Alternate
Data Streams (ADS), which are denoted by a colon-separated suffix to the
filename on Windows. This should not be an issue, since this concept is not
known to POSIX anyway and the client has to explicitly request a POSIX
context to get this behavior.
Link: https://lore.kernel.org/all/f9401718e2be2ab22058b45a6817db912784ef61.camel@rx2.rx-server.de/
Signed-off-by: Philipp Kerling <pkerling@casix.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reading /proc/fs/cifs/open_dirs may hit a NULL dereference when
tcon->cfids is NULL.
Add NULL check before accessing cfids to prevent the crash.
Reproduction:
- Mount CIFS share
- cat /proc/fs/cifs/open_dirs
Fixes: 844e5c0eb1 ("smb3 client: add way to show directory leases for improved debugging")
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
batadv_nc_skb_decode_packet() trusts coded_len and checks only against
skb->len. XOR starts at sizeof(struct batadv_unicast_packet), reducing
payload headroom, and the source skb length is not verified, allowing an
out-of-bounds read and a small out-of-bounds write.
Validate that coded_len fits within the payload area of both destination
and source sk_buffs before XORing.
Fixes: 2df5278b02 ("batman-adv: network coding - receive coded packets and decode them")
Cc: stable@vger.kernel.org
Reported-by: Stanislav Fort <disclosure@aisle.com>
Signed-off-by: Stanislav Fort <stanislav.fort@aisle.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
To avoid a memory leak via mm_alloc() + mmdrop() the futex cleanup code
has been moved to __mmdrop(). This resulted in a warnings if the futex
hash table has been allocated via vmalloc() the mmdrop() was invoked
from atomic context.
The free path must stay in __mmput() to ensure it is invoked from
preemptible context.
In order to avoid the memory leak, delay the allocation of
mm_struct::mm->futex_ref to futex_hash_allocate(). This works because
neither the per-CPU counter nor the private hash has been allocated and
therefore
- futex_private_hash() callers (such as exit_pi_state_list()) don't
acquire reference if there is no private hash yet. There is also no
reference put.
- Regular callers (futex_hash()) fallback to global hash. No reference
counting here.
The futex_ref member can be allocated in futex_hash_allocate() before
the private hash itself is allocated. This happens either while the
first thread is created or on request. In both cases the process has
just a single thread so there can be either futex operation in progress
or the request to create a private hash.
Move futex_hash_free() back to __mmput();
Move the allocation of mm_struct::futex_ref to futex_hash_allocate().
[ bp: Fold a follow-up fix to prevent a use-after-free:
https://lore.kernel.org/r/20250830213806.sEKuuGSm@linutronix.de ]
Fixes: e703b7e247 ("futex: Move futex cleanup to __mmdrop()")
Closes: https://lore.kernel.org/all/20250821102721.6deae493@kernel.org/
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lkml.kernel.org/r/20250822141238.PfnkTjFb@linutronix.de
Reinstate the rotational media flag for the CD-ROM driver. The flag has
been cleared since commit bd4a633b6f ("block: move the nonrot flag to
queue_limits") and this breaks some applications.
Move queue limit configuration from get_sectorsize() to
sr_revalidate_disk() and set the rotational flag.
Cc: Christoph Hellwig <hch@lst.de>
Fixes: bd4a633b6f ("block: move the nonrot flag to queue_limits")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250827113550.2614535-1-ming.lei@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix a use-after-free window by correcting the buffer release sequence in
the deferred receive path. The code freed the RQ buffer first and only
then cleared the context pointer under the lock. Concurrent paths (e.g.,
ABTS and the repost path) also inspect and release the same pointer under
the lock, so the old order could lead to double-free/UAF.
Note that the repost path already uses the correct pattern: detach the
pointer under the lock, then free it after dropping the lock. The
deferred path should do the same.
Fixes: 472e146d1c ("scsi: lpfc: Correct upcalling nvmet_fc transport during io done downcall")
Cc: stable@vger.kernel.org
Signed-off-by: John Evans <evans1210144@gmail.com>
Link: https://lore.kernel.org/r/20250828044008.743-1-evans1210144@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The tasdev_load_calibrated_data() function expects the calibration data
values in the cali_data buffer as R0, R0Low, InvR0, Power, TLim which
is not the same as what tas2563_save_calibration() writes to the buffer.
Reorder the EFI variables in the tas2563_save_calibration() function
to put the values in the buffer in the correct order.
Fixes: 4fe2385134 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Link: https://patch.msgid.link/20250829160450.66623-2-soyer@irl.hu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Before conversion to unify the calibration data management, the
tas2563_apply_calib() function performed the big endian conversion and
wrote the calibration data to the device. The writing is now done by the
common tasdev_load_calibrated_data() function, but without conversion.
Put the values into the calibration data buffer with the expected
endianness.
Fixes: 4fe2385134 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Link: https://patch.msgid.link/20250829160450.66623-1-soyer@irl.hu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The ALSA HwDep character device of the firewire-motu driver incorrectly
returns EPOLLOUT in poll(2), even though the driver implements no operation
for write(2). This misleads userspace applications to believe write() is
allowed, potentially resulting in unnecessarily wakeups.
This issue dates back to the driver's initial code added by a commit
71c3797779 ("ALSA: firewire-motu: add hwdep interface"), and persisted
when POLLOUT was updated to EPOLLOUT by a commit a9a08845e9 ('vfs: do
bulk POLL* -> EPOLL* replacement("").').
This commit fixes the bug.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://patch.msgid.link/20250829233749.366222-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Stefan Wahren says:
====================
microchip: lan865x: Fix probing issues
Recently I setup a customer i.MX93 board which contains a LAN8651 chip.
During this process I discovered some probing related issues.
====================
Link: https://patch.msgid.link/20250827115341.34608-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dsp_hwec_enable() allocates dup pointer by kstrdup(arg),
but then it updates dup variable by strsep(&dup, ",").
As a result when it calls kfree(dup), the dup variable may be
a modified pointer that no longer points to the original allocated
memory, causing a memory leak.
The issue is the same pattern as fixed in commit c6a502c229
("mISDN: Fix memory leak in dsp_pipeline_build()").
Fixes: 9a43816182 ("mISDN: Remove VLAs")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250828081457.36061-1-linmq006@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ptp_ocp_detach() only shuts down the watchdog timer if it is
pending. However, if the timer handler is already running, the
timer_delete_sync() is not called. This leads to race conditions
where the devlink that contains the ptp_ocp is deallocated while
the timer handler is still accessing it, resulting in use-after-free
bugs. The following details one of the race scenarios.
(thread 1) | (thread 2)
ptp_ocp_remove() |
ptp_ocp_detach() | ptp_ocp_watchdog()
if (timer_pending(&bp->watchdog))| bp = timer_container_of()
timer_delete_sync() |
|
devlink_free(devlink) //free |
| bp-> //use
Resolve this by unconditionally calling timer_delete_sync() to ensure
the timer is reliably deactivated, preventing any access after free.
Fixes: 773bda9649 ("ptp: ocp: Expose various resources on the timecard.")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250828082949.28189-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal says:
====================
netfilter updates for net
1) Remove bogus WARN_ON in br_netfilter that came in 6.8.
This is now more prominent due to
commit 2d72afb340 ("netfilter: nf_conntrack: fix crash due to
removal of uninitialised entry"). From Wang Liang.
2) Better error reporting when a helper module clashes with
an existing helper name: -EEXIST makes modprobe believe that
the module is already loaded, so error message is elided.
From Phil Sutter.
* tag 'nf-25-08-27' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: helper: Replace -EEXIST by -EBUSY
netfilter: br_netfilter: do not check confirmed bit in br_nf_local_in() after confirm
====================
Link: https://patch.msgid.link/20250827133900.16552-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub says:
nft_flowtable.sh is one of the most flake-atious test for netdev CI currently :(
The root cause is two-fold:
1. the failing part of the test is supposed to make sure that ip
fragments are forwarded for offloaded flows.
(flowtable has to pass them to classic forward path).
path mtu discovery for these subtests is disabled.
2. nft_flowtable.sh has two passes. One with fixed mtus/file size and
one where link mtus and file sizes are random.
The CI failures all have same pattern:
re-run with random mtus and file size: -o 27663 -l 4117 -r 10089 -s 54384840
[..]
PASS: dscp_egress: dscp packet counters match
FAIL: file mismatch for ns1 -> ns2
In some cases this error triggers a bit ealier, sometimes in a later
subtest:
re-run with random mtus and file size: -o 20201 -l 4555 -r 12657 -s 9405856
[..]
PASS: dscp_egress: dscp packet counters match
PASS: dscp_fwd: dscp packet counters match
2025/08/17 20:37:52 socat[18954] E write(7, 0x560716b96000, 8192): Broken pipe
FAIL: file mismatch for ns1 -> ns2
-rw------- 1 root root 9405856 Aug 17 20:36 /tmp/tmp.2n63vlTrQe
But all logs I saw show same scenario:
1. Failing tests have pmtu discovery off (i.e., ip fragmentation)
2. The test file is much larger than first-pass default (2M Byte)
3. peers have much larger MTUs compared to the 'network'.
These errors are very reproducible when re-running the test with
the same commandline arguments.
The timeout became much more prominent with
1d2fbaad7c ("tcp: stronger sk_rcvbuf checks"): reassembled packets
typically have a skb->truesize more than double the skb length.
As that commit is intentional and pmtud-off with
large-tcp-packets-as-fragments is not normal adjust the test to use a
smaller file for the pmtu-off subtests.
While at it, add more information to pass/fail messages and
also run the dscp alteration subtest with pmtu discovery enabled.
Link: https://netdev.bots.linux.dev/contest.html?test=nft-flowtable-sh
Fixes: f84ab63490 ("selftests: netfilter: nft_flowtable.sh: re-run with random mtu sizes")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250822071330.4168f0db@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250828214918.3385-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Some fixes for the current cycle:
- mt76: MLO regressions, offchannel handling, list corruption
- mac80211: scan allocation size, no 40 MHz EHT, signed type
- rt2x00: (randconfig) build
- cfg80211: use-after-free
- iwlwifi: config/old devices, BIOS compatibility
- mwifiex: vmalloc content leak
* tag 'wireless-2025-08-28' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (29 commits)
wifi: iwlwifi: cfg: add back more lost PCI IDs
wifi: iwlwifi: fix byte count table for old devices
wifi: iwlwifi: cfg: restore some 1000 series configs
wifi: mwifiex: Initialize the chan_stats array to zero
wifi: mac80211: do not permit 40 MHz EHT operation on 5/6 GHz
wifi: iwlwifi: uefi: check DSM item validity
wifi: iwlwifi: acpi: check DSM func validity
wifi: iwlwifi: if scratch is ~0U, consider it a failure
wifi: mt76: fix linked list corruption
wifi: mt76: free pending offchannel tx frames on wcid cleanup
wifi: mt76: mt7915: fix list corruption after hardware restart
wifi: mt76: mt7996: add missing check for rx wcid entries
wifi: mt76: do not add non-sta wcid entries to the poll list
wifi: mt76: mt7996: fix crash on some tx status reports
wifi: mt76: mt7996: use the correct vif link for scanning/roc
wifi: mt76: mt7996: disable beacons when going offchannel
wifi: mt76: prevent non-offchannel mgmt tx during scan/roc
wifi: mt76: mt7925: skip EHT MLD TLV on non-MLD and pass conn_state for sta_cmd
wifi: mt76: mt7925u: use connac3 tx aggr check in tx complete
wifi: mt76: mt7925: fix the wrong bss cleanup for SAP
...
====================
Link: https://patch.msgid.link/20250828122654.1167754-8-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the creation of debugfs files into a dedicated function, and ensure
they are explicitly removed during vhci_release(), before associated
data structures are freed.
Previously, debugfs files such as "force_suspend", "force_wakeup", and
others were created under hdev->debugfs but not removed in
vhci_release(). Since vhci_release() frees the backing vhci_data
structure, any access to these files after release would result in
use-after-free errors.
Although hdev->debugfs is later freed in hci_release_dev(), user can
access files after vhci_data is freed but before hdev->debugfs is
released.
Fixes: ab4e4380d4 ("Bluetooth: Add vhci devcoredump support")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
nfs_server_set_fsinfo() shouldn't assume that NFS_CAP_XATTR is unset
on entry to the function.
Fixes: b78ef845c3 ("NFSv4.2: query the server for extended attribute support")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
_nfs4_server_capabilities() should clear capabilities that are not
supported by the server.
Fixes: d2a00cceb9 ("NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
_nfs4_server_capabilities() is expected to clear any flags that are not
supported by the server.
Fixes: 8a59bb93b7 ("NFSv4 store server support for fs_location attribute")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Don't clear the capabilities that are not going to get reset by the call
to _nfs4_server_capabilities().
Reported-by: Scott Haiden <scott.b.haiden@gmail.com>
Fixes: b01f21cacd ("NFS: Fix the setting of capabilities when automounting a new filesystem")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Nouveau has code that when it gets an IRQ with no allowed handler
it disables it to avoid storms.
However with nonstall interrupts, we often disable them from
the drm driver, but still request their emission via the push submission.
Just don't disable nonstall irqs ever in normal operation, the
event handling code will filter them out, and the driver will
just enable/disable them at load time.
This fixes timeouts we've been seeing on/off for a long time,
but they became a lot more noticeable on Blackwell.
This doesn't fix all of them, there is a subsequent fence emission
fix to fix the last few.
Fixes: 3ebd64aa3c ("drm/nouveau/intr: support multiple trees, and explicit interfaces")
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250829021633.1674524-1-airlied@gmail.com
[ Fix a typo and a minor checkpatch.pl warning; remove "v2" from commit
subject. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Using the previous firmware could lead to problems with
PROTECTED_FENCE_SIGNAL commands, specifically causing register
conflicts between MCU_DBG0 and MCU_DBG1.
The updated firmware versions ensure proper alignment
and unification of the SDMA_SUBOP_PROTECTED_FENCE_SIGNAL value with SDMA 7.x,
resolving these hardware coordination issues
Fixes: e8cca30d8b ("drm/amdgpu/sdma6: add ucode version checks for userq support")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aab8b689ad)
Cc: stable@vger.kernel.org
The fans controlled by the driver can get stuck at 0 RPM if they are
configured below a 20% duty cycle. The driver tries to avoid this by
enforcing a minimum duty cycle of 20%, but this is done after the fans
are registered with the thermal subsystem. This is too late as the
thermal subsystem can set their current state before the driver is able
to enforce the minimum duty cycle.
Fix by setting the minimum duty cycle before registering the fans with
the thermal subsystem.
Fixes: d7efb2ebc7 ("hwmon: (mlxreg-fan) Extend driver to support multiply cooling devices")
Reported-by: Nikolay Aleksandrov <razor@blackwall.org>
Tested-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250730201715.1111133-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Currently, tpmi_get_logical_id() calls topology_physical_package_id()
to set the pkg_id of the info structure. Since some VM hosts assign non
contiguous package IDs, topology_physical_package_id() can return a
larger value than topology_max_packages(). This will result in an
invalid reference into tpmi_power_domain_mask[] as that is allocatead
based on topology_max_packages() as the maximum package ID.
Fixes: 17ca278045 ("platform/x86/intel: TPMI domain id and CPU mapping")
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20250829113859.1772827-1-darcari@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
In commit 1352964774 ("spi: microchip-core-qspi: Support per spi-mem
operation frequency switches") the logic for checking the viability of
op->max_freq in mchp_coreqspi_setup_clock() was copied into
mchp_coreqspi_supports_op(). Unfortunately, op->max_freq is not valid
when this function is called during probe but is instead zero.
Accordingly, baud_rate_val is calculated to be INT_MAX due to division
by zero, causing probe of the attached memory device to fail.
Seemingly spi-microchip-core-qspi was the only driver that had such a
modification made to its supports_op callback when the per_op_freq
capability was added, so just remove it to restore prior functionality.
CC: stable@vger.kernel.org
Reported-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Fixes: 1352964774 ("spi: microchip-core-qspi: Support per spi-mem operation frequency switches")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Message-ID: <20250825-during-ploy-939bdd068593@spud>
Signed-off-by: Mark Brown <broonie@kernel.org>
ASoC: Fixes for v6.17
The main fixes here are for some of the cleanups done in the core in
this release, we had broken component lookup in the case with a single
bus and DMA controller. Otherwise it's driver specific changes, the
shortlogs for the Intel WCL and rsnd drivers look like minor cleanups
but are actually bugfixes (adding an op needed for correct functionality
and reverting an inappropriate helper usage).
Commit 620c266f39 ("fhandle: relax open_by_handle_at() permission
checks") relaxed the coditions for decoding a file handle from non init
userns.
The conditions are that that decoded dentry is accessible from the user
provided mountfd (or to fs root) and that all the ancestors along the
path have a valid id mapping in the userns.
These conditions are intentionally more strict than the condition that
the decoded dentry should be "lookable" by path from the mountfd.
For example, the path /home/amir/dir/subdir is lookable by path from
unpriv userns of user amir, because /home perms is 755, but the owner of
/home does not have a valid id mapping in unpriv userns of user amir.
The current code did not check that the decoded dentry itself has a
valid id mapping in the userns. There is no security risk in that,
because that final open still performs the needed permission checks,
but this is inconsistent with the checks performed on the ancestors,
so the behavior can be a bit confusing.
Add the check for the decoded dentry itself, so that the entire path,
including the last component has a valid id mapping in the userns.
Fixes: 620c266f39 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250827194309.1259650-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
For Intel SKL and KBL platforms, it may be bound with one of three
HD-audio drivers (AVS, SOF and legacy). AVS is the preferred one when
DMIC is detected, and that's how it's defined in the snd-intel-dspcfg
config table.
But, when AVS driver is disabled (CONFIG_SND_SOC_INTEL_AVS=n), the
device may be bound freely with either SOF or legacy driver.
Before 6.17, the legacy driver took it primarily, but on 6.17, likely
due to the recent code shuffling, SOF driver seems taking it at first,
and fails to probe. For avoiding the regression, we should enforce to
bind those with the legacy HD-audio drvier when AVS is disabled.
This patch adds the extra two entries in intel-dspcfg table that are
applied only when CONFIG_SND_SOC_INTEL_AVS=n, for binding with the
legacy driver.
Note that there are entries for APL in that config table block, but
APL may be supported by SOF for certain setups, so the choice can't be
exclusive. Hence this patch includes only SKL and KBL.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1248121
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20250828141101.16294-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Deny all sampling event by the CPUMF counter facility device driver
and return -ENOENT. This return value is used to try other PMUs.
Up to now events for type PERF_TYPE_HARDWARE were not tested for
sampling and returned later on -EOPNOTSUPP. This ends the search
for alternative PMUs. Change that behavior and try other PMUs
instead.
Fixes: 613a41b0d1 ("s390/cpum_cf: Reject request for sampling in event initialization")
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Each PAI PMU device driver returns -EINVAL when an event is out of
its accepted range. This return value aborts the search for an
alternative PMU device driver to handle this event.
Change the return value to -ENOENT. This return value is used to
try other PMUs instead. This makes the PMUs more robust when
the sequence of PMU device driver initialization changes (at boot time)
or by using modules.
Fixes: 39d62336f5 ("s390/pai: add support for cryptography counters")
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
In case OOB write is requested during a data write, ECC is currently
lost. Avoid this issue by only writing in the free spare area.
This issue has been seen with a YAFFS2 file system.
Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Cc: stable@vger.kernel.org
Fixes: 2cd457f328 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
If a ma35_nand_chip_init() call fails, then a reference to 'nand_np' still
needs to be released.
Use for_each_child_of_node_scoped() to fix the issue.
Fixes: 5abb5d414d ("mtd: rawnand: nuvoton: add new driver for the Nuvoton MA35 SoC")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Intel Discrete Graphics non-volatile memory is only present on Intel
discrete graphics devices, and its auxiliary device is instantiated by
the Intel i915 and Xe2 DRM drivers. Hence add dependencies on DRM_I915
and DRM_XE, to prevent asking the user about this driver when
configuring a kernel without Intel graphics support.
Fixes: ceb5ab3cb6 ("mtd: add driver for intel graphics non-volatile memory device")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Miri Korenblit says:
====================
a few fixes, mainly of the cfg rework.
====================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The adapter->chan_stats[] array is initialized in
mwifiex_init_channel_scan_gap() with vmalloc(), which doesn't zero out
memory. The array is filled in mwifiex_update_chan_statistics()
and then the user can query the data in mwifiex_cfg80211_dump_survey().
There are two potential issues here. What if the user calls
mwifiex_cfg80211_dump_survey() before the data has been filled in.
Also the mwifiex_update_chan_statistics() function doesn't necessarily
initialize the whole array. Since the array was not initialized at
the start that could result in an information leak.
Also this array is pretty small. It's a maximum of 900 bytes so it's
more appropriate to use kcalloc() instead vmalloc().
Cc: stable@vger.kernel.org
Fixes: bf35443314 ("mwifiex: channel statistics support for mwifiex")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20250815023055.477719-1-rongqianfeng@vivo.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space. These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.
Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner. For
example, see commit 9b861528a8 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").
However, this approach has proven fragile for following reasons:
1) It is easy to forget to perform the necessary page table
synchronization when introducing new changes.
For instance, commit 4917f55b4e ("mm/sparse-vmemmap: improve memory
savings for compound devmaps") overlooked the need to synchronize
page tables for the vmemmap area.
2) It is also easy to overlook that the vmemmap and direct mapping areas
must not be accessed before explicit page table synchronization.
For example, commit 8d400913c2 ("x86/vmemmap: handle unpopulated
sub-pmd ranges")) caused crashes by accessing the vmemmap area
before calling sync_global_pgds().
To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables. These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.
They reuse existing infrastructure for vmalloc and ioremap.
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by
arch_sync_kernel_mappings().
This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced. Currently, these helpers are no-ops since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
In theory, PUD and PMD level helpers can be added later if needed by other
architectures. For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.
[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()]
Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com
Fixes: 8d400913c2 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: bibo mao <maobibo@loongson.cn>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount of
persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
RIP: 0010:__init_single_page+0x9/0x6d
Call Trace:
<TASK>
__init_zone_device_page+0x17/0x5d
memmap_init_zone_device+0x154/0x1bb
pagemap_range+0x2e0/0x40f
memremap_pages+0x10b/0x2f0
devm_memremap_pages+0x1e/0x60
dev_dax_probe+0xce/0x2ec [device_dax]
dax_bus_probe+0x6d/0xc9
[... snip ...]
</TASK>
It turns out that the kernel panics while initializing vmemmap (struct
page array) when the vmemmap region spans two PGD entries, because the new
PGD entry is only installed in init_mm.pgd, but not in the page tables of
other tasks.
And looking at __populate_section_memmap():
if (vmemmap_can_optimize(altmap, pgmap))
// does not sync top level page tables
r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
else
// sync top level page tables in x86
r = vmemmap_populate(start, end, nid, altmap);
In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top level page table (See commit 9b861528a8 ("x86-64,
mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so
that all tasks in the system can see the new vmemmap area.
However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.
We're not the first party to encounter a crash caused by not-sync'd top
level page tables: earlier this year, Gwan-gyeong Mun attempted to address
the issue [1] [2] after hitting a kernel panic when x86 code accessed the
vmemmap area before the corresponding top-level entries were synced. At
that time, the issue was believed to be triggered only when struct page
was enlarged for debugging purposes, and the patch did not get further
updates.
It turns out that current approach of relying on each arch to handle the
page table sync manually is fragile because 1) it's easy to forget to sync
the top level page table, and 2) it's also easy to overlook that the
kernel should not access the vmemmap and direct mapping areas before the
sync.
# The solution: Make page table sync more code robust and harder to miss
To address this, Dave Hansen suggested [3] [4] introducing
{pgd,p4d}_populate_kernel() for updating kernel portion of the page tables
and allow each architecture to explicitly perform synchronization when
installing top-level entries. With this approach, we no longer need to
worry about missing the sync step, reducing the risk of future
regressions.
The new interface reuses existing ARCH_PAGE_TABLE_SYNC_MASK,
PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by
vmalloc and ioremap to synchronize page tables.
pgd_populate_kernel() looks like this:
static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
p4d_t *p4d)
{
pgd_populate(&init_mm, pgd, p4d);
if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
arch_sync_kernel_mappings(addr, addr);
}
It is worth noting that vmalloc() and apply_to_range() carefully
synchronizes page tables by calling p*d_alloc_track() and
arch_sync_kernel_mappings(), and thus they are not affected by this patch
series.
This series was hugely inspired by Dave Hansen's suggestion and hence
added Suggested-by: Dave Hansen.
Cc stable because lack of this series opens the door to intermittent
boot failures.
This patch (of 3):
Move ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to
linux/pgtable.h so that they can be used outside of vmalloc and ioremap.
Link: https://lkml.kernel.org/r/20250818020206.4517-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-2-harry.yoo@oracle.com
Link: https://lore.kernel.org/linux-mm/20250220064105.808339-1-gwan-gyeong.mun@intel.com [1]
Link: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@intel.com [2]
Link: https://lore.kernel.org/linux-mm/d1da214c-53d3-45ac-a8b6-51821c5416e4@intel.com [3]
Link: https://lore.kernel.org/linux-mm/4d800744-7b88-41aa-9979-b245e8bf794b@intel.com [4]
Fixes: 8d400913c2 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: bibo mao <maobibo@loongson.cn>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For !CONFIG_SPARSEMEM_VMEMMAP, memmap page accounting is currently done
upfront in sparse_buffer_init(). However, sparse_buffer_alloc() may
return NULL in failure scenario.
Also, memmap pages may be allocated either from the memblock allocator
during early boot or from the buddy allocator. When removed via
arch_remove_memory(), accounting of memmap pages must reflect the original
allocation source.
To ensure correctness:
* Account memmap pages after successful allocation in sparse_init_nid()
and section_activate().
* Account memmap pages in section_deactivate() based on allocation
source.
Link: https://lkml.kernel.org/r/20250807183545.1424509-1-sumanthk@linux.ibm.com
Fixes: 15995a3524 ("mm: report per-page metadata information")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
On 32-bit systems, the throughput calculation in
damos_set_effective_quota() is prone to unnecessary multiplication
overflow. Using mult_frac() to fix it.
Andrew Paniakin also recently found and privately reported this issue, on
64 bit systems. This can also happen on 64-bit systems, once the charged
size exceeds ~17 TiB. On systems running for long time in production,
this issue can actually happen.
More specifically, when a DAMOS scheme having the time quota run for
longtime, throughput calculation can overflow and set esz too small. As a
result, speed of the scheme get unexpectedly slow.
Link: https://lkml.kernel.org/r/20250821125555.3020951-1-yanquanmin1@huawei.com
Fixes: 1cd2430300 ("mm/damon/schemes: implement time quota")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reported-by: Andrew Paniakin <apanyaki@amazon.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 07d2490297 ("kexec: enable CMA based contiguous allocation")
introduces logic to use CMA-based allocation in kexec by default. As part
of the changes, it introduces a kexec_file_load flag to disable the use of
CMA allocations from userspace. However, this flag is broken since it is
missing from the list of legal flags for kexec_file_load. kexec_file_load
returns EINVAL when attempting to use the flag.
Fix this by adding the KEXEC_FILE_NO_CMA flag to the list of legal flags
for kexec_file_load.
Without this fix, kexec_file_load syscall will failed and return
'-EINVAL' when KEXEC_FILE_NO_CMA is specified.
Link: https://lkml.kernel.org/r/20250805211527.122367-2-makb@juniper.net
Fixes: 07d2490297 ("kexec: enable CMA based contiguous allocation")
Signed-off-by: Brian Mak <makb@juniper.net>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
GCC doesn't support "hwasan-kernel-mem-intrinsic-prefix", only
"asan-kernel-mem-intrinsic-prefix"[0], while LLVM supports both. This is
already taken into account when checking
"CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX", but not in the KASAN Makefile
adding those parameters when "CONFIG_KASAN_SW_TAGS" is enabled.
Replace the version check with "CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX",
which already validates that mem-intrinsic prefix parameter can be used,
and choose the correct name depending on compiler.
GCC 13 and above trigger "CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX" which
prevents `mem{cpy,move,set}()` being redefined in "mm/kasan/shadow.c"
since commit 36be5cba99 ("kasan: treat meminstrinsic as builtins in
uninstrumented files"), as we expect the compiler to prefix those calls
with `__(hw)asan_` instead. But as the option passed to GCC has been
incorrect, the compiler has not been emitting those prefixes, effectively
never calling the instrumented versions of `mem{cpy,move,set}()` with
"CONFIG_KASAN_SW_TAGS" enabled.
If "CONFIG_FORTIFY_SOURCES" is enabled, this issue would be mitigated as
it redefines `mem{cpy,move,set}()` and properly aliases the
`__underlying_mem*()` that will be called to the instrumented versions.
Link: https://lkml.kernel.org/r/20250821120735.156244-1-ada.coupriediaz@arm.com
Link: https://gcc.gnu.org/onlinedocs/gcc-13.4.0/gcc/Optimize-Options.html [0]
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Fixes: 36be5cba99 ("kasan: treat meminstrinsic as builtins in uninstrumented files")
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Functions __kasan_populate_vmalloc() and __kasan_depopulate_vmalloc() use
apply_to_pte_range(), which enters lazy MMU mode. In that mode updating
PTEs may not be observed until the mode is left.
That may lead to a situation in which otherwise correct reads and writes
to a PTE using ptep_get(), set_pte(), pte_clear() and other access
primitives bring wrong results when the vmalloc shadow memory is being
(de-)populated.
To avoid these hazards leave the lazy MMU mode before and re-enter it
after each PTE manipulation.
Link: https://lkml.kernel.org/r/0d2efb7ddddbff6b288fbffeeb10166e90771718.1755528662.git.agordeev@linux.ibm.com
Fixes: 3c5c3cfb9e ("kasan: support backing vmalloc space with real shadow memory")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Similar to commit 09c6304e38 ("kasan: test: fix compatibility with
FORTIFY_SOURCE") the kernel is panicing in kasan_string().
This is due to the `src` and `ptr` not being hidden from the optimizer
which would disable the runtime fortify string checker.
Call trace:
__fortify_panic+0x10/0x20 (P)
kasan_strings+0x980/0x9b0
kunit_try_run_case+0x68/0x190
kunit_generic_run_threadfn_adapter+0x34/0x68
kthread+0x1c4/0x228
ret_from_fork+0x10/0x20
Code: d503233f a9bf7bfd 910003fd 9424b243 (d4210000)
---[ end trace 0000000000000000 ]---
note: kunit_try_catch[128] exited with irqs disabled
note: kunit_try_catch[128] exited with preempt_count 1
# kasan_strings: try faulted: last
** replaying previous printk message **
# kasan_strings: try faulted: last line seen mm/kasan/kasan_test_c.c:1600
# kasan_strings: internal error occurred preventing test case from running: -4
Link: https://lkml.kernel.org/r/20250801120236.2962642-1-yeoreum.yun@arm.com
Fixes: 73228c7ecc ("KASAN: port KASAN Tests to KUnit")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With CONFIG_HIGHPTE on 32-bit ARM, move_pages_pte() maps PTE pages using
kmap_local_page(), which requires unmapping in Last-In-First-Out order.
The current code maps dst_pte first, then src_pte, but unmaps them in the
same order (dst_pte, src_pte), violating the LIFO requirement. This
causes the warning in kunmap_local_indexed():
WARNING: CPU: 0 PID: 604 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
addr \!= __fix_to_virt(FIX_KMAP_BEGIN + idx)
Fix this by reversing the unmap order to respect LIFO ordering.
This issue follows the same pattern as similar fixes:
- commit eca6828403 ("crypto: skcipher - fix mismatch between mapping and unmapping order")
- commit 8cf57c6df8 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
Both of which addressed the same fundamental requirement that kmap_local
operations must follow LIFO ordering.
Link: https://lkml.kernel.org/r/20250731144431.773923-1-sashal@kernel.org
Fixes: adef440691 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The helper registration return value is passed-through by module_init
callbacks which modprobe confuses with the harmless -EEXIST returned
when trying to load an already loaded module.
Make sure modprobe fails so users notice their helper has not been
registered and won't work.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
When send a broadcast packet to a tap device, which was added to a bridge,
br_nf_local_in() is called to confirm the conntrack. If another conntrack
with the same hash value is added to the hash table, which can be
triggered by a normal packet to a non-bridge device, the below warning
may happen.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 96 at net/bridge/br_netfilter_hooks.c:632 br_nf_local_in+0x168/0x200
CPU: 1 UID: 0 PID: 96 Comm: tap_send Not tainted 6.17.0-rc2-dirty #44 PREEMPT(voluntary)
RIP: 0010:br_nf_local_in+0x168/0x200
Call Trace:
<TASK>
nf_hook_slow+0x3e/0xf0
br_pass_frame_up+0x103/0x180
br_handle_frame_finish+0x2de/0x5b0
br_nf_hook_thresh+0xc0/0x120
br_nf_pre_routing_finish+0x168/0x3a0
br_nf_pre_routing+0x237/0x5e0
br_handle_frame+0x1ec/0x3c0
__netif_receive_skb_core+0x225/0x1210
__netif_receive_skb_one_core+0x37/0xa0
netif_receive_skb+0x36/0x160
tun_get_user+0xa54/0x10c0
tun_chr_write_iter+0x65/0xb0
vfs_write+0x305/0x410
ksys_write+0x60/0xd0
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
---[ end trace 0000000000000000 ]---
To solve the hash conflict, nf_ct_resolve_clash() try to merge the
conntracks, and update skb->_nfct. However, br_nf_local_in() still use the
old ct from local variable 'nfct' after confirm(), which leads to this
warning.
If confirm() does not insert the conntrack entry and return NF_DROP, the
warning may also occur. There is no need to reserve the WARN_ON_ONCE, just
remove it.
Link: https://lore.kernel.org/netdev/20250820043329.2902014-1-wangliang74@huawei.com/
Fixes: 62e7151ae3 ("netfilter: bridge: confirm multicast packets before passing them up the stack")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Since stations are recreated from scratch, all lists that wcids are added
to must be cleared before calling ieee80211_restart_hw.
Set wcid->sta = 0 for each wcid entry in order to ensure that they are
not added again before they are ready.
Fixes: 8a55712d12 ("wifi: mt76: mt7915: enable full system reset support")
Link: https://patch.msgid.link/20250827085352.51636-4-nbd@nbd.name
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Non-station wcid entries must not be passed to the rx functions.
In case of the global wcid entry, it could even lead to corruption in the wcid
array due to pointer being casted to struct mt7996_sta_link using container_of.
Fixes: 7464b12b7d ("wifi: mt76: mt7996: rework mt7996_rx_get_wcid to support MLO")
Link: https://patch.msgid.link/20250827085352.51636-3-nbd@nbd.name
Signed-off-by: Felix Fietkau <nbd@nbd.name>
When a wcid can't be found, link_sta can be stale from a previous batch.
The code currently assumes that if link_sta is set, wcid is also non-zero.
Fix wcid NULL pointer dereference by resetting link_sta when a wcid entry
can't be found.
Fixes: 62da647a2b ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
Link: https://patch.msgid.link/20250827085352.51636-1-nbd@nbd.name
Signed-off-by: Felix Fietkau <nbd@nbd.name>
When station mode, don't disconnect when we get
channel switch from AP to DFS channel. Most APs
send CSA request after pass background CAC. In other
case we should disconnect after detect beacon miss.
Without patch when we get CSA to DFS channel get:
"kernel: wlo1: preparing for channel switch failed, disconnecting"
Fixes: 8aa2f59260 ("wifi: mt76: mt7921: introduce CSA support")
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@gmail.com>
Link: https://patch.msgid.link/20250716165443.28354-1-janusz.dziedzic@gmail.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
A new warning in clang [1] points out a couple of places where a hdr
variable is not initialized then passed along to skb_put_data().
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:1894:21: warning: variable 'hdr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
1894 | skb_put_data(skb, &hdr, sizeof(hdr));
| ^~~
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:3386:21: warning: variable 'hdr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
3386 | skb_put_data(skb, &hdr, sizeof(hdr));
| ^~~
Zero initialize these headers as done in other places in the driver when
there is nothing stored in the header.
Cc: stable@vger.kernel.org
Fixes: 98686cd216 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
Link: 00dacf8c22 [1]
Closes: https://github.com/ClangBuiltLinux/linux/issues/2104
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20250715-mt7996-fix-uninit-const-pointer-v1-1-b5d8d11d7b78@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Currently the S1G capability element is not taken into account
for the scan_ies_len, which leads to a buffer length validation
failure in ieee80211_prep_hw_scan() and subsequent WARN in
__ieee80211_start_scan(). This prevents hw scanning from functioning.
To fix ensure we accommodate for the S1G capability length.
Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20250826085437.3493-1-lachlan.hodges@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The brcmf_btcoex_detach() only shuts down the btcoex timer, if the
flag timer_on is false. However, the brcmf_btcoex_timerfunc(), which
runs as timer handler, sets timer_on to false. This creates critical
race conditions:
1.If brcmf_btcoex_detach() is called while brcmf_btcoex_timerfunc()
is executing, it may observe timer_on as false and skip the call to
timer_shutdown_sync().
2.The brcmf_btcoex_timerfunc() may then reschedule the brcmf_btcoex_info
worker after the cancel_work_sync() has been executed, resulting in
use-after-free bugs.
The use-after-free bugs occur in two distinct scenarios, depending on
the timing of when the brcmf_btcoex_info struct is freed relative to
the execution of its worker thread.
Scenario 1: Freed before the worker is scheduled
The brcmf_btcoex_info is deallocated before the worker is scheduled.
A race condition can occur when schedule_work(&bt_local->work) is
called after the target memory has been freed. The sequence of events
is detailed below:
CPU0 | CPU1
brcmf_btcoex_detach | brcmf_btcoex_timerfunc
| bt_local->timer_on = false;
if (cfg->btcoex->timer_on) |
... |
cancel_work_sync(); |
... |
kfree(cfg->btcoex); // FREE |
| schedule_work(&bt_local->work); // USE
Scenario 2: Freed after the worker is scheduled
The brcmf_btcoex_info is freed after the worker has been scheduled
but before or during its execution. In this case, statements within
the brcmf_btcoex_handler() — such as the container_of macro and
subsequent dereferences of the brcmf_btcoex_info object will cause
a use-after-free access. The following timeline illustrates this
scenario:
CPU0 | CPU1
brcmf_btcoex_detach | brcmf_btcoex_timerfunc
| bt_local->timer_on = false;
if (cfg->btcoex->timer_on) |
... |
cancel_work_sync(); |
... | schedule_work(); // Reschedule
|
kfree(cfg->btcoex); // FREE | brcmf_btcoex_handler() // Worker
/* | btci = container_of(....); // USE
The kfree() above could | ...
also occur at any point | btci-> // USE
during the worker's execution|
*/ |
To resolve the race conditions, drop the conditional check and call
timer_shutdown_sync() directly. It can deactivate the timer reliably,
regardless of its current state. Once stopped, the timer_on state is
then set to false.
Fixes: 61730d4dff ("brcmfmac: support critical protocol API for DHCP")
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://patch.msgid.link/20250822050839.4413-1-duoming@zju.edu.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Compile-testing this driver on Arm platforms shows a link failure
when the CRC functions are not part of the kernel:
x86_64-linux-ld: drivers/net/wireless/ralink/rt2x00/rt2800lib.o: in function `rt2800_check_firmware':
rt2800lib.c:(.text+0x20e5): undefined reference to `crc_ccitt'
Move the select statement to the correct Kconfig symbol to match
the call site.
Fixes: 311b05e235 ("wifi: rt2x00: add COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Stanislaw Gruszka <stf_xl@wp.pl>
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://patch.msgid.link/20250731075837.1969136-1-arnd@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On regular fuse filesystems, i_blkbits is set to PAGE_SHIFT which means
any iomap partial writes will mark the entire folio as uptodate. However
fuseblk filesystems work differently and allow the blocksize to be less
than the page size. As such, this may lead to data corruption if fuseblk
sets its blocksize to less than the page size, uses the writeback cache,
and does a partial write, then a read and the read happens before the
write has undergone writeback, since the folio will not be marked
uptodate from the partial write so the read will read in the entire
folio from disk, which will overwrite the partial write.
The long-term solution for this, which will also be needed for fuse to
enable large folios with the writeback cache on, is to have fuse also
use iomap for folio reads, but until that is done, the cleanest
workaround is to use the page size for fuseblk's internal kernel inode
blksize/blkbits values while maintaining current behavior for stat().
This was verified using ntfs-3g:
$ sudo mkfs.ntfs -f -c 512 /dev/vdd1
$ sudo ntfs-3g /dev/vdd1 ~/fuseblk
$ stat ~/fuseblk/hi.txt
IO Block: 512
Fixes: a4c9ab1d49 ("fuse: use iomap for buffered writes")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
As pointed out by Miklos[1], in the fuse_update_get_attr() path, the
attributes returned to stat may be cached values instead of fresh ones
fetched from the server. In the case where the server returned a
modified blocksize value, we need to cache it and reflect it back to
stat if values are not re-fetched since we now no longer directly change
inode->i_blkbits.
Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguCOxeVX88_zPd1hqziB_C+tmfuDhZP5qO2nKmnb-dTUA@mail.gmail.com/ [1]
Fixes: 542ede096e ("fuse: keep inode->i_blkbits constant")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The FUSE protocol uses struct fuse_write_out to convey the return value of
copy_file_range, which is restricted to uint32_t. But the COPY_FILE_RANGE
interface supports a 64-bit size copies.
Currently the number of bytes copied is silently truncated to 32-bit, which
may result in poor performance or even failure to copy in case of
truncation to zero.
Reported-by: Florian Weimer <fweimer@redhat.com>
Closes: https://lore.kernel.org/all/lhuh5ynl8z5.fsf@oldenburg.str.redhat.com/
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
We do not support passthrough operations other than read/write on
regular file, so allowing non-regular backing files makes no sense.
Fixes: efad7153bf ("fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN")
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The s390 implementation of ptep_modify_prot_start() currently does
preempt_disable(), and the preempt_enable() is done later in
ptep_modify_prot_commit(). This logic is not really required, because the
PTE lock must be held over the complete prot_start/commit transaction,
as described in the comment of the generic implementation of
ptep_modify_prot_start().
That comment also mentions that this interface should be batchable,
and modify_prot_start_ptes() might start a transaction over a batch of
PTEs, implemented as a simple loop over ptep_modify_prot_start().
In this case, the preempt_disable() in ptep_modify_prot_start() would
be called multiple times, before the corresponding preempt_enable()
calls happen, and this can lead to a preempt_count overflow.
To fix this, simply remove the preempt_disable/enable() calls in
ptep_modify_prot_start/commit(), and rely on the PTE lock being held.
Commit cac1db8c3a ("mm: optimize mprotect() by PTE batching") made use
of this PTE batching for the first time, and triggers warnings like this:
DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10)
BUG: sleeping function called from invalid context at mm/mprotect.c:576
Hence, add a Fixes tag on that commit. Not because it is broken, but to
make sure that it won't get backported w/o also this fix for s390.
Fixes: cac1db8c3a ("mm: optimize mprotect() by PTE batching")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Commit ad781b550f ("ALSA: hda/hdmi: Rewrite to new probe method")
rewrote the HDMI codec ID tables to a new format. In doing so, recently
added codec IDs from commit e0a911ac86 ("ALSA: hda: Add missing NVIDIA
HDA codec IDs") were dropped from the tables. These tables had recently
been split from the unified table that existed in patch_hdmi.c, and did
contain the entries in question after the split but before the codec ID
entries were rewritten to the new format.
Restore the missing codec ID entries to nvhdmi.c and tegrahdmi.c. There
do not appear to be any additional missing entries in any of the other
codec ID tables when compared to the patch_hdmi.c at the final revision
before the split.
Fixes: ad781b550f ("ALSA: hda/hdmi: Rewrite to new probe method")
Signed-off-by: Daniel Dadap <ddadap@nvidia.com>
Link: https://patch.msgid.link/aK0ghvagXy740rxd@ddadap-lakeline.nvidia.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Merge series from Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:
Because snd_dmaengine_pcm is sharing same dev with CPU and Platform,
snd_soc_lookup_component_nolocked() might be call with NULL driver name
(= CPU). This patch fixup and cleanup it.
Fix a bug where the driver's event subscription logic for SRQ-related
events incorrectly sets obj_type for RMP objects.
When subscribing to SRQ events, get_legacy_obj_type() did not handle
the MLX5_CMD_OP_CREATE_RMP case, which caused obj_type to be 0
(default).
This led to a mismatch between the obj_type used during subscription
(0) and the value used during notification (1, taken from the event's
type field). As a result, event mapping for SRQ objects could fail and
event notification would not be delivered correctly.
This fix adds handling for MLX5_CMD_OP_CREATE_RMP in get_legacy_obj_type,
returning MLX5_EVENT_QUEUE_TYPE_RQ so obj_type is consistent between
subscription and notification.
Fixes: 7597385371 ("IB/mlx5: Enable subscription for device events over DEVX")
Link: https://patch.msgid.link/r/8f1048e3fdd1fde6b90607ce0ed251afaf8a148c.1755088962.git.leon@kernel.org
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
object_err() reports details of an object for further debugging, such as
the freelist pointer, redzone, etc. However, if the pointer is invalid,
attempting to access object metadata can lead to a crash since it does
not point to a valid object.
One known path to the crash is when alloc_consistency_checks()
determines the pointer to the allocated object is invalid because of a
freelist corruption, and calls object_err() to report it. The debug code
should report and handle the corruption gracefully and not crash in the
process.
In case the pointer is NULL or check_valid_pointer() returns false for
the pointer, only print the pointer value and skip accessing metadata.
Fixes: 81819f0fc8 ("SLUB core")
Cc: <stable@vger.kernel.org>
Signed-off-by: Li Qiong <liqiong@nfschina.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
dma_free_coherent() must only be called if the corresponding
dma_alloc_coherent() call has succeeded. Calling it when the allocation fails
leads to undefined behavior.
Delete the wrong call.
[ bp: Massage commit message. ]
Fixes: 71bcada88b ("edac: altera: Add Altera SDRAM EDAC support")
Signed-off-by: Salah Triki <salah.triki@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/aIrfzzqh4IzYtDVC@pc
On commit 9286dfd573 ("platform/x86: asus-wmi: Fix spurious rfkill on
UX8406MA"), Mathieu adds a quirk for the Zenbook Duo to ignore the code
0x5f (WLAN button disable). On that laptop, this code is triggered when
the device keyboard is attached.
On the ASUS ROG Z13 2025, this code is triggered when pressing the side
button of the device, which is used to open Armoury Crate in Windows.
As this is becoming a pattern, where newer Asus laptops use this keycode
for emitting events, let's convert the wlan ignore quirk to instead
allow emitting codes, so that userspace programs can listen to it and
so that it does not interfere with the rfkill state.
With this patch, the Z13 wil emit KEY_PROG3 and the Duo will remain
unchanged and emit no event. While at it, add a quirk for the Z13 to
switch into tablet mode when removing the keyboard.
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://lore.kernel.org/r/20250808154710.8981-2-lkml@antheas.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Currently, the ignore_key_wlan quirk applies to keycodes 0x5D, 0x5E, and
0x5F. However, the relevant code for the Asus Zenbook Duo is only 0x5F.
Since this code is emitted by other Asus devices, such as from the Z13
for its ROG button, remove the extra codes before expanding the quirk.
For the Duo devices, which are the only ones that use this quirk, there
should be no effect.
Fixes: 9286dfd573 ("platform/x86: asus-wmi: Fix spurious rfkill on UX8406MA")
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://lore.kernel.org/r/20250808154710.8981-1-lkml@antheas.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
soc-generic-dmaengine-pcm.c uses same dev for both CPU and Platform.
In such case, CPU component driver might not have driver->name, then
snd_soc_lookup_component_nolocked() will be NULL pointer access error.
Care NULL driver name.
Call trace:
strcmp from snd_soc_lookup_component_nolocked+0x64/0xa4
snd_soc_lookup_component_nolocked from snd_soc_unregister_component_by_driver+0x2c/0x44
snd_soc_unregister_component_by_driver from snd_dmaengine_pcm_unregister+0x28/0x64
snd_dmaengine_pcm_unregister from devres_release_all+0x98/0xfc
devres_release_all from device_unbind_cleanup+0xc/0x60
device_unbind_cleanup from really_probe+0x220/0x2c8
really_probe from __driver_probe_device+0x88/0x1a0
__driver_probe_device from driver_probe_device+0x30/0x110
driver_probe_device from __driver_attach+0x90/0x178
__driver_attach from bus_for_each_dev+0x7c/0xcc
bus_for_each_dev from bus_add_driver+0xcc/0x1ec
bus_add_driver from driver_register+0x80/0x11c
driver_register from do_one_initcall+0x58/0x23c
do_one_initcall from kernel_init_freeable+0x198/0x1f4
kernel_init_freeable from kernel_init+0x1c/0x12c
kernel_init from ret_from_fork+0x14/0x28
Fixes: 144d6dfc74 ("ASoC: soc-core: merge snd_soc_unregister_component() and snd_soc_unregister_component_by_driver()")
Reported-by: J. Neuschäfer <j.ne@posteo.net>
Closes: https://lore.kernel.org/r/aJb311bMDc9x-dpW@probook
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reported-by: Ondřej Jirman <megi@xff.cz>
Closes: https://lore.kernel.org/r/arxpwzu6nzgjxvsndct65ww2wz4aezb5gjdzlgr24gfx7xvyih@natjg6dg2pj6
Tested-by: J. Neuschäfer <j.ne@posteo.net>
Message-ID: <87ect8ysv8.wl-kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning: GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE.
Correct the interrupt flags, assuming the author of the code wanted same
logical behavior behind the name "ACTIVE_xxx", this is:
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Fixes: 7b4a8097e5 ("arm64: dts: rockchip: Add Neardi LBA3368 board")
Cc: stable+noautosel@kernel.org # Needs testing, because actual level is just a guess
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Tested-By: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/20250818090445.28112-4-krzysztof.kozlowski@linaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
A hung task can occur during [1] LTP cgroup testing when repeatedly
mounting/unmounting perf_event and net_prio controllers with
systemd.unified_cgroup_hierarchy=1. The hang manifests in
cgroup_lock_and_drain_offline() during root destruction.
Related case:
cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event
cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio
Call Trace:
cgroup_lock_and_drain_offline+0x14c/0x1e8
cgroup_destroy_root+0x3c/0x2c0
css_free_rwork_fn+0x248/0x338
process_one_work+0x16c/0x3b8
worker_thread+0x22c/0x3b0
kthread+0xec/0x100
ret_from_fork+0x10/0x20
Root Cause:
CPU0 CPU1
mount perf_event umount net_prio
cgroup1_get_tree cgroup_kill_sb
rebind_subsystems // root destruction enqueues
// cgroup_destroy_wq
// kill all perf_event css
// one perf_event css A is dying
// css A offline enqueues cgroup_destroy_wq
// root destruction will be executed first
css_free_rwork_fn
cgroup_destroy_root
cgroup_lock_and_drain_offline
// some perf descendants are dying
// cgroup_destroy_wq max_active = 1
// waiting for css A to die
Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. A dying perf_event CSS gets queued for offline after root destruction
4. Root destruction waits for offline completion, but offline work is
blocked behind root destruction in cgroup_destroy_wq (max_active=1)
Solution:
Split cgroup_destroy_wq into three dedicated workqueues:
cgroup_offline_wq – Handles CSS offline operations
cgroup_release_wq – Manages resource release
cgroup_free_wq – Performs final memory deallocation
This separation eliminates blocking in the CSS free path while waiting for
offline operations to complete.
[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers
Fixes: 334c3679ec ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Reported-by: Gao Yingjie <gaoyingjie@uniontech.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Suggested-by: Teju Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Booting 6.16 on the Macchiatobin, I discover that I can no longer
access my disks, and thus the userspace boot fails. The cause appears
to be that one of the SATA controllers doesn't have any ports:
[ 1.190312] ahci f4540000.sata: supply ahci not found, using dummy regulator
[ 1.196255] ahci f4540000.sata: supply phy not found, using dummy regulator
[ 1.202026] ahci f4540000.sata: No port enabled
This is as a result of the blamed commit below which added a default
disabled status to the .dtsi, but didn't properly update the mcbin
dtsi file. Fix this regression.
Fixes: 30023876ae ("arm64: dts: marvell: only enable complete sata nodes")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
The simple-audio-card configuration for the Armada 370 development
board incorrectly routed the left channel signal ("AIN1L") to both
sides of the stereo "In Jack".
This commit corrects the typo for the right channel, changing the
second "AIN1L" entry to "AIN1R" to enable proper stereo input
recording.
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
The assigned clock for JPEG encoder IP has to be IMX95_CLK_VPUBLK_JPEG_ENC
and not IMX95_CLK_VPUBLK_JPEG_DEC (_ENC at the end, not _DEC). This is a
simple copy-paste error, fix it.
Fixes: 153c039a73 ("arm64: dts: imx95: add jpeg encode and decode nodes")
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
1, the phy support up to 8Mbit/s databitrate for CAN FD.
refer to product data sheet:
https://www.nxp.com/docs/en/data-sheet/TJA1463.pdf
2, the standby pin of the phy is ACTIVE_LOW.
3, the phy of flexcan2 connect the standby/en pin to PCAL6408 on i2c4 bus.
Fixes: 02b7adb791 ("arm64: dts: imx95-19x19-evk: add adc0 flexcan[1,2] i2c[2,3] uart5 spi3 and tpm3")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Add missing microSD slot vqmmc-supply property, otherwise the kernel
might shut down LDO5 regulator and that would power off the microSD
card slot, possibly while it is in use. Add the property to make sure
the kernel is aware of the LDO5 regulator which supplies the microSD
slot and keeps the LDO5 enabled.
Fixes: 562d222f23 ("arm64: dts: imx8mp: Add support for Data Modul i.MX8M Plus eDM SBC")
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Add missing microSD slot vqmmc-supply property, otherwise the kernel
might shut down LDO5 regulator and that would power off the microSD
card slot, possibly while it is in use. Add the property to make sure
the kernel is aware of the LDO5 regulator which supplies the microSD
slot and keeps the LDO5 enabled.
Fixes: 8d6712695b ("arm64: dts: imx8mp: Add support for DH electronics i.MX8M Plus DHCOM and PDK2")
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Fix SD card removal caused by automatic LDO5 power off after boot:
LDO5: disabling
mmc1: card 59b4 removed
EXT4-fs (mmcblk1p2): shut down requested (2)
Aborting journal on device mmcblk1p2-8.
JBD2: I/O error when updating journal superblock for mmcblk1p2-8.
To prevent this, add vqmmc regulator for USDHC, using a GPIO-controlled
regulator that is supplied by LDO5. Since this is implemented on SoM but
used on baseboards with SD-card interface, implement the functionality
on SoM part and optionally enable it on baseboards if needed.
Fixes: 418d1d840e ("arm64: dts: freescale: add initial device tree for TQMa8MPQL with i.MX8MP")
Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
At inode_logged() we do a couple lockless checks for ->logged_trans, and
these are generally safe except the second one in case we get a load or
store tearing due to a concurrent call updating ->logged_trans (either at
btrfs_log_inode() or later at inode_logged()).
In the first case it's safe to compare to the current transaction ID since
once ->logged_trans is set the current transaction, we never set it to a
lower value.
In the second case, where we check if it's greater than zero, we are prone
to load/store tearing races, since we can have a concurrent task updating
to the current transaction ID with store tearing for example, instead of
updating with a single 64 bits write, to update with two 32 bits writes or
four 16 bits writes. In that case the reading side at inode_logged() could
see a positive value that does not match the current transaction and then
return a false negative.
Fix this by doing the second check while holding the inode's spinlock, add
some comments about it too. Also add the data_race() annotation to the
first check to avoid any reports from KCSAN (or similar tools) and comment
about it.
Fixes: 0f8ce49821 ("btrfs: avoid inode logging during rename and link when possible")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At inode_logged() if we find that the inode was not logged before we
update its ->last_dir_index_offset to (u64)-1 with the goal that the
next directory log operation will see the (u64)-1 and then figure out
it must check what was the index of the last logged dir index key and
update ->last_dir_index_offset to that key's offset (this is done in
update_last_dir_index_offset()).
This however has a possibility for a time window where a race can happen
and lead to directory logging skipping dir index keys that should be
logged. The race happens like this:
1) Task A calls inode_logged(), sees ->logged_trans as 0 and then checks
that the inode item was logged before, but before it sets the inode's
->last_dir_index_offset to (u64)-1...
2) Task B is at btrfs_log_inode() which calls inode_logged() early, and
that has set ->last_dir_index_offset to (u64)-1;
3) Task B then enters log_directory_changes() which calls
update_last_dir_index_offset(). There it sees ->last_dir_index_offset
is (u64)-1 and that the inode was logged before (ctx->logged_before is
true), and so it searches for the last logged dir index key in the log
tree and it finds that it has an offset (index) value of N, so it sets
->last_dir_index_offset to N, so that we can skip index keys that are
less than or equal to N (later at process_dir_items_leaf());
4) Task A now sets ->last_dir_index_offset to (u64)-1, undoing the update
that task B just did;
5) Task B will now skip every index key when it enters
process_dir_items_leaf(), since ->last_dir_index_offset is (u64)-1.
Fix this by making inode_logged() not touch ->last_dir_index_offset and
initializing it to 0 when an inode is loaded (at btrfs_alloc_inode()) and
then having update_last_dir_index_offset() treat a value of 0 as meaning
we must check the log tree and update with the index of the last logged
index key. This is fine since the minimum possible value for
->last_dir_index_offset is 1 (BTRFS_DIR_START_INDEX - 1 = 2 - 1 = 1).
This also simplifies the management of ->last_dir_index_offset and now
all accesses to it are done under the inode's log_mutex.
Fixes: 0f8ce49821 ("btrfs: avoid inode logging during rename and link when possible")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's a race between checking if an inode was logged before and logging
an inode that can cause us to mark an inode as not logged just after it
was logged by a concurrent task:
1) We have inode X which was not logged before neither in the current
transaction not in past transaction since the inode was loaded into
memory, so it's ->logged_trans value is 0;
2) We are at transaction N;
3) Task A calls inode_logged() against inode X, sees that ->logged_trans
is 0 and there is a log tree and so it proceeds to search in the log
tree for an inode item for inode X. It doesn't see any, but before
it sets ->logged_trans to N - 1...
3) Task B calls btrfs_log_inode() against inode X, logs the inode and
sets ->logged_trans to N;
4) Task A now sets ->logged_trans to N - 1;
5) At this point anyone calling inode_logged() gets 0 (inode not logged)
since ->logged_trans is greater than 0 and less than N, but our inode
was really logged. As a consequence operations like rename, unlink and
link that happen afterwards in the current transaction end up not
updating the log when they should.
Fix this by ensuring inode_logged() only updates ->logged_trans in case
the inode item is not found in the log tree if after tacking the inode's
lock (spinlock struct btrfs_inode::lock) the ->logged_trans value is still
zero, since the inode lock is what protects setting ->logged_trans at
btrfs_log_inode().
Fixes: 0f8ce49821 ("btrfs: avoid inode logging during rename and link when possible")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of incrementing the inode's link count and refcount early before
adding the link, updating the inode and deleting orphan item, do it after
all those steps succeeded right before calling d_instantiate(). This makes
the error handling logic simpler by avoiding the need for the 'drop_inode'
variable to signal if we need to undo the link count increment and the
inode refcount increase under the 'fail' label.
This also reduces the level of indentation by one, making the code easier
to read.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we fail to update the inode or delete the orphan item we leak the inode
since we update its refcount with the ihold() call to account for the
d_instantiate() call which never happens in case we fail those steps. Fix
this by setting 'drop_inode' to true in case we fail those steps.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we fail to update the inode or delete the orphan item, we must abort
the transaction to prevent persisting an inconsistent state. For example
if we fail to update the inode item, we have the inconsistency of having
a persisted inode item with a link count of N but we have N + 1 inode ref
items and N + 1 directory entries pointing to our inode in case the
transaction gets committed.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the "active" mode of the amd-pstate driver with performance
governor, the CPPC.min_perf is expected to be the nominal_perf.
However after commit a9b9b4c2a4 ("cpufreq/amd-pstate: Drop min and
max cached frequencies"), this is not the case when the governor is
switched from performance to powersave and back to performance, and
the CPPC.min_perf will be equal to the scaling_min_freq that was set
for the powersave governor.
This is because prior to commit a9b9b4c2a4 ("cpufreq/amd-pstate:
Drop min and max cached frequencies"), amd_pstate_epp_update_limit()
would unconditionally call amd_pstate_update_min_max_limit() and the
latter function would enforce the CPPC.min_perf constraint when the
governor is performance.
However, after the aforementioned commit,
amd_pstate_update_min_max_limit() is called by
amd_pstate_epp_update_limit() only when either the
scaling_{min/max}_freq is different from the cached value of
cpudata->{min/max}_limit_freq, which wouldn't have changed on a
governor transition from powersave to performance, thus missing out on
enforcing the CPPC.min_perf constraint for the performance governor.
Fix this by invoking amd_pstate_epp_udpate_limit() not only when the
{min/max} limits have changed from the cached values, but also when
the policy itself has changed.
Fixes: a9b9b4c2a4 ("cpufreq/amd-pstate: Drop min and max cached frequencies")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250821042638.356-1-gautham.shenoy@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
When soc-button-array looks up the GPIO to use it calls acpi_find_gpio()
which will parse _CRS.
acpi_find_gpio.cold (drivers/gpio/gpiolib-acpi-core.c:953)
gpiod_find_and_request (drivers/gpio/gpiolib.c:4598 drivers/gpio/gpiolib.c:4625)
gpiod_get_index (drivers/gpio/gpiolib.c:4877)
The GPIO is setup basically, but the debounce information is discarded.
The platform will assert what debounce should be in _CRS, so program it
at the time it's available.
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
The clean up in idxd_setup_wqs() has had a couple bugs because the error
handling is a bit subtle. It's simpler to just re-write it in a cleaner
way. The issues here are:
1) If "idxd->max_wqs" is <= 0 then we call put_device(conf_dev) when
"conf_dev" hasn't been initialized.
2) If kzalloc_node() fails then again "conf_dev" is invalid. It's
either uninitialized or it points to the "conf_dev" from the
previous iteration so it leads to a double free.
It's better to free partial loop iterations within the loop and then
the unwinding at the end can handle whole loop iterations. I also
renamed the labels to describe what the goto does and not where the goto
was located.
Fixes: 3fd2f4bc01 ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs")
Reported-by: Colin Ian King <colin.i.king@gmail.com>
Closes: https://lore.kernel.org/all/20250811095836.1642093-1-colin.i.king@gmail.com/
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/aJnJW3iYTDDCj9sk@stanley.mountain
Signed-off-by: Vinod Koul <vkoul@kernel.org>
A recent refactor introduced a misplaced put_device() call, resulting in a
reference count underflow during module unload.
There is no need to add additional put_device() calls for idxd groups,
engines, or workqueues. Although the commit claims: "Note, this also
fixes the missing put_device() for idxd groups, engines, and wqs."
It appears no such omission actually existed. The required cleanup is
already handled by the call chain:
idxd_unregister_devices() -> device_unregister() -> put_device()
Extend idxd_cleanup() to handle the remaining necessary cleanup and
remove idxd_cleanup_internals(), which duplicates deallocation logic
for idxd, engines, groups, and workqueues. Memory management is also
properly handled through the Linux device model.
Fixes: a409e919ca ("dmaengine: idxd: Refactor remove call with idxd_cleanup() helper")
Signed-off-by: Yi Sun <yi.sun@intel.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20250729150313.1934101-3-yi.sun@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The call to idxd_free() introduces a duplicate put_device() leading to a
reference count underflow:
refcount_t: underflow; use-after-free.
WARNING: CPU: 15 PID: 4428 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
...
Call Trace:
<TASK>
idxd_remove+0xe4/0x120 [idxd]
pci_device_remove+0x3f/0xb0
device_release_driver_internal+0x197/0x200
driver_detach+0x48/0x90
bus_remove_driver+0x74/0xf0
pci_unregister_driver+0x2e/0xb0
idxd_exit_module+0x34/0x7a0 [idxd]
__do_sys_delete_module.constprop.0+0x183/0x280
do_syscall_64+0x54/0xd70
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The idxd_unregister_devices() which is invoked at the very beginning of
idxd_remove(), already takes care of the necessary put_device() through the
following call path:
idxd_unregister_devices() -> device_unregister() -> put_device()
In addition, when CONFIG_DEBUG_KOBJECT_RELEASE is enabled, put_device() may
trigger asynchronous cleanup via schedule_delayed_work(). If idxd_free() is
called immediately after, it can result in a use-after-free.
Remove the improper idxd_free() to avoid both the refcount underflow and
potential memory corruption during module unload.
Fixes: d5449ff1b0 ("dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call")
Signed-off-by: Yi Sun <yi.sun@intel.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20250729150313.1934101-2-yi.sun@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The block fops don't try to handle metadata for synchronous requests,
probably because the completion handler looks at dio->iocb which is not
valid for synchronous requests.
But silently ignoring metadata (or warning in case of
__blkdev_direct_IO_simple) is a really bad idea as that can cause
silent data corruption if a user ever shows up.
Instead simply handle metadata for synchronous requests as the completion
handler can simply check for bio_integrity() as the block layer default
integrity will already be freed at this point, and thus bio_integrity()
will only return true for user mapped integrity.
Fixes: 3d8b5a22d4 ("block: add support to pass user meta buffer")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250819082517.2038819-3-hch@lst.de
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Genpd OF providers must now be registered after genpd bus registration.
However, cpg_mstp_add_clk_domain() is only called from CLK_OF_DECLARE(),
which is too early. Hence on R-Car M1A, R-Car H1, and RZ/A1, the
CPG/MSTP Clock Domain fails to register, and any devices residing in
that clock domain fail to probe.
Fix this by splitting initialization into two steps:
- The first part keeps on registering the PM domain with genpd at
CLK_OF_DECLARE(),
- The second and new part moves the registration of the genpd OF
provider to a postcore_initcall().
See also commit c5ae5a0c61 ("pmdomain: renesas: rcar-sysc: Add
genpd OF provider at postcore_initcall").
Fixes: 18a3a510ec ("pmdomain: core: Add the genpd->dev to the genpd provider bus")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/81ef5f8d5d31374b7852b05453c52d2f735062a2.1755073087.git.geert+renesas@glider.be
In the do_validate_mem(), the call to add_interval() does not
handle errors. If kmalloc() fails in add_interval(), it could
result in a null pointer being inserted into the linked list,
leading to illegal memory access when sub_interval() is called
next.
This patch adds an error handling for the add_interval(). If
add_interval() returns an error, the function will return early
with the error code.
Fixes: 7b4884ca88 ("pcmcia: validate late-added resources")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The last use of pcmcia_get_socket_by_nr() was removed in 2010 by
commit 5716d415f8 ("pcmcia: remove obsolete ioctl")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Add missing check for platform_get_resource() and return error if it fails
to catch the error.
Fixes: d87d44f7ab ("ARM: omap1: move CF chipselect setup to board file")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Remove hard-coded strings by using the str_off_on() and str_yes_no()
helper functions.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The config PCCARD_IODYN was last used in the config option PCMCIA_M8XX with
its m8xx_pcmcia driver. This driver was removed with commit 39eb56da2b
("pcmcia: Remove m8xx_pcmcia driver"), included in v3.17, back in 2014.
Since then, the config PCCARD_IODYN is unused. Remove the config option,
the corresponding file included with this config and the corresponding
definition in the pcmcia header file.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
In __iodyn_find_io_region(), pcmcia_make_resource() is assigned to
res and used in pci_bus_alloc_resource(). There is a dereference of res
in pci_bus_alloc_resource(), which could lead to a NULL pointer
dereference on failure of pcmcia_make_resource().
Fix this bug by adding a check of res.
Cc: stable@vger.kernel.org
Fixes: 49b1153adf ("pcmcia: move all pcmcia_resource_ops providers into one module")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
As described in the added code comment, a reference to .exit.text is ok
for drivers registered via platform_driver_probe(). Make this explicit
to prevent the following section mismatch warning
WARNING: modpost: drivers/pcmcia/omap_cf: section mismatch in reference: omap_cf_driver+0x4 (section: .data) -> omap_cf_remove (section: .exit.text)
that triggers on an omap1_defconfig + CONFIG_OMAP_CF=m build.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
When running an XDP bpf_prog on the remote CPU in cpumap code
then we must disable the direct return optimization that
xdp_return can perform for mem_type page_pool. This optimization
assumes code is still executing under RX-NAPI of the original
receiving CPU, which isn't true on this remote CPU.
The cpumap code already disabled this via helpers
xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct(),
but the scope didn't include xdp_do_flush().
When doing XDP_REDIRECT towards e.g devmap this causes the
function bq_xmit_all() to run with direct return optimization
enabled. This can lead to hard to find bugs. The issue
only happens when bq_xmit_all() cannot ndo_xdp_xmit all
frames and them frees them via xdp_return_frame_rx_napi().
Fix by expanding scope to include xdp_do_flush(). This was found
by Dragos Tatulea.
Fixes: 11941f8a85 ("bpf: cpumap: Implement generic cpumap")
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Reported-by: Chris Arges <carges@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Chris Arges <carges@cloudflare.com>
Link: https://patch.msgid.link/175519587755.3008742.1088294435150406835.stgit@firesoul
Cross-thread attacks are generally harder as they require the victim to be
co-located on a core. However, with VMSCAPE the adversary targets belong to
the same guest execution, that are more likely to get co-located. In
particular, a thread that is currently executing userspace hypervisor
(after the IBPB) may still be targeted by a guest execution from a sibling
thread.
Issue a warning about the potential risk, except when:
- SMT is disabled
- STIBP is enabled system-wide
- Intel eIBRS is enabled (which implies STIBP protection)
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
cpu_bugs_smt_update() uses global variables from different mitigations. For
SMT updates it can't currently use vmscape_mitigation that is defined after
it.
Since cpu_bugs_smt_update() depends on many other mitigations, move it
after all mitigations are defined. With that, it can use vmscape_mitigation
in a moment.
No functional change.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
VMSCAPE is a vulnerability that exploits insufficient branch predictor
isolation between a guest and a userspace hypervisor (like QEMU). Existing
mitigations already protect kernel/KVM from a malicious guest. Userspace
can additionally be protected by flushing the branch predictors after a
VMexit.
Since it is the userspace that consumes the poisoned branch predictors,
conditionally issue an IBPB after a VMexit and before returning to
userspace. Workloads that frequently switch between hypervisor and
userspace will incur the most overhead from the new IBPB.
This new IBPB is not integrated with the existing IBPB sites. For
instance, a task can use the existing speculation control prctl() to
get an IBPB at context switch time. With this implementation, the
IBPB is doubled up: one at context switch and another before running
userspace.
The intent is to integrate and optimize these cases post-embargo.
[ dhansen: elaborate on suboptimal IBPB solution ]
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
The VMSCAPE vulnerability may allow a guest to cause Branch Target
Injection (BTI) in userspace hypervisors.
Kernels (both host and guest) have existing defenses against direct BTI
attacks from guests. There are also inter-process BTI mitigations which
prevent processes from attacking each other. However, the threat in this
case is to a userspace hypervisor within the same process as the attacker.
Userspace hypervisors have access to their own sensitive data like disk
encryption keys and also typically have access to all guest data. This
means guest userspace may use the hypervisor as a confused deputy to attack
sensitive guest kernel data. There are no existing mitigations for these
attacks.
Introduce X86_BUG_VMSCAPE for this vulnerability and set it on affected
Intel and AMD CPUs.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
VMSCAPE is a vulnerability that may allow a guest to influence the branch
prediction in host userspace, particularly affecting hypervisors like QEMU.
Add the documentation.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
There are two compatible strings defined in "8250.yaml" that require
two clocks to be specified, along with their names:
- "spacemit,k1-uart", used in "spacemit/k1.dtsi"
- "nxp,lpc1850-uart", used in "lpc/lpc18xx.dtsi"
When only one clock is used, the name is not required. However there
are two places that do specify a name:
- In "mediatek/mt7623.dtsi", the clock for the "mediatek,mtk-btif"
compatible serial device is named "main"
- In "qca/ar9132.dtsi", the clock for the "ns8250" compatible
serial device is named "uart"
In commit d2db0d7815 ("dt-bindings: serial: 8250: allow clock
'uartclk' and 'reg' for nxp,lpc1850-uart"), Frank Li added the
restriction that two named clocks be used for the NXP platform
mentioned above.
Change that logic, so that an additional condition for (only) the
SpacemiT platform similarly restricts the two clocks to have the
names "core" and "bus".
Finally, add "main" and "uart" as allowed names when a single clock is
specified.
Fixes: 2c0594f9f0 ("dt-bindings: serial: 8250: support an optional second clock")
Cc: stable <stable@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507160314.wrC51lXX-lkp@intel.com/
Signed-off-by: Alex Elder <elder@riscstar.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20250813031338.2328392-1-elder@riscstar.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When trying to set MCR[2], XON1 is incorrectly accessed instead. And when
writing to the TCR register to configure flow control levels, we are
incorrectly writing to the MSR register. The default value of $00 is then
used for TCR, which means that selectable trigger levels in FCR are used
in place of TCR.
TCR/TLR access requires EFR[4] (enable enhanced functions) and MCR[2]
to be set. EFR[4] is already set in probe().
MCR access requires LCR[7] to be zero.
Since LCR is set to $BF when trying to set MCR[2], XON1 is incorrectly
accessed instead because MCR shares the same address space as XON1.
Since MCR[2] is unmodified and still zero, when writing to TCR we are in
fact writing to MSR because TCR/TLR registers share the same address space
as MSR/SPR.
Fix by first removing useless reconfiguration of EFR[4] (enable enhanced
functions), as it is already enabled in sc16is7xx_probe() since commit
43c51bb573 ("sc16is7xx: make sure device is in suspend once probed").
Now LCR is $00, which means that MCR access is enabled.
Also remove regcache_cache_bypass() calls since we no longer access the
enhanced registers set, and TCR is already declared as volatile (in fact
by declaring MSR as volatile, which shares the same address).
Finally disable access to TCR/TLR registers after modifying them by
clearing MCR[2].
Note: the comment about "... and internal clock div" is wrong and can be
ignored/removed as access to internal clock div registers (DLL/DLH)
is permitted only when LCR[7] is logic 1, not when enhanced features
is enabled. And DLL/DLH access is not needed in sc16is7xx_startup().
Fixes: dfeae619d7 ("serial: sc16is7xx")
Cc: stable@vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Link: https://lore.kernel.org/r/20250731124451.1108864-1-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In converting marvell,comphy-cp110 to schema, the constraints for clocks on
marvell,comphy-a3700 are wrong, the maximum number of child nodes are wrong,
and the phy nodes may have a 'connector' child node:
phy@18300 (marvell,comphy-a3700): clock-names: False schema does not allow ['xtal']
phy@120000 (marvell,comphy-cp110): 'phy@3', 'phy@4', 'phy@5' do not match any of the regexes: '^phy@[0-2]$', '^pinctrl-[0-9]+$'
phy@120000 (marvell,comphy-cp110): phy@2: 'connector' does not match any of the regexes: '^pinctrl-[0-9]+$'
Fixes: 50355ac70d ("dt-bindings: phy: Convert marvell,comphy-cp110 to DT schema")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20250806200138.1366189-1-robh@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
There's a possible integer overflow in stripe_io_hints if we have too
large chunk size. Test if the overflow happened, and if it did, don't set
limits->io_min and limits->io_opt;
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Suggested-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Cc: stable@vger.kernel.org
As described in the pinebookpro_v2.1_mainboard_schematic.pdf page 10,
he SPI Flash's VCC connector is connected to VCC_3V0 power source.
This fixes the following warning:
spi-nor spi1.0: supply vcc not found, using dummy regulator
Fixes: 5a65505a69 ("arm64: dts: rockchip: Add initial support for Pinebook Pro")
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Reviewed-by: Dragan Simic <dsimic@manjaro.org>
Link: https://lore.kernel.org/r/20250730102129.224468-1-pbrobinson@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
When we don't have a clock specified in the device tree, we have no way to
ensure the BAM is on. This is often the case for remotely-controlled or
remotely-powered BAM instances. In this case, we need to read num-channels
from the DT to have all the necessary information to complete probing.
However, at the moment invalid device trees without clock and without
num-channels still continue probing, because the error handling is missing
return statements. The driver will then later try to read the number of
channels from the registers. This is unsafe, because it relies on boot
firmware and lucky timing to succeed. Unfortunately, the lack of proper
error handling here has been abused for several Qualcomm SoCs upstream,
causing early boot crashes in several situations [1, 2].
Avoid these early crashes by erroring out when any of the required DT
properties are missing. Note that this will break some of the existing DTs
upstream (mainly BAM instances related to the crypto engine). However,
clearly these DTs have never been tested properly, since the error in the
kernel log was just ignored. It's safer to disable the crypto engine for
these broken DTBs.
[1]: https://lore.kernel.org/r/CY01EKQVWE36.B9X5TDXAREPF@fairphone.com/
[2]: https://lore.kernel.org/r/20230626145959.646747-1-krzysztof.kozlowski@linaro.org/
Cc: stable@vger.kernel.org
Fixes: 48d163b1aa ("dmaengine: qcom: bam_dma: get num-channels and num-ees from dt")
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250212-bam-dma-fixes-v1-8-f560889e65d8@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
num-channels and qcom,num-ees are required when there are no clocks
specified in the device tree, because we have no reliable way to read them
from the hardware registers if we cannot ensure the BAM hardware is up when
the device is being probed.
This has often been forgotten when adding new SoC device trees, so make
this clear by describing this requirement in the schema.
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Link: https://lore.kernel.org/r/20250212-bam-dma-fixes-v1-7-f560889e65d8@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.