Pull SCSI fixes from James Bottomley:
"Three small driver fixes and one larger unused function set removal in
the raid class (so no external impact)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: snic: Fix double free in snic_tgt_create()
scsi: core: raid_class: Remove raid_component_add()
scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5
scsi: ufs: mcq: Fix the search/wrap around logic
Pull x86 fixes from Ingo Molnar:
"Fix an FPU invalidation bug on exec(), and fix a performance
regression due to a missing setting of X86_FEATURE_OSXSAVE"
* tag 'x86-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
x86/fpu: Invalidate FPU state correctly on exec()
Pull irq fix from Thomas Gleixner:
"A last minute fix for a regression introduced in the v6.5 merge
window.
The conversion of the software based interrupt resend mechanism to
hlist missed to add a check whether the descriptor is already enqueued
and dropped the interrupt descriptor lookup for nested interrupts.
The missing check whether the descriptor is already queued causes
hlist corruption and can be observed in the wild. The dropped parent
descriptor lookup has not yet caused problems, but it would result in
stale interrupt line in the worst case.
Add the missing enqueued check and bring the descriptor lookup back to
cure this"
* tag 'irq-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Fix software resend lockup and nested resend
Pull LoongArch fixes from Huacai Chen:
"Fix a ptrace bug, a hw_breakpoint bug, some build errors/warnings and
some trivial cleanups"
* tag 'loongarch-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix hw_breakpoint_control() for watchpoints
LoongArch: Ensure FP/SIMD registers in the core dump file is up to date
LoongArch: Put the body of play_dead() into arch_cpu_idle_dead()
LoongArch: Add identifier names to arguments of die() declaration
LoongArch: Return earlier in die() if notify_die() returns NOTIFY_STOP
LoongArch: Do not kill the task in die() if notify_die() returns NOTIFY_STOP
LoongArch: Remove <asm/export.h>
LoongArch: Replace #include <asm/export.h> with #include <linux/export.h>
LoongArch: Remove unneeded #include <asm/export.h>
LoongArch: Replace -ffreestanding with finer-grained -fno-builtin's
LoongArch: Remove redundant "source drivers/firmware/Kconfig"
The switch to using hlist for managing software resend of interrupts
broke resend in at least two ways:
First, unconditionally adding interrupt descriptors to the resend list can
corrupt the list when the descriptor in question has already been
added. This causes the resend tasklet to loop indefinitely with interrupts
disabled as was recently reported with the Lenovo ThinkPad X13s after
threaded NAPI was disabled in the ath11k WiFi driver.
This bug is easily fixed by restoring the old semantics of irq_sw_resend()
so that it can be called also for descriptors that have already been marked
for resend.
Second, the offending commit also broke software resend of nested
interrupts by simply discarding the code that made sure that such
interrupts are retriggered using the parent interrupt.
Add back the corresponding code that adds the parent descriptor to the
resend list.
Fixes: bc06a9e087 ("genirq: Use hlist for managing resend handlers")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/lkml/20230809073432.4193-1-johan+linaro@kernel.org/
Link: https://lore.kernel.org/r/20230826154004.1417-1-johan+linaro@kernel.org
In hw_breakpoint_control(), encode_ctrl_reg() has already encoded the
MWPnCFG3_LoadEn/MWPnCFG3_StoreEn bits in info->ctrl. We don't need to
add (1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn) unconditionally.
Otherwise we can't set read watchpoint and write watchpoint separately.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This is a port of commit 379eb01c21 ("riscv: Ensure the value
of FP registers in the core dump file is up to date").
The values of FP/SIMD registers in the core dump file come from the
thread.fpu. However, kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, kernel will not have a chance to save the latest
values of FP/SIMD registers. So it may cause their values in the core
dump file incorrect. To solve this problem, force fpr_get()/simd_get()
to save the FP/SIMD registers into the thread.fpu if the target task
equals the current task.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Pull clk fixes from Stephen Boyd:
"One clk driver fix and two clk framework fixes:
- Fix an OOB access when devm_get_clk_from_child() is used and
devm_clk_release() casts the void pointer to the wrong type
- Move clk_rate_exclusive_{get,put}() within the correct ifdefs in
clk.h so that the stubs are used when CONFIG_COMMON_CLK=n
- Register the proper clk provider function depending on the value of
#clock-cells in the TI keystone driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Fix slab-out-of-bounds error in devm_clk_release()
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
clk: keystone: syscon-clk: Fix audio refclk
The gcc compiler translates on some architectures the 64-bit
__builtin_clzll() function to a call to the libgcc function __clzdi2(),
which should take a 64-bit parameter on 32- and 64-bit platforms.
But in the current kernel code, the built-in __clzdi2() function is
defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG ==
32, thus the return values on 32-bit kernels are in the range from
[0..31] instead of the expected [0..63] range.
This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to
take a 64-bit parameter on 32-bit kernels as well, thus it makes the
functions identical for 32- and 64-bit kernels.
This bug went unnoticed since kernel 3.11 for over 10 years, and here
are some possible reasons for that:
a) Some architectures have assembly instructions to count the bits and
which are used instead of calling __clzdi2(), e.g. on x86 the bsr
instruction and on ppc cntlz is used. On such architectures the
wrong __clzdi2() implementation isn't used and as such the bug has
no effect and won't be noticed.
b) Some architectures link to libgcc.a, and the in-kernel weak
functions get replaced by the correct 64-bit variants from libgcc.a.
c) __builtin_clzll() and __clzdi2() doesn't seem to be used in many
places in the kernel, and most likely only in uncritical functions,
e.g. when printing hex values via seq_put_hex_ll(). The wrong return
value will still print the correct number, but just in a wrong
formatting (e.g. with too many leading zeroes).
d) 32-bit kernels aren't used that much any longer, so they are less
tested.
A trivial testcase to verify if the currently running 32-bit kernel is
affected by the bug is to look at the output of /proc/self/maps:
Here the kernel uses a correct implementation of __clzdi2():
root@debian:~# cat /proc/self/maps
00010000-00019000 r-xp 00000000 08:05 787324 /usr/bin/cat
00019000-0001a000 rwxp 00009000 08:05 787324 /usr/bin/cat
0001a000-0003b000 rwxp 00000000 00:00 0 [heap]
f7551000-f770d000 r-xp 00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...
and this kernel uses the broken implementation of __clzdi2():
root@debian:~# cat /proc/self/maps
0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324 /usr/bin/cat
0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324 /usr/bin/cat
000000001a000-000000003b000 rwxp 00000000 00:00 0 [heap]
00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 4df87bb7b6 ("lib: add weak clz/ctz functions")
Cc: Chanho Min <chanho.min@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull misc fixes from Andrew Morton:
"18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4
issues or aren't considered suitable for a -stable backport"
* tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
shmem: fix smaps BUG sleeping while atomic
selftests: cachestat: catch failing fsync test on tmpfs
selftests: cachestat: test for cachestat availability
maple_tree: disable mas_wr_append() when other readers are possible
madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check
madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
mm: multi-gen LRU: don't spin during memcg release
mm: memory-failure: fix unexpected return value in soft_offline_page()
radix tree: remove unused variable
mm: add a call to flush_cache_vmap() in vmap_pfn()
selftests/mm: FOLL_LONGTERM need to be updated to 0x100
nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
selftests: cgroup: fix test_kmem_basic less than error
mm: enable page walking API to lock vmas during the walk
smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Pull RISC-V fixes from Palmer Dabbelt:
"This is obviously not ideal, particularly for something this late in
the cycle.
Unfortunately we found some uABI issues in the vector support while
reviewing the GDB port, which has triggered a revert -- probably a
good sign we should have reviewed GDB before merging this, I guess I
just dropped the ball because I was so worried about the context
extension and libc suff I forgot. Hence the late revert.
There's some risk here as we're still exposing the vector context for
signal handlers, but changing that would have meant reverting all of
the vector support. The issues we've found so far have been fixed
already and they weren't absolute showstoppers, so we're essentially
just playing it safe by holding ptrace support for another release (or
until we get through a proper userspace code review).
Summary:
- The vector ucontext extension has been extended with vlenb
- The vector registers ELF core dump note type has been changed to
avoid aliasing with the CSR type used in embedded systems
- Support for accessing vector registers via ptrace() has been
reverted
- Another build fix for the ISA spec changes around Zifencei/Zicsr
that manifests on some systems built with binutils-2.37 and
gcc-11.2"
* tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix build errors using binutils2.37 toolchains
RISC-V: vector: export VLENB csr in __sc_riscv_v_state
RISC-V: Remove ptrace support for vectors
Pull gpio fixes from Bartosz Golaszewski:
- fix an irq mapping leak in gpio-sim
- associate the GPIO device's software node with the irq domain in
gpio-sim
* tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: pass the GPIO device's software node to irq domain
gpio: sim: dispose of irq mappings before destroying the irq_sim domain
Pull pin control fixes from Linus Walleij:
"Here are some Renesas and AMD driver fixes, the AMD fix affects
important laptops in the wild so this one is pretty important. It
seems a bit tough to get this right.
- Fix DT parsing and related locking in the Renesas driver.
- Fix wakeup IRQs in the AMD driver once again. Really tricky this
one"
* tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: Mask wake bits on probe again
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
Pull sound fixes from Takashi Iwai:
"Hopefully the last bits for 6.5. It's slightly higher LOCs than
wished, but it doesn't look scary.
The biggest change is MAINTAINERS update for TI; it's good to have the
update before the final release, so that people can contact to the
right persons for bug reports (which shouldn't happen of course!)
The rest are all device-specific fixes and quirks, most for various
ASoC platforms"
* tag 'sound-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
ASoC: cs35l41: Correct amp_gain_tlv values
ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
ASoC: tas2781: fixed register access error when switching to other chips
ASoC: cs35l56: Add an ACPI match table
ASoC: cs35l56: Read firmware uuid from a device property instead of _SUB
ASoC: SOF: ipc4-pcm: fix possible null pointer deference
MAINTAINERS: Add entries for TEXAS INSTRUMENTS ASoC DRIVERS
The initial aim is to silence the following objtool warning:
arch/loongarch/kernel/process.o: warning: objtool: arch_cpu_idle_dead() falls through to next function start_thread()
According to tools/objtool/Documentation/objtool.txt, this is because
the last instruction of arch_cpu_idle_dead() is a call to a noreturn
function play_dead(). In order to silence the warning, one simple way
is to add the noreturn function play_dead() to objtool's hard-coded
global_noreturns array, that is to say, just put "NORETURN(play_dead)"
into tools/objtool/noreturns.h, it works well.
But I noticed that play_dead() is only defined once and only called by
arch_cpu_idle_dead(), so put the body of play_dead() into the caller
arch_cpu_idle_dead(), then remove the noreturn function play_dead() is
an alternative way which can reduce the overhead of the function call
at the same time.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add identifier names to arguments of die() declaration in ptrace.h
to fix the following checkpatch warnings:
WARNING: function definition argument 'const char *' should also have an identifier name
WARNING: function definition argument 'struct pt_regs *' should also have an identifier name
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
After the call to oops_exit(), it should not panic or execute
the crash kernel if the oops is to be suppressed.
Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
If notify_die() returns NOTIFY_STOP, honor the return value from the
handler chain invocation in die() and return without killing the task
as, through a debugger, the fault may have been fixed. It makes sense
even if ignoring the event will make the system unstable: by allowing
access through a debugger it has been compromised already anyway. It
makes our port consistent with x86, arm64, riscv and csky.
Commit 20c0d2d440 ("[PATCH] i386: pass proper trap numbers to die
chain handlers") may be the earliest of similar changes.
Link: https://lore.kernel.org/r/43DDF02E.76F0.0078.0@novell.com/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
All *.S files under arch/loongarch/ have been converted to include
<linux/export.h> instead of <asm/export.h>.
Remove <asm/export.h>.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Commit ddb5cdbafa ("kbuild: generate KSYMTAB entries by modpost")
deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>.
Replace #include <asm/export.h> with #include <linux/export.h>.
After all the <asm/export.h> lines are converted, <asm/export.h> and
<asm-generic/export.h> will be removed.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
There is no EXPORT_SYMBOL() line there, hence #include <asm/export.h>
is unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
As explained by Nick in the original issue: the kernel usually does a
good job of providing library helpers that have similar semantics as
their ordinary userspace libc equivalents, but -ffreestanding disables
such libcall optimization and other related features in the compiler,
which can lead to unexpected things such as CONFIG_FORTIFY_SOURCE not
working (!).
However, due to the desire for better control over unaligned accesses
with respect to CONFIG_ARCH_STRICT_ALIGN, and also for avoiding the
GCC bug https://gcc.gnu.org/PR109465, we do want to still disable
optimizations for the memory libcalls (memcpy, memmove and memset for
now). Use finer-grained -fno-builtin-* toggles to achieve this without
losing source fortification and other libcall optimizations.
Closes: https://github.com/ClangBuiltLinux/linux/issues/1897
Reported-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
In drivers/Kconfig, drivers/firmware/Kconfig is sourced for all ports so
there is no need to source it in the port-specific Kconfig file. And
sourcing it here also caused the "Firmware Drivers" menu appeared two
times: one in the "Device Drivers" menu, another in the toplevel menu.
This is really puzzling so remove it.
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Pull drm fixes from Dave Airlie:
"A bit bigger than I'd care for, but it's mostly a single vmwgfx fix
and a fix for an i915 hotplug probing. Otherwise misc i915, bridge,
panfrost and dma-buf fixes.
core:
- add a HPD poll helper
i915:
- fix regression in i915 polling
- fix docs build warning
- fix DG2 idle power consumption
bridge:
- samsung-dsim: init fix
panfrost:
- fix speed binning issue
dma-buf:
- fix recursive lock in fence signal
vmwgfx:
- fix shader stage validation
- fix NULL ptr derefs in gem put"
* tag 'drm-fixes-2023-08-25' of git://anongit.freedesktop.org/drm/drm:
drm/i915: Fix HPD polling, reenabling the output poll work as needed
drm: Add an HPD poll helper to reschedule the poll work
drm/vmwgfx: Fix possible invalid drm gem put calls
drm/vmwgfx: Fix shader stage validation
dma-buf/sw_sync: Avoid recursive lock during fence signal
drm/i915: fix Sphinx indentation warning
drm/i915/dgfx: Enable d3cold at s2idle
drm/display/dp: Fix the DP DSC Receiver cap size
drm/panfrost: Skip speed binning on EOPNOTSUPP
drm: bridge: samsung-dsim: Fix init during host transfer
Pull tracing fixes from Steven Rostedt:
- Fix ring buffer being permanently disabled due to missed
record_disabled()
Changing the trace cpu mask will disable the ring buffers for the
CPUs no longer in the mask. But it fails to update the snapshot
buffer. If a snapshot takes place, the accounting for the ring buffer
being disabled is corrupted and this can lead to the ring buffer
being permanently disabled.
- Add test case for snapshot and cpu mask working together
- Fix memleak by the function graph tracer not getting closed properly.
The iterator is used to read the ring buffer. When it opens, it calls
the open function of a tracer, and when it is closed, it calls the
close iteration. While a trace is being read, it is still possible to
change the tracer.
If this happens between the function graph tracer and the wakeup
tracer (which uses function graph tracing), the tracers are not
closed properly during when the iterator sees the switch, and the
wakeup function did not initialize its private pointer to NULL, which
is used to know if the function graph tracer was the last tracer. It
could be fooled in thinking it is, but then on exit it does not call
the close function of the function graph tracer to clean up its data.
- Fix synthetic events on big endian machines, by introducing a union
that does the conversions properly.
- Fix synthetic events from printing out the number of elements in the
stacktrace when it shouldn't.
- Fix synthetic events stacktrace to not print a bogus value at the
end.
- Introduce a pipe_cpumask that prevents the trace_pipe files from
being opened by more than one task (file descriptor).
There was a race found where if splice is called, the iter->ent could
become stale and events could be missed. There's no point reading a
producer/consumer file by more than one task as they will corrupt
each other anyway. Add a cpumask that keeps track of the per_cpu
trace_pipe files as well as the global trace_pipe file that prevents
more than one open of a trace_pipe file that represents the same ring
buffer. This prevents the race from happening.
- Fix ftrace samples for arm64 to work with older compilers.
* tag 'trace-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
samples: ftrace: Replace bti assembly with hint for older compiler
tracing: Introduce pipe_cpumask to avoid race on trace_pipes
tracing: Fix memleak due to race between current_tracer and trace
tracing/synthetic: Allocate one additional element for size
tracing/synthetic: Skip first entry for stack traces
tracing/synthetic: Use union instead of casts
selftests/ftrace: Add a basic testcase for snapshot
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Commit 41320b18a0 ("scsi: snic: Fix possible memory leak if device_add()
fails") fixed the memory leak caused by dev_set_name() when device_add()
failed. However, it did not consider that 'tgt' has already been released
when put_device(&tgt->dev) is called. Remove kfree(tgt) in the error path
to avoid double free of 'tgt' and move put_device(&tgt->dev) after the
removed kfree(tgt) to avoid a use-after-free.
Fixes: 41320b18a0 ("scsi: snic: Fix possible memory leak if device_add() fails")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230819083941.164365-1-wangzhu9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull media fix from Mauro Carvalho Chehab:
"Fix a potential array out-of-bounds in the mediatek vcodec driver"
* tag 'media/v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
The raid_component_add() function was added to the kernel tree via patch
"[SCSI] embryonic RAID class" (2005). Remove this function since it never
has had any callers in the Linux kernel. And also raid_component_release()
is only used in raid_component_add(), so it is also removed.
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230822015254.184270-1-wangzhu9@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Fixes: 04b5b5cb01 ("scsi: core: Fix possible memory leak if device_add() fails")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but
smaps_pte_hole_lookup() only calls it for a single page slot, just break
out of the loop on the last or only page, before checking need_resched().
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com
Fixes: 2301003215 ("mm/smaps: simplify shmem handling of pte holes")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The cachestat kselftest runs a test on a normal file, which is created
temporarily in the current directory. Among the tests it runs there is a
call to fsync(), which is expected to clean all dirty pages used by the
file.
However the tmpfs filesystem implements fsync() as noop_fsync(), so the
call will not even attempt to clean anything when this test file happens
to live on a tmpfs instance. This happens in an initramfs, or when the
current directory is in /dev/shm or sometimes /tmp.
To avoid this test failing wrongly, use statfs() to check which filesystem
the test file lives on. If that is "tmpfs", we skip the fsync() test.
Since the fsync test is only one part of the "normal file" test, we now
execute this twice, skipping the fsync part on the first call. This way
only the second test, including the fsync part, would be skipped.
Link: https://lkml.kernel.org/r/20230821160534.3414911-3-andre.przywara@arm.com
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests: cachestat: fix run on older kernels", v2.
I ran all kernel selftests on some test machine, and stumbled upon
cachestat failing (among others). These patches fix the run on older
kernels and when the current directory is on a tmpfs instance.
This patch (of 2):
As cachestat is a new syscall, it won't be available on older kernels, for
instance those running on a development machine. At the moment the test
reports all tests as "not ok" in this case.
Test for the cachestat syscall availability first, before doing further
tests, and bail out early with a TAP SKIP comment.
This also uses the opportunity to add the proper TAP headers, and add one
check for proper error handling (illegal file descriptor).
Link: https://lkml.kernel.org/r/20230821160534.3414911-1-andre.przywara@arm.com
Link: https://lkml.kernel.org/r/20230821160534.3414911-2-andre.przywara@arm.com
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of the mas_next_slot() the following was
artificially created by separating the writer and reader code:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last slot write is to start of slot
store current contents in slot
overwrite old end pivot
mas_next_slot():
read end metadata
read old end pivot
return with incorrect range
store new value
Alternatively:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last lost write to end of slot
store value
mas_next_slot():
read end metadata
read old end pivot
read new end pivot
return with incorrect range
set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 98b211d641 ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes: 98b211d641 ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit fc986a38b6 ("mm: huge_memory: convert madvise_free_huge_pmd to
use a folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-3-fengwei.yin@intel.com
Fixes: fc986a38b6 ("mm: huge_memory: convert madvise_free_huge_pmd to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But it's
not correct as folio_mapcount() returns total mapcount of large folio.
Use folio_estimated_sharers() here as the estimated number is enough.
This patchset will fix the cases:
User space application call madvise() with MADV_FREE, MADV_COLD and
MADV_PAGEOUT for specific address range. There are THP mapped to the
range. Without the patchset, the THP is skipped. With the patch, the
THP will be split and handled accordingly.
David reported the cow self test skip some cases because of MADV_PAGEOUT
skip THP:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel.com/T/#mbf0f2ec7fbe45da47526de1d7036183981691e81
and I confirmed this patchset make it work again.
This patch (of 3):
Commit 07e8c82b5e ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by other mapping.
It's not correct for large folio. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes: 07e8c82b5e ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull nfsd fixes from Chuck Lever:
"Two last-minute one-liners for v6.5-rc. One got lost in the shuffle,
and the other was reported just this morning"
- Close race window when handling FREE_STATEID operations
- Fix regression in /proc/fs/nfsd/v4_end_grace introduced in v6.5-rc"
* tag 'nfsd-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix a thinko introduced by recent trace point changes
nfsd: Fix race to FREE_STATEID and cl_revoked
Pull spi fixes from Mark Brown:
"A couple more small driver specific fixes for v6.5.
The device mode for Cadence had been broken by some recent updates
done for host mode and large transfers for multi-byte words on stm32
had been broken by an API update in what I think was a rebasing
incident"
* tag 'spi-fix-v6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-cadence: Fix data corruption issues in slave mode
spi: stm32: fix accidential revert to byte-sized transfer splitting
Pull networking fixes from Paolo Abeni:
"Including fixes from wifi, can and netfilter.
Fixes to fixes:
- nf_tables:
- GC transaction race with abort path
- defer gc run if previous batch is still pending
Previous releases - regressions:
- ipv4: fix data-races around inet->inet_id
- phy: fix deadlocking in phy_error() invocation
- mdio: fix C45 read/write protocol
- ipvlan: fix a reference count leak warning in ipvlan_ns_exit()
- ice: fix NULL pointer deref during VF reset
- i40e: fix potential NULL pointer dereferencing of pf->vf in
i40e_sync_vsi_filters()
- tg3: use slab_build_skb() when needed
- mtk_eth_soc: fix NULL pointer on hw reset
Previous releases - always broken:
- core: validate veth and vxcan peer ifindexes
- sched: fix a qdisc modification with ambiguous command request
- devlink: add missing unregister linecard notification
- wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
- batman:
- do not get eth header before batadv_check_management_packet
- fix batadv_v_ogm_aggr_send memory leak
- bonding: fix macvlan over alb bond support
- mlxsw: set time stamp fields also when its type is MIRROR_UTC"
* tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
selftests: bonding: add macvlan over bond testing
selftest: bond: add new topo bond_topo_2d1c.sh
bonding: fix macvlan over alb bond support
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix out of memory error handling
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: validate all pending tables
ibmveth: Use dcbf rather than dcbfl
i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
net/sched: fix a qdisc modification with ambiguous command request
igc: Fix the typo in the PTM Control macro
batman-adv: Hold rtnl lock during MTU update via netlink
igb: Avoid starting unnecessary workqueues
can: raw: add missing refcount for memory leak fix
can: isotp: fix support for transmission of SF without flow control
bnx2x: new flag for track HW resource allocation
sfc: allocate a big enough SKB for loopback selftest packet
...
0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and
bisected it to commit b81fac906a ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves
the CR4_OSXSAVE enabling into a later place:
arch_cpu_finalize_init
identify_boot_cpu
identify_cpu
generic_identify
get_cpu_cap --> setup cpu capability
...
fpu__init_cpu
fpu__init_cpu_xstate
cr4_set_bits(X86_CR4_OSXSAVE);
As the FPU is not yet initialized the CPU capability setup fails to set
X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64'
depend on this feature and therefore fail to load, causing the regression.
Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE
enabling.
[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]
Fixes: b81fac906a ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com
Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current tasks FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.
Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:
1. A task is executing in userspace on CPU0
- CPU0’s FPU context points to tasks
- fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task is preempted
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out
4. Task is migrated to CPU1
5. Continuing the exec(), the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task
6. The task reschedules back to CPU0 before completing the exec() and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating task’s fpu→last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.
7. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().
Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.
Fixes: 33344368cb ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
Florian Westphal says:
====================
netfilter updates for net
This PR contains nf_tables updates for your *net* tree.
First patch fixes table validation, I broke this in 6.4 when tracking
validation state per table, reported by Pablo, fixup from myself.
Second patch makes sure objects waiting for memory release have been
released, this was broken in 6.1, patch from Pablo Neira Ayuso.
Patch three is a fix-for-fix from previous PR: In case a transaction
gets aborted, gc sequence counter needs to be incremented so pending
gc requests are invalidated, from Pablo.
Same for patch 4: gc list needs to use gc list lock, not destroy lock,
also from Pablo.
Patch 5 fixes a UaF in a set backend, but this should only occur when
failslab is enabled for GFP_KERNEL allocations, broken since feature
was added in 5.6, from myself.
Patch 6 fixes a double-free bug that was also added via previous PR:
We must not schedule gc work if the previous batch is still queued.
netfilter pull request 2023-08-23
* tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix out of memory error handling
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: validate all pending tables
====================
Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a new testing topo bond_topo_2d1c.sh which is used more commonly.
Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the
extra link.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The commit 14af9963ba ("bonding: Support macvlans on top of tlb/rlb mode
bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
the current rlb bond mode only handles ARP packets to update remote neighbor
entries. This causes an issue when a macvlan is on top of the bond, and
remote devices send packets to the macvlan using the bond's MAC address
as the destination. After delivering the packets to the macvlan, the macvlan
will rejects them as the MAC address is incorrect. Consequently, this commit
makes macvlan over bond non-functional.
To address this problem, one potential solution is to check for the presence
of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
and return NULL in the rlb_arp_xmit() function. However, this approach
doesn't fully resolve the situation when a VLAN exists between the bond and
macvlan.
So let's just do a partial revert for commit 14af9963ba in rlb_arp_xmit().
As the comment said, Don't modify or load balance ARPs that do not originate
locally.
Fixes: 14af9963ba ("bonding: Support macvlans on top of tlb/rlb mode bonds")
Reported-by: susan.zheng@veritas.com
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ASoC: Fixes for v6.5
A relatively large but generally not super urgent set of fixes for ASoC,
including some quirks and a MAINTAINERS update. There's also an update
to cs35l56 to change the firmware ABI, there are no current shipping
systems which use the current interface and the sooner we get the new
interface in the less likely it is that something will start.
It'd be nice if these landed for v6.5 but not the end of the world if
they wait till v6.6.
Pull ACPI fix from Rafael Wysocki:
"Make an existing ACPI IRQ override quirk for PCSpecialist Elimina Pro
16 M work as intended (Hans de Goede)"
* tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M
After the commit in the Fixes: line below, HPD polling stopped working
on i915, since after that change calling drm_kms_helper_poll_enable()
doesn't restart drm_mode_config::output_poll_work if the work was
stopped (no connectors needing polling) and enabling polling for a
connector (during runtime suspend or detecting an HPD IRQ storm).
After the above change calling drm_kms_helper_poll_enable() is a nop
after it's been called already and polling for some connectors was
disabled/re-enabled.
Fix this by calling drm_kms_helper_poll_reschedule() added in the
previous patch instead, which reschedules the work whenever expected.
Fixes: d33a54e399 ("drm/probe_helper: sort out poll_running vs poll_enabled")
CC: stable@vger.kernel.org # 6.4+
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230822113015.41224-2-imre.deak@intel.com
(cherry picked from commit 50452f2f76)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Andy Chiu <andy.chiu@sifive.com> says:
We add a vlenb field in Vector context and save it with the
riscv_vstate_save() macro. It should not cause performance regression as
VLENB is a design-time constant and is frequently used by hardware.
Also, adding this field into the __sc_riscv_v_state may benifit us on a
future compatibility issue becuse a hardware may have writable VLENB.
Adding and saving VLENB have an immediate benifit as it gives ptrace a
better view of the Vector extension and makes it possible to reconstruct
Vector register files from the dump without doing an additional csr read.
This patchset also sync the number of note types between us and gdb for
riscv to solve a conflicting note.
This is not an ABI break given that 6.5 has not been released yet.
* b4-shazam-merge:
RISC-V: vector: export VLENB csr in __sc_riscv_v_state
RISC-V: Remove ptrace support for vectors
Link: https://lore.kernel.org/r/20230816155450.26200-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Associate the swnode of the GPIO device's (which is the interrupt
controller here) with the irq domain. Otherwise the interrupt-controller
device attribute is a no-op.
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
If a GPIO simulator device is unbound with interrupts still requested,
we will hit a use-after-free issue in __irq_domain_deactivate_irq(). The
owner of the irq domain must dispose of all mappings before destroying
the domain object.
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
vmw_bo_unreference sets the input buffer to null on exit, resulting in
null ptr deref's on the subsequent drm gem put calls.
This went unnoticed because only very old userspace would be exercising
those paths but it wouldn't be hard to hit on old distros with brand
new kernels.
Introduce a new function that abstracts unrefing of user bo's to make
the code cleaner and more explicit.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reported-by: Ian Forbes <iforbes@vmware.com>
Fixes: 9ef8d83e8e ("drm/vmwgfx: Do not drop the reference to the handle too soon")
Cc: <stable@vger.kernel.org> # v6.4+
Reviewed-by: Maaz Mombasawala<mombasawalam@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230818041301.407636-1-zack@kde.org
Pull x86 platform driver fixes from Hans de Goede:
"Final set of three small fixes for 6.5"
* tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications
platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
platform/x86: lenovo-ymc: Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table
Don't queue more gc work, else we may queue the same elements multiple
times.
If an element is flagged as dead, this can mean that either the previous
gc request was invalidated/discarded by a transaction or that the previous
request is still pending in the system work queue.
The latter will happen if the gc interval is set to a very low value,
e.g. 1ms, and system work queue is backlogged.
The sets refcount is 1 if no previous gc requeusts are queued, so add
a helper for this and skip gc run if old requests are pending.
Add a helper for this and skip the gc run in this case.
Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Several instances of pipapo_resize() don't propagate allocation failures,
this causes a crash when fault injection is enabled for gfp_kernel slabs.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to
protect the gc list.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Abort path is missing a synchronization point with GC transactions. Add
GC sequence number hence any GC transaction losing race will be
discarded.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Destroy work waits for the RCU grace period then it releases the objects
with no mutex held. All releases objects follow this path for
transactions, therefore, order is guaranteed and references to top-level
objects in the hierarchy remain valid.
However, netlink notifier might interfer with pending destroy work.
rcu_barrier() is not correct because objects are not release via RCU
callback. Flush destroy work before releasing objects from netlink
notifier path.
Fixes: d4bc8271db ("netfilter: nf_tables: netlink notifier might race to release objects")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
We have to validate all tables in the transaction that are in
VALIDATE_DO state, the blamed commit below did not move the break
statement to its right location so we only validate one table.
Moreover, we can't init table->validate to _SKIP when a table object
is allocated.
If we do, then if a transcaction creates a new table and then
fails the transaction, nfnetlink will loop and nft will hang until
user cancels the command.
Add back the pernet state as a place to stash the last state encountered.
This is either _DO (we hit an error during commit validation) or _SKIP
(transaction passed all checks).
Fixes: 00c320f9b7 ("netfilter: nf_tables: make validation state per table")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
When building for power4, newer binutils don't recognise the "dcbfl"
extended mnemonic.
dcbfl RA, RB is equivalent to dcbf RA, RB, 1.
Switch to "dcbf" to avoid the build error.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.
Fixes: c87c938f62 ("i40e: Add VF VLAN pruning")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When replacing an existing root qdisc, with one that is of the same kind, the
request boils down to essentially a parameterization change i.e not one that
requires allocation and grafting of a new qdisc. syzbot was able to create a
scenario which resulted in a taprio qdisc replacing an existing taprio qdisc
with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to
create and graft scenario.
The fix ensures that only when the qdisc kinds are different that we should
allow a create and graft, otherwise it goes into the "change" codepath.
While at it, fix the code and comments to improve readability.
While syzbot was able to create the issue, it did not zone on the root cause.
Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.
v1->V2 changes:
- remove "inline" function definition (Vladmir)
- remove extrenous braces in branches (Vladmir)
- change inline function names (Pedro)
- Run tdc tests (Victor)
v2->v3 changes:
- dont break else/if (Simon)
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
variable *nplanes is provided by user via system call argument. The
possible value of q_data->fmt->num_planes is 1-3, while the value
of *nplanes can be 1-8. The array access by index i can cause array
out-of-bounds.
Fix this bug by checking *nplanes against the array size.
Fixes: 4e855a6efa ("[media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver")
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-21 (ice)
This series contains updates to ice driver only.
Jesse fixes an issue on calculating buffer size.
Petr Oros reverts a commit that does not fully resolve VF reset issues
and implements one that provides a fuller fix.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix NULL pointer deref during VF reset
Revert "ice: Fix ice VF reset during iavf initialization"
ice: fix receive buffer size miscalculation
====================
Link: https://lore.kernel.org/r/20230821171633.2203505-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oliver Hartkopp says:
====================
CAN fixes for 6.5-rc7
The isotp fix removes an unnecessary check which leads to delays and/or
a wrong error notification.
The fix for the CAN_RAW socket solves the last issue that has been
introduced with commit ee8b94c851 ("can: raw: fix receiver memory leak")
in this upstream cycle (detected by Eric Dumazet).
====================
Link: https://lore.kernel.org/r/20230821144547.6658-1-socketcan@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original implementation had a very simple handling for single frame
transmissions as it just sent the single frame without a timeout handling.
With the new echo frame handling the echo frame was also introduced for
single frames but the former exception ('simple without timers') has been
maintained by accident. This leads to a 1 second timeout when closing the
socket and to an -ECOMM error when CAN_ISOTP_WAIT_TX_DONE is selected.
As the echo handling is always active (also for single frames) remove the
wrong extra condition for single frames.
Fixes: 9f39d36530 ("can: isotp: add support for transmission without flow control")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20230821144547.6658-2-socketcan@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While injecting PCIe errors to the upstream PCIe switch of
a BCM57810 NIC, system hangs/crashes were observed.
After several calls to bnx2x_tx_timout() complete,
bnx2x_nic_unload() is called to free up HW resources
and bnx2x_napi_disable() is called to release NAPI objects.
Later, when the EEH driver calls bnx2x_io_slot_reset() to
complete the recovery process, bnx2x attempts to disable
NAPI again by calling bnx2x_napi_disable() and freeing
resources which have already been freed, resulting in a
hang or crash.
Introduce a new flag to track the HW resource and NAPI
allocation state, refactor duplicated code into a single
function, check page pool allocation status before freeing,
and reduces debug output when a TX timeout event occurs.
Reviewed-by: Manish Chopra <manishc@marvell.com>
Tested-by: Abdul Haleem <abdhalee@in.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20230818161443.708785-2-thinhtr@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Problem can be reproduced by unloading snd_soc_simple_card, because in
devm_get_clk_from_child() devres data is allocated as `struct clk`, but
devm_clk_release() expects devres data to be `struct devm_clk_state`.
KASAN report:
==================================================================
BUG: KASAN: slab-out-of-bounds in devm_clk_release+0x20/0x54
Read of size 8 at addr ffffff800ee09688 by task (udev-worker)/287
Call trace:
dump_backtrace+0xe8/0x11c
show_stack+0x1c/0x30
dump_stack_lvl+0x60/0x78
print_report+0x150/0x450
kasan_report+0xa8/0xf0
__asan_load8+0x78/0xa0
devm_clk_release+0x20/0x54
release_nodes+0x84/0x120
devres_release_all+0x144/0x210
device_unbind_cleanup+0x1c/0xac
really_probe+0x2f0/0x5b0
__driver_probe_device+0xc0/0x1f0
driver_probe_device+0x68/0x120
__driver_attach+0x140/0x294
bus_for_each_dev+0xec/0x160
driver_attach+0x38/0x44
bus_add_driver+0x24c/0x300
driver_register+0xf0/0x210
__platform_driver_register+0x48/0x54
asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card]
do_one_initcall+0xac/0x340
do_init_module+0xd0/0x300
load_module+0x2ba4/0x3100
__do_sys_init_module+0x2c8/0x300
__arm64_sys_init_module+0x48/0x5c
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x124/0x154
do_el0_svc+0x44/0xdc
el0_svc+0x14/0x50
el0t_64_sync_handler+0xec/0x11c
el0t_64_sync+0x14c/0x150
Allocated by task 287:
kasan_save_stack+0x38/0x60
kasan_set_track+0x28/0x40
kasan_save_alloc_info+0x20/0x30
__kasan_kmalloc+0xac/0xb0
__kmalloc_node_track_caller+0x6c/0x1c4
__devres_alloc_node+0x44/0xb4
devm_get_clk_from_child+0x44/0xa0
asoc_simple_parse_clk+0x1b8/0x1dc [snd_soc_simple_card_utils]
simple_parse_node.isra.0+0x1ec/0x230 [snd_soc_simple_card]
simple_dai_link_of+0x1bc/0x334 [snd_soc_simple_card]
__simple_for_each_link+0x2ec/0x320 [snd_soc_simple_card]
asoc_simple_probe+0x468/0x4dc [snd_soc_simple_card]
platform_probe+0x90/0xf0
really_probe+0x118/0x5b0
__driver_probe_device+0xc0/0x1f0
driver_probe_device+0x68/0x120
__driver_attach+0x140/0x294
bus_for_each_dev+0xec/0x160
driver_attach+0x38/0x44
bus_add_driver+0x24c/0x300
driver_register+0xf0/0x210
__platform_driver_register+0x48/0x54
asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card]
do_one_initcall+0xac/0x340
do_init_module+0xd0/0x300
load_module+0x2ba4/0x3100
__do_sys_init_module+0x2c8/0x300
__arm64_sys_init_module+0x48/0x5c
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x124/0x154
do_el0_svc+0x44/0xdc
el0_svc+0x14/0x50
el0t_64_sync_handler+0xec/0x11c
el0t_64_sync+0x14c/0x150
The buggy address belongs to the object at ffffff800ee09600
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 136 bytes inside of
256-byte region [ffffff800ee09600, ffffff800ee09700)
The buggy address belongs to the physical page:
page:000000002d97303b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ee08
head:000000002d97303b order:1 compound_mapcount:0 compound_pincount:0
flags: 0x10200(slab|head|zone=0)
raw: 0000000000010200 0000000000000000 dead000000000122 ffffff8002c02480
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffff800ee09580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffff800ee09600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffff800ee09680: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffffff800ee09700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffff800ee09780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Fixes: abae8e57e4 ("clk: generalize devm_clk_get() a bit")
Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Link: https://lore.kernel.org/r/20230805084847.3110586-1-andrej.skvortzov@gmail.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
We've found two bugs here: NT_RISCV_VECTOR steps on NT_RISCV_CSR (which
is only for embedded), and we don't have vlenb in the core dumps. Given
that we've have a pair of bugs croup up as part of the GDB review we've
probably got other issues, so let's just cut this for 6.5 and get it
right.
Fixes: 0c59922c76 ("riscv: Add ptrace vector support")
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230816155450.26200-2-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull devicetree fixes from Rob Herring:
- Fix DT node refcount when creating platform devices
- Fix deadlock in changeset code due to printing with devtree_lock held
- Fix unittest EXPECT strings for parse_phandle_with_args_map() test
- Fix IMA kexec memblock freeing
* tag 'devicetree-fixes-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of/platform: increase refcount of fwnode
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
Cited commits passed a size to alloc_skb that was only big enough for
the actual packet contents, but the following skb_put + memcpy writes
the whole struct efx_loopback_payload including leading and trailing
padding bytes (which are then stripped off with skb_pull/skb_trim).
This could cause an skb_over_panic, although in practice we get saved
by kmalloc_size_roundup.
Pass the entire size we use, instead of the size of the final packet.
Reported-by: Andy Moreton <andy.moreton@amd.com>
Fixes: cf60ed4696 ("sfc: use padding to fix alignment in loopback test")
Fixes: 30c24dd87f ("sfc: siena: use padding to fix alignment in loopback test")
Fixes: 1186c6b31e ("sfc: falcon: use padding to fix alignment in loopback test")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821180153.18652-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Two fixes:
- reorder buffer filter checks can cause bad shift/UBSAN
warning with newer HW, avoid the check (mac80211)
- add Kconfig dependency for iwlwifi for PTP clock usage
* tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
wifi: iwlwifi: mvm: add dependency for PTP clock
====================
Link: https://lore.kernel.org/r/20230822124206.43926-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit b655892ffd ("leds: trigger: netdev: expose hw_control status
via sysfs") exposed to sysfs the flag that tells whether the LED trigger
is offloaded to hardware, under the name "hw_control", since that is the
name under which this setting is called in the code.
Everywhere else in kernel when some work that is normally done in
software can be made to be done by hardware instead, we use the word
"offloading" to describe this, e.g. "LED blinking is offloaded to
hardware".
Normally renaming sysfs entries is a no-go because of backwards
compatibility. But since this patch was not yet released in a stable
kernel, I think it is still possible to rename it, if there is
consensus.
Fixes: b655892ffd ("leds: trigger: netdev: expose hw_control status via sysfs")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230821121453.30203-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull NFS client fixes from Trond Myklebust:
- fix a use after free in nfs_direct_join_group() (Cc: stable)
- fix sysfs server name memory leak
- fix lock recovery hang in NFSv4.0
- fix page free in the error path for nfs42_proc_getxattr() and
__nfs4_get_acl_uncached()
- SUNRPC/rdma: fix receive buffer dma-mapping after a server disconnect
* tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
xprtrdma: Remap Receive buffers after a reconnect
NFSv4: fix out path in __nfs4_get_acl_uncached
NFSv4.2: fix error handling in nfs42_proc_getxattr
NFS: Fix sysfs server name memory leak
NFS: Fix a use after free in nfs_direct_join_group()
NFSv4: Fix dropped lock for racing OPEN and delegation return
Pull selinux fix from Paul Moore:
"A small fix for a potential problem when cleaning up after a failed
SELinux policy load (list next pointer not being properly initialized
to NULL early enough)"
* tag 'selinux-pr-20230821' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: set next pointer before attaching to list
Before adding a port to bond, it need to be set down first. In the
lacpdu test the author set the port down specifically. But commit
a4abfa627c ("net: rtnetlink: Enslave device before bringing it up")
changed the operation order, the kernel will set the port down _after_
adding to bond. So all the ports will be down at last and the test failed.
In fact, the veth interfaces are already inactive when added. This
means there's no need to set them down again before adding to the bond.
Let's just remove the link down operation.
Fixes: a4abfa627c ("net: rtnetlink: Enslave device before bringing it up")
Reported-by: Zhengchao Shao <shaozhengchao@huawei.com>
Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.com/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 0f8e565109
("of/platform: Propagate firmware node by calling device_set_node()")
use of_fwnode_handle to replace of_node_get, which introduces a side
effect that the refcount is not increased. Then the out of tree
jailhouse hypervisor enable/disable test will trigger kernel dump in
of_overlay_remove, with the following sequence
"
of_changeset_revert(&overlay_changeset);
of_changeset_destroy(&overlay_changeset);
of_overlay_remove(&overlay_id);
"
So increase the refcount to avoid issues.
This patch also release the refcount when releasing amba device to avoid
refcount leakage.
Fixes: 0f8e565109 ("of/platform: Propagate firmware node by calling device_set_node()")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20230821023928.3324283-2-peng.fan@oss.nxp.com
Signed-off-by: Rob Herring <robh@kernel.org>
When a memcg is in the process of being released mem_cgroup_tryget will
fail because its reference count has already reached 0. This can happen
during reclaim if the memcg has already been offlined, and we reclaim all
remaining pages attributed to the offlined memcg. shrink_many attempts to
skip the empty memcg in this case, and continue reclaiming from the
remaining memcgs in the old generation. If there is only one memcg
remaining, or if all remaining memcgs are in the process of being released
then shrink_many will spin until all memcgs have finished being released.
The release occurs through a workqueue, so it can take a while before
kswapd is able to make any further progress.
This fix results in reductions in kswapd activity and direct reclaim in
a test where 28 apps (working set size > total memory) are repeatedly
launched in a random sequence:
A B delta ratio(%)
allocstall_movable 5962 3539 -2423 -40.64
allocstall_normal 2661 2417 -244 -9.17
kswapd_high_wmark_hit_quickly 53152 7594 -45558 -85.71
pageoutrun 57365 11750 -45615 -79.52
Link: https://lkml.kernel.org/r/20230814151636.1639123-1-tjmercier@google.com
Fixes: e4dde56cd2 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After commit 2c2241081f ("mm/gup: move private gup FOLL_ flags to
internal.h") FOLL_LONGTERM flag value got updated from 0x10000 to 0x100 at
include/linux/mm_types.h.
As hmm.hmm_device_private.hmm_gup_test uses FOLL_LONGTERM Updating same
here as well.
Before this change test goes in an infinite assert loop in
hmm.hmm_device_private.hmm_gup_test
==========================================================
RUN hmm.hmm_device_private.hmm_gup_test ...
hmm-tests.c:1962:hmm_gup_test:Expected HMM_DMIRROR_PROT_WRITE..
..(2) == m[2] (34)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
...
==========================================================
Call Trace:
<TASK>
? sched_clock+0xd/0x20
? __lock_acquire.constprop.0+0x120/0x6c0
? ktime_get+0x2c/0xd0
? sched_clock+0xd/0x20
? local_clock+0x12/0xd0
? lock_release+0x26e/0x3b0
pin_user_pages_fast+0x4c/0x70
gup_test_ioctl+0x4ff/0xbb0
? gup_test_ioctl+0x68c/0xbb0
__x64_sys_ioctl+0x99/0xd0
do_syscall_64+0x60/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? irqentry_exit_to_user_mode+0xd/0x20
? irqentry_exit+0x3f/0x50
? exc_page_fault+0x96/0x200
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f6aaa31aaff
After this change test is able to pass successfully.
Link: https://lkml.kernel.org/r/20230808124347.79163-1-ayush.jain3@amd.com
Fixes: 2c2241081f ("mm/gup: move private gup FOLL_ flags to internal.h")
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In contrast to most other GUP code, GUP-fast common page table walking
code like gup_pte_range() also handles hugetlb pages. But in contrast to
other hugetlb page table walking code, it does not look at the hugetlb PTE
abstraction whereby we have only a single logical hugetlb PTE per hugetlb
page, even when using multiple cont-PTEs underneath -- which is for
example what huge_ptep_get() abstracts.
So when we have a hugetlb page that is mapped via cont-PTEs, GUP-fast
might stumble over a PTE that does not map the head page of a hugetlb page
-- not the first "head" PTE of such a cont mapping.
Logically, the whole hugetlb page is mapped (entire_mapcount == 1), but we
might end up calling gup_must_unshare() with a tail page of a hugetlb
page.
We only maintain a single PageAnonExclusive flag per hugetlb page (as
hugetlb pages cannot get partially COW-shared), stored for the head page.
That flag is clear for all tail pages.
So when gup_must_unshare() ends up calling PageAnonExclusive() with a tail
page of a hugetlb page:
1) With CONFIG_DEBUG_VM_PGFLAGS
Stumbles over the:
VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
For example, when executing the COW selftests with 64k hugetlb pages on
arm64:
[ 61.082187] page:00000000829819ff refcount:3 mapcount:1 mapping:0000000000000000 index:0x1 pfn:0x11ee11
[ 61.082842] head:0000000080f79bf7 order:4 entire_mapcount:1 nr_pages_mapped:0 pincount:2
[ 61.083384] anon flags: 0x17ffff80003000e(referenced|uptodate|dirty|head|mappedtodisk|node=0|zone=2|lastcpupid=0xfffff)
[ 61.084101] page_type: 0xffffffff()
[ 61.084332] raw: 017ffff800000000 fffffc00037b8401 0000000000000402 0000000200000000
[ 61.084840] raw: 0000000000000010 0000000000000000 00000000ffffffff 0000000000000000
[ 61.085359] head: 017ffff80003000e ffffd9e95b09b788 ffffd9e95b09b788 ffff0007ff63cf71
[ 61.085885] head: 0000000000000000 0000000000000002 00000003ffffffff 0000000000000000
[ 61.086415] page dumped because: VM_BUG_ON_PAGE(PageHuge(page) && !PageHead(page))
[ 61.086914] ------------[ cut here ]------------
[ 61.087220] kernel BUG at include/linux/page-flags.h:990!
[ 61.087591] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 61.087999] Modules linked in: ...
[ 61.089404] CPU: 0 PID: 4612 Comm: cow Kdump: loaded Not tainted 6.5.0-rc4+ #3
[ 61.089917] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 61.090409] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 61.090897] pc : gup_must_unshare.part.0+0x64/0x98
[ 61.091242] lr : gup_must_unshare.part.0+0x64/0x98
[ 61.091592] sp : ffff8000825eb940
[ 61.091826] x29: ffff8000825eb940 x28: 0000000000000000 x27: fffffc00037b8440
[ 61.092329] x26: 0400000000000001 x25: 0000000000080101 x24: 0000000000080000
[ 61.092835] x23: 0000000000080100 x22: ffff0000cffb9588 x21: ffff0000c8ec6b58
[ 61.093341] x20: 0000ffffad6b1000 x19: fffffc00037b8440 x18: ffffffffffffffff
[ 61.093850] x17: 2864616548656761 x16: 5021202626202965 x15: 6761702865677548
[ 61.094358] x14: 6567615028454741 x13: 2929656761702864 x12: 6165486567615021
[ 61.094858] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffd9e958b7a1c0
[ 61.095359] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 00000000002bffa8
[ 61.095873] x5 : ffff0008bb19e708 x4 : 0000000000000000 x3 : 0000000000000000
[ 61.096380] x2 : 0000000000000000 x1 : ffff0000cf6636c0 x0 : 0000000000000046
[ 61.096894] Call trace:
[ 61.097080] gup_must_unshare.part.0+0x64/0x98
[ 61.097392] gup_pte_range+0x3a8/0x3f0
[ 61.097662] gup_pgd_range+0x1ec/0x280
[ 61.097942] lockless_pages_from_mm+0x64/0x1a0
[ 61.098258] internal_get_user_pages_fast+0xe4/0x1d0
[ 61.098612] pin_user_pages_fast+0x58/0x78
[ 61.098917] pin_longterm_test_start+0xf4/0x2b8
[ 61.099243] gup_test_ioctl+0x170/0x3b0
[ 61.099528] __arm64_sys_ioctl+0xa8/0xf0
[ 61.099822] invoke_syscall.constprop.0+0x7c/0xd0
[ 61.100160] el0_svc_common.constprop.0+0xe8/0x100
[ 61.100500] do_el0_svc+0x38/0xa0
[ 61.100736] el0_svc+0x3c/0x198
[ 61.100971] el0t_64_sync_handler+0x134/0x150
[ 61.101280] el0t_64_sync+0x17c/0x180
[ 61.101543] Code: aa1303e0 f00074c1 912b0021 97fffeb2 (d4210000)
2) Without CONFIG_DEBUG_VM_PGFLAGS
Always detects "not exclusive" for passed tail pages and refuses to PIN
the tail pages R/O, as gup_must_unshare() == true. GUP-fast will fallback
to ordinary GUP. As ordinary GUP properly considers the logical hugetlb
PTE abstraction in hugetlb_follow_page_mask(), pinning the page will
succeed when looking at the PageAnonExclusive on the head page only.
So the only real effect of this is that with cont-PTE hugetlb pages, we'll
always fallback from GUP-fast to ordinary GUP when not working on the head
page, which ends up checking the head page and do the right thing.
Consequently, the cow selftests pass with cont-PTE hugetlb pages as well
without CONFIG_DEBUG_VM_PGFLAGS.
Note that this only applies to anon hugetlb pages that are mapped using
cont-PTEs: for example 64k hugetlb pages on a 4k arm64 kernel.
... and only when R/O-pinning (FOLL_PIN) such pages that are mapped into
the page table R/O using GUP-fast.
On production kernels (and even most debug kernels, that don't set
CONFIG_DEBUG_VM_PGFLAGS) this patch should theoretically not be required
to be backported. But of course, it does not hurt.
Link: https://lkml.kernel.org/r/20230805101256.87306-1-david@redhat.com
Fixes: a7f2266041 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
walk_page_range() and friends often operate under write-locked mmap_lock.
With introduction of vma locks, the vmas have to be locked as well during
such walks to prevent concurrent page faults in these areas. Add an
additional member to mm_walk_ops to indicate locking requirements for the
walk.
The change ensures that page walks which prevent concurrent page faults
by write-locking mmap_lock, operate correctly after introduction of
per-vma locks. With per-vma locks page faults can be handled under vma
lock without taking mmap_lock at all, so write locking mmap_lock would
not stop them. The change ensures vmas are properly locked during such
walks.
A sample issue this solves is do_mbind() performing queue_pages_range()
to queue pages for migration. Without this change a concurrent page
can be faulted into the area and be left out of migration.
Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Suggested-by: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We shouldn't be using a GUP-internal helper if it can be avoided.
Similar to smaps_pte_entry() that uses vm_normal_page(), let's use
vm_normal_page_pmd() that similarly refuses to return the huge zeropage.
In contrast to follow_trans_huge_pmd(), vm_normal_page_pmd():
(1) Will always return the head page, not a tail page of a THP.
If we'd ever call smaps_account with a tail page while setting "compound
= true", we could be in trouble, because smaps_account() would look at
the memmap of unrelated pages.
If we're unlucky, that memmap does not exist at all. Before we removed
PG_doublemap, we could have triggered something similar as in
commit 24d7275ce2 ("fs/proc: task_mmu.c: don't read mapcount for
migration entry").
This can theoretically happen ever since commit ff9f47f6f0 ("mm: proc:
smaps_rollup: do not stall write attempts on mmap_lock"):
(a) We're in show_smaps_rollup() and processed a VMA
(b) We release the mmap lock in show_smaps_rollup() because it is
contended
(c) We merged that VMA with another VMA
(d) We collapsed a THP in that merged VMA at that position
If the end address of the original VMA falls into the middle of a THP
area, we would call smap_gather_stats() with a start address that falls
into a PMD-mapped THP. It's probably very rare to trigger when not
really forced.
(2) Will succeed on a is_pci_p2pdma_page(), like vm_normal_page()
Treat such PMDs here just like smaps_pte_entry() would treat such PTEs.
If such pages would be anonymous, we most certainly would want to
account them.
(3) Will skip over pmd_devmap(), like vm_normal_page() for pte_devmap()
As noted in vm_normal_page(), that is only for handling legacy ZONE_DEVICE
pages. So just like smaps_pte_entry(), we'll now also ignore such PMD
entries.
Especially, follow_pmd_mask() never ends up calling
follow_trans_huge_pmd() on pmd_devmap(). Instead it calls
follow_devmap_pmd() -- which will fail if neither FOLL_GET nor FOLL_PIN
is set.
So skipping pmd_devmap() pages seems to be the right thing to do.
(4) Will properly handle VM_MIXEDMAP/VM_PFNMAP, like vm_normal_page()
We won't be returning a memmap that should be ignored by core-mm, or
worse, a memmap that does not even exist. Note that while
walk_page_range() will skip VM_PFNMAP mappings, walk_page_vma() won't.
Most probably this case doesn't currently really happen on the PMD level,
otherwise we'd already be able to trigger kernel crashes when reading
smaps / smaps_rollup.
So most probably only (1) is relevant in practice as of now, but could only
cause trouble in extreme corner cases.
Let's move follow_trans_huge_pmd() to mm/internal.h to discourage future
reuse in wrong context.
Link: https://lkml.kernel.org/r/20230803143208.383663-3-david@redhat.com
Fixes: ff9f47f6f0 ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: liubo <liubo254@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Unfortunately commit 474098edac ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") missed that follow_page() and
follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
or due to inaccessible (PROT_NONE) VMAs.
As spelled out in commit 0b9d705297 ("mm: numa: Support NUMA hinting
page faults from gup/gup_fast"): "Other follow_page callers like KSM
should not use FOLL_NUMA, or they would fail to get the pages if they use
follow_page instead of get_user_pages."
liubo reported [1] that smaps_rollup results are imprecise, because they
miss accounting of pages that are mapped PROT_NONE. Further, it's easy to
reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
pte_protnone()/pmd_protnone() also indictaes "true" in inaccessible VMAs,
and follow_page() refuses to return such pages right now.
As KVM really depends on these NUMA hinting faults, removing the
pte_protnone()/pmd_protnone() handling in GUP code completely is not
really an option.
To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
to restore the original behavior for now and add better comments.
Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in
is_valid_gup_args(), to add that flag for all external GUP users.
Note that there are three GUP-internal __get_user_pages() users that don't
end up calling is_valid_gup_args() and consequently won't get
FOLL_HONOR_NUMA_FAULT set.
1) get_dump_page(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA
hinting faults.
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
inaccessible VMAs properly, we have to perform VMA accessibility checks in
gup_can_follow_protnone().
As GUP-fast should reject such pages either way in
pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
arm64 that both implement pte_protnone() -- let's just always fallback to
ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
As Linus notes [2], honoring NUMA faults might only make sense for
selected GUP users.
So we should really see if we can instead let relevant GUP callers specify
it manually, and not trigger NUMA hinting faults from GUP as default.
Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and
adding appropriate documenation.
While at it, remove a stale comment from follow_trans_huge_pmd(): That
comment for pmd_protnone() was added in commit 2b4847e730 ("mm: numa:
serialise parallel get_user_page against THP migration"), which noted:
THP does not unmap pages due to a lack of support for migration
entries at a PMD level. This allows races with get_user_pages
Nowadays, we do have PMD migration entries, so the comment no longer
applies. Let's drop it.
[1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
[2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@mail.gmail.com
Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com
Fixes: 474098edac ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: liubo <liubo254@huawei.com>
Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
Reported-by: Peter Xu <peterx@redhat.com>
Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During stress test with attaching and detaching VF from KVM and
simultaneously changing VFs spoofcheck and trust there was a
NULL pointer dereference in ice_reset_vf that VF's VSI is null.
More than one instance of ice_reset_vf() can be running at a given
time. When we rebuild the VSI in ice_reset_vf, another reset can be
triaged from ice_service_task. In this case we can access the currently
uninitialized VSI and cause panic. The window for this racing condition
has been around for a long time but it's much worse after commit
227bf4500a ("ice: move VSI delete outside deconfig") because
the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
we move this lock before accessing to the VF VSI, we can fix
BUG for all cases.
Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
in ice_vsi_stop_all_rx_rings()
With our reproducer, we can hit BUG:
~8h before commit 227bf4500a ("ice: move VSI delete outside deconfig").
~20m after commit 227bf4500a ("ice: move VSI delete outside deconfig").
After this fix we are not able to reproduce it after ~48h
There was commit cf90b74341 ("ice: Fix call trace with null VSI during
VF reset") which also tried to fix this issue, but it was only
partially resolved and the bug still exists.
[ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 6420.665382] #PF: supervisor read access in kernel mode
[ 6420.670521] #PF: error_code(0x0000) - not-present page
[ 6420.675659] PGD 0
[ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
[ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
[ 6420.698729] Workqueue: ice ice_service_task [ice]
[ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
[ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
[ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
[ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
[ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
[ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
[ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
[ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
[ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
[ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
[ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
[ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
[ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
[ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6420.811841] PKRU: 55555554
[ 6420.811842] Call Trace:
[ 6420.811843] <TASK>
[ 6420.811844] ice_reset_vf+0x9a/0x450 [ice]
[ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice]
[ 6420.841343] ice_service_task+0x23b/0x600 [ice]
[ 6420.845884] ? __schedule+0x212/0x550
[ 6420.849550] process_one_work+0x1e2/0x3b0
[ 6420.853563] ? rescuer_thread+0x390/0x390
[ 6420.857577] worker_thread+0x50/0x3a0
[ 6420.861242] ? rescuer_thread+0x390/0x390
[ 6420.865253] kthread+0xdd/0x100
[ 6420.868400] ? kthread_complete_and_exit+0x20/0x20
[ 6420.873194] ret_from_fork+0x1f/0x30
[ 6420.876774] </TASK>
[ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk
r
Fixes: efe4186000 ("ice: Fix memory corruption in VF driver")
Fixes: f23df5220d ("ice: Fix spurious interrupt during removal of trusted VF")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This reverts commit 7255355a06.
After this commit we are not able to attach VF to VM:
virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
error: Failed to attach interface
error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
ice_check_vf_ready_for_cfg() already contain waiting for reset.
New condition in ice_check_vf_ready_for_reset() causing only problems.
Fixes: 7255355a06 ("ice: Fix ice VF reset during iavf initialization")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver is misconfiguring the hardware for some values of MTU such that
it could use multiple descriptors to receive a packet when it could have
simply used one.
Change the driver to use a round-up instead of the result of a shift, as
the shift can truncate the lower bits of the size, and result in the
problem noted above. It also aligns this driver with similar code in i40e.
The insidiousness of this problem is that everything works with the wrong
size, it's just not working as well as it could, as some MTU sizes end up
using two or more descriptors, and there is no way to tell that is
happening without looking at ice_trace or a bus analyzer.
Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is race issue when concurrently splice_read main trace_pipe and
per_cpu trace_pipes which will result in data read out being different
from what actually writen.
As suggested by Steven:
> I believe we should add a ref count to trace_pipe and the per_cpu
> trace_pipes, where if they are opened, nothing else can read it.
>
> Opening trace_pipe locks all per_cpu ref counts, if any of them are
> open, then the trace_pipe open will fail (and releases any ref counts
> it had taken).
>
> Opening a per_cpu trace_pipe will up the ref count for just that
> CPU buffer. This will allow multiple tasks to read different per_cpu
> trace_pipe files, but will prevent the main trace_pipe file from
> being opened.
But because we only need to know whether per_cpu trace_pipe is open or
not, using a cpumask instead of using ref count may be easier.
After this patch, users will find that:
- Main trace_pipe can be opened by only one user, and if it is
opened, all per_cpu trace_pipes cannot be opened;
- Per_cpu trace_pipes can be opened by multiple users, but each per_cpu
trace_pipe can only be opened by one user. And if one of them is
opened, main trace_pipe cannot be opened.
Link: https://lore.kernel.org/linux-trace-kernel/20230818022645.1948314-1-zhengyejian1@huawei.com
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
While originally it was fine to format strings using "%pOF" while
holding devtree_lock, this now causes a deadlock. Lockdep reports:
of_get_parent from of_fwnode_get_parent+0x18/0x24
^^^^^^^^^^^^^
of_fwnode_get_parent from fwnode_count_parents+0xc/0x28
fwnode_count_parents from fwnode_full_name_string+0x18/0xac
fwnode_full_name_string from device_node_string+0x1a0/0x404
device_node_string from pointer+0x3c0/0x534
pointer from vsnprintf+0x248/0x36c
vsnprintf from vprintk_store+0x130/0x3b4
Fix this by moving the printing in __of_changeset_entry_apply() outside
the lock. As the only difference in the multiple prints is the action
name, use the existing "action_names" to refactor the prints into a
single print.
Fixes: a92eb7621b ("lib/vsprintf: Make use of fwnode API to obtain node names and separators")
Cc: stable@vger.kernel.org
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230801-dt-changeset-fixes-v3-2-5f0410e007dd@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Commit 6f486556ab ("spi: stm32: renaming of spi_master into
spi_controller") included an accidential reverted of a change added in
commit 1e49291125 ("spi: stm32: split large transfers based on word
size instead of bytes").
This breaks large SPI transfers with word sizes > 8 bits, which are
e.g. common when driving MIPI DBI displays.
Fix this by using `spi_split_transfers_maxwords()` instead of
`spi_split_transfers_maxsize()`.
Fixes: 6f486556ab ("spi: stm32: renaming of spi_master into spi_controller")
Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Link: https://lore.kernel.org/r/20230816145237.3159817-1-l.goehrs@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
The Lenovo Thinkbook 14s Yoga ITL has 4 new symbols/shortcuts on their
F9-F11 and PrtSc keys:
F9: Has a symbol of a head with a headset, the manual says "Service key"
F10: Has a symbol of a telephone horn which has been picked up from the
receiver, the manual says: "Answer incoming calls"
F11: Has a symbol of a telephone horn which is resting on the receiver,
the manual says: "Reject incoming calls"
PrtSc: Has a symbol of a siccor and a dashed ellipse, the manual says:
"Open the Windows 'Snipping' Tool app"
This commit adds support for these 4 new hkey events.
Signed-off-by: André Apitzsch <git@apitzsch.eu>
Link: https://lore.kernel.org/r/20230819-lenovo_keys-v1-1-9d34eac88e0a@apitzsch.eu
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
It turns out that some PCSpecialist Elimina Pro 16 M models
have "GM6BGEQ" as DMI product-name instead of "Elimina Pro 16 M",
causing the existing DMI quirk to not work on these models.
The DMI board-name is always "GM6BGEQ", so match on that instead.
Fixes: 56fec0051a ("ACPI: resource: Add IRQ override quirk for PCSpecialist Elimina Pro 16 M")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217394#c36
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Shubhra reports that their laptop is heating up over s2idle. Even though
it's getting into the deepest state, it appears to be having spurious
wakeup events.
While debugging a tangential issue with the RTC Carsten reports that recent
6.1.y based kernel face a similar problem.
Looking at acpidump and GPIO register comparisons these spurious wakeup
events are from the GPIO associated with the I2C touchpad on both laptops
and occur even when the touchpad is not marked as a wake source by the
kernel.
This means that the boot firmware has programmed these bits and because
Linux didn't touch them lead to spurious wakeup events from that GPIO.
To fix this issue, restore most of the code that previously would clear all
the bits associated with wakeup sources. This will allow the kernel to only
program the wake up sources that are necessary.
This is similar to what was done previously; but only the wake bits are
cleared by default instead of interrupts and wake bits. If any other
problems are reported then it may make sense to clear interrupts again too.
Cc: Sachi King <nakato@nakato.io>
Cc: stable@vger.kernel.org
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Fixes: 65f6c7c91c ("pinctrl: amd: Revert "pinctrl: amd: disable and mask interrupts on probe"")
Reported-by: Shubhra Prakash Nandi <email2shubhra@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217754
Reported-by: Carsten Hatger <xmb8dsv4@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217626#c28
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230818144850.1439-1-mario.limonciello@amd.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
pinctrl: renesas: Fixes for v6.5 (take two)
- Fix race conditions in pinctrl group and function creation/remove
calls on the RZ/G2L, RZ/V2M, and RZ/A2 SoC families.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The commit 06470f7468 ("mac80211: add API to allow filtering frames in BA sessions")
added reorder_buf_filtered to mark frames filtered by firmware, and it
can only work correctly if hw.max_rx_aggregation_subframes <= 64 since
it stores the bitmap in a u64 variable.
However, new HE or EHT devices can support BlockAck number up to 256 or
1024, and then using a higher subframe index leads UBSAN warning:
UBSAN: shift-out-of-bounds in net/mac80211/rx.c:1129:39
shift exponent 215 is too large for 64-bit type 'long long unsigned int'
Call Trace:
<IRQ>
dump_stack_lvl+0x48/0x70
dump_stack+0x10/0x20
__ubsan_handle_shift_out_of_bounds+0x1ac/0x360
ieee80211_release_reorder_frame.constprop.0.cold+0x64/0x69 [mac80211]
ieee80211_sta_reorder_release+0x9c/0x400 [mac80211]
ieee80211_prepare_and_rx_handle+0x1234/0x1420 [mac80211]
ieee80211_rx_list+0xaef/0xf60 [mac80211]
ieee80211_rx_napi+0x53/0xd0 [mac80211]
Since only old hardware that supports <=64 BlockAck uses
ieee80211_mark_rx_ba_filtered_frames(), limit the use as it is, so add a
WARN_ONCE() and comment to note to avoid using this function if hardware
capability is not suitable.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20230818014004.16177-1-pkshih@realtek.com
[edit commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jakub asked if I'd be willing to be the maintainer of the macsec code
and review the driver code adding macsec offload, so let's add the
corresponding entry.
The keyword lines are meant to catch selftests and patches adding HW
offload support to other drivers.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull crypto fixes from Herbert Xu:
"Fix a regression in the caam driver and af_alg"
* tag 'v6.5-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: fix uninit-value in af_alg_free_resources
Revert "crypto: caam - adjust RNG timing to support more devices"
This might_sleep() goes back a long time: it was originally introduced
way back when by commit 010060741a ("x86: add might_sleep() to
do_page_fault()"), and made it into the generic VM code when the x86
fault path got re-organized and generalized in commit c2508ec5a5 ("mm:
introduce new 'lock_mm_and_find_vma()' page fault helper").
However, it turns out that the placement of that might_sleep() has
always been rather questionable simply because it's not only a debug
statement to warn about sleeping in contexts that shouldn't sleep (which
was the original reason for adding it), but it also implies a voluntary
scheduling point.
That, in turn, is less than desirable for two reasons:
(a) it ends up being done after we successfully got the mmap_lock, so
just as we got the lock we will now eagerly schedule away and
increase lock contention
and
(b) this is all very possibly part of the "oops, things went horribly
wrong" path and we just haven't figured that out yet
After all, the whole _reason_ for having that get_mmap_lock_carefully()
rather than just doing the obvious mmap_read_lock() is because this code
wants to deal somewhat gracefully with potential kernel wild pointer
bugs.
So then a voluntary scheduling point here is simply not a good idea.
We could certainly turn the 'might_sleep()' into a '__might_sleep()' and
make it be just the debug check that it was originally intended to be.
But even that seems questionable in the wild kernel pointer case - which
again is part of the whole point of this code. The problem wouldn't be
about the _sleeping_ part of the page fault, but about a bad kernel
access. The fact that that bad kernel access might happen in a section
that you shouldn't sleep in is secondary.
So it really ends up being the case that this is simply entirely the
wrong place to do this debug check and related scheduling point at all.
So let's just remove the check entirely. It's been around for over a
decade, it has served its purpose.
The re-schedule will happen at return to user space anyway for the
normal case, and the warning - if we even need it - might be better off
done as a special case for "page fault from kernel mode" once we've
dealt with any potential kernel oopses where the oops is the relevant
thing, not some artificial "scheduling while atomic" test.
Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/lkml/20230820104303.2083444-1-mjguzik@gmail.com/
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update .gitignore to untrack tools directory and log.txt. "tools" is
generated in "selftests/net/Makefile" and log.txt is generated in
"selftests/net/gro.sh" when executing run_all_tests.
Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP sendmsg() is lockless, so ip_select_ident_segs()
can very well be run from multiple cpus [1]
Convert inet->inet_id to an atomic_t, but implement
a dedicated path for TCP, avoiding cost of a locked
instruction (atomic_add_return())
Note that this patch will cause a trivial merge conflict
because we added inet->flags in net-next tree.
v2: added missing change in
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
(David Ahern)
[1]
BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
ip_select_ident_segs include/net/ip.h:542 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
ip_select_ident_segs include/net/ip.h:541 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x184d -> 0x184e
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
==================================================================
Fixes: 23f57406b8 ("ipv4: avoid using shared IP generator for connected sockets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
veth and vxcan need to make sure the ifindexes of the peer
are not negative, core does not validate this.
Using iproute2 with user-space-level checking removed:
Before:
# ./ip link add index 10 type veth peer index -1
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
-1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff
Now:
$ ./ip link add index 10 type veth peer index -1
Error: ifindex can't be negative.
This problem surfaced in net-next because an explicit WARN()
was added, the root cause is older.
Fixes: e6f8f1a739 ("veth: Allow to create peer link with given ifindex")
Fixes: a8f820a380 ("can: add Virtual CAN Tunnel driver (vxcan)")
Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ruan Jinjie says:
====================
net: Fix return value check for fixed_phy_register()
The fixed_phy_register() function returns error pointers and never
returns NULL. Update the checks accordingly.
Changes in v3:
- Drop the error fix patch for fixed_phy_get_gpiod().
- Split the error code update code into another patch set as suggested.
- Update the commit title and message.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The fixed_phy_register() function returns error pointers and never
returns NULL. Update the checks accordingly.
Fixes: b0ba512e25 ("net: bcmgenet: enable driver to work without a device tree")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fixed_phy_register() function returns error pointers and never
returns NULL. Update the checks accordingly.
Fixes: c25b23b8a3 ("bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial core fixes for 6.5-rc7 that resolve
a lot of reported issues.
Primarily in here are the fixes for the serial bus code from Tony that
came in -rc1, as it hit wider testing with the huge number of
different types of systems and serial ports. All of the reported
issues with duplicate names and other issues with this code are now
resolved.
Other than that included in here is:
- n_gsm fix for a previous fix
- 8250 lockdep annotation fix
- fsl_lpuart serial driver fix
- TIOCSTI documentation update for previous CAP_SYS_ADMIN change
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: core: Fix serial core port id, including multiport devices
serial: 8250: drop lockdep annotation from serial8250_clear_IER()
tty: n_gsm: fix the UAF caused by race condition in gsm_cleanup_mux
serial: core: Revert port_id use
TIOCSTI: Document CAP_SYS_ADMIN behaviour in Kconfig
serial: 8250: Fix oops for port->pm on uart_change_pm()
serial: 8250: Reinit port_id when adding back serial8250_isa_devs
serial: core: Fix kmemleak issue for serial core device remove
MAINTAINERS: Merge TTY layer and serial drivers
serial: core: Fix serial_base_match() after fixing controller port name
serial: core: Fix serial core controller port name to show controller id
serial: core: Fix serial core port id to not use port->line
serial: core: Controller id cannot be negative
tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms
Pull rust fix from Miguel Ojeda:
- Macros: fix 'HAS_*' redefinition by the '#[vtable]' macro
under conditional compilation
* tag 'rust-fixes-6.5-rc7' of https://github.com/Rust-for-Linux/linux:
rust: macros: vtable: fix `HAS_*` redefinition (`gen_const_name`)
Since commit 91a7cda1f4 ("net: phy: Fix race condition on link status
change") all the phy_error() method invocations have been causing the
nested-mutex-lock deadlock because it's normally done in the PHY-driver
threaded IRQ handlers which since that change have been called with the
phydev->lock mutex held. Here is the calls thread:
IRQ: phy_interrupt()
+-> mutex_lock(&phydev->lock); <--------------------+
drv->handle_interrupt() | Deadlock due
+-> ERROR: phy_error() + to the nested
+-> phy_process_error() | mutex lock
+-> mutex_lock(&phydev->lock); <-+
phydev->state = PHY_ERROR;
mutex_unlock(&phydev->lock);
mutex_unlock(&phydev->lock);
The problem can be easily reproduced just by calling phy_error() from any
PHY-device threaded interrupt handler. Fix it by dropping the phydev->lock
mutex lock from the phy_process_error() method and printing a nasty error
message to the system log if the mutex isn't held in the caller execution
context.
Note for the fix to work correctly in the PHY-subsystem itself the
phydev->lock mutex locking must be added to the phy_error_precise()
function.
Link: https://lore.kernel.org/netdev/20230816180944.19262-1-fancer.lancer@gmail.com
Fixes: 91a7cda1f4 ("net: phy: Fix race condition on link status change")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle extended compliance code 0x1 (SFF8024_ECC_100G_25GAUI_C2M_AOC)
for active optical cables supporting 25G and 100G speeds.
Since the specification makes no statement about transmitter range, and
as the specific sfp module that had been tested features only 2m fiber -
short-range (SR) modes are selected.
The 100G speed is irrelevant because it would require multiple fibers /
multiple SFP28 modules combined under one netdev.
sfp-bus.c only handles a single module per netdev, so only 25Gbps modes
are selected.
sfp_parse_support already handles SFF8024_ECC_100GBASE_SR4_25GBASE_SR
with compatible properties, however that entry is a contradiction in
itself since with SFP(28) 100GBASE_SR4 is impossible - that would likely
be a mode for qsfp modules only.
Add a case for SFF8024_ECC_100G_25GAUI_C2M_AOC selecting 25gbase-r
interface mode and 25000baseSR link mode.
Also enforce SFP28 bitrate limits on the values read from sfp eeprom as
requested by Russell King.
Tested with fs.com S28-AO02 AOC SFP28 module.
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull i2c fixes from Wolfram Sang:
"Usual set of driver fixes. A bit more than usual because I was
unavailable for a while"
* tag 'i2c-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: bcm-iproc: Fix bcm_iproc_i2c_isr deadlock issue
i2c: Update documentation to use .probe() again
i2c: sun6i-p2wi: Fix an error message in probe()
i2c: hisi: Only handle the interrupt of the driver's transfer
i2c: tegra: Fix i2c-tegra DMA config option processing
i2c: tegra: Fix failure during probe deferral cleanup
i2c: designware: Handle invalid SMBus block data response length value
i2c: designware: Correct length byte validation logic
i2c: imx-lpi2c: return -EINVAL when i2c peripheral clk doesn't work
Pull btrfs fixes from David Sterba:
- fix infinite loop in readdir(), could happen in a big directory when
files get renamed during enumeration
- fix extent map handling of skipped pinned ranges
- fix a corner case when handling ordered extent length
- fix a potential crash when balance cancel races with pause
- verify correct uuid when starting scrub or device replace
* tag 'for-6.5-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix incorrect splitting in btrfs_drop_extent_map_range
btrfs: fix BUG_ON condition in btrfs_cancel_balance
btrfs: only subtract from len_to_oe_boundary when it is tracking an extent
btrfs: fix replace/scrub failure with metadata_uuid
btrfs: fix infinite directory reads
Pull fbdev fixes and cleanups from Helge Deller:
- various code cleanups in amifb, atmel_lcdfb, ssd1307fb, kyro and
goldfishfb
* tag 'fbdev-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: goldfishfb: Do not check 0 for platform_get_irq()
fbdev: atmel_lcdfb: Remove redundant of_match_ptr()
fbdev: kyro: Remove unused declarations
fbdev: ssd1307fb: Print the PWM's label instead of its number
fbdev: mmp: fix value check in mmphw_probe()
fbdev: amifb: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
Pull block fixes from Jens Axboe:
"Main thing here is the fix for the regression in flush handling which
caused IO hangs/stalls for a few reporters. Hopefully that should all
be sorted out now. Outside of that, just a few minor fixes for issues
that were introduced in this cycle"
* tag 'block-6.5-2023-08-19' of git://git.kernel.dk/linux:
blk-mq: release scheduler resource when request completes
blk-crypto: dynamically allocate fallback profile
blk-cgroup: hold queue_lock when removing blkg->q_node
drivers/rnbd: restore sysfs interface to rnbd-client
On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA-
unmapping the Receive buffers, but rpcrdma_post_recvs() neglected
to remap them after a new connection had been established. The
result was immediate failure of the new connection with the Receives
flushing with LOCAL_PROT_ERR.
Fixes: 671c450b6f ("xprtrdma: Fix oops in Receive handler after device removal")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Another highly rare error case when a page allocating loop (inside
__nfs4_get_acl_uncached, this time) is not properly unwound on error.
Since pages array is allocated being uninitialized, need to free only
lower array indices. NULL checks were useful before commit 62a1573fcf
("NFSv4 fix acl retrieval over krb5i/krb5p mounts") when the array had
been initialized to zero on stack.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 62a1573fcf ("NFSv4 fix acl retrieval over krb5i/krb5p mounts")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
There is a slight issue with error handling code inside
nfs42_proc_getxattr(). If page allocating loop fails then we free the
failing page array element which is NULL but __free_page() can't deal with
NULL args.
Found by Linux Verification Center (linuxtesting.org).
Fixes: a1f26739cc ("NFSv4.2: improve page handling for GETXATTR")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Chuck reported [1] an IO hang problem on NFS exports that reside on SATA
devices and bisected to commit 615939a2ae ("blk-mq: defer to the normal
submission path for post-flush requests").
We analysed the IO hang problem, found there are two postflush requests
waiting for each other.
The first postflush request completed the REQ_FSEQ_DATA sequence, so go to
the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but
failed to blk_kick_flush() because of the second postflush request which
is inflight waiting in scheduler queue.
The second postflush waiting in scheduler queue can't be dispatched because
the first postflush hasn't released scheduler resource even though it has
completed by itself.
Fix it by releasing scheduler resource when the first postflush request
completed, so the second postflush can be dispatched and completed, then
make blk_kick_flush() succeed.
While at it, remove the check for e->ops.finish_request, as all
schedulers set that. Reaffirm this requirement by adding a WARN_ON_ONCE()
at scheduler registration time, just like we do for insert_requests and
dispatch_request.
[1] https://lore.kernel.org/all/7A57C7AE-A51A-4254-888B-FE15CA21F9E9@oracle.com/
Link: https://lore.kernel.org/linux-block/20230819031206.2744005-1-chengming.zhou@linux.dev/
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202308172100.8ce4b853-oliver.sang@intel.com
Fixes: 615939a2ae ("blk-mq: defer to the normal submission path for post-flush requests")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20230813152325.3017343-1-chengming.zhou@linux.dev
[axboe: folded in incremental fix and added tags]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Based on the original code semantic in case of Clause 45 MDIO, the address
command is supposed to be followed by the command sending the MMD address,
not the CSR address. The commit 002dd3de09 ("net: mdio: mdio-bitbang:
Separate C22 and C45 transactions") has erroneously broken that. So most
likely due to an unfortunate variable name it switched the code to sending
the CSR address. In our case it caused the protocol malfunction so the
read operation always failed with the turnaround bit always been driven to
one by PHY instead of zero. Fix that by getting back the correct
behaviour: sending MMD address command right after the regular address
command.
Fixes: 002dd3de09 ("net: mdio: mdio-bitbang: Separate C22 and C45 transactions")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
802.1X PAE frames are link-local frames, therefore they must be trapped to
the CPU port. Currently, the MT753X switches treat 802.1X PAE frames as
regular multicast frames, therefore flooding them to user ports. To fix
this, set 802.1X PAE frames to be trapped to the CPU port(s).
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull media fixes from Mauro Carvalho Chehab:
"Three driver fixes"
* tag 'media/v6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: imx: imx7-media-csi: Fix applying format constraints
media: uvcvideo: Fix menu count handling for userspace XU mappings
media: mtk-jpeg: Set platform driver data earlier
Pull x86 fixes from Borislav Petkov:
"Extraordinary embargoed times call for extraordinary measures. That's
why this week's x86/urgent branch is larger than usual, containing all
the known fallout fixes after the SRSO mitigation got merged.
I know, it is a bit late in the game but everyone who has reported a
bug stemming from the SRSO pile, has tested that branch and has
confirmed that it fixes their bug.
Also, I've run it on every possible hardware I have and it is looking
good. It is running on this very machine while I'm typing, for 2 days
now without an issue. Famous last words...
- Use LEA ...%rsp instead of ADD %rsp in the Zen1/2 SRSO return
sequence as latter clobbers flags which interferes with fastop
emulation in KVM, leading to guests freezing during boot
- A fix for the DIV(0) quotient data leak on Zen1 to clear the
divider buffers at the right time
- Disable the SRSO mitigation on unaffected configurations as it got
enabled there unnecessarily
- Change .text section name to fix CONFIG_LTO_CLANG builds
- Improve the optprobe indirect jmp check so that certain
configurations can still be able to use optprobes at all
- A serious and good scrubbing of the untraining routines by PeterZ:
- Add proper speculation stopping traps so that objtool is happy
- Adjust objtool to handle the new thunks
- Make the thunk pointer assignable to the different untraining
sequences at runtime, thus avoiding the alternative at the
return thunk. It simplifies the code a bit too.
- Add a entry_untrain_ret() main entry point which selects the
respective untraining sequence
- Rename things so that they're more clear
- Fix stack validation with FRAME_POINTER=y builds
- Fix static call patching to handle when a JMP to the return thunk
is the last insn on the very last module memory page
- Add more documentation about what each untraining routine does and
why"
* tag 'x86_urgent_for_v6.5_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/srso: Correct the mitigation status when SMT is disabled
x86/static_call: Fix __static_call_fixup()
objtool/x86: Fixup frame-pointer vs rethunk
x86/srso: Explain the untraining sequences a bit more
x86/cpu/kvm: Provide UNTRAIN_RET_VM
x86/cpu: Cleanup the untrain mess
x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
x86/cpu: Rename original retbleed methods
x86/cpu: Clean up SRSO return thunk mess
x86/alternative: Make custom return thunk unconditional
objtool/x86: Fix SRSO mess
x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
x86/cpu: Fix __x86_return_thunk symbol type
x86/retpoline,kprobes: Skip optprobe check for indirect jumps with retpolines and IBT
x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
x86/srso: Disable the mitigation on unaffected configurations
x86/CPU/AMD: Fix the DIV(0) initial fix attempt
x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
Pull powerpc fix from Michael Ellerman:
- Fix hardened usercopy BUG when using /proc based firmware update
interface
Thanks to Nathan Lynch and Kees Cook.
* tag 'powerpc-6.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/rtas_flash: allow user copy to flash block cache objects
The field 'virtual router' was extended to 12 bits in Spectrum-4.
Therefore, the element 'MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB' needs 3 bits for
Spectrum < 4 and 4 bits for Spectrum >= 4.
The elements are stored in an internal storage scratchpad. Currently, the
MSB is defined there as 3 bits. It means that for Spectrum-4, only 2K VRFs
can be used for multicast routing, as the highest bit is not really used by
the driver. Fix the definition of 'VIRT_ROUTER_MSB' to use 4 bits. Adjust
the definitions of 'virtual router' field in the blocks accordingly - use
'_avoid_size_check' for Spectrum-2 instead of for Spectrum-4. Fix the mask
in parse function to use 4 bits.
Fixes: 6d5d8ebb88 ("mlxsw: Rename virtual router flex key element")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/79bed2b70f6b9ed58d4df02e9798a23da648015b.1692268427.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two most significant bits of the "local_port" field in the SSPR
register are always cleared since they are overwritten by the deprecated
and overlapping "sub_port" field.
On systems with more than 255 local ports (e.g., Spectrum-4), this
results in the firmware maintaining invalid mappings between system port
and local port. Specifically, two different systems ports (0x1 and
0x101) point to the same local port (0x1), which eventually leads to
firmware errors.
Fix by removing the deprecated "sub_port" field.
Fixes: fd24b29a1b ("mlxsw: reg: Align existing registers to use extended local_port field")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/9b909a3033c8d3d6f67f237306bef4411c5e6ae4.1692268427.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, in Spectrum-2 and above, time stamps are extracted from the CQE
into the time stamp fields in 'struct mlxsw_skb_cb', only when the CQE
time stamp type is UTC. The time stamps are read directly from the CQE and
software can get the time stamp in UTC format using CQEv2.
From Spectrum-4, the time stamps that are read from the CQE are allowed
to be also from MIRROR_UTC type.
Therefore, we get a warning [1] from the driver that the time stamp fields
were not set, when LLDP control packet is sent.
Allow the time stamp type to be MIRROR_UTC and set the time stamp in this
case as well.
[1]
WARNING: CPU: 11 PID: 0 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1409 mlxsw_sp2_ptp_hwtstamp_fill+0x1f/0x70 [mlxsw_spectrum]
[...]
Call Trace:
<IRQ>
mlxsw_sp2_ptp_receive+0x3c/0x80 [mlxsw_spectrum]
mlxsw_core_skb_receive+0x119/0x190 [mlxsw_core]
mlxsw_pci_cq_tasklet+0x3c9/0x780 [mlxsw_pci]
tasklet_action_common.constprop.0+0x9f/0x110
__do_softirq+0xbb/0x296
irq_exit_rcu+0x79/0xa0
common_interrupt+0x86/0xa0
</IRQ>
<TASK>
Fixes: 4735402173 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/bcef4d044ef608a4e258d33a7ec0ecd91f480db5.1692268427.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are two network devices(veth1 and veth3) in ns1, and ipvlan1 with
L3S mode and ipvlan2 with L2 mode are created based on them as
figure (1). In this case, ipvlan_register_nf_hook() will be called to
register nf hook which is needed by ipvlans in L3S mode in ns1 and value
of ipvl_nf_hook_refcnt is set to 1.
(1)
ns1 ns2
------------ ------------
veth1--ipvlan1 (L3S)
veth3--ipvlan2 (L2)
(2)
ns1 ns2
------------ ------------
veth1--ipvlan1 (L3S)
ipvlan2 (L2) veth3
| |
|------->-------->--------->--------
migrate
When veth3 migrates from ns1 to ns2 as figure (2), veth3 will register in
ns2 and calls call_netdevice_notifiers with NETDEV_REGISTER event:
dev_change_net_namespace
call_netdevice_notifiers
ipvlan_device_event
ipvlan_migrate_l3s_hook
ipvlan_register_nf_hook(newnet) (I)
ipvlan_unregister_nf_hook(oldnet) (II)
In function ipvlan_migrate_l3s_hook(), ipvl_nf_hook_refcnt in ns1 is not 0
since veth1 with ipvlan1 still in ns1, (I) and (II) will be called to
register nf_hook in ns2 and unregister nf_hook in ns1. As a result,
ipvl_nf_hook_refcnt in ns1 is decreased incorrectly and this in ns2
is increased incorrectly. When the second net namespace is removed, a
reference count leak warning in ipvlan_ns_exit() will be triggered.
This patch add a check before ipvlan_migrate_l3s_hook() is called. The
warning can be triggered as follows:
$ ip netns add ns1
$ ip netns add ns2
$ ip netns exec ns1 ip link add veth1 type veth peer name veth2
$ ip netns exec ns1 ip link add veth3 type veth peer name veth4
$ ip netns exec ns1 ip link add ipv1 link veth1 type ipvlan mode l3s
$ ip netns exec ns1 ip link add ipv2 link veth3 type ipvlan mode l2
$ ip netns exec ns1 ip link set veth3 netns ns2
$ ip net del ns2
Fixes: 3133822f5a ("ipvlan: use pernet operations and restrict l3s hooks to master netns")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230817145449.141827-1-luwei32@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The blamed commit resolved a bug where frames would still get stuck at
egress, even though they're smaller than the maxSDU[tc], because the
driver did not take into account the extra 33 ns that the queue system
needs for scheduling the frame.
It now takes that into account, but the arithmetic that we perform in
vsc9959_tas_remaining_gate_len_ps() is buggy, because we operate on
64-bit unsigned integers, so gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS
may become a very large integer if gate_len_ns < 33 ns.
In practice, this means that we've introduced a regression where all
traffic class gates which are permanently closed will not get detected
by the driver, and we won't enable oversize frame dropping for them.
Before:
mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 1 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 2 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 3 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 4 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 5 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 6 min gate len 0, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
After:
mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
Fixes: 11afdc6526 ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230817120111.3522827-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Fix issues with adjusted MTUs (2 patches), by Sven Eckelmann
- Fix header access for memory reallocation case, by Remi Pommarel
- Fix two memory leaks (2 patches), by Remi Pommarel
* tag 'batadv-net-pullrequest-20230816' of git://git.open-mesh.org/linux-merge:
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
batman-adv: Fix TT global entry leak when client roamed back
batman-adv: Do not get eth header before batadv_check_management_packet
batman-adv: Don't increase MTU when set by user
batman-adv: Trigger events for auto adjusted MTU
====================
Link: https://lore.kernel.org/r/20230816163318.189996-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
blk_crypto_profile_init() calls lockdep_register_key(), which warns and
does not register if the provided memory is a static object.
blk-crypto-fallback currently has a static blk_crypto_profile and calls
blk_crypto_profile_init() thereupon, resulting in the warning and
failure to register.
Fortunately it is simple enough to use a dynamically allocated profile
and make lockdep function correctly.
Fixes: 2fb48d88e7 ("blk-crypto: use dynamic lock class for blk_crypto_profile::lock")
Cc: stable@vger.kernel.org
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230817141615.15387-1-sweettea-kernel@dorminy.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Set the next pointer in filename_trans_read_helper() before attaching
the new node under construction to the list, otherwise garbage would be
dereferenced on subsequent failure during cleanup in the out goto label.
Cc: <stable@vger.kernel.org>
Fixes: 4300590243 ("selinux: implement new format of filename transitions")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull arm64 fixes from Catalin Marinas:
"Two more SME fixes related to ptrace(): ensure that the SME is
properly set up for the target thread and that the thread sees
the ZT registers set via ptrace"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/ptrace: Ensure that the task sees ZT writes on first use
arm64/ptrace: Ensure that SME is set up for target when writing SSVE state
Pull gpio fixes from Bartosz Golaszewski:
- fix a regression in the sysfs interface
- fix a reference counting bug that's been around for years
- MAINTAINERS update
* tag 'gpio-fixes-for-v6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: fix reference leaks when removing GPIO chips still in use
gpiolib: sysfs: Do unexport GPIO when user asks for it
MAINTAINERS: add content regex for gpio-regmap
Pull smb client fix from Steve French:
"A small SMB mount option fix, also for stable"
* tag '6.5-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix null auth
Pull RISC-V fixes from Palmer Dabbelt:
- avoid excessive rejections from seccomp RET_ERRNO rules
- compressed jal/jalr decoding fix
- fixes for independent irq/softirq stacks on kernels built with
CONFIG_FRAME_POINTER=n
- avoid a hang handling uaccess fixups
- another build fix for toolchain ISA strings, this time for Zicsr and
Zifenci on old GNU toolchains
* tag 'riscv-for-linus-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Handle zicsr/zifencei issue between gcc and binutils
riscv: uaccess: Return the number of bytes effectively not copied
riscv: stack: Fixup independent softirq stack for CONFIG_FRAME_POINTER=n
riscv: stack: Fixup independent irq stack for CONFIG_FRAME_POINTER=n
riscv: correct riscv_insn_is_c_jr() and riscv_insn_is_c_jalr()
riscv: entry: set a0 = -ENOSYS only when syscall != -1
Pull sound fixes from Takashi Iwai:
"Slightly bigger than I wished, but here we go, a collection of fixes
for 6.5.
The only change in the core side is the ease for repeated ASoC error
messages, and the rest are all pretty device-specific small fixes
(including regression fixes) for ASoC Intel and HD-audio / USB-audio
quirks"
* tag 'sound-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Remodified 3k pull low procedure
ASoC: rt1308-sdw: fix random louder sound
ALSA: hda/cs8409: Support new Dell Dolphin Variants
ALSA: hda/realtek: Switch Dell Oasis models to use SPI
ALSA: hda/realtek: Add quirks for HP G11 Laptops
ASoC: meson: axg-tdm-formatter: fix channel slot allocation
ASoC: SOF: ipc4-topology: Update the basecfg for copier earlier
ASoC: SOF: intel: hda: Clean up link DMA for IPC3 during stop
ASoC: Intel: sof-sdw-cs42142: fix for codec button mapping
ASoC: Intel: sof-sdw: update jack detection quirk for LunarLake RVP
ASoC: SOF: Fix incorrect use of sizeof in sof_ipc3_do_rx_work()
ASoC: lower "no backend DAIs enabled for ... Port" log severity
ASoC: rt5665: add missed regulator_bulk_disable
ASoC: max98363: don't return on success reading revision ID
ALSA: usb-audio: Add support for Mythware XA001AU capture and playback interfaces.
ASoC: fsl: micfil: Use dual license micfil code
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix in_flight[issue_type] value error to properly manage requests
MMC host:
- wbsd: Fix double free in the probe error path
- sunplus: Fix error path in probe
- sdhci_f_sdh30: Fix order of function calls in sdhci_f_sdh30_remove"
* tag 'mmc-v6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: f-sdh30: fix order of function calls in sdhci_f_sdh30_remove
mmc: sunplus: Fix error handling in spmmc_drv_probe()
mmc: sunplus: fix return value check of mmc_add_host()
mmc: wbsd: fix double mmc_free_host() in wbsd_init()
mmc: block: Fix in_flight[issue_type] value error
The code calling ima_free_kexec_buffer runs long after the memblock
allocator has already been torn down, potentially resulting in a use
after free in memblock_isolate_range.
With KASAN or KFENCE, this use after free will result in a BUG
from the idle task, and a subsequent kernel panic.
Switch ima_free_kexec_buffer over to memblock_free_late to avoid
that issue.
Fixes: fee3ff99bc ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c")
Cc: stable@kernel.org
Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: Mike Rappoport <rppt@kernel.org>
Link: https://lore.kernel.org/r/20230817135759.0888e5ef@imladris.surriel.com
Signed-off-by: Rob Herring <robh@kernel.org>
Pull pin control fixes from Linus Walleij:
"Fixes two issues with the Qualcomm SA8775P platform:
- Some minor device tree binding flunky that is nice to iron out but
more importantly:
- Support the increased interrupt targets mask from 3 to 4 bits,
making interrupts with higher (hardware) numbers work"
* tag 'pinctrl-v6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: Add intr_target_width field to support increased number of interrupt targets
dt-bindings: pinctrl: qcom,sa8775p-tlmm: add gpio function constant
Pull ARM SoC fixes from Arnd Bergmann:
"As usual, mostly DT fixes for the major Arm platforms from Qualcomm
and NXP, plus a bit for Rockchips and others:
The qualcomm fixes mainly deal with their higher-end arm64 devices
trees, fixing issues in L3 interconnect, crypto, thermal, UFS and a
regression for the DSI phy.
NXP i.MX has two correctness fixes for the 64-bit chips, dealing with
the imx93 "anatop" module and the CSI interface. On the 32-bit side,
there are functional fixes for RTC, display and SD card intefaces.
Rockchip fixes are for wifi support on certain boards, a eMMC
stability and DT build warnings.
On TI OMAP, a regulator is described in DT to avoid problems with the
ethernet phy initialization.
The code changes include a missing MMIO serialization on OMAP, plus a
few minor fixes on ASpeed and AMD/Zynq chips"
* tag 'soc-fixes-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (30 commits)
ARM: dts: am335x-bone-common: Add vcc-supply for on-board eeprom
ARM: dts: am335x-bone-common: Add GPIO PHY reset on revision C3 board
soc: aspeed: socinfo: Add kfree for kstrdup
soc: aspeed: uart-routing: Use __sysfs_match_string
ARM: dts: integrator: fix PCI bus dtc warnings
arm64: dts: imx93: Fix anatop node size
arm64: dts: qcom: sc7180: Fix DSI0_PHY reg-names
ARM: dts: imx: Set default tuning step for imx6sx usdhc
arm64: dts: imx8mm: Drop CSI1 PHY reference clock configuration
arm64: dts: imx8mn: Drop CSI1 PHY reference clock configuration
ARM: dts: imx: Set default tuning step for imx7d usdhc
ARM: dts: imx6: phytec: fix RTC interrupt level
ARM: dts: imx6sx: Remove LDB endpoint
arm64: dts: rockchip: Fix Wifi/Bluetooth on ROCK Pi 4 boards
ARM: zynq: Explicitly include correct DT includes
arm64: dts: qcom: sa8775p-ride: Update L4C parameters
arm64: dts: rockchip: minor whitespace cleanup around '='
arm64: dts: rockchip: Disable HS400 for eMMC on ROCK 4C+
arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4
arm64: dts: rockchip: add missing space before { on indiedroid nova
...
Pull asm-generic regression fix from Arnd Bergmann:
"Just one partial revert for a commit from the merge window that caused
annoying behavior when building old kernels on arm64 hosts"
* tag 'asm-generic-fix-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: partially revert "Unify uapi bitsperlong.h for arm64, riscv and loongarch"
In production we were seeing a variety of WARN_ON()'s in the extent_map
code, specifically in btrfs_drop_extent_map_range() when we have to call
add_extent_mapping() for our second split.
Consider the following extent map layout
PINNED
[0 16K) [32K, 48K)
and then we call btrfs_drop_extent_map_range for [0, 36K), with
skip_pinned == true. The initial loop will have
start = 0
end = 36K
len = 36K
we will find the [0, 16k) extent, but since we are pinned we will skip
it, which has this code
start = em_end;
if (end != (u64)-1)
len = start + len - em_end;
em_end here is 16K, so now the values are
start = 16K
len = 16K + 36K - 16K = 36K
len should instead be 20K. This is a problem when we find the next
extent at [32K, 48K), we need to split this extent to leave [36K, 48k),
however the code for the split looks like this
split->start = start + len;
split->len = em_end - (start + len);
In this case we have
em_end = 48K
split->start = 16K + 36K // this should be 16K + 20K
split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K
and now we have an invalid extent_map in the tree that potentially
overlaps other entries in the extent map. Even in the non-overlapping
case we will have split->start set improperly, which will cause problems
with any block related calculations.
We don't actually need len in this loop, we can simply use end as our
end point, and only adjust start up when we find a pinned extent we need
to skip.
Adjust the logic to do this, which keeps us from inserting an invalid
extent map.
We only skip_pinned in the relocation case, so this is relatively rare,
except in the case where you are running relocation a lot, which can
happen with auto relocation on.
Fixes: 55ef689900 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fix the below random NULL pointer crash during boot by serializing
pinctrl group and function creation/remove calls in
rzv2m_dt_subnode_to_map() with mutex lock.
Crash logs:
pc : __pi_strcmp+0x20/0x140
lr : pinmux_func_name_to_selector+0x68/0xa4
Call trace:
__pi_strcmp+0x20/0x140
pinmux_generic_add_function+0x34/0xcc
rzv2m_dt_subnode_to_map+0x2e4/0x418
rzv2m_dt_node_to_map+0x15c/0x18c
pinctrl_dt_to_map+0x218/0x37c
create_pinctrl+0x70/0x3d8
While at it, add a comment for lock.
Fixes: 92a9b82525 ("pinctrl: renesas: Add RZ/V2M pin and gpio controller driver")
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230815131558.33787-3-biju.das.jz@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fix the below random NULL pointer crash during boot by serializing
pinctrl group and function creation/remove calls in
rzg2l_dt_subnode_to_map() with mutex lock.
Crash log:
pc : __pi_strcmp+0x20/0x140
lr : pinmux_func_name_to_selector+0x68/0xa4
Call trace:
__pi_strcmp+0x20/0x140
pinmux_generic_add_function+0x34/0xcc
rzg2l_dt_subnode_to_map+0x314/0x44c
rzg2l_dt_node_to_map+0x164/0x194
pinctrl_dt_to_map+0x218/0x37c
create_pinctrl+0x70/0x3d8
While at it, add comments for bitmap_lock and lock.
Fixes: c4c4637eb5 ("pinctrl: renesas: Add RZ/G2L pin and gpio controller driver")
Tested-by: Chris Paterson <Chris.Paterson2@renesas.com>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230815131558.33787-2-biju.das.jz@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Syzbot was able to trigger use of uninitialized memory in
af_alg_free_resources.
Bug is caused by missing initialization of rsgl->sgl.need_unpin before
adding to rsgl_list. Then in case of extract_iter_to_sg() failure, rsgl
is left with uninitialized need_unpin which is read during clean up
BUG: KMSAN: uninit-value in af_alg_free_sg crypto/af_alg.c:545 [inline]
BUG: KMSAN: uninit-value in af_alg_free_areq_sgls crypto/af_alg.c:778 [inline]
BUG: KMSAN: uninit-value in af_alg_free_resources+0x3d1/0xf60 crypto/af_alg.c:1117
af_alg_free_sg crypto/af_alg.c:545 [inline]
af_alg_free_areq_sgls crypto/af_alg.c:778 [inline]
af_alg_free_resources+0x3d1/0xf60 crypto/af_alg.c:1117
_skcipher_recvmsg crypto/algif_skcipher.c:144 [inline]
...
Uninit was created at:
slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767
slab_alloc_node mm/slub.c:3470 [inline]
__kmem_cache_alloc_node+0x536/0x8d0 mm/slub.c:3509
__do_kmalloc_node mm/slab_common.c:984 [inline]
__kmalloc+0x121/0x3c0 mm/slab_common.c:998
kmalloc include/linux/slab.h:586 [inline]
sock_kmalloc+0x128/0x1c0 net/core/sock.c:2683
af_alg_alloc_areq+0x41/0x2a0 crypto/af_alg.c:1188
_skcipher_recvmsg crypto/algif_skcipher.c:71 [inline]
Fixes: c1abe6f570 ("crypto: af_alg: Use extract_iter_to_sg() to create scatterlists")
Reported-and-tested-by: syzbot+cba21d50095623218389@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cba21d50095623218389
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In case the downstream bridge or panel uses DSI transfers before the
DSI host was actually initialized through samsung_dsim_atomic_enable()
which clears the stop state (LP11) mode, all transfers will fail.
This happens with downstream bridges that are controlled by DSI
commands such as the tc358762.
As documented in [1] DSI hosts are expected to allow transfers
outside the normal bridge enable/disable flow.
To fix this make sure that stop state is cleared in
samsung_dsim_host_transfer() which restores the previous
behavior.
We also factor out the common code to enable/disable stop state
to samsung_dsim_set_stop_state().
[1] https://docs.kernel.org/gpu/drm-kms-helpers.html#mipi-dsi-bridge-operation
Fixes: 0c14d31306 ("drm: bridge: samsung-dsim: Fix i.MX8M enable flow to meet spec")
Reported-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Tested-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230724151640.555490-1-frieder@fris.de
Pull networking fixes from Jakub Kicinski:
"Including fixes from ipsec and netfilter.
No known outstanding regressions.
Fixes to fixes:
- virtio-net: set queues after driver_ok, avoid a potential race
added by recent fix
- Revert "vlan: Fix VLAN 0 memory leak", it may lead to a warning
when VLAN 0 is registered explicitly
- nf_tables:
- fix false-positive lockdep splat in recent fixes
- don't fail inserts if duplicate has expired (fix test failures)
- fix races between garbage collection and netns dismantle
Current release - new code bugs:
- mlx5: Fix mlx5_cmd_update_root_ft() error flow
Previous releases - regressions:
- phy: fix IRQ-based wake-on-lan over hibernate / power off
Previous releases - always broken:
- sock: fix misuse of sk_under_memory_pressure() preventing system
from exiting global TCP memory pressure if a single cgroup is under
pressure
- fix the RTO timer retransmitting skb every 1ms if linear option is
enabled
- af_key: fix sadb_x_filter validation, amment netlink policy
- ipsec: fix slab-use-after-free in decode_session6()
- macb: in ZynqMP resume always configure PS GTR for non-wakeup
source
Misc:
- netfilter: set default timeout to 3 secs for sctp shutdown send and
recv state (from 300ms), align with protocol timers"
* tag 'net-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
ice: Block switchdev mode when ADQ is active and vice versa
qede: fix firmware halt over suspend and resume
net: do not allow gso_size to be set to GSO_BY_FRAGS
sock: Fix misuse of sk_under_memory_pressure()
sfc: don't fail probe if MAE/TC setup fails
sfc: don't unregister flow_indr if it was never registered
net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset
net/mlx5: Fix mlx5_cmd_update_root_ft() error flow
net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT
i40e: fix misleading debug logs
iavf: fix FDIR rule fields masks validation
ipv6: fix indentation of a config attribute
mailmap: add entries for Simon Horman
broadcom: b44: Use b44_writephy() return value
net: openvswitch: reject negative ifindex
team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
net: phy: broadcom: stub c45 read/write for 54810
netfilter: nft_dynset: disallow object maps
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
...
Pull drm fixes from Dave Airlie:
"Regular enough week, mostly the usual amdgpu and i915 fixes. Also
qaic, nouveau, qxl and a revert for an EDID patch that had some side
effects, along with a couple of panel fixes.
edid:
- revert mode parsing fix that had side effects.
i915:
- Fix the flow for ignoring GuC SLPC efficient frequency selection
- Fix SDVO panel_type initialization
- Fix display probe for IVB Q and IVB D GT2 server
nouveau:
- fix use-after-free in connector code
qaic:
- integer overflow check fix
- fix slicing memory leak
panel:
- fix JDI LT070ME05000 probing
- fix AUO G121EAN01 timings
amdgpu:
- SMU 13.x fixes
- Fix mcbp parameter for gfx9
- SMU 11.x fixes
- Temporary fix for large numbers of XCP partitions
- S0ix fixes
- DCN 2.0 fix
qxl:
- fix use after free race in dumb object allocation"
* tag 'drm-fixes-2023-08-18-1' of git://anongit.freedesktop.org/drm/drm:
drm/qxl: fix UAF on handle creation
Revert "drm/edid: Fix csync detailed mode parsing"
drm/nouveau/disp: fix use-after-free in error handling of nouveau_connector_create
Revert "Revert "drm/amdgpu/display: change pipe policy for DCN 2.0""
drm/amd: flush any delayed gfxoff on suspend entry
drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
drm/amdgpu: skip xcp drm device allocation when out of drm resource
drm/amd/pm: Update pci link width for smu v13.0.6
drm/amd/pm: Fix temperature unit of SMU v13.0.6
drm/amdgpu/pm: fix throttle_status for other than MP1 11.0.7
drm/amdgpu: disable mcbp if parameter zero is set
drm/amd/pm: disallow the fan setting if there is no fan on smu 13.0.0
accel/qaic: Clean up integer overflow checking in map_user_pages()
accel/qaic: Fix slicing memory leak
drm/i915: fix display probe for IVB Q and IVB D GT2 server
drm/i915/sdvo: fix panel_type initialization
drm/i915/guc/slpc: Restore efficient freq earlier
drm/panel: simple: Fix AUO G121EAN01 panel timings according to the docs
drm/panel: JDI LT070ME05000 simplify with dev_err_probe()
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-16 (iavf, i40e)
This series contains updates to iavf and i40e drivers.
Piotr adds checks for unsupported Flow Director rules on iavf.
Andrii replaces incorrect 'write' messaging on read operations for i40e.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: fix misleading debug logs
iavf: fix FDIR rule fields masks validation
====================
Link: https://lore.kernel.org/r/20230816193308.1307535-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through
PCI PM (power management) interface. However this NIC hardware
does not support PCI PM suspend/resume operations so system wide
suspend/resume leads to bad MFW (management firmware) state which
causes various follow-up errors in driver when communicating with
the device/firmware afterwards.
To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding
system to go into suspended/standby mode.
Without this fix device/firmware does not recover unless system
is power cycled.
Fixes: 2950219d87 ("qede: Add basic network device support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The status of global socket memory pressure is updated when:
a) __sk_mem_raise_allocated():
enter: sk_memory_allocated(sk) > sysctl_mem[1]
leave: sk_memory_allocated(sk) <= sysctl_mem[0]
b) __sk_mem_reduce_allocated():
leave: sk_under_memory_pressure(sk) &&
sk_memory_allocated(sk) < sysctl_mem[0]
So the conditions of leaving global pressure are inconstant, which
may lead to the situation that one pressured net-memcg prevents the
global pressure from being cleared when there is indeed no global
pressure, thus the global constrains are still in effect unexpectedly
on the other sockets.
This patch fixes this by ignoring the net-memcg's pressure when
deciding whether should leave global memory pressure.
Fixes: e1aab161e0 ("socket: initial cgroup code.")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20230816091226.1542-1-wuyun.abel@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the value of ZT is set via ptrace we don't disable traps for SME.
This means that when a the task has never used SME before then the value
set via ptrace will never be seen by the target task since it will
trigger a SME access trap which will flush the register state.
Disable SME traps when setting ZT, this means we also need to allocate
storage for SVE if it is not already allocated, for the benefit of
streaming SVE.
Fixes: f90b529bcb ("arm64/sme: Implement ZT0 ptrace support")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 6.3.x
Link: https://lore.kernel.org/r/20230816-arm64-zt-ptrace-first-use-v2-1-00aa82847e28@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When we use NT_ARM_SSVE to either enable streaming mode or change the
vector length for a process we do not currently do anything to ensure that
there is storage allocated for the SME specific register state. If the
task had not previously used SME or we changed the vector length then
the task will not have had TIF_SME set or backing storage for ZA/ZT
allocated, resulting in inconsistent register sizes when saving state
and spurious traps which flush the newly set register state.
We should set TIF_SME to disable traps and ensure that storage is
allocated for ZA and ZT if it is not already allocated. This requires
modifying sme_alloc() to make the flush of any existing register state
optional so we don't disturb existing state for ZA and ZT.
Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 5.19.x
Link: https://lore.kernel.org/r/20230810-arm64-fix-ptrace-race-v1-1-a5361fad2bd6@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Kmemleak report a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128):
comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
hex dump (first 32 bytes):
e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
backtrace:
[<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
[<000000007df90faa>] graph_trace_open+0xb0/0x344
[<00000000737524cd>] __tracing_open+0x450/0xb10
[<0000000098043327>] tracing_open+0x1a0/0x2a0
[<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
[<000000004015bcd6>] vfs_open+0x98/0xd0
[<000000002b5f60c9>] do_open+0x520/0x8d0
[<00000000376c7820>] path_openat+0x1c0/0x3e0
[<00000000336a54b5>] do_filp_open+0x14c/0x324
[<000000002802df13>] do_sys_openat2+0x2c4/0x530
[<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
[<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
[<00000000313647bf>] do_el0_svc+0xac/0xec
[<000000002ef1c651>] el0_svc+0x20/0x30
[<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
[<000000000c309c35>] el0_sync+0x160/0x180
The root cause is descripted as follows:
__tracing_open() { // 1. File 'trace' is being opened;
...
*iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is
// currently set;
...
iter->trace->open(iter); // 3. Call graph_trace_open() here,
// and memory are allocated in it;
...
}
s_start() { // 4. The opened file is being read;
...
*iter->trace = *tr->current_trace; // 5. If tracer is switched to
// 'nop' or others, then memory
// in step 3 are leaked!!!
...
}
To fix it, in s_start(), close tracer before switching then reopen the
new tracer after switching. And some tracers like 'wakeup' may not update
'iter->private' in some cases when reopen, then it should be cleared
to avoid being mistakenly closed again.
Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com
Fixes: d7350c3f45 ("tracing/core: make the read callbacks reentrants")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>:
These two patches add an ACPI HID and update the way the platform-
specific firmware identifier is extracted from the ACPI.
Pull nfsd fix from Chuck Lever:
- Fix new MSG_SPLICE_PAGES support in server's TCP sendmsg helper
* tag 'nfsd-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
sunrpc: set the bv_offset of first bvec in svc_tcp_sendmsg
Be more careful when tearing down the subrequests of an O_DIRECT write
as part of a retransmission.
Reported-by: Chris Mason <clm@fb.com>
Fixes: ed5d588fe4 ("NFS: Try to join page groups before an O_DIRECT retransmission")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pausing and canceling balance can race to interrupt balance lead to BUG_ON
panic in btrfs_cancel_balance. The BUG_ON condition in btrfs_cancel_balance
does not take this race scenario into account.
However, the race condition has no other side effects. We can fix that.
Reproducing it with panic trace like this:
kernel BUG at fs/btrfs/volumes.c:4618!
RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0
Call Trace:
<TASK>
? do_nanosleep+0x60/0x120
? hrtimer_nanosleep+0xb7/0x1a0
? sched_core_clone_cookie+0x70/0x70
btrfs_ioctl_balance_ctl+0x55/0x70
btrfs_ioctl+0xa46/0xd20
__x64_sys_ioctl+0x7d/0xa0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Race scenario as follows:
> mutex_unlock(&fs_info->balance_mutex);
> --------------------
> .......issue pause and cancel req in another thread
> --------------------
> ret = __btrfs_balance(fs_info);
>
> mutex_lock(&fs_info->balance_mutex);
> if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
> btrfs_info(fs_info, "balance: paused");
> btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
> }
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
bio_ctrl->len_to_oe_boundary is used to make sure we stay inside a zone
as we submit bios for writes. Every time we add a page to the bio, we
decrement those bytes from len_to_oe_boundary, and then we submit the
bio if we happen to hit zero.
Most of the time, len_to_oe_boundary gets set to U32_MAX.
submit_extent_page() adds pages into our bio, and the size of the bio
ends up limited by:
- Are we contiguous on disk?
- Does bio_add_page() allow us to stuff more in?
- is len_to_oe_boundary > 0?
The len_to_oe_boundary math starts with U32_MAX, which isn't page or
sector aligned, and subtracts from it until it hits zero. In the
non-zoned case, the last IO we submit before we hit zero is going to be
unaligned, triggering BUGs.
This is hard to trigger because bio_add_page() isn't going to make a bio
of U32_MAX size unless you give it a perfect set of pages and fully
contiguous extents on disk. We can hit it pretty reliably while making
large swapfiles during provisioning because the machine is freshly
booted, mostly idle, and the disk is freshly formatted. It's also
possible to trigger with reads when read_ahead_kb is set to 4GB.
The code has been clean up and shifted around a few times, but this flaw
has been lurking since the counter was added. I think the commit
24e6c80822 ("btrfs: simplify main loop in submit_extent_page") ended
up exposing the bug.
The fix used here is to skip doing math on len_to_oe_boundary unless
we've changed it from the default U32_MAX value. bio_add_page() is the
real limit we want, and there's no reason to do extra math when block
layer is doing it for us.
Sample reproducer, note you'll need to change the path to the bdi and
device:
SUBVOL=/btrfs/swapvol
SWAPFILE=$SUBVOL/swapfile
SZMB=8192
mkfs.btrfs -f /dev/vdb
mount /dev/vdb /btrfs
btrfs subvol create $SUBVOL
chattr +C $SUBVOL
dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
sync
echo 4 > /proc/sys/vm/drop_caches
echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb
while true; do
echo 1 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/drop_caches
dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock
done
Fixes: 24e6c80822 ("btrfs: simplify main loop in submit_extent_page")
CC: stable@vger.kernel.org # 6.4+
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fstests with POST_MKFS_CMD="btrfstune -m" (as in the mailing list)
reported a few of the test cases failing.
The failure scenario can be summarized and simplified as follows:
$ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb1 /dev/sdb2 :0
$ btrfstune -m /dev/sdb1 :0
$ wipefs -a /dev/sdb1 :0
$ mount -o degraded /dev/sdb2 /btrfs :0
$ btrfs replace start -B -f -r 1 /dev/sdb1 /btrfs :1
STDERR:
ERROR: ioctl(DEV_REPLACE_START) failed on "/btrfs": Input/output error
[11290.583502] BTRFS warning (device sdb2): tree block 22036480 mirror 2 has bad fsid, has 99835c32-49f0-4668-9e66-dc277a96b4a6 want da40350c-33ac-4872-92a8-4948ed8c04d0
[11290.586580] BTRFS error (device sdb2): unable to fix up (regular) error at logical 22020096 on dev /dev/sdb8 physical 1048576
As above, the replace is failing because we are verifying the header with
fs_devices::fsid instead of fs_devices::metadata_uuid, despite the
metadata_uuid actually being present.
To fix this, use fs_devices::metadata_uuid. We copy fsid into
fs_devices::metadata_uuid if there is no metadata_uuid, so its fine.
Fixes: a3ddbaebc7 ("btrfs: scrub: introduce a helper to verify one metadata block")
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unifying the asm-generic headers across 32-bit and 64-bit architectures
based on the compiler provided macros was a good idea and appears to work
with all user space, but it caused a regression when building old kernels
on systems that have the new headers installed in /usr/include, as this
combination trips an inconsistency in the kernel's own tools/include
headers that are a mix of userspace and kernel-internal headers.
This affects kernel builds on arm64, riscv64 and loongarch64 systems that
might end up using the "#define __BITS_PER_LONG 32" default from the old
tools headers. Backporting the commit into stable kernels would address
this, but it would still break building kernels without that backport,
and waste time for developers trying to understand the problem.
arm64 build machines are rather common, and on riscv64 this can also
happen in practice, but loongarch64 is probably new enough to not
be used much for building old kernels, so only revert the bits
for arm64 and riscv.
Link: https://lore.kernel.org/all/20230731160402.GB1823389@dev-arch.thelio-3990X/
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 8386f58f8d ("asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch")
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm ARM64 fixes for v6.5
This corrects the invalid path specifier for L3 interconnects in the CPU
nodes of SM8150 and SM8250. It corrects the compatible of the SC8180X L3
node, to pass the binding check.
The crypto core, and its DMA controller, is disabled on SM8350 to avoid
the system from crashing at boot while the issue is diagnosed.
A thermal zone node name conflict is resolved for PM8150L, on the RB5
board.
The UFS vccq voltage is corrected on the SA877P Ride platform, to
address observed stability issues.
The reg-names of the DSI phy on SC7180 are restored after an accidental
search-and-replace update.
* tag 'qcom-arm64-fixes-for-6.5' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sc7180: Fix DSI0_PHY reg-names
arm64: dts: qcom: sa8775p-ride: Update L4C parameters
arm64: dts: qcom: qrb5165-rb5: fix thermal zone conflict
arm64: dts: qcom: sm8350: fix BAM DMA crash and reboot
arm64: dts: qcom: sc8180x: Fix OSM L3 compatible
arm64: dts: qcom: sm8250: Fix EPSS L3 interconnect cells
arm64: dts: qcom: sm8150: Fix OSM L3 interconnect cells
Link: https://lore.kernel.org/r/20230815142042.2459048-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes for omaps
A fix external abort on non-linefetch for am335x that is fixed with a flush
of posted write. And two networking fixes for beaglebone mostly for revision
c3 to do phy reset with a gpio and to fix a boot time warning.
* tag 'omap-for-v6.5/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am335x-bone-common: Add vcc-supply for on-board eeprom
ARM: dts: am335x-bone-common: Add GPIO PHY reset on revision C3 board
bus: ti-sysc: Flush posted write on enable before reset
Link: https://lore.kernel.org/r/pull-1692158536-457318@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Correct wifi interrupt flags for some boards, fixed wifi on Rock PI4,
disabled hs400 speeds for some boards having problems with data
intergrity and some dt property/styling fixes.
* tag 'v6.5-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Fix Wifi/Bluetooth on ROCK Pi 4 boards
arm64: dts: rockchip: minor whitespace cleanup around '='
arm64: dts: rockchip: Disable HS400 for eMMC on ROCK 4C+
arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4
arm64: dts: rockchip: add missing space before { on indiedroid nova
arm64: dts: rockchip: correct wifi interrupt flag in Box Demo
arm64: dts: rockchip: correct wifi interrupt flag in Rock Pi 4B
arm64: dts: rockchip: correct wifi interrupt flag in eaidk-610
arm64: dts: rockchip: Drop invalid regulator-init-microvolt property
Link: https://lore.kernel.org/r/4519945.8hzESeGDPO@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Use a device property "cirrus,firmware-uid" to get the unique firmware
identifier instead of using ACPI _SUB. There aren't any products that use
_SUB.
There will not usually be a _SUB in Soundwire nodes. The ACPI can use a
_DSD section for custom properties.
There is also a need to support instantiating this driver using software
nodes. This is for systems where the CS35L56 is a back-end device and the
ACPI refers only to the front-end audio device - there will not be any ACPI
references to CS35L56.
Fixes: e496112529 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230817112712.16637-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Christian reported spurious module load crashes after some of Song's
module memory layout patches.
Turns out that if the very last instruction on the very last page of the
module is a 'JMP __x86_return_thunk' then __static_call_fixup() will
trip a fault and die.
And while the module rework made this slightly more likely to happen,
it's always been possible.
Fixes: ee88d363d1 ("x86,static_call: Use alternative RET encoding")
Reported-by: Christian Bricart <christian@bricart.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lkml.kernel.org/r/20230816104419.GA982867@hirez.programming.kicks-ass.net
If the switch is reset during active EEPROM transactions, as in
just after an SoC reset after power up, the I2C bus transaction
may be cut short leaving the EEPROM internal I2C state machine
in the wrong state. When the switch is reset again, the bad
state machine state may result in data being read from the wrong
memory location causing the switch to enter unexpected mode
rendering it inoperational.
Fixes: a3dcb3e7e7 ("net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset")
Signed-off-by: Alfred Lee <l00g33k@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230815001323.24739-1-l00g33k@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the
/proc/powerpc/rtas/firmware_update interface to prepare a system
firmware update yields a BUG():
kernel BUG at mm/usercopy.c:102!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ #2
Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries
NIP: c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000
REGS: c0000000148c76a0 TRAP: 0700 Not tainted (6.5.0-rc3+)
MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002242 XER: 0000000c
CFAR: c0000000001fbd34 IRQMASK: 0
[ ... GPRs omitted ... ]
NIP usercopy_abort+0xa0/0xb0
LR usercopy_abort+0x9c/0xb0
Call Trace:
usercopy_abort+0x9c/0xb0 (unreliable)
__check_heap_object+0x1b4/0x1d0
__check_object_size+0x2d0/0x380
rtas_flash_write+0xe4/0x250
proc_reg_write+0xfc/0x160
vfs_write+0xfc/0x4e0
ksys_write+0x90/0x160
system_call_exception+0x178/0x320
system_call_common+0x160/0x2c4
The blocks of the firmware image are copied directly from user memory
to objects allocated from flash_block_cache, so flash_block_cache must
be created using kmem_cache_create_usercopy() to mark it safe for user
access.
Fixes: 6d07d1cd30 ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[mpe: Trim and indent oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Fixes: 4ae68b26c3 ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-ass.net
The cited patch change mlx5_cmd_update_root_ft() to work with multiple
peer devices. However, it didn't align the error flow as well.
Hence, Fix the error code to work with multiple peer devices.
Fixes: 222dd18583 ("{net/RDMA}/mlx5: introduce lag_for_each_peer")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Before this fix, running high rate traffic through XDP_REDIRECT
with multibuf could overrun the fifo used to release the
xdp frames after tx completion. This resulted in corrupted data
being consumed on the free side.
The culplirt was a miscalculation of the fifo size: the maximum ratio
between fifo entries / data segments was incorrect. This ratio serves to
calculate the max fifo size for a full sq where each packet uses the
worst case number of entries in the fifo.
This patch fixes the formula and names the constant. It also makes sure
that future values will use a power of 2 number of entries for the fifo
mask to work.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Fixes: 3f734b8c59 ("net/mlx5e: XDP, Use multiple single-entry objects in xdpi_fifo")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
While debugging another issue I noticed that the stack trace contains one
invalid entry at the end:
<idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x150/0x2c0
=> kcompactd+0x9ca/0xc20
=> kthread+0x2f6/0x3d8
=> __ret_from_fork+0x8a/0xe8
=> 0x6b6b6b6b6b6b6b6b
This is because the code failed to add the one element containing the
number of entries to field_size.
Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
While debugging another issue I noticed that the stack trace output
contains the number of entries on top:
<idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
=> 0x10
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x242/0x2c0
=> __wait_for_common+0x434/0x680
=> __wait_rcu_gp+0x198/0x3e0
=> synchronize_rcu+0x112/0x138
=> ring_buffer_reset_online_cpus+0x140/0x2e0
=> tracing_reset_online_cpus+0x15c/0x1d0
=> tracing_set_clock+0x180/0x1d8
=> hist_register_trigger+0x486/0x670
=> event_hist_trigger_parse+0x494/0x1318
=> trigger_process_regex+0x1d4/0x258
=> event_trigger_write+0xb4/0x170
=> vfs_write+0x210/0xad0
=> ksys_write+0x122/0x208
Fix this by skipping the first element. Also replace the pointer
logic with an index variable which is easier to read.
Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Similar to how it doesn't make sense to have UNTRAIN_RET have two
untrain calls, it also doesn't make sense for VMEXIT to have an extra
IBPB call.
This cures VMEXIT doing potentially unret+IBPB or double IBPB.
Also, the (SEV) VMEXIT case seems to have been overlooked.
Redefine the meaning of the synthetic IBPB flags to:
- ENTRY_IBPB -- issue IBPB on entry (was: entry + VMEXIT)
- IBPB_ON_VMEXIT -- issue IBPB on VMEXIT
And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains
the previous behaviour and issues IBPB on entry+VMEXIT.
The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT.
Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that
check IBPB_ON_VMEXIT.
All this avoids having the VMEXIT case having to check both ENTRY_IBPB
and IBPB_ON_VMEXIT and simplifies the alternatives.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
Since there can only be one active return_thunk, there only needs be
one (matching) untrain_ret. It fundamentally doesn't make sense to
allow multiple untrain_ret at the same time.
Fold all the 3 different untrain methods into a single (temporary)
helper stub.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121149.042774962@infradead.org
Use the existing configurable return thunk. There is absolute no
justification for having created this __x86_return_thunk alternative.
To clarify, the whole thing looks like:
Zen3/4 does:
srso_alias_untrain_ret:
nop2
lfence
jmp srso_alias_return_thunk
int3
srso_alias_safe_ret: // aliasses srso_alias_untrain_ret just so
add $8, %rsp
ret
int3
srso_alias_return_thunk:
call srso_alias_safe_ret
ud2
While Zen1/2 does:
srso_untrain_ret:
movabs $foo, %rax
lfence
call srso_safe_ret (jmp srso_return_thunk ?)
int3
srso_safe_ret: // embedded in movabs instruction
add $8,%rsp
ret
int3
srso_return_thunk:
call srso_safe_ret
ud2
While retbleed does:
zen_untrain_ret:
test $0xcc, %bl
lfence
jmp zen_return_thunk
int3
zen_return_thunk: // embedded in the test instruction
ret
int3
Where Zen1/2 flush the BTB entry using the instruction decoder trick
(test,movabs) Zen3/4 use BTB aliasing. SRSO adds a return sequence
(srso_safe_ret()) which forces the function return instruction to
speculate into a trap (UD2). This RET will then mispredict and
execution will continue at the return site read from the top of the
stack.
Pick one of three options at boot (evey function can only ever return
once).
[ bp: Fixup commit message uarch details and add them in a comment in
the code too. Add a comment about the srso_select_mitigation()
dependency on retbleed_select_mitigation(). Add moar ifdeffery for
32-bit builds. Add a dummy srso_untrain_ret_alias() definition for
32-bit alternatives needing the symbol. ]
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry. This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.
To help this problem, flush any delayed work for GFXOFF early in
s2idle entry sequence to ensure that it's off when RLC is changed.
commit 4b31b92b14 ("drm/amdgpu: complete gfxoff allow signal during
suspend without delay") modified power gating flow so that if called
in s0ix that it ensured that GFXOFF wasn't put in work queue but
instead processed immediately.
This is dead code due to commit 10cb67eb8a ("drm/amdgpu: skip
CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly
called as part of the suspend entry code. Remove that dead code.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.
[ 203.349571] [drm] Fence fallback timer expired on ring sdma0
[ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0
For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Update addresses of PCIE link width registers,
& link width format used to populate gpu metrics
table for smu v13.0.6
v2:
Removed ESM register update
v3:
Updated patch subject and message
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The parameter amdgpu_mcbp shall have priority against the default value
calculated from the chip version.
User could disable mcbp by setting the parameter mcbp as zero.
v2: do not trigger preemption in sw ring muxer when mcbp is disabled.
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This testcase is constrived to reproduce a problem that the cpu buffers
become unavailable which is due to 'record_disabled' of array_buffer and
max_buffer being messed up.
Local test result after bugfix:
# ./ftracetest test.d/00basic/snapshot1.tc
=== Ftrace unit tests ===
[1] Snapshot and tracing_cpumask [PASS]
[2] (instance) Snapshot and tracing_cpumask [PASS]
# of passed: 2
# of failed: 0
# of unresolved: 0
# of untested: 0
# of unsupported: 0
# of xfailed: 0
# of undefined(test bug): 0
Link: https://lkml.kernel.org/r/20230805033816.3284594-3-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Trace ring buffer can no longer record anything after executing
following commands at the shell prompt:
# cd /sys/kernel/tracing
# cat tracing_cpumask
fff
# echo 0 > tracing_cpumask
# echo 1 > snapshot
# echo fff > tracing_cpumask
# echo 1 > tracing_on
# echo "hello world" > trace_marker
-bash: echo: write error: Bad file descriptor
The root cause is that:
1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers
in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask());
2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped
with 'tr->max_buffer.buffer', then the 'record_disabled' became 0
(see update_max_tr());
3. After `echo fff > tracing_cpumask`, the 'record_disabled' become -1;
Then array_buffer and max_buffer are both unavailable due to value of
'record_disabled' is not 0.
To fix it, enable or disable both array_buffer and max_buffer at the same
time in tracing_set_cpumask().
Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Fixes: 71babb2705 ("tracing: change CPU ring buffer state from tracing_cpumask")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Return an error if a field's mask is neither full nor empty. When a mask
is only partial the field is not being used for rule programming but it
gives a wrong impression it is used. Fix by returning an error on any
partial mask to make it clear they are not supported.
The ip_ver assignment is moved earlier in code to allow using it in
iavf_validate_fdir_fltr_masks.
Fixes: 527691bf06 ("iavf: Support IPv4 Flow Director filters")
Fixes: e90cbc257a ("iavf: Support IPv6 Flow Director filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
It was reported that the riscv kernel hangs while executing the test
in [1].
Indeed, the test hangs when trying to write a buffer to a file. The
problem is that the riscv implementation of raw_copy_from_user() does not
return the correct number of bytes not written when an exception happens
and is fixed up, instead it always returns the initial size to copy,
even if some bytes were actually copied.
generic_perform_write() pre-faults the user pages and bails out if nothing
can be written, otherwise it will access the userspace buffer: here the
riscv implementation keeps returning it was not able to copy any byte
though the pre-faulting indicates otherwise. So generic_perform_write()
keeps retrying to access the user memory and ends up in an infinite
loop.
Note that before the commit mentioned in [1] that introduced this
regression, it worked because generic_perform_write() would bail out if
only one byte could not be written.
So fix this by returning the number of bytes effectively not written in
__asm_copy_[to|from]_user() and __clear_user(), as it is expected.
Link: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/ [1]
Fixes: ebcbd75e39 ("riscv: Fix the bug in memory access fixup code")
Reported-by: Bo YU <tsu.yubo@gmail.com>
Closes: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/#t
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Closes: https://lore.kernel.org/linux-riscv/ZNOnCakhwIeue3yr@aurel32.net/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Link: https://lore.kernel.org/r/20230811150604.1621784-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The instructions c.jr and c.jalr must have rs1 != 0, but
riscv_insn_is_c_jr() and riscv_insn_is_c_jalr() do not check for this. So,
riscv_insn_is_c_jr() can match a reserved encoding, while
riscv_insn_is_c_jalr() can match the c.ebreak instruction.
Rewrite them with check for rs1 != 0.
Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Fixes: ec5f908775 ("RISC-V: Move riscv_insn_is_* macros into a common header")
Link: https://lore.kernel.org/r/20230731183925.152145-1-namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When we test seccomp with 6.4 kernel, we found errno has wrong value.
If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf5058, we will
get ENOSYS instead. We got same result with commit 9c2598d435 ("riscv:
entry: Save a0 prior syscall_enter_from_user_mode()").
After analysing code, we think that regs->a0 = -ENOSYS should only be
executed when syscall != -1. In __seccomp_filter, when seccomp rejected
this syscall with specified errno, they will set a0 to return number as
syscall ABI, and then return -1. This return number is finally pass as
return number of syscall_enter_from_user_mode, and then is compared with
NR_syscalls after converted to ulong (so it will be ULONG_MAX). The
condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS
is always executed. It covered a0 set by seccomp, so we always get
ENOSYS when match seccomp RET_ERRNO rule.
Fixes: f0bddf5058 ("riscv: entry: Convert to generic entry")
Reported-by: Felix Yan <felixonmars@archlinux.org>
Co-developed-by: Ruizhe Pan <c141028@gmail.com>
Signed-off-by: Ruizhe Pan <c141028@gmail.com>
Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
Tested-by: Felix Yan <felixonmars@archlinux.org>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230801141607.435192-1-CoelacanthusHex@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Set spec->en_3kpull_low default to true.
Then fillback ALC236 and ALC257 to false.
Additional note: this addresses a regression caused by the previous
fix 69ea4c9d02 ("ALSA: hda/realtek - remove 3k pull low procedure").
The previous workaround was applied too widely without necessity,
which resulted in the pop noise at PM again. This patch corrects the
condition and restores the old behavior for the devices that don't
suffer from the original problem.
Fixes: 69ea4c9d02 ("ALSA: hda/realtek - remove 3k pull low procedure")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217732
Link: https://lore.kernel.org/r/01e212a538fc407ca6edd10b81ff7b05@realtek.com
Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
After we remove a GPIO chip that still has some requested descriptors,
gpiod_free_commit() will fail and we will never put the references to the
GPIO device and the owning module in gpiod_free().
Rework this function to:
- not warn on desc == NULL as this is a use-case on which most free
functions silently return
- put the references to desc->gdev and desc->gdev->owner unconditionally
so that the release callback actually gets called when the remaining
references are dropped by external GPIO users
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Florisn Westphal says:
====================
These are netfilter fixes for the *net* tree.
First patch resolves a false-positive lockdep splat:
rcu_dereference is used outside of rcu read lock. Let lockdep
validate that the transaction mutex is locked.
Second patch fixes a kdoc warning added in previous PR.
Third patch fixes a memory leak:
The catchall element isn't disabled correctly, this allows
userspace to deactivate the element again. This results in refcount
underflow which in turn prevents memory release. This was always
broken since the feature was added in 5.13.
Patch 4 fixes an incorrect change in the previous pull request:
Adding a duplicate key to a set should work if the duplicate key
has expired, restore this behaviour. All from myself.
Patch #5 resolves an old historic artifact in sctp conntrack:
a 300ms timeout for shutdown_ack. Increase this to 3s. From Xin Long.
Patch #6 fixes a sysctl data race in ipvs, two threads can clobber the
sysctl value, from Sishuai Gong. This is a day-0 bug that predates git
history.
Patches 7, 8 and 9, from Pablo Neira Ayuso, are also followups
for the previous GC rework in nf_tables: The netlink notifier and the
netns exit path must both increment the gc worker seqcount, else worker
may encounter stale (free'd) pointers.
================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix indentation of a type attribute of IPV6_VTI config entry.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retire some of my email addresses from Kernel activities.
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Fix a slab-out-of-bounds read in xfrm_address_filter.
From Lin Ma.
2) Fix the pfkey sadb_x_filter validation.
From Lin Ma.
3) Use the correct nla_policy structure for XFRMA_SEC_CTX.
From Lin Ma.
4) Fix warnings triggerable by bad packets in the encap functions.
From Herbert Xu.
5) Fix some slab-use-after-free in decode_session6.
From Zhengchao Shao.
6) Fix a possible NULL piointer dereference in xfrm_update_ae_params.
Lin Ma.
7) Add a forgotten nla_policy for XFRMA_MTIMER_THRESH.
From Lin Ma.
8) Don't leak offloaded policies.
From Leon Romanovsky.
9) Delete also the offloading part of an acquire state.
From Leon Romanovsky.
Please pull or let me know if there are problems.
There is infrastructure to rewrite return thunks to point to any
random thunk one desires, unwrap that from CALL_THUNKS, which up to
now was the sole user of that.
[ bp: Make the thunks visible on 32-bit and add ifdeffery for the
32-bit builds. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.775293785@infradead.org
Objtool --rethunk does two things:
- it collects all (tail) call's of __x86_return_thunk and places them
into .return_sites. These are typically compiler generated, but
RET also emits this same.
- it fudges the validation of the __x86_return_thunk symbol; because
this symbol is inside another instruction, it can't actually find
the instruction pointed to by the symbol offset and gets upset.
Because these two things pertained to the same symbol, there was no
pressing need to separate these two separate things.
However, alas, along comes SRSO and more crazy things to deal with
appeared.
The SRSO patch itself added the following symbol names to identify as
rethunk:
'srso_untrain_ret', 'srso_safe_ret' and '__ret'
Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a
new similarly embedded return thunk, and 'srso_untrain_ret' is
completely unrelated to anything the above does (and was only included
because of that INT3 vs UD2 issue fixed previous).
Clear things up by adding a second category for the embedded instruction
thing.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.704502245@infradead.org
vmlinux.o: warning: objtool: srso_untrain_ret() falls through to next function __x86_return_skl()
vmlinux.o: warning: objtool: __x86_return_thunk() falls through to next function __x86_return_skl()
This is because these functions (can) end with CALL, which objtool
does not consider a terminating instruction. Therefore, replace the
INT3 instruction (which is a non-fatal trap) with UD2 (which is a
fatal-trap).
This indicates execution will not continue past this point.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.637802730@infradead.org
Commit
fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
reimplemented __x86_return_thunk with a mix of SYM_FUNC_START and
SYM_CODE_END, this is not a sane combination.
Since nothing should ever actually 'CALL' this, make it consistently
CODE.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.571027074@infradead.org
Return result of b44_writephy() instead of zero to
deal with possible error.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit abdb1742a3 removed code that clears ctx->username when sec=none, so attempting
to mount with '-o sec=none' now fails with -EACCES. Fix it by adding that logic to the
parsing of the 'sec' option, as well as checking if the mount is using null auth before
setting the username when parsing the 'user' option.
Fixes: abdb1742a3 ("cifs: get rid of mount options string parsing")
Cc: stable@vger.kernel.org
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recent changes in net-next (commit 759ab1edb5 ("net: store netdevs
in an xarray")) refactored the handling of pre-assigned ifindexes
and let syzbot surface a latent problem in ovs. ovs does not validate
ifindex, making it possible to create netdev ports with negative
ifindex values. It's easy to repro with YNL:
$ ./cli.py --spec netlink/specs/ovs_datapath.yaml \
--do new \
--json '{"upcall-pid": 1, "name":"my-dp"}'
$ ./cli.py --spec netlink/specs/ovs_vport.yaml \
--do new \
--json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}'
$ ip link show
-65536: some-port0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 7a:48:21:ad:0b:fb brd ff:ff:ff:ff:ff:ff
...
Validate the inputs. Now the second command correctly returns:
$ ./cli.py --spec netlink/specs/ovs_vport.yaml \
--do new \
--json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}'
lib.ynl.NlError: Netlink error: Numerical result out of range
nl_len = 108 (92) nl_flags = 0x300 nl_type = 2
error: -34 extack: {'msg': 'integer out of range', 'unknown': [[type:4 len:36] b'\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x03\x00\xff\xff\xff\x7f\x00\x00\x00\x00\x08\x00\x01\x00\x08\x00\x00\x00'], 'bad-attr': '.ifindex'}
Accept 0 since it used to be silently ignored.
Fixes: 54c4ef34c4 ("openvswitch: allow specifying ifindex of new interfaces")
Reported-by: syzbot+7456b5dcf65111553320@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230814203840.2908710-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to commit 01f4fd2708 ("bonding: Fix incorrect deletion of
ETH_P_8021AD protocol vid from slaves"), we can trigger BUG_ON(!vlan_info)
in unregister_vlan_dev() with the following testcase:
# ip netns add ns1
# ip netns exec ns1 ip link add team1 type team
# ip netns exec ns1 ip link add team_slave type veth peer veth2
# ip netns exec ns1 ip link set team_slave master team1
# ip netns exec ns1 ip link add link team_slave name team_slave.10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link add link team1 name team1.10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link set team_slave nomaster
# ip netns del ns1
Add S-VLAN tag related features support to team driver. So the team driver
will always propagate the VLAN info to its slaves.
Fixes: 8ad227ff89 ("net: vlan: add 802.1ad support")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230814032301.2804971-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Do not allow to insert elements from datapath to objects maps.
Fixes: 8aeff920dc ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Use maybe_get_net() since GC workqueue might race with netns exit path.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Netlink event path is missing a synchronization point with GC
transactions. Add GC sequence number update to netns release path and
netlink event path, any GC transaction losing race will be discarded.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
When two threads run proc_do_sync_threshold() in parallel,
data races could happen between the two memcpy():
Thread-1 Thread-2
memcpy(val, valp, sizeof(val));
memcpy(valp, val, sizeof(val));
This race might mess up the (struct ctl_table *) table->data,
so we add a mutex lock to serialize them.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/B6988E90-0A1E-4B85-BF26-2DAF6D482433@gmail.com/
Signed-off-by: Sishuai Gong <sishuai.system@gmail.com>
Acked-by: Simon Horman <horms@kernel.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
In SCTP protocol, it is using the same timer (T2 timer) for SHUTDOWN and
SHUTDOWN_ACK retransmission. However in sctp conntrack the default timeout
value for SCTP_CONNTRACK_SHUTDOWN_ACK_SENT state is 3 secs while it's 300
msecs for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV state.
As Paolo Valerio noticed, this might cause unwanted expiration of the ct
entry. In my test, with 1s tc netem delay set on the NAT path, after the
SHUTDOWN is sent, the sctp ct entry enters SCTP_CONNTRACK_SHUTDOWN_SEND
state. However, due to 300ms (too short) delay, when the SHUTDOWN_ACK is
sent back from the peer, the sctp ct entry has expired and been deleted,
and then the SHUTDOWN_ACK has to be dropped.
Also, it is confusing these two sysctl options always show 0 due to all
timeout values using sec as unit:
net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0
net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0
This patch fixes it by also using 3 secs for sctp shutdown send and recv
state in sctp conntrack, which is also RTO.initial value in SCTP protocol.
Note that the very short time value for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV
was probably used for a rare scenario where SHUTDOWN is sent on 1st path
but SHUTDOWN_ACK is replied on 2nd path, then a new connection started
immediately on 1st path. So this patch also moves from SHUTDOWN_SEND/RECV
to CLOSE when receiving INIT in the ORIGINAL direction.
Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
nftables selftests fail:
run-tests.sh testcases/sets/0044interval_overlap_0
Expected: 0-2 . 0-3, got:
W: [FAILED] ./testcases/sets/0044interval_overlap_0: got 1
Insertion must ignore duplicate but expired entries.
Moreover, there is a strange asymmetry in nft_pipapo_activate:
It refetches the current element, whereas the other ->activate callbacks
(bitmap, hash, rhash, rbtree) use elem->priv.
Same for .remove: other set implementations take elem->priv,
nft_pipapo_remove fetches elem->priv, then does a relookup,
remove this.
I suspect this was the reason for the change that prompted the
removal of the expired check in pipapo_get() in the first place,
but skipping exired elements there makes no sense to me, this helper
is used for normal get requests, insertions (duplicate check)
and deactivate callback.
In first two cases expired elements must be skipped.
For ->deactivate(), this gets called for DELSETELEM, so it
seems to me that expired elements should be skipped as well, i.e.
delete request should fail with -ENOENT error.
Fixes: 24138933b9 ("netfilter: nf_tables: don't skip expired elements during walk")
Signed-off-by: Florian Westphal <fw@strlen.de>
When flushing, individual set elements are disabled in the next
generation via the ->flush callback.
Catchall elements are not disabled. This is incorrect and may lead to
double-deactivations of catchall elements which then results in memory
leaks:
WARNING: CPU: 1 PID: 3300 at include/net/netfilter/nf_tables.h:1172 nft_map_deactivate+0x549/0x730
CPU: 1 PID: 3300 Comm: nft Not tainted 6.5.0-rc5+ #60
RIP: 0010:nft_map_deactivate+0x549/0x730
[..]
? nft_map_deactivate+0x549/0x730
nf_tables_delset+0xb66/0xeb0
(the warn is due to nft_use_dec() detecting underflow).
Fixes: aaa31047a6 ("netfilter: nftables: add catch-all set element support")
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Jakub Kicinski says:
We've got some new kdoc warnings here:
net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc'
net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc'
include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set'
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20230810104638.746e46f1@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Since platform_get_irq() never returned zero, so it need not to check
whether it returned zero, and we use the return error code of
platform_get_irq() to replace the current return error code.
Please refer to the commit a85a6c86c2 ("driver core: platform: Clarify
that IRQ 0 is invalid") to get that platform_get_irq() never returned
zero.
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Signed-off-by: Helge Deller <deller@gmx.de>
The driver depends on CONFIG_OF, it is not necessary to use
of_match_ptr() here.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Signed-off-by: Helge Deller <deller@gmx.de>
These declarations is never implemented since the beginning of git history.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Pull parisc fix from Helge Deller:
"Fix the parisc TLB ptlock checks so that they can be enabled together
with the lightweight spinlock checks"
* tag 'parisc-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix CONFIG_TLB_PTLOCK to work with lightweight spinlock checks
Pull smb client fixes from Steve French:
"Three smb client fixes, all for stable:
- fix for oops in unmount race with lease break of deferred close
- debugging improvement for reconnect
- fix for fscache deadlock (folio_wait_bit_common hang)"
* tag '6.5-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: display network namespace in debug information
cifs: Release folio lock on fscache read hit.
cifs: fix potential oops in cifs_oplock_break
Pull regulator fixes from Mark Brown:
"Two small driver specific fixes: one incorrect definition for one of
the Qualcomm regulators and better handling of poorly formed DTs in
the DA9063 driver"
* tag 'regulator-fix-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix LDO 12 regulator for PM8550
regulator: da9063: better fix null deref with partial DT
In the real workload, I encountered an issue which could cause the RTO
timer to retransmit the skb per 1ms with linear option enabled. The amount
of lost-retransmitted skbs can go up to 1000+ instantly.
The root cause is that if the icsk_rto happens to be zero in the 6th round
(which is the TCP_THIN_LINEAR_RETRIES value), then it will always be zero
due to the changed calculation method in tcp_retransmit_timer() as follows:
icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
Above line could be converted to
icsk->icsk_rto = min(0 << 1, TCP_RTO_MAX) = 0
Therefore, the timer expires so quickly without any doubt.
I read through the RFC 6298 and found that the RTO value can be rounded
up to a certain value, in Linux, say TCP_RTO_MIN as default, which is
regarded as the lower bound in this patch as suggested by Eric.
Fixes: 36e31b0af5 ("net: TCP thin linear timeouts")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ASoC: Fixes for v6.5
A fairly large collection of fixes here, mostly SOF and Intel related.
The one core fix is Hans' change which reduces the log spam when working
out new use cases for DPCM.
The encode_dma() function has some validation on in_trans->size but it
would be more clear to move those checks to find_and_map_user_pages().
The encode_dma() had two checks:
if (in_trans->addr + in_trans->size < in_trans->addr || !in_trans->size)
return -EINVAL;
The in_trans->addr variable is the starting address. The in_trans->size
variable is the total size of the transfer. The transfer can occur in
parts and the resources->xferred_dma_size tracks how many bytes we have
already transferred.
This patch introduces a new variable "remaining" which represents the
amount we want to transfer (in_trans->size) minus the amount we have
already transferred (resources->xferred_dma_size).
I have modified the check for if in_trans->size is zero to instead check
if in_trans->size is less than resources->xferred_dma_size. If we have
already transferred more bytes than in_trans->size then there are negative
bytes remaining which doesn't make sense. If there are zero bytes
remaining to be copied, just return success.
The check in encode_dma() checked that "addr + size" could not overflow
and barring a driver bug that should work, but it's easier to check if
we do this in parts. First check that "in_trans->addr +
resources->xferred_dma_size" is safe. Then check that "xfer_start_addr +
remaining" is safe.
My final concern was that we are dealing with u64 values but on 32bit
systems the kmalloc() function will truncate the sizes to 32 bits. So
I calculated "total = in_trans->size + offset_in_page(xfer_start_addr);"
and returned -EINVAL if it were >= SIZE_MAX. This will not affect 64bit
systems.
Fixes: 129776ac2e ("accel/qaic: Add control path")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/24d3348b-25ac-4c1b-b171-9dae7c43e4e0@moroto.mountain
When the code to use the PTP HW clock was added, it didn't update
the Kconfig entry for the PTP dependency, leading to build errors,
so update the Kconfig entry to depend on PTP_1588_CLOCK_OPTIONAL.
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.o: in function `iwl_mvm_ptp_init':
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:294: undefined reference to `ptp_clock_register'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:294:(.text+0xce8): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_register'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:301: undefined reference to `ptp_clock_index'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:301:(.text+0xd18): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.o: in function `iwl_mvm_ptp_remove':
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:315: undefined reference to `ptp_clock_index'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:315:(.text+0xe80): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:319: undefined reference to `ptp_clock_unregister'
drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:319:(.text+0xeac): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_unregister'
Fixes: 1595ecce1c ("wifi: iwlwifi: mvm: add support for PTP HW clock (PHC)")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202308110447.4QSJHmFH-lkp@intel.com/
Cc: Krishnanand Prabhu <krishnanand.prabhu@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Gregory Greenman <gregory.greenman@intel.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230812052947.22913-1-rdunlap@infradead.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull virtio fixes from Michael Tsirkin:
"Just a bunch of bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (26 commits)
virtio-mem: check if the config changed before fake offlining memory
virtio-mem: keep retrying on offline_and_remove_memory() errors in Sub Block Mode (SBM)
virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
virtio-mem: remove unsafe unplug in Big Block Mode (BBM)
pds_vdpa: fix up debugfs feature bit printing
pds_vdpa: alloc irq vectors on DRIVER_OK
pds_vdpa: clean and reset vqs entries
pds_vdpa: always allow offering VIRTIO_NET_F_MAC
pds_vdpa: reset to vdpa specified mac
virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case
vdpa/mlx5: Fix crash on shutdown for when no ndev exists
vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
vdpa/mlx5: Fix mr->initialized semantics
vdpa/mlx5: Correct default number of queues when MQ is on
virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs()
vduse: Use proper spinlock for IRQ injection
vdpa: Enable strict validation for netlinks ops
vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check
vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check
vdpa: Add features attr to vdpa_nl_policy for nlattr length check
...
Michal Schmidt says:
====================
octeon_ep: fixes for error and remove paths
I have an Octeon card that's misconfigured in a way that exposes a
couple of bugs in the octeon_ep driver's error paths. It can reproduce
the issues that patches 1 & 4 are fixing. Patches 2 & 3 are a result of
reviewing the nearby code.
====================
Link: https://lore.kernel.org/r/20230810150114.107765-1-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If it fails to get the devices's MAC address, octep_probe exits while
leaving the delayed work intr_poll_task queued. When the work later
runs, it's a use after free.
Move the cancelation of intr_poll_task from octep_remove into
octep_device_cleanup. This does not change anything in the octep_remove
flow, but octep_device_cleanup is called also in the octep_probe error
path, where the cancelation is needed.
Note that the cancelation of ctrl_mbox_task has to follow
intr_poll_task's, because the ctrl_mbox_task may be queued by
intr_poll_task.
Fixes: 24d4333233 ("octeon_ep: poll for control messages")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-5-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
intr_poll_task may queue ctrl_mbox_task. The function
octep_poll_non_ioq_interrupts_cn93_pf does this.
When removing the driver and canceling these two works, cancel
ctrl_mbox_task last to guarantee it does not run anymore.
Fixes: 24d4333233 ("octeon_ep: poll for control messages")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-4-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tx_timeout_task is canceled too early when removing the driver. Nothing
prevents .ndo_tx_timeout from triggering and queuing the work again.
Better cancel it after the netdev is unregistered.
It's harmless for octep_tx_timeout_task to run in the window between the
unregistration and cancelation, because it checks netif_running.
Fixes: 862cd659a6 ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-3-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On Zynq UltraScale+ MPSoC ubuntu platform when systemctl issues suspend,
network manager bring down the interface and goes into suspend. When it
wakes up it again enables the interface.
This leads to xilinx-psgtr "PLL lock timeout" on interface bringup, as
the power management controller power down the entire FPD (including
SERDES) if none of the FPD devices are in use and serdes is not
initialized on resume.
$ sudo rtcwake -m no -s 120 -v
$ sudo systemctl suspend <this does ifconfig eth1 down>
$ ifconfig eth1 up
xilinx-psgtr fd400000.phy: lane 0 (type 10, protocol 5): PLL lock timeout
phy phy-fd400000.phy.0: phy poweron failed --> -110
macb driver is called in this way:
1. macb_close: Stop network interface. In this function, it
reset MACB IP and disables PHY and network interface.
2. macb_suspend: It is called in kernel suspend flow. But because
network interface has been disabled(netif_running(ndev) is
false), it does nothing and returns directly;
3. System goes into suspend state. Some time later, system is
waken up by RTC wakeup device;
4. macb_resume: It does nothing because network interface has
been disabled;
5. macb_open: It is called to enable network interface again. ethernet
interface is initialized in this API but serdes which is power-off
by PMUFW during FPD-off suspend is not initialized again and so
we hit GT PLL lock issue on open.
To resolve this PLL timeout issue always do PS GTR initialization
when ethernet device is configured as non-wakeup source.
Fixes: f22bd29ba1 ("net: macb: Fix ZynqMP SGMII non-wakeup source resume failure")
Fixes: 8b73fa3ae0 ("net: macb: Added ZynqMP-specific initialization")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1691414091-2260697-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
svc_tcp_sendmsg used to factor in the xdr->page_base when sending pages,
but commit 5df5dd03a8 ("sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather
then sendpage") dropped that part of the handling. Fix it by setting
the bv_offset of the first bvec.
Fixes: 5df5dd03a8 ("sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
iproc_i2c_rd_reg() and iproc_i2c_wr_reg() are called from both
interrupt context (e.g. bcm_iproc_i2c_isr) and process context
(e.g. bcm_iproc_i2c_suspend). Therefore, interrupts should be
disabled to avoid potential deadlock. To prevent this scenario,
use spin_lock_irqsave().
Fixes: 9a10387280 ("i2c: iproc: add NIC I2C support")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Acked-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Since commit 03c835f498 ("i2c: Switch .probe() to not take an id
parameter") .probe() is the recommended callback to implement (again).
Reflect this in the documentation and don't mention .probe_new() any
more.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
This should be done before the soft min/max frequencies are restored.
When we disable the "Ignore efficient frequency" flag, GuC does not
actually bring the requested freq down to RPn.
Specifically, this scenario-
- ignore efficient freq set to true
- reduce min to RPn (from efficient)
- suspend
- resume (includes GuC load, restore soft min/max, restore efficient freq)
- validate min freq has been resored to RPn
This will fail if we didn't first restore(disable, in this case) efficient
freq flag before setting the soft min frequency.
v2: Bring the min freq down to RPn when we disable efficient freq (Rodrigo)
Also made the change to set the min softlimit to RPn at init. Otherwise, we
were storing RPe there.
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8736
Fixes: 55f9720dbf ("drm/i915/guc/slpc: Provide sysfs for efficient freq")
Fixes: 95ccf312a1 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency")
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230726010044.3280402-1-vinay.belgaumkar@intel.com
(cherry picked from commit 28e671114f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The "ret" variable is uninitialized. It was the "p2wi->rstc" variable
that was intended. We can also use the %pe string format to print the
error code name instead of just the number.
Fixes: 75ff8a340a ("i2c: sun6i-p2wi: Use devm_clk_get_enabled()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The readdir implementation currently processes always up to the last index
it finds. This however can result in an infinite loop if the directory has
a large number of entries such that they won't all fit in the given buffer
passed to the readdir callback, that is, dir_emit() returns a non-zero
value. Because in that case readdir() will be called again and if in the
meanwhile new directory entries were added and we still can't put all the
remaining entries in the buffer, we keep repeating this over and over.
The following C program and test script reproduce the problem:
$ cat /mnt/readdir_prog.c
#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
DIR *dir = opendir(".");
struct dirent *dd;
while ((dd = readdir(dir))) {
printf("%s\n", dd->d_name);
rename(dd->d_name, "TEMPFILE");
rename("TEMPFILE", dd->d_name);
}
closedir(dir);
}
$ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV &> /dev/null
#mkfs.xfs -f $DEV &> /dev/null
#mkfs.ext4 -F $DEV &> /dev/null
mount $DEV $MNT
mkdir $MNT/testdir
for ((i = 1; i <= 2000; i++)); do
echo -n > $MNT/testdir/file_$i
done
cd $MNT/testdir
/mnt/readdir_prog
cd /mnt
umount $MNT
This behaviour is surprising to applications and it's unlike ext4, xfs,
tmpfs, vfat and other filesystems, which always finish. In this case where
new entries were added due to renames, some file names may be reported
more than once, but this varies according to each filesystem - for example
ext4 never reported the same file more than once while xfs reports the
first 13 file names twice.
So change our readdir implementation to track the last index number when
opendir() is called and then make readdir() never process beyond that
index number. This gives the same behaviour as ext4.
Reported-by: Rob Landley <rob@landley.net>
Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We recently had problems where a network namespace was deleted
causing hard to debug reconnect problems. To help deal with
configuration issues like this it is useful to dump the network
namespace to better debug what happened.
So add this to information displayed in /proc/fs/cifs/DebugData for
the server (and channels if mounted with multichannel). For example:
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0 Net namespace: 4026531840
This can be easily compared with what is displayed for the
processes on the system. For example /proc/1/ns/net in this case
showed the same thing (see below), and we can see that the namespace
is still valid in this example.
'net:[4026531840]'
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Under the current code, when cifs_readpage_worker is called, the call
contract is that the callee should unlock the page. This is documented
in the read_folio section of Documentation/filesystems/vfs.rst as:
> The filesystem should unlock the folio once the read has completed,
> whether it was successful or not.
Without this change, when fscache is in use and cache hit occurs during
a read, the page lock is leaked, producing the following stack on
subsequent reads (via mmap) to the page:
$ cat /proc/3890/task/12864/stack
[<0>] folio_wait_bit_common+0x124/0x350
[<0>] filemap_read_folio+0xad/0xf0
[<0>] filemap_fault+0x8b1/0xab0
[<0>] __do_fault+0x39/0x150
[<0>] do_fault+0x25c/0x3e0
[<0>] __handle_mm_fault+0x6ca/0xc70
[<0>] handle_mm_fault+0xe9/0x350
[<0>] do_user_addr_fault+0x225/0x6c0
[<0>] exc_page_fault+0x84/0x1b0
[<0>] asm_exc_page_fault+0x27/0x30
This requires a reboot to resolve; it is a deadlock.
Note however that the call to cifs_readpage_from_fscache does mark the
page clean, but does not free the folio lock. This happens in
__cifs_readpage_from_fscache on success. Releasing the lock at that
point however is not appropriate as cifs_readahead also calls
cifs_readpage_from_fscache and *does* unconditionally release the lock
after its return. This change therefore effectively makes
cifs_readpage_worker work like cifs_readahead.
Signed-off-by: Russell Harmon <russ@har.mn>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Tegra processors prior to Tegra186 used APB DMA for I2C requiring
CONFIG_TEGRA20_APB_DMA=y while Tegra186 and later use GPC DMA requiring
CONFIG_TEGRA186_GPC_DMA=y.
The check for if the processor uses APB DMA is inverted and so the wrong
DMA config options are checked.
This means if CONFIG_TEGRA20_APB_DMA=y but CONFIG_TEGRA186_GPC_DMA=n
with a Tegra186 or later processor the driver will incorrectly think DMA is
enabled and attempt to request DMA channels that will never be availible,
leaving the driver in a perpetual EPROBE_DEFER state.
Fixes: 48cb6356fa ("i2c: tegra: Add GPCDMA support")
Signed-off-by: Parker Newman <pnewman@connecttech.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Acked-by: Akhil R <akhilrajeev@nvidia.com>
Link: https://lore.kernel.org/r/fcfcf9b3-c8c4-9b34-2ff8-cd60a3d490bd@connecttech.com
Signed-off-by: Wolfram Sang <wsa@kernel.org>
If the driver fails to obtain a DMA channel, it will initiate cleanup
and try to release the DMA channel that couldn't be retrieved. This will
cause a crash because the cleanup will try to dereference an ERR_PTR()-
encoded error code.
However, there's nothing to clean up at this point yet, so we can avoid
this by simply resetting the DMA channel to NULL instead of storing the
error code.
Fixes: fcc8a89a1c ("i2c: tegra: Share same DMA channel for RX and TX")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
In the I2C_FUNC_SMBUS_BLOCK_DATA case, the invalid length byte value
(outside of 1-32) of the SMBus block data response from the Slave device
is not correctly handled by the I2C Designware driver.
In case IC_EMPTYFIFO_HOLD_MASTER_EN==1, which cannot be detected
from the registers, the Master can be disabled only if the STOP bit
is set. Without STOP bit set, the Master remains active, holding the bus
until receiving a block data response length. This hangs the bus and
is unrecoverable.
Avoid this by issuing another dump read to reach the stop condition when
an invalid length byte is received.
Cc: stable@vger.kernel.org
Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/20230726080001.337353-3-tamnguyenchi@os.amperecomputing.com
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
On MX8X platforms, the default clock rate is 0 if without explicit
clock setting in dts nodes. I2c can't work when i2c peripheral clk
rate is 0.
Add a i2c peripheral clk rate check before configuring the clock
register. When i2c peripheral clk rate is 0 and directly return
-EINVAL.
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Acked-by: Dong Aisheng <Aisheng.dong@nxp.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Commit 03e909acd9 ("drm/panel: simple: Add support for AUO G121EAN01.4
panel") added support for this panel model, but the timings it implements
are very different from what the datasheet describes. I checked both the
G121EAN01.0 datasheet from [0] and the G121EAN01.4 one from [1] and they
all have the same timings: for example the LVDS clock typical value is 74.4
MHz, not 66.7 MHz as implemented.
Replace the timings with the ones from the documentation. These timings
have been tested and the clock frequencies verified with an oscilloscope to
ensure they are correct.
Also use struct display_timing instead of struct drm_display_mode in order
to also specify the minimum and maximum values.
[0] https://embedded.avnet.com/product/g121ean01-0/
[1] https://embedded.avnet.com/product/g121ean01-4/
Fixes: 03e909acd9 ("drm/panel: simple: Add support for AUO G121EAN01.4 panel")
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230804151239.835216-1-luca.ceresoli@bootlin.com
This test verifies whether the encapsulated packets have the correct
configured TTL. It does so by sending ICMP packets through the test
topology and mirroring them to a gretap netdevice. On a busy host
however, more than just the test ICMP packets may end up flowing
through the topology, get mirrored, and counted. This leads to
potential spurious failures as the test observes much more mirrored
packets than the sent test packets, and assumes a bug.
Fix this by tightening up the mirror action match. Change it from
matchall to a flower classifier matching on ICMP packets specifically.
Fixes: 45315673e0 ("selftests: forwarding: Test changes in mirror-to-gretap")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kprobes optimization check can_optimize() calls
insn_is_indirect_jump() to detect indirect jump instructions in
a target function. If any is found, creating an optprobe is disallowed
in the function because the jump could be from a jump table and could
potentially land in the middle of the target optprobe.
With retpolines, insn_is_indirect_jump() additionally looks for calls to
indirect thunks which the compiler potentially used to replace original
jumps. This extra check is however unnecessary because jump tables are
disabled when the kernel is built with retpolines. The same is currently
the case with IBT.
Based on this observation, remove the logic to look for calls to
indirect thunks and skip the check for indirect jumps altogether if the
kernel is built with retpolines or IBT. Remove subsequently the symbols
__indirect_thunk_start and __indirect_thunk_end which are no longer
needed.
Dropping this logic indirectly fixes a problem where the range
[__indirect_thunk_start, __indirect_thunk_end] wrongly included also the
return thunk. It caused that machines which used the return thunk as
a mitigation and didn't have it patched by any alternative ended up not
being able to use optprobes in any regular function.
Fixes: 0b53c374b9 ("x86/retpoline: Use -mfunction-return")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230711091952.27944-3-petr.pavlu@suse.com
The linker script arch/x86/kernel/vmlinux.lds.S matches the thunk
sections ".text.__x86.*" from arch/x86/lib/retpoline.S as follows:
.text {
[...]
TEXT_TEXT
[...]
__indirect_thunk_start = .;
*(.text.__x86.*)
__indirect_thunk_end = .;
[...]
}
Macro TEXT_TEXT references TEXT_MAIN which normally expands to only
".text". However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes
".text .text.[0-9a-zA-Z_]*" which wrongly matches also the thunk
sections. The output layout is then different than expected. For
instance, the currently defined range [__indirect_thunk_start,
__indirect_thunk_end] becomes empty.
Prevent the problem by using ".." as the first separator, for example,
".text..__x86.indirect_thunk". This pattern is utilized by other
explicit section names which start with one of the standard prefixes,
such as ".text" or ".data", and that need to be individually selected in
the linker script.
[ nathan: Fix conflicts with SRSO and fold in fix issue brought up by
Andrew Cooper in post-review:
https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ]
Fixes: dc5723b02e ("kbuild: add support for Clang LTO")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com
Skip the srso cmd line parsing which is not needed on Zen1/2 with SMT
disabled and with the proper microcode applied (latter should be the
case anyway) as those are not affected.
Fixes: 5a15d83488 ("x86/srso: Tie SBPB bit setting to microcode patch detection")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230813104517.3346-1-bp@alien8.de
Initially, it was thought that doing an innocuous division in the #DE
handler would take care to prevent any leaking of old data from the
divider but by the time the fault is raised, the speculation has already
advanced too far and such data could already have been used by younger
operations.
Therefore, do the innocuous division on every exit to userspace so that
userspace doesn't see any potentially old data from integer divisions in
kernel space.
Do the same before VMRUN too, to protect host data from leaking into the
guest too.
Fixes: 77245f1c3c ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de
Use LEA instead of ADD when adjusting %rsp in srso_safe_ret{,_alias}()
so as to avoid clobbering flags. Drop one of the INT3 instructions to
account for the LEA consuming one more byte than the ADD.
KVM's emulator makes indirect calls into a jump table of sorts, where
the destination of each call is a small blob of code that performs fast
emulation by executing the target instruction with fixed operands.
E.g. to emulate ADC, fastop() invokes adcb_al_dl():
adcb_al_dl:
<+0>: adc %dl,%al
<+2>: jmp <__x86_return_thunk>
A major motivation for doing fast emulation is to leverage the CPU to
handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
both an input and output to the target of the call. fastop() collects
the RFLAGS result by pushing RFLAGS onto the stack and popping them back
into a variable (held in %rdi in this case):
asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
<+71>: mov 0xc0(%r8),%rdx
<+78>: mov 0x100(%r8),%rcx
<+85>: push %rdi
<+86>: popf
<+87>: call *%rsi
<+89>: nop
<+90>: nop
<+91>: nop
<+92>: pushf
<+93>: pop %rdi
and then propagating the arithmetic flags into the vCPU's emulator state:
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
<+64>: and $0xfffffffffffff72a,%r9
<+94>: and $0x8d5,%edi
<+109>: or %rdi,%r9
<+122>: mov %r9,0x10(%r8)
The failures can be most easily reproduced by running the "emulator"
test in KVM-Unit-Tests.
If you're feeling a bit of deja vu, see commit b63f20a778
("x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386").
In addition, this breaks booting of clang-compiled guest on
a gcc-compiled host where the host contains the %rsp-modifying SRSO
mitigations.
[ bp: Massage commit message, extend, remove addresses. ]
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Closes: https://lore.kernel.org/all/de474347-122d-54cd-eabf-9dcc95ab9eae@amd.com
Reported-by: Srikanth Aithal <sraithal@amd.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20230810013334.GA5354@dev-arch.thelio-3990X/
Link: https://lore.kernel.org/r/20230811155255.250835-1-seanjc@google.com
For the TLB_PTLOCK checks we used an optimization to store the spc
register into the spinlock to unlock it. This optimization works as
long as the lightweight spinlock checks (CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK)
aren't enabled, because they really check if the lock word is zero or
__ARCH_SPIN_LOCK_UNLOCKED_VAL and abort with a kernel crash
("Spinlock was trashed") otherwise.
Drop that optimization to make it possible to activate both checks
at the same time.
Noticed-by: Sam James <sam@gentoo.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Sam James <sam@gentoo.org>
Cc: stable@vger.kernel.org # v6.4+
Fixes: 15e64ef652 ("parisc: Add lightweight spinlock checks")
This patch uses a vendor register to check whether the system hibernated ever.
The driver will only set the preset when the driver brings up or the system hibernated.
It will avoid the unknown issue that makes the speaker output louder and can't control the volume.
Signed-off-by: Shuming Fan <shumingf@realtek.com
Link: https://lore.kernel.org/r/20230811093822.37573-1-shumingf@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org
Pull Kbuild fixes from Masahiro Yamada:
- Clear errno before calling getline()
- Fix a modpost warning for ARCH=alpha
* tag 'kbuild-fixes-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
alpha: remove __init annotation from exported page_is_ram()
scripts/kallsyms: Fix build failure by setting errno before calling getline()
Pull x86 platform drivers fixes from Hans de Goede:
- lenovo-ymc driver causes keyboard + touchpad to not work with >= 6.4
on some Thinkbook models, fix this
- A set of small fixes for mlx-platform
- Other small fixes and hw-id additions
* tag 'platform-drivers-x86-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: lenovo-ymc: Only bind on machines with a convertible DMI chassis-type
platform: mellanox: Change register offset addresses
platform: mellanox: mlx-platform: Modify graceful shutdown callback and power down mask
platform: mellanox: mlx-platform: Fix signals polarity and latch mask
platform: mellanox: Fix order in exit flow
platform/x86: ISST: Reduce noise for missing numa information in logs
platform/x86: msi-ec: Fix the build
ACPI: scan: Create platform device for CS35L56
platform/x86/amd/pmf: Fix unsigned comparison with less than zero
Pull SCSI fixes from James Bottomley:
"Eleven small fixes, ten in drivers.
Of the two fixes marked core, one is in the raid helper class (used by
some raid device drivers) and the other one is the /proc/scsi/scsi
parsing fix for potential reads beyond the end of the buffer"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedf: Fix firmware halt over suspend and resume
scsi: qedi: Fix firmware halt over suspend and resume
scsi: qedi: Fix potential deadlock on &qedi_percpu->p_work_lock
scsi: lpfc: Remove reftag check in DIF paths
scsi: ufs: renesas: Fix private allocation
scsi: snic: Fix possible memory leak if device_add() fails
scsi: core: Fix possible memory leak if device_add() fails
scsi: core: Fix legacy /proc parsing buffer overflow
scsi: 53c700: Check that command slot is not NULL
scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
The lenovo-ymc driver is causing the keyboard + touchpad to stop working
on some regular laptop models such as the Lenovo ThinkBook 13s G2 ITL 20V9.
The problem is that there are YMC WMI GUID methods in the ACPI tables
of these laptops, despite them not being Yogas and lenovo-ymc loading
causes libinput to see a SW_TABLET_MODE switch with state 1.
This in turn causes libinput to ignore events from the builtin keyboard
and touchpad, since it filters those out for a Yoga in tablet mode.
Similar issues with false-positive SW_TABLET_MODE=1 reporting have
been seen with the intel-hid driver.
Copy the intel-hid driver approach to fix this and only bind to the WMI
device on machines where the DMI chassis-type indicates the machine
is a convertible.
Add a 'force' module parameter to allow overriding the chassis-type check
so that users can easily test if the YMC interface works on models which
report an unexpected chassis-type.
Fixes: e82882cdd2 ("platform/x86: Add driver for Yoga Tablet Mode switch")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229373
Cc: André Apitzsch <git@apitzsch.eu>
Cc: stable@vger.kernel.org
Tested-by: Andrew Kallmeyer <kallmeyeras@gmail.com>
Tested-by: Gergő Köteles <soyer@irl.hu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230812144818.383230-1-hdegoede@redhat.com
Uwe reports:
"Most PHYs signal WoL using an interrupt. So disabling interrupts [at
shutdown] breaks WoL at least on PHYs covered by the marvell driver."
Discussing with Ioana, the problem which was trying to be solved was:
"The board in question is a LS1021ATSN which has two AR8031 PHYs that
share an interrupt line. In case only one of the PHYs is probed and
there are pending interrupts on the PHY#2 an IRQ storm will happen
since there is no entity to clear the interrupt from PHY#2's registers.
PHY#1's driver will get stuck in .handle_interrupt() indefinitely."
Further confirmation that "the two AR8031 PHYs are on the same MDIO
bus."
With WoL using interrupts to wake the system, in such a case, the
system will begin booting with an asserted interrupt. Thus, we need to
cope with an interrupt asserted during boot.
Solve this instead by disabling interrupts during PHY probe. This will
ensure in Ioana's situation that both PHYs of the same type sharing an
interrupt line on a common MDIO bus will have their interrupt outputs
disabled when the driver probes the device, but before we hook in any
interrupt handlers - thus avoiding the interrupt storm.
A better fix would be for platform firmware to disable the interrupting
devices at source during boot, before control is handed to the kernel.
Fixes: e2f016cf77 ("net: phy: add a shutdown procedure")
Link: 20230804071757.383971-1-u.kleine-koenig@pengutronix.de
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull btrfs fixes from David Sterba:
"More fixes, some of them going back to older releases and there are
fixes for hangs in stress tests regarding space caching:
- fixes and progress tracking for hangs in free space caching, found
by test generic/475
- writeback fixes, write pages in integrity mode and skip writing
pages that have been written meanwhile
- properly clear end of extent range after an error
- relocation fixes:
- fix race betwen qgroup tree creation and relocation
- detect and report invalid reloc roots"
* tag 'for-6.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: set cache_block_group_error if we find an error
btrfs: reject invalid reloc tree root keys with stack dump
btrfs: exit gracefully if reloc roots don't match
btrfs: avoid race between qgroup tree creation and relocation
btrfs: properly clear end of the unreserved range in cow_file_range
btrfs: don't wait for writeback on clean pages in extent_write_cache_pages
btrfs: don't stop integrity writeback too early
btrfs: wait for actual caching progress during allocation
Pull gpio fixes from Bartosz Golaszewski:
- mark virtual chips exposed by gpio-sim as ones that can sleep
(callbacks must not be called from interrupt context)
- fix an off-by-one error in gpio-ws16c48
* tag 'gpio-fixes-for-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: ws16c48: Fix off-by-one error in WS16C48 resource region extent
gpio: sim: mark the GPIO chip as a one that can sleep
The only remaining consumer is new_inode, where it showed up in 2001 as
commit c37fa164f793 ("v2.4.9.9 -> v2.4.9.10") in a historical repo [1]
with a changelog which does not mention it.
Since then the line got only touched up to keep compiling.
While it may have been of benefit back in the day, it is guaranteed to
at best not get in the way in the multicore setting -- as the code
performs *a lot* of work between the prefetch and actual lock acquire,
any contention means the cacheline is already invalid by the time the
routine calls spin_lock(). It adds spurious traffic, for short.
On top of it prefetch is notoriously tricky to use for single-threaded
purposes, making it questionable from the get go.
As such, remove it.
I admit upfront I did not see value in benchmarking this change, but I
can do it if that is deemed appropriate.
Removal from new_inode and of the entire thing are in the same patch as
requested by Linus, so whatever weird looks can be directed at that guy.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/fs/inode.c?id=c37fa164f793735b32aa3f53154ff1a7659e6442 [1]
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull char / misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 6.5-rc6 that resolve
some reported issues. Included in here are:
- bunch of iio driver fixes for reported problems
- interconnect driver fixes
- counter driver build fix
- cardreader driver fixes
- binder driver fixes
- other tiny driver fixes
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
misc: tps6594-esm: Disable ESM for rev 1 PMIC
misc: rtsx: judge ASPM Mode to set PETXCFG Reg
binder: fix memory leak in binder_init()
iio: cros_ec: Fix the allocation size for cros_ec_command
tools/counter: Makefile: Replace rmdir by rm to avoid make,clean failure
iio: imu: lsm6dsx: Fix mount matrix retrieval
iio: adc: meson: fix core clock enable/disable moment
iio: core: Prevent invalid memory access when there is no parent
iio: frequency: admv1013: propagate errors from regulator_get_voltage()
counter: Fix menuconfig "Counter support" submenu entries disappearance
dt-bindings: iio: adi,ad74115: remove ref from -nanoamp
iio: adc: ina2xx: avoid NULL pointer dereference on OF device match
iio: light: bu27008: Fix intensity data type
iio: light: bu27008: Fix scale format
iio: light: bu27034: Fix scale format
iio: adc: ad7192: Fix ac excitation feature
interconnect: qcom: sa8775p: add enable_mask for bcm nodes
interconnect: qcom: sm8550: add enable_mask for bcm nodes
interconnect: qcom: sm8450: add enable_mask for bcm nodes
interconnect: qcom: Add support for mask-based BCMs
...
Pull USB / Thunderbolt driver fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for reported
problems. Included in here are:
- thunderbolt driver memory leak fix
- thunderbolt display flicker fix
- usb dwc3 driver fix
- usb gadget uvc disconnect crash fix
- usb typec Kconfig build dependency fix
- usb typec small fixes
- usb-con-gpio bugfix
- usb-storage old driver bugfix
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
thunderbolt: Fix memory leak in tb_handle_dp_bandwidth_request()
usb: dwc3: Properly handle processing of pending events
usb-storage: alauda: Fix uninit-value in alauda_check_media()
usb: common: usb-conn-gpio: Prevent bailing out if initial role is none
USB: Gadget: core: Help prevent panic during UVC unconfigure
usb: typec: mux: intel: Add dependency on USB_COMMON
usb: typec: nb7vpq904m: Add an error handling path in nb7vpq904m_probe()
usb: typec: altmodes/displayport: Signal hpd when configuring pin assignment
usb: typec: tcpm: Fix response to vsafe0V event
thunderbolt: Fix Thunderbolt 3 display flickering issue on 2nd hot plug onwards
Pull x86 fixes from Borislav Petkov:
- Do not parse the confidential computing blob on non-AMD hardware as
it leads to an EFI config table ending up unmapped
- Use the correct segment selector in the 32-bit version of getcpu() in
the vDSO
- Make sure vDSO and VVAR regions are placed in the 47-bit VA range
even on 5-level paging systems
- Add models 0x90-0x91 to the range of AMD Zenbleed-affected CPUs
* tag 'x86_urgent_for_v6.5_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
x86/linkage: Fix typo of BUILD_VDSO in asm/linkage.h
x86/vdso: Choose the right GDT_ENTRY_CPUNODE for 32-bit getcpu() on 64-bit kernel
x86/sev: Do not try to parse for the CC blob on non-AMD hardware
Pull x86 mitigation fixes from Borislav Petkov:
"The first set of fallout fixes after the embargo madness. There will
be another set next week too.
- A first series of cleanups/unifications and documentation
improvements to the SRSO and GDS mitigations code which got
postponed to after the embargo date
- Fix the SRSO aliasing addresses assertion so that the LLVM linker
can parse it too"
* tag 'x86_bugs_for_v6.5_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
driver core: cpu: Fix the fallback cpu_show_gds() name
x86: Move gds_ucode_mitigated() declaration to header
x86/speculation: Add cpu_show_gds() prototype
driver core: cpu: Make cpu_show_not_affected() static
x86/srso: Fix build breakage with the LLVM linker
Documentation/srso: Document IBPB aspect and fix formatting
driver core: cpu: Unify redundant silly stubs
Documentation/hw-vuln: Unify filename specification in index
i.MX fixes for 6.5, 2nd round:
- Fix i.MX93 ANATOP 'reg' resource size to avoid overlapping with TMU
memory area.
- Fix RTC interrupt level on imx6qdl-phytec-mira board.
- Remove LDB endpoint from from the common imx6sx.dtsi as it causes
regression for boards that has the LCDIF connected directly to
a parallel display.
- Drop CSI1 PHY reference clock configuration from i.MX8MM/N device tree
to avoid overclocking.
- Set a proper default tuning step for i.MX6SX and i.MX7D uSDHC to fix
a tuning failure seen with some SD cards.
* tag 'imx-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx93: Fix anatop node size
ARM: dts: imx: Set default tuning step for imx6sx usdhc
arm64: dts: imx8mm: Drop CSI1 PHY reference clock configuration
arm64: dts: imx8mn: Drop CSI1 PHY reference clock configuration
ARM: dts: imx: Set default tuning step for imx7d usdhc
ARM: dts: imx6: phytec: fix RTC interrupt level
ARM: dts: imx6sx: Remove LDB endpoint
Link: https://lore.kernel.org/r/20230809100034.GS151430@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull tpm irq fixes from Jarkko Sakkinen:
"These change the probing and enabling of interrupts advertised by the
platform firmware (i.e. ACPI, Device Tree) to be an opt-in for tpm_tis,
which can be set from the kernel command-line.
Note that the opt-in change is only for the PC MMIO tpm_tis module. It
does not affect other similar drivers using IRQs, like tpm_tis_spi and
synquacer"
* tag 'tpmdd-v6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm_tis: Opt-in interrupts
tpm: tpm_tis: Fix UPX-i11 DMI_MATCH condition
Pull rdma fixes from Jason Gunthorpe:
"A few small bugs:
- Fix longstanding mlx5 bug where ODP would fail with certain MR
alignments
- cancel work to prevent a hfi1 UAF
- MAINTAINERS update
- UAF, missing mutex_init and an error unwind bug in bnxt_re"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/bnxt_re: Initialize dpi_tbl_lock mutex
RDMA/bnxt_re: Fix error handling in probe failure path
RDMA/bnxt_re: Properly order ib_device_unalloc() to avoid UAF
MAINTAINERS: Remove maintainer of HiSilicon RoCE
IB/hfi1: Fix possible panic during hotplug remove
RDMA/umem: Set iova in ODP flow
Pull zonefs fix from Damien Le Moal:
- The switch to using iomap for executing a direct synchronous write to
sequential files using a zone append BIO overlooked cases where the
BIO built by iomap is too large and needs splitting, which is not
allowed with zone append.
Fix this by using regular write commands instead. The use of zone
append commands will be reintroduced later with proper support from
iomap.
* tag 'zonefs-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: fix synchronous direct writes to sequential files
Pull misc fixes from Andrew Morton:
"14 hotfixes. 11 of these are cc:stable and the remainder address
post-6.4 issues, or are not considered suitable for -stable
backporting"
* tag 'mm-hotfixes-stable-2023-08-11-13-44' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/core: initialize damo_filter->list from damos_new_filter()
nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput
selftests: cgroup: fix test_kmem_basic false positives
fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
MAINTAINERS: add maple tree mailing list
mm: compaction: fix endless looping over same migrate block
selftests: mm: ksm: fix incorrect evaluation of parameter
hugetlb: do not clear hugetlb dtor until allocating vmemmap
mm: memory-failure: avoid false hwpoison page mapped error info
mm: memory-failure: fix potential unexpected return value from unpoison_memory()
mm/swapfile: fix wrong swap entry type for hwpoisoned swapcache page
radix tree test suite: fix incorrect allocation size for pthreads
crypto, cifs: fix error handling in extract_iter_to_sg()
zsmalloc: fix races between modifications of fullness and isolated
Commit
522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")
provided a fix for the Zen2 VZEROUPPER data corruption bug affecting
a range of CPU models, but the AMD Custom APU 0405 found on SteamDeck
was not listed, although it is clearly affected by the vulnerability.
Add this CPU variant to the Zenbleed erratum list, in order to
unconditionally enable the fallback fix until a proper microcode update
is available.
Fixes: 522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230811203705.1699914-1-cristian.ciocaltea@collabora.com
The WinSystems WS16C48 I/O address region spans offsets 0x0 through 0xA,
which is a total of 11 bytes. Fix the WS16C48_EXTENT define to the
correct value of 11 so that access to necessary device registers is
properly requested in the ws16c48_probe() callback by the
devm_request_region() function call.
Fixes: 2c05a0f29f ("gpio: ws16c48: Implement and utilize register structures")
Cc: stable@vger.kernel.org
Cc: Paul Demetrotion <pdemetrotion@winsystems.com>
Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Pull ACPI fixes from Rafael Wysocki:
"Rework the handling of interrupt overrides on AMD Zen-based machines
to avoid recently introduced regressions (Hans de Goede).
Note that this is intended as a short-term mitigation for 6.5 and the
long-term approach will be to attempt to use the configuration left by
the BIOS, but it requires more investigation"
* tag 'acpi-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Add IRQ override quirk for PCSpecialist Elimina Pro 16 M
ACPI: resource: Honor MADT INT_SRC_OVR settings for IRQ1 on AMD Zen
ACPI: resource: Always use MADT override IRQ settings for all legacy non i8042 IRQs
ACPI: resource: revert "Remove "Zen" specific match and quirks"
Pull power management fixes from Rafael Wysocki:
"These fix an amd-pstate cpufreq driver issues and recently introduced
hibernation-related breakage.
Specifics:
- Make amd-pstate use device_attributes as expected by the CPU root
kobject (Thomas Weißschuh)
- Restore the previous behavior of resume_store() when hibernation is
not available which is to return the full number of bytes that were
to be written by user space (Vlastimil Babka)"
* tag 'pm-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: amd-pstate: fix global sysfs attribute type
PM: hibernate: fix resume_store() return value when hibernation not available
We want to fix the serial core port DEVNAME to use a port id of the
hardware specific controller port instance instead of the port->line.
For example, the 8250 driver sets up a number of serial8250 ports
initially that can be inherited by the hardware specific driver. At that
the port->line no longer decribes the port's relation to the serial core
controller instance.
Let's fix the issue by assigning port->port_id for each serial core
controller port instance.
Fixes: 7d695d8376 ("serial: core: Fix serial_base_match() after fixing controller port name")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230811103648.2826-1-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The port lock is not always held when calling serial8250_clear_IER().
When an oops is in progress, the lock is tried to be taken and when it
is not, a warning is issued:
WARNING: CPU: 0 PID: 1 at drivers/tty/serial/8250/8250_port.c:707 +0x57/0x60
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 6.5.0-rc5-1.g225bfb7-default+ #774 00f1be860db663ed29479b8255d3b01ab1135bd3
Hardware name: QEMU Standard PC ...
RIP: 0010:serial8250_clear_IER+0x57/0x60
...
Call Trace:
<TASK>
serial8250_console_write+0x9e/0x4b0
console_flush_all+0x217/0x5f0
...
Therefore, remove the annotation as it doesn't hold for all invocations.
The other option would be to make the lockdep test conditional on
'oops_in_progress' or pass 'locked' from serial8250_console_write(). I
don't think, that is worth it.
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Ogness <john.ogness@linutronix.de>
Fixes: d0b309a5d3 (serial: 8250: synchronize and annotate UART_IER access)
Link: https://lore.kernel.org/r/20230811064340.13400-1-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 9b9c8195f3 ("tty: n_gsm: fix UAF in gsm_cleanup_mux"), the UAF
problem is not completely fixed. There is a race condition in
gsm_cleanup_mux(), which caused this UAF.
The UAF problem is triggered by the following race:
task[5046] task[5054]
----------------------- -----------------------
gsm_cleanup_mux();
dlci = gsm->dlci[0];
mutex_lock(&gsm->mutex);
gsm_cleanup_mux();
dlci = gsm->dlci[0]; //Didn't take the lock
gsm_dlci_release(gsm->dlci[i]);
gsm->dlci[i] = NULL;
mutex_unlock(&gsm->mutex);
mutex_lock(&gsm->mutex);
dlci->dead = true; //UAF
Fix it by assigning values after mutex_lock().
Link: https://syzkaller.appspot.com/text?tag=CrashReport&x=176188b5a80000
Cc: stable <stable@kernel.org>
Fixes: 9b9c8195f3 ("tty: n_gsm: fix UAF in gsm_cleanup_mux")
Fixes: aa371e96f0 ("tty: n_gsm: fix restart handling via CLD command")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Co-developed-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Link: https://lore.kernel.org/r/20230811031121.153237-1-yiyang13@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Fixes for request_queue state (Ming)
- Another uuid quirk (August)
- RCU poll fix for NVMe (Ming)
- Fix for an IO stall with polled IO (me)
- Fix for blk-iocost stats enable/disable accounting (Chengming)
- Regression fix for large pages for zram (Christoph)
* tag 'block-6.5-2023-08-11' of git://git.kernel.dk/linux:
nvme: core: don't hold rcu read lock in nvme_ns_chr_uring_cmd_iopoll
blk-iocost: fix queue stats accounting
block: don't make REQ_POLLED imply REQ_NOWAIT
block: get rid of unused plug->nowait flag
zram: take device and not only bvec offset into account
nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G
nvme-rdma: fix potential unbalanced freeze & unfreeze
nvme-tcp: fix potential unbalanced freeze & unfreeze
nvme: fix possible hang when removing a controller during error recovery
Pull io_uring fixes from Jens Axboe:
"A followup fix for the parisc/SHM_COLOUR fix, also from Helge, which
is heading to stable.
And then just the io_uring equivalent of the RESOLVE_CACHED fix in
commit a0fc452a5d from last week for build_open_flags()"
* tag 'io_uring-6.5-2023-08-11' of git://git.kernel.dk/linux:
io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc
io_uring: correct check for O_TMPFILE
In
6524c798b7 ("driver core: cpu: Make cpu_show_not_affected() static")
I fat-fingered the name of cpu_show_gds(). Usually, I'd rebase but since
those are extraordinary embargoed times, the commit above was already
pulled into another tree so no no.
Therefore, fix it ontop.
Fixes: 6524c798b7 ("driver core: cpu: Make cpu_show_not_affected() static")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230811095831.27513-1-bp@alien8.de
Merge a cpufreq fix for 6.5-rc6.
This makes amd-pstate use device_attributes as expected by the CPU root
kobject.
* pm-cpufreq:
cpufreq: amd-pstate: fix global sysfs attribute type
Pull pci fixes from Bjorn Helgaas:
- Add Manivannan Sadhasivam as DesignWare PCIe driver co-maintainer
(Krzysztof Wilczyński)
- Revert "PCI: dwc: Wait for link up only if link is started" to fix a
regression on Qualcomm platforms that don't reach interconnect sync
state if the slot is empty (Johan Hovold)
- Revert "PCI: mvebu: Mark driver as BROKEN" so people can use
pci-mvebu even though some others report problems (Bjorn Helgaas)
- Avoid a NULL pointer dereference when using acpiphp for root bus
hotplug to fix a regression added in v6.5-rc1 (Igor Mammedov)
* tag 'pci-v6.5-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
Revert "PCI: mvebu: Mark driver as BROKEN"
Revert "PCI: dwc: Wait for link up only if link is started"
MAINTAINERS: Add Manivannan Sadhasivam as DesignWare PCIe driver maintainer
Pull RISC-V fixes from Palmer Dabbelt:
- Fixes for a pair of kexec_file_load() failures
- A fix to ensure the direct mapping is PMD-aligned
- A fix for CPU feature detection on SMP=n
- The MMIO ordering fences have been strengthened to ensure ordering
WRT delay()
- Fixes for a pair of -Wmissing-variable-declarations warnings
- A fix to avoid PUD mappings in vmap on sv39
- flush_cache_vmap() now flushes the TLB to avoid issues on systems
that cache invalid mappings
* tag 'riscv-for-linus-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Implement flush_cache_vmap()
riscv: Do not allow vmap pud mappings for 3-level page table
riscv: mm: fix 2 instances of -Wmissing-variable-declarations
riscv,mmio: Fix readX()-to-delay() ordering
riscv: Fix CPU feature detection with SMP disabled
riscv: Start of DRAM should at least be aligned on PMD size for the direct mapping
riscv/kexec: load initrd high in available memory
riscv/kexec: handle R_RISCV_CALL_PLT relocation type
Pull parisc architecture fixes from Helge Deller:
"A bugfix in the LWS code, which used different lock words than the
parisc lightweight spinlock checks. This inconsistency triggered false
positives when the lightweight spinlock checks checked the locks of
mutexes.
The other patches are trivial cleanups and most of them fix sparse
warnings.
Summary:
- Fix LWS code to use same lock words as for the parisc lightweight
spinlocks
- Use PTR_ERR_OR_ZERO() in pdt init code
- Fix lots of sparse warnings"
* tag 'parisc-for-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: perf: Make cpu_device variable static
parisc: ftrace: Add declaration for ftrace_function_trampoline()
parisc: boot: Nuke some sparse warnings in decompressor
parisc: processor: Include asm/smp.h for init_per_cpu()
parisc: unaligned: Include linux/sysctl.h for unaligned_enabled
parisc: Move proc_mckinley_root and proc_runway_root to sba_iommu
parisc: dma: Add prototype for pcxl_dma_start
parisc: parisc_ksyms: Include libgcc.h for libgcc prototypes
parisc: ucmpdi2: Fix no previous prototype for '__ucmpdi2' warning
parisc: firmware: Mark pdc_result buffers local
parisc: firmware: Fix sparse context imbalance warnings
parisc: signal: Fix sparse incorrect type in assignment warning
parisc: ioremap: Fix sparse warnings
parisc: fault: Use C99 arrary initializers
parisc: pdt: Use PTR_ERR_OR_ZERO() to simplify code
parisc: Fix lightweight spinlock checks to not break futexes
Pull cpuidle psci fixes from Ulf Hansson:
"A couple of cpuidle-psci fixes. Usually, this is managed by arm-soc
maintainers or Rafael, although due to a busy period I have stepped in
to help out:
- Fix the error path to prevent reverting from OSI back to PC mode"
* tag 'cpuidle-psci-v6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
cpuidle: psci: Move enabling OSI mode after power domains creation
cpuidle: dt_idle_genpd: Add helper function to remove genpd topology
Pull drm fixes from Dave Airlie:
"This week's fixes, as expected amdgpu is probably a little larger
since it skipped a week, but otherwise a few nouveau fixes, a couple
of bridge, rockchip and ivpu fixes.
amdgpu:
- S/G display workaround for platforms with >= 64G of memory
- S0i3 fix
- SMU 13.0.0 fixes
- Disable SMU 13.x OD features temporarily while the interface is
reworked to enable additional functionality
- Fix cursor gamma issues on DCN3+
- SMU 13.0.6 fixes
- Fix possible UAF in CS IOCTL
- Polaris display regression fix
- Only enable CP GFX shadowing on SR-IOV
amdkfd:
- Raven/Picasso KFD regression fix
bridge:
- it6505: runtime PM fix
- lt9611: revert Do not generate HFP/HBP/HSA and EOT packet
nouveau:
- enable global memory loads for helper invocations for userspace
driver
- dp 1.3 dpcd+ workaround fix
- remove unused function
- revert incorrect NULL check
accel/ivpu:
- Add set_pages_array_wc/uc for internal buffers
rockchip:
- Don't spam logs in atomic check"
* tag 'drm-fixes-2023-08-11' of git://anongit.freedesktop.org/drm/drm: (23 commits)
drm/shmem-helper: Reset vma->vm_ops before calling dma_buf_mmap()
drm/amdkfd: disable IOMMUv2 support for Raven
drm/amdkfd: disable IOMMUv2 support for KV/CZ
drm/amdkfd: ignore crat by default
drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV
drm/amd/display: Fix a regression on Polaris cards
drm/amdgpu: fix possible UAF in amdgpu_cs_pass1()
drm/amd/pm: Fix SMU v13.0.6 energy reporting
drm/amd/display: check attr flag before set cursor degamma on DCN3+
drm/amd/pm: disable the SMU13 OD feature support temporarily
drm/amd/pm: correct the pcie width for smu 13.0.0
drm/amd/display: Don't show stack trace for missing eDP
drm/amdgpu: Match against exact bootloader status
drm/amd/pm: skip the RLC stop when S0i3 suspend for SMU v13.0.4/11
drm/amd: Disable S/G for APUs when 64GB or more host memory
drm/rockchip: Don't spam logs in atomic check
accel/ivpu: Add set_pages_array_wc/uc for internal buffers
drm/nouveau/disp: Revert a NULL check inside nouveau_connector_get_modes
Revert "drm/bridge: lt9611: Do not generate HFP/HBP/HSA and EOT packet"
drm/nouveau: remove unused tu102_gr_load() function
...
Now nvme_ns_chr_uring_cmd_iopoll() has switched to request based io
polling, and the associated NS is guaranteed to be live in case of
io polling, so request is guaranteed to be valid because blk-mq uses
pre-allocated request pool.
Remove the rcu read lock in nvme_ns_chr_uring_cmd_iopoll(), which
isn't needed any more after switching to request based io polling.
Fix "BUG: sleeping function called from invalid context" because
set_page_dirty_lock() from blk_rq_unmap_user() may sleep.
Fixes: 585079b6e4 ("nvme: wire up async polling for io passthrough commands")
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Cc: Kanchan Joshi <joshi.k@samsung.com>
Cc: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Guangwu Zhang <guazhang@redhat.com>
Link: https://lore.kernel.org/r/20230809020440.174682-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The reference of pdev->dev is taken by of_find_device_by_node, so
it should be released when not need anymore.
Fixes: 7dc54d3b8d ("net: pcs: add Renesas MII converter driver")
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 25266128fe ("virtio-net: fix race between set queues and
probe") tries to fix the race between set queues and probe by calling
_virtnet_set_queues() before DRIVER_OK is set. This violates virtio
spec. Fixing this by setting queues after virtio_device_ready().
Note that rtnl needs to be held for userspace requests to change the
number of queues. So we are serialized in this way.
Fixes: 25266128fe ("virtio-net: fix race between set queues and probe")
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With deferred close we can have closes that race with lease breaks,
and so with the current checks for whether to send the lease response,
oplock_response(), this can mean that an unmount (kill_sb) can occur
just before we were checking if the tcon->ses is valid. See below:
[Fri Aug 4 04:12:50 2023] RIP: 0010:cifs_oplock_break+0x1f7/0x5b0 [cifs]
[Fri Aug 4 04:12:50 2023] Code: 7d a8 48 8b 7d c0 c0 e9 02 48 89 45 b8 41 89 cf e8 3e f5 ff ff 4c 89 f7 41 83 e7 01 e8 82 b3 03 f2 49 8b 45 50 48 85 c0 74 5e <48> 83 78 60 00 74 57 45 84 ff 75 52 48 8b 43 98 48 83 eb 68 48 39
[Fri Aug 4 04:12:50 2023] RSP: 0018:ffffb30607ddbdf8 EFLAGS: 00010206
[Fri Aug 4 04:12:50 2023] RAX: 632d223d32612022 RBX: ffff97136944b1e0 RCX: 0000000080100009
[Fri Aug 4 04:12:50 2023] RDX: 0000000000000001 RSI: 0000000080100009 RDI: ffff97136944b188
[Fri Aug 4 04:12:50 2023] RBP: ffffb30607ddbe58 R08: 0000000000000001 R09: ffffffffc08e0900
[Fri Aug 4 04:12:50 2023] R10: 0000000000000001 R11: 000000000000000f R12: ffff97136944b138
[Fri Aug 4 04:12:50 2023] R13: ffff97149147c000 R14: ffff97136944b188 R15: 0000000000000000
[Fri Aug 4 04:12:50 2023] FS: 0000000000000000(0000) GS:ffff9714f7c00000(0000) knlGS:0000000000000000
[Fri Aug 4 04:12:50 2023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Aug 4 04:12:50 2023] CR2: 00007fd8de9c7590 CR3: 000000011228e000 CR4: 0000000000350ef0
[Fri Aug 4 04:12:50 2023] Call Trace:
[Fri Aug 4 04:12:50 2023] <TASK>
[Fri Aug 4 04:12:50 2023] process_one_work+0x225/0x3d0
[Fri Aug 4 04:12:50 2023] worker_thread+0x4d/0x3e0
[Fri Aug 4 04:12:50 2023] ? process_one_work+0x3d0/0x3d0
[Fri Aug 4 04:12:50 2023] kthread+0x12a/0x150
[Fri Aug 4 04:12:50 2023] ? set_kthread_struct+0x50/0x50
[Fri Aug 4 04:12:50 2023] ret_from_fork+0x22/0x30
[Fri Aug 4 04:12:50 2023] </TASK>
To fix this change the ordering of the checks before sending the oplock_response
to first check if the openFileList is empty.
Fixes: da787d5b74 ("SMB3: Do not send lease break acknowledgment if all file handles have been closed")
Suggested-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Cross merge x86 fixes to fix clang linking errors:
ld.lld: error: ./arch/x86/kernel/vmlinux.lds:221: at least one side of the expression must be absolute
These will hopefully be downstream by the time we ship
the next batch of fixes.
* 'x86/bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Move gds_ucode_mitigated() declaration to header
x86/speculation: Add cpu_show_gds() prototype
driver core: cpu: Make cpu_show_not_affected() static
x86/srso: Fix build breakage with the LLVM linker
Documentation/srso: Document IBPB aspect and fix formatting
driver core: cpu: Unify redundant silly stubs
Documentation/hw-vuln: Unify filename specification in index
Link: https://lore.kernel.org/all/CAHk-=wj_b+FGTnevQSBAtCWuhCk=0oQ_THvthBW2hzqpOTLFmg@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we repeatedly fail to fake offline memory to unplug it, we won't be
sending any unplug requests to the device. However, we only check if the
config changed when sending such (un)plug requests.
We could end up trying for a long time to unplug memory, even though
the config changed already and we're not supposed to unplug memory
anymore. For example, the hypervisor might detect a low-memory situation
while unplugging memory and decide to replug some memory. Continuing
trying to unplug memory in that case can be problematic.
So let's check on a more regular basis.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230713145551.2824980-5-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In case offline_and_remove_memory() fails in SBM, we leave a completely
unplugged Linux memory block stick around until we try plugging memory
again. We won't try removing that memory block again.
offline_and_remove_memory() may, for example, fail if we're racing with
another alloc_contig_range() user, if allocating temporary memory fails,
or if some memory notifier rejected the offlining request.
Let's handle that case better, by simple retrying to offline and remove
such memory.
Tested using CONFIG_MEMORY_NOTIFIER_ERROR_INJECT.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230713145551.2824980-4-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Just like we do with alloc_contig_range(), let's convert all unknown
errors to -EBUSY, but WARN so we can look into the issue. For example,
offline_pages() could fail with -EINTR, which would be unexpected in our
case.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230713145551.2824980-3-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When "unsafe unplug" is enabled, we don't fake-offline all memory ahead of
actual memory offlining using alloc_contig_range(). Instead, we rely on
offline_pages() to also perform actual page migration, which might fail
or take a very long time.
In that case, it's possible to easily run into endless loops that cannot be
aborted anymore (as offlining is triggered by a workqueue then): For
example, a single (accidentally) permanently unmovable page in
ZONE_MOVABLE results in an endless loop. For ZONE_NORMAL, races between
isolating the pageblock (and checking for unmovable pages) and
concurrent page allocation are possible and similarly result in endless
loops.
The idea of the unsafe unplug mode was to make it possible to more
reliably unplug large memory blocks. However, (a) we really should be
tackling that differently, by extending the alloc_contig_range()-based
mechanism; and (b) this mode is not the default and as far as I know,
it's unused either way.
So let's simply get rid of it.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230713145551.2824980-2-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We were allocating irq vectors at the time the aux dev was probed,
but that is before the PCI VF is assigned to a separate iommu domain
by vhost_vdpa. Because vhost_vdpa later changes the iommu domain the
interrupts do not work.
Instead, we can allocate the irq vectors later when we see DRIVER_OK and
know that the reassignment of the PCI VF to an iommu domain has already
happened.
Fixes: 151cc834f3 ("pds_vdpa: add support for vdpa and vdpamgmt interfaces")
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230711042437.69381-5-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Our driver sets a mac if the HW is 00:..:00 so we need to be sure to
advertise VIRTIO_NET_F_MAC even if the HW doesn't. We also need to be
sure that virtio_net sees the VIRTIO_NET_F_MAC and doesn't rewrite the
mac address that a user may have set with the vdpa utility.
After reading the hw_feature bits, add the VIRTIO_NET_F_MAC to the driver's
supported_features and use that for reporting what is available. If the
HW is not advertising it, be sure to strip the VIRTIO_NET_F_MAC before
finishing the feature negotiation. If the user specifies a device_features
bitpattern in the vdpa utility without the VIRTIO_NET_F_MAC set, then
don't set the mac.
Fixes: 151cc834f3 ("pds_vdpa: add support for vdpa and vdpamgmt interfaces")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230711042437.69381-3-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Kernel uses `struct virtio_net_ctrl_rss` to save command-specific-data
for both the VIRTIO_NET_CTRL_MQ_HASH_CONFIG and
VIRTIO_NET_CTRL_MQ_RSS_CONFIG commands.
According to the VirtIO standard, "Field reserved MUST contain zeroes.
It is defined to make the structure to match the layout of
virtio_net_rss_config structure, defined in 5.1.6.5.7.".
Yet for the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command case, the `max_tx_vq`
field in struct virtio_net_ctrl_rss, which corresponds to the
`reserved` field in struct virtio_net_hash_config, is not zeroed,
thereby violating the VirtIO standard.
This patch solves this problem by zeroing this field in
virtnet_init_default_rss().
Cc: Andrew Melnychenko <andrew@daynix.com>
Cc: stable@vger.kernel.org
Fixes: c7114b1249 ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20230810110405.25558-1-yin31149@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and bpf.
Still trending up in size but the good news is that the "current"
regressions are resolved, AFAIK.
We're getting weirdly many fixes for Wake-on-LAN and suspend/resume
handling on embedded this week (most not merged yet), not sure why.
But those are all for older bugs.
Current release - regressions:
- tls: set MSG_SPLICE_PAGES consistently when handing encrypted data
over to TCP
Current release - new code bugs:
- eth: mlx5: correct IDs on VFs internal to the device (IPU)
Previous releases - regressions:
- phy: at803x: fix WoL support / reporting on AR8032
- bonding: fix incorrect deletion of ETH_P_8021AD protocol VID from
slaves, leading to BUG_ON()
- tun: prevent tun_build_skb() from exceeding the packet size limit
- wifi: rtw89: fix 8852AE disconnection caused by RX full flags
- eth/PCI: enetc: fix probing after 6fffbc7ae1 ("PCI: Honor
firmware's device disabled status"), keep PCI devices around even
if they are disabled / not going to be probed to be able to apply
quirks on them
- eth: prestera: fix handling IPv4 routes with nexthop IDs
Previous releases - always broken:
- netfilter: re-work garbage collection to avoid races between
user-facing API and timeouts
- tunnels: fix generating ipv4 PMTU error on non-linear skbs
- nexthop: fix infinite nexthop bucket dump when using maximum
nexthop ID
- wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems()
Misc:
- unix: use consistent error code in SO_PEERPIDFD
- ipv6: adjust ndisc_is_useropt() to include PREFIX_INFO, in prep for
upcoming IETF RFC"
* tag 'net-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
net: hns3: fix strscpy causing content truncation issue
net: tls: set MSG_SPLICE_PAGES consistently
ibmvnic: Ensure login failure recovery is safe from other resets
ibmvnic: Do partial reset on login failure
ibmvnic: Handle DMA unmapping of login buffs in release functions
ibmvnic: Unmap DMA login rsp buffer on send login fail
ibmvnic: Enforce stronger sanity checks on login response
net: mana: Fix MANA VF unload when hardware is unresponsive
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nf_tables: GC transaction API to avoid race with control plane
selftests/bpf: Add sockmap test for redirecting partial skb data
selftests/bpf: fix a CI failure caused by vsock sockmap test
bpf, sockmap: Fix bug that strp_done cannot be called
bpf, sockmap: Fix map type error in sock_map_del_link
xsk: fix refcount underflow in error path
ipv6: adjust ndisc_is_useropt() to also return true for PIO
selftests: forwarding: bridge_mdb: Make test more robust
selftests: forwarding: bridge_mdb_max: Fix failing test with old libnet
...
mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after
the control virtqueue ASID iotlb has been populated. The control vq
iotlb must not be cleared, since it will not be populated again.
So call the ASID aware destroy function which makes sure that the
right vq resource is destroyed.
Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The mr->initialized flag is shared between the control vq and data vq
part of the mr init/uninit. But if the control vq and data vq get placed
in different ASIDs, it can happen that initializing the control vq will
prevent the data vq mr from being initialized.
This patch consolidates the control and data vq init parts into their
own init functions. The mr->initialized will now be used for the data vq
only. The control vq currently doesn't need a flag.
The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
split into data and control vq functions which are now also ASID aware.
Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa max vqp attr to avoid
such bugs.
Fixes: ad69dd0bf2 ("vdpa: Introduce query of device config layout")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-7-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa queue index attr to avoid
such bugs.
Fixes: 13b00b1356 ("vdpa: Add support for querying vendor statistics")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernelorg
Message-Id: <20230727175757.73988-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa features attr to avoid
such bugs.
Fixes: 90fea5a800 ("vdpa: device feature provisioning")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The 'is_legacy' flag is used to differentiate between legacy vs modern
device. Currently, it is based on the value of vp_dev->ldev.ioaddr.
However, due to the shared memory of the union between struct
virtio_pci_legacy_device and struct virtio_pci_modern_device, when
virtio_pci_modern_probe modifies the content of struct
virtio_pci_modern_device, it affects the content of struct
virtio_pci_legacy_device, and ldev.ioaddr is no longer zero, causing
the 'is_legacy' flag to be set as true. To resolve issue, when legacy
device is probed, mark 'is_legacy' as true, when modern device is
probed, keep 'is_legacy' as false.
Fixes: 4f0fc22534 ("virtio_pci: Optimize virtio_pci_device structure size")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20230719154550.79536-1-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The linux block layer requires bios/requests to have lengths with a 512
byte alignment. Some drivers/layers like dm-crypt and the directi IO code
will test for it and just fail. Other drivers like SCSI just assume the
requirement is met and will end up in infinte retry loops. The problem
for drivers like SCSI is that it uses functions like blk_rq_cur_sectors
and blk_rq_sectors which divide the request's length by 512. If there's
lefovers then it just gets dropped. But other code in the block/scsi
layer may use blk_rq_bytes/blk_rq_cur_bytes and end up thinking there is
still data left and try to retry the cmd. We can then end up getting
stuck in retry loops where part of the block/scsi thinks there is data
left, but other parts think we want to do IOs of zero length.
Linux will always check for alignment, but windows will not. When
vhost-scsi then translates the iovec it gets from a windows guest to a
scatterlist, we can end up with sg items where the sg->length is not
divisible by 512 due to the misaligned offset:
sg[0].offset = 255;
sg[0].length = 3841;
sg...
sg[N].offset = 0;
sg[N].length = 255;
When the lio backends then convert the SG to bios or other iovecs, we
end up sending them with the same misaligned values and can hit the
issues above.
This just has us drop down to allocating a temp page and copying the data
when we detect a misaligned buffer and the IO is large enough that it
will get split into multiple bad IOs.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230709202859.138387-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
vm_dev has a separate lifecycle because it has a 'struct device'
embedded. Thus, having a release callback for it is correct.
Allocating the vm_dev struct with devres totally breaks this protection,
though. Instead of waiting for the vm_dev release callback, the memory
is freed when the platform_device is removed. Resulting in a
use-after-free when finally the callback is to be called.
To easily see the problem, compile the kernel with
CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs.
The fix is easy, don't use devres in this case.
Found during my research about object lifetime problems.
Fixes: 7eb781b1bb ("virtio_mmio: add cleanup for virtio_mmio_probe")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
hns3_dbg_fill_content()/hclge_dbg_fill_content() is aim to integrate some
items to a string for content, and we add '\n' and '\0' in the last
two bytes of content.
strscpy() will add '\0' in the last byte of destination buffer(one of
items), it result in finishing content print ahead of schedule and some
dump content truncation.
One Error log shows as below:
cat mac_list/uc
UC MAC_LIST:
Expected:
UC MAC_LIST:
FUNC_ID MAC_ADDR STATE
pf 00:2b:19:05:03:00 ACTIVE
The destination buffer is length-bounded and not required to be
NUL-terminated, so just change strscpy() to memcpy() to fix it.
Fixes: 1cf3d5567f ("net: hns3: fix strncpy() not using dest-buf length as length issue")
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20230809020902.1941471-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We used to change the flags for the last segment, because
non-last segments had the MSG_SENDPAGE_NOTLAST flag set.
That flag is no longer a thing so remove the setting.
Since flags most likely don't have MSG_SPLICE_PAGES set
this avoids passing parts of the sg as splice and parts
as non-splice. Before commit under Fixes we'd have called
tcp_sendpage() which would add the MSG_SPLICE_PAGES.
Why this leads to trouble remains unclear but Tariq
reports hitting the WARN_ON(!sendpage_ok()) due to
page refcount of 0.
Fixes: e117dcfd64 ("tls: Inline do_tcp_sendpages()")
Reported-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/all/4c49176f-147a-4283-f1b1-32aac7b4b996@gmail.com/
Tested-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230808180917.1243540-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull dmaengine fixes from Vinod Koul:
- HAS_IOMEM fixes for fsl edma and intel idma
- return-value fix, interrupt vector setting and typo fix for xilinx
xdma
- email updates for codeaurora email domain move
- correct pause status for pl330 driver
- idxd clear flag on disable fix
- function documentation fix for owl dma
- potential un-allocated memory fix for mcf driver
* tag 'dmaengine-fix-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: xilinx: xdma: Fix typo
dmaengine: xilinx: xdma: Fix interrupt vector setting
dmaengine: owl-dma: Modify mismatched function name
dmaengine: idxd: Clear PRS disable flag when disabling IDXD device
dmaengine: pl330: Return DMA_PAUSED when transaction is paused
dmaengine: qcom_hidma: Update codeaurora email domain
dmaengine: mcf-edma: Fix a potential un-allocated memory access
dmaengine: xilinx: xdma: Fix Judgment of the return value
idmaengine: make FSL_EDMA and INTEL_IDMA64 depends on HAS_IOMEM
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The existing attempt to resolve races between control plane and GC work
is error prone, as reported by Bien Pham <phamnnb@sea.com>, some places
forgot to call nft_set_elem_mark_busy(), leading to double-deactivation
of elements.
This series contains the following patches:
1) Do not skip expired elements during walk otherwise elements might
never decrement the reference counter on data, leading to memleak.
2) Add a GC transaction API to replace the former attempt to deal with
races between control plane and GC. GC worker sets on NFT_SET_ELEM_DEAD_BIT
on elements and it creates a GC transaction to remove the expired
elements, GC transaction could abort in case of interference with
control plane and retried later (GC async). Set backends such as
rbtree and pipapo also perform GC from control plane (GC sync), in
such case, element deactivation and removal is safe because mutex
is held then collected elements are released via call_rcu().
3) Adapt existing set backends to use the GC transaction API.
4) Update rhash set backend to set on _DEAD bit to report deleted
elements from datapath for GC.
5) Remove old GC batch API and the NFT_SET_ELEM_BUSY_BIT.
* tag 'nf-23-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: don't skip expired elements during walk
====================
Link: https://lore.kernel.org/r/20230810070830.24064-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin KaFai Lau says:
====================
pull-request: bpf 2023-08-09
We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 6 files changed, 102 insertions(+), 8 deletions(-).
The main changes are:
1) A bpf sockmap memleak fix and a fix in accessing the programs of
a sockmap under the incorrect map type from Xu Kuohai.
2) A refcount underflow fix in xsk from Magnus Karlsson.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add sockmap test for redirecting partial skb data
selftests/bpf: fix a CI failure caused by vsock sockmap test
bpf, sockmap: Fix bug that strp_done cannot be called
bpf, sockmap: Fix map type error in sock_map_del_link
xsk: fix refcount underflow in error path
====================
Link: https://lore.kernel.org/r/20230810055303.120917-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a login request fails, the recovery process should be protected
against parallel resets. It is a known issue that freeing and
registering CRQ's in quick succession can result in a failover CRQ from
the VIOS. Processing a failover during login recovery is dangerous for
two reasons:
1. This will result in two parallel initialization processes, this can
cause serious issues during login.
2. It is possible that the failover CRQ is received but never executed.
We get notified of a pending failover through a transport event CRQ.
The reset is not performed until a INIT CRQ request is received.
Previously, if CRQ init fails during login recovery, then the ibmvnic
irq is freed and the login process returned error. If failover_pending
is true (a transport event was received), then the ibmvnic device
would never be able to process the reset since it cannot receive the
CRQ_INIT request due to the irq being freed. This leaved the device
in a inoperable state.
Therefore, the login failure recovery process must be hardened against
these possible issues. Possible failovers (due to quick CRQ free and
init) must be avoided and any issues during re-initialization should be
dealt with instead of being propagated up the stack. This logic is
similar to that of ibmvnic_probe().
Fixes: dff515a3e7 ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-5-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Perform a partial reset before sending a login request if any of the
following are true:
1. If a previous request times out. This can be dangerous because the
VIOS could still receive the old login request at any point after
the timeout. Therefore, it is best to re-register the CRQ's and
sub-CRQ's before retrying.
2. If the previous request returns an error that is not described in
PAPR. PAPR provides procedures if the login returns with partial
success or aborted return codes (section L.5.1) but other values
do not have a defined procedure. Previously, these conditions
just returned error from the login function rather than trying
to resolve the issue.
This can cause further issues since most callers of the login
function are not prepared to handle an error when logging in. This
improper cleanup can lead to the device being permanently DOWN'd.
For example, if the VIOS believes that the device is already logged
in then it will return INVALID_STATE (-7). If we never re-register
CRQ's then it will always think that the device is already logged
in. This leaves the device inoperable.
The partial reset involves freeing the sub-CRQs, freeing the CRQ then
registering and initializing a new CRQ and sub-CRQs. This essentially
restarts all communication with VIOS to allow for a fresh login attempt
that will be unhindered by any previous failed attempts.
Fixes: dff515a3e7 ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-4-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rather than leaving the DMA unmapping of the login buffers to the
login response handler, move this work into the login release functions.
Previously, these functions were only used for freeing the allocated
buffers. This could lead to issues if there are more than one
outstanding login buffer requests, which is possible if a login request
times out.
If a login request times out, then there is another call to send login.
The send login function makes a call to the login buffer release
function. In the past, this freed the buffers but did not DMA unmap.
Therefore, the VIOS could still write to the old login (now freed)
buffer. It is for this reason that it is a good idea to leave the DMA
unmap call to the login buffers release function.
Since the login buffer release functions now handle DMA unmapping,
remove the duplicate DMA unmapping in handle_login_rsp().
Fixes: dff515a3e7 ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-3-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ensure that all offsets in a login response buffer are within the size
of the allocated response buffer. Any offsets or lengths that surpass
the allocation are likely the result of an incomplete response buffer.
In these cases, a full reset is necessary.
When attempting to login, the ibmvnic device will allocate a response
buffer and pass a reference to the VIOS. The VIOS will then send the
ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled
with data. If the ibmvnic device does not get a response in 20 seconds,
the old buffer is freed and a new login request is sent. With 2
outstanding requests, any LOGIN_RSP CRQ's could be for the older
login request. If this is the case then the login response buffer (which
is for the newer login request) could be incomplete and contain invalid
data. Therefore, we must enforce strict sanity checks on the response
buffer values.
Testing has shown that the `off_rxadd_buff_size` value is filled in last
by the VIOS and will be the smoking gun for these circumstances.
Until VIOS can implement a mechanism for tracking outstanding response
buffers and a method for mapping a LOGIN_RSP CRQ to a particular login
response buffer, the best ibmvnic can do in this situation is perform a
full reset.
Fixes: dff515a3e7 ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When unloading the MANA driver, mana_dealloc_queues() waits for the MANA
hardware to complete any inflight packets and set the pending send count
to zero. But if the hardware has failed, mana_dealloc_queues()
could wait forever.
Fix this by adding a timeout to the wait. Set the timeout to 120 seconds,
which is a somewhat arbitrary value that is more than long enough for
functional hardware to complete any sends.
Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Link: https://lore.kernel.org/r/1691576525-24271-1-git-send-email-schakrabarti@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RISC-V kernel needs a sfence.vma after a page table modification: we
used to rely on the vmalloc fault handling to emit an sfence.vma, but
commit 7d3332be01 ("riscv: mm: Pre-allocate PGD entries for
vmalloc/modules area") got rid of this path for 64-bit kernels, so now we
need to explicitly emit a sfence.vma in flush_cache_vmap().
Note that we don't need to implement flush_cache_vunmap() as the generic
code should emit a flush tlb after unmapping a vmalloc region.
Fixes: 7d3332be01 ("riscv: mm: Pre-allocate PGD entries for vmalloc/modules area")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230725132246.817726-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The vmalloc_fault() path was removed and to avoid syncing the vmalloc PGD
mappings, they are now preallocated. But if the kernel can use a PUD
mapping (which in sv39 is actually a PGD mapping) for large vmalloc
allocation, it will free the current unused preallocated PGD mapping and
install a new leaf one. Since there is no sync anymore, some page tables
lack this new mapping and that triggers a panic.
So only allow PUD mappings for sv48 and sv57.
Fixes: 7d3332be01 ("riscv: mm: Pre-allocate PGD entries for vmalloc/modules area")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230808130709.1502614-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Return PTR_ERR_OR_ZERO() instead of return 0 or PTR_ERR() to
simplify code.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Helge Deller <deller@gmx.de>
The lightweight spinlock checks verify that a spinlock has either value
0 (spinlock locked) and that not any other bits than in
__ARCH_SPIN_LOCK_UNLOCKED_VAL is set.
This breaks the current LWS code, which writes the address of the lock
into the lock word to unlock it, which was an optimization to save one
assembler instruction.
Fix it by making spinlock_types.h accessible for asm code, change the
LWS spinlock-unlocking code to write __ARCH_SPIN_LOCK_UNLOCKED_VAL into
the lock word, and add some missing lightweight spinlock checks to the
LWS path. Finally, make the spinlock checks dependend on DEBUG_KERNEL.
Noticed-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v6.4+
Fixes: 15e64ef652 ("parisc: Add lightweight spinlock checks")
We set cache_block_group_error if btrfs_cache_block_group() returns an
error, this is because we could end up not finding space to allocate and
mistakenly return -ENOSPC, and which could then abort the transaction
with the incorrect errno, and in the case of ENOSPC result in a
WARN_ON() that will trip up tests like generic/475.
However there's the case where multiple threads can be racing, one
thread gets the proper error, and the other thread doesn't actually call
btrfs_cache_block_group(), it instead sees ->cached ==
BTRFS_CACHE_ERROR. Again the result is the same, we fail to allocate
our space and return -ENOSPC. Instead we need to set
cache_block_group_error to -EIO in this case to make sure that if we do
not make our allocation we get the appropriate error returned back to
the caller.
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Syzbot reported a crash that an ASSERT() got triggered inside
prepare_to_merge().
That ASSERT() makes sure the reloc tree is properly pointed back by its
subvolume tree.
[CAUSE]
After more debugging output, it turns out we had an invalid reloc tree:
BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17
Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM,
QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree.
But reloc trees can only exist for subvolumes, as for non-subvolume
trees, we just COW the involved tree block, no need to create a reloc
tree since those tree blocks won't be shared with other trees.
Only subvolumes tree can share tree blocks with other trees (thus they
have BTRFS_ROOT_SHAREABLE flag).
Thus this new debug output proves my previous assumption that corrupted
on-disk data can trigger that ASSERT().
[FIX]
Besides the dedicated fix and the graceful exit, also let tree-checker to
check such root keys, to make sure reloc trees can only exist for subvolumes.
CC: stable@vger.kernel.org # 5.15+
Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Syzbot reported a crash that an ASSERT() got triggered inside
prepare_to_merge().
[CAUSE]
The root cause of the triggered ASSERT() is we can have a race between
quota tree creation and relocation.
This leads us to create a duplicated quota tree in the
btrfs_read_fs_root() path, and since it's treated as fs tree, it would
have ROOT_SHAREABLE flag, causing us to create a reloc tree for it.
The bug itself is fixed by a dedicated patch for it, but this already
taught us the ASSERT() is not something straightforward for
developers.
[ENHANCEMENT]
Instead of using an ASSERT(), let's handle it gracefully and output
extra info about the mismatch reloc roots to help debug.
Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside
merge_reloc_roots() later.
Also replace those ASSERT(0)s with WARN_ON()s.
CC: stable@vger.kernel.org # 5.15+
Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Syzbot reported a weird ASSERT() triggered inside prepare_to_merge().
assertion failed: root->reloc_root == reloc_root, in fs/btrfs/relocation.c:1919
------------[ cut here ]------------
kernel BUG at fs/btrfs/relocation.c:1919!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9904 Comm: syz-executor.3 Not tainted
6.4.0-syzkaller-08881-g533925cb7604 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 05/27/2023
RIP: 0010:prepare_to_merge+0xbb2/0xc40 fs/btrfs/relocation.c:1919
Code: fe e9 f5 (...)
RSP: 0018:ffffc9000325f760 EFLAGS: 00010246
RAX: 000000000000004f RBX: ffff888075644030 RCX: 1481ccc522da5800
RDX: ffffc90005c09000 RSI: 00000000000364ca RDI: 00000000000364cb
RBP: ffffc9000325f870 R08: ffffffff816f33ac R09: 1ffff9200064bea0
R10: dffffc0000000000 R11: fffff5200064bea1 R12: ffff888075644000
R13: ffff88803b166000 R14: ffff88803b166560 R15: ffff88803b166558
FS: 00007f4e305fd700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056080679c000 CR3: 00000000193ad000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
relocate_block_group+0xa5d/0xcd0 fs/btrfs/relocation.c:3749
btrfs_relocate_block_group+0x7ab/0xd70 fs/btrfs/relocation.c:4087
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3283
__btrfs_balance+0x1b06/0x2690 fs/btrfs/volumes.c:4018
btrfs_balance+0xbdb/0x1120 fs/btrfs/volumes.c:4402
btrfs_ioctl_balance+0x496/0x7c0 fs/btrfs/ioctl.c:3604
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f4e2f88c389
[CAUSE]
With extra debugging, the offending reloc_root is for quota tree (rootid 8).
Normally we should not use the reloc tree for quota root at all, as reloc
trees are only for subvolume trees.
But there is a race between quota enabling and relocation, this happens
after commit 85724171b3 ("btrfs: fix the btrfs_get_global_root return value").
Before that commit, for quota and free space tree, we exit immediately
if we cannot grab it from fs_info.
But now we would try to read it from disk, just as if they are fs trees,
this sets ROOT_SHAREABLE flags in such race:
Thread A | Thread B
---------------------------------+------------------------------
btrfs_quota_enable() |
| | btrfs_get_root_ref()
| | |- btrfs_get_global_root()
| | | Returned NULL
| | |- btrfs_lookup_fs_root()
| | | Returned NULL
|- btrfs_create_tree() | |
| Now quota root item is | |
| inserted | |- btrfs_read_tree_root()
| | | Got the newly inserted quota root
| | |- btrfs_init_fs_root()
| | | Set ROOT_SHAREABLE flag
[FIX]
Get back to the old behavior by returning PTR_ERR(-ENOENT) if the target
objectid is not a subvolume tree or data reloc tree.
Reported-and-tested-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Fixes: 85724171b3 ("btrfs: fix the btrfs_get_global_root return value")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When the call to btrfs_reloc_clone_csums in cow_file_range returns an
error, we jump to the out_unlock label with the extent_reserved variable
set to false. The cleanup at the label will then call
extent_clear_unlock_delalloc on the range from start to end. But we've
already added cur_alloc_size to start before the jump, so there might no
range be left from the newly incremented start to end. Move the check for
'start < end' so that it is reached by also for the !extent_reserved case.
CC: stable@vger.kernel.org # 6.1+
Fixes: a315e68f6e ("Btrfs: fix invalid attempt to free reserved space on failure to cow range")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
__extent_writepage could have started on more pages than the one it was
called for. This happens regularly for zoned file systems, and in theory
could happen for compressed I/O if the worker thread was executed very
quickly. For such pages extent_write_cache_pages waits for writeback
to complete before moving on to the next page, which is highly inefficient
as it blocks the flusher thread.
Port over the PageDirty check that was added to write_cache_pages in
commit 515f4a037f ("mm: write_cache_pages optimise page cleaning") to
fix this.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
extent_write_cache_pages stops writing pages as soon as nr_to_write hits
zero. That is the right thing for opportunistic writeback, but incorrect
for data integrity writeback, which needs to ensure that no dirty pages
are left in the range. Thus only stop the writeback for WB_SYNC_NONE
if nr_to_write hits 0.
This is a port of write_cache_pages changes in commit 05fe478dd0
("mm: write_cache_pages integrity fix").
Note that I've only trigger the problem with other changes to the btrfs
writeback code, but this condition seems worthwhile fixing anyway.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ updated comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
The PCSpecialist Elimina Pro 16 M laptop model is a Zen laptop which
needs to use the MADT IRQ settings override and which does not have
an INT_SRC_OVR entry for IRQ 1 in its MADT.
So this model needs a DMI quirk to enable the MADT IRQ settings override
to fix its keyboard not working.
Fixes: a9c4a912b7 ("ACPI: resource: Remove "Zen" specific match and quirks")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217394#c18
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Recently we've been having mysterious hangs while running generic/475 on
the CI system. This turned out to be something like this:
Task 1
dmsetup suspend --nolockfs
-> __dm_suspend
-> dm_wait_for_completion
-> dm_wait_for_bios_completion
-> Unable to complete because of IO's on a plug in Task 2
Task 2
wb_workfn
-> wb_writeback
-> blk_start_plug
-> writeback_sb_inodes
-> Infinite loop unable to make an allocation
Task 3
cache_block_group
->read_extent_buffer_pages
->Waiting for IO to complete that can't be submitted because Task 1
suspended the DM device
The problem here is that we need Task 2 to be scheduled completely for
the blk plug to flush. Normally this would happen, we normally wait for
the block group caching to finish (Task 3), and this schedule would
result in the block plug flushing.
However if there's enough free space available from the current caching
to satisfy the allocation we won't actually wait for the caching to
complete. This check however just checks that we have enough space, not
that we can make the allocation. In this particular case we were trying
to allocate 9MiB, and we had 10MiB of free space, but we didn't have
9MiB of contiguous space to allocate, and thus the allocation failed and
we looped.
We specifically don't cycle through the FFE loop until we stop finding
cached block groups because we don't want to allocate new block groups
just because we're caching, so we short circuit the normal loop once we
hit LOOP_CACHING_WAIT and we found a caching block group.
This is normally fine, except in this particular case where the caching
thread can't make progress because the DM device has been suspended.
Fix this by not only waiting for free space to >= the amount of space we
want to allocate, but also that we make some progress in caching from
the time we start waiting. This will keep us from busy looping when the
caching is taking a while but still theoretically has enough space for
us to allocate from, and fixes this particular case by forcing us to
actually sleep and wait for forward progress, which will flush the plug.
With this fix we're no longer hanging with generic/475.
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
SA8775 and newer target have added support for an increased number of
interrupt targets. To implement this change, the intr_target field, which
is used to configure the interrupt target in the interrupt configuration
register is increased from 3 bits to 4 bits.
In accordance to these updates, a new intr_target_width member is
introduced in msm_pingroup structure. This member stores the value of
width of intr_target field in the interrupt configuration register. This
value is used to dynamically calculate and generate mask for setting the
intr_target field. By default, this mask is set to 3 bit wide, to ensure
backward compatibility with the older targets.
Fixes: 4b6b185599 ("pinctrl: qcom: add the tlmm driver sa8775p platforms")
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8775p-ride
Signed-off-by: Ninad Naik <quic_ninanaik@quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/20230809100634.3961-1-quic_ninanaik@quicinc.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Set on the NFT_SET_ELEM_DEAD_BIT flag on this element, instead of
performing element removal which might race with an ongoing transaction.
Enable gc when dynamic flag is set on since dynset deletion requires
garbage collection after this patch.
Fixes: d0a8d877da ("netfilter: nft_dynset: support for element deletion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use the GC transaction API to replace the old and buggy gc API and the
busy mark approach.
No set elements are removed from async garbage collection anymore,
instead the _DEAD bit is set on so the set element is not visible from
lookup path anymore. Async GC enqueues transaction work that might be
aborted and retried later.
rbtree and pipapo set backends does not set on the _DEAD bit from the
sync GC path since this runs in control plane path where mutex is held.
In this case, set elements are deactivated, removed and then released
via RCU callback, sync GC never fails.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The set types rhashtable and rbtree use a GC worker to reclaim memory.
From system work queue, in periodic intervals, a scan of the table is
done.
The major caveat here is that the nft transaction mutex is not held.
This causes a race between control plane and GC when they attempt to
delete the same element.
We cannot grab the netlink mutex from the work queue, because the
control plane has to wait for the GC work queue in case the set is to be
removed, so we get following deadlock:
cpu 1 cpu2
GC work transaction comes in , lock nft mutex
`acquire nft mutex // BLOCKS
transaction asks to remove the set
set destruction calls cancel_work_sync()
cancel_work_sync will now block forever, because it is waiting for the
mutex the caller already owns.
This patch adds a new API that deals with garbage collection in two
steps:
1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT
so they are not visible via lookup. Annotate current GC sequence in
the GC transaction. Enqueue GC transaction work as soon as it is
full. If ruleset is updated, then GC transaction is aborted and
retried later.
2) GC work grabs the mutex. If GC sequence has changed then this GC
transaction lost race with control plane, abort it as it contains
stale references to objects and let GC try again later. If the
ruleset is intact, then this GC transaction deactivates and removes
the elements and it uses call_rcu() to destroy elements.
Note that no elements are removed from GC lockless path, the _DEAD bit
is set and pointers are collected. GC catchall does not remove the
elements anymore too. There is a new set->dead flag that is set on to
abort the GC transaction to deal with set->ops->destroy() path which
removes the remaining elements in the set from commit_release, where no
mutex is held.
To deal with GC when mutex is held, which allows safe deactivate and
removal, add sync GC API which releases the set element object via
call_rcu(). This is used by rbtree and pipapo backends which also
perform garbage collection from control plane path.
Since element removal from sets can happen from control plane and
element garbage collection/timeout, it is necessary to keep the set
structure alive until all elements have been deactivated and destroyed.
We cannot do a cancel_work_sync or flush_work in nft_set_destroy because
its called with the transaction mutex held, but the aforementioned async
work queue might be blocked on the very mutex that nft_set_destroy()
callchain is sitting on.
This gives us the choice of ABBA deadlock or UaF.
To avoid both, add set->refs refcount_t member. The GC API can then
increment the set refcount and release it once the elements have been
free'd.
Set backends are adapted to use the GC transaction API in a follow up
patch entitled:
("netfilter: nf_tables: use gc transaction API in set backends")
This is joint work with Florian Westphal.
Fixes: cfed7e1b1f ("netfilter: nf_tables: add set garbage collection helpers")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull smb server fixes from Steve French:
"Two ksmbd server fixes, both also for stable:
- improve buffer validation when multiple EAs returned
- missing check for command payload size"
* tag '6.5-rc5-ksmbd-server' of git://git.samba.org/ksmbd:
ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea()
ksmbd: validate command request size
Add a 200ms delay after sending a ctrl report to Quadro,
Octo, D5 Next and Aquaero to give them enough time to
process the request and save the data to memory. Otherwise,
under heavier userspace loads where multiple sysfs entries
are usually set in quick succession, a new ctrl report could
be requested from the device while it's still processing the
previous one and fail with -EPIPE. The delay is only applied
if two ctrl report operations are near each other in time.
Reported by a user on Github [1] and tested by both of us.
[1] https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/issues/82
Fixes: 752b927951 ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Octo")
Signed-off-by: Aleksa Savic <savicaleksa83@gmail.com>
Link: https://lore.kernel.org/r/20230807172004.456968-1-savicaleksa83@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Revert a patch that unconditionally resolved addresses to inlines in
callchains, something that was done before when DWARF mode was asked
for, but could as well be done when just frame pointers (the default)
was selected.
This enriches the callchains with inlines but the way to resolve it
is gross right now, relying on addr2line, and even if we come up with
an efficient way of processing all the associated DWARF info for a
big file as vmlinux is, this has to be something people opt-in, as it
will still result in overheads, so revert it until we get this done
in a saner way.
- Update the x86 msr-index.h header with the kernel original, no change
in tooling output, just addresses a tools/perf build warning.
- Resolve a regression where special "tool events", such as
"duration_time" were being presented for all CPUs, when it only
makes sense to show it for the workload, that is, just once.
* tag 'perf-tools-fixes-for-v6.5-3-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf stat: Don't display zero tool counts
tools arch x86: Sync the msr-index.h copy with the kernel sources
Revert "perf report: Append inlines to non-DWARF callchains"
Commit 16d7fd3cfa ("zonefs: use iomap for synchronous direct writes")
changes zonefs code from a self-built zone append BIO to using iomap for
synchronous direct writes. This change relies on iomap submit BIO
callback to change the write BIO built by iomap to a zone append BIO.
However, this change overlooked the fact that a write BIO may be very
large as it is split when issued. The change from a regular write to a
zone append operation for the built BIO can result in a block layer
warning as zone append BIO are not allowed to be split.
WARNING: CPU: 18 PID: 202210 at block/bio.c:1644 bio_split+0x288/0x350
Call Trace:
? __warn+0xc9/0x2b0
? bio_split+0x288/0x350
? report_bug+0x2e6/0x390
? handle_bug+0x41/0x80
? exc_invalid_op+0x13/0x40
? asm_exc_invalid_op+0x16/0x20
? bio_split+0x288/0x350
bio_split_rw+0x4bc/0x810
? __pfx_bio_split_rw+0x10/0x10
? lockdep_unlock+0xf2/0x250
__bio_split_to_limits+0x1d8/0x900
blk_mq_submit_bio+0x1cf/0x18a0
? __pfx_iov_iter_extract_pages+0x10/0x10
? __pfx_blk_mq_submit_bio+0x10/0x10
? find_held_lock+0x2d/0x110
? lock_release+0x362/0x620
? mark_held_locks+0x9e/0xe0
__submit_bio+0x1ea/0x290
? __pfx___submit_bio+0x10/0x10
? seqcount_lockdep_reader_access.constprop.0+0x82/0x90
submit_bio_noacct_nocheck+0x675/0xa20
? __pfx_bio_iov_iter_get_pages+0x10/0x10
? __pfx_submit_bio_noacct_nocheck+0x10/0x10
iomap_dio_bio_iter+0x624/0x1280
__iomap_dio_rw+0xa22/0x18a0
? lock_is_held_type+0xe3/0x140
? __pfx___iomap_dio_rw+0x10/0x10
? lock_release+0x362/0x620
? zonefs_file_write_iter+0x74c/0xc80 [zonefs]
? down_write+0x13d/0x1e0
iomap_dio_rw+0xe/0x40
zonefs_file_write_iter+0x5ea/0xc80 [zonefs]
do_iter_readv_writev+0x18b/0x2c0
? __pfx_do_iter_readv_writev+0x10/0x10
? inode_security+0x54/0xf0
do_iter_write+0x13b/0x7c0
? lock_is_held_type+0xe3/0x140
vfs_writev+0x185/0x550
? __pfx_vfs_writev+0x10/0x10
? __handle_mm_fault+0x9bd/0x1c90
? find_held_lock+0x2d/0x110
? lock_release+0x362/0x620
? find_held_lock+0x2d/0x110
? lock_release+0x362/0x620
? __up_read+0x1ea/0x720
? do_pwritev+0x136/0x1f0
do_pwritev+0x136/0x1f0
? __pfx_do_pwritev+0x10/0x10
? syscall_enter_from_user_mode+0x22/0x90
? lockdep_hardirqs_on+0x7d/0x100
do_syscall_64+0x58/0x80
This error depends on the hardware used, specifically on the max zone
append bytes and max_[hw_]sectors limits. Tests using AMD Epyc machines
that have low limits did not reveal this issue while runs on Intel Xeon
machines with larger limits trigger it.
Manually splitting the zone append BIO using bio_split_rw() can solve
this issue but also requires issuing the fragment BIOs synchronously
with submit_bio_wait(), to avoid potential reordering of the zone append
BIO fragments, which would lead to data corruption. That is, this
solution is not better than using regular write BIOs which are subject
to serialization using zone write locking at the IO scheduler level.
Given this, fix the issue by removing zone append support and using
regular write BIOs for synchronous direct writes. This allows preseving
the use of iomap and having identical synchronous and asynchronous
sequential file write path. Zone append support will be reintroduced
later through io_uring commands to ensure that the needed special
handling is done correctly.
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 16d7fd3cfa ("zonefs: use iomap for synchronous direct writes")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a test case to check whether sockmap redirection works correctly
when data length returned by stream_parser is less than skb->len.
In addition, this test checks whether strp_done is called correctly.
The reason is that we returns skb->len - 1 from the stream_parser, so
the last byte in the skb will be held by strp->skb_head. Therefore,
if strp_done is not called to free strp->skb_head, we'll get a memleak
warning.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20230804073740.194770-5-xukuohai@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
BPF CI has reported the following failure:
Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir
Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir
./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected
vsock_unix_redir_connectible:FAIL:1506
./test_progs:vsock_unix_redir_connectible:1506: ingress: write: Transport endpoint is not connected
vsock_unix_redir_connectible:FAIL:1506
./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected
vsock_unix_redir_connectible:FAIL:1506
./test_progs:vsock_unix_redir_connectible:1514: ingress: recv() err, errno=11
vsock_unix_redir_connectible:FAIL:1514
./test_progs:vsock_unix_redir_connectible:1518: ingress: vsock socket map failed, a != b
vsock_unix_redir_connectible:FAIL:1518
./test_progs:vsock_unix_redir_connectible:1525: ingress: want pass count 1, have 0
It’s because the recv(... MSG_DONTWAIT) syscall in the test case is
called before the queued work sk_psock_backlog() in the kernel finishes
executing. So the data to be read is still queued in psock->ingress_skb
and cannot be read by the user program. Therefore, the non-blocking
recv() reads nothing and reports an EAGAIN error.
So replace recv(... MSG_DONTWAIT) with xrecv_nonblock(), which calls
select() to wait for data to be readable or timeout before calls recv().
Fixes: d61bd8c1fd ("selftests/bpf: add a test case for vsock sockmap")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20230804073740.194770-4-xukuohai@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
strp_done is only called when psock->progs.stream_parser is not NULL,
but stream_parser was set to NULL by sk_psock_stop_strp(), called
by sk_psock_drop() earlier. So, strp_done can never be called.
Introduce SK_PSOCK_RX_ENABLED to mark whether there is strp on psock.
Change the condition for calling strp_done from judging whether
stream_parser is set to judging whether this flag is set. This flag is
only set once when strp_init() succeeds, and will never be cleared later.
Fixes: c0d95d3380 ("bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230804073740.194770-3-xukuohai@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Fix a refcount underflow problem reported by syzbot that can happen
when a system is running out of memory. If xp_alloc_tx_descs() fails,
and it can only fail due to not having enough memory, then the error
path is triggered. In this error path, the refcount of the pool is
decremented as it has incremented before. However, the reference to
the pool in the socket was not nulled. This means that when the socket
is closed later, the socket teardown logic will think that there is a
pool attached to the socket and try to decrease the refcount again,
leading to a refcount underflow.
I chose this fix as it involved adding just a single line. Another
option would have been to move xp_get_pool() and the assignment of
xs->pool to after the if-statement and using xs_umem->pool instead of
xs->pool in the whole if-statement resulting in somewhat simpler code,
but this would have led to much more churn in the code base perhaps
making it harder to backport.
Fixes: ba3beec2ec ("xsk: Fix possible crash when multiple sockets are created")
Reported-by: syzbot+8ada0057e69293a05fd4@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230809142843.13944-1-magnus.karlsson@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When the tdm lane mask is computed, the driver currently fills the 1st lane
before moving on to the next. If the stream has less channels than the
lanes can accommodate, slots will be disabled on the last lanes.
Unfortunately, the HW distribute channels in a different way. It distribute
channels in pair on each lanes before moving on the next slots.
This difference leads to problems if a device has an interface with more
than 1 lane and with more than 2 slots per lane.
For example: a playback interface with 2 lanes and 4 slots each (total 8
slots - zero based numbering)
- Playing a 8ch stream:
- All slots activated by the driver
- channel #2 will be played on lane #1 - slot #0 following HW placement
- Playing a 4ch stream:
- Lane #1 disabled by the driver
- channel #2 will be played on lane #0 - slot #2
This behaviour is obviously not desirable.
Change the way slots are activated on the TDM lanes to follow what the HW
does and make sure each channel always get mapped to the same slot/lane.
Fixes: 1a11d88f49 ("ASoC: meson: add tdm formatter base driver")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230809171931.1244502-1-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The upcoming (and nearly finalized):
https://datatracker.ietf.org/doc/draft-collink-6man-pio-pflag/
will update the IPv6 RA to include a new flag in the PIO field,
which will serve as a hint to perform DHCPv6-PD.
As we don't want DHCPv6 related logic inside the kernel, this piece of
information needs to be exposed to userspace. The simplest option is to
simply expose the entire PIO through the already existing mechanism.
Even without this new flag, the already existing PIO R (router address)
flag (from RFC6275) cannot AFAICT be handled entirely in kernel,
and provides useful information that should be exposed to userspace
(the router's global address, for use by Mobile IPv6).
Also cc'ing stable@ for inclusion in LTS, as while technically this is
not quite a bugfix, and instead more of a feature, it is absolutely
trivial and the alternative is manually cherrypicking into all Android
Common Kernel trees - and I know Greg will ask for it to be sent in via
LTS instead...
Cc: Jen Linkova <furry@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Cc: stable@vger.kernel.org
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20230807102533.1147559-1-maze@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Just a few small updates:
* fix an integer overflow in nl80211
* fix rtw89 8852AE disconnections
* fix a buffer overflow in ath12k
* fix AP_VLAN configuration lookups
* fix allocation failure handling in brcm80211
* update MAINTAINERS for some drivers
* tag 'wireless-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: ath12k: Fix buffer overflow when scanning with extraie
wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems()
wifi: cfg80211: fix sband iftype data lookup for AP_VLAN
wifi: rtw89: fix 8852AE disconnection caused by RX full flags
MAINTAINERS: Remove tree entry for rtl8180
MAINTAINERS: Update entry for rtl8187
wifi: brcm80211: handle params_v1 allocation failure
====================
Link: https://lore.kernel.org/r/20230809124818.167432-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Normally these two flags do go together, as the issuer of polled IO
generally cannot wait for resources that will get freed as part of IO
completion. This is because that very task is the one that will complete
the request and free those resources, hence that would introduce a
deadlock.
But it is possible to have someone else issue the polled IO, eg via
io_uring if the request is punted to io-wq. For that case, it's fine to
have the task block on IO submission, as it is not the same task that
will be completing the IO.
It's completely up to the caller to ask for both polled and nowait IO
separately! If we don't allow polled IO where IOCB_NOWAIT isn't set in
the kiocb, then we can run into repeated -EAGAIN submissions and not
make any progress.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The test installs filters that match on various IP fragments (e.g., no
fragment, first fragment) and expects a certain amount of packets to hit
each filter. This is problematic as the filters are not specific enough
and can match IP packets (e.g., IGMP) generated by the stack, resulting
in failures [1].
Fix by making the filters more specific and match on more fields in the
IP header: Source IP, destination IP and protocol.
[1]
# timeout set to 0
# selftests: net/forwarding: tc_tunnel_key.sh
# TEST: tunnel_key nofrag (skip_hw) [FAIL]
# packet smaller than MTU was not tunneled
# INFO: Could not test offloaded functionality
not ok 89 selftests: net/forwarding: tc_tunnel_key.sh # exit=1
Fixes: 533a89b194 ("selftests: forwarding: add tunnel_key "nofrag" test case")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.unizg.hr/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230808141503.4060661-14-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test relies on 'nc' being the netcat version from the nmap project.
While this seems to be the case on Fedora, it is not the case on Ubuntu,
resulting in failures such as [1].
Fix by explicitly using the 'ncat' utility from the nmap project and the
skip the test in case it is not installed.
[1]
# timeout set to 0
# selftests: net/forwarding: tc_actions.sh
# TEST: gact drop and ok (skip_hw) [ OK ]
# TEST: mirred egress flower redirect (skip_hw) [ OK ]
# TEST: mirred egress flower mirror (skip_hw) [ OK ]
# TEST: mirred egress matchall mirror (skip_hw) [ OK ]
# TEST: mirred_egress_to_ingress (skip_hw) [ OK ]
# nc: invalid option -- '-'
# usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl]
# [-m minttl] [-O length] [-P proxy_username] [-p source_port]
# [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit]
# [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]]
# [destination] [port]
# nc: invalid option -- '-'
# usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl]
# [-m minttl] [-O length] [-P proxy_username] [-p source_port]
# [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit]
# [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]]
# [destination] [port]
# TEST: mirred_egress_to_ingress_tcp (skip_hw) [FAIL]
# server output check failed
# INFO: Could not test offloaded functionality
not ok 80 selftests: net/forwarding: tc_actions.sh # exit=1
Fixes: ca22da2fbd ("act_mirred: use the backlog for nested calls to mirred ingress")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.unizg.hr/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230808141503.4060661-12-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The selftest relies on iproute2 changes present in version 6.3, but the
test does not check for it, resulting in error:
# ./bridge_mdb.sh
INFO: # Host entries configuration tests
TEST: Common host entries configuration tests (IPv4) [FAIL]
Managed to add IPv4 host entry with a filter mode
TEST: Common host entries configuration tests (IPv6) [FAIL]
Managed to add IPv6 host entry with a filter mode
TEST: Common host entries configuration tests (L2) [FAIL]
Managed to add L2 host entry with a filter mode
INFO: # Port group entries configuration tests - (*, G)
Command "replace" is unknown, try "bridge mdb help".
[...]
Fix by skipping the test if iproute2 is too old.
Fixes: b6d00da086 ("selftests: forwarding: Add bridge MDB test")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/6b04b2ba-2372-6f6b-3ac8-b7cba1cfae83@alu.unizg.hr/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230808141503.4060661-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The default timeout for selftests is 45 seconds, but it is not enough
for forwarding selftests which can takes minutes to finish depending on
the number of tests cases:
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests
TAP version 13
1..102
# timeout set to 45
# selftests: net/forwarding: bridge_igmp.sh
# TEST: IGMPv2 report 239.10.10.10 [ OK ]
# TEST: IGMPv2 leave 239.10.10.10 [ OK ]
# TEST: IGMPv3 report 239.10.10.10 is_include [ OK ]
# TEST: IGMPv3 report 239.10.10.10 include -> allow [ OK ]
#
not ok 1 selftests: net/forwarding: bridge_igmp.sh # TIMEOUT 45 seconds
Fix by switching off the timeout and setting it to 0. A similar change
was done for BPF selftests in commit 6fc5916cc2 ("selftests: bpf:
Switch off timeout").
Fixes: 81573b18f2 ("selftests/net/forwarding: add Makefile to install tests")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/8d149f8c-818e-d141-a0ce-a6bae606bc22@alu.unizg.hr/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230808141503.4060661-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As explained in [1], the forwarding selftests are meant to be run with
either physical loopbacks or veth pairs. The interfaces are expected to
be specified in a user-provided forwarding.config file or as command
line arguments. By default, this file is not present and the tests fail:
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests
[...]
TAP version 13
1..102
# timeout set to 45
# selftests: net/forwarding: bridge_igmp.sh
# Command line is not complete. Try option "help"
# Failed to create netif
not ok 1 selftests: net/forwarding: bridge_igmp.sh # exit=1
[...]
Fix by skipping a test if interfaces are not provided either via the
configuration file or command line arguments.
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests
[...]
TAP version 13
1..102
# timeout set to 45
# selftests: net/forwarding: bridge_igmp.sh
# SKIP: Cannot create interface. Name not specified
ok 1 selftests: net/forwarding: bridge_igmp.sh # SKIP
[1] tools/testing/selftests/net/forwarding/README
Fixes: 81573b18f2 ("selftests/net/forwarding: add Makefile to install tests")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/856d454e-f83c-20cf-e166-6dc06cbc1543@alu.unizg.hr/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230808141503.4060661-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
nexthop: Nexthop dump fixes
Patches #1 and #3 fix two problems related to nexthops and nexthop
buckets dump, respectively. Patch #2 is a preparation for the third
patch.
The pattern described in these patches of splitting the NLMSG_DONE to a
separate response is prevalent in other rtnetlink dump callbacks. I
don't know if it's because I'm missing something or if this was done
intentionally to ensure the message is delivered to user space. After
commit 0642840b8b ("af_netlink: ensure that NLMSG_DONE never fails in
dumps") this is no longer necessary and I can improve these dump
callbacks assuming this analysis is correct.
No regressions in existing tests:
# ./fib_nexthops.sh
[...]
Tests passed: 230
Tests failed: 0
====================
Link: https://lore.kernel.org/r/20230808075233.3337922-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.
The nexthop bucket dump callback always returns a positive number if
nexthop buckets were filled in the provided skb, even if the dump is
complete. This means that a dump will span at least two recvmsg() calls
as long as nexthop buckets are present. In the last recvmsg() call the
dump callback will not fill in any nexthop buckets because the previous
call indicated that the dump should restart from the last dumped nexthop
ID plus one.
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 2
# strace -e sendto,recvmsg -s 5 ip nexthop bucket
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
id 10 index 0 idle_time 6.66 nhid 1
id 10 index 1 idle_time 6.66 nhid 1
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
+++ exited with 0 +++
This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
# ip nexthop bucket
id 4294967295 index 0 idle_time 5.55 nhid 1
id 4294967295 index 1 idle_time 5.55 nhid 1
id 4294967295 index 0 idle_time 5.55 nhid 1
id 4294967295 index 1 idle_time 5.55 nhid 1
[...]
Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
# strace -e sendto,recvmsg -s 5 ip nexthop bucket
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
id 4294967295 index 0 idle_time 6.61 nhid 1
id 4294967295 index 1 idle_time 6.61 nhid 1
+++ exited with 0 +++
Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and simply reply with a
NLMSG_DONE message since it knows that the callback previously returned
zero.
Add a test that fails before the fix:
# ./fib_nexthops.sh -t basic_res
[...]
TEST: Maximum nexthop ID dump [FAIL]
[...]
And passes after it:
# ./fib_nexthops.sh -t basic_res
[...]
TEST: Maximum nexthop ID dump [ OK ]
[...]
Fixes: 8a1bbabb03 ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rtm_dump_nexthop_bucket_nh() is used to dump nexthop buckets belonging
to a specific resilient nexthop group. The function returns a positive
return code (the skb length) upon both success and failure.
The above behavior is problematic. When a complete nexthop bucket dump
is requested, the function that walks the different nexthops treats the
non-zero return code as an error. This causes buckets belonging to
different resilient nexthop groups to be dumped using different buffers
even if they can all fit in the same buffer:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 1
# ip nexthop add id 20 group 1 type resilient buckets 1
# strace -e recvmsg -s 0 ip nexthop bucket
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
id 10 index 0 idle_time 10.27 nhid 1
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
id 20 index 0 idle_time 6.44 nhid 1
[...]
Fix by only returning a non-zero return code when an error occurred and
restarting the dump from the bucket index we failed to fill in. This
allows buckets belonging to different resilient nexthop groups to be
dumped using the same buffer:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 1
# ip nexthop add id 20 group 1 type resilient buckets 1
# strace -e recvmsg -s 0 ip nexthop bucket
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
id 10 index 0 idle_time 30.21 nhid 1
id 20 index 0 idle_time 26.7 nhid 1
[...]
While this change is more of a performance improvement change than an
actual bug fix, it is a prerequisite for a subsequent patch that does
fix a bug.
Fixes: 8a1bbabb03 ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.
The nexthop dump callback always returns a positive number if nexthops
were filled in the provided skb, even if the dump is complete. This
means that a dump will span at least two recvmsg() calls as long as
nexthops are present. In the last recvmsg() call the dump callback will
not fill in any nexthops because the previous call indicated that the
dump should restart from the last dumped nexthop ID plus one.
# ip nexthop add id 1 blackhole
# strace -e sendto,recvmsg -s 5 ip nexthop
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394315, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 36
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 1], {nla_len=4, nla_type=NHA_BLACKHOLE}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
id 1 blackhole
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
+++ exited with 0 +++
This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:
# ip nexthop add id $((2**32-1)) blackhole
# ip nexthop
id 4294967295 blackhole
id 4294967295 blackhole
[...]
Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOP response:
# ip nexthop add id $((2**32-1)) blackhole
# strace -e sendto,recvmsg -s 5 ip nexthop
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394080, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 56
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 4294967295], {nla_len=4, nla_type=NHA_BLACKHOLE}]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 56
id 4294967295 blackhole
+++ exited with 0 +++
Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and simply reply with a
NLMSG_DONE message since it knows that the callback previously returned
zero.
Add a test that fails before the fix:
# ./fib_nexthops.sh -t basic
[...]
TEST: Maximum nexthop ID dump [FAIL]
[...]
And passes after it:
# ./fib_nexthops.sh -t basic
[...]
TEST: Maximum nexthop ID dump [ OK ]
[...]
Fixes: ab84be7e54 ("net: Initial nexthop code")
Reported-by: Petr Machata <petrm@nvidia.com>
Closes: https://lore.kernel.org/netdev/87sf91enuf.fsf@nvidia.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230808075233.3337922-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR
VMAs are placed above the 47-bit border:
8000001a9000-8000001ad000 r--p 00000000 00:00 0 [vvar]
8000001ad000-8000001af000 r-xp 00000000 00:00 0 [vdso]
This might confuse users who are not aware of 5-level paging and expect
all userspace addresses to be under the 47-bit border.
So far problem has only been triggered with ASLR disabled, although it
may also occur with ASLR enabled if the layout is randomized in a just
right way.
The problem happens due to custom placement for the VMAs in the VDSO
code: vdso_addr() tries to place them above the stack and checks the
result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to
the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW
instead.
Fixes: b569bab78d ("x86/mm: Prepare to expose larger address space to userspace")
Reported-by: Yingcong Wu <yingcong.wu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.intel.com
The msi-ec driver fails to build for me (gcc 7.5):
CC [M] drivers/platform/x86/msi-ec.o
drivers/platform/x86/msi-ec.c:72:6: error: initializer element is not constant
{ SM_ECO_NAME, 0xc2 },
^~~~~~~~~~~
drivers/platform/x86/msi-ec.c:72:6: note: (near initialization for ‘CONF0.shift_mode.modes[0].name’)
drivers/platform/x86/msi-ec.c:73:6: error: initializer element is not constant
{ SM_COMFORT_NAME, 0xc1 },
^~~~~~~~~~~~~~~
drivers/platform/x86/msi-ec.c:73:6: note: (near initialization for ‘CONF0.shift_mode.modes[1].name’)
drivers/platform/x86/msi-ec.c:74:6: error: initializer element is not constant
{ SM_SPORT_NAME, 0xc0 },
^~~~~~~~~~~~~
drivers/platform/x86/msi-ec.c:74:6: note: (near initialization for ‘CONF0.shift_mode.modes[2].name’)
(...)
Don't try to be smart, just use defines for the constant strings. The
compiler will recognize it's the same string and will store it only
once in the data section anyway.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 392cacf2aa ("platform/x86: Add new msi-ec driver")
Cc: stable@vger.kernel.org
Cc: Nikita Kravets <teackot@gmail.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mark Gross <markgross@kernel.org>
Link: https://lore.kernel.org/r/20230805101010.54d49e91@endymion.delvare
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
All the cases, were the DSDT IRQ settings should be used instead of
the MADT override, are for IRQ 1 or 12, the PS/2 kbd resp. mouse IRQs.
Simplify things by always honering the override for other legacy IRQs
(for non DMI quirked cases).
This allows removing the DMI quirks to honor the override for
some non i8042 IRQs on some AMD ZEN based Lenovo models.
Fixes: a9c4a912b7 ("ACPI: resource: Remove "Zen" specific match and quirks")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization.
Commit c5a130325f ("ACPI/APEI: Add parameter check before error
injection") exported page_is_ram(), hence the __init annotation should
be removed.
This fixes the modpost warning in ARCH=alpha builds:
WARNING: modpost: vmlinux: page_is_ram: EXPORT_SYMBOL used for init symbol. Remove __init or EXPORT_SYMBOL.
Fixes: c5a130325f ("ACPI/APEI: Add parameter check before error injection")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
The same checks are repeated in three places to decide whether to use
hwrng. Consolidate these into a helper.
Also this fixes a case that one of them was missing a check in the
cleanup path.
Fixes: 554b841d47 ("tpm: Disable RNG for all AMD fTPMs")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
GDB uses /proc/PID/mem to access memory of the target process. GDB
doesn't untag addresses manually, but relies on kernel to do the right
thing.
mem_rw() of procfs uses access_remote_vm() to get data from the target
process. It worked fine until recent changes in __access_remote_vm()
that now checks if there's VMA at target address using raw address.
Untag the address before looking up the VMA.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Christina Schimpe <christina.schimpe@intel.com>
Fixes: eee9c708cc ("gup: avoid stack expansion warning for known-good case")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the dGPU path instead. There were a lot of platform
issues with IOMMU in general on these chips due to windows
not enabling IOMMU at the time. The dGPU path has been
used for a long time with newer APUs and works fine. This
also paves the way to simplify the driver significantly.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the dGPU path instead. There were a lot of platform
issues with IOMMU in general on these chips due to windows
not enabling IOMMU at the time. The dGPU path has been
used for a long time with newer APUs and works fine. This
also paves the way to simplify the driver significantly.
v2: use the dGPU queue manager functions
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is only required for SR-IOV world switches, but it
adds additional latency leading to reduced performance in
some benchmarks. Disable for now on bare metal.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Under certain circumstances, an integer division by 0 which faults, can
leave stale quotient data from a previous division operation on Zen1
microarchitectures.
Do a dummy division 0/1 before returning from the #DE exception handler
in order to avoid any leaks of potentially sensitive data.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The existing OD interface cannot support the growing demand for more
OD features. We are in the transition to a new OD mechanism. So,
disable the SMU13 OD feature support temporarily. And this should be
reverted when the new OD mechanism online.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On PSP v13.x ASICs, boot loader will set only the MSB to 1 and clear the
least significant bits for any command submission. Hence match against
the exact register value, otherwise a register value of all 0xFFs also
could falsely indicate that boot loader is ready. Also, from PSP v13.0.6
and newer, bits[7:0] will be used to indicate command error status.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If cfg80211 is providing extraie's for a scanning process then ath12k will
copy that over to the firmware. The extraie.len is a 32 bit value in struct
element_info and describes the amount of bytes for the vendor information
elements.
The problem is the allocation of the buffer. It has to align the TLV
sections by 4 bytes. But the code was using an u8 to store the newly
calculated length of this section (with alignment). And the new
calculated length was then used to allocate the skbuff. But the actual
code to copy in the data is using the extraie.len and not the calculated
"aligned" length.
The length of extraie with IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS enabled
was 264 bytes during tests with a wifi card. But it only allocated 8
bytes (264 bytes % 256) for it. As consequence, the code to memcpy the
extraie into the skb was then just overwriting data after skb->end. Things
like shinfo were therefore corrupted. This could usually be seen by a crash
in skb_zcopy_clear which tried to call a ubuf_info callback (using a bogus
address).
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Link: https://lore.kernel.org/r/20230809081241.32765-1-quic_wgong@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nl80211_parse_mbssid_elems() uses a u8 variable num_elems to count the
number of MBSSID elements in the nested netlink attribute attrs, which can
lead to an integer overflow if a user of the nl80211 interface specifies
256 or more elements in the corresponding attribute in userspace. The
integer overflow can lead to a heap buffer overflow as num_elems determines
the size of the trailing array in elems, and this array is thereafter
written to for each element in attrs.
Note that this vulnerability only affects devices with the
wiphy->mbssid_max_interfaces member set for the wireless physical device
struct in the device driver, and can only be triggered by a process with
CAP_NET_ADMIN capabilities.
Fix this by checking for a maximum of 255 elements in attrs.
Cc: stable@vger.kernel.org
Fixes: dc1e3cb8da ("nl80211: MBSSID and EMA support in AP mode")
Signed-off-by: Keith Yeo <keithyjy@gmail.com>
Link: https://lore.kernel.org/r/20230731034719.77206-1-keithyjy@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There is an asymmetry between commit/abort and preparation phase if the
following conditions are met:
1. set is a verdict map ("1.2.3.4 : jump foo")
2. timeouts are enabled
In this case, following sequence is problematic:
1. element E in set S refers to chain C
2. userspace requests removal of set S
3. kernel does a set walk to decrement chain->use count for all elements
from preparation phase
4. kernel does another set walk to remove elements from the commit phase
(or another walk to do a chain->use increment for all elements from
abort phase)
If E has already expired in 1), it will be ignored during list walk, so its use count
won't have been changed.
Then, when set is culled, ->destroy callback will zap the element via
nf_tables_set_elem_destroy(), but this function is only safe for
elements that have been deactivated earlier from the preparation phase:
lack of earlier deactivate removes the element but leaks the chain use
count, which results in a WARN splat when the chain gets removed later,
plus a leak of the nft_chain structure.
Update pipapo_get() not to skip expired elements, otherwise flush
command reports bogus ENOENT errors.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Guenter reports boot issues with duplicate sysfs entries for multiport
drivers. Let's go back to using port->line for now to fix the regression.
With this change, the serial core port device names are not correct for the
hardware specific 8250 single port drivers, but that's a cosmetic issue for
now.
Fixes: d962de6ae5 ("serial: core: Fix serial core port id to not use port->line")
Reported-by: Guenter Roeck <groeck7@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230806062052.47737-1-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mmc_add_host() may return error, if we ignore its return value,
1. the memory allocated in mmc_alloc_host() will be leaked
2. null-ptr-deref will happen when calling mmc_remove_host()
in remove function spmmc_drv_remove() because deleting not
added device.
Fix this by checking the return value of mmc_add_host(). Moreover,
I fixed the error handling path of spmmc_drv_probe() to clean up.
Fixes: 4e268fed8b ("mmc: Add mmc driver for Sunplus SP7021")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Link: https://lore.kernel.org/r/20230622090233.188539-1-harperchen1110@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Gerd Bayer says:
====================
net/smc: Fix effective buffer size
commit 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock
and make them tunable") started to derive the effective buffer size for
SMC connections inconsistently in case a TCP fallback was used and
memory consumption of SMC with the default settings was doubled when
a connection negotiated SMC. That was not what we want.
This series consolidates the resulting effective buffer size that is
used with SMC sockets, which is based on Jan Karcher's effort (see
[1]). For all TCP exchanges (in particular in case of a fall back when
no SMC connection was possible) the values from net.ipv4.tcp_[rw]mem
are used. If SMC succeeds in establishing a SMC connection, the newly
introduced values from net.smc.[rw]mem are used.
net.smc.[rw]mem is initialized to 64kB, respectively. Internal test
have show this to be a good compromise between throughput/latency
and memory consumption. Also net.smc.[rw]mem is now decoupled completely
from any tuning through net.ipv4.tcp_[rw]mem.
If a user chose to tune a socket's receive or send buffer size with
setsockopt, this tuning is now consistently applied to either fall-back
TCP or proper SMC connections over the socket.
Thanks,
Gerd
v2 - v3:
- Rebase to and resolve conflict of second patch with latest net/master.
v1 - v2:
- In second patch, use sock_net() helper as suggested by Tony and demanded
by kernel test robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tuning of the effective buffer size through setsockopts was working for
SMC traffic only but not for TCP fall-back connections even before
commit 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock and
make them tunable"). That change made it apparent that TCP fall-back
connections would use net.smc.[rw]mem as buffer size instead of
net.ipv4_tcp_[rw]mem.
Amend the code that copies attributes between the (TCP) clcsock and the
SMC socket and adjust buffer sizes appropriately:
- Copy over sk_userlocks so that both sockets agree on whether tuning
via setsockopt is active.
- When falling back to TCP use sk_sndbuf or sk_rcvbuf as specified with
setsockopt. Otherwise, use the sysctl value for TCP/IPv4.
- Likewise, use either values from setsockopt or from sysctl for SMC
(duplicated) on successful SMC connect.
In smc_tcp_listen_work() drop the explicit copy of buffer sizes as that
is taken care of by the attribute copy.
Fixes: 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock and make them tunable")
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock
and make them tunable") introduced the net.smc.rmem and net.smc.wmem
sysctls to specify the size of buffers to be used for SMC type
connections. This created a regression for users that specified the
buffer size via setsockopt() as the effective buffer size was now
doubled.
Re-introduce the division by 2 in the SMC buffer create code and level
this out by duplicating the net.smc.[rw]mem values used for initializing
sk_rcvbuf/sk_sndbuf at socket creation time. This gives users of both
methods (setsockopt or sysctl) the effective buffer size that they
expect.
Initialize net.smc.[rw]mem from its own constant of 64kB, respectively.
Internal performance tests show that this value is a good compromise
between throughput/latency and memory consumption. Also, this decouples
it from any tuning that was done to net.ipv4.tcp_[rw]mem[1] before the
module for SMC protocol was loaded. Check that no more than INT_MAX / 2
is assigned to net.smc.[rw]mem, in order to avoid any overflow condition
when that is doubled for use in sk_sndbuf or sk_rcvbuf.
While at it, drop the confusing sk_buf_size variable from
__smc_buf_create and name "compressed" buffer size variables more
consistently.
Background:
Before the commit mentioned above, SMC's buffer allocator in
__smc_buf_create() always used half of the sockets' sk_rcvbuf/sk_sndbuf
value as initial value to search for appropriate buffers. If the search
resorted to using a bigger buffer when all buffers of the specified
size were busy, the duplicate of the used effective buffer size is
stored back to sk_rcvbuf/sk_sndbuf.
When available, buffers of exactly the size that a user had specified as
input to setsockopt() were used, despite setsockopt()'s documentation in
"man 7 socket" talking of a mandatory duplication:
[...]
SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
The kernel doubles this value (to allow space for book‐
keeping overhead) when it is set using setsockopt(2),
and this doubled value is returned by getsockopt(2).
The default value is set by the
/proc/sys/net/core/wmem_default file and the maximum
allowed value is set by the /proc/sys/net/core/wmem_max
file. The minimum (doubled) value for this option is
2048.
[...]
Fixes: 0227f058aa ("net/smc: Unbind r/w buffer size from clcsock and make them tunable")
Co-developed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Fix ENETC probing after 6fffbc7ae1 ("PCI: Honor firmware's device disabled status")
I'm not sure who should take this patch set (net maintainers or PCI
maintainers). Everyone could pick up just their part, and that would
work (no compile time dependencies). However, the entire series needs
ACK from both sides and Rob for sure.
v1 at:
https://lore.kernel.org/netdev/20230521115141.2384444-1-vladimir.oltean@nxp.com/
====================
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 6fffbc7ae1 ("PCI: Honor firmware's device disabled
status"), this is redundant and does nothing, because enetc_pf_probe()
no longer even gets called.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The workaround implemented in commit 3222b5b613 ("net: enetc:
initialize RFS/RSS memories for unused ports too") is no longer
effective after commit 6fffbc7ae1 ("PCI: Honor firmware's device
disabled status"). Thus, it has introduced a regression and we see AER
errors being reported again:
$ ip link set sw2p0 up && dhclient -i sw2p0 && ip addr show sw2p0
fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode
fsl_enetc 0000:00:00.2 eno2: Link is Up - 2.5Gbps/Full - flow control rx/tx
mscc_felix 0000:00:00.5 swp2: configuring for fixed/sgmii link mode
mscc_felix 0000:00:00.5 swp2: Link is Up - 1Gbps/Full - flow control off
sja1105 spi2.2 sw2p0: configuring for phy/rgmii-id link mode
sja1105 spi2.2 sw2p0: Link is Up - 1Gbps/Full - flow control off
pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
pcieport 0000:00:1f.0: AER: can't find device of ID0000
Rob's suggestion is to reimplement the enetc driver workaround as a
PCI fixup, and to modify the PCI core to run the fixups for all PCI
functions. This change handles the first part.
We refactor the common code in enetc_psi_create() and enetc_psi_destroy(),
and use the PCI fixup only for those functions for which enetc_pf_probe()
won't get called. This avoids some work being done twice for the PFs
which are enabled.
Fixes: 6fffbc7ae1 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The blamed commit has broken probing on
arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi when &enetc_port0
(PCI function 0) has status = "disabled".
Background: pci_scan_slot() has logic to say that if the function 0 of a
device is absent, the entire device is absent and we can skip the other
functions entirely. Traditionally, this has meant that
pci_bus_read_dev_vendor_id() returns an error code for that function.
However, since the blamed commit, there is an extra confounding
condition: function 0 of the device exists and has a valid vendor id,
but it is disabled in the device tree. In that case, pci_scan_slot()
would incorrectly skip the entire device instead of just that function.
In the case of NXP LS1028A, status = "disabled" does not mean that the
PCI function's config space is not available for reading. It is, but the
Ethernet port is just not functionally useful with a particular SerDes
protocol configuration (0x9999) due to pinmuxing constraints of the Soc.
So, pci_scan_slot() skips all other functions on the ENETC ECAM
(enetc_port1, enetc_port2, enetc_mdio_pf3 etc) when just enetc_port0 had
to not be probed.
There is an additional regression introduced by the change, caused by
its fundamental premise. The enetc driver needs to run code for all PCI
functions, regardless of whether they're enabled or not in the device
tree. That is no longer possible if the driver's probe function is no
longer called. But Rob recommends that we move the of_device_is_available()
detection to dev->match_driver, and this makes the PCI fixups still run
on all functions, while just probing drivers for those functions that
are enabled. So, a separate change in the enetc driver will have to move
the workarounds to a PCI fixup.
Fixes: 6fffbc7ae1 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add fdir_fltr_lock locking in unprotected places.
The change in iavf_fdir_is_dup_fltr adds a spinlock around a loop which
iterates over all filters and looks for a duplicate. The filter can be
removed from list and freed from memory at the same time it's being
compared. All other places where filters are deleted are already
protected with spinlock.
The remaining changes protect adapter->fdir_active_fltr variable so now
all its uses are under a spinlock.
Fixes: 527691bf06 ("iavf: Support IPv4 Flow Director filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807205011.3129224-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5 fixes 2023-08-07
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Add capability check for vnic counters
net/mlx5: Reload auxiliary devices in pci error handlers
net/mlx5: Skip clock update work when device is in error state
net/mlx5: LAG, Check correct bucket when modifying LAG
net/mlx5e: Unoffload post act rule when handling FIB events
net/mlx5: Fix devlink controller number for ECVF
net/mlx5: Allow 0 for total host VFs
net/mlx5: Return correct EC_VF function ID
net/mlx5: DR, Fix wrong allocation of modify hdr pattern
net/mlx5e: TC, Fix internal port memory leak
net/mlx5e: Take RTNL lock when needed before calling xdp_set_features()
====================
Link: https://lore.kernel.org/r/20230807212607.50883-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When externel_lb and reset are executed together, a deadlock may
occur:
[ 3147.217009] INFO: task kworker/u321:0:7 blocked for more than 120 seconds.
[ 3147.230483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3147.238999] task:kworker/u321:0 state:D stack: 0 pid: 7 ppid: 2 flags:0x00000008
[ 3147.248045] Workqueue: hclge hclge_service_task [hclge]
[ 3147.253957] Call trace:
[ 3147.257093] __switch_to+0x7c/0xbc
[ 3147.261183] __schedule+0x338/0x6f0
[ 3147.265357] schedule+0x50/0xe0
[ 3147.269185] schedule_preempt_disabled+0x18/0x24
[ 3147.274488] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 3147.279880] __mutex_lock_slowpath+0x1c/0x30
[ 3147.284839] mutex_lock+0x50/0x60
[ 3147.288841] rtnl_lock+0x20/0x2c
[ 3147.292759] hclge_reset_prepare+0x68/0x90 [hclge]
[ 3147.298239] hclge_reset_subtask+0x88/0xe0 [hclge]
[ 3147.303718] hclge_reset_service_task+0x84/0x120 [hclge]
[ 3147.309718] hclge_service_task+0x2c/0x70 [hclge]
[ 3147.315109] process_one_work+0x1d0/0x490
[ 3147.319805] worker_thread+0x158/0x3d0
[ 3147.324240] kthread+0x108/0x13c
[ 3147.328154] ret_from_fork+0x10/0x18
In externel_lb process, the hns3 driver call napi_disable()
first, then the reset happen, then the restore process of the
externel_lb will fail, and will not call napi_enable(). When
doing externel_lb again, napi_disable() will be double call,
cause a deadlock of rtnl_lock().
This patch use the HNS3_NIC_STATE_DOWN state to protect the
calling of napi_disable() and napi_enable() in externel_lb
process, just as the usage in ndo_stop() and ndo_start().
Fixes: 04b6ba1435 ("net: hns3: add support for external loopback test")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230807113452.474224-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change the new (unreleased) SO_PEERPIDFD sockopt to return ENODATA
rather than ESRCH if a socket type does not support remote peer-PID
queries.
Currently, SO_PEERPIDFD returns ESRCH when the socket in question is
not an AF_UNIX socket. This is quite unexpected, given that one would
assume ESRCH means the peer process already exited and thus cannot be
found. However, in that case the sockopt actually returns EINVAL (via
pidfd_prepare()). This is rather inconsistent with other syscalls, which
usually return ESRCH if a given PID refers to a non-existant process.
This changes SO_PEERPIDFD to return ENODATA instead. This is also what
SO_PEERGROUPS returns, and thus keeps a consistent behavior across
sockopts.
Note that this code is returned in 2 cases: First, if the socket type is
not AF_UNIX, and secondly if the socket was not yet connected. In both
cases ENODATA seems suitable.
Signed-off-by: David Rheinsberg <david@readahead.eu>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Luca Boccassi <bluca@debian.org>
Fixes: 7b26952a91 ("net: core: add getsockopt SO_PEERPIDFD")
Link: https://lore.kernel.org/r/20230807081225.816199-1-david@readahead.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I'm looking to enable -Wmissing-variable-declarations behind W=1. 0day
bot spotted the following instance in ARCH=riscv builds:
arch/riscv/mm/init.c:276:7: warning: no previous extern declaration
for non-static variable 'trampoline_pg_dir'
[-Wmissing-variable-declarations]
276 | pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
| ^
arch/riscv/mm/init.c:276:1: note: declare 'static' if the variable is
not intended to be used outside of this translation unit
276 | pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
| ^
arch/riscv/mm/init.c:279:7: warning: no previous extern declaration
for non-static variable 'early_pg_dir'
[-Wmissing-variable-declarations]
279 | pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
| ^
arch/riscv/mm/init.c:279:1: note: declare 'static' if the variable is
not intended to be used outside of this translation unit
279 | pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
| ^
These symbols are referenced by more than one translation unit, so make
sure they're both declared and include the correct header for their
declarations. Finally, sort the list of includes to help keep them tidy.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/llvm/202308081000.tTL1ElTr-lkp@intel.com/
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230808-riscv_static-v2-1-2a1e2d2c7a4f@google.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Section 2.1 of the Platform Specification [1] states:
Unless otherwise specified by a given I/O device, I/O devices are on
ordering channel 0 (i.e., they are point-to-point strongly ordered).
which is not sufficient to guarantee that a readX() by a hart completes
before a subsequent delay() on the same hart (cf. memory-barriers.txt,
"Kernel I/O barrier effects").
Set the I(nput) bit in __io_ar() to restore the ordering, align inline
comments.
[1] https://github.com/riscv/riscv-platform-specs
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20230803042738.5937-1-parri.andrea@gmail.com
Fixes: fab957c11e ("RISC-V: Atomic and Locking Code")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
commit 914d6f44fc ("RISC-V: only iterate over possible CPUs in ISA
string parser") changed riscv_fill_hwcap() from iterating over CPU DT
nodes to iterating over logical CPU IDs. Since this function runs long
before cpu_dev_init() creates CPU devices, it hits the fallback path in
of_cpu_device_node_get(), which itself iterates over the DT nodes,
searching for a node with the requested CPU ID. (Incidentally, this
makes riscv_fill_hwcap() now take quadratic time.)
riscv_fill_hwcap() passes a logical CPU ID to of_cpu_device_node_get(),
which uses the arch_match_cpu_phys_id() hook to translate the logical ID
to a physical ID as found in the DT.
arch_match_cpu_phys_id() has a generic weak definition, and RISC-V
provides a strong definition using cpuid_to_hartid_map(). However, the
RISC-V specific implementation is located in arch/riscv/kernel/smp.c,
and that file is only compiled when SMP is enabled.
As a result, when SMP is disabled, the generic definition is used, and
riscv_isa gets initialized based on the ISA string of hart 0, not the
boot hart. On FU740, this means has_fpu() returns false, and userspace
crashes when trying to use floating-point instructions.
Fix this by moving arch_match_cpu_phys_id() to a file which is always
compiled.
Fixes: 70114560b2 ("RISC-V: Add RISC-V specific arch_match_cpu_phys_id")
Fixes: 914d6f44fc ("RISC-V: only iterate over possible CPUs in ISA string parser")
Reported-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230803012608.3540081-1-samuel.holland@sifive.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull hardening fixes from Kees Cook:
- Replace remaining open-coded struct_size_t() instance (Gustavo A. R.
Silva)
- Adjust vboxsf's trailing arrays to be proper flexible arrays
* tag 'hardening-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
media: venus: Use struct_size_t() helper in pkt_session_unset_buffers()
vboxsf: Use flexible arrays for trailing string member
This was introduced to add a plug based way of signaling nowait issues,
but we have since moved on from that. Kill the old dead code, nobody is
setting it anymore.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
40613da52b ("PCI: acpiphp: Reassign resources on bridge if necessary")
changed acpiphp hotplug to use pci_assign_unassigned_bridge_resources()
which depends on bridge being available, however enable_slot() can be
called without bridge associated:
1. Legitimate case of hotplug on root bus (widely used in virt world)
2. A (misbehaving) firmware, that sends ACPI Bus Check notifications to
non existing root ports (Dell Inspiron 7352/0W6WV0), which end up at
enable_slot(..., bridge = 0) where bus has no bridge assigned to it.
acpihp doesn't know that it's a bridge, and bus specific 'PCI
subsystem' can't augment ACPI context with bridge information since
the PCI device to get this data from is/was not available.
Issue is easy to reproduce with QEMU's 'pc' machine, which supports PCI
hotplug on hostbridge slots. To reproduce, boot kernel at commit
40613da52b in VM started with following CLI (assuming guest root fs is
installed on sda1 partition):
# qemu-system-x86_64 -M pc -m 1G -enable-kvm -cpu host \
-monitor stdio -serial file:serial.log \
-kernel arch/x86/boot/bzImage \
-append "root=/dev/sda1 console=ttyS0" \
guest_disk.img
Once guest OS is fully booted at qemu prompt:
(qemu) device_add e1000
(check serial.log) it will cause NULL pointer dereference at:
void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
{
struct pci_bus *parent = bridge->subordinate;
BUG: kernel NULL pointer dereference, address: 0000000000000018
? pci_assign_unassigned_bridge_resources+0x1f/0x260
enable_slot+0x21f/0x3e0
acpiphp_hotplug_notify+0x13d/0x260
acpi_device_hotplug+0xbc/0x540
acpi_hotplug_work_fn+0x15/0x20
process_one_work+0x1f7/0x370
worker_thread+0x45/0x3b0
The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with following
sequence:
1. Suspend to RAM
2. Wake up with the same backtrace being observed:
3. 2nd suspend to RAM attempt makes laptop freeze
Fix it by using __pci_bus_assign_resources() instead of
pci_assign_unassigned_bridge_resources() as we used to do, but only in case
when bus doesn't have a bridge associated (to cover for the case of ACPI
event on hostbridge or non existing root port).
That lets us keep hotplug on root bus working like it used to and at the
same time keeps resource reassignment usable on root ports (and other 1st
level bridges) that was fixed by 40613da52b.
Fixes: 40613da52b ("PCI: acpiphp: Reassign resources on bridge if necessary")
Link: https://lore.kernel.org/r/20230726123518.2361181-2-imammedo@redhat.com
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
The changes from commit 32832a407a ("io_uring: Fix io_uring mmap() by
using architecture-provided get_unmapped_area()") to the parisc
implementation of get_unmapped_area() broke glibc's locale-gen
executable when running on parisc.
This patch reverts those architecture-specific changes, and instead
adjusts in io_uring_mmu_get_unmapped_area() the pgoff offset which is
then given to parisc's get_unmapped_area() function. This is much
cleaner than the previous approach, and we still will get a coherent
addresss.
This patch has no effect on other architectures (SHM_COLOUR is only
defined on parisc), and the liburing testcase stil passes on parisc.
Cc: stable@vger.kernel.org # 6.4
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Fixes: 32832a407a ("io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()")
Fixes: d808459b2e ("io_uring: Adjust mapping wrt architecture aliasing requirements")
Link: https://lore.kernel.org/r/ZNEyGV0jyI8kOOfz@p100
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull gfs2 fixes from Andreas Gruenbacher:
- Fix a freeze consistency check in gfs2_trans_add_meta()
- Don't use filemap_splice_read as it can cause deadlocks on gfs2
* tag 'gfs2-v6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Don't use filemap_splice_read
gfs2: Fix freeze consistency check in gfs2_trans_add_meta
A switch from OSI to PC mode is only possible if all CPUs other than the
calling one are OFF, either through a call to CPU_OFF or not yet booted.
Currently OSI mode is enabled before power domains are created. In cases
where CPUidle states are not using hierarchical CPU topology the bail out
path tries to switch back to PC mode which gets denied by firmware since
other CPUs are online at this point and creates inconsistent state as
firmware is in OSI mode and Linux in PC mode.
This change moves enabling OSI mode after power domains are created,
this would makes sure that hierarchical CPU topology is used before
switching firmware to OSI mode.
Cc: stable@vger.kernel.org
Fixes: 70c179b498 ("cpuidle: psci: Allow PM domain to be initialized even if no OSI mode")
Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Genpd parent and child domain topology created using dt_idle_pd_init_topology()
needs to be removed during error cases.
Add new helper function dt_idle_pd_remove_topology() for same.
Cc: stable@vger.kernel.org
Reviewed-by: Ulf Hanssson <ulf.hansson@linaro.org>
Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
To pick up the changes from these csets:
522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")
That cause no changes to tooling:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
$
Just silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZND17H7BI4ariERn@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For a completed request, after the mmc_blk_mq_complete_rq(mq, req)
function is executed, the bitmap_tags corresponding to the
request will be cleared, that is, the request will be regarded as
idle. If the request is acquired by a different type of process at
this time, the issue_type of the request may change. It further
caused the value of mq->in_flight[issue_type] to be abnormal,
and a large number of requests could not be sent.
p1: p2:
mmc_blk_mq_complete_rq
blk_mq_free_request
blk_mq_get_request
blk_mq_rq_ctx_init
mmc_blk_mq_dec_in_flight
mmc_issue_type(mq, req)
This strategy can ensure the consistency of issue_type
before and after executing mmc_blk_mq_complete_rq.
Fixes: 81196976ed ("mmc: block: Add blk-mq support")
Cc: stable@vger.kernel.org
Signed-off-by: Yibin Ding <yibin.ding@unisoc.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20230802023023.1318134-1-yunlong.xing@unisoc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Mika writes:
thunderbolt: Fixes for v6.5-rc6
This includes two fixes for v6.5-rc6:
- Correct display flickering when connecting a Thunderbolt 3 device to
an AMD USB4 host controller
- Fix a memory leak in bandwidth allocation request.
Both have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Fix memory leak in tb_handle_dp_bandwidth_request()
thunderbolt: Fix Thunderbolt 3 display flickering issue on 2nd hot plug onwards
The vDSO getcpu() reads CPU ID from the GDT_ENTRY_CPUNODE entry when the RDPID
instruction is not available. And GDT_ENTRY_CPUNODE is defined as 28 on 32-bit
Linux kernel and 15 on 64-bit. But the 32-bit getcpu() on 64-bit Linux kernel
is compiled with 32-bit Linux kernel GDT_ENTRY_CPUNODE, i.e., 28, beyond the
64-bit Linux kernel GDT limit. Thus, it just fails _silently_.
When BUILD_VDSO32_64 is defined, choose the 64-bit Linux kernel GDT definitions
to compile the 32-bit getcpu().
Fixes: 877cff5296 ("x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpu")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230322061758.10639-1-xin3.li@intel.com
Link: https://lore.kernel.org/oe-lkp/202303020903.b01fd1de-yujie.liu@intel.com
Syzkaller reported the following issue:
=======================================
Too BIG xdp->frame_sz = 131072
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
...
Call Trace:
<TASK>
bpf_prog_4add87e5301a4105+0x1a/0x1c
__bpf_prog_run include/linux/filter.h:600 [inline]
bpf_prog_run_xdp include/linux/filter.h:775 [inline]
bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
netif_receive_generic_xdp net/core/dev.c:4807 [inline]
do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
call_write_iter include/linux/fs.h:1871 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x650/0xe40 fs/read_write.c:584
ksys_write+0x12f/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe
("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But Jesper
Dangaard Brouer <jbrouer@redhat.com> noted that after introducing the
xdp_init_buff() which all XDP driver use - it's safe to remove this
check. The original intend was to catch cases where XDP drivers have
not been updated to use xdp.frame_sz, but that is not longer a concern
(since xdp_init_buff).
Running the initial syzkaller repro it was discovered that the
contiguous physical memory allocation is used for both xdp paths in
tun_get_user(), e.g. tun_build_skb() and tun_alloc_skb(). It was also
stated by Jesper Dangaard Brouer <jbrouer@redhat.com> that XDP can
work on higher order pages, as long as this is contiguous physical
memory (e.g. a page).
Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/
Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a
Link: https://lore.kernel.org/all/20230725155403.796-1-andrew.kanner@gmail.com/T/
Fixes: 43b5169d83 ("net, xdp: Introduce xdp_init_buff utility routine")
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230803190316.2380231-1-andrew.kanner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using the syzkaller repro with reduced packet size it was discovered
that XDP_PACKET_HEADROOM is not checked in tun_can_build_skb(),
although pad may be incremented in tun_build_skb(). This may end up
with exceeding the PAGE_SIZE limit in tun_build_skb().
Jason Wang <jasowang@redhat.com> proposed to count XDP_PACKET_HEADROOM
always (e.g. without rcu_access_pointer(tun->xdp_prog)) in
tun_can_build_skb() since there's a window during which XDP program
might be attached between tun_can_build_skb() and tun_build_skb().
Fixes: 7df13219d7 ("tun: reserve extra headroom only when XDP is set")
Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
Link: https://lore.kernel.org/r/20230803185947.2379988-1-andrew.kanner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While performing certain power-off sequences, PCI drivers are called to
suspend and resume their underlying devices through PCI PM (power
management) interface. However the hardware does not support PCI PM
suspend/resume operations so system wide suspend/resume leads to bad MFW
(management firmware) state which causes various follow-up errors in driver
when communicating with the device/firmware.
To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding system
to go into suspended/standby mode.
Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230807093725.46829-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While performing certain power-off sequences, PCI drivers are called to
suspend and resume their underlying devices through PCI PM (power
management) interface. However the hardware does not support PCI PM
suspend/resume operations so system wide suspend/resume leads to bad MFW
(management firmware) state which causes various follow-up errors in driver
when communicating with the device/firmware.
To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding system
to go into suspended/standby mode.
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230807093725.46829-2-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As &qedi_percpu->p_work_lock is acquired by hard IRQ qedi_msix_handler(),
other acquisitions of the same lock under process context should disable
IRQ, otherwise deadlock could happen if the IRQ preempts the execution
while the lock is held in process context on the same CPU.
qedi_cpu_offline() is one such function which acquires the lock in process
context.
[Deadlock Scenario]
qedi_cpu_offline()
->spin_lock(&p->p_work_lock)
<irq>
->qedi_msix_handler()
->edi_process_completions()
->spin_lock_irqsave(&p->p_work_lock, flags); (deadlock here)
This flaw was found by an experimental static analysis tool I am developing
for IRQ-related deadlocks.
The tentative patch fix the potential deadlock by spin_lock_irqsave()
under process context.
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230726125655.4197-1-dg573847474@gmail.com
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When preparing protection DIF I/O for DMA, the driver obtains reference
tags from scsi_prot_ref_tag(). Previously, there was a wrong assumption
that an all 0xffffffff value meant error and thus the driver failed the
I/O. This patch removes the evaluation code and accepts whatever the upper
layer returns.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230803211932.155745-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If device_add() returns error, the name allocated by dev_set_name() needs
be freed. As the comment of device_add() says, put_device() should be used
to give up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanp().
Fixes: c8806b6c9e ("snic: driver for Cisco SCSI HBA")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Acked-by: Narsimhulu Musini <nmusini@cisco.com>
Link: https://lore.kernel.org/r/20230801111421.63651-1-wangzhu9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If device_add() returns error, the name allocated by dev_set_name() needs
be freed. As the comment of device_add() says, put_device() should be used
to decrease the reference count in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanp().
Fixes: ee959b00c3 ("SCSI: convert struct class_device to struct device")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230803020230.226903-1-wangzhu9@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull xen netback buffer overflow fix from Juergen Gross:
"The fix for XSA-423 added logic to Linux'es netback driver to deal
with a frontend splitting a packet in a way such that not all of the
headers would come in one piece.
Unfortunately the logic introduced there didn't account for the
extreme case of the entire packet being split into as many pieces as
permitted by the protocol, yet still being smaller than the area
that's specially dealt with to keep all (possible) headers together.
Such an unusual packet would therefore trigger a buffer overrun in the
driver"
* tag 'xsa432-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/netback: Fix buffer overrun triggered by unusual packet
Pull x86/gds fixes from Dave Hansen:
"Mitigate Gather Data Sampling issue:
- Add Base GDS mitigation
- Support GDS_NO under KVM
- Fix a documentation typo"
* tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/x86: Fix backwards on/off logic about YMM support
KVM: Add GDS_NO support to KVM
x86/speculation: Add Kconfig option for GDS
x86/speculation: Add force option to GDS mitigation
x86/speculation: Add Gather Data Sampling mitigation
Pull x86/srso fixes from Borislav Petkov:
"Add a mitigation for the speculative RAS (Return Address Stack)
overflow vulnerability on AMD processors.
In short, this is yet another issue where userspace poisons a
microarchitectural structure which can then be used to leak privileged
information through a side channel"
* tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/srso: Tie SBPB bit setting to microcode patch detection
x86/srso: Add a forgotten NOENDBR annotation
x86/srso: Fix return thunks in generated code
x86/srso: Add IBPB on VMEXIT
x86/srso: Add IBPB
x86/srso: Add SRSO_NO support
x86/srso: Add IBPB_BRTYPE support
x86/srso: Add a Speculative RAS Overflow mitigation
x86/bugs: Increase the x86 bugs vector size to two u32s
Pull workqueue fixes from Tejun Heo:
- The recently added cpu_intensive auto detection and warning mechanism
was spuriously triggered on slow CPUs.
While not causing serious issues, it's still a nuisance and can cause
unintended concurrency management behaviors.
Relax the threshold on machines with lower BogoMIPS. While BogoMIPS
is not an accurate measure of performance by most measures, we don't
have to be accurate and it has rough but strong enough correlation.
- A correction in Kconfig help text
* tag 'wq-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000
workqueue: Fix cpu_intensive_thresh_us name in help text
Pull tpm fixes from Jarkko Sakkinen:
"A few more bug fixes"
* tag 'tpmdd-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm/tpm_tis: Disable interrupts for Lenovo P620 devices
tpm: Disable RNG for all AMD fTPMs
sysctl: set variable key_sysctls storage-class-specifier to static
tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7
Jason A. Donenfeld says:
====================
wireguard fixes for 6.5-rc6
Just one patch this time, somewhat late in the cycle:
1) Fix an off-by-one calculation for the maximum node depth size in the
allowedips trie data structure, and also adjust the self-tests to hit
this case so it doesn't regress again in the future.
====================
Link: https://lore.kernel.org/r/20230807132146.2191597-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the allowedips self-test, nodes are inserted into the tree, but it
generated an even amount of nodes, but for checking maximum node depth,
there is of course the root node, which makes the total number
necessarily odd. With two few nodes added, it never triggered the
maximum depth check like it should have. So, add 129 nodes instead of
128 nodes, and do so with a more straightforward scheme, starting with
all the bits set, and shifting over one each time. Then increase the
maximum depth to 129, and choose a better name for that variable to
make it clear that it represents depth as opposed to bits.
Cc: stable@vger.kernel.org
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230807132146.2191597-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with
following testcase:
# ip netns add ns1
# ip netns exec ns1 ip link add bond0 type bond mode 0
# ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
# ip netns exec ns1 ip link set bond_slave_1 master bond0
# ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link set bond_slave_1 nomaster
# ip netns del ns1
The logical analysis of the problem is as follows:
1. create ETH_P_8021AD protocol vlan10 for bond_slave_1:
register_vlan_dev()
vlan_vid_add()
vlan_info_alloc()
__vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1
2. create ETH_P_8021AD protocol bond0_vlan10 for bond0:
register_vlan_dev()
vlan_vid_add()
__vlan_vid_add()
vlan_add_rx_filter_info()
if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER
return 0;
if (netif_device_present(dev))
return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called
// The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid.
3. detach bond_slave_1 from bond0:
__bond_release_one()
vlan_vids_del_by_dev()
list_for_each_entry(vid_info, &vlan_info->vid_list, list)
vlan_vid_del(dev, vid_info->proto, vid_info->vid);
// bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted.
// bond_slave_1->vlan_info will be assigned NULL.
4. delete vlan10 during delete ns1:
default_device_exit_batch()
dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10
vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1
BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!!
Add S-VLAN tag related features support to bond driver. So the bond driver
will always propagate the VLAN info to its slaves.
Fixes: 8ad227ff89 ("net: vlan: add 802.1ad support")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add missing capability check for each of the vnic counters exposed by
devlink health reporter, and thus avoid unexpected behavior due to
invalid access to registers.
While at it, read only the exact number of bits for each counter whether
it was 32 bits or 64 bits.
Fixes: b0bc615df4 ("net/mlx5: Add vnic devlink health reporter to PFs/VFs")
Fixes: a33682e4e7 ("net/mlx5e: Expose catastrophic steering error counters")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Handling pci errors should fully teardown and load back auxiliary
devices, same as done through mlx5 health recovery flow.
Fixes: 72ed5d5624 ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When device is in error state, marked by the flag
MLX5_DEVICE_STATE_INTERNAL_ERROR, the HW and PCI may not be accessible
and so clock update work should be skipped. Furthermore, such access
through PCI in error state, after calling mlx5_pci_disable_device() can
result in failing to recover from pci errors.
Fixes: ef9814deaf ("net/mlx5e: Add HW timestamping (TS) support")
Reported-and-tested-by: Ganesh G R <ganeshgr@linux.ibm.com>
Closes: https://lore.kernel.org/netdev/9bdb9b9d-140a-7a28-f0de-2e64e873c068@nvidia.com
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Cited patch introduced buckets in hash mode, but missed to update
the ports/bucket check when modifying LAG.
Fix the check.
Fixes: 352899f384 ("net/mlx5: Lag, use buckets in hash mode")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The controller number for ECVFs is always 0, because the ECPF must be
the eswitch owner for EC VFs to be enabled.
Fixes: dc13180824 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When querying eswitch functions 0 is a valid number of host VFs. After
introducing ARM SRIOV falling through to getting the max value from PCI
results in using the total VFs allowed on the ARM for the host.
Fixes: 86eec50bea ("net/mlx5: Support querying max VFs from device");
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The ECVF function ID range is 1..max_ec_vfs. Currently
mlx5_vport_to_func_id returns 0..max_ec_vfs - 1. Which
results in a syndrome when querying the caps with more
recent firmware, or reading incorrect caps with older
firmware that supports EC VFs.
Fixes: 9ac0b12824 ("net/mlx5: Update vport caps query/set for EC VFs")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fixing wrong calculation of the modify hdr pattern size,
where the previously calculated number would not be enough
to accommodate the required number of actions.
Fixes: da5d0027d6 ("net/mlx5: DR, Add cache for modify header pattern")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The flow rule can be splited, and the extra post_act rules are added
to post_act table. It's possible to trigger memleak when the rule
forwards packets from internal port and over tunnel, in the case that,
for example, CT 'new' state offload is allowed. As int_port object is
assigned to the flow attribute of post_act rule, and its refcnt is
incremented by mlx5e_tc_int_port_get(), but mlx5e_tc_int_port_put() is
not called, the refcnt is never decremented, then int_port is never
freed.
The kmemleak reports the following error:
unreferenced object 0xffff888128204b80 (size 64):
comm "handler20", pid 50121, jiffies 4296973009 (age 642.932s)
hex dump (first 32 bytes):
01 00 00 00 19 00 00 00 03 f0 00 00 04 00 00 00 ................
98 77 67 41 81 88 ff ff 98 77 67 41 81 88 ff ff .wgA.....wgA....
backtrace:
[<00000000e992680d>] kmalloc_trace+0x27/0x120
[<000000009e945a98>] mlx5e_tc_int_port_get+0x3f3/0xe20 [mlx5_core]
[<0000000035a537f0>] mlx5e_tc_add_fdb_flow+0x473/0xcf0 [mlx5_core]
[<0000000070c2cec6>] __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
[<000000005cc84048>] mlx5e_configure_flower+0xd40/0x4c40 [mlx5_core]
[<000000004f8a2031>] mlx5e_rep_indr_offload.isra.0+0x10e/0x1c0 [mlx5_core]
[<000000007df797dc>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core]
[<0000000016c15cc3>] tc_setup_cb_add+0x1cf/0x410
[<00000000a63305b4>] fl_hw_replace_filter+0x38f/0x670 [cls_flower]
[<000000008bc9e77c>] fl_change+0x1fd5/0x4430 [cls_flower]
[<00000000e7f766e4>] tc_new_tfilter+0x867/0x2010
[<00000000e101c0ef>] rtnetlink_rcv_msg+0x6fc/0x9f0
[<00000000e1111d44>] netlink_rcv_skb+0x12c/0x360
[<0000000082dd6c8b>] netlink_unicast+0x438/0x710
[<00000000fc568f70>] netlink_sendmsg+0x794/0xc50
[<0000000016e92590>] sock_sendmsg+0xc5/0x190
So fix this by moving int_port cleanup code to the flow attribute
free helper, which is used by all the attribute free cases.
Fixes: 8300f22526 ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If SNDRV_PCM_IOCTL_PREPARE is called when the mixer settings linking
frontend and backend have not been setup yet this results in
e.g. the following errors getting logged:
[ 43.244549] Baytrail Audio Port: ASoC: no backend DAIs enabled for Baytrail Audio Port
[ 43.244744] Baytrail Audio Port: ASoC: error at dpcm_fe_dai_prepare on Baytrail Audio Port: -22
pipewire triggers this leading to 96 lines getting logged
after the user has logged into a GNOME session.
Change the actual "no backend DAIs enabled for ... Port" error to
dev_err_once() to avoid it getting repeated 48 times. While at it
also improve the error by hinting the user how to fix this.
To not make developing new UCM profiles harder, also log the error
at dev_dbg() level all the time (vs once). So that e.g. dyndbg can
be used to (re)enable the messages.
Also changes _soc_pcm_ret() to not log for -EINVAL errors, to fix
the other error getting logged 48 times. Userspace passing wrong
parameters should not lead to dmesg messages.
Link: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/3407
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230805171435.31696-1-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The Lenovo ThinkStation P620 suffers from an irq storm issue like various
other Lenovo machines, so add an entry for it to tpm_tis_dmi_table and
force polling.
It is worth noting that 481c2d1462 (tpm,tpm_tis: Disable interrupts after
1000 unhandled IRQs) does not seem to fix the problem on this machine, but
setting 'tpm_tis.interrupts=0' on the kernel command line does.
[jarkko@kernel.org: truncated the commit ID in the description to 12
characters]
Cc: stable@vger.kernel.org # v6.4+
Fixes: e644b2f498 ("tpm, tpm_tis: Enable interrupt test")
Signed-off-by: Jonathan McDowell <noodles@meta.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
The TPM RNG functionality is not necessary for entropy when the CPU
already supports the RDRAND instruction. The TPM RNG functionality
was previously disabled on a subset of AMD fTPM series, but reports
continue to show problems on some systems causing stutter root caused
to TPM RNG functionality.
Expand disabling TPM RNG use for all AMD fTPMs whether they have versions
that claim to have fixed or not. To accomplish this, move the detection
into part of the TPM CRB registration and add a flag indicating that
the TPM should opt-out of registration to hwrng.
Cc: stable@vger.kernel.org # 6.1.y+
Fixes: b006c439d5 ("hwrng: core - start hwrng kthread also for untrusted sources")
Fixes: f1324bbc40 ("tpm: disable hwrng for fTPM on some AMD designs")
Reported-by: daniil.stas@posteo.net
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217719
Reported-by: bitlord0xff@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217212
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
smatch reports
security/keys/sysctl.c:12:18: warning: symbol
'key_sysctls' was not declared. Should it be static?
This variable is only used in its defining file, so it should be static.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
In commit 3666062b87 ("cpufreq: amd-pstate: move to use bus_get_dev_root()")
the "amd_pstate" attributes where moved from a dedicated kobject to the
cpu root kobject.
While the dedicated kobject expects to contain kobj_attributes the root
kobject needs device_attributes.
As the changed arguments are not used by the callbacks it works most of
the time.
However CFI will detect this issue:
[ 4947.849350] CFI failure at dev_attr_show+0x24/0x60 (target: show_status+0x0/0x70; expected type: 0x8651b1de)
...
[ 4947.849409] Call Trace:
[ 4947.849410] <TASK>
[ 4947.849411] ? __warn+0xcf/0x1c0
[ 4947.849414] ? dev_attr_show+0x24/0x60
[ 4947.849415] ? report_cfi_failure+0x4e/0x60
[ 4947.849417] ? handle_cfi_failure+0x14c/0x1d0
[ 4947.849419] ? __cfi_show_status+0x10/0x10
[ 4947.849420] ? handle_bug+0x4f/0x90
[ 4947.849421] ? exc_invalid_op+0x1a/0x60
[ 4947.849422] ? asm_exc_invalid_op+0x1a/0x20
[ 4947.849424] ? __cfi_show_status+0x10/0x10
[ 4947.849425] ? dev_attr_show+0x24/0x60
[ 4947.849426] sysfs_kf_seq_show+0xa6/0x110
[ 4947.849433] seq_read_iter+0x16c/0x4b0
[ 4947.849436] vfs_read+0x272/0x2d0
[ 4947.849438] ksys_read+0x72/0xe0
[ 4947.849439] do_syscall_64+0x76/0xb0
[ 4947.849440] ? do_user_addr_fault+0x252/0x650
[ 4947.849442] ? exc_page_fault+0x7a/0x1b0
[ 4947.849443] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixes: 3666062b87 ("cpufreq: amd-pstate: move to use bus_get_dev_root()")
Reported-by: Jannik Glückert <jannik.glueckert@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217765
Link: https://lore.kernel.org/lkml/c7f1bf9b-b183-bf6e-1cbb-d43f72494083@gmail.com/
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull kvm fixes from Paolo Bonzini:
"x86:
- Fix SEV race condition
ARM:
- Fixes for the configuration of SVE/SME traps when hVHE mode is in
use
- Allow use of pKVM on systems with FF-A implementations that are
v1.0 compatible
- Request/release percpu IRQs (arch timer, vGIC maintenance)
correctly when pKVM is in use
- Fix function prototype after __kvm_host_psci_cpu_entry() rename
- Skip to the next instruction when emulating writes to TCR_EL1 on
AmpereOne systems
Selftests:
- Fix missing include"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
selftests/rseq: Fix build with undefined __weak
KVM: SEV: remove ghcb variable declarations
KVM: SEV: only access GHCB fields once
KVM: SEV: snapshot the GHCB before accessing it
KVM: arm64: Skip instruction after emulating write to TCR_EL1
KVM: arm64: fix __kvm_host_psci_cpu_entry() prototype
KVM: arm64: Fix resetting SME trap values on reset for (h)VHE
KVM: arm64: Fix resetting SVE trap values on reset for hVHE
KVM: arm64: Use the appropriate feature trap register when activating traps
KVM: arm64: Helper to write to appropriate feature trap register based on mode
KVM: arm64: Disable SME traps for (h)VHE at setup
KVM: arm64: Use the appropriate feature trap register for SVE at EL2 setup
KVM: arm64: Factor out code for checking (h)VHE mode into a macro
KVM: arm64: Rephrase percpu enable/disable tracking in terms of hyp
KVM: arm64: Fix hardware enable/disable flows for pKVM
KVM: arm64: Allow pKVM on v1.0 compatible FF-A implementations
Pull MMC fixes from Ulf Hansson:
- moxart: Fix big-endian conversion for SCR structure
- sdhci-f-sdh30: Replace with sdhci_pltfm to fix PM support
* tag 'mmc-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-f-sdh30: Replace with sdhci_pltfm
mmc: moxart: read scr register without changing byte order
Starting with patch 2cb1e08985, gfs2 started using the new function
filemap_splice_read rather than the old (and subsequently deleted)
function generic_file_splice_read.
filemap_splice_read works by taking references to a number of folios in
the page cache and splicing those folios into a pipe. The folios are
then read from the pipe and the folio references are dropped. This can
take an arbitrary amount of time. We cannot allow that in gfs2 because
those folio references will pin the inode glock to the node and prevent
it from being demoted, which can lead to cluster-wide deadlocks.
Instead, use copy_splice_read.
(In addition, the old generic_file_splice_read called into ->read_iter,
which called gfs2_file_read_iter, which took the inode glock during the
operation. The new filemap_splice_read interface does not take the
inode glock anymore. This is fixable, but it still wouldn't prevent
cluster-wide deadlocks.)
Fixes: 2cb1e08985 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_trans_add_meta() checks for the SDF_FROZEN flag to make
sure that no buffers are added to a transaction while the filesystem is
frozen. With the recent freeze/thaw rework, the SDF_FROZEN flag is
cleared after thaw_super() is called, which is sufficient for
serializing freeze/thaw.
However, other filesystem operations started after thaw_super() may now
be calling gfs2_trans_add_meta() before the SDF_FROZEN flag is cleared,
which will trigger the SDF_FROZEN check in gfs2_trans_add_meta(). Fix
that by checking the s_writers.frozen state instead.
In addition, make sure not to call gfs2_assert_withdraw() with the
sd_log_lock spin lock held. Check for a withdrawn filesystem before
checking for a frozen filesystem, and don't pin/add buffers to the
current transaction in case of a failure in either case.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Tao Liu reported a boot hang on an Intel Atom machine due to an unmapped
EFI config table. The reason being that the CC blob which contains the
CPUID page for AMD SNP guests is parsed for before even checking
whether the machine runs on AMD hardware.
Usually that's not a problem on !AMD hw - it simply won't find the CC
blob's GUID and return. However, if any parts of the config table
pointers array is not mapped, the kernel will #PF very early in the
decompressor stage without any opportunity to recover.
Therefore, do a superficial CPUID check before poking for the CC blob.
This will fix the current issue on real hardware. It would also work as
a guest on a non-lying hypervisor.
For the lying hypervisor, the check is done again, *after* parsing the
CC blob as the real CPUID page will be present then.
Clear the #VC handler in case SEV-{ES,SNP} hasn't been detected, as
a precaution.
Fixes: c01fce9cef ("x86/compressed: Add SEV-SNP feature detection/setup")
Reported-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Tao Liu <ltao@redhat.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230601072043.24439-1-ltao@redhat.com
On a laptop with hibernation set up but not actively used, and with
secure boot and lockdown enabled kernel, 6.5-rc1 gets stuck on boot with
the following repeated messages:
A start job is running for Resume from hibernation using device /dev/system/swap (24s / no limit)
lockdown_is_locked_down: 25311154 callbacks suppressed
Lockdown: systemd-hiberna: hibernation is restricted; see man kernel_lockdown.7
...
Checking the resume code leads to commit cc89c63e2f ("PM: hibernate:
move finding the resume device out of software_resume") which
inadvertently changed the return value from resume_store() to 0 when
!hibernation_available(). This apparently translates to userspace
write() returning 0 as in number of bytes written, and userspace looping
indefinitely in the attempt to write the intended value.
Fix this by returning the full number of bytes that were to be written,
as that's what was done before the commit.
Fixes: cc89c63e2f ("PM: hibernate: move finding the resume device out of software_resume")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The SBPB bit in MSR_IA32_PRED_CMD is supported only after a microcode
patch has been applied so set X86_FEATURE_SBPB only then. Otherwise,
guests would attempt to set that bit and #GP on the MSR write.
While at it, make SMT detection more robust as some guests - depending
on how and what CPUID leafs their report - lead to cpu_smt_control
getting set to CPU_SMT_NOT_SUPPORTED but SRSO_NO should be set for any
guest incarnation where one simply cannot do SMT, for whatever reason.
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
A couple of hardware registers need to be set to reflect which
interrupts have been allocated to the device. Each register is 32-bit
wide and can receive four 8-bit values. If we provide any other interrupt
number than four, the irq_num variable will never be 0 within the while
check and the while block will loop forever.
There is an easy way to prevent this: just break the for loop
when we reach "irq_num == 0", which anyway means all interrupts have
been processed.
Cc: stable@vger.kernel.org
Fixes: 17ce252266 ("dmaengine: xilinx: xdma: Add xilinx xdma driver")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20230731101442.792514-2-miquel.raynal@bootlin.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Disabling IDXD device doesn't reset Page Request Service (PRS)
disable flag to its initial value 0. This may cause user confusion
because once PRS is disabled user will see PRS still remains the
previous setting (i.e. disabled) via sysfs interface even after the
device is disabled.
To eliminate user confusion, reset PRS disable flag to ensure that
the PRS flag bit reflects correct state after the device is disabled.
Additionally, simplify the code by setting wq->flags to 0, which clears
all flag bits, including any future additions.
Fixes: f2dc327131 ("dmaengine: idxd: add per wq PRS disable")
Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230712193505.3440752-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
When 'mcf_edma' is allocated, some space is allocated for a
flexible array at the end of the struct. 'chans' item are allocated, that is
to say 'pdata->dma_channels'.
Then, this number of item is stored in 'mcf_edma->n_chans'.
A few lines later, if 'mcf_edma->n_chans' is 0, then a default value of 64
is set.
This ends to no space allocated by devm_kzalloc() because chans was 0, but
64 items are read and/or written in some not allocated memory.
Change the logic to define a default value before allocating the memory.
Fixes: e7a3ff92ea ("dmaengine: fsl-edma: add ColdFire mcf5441x edma support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/f55d914407c900828f6fad3ea5fa791a5f17b9a4.1685172449.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Pull vfs fixes from Christian Brauner:
- Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup
- Clean up directory iterators and clarify file_needs_f_pos_lock()
* tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: rely on ->iterate_shared to determine f_pos locking
vfs: get rid of old '->iterate' directory operation
proc: fix missing conversion to 'iterate_shared'
open: make RESOLVE_CACHED correctly test for O_TMPFILE
ionic_start_queues_reconfig returns an error code if txrx_init fails.
Handle this error code in the relevant places.
This fixes a corner case where the device could get left in a detached
state if the CMB reconfig fails and the attempt to clean up the mess
also fails. Note that calling netif_device_attach when the netdev is
already attached does not lead to unexpected behavior.
Change goto name "errout" to "err_out" to maintain consistency across
goto statements.
Fixes: 40bc471dc7 ("ionic: add tx/rx-push support with device Component Memory Buffers")
Fixes: 6f7d6f0fd7 ("ionic: pull reset_queues into tx_timeout handler")
Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case rhashtable_lookup_insert_fast() fails inside vxlan_vni_add(), the
allocated percpu vni stats are not freed on the error path.
Introduce vxlan_vni_free() which would work as a nice wrapper to free
vxlan_vni_node resources properly.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4095e0e132 ("drivers: vxlan: vnifilter: per vni stats")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we removed ->iterate we don't need to check for either
->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
for ->iterate_shared instead. This will tell us whether we need to
unconditionally take the lock. Not just does it allow us to avoid
checking f_inode's mode it also actually clearly shows that we're
locking because of readdir.
Signed-off-by: Christian Brauner <brauner@kernel.org>
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.
Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.
This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.
The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.
Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
I'm looking at the directory handling due to the discussion about f_pos
locking (see commit 797964253d: "file: reinstate f_pos locking
optimization for regular files"), and wanting to clean that up.
And one source of ugliness is how we were supposed to move filesystems
over to the '->iterate_shared()' function that only takes the inode lock
for reading many many years ago, but several filesystems still use the
bad old '->iterate()' that takes the inode lock for exclusive access.
See commit 6192269444 ("introduce a parallel variant of ->iterate()")
that also added some documentation stating
Old method is only used if the new one is absent; eventually it will
be removed. Switch while you still can; the old one won't stay.
and that was back in April 2016. Here we are, many years later, and the
old version is still clearly sadly alive and well.
Now, some of those old style iterators are probably just because the
filesystem may end up having per-inode mutable data that it uses for
iterating a directory, but at least one case is just a mistake.
Al switched over most filesystems to use '->iterate_shared()' back when
it was introduced. In particular, the /proc filesystem was converted as
one of the first ones in commit f50752eaa0 ("switch all procfs
directories ->iterate_shared()").
But then later one new user of '->iterate()' was then re-introduced by
commit 6d9c939dbe ("procfs: add smack subdir to attrs").
And that's clearly not what we wanted, since that new case just uses the
same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
that other /proc pident directories use, and they are most definitely
safe to use with the inode lock held shared.
So just fix it.
This still leaves a fair number of oddball filesystems using the
old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
overlayfs, and vboxsf), but at least we don't have any remaining in the
core filesystems.
I'm going to add a wrapper function that just drops the read-lock and
takes it as a write lock, so that we can clean up the core vfs layer and
make all the ugly 'this filesystem needs exclusive inode locking' be
just filesystem-internal warts.
I just didn't want to make that conversion when we still had a core user
left.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
fast-path check for RESOLVE_CACHED would reject all users passing
O_DIRECTORY with -EAGAIN, when in fact the intended test was to check
for __O_TMPFILE.
Cc: stable@vger.kernel.org # v5.12+
Fixes: 99668f6180 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
syzbot/KCSAN reported data-races in macsec whenever dev->stats fields
are updated.
It appears all of these updates can happen from multiple cpus.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS records end with a 16B tag. For TLS device offload we only
need to make space for this tag in the stream, the device will
generate and replace it with the actual calculated tag.
Long time ago the code would just re-reference the head frag
which mostly worked but was suboptimal because it prevented TCP
from combining the record into a single skb frag. I'm not sure
if it was correct as the first frag may be shorter than the tag.
The commit under fixes tried to replace that with using the page
frag and if the allocation failed rolling back the data, if record
was long enough. It achieves better fragment coalescing but is
also buggy.
We don't roll back the iterator, so unless we're at the end of
send we'll skip the data we designated as tag and start the
next record as if the rollback never happened.
There's also the possibility that the record was constructed
with MSG_MORE and the data came from a different syscall and
we already told the user space that we "got it".
Allocate a single dummy page and use it as fallback.
Found by code inspection, and proven by forcing allocation
failures.
Fixes: e7b159a48b ("net/tls: remove the record tail optimization")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although the memory map of i.MX93 reference manual rev. 2 claims that
analog top has start address of 0x44480000 and end address of 0x4448ffff,
this overlaps with TMU memory area starting at 0x44482000, as stated in
section 73.6.1.
As PLL configuration registers start at addresses up to 0x44481400, as used
by clk-imx93, reduce the anatop size to 0x2000, so exclude the TMU area
but keep all PLL registers inside.
Fixes: ec8b5b5058 ("arm64: dts: freescale: Add i.MX93 dtsi support")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
There are multiple smb2_ea_info buffers in FILE_FULL_EA_INFORMATION request
from client. ksmbd find next smb2_ea_info using ->NextEntryOffset of
current smb2_ea_info. ksmbd need to validate buffer length Before
accessing the next ea. ksmbd should check buffer length using buf_len,
not next variable. next is the start offset of current ea that got from
previous ea.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21598
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
In commit 2b9b8f3b68 ("ksmbd: validate command payload size"), except
for SMB2_OPLOCK_BREAK_HE command, the request size of other commands
is not checked, it's not expected. Fix it by add check for request
size of other commands.
Cc: stable@vger.kernel.org
Fixes: 2b9b8f3b68 ("ksmbd: validate command payload size")
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull ata fix from Damien Le Moal:
- Prevent the scsi disk driver from issuing a START STOP UNIT command
for ATA devices during system resume as this causes various issues
reported by multiple users.
* tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata,scsi: do not issue START STOP UNIT on resume
Commit af8b04c637 ("zram: simplify bvec iteration in
__zram_make_request") changed the bio iteration in zram to rely on the
implicit capping to page boundaries in bio_for_each_segment. But it
failed to care for the fact zram not only care about the page alignment
of the bio payload, but also the page alignment into the device. For
buffered I/O and swap those are the same, but for direct I/O or kernel
internal I/O like XFS log buffer writes they can differ.
Fix this by open coding bio_for_each_segment and limiting the bvec len
so that it never crosses over a page alignment boundary in the device
in addition to the payload boundary already taken care of by
bio_iter_iovec.
Cc: stable@vger.kernel.org
Fixes: af8b04c637 ("zram: simplify bvec iteration in __zram_make_request")
Reported-by: Dusty Mabe <dusty@dustymabe.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Link: https://lore.kernel.org/r/20230805055537.147835-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull smb client fix from Steve French:
- Fix DFS interlink problem (different namespace)
* tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix dfs link mount against w2k8
Pull powerpc fixes from Michael Ellerman:
- Fix vmemmap altmap boundary check which could cause memory hotunplug
failure
- Create a dummy stackframe to fix ftrace stack unwind
- Fix secondary thread bringup for Book3E ELFv2 kernels
- Use early_ioremap/unmap() in via_calibrate_decr()
Thanks to Aneesh Kumar K.V, Benjamin Gray, Christophe Leroy, David
Hildenbrand, and Naveen N Rao.
* tag 'powerpc-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powermac: Use early_* IO variants in via_calibrate_decr()
powerpc/64e: Fix secondary thread bringup for ELFv2 kernels
powerpc/ftrace: Create a dummy stackframe to fix stack unwind
powerpc/mm/altmap: Fix altmap boundary check
Pull parisc architecture fixes from Helge Deller:
- early fixmap preallocation to fix boot failures on kernel >= 6.4
- remove DMA leftover code in parport_gsc
- drop old comments and code style fixes
* tag 'parisc-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: unaligned: Add required spaces after ','
parport: gsc: remove DMA leftover code
parisc: pci-dma: remove unused and dead EISA code and comment
parisc/mm: preallocate fixmap page tables at init
Georgi writes:
interconnect fixes for v6.5-rc
This contains a fix for a potential issue on some Qualcomm SoCs where
bit-masks should have been used to configure the Bus Clock Manager
hardware, instead of bandwidth units.
- interconnect: qcom: Add support for mask-based BCMs
- interconnect: qcom: sm8450: add enable_mask for bcm nodes
- interconnect: qcom: sm8550: add enable_mask for bcm nodes
- interconnect: qcom: sa8775p: add enable_mask for bcm nodes
Signed-off-by: Georgi Djakov <djakov@kernel.org>
* tag 'icc-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
interconnect: qcom: sa8775p: add enable_mask for bcm nodes
interconnect: qcom: sm8550: add enable_mask for bcm nodes
interconnect: qcom: sm8450: add enable_mask for bcm nodes
interconnect: qcom: Add support for mask-based BCMs
When a client roamed back to a node before it got time to destroy the
pending local entry (i.e. within the same originator interval) the old
global one is directly removed from hash table and left as such.
But because this entry had an extra reference taken at lookup (i.e using
batadv_tt_global_hash_find) there is no way its memory will be reclaimed
at any time causing the following memory leak:
unreferenced object 0xffff0000073c8000 (size 18560):
comm "softirq", pid 0, jiffies 4294907738 (age 228.644s)
hex dump (first 32 bytes):
06 31 ac 12 c7 7a 05 00 01 00 00 00 00 00 00 00 .1...z..........
2c ad be 08 00 80 ff ff 6c b6 be 08 00 80 ff ff ,.......l.......
backtrace:
[<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300
[<000000000ff2fdbc>] batadv_tt_global_add+0x700/0xe20
[<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790
[<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110
[<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10
[<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0
[<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4
[<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0
[<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90
[<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74
[<000000000f39a009>] __netif_receive_skb+0x48/0xe0
[<00000000f2cd8888>] process_backlog+0x174/0x344
[<00000000507d6564>] __napi_poll+0x58/0x1f4
[<00000000b64ef9eb>] net_rx_action+0x504/0x590
[<00000000056fa5e4>] _stext+0x1b8/0x418
[<00000000878879d6>] run_ksoftirqd+0x74/0xa4
unreferenced object 0xffff00000bae1a80 (size 56):
comm "softirq", pid 0, jiffies 4294910888 (age 216.092s)
hex dump (first 32 bytes):
00 78 b1 0b 00 00 ff ff 0d 50 00 00 00 00 00 00 .x.......P......
00 00 00 00 00 00 00 00 50 c8 3c 07 00 00 ff ff ........P.<.....
backtrace:
[<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300
[<00000000d9aaa49e>] batadv_tt_global_add+0x53c/0xe20
[<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790
[<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110
[<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10
[<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0
[<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4
[<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0
[<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90
[<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74
[<000000000f39a009>] __netif_receive_skb+0x48/0xe0
[<00000000f2cd8888>] process_backlog+0x174/0x344
[<00000000507d6564>] __napi_poll+0x58/0x1f4
[<00000000b64ef9eb>] net_rx_action+0x504/0x590
[<00000000056fa5e4>] _stext+0x1b8/0x418
[<00000000878879d6>] run_ksoftirqd+0x74/0xa4
Releasing the extra reference from batadv_tt_global_hash_find even at
roam back when batadv_tt_global_free is called fixes this memory leak.
Cc: stable@vger.kernel.org
Fixes: 068ee6e204 ("batman-adv: roaming handling mechanism redesign")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Signed-off-by; Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Pull clk fixes from Stephen Boyd:
"A few clk driver fixes for some SoC clk drivers:
- Change a usleep() to udelay() to avoid scheduling while atomic in
the Amlogic PLL code
- Revert a patch to the Mediatek MT8183 driver that caused an
out-of-bounds write
- Return the right error value when devm_of_iomap() fails in
imx93_clocks_probe()
- Constrain the Kconfig for the fixed mmio clk so that it depends on
HAS_IOMEM and can't be compiled on architectures such as s390"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: fixed-mmio: make COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM
clk: imx93: Propagate correct error in imx93_clocks_probe()
clk: mediatek: mt8183: Add back SSPM related clocks
clk: meson: change usleep_range() to udelay() for atomic context
Matthieu Baerts says:
====================
mptcp: more fixes for v6.5
Here is a new batch of fixes related to MPTCP for v6.5 and older.
Patches 1 and 2 fix issues with MPTCP Join selftest when manually
launched with '-i' parameter to use 'ip mptcp' tool instead of the
dedicated one (pm_nl_ctl). The issues have been there since v5.18.
Thank you Andrea for your first contributions to MPTCP code in the
upstream kernel!
Patch 3 avoids corrupting the data stream when trying to reset
connections that have fallen back to TCP. This can happen from v6.1.
Patch 4 fixes a race when doing a disconnect() and an accept() in
parallel on a listener socket. The issue only happens in rare cases if
the user is really unlucky since a fix that landed in v6.3 but
backported up to v6.1.
====================
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-0-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Despite commit 0ad529d9fd ("mptcp: fix possible divide by zero in
recvmsg()"), the mptcp protocol is still prone to a race between
disconnect() (or shutdown) and accept.
The root cause is that the mentioned commit checks the msk-level
flag, but mptcp_stream_accept() does acquire the msk-level lock,
as it can rely directly on the first subflow lock.
As reported by Christoph than can lead to a race where an msk
socket is accepted after that mptcp_subflow_queue_clean() releases
the listener socket lock and just before it takes destructive
actions leading to the following splat:
BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 5a4ca067 P4D 5a4ca067 PUD 37d4c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 10955 Comm: syz-executor.5 Not tainted 6.5.0-rc1-gdc7b257ee5dd #37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:mptcp_stream_accept+0x1ee/0x2f0 include/net/inet_sock.h:330
Code: 0a 09 00 48 8b 1b 4c 39 e3 74 07 e8 bc 7c 7f fe eb a1 e8 b5 7c 7f fe 4c 8b 6c 24 08 eb 05 e8 a9 7c 7f fe 49 8b 85 d8 09 00 00 <0f> b6 40 12 88 44 24 07 0f b6 6c 24 07 bf 07 00 00 00 89 ee e8 89
RSP: 0018:ffffc90000d07dc0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888037e8d020 RCX: ffff88803b093300
RDX: 0000000000000000 RSI: ffffffff833822c5 RDI: ffffffff8333896a
RBP: 0000607f82031520 R08: ffff88803b093300 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000003e83 R12: ffff888037e8d020
R13: ffff888037e8c680 R14: ffff888009af7900 R15: ffff888009af6880
FS: 00007fc26d708640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000012 CR3: 0000000066bc5001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
do_accept+0x1ae/0x260 net/socket.c:1872
__sys_accept4+0x9b/0x110 net/socket.c:1913
__do_sys_accept4 net/socket.c:1954 [inline]
__se_sys_accept4 net/socket.c:1951 [inline]
__x64_sys_accept4+0x20/0x30 net/socket.c:1951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Address the issue by temporary removing the pending request socket
from the accept queue, so that racing accept() can't touch them.
After depleting the msk - the ssk still exists, as plain TCP sockets,
re-insert them into the accept queue, so that later inet_csk_listen_stop()
will complete the tcp socket disposal.
Fixes: 2a6a870e44 ("mptcp: stops worker on unaccepted sockets at listener close")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/423
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-4-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mptcp_join 'implicit EP' test currently fails when using ip mptcp:
$ ./mptcp_join.sh -iI
<snip>
001 implicit EP creation[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 '
Error: too many addresses or duplicate one: -22.
ID change is prevented[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 '
modif is allowed[fail] expected '10.0.2.2 10.0.2.2 id 1 signal' found '10.0.2.2 id 1 signal '
This happens because of two reasons:
- iproute v6.3.0 does not support the implicit flag, fixed with
iproute2-next commit 3a2535a41854 ("mptcp: add support for implicit
flag")
- pm_nl_check_endpoint wrongly expects the ip address to be repeated two
times in iproute output, and does not account for a final whitespace
in it.
This fixes the issue trimming the whitespace in the output string and
removing the double address in the expected string.
Fixes: 69c6ce7b6e ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-2-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mptcp_join 'delete and re-add' test fails when using ip mptcp:
$ ./mptcp_join.sh -iI
<snip>
002 delete and re-add before delete[ ok ]
mptcp_info subflows=1 [ ok ]
Error: argument "ADDRESS" is wrong: invalid for non-zero id address
after delete[fail] got 2:2 subflows expected 1
This happens because endpoint delete includes an ip address while id is
not 0, contrary to what is indicated in the ip mptcp man page:
"When used with the delete id operation, an IFADDR is only included when
the ID is 0."
This fixes the issue using the $addr variable in pm_nl_del_endpoint()
only when id is 0.
Fixes: 34aa6e3bcc ("selftests: mptcp: add ip mptcp wrappers")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-1-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal says:
====================
tunnels: fix ipv4 pmtu icmp checksum
The checksum of the generated ipv4 icmp pmtud message is
only correct if the skb that causes the icmp error generation
is linear.
Fix this and add a selftest for this.
====================
Link: https://lore.kernel.org/r/20230803152653.29535-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TCP might get stuck if a nonlinear skb exceeds the path MTU,
icmp error contains an incorrect icmp checksum in that case.
Extend the existing test for vxlan to also send at least 1MB worth of
data via TCP in addition to the existing 'large icmp packet adds
route exception'.
On my test VM this fails due to 0-size output file without
"tunnels: fix kasan splat when generating ipv4 pmtu error".
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230803152653.29535-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we try to emit an icmp error in response to a nonliner skb, we get
BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220
Read of size 4 at addr ffff88811c50db00 by task iperf3/1691
CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309
[..]
kasan_report+0x105/0x140
ip_compute_csum+0x134/0x220
iptunnel_pmtud_build_icmp+0x554/0x1020
skb_tunnel_check_pmtu+0x513/0xb80
vxlan_xmit_one+0x139e/0x2ef0
vxlan_xmit+0x1867/0x2760
dev_hard_start_xmit+0x1ee/0x4f0
br_dev_queue_push_xmit+0x4d1/0x660
[..]
ip_compute_csum() cannot deal with nonlinear skbs, so avoid it.
After this change, splat is gone and iperf3 is no longer stuck.
Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Audio REFCLK's are not working correctly, trying to use them lead to the
following errors:
[ 6.575277] of_clk_hw_onecell_get: invalid index 4294934528
[ 6.581515] wm8904 1-001a: Failed to get MCLK
[ 6.586290] wm8904: probe of 1-001a failed with error -2
The issue is that Audio REFCLK has #clock-cells = 0 [1], while the driver
is registering those clocks assuming they have one cells. Fix this by
registering the clock with of_clk_hw_simple_get() when there is only one
instance, e.g. "audio_refclk".
[1] Documentation/devicetree/bindings/clock/ti,am62-audio-refclk.yaml
Fixes: 6acab96ee3 ("clk: keystone: syscon-clk: Add support for audio refclk")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://lore.kernel.org/r/20230728222639.110409-1-francesco@dolcini.it
[sboyd@kernel.org: Simplify if-return-else logic]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Pull hyperv fixes from Wei Liu:
- Fix a bug in a python script for Hyper-V (Ani Sinha)
- Workaround a bug in Hyper-V when IBT is enabled (Michael Kelley)
- Fix an issue parsing MP table when Linux runs in VTL2 (Saurabh
Sengar)
- Several cleanup patches (Nischala Yelchuri, Kameron Carr, YueHaibing,
ZhiHu)
* tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer()
x86/hyperv: add noop functions to x86_init mpparse functions
vmbus_testing: fix wrong python syntax for integer value comparison
x86/hyperv: fix a warning in mshyperv.h
x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction
x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg
Drivers: hv: Change hv_free_hyperv_page() to take void * argument
Pull RISC-V fixes from Palmer Dabbelt:
- A pair of fixes for build-related failures in the selftests
- A fix for a sparse warning in acpi_os_ioremap()
- A fix to restore the kernel PA offset in vmcoreinfo, to fix crash
handling
* tag 'riscv-for-linus-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
Documentation: kdump: Add va_kernel_pa_offset for RISCV64
riscv: Export va_kernel_pa_offset in vmcoreinfo
RISC-V: ACPI: Fix acpi_os_ioremap to return iomem address
selftests: riscv: Fix compilation error with vstate_exec_nolibc.c
selftests/riscv: fix potential build failure during the "emit_tests" step
Pull power management fix from Rafael Wysocki:
"Fix a sparse warning triggered by the TPMI interface recently added to
the Intel RAPL power capping driver (Zhang Rui)"
* tag 'pm-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap: intel_rapl: Fix a sparse warning in TPMI interface
When the tagging protocol in current use is "ocelot-8021q" and we unbind
the driver, we see this splat:
$ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
mscc_felix 0000:00:00.5 swp0: left promiscuous mode
sja1105 spi2.0: Link is Down
DSA: tree 1 torn down
mscc_felix 0000:00:00.5 swp2: left promiscuous mode
sja1105 spi2.2: Link is Down
DSA: tree 3 torn down
fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
mscc_felix 0000:00:00.5: Link is Down
------------[ cut here ]------------
RTNL: assertion failed at net/dsa/tag_8021q.c (409)
WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0
Modules linked in:
CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771
pc : dsa_tag_8021q_unregister+0x12c/0x1a0
lr : dsa_tag_8021q_unregister+0x12c/0x1a0
Call trace:
dsa_tag_8021q_unregister+0x12c/0x1a0
felix_tag_8021q_teardown+0x130/0x150
felix_teardown+0x3c/0xd8
dsa_tree_teardown_switches+0xbc/0xe0
dsa_unregister_switch+0x168/0x260
felix_pci_remove+0x30/0x60
pci_device_remove+0x4c/0x100
device_release_driver_internal+0x188/0x288
device_links_unbind_consumers+0xfc/0x138
device_release_driver_internal+0xe0/0x288
device_driver_detach+0x24/0x38
unbind_store+0xd8/0x108
drv_attr_store+0x30/0x50
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
RTNL: assertion failed at net/8021q/vlan_core.c (376)
WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0
CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771
pc : vlan_vid_del+0x1b8/0x1f0
lr : vlan_vid_del+0x1b8/0x1f0
dsa_tag_8021q_unregister+0x8c/0x1a0
felix_tag_8021q_teardown+0x130/0x150
felix_teardown+0x3c/0xd8
dsa_tree_teardown_switches+0xbc/0xe0
dsa_unregister_switch+0x168/0x260
felix_pci_remove+0x30/0x60
pci_device_remove+0x4c/0x100
device_release_driver_internal+0x188/0x288
device_links_unbind_consumers+0xfc/0x138
device_release_driver_internal+0xe0/0x288
device_driver_detach+0x24/0x38
unbind_store+0xd8/0x108
drv_attr_store+0x30/0x50
DSA: tree 0 torn down
This was somewhat not so easy to spot, because "ocelot-8021q" is not the
default tagging protocol, and thus, not everyone who tests the unbinding
path may have switched to it beforehand. The default
felix_tag_npi_teardown() does not require rtnl_lock() to be held.
Fixes: 7c83a7c539 ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230803134253.2711124-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Coccicheck reports the error below:
net/mptcp/protocol.c:3330:15-28: ERROR: test of a variable/field address
Since the address of msk->cb_flags is used in __test_and_clear_bit, the
address should not be NULL. The judgment for if (unlikely(msk->cb_flags))
will always be true, we should check the real value of msk->cb_flags here.
Fixes: 65a569b03c ("mptcp: optimize release_cb for the common case")
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803072438.1847500-1-xiangyang3@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 3bcbc20942 ("selftests/rseq: Play nice with binaries statically
linked against glibc 2.35+") which is now in Linus' tree introduced uses
of __weak but did nothing to ensure that a definition is provided for it
resulting in build failures for the rseq tests:
rseq.c:41:1: error: unknown type name '__weak'
__weak ptrdiff_t __rseq_offset;
^
rseq.c:41:17: error: expected ';' after top level declarator
__weak ptrdiff_t __rseq_offset;
^
;
rseq.c:42:1: error: unknown type name '__weak'
__weak unsigned int __rseq_size;
^
rseq.c:43:1: error: unknown type name '__weak'
__weak unsigned int __rseq_flags;
Fix this by using the definition from tools/include compiler.h.
Fixes: 3bcbc20942 ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
Signed-off-by: Mark Brown <broonie@kernel.org>
Message-Id: <20230804-kselftest-rseq-build-v1-1-015830b66aa9@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
damos_new_filter() is not initializing the list field of newly allocated
filter object. However, DAMON sysfs interface and DAMON_RECLAIM are not
initializing it after calling damos_new_filter(). As a result, accessing
uninitialized memory is possible. Actually, adding multiple DAMOS filters
via DAMON sysfs interface caused NULL pointer dereferencing. Initialize
the field just after the allocation from damos_new_filter().
Link: https://lkml.kernel.org/r/20230729203733.38949-2-sj@kernel.org
Fixes: 98def236f6 ("mm/damon/core: implement damos filter")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During unmount process of nilfs2, nothing holds nilfs_root structure after
nilfs2 detaches its writer in nilfs_detach_log_writer(). Previously,
nilfs_evict_inode() could cause use-after-free read for nilfs_root if
inodes are left in "garbage_list" and released by nilfs_dispose_list at
the end of nilfs_detach_log_writer(), and this bug was fixed by commit
9b5a04ac3a ("nilfs2: fix use-after-free bug of nilfs_root in
nilfs_evict_inode()").
However, it turned out that there is another possibility of UAF in the
call path where mark_inode_dirty_sync() is called from iput():
nilfs_detach_log_writer()
nilfs_dispose_list()
iput()
mark_inode_dirty_sync()
__mark_inode_dirty()
nilfs_dirty_inode()
__nilfs_mark_inode_dirty()
nilfs_load_inode_block() --> causes UAF of nilfs_root struct
This can happen after commit 0ae45f63d4 ("vfs: add support for a
lazytime mount option"), which changed iput() to call
mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME
flag and i_nlink is non-zero.
This issue appears after commit 28a65b49eb ("nilfs2: do not write dirty
data after degenerating to read-only") when using the syzbot reproducer,
but the issue has potentially existed before.
Fix this issue by adding a "purging flag" to the nilfs structure, setting
that flag while disposing the "garbage_list" and checking it in
__nilfs_mark_inode_dirty().
Unlike commit 9b5a04ac3a ("nilfs2: fix use-after-free bug of nilfs_root
in nilfs_evict_inode()"), this patch does not rely on ns_writer to
determine whether to skip operations, so as not to break recovery on
mount. The nilfs_salvage_orphan_logs routine dirties the buffer of
salvaged data before attaching the log writer, so changing
__nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL
will cause recovery write to fail. The purpose of using the cleanup-only
flag is to allow for narrowing of such conditions.
Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com
Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org> # 4.0+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This test fails routinely in our prod testing environment, and I can
reproduce it locally as well.
The test allocates dcache inside a cgroup, then drops the memory limit
and checks that usage drops correspondingly. The reason it fails is
because dentries are freed with an RCU delay - a debugging sleep shows
that usage drops as expected shortly after.
Insert a 1s sleep after dropping the limit. This should be good
enough, assuming that machines running those tests are otherwise not
very busy.
Link: https://lkml.kernel.org/r/20230801135632.1768830-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During stress testing, the following situation was observed:
70 root 39 19 0 0 0 R 100.0 0.0 959:29.92 khugepaged
310936 root 20 0 84416 25620 512 R 99.7 1.5 642:37.22 hugealloc
Tracing shows isolate_migratepages_block() endlessly looping over the
first block in the DMA zone:
hugealloc-310936 [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA order=9 ret=no_suitable_page
hugealloc-310936 [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
hugealloc-310936 [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA order=9 ret=no_suitable_page
hugealloc-310936 [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
hugealloc-310936 [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA order=9 ret=no_suitable_page
hugealloc-310936 [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
hugealloc-310936 [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA order=9 ret=no_suitable_page
hugealloc-310936 [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
The problem is that the functions tries to test and set the skip bit once
on the block, to avoid skipping on its own skip-set, using
pageblock_aligned() on the pfn as a test. But because this is the DMA
zone which starts at pfn 1, this is never true for the first block, and
the skip bit isn't set or tested at all. As a result,
fast_find_migrateblock() returns the same pageblock over and over.
If the pfn isn't pageblock-aligned, also check if it's the start of the
zone to ensure test-and-set-exactly-once on unaligned ranges.
Thanks to Vlastimil Babka for the help in debugging this.
Link: https://lkml.kernel.org/r/20230731172450.1632195-1-hannes@cmpxchg.org
Fixes: 90ed667c03 ("Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock""")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A missing break in kms_tests leads to kselftest hang when the parameter -s
is used.
In current code flow because of missing break in -s, -t parses args
spilled from -s and as -t accepts only valid values as 0,1 so any arg in
-s >1 or <0, gets in ksm_test failure
This went undetected since, before the addition of option -t, the next
case -M would immediately break out of the switch statement but that is no
longer the case
Add the missing break statement.
----Before----
./ksm_tests -H -s 100
Invalid merge type
----After----
./ksm_tests -H -s 100
Number of normal pages: 0
Number of huge pages: 50
Total size: 100 MiB
Total time: 0.401732682 s
Average speed: 248.922 MiB/s
Link: https://lkml.kernel.org/r/20230728163952.4634-1-ayush.jain3@amd.com
Fixes: 07115fcc15 ("selftests/mm: add new selftests for KSM")
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fix hugetlb free path race with memory errors".
In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on
HWPOISON hugepages" the race window was discovered.
https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/
Freeing a hugetlb page back to low level memory allocators is performed
in two steps.
1) Under hugetlb lock, remove page from hugetlb lists and clear destructor
2) Outside lock, allocate vmemmap if necessary and call low level free
Between these two steps, the hugetlb page will appear as a normal
compound page. However, vmemmap for tail pages could be missing.
If a memory error occurs at this time, we could try to update page
flags non-existant page structs.
A much more detailed description is in the first patch.
The first patch addresses the race window. However, it adds a
hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page
free operation. This could lead to slowdowns if one is freeing a large
number of hugetlb pages.
The second path optimizes the update_and_free_pages_bulk routine to only
take the lock once in bulk operations.
The second patch is technically not a bug fix, but includes a Fixes tag
and Cc stable to avoid a performance regression. It can be combined with
the first, but was done separately make reviewing easier.
This patch (of 2):
Freeing a hugetlb page and releasing base pages back to the underlying
allocator such as buddy or cma is performed in two steps:
- remove_hugetlb_folio() is called to remove the folio from hugetlb
lists, get a ref on the page and remove hugetlb destructor. This
all must be done under the hugetlb lock. After this call, the page
can be treated as a normal compound page or a collection of base
size pages.
- update_and_free_hugetlb_folio() is called to allocate vmemmap if
needed and the free routine of the underlying allocator is called
on the resulting page. We can not hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between
these two steps. In this case, the memory error handling code treats
the old hugetlb page as a normal compound page or collection of base
pages. It will then try to SetPageHWPoison(page) on the page with an
error. If the page with error is a tail page without vmemmap, a write
error will occur when trying to set the flag.
Address this issue by modifying remove_hugetlb_folio() and
update_and_free_hugetlb_folio() such that the hugetlb destructor is not
cleared until after allocating vmemmap. Since clearing the destructor
requires holding the hugetlb lock, the clearing is done in
remove_hugetlb_folio() if the vmemmap is present. This saves a
lock/unlock cycle. Otherwise, destructor is cleared in
update_and_free_hugetlb_folio() after allocating vmemmap.
Note that this will leave hugetlb pages in a state where they are marked
free (by hugetlb specific page flag) and have a ref count. This is not
a normal state. The only code that would notice is the memory error
code, and it is set up to retry in such a case.
A subsequent patch will create a routine to do bulk processing of
vmemmap allocation. This will eliminate a lock/unlock cycle for each
hugetlb page in the case where we are freeing a large number of pages.
Link: https://lkml.kernel.org/r/20230711220942.43706-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com
Fixes: ad2fa3717b ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If unpoison_memory() fails to clear page hwpoisoned flag, return value ret
is expected to be -EBUSY. But when get_hwpoison_page() returns 1 and
fails to clear page hwpoisoned flag due to races, return value will be
unexpected 1 leading to users being confused. And there's a code smell
that the variable "ret" is used not only to save the return value of
unpoison_memory(), but also the return value from get_hwpoison_page().
Make a further cleanup by using another auto-variable solely to save the
return value of get_hwpoison_page() as suggested by Naoya.
Link: https://lkml.kernel.org/r/20230727115643.639741-3-linmiaohe@huawei.com
Fixes: bf181c5825 ("mm/hwpoison: fix unpoison_memory()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "A few fixup patches for mm", v2.
This series contains a few fixup patches to fix potential unexpected
return value, fix wrong swap entry type for hwpoisoned swapcache page and
so on. More details can be found in the respective changelogs.
This patch (of 3):
Hwpoisoned dirty swap cache page is kept in the swap cache and there's
simple interception code in do_swap_page() to catch it. But when trying
to swapoff, unuse_pte() will wrongly install a general sense of "future
accesses are invalid" swap entry for hwpoisoned swap cache page due to
unaware of such type of page. The user will receive SIGBUS signal without
expected BUS_MCEERR_AR payload. BTW, typo 'hwposioned' is fixed.
Link: https://lkml.kernel.org/r/20230727115643.639741-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20230727115643.639741-2-linmiaohe@huawei.com
Fixes: 6b970599e8 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently the pthread allocation for each array item is based on the size
of a pthread_t pointer and should be the size of the pthread_t structure,
so the allocation is under-allocating the correct size. Fix this by using
the size of each element in the pthreads array.
Static analysis cppcheck reported:
tools/testing/radix-tree/regression1.c:180:2: warning: Size of pointer
'threads' used instead of size of its data. [pointerSize]
Link: https://lkml.kernel.org/r/20230727160930.632674-1-colin.i.king@gmail.com
Fixes: 1366c37ed8 ("radix tree test harness")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull arm64 fixes from Catalin Marinas:
"More SVE/SME fixes for ptrace() and for the (potentially future) case
where SME is implemented in hardware without SVE support"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/fpsimd: Sync and zero pad FPSIMD state for streaming SVE
arm64/fpsimd: Sync FPSIMD state with SVE for SME only systems
arm64/ptrace: Don't enable SVE when setting streaming SVE
arm64/ptrace: Flush FP state when setting ZT0
arm64/fpsimd: Clear SME state in the target task when setting the VL
Pull mtd fixes from Miquel Raynal:
"Raw NAND fixes:
- fsl_upm: Fix an off-by one test in fun_exec_op()
- Rockchip:
- Align hwecc vs. raw page helper layouts
- Fix oobfree offset and description
- Meson: Fix OOB available bytes for ECC
- Omap ELM: Fix incorrect type in assignment
SPI-NOR fix:
- Avoid holes in struct spi_mem_op
Hyperbus fix:
- Add Tudor as reviewer in MAINTAINERS
SPI-NAND fixes:
- Winbond and Toshiba: Fix ecc_get_status"
* tag 'mtd/fixes-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: fsl_upm: Fix an off-by one test in fun_exec_op()
mtd: spi-nor: avoid holes in struct spi_mem_op
MAINTAINERS: Add myself as reviewer for HYPERBUS
mtd: rawnand: rockchip: Align hwecc vs. raw page helper layouts
mtd: rawnand: rockchip: fix oobfree offset and description
mtd: rawnand: meson: fix OOB available bytes for ECC
mtd: rawnand: omap_elm: Fix incorrect type in assignment
mtd: spinand: winbond: Fix ecc_get_status
mtd: spinand: toshiba: Fix ecc_get_status
Pull drm fixes from Dave Airlie:
"Small set of fixes this week, i915 and a few misc ones. I didn't see
an amd pull so maybe next week it'll have a few more on that driver.
ttm:
- NULL ptr deref fix
panel:
- add missing MODULE_DEVICE_TABLE
imx/ipuv3:
- timing fix
i915:
- Fix bug in getting msg length in AUX CH registers handler
- Gen12 AUX invalidation fixes
- Fix premature release of request's reusable memory"
* tag 'drm-fixes-2023-08-04' of git://anongit.freedesktop.org/drm/drm:
drm/panel: samsung-s6d7aa0: Add MODULE_DEVICE_TABLE
drm/i915: Fix premature release of request's reusable memory
drm/i915/gt: Support aux invalidation on all engines
drm/i915/gt: Poll aux invalidation register bit on invalidation
drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control and in the CS
drm/i915/gt: Rename flags with bit_group_X according to the datasheet
drm/i915/gt: Ensure memory quiesced before invalidation
drm/i915: Add the gen12_needs_ccs_aux_inv helper
drm/i915/gt: Cleanup aux invalidation registers
drm/i915/gvt: Fix bug in getting msg length in AUX CH registers handler
drm/imx/ipuv3: Fix front porch adjustment upon hactive aligning
drm/ttm: check null pointer before accessing when swapping
Pull ceph fixes from Ilya Dryomov:
"Two patches to improve RBD exclusive lock interaction with
osd_request_timeout option and another fix to reduce the potential for
erroneous blocklisting -- this time in CephFS. All going to stable"
* tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client:
libceph: fix potential hang in ceph_osdc_notify()
rbd: prevent busy loop when requesting exclusive lock
ceph: defer stopping mdsc delayed_work
In commit 20ea1e7d13 ("file: always lock position for
FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because
pidfd_getfd() could get a reference to the file even when it didn't have
an elevated file count due to threading of other sharing cases.
But Mateusz Guzik reports that the extra locking is actually measurable,
so let's re-introduce the optimization, and only force the locking for
directory traversal.
Directories need the lock for correctness reasons, while regular files
only need it for "POSIX semantics". Since pidfd_getfd() is about
debuggers etc special things that are _way_ outside of POSIX, we can
relax the rules for that case.
Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KVM/arm64 fixes for 6.5, part #2
- Fixes for the configuration of SVE/SME traps when hVHE mode is in use
- Allow use of pKVM on systems with FF-A implementations that are v1.0
compatible
- Request/release percpu IRQs (arch timer, vGIC maintenance) correctly
when pKVM is in use
- Fix function prototype after __kvm_host_psci_cpu_entry() rename
- Skip to the next instruction when emulating writes to TCR_EL1 on
AmpereOne systems
To avoid possible time-of-check/time-of-use issues, the GHCB should
almost never be accessed outside dump_ghcb, sev_es_sync_to_ghcb
and sev_es_sync_from_ghcb. The only legitimate uses are to set the
exitinfo fields and to find the address of the scratch area embedded
in the ghcb. Accessing ghcb_usage also goes through svm->sev_es.ghcb
in sev_es_validate_vmgexit(), but that is because anyway the value is
not used.
Removing a shortcut variable that contains the value of svm->sev_es.ghcb
makes these cases a bit more verbose, but it limits the chance of someone
reading the ghcb by mistake.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger
a double fetch race condition vulnerability and invoke the VMGEXIT
handler recursively.
sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then
fetches the exit code using ghcb_get_sw_exit_code(). Soon after,
sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB
page is shared with the guest, the guest is able to quickly swap the
values with another vCPU and hence bypass the validation. One vmexit code
that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT;
if sev_handle_vmgexit() observes it in the second fetch, the call
to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again
recursively.
To avoid the race, always fetch the GHCB data from the places where
sev_es_sync_from_ghcb stores it.
Exploiting recursions on linux kernel has been proven feasible
in the past, but the impact is mitigated by stack guard pages
(CONFIG_VMAP_STACK). Still, if an attacker manages to call the handler
multiple times, they can theoretically trigger a stack overflow and
cause a denial-of-service, or potentially guest-to-host escape in kernel
configurations without stack guard pages.
Note that winning the race reliably in every iteration is very tricky
due to the very tight window of the fetches; depending on the compiler
settings, they are often consecutive because of optimization and inlining.
Tested by booting an SEV-ES RHEL9 guest.
Fixes: CVE-2023-4155
Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Reported-by: Andy Nguyen <theflow@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Validation of the GHCB is susceptible to time-of-check/time-of-use vulnerabilities.
To avoid them, we would like to always snapshot the fields that are read in
sev_es_validate_vmgexit(), and not use the GHCB anymore after it returns.
This means:
- invoking sev_es_sync_from_ghcb() before any GHCB access, including before
sev_es_validate_vmgexit()
- snapshotting all fields including the valid bitmap and the sw_scratch field,
which are currently not caching anywhere.
The valid bitmap is the first thing to be copied out of the GHCB; then,
further accesses will use the copy in svm->sev_es.
Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have some reports of linux NFS clients that cannot satisfy a linux knfsd
server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
those clients repeatedly walk all their known state using TEST_STATEID and
receive NFS4_OK for all.
Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to
cl_revoked. This would produce the observed client/server effect.
Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
and move to cl_revoked happens within the same cl_lock. This will allow
nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # v4.17+
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We have a function sve_sync_from_fpsimd_zeropad() which is used by the
ptrace code to update the SVE state when the user writes to the the
FPSIMD register set. Currently this checks that the task has SVE
enabled but this will miss updates for tasks which have streaming SVE
enabled if SVE has not been enabled for the thread, also do the
conversion if the task has streaming SVE enabled.
Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-3-49df214bfb3e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Systems which implement SME without also implementing SVE are
architecturally valid but were not initially supported by the kernel,
unfortunately we missed one issue in the ptrace code.
The SVE register setting code is shared between SVE and streaming mode
SVE. When we set full SVE register state we currently enable TIF_SVE
unconditionally, in the case where streaming SVE is being configured on a
system that supports vanilla SVE this is not an issue since we always
initialise enough state for both vector lengths but on a system which only
support SME it will result in us attempting to restore the SVE vector
length after having set streaming SVE registers.
Fix this by making the enabling of SVE conditional on setting SVE vector
state. If we set streaming SVE state and SVE was not already enabled this
will result in a SVE access trap on next use of normal SVE, this will cause
us to flush our register state but this is fine since the only way to
trigger a SVE access trap would be to exit streaming mode which will cause
the in register state to be flushed anyway.
Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-1-49df214bfb3e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Unloading a hardware specific 8250 driver can produce error "Unable to
handle kernel paging request at virtual address" about ten seconds after
unloading the driver. This happens on uart_hangup() calling
uart_change_pm().
Turns out commit 04e82793f0 ("serial: 8250: Reinit port->pm on port
specific driver unbind") was only a partial fix. If the hardware specific
driver has initialized port->pm function, we need to clear port->pm too.
Just reinitializing port->ops does not do this. Otherwise serial8250_pm()
will call port->pm() instead of serial8250_do_pm().
Fixes: 04e82793f0 ("serial: 8250: Reinit port->pm on port specific driver unbind")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20230804131553.52927-1-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With commit 2d47c6956a ("ubsan: Tighten UBSAN_BOUNDS on GCC") if
CONFIG_UBSAN is enabled and gcc supports -fsanitize=bounds-strict, we
can trigger the following build error due to bindgen lacking support for
this additional build option:
BINDGEN rust/bindings/bindings_generated.rs
error: unsupported argument 'bounds-strict' to option '-fsanitize='
Fix by adding -fsanitize=bounds-strict to the list of skipped gcc flags
for bindgen.
Fixes: 2d47c6956a ("ubsan: Tighten UBSAN_BOUNDS on GCC")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Link: https://lore.kernel.org/r/20230711071914.133946-1-andrea.righi@canonical.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
We discovered that the current design of `borrow_mut` is problematic.
This patch removes it until a better solution can be found.
Specifically, the current design gives you access to a `&mut T`, which
lets you change where the `ForeignOwnable` points (e.g., with
`core::mem::swap`). No upcoming user of this API intended to make that
possible, making all of them unsound.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Fixes: 0fc4424d24 ("rust: types: introduce `ForeignOwnable`")
Link: https://lore.kernel.org/r/20230706094615.3080784-1-aliceryhl@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Currently the rust allocator simply passes the size of the type Layout
to krealloc(), and in theory the alignment requirement from the type
Layout may be larger than the guarantee provided by SLAB, which means
the allocated object is mis-aligned.
Fix this by adjusting the allocation size to the nearest power of two,
which SLAB always guarantees a size-aligned allocation. And because Rust
guarantees that the original size must be a multiple of alignment and
the alignment must be a power of two, then the alignment requirement is
satisfied.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
Signed-off-by: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Cc: stable@vger.kernel.org # v6.1+
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 247b365dc8 ("rust: add `kernel` crate")
Link: https://github.com/Rust-for-Linux/linux/issues/974
Link: https://lore.kernel.org/r/20230730012905.643822-2-boqun.feng@gmail.com
[ Applied rewording of comment as discussed in the mailing list. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Jonathan writes:
1st set of IIO fixes for 6.5
Usual mixed bag of fixes for recently introduced issues and ones from way
back that have recently been noticed.
* core
- Avoid a device with no parent issues seen on the dummy example device.
* adi,ad71145
- Drop ref now that dt-schema supports -nanoamp.
* adi,ad7192
- Fix wrong bit set for enabling AC excitation and exposure of control
on devices without the feature.
* adi,admv1013
- Don't ignore errors from regulator_get_voltage().
* amlogic,meson-adc
- Make sure clocks enabled early enough.
* google,cros_ec
- Fix undersized cros_ec_command allocation that resulted in a buffer
overrun.
* rohm,bu27008
- Fix truncation issue with scale format that prevents smallest value
being set
- Report intensity as unsigned. Previously large values would be
interpretted as negative intensities (and odd concept).
* rohm,bu27034
- Fix truncation issue with scale format that prevents smallest value
being set.
* st,lsm6dsx
- Return an error code, not false (which is 0 and hence success)
to indicate ACPI mount matrix retrieval failed due to no ACPI
support.
* ti,ina2xx
- Avoid a NULL pointer dereference if fall back compatible is used.
* tag 'iio-fixes-for-6.5a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: cros_ec: Fix the allocation size for cros_ec_command
iio: imu: lsm6dsx: Fix mount matrix retrieval
iio: adc: meson: fix core clock enable/disable moment
iio: core: Prevent invalid memory access when there is no parent
iio: frequency: admv1013: propagate errors from regulator_get_voltage()
dt-bindings: iio: adi,ad74115: remove ref from -nanoamp
iio: adc: ina2xx: avoid NULL pointer dereference on OF device match
iio: light: bu27008: Fix intensity data type
iio: light: bu27008: Fix scale format
iio: light: bu27034: Fix scale format
iio: adc: ad7192: Fix ac excitation feature
William writes:
Second set of Counter fixes for 6.5
The I8254 Kconfig entry is repositioned to resolve a misplacement
causing the "Counter support" submenu items to disappear in menuconfig.
The tools/counter/Makefile clean recipe is adjusted to replace rmdir
with an equivalent set of rm to prevent failure if someone tries to
clean the counter directory without building it first.
* tag 'counter-fixes-for-6.5b' of git://git.kernel.org/pub/scm/linux/kernel/git/wbg/counter:
tools/counter: Makefile: Replace rmdir by rm to avoid make,clean failure
counter: Fix menuconfig "Counter support" submenu entries disappearance
The memory allocated in tb_queue_dp_bandwidth_request() needs to be
released once the request is handled to avoid leaking it.
Fixes: 6ce3563520 ("thunderbolt: Add support for DisplayPort bandwidth allocation mode")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
William writes:
First set of Counter fixes for 6.5
In commit d428487471 ("counter: i8254: Introduce the Intel 8254
interface library module"), the misplacement of the I8254 Kconfig entry
results in the "Counter support" submenu items disappearing in
menuconfig. A fix is provided to reposition the I8254 Kconfig entry to
restore the intended submenu behavior.
* tag 'counter-fixes-for-6.5a' of git://git.kernel.org/pub/scm/linux/kernel/git/wbg/counter:
counter: Fix menuconfig "Counter support" submenu entries disappearance
After fixing the serial core port device to use port->port_id instead of
port->line, unloading a hardware specific 8250 port driver started
producing an error for "sysfs: cannot create duplicate filename".
This is happening as we are wrongly initializing port->port_id to zero
when adding back serial8250_isa_devs instances, and the serial8250:0.0
sysfs entry may already exist. For serial8250 devices, we typically have
multiple devices mapped to a single driver instance. For the
serial8250_isa_devs instances, the port->port_id is the same as port->line.
Let's fix the issue by re-initializing port_id when adding back the
serial8250_isa_devs instances in serial8250_unregister_port().
Fixes: d962de6ae5 ("serial: core: Fix serial core port id to not use port->line")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20230804123546.25293-1-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kmemleak reports issues for serial8250 ports after the hardware specific
driver takes over on boot as noted by Tomi.
The kerneldoc for device_initialize() says we must call device_put()
after calling device_initialize(). We are calling device_put() on the
error path, but are missing it from the device remove path. This causes
release() to never get called for the devices on remove.
Let's add the missing put_device() calls for both serial ctrl and
port devices.
Fixes: 84a9582fd2 ("serial: core: Start managing serial controllers to enable runtime PM")
Reported-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://lore.kernel.org/r/20230804090909.51529-1-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If dwc3 is runtime suspended we defer processing the event buffer
until resume, by setting the pending_events flag. Set this flag before
triggering resume to avoid race with the runtime resume callback.
While handling the pending events, in addition to checking the event
buffer we also need to process it. Handle this by explicitly calling
dwc3_thread_interrupt(). Also balance the runtime pm get() operation
that triggered this processing.
Cc: stable@vger.kernel.org
Fixes: fc8bb91bc8 ("usb: dwc3: implement runtime PM")
Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230801192658.19275-1-quic_eserrao@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Syzbot got KMSAN to complain about access to an uninitialized value in
the alauda subdriver of usb-storage:
BUG: KMSAN: uninit-value in alauda_transport+0x462/0x57f0
drivers/usb/storage/alauda.c:1137
CPU: 0 PID: 12279 Comm: usb-storage Not tainted 5.3.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x191/0x1f0 lib/dump_stack.c:113
kmsan_report+0x13a/0x2b0 mm/kmsan/kmsan_report.c:108
__msan_warning+0x73/0xe0 mm/kmsan/kmsan_instr.c:250
alauda_check_media+0x344/0x3310 drivers/usb/storage/alauda.c:460
The problem is that alauda_check_media() doesn't verify that its USB
transfer succeeded before trying to use the received data. What
should happen if the transfer fails isn't entirely clear, but a
reasonably conservative approach is to pretend that no media is
present.
A similar problem exists in a usb_stor_dbg() call in
alauda_get_media_status(). In this case, when an error occurs the
call is redundant, because usb_stor_ctrl_transfer() already will print
a debugging message.
Finally, unrelated to the uninitialized memory access, is the fact
that alauda_check_media() performs DMA to a buffer on the stack.
Fortunately usb-storage provides a general purpose DMA-able buffer for
uses like this. We'll use it instead.
Reported-and-tested-by: syzbot+e7d46eb426883fb97efd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000007d25ff059457342d@google.com/T/
Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: e80b0fade0 ("[PATCH] USB Storage: add alauda support")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/693d5d5e-f09b-42d0-8ed9-1f96cd30bcce@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avichal Rakesh reported a kernel panic that occurred when the UVC
gadget driver was removed from a gadget's configuration. The panic
involves a somewhat complicated interaction between the kernel driver
and a userspace component (as described in the Link tag below), but
the analysis did make one thing clear: The Gadget core should
accomodate gadget drivers calling usb_gadget_deactivate() as part of
their unbind procedure.
Currently this doesn't work. gadget_unbind_driver() calls
driver->unbind() while holding the udc->connect_lock mutex, and
usb_gadget_deactivate() attempts to acquire that mutex, which will
result in a deadlock.
The simple fix is for gadget_unbind_driver() to release the mutex when
invoking the ->unbind() callback. There is no particular reason for
it to be holding the mutex at that time, and the mutex isn't held
while the ->bind() callback is invoked. So we'll drop the mutex
before performing the unbind callback and reacquire it afterward.
We'll also add a couple of comments to usb_gadget_activate() and
usb_gadget_deactivate(). Because they run in process context they
must not be called from a gadget driver's ->disconnect() callback,
which (according to the kerneldoc for struct usb_gadget_driver in
include/linux/usb/gadget.h) may run in interrupt context. This may
help prevent similar bugs from arising in the future.
Reported-and-tested-by: Avichal Rakesh <arakesh@google.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: 286d9975a8 ("usb: gadget: udc: core: Prevent soft_connect_store() race")
Link: https://lore.kernel.org/linux-usb/4d7aa3f4-22d9-9f5a-3d70-1bd7148ff4ba@google.com/
Cc: Badhri Jagan Sridharan <badhri@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/48b2f1f1-0639-46bf-bbfc-98cb05a24914@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When connecting to some DisplayPort partners, the initial status update
after entering DisplayPort Alt Mode notifies that the DFP_D/UFP_D is not in
the connected state. This leads to sending a configure message that keeps
the device in USB mode. The port partner then sets DFP_D/UFP_D to the
connected state and HPD to high in the same Attention message. Currently,
the HPD signal is dropped in order to handle configuration.
This patch saves changes to the HPD signal when the device chooses to
configure during dp_altmode_status_update, and invokes sysfs_notify if
necessary for HPD after configuring.
Fixes: 0e3bb7d689 ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230726020903.1409072-1-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not transition to SNK_UNATTACHED state when receiving vsafe0v event
while in SNK_HARD_RESET_WAIT_VBUS. Ignore VBUS off events as well as
in some platforms VBUS off can be signalled more than once.
[143515.364753] Requesting mux state 1, usb-role 2, orientation 2
[143515.365520] pending state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_SINK_ON @ 650 ms [rev3 HARD_RESET]
[143515.632281] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_HARD_RESET_SINK_OFF, polarity 1, disconnected]
[143515.637214] VBUS on
[143515.664985] VBUS off
[143515.664992] state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_WAIT_VBUS [rev3 HARD_RESET]
[143515.665564] VBUS VSAFE0V
[143515.665566] state change SNK_HARD_RESET_WAIT_VBUS -> SNK_UNATTACHED [rev3 HARD_RESET]
Fixes: 28b43d3d74 ("usb: typec: tcpm: Introduce vsafe0v for vbus")
Cc: <stable@vger.kernel.org>
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230712085722.1414743-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Li Yang says:
====================
fix at803x wol setting
v3:
Break long lines
Add back error checking of phy_read
v4:
Disable WoL in 1588 register for AR8031 in probe
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the AR8032 part does not support wol, remove related callbacks
from it.
Fixes: 5800091a20 ("net: phy: at803x: add support for AR8032 PHY")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Cc: David Bauer <mail@david-bauer.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 7beecaf7d5 ("net: phy: at803x: improve the WOL feature"), it
seems not correct to use a wol_en bit in a 1588 Control Register which is
only available on AR8031/AR8033(share the same phy_id) to determine if WoL
is enabled. Change it back to use AT803X_INTR_ENABLE_WOL for determining
the WoL status which is applicable on all chips supporting wol. Also update
the at803x_set_wol() function to only update the 1588 register on chips
having it. After this change, disabling wol at probe from commit
d7cd5e06c9 ("net: phy: at803x: disable WOL at probe") is no longer
needed. Change it to just disable the WoL bit in 1588 register for
AR8031/AR8033 to be aligned with AT803X_INTR_ENABLE_WOL in probe.
Fixes: 7beecaf7d5 ("net: phy: at803x: improve the WOL feature")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Viorel Suman <viorel.suman@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.5
- Fixes for request_queue state (Ming)
- Another uuid quirk (August)"
* tag 'nvme-6.5-2023-08-02' of git://git.infradead.org/nvme:
nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G
nvme-rdma: fix potential unbalanced freeze & unfreeze
nvme-tcp: fix potential unbalanced freeze & unfreeze
nvme: fix possible hang when removing a controller during error recovery
When booting a kernel with CONFIG_MISDN_DSP=y and CONFIG_CFI_CLANG=y,
there is a failure when dsp_cmx_send() is called indirectly from
call_timer_fn():
[ 0.371412] CFI failure at call_timer_fn+0x2f/0x150 (target: dsp_cmx_send+0x0/0x530; expected type: 0x92ada1e9)
The function pointer prototype that call_timer_fn() expects is
void (*fn)(struct timer_list *)
whereas dsp_cmx_send() has a parameter type of 'void *', which causes
the control flow integrity checks to fail because the parameter types do
not match.
Change dsp_cmx_send()'s parameter type to be 'struct timer_list' to
match the expected prototype. The argument is unused anyways, so this
has no functional change, aside from avoiding the CFI failure.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202308020936.58787e6c-oliver.sang@intel.com
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Fixes: e313ac12eb ("mISDN: Convert timers to use timer_setup()")
Link: https://lore.kernel.org/r/20230802-fix-dsp_cmx_send-cfi-failure-v1-1-2f2e79b0178d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix segfault in the powerpc specific arch_skip_callchain_idx
function. The patch doing the reference count init/exit that went
into 6.5 missed this function.
- Fix regression reading the arm64 PMU cpu slots in sysfs, a patch
removing some code duplication ended up duplicating the /sysfs prefix
for these files.
- Fix grouping of events related to topdown, addressing a regression on
the CSV output produced by 'perf stat' noticed on the downstream tool
toplev.
- Fix the uprobe_from_different_cu 'perf test' entry, it is failing
when gcc isn't available, so we need to check that and skip the test
if it is not installed.
* tag 'perf-tools-fixes-for-v6.5-2-2023-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf test parse-events: Test complex name has required event format
perf pmus: Create placholder regardless of scanning core_only
perf test uprobe_from_different_cu: Skip if there is no gcc
perf parse-events: Only move force grouped evsels when sorting
perf parse-events: When fixing group leaders always set the leader
perf parse-events: Extra care around force grouped events
perf callchain powerpc: Fix addr location init during arch_skip_callchain_idx function
perf pmu arm64: Fix reading the PMU cpu slots in sysfs
Pull cxl fixes from Vishal Verma:
- Fixup the Sanitixe device ABI that was merged for v6.5 to hide some
sysfs files when the necessary support is missing. Update the ABI
documentation around this as well.
* tag 'cxl-fixes-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/memdev: Only show sanitize sysfs files when supported
cxl/memdev: Document security state in kern-doc
cxl/memdev: Improve sanitize ABI descriptions
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and wireless.
Nothing scary here. Feels like the first wave of regressions from v6.5
is addressed - one outstanding fix still to come in TLS for the
sendpage rework.
Current release - regressions:
- udp: fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
- dsa: fix older DSA drivers using phylink
Previous releases - regressions:
- gro: fix misuse of CB in udp socket lookup
- mlx5: unregister devlink params in case interface is down
- Revert "wifi: ath11k: Enable threaded NAPI"
Previous releases - always broken:
- sched: cls_u32: fix match key mis-addressing
- sched: bind logic fixes for cls_fw, cls_u32 and cls_route
- add bound checks to a number of places which hand-parse netlink
- bpf: disable preemption in perf_event_output helpers code
- qed: fix scheduling in a tasklet while getting stats
- avoid using APIs which are not hardirq-safe in couple of drivers,
when we may be in a hard IRQ (netconsole)
- wifi: cfg80211: fix return value in scan logic, avoid page
allocator warning
- wifi: mt76: mt7615: do not advertise 5 GHz on first PHY of MT7615D
(DBDC)
Misc:
- drop handful of inactive maintainers, put some new in place"
* tag 'net-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (98 commits)
MAINTAINERS: update TUN/TAP maintainers
test/vsock: remove vsock_perf executable on `make clean`
tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
tcp_metrics: annotate data-races around tm->tcpm_net
tcp_metrics: annotate data-races around tm->tcpm_vals[]
tcp_metrics: annotate data-races around tm->tcpm_lock
tcp_metrics: annotate data-races around tm->tcpm_stamp
tcp_metrics: fix addr_same() helper
prestera: fix fallback to previous version on same major version
udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
net/mlx5e: Set proper IPsec source port in L4 selector
net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
net/mlx5: fs_core: Make find_closest_ft more generic
wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()
vxlan: Fix nexthop hash size
ip6mr: Fix skb_under_panic in ip6mr_cache_report()
s390/qeth: Don't call dev_close/dev_open (DOWN/UP)
net: tap_open(): set sk_uid from current_fsuid()
net: tun_chr_open(): set sk_uid from current_fsuid()
net: dcb: choose correct policy to parse DCB_ATTR_BCN
...
Willem and Jason have agreed to take over the maintainer
duties for TUN/TAP, thank you!
There's an existing entry for TUN/TAP which only covers
the user mode Linux implementation.
Since we haven't heard from Maxim on the list for almost
a decade, extend that entry and take it over, rather than
adding a new one.
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230802182843.4193099-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin KaFai Lau says:
====================
pull-request: bpf 2023-08-03
We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 3 files changed, 37 insertions(+), 20 deletions(-).
The main changes are:
1) Disable preemption in perf_event_output helpers code,
from Jiri Olsa
2) Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing,
from Lin Ma
3) Multiple warning splat fixes in cpumap from Hou Tao
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, cpumap: Handle skb as well when clean up ptr_ring
bpf, cpumap: Make sure kthread is running before map update returns
bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing
bpf: Disable preemption in bpf_event_output
bpf: Disable preemption in bpf_perf_event_output
====================
Link: https://lore.kernel.org/r/20230803181429.994607-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless fixes for v6.5
We did some house cleaning in MAINTAINERS file so several patches
about that. Few regressions fixed and also fix some recently enabled
memcpy() warnings. Only small commits and nothing special standing
out.
* tag 'wireless-2023-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()
wifi: ray_cs: Replace 1-element array with flexible array
MAINTAINERS: add Jeff as ath10k, ath11k and ath12k maintainer
MAINTAINERS: wifi: mark mlw8k as orphan
MAINTAINERS: wifi: mark b43 as orphan
MAINTAINERS: wifi: mark zd1211rw as orphan
MAINTAINERS: wifi: mark wl3501 as orphan
MAINTAINERS: wifi: mark rndis_wlan as orphan
MAINTAINERS: wifi: mark ar5523 as orphan
MAINTAINERS: wifi: mark cw1200 as orphan
MAINTAINERS: wifi: atmel: mark as orphan
MAINTAINERS: wifi: rtw88: change Ping as the maintainer
Revert "wifi: ath6k: silence false positive -Wno-dangling-pointer warning on GCC 12"
wifi: cfg80211: Fix return value in scan logic
Revert "wifi: ath11k: Enable threaded NAPI"
MAINTAINERS: Update mwifiex maintainer list
wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC)
====================
Link: https://lore.kernel.org/r/20230803140058.57476C433C9@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet says:
====================
tcp_metrics: series of fixes
This series contains a fix for addr_same() and various
data-race annotations.
We still have to address races over tm->tcpm_saddr and
tm->tcpm_daddr later.
====================
Link: https://lore.kernel.org/r/20230802131500.1478140-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Whenever tcpm_new() reclaims an old entry, tcpm_suck_dst()
would overwrite data that could be read from tcp_fastopen_cache_get()
or tcp_metrics_fill_info().
We need to acquire fastopen_seqlock to maintain consistency.
For newly allocated objects, tcpm_new() can switch to kzalloc()
to avoid an extra fastopen_seqlock acquisition.
Fixes: 1fe4c481ba ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tm->tcpm_vals[] values can be read or written locklessly.
Add needed READ_ONCE()/WRITE_ONCE() to document this,
and force use of tcp_metric_get() and tcp_metric_set()
Fixes: 51c5d0c4b1 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because v4 and v6 families use separate inetpeer trees (respectively
net->ipv4.peers and net->ipv6.peers), inetpeer_addr_cmp(a, b) assumes
a & b share the same family.
tcp_metrics use a common hash table, where entries can have different
families.
We must therefore make sure to not call inetpeer_addr_cmp()
if the families do not match.
Fixes: d39d14ffa2 ("net: Add helper function to compare inetpeer addresses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When both supported and previous version have the same major version,
and the firmwares are missing, the driver ends in a loop requesting the
same (previous) version over and over again:
[ 76.327413] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
[ 76.339802] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.352162] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.364502] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.376848] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.389183] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.401522] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.413860] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.426199] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
...
Fix this by inverting the check to that we aren't yet at the previous
version, and also check the minor version.
This also catches the case where both versions are the same, as it was
after commit bb5dbf2cc6 ("net: marvell: prestera: add firmware v4.0
support").
With this fix applied:
[ 88.499622] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
[ 88.511995] Prestera DX 0000:01:00.0: failed to request previous firmware: mrvl/prestera/mvsw_prestera_fw-v4.0.img
[ 88.522403] Prestera DX: probe of 0000:01:00.0 failed with error -2
Fixes: 47f26018a4 ("net: marvell: prestera: try to load previous fw version")
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Acked-by: Elad Nachman <enachman@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Taras Chornyi <taras.chornyi@plvision.eu>
Link: https://lore.kernel.org/r/20230802092357.163944-1-jonas.gorski@bisdn.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull nfsd fix from Chuck Lever:
- Fix tmpfs splice read support
* tag 'nfsd-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: Fix reading via splice
Pull erofs fixes from Gao Xiang:
- Fix data corruption caused by insufficient decompression on
deduplicated compressed extents
- Drop a useless s_magic checking in erofs_kill_sb()
* tag 'erofs-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: drop unnecessary WARN_ON() in erofs_kill_sb()
erofs: fix wrong primary bvec selection on deduplicated extents
Pull s390 fixes from Heiko Carstens:
- Split kernel large page mappings into 4k mappings in case debug
pagealloc is enabled again. This got accidentally removed by commit
bb1520d581 ("s390/mm: start kernel with DAT enabled")
- Fix error handling in KVM's sthyi handling
- Add missing include to s390's uapi ptrace.h
- Update defconfigs
* tag 's390-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ptrace: add missing linux/const.h include
KVM: s390: fix sthyi error handling
s390: update defconfigs
s390/vmem: split pages when debug pagealloc is enabled
When setting ZT0 via ptrace we do not currently force a reload of the
floating point register state from memory, do that to ensure that the newly
set value gets loaded into the registers on next task execution.
The function was templated off the function for FPSIMD which due to our
providing the option of embedding a FPSIMD regset within the SVE regset
does not directly include the flush.
Fixes: f90b529bcb ("arm64/sme: Implement ZT0 ptrace support")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-zt0-flush-v1-1-72e854eaf96e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When setting SME vector lengths we clear TIF_SME to reenable SME traps,
doing a reallocation of the backing storage on next use. We do this using
clear_thread_flag() which operates on the current thread, meaning that when
setting the vector length via ptrace we may both not force traps for the
target task and force a spurious flush of any SME state that the tracing
task may have.
Clear the flag in the target task.
Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-tif-sme-v1-1-88312fd6fbfd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We need this in order to easily reuse register definitions
and some functions with Sound Open Firmware driver.
According to Documentation/process/license-rules.rst:
"Dual BSD/GPL" The module is dual licensed under a GPL v2
variant or BSD license choice. The exact
variant of the BSD license can only be
determined via the license information
in the corresponding source files.
so use "Dual BSD/GPL" for license string.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/r/20230803072638.640789-1-daniel.baluta@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Simulated chips use a mutex for synchronization in driver callbacks so
they must not be called from interrupt context. Set the can_sleep field
of the GPIO chip to true to force users to only use threaded irqs.
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Fix checkpatch warnings:
unaligned.c:475: ERROR: space required after that ','
Signed-off-by: Yu Han <hanyu001@208suo.com>
Signed-off-by: Helge Deller <deller@gmx.de>
This driver does not actually work with DMA mode, but still tries
to call ISA DMA interface functions that are stubbed out on
parisc, resulting in a W=1 build warning:
drivers/parport/parport_gsc.c: In function 'parport_remove_chip':
drivers/parport/parport_gsc.c:389:20: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
389 | free_dma(p->dma);
Remove the corresponding code as a prerequisite for turning on -Wempty-body
by default in all kernels.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Clearly, this code isn't needed, but it gives a false positive when
grepping the complete source tree for coherent_dma_mask.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Previously, on unplug events, the TMU mode was disabled first
followed by the Time Synchronization Handshake, irrespective of
whether the tb_switch_tmu_rate_write() API was successful or not.
However, this caused a problem with Thunderbolt 3 (TBT3)
devices, as the TSPacketInterval bits were always enabled by default,
leading the host router to assume that the device router's TMU was
already enabled and preventing it from initiating the Time
Synchronization Handshake. As a result, TBT3 monitors experienced
display flickering from the second hot plug onwards.
To address this issue, we have modified the code to only disable the
Time Synchronization Handshake during TMU disable if the
tb_switch_tmu_rate_write() function is successful. This ensures that
the TBT3 devices function correctly and eliminates the display
flickering issue.
Co-developed-by: Sanath S <Sanath.S@amd.com>
Signed-off-by: Sanath S <Sanath.S@amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Currently we use the drm_dp_dpcd_read_caps() helper in the DRM side of
nouveau in order to read the DPCD of a DP connector, which makes sure we do
the right thing and also check for extended DPCD caps. However, it turns
out we're not currently doing this on the nvkm side since we don't have
access to the drm_dp_aux structure there - which means that the DRM side of
the driver and the NVKM side can end up with different DPCD capabilities
for the same connector.
Ideally in order to fix this, we just want to use the
drm_dp_read_dpcd_caps() helper in nouveau. That's not currently possible
though, and is going to depend on having a bunch of the DP code moved out
of nvkm and into the DRM side of things as part of the GSP enablement work.
Until then however, let's workaround this problem by porting a copy of
drm_dp_read_dpcd_caps() into NVKM - which should fix this issue.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/211
Link: https://patchwork.freedesktop.org/patch/msgid/20230728225858.350581-1-lyude@redhat.com
(cherry picked from commit cc4adf3a73 in drm-misc-next)
Cc: <stable@vger.kernel.org> # 6.3+
Signed-off-by: Karol Herbst <kherbst@redhat.com>
We have a lurking bug where Fragment Shader Helper Invocations can't load
from memory. But this is actually required in OpenGL and is causing random
hangs or failures in random shaders.
It is unknown how widespread this issue is, but shaders hitting this can
end up with infinite loops.
We enable those only on all Kepler and newer GPUs where we use our own
Firmware.
Nvidia's firmware provides a way to set a kernelspace controlled list of
mmio registers in the gr space from push buffers via MME macros.
v2: drop code for gm200 and newer.
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@gmail.com>
Cc: nouveau@lists.freedesktop.org
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230622152017.2512101-1-kherbst@redhat.com
On system resume, the driver might call it6505_poweron directly if the
runtime PM hasn't been enabled. In such case, pm_runtime_get_if_in_use
will always return 0 because dev->power.runtime_status stays at
RPM_SUSPENDED, and the IRQ will never be handled.
Use it6505->powered from the driver struct fixes this because it always
gets updated when it6505_poweron is called.
Fixes: 5eb9a43140 ("drm/bridge: it6505: Guard bridge power in IRQ handler")
Signed-off-by: Pin-yen Lin <treapking@chromium.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230727100131.2338127-1-treapking@chromium.org
It is possible that a guest can send a packet that contains a head + 18
slots and yet has a len <= XEN_NETBACK_TX_COPY_LEN. This causes nr_slots
to underflow in xenvif_get_requests() which then causes the subsequent
loop's termination condition to be wrong, causing a buffer overrun of
queue->tx_map_ops.
Rework the code to account for the extra frag_overflow slots.
This is CVE-2023-34319 / XSA-432.
Fixes: ad7f402ae4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
__ip_append_data() can get into an infinite loop when asked to splice into
a partially-built UDP message that has more than the frag-limit data and up
to the MTU limit. Something like:
pipe(pfd);
sfd = socket(AF_INET, SOCK_DGRAM, 0);
connect(sfd, ...);
send(sfd, buffer, 8161, MSG_CONFIRM|MSG_MORE);
write(pfd[1], buffer, 8);
splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);
where the amount of data given to send() is dependent on the MTU size (in
this instance an interface with an MTU of 8192).
The problem is that the calculation of the amount to copy in
__ip_append_data() goes negative in two places, and, in the second place,
this gets subtracted from the length remaining, thereby increasing it.
This happens when pagedlen > 0 (which happens for MSG_ZEROCOPY and
MSG_SPLICE_PAGES), because the terms in:
copy = datalen - transhdrlen - fraggap - pagedlen;
then mostly cancel when pagedlen is substituted for, leaving just -fraggap.
This causes:
length -= copy + transhdrlen;
to increase the length to more than the amount of data in msg->msg_iter,
which causes skb_splice_from_iter() to be unable to fill the request and it
returns less than 'copied' - which means that length never gets to 0 and we
never exit the loop.
Fix this by:
(1) Insert a note about the dodgy calculation of 'copy'.
(2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above
equation, so that 'offset' isn't regressed and 'length' isn't
increased, which will mean that length and thus copy should match the
amount left in the iterator.
(3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if
we're asked to splice more than is in the iterator. It might be
better to not give the warning or even just give a 'short' write.
[!] Note that this ought to also affect MSG_ZEROCOPY, but MSG_ZEROCOPY
avoids the problem by simply assuming that everything asked for got copied,
not just the amount that was in the iterator. This is a potential bug for
the future.
Fixes: 7ac7c98785 ("udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES")
Reported-by: syzbot+f527b971b4bdc8e79f9e@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1420063.1690904933@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky says:
====================
mlx5 IPsec fixes
The following patches are combination of Jianbo's work on IPsec eswitch mode
together with our internal review toward addition of TCP protocol selectors
support to IPSec packet offload.
Despite not-being fix, the first patch helps us to make second one more
clear, so I'm asking to apply it anyway as part of this series.
====================
Link: https://lore.kernel.org/r/cover.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the cited commit, new type of FS_TYPE_PRIO_CHAINS fs_prio was added
to support multiple parallel namespaces for multi-chains. And we skip
all the flow tables under the fs_node of this type unconditionally,
when searching for the next or previous flow table to connect for a
new table.
As this search function is also used for find new root table when the
old one is being deleted, it will skip the entire FS_TYPE_PRIO_CHAINS
fs_node next to the old root. However, new root table should be chosen
from it if there is any table in it. Fix it by skipping only the flow
tables in the same FS_TYPE_PRIO_CHAINS fs_node when finding the
closest FT for a fs_node.
Besides, complete the connecting from FTs of previous priority of prio
because there should be multiple prevs after this fs_prio type is
introduced. And also the next FT should be chosen from the first flow
table next to the prio in the same FS_TYPE_PRIO_CHAINS fs_prio, if
this prio is the first child.
Fixes: 328edb499f ("net/mlx5: Split FDB fast path prio to multiple namespaces")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ARM SoC fixes from Arnd Bergmann:
"A couple of platforms get a lone dts fix each:
- SoCFPGA: Fix incorrect I2C property for SCL signal
- Renesas: Fix interrupt names for MTU3 channels on RZ/G2L and
RZ/V2L.
- Juno/Vexpress: remove a dangling symlink
- at91: sam9x60 SoC detection compatible strings
- nspire: Fix arm primecell compatible string
On the NXP i.MX platform, there multiple issues that get addressed:
- A couple of ARM DTS fixes for i.MX6SLL usbphy and supported CPU
frequency of sk-imx53 board
- Add missing pull-up for imx8mn-var-som onboard PHY reset pinmux
- A couple of imx8mm-venice fixes from Tim Harvey to diable
disp_blk_ctrl
- A couple of phycore-imx8mm fixes from Yashwanth Varakala to correct
VPU label and gpio-line-names
- Fix imx8mp-blk-ctrl driver to register HSIO PLL clock as
bus_power_dev child, so that runtime PM can translate into the
necessary GPC power domain action
On the driver side, there are two fixes for tegra memory controller
drivers addressing regressions from the merge window, a couple of
minor correctness fixes for SCMI and SMCCC firmware, as well as a
build fix for an lcd backlight driver"
* tag 'soc-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (22 commits)
backlight: corgi_lcd: fix missing prototype
memory: tegra: make icc_set_bw return zero if BWMGR not supported
arm64: dts: renesas: rzg2l: Update overfow/underflow IRQ names for MTU3 channels
dt-bindings: serial: atmel,at91-usart: update compatible for sam9x60
ARM: dts: at91: sam9x60: fix the SOC detection
ARM: dts: nspire: Fix arm primecell compatible string
firmware: arm_scmi: Fix chan_free cleanup on SMC
firmware: arm_scmi: Drop OF node reference in the transport channel setup
soc: imx: imx8mp-blk-ctrl: register HSIO PLL clock as bus_power_dev child
ARM: dts: nxp/imx: limit sk-imx53 supported frequencies
firmware: arm_scmi: Fix signed error return values handling
firmware: smccc: Fix use of uninitialised results structure
arm64: dts: freescale: Fix VPU G2 clock
arm64: dts: imx8mn-var-som: add missing pull-up for onboard PHY reset pinmux
arm64: dts: phycore-imx8mm: Correction in gpio-line-names
arm64: dts: phycore-imx8mm: Label typo-fix of VPU
ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node
arm64: dts: imx8mm-venice-gw7904: disable disp_blk_ctrl
arm64: dts: imx8mm-venice-gw7903: disable disp_blk_ctrl
arm64: dts: arm: Remove the dangling vexpress-v2m-rs1.dtsi symlink
...
Pull bitmap fixes from Yury Norov:
- Fix for bitmap documentation
- Fix for kernel build under certain configurations
* tag 'bitmap-6.5-rc5' of https://github.com:/norov/linux:
lib/bitmap: workaround const_eval test build failure
cpumask: eliminate kernel-doc warnings
Hyper-V can run VMs at different privilege "levels" known as Virtual
Trust Levels (VTL). Sometimes, it chooses to run two different VMs
at different levels but they share some of their address space. In
such setups VTL2 (higher level VM) has visibility of all of the
VTL0 (level 0) memory space.
When the CONFIG_X86_MPPARSE is enabled for VTL2, the VTL2 kernel
performs a search within the low memory to locate MP tables. However,
in systems where VTL0 manages the low memory and may contain valid
tables, this scanning can result in incorrect MP table information
being provided to the VTL2 kernel, mistakenly considering VTL0's MP
table as its own
Add noop functions to avoid MP parse scan by VTL2.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/1687537688-5397-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The following error happens:
In file included from vstate_exec_nolibc.c:2:
/usr/include/riscv64-linux-gnu/sys/prctl.h:42:12: error: conflicting types for ‘prctl’; h
ave ‘int(int, ...)’
42 | extern int prctl (int __option, ...) __THROW;
| ^~~~~
In file included from ./../../../../include/nolibc/nolibc.h:99,
from <command-line>:
./../../../../include/nolibc/sys.h:892:5: note: previous definition of ‘prctl’ with type
‘int(int, long unsigned int, long unsigned int, long unsigned int, long unsigned int)
’
892 | int prctl(int option, unsigned long arg2, unsigned long arg3,
| ^~~~~
Fix this by not including <sys/prctl.h>, which is not needed here since
prctl syscall is directly called using its number.
Fixes: 7cf6198ce2 ("selftests: Test RISC-V Vector prctl interface")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230713115829.110421-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The riscv selftests (which were modeled after the arm64 selftests) are
improperly declaring the "emit_tests" target to depend upon the "all"
target. This approach, when combined with commit 9fc96c7c19
("selftests: error out if kernel header files are not yet built"), has
caused build failures [1] on arm64, and is likely to cause similar
failures for riscv.
To fix this, simply remove the unnecessary "all" dependency from the
emit_tests target. The dependency is still effectively honored, because
again, invocation is via "install", which also depends upon "all".
An alternative approach would be to harden the emit_tests target so that
it can depend upon "all", but that's a lot more complicated and hard to
get right, and doesn't seem worth it, especially given that emit_tests
should probably not be overridden at all.
[1] https://lore.kernel.org/20230710-kselftest-fix-arm64-v1-1-48e872844f25@kernel.org
Fixes: 9fc96c7c19 ("selftests: error out if kernel header files are not yet built")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230712193514.740033-1-jhubbard@nvidia.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull exfat fixes from Namjae Jeon:
- Fix page allocation failure from allocation bitmap by using
kvmalloc_array/kvfree
- Add the check to validate if filename entries exceeds max filename
length
- Fix potential deadlock condition from dir_emit*()
* tag 'exfat-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: release s_lock before calling dir_emit()
exfat: check if filename entries exceeds max filename length
exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree
Customer reported that they couldn't mount their DFS link that was
seen by the client as a DFS interlink -- special form of DFS link
where its single target may point to a different DFS namespace -- and
it turned out that it was just a regular DFS link where its referral
header flags missed the StorageServers bit thus making the client
think it couldn't tree connect to target directly without requiring
further referrals.
When the DFS link referral header flags misses the StoraServers bit
and its target doesn't respond to any referrals, then tree connect to
it.
Fixes: a1c0d00572 ("cifs: share dfs connections and supers")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull SCSI fixes from James Bottomley:
"Three small fixes, all in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: pm80xx: Fix error return code in pm8001_pci_probe()
scsi: zfcp: Defer fc_rport blocking until after ADISC response
scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices
Compiling big-endian targets with Clang produces the diagnostic:
fs/namei.c:2173:13: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
} while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants)));
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
fs/namei.c:2173:13: note: cast one or both operands to int to silence this warning
It appears that when has_zero was introduced, two definitions were
produced with different signatures (in particular different return
types).
Looking at the usage in hash_name() in fs/namei.c, I suspect that
has_zero() is meant to be invoked twice per while loop iteration; using
logical-or would not update `bdata` when `a` did not have zeros. So I
think it's preferred to always return an unsigned long rather than a
bool than update the while loop in hash_name() to use a logical-or
rather than bitwise-or.
[ Also changed powerpc version to do the same - Linus ]
Link: https://github.com/ClangBuiltLinux/linux/issues/1832
Link: https://lore.kernel.org/lkml/20230801-bitwise-v1-1-799bec468dc4@google.com/
Fixes: 36126f8f2e ("word-at-a-time: make the interfaces truly generic")
Debugged-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a powermac platform, via the call path:
start_kernel()
time_init()
ppc_md.calibrate_decr() (pmac_calibrate_decr)
via_calibrate_decr()
ioremap() and iounmap() are called. The unmap can enable interrupts
unexpectedly (cond_resched() in vunmap_pmd_range()), which causes a
warning later in the boot sequence in start_kernel().
Use the early_* variants of these IO functions to prevent this.
The issue is pre-existing, but is surfaced by commit 721255b982
("genirq: Use a maple tree for interrupt descriptor management").
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230706010816.72682-1-bgray@linux.ibm.com
Using brcmfmac with 6.5-rc3 on a brcmfmac43241b4-sdio triggers
a backtrace caused by the following field-spanning warning:
memcpy: detected field-spanning write (size 120) of single field
"¶ms_le->channel_list[0]" at
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:1072 (size 2)
The driver still works after this warning. The warning was introduced by the
new field-spanning write checks which were enabled recently.
Fix this by replacing the channel_list[1] declaration at the end of
the struct with a flexible array declaration.
Most users of struct brcmf_scan_params_le calculate the size to alloc
using the size of the non flex-array part of the struct + needed extra
space, so they do not care about sizeof(struct brcmf_scan_params_le).
brcmf_notify_escan_complete() however uses the struct on the stack,
expecting there to be room for at least 1 entry in the channel-list
to store the special -1 abort channel-id.
To make this work use an anonymous union with a padding member
added + the actual channel_list flexible array.
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230729140500.27892-1-hdegoede@redhat.com
dev_close() and dev_open() are issued to change the interface state to DOWN
or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g its
Ipv6 addresses and routes. We don't want this in cases of device recovery
(triggered by hardware or software) or when the qeth device is set
offline.
Setting a qeth device offline or online and device recovery actions call
netif_device_detach() and/or netif_device_attach(). That will reset or
set the LOWER_UP indication i.e. change the dev->state Bit
__LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and
still preserves the interface settings that are handled by the network
stack.
Don't call dev_open() nor dev_close() from the qeth device driver. Let the
network stack handle this.
Fixes: d4560150cb ("s390/qeth: call dev_close() during recovery")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Laszlo Ersek says:
====================
tun/tap: set sk_uid from current_fsuid()
The original patches fixing CVE-2023-1076 are incorrect in my opinion.
This small series fixes them up; see the individual commit messages for
explanation.
I have a very elaborate test procedure demonstrating the problem for
both tun and tap; it involves libvirt, qemu, and "crash". I can share
that procedure if necessary, but it's indeed quite long (I wrote it
originally for our QE team).
The patches in this series are supposed to "re-fix" CVE-2023-1076; given
that said CVE is classified as Low Impact (CVSSv3=5.5), I'm posting this
publicly, and not suggesting any embargo. Red Hat Product Security may
assign a new CVE number later.
I've tested the patches on top of v6.5-rc4, with "crash" built at commit
c74f375e0ef7.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 66b2c338ad initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/tapX" device node's owner UID. Per original
commit 86741ec254 ("net: core: Add a UID field to struct sock.",
2016-11-04), that's wrong: the idea is to cache the UID of the userspace
process that creates the socket. Commit 86741ec254 mentions socket() and
accept(); with "tap", the action that creates the socket is
open("/dev/tapX").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/tapX" will be owned by root, so in practice, commit 66b2c338ad has
no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 66b2c338ad ("tap: tap_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a096ccca6e initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/net/tun" device node's owner UID. Per
original commit 86741ec254 ("net: core: Add a UID field to struct
sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the
userspace process that creates the socket. Commit 86741ec254 mentions
socket() and accept(); with "tun", the action that creates the socket is
open("/dev/net/tun").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e
has no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: a096ccca6e ("tun: tun_chr_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During system resume, ata_port_pm_resume() triggers ata EH to
1) Resume the controller
2) Reset and rescan the ports
3) Revalidate devices
This EH execution is started asynchronously from ata_port_pm_resume(),
which means that when sd_resume() is executed, none or only part of the
above processing may have been executed. However, sd_resume() issues a
START STOP UNIT to wake up the drive from sleep mode. This command is
translated to ATA with ata_scsi_start_stop_xlat() and issued to the
device. However, depending on the state of execution of the EH process
and revalidation triggerred by ata_port_pm_resume(), two things may
happen:
1) The START STOP UNIT fails if it is received before the controller has
been reenabled at the beginning of the EH execution. This is visible
with error messages like:
ata10.00: device reported invalid CHS sector 0
sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
sd 9:0:0:0: PM: failed to resume async: error -5
2) The START STOP UNIT command is received while the EH process is
on-going, which mean that it is stopped and must wait for its
completion, at which point the command is rather useless as the drive
is already fully spun up already. This case results also in a
significant delay in sd_resume() which is observable by users as
the entire system resume completion is delayed.
Given that ATA devices will be woken up by libata activity on resume,
sd_resume() has no need to issue a START STOP UNIT command, which solves
the above mentioned problems. Do not issue this command by introducing
the new scsi_device flag no_start_on_resume and setting this flag to 1
in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP
UNIT command only if this flag is not set.
Reported-by: Paul Ausbeck <paula@soe.ucsc.edu>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880
Fixes: a19a93e4c6 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Tanner Watkins <dalzot@gmail.com>
Tested-by: Paul Ausbeck <paula@soe.ucsc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
If the cluster becomes unavailable, ceph_osdc_notify() may hang even
with osd_request_timeout option set because linger_notify_finish_wait()
waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD
request in flight -- it's completely asynchronous.
Introduce an additional timeout, derived from the specified notify
timeout. While at it, switch both waits to killable which is more
correct.
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Due to rbd_try_acquire_lock() effectively swallowing all but
EBLOCKLISTED error from rbd_try_lock() ("request lock anyway") and
rbd_request_lock() returning ETIMEDOUT error not only for an actual
notify timeout but also when the lock owner doesn't respond, a busy
loop inside of rbd_acquire_lock() between rbd_try_acquire_lock() and
rbd_request_lock() is possible.
Requesting the lock on EBUSY error (returned by get_lock_owner_info()
if an incompatible lock or invalid lock owner is detected) makes very
little sense. The same goes for ETIMEDOUT error (might pop up pretty
much anywhere if osd_request_timeout option is set) and many others.
Just fail I/O requests on rbd_dev->acquiring_list immediately on any
error from rbd_try_lock().
Cc: stable@vger.kernel.org # 588159009d: rbd: retrieve and check lock owner twice before blocklisting
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
The dcbnl_bcn_setcfg uses erroneous policy to parse tb[DCB_ATTR_BCN],
which is introduced in commit 859ee3c438 ("DCB: Add support for DCB
BCN"). Please see the comment in below code
static int dcbnl_bcn_setcfg(...)
{
...
ret = nla_parse_nested_deprecated(..., dcbnl_pfc_up_nest, .. )
// !!! dcbnl_pfc_up_nest for attributes
// DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_ALL in enum dcbnl_pfc_up_attrs
...
for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) {
// !!! DCB_BCN_ATTR_RP_0 to DCB_BCN_ATTR_RP_7 in enum dcbnl_bcn_attrs
...
value_byte = nla_get_u8(data[i]);
...
}
...
for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) {
// !!! DCB_BCN_ATTR_BCNA_0 to DCB_BCN_ATTR_RI in enum dcbnl_bcn_attrs
...
value_int = nla_get_u32(data[i]);
...
}
...
}
That is, the nla_parse_nested_deprecated uses dcbnl_pfc_up_nest
attributes to parse nlattr defined in dcbnl_pfc_up_attrs. But the
following access code fetch each nlattr as dcbnl_bcn_attrs attributes.
By looking up the associated nla_policy for dcbnl_bcn_attrs. We can find
the beginning part of these two policies are "same".
static const struct nla_policy dcbnl_pfc_up_nest[...] = {
[DCB_PFC_UP_ATTR_0] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_1] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_2] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_3] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_4] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_5] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_6] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_7] = {.type = NLA_U8},
[DCB_PFC_UP_ATTR_ALL] = {.type = NLA_FLAG},
};
static const struct nla_policy dcbnl_bcn_nest[...] = {
[DCB_BCN_ATTR_RP_0] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_1] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_2] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_3] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_4] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_5] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_6] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_7] = {.type = NLA_U8},
[DCB_BCN_ATTR_RP_ALL] = {.type = NLA_FLAG},
// from here is somewhat different
[DCB_BCN_ATTR_BCNA_0] = {.type = NLA_U32},
...
[DCB_BCN_ATTR_ALL] = {.type = NLA_FLAG},
};
Therefore, the current code is buggy and this
nla_parse_nested_deprecated could overflow the dcbnl_pfc_up_nest and use
the adjacent nla_policy to parse attributes from DCB_BCN_ATTR_BCNA_0.
Hence use the correct policy dcbnl_bcn_nest to parse the nested
tb[DCB_ATTR_BCN] TLV.
Fixes: 859ee3c438 ("DCB: Add support for DCB BCN")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230801013248.87240-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These options clearly turn *off* XSAVE YMM support. Correct the
typo.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 553a5c03e9 ("x86/speculation: Add force option to GDS mitigation")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Michael Chan says:
====================
bnxt_en: 2 XDP bug fixes
The first patch fixes XDP page pool logic on systems with page size >=
64K. The second patch fixes the max_mtu setting when an XDP program
supporting multi buffers is attached.
====================
Link: https://lore.kernel.org/r/20230731142043.58855-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As documented in acd7aaf51b ("netsec: ignore 'phy-mode' device
property on ACPI systems") the SocioNext SynQuacer platform ships with
firmware defining the PHY mode as RGMII even though the physical
configuration of the PHY is for TX and RX delays. Since bbc4d71d63
("net: phy: realtek: fix rtl8211e rx/tx delay config") this has caused
misconfiguration of the PHY, rendering the network unusable.
This was worked around for ACPI by ignoring the phy-mode property but
the system is also used with DT. For DT instead if we're running on a
SynQuacer force a working PHY mode, as well as the standard EDK2
firmware with DT there are also some of these systems that use u-boot
and might not initialise the PHY if not netbooting. Newer firmware
imagaes for at least EDK2 are available from Linaro so print a warning
when doing this.
Fixes: 533dd11a12 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230731-synquacer-net-v3-1-944be5f06428@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
in korina_probe(), the return value of clk_prepare_enable()
should be checked since it might fail. we can use
devm_clk_get_optional_enabled() instead of devm_clk_get_optional()
and clk_prepare_enable() to automatically handle the error.
Fixes: e4cd854ec4 ("net: korina: Get mdio input clock via common clock framework")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Link: https://lore.kernel.org/r/20230731090535.21416-1-ruc_gongyuanjun@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Samsung PM9B1 512G SSD found in some Lenovo Yoga 7 14ARB7 laptop units
reports eui as 0001000200030004 when resuming from s2idle, causing the
device to be removed with this error in dmesg:
nvme nvme0: identifiers changed for nsid 1
To fix this, add a quirk to ignore namespace identifiers for this device.
Signed-off-by: August Wikerfors <git@augustwikerfors.se>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The trailing array member of struct tx_buf was defined as a 1-element
array, but used as a flexible array. This was resulting in build warnings:
In function 'fortify_memset_chk',
inlined from 'memset_io' at /kisskb/src/arch/mips/include/asm/io.h:486:2,
inlined from 'build_auth_frame' at /kisskb/src/drivers/net/wireless/legacy/ray_cs.c:2697:2:
/kisskb/src/include/linux/fortify-string.h:493:25: error: call to '__write_overflow_field' declared with attribute warning:
detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
493 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Replace it with an actual flexible array. Binary difference comparison
shows a single change in output:
│ drivers/net/wireless/legacy/ray_cs.c:883
│ lea 0x1c(%rbp),%r13d
│ - cmp $0x7c3,%r13d
│ + cmp $0x7c4,%r13d
This is from:
if (len + TX_HEADER_LENGTH > TX_BUF_SIZE) {
specifically:
#define TX_BUF_SIZE (2048 - sizeof(struct tx_msg))
This appears to have been originally buggy, so the change is correct.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/all/88f83d73-781d-bdc-126-aa629cb368c@linux-m68k.org
Cc: Kalle Valo <kvalo@kernel.org>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230728231245.never.309-kees@kernel.org
Depends on the interface used, the RAPL registers can be either MSR
indexes or memory mapped IO addresses. Current RAPL common code uses u64
to save both MSR and memory mapped IO registers. With this, when
handling register address with an __iomem annotation, it triggers a
sparse warning like below:
sparse warnings: (new ones prefixed by >>)
>> drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected unsigned long long [usertype] *tpmi_rapl_regs @@ got void [noderef] __iomem * @@
drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: expected unsigned long long [usertype] *tpmi_rapl_regs
drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: got void [noderef] __iomem *
Fix the problem by using a union to save the registers instead.
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307031405.dy3druuy-lkp@intel.com/
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When booting on e6500 with an ELF v2 ABI kernel, the secondary threads do
not start correctly:
[ 0.051118] smp: Bringing up secondary CPUs ...
[ 5.072700] Processor 1 is stuck.
This occurs because the startup code is written to use function
descriptors when loading the entry point for the secondary threads. When
building with ELF v2 ABI there are no function descriptors, and the code
loads junk values for the entry point address.
Fix it by using ppc_function_entry() in C, and DOTSYM() in asm, both of
which work correctly for ELF v2 ABI as well as ELF v1 ABI kernels.
Fixes: 8c5fa3b5c4 ("powerpc/64: Make ELFv2 the default for big-endian builds")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230801102650.48705-1-mpe@ellerman.id.au
In destruction flow, the assignment of NULL to xso->dev
caused to skip of xfrm_dev_state_free() call, which was
called in xfrm_state_put(to_put) routine.
Instead of open-coded variant of xfrm_dev_state_delete() and
xfrm_dev_state_free(), let's use them directly.
Fixes: f8a70afafc ("xfrm: add TX datapath support for IPsec packet offload mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The policy memory was released but not HW driver data. Add
call to xfrm_dev_policy_delete(), so drivers will have a chance
to release their resources.
Fixes: 919e43fad5 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading do infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to
significant (up to 10 times) increase of SLAB utilization by i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d858 ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 946e047a3d)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When handling deduplicated compressed data, there can be multiple
decompressed extents pointing to the same compressed data in one shot.
In such cases, the bvecs which belong to the longest extent will be
selected as the primary bvecs for real decompressors to decode and the
other duplicated bvecs will be directly copied from the primary bvecs.
Previously, only relative offsets of the longest extent were checked to
decompress the primary bvecs. On rare occasions, it can be incorrect
if there are several extents with the same start relative offset.
As a result, some short bvecs could be selected for decompression and
then cause data corruption.
For example, as Shijie Sun reported off-list, considering the following
extents of a file:
117: 903345.. 915250 | 11905 : 385024.. 389120 | 4096
...
119: 919729.. 930323 | 10594 : 385024.. 389120 | 4096
...
124: 968881.. 980786 | 11905 : 385024.. 389120 | 4096
The start relative offset is the same: 2225, but extent 119 (919729..
930323) is shorter than the others.
Let's restrict the bvec length in addition to the start offset if bvecs
are not full.
Reported-by: Shijie Sun <sunshijie@xiaomi.com>
Fixes: 5c2a64252c ("erofs: introduce partial-referenced pclusters")
Tested-by Shijie Sun <sunshijie@xiaomi.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230719065459.60083-1-hsiangkao@linux.alibaba.com
Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC
allocation later in sk_psock_init_link on PREEMPT_RT kernels, since
GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist
patchset notes for details).
This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to
BUG (sleeping function called from invalid context) on RT kernels.
preempt_disable was introduced together with lock_sk and rcu_read_lock
in commit 99ba2b5aba ("bpf: sockhash, disallow bpf_tcp_close and update
in parallel"), probably to match disabled migration of BPF programs, and
is no longer necessary.
Remove preempt_disable to fix BUG in sock_map_update_common on RT.
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/
Fixes: 99ba2b5aba ("bpf: sockhash, disallow bpf_tcp_close and update in parallel")
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230728064411.305576-1-tglozar@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We are missing the serial core controller id for the serial core port
name. Let's fix the issue for sane sysfs output, and to avoid issues
addressing serial ports later on.
And as we're now showing the controller id, the "ctrl" and "port" prefix
for the DEVNAME become useless, we can just drop them. Let's standardize on
DEVNAME:0 for controller name, where 0 is the controller id. And
DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
This makes the sysfs output nicer, on qemu for example:
$ ls /sys/bus/serial-base/devices
00:04:0 serial8250:0 serial8250:0.2
00:04:0.0 serial8250:0.1 serial8250:0.3
Fixes: 84a9582fd2 ("serial: core: Start managing serial controllers to enable runtime PM")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230725054216.45696-4-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The serial core port id should be serial core controller specific port
instance, which is not always the port->line index.
For example, 8250 driver maps a number of legacy ports, and when a
hardware specific device driver takes over, we typically have one
driver instance for each port. Let's instead add port->port_id to
keep track serial ports mapped to each serial core controller instance.
Currently this is only a cosmetic issue for the serial core port device
names. The issue can be noticed looking at /sys/bus/serial-base/devices
for example though. Let's fix the issue to avoid port addressing issues
later on.
Fixes: 84a9582fd2 ("serial: core: Start managing serial controllers to enable runtime PM")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20230725054216.45696-3-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not read the data register to clear the error flags for lpuart32
platforms, the additional read may cause the receive FIFO underflow
since the DMA has already read the data register.
Actually all lpuart32 platforms support write 1 to clear those error
bits, let's use this method to better clear the error flags.
Fixes: 42b68768e5 ("serial: fsl_lpuart: DMA support for 32-bit variant")
Cc: stable <stable@kernel.org>
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://lore.kernel.org/r/20230801022304.24251-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
valis says:
====================
net/sched Bind logic fixes for cls_fw, cls_u32 and cls_route
Three classifiers (cls_fw, cls_u32 and cls_route) always copy
tcf_result struct into the new instance of the filter on update.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
This patch set fixes this issue in all affected classifiers by no longer
copying the tcf_result struct from the old filter.
====================
Link: https://lore.kernel.org/r/20230729123202.72406-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When route4_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: 1109c00547 ("net: sched: RCU cls_route")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: e35a8ee599 ("net: sched: fw use RCU")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When u32_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Fixes: de5df63228 ("net: sched: cls_u32 changes to knode must appear atomic to readers")
Reported-by: valis <sec@valis.email>
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hou Tao says:
====================
The patchset fixes two reported warning in cpu-map when running
xdp_redirect_cpu and some RT threads concurrently. Patch #1 fixes
the warning in __cpu_map_ring_cleanup() when kthread is stopped
prematurely. Patch #2 fixes the warning in __xdp_return() when
there are pending skbs in ptr_ring.
Please see individual patches for more details. And comments are always
welcome.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The following warning was reported when running xdp_redirect_cpu with
both skb-mode and stress-mode enabled:
------------[ cut here ]------------
Incorrect XDP memory type (-2128176192) usage
WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405
Modules linked in:
CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events __cpu_map_entry_free
RIP: 0010:__xdp_return+0x1e4/0x4a0
......
Call Trace:
<TASK>
? show_regs+0x65/0x70
? __warn+0xa5/0x240
? __xdp_return+0x1e4/0x4a0
......
xdp_return_frame+0x4d/0x150
__cpu_map_entry_free+0xf9/0x230
process_one_work+0x6b0/0xb80
worker_thread+0x96/0x720
kthread+0x1a5/0x1f0
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
The reason for the warning is twofold. One is due to the kthread
cpu_map_kthread_run() is stopped prematurely. Another one is
__cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in
ptr_ring as XDP frames.
Prematurely-stopped kthread will be fixed by the preceding patch and
ptr_ring will be empty when __cpu_map_ring_cleanup() is called. But
as the comments in __cpu_map_ring_cleanup() said, handling and freeing
skbs in ptr_ring as well to "catch any broken behaviour gracefully".
Fixes: 11941f8a85 ("bpf: cpumap: Implement generic cpumap")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The following warning was reported when running stress-mode enabled
xdp_redirect_cpu with some RT threads:
------------[ cut here ]------------
WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135
CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events cpu_map_kthread_stop
RIP: 0010:put_cpu_map_entry+0xda/0x220
......
Call Trace:
<TASK>
? show_regs+0x65/0x70
? __warn+0xa5/0x240
......
? put_cpu_map_entry+0xda/0x220
cpu_map_kthread_stop+0x41/0x60
process_one_work+0x6b0/0xb80
worker_thread+0x96/0x720
kthread+0x1a5/0x1f0
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
The root cause is the same as commit 4369016497 ("bpf: cpumap: Fix memory
leak in cpu_map_update_elem"). The kthread is stopped prematurely by
kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call
cpu_map_kthread_run() at all but XDP program has already queued some
frames or skbs into ptr_ring. So when __cpu_map_ring_cleanup() checks
the ptr_ring, it will find it was not emptied and report a warning.
An alternative fix is to use __cpu_map_ring_cleanup() to drop these
pending frames or skbs when kthread_stop() returns -EINTR, but it may
confuse the user, because these frames or skbs have been handled
correctly by XDP program. So instead of dropping these frames or skbs,
just make sure the per-cpu kthread is running before
__cpu_map_entry_alloc() returns.
After apply the fix, the error handle for kthread_stop() will be
unnecessary because it will always return 0, so just remove it.
Fixes: 6710e11269 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The two mbox-related mutexes are destroyed in octep_ctrl_mbox_uninit(),
but the corresponding mutex_init calls were missing.
A "DEBUG_LOCKS_WARN_ON(lock->magic != lock)" warning was emitted with
CONFIG_DEBUG_MUTEXES on.
Initialize the two mutexes in octep_ctrl_mbox_init().
Fixes: 577f0d1b1c ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230729151516.24153-1-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similarly to other recently fixed drivers make sure we don't
try to access XDP or page pool APIs when NAPI budget is 0.
NAPI budget of 0 may mean that we are in netpoll.
This may result in running software IRQs in hard IRQ context,
leading to deadlocks or crashes.
To make sure bnapi->tx_pkts don't get wiped without handling
the events, move clearing the field into the handler itself.
Remember to clear tx_pkts after reset (bnxt_enable_napi())
as it's technically possible that netpoll will accumulate
some tx_pkts and then a reset will happen, leaving tx_pkts
out of sync with reality.
Fixes: 322b87ca55 ("bnxt_en: add page_pool support")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230728205020.2784844-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(lightly modified commit message mostly by Linus Torvalds)
The parsing code for /proc/scsi/scsi is disgusting and broken. We should
have just used 'sscanf()' or something simple like that, but the logic may
actually predate our kernel sscanf library routine for all I know. It
certainly predates both git and BK histories.
And we can't change it to be something sane like that now, because the
string matching at the start is done case-insensitively, and the separator
parsing between numbers isn't done at all, so *any* separator will work,
including a possible terminating NUL character.
This interface is root-only, and entirely for legacy use, so there is
absolutely no point in trying to tighten up the parsing. Because any
separator has traditionally worked, it's entirely possible that people have
used random characters rather than the suggested space.
So don't bother to try to pretty it up, and let's just make a minimal patch
that can be back-ported and we can forget about this whole sorry thing for
another two decades.
Just make it at least not read past the end of the supplied data.
Link: https://lore.kernel.org/linux-scsi/b570f5fe-cb7c-863a-6ed9-f6774c219b88@cybernetics.com/
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin K Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K Petersen <martin.petersen@oracle.com>
fnic_clean_pending_aborts() was returning a non-zero value irrespective of
failure or success. This caused the caller of this function to assume that
the device reset had failed, even though it would succeed in most cases. As
a consequence, a successful device reset would escalate to host reset.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20230727193919.2519-1-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hyper-V provides the ability to connect Fibre Channel LUNs to the host
system and present them in a guest VM as a SCSI device. I/O to the vFC
device is handled by the storvsc driver. The storvsc driver includes a
partial integration with the FC transport implemented in the generic
portion of the Linux SCSI subsystem so that FC attributes can be displayed
in /sys. However, the partial integration means that some aspects of vFC
don't work properly. Unfortunately, a full and correct integration isn't
practical because of limitations in what Hyper-V provides to the guest.
In particular, in the context of Hyper-V storvsc, the FC transport timeout
function fc_eh_timed_out() causes a kernel panic because it can't find the
rport and dereferences a NULL pointer. The original patch that added the
call from storvsc_eh_timed_out() to fc_eh_timed_out() is faulty in this
regard.
In many cases a timeout is due to a transient condition, so the situation
can be improved by just continuing to wait like with other I/O requests
issued by storvsc, and avoiding the guaranteed panic. For a permanent
failure, continuing to wait may result in a hung thread instead of a panic,
which again may be better.
So fix the panic by removing the storvsc call to fc_eh_timed_out(). This
allows storvsc to keep waiting for a response. The change has been tested
by users who experienced a panic in fc_eh_timed_out() due to transient
timeouts, and it solves their problem.
In the future we may want to deprecate the vFC functionality in storvsc
since it can't be fully fixed. But it has current users for whom it is
working well enough, so it should probably stay for a while longer.
Fixes: 3930d73098 ("scsi: storvsc: use default I/O timeout handler for FC devices")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1690606764-79669-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The ACPI device CSC3556 is a Cirrus Logic CS35L56 mono amplifier which
is used in multiples, and can be connected either to I2C or SPI.
There will be multiple instances under the same Device() node. Add it
to ignore_serial_bus_ids and handle it in the serial-multi-instantiate
driver.
There can be a 5th I2cSerialBusV2, but this is an alias address and doesn't
represent a real device. Ignore this by having a dummy 5th entry in the
serial-multi-instantiate instance list with the name of a non-existent
driver, on the same pattern as done for bsg2150.
Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230728111345.7224-1-rf@opensource.cirrus.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
struct pwm_device::pwm is a write-only variable in the pwm core and used
nowhere apart from this and another dev_dbg. So it isn't useful to
identify the used PWM. Emit the PWM's label instead in the debug
message.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
in mmphw_probe(), check the return value of clk_prepare_enable()
and return the error code if clk_prepare_enable() returns an
unexpected value.
Fixes: d63028c389 ("video: mmp display controller support")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Replacing zero-length arrays with C99 flexible-array members
because they are deprecated. Use the new DECLARE_FLEX_ARRAY()
auxiliary macro instead of defining a zero-length array.
This fixes warnings such as:
./drivers/video/fbdev/amifb.c:690:6-10: WARNING use flexible-array member instead
Signed-off-by: Atul Raut <rauji.raut@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
The timer dev->stat_monitor can schedule the delayed work dev->wq and
the delayed work dev->wq can also arm the dev->stat_monitor timer.
When the device is detaching, the net_device will be deallocated. but
the net_device private data could still be dereferenced in delayed work
or timer handler. As a result, the UAF bugs will happen.
One racy situation is shown below:
(Thread 1) | (Thread 2)
lan78xx_stat_monitor() |
... | lan78xx_disconnect()
lan78xx_defer_kevent() | ...
... | cancel_delayed_work_sync(&dev->wq);
schedule_delayed_work() | ...
(wait some time) | free_netdev(net); //free net_device
lan78xx_delayedwork() |
//use net_device private data |
dev-> //use |
Although we use cancel_delayed_work_sync() to cancel the delayed work
in lan78xx_disconnect(), it could still be scheduled in timer handler
lan78xx_stat_monitor().
Another racy situation is shown below:
(Thread 1) | (Thread 2)
lan78xx_delayedwork |
mod_timer() | lan78xx_disconnect()
| cancel_delayed_work_sync()
(wait some time) | if (timer_pending(&dev->stat_monitor))
| del_timer_sync(&dev->stat_monitor);
lan78xx_stat_monitor() | ...
lan78xx_defer_kevent() | free_netdev(net); //free
//use net_device private data|
dev-> //use |
Although we use del_timer_sync() to delete the timer, the function
timer_pending() returns 0 when the timer is activated. As a result,
the del_timer_sync() will not be executed and the timer could be
re-armed.
In order to mitigate this bug, We use timer_shutdown_sync() to shutdown
the timer and then use cancel_delayed_work_sync() to cancel the delayed
work. As a result, the net_device could be deallocated safely.
What's more, the dev->flags is set to EVENT_DEV_DISCONNECT in
lan78xx_disconnect(). But it could still be set to EVENT_STAT_UPDATE
in lan78xx_stat_monitor(). So this patch put the set_bit() behind
timer_shutdown_sync().
Fixes: 77dfff5bb7 ("lan78xx: Fix race condition in disconnect handling")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixing the ODP registration flow to set the iova correctly.
The calculation in ib_umem_num_dma_blocks() function assumes the iova of
the umem is set correctly.
When iova is not set, the calculation in ib_umem_num_dma_blocks() is
equivalent to length/page_size, which is true only when memory is aligned.
For unaligned memory, iova must be set for the ALIGN() in the
ib_umem_num_dma_blocks() to take effect and return a correct value.
mlx5_ib uses ib_umem_num_dma_blocks() to decide the mkey size to use for
the MR. Without this fix, when registering unaligned ODP MR, a wrong
size mkey might be chosen and this might cause the UMR to fail.
UMR would fail over insufficient size to update the mkey translation:
infiniband mlx5_0: dump_cqe:273:(pid 0): dump error cqe
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 0f 00 78 06 25 00 00 58 00 da ac d2
infiniband mlx5_0: mlx5_ib_post_send_wait:806:(pid 20311): reg umr
failed (6)
infiniband mlx5_0: pagefault_real_mr:661:(pid 20311): Failed to update
mkey page tables
Fixes: f0093fb1a7 ("RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr")
Fixes: a665aca89a ("RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()")
Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/3d4be7ca2155bf239dd8c00a2d25974a92c26ab8.1689757344.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
1. Use unevaluatedProperties
It's needed to allow ethernet-controller.yaml properties work correctly.
2. Drop unneeded phy-handle/phy-mode
3. Don't require phy-handle
Some SoCs may use fixed link.
For in-kernel MT7621 DTS files this fixes following errors:
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'phy-handle' is a required property
From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'phy-handle' is a required property
From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzkaller found zero division error [0] in div_s64_rem() called from
get_cycle_time_elapsed(), where sched->cycle_time is the divisor.
We have tests in parse_taprio_schedule() so that cycle_time will never
be 0, and actually cycle_time is not 0 in get_cycle_time_elapsed().
The problem is that the types of divisor are different; cycle_time is
s64, but the argument of div_s64_rem() is s32.
syzkaller fed this input and 0x100000000 is cast to s32 to be 0.
@TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME={0xc, 0x8, 0x100000000}
We use s64 for cycle_time to cast it to ktime_t, so let's keep it and
set max for cycle_time.
While at it, we prevent overflow in setup_txtime() and add another
test in parse_taprio_schedule() to check if cycle_time overflows.
Also, we add a new tdc test case for this issue.
[0]:
divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 103 Comm: kworker/1:3 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:div_s64_rem include/linux/math64.h:42 [inline]
RIP: 0010:get_cycle_time_elapsed net/sched/sch_taprio.c:223 [inline]
RIP: 0010:find_entry_to_transmit+0x252/0x7e0 net/sched/sch_taprio.c:344
Code: 3c 02 00 0f 85 5e 05 00 00 48 8b 4c 24 08 4d 8b bd 40 01 00 00 48 8b 7c 24 48 48 89 c8 4c 29 f8 48 63 f7 48 99 48 89 74 24 70 <48> f7 fe 48 29 d1 48 8d 04 0f 49 89 cc 48 89 44 24 20 49 8d 85 10
RSP: 0018:ffffc90000acf260 EFLAGS: 00010206
RAX: 177450e0347560cf RBX: 0000000000000000 RCX: 177450e0347560cf
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000100000000
RBP: 0000000000000056 R08: 0000000000000000 R09: ffffed10020a0934
R10: ffff8880105049a7 R11: ffff88806cf3a520 R12: ffff888010504800
R13: ffff88800c00d800 R14: ffff8880105049a0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0edf84f0e8 CR3: 000000000d73c002 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
<TASK>
get_packet_txtime net/sched/sch_taprio.c:508 [inline]
taprio_enqueue_one+0x900/0xff0 net/sched/sch_taprio.c:577
taprio_enqueue+0x378/0xae0 net/sched/sch_taprio.c:658
dev_qdisc_enqueue+0x46/0x170 net/core/dev.c:3732
__dev_xmit_skb net/core/dev.c:3821 [inline]
__dev_queue_xmit+0x1b2f/0x3000 net/core/dev.c:4169
dev_queue_xmit include/linux/netdevice.h:3088 [inline]
neigh_resolve_output net/core/neighbour.c:1552 [inline]
neigh_resolve_output+0x4a7/0x780 net/core/neighbour.c:1532
neigh_output include/net/neighbour.h:544 [inline]
ip6_finish_output2+0x924/0x17d0 net/ipv6/ip6_output.c:135
__ip6_finish_output+0x620/0xaa0 net/ipv6/ip6_output.c:196
ip6_finish_output net/ipv6/ip6_output.c:207 [inline]
NF_HOOK_COND include/linux/netfilter.h:292 [inline]
ip6_output+0x206/0x410 net/ipv6/ip6_output.c:228
dst_output include/net/dst.h:458 [inline]
NF_HOOK.constprop.0+0xea/0x260 include/linux/netfilter.h:303
ndisc_send_skb+0x872/0xe80 net/ipv6/ndisc.c:508
ndisc_send_ns+0xb5/0x130 net/ipv6/ndisc.c:666
addrconf_dad_work+0xc14/0x13f0 net/ipv6/addrconf.c:4175
process_one_work+0x92c/0x13a0 kernel/workqueue.c:2597
worker_thread+0x60f/0x1240 kernel/workqueue.c:2748
kthread+0x2fe/0x3f0 kernel/kthread.c:389
ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
</TASK>
Modules linked in:
Fixes: 4cfd5779bd ("taprio: Add support for txtime-assist mode")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Co-developed-by: Eric Dumazet <edumazet@google.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous commit 4e484b3e96 ("xfrm: rate limit SA mapping change
message to user space") added one additional attribute named
XFRMA_MTIMER_THRESH and described its type at compat_policy
(net/xfrm/xfrm_compat.c).
However, the author forgot to also describe the nla_policy at
xfrma_policy (net/xfrm/xfrm_user.c). Hence, this suppose NLA_U32 (4
bytes) value can be faked as empty (0 bytes) by a malicious user, which
leads to 4 bytes overflow read and heap information leak when parsing
nlattrs.
To exploit this, one malicious user can spray the SLUB objects and then
leverage this 4 bytes OOB read to leak the heap data into
x->mapping_maxage (see xfrm_update_ae_params(...)), and leak it to
userspace via copy_to_user_state_extra(...).
The above bug is assigned CVE-2023-3773. To fix it, this commit just
completes the nla_policy description for XFRMA_MTIMER_THRESH, which
enforces the length check and avoids such OOB read.
Fixes: 4e484b3e96 ("xfrm: rate limit SA mapping change message to user space")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
nfsd_splice_actor() has a clause in its loop that chops up a compound page
into individual pages such that if the same page is seen twice in a row, it
is discarded the second time. This is a problem with the advent of
shmem_splice_read() as that inserts zero_pages into the pipe in lieu of
pages that aren't present in the pagecache.
Fix this by assuming that the last page is being extended only if the
currently stored length + starting offset is not currently on a page
boundary.
This can be tested by NFS-exporting a tmpfs filesystem on the test machine
and truncating it to more than a page in size (eg. truncate -s 8192) and
then reading it by NFS. The first page will be all zeros, but thereafter
garbage will be read.
Note: I wonder if we can ever get a situation now where we get a splice
that gives us contiguous parts of a page in separate actor calls. As NFSD
can only be splicing from a file (I think), there are only three sources of
the page: copy_splice_read(), shmem_splice_read() and file_splice_read().
The first allocates pages for the data it reads, so the problem cannot
occur; the second should never see a partial page; and the third waits for
each page to become available before we're allowed to read from it.
Fixes: bd194b1871 ("shmem: Implement splice-read")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
cc: Hugh Dickins <hughd@google.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If the tuning step is not set, the tuning step is set to 1.
For some sd cards, the following Tuning timeout will occur.
Tuning failed, falling back to fixed sampling clock
So set the default tuning step. This refers to the NXP vendor's
commit below:
https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/
arch/arm/boot/dts/imx6sx.dtsi#L1108-L1109
Fixes: 1e336aa0c0 ("mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The CSI1 PHY reference clock is limited to 125 MHz according to:
i.MX 8M Mini Applications Processor Reference Manual, Rev. 3, 11/2020
Table 5-1. Clock Root Table (continued) / page 307
Slice Index n = 123 .
Currently the IMX8MM_CLK_CSI1_PHY_REF clock is configured to be
fed directly from 1 GHz PLL2 , which overclocks them. Instead, drop
the configuration altogether, which defaults the clock to 24 MHz REF
clock input, which for the PHY reference clock is just fine.
Based on a patch from Marek Vasut for the imx8mn.
Fixes: e523b7c54c ("arm64: dts: imx8mm: Add CSI nodes")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Marek Vasut <marex@denx.de>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The CSI1 PHY reference clock are limited to 125 MHz according to:
i.MX 8M Nano Applications Processor Reference Manual, Rev. 2, 07/2022
Table 5-1. Clock Root Table (continued) / page 319
Slice Index n = 123 .
Currently those IMX8MN_CLK_CSI1_PHY_REF clock are configured to be
fed directly from 1 GHz PLL2 , which overclocks them . Instead, drop
the configuration altogether, which defaults the clock to 24 MHz REF
clock input, which for the PHY reference clock is just fine.
Fixes: ae9279f301 ("arm64: dts: imx8mn: Add CSI and ISI Nodes")
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
If the tuning step is not set, the tuning step is set to 1.
For some sd cards, the following Tuning timeout will occur.
Tuning failed, falling back to fixed sampling clock
mmc0: Tuning failed, falling back to fixed sampling clock
So set the default tuning step. This refers to the NXP vendor's
commit below:
https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/
arch/arm/boot/dts/imx7s.dtsi#L1216-L1217
Fixes: 1e336aa0c0 ("mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
RTC interrupt level should be set to "LOW". This was revealed by the
introduction of commit:
f181987ef4 ("rtc: m41t80: use IRQ flags obtained from fwnode")
which changed the way IRQ type is obtained.
Signed-off-by: Andrej Picej <andrej.picej@norik.com>
Reviewed-by: Stefan Riedmüller <s.riedmueller@phytec.de>
Fixes: 800d595151 ("ARM: dts: imx6: Add initial support for phyBOARD-Mira")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Remove the LDB endpoint description from the common imx6sx.dtsi
as it causes regression for boards that has the LCDIF connected
directly to a parallel display.
Let the LDB endpoint be described in the board devicetree file
instead.
Cc: stable@vger.kernel.org
Fixes: b74edf626c ("ARM: dts: imx6sx: Add LDB support")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Eric Dumazet says:
====================
net: annotate data-races
This series was inspired by a syzbot/KCSAN report.
This will later also permit some optimizations,
like not having to lock the socket while reading/writing
some of its fields.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_getsockopt() runs locklessly. This means sk->sk_priority
can be read while other threads are changing its value.
Other reads also happen without socket lock being held.
Add missing annotations where needed.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a prior commit I forgot that sk_getsockopt() reads
sk->sk_ll_usec without holding a lock.
Fixes: 0dbffbb533 ("net: annotate data race around sk_ll_usec")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_getsockopt() runs locklessly, thus we need to annotate the read
of sk->sk_peek_off.
While we are at it, add corresponding annotations to sk_set_peek_off()
and unix_set_peek_off().
Fixes: b9bb53f383 ("sock: convert sk_peek_offset functions to WRITE_ONCE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800 ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvbuf locklessly.
Fixes: ebb3b78db7 ("tcp: annotate sk->sk_rcvbuf lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_sndbuf locklessly.
Fixes: e292f05e0d ("tcp: annotate sk->sk_sndbuf lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_getsockopt() runs without locks, we must add annotations
to sk->sk_rcvtimeo and sk->sk_sndtimeo.
In the future we might allow fetching these fields before
we lock the socket in TCP fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvlowat locklessly.
Fixes: eac66402d1 ("net: annotate sk->sk_rcvlowat lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate
can be read while other threads are changing its value.
Fixes: 62748f32d5 ("net: introduce SO_MAX_PACING_RATE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_getsockopt() runs locklessly. This means sk->sk_txrehash
can be read while other threads are changing its value.
Other locations were handled in commit cb6cd2cec7
("tcp: Change SYN ACK retransmit behaviour to account for rehash")
Fixes: 26859240e4 ("txhash: Add socket option to control TX hash rethink behavior")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem
can be read while other threads are changing its value.
Add missing annotations where they are needed.
Fixes: 2bb2f5fb21 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
`udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
device from CB. The fix changes it to fetch the device from `skb->dev`.
l3mdev case requires special attention since it has a master and a slave
device.
Fixes: a6024562ff ("udp: Add GRO functions to UDP socket")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here we've got to a situation when tasklet called usleep_range() in PTT
acquire logic, thus welcome to the "scheduling while atomic" BUG().
BUG: scheduling while atomic: swapper/24/0/0x00000100
[<ffffffffb41c6199>] schedule+0x29/0x70
[<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150
[<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20
[<ffffffffb41c3bcf>] usleep_range+0x4f/0x70
[<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed]
[<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed]
[<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed]
[<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed]
qed_mcp_send_protocol_stats
case MFW_DRV_MSG_GET_LAN_STATS:
case MFW_DRV_MSG_GET_FCOE_STATS:
case MFW_DRV_MSG_GET_ISCSI_STATS:
case MFW_DRV_MSG_GET_RDMA_STATS:
[<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed]
qed_int_assertion
qed_int_attentions
[<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed]
[<ffffffffb3aa7623>] tasklet_action+0x83/0x140
[<ffffffffb41d9125>] __do_softirq+0x125/0x2bb
[<ffffffffb41d560c>] call_softirq+0x1c/0x30
[<ffffffffb3a30645>] do_softirq+0x65/0xa0
[<ffffffffb3aa78d5>] irq_exit+0x105/0x110
[<ffffffffb41d8996>] do_IRQ+0x56/0xf0
Fix this by making caller to provide the context whether it could be in
atomic context flow or not when getting stats from QED driver.
QED driver based on the context provided decide to schedule out or not
when acquiring the PTT BAR window.
We faced the BUG_ON() while getting vport stats, but according to the
code same issue could happen for fcoe and iscsi statistics as well, so
fixing them too.
Fixes: 6c75424612 ("qed: Add support for NCSI statistics.")
Fixes: 1e128c8129 ("qed: Add support for hardware offloaded FCoE.")
Fixes: 2f2b2614e8 ("qed: Provide iSCSI statistics to management")
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: David Miller <davem@davemloft.net>
Cc: Manish Chopra <manishc@marvell.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit (SHA1: 5c844d57aa) provided code
to apply "Module 6: Certain PHY registers must be written as pairs instead
of singly" errata for KSZ9477 as this chip for certain PHY registers
(0xN120 to 0xN13F, N=1,2,3,4,5) must be accesses as 32 bit words instead
of 16 or 8 bit access.
Otherwise, adjacent registers (no matter if reserved or not) are
overwritten with 0x0.
Without this patch some registers (e.g. 0x113c or 0x1134) required for 32
bit access are out of valid regmap ranges.
As a result, following error is observed and KSZ9477 is not properly
configured:
ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0
The solution is to modify regmap_reg_range to allow accesses with 4 bytes
boundaries.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The clock data is an array of struct clk_bulk_data, so make sure to
allocate enough memory.
Fixes: d8ca113724 ("net: stmmac: tegra: Add MGBE support")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9fb6c9b3fe ("s390/sthyi: add cache to store hypervisor info")
added cache handling for store hypervisor info. This also changed the
possible return code for sthyi_fill().
Instead of only returning a condition code like the sthyi instruction would
do, it can now also return a negative error value (-ENOMEM). handle_styhi()
was not changed accordingly. In case of an error, the negative error value
would incorrectly injected into the guest PSW.
Add proper error handling to prevent this, and update the comment which
describes the possible return values of sthyi_fill().
Fixes: 9fb6c9b3fe ("s390/sthyi: add cache to store hypervisor info")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20230727182939.2050744-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
getline() returns -1 at EOF as well as on error. It also doesn't set
errno to 0 on success, so initialize it to 0 before using errno to check
for an error condition. See the paragraph here [1]:
For some system calls and library functions (e.g., getpriority(2)),
-1 is a valid return on success. In such cases, a successful return
can be distinguished from an error return by setting errno to zero
before the call, and then, if the call returns a status that indicates
that an error may have occurred, checking to see if errno has a
nonzero value.
Bear has a bug [2] that launches processes with errno set and causes the
following build failure:
$ bear -- make LLVM=1
...
LD .tmp_vmlinux.kallsyms1
NM .tmp_vmlinux.kallsyms1.syms
KSYMS .tmp_vmlinux.kallsyms1.S
read_symbol: Invalid argument
[1]: https://linux.die.net/man/3/errno
[2]: https://github.com/rizsotto/Bear/issues/469
Fixes: 1c975da56a ("scripts/kallsyms: remove KSYM_NAME_LEN_BUFFER")
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
As &hc->lock is acquired by both timer _hfcpci_softirq() and hardirq
hfcpci_int(), the timer should disable irq before lock acquisition
otherwise deadlock could happen if the timmer is preemtped by the hadr irq.
Possible deadlock scenario:
hfcpci_softirq() (timer)
-> _hfcpci_softirq()
-> spin_lock(&hc->lock);
<irq interruption>
-> hfcpci_int()
-> spin_lock(&hc->lock); (deadlock here)
This flaw was found by an experimental static analysis tool I am developing
for irq-related deadlock.
The tentative patch fixes the potential deadlock by spin_lock_irq()
in timer.
Fixes: b36b654a7e ("mISDN: Create /sys/class/mISDN")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230727085619.7419-1-dg573847474@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A match entry is uniquely identified with an "address" or "path" in the
form of: hashtable ID(12b):bucketid(8b):nodeid(12b).
When creating table match entries all of hash table id, bucket id and
node (match entry id) are needed to be either specified by the user or
reasonable in-kernel defaults are used. The in-kernel default for a table id is
0x800(omnipresent root table); for bucketid it is 0x0. Prior to this fix there
was none for a nodeid i.e. the code assumed that the user passed the correct
nodeid and if the user passes a nodeid of 0 (as Mingi Cho did) then that is what
was used. But nodeid of 0 is reserved for identifying the table. This is not
a problem until we dump. The dump code notices that the nodeid is zero and
assumes it is referencing a table and therefore references table struct
tc_u_hnode instead of what was created i.e match entry struct tc_u_knode.
Ming does an equivalent of:
tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \
protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok
Essentially specifying a table id 0, bucketid 1 and nodeid of zero
Tableid 0 is remapped to the default of 0x800.
Bucketid 1 is ignored and defaults to 0x00.
Nodeid was assumed to be what Ming passed - 0x000
dumping before fix shows:
~$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591
Note that the last line reports a table instead of a match entry
(you can tell this because it says "ht divisor...").
As a result of reporting the wrong data type (misinterpretting of struct
tc_u_knode as being struct tc_u_hnode) the divisor is reported with value
of -30591. Ming identified this as part of the heap address
(physmap_base is 0xffff8880 (-30591 - 1)).
The fix is to ensure that when table entry matches are added and no
nodeid is specified (i.e nodeid == 0) then we get the next available
nodeid from the table's pool.
After the fix, this is what the dump shows:
$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw
match 0a000001/ffffffff at 12
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
Reported-by: Mingi Cho <mgcho.minic@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230726135151.416917-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the device does not support Sanitize or Secure Erase commands,
hide the respective sysfs interfaces such that the operation can
never be attempted.
In order to be generic, keep track of the enabled security commands
found in the CEL - the driver does not support Security Passthrough.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230726051940.3570-4-dave@stgolabs.net
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Whelp, this is embarrassing. Since commit 082fdfd138 ("KVM: arm64:
Prevent guests from enabling HA/HD on Ampere1") KVM traps writes to
TCR_EL1 on AmpereOne to work around an erratum in the unadvertised
HAFDBS implementation, preventing the guest from enabling the feature.
Unfortunately, I failed virtualization 101 when working on that change,
and forgot to advance PC after instruction emulation.
Do the right thing and skip the MSR instruction after emulating the
write.
Fixes: 082fdfd138 ("KVM: arm64: Prevent guests from enabling HA/HD on Ampere1")
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230728000824.3848025-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
With ppc64 -mprofile-kernel and ppc32 -pg, profiling instructions to
call into ftrace are emitted right at function entry. The instruction
sequence used is minimal to reduce overhead. Crucially, a stackframe is
not created for the function being traced. This breaks stack unwinding
since the function being traced does not have a stackframe for itself.
As such, it never shows up in the backtrace:
/sys/kernel/debug/tracing # echo 1 > /proc/sys/kernel/stack_tracer_enabled
/sys/kernel/debug/tracing # cat stack_trace
Depth Size Location (17 entries)
----- ---- --------
0) 4144 32 ftrace_call+0x4/0x44
1) 4112 432 get_page_from_freelist+0x26c/0x1ad0
2) 3680 496 __alloc_pages+0x290/0x1280
3) 3184 336 __folio_alloc+0x34/0x90
4) 2848 176 vma_alloc_folio+0xd8/0x540
5) 2672 272 __handle_mm_fault+0x700/0x1cc0
6) 2400 208 handle_mm_fault+0xf0/0x3f0
7) 2192 80 ___do_page_fault+0x3e4/0xbe0
8) 2112 160 do_page_fault+0x30/0xc0
9) 1952 256 data_access_common_virt+0x210/0x220
10) 1696 400 0xc00000000f16b100
11) 1296 384 load_elf_binary+0x804/0x1b80
12) 912 208 bprm_execve+0x2d8/0x7e0
13) 704 64 do_execveat_common+0x1d0/0x2f0
14) 640 160 sys_execve+0x54/0x70
15) 480 64 system_call_exception+0x138/0x350
16) 416 416 system_call_common+0x160/0x2c4
Fix this by having ftrace create a dummy stackframe for the function
being traced. With this, backtraces now capture the function being
traced:
/sys/kernel/debug/tracing # cat stack_trace
Depth Size Location (17 entries)
----- ---- --------
0) 3888 32 _raw_spin_trylock+0x8/0x70
1) 3856 576 get_page_from_freelist+0x26c/0x1ad0
2) 3280 64 __alloc_pages+0x290/0x1280
3) 3216 336 __folio_alloc+0x34/0x90
4) 2880 176 vma_alloc_folio+0xd8/0x540
5) 2704 416 __handle_mm_fault+0x700/0x1cc0
6) 2288 96 handle_mm_fault+0xf0/0x3f0
7) 2192 48 ___do_page_fault+0x3e4/0xbe0
8) 2144 192 do_page_fault+0x30/0xc0
9) 1952 608 data_access_common_virt+0x210/0x220
10) 1344 16 0xc0000000334bbb50
11) 1328 416 load_elf_binary+0x804/0x1b80
12) 912 64 bprm_execve+0x2d8/0x7e0
13) 848 176 do_execveat_common+0x1d0/0x2f0
14) 672 192 sys_execve+0x54/0x70
15) 480 64 system_call_exception+0x138/0x350
16) 416 416 system_call_common+0x160/0x2c4
This results in two additional stores in the ftrace entry code, but
produces reliable backtraces.
Fixes: 153086644f ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Cc: stable@vger.kernel.org
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230621051349.759567-1-naveen@kernel.org
The range and the defaults are specified in the description instead of
being specified in the schema.
Fix it by adding the default value in the `default` field and specifying
the range as `minimum` and `maximum`.
Fixes: b331b8ef86 ("dt-bindings: net: convert rockchip-dwmac to json-schema")
Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the multi-core JPEG encoder/decoder setup, the driver for the
individual cores references the parent device's platform driver data.
However, in the parent driver, this is only set at the end of the probe
function, way later than devm_of_platform_populate(), which triggers
the probe of the cores. This causes a kernel splat in the sub-device
probe function.
Move platform_set_drvdata() to before devm_of_platform_populate() to
fix this.
Fixes: 934e8bccac ("mtk-jpegenc: support jpegenc multi-hardware")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Saeed Mahameed says:
====================
mlx5 fixes 2023-07-26
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Unregister devlink params in case interface is down
net/mlx5: DR, Fix peer domain namespace setting
net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported
net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload
net/mlx5: Bridge, set debugfs access right to root-only
net/mlx5e: xsk: Fix crash on regular rq reactivation
net/mlx5e: xsk: Fix invalid buffer access for legacy rq
net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
net/mlx5e: Don't hold encap tbl lock if there is no encap action
net/mlx5: Honor user input for migratable port fn attr
net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
====================
Link: https://lore.kernel.org/r/20230726213206.47022-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I tried to get stmmac maintainers to be more active by agreeing with
them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks
during his "shift month", no reviews are flowing.
All the contributions are much appreciated! But stmmac is quite
active, we need participating maintainers :(
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726151120.1649474-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Older DSA drivers that do not provide an dsa_ops adjust_link method end
up using phylink. Unfortunately, a recent phylink change that requires
its supported_interfaces bitmap to be filled breaks these drivers
because the bitmap remains empty.
Rather than fixing each driver individually, fix it in the core code so
we have a sensible set of defaults.
Reported-by: Sergei Antonov <saproj@gmail.com>
Fixes: de5c9bf40c ("net: phylink: require supported_interfaces to be filled")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are totally 9 ndo_bridge_setlink handlers in the current kernel,
which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3)
i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5)
ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7)
nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink.
By investigating the code, we find that 1-7 parse and use nlattr
IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can
lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
length 0) to be viewed as a 2 byte integer.
To avoid such issues, also for other ndo_bridge_setlink handlers in the
future. This patch adds the nla_len check in rtnl_bridge_setlink and
does an early error return if length mismatches. To make it works, the
break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure
this nla_for_each_nested iterates every attribute.
Fixes: b1edc14a3f ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Fixes: 51616018dd ("i40e: Add support for getlink, setlink ndo ops")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The nla_for_each_nested parsing in function bpf_sk_storage_diag_alloc
does not check the length of the nested attribute. This can lead to an
out-of-attribute read and allow a malformed nlattr (e.g., length 0) to
be viewed as a 4 byte integer.
This patch adds an additional check when the nlattr is getting counted.
This makes sure the latter nla_get_u32 can access the attributes with
the correct length.
Fixes: 1ed4d92458 ("bpf: INET_DIAG support in bpf_sk_storage")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230725023330.422856-1-linma@zju.edu.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
gcc gets confused when -ftrivial-auto-var-init=pattern is used on sparse
bit fields such as 'struct spi_mem_op', which caused the previous false
positive warning about an uninitialized variable:
drivers/mtd/spi-nor/spansion.c: error: 'op' is used uninitialized [-Werror=uninitialized]
In fact, the variable is fully initialized and gcc does not see it being
used, so the warning is entirely bogus. The problem appears to be
a misoptimization in the initialization of single bit fields when the
rest of the bytes are not initialized.
A previous workaround added another initialization, which ended up
shutting up the warning in spansion.c, though it apparently still happens
in other files as reported by Peter Foley in the gcc bugzilla. The
workaround of adding a fake initialization seems particularly bad
because it would set values that can never be correct but prevent the
compiler from warning about actually missing initializations.
Revert the broken workaround and instead pad the structure to only
have bitfields that add up to full bytes, which should avoid this
behavior in all drivers.
I also filed a new bug against gcc with what I found, so this can
hopefully be addressed in future gcc releases. At the moment, only
gcc-12 and gcc-13 are affected.
Cc: Peter Foley <pefoley2@pefoley.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110743
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108402
Link: https://godbolt.org/z/efMMsG1Kx
Fixes: 420c4495b5 ("mtd: spi-nor: spansion: make sure local struct does not contain garbage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230719190045.4007391-1-arnd@kernel.org
SoCFPGA dts fix for v6.5
- Fix incorrect I2C property for SCL signal
* tag 'socfpga_dts_fix_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
arm64: dts: stratix10: fix incorrect I2C property for SCL signal
Link: https://lore.kernel.org/r/20230724145617.887443-1-dinguyen@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Memory controller drivers - fixes for v6.5
Two fixes are needed for Tegra194 memory controllers caused by the same
Tegra PCI commit merged in v6.5-rc1. The Tegra PCI requires now
interconnect from the memory controller, which was set only for
Tegra234, but not for Tegra194, causing probe deferrals. Expose some
dummy interconnect provider for Tegra194, to satisfy PCI driver needs.
* tag 'memory-controller-drv-fixes-6.5' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
memory: tegra: make icc_set_bw return zero if BWMGR not supported
memory: tegra: Add dummy implementation on Tegra194
Link: https://lore.kernel.org/r/20230726084811.124038-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
i.MX fixes for 6.5:
- A couple of ARM DTS fixes for i.MX6SLL usbphy and supported CPU
frequency of sk-imx53 board
- Add missing pull-up for imx8mn-var-som onboard PHY reset pinmux
- A couple of imx8mm-venice fixes from Tim Harvey to diable disp_blk_ctrl
- A couple of phycore-imx8mm fixes from Yashwanth Varakala to correct
VPU label and gpio-line-names
- Fix imx8mp-blk-ctrl driver to register HSIO PLL clock as bus_power_dev
child, so that runtime PM can translate into the necessary GPC power
domain action
* tag 'imx-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
soc: imx: imx8mp-blk-ctrl: register HSIO PLL clock as bus_power_dev child
ARM: dts: nxp/imx: limit sk-imx53 supported frequencies
arm64: dts: freescale: Fix VPU G2 clock
arm64: dts: imx8mn-var-som: add missing pull-up for onboard PHY reset pinmux
arm64: dts: phycore-imx8mm: Correction in gpio-line-names
arm64: dts: phycore-imx8mm: Label typo-fix of VPU
ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node
arm64: dts: imx8mm-venice-gw7904: disable disp_blk_ctrl
arm64: dts: imx8mm-venice-gw7903: disable disp_blk_ctrl
Link: https://lore.kernel.org/r/20230725075837.GR151430@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The corgi_lcd_limit_intensity() function is called from platform
and defined in a driver, but the driver does not see the declaration:
drivers/video/backlight/corgi_lcd.c:434:6: error: no previous prototype for 'corgi_lcd_limit_intensity' [-Werror=missing-prototypes]
434 | void corgi_lcd_limit_intensity(int limit)
Move the prototype into a header that can be included from both
sides to shut up the warning.
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Prior to this change, events without a group would be sorted as if they
were from the location of the first event without a group. For example
instructions and cycles are without a group:
instructions,{imc_free_running/data_read/,imc_free_running/data_write/},cycles
parse events would create an eventual evlist like:
instructions,cycles,{uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/,uncore_imc_free_running_0/data_write/,uncore_imc_free_running_1/data_write/}
This is done so that perf metric events, that must always be in a
group, will be adjacent and so can be forced into a group.
This change modifies the sorting so that only force grouped events,
like perf metrics, are sorted and all other events keep their position
with respect to groups in the evlist. The location of the force
grouped event is chosen to match the first force grouped event.
For architectures without force grouped events, ie anything not Intel
Icelake or newer, this should mean sorting and fixing doesn't modify
the event positions except when fixing the grouping for PMUs of things
like uncore events.
Fixes: 347c2f0a09 ("perf parse-events: Sort and group parsed events")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230719001836.198363-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The evsel grouping fix iterates over evsels tracking the leader group
and the current position's group, updating the current position's leader
if an evsel is being forced into a group or groups changed. However,
groups changing isn't a sufficient condition as sorting may have
reordered events and the leader may no longer come first. For this
reason update all leaders whenever they disagree.
This change breaks certain Icelake+ metrics due to bugs in the
kernel. For example, tma_l3_bound with threshold enabled tries to
program the events:
{topdown-retiring,slots,CYCLE_ACTIVITY.STALLS_L2_MISS,topdown-fe-bound,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.1_PORTS_UTIL,topdown-be-bound,cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/,CYCLE_ACTIVITY.STALLS_L3_MISS,CPU_CLK_UNHALTED.THREAD,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.2_PORTS_UTIL,CYCLE_ACTIVITY.STALLS_TOTAL,topdown-bad-spec}:W
fixing the perf metric event order gives:
{slots,topdown-retiring,topdown-fe-bound,topdown-be-bound,topdown-bad-spec,CYCLE_ACTIVITY.STALLS_L2_MISS,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.1_PORTS_UTIL,cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/,CYCLE_ACTIVITY.STALLS_L3_MISS,CPU_CLK_UNHALTED.THREAD,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.2_PORTS_UTIL,CYCLE_ACTIVITY.STALLS_TOTAL}:W
Both of these return "<not counted>" for all events, whilst they work
with the group removed respecting that the perf metric events must still
be grouped. A vendor events update will need to add METRIC_NO_GROUP to
these metrics to workaround the kernel PMU driver issue.
Fixes: a90cc5a9ee ("perf evsel: Don't let evsel__group_pmu_name() traverse unsorted group")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230719001836.198363-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf metric (topdown) events on Intel Icelake+ machines require a
group, however, they may be next to events that don't require a group.
Consider:
cycles,slots,topdown-fe-bound
The cycles event needn't be grouped but slots and topdown-fe-bound need
grouping.
Prior to this change, as slots and topdown-fe-bound need a group forcing
and all events share the same PMU, slots and topdown-fe-bound would be
forced into a group with cycles.
This is a bug on two fronts, cycles wasn't supposed to be grouped and
cycles can't be a group leader with a perf metric event.
This change adds recognition that cycles isn't force grouped and so it
shouldn't be force grouped with slots and topdown-fe-bound.
Fixes: a90cc5a9ee ("perf evsel: Don't let evsel__group_pmu_name() traverse unsorted group")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230719001836.198363-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit bb1520d581 ("s390/mm: start kernel with DAT enabled")
the kernel crashes early during boot when debug pagealloc is enabled:
mem auto-init: stack:off, heap alloc:off, heap free:off
addressing exception: 0005 ilc:2 [#1] SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rc3-09759-gc5666c912155 #630
[..]
Krnl Code: 00000000001325f6: ec5600248064 cgrj %r5,%r6,8,000000000013263e
00000000001325fc: eb880002000c srlg %r8,%r8,2
#0000000000132602: b2210051 ipte %r5,%r1,%r0,0
>0000000000132606: b90400d1 lgr %r13,%r1
000000000013260a: 41605008 la %r6,8(%r5)
000000000013260e: a7db1000 aghi %r13,4096
0000000000132612: b221006d ipte %r6,%r13,%r0,0
0000000000132616: e3d0d0000171 lay %r13,4096(%r13)
Call Trace:
__kernel_map_pages+0x14e/0x320
__free_pages_ok+0x23a/0x5a8)
free_low_memory_core_early+0x214/0x2c8
memblock_free_all+0x28/0x58
mem_init+0xb6/0x228
mm_core_init+0xb6/0x3b0
start_kernel+0x1d2/0x5a8
startup_continue+0x36/0x40
Kernel panic - not syncing: Fatal exception: panic_on_oops
This is caused by using large mappings on machines with EDAT1/EDAT2. Add
the code to split the mappings into 4k pages if debug pagealloc is enabled
by CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc kernel
command line option.
Fixes: bb1520d581 ("s390/mm: start kernel with DAT enabled")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add the option to flush IBPB only on VMEXIT in order to protect from
malicious guests but one otherwise trusts the software that runs on the
hypervisor.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Add the option to mitigate using IBPB on a kernel entry. Pull in the
Retbleed alternative so that the IBPB call from there can be used. Also,
if Retbleed mitigation is done using IBPB, the same mitigation can and
must be used here.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Add support for the synthetic CPUID flag which "if this bit is 1,
it indicates that MSR 49h (PRED_CMD) bit 0 (IBPB) flushes all branch
type predictions from the CPU branch predictor."
This flag is there so that this capability in guests can be detected
easily (otherwise one would have to track microcode revisions which is
impossible for guests).
It is also needed only for Zen3 and -4. The other two (Zen1 and -2)
always flush branch type predictions by default.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.
The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.
To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
The offending patch is based on the assumption that for PFs,
mlx5_get_dev_index() is the same as vhca_id. However, this assumption
is wrong in case of DPU (ECPF).
Fix it by using vhca_id directly, and switch the array of peers to
xarray.
Fixes: 6d5b7321d8 ("net/mlx5: DR, handle more than one peer domain")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit sets ft prio to fs_base_prio. But if
ignore_flow_level it not supported, ft prio must be set based on
tc filter prio. Otherwise, all the ft prio are the same on the same
chain. It is invalid if ignore_flow_level is not supported.
Fix it by setting ft prio based on tc filter prio and setting
fs_base_prio to 0 for fdb.
Fixes: 8e80e56480 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There are DEK objects cached in DEK pool after kTLS is used, and they
are freed only in mlx5e_ktls_cleanup().
mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to
free mdev resources, including protection domain (PD). However, PD is
still referenced by the cached DEK objects in this case, because
profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called
after mlx5e_suspend() during devlink reload. So the following FW
syndrome is generated:
mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed,
status bad resource state(0x9), syndrome (0xef0c8a), err(-22)
To avoid this syndrome, move DEK pool destruction to
mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And
move pool creation to mlx5e_ktls_init_tx() for symmetry.
Fixes: f741db1a51 ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When the regular rq is reactivated after the XSK socket is closed
it could be reading stale cqes which eventually corrupts the rq.
This leads to no more traffic being received on the regular rq and a
crash on the next close or deactivation of the rq.
Kal Cuttler Conely reported this issue as a crash on the release
path when the xdpsock sample program is stopped (killed) and restarted
in sequence while traffic is running.
This patch flushes all cqes when during the rq flush. The cqe flushing
is done in the reset state of the rq. mlx5e_rq_to_ready code is moved
into the flush function to allow for this.
Fixes: 082a9edf12 ("net/mlx5e: xsk: Flush RQ on XSK activation to save memory")
Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
Closes: https://lore.kernel.org/xdp-newbies/CAHApi-nUAs4TeFWUDV915CZJo07XVg2Vp63-no7UDfj6wur9nQ@mail.gmail.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit holds encap tbl lock unconditionally when setting
up dests. But it may cause the following deadlock:
PID: 1063722 TASK: ffffa062ca5d0000 CPU: 13 COMMAND: "handler8"
#0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
#1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
#2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
#3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
#4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
#5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
#6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
#7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
#8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
#9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
#10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
#11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
#12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
#13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
#14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
#15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
#16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0
[1031218.028143] wait_for_completion+0x24/0x30
[1031218.028589] mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
[1031218.029256] mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
[1031218.029885] process_one_work+0x24e/0x510
Actually no need to hold encap tbl lock if there is no encap action.
Fix it by checking if encap action exists or not before holding
encap tbl lock.
Fixes: 37c3b9fa7c ("net/mlx5e: Prevent encap offload when neigh update is running")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, whenever a user is setting migratable port fn attr, the
driver is always turn migratable capability on.
Fix it by honor the user input
Fixes: e5b9642a33 ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_ipsec_remove_trailer() should return an error code if function
pskb_trim() returns an unexpected value.
Fixes: 2ac9cfe782 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The memory pointed to by the priv->rx_res pointer is not freed in the error
path of mlx5e_init_rep_rx, which can lead to a memory leak. Fix by freeing
the memory in the error path, thereby making the error path identical to
mlx5e_cleanup_rep_rx().
Fixes: af8bbf7300 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
when mlx5_cmd_exec failed in mlx5dr_cmd_create_reformat_ctx, the memory
pointed by 'in' is not released, which will cause memory leak. Move memory
release after mlx5_cmd_exec.
Fixes: 1d9186476e ("net/mlx5: DR, Add direct rule command utilities")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In function macsec_fs_tx_create_crypto_table_groups(), when the ft->g
memory is successfully allocated but the 'in' memory fails to be
allocated, the memory pointed to by ft->g is released once. And in function
macsec_fs_tx_create(), macsec_fs_tx_destroy() is called to release the
memory pointed to by ft->g again. This will cause double free problem.
Fixes: e467b283ff ("net/mlx5e: Add MACsec TX steering rules")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull an Amlogic clk driver fix from Jerome Brunet:
- Fix PLL scheduling while atomic following a1 locking sequence update
* tag 'clk-meson-fixes-v6.5-1' of https://github.com/BayLibre/clk-meson:
clk: meson: change usleep_range() to udelay() for atomic context
The kvm_host_psci_cpu_entry() function was renamed in order to add a wrapper around
it, but the prototype did not change, so now the missing-prototype warning came
back in W=1 builds:
arch/arm64/kvm/hyp/nvhe/psci-relay.c:203:28: error: no previous prototype for function '__kvm_host_psci_cpu_entry' [-Werror,-Wmissing-prototypes]
asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on)
Fixes: dcf89d1111 ("KVM: arm64: Add missing BTI instructions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230724121850.1386668-1-arnd@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
This reverts commit da56a1bfba.
Bjorn Andersson, Fabio Estevam, Xiaolei Wang, and Jon Hunter reported that
da56a1bfba ("PCI: dwc: Wait for link up only if link is started") broke
controller probing by returning an error in case the link does not come up
during host initialisation, for example when the slot is empty.
As explained in commit 886a9c1347 ("PCI: dwc: Move link handling into
common code") and as indicated by the comment "Ignore errors, the link may
come up later" in the code, waiting for link up and ignoring errors is the
intended behaviour:
Let's standardize this to succeed as there are usecases where devices
(and the link) appear later even without hotplug. For example, a
reconfigured FPGA device.
Reverting the offending commit specifically fixes a regression on Qualcomm
platforms like the Lenovo ThinkPad X13s which no longer reach the
interconnect sync state if a slot does not have a device populated (e.g. an
optional modem).
Note that enabling asynchronous probing by default as was done for Qualcomm
platforms by commit c0e1eb441b ("PCI: qcom: Enable async probe by
default"), should take care of any related boot time concerns.
Finally, note that the intel-gw driver is the only driver currently not
providing a .start_link() callback and instead starts the link in its
.host_init() callback, which may avoid an additional one-second timeout
during probe by making the link-up wait conditional. If anyone cares, that
can be done in a follow-up patch with a proper motivation.
[bhelgaas: add Fabio Estevam, Xiaolei Wang, Jon Hunter reports]
Fixes: da56a1bfba ("PCI: dwc: Wait for link up only if link is started")
Link: https://lore.kernel.org/r/20230706082610.26584-1-johan+linaro@kernel.org
Reported-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reported-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20230704122635.1362156-1-festevam@gmail.com/
Reported-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://lore.kernel.org/r/20230705010624.3912934-1-xiaolei.wang@windriver.com/
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/6ca287a1-6c7c-7b90-9022-9e73fb82b564@nvidia.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Sajid Dalvi <sdalvi@google.com>
Cc: Ajay Agarwal <ajayagarwal@google.com>
Yan-Hsuan has been away since 2021 and Ping has been the de facto maintainer
the past year, actively reviewing patches and doing all other maintainer
duties. So fix the MAINTAINERS file to show the current situation.
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230724104547.3061709-2-kvalo@kernel.org
Storage devices are free to send RSCNs, e.g. for internal state changes. If
this happens on all connected paths, zfcp risks temporarily losing all
paths at the same time. This has strong requirements on multipath
configuration such as "no_path_retry queue".
Avoid such situations by deferring fc_rport blocking until after the ADISC
response, when any actual state change of the remote port became clear.
The already existing port recovery triggers explicitly block the fc_rport.
The triggers are: on ADISC reject or timeout (typical cable pull case), and
on ADISC indicating that the remote port has changed its WWPN or
the port is meanwhile no longer open.
As a side effect, this also removes a confusing direct function call to
another work item function zfcp_scsi_rport_work() instead of scheduling
that other work item. It was probably done that way to have the rport block
side effect immediate and synchronous to the caller.
Fixes: a2fa0aede0 ("[SCSI] zfcp: Block FC transport rports early on errors")
Cc: stable@vger.kernel.org #v2.6.30+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Link: https://lore.kernel.org/r/20230724145156.3920244-1-maier@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jiri Olsa says:
====================
bpf: Disable preemption in perf_event_output helpers code
hi,
we got report of kernel crash [1][3] within bpf_event_output helper.
The reason is the nesting protection code in bpf_event_output that expects
disabled preemption, which is not guaranteed for programs executed by
bpf_prog_run_array_cg.
I managed to reproduce on tracing side where we have the same problem
in bpf_perf_event_output. The reproducer [2] just creates busy uprobe
and call bpf_perf_event_output helper a lot.
v3 changes:
- added acks and fixed 'Fixes' tag style [Hou Tao]
- added Closes tag to patch 2
v2 changes:
- I changed 'Fixes' commits to where I saw we switched from preempt_disable
to migrate_disable, but I'm not completely sure about the patch 2, because
it was tricky to find, would be nice if somebody could check on that
thanks,
jirka
[1] https://github.com/cilium/cilium/issues/26756
[2] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf_output_fix_reproducer&id=8054dcc634121b884c7c331329d61d93351d03b5
[3] slack:
[66194.378161] BUG: kernel NULL pointer dereference, address: 0000000000000001
[66194.378324] #PF: supervisor instruction fetch in kernel mode
[66194.378447] #PF: error_code(0x0010) - not-present page
...
[66194.378692] Oops: 0010 [#1] PREEMPT SMP NOPTI
...
[66194.380666] <TASK>
[66194.380775] ? perf_output_sample+0x12a/0x9a0
[66194.380902] ? finish_task_switch.isra.0+0x81/0x280
[66194.381024] ? perf_event_output+0x66/0xa0
[66194.381148] ? bpf_event_output+0x13a/0x190
[66194.381270] ? bpf_event_output_data+0x22/0x40
[66194.381391] ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
[66194.381519] ? xa_load+0x87/0xe0
[66194.381635] ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
[66194.381759] ? release_sock+0x3e/0x90
[66194.381876] ? sk_setsockopt+0x1a1/0x12f0
[66194.381996] ? udp_pre_connect+0x36/0x50
[66194.382114] ? inet_dgram_connect+0x93/0xa0
[66194.382233] ? __sys_connect+0xb4/0xe0
[66194.382353] ? udp_setsockopt+0x27/0x40
[66194.382470] ? __pfx_udp_push_pending_frames+0x10/0x10
[66194.382593] ? __sys_setsockopt+0xdf/0x1a0
[66194.382713] ? __x64_sys_connect+0xf/0x20
[66194.382832] ? do_syscall_64+0x3a/0x90
[66194.382949] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
[66194.383077] </TASK>
---
====================
Link: https://lore.kernel.org/r/20230725084206.580930-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We received report [1] of kernel crash, which is caused by
using nesting protection without disabled preemption.
The bpf_event_output can be called by programs executed by
bpf_prog_run_array_cg function that disabled migration but
keeps preemption enabled.
This can cause task to be preempted by another one inside the
nesting protection and lead eventually to two tasks using same
perf_sample_data buffer and cause crashes like:
BUG: kernel NULL pointer dereference, address: 0000000000000001
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
...
? perf_output_sample+0x12a/0x9a0
? finish_task_switch.isra.0+0x81/0x280
? perf_event_output+0x66/0xa0
? bpf_event_output+0x13a/0x190
? bpf_event_output_data+0x22/0x40
? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
? xa_load+0x87/0xe0
? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
? release_sock+0x3e/0x90
? sk_setsockopt+0x1a1/0x12f0
? udp_pre_connect+0x36/0x50
? inet_dgram_connect+0x93/0xa0
? __sys_connect+0xb4/0xe0
? udp_setsockopt+0x27/0x40
? __pfx_udp_push_pending_frames+0x10/0x10
? __sys_setsockopt+0xdf/0x1a0
? __x64_sys_connect+0xf/0x20
? do_syscall_64+0x3a/0x90
? entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixing this by disabling preemption in bpf_event_output.
[1] https://github.com/cilium/cilium/issues/26756
Cc: stable@vger.kernel.org
Reported-by: Oleg "livelace" Popov <o.popov@livelace.ru>
Closes: https://github.com/cilium/cilium/issues/26756
Fixes: 2a916f2f54 ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The nesting protection in bpf_perf_event_output relies on disabled
preemption, which is guaranteed for kprobes and tracepoints.
However bpf_perf_event_output can be also called from uprobes context
through bpf_prog_run_array_sleepable function which disables migration,
but keeps preemption enabled.
This can cause task to be preempted by another one inside the nesting
protection and lead eventually to two tasks using same perf_sample_data
buffer and cause crashes like:
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle page fault for address: ffffffff82be3eea
...
Call Trace:
? __die+0x1f/0x70
? page_fault_oops+0x176/0x4d0
? exc_page_fault+0x132/0x230
? asm_exc_page_fault+0x22/0x30
? perf_output_sample+0x12b/0x910
? perf_event_output+0xd0/0x1d0
? bpf_perf_event_output+0x162/0x1d0
? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
? __uprobe_perf_func+0x12b/0x540
? uprobe_dispatcher+0x2c4/0x430
? uprobe_notify_resume+0x2da/0xce0
? atomic_notifier_call_chain+0x7b/0x110
? exit_to_user_mode_prepare+0x13e/0x290
? irqentry_exit_to_user_mode+0x5/0x30
? asm_exc_int3+0x35/0x40
Fixing this by disabling preemption in bpf_perf_event_output.
Cc: stable@vger.kernel.org
Fixes: 8c7dcb84e3 ("bpf: implement sleepable uprobes by chaining gps")
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230725084206.580930-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
wq_cpu_intensive_thresh_us is used to detect CPU-hogging per-cpu work items.
Once detected, they're excluded from concurrency management to prevent them
from blocking other per-cpu work items. If CONFIG_WQ_CPU_INTENSIVE_REPORT is
enabled, repeat offenders are also reported so that the code can be updated.
The default threshold is 10ms which is long enough to do fair bit of work on
modern CPUs while short enough to be usually not noticeable. This
unfortunately leads to a lot of, arguable spurious, detections on very slow
CPUs. Using the same threshold across CPUs whose performance levels may be
apart by multiple levels of magnitude doesn't make whole lot of sense.
This patch scales up wq_cpu_intensive_thresh_us upto 1 second when BogoMIPS
is below 4000. This is obviously very inaccurate but it doesn't have to be
accurate to be useful. The mechanism is still useful when the threshold is
fully scaled up and the benefits of reports are usually shared with everyone
regardless of who's reporting, so as long as there are sufficient number of
fast machines reporting, we don't lose much.
Some (or is it all?) ARM CPUs systemtically report significantly lower
BogoMIPS. While this doesn't break anything, given how widespread ARM CPUs
are, it's at least a missed opportunity and it probably would be a good idea
to teach workqueue about it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
This patch fixes an issue affecting the Wifi/Bluetooth connectivity on
ROCK Pi 4 boards. Commit f471b1b2db ("arm64: dts: rockchip: Fix Bluetooth
on ROCK Pi 4 boards") introduced a problem with the clock configuration.
Specifically, the clock-names property of the sdio-pwrseq node was not
updated to 'lpo', causing the driver to wait indefinitely for the wrong clock
signal 'ext_clock' instead of the expected one 'lpo'. This prevented the proper
initialization of Wifi/Bluetooth chip on ROCK Pi 4 boards.
To address this, this patch updates the clock-names property of the
sdio-pwrseq node to "lpo" to align with the changes made to the bluetooth node.
This patch has been tested on ROCK Pi 4B.
Fixes: f471b1b2db ("arm64: dts: rockchip: Fix Bluetooth on ROCK Pi 4 boards")
Cc: stable@vger.kernel.org
Signed-off-by: Yogesh Hegde <yogi.kernel@gmail.com>
Link: https://lore.kernel.org/r/ZLbATQRjOl09aLAp@zephyrusG14
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
There is only one debug unit in the sam9x60 SOC and it has the chipid
register. So, the dbgu compatible strings are valid only for debug usart.
Defining these dbgu compatible strings are not valid for flexcom usart.
So adding the items which is valid only for flexcom usart and removing
the microchip,sam9x60-usart compatible string from the enum list as no
usart node defines only this specific compatible string.
Signed-off-by: Durai Manickam KR <durai.manickamkr@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230718065735.10187-2-durai.manickamkr@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arm SCMI and SMCCC fixes for v6.5
Set of fixes addressing issues:
1. Possible use of uninitialised results structure in the SMCCC SOC_ID
driver if the driver fails to complete the initialisation
2. Missed signed error return value handling from simple_write_to_buffer()
used in scmi_dbg_raw_mode_common_write()
3. The OF node reference obtained is not dropped if node is incompatible
with "arm,scmi-shmem" in the mailbox as well as SMC transport channel
setup
4. The possibility of a late response to an in-flight pending transaction
that could end up triggering the interrupt handler after the SCMI core
has cleaned up the transport channel as part of core driver remove
* tag 'scmi-smccc-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Fix chan_free cleanup on SMC
firmware: arm_scmi: Drop OF node reference in the transport channel setup
firmware: arm_scmi: Fix signed error return values handling
firmware: smccc: Fix use of uninitialised results structure
Link: https://lore.kernel.org/r/20230721114052.3371923-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
It is incorrect in python to compare integer values using the "is" keyword.
The "is" keyword in python is used to compare references to two objects,
not their values. Newer version of python3 (version 3.8) throws a warning
when such incorrect comparison is made. For value comparison, "==" should
be used.
Fix this in the code and suppress the following warning:
/usr/sbin/vmbus_testing:167: SyntaxWarning: "is" with a literal. Did you mean "=="?
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20230705134408.6302-1-anisinha@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The following checkpatch warning is removed:
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
Signed-off-by: ZhiHu <huzhi001@208suo.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
with ConfigVersion 9.3 or later support IBT in the guest. However,
current versions of Hyper-V have a bug in that there's not an ENDBR64
instruction at the beginning of the hypercall page. Since hypercalls are
made with an indirect call to the hypercall page, all hypercall attempts
fail with an exception and Linux panics.
A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
with ENDBR. The VM will boot and run without IBT.
If future Linux 32-bit kernels were to support IBT, additional hypercall
page hackery would be needed to make IBT work for such kernels in a
Hyper-V VM.
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1690001476-98594-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The Hyper-V host is queried to get the max transfer size that it supports,
and this value is used to set max_sectors for the synthetic SCSI
controller. However, this max transfer size may be too large for virtual
Fibre Channel devices, which are limited to 512 Kbytes. If a larger
transfer size is used with a vFC device, Hyper-V always returns an error,
and storvsc logs a message like this where the SRB status and SCSI status
are both zero:
hv_storvsc <GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001
Add logic to limit the max transfer size to 512 Kbytes for vFC devices.
Fixes: 1d3e098078 ("scsi: storvsc: Correct reporting of Hyper-V I/O size limits")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1689887102-32806-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 813665564b ("iio: core: Convert to use firmware node handle
instead of OF node") switched the kind of nodes to use for label
retrieval in device registration. Probably an unwanted change in that
commit was that if the device has no parent then NULL pointer is
accessed. This is what happens in the stock IIO dummy driver when a
new entry is created in configfs:
# mkdir /sys/kernel/config/iio/devices/dummy/foo
BUG: kernel NULL pointer dereference, address: ...
...
Call Trace:
__iio_device_register
iio_dummy_probe
Since there seems to be no reason to make a parent device of an IIO
dummy device mandatory, let’s prevent the invalid memory access in
__iio_device_register when the parent device is NULL. With this
change, the IIO dummy driver works fine with configfs.
Fixes: 813665564b ("iio: core: Convert to use firmware node handle instead of OF node")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Link: https://lore.kernel.org/r/20230719083208.88149-1-mzamazal@redhat.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The regulator_get_voltage() function returns negative error codes.
This function saves it to an unsigned int and then does some range
checking and, since the error code falls outside the correct range,
it returns -EINVAL.
Beyond the messiness, this is bad because the regulator_get_voltage()
function can return -EPROBE_DEFER and it's important to propagate that
back properly so it can be handled.
Fixes: da35a7b526 ("iio: frequency: admv1013: add support for ADMV1013")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/ce75aac3-2aba-4435-8419-02e59fdd862b@moroto.mountain
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Gather Data Sampling (GDS) is a transient execution attack using
gather instructions from the AVX2 and AVX512 extensions. This attack
allows malicious code to infer data that was previously stored in
vector registers. Systems that are not vulnerable to GDS will set the
GDS_NO bit of the IA32_ARCH_CAPABILITIES MSR. This is useful for VM
guests that may think they are on vulnerable systems that are, in
fact, not affected. Guests that are running on affected hosts where
the mitigation is enabled are protected as if they were running
on an unaffected system.
On all hosts that are not affected or that are mitigated, set the
GDS_NO bit.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Gather Data Sampling (GDS) is mitigated in microcode. However, on
systems that haven't received the updated microcode, disabling AVX
can act as a mitigation. Add a Kconfig option that uses the microcode
mitigation if available and disables AVX otherwise. Setting this
option has no effect on systems not affected by GDS. This is the
equivalent of setting gather_data_sampling=force.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
The Gather Data Sampling (GDS) vulnerability allows malicious software
to infer stale data previously stored in vector registers. This may
include sensitive data such as cryptographic keys. GDS is mitigated in
microcode, and systems with up-to-date microcode are protected by
default. However, any affected system that is running with older
microcode will still be vulnerable to GDS attacks.
Since the gather instructions used by the attacker are part of the
AVX2 and AVX512 extensions, disabling these extensions prevents gather
instructions from being executed, thereby mitigating the system from
GDS. Disabling AVX2 is sufficient, but we don't have the granularity
to do this. The XCR0[2] disables AVX, with no option to just disable
AVX2.
Add a kernel parameter gather_data_sampling=force that will enable the
microcode mitigation if available, otherwise it will disable AVX on
affected systems.
This option will be ignored if cmdline mitigations=off.
This is a *big* hammer. It is known to break buggy userspace that
uses incomplete, buggy AVX enumeration. Unfortunately, such userspace
does exist in the wild:
https://www.mail-archive.com/bug-coreutils@gnu.org/msg33046.html
[ dhansen: add some more ominous warnings about disabling AVX ]
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Armv8 Juno/Vexpress DTS fix for v6.5
A single simple fix removing dangling symlink left as part of arm dts
files movement to vendor sub-directories. It is harmless and causes no
issue for the build but scripts copying files see errors/failures.
* tag 'juno-fix-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
arm64: dts: arm: Remove the dangling vexpress-v2m-rs1.dtsi symlink
Link: https://lore.kernel.org/r/20230721112359.3369716-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Move start_freeze into nvme_rdma_configure_io_queues(), and there is
at least two benefits:
1) fix unbalanced freeze and unfreeze, since re-connection work may
fail or be broken by removal
2) IO during error recovery can be failfast quickly because nvme fabrics
unquiesces queues after teardown.
One side-effect is that !mpath request may timeout during connecting
because of queue topo change, but that looks not one big deal:
1) same problem exists with current code base
2) compared with !mpath, mpath use case is dominant
Fixes: 9f98772ba3 ("nvme-rdma: fix controller reset hang during traffic")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Move start_freeze into nvme_tcp_configure_io_queues(), and there is
at least two benefits:
1) fix unbalanced freeze and unfreeze, since re-connection work may
fail or be broken by removal
2) IO during error recovery can be failfast quickly because nvme fabrics
unquiesces queues after teardown.
One side-effect is that !mpath request may timeout during connecting
because of queue topo change, but that looks not one big deal:
1) same problem exists with current code base
2) compared with !mpath, mpath use case is dominant
Fixes: 2875b0aeca ("nvme-tcp: fix controller reset hang during traffic")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
kvm_arm_hardware_enabled is rather misleading, since it doesn't track
the state of all hardware resources needed for running a VM. What it
actually tracks is whether or not the hyp cpu context has been
initialized.
Since we're now at the point where vgic + timer irq management has
been separated from kvm_arm_hardware_enabled, rephrase it (and the
associated helpers) to make it clear what state is being tracked.
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230719231855.262973-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When running in protected mode, the hyp stub is disabled after pKVM is
initialized, meaning the host cannot enable/disable the hyp at
runtime. As such, kvm_arm_hardware_enabled is always 1 after
initialization, and kvm_arch_hardware_enable() never enables the vgic
maintenance irq or timer irqs.
Unconditionally enable/disable the vgic + timer irqs in the respective
calls, instead relying on the percpu bookkeeping in the generic code
to keep track of which cpus have the interrupts unmasked.
Fixes: 466d27e48d ("KVM: arm64: Simplify the CPUHP logic")
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20230719175400.647154-1-rananta@google.com
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230717225358.3210536-1-robh@kernel.org
Signed-off-by: Michal Simek <michal.simek@amd.com>
SCMI transport based on SMC can optionally use an additional IRQ to
signal message completion. The associated interrupt handler is currently
allocated using devres but on shutdown the core SCMI stack will call
.chan_free() well before any managed cleanup is invoked by devres.
As a consequence, the arrival of a late reply to an in-flight pending
transaction could still trigger the interrupt handler well after the
SCMI core has cleaned up the channels, with unpleasant results.
Inhibit further message processing on the IRQ path by explicitly freeing
the IRQ inside .chan_free() callback itself.
Fixes: dd820ee21d ("firmware: arm_scmi: Augment SMC/HVC to allow optional interrupt")
Reported-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20230719173533.2739319-1-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
If the user set an MTU value, it usually means that there are special
requirements for the MTU. But if an interface gots activated, the MTU was
always recalculated and then the user set value was overwritten.
The only reason why this user set value has to be overwritten, is when the
MTU has to be decreased because batman-adv is not able to transfer packets
with the user specified size.
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
If an interface changes the MTU, it is expected that an NETDEV_PRECHANGEMTU
and NETDEV_CHANGEMTU notification events is triggered. This worked fine for
.ndo_change_mtu based changes because core networking code took care of it.
But for auto-adjustments after hard-interfaces changes, these events were
simply missing.
Due to this problem, non-batman-adv components weren't aware of MTU changes
and thus couldn't perform their own tasks correctly.
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Gather Data Sampling (GDS) is a hardware vulnerability which allows
unprivileged speculative access to data which was previously stored in
vector registers.
Intel processors that support AVX2 and AVX512 have gather instructions
that fetch non-contiguous data elements from memory. On vulnerable
hardware, when a gather instruction is transiently executed and
encounters a fault, stale data from architectural or internal vector
registers may get transiently stored to the destination vector
register allowing an attacker to infer the stale data using typical
side channel techniques like cache timing attacks.
This mitigation is different from many earlier ones for two reasons.
First, it is enabled by default and a bit must be set to *DISABLE* it.
This is the opposite of normal mitigation polarity. This means GDS can
be mitigated simply by updating microcode and leaving the new control
bit alone.
Second, GDS has a "lock" bit. This lock bit is there because the
mitigation affects the hardware security features KeyLocker and SGX.
It needs to be enabled and *STAY* enabled for these features to be
mitigated against GDS.
The mitigation is enabled in the microcode by default. Disable it by
setting gather_data_sampling=off or by disabling all mitigations with
mitigations=off. The mitigation status can be checked by reading:
/sys/devices/system/cpu/vulnerabilities/gather_data_sampling
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence it could have a fully functional s390 kernel
with CONFIG_PCI=n, then the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM so that it won't
be built to cause below compiling error if PCI is unset:
------
ld: drivers/clk/clk-fixed-mmio.o: in function `fixed_mmio_clk_setup':
clk-fixed-mmio.c:(.text+0x5e): undefined reference to `of_iomap'
ld: clk-fixed-mmio.c:(.text+0xba): undefined reference to `iounmap'
------
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: linux-clk@vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-8-bhe@redhat.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
This reverts commit 860690a93e.
On the MT8183, the SSPM related clocks were removed claiming a lack of
usage. This however causes some issues when the driver was converted to
the new simple-probe mechanism. This mechanism allocates enough space
for all the clocks defined in the clock driver, not the highest index
in the DT binding. This leads to out-of-bound writes if their are holes
in the DT binding or the driver (due to deprecated or unimplemented
clocks). These errors can go unnoticed and cause memory corruption,
leading to crashes in unrelated areas, or nothing at all. KASAN will
detect them.
Add the SSPM related clocks back to the MT8183 clock driver to fully
implement the DT binding. The SSPM clocks are for the power management
co-processor, and should never be turned off. They are marked as such.
Fixes: 3f37ba7cc3 ("clk: mediatek: mt8183: Convert all remaining clocks to common probe")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://lore.kernel.org/r/20230719074251.1219089-1-wenst@chromium.org
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
pKVM initialization fails on systems with v1.1+ FF-A implementations, as
the hyp does a strict match on the returned version from FFA_VERSION.
This is a stronger assertion than required by the specification, which
requires minor revisions be backwards compatible with earlier revisions
of the same major version.
Relax the check in hyp_ffa_init() to only test the returned major
version. Even though v1.1 broke ABI, the expectation is that firmware
incapable of using the v1.0 ABI return NOT_SUPPORTED instead of a valid
version.
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230718184537.3220867-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The blk-ctrl device is deliberately placed outside of the GPC power
domain as it needs to control the power sequencing of the blk-ctrl
domains together with the GPC domains.
Clock runtime PM works by operating on the clock parent device, which
doesn't translate into the neccessary GPC power domain action if the
clk parent is not part of the GPC power domain. Use the bus_power_device
as the parent for the clock to trigger the proper GPC domain actions on
clock runtime power management.
Fixes: 2cbee26e5d ("soc: imx: imx8mp-blk-ctrl: expose high performance PLL clock")
Reported-by: Yannic Moog <Y.Moog@phytec.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Yannic Moog <y.moog@phytec.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The SK-IMX53 board, bearing i.MX536A CPU, is not stable when running at
1.2 GHz (default iMX53 maximum). The SoC is only rated up to 800 MHz.
Disable 1.2 GHz and 1 GHz frequencies.
Fixes: 0b8576d844 ("ARM: dts: imx: Add support for SK-iMX53 board")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
When building with Clang, and when KASAN and GCOV_PROFILE_ALL are both
enabled, the test fails to build [1]:
>> lib/test_bitmap.c:920:2: error: call to '__compiletime_assert_239' declared with 'error' attribute: BUILD_BUG_ON failed: !__builtin_constant_p(res)
BUILD_BUG_ON(!__builtin_constant_p(res));
^
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^
include/linux/compiler_types.h:352:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:340:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:333:4: note: expanded from macro '__compiletime_assert'
prefix ## suffix(); \
^
<scratch space>:185:1: note: expanded from here
__compiletime_assert_239
Originally it was attributed to s390, which now looks seemingly wrong. The
issue is not related to bitmap code itself, but it breaks build for a given
configuration.
Disabling the const_eval test under that config may potentially hide other
bugs. Instead, workaround it by disabling GCOV for the test_bitmap unless
the compiler will get fixed.
[1] https://github.com/ClangBuiltLinux/linux/issues/1874
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307171254.yFcH97ej-lkp@intel.com/
Fixes: dc34d50366 ("lib: test_bitmap: add compile-time optimization/evaluations assertions")
Co-developed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Set VPU G2 clock to 300MHz like described in documentation.
This fixes pixels error occurring with large resolution ( >= 2560x1600)
HEVC test stream when using the postprocessor to produce NV12.
Fixes: 4ac7e4a812 ("arm64: dts: imx8mq: Enable both G1 and G2 VPU's with vpu-blk-ctrl")
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
For SOMs with an onboard PHY, the RESET_N pull-up resistor is
currently deactivated in the pinmux configuration. When the pinmux
code selects the GPIO function for this pin, with a default direction
of input, this prevents the RESET_N pin from being taken to the proper
3.3V level (deasserted), and this results in the PHY being not
detected since it is held in reset.
Taken from RESET_N pin description in ADIN13000 datasheet:
This pin requires a 1K pull-up resistor to AVDD_3P3.
Activate the pull-up resistor to fix the issue.
Fixes: ade0176dd8 ("arm64: dts: imx8mn-var-som: Add Variscite VAR-SOM-MX8MN System on Module")
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Update lib/cpumask.c and <linux/cpumask.h> to fix all kernel-doc
warnings:
include/linux/cpumask.h:185: warning: Function parameter or member 'srcp1' not described in 'cpumask_first_and'
include/linux/cpumask.h:185: warning: Function parameter or member 'srcp2' not described in 'cpumask_first_and'
include/linux/cpumask.h:185: warning: Excess function parameter 'src1p' description in 'cpumask_first_and'
include/linux/cpumask.h:185: warning: Excess function parameter 'src2p' description in 'cpumask_first_and'
lib/cpumask.c:59: warning: Function parameter or member 'node' not described in 'alloc_cpumask_var_node'
lib/cpumask.c:169: warning: Function parameter or member 'src1p' not described in 'cpumask_any_and_distribute'
lib/cpumask.c:169: warning: Function parameter or member 'src2p' not described in 'cpumask_any_and_distribute'
Fixes: 7b4967c532 ("cpumask: Add alloc_cpumask_var_node()")
Fixes: 839cad5fa5 ("cpumask: fix function description kernel-doc notation")
Fixes: 93ba139ba8 ("cpumask: use find_first_and_bit()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
The GW7904 does not connect the VDD_MIPI power rails thus MIPI is
disabled. However we must also disable disp_blk_ctrl as it uses the
pgc_mipi power domain and without it being disabled imx8m-blk-ctrl will
fail to probe:
imx8m-blk-ctrl 32e28000.blk-ctrl: error -ETIMEDOUT: failed to attach
power domain "mipi-dsi"
imx8m-blk-ctrl: probe of 32e28000.blk-ctrl failed with error -110
Fixes: b999bdaf05 ("arm64: dts: imx: Add i.mx8mm Gateworks gw7904 dts support")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The GW7903 does not connect the VDD_MIPI power rails thus MIPI is
disabled. However we must also disable disp_blk_ctrl as it uses the
pgc_mipi power domain and without it being disabled imx8m-blk-ctrl will
fail to probe:
imx8m-blk-ctrl 32e28000.blk-ctrl: error -ETIMEDOUT: failed to attach power domain "mipi-dsi"
imx8m-blk-ctrl: probe of 32e28000.blk-ctrl failed with error -110
Fixes: a72ba91e5b ("arm64: dts: imx: Add i.mx8mm Gateworks gw7903 dts support")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The affected lines were resulting in a NULL pointer dereference on our
platform because the device tree contained the following list of
compatible strings:
power-sensor@40 {
compatible = "ti,ina232", "ti,ina231";
...
};
Since the driver doesn't declare a compatible string "ti,ina232", the OF
matching succeeds on "ti,ina231". But the I2C device ID info is
populated via the first compatible string, cf. modalias population in
of_i2c_get_board_info(). Since there is no "ina232" entry in the legacy
I2C device ID table either, the struct i2c_device_id *id pointer in the
probe function is NULL.
Fix this by using the already populated type variable instead, which
points to the proper driver data. Since the name is also wanted, add a
generic one to the ina2xx_config table.
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Fixes: c43a102e67 ("iio: ina2xx: add support for TI INA2xx Power Monitors")
Link: https://lore.kernel.org/r/20230619141239.2257392-1-alvin@pqrs.dk
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
AC excitation enable feature exposed to user on AD7192, allowing a bit
which should be 0 to be set. This feature is specific only to AD7195. AC
excitation attribute moved accordingly.
In the AD7195 documentation, the AC excitation enable bit is on position
22 in the Configuration register. ACX macro changed to match correct
register and bit.
Note that the fix tag is for the commit that moved the driver out of
staging.
Fixes: b581f748cc ("staging: iio: adc: ad7192: move out of staging")
Signed-off-by: Alisa Roman <alisa.roman@analog.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20230614155242.160296-1-alisa.roman@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Currently, read/write_page_hwecc() and read/write_page_raw() are not
aligned: there is a mismatch in the OOB bytes which are not
read/written at the same offset in both cases (raw vs. hwecc).
This is a real problem when relying on the presence of the Page
Addresses (PA) when using the NAND chip as a boot device, as the
BootROM expects additional data in the OOB area at specific locations.
Rockchip boot blocks are written per 4 x 512 byte sectors per page.
Each page with boot blocks must have a page address (PA) pointer in OOB
to the next page. Pages are written in a pattern depending on the NAND chip ID.
Generate boot block page address and pattern for hwecc in user space
and copy PA data to/from the already reserved last 4 bytes before ECC
in the chip->oob_poi data layout.
Align the different helpers. This change breaks existing jffs2 users.
Fixes: 058e0e847d ("mtd: rawnand: rockchip: NFC driver for RK3308, RK2928 and others")
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/5e782c08-862b-51ae-47ff-3299940928ca@gmail.com
Rockchip boot blocks are written per 4 x 512 byte sectors per page.
Each page with boot blocks must have a page address (PA) pointer in OOB
to the next page.
The currently advertised free OOB area starts at offset 6, like
if 4 PA bytes were located right after the BBM. This is wrong as the
PA bytes are located right before the ECC bytes.
Fix the layout by allowing access to all bytes between the BBM and the
PA bytes instead of reserving 4 bytes right after the BBM.
This change breaks existing jffs2 users.
Fixes: 058e0e847d ("mtd: rawnand: rockchip: NFC driver for RK3308, RK2928 and others")
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/d202f12d-188c-20e8-f2c2-9cc874ad4d22@gmail.com
Even if sdhci_pltfm_pmops is specified for PM, this driver doesn't apply
sdhci_pltfm, so the structure is not correctly referenced in PM functions.
This applies sdhci_pltfm to this driver to fix this issue.
- Call sdhci_pltfm_init() instead of sdhci_alloc_host() and
other functions that covered by sdhci_pltfm.
- Move ops and quirks to sdhci_pltfm_data
- Replace sdhci_priv() with own private function sdhci_f_sdh30_priv().
Fixes: 87a507459f ("mmc: sdhci: host: add new f_sdh30")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230630004533.26644-1-hayashi.kunihiko@socionext.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Manivannan has been actively reviewing patches and testing changes
related to the DesignWare core driver and other DWC-based PCIe drivers
for a while now.
Add Manivannan as a maintainer for the Synopsys DesignWare driver to make
his role and contributions official.
Thank you Manivannan! For all the help with DWC!
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
exfat_extract_uni_name copies characters from a given file name entry into
the 'uniname' variable. This variable is actually defined on the stack of
the exfat_readdir() function. According to the definition of
the 'exfat_uni_name' type, the file name should be limited 255 characters
(+ null teminator space), but the exfat_get_uniname_from_ext_entry()
function can write more characters because there is no check if filename
entries exceeds max filename length. This patch add the check not to copy
filename characters when exceeding max filename length.
Cc: stable@vger.kernel.org
Cc: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reported-by: Maxim Suhanov <dfirblog@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence it could have a fully functional s390 kernel
with CONFIG_PCI=n, then the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that it
won't be built to cause below compiling error if PCI is unset.
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: dmaengine@vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Once the ECC word endianness is converted to BE32, we force cast it
to u32 so we can use elm_write_reg() which in turn uses writel().
Fixes below sparse warnings:
drivers/mtd/nand/raw/omap_elm.c:180:37: sparse: expected unsigned int [usertype] val
drivers/mtd/nand/raw/omap_elm.c:180:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:185:37: sparse: expected unsigned int [usertype] val
drivers/mtd/nand/raw/omap_elm.c:185:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:190:37: sparse: expected unsigned int [usertype] val
drivers/mtd/nand/raw/omap_elm.c:190:37: sparse: got restricted __be32 [usertype]
>> drivers/mtd/nand/raw/omap_elm.c:200:40: sparse: sparse: restricted __be32 degrades to integer
drivers/mtd/nand/raw/omap_elm.c:206:39: sparse: sparse: restricted __be32 degrades to integer
drivers/mtd/nand/raw/omap_elm.c:210:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:210:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:213:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:213:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:216:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:216:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:219:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:219:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:222:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:222:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:225:37: sparse: expected unsigned int [assigned] [usertype] val
drivers/mtd/nand/raw/omap_elm.c:225:37: sparse: got restricted __be32 [usertype]
drivers/mtd/nand/raw/omap_elm.c:228:39: sparse: sparse: restricted __be32 degrades to integer
Fixes: bf22433575 ("mtd: devices: elm: Add support for ELM error correction")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306212211.WDXokuWh-lkp@intel.com/
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230624184021.7740-1-rogerq@kernel.org
There exists no parameter called "cpu_intensive_threshold_us".
The actual parameter name is "cpu_intensive_thresh_us".
Fixes: 6363845005 ("workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
The correct dts property for the SCL falling time is
"i2c-scl-falling-time-ns".
Fixes: c8da1d15b8 ("arm64: dts: stratix10: i2c clock running out of spec")
Cc: stable@vger.kernel.org
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
The call stack shown below is a scenario in the Linux 4.19 kernel.
Allocating memory failed where exfat fs use kmalloc_array due to
system memory fragmentation, while the u-disk was inserted without
recognition.
Devices such as u-disk using the exfat file system are pluggable and
may be insert into the system at any time.
However, long-term running systems cannot guarantee the continuity of
physical memory. Therefore, it's necessary to address this issue.
Binder:2632_6: page allocation failure: order:4,
mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
Call trace:
[242178.097582] dump_backtrace+0x0/0x4
[242178.097589] dump_stack+0xf4/0x134
[242178.097598] warn_alloc+0xd8/0x144
[242178.097603] __alloc_pages_nodemask+0x1364/0x1384
[242178.097608] kmalloc_order+0x2c/0x510
[242178.097612] kmalloc_order_trace+0x40/0x16c
[242178.097618] __kmalloc+0x360/0x408
[242178.097624] load_alloc_bitmap+0x160/0x284
[242178.097628] exfat_fill_super+0xa3c/0xe7c
[242178.097635] mount_bdev+0x2e8/0x3a0
[242178.097638] exfat_fs_mount+0x40/0x50
[242178.097643] mount_fs+0x138/0x2e8
[242178.097649] vfs_kern_mount+0x90/0x270
[242178.097655] do_mount+0x798/0x173c
[242178.097659] ksys_mount+0x114/0x1ac
[242178.097665] __arm64_sys_mount+0x24/0x34
[242178.097671] el0_svc_common+0xb8/0x1b8
[242178.097676] el0_svc_handler+0x74/0x90
[242178.097681] el0_svc+0x8/0x340
By analyzing the exfat code,we found that continuous physical memory
is not required here,so kvmalloc_array is used can solve this problem.
Cc: stable@vger.kernel.org
Signed-off-by: gaoming <gaoming20@hihonor.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
The function meson_clk_pll_enable() can be invoked under the enable_lock
spinlock from the clk core logic, which risks a kernel panic during the
usleep_range() call:
BUG: scheduling while atomic: kworker/u4:2/36/0x00000002
Modules linked in: g_ffs usb_f_fs libcomposite
CPU: 1 PID: 36 Comm: kworker/u4:2 Not tainted 6.4.0-rc5 #273
Workqueue: events_unbound async_run_entry_fn
Call trace:
dump_backtrace+0x9c/0x128
show_stack+0x20/0x38
dump_stack_lvl+0x48/0x60
dump_stack+0x18/0x28
__schedule_bug+0x58/0x78
__schedule+0x828/0xa88
schedule+0x64/0xd8
schedule_hrtimeout_range_clock+0xd0/0x208
schedule_hrtimeout_range+0x1c/0x30
usleep_range_state+0x6c/0xa8
meson_clk_pll_enable+0x1f4/0x310
clk_core_enable+0x78/0x200
clk_core_enable+0x58/0x200
clk_core_enable+0x58/0x200
clk_core_enable+0x58/0x200
clk_enable+0x34/0x60
So it is required to use the udelay() function instead of usleep_range()
for the atomic context safety.
Fixes: b6ec400aa1 ("clk: meson: introduce new pll power-on sequence for A1 SoC family")
Reported-by: Jan Dakinevich <yvdakinevich@sberdevices.ru>
Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Link: https://lore.kernel.org/r/20230704215404.11533-1-ddrokosov@sberdevices.ru
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
When ip_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ip_vti device sends IPv6 packets.
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When ipv6_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ipv6_vti device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff88802e08edc2 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-next-20230707-00001-g84e2cad7f979 #410
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
vti6_tnl_xmit+0x3e6/0x1ee0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
Allocated by task 9176:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
netlink_sendmsg+0x9b1/0xe30
sock_sendmsg+0xde/0x190
____sys_sendmsg+0x739/0x920
___sys_sendmsg+0x110/0x1b0
__sys_sendmsg+0xf7/0x1c0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 9176:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2b/0x40
____kasan_slab_free+0x160/0x1c0
slab_free_freelist_hook+0x11b/0x220
kmem_cache_free+0xf0/0x490
skb_free_head+0x17f/0x1b0
skb_release_data+0x59c/0x850
consume_skb+0xd2/0x170
netlink_unicast+0x54f/0x7f0
netlink_sendmsg+0x926/0xe30
sock_sendmsg+0xde/0x190
____sys_sendmsg+0x739/0x920
___sys_sendmsg+0x110/0x1b0
__sys_sendmsg+0xf7/0x1c0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff88802e08ed00
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 194 bytes inside of
freed 640-byte region [ffff88802e08ed00, ffff88802e08ef80)
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When the xfrm device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when the xfrm device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff8881111458ef by task swapper/3/0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
xfrmi_xmit+0x173/0x1ca0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:intel_idle_hlt+0x23/0x30
Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4
RSP: 0018:ffffc90000197d78 EFLAGS: 00000246
RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5
RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50
RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d
R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001
R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000
cpuidle_enter_state+0xd3/0x6f0
cpuidle_enter+0x4e/0xa0
do_idle+0x2fe/0x3c0
cpu_startup_entry+0x18/0x20
start_secondary+0x200/0x290
secondary_startup_64_no_verify+0x167/0x16b
</TASK>
Allocated by task 939:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
inet6_ifa_notify+0x118/0x230
__ipv6_ifa_notify+0x177/0xbe0
addrconf_dad_completed+0x133/0xe00
addrconf_dad_work+0x764/0x1390
process_one_work+0xa32/0x16f0
worker_thread+0x67d/0x10c0
kthread+0x344/0x440
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888111145800
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 239 bytes inside of
freed 640-byte region [ffff888111145800, ffff888111145a80)
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
There is some instablity with some eMMC modules on ROCK Pi 4 SBCs running
in HS400 mode. This ends up resulting in some block errors after a while
or after a "heavy" operation utilising the eMMC (e.g. resizing a
filesystem). An example of these errors is as follows:
[ 289.171014] mmc1: running CQE recovery
[ 290.048972] mmc1: running CQE recovery
[ 290.054834] mmc1: running CQE recovery
[ 290.060817] mmc1: running CQE recovery
[ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
[ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
[ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
[ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
[ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
[ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
[ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
[ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
[ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
[ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
[ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
[ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297
Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.
Fixes: 246450344d ("arm64: dts: rockchip: rk3399: Radxa ROCK 4C+")
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
Link: https://lore.kernel.org/r/20230705144255.115299-3-chris.obbard@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
There is some instablity with some eMMC modules on ROCK Pi 4 SBCs running
in HS400 mode. This ends up resulting in some block errors after a while
or after a "heavy" operation utilising the eMMC (e.g. resizing a
filesystem). An example of these errors is as follows:
[ 289.171014] mmc1: running CQE recovery
[ 290.048972] mmc1: running CQE recovery
[ 290.054834] mmc1: running CQE recovery
[ 290.060817] mmc1: running CQE recovery
[ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
[ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
[ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
[ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
[ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
[ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
[ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
[ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
[ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
[ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
[ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
[ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297
Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.
While we are here, set the eMMC maximum clock frequency to 1.5MHz to
follow the ROCK 4C+.
Fixes: 1b5715c602 ("arm64: dts: rockchip: add ROCK Pi 4 DTS support")
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
Tested-By: Folker Schwesinger <dev@folker-schwesinger.de>
Link: https://lore.kernel.org/r/20230705144255.115299-2-chris.obbard@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning: GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE.
Correct the interrupt flags, assuming the author of the code wanted same
logical behavior behind the name "ACTIVE_xxx", this is:
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230707063335.13317-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning: GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE.
Correct the interrupt flags, assuming the author of the code wanted same
logical behavior behind the name "ACTIVE_xxx", this is:
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Tested-by: Christopher Obbard <chris.obbard@collabora.com>
Link: https://lore.kernel.org/r/20230707063335.13317-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning: GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE.
Correct the interrupt flags, assuming the author of the code wanted same
logical behavior behind the name "ACTIVE_xxx", this is:
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230707063335.13317-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
With the introduction of commit 9365bf006f ("PCI: tegra194: Add
interconnect support in Tegra234"), the PCI driver on Tegra194 and later
requires an interconnect provider. However, a provider is currently only
exposed on Tegra234 and this causes PCI on Tegra194 to defer probe
indefinitely.
Fix this by adding a dummy implementation on Tegra194. This allows nodes
to be provided to interconnect consumers, but doesn't do any bandwidth
accounting or frequency scaling.
Fixes: 9365bf006f ("PCI: tegra194: Add interconnect support in Tegra234")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
Tested-by: Sumit Gupta <sumitg@nvidia.com>
Link: https://lore.kernel.org/r/20230629160132.768940-1-thierry.reding@gmail.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
After the elimination of inner modes, a couple of warnings that
were previously unreachable can now be triggered by malformed
inbound packets.
Fix this by:
1. Moving the setting of skb->protocol into the decap functions.
2. Returning -EINVAL when unexpected protocol is seen.
Reported-by: Maciej Żenczykowski<maze@google.com>
Fixes: 5f24f41e8e ("xfrm: Remove inner/outer modes from input path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The commit 3a786086c6 ("arm64: dts: qcom: Add missing "-thermal"
suffix for thermal zones") renamed the thermal zone in the pm8150l.dtsi
file to comply with the schema. However this resulted in a clash with
the RB5 board file, which already contained the pm8150l-thermal zone for
the on-board sensor. This resulted in the board file definition
overriding the thermal zone defined in the PMIC include file (and thus
the on-die PMIC temp alarm was not probing at all).
Rename the thermal zone in qcom/qrb5165-rb5.dts to remove this override.
Fixes: 3a786086c6 ("arm64: dts: qcom: Add missing "-thermal" suffix for thermal zones")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20230613131224.666668-1-dmitry.baryshkov@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
SM8350 HDK and MTP boards were silently dying and rebooting during BAM
DMA probe, probably during reading BAM_REVISION register:
[ 1.574304] vreg_bob: Setting 3008000-3960000uV
[ 1.576918] bam-dFormat: Log Type - Time(microsec) - Message -
Optional Info
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
S - QC_IMAGE_VERSION_STRING=BOOT.MXF.1.0-00637.1-LAHAINA-1
S - IMAGE_VARIANT_STRING=SocLahainaLAA
S - OEM_IMAGE_VERSION_STRING=crm-ubuntu77
S - Boot Interface: UFS
It seems that BAM DMA is not yet operational, thus mark it as failed and
disable also QCE because it won't work without BAM DMA.
Fixes: f1040a7fe8 ("arm64: dts: qcom: sm8350: Add Crypto Engine support")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230626145959.646747-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
According to all consumers code of attrs[XFRMA_SEC_CTX], like
* verify_sec_ctx_len(), convert to xfrm_user_sec_ctx*
* xfrm_state_construct(), call security_xfrm_state_alloc whose prototype
is int security_xfrm_state_alloc(.., struct xfrm_user_sec_ctx *sec_ctx);
* copy_from_user_sec_ctx(), convert to xfrm_user_sec_ctx *
...
It seems that the expected parsing result for XFRMA_SEC_CTX should be
structure xfrm_user_sec_ctx, and the current xfrm_sec_ctx is confusing
and misleading (Luckily, they happen to have same size 8 bytes).
This commit amend the policy structure to xfrm_user_sec_ctx to avoid
ambiguity.
Fixes: cf5cb79f69 ("[XFRM] netlink: Establish an attribute policy")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Commmit f5ea16137a ("NFSv4: Retry LOCK on OLD_STATEID during delegation
return") attempted to solve this problem by using nfs4's generic async error
handling, but introduced a regression where v4.0 lock recovery would hang.
The additional complexity introduced by overloading that error handling is
not necessary for this case. This patch expects that commit to be
reverted.
The problem as originally explained in the above commit is:
There's a small window where a LOCK sent during a delegation return can
race with another OPEN on client, but the open stateid has not yet been
updated. In this case, the client doesn't handle the OLD_STATEID error
from the server and will lose this lock, emitting:
"NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
Fix this by using the old_stateid refresh helpers if the server replies
with OLD_STATEID.
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When running xfrm_state_walk_init(), the xfrm_address_filter being used
is okay to have a splen/dplen that equals to sizeof(xfrm_address_t)<<3.
This commit replaces >= to > to make sure the boundary checking is
correct.
Fixes: 37bd22420f ("af_key: pfkey_dump needs parameter validation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Several places in code for Hyper-V reference the
per-CPU variable hyperv_pcpu_input_arg. Older code uses a multi-line
sequence to reference the variable, and usually includes a cast.
Newer code does a much simpler direct assignment. The latter is
preferable as the complexity of the older code is unnecessary.
Update older code to use the simpler direct assignment.
Signed-off-by: Nischala Yelchuri <niyelchu@linux.microsoft.com>
Link: https://lore.kernel.org/r/1687286438-9421-1-git-send-email-niyelchu@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The am335x devices started producing boot errors for resetting musb module
in because of subtle timing changes:
Unhandled fault: external abort on non-linefetch (0x1008)
...
sysc_poll_reset_sysconfig from sysc_reset+0x109/0x12
sysc_reset from sysc_probe+0xa99/0xeb0
...
The fix is to flush posted write after enable before reset during
probe. Note that some devices also need to specify the delay after enable
with ti,sysc-delay-us, but this is not needed for musb on am335x based on
my tests.
Reported-by: kernelci.org bot <bot@kernelci.org>
Closes: https://storage.kernelci.org/next/master/next-20230614/arm/multi_v7_defconfig+CONFIG_THUMB2_KERNEL=y/gcc-10/lab-cip/baseline-beaglebone-black.html
Fixes: 596e795569 ("bus: ti-sysc: Add support for software reset")
Signed-off-by: Tony Lindgren <tony@atomide.com>
2023-06-14 11:03:06 +03:00
999 changed files with 9681 additions and 4792 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.