epoll can call out to vfs_poll() with a file pointer that may race with
the last 'fput()'. That would make f_count go down to zero, and while
the ep->mtx locking means that the resulting file pointer tear-down will
be blocked until the poll returns, it means that f_count is already
dead, and any use of it won't actually get a reference to the file any
more: it's dead regardless.
Make sure we have a valid ref on the file pointer before we call down to
vfs_poll() from the epoll routines.
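
As a rough user-space illustration of the idea (not the kernel code
itself), the sketch below takes a reference only while the count is still
non-zero and treats a dead count as "no events". The names fake_file,
fake_file_get_unless_zero(), fake_file_put(), fake_poll_readable() and
poll_pinned() are made up for this example, and C11 atomics stand in for
the kernel's atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted file object. */
struct fake_file {
	atomic_long f_count;
};

/*
 * Take a reference only if the count has not already dropped to zero.
 * Returns false when the object is already dead and must not be used.
 */
bool fake_file_get_unless_zero(struct fake_file *f)
{
	long old = atomic_load(&f->f_count);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&f->f_count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
bool fake_file_put(struct fake_file *f)
{
	return atomic_fetch_sub(&f->f_count, 1) == 1;
}

/* Hypothetical poll callback that always reports one ready event bit. */
unsigned int fake_poll_readable(struct fake_file *f)
{
	(void)f;
	return 0x1;
}

/*
 * Pin the file for the duration of the poll callback, and report no
 * events if the file is already on its way out.
 */
unsigned int poll_pinned(struct fake_file *f,
			 unsigned int (*poll)(struct fake_file *))
{
	unsigned int events;

	if (!fake_file_get_unless_zero(f))
		return 0;
	events = poll(f);
	fake_file_put(f);
	return events;
}
```

The point of the compare-and-swap loop is that a plain increment would
"revive" a count that has already hit zero, which is exactly the case the
text above describes as dead regardless.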
Link: https://lore.kernel.org/lkml/0000000000002d631f0615918f1e@google.com/
Reported-by: syzbot+045b454ab35fd82a35fb@syzkaller.appspotmail.com
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull EDAC fixes from Borislav Petkov:
- Fix error logging and check user-supplied data when injecting an
error in the versal EDAC driver
* tag 'edac_urgent_for_v6.9_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/versal: Do not log total error counts
EDAC/versal: Check user-supplied data before injecting an error
EDAC/versal: Do not register for NOC errors
Pull powerpc fixes from Michael Ellerman:
- Fix incorrect delay handling in the plpks (keystore) code
- Fix a panic when an LPAR boots with a frozen PE
Thanks to Andrew Donnellan, Gaurav Batra, Nageswara R Sastry, and Nayna
Jain.
* tag 'powerpc-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE
powerpc/pseries: make max polling consistent for longer H_CALLs
Pull misc x86 fixes from Ingo Molnar:
- Remove the broken vsyscall emulation code from
the page fault code
- Fix kexec crash triggered by certain SEV RMP
table layouts
- Fix unchecked MSR access error when disabling
the x2APIC via iommu=off
* tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove broken vsyscall emulation code from the page fault code
x86/apic: Don't access the APIC when disabling x2APIC
x86/sev: Add callback to apply RMP table fixups for kexec
x86/e820: Add a new e820 table update helper
Pull irq fix from Ingo Molnar:
"Fix suspicious RCU usage in __do_softirq()"
* tag 'irq-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
softirq: Fix suspicious RCU usage in __do_softirq()
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc/other driver fixes and new device ids
for 6.9-rc7 that resolve some reported problems.
Included in here are:
- iio driver fixes
- mei driver fix and new device ids
- dyndbg bugfix
- pvpanic-pci driver bugfix
- slimbus driver bugfix
- fpga new device id
All have been in linux-next with no reported problems"
* tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
slimbus: qcom-ngd-ctrl: Add timeout for wait operation
dyndbg: fix old BUG_ON in >control parser
misc/pvpanic-pci: register attributes via pci_driver
fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card
mei: me: add lunar lake point M DID
mei: pxp: match against PCI_CLASS_DISPLAY_OTHER
iio:imu: adis16475: Fix sync mode setting
iio: accel: mxc4005: Reset chip on probe() and resume()
iio: accel: mxc4005: Interrupt handling fixes
dt-bindings: iio: health: maxim,max30102: fix compatible check
iio: pressure: Fixes SPI support for BMP3xx devices
iio: pressure: Fixes BME280 SPI driver data
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes for reported problems for
6.9-rc7. Included in here are:
- usb core fixes for found issues
- typec driver fixes for reported problems
- usb gadget driver fixes for reported problems
- xhci build fixes
- dwc3 driver fixes for reported issues
All of these have been in linux-next this past week with no reported
problems"
* tag 'usb-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: Check for port partner validity before consuming it
usb: typec: tcpm: enforce ready state when queueing alt mode vdm
usb: typec: tcpm: unregister existing source caps before re-registration
usb: typec: tcpm: clear pd_event queue in PORT_RESET
usb: typec: tcpm: queue correct sop type in tcpm_queue_vdm_unlocked
usb: Fix regression caused by invalid ep0 maxpacket in virtual SuperSpeed device
usb: ohci: Prevent missed ohci interrupts
usb: typec: qcom-pmic: fix pdphy start() error handling
usb: typec: qcom-pmic: fix use-after-free on late probe errors
usb: gadget: f_fs: Fix a race condition when processing setup packets.
USB: core: Fix access violation during port device removal
usb: dwc3: core: Prevent phy suspend during init
usb: xhci-plat: Don't include xhci.h
usb: gadget: uvc: use correct buffer size when parsing configfs lists
usb: gadget: composite: fix OS descriptors w_value logic
usb: gadget: f_fs: Fix race between aio_cancel() and AIO request complete
Pull input fixes from Dmitry Torokhov:
- a new ID for ASUS ROG RAIKIRI controllers added to xpad driver
- amimouse driver structure annotated with __refdata to prevent section
mismatch warnings.
* tag 'input-for-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: amimouse - mark driver struct with __refdata to prevent section mismatch
Input: xpad - add support for ASUS ROG RAIKIRI
Pull probes fix from Masami Hiramatsu:
- probe-events: Fix memory leak in parsing probe argument.
There is a memory leak (an allocated buffer is never freed) in a
memory allocation failure path. Fix it to jump to the correct error
handling code.
* tag 'probes-fixes-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Fix memory leak in traceprobe_parse_probe_arg_body()
Pull tracing and tracefs fixes from Steven Rostedt:
- Fix RCU callback of freeing an eventfs_inode.
The freeing of the eventfs_inode from the kref going to zero freed
the contents of the eventfs_inode and then used kfree_rcu() to free
the inode itself. But the contents should also be protected by RCU.
Switch to a call_rcu() that calls a function to free all of the
eventfs_inode after the RCU synchronization.
- The tracing subsystem maps its own descriptor to a file represented
by eventfs. The freeing of this descriptor needs to know when the
last reference of an eventfs_inode is released, but currently there
is no interface for that.
Add a "release" callback to the eventfs_inode entry array that allows
for freeing of data that can be referenced by the eventfs_inode being
opened. Then increment the ref counter for this descriptor when the
eventfs_inode file is created, and decrement/free it when the last
reference to the eventfs_inode is released and the file is removed.
This prevents races between freeing the descriptor and the opening of
the eventfs file.
- Fix the permission processing of eventfs.
The change to make the permissions of eventfs default to the mount
point but keep track of when changes were made had a side effect that
could cause security concerns. When the tracefs is remounted with a
given gid or uid, all the files within it should inherit that gid or
uid. But if the admin had changed the permission of some file within
the tracefs file system, it would not get updated by the remount.
This caused the kselftest of file permissions to fail the second time
it is run. The first time, all changes would look fine, but the
second time, because the changes were "saved", the remount did not
reset them.
Create a linked list of all existing tracefs inodes, and clear the
saved flags on them on a remount if the remount changes the
corresponding gid or uid fields.
This also simplifies the code by removing the distinction between the
toplevel eventfs and an instance eventfs. They should both act the
same. They were different because of a misconception due to the
remount not resetting the flags. Now that remount resets all the
files and directories to default to the root node if a uid/gid is
specified, it makes the logic simpler to implement.
* tag 'trace-v6.9-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Have "events" directory get permissions from its parent
eventfs: Do not treat events directory different than other directories
eventfs: Do not differentiate the toplevel events directory
tracefs: Still use mount point as default permissions for instances
tracefs: Reset permissions on remount if permissions are options
eventfs: Free all of the eventfs_inode after RCU
eventfs/tracing: Add callback for release of an eventfs_inode
Pull dma-mapping fix from Christoph Hellwig:
- fix the combination of restricted pools and dynamic swiotlb
(Will Deacon)
* tag 'dma-mapping-6.9-2024-05-04' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: initialise restricted pool list_head when SWIOTLB_DYNAMIC=y
Pull clk fixes from Stephen Boyd:
"A handful of clk driver fixes:
- Avoid a deadlock in the Qualcomm clk driver by making the regulator
which supplies the GDSC optional
- Restore RPM clks on Qualcomm msm8976 by setting num_clks
- Fix Allwinner H6 CPU rate changing logic to avoid system crashes by
temporarily reparenting the CPU clk to something that isn't being
changed
- Set a MIPI PLL min/max rate on Allwinner A64 to fix blank screens
on some devices
- Revert back to of_match_device() in the Samsung clkout driver to
get the match data based on the parent device's compatible string"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: samsung: Revert "clk: Use device_get_match_data()"
clk: sunxi-ng: a64: Set minimum and maximum rate for PLL-MIPI
clk: sunxi-ng: common: Support minimum and maximum rate
clk: sunxi-ng: h6: Reparent CPUX during PLL CPUX rate change
clk: qcom: smd-rpm: Restore msm8976 num_clk
clk: qcom: gdsc: treat optional supplies as optional
The events directory gets its permissions from the root inode. But this
can cause an inconsistency if the instances directory changes its
permissions, as directories created under it should inherit the
permissions of the instances directory.
Currently the behavior is:
# cd /sys/kernel/tracing
# chgrp 1002 instances
# mkdir instances/foo
# ls -l instances/foo
[..]
-r--r----- 1 root lkp 0 May 1 18:55 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 18:55 current_tracer
-rw-r----- 1 root lkp 0 May 1 18:55 error_log
drwxr-xr-x 1 root root 0 May 1 18:55 events
--w------- 1 root lkp 0 May 1 18:55 free_buffer
drwxr-x--- 2 root lkp 0 May 1 18:55 options
drwxr-x--- 10 root lkp 0 May 1 18:55 per_cpu
-rw-r----- 1 root lkp 0 May 1 18:55 set_event
All the files and directories under "foo" have the "lkp" group except the
"events" directory. That's because it's getting its default value from the
mount point instead of its parent.
Have the "events" directory make its default value based on its parent's
permissions. That now gives:
# ls -l instances/foo
[..]
-rw-r----- 1 root lkp 0 May 1 21:16 buffer_subbuf_size_kb
-r--r----- 1 root lkp 0 May 1 21:16 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:16 current_tracer
-rw-r----- 1 root lkp 0 May 1 21:16 error_log
drwxr-xr-x 1 root lkp 0 May 1 21:16 events
--w------- 1 root lkp 0 May 1 21:16 free_buffer
drwxr-x--- 2 root lkp 0 May 1 21:16 options
drwxr-x--- 10 root lkp 0 May 1 21:16 per_cpu
-rw-r----- 1 root lkp 0 May 1 21:16 set_event
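
The intended lookup can be modelled in user space roughly as follows. This
is only an illustrative model, not the eventfs implementation: struct
dir_node and effective_gid() are hypothetical names, and the walk up the
parent chain is an assumption about how "default to the parent" could be
expressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directory node; not the kernel's tracefs/eventfs types. */
struct dir_node {
	const struct dir_node *parent; /* NULL for the mount point */
	bool gid_set;                  /* true if the gid was set explicitly */
	unsigned int gid;              /* valid when gid_set or at the mount point */
};

/*
 * Resolve the group a node should show: its own value if one was set,
 * otherwise whatever its parent resolves to.
 */
unsigned int effective_gid(const struct dir_node *n)
{
	while (!n->gid_set && n->parent)
		n = n->parent;
	return n->gid;
}
```

With "instances" marked as explicitly set to gid 1002, a node for
"instances/foo/events" with no value of its own resolves to 1002 in this
model, while a top level "events" node still resolves to the mount point
value.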
Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.161887248@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Treat the events directory the same as other directories when it comes to
permissions. The events directory was considered different because its
dentry is persistent, whereas the other directory dentries are created
when accessed. But with the way tracefs now handles ownership, using the
root dentry's permissions as the default permissions, the events directory
can get out of sync when a remount is performed setting the group and user
permissions.
Remove the special case for the events directory on setting the
attributes. This allows the updates caused by remount to work properly as
well as simplifies the code.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.002923579@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If the instances directory's permissions were never changed, then have it
and its children use the mount point permissions as the default.
Currently, the permissions of instance directories are determined by the
instance directory's permissions itself. But if the tracefs file system is
remounted and changes the permissions, the instance directory and its
children should use the new permission.
But because both the instance directory and its children use the instance
directory's inode for permissions, it misses the update.
To demonstrate this:
# cd /sys/kernel/tracing/
# mkdir instances/foo
# ls -ld instances/foo
drwxr-x--- 5 root root 0 May 1 19:07 instances/foo
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 18:57 instances
# ls -ld current_tracer
-rw-r----- 1 root root 0 May 1 18:57 current_tracer
# mount -o remount,gid=1002 .
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 18:57 instances
# ls -ld instances/foo/
drwxr-x--- 5 root root 0 May 1 19:07 instances/foo/
# ls -ld current_tracer
-rw-r----- 1 root lkp 0 May 1 18:57 current_tracer
Notice that changing the group id to that of "lkp" did not affect the
instances directory nor its children. It should have been:
# ls -ld current_tracer
-rw-r----- 1 root root 0 May 1 19:19 current_tracer
# ls -ld instances/foo/
drwxr-x--- 5 root root 0 May 1 19:25 instances/foo/
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 19:19 instances
# mount -o remount,gid=1002 .
# ls -ld current_tracer
-rw-r----- 1 root lkp 0 May 1 19:19 current_tracer
# ls -ld instances
drwxr-x--- 3 root lkp 0 May 1 19:19 instances
# ls -ld instances/foo/
drwxr-x--- 5 root lkp 0 May 1 19:25 instances/foo/
Where all files were updated by the remount gid update.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.686838327@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
There's an inconsistency with the way permissions are handled in tracefs.
Because the permissions are generated when accessed, they default to the
root inode's permission if they were never set by the user. If the user
sets the permissions, then a flag is set and the permissions are saved via
the inode (for tracefs files) or an internal attribute field (for
eventfs).
But if a remount happens that specifies the permissions, all the files that
were not changed by the user get updated, but the ones that were are not.
If the user were to remount the file system with a given permission, then
all files and directories within that file system should be updated.
This can cause security issues if a file's permission was updated but the
admin forgot about it. They could incorrectly think that remounting with
permissions set would update all files, but miss some.
For example:
# cd /sys/kernel/tracing
# chgrp 1002 current_tracer
# ls -l
[..]
-rw-r----- 1 root root 0 May 1 21:25 buffer_size_kb
-rw-r----- 1 root root 0 May 1 21:25 buffer_subbuf_size_kb
-r--r----- 1 root root 0 May 1 21:25 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:25 current_tracer
-rw-r----- 1 root root 0 May 1 21:25 dynamic_events
-r--r----- 1 root root 0 May 1 21:25 dyn_ftrace_total_info
-r--r----- 1 root root 0 May 1 21:25 enabled_functions
Where current_tracer now has group "lkp".
# mount -o remount,gid=1001 .
# ls -l
-rw-r----- 1 root tracing 0 May 1 21:25 buffer_size_kb
-rw-r----- 1 root tracing 0 May 1 21:25 buffer_subbuf_size_kb
-r--r----- 1 root tracing 0 May 1 21:25 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:25 current_tracer
-rw-r----- 1 root tracing 0 May 1 21:25 dynamic_events
-r--r----- 1 root tracing 0 May 1 21:25 dyn_ftrace_total_info
-r--r----- 1 root tracing 0 May 1 21:25 enabled_functions
Everything changed but the "current_tracer".
Add a new linked list that keeps track of all the tracefs_inodes which have
the permission flags that tell if the file/dir should use the root inode's
permission or not. Then on remount, clear all the flags so that the
default behavior of using the root inode's permission is done for all
files and directories.
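
A minimal user-space sketch of that scheme, for the gid case only, might
look like the following. The types and helpers (fake_inode, fake_mount,
FAKE_SAVED_GID, fake_chgrp(), fake_remount_gid() and so on) are
hypothetical and only mirror the description above; the uid case would be
analogous.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_SAVED_GID 0x1u /* hypothetical flag: gid was changed by the user */

struct fake_inode {
	struct fake_inode *next; /* link in the list of all inodes */
	unsigned int flags;
	unsigned int gid;        /* only meaningful when FAKE_SAVED_GID is set */
};

struct fake_mount {
	struct fake_inode *inodes; /* head of the list of all inodes */
	unsigned int gid;          /* default taken from the mount options */
};

/* Track a new inode so that a later remount can find it. */
void fake_inode_add(struct fake_mount *m, struct fake_inode *i)
{
	i->next = m->inodes;
	m->inodes = i;
}

/* chgrp on one file: remember the value and mark it as user-set. */
void fake_chgrp(struct fake_inode *i, unsigned int gid)
{
	i->gid = gid;
	i->flags |= FAKE_SAVED_GID;
}

/* Group shown for a file: the saved value if flagged, else the default. */
unsigned int fake_gid(const struct fake_mount *m, const struct fake_inode *i)
{
	return (i->flags & FAKE_SAVED_GID) ? i->gid : m->gid;
}

/* Remount with gid=: update the default and clear every saved flag. */
void fake_remount_gid(struct fake_mount *m, unsigned int gid)
{
	m->gid = gid;
	for (struct fake_inode *i = m->inodes; i; i = i->next)
		i->flags &= ~FAKE_SAVED_GID;
}
```

Replaying the example above against this model, current_tracer reports
1002 after the chgrp and 1001 after the remount, like every other file.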
Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.529542160@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Synthetic events create and destroy tracefs files when they are created
and removed. The tracing subsystem has its own file descriptor
representing the state of the events attached to the tracefs files.
There's a race between the eventfs files and this file descriptor of the
tracing system where the following can cause an issue:
With two scripts 'A' and 'B' doing:
Script 'A':
echo "hello int aaa" > /sys/kernel/tracing/synthetic_events
while :
do
echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable
done
Script 'B':
echo > /sys/kernel/tracing/synthetic_events
Script 'A' creates a synthetic event "hello" and then just writes zero
into its enable file.
Script 'B' removes all synthetic events (including the newly created
"hello" event).
What happens is that the opening of the "enable" file has:
{
struct trace_event_file *file = inode->i_private;
int ret;
ret = tracing_check_open_get_tr(file->tr);
[..]
But deleting the events frees the "file" descriptor, and a "use after
free" happens with the dereference at "file->tr".
The file descriptor does have a reference counter, but there needs to be a
way to decrement it from the eventfs when the eventfs_inode that
represents this file descriptor is removed.
Add an optional "release" callback to the eventfs_entry array structure,
that gets called when the eventfs file is about to be removed. This allows
for the creation of the eventfs file to increment the tracing file
descriptor ref counter. When the eventfs file is deleted, it can call the
release function that will call the put function for the tracing file
descriptor.
This will protect the tracing file from being freed while an eventfs file
that references it is being opened.
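
The lifetime rule can be sketched in user space like this. It is a
single-threaded illustration under made-up names (fake_event_file,
fake_entry, fake_entry_create(), fake_entry_remove()), with a "freed" flag
in place of a real free so the effect is observable; the kernel side uses
its own structures and locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the tracing side's per-event descriptor. */
struct fake_event_file {
	int ref;
	int freed; /* set instead of freeing, so the sketch stays observable */
};

void fake_event_file_get(struct fake_event_file *f)
{
	f->ref++;
}

void fake_event_file_put(struct fake_event_file *f)
{
	if (--f->ref == 0)
		f->freed = 1;
}

/* Hypothetical filesystem entry with an optional release hook. */
struct fake_entry {
	void *data;
	void (*release)(void *data);
};

/* Release hook: drop the reference taken when the entry was created. */
void fake_release_event_file(void *data)
{
	fake_event_file_put(data);
}

/* Creating the entry pins the descriptor it points at. */
void fake_entry_create(struct fake_entry *e, struct fake_event_file *f)
{
	fake_event_file_get(f);
	e->data = f;
	e->release = fake_release_event_file;
}

/* Removing the entry runs the hook, which drops that pin. */
void fake_entry_remove(struct fake_entry *e)
{
	if (e->release)
		e->release(e->data);
	e->data = NULL;
}
```

In this model, deleting the event only drops the tracing side's own
reference; the descriptor is not marked freed until the entry has been
removed as well.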
Link: https://lore.kernel.org/linux-trace-kernel/20240426073410.17154-1-Tze-nan.Wu@mediatek.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240502090315.448cba46@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Tze-nan wu <Tze-nan.Wu@mediatek.com>
Tested-by: Tze-nan Wu (吳澤南) <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull cxl fix from Dave Jiang:
"Add missing RCH support for endpoint access_coordinate calculation.
A late bug was reported by Robert Richter that the Restricted CXL Host
(RCH) support was missing in the CXL endpoint access_coordinate
calculation.
The missing support causes the topology iterator to stumble over a
NULL pointer and triggers a kernel OOPS on a platform with CXL 1.1
support.
The fix bypasses RCH topology as the access_coordinate calculation is
not necessary since RCH does not support hotplug and the memory region
exported should be covered by the HMAT table already.
A unit test is also added to cxl_test to check against future
regressions on the topology iterator"
* tag 'cxl-fixes-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH
Pull xen fixes from Juergen Gross:
"Two fixes when running as Xen PV guests for issues introduced in the
6.9 merge window, both related to apic id handling"
* tag 'for-linus-6.9a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: return a sane initial apic id when running as PV guest
x86/xen/smp_pv: Register the boot CPU APIC properly
Pull EFI fix from Ard Biesheuvel:
"This works around a shortcoming in the memory acceptation API, which
may apparently hog the CPU for long enough to trigger the softlockup
watchdog.
Note that this only affects confidential VMs running under the Intel
TDX hypervisor, which is why I accepted this for now, but this should
obviously be fixed properly in the future"
* tag 'efi-urgent-for-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/unaccepted: touch soft lockup during memory accept
Pull block fixes from Jens Axboe:
"Nothing major in here - an nvme pull request with mostly auth/tcp
fixes, and a single fix for ublk not setting segment count and size
limits"
* tag 'block-6.9-20240503' of git://git.kernel.dk/linux:
nvme-tcp: strict pdu pacing to avoid send stalls on TLS
nvmet: fix nvme status code when namespace is disabled
nvmet-tcp: fix possible memory leak when tearing down a controller
nvme: cancel pending I/O if nvme controller is in terminal state
nvmet-auth: replace pr_debug() with pr_err() to report an error.
nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
nvme: find numa distance only if controller has valid numa id
ublk: remove segment count and size limits
nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
Pull sound fixes from Takashi Iwai:
"As usual in a late stage, we received a fair amount of fixes for ASoC,
and it became bigger than wished. But all fixes are rather device-
specific, and they look pretty safe to apply.
A major part of the changes is a series of fixes for ASoC meson and SOF
drivers as well as for Realtek and Cirrus codecs. In addition, recent
emu10k1 regression fixes and usual HD-audio quirks are included"
* tag 'sound-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (46 commits)
ALSA: hda/realtek: Fix build error without CONFIG_PM
ALSA: hda/realtek: Fix conflicting PCI SSID 17aa:386f for Lenovo Legion models
ALSA: hda/realtek - Set GPIO3 to default at S4 state for Thinkpad with ALC1318
ALSA: hda: intel-sdw-acpi: fix usage of device_get_named_child_node()
ALSA: hda: intel-dsp-config: harden I2C/I2S codec detection
ASoC: cs35l56: fix usages of device_get_named_child_node()
ASoC: da7219-aad: fix usage of device_get_named_child_node()
ASoC: meson: cards: select SND_DYNAMIC_MINORS
ASoC: meson: axg-tdm: add continuous clock support
ASoC: meson: axg-tdm-interface: manage formatters in trigger
ASoC: meson: axg-card: make links nonatomic
ASoC: meson: axg-fifo: use threaded irq to check periods
ALSA: hda/realtek: Fix mute led of HP Laptop 15-da3001TU
ALSA: emu10k1: make E-MU FPGA writes potentially more reliable
ALSA: emu10k1: fix E-MU dock initialization
ALSA: emu10k1: use mutex for E-MU FPGA access locking
ALSA: emu10k1: move the whole GPIO event handling to the workqueue
ALSA: emu10k1: factor out snd_emu1010_load_dock_firmware()
ALSA: emu10k1: fix E-MU card dock presence monitoring
ASoC: rt715-sdca: volume step modification
...
Pull drm fixes from Dave Airlie:
"Weekly fixes, mostly made up from amdgpu and some panel changes.
Otherwise xe, nouveau, vmwgfx and a couple of others, all seems pretty
on track.
amdgpu:
- Fix VRAM memory accounting
- DCN 3.1 fixes
- DCN 2.0 fix
- DCN 3.1.5 fix
- DCN 3.5 fix
- DCN 3.2.1 fix
- DP fixes
- Seamless boot fix
- Fix call order in amdgpu_ttm_move()
- Fix doorbell regression
- Disable panel replay temporarily
amdkfd:
- Flush wq before creating kfd process
xe:
- Fix UAF on rebind worker
- Fix ADL-N display integration
imagination:
- fix page-count macro
nouveau:
- avoid page-table allocation failures
- fix firmware memory allocation
panel:
- ili9341: avoid OF for device properties; respect deferred probe;
fix usage of errno codes
ttm:
- fix status output
vmwgfx:
- fix legacy display unit
- fix read length in fence signalling"
* tag 'drm-fixes-2024-05-03' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
drm/xe/display: Fix ADL-N detection
drm/panel: ili9341: Use predefined error codes
drm/panel: ili9341: Respect deferred probe
drm/panel: ili9341: Correct use of device property APIs
drm/xe/vm: prevent UAF in rebind_work_func()
drm/amd/display: Disable panel replay by default for now
drm/amdgpu: fix doorbell regression
drm/amdkfd: Flush the process wq before creating a kfd_process
drm/amd/display: Disable seamless boot on 128b/132b encoding
drm/amd/display: Fix DC mode screen flickering on DCN321
drm/amd/display: Add VCO speed parameter for DCN31 FPU
drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2
drm/amd/display: Allocate zero bw after bw alloc enable
drm/amd/display: Fix incorrect DSC instance for MST
drm/amd/display: Atom Integrated System Info v2_2 for DCN35
drm/amd/display: Add dtbclk access to dcn315
drm/amd/display: Ensure that dmcub support flag is set for DCN20
drm/amd/display: Handle Y carry-over in VCP X.Y calculation
drm/amdgpu: Fix VRAM memory accounting
drm/vmwgfx: Fix invalid reads in fence signaled events
...
Pull spi fixes from Mark Brown:
"A few small fixes for v6.9,
The core fix is for issues with reuse of a spi_message in the case
where we've got queued messages (a relatively rare occurrence with
modern code so it wasn't noticed in testing).
We also avoid an issue with the Kunpeng driver by simply removing the
debug interface that could trigger it, and address issues with
confusing and corrupted output when printing the IP version of the AXI
SPI engine"
* tag 'spi-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fix null pointer dereference within spi_sync
spi: hisi-kunpeng: Delete the dump interface of data registers in debugfs
spi: axi-spi-engine: fix version format string
Pull btrfs fixes from David Sterba:
- set correct ram_bytes when splitting ordered extent. This can be
inconsistent on-disk but harmless as it's not used for calculations
and it's only advisory for compression
- fix lockdep splat when taking cleaner mutex in qgroups disable ioctl
- fix missing mutex unlock on error path when looking up sys chunk for
relocation
* tag 'for-6.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: set correct ram_bytes when splitting ordered extent
btrfs: take the cleaner_mutex earlier in qgroup disable
btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
Pull s390 fixes from Alexander Gordeev:
- The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. Fix the callers
that provide the last byte within the range instead.
- A 3270 Channel Command Word (CCW) may contain a zero data address when
there is no data in the request. Add a data availability check to avoid
an erroneous non-zero value resulting from applying virt_to_dma32(NULL)
in that case
- Add missing CFI directives for an unwinder to restore the return
address in the vDSO assembler code
- NUL-terminate kernel buffer when duplicating user space memory region
on Channel IO (CIO) debugfs write inject
- Fix wrong format string in zcrypt debug output
- Return -EBUSY code when a CCA card is temporarily unavailable
- Restore a loop that retries derivation of a protected key from a
secure key in cases where the low level reports temporary unavailability
with -EBUSY code
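
For the NUL-termination item in the list above, the general pattern can be
shown with a small user-space sketch. dup_nul() is a hypothetical helper
written for this example; in the kernel the equivalent job is done by
helpers such as memdup_user_nul(), and this summary does not say which one
the fix uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate len bytes from src and append a NUL byte, so that string
 * parsers run on the copy cannot read past the end of the buffer.
 * Returns NULL on allocation failure or if len + 1 would overflow.
 */
char *dup_nul(const char *src, size_t len)
{
	char *buf;

	if (len == SIZE_MAX)
		return NULL;
	buf = malloc(len + 1);
	if (!buf)
		return NULL;
	memcpy(buf, src, len);
	buf[len] = '\0';
	return buf;
}
```

Copying only len bytes without the terminator leaves whatever follows the
allocation visible to functions like strlen() or sscanf().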
* tag 's390-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/paes: Reestablish retry loop in paes
s390/zcrypt: Use EBUSY to indicate temp unavailability
s390/zcrypt: Handle ep11 cprb return code
s390/zcrypt: Fix wrong format string in debug feature printout
s390/cio: Ensure the copied buf is NUL terminated
s390/vdso: Add CFI for RA register to asm macro vdso_func
s390/3270: Fix buffer assignment
s390/mm: Fix clearing storage keys for huge pages
s390/mm: Fix storage key clearing for guest huge pages
Pull xtensa fixes from Max Filippov:
- fix unused variable warning caused by empty flush_dcache_page()
definition
- fix stack unwinding on windowed noMMU XIP configurations
- fix Coccinelle warning 'opportunity for min()' in xtensa ISS platform
code
* tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: remove redundant flush_dcache_page and ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE macros
tty: xtensa/iss: Use min() to fix Coccinelle warning
xtensa: fix MAKE_PC_FROM_RA second argument
With recent sanity checks for topology information added, there are now
warnings issued for APs when running as a Xen PV guest:
[Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001
This is because the initial APIC ID obtained via CPUID for PV guests is
always 0.
Avoid the warnings by synthesizing the CPUID data to contain the same
initial APIC ID as xen_pv_smp_config() is using for registering the
APIC IDs of all CPUs.
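
CPUID leaf 1 reports the initial APIC ID in bits 31:24 of EBX, so
synthesizing it amounts to rewriting that field. The helpers below are a
stand-alone sketch of the bit manipulation only; cpuid1_ebx_with_apicid()
and cpuid1_ebx_apicid() are hypothetical names, and where the APIC ID
value comes from is outside the scope of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1 reports the initial APIC ID in EBX bits 31:24. */
#define INITIAL_APICID_SHIFT 24
#define INITIAL_APICID_MASK  0xff000000u

/* Replace the initial APIC ID field and keep the low 24 bits as they are. */
uint32_t cpuid1_ebx_with_apicid(uint32_t ebx, uint8_t apicid)
{
	return (ebx & ~INITIAL_APICID_MASK) |
	       ((uint32_t)apicid << INITIAL_APICID_SHIFT);
}

/* Extract the initial APIC ID field from a leaf 1 EBX value. */
uint8_t cpuid1_ebx_apicid(uint32_t ebx)
{
	return (uint8_t)(ebx >> INITIAL_APICID_SHIFT);
}
```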
Fixes: 52128a7a21 ("x86/cpu/topology: Make the APIC mismatch warnings complete")
Signed-off-by: Juergen Gross <jgross@suse.com>
Contrary to i915, in xe ADL-N is kept as a different platform, not a
subplatform of ADL-P. Since the display side doesn't need to
differentiate between P and N, i.e. IS_ALDERLAKE_P_N() is never called,
just fixup the compat header to check for both P and N.
Moving ADL-N to be a subplatform would be more complex as the firmware
loading in xe only handles platforms, not subplatforms, as going forward
the direction is to check on IP version rather than
platforms/subplatforms.
Fix warning when initializing display:
xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH
------------[ cut here ]------------
xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !((dev_priv)->info.platform == XE_ALDERLAKE_P))
And wrong paths being taken on the display side.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425181610.2704633-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 6a2a90cba1)
Fixes: 44e694958b ("drm/xe/display: Implement display support")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Pull firewire fixes from Takashi Sakamoto:
"Two driver fixes:
- The firewire-ohci driver for 1394 OHCI hardware does not fill in the
time stamp for the response packet when handling an asynchronous
transaction to a local destination. As a result, the response packet
is not equivalent between local and remote transactions. It is fixed
by filling in the time stamp with the hardware time. The fix should
be applied to Linux kernel v6.5 or later as
well.
- The nosy driver for Texas Instruments TSB12LV21A (PCILynx) has a
long-standing issue with its behaviour when a user space application
passes a smaller buffer than expected. It is fixed by returning
zero according to the convention of UNIX-like systems"
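
The second item boils down to checking the caller's buffer size before
copying. A user-space sketch of that check, using a hypothetical
fetch_packet() helper rather than the driver's real function, could be:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy one packet into the caller's buffer. If the buffer is smaller
 * than the packet, copy nothing and return 0 instead of writing past
 * the end of the buffer.
 */
size_t fetch_packet(const char *pkt, size_t pkt_len,
		    char *user_buf, size_t user_len)
{
	if (pkt_len > user_len)
		return 0;
	memcpy(user_buf, pkt, pkt_len);
	return pkt_len;
}
```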
* tag 'firewire-fixes-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: fulfill timestamp for some local asynchronous transaction
firewire: nosy: ensure user_length is taken into account when fetching packet contents
The topology core expects the boot APIC to be registered from early APIC
detection first and then again when the firmware tables are evaluated. This
is used for detecting the real BSP CPU on a kexec kernel.
The recent conversion of XEN/PV to register fake APIC IDs failed to
register the boot CPU APIC correctly as it only registers it once. This
causes the BSP detection mechanism to trigger wrongly:
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
Additionally this results in one CPU being ignored.
Register the boot CPU APIC twice so that the XEN/PV fake enumeration
behaves like real firmware.
Reported-by: Juergen Gross <jgross@suse.com>
Fixes: e753070234 ("x86/xen/smp_pv: Register fake APICs")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/87a5l8s2fg.ffs@tglx
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull thermal control fixes from Rafael Wysocki:
"Fix a memory leak and a few locking issues (that may cause the kernel
to crash in principle if all goes wrong) in the thermal debug code
introduced during the 6.8 development cycle"
* tag 'thermal-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/debugfs: Prevent use-after-free from occurring after cdev removal
thermal/debugfs: Fix two locking issues with thermal zone debug
thermal/debugfs: Free all thermal zone debug memory on zone removal
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Relatively calm week, likely due to public holiday in most places. No
known outstanding regressions.
Current release - regressions:
- rxrpc: fix wrong alignmask in __page_frag_alloc_align()
- eth: e1000e: change usleep_range to udelay in PHY mdic access
Previous releases - regressions:
- gro: fix udp bad offset in socket lookup
- bpf: fix incorrect runtime stat for arm64
- tipc: fix UAF in error path
- netfs: fix a potential infinite loop in extract_user_to_sg()
- eth: ice: ensure the copied buf is NUL terminated
- eth: qeth: fix kernel panic after setting hsuid
Previous releases - always broken:
- bpf:
- verifier: prevent userspace memory access
- xdp: use flags field to disambiguate broadcast redirect
- bridge: fix multicast-to-unicast with fraglist GSO
- mptcp: ensure snd_nxt is properly initialized on connect
- nsh: fix outer header access in nsh_gso_segment().
- eth: bcmgenet: fix racing registers access
- eth: vxlan: fix stats counters.
Misc:
- a bunch of MAINTAINERS file updates"
* tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
MAINTAINERS: mark MYRICOM MYRI-10G as Orphan
MAINTAINERS: remove Ariel Elior
net: gro: add flush check in udp_gro_receive_segment
net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb
ipv4: Fix uninit-value access in __ip_make_skb()
s390/qeth: Fix kernel panic after setting hsuid
vxlan: Pull inner IP header in vxlan_rcv().
tipc: fix a possible memleak in tipc_buf_append
tipc: fix UAF in error path
rxrpc: Clients must accept conn from any address
net: core: reject skb_copy(_expand) for fraglist GSO skbs
net: bridge: fix multicast-to-unicast with fraglist GSO
mptcp: ensure snd_nxt is properly initialized on connect
e1000e: change usleep_range to udelay in PHY mdic access
net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341
cxgb4: Properly lock TX queue for the selftest.
rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()
vxlan: Add missing VNI filter counter update in arp_reduce().
vxlan: Fix racy device stats updates.
net: qede: use return from qede_parse_actions()
...
Pull NVMe fixes from Keith.
* git://git.infradead.org/nvme:
nvme-tcp: strict pdu pacing to avoid send stalls on TLS
nvmet: fix nvme status code when namespace is disabled
nvmet-tcp: fix possible memory leak when tearing down a controller
nvme: cancel pending I/O if nvme controller is in terminal state
nvmet-auth: replace pr_debug() with pr_err() to report an error.
nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
nvme: find numa distance only if controller has valid numa id
nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
Using restricted DMA pools (CONFIG_DMA_RESTRICTED_POOL=y) in conjunction
with dynamic SWIOTLB (CONFIG_SWIOTLB_DYNAMIC=y) leads to the following
crash when initialising the restricted pools at boot-time:
| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
| Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
| pc : rmem_swiotlb_device_init+0xfc/0x1ec
| lr : rmem_swiotlb_device_init+0xf0/0x1ec
| Call trace:
| rmem_swiotlb_device_init+0xfc/0x1ec
| of_reserved_mem_device_init_by_idx+0x18c/0x238
| of_dma_configure_id+0x31c/0x33c
| platform_dma_configure+0x34/0x80
faddr2line reveals that the crash is in the list validation code:
include/linux/list.h:83
include/linux/rculist.h:79
include/linux/rculist.h:106
kernel/dma/swiotlb.c:306
kernel/dma/swiotlb.c:1695
because add_mem_pool() is trying to list_add_rcu() to a NULL
'mem->pools'.
Fix the crash by initialising the 'mem->pools' list_head in
rmem_swiotlb_device_init() before calling add_mem_pool().
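The failure mode can be shown in miniature outside the kernel. The
following is a user-space sketch, not the swiotlb code: struct list_head
and INIT_LIST_HEAD() are simplified stand-ins for the kernel helpers, and
list_add_checked(), struct io_tlb_mem_sketch and struct pool_sketch are
hypothetical names. It assumes the container was zero-allocated, so an
uninitialised list head has NULL next/prev pointers.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Insert 'entry' right after 'head'. Returns 0 on success, or -1 when
 * the head was never initialised. The kernel has no such return value:
 * it would dereference the NULL pointer instead.
 */
static int list_add_checked(struct list_head *entry, struct list_head *head)
{
	if (!head->next || !head->prev)
		return -1;
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
	return 0;
}

/* Hypothetical container, loosely modelled on a zeroed io_tlb_mem. */
struct io_tlb_mem_sketch {
	struct list_head pools;
};

/* Hypothetical pool that gets linked into the container. */
struct pool_sketch {
	struct list_head node;
};
```

Calling INIT_LIST_HEAD() on the 'pools' member before the first insertion
is what turns the failing case into the working one.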
Reported-by: Nikita Ioffe <ioffe@google.com>
Tested-by: Nikita Ioffe <ioffe@google.com>
Fixes: 1aaa736815 ("swiotlb: allocate a new memory pool when existing pools are full")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Richard Gobert says:
====================
net: gro: add flush/flush_id checks and fix wrong offset in udp
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to add network_offset and inner_network_offset to
napi_gro_cb and use these offsets for socket lookup.
In addition p->flush/flush_id should be checked in all UDP flows. The
same logic from tcp_gro_receive is applied for all flows in
udp_gro_receive_segment. This prevents packets with mismatching network
headers (flush/flush_id turned on) from merging in UDP GRO.
The original series includes a change to vxlan test which adds the local
parameter to prevent similar future bugs. I plan to submit it separately to
net-next.
This series is part of a previously submitted series to net-next:
https://lore.kernel.org/all/20240408141720.98832-1-richardbgobert@gmail.com/
v3 -> v4:
- Store network offsets, and use them only in udp_gro_complete flows
- Correct commit hash used in Fixes tag
- v3:
https://lore.kernel.org/netdev/20240424163045.123528-1-richardbgobert@gmail.com/
v2 -> v3:
- Add network_offsets and fix udp bug in a single commit to make backporting easier
- Write to inner_network_offset in {inet,ipv6}_gro_receive
- Use network_offsets union in tcp[46]_gro_complete as well
- v2:
https://lore.kernel.org/netdev/20240419153542.121087-1-richardbgobert@gmail.com/
v1 -> v2:
- Use network_offsets instead of p_poff param as suggested by Willem
- Check flush before postpull, and for all UDP GRO flows
- v1:
https://lore.kernel.org/netdev/20240412152120.115067-1-richardbgobert@gmail.com/
====================
Link: https://lore.kernel.org/r/20240430143555.126083-1-richardbgobert@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
GRO-GSO path is supposed to be transparent and as such L3 flush checks are
relevant to all UDP flows merging in GRO. This patch uses the same logic
and code from tcp_gro_receive, terminating merge if flush is non zero.
Fixes: e20cf8d3f1 ("udp: implement GRO for plain UDP sockets.")
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp:
additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the
complete phase of gro. The functions always return skb->network_header,
which in the case of encapsulated packets at the gro complete phase, is
always set to the innermost L3 of the packet. That means that calling
{ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in
gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_
L3/L4 may return an unexpected value.
This incorrect usage leads to a bug in GRO's UDP socket lookup.
udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These
*_hdr functions return network_header which will point to the innermost L3,
resulting in the wrong offset being used in __udp{4,6}_lib_lookup with
encapsulated packets.
This patch adds network_offset and inner_network_offset to napi_gro_cb, and
makes sure both are set correctly.
To fix the issue, network_offsets union is used inside napi_gro_cb, in
which both the outer and the inner network offsets are saved.
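The idea of keeping both offsets can be sketched in user space as follows.
This is an illustration under assumptions, not the napi_gro_cb layout:
struct gro_offsets_sketch and the helper names are hypothetical, the
union mentioned above is replaced by two plain fields for simplicity, and
the packet is treated as a flat byte buffer.

```c
#include <stdint.h>

/* Hypothetical per-packet control block holding both L3 offsets. */
struct gro_offsets_sketch {
	uint16_t network_offset;       /* outermost L3 header */
	uint16_t inner_network_offset; /* innermost L3 header */
};

/* Record an L3 header offset as it is seen during the receive phase. */
static void record_l3_offset(struct gro_offsets_sketch *cb, uint16_t off,
			     int encapsulated)
{
	if (encapsulated)
		cb->inner_network_offset = off;
	else
		cb->network_offset = off;
}

/* Outer L3 header: what a socket lookup for the tunnel's UDP port needs. */
static const uint8_t *outer_l3_hdr(const uint8_t *pkt,
				   const struct gro_offsets_sketch *cb)
{
	return pkt + cb->network_offset;
}

/* Inner L3 header: what a single "network header" pointer ends up at. */
static const uint8_t *inner_l3_hdr(const uint8_t *pkt,
				   const struct gro_offsets_sketch *cb)
{
	return pkt + cb->inner_network_offset;
}
```

With only one stored offset, the lookup for the outer UDP socket would read
addresses from the inner header; storing both lets each caller pick the
header it actually means.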
Reproduction example:
Endpoint configuration example (fou + local address bind)
# ip fou add port 6666 ipproto 4
# ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip
# ip link set tun1 up
# ip a add 1.1.1.2/24 dev tun1
Netperf TCP_STREAM result on net-next before patch is applied:
net-next main, GRO enabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 5.28 2.37
net-next main, GRO disabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 5.01 2745.06
patch applied, GRO enabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 5.01 2877.38
Fixes: a6024562ff ("udp: Add GRO functions to UDP socket")
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb()
tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a
race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL
while __ip_make_skb() is running, the function will access icmphdr in the
skb even if it is not included. This causes the issue reported by KMSAN.
Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL
on the socket.
Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These
are union members in struct flowi4 and are implicitly initialized by
flowi4_init_output(), but we should not rely on a specific union layout.
Initialize these explicitly in raw_sendmsg().
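The reason a per-flow flag is safer than the socket state can be sketched
as below. This is a user-space illustration, not the ipv4 code: all names
are hypothetical, FLOW_FLAG_KNOWN_NH_SKETCH stands in for
FLOWI_FLAG_KNOWN_NH, and it is an assumption of the sketch that the flag
is derived from the HDRINCL setting once, when the flow is set up.

```c
#include <stdbool.h>

#define FLOW_FLAG_KNOWN_NH_SKETCH 0x1u /* stand-in for FLOWI_FLAG_KNOWN_NH */

/* Socket state that another thread may change via setsockopt(). */
struct sock_sketch {
	bool hdrincl;
};

/* Per-send flow description, filled in once at the start of the send. */
struct flow_sketch {
	unsigned int flags;
};

/* Take a snapshot of the socket setting into the flow (assumed mapping). */
static void flow_init_sketch(struct flow_sketch *fl,
			     const struct sock_sketch *sk)
{
	fl->flags = sk->hdrincl ? FLOW_FLAG_KNOWN_NH_SKETCH : 0;
}

/*
 * Decide from the snapshot, not from the live socket, whether the
 * packet was built by the kernel and therefore has an ICMP header
 * at the expected place.
 */
static bool may_read_icmphdr(const struct flow_sketch *fl)
{
	return !(fl->flags & FLOW_FLAG_KNOWN_NH_SKETCH);
}
```

Because the decision uses the snapshot, a concurrent change of the socket
option between building the packet and finishing it cannot flip the answer.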
[1]
BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481
__ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481
ip_finish_skb include/net/ip.h:243 [inline]
ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508
raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654
inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x274/0x3c0 net/socket.c:745
__sys_sendto+0x62c/0x7b0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x130/0x200 net/socket.c:2199
do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3804 [inline]
slab_alloc_node mm/slub.c:3845 [inline]
kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888
kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577
__alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1318 [inline]
__ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128
ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365
raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648
inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x274/0x3c0 net/socket.c:745
__sys_sendto+0x62c/0x7b0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x130/0x200 net/socket.c:2199
do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
Fixes: 99e5acae19 ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://lore.kernel.org/r/20240430123945.2057348-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Symptom:
When the hsuid attribute is set for the first time on an IQD Layer3
device while the corresponding network interface is already UP,
the kernel will try to execute a napi function pointer that is NULL.
Example:
---------------------------------------------------------------------------
[ 2057.572696] illegal operation: 0001 ilc:1 [#1] SMP
[ 2057.572702] Modules linked in: af_iucv qeth_l3 zfcp scsi_transport_fc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink ghash_s390 prng xts aes_s390 des_s390
des_generic sha3_512_s390 sha3_256_s390 sha512_s390 vfio_ccw vfio_mdev mdev vfio_iommu_type1 eadm_sch vfio ext4 mbcache jbd2 qeth_l2 bridge stp llc dasd_eckd_mod qeth dasd_mod
qdio ccwgroup pkey zcrypt
[ 2057.572739] CPU: 6 PID: 60182 Comm: stress_client Kdump: loaded Not tainted 4.18.0-541.el8.s390x #1
[ 2057.572742] Hardware name: IBM 3931 A01 704 (LPAR)
[ 2057.572744] Krnl PSW : 0704f00180000000 0000000000000002 (0x2)
[ 2057.572748] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[ 2057.572751] Krnl GPRS: 0000000000000004 0000000000000000 00000000a3b008d8 0000000000000000
[ 2057.572754] 00000000a3b008d8 cb923a29c779abc5 0000000000000000 00000000814cfd80
[ 2057.572756] 000000000000012c 0000000000000000 00000000a3b008d8 00000000a3b008d8
[ 2057.572758] 00000000bab6d500 00000000814cfd80 0000000091317e46 00000000814cfc68
[ 2057.572762] Krnl Code:#0000000000000000: 0000 illegal
>0000000000000002: 0000 illegal
0000000000000004: 0000 illegal
0000000000000006: 0000 illegal
0000000000000008: 0000 illegal
000000000000000a: 0000 illegal
000000000000000c: 0000 illegal
000000000000000e: 0000 illegal
[ 2057.572800] Call Trace:
[ 2057.572801] ([<00000000ec639700>] 0xec639700)
[ 2057.572803] [<00000000913183e2>] net_rx_action+0x2ba/0x398
[ 2057.572809] [<0000000091515f76>] __do_softirq+0x11e/0x3a0
[ 2057.572813] [<0000000090ce160c>] do_softirq_own_stack+0x3c/0x58
[ 2057.572817] ([<0000000090d2cbd6>] do_softirq.part.1+0x56/0x60)
[ 2057.572822] [<0000000090d2cc60>] __local_bh_enable_ip+0x80/0x98
[ 2057.572825] [<0000000091314706>] __dev_queue_xmit+0x2be/0xd70
[ 2057.572827] [<000003ff803dd6d6>] afiucv_hs_send+0x24e/0x300 [af_iucv]
[ 2057.572830] [<000003ff803dd88a>] iucv_send_ctrl+0x102/0x138 [af_iucv]
[ 2057.572833] [<000003ff803de72a>] iucv_sock_connect+0x37a/0x468 [af_iucv]
[ 2057.572835] [<00000000912e7e90>] __sys_connect+0xa0/0xd8
[ 2057.572839] [<00000000912e9580>] sys_socketcall+0x228/0x348
[ 2057.572841] [<0000000091514e1a>] system_call+0x2a6/0x2c8
[ 2057.572843] Last Breaking-Event-Address:
[ 2057.572844] [<0000000091317e44>] __napi_poll+0x4c/0x1d8
[ 2057.572846]
[ 2057.572847] Kernel panic - not syncing: Fatal exception in interrupt
-------------------------------------------------------------------------------------------
Analysis:
There is one napi structure per out_q: card->qdio.out_qs[i].napi
The napi.poll functions are set during qeth_open().
Since
commit 1cfef80d4c ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)")
qeth_set_offline()/qeth_set_online() no longer call dev_close()/
dev_open(). So if qeth_free_qdio_queues() cleared
card->qdio.out_qs[i].napi.poll while the network interface was UP and the
card was offline, they are not set again.
Reproduction:
chzdev -e $devno layer2=0
ip link set dev $network_interface up
echo 0 > /sys/bus/ccwgroup/devices/0.0.$devno/online
echo foo > /sys/bus/ccwgroup/devices/0.0.$devno/hsuid
echo 1 > /sys/bus/ccwgroup/devices/0.0.$devno/online
-> Crash (can be enforced e.g. by af_iucv connect(), ip link down/up, ...)
Note that a Completion Queue (CQ) is only enabled or disabled, when hsuid
is set for the first time or when it is removed.
Workarounds:
- Set hsuid before setting the device online for the first time
or
- Use chzdev -d $devno; chzdev $devno hsuid=xxx; chzdev -e $devno;
to set hsuid on an existing device. (this will remove and recreate the
network interface)
Fix:
There is no need to free the output queues when a completion queue is
added or removed.
card->qdio.state now indicates whether the inbound buffer pool and the
outbound queues are allocated.
card->qdio.c_q indicates whether a CQ is allocated.
Fixes: 1cfef80d4c ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240430091004.2265683-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The alc_spec.power_hook is defined only with CONFIG_PM, and the recent
fix overlooked it, resulting in a build error without CONFIG_PM.
Fix it with the simple ifdef and set __maybe_unused for the function.
We may drop the whole CONFIG_PM dependency there, but it should be
done in a separate cleanup patch later.
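The pattern described here looks roughly like the sketch below. It is not
the realtek codec code: struct codec_spec_sketch and both functions are
hypothetical, and maybe_unused_sketch is a local stand-in for the kernel's
__maybe_unused that assumes a GCC or Clang compatible compiler.

```c
/* Local stand-in for the kernel's __maybe_unused annotation. */
#define maybe_unused_sketch __attribute__((__unused__))

/* Hypothetical codec state; the hook member only exists with CONFIG_PM. */
struct codec_spec_sketch {
	int gpio3_default;
#ifdef CONFIG_PM
	void (*power_hook)(struct codec_spec_sketch *spec);
#endif
};

/*
 * Without CONFIG_PM nothing references this function, so it is
 * annotated to keep the build free of unused-function warnings.
 */
static void maybe_unused_sketch sketch_power_hook(struct codec_spec_sketch *spec)
{
	spec->gpio3_default = 1;
}

/* Only touch the member when the configuration provides it. */
static void sketch_fixup(struct codec_spec_sketch *spec)
{
#ifdef CONFIG_PM
	spec->power_hook = sketch_power_hook;
#else
	(void)spec;
#endif
}
```

The sketch builds both with and without -DCONFIG_PM, which is the property
the fix restores.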
Fixes: 1e707769df ("ALSA: hda/realtek - Set GPIO3 to default at S4 state for Thinkpad with ALC1318")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405012104.Dr7h318W-lkp@intel.com/
Message-ID: <20240502062442.30545-1-tiwai@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Ensure the inner IP header is part of skb's linear data before reading
its ECN bits. Otherwise we might read garbage.
One symptom is the system erroneously logging errors like
"vxlan: non-ECT from xxx.xxx.xxx.xxx with TOS=xxxx".
Similar bugs have been fixed in geneve, ip_tunnel and ip6_tunnel (see
commit 1ca1ba465e ("geneve: make sure to pull inner header in
geneve_rx()") for example). So let's reuse the same code structure for
consistency. Maybe we can add a common helper in the future.
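The check-before-read rule can be sketched in user space as follows. This
is not the vxlan code and does not use skb helpers: struct buf_sketch and
inner_ipv4_ecn() are hypothetical, and the sketch only handles an inner
IPv4 header. It relies on the ECN field being the low two bits of the
second byte of the IPv4 header.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20
#define ECN_MASK 0x03

/* Hypothetical buffer: only the first 'linear_len' bytes may be read. */
struct buf_sketch {
	const uint8_t *data;
	size_t linear_len;
};

/*
 * Return the ECN bits (0..3) of the inner IPv4 header at 'inner_off',
 * or -1 if the header is not fully inside the readable area. The kernel
 * would try to pull the missing bytes in before giving up; the sketch
 * simply refuses to read.
 */
static int inner_ipv4_ecn(const struct buf_sketch *b, size_t inner_off)
{
	if (inner_off > b->linear_len ||
	    b->linear_len - inner_off < IPV4_MIN_HDR_LEN)
		return -1;
	return b->data[inner_off + 1] & ECN_MASK;
}
```

Reading the byte without the length check is what produces the garbage
values, and with them the spurious "non-ECT" log lines.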
Fixes: d342894c5d ("vxlan: virtual extensible lan")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/1239c8db54efec341dd6455c77e0380f58923a3c.1714495737.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The find connection logic of Transarc's Rx was modified in the mid-1990s
to support multi-homed servers which might send a response packet from
an address other than the destination address in the received packet.
The rules for accepting a packet by an Rx initiator (RX_CLIENT_CONNECTION)
were altered to permit acceptance of a packet from any address provided
that the port number was unchanged and all of the connection identifiers
matched (Epoch, CID, SecurityClass, ...).
This change applies the same rules to the Linux implementation which makes
it consistent with IBM AFS 3.6, Arla, OpenAFS and AuriStorFS.
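The acceptance rule can be written down as a small predicate. This is a
sketch with hypothetical types, not the rxrpc lookup code; in particular,
requiring an exact address match for non-client connections is an
assumption of the sketch and is not taken from this commit message.

```c
#include <stdbool.h>
#include <stdint.h>

struct peer_sketch {
	uint32_t addr;
	uint16_t port;
};

struct conn_sketch {
	bool is_client; /* corresponds to RX_CLIENT_CONNECTION */
	struct peer_sketch peer;
	uint32_t epoch;
	uint32_t cid;
	uint8_t security_class;
};

struct pkt_sketch {
	struct peer_sketch src;
	uint32_t epoch;
	uint32_t cid;
	uint8_t security_class;
};

static bool conn_accepts(const struct conn_sketch *c,
			 const struct pkt_sketch *p)
{
	/* All connection identifiers must match. */
	if (c->epoch != p->epoch || c->cid != p->cid ||
	    c->security_class != p->security_class)
		return false;
	/* The port number must be unchanged. */
	if (c->peer.port != p->src.port)
		return false;
	/* A client connection accepts the reply from any address. */
	if (c->is_client)
		return true;
	/* Assumption: other connections still require the same address. */
	return c->peer.addr == p->src.addr;
}
```

A multi-homed server that replies from a second interface changes only the
source address, so the client-side check above still matches the packet.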
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: Jeffrey Altman <jaltman@auristor.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://lore.kernel.org/r/20240419163057.4141728-1-marc.dionne@auristor.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ASoC: Fixes for v6.9
This is much larger than is ideal, partly due to your holiday but also
due to several vendors having come in with relatively large fixes at
similar times. It's all driver specific stuff.
The meson fixes from Jerome fix some rare timing issues with blocking
operations happening in triggers, plus the continuous clock support
which fixes clocking for some platforms. The SOF series from Peter
builds to the fix to avoid spurious resets of ChainDMA which triggered
errors in cleanup paths with both PulseAudio and PipeWire, and there's
also some simple new debugfs files from Pierre which make support a lot
easier.
Pull regulator fixes from Mark Brown:
"There's a few simple driver specific fixes here, plus some core
cleanups from Matti which fix issues found with client drivers due to
the API being confusing.
The two fixes for the stubs provide more constructive behaviour with
!REGULATOR configurations, issues were noticed with some hwmon drivers
which would otherwise have needed confusing bodges in the users.
The irq_helpers fix to duplicate the provided name for the interrupt
controller was found because a driver got this wrong and it's again a
case where the core is the sensible place to put the fix"
* tag 'regulator-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: change devm_regulator_get_enable_optional() stub to return Ok
regulator: change stubbed devm_regulator_get_enable to return Ok
regulator: vqmmc-ipq4019: fix module autoloading
regulator: qcom-refgen: fix module autoloading
regulator: mt6360: De-capitalize devicetree regulator subnodes
regulator: irq_helpers: duplicate IRQ name
Panel replay was enabled by default in commit 5950efe25e
("drm/amd/display: Enable Panel Replay for static screen use case"), but
it isn't working properly at least on some BOE and AUO panels. Instead
of being static, the screen is solid black when active. As it's a new
feature that was just introduced and that regressed VRR, disable it for
now so that the problem can be properly root caused.
Cc: Tom Chung <chiahsuan.chung@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3344
Fixes: 5950efe25e ("drm/amd/display: Enable Panel Replay for static screen use case")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SKB_GSO_FRAGLIST skbs must not be linearized, otherwise they become
invalid. Return NULL if such an skb is passed to skb_copy or
skb_copy_expand, in order to prevent a crash on a potential later
call to skb_gso_segment.
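The guard amounts to an early return before any copying happens. The
sketch below models it in user space with hypothetical types: struct
skb_sketch is a flat buffer with a type flag, GSO_FRAGLIST_SKETCH stands in
for SKB_GSO_FRAGLIST, and skb_copy_sketch() is not the kernel function.

```c
#include <stdlib.h>
#include <string.h>

#define GSO_FRAGLIST_SKETCH 0x1u /* stand-in for SKB_GSO_FRAGLIST */

/* Hypothetical, flat model of a packet buffer. */
struct skb_sketch {
	unsigned int gso_type;
	size_t len;
	unsigned char *data;
};

/*
 * Return a linear copy of 'skb', or NULL if the copy would be invalid
 * (fraglist GSO) or memory is exhausted. Callers already have to
 * handle NULL for the allocation failure case.
 */
static struct skb_sketch *skb_copy_sketch(const struct skb_sketch *skb)
{
	struct skb_sketch *copy;

	if (skb->gso_type & GSO_FRAGLIST_SKETCH)
		return NULL;

	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;
	copy->data = malloc(skb->len ? skb->len : 1);
	if (!copy->data) {
		free(copy);
		return NULL;
	}
	memcpy(copy->data, skb->data, skb->len);
	copy->len = skb->len;
	copy->gso_type = skb->gso_type;
	return copy;
}
```

Reusing the existing NULL return keeps the change local: no caller needs a
new error path.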
Fixes: 3a1296a38d ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling skb_copy on a SKB_GSO_FRAGLIST skb is not valid, since it returns
an invalid linearized skb. This code only needs to change the ethernet
header, so pskb_copy is the right function to call here.
Fixes: 6db6f0eae6 ("bridge: multicast to unicast")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS requires a strict pdu pacing via MSG_EOR to signal the end
of a record and subsequent encryption. If we do not set MSG_EOR
at the end of a sequence the record won't be closed, encryption
doesn't start, and we end up with a send stall as the message
will never be passed on to the TCP layer.
So do not check for the queue status when TLS is enabled but
rather make the MSG_MORE setting dependent on the current
request only.
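The resulting flag choice can be sketched as a pure function. This is an
illustration, not the nvme-tcp send path: the function name, its three
boolean inputs and the SKETCH_MSG_* constants are hypothetical stand-ins,
and other inputs the driver may consider are left out.

```c
#include <stdbool.h>

/* Stand-ins for the socket flags MSG_EOR and MSG_MORE. */
#define SKETCH_MSG_EOR  0x1
#define SKETCH_MSG_MORE 0x2

/*
 * Pick the send flag for one chunk of a request.
 *  - tls:               the connection runs over TLS
 *  - last_in_request:   this chunk ends the current request
 *  - queue_has_pending: further requests are already queued
 */
static int send_flags_sketch(bool tls, bool last_in_request,
			     bool queue_has_pending)
{
	/* With TLS the queue state is ignored, so each request ends a record. */
	bool more_queued = !tls && queue_has_pending;

	if (last_in_request && !more_queued)
		return SKETCH_MSG_EOR;
	return SKETCH_MSG_MORE;
}
```

The case that changes is TLS with further requests queued: it now closes
the record at the end of the request instead of asking for more data.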
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
If the user disabled a nvmet namespace, it is removed from the subsystem
namespaces list. When nvmet processes a command directed to an nsid that
was disabled, it cannot differentiate between a nsid that is disabled
vs. a non-existent namespace, and resorts to return NVME_SC_INVALID_NS
with the dnr bit set.
This translates to a non-retryable status for the host, which translates
to a user error. We should expect disabled namespaces to not cause an
I/O error in a multipath environment.
Address this by searching a configfs item for the namespace nvmet failed
to find, and if we found one, conclude that the namespace is disabled
(perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR in this case
and keep DNR bit cleared.
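The decision described above can be sketched as follows. The function and
its two boolean inputs are hypothetical, and the numeric values of the
SKETCH_* constants are illustrative placeholders rather than values taken
from the NVMe headers.

```c
#include <stdbool.h>

/* Placeholder status values; only their distinctness matters here. */
#define SKETCH_SC_SUCCESS             0x0000u
#define SKETCH_SC_INVALID_NS          0x000bu
#define SKETCH_SC_INTERNAL_PATH_ERROR 0x0300u
#define SKETCH_SC_DNR                 0x4000u /* "do not retry" bit */

/*
 * Status for a command addressed to some nsid.
 *  - in_subsys_list:    the namespace is enabled and was found
 *  - has_configfs_item: a configfs item for the nsid exists
 */
static unsigned int ns_lookup_status_sketch(bool in_subsys_list,
					    bool has_configfs_item)
{
	if (in_subsys_list)
		return SKETCH_SC_SUCCESS;
	if (has_configfs_item)
		/* Disabled, perhaps temporarily: retryable path error. */
		return SKETCH_SC_INTERNAL_PATH_ERROR;
	/* No such namespace at all: final, non-retryable error. */
	return SKETCH_SC_INVALID_NS | SKETCH_SC_DNR;
}
```

A multipath host can then retry the I/O on another path instead of
reporting an error to the user.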
Reported-by: Jirong Feng <jirong.feng@easystack.cn>
Tested-by: Jirong Feng <jirong.feng@easystack.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
When we teardown the controller, we wait for pending I/Os to complete
(sq->ref on all queues to drop to zero) and then we go over the commands,
and free their command buffers in case they are still fetching data from
the host (e.g. processing nvme writes) and have yet to take a reference
on the sq.
However, we may miss the case where commands have failed before executing
and are queued for sending a response, but the send will never occur
because the queue socket is already down. In this case we may miss
deallocating command buffers.
Solve this by freeing all command buffers, as nvmet_tcp_free_cmd_buffers()
is idempotent anyway.
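Why an idempotent free makes the unconditional loop safe can be sketched
like this. The types and function names are hypothetical and the buffers
are plain heap allocations, not the structures used by nvmet-tcp.

```c
#include <stdlib.h>

/* Hypothetical command with two separately allocated buffers. */
struct cmd_sketch {
	void *iov;
	void *data;
};

/*
 * Idempotent: free(NULL) is a no-op, and the pointers are cleared, so
 * calling this a second time on the same command does nothing.
 */
static void free_cmd_buffers_sketch(struct cmd_sketch *cmd)
{
	free(cmd->iov);
	free(cmd->data);
	cmd->iov = NULL;
	cmd->data = NULL;
}

/* Teardown: no need to work out which commands still hold buffers. */
static void free_all_cmd_buffers_sketch(struct cmd_sketch *cmds, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		free_cmd_buffers_sketch(&cmds[i]);
}
```

Commands whose buffers were already released are simply skipped over, so
the loop does not have to track per-command state.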
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
If a PCI bus error occurs while I/O is running, the
in-flight I/O cannot complete. Worse, if the user
(logically) hot-unplugs the nvme disk at this point, the
nvme_remove() code path can't make forward progress until
the in-flight I/O is cancelled. This sequence of events
may therefore hang the hot-unplug code path indefinitely.
This patch cancels the pending/in-flight I/O from the
nvme request timeout handler when the nvme controller
is in a terminal (DEAD/DELETING/DELETING_NOIO) state,
which lets the nvme_remove() code path make forward
progress and finish successfully.
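The state test at the heart of this change can be sketched as below. The
enum values and function names are hypothetical; only the set of terminal
states (DEAD, DELETING, DELETING_NOIO) comes from the text above, and what
the handler does in the non-terminal case is deliberately left abstract.

```c
#include <stdbool.h>

/* Hypothetical subset of controller states. */
enum ctrl_state_sketch {
	CTRL_LIVE,
	CTRL_RESETTING,
	CTRL_DELETING,
	CTRL_DELETING_NOIO,
	CTRL_DEAD
};

enum timeout_action_sketch {
	TIMEOUT_NORMAL_HANDLING, /* whatever the handler did before */
	TIMEOUT_CANCEL_IO        /* complete the request as cancelled */
};

static bool state_is_terminal_sketch(enum ctrl_state_sketch s)
{
	switch (s) {
	case CTRL_DELETING:
	case CTRL_DELETING_NOIO:
	case CTRL_DEAD:
		return true;
	default:
		return false;
	}
}

/* In a terminal state the I/O can never complete, so cancel it. */
static enum timeout_action_sketch on_timeout_sketch(enum ctrl_state_sketch s)
{
	return state_is_terminal_sketch(s) ? TIMEOUT_CANCEL_IO
					   : TIMEOUT_NORMAL_HANDLING;
}
```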
Link: https://lore.kernel.org/all/199be893-5dfa-41e5-b6f2-40ac90ebccc4@linux.ibm.com/
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
In nvmet_auth_host_hash(), if a mismatch is detected in the hash length
the kernel should print an error.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
If the nvmet_auth_host_hash() function fails, the error code should
be returned to its callers.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
On a system where native nvme multipath is configured and the iopolicy
is set to numa, but the nvme controller numa node id is undefined
or -1 (NUMA_NO_NODE), avoid calculating the node distance when
finding the optimal io path. Otherwise we may access the numa distance
table with an invalid index, which may refer to incorrect
memory. This patch ensures that if the nvme controller numa node
id is -1, then instead of calculating the node distance for finding the
optimal io path, we set the numa node distance of such a controller to
the default of 10 (LOCAL_DISTANCE).
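The guard can be sketched as follows. This is a user-space illustration:
the two-node distance table and path_distance_sketch() are hypothetical,
while NUMA_NO_NODE (-1) and LOCAL_DISTANCE (10) use the values named above.

```c
#define NUMA_NO_NODE   (-1)
#define LOCAL_DISTANCE 10
#define NR_NODES_SKETCH 2

/* Hypothetical distance table for a two-node machine. */
static const int distance_table_sketch[NR_NODES_SKETCH][NR_NODES_SKETCH] = {
	{ 10, 20 },
	{ 20, 10 },
};

/*
 * Distance from the submitting node to the controller's node. An
 * undefined controller node must not be used as a table index, so it
 * falls back to the default local distance.
 */
static int path_distance_sketch(int local_node, int ctrl_node)
{
	if (ctrl_node == NUMA_NO_NODE)
		return LOCAL_DISTANCE;
	return distance_table_sketch[local_node][ctrl_node];
}
```

Without the check, the table would be indexed with -1, which is the
out-of-bounds read the commit message warns about.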
Link: https://lore.kernel.org/all/20240413090614.678353-1-nilay@linux.ibm.com/
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
With commit ed6776c96c ("s390/crypto: remove retry
loop with sleep from PAES pkey invocation") the retry
loop to retry derivation of a protected key from a
secure key has been removed. This was based on the
assumption that these retries are not needed any
more as proper retries are done in the zcrypt layer.
However, tests have revealed that in some cases, when
a master key change in the HSM is immediately
(< 1 second) followed by an attempt to derive a
protected key from a secure key with exactly this HSM,
the derivation may fail.
The low level functions in zcrypt_ccamisc.c and
zcrypt_ep11misc.c detect this temporary failure
and report it to the caller as -EBUSY. The
re-established retry loop in the paes implementation
catches exactly this -EBUSY and may eventually run
some retries.
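A retry loop of this kind can be sketched in user space as below. It is
not the paes code: the function names, the callback type and the fake
device are hypothetical, the number of attempts is a parameter rather than
a value taken from the patch, and the wait between attempts is only
indicated by a comment.

```c
#include <errno.h>

/* Hypothetical callback that tries to derive a protected key. */
typedef int (*derive_fn_sketch)(void *ctx);

/*
 * Call 'derive' up to 'max_tries' times. Only -EBUSY is treated as
 * temporary; success and every other error end the loop at once.
 */
static int derive_with_retry_sketch(derive_fn_sketch derive, void *ctx,
				    int max_tries)
{
	int rc = -EINVAL; /* returned if max_tries is not positive */
	int i;

	for (i = 0; i < max_tries; i++) {
		rc = derive(ctx);
		if (rc != -EBUSY)
			break;
		/* A real caller would wait here before the next attempt. */
	}
	return rc;
}

/* Fake device: reports -EBUSY until the int behind 'ctx' reaches zero. */
static int fake_busy_device_sketch(void *ctx)
{
	int *busy_left = ctx;

	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;
	}
	return 0;
}
```

Restricting the retry to -EBUSY is the design point: other errors are
reported immediately instead of being hidden behind repeated attempts.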
Fixes: ed6776c96c ("s390/crypto: remove retry loop with sleep from PAES pkey invocation")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Use -EBUSY instead of -EAGAIN in zcrypt_ccamisc.c
in cases where the CCA card returns 8/2290 to indicate
a temporary unavailability of this function.
Fixes: ed6776c96c ("s390/crypto: remove retry loop with sleep from PAES pkey invocation")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
An EP11 reply cprb contains a field ret_code which may
hold an error code different than the error code stored
in the payload of the cprb. As of now all the EP11 misc
functions do not evaluate this field but focus on the
error code in the payload.
Before checking the payload error, first the cprb error
field should be evaluated which is introduced with this
patch.
If the return code value 0x000c0003 is seen, this
indicates a busy situation which is reflected by
-EBUSY in the zcrypt_ep11misc.c low level function.
A higher level caller should consider retrying after
waiting a dedicated duration (say 1 second).
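The order of the two checks can be sketched as follows. The struct and
function names are hypothetical and the reply is reduced to two integers;
only the value 0x000c0003 and its mapping to -EBUSY come from the text
above. Mapping every other non-zero code to -EIO is an assumption of the
sketch.

```c
#include <errno.h>
#include <stdint.h>

#define EP11_RET_BUSY_SKETCH 0x000c0003u

/* Hypothetical, reduced view of an EP11 reply. */
struct ep11_reply_sketch {
	uint32_t ret_code;   /* error field of the cprb itself */
	uint32_t payload_rc; /* error code carried in the payload */
};

static int ep11_check_reply_sketch(const struct ep11_reply_sketch *r)
{
	/* First the cprb error field ... */
	if (r->ret_code == EP11_RET_BUSY_SKETCH)
		return -EBUSY;
	if (r->ret_code)
		return -EIO; /* assumption: generic error for other codes */
	/* ... and only then the error code in the payload. */
	if (r->payload_rc)
		return -EIO;
	return 0;
}
```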
Fixes: ed6776c96c ("s390/crypto: remove retry loop with sleep from PAES pkey invocation")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:
https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114 ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
If spi_sync() is called with a non-empty queue and the same spi_message
is then reused, the complete callback for the message remains set while
the context is cleared, leading to a null pointer dereference when the
callback is invoked from spi_finalize_current_message().
With function inlining disabled, the call stack might look like this:
_raw_spin_lock_irqsave from complete_with_flags+0x18/0x58
complete_with_flags from spi_complete+0x8/0xc
spi_complete from spi_finalize_current_message+0xec/0x184
spi_finalize_current_message from spi_transfer_one_message+0x2a8/0x474
spi_transfer_one_message from __spi_pump_transfer_message+0x104/0x230
__spi_pump_transfer_message from __spi_transfer_message_noqueue+0x30/0xc4
__spi_transfer_message_noqueue from __spi_sync+0x204/0x248
__spi_sync from spi_sync+0x24/0x3c
spi_sync from mcp251xfd_regmap_crc_read+0x124/0x28c [mcp251xfd]
mcp251xfd_regmap_crc_read [mcp251xfd] from _regmap_raw_read+0xf8/0x154
_regmap_raw_read from _regmap_bus_read+0x44/0x70
_regmap_bus_read from _regmap_read+0x60/0xd8
_regmap_read from regmap_read+0x3c/0x5c
regmap_read from mcp251xfd_alloc_can_err_skb+0x1c/0x54 [mcp251xfd]
mcp251xfd_alloc_can_err_skb [mcp251xfd] from mcp251xfd_irq+0x194/0xe70 [mcp251xfd]
mcp251xfd_irq [mcp251xfd] from irq_thread_fn+0x1c/0x78
irq_thread_fn from irq_thread+0x118/0x1f4
irq_thread from kthread+0xd8/0xf4
kthread from ret_from_fork+0x14/0x28
Fix this by also setting message->complete to NULL when the transfer is
complete.
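The failure mode can be modelled in a few lines of userspace C. This is a simplified sketch, not the SPI core: struct msg, queued_sync(), direct_sync() and the stale_calls counter are hypothetical stand-ins, and the callback counts a NULL context instead of dereferencing it the way the kernel path would.

```c
#include <assert.h>
#include <stddef.h>

struct msg {
	void (*complete)(void *context);
	void *context;
};

static int stale_calls; /* callbacks invoked with a cleared context */

static void on_complete(void *context)
{
	if (!context) {
		stale_calls++; /* the kernel would dereference NULL here */
		return;
	}
	*(int *)context = 1;
}

/* Finalization invokes the callback whenever one is set. */
static void finalize(struct msg *m)
{
	assert(m != NULL);
	if (m->complete)
		m->complete(m->context);
}

/* Queued path: installs callback and context, then tears down afterwards. */
static void queued_sync(struct msg *m, int clear_callback)
{
	int done = 0;

	m->complete = on_complete;
	m->context = &done;
	finalize(m);
	m->context = NULL;
	if (clear_callback)
		m->complete = NULL; /* the fix described above */
}

/* Direct (no queue) path: reuses the message without setting a callback. */
static void direct_sync(struct msg *m)
{
	finalize(m);
}
```

Reusing a message after queued_sync(&m, 0) makes direct_sync() invoke the stale callback with a NULL context; with clear_callback set, the reused message has no callback left to call.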
Fixes: ae7d2346dc ("spi: Don't use the message queue if possible in spi_sync")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Link: https://lore.kernel.org/r/20240430182705.13019-1-mans@mansr.com
Signed-off-by: Mark Brown <broonie@kernel.org>
There is a race condition when re-creating a kfd_process for a process.
This has been observed when a process under the debugger executes
exec(3). In this scenario:
- The process executes exec.
- This will eventually release the process's mm, which will cause the
kfd_process object associated with the process to be freed
(kfd_process_free_notifier decrements the reference count to the
kfd_process to 0). This causes kfd_process_ref_release to enqueue
kfd_process_wq_release to the kfd_process_wq.
- The debugger receives the PTRACE_EVENT_EXEC notification, and tries to
re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE).
- When handling this request, KFD tries to re-create a kfd_process.
This eventually calls kfd_create_process and kobject_init_and_add.
At this point the call to kobject_init_and_add can fail because the
old kfd_process.kobj has not been freed yet by kfd_process_wq_release.
This patch proposes to avoid this race by making sure to drain
kfd_process_wq before creating a new kfd_process object. This way, we
know that any cleanup task is done executing when we reach
kobject_init_and_add.
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why && How]
Screen flickering was seen on a 4K@60 eDP with a high refresh rate external
monitor when booting up in DC mode. DC Mode Capping is disabled,
which causes the wrong UCLK to be used.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Leo Ma <hanghong.ma@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move
on same heap. The basic problem here is that after the move the old
location is simply not available any more.
Some fixes were suggested, but essentially we should call the move
notification before actually moving things because only this way we have
the correct order for DMA-buf and VM move notifications as well.
Also rework the statistic handling so that we don't update the eviction
counter before the move.
v2: add missing NULL check
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 94aeb41173 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
[Why]
During DP tunnel creation, the CM preallocates BW and reduces the
estimated BW of other DPIAs. The CM releases the preallocation only
when allocation is complete. Display mode validation logic
validates timings based on the BW available per host router.
In a multi-display setup, this causes BW allocation failure
when the allocation is greater than the estimated BW.
[How]
Do a zero alloc to make the CM release the preallocation and
update the estimated BW correctly for all DPIAs per host router.
Reviewed-by: PeiChen Huang <peichen.huang@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why] DSC debugfs, such as dp_dsc_clock_en_read,
use aconnector->dc_link to find pipe_ctx for display.
Displays connected to MST hub share the same dc_link.
DSC instance is from pipe_ctx. This causes incorrect
DSC instance for display connected to MST hub.
[How] Add aconnector->sink check to find pipe_ctx.
CC: stable@vger.kernel.org
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
New request from KMD/VBIOS in order to support new UMA carveout
model. This fixes a null dereference from accessing
Ctx->dc_bios->integrated_info while it was NULL.
DAL parses through the BIOS and extracts the necessary
integrated_info but was missing a case for the new BIOS
version 2.3.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Gabe Teeger <gabe.teeger@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
Currently DCN315 clk manager is missing code to enable/disable dtbclk.
Because of this, "optimized_required" flag is constantly set
and this prevents FreeSync from engaging for certain high bandwidth
display Modes which require DTBCLK.
Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Swapnil Patel <swapnil.patel@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The selftest for the driver sends a dummy packet and checks if the
packet will be received properly as it should be. The regular TX path
and the selftest can use the same network queue so locking is required
and was missing in the selftest path. This was addressed in the commit
cited below.
Unfortunately locking the TX queue requires BH to be disabled which is
not the case in selftest path which is invoked in process context.
Lockdep should be complaining about this.
Use __netif_tx_lock_bh() for TX queue locking.
Fixes: c650e04898 ("cxgb4: Fix race between loopback and normal Tx path")
Reported-by: "John B. Wyatt IV" <jwyatt@redhat.com>
Closes: https://lore.kernel.org/all/Zic0ot5aGgR-V4Ks@thinkpad2021/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20240429091147.YWAaal4v@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rxrpc_alloc_data_txbuf() may be called with data_align being
zero in none_alloc_txbuf() and rxkad_alloc_txbuf(). data_align
is supposed to be an order-based alignment value, but zero is
not a valid order-based alignment value, and '~(data_align - 1)'
doesn't result in a valid mask-based alignment value for
__page_frag_alloc_align().
Fix it by passing a valid order-based alignment value in
none_alloc_txbuf() and rxkad_alloc_txbuf().
Also use page_frag_alloc_align() expecting an order-based
alignment value in rxrpc_alloc_data_txbuf() to avoid doing the
alignment converting operation and to catch possible invalid
alignment value in the future. Remove the 'if (data_align)'
checking too, as it is always true for a valid order-based
alignment value.
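The arithmetic behind the zero-alignment problem can be checked with a small standalone sketch. The helper names below (is_valid_align(), align_to_mask(), checked_align_to_mask(), round_down_to_mask()) are hypothetical and only illustrate the '~(data_align - 1)' conversion quoted above; they are not the page_frag API.

```c
#include <assert.h>
#include <stdint.h>

/* A byte alignment is valid if it is a non-zero power of two. */
static int is_valid_align(uint32_t align)
{
	return align != 0 && (align & (align - 1u)) == 0;
}

/* The conversion quoted in the text: alignment value to mask. */
static uint32_t align_to_mask(uint32_t align)
{
	return ~(align - 1u);
}

/* Same conversion, but rejecting invalid input up front. */
static uint32_t checked_align_to_mask(uint32_t align)
{
	assert(is_valid_align(align));
	return align_to_mask(align);
}

/* Applying a mask rounds an offset down to the alignment boundary. */
static uint32_t round_down_to_mask(uint32_t offset, uint32_t mask)
{
	return offset & mask;
}
```

With align = 8 the mask is 0xfffffff8 and offset 100 rounds down to 96. With align = 0 the subtraction wraps, the mask becomes 0, and every offset is masked to 0, which is why zero cannot be treated as "no alignment"; an alignment of 1 (mask 0xffffffff) expresses that instead.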
Fixes: 6b2536462f ("rxrpc: Fix use of changed alignment param to page_frag_alloc_align()")
Fixes: 49489bb03a ("rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags")
CC: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20240428111640.27306-1-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Subtract the VRAM pinned memory when checking for available memory
in amdgpu_amdkfd_reserve_mem_limit function since that memory is not
available for use.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ublk_drv currently creates block devices with the default max_segments
and max_segment_size limits of BLK_MAX_SEGMENTS (128) and
BLK_MAX_SEGMENT_SIZE (65536) respectively. These defaults can
artificially constrain the I/O size seen by the ublk server - for
example, suppose that the ublk server has configured itself to accept
I/Os up to 1M and the application is also issuing 1M sized I/Os. If the
I/O buffer used by the application is backed by 4K pages, the buffer
could consist of up to 1M / 4K = 256 physically discontiguous segments
(even if the buffer is virtually contiguous). As such, the I/O could
exceed the default max_segments limit and get split. This can cause
unnecessary performance issues if the ublk server is optimized to handle
1M I/Os. The block layer's segment count/size limits exist to model
hardware constraints which don't exist in ublk_drv's case, so just
remove those limits for the block devices created by ublk_drv.
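The segment arithmetic in the example above can be written out as a short sketch. The function names are hypothetical; the 1M I/O size, 4K page size and the limit of 128 segments are the figures quoted in the text.

```c
#include <assert.h>

/*
 * Worst case: every page of the buffer is physically discontiguous,
 * so each page becomes its own segment.
 */
static unsigned long worst_case_segments(unsigned long io_bytes,
					 unsigned long page_bytes)
{
	assert(page_bytes != 0);
	return (io_bytes + page_bytes - 1) / page_bytes;
}

/* Non-zero if such an I/O could exceed the segment limit and get split. */
static int exceeds_segment_limit(unsigned long io_bytes,
				 unsigned long page_bytes,
				 unsigned long max_segments)
{
	return worst_case_segments(io_bytes, page_bytes) > max_segments;
}
```

For a 1M I/O on 4K pages the worst case is 256 segments, twice the default limit of 128, so the I/O may be split; a 512K I/O tops out at exactly 128 segments and still fits.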
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Riley Thomasson <riley@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240430211623.2802036-1-ushankar@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull kvm fix from Paolo Bonzini:
"A pretty straightforward fix for a NULL pointer dereference, plus the
accompanying reproducer"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Add test for uaccesses to non-existent vgic-v2 CPUIF
KVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr()
typec_register_partner() does not guarantee partner registration
to always succeed. In the event of failure, port->partner is set
to the error value or NULL. Given that port->partner validity is
not checked, this results in the following crash:
Unable to handle kernel NULL pointer dereference at virtual address xx
pc : run_state_machine+0x1bc8/0x1c08
lr : run_state_machine+0x1b90/0x1c08
..
Call trace:
run_state_machine+0x1bc8/0x1c08
tcpm_state_machine_work+0x94/0xe4
kthread_worker_fn+0x118/0x328
kthread+0x1d0/0x23c
ret_from_fork+0x10/0x20
To prevent the crash, check for port->partner validity before
dereferencing it in all the call sites.
Cc: stable@vger.kernel.org
Fixes: c97cd0b4b5 ("usb: typec: tcpm: set initial svdm version based on pd revision")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240427202812.3435268-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before sending Enter Mode for an Alt Mode, there is a gap between Discover
Modes and the Alt Mode driver queueing the Enter Mode VDM for the port
partner to send a message to the port.
If this message results in unregistering Alt Modes such as in a DR_SWAP,
then the following deadlock can occur with respect to the DisplayPort Alt
Mode driver:
1. The DR_SWAP state holds port->lock. Unregistering the Alt Mode driver
results in a cancel_work_sync() that waits for the current dp_altmode_work
to finish.
2. dp_altmode_work makes a call to tcpm_altmode_enter. The deadlock occurs
because tcpm_queue_vdm_unlock attempts to hold port->lock.
Before attempting to grab the lock, ensure that the port is in a state
vdm_run_state_machine can run in. Alt Mode unregistration will not occur
in these states.
Fixes: 03eafcfb60 ("usb: typec: tcpm: Add tcpm_queue_vdm_unlocked() helper")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240423202356.3372314-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a Fast Role Swap control message attempt results in a transition
to ERROR_RECOVERY, the TCPC can still queue a TCPM_SOURCING_VBUS event.
If the event is queued but processed after the tcpm_reset_port() call
in the PORT_RESET state, then the following occurs:
1. tcpm_reset_port() calls tcpm_init_vbus() to reset the vbus sourcing and
sinking state
2. tcpm_pd_event_handler() turns VBUS on before the port is in the default
state.
3. The port resolves as a sink. In the SNK_DISCOVERY state,
tcpm_set_charge() cannot set vbus to charge.
Clear pd events within PORT_RESET to get rid of non-applicable events.
Fixes: b17dd57118 ("staging: typec: tcpm: Improve role swap with non PD capable partners")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240423202715.3375827-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we allocate all 3 levels of radix3 page tables using
nvkm_gsp_mem_ctor(), which uses dma_alloc_coherent() for allocating all of
the relevant memory. This can end up failing in scenarios where the system
has very high memory fragmentation, and we can't find enough contiguous
memory to allocate level 2 of the page table.
Currently, this can result in runtime PM issues on systems where memory
fragmentation is high - as we'll fail to allocate the page table for our
suspend/resume buffer:
kworker/10:2: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL),
nodemask=(null),cpuset=/,mems_allowed=0
CPU: 10 PID: 479809 Comm: kworker/10:2 Not tainted
6.8.6-201.ChopperV6.fc39.x86_64 #1
Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
Workqueue: pm pm_runtime_work
Call Trace:
<TASK>
dump_stack_lvl+0x64/0x80
warn_alloc+0x165/0x1e0
? __alloc_pages_direct_compact+0xb3/0x2b0
__alloc_pages_slowpath.constprop.0+0xd7d/0xde0
__alloc_pages+0x32d/0x350
__dma_direct_alloc_pages.isra.0+0x16a/0x2b0
dma_direct_alloc+0x70/0x270
nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau]
r535_gsp_fini+0x1d4/0x350 [nouveau]
nvkm_subdev_fini+0x67/0x150 [nouveau]
nvkm_device_fini+0x95/0x1e0 [nouveau]
nvkm_udevice_fini+0x53/0x70 [nouveau]
nvkm_object_fini+0xb9/0x240 [nouveau]
nvkm_object_fini+0x75/0x240 [nouveau]
nouveau_do_suspend+0xf5/0x280 [nouveau]
nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
pci_pm_runtime_suspend+0x67/0x1e0
? __pfx_pci_pm_runtime_suspend+0x10/0x10
__rpm_callback+0x41/0x170
? __pfx_pci_pm_runtime_suspend+0x10/0x10
rpm_callback+0x5d/0x70
? __pfx_pci_pm_runtime_suspend+0x10/0x10
rpm_suspend+0x120/0x6a0
pm_runtime_work+0x98/0xb0
process_one_work+0x171/0x340
worker_thread+0x27b/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xe5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
Luckily, we don't actually need to allocate coherent memory for the page
table thanks to being able to pass the GPU a radix3 page table for
suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg
allocator for level 2. We continue using coherent allocations for lvl0 and
1, since they only take a single page.
V2:
* Don't forget to actually jump to the next scatterlist when we reach the
end of the scatterlist we're currently on when writing out the page table
for level 2
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-2-lyude@redhat.com
Unfortunately both Lenovo Legion Pro 7 16ARX8H and Legion 7i 16IAX7
got the very same PCI SSID while the hardware implementations are
completely different (the former is with TI TAS2781 codec while the
latter is with Cirrus CS35L41 codec). The former model got broken by
the recent fix for the latter model.
For addressing the regression, check the codec SSID and apply the
proper quirk for each model now.
Fixes: 24b6332c2d ("ALSA: hda: Add Lenovo Legion 7i gen7 sound quirk")
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1223462
Message-ID: <20240430163206.5200-1-tiwai@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull power supply fixes from Sebastian Reichel:
- mt6360_charger: Fix of_match for usb-otg-vbus regulator
- rt9455: Fix unused-const-variable for !CONFIG_USB_PHY
* tag 'for-v6.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: mt6360_charger: Fix of_match for usb-otg-vbus regulator
power: rt9455: hide unused rt9455_boost_voltage_values
Pull x86 platform driver fix from Ilpo Järvinen:
- Add Grand Ridge to HPM CPU list
* tag 'platform-drivers-x86-v6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: ISST: Add Grand Ridge to HPM CPU list
Testing ohci functionality with qemu's pci-ohci emulation often results
in ohci interface stalls, resulting in hung task timeouts.
The problem is caused by lost interrupts between the emulation and the
Linux kernel code. Additional interrupts raised while the ohci interrupt
handler in Linux is running and before the handler clears the interrupt
status are not handled. The fix for a similar problem in ehci suggests
that the problem is likely caused by edge-triggered MSI interrupts. See
commit 0b60557230 ("usb: ehci: Prevent missed ehci interrupts with
edge-triggered MSI") for details.
Ensure that the ohci interrupt code handles all pending interrupts before
returning to solve the problem.
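The lost-interrupt scenario can be modelled with a toy status register. This is a hedged userspace sketch, not the ohci-hcd code: struct fake_hc, its fields and both handler variants are hypothetical, and late_events simply simulates the device raising more bits while the handler is running.

```c
#include <assert.h>
#include <stddef.h>

struct fake_hc {
	unsigned int status;      /* pending interrupt bits */
	unsigned int late_events; /* bits raised while the handler runs */
	unsigned int handled;     /* bits the handler has processed */
};

/* One pass: snapshot status, process it, acknowledge only what was seen. */
static void handle_pending(struct fake_hc *hc)
{
	unsigned int ints = hc->status;

	hc->status |= hc->late_events; /* device raises more bits mid-handler */
	hc->late_events = 0;
	hc->handled |= ints;
	hc->status &= ~ints;
}

/* Single pass: late bits stay pending and, with edge-triggered MSI, no
 * further interrupt arrives to pick them up. */
static void irq_single_pass(struct fake_hc *hc)
{
	assert(hc != NULL);
	handle_pending(hc);
}

/* Draining variant: keep going until nothing is pending before returning. */
static void irq_drain(struct fake_hc *hc)
{
	assert(hc != NULL);
	do {
		handle_pending(hc);
	} while (hc->status);
}
```

Starting from status 0x1 with a late event 0x2, the single pass returns with 0x2 still pending and unhandled, while the draining variant returns only once both bits have been processed.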
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: stable@vger.kernel.org
Fixes: 306c54d0ed ("usb: hcd: Try MSI interrupts on PCI devices")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Link: https://lore.kernel.org/r/20240429154010.1507366-1-linux@roeck-us.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull pin control fixes from Linus Walleij:
- Fix a double-free in the pinctrl_enable() errorpath
- Fix a refcount leak in pinctrl_dt_to_map()
- Fix selecting the GPIO pin control state and the UART3 pin config
group in the Intel Baytrail driver
- Fix readback of schmitt trigger status in the Mediatek Paris driver,
along with some semantic pin config issues in this driver
- Fix a pin suffix typo in the Meson A1 driver
- Fix an erroneous register offset in the Aspeed G6 driver
- Fix an inconsistent lock state and the interrupt type on resume in
the Renesas RZG2L driver
- Fix some minor confusion in the Renesas DT bindings
* tag 'pinctrl-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: renesas: rzg2l: Configure the interrupt type on resume
pinctrl: devicetree: fix refcount leak in pinctrl_dt_to_map()
pinctrl: baytrail: Add pinconf group for uart3
pinctrl: baytrail: Fix selecting gpio pinctrl state
pinctrl: mediatek: paris: Rework support for PIN_CONFIG_{INPUT,OUTPUT}_ENABLE
pinctrl: mediatek: paris: Fix PIN_CONFIG_INPUT_SCHMITT_ENABLE readback
pinctrl: core: delete incorrect free in pinctrl_enable()
pinctrl/meson: fix typo in PDM's pin name
pinctrl: pinctrl-aspeed-g6: Fix register offset for pinconf of GPIOR-T
pinctrl: renesas: rzg2l: Execute atomically the interrupt configuration
dt-bindings: pinctrl: renesas,rzg2l-pinctrl: Allow 'input' and 'output-enable' properties
Merge series from Jerome Brunet <jbrunet@baylibre.com>:
This patchset fixes 2 problems on TDM which both find a solution
by properly implementing the .trigger() callback for the TDM backend.
ATM, enabling the TDM formatters is done by the .prepare() callback
because handling the formatter is slow due to necessary calls to CCF.
The first problem affects the TDMIN. Because .prepare() is called on the DPCM
backend first, the formatters are started before the FIFOs, and this may
cause random channel shifts if the TDMIN uses multiple lanes with more
than 2 slots per lane. Using .trigger() allows setting the FE/BE order,
solving the problem.
There was already an attempt to fix this 3y ago [1], which was reverted [2]
because it triggered a 'sleep in irq' error on the period IRQ. The solution is
to just use the bottom half of a threaded IRQ. This is patch #1. Patches #2
and #3 remain mostly the same as 3y ago.
For TDMOUT, the problem is on pause. ATM pause only stops the FIFO and
the TDMOUT just starves. When it does, it will actually repeat the last
sample continuously. Depending on the platform, if there is no high-pass
filter on the analog path, this may translate to a constant position of
the speaker membrane. There is no audible glitch but it may damage the
speaker coil.
Properly stopping the TDMOUT in pause solves the problem. There is a
behaviour change associated with that fix. Clocks used to be continuous
on pause because of the problem above. They will now be gated on pause by
default, as they should be. The last change introduces proper support for
continuous clocks, if needed.
[1]: https://lore.kernel.org/linux-amlogic/20211020114217.133153-1-jbrunet@baylibre.com
[2]: https://lore.kernel.org/linux-amlogic/20220421155725.2589089-1-narmstrong@baylibre.com
[BUG]
When running generic/287, the following file extent items can be
generated:
item 16 key (258 EXTENT_DATA 2682880) itemoff 15305 itemsize 53
generation 9 type 1 (regular)
extent data disk byte 1378414592 nr 462848
extent data offset 0 nr 462848 ram 2097152
extent compression 0 (none)
Note that the file extent item is not a compressed one, but its ram_bytes is
way larger than its disk_num_bytes.
According to the btrfs on-disk format, ram_bytes should match disk_num_bytes
if the extent is not compressed.
[CAUSE]
Since commit b73a6fd1b1 ("btrfs: split partial dio bios before
submit"), for partial dio writes, we split the ordered extent.
However, the function btrfs_split_ordered_extent() doesn't update
ram_bytes even though it has already shrunk disk_num_bytes.
Originally the function btrfs_split_ordered_extent() was only introduced
for zoned devices in commit d22002fd37 ("btrfs: zoned: split ordered
extent when bio is sent"), but the later commit b73a6fd1b1 ("btrfs: split
partial dio bios before submit") makes non-zoned btrfs affected.
Thankfully, for uncompressed file extents we do not really utilize the
ram_bytes member, thus it won't cause any real problem.
[FIX]
Also update btrfs_ordered_extent::ram_bytes inside
btrfs_split_ordered_extent().
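The bookkeeping involved can be illustrated with a reduced model. This is a sketch under assumptions, not the btrfs implementation: struct ordered_extent_model and split_front() are hypothetical, and only the two sizes (2097152 and 462848) are taken from the dump above.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-in for the fields that matter to the split. */
struct ordered_extent_model {
	uint64_t file_offset;
	uint64_t num_bytes;
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t ram_bytes;
};

/*
 * Split off the first len bytes and keep the remainder in oe.
 * update_ram_bytes selects the behaviour with (1) or without (0) the fix.
 */
static void split_front(struct ordered_extent_model *oe, uint64_t len,
			int update_ram_bytes)
{
	assert(len < oe->num_bytes);
	oe->file_offset += len;
	oe->disk_bytenr += len;
	oe->num_bytes -= len;
	oe->disk_num_bytes -= len;
	if (update_ram_bytes)
		oe->ram_bytes -= len;
}
```

Splitting 1634304 bytes off the front of a 2097152-byte uncompressed extent leaves 462848 bytes on disk. Without the ram_bytes update the remainder still claims ram 2097152, matching the mismatch in the dump; with it, ram_bytes and disk_num_bytes agree again.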
Fixes: d22002fd37 ("btrfs: zoned: split ordered extent when bio is sent")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Xu writes:
FPGA Manager changes for 6.9-final
DFL
- Peter adds PCI ID table for Intel D5005 Stratix 10 FPGA card
All patches have been reviewed on the mailing list, and have been in the
last linux-next releases (as part of our fixes branch)
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
* tag 'fpga-for-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga:
fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card
With 'iommu=off' on the kernel command line and x2APIC enabled by the BIOS
the code which disables the x2APIC triggers an unchecked MSR access error:
RDMSR from 0x802 at rIP: 0xffffffff94079992 (native_apic_msr_read+0x12/0x50)
This happens because default_acpi_madt_oem_check() selects an x2APIC
driver before the x2APIC is disabled.
When the x2APIC is disabled because interrupt remapping cannot be enabled
due to 'iommu=off' on the command line, x2apic_disable() invokes
apic_set_fixmap() which in turn tries to read the APIC ID. This triggers
the MSR warning because x2APIC is disabled, but the APIC driver is still
x2APIC based.
Prevent that by adding an argument to apic_set_fixmap() which makes the
APIC ID read out conditional and set it to false from the x2APIC disable
path. That's correct as the APIC ID has already been read out during early
discovery.
Fixes: d10a904435 ("x86/apic: Consolidate boot_cpu_physical_apicid initialization sites")
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/875xw5t6r7.ffs@tglx
Pull workqueue fixes from Tejun Heo:
"Two doc update patches and the following three fixes:
- On single node systems, the default pool is used but the
node_nr_active for the default pool was set to min_active. This
effectively limited the max concurrency of unbound pools on single
node systems to 8 causing performance regressions on some
workloads. Fixed by setting the default pool's node_nr_active to
max_active.
- wq_update_node_max_active() could trigger divide-by-zero if the
intersection between the allowed CPUs for an unbound workqueue and
online CPUs becomes empty.
- When kick_pool() was trying to repatriate a worker to a CPU in its
pod by setting task->wake_cpu, it didn't consider whether the CPU
being selected is online or not which obviously can lead to
suboptimal behaviors. On s390, this triggered a crash in arch code.
The workqueue patch removes the gross misbehavior but doesn't fix
the crash completely as there's a race window in which CPUs can go
down after wake_cpu is set. Need to decide whether the fix should
be on the core or arch side"
* tag 'wq-for-6.9-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix divide error in wq_update_node_max_active()
workqueue: The default node_nr_active should have its max set to max_active
workqueue: Fix selection of wake_cpu in kick_pool()
docs/zh_CN: core-api: Update translation of workqueue.rst to 6.9-rc1
Documentation/core-api: Update events_freezable_power references.
Pull Allwinner clk driver fixes from Jernej Skrabec:
- fix H6 CPU rate change via reparenting
- set A64 MIPI PLL min & max rate
* tag 'sunxi-clk-fixes-for-6.9-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: a64: Set minimum and maximum rate for PLL-MIPI
clk: sunxi-ng: common: Support minimum and maximum rate
clk: sunxi-ng: h6: Reparent CPUX during PLL CPUX rate change
Pull SCSI fix from James Bottomley:
"Minor core fix to prevent the sd driver printing the stream count
every time we rescan and instead print only if it's changed"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Only print updates to permanent stream count
Pull NFS client fixes from Trond Myklebust:
- Fix an Oops in xs_tcp_tls_setup_socket
- Fix an Oops due to missing error handling in nfs_net_init()
* tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Handle error of rpc_proc_register() in nfs_net_init().
SUNRPC: add a missing rpc_stat for TCP TLS
Pull bcachefs fixes from Kent Overstreet:
"Tiny set of fixes this time"
* tag 'bcachefs-2024-04-29' of https://evilpiepirate.org/git/bcachefs:
bcachefs: fix integer conversion bug
bcachefs: btree node scan now fills in sectors_written
bcachefs: Remove accidental debug assert
Currently, we allocate a lbuf-sized kernel buffer and copy lbuf from
userspace to that buffer. Later, we use scanf on this buffer but we don't
ensure that the string is terminated inside the buffer, which can lead to
an OOB read when using scanf. Fix this issue by using memdup_user_nul instead.
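The idea of copying with a guaranteed terminator can be shown with a userspace analogue. This is only a sketch: dup_nul() and parse_two_hex() are hypothetical helpers built on malloc() and sscanf(), standing in for the kernel-side copy and parse, and the "%x %x" format is an assumed example input layout.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes and always append a NUL, so the result is a valid string. */
static char *dup_nul(const char *src, size_t len)
{
	char *p;

	assert(src != NULL);
	p = malloc(len + 1);
	if (!p)
		return NULL;
	memcpy(p, src, len);
	p[len] = '\0';
	return p;
}

/* Parse two hex values from a buffer that need not be NUL-terminated. */
static int parse_two_hex(const char *src, size_t len,
			 unsigned int *a, unsigned int *b)
{
	char *buf = dup_nul(src, len);
	int n;

	if (!buf)
		return -1;
	n = sscanf(buf, "%x %x", a, b); /* cannot read past buf[len] */
	free(buf);
	return n == 2 ? 0 : -1;
}
```

Because the copy is one byte longer than the input and explicitly terminated, the parser stops at the terminator even when the caller's buffer has none.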
Fixes: a4f17cc726 ("s390/cio: add CRW inject functionality")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-5-f1f1b53a10f4@gmail.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Robert reported the following when booting a CXL host with Restricted CXL
Host (RCH) topology:
[ 39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
[ 39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]
... plus some related subsequent NULL pointer dereference:
[ 40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8
The iterator to walk the PCIe path did not account for RCH topology.
However RCH does not support hotplug and the memory exported by the
Restricted CXL Device (RCD) should be covered by HMAT and therefore no
access_coordinate is needed. Add a check to see if the endpoint device is an
RCD and skip the calculation.
Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
to exercise the topology iterator. The dev_is_pci() check added is to help
with this test and should be harmless for normal operation.
Reported-by: Robert Richter <rrichter@amd.com>
Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
Fixes: 592780b839 ("cxl: Fix retrieving of access_coordinates in PCIe path")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20240426224913.1027420-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Pull erofs fixes from Gao Xiang:
"Three fixes related to EROFS fscache mode. The most important two
patches fix calling kill_block_super() in bdev-based mode instead of
kill_anon_super(). The remaining patch is an informative one.
Summary:
- Better error message when prepare_ondemand_read failed
- Fix unmount of bdev-based mode if CONFIG_EROFS_FS_ONDEMAND is on"
* tag 'erofs-for-6.9-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: reliably distinguish block based and fscache mode
erofs: get rid of erofs_fs_context
erofs: modify the error message when prepare_ondemand_read failed
The documentation for device_get_named_child_node() mentions this
important point:
"
The caller is responsible for calling fwnode_handle_put() on the
returned fwnode pointer.
"
Add fwnode_handle_put() to avoid a leaked reference.
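The ownership rule quoted above can be modelled in user space. This is only a
minimal sketch: `struct node`, `node_get_named_child()`, `node_put()` and
`scan_for_child()` are hypothetical stand-ins invented for illustration, not
the kernel's fwnode API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a firmware node with a reference count. */
struct node {
	const char *name;
	int refcount;
	struct node *child;
};

/* Like the documented helper, returns the child with a reference held. */
static struct node *node_get_named_child(struct node *parent, const char *name)
{
	struct node *c;

	assert(parent != NULL);
	c = parent->child;
	if (!c || strcmp(c->name, name) != 0)
		return NULL;
	c->refcount++;
	return c;
}

/* Drops a reference; accepts NULL so callers need no extra check. */
static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Caller pattern: look up, use, then always put. Returns 1 if found. */
static int scan_for_child(struct node *parent, const char *name)
{
	struct node *c = node_get_named_child(parent, name);
	int found = c != NULL;

	node_put(c);	/* without this call the reference would leak */
	return found;
}
```

After `scan_for_child()` returns, the child's reference count is back at its
starting value, which is the property the fix restores.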
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Fixes: 08c2a4bc9f ("ALSA: hda: move Intel SoundWire ACPI scan to dedicated module")
Message-ID: <20240426152731.38420-1-pierre-louis.bossart@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The SOF driver is selected whenever specific I2C/I2S HIDs are reported
as 'present' in the ACPI DSDT. In some cases, an HID is reported but
the hardware does not actually rely on I2C/I2S. This false positive
leads to an invalid selection of the SOF driver and as a result an
invalid topology is loaded.
This patch hardens the detection with a check that the NHLT table is
consistent with the report of an I2S-based codec in DSDT. This table
should expose at least one SSP endpoint configured for an I2S-codec
connection.
Tested on Huawei Matebook D14 (NBLB-WAX9N) using an HDaudio codec with
an invalid ES8336 ACPI HID reported:
[ 7.858249] snd_hda_intel 0000:00:1f.3: DSP detected with PCI class/subclass/prog-if info 0x040380
[ 7.858312] snd_hda_intel 0000:00:1f.3: snd_intel_dsp_find_config: no valid SSP found for HID ESSX8336, skipped
Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Closes: https://github.com/thesofproject/linux/issues/4934
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Message-ID: <20240426152818.38443-1-pierre-louis.bossart@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Amlogic sound cards do create a lot of pcm interfaces, possibly more than
8. Some pcm interfaces are internal (like DPCM backends and c2c) and not
exposed to userspace.
Those interfaces still increase the number passed to snd_find_free_minor(),
which eventually exceeds 8 causing -EBUSY error on card registration if
CONFIG_SND_DYNAMIC_MINORS=n and the interface is exposed to userspace.
Select CONFIG_SND_DYNAMIC_MINORS for Amlogic cards to avoid the problem.
Fixes: 7864a79f37 ("ASoC: meson: add axg sound card support")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20240426134150.3053741-1-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
So far, the formatters have been reset/enabled using the .prepare()
callback. This was done in this callback because walking the formatters uses
a mutex. A mutex is used because formatter handling requires dealing with
possibly slow clock operations.
With non-atomic support, the .trigger() callback may be used instead, which
also allows the formatters to be properly enabled and disabled on start as
well as on pause/resume.
This solves a random shift on TDMIN as well as repeated samples on TDMOUT.
Fixes: d60e4f1e4b ("ASoC: meson: add tdm interface driver")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20240426152946.3078805-4-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Non atomic operations need to be performed in the trigger callback
of the TDM interfaces. Those are BEs but what matters is the nonatomic
flag of the FE in the DPCM context. Just set nonatomic for everything so,
at least, what is done is clear.
Fixes: 7864a79f37 ("ASoC: meson: add axg sound card support")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20240426152946.3078805-3-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
With the AXG audio subsystem, there is a possible random channel shift on
TDM capture, when the slot number per lane is more than 2, and there is
more than one lane used.
The problem has been there since the introduction of the axg audio support
but such a scenario is pretty uncommon. This is why there have been no loud
complaints about the problem.
Solving the problem requires making the links non-atomic and using the
trigger() callback to start FEs and BEs in the appropriate order.
This was tried in the past and reverted because it caused the block irq to
sleep while atomic. However, instead of reverting, the solution is to call
snd_pcm_period_elapsed() in a non atomic context.
Use the bottom half of a threaded IRQ to do so.
Fixes: 6dc4fa179f ("ASoC: meson: add axg fifo base driver")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20240426152946.3078805-2-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Guillaume Nault says:
====================
vxlan: Fix vxlan counters.
Like most virtual devices, vxlan needs special care when updating its
netdevice counters. This is done in patch 1. Patch 2 just adds a
missing VNI counter update (found while working on patch 1).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
VXLAN stores per-VNI statistics using vxlan_vnifilter_count().
These statistics were not updated when arp_reduce() failed its
pskb_may_pull() call.
Use vxlan_vnifilter_count() to update the VNI counter when that
happens.
Fixes: 4095e0e132 ("drivers: vxlan: vnifilter: per vni stats")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VXLAN devices update their stats locklessly. Therefore these counters
should either be stored in per-cpu data structures or the updates
should be done using atomic increments.
Since the net_device_core_stats infrastructure is already used in
vxlan_rcv(), use it for the other rx_dropped and tx_dropped counter
updates. Update the other counters atomically using DEV_STATS_INC().
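The atomic-increment option can be illustrated in user space with C11 atomics.
This is a sketch under stated assumptions: `struct dev_stats`, `stats_inc()`
and `stats_read()` are hypothetical names that only mimic the idea behind
DEV_STATS_INC(); they are not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical device counters; the kernel's real layout differs. */
struct dev_stats {
	atomic_ulong rx_errors;
	atomic_ulong tx_errors;
};

/* Lockless increment that cannot lose updates from concurrent callers. */
static void stats_inc(atomic_ulong *counter)
{
	assert(counter != NULL);
	atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Snapshot of a counter; relaxed ordering is enough for statistics. */
static unsigned long stats_read(atomic_ulong *counter)
{
	assert(counter != NULL);
	return atomic_load_explicit(counter, memory_order_relaxed);
}
```

A plain `counter++` from several CPUs can drop increments because the load,
add and store are separate steps; the atomic read-modify-write closes that
window without taking a lock.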
Fixes: d342894c5d ("vxlan: virtual extensible lan")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that packet_buffer_get respects the user_length provided. If
the length of the head packet exceeds the user_length, packet_buffer_get
will now return 0 to signify to the user that no data were read
and a larger buffer size is required. Helps prevent user space overflows.
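The length check can be sketched in user space as follows. The structure and
the function name `packet_get()` are simplified, hypothetical stand-ins (a
single head packet instead of the driver's ring buffer), so this shows the
shape of the check rather than the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified buffer holding one head packet. */
struct packet_buffer {
	size_t head_length;
	unsigned char head_data[64];
};

/*
 * Copy the head packet into the caller's buffer.
 * Returns the number of bytes copied, or 0 when the packet does not fit,
 * so the caller knows to retry with a larger buffer.
 */
static size_t packet_get(const struct packet_buffer *buf,
			 unsigned char *out, size_t user_length)
{
	assert(buf->head_length <= sizeof(buf->head_data));

	if (buf->head_length > user_length)
		return 0;	/* never write past the caller's buffer */

	memcpy(out, buf->head_data, buf->head_length);
	return buf->head_length;
}
```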
Signed-off-by: Thanassis Avgerinos <thanassis.avgerinos@gmail.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Handle cases where the RMP table placement in the BIOS is not 2M aligned
and the kexec-ed kernel could try to allocate from within that chunk
which then causes a fatal RMP fault.
The kexec failure is illustrated below:
SEV-SNP: RMP table physical range [0x0000007ffe800000 - 0x000000807f0fffff]
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000008efff] usable
BIOS-e820: [mem 0x000000000008f000-0x000000000008ffff] ACPI NVS
...
BIOS-e820: [mem 0x0000004080000000-0x0000007ffe7fffff] usable
BIOS-e820: [mem 0x0000007ffe800000-0x000000807f0fffff] reserved
BIOS-e820: [mem 0x000000807f100000-0x000000807f1fefff] usable
As seen here in the e820 memory map, the end of the RMP table range is not
aligned to 2MB, and the remainder of that 2MB chunk is not reserved but
usable as RAM.
Subsequently, kexec -s (KEXEC_FILE_LOAD syscall) loads its purgatory
code and boot_param, command line and other setup data into this RAM
region as seen in the kexec logs below, which leads to a fatal RMP fault
during kexec boot.
Loaded purgatory at 0x807f1fa000
Loaded boot_param, command line and misc at 0x807f1f8000 bufsz=0x1350 memsz=0x2000
Loaded 64bit kernel at 0x7ffae00000 bufsz=0xd06200 memsz=0x3894000
Loaded initrd at 0x7ff6c89000 bufsz=0x4176014 memsz=0x4176014
E820 memmap:
0000000000000000-000000000008efff (1)
000000000008f000-000000000008ffff (4)
0000000000090000-000000000009ffff (1)
...
0000004080000000-0000007ffe7fffff (1)
0000007ffe800000-000000807f0fffff (2)
000000807f100000-000000807f1fefff (1)
000000807f1ff000-000000807fffffff (2)
nr_segments = 4
segment[0]: buf=0x00000000e626d1a2 bufsz=0x4000 mem=0x807f1fa000 memsz=0x5000
segment[1]: buf=0x0000000029c67bd6 bufsz=0x1350 mem=0x807f1f8000 memsz=0x2000
segment[2]: buf=0x0000000045c60183 bufsz=0xd06200 mem=0x7ffae00000 memsz=0x3894000
segment[3]: buf=0x000000006e54f08d bufsz=0x4176014 mem=0x7ff6c89000 memsz=0x4177000
kexec_file_load: type:0, start:0x807f1fa150 head:0x1184d0002 flags:0x0
Check if RMP table start and end physical range in the e820 tables are
not aligned to 2MB and in that case map this range to reserved in all
the three e820 tables.
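The alignment arithmetic can be checked with a small user-space sketch. The
helper names and `struct range` below are hypothetical, and the sketch only
computes the 2MB-aligned range that would need to be reserved; it does not
model the e820 table updates themselves.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M 0x200000ULL	/* 2 MiB alignment granule */

/* Inclusive physical address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Round an address down to a 2 MiB boundary. */
static uint64_t align_down_2m(uint64_t addr)
{
	return addr & ~(SZ_2M - 1);
}

/* Round an address up to a 2 MiB boundary. */
static uint64_t align_up_2m(uint64_t addr)
{
	return (addr + SZ_2M - 1) & ~(SZ_2M - 1);
}

/* Widen an inclusive [start, end] range to whole 2 MiB chunks. */
static struct range rmp_reserved_range(uint64_t start, uint64_t end)
{
	struct range r;

	assert(start <= end);
	r.start = align_down_2m(start);
	r.end = align_up_2m(end + 1) - 1;
	return r;
}
```

With the RMP range from the log above, 0x7ffe800000 to 0x807f0fffff, the start
is already aligned and the end widens to 0x807f1fffff, which covers the
"usable" region where kexec placed the purgatory and boot_param segments.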
[ bp: Massage. ]
Fixes: c3b86e61b7 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/df6e995ff88565262c2c7c69964883ff8aa6fc30.1714090302.git.ashish.kalra@amd.com
Asbjørn Sloth Tønnesen says:
====================
net: qede: avoid overruling error codes
This series fixes the qede driver, so that
qede_parse_flow_attr() and its subfunctions
don't get their error codes overruled
(i.e. turning -EOPNOTSUPP into -EINVAL).
---
I have two more patches along the same lines,
but they are not yet causing any issues,
so I have them destined for net-next.
(those are for qede_flow_spec_validate_unused()
and qede_flow_parse_ports().)
After that I have a series for converting to
extack + the final one for validating control
flags.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling qede_parse_actions(), the
return code was only used for a non-zero check,
and then -EINVAL was returned.
qede_parse_actions() can currently fail with:
* -EINVAL
* -EOPNOTSUPP
This patch changes the code to use the actual
return code, not just return -EINVAL.
The blamed commit broke the implicit assumption
that only -EINVAL would ever be returned.
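The difference between the two patterns can be shown with a small user-space
sketch. `parse_actions()`, `add_filter_old()` and `add_filter_new()` are
hypothetical functions made up for illustration; they are not the driver's
code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical parser that can fail in more than one way. */
static int parse_actions(int action)
{
	assert(action >= 0);
	if (action == 0)
		return 0;
	return action == 1 ? -EINVAL : -EOPNOTSUPP;
}

/* Old pattern: any failure is flattened to -EINVAL. */
static int add_filter_old(int action)
{
	if (parse_actions(action))
		return -EINVAL;
	return 0;
}

/* New pattern: the callee's error code reaches the caller unchanged. */
static int add_filter_new(int action)
{
	int rc = parse_actions(action);

	if (rc)
		return rc;
	return 0;
}
```

With the old pattern a caller cannot tell "malformed request" from "valid but
not offloadable", which matters to callers that treat -EOPNOTSUPP specially.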
Only compile tested.
Fixes: 319a1d1947 ("flow_offload: check for basic action hw stats type")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In qede_flow_spec_to_rule(), when calling
qede_parse_flow_attr(), the return code
was only used for a non-zero check, and then
-EINVAL was returned.
qede_parse_flow_attr() can currently fail with:
* -EINVAL
* -EOPNOTSUPP
* -EPROTONOSUPPORT
This patch changes the code to use the actual
return code, not just return -EINVAL.
The blamed commit introduced qede_flow_spec_to_rule(),
and this call to qede_parse_flow_attr(); it looks
like it just duplicated how it was already used.
Only compile tested.
Fixes: 37c5d3efd7 ("qede: use ethtool_rx_flow_rule() to remove duplicated parser code")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In qede_add_tc_flower_fltr(), when calling
qede_parse_flow_attr(), the return code
was only used for a non-zero check, and then
-EINVAL was returned.
qede_parse_flow_attr() can currently fail with:
* -EINVAL
* -EOPNOTSUPP
* -EPROTONOSUPPORT
This patch changes the code to use the actual
return code, not just return -EINVAL.
The blamed commit introduced these functions.
Only compile tested.
Fixes: 2ce9c93eac ("qede: Ingress tc flower offload (drop action) support.")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Explicitly set 'rc' (return code), before jumping to the
unlock and return path.
By not having any code depend on 'rc' remaining at
its initial value of -EINVAL, we can re-use 'rc' for
the return code of function calls in subsequent patches.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Donald has been contributing to YNL a lot. Let's create a dedicated
MAINTAINERS entry and make his involvement official :)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger says:
====================
net: bcmgenet: protect contended accesses
Some registers may be modified by parallel execution contexts and
require protections to prevent corruption.
A review of the driver revealed the need for these additional
protections.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The UMAC_CMD register is written from different execution
contexts and has insufficient synchronization protections to
prevent possible corruption. Of particular concern are the
accesses from the phy_device delayed work context used by the
adjust_link call and the BH context that may be used by the
ndo_set_rx_mode call.
A spinlock is added to the driver to protect contended register
accesses (i.e. reg_lock) and it is used to synchronize accesses
to UMAC_CMD.
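Why a read-modify-write of a shared register needs a lock can be sketched in
user space. Everything below is a hypothetical model: `struct dev_model`, the
`model_*` helpers and the C11 `atomic_flag` spinlock stand in for the driver's
private data and the kernel spinlock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device model: one shared register and the lock guarding it. */
struct dev_model {
	atomic_flag reg_lock;
	uint32_t umac_cmd;
};

static void model_lock(struct dev_model *d)
{
	while (atomic_flag_test_and_set_explicit(&d->reg_lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void model_unlock(struct dev_model *d)
{
	atomic_flag_clear_explicit(&d->reg_lock, memory_order_release);
}

/* Read-modify-write performed entirely under the lock. */
static void umac_cmd_update(struct dev_model *d, uint32_t clear, uint32_t set)
{
	assert((clear & set) == 0);
	model_lock(d);
	d->umac_cmd = (d->umac_cmd & ~clear) | set;
	model_unlock(d);
}
```

Without the lock, two contexts could both read the old value and the later
write would silently discard the bits changed by the earlier one.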
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ndo_set_rx_mode function is synchronized with the
netif_addr_lock spinlock and BHs disabled. Since this
function is also invoked directly from the driver the
same synchronization should be applied.
Fixes: 72f9634762 ("net: bcmgenet: set Rx mode before starting netif")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EXT_RGMII_OOB_CTRL register can be written from different
contexts. It is predominantly written from the adjust_link
handler which is synchronized by the phydev->lock, but can
also be written from a different context when configuring the
mii in bcmgenet_mii_config().
The chances of contention are quite low, but it is conceivable
that adjust_link could occur during resume when WoL is enabled
so use the phydev->lock synchronizer in bcmgenet_mii_config()
to be sure.
Fixes: afe3f907d2 ("net: bcmgenet: power on MII block for all MII modes")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xtensa's flush_dcache_page() can be a no-op sometimes. There is a
generic implementation for this case in include/asm-generic/
cacheflush.h.
#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
static inline void flush_dcache_page(struct page *page)
{
}
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
#endif
So remove the superfluous flush_dcache_page() definition, which also
helps silence potential build warnings complaining the page variable
passed to flush_dcache_page() is not used.
In file included from crypto/scompress.c:12:
include/crypto/scatterwalk.h: In function 'scatterwalk_pagedone':
include/crypto/scatterwalk.h:76:30: warning: variable 'page' set but not used [-Wunused-but-set-variable]
76 | struct page *page;
| ^~~~
crypto/scompress.c: In function 'scomp_acomp_comp_decomp':
>> crypto/scompress.c:174:38: warning: unused variable 'dst_page' [-Wunused-variable]
174 | struct page *dst_page = sg_page(req->dst);
|
The issue was originally reported on LoongArch by kernel test
robot (Huacai fixed it on LoongArch), then reported by Guenter
and me on xtensa.
This patch also removes lots of redundant macros which have
been defined by asm-generic/cacheflush.h.
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202403091614.NeUw5zcv-lkp@intel.com/
Reported-by: Barry Song <v-songbaohua@oppo.com>
Closes: https://lore.kernel.org/all/CAGsJ_4yDk1+axbte7FKQEwD7X2oxUCFrEc9M5YOS1BobfDFXPA@mail.gmail.com/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/aaa8b7d7-5abe-47bf-93f6-407942436472@roeck-us.net/
Fixes: 77292bb8ca ("crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Message-Id: <20240319010920.125192-1-21cnbao@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Currently, the condition "__this_cpu_read(ksoftirqd) == current" is used to
invoke rcu_softirq_qs() in ksoftirqd tasks context for non-RT kernels.
This works correctly as long as the context is actually task context but
this condition is wrong when:
- the current task is ksoftirqd
- the task is interrupted in an RCU read-side critical section
- __do_softirq() is invoked on return from interrupt
Syzkaller triggered the following scenario:
-> finish_task_switch()
-> put_task_struct_rcu_user()
-> call_rcu(&task->rcu, delayed_put_task_struct)
-> __kasan_record_aux_stack()
-> pfn_valid()
-> rcu_read_lock_sched()
<interrupt>
__irq_exit_rcu()
-> __do_softirq()
-> if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
__this_cpu_read(ksoftirqd) == current)
-> rcu_softirq_qs()
-> RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map))
The rcu quiescent state is reported in the rcu-read critical section, so
the lockdep warning is triggered.
Fix this by splitting out the inner working of __do_softirq() into a helper
function which takes an argument to distinguish between ksoftirqd task
context and interrupted context and invoke it from the relevant call sites
with the proper context information and use that for the conditional
invocation of rcu_softirq_qs().
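The shape of the fix can be modelled in user space. This is a sketch only: all
`model_*` names, `MODEL_PREEMPT_RT`, `qs_reports` and `in_rcu_read_section`
are hypothetical stand-ins, and the assertion plays the role of the lockdep
warning shown above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-time switch standing in for CONFIG_PREEMPT_RT. */
#define MODEL_PREEMPT_RT 0

static int qs_reports;			/* modelled quiescent-state reports */
static bool in_rcu_read_section;	/* modelled read-side critical section */

static void model_rcu_softirq_qs(void)
{
	/* Reporting a quiescent state inside a read-side section is a bug. */
	assert(!in_rcu_read_section);
	qs_reports++;
}

/* Inner worker: the caller states its context instead of it being guessed. */
static void model_handle_softirqs(bool ksirqd)
{
	/* ... pending softirq handlers would run here ... */
	if (!MODEL_PREEMPT_RT && ksirqd)
		model_rcu_softirq_qs();
}

/* Called from the ksoftirqd thread itself: task context is guaranteed. */
static void model_run_ksoftirqd(void)
{
	model_handle_softirqs(true);
}

/* Called on interrupt exit: may have interrupted a read-side section. */
static void model_do_softirq_irq_exit(void)
{
	model_handle_softirqs(false);
}
```

The interrupt-exit path never reports a quiescent state, even when the
interrupted task happens to be ksoftirqd, because the decision no longer
depends on comparing `current` against the per-CPU ksoftirqd pointer.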
Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240427102808.29356-1-qiang.zhang1211@gmail.com
Link: https://lore.kernel.org/lkml/8f281a10-b85a-4586-9586-5bbc12dc784f@paulmck-laptop/T/#mea8aba4abfcb97bbf499d169ce7f30c4cff1b0e3
When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:
============================================
ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
<TASK>
erofs_kill_sb+0x81/0x90
deactivate_locked_super+0x35/0x80
get_tree_bdev+0x136/0x1e0
vfs_get_tree+0x2c/0xf0
do_new_mount+0x190/0x2f0
[...]
============================================
Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240419123611.947084-3-libaokun1@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
We did not delay after the second strobe signal, so another immediately
following access could potentially corrupt the written value.
This is a purely speculative fix with no supporting evidence, but after
taking out the spinlocks around the writes, it seems plausible that a
modern processor could be actually too fast. Also, it's just cleaner to
be consistent.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20240428093716.3198666-7-oswald.buddenhagen@gmx.de>
A side effect of making the dock monitoring interrupt-driven was that
we'd be very quick to program a freshly connected dock. However, for
unclear reasons, the dock does not work when we do that - despite the
FPGA netlist upload going just fine. We work around this by adding a
delay before programming the dock; for safety, the value is several
times as much as was determined empirically.
Note that a badly timed dock hot-plug would have triggered the problem
even before the referenced commit - but now it would happen 100% instead
of about 3% of the time, thus making it impossible to work around by
re-plugging.
Fixes: fbb64eedf5 ("ALSA: emu10k1: make E-MU dock monitoring interrupt-driven")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218584
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20240428093716.3198666-6-oswald.buddenhagen@gmx.de>
The FPGA access through the GPIO port does not interfere with other
sound processor register access, so there is no need to subject it to
emu_lock. And after moving all FPGA access out of the interrupt handler,
it does not need to be IRQ-safe, either.
What's more, attaching the dock causes a firmware upload, which takes
several seconds. We really don't want to disable IRQs for this long, and
even less do we want someone else to spin with IRQs disabled waiting for us.
Therefore, use a mutex for FPGA access locking.
This makes the code somewhat more noisy, as we need to wrap bigger
sections into the mutex, as it needs to enclose the spinlocks.
The latter has the "side effect" of fixing dock FPGA programming in a
corner case: a really badly timed mixer access right between entering
FPGA programming mode and uploading the netlist would mess up the
protocol.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20240428093716.3198666-5-oswald.buddenhagen@gmx.de>
The actual event processing was already done by workqueue items. We can
move the event dispatching there as well, rather than doing it already
in the interrupt handler callback.
This change has a rather profound "side effect" on the reliability of
the FPGA programming: once we enter programming mode, we must not issue
any snd_emu1010_fpga_{read,write}() calls until we're done, as these
would badly mess up the programming protocol. But exactly that would
happen when trying to program the dock, as that triggers GPIO interrupts
as a side effect. This is mitigated by deferring the actual interrupt
handling, as workqueue items are not re-entrant.
To avoid scheduling the dispatcher on non-events, we now explicitly
ignore GPIO IRQs triggered by "uninteresting" pins, which happens a lot
as a side effect of calling snd_emu1010_fpga_{read,write}().
Fixes: fbb64eedf5 ("ALSA: emu10k1: make E-MU dock monitoring interrupt-driven")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218584
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20240428093716.3198666-4-oswald.buddenhagen@gmx.de>
Pulled out of the next patch to improve its legibility.
As the function is now available, call it directly from
snd_emu10k1_emu1010_init(), thus making the MicroDock firmware loading
synchronous - there isn't really a reason not to. Note that this does
not affect the AudioDocks of rev1 cards, as these have no independent
power supplies, and thus come up only a while after the main card is
initialized.
As a drive-by, adjust the priorities of two messages to better reflect
their impact.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20240428093716.3198666-3-oswald.buddenhagen@gmx.de>
While there are two separate IRQ status bits for dock attach and detach,
the hardware appears to mix them up more or less randomly, making them
useless for tracking what actually happened. It is much safer to check
the dock status separately and proceed based on that, as the old polling
code did.
Note that the code assumes that only the dock can be hot-plugged - if
other option card bits changed, the logic would break.
Fixes: fbb64eedf5 ("ALSA: emu10k1: make E-MU dock monitoring interrupt-driven")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218584
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20240428093716.3198666-2-oswald.buddenhagen@gmx.de>
Daniel Borkmann says:
====================
pull-request: bpf 2024-04-26
We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 14 files changed, 168 insertions(+), 72 deletions(-).
The main changes are:
1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page,
from Puranjay Mohan.
2) Fix a crash in XDP with devmap broadcast redirect when the latter map
is in process of being torn down, from Toke Høiland-Jørgensen.
3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF
program runtime stats, from Xu Kuohai.
4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue,
from Jason Xing.
5) Fix BPF verifier error message in resolve_pseudo_ldimm64,
from Anton Protopopov.
6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item,
from Andrii Nakryiko.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64
bpf, x86: Fix PROBE_MEM runtime load check
bpf: verifier: prevent userspace memory access
xdp: use flags field to disambiguate broadcast redirect
arm32, bpf: Reimplement sign-extension mov instruction
riscv, bpf: Fix incorrect runtime stats
bpf, arm64: Fix incorrect runtime stats
bpf: Fix a verifier verbose message
bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue
MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers
MAINTAINERS: Update email address for Puranjay Mohan
bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition
====================
Link: https://lore.kernel.org/r/20240426224248.26197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Puranjay Mohan says:
====================
bpf: prevent userspace memory access
V5: https://lore.kernel.org/bpf/20240324185356.59111-1-puranjay12@gmail.com/
Changes in V6:
- Disable the verifier's instrumentation in x86-64 and update the JIT to
take care of vsyscall page in addition to userspace addresses.
- Update bpf_testmod to test for vsyscall addresses.
V4: https://lore.kernel.org/bpf/20240321124640.8870-1-puranjay12@gmail.com/
Changes in V5:
- Use TASK_SIZE_MAX + PAGE_SIZE, VSYSCALL_ADDR as userspace boundary in
x86-64 JIT.
- Added Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
V3: https://lore.kernel.org/bpf/20240321120842.78983-1-puranjay12@gmail.com/
Changes in V4:
- Disable this feature on architectures that don't define
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
- By doing the above, we don't need anything explicitly for s390x.
V2: https://lore.kernel.org/bpf/20240321101058.68530-1-puranjay12@gmail.com/
Changes in V3:
- Return 0 from bpf_arch_uaddress_limit() in disabled case because it
returns u64.
- Modify the check in verifier to not do instrumentation when uaddress_limit
is 0.
V1: https://lore.kernel.org/bpf/20240320105436.4781-1-puranjay12@gmail.com/
Changes in V2:
- Disable this feature on s390x.
With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To
thwart invalid memory accesses, the JITs add an exception table entry for
all such accesses. But in case the src_reg + offset is a userspace address,
the BPF program might read that memory if the user has mapped it.
x86-64 JIT already instruments the BPF_PROBE_MEM based loads with checks to
skip loads from userspace addresses, but it doesn't check for the vsyscall
page, which falls in the kernel address space but is considered a userspace
page. The second patch in this series fixes the x86-64 JIT to also skip
loads from the vsyscall page. The last patch updates the bpf_testmod so
this address can be checked as part of the selftests.
Other architectures don't have the complexity of the vsyscall address and
just need to skip loads from the userspace. To make this more scalable and
robust, the verifier is updated in the first patch to instrument
BPF_PROBE_MEM to skip loads from the userspace addresses.
====================
Link: https://lore.kernel.org/r/20240424100210.11982-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The vsyscall is a legacy API for fast execution of system calls. It maps
a page at address VSYSCALL_ADDR into the userspace program. This address
is in the top 10MB of the address space:
ffffffffff600000 - ffffffffff600fff | 4 kB | legacy vsyscall ABI
The last commit fixes the x86-64 BPF JIT to skip accessing addresses in
this memory region. Add this address to bpf_testmod_return_ptr() so we
can make sure that it is fixed.
After this change and without the previous commit, subprogs_extable
selftest will crash the kernel.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240424100210.11982-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
address being loaded from is not necessarily valid. The BPF jit sets up
exception handlers for each such load which catch page faults and 0 out
the destination register.
If the address for the load is outside kernel address space, the load
will escape the exception handling and crash the kernel. To prevent this
from happening, the JIT emits some instructions to verify that addr is > end
of userspace addresses.
x86 has a legacy vsyscall ABI where a page at address 0xffffffffff600000
is mapped with user accessible permissions. The addresses in this page
are considered userspace addresses by the fault handler. Therefore, a
BPF program accessing this page will crash the kernel.
This patch fixes the runtime checks to also check that the PROBE_MEM
address is below VSYSCALL_ADDR.
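The combined range check can be expressed as a single unsigned comparison,
which is what the AFTER column of the JIT listing in this message appears to
do. The sketch below is a user-space model: the function and macro names are
hypothetical, and the two constants are copied from the movabs immediates in
that listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants taken from the JIT listing in this message. */
#define VSYSCALL_ADDR_MODEL 0xffffffffff600000ULL
#define UADDR_LIMIT_MODEL   0x0100000000000000ULL

/*
 * Returns true when a PROBE_MEM load from addr may be attempted, i.e. when
 * addr is above the userspace limit and below the vsyscall page. Subtracting
 * VSYSCALL_ADDR wraps addresses at or above it to small values, so one
 * unsigned "greater than" rejects both userspace and vsyscall addresses.
 */
static bool probe_mem_load_allowed(uint64_t addr)
{
	const uint64_t threshold = UADDR_LIMIT_MODEL - VSYSCALL_ADDR_MODEL;

	/* Matches the second movabs immediate in the AFTER column. */
	assert(threshold == 0x0100000000a00000ULL);
	return addr - VSYSCALL_ADDR_MODEL > threshold;
}
```

When the function returns false, the JIT-emitted code zeroes the destination
register instead of performing the load.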
Example BPF program:
SEC("fentry/tcp_v4_connect")
int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk)
{
*(volatile unsigned long *)&sk->sk_tsq_flags;
return 0;
}
BPF Assembly:
0: (79) r1 = *(u64 *)(r1 +0)
1: (79) r1 = *(u64 *)(r1 +344)
2: (b7) r0 = 0
3: (95) exit
x86-64 JIT
==========
BEFORE AFTER
------ -----
0: nopl 0x0(%rax,%rax,1) 0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax 5: xchg %ax,%ax
7: push %rbp 7: push %rbp
8: mov %rsp,%rbp 8: mov %rsp,%rbp
b: mov 0x0(%rdi),%rdi b: mov 0x0(%rdi),%rdi
-------------------------------------------------------------------------------
f: movabs $0x100000000000000,%r11 f: movabs $0xffffffffff600000,%r10
19: add $0x2a0,%rdi 19: mov %rdi,%r11
20: cmp %r11,%rdi 1c: add $0x2a0,%r11
23: jae 0x0000000000000029 23: sub %r10,%r11
25: xor %edi,%edi 26: movabs $0x100000000a00000,%r10
27: jmp 0x000000000000002d 30: cmp %r10,%r11
29: mov 0x0(%rdi),%rdi 33: ja 0x0000000000000039
--------------------------------\ 35: xor %edi,%edi
2d: xor %eax,%eax \ 37: jmp 0x0000000000000040
2f: leave \ 39: mov 0x2a0(%rdi),%rdi
30: ret \--------------------------------------------
40: xor %eax,%eax
42: leave
43: ret
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240424100210.11982-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The return-address (RA) register r14 is specified as volatile in the
s390x ELF ABI [1]. Nevertheless proper CFI directives must be provided
for an unwinder to restore the return address, if the RA register
value is changed from its value at function entry, as is the case here.
[1]: s390x ELF ABI, https://github.com/IBM/s390x-abi/releases
Fixes: 4bff8cb545 ("s390: convert to GENERIC_VDSO")
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Since commit 1b2ac5a6d6 ("s390/3270: use new address translation
helpers") rq->buffer is passed unconditionally to virt_to_dma32().
The 3270 driver allocates requests without a buffer, so the value passed
to virt_to_dma32() might be NULL. Check for NULL before assigning.
Fixes: 1b2ac5a6d6 ("s390/3270: use new address translation helpers")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Since thermal_debug_cdev_remove() does not run under cdev->lock, it can
run in parallel with thermal_debug_cdev_state_update() and it may free
the struct thermal_debugfs object used by the latter after it has been
checked against NULL.
If that happens, thermal_debug_cdev_state_update() will access memory
that has been freed already causing the kernel to crash.
Address this by using cdev->lock in thermal_debug_cdev_remove() around
the cdev->debugfs value check (in case the same cdev is removed at the
same time in two different threads) and its reset to NULL.
Fixes: 755113d767 ("thermal/debugfs: Add thermal cooling device debugfs information")
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Drop the flow-hash of the skb when forwarding to the L2TP netdev.
This prevents the L2TP qdisc from using the flow-hash from the outer
packet, which is identical for every flow within the tunnel.
This does not affect every platform; it is specific to the Ethernet
driver and depends on the platform including L4 information in the
flow-hash.
One such example is the Mediatek Filogic MT798x family of networking
processors.
Fixes: d9e31d17ce ("l2tp: Add L2TP ethernet pseudowire support")
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David Bauer <mail@david-bauer.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240424171110.13701-1-mail@david-bauer.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
syzbot triggered various splats (see [0] and links) by a crafted GSO
packet of VIRTIO_NET_HDR_GSO_UDP layering the following protocols:
ETH_P_8021AD + ETH_P_NSH + ETH_P_IPV6 + IPPROTO_UDP
NSH can encapsulate IPv4, IPv6, Ethernet, NSH, and MPLS. As the inner
protocol can be Ethernet, NSH GSO handler, nsh_gso_segment(), calls
skb_mac_gso_segment() to invoke inner protocol GSO handlers.
nsh_gso_segment() does the following for the original skb before
calling skb_mac_gso_segment():
1. reset skb->network_header
2. save the original skb->{mac_header,mac_len} in a local variable
3. pull the NSH header
4. reset skb->mac_header
5. set up skb->mac_len and skb->protocol for the inner protocol.
and does the following for the segmented skb
6. set ntohs(ETH_P_NSH) to skb->protocol
7. push the NSH header
8. restore skb->mac_header
9. set skb->mac_header + mac_len to skb->network_header
10. restore skb->mac_len
There are two problems in 6-7 and 8-9.
(a)
After 6 & 7, skb->data points to the NSH header, so the outer header
(ETH_P_8021AD in this case) is stripped when skb is sent out of netdev.
Also, if NSH is encapsulated by NSH + Ethernet (so NSH-Ethernet-NSH),
skb_pull() in the first nsh_gso_segment() will make skb->data point
to the middle of the outer NSH or Ethernet header because the Ethernet
header is not pulled by the second nsh_gso_segment().
(b)
While restoring skb->{mac_header,network_header} in 8 & 9,
nsh_gso_segment() does not assume that the data in the linear
buffer is shifted.
However, udp6_ufo_fragment() could shift the data and change
skb->mac_header accordingly as demonstrated by syzbot.
If this happens, even the restored skb->mac_header points to
the middle of the outer header.
It seems nsh_gso_segment() has never worked with outer headers so far.
At the end of nsh_gso_segment(), the outer header must be restored for
the segmented skb, instead of the NSH header.
To do that, let's calculate the outer header position relatively from
the inner header and set skb->{data,mac_header,protocol} properly.
[0]:
BUG: KMSAN: uninit-value in ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline]
BUG: KMSAN: uninit-value in ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
BUG: KMSAN: uninit-value in ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668
ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline]
ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668
ipvlan_start_xmit+0x5c/0x1a0 drivers/net/ipvlan/ipvlan_main.c:222
__netdev_start_xmit include/linux/netdevice.h:4989 [inline]
netdev_start_xmit include/linux/netdevice.h:5003 [inline]
xmit_one net/core/dev.c:3547 [inline]
dev_hard_start_xmit+0x244/0xa10 net/core/dev.c:3563
__dev_queue_xmit+0x33ed/0x51c0 net/core/dev.c:4351
dev_queue_xmit include/linux/netdevice.h:3171 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3819 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
__do_kmalloc_node mm/slub.c:3980 [inline]
__kmalloc_node_track_caller+0x705/0x1000 mm/slub.c:4001
kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582
__alloc_skb+0x352/0x790 net/core/skbuff.c:651
skb_segment+0x20aa/0x7080 net/core/skbuff.c:4647
udp6_ufo_fragment+0xcab/0x1150 net/ipv6/udp_offload.c:109
ipv6_gso_segment+0x14be/0x2ca0 net/ipv6/ip6_offload.c:152
skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53
nsh_gso_segment+0x6f4/0xf70 net/nsh/nsh.c:108
skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53
__skb_gso_segment+0x4b0/0x730 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x107f/0x1930 net/core/dev.c:3628
__dev_queue_xmit+0x1f28/0x51c0 net/core/dev.c:4343
dev_queue_xmit include/linux/netdevice.h:3171 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
CPU: 1 PID: 5101 Comm: syz-executor421 Not tainted 6.8.0-rc5-syzkaller-00297-gf2e367d6ad3b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Fixes: c411ed8545 ("nsh: add GSO support")
Reported-and-tested-by: syzbot+42a0dc856239de4de60e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=42a0dc856239de4de60e
Reported-and-tested-by: syzbot+c298c9f0e46a3c86332b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c298c9f0e46a3c86332b
Link: https://lore.kernel.org/netdev/20240415222041.18537-1-kuniyu@amazon.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240424023549.21862-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
With the current thermal zone locking arrangement in the debugfs code,
user space can open the "mitigations" file for a thermal zone before
the zone's debugfs pointer is set which will result in a NULL pointer
dereference in tze_seq_start().
Moreover, thermal_debug_tz_remove() is not called under the thermal
zone lock, so it can run in parallel with the other functions accessing
the thermal zone's struct thermal_debugfs object. Then, it may clear
tz->debugfs after one of those functions has checked it and the
struct thermal_debugfs object may be freed prematurely.
To address the first problem, pass a pointer to the thermal zone's
struct thermal_debugfs object to debugfs_create_file() in
thermal_debug_tz_add() and make tze_seq_start(), tze_seq_next(),
tze_seq_stop(), and tze_seq_show() retrieve it from s->private
instead of a pointer to the thermal zone object. This will ensure
that tz_debugfs will be valid across the "mitigations" file accesses
until thermal_debugfs_remove_id() called by thermal_debug_tz_remove()
removes that file.
To address the second problem, use tz->lock in thermal_debug_tz_remove()
around the tz->debugfs value check (in case the same thermal zone is
removed at the same time in two different threads) and its reset to NULL.
Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Because thermal_debug_tz_remove() does not free all memory allocated for
thermal zone diagnostics, some of that memory becomes unreachable after
freeing the thermal zone's struct thermal_debugfs object.
Address this by making thermal_debug_tz_remove() free all of the memory
in question.
Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Add PCI subdevice ID for the Intel D5005 Stratix 10 FPGA card as
used with the Open FPGA Stack (OFS) FPGA Interface Manager (FIM).
Unlike the Intel D5005 PAC FIM which exposed a separate PCI device ID,
the OFS FIM reuses the same device ID for all DFL-based FPGA cards
and differentiates on the subdevice ID. The subdevice ID values were
chosen as the numeric part of the FPGA card names in hexadecimal.
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Acked-by: Xu Yilun <yilun.xu@intel.com>
Link: https://lore.kernel.org/r/20240422230257.1959-1-peter.colberg@intel.com
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Bui Quang Minh says:
====================
Ensure the copied buf is NUL terminated (part)
I found that some drivers contain an out-of-bound read pattern like this:
kern_buf = memdup_user(user_buf, count);
...
sscanf(kern_buf, ...);
The sscanf() here stands for any string-related function. Because
memdup_user() does not NUL-terminate the copy, this pattern can lead to
an out-of-bound read of kern_buf in such functions.
This series fixes the above issue by replacing memdup_user with
memdup_user_nul.
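The difference between the two helpers can be illustrated in userspace. This is a minimal sketch of the idea only, assuming plain malloc()/memcpy() in place of the kernel's user-copy routines; sketch_memdup_nul() and sketch_parse_int() are hypothetical names, not kernel APIs:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for the idea behind memdup_user_nul(): copy exactly
 * `count` bytes and append a terminating NUL. Not the kernel code.
 */
static inline char *sketch_memdup_nul(const void *src, size_t count)
{
	char *buf = malloc(count + 1);

	if (!buf)
		return NULL;
	memcpy(buf, src, count);
	buf[count] = '\0';	/* string functions now stop inside the buffer */
	return buf;
}

/* Parse an integer from an input of `count` bytes that may lack a NUL. */
static inline int sketch_parse_int(const char *src, size_t count, int *out)
{
	char *buf = sketch_memdup_nul(src, count);
	int ret;

	if (!buf)
		return -1;
	ret = (sscanf(buf, "%d", out) == 1) ? 0 : -1;
	free(buf);
	return ret;
}
```

Without the extra terminating byte, sscanf() would keep reading past the end of the copy whenever the caller's data does not happen to contain a NUL.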
v1: https://lore.kernel.org/r/20240422-fix-oob-read-v1-0-e02854c30174@gmail.com
====================
Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-0-f1f1b53a10f4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We try to access count + 1 bytes from userspace with memdup_user(buffer,
count + 1). However, userspace only provides a buffer of count bytes, and
only these count bytes are verified to be okay to access. To ensure the
copied buffer is NUL terminated, we use memdup_user_nul instead.
Fixes: 3a2eb515d1 ("octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-6-f1f1b53a10f4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, we allocate an nbytes-sized kernel buffer and copy nbytes from
userspace to that buffer. Later, we use sscanf on this buffer, but we don't
ensure that the string is terminated inside the buffer; this can lead to
an OOB read when using sscanf. Fix this issue by using memdup_user_nul
instead of memdup_user.
Fixes: 7afc5dbde0 ("bna: Add debugfs interface.")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-2-f1f1b53a10f4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, we allocate a count-sized kernel buffer and copy count bytes
from userspace to that buffer. Later, we use sscanf on this buffer, but we
don't ensure that the string is terminated inside the buffer; this can lead
to an OOB read when using sscanf. Fix this issue by using memdup_user_nul
instead of memdup_user.
Fixes: 96a9a9341c ("ice: configure FW logging")
Fixes: 73671c3162 ("ice: enable FW logging")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-1-f1f1b53a10f4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ensure that args.acl is initialized early. It is used in an
unconditional call to kfree() on the way out of
nfsd4_encode_fattr4().
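The pattern can be sketched in userspace as follows. This is an illustration under stated assumptions, not the NFSD code: struct sketch_args and sketch_encode() are hypothetical, and free() stands in for kfree():

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_args {
	void *acl;	/* freed unconditionally on the way out */
};

/* Returns 0 on success, -1 on an early failure. */
static inline int sketch_encode(int fail_early)
{
	struct sketch_args args;
	int status = -1;

	args.acl = NULL;	/* initialise before any path can reach "out" */

	if (fail_early)
		goto out;

	args.acl = malloc(16);
	if (!args.acl)
		goto out;
	status = 0;
out:
	free(args.acl);		/* free(NULL) is a no-op, as is kfree(NULL) */
	return status;
}
```

If the assignment to NULL came after the first early exit, the unconditional free at "out" would operate on an uninitialized pointer.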
Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 83ab8678ad ("NFSD: Add struct nfsd4_fattr_args")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The function inject_data_ue_store() lacks a NULL check for the
user-passed values. Add a NULL check to prevent the kernel crash below.
Call trace:
kstrtoull
kstrtou8
inject_data_ue_store
full_proxy_write
vfs_write
ksys_write
__arm64_sys_write
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync
Fixes: 83bf24051a ("EDAC/versal: Make the bit position of injected errors configurable")
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240425121942.26378-3-shubhrajyoti.datta@amd.com
One of my CI runs popped the following lockdep splat
======================================================
WARNING: possible circular locking dependency detected
6.9.0-rc4+ #1 Not tainted
------------------------------------------------------
btrfs/471533 is trying to acquire lock:
ffff92ba46980850 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_quota_disable+0x54/0x4c0
but task is already holding lock:
ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&fs_info->subvol_sem){++++}-{3:3}:
down_read+0x42/0x170
btrfs_rename+0x607/0xb00
btrfs_rename2+0x2e/0x70
vfs_rename+0xaf8/0xfc0
do_renameat2+0x586/0x600
__x64_sys_rename+0x43/0x50
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&sb->s_type->i_mutex_key#16){++++}-{3:3}:
down_write+0x3f/0xc0
btrfs_inode_lock+0x40/0x70
prealloc_file_extent_cluster+0x1b0/0x370
relocate_file_extent_cluster+0xb2/0x720
relocate_data_extent+0x107/0x160
relocate_block_group+0x442/0x550
btrfs_relocate_block_group+0x2cb/0x4b0
btrfs_relocate_chunk+0x50/0x1b0
btrfs_balance+0x92f/0x13d0
btrfs_ioctl+0x1abf/0x2600
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (&fs_info->cleaner_mutex){+.+.}-{3:3}:
__lock_acquire+0x13e7/0x2180
lock_acquire+0xcb/0x2e0
__mutex_lock+0xbe/0xc00
btrfs_quota_disable+0x54/0x4c0
btrfs_ioctl+0x206b/0x2600
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
other info that might help us debug this:
Chain exists of:
&fs_info->cleaner_mutex --> &sb->s_type->i_mutex_key#16 --> &fs_info->subvol_sem
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->subvol_sem);
                               lock(&sb->s_type->i_mutex_key#16);
                               lock(&fs_info->subvol_sem);
  lock(&fs_info->cleaner_mutex);
*** DEADLOCK ***
2 locks held by btrfs/471533:
#0: ffff92ba4319e420 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0x3b5/0x2600
#1: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600
stack backtrace:
CPU: 1 PID: 471533 Comm: btrfs Kdump: loaded Not tainted 6.9.0-rc4+ #1
Call Trace:
<TASK>
dump_stack_lvl+0x77/0xb0
check_noncircular+0x148/0x160
? lock_acquire+0xcb/0x2e0
__lock_acquire+0x13e7/0x2180
lock_acquire+0xcb/0x2e0
? btrfs_quota_disable+0x54/0x4c0
? lock_is_held_type+0x9a/0x110
__mutex_lock+0xbe/0xc00
? btrfs_quota_disable+0x54/0x4c0
? srso_return_thunk+0x5/0x5f
? lock_acquire+0xcb/0x2e0
? btrfs_quota_disable+0x54/0x4c0
? btrfs_quota_disable+0x54/0x4c0
btrfs_quota_disable+0x54/0x4c0
btrfs_ioctl+0x206b/0x2600
? srso_return_thunk+0x5/0x5f
? __do_sys_statfs+0x61/0x70
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
? srso_return_thunk+0x5/0x5f
? reacquire_held_locks+0xd1/0x1f0
? do_user_addr_fault+0x307/0x8a0
? srso_return_thunk+0x5/0x5f
? lock_acquire+0xcb/0x2e0
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? find_held_lock+0x2b/0x80
? srso_return_thunk+0x5/0x5f
? lock_release+0xca/0x2a0
? srso_return_thunk+0x5/0x5f
? do_user_addr_fault+0x35c/0x8a0
? srso_return_thunk+0x5/0x5f
? trace_hardirqs_off+0x4b/0xc0
? srso_return_thunk+0x5/0x5f
? lockdep_hardirqs_on_prepare+0xde/0x190
? srso_return_thunk+0x5/0x5f
This happens because when we call rename we already have the inode mutex
held, and then we acquire the subvol_sem if we are a subvolume. This
makes the dependency
inode lock -> subvol sem
When we're running data relocation we will preallocate space for the
data relocation inode, and we always run the relocation under the
->cleaner_mutex. This now creates the dependency of
cleaner_mutex -> inode lock (from the prealloc) -> subvol_sem
Qgroup delete does this in the opposite order: it acquires the
subvol_sem and then the cleaner_mutex, which results in this lockdep
splat. This deadlock can't happen in reality, because we won't ever
rename the data reloc inode, nor is the data reloc inode a subvolume.
However, this is fairly easy to fix: simply take the cleaner mutex
before the subvol_sem in the case where we are disabling qgroups. This
resolves the lockdep splat.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
intel-pinctrl for v6.9-1
* Correct GPIO selection and add UART3 pins for Intel Bay Trail
The following is an automated git shortlog grouped by driver:
baytrail:
- Add pinconf group for uart3
- Fix selecting gpio pinctrl state
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Assert that accesses to a non-existent vgic-v2 CPU interface
consistently fail across the various KVM device attr ioctls. This also
serves as a regression test for a bug wherein KVM hits a NULL
dereference when the CPUID specified in the ioctl is invalid.
Note that there is no need to print the observed errno, as TEST_ASSERT()
will take care of it.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240424173959.3776798-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
vgic_v2_parse_attr() is responsible for finding the vCPU that matches
the user-provided CPUID, which (of course) may not be valid. If the ID
is invalid, kvm_get_vcpu_by_id() returns NULL, which isn't handled
gracefully.
Similar to the GICv3 uaccess flow, check that kvm_get_vcpu_by_id()
actually returns something and fail the ioctl if not.
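A minimal userspace sketch of the check, assuming a hypothetical lookup helper that returns NULL for an unknown ID and assuming -EINVAL as the error code; none of these names are the KVM functions themselves:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_vcpu { int id; };

static struct sketch_vcpu sketch_vcpus[] = { { 0 }, { 1 } };

/* Stand-in for a by-ID lookup that returns NULL when no vCPU matches. */
static inline struct sketch_vcpu *sketch_get_vcpu_by_id(int id)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_vcpus) / sizeof(sketch_vcpus[0]); i++)
		if (sketch_vcpus[i].id == id)
			return &sketch_vcpus[i];
	return NULL;
}

/* Fail the request instead of dereferencing a NULL vCPU pointer. */
static inline int sketch_parse_attr(int cpuid, struct sketch_vcpu **out)
{
	struct sketch_vcpu *vcpu = sketch_get_vcpu_by_id(cpuid);

	if (!vcpu)
		return -EINVAL;
	*out = vcpu;
	return 0;
}
```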
Cc: stable@vger.kernel.org
Fixes: 7d450e2821 ("KVM: arm/arm64: vgic-new: Add userland access to VGIC dist registers")
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240424173959.3776798-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Commit 50e782a86c ("efi/unaccepted: Fix soft lockups caused by
parallel memory acceptance") released the spinlock so that other CPUs
can do memory acceptance in parallel without triggering a softlockup
on those CPUs.
However, the softlockup still showed up intermittently when the memory
of the TD guest is large and the softlockup timeout is set to 1 second:
RIP: 0010:_raw_spin_unlock_irqrestore
Call Trace:
? __hrtimer_run_queues
<IRQ>
? hrtimer_interrupt
? watchdog_timer_fn
? __sysvec_apic_timer_interrupt
? __pfx_watchdog_timer_fn
? sysvec_apic_timer_interrupt
</IRQ>
? __hrtimer_run_queues
<TASK>
? hrtimer_interrupt
? asm_sysvec_apic_timer_interrupt
? _raw_spin_unlock_irqrestore
? __sysvec_apic_timer_interrupt
? sysvec_apic_timer_interrupt
accept_memory
try_to_accept_memory
do_huge_pmd_anonymous_page
get_page_from_freelist
__handle_mm_fault
__alloc_pages
__folio_alloc
? __tdx_hypercall
handle_mm_fault
vma_alloc_folio
do_user_addr_fault
do_huge_pmd_anonymous_page
exc_page_fault
? __do_huge_pmd_anonymous_page
asm_exc_page_fault
__handle_mm_fault
When local irqs are enabled at the end of accept_memory(), the
softlockup detector finds that the watchdog on a single CPU has not been
fed for a while. That is to say, even though other CPUs are no longer
blocked by the spinlock, the current CPU might be stuck with local irqs
disabled for a while, which hurts not only the NMI watchdog but also the
softlockup detector.
Chao Gao pointed out that memory acceptance can be time-consuming and
that there was a similar report before. Thus, to avoid any softlockup
detection during this stage, tell the softlockup detector to skip the
timeout check at the end of accept_memory() by invoking
touch_softlockup_watchdog().
Reported-by: Hossain, Md Iqbal <md.iqbal.hossain@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 50e782a86c ("efi/unaccepted: Fix soft lockups caused by parallel memory acceptance")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Jonathan writes:
IIO: 1st set of fixes for the 6.9 cycle.
adi,adis16475
- Write the correct field in the register when setting the sync mode.
bosch,bmp280
- Wrong chip specific data being used for the bme280 in the SPI driver.
- Fix that we can't use chip IDs because Bosch reuses them for incompatible
devices (some require a padding byte, others don't).
maxim,max30102 (dt binding)
- Fix incorrect property check to actually match on a device from the
binding rather than a completely different one due to a typo.
memsic,mxc4005
- Fix wrong masking of interrupt register accidentally disabling temperature
compensation. Also hammer initial state to 0 as it's not documented
if interrupts are masked after reset.
- Explicit reset on probe() and resume() as some devices do not power up
correctly without a reset.
* tag 'iio-fixes-for-6.9a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio:imu: adis16475: Fix sync mode setting
iio: accel: mxc4005: Reset chip on probe() and resume()
iio: accel: mxc4005: Interrupt handling fixes
dt-bindings: iio: health: maxim,max30102: fix compatible check
iio: pressure: Fixes SPI support for BMP3xx devices
iio: pressure: Fixes BME280 SPI driver data
The default nna (node_nr_active) is used when the pool isn't tied to a
specific NUMA node. This can happen in the following cases:
1. On NUMA, if per-node pwq init fails and the fallback pwq is used.
2. On NUMA, if a pool is configured to span multiple nodes.
3. On single node setups.
5797b1c189 ("workqueue: Implement system-wide nr_active enforcement for
unbound workqueues") set the default nna->max to min_active because only #1
was being considered. For #2 and #3, using min_active means that the max
concurrency in normal operation is pushed down to min_active which is
currently 8, which can obviously lead to performance issues.
In case #1, the exact value nna->max is set to doesn't really matter. #2 can only happen if
the workqueue is intentionally configured to ignore NUMA boundaries and
there's no good way to distribute max_active in this case. #3 is the default
behavior on single node machines.
Let's set the default nna->max to max_active. This fixes the artificially
lowered concurrency problem on single node machines and shouldn't hurt
anything for other cases.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 5797b1c189 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
Link: https://lore.kernel.org/dm-devel/20240410084531.2134621-1-shinichiro.kawasaki@wdc.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
The devm_regulator_get_enable_optional() should be a 'call and forget'
API, meaning, when it is used to enable the regulators, the API does not
provide a handle to do any further control of the regulators. It gives
no real benefit to return an error from the stub if CONFIG_REGULATOR is
not set.
On the contrary, returning an error causes problems for drivers when
the hardware works just fine with no regulator control.
Returning an error forces drivers to specifically handle the case where
CONFIG_REGULATOR is not set, making the mere existence of the stub
questionable.
Change the stub implementation for the
devm_regulator_get_enable_optional() to return 0 (success) so that
drivers do not need to separately handle the case where CONFIG_REGULATOR
is not set.
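The effect on a driver can be sketched in userspace. This is an illustration only: SKETCH_CONFIG_REGULATOR, struct sketch_device, the "vdd" supply name and both functions are hypothetical stand-ins for the kernel configuration option, struct device and the real helper:

```c
#include <assert.h>
#include <stddef.h>

struct sketch_device;	/* opaque placeholder for struct device */

#ifdef SKETCH_CONFIG_REGULATOR
int sketch_regulator_get_enable_optional(struct sketch_device *dev,
					 const char *id);
#else
/* Stub used when regulator support is compiled out: report success. */
static inline int
sketch_regulator_get_enable_optional(struct sketch_device *dev, const char *id)
{
	(void)dev;
	(void)id;
	return 0;
}
#endif

/* A probe path that needs no special case for the compiled-out stub. */
static inline int sketch_probe(struct sketch_device *dev)
{
	int ret = sketch_regulator_get_enable_optional(dev, "vdd");

	if (ret)
		return ret;
	return 0;
}
```

With a stub that returned an error, sketch_probe() would fail on every build without regulator support unless it filtered out that specific error code.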
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Fixes: da279e6965 ("regulator: Add devm helpers for get and enable")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/ZiedtOE00Zozd3XO@fedora
Signed-off-by: Mark Brown <broonie@kernel.org>
GUSB3PIPECTL.SUSPENDENABLE and GUSB2PHYCFG.SUSPHY should be cleared
during initialization. Suspend during initialization can result in
undefined behavior due to clock synchronization failure, which is often
seen as a core soft reset timeout.
The programming guide recommends that these bits be cleared during
initialization for DWC_usb3.0 version 1.94 and above (along with
DWC_usb31 and DWC_usb32). The current check in the driver does not
account for the bits being set by default by coreConsultant.
This is especially the case for DRD when switching mode to ensure the
phy clocks are available to change mode. Depending on the
platforms/design, some may be affected more than others. This is noted
in the DWC_usb3x programming guide under the above registers.
Let's just disable them during driver load and mode switching. Restore
them when the controller initialization completes.
Note that some platforms work around this issue by disabling phy suspend
through "snps,dis_u3_susphy_quirk" and "snps,dis_u2_susphy_quirk" when
they should not need to.
Cc: stable@vger.kernel.org
Fixes: 9ba3aca8fe ("usb: dwc3: Disable phy suspend after power-on reset")
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/20da4e5a0c4678c9587d3da23f83bdd6d77353e9.1713394973.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit fixes uvc gadget support on 32-bit platforms.
Commit 0df28607c5 ("usb: gadget: uvc: Generalise helper functions for
reuse") introduced a helper function __uvcg_iter_item_entries() to aid
with parsing lists of items on configfs attributes stores. This function
is a generalization of another very similar function, which used a
stack-allocated temporary buffer of fixed size for each item in the list
and used the sizeof() operator to check for potential buffer overruns.
The new function was changed to allocate the now variably sized temp
buffer on heap, but wasn't properly updated to also check for max buffer
size using the computed size instead of sizeof() operator.
As a result, the maximum item size was 7 (plus null terminator) on
64-bit platforms, and 3 on 32-bit ones. While 7 is accidentally just
barely enough, 3 is definitely too small for some of UVC configfs
attributes. For example, dwFrameInterval, specified in 100ns units,
usually has 6-digit item values, e.g. 166666 for 60fps.
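The underlying C pitfall can be shown in isolation. In this minimal sketch (hypothetical function names, not the configfs code), sizeof() applied to a heap pointer yields the pointer size rather than the size of the allocation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Buggy bound: sizeof(buf) is the size of a pointer, not of the buffer. */
static inline bool sketch_fits_buggy(const char *item, size_t bufsize)
{
	char *buf = malloc(bufsize);
	bool fits;

	if (!buf)
		return false;
	fits = strlen(item) < sizeof(buf);	/* 8 on 64-bit, 4 on 32-bit */
	free(buf);
	return fits;
}

/* Fixed bound: compare against the computed allocation size. */
static inline bool sketch_fits_fixed(const char *item, size_t bufsize)
{
	return strlen(item) < bufsize;
}
```

A 6-digit item such as "166666" passes the buggy check only where pointers are 8 bytes wide, which matches the 64-bit versus 32-bit behavior described above.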
Cc: stable@vger.kernel.org
Fixes: 0df28607c5 ("usb: gadget: uvc: Generalise helper functions for reuse")
Signed-off-by: Ivan Avdeev <me@provod.works>
Link: https://lore.kernel.org/r/20240413150124.1062026-1-me@provod.works
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The OS descriptors logic had the high/low byte of w_value inverted, causing
the extended properties to not be accessible for interface != 0.
From the Microsoft documentation:
https://learn.microsoft.com/en-us/windows-hardware/drivers/usbcon/microsoft-os-1-0-descriptors-specification
OS_Desc_CompatID.doc (w_index = 0x4):
- wValue:
High Byte = InterfaceNumber. InterfaceNumber is set to the number of the
interface or function that is associated with the descriptor, typically
0x00. Because a device can have only one extended compat ID descriptor,
it should ignore InterfaceNumber, regardless of the value, and simply
return the descriptor.
Low Byte = 0. PageNumber is used to retrieve descriptors that are larger
than 64 KB. The header section is 16 bytes, so PageNumber is set to 0 for
this request.
We currently do not support >64KB compat ID descriptors, so verify that the
low byte is 0.
OS_Desc_Ext_Prop.doc (w_index = 0x5):
- wValue:
High byte = InterfaceNumber. The high byte of wValue is set to the number
of the interface or function that is associated with the descriptor.
Low byte = PageNumber. The low byte of wValue is used to retrieve
descriptors that are larger than 64 KB. The header section is 10 bytes, so
PageNumber is set to 0 for this request.
We also don't support >64KB extended properties, so verify that the low byte
is 0 and use the high byte for the interface number.
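The byte layout quoted above can be sketched as follows. This is a minimal illustration with hypothetical helper names; the -1 return value for an unsupported page is an assumption of the sketch, not the gadget code's error handling:

```c
#include <assert.h>
#include <stdint.h>

/* High byte of wValue: interface number. */
static inline uint8_t sketch_os_desc_interface(uint16_t w_value)
{
	return (uint8_t)(w_value >> 8);
}

/* Low byte of wValue: page number. */
static inline uint8_t sketch_os_desc_page(uint16_t w_value)
{
	return (uint8_t)(w_value & 0xff);
}

/* Returns the interface number, or -1 if a non-zero page is requested. */
static inline int sketch_ext_prop_interface(uint16_t w_value)
{
	if (sketch_os_desc_page(w_value) != 0)
		return -1;	/* descriptors larger than 64 KB unsupported */
	return sketch_os_desc_interface(w_value);
}
```

For example, a wValue of 0x0200 selects interface 2 with page 0. Swapping the two bytes would read it as interface 0, which is why only interface 0 worked before the fix.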
Fixes: 37a3a53342 ("usb: gadget: OS Feature Descriptors support")
Cc: stable <stable@kernel.org>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Link: https://lore.kernel.org/r/20240404100635.3215340-1-peter@korsgaard.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
FFS based applications can utilize the aio_cancel() callback to dequeue
pending USB requests submitted to the UDC. There is a scenario where the
FFS application issues an AIO cancel call, while the UDC is handling a
soft disconnect. For a DWC3 based implementation, the callstack looks
like the following:
DWC3 Gadget                                FFS Application
dwc3_gadget_soft_disconnect()              ...
  --> dwc3_stop_active_transfers()
    --> dwc3_gadget_giveback(-ESHUTDOWN)
      --> ffs_epfile_async_io_complete()   ffs_aio_cancel()
        --> usb_ep_free_request()            --> usb_ep_dequeue()
There is currently no locking implemented between the AIO completion
handler and AIO cancel, so the issue occurs if the completion routine is
running in parallel to an AIO cancel call coming from the FFS application.
The completion call frees the USB request (io_data->req) while the FFS
application is still referencing it for the usb_ep_dequeue() call. This can
lead to accessing a stale/dangling pointer.
Commit b566d38857 ("usb: gadget: f_fs: use io_data->status consistently")
relocated the usb_ep_free_request() into ffs_epfile_async_io_complete().
However, in order to properly implement locking to mitigate this issue, the
spinlock can't be added to ffs_epfile_async_io_complete(), as
usb_ep_dequeue() (if successfully dequeuing a USB request) will call the
function driver's completion handler in the same context, leading to a
deadlock.
Fix this issue by moving the usb_ep_free_request() back to
ffs_user_copy_worker(), and ensuring that it explicitly sets io_data->req
to NULL after freeing it within the ffs->eps_lock. This resolves the race
condition above: ffs_aio_cancel() will not attempt to dequeue a request
that has already been freed, and ffs_user_copy_worker() will not free the
USB request until the AIO cancel is done referencing it.
This fix depends on
commit b566d38857 ("usb: gadget: f_fs: use io_data->status
consistently")
Fixes: 2e4c7553cd ("usb: gadget: f_fs: add aio support")
Cc: stable <stable@kernel.org> # b566d38857 ("usb: gadget: f_fs: use io_data->status consistently")
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
Link: https://lore.kernel.org/r/20240409014059.6740-1-quic_wcheng@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With cpu_possible_mask=0-63 and cpu_online_mask=0-7 the following
kernel oops was observed:
smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 8 CPUs
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000803
[..]
Call Trace:
arch_vcpu_is_preempted+0x12/0x80
select_idle_sibling+0x42/0x560
select_task_rq_fair+0x29a/0x3b0
try_to_wake_up+0x38e/0x6e0
kick_pool+0xa4/0x198
__queue_work.part.0+0x2bc/0x3a8
call_timer_fn+0x36/0x160
__run_timers+0x1e2/0x328
__run_timer_base+0x5a/0x88
run_timer_softirq+0x40/0x78
__do_softirq+0x118/0x388
irq_exit_rcu+0xc0/0xd8
do_ext_irq+0xae/0x168
ext_int_handler+0xbe/0xf0
psw_idle_exit+0x0/0xc
default_idle_call+0x3c/0x110
do_idle+0xd4/0x158
cpu_startup_entry+0x40/0x48
rest_init+0xc6/0xc8
start_kernel+0x3c4/0x5e0
startup_continue+0x3c/0x50
The crash is caused by calling arch_vcpu_is_preempted() for an offline
CPU. To avoid this, select the CPU with cpumask_any_and_distribute(),
masking __pod_cpumask with cpu_online_mask. If no CPU is left in the
pool, skip the assignment.
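The selection rule can be sketched with plain 64-bit masks. This is a toy
model under stated assumptions: pick_online_cpu() is a hypothetical name,
and it returns the lowest matching CPU, whereas the kernel's
cpumask_any_and_distribute() spreads its picks across the candidates:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: bit n set means CPU n is in the mask. Returns a CPU present
 * in both masks, or -1 if the intersection is empty, in which case the
 * caller skips the assignment.
 */
static int pick_online_cpu(uint64_t pod_mask, uint64_t online_mask)
{
	uint64_t candidates = pod_mask & online_mask;
	int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

With cpu_possible_mask=0-63 and cpu_online_mask=0-7, a pod covering only
CPUs 8-15 yields -1 here instead of an offline CPU.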
tj: This doesn't fully fix the bug as CPUs can still go down between picking
the target CPU and the wake call. Fixing that likely requires adding
cpu_online() test to either the sched or s390 arch code. However, regardless
of how that is fixed, workqueue shouldn't be picking a CPU which isn't
online as that would result in unpredictable and worse behavior.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Fixes: 8639ecebc9 ("workqueue: Implement non-strict affinity scope for unbound workqueues")
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Tejun Heo <tj@kernel.org>
At the time of LPAR boot up, partition firmware provides Open Firmware
property ibm,dma-window for the PE. This property is provided on the PCI
bus the PE is attached to.
There are exceptions where the partition firmware might not provide this
property for the PE at the time of LPAR boot up. One such scenario is
where the firmware has frozen the PE due to some error condition. The
PE stays frozen for 24 hours or until the whole system is reinitialized.
Within this time frame, if the LPAR is booted, the frozen PE will be
presented to the LPAR but the ibm,dma-window property could be missing.
Today, under these circumstances, the LPAR oopses with NULL pointer
dereference, when configuring the PCI bus the PE is attached to.
BUG: Kernel NULL pointer dereference on read at 0x000000c8
Faulting instruction address: 0xc0000000001024c0
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
NIP: c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450
REGS: c0000000037db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default)
MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000822 XER: 00000000
CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0
...
NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
Call Trace:
pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
pcibios_setup_bus_self+0x1c0/0x370
__of_scan_bus+0x2f8/0x330
pcibios_scan_phb+0x280/0x3d0
pcibios_init+0x88/0x12c
do_one_initcall+0x60/0x320
kernel_init_freeable+0x344/0x3e4
kernel_init+0x34/0x1d0
ret_from_kernel_user_thread+0x14/0x1c
Fixes: b1fc44eaa9 ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422205141.10662-1-gbatra@linux.ibm.com
The devm_regulator_get_enable() should be a 'call and forget' API,
meaning, when it is used to enable the regulators, the API does not
provide a handle to do any further control of the regulators. It gives
no real benefit to return an error from the stub if CONFIG_REGULATOR is
not set.
On the contrary, returning an error causes problems for drivers when the
hardware works out just fine with no regulator control.
Returning an error forces drivers to specifically handle the case where
CONFIG_REGULATOR is not set, making the mere existence of the stub
questionable. Furthermore, the stub of regulator_enable() seems to
be returning Ok.
Change the stub implementation for the devm_regulator_get_enable() to
return Ok so drivers do not separately handle the case where the
CONFIG_REGULATOR is not set.
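A sketch of the stub pattern, assuming a hypothetical configuration macro
and function name (HYPOTHETICAL_CONFIG_REGULATOR, get_enable_model); it is
not the header from include/linux/regulator/consumer.h:

```c
#include <assert.h>
#include <stddef.h>

struct device;	/* opaque here; only a pointer is passed around */

#ifdef HYPOTHETICAL_CONFIG_REGULATOR
/* With regulator support built in, a real implementation is linked. */
int get_enable_model(struct device *dev, const char *id);
#else
/* Stub: there is nothing to control, so report success. */
static inline int get_enable_model(struct device *dev, const char *id)
{
	(void)dev;
	(void)id;
	return 0;
}
#endif
```

A driver can then call the function unconditionally and treat any non-zero
return as a genuine failure.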
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Reported-by: Aleksander Mazur <deweloper@wp.pl>
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: da279e6965 ("regulator: Add devm helpers for get and enable")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/ZiYF6d1V1vSPcsJS@drtxq0yyyyyyyyyyyyyby-3.rev.dnainternet.fi
Signed-off-by: Mark Brown <broonie@kernel.org>
Static checkers complain that the silicon_uid variable passed by
pointer to cs35l56_read_silicon_uid() could later be used
uninitialised when calling cs_amp_get_efi_calibration_data().
cs35l56_read_silicon_uid() must have succeeded to call
cs_amp_get_efi_calibration_data() and that would have populated the
variable.
However, initialise the value so we are not haunted by it forevermore.
Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com>
Fixes: e1830f66f6 ("ASoC: cs35l56: Add helper functions for amp calibration")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20240422103211.236063-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull Qualcomm clk driver fixes from Bjorn Andersson:
The introduction of support for the external VDD_GFX supply in SA8295P
ADP unearthed a lockdep problem in the GDSC code w.r.t regulator
supplies. Make the regulator optional, to avoid creating a dummy
regulator, on those boards that doesn't use this. While not solving the
root cause of the problem, it reduces the impact of the lockdep warning
- and it avoids wasting resources.
Refactoring of the RPM clock driver accidentally removed num_clks from
msm8976. Reintroduce this to get the clocks back.
* tag 'qcom-clk-fixes-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
clk: qcom: smd-rpm: Restore msm8976 num_clk
clk: qcom: gdsc: treat optional supplies as optional
When redirecting a packet using XDP, the bpf_redirect_map() helper will set
up the redirect destination information in struct bpf_redirect_info (using
the __bpf_xdp_redirect_map() helper function), and the xdp_do_redirect()
function will read this information after the XDP program returns and pass
the frame on to the right redirect destination.
When using the BPF_F_BROADCAST flag to do multicast redirect to a whole
map, __bpf_xdp_redirect_map() sets the 'map' pointer in struct
bpf_redirect_info to point to the destination map to be broadcast. And
xdp_do_redirect() reacts to the value of this map pointer to decide whether
it's dealing with a broadcast or a single-value redirect. However, if the
destination map is being destroyed before xdp_do_redirect() is called, the
map pointer will be cleared out (by bpf_clear_redirect_map()) without
waiting for any XDP programs to stop running. This causes xdp_do_redirect()
to think that the redirect was to a single target, but the target pointer
is also NULL (since broadcast redirects don't have a single target), so
this causes a crash when a NULL pointer is passed to dev_map_enqueue().
To fix this, change xdp_do_redirect() to react directly to the presence of
the BPF_F_BROADCAST flag in the 'flags' value in struct bpf_redirect_info
to disambiguate between a single-target and a broadcast redirect. And only
read the 'map' pointer if the broadcast flag is set, aborting if that has
been cleared out in the meantime. This prevents the crash, while keeping
the atomic (cmpxchg-based) clearing of the map pointer itself, and without
adding any more checks in the non-broadcast fast path.
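The decision logic can be sketched as follows. The struct is a trimmed,
hypothetical model of struct bpf_redirect_info, and MODEL_F_BROADCAST
stands in for BPF_F_BROADCAST; the real code also has to read the map
pointer in a way that tolerates concurrent clearing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_F_BROADCAST 0x1	/* stands in for BPF_F_BROADCAST */

/* Hypothetical, trimmed-down model of the redirect state. */
struct redirect_info_model {
	uint32_t flags;
	void *map;	/* may be cleared when the map is destroyed */
};

enum redirect_action { DO_SINGLE, DO_BROADCAST, DO_ABORT };

static enum redirect_action
classify_redirect(const struct redirect_info_model *ri)
{
	assert(ri != NULL);
	if (ri->flags & MODEL_F_BROADCAST) {
		/* Only the broadcast path looks at the map pointer. */
		return ri->map ? DO_BROADCAST : DO_ABORT;
	}
	/* Non-broadcast fast path: no extra check on the map pointer. */
	return DO_SINGLE;
}
```

A broadcast redirect whose map was cleared in the meantime is classified
as an abort rather than being mistaken for a single-target redirect.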
Fixes: e624d4ed4a ("xdp: Extend xdp_redirect_map with broadcast support")
Reported-and-tested-by: syzbot+af9492708df9797198d6@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20240418071840.156411-1-toke@redhat.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The current implementation of the mov instruction with sign extension has the
following problems:
1. It clobbers the source register if it is not stacked because it
sign extends the source and then moves it to the destination.
2. If the dst_reg is stacked, the current code doesn't write the value
back in case of 64-bit mov.
3. There is room for improvement by emitting fewer instructions.
The steps for fixing this and the instructions emitted by the JIT are explained
below with examples in all combinations:
Case A: offset == 32:
=====================
Case A.1: src and dst are stacked registers:
--------------------------------------------
1. Load src_lo into tmp_lo
2. Store tmp_lo into dst_lo
3. Sign extend tmp_lo into tmp_hi
4. Store tmp_hi to dst_hi
Example: r3 = (s32)r3
r3 is a stacked register
ldr r6, [r11, #-16] // Load r3_lo into tmp_lo
// str to dst_lo is not emitted because src_lo == dst_lo
asr r7, r6, #31 // Sign extend tmp_lo into tmp_hi
str r7, [r11, #-12] // Store tmp_hi into r3_hi
Case A.2: src is stacked but dst is not:
----------------------------------------
1. Load src_lo into dst_lo
2. Sign extend dst_lo into dst_hi
Example: r6 = (s32)r3
r6 maps to {ARM_R5, ARM_R4} and r3 is stacked
ldr r4, [r11, #-16] // Load r3_lo into r6_lo
asr r5, r4, #31 // Sign extend r6_lo into r6_hi
Case A.3: src is not stacked but dst is stacked:
------------------------------------------------
1. Store src_lo into dst_lo
2. Sign extend src_lo into tmp_hi
3. Store tmp_hi to dst_hi
Example: r3 = (s32)r6
r3 is stacked and r6 maps to {ARM_R5, ARM_R4}
str r4, [r11, #-16] // Store r6_lo to r3_lo
asr r7, r4, #31 // Sign extend r6_lo into tmp_hi
str r7, [r11, #-12] // Store tmp_hi to r3_hi
Case A.4: Both src and dst are not stacked:
-------------------------------------------
1. Mov src_lo into dst_lo
2. Sign extend src_lo into dst_hi
Example: (bf) r6 = (s32)r6
r6 maps to {ARM_R5, ARM_R4}
// Mov not emitted because dst == src
asr r5, r4, #31 // Sign extend r6_lo into r6_hi
Case B: offset != 32:
=====================
Case B.1: src and dst are stacked registers:
--------------------------------------------
1. Load src_lo into tmp_lo
2. Sign extend tmp_lo according to offset.
3. Store tmp_lo into dst_lo
4. Sign extend tmp_lo into tmp_hi
5. Store tmp_hi to dst_hi
Example: r9 = (s8)r3
r9 and r3 are both stacked registers
ldr r6, [r11, #-16] // Load r3_lo into tmp_lo
lsl r6, r6, #24 // Sign extend tmp_lo
asr r6, r6, #24 // ..
str r6, [r11, #-56] // Store tmp_lo to r9_lo
asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi
str r7, [r11, #-52] // Store tmp_hi to r9_hi
Case B.2: src is stacked but dst is not:
----------------------------------------
1. Load src_lo into dst_lo
2. Sign extend dst_lo according to offset.
3. Sign extend dst_lo into dst_hi
Example: r6 = (s8)r3
r6 maps to {ARM_R5, ARM_R4} and r3 is stacked
ldr r4, [r11, #-16] // Load r3_lo to r6_lo
lsl r4, r4, #24 // Sign extend r6_lo
asr r4, r4, #24 // ..
asr r5, r4, #31 // Sign extend r6_lo into r6_hi
Case B.3: src is not stacked but dst is stacked:
------------------------------------------------
1. Sign extend src_lo into tmp_lo according to offset.
2. Store tmp_lo into dst_lo.
3. Sign extend tmp_lo into tmp_hi.
4. Store tmp_hi to dst_hi.
Example: r3 = (s8)r1
r3 is stacked and r1 maps to {ARM_R3, ARM_R2}
lsl r6, r2, #24 // Sign extend r1_lo to tmp_lo
asr r6, r6, #24 // ..
str r6, [r11, #-16] // Store tmp_lo to r3_lo
asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi
str r7, [r11, #-12] // Store tmp_hi to r3_hi
Case B.4: Both src and dst are not stacked:
-------------------------------------------
1. Sign extend src_lo into dst_lo according to offset.
2. Sign extend dst_lo into dst_hi.
Example: r6 = (s8)r1
r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2}
lsl r4, r2, #24 // Sign extend r1_lo to r6_lo
asr r4, r4, #24 // ..
asr r5, r4, #31 // Sign extend r6_lo to r6_hi
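The arithmetic common to all cases above can be modelled in plain C. This
is a sketch of the computed values only, not of the JIT emitter; reg64,
asr32() and movsx_model() are hypothetical names. The lsl/asr pair
corresponds to a shift by (32 - offset), and the final asr #31 fills the
high word with the sign bit:

```c
#include <assert.h>
#include <stdint.h>

struct reg64 { uint32_t lo, hi; };

/* Arithmetic shift right on a 32-bit value, written portably. n is 0..31. */
static uint32_t asr32(uint32_t v, unsigned int n)
{
	uint32_t fill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0;

	return (v >> n) | fill;
}

/* Model of dst = (sN)src for off in {8, 16, 32}; src is never modified. */
static struct reg64 movsx_model(uint32_t src_lo, unsigned int off)
{
	struct reg64 dst;
	unsigned int shift = 32 - off;

	assert(off == 8 || off == 16 || off == 32);
	/* lsl #shift followed by asr #shift; a no-op when off == 32 */
	dst.lo = asr32(src_lo << shift, shift);
	/* asr #31 replicates the sign bit of the low word into the high word */
	dst.hi = asr32(dst.lo, 31);
	return dst;
}
```

For example, movsx_model(0xFF, 8) gives lo = hi = 0xFFFFFFFF, while
movsx_model(0x7F, 8) gives lo = 0x7F and hi = 0.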
Fixes: fc832653fa ("arm32, bpf: add support for sign-extension mov instruction")
Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com
Link: https://lore.kernel.org/bpf/20240419182832.27707-1-puranjay@kernel.org
The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. I.e. end - start
should be the size of the area to be initialized.
The current code works because __storage_key_init_range() will still loop
over every page in the range, but it is slower than using sske_frame().
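A toy model of why an end address that is one byte short still works but
is slower. The constants, the stats struct and init_range_model() are
assumptions made for illustration (4 KB pages, 1 MB frames); the loop is
only loosely shaped after the behaviour described above and is not the
s390 code:

```c
#include <assert.h>

#define MODEL_PAGE_SIZE  4096UL
#define MODEL_FRAME_SIZE (1UL << 20)	/* 1 MB frame */

struct key_init_stats { unsigned long frame_ops, page_ops; };

static struct key_init_stats
init_range_model(unsigned long start, unsigned long end)
{
	struct key_init_stats st = { 0, 0 };

	assert(start % MODEL_PAGE_SIZE == 0);
	while (start < end) {
		unsigned long boundary =
			(start + MODEL_FRAME_SIZE) & ~(MODEL_FRAME_SIZE - 1);

		if (boundary <= end) {
			/* fast path: one operation up to the next 1 MB boundary */
			st.frame_ops++;
			start = boundary;
			continue;
		}
		/* slow path: one page at a time */
		st.page_ops++;
		start += MODEL_PAGE_SIZE;
	}
	return st;
}
```

In this model, passing end = start + 1 MB takes the fast path once, while
end = start + 1 MB - 1 falls back to 256 per-page operations that cover
the same range.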
Fixes: 3afdfca698 ("s390/mm: Clear skeys for newly mapped huge guest pmds")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20240416114220.28489-3-imbrenda@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. I.e. end - start
should be the size of the area to be initialized.
The current code works because __storage_key_init_range() will still loop
over every page in the range, but it is slower than using sske_frame().
Fixes: 964c2c05c9 ("s390/mm: Clear huge page storage keys on enable_skey")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20240416114220.28489-2-imbrenda@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Commit dce0919c83 ("irqchip/renesas-rzg2l: Do not set TIEN and TINT
source at the same time") removed the setup of TINT from
rzg2l_irqc_irq_enable(). To address the spurious interrupt issue, the
setup of TINT was moved into rzg2l_tint_set_edge() through
rzg2l_disable_tint_and_set_tint_source(). With this, the interrupts are
not properly re-configured after a suspend-to-RAM cycle. To address
this issue and avoid spurious interrupts while resuming, set the
interrupt type before enabling it.
Fixes: dce0919c83 ("irqchip/renesas-rzg2l: Do not set TIEN and TINT source at the same time")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20240419063822.3467424-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
When using davinci-mcasp as CPU DAI with simple-card, there are some
conditions that cause simple-card to finish registering a sound card before
davinci-mcasp finishes registering all sound components. This creates a
non-working sound card from userspace with no problem indication apart
from not being able to play/record audio on a PCM stream. The issue
arises during simultaneous probe execution of both drivers. Specifically,
the simple-card driver, awaiting a CPU DAI, proceeds as soon as
davinci-mcasp registers its DAI. However, this process can lead to the
client mutex lock (client_mutex in soc-core.c) being held or davinci-mcasp
being preempted before PCM DMA registration on davinci-mcasp finishes.
This situation occurs when the probes of both drivers run concurrently.
Below is the code path for this condition. To solve the issue, defer
davinci-mcasp CPU DAI registration to the last step in the audio part of
it. This way, simple-card CPU DAI parsing will be deferred until all
audio components are registered.
Fail Code Path:
simple-card.c: probe starts
simple-card.c: simple_dai_link_of: simple_parse_node(..,cpu,..) returns EPROBE_DEFER, no CPU DAI yet
davinci-mcasp.c: probe starts
davinci-mcasp.c: devm_snd_soc_register_component() register CPU DAI
simple-card.c: probes again, finish CPU DAI parsing and call devm_snd_soc_register_card()
simple-card.c: finish probe
davinci-mcasp.c: *dma_pcm_platform_register() register PCM DMA
davinci-mcasp.c: probe finish
Cc: stable@vger.kernel.org
Fixes: 9fbd58cf4a ("ASoC: davinci-mcasp: Choose PCM driver based on configured DMA controller")
Signed-off-by: Joao Paulo Goncalves <joao.goncalves@toradex.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Reviewed-by: Jai Luthra <j-luthra@ti.com>
Link: https://lore.kernel.org/r/20240417184138.1104774-1-jpaulo.silvagoncalves@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When creating controls attached to widgets, there are a lot of rules
deciding whether their names get prefixed with the widget name. Because
of that, controls ended up with weird-looking names like "ssp0_fe DSP
Volume", while the topology set the name to "DSP Volume".
Fix this by setting no_wname_in_kcontrol_name to true in avs topology
widgets, which disables the unwanted behaviour.
Fixes: be2b81b519 ("ASoC: Intel: avs: Parse control tuples")
Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Link: https://lore.kernel.org/r/20240418142621.2487478-1-amadeuszx.slawinski@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
pinctrl: renesas: Fixes for v6.9
- Fix a dtbs_check warning on RZ/G3S,
- Fix a lockdep warning on RZ/G2L.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The of_match shall correspond to the name of the regulator subnode,
or the deprecated `regulator-compatible` property must be used:
failing to do so, the regulator won't probe (and the driver will
not probe either).
Since the devicetree binding for this driver is actually correct
and wants DTs to use the "usb-otg-vbus-regulator" subnode name,
fix this driver by aligning the `of_match` string to what the DT
binding wants.
Fixes: 0402e8ebb8 ("power: supply: mt6360_charger: add MT6360 charger support")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://lore.kernel.org/r/20240410084405.1389378-1-angelogioacchino.delregno@collabora.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
The version format string in the AXI SPI Engine driver was probably
intended to print the version number in the same format as the DT
compatible string (e.g. 1.00.a). However, the version just uses
semantic versioning so formatting the patch number as a character
is not correct and would result in printing control characters for
patch numbers less than 32.
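A sketch of the numeric formatting. The field layout (major in bits 23:16,
minor in bits 15:8, patch in bits 7:0) and the names VER_MAJOR, VER_MINOR,
VER_PATCH and format_version() are assumptions for this example, not taken
from the driver:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout for this sketch only. */
#define VER_MAJOR(v) (((v) >> 16) & 0xff)
#define VER_MINOR(v) (((v) >> 8) & 0xff)
#define VER_PATCH(v) ((v) & 0xff)

/* Formats all three fields as unsigned numbers; returns snprintf()'s result. */
static int format_version(char *buf, size_t len, uint32_t version)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%u.%u.%u",
			(unsigned int)VER_MAJOR(version),
			(unsigned int)VER_MINOR(version),
			(unsigned int)VER_PATCH(version));
}
```

Printing the patch field with %c instead would emit the byte 0x03 for a
patch number of 3, which is a control character.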
Fixes: b1353d1c1d ("spi: Add Analog Devices AXI SPI Engine controller support")
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://lore.kernel.org/r/20240412-axi-spi-engine-version-printf-v1-1-95e1e842c1a6@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Long ago a map file descriptor in a pseudo ldimm64 instruction could
only be present as an immediate value insn[0].imm, and thus this value
was used in a verbose verifier message printed when the file descriptor
wasn't valid. Since addition of BPF_PSEUDO_MAP_IDX_VALUE/BPF_PSEUDO_MAP_IDX
the insn[0].imm field can also contain an index pointing to the file
descriptor in the attr.fd_array array. However, if the file descriptor
is invalid, the verifier still prints the verbose message containing
value of insn[0].imm. Patch the verifier message to always print the
actual file descriptor value.
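The distinction between the two encodings can be sketched like this.
resolve_map_fd() and its imm_is_index parameter are hypothetical; in the
verifier the encoding is derived from the instruction's pseudo source
register rather than passed as a flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the file descriptor an ldimm64 instruction
 * refers to. With an index encoding, imm selects an entry of fd_array;
 * otherwise imm is the descriptor itself.
 */
static int resolve_map_fd(int32_t imm, bool imm_is_index, const int *fd_array)
{
	if (imm_is_index) {
		assert(fd_array != NULL && imm >= 0);
		return fd_array[imm];
	}
	return imm;
}
```

An error message built from the resolved value reports the descriptor the
user actually supplied, not the array index.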
Fixes: 387544bfa2 ("bpf: Introduce fd_idx")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240412141100.3562942-1-aspsk@isovalent.com
Move the stray '.' that currently follows the newline '\n' at the end of
the line to before the newline character, which is the right position.
Fixes: ce8d78616a ("nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH")
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Currently, all ASoC systems are set to use VPMON for DSP1RX5_SRC,
however, this is required only for internal boost systems.
External boost systems require VBSTMON instead of VPMON to be the
input to DSP1RX5_SRC.
Shared Boost Active acts like Internal boost (requires VPMON).
Shared Boost Passive acts like External boost (requires VBSTMON).
All systems require DSP1RX6_SRC to be set to VBSTMON.
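The mapping described above, written out as a sketch. The enum and
function names are hypothetical and do not match the driver's register
definitions:

```c
#include <assert.h>

enum boost_type {
	BOOST_INTERNAL,
	BOOST_EXTERNAL,
	BOOST_SHARED_ACTIVE,
	BOOST_SHARED_PASSIVE,
};

enum monitor_src { SRC_VPMON, SRC_VBSTMON };

/* DSP1RX5 input depends on the boost configuration. */
static enum monitor_src dsp1rx5_src_for(enum boost_type type)
{
	switch (type) {
	case BOOST_EXTERNAL:
	case BOOST_SHARED_PASSIVE:
		return SRC_VBSTMON;
	case BOOST_INTERNAL:
	case BOOST_SHARED_ACTIVE:
	default:
		return SRC_VPMON;
	}
}

/* DSP1RX6 input is VBSTMON for every boost configuration. */
static enum monitor_src dsp1rx6_src_for(enum boost_type type)
{
	(void)type;
	return SRC_VBSTMON;
}
```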
Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com>
Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://msgid.link/r/20240411142648.650921-1-sbinding@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The rt9455_boost_voltage_values[] array is only used when USB PHY
support is enabled, causing a W=1 warning otherwise:
drivers/power/supply/rt9455_charger.c:200:18: error: 'rt9455_boost_voltage_values' defined but not used [-Werror=unused-const-variable=]
Enclose the definition in the same #ifdef as the references to it.
Fixes: e86d69dd78 ("power_supply: Add support for Richtek RT9455 battery charger")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240403080702.3509288-10-arnd@kernel.org
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>:
This chain fixes some problems with some previous patches for handling
the ASP1 config registers. The root of the problem is that the ownership
of these registers can be either with the firmware or the driver, and that
the chip has to be soft-reset after downloading the firmware.
This chain adds and uses a regmap_read_bypassed() function so that the
driver can leave the regmap in cache-only until the chip has rebooted,
but still poll a register to detect when the chip has rebooted.
Richard Fitzgerald (4):
regmap: Add regmap_read_bypassed()
ALSA: hda: cs35l56: Exit cache-only after
cs35l56_wait_for_firmware_boot()
ASoC: cs35l56: Fix unintended bus access while resetting amp
ASoC: cs35l56: Prevent overwriting firmware ASP config
drivers/base/regmap/regmap.c | 37 ++++++++++++++
include/linux/regmap.h | 8 +++
include/sound/cs35l56.h | 2 +
sound/pci/hda/cs35l56_hda.c | 4 ++
sound/soc/codecs/cs35l56-sdw.c | 2 -
sound/soc/codecs/cs35l56-shared.c | 83 ++++++++++++++++++++-----------
sound/soc/codecs/cs35l56.c | 26 +++++++++-
7 files changed, 130 insertions(+), 32 deletions(-)
Merge series from Peter Ujfalusi <peter.ujfalusi@linux.intel.com>:
The current code will reset the ChainDMA on release unconditionally, which
can result in the following error when the ChainDMA is not allocated:
ipc tx : 0xe040000|0x0: GLB_CHAIN_DMA
ipc tx reply: 0x2e000007|0x0: GLB_CHAIN_DMA
FW reported error: 7 - Unsupported operation requested
ipc error for msg 0xe040000|0x0
sof_pcm_stream_free: pcm_ops hw_free failed -22
Background:
PulseAudio and PipeWire open all available streams on startup and close
them without triggering a start (after probing their capabilities).
Merge series from Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>:
We somehow missed the default path for DSP libraries for LNL, and need
to restrict support for D0i3 w/ IPC4. Also add debugfs support for
firmware profile information so that sof-test scripts can show what is
being tested.
By default, the codec leaves the combo jack's sleeve/ring2 contacts
floating. This can cause electrical noise when an active speaker jack
is connected during boot or shutdown.
This patch requests a GPIO to control the additional jack circuit,
which either ties the contacts to ground or leaves them floating.
Signed-off-by: Derek Fang <derek.fang@realtek.com>
Link: https://msgid.link/r/20240408091057.14165-1-derek.fang@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The MT6360 regulator binding, the example in the MT6360 mfd binding, and
the devicetree users of those bindings are rightfully declaring MT6360
regulator subnodes with non-capital names, and luckily without using the
deprecated regulator-compatible property.
With this driver declaring capitalized BUCKx/LDOx as of_match string for
the node names, obviously no regulator gets probed: fix that by changing
the MT6360_REGULATOR_DESC macro to add a "match" parameter which gets
assigned to the of_match.
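A sketch of a descriptor macro that takes the match string as a separate
parameter. The struct, the macro and the lower-case node names "buck1" and
"ldo6" are illustrative assumptions, not the driver's
MT6360_REGULATOR_DESC definition:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down descriptor. */
struct regulator_desc_model {
	const char *name;
	const char *of_match;
};

/* The match string is passed explicitly instead of being derived from _name. */
#define MODEL_REGULATOR_DESC(_name, _match) \
	{ .name = #_name, .of_match = _match }

static const struct regulator_desc_model model_descs[] = {
	MODEL_REGULATOR_DESC(BUCK1, "buck1"),
	MODEL_REGULATOR_DESC(LDO6, "ldo6"),
};

/* Case-sensitive lookup by devicetree subnode name. */
static const struct regulator_desc_model *find_by_node_name(const char *node)
{
	size_t i;

	assert(node != NULL);
	for (i = 0; i < sizeof(model_descs) / sizeof(model_descs[0]); i++)
		if (strcmp(model_descs[i].of_match, node) == 0)
			return &model_descs[i];
	return NULL;
}
```

Since the lookup is case-sensitive, a subnode named "buck1" is found only
when the match string is lower-case as well.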
Fixes: d321571d5e ("regulator: mt6360: Add support for MT6360 regulator")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://msgid.link/r/20240409144438.410060-1-angelogioacchino.delregno@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The ChainDMA operation differs from normal pipelines in that it is only
created when the stream is started. In fact, a PCM using ChainDMA has no
pipelines or modules.
To reset a ChainDMA, it needs to be first allocated in firmware. When
PulseAudio/PipeWire starts, they will probe the PCMs by opening them, check
hw_params and then close the PCM without starting audio.
Unconditionally resetting the ChainDMA can result in the following error:
ipc tx : 0xe040000|0x0: GLB_CHAIN_DMA
ipc tx reply: 0x2e000007|0x0: GLB_CHAIN_DMA
FW reported error: 7 - Unsupported operation requested
ipc error for msg 0xe040000|0x0
sof_pcm_stream_free: pcm_ops hw_free failed -22
Add a new chain_dma_allocated flag to sof_ipc4_pcm_stream_priv to store the
ChainDMA allocation state and use this flag to skip sending the reset if
the ChainDMA is not allocated.
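A minimal sketch of the flag-guarded reset. Only the flag name
chain_dma_allocated comes from the description above; the struct, the
functions and the reset counter are hypothetical stand-ins for the IPC
path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct stream_priv_model {
	bool chain_dma_allocated;
	int resets_sent;	/* counts reset messages sent to "firmware" */
};

/* Starting the stream is what allocates the ChainDMA in firmware. */
static void stream_start(struct stream_priv_model *s)
{
	assert(s != NULL);
	s->chain_dma_allocated = true;
}

/* On release, only send the reset if something was allocated. */
static void stream_free(struct stream_priv_model *s)
{
	assert(s != NULL);
	if (!s->chain_dma_allocated)
		return;	/* never allocated in firmware: nothing to reset */
	s->resets_sent++;
	s->chain_dma_allocated = false;
}
```

An open/close probe without a start therefore sends no reset, which is the
sequence that triggered the firmware error.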
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://msgid.link/r/20240409110036.9411-5-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
GPIO_S0_SC57 / GPIO_S0_SC61 can be muxed to PCU_UART_TXD / PCU_UART_RXD,
add a pinconf group for this.
On Bay Trail board schematics using these pins as UART these are
called UART3_TXD / UART3_RXD, name the pinconf group "uart3_grp"
to be consistent with the schematics.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
For all the "score" pin-groups, the intel_pingroup-s that select the
non-GPIO function are re-used for byt_score_gpio_groups[].
But this is incorrect since a pin-group includes the mode setting,
which for the non-GPIO functions generally is 1, whereas to select
the GPIO function the mode must be set to 0.
So the GPIO function needs separate intel_pingroup-s with their own mode
value of 0.
Add a new PIN_GROUP_GPIO macro which adds a foo_gpio entry to each
pin-group defined this way and update byt_score_gpio_groups[] to point
to the new foo_gpio entries.
The "sus" usb_oc_grp, usb_ulpi_grp and pcu_spi_grp pin-groups are special
because these have a non-zero mode value to select the GPIO functions and
these already have matching foo_gpio pin-groups; leave these unchanged.
The pmu_clk "sus" groups added in commit 2f46d7f7e9 ("pinctrl: baytrail:
Add pinconf group + function for the pmu_clk") do need to use the new
PIN_GROUP_GPIO macro.
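A sketch of how one macro invocation can emit both entries. The struct and
macros are simplified, hypothetical versions (no pin arrays), and "foo_grp"
and "bar_grp" are placeholder group names:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down pin-group: just a name and a mode value. */
struct pingroup_model {
	const char *name;
	unsigned int mode;
};

#define MODEL_PIN_GROUP(_name, _mode)	{ .name = _name, .mode = _mode }

/* Emits the function group plus a matching "<name>_gpio" group with mode 0. */
#define MODEL_PIN_GROUP_GPIO(_name, _mode)	\
	MODEL_PIN_GROUP(_name, _mode),		\
	MODEL_PIN_GROUP(_name "_gpio", 0)

static const struct pingroup_model model_groups[] = {
	MODEL_PIN_GROUP_GPIO("foo_grp", 1),
	MODEL_PIN_GROUP_GPIO("bar_grp", 1),
};
```

The GPIO function's group list would then reference the "_gpio" entries,
so selecting GPIO programs mode 0 rather than the function's mode.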
Fixes: 2f46d7f7e9 ("pinctrl: baytrail: Add pinconf group + function for the pmu_clk")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Only populate the ASP1 config registers in the regmap cache if the
ASP DAI is used. This prevents regcache_sync() from overwriting
these registers with their defaults when the firmware owns
control of these registers.
On a SoundWire system the ASP could be owned by the firmware to
share reference audio with the firmware on other cs35l56. Or it
can be used as a normal codec-codec interface owned by the driver.
The driver must not overwrite the registers if the firmware has
control of them.
The original implementation for this in commit 07f7d6e7a1
("ASoC: cs35l56: Fix for initializing ASP1 mixer registers") was
to still provide defaults for these registers, assuming that if
they were never reconfigured from defaults then regcache_sync()
would not write them out because they are not dirty. Unfortunately
regcache_sync() is not that smart. If the chip has not reset (so
the driver has not called regcache_mark_dirty()) a regcache_sync()
could write out registers that are not dirty.
To avoid accidental overwriting of the ASP registers, they are
removed from the table of defaults and instead are populated with
defaults only if one of the ASP DAI configuration functions is
called. So if the DAI has never been configured, the firmware is
assumed to have ownership of these registers, and the regmap cache
will not contain any entries for them.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 07f7d6e7a1 ("ASoC: cs35l56: Fix for initializing ASP1 mixer registers")
Link: https://msgid.link/r/20240408101803.43183-5-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use the new regmap_read_bypassed() so that the regmap can be left
in cache-only mode while it is booting, but the driver can still
read boot-status and chip-id information during this time.
This fixes race conditions where some writes could be issued to the
silicon while it is still rebooting, before the driver has determined
that the boot is complete.
This is typically prevented by putting regmap into cache-only until the
hardware is ready. But this assumes that the driver does not need to
access device registers to determine when it is "ready". For cs35l56
this involves polling a register and the original implementation relied
on having special handlers to block racing callbacks until dsp_work()
is complete. However, some cases were missed, most notably the ASP DAI
functions.
The regmap_read_bypassed() function allows the fix for this to be
simplified to putting regmap into cache-only during the reset. The
initial boot stages (poll HALO_STATE and read the chip ID) are all done
bypassed. Only when the amp is seen to be booted is the cache-only
revoked.
Changes are:
- cs35l56_system_reset() now leaves the regmap in cache-only status.
- cs35l56_wait_for_firmware_boot() polls using regmap_read_bypassed().
- cs35l56_init() revokes cache-only either via cs35l56_hw_init() or
when firmware has rebooted after a soft reset.
- cs35l56_hw_init() exits cache-only after it has determined that the
amp has booted.
- cs35l56_sdw_init() doesn't disable cache-only, since this must be
deferred to cs35l56_init().
- cs35l56_runtime_resume_common() waits for firmware boot before exiting
cache-only.
These changes cover three situations where the registers are not
accessible:
1) SoundWire first-time enumeration. The regmap is kept in cache-only
until the chip is fully booted. The original code had to exit
cache-only to read chip status in cs35l56_init() and cs35l56_hw_init()
but this is now deferred to after the firmware has rebooted.
In this case cs35l56_sdw_probe() leaves regmap in cache-only
(unchanged behaviour) and cs35l56_hw_init() exits cache-only after the
firmware is booted and the chip identified.
2) Soft reset during first-time initialization. cs35l56_init() calls
cs35l56_system_reset(), which puts regmap into cache-only.
On I2C/SPI cs35l56_init() then flows through to call
cs35l56_wait_for_firmware_boot() and exit cache-only. On SoundWire
the re-enumeration will enter cs35l56_init() again, which then drops
down to call cs35l56_wait_for_firmware_boot() and exit cache-only.
3) Soft reset after firmware download. dsp_work() calls
cs35l56_system_reset(), which puts regmap into cache-only. After this
the flow is the same as (2).
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 8a731fd37f ("ASoC: cs35l56: Move utility functions to shared file")
Link: https://msgid.link/r/20240408101803.43183-4-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Add calls to disable regmap cache-only after a successful return from
cs35l56_wait_for_firmware_boot().
This is to prepare for a change in the shared ASoC module that will
leave regmap in cache-only mode after cs35l56_system_reset(). This is
to prevent register accesses going to the hardware while it is
rebooting.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://msgid.link/r/20240408101803.43183-3-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Add a regmap_read_bypassed() to allow reads from the hardware registers
while the regmap is in cache-only mode.
A typical use for this is to keep the cache in cache-only mode until
the hardware has reached a valid state, but one or more status registers
must be polled to determine when this state is reached.
For example, firmware download on the cs35l56 can take several seconds if
there are multiple amps sharing limited bus bandwidth. This is too long
to block in probe() so it is done as a background task. The device must
be soft-reset to reboot the firmware and during this time the registers are
not accessible, so the cache should be in cache-only. But the driver must
poll a register to detect when reboot has completed.
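The polling pattern described above can be sketched in plain user-space C.
This is only a toy model under stated assumptions: struct toy_regmap and
the toy_*() helpers are hypothetical names, not the kernel regmap API, and
the "hardware" is an array. It only illustrates that a bypassed read
reaches the hardware while normal reads stay cache-only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_REGS 4

/* Hypothetical stand-in for a register map; not the kernel's struct regmap. */
struct toy_regmap {
	bool cache_only;              /* when true, normal reads never touch hardware */
	uint32_t cache[TOY_NUM_REGS]; /* last known register values */
	uint32_t hw[TOY_NUM_REGS];    /* simulated hardware registers */
};

/* Normal read: served from the cache while cache_only is set. */
static int toy_read(struct toy_regmap *map, unsigned int reg, uint32_t *val)
{
	if (reg >= TOY_NUM_REGS)
		return -EINVAL;
	if (!map->cache_only)
		map->cache[reg] = map->hw[reg];
	*val = map->cache[reg];
	return 0;
}

/* Bypassed read: always reads the hardware, leaves cache and cache_only alone. */
static int toy_read_bypassed(struct toy_regmap *map, unsigned int reg,
			     uint32_t *val)
{
	if (reg >= TOY_NUM_REGS)
		return -EINVAL;
	*val = map->hw[reg];
	return 0;
}

/*
 * Poll a status register with bypassed reads and revoke cache-only only
 * once the expected "booted" value is seen.
 */
static int toy_wait_for_boot(struct toy_regmap *map, unsigned int reg,
			     uint32_t booted, int max_polls)
{
	uint32_t val;

	for (int i = 0; i < max_polls; i++) {
		if (toy_read_bypassed(map, reg, &val) == 0 && val == booted) {
			map->cache_only = false;
			return 0;
		}
	}
	return -ETIMEDOUT;
}
```

In this model the cache never sees a value from the rebooting device until
toy_wait_for_boot() has succeeded, which is the property the commit
message asks for.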
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 8a731fd37f ("ASoC: cs35l56: Move utility functions to shared file")
Link: https://msgid.link/r/20240408101803.43183-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The regulator IRQ helper requires the caller to provide a pointer to an
IRQ name that the caller must keep in memory. All other data passed to
the helper in the regulator_irq_desc structure is copied. This can cause
some confusion and unnecessary complexity.
Make regulator_irq_helper() also copy the provided IRQ name so the
caller can discard the name after the call to regulator_irq_helper()
completes.
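As a rough user-space illustration of why copying the name helps, the
sketch below deep-copies a descriptor including its name string. The
struct toy_irq_desc type, its fields, and both helpers are hypothetical
and only loosely modelled on the description above; the real helper uses
kernel allocation routines, not malloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor; not the kernel's struct regulator_irq_desc. */
struct toy_irq_desc {
	const char *name;
	int irq_off_ms;
};

/*
 * Deep-copy the descriptor, including the name string, so the caller's
 * name buffer may be reused or freed after this returns.
 * Returns NULL on allocation failure.
 */
static struct toy_irq_desc *toy_irq_desc_dup(const struct toy_irq_desc *src)
{
	struct toy_irq_desc *d = malloc(sizeof(*d));
	char *name = NULL;

	if (!d)
		return NULL;
	*d = *src;
	if (src->name) {
		size_t len = strlen(src->name) + 1;

		name = malloc(len);
		if (!name) {
			free(d);
			return NULL;
		}
		memcpy(name, src->name, len);
	}
	d->name = name;
	return d;
}

/* Release a descriptor created by toy_irq_desc_dup(). */
static void toy_irq_desc_free(struct toy_irq_desc *d)
{
	if (!d)
		return;
	free((void *)d->name);
	free(d);
}
```

With a copy like this, a caller can build the name in a temporary buffer
and let it go out of scope once the descriptor has been duplicated.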
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Link: https://msgid.link/r/ZhJMuUYwaZbBXFGP@drtxq0yyyyyyyyyyyyydy-3.rev.dnainternet.fi
Signed-off-by: Mark Brown <broonie@kernel.org>
The Asus T100TA quirk has been using an exact match on a product-name of
"T100TA" but there are also T100TAM variants with a slightly higher
clocked CPU and a metal backside which need the same quirk.
Sort the existing T100TA (stereo speakers) below the more specific
T100TAF (mono speaker) quirk and switch from exact matching to
substring matching so that the T100TA quirk will also match on
the T100TAM models.
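The ordering requirement can be shown with a small first-match-wins table
in user-space C. The table layout and toy_match() are hypothetical and do
not reproduce the kernel's DMI matching code; the sketch only assumes
that entries are tried in order and matched by substring.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: a product-name substring and the quirk it selects. */
struct toy_quirk {
	const char *product;
	const char *speakers;
};

/*
 * First match wins, so the more specific "T100TAF" entry must come before
 * the broader "T100TA" entry, which also matches "T100TAF" and "T100TAM"
 * as a substring.
 */
static const struct toy_quirk toy_quirks[] = {
	{ "T100TAF", "mono" },
	{ "T100TA",  "stereo" },
};

/* Return the speaker quirk for a product name, or NULL if none matches. */
static const char *toy_match(const char *product_name)
{
	for (size_t i = 0; i < sizeof(toy_quirks) / sizeof(toy_quirks[0]); i++) {
		if (strstr(product_name, toy_quirks[i].product))
			return toy_quirks[i].speakers;
	}
	return NULL;
}
```

If the two entries were swapped, "T100TAF" would hit the "T100TA" entry
first and be treated as a stereo device.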
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://msgid.link/r/20240407191559.21596-1-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
syzbot reported [1].
[1]
BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
sk_psock_put include/linux/skmsg.h:459 [inline]
sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
unix_release+0x4b/0x80 net/unix/af_unix.c:1048
__sock_release net/socket.c:659 [inline]
sock_close+0x68/0x150 net/socket.c:1421
__fput+0x2c1/0x660 fs/file_table.c:422
__fput_sync+0x44/0x60 fs/file_table.c:507
__do_sys_close fs/open.c:1556 [inline]
__se_sys_close+0x101/0x1b0 fs/open.c:1541
__x64_sys_close+0x1f/0x30 fs/open.c:1541
do_syscall_64+0xd3/0x1d0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
sk_psock_data_ready include/linux/skmsg.h:464 [inline]
sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
unix_read_skb net/unix/af_unix.c:2546 [inline]
unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x140/0x180 net/socket.c:745
____sys_sendmsg+0x312/0x410 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x1e9/0x280 net/socket.c:2667
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
do_syscall_64+0xd3/0x1d0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G W 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Prior to this, commit 4cd12c6065 ("bpf, sockmap: Fix NULL pointer
dereference in sk_psock_verdict_data_ready()") fixed a similar NULL
pointer dereference caused by the lack of protection of saved_data_ready.
This is a different caller hitting the same issue for the same reason, so
protect it with the sk_callback_lock read lock, because the writer side
in sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".
To avoid similar errors in the future, move both lock/unlock pairs into
sk_psock_data_ready(), as suggested by John Fastabend.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
Link: https://lore.kernel.org/all/20240329134037.92124-1-kerneljasonxing@gmail.com
Link: https://lore.kernel.org/bpf/20240404021001.94815-1-kerneljasonxing@gmail.com
There is a misinterpretation of some of the PIN_CONFIG_* options in this
driver library. PIN_CONFIG_OUTPUT_ENABLE should refer to a buffer or
switch in the output direction of the electrical path. The MediaTek
hardware does not have such a thing. The driver incorrectly maps this
option to the GPIO function's direction.
Likewise, PIN_CONFIG_INPUT_ENABLE should refer to a buffer or switch in
the input direction. The hardware does have such a mechanism, and it is
mapped to the IES bit. The driver, however, sets the direction in addition
to the IES bit, which is incorrect. On readback, the IES bit isn't even
considered.
Ironically, the driver does not support readback for PIN_CONFIG_OUTPUT,
while its readback of PIN_CONFIG_{INPUT,OUTPUT}_ENABLE is what it should
be doing for PIN_CONFIG_OUTPUT.
Rework support for these three options, so that PIN_CONFIG_OUTPUT_ENABLE
is completely removed, PIN_CONFIG_INPUT_ENABLE is only linked to the IES
bit, and PIN_CONFIG_OUTPUT is linked to the GPIO function's direction
and output level.
Fixes: 805250982b ("pinctrl: mediatek: add pinctrl-paris that implements the vendor dt-bindings")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Message-ID: <20240327091336.3434141-3-wenst@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
In the generic pin config library, readback of some options is handled
differently compared to the setting of those options: the argument value
is used to convey enable/disable of an option in the set path, but
success or -EINVAL is used to convey if an option is enabled or disabled
in the debugfs readback path.
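The asymmetry between the set path and the readback path can be sketched
as follows. The struct toy_pin type and both functions are hypothetical
and are not the mediatek-paris code; they only model the convention
described above for a boolean option such as the Schmitt trigger.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pin state; stands in for the hardware register bit. */
struct toy_pin {
	bool schmitt_enabled;
};

/* Set path: the argument value carries enable (non-zero) or disable (zero). */
static int toy_set_schmitt(struct toy_pin *pin, unsigned int arg)
{
	pin->schmitt_enabled = arg != 0;
	return 0;
}

/*
 * Readback path: success means "enabled" and -EINVAL means "disabled".
 * Returning 0 with *arg == 0 would be reported as enabled by a caller
 * that follows this convention.
 */
static int toy_get_schmitt(const struct toy_pin *pin, unsigned int *arg)
{
	if (!pin->schmitt_enabled)
		return -EINVAL;
	*arg = 1;
	return 0;
}
```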
PIN_CONFIG_INPUT_SCHMITT_ENABLE is one such option. Fix the readback of
the option in the mediatek-paris library, so that the debugfs dump is
not showing "input schmitt enabled" for pins that don't have it enabled.
Fixes: 1bea6afbc8 ("pinctrl: mediatek: Refine mtk_pinconf_get()")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Message-ID: <20240327091336.3434141-2-wenst@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Significant changes have been made to workqueue, and work is under way
to convert tasklet users to it, but the current translation does not
describe WQ_BH, so an update seems helpful.
Synchronize translation from upstream commit 3bc1e711c2
("workqueue: Don't implicitly make UNBOUND workqueues w/ @max_active==1 ordered")
Signed-off-by: Xingyou Chen <rockrush@rockwork.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Due to commit 8318d6a636 ("workqueue: Shorten
events_freezable_power_efficient name") we now have some stale
references in the workqueue documentation, so update those
references accordingly.
Signed-off-by: Audra Mitchell <audra@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Since commit deebc79b28 ("clk: qcom: gpucc-sc8280xp: Add external
supply for GX gdsc") the GDSC supply must be treated as optional to
avoid warnings like:
gpu_cc-sc8280xp 3d90000.clock-controller: supply vdd-gfx not found, using dummy regulator
on SC8280XP.
Fortunately, the driver is already prepared to handle this by checking
that the regulator pointer is non-NULL before use.
This also avoids triggering a potential deadlock on SC8280XP even if the
underlying issue still remains for the derivative platforms like SA8295P
that actually use the supply.
Fixes: deebc79b28 ("clk: qcom: gpucc-sc8280xp: Add external supply for GX gdsc")
Link: https://lore.kernel.org/lkml/Zf25Sv2x9WaCFuIH@hovoldconsulting.com/
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20240325085835.26158-1-johan+linaro@kernel.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Xtensa has a two-argument MAKE_PC_FROM_RA macro to convert a0 to an actual
return address because, when the windowed ABI is used, call{,x}{4,8,12}
opcodes stuff the encoded window size into the top 2 bits of the register
that becomes a return address in the called function. The second argument
of that macro is supposed to be an address having these 2 topmost bits set
correctly, but the comment suggested that this could be the stack
address. However, the stack doesn't have to be in the same 1GByte region
as the code, especially in noMMU XIP configurations.
Fix the comment and use either _text or regs->pc as the second argument
for the MAKE_PC_FROM_RA macro.
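The bit manipulation can be illustrated with a small user-space model.
The function toy_make_pc_from_ra() is a hypothetical stand-in written
from the description above, not the kernel macro itself, and the
addresses used with it are made up for the example.

```c
#include <stdint.h>

/*
 * Model of the conversion: keep the low 30 bits of the return-address
 * register and take the top 2 bits from a reference address that lies in
 * the same 1 GB region as the code.
 */
static uint32_t toy_make_pc_from_ra(uint32_t ra, uint32_t ref)
{
	return (ra & 0x3fffffffu) | (ref & 0xc0000000u);
}
```

For example, with code at 0x60001234 and a call8-style window size of 2
in the top bits, the register holds 0xa0001234. Using a code address such
as 0x60000000 as the reference recovers 0x60001234, while a stack address
in a different 1 GB region, say 0x9fff0000, yields 0xa0001234 instead.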
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
The "pctldev" struct is allocated in devm_pinctrl_register_and_init().
It's a devm_ managed pointer that is freed by devm_pinctrl_dev_release(),
so freeing it in pinctrl_enable() will lead to a double free.
The devm_pinctrl_dev_release() function frees the pindescs and destroys
the mutex as well.
Fixes: 6118714275 ("pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Message-ID: <578fbe56-44e9-487c-ae95-29b695650f7c@moroto.mountain>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
There are 2 issues with interrupt handling in the mxc4005 driver:
1. mxc4005_set_trigger_state() writes MXC4005_REG_INT_MASK1_BIT_DRDYE
(0x01) to INT_MASK1 to enable the interrupt, but to disable the interrupt
it writes ~MXC4005_REG_INT_MASK1_BIT_DRDYE which is 0xfe, so it enables
all other interrupt sources in the INT_SRC1 register. On the MXC4005 this
is not an issue because only bit 0 of the register is used. On the MXC6655
OTOH this is a problem since bit7 is used as TC (Temperature Compensation)
disable bit and writing 1 to this disables Temperature Compensation which
should only be done when running self-tests on the chip.
Write 0 instead of ~MXC4005_REG_INT_MASK1_BIT_DRDYE to disable
the interrupts to fix this.
2. The datasheets for the MXC4005 / MXC6655 do not state what the reset
value for the INT_MASK0 and INT_MASK1 registers is and since these are
write only we also cannot learn this from the hw. Presumably the reset
value for both is all 0, which means all interrupts disabled.
Explicitly set both registers to 0 from mxc4005_chip_init() to ensure
both masks are actually set to 0.
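The first issue comes down to simple bit arithmetic, sketched below. The
TOY_* constants and both functions are hypothetical names; the values
0x01 for the data-ready enable bit and bit 7 for TC disable are taken
from the description above.

```c
#include <stdint.h>

#define TOY_INT_MASK1_BIT_DRDYE 0x01u /* data-ready interrupt enable */
#define TOY_INT_MASK1_BIT_TC    0x80u /* bit 7: TC disable on the MXC6655 */

/* Old behaviour: disabling wrote the bitwise complement of the enable bit. */
static uint8_t toy_mask1_old(int enable)
{
	return enable ? TOY_INT_MASK1_BIT_DRDYE
		      : (uint8_t)~TOY_INT_MASK1_BIT_DRDYE;
}

/* Fixed behaviour: disabling writes 0, so no other bit is touched. */
static uint8_t toy_mask1_new(int enable)
{
	return enable ? TOY_INT_MASK1_BIT_DRDYE : 0;
}
```

Here toy_mask1_old(0) evaluates to 0xfe, which has TOY_INT_MASK1_BIT_TC
set, whereas toy_mask1_new(0) evaluates to 0.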
Fixes: 79846e33aa ("iio: accel: mxc4005: add support for mxc6655")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240326113700.56725-2-hdegoede@redhat.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Lockdep detects a possible deadlock as listed below. This is because the
IA55 interrupt controller's .irq_eoi() API is called from interrupt
context, while configuration-specific APIs (e.g., .irq_enable()) could be
called from process context on the resume path (by calling
rzg2l_gpio_irq_restore()). To avoid this, protect the call of
rzg2l_gpio_irq_enable() with spin_lock_irqsave()/spin_unlock_irqrestore().
This mirrors the approach used in __setup_irq() in the pinctrl IRQ resume
function.
Below is the lockdep report:
WARNING: inconsistent lock state
6.8.0-rc5-next-20240219-arm64-renesas-00030-gb17a289abf1f #90 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
str_rwdt_t_001./159 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff00000b001d70 (&rzg2l_irqc_data->lock){?...}-{2:2}, at: rzg2l_irqc_irq_enable+0x60/0xa4
{IN-HARDIRQ-W} state was registered at:
lock_acquire+0x1e0/0x310
_raw_spin_lock+0x44/0x58
rzg2l_irqc_eoi+0x2c/0x130
irq_chip_eoi_parent+0x18/0x20
rzg2l_gpio_irqc_eoi+0xc/0x14
handle_fasteoi_irq+0x134/0x230
generic_handle_domain_irq+0x28/0x3c
gic_handle_irq+0x4c/0xbc
call_on_irq_stack+0x24/0x34
do_interrupt_handler+0x78/0x7c
el1_interrupt+0x30/0x5c
el1h_64_irq_handler+0x14/0x1c
el1h_64_irq+0x64/0x68
_raw_spin_unlock_irqrestore+0x34/0x70
__setup_irq+0x4d4/0x6b8
request_threaded_irq+0xe8/0x1a0
request_any_context_irq+0x60/0xb8
devm_request_any_context_irq+0x74/0x104
gpio_keys_probe+0x374/0xb08
platform_probe+0x64/0xcc
really_probe+0x140/0x2ac
__driver_probe_device+0x74/0x124
driver_probe_device+0x3c/0x15c
__driver_attach+0xec/0x1c4
bus_for_each_dev+0x70/0xcc
driver_attach+0x20/0x28
bus_add_driver+0xdc/0x1d0
driver_register+0x5c/0x118
__platform_driver_register+0x24/0x2c
gpio_keys_init+0x18/0x20
do_one_initcall+0x70/0x290
kernel_init_freeable+0x294/0x504
kernel_init+0x20/0x1cc
ret_from_fork+0x10/0x20
irq event stamp: 69071
hardirqs last enabled at (69071): [<ffff800080e0dafc>] _raw_spin_unlock_irqrestore+0x6c/0x70
hardirqs last disabled at (69070): [<ffff800080e0cfec>] _raw_spin_lock_irqsave+0x7c/0x80
softirqs last enabled at (67654): [<ffff800080010614>] __do_softirq+0x494/0x4dc
softirqs last disabled at (67645): [<ffff800080015238>] ____do_softirq+0xc/0x14
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&rzg2l_irqc_data->lock);
<Interrupt>
lock(&rzg2l_irqc_data->lock);
*** DEADLOCK ***
4 locks held by str_rwdt_t_001./159:
#0: ffff00000b10f3f0 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x1a4/0x35c
#1: ffff00000e43ba88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe8/0x1a8
#2: ffff00000aa21dc8 (kn->active#40){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1a8
#3: ffff80008179d970 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x9c/0x278
stack backtrace:
CPU: 0 PID: 159 Comm: str_rwdt_t_001. Not tainted 6.8.0-rc5-next-20240219-arm64-renesas-00030-gb17a289abf1f #90
Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
Call trace:
dump_backtrace+0x94/0xe8
show_stack+0x14/0x1c
dump_stack_lvl+0x88/0xc4
dump_stack+0x14/0x1c
print_usage_bug.part.0+0x294/0x348
mark_lock+0x6b0/0x948
__lock_acquire+0x750/0x20b0
lock_acquire+0x1e0/0x310
_raw_spin_lock+0x44/0x58
rzg2l_irqc_irq_enable+0x60/0xa4
irq_chip_enable_parent+0x1c/0x34
rzg2l_gpio_irq_enable+0xc4/0xd8
rzg2l_pinctrl_resume_noirq+0x4cc/0x520
pm_generic_resume_noirq+0x28/0x3c
genpd_finish_resume+0xc0/0xdc
genpd_resume_noirq+0x14/0x1c
dpm_run_callback+0x34/0x90
device_resume_noirq+0xa8/0x268
dpm_noirq_resume_devices+0x13c/0x160
dpm_resume_noirq+0xc/0x1c
suspend_devices_and_enter+0x2c8/0x570
pm_suspend+0x1ac/0x278
state_store+0x88/0x124
kobj_attr_store+0x14/0x24
sysfs_kf_write+0x48/0x6c
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x270/0x35c
ksys_write+0x64/0xec
__arm64_sys_write+0x18/0x20
invoke_syscall+0x44/0x108
el0_svc_common.constprop.0+0xb4/0xd4
do_el0_svc+0x18/0x20
el0_svc+0x3c/0xb8
el0t_64_sync_handler+0xb8/0xbc
el0t_64_sync+0x14c/0x150
Fixes: 254203f9a9 ("pinctrl: renesas: rzg2l: Add suspend/resume support")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20240320104230.446400-2-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
On the RZ/G3S SMARC platform, the 'input' property is utilized in
gpio-hog nodes, and the 'output-enable' property is used for ETH0/1 TXC
pins. Update the binding documentation to include these properties,
addressing the following dtbs_check warnings:
arch/arm64/boot/dts/renesas/r9a08g045s33-smarc.dtb: pinctrl@11030000: key-1-gpio-hog: 'anyOf' conditional failed, one must be fixed:
'input' does not match any of the regexes: 'pinctrl-[0-9]+'
arch/arm64/boot/dts/renesas/r9a08g045s33-smarc.dtb: pinctrl@11030000: eth0: 'anyOf' conditional failed, one must be fixed:
'output-enable' does not match any of the regexes: 'pinctrl-[0-9]+'
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20240313083828.5048-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>