Pull drm fixes from Dave Airlie:
"Last one for this round hopefully, mostly the usual suspects,
xe/amdgpu, with some single fixes otherwise.
There is one amdgpu HDMI blackscreen bug that came in late in the
cycle, but it was bisected and the revert is in here.
i915:
- Reject async flips when PSR's selective fetch is enabled
xe:
- Fix resource leak in xe_guc_ct_init_noalloc()'s error path
- Fix stack_depot usage without STACKDEPOT_ALWAYS_INIT
- Fix overflow in conversion from clock tics to msec
amdgpu:
- Unified MES fix
- HDMI fix
- Cursor fix
- Bightness fix
- EDID reading improvement
- UserQ fix
- Cyan Skillfish IP discovery fix
bridge:
- sil902x: Fix HDMI detection
imagination:
- Update documentation
sti:
- Fix leaks in probe
vga_switcheroo:
- Avoid race condition during fbcon initialization"
* tag 'drm-fixes-2025-11-28' of https://gitlab.freedesktop.org/drm/kernel:
drm/amdgpu: fix cyan_skillfish2 gpu info fw handling
drm/amdgpu: attach tlb fence to the PTs update
drm/amd/display: Increase EDID read retries
drm/amd/display: Don't change brightness for disabled connectors
drm/amd/display: Check NULL before accessing
Revert "drm/amd/display: Move setup_stream_attribute"
drm/xe: Fix conversion from clock ticks to milliseconds
drm/xe/guc: Fix stack_depot usage
drm/xe/guc: Fix resource leak in xe_guc_ct_init_noalloc()
drm/i915/psr: Reject async flips when selective fetch is enabled
drm, fbcon, vga_switcheroo: Avoid race condition in fbcon setup
drm/amd/amdgpu: reserve vm invalidation engine for uni_mes
drm: sti: fix device leaks at component probe
drm/imagination: Document pvr_device.power member
drm/bridge: sii902x: Fix HDMI detection with DRM_BRIDGE_ATTACH_NO_CONNECTOR
Pull dma-mapping fixes from Marek Szyprowski:
"Two last minute fixes for the recently modified DMA API infrastructure:
- proper handling of DMA_ATTR_MMIO in dma_iova_unlink() function (me)
- regression fix for the code refactoring related to P2PDMA (Pranjal
Shrivastava)"
* tag 'dma-mapping-6.18-2025-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-direct: Fix missing sg_dma_len assignment in P2PDMA bus mappings
iommu/dma: add missing support for DMA_ATTR_MMIO for dma_iova_unlink()
Pull ACPI fix from Rafael Wysocki:
"One more urgent ACPI support fix for 6.18
There is one more commit that needs to be reverted after reverting
problematic commit 7a8c994cbb ("ACPI: processor: idle: Optimize ACPI
idle driver registration"), so revert it"
* tag 'acpi-6.18-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: processor: Update cpuidle driver check in __acpi_processor_start()"
Pull ceph fixes from Ilya Dryomov:
"A patch to make sparse read handling work in msgr2 secure mode from
Slava and a couple of fixes from Ziming and myself to avoid operating
on potentially invalid memory, all marked for stable"
* tag 'ceph-for-6.18-rc8' of https://github.com/ceph/ceph-client:
libceph: prevent potential out-of-bounds writes in handle_auth_session_key()
libceph: replace BUG_ON with bounds check for map->max_osd
ceph: fix crash in process_v2_sparse_read() for encrypted directories
libceph: drop started parameter of __ceph_open_session()
libceph: fix potential use-after-free in have_mon_and_osd_map()
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and CAN. No known outstanding
regressions.
Current release - regressions:
- mptcp: initialize rcv_mss before calling tcp_send_active_reset()
- eth: mlx5e: fix validation logic in rate limiting
Previous releases - regressions:
- xsk: avoid data corruption on cq descriptor number
- bluetooth:
- prevent race in socket write iter and sock bind
- fix not generating mackey and ltk when repairing
- can:
- kvaser_usb: fix potential infinite loop in command parsers
- rcar_canfd: fix CAN-FD mode as default
- eth:
- veth: reduce XDP no_direct return section to fix race
- virtio-net: avoid unnecessary checksum calculation on guest RX
Previous releases - always broken:
- sched: fix TCF_LAYER_TRANSPORT handling in tcf_get_base_ptr()
- bluetooth: mediatek: fix kernel crash when releasing iso interface
- vhost: rewind next_avail_head while discarding descriptors
- eth:
- r8169: fix RTL8127 hang on suspend/shutdown
- aquantia: add missing descriptor cache invalidation on ATL2
- dsa: microchip: fix resource releases in error path"
* tag 'net-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
mptcp: Initialise rcv_mss before calling tcp_send_active_reset() in mptcp_do_fastclose().
net: fec: do not register PPS event for PEROUT
net: fec: do not allow enabling PPS and PEROUT simultaneously
net: fec: do not update PEROUT if it is enabled
net: fec: cancel perout_timer when PEROUT is disabled
net: mctp: unconditionally set skb->dev on dst output
net: atlantic: fix fragment overflow handling in RX path
MAINTAINERS: separate VIRTIO NET DRIVER and add netdev
virtio-net: avoid unnecessary checksum calculation on guest RX
eth: fbnic: Fix counter roll-over issue
mptcp: clear scheduled subflows on retransmit
net: dsa: sja1105: fix SGMII linking at 10M or 100M but not passing traffic
s390/net: list Aswin Karuvally as maintainer
net: wwan: mhi: Keep modem name match with Foxconn T99W640
vhost: rewind next_avail_head while discarding descriptors
net/sched: em_canid: fix uninit-value in em_canid_match
can: rcar_canfd: Fix CAN-FD mode as default
xsk: avoid data corruption on cq descriptor number
r8169: fix RTL8127 hang on suspend/shutdown
net: sxgbe: fix potential NULL dereference in sxgbe_rx()
...
Wei Fang says:
====================
net: fec: fix some PTP related issues
There are some issues which were introduced by the commit 350749b909
("net: fec: Add support for periodic output signal of PPS"). See each
patch for more details.
====================
Link: https://patch.msgid.link/20251125085210.1094306-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There are currently two situations that can trigger the PTP interrupt,
one is the PPS event, the other is the PEROUT event. However, the irq
handler fec_pps_interrupt() does not check the irq event type and
directly registers a PPS event into the system, but the event may be
a PEROUT event. This is incorrect because PEROUT is an output signal,
while PPS is the input of the kernel PPS system. Therefore, add a check
for the event type, if pps_enable is true, it means that the current
event is a PPS event, and then the PPS event is registered.
Fixes: 350749b909 ("net: fec: Add support for periodic output signal of PPS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251125085210.1094306-5-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In the current driver, PPS and PEROUT use the same channel to generate
the events, so they cannot be enabled at the same time. Otherwise, the
later configuration will overwrite the earlier configuration. Therefore,
when configuring PPS, the driver will check whether PEROUT is enabled.
Similarly, when configuring PEROUT, the driver will check whether PPS
is enabled.
Fixes: 350749b909 ("net: fec: Add support for periodic output signal of PPS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251125085210.1094306-4-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If the previously set PEROUT is already active, updating it will cause
the new PEROUT to start immediately instead of at the specified time.
This is because fep->reload_period is updated whithout check whether
the PEROUT is enabled, and the old PEROUT is not disabled. Therefore,
the pulse period will be updated immediately in the pulse interrupt
handler fec_pps_interrupt().
Currently, the driver does not support directly updating PEROUT and it
will make the logic be more complicated. To fix the current issue, add
a check before enabling the PEROUT, the driver will return an error if
PEROUT is enabled. If users wants to update a new PEROUT, they should
disable the old PEROUT first.
Fixes: 350749b909 ("net: fec: Add support for periodic output signal of PPS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251125085210.1094306-3-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The PEROUT allows the user to set a specified future time to output the
periodic signal. If the future time is far from the current time, the FEC
driver will use hrtimer to configure PEROUT one second before the future
time. However, the hrtimer will not be canceled if the PEROUT is disabled
before the hrtimer expires. So the PEROUT will be configured when the
hrtimer expires, which is not as expected. Therefore, cancel the hrtimer
in fec_ptp_pps_disable() to fix this issue.
Fixes: 350749b909 ("net: fec: Add support for periodic output signal of PPS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251125085210.1094306-2-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On transmit, we are currently relying on skb->dev being set by
mctp_local_output() when we first set up the skb destination fields.
However, forwarded skbs do not use the local_output path, so will retain
their incoming netdev as their ->dev on tx. This does not work when
we're forwarding between interfaces.
Set skb->dev unconditionally in the transmit path, to allow for proper
forwarding.
We keep the skb->dev initialisation in mctp_local_output(), as we use it
for fragmentation.
Fixes: 269936db5e ("net: mctp: separate routing database from routing operations")
Suggested-by: Vince Chang <vince_chang@aspeedtech.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20251125-dev-forward-v1-1-54ecffcd0616@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The len field originates from untrusted network packets. Boundary
checks have been added to prevent potential out-of-bounds writes when
decrypting the connection secret or processing service tickets.
[ idryomov: changelog ]
Cc: stable@vger.kernel.org
Signed-off-by: ziming zhang <ezrakiez@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
OSD indexes come from untrusted network packets. Boundary checks are
added to validate these against map->max_osd.
[ idryomov: drop BUG_ON in ceph_get_primary_affinity(), minor cosmetic
edits ]
Cc: stable@vger.kernel.org
Signed-off-by: ziming zhang <ezrakiez@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Pull smb client fix from Steve French:
"smb client multiuser (with cifscreds) mount fix"
* tag 'v6.18rc7-SMB-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix memory leak in cifs_construct_tcon()
Marc Kleine-Budde says:
====================
pull-request: can 2025-11-26
this is a pull request of 8 patches for net/main.
Seungjin Bae provides a patch for the kvaser_usb driver to fix a
potential infinite loop in the USB data stream command parser.
Thomas Mühlbacher's patch for the sja1000 driver IRQ handler's max
loop handling, that might lead to unhandled interrupts.
3 patches by me for the gs_usb driver fix handling of failed transmit
URBs and add checking of the actual length of received URBs before
accessing the data.
The next patch is by me and is a port of Thomas Mühlbacher's patch
(fix IRQ handler's max loop handling, that might lead to unhandled
interrupts.) to the sun4i_can driver.
Biju Das provides a patch for the rcar_canfd driver to fix the CAN-FD
mode setting.
The last patch is by Shaurya Rane for the em_canid filter to ensure
that the complete CAN frame is present in the linear data buffer
before accessing it.
* tag 'linux-can-fixes-for-6.18-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
net/sched: em_canid: fix uninit-value in em_canid_match
can: rcar_canfd: Fix CAN-FD mode as default
can: sun4i_can: sun4i_can_interrupt(): fix max irq loop handling
can: gs_usb: gs_usb_receive_bulk_callback(): check actual_length before accessing data
can: gs_usb: gs_usb_receive_bulk_callback(): check actual_length before accessing header
can: gs_usb: gs_usb_xmit_callback(): fix handling of failed transmitted URBs
can: sja1000: fix max irq loop handling
can: kvaser_usb: leaf: Fix potential infinite loop in command parsers
====================
Link: https://patch.msgid.link/20251126155713.217105-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The atlantic driver can receive packets with more than MAX_SKB_FRAGS (17)
fragments when handling large multi-descriptor packets. This causes an
out-of-bounds write in skb_add_rx_frag_netmem() leading to kernel panic.
The issue occurs because the driver doesn't check the total number of
fragments before calling skb_add_rx_frag(). When a packet requires more
than MAX_SKB_FRAGS fragments, the fragment index exceeds the array bounds.
Fix by assuming there will be an extra frag if buff->len > AQ_CFG_RX_HDR_SIZE,
then all fragments are accounted for. And reusing the existing check to
prevent the overflow earlier in the code path.
This crash occurred in production with an Aquantia AQC113 10G NIC.
Stack trace from production environment:
```
RIP: 0010:skb_add_rx_frag_netmem+0x29/0xd0
Code: 90 f3 0f 1e fa 0f 1f 44 00 00 48 89 f8 41 89
ca 48 89 d7 48 63 ce 8b 90 c0 00 00 00 48 c1 e1 04 48 01 ca 48 03 90
c8 00 00 00 <48> 89 7a 30 44 89 52 3c 44 89 42 38 40 f6 c7 01 75 74 48
89 fa 83
RSP: 0018:ffffa9bec02a8d50 EFLAGS: 00010287
RAX: ffff925b22e80a00 RBX: ffff925ad38d2700 RCX:
fffffffe0a0c8000
RDX: ffff9258ea95bac0 RSI: ffff925ae0a0c800 RDI:
0000000000037a40
RBP: 0000000000000024 R08: 0000000000000000 R09:
0000000000000021
R10: 0000000000000848 R11: 0000000000000000 R12:
ffffa9bec02a8e24
R13: ffff925ad8615570 R14: 0000000000000000 R15:
ffff925b22e80a00
FS: 0000000000000000(0000)
GS:ffff925e47880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff9258ea95baf0 CR3: 0000000166022004 CR4:
0000000000f72ef0
PKRU: 55555554
Call Trace:
<IRQ>
aq_ring_rx_clean+0x175/0xe60 [atlantic]
? aq_ring_rx_clean+0x14d/0xe60 [atlantic]
? aq_ring_tx_clean+0xdf/0x190 [atlantic]
? kmem_cache_free+0x348/0x450
? aq_vec_poll+0x81/0x1d0 [atlantic]
? __napi_poll+0x28/0x1c0
? net_rx_action+0x337/0x420
```
Fixes: 6aecbba12b ("net: atlantic: add check for MAX_SKB_FRAGS")
Changes in v4:
- Add Fixes: tag to satisfy patch validation requirements.
Changes in v3:
- Fix by assuming there will be an extra frag if buff->len > AQ_CFG_RX_HDR_SIZE,
then all fragments are accounted for.
Signed-off-by: Jiefeng Zhang <jiefeng.z.zhang@gmail.com>
Link: https://patch.msgid.link/20251126032249.69358-1-jiefeng.z.zhang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit a2fb4bc4e2 ("net: implement virtio helpers to handle UDP
GSO tunneling.") inadvertently altered checksum offload behavior
for guests not using UDP GSO tunneling.
Before, tun_put_user called tun_vnet_hdr_from_skb, which passed
has_data_valid = true to virtio_net_hdr_from_skb.
After, tun_put_user began calling tun_vnet_hdr_tnl_from_skb instead,
which passes has_data_valid = false into both call sites.
This caused virtio hdr flags to not include VIRTIO_NET_HDR_F_DATA_VALID
for SKBs where skb->ip_summed == CHECKSUM_UNNECESSARY. As a result,
guests are forced to recalculate checksums unnecessarily.
Restore the previous behavior by ensuring has_data_valid = true is
passed in the !tnl_gso_type case, but only from tun side, as
virtio_net_hdr_tnl_from_skb() is used also by the virtio_net driver,
which in turn must not use VIRTIO_NET_HDR_F_DATA_VALID on tx.
cc: stable@vger.kernel.org
Fixes: a2fb4bc4e2 ("net: implement virtio helpers to handle UDP GSO tunneling.")
Signed-off-by: Jon Kohler <jon@nutanix.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20251125222754.1737443-1-jon@nutanix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a potential counter roll-over issue in fbnic_mbx_alloc_rx_msgs()
when calculating descriptor slots. The issue occurs when head - tail
results in a large positive value (unsigned) and the compiler interprets
head - tail - 1 as a signed value.
Since FBNIC_IPC_MBX_DESC_LEN is a power of two, use a masking operation,
which is a common way of avoiding this problem when dealing with these
sort of ring space calculations.
Fixes: da3cde0820 ("eth: fbnic: Add FW communication mechanism")
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20251125211704.3222413-1-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When __mptcp_retrans() kicks-in, it schedules one or more subflows for
retransmission, but such subflows could be actually left alone if there
is no more data to retransmit and/or in case of concurrent fallback.
Scheduled subflows could be processed much later in time, i.e. when new
data will be transmitted, leading to bad subflow selection.
Explicitly clear all scheduled subflows before leaving the
retransmission function.
Fixes: ee2708aeda ("mptcp: use get_retrans wrapper")
Cc: stable@vger.kernel.org
Reported-by: Filip Pokryvka <fpokryvk@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251125-net-mptcp-clear-sched-rtx-v1-1-1cea4ad2165f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When using the SGMII PCS as a fixed-link chip-to-chip connection, it is
easy to miss the fact that traffic passes only at 1G, since that's what
any normal such connection would use.
When using the SGMII PCS connected towards an on-board PHY or an SFP
module, it is immediately noticeable that when the link resolves to a
speed other than 1G, traffic from the MAC fails to pass: TX counters
increase, but nothing gets decoded by the other end, and no local RX
counters increase either.
Artificially lowering a fixed-link rate to speed = <100> makes us able
to see the same issue as in the case of having an SGMII PHY.
Some debugging shows that the XPCS configuration is A-OK, but that the
MAC Configuration Table entry for the port has the SPEED bits still set
to 1000Mbps, due to a special condition in the driver. Deleting that
condition, and letting the resolved link speed be programmed directly
into the MAC speed field, results in a functional link at all 3 speeds.
This piece of evidence, based on testing on both generations with SGMII
support (SJA1105S and SJA1110A) directly contradicts the statement from
the blamed commit that "the MAC is fixed at 1 Gbps and we need to
configure the PCS only (if even that)". Worse, that statement is not
backed by any documentation, and no one from NXP knows what it might
refer to.
I am unable to recall sufficient context regarding my testing from March
2020 to understand what led me to draw such a braindead and factually
incorrect conclusion. Yet, there is nothing of value regarding forcing
the MAC speed, either for SGMII or 2500Base-X (introduced at a later
stage), so remove all such logic.
Fixes: ffe10e679c ("net: dsa: sja1105: Add support for the SGMII port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251122111324.136761-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When discarding descriptors with IN_ORDER, we should rewind
next_avail_head otherwise it would run out of sync with
last_avail_idx. This would cause driver to report
"id X is not a head".
Fixing this by returning the number of descriptors that is used for
each buffer via vhost_get_vq_desc_n() so caller can use the value
while discarding descriptors.
Fixes: 67a873df0c ("vhost: basic in order support")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20251120022950.10117-1-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With the previous commit revamping the timeout handling, started isn't
used anymore. It could be taken into account by adjusting the initial
value of the timeout, but there is little point as both callers capture
the timestamp shortly before calling __ceph_open_session() -- the only
thing of note that happens in the interim is taking client->mount_mutex
and that isn't expected to take multiple seconds.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
The wait loop in __ceph_open_session() can race with the client
receiving a new monmap or osdmap shortly after the initial map is
received. Both ceph_monc_handle_map() and handle_one_map() install
a new map immediately after freeing the old one
kfree(monc->monmap);
monc->monmap = monmap;
ceph_osdmap_destroy(osdc->osdmap);
osdc->osdmap = newmap;
under client->monc.mutex and client->osdc.lock respectively, but
because neither is taken in have_mon_and_osd_map() it's possible for
client->monc.monmap->epoch and client->osdc.osdmap->epoch arms in
client->monc.monmap && client->monc.monmap->epoch &&
client->osdc.osdmap && client->osdc.osdmap->epoch;
condition to dereference an already freed map. This happens to be
reproducible with generic/395 and generic/397 with KASAN enabled:
BUG: KASAN: slab-use-after-free in have_mon_and_osd_map+0x56/0x70
Read of size 4 at addr ffff88811012d810 by task mount.ceph/13305
CPU: 2 UID: 0 PID: 13305 Comm: mount.ceph Not tainted 6.14.0-rc2-build2+ #1266
...
Call Trace:
<TASK>
have_mon_and_osd_map+0x56/0x70
ceph_open_session+0x182/0x290
ceph_get_tree+0x333/0x680
vfs_get_tree+0x49/0x180
do_new_mount+0x1a3/0x2d0
path_mount+0x6dd/0x730
do_mount+0x99/0xe0
__do_sys_mount+0x141/0x180
do_syscall_64+0x9f/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
Allocated by task 13305:
ceph_osdmap_alloc+0x16/0x130
ceph_osdc_init+0x27a/0x4c0
ceph_create_client+0x153/0x190
create_fs_client+0x50/0x2a0
ceph_get_tree+0xff/0x680
vfs_get_tree+0x49/0x180
do_new_mount+0x1a3/0x2d0
path_mount+0x6dd/0x730
do_mount+0x99/0xe0
__do_sys_mount+0x141/0x180
do_syscall_64+0x9f/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 9475:
kfree+0x212/0x290
handle_one_map+0x23c/0x3b0
ceph_osdc_handle_map+0x3c9/0x590
mon_dispatch+0x655/0x6f0
ceph_con_process_message+0xc3/0xe0
ceph_con_v1_try_read+0x614/0x760
ceph_con_workfn+0x2de/0x650
process_one_work+0x486/0x7c0
process_scheduled_works+0x73/0x90
worker_thread+0x1c8/0x2a0
kthread+0x2ec/0x300
ret_from_fork+0x24/0x40
ret_from_fork_asm+0x1a/0x30
Rewrite the wait loop to check the above condition directly with
client->monc.mutex and client->osdc.lock taken as appropriate. While
at it, improve the timeout handling (previously mount_timeout could be
exceeded in case wait_event_interruptible_timeout() slept more than
once) and access client->auth_err under client->monc.mutex to match
how it's set in finish_auth().
monmap_show() and osdmap_show() now take the respective lock before
accessing the map as well.
Cc: stable@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Pull ring-buffer fix from Steven Rostedt:
- Do not allow mmapped ring buffer to be split
When the ring buffer VMA is split by a partial munmap or a MAP_FIXED,
the kernel calls vm_ops->close() on each portion. This causes the
ring_buffer_unmap() to be called multiple times. This causes
subsequent calls to return -ENODEV and triggers a warning.
There's no reason to allow user space to split up memory mapping of
the ring buffer. Have it return -EINVAL when that happens.
* tag 'trace-ringbuffer-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix WARN_ON in tracing_buffers_mmap_close for split VMAs
Prior to commit a25e7962db ("PCI/P2PDMA: Refactor the p2pdma mapping
helpers"), P2P segments were mapped using the pci_p2pdma_map_segment()
helper. This helper was responsible for populating sg->dma_address,
marking the bus address, and also setting sg_dma_len(sg).
The refactor[1] removed this helper and moved the mapping logic directly
into the callers. While iommu_dma_map_sg() was correctly updated to set
the length in the new flow, it was missed in dma_direct_map_sg().
Thus, in dma_direct_map_sg(), the PCI_P2PDMA_MAP_BUS_ADDR case sets the
dma_address and marks the segment, but immediately executes 'continue',
which causes the loop to skip the standard assignment logic at the end:
sg_dma_len(sg) = sg->length;
As a result, when CONFIG_NEED_SG_DMA_LENGTH is enabled, the dma_length
field remains uninitialized (zero) for P2P bus address mappings. This
breaks upper-layer drivers (for e.g. RDMA/IB) that rely on sg_dma_len()
to determine the transfer size.
Fix this by explicitly setting the DMA length in the
PCI_P2PDMA_MAP_BUS_ADDR case before continuing to the next scatterlist
entry.
Fixes: a25e7962db ("PCI/P2PDMA: Refactor the p2pdma mapping helpers")
Reported-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
[1]
https://lore.kernel.org/all/ac14a0e94355bf898de65d023ccf8a2ad22a3ece.1746424934.git.leon@kernel.org/
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shivaji Kant <shivajikant@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20251126114112.3694469-1-praan@google.com
Pull misc fixes from Andrew Morton:
"8 hotfixes. 4 are cc:stable, 7 are against mm/.
All are singletons - please see the respective changelogs for details"
* tag 'mm-hotfixes-stable-2025-11-26-11-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/filemap: fix logic around SIGBUS in filemap_map_pages()
mm/huge_memory: fix NULL pointer deference when splitting folio
MAINTAINERS: add test_kho to KHO's entry
mailmap: add entry for Sam Protsenko
selftests/mm: fix division-by-zero in uffd-unit-tests
mm/mmap_lock: reset maple state on lock_vma_under_rcu() retry
mm/memfd: fix information leak in hugetlb folios
mm: swap: remove duplicate nr_swap_pages decrement in get_swap_page_of_type()
The driver is doing a 64-bit divide, rather than using the proper
helpers, causing link errors on i386 allyesconfig builds:
x86_64-linux-ld: drivers/power/supply/intel_dc_ti_battery.o: in function `dc_ti_battery_get_voltage_and_current_now':
intel_dc_ti_battery.c:(.text+0x5c): undefined reference to `__udivdi3'
x86_64-linux-ld: intel_dc_ti_battery.c:(.text+0x96): undefined reference to `__udivdi3'
and while fixing that, fix the double rounding: keep the timing
difference in nanoseconds ('ktime'), and then just convert to usecs at
the end.
Not because the timing precision is likely to matter, but because doing
it right also makes the code simpler.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Hans de Goede <hansg@kernel.org>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
That was already the limit with KASAN enabled, and the 32-bit x86 build
ends up having a couple of drm cases that have stack frames _just_ over
1kB on my allmodconfig test. So the minimal fix for this build issue
for now is to just bump the limit and make it independent of KASAN.
[ Side note: XTENSA already used 1.5k and PARISC uses 2k, so 1280 is
still relatively conservative ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All changes are device-specific and
trivial, mostly HD-audio and USB-audio quirks and fixups"
* tag 'sound-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for HP ProBook 450 G8
ALSA: usb-audio: fix uac2 clock source at terminal parser
ALSA: hda/realtek: add quirk for HP pavilion aero laptop 13z-be200
ALSA: hda/cirrus fix cs420x MacPro 6,1 inverted jack detection
ALSA: usb-audio: Add DSD quirk for LEAK Stereo 230
ALSA: au88x0: Fix incorrect error handling for PCI config reads
Pull ACPI fix from Rafael Wysocki:
"Revert a commit that attempted to make the code in the ACPI processor
driver more straightforward, but it turned out to cause the kernel to
crash on at least one system, along with some further cleanups on top
of it"
* tag 'acpi-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: processor: idle: Optimize ACPI idle driver registration"
Revert "ACPI: processor: Remove unused empty stubs of some functions"
Revert "ACPI: processor: idle: Rearrange declarations in header file"
Revert "ACPI: processor: idle: Redefine two functions as void"
Revert "ACPI: processor: Do not expose global variable acpi_idle_driver"
The commit 5cff263606 ("can: rcar_canfd: Fix controller mode setting")
has aligned with the flow mentioned in the hardware manual for all SoCs
except R-Car Gen3 and RZ/G2L SoCs. On R-Car Gen4 and RZ/G3E SoCs, due to
the wrong logic in the commit[1] sets the default mode to FD-Only mode
instead of CAN-FD mode.
This patch sets the CAN-FD mode as the default for all SoCs by dropping
the rcar_canfd_set_mode() as some SoC requires mode setting in global
reset mode, and the rest of the SoCs in channel reset mode and update the
rcar_canfd_reset_controller() to take care of these constraints. Moreover,
the RZ/G3E and R-Car Gen4 SoCs support 3 modes compared to 2 modes on the
R-Car Gen3. Use inverted logic in rcar_canfd_reset_controller() to
simplify the code later to support FD-only mode.
[1]
commit 45721c406d ("can: rcar_canfd: Add support for r8a779a0 SoC")
Fixes: 5cff263606 ("can: rcar_canfd: Fix controller mode setting")
Cc: stable@vger.kernel.org
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20251118123926.193445-1-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
My laptop, HP ProBook 450 G8 (32M40EA), has Realtek ALC236 codec on its
integrated sound card, and uses GPIO pins 0x2 and 0x1 for speaker mute
and mic mute LEDs correspondingly, as found out by me through hda-verb
invocations. This matches the GPIO masks used by the
alc236_fixup_hp_gpio_led() function.
PCI subsystem vendor and device IDs happen to be 0x103c and 0x8a75,
which has not been covered in the ALC2xx driver code yet.
Signed-off-by: Ilyas Gasanov <public@gsnoff.com>
Link: https://patch.msgid.link/20251125235441.53629-1-public@gsnoff.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since commit 30f241fcf5 ("xsk: Fix immature cq descriptor
production"), the descriptor number is stored in skb control block and
xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto
pool's completion queue.
skb control block shouldn't be used for this purpose as after transmit
xsk doesn't have control over it and other subsystems could use it. This
leads to the following kernel panic due to a NULL pointer dereference.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy) Debian 6.16.12-1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
RIP: 0010:xsk_destruct_skb+0xd0/0x180
[...]
Call Trace:
<IRQ>
? napi_complete_done+0x7a/0x1a0
ip_rcv_core+0x1bb/0x340
ip_rcv+0x30/0x1f0
__netif_receive_skb_one_core+0x85/0xa0
process_backlog+0x87/0x130
__napi_poll+0x28/0x180
net_rx_action+0x339/0x420
handle_softirqs+0xdc/0x320
? handle_edge_irq+0x90/0x1e0
do_softirq.part.0+0x3b/0x60
</IRQ>
<TASK>
__local_bh_enable_ip+0x60/0x70
__dev_direct_xmit+0x14e/0x1f0
__xsk_generic_xmit+0x482/0xb70
? __remove_hrtimer+0x41/0xa0
? __xsk_generic_xmit+0x51/0xb70
? _raw_spin_unlock_irqrestore+0xe/0x40
xsk_sendmsg+0xda/0x1c0
__sys_sendto+0x1ee/0x200
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x84/0x2f0
? __pfx_pollwake+0x10/0x10
? __rseq_handle_notify_resume+0xad/0x4c0
? restore_fpregs_from_fpstate+0x3c/0x90
? switch_fpu_return+0x5b/0xe0
? do_syscall_64+0x204/0x2f0
? do_syscall_64+0x204/0x2f0
? do_syscall_64+0x204/0x2f0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
[...]
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Instead use the skb destructor_arg pointer along with pointer tagging.
As pointers are always aligned to 8B, use the bottom bit to indicate
whether this a single address or an allocated struct containing several
addresses.
Fixes: 30f241fcf5 ("xsk: Fix immature cq descriptor production")
Closes: https://lore.kernel.org/netdev/0435b904-f44f-48f8-afb0-68868474bf1c@nop.hu/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251124171409.3845-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There have been reports that RTL8127 hangs on suspend and shutdown,
partially disappearing from lspci until power-cycling.
According to Realtek disabling PLL's when switching to D3 should be
avoided on that chip version. Fix this by aligning disabling PLL's
with the vendor drivers, what in addition results in PLL's not being
disabled when switching to D3hot on other chip versions.
Fixes: f24f7b2f3a ("r8169: add support for RTL8127A")
Tested-by: Fabio Baltieri <fabio.baltieri@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/d7faae7e-66bc-404a-a432-3a496600575f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, when skb is null, the driver prints an error and then
dereferences skb on the next line.
To fix this, let's add a 'break' after the error message to switch
to sxgbe_rx_refill(), which is similar to the approach taken by the
other drivers in this particular case, e.g. calxeda with xgmac_rx().
Found during a code review.
Fixes: 1edb9ca69e ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver")
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251121123834.97748-1-aleksei.kodanev@bell-sw.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Attempting to add a port device that is already up will expectedly fail,
but not before modifying the team device header_ops.
In the case of the syzbot reproducer the gre0 device is
already in state UP when it attempts to add it as a
port device of team0, this fails but before that
header_ops->create of team0 is changed from eth_header to ipgre_header
in the call to team_dev_type_check_change.
Later when we end up in ipgre_header() struct ip_tunnel* points to nonsense
as the private data of the device still holds a struct team.
Example sequence of iproute2 commands to reproduce the hang/BUG():
ip link add dev team0 type team
ip link add dev gre0 type gre
ip link set dev gre0 up
ip link set dev gre0 master team0
ip link set dev team0 up
ping -I team0 1.1.1.1
Move team_dev_type_check_change down where all other checks have passed
as it changes the dev type with no way to restore it in case
one of the checks that follow it fail.
Also make sure to preserve the origial mtu assignment:
- If port_dev is not the same type as dev, dev takes mtu from port_dev
- If port_dev is the same type as dev, port_dev takes mtu from dev
This is done by adding a conditional before the call to dev_set_mtu
to prevent it from assigning port_dev->mtu = dev->mtu and instead
letting team_dev_type_check_change assign dev->mtu = port_dev->mtu.
The conditional is needed because the patch moves the call to
team_dev_type_check_change past dev_set_mtu.
Testing:
- team device driver in-tree selftests
- Add/remove various devices as slaves of team device
- syzbot
Reported-by: syzbot+a2a3b519de727b0f7903@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a2a3b519de727b0f7903
Fixes: 1d76efe157 ("team: add support for non-ethernet devices")
Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20251122002027.695151-1-zlatistiv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The rate limiting validation condition currently checks the output
variable max_bw_value[i] instead of the input value
maxrate->tc_maxrate[i]. This causes the validation to compare an
uninitialized or stale value rather than the actual requested rate.
The condition should check the input rate to properly validate against
the upper limit:
} else if (maxrate->tc_maxrate[i] <= upper_limit_gbps) {
This aligns with the pattern used in the first branch, which correctly
checks maxrate->tc_maxrate[i] against upper_limit_mbps.
The current implementation can lead to unreliable validation behavior:
- For rates between 25.5 Gbps and 255 Gbps, if max_bw_value[i] is 0
from initialization, the GBPS path may be taken regardless of whether
the actual rate is within bounds
- When processing multiple TCs (i > 0), max_bw_value[i] contains the
value computed for the previous TC, affecting the validation logic
- The overflow check for rates exceeding 255 Gbps may not trigger
consistently depending on previous array values
This patch ensures the validation correctly examines the requested rate
value for proper bounds checking.
Fixes: 43b27d1bd8 ("net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps")
Signed-off-by: Danielle Costantino <dcostantino@meta.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20251124180043.2314428-1-dcostantino@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When having a multiuser mount with domain= specified and using
cifscreds, cifs_set_cifscreds() will end up setting @ctx->domainname,
so it needs to be freed before leaving cifs_construct_tcon().
This fixes the following memory leak reported by kmemleak:
mount.cifs //srv/share /mnt -o domain=ZELDA,multiuser,...
su - testuser
cifscreds add -d ZELDA -u testuser
...
ls /mnt/1
...
umount /mnt
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8881203c3f08 (size 8):
comm "ls", pid 5060, jiffies 4307222943
hex dump (first 8 bytes):
5a 45 4c 44 41 00 cc cc ZELDA...
backtrace (crc d109a8cf):
__kmalloc_node_track_caller_noprof+0x572/0x710
kstrdup+0x3a/0x70
cifs_sb_tlink+0x1209/0x1770 [cifs]
cifs_get_fattr+0xe1/0xf50 [cifs]
cifs_get_inode_info+0xb5/0x240 [cifs]
cifs_revalidate_dentry_attr+0x2d1/0x470 [cifs]
cifs_getattr+0x28e/0x450 [cifs]
vfs_getattr_nosec+0x126/0x180
vfs_statx+0xf6/0x220
do_statx+0xab/0x110
__x64_sys_statx+0xd5/0x130
do_syscall_64+0xbb/0x380
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: f2aee329a6 ("cifs: set domainName when a domain-key is used in multiuser")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Jay Shin <jaeshin@redhat.com>
Cc: stable@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
xe_guc_ct_init_noalloc() allocates the CT workqueue and other helpers
before it tries to initialize ct->lock. If drmm_mutex_init() fails
we currently bail out without releasing those resources because the
guc_ct_fini() hasn’t been registered yet.
Since destroy_workqueue() in guc_ct_fini() may flush the workqueue, which
in turn can take the ct lock, the initialization sequence is restructured
to first initialize the ct->lock, then set up all CT state, and finally
register guc_ct_fini().
v2: guc_ct_fini() does take ct lock. (Matt)
v3: move primelockdep() together with drmm_mutex_init(). (Lucas)
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251110184522.1581001-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 2e4ad5b0667244f496783c58de0995b9562d3344)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Pull arm64 fixes from Will Deacon:
"We've got a revert due to one of the recent CCA commits breaking ACPI
firmware-based error reporting, a fix for a hard-lockup introduced by
a prior fix affecting non-default (CONFIG_EXPERT) configurations and
another ACPI fix for systems using MMIO-based timers.
Other than that, we're looking pretty good.
- Avoid hardlockup when CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY=n
- Fix regression in APEI/GHES error handling
- Fix MMIO timers when probed via ACPI"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: proton-pack: Fix hard lockup when !MITIGATE_SPECTRE_BRANCH_HISTORY
ACPI: GTDT: Correctly number platform devices for MMIO timers
Revert "arm64: acpi: Enable ACPI CCEL support"
Pull iommufd fixes from Jason Gunthorpe:
"Two build fixes, no functional change:
- Fix a possible compiler error around counted_by() due to wrong
initialization order
- Fix a -Wflex-array-member-not-at-end"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd/iommufd_private.h: Avoid -Wflex-array-member-not-at-end warning
iommufd/driver: Fix counter initialization for counted_by annotation
Since 8b3a087f7f ("ALSA: usb-audio: Unify virtual type units type to
UAC3 values") usb-audio is using UAC3_CLOCK_SOURCE instead of
bDescriptorSubtype, later refactored with e0ccdef926 ("ALSA: usb-audio:
Clean up check_input_term()") into parse_term_uac2_clock_source().
This breaks the clock source selection for at least my
1397:0003 BEHRINGER International GmbH FCA610 Pro.
Fix by using UAC2_CLOCK_SOURCE in parse_term_uac2_clock_source().
Fixes: 8b3a087f7f ("ALSA: usb-audio: Unify virtual type units type to UAC3 values")
Signed-off-by: René Rebe <rene@exactco.de>
Link: https://patch.msgid.link/20251125.154149.1121389544970412061.rene@exactco.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
To initialize the taprio block in lan966x, it is required to configure
the register REVISIT_DLY. The purpose of this register is to set the
delay before revisit the next gate and the value of this register depends
on the system clock. The problem is that the we calculated wrong the value
of the system clock period in picoseconds. The actual system clock is
~165.617754MHZ and this correspond to a period of 6038 pico seconds and
not 15125 as currently set.
Fixes: e462b27173 ("net: lan966x: Add offload support for taprio")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251121061411.810571-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Revert commit 5020d05b34 ("ACPI: processor: Remove unused empty stubs
of some functions") because it depends on a problematic one.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Revert commit bdf780fbce ("ACPI: processor: idle: Rearrange declarations
in header file") because it depends on a problematic one.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Revert commit fbd401e95e ("ACPI: processor: idle: Redefine two
functions as void") because it depends on a problematic one.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Revert commit 559f2eacc8 ACPI: processor: Do not expose global variable
acpi_idle_driver" because it depends on a problematic one.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The selective fetch code doesn't handle asycn flips correctly.
There is a nonsense check for async flips in
intel_psr2_sel_fetch_config_valid() but that only gets called
for modesets/fastsets and thus does nothing for async flips.
Currently intel_async_flip_check_hw() is very unhappy as the
selective fetch code pulls in planes that are not even async
flips capable.
Reject async flips when selective fetch is enabled, until
someone fixes this properly (ie. disable selective fetch while
async flips are being issued).
Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20251105171015.22234-1-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
(cherry picked from commit a5f0cc8e0cd4007370af6985cb152001310cf20c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
gpy_update_interface() returns early in case the PHY is internal or
connected via USXGMII. In this case the gigabit master/slave property
as well as MDI/MDI-X status also won't be read which seems wrong.
Always read those properties by moving the logic to retrieve them to
gpy_read_status().
Fixes: fd8825cd8c ("net: phy: mxl-gpy: Add PHY Auto/MDI/MDI-X set driver for GPY211 chips")
Fixes: 311abcdddc ("net: phy: add support to get Master-Slave configuration")
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/71fccf3f56742116eb18cc070d2a9810479ea7f9.1763650701.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Protect access to fore200e->available_cell_rate with rate_mtx lock in the
error handling path of fore200e_open() to prevent a data race.
The field fore200e->available_cell_rate is a shared resource used to track
available bandwidth. It is concurrently accessed by fore200e_open(),
fore200e_close(), and fore200e_change_qos().
In fore200e_open(), the lock rate_mtx is correctly held when subtracting
vcc->qos.txtp.max_pcr from available_cell_rate to reserve bandwidth.
However, if the subsequent call to fore200e_activate_vcin() fails, the
function restores the reserved bandwidth by adding back to
available_cell_rate without holding the lock.
This introduces a race condition because available_cell_rate is a global
device resource shared across all VCCs. If the error path in
fore200e_open() executes concurrently with operations like
fore200e_close() or fore200e_change_qos() on other VCCs, a
read-modify-write race occurs.
Specifically, the error path reads the rate without the lock. If another
CPU acquires the lock and modifies the rate (e.g., releasing bandwidth in
fore200e_close()) between this read and the subsequent write, the error
path will overwrite the concurrent update with a stale value. This results
in incorrect bandwidth accounting.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251120120657.2462194-1-hanguidong02@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bastien Curutchet says:
====================
net: dsa: microchip: Fix resource releases in error path
I worked on adding PTP support for the KSZ8463. While doing so, I ran
into a few bugs in the resource release process that occur when things go
wrong arount IRQ initialization.
This small series fixes those bugs.
The next series, which will add the PTP support, depend on this one.
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
---
Bastien Curutchet (Schneider Electric) (5):
net: dsa: microchip: common: Fix checks on irq_find_mapping()
net: dsa: microchip: ptp: Fix checks on irq_find_mapping()
net: dsa: microchip: Don't free uninitialized ksz_irq
net: dsa: microchip: Free previously initialized ports on init failures
net: dsa: microchip: Fix symetry in ksz_ptp_msg_irq_{setup/free}()
drivers/net/dsa/microchip/ksz_common.c | 31 +++++++++++++++----------------
drivers/net/dsa/microchip/ksz_ptp.c | 22 +++++++++-------------
2 files changed, 24 insertions(+), 29 deletions(-)
---
base-commit: 09652e543e809c2369dca142fee5d9b05be9bdc7
change-id: 20251031-ksz-fix-db345df7635f
Best regards,
====================
Link: https://patch.msgid.link/20251120-ksz-fix-v6-0-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The IRQ numbers created through irq_create_mapping() are only assigned
to ptpmsg_irq[n].num at the end of the IRQ setup. So if an error occurs
between their creation and their assignment (for instance during the
request_threaded_irq() step), we enter the error path and fail to
release the newly created virtual IRQs because they aren't yet assigned
to ptpmsg_irq[n].num.
Move the mapping creation to ksz_ptp_msg_irq_setup() to ensure symetry
with what's released by ksz_ptp_msg_irq_free().
In the error path, move the irq_dispose_mapping to the out_ptp_msg label
so it will be called only on created IRQs.
Cc: stable@vger.kernel.org
Fixes: cc13ab18b2 ("net: dsa: microchip: ptp: enable interrupt for timestamping")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-5-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If a port interrupt setup fails after at least one port has already been
successfully initialized, the gotos miss some resource releasing:
- the already initialized PTP IRQs aren't released
- the already initialized port IRQs aren't released if the failure
occurs in ksz_pirq_setup().
Merge 'out_girq' and 'out_ptpirq' into a single 'port_release' label.
Behind this label, use the reverse loop to release all IRQ resources
for all initialized ports.
Jump in the middle of the reverse loop if an error occurs in
ksz_ptp_irq_setup() to only release the port IRQ of the current
iteration.
Cc: stable@vger.kernel.org
Fixes: c9cd961c0d ("net: dsa: microchip: lan937x: add interrupt support for port phy link")
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-4-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If something goes wrong at setup, ksz_irq_free() can be called on
uninitialized ksz_irq (for example when ksz_ptp_irq_setup() fails). It
leads to freeing uninitialized IRQ numbers and/or domains.
Use dsa_switch_for_each_user_port_continue_reverse() in the error path
to iterate only over the fully initialized ports.
Cc: stable@vger.kernel.org
Fixes: cc13ab18b2 ("net: dsa: microchip: ptp: enable interrupt for timestamping")
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-3-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
irq_find_mapping() returns a positive IRQ number or 0 if no IRQ is found
but it never returns a negative value. However, during the PTP IRQ setup,
we verify that its returned value isn't negative.
Fix the irq_find_mapping() check to enter the error path when 0 is
returned. Return -EINVAL in such case.
Cc: stable@vger.kernel.org
Fixes: cc13ab18b2 ("net: dsa: microchip: ptp: enable interrupt for timestamping")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-2-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
irq_find_mapping() returns a positive IRQ number or 0 if no IRQ is found
but it never returns a negative value. However, on each
irq_find_mapping() call, we verify that the returned value isn't
negative.
Fix the irq_find_mapping() checks to enter error paths when 0 is
returned. Return -EINVAL in such cases.
CC: stable@vger.kernel.org
Fixes: c9cd961c0d ("net: dsa: microchip: lan937x: add interrupt support for port phy link")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-1-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Protect vga_switcheroo_client_fb_set() with console lock. Avoids OOB
access in fbcon_remap_all(). Without holding the console lock the call
races with switching outputs.
VGA switcheroo calls fbcon_remap_all() when switching clients. The fbcon
function uses struct fb_info.node, which is set by register_framebuffer().
As the fb-helper code currently sets up VGA switcheroo before registering
the framebuffer, the value of node is -1 and therefore not a legal value.
For example, fbcon uses the value within set_con2fb_map() [1] as an index
into an array.
Moving vga_switcheroo_client_fb_set() after register_framebuffer() can
result in VGA switching that does not switch fbcon correctly.
Therefore move vga_switcheroo_client_fb_set() under fbcon_fb_registered(),
which already holds the console lock. Fbdev calls fbcon_fb_registered()
from within register_framebuffer(). Serializes the helper with VGA
switcheroo's call to fbcon_remap_all().
Although vga_switcheroo_client_fb_set() takes an instance of struct fb_info
as parameter, it really only needs the contained fbcon state. Moving the
call to fbcon initialization is therefore cleaner than before. Only amdgpu,
i915, nouveau and radeon support vga_switcheroo. For all other drivers,
this change does nothing.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://elixir.bootlin.com/linux/v6.17/source/drivers/video/fbdev/core/fbcon.c#L2942 # [1]
Fixes: 6a9ee8af34 ("vga_switcheroo: initial implementation (v15)")
Acked-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-fbdev@vger.kernel.org
Cc: <stable@vger.kernel.org> # v2.6.34+
Link: https://patch.msgid.link/20251105161549.98836-1-tzimmermann@suse.de
Commit c010d47f10 ("mm: thp: split huge page to any lower order pages")
introduced an early check on the folio's order via mapping->flags before
proceeding with the split work.
This check introduced a bug: for shmem folios in the swap cache and
truncated folios, the mapping pointer can be NULL. Accessing
mapping->flags in this state leads directly to a NULL pointer dereference.
This commit fixes the issue by moving the check for mapping != NULL before
any attempt to access mapping->flags.
Link: https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com
Fixes: c010d47f10 ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After commit 4f78252da8, nr_swap_pages is decremented in
swap_range_alloc(). Since cluster_alloc_swap_entry() calls
swap_range_alloc() internally, the decrement in get_swap_page_of_type()
causes double-decrementing.
As a representative userspace-visible runtime example of the impact,
/proc/meminfo reports increasingly inaccurate SwapFree values. The
discrepancy grows with each swap allocation, and during hibernation
when large amounts of memory are written to swap, the reported value
can deviate significantly from actual available swap space, misleading
users and monitoring tools.
Remove the duplicate decrement.
Link: https://lkml.kernel.org/r/20251102082456.79807-1-youngjun.park@lge.com
Fixes: 4f78252da8 ("mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc()")
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Kairui Song <kasong@tencent.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: <stable@vger.kernel.org> [6.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The "drop print" commit removed the whole branch and not just the print.
For some ARM64 cpus, this leads to hard lockup when
CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY is not enabled.
Fixes: 62e72463ca ("arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Will Deacon <will@kernel.org>
This reverts commit d02c2e45b1.
Mauro reports that this breaks APEI notifications on his QEMU setup
because the "reserved for firmware" region still needs to be writable
by Linux in order to signal _back_ to the firmware after processing
the reported error:
| {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
| ...
| [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
| Unable to handle kernel write to read-only memory at virtual address ffff800080035018
| Mem abort info:
| ESR = 0x000000009600004f
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| FSC = 0x0f: level 3 permission fault
| Data abort info:
| ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000
| CM = 0, WnR = 1, TnD = 0, TagAccess = 0
| GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
| swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000505d7000
| pgd=10000000510bc003, p4d=1000000100229403, pud=100000010022a403, pmd=100000010022b403, pte=0060000139b90483
| Internal error: Oops: 000000009600004f [#1] SMP
For now, revert the offending commit. We can presumably switch back to
PAGE_KERNEL when bringing this back in the future.
Link: https://lore.kernel.org/r/20251121224611.07efa95a@foz.lan
Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Pull clk fixes from Stephen Boyd:
"Fixes for the Allwinner A523 clk driver:
- Lower the minimum rate for the A523 audio PLL to support
frequencies required by audio devices
- Mark a couple clks critical on A523 so that Linux doesn't turn them
off when they're used by other code like TF-A"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: sun55i-a523-ccu: Lower audio0 pll minimum rate
clk: sunxi-ng: sun55i-a523-r-ccu: Mark bus-r-dma as critical
clk: sunxi-ng: Mark A523 bus-r-cpucfg clock as critical
Pull timer fixes from Ingo Molnar:
- Fix a race in timer->function clearing in timer_shutdown_sync()
- Fix a timekeeper sysfs-setup resource leak in error paths
- Fix the NOHZ report_idle_softirq() syslog rate-limiting
logic to have no side effects on the return value
* tag 'timers-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Fix NULL function pointer race in timer_shutdown_sync()
timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths
tick/sched: Fix bogus condition in report_idle_softirq()
There is a race condition between timer_shutdown_sync() and timer
expiration that can lead to hitting a WARN_ON in expire_timers().
The issue occurs when timer_shutdown_sync() clears the timer function
to NULL while the timer is still running on another CPU. The race
scenario looks like this:
CPU0 CPU1
<SOFTIRQ>
lock_timer_base()
expire_timers()
base->running_timer = timer;
unlock_timer_base()
[call_timer_fn enter]
mod_timer()
...
timer_shutdown_sync()
lock_timer_base()
// For now, will not detach the timer but only clear its function to NULL
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
if (shutdown)
timer->function = NULL;
unlock_timer_base()
[call_timer_fn exit]
lock_timer_base()
base->running_timer = NULL;
unlock_timer_base()
...
// Now timer is pending while its function set to NULL.
// next timer trigger
<SOFTIRQ>
expire_timers()
WARN_ON_ONCE(!fn) // hit
...
lock_timer_base()
// Now timer will detach
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
if (shutdown)
timer->function = NULL;
unlock_timer_base()
The problem is that timer_shutdown_sync() clears the timer function
regardless of whether the timer is currently running. This can leave a
pending timer with a NULL function pointer, which triggers the
WARN_ON_ONCE(!fn) check in expire_timers().
Fix this by only clearing the timer function when actually detaching the
timer. If the timer is running, leave the function pointer intact, which is
safe because the timer will be properly detached when it finishes running.
Fixes: 0cc04e8045 ("timers: Add shutdown mechanism to the internal functions")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251122093942.301559-1-zouyipeng@huawei.com
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix CPU type in DT for econet
- Fix for Malta PCI MMIO breakage for SOC-it
- Fix TLB shutdown caused by iniital uniquification
- Fix random seg faults due to missed vdso storage requirement
* tag 'mips-fixes_6.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kernel: Fix random segmentation faults
MIPS: mm: Prevent a TLB shutdown on initial uniquification
mips: dts: econet: fix EN751221 core type
MIPS: Malta: Fix !EVA SOC-it PCI MMIO
Pull crypto library fix from Eric Biggers:
"Fix another KMSAN warning that made it in while KMSAN wasn't working
reliably"
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()
Pull xfs fix from Carlos Maiolino:
"A single out-of-bounds fix, nothing special"
* tag 'xfs-fixes-6.18-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix out of bounds memory read error in symlink repair
Pull SCSI fixes from James Bottomley:
"One target driver fix and one scsi-generic one. The latter is 10 lines
because the problem lock has to be dropped and re-taken around the
call causing the sleep in atomic"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sg: Do not sleep in atomic context
scsi: target: tcm_loop: Fix segfault in tcm_loop_tpg_address_show()
Pull input fixes from Dmitry Torokhov:
- INPUT_PROP_HAPTIC_TOUCHPAD definition added early in 6.18 cycle has
been renamed to INPUT_PROP_PRESSUREPAD to better reflect the kind of
devices it is supposed to be set for
- a new ID for a touchscreen found in Ayaneo Flip DS in Goodix driver
- Goodix driver no longer tries to set reset pin as "input" as it
causes issues when there is no pull up resistor installed on the
board
- fixes for cros_ec_keyb, imx_sc_key, and pegasus-notetaker drivers to
deal with potential out-of-bounds access and memory corruption issues
* tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: rename INPUT_PROP_HAPTIC_TOUCHPAD to INPUT_PROP_PRESSUREPAD
Input: cros_ec_keyb - fix an invalid memory access
Input: imx_sc_key - fix memory corruption on unload
Input: pegasus-notetaker - fix potential out-of-bounds access
Input: goodix - remove setting of RST pin to input
Input: goodix - add support for ACPI ID GDIX1003
Pull RISC-V fixes from Paul Walmsley:
- Correct the MIPS RISC-V/JEDEC vendor ID
- Fix the system shutdown behavior in the legacy case where
CONFIG_RISCV_SBI_V01 is set, but the firmware implementation
doesn't support the older v0.1 system shutdown method
- Align some tools/ macro definitions with the corresponding
kernel headers
* tag 'riscv-for-linus-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
tools: riscv: Fixed misalignment of CSR related definitions
riscv: sbi: Prefer SRST shutdown over legacy
riscv: Update MIPS vendor id to 0x127
Pull selinux fixes from Paul Moore:
"Three SELinux patches for v6.18 to fix issues around accessing the
per-task decision cache that we introduced in v6.16 to help reduce
SELinux overhead on path walks. The problem was that despite the cache
being located in the SELinux "task_security_struct", the parent struct
wasn't actually tied to the task, it was tied to a cred.
Historically SELinux did locate the task_security_struct in the
task_struct's security blob, but it was later relocated to the cred
struct when the cred work happened, as it made the most sense at the
time.
Unfortunately we never did the task_security_struct to
cred_security_struct rename work (avoid code churn maybe? who knows)
because it didn't really matter at the time. However, it suddenly
became a problem when we added a per-task cache to a per-cred object
and didn't notice because of the old, no-longer-correct struct naming.
Thanks to KCSAN for flagging this, as the silly humans running things
forgot that the task_security_struct was a big lie.
This contains three patches, only one of which actually fixes the
problem described above and moves the SELinux decision cache from the
per-cred struct to a newly (re)created per-task struct.
The other two patches, which form the bulk of the diffstat, take care
of the associated renaming tasks so we can hopefully avoid making the
same stupid mistake in the future.
For the record, I did contemplate sending just a fix for the cache,
leaving the renaming patches for the upcoming merge window, but the
type/variable naming ended up being pretty awful and would have made
v6.18 an outlier stuck between the "old" names and the "new" names in
v6.19. The renaming patches are also fairly mechanical/trivial and
shouldn't pose much risk despite their size.
TLDR; naming things may be hard, but if you mess it up bad things
happen"
* tag 'selinux-pr-20251121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: rename the cred_security_struct variables to "crsec"
selinux: move avdcache to per-task security struct
selinux: rename task_security_struct to cred_security_struct
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sock: Prevent race in socket write iter and sock bind
- hci_core: Fix triggering cmd_timer for HCI_OP_NOP
- hci_core: lookup hci_conn on RX path on protocol side
- SMP: Fix not generating mackey and ltk when repairing
- btusb: mediatek: Fix kernel crash when releasing mtk iso interface
- btusb: mediatek: Avoid btusb_mtk_claim_iso_intf() NULL deref
* tag 'for-net-2025-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: SMP: Fix not generating mackey and ltk when repairing
Bluetooth: btusb: mediatek: Avoid btusb_mtk_claim_iso_intf() NULL deref
Bluetooth: hci_core: lookup hci_conn on RX path on protocol side
Bluetooth: hci_sock: Prevent race in socket write iter and sock bind
Bluetooth: hci_core: Fix triggering cmd_timer for HCI_OP_NOP
Bluetooth: btusb: mediatek: Fix kernel crash when releasing mtk iso interface
====================
Link: https://patch.msgid.link/20251121145332.177015-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Move the conflicting declaration to the end of the corresponding
structure. Notice that struct iommufd_vevent is a flexible
structure, this is a structure that contains a flexible-array
member.
Fix the following warning:
drivers/iommu/iommufd/iommufd_private.h:621:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Link: https://patch.msgid.link/r/aRHOAwpATIE0oajj@kspp
Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Fixes: e36ba5ab80 ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
One of the requirements for counted_by annotations is that the counter
member must be initialized before the first reference to the
flexible-array member.
Move the vevent->data_len = data_len; initialization to before the
first access to flexible array vevent->event_data.
Link: https://patch.msgid.link/r/aRL7ZFFqM5bRTd2D@kspp
Cc: stable@vger.kernel.org
Fixes: e8e1ef9b77 ("iommufd/viommu: Add iommufd_viommu_report_event helper")
Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull LoongArch fixes from Huacai Chen:
"Use UAPI types in ptrace UAPI header to fix nolibc ptrace.
Fix CPU name display, NUMA node parsing, kexec/kdump, PCI init and BPF
trampoline"
* tag 'loongarch-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: BPF: Disable trampoline for kernel module function trace
LoongArch: Don't panic if no valid cache info for PCI
LoongArch: Mask all interrupts during kexec/kdump
LoongArch: Fix NUMA node parsing with numa_memblks
LoongArch: Consolidate CPU names in /proc/cpuinfo
LoongArch: Use UAPI types in ptrace UAPI header
Pull smb client fixes from Steve French:
- Fix potential memory leak in mount
- Add some missing read tracepoints
- Fix locking issue with directory leases
* tag 'v6.18-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add the smb3_read_* tracepoints to SMB1
cifs: fix memory leak in smb3_fs_context_parse_param error path
smb: client: introduce close_cached_dir_locked()
Pull io_uring fix from Jens Axboe:
"Just a single fix for a mixup of arguments for the skb_queue_splice()
call, in the io_uring timestamp retrieval code"
* tag 'io_uring-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/cmd_net: fix wrong argument types for skb_queue_splice()
Pull ata fixes from Niklas Cassel:
- Add a missing refcount decrement in ata_scsi_dev_rescan() when
the device or its queue is not running.
In the case where the device is running, the recount is already
decremented properly (Yihang Li)
- Generate the proper sense code for a Security locked device.
There was a regression caused by a recent change of how sense
data is generated for commands that did not provide any sense
data. This broke system suspend for Security locked devices.
Generate the sense data that the SCSI disk driver expects for a
Security locked device so that system suspend works again (me)
- Set capacity to zero for a Security locked device.
All I/O commands will be aborted by a Security locked device.
Thus, the block layer disk partition scanning will result in
a bunch of, for the user, confusing I/O errors in dmesg during
boot.
Since a Security locked device is unusable anyway, set the capacity
to zero, to avoid the disk partition scanning during boot. We still
create the block device in /dev such that the user may unlock the
device using e.g. hdparm (me)
* tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Set capacity to zero for a security locked drive
ata: libata-scsi: Fix system suspend for a security locked drive
ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan()
Pull pin control fixes from Linus Walleij:
- Fix register naming in the Mediatek mt8189 driver
- Select REGMAP_MMIO for the Realtek RTD driver
- Fix the number of items in groups in the Toshiba Visconti driver
- Fix a memory leak in the Cirrus CS42L43 driver
- Fix a deadlock (!) in Qualcomm pinmux configuration
- Fix use of uninitialized memory and list initialization in the S32CC
pin controller
* tag 'pinctrl-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
dt-bindings: pinctrl: xlnx,versal-pinctrl: Add missing unevaluatedProperties on '^conf' nodes
pinctrl: s32cc: initialize gpio_pin_config::list after kmalloc()
pinctrl: s32cc: fix uninitialized memory in s32_pinctrl_desc
pinctrl: qcom: msm: Fix deadlock in pinmux configuration
pinctrl: cirrus: Fix fwnode leak in cs42l43_pin_probe()
dt-bindings: pinctrl: toshiba,visconti: Fix number of items in groups
pinctrl: realtek: Select REGMAP_MMIO for RTD driver
pinctrl: mediatek: mt8189: align register base names to dt-bindings ones
pinctrl: mediatek: mt8196: align register base names to dt-bindings ones
Pull gpio fixes from Bartosz Golaszewski:
- fix a use-after-free bug in GPIO character device code
- update MAINTAINERS
* tag 'gpio-fixes-for-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: update my email address
gpio: cdev: make sure the cdev fd is still active before emitting events
Fully initialize *ctx, including the buf field which sha256_init()
doesn't initialize, to avoid a KMSAN warning when comparing *ctx to
orig_ctx. This KMSAN warning slipped in while KMSAN was not working
reliably due to a stackdepot bug, which has now been fixed.
Fixes: 6733968be7 ("lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251121033431.34406-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Pull drm fixes from Dave Airlie:
"A range of small fixes across the board, the i915 display
disambiguation is probably the biggest otherwise amdgpu and xe as
usual with tegra, nouveau, radeon and a core atomic fix.
Looks mostly normal.
atomic:
- Return error codes on failed blob creation for planes
nouveau:
- Fix memory leak
tegra:
- Fix device ref counting
- Fix pid ref counting
- Revert booting on Pixel C
xe:
- Fix out-of-bounds access with BIT()
- Fix kunit test checking wrong condition
- Drop duplicate kconfig select
- Fix guc2host irq handler with MSI-X
i915:
- Wildcat Lake and Panther Lake detangled for display fixes
amdgpu:
- DTBCLK gating fix
- EDID fetching retry improvements
- HDMI HPD debounce filtering
- DCN 2.0 cursor fix
- DP MST PBN fix
- VPE fix
- GC 11 fix
- PRT fix
- MMIO remap page fix
- SR-IOV fix
radeon:
- Fence deadlock fix"
* tag 'drm-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
drm/amdgpu: Add sriov vf check for VCN per queue reset support.
drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags
drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flag
drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabled
drm/amd: Skip power ungate during suspend for VPE
drm/plane: Fix create_in_format_blob() return value
drm/xe/irq: Handle msix vector0 interrupt
drm/xe: Remove duplicate DRM_EXEC selection from Kconfig
drm/xe/kunit: Fix forcewake assertion in mocs test
drm/xe: Prevent BIT() overflow when handling invalid prefetch region
drm/radeon: delete radeon_fence_process in is_signaled, no deadlock
drm/amd/display: Fix pbn to kbps Conversion
drm/amd/display: Clear the CUR_ENABLE register on DCN20 on DPP5
drm/amd/display: Add an HPD filter for HDMI
drm/amd/display: Increase DPCD read retries
drm/amd/display: Move sleep into each retry for retrieve_link_cap()
drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched
drm/i915/xe3: Restrict PTL intel_encoder_is_c10phy() to only PHY A
drm/i915/display: Add definition for wcl as subplatform
drm/pcids: Split PTL pciids group to make wcl subplatform
...
Apparently as of version 2.42, glibc headers define AT_RENAME_NOREPLACE
and some of the other flags for renameat2() and friends in <stdio.h>.
Which would all be fine, except for inexplicable reasons glibc decided
to define them _differently_ from the kernel definitions, which then
makes some of our sample code that includes both kernel headers and user
space headers unhappy, because the compiler will (correctly) complain
about redefining things.
Now, mixing kernel headers and user space headers is always a somewhat
iffy proposition due to namespacing issues, but it's kind of inevitable
in our sample and selftest code. And this is just glibc being stupid.
Those defines come from the kernel, glibc is exposing the kernel
interfaces, and glibc shouldn't make up some random new expressions for
these values.
It's not like glibc headers changed the actual result values, but they
arbitrarily just decided to use a different expression to describe those
values. The kernel just does
#define AT_RENAME_NOREPLACE 0x0001
while glibc does
# define RENAME_NOREPLACE (1 << 0)
# define AT_RENAME_NOREPLACE RENAME_NOREPLACE
instead. Same value in the end, but very different macro definition.
For absolutely no reason.
This has since been fixed in the glibc development tree, so eventually
we'll end up with the canonical expressions and no clashes. But in the
meantime the broken headers are in the glibc-2.42 release and have made
it out into distributions.
Do a minimal work-around to make the samples build cleanly by just
undefining the affected macros in between the user space header include
and the kernel header includes.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ACPI ECRD and ECWR functions have a 10ms sleep at the end. It turns
out, that this is sometimes needed to avoid I2C transmission failures,
especially for functions doing regmap_update_bits (and thus read + write
shortly after each other). This fixes problems like the following
appearing in the kernel log:
leds platform::micmute: Setting an LED's brightness failed (-6)
leds platform::kbd_backlight: Setting an LED's brightness failed (-6)
The ACPI QEVT function used to read the interrupt status register also
has a 10ms sleep at the end. Without that there are problems with
reading multiple events following directly after each other resulting
in the following error message being logged:
thinkpad-t14s-ec 4-0028: Failed to read event
Fixes: 60b7ab6ce0 ("platform: arm64: thinkpad-t14s-ec: new driver")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Link: https://patch.msgid.link/20251119-thinkpad-t14s-ec-improvements-v2-2-441219857c02@kernel.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Fix a race condition, that an input key related interrupt might be
triggered before the input handler has been registered, which results
in a NULL pointer dereference. This can happen if the user enables
the keyboard backlight shortly before the driver is being probed.
This fixes the following backtrace visible in dmesg:
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e0
...
Call trace:
sparse_keymap_report_event+0x2c/0x978 [sparse_keymap] (P)
t14s_ec_irq_handler+0x190/0x3e8 [lenovo_thinkpad_t14s]
irq_thread_fn+0x30/0xb8
irq_thread+0x18c/0x3b0
kthread+0x148/0x228
ret_from_fork+0x10/0x20
Fixes: 60b7ab6ce0 ("platform: arm64: thinkpad-t14s-ec: new driver")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Reviewed-by: Bryan O'Donoghue <bod@kernel.org>
Link: https://patch.msgid.link/20251119-thinkpad-t14s-ec-improvements-v2-1-441219857c02@kernel.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
The sii902x driver was caching HDMI detection state in a sink_is_hdmi field
and checking it in mode_set() to determine whether to set HDMI or DVI
output mode. This approach had two problems:
1. With DRM_BRIDGE_ATTACH_NO_CONNECTOR (used by modern display drivers like
TIDSS), the bridge's get_modes() is never called. Instead, the
drm_bridge_connector helper calls the bridge's edid_read() and updates the
connector itself. This meant sink_is_hdmi was never populated, causing the
driver to default to DVI mode and breaking HDMI audio.
2. The mode_set() callback doesn't receive atomic state or connector
pointer, making it impossible to check connector->display_info.is_hdmi
directly at that point.
Fix this by moving the HDMI vs DVI decision from mode_set() to
atomic_enable(), where we can access the connector via
drm_atomic_get_new_connector_for_encoder(). This works for both connector
models:
- With DRM_BRIDGE_ATTACH_NO_CONNECTOR: Returns the drm_bridge_connector
created by the display driver, which has already been updated by the
helper's call to drm_edid_connector_update()
- Without DRM_BRIDGE_ATTACH_NO_CONNECTOR (legacy): Returns the connector
embedded in sii902x struct, which gets updated by the bridge's own
get_modes()
Fixes: 3de47e1309 ("drm/bridge: sii902x: use display info is_hdmi")
Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251030151635.3019864-1-devarsht@ti.com
Commit 69896119dc ("MIPS: vdso: Switch to generic storage
implementation") switches to a generic vdso storage, which increases
the number of data pages from 1 to 4. But there is only one page
reserved, which causes segementation faults depending where the VDSO
area is randomized to. To fix this use the same size of reservation
and allocation of the VDSO data pages.
Fixes: 69896119dc ("MIPS: vdso: Switch to generic storage implementation")
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Depending on the particular CPU implementation a TLB shutdown may occur
if multiple matching entries are detected upon the execution of a TLBP
or the TLBWI/TLBWR instructions. Given that we don't know what entries
we have been handed we need to be very careful with the initial TLB
setup and avoid all these instructions.
Therefore read all the TLB entries one by one with the TLBR instruction,
bypassing the content addressing logic, and truncate any large pages in
place so as to avoid a case in the second step where an incoming entry
for a large page at a lower address overlaps with a replacement entry
chosen at another index. Then preinitialize the TLB using addresses
outside our usual unique range and avoiding clashes with any entries
received, before making the usual call to local_flush_tlb_all().
This fixes (at least) R4x00 cores if TLBP hits multiple matching TLB
entries (SGI IP22 PROM for examples sets up all TLBs to the same virtual
address).
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 35ad7e1815 ("MIPS: mm: tlb-r4k: Uniquify TLB entries on init")
Cc: stable@vger.kernel.org
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> # Boston I6400, M5150 sim
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
GFP_NOWAIT allocation may fail anytime. It needs to be changed to
GFP_NOIO. There's no need to handle an error because mempool_alloc with
GFP_NOIO can't fail.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
As explain in commit fa349e396e ("veth: Fix race with AF_XDP exposing
old or uninitialized descriptors") for veth there is a chance after
napi_complete_done() that another CPU can manage start another NAPI
instance running veth_pool(). For NAPI this is correctly handled as the
napi_schedule_prep() check will prevent multiple instances from getting
scheduled, but for the remaining code in veth_pool() this can run
concurrent with the newly started NAPI instance.
The problem/race is that xdp_clear_return_frame_no_direct() isn't
designed to be nested.
Prior to commit 401cb7dae8 ("net: Reference bpf_redirect_info via
task_struct on PREEMPT_RT.") the temporary BPF net context
bpf_redirect_info was stored per CPU, where this wasn't an issue. Since
this commit the BPF context is stored in 'current' task_struct. When
running veth in threaded-NAPI mode, then the kthread becomes the storage
area. Now a race exists between two concurrent veth_pool() function calls
one exiting NAPI and one running new NAPI, both using the same BPF net
context.
Race is when another CPU gets within the xdp_set_return_frame_no_direct()
section before exiting veth_pool() calls the clear-function
xdp_clear_return_frame_no_direct().
Fixes: 401cb7dae8 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/176356963888.337072.4805242001928705046.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In atm_init(), if atmsvc_init() fails, the code jumps to out_atmpvc_exit
label which incorrectly calls atmsvc_exit() instead of atmpvc_exit().
This results in calling the wrong cleanup function and failing to properly
clean up atmpvc_init().
Fix this by calling atmpvc_exit() in the out_atmpvc_exit error path.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Sayooj K Karun <sayooj@aerlync.com>
Link: https://patch.msgid.link/20251119085747.67139-1-sayooj@aerlync.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The change eed467b517 ("Bluetooth: fix passkey uninitialized when used")
introduced a goto that bypasses the creation of temporary mackey and ltk
which are later used by the likes of DHKey Check step.
Later ffee202a78 ("Bluetooth: Always request for user confirmation for
Just Works (LE SC)") which means confirm_hint is always set in case
JUST_WORKS so the branch checking for an existing LTK becomes pointless
as confirm_hint will always be set, so this just merge both cases of
malicious or legitimate devices to be confirmed before continuing with the
pairing procedure.
Link: https://github.com/bluez/bluez/issues/1622
Fixes: eed467b517 ("Bluetooth: fix passkey uninitialized when used")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
In btusb_mtk_setup(), we set `btmtk_data->isopkt_intf` to:
usb_ifnum_to_if(data->udev, MTK_ISO_IFNUM)
That function can return NULL in some cases. Even when it returns
NULL, though, we still go on to call btusb_mtk_claim_iso_intf().
As of commit e9087e8288 ("Bluetooth: btusb: mediatek: Add locks for
usb_driver_claim_interface()"), calling btusb_mtk_claim_iso_intf()
when `btmtk_data->isopkt_intf` is NULL will cause a crash because
we'll end up passing a bad pointer to device_lock(). Prior to that
commit we'd pass the NULL pointer directly to
usb_driver_claim_interface() which would detect it and return an
error, which was handled.
Resolve the crash in btusb_mtk_claim_iso_intf() by adding a NULL check
at the start of the function. This makes the code handle a NULL
`btmtk_data->isopkt_intf` the same way it did before the problematic
commit (just with a slight change to the error message printed).
Reported-by: IncogCyberpunk <incogcyberpunk@proton.me>
Closes: http://lore.kernel.org/r/a380d061-479e-4713-bddd-1d6571ca7e86@leemhuis.info
Fixes: e9087e8288 ("Bluetooth: btusb: mediatek: Add locks for usb_driver_claim_interface()")
Cc: stable@vger.kernel.org
Tested-by: IncogCyberpunk <incogcyberpunk@proton.me>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The hdev lock/lookup/unlock/use pattern in the packet RX path doesn't
ensure hci_conn* is not concurrently modified/deleted. This locking
appears to be leftover from before conn_hash started using RCU
commit bf4c632524 ("Bluetooth: convert conn hash to RCU")
and not clear if it had purpose since then.
Currently, there are code paths that delete hci_conn* from elsewhere
than the ordered hdev->workqueue where the RX work runs in. E.g.
commit 5af1f84ed1 ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync")
introduced some of these, and there probably were a few others before
it. It's better to do the locking so that even if these run
concurrently no UAF is possible.
Move the lookup of hci_conn and associated socket-specific conn to
protocol recv handlers, and do them within a single critical section
to cover hci_conn* usage and lookup.
syzkaller has reported a crash that appears to be this issue:
[Task hdev->workqueue] [Task 2]
hci_disconnect_all_sync
l2cap_recv_acldata(hcon)
hci_conn_get(hcon)
hci_abort_conn_sync(hcon)
hci_dev_lock
hci_dev_lock
hci_conn_del(hcon)
v-------------------------------- hci_dev_unlock
hci_conn_put(hcon)
conn = hcon->l2cap_data (UAF)
Fixes: 5af1f84ed1 ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync")
Reported-by: syzbot+d32d77220b92eddd89ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d32d77220b92eddd89ad
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
There is a potential race condition between sock bind and socket write
iter. bind may free the same cmd via mgmt_pending before write iter sends
the cmd, just as syzbot reported in UAF[1].
Here we use hci_dev_lock to synchronize the two, thereby avoiding the
UAF mentioned in [1].
[1]
syzbot reported:
BUG: KASAN: slab-use-after-free in mgmt_pending_remove+0x3b/0x210 net/bluetooth/mgmt_util.c:316
Read of size 8 at addr ffff888077164818 by task syz.0.17/5989
Call Trace:
mgmt_pending_remove+0x3b/0x210 net/bluetooth/mgmt_util.c:316
set_link_security+0x5c2/0x710 net/bluetooth/mgmt.c:1918
hci_mgmt_cmd+0x9c9/0xef0 net/bluetooth/hci_sock.c:1719
hci_sock_sendmsg+0x6ca/0xef0 net/bluetooth/hci_sock.c:1839
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:742
sock_write_iter+0x279/0x360 net/socket.c:1195
Allocated by task 5989:
mgmt_pending_add+0x35/0x140 net/bluetooth/mgmt_util.c:296
set_link_security+0x557/0x710 net/bluetooth/mgmt.c:1910
hci_mgmt_cmd+0x9c9/0xef0 net/bluetooth/hci_sock.c:1719
hci_sock_sendmsg+0x6ca/0xef0 net/bluetooth/hci_sock.c:1839
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:742
sock_write_iter+0x279/0x360 net/socket.c:1195
Freed by task 5991:
mgmt_pending_free net/bluetooth/mgmt_util.c:311 [inline]
mgmt_pending_foreach+0x30d/0x380 net/bluetooth/mgmt_util.c:257
mgmt_index_removed+0x112/0x2f0 net/bluetooth/mgmt.c:9477
hci_sock_bind+0xbe9/0x1000 net/bluetooth/hci_sock.c:1314
Fixes: 6fe26f694c ("Bluetooth: MGMT: Protect mgmt_pending list with its own lock")
Reported-by: syzbot+9aa47cd4633a3cf92a80@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9aa47cd4633a3cf92a80
Tested-by: syzbot+9aa47cd4633a3cf92a80@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
HCI_OP_NOP means no command was actually sent so there is no point in
triggering cmd_timer which may cause a hdev->reset in the process since
it is assumed that the controller is stuck processing a command.
Fixes: e2d471b780 ("Bluetooth: ISO: Fix not using SID from adv report")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Along with the renaming from task_security_struct to cred_security_struct,
rename the local variables to "crsec" from "tsec". This both fits with
existing conventions and helps distinguish between task and cred related
variables.
No functional changes.
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The avdcache is meant to be per-task; move it to a new
task_security_struct that is duplicated per-task.
Cc: stable@vger.kernel.org
Fixes: 5d7ddc59b3 ("selinux: reduce path walk overhead")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: line length fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Before Linux had cred structures, the SELinux task_security_struct was
per-task and although the structure was switched to being per-cred
long ago, the name was never updated. This change renames it to
cred_security_struct to avoid confusion and pave the way for the
introduction of an actual per-task security structure for SELinux. No
functional change.
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull sched_ext fix from Tejun Heo:
"One low risk and obvious fix: scx_enable() was dereferencing an error
pointer on helper kthread creation failure. Fixed"
* tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_enable() crash on helper kthread creation failure
An empty flush bio can have arbitrary bi_sector. The commit 2b1c6d7a89
introduced a regression that device mapper would fail an empty flush bio
with -EIO if the sector pointed beyond the end of the device.
The commit introduced an optimization, that optimization would pass
flushes to __split_and_process_bio and __split_and_process_bio is not
prepared to handle empty bios. Fix this bug by passing only non-empty
flushes to __split_and_process_bio - non-empty flushes must have valid
bi_sector. Empty bios will go through __send_empty_flush, as they did
before the optimization.
This problem can be reproduced by running the lvm2 test:
make check_local T=lvconvert-thin.sh LVM_TEST_PREFER_BRD=0
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 2b1c6d7a89 ("dm: optimize REQ_PREFLUSH with data when using the linear target")
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
A crash was observed when the sched_ext selftests runner was
terminated with Ctrl+\ while test 15 was running:
NIP [c00000000028fa58] scx_enable.constprop.0+0x358/0x12b0
LR [c00000000028fa2c] scx_enable.constprop.0+0x32c/0x12b0
Call Trace:
scx_enable.constprop.0+0x32c/0x12b0 (unreliable)
bpf_struct_ops_link_create+0x18c/0x22c
__sys_bpf+0x23f8/0x3044
sys_bpf+0x2c/0x6c
system_call_exception+0x124/0x320
system_call_vectored_common+0x15c/0x2ec
kthread_run_worker() returns an ERR_PTR() on failure rather than NULL,
but the current code in scx_alloc_and_add_sched() only checks for a NULL
helper. Incase of failure on SIGQUIT, the error is not handled in
scx_alloc_and_add_sched() and scx_enable() ends up dereferencing an
error pointer.
Error handling is fixed in scx_alloc_and_add_sched() to propagate
PTR_ERR() into ret, so that scx_enable() jumps to the existing error
path, avoiding random dereference on failure.
Fixes: bff3b5aec1 ("sched_ext: Move disable machinery into scx_sched")
Cc: stable@vger.kernel.org # v6.16+
Reported-and-tested-by: Samir Mulani <samir@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If timestamp retriving needs to be retried and the local list of
SKB's already has entries, then it's spliced back into the socket
queue. However, the arguments for the splice helper are transposed,
causing exactly the wrong direction of splicing into the on-stack
list. Fix that up.
Cc: stable@vger.kernel.org
Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-462435176@google.com>
Fixes: 9e4ed359b8 ("io_uring/netcmd: add tx timestamping cmd support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull power management fix from Rafael Wysocki:
"Fix a regression introduced during the 6.16 development cycle that may
cause runtime PM to be enabled by mistake for devices that do not
support it (which may lead to some serious trouble) if there is a
system wakeup event during the "late suspend" phase of system suspend"
* tag 'pm-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: core: Fix runtime PM enabling in device_resume_early()
Pull ACPI fix from Rafael Wysocki:
"This fixes EINJV2 support introduced during the 6.17 cycle by
unbreaking the initialization broken by a previous attempted fix,
adding sanity checks for data coming from the platform firmware, and
updating the code to handle injecting legacy error types on an EINJV2
capable systems properly (Tony Luck)"
* tag 'acpi-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: APEI: EINJ: Fix EINJV2 initialization and injection
Pull x86 platform driver fixes from Ilpo Järvinen:
"This one has lots of new HW entries which adds to the size in diffstat
but the individual changes are simple.
Fixes
- acer-wmi: Ignore backlight event
- alienware-wmi-wmax: Fix quirk match table order & drop redundant
entries
- amd/pmc:
- Add Xbox Ally to spurious 8042 quirk list
- Quirk list Lenovo Legion Go 2 NVMe resume
- msi-wmi-platform:
- Correct GUID to uppercase
- GUID is uncleverly copy-pasted from an example so add a DMI
whitelist
- intel/speed_select_if: PCIBIOS_* return code conversion
- intel-uncore-freq & ISST: Fix kernel doc warnings
New HW support
- alienware-wmi-wmax:
- Alienware 16 Aurora support
- Alienware M support
- Alienware X support
- Dell G support
- amd/pmc:
- ROG Xbox Ally (non-X) support
- huaway-wmi: HONOR MagicBoox X16/X14 PrintScreen & YOYO keys
- hp-wmi:
- Omen 16-wf1xxx fan support
- Omen MAX 16-ah0xx fan + thermal profile support
- Victus 16-r0 and 16-s0 fan + thermal profile support
- intel/hid: Intel Nova Lake support
- intel-uncore-freq:
- Intel Panther Lake support
- Intel Wildcat Lake support
- Intel Nova Lake support"
* tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (21 commits)
platform/x86: intel-uncore-freq: fix all header kernel-doc warnings
platform/x86: acer-wmi: Ignore backlight event
platform/x86/intel/speed_select_if: Convert PCIBIOS_* return codes to errnos
platform/x86/intel/hid: Add Nova Lake support
platform/x86: alienware-wmi-wmax: Add AWCC support to Alienware 16 Aurora
platform/x86: hp-wmi: Add Omen MAX 16-ah0xx fan support and thermal profile
platform/x86: msi-wmi-platform: Fix typo in WMI GUID
platform/x86: msi-wmi-platform: Only load on MSI devices
platform/x86/amd: pmc: Add Lenovo Legion Go 2 to pmc quirk list
platform/x86/amd/pmc: Add spurious_8042 to Xbox Ally
platform/x86/amd/pmc: Add support for Van Gogh SoC
platform/x86: alienware-wmi-wmax: Add support for the whole "G" family
platform/x86: alienware-wmi-wmax: Add support for the whole "X" family
platform/x86: alienware-wmi-wmax: Add support for the whole "M" family
platform/x86: alienware-wmi-wmax: Drop redundant DMI entries
platform/x86: alienware-wmi-wmax: Fix "Alienware m16 R1 AMD" quirk order
platform/x86: ISST: isst_if.h: fix all kernel-doc warnings
platform/x86: intel-uncore-freq: Add additional client processors
platform/x86: hp-wmi: Add Omen 16-wf1xxx fan support
platform/x86: huawei-wmi: add keys for HONOR models
...
Pull networking fixes from Jakub Kicinski:
"Including fixes from IPsec and wireless.
Previous releases - regressions:
- prevent NULL deref in generic_hwtstamp_ioctl_lower(),
newer APIs don't populate all the pointers in the request
- phylink: add missing supported link modes for the fixed-link
- mptcp: fix false positive warning in mptcp_pm_nl_rm_addr
Previous releases - always broken:
- openvswitch: remove never-working support for setting NSH fields
- xfrm: number of fixes for error paths of xfrm_state creation/
modification/deletion
- xfrm: fixes for offload
- fix the determination of the protocol of the inner packet
- don't push locally generated packets directly to L2 tunnel
mode offloading, they still need processing from the standard
xfrm path
- mptcp: fix a couple of corner cases in fallback and fastclose
handling
- wifi: rtw89: hw_scan: prevent connections from getting stuck,
work around apparent bug in FW by tweaking messages we send
- af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait
- veth: more robust handing of race to avoid txq getting stuck
- eth: ps3_gelic_net: handle skb allocation failures"
* tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
vsock: Ignore signal/timeout on connect() if already established
be2net: pass wrb_params in case of OS2BMC
l2tp: reset skb control buffer on xmit
net: dsa: microchip: lan937x: Fix RGMII delay tuning
selftests: mptcp: add a check for 'add_addr_accepted'
mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
selftests: mptcp: join: userspace: longer timeout
selftests: mptcp: join: endpoints: longer timeout
selftests: mptcp: join: fastclose: remove flaky marks
mptcp: fix duplicate reset on fastclose
mptcp: decouple mptcp fastclose from tcp close
mptcp: do not fallback when OoO is present
mptcp: fix premature close in case of fallback
mptcp: avoid unneeded subflow-level drops
mptcp: fix ack generation for fallback msk
wifi: rtw89: hw_scan: Don't let the operating channel be last
net: phylink: add missing supported link modes for the fixed-link
selftest: af_unix: Add test for SO_PEEK_OFF.
af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic().
net/mlx5: Clean up only new IRQ glue on request_irq() failure
...
tk_aux_sysfs_init() returns immediately on error during the auxiliary clock
initialization loop without cleaning up previously allocated kobjects and
sysfs groups.
If kobject_create_and_add() or sysfs_create_group() fails during loop
iteration, the parent kobjects (tko and auxo) and any previously created
child kobjects are leaked.
Fix this by adding proper error handling with goto labels to ensure all
allocated resources are cleaned up on failure. kobject_put() on the
parent kobjects will handle cleanup of their children.
Fixes: 7b95663a3d ("timekeeping: Provide interface to control auxiliary clocks")
Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120150213.246777-1-mrout@redhat.com
be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL
at be_send_pkt_to_bmc() call site. This may lead to dereferencing a NULL
pointer when processing a workaround for specific packet, as commit
bc0c3405ab ("be2net: fix a Tx stall bug caused by a specific ipv6
packet") states.
The correct way would be to pass the wrb_params from be_xmit().
Fixes: 760c295e0e ("be2net: Support for OS2BMC.")
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Link: https://patch.msgid.link/20251119105015.194501-1-a.vatoropin@crpt.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For Security locked drives (drives that have Security enabled, and have
not been Security unlocked by boot firmware), the automatic partition
scanning will result in the user being spammed with errors such as:
ata5.00: failed command: READ DMA
ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 7 dma 4096 in
res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
ata5.00: status: { DRDY ERR }
ata5.00: error: { ABRT }
sd 4:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
sd 4:0:0:0: [sda] tag#7 Sense Key : Aborted Command [current]
sd 4:0:0:0: [sda] tag#7 Add. Sense: No additional sense information
during boot, because most commands except for IDENTIFY will be aborted by
a Security locked drive.
For a Security locked drive, set capacity to zero, so that no automatic
partition scanning will happen.
If the user later unlocks the drive using e.g. hdparm, the close() by the
user space application should trigger a revalidation of the drive.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Commit cf3fc03762 ("ata: libata-scsi: Fix ata_to_sense_error() status
handling") fixed ata_to_sense_error() to properly generate sense key
ABORTED COMMAND (without any additional sense code), instead of the
previous bogus sense key ILLEGAL REQUEST with the additional sense code
UNALIGNED WRITE COMMAND, for a failed command.
However, this broke suspend for Security locked drives (drives that have
Security enabled, and have not been Security unlocked by boot firmware).
The reason for this is that the SCSI disk driver, for the Synchronize
Cache command only, treats any sense data with sense key ILLEGAL REQUEST
as a successful command (regardless of ASC / ASCQ).
After commit cf3fc03762 ("ata: libata-scsi: Fix ata_to_sense_error()
status handling") the code that treats any sense data with sense key
ILLEGAL REQUEST as a successful command is no longer applicable, so the
command fails, which causes the system suspend to be aborted:
sd 1:0:0:0: PM: dpm_run_callback(): scsi_bus_suspend returns -5
sd 1:0:0:0: PM: failed to suspend async: error -5
PM: Some devices failed to suspend, or early wake event detected
To make suspend work once again, for a Security locked device only,
return sense data LOGICAL UNIT ACCESS NOT AUTHORIZED, the actual sense
data which a real SCSI device would have returned if locked.
The SCSI disk driver treats this sense data as a successful command.
Cc: stable@vger.kernel.org
Reported-by: Ilia Baryshnikov <qwelias@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220704
Fixes: cf3fc03762 ("ata: libata-scsi: Fix ata_to_sense_error() status handling")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
The L2TP stack did not reset the skb control buffer before sending the
encapsulated package.
In a setup with an ath10k radio and batman-adv over an L2TP tunnel
massive fragmentations happen sporadically if the L2TP tunnel is
established over IPv4.
L2TP might reset some of the fields in the IP control buffer, but L2TP
assumes the type of the control buffer to be of an IPv4 packet.
In case the L2TP interface is used as a batadv hardif or the packet is
an IPv6 packet, this assumption breaks.
Clear the entire control buffer to avoid such mishaps altogether.
Fixes: f77ae93904 ("[PPPOL2TP]: Reset meta-data in xmit function")
Signed-off-by: David Bauer <mail@david-bauer.net>
Link: https://patch.msgid.link/20251118001619.242107-1-mail@david-bauer.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Correct RGMII delay application logic in lan937x_set_tune_adj().
The function was missing `data16 &= ~PORT_TUNE_ADJ` before setting the
new delay value. This caused the new value to be bitwise-OR'd with the
existing PORT_TUNE_ADJ field instead of replacing it.
For example, when setting the RGMII 2 TX delay on port 4, the
intended TUNE_ADJUST value of 0 (RGMII_2_TX_DELAY_2NS) was
incorrectly OR'd with the default 0x1B (from register value 0xDA3),
leaving the delay at the wrong setting.
This patch adds the missing mask to clear the field, ensuring the
correct delay value is written. Physical measurements on the RGMII TX
lines confirm the fix, showing the delay changing from ~1ns (before
change) to ~2ns.
While testing on i.MX 8MP showed this was within the platform's timing
tolerance, it did not match the intended hardware-characterized value.
Fixes: b19ac41faa ("net: dsa: microchip: apply rgmii tx and rx delay in phylink mac config")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20251114090951.4057261-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
xfs/286 produced this report on my test fleet:
==================================================================
BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110
Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-#184):
memcpy_orig+0x54/0x110
xrep_symlink_salvage_inline+0xb3/0xf0 [xfs]
xrep_symlink_salvage+0x100/0x110 [xfs]
xrep_symlink+0x2e/0x80 [xfs]
xrep_attempt+0x61/0x1f0 [xfs]
xfs_scrub_metadata+0x34f/0x5c0 [xfs]
xfs_ioc_scrubv_metadata+0x387/0x560 [xfs]
xfs_file_ioctl+0xe23/0x10e0 [xfs]
__x64_sys_ioctl+0x76/0xc0
do_syscall_64+0x4e/0x1e0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
kfence-#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128
allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago):
xfs_init_local_fork+0x79/0xe0 [xfs]
xfs_iformat_local+0xa4/0x170 [xfs]
xfs_iformat_data_fork+0x148/0x180 [xfs]
xfs_inode_from_disk+0x2cd/0x480 [xfs]
xfs_iget+0x450/0xd60 [xfs]
xfs_bulkstat_one_int+0x6b/0x510 [xfs]
xfs_bulkstat_iwalk+0x1e/0x30 [xfs]
xfs_iwalk_ag_recs+0xdf/0x150 [xfs]
xfs_iwalk_run_callbacks+0xb9/0x190 [xfs]
xfs_iwalk_ag+0x1dc/0x2f0 [xfs]
xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs]
xfs_iwalk+0xa4/0xd0 [xfs]
xfs_bulkstat+0xfa/0x170 [xfs]
xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs]
xfs_file_ioctl+0xbf2/0x10e0 [xfs]
__x64_sys_ioctl+0x76/0xc0
do_syscall_64+0x4e/0x1e0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy) 3d744dd94e92690f00a04398d2bd8631dcef1954
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014
==================================================================
On further analysis, I realized that the second parameter to min() is
not correct. xfs_ifork::if_bytes is the size of the xfs_ifork::if_data
buffer. if_bytes can be smaller than the data fork size because:
(a) the forkoff code tries to keep the data area as large as possible
(b) for symbolic links, if_bytes is the ondisk file size + 1
(c) forkoff is always a multiple of 8.
Case in point: for a single-byte symlink target, forkoff will be
8 but the buffer will only be 2 bytes long.
In other words, the logic here is wrong and we walk off the end of the
incore buffer. Fix that.
Cc: stable@vger.kernel.org # v6.10
Fixes: 2651923d8d ("xfs: online repair of symbolic links")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Currently cpu-clock event always returns 0 count, e.g.,
perf stat -e cpu-clock -- sleep 1
Performance counter stats for 'sleep 1':
0 cpu-clock # 0.000 CPUs utilized
1.002308394 seconds time elapsed
The root cause is the commit 'bc4394e5e79c ("perf: Fix the throttle
error of some clock events")' adds PERF_EF_UPDATE flag check before
calling cpu_clock_event_update() to update the count, however the
PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in
counting mode (pmu->dev() -> cpu_clock_event_del() ->
cpu_clock_event_stop()). This leads to the cpu-clock event count is
never updated.
To fix this issue, force to set PERF_EF_UPDATE flag for cpu-clock event
just like what task-clock does.
Fixes: bc4394e5e7 ("perf: Fix the throttle error of some clock events")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com
Add proper cleanup of ctx->source and fc->source to the
cifs_parse_mount_err error handler. This ensures that memory allocated
for the source strings is correctly freed on all error paths, matching
the cleanup already performed in the success path by
smb3_cleanup_fs_context_contents().
Pointers are also set to NULL after freeing to prevent potential
double-free issues.
This change fixes a memory leak originally detected by syzbot. The
leak occurred when processing Opt_source mount options if an error
happened after ctx->source and fc->source were successfully
allocated but before the function completed.
The specific leak sequence was:
1. ctx->source = smb3_fs_context_fullpath(ctx, '/') allocates memory
2. fc->source = kstrdup(ctx->source, GFP_KERNEL) allocates more memory
3. A subsequent error jumps to cifs_parse_mount_err
4. The old error handler freed passwords but not the source strings,
causing the memory to leak.
This issue was not addressed by commit e8c73eb7db ("cifs: client:
fix memory leak in smb3_fs_context_parse_param"), which only fixed
leaks from repeated fsconfig() calls but not this error path.
Patch updated with minor change suggested by kernel test robot
Reported-by: syzbot+87be6809ed9bf6d718e3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=87be6809ed9bf6d718e3
Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
Signed-off-by: Steve French <stfrench@microsoft.com>
Replace close_cached_dir() calls under cfid_list_lock with a new
close_cached_dir_locked() variant that uses kref_put() instead of
kref_put_lock() to avoid recursive locking when dropping references.
While the existing code works if the refcount >= 2 invariant holds,
this area has proven error-prone. Make deadlocks impossible and WARN
on invariant violations.
Cc: stable@vger.kernel.org
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The current LoongArch BPF trampoline implementation is incompatible
with tracing functions in kernel modules. This causes several severe
and user-visible problems:
* The `bpf_selftests/module_attach` test fails consistently.
* Kernel lockup when a BPF program is attached to a module function [1].
* Critical kernel modules like WireGuard experience traffic disruption
when their functions are traced with fentry [2].
Given the severity and the potential for other unknown side-effects, it
is safest to disable the feature entirely for now. This patch prevents
the BPF subsystem from allowing trampoline attachments to kernel module
functions on LoongArch.
This is a temporary mitigation until the core issues in the trampoline
code for kernel module handling can be identified and fixed.
[root@fedora bpf]# ./test_progs -a module_attach -v
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
test_module_attach:FAIL:skel_attach skeleton attach failed: -524
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.
[1]: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C13FMT58oqQ@mail.gmail.com/
[2]: https://lore.kernel.org/loongarch/CAK3+h2wYcpc+OwdLDUBvg2rF9rvvyc5amfHT-KcFaK93uoELPg@mail.gmail.com/
Cc: stable@vger.kernel.org
Fixes: f9b6b41f0c ("LoongArch: BPF: Add basic bpf trampoline support")
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
If there is no valid cache info detected (may happen in virtual machine)
for pci_dfl_cache_line_size, kernel shouldn't panic. Because in the PCI
core it will be evaluated to (L1_CACHE_BYTES >> 2).
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
If the default state of the interrupt controllers in the first kernel
don't mask any interrupts, it may cause the second kernel to potentially
receive interrupts (which were previously allocated by the first kernel)
immediately after a CPU becomes online during its boot process. These
interrupts cannot be properly routed, leading to bad IRQ issues.
This patch calls machine_kexec_mask_interrupts() to mask all interrupts
during the kexec/kdump process.
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
On physical machine, NUMA node id comes from high bit 44:48 of physical
address. However it is not true on virt machine. With general method, it
comes from ACPI SRAT table.
Here the common function numa_memblks_init() is used to parse NUMA node
information with numa_memblks.
Cc: <stable@vger.kernel.org>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Some processors have no IOCSR.VENDOR and IOCSR.CPUNAME, some processors
have these registers but there is no valid information.
Consolidate CPU names in /proc/cpuinfo:
1. Add "PRID" to display the PRID & Core-Name;
2. Let "Model Name" display "Unknown" if no valid name.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The kernel UAPI headers already contain fixed-width integer types, there
is no need to rely on the libc types. There may not be a libc available
or the libc may not provides the <stdint.h>, like for example on nolibc.
This also aligns the header with the rest of the LoongArch UAPI headers.
Fixes: 803b0fc5c3 ("LoongArch: Add process management")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-11-18 (idpf, ice)
This series contains updates to idpf and ice drivers.
Emil adds a check for NULL vport_config during removal to avoid NULL
pointer dereference in idpf.
Grzegorz fixes PTP teardown paths to account for some missed cleanups
for ice driver.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix PTP cleanup on driver removal in error path
idpf: fix possible vport_config NULL pointer deref in remove
====================
Link: https://patch.msgid.link/20251118235207.2165495-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: misc fixes for v6.18-rc7
Here are various unrelated fixes:
- Patch 1: Fix window space computation for fallback connections which
can affect ACK generation. A fix for v5.11.
- Patch 2: Avoid unneeded subflow-level drops due to unsynced received
window. A fix for v5.11.
- Patch 3: Avoid premature close for fallback connections with PREEMPT
kernels. A fix for v5.12.
- Patch 4: Reset instead of fallback in case of data in the MPTCP
out-of-order queue. A fix for v5.7.
- Patches 5-7: Avoid also sending "plain" TCP reset when closing with an
MP_FASTCLOSE. A fix for v6.1.
- Patches 8-9: Longer timeout for background connections in MPTCP Join
selftests. An additional fix for recent patches for v5.13/v6.1.
- Patches 10-11: Fix typo in a check introduce in a recent refactoring.
A fix for v6.15.
====================
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-0-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The previous patch fixed an issue with the 'add_addr_accepted' counter.
This was not spot by the test suite.
Check this counter and 'add_addr_signal' in MPTCP Join 'delete re-add
signal' test. This should help spotting similar regressions later on.
These counters are crucial for ensuring the MPTCP path manager correctly
handles the subflow creation via 'ADD_ADDR'.
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-11-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.
Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to have a longer
timeout, and even go over the default one. This connection will be
killed at the end, after the verifications: increasing the timeout
doesn't change anything, apart from avoiding it to end before the end of
the verifications.
To play it safe, all userspace tests not waiting for the end of the
transfer are now having a longer timeout: 2 minutes.
The Fixes commit was making the connection longer, but still, the
default timeout would have stopped it after 1 minute, which might not be
enough in very slow environments.
Fixes: 290493078b ("selftests: mptcp: join: userspace: longer transfer")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-9-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rare cases, when the test environment is very slow, some endpoints
tests can fail because some expected events have not been seen.
Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to have a longer
timeout, and even go over the default one. This connection will be
killed at the end, after the verifications: increasing the timeout
doesn't change anything, apart from avoiding it to end before the end of
the verifications.
To play it safe, all endpoints tests not waiting for the end of the
transfer are now having a longer timeout: 2 minutes.
The Fixes commit was making the connection longer, but still, the
default timeout would have stopped it after 1 minute, which might not be
enough in very slow environments.
Fixes: 6457595db9 ("selftests: mptcp: join: endpoints: longer transfer")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-8-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The CI reports sporadic failures of the fastclose self-tests. The root
cause is a duplicate reset, not carrying the relevant MPTCP option.
In the failing scenario the bad reset is received by the peer before
the fastclose one, preventing the reception of the latter.
Indeed there is window of opportunity at fastclose time for the
following race:
mptcp_do_fastclose
__mptcp_close_ssk
__tcp_close()
tcp_set_state() [1]
tcp_send_active_reset() [2]
After [1] the stack will send reset to in-flight data reaching the now
closed port. Such reset may race with [2].
Address the issue explicitly sending a single reset on fastclose before
explicitly moving the subflow to close status.
Fixes: d21f834855 ("mptcp: use fastclose on more edge scenarios")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/596
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-6-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With the current fastclose implementation, the mptcp_do_fastclose()
helper is in charge of two distinct actions: send the fastclose reset
and cleanup the subflows.
Formally decouple the two steps, ensuring that mptcp explicitly closes
all the subflows after the mentioned helper.
This will make the upcoming fix simpler, and allows dropping the 2nd
argument from mptcp_destroy_common(). The Fixes tag is then the same as
in the next commit to help with the backports.
Fixes: d21f834855 ("mptcp: use fastclose on more edge scenarios")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-5-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I'm observing very frequent self-tests failures in case of fallback when
running on a CONFIG_PREEMPT kernel.
The root cause is that subflow_sched_work_if_closed() closes any subflow
as soon as it is half-closed and has no incoming data pending.
That works well for regular subflows - MPTCP needs bi-directional
connectivity to operate on a given subflow - but for fallback socket is
race prone.
When TCP peer closes the connection before the MPTCP one,
subflow_sched_work_if_closed() will schedule the MPTCP worker to
gracefully close the subflow, and shortly after will do another schedule
to inject and process a dummy incoming DATA_FIN.
On CONFIG_PREEMPT kernel, the MPTCP worker can kick-in and close the
fallback subflow before subflow_sched_work_if_closed() is able to create
the dummy DATA_FIN, unexpectedly interrupting the transfer.
Address the issue explicitly avoiding closing fallback subflows on when
the peer is only half-closed.
Note that, when the subflow is able to create the DATA_FIN before the
worker invocation, the worker will change the msk state before trying to
close the subflow and will skip the latter operation as the msk will not
match anymore the precondition in __mptcp_close_subflow().
Fixes: f09b0ad55a ("mptcp: close subflow when receiving TCP+FIN")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-3-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The rcv window is shared among all the subflows. Currently, MPTCP sync
the TCP-level rcv window with the MPTCP one at tcp_transmit_skb() time.
The above means that incoming data may sporadically observe outdated
TCP-level rcv window and being wrongly dropped by TCP.
Address the issue checking for the edge condition before queuing the
data at TCP level, and eventually syncing the rcv window as needed.
Note that the issue is actually present from the very first MPTCP
implementation, but backports older than the blamed commit below will
range from impossible to useless.
Before:
$ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow
TcpExtBeyondWindow 14 0.0
After:
$ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow
TcpExtBeyondWindow 0 0.0
Fixes: fa3fe2b150 ("mptcp: track window announced to peer")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-2-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mptcp_cleanup_rbuf() needs to know the last most recent, mptcp-level
rcv_wnd sent, and such information is tracked into the msk->old_wspace
field, updated at ack transmission time by mptcp_write_options().
Fallback socket do not add any mptcp options, such helper is never
invoked, and msk->old_wspace value remain stale. That in turn makes
ack generation at recvmsg() time quite random.
Address the issue ensuring mptcp_write_options() is invoked even for
fallback sockets, and just update the needed info in such a case.
The issue went unnoticed for a long time, as mptcp currently overshots
the fallback socket receive buffer autotune significantly. It is going
to change in the near future.
Fixes: e3859603ba ("mptcp: better msk receive window updates")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-1-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Scanning can be offloaded to the firmware. To that end, the driver
prepares a list of channels to scan, including periodic visits back to
the operating channel, and sends the list to the firmware.
When the channel list is too long to fit in a single H2C message, the
driver splits the list, sends the first part, and tells the firmware to
scan. When the scan is complete, the driver sends the next part of the
list and tells the firmware to scan.
When the last channel that fit in the H2C message is the operating
channel something seems to go wrong in the firmware. It will
acknowledge receiving the list of channels but apparently it will not
do anything more. The AP can't be pinged anymore. The driver still
receives beacons, though.
One way to avoid this is to split the list of channels before the
operating channel.
Affected devices:
* RTL8851BU with firmware 0.29.41.3
* RTL8832BU with firmware 0.29.29.8
* RTL8852BE with firmware 0.29.29.8
The commit 57a5fbe39a ("wifi: rtw89: refactor flow that hw scan handles channel list")
is found by git blame, but it is actually to refine the scan flow, but not
a culprit, so skip Fixes tag.
Reported-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Closes: https://lore.kernel.org/linux-wireless/0abbda91-c5c2-4007-84c8-215679e652e1@gmail.com/
Cc: stable@vger.kernel.org # 6.16+
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/c1e61744-8db4-4646-867f-241b47d30386@gmail.com
The MMIO_REMAP BO is a special 4K IO page that does not have a ttm_tt
behind it. However, amdgpu_ttm_tt_pde_flags() was treating it like
normal TT/doorbell/preempt memory and unconditionally accessed
ttm->caching. For the MMIO_REMAP BO, ttm is NULL, so this leads to a
NULL pointer dereference when computing PDE flags.
Fix this by checking that ttm is non-NULL before reading ttm->caching.
This prevents the crash for MMIO_REMAP and also makes the code more
defensive if other BOs ever come through without a ttm_tt.
Fixes: fb5a52dbe9 ("drm/amdgpu: Implement TTM handling for MMIO_REMAP placement")
Suggested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0db94da5a0)
This fixes sparse mappings (aka. partially resident textures).
Check the correct flags.
Since a recent refactor, the code works with uAPI flags (for
mapping buffer objects), and not PTE (page table entry) flags.
Fixes: 6716a823d1 ("drm/amdgpu: rework how PTE flags are generated v3")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8feeab26c8)
[Why]
Accoreding to CP updated to RS64 on gfx11,
WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW.
That packet is used for MCBP on F32 based system.
So it would lead to incorrect GRBM write and FW is not handling that
extra case correctly.
[How]
With gfx11 rs64 enabled, skip emit de meta data.
Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8366cd442d)
Cc: stable@vger.kernel.org
During the suspend sequence VPE is already going to be power gated
as part of vpe_suspend(). It's unnecessary to call during calls to
amdgpu_device_set_pg_state().
It actually can expose a race condition with the firmware if s0i3
sequence starts as well. Drop these calls.
Cc: Peyton.Lee@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2a6c826cfe)
Cc: stable@vger.kernel.org
In commit 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") the
new function report_idle_softirq() was created by breaking code out of the
existing can_stop_idle_tick() for kernels v5.18 and newer.
In doing so, the code essentially went from this form:
if (A) {
static int ratelimit;
if (ratelimit < 10 && !C && A&D) {
pr_warn("NOHZ tick-stop error: ...");
ratelimit++;
}
return false;
}
to a new function:
static bool report_idle_softirq(void)
{
static int ratelimit;
if (likely(!A))
return false;
if (ratelimit < 10)
return false;
...
pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n",
pending);
ratelimit++;
return true;
}
commit a7e282c777 ("tick/rcu: Fix bogus ratelimit condition") realized
ratelimit was essentially set to zero instead of ten, and hence *no*
softirq pending messages would ever be issued, but "fixed" it as:
- if (ratelimit < 10)
+ if (ratelimit >= 10)
return false;
However, this fix introduced another issue:
When ratelimit is greater than or equal 10, even if A is true, it will
directly return false. While ratelimit in the original code was only used
to control printing and will not affect the return value.
Restore the original logic and restrict ratelimit to control the printk and
not the return value.
Fixes: 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Fixes: a7e282c777 ("tick/rcu: Fix bogus ratelimit condition")
Signed-off-by: Wen Yang <wen.yang@linux.dev>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251119174525.29470-1-wen.yang@linux.dev
Pull Allwinner clk driver fixes from Chen-Yu Tsai:
Just a couple fixes for the A523 family. A couple clocks are marked as
critical, and the lower bound of the audio PLL was lowered to match
the datasheet.
* tag 'sunxi-clk-fixes-for-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: sun55i-a523-ccu: Lower audio0 pll minimum rate
clk: sunxi-ng: sun55i-a523-r-ccu: Mark bus-r-dma as critical
clk: sunxi-ng: Mark A523 bus-r-cpucfg clock as critical
Pull SoC fixes from Arnd Bergmann:
"These are mainly devicetree fixes for the arm platforms from Rockchips
NXP, ASpeed and Broadcom, addressing issues with accidental
overclocking, pinctrl, network and dtc warnings.
There are additional fixes for regressions with the i.MX reset and
memory controller drivers as well as the Tegra memory controller
driver.
Minor updates to the MAINTAINERS file, tee documentation and
defconfigs bring those up to date with recent changes elsewhere"
* tag 'soc-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
MAINTAINERS: sync omap devicetree maintainers with omap platform
MAINTAINERS: Update Krzysztof Kozlowski's email
arm64: dts: rockchip: fix PCIe 3.3V regulator voltage on orangepi-5
arm64: dts: rockchip: disable HS400 on RK3588 Tiger
arm64: dts: rockchip: drop reset from rk3576 i2c9 node
tee: <uapi/linux/tee.h: fix all kernel-doc issues
arm64: dts: rockchip: Fix USB power enable pin for BTT CB2 and Pi2
arm64: dts: broadcom: bcm2712: rpi-5: Add ethernet0 alias
arm64: dts: broadcom: Assign clock rates in eth node for RPi5
reset: imx8mp-audiomix: Fix bad mask values
ARM: dts: BCM53573: Fix address of Luxul XAP-1440's Ethernet PHY
arm64: defconfig: Fix V3D deferred probe timeout
arm64: dts: rockchip: Fix vccio4-supply on rk3566-pinetab2
arm64: dts: rockchip: include rk3399-base instead of rk3399 in rk3399-op1
arm64: dts: imx8mp-kontron: Fix USB OTG role switching
arm64: dts: imx95: Fix MSI mapping for PCIe endpoint nodes
arm64: dts: imx8-ss-img: Avoid gpio0_mipi_csi GPIOs being deferred
arm: imx_v6_v7_defconfig: enable ext4 directly
memory: tegra210: Fix incorrect client ids
arm64: dts: rockchip: Fix indentation on rk3399 haikou demo dtso
...
Pull pwm fix from Uwe Kleine-König:
"Correct mismatched pwm chip info for adp5585.
Luke Wang found a problem in the pwm-adp5585 driver about how register
information is mapped to the different device variants. This
effectively made the driver non-functional.
That didn't pop up before because the driver change was developed as
part of a bigger mfd series and the original author didn't retest PWM
functionality after it was tested in an earlier revision but then
reworked"
* tag 'pwm/for-6.18-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: adp5585: Correct mismatched pwm chip info
Pull HID fixes from Jiri Kosina:
- memory leak fixes in hid-uclogic, hid-ntrig and hid-playstation
drivers (Abdun Nihaal, Masami Ichikawa)
- regression fix for playback handling in hid-pidff (Tomasz Pakuła)
- initialization fix for some amd_sfh platforms (Mario Limonciello)
- a few assorted device-specific ID additions and quirks
* tag 'hid-for-linus-2025111901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: uclogic: Fix potential memory leak in error path
HID: playstation: Fix memory leak in dualshock4_get_calibration_data()
HID: pidff: Fix needs_playback check
HID: corsair-void: Use %pe for printing PTR_ERR
HID: elecom: Add support for ELECOM M-XT3URBK (018F)
HID: hid-input: Extend Elan ignore battery quirk to USB
HID: hid-ntrig: Prevent memory leak in ntrig_report_version()
HID: amd_sfh: Stop sensor before starting
HID: apple: Add SONiX AK870 PRO to non_apple_keyboards quirk list
HID: lenovo: fixup Lenovo Yoga Slim 7x Keyboard rdesc
HID: quirks: work around VID/PID conflict for 0x4c4a/0x4155
Pause, Asym_Pause and Autoneg bits are not set when pl->supported is
initialized, so these link modes will not work for the fixed-link. This
leads to a TCP performance degradation issue observed on the i.MX943
platform.
The switch CPU port of i.MX943 is connected to an ENETC MAC, this link
is a fixed link and the link speed is 2.5Gbps. And one of the switch
user ports is the RGMII interface, and its link speed is 1Gbps. If the
flow-control of the fixed link is not enabled, we can easily observe
the iperf performance of TCP packets is very low. Because the inbound
rate on the CPU port is greater than the outbound rate on the user port,
the switch is prone to congestion, leading to the loss of some TCP
packets and requiring multiple retransmissions.
Solving this problem should be as simple as setting the Asym_Pause and
Pause bits. The reason why the Autoneg bit needs to be set, Russell
has gave a very good explanation in the thread [1], see below.
"As the advertising and lp_advertising bitmasks have to be non-empty,
and the swphy reports aneg capable, aneg complete, and AN enabled, then
for consistency with that state, Autoneg should be set. This is how it
was prior to the blamed commit."
Fixes: de7d3f87be ("net: phylink: Use phy_caps_lookup for fixed-link configuration")
Link: https://lore.kernel.org/aRjqLN8eQDIQfBjS@shell.armlinux.org.uk # [1]
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251117102943.1862680-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull memblock fix from Mike Rapoport:
"Fix memblock_estimated_nr_free_pages() for soft-reserved memory
The "soft-reserved" memory regions (EFI_MEMORY_SP) are added to the
memblock.reserved, but not to the memblock.memory. It causes
memblock_estimated_nr_free_pages() to return a value smaller value
than expected, or if it underflows, an extremely large value.
Calculate the number of estimated free pages using
memblock_reserved_kern_size() instead of memblock_reserved_size() to
fix the issue"
* tag 'fixes-2025-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: fix memblock_estimated_nr_free_pages() for soft-reserved memory
Add the missing unevaluatedProperties to disallow extra properties on
the '^conf' nodes.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
ACPI 6.6 specification for EINJV2 appends an extra structure to
the end of the existing struct set_error_type_with_address.
Several issues showed up in testing.
1) Initialization was broken by an earlier fix [1] since is_v2 is only
set while performing an injection, not during initialization.
2) A buggy BIOS provided invalid "revision" and "length" for the
extension structure. Add several sanity checks.
3) When injecting legacy error types on an EINJV2 capable system,
don't copy the component arrays.
Fixes: 6c70585149 ("ACPI: APEI: EINJ: Check if user asked for EINJV2 injection") # [1]
Fixes: b47610296d ("ACPI: APEI: EINJ: Enable EINJv2 error injections")
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ rjw: Changelog edits ]
Cc: 6.17+ <stable@vger.kernel.org> # 6.17+
Link: https://patch.msgid.link/20251119012712.178715-1-tony.luck@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With the final call to fput() on a file descriptor, the release action
may be deferred and scheduled on a work queue. The reference count of
that descriptor is still zero and it must not be used. It's possible
that a GPIO change, we want to notify the user-space about, happens
AFTER the reference count on the file descriptor associated with the
character device went down to zero but BEFORE the .release() callback
was called from the workqueue and so BEFORE we unregistered from the
notifier.
Using the regular get_file() routine in this situation triggers the
following warning:
struct file::f_count incremented from zero; use-after-free condition present!
So use the get_file_active() variant that will return NULL on file
descriptors that have been or are being released.
Fixes: 40b7c49950 ("gpio: cdev: put emitting the line state events on a workqueue")
Reported-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Closes: https://lore.kernel.org/all/5d605f7fc99456804911403102a4fe999a14cc85.camel@siemens.com/
Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://lore.kernel.org/r/20251117-gpio-cdev-get-file-v1-1-28a16b5985b8@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Kuniyuki Iwashima says:
====================
af_unix: Fix SO_PEEK_OFF bug in unix_stream_read_generic().
Miao Wang reported a bug of SO_PEEK_OFF on AF_UNIX SOCK_STREAM socket.
Patch 1 fixes the bug and Patch 2 adds a new selftest to cover the case.
====================
Link: https://patch.msgid.link/20251117174740.3684604-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test covers various cases to verify SO_PEEK_OFF behaviour
for all AF_UNIX socket types.
two_chunks_blocking and two_chunks_overlap_blocking reproduce
the issue mentioned in the previous patch.
Without the patch, the two tests fail:
# RUN so_peek_off.stream.two_chunks_blocking ...
# so_peek_off.c:121:two_chunks_blocking:Expected 'bbbb' == 'aaaabbbb'.
# two_chunks_blocking: Test terminated by assertion
# FAIL so_peek_off.stream.two_chunks_blocking
not ok 3 so_peek_off.stream.two_chunks_blocking
# RUN so_peek_off.stream.two_chunks_overlap_blocking ...
# so_peek_off.c:159:two_chunks_overlap_blocking:Expected 'bbbb' == 'aaaabbbb'.
# two_chunks_overlap_blocking: Test terminated by assertion
# FAIL so_peek_off.stream.two_chunks_overlap_blocking
not ok 5 so_peek_off.stream.two_chunks_overlap_blocking
With the patch, all tests pass:
# PASSED: 15 / 15 tests passed.
# Totals: pass:15 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251117174740.3684604-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Miao Wang reported a bug of SO_PEEK_OFF on AF_UNIX SOCK_STREAM
socket.
The unexpected behaviour is triggered when the peek offset is
larger than the recv queue and the thread is unblocked by new
data.
Let's assume a socket which has "aaaa" in the recv queue and
the peek offset is 4.
First, unix_stream_read_generic() reads the offset 4 and skips
the skb(s) of "aaaa" with the code below:
skip = max(sk_peek_offset(sk, flags), 0); /* @skip is 4. */
do {
...
while (skip >= unix_skb_len(skb)) {
skip -= unix_skb_len(skb);
...
skb = skb_peek_next(skb, &sk->sk_receive_queue);
if (!skb)
goto again; /* @skip is 0. */
}
The thread jumps to the 'again' label and goes to sleep since
new data has not arrived yet.
Later, new data "bbbb" unblocks the thread, and the thread jumps
to the 'redo:' label to restart the entire process from the first
skb in the recv queue.
do {
...
redo:
...
last = skb = skb_peek(&sk->sk_receive_queue);
...
again:
if (skb == NULL) {
...
timeo = unix_stream_data_wait(sk, timeo, last,
last_len, freezable);
...
goto redo; /* @skip is 0 !! */
However, the peek offset is not reset in the path.
If the buffer size is 8, recv() will return "aaaabbbb" without
skipping any data, and the final offset will be 12 (the original
offset 4 + peeked skbs' length 8).
After sleeping in unix_stream_read_generic(), we have to fetch the
peek offset again.
Let's move the redo label before mutex_lock(&u->iolock).
Fixes: 9f389e3567 ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag")
Reported-by: Miao Wang <shankerwangmiao@gmail.com>
Closes: https://lore.kernel.org/netdev/3B969F90-F51F-4B9D-AB1A-994D9A54D460@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251117174740.3684604-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steffen Klassert says:
====================
pull request (net): ipsec 2025-11-18
1) Misc fixes for xfrm_state creation/modification/deletion.
Patchset from Sabrina Dubroca.
2) Fix inner packet family determination for xfrm offloads.
From Jianbo Liu.
3) Don't push locally generated packets directly to L2 tunnel
mode offloading, they still need processing from the standard
xfrm path. From Jianbo Liu.
4) Fix memory leaks in xfrm_add_acquire for policy offloads and policy
security contexts. From Zilin Guan.
* tag 'ipsec-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix memory leak in xfrm_add_acquire()
xfrm: Prevent locally generated packets from direct output in tunnel mode
xfrm: Determine inner GSO type from packet inner protocol
xfrm: Check inner packet family directly from skb_dst
xfrm: check all hash buckets for leftover states during netns deletion
xfrm: set err and extack on failure to create pcpu SA
xfrm: call xfrm_dev_state_delete when xfrm_state_migrate fails to add the state
xfrm: make state as DEAD before final put when migrate fails
xfrm: also call xfrm_state_delete_tunnel at destroy time for states that were never added
xfrm: drop SA reference in xfrm_state_update if dir doesn't match
====================
Link: https://patch.msgid.link/20251118085344.2199815-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function devl_rate_nodes_destroy is documented to "Unset parent for
all rate objects". However, it was only calling the driver-specific
`rate_leaf_parent_set` or `rate_node_parent_set` ops and decrementing
the parent's refcount, without actually setting the
`devlink_rate->parent` pointer to NULL.
This leaves a dangling pointer in the `devlink_rate` struct, which cause
refcount error in netdevsim[1] and mlx5[2]. In addition, this is
inconsistent with the behavior of `devlink_nl_rate_parent_node_set`,
where the parent pointer is correctly cleared.
This patch fixes the issue by explicitly setting `devlink_rate->parent`
to NULL after notifying the driver, thus fulfilling the function's
documented behavior for all rate objects.
[1]
repro steps:
echo 1 > /sys/bus/netdevsim/new_device
devlink dev eswitch set netdevsim/netdevsim1 mode switchdev
echo 1 > /sys/bus/netdevsim/devices/netdevsim1/sriov_numvfs
devlink port function rate add netdevsim/netdevsim1/test_node
devlink port function rate set netdevsim/netdevsim1/128 parent test_node
echo 1 > /sys/bus/netdevsim/del_device
dmesg:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 8 PID: 1530 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
CPU: 8 UID: 0 PID: 1530 Comm: bash Not tainted 6.18.0-rc4+ #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0x42/0xe0
Call Trace:
<TASK>
devl_rate_leaf_destroy+0x8d/0x90
__nsim_dev_port_del+0x6c/0x70 [netdevsim]
nsim_dev_reload_destroy+0x11c/0x140 [netdevsim]
nsim_drv_remove+0x2b/0xb0 [netdevsim]
device_release_driver_internal+0x194/0x1f0
bus_remove_device+0xc6/0x130
device_del+0x159/0x3c0
device_unregister+0x1a/0x60
del_device_store+0x111/0x170 [netdevsim]
kernfs_fop_write_iter+0x12e/0x1e0
vfs_write+0x215/0x3d0
ksys_write+0x5f/0xd0
do_syscall_64+0x55/0x10f0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[2]
devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 1000
devlink port function rate add pci/0000:08:00.0/group1
devlink port function rate set pci/0000:08:00.0/32768 parent group1
modprobe -r mlx5_ib mlx5_fwctl mlx5_core
dmesg:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 7 PID: 16151 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
CPU: 7 UID: 0 PID: 16151 Comm: bash Not tainted 6.17.0-rc7_for_upstream_min_debug_2025_10_02_12_44 #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0x42/0xe0
Call Trace:
<TASK>
devl_rate_leaf_destroy+0x8d/0x90
mlx5_esw_offloads_devlink_port_unregister+0x33/0x60 [mlx5_core]
mlx5_esw_offloads_unload_rep+0x3f/0x50 [mlx5_core]
mlx5_eswitch_unload_sf_vport+0x40/0x90 [mlx5_core]
mlx5_sf_esw_event+0xc4/0x120 [mlx5_core]
notifier_call_chain+0x33/0xa0
blocking_notifier_call_chain+0x3b/0x50
mlx5_eswitch_disable_locked+0x50/0x110 [mlx5_core]
mlx5_eswitch_disable+0x63/0x90 [mlx5_core]
mlx5_unload+0x1d/0x170 [mlx5_core]
mlx5_uninit_one+0xa2/0x130 [mlx5_core]
remove_one+0x78/0xd0 [mlx5_core]
pci_device_remove+0x39/0xa0
device_release_driver_internal+0x194/0x1f0
unbind_store+0x99/0xa0
kernfs_fop_write_iter+0x12e/0x1e0
vfs_write+0x215/0x3d0
ksys_write+0x5f/0xd0
do_syscall_64+0x53/0x1f0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Fixes: d755598450 ("devlink: Allow setting parent node of rate objects")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763381149-1234377-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
s32_pmx_gpio_request_enable() does not initialize the newly-allocated
gpio_pin_config::list before adding it to s32_pinctrl::gpio_configs.
This could result in a linked list corruption.
Initialize the new list_head with INIT_LIST_HEAD() to fix this.
Fixes: fd84aaa817 ("pinctrl: add NXP S32 SoC family support")
Signed-off-by: Jared Kangas <jkangas@redhat.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
s32_pinctrl_desc is allocated with devm_kmalloc(), but not all of its
fields are initialized. Notably, num_custom_params is used in
pinconf_generic_parse_dt_config(), resulting in intermittent allocation
errors, such as the following splat when probing i2c-imx:
WARNING: CPU: 0 PID: 176 at mm/page_alloc.c:4795 __alloc_pages_noprof+0x290/0x300
[...]
Hardware name: NXP S32G3 Reference Design Board 3 (S32G-VNP-RDB3) (DT)
[...]
Call trace:
__alloc_pages_noprof+0x290/0x300 (P)
___kmalloc_large_node+0x84/0x168
__kmalloc_large_node_noprof+0x34/0x120
__kmalloc_noprof+0x2ac/0x378
pinconf_generic_parse_dt_config+0x68/0x1a0
s32_dt_node_to_map+0x104/0x248
dt_to_map_one_config+0x154/0x1d8
pinctrl_dt_to_map+0x12c/0x280
create_pinctrl+0x6c/0x270
pinctrl_get+0xc0/0x170
devm_pinctrl_get+0x50/0xa0
pinctrl_bind_pins+0x60/0x2a0
really_probe+0x60/0x3a0
[...]
__platform_driver_register+0x2c/0x40
i2c_adap_imx_init+0x28/0xff8 [i2c_imx]
[...]
This results in later parse failures that can cause issues in dependent
drivers:
s32g-siul2-pinctrl 4009c240.pinctrl: /soc@0/pinctrl@4009c240/i2c0-pins/i2c0-grp0: could not parse node property
s32g-siul2-pinctrl 4009c240.pinctrl: /soc@0/pinctrl@4009c240/i2c0-pins/i2c0-grp0: could not parse node property
[...]
pca953x 0-0022: failed writing register: -6
i2c i2c-0: IMX I2C adapter registered
s32g-siul2-pinctrl 4009c240.pinctrl: /soc@0/pinctrl@4009c240/i2c2-pins/i2c2-grp0: could not parse node property
s32g-siul2-pinctrl 4009c240.pinctrl: /soc@0/pinctrl@4009c240/i2c2-pins/i2c2-grp0: could not parse node property
i2c i2c-1: IMX I2C adapter registered
s32g-siul2-pinctrl 4009c240.pinctrl: /soc@0/pinctrl@4009c240/i2c4-pins/i2c4-grp0: could not parse node property
s32g-siul2-pinctrl 4009c240.pinctrl: /soc@0/pinctrl@4009c240/i2c4-pins/i2c4-grp0: could not parse node property
i2c i2c-2: IMX I2C adapter registered
Fix this by initializing s32_pinctrl_desc with devm_kzalloc() instead of
devm_kmalloc() in s32_pinctrl_probe(), which sets the previously
uninitialized fields to zero.
Fixes: fd84aaa817 ("pinctrl: add NXP S32 SoC family support")
Signed-off-by: Jared Kangas <jkangas@redhat.com>
Tested-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Improve the cleanup on releasing PTP resources in error path.
The error case might happen either at the driver probe and PTP
feature initialization or on PTP restart (errors in reset handling, NVM
update etc). In both cases, calls to PF PTP cleanup (ice_ptp_cleanup_pf
function) and 'ps_lock' mutex deinitialization were missed.
Additionally, ptp clock was not unregistered in the latter case.
Keep PTP state as 'uninitialized' on init to distinguish between error
scenarios and to avoid resource release duplication at driver removal.
The consequence of missing ice_ptp_cleanup_pf call is the following call
trace dumped when ice_adapter object is freed (port list is not empty,
as it is required at this stage):
[ T93022] ------------[ cut here ]------------
[ T93022] WARNING: CPU: 10 PID: 93022 at
ice/ice_adapter.c:67 ice_adapter_put+0xef/0x100 [ice]
...
[ T93022] RIP: 0010:ice_adapter_put+0xef/0x100 [ice]
...
[ T93022] Call Trace:
[ T93022] <TASK>
[ T93022] ? ice_adapter_put+0xef/0x100 [ice
33d2647ad4f6d866d41eefff1806df37c68aef0c]
[ T93022] ? __warn.cold+0xb0/0x10e
[ T93022] ? ice_adapter_put+0xef/0x100 [ice
33d2647ad4f6d866d41eefff1806df37c68aef0c]
[ T93022] ? report_bug+0xd8/0x150
[ T93022] ? handle_bug+0xe9/0x110
[ T93022] ? exc_invalid_op+0x17/0x70
[ T93022] ? asm_exc_invalid_op+0x1a/0x20
[ T93022] ? ice_adapter_put+0xef/0x100 [ice
33d2647ad4f6d866d41eefff1806df37c68aef0c]
[ T93022] pci_device_remove+0x42/0xb0
[ T93022] device_release_driver_internal+0x19f/0x200
[ T93022] driver_detach+0x48/0x90
[ T93022] bus_remove_driver+0x70/0xf0
[ T93022] pci_unregister_driver+0x42/0xb0
[ T93022] ice_module_exit+0x10/0xdb0 [ice
33d2647ad4f6d866d41eefff1806df37c68aef0c]
...
[ T93022] ---[ end trace 0000000000000000 ]---
[ T93022] ice: module unloaded
Fixes: e800654e85 ("ice: Use ice_adapter for PTP shared data instead of auxdev")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Regulator/supply fixes for a number of boards, removed too fast
cpu OPPs from rk3576 (not supported in newer vendor TF-A and never
supported in upstream TF-A). As well as some DTS validation fixes
and one pinctrl fix for the odroid-m1.
* tag 'v6.18-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: fix PCIe 3.3V regulator voltage on orangepi-5
arm64: dts: rockchip: disable HS400 on RK3588 Tiger
arm64: dts: rockchip: drop reset from rk3576 i2c9 node
arm64: dts: rockchip: Fix USB power enable pin for BTT CB2 and Pi2
arm64: dts: rockchip: Fix vccio4-supply on rk3566-pinetab2
arm64: dts: rockchip: include rk3399-base instead of rk3399 in rk3399-op1
arm64: dts: rockchip: Fix indentation on rk3399 haikou demo dtso
arm64: dts: rockchip: Make RK3588 GPU OPP table naming less generic
arm64: dts: rockchip: Drop 'rockchip,grf' prop from tsadc on rk3328
arm64: dts: rockchip: Remove non-functioning CPU OPPs from RK3576
arm64: dts: rockchip: Fix PCIe power enable pin for BigTreeTech CB2 and Pi2
arm64: dts: rockchip: Set correct pinctrl for I2S1 8ch TX on odroid-m1
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Current gu2host handler registered as MSI-X vector 0 and as per bspec for
a msix vector 0 interrupt, the driver must check the legacy registers
190008(TILE_INT_REG), 190060h (GT INTR Identity Reg 0) and other registers
mentioned in "Interrupt Service Routine Pseudocode" otherwise it will block
the next interrupts. To overcome this issue replacing guc2host handler
with legacy xe_irq_handler.
Fixes: da889070be ("drm/xe/irq: Separate MSI and MSI-X flows")
Bspec: 62357
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20251107083141.2080189-1-venkata.ramana.nayana@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit c34a14bce7090862ebe5a64abe8d85df75e62737)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
If user provides a large value (such as 0x80) for parameter
prefetch_mem_region_instance in vm_bind ioctl, it will cause
BIT(prefetch_region) overflow as below:
"
------------[ cut here ]------------
UBSAN: shift-out-of-bounds in drivers/gpu/drm/xe/xe_vm.c:3414:7
shift exponent 128 is too large for 64-bit type 'long unsigned int'
CPU: 8 UID: 0 PID: 53120 Comm: xe_exec_system_ Tainted: G W 6.18.0-rc1-lgci-xe-kernel+ #200 PREEMPT(voluntary)
Tainted: [W]=WARN
Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
Call Trace:
<TASK>
dump_stack_lvl+0xa0/0xc0
dump_stack+0x10/0x20
ubsan_epilogue+0x9/0x40
__ubsan_handle_shift_out_of_bounds+0x10e/0x170
? mutex_unlock+0x12/0x20
xe_vm_bind_ioctl.cold+0x20/0x3c [xe]
...
"
Fix it by validating prefetch_region before the BIT() usage.
v2: Add Closes and Cc stable kernels. (Matt)
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6478
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251112181005.2120521-2-shuicheng.lin@intel.com
(cherry picked from commit 8f565bdd14eec5611cc041dba4650e42ccdf71d9)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Pull kvm fixes from Paolo Bonzini:
"Arm:
- Only adjust the ID registers when no irqchip has been created once
per VM run, instead of doing it once per vcpu, as this otherwise
triggers a pretty bad conbsistency check failure in the sysreg code
- Make sure the per-vcpu Fine Grain Traps are computed before we load
the system registers on the HW, as we otherwise start running
without anything set until the first preemption of the vcpu
x86:
- Fix selftests failure on AMD, checking for an optimization that was
not happening anymore"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Fix redundant updates of LBR MSR intercepts
KVM: arm64: VHE: Compute fgt traps before activating them
KVM: arm64: Finalize ID registers only once per VM
pcache_meta_find_latest() leaves whatever it last copied into the
caller’s buffer even when it returns NULL. For cache_info_init(),
that meant cache->cache_info could still contain CRC-bad garbage when
no valid metadata exists, leading later initialization paths to read
bogus flags.
Explicitly memset cache->cache_info in cache_info_init_default()
so new-cache paths start from a clean slate. The default sequence
number assignment becomes redundant with this reset, so it drops out.
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
pcache_meta_find_latest() already computes the metadata address as
meta_addr. Reuse that instead of recomputing.
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
CONFIG_BCACHE is tristate, so dm-pcache can also be built-in.
Switch the Makefile to use obj-$(CONFIG_DM_PCACHE) so the target can be
linked into vmlinux instead of always being a loadable module.
Also rename cache_flush() to pcache_cache_flush() to avoid a global
symbol clash with sunrpc/cache.c's cache_flush().
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[Why]
Existing routine has two conversion sequence,
pbn_to_kbps and kbps_to_pbn with margin.
Non of those has without-margin calculation.
kbps_to_pbn with margin conversion includes
fec overhead which has already been included in
pbn_div calculation with 0.994 factor considered.
It is a double counted fec overhead factor that causes
potential bw loss.
[How]
Add without-margin calculation.
Fix fec overhead double counted issue.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3735
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e0dec00f3d)
Cc: stable@vger.kernel.org
In uclogic_params_ugee_v2_init_event_hooks(), the memory allocated for
event_hook is not freed in the next error path. Fix that by freeing it.
Fixes: a251d6576d ("HID: uclogic: Handle wireless device reconnection")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
[Why]
On DCN20 & DCN30, the 6th DPP's & HUBP's are powered on permanently and
cannot be power gated. Thus, when dpp_reset() is invoked for the DPP5,
while it's still powered on, the cached cursor_state
(dpp_base->pos.cur0_ctl.bits.cur0_enable)
and the actual state (CUR0_ENABLE) bit are unsycned. This can cause a
double cursor in full screen with non-native scaling.
[How]
Force disable cursor on DPP5 on plane powerdown for ASICs w/ 6 DPPs/HUBPs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4673
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 79b3c037f9)
Cc: stable@vger.kernel.org
The memory allocated for buf is not freed in the error paths when
ps_get_report() fails. Free buf before jumping to transfer_failed label
Fixes: 947992c7fa ("HID: playstation: DS4: Fix calibration workaround for clone devices")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Silvan Jegen <s.jegen@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
[Why]
Some monitors perform rapid “autoscan” HPD re‑assertions right after a
disconnect or powersaving mode enablement. These appear as a quick
disconnect→reconnect with an identical EDID. Since Linux has no HDMI
hotplug detection (HPD) filter, these quick reconnects are seen as hotplug
events, which can unintentionally wake a system with DPMS off.
An example: https://gitlab.freedesktop.org/drm/amd/-/issues/2876
Such 'fake reconnects' are considered when the interval between a
disconnect and a connect is within 1500ms (experimentally chosen using
several monitors), and the two connections have the same EDID.
[How]
Implement a time-based debounce mechanism:
1. On HDMI disconnect detection, instead of immediately processing the
HPD event, save the current sink and schedule delayed work (default 1500ms)
2. If another HDMI disconnect HPD event arrives during the debounce period,
it reschedules the pending work, ensuring only the final state is processed.
3. When the debounce timer expires, re-detect the display and compare the
new sink with the cached one using EDID comparison.
4. If sinks match (same EDID), this was a spontaneous HPD toggle:
- Update connector state internally
- Skip hotplug event to prevent desktop rearrangement
If sinks differ, this was a real display change:
- Process normally with the hotplug event
The debounce delay is configurable via module parameter
'hdmi_hpd_debounce_delay_ms'.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2876
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c918e75e1e)
A small bug made it's way here when rewriting code to Linux quality.
Currently, if an effect is not infinite and a program requests it's
playback with the same number of loops, the play command won't be fired
and if an effect is infinite, the spam will continue.
We want every playback update for non-infinite effects and only some
for infinite (detecting when a program requests stop with 0 which will
be different than previous value which is usually 1 or 255).
Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
[Why]
When a monitor is booting it's possible that it isn't ready to retrieve
link caps and this can lead to an EDID read failure:
```
[drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read receiver caps dpcd data failed.
amdgpu 0000:c5:00.0: [drm] *ERROR* No EDID read.
```
[How]
Rather than msleep once and try a few times, msleep each time. Should
be no changes for existing working monitors, but should correct reading
caps on a monitor that is slow to boot.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 669dca37b3)
Cc: stable@vger.kernel.org
The ELECOM M-XT3URBK trackball has an additional device ID (0x018F), which
shares the same report descriptor as the existing device (0x00FB). However,
the driver does not currently recognize this new ID, resulting in only five
buttons being functional.
This patch adds the new device ID so that all six buttons work properly.
Signed-off-by: Naoki Ueki <naoki25519@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Don't update the LBR MSR intercept bitmaps if they're already up-to-date,
as unconditionally updating the intercepts forces KVM to recalculate the
MSR bitmaps for vmcb02 on every nested VMRUN. The redundant updates are
functionally okay; however, they neuter an optimization in Hyper-V
nested virtualization enlightenments and this manifests as a self-test
failure.
In particular, Hyper-V lets L1 mark "nested enlightenments" as clean, i.e.
tell KVM that no changes were made to the MSR bitmap since the last VMRUN.
The hyperv_svm_test KVM selftest intentionally changes the MSR bitmap
"without telling KVM about it" to verify that KVM honors the clean hint,
correctly fails because KVM notices the changed bitmap anyway:
==== Test Assertion Failure ====
x86/hyperv_svm_test.c:120: vmcb->control.exit_code == 0x081
pid=193558 tid=193558 errno=4 - Interrupted system call
1 0x0000000000411361: assert_on_unhandled_exception at processor.c:659
2 0x0000000000406186: _vcpu_run at kvm_util.c:1699
3 (inlined by) vcpu_run at kvm_util.c:1710
4 0x0000000000401f2a: main at hyperv_svm_test.c:175
5 0x000000000041d0d3: __libc_start_call_main at libc-start.o:?
6 0x000000000041f27c: __libc_start_main_impl at ??:?
7 0x00000000004021a0: _start at ??:?
vmcb->control.exit_code == SVM_EXIT_VMMCALL
Do *not* fix this by skipping svm_hv_vmcb_dirty_nested_enlightenments()
when svm_set_intercept_for_msr() performs a no-op change. changes to
the L0 MSR interception bitmap are only triggered by full CPUID updates
and MSR filter updates, both of which should be rare. Changing
svm_set_intercept_for_msr() risks hiding unintended pessimizations
like this one, and is actually more complex than this change.
Fixes: fbe5e5f030 ("KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251112013017.1836863-1-yosry.ahmed@linux.dev
[Rewritten commit message based on mailing list discussion. - Paolo]
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[why]
1. With allow_0_dtb_clk enabled, the time required to latch DTBCLK to 600 MHz
depends on the SMU. If DTBCLK is not latched to 600 MHz before set_mode completes,
gating DTBCLK causes the DP2 sink to lose its clock source.
2. The existing DTBCLK gating sequence ungates DTBCLK based on both pix_clk and ref_dtbclk,
but gates DTBCLK when either pix_clk or ref_dtbclk is zero.
pix_clk can be zero outside the set_mode sequence before DTBCLK is properly latched,
which can lead to DTBCLK being gated by mistake.
[how]
Consider both pixel_clk and ref_dtbclk when determining when it is safe to gate DTBCLK;
this is more accurate.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4701
Fixes: 5949e7c489 ("drm/amd/display: Enable Dynamic DTBCLK Switch")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d04eb0c402)
Cc: stable@vger.kernel.org
KVM/arm64 fixes for 6.18, take #3
- Only adjust the ID registers when no irqchip has been created once
per VM run, instead of doing it once per vcpu, as this otherwise
triggers a pretty bad conbsistency check failure in the sysreg code.
- Make sure the per-vcpu Fine Grain Traps are computed before we load
the system registers on the HW, as we otherwise start running without
anything set until the first preemption of the vcpu.
The recent fix to properly initialize the tags of the huge zero folio
had an unfortunate not-so-subtle side effect: it caused the actual
*contents* of the huge zero folio to not be initialized at all when the
hardware didn't support the memory tagging.
The reason was the unfortunate semantics of tag_clear_highpage(): on
hardware that didn't do the tagging, it would silently just not do
anything at all. And since this is done only on arm64 with MTE support,
that basically meant most hardware.
It wasn't necessarily immediately obvious since the huge zero page isn't
necessarily very heavily used - or because it might already be zero
because all-zeroes is the most common pattern. But it ends up causing
random odd user space failures when you do hit it.
The unfortunate semantics have been around for a while, but became a
real bug only when we started actively using __GFP_ZEROTAGS in the
generic get_huge_zero_folio() function - before that, it had only ever
been used in code that checked that the hardware supported it.
Fix this by simply changing the semantics of tag_clear_highpage() to
return whether it actually successfully did something or not. While at
it, also make it initialize multiple pages in one go, since that's
actually what the only caller wants it to do and it simplifies the whole
logic.
Fixes: adfb6609c6 ("mm/huge_memory: initialise the tags of the huge zero folio")
Link: https://lore.kernel.org/all/20251117082023.90176-1-00107082@163.com/
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reported-and-tested-by: David Wang <00107082@163.com>
Reported-and-tested-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Runtime PM should only be enabled in device_resume_early() if it has
been disabled for the given device by device_suspend_late(). Otherwise,
it may cause runtime PM callbacks to run prematurely in some cases
which leads to further functional issues.
Make two changes to address this problem.
First, reorder device_suspend_late() to only disable runtime PM for a
device when it is going to look for the device's callback or if the
device is a "syscore" one. In all of the other cases, disabling runtime
PM for the device is not in fact necessary. However, if the device's
callback returns an error and the power.is_late_suspended flag is not
going to be set, enable runtime PM so it only remains disabled when
power.is_late_suspended is set.
Second, make device_resume_early() only enable runtime PM for the
devices with the power.is_late_suspended flag set.
Fixes: 443046d1ad ("PM: sleep: Make suspend of devices more asynchronous")
Reported-by: Rose Wu <ya-jou.wu@mediatek.com>
Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12784270.O9o76ZdvQC@rafael.j.wysocki
On PTL, no combo PHY is connected to PORT B. However, PORT B can
still be used for Type-C and will utilize the C20 PHY for eDP
over Type-C. In such configurations, VBTs also enumerate PORT B.
This leads to issues where PORT B is incorrectly identified as using the
C10 PHY, due to the assumption that returning true for PORT B in
intel_encoder_is_c10phy() would not cause problems.
From PTL's perspective, only PORT A/PHY A uses the C10 PHY.
Update the helper intel_encoder_is_c10phy() to return true only for
PORT A/PHY on PTL.
v2: Change the condition code style for ptl/wcl
Bspec: 72571,73944
Fixes: 9d10de78a3 ("drm/i915/wcl: C10 phy connected to port A and B")
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/20250922150317.2334680-4-dnyaneshwar.bhadane@intel.com
(cherry picked from commit 8147f7a1c0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Handle skb allocation failures in RX path, to avoid NULL pointer
dereference and RX stalls under memory pressure. If the refill fails
with -ENOMEM, complete napi polling and wake up later to retry via timer.
Also explicitly re-enable RX DMA after oom, so the dmac doesn't remain
stopped in this situation.
Previously, memory pressure could lead to skb allocation failures and
subsequent Oops like:
Oops: Kernel access of bad area, sig: 11 [#2]
Hardware name: SonyPS3 Cell Broadband Engine 0x701000 PS3
NIP [c0003d0000065900] gelic_net_poll+0x6c/0x2d0 [ps3_gelic] (unreliable)
LR [c0003d00000659c4] gelic_net_poll+0x130/0x2d0 [ps3_gelic]
Call Trace:
gelic_net_poll+0x130/0x2d0 [ps3_gelic] (unreliable)
__napi_poll+0x44/0x168
net_rx_action+0x178/0x290
Steps to reproduce the issue:
1. Start a continuous network traffic, like scp of a 20GB file
2. Inject failslab errors using the kernel fault injection:
echo -1 > /sys/kernel/debug/failslab/times
echo 30 > /sys/kernel/debug/failslab/interval
echo 100 > /sys/kernel/debug/failslab/probability
3. After some time, traces start to appear, kernel Oopses
and the system stops
Step 2 is not always necessary, as it is usually already triggered by
the transfer of a big enough file.
Fixes: 02c1889166 ("ps3: gigabit ethernet driver for PS3, take3")
Signed-off-by: Florian Fuchs <fuchsfl@gmail.com>
Link: https://patch.msgid.link/20251113181000.3914980-1-fuchsfl@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The loops in 'qede_tpa_cont()' and 'qede_tpa_end()', iterate
over 'cqe->len_list[]' using only a zero-length terminator as
the stopping condition. If the terminator was missing or
malformed, the loop could run past the end of the fixed-size array.
Add an explicit bound check using ARRAY_SIZE() in both loops to prevent
a potential out-of-bounds access.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 55482edc25 ("qede: Add slowpath/fastpath support and enable hardware GRO")
Signed-off-by: Pavel Zhigulin <Pavel.Zhigulin@kaspersky.com>
Link: https://patch.msgid.link/20251113112757.4166625-1-Pavel.Zhigulin@kaspersky.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In file uncore-frequency/uncore-frequency-common.h,
correct all kernel-doc warnings by adding missing leading " *" to some
lines, adding a missing kernel-doc entry, and fixing a name typo.
Warning: uncore-frequency-common.h:50 bad line:
Storage for kobject attribute elc_low_threshold_percent
Warning: uncore-frequency-common.h:52 bad line:
Storage for kobject attribute elc_high_threshold_percent
Warning: uncore-frequency-common.h:54 bad line:
Storage for kobject attribute elc_high_threshold_enable
Warning: uncore-frequency-common.h:92 struct member
'min_freq_khz_kobj_attr' not described in 'uncore_data'
Warning: uncore-frequency-common.h:92 struct member
'die_id_kobj_attr' not described in 'uncore_data'
Fixes: 24b6616355 ("platform/x86/intel-uncore-freq: Add efficiency latency control to sysfs interface")
Fixes: 416de0246f ("platform/x86: intel-uncore-freq: Fix types in sysfs callbacks")
Fixes: 247b43fcd8 ("platform/x86/intel-uncore-freq: Add attributes to show die_id")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251111060938.1998542-1-rdunlap@infradead.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
isst_if_probe() uses pci_read_config_dword() that returns PCIBIOS_*
codes. The return code is returned from the probe function as is but
probe functions should return normal errnos. A proper implementation
can be found in drivers/leds/leds-ss4200.c.
Convert PCIBIOS_* return codes using pcibios_err_to_errno() into
normal errno before returning.
Fixes: d3a2358429 ("platform/x86: ISST: Add Intel Speed Select mmio interface")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20251117033354.132-1-vulab@iscas.ac.cn
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
And expand it to encompass all pressure pads.
Definition: "pressure pad" as used here as includes all touchpads that
use physical pressure to convert to click, without physical hinges. Also
called haptic touchpads in general parlance, Synaptics calls them
ForcePads.
Most (all?) pressure pads are currently advertised as
INPUT_PROP_BUTTONPAD. The suggestion to identify them as pressure pads
by defining the resolution on ABS_MT_PRESSURE has been in the docs since
commit 20ccc8dd38 ("Documentation: input: define
ABS_PRESSURE/ABS_MT_PRESSURE resolution as grams") but few devices
provide this information.
In userspace it's thus impossible to determine whether a device is a
true pressure pad (pressure equals pressure) or a normal clickpad with
(pressure equals finger size).
Commit 7075ae4ac9 ("Input: add INPUT_PROP_HAPTIC_TOUCHPAD") introduces
INPUT_PROP_HAPTIC_TOUCHPAD but restricted it to those touchpads that
have support for userspace-controlled effects. Let's expand and rename
that definition to include all pressure pad touchpads since those that
do support FF effects can be identified by the presence of the
FF_HAPTIC bit.
This means:
- clickpad: INPUT_PROP_BUTTONPAD
- pressurepad: INPUT_PROP_BUTTONPAD + INPUT_PROP_PRESSUREPAD
- pressurepad with configurable haptics:
INPUT_PROP_BUTTONPAD + INPUT_PROP_PRESSUREPAD + FF_HAPTIC
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://patch.msgid.link/20251106114534.GA405512@tassie
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Airoha_eth driver forwards offloaded uplink traffic (packets received
on GDM1 and forwarded to GDM{3,4}) to GDM2 in order to apply hw QoS.
This is correct if the device does not support a dedicated GDM2 port.
In this case, in order to enable hw offloading for uplink traffic,
the packets should be sent to GDM{3,4} directly.
Fixes: 9cd451d414 ("net: airoha: Add loopback support for GDM2")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20251113-airoha-hw-offload-gdm2-fix-v1-1-7e4ca300872f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ret_set_ksft_status() calls ksft_status_merge() with the current return
status and the last one. It treats a non-zero return code from
ksft_status_merge() as an indication that the return status was
overwritten by the last one and therefore overwrites the return message
with the last one.
Currently, ksft_status_merge() returns a non-zero return code even if
the current return status and the last one are equal. This results in
return messages being overwritten which is counter-productive since we
are more interested in the first failure message and not the last one.
Fix by changing ksft_status_merge() to only return a non-zero return
code if the current return status was actually changed.
Add a test case which checks that the first error message is not
overwritten.
Before:
# ./lib_sh_test.sh
[...]
TEST: RET tfail2 tfail -> fail [FAIL]
retmsg=tfail expected tfail2
[...]
# echo $?
1
After:
# ./lib_sh_test.sh
[...]
TEST: RET tfail2 tfail -> fail [ OK ]
[...]
# echo $?
0
Fixes: 596c8819cb ("selftests: forwarding: Have RET track kselftest framework constants")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251116081029.69112-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function 'mpc_rcvd_sweep_req(mpcginfo)' is called conditionally
from function 'ctcmpc_unpack_skb'. It frees passed mpcginfo.
After that a call to function 'kfree' in function 'ctcmpc_unpack_skb'
frees it again.
Remove 'kfree' call in function 'mpc_rcvd_sweep_req(mpcginfo)'.
Bug detected by the clang static analyzer.
Fixes: 0c0b20587b ("s390/ctcm: fix potential memory leak")
Reviewed-by: Aswin Karuvally <aswin@linux.ibm.com>
Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251112182724.1109474-1-aswin@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull vfs fixes from Christian Brauner:
- Fix unitialized variable in statmount_string()
- Fix hostfs mounting when passing host root during boot
- Fix dynamic lookup to fail on cell lookup failure
- Fix missing file type when reading bfs inodes from disk
- Enforce checking of sb_min_blocksize() calls and update all callers
accordingly
- Restore write access before closing files opened by open_exec() in
binfmt_misc
- Always freeze efivarfs during suspend/hibernate cycles
- Fix statmount()'s and listmount()'s grab_requested_mnt_ns() helper to
actually allow mount namespace file descriptor in addition to mount
namespace ids
- Fix tmpfs remount when noswap is specified
- Switch Landlock to iput_not_last() to remove false-positives from
might_sleep() annotations in iput()
- Remove dead node_to_mnt_ns() code
- Ensure that per-queue kobjects are successfully created
* tag 'vfs-6.18-rc7.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
landlock: fix splats from iput() after it started calling might_sleep()
fs: add iput_not_last()
shmem: fix tmpfs reconfiguration (remount) when noswap is set
fs/namespace: correctly handle errors returned by grab_requested_mnt_ns
power: always freeze efivarfs
binfmt_misc: restore write access before closing files opened by open_exec()
block: add __must_check attribute to sb_min_blocksize()
virtio-fs: fix incorrect check for fsvq->kobj
xfs: check the return value of sb_min_blocksize() in xfs_fs_fill_super
isofs: check the return value of sb_min_blocksize() in isofs_fill_super
exfat: check return value of sb_min_blocksize in exfat_read_boot_sector
vfat: fix missing sb_min_blocksize() return value checks
mnt: Remove dead code which might prevent from building
bfs: Reconstruct file type when loading from disk
afs: Fix dynamic lookup to fail on cell lookup failure
hostfs: Fix only passing host root in boot stage with new mount
fs: Fix uninitialized 'offp' in statmount_string()
Pull sched_ext fixes from Tejun Heo:
"Five fixes addressing PREEMPT_RT compatibility and locking issues.
Three commits fix potential deadlocks and sleeps in atomic contexts on
RT kernels by converting locks to raw spinlocks and ensuring IRQ work
runs in hard-irq context. The remaining two fix unsafe locking in the
debug dump path and a variable dereference typo"
* tag 'sched_ext-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Use IRQ_WORK_INIT_HARD() to initialize rq->scx.kick_cpus_irq_work
sched_ext: Fix possible deadlock in the deferred_irq_workfn()
sched/ext: convert scx_tasks_lock to raw spinlock
sched_ext: Fix unsafe locking in the scx_dump_state()
sched_ext: Fix use of uninitialized variable in scx_bpf_cpuperf_set()
Now target is removed from nvme_fc_ctrl_free() which is the ctrl->ref
release handler. And even admin queue is unquiesced there, this way
is definitely wrong because the ctr->ref is grabbed when submitting
command.
And Marco observed that nvme_fc_ctrl_free() can be called from request
completion code path, and trigger kernel warning since request completes
from softirq context.
Fix the issue by moveing target removal into nvme_fc_delete_ctrl(),
which is also aligned with nvme-tcp and nvme-rdma.
Patch originally proposed by Ming Lei, then modified to move the tagset
removal down to after nvme_fc_delete_association() after further testing.
Cc: Marco Patalano <mpatalan@redhat.com>
Cc: Ewan Milne <emilne@redhat.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: stable@vger.kernel.org
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pull MTD fixes from Miquel Raynal:
"Mostly small misc fixes, here they are sorted by sub-subsystem:
ECC fixes:
- Realtek Kconfig fix
SPI NAND fixes:
- Remove nonexistent QE bit on FMSH FM25S01A
Raw NAND fixes:
- Prevent DMA device NULL pointer dereference in Cadence driver
MTD device fixes:
- Possible integer overflow in read/write ioctls
- Fix the IRQ handler pointer in the onenand driver, even if in
practice it is never dereferenced.
* tag 'mtd/fixes-for-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: onenand: Pass correct pointer to IRQ handler
mtd: spinand: fmsh: remove QE bit for FM25S01A flash
mtd: rawnand: cadence: fix DMA device NULL pointer dereference
mtd: rawnand: realtek: Make rtl_ecc_engine_ops const
mtd: nand: MTD_NAND_ECC_REALTEK should depend on HAS_DMA
mtd: nand: realtek-ecc: Fix a IS_ERR() vs NULL bug in probe
mtdchar: fix integer overflow in read/write ioctls
Integrated amplifier LEAK Stereo 230 by IAG Limited has built-in
ESS9038Q2M DAC served by XMOS controller. It supports both DSD Native
and DSD-over-PCM (DoP) operational modes. But it doesn't work properly
by default and tries DSD-to-PCM conversion. USB quirks below allow it
to operate as designed.
Add DSD_RAW quirk flag for IAG Limited devices (vendor ID 0x2622)
Add DSD format quirk for LEAK Stereo 230 (USB ID 0x2622:0x0061)
Signed-off-by: Ivan Zhaldak <i.v.zhaldak@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20251117125848.30769-1-i.v.zhaldak@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
__snd_vortex_probe() uses pci_read_config_word() that returns PCIBIOS_*
codes (positive values on error). However, the function checks 'err < 0'
which can never be true for PCIBIOS_* codes, causing errors to be silently
ignored.
Check for non-zero return value and convert PCIBIOS_* codes using
pcibios_err_to_errno() into normal errno before returning them.
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251117065559.1138-1-vulab@iscas.ac.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reading the interrupt register `SUN4I_REG_INT_ADDR` causes all of its bits
to be reset. If we ever reach the condition of handling more than
`SUN4I_CAN_MAX_IRQ` IRQs, we will have read the register and reset all its
bits but without actually handling the interrupt inside of the loop body.
This may, among other issues, cause us to never `netif_wake_queue()` again
after a transmission interrupt.
Fixes: 0738eff14d ("can: Allwinner A10/A20 CAN Controller support - Kernel module")
Cc: stable@vger.kernel.org
Co-developed-by: Thomas Mühlbacher <tmuehlbacher@posteo.net>
Signed-off-by: Thomas Mühlbacher <tmuehlbacher@posteo.net>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251116-sun4i-fix-loop-v1-1-3d76d3f81950@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde <mkl@pengutronix.de> says:
The bulk-out callback gs_usb_xmit_callback() does not take care of the
cleanup of failed transfers of URBs. The 1st patch adds the missing
cleanup.
The bulk-in callback gs_usb_receive_bulk_callback() accesses the buffer of
the URB without checking how much data has actually been received. The last
2 patches fix this problem.
Link: https://patch.msgid.link/20251114-gs_usb-fix-usb-callbacks-v1-0-a29b42eacada@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The URB received in gs_usb_receive_bulk_callback() contains a struct
gs_host_frame. The length of the data after the header depends on the
gs_host_frame hf::flags and the active device features (e.g. time
stamping).
Introduce a new function gs_usb_get_minimum_length() and check that we have
at least received the required amount of data before accessing it. Only
copy the data to that skb that has actually been received.
Fixes: d08e973a77 ("can: gs_usb: Added support for the GS_USB CAN devices")
Link: https://patch.msgid.link/20251114-gs_usb-fix-usb-callbacks-v1-3-a29b42eacada@pengutronix.de
[mkl: rename gs_usb_get_minimum_length() -> +gs_usb_get_minimum_rx_length()]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Commit 7e091add9c "nvme-auth: update sc_c in host response" added
the sc_c variable to the dhchap queue context structure which is
appropriately set during negotiate and then used in the host response.
This breaks secure concat connections with a Linux target as the target
code wasn't updated at the same time. This patch fixes this by adding a
new sc_c variable to the host hash calculations.
Fixes: 7e091add9c ("nvme-auth: update sc_c in host response")
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Martin George <marting@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
For PREEMPT_RT kernels, the kick_cpus_irq_workfn() be invoked in
the per-cpu irq_work/* task context and there is no rcu-read critical
section to protect. this commit therefore use IRQ_WORK_INIT_HARD() to
initialize the per-cpu rq->scx.kick_cpus_irq_work in the
init_sched_ext_class().
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix writing bpf_prog (infos|btfs)_cnt to data file, to not generate
invalid perf.data files in some corner cases.
- Fix 'perf top' segfault by ensuring libbfd is initialized. This is an
opt-in feature due to license incompatibilities.
- Fix segfault in 'perf lock' due to missing kernel map.
- Fix 'perf lock contention' test.
- Don't fail fast path detection if binutils-devel isn't available.
- Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason.
* tag 'perf-tools-fixes-for-v6.18-2-2025-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf libbfd: Ensure libbfd is initialized prior to use
perf test: Fix lock contention test
perf lock: Fix segfault due to missing kernel map
tools headers UAPI: Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason
perf build: Don't fail fast path feature detection when binutils-devel is not available
perf header: Write bpf_prog (infos|btfs)_cnt to data file
Pull misc fixes from Andrew Morton:
"7 hotfixes. 5 are cc:stable, 4 are against mm/
All are singletons - please see the respective changelogs for details"
* tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm, swap: fix potential UAF issue for VMA readahead
selftests/user_events: fix type cast for write_index packed member in perf_test
lib/test_kho: check if KHO is enabled
mm/huge_memory: fix folio split check for anon folios in swapcache
MAINTAINERS: update David Hildenbrand's email address
crash: fix crashkernel resource shrink
mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
The file tools/arch/riscv/include/asm/csr.h borrows from
arch/riscv/include/asm/csr.h, and subsequent modifications
related to CSR should maintain consistency.
Signed-off-by: Chen Pei <cp0613@linux.alibaba.com>
Link: https://patch.msgid.link/20251114071215.816-1-cp0613@linux.alibaba.com
[pjw@kernel.org: dropped Fixes: lines for patches that weren't broken; removed superfluous blank line]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Currently, the sbi_init() always attempts to register the legacy shutdown
function as the sys-off handler which is fine when RISCV_SBI_V01 is not
enabled. However, if RISCV_SBI_V01 is enabled in the kernel and the SBI
v0.1 is not supported by the underlying SBI implementation then the
legacy shutdown fails. Fix this by not registering the legacy shutdown
when SRST shutdown is available.
Fixes: 70ddf86d76 ("riscv: sbi: Switch to new sys-off handler API")
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://patch.msgid.link/20251114065808.304430-1-mchitale@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
The driver expects to receive a struct gs_host_frame in
gs_usb_receive_bulk_callback().
Use struct_group to describe the header of the struct gs_host_frame and
check that we have at least received the header before accessing any
members of it.
To resubmit the URB, do not dereference the pointer chain
"dev->parent->hf_size_rx" but use "parent->hf_size_rx" instead. Since
"urb->context" contains "parent", it is always defined, while "dev" is not
defined if the URB it too short.
Fixes: d08e973a77 ("can: gs_usb: Added support for the GS_USB CAN devices")
Link: https://patch.msgid.link/20251114-gs_usb-fix-usb-callbacks-v1-2-a29b42eacada@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reading the interrupt register `SJA1000_IR` causes all of its bits to be
reset. If we ever reach the condition of handling more than
`SJA1000_MAX_IRQ` IRQs, we will have read the register and reset all its
bits but without actually handling the interrupt inside of the loop
body.
This may, among other issues, cause us to never `netif_wake_queue()`
again after a transmission interrupt.
Fixes: 429da1cc84 ("can: Driver for the SJA1000 CAN controller")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Mühlbacher <tmuehlbacher@posteo.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20251115153437.11419-1-tmuehlbacher@posteo.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The `kvaser_usb_leaf_wait_cmd()` and `kvaser_usb_leaf_read_bulk_callback`
functions contain logic to zero-length commands. These commands are used
to align data to the USB endpoint's wMaxPacketSize boundary.
The driver attempts to skip these placeholders by aligning the buffer
position `pos` to the next packet boundary using `round_up()` function.
However, if zero-length command is found exactly on a packet boundary
(i.e., `pos` is a multiple of wMaxPacketSize, including 0), `round_up`
function will return the unchanged value of `pos`. This prevents `pos`
to be increased, causing an infinite loop in the parsing logic.
This patch fixes this in the function by using `pos + 1` instead.
This ensures that even if `pos` is on a boundary, the calculation is
based on `pos + 1`, forcing `round_up()` to always return the next
aligned boundary.
Fixes: 7259124eac ("can: kvaser_usb: Split driver into kvaser_usb_core.c and kvaser_usb_leaf.c")
Signed-off-by: Seungjin Bae <eeodqql09@gmail.com>
Reviewed-by: Jimmy Assarsson <extja@kvaser.com>
Tested-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://patch.msgid.link/20251023162709.348240-1-eeodqql09@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull firewire fixes from Takashi Sakamoto:
"This includes some fixes for the topology map, newly introduced in
v6.18 kernel"
* tag 'firewire-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: fix to update generation field in topology map
firewire: core: Initialize topology_map.lock
Pull EDAC fixes from Borislav Petkov:
- In Versalnet, handle the reporting of non-standard hw errors whose
information can come in more than one remote processor message.
- Explicitly reenable ECC checking after a warm reset in Altera OCRAM
as those registers are reset to default otherwise
- Fix single-bit error injection in Altera EDAC to not inject errors
directly in ECC RAM and thus lead to false double-bit errors due to
same ECC RAM being in concurrent use
* tag 'edac_urgent_for_v6.18_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/altera: Use INTTEST register for Ethernet and USB SBE injection
EDAC/altera: Handle OCRAM ECC enable after warm reset
EDAC/versalnet: Handle split messages for non-standard errors
Since commit 78524b05f1 ("mm, swap: avoid redundant swap device
pinning"), the common helper for allocating and preparing a folio in the
swap cache layer no longer tries to get a swap device reference
internally, because all callers of __read_swap_cache_async are already
holding a swap entry reference. The repeated swap device pinning isn't
needed on the same swap device.
Caller of VMA readahead is also holding a reference to the target entry's
swap device, but VMA readahead walks the page table, so it might encounter
swap entries from other devices, and call __read_swap_cache_async on
another device without holding a reference to it.
So it is possible to cause a UAF when swapoff of device A raced with
swapin on device B, and VMA readahead tries to read swap entries from
device A. It's not easy to trigger, but in theory, it could cause real
issues.
Make VMA readahead try to get the device reference first if the swap
device is a different one from the target entry.
Link: https://lkml.kernel.org/r/20251111-swap-fix-vma-uaf-v1-1-41c660e58562@tencent.com
Fixes: 78524b05f1 ("mm, swap: avoid redundant swap device pinning")
Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When crashkernel is configured with a high reservation, shrinking its
value below the low crashkernel reservation causes two issues:
1. Invalid crashkernel resource objects
2. Kernel crash if crashkernel shrinking is done twice
For example, with crashkernel=200M,high, the kernel reserves 200MB of high
memory and some default low memory (say 256MB). The reservation appears
as:
cat /proc/iomem | grep -i crash
af000000-beffffff : Crash kernel
433000000-43f7fffff : Crash kernel
If crashkernel is then shrunk to 50MB (echo 52428800 >
/sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved:
af000000-beffffff : Crash kernel
Instead, it should show 50MB:
af000000-b21fffff : Crash kernel
Further shrinking crashkernel to 40MB causes a kernel crash with the
following trace (x86):
BUG: kernel NULL pointer dereference, address: 0000000000000038
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
<snip...>
Call Trace: <TASK>
? __die_body.cold+0x19/0x27
? page_fault_oops+0x15a/0x2f0
? search_module_extables+0x19/0x60
? search_bpf_extables+0x5f/0x80
? exc_page_fault+0x7e/0x180
? asm_exc_page_fault+0x26/0x30
? __release_resource+0xd/0xb0
release_resource+0x26/0x40
__crash_shrink_memory+0xe5/0x110
crash_shrink_memory+0x12a/0x190
kexec_crash_size_store+0x41/0x80
kernfs_fop_write_iter+0x141/0x1f0
vfs_write+0x294/0x460
ksys_write+0x6d/0xf0
<snip...>
This happens because __crash_shrink_memory()/kernel/crash_core.c
incorrectly updates the crashk_res resource object even when
crashk_low_res should be updated.
Fix this by ensuring the correct crashkernel resource object is updated
when shrinking crashkernel memory.
Link: https://lkml.kernel.org/r/20251101193741.289252-1-sourabhjain@linux.ibm.com
Fixes: 16c6006af4 ("kexec: enable kexec_crash_size to support two crash kernel regions")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the past, CONFIG_ARCH_HAS_GIGANTIC_PAGE indicated that we support
runtime allocation of gigantic hugetlb folios. In the meantime it evolved
into a generic way for the architecture to state that it supports gigantic
hugetlb folios.
In commit fae7d834c4 ("mm: add __dump_folio()") we started using
CONFIG_ARCH_HAS_GIGANTIC_PAGE to decide MAX_FOLIO_ORDER: whether we could
have folios larger than what the buddy can handle. In the context of that
commit, we started using MAX_FOLIO_ORDER to detect page corruptions when
dumping tail pages of folios. Before that commit, we assumed that we
cannot have folios larger than the highest buddy order, which was
obviously wrong.
In commit 7b4f21f5e0 ("mm/hugetlb: check for unreasonable folio sizes
when registering hstate"), we used MAX_FOLIO_ORDER to detect
inconsistencies, and in fact, we found some now.
Powerpc allows for configs that can allocate gigantic folio during boot
(not at runtime), that do not set CONFIG_ARCH_HAS_GIGANTIC_PAGE and can
exceed PUD_ORDER.
To fix it, let's make powerpc select CONFIG_ARCH_HAS_GIGANTIC_PAGE with
hugetlb on powerpc, and increase the maximum folio size with hugetlb to 16
GiB on 64bit (possible on arm64 and powerpc) and 1 GiB on 32 bit
(powerpc). Note that on some powerpc configurations, whether we actually
have gigantic pages depends on the setting of CONFIG_ARCH_FORCE_MAX_ORDER,
but there is nothing really problematic about setting it unconditionally:
we just try to keep the value small so we can better detect problems in
__dump_folio() and inconsistencies around the expected largest folio in
the system.
Ideally, we'd have a better way to obtain the maximum hugetlb folio size
and detect ourselves whether we really end up with gigantic folios. Let's
defer bigger changes and fix the warnings first.
While at it, handle gigantic DAX folios more clearly: DAX can only end up
creating gigantic folios with HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD.
Add a new Kconfig option HAVE_GIGANTIC_FOLIOS to make both cases clearer.
In particular, worry about ARCH_HAS_GIGANTIC_PAGE only with HUGETLB_PAGE.
Note: with enabling CONFIG_ARCH_HAS_GIGANTIC_PAGE on powerpc, we will now
also allow for runtime allocations of folios in some more powerpc configs.
I don't think this is a problem, but if it is we could handle it through
__HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED.
While __dump_page()/__dump_folio was also problematic (not handling
dumping of tail pages of such gigantic folios correctly), it doesn't seem
critical enough to mark it as a fix.
Link: https://lkml.kernel.org/r/20251114214920.2550676-1-david@kernel.org
Fixes: 7b4f21f5e0 ("mm/hugetlb: check for unreasonable folio sizes when registering hstate")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Closes: https://lore.kernel.org/r/3e043453-3f27-48ad-b987-cc39f523060a@csgroup.eu/
Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Closes: https://lore.kernel.org/r/94377f5c-d4f0-4c0f-b0f6-5bf1cd7305b1@linux.ibm.com/
Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull s390 fix from Heiko Carstens:
- Fix a bug in the __ptep_rdp() inline assembly which may lead to
missing TLB flushes
* tag 's390-6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: Fix __ptep_rdp() inline assembly
Pull x86 fixes from Ingo Molnar:
- Update the list of AMD microcode minimum Entrysign revisions
- Add additional fixed AMD RDSEED microcode revisions
- Update the language transliteration for Kiryl Shutsemau's name
in the MAINTAINERS entry
* tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev
x86/CPU/AMD: Add additional fixed RDSEED microcode revisions
MAINTAINERS: Update name spelling
Pull timer fix from Ingo Molnar:
"Fix a memory leak in the posix timer creation logic"
* tag 'timers-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timers: Plug potential memory leak in do_timer_create()
Pull irq fix from Ingo Molnar:
"Fix an irqchip driver release bug in the riscv-intc irqchip driver"
* tag 'irq-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-intc: Add missing free() callback in riscv_intc_domain_ops
Pull core fix from Ingo Molnar:
"Fix a broken #ifndef in the <linux/entry-virt.h> header.
It hasn't caused problems upstream yet because no arch overrides
arch_xfer_to_guest_mode_handle_work() at this moment"
* tag 'core-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Fix ifndef around arch_xfer_to_guest_mode_handle_work() stub
Commit dc82a33297 ("veth: apply qdisc backpressure on full ptr_ring to
reduce TX drops") introduced a race condition that can lead to a permanently
stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra
Max).
The race occurs in veth_xmit(). The producer observes a full ptr_ring and
stops the queue (netif_tx_stop_queue()). The subsequent conditional logic,
intended to re-wake the queue if the consumer had just emptied it (if
(__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a
"lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and
traffic halts.
This failure is caused by an incorrect use of the __ptr_ring_empty() API
from the producer side. As noted in kernel comments, this check is not
guaranteed to be correct if a consumer is operating on another CPU. The
empty test is based on ptr_ring->consumer_head, making it reliable only for
the consumer. Using this check from the producer side is fundamentally racy.
This patch fixes the race by adopting the more robust logic from an earlier
version V4 of the patchset, which always flushed the peer:
(1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier
are removed. Instead, after stopping the queue, we unconditionally call
__veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled,
making it solely responsible for re-waking the TXQ.
This handles the race where veth_poll() consumes all packets and completes
NAPI *before* veth_xmit() on the producer side has called netif_tx_stop_queue.
The __veth_xdp_flush(rq) will observe rx_notify_masked is false and schedule
NAPI.
(2) On the consumer side, the logic for waking the peer TXQ is moved out of
veth_xdp_rcv() and placed at the end of the veth_poll() function. This
placement is part of fixing the race, as the netif_tx_queue_stopped() check
must occur after rx_notify_masked is potentially set to false during NAPI
completion.
This handles the race where veth_poll() consumes all packets, but haven't
finished (rx_notify_masked is still true). The producer veth_xmit() stops the
TXQ and __veth_xdp_flush(rq) will observe rx_notify_masked is true, meaning
not starting NAPI. Then veth_poll() change rx_notify_masked to false and
stops NAPI. Before exiting veth_poll() will observe TXQ is stopped and wake
it up.
Fixes: dc82a33297 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/176295323282.307447.14790015927673763094.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The validation of the set(nsh(...)) action is completely wrong.
It runs through the nsh_key_put_from_nlattr() function that is the
same function that validates NSH keys for the flow match and the
push_nsh() action. However, the set(nsh(...)) has a very different
memory layout. Nested attributes in there are doubled in size in
case of the masked set(). That makes proper validation impossible.
There is also confusion in the code between the 'masked' flag, that
says that the nested attributes are doubled in size containing both
the value and the mask, and the 'is_mask' that says that the value
we're parsing is the mask. This is causing kernel crash on trying to
write into mask part of the match with SW_FLOW_KEY_PUT() during
validation, while validate_nsh() doesn't allocate any memory for it:
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1c2383067 P4D 1c2383067 PUD 20b703067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 8 UID: 0 Kdump: loaded Not tainted 6.17.0-rc4+ #107 PREEMPT(voluntary)
RIP: 0010:nsh_key_put_from_nlattr+0x19d/0x610 [openvswitch]
Call Trace:
<TASK>
validate_nsh+0x60/0x90 [openvswitch]
validate_set.constprop.0+0x270/0x3c0 [openvswitch]
__ovs_nla_copy_actions+0x477/0x860 [openvswitch]
ovs_nla_copy_actions+0x8d/0x100 [openvswitch]
ovs_packet_cmd_execute+0x1cc/0x310 [openvswitch]
genl_family_rcv_msg_doit+0xdb/0x130
genl_family_rcv_msg+0x14b/0x220
genl_rcv_msg+0x47/0xa0
netlink_rcv_skb+0x53/0x100
genl_rcv+0x24/0x40
netlink_unicast+0x280/0x3b0
netlink_sendmsg+0x1f7/0x430
____sys_sendmsg+0x36b/0x3a0
___sys_sendmsg+0x87/0xd0
__sys_sendmsg+0x6d/0xd0
do_syscall_64+0x7b/0x2c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The third issue with this process is that while trying to convert
the non-masked set into masked one, validate_set() copies and doubles
the size of the OVS_KEY_ATTR_NSH as if it didn't have any nested
attributes. It should be copying each nested attribute and doubling
them in size independently. And the process must be properly reversed
during the conversion back from masked to a non-masked variant during
the flow dump.
In the end, the only two outcomes of trying to use this action are
either validation failure or a kernel crash. And if somehow someone
manages to install a flow with such an action, it will most definitely
not do what it is supposed to, since all the keys and the masks are
mixed up.
Fixing all the issues is a complex task as it requires re-writing
most of the validation code.
Given that and the fact that this functionality never worked since
introduction, let's just remove it altogether. It's better to
re-introduce it later with a proper implementation instead of trying
to fix it in stable releases.
Fixes: b2d0f5d5dc ("openvswitch: enable NSH support")
Reported-by: Junvy Yang <zhuque@tencent.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20251112112246.95064-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported use-after-free in mptcp_schedule_work() [1]
Issue here is that mptcp_schedule_work() schedules a work,
then gets a refcount on sk->sk_refcnt if the work was scheduled.
This refcount will be released by mptcp_worker().
[A] if (schedule_work(...)) {
[B] sock_hold(sk);
return true;
}
Problem is that mptcp_worker() can run immediately and complete before [B]
We need instead :
sock_hold(sk);
if (schedule_work(...))
return true;
sock_put(sk);
[1]
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 29 at lib/refcount.c:25 refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:25
Call Trace:
<TASK>
__refcount_add include/linux/refcount.h:-1 [inline]
__refcount_inc include/linux/refcount.h:366 [inline]
refcount_inc include/linux/refcount.h:383 [inline]
sock_hold include/net/sock.h:816 [inline]
mptcp_schedule_work+0x164/0x1a0 net/mptcp/protocol.c:943
mptcp_tout_timer+0x21/0xa0 net/mptcp/protocol.c:2316
call_timer_fn+0x17e/0x5f0 kernel/time/timer.c:1747
expire_timers kernel/time/timer.c:1798 [inline]
__run_timers kernel/time/timer.c:2372 [inline]
__run_timer_base+0x648/0x970 kernel/time/timer.c:2384
run_timer_base kernel/time/timer.c:2393 [inline]
run_timer_softirq+0xb7/0x180 kernel/time/timer.c:2403
handle_softirqs+0x22f/0x710 kernel/softirq.c:622
__do_softirq kernel/softirq.c:656 [inline]
run_ktimerd+0xcf/0x190 kernel/softirq.c:1138
smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
Cc: stable@vger.kernel.org
Fixes: 3b1d6210a9 ("mptcp: implement and use MPTCP-level retransmission")
Reported-by: syzbot+355158e7e301548a1424@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6915b46f.050a0220.3565dc.0028.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251113103924.3737425-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The LED setup routine registered both led_sync_good
and led_is_gm devices without checking the return
values of led_classdev_register(). If either registration
failed, the function continued silently, leaving the
driver in a partially-initialized state and leaking
a registered LED classdev.
Add proper error handling
Fixes: 7d9ee2e8ff ("net: dsa: hellcreek: Add PTP status LEDs")
Signed-off-by: Pavel Zhigulin <Pavel.Zhigulin@kaspersky.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://patch.msgid.link/20251113135745.92375-1-Pavel.Zhigulin@kaspersky.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull pci fixes from Bjorn Helgaas:
- Cache the ASPM L0s/L1 Supported bits early so quirks can override
them if necessary (Bjorn Helgaas)
- Add quirks for PA Semi and Freescale Root Ports and a HiSilicon Wi-Fi
device that are reported to have broken L0s and L1 (Shawn Lin, Bjorn
Helgaas)
* tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi
PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports
PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports
PCI/ASPM: Convert quirks to override advertised link states
PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states
PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden
Pull bpf fixes from Alexei Starovoitov:
- Fix interaction between livepatch and BPF fexit programs (Song Liu)
With Steven and Masami acks.
- Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa)
With Steven and Masami acks.
- Fix out of bounds access in widen_imprecise_scalars() in the verifier
(Eduard Zingerman)
- Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen)
- Fix net_sched storage collision with BPF data_meta/data_end (Eric
Dumazet)
- Add _impl suffix to BPF kfuncs with implicit args to avoid breaking
them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test widen_imprecise_scalars() with different stack depth
bpf: account for current allocated stack depth in widen_imprecise_scalars()
bpf: Add bpf_prog_run_data_pointers()
selftests/bpf: Add mptcp test with sockmap
mptcp: Fix proto fallback detection with BPF
mptcp: Disallow MPTCP subflows from sockmap
selftests/bpf: Add stacktrace ips test for raw_tp
selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi
x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe
Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"
bpf: add _impl suffix for bpf_stream_vprintk() kfunc
bpf:add _impl suffix for bpf_task_work_schedule* kfuncs
selftests/bpf: Add tests for livepatch + bpf trampoline
ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct()
ftrace: Fix BPF fexit with livepatch
Pull Rust fix from Miguel Ojeda:
- Fix a Rust 1.91.0 build issue due to 'bindings.o' not containing
DWARF debug information anymore by teaching gendwarfksyms to skip
object files without exports
* tag 'rust-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
gendwarfksyms: Skip files with no exports
Pull NFS client fixes from Anna Schumaker:
- Various fixes when using NFS with TLS
- Localio direct-IO fixes
- Fix error handling in nfs_atomic_open_v23()
- Fix sysfs memory leak when nfs_client kobject add fails
- Fix an incorrect parameter when calling nfs4_call_sync()
- Fix a failing LTP test when using delegated timestamps
* tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix LTP test failures when timestamps are delegated
NFSv4: Fix an incorrect parameter when calling nfs4_call_sync()
NFS: sysfs: fix leak when nfs_client kobject add fails
NFSv2/v3: Fix error handling in nfs_atomic_open_v23()
nfs/localio: do not issue misaligned DIO out-of-order
nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion
nfs/localio: backfill missing partial read support for misaligned DIO
nfs/localio: add refcounting for each iocb IO associated with NFS pgio header
nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support
NFS: Check the TLS certificate fields in nfs_match_client()
pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS
pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect()
pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect()
Pull drm fixes from Dave Airlie:
"Weekly fixes, amdgpu and vmwgfx making up the most of it, along with
panthor and i915/xe.
Seems about right for this time of development, nothing major
outstanding.
client:
- Fix description of module parameter
panthor:
- Flush writes before mapping buffers
vmwgfx:
- Improve command validation
- Improve ref counting
- Fix cursor-plane support
amdgpu:
- Disallow P2P DMA for GC 12 DCC surfaces
- ctx error handling fix
- UserQ fixes
- VRR fix
- ISP fix
- JPEG 5.0.1 fix
amdkfd:
- Save area check fix
- Fix GPU mappings for APU after prefetch
i915:
- Fix PSR's pipe to vblank conversion
- Disable Panel Replay on MST links
xe:
- New HW workarounds affecting PTL and WCL platforms
* tag 'drm-fixes-2025-11-15' of https://gitlab.freedesktop.org/drm/kernel:
drm/client: fix MODULE_PARM_DESC string for "active"
drm/i915/dp_mst: Disable Panel Replay
drm/amdkfd: Fix GPU mappings for APU after prefetch
drm/amdkfd: relax checks for over allocation of save area
drm/amdgpu/jpeg: Add parse_cs for JPEG5_0_1
drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BO
drm/amd/display: Allow VRR params change if unsynced with the stream
drm/amdgpu: fix lock warning in amdgpu_userq_fence_driver_process
drm/amdgpu: jump to the correct label on failure
drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces
drm/xe/xe3lpg: Extend Wa_15016589081 for xe3lpg
drm/xe/xe3: Extend wa_14023061436
drm/xe/xe3: Add WA_14024681466 for Xe3_LPG
drm/i915/psr: fix pipe to vblank conversion
drm/panthor: Flush shmem writes before mapping buffers CPU-uncached
drm/vmwgfx: Restore Guest-Backed only cursor plane support
drm/vmwgfx: Use kref in vmw_bo_dirty
drm/vmwgfx: Validate command header size against SVGA_CMD_MAX_DATASIZE
TEE kernel-doc fixes for v6.18
* tag 'tee-fix-for-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee:
tee: <uapi/linux/tee.h: fix all kernel-doc issues
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This pull request contains Broadcom ARM64-based SoCs Device Tree fixes
for 6.18, please pull the following:
- Andrea assigns clocks rates for the Ethernet controller for the
Raspberry Pi 5 systems
- Laurent adds an ethernet0 alias to allow client programs consuming
that alias to populate the correct Ethernet address for the Raspberry
Pi 5 systems
* tag 'arm-soc/for-6.18/devicetree-arm64-fixes-v2' of https://github.com/Broadcom/stblinux:
arm64: dts: broadcom: bcm2712: rpi-5: Add ethernet0 alias
arm64: dts: broadcom: Assign clock rates in eth node for RPi5
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reset controller fixes for v6.18
* Fix incorrect EARC reset masks in the reset-imx8mp-audiomix driver,
introduced in commit a83bc87cd3.
* tag 'reset-fixes-for-v6.18' of https://git.pengutronix.de/git/pza/linux:
reset: imx8mp-audiomix: Fix bad mask values
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This pull request contains Broadcom ARM-based SoCs Device Tree files updates
for 6.18, please pull the following:
- Rafal fixes the Ethernet PHY address on the Luxul XAP-1440
* tag 'arm-soc/for-6.18/devicetree-fixes-part2' of https://github.com/Broadcom/stblinux:
ARM: dts: BCM53573: Fix address of Luxul XAP-1440's Ethernet PHY
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This pull request contains Broadcom ARM64 defconfig updates for 6.18,
please pull the following:
- Stefan ensures that the clk-raspberrypi driver which is now the clock
provider is built into the kernel image to satisfy root over NFS
* tag 'arm-soc/for-6.18/defconfig-arm64-fixes' of https://github.com/Broadcom/stblinux:
arm64: defconfig: Fix V3D deferred probe timeout
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Memory controller drivers - fixes for v6.18
Correct incorrect ID used for the memory controller client IDs in
Tegra210 Memory Controller driver, introduced in v6.18-rc1.
* tag 'memory-controller-drv-fixes-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
memory: tegra210: Fix incorrect client ids
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull spi fixes from Mark Brown:
"A few standard fixes here, plus one more interesting one from Hans
which addresses an issue where a move in when we requested GPIOs on
ACPI systems caused us to stop doing pinmuxing and leave things
floating that we'd really rather not have floating"
* tag 'spi-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Add TODO comment about ACPI GPIO setup
spi: xilinx: increase number of retries before declaring stall
spi: imx: keep dma request disabled before dma transfer setup
spi: Try to get ACPI GPIO IRQ earlier
First batch of ASPEED fixes for 6.18
This time it's just the one fix addressing a PHY configuration regression in the
Fuji (Meta) platform's mac3 devicetree node.
* tag 'aspeed-6.18-fixes-0' of https://git.kernel.org/pub/scm/linux/kernel/git/bmc/linux:
ARM: dts: aspeed: fuji-data64: Enable mac3 controller
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
arm64: tegra: Fixes for v6.18
This contains a simple fix to mark the Ethernet PHY on Jetson Xavier NX
as a wakeup source so the device can support WoL.
* tag 'tegra-for-6.18-arm64-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Mark Jetson Xavier NX's PHY as a wakeup source
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull regulator fix from Mark Brown:
"One simple fix for a GPIO descriptor leak in the probe error handling
for the fixed regulator"
* tag 'regulator-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fixed: fix GPIO descriptor leak on register failure
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All changes are device-specific, and
nothing stands out.
- A regression fix for HD-audio HDMI probe
- USB-audio hardening patches for issues spotted by fuzzers
- ASoC fixes for TAS278x, SoundWire and Cirrus
- Usual HD-audio and USB-audio quirks"
* tag 'sound-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add native DSD quirks for PureAudio DAC series
ASoC: rsnd: fix OF node reference leak in rsnd_ssiu_probe()
ALSA: hda/tas2781: Correct the wrong project ID
ALSA: usb-audio: Fix NULL pointer dereference in snd_usb_mixer_controls_badd
ASoC: SDCA: bug fix while parsing mipi-sdca-control-cn-list
ALSA: usb-audio: Fix potential overflow of PCM transfer buffer
ALSA: hda/tas2781: Add new quirk for HP new projects
ASoC: tas2781: fix getting the wrong device number
ASoC: codecs: va-macro: fix resource leak in probe error path
ASoC: tas2783A: Fix issues in firmware parsing
ASoC: sdw_utils: fix device reference leak in is_sdca_endpoint_present()
ASoC: cs4271: Fix regulator leak on probe failure
ALSA: hda/hdmi: Fix breakage at probing nvhdmi-mcp driver
ASoC: da7213: Use component driver suspend/resume
ALSA: usb-audio: add min_mute quirk for SteelSeries Arctis
ASoC: doc: cs35l56: Update firmware filename description for B0 silicon
Pull block fixlet from Jens Axboe:
"Been sitting on this one for a week or two, planning on sending it out
when there were other block changes for 6.18. But as that hasn't
materialized in the second week of sitting on it, let's flush it out.
A previous commit updated my git tree locations, but one was missed as
it was already set to the git.kernel.org one. But the git location swap
also renamed the actual tree from linux-block to just linux, let's get
that last one updated too"
* tag 'block-6.18-20251114' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
MAINTAINERS: correct git location for block layer tree
Pull io_uring fixes from Jens Axboe:
- Use the actual segments in a request when for bvec based buffers
- Fix an odd case where the iovec might get leaked for a read/write
request, if it was newly allocated, overflowed the alloc cache, and
hit an early error
- Minor tweak to the query API added in this release, returning the
number of available entries
* tag 'io_uring-6.18-20251113' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
io_uring/query: return number of available queries
io_uring/rw: ensure allocated iovec gets cleared for early failure
A test case for a situation when widen_imprecise_scalars() is called
with old->allocated_stack > cur->allocated_stack. Test structure:
def widening_stack_size_bug():
r1 = 0
for r6 in 0..1:
iterator_with_diff_stack_depth(r1)
r1 = 42
def iterator_with_diff_stack_depth(r1):
if r1 != 42:
use 128 bytes of stack
iterator based loop
iterator_with_diff_stack_depth() is verified with r1 == 0 first and
r1 == 42 next. Causing stack usage of 128 bytes on a first visit and 8
bytes on a second. Such arrangement triggered a KASAN error in
widen_imprecise_scalars().
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251114025730.772723-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The usage pattern for widen_imprecise_scalars() looks as follows:
prev_st = find_prev_entry(env, ...);
queued_st = push_stack(...);
widen_imprecise_scalars(env, prev_st, queued_st);
Where prev_st is an ancestor of the queued_st in the explored states
tree. This ancestor is not guaranteed to have same allocated stack
depth as queued_st. E.g. in the following case:
def main():
for i in 1..2:
foo(i) // same callsite, differnt param
def foo(i):
if i == 1:
use 128 bytes of stack
iterator based loop
Here, for a second 'foo' call prev_st->allocated_stack is 128,
while queued_st->allocated_stack is much smaller.
widen_imprecise_scalars() needs to take this into account and avoid
accessing bpf_verifier_state->frame[*]->stack out of bounds.
Fixes: 2793a8b015 ("bpf: exact states comparison for iterator convergence checks")
Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251114025730.772723-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit b6bcbce335 ("soc/tegra: pmc: Ensure power-domains are in a
known state") was introduced so that all power domains get initialized
to a known working state when booting and it does this by shutting them
down (including asserting resets and disabling clocks) before registering
each power domain with the genpd framework, leaving it to each driver to
later on power its needed domains.
This caused the Google Pixel C to hang when booting due to a workaround
in the DSI driver introduced in commit b22fd0b963 ("drm/tegra: dsi:
Clear enable register if powered by bootloader") meant to handle the case
where the bootloader enabled the DSI hardware module. The workaround relies
on reading a hardware register to determine the current status and after
b6bcbce335 that now happens in a powered down state thus leading to
the boot hang.
Fix this by reverting b22fd0b963 since currently we are guaranteed
that the hardware will be fully reset by the time we start enabling the
DSI module.
Fixes: b6bcbce335 ("soc/tegra: pmc: Ensure power-domains are in a known state")
Cc: stable@vger.kernel.org
Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20251103-diogo-smaug_ec_typec-v1-1-be656ccda391@tecnico.ulisboa.pt
Add a call to put_pid() corresponding to get_task_pid().
host1x_memory_context_alloc() does not take ownership of the PID so we
need to free it here to avoid leaking.
Signed-off-by: Prateek Agarwal <praagarwal@nvidia.com>
Fixes: e09db97889 ("drm/tegra: Support context isolation")
[mperttunen@nvidia.com: reword commit message]
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20250919-host1x-put-pid-v1-1-19c2163dfa87@nvidia.com
syzbot found that cls_bpf_classify() is able to change
tc_skb_cb(skb)->drop_reason triggering a warning in sk_skb_reason_drop().
WARNING: CPU: 0 PID: 5965 at net/core/skbuff.c:1192 __sk_skb_reason_drop net/core/skbuff.c:1189 [inline]
WARNING: CPU: 0 PID: 5965 at net/core/skbuff.c:1192 sk_skb_reason_drop+0x76/0x170 net/core/skbuff.c:1214
struct tc_skb_cb has been added in commit ec624fe740 ("net/sched:
Extend qdisc control block with tc control block"), which added a wrong
interaction with db58ba4592 ("bpf: wire in data and data_end for
cls_act_bpf").
drop_reason was added later.
Add bpf_prog_run_data_pointers() helper to save/restore the net_sched
storage colliding with BPF data_meta/data_end.
Fixes: ec624fe740 ("net/sched: Extend qdisc control block with tc control block")
Reported-by: syzbot <syzkaller@googlegroups.com>
Closes: https://lore.kernel.org/netdev/6913437c.a70a0220.22f260.013b.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251112125516.1563021-1-edumazet@google.com
Pull crypto fix from Herbert Xu:
- Fix device reference leak in hisilicon
* tag 'v6.18-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: hisilicon/qm - Fix device reference leak in qm_get_qos_value
Pull smb client fixes from Steve French:
- Multichannel reconnect channel selection fix
- Fix for smbdirect (RDMA) disconnect bug
- Fix for incorrect username length check
- Fix memory leak in mount parm processing
* tag 'v6.18-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: let smbd_disconnect_rdma_connection() turn CREATED into DISCONNECTED
smb: fix invalid username check in smb3_fs_context_parse_param()
cifs: client: fix memory leak in smb3_fs_context_parse_param
smb: client: fix cifs_pick_channel when channel needs reconnect
The irq_domain_free_irqs() helper requires that the irq_domain_ops->free
callback is implemented. Otherwise, the kernel reports the warning message
"NULL pointer, cannot free irq" when irq_dispose_mapping() is invoked to
release the per-HART local interrupts.
Set irq_domain_ops->free to irq_domain_free_irqs_top() to cure that.
Fixes: 832f15f426 ("RISC-V: Treat IPIs as normal Linux IRQs")
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251114-rv-intc-fix-v1-1-a3edd1c1a868@sifive.com
When a zero ASCE is passed to the __ptep_rdp() inline assembly, the
generated instruction should have the R3 field of the instruction set to
zero. However the inline assembly is written incorrectly: for such cases a
zero is loaded into a register allocated by the compiler and this register
is then used by the instruction.
This means that selected TLB entries may not be flushed since the specified
ASCE does not match the one which was used when the selected TLB entries
were created.
Fix this by removing the asce and opt parameters of __ptep_rdp(), since
all callers always pass zero, and use a hard-coded register zero for
the R3 field.
Fixes: 0807b85652 ("s390/mm: add support for RDP (Reset DAT-Protection)")
Cc: stable@vger.kernel.org
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The PureAudio APA DAC and Lotus DAC5 series are USB Audio
2.0 Class devices that support native Direct Stream Digital (DSD)
playback via specific vendor protocols.
Without these quirks, the devices may only function in standard
PCM mode, or fail to correctly report their DSD format capabilities
to the ALSA framework, preventing native DSD playback under Linux.
This commit adds new quirk entries for the mentioned DAC models
based on their respective Vendor/Product IDs (VID:PID), for example:
0x16d0:0x0ab1 (APA DAC), 0x16d0:0xeca1 (DAC5 series), etc.
The quirk ensures correct DSD format handling by setting the required
SNDRV_PCM_FMTBIT_DSD_U32_BE format bit and defining the DSD-specific
Audio Class 2.0 (AC2.0) endpoint configurations. This allows the ALSA
DSD API to correctly address the device for high-bitrate DSD streams,
bypassing the need for DoP (DSD over PCM).
Test on APA DAC and Lotus DAC5 SE under Arch Linux.
Tested-by: Lushih Hsieh <bruce@mail.kh.edu.tw>
Signed-off-by: Lushih Hsieh <bruce@mail.kh.edu.tw>
Link: https://patch.msgid.link/20251114052053.54989-1-bruce@mail.kh.edu.tw
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The xfrm_add_acquire() function constructs an xfrm policy by calling
xfrm_policy_construct(). This allocates the policy structure and
potentially associates a security context and a device policy with it.
However, at the end of the function, the policy object is freed using
only kfree() . This skips the necessary cleanup for the security context
and device policy, leading to a memory leak.
To fix this, invoke the proper cleanup functions xfrm_dev_policy_delete(),
xfrm_dev_policy_free(), and security_xfrm_policy_free() before freeing the
policy object. This approach mirrors the error handling path in
xfrm_add_policy(), ensuring that all associated resources are correctly
released.
Fixes: 980ebd2579 ("[IPSEC]: Sync series - acquire insert")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
ASoC: Fixes for v6.18
A small collection of fixes, all driver specific and none especially
remarkable unless you have the hardware (many not even then).
The function mlxsw_sp_flower_stats() calls mlxsw_sp_acl_ruleset_get() to
obtain a ruleset reference. If the subsequent call to
mlxsw_sp_acl_rule_lookup() fails to find a rule, the function returns
an error without releasing the ruleset reference, causing a memory leak.
Fix this by using a goto to the existing error handling label, which
calls mlxsw_sp_acl_ruleset_put() to properly release the reference.
Fixes: 7c1b8eb175 ("mlxsw: spectrum: Add support for TC flower offload statistics")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251112052114.1591695-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ethtool tsconfig Netlink path can trigger a null pointer
dereference. A call chain such as:
tsconfig_prepare_data() ->
dev_get_hwtstamp_phylib() ->
vlan_hwtstamp_get() ->
generic_hwtstamp_get_lower() ->
generic_hwtstamp_ioctl_lower()
results in generic_hwtstamp_ioctl_lower() being called with
kernel_cfg->ifr as NULL.
The generic_hwtstamp_ioctl_lower() function does not expect
a NULL ifr and dereferences it, leading to a system crash.
Fix this by adding a NULL check for kernel_cfg->ifr in
generic_hwtstamp_ioctl_lower(). If ifr is NULL, return -EINVAL.
Fixes: 6e9e2eed4f ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config")
Closes: https://lore.kernel.org/cd6a7056-fa6d-43f8-b78a-f5e811247ba8@linux.dev
Signed-off-by: Jiaming Zhang <r772577952@gmail.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20251111173652.749159-2-r772577952@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull VFIO seftest fixes from Alex Williamson:
- Fix vfio selftests to remove the expectation that the IOMMU supports
a 64-bit IOVA space.
These manifest both in the original set of tests introduced this
development cycle in identity mapping the IOVA to buffer virtual
address space, as well as the more recent boundary testing.
Implement facilities for collecting the valid IOVA ranges from the
backend, implement a simple IOVA allocator, and use the information
for determining extents (Alex Mastro)
* tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio:
vfio: selftests: replace iova=vaddr with allocated iovas
vfio: selftests: add iova allocator
vfio: selftests: fix map limit tests to use last available iova
vfio: selftests: add iova range query helpers
Pull hwmon fixes from Guenter Roeck:
- gpd-fan: Fix compilation error for non-ACPI builds, and initialize EC
when loading the driver
* tag 'hwmon-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (gpd-fan) initialize EC on driver load for Win 4
hwmon: (gpd-fan) Fix compilation error in non-ACPI builds
Pull power management fixes from Rafael Wysocki:
"These fix issues related to the handling of compressed hibernation
images and a recent intel_pstate driver regression:
- Fix issues related to using inadequate data types and incorrect use
of atomic variables in the compressed hibernation images handling
code that were introduced during the 6.9 development cycle (Mario
Limonciello)
- Move a X86_FEATURE_IDA check from turbo_is_disabled() to the places
where a new value for MSR_IA32_PERF_CTL is computed in intel_pstate
to address a regression preventing users from enabling turbo
frequencies post-boot (Srinivas Pandruvada)"
* tag 'pm-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes
PM: hibernate: Fix style issues in save_compressed_image()
PM: hibernate: Use atomic64_t for compressed_size variable
PM: hibernate: Emit an error when image writing fails
Pull ACPI fixes from Rafael Wysocki:
"These fix issues in the ACPI CPPC library and in the recently added
parser for the ACPI MRRM table:
- Limit some checks in the ACPI CPPC library to online CPUs to avoid
accessing uninitialized per-CPU variables when some CPUs are
offline to start with, like during boot with 'nosmt=force' (Gautham
Shenoy)
- Rework add_boot_memory_ranges() in the ACPI MRRM table parser to
fix memory leaks and improve error handling (Kaushlendra Kumar)"
* tag 'acpi-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: MRRM: Fix memory leaks and improve error handling
ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs
ACPI: CPPC: Perform fast check switch only for online CPUs
ACPI: CPPC: Check _CPC validity for only the online CPUs
ACPI: CPPC: Detect preferred core availability on online CPUs
We've had reports from the field that some RK3588 Tiger have random
issues with eMMC errors.
Applying commit a28352cf2d ("mmc: sdhci-of-dwcmshc: Change
DLL_STRBIN_TAPNUM_DEFAULT to 0x4") didn't help and seemed to have made
things worse for our board.
Our HW department checked the eMMC lines and reported that they are too
long and don't look great so signal integrity is probably not the best.
Note that not all Tigers with the same eMMC chip have errors, so the
suspicion is that we're really on the edge in terms of signal integrity
and only a handful devices are failing. Additionally, we have RK3588
Jaguars with the same eMMC chip but the layout is different and we also
haven't received reports about those so far.
Lowering the max-frequency to 150MHz from 200MHz instead of simply
disabling HS400 was briefly tested and seem to work as well. We've
disabled HS400 downstream and haven't received reports since so we'll go
with that instead of lowering the max-frequency.
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
Fixes: 6173ef24b3 ("arm64: dts: rockchip: add RK3588-Q7 (Tiger) SoM")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251112-tiger-hs200-v1-1-b50adac107c0@cherry.de
[added Fixes tag and stable-cc from 2nd mail]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Jiayuan Chen says:
====================
mptcp: Fix conflicts between MPTCP and sockmap
Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.
sockmap works by replacing sk_data_ready, recvmsg, sendmsg operations and
implementing fast socket-level forwarding logic:
1. Users can obtain file descriptors through userspace socket()/accept()
interfaces, then call BPF syscall to perform these replacements.
2. Users can also use the bpf_sock_hash_update helper (in sockops programs)
to replace handlers when TCP connections enter ESTABLISHED state
(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)
However, when combined with MPTCP, an issue arises: MPTCP creates subflow
sk's and performs TCP handshakes, so the BPF program obtains subflow sk's
and may incorrectly replace their sk_prot. We need to reject such
operations. In patch 1, we set psock_update_sk_prot to NULL in the
subflow's custom sk_prot.
Additionally, if the server's listening socket has MPTCP enabled and the
client's TCP also uses MPTCP, we should allow the combination of subflow
and sockmap. This is because the latest Golang programs have enabled MPTCP
for listening sockets by default [2]. For programs already using sockmap,
upgrading Golang should not cause sockmap functionality to fail.
Patch 2 prevents the WARNING from occurring.
Despite these patches fixing stream corruption, users of sockmap must set
GODEBUG=multipathtcp=0 to disable MPTCP until sockmap fully supports it.
[1] truncated warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
<TASK>
do_accept+0xeb/0x190
? __x64_sys_pselect6+0x61/0x80
? _raw_spin_unlock+0x12/0x30
? alloc_fd+0x11e/0x190
__sys_accept4+0x8c/0x100
__x64_sys_accept+0x1f/0x30
x64_sys_call+0x202f/0x20f0
do_syscall_64+0x72/0x9a0
? switch_fpu_return+0x60/0xf0
? irqentry_exit_to_user_mode+0xdb/0x1e0
? irqentry_exit+0x3f/0x50
? clear_bhb_loop+0x50/0xa0
? clear_bhb_loop+0x50/0xa0
? clear_bhb_loop+0x50/0xa0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
---[ end trace 0000000000000000 ]---
[2]: https://go-review.googlesource.com/c/go/+/607715
====================
Link: https://patch.msgid.link/20251111060307.194196-1-jiayuan.chen@linux.dev
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The sockmap feature allows bpf syscall from userspace, or based
on bpf sockops, replacing the sk_prot of sockets during protocol stack
processing with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
syn_recv_sock()/subflow_syn_recv_sock()
tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
bpf_skops_established <== sockops
bpf_sock_map_update(sk) <== call bpf helper
tcp_bpf_update_proto() <== update sk_prot
'''
When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_syn_recv_sock()
subflow_ulp_fallback()
subflow_drop_ctx()
mptcp_subflow_ops_undo_override()
'''
Then, this subflow can be normally used by sockmap, which replaces the
native sk_prot with sockmap's custom sk_prot. The issue occurs when the
user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
Here, it uses sk->sk_prot to compare with the native sk_prot, but this
is incorrect when sockmap is used, as we may incorrectly set
sk->sk_socket->ops.
This fix uses the more generic sk_family for the comparison instead.
Additionally, this also prevents a WARNING from occurring:
result from ./scripts/decode_stacktrace.sh:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
(net/mptcp/protocol.c:4005)
Modules linked in:
...
PKRU: 55555554
Call Trace:
<TASK>
do_accept (net/socket.c:1989)
__sys_accept4 (net/socket.c:2028 net/socket.c:2057)
__x64_sys_accept (net/socket.c:2067)
x64_sys_call (arch/x86/entry/syscall_64.c:41)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f87ac92b83d
---[ end trace 0000000000000000 ]---
Fixes: 0b4f33def7 ("mptcp: fix tcp fallback crash")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-3-jiayuan.chen@linux.dev
Multiple threads may be creating and destroying BFD objects in
situations like `perf top`.
Without appropriate initialization crashes may occur during libbfd's
cache management.
BFD's locks require recursive mutexes, add support for these.
Committer testing:
This happens only when building with 'make BUILD_NONDISTRO=1' and having
the binutils-devel package (or equivalent) installed, i.e. linking with
binutils devel files, an opt-in perf build.
Before:
root@x1:~# perf top
perf: Segmentation fault
-------- backtrace --------
<SNIP multiple failed attempts at printing a backtrace>
root@x1:~#
After this patch it works as before.
Closes: https://lore.kernel.org/lkml/aQt66zhfxSA80xwt@gentoo.org/
Fixes: 95931d9a59 ("perf libbfd: Move libbfd functionality to its own file")
Reported-by: Guilherme Amadio <amadio@gentoo.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Couple of independent fixes:
1. Wire in SIGSEGV handler that terminates the test with a failure code.
2. Use "--lock-cgroup" instead of "-g"; "-g" was proposed but never
merged. See commit 4d1792d0a2 ("perf lock contention: Add
--lock-cgroup option")
3. Call cleanup() on every normal exit so trap_cleanup() doesn't mistake
it for an unexpected signal and emit a false-negative "Unexpected
signal in main" message.
Before patch:
# ./perf test -vv "lock contention"
85: kernel lock contention analysis test:
--- start ---
test child forked, pid 610711
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --lock-cgroup
Unexpected signal in test_aggr_cgroup
---- end(0) ----
85: kernel lock contention analysis test : Ok
After patch:
# ./perf test -vv "lock contention"
85: kernel lock contention analysis test:
--- start ---
test child forked, pid 602637
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --lock-cgroup
Testing perf lock contention --type-filter (w/ spinlock)
Testing perf lock contention --lock-filter (w/ tasklist_lock)
Testing perf lock contention --callstack-filter (w/ unix_stream)
[Skip] Could not find 'unix_stream'
Testing perf lock contention --callstack-filter with task aggregation
[Skip] Could not find 'unix_stream'
Testing perf lock contention --cgroup-filter
Testing perf lock contention CSV output
---- end(0) ----
85: kernel lock contention analysis test : Ok
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Tycho Andersen <tycho@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
9d7dfb95da ("KVM: VMX: Inject #UD if guest tries to execute SEAMCALL or TDCALL")
The 'perf kvm-stat' tool uses the exit reasons that are included in the
VMX_EXIT_REASONS define, this new SEAMCALL isn't included there (TDCALL
is), so shouldn't be causing any change in behaviour, this patch ends up
being just addressess the following perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Please see tools/include/uapi/README for further details.
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is one more remnant of the BUILD_NONDISTRO series to make building
with binutils-devel opt-in due to license incompatibility.
In this case just the references at link time were still in place, which
make building the test-all.bin file fail, which wasn't detected before
probably because the last test was done with binutils-devel available,
doh.
Now:
$ rpm -q binutils-devel
package binutils-devel is not installed
$ file /tmp/build/perf-tools/feature/test-all.bin
/tmp/build/perf-tools/feature/test-all.bin: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=4b5388a346b51f1b993f0b0dbd49f4570769b03c, for GNU/Linux 3.2.0, not stripped
$
Fixes: 970ae86307 ("perf build: The bfd features are opt-in, stop testing for them by default")
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With commit f0d0f978f3 ("perf header: Don't write empty BPF/BTF
info"), the write_bpf_( prog_info() | btf() ) functions exit without
writing anything if env->bpf_prog.(infos| btfs)_cnt is zero.
process_bpf_( prog_info() | btf() ), however, still expect a "count"
value to exist in the data file. If btf information is empty, for
example, process_bpf_btf will read garbage or some other data as the
number of btf nodes in the data file. As a result, the data file will
not be processed correctly.
Instead, write the count to the data file and exit if it is zero.
Fixes: f0d0f978f3 ("perf header: Don't write empty BPF/BTF info")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge fixes for issues related to the handling of compressed hibernation
images that were introduced during the 6.9 development cycle.
* pm-sleep:
PM: hibernate: Fix style issues in save_compressed_image()
PM: hibernate: Use atomic64_t for compressed_size variable
PM: hibernate: Emit an error when image writing fails
Pull slab fix from Vlastimil Babka:
- Fix memory leak of objects from remote NUMA node when bulk freeing to
a cache with sheaves (Harry Yoo)
* tag 'slab-for-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slub: fix memory leak in free_to_pcs_bulk()
Merge ACPI CPPC library fixes and an ACPI MRRM table parser fix for
6.18-rc6.
* acpi-cppc:
ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs
ACPI: CPPC: Perform fast check switch only for online CPUs
ACPI: CPPC: Check _CPC validity for only the online CPUs
ACPI: CPPC: Detect preferred core availability on online CPUs
* acpi-tables:
ACPI: MRRM: Fix memory leaks and improve error handling
Pull kselftest fix from Shuah Khan:
"Fixes event-filter-function.tc tracing test failure caused when a
first run to sample events triggers kmem_cache_free which interferes
with the rest of the test.
Fix this by calling sample_events twice to eliminate the
kmem_cache_free related noise from the sampling"
* tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/tracing: Run sample events to clear page cache events
The commit 989b09b739 ("slab: skip percpu sheaves for remote object
freeing") introduced the remote_objects array in free_to_pcs_bulk() to
skip sheaves when objects from a remote node are freed.
However, the array is flushed only when:
1) the array becomes full (++remote_nr >= PCS_BATCH_MAX), or
2) slab_free_hook() returns false and size becomes zero.
When neither of the conditions is met, objects in the array are leaked.
This resulted in a memory leak [1], where 82 GiB of memory was allocated
for the maple_node cache.
Flush the array after successfully freeing objects to sheaves
in the do_free: path.
In the meantime, move the snippet if (!size) goto flush_remote; outside
the while loop for readability. Let's say all objects in the array are
from a remote node: then we acquire s->cpu_sheaves->lock and try to free
an object even when size is zero. This doesn't appear to be harmful,
but isn't really readable.
Reported-by: Tytus Rogalewski <admin@simplepod.ai>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220765 [1]
Closes: https://lore.kernel.org/linux-mm/20251107094809.12e9d705b7bf4815783eb184@linux-foundation.org
Closes: https://lore.kernel.org/all/aRGDTwbt2EIz2CYn@hyeyoo
Fixes: 989b09b739 ("slab: skip percpu sheaves for remote object freeing")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251111125331.12246-1-harry.yoo@oracle.com
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Tytus Rogalewski <admin@simplepod.ai>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
For PREEMPT_RT=y kernels, the deferred_irq_workfn() is executed in
the per-cpu irq_work/* task context and not disable-irq, if the rq
returned by container_of() is current CPU's rq, the following scenarios
may occur:
lock(&rq->__lock);
<Interrupt>
lock(&rq->__lock);
This commit use IRQ_WORK_INIT_HARD() to replace init_irq_work() to
initialize rq->scx.deferred_irq_work, make the deferred_irq_workfn()
is always invoked in hard-irq context.
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
The sockmap feature allows bpf syscall from userspace, or based on bpf
sockops, replacing the sk_prot of sockets during protocol stack processing
with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
subflow_syn_recv_sock()
tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
bpf_skops_established <== sockops
bpf_sock_map_update(sk) <== call bpf helper
tcp_bpf_update_proto() <== update sk_prot
'''
Consider two scenarios:
1. When the server has MPTCP enabled and the client also requests MPTCP,
the sk passed to the BPF program is a subflow sk. Since subflows only
handle partial data, replacing their sk_prot is meaningless and will
cause traffic disruption.
2. When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_ulp_fallback()
subflow_drop_ctx()
mptcp_subflow_ops_undo_override()
'''
Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops()
converts the subflow to plain TCP.
For the first case, we should prevent it from being combined with sockmap
by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by
sockmap's own flow.
For the second case, since subflow_syn_recv_sock() has already restored
sk_prot to native tcp_prot/tcpv6_prot, no further action is needed.
Fixes: cec37a6e41 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-2-jiayuan.chen@linux.dev
The stub implementation of arch_xfer_to_guest_mode_handle_work() is
guarded by an #ifndef that incorrectly checks for the name
arch_xfer_to_guest_mode_work instead. It seems the function was renamed
to add "_handle" as a late change to the original patch, and the #ifndef
wasn't updated to go with it.
Change the #ifndef to match the name of the function. No users right now,
so no need to update any architecture code.
Fixes: 935ace2fb5 ("entry: Provide infrastructure for work before transitioning to guest mode")
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251105-entry-fix-ifndef-v1-1-d8d28045b627@linux.ibm.com
Felix Maurer says:
====================
hsr: Send correct HSRv0 supervision frames
Hangbin recently reported that the hsr selftests were failing and noted
that the entries in the node table were not merged, i.e., had
00:00:00:00:00:00 as MacAddressB forever [1].
This failure only occured with HSRv0 because it was not sending
supervision frames anymore. While debugging this I found that we were
not really following the HSRv0 standard for the supervision frames we
sent, so I additionally made a few changes to get closer to the standard
and restore a more correct behavior we had a while ago.
The selftests can still fail because they take a while and run into the
timeout. I did not include a change of the timeout because I have more
improvements to the selftests mostly ready that change the test duration
but are net-next material.
[1]: https://lore.kernel.org/netdev/aMONxDXkzBZZRfE5@fedora/
====================
Link: https://patch.msgid.link/cover.1762876095.git.fmaurer@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For HSRv0, the path_id has the following meaning:
- 0000: PRP supervision frame
- 0001-1001: HSR ring identifier
- 1010-1011: Frames from PRP network (A/B, with RedBoxes)
- 1111: HSR supervision frame
Follow the IEC 62439-3:2010 standard more closely by setting the right
path_id for HSRv0 supervision frames (actually, it is correctly set when
the frame is constructed, but hsr_set_path_id() overwrites it) and set a
fixed HSR ring identifier of 1. The ring identifier seems to be generally
unused and we ignore it anyways on reception, but some fixed identifier is
definitely better than using one identifier in one direction and a wrong
identifier in the other.
This was also the behavior before commit f266a683a4 ("net/hsr: Better
frame dispatch") which introduced the alternating path_id. This was later
moved to hsr_set_path_id() in commit 451d8123f8 ("net: prp: add packet
handling support").
The IEC 62439-3:2010 also contains 6 unused bytes after the MacAddressA in
the HSRv0 supervision frames. Adjust a TODO comment accordingly.
Fixes: f266a683a4 ("net/hsr: Better frame dispatch")
Fixes: 451d8123f8 ("net: prp: add packet handling support")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/ea0d5133cd593856b2fa673d6e2067bf1d4d1794.1762876095.git.fmaurer@redhat.com
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The MODULE_PARM_DESC string for the "active" parameter is missing a
space and has an extraneous trailing ']' character. Correct these.
Before patch:
$ modinfo -p ./drm_client_lib.ko
active:Choose which drm client to start, default isfbdev] (string)
After patch:
$ modinfo -p ./drm_client_lib.ko
active:Choose which drm client to start, default is fbdev (string)
Fixes: f7b42442c4 ("drm/log: Introduce a new boot logger to draw the kmsg on the screen")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251112010920.2355712-1-rdunlap@infradead.org
Pull erofs fixes from Gao Xiang:
- Add Chunhai Guo as a EROFS reviewer to get more eyes from interested
industry vendors
- Fix infinite loop caused by incomplete crafted zstd-compressed data
(thanks to Robert again!)
* tag 'erofs-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: avoid infinite loop due to incomplete zstd-compressed data
MAINTAINERS: erofs: add myself as reviewer
Pull smb server fixes from Steve French:
- Fix smbdirect (RDMA) disconnect hang bug
- Fix potential Denial of Service when connection limit exceeded
- Fix smbdirect (RDMA) connection (potentially accessing freed memory)
bug
* tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd:
smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into DISCONNECTED
ksmbd: close accepted socket when per-IP limit rejects connection
smb: server: rdma: avoid unmapping posted recv on accept failure
The purpose of commit 703eec1b24 ("virtio_net: fixing XDP for fully
checksummed packets handling") is to record the flags in advance, as
their value may be overwritten in the XDP case. However, the flags
recorded under big mode are incorrect, because in big mode, the passed
buf does not point to the rx buffer, but rather to the page of the
submitted buffer. This commit fixes this issue.
For the small mode, the commit c11a49d58a ("virtio_net: Fix mismatched
buf address when unmapping for small packets") fixed it.
Tested-by: Alyssa Ross <hi@alyssa.is>
Fixes: 703eec1b24 ("virtio_net: fixing XDP for fully checksummed packets handling")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20251111090828.23186-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull nfsd fixes from Chuck Lever:
"Address recently reported issues or issues found at the recent NFS
bake-a-thon held in Raleigh, NC.
Issues reported with v6.18-rc:
- Address a kernel build issue
- Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED
Issues that need expedient stable backports:
- Close a refcount leak exposure
- Report support for NFSv4.2 CLONE correctly
- Fix oops during COPY_NOTIFY processing
- Prevent rare crash after XDR encoding failure
- Prevent crash due to confused or malicious NFSv4.1 client"
* tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it"
nfsd: ensure SEQUENCE replay sends a valid reply.
NFSD: Never cache a COMPOUND when the SEQUENCE operation fails
NFSD: Skip close replay processing if XDR encoding fails
NFSD: free copynotify stateid in nfs4_free_ol_stateid()
nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes
nfsd: fix refcount leak in nfsd_set_fh_dentry()
Pull dma-mapping fixes from Marek Szyprowski:
- two minor fixes for DMA API infrastructure: restoring proper
structure padding used in benchmark tests (Qinxin Xia) and global
DMA_BIT_MASK macro rework to make it a bit more clang friendly (James
Clark)
* tag 'dma-mapping-6.18-2025-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: Allow use of DMA_BIT_MASK(64) in global scope
dma-mapping: benchmark: Restore padding to ensure uABI remained consistent
Pull LoongArch fixes from Huacai Chen:
- Fix a Rust build error
- Fix exception/interrupt, memory management, perf event, hardware
breakpoint, kexec and KVM bugs
* tag 'loongarch-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Fix max supported vCPUs set with EIOINTC
LoongArch: KVM: Skip PMU checking on vCPU context switch
LoongArch: KVM: Restore guest PMU if it is enabled
LoongArch: KVM: Add delay until timer interrupt injected
LoongArch: KVM: Set page with write attribute if dirty track disabled
LoongArch: kexec: Print out debugging message if required
LoongArch: kexec: Initialize the kexec_buf structure
LoongArch: Use correct accessor to read FWPC/MWPC
LoongArch: Refine the init_hw_perf_events() function
LoongArch: Remove __GFP_HIGHMEM masking in pud_alloc_one()
LoongArch: Let {pte,pmd}_modify() record the status of _PAGE_DIRTY
LoongArch: Consolidate max_pfn & max_low_pfn calculation
LoongArch: Consolidate early_ioremap()/ioremap_prot()
LoongArch: Use physical addresses for CSR_MERRENTRY/CSR_TLBRENTRY
LoongArch: Clarify 3 MSG interrupt features
rust: Add -fno-isolate-erroneous-paths-dereference to bindgen_skip_c_flags
Pull alpha fix from Matt Turner:
"Add Magnus as a maintainer of the alpha port"
* tag 'alpha-fixes-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
MAINTAINERS: Add Magnus Lindholm as maintainer for alpha port
Christian reported that f3ac2ff148 ("PCI/ASPM: Enable all ClockPM and
ASPM states for devicetree platforms") broke booting on the A-EON AmigaOne
X1000.
Override the L0s and L1 Support advertised in Link Capabilities by the
X1000 Root Ports ([1959:a002]) so we don't try to enable those states.
Fixes: f3ac2ff148 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
Fixes: df5192d9bb ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Link: https://lore.kernel.org/r/a41d2ca1-fcd9-c416-b111-a958e92e94bf@xenosoft.de
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Existing quirks to disable ASPM L0s and L1 use pci_disable_link_state(),
which disables ASPM states and prevents their use in the future. But since
they are FINAL quirks, they happen after ASPM has already been enabled.
Here's a typical call path:
pci_host_probe
pci_scan_root_bus_bridge
pci_scan_child_bus
pci_scan_slot
pci_scan_single_device
pci_device_add
pci_fixup_device(pci_fixup_header) # HEADER quirks
pcie_aspm_init_link_state
pcie_config_aspm_path
pcie_config_aspm_link
pcie_config_aspm_dev # ASPM may be enabled
pci_bus_add_devices
pci_bus_add_devices
pci_fixup_device(pci_fixup_final) # FINAL quirks
quirk_disable_aspm_l0s
pci_disable_link_state(dev, PCIE_LINK_STATE_L0S)
Sometimes enabling ASPM can make the link non-functional, so if we know
ASPM is broken on a device, we shouldn't enable it at all, even
temporarily.
Convert the existing quirks to use pcie_aspm_remove_cap() instead, which
overrides the ASPM Support advertised in PCIe Link Capabilities, and make
them HEADER quirks so they run before pcie_aspm_init_link_state() has a
chance to enable ASPM.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20251110222929.2140564-4-helgaas@kernel.org
Defective devices sometimes advertise support for ASPM L0s or L1 states
even if they don't work correctly.
Cache the L0s Supported and L1 Supported bits early in enumeration so
HEADER quirks can override the ASPM states advertised in Link Capabilities
before pcie_aspm_cap_init() enables ASPM.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20251110222929.2140564-2-helgaas@kernel.org
rsnd_ssiu_probe() leaks an OF node reference obtained by
rsnd_ssiu_of_node(). The node reference is acquired but
never released across all return paths.
Fix it by declaring the device node with the __free(device_node)
cleanup construct to ensure automatic release when the variable goes
out of scope.
Fixes: 4e7788fb80 ("ASoC: rsnd: add SSIU BUSIF support")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/20251112065709.1522-1-vulab@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
The following lockdep splat was observed while kernel auto-online a CXL
memory region:
======================================================
WARNING: possible circular locking dependency detected
6.17.0djtest+ #53 Tainted: G W
------------------------------------------------------
systemd-udevd/3334 is trying to acquire lock:
ffffffff90346188 (hmem_resource_lock){+.+.}-{4:4}, at: hmem_register_resource+0x31/0x50
but task is already holding lock:
ffffffff90338890 ((node_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x2e/0x70
which lock already depends on the new lock.
[..]
Chain exists of:
hmem_resource_lock --> mem_hotplug_lock --> (node_chain).rwsem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock((node_chain).rwsem);
lock(mem_hotplug_lock);
lock((node_chain).rwsem);
lock(hmem_resource_lock);
The lock ordering can cause potential deadlock. There are instances
where hmem_resource_lock is taken after (node_chain).rwsem, and vice
versa.
Split out the target update section of hmat_register_target() so that
hmat_callback() only envokes that section instead of attempt to register
hmem devices that it does not need to.
[ dj: Fix up comment to be closer to 80cols. (Jonathan) ]
Fixes: cf8741ac57 ("ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device")
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20251105235115.85062-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Building gpd-fan driver without CONFIG_ACPI results in the following
build errors:
drivers/hwmon/gpd-fan.c: In function ‘gpd_ecram_read’:
drivers/hwmon/gpd-fan.c:228:9: error: implicit declaration of function ‘outb’ [-Werror=implicit-function-declaration]
228 | outb(0x2E, addr_port);
| ^~~~
drivers/hwmon/gpd-fan.c:241:16: error: implicit declaration of function ‘inb’ [-Werror=implicit-function-declaration]
241 | *val = inb(data_port);
The definitions for inb() and outb() come from <linux/io.h>
(specifically through <asm/io.h>), which is implicitly included via
<acpi_io.h>. When CONFIG_ACPI is not set, <acpi_io.h> is not included
resulting in <linux/io.h> to be omitted as well.
Since the driver does not depend on ACPI, remove <linux/acpi.h> and add
<linux/io.h> directly to fix the compilation errors.
Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Link: https://lore.kernel.org/r/20251024202042.752160-1-krishnagopi487@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Update scx_task_locks so that it's safe to lock/unlock in a
non-sleepable context in PREEMPT_RT kernels. scx_task_locks is
(non-raw) spinlock used to protect the list of tasks under SCX.
This list is updated during from finish_task_switch(), which
cannot sleep. Regular spinlocks can be locked in such a context
in non-RT kernels, but are sleepable under when CONFIG_PREEMPT_RT=y.
Convert scx_task_locks into a raw spinlock, which is not sleepable
even on RT kernels.
Sample backtrace:
<TASK>
dump_stack_lvl+0x83/0xa0
__might_resched+0x14a/0x200
rt_spin_lock+0x61/0x1c0
? sched_ext_dead+0x2d/0xf0
? lock_release+0xc6/0x280
sched_ext_dead+0x2d/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
finish_task_switch.isra.0+0x254/0x360
__schedule+0x584/0x11d0
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? tick_nohz_idle_exit+0x7e/0x120
schedule_idle+0x23/0x40
cpu_startup_entry+0x29/0x30
start_secondary+0xf8/0x100
common_startup_64+0x13e/0x148
</TASK>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Johannes Berg says:
====================
Couple more fixes:
- mwl8k: work around FW expecting a DSSS element in beacons
- ath11k: report correct TX status
- iwlwifi: avoid toggling links due to wrong element use
- iwlwifi: fix beacon template rate on older devices
- iwlwifi: fix loop iterator being used after loop
- mac80211: disallow address changes while using the address
- mac80211: avoid bad rate warning in monitor/sniffer mode
- hwsim: fix potential NULL deref (on monitor injection)
* tag 'wireless-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mld: always take beacon ies in link grading
wifi: iwlwifi: mvm: fix beacon template/fixed rate
wifi: iwlwifi: fix aux ROC time event iterator usage
wifi: mwl8k: inject DSSS Parameter Set element into beacons if missing
wifi: mac80211_hwsim: Fix possible NULL dereference
wifi: mac80211: skip rate verification for not captured PSDUs
wifi: mac80211: reject address change while connecting
wifi: ath11k: zero init info->status in wmi_process_mgmt_tx_comp()
====================
Link: https://patch.msgid.link/20251112114621.15716-5-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit ac4e04d9e3 ("cpufreq: intel_pstate: Unchecked MSR aceess in
legacy mode") introduced a check for feature X86_FEATURE_IDA to verify
turbo mode support. Although this is the correct way to check for turbo
mode support, it causes issues on some platforms that disable turbo
during OS boot, but enable it later [1]. Before adding this feature
check, users were able to get turbo mode frequencies by writing 0 to
/sys/devices/system/cpu/intel_pstate/no_turbo post-boot.
To restore the old behavior on the affected systems while still
addressing the unchecked MSR issue on some Skylake-X systems, check
X86_FEATURE_IDA only immediately before updates of MSR_IA32_PERF_CTL
that may involve setting the Turbo Engage Bit (bit 32).
Fixes: ac4e04d9e3 ("cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode")
Reported-by: Aaron Rainbolt <arainbolt@kfocus.org>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2122531 [1]
Tested-by: Aaron Rainbolt <arainbolt@kfocus.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20251111010840.141490-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
the number of bvecs in the request. However, bvecs may be split into
multiple segments depending on the queue limits. Thus, the number of
segments may overestimate the number of bvecs. For ublk devices, the
only current users of io_buffer_register_bvec(), virt_boundary_mask,
seg_boundary_mask, max_segments, and max_segment_size can all be set
arbitrarily by the ublk server process.
Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
loop actually yields. However, continue using blk_rq_nr_phys_segments()
as an upper bound on the number of bvecs when allocating imu to avoid
needing to iterate the bvecs a second time.
Link: https://lore.kernel.org/io-uring/20251111191530.1268875-1-csander@purestorage.com/
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 27cb27b6d5 ("io_uring: add support for kernel registered bvecs")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr
as part of DMA mapping operations. However, not all IOMMUs support the
same virtual address width as the processor. For instance, older Intel
consumer platforms only support 39-bits of IOMMU address space. On such
platforms, using the virtual address as the IOVA fails.
Make the tests more robust by using iova_allocator to vend IOVAs, which
queries legally accessible IOVAs from the underlying IOMMUFD or VFIO
container.
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-4-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
VFIO selftests need to map IOVAs from legally accessible ranges, which
could vary between hardware. Tests in vfio_dma_mapping_test.c are making
excessively strong assumptions about which IOVAs can be mapped.
Add vfio_iommu_iova_ranges(), which queries IOVA ranges from the
IOMMUFD or VFIO container associated with the device. The queried ranges
are normalized to IOMMUFD's iommu_iova_range representation so that
handling of IOVA ranges up the stack can be implementation-agnostic.
iommu_iova_range and vfio_iova_range are equivalent, so bias to using the
new interface's struct.
Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES.
Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE.
The underlying vfio_iommu_type1_info buffer-related functionality has
been kept generic so the same helpers can be used to query other
capability chain information, if needed.
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-1-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
The sit driver's packet transmission path calls: sit_tunnel_xmit() ->
update_or_create_fnhe(), which lead to fnhe_remove_oldest() being called
to delete entries exceeding FNHE_RECLAIM_DEPTH+random.
The race window is between fnhe_remove_oldest() selecting fnheX for
deletion and the subsequent kfree_rcu(). During this time, the
concurrent path's __mkroute_output() -> find_exception() can fetch the
soon-to-be-deleted fnheX, and rt_bind_exception() then binds it with a
new dst using a dst_hold(). When the original fnheX is freed via RCU,
the dst reference remains permanently leaked.
CPU 0 CPU 1
__mkroute_output()
find_exception() [fnheX]
update_or_create_fnhe()
fnhe_remove_oldest() [fnheX]
rt_bind_exception() [bind dst]
RCU callback [fnheX freed, dst leak]
This issue manifests as a device reference count leak and a warning in
dmesg when unregistering the net device:
unregister_netdevice: waiting for sitX to become free. Usage count = N
Ido Schimmel provided the simple test validation method [1].
The fix clears 'oldest->fnhe_daddr' before calling fnhe_flush_routes().
Since rt_bind_exception() checks this field, setting it to zero prevents
the stale fnhe from being reused and bound to a new dst just before it
is freed.
[1]
ip netns add ns1
ip -n ns1 link set dev lo up
ip -n ns1 address add 192.0.2.1/32 dev lo
ip -n ns1 link add name dummy1 up type dummy
ip -n ns1 route add 192.0.2.2/32 dev dummy1
ip -n ns1 link add name gretap1 up arp off type gretap \
local 192.0.2.1 remote 192.0.2.2
ip -n ns1 route add 198.51.0.0/16 dev gretap1
taskset -c 0 ip netns exec ns1 mausezahn gretap1 \
-A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q &
taskset -c 2 ip netns exec ns1 mausezahn gretap1 \
-A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q &
sleep 10
ip netns pids ns1 | xargs kill
ip netns del ns1
Cc: stable@vger.kernel.org
Fixes: 67d6d681e1 ("ipv4: make exception cache less predictible")
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251111064328.24440-1-nashuiliang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace disable_irq() with disable_irq_nosync() in msm_pinmux_set_mux()
to prevent deadlock when wakeup IRQ is triggered on the same
GPIO being reconfigured.
The issue occurs when a wakeup IRQ is triggered on a GPIO and the IRQ
handler attempts to reconfigure the same GPIO's pinmux. In this scenario,
msm_pinmux_set_mux() calls disable_irq() which waits for the currently
running IRQ handler to complete, creating a circular dependency that
results in deadlock.
Using disable_irq_nosync() avoids waiting for the IRQ handler to
complete, preventing the deadlock condition while still properly
disabling the interrupt during pinmux reconfiguration.
Suggested-by: Prasad Sodagudi <prasad.sodagudi@oss.qualcomm.com>
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
On VHE, the Fine Grain Traps registers are written to hardware in
kvm_arch_vcpu_load()->..->__activate_traps_hfgxtr(), but the fgt array is
computed later, in kvm_vcpu_load_fgt(). This can lead to zero being written
to the FGT registers the first time a VCPU is loaded. Also, any changes to
the fgt array will be visible only after the VCPU is scheduled out, and
then back in, which is not the intended behaviour.
Fix it by computing the fgt array just before the fgt traps are written
to hardware.
Fixes: fb10ddf35c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251112102853.47759-1-alexandru.elisei@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
In systemd we're trying to switch the internal credentials setup logic
to new mount API [1], and I noticed fsconfig(FSCONFIG_CMD_RECONFIGURE)
consistently fails on tmpfs with noswap option. This can be trivially
reproduced with the following:
```
int fs_fd = fsopen("tmpfs", 0);
fsconfig(fs_fd, FSCONFIG_SET_FLAG, "noswap", NULL, 0);
fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
fsmount(fs_fd, 0, 0);
fsconfig(fs_fd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0); <------ EINVAL
```
After some digging the culprit is shmem_reconfigure() rejecting
!(ctx->seen & SHMEM_SEEN_NOSWAP) && sbinfo->noswap, which is bogus
as ctx->seen serves as a mask for whether certain options are touched
at all. On top of that, noswap option doesn't use fsparam_flag_no,
hence it's not really possible to "reenable" swap to begin with.
Drop the check and redundant SHMEM_SEEN_NOSWAP flag.
[1] https://github.com/systemd/systemd/pull/39637
Fixes: 2c6efe9cf2 ("shmem: add support to ignore swap")
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Link: https://patch.msgid.link/20251108190930.440685-1-me@yhndnzj.com
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
grab_requested_mnt_ns was changed to return error codes on failure, but
its callers were not updated to check for error pointers, still checking
only for a NULL return value.
This commit updates the callers to use IS_ERR() or IS_ERR_OR_NULL() and
PTR_ERR() to correctly check for and propagate errors.
This also makes sure that the logic actually works and mount namespace
file descriptors can be used to refere to mounts.
Christian Brauner <brauner@kernel.org> says:
Rework the patch to be more ergonomic and in line with our overall error
handling patterns.
Fixes: 7b9d14af87 ("fs: allow mount namespace fd")
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrei Vagin <avagin@google.com>
Link: https://patch.msgid.link/20251111062815.2546189-1-avagin@google.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
One of the factors of a link's grade is the channel load, which is
calculated from the AP's bss load element.
The current code takes this element from the beacon for an active link,
and from bss->ies for an inactive link.
bss->ies is set to either the beacon's ies or to the probe response
ones, with preference to the probe response (meaning that if there was
even one probe response, the ies of it will be stored in bss->ies and
won't be overiden by the beacon ies).
The probe response can be very old, i.e. from the connection time,
where a beacon is updated before each link selection (which is
triggered only after a passive scan).
In such case, the bss load element in the probe response will not
include the channel load caused by the STA, where the beacon will.
This will cause the inactive link to always have a lower channel
load, and therefore an higher grade than the active link's one.
This causes repeated link switches, causing the throughput to drop.
Fix this by always taking the ies from the beacon, as those are for
sure new.
Fixes: d1e879ec60 ("wifi: iwlwifi: add iwlmld sub-driver")
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251110145652.b493dbb1853a.I058ba7309c84159f640cc9682d1bda56dd56a536@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
The list_for_each_entry() iterator must not be used outside the loop.
Even though we break and check for NULL, doing so still violates kernel
iteration rules and triggers Coccinelle's use_after_iter.cocci warning.
Cache the matched entry in aux_roc_te and use it consistently after the
loop. This follows iterator best practices, resolves the warning, and
makes the code more maintainable.
Signed-off-by: Junjie Cao <junjie.cao@intel.com>
Link: https://patch.msgid.link/20251016014919.383565-1-junjie.cao@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Fix the following corner case:-
Consider a 2M huge page SVM allocation, followed by prefetch call for
the first 4K page. The whole range is initially mapped with single PTE.
After the prefetch, this range gets split to first page + rest of the
pages. Currently, the first page mapping is not updated on MI300A (APU)
since page hasn't migrated. However, after range split PTE mapping it not
valid.
Fix this by forcing page table update for the whole range when prefetch
is called. Calling prefetch on APU doesn't improve performance. If all
it deteriotes. However, functionality has to be supported.
v2: Use apu_prefer_gtt as this issue doesn't apply to APUs with carveout
VRAM
v3: Simplify by setting the flag for all ASICs as it doesn't affect dGPU
v4: Remove v2 and v3 changes. Force update_mapping when range is split
at a size that is not aligned to prange granularity
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 076470b9f6)
Over allocation of save area is not fatal, only under allocation is.
ROCm has various components that independently claim authority over save
area size.
Unless KFD decides to claim single authority, relax size checks.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 15bd4958fe)
Cc: stable@vger.kernel.org
When the BO pointer provided to amdgpu_bo_create_kernel() points to
non-NULL, amdgpu_bo_create_kernel() takes it as a hint to pin that address
rather than allocate a new BO.
This functionality is never desired for allocating ISP buffers. A new BO
should always be created when isp_kernel_buffer_alloc() is called, per the
description for isp_kernel_buffer_alloc().
Ensure this by zeroing *bo right before the amdgpu_bo_create_kernel() call.
Fixes: 55d42f6169 ("drm/amd/amdgpu: Add helper functions for isp buffers")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 73c8c29baa)
[Why]
When changing resolution (e.g., 4K → FHD) in mirror/clone mode with
certain monitors, the monitor blanks and loses connection due to an early
exit in vrr_settings_require_update(). The function only checks if VRR
state, fixed refresh target, or min/max refresh rate range has changed.
During mode changes, if the calculated min/max refresh values remain the
same even though the stream's v_total changed, the function returns early
without updating vrr_params.adjust.v_total_min/max, leaving the monitor's
VRR timing parameters unsynced with the new mode, causing it to blank out.
[How]
Explicitly adjust VRR parameters to the stream's nominal v_total when VRR
is supported, but inactive.
Fixes: 6d31602a9f ("drm/amd/display: more liberal vmin/vmax update for freesync")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 607df8248a)
Fix a potential deadlock caused by inconsistent spinlock usage
between interrupt and process contexts in the userq fence driver.
The issue occurs when amdgpu_userq_fence_driver_process() is called
from both:
- Interrupt context: gfx_v11_0_eop_irq() -> amdgpu_userq_fence_driver_process()
- Process context: amdgpu_eviction_fence_suspend_worker() ->
amdgpu_userq_fence_driver_force_completion() -> amdgpu_userq_fence_driver_process()
In interrupt context, the spinlock was acquired without disabling
interrupts, leaving it in {IN-HARDIRQ-W} state. When the same lock
is acquired in process context, the kernel detects inconsistent
locking since the process context acquisition would enable interrupts
while holding a lock previously acquired in interrupt context.
Kernel log shows:
[ 4039.310790] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 4039.310804] kworker/7:2/409 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 4039.310818] ffff9284e1bed000 (&fence_drv->fence_list_lock){?...}-{3:3},
[ 4039.310993] {IN-HARDIRQ-W} state was registered at:
[ 4039.311004] lock_acquire+0xc6/0x300
[ 4039.311018] _raw_spin_lock+0x39/0x80
[ 4039.311031] amdgpu_userq_fence_driver_process.part.0+0x30/0x180 [amdgpu]
[ 4039.311146] amdgpu_userq_fence_driver_process+0x17/0x30 [amdgpu]
[ 4039.311257] gfx_v11_0_eop_irq+0x132/0x170 [amdgpu]
Fix by using spin_lock_irqsave()/spin_unlock_irqrestore() to properly
manage interrupt state regardless of calling context.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ded3ad780c)
Cc: stable@vger.kernel.org
drm_sched_entity_init wasn't called yet, so the only thing to
do is to release allocated memory.
This doesn't fix any bug since entity is zero allocated and
drm_sched_entity_fini does nothing in this case.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ec49374ccb)
Certain multi-GPU configurations (especially GFX12) may hit
data corruption when a DCC-compressed VRAM surface is shared across GPUs
using peer-to-peer (P2P) DMA transfers.
Such surfaces rely on device-local metadata and cannot be safely accessed
through a remote GPU’s page tables. Attempting to import a DCC-enabled
surface through P2P leads to incorrect rendering or GPU faults.
This change disables P2P for DCC-enabled VRAM buffers that are contiguous
and allocated on GFX12+ hardware. In these cases, the importer falls back
to the standard system-memory path, avoiding invalid access to compressed
surfaces.
Future work could consider optional migration (VRAM→System→VRAM) if a
performance regression is observed when `attach->peer2peer = false`.
Tested on:
- Dual RX 9700 XT (Navi4x) setup
- GNOME and Wayland compositor scenarios
- Confirmed no corruption after disabling P2P under these conditions
v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS.
v3: simplify for upsteam and fix ip version check (Alex)
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9dff2bb709)
Cc: stable@vger.kernel.org
After commit 100dfa74ca ("inet: dev_queue_xmit() llist adoption")
I started seeing many qdisc requeues on IDPF under high TX workload.
$ tc -s qd sh dev eth1 handle 1: ; sleep 1; tc -s qd sh dev eth1 handle 1:
qdisc mq 1: root
Sent 43534617319319 bytes 268186451819 pkt (dropped 0, overlimits 0 requeues 3532840114)
backlog 1056Kb 6675p requeues 3532840114
qdisc mq 1: root
Sent 43554665866695 bytes 268309964788 pkt (dropped 0, overlimits 0 requeues 3537737653)
backlog 781164b 4822p requeues 3537737653
This is caused by try_bulk_dequeue_skb() being only limited by BQL budget.
perf record -C120-239 -e qdisc:qdisc_dequeue sleep 1 ; perf script
...
netperf 75332 [146] 2711.138269: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1292 skbaddr=0xff378005a1e9f200
netperf 75332 [146] 2711.138953: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1213 skbaddr=0xff378004d607a500
netperf 75330 [144] 2711.139631: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1233 skbaddr=0xff3780046be20100
netperf 75333 [147] 2711.140356: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1093 skbaddr=0xff37800514845b00
netperf 75337 [151] 2711.141037: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1353 skbaddr=0xff37800460753300
netperf 75337 [151] 2711.141877: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1367 skbaddr=0xff378004e72c7b00
netperf 75330 [144] 2711.142643: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1202 skbaddr=0xff3780045bd60000
...
This is bad because :
1) Large batches hold one victim cpu for a very long time.
2) Driver often hit their own TX ring limit (all slots are used).
3) We call dev_requeue_skb()
4) Requeues are using a FIFO (q->gso_skb), breaking qdisc ability to
implement FQ or priority scheduling.
5) dequeue_skb() gets packets from q->gso_skb one skb at a time
with no xmit_more support. This is causing many spinlock games
between the qdisc and the device driver.
Requeues were supposed to be very rare, lets keep them this way.
Limit batch sizes to /proc/sys/net/core/dev_weight (default 64) as
__qdisc_run() was designed to use.
Fixes: 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20251109161215.2574081-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
selftests: mptcp: join: fix some flaky tests
When looking at the recent CI results on NIPA and MPTCP CIs, a few MPTCP
Join tests are marked as unstable. Here are some fixes for that.
- Patch 1: a small fix for mptcp_connect.sh, printing a note as
initially intended. For >=v5.13.
- Patch 2: avoid unexpected reset when closing subflows. For >= 5.13.
- Patches 3-4: longer transfer when not waiting for the end. For >=5.18.
- Patch 5: read all received data when expecting a reset. For >= v6.1.
- Patch 6: a fix to properly kill background tasks. For >= v6.5.
====================
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-0-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 'run_tests' function is executed in the background, but killing its
associated PID would not kill the children tasks running in the
background.
To properly kill all background tasks, 'kill -- -PID' could be used, but
this requires kill from procps-ng. Instead, all children tasks are
listed using 'ps', and 'kill' is called with all PIDs of this group.
Fixes: 31ee4ad86a ("selftests: mptcp: join: stop transfer when check is done (part 1)")
Cc: stable@vger.kernel.org
Fixes: 04b57c9e09 ("selftests: mptcp: join: stop transfer when check is done (part 2)")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-6-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MPTCP Join "fastclose server" selftest is sometimes failing because the
client output file doesn't have the expected size, e.g. 296B instead of
1024B.
When looking at a packet trace when this happens, the server sent the
expected 1024B in two parts -- 100B, then 924B -- then the MP_FASTCLOSE.
It is then strange to see the client only receiving 296B, which would
mean it only got a part of the second packet. The problem is then not on
the networking side, but rather on the data reception side.
When mptcp_connect is launched with '-f -1', it means the connection
might stop before having sent everything, because a reset has been
received. When this happens, the program was directly stopped. But it is
also possible there are still some data to read, simply because the
previous 'read' step was done with a buffer smaller than the pending
data, see do_rnd_read(). In this case, it is important to read what's
left in the kernel buffers before stopping without error like before.
SIGPIPE is now ignored, not to quit the app before having read
everything.
Fixes: 6bf41020b7 ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-5-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.
Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to make the
connection longer. This connection will be killed at the end, after the
verifications, so making it longer doesn't change anything, apart from
avoid it to end before the end of the verifications
To play it safe, all userspace tests not waiting for the end of the
transfer are now sharing a longer file (128KB) at slow speed.
Fixes: 4369c198e5 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org
Fixes: b2e2248f36 ("selftests: mptcp: userspace pm create id 0 subflow")
Fixes: e3b47e460b ("selftests: mptcp: userspace pm remove initial subflow")
Fixes: b9fb176081 ("selftests: mptcp: userspace pm send RM_ADDR for ID 0")
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-4-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.
Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to make the
connection longer. This connection will be killed at the end, after the
verifications, so making it longer doesn't change anything, apart from
avoid it to end before the end of the verifications
To play it safe, all endpoints tests not waiting for the end of the
transfer are now sharing a longer file (128KB) at slow speed.
Fixes: 69c6ce7b6e ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Fixes: e274f71540 ("selftests: mptcp: add subflow limits test-cases")
Fixes: b5e2fb832f ("selftests: mptcp: add explicit test case for remove/readd")
Fixes: e06959e9ee ("selftests: mptcp: join: test for flush/re-add endpoints")
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-3-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some of these 'remove' tests rarely fail because a subflow has been
reset instead of cleanly removed. This can happen when one extra subflow
which has never carried data is being closed (FIN) on one side, while
the other is sending data for the first time.
To avoid such subflows to be used right at the end, the backup flag has
been added. With that, data will be only carried on the initial subflow.
Fixes: d2c4333a80 ("selftests: mptcp: add testcases for removing addrs")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-2-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "fallback due to TCP OoO" was never printed because the stat_ooo_now
variable was checked twice: once in the parent if-statement, and one in
the child one. The second condition was then always true then, and the
'else' branch was never taken.
The idea is that when there are more ACK + MP_CAPABLE than expected, the
test either fails if there was no out of order packets, or a notice is
printed.
Fixes: 69ca3d29a7 ("mptcp: update selftest for fallback due to OoO")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-1-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_conn: Fix not cleaning up PA_LINK connections
- hci_event: Fix not handling PA Sync Lost event
- MGMT: cancel mesh send timer when hdev removed
- 6lowpan: reset link-local header on ipv6 recv path
- 6lowpan: fix BDADDR_LE vs ADDR_LE_DEV address type confusion
- L2CAP: export l2cap_chan_hold for modules
- 6lowpan: Don't hold spin lock over sleeping functions
- 6lowpan: add missing l2cap_chan_lock()
- btusb: reorder cleanup in btusb_disconnect to avoid UAF
- btrtl: Avoid loading the config file on security chips
* tag 'for-net-2025-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btrtl: Avoid loading the config file on security chips
Bluetooth: hci_event: Fix not handling PA Sync Lost event
Bluetooth: hci_conn: Fix not cleaning up PA_LINK connections
Bluetooth: 6lowpan: add missing l2cap_chan_lock()
Bluetooth: 6lowpan: Don't hold spin lock over sleeping functions
Bluetooth: L2CAP: export l2cap_chan_hold for modules
Bluetooth: 6lowpan: fix BDADDR_LE vs ADDR_LE_DEV address type confusion
Bluetooth: 6lowpan: reset link-local header on ipv6 recv path
Bluetooth: btusb: reorder cleanup in btusb_disconnect to avoid UAF
Bluetooth: MGMT: cancel mesh send timer when hdev removed
====================
Link: https://patch.msgid.link/20251111141357.1983153-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Building documentation produced the following warning:
WARNING: ./include/linux/ethtool.h:495 This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* IEEE 802.3ck/df defines 16 bins for FEC histogram plus one more for
This comment was not intended to be parsed as kernel-doc, so replace
the '/**' with '/*' to silence the warning and align with normal
comment style in header files.
No functional changes.
Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Link: https://patch.msgid.link/20251110182545.2112596-1-kriish.sharma2006@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Starting with Rust 1.91.0 (released 2025-10-30), in upstream commit
ab91a63d403b ("Ignore intrinsic calls in cross-crate-inlining cost model")
[1][2], `bindings.o` stops containing DWARF debug information because the
`Default` implementations contained `write_bytes()` calls which are now
ignored in that cost model (note that `CLIPPY=1` does not reproduce it).
This means `gendwarfksyms` complains:
RUSTC L rust/bindings.o
error: gendwarfksyms: process_module: dwarf_get_units failed: no debugging information?
There are several alternatives that would work here: conditionally
skipping in the cases needed (but that is subtle and brittle), forcing
DWARF generation with e.g. a dummy `static` (ugly and we may need to
do it in several crates), skipping the call to the tool in the Kbuild
command when there are no exports (fine) or teaching the tool to do so
itself (simple and clean).
Thus do the last one: don't attempt to process files if we have no symbol
versions to calculate.
[ I used the commit log of my patch linked below since it explained the
root issue and expanded it a bit more to summarize the alternatives.
- Miguel ]
Cc: stable@vger.kernel.org # Needed in 6.17.y.
Reported-by: Haiyue Wang <haiyuewa@163.com>
Closes: https://lore.kernel.org/rust-for-linux/b8c1c73d-bf8b-4bf2-beb1-84ffdcd60547@163.com/
Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/rust-for-linux/CANiq72nKC5r24VHAp9oUPR1HVPqT+=0ab9N0w6GqTF-kJOeiSw@mail.gmail.com/
Link: ab91a63d40 [1]
Link: https://github.com/rust-lang/rust/pull/145910 [2]
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Haiyue Wang <haiyuewa@163.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251110131913.1789896-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Pull arm64 fixes from Will Deacon:
"There's more here than I would ideally like at this stage, but there's
been a steady trickle of fixes and some of them took a few rounds of
review.
The bulk of the changes are fixing some fallout from the recent BBM
level two support which allows the linear map to be split from block
to page mappings at runtime, but inadvertently led to sleeping in
atomic context on some paths where the linear map was already mapped
with page granularity. The fix is simply to avoid splitting in those
cases but the implementation of that is a little involved.
The other interesting fix is addressing a catastophic performance
issue with our per-cpu atomics discovered by Paul in the SRCU locking
code but which took some interactions with the hardware folks to
resolve.
Summary:
- Avoid sleeping in atomic context when changing linear map
permissions for DEBUG_PAGEALLOC or KFENCE
- Rework printing of Spectre mitigation status to avoid hardlockup
when enabling per-task mitigations on the context-switch path
- Reject kernel modules when instruction patching fails either due to
the DWARF-based SCS patching or because of an alternatives callback
residing outside of the core kernel text
- Propagate error when updating kernel memory permissions in kprobes
- Drop pointless, incorrect message when enabling the ACPI SPCR
console
- Use value-returning LSE instructions for per-cpu atomics to reduce
latency in SRCU locking routines"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Reject modules with internal alternative callbacks
arm64: Fail module loading if dynamic SCS patching fails
arm64: proton-pack: Fix hard lockup due to print in scheduler context
arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
arm64: mm: Tidy up force_pte_mapping()
arm64: mm: Optimize range_split_to_ptes()
arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
arm64: kprobes: check the return value of set_memory_rox()
arm64: acpi: Drop message logging SPCR default console
Revert "ACPI: Suppress misleading SPCR console message when SPCR table is absent"
arm64: Use load LSE atomics for the non-return per-CPU atomic operations
Pull btrfs fixes from David Sterba:
- fix new inode name tracking in tree-log
- fix conventional zone and stripe calculations in zoned mode
- fix bio reference counts on error paths in relocation and scrub
* tag 'for-6.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: release root after error in data_reloc_print_warning_inode()
btrfs: scrub: put bio after errors in scrub_raid56_parity_stripe()
btrfs: do not update last_log_commit when logging inode due to a new name
btrfs: zoned: fix stripe width calculation
btrfs: zoned: fix conventional zone capacity calculation
Pull misc fixes from Andrew Morton:
"26 hotfixes. 22(!) are cc:stable, 22 are MM.
- address some Kexec Handover issues (Pasha Tatashin)
- fix handling of large folios which are mapped outside i_size (Kiryl
Shutsemau)
- fix some DAMON time issues on 32-bit machines (Quanmin Yan)
Plus the usual shower of singletons"
* tag 'mm-hotfixes-stable-2025-11-10-19-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits)
kho: warn and exit when unpreserved page wasn't preserved
kho: fix unpreservation of higher-order vmalloc preservations
kho: fix out-of-bounds access of vmalloc chunk
MAINTAINERS: add Chris and Kairui as the swap maintainer
mm/secretmem: fix use-after-free race in fault handler
mm/huge_memory: initialise the tags of the huge zero folio
nilfs2: avoid having an active sc_timer before freeing sci
scripts/decode_stacktrace.sh: fix build ID and PC source parsing
mm/damon/sysfs: change next_update_jiffies to a global variable
mm/damon/stat: change last_refresh_jiffies to a global variable
maple_tree: fix tracepoint string pointers
codetag: debug: handle existing CODETAG_EMPTY in mark_objexts_empty for slabobj_ext
mm/mremap: honour writable bit in mremap pte batching
gcov: add support for GCC 15
mm/mm_init: fix hash table order logging in alloc_large_system_hash()
mm/truncate: unmap large folio on split failure
mm/memory: do not populate page table entries beyond i_size
fs/proc: fix uaf in proc_readdir_de()
mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0 order
ksm: use range-walk function to jump over holes in scan_get_next_rmap_item
...
When smbd_disconnect_rdma_connection() turns SMBDIRECT_SOCKET_CREATED
into SMBDIRECT_SOCKET_ERROR, we'll have the situation that
smbd_disconnect_rdma_work() will set SMBDIRECT_SOCKET_DISCONNECTING
and call rdma_disconnect(), which likely fails as we never reached
the RDMA_CM_EVENT_ESTABLISHED. it means that
wait_event(sc->status_wait, sc->status == SMBDIRECT_SOCKET_DISCONNECTED)
in smbd_destroy() will hang forever in SMBDIRECT_SOCKET_DISCONNECTING
never reaching SMBDIRECT_SOCKET_DISCONNECTED.
So we directly go from SMBDIRECT_SOCKET_CREATED to
SMBDIRECT_SOCKET_DISCONNECTED.
Fixes: ffbfc73e84 ("smb: client: let smbd_disconnect_rdma_connection() set SMBDIRECT_SOCKET_ERROR...")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
In snd_usb_create_streams(), for UAC version 3 devices, the Interface
Association Descriptor (IAD) is retrieved via usb_ifnum_to_if(). If this
call fails, a fallback routine attempts to obtain the IAD from the next
interface and sets a BADD profile. However, snd_usb_mixer_controls_badd()
assumes that the IAD retrieved from usb_ifnum_to_if() is always valid,
without performing a NULL check. This can lead to a NULL pointer
dereference when usb_ifnum_to_if() fails to find the interface descriptor.
This patch adds a NULL pointer check after calling usb_ifnum_to_if() in
snd_usb_mixer_controls_badd() to prevent the dereference.
This issue was discovered by syzkaller, which triggered the bug by sending
a crafted USB device descriptor.
Fixes: 17156f23e9 ("ALSA: usb: add UAC3 BADD profiles support")
Signed-off-by: Haein Lee <lhi0729@kaist.ac.kr>
Link: https://patch.msgid.link/vwhzmoba9j2f.vwhzmob9u9e2.g6@dooray.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
ciu clock is 2 times of io clock, but the sample clk used is
derived from io clock provided to the card. So we should use
io clock to calculate the phase.
Fixes: 59903441f5 ("mmc: dw_mmc-rockchip: Add internal phase support")
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
memblock_estimated_nr_free_pages() returns the difference between the total
size of the "memory" memblock type and the "reserved" memblock type.
The "soft-reserved" memory regions are added to the "reserved" memblock
type, but not to the "memory" memblock type. Therefore,
memblock_estimated_nr_free_pages() may return a smaller value than
expected, or if it underflows, an extremely large value.
/proc/sys/kernel/threads-max is determined by the value of
memblock_estimated_nr_free_pages(). This issue was discovered on machines
with CXL memory because kernel.threads-max was either smaller than expected
or extremely large for the installed DRAM size.
This fixes the issue by replacing memblock_reserved_size() with
memblock_reserved_kern_size() that tells how much memory was
reserved from the actual RAM.
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Link: https://patch.msgid.link/20251111010010.7800-1-akinobu.mita@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Since the maximum return value of strnlen(..., CIFS_MAX_USERNAME_LEN)
is CIFS_MAX_USERNAME_LEN, length check in smb3_fs_context_parse_param()
is always FALSE and invalid.
Fix the comparison in if statement.
Signed-off-by: Yiqi Sun <sunyiqixm@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When smb_direct_disconnect_rdma_connection() turns SMBDIRECT_SOCKET_CREATED
into SMBDIRECT_SOCKET_ERROR, we'll have the situation that
smb_direct_disconnect_rdma_work() will set SMBDIRECT_SOCKET_DISCONNECTING
and call rdma_disconnect(), which likely fails as we never reached
the RDMA_CM_EVENT_ESTABLISHED. it means that
wait_event(sc->status_wait, sc->status == SMBDIRECT_SOCKET_DISCONNECTED)
in free_transport() will hang forever in SMBDIRECT_SOCKET_DISCONNECTING
never reaching SMBDIRECT_SOCKET_DISCONNECTED.
So we directly go from SMBDIRECT_SOCKET_CREATED to
SMBDIRECT_SOCKET_DISCONNECTED.
Fixes: b3fd52a0d8 ("smb: server: let smb_direct_disconnect_rdma_connection() set SMBDIRECT_SOCKET_ERROR...")
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently, CQs without a completion function are assigned the
mlx5_add_cq_to_tasklet function by default. This is problematic since
only user CQs created through the mlx5_ib driver are intended to use
this function.
Additionally, all CQs that will use doorbells instead of polling for
completions must call mlx5_cq_arm. However, the default CQ creation flow
leaves a valid value in the CQ's arm_db field, allowing FW to send
interrupts to polling-only CQs in certain corner cases.
These two factors would allow a polling-only kernel CQ to be triggered
by an EQ interrupt and call a completion function intended only for user
CQs, causing a null pointer exception.
Some areas in the driver have prevented this issue with one-off fixes
but did not address the root cause.
This patch fixes the described issue by adding defaults to the create CQ
flow. It adds a default dummy completion function to protect against
null pointer exceptions, and it sets an invalid command sequence number
by default in kernel CQs to prevent the FW from sending an interrupt to
the CQ until it is armed. User CQs are responsible for their own
initialization values.
Callers of mlx5_core_create_cq are responsible for changing the
completion function and arming the CQ per their needs.
Fixes: cdd04f4d4d ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For chips with security enabled, it's only possible to load firmware
with a valid signature pattern.
If key_id is not zero, it indicates a security chip, and the driver will
not load the config file.
- Example log for a security chip.
Bluetooth: hci0: RTL: examining hci_ver=0c hci_rev=000a
lmp_ver=0c lmp_subver=8922
Bluetooth: hci0: RTL: rom_version status=0 version=1
Bluetooth: hci0: RTL: btrtl_initialize: key id 1
Bluetooth: hci0: RTL: loading rtl_bt/rtl8922au_fw.bin
Bluetooth: hci0: RTL: cfg_sz 0, total sz 71301
Bluetooth: hci0: RTL: fw version 0x41c0c905
- Example log for a normal chip.
Bluetooth: hci0: RTL: examining hci_ver=0c hci_rev=000a
lmp_ver=0c lmp_subver=8922
Bluetooth: hci0: RTL: rom_version status=0 version=1
Bluetooth: hci0: RTL: btrtl_initialize: key id 0
Bluetooth: hci0: RTL: loading rtl_bt/rtl8922au_fw.bin
Bluetooth: hci0: RTL: loading rtl_bt/rtl8922au_config.bin
Bluetooth: hci0: RTL: cfg_sz 6, total sz 71307
Bluetooth: hci0: RTL: fw version 0x41c0c905
Tested-by: Hilda Wu <hildawu@realtek.com>
Signed-off-by: Nial Ni <niall_ni@realsil.com.cn>
Signed-off-by: Max Chou <max.chou@realtek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The previous calculation used roundup() which caused an overflow for
rates between 25.5Gbps and 26Gbps.
For example, a rate of 25.6Gbps would result in using 100Mbps units with
value of 256, which would overflow the 8 bits field.
Simplify the upper_limit_mbps calculation by removing the
unnecessary roundup, and adjust the comparison to use <= to correctly
handle the boundary condition.
Fixes: d8880795da ("net/mlx5e: Implement DCBNL IEEE max rate")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1762681073-1084058-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The current single-bit error injection mechanism flips bits directly in ECC RAM
by performing write and read operations. When the ECC RAM is actively used by
the Ethernet or USB controller, this approach sometimes trigger a false
double-bit error.
Switch both Ethernet and USB EDAC devices to use the INTTEST register
(altr_edac_a10_device_inject_fops) for single-bit error injection, similar to
the existing double-bit error injection method.
Fixes: 064acbd4f4 ("EDAC, altera: Add Stratix10 peripheral support")
Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251111081333.1279635-1-niravkumarlaxmidas.rabara@altera.com
This handles PA Sync Lost event which previously was assumed to be
handled with BIG Sync Lost but their lifetime are not the same thus why
there are 2 different events to inform when each sync is lost.
Fixes: b2a5f2e1c1 ("Bluetooth: hci_event: Add support for handling LE BIG Sync Lost event")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Quang Le reported that the AF_UNIX GC could garbage-collect a
receive queue of an alive in-flight socket, with a nice repro.
The repro consists of three stages.
1)
1-a. Create a single cyclic reference with many sockets
1-b. close() all sockets
1-c. Trigger GC
2)
2-a. Pass sk-A to an embryo sk-B
2-b. Pass sk-X to sk-X
2-c. Trigger GC
3)
3-a. accept() the embryo sk-B
3-b. Pass sk-B to sk-C
3-c. close() the in-flight sk-A
3-d. Trigger GC
As of 2-c, sk-A and sk-X are linked to unix_unvisited_vertices,
and unix_walk_scc() groups them into two different SCCs:
unix_sk(sk-A)->vertex->scc_index = 2 (UNIX_VERTEX_INDEX_START)
unix_sk(sk-X)->vertex->scc_index = 3
Once GC completes, unix_graph_grouped is set to true.
Also, unix_graph_maybe_cyclic is set to true due to sk-X's
cyclic self-reference, which makes close() trigger GC.
At 3-b, unix_add_edge() allocates unix_sk(sk-B)->vertex and
links it to unix_unvisited_vertices.
unix_update_graph() is called at 3-a. and 3-b., but neither
unix_graph_grouped nor unix_graph_maybe_cyclic is changed
because both sk-B's listener and sk-C are not in-flight.
3-c decrements sk-A's file refcnt to 1.
Since unix_graph_grouped is true at 3-d, unix_walk_scc_fast()
is finally called and iterates 3 sockets sk-A, sk-B, and sk-X:
sk-A -> sk-B (-> sk-C)
sk-X -> sk-X
This is totally fine. All of them are not yet close()d and
should be grouped into different SCCs.
However, unix_vertex_dead() misjudges that sk-A and sk-B are
in the same SCC and sk-A is dead.
unix_sk(sk-A)->scc_index == unix_sk(sk-B)->scc_index <-- Wrong!
&&
sk-A's file refcnt == unix_sk(sk-A)->vertex->out_degree
^-- 1 in-flight count for sk-B
-> sk-A is dead !?
The problem is that unix_add_edge() does not initialise scc_index.
Stage 1) is used for heap spraying, making a newly allocated
vertex have vertex->scc_index == 2 (UNIX_VERTEX_INDEX_START)
set by unix_walk_scc() at 1-c.
Let's track the max SCC index from the previous unix_walk_scc()
call and assign the max + 1 to a new vertex's scc_index.
This way, we can continue to avoid Tarjan's algorithm while
preventing misjudgments.
Fixes: ad081928a8 ("af_unix: Avoid Tarjan's algorithm if unnecessary.")
Reported-by: Quang Le <quanglex97@gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251109025233.3659187-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In fact, it is a multi-threaded MIPS34Kc, not a single-threaded MIPS24Kc.
Fixes: 0ec4887009 ("mips: dts: Add EcoNet DTS with EN751221 and SmartFiber XP8421-B board")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Fix a regression that has caused accesses to the PCI MMIO window to
complete unclaimed in non-EVA configurations with the SOC-it family of
system controllers, preventing PCI devices from working that use MMIO.
In the non-EVA case PHYS_OFFSET is set to 0, meaning that PCI_BAR0 is
set with an empty mask (and PCI_HEAD4 matches addresses starting from 0
accordingly). Consequently all addresses are matched for incoming DMA
accesses from PCI. This seems to confuse the system controller's logic
and outgoing bus cycles targeting the PCI MMIO window seem not to make
it to the intended devices.
This happens as well when a wider mask is used with PCI_BAR0, such as
0x80000000 or 0xe0000000, that makes addresses match that overlap with
the PCI MMIO window, which starts at 0x10000000 in our configuration.
Set the mask in PCI_BAR0 to 0xf0000000 for non-EVA then, covering the
non-EVA maximum 256 MiB of RAM, which is what YAMON does and which used
to work correctly up to the offending commit. Set PCI_P2SCMSKL to match
PCI_BAR0 as required by the system controller's specification, and match
PCI_P2SCMAPL to PCI_HEAD4 for identity mapping.
Verified with:
Core board type/revision = 0x0d (Core74K) / 0x01
System controller/revision = MIPS SOC-it 101 OCP / 1.3 SDR-FW-4:1
Processor Company ID/options = 0x01 (MIPS Technologies, Inc.) / 0x1c
Processor ID/revision = 0x97 (MIPS 74Kf) / 0x4c
for non-EVA and with:
Core board type/revision = 0x0c (CoreFPGA-5) / 0x00
System controller/revision = MIPS ROC-it2 / 0.0 FW-1:1 (CLK_unknown) GIC
Processor Company ID/options = 0x01 (MIPS Technologies, Inc.) / 0x00
Processor ID/revision = 0xa0 (MIPS interAptiv UP) / 0x20
for EVA/non-EVA, fixing:
defxx 0000:00:12.0: assign IRQ: got 10
defxx: v1.12 2021/03/10 Lawrence V. Stefani and others
0000:00:12.0: Could not read adapter factory MAC address!
vs:
defxx 0000:00:12.0: assign IRQ: got 10
defxx: v1.12 2021/03/10 Lawrence V. Stefani and others
0000:00:12.0: DEFPA at MMIO addr = 0x10142000, IRQ = 10, Hardware addr = 00-00-f8-xx-xx-xx
0000:00:12.0: registered as fddi0
for non-EVA and causing no change for EVA.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 422dd25664 ("MIPS: Malta: Allow PCI devices DMA to lower 2GB physical")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Some Marvell AP firmware used with mwl8k misbehaves when beacon frames
do not contain a WLAN_EID_DS_PARAMS element with the current channel.
It was reported on OpenWrt Github issues [0].
When hostapd/mac80211 omits DSSS Parameter Set from the beacon (which is
valid on some bands), the firmware stops transmitting sane frames and RX
status starts reporting bogus channel information. This makes AP mode
unusable.
Newer Marvell drivers (mwlwifi [1]) hard-code DSSS Parameter Set into
AP beacons for all chips, which suggests this is a firmware requirement
rather than a mwl8k-specific quirk.
Mirror that behaviour in mwl8k: when setting the beacon, check if
WLAN_EID_DS_PARAMS is present, and if not, extend the beacon and inject
a DSSS Parameter Set element, using the current channel from
hw->conf.chandef.chan.
Tested on Linksys EA4500 (88W8366).
[0] https://github.com/openwrt/openwrt/issues/19088
[1] db97edf20f/hif/fwcmd.c (L675)
Fixes: b64fe619e3 ("mwl8k: basic AP interface support")
Tested-by: Antony Kolitsos <zeusomighty@hotmail.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Link: https://patch.msgid.link/20251111100733.2825970-3-paweldembicki@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jeff Johnson says:
==================
ath.git update for v6.18-rc6
Fix an ath11k transmit status reporting issue. This issue has always
been present, but not reported until recently.
Bringing this through the current release since there is now a
userspace entity that wants to leverage this.
==================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Breno Leitao says:
====================
net: netpoll: fix memory leak and add comprehensive selftests
Fix a memory leak in netpoll and introduce netconsole selftests that
expose the issue when running with kmemleak detection enabled.
This patchset includes a selftest for netpoll with multiple concurrent
users (netconsole + bonding), which simulates the scenario from test[1]
that originally demonstrated the issue allegedly fixed by commit
efa95b01da ("netpoll: fix use after free") - a commit that is now
being reverted.
Sending this to "net" branch because this is a fix, and the selftest
might help with the backports validation.
Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.decot@googlers.com/ [1]
====================
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-0-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create a netconsole test that puts a lot of pressure on the netconsole
list manipulation. Do it by creating dynamic targets and deleting
targets while messages are being sent. Also put interface down while the
messages are being sent, as creating parallel targets.
The code launches three background jobs on distinct schedules:
* Toggle netcons target every 30 iterations
* create and delete random_target every 50 iterations
* toggle iface every 70 iterations
This creates multiple concurrency sources that interact with netconsole
states. This is good practice to simulate stress, and exercise netpoll
and netconsole locks.
This test already found an issue as reported in [1]
Link: https://lore.kernel.org/all/20250901-netpoll_memleak-v1-1-34a181977dfc@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-3-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extract the netconsole target creation from create_dynamic_target(), by
moving it from create_dynamic_target() into a new helper function. This
enables other tests to use the creation of netconsole targets with
arbitrary parameters and no sleep.
The new helper will be utilized by forthcoming torture-type selftests
that require dynamic target management.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-2-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit efa95b01da ("netpoll: fix use after free") incorrectly
ignored the refcount and prematurely set dev->npinfo to NULL during
netpoll cleanup, leading to improper behavior and memory leaks.
Scenario causing lack of proper cleanup:
1) A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is
allocated, and refcnt = 1
- Keep in mind that npinfo is shared among all netpoll instances. In
this case, there is just one.
2) Another netpoll is also associated with the same NIC and
npinfo->refcnt += 1.
- Now dev->npinfo->refcnt = 2;
- There is just one npinfo associated to the netdev.
3) When the first netpolls goes to clean up:
- The first cleanup succeeds and clears np->dev->npinfo, ignoring
refcnt.
- It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
- Set dev->npinfo = NULL, without proper cleanup
- No ->ndo_netpoll_cleanup() is either called
4) Now the second target tries to clean up
- The second cleanup fails because np->dev->npinfo is already NULL.
* In this case, ops->ndo_netpoll_cleanup() was never called, and
the skb pool is not cleaned as well (for the second netpoll
instance)
- This leaks npinfo and skbpool skbs, which is clearly reported by
kmemleak.
Revert commit efa95b01da ("netpoll: fix use after free") and adds
clarifying comments emphasizing that npinfo cleanup should only happen
once the refcount reaches zero, ensuring stable and correct netpoll
behavior.
Cc: <stable@vger.kernel.org> # 3.17.x
Cc: Jay Vosburgh <jv@jvosburgh.net>
Fixes: efa95b01da ("netpoll: fix use after free")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-1-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aksh Garg says:
====================
Fix IET verification implementation for CPSW driver
The CPSW module supports Intersperse Express Traffic (IET) and allows
the MAC layer to verify whether the peer supports IET through its MAC
merge sublayer, by sending a verification packet and waiting for its
response until the timeout. As defined in IEEE 802.3 Clause 99, the
verification process involves up to 3 verification attempts to
establish support.
This patch series fixes issues in the implementation of this IET
verification process.
====================
Link: https://patch.msgid.link/20251106092305.1437347-1-a-garg7@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The am65_cpsw_iet_verify_wait() function attempts verification 20 times,
toggling the AM65_CPSW_PN_IET_MAC_LINKFAIL bit in each iteration. When
the LINKFAIL bit transitions from 1 to 0, the MAC merge layer initiates
the verification process and waits for the timeout configured in
MAC_VERIFY_CNT before automatically retransmitting. The MAC_VERIFY_CNT
register is configured according to the user-defined verify/response
timeout in am65_cpsw_iet_set_verify_timeout_count(). As per IEEE 802.3
Clause 99, the hardware performs this automatic retry up to 3 times.
Current implementation toggles LINKFAIL after the user-configured
verify/response timeout in each iteration, forcing the hardware to
restart verification instead of respecting the MAC_VERIFY_CNT timeout.
This bypasses the hardware's automatic retry mechanism.
Fix this by moving the LINKFAIL bit toggle outside the retry loop and
reducing the retry count from 20 to 3. The software now only monitors
the status register while the hardware autonomously handles the 3
verification attempts at proper MAC_VERIFY_CNT intervals.
Fixes: 49a2eb9068 ("net: ethernet: ti: am65-cpsw-qos: Add Frame Preemption MAC Merge support")
Signed-off-by: Aksh Garg <a-garg7@ti.com>
Link: https://patch.msgid.link/20251106092305.1437347-3-a-garg7@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The CPSW module uses the MAC_VERIFY_CNT bit field in the
CPSW_PN_IET_VERIFY_REG_k register to set the verify/response timeout
count. This register specifies the number of clock cycles to wait before
resending a verify packet if the verification fails.
The verify/response timeout count, as being set by the function
am65_cpsw_iet_set_verify_timeout_count() is hardcoded for 125MHz
clock frequency, which varies based on PHY mode and link speed.
The respective clock frequencies are as follows:
- RGMII mode:
* 1000 Mbps: 125 MHz
* 100 Mbps: 25 MHz
* 10 Mbps: 2.5 MHz
- QSGMII/SGMII mode: 125 MHz (all speeds)
Fix this by adding logic to calculate the correct timeout counts
based on the actual PHY interface mode and link speed.
Fixes: 49a2eb9068 ("net: ethernet: ti: am65-cpsw-qos: Add Frame Preemption MAC Merge support")
Signed-off-by: Aksh Garg <a-garg7@ti.com>
Link: https://patch.msgid.link/20251106092305.1437347-2-a-garg7@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In tls_handshake_accept(), a netlink message is allocated using
genlmsg_new(). In the error handling path, genlmsg_cancel() is called
to cancel the message construction, but the message itself is not freed.
This leads to a memory leak.
Fix this by calling nlmsg_free() in the error path after genlmsg_cancel()
to release the allocated memory.
Fixes: 2fd5532044 ("net/handshake: Add a kernel API for requesting a TLSv1.3 handshake")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20251106144511.3859535-1-zilin@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current CLC proposal message construction uses a mix of
`ini->smc_type_v1/v2` and `pclc_base->hdr.typev1/v2` to decide whether
to include optional extensions (IPv6 prefix extension for v1, and v2
extension). This leads to a critical inconsistency: when
`smc_clc_prfx_set()` fails - for example, in IPv6-only environments with
only link-local addresses, or when the local IP address and the outgoing
interface’s network address are not in the same subnet.
As a result, the proposal message is assembled using the stale
`ini->smc_type_v1` value—causing the IPv6 prefix extension to be
included even though the header indicates v1 is not supported.
The peer then receives a malformed CLC proposal where the header type
does not match the payload, and immediately resets the connection.
The fix ensures consistency between the CLC header flags and the actual
payload by synchronizing `ini->smc_type_v1` with `pclc_base->hdr.typev1`
when prefix setup fails.
Fixes: 8c3dca341a ("net/smc: build and send V2 CLC proposal")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20251107024029.88753-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When freeing indexed arrays, the corresponding free function should
be called for each entry of the indexed array. For example, for
for 'struct tc_act_attrs' 'tc_act_attrs_free(...)' needs to be called
for each entry.
Previously, memory leaks were reported when enabling the ASAN
analyzer.
=================================================================
==874==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db048af in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
#2 0x55c98db048af in main ./linux/tools/net/ynl/samples/tc-filter-add.c:71
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db04a93 in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
#2 0x55c98db04a93 in main ./linux/tools/net/ynl/samples/tc-filter-add.c:74
Direct leak of 10 byte(s) in 2 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db0527d in tc_act_attrs_set_kind ../generated/tc-user.h:1622
SUMMARY: AddressSanitizer: 58 byte(s) leaked in 4 allocation(s).
The following diff illustrates the changes introduced compared to the
previous version of the code.
void tc_flower_attrs_free(struct tc_flower_attrs *obj)
{
+ unsigned int i;
+
free(obj->indev);
+ for (i = 0; i < obj->_count.act; i++)
+ tc_act_attrs_free(&obj->act[i]);
free(obj->act);
free(obj->key_eth_dst);
free(obj->key_eth_dst_mask);
Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20251106151529.453026-3-zahari.doychev@linux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Broadcom switches locally terminate link local traffic and do not
forward it, so we should not mark it as offloaded.
In some situations we still want/need to flood this traffic, e.g. if STP
is disabled, or it is explicitly enabled via the group_fwd_mask. But if
the skb is marked as offloaded, the kernel will assume this was already
done in hardware, and the packets never reach other bridge ports.
So ensure that link local traffic is never marked as offloaded, so that
the kernel can forward/flood these packets in software if needed.
Since the local termination in not configurable, check the destination
MAC, and never mark packets as offloaded if it is a link local ether
address.
While modern switches set the tag reason code to BRCM_EG_RC_PROT_TERM
for trapped link local traffic, they also set it for link local traffic
that is flooded (01:80:c2:00:00:10 to 01:80:c2:00:00:2f), so we cannot
use it and need to look at the destination address for them as well.
Fixes: 964dbf186e ("net: dsa: tag_brcm: add support for legacy tags")
Fixes: 0e62f543be ("net: dsa: Fix duplicate frames flooded by learning")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251109134635.243951-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tracing selftest "event-filter-function.tc" was failing because it
first runs the "sample_events" function that triggers the kmem_cache_free
event and it looks at what function was used during a call to "ls".
But the first time it calls this, it could trigger events that are used to
pull pages into the page cache.
The rest of the test uses the function it finds during that call to see if
it will be called in subsequent "sample_events" calls. But if there's no
need to pull pages into the page cache, it will not trigger that function
and the test will fail.
Call the "sample_events" twice to trigger all the page cache work before
it calls it to find a function to use in subsequent checks.
Cc: stable@vger.kernel.org
Fixes: eb50d0f250 ("selftests/ftrace: Choose target function for filter test from samples")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
syzbot reported a possible shift-out-of-bounds [1]
Blamed commit added rto_alpha_max and rto_beta_max set to 1000.
It is unclear if some sctp users are setting very large rto_alpha
and/or rto_beta.
In order to prevent user regression, perform the test at run time.
Also add READ_ONCE() annotations as sysctl values can change under us.
[1]
UBSAN: shift-out-of-bounds in net/sctp/transport.c:509:41
shift exponent 64 is too large for 32-bit type 'unsigned int'
CPU: 0 UID: 0 PID: 16704 Comm: syz.2.2320 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:233 [inline]
__ubsan_handle_shift_out_of_bounds+0x27f/0x420 lib/ubsan.c:494
sctp_transport_update_rto.cold+0x1c/0x34b net/sctp/transport.c:509
sctp_check_transmitted+0x11c4/0x1c30 net/sctp/outqueue.c:1502
sctp_outq_sack+0x4ef/0x1b20 net/sctp/outqueue.c:1338
sctp_cmd_process_sack net/sctp/sm_sideeffect.c:840 [inline]
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1372 [inline]
Fixes: b58537a1f5 ("net: sctp: fix permissions for rto_alpha and rto_beta knobs")
Reported-by: syzbot+f8c46c8b2b7f6e076e99@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/690c81ae.050a0220.3d0d33.014e.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251106111054.3288127-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull RISC-V fixes from Paul Walmsley:
- fix broken clang build on versions earlier than 19 and binutils
versions earlier than 2.38.
(This exposed that we're not properly testing earlier toolchain
versions in our linux-next builds and PR submissions. This was fixed
for this PR, and is being addressed more generally for -next builds.)
- remove some redundant Makefile code
- avoid building Canaan Kendryte K210-specific code on targets that
don't build for the K210
* tag 'riscv-for-linus-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix CONFIG_AS_HAS_INSN for new .insn usage
riscv: Remove redundant judgment for the default build target
riscv: Build loader.bin exclusively for Canaan K210
It's useful to know which query opcodes are available. Extend the
structure and return that. It's a trivial change, and even though it can
be painlessly extended later, it'd still require adding a v2 of the
structure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The utimes01 and utime06 tests fail when delegated timestamps are
enabled, specifically in subtests that modify the atime and mtime
fields using the 'nobody' user ID.
The problem can be reproduced as follow:
# echo "/media *(rw,no_root_squash,sync)" >> /etc/exports
# export -ra
# mount -o rw,nfsvers=4.2 127.0.0.1:/media /tmpdir
# cd /opt/ltp
# ./runltp -d /tmpdir -s utimes01
# ./runltp -d /tmpdir -s utime06
This issue occurs because nfs_setattr does not verify the inode's
UID against the caller's fsuid when delegated timestamps are
permitted for the inode.
This patch adds the UID check and if it does not match then the
request is sent to the server for permission checking.
Fixes: e12912d941 ("NFSv4: Add support for delegated atime and mtime attributes")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Contrary to what was stated on d36349ea73 ("Bluetooth: hci_conn:
Fix running bis_cleanup for hci_conn->type PA_LINK") the PA_LINK does
in fact needs to run bis_cleanup in order to terminate the PA Sync,
since that is bond to the listening socket which is the entity that
controls the lifetime of PA Sync, so if it is closed/released the PA
Sync shall be terminated, terminating the PA Sync shall not result in
the BIG Sync being terminated since once the later is established it
doesn't depend on the former anymore.
If the use user wants to reconnect/rebind a number of BIS(s) it shall
keep the socket open until it no longer needs the PA Sync, which means
it retains full control of the lifetime of both PA and BIG Syncs.
Fixes: d36349ea73 ("Bluetooth: hci_conn: Fix running bis_cleanup for hci_conn->type PA_LINK")
Fixes: a7bcffc673 ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
l2cap_chan_close() needs to be called in l2cap_chan_lock(), otherwise
l2cap_le_sig_cmd() etc. may run concurrently.
Add missing locks around l2cap_chan_close().
Fixes: 6b8d4a6a03 ("Bluetooth: 6LoWPAN: Use connected oriented channel instead of fixed one")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
disconnect_all_peers() calls sleeping function (l2cap_chan_close) under
spinlock. Holding the lock doesn't actually do any good -- we work on a
local copy of the list, and the lock doesn't protect against peer->chan
having already been freed.
Fix by taking refcounts of peer->chan instead. Clean up the code and
old comments a bit.
Take devices_lock instead of RCU, because the kfree_rcu();
l2cap_chan_put(); construct in chan_close_cb() does not guarantee
peer->chan is necessarily valid in RCU.
Also take l2cap_chan_lock() which is required for l2cap_chan_close().
Log: (bluez 6lowpan-tester Client Connect - Disable)
------
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:575
...
<TASK>
...
l2cap_send_disconn_req (net/bluetooth/l2cap_core.c:938 net/bluetooth/l2cap_core.c:1495)
...
? __pfx_l2cap_chan_close (net/bluetooth/l2cap_core.c:809)
do_enable_set (net/bluetooth/6lowpan.c:1048 net/bluetooth/6lowpan.c:1068)
------
Fixes: 9030582963 ("Bluetooth: 6lowpan: Converting rwlocks to use RCU")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
l2cap_chan_put() is exported, so export also l2cap_chan_hold() for
modules.
l2cap_chan_hold() has use case in net/bluetooth/6lowpan.c
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth 6lowpan.c confuses BDADDR_LE and ADDR_LE_DEV address types,
e.g. debugfs "connect" command takes the former, and "disconnect" and
"connect" to already connected device take the latter. This is due to
using same value both for l2cap_chan_connect and hci_conn_hash_lookup_le
which take different dst_type values.
Fix address type passed to hci_conn_hash_lookup_le().
Retain the debugfs API difference between "connect" and "disconnect"
commands since it's been like this since 2015 and nobody apparently
complained.
Fixes: f5ad4ffceb ("Bluetooth: 6lowpan: Use hci_conn_hash_lookup_le() when possible")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth 6lowpan.c netdev has header_ops, so it must set link-local
header for RX skb, otherwise things crash, eg. with AF_PACKET SOCK_RAW
Add missing skb_reset_mac_header() for uncompressed ipv6 RX path.
For the compressed one, it is done in lowpan_header_decompress().
Log: (BlueZ 6lowpan-tester Client Recv Raw - Success)
------
kernel BUG at net/core/skbuff.c:212!
Call Trace:
<IRQ>
...
packet_rcv (net/packet/af_packet.c:2152)
...
<TASK>
__local_bh_enable_ip (kernel/softirq.c:407)
netif_rx (net/core/dev.c:5648)
chan_recv_cb (net/bluetooth/6lowpan.c:294 net/bluetooth/6lowpan.c:359)
------
Fixes: 18722c2470 ("Bluetooth: Enable 6LoWPAN support for BT LE devices")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
mesh_send_done timer is not canceled when hdev is removed, which causes
crash if the timer triggers after hdev is gone.
Cancel the timer when MGMT removes the hdev, like other MGMT timers.
Should fix the BUG: sporadically seen by BlueZ test bot
(in "Mesh - Send cancel - 1" test).
Log:
------
BUG: KASAN: slab-use-after-free in run_timer_softirq+0x76b/0x7d0
...
Freed by task 36:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_save_free_info+0x3a/0x60
__kasan_slab_free+0x43/0x70
kfree+0x103/0x500
device_release+0x9a/0x210
kobject_put+0x100/0x1e0
vhci_release+0x18b/0x240
------
Fixes: b338d91703 ("Bluetooth: Implement support for Mesh")
Link: https://lore.kernel.org/linux-bluetooth/67364c09.0c0a0220.113cba.39ff@mx.google.com/
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The Smatch static checker noted that in _nfs4_proc_lookupp(), the flag
RPC_TASK_TIMEOUT is being passed as an argument to nfs4_init_sequence(),
which is clearly incorrect.
Since LOOKUPP is an idempotent operation, nfs4_init_sequence() should
not ask the server to cache the result. The RPC_TASK_TIMEOUT flag needs
to be passed down to the RPC layer.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Fixes: 76998ebb91 ("NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
If adding the second kobject fails, drop both references to avoid sysfs
residue and memory leak.
Fixes: e96f9268ee ("NFS: Make all of /sys/fs/nfs network-namespace unique")
Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Benjamin Coddington <ben.coddington@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
From https://lore.kernel.org/linux-nfs/aQHASIumLJyOoZGH@infradead.org/
On Wed, Oct 29, 2025 at 12:20:40AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 27, 2025 at 12:18:30PM -0400, Mike Snitzer wrote:
> > LOCALIO's misaligned DIO will issue head/tail followed by O_DIRECT
> > middle (via AIO completion of that aligned middle). So out of order
> > relative to file offset.
>
> That's in general a really bad idea. It will obviously work, but
> both on SSDs and out of place write file systems it is a sure way
> to increase your garbage collection overhead a lot down the line.
Fix this by never issuing misaligned DIO out of order. This fix means
the DIO-aligned middle will only use AIO completion if there is no
misaligned end segment. Otherwise, all 3 segments of a misaligned DIO
will be issued without AIO completion to ensure file offset increases
properly for all partial READ or WRITE situations.
Factoring out nfs_local_iter_setup() helps standardize repetitive
nfs_local_iters_setup_dio() code and is inspired by cleanup work that
Chuck Lever did on the NFSD Direct code.
Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The WMI driver core only supports GUID strings containing only
uppercase characters, however the GUID string used by the
msi-wmi-platform driver contains a single lowercase character.
This prevents the WMI driver core from matching said driver to
its WMI device.
Fix this by turning the lowercase character into a uppercase
character. Also update the WMI driver development guide to warn
about this.
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Fixes: 9c0beb6b29 ("platform/x86: wmi: Add MSI WMI Platform driver")
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/20251110111253.16204-3-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
It turns out that the GUID used by the msi-wmi-platform driver
(ABBC0F60-8EA1-11D1-00A0-C90629100000) is not unique, but was instead
copied from the WIndows Driver Samples. This means that this driver
could load on devices from other manufacturers that also copied this
GUID, potentially causing hardware errors.
Prevent this by only loading on devices whitelisted via DMI. The DMI
matches where taken from the msi-ec driver.
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Fixes: 9c0beb6b29 ("platform/x86: wmi: Add MSI WMI Platform driver")
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/20251110111253.16204-2-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Pull kvm fixes from Paolo Bonzini:
"Arm:
- Fix trapping regression when no in-kernel irqchip is present
- Check host-provided, untrusted ranges and offsets in pKVM
- Fix regression restoring the ID_PFR1_EL1 register
- Fix vgic ITS locking issues when LPIs are not directly injected
Arm selftests:
- Correct target CPU programming in vgic_lpi_stress selftest
- Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
RISC-V:
- Fix check for local interrupts on riscv32
- Read HGEIP CSR on the correct cpu when checking for IMSIC
interrupts
- Remove automatic I/O mapping from kvm_arch_prepare_memory_region()
x86:
- Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as
KVM doesn't support virtualization the instructions, but the
instructions are gated only by VMXON. That is, they will VM-Exit
instead of taking a #UD and until now this resulted in KVM exiting
to userspace with an emulation error.
- Unload the "FPU" when emulating INIT of XSTATE features if and only
if the FPU is actually loaded, instead of trying to predict when
KVM will emulate an INIT (CET support missed the MP_STATE path).
Add sanity checks to detect and harden against similar bugs in the
future.
- Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is
unloaded.
- Use a raw spinlock for svm->ir_list_lock as the lock is taken
during schedule(), and "normal" spinlocks are sleepable locks when
PREEMPT_RT=y.
- Remove guest_memfd bindings on memslot deletion when a gmem file is
dying to fix a use-after-free race found by syzkaller.
- Fix a goof in the EPT Violation handler where KVM checks the wrong
variable when determining if the reported GVA is valid.
- Fix and simplify the handling of LBR virtualization on AMD, which
was made buggy and unnecessarily complicated by nested VM support
Misc:
- Update Oliver's email address"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: nSVM: Fix and simplify LBR virtualization handling with nested
KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()
KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated
MAINTAINERS: Switch myself to using kernel.org address
KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock
KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray
KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip
KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured
KVM: arm64: Make all 32bit ID registers fully writable
KVM: VMX: Fix check for valid GVA on an EPT violation
KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying
KVM: SVM: switch to raw spinlock for svm->ir_list_lock
KVM: SVM: Make avic_ga_log_notifier() local to avic.c
KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit
KVM: SVM: Initialize per-CPU svm_data at the end of hardware setup
KVM: x86: Call out MSR_IA32_S_CET is not handled by XSAVES
KVM: x86: Harden KVM against imbalanced load/put of guest FPU state
KVM: x86: Unload "FPU" state on INIT if and only if its currently in-use
KVM: arm64: Check the untrusted offset in FF-A memory share
KVM: arm64: Check range args for pKVM mem transitions
...
LOCALIO's misaligned DIO WRITE support requires synchronous IO for any
misaligned head and/or tail that are issued using buffered IO. In
addition, it is important that the O_DIRECT middle be on stable
storage upon its completion via AIO.
Otherwise, a misaligned DIO WRITE could mix buffered IO for the
head/tail and direct IO for the DIO-aligned middle -- which could lead
to problems associated with deferred writes to stable storage (such as
out of order partial completions causing incorrect advancement of the
file's offset, etc).
Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Misaligned DIO read can be split into 3 IOs, must handle potential for
short read from each component IO (follows same pattern used for
handling partial writes, except upper layer read code handles advancing
offset before retry).
Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Improve completion handling of as many as 3 IOs associated with each
misaligned DIO by using a atomic_t to track completion of each IO.
Update nfs_local_pgio_done() to use precise atomic_t accounting for
remaining iov_iter (up to 3) associated with each iocb, so that each
NFS LOCALIO pgio header is only released after all IOs have completed.
But also allow early return if/when a short read or write occurs.
Fixes reported BUG: KASAN: slab-use-after-free in nfs_local_call_read:
https://lore.kernel.org/linux-nfs/aPSvi5Yr2lGOh5Jh@dell-per750-06-vm-07.rhts.eng.pek2.redhat.com/
Reported-by: Yongcheng Yang <yoyang@redhat.com>
Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Each filesystem is meant to fallback to retrying DIO in terms buffered
IO when it might encounter -ENOTBLK when issuing DIO (which can happen
if the VFS cannot invalidate the page cache).
So NFS doesn't need special handling for -ENOTBLK.
Also, explicitly initialize a couple DIO related iocb members rather
than simply rely on data structure zeroing.
Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
If the TLS security policy is of type RPC_XPRTSEC_TLS_X509, then the
cert_serial and privkey_serial fields need to match as well since they
define the client's identity, as presented to the server.
Fixes: 90c9550a8d ("NFS: support the kernel keyring for TLS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The default setting for the transport security policy must be
RPC_XPRTSEC_NONE, when using a TCP or RDMA connection without TLS.
Conversely, when using TLS, the security policy needs to be set.
Fixes: 6c0a8c5fcf ("NFS: Have struct nfs_client carry a TLS policy field")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Don't try to add an RDMA transport to a client that is already marked as
being a TCP/TLS transport.
Fixes: a35518cae4 ("NFSv4.1/pnfs: fix NFS with TLS in pnfs")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Don't try to add an RDMA transport to a client that is already marked as
being a TCP/TLS transport.
Fixes: 04a1526366 ("pnfs/flexfiles: connect to NFSv3 DS using TLS if MDS connection uses TLS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
nfsd4_enc_sequence_replay() uses nfsd4_encode_operation() to encode a
new SEQUENCE reply when replaying a request from the slot cache - only
ops after the SEQUENCE are replayed from the cache in ->sl_data.
However it does this in nfsd4_replay_cache_entry() which is called
*before* nfsd4_sequence() has filled in reply fields.
This means that in the replayed SEQUENCE reply:
maxslots will be whatever the client sent
target_maxslots will be -1 (assuming init to zero, and
nfsd4_encode_sequence() subtracts 1)
status_flags will be zero
The incorrect maxslots value, in particular, can cause the client to
think the slot table has been reduced in size so it can discard its
knowledge of current sequence number of the later slots, though the
server has not discarded those slots. When the client later wants to
use a later slot, it can get NFS4ERR_SEQ_MISORDERED from the server.
This patch moves the setup of the reply into a new helper function and
call it *before* nfsd4_replay_cache_entry() is called. Only one of the
updated fields was used after this point - maxslots. So the
nfsd4_sequence struct has been extended to have separate maxslots for
the request and the response.
Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Closes: https://lore.kernel.org/linux-nfs/20251010194449.10281-1-okorniev@redhat.com/
Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The PCM stream data in USB-audio driver is transferred over USB URB
packet buffers, and each packet size is determined dynamically. The
packet sizes are limited by some factors such as wMaxPacketSize USB
descriptor. OTOH, in the current code, the actually used packet sizes
are determined only by the rate and the PPS, which may be bigger than
the size limit above. This results in a buffer overflow, as reported
by syzbot.
Basically when the limit is smaller than the calculated packet size,
it implies that something is wrong, most likely a weird USB
descriptor. So the best option would be just to return an error at
the parameter setup time before doing any further operations.
This patch introduces such a sanity check, and returns -EINVAL when
the packet size is greater than maxpacksize. The comparison with
ep->packsize[1] alone should suffice since it's always equal or
greater than ep->packsize[0].
Reported-by: syzbot+bfd77469c8966de076f7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bfd77469c8966de076f7
Link: https://lore.kernel.org/690b6b46.050a0220.3d0d33.0054.GAE@google.com
Cc: Lizhi Xu <lizhi.xu@windriver.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20251109091211.12739-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fix kernel-doc warnings so that there no other kernel-doc issues
in <uapi/linux/tee.h>:
- add ending ':' to some struct members as needed for kernel-doc
- change struct name in kernel-doc to match the actual struct name (2x)
- add a @params: kernel-doc entry multiple times
Warning: tee.h:265 struct member 'ret_origin' not described
in 'tee_ioctl_open_session_arg'
Warning: tee.h:265 struct member 'num_params' not described
in 'tee_ioctl_open_session_arg'
Warning: tee.h:265 struct member 'params' not described
in 'tee_ioctl_open_session_arg'
Warning: tee.h:351 struct member 'num_params' not described
in 'tee_iocl_supp_recv_arg'
Warning: tee.h:351 struct member 'params' not described
in 'tee_iocl_supp_recv_arg'
Warning: tee.h:372 struct member 'num_params' not described
in 'tee_iocl_supp_send_arg'
Warning: tee.h:372 struct member 'params' not described
in 'tee_iocl_supp_send_arg'
Warning: tee.h:298: expecting prototype for struct
tee_ioctl_invoke_func_arg. Prototype was for
struct tee_ioctl_invoke_arg instead
Warning: tee.h:473: expecting prototype for struct
tee_ioctl_invoke_func_arg. Prototype was for struct
tee_ioctl_object_invoke_arg instead
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
kho_vmalloc_unpreserve_chunk() calls __kho_unpreserve() with end_pfn as
pfn + 1. This happens to work for 0-order pages, but leaks higher order
pages.
For example, say order 2 pages back the allocation. During preservation,
they get preserved in the order 2 bitmaps, but
kho_vmalloc_unpreserve_chunk() would try to unpreserve them from the order
0 bitmaps, which should not have these bits set anyway, leaving the order
2 bitmaps untouched. This results in the pages being carried over to the
next kernel. Nothing will free those pages in the next boot, leaking
them.
Fix this by taking the order into account when calculating the end PFN for
__kho_unpreserve().
Link: https://lkml.kernel.org/r/20251103180235.71409-2-pratyush@kernel.org
Fixes: a667300bd5 ("kho: add support for preserving vmalloc allocations")
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The list of pages in a vmalloc chunk is NULL-terminated. So when looping
through the pages in a vmalloc chunk, both kho_restore_vmalloc() and
kho_vmalloc_unpreserve_chunk() rightly make sure to stop when encountering
a NULL page. But when the chunk is full, the loops do not stop and go
past the bounds of chunk->phys, resulting in out-of-bounds memory access,
and possibly the restoration or unpreservation of an invalid page.
Fix this by making sure the processing of chunk stops at the end of the
array.
Link: https://lkml.kernel.org/r/20251103110159.8399-1-pratyush@kernel.org
Fixes: a667300bd5 ("kho: add support for preserving vmalloc allocations")
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a page fault occurs in a secret memory file created with
`memfd_secret(2)`, the kernel will allocate a new folio for it, mark the
underlying page as not-present in the direct map, and add it to the file
mapping.
If two tasks cause a fault in the same page concurrently, both could end
up allocating a folio and removing the page from the direct map, but only
one would succeed in adding the folio to the file mapping. The task that
failed undoes the effects of its attempt by (a) freeing the folio again
and (b) putting the page back into the direct map. However, by doing
these two operations in this order, the page becomes available to the
allocator again before it is placed back in the direct mapping.
If another task attempts to allocate the page between (a) and (b), and the
kernel tries to access it via the direct map, it would result in a
supervisor not-present page fault.
Fix the ordering to restore the direct map before the folio is freed.
Link: https://lkml.kernel.org/r/20251031120955.92116-1-lance.yang@linux.dev
Fixes: 1507f51255 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Reported-by: Google Big Sleep <big-sleep-vuln-reports@google.com>
Closes: https://lore.kernel.org/linux-mm/CAEXGt5QeDpiHTu3K9tvjUTPqo+d-=wuCNYPa+6sWKrdQJ-ATdg@mail.gmail.com/
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
On arm64 with MTE enabled, a page mapped as Normal Tagged (PROT_MTE) in
user space will need to have its allocation tags initialised. This is
normally done in the arm64 set_pte_at() after checking the memory
attributes. Such page is also marked with the PG_mte_tagged flag to avoid
subsequent clearing. Since this relies on having a struct page,
pte_special() mappings are ignored.
Commit d82d09e482 ("mm/huge_memory: mark PMD mappings of the huge zero
folio special") maps the huge zero folio special and the arm64
set_pmd_at() will no longer zero the tags. There is no guarantee that the
tags are zero, especially if parts of this huge page have been previously
tagged.
It's fairly easy to detect this by regularly dropping the caches to
force the reallocation of the huge zero folio.
Allocate the huge zero folio with the __GFP_ZEROTAGS flag. In addition,
do not warn in the arm64 __access_remote_tags() when reading tags from the
huge zero page.
I bundled the arm64 change in here as well since they are both related to
the commit mapping the huge zero folio as special.
[catalin.marinas@arm.com: handle arch mte_zero_clear_page_tags() code issuing MTE instructions]
Link: https://lkml.kernel.org/r/aQi8dA_QpXM8XqrE@arm.com
Link: https://lkml.kernel.org/r/20251031170133.280742-1-catalin.marinas@arm.com
Fixes: d82d09e482 ("mm/huge_memory: mark PMD mappings of the huge zero folio special")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Beleswar Padhi <b-padhi@ti.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Because kthread_stop did not stop sc_task properly and returned -EINTR,
the sc_timer was not properly closed, ultimately causing the problem [1]
reported by syzbot when freeing sci due to the sc_timer not being closed.
Because the thread sc_task main function nilfs_segctor_thread() returns 0
when it succeeds, when the return value of kthread_stop() is not 0 in
nilfs_segctor_destroy(), we believe that it has not properly closed
sc_timer.
We use timer_shutdown_sync() to sync wait for sc_timer to shutdown, and
set the value of sc_task to NULL under the protection of lock
sc_state_lock, so as to avoid the issue caused by sc_timer not being
properly shutdowned.
[1]
ODEBUG: free active (active state 0) object: 00000000dacb411a object type: timer_list hint: nilfs_construction_timeout
Call trace:
nilfs_segctor_destroy fs/nilfs2/segment.c:2811 [inline]
nilfs_detach_log_writer+0x668/0x8cc fs/nilfs2/segment.c:2877
nilfs_put_super+0x4c/0x12c fs/nilfs2/super.c:509
Link: https://lkml.kernel.org/r/20251029225226.16044-1-konishi.ryusuke@gmail.com
Fixes: 3f66cc261c ("nilfs2: use kthread_create and kthread_stop for the log writer thread")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+24d8b70f039151f65590@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=24d8b70f039151f65590
Tested-by: syzbot+24d8b70f039151f65590@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Cc: <stable@vger.kernel.org> [6.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Support for parsing PC source info in stacktraces (e.g. '(P)') was added
in commit 2bff77c665 ("scripts/decode_stacktrace.sh: fix decoding of
lines with an additional info"). However, this logic was placed after the
build ID processing. This incorrect order fails to parse lines containing
both elements, e.g.:
drm_gem_mmap_obj+0x114/0x200 [drm 03d0564e0529947d67bb2008c3548be77279fd27] (P)
This patch fixes the problem by extracting the PC source info first and
then processing the module build ID. With this change, the line above is
now properly parsed as such:
drm_gem_mmap_obj (./include/linux/mmap_lock.h:212 ./include/linux/mm.h:811 drivers/gpu/drm/drm_gem.c:1177) drm (P)
While here, also add a brief explanation the build ID section.
Link: https://lkml.kernel.org/r/20251030010347.2731925-1-cmllamas@google.com
Fixes: 2bff77c665 ("scripts/decode_stacktrace.sh: fix decoding of lines with an additional info")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matthieu Baerts <matttbe@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Puranjay Mohan <puranjay@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In DAMON's damon_sysfs_repeat_call_fn(), time_before() is used to compare
the current jiffies with next_update_jiffies to determine whether to
update the sysfs files at this moment.
On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make
jiffies wrap bugs appear earlier. However, this causes time_before() in
damon_sysfs_repeat_call_fn() to unexpectedly return true during the first
5 minutes after boot on 32-bit systems (see [1] for more explanation,
which fixes another jiffies-related issue before). As a result, DAMON
does not update sysfs files during that period.
There is also an issue unrelated to the system's word size[2]: if the
user stops DAMON just after next_update_jiffies is updated and restarts
it after 'refresh_ms' or a longer delay, next_update_jiffies will retain
an older value, causing time_before() to return false and the update to
happen earlier than expected.
Fix these issues by making next_update_jiffies a global variable and
initializing it each time DAMON is started.
Link: https://lkml.kernel.org/r/20251030020746.967174-3-yanquanmin1@huawei.com
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com [1]
Link: https://lore.kernel.org/all/20251029013038.66625-1-sj@kernel.org/ [2]
Fixes: d809a7c64b ("mm/damon/sysfs: implement refresh_ms file internal work")
Suggested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/damon: fixes for the jiffies-related issues", v2.
On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make
jiffies wrap bugs appear earlier. However, this may cause the
time_before() series of functions to return unexpected values, resulting
in DAMON not functioning as intended. Meanwhile, similar issues exist in
some specific user operation scenarios.
This patchset addresses these issues. The first patch is about the
DAMON_STAT module, and the second patch is about the core layer's sysfs.
This patch (of 2):
In DAMON_STAT's damon_stat_damon_call_fn(), time_before_eq() is used to
avoid unnecessarily frequent stat update.
On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make
jiffies wrap bugs appear earlier. However, this causes time_before_eq()
in DAMON_STAT to unexpectedly return true during the first 5 minutes after
boot on 32-bit systems (see [1] for more explanation, which fixes another
jiffies-related issue before). As a result, DAMON_STAT does not update
any monitoring results during that period, which becomes more confusing
when DAMON_STAT_ENABLED_DEFAULT is enabled.
There is also an issue unrelated to the system's word size[2]: if the user
stops DAMON_STAT just after last_refresh_jiffies is updated and restarts
it after 5 seconds or a longer delay, last_refresh_jiffies will retain an
older value, causing time_before_eq() to return false and the update to
happen earlier than expected.
Fix these issues by making last_refresh_jiffies a global variable and
initializing it each time DAMON_STAT is started.
Link: https://lkml.kernel.org/r/20251030020746.967174-2-yanquanmin1@huawei.com
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com [1]
Link: https://lore.kernel.org/all/20251028143250.50144-1-sj@kernel.org/ [2]
Fixes: fabdd1e911 ("mm/damon/stat: calculate and expose estimated memory bandwidth")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Suggested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
maple_tree tracepoints contain pointers to function names. Such a pointer
is saved when a tracepoint logs an event. There's no guarantee that it's
still valid when the event is parsed later and the pointer is dereferenced.
The kernel warns about these unsafe pointers.
event 'ma_read' has unsafe pointer field 'fn'
WARNING: kernel/trace/trace.c:3779 at ignore_event+0x1da/0x1e4
Mark the function names as tracepoint_string() to fix the events.
One case that doesn't work without my patch would be trace-cmd record
to save the binary ringbuffer and trace-cmd report to parse it in
userspace. The address of __func__ can't be dereferenced from
userspace but tracepoint_string will add an entry to
/sys/kernel/tracing/printk_formats
Link: https://lkml.kernel.org/r/20251030155537.87972-1-martin@kaiser.cx
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When emitting the order of the allocation for a hash table,
alloc_large_system_hash() unconditionally subtracts PAGE_SHIFT from log
base 2 of the allocation size. This is not correct if the allocation size
is smaller than a page, and yields a negative value for the order as seen
below:
TCP established hash table entries: 32 (order: -4, 256 bytes, linear) TCP
bind hash table entries: 32 (order: -2, 1024 bytes, linear)
Use get_order() to compute the order when emitting the hash table
information to correctly handle cases where the allocation size is smaller
than a page:
TCP established hash table entries: 32 (order: 0, 256 bytes, linear) TCP
bind hash table entries: 32 (order: 0, 1024 bytes, linear)
Link: https://lkml.kernel.org/r/20251028191020.413002-1-isaacmanjarres@google.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fix SIGBUS semantics with large folios", v3.
Accessing memory within a VMA, but beyond i_size rounded up to the next
page size, is supposed to generate SIGBUS.
Darrick reported[1] an xfstests regression in v6.18-rc1. generic/749
failed due to missing SIGBUS. This was caused by my recent changes that
try to fault in the whole folio where possible:
19773df031 ("mm/fault: try to map the entire file folio in finish_fault()")
357b92761d ("mm/filemap: map entire large folio faultaround")
These changes did not consider i_size when setting up PTEs, leading to
xfstest breakage.
However, the problem has been present in the kernel for a long time -
since huge tmpfs was introduced in 2016. The kernel happily maps
PMD-sized folios as PMD without checking i_size. And huge=always tmpfs
allocates PMD-size folios on any writes.
I considered this corner case when I implemented a large tmpfs, and my
conclusion was that no one in their right mind should rely on receiving a
SIGBUS signal when accessing beyond i_size. I cannot imagine how it could
be useful for the workload.
But apparently filesystem folks care a lot about preserving strict SIGBUS
semantics.
Generic/749 was introduced last year with reference to POSIX, but no real
workloads were mentioned. It also acknowledged the tmpfs deviation from
the test case.
POSIX indeed says[3]:
References within the address range starting at pa and
continuing for len bytes to whole pages following the end of an
object shall result in delivery of a SIGBUS signal.
The patchset fixes the regression introduced by recent changes as well as
more subtle SIGBUS breakage due to split failure on truncation.
This patch (of 2):
Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are
supposed to generate SIGBUS.
Recent changes attempted to fault in full folio where possible. They did
not respect i_size, which led to populating PTEs beyond i_size and
breaking SIGBUS semantics.
Darrick reported generic/749 breakage because of this.
However, the problem existed before the recent changes. With huge=always
tmpfs, any write to a file leads to PMD-size allocation. Following the
fault-in of the folio will install PMD mapping regardless of i_size.
Fix filemap_map_pages() and finish_fault() to not install:
- PTEs beyond i_size;
- PMD mappings across i_size;
Make an exception for shmem/tmpfs that for long time intentionally
mapped with PMDs across i_size.
Link: https://lkml.kernel.org/r/20251027115636.82382-1-kirill@shutemov.name
Link: https://lkml.kernel.org/r/20251027115636.82382-2-kirill@shutemov.name
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Fixes: 6795801366 ("xfs: Support large folios")
Reported-by: "Darrick J. Wong" <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pde is erased from subdir rbtree through rb_erase(), but not set the node
to EMPTY, which may result in uaf access. We should use RB_CLEAR_NODE()
set the erased node to EMPTY, then pde_subdir_next() will return NULL to
avoid uaf access.
We found an uaf issue while using stress-ng testing, need to run testcase
getdent and tun in the same time. The steps of the issue is as follows:
1) use getdent to traverse dir /proc/pid/net/dev_snmp6/, and current
pde is tun3;
2) in the [time windows] unregister netdevice tun3 and tun2, and erase
them from rbtree. erase tun3 first, and then erase tun2. the
pde(tun2) will be released to slab;
3) continue to getdent process, then pde_subdir_next() will return
pde(tun2) which is released, it will case uaf access.
CPU 0 | CPU 1
-------------------------------------------------------------------------
traverse dir /proc/pid/net/dev_snmp6/ | unregister_netdevice(tun->dev) //tun3 tun2
sys_getdents64() |
iterate_dir() |
proc_readdir() |
proc_readdir_de() | snmp6_unregister_dev()
pde_get(de); | proc_remove()
read_unlock(&proc_subdir_lock); | remove_proc_subtree()
| write_lock(&proc_subdir_lock);
[time window] | rb_erase(&root->subdir_node, &parent->subdir);
| write_unlock(&proc_subdir_lock);
read_lock(&proc_subdir_lock); |
next = pde_subdir_next(de); |
pde_put(de); |
de = next; //UAF |
rbtree of dev_snmp6
|
pde(tun3)
/ \
NULL pde(tun2)
Link: https://lkml.kernel.org/r/20251025024233.158363-1-albin_yang@163.com
Signed-off-by: Wei Yang <albinwyang@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: wangzijie <wangzijie1@honor.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, scan_get_next_rmap_item() walks every page address in a VMA to
locate mergeable pages. This becomes highly inefficient when scanning
large virtual memory areas that contain mostly unmapped regions, causing
ksmd to use large amount of cpu without deduplicating much pages.
This patch replaces the per-address lookup with a range walk using
walk_page_range(). The range walker allows KSM to skip over entire
unmapped holes in a VMA, avoiding unnecessary lookups. This problem was
previously discussed in [1].
Consider the following test program which creates a 32 TiB mapping in the
virtual address space but only populates a single page:
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
/* 32 TiB */
const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;
int main() {
char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);
if (area == MAP_FAILED) {
perror("mmap() failed\n");
return -1;
}
/* Populate a single page such that we get an anon_vma. */
*area = 0;
/* Enable KSM. */
madvise(area, size, MADV_MERGEABLE);
pause();
return 0;
}
$ ./ksm-sparse &
$ echo 1 > /sys/kernel/mm/ksm/run
Without this patch ksmd uses 100% of the cpu for a long time (more then 1
hour in my test machine) scanning all the 32 TiB virtual address space
that contain only one mapped page. This makes ksmd essentially deadlocked
not able to deduplicate anything of value. With this patch ksmd walks
only the one mapped page and skips the rest of the 32 TiB virtual address
space, making the scan fast using little cpu.
Link: https://lkml.kernel.org/r/20251023035841.41406-1-pedrodemargomes@gmail.com
Link: https://lkml.kernel.org/r/20251022153059.22763-1-pedrodemargomes@gmail.com
Link: https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/ [1]
Fixes: 31dbd01f31 ("ksm: Kernel SamePage Merging")
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: craftfever <craftfever@airmail.cc>
Closes: https://lkml.kernel.org/r/020cf8de6e773bb78ba7614ef250129f11a63781@murena.io
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
KHO allocates metadata for its preserved memory map using the slab
allocator via kzalloc(). This metadata is temporary and is used by the
next kernel during early boot to find preserved memory.
A problem arises when KFENCE is enabled. kzalloc() calls can be randomly
intercepted by kfence_alloc(), which services the allocation from a
dedicated KFENCE memory pool. This pool is allocated early in boot via
memblock.
When booting via KHO, the memblock allocator is restricted to a "scratch
area", forcing the KFENCE pool to be allocated within it. This creates a
conflict, as the scratch area is expected to be ephemeral and
overwriteable by a subsequent kexec. If KHO metadata is placed in this
KFENCE pool, it leads to memory corruption when the next kernel is loaded.
To fix this, modify KHO to allocate its metadata directly from the buddy
allocator instead of slab.
Link: https://lkml.kernel.org/r/20251021000852.2924827-4-pasha.tatashin@soleen.com
Fixes: fc33e4b44b ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: David Matlack <dmatlack@google.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
KHO memory preservation metadata is preserved in 512 byte chunks which
requires their allocation from slab allocator. Slabs are not safe to be
used with KHO because of kfence, and because partial slabs may lead leaks
to the next kernel. Change the size to be PAGE_SIZE.
The kfence specifically may cause memory corruption, where it randomly
provides slab objects that can be within the scratch area. The reason for
that is that kfence allocates its objects prior to KHO scratch is marked
as CMA region.
While this change could potentially increase metadata overhead on systems
with sparsely preserved memory, this is being mitigated by ongoing work to
reduce sparseness during preservation via 1G guest pages. Furthermore,
this change aligns with future work on a stateless KHO, which will also
use page-sized bitmaps for its radix tree metadata.
Link: https://lkml.kernel.org/r/20251021000852.2924827-3-pasha.tatashin@soleen.com
Fixes: fc33e4b44b ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "KHO: kfence + KHO memory corruption fix", v3.
This series fixes a memory corruption bug in KHO that occurs when KFENCE
is enabled.
The root cause is that KHO metadata, allocated via kzalloc(), can be
randomly serviced by kfence_alloc(). When a kernel boots via KHO, the
early memblock allocator is restricted to a "scratch area". This forces
the KFENCE pool to be allocated within this scratch area, creating a
conflict. If KHO metadata is subsequently placed in this pool, it gets
corrupted during the next kexec operation.
Google is using KHO and have had obscure crashes due to this memory
corruption, with stacks all over the place. I would prefer this fix to be
properly backported to stable so we can also automatically consume it once
we switch to the upstream KHO.
Patch 1/3 introduces a debug-only feature (CONFIG_KEXEC_HANDOVER_DEBUG)
that adds checks to detect and fail any operation that attempts to place
KHO metadata or preserved memory within the scratch area. This serves as
a validation and diagnostic tool to confirm the problem without affecting
production builds.
Patch 2/3 Increases bitmap to PAGE_SIZE, so buddy allocator can be used.
Patch 3/3 Provides the fix by modifying KHO to allocate its metadata
directly from the buddy allocator instead of slab. This bypasses the
KFENCE interception entirely.
This patch (of 3):
It is invalid for KHO metadata or preserved memory regions to be located
within the KHO scratch area, as this area is overwritten when the next
kernel is loaded, and used early in boot by the next kernel. This can
lead to memory corruption.
Add checks to kho_preserve_* and KHO's internal metadata allocators
(xa_load_or_alloc, new_chunk) to verify that the physical address of the
memory does not overlap with any defined scratch region. If an overlap is
detected, the operation will fail and a WARN_ON is triggered. To avoid
performance overhead in production kernels, these checks are enabled only
when CONFIG_KEXEC_HANDOVER_DEBUG is selected.
[rppt@kernel.org: fix KEXEC_HANDOVER_DEBUG Kconfig dependency]
Link: https://lkml.kernel.org/r/aQHUyyFtiNZhx8jo@kernel.org
[pasha.tatashin@soleen.com: build fix]
Link: https://lkml.kernel.org/r/CA+CK2bBnorfsTymKtv4rKvqGBHs=y=MjEMMRg_tE-RME6n-zUw@mail.gmail.com
Link: https://lkml.kernel.org/r/20251021000852.2924827-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20251021000852.2924827-2-pasha.tatashin@soleen.com
Fixes: fc33e4b44b ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
VM fails to boot with 256 vCPUs, the detailed command is
qemu-system-loongarch64 -smp 256
and there is an error reported as follows:
KVM_LOONGARCH_EXTIOI_INIT_NUM_CPU failed: Invalid argument
There is typo issue in function kvm_eiointc_ctrl_access() when set
max supported vCPUs.
Cc: stable@vger.kernel.org
Fixes: 47256c4c8b ("LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
PMU hardware about VM is switched on VM exit to host rather than vCPU
context sched off, PMU is checked and restored on return to VM. It is
not necessary to check PMU on vCPU context sched on callback, since the
request is made on the VM exit entry or VM PMU CSR access abort routine
already.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
On LoongArch system, guest PMU hardware is shared by guest and host but
PMU interrupt is separated. PMU is pass-through to VM, and there is PMU
context switch when exit to host and return to guest.
There is optimiation to check whether PMU is enabled by guest. If not,
it is not necessary to return to guest. However, if it is enabled, PMU
context for guest need switch on. Now KVM_REQ_PMU notification is set
on vCPU context switch, but it is missing if there is no vCPU context
switch while PMU is used by guest VM, so fix it.
Cc: <stable@vger.kernel.org>
Fixes: f4e40ea9f7 ("LoongArch: KVM: Add PMU support for guest")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When timer is fired in oneshot mode, CSR.TVAL will stop with value -1
rather than 0. However when the register CSR.TVAL is restored, it will
continue to count down rather than stop there.
Now the method is to write 0 to CSR.TVAL, wait to count down for 1 cycle
at least, which is 10ns with a timer freq 100MHz, and then retore timer
interrupt status. Here add 2 cycles delay to assure that timer interrupt
is injected.
With this patch, timer selftest case passes to run always.
Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
With secondary MMU page table, if there is a read page fault, the page's
write attribute will not set even if it is writable from master MMU page
table. This logic only works if dirty tracking is enabled, so page table
should be set with _PAGE_WRITE if dirty tracking is disabled.
It reduces extra page fault on secondary MMU page table if a VM finishes
migration, when the master MMU page table is ready and the secondary MMU
page is fresh.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When specifying '-d' for kexec_file_load interface, loaded locations of
kernel/initrd/cmdline etc can be printed out to help debug.
Commit eb7622d908 ("kexec_file, riscv: print out debugging message if
required") fixes the same issue on RISC-V.
So, remove kexec_image_info() because the content has been printed out
in generic code.
And on Loongson-3A5000, the printed messages look like below:
kexec_file: kernel: 00000000d9aad283 kernel_size: 0x2e77f30
kexec_file(EFI): No LoongArch PE image header.
kexec_file: Loaded initrd at 0x80000000 bufsz=0x1637cd0 memsz=0x1638000
kexec_file(ELF): Loaded kernel at 0x9c20000 bufsz=0x27f1800 memsz=0x2950000
kexec_file: nr_segments = 2
kexec_file: segment[0]: buf=0x00000000cc3e6c33 bufsz=0x27f1800 mem=0x9c20000 memsz=0x2950000
kexec_file: segment[1]: buf=0x00000000bb75a541 bufsz=0x1637cd0 mem=0x80000000 memsz=0x1638000
kexec_file: kexec_file_load: type:0, start:0xb15d000 head:0x18db60002 flags:0x8
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The kexec_buf structure was previously declared without initialization.
commit bf454ec31a ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.
This is also triggering a UBSAN warning when the uninitialized data is
accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are
cleanly set, preventing future instances of uninitialized memory being
used.
Fixes: bf454ec31a ("kexec_file: allow to place kexec_buf randomly")
Link: https://lore.kernel.org/r/20250827-kbuf_all-v1-2-1df9882bb01a@debian.org
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
CSR.FWPC and CSR.MWPC are 32bit registers, so use csr_read32() rather
than csr_read64() to read the values of FWPC/MWPC.
Cc: stable@vger.kernel.org
Fixes: edffa33c7b ("LoongArch: Add hardware breakpoints/watchpoints support")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Remove the unnecessary __GFP_HIGHMEM masking in pud_alloc_one(), which
was introduced with commit 382739797f ("loongarch: convert various
functions to use ptdescs"). GFP_KERNEL doesn't contain __GFP_HIGHMEM.
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Now if the PTE/PMD is dirty with _PAGE_DIRTY but without _PAGE_MODIFIED,
after {pte,pmd}_modify() we lose _PAGE_DIRTY, then {pte,pmd}_dirty()
return false and lead to data loss. This can happen in certain scenarios
such as HW PTW doesn't set _PAGE_MODIFIED automatically, so here we need
_PAGE_MODIFIED to record the dirty status (_PAGE_DIRTY).
The new modification involves checking whether the original PTE/PMD has
the _PAGE_DIRTY flag. If it exists, the _PAGE_MODIFIED bit is also set,
ensuring that the {pte,pmd}_dirty() interface can always return accurate
information.
Cc: stable@vger.kernel.org
Co-developed-by: Liupu Wang <wangliupu@loongson.cn>
Signed-off-by: Liupu Wang <wangliupu@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Now there 5 places which calculate max_pfn & max_low_pfn:
1. in fdt_setup() for FDT systems;
2. in memblock_init() for ACPI systems;
3. in init_numa_memory() for NUMA systems;
4. in arch_mem_init() to recalculate for "mem=" cmdline;
5. in paging_init() to recalculate for NUMA systems.
Since memblock_init() is called both for ACPI and FDT systems, move the
calculation out of the for_each_efi_memory_desc() loop can eliminate the
first case. The last case is very questionable (may be derived from the
MIPS/Loongson code) and breaks the "mem=" cmdline, so should be removed.
And then the NUMA version of paging_init() can be also eliminated.
After consolidation there are 3 places of calculation:
1. in memblock_init() for both ACPI and FDT systems;
2. in init_numa_memory() to recalculate for NUMA systems;
3. in arch_mem_init() to recalculate for the "mem=" cmdline.
For all cases the calculation is:
max_pfn = PFN_DOWN(memblock_end_of_DRAM());
max_low_pfn = min(PFN_DOWN(HIGHMEM_START), max_pfn);
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
1. Use phys_addr_t instead of u64, which can work for both 32/64 bits.
2. Check whether the input physical address is above TO_PHYS_MASK (and
return NULL if yes) for the DMW version.
Note: In theory early_ioremap() also need the TO_PHYS_MASK checking, but
the UEFI BIOS pass some DMW virtual addresses.
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Now we use virtual addresses to fill CSR_MERRENTRY/CSR_TLBRENTRY, but
hardware hope physical addresses. Now it works well because the high
bits are ignored above PA_BITS (48 bits), but explicitly use physical
addresses can avoid potential bugs. So fix it.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch's MSG interrupt features are used across multiple subsystems.
Clarify these features to avoid misuse, existing users will be adjusted
if necessary.
MSGINT: Infrastructure, means the CPU core supports message interupts.
Indicated by CPUCFG1.MSGINT.
AVECINT: AVEC interrupt controller based on MSGINT, means the CPU chip
supports direct message interrupts. Indicated by IOCSR.FEATURES.DMSI.
REDIRECTINT: REDIRECT interrupt controller based on MSGINT and AVECINT,
means the CPU chip supports redirect message interrupts. Indicated by
IOCSR.FEATURES.RMSI.
For example:
Loongson-3A5000/3C5000 doesn't support MSGINT/AVECINT/REDIRECTINT;
Loongson-3A6000 supports MSGINT but doesn't support AVECINT/REDIRECTINT;
Loongson-3C6000 supports MSGINT/AVECINT/REDIRECTINT.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
It's used to work around an objtool issue since commit abb2a55722
("LoongArch: Add cflag -fno-isolate-erroneous-paths-dereference"), but
it's then passed to bindgen and cause an error because Clang does not
have this option.
Fixes: abb2a55722 ("LoongArch: Add cflag -fno-isolate-erroneous-paths-dereference")
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When the per-IP connection limit is exceeded in ksmbd_kthread_fn(),
the code sets ret = -EAGAIN and continues the accept loop without
closing the just-accepted socket. That leaks one socket per rejected
attempt from a single IP and enables a trivial remote DoS.
Release client_sk before continuing.
This bug was found with ZeroPath.
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb_direct_prepare_negotiation() posts a recv and then, if
smb_direct_accept_client() fails, calls put_recvmsg() on the same
buffer. That unmaps and recycles a buffer that is still posted on
the QP., which can lead to device DMA into unmapped or reused memory.
Track whether the recv was posted and only return it if it was never
posted. If accept fails after a post, leave it for teardown to drain
and complete safely.
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The user calls fsconfig twice, but when the program exits, free() only
frees ctx->source for the second fsconfig, not the first.
Regarding fc->source, there is no code in the fs context related to its
memory reclamation.
To fix this memory leak, release the source memory corresponding to ctx
or fc before each parsing.
syzbot reported:
BUG: memory leak
unreferenced object 0xffff888128afa360 (size 96):
backtrace (crc 79c9c7ba):
kstrdup+0x3c/0x80 mm/util.c:84
smb3_fs_context_parse_param+0x229b/0x36c0 fs/smb/client/fs_context.c:1444
BUG: memory leak
unreferenced object 0xffff888112c7d900 (size 96):
backtrace (crc 79c9c7ba):
smb3_fs_context_fullpath+0x70/0x1b0 fs/smb/client/fs_context.c:629
smb3_fs_context_parse_param+0x2266/0x36c0 fs/smb/client/fs_context.c:1438
Reported-by: syzbot+72afd4c236e6bc3f4bac@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=72afd4c236e6bc3f4bac
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
cifs_pick_channel iterates candidate channels using cur. The
reconnect-state test mistakenly used a different variable.
This checked the wrong slot and would cause us to skip a healthy channel
and to dispatch on one that needs reconnect, occasionally failing
operations when a channel was down.
Fix by replacing for the correct variable.
Fixes: fc43a8ac39 ("cifs: cifs_pick_channel should try selecting active channels")
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The driver calls fwnode_get_named_child_node() which takes a reference
on the child node, but never releases it, which causes a reference leak.
Fix by using devm_add_action_or_reset() to automatically release the
reference when the device is removed.
Fixes: d5282a5392 ("pinctrl: cs42l43: Add support for the cs42l43")
Suggested-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Pull i2c fix from Wolfram Sang:
"Two reverts merged into one commit to handle a regression caused by a
wrong cleanup because the underlying implications were unclear"
* tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: muxes: pca954x: Fix broken reset-gpio usage
Pull Kbuild fixes from Nathan Chancellor:
- Strip trailing padding bytes from modules.builtin.modinfo to fix
error during modules_install with certain versions of kmod
- Drop unused static inline function warning in .c files with clang
from W=1 to W=2
- Ensure kernel-doc.py invocations use the PYTHON3 make variable to
ensure user's choice of Python interpreter is always respected
* tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Let kernel-doc.py use PYTHON3 override
compiler_types: Move unused static inline functions warning to W=2
kbuild: Strip trailing padding bytes from modules.builtin.modinfo
The current scheme for handling LBRV when nested is used is very
complicated, especially when L1 does not enable LBRV (i.e. does not set
LBR_CTL_ENABLE_MASK).
To avoid copying LBRs between VMCB01 and VMCB02 on every nested
transition, the current implementation switches between using VMCB01 or
VMCB02 as the source of truth for the LBRs while L2 is running. If L2
enables LBR, VMCB02 is used as the source of truth. When L2 disables
LBR, the LBRs are copied to VMCB01 and VMCB01 is used as the source of
truth. This introduces significant complexity, and incorrect behavior in
some cases.
For example, on a nested #VMEXIT, the LBRs are only copied from VMCB02
to VMCB01 if LBRV is enabled in VMCB01. This is because L2's writes to
MSR_IA32_DEBUGCTLMSR to enable LBR are intercepted and propagated to
VMCB01 instead of VMCB02. However, LBRV is only enabled in VMCB02 when
L2 is running.
This means that if L2 enables LBR and exits to L1, the LBRs will not be
propagated from VMCB02 to VMCB01, because LBRV is disabled in VMCB01.
There is no meaningful difference in CPUID rate in L2 when copying LBRs
on every nested transition vs. the current approach, so do the simple
and correct thing and always copy LBRs between VMCB01 and VMCB02 on
nested transitions (when LBRV is disabled by L1). Drop the conditional
LBRs copying in __svm_{enable/disable}_lbrv() as it is now unnecessary.
VMCB02 becomes the only source of truth for LBRs when L2 is running,
regardless of LBRV being enabled by L1, drop svm_get_lbr_vmcb() and use
svm->vmcb directly in its place.
Fixes: 1d5a1b5860 ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251108004524.1600006-4-yosry.ahmed@linux.dev
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on
nested transitions where LBRV is used. It checks whether LBRV enablement
needs to be changed in the current VMCB, and if it does, it also
recalculate intercepts to LBR MSRs.
However, there are cases where intercepts need to be updated even when
LBRV enablement doesn't. Example scenario:
- L1 has MSR_IA32_DEBUGCTLMSR cleared.
- L1 runs L2 without LBR_CTL_ENABLE (no LBRV).
- L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv()
sets LBR_CTL_ENABLE in VMCB02 and disables intercepts to LBR MSRs.
- L2 exits to L1, svm_update_lbrv() is not called on this transition.
- L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that
LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing.
- Intercepts remain disabled, L1 reads to LBR MSRs read the host MSRs.
Fix it by always recalculating intercepts in svm_update_lbrv().
Fixes: 1d5a1b5860 ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251108004524.1600006-3-yosry.ahmed@linux.dev
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The APM lists the DbgCtlMsr field as being tracked by the VMCB_LBR clean
bit. Always clear the bit when MSR_IA32_DEBUGCTLMSR is updated.
The history is complicated, it was correctly cleared for L1 before
commit 1d5a1b5860 ("KVM: x86: nSVM: correctly virtualize LBR msrs when
L2 is running"). At that point svm_set_msr() started to rely on
svm_update_lbrv() to clear the bit, but when nested virtualization
is enabled the latter does not always clear it even if MSR_IA32_DEBUGCTLMSR
changed. Go back to clearing it directly in svm_set_msr().
Fixes: 1d5a1b5860 ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Reported-by: Matteo Rizzo <matteorizzo@google.com>
Reported-by: evn@google.com
Co-developed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251108004524.1600006-2-yosry.ahmed@linux.dev
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM x86 fixes for 6.18:
- Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as KVM
doesn't support virtualization the instructions, but the instructions
are gated only by VMXON, i.e. will VM-Exit instead of taking a #UD and
thus result in KVM exiting to userspace with an emulation error.
- Unload the "FPU" when emulating INIT of XSTATE features if and only if
the FPU is actually loaded, instead of trying to predict when KVM will
emulate an INIT (CET support missed the MP_STATE path). Add sanity
checks to detect and harden against similar bugs in the future.
- Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is unloaded.
- Use a raw spinlock for svm->ir_list_lock as the lock is taken during
schedule(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y.
- Remove guest_memfd bindings on memslot deletion when a gmem file is dying
to fix a use-after-free race found by syzkaller.
- Fix a goof in the EPT Violation handler where KVM checks the wrong
variable when determining if the reported GVA is valid.
KVM/riscv fixes for 6.18, take #2
- Fix check for local interrupts on riscv32
- Read HGEIP CSR on the correct cpu when checking for IMSIC interrupts
- Remove automatic I/O mapping from kvm_arch_prepare_memory_region()
It is possible to force a specific version of python to be used when
building the kernel by passing PYTHON3= on the make command line.
However kernel-doc.py is currently called with python3 hard-coded and
thus ignores this setting.
Use $(PYTHON3) to run $(KERNELDOC) so that the desired version of
python is used.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://patch.msgid.link/20251107192933.2bfe9e57@endymion
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Pull drm fix from Dave Airlie:
"Brown paper bag, the dma mask fix which I applied and actually looked
through for bad things, actually broke newer GPUs, there might be some
latent part in the boot path that is assuming 32-bit still, but we
will figure that out elsewhere.
nouveau:
- revert DMA mask change"
* tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel:
Revert "drm/nouveau: set DMA mask before creating the flush page"
If the allocation of tl_hba->sh fails in tcm_loop_driver_probe() and we
attempt to dereference it in tcm_loop_tpg_address_show() we will get a
segfault, see below for an example. So, check tl_hba->sh before
dereferencing it.
Unable to allocate struct scsi_host
BUG: kernel NULL pointer dereference, address: 0000000000000194
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 8356 Comm: tokio-runtime-w Not tainted 6.6.104.2-4.azl3 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
RIP: 0010:tcm_loop_tpg_address_show+0x2e/0x50 [tcm_loop]
...
Call Trace:
<TASK>
configfs_read_iter+0x12d/0x1d0 [configfs]
vfs_read+0x1b5/0x300
ksys_read+0x6f/0xf0
...
Cc: stable@vger.kernel.org
Fixes: 2628b352c3 ("tcm_loop: Show address of tpg in configfs")
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Allen Pais <apais@linux.microsoft.com>
Link: https://patch.msgid.link/1762370746-6304-1-git-send-email-hamzamahfooz@linux.microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull x86 fixes from Ingo Molnar:
- Fix AMD PCI root device caching regression that triggers
on certain firmware variants
- Fix the zen5_rdseed_microcode[] array to be NULL-terminated
- Add more AMD models to microcode signature checking
* tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Add more known models to entry sign checking
x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
x86/amd_node: Fix AMD root device caching
Pull scheduler fix from Ingo Molnar:
"Fix a group-throttling bug in the fair scheduler"
* tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining
Pull perf event fix from Ingo Molnar:
"Fix a system hang caused by cpu-clock events deadlock"
* tag 'perf-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix system hang caused by cpu-clock usage
Pull locking fix from Ingo Molnar:
"Fix (well, cut in half) a futex performance regression on PowerPC"
* tag 'locking-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Optimize per-cpu reference counting
Pull io_uring fix from Jens Axboe:
"Single fix in there, fixing an overflow in calculating the needed
segments for converting into a bvec array"
* tag 'io_uring-6.18-20251107' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: fix regbuf vector size truncation
Pull xfs fixes from Carlos Maiolino:
"This contain fixes for the RT and zoned allocator, and a few fixes for
atomic writes"
* tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: free xfs_busy_extents structure when no RT extents are queued
xfs: fix zone selection in xfs_select_open_zone_mru
xfs: fix a rtgroup leak when xfs_init_zone fails
xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
xfs: fix delalloc write failures in software-provided atomic writes
Zenghui reports that running a KVM guest with an assigned device and
lockdep enabled produces an unfriendly splat due to an inconsistent irq
context when taking the lpi_xa's spinlock.
This is no good as in rare cases the last reference to an LPI can get
dropped after injection of a cached LPI translation. In this case,
vgic_put_irq() will release the IRQ struct and take the lpi_xa's
spinlock to erase it from the xarray.
Reinstate the IRQ ordering and update the lockdep hint accordingly. Note
that there is no irqsave equivalent of might_lock(), so just explictly
grab and release the spinlock on lockdep kernels.
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Closes: https://lore.kernel.org/kvmarm/b4d7cb0f-f007-0b81-46d1-998b15cc14bc@huawei.com/
Fixes: 982f31bbb5 ("KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock")
Signed-off-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251107184847.1784820-2-oupton@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Now that the idreg's GIC field is in sync with the irqchip, limit
the runtime clearing of these fields to the pathological case where
we do not have an in-kernel GIC.
While we're at it, use the existing API instead of open-coded
accessors to access the ID regs.
Fixes: 5cb57a1aff ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest")
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251030122707.2033690-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
32bit ID registers aren't getting much love these days, and are
often missed in updates. One of these updates broke restoring
a GICv2 guest on a GICv3 machine.
Instead of performing a piecemeal fix, just bite the bullet
and make all 32bit ID regs fully writable. KVM itself never
relies on them for anything, and if the VMM wants to mess up
the guest, so be it.
Fixes: 5cb57a1aff ("KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: stable@vger.kernel.org
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20251030122707.2033690-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
This reverts commit ebe7556050.
Tested the latest kernel on my GB203 and this seems to break it somehow.
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: GSP-FMC boot failed (mbox: 0x0000000b)
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: init failed, -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: init failed with -5
Nov 09 04:16:14 bighp kernel: nouveau: drm:00000000:00000080: init failed with -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: drm: Device allocation failed: -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: probe with driver nouveau failed with error -5
Not sure why, I went over the patch and thought it should have worked, but there must be some
32-bit problem maybe in the FMC boot path.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It seems that most of the tests prepare the interfaces once before the test
run (setup_prepare()), rely on setup_wait() to wait for link and only then
run the test(s).
local_termination brings the physical interfaces down and up during test
run but never wait for them to come up. If the auto-negotiation takes
some seconds, first test packets are being lost, which leads to
false-negative test results.
Use setup_wait() in run_test() to make sure auto-negotiation has been
completed after all simple_if_init() calls on physical interfaces and test
packets will not be lost because of the race against link establishment.
Fixes: 90b9566aa5 ("selftests: forwarding: add a test for local_termination.sh")
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/20251106161213.459501-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The `len` member of the sk_buff is an unsigned int. This is cast to
`ssize_t` (a signed type) for the first sk_buff in the comparison,
but not the second sk_buff. On 32-bit systems, this can result in
an integer underflow for certain values because unsigned arithmetic
is being used.
This appears to be an oversight: if the intention was to use unsigned
arithmetic, then the first cast would have been omitted. The change
ensures both len values are cast to `ssize_t`.
The underflow causes an issue with ktls when multiple TLS PDUs are
included in a single TCP segment. The mainline kernel does not use
strparser for ktls anymore, but this is still useful for other
features that still use strparser, and for backporting.
Signed-off-by: Nate Karstens <nate.karstens@garmin.com>
Cc: stable@vger.kernel.org
Fixes: 43a0c6751a ("strparser: Stream parser for messages")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20251106222835.1871628-1-nate.karstens@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 44aa25c000 ("riscv: asm: use .insn for making custom
instructions"), builds using LLVM older that 19 or binutils older than
2.38 fail with:
arch/riscv/include/asm/vdso/processor.h: Assembler messages:
arch/riscv/include/asm/vdso/processor.h:27: Error: unrecognized opcode `0x100000f'
arch/riscv/include/asm/vdso/processor.h:27: Error: unrecognized opcode `0x100000f'
arch/riscv/include/asm/vdso/processor.h:27: Error: unrecognized opcode `0x100000f'
arch/riscv/include/asm/vdso/processor.h:27: Error: unrecognized opcode `0x100000f'
make[4]: *** [scripts/Makefile.build:287: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1
In file included from <built-in>:4:
In file included from lib/vdso/gettimeofday.c:6:
In file included from include/vdso/datapage.h:21:
In file included from include/vdso/processor.h:10:
arch/riscv/include/asm/vdso/processor.h:23:2: error: expected instruction format
23 | ALT_RISCV_PAUSE();
| ^
arch/riscv/include/asm/errata_list.h:47:3: note: expanded from macro 'ALT_RISCV_PAUSE'
47 | RISCV_PAUSE, /* Original RISC‑V pause insn */ \
| ^
arch/riscv/include/asm/insn-def.h:259:21: note: expanded from macro 'RISCV_PAUSE'
259 | #define RISCV_PAUSE ASM_INSN_I("0x100000f")
| ^
arch/riscv/include/asm/asm.h:16:26: note: expanded from macro 'ASM_INSN_I'
16 | #define ASM_INSN_I(__x) ".insn " __x
| ^
<inline asm>:5:7: note: instantiated into assembly here
5 | .insn 0x100000f
| ^
binutils gained support for '.insn <value>' in 2.38 [1] and LLVM gained
support in 19 [2]. Adjust the test for CONFIG_AS_HAS_INSN to ensure that
all versions of .insn are supported before being used.
Fixes: 44aa25c000 ("riscv: asm: use .insn for making custom instructions")
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a262b82fdbf4cda3b0648b1adc32245ca3f78b7a [1]
Link: 2a086dce69 [2]
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://patch.msgid.link/20251107-riscv-fix-new-insn-usage-v1-1-9a186c5928a0@kernel.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Pull drm fixes from Dave Airlie:
"Back from travel, thanks to Simona for handling things. regular fixes,
seems about the right size, but spread out a bit.
amdgpu has the usual range of fixes, xe has a few fixes, and nouveau
has a couple of fixes, one for blackwell modifiers on 8/16 bit
surfaces.
Otherwise a few small fixes for mediatek, sched, imagination and
pixpaper.
sched:
- Fix deadlock
amdgpu:
- Reset fixes
- Misc fixes
- Panel scaling fixes
- HDMI fix
- S0ix fixes
- Hibernation fix
- Secure display fix
- Suspend fix
- MST fix
amdkfd:
- Process cleanup fix
xe:
- Fix missing synchronization on unbind
- Fix device shutdown when doing FLR
- Fix user fence signaling order
i915:
- Avoid lock inversion when pinning to GGTT on CHV/BXT+VTD
- Fix conversion between clock ticks and nanoseconds
mediatek:
- Disable AFBC support on Mediatek DRM driver
- Add pm_runtime support for GCE power control
imagination:
- kconfig: Fix dependencies
nouveau:
- Set DMA mask earlier
- Advertize correct modifiers for GB20x
pixpaper:
- kconfig: Fix dependencies"
* tag 'drm-fixes-2025-11-08' of https://gitlab.freedesktop.org/drm/kernel: (26 commits)
drm/xe: Enforce correct user fence signaling order using
drm/xe: Do clean shutdown also when using flr
drm/xe: Move declarations under conditional branch
drm/xe/guc: Synchronize Dead CT worker with unbind
drm/amd/display: Enable mst when it's detected but yet to be initialized
drm/amdgpu: Fix wait after reset sequence in S3
drm/amd: Fix suspend failure with secure display TA
drm/amdgpu: fix gpu page fault after hibernation on PF passthrough
drm/tiny: pixpaper: add explicit dependency on MMU
drm/nouveau: Advertise correct modifiers on GB20x
drm: define NVIDIA DRM format modifiers for GB20x
drm/nouveau: set DMA mask before creating the flush page
drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb
drm/amd/display: Fix NULL deref in debugfs odm_combine_segments
drm/amdkfd: Don't clear PT after process killed
drm/amdgpu/smu: Handle S0ix for vangogh
drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()
drm/amd/display: Fix black screen with HDMI outputs
drm/amd/display: Don't stretch non-native images by default in eDP
drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init()
...
Pull parisc fix from Helge Deller:
- fix crash triggered by unaligned access in parisc unwinder
* tag 'parisc-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Avoid crash due to unaligned access in unwinder
Pull iommufd fixes from Jason Gunthorpe:
- Syzkaller found a case where maths overflows can cause divide by 0
- Typo in a compiler bug warning fix in the selftests broke the
selftests
- type1 compatability had a mismatch when unmapping an already unmapped
range, it should succeed
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Make vfio_compat's unmap succeed if the range is already empty
iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()
iommufd: Don't overflow during division for dirty tracking
Add proper error handling and resource cleanup to prevent memory leaks
in add_boot_memory_ranges(). The function now checks for NULL return
from kobject_create_and_add(), uses local buffer for range names to
avoid dynamic allocation, and implements a cleanup path that removes
previously created sysfs groups and kobjects on failure.
This prevents resource leaks when kobject creation or sysfs group
creation fails during boot memory range initialization.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251030023228.3956296-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Per Nathan, clang catches unused "static inline" functions in C files
since commit 6863f5643d ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
Linus said:
> So I entirely ignore W=1 issues, because I think so many of the extra
> warnings are bogus.
>
> But if this one in particular is causing more problems than most -
> some teams do seem to use W=1 as part of their test builds - it's fine
> to send me a patch that just moves bad warnings to W=2.
>
> And if anybody uses W=2 for their test builds, that's THEIR problem..
Here is the change to bump the warning from W=1 to W=2.
Fixes: 6863f5643d ("kbuild: allow Clang to find unused static inline functions for W=1 build")
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251106105000.2103276-1-andriy.shevchenko@linux.intel.com
[nathan: Adjust comment as well]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
per_cpu(cpc_desc_ptr, cpu) object is initialized for only the online
CPU via acpi_soft_cpu_online() --> __acpi_processor_start() -->
acpi_cppc_processor_probe().
However the function cppc_perf_ctrs_in_pcc() checks if the CPPC
perf-ctrs are in a PCC region for all the present CPUs, which breaks
when the kernel is booted with "nosmt=force".
Hence, limit the check only to the online CPUs.
Fixes: ae2df912d1 ("ACPI: CPPC: Disable FIE if registers in PCC regions")
Reviewed-by: "Mario Limonciello (AMD) (kernel.org)" <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://patch.msgid.link/20251107074145.2340-5-gautham.shenoy@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
per_cpu(cpc_desc_ptr, cpu) object is initialized for only the online
CPUs via acpi_soft_cpu_online() --> __acpi_processor_start() -->
acpi_cppc_processor_probe().
However the function cppc_allow_fast_switch() checks for the validity
of the _CPC object for all the present CPUs. This breaks when the
kernel is booted with "nosmt=force".
Check fast_switch capability only on online CPUs
Fixes: 15eece6c5b ("ACPI: CPPC: Fix NULL pointer dereference when nosmp is used")
Reviewed-by: "Mario Limonciello (AMD) (kernel.org)" <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://patch.msgid.link/20251107074145.2340-4-gautham.shenoy@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The HPA to DPA translation for poison injection assumes that the
base address starts from where the CXL region begins. When the
extended linear cache is active, the offset can be within the DRAM
region. Adjust the offset so that it correctly reflects the offset
within the CXL region.
[ dj: Add fixes tag from Alison ]
Fixes: c3dd67681c ("cxl/region: Add inject and clear poison by region offset")
Link: https://patch.msgid.link/20251031173224.3537030-5-dave.jiang@intel.com
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
SMB2_change_notify called smb2_validate_iov() but ignored the return
code, then kmemdup()ed using server provided OutputBufferOffset/Length.
Check the return of smb2_validate_iov() and bail out on error.
Discovered with help from the ZeroPath security tooling.
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: stable@vger.kernel.org
Fixes: e3e9463414 ("smb3: improve SMB3 change notification support")
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull gpio fixes from Bartosz Golaszewski:
- use the firmware node of the GPIO chip, not its label for software
node lookup
- fix invalid pointer access in GPIO debugfs
- drop unused functions from gpio-tb10x
- fix a regression in gpio-aggregator: restore the set_config()
callback in the driver
- correct schema $id path in ti,twl4030 DT bindings
* tag 'gpio-fixes-for-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: tb10x: Drop unused tb10x_set_bits() function
gpio: aggregator: restore the set_config operation
gpiolib: fix invalid pointer access in debugfs
gpio: swnode: don't use the swnode's name as the key for GPIO lookup
dt-bindings: gpio: ti,twl4030: Correct the schema $id path
Pull tracing fixes from Steven Rostedt:
- Check for reader catching up in ring_buffer_map_get_reader()
If the reader catches up to the writer in the memory mapped ring
buffer then calling rb_get_reader_page() will return NULL as there's
no pages left. But this isn't checked for before calling
rb_get_reader_page() and the return of NULL causes a warning.
If it is detected that the reader caught up to the writer, then
simply exit the routine
- Fix memory leak in histogram create_field_var()
The couple of the error paths in create_field_var() did not properly
clean up what was allocated. Make sure everything is freed properly
on error
- Fix help message of tools latency_collector
The help message incorrectly stated that "-t" was the same as
"--threads" whereas "--threads" is actually represented by "-e"
* tag 'trace-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/tools: Fix incorrcet short option in usage text for --threads
tracing: Fix memory leaks in create_field_var()
ring-buffer: Do not warn in ring_buffer_map_get_reader() when reader catches up
Pull slab fix from Vlastimil Babka:
- Fix for potential infinite loop in kmalloc_nolock() when debugging
is enabled for the cache (Vlastimil Babka)
* tag 'slab-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: prevent infinite loop in kmalloc_nolock() with debugging
Pull io_uring fixes from Jens Axboe:
- Remove the sync refill API that was added in this release, in
anticipation of doing it in a better way for the next release
- Fix type extension for calculating size off nr_pages, like we do
in other spots
* tag 'io_uring-6.18-20251106' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: fix types for region size calulation
io_uring/zcrx: remove sync refill uapi
Pull SCSI fixes from James Bottomley:
"All fixes in the UFS driver.
The big contributor to the diffstats is the Intel controller S0ix/S3
fix which has to special case the suspend/resume patch for intel
controllers in ufshcd-pci.c"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix invalid probe error return value
scsi: ufs: ufs-pci: Set UFSHCD_QUIRK_PERFORM_LINK_STARTUP_ONCE for Intel ADL
scsi: ufs: core: Add a quirk to suppress link_startup_again
scsi: ufs: ufs-pci: Fix S0ix/S3 for Intel controllers
scsi: ufs: core: Revert "Make HID attributes visible"
scsi: ufs: core: Reduce link startup failure logging
scsi: ufs: core: Fix a race condition related to the "hid" attribute group
scsi: ufs: ufs-qcom: Fix UFS OCP issue during UFS power down (PC=3)
Pull smb server fixes from Steve French:
- More safely detect RDMA capable devices correctly
* tag 'v6.18-rc4-smb-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: detect RDMA capable netdevs include IPoIB
ksmbd: detect RDMA capable lower devices when bridge and vlan netdev is used
Disallow a module to load if SCS dynamic patching fails for its code. For
module loading, instead of running a dry-run to check for patching errors,
try to run patching in the first run and propagate any errors so module
loading will fail.
Signed-off-by: Adrian Barnaś <abarnas@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Following the pattern established with other Spectre mitigations,
do not print a message when the CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
Kconfig option is disabled.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: shechenglong <shechenglong@xfusion.com>
Signed-off-by: Will Deacon <will@kernel.org>
Tidy up the implementation of force_pte_mapping() to make it easier to
read and introduce the split_leaf_mapping_possible() helper to reduce
code duplication in split_kernel_leaf_mapping() and
arch_kfence_init_pool().
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
Enter lazy_mmu mode while splitting a range of memory to pte mappings.
This causes barriers, which would otherwise be emitted after every pte
(and pmd/pud) write, to be deferred until exiting lazy_mmu mode.
For large systems, this is expected to significantly speed up fallback
to pte-mapping the linear map for the case where the boot CPU has
BBML2_NOABORT, but secondary CPUs do not. I haven't directly measured
it, but this is equivalent to commit 1fcb7cea8a ("arm64: mm: Batch dsb
and isb when populating pgtables").
Note that for the path from arch_kfence_init_pool(), we may sleep while
allocating memory inside the lazy_mmu mode. Sleeping is not allowed by
generic code inside lazy_mmu, but we know that the arm64 implementation
is sleep-safe. So this is ok and follows the same pattern already used
by split_kernel_leaf_mapping().
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
It has been reported that split_kernel_leaf_mapping() is trying to sleep
in non-sleepable context. It does this when acquiring the
pgtable_split_lock mutex, when either CONFIG_DEBUG_PAGEALLOC or
CONFIG_KFENCE are enabled, which change linear map permissions within
softirq context during memory allocation and/or freeing. All other paths
into this function are called from sleepable context and so are safe.
But it turns out that the memory for which these 2 features may attempt
to modify the permissions is always mapped by pte, so there is no need
to attempt to split the mapping. So let's exit early in these cases and
avoid attempting to take the mutex.
There is one wrinkle to this approach; late-initialized kfence allocates
it's pool from the buddy which may be block mapped. So we must hook that
allocation and convert it to pte-mappings up front. Previously this was
done as a side-effect of kfence protecting all the individual pages in
its pool at init-time, but this no longer works due to the added early
exit path in split_kernel_leaf_mapping().
So instead, do this via the existing arch_kfence_init_pool() arch hook,
and reuse the existing linear_map_split_to_ptes() infrastructure.
Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@roeck-us.net/
Fixes: a166563e7e ("arm64: mm: support large block mapping when rodata=full")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <groeck@google.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
Since commit a166563e7e ("arm64: mm: support large block mapping when
rodata=full"), __change_memory_common has more chance to fail due to
memory allocation failure when splitting page table. So check the return
value of set_memory_rox(), then bail out if it fails otherwise we may have
RW memory mapping for kprobes insn page.
Fixes: 195a1b7d83 ("arm64: kprobes: call set_memory_rox() for kprobe page")
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
Commit f5a4af3c75 ("ACPI: Add acpi=nospcr to disable ACPI SPCR as
default console on ARM64") introduced a command line parameter to
prevent using SPCR provided console as default. It also introduced a
message to log this choice.
Drop the message as it is not particularly useful and can be incorrect
in situations where no SPCR is provided by the firmware.
Link: https://lore.kernel.org/all/aQN0YWUYaPYWpgJM@willie-the-truck/
Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
This reverts commit bad3fa2fb9.
Commit bad3fa2fb9 ("ACPI: Suppress misleading SPCR console message
when SPCR table is absent") mistakenly assumes acpi_parse_spcr()
returning 0 to indicate a failure to parse SPCR. While addressing the
resultant incorrect logging it was deemed that dropping the message is
a better approach as it is not particularly useful.
Roll back the commit introducing the bug as a step towards dropping
the log message.
Link: https://lore.kernel.org/all/aQN0YWUYaPYWpgJM@willie-the-truck/
Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The non-return per-CPU this_cpu_*() atomic operations are implemented as
STADD/STCLR/STSET when FEAT_LSE is available. On many microarchitecture
implementations, these instructions tend to be executed "far" in the
interconnect or memory subsystem (unless the data is already in the L1
cache). This is in general more efficient when there is contention as it
avoids bouncing cache lines between CPUs. The load atomics (e.g. LDADD
without XZR as destination), OTOH, tend to be executed "near" with the
data loaded into the L1 cache.
STADD executed back to back as in srcu_read_{lock,unlock}*() incur an
additional overhead due to the default posting behaviour on several CPU
implementations. Since the per-CPU atomics are unlikely to be used
concurrently on the same memory location, encourage the hardware to to
execute them "near" by issuing load atomics - LDADD/LDCLR/LDSET - with
the destination register unused (but not XZR).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
[will: Add comment and link to the discussion thread]
Signed-off-by: Will Deacon <will@kernel.org>
The help message incorrectly listed '-t' as the short option for
--threads, but the actual getopt_long configuration uses '-e'.
This mismatch can confuse users and lead to incorrect command-line
usage. This patch updates the usage string to correctly show:
"-e, --threads NRTHR"
to match the implementation.
Note: checkpatch.pl reports a false-positive spelling warning on
'Run', which is intentional.
Link: https://patch.msgid.link/20251106031040.1869-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Prevent application hangs caused by out-of-order fence signaling when
user fences are attached. Use drm_syncobj (via dma-fence-chain) to
guarantee that each user fence signals in order, regardless of the
signaling order of the attached fences. Ensure user fence writebacks to
user space occur in the correct sequence.
v7:
- Skip drm_syncbj create of error (CI)
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251031234050.3043507-2-matthew.brost@intel.com
(cherry picked from commit adda4e855a)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
SPI devices using a (relative) slow frequency need a larger time.
For instance, microblaze running at 83.25MHz and performing a
3 bytes transaction using a 10MHz/16 = 625kHz needed this stall
value increased to at least 20. The SPI device is quite slow, but
also is the microblaze, so set this value to 32 to give it even
more margin.
Signed-off-by: Alvaro Gamez Machado <alvaro.gamez@hazent.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://patch.msgid.link/20251106134545.31942-1-alvaro.gamez@hazent.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The function create_field_var() allocates memory for 'val' through
create_hist_field() inside parse_atom(), and for 'var' through
create_var(), which in turn allocates var->type and var->var.name
internally. Simply calling kfree() to release these structures will
result in memory leaks.
Use destroy_hist_field() to properly free 'val', and explicitly release
the memory of var->type and var->var.name before freeing 'var' itself.
Link: https://patch.msgid.link/20251106120132.3639920-1-zilin@seu.edu.cn
Fixes: 02205a6752 ("tracing: Add support for 'field variables'")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull probe fixes from Masami Hiramatsu:
- tprobe-events: Fix to register tracepoint correctly
tprobe-events missed to set tracepoint data structure before
registering callback when enabling it. This sets it correctly.
- tprobe-events: Fix to put tracepoint_user when disable the event
tprobe-events missed to unregister tracepoint callback when the event
is disabled. This ensures to unregister it.
* tag 'probes-fixes-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: tprobe-events: Fix to put tracepoint_user when disable the tprobe
tracing: tprobe-events: Fix to register tracepoint correctly
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Add James Clark as a perf tools reviewer
- Handle '1' type symbols in /proc/kallsyms, related to anonymous
Rust closures in the DRM panic QR encoder, caught by 'perf test'
- Sync kernel header copies: MSRs, uprobe syscall,
DRM_IOCTL_GEM_CHANGE_HANDLE, KVM exit reasons, etc
* tag 'perf-tools-fixes-for-v6.18-1-2025-11-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf symbols: Handle '1' symbols in /proc/kallsyms
tools headers asm: Sync fls headers header with the kernel sources
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources to handle new exit reasons
tools headers svm: Sync svm headers with the kernel sources
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
MAINTAINERS: Add James Clark as a perf tools reviewer
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Update tools's copy of drm.h to pick DRM_IOCTL_GEM_CHANGE_HANDLE
tools headers x86 cpufeatures: Sync with the kernel sources
tools headers x86: Sync table due to introducion of uprobe syscall
tools headers: Sync uapi/linux/fcntl.h with the kernel sources
tools headers: Sync uapi/linux/prctl.h with the kernel source
tools headers uapi: Update fs.h with the kernel sources
tools arch x86: Sync msr-index.h to pick AMD64_{PERF_CNTR_GLOBAL_STATUS_SET,SAVIC_CONTROL}, IA32_L3_QOS_{ABMC,EXT}_CFG
Pull RISC-V fixes from Paul Walmsley:
- A fix to disable KASAN checks while walking a non-current task's
stackframe (following x86)
- A fix for a kvrealloc()-related memory leak in
module_frob_arch_sections()
- Two replacements of strcpy() with strscpy()
- A change to use the RISC-V .insn assembler directive when possible to
assemble instructions from hex opcodes
- Some low-impact fixes in the ptdump code and kprobes test code
* tag 'riscv-for-linus-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
cpuidle: riscv-sbi: Replace deprecated strcpy in sbi_cpuidle_init_cpu
riscv: KGDB: Replace deprecated strcpy in kgdb_arch_handle_qxfer_pkt
riscv: asm: use .insn for making custom instructions
riscv: tests: Make RISCV_KPROBES_KUNIT tristate
riscv: tests: Rename kprobes_test_riscv to kprobes_riscv
riscv: Fix memory leak in module_frob_arch_sections()
riscv: ptdump: use seq_puts() in pt_dump_seq_puts() macro
riscv: stacktrace: Disable KASAN checks for non-current tasks
Pull ACPI fixes from Rafael Wysocki:
"These fix a coding mistake in the ACPI Smart Battery Subsystem (SBS)
driver and two documentation issues:
- Fix computation of the battery->present value in acpi_battery_read()
to work when battery->id is not zero (Dan Carpenter)
- Fix comment typo in the ACPI CPPC library (Chu Guangqing)
- Fix I2C device references in two ASL examples in the firmware guide
that were broken by a previous update (Jonas Gorski)"
* tag 'acpi-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: SBS: Fix present test in acpi_battery_read()
ACPI: CPPC: Fix typo in a comment
Documentation: ACPI: i2c-muxes: fix I2C device references
Since __tracepoint_user_init() calls tracepoint_user_register() without
initializing tuser->tpoint with given tracpoint, it does not register
tracepoint stub function as callback correctly, and tprobe does not work.
Initializing tuser->tpoint correctly before tracepoint_user_register()
so that it sets up tracepoint callback.
I confirmed below example works fine again.
echo "t sched_switch preempt prev_pid=prev->pid next_pid=next->pid" > /sys/kernel/tracing/dynamic_events
echo 1 > /sys/kernel/tracing/events/tracepoints/sched_switch/enable
cat /sys/kernel/tracing/trace_pipe
Link: https://lore.kernel.org/all/176244793514.155515.6466348656998627773.stgit@devnote2/
Fixes: 2867495dea ("tracing: tprobe-events: Register tracepoint when enable tprobe event")
Reported-by: Beau Belgrave <beaub@linux.microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Beau Belgrave <beaub@linux.microsoft.com>
Reviewed-by: Beau Belgrave <beaub@linux.microsoft.com>
Merge two documentation fixes for 6.18-rc5, a commet typo fix in the
ACPI CPPC library (Chu Guangqing) and fixes for two ASL examples in the
firmware guide (Jonas Gorski).
* acpi-cppc:
ACPI: CPPC: Fix typo in a comment
* acpi-docs:
Documentation: ACPI: i2c-muxes: fix I2C device references
Pull crypto library fixes from Eric Biggers:
"Two Curve25519 related fixes:
- Re-enable KASAN support on curve25519-hacl64.c with gcc.
- Disable the arm optimized Curve25519 code on CPU_BIG_ENDIAN
kernels. It has always been broken in that configuration"
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: arm/curve25519: Disable on CPU_BIG_ENDIAN
lib/crypto: curve25519-hacl64: Fix older clang KASAN workaround for GCC
Pull fscrypt fix from Eric Biggers:
"Fix an UBSAN warning that started occurring when the block layer
started supporting logical_block_size > PAGE_SIZE"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: fix left shift underflow when inode->i_blkbits > PAGE_SHIFT
Pull hardening fixes from Kees Cook:
"This is a work-around for a (now fixed) corner case in the arm32 build
with Clang KCFI enabled.
- Introduce __nocfi_generic for arm32 Clang (Nathan Chancellor)"
* tag 'hardening-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
libeth: xdp: Disable generic kCFI pass for libeth_xdp_tx_xmit_bulk()
ARM: Select ARCH_USES_CFI_GENERIC_LLVM_PASS
compiler_types: Introduce __nocfi_generic
While connecting, the MAC address can already no longer be
changed. The change is already rejected if netif_carrier_ok(),
but of course that's not true yet while connecting. Check for
auth_data or assoc_data, so the MAC address cannot be changed.
Also more comprehensively check that there are no stations on
the interface being changed - if any peer station is added it
will know about our address already, so we cannot change it.
Cc: stable@vger.kernel.org
Fixes: 3c06e91b40 ("wifi: mac80211: Support POWERED_ADDR_CHANGE feature")
Link: https://patch.msgid.link/20251105154119.f9f6c1df81bb.I9bb3760ede650fb96588be0d09a5a7bdec21b217@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In the commit referenced by the Fixes tag, clk_hw_get_clk()
was added in va_macro_probe() to get the fsgen clock,
but forgot to add the corresponding clk_put() in va_macro_remove().
This leads to a clock reference leak when the driver is unloaded.
Switch to devm_clk_hw_get_clk() to automatically manage the
clock resource.
Fixes: 30097967e0 ("ASoC: codecs: va-macro: use fsgen as clock")
Suggested-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20251106143114.729-1-vulab@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
[Why]
drm_dp_mst_topology_queue_probe() is used under the assumption that
mst is already initialized. If we connect system with SST first
then switch to the mst branch during suspend, we will fail probing
topology by calling the wrong API since the mst manager is yet to
be initialized.
[How]
At dm_resume(), once it's detected as mst branc connected, check if
the mst is initialized already. If not, call
dm_helpers_dp_mst_start_top_mgr() instead to initialize mst
V2: Adjust the commit msg a bit
Fixes: bc068194f5 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT")
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62320fb8d9)
Cc: stable@vger.kernel.org
commit c760bcda83 ("drm/amd: Check whether secure display TA loaded
successfully") attempted to fix extra messages, but failed to port the
cleanup that was in commit 5c6d52ff4b ("drm/amd: Don't try to enable
secure display TA multiple times") to prevent multiple tries.
Add that to the failure handling path even on a quick failure.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679
Fixes: c760bcda83 ("drm/amd: Check whether secure display TA loaded successfully")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4104c0a454)
On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.
Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.
The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5d1b32cfe4)
Pull networking fixes from Jakub Kicinski:
Including fixes from bluetooth and wireless.
Current release - new code bugs:
- ptp: expose raw cycles only for clocks with free-running counter
- bonding: fix null-deref in actor_port_prio setting
- mdio: ERR_PTR-check regmap pointer returned by
device_node_to_regmap()
- eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG
Previous releases - regressions:
- virtio_net: fix perf regression due to bad alignment of
virtio_net_hdr_v1_hash
- Revert "wifi: ath10k: avoid unnecessary wait for service ready
message" caused regressions for QCA988x and QCA9984
- Revert "wifi: ath12k: Fix missing station power save configuration"
caused regressions for WCN7850
- eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory
corruptions after kexec
Previous releases - always broken:
- virtio-net: fix received packet length check for big packets
- sctp: fix races in socket diag handling
- wifi: add an hrtimer-based delayed work item to avoid low
granularity of timers set relatively far in the future, and use it
where it matters (e.g. when performing AP-scheduled channel switch)
- eth: mlx5e:
- correctly propagate error in case of module EEPROM read failure
- fix HW-GRO on systems with PAGE_SIZE == 64kB
- dsa: b53: fixes for tagging, link configuration / RMII, FDB,
multicast
- phy: lan8842: implement latest errata"
* tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
selftests/vsock: avoid false-positives when checking dmesg
net: bridge: fix MST static key usage
net: bridge: fix use-after-free due to MST port state bypass
lan966x: Fix sleeping in atomic context
bonding: fix NULL pointer dereference in actor_port_prio setting
net: dsa: microchip: Fix reserved multicast address table programming
net: wan: framer: pef2256: Switch to devm_mfd_add_devices()
net: libwx: fix device bus LAN ID
net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages
net/mlx5e: SHAMPO, Fix skb size check for 64K pages
net/mlx5e: SHAMPO, Fix header mapping for 64K pages
net: ti: icssg-prueth: Fix fdb hash size configuration
net/mlx5e: Fix return value in case of module EEPROM read error
net: gro_cells: Reduce lock scope in gro_cell_poll
libie: depend on DEBUG_FS when building LIBIE_FWLOG
wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
netpoll: Fix deadlock in memory allocation under spinlock
net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL on error
virtio-net: fix received length check in big packets
bnxt_en: Fix warning in bnxt_dl_reload_down()
...
After commit d50f210913 ("kbuild: align modinfo section for Secureboot
Authenticode EDK2 compat"), running modules_install with certain
versions of kmod (such as 29.1 in Ubuntu Jammy) in certain
configurations may fail with:
depmod: ERROR: kmod_builtin_iter_next: unexpected string without modname prefix
The additional padding bytes to ensure .modinfo is aligned within
vmlinux.unstripped are unexpected by kmod, as this section has always
just been null-terminated strings.
Strip the trailing padding bytes from modules.builtin.modinfo after it
has been extracted from vmlinux.unstripped to restore the format that
kmod expects while keeping .modinfo aligned within vmlinux.unstripped to
avoid regressing the Authenticode calculation fix for EDK2.
Cc: stable@vger.kernel.org
Fixes: d50f210913 ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat")
Reported-by: Omar Sandoval <osandov@fb.com>
Reported-by: Samir M <samir@linux.ibm.com>
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/7fef7507-ad64-4e51-9bb8-c9fb6532e51e@linux.ibm.com/
Tested-by: Omar Sandoval <osandov@fb.com>
Tested-by: Samir M <samir@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20251105-kbuild-fix-builtin-modinfo-for-kmod-v1-1-b419d8ad4606@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Sometimes VMs will have some intermittent dmesg warnings that are
unrelated to vsock. Change the dmesg parsing to filter on strings
containing 'vsock' to avoid false positive failures that are unrelated
to vsock. The downside is that it is possible for some vsock related
warnings to not contain the substring 'vsock', so those will be missed.
Fixes: a4a65c6fe0 ("selftests/vsock: add initial vmtest.sh for vsock")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251105-vsock-vmtest-dmesg-fix-v2-1-1a042a14892c@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov says:
====================
net: bridge: fix two MST bugs
Patch 01 fixes a race condition that exists between expired fdb deletion
and port deletion when MST is enabled. Learning can happen after the
port's state has been changed to disabled which could lead to that
port's memory being used after it's been freed. The issue was reported
by syzbot, more information in patch 01. Patch 02 fixes an issue with
MST's static key which Ido spotted, we can have multiple bridges with MST
and a single bridge can erroneously disable it for all.
====================
Link: https://patch.msgid.link/20251105111919.1499702-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported[1] a use-after-free when deleting an expired fdb. It is
due to a race condition between learning still happening and a port being
deleted, after all its fdbs have been flushed. The port's state has been
toggled to disabled so no learning should happen at that time, but if we
have MST enabled, it will bypass the port's state, that together with VLAN
filtering disabled can lead to fdb learning at a time when it shouldn't
happen while the port is being deleted. VLAN filtering must be disabled
because we flush the port VLANs when it's being deleted which will stop
learning. This fix adds a check for the port's vlan group which is
initialized to NULL when the port is getting deleted, that avoids the port
state bypass. When MST is enabled there would be a minimal new overhead
in the fast-path because the port's vlan group pointer is cache-hot.
[1] https://syzkaller.appspot.com/bug?extid=dd280197f0f7ab3917be
Fixes: ec7328b591 ("net: bridge: mst: Multiple Spanning Tree (MST) mode")
Reported-by: syzbot+dd280197f0f7ab3917be@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69088ffa.050a0220.29fc44.003d.GAE@google.com/
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251105111919.1499702-2-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The following warning was seen when we try to connect using ssh to the device.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:575
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 104, name: dropbear
preempt_count: 1, expected: 0
INFO: lockdep is turned off.
CPU: 0 UID: 0 PID: 104 Comm: dropbear Tainted: G W 6.18.0-rc2-00399-g6f1ab1b109b9-dirty #530 NONE
Tainted: [W]=WARN
Hardware name: Generic DT based system
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x7c/0xac
dump_stack_lvl from __might_resched+0x16c/0x2b0
__might_resched from __mutex_lock+0x64/0xd34
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from lan966x_stats_get+0x5c/0x558
lan966x_stats_get from dev_get_stats+0x40/0x43c
dev_get_stats from dev_seq_printf_stats+0x3c/0x184
dev_seq_printf_stats from dev_seq_show+0x10/0x30
dev_seq_show from seq_read_iter+0x350/0x4ec
seq_read_iter from seq_read+0xfc/0x194
seq_read from proc_reg_read+0xac/0x100
proc_reg_read from vfs_read+0xb0/0x2b0
vfs_read from ksys_read+0x6c/0xec
ksys_read from ret_fast_syscall+0x0/0x1c
Exception stack(0xf0b11fa8 to 0xf0b11ff0)
1fa0: 00000001 00001000 00000008 be9048d8 00001000 00000001
1fc0: 00000001 00001000 00000008 00000003 be905920 0000001e 00000000 00000001
1fe0: 0005404c be9048c0 00018684 b6ec2cd8
It seems that we are using a mutex in a atomic context which is wrong.
Change the mutex with a spinlock.
Fixes: 12c2d0a5b8 ("net: lan966x: add ethtool configuration and statistics")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251105074955.1766792-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When reporting tx completion using ieee80211_tx_status_xxx() family of
functions, the status part of the struct ieee80211_tx_info nested in the
skb is used to report things like transmit rates & retry count to mac80211
On the TX data path, this is correctly memset to 0 before calling
ieee80211_tx_status_ext(), but on the tx mgmt path this was not done.
This leads to mac80211 treating garbage values as valid transmit counters
(like tx retries for example) and accounting them as real statistics that
makes their way to userland via station dump.
The same issue was resolved in ath12k by commit 9903c0986f ("wifi:
ath12k: Add memset and update default rate value in wmi tx completion")
Tested-on: QCN9074 PCI WLAN.HK.2.9.0.1-01977-QCAHKSWPL_SILICONZ-1
Fixes: d5c65159f2 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Signed-off-by: Nicolas Escande <nico.escande@gmail.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20251104083957.717825-1-nico.escande@gmail.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Liang reported an issue where setting a slave’s actor_port_prio to
predefined values such as 0, 255, or 65535 would cause a system crash.
The problem occurs because in bond_opt_parse(), when the provided value
matches a predefined table entry, the function returns that table entry,
which does not contain slave information. Later, in
bond_option_actor_port_prio_set(), calling bond_slave_get_rtnl() leads
to a NULL pointer dereference.
Since actor_port_prio is defined as a u16 and initialized to the default
value of 255 in ad_initialize_port(), there is no need for the
bond_actor_port_prio_tbl. Using the BOND_OPTFLAG_RAWVAL flag is sufficient.
Fixes: 6b6dc81ee7 ("bonding: add support for per-port LACP actor priority")
Reported-by: Liang Li <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251105072620.164841-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KSZ9477/KSZ9897 and LAN937X families of switches use a reserved multicast
address table for some specific forwarding with some multicast addresses,
like the one used in STP. The hardware assumes the host port is the last
port in KSZ9897 family and port 5 in LAN937X family. Most of the time
this assumption is correct but not in other cases like KSZ9477.
Originally the function just setups the first entry, but the others still
need update, especially for one common multicast address that is used by
PTP operation.
LAN937x also uses different register bits when accessing the reserved
table.
Fixes: 457c182af5 ("net: dsa: microchip: generic access to ksz9477 static and reserved table")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Tested-by: Łukasz Majewski <lukma@nabladev.com>
Link: https://patch.msgid.link/20251105033741.6455-1-Tristram.Ha@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since sdma hardware configure postpone to transfer phase, have to disable
dma request before dma transfer setup because there is a hardware
limitation on sdma event enable(ENBLn) as below:
"It is thus essential for the Arm platform to program them before any DMA
request is triggered to the SDMA, otherwise an unpredictable combination
of channels may be started."
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Signed-off-by: Robin Gong <yibin.gong@nxp.com>
Link: https://patch.msgid.link/20251024055320.408482-1-carlos.song@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The bus_find_device_by_name() function returns a device pointer with an
incremented reference count, but the original code was missing put_device()
calls in some return paths, leading to reference count leaks.
Fix this by ensuring put_device() is called before function exit after
bus_find_device_by_name() succeeds
This follows the same pattern used elsewhere in the kernel where
bus_find_device_by_name() is properly paired with put_device().
Found via static analysis and code review.
Fixes: 4f8ef33dd4 ("ASoC: soc_sdw_utils: skip the endpoint that doesn't present")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://patch.msgid.link/20251029071804.8425-1-linmq006@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The probe function enables regulators at the beginning
but fails to disable them in its error handling path.
If any operation after enabling the regulators fails,
the probe will exit with an error, leaving the regulators
permanently enabled, which could lead to a resource leak.
Add a proper error handling path to call regulator_bulk_disable()
before returning an error.
Fixes: 9a397f4736 ("ASoC: cs4271: add regulator consumer support")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20251105062246.1955-1-vulab@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
The DRM_GEM_SHMEM_HELPER helper requires MMU enabled because it uses
vmf_insert_pfn() in its mmap implementation. On NOMMU configurations
(e.g. some RISC-V randconfig builds), this symbol is unavailable and
selecting DRM_GEM_SHMEM_HELPER causes a modpost undefined reference:
ERROR: modpost: "vmf_insert_pfn" [drivers/gpu/drm/drm_shmem_helper.ko] undefined!
Normally, Kconfig prevents this helper from being selected when
CONFIG_MMU=n. However, in some randconfig builds (such as those used by
0day CI), select statements can override unmet dependencies, triggering
the issue.
Add an explicit dependency on MMU to DRM_PIXPAPER to prevent this.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510280213.0rlYA4T3-lkp@intel.com/
Fixes: 0c4932f6dd ("drm/tiny: pixpaper: Fix missing dependency on DRM_GEM_SHMEM_HELPER")
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: LiangCheng Wang <zaq14760@gmail.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251028-bar-v1-1-edfbd13fafff@gmail.com
Fix all kernel-doc warnings in <uapi/linux/isst_if.h>:
- don't use "[]" in the variable name in kernel-doc
- add a few missing entries
- change "power_domain" to "power_domain_id" in kernel-doc to match
the struct member name
- add a leading '@' on a few existing kernel-doc lines
- use '_' instead of '-' in struct member names
Examples (but not all 27 warnings):
Warning: include/uapi/linux/isst_if.h:63 struct member 'cpu_map'
not described in 'isst_if_cpu_maps'
Warning: ../include/uapi/linux/isst_if.h:95 struct member 'req_count'
not described in 'isst_if_io_regs'
Warning: include/uapi/linux/isst_if.h:132 struct member 'mbox_cmd'
not described in 'isst_if_mbox_cmds'
Warning: ../include/uapi/linux/isst_if.h:183 struct member 'supported'
not described in 'isst_core_power'
Warning: ../include/uapi/linux/isst_if.h:206 struct member
'power_domain_id' not described in 'isst_clos_param'
Warning: ../include/uapi/linux/isst_if.h:239 struct member 'assoc_info'
not described in 'isst_if_clos_assoc_cmds'
Warning: ../include/uapi/linux/isst_if.h:286 struct member 'sst_tf_support'
not described in 'isst_perf_level_info'
Warning: ../include/uapi/linux/isst_if.h:375 struct member 'trl_freq_mhz'
not described in 'isst_perf_level_data_info'
Warning: ../include/uapi/linux/isst_if.h:475 struct member 'max_buckets'
not described in 'isst_turbo_freq_info'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251023194615.180824-1-rdunlap@infradead.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
The newer HP Omen laptops, such as Omen 16-wf1xxx, use the same
WMI-based thermal profile interface as Victus 16-r1000 and 16-s1000
models.
Add the DMI board name "8C78" to the victus_s_thermal_profile_boards
list to enable proper fan and thermal mode control.
Tested on: HP Omen 16-wf1xxx (board 8C78)
Result:
* Fan RPMs are readable
* echo 0 | sudo tee /sys/devices/platform/hp-wmi/hwmon/*/pwm1_enable
allows the fans to run on max RPM.
Signed-off-by: Krishna Chomal <krishna.chomal108@gmail.com>
Link: https://patch.msgid.link/20251018111001.56625-1-krishna.chomal108@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Shrikanth noted that the per-cpu reference counter was still some 10%
slower than the old immutable option (which removes the reference
counting entirely).
Further optimize the per-cpu reference counter by:
- switching from RCU to preempt;
- using __this_cpu_*() since we now have preempt disabled;
- switching from smp_load_acquire() to READ_ONCE().
This is all safe because disabling preemption inhibits the RCU grace
period exactly like rcu_read_lock().
Having preemption disabled allows using __this_cpu_*() provided the
only access to the variable is in task context -- which is the case
here.
Furthermore, since we know changing fph->state to FR_ATOMIC demands a
full RCU grace period we can rely on the implied smp_mb() from that to
replace the acquire barrier().
This is very similar to the percpu_down_read_internal() fast-path.
The reason this is significant for PowerPC is that it uses the generic
this_cpu_*() implementation which relies on local_irq_disable() (the
x86 implementation relies on it being a single memop instruction to be
IRQ-safe). Switching to preempt_disable() and __this_cpu*() avoids
this IRQ state swizzling. Also, PowerPC needs LWSYNC for the ACQUIRE
barrier, not having to use explicit barriers safes a bunch.
Combined this reduces the performance gap by half, down to some 5%.
Fixes: 760e6f7bef ("futex: Remove support for IMMUTABLE")
Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20251106092929.GR4067720@noisy.programming.kicks-ass.net
When a cfs_rq is to be throttled, its limbo list should be empty and
that's why there is a warn in tg_throttle_down() for non empty
cfs_rq->throttled_limbo_list.
When running a test with the following hierarchy:
root
/ \
A* ...
/ | \ ...
B
/ \
C*
where both A and C have quota settings, that warn on non empty limbo list
is triggered for a cfs_rq of C, let's call it cfs_rq_c(and ignore the cpu
part of the cfs_rq for the sake of simpler representation).
Debug showed it happened like this:
Task group C is created and quota is set, so in tg_set_cfs_bandwidth(),
cfs_rq_c is initialized with runtime_enabled set, runtime_remaining
equals to 0 and *unthrottled*. Before any tasks are enqueued to cfs_rq_c,
*multiple* throttled tasks can migrate to cfs_rq_c (e.g., due to task
group changes). When enqueue_task_fair(cfs_rq_c, throttled_task) is
called and cfs_rq_c is in a throttled hierarchy (e.g., A is throttled),
these throttled tasks are directly placed into cfs_rq_c's limbo list by
enqueue_throttled_task().
Later, when A is unthrottled, tg_unthrottle_up(cfs_rq_c) enqueues these
tasks. The first enqueue triggers check_enqueue_throttle(), and with zero
runtime_remaining, cfs_rq_c can be throttled in throttle_cfs_rq() if it
can't get more runtime and enters tg_throttle_down(), where the warning
is hit due to remaining tasks in the limbo list.
I think it's a chaos to trigger throttle on unthrottle path, the status
of a being unthrottled cfs_rq can be in a mixed state in the end, so fix
this by granting 1ns to cfs_rq in tg_set_cfs_bandwidth(). This ensures
cfs_rq_c has a positive runtime_remaining when initialized as unthrottled
and cannot enter tg_unthrottle_up() with zero runtime_remaining.
Also, update outdated comments in tg_throttle_down() since
unthrottle_cfs_rq() is no longer called with zero runtime_remaining.
While at it, remove a redundant assignment to se in tg_throttle_down().
Fixes: e1fad12dcb ("sched/fair: Switch to task based throttle model")
Reviewed-By: Benjamin Segall <bsegall@google.com>
Suggested-by: Benjamin Segall <bsegall@google.com>
Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Hao Jia <jiahao1@lixiang.com>
Link: https://patch.msgid.link/20251030032755.560-1-ziqianlu@bytedance.com
After restructuring and splitting the HDMI codec driver code, each
HDMI codec driver contains the own build_controls and build_pcms ops.
A copy-n-paste error put the wrong entries for nvhdmi-mcp driver; both
build_controls and build_pcms are swapped. Unfortunately both
callbacks have the very same form, and the compiler didn't complain
it, either. This resulted in a NULL dereference because the PCM
instance hasn't been initialized at calling the build_controls
callback.
Fix it by passing the proper entries.
Fixes: ad781b550f ("ALSA: hda/hdmi: Rewrite to new probe method")
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220743
Link: https://patch.msgid.link/20251106104647.25805-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
kmemleak occasionally reports leaking xfs_busy_extents structure
from xfs_scrub calls after running xfs/528 (but attributed to following
tests), which seems to be caused by not freeing the xfs_busy_extents
structure when tr.queued is 0 and xfs_trim_rtgroup_extents breaks out
of the main loop. Free the structure in this case.
Fixes: a3315d1130 ("xfs: use rtgroup busy extent list for FITRIM")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
In review of a followup work, Harry noticed a potential infinite loop.
Upon closed inspection, it already exists for kmalloc_nolock() on a
cache with debugging enabled, since commit af92793e52 ("slab:
Introduce kmalloc_nolock() and kfree_nolock().")
When alloc_single_from_new_slab() fails to trylock node list_lock, we
keep retrying to get partial slab or allocate a new slab. If we indeed
interrupted somebody holding the list_lock, the trylock fill fail
deterministically and we end up allocating and defer-freeing slabs
indefinitely with no progress.
To fix it, fail the allocation if spinning is not allowed. This is
acceptable in the restricted context of kmalloc_nolock(), especially
with debugging enabled.
Reported-by: Harry Yoo <harry.yoo@oracle.com>
Closes: https://lore.kernel.org/all/aQLqZjjq1SPD3Fml@hyeyoo/
Fixes: af92793e52 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251103-fix-nolock-loop-v1-1-6e2b3e82b9da@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The qm_get_qos_value() function calls bus_find_device_by_name() which
increases the device reference count, but fails to call put_device()
to balance the reference count and lead to a device reference leak.
Add put_device() calls in both the error path and success path to
properly balance the reference count.
Found via static analysis.
Fixes: 22d7a6c39c ("crypto: hisilicon/qm - add pci bdf number check")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver calls mfd_add_devices() but fails to call mfd_remove_devices()
in error paths after successful MFD device registration and in the remove
function. This leads to resource leaks where MFD child devices are not
properly unregistered.
Replace mfd_add_devices with devm_mfd_add_devices to automatically
manage the device resources.
Fixes: c96e976d9a ("net: wan: framer: Add support for the Lantiq PEF2256 framer")
Suggested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://patch.msgid.link/20251105034716.662-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MLX5E_SHAMPO_WQ_HEADER_PER_PAGE and
MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE macros are used directly in
several places under the assumption that there will always be more
headers per WQE than headers per page. However, this assumption doesn't
hold for 64K page sizes and higher MTUs (> 4K). This can be first
observed during header page allocation: ksm_entries will become 0 during
alignment to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE.
This patch introduces 2 additional members to the mlx5e_shampo_hd struct
which are meant to be used instead of the macrose mentioned above.
When the number of headers per WQE goes below
MLX5E_SHAMPO_WQ_HEADER_PER_PAGE, clamp the number of headers per
page and expand the header size accordingly so that the headers
for one WQE cover a full page.
All the formulas are adapted to use these two new members.
Fixes: 945ca432bf ("net/mlx5e: SHAMPO, Drop info array")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1762238915-1027590-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlx5e_hw_gro_skb_has_enough_space() uses a formula to check if there is
enough space in the skb frags to store more data. This formula is
incorrect for 64K page sizes and it triggers early GRO session
termination because the first fragment will blow up beyond
GRO_LEGACY_MAX_SIZE.
This patch adds a special case for page sizes >= GRO_LEGACY_MAX_SIZE
(64K) which uses the skb->len instead. Within this context,
the check is safe from fragment overflow because the hardware
will continuously fill the data up to the reservation size of 64K
and the driver will coalesce all data from the same page to the same
fragment. This means that the data will span one fragment or at most
two for such a large page size.
It is expected that the if statement will be optimized out as the
check is done with constants.
Fixes: 92552d3abd ("net/mlx5e: HW_GRO cqe handler implementation")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1762238915-1027590-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
HW-GRO is broken on mlx5 for 64K page sizes. The patch in the fixes tag
didn't take into account larger page sizes when doing an align down
of max_ksm_entries. For 64K page size, max_ksm_entries is 0 which will skip
mapping header pages via WQE UMR. This breaks header-data split
and will result in the following syndrome:
mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x4c9, ci 0x0, qn 0x1133, opcode 0xe, syndrome 0x4, vendor syndrome 0x32
00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32
00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a
00000030: 00 00 3b c7 93 01 32 04 00 00 00 00 00 00 bf e0
mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1133
Furthermore, the function that fills in WQE UMRs for the headers
(mlx5e_build_shampo_hd_umr()) only supports mapping page sizes that
fit in a single UMR WQE.
This patch goes back to the old non-aligned max_ksm_entries value and it
changes mlx5e_build_shampo_hd_umr() to support mapping a large page over
multiple UMR WQEs.
This means that mlx5e_build_shampo_hd_umr() can now leave a page only
partially mapped. The caller, mlx5e_alloc_rx_hd_mpwqe(), ensures that
there are enough UMR WQEs to cover complete pages by working on
ksm_entries that are multiples of MLX5E_SHAMPO_WQ_HEADER_PER_PAGE.
Fixes: 8a0ee54027 ("net/mlx5e: SHAMPO, Simplify UMR allocation for headers")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1762238915-1027590-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ICSSG driver does the initial FDB configuration which
includes setting the control registers. Other run time
management like learning is managed by the PRU's. The default
FDB hash size used by the firmware is 512 slots, which is
currently missing in the current driver. Update the driver
FDB config to include FDB hash size as well.
Please refer trm [1] 6.4.14.12.17 section on how the FDB config
register gets configured. From the table 6-1404, there is a reset
field for FDB_HAS_SIZE which is 4, meaning 1024 slots. Currently
the driver is not updating this reset value from 4(1024 slots) to
3(512 slots). This patch fixes this by updating the reset value
to 512 slots.
[1]: https://www.ti.com/lit/pdf/spruim2
Fixes: abd5576b9c ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251104104415.3110537-1-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One GRO-cell device's NAPI callback can nest into the GRO-cell of
another device if the underlying device is also using GRO-cell.
This is the case for IPsec over vxlan.
These two GRO-cells are separate devices. From lockdep's point of view
it is the same because each device is sharing the same lock class and so
it reports a possible deadlock assuming one device is nesting into
itself.
Hold the bh_lock only while accessing gro_cell::napi_skbs in
gro_cell_poll(). This reduces the locking scope and avoids acquiring the
same lock class multiple times.
Fixes: 25718fdcbd ("net: gro_cells: Use nested-BH locking for gro_cell")
Reported-by: Gal Pressman <gal@nvidia.com>
Closes: https://lore.kernel.org/all/66664116-edb8-48dc-ad72-d5223696dd19@nvidia.com/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20251104153435.ty88xDQt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adding test that attaches kprobe/kretprobe multi and verifies the
ORC stacktrace matches expected functions.
Adding bpf_testmod_stacktrace_test function to bpf_testmod kernel
module which is called through several functions so we get reliable
call path for stacktrace.
The test is only for ORC unwinder to keep it simple.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently we don't get stack trace via ORC unwinder on top of fgraph exit
handler. We can see that when generating stacktrace from kretprobe_multi
bpf program which is based on fprobe/fgraph.
The reason is that the ORC unwind code won't get pass the return_to_handler
callback installed by fgraph return probe machinery.
Solving this by creating stack frame in return_to_handler expected by
ftrace_graph_ret_addr function to recover original return address and
continue with the unwind.
Also updating the pt_regs data with cs/flags/rsp which are needed for
successful stack retrieval from ebpf bpf_get_stackid helper.
- in get_perf_callchain we check user_mode(regs) so CS has to be set
- in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS/FIXED
has to be unset
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
This reverts commit 83f44ae0f8.
Currently we store initial stacktrace entry twice for non-HW ot_regs, which
means callers that fail perf_hw_regs(regs) condition in perf_callchain_kernel.
It's easy to reproduce this bpftrace:
# bpftrace -e 'tracepoint:sched:sched_process_exec { print(kstack()); }'
Attaching 1 probe...
bprm_execve+1767
bprm_execve+1767
do_execveat_common.isra.0+425
__x64_sys_execve+56
do_syscall_64+133
entry_SYSCALL_64_after_hwframe+118
When perf_callchain_kernel calls unwind_start with first_frame, AFAICS
we do not skip regs->ip, but it's added as part of the unwind process.
Hence reverting the extra perf_callchain_store for non-hw regs leg.
I was not able to bisect this, so I'm not really sure why this was needed
in v5.2 and why it's not working anymore, but I could see double entries
as far as v5.10.
I did the test for both ORC and framepointer unwind with and without the
this fix and except for the initial entry the stacktraces are the same.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
8 and 16 bit formats use a different layout on
GB20x than they did on prior chips. Add the
corresponding DRM format modifiers to the list of
modifiers supported by the display engine on such
chips, and filter the supported modifiers for each
format based on its bytes per pixel in
nv50_plane_format_mod_supported().
Note this logic will need to be updated when GB10
support is added, since it is a GB20x chip that
uses the pre-GB20x sector layout for all formats.
Fixes: 6cc6e08d45 ("drm/nouveau/kms: add support for GB20x")
Signed-off-by: James Jones <jajones@nvidia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030181153.1208-3-jajones@nvidia.com
The layout of bits within the individual tiles
(referred to as sectors in the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
changed for 8 and 16-bit surfaces starting in
Blackwell 2 GPUs (With the exception of GB10).
To denote the difference, extend the sector field
in the parametric format modifier definition used
to generate modifier values for NVIDIA hardware.
Without this change, it would be impossible to
differentiate the two layouts based on modifiers,
and as a result software could attempt to share
surfaces directly between pre-GB20x and GB20x
cards, resulting in corruption when the surface
was accessed on one of the GPUs after being
populated with content by the other.
Of note: This change causes the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to
evaluate its "s" parameter twice, with the side
effects that entails. I surveyed all usage of the
modifier in the kernel and Mesa code, and that
does not appear to be problematic in any current
usage, but I thought it was worth calling out.
Fixes: 6cc6e08d45 ("drm/nouveau/kms: add support for GB20x")
Signed-off-by: James Jones <jajones@nvidia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030181153.1208-2-jajones@nvidia.com
Set the DMA mask before calling nvkm_device_ctor(), so that when the
flush page is created in nvkm_fb_ctor(), the allocation will not fail
if the page is outside of DMA address space, which can easily happen if
IOMMU is disable. In such situations, you will get an error like this:
nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0).
Commit 38f5359354 ("rm/nouveau/pci: set streaming DMA mask early")
set the mask after calling nvkm_device_ctor(), but back then there was
no flush page being created, which might explain why the mask wasn't
set earlier.
Flush page allocation was added in commit 5728d06419 ("drm/nouveau/fb:
handle sysmem flush page from common code"). nvkm_fb_ctor() calls
alloc_page(), which can allocate a page anywhere in system memory, but
then calls dma_map_page() on that page. But since the DMA mask is still
set to 32, the map can fail if the page is allocated above 4GB. This is
easy to reproduce on systems with a lot of memory and IOMMU disabled.
An alternative approach would be to force the allocation of the flush
page to low memory, by specifying __GFP_DMA32. However, this would
always allocate the page in low memory, even though the hardware can
access high memory.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20251014174512.3172102-1-ttabi@nvidia.com
Pull rust fixes from Miguel Ojeda:
- Fix/workaround a couple Rust 1.91.0 build issues when sanitizers are
enabled due to extra checking performed by the compiler and an
upstream issue already fixed for Rust 1.93.0
- Fix future Rust 1.93.0 builds by supporting the stabilized name for
the 'no-jump-tables' flag
- Fix a couple private/broken intra-doc links uncovered by the future
move of pin-init to 'syn'
* tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: kbuild: support `-Cjump-tables=n` for Rust 1.93.0
rust: kbuild: workaround `rustdoc` doctests modifier bug
rust: kbuild: treat `build_error` and `rustdoc` as kernel objects
rust: condvar: fix broken intra-doc link
rust: devres: fix private intra-doc link
data_reloc_print_warning_inode() calls btrfs_get_fs_root() to obtain
local_root, but fails to release its reference when paths_from_inode()
returns an error. This causes a potential memory leak.
Add a missing btrfs_put_root() call in the error path to properly
decrease the reference count of local_root.
Fixes: b9a9a85059 ("btrfs: output affected files when relocation fails")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
scrub_raid56_parity_stripe() allocates a bio with bio_alloc(), but
fails to release it on some error paths, leading to a potential
memory leak.
Add the missing bio_put() calls to properly drop the bio reference
in those error cases.
Fixes: 1009254bf2 ("btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When logging that a new name exists, we skip updating the inode's
last_log_commit field to prevent a later explicit fsync against the inode
from doing nothing (as updating last_log_commit makes btrfs_inode_in_log()
return true). We are detecting, at btrfs_log_inode(), that logging a new
name is happening by checking the logging mode is not LOG_INODE_EXISTS,
but that is not enough because we may log parent directories when logging
a new name of a file in LOG_INODE_ALL mode - we need to check that the
logging_new_name field of the log context too.
An example scenario where this results in an explicit fsync against a
directory not persisting changes to the directory is the following:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ touch /mnt/foo
$ sync
$ mkdir /mnt/dir
# Write some data to our file and fsync it.
$ xfs_io -c "pwrite -S 0xab 0 64K" -c "fsync" /mnt/foo
# Add a new link to our file. Since the file was logged before, we
# update it in the log tree by calling btrfs_log_new_name().
$ ln /mnt/foo /mnt/dir/bar
# fsync the root directory - we expect it to persist the dentry for
# the new directory "dir".
$ xfs_io -c "fsync" /mnt
<power fail>
After mounting the fs the entry for directory "dir" does not exists,
despite the explicit fsync on the root directory.
Here's why this happens:
1) When we fsync the file we log the inode, so that it's present in the
log tree;
2) When adding the new link we enter btrfs_log_new_name(), and since the
inode is in the log tree we proceed to updating the inode in the log
tree;
3) We first set the inode's last_unlink_trans to the current transaction
(early in btrfs_log_new_name());
4) We then eventually enter btrfs_log_inode_parent(), and after logging
the file's inode, we call btrfs_log_all_parents() because the inode's
last_unlink_trans matches the current transaction's ID (updated in the
previous step);
5) So btrfs_log_all_parents() logs the root directory by calling
btrfs_log_inode() for the root's inode with a log mode of LOG_INODE_ALL
so that new dentries are logged;
6) At btrfs_log_inode(), because the log mode is LOG_INODE_ALL, we
update root inode's last_log_commit to the last transaction that
changed the inode (->last_sub_trans field of the inode), which
corresponds to the current transaction's ID;
7) Then later when user space explicitly calls fsync against the root
directory, we enter btrfs_sync_file(), which calls skip_inode_logging()
and that returns true, since its call to btrfs_inode_in_log() returns
true and there are no ordered extents (it's a directory, never has
ordered extents). This results in btrfs_sync_file() returning without
syncing the log or committing the current transaction, so all the
updates we did when logging the new name, including logging the root
directory, are not persisted.
So fix this by but updating the inode's last_log_commit if we are sure
we are not logging a new name (if ctx->logging_new_name is false).
A test case for fstests will follow soon.
Reported-by: Vyacheslav Kovalevsky <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/03c5d7ec-5b3d-49d1-95bc-8970a7f82d87@gmail.com/
Fixes: 130341be7f ("btrfs: always update the logged transaction when logging new names")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The stripe offset calculation in the zoned code for raid0 and raid10
wrongly uses map->stripe_size to calculate it. In fact, map->stripe_size is
the size of the device extent composing the block group, which always is
the zone_size on the zoned setup.
Fix it by using BTRFS_STRIPE_LEN and BTRFS_STRIPE_LEN_SHIFT. Also, optimize
the calculation a bit by doing the common calculation only once.
Fixes: c0d90a79e8 ("btrfs: zoned: fix alloc_offset calculation for partly conventional block groups")
CC: stable@vger.kernel.org # 6.17+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When a block group contains both conventional zone and sequential zone, the
capacity of the block group is wrongly set to the block group's full
length. The capacity should be calculated in btrfs_load_block_group_* using
the last allocation offset.
Fixes: 568220fa96 ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree")
CC: stable@vger.kernel.org # v6.12+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
->nr_pages is int, it needs type extension before calculating the region
size.
Fixes: a90558b36c ("io_uring/memmap: helper for pinning region pages")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: style fixup]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In Raspberry Pi 5 DTS, the Ethernet clock rates must be assigned
as the default clock register values are not valid for the
Ethernet interface to function.
This can be done either in rp1_clocks node or in rp1_eth node.
Define the rates in rp1_eth node, as those clocks are 'leaf' clocks
used specifically by the Ethernet device only.
Fixes: 43456fdfc0 ("arm64: dts: broadcom: Enable RP1 ethernet for Raspberry Pi 5")
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Link: https://lore.kernel.org/r/20251021135533.5517-1-andrea.porta@suse.com
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
xfs_select_open_zone_mru needs to pass XFS_ZONE_ALLOC_OK to
xfs_try_use_zone because we only want to tightly pack into zones of the
same or a compatible temperature instead of any available zone.
This got broken in commit 0301dae732 ("xfs: refactor hint based zone
allocation"), which failed to update this particular caller when
switching to an enum. xfs/638 sometimes, but not reliably fails due to
this change.
Fixes: 0301dae732 ("xfs: refactor hint based zone allocation")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Drop the rtgrop reference when xfs_init_zone fails for a conventional
device.
Fixes: 4e4d520755 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
I think there are several things wrong with this function:
A) xfs_bmapi_write can return a much larger unwritten mapping than what
the caller asked for. We convert part of that range to written, but
return the entire written mapping to iomap even though that's
inaccurate.
B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an
unwritten mapping could be *smaller* than the write range (or even
the hole range). In this case, we convert too much file range to
written state because we then return a smaller mapping to iomap.
C) It doesn't handle delalloc mappings. This I covered in the patch
that I already sent to the list.
D) Reassigning count_fsb to handle the hole means that if the second
cmap lookup attempt succeeds (due to racing with someone else) we
trim the mapping more than is strictly necessary. The changing
meaning of count_fsb makes this harder to notice.
E) The tracepoint is kinda wrong because @length is mutated. That makes
it harder to chase the data flows through this function because you
can't just grep on the pos/bytecount strings.
F) We don't actually check that the br_state = XFS_EXT_NORM assignment
is accurate, i.e that the cow fork actually contains a written
mapping for the range we're interested in
G) Somewhat inadequate documentation of why we need to xfs_trim_extent
so aggressively in this function.
H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped
the write range to s_maxbytes.
Fix these issues, and then the atomic writes regressions in generic/760,
generic/617, generic/091, generic/263, and generic/521 all go away for
me.
Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
With the 20 Oct 2025 release of fstests, generic/521 fails for me on
regular (aka non-block-atomic-writes) storage:
QA output created by 521
dowrite: write: Input/output error
LOG DUMP (8553 total operations):
1( 1 mod 256): SKIPPED (no operation)
2( 2 mod 256): WRITE 0x7e000 thru 0x8dfff (0x10000 bytes) HOLE
3( 3 mod 256): READ 0x69000 thru 0x79fff (0x11000 bytes)
4( 4 mod 256): FALLOC 0x53c38 thru 0x5e853 (0xac1b bytes) INTERIOR
5( 5 mod 256): COPY 0x55000 thru 0x59fff (0x5000 bytes) to 0x25000 thru 0x29fff
6( 6 mod 256): WRITE 0x74000 thru 0x88fff (0x15000 bytes)
7( 7 mod 256): ZERO 0xedb1 thru 0x11693 (0x28e3 bytes)
with a warning in dmesg from iomap about XFS trying to give it a
delalloc mapping for a directio write. Fix the software atomic write
iomap_begin code to convert the reservation into a written mapping.
This doesn't fix the data corruption problems reported by generic/760,
but it's a start.
Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Jeff Johnson says:
==================
ath.git update for v6.18-rc5
Revert an ath12k change which resulted in a significance performance
impact on WCN7850.
==================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
hwsim radios marked destroy_on_close are removed when the Netlink socket
that created them is closed. As the portid is not unique across network
namespaces, closing a socket in one namespace may remove radios in another
if it has the destroy_on_close flag set.
Instead of matching the network namespace, match the netgroup of the radio
to limit radio removal to those that have been created by the closing
Netlink socket. The netgroup of a radio identifies the network namespace
it was created in, and matching on it removes a destroy_on_close radio
even if it has been moved to another namespace.
Fixes: 100cb9ff40 ("mac80211_hwsim: Allow managing radios from non-initial namespaces")
Signed-off-by: Martin Willi <martin@strongswan.org>
Link: https://patch.msgid.link/20251103082436.30483-1-martin@strongswan.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As per the i.MX8MP TRM, section 14.2 "AUDIO_BLK_CTRL", table 14.2.3.1.1
"memory map", the definition of the EARC control register shows that the
EARC controller software reset is controlled via bit 0, while the EARC PHY
software reset is controlled via bit 1.
This means that the current definitions of IMX8MP_AUDIOMIX_EARC_RESET_MASK
and IMX8MP_AUDIOMIX_EARC_PHY_RESET_MASK are wrong since their values would
imply that the EARC controller software reset is controlled via bit 1 and
the EARC PHY software reset is controlled via bit 2. Fix them.
Fixes: a83bc87cd3 ("reset: imx8mp-audiomix: Prepare the code for more reset bits")
Cc: stable@vger.kernel.org
Reviewed-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Since commit d24cfee7f6 ("spi: Fix acpi deferred irq probe"), the
acpi_dev_gpio_irq_get() call gets delayed till spi_probe() is called
on the SPI device.
If there is no driver for the SPI device then the move to spi_probe()
results in acpi_dev_gpio_irq_get() never getting called. This may
cause problems by leaving the GPIO pin floating because this call is
responsible for setting up the GPIO pin direction and/or bias according
to the values from the ACPI tables.
Re-add the removed acpi_dev_gpio_irq_get() in acpi_register_spi_device()
to ensure the GPIO pin is always correctly setup, while keeping the
acpi_dev_gpio_irq_get() call added to spi_probe() to deal with
-EPROBE_DEFER returns caused by the GPIO controller not having a driver
yet.
Link: https://bbs.archlinux.org/viewtopic.php?id=302348
Fixes: d24cfee7f6 ("spi: Fix acpi deferred irq probe")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hansg@kernel.org>
Link: https://patch.msgid.link/20251102190921.30068-1-hansg@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
bm_register_write() opens an executable file using open_exec(), which
internally calls do_open_execat() and denies write access on the file to
avoid modification while it is being executed.
However, when an error occurs, bm_register_write() closes the file using
filp_close() directly. This does not restore the write permission, which
may cause subsequent write operations on the same file to fail.
Fix this by calling exe_file_allow_write_access() before filp_close() to
restore the write permission properly.
Fixes: e7850f4d84 ("binfmt_misc: fix possible deadlock in bm_register_write")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Link: https://patch.msgid.link/20251105022923.1813587-1-zilin@seu.edu.cn
Signed-off-by: Christian Brauner <brauner@kernel.org>
When sb_min_blocksize() returns 0 and the return value is not checked,
it may lead to a situation where sb->s_blocksize is 0 when
accessing the filesystem super block. After commit a64e5a5960
("bdev: add back PAGE_SIZE block size validation for
sb_set_blocksize()"), this becomes more likely to happen when the
block device’s logical_block_size is larger than PAGE_SIZE and the
filesystem is unformatted. Add the __must_check attribute to ensure
callers always check the return value.
Cc: stable@vger.kernel.org # v6.15
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Link: https://patch.msgid.link/20251104125009.2111925-6-yangyongpeng.storage@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
In the commit referenced by the Fixes tag,
devm_gpiod_get_optional() was replaced by manual
GPIO management, relying on the regulator core to release the
GPIO descriptor. However, this approach does not account for the
error path: when regulator registration fails, the core never
takes over the GPIO, resulting in a resource leak.
Add gpiod_put() before returning on regulator registration failure.
Fixes: 5e6f3ae5c1 ("regulator: fixed: Let core handle GPIO descriptor")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20251028172828.625-1-vulab@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
The Mesa issue referenced below pointed out a possible deadlock:
[ 1231.611031] Possible interrupt unsafe locking scenario:
[ 1231.611033] CPU0 CPU1
[ 1231.611034] ---- ----
[ 1231.611035] lock(&xa->xa_lock#17);
[ 1231.611038] local_irq_disable();
[ 1231.611039] lock(&fence->lock);
[ 1231.611041] lock(&xa->xa_lock#17);
[ 1231.611044] <Interrupt>
[ 1231.611045] lock(&fence->lock);
[ 1231.611047]
*** DEADLOCK ***
In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that don't disable interrupts (eg:
drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()).
CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling
callback so in an interrupt context. It will deadlock when trying to
grab the xa_lock which is already held by CPU0.
Replacing all xa_* usage by their xa_*_irq counterparts would fix
this issue, but Christian pointed out another issue: dma_fence_signal
takes fence.lock and so does dma_fence_add_callback.
dma_fence_signal() // locks f1.lock
-> drm_sched_entity_kill_jobs_cb()
-> foreach dependencies
-> dma_fence_add_callback() // locks f2.lock
This will deadlock if f1 and f2 share the same spinlock.
To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work().
Cc: stable@vger.kernel.org # v6.2+
Fixes: 2fdb8a8f07 ("drm/scheduler: rework entity flush, kill and fini")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
[phasta: commit message nits]
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
Pull media fixes from Mauro Carvalho Chehab:
- honour privacy led with pdx86/int3472
- fix invalid file access on cx18 and ivtv
- forbid remove_bufs when legacy fileio is active on videbuf2
- add an heuristic to find stream entity on uvcvideo
* tag 'media/v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videobuf2: forbid remove_bufs when legacy fileio is active
media: uvcvideo: Use heuristic to find stream entity
media: v4l2-subdev / pdx86: int3472: Use "privacy" as con_id for the privacy LED
media: ivtv: Fix invalid access to file *
media: cx18: Fix invalid access to file *
Fix a AA deadlock in refill_skbs() where memory allocation while holding
skb_pool->lock can trigger a recursive lock acquisition attempt.
The deadlock scenario occurs when the system is under severe memory
pressure:
1. refill_skbs() acquires skb_pool->lock (spinlock)
2. alloc_skb() is called while holding the lock
3. Memory allocator fails and calls slab_out_of_memory()
4. This triggers printk() for the OOM warning
5. The console output path calls netpoll_send_udp()
6. netpoll_send_udp() attempts to acquire the same skb_pool->lock
7. Deadlock: the lock is already held by the same CPU
Call stack:
refill_skbs()
spin_lock_irqsave(&skb_pool->lock) <- lock acquired
__alloc_skb()
kmem_cache_alloc_node_noprof()
slab_out_of_memory()
printk()
console_flush_all()
netpoll_send_udp()
skb_dequeue()
spin_lock_irqsave(&skb_pool->lock) <- deadlock attempt
This bug was exposed by commit 248f6571fd ("netpoll: Optimize skb
refilling on critical path") which removed refill_skbs() from the
critical path (where nested printk was being deferred), letting nested
printk being called from inside refill_skbs()
Refactor refill_skbs() to never allocate memory while holding
the spinlock.
Another possible solution to fix this problem is protecting the
refill_skbs() from nested printks, basically calling
printk_deferred_{enter,exit}() in refill_skbs(), then, any nested
pr_warn() would be deferred.
I prefer this approach, given I _think_ it might be a good idea to move
the alloc_skb() from GFP_ATOMIC to GFP_KERNEL in the future, so, having
the alloc_skb() outside of the lock will be necessary step.
There is a possible TOCTOU issue when checking for the pool length, and
queueing the new allocated skb, but, this is not an issue, given that
an extra SKB in the pool is harmless and it will be eventually used.
Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: 248f6571fd ("netpoll: Optimize skb refilling on critical path")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251103-fix_netpoll_aa-v4-1-4cfecdf6da7c@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make knav_dma_open_channel consistently return NULL on error instead
of ERR_PTR. Currently the header include/linux/soc/ti/knav_dma.h
returns NULL when the driver is disabled, but the driver
implementation does not even return NULL or ERR_PTR on failure,
causing inconsistency in the users. This results in a crash in
netcp_free_navigator_resources as followed (trimmed):
Unhandled fault: alignment exception (0x221) at 0xfffffff2
[fffffff2] *pgd=80000800207003, *pmd=82ffda003, *pte=00000000
Internal error: : 221 [#1] SMP ARM
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc7 #1 NONE
Hardware name: Keystone
PC is at knav_dma_close_channel+0x30/0x19c
LR is at netcp_free_navigator_resources+0x2c/0x28c
[... TRIM...]
Call trace:
knav_dma_close_channel from netcp_free_navigator_resources+0x2c/0x28c
netcp_free_navigator_resources from netcp_ndo_open+0x430/0x46c
netcp_ndo_open from __dev_open+0x114/0x29c
__dev_open from __dev_change_flags+0x190/0x208
__dev_change_flags from netif_change_flags+0x1c/0x58
netif_change_flags from dev_change_flags+0x38/0xa0
dev_change_flags from ip_auto_config+0x2c4/0x11f0
ip_auto_config from do_one_initcall+0x58/0x200
do_one_initcall from kernel_init_freeable+0x1cc/0x238
kernel_init_freeable from kernel_init+0x1c/0x12c
kernel_init from ret_from_fork+0x14/0x38
[... TRIM...]
Standardize the error handling by making the function return NULL on
all error conditions. The API is used in just the netcp_core.c so the
impact is limited.
Note, this change, in effect reverts commit 5b6cb43b4d ("net:
ethernet: ti: netcp_core: return error while dma channel open issue"),
but provides a less error prone implementation.
Suggested-by: Simon Horman <horms@kernel.org>
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251103162811.3730055-1-nm@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 4959aebba8 ("virtio-net: use mtu size as buffer length
for big packets"), when guest gso is off, the allocated size for big
packets is not MAX_SKB_FRAGS * PAGE_SIZE anymore but depends on
negotiated MTU. The number of allocated frags for big packets is stored
in vi->big_packets_num_skbfrags.
Because the host announced buffer length can be malicious (e.g. the host
vhost_net driver's get_rx_bufs is modified to announce incorrect
length), we need a check in virtio_net receive path. Currently, the
check is not adapted to the new change which can lead to NULL page
pointer dereference in the below while loop when receiving length that
is larger than the allocated one.
This commit fixes the received length check corresponding to the new
change.
Fixes: 4959aebba8 ("virtio-net: use mtu size as buffer length for big packets")
Cc: stable@vger.kernel.org
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20251030144438.7582-1-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mykyta Yatsenko says:
====================
bpf: Add _impl suffix for kfuncs with implicit args
We have established a pattern of function naming win "_impl" suffix;
those functions accept verifier-provided bpf_prog_aux argument.
Following uniform convention will allow for transparent backwards
compatibility with the upcoming KF_IMPLICIT_ARGS feature. This patch
set aims to fix current deviation from the convention to eliminate
unnecessary backwards incompatibility in the future.
Three kfuncs added in 6.18 don’t follow this *_impl convention and
therefore won’t participate in the new KF_IMPLICIT_ARGS mechanism:
* bpf_task_work_schedule_resume()
* bpf_task_work_schedule_signal()
* bpf_stream_vprintk()
Rename them to align with the implicit-arg flow:
bpf_task_work_schedule_resume() -> bpf_task_work_schedule_resume_impl()
bpf_task_work_schedule_signal() -> bpf_task_work_schedule_signal_impl()
bpf_stream_vprintk() -> bpf_stream_vprintk_impl()
The KF_IMPLICIT_ARGS mechanism is not in tree yet, so callers must
switch to the *_impl names for now. Once the new mechanism lands, the
plain names (without _impl) will be reintroduced.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
Changes in v3:
- Fix commit messages
- Link to v2: https://lore.kernel.org/r/20251104-implv2-v2-0-6dbc35f39f28@meta.com
Changes in v1:
- Split commit into 2
- Rebase on the correct branch
- Link to v1: https://lore.kernel.org/all/20251103232319.122965-1-mykyta.yatsenko5@gmail.com/
====================
Link: https://patch.msgid.link/20251104-implv2-v3-0-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Rename bpf_stream_vprintk() to bpf_stream_vprintk_impl().
This makes bpf_stream_vprintk() follow the already established "_impl"
suffix-based naming convention for kfuncs with the bpf_prog_aux
argument provided by the verifier implicitly. This convention will be
taken advantage of with the upcoming KF_IMPLICIT_ARGS feature to
preserve backwards compatibility to BPF programs.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251104-implv2-v3-2-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Rename:
bpf_task_work_schedule_resume()->bpf_task_work_schedule_resume_impl()
bpf_task_work_schedule_signal()->bpf_task_work_schedule_signal_impl()
This aligns task work scheduling kfuncs with the established naming
scheme for kfuncs with the bpf_prog_aux argument provided by the
verifier implicitly. This convention will be taken advantage of with the
upcoming KF_IMPLICIT_ARGS feature to preserve backwards compatibility to
BPF programs.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251104-implv2-v3-1-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Michael Chan says:
====================
bnxt_en: Bug fixes
Patches 1, 3, and 4 are bug fixes related to the FW log tracing driver
coredump feature recently added in 6.13. Patch #1 adds the necessary
call to shutdown the FW logging DMA during PCI shutdown. Patch #3 fixes
a possible null pointer derefernce when using early versions of the FW
with this feature. Patch #4 adds the coredump header information
unconditionally to make it more robust.
Patch #2 fixes a possible memory leak during PTP shutdown. Patch #5
eliminates a dmesg warning when doing devlink reload.
====================
Link: https://patch.msgid.link/20251104005700.542174-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While populating firmware host logging segments for the coredump, it is
possible for the FW command that flushes the segment to fail. When that
happens, the existing code will not update the max entry and entry size
in the segment header and this causes software that decodes the coredump
to skip the segment.
The segment most likely has already collected some DMA data, so always
update these 2 segment fields in the header to allow the decoder to
decode any data in the segment.
Fixes: 3c2179e663 ("bnxt_en: Add FW trace coredump segments to the coredump")
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251104005700.542174-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The runtime-const infrastructure was never designed to handle the
modular case, because the constant fixup is only done at boot time for
core kernel code.
But by the time I used it for the x86-64 user space limit handling in
commit 86e6b1547b ("x86: fix user address masking non-canonical
speculation issue"), I had completely repressed that fact.
And it all happens to work because the only code that currently actually
gets inlined by modules is for the access_ok() limit check, where the
default constant value works even when not fixed up. Because at least I
had intentionally made it be something that is in the non-canonical
address space region.
But it's technically very wrong, and it does mean that at least in
theory, the use of 'access_ok()' + '__get_user()' can trigger the same
speculation issue with non-canonical addresses that the original commit
was all about.
The pattern is unusual enough that this probably doesn't matter in
practice, but very wrong is still very wrong. Also, let's fix it before
the nice optimized scoped user accessor helpers that Thomas Gleixner is
working on cause this pseudo-constant to then be more widely used.
This all came up due to an unrelated discussion with Mateusz Guzik about
using the runtime const infrastructure for names_cachep accesses too.
There the modular case was much more obviously broken, and Mateusz noted
it in his 'v2' of the patch series.
That then made me notice how broken 'access_ok()' had been in modules
all along. Mea culpa, mea maxima culpa.
Fix it by simply not using the runtime-const code in modules, and just
using the USER_PTR_MAX variable value instead. This is not
performance-critical like the core user accessor functions (get_user()
and friends) are.
Also make sure this doesn't get forgotten the next time somebody wants
to do runtime constant optimizations by having the x86 runtime-const.h
header file error out if included by modules.
Fixes: 86e6b1547b ("x86: fix user address masking non-canonical speculation issue")
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Triggered-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/all/20251030105242.801528-1-mjguzik@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changing alignment of header would mean it's no longer safe to cast a
2 byte aligned pointer between formats. Use two 16 bit fields to make
it 2 byte aligned as previously.
This fixes the performance regression since
commit ("virtio_net: enable gso over UDP tunnel support.") as it uses
virtio_net_hdr_v1_hash_tunnel which embeds
virtio_net_hdr_v1_hash. Pktgen in guest + XDP_DROP on TAP + vhost_net
shows the TX PPS is recovered from 2.4Mpps to 4.45Mpps.
Fixes: 56a06bd40f ("virtio_net: enable gso over UDP tunnel support.")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20251031060551.126-1-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If cros_ec_keyb_register_matrix() isn't called (due to
`buttons_switches_only`) in cros_ec_keyb_probe(), `ckdev->idev` remains
NULL. An invalid memory access is observed in cros_ec_keyb_process()
when receiving an EC_MKBP_EVENT_KEY_MATRIX event in cros_ec_keyb_work()
in such case.
Unable to handle kernel read from unreadable memory at virtual address 0000000000000028
...
x3 : 0000000000000000 x2 : 0000000000000000
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
input_event
cros_ec_keyb_work
blocking_notifier_call_chain
ec_irq_thread
It's still unknown about why the kernel receives such malformed event,
in any cases, the kernel shouldn't access `ckdev->idev` and friends if
the driver doesn't intend to initialize them.
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://patch.msgid.link/20251104070310.3212712-1-tzungbi@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The commit 4adc20ba95 ("ARM: dts: broadcom: rpi: Switch to V3D firmware
clock") causes a regression in arm64 developer setups, which stores the
kernel modules via NFS. Before this change the involved V3D clock provider
was builtin, but after this DT change the clk-raspberrypi is responsible
for V3D and for arm64/defconfig this driver is build as a kernel module.
In case these kernel modules are provided via NFS this takes too long and
the PM domain core give up before the clock driver could be loaded:
v3d fec00000.gpu: deferred probe timeout, ignoring dependency
So resolve this issue by making this critical driver builtin.
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/linux-arm-kernel/9ebda74e-e700-4fbe-bca5-382f92417a9c@sirena.org.uk/
Fixes: 4adc20ba95 ("ARM: dts: broadcom: rpi: Switch to V3D firmware clock")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://lore.kernel.org/r/20251104174518.11783-1-wahrenst@gmx.net
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Fix refcount leak in `smb2_set_path_attr` when path conversion fails.
Function `cifs_get_writable_path` returns `cfile` with its reference
counter `cfile->count` increased on success. Function `smb2_compound_op`
would decrease the reference counter for `cfile`, as stated in its
comment. By calling `smb2_rename_path`, the reference counter of `cfile`
would leak if `cifs_convert_path_to_utf16` fails in `smb2_set_path_attr`.
Fixes: 8de9e86c67 ("cifs: create a helper to find a writeable handle by path name")
Acked-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull rdma fixes from Jason Gunthorpe:
- Memory leak in bnxt GSI qp path
- Failure in irdma registering large MRs
- Failure to clean out the right CQ table entry in irdma
- Invalid vf_id in some cases
- Incorrect error unwind in EFA CQ create
- hns doesn't use the optimal cq/qp relationships for it's HW banks
- hns reports the wrong SGE size to userspace for its QPs
- Corruption of the hns work queue entries in some cases
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
MAINTAINERS: Update irdma maintainers
RDMA/irdma: Fix vf_id size to u16 to avoid overflow
RDMA/hns: Remove an extra blank line
RDMA/hns: Fix wrong WQE data when QP wraps around
RDMA/hns: Fix the modification of max_send_sge
RDMA/hns: Fix recv CQ and QP cache affinity
RDMA/uverbs: Fix umem release in UVERBS_METHOD_CQ_CREATE
RDMA/irdma: Set irdma_cq cq_num field during CQ create
RDMA/irdma: Fix SD index calculation
RDMA/bnxt_re: Fix a potential memory leak in destroy_gsi_sqp
For S3 on vangogh, PMFW needs to be notified before the
driver powers down RLC. This already happens in smu_disable_dpms()
so drop the superfluous call in amdgpu_device_suspend().
Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 960e30a61e)
[Why & How]
This fixes the black screen issue on certain APUs with HDMI,
accompanied by the following messages:
amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info
frame on connector DP-1: -22
amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm]
Cannot find any crtc or sizes
Fixes: 489f0f600c ("drm/amd/display: Fix DVI-D/HDMI adapters")
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 678c901443)
commit 978fa2f6d0 ("drm/amd/display: Use scaling for non-native
resolutions on eDP") started using the GPU scaler hardware to scale
when a non-native resolution was picked on eDP. This scaling was done
to fill the screen instead of maintain aspect ratio.
The idea was supposed to be that if a different scaling behavior is
preferred then the compositor would request it. The not following
aspect ratio behavior however isn't desirable, so adjust it to follow
aspect ratio and still try to fill screen.
Note: This will lead to black bars in some cases for non-native
resolutions. Compositors can request the previous behavior if desired.
Fixes: 978fa2f6d0 ("drm/amd/display: Use scaling for non-native resolutions on eDP")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 825df7ff4b)
These were not set so soft recovery was inadvertantly
disabled.
Fixes: 6ac55eab4f ("drm/amdgpu: move reset support type checks into the caller")
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1972763505)
On big endian arm kernels, the arm optimized Curve25519 code produces
incorrect outputs and fails the Curve25519 test. This has been true
ever since this code was added.
It seems that hardly anyone (or even no one?) actually uses big endian
arm kernels. But as long as they're ostensibly supported, we should
disable this code on them so that it's not accidentally used.
Note: for future-proofing, use !CPU_BIG_ENDIAN instead of
CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could
get removed in the future if big endian support gets dropped.
Fixes: d8f1308a02 ("crypto: arm/curve25519 - wire up NEON implementation")
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
When unbinding a memslot from a guest_memfd instance, remove the bindings
even if the guest_memfd file is dying, i.e. even if its file refcount has
gone to zero. If the memslot is freed before the file is fully released,
nullifying the memslot side of the binding in kvm_gmem_release() will
write to freed memory, as detected by syzbot+KASAN:
==================================================================
BUG: KASAN: slab-use-after-free in kvm_gmem_release+0x176/0x440 virt/kvm/guest_memfd.c:353
Write of size 8 at addr ffff88807befa508 by task syz.0.17/6022
CPU: 0 UID: 0 PID: 6022 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
kvm_gmem_release+0x176/0x440 virt/kvm/guest_memfd.c:353
__fput+0x44c/0xa70 fs/file_table.c:468
task_work_run+0x1d4/0x260 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fbeeff8efc9
</TASK>
Allocated by task 6023:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
poison_kmalloc_redzone mm/kasan/common.c:397 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:414
kasan_kmalloc include/linux/kasan.h:262 [inline]
__kmalloc_cache_noprof+0x3e2/0x700 mm/slub.c:5758
kmalloc_noprof include/linux/slab.h:957 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
kvm_set_memory_region+0x747/0xb90 virt/kvm/kvm_main.c:2104
kvm_vm_ioctl_set_memory_region+0x6f/0xd0 virt/kvm/kvm_main.c:2154
kvm_vm_ioctl+0x957/0xc60 virt/kvm/kvm_main.c:5201
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 6023:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:252 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
kasan_slab_free include/linux/kasan.h:234 [inline]
slab_free_hook mm/slub.c:2533 [inline]
slab_free mm/slub.c:6622 [inline]
kfree+0x19a/0x6d0 mm/slub.c:6829
kvm_set_memory_region+0x9c4/0xb90 virt/kvm/kvm_main.c:2130
kvm_vm_ioctl_set_memory_region+0x6f/0xd0 virt/kvm/kvm_main.c:2154
kvm_vm_ioctl+0x957/0xc60 virt/kvm/kvm_main.c:5201
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Deliberately don't acquire filemap invalid lock when the file is dying as
the lifecycle of f_mapping is outside the purview of KVM. Dereferencing
the mapping is *probably* fine, but there's no need to invalidate anything
as memslot deletion is responsible for zapping SPTEs, and the only code
that can access the dying file is kvm_gmem_release(), whose core code is
mutually exclusive with unbinding.
Note, the mutual exclusivity is also what makes it safe to access the
bindings on a dying gmem instance. Unbinding either runs with slots_lock
held, or after the last reference to the owning "struct kvm" is put, and
kvm_gmem_release() nullifies the slot pointer under slots_lock, and puts
its reference to the VM after that is done.
Reported-by: syzbot+2479e53d0db9b32ae2aa@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68fa7a22.a70a0220.3bf6c6.008b.GAE@google.com
Tested-by: syzbot+2479e53d0db9b32ae2aa@syzkaller.appspotmail.com
Fixes: a7800aa80e ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Cc: stable@vger.kernel.org
Cc: Hillf Danton <hdanton@sina.com>
Reviewed-By: Vishal Annapurve <vannapurve@google.com>
Link: https://patch.msgid.link/20251104011205.3853541-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use a raw spinlock for vcpu_svm.ir_list_lock as the lock can be taken
during schedule() via kvm_sched_out() => __avic_vcpu_put(), and "normal"
spinlocks are sleepable locks when PREEMPT_RT=y.
This fixes the following lockdep warning:
=============================
[ BUG: Invalid wait context ]
6.12.0-146.1640_2124176644.el10.x86_64+debug #1 Not tainted
-----------------------------
qemu-kvm/38299 is trying to lock:
ff11000239725600 (&svm->ir_list_lock){....}-{3:3}, at: __avic_vcpu_put+0xfd/0x300 [kvm_amd]
other info that might help us debug this:
context-{5:5}
2 locks held by qemu-kvm/38299:
#0: ff11000239723ba8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x240/0xe00 [kvm]
#1: ff11000b906056d8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2e/0x130
stack backtrace:
CPU: 1 UID: 0 PID: 38299 Comm: qemu-kvm Kdump: loaded Not tainted 6.12.0-146.1640_2124176644.el10.x86_64+debug #1 PREEMPT(voluntary)
Hardware name: AMD Corporation QUARTZ/QUARTZ, BIOS RQZ100AB 09/14/2023
Call Trace:
<TASK>
dump_stack_lvl+0x6f/0xb0
__lock_acquire+0x921/0xb80
lock_acquire.part.0+0xbe/0x270
_raw_spin_lock_irqsave+0x46/0x90
__avic_vcpu_put+0xfd/0x300 [kvm_amd]
svm_vcpu_put+0xfa/0x130 [kvm_amd]
kvm_arch_vcpu_put+0x48c/0x790 [kvm]
kvm_sched_out+0x161/0x1c0 [kvm]
prepare_task_switch+0x36b/0xf60
__schedule+0x4f7/0x1890
schedule+0xd4/0x260
xfer_to_guest_mode_handle_work+0x54/0xc0
vcpu_run+0x69a/0xa70 [kvm]
kvm_arch_vcpu_ioctl_run+0xdc0/0x17e0 [kvm]
kvm_vcpu_ioctl+0x39f/0xe00 [kvm]
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://patch.msgid.link/20251030194130.307900-1-mlevitsk@redhat.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Setup the per-CPU SVM data structures at the very end of hardware setup so
that svm_hardware_unsetup() can be used in svm_hardware_setup() to unwind
AVIC setup (for the GALog notifier). Alternatively, the error path could
do an explicit, manual unwind, e.g. by adding a helper to free the per-CPU
structures. But the per-CPU allocations have no interactions or
dependencies, i.e. can comfortably live at the end, and so converting to
a manual unwind would introduce churn and code without providing any
immediate advantage.
Link: https://patch.msgid.link/20251016190643.80529-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Update the comment above is_xstate_managed_msr() to note that
MSR_IA32_S_CET isn't saved/restored by XSAVES/XRSTORS.
MSR_IA32_S_CET isn't part of CET_U/S state as the SDM states:
The register state used by Control-Flow Enforcement Technology (CET)
comprises the two 64-bit MSRs (IA32_U_CET and IA32_PL3_SSP) that manage
CET when CPL = 3 (CET_U state); and the three 64-bit MSRs
(IA32_PL0_SSP–IA32_PL2_SSP) that manage CET when CPL < 3 (CET_S state).
Opportunistically shift the snippet about the safety of loading certain
MSRs to the function comment for kvm_access_xstate_msr(), which is where
the MSRs are actually loaded into hardware.
Fixes: e44eb58334 ("KVM: x86: Load guest FPU state when access XSAVE-managed MSRs")
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20251028060142.29830-1-chao.gao@intel.com
[sean: shift snippet about safety to kvm_access_xstate_msr()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Assert, via KVM_BUG_ON(), that guest FPU state isn't/is in use when
loading/putting the FPU to help detect KVM bugs without needing an assist
from KASAN. If an imbalanced load/put is detected, skip the redundant
load/put to avoid clobbering guest state and/or crashing the host.
Note, kvm_access_xstate_msr() already provides a similar assertion.
Reviewed-by: Yao Yuan <yaoyuan@linux.alibaba.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20251030185802.3375059-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Replace the hack added by commit f958bd2314 ("KVM: x86: Fix potential
put_fpu() w/o load_fpu() on MPX platform") with a more robust approach of
unloading+reloading guest FPU state based on whether or not the vCPU's FPU
is currently in-use, i.e. currently loaded. This fixes a bug on hosts
that support CET but not MPX, where kvm_arch_vcpu_ioctl_get_mpstate()
neglects to load FPU state (it only checks for MPX support) and leads to
KVM attempting to put FPU state due to kvm_apic_accept_events() triggering
INIT emulation. E.g. on a host with CET but not MPX, syzkaller+KASAN
generates:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 211 UID: 0 PID: 20451 Comm: syz.9.26 Tainted: G S 6.18.0-smp-DEV #7 NONE
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Google Izumi/izumi, BIOS 0.20250729.1-0 07/29/2025
RIP: 0010:fpu_swap_kvm_fpstate+0x3ce/0x610 ../arch/x86/kernel/fpu/core.c:377
RSP: 0018:ff1100410c167cc0 EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000020 RCX: 00000000000001aa
RDX: 00000000000001ab RSI: ffffffff817bb960 RDI: 0000000022600000
RBP: dffffc0000000000 R08: ff110040d23c8007 R09: 1fe220081a479000
R10: dffffc0000000000 R11: ffe21c081a479001 R12: ff110040d23c8d98
R13: 00000000fffdc578 R14: 0000000000000000 R15: ff110040d23c8d90
FS: 00007f86dd1876c0(0000) GS:ff11007fc969b000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f86dd186fa8 CR3: 00000040d1dfa003 CR4: 0000000000f73ef0
PKRU: 80000000
Call Trace:
<TASK>
kvm_vcpu_reset+0x80d/0x12c0 ../arch/x86/kvm/x86.c:11818
kvm_apic_accept_events+0x1cb/0x500 ../arch/x86/kvm/lapic.c:3489
kvm_arch_vcpu_ioctl_get_mpstate+0xd0/0x4e0 ../arch/x86/kvm/x86.c:12145
kvm_vcpu_ioctl+0x5e2/0xed0 ../virt/kvm/kvm_main.c:4539
__se_sys_ioctl+0x11d/0x1b0 ../fs/ioctl.c:51
do_syscall_x64 ../arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x6e/0x940 ../arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f86de71d9c9
</TASK>
with a very simple reproducer:
r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x80b00, 0x0)
r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
ioctl$KVM_CREATE_IRQCHIP(r1, 0xae60)
r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0)
ioctl$KVM_SET_IRQCHIP(r1, 0x8208ae63, ...)
ioctl$KVM_GET_MP_STATE(r2, 0x8004ae98, &(0x7f00000000c0))
Alternatively, the MPX hack in GET_MP_STATE could be extended to cover CET,
but from a "don't break existing functionality" perspective, that isn't any
less risky than peeking at the state of in_use, and it's far less robust
for a long term solution (as evidenced by this bug).
Reported-by: Alexander Potapenko <glider@google.com>
Fixes: 69cc3e8865 ("KVM: x86: Add XSS support for CET_KERNEL and CET_USER")
Reviewed-by: Yao Yuan <yaoyuan@linux.alibaba.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20251030185802.3375059-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
of_get_child_by_name() returns a node pointer with refcount incremented, we
should use of_node_put() on it when not needed anymore. Add the missing
of_node_put() to avoid refcount leak.
Fixes: 721cabf6c6 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Limit the workaround for the lack of the proper splash-screen handover
handling to the legacy ARM 32bit systems and replace forcing a sync_state
by explicite power domain shutdown. This approach lets compiler to
optimize it out on newer ARM 64bit systems.
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Fixes: 0745658aeb ("pmdomain: samsung: Fix splash-screen handover by enforcing a sync_state")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
nfsd exports a "pseudo root filesystem" which is used by NFSv4 to find
the various exported filesystems using LOOKUP requests from a known root
filehandle. NFSv3 uses the MOUNT protocol to find those exported
filesystems and so is not given access to the pseudo root filesystem.
If a v3 (or v2) client uses a filehandle from that filesystem,
nfsd_set_fh_dentry() will report an error, but still stores the export
in "struct svc_fh" even though it also drops the reference (exp_put()).
This means that when fh_put() is called an extra reference will be dropped
which can lead to use-after-free and possible denial of service.
Normal NFS usage will not provide a pseudo-root filehandle to a v3
client. This bug can only be triggered by the client synthesising an
incorrect filehandle.
To fix this we move the assignments to the svc_fh later, after all
possible error cases have been detected.
Reported-and-tested-by: tianshuo han <hantianshuo233@gmail.com>
Fixes: ef7f6c4904 ("nfsd: move V4ROOT version check to nfsd_set_fh_dentry()")
Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
find_or_create_cached_dir() could grab a new reference after kref_put()
had seen the refcount drop to zero but before cfid_list_lock is acquired
in smb2_close_cached_fid(), leading to use-after-free.
Switch to kref_put_lock() so cfid_release() is called with
cfid_list_lock held, closing that gap.
Fixes: ebe98f1447 ("cifs: enable caching of directories for which a lease is held")
Cc: stable@vger.kernel.org
Reported-by: Jay Shin <jaeshin@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Current ksmbd_rdma_capable_netdev fails to mark certain RDMA-capable
inerfaces such as IPoIB as RDMA capable after reverting GUID matching code
due to layer violation.
This patch check the ARPHRD_INFINIBAND type safely identifies an IPoIB
interface without introducing a layer violation, ensuring RDMA
functionality is correctly enabled for these interfaces.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If user set bridge interface as actual RDMA-capable NICs are lower devices,
ksmbd can not detect as RDMA capable. This patch can detect the RDMA
capable lower devices from bridge master or VLAN. With this change, ksmbd
can accept both TCP and RDMA connections through the same bridge IP
address, allowing mixed transport operation without requiring separate
interfaces.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Since snd_soc_suspend() is invoked through snd_soc_pm_ops->suspend(),
and snd_soc_pm_ops is associated with the soc_driver (defined in
sound/soc/soc-core.c), and there is no parent-child relationship between
the soc_driver and the DA7213 codec driver, the power management subsystem
does not enforce a specific suspend/resume order between the DA7213 driver
and the soc_driver.
Because of this, the different codec component functionalities, called from
snd_soc_resume() to reconfigure various functions, can race with the
DA7213 struct dev_pm_ops::resume function, leading to misapplied
configuration. This occasionally results in clipped sound.
Fix this by dropping the struct dev_pm_ops::{suspend, resume} and use
instead struct snd_soc_component_driver::{suspend, resume}. This ensures
the proper configuration sequence is handled by the ASoC subsystem.
Cc: stable@vger.kernel.org
Fixes: 431e040065 ("ASoC: da7213: Add suspend to RAM support")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20251104114914.2060603-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Guenter Roeck reported this kernel crash on his emulated B160L machine:
Starting network: udhcpc: started, v1.36.1
Backtrace:
[<104320d4>] unwind_once+0x1c/0x5c
[<10434a00>] walk_stackframe.isra.0+0x74/0xb8
[<10434a6c>] arch_stack_walk+0x28/0x38
[<104e5efc>] stack_trace_save+0x48/0x5c
[<105d1bdc>] set_track_prepare+0x44/0x6c
[<105d9c80>] ___slab_alloc+0xfc4/0x1024
[<105d9d38>] __slab_alloc.isra.0+0x58/0x90
[<105dc80c>] kmem_cache_alloc_noprof+0x2ac/0x4a0
[<105b8e54>] __anon_vma_prepare+0x60/0x280
[<105a823c>] __vmf_anon_prepare+0x68/0x94
[<105a8b34>] do_wp_page+0x8cc/0xf10
[<105aad88>] handle_mm_fault+0x6c0/0xf08
[<10425568>] do_page_fault+0x110/0x440
[<10427938>] handle_interruption+0x184/0x748
[<11178398>] schedule+0x4c/0x190
BUG: spinlock recursion on CPU#0, ifconfig/2420
lock: terminate_lock.2+0x0/0x1c, .magic: dead4ead, .owner: ifconfig/2420, .owner_cpu: 0
While creating the stack trace, the unwinder uses the stack pointer to guess
the previous frame to read the previous stack pointer from memory. The crash
happens, because the unwinder tries to read from unaligned memory and as such
triggers the unalignment trap handler which then leads to the spinlock
recursion and finally to a deadlock.
Fix it by checking the alignment before accessing the memory.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org # v6.12+
Pull btrfs fixes from David Sterba:
- fix memory leak in qgroup relation ioctl when qgroup levels are
invalid
- don't write back dirty metadata on filesystem with errors
- properly log renamed links
- properly mark prealloc extent range beyond inode size as dirty (when
no-noles is not enabled)
* tag 'for-6.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: mark dirty extent range for out of bound prealloc extents
btrfs: set inode flag BTRFS_INODE_COPY_EVERYTHING when logging new name
btrfs: fix memory leak of qgroup_list in btrfs_add_qgroup_relation
btrfs: ensure no dirty metadata is written back for an fs with errors
Song Liu says:
====================
Fix ftrace for livepatch + BPF fexit programs
livepatch and BPF trampoline are two special users of ftrace. livepatch
uses ftrace with IPMODIFY flag and BPF trampoline uses ftrace direct
functions. When livepatch and BPF trampoline with fexit programs attach to
the same kernel function, BPF trampoline needs to call into the patched
version of the kernel function.
1/3 and 2/3 of this patchset fix two issues with livepatch + fexit cases,
one in the register_ftrace_direct path, the other in the
modify_ftrace_direct path.
3/3 adds selftests for both cases.
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
v4: https://patch.msgid.link/20251027175023.1521602-1-song@kernel.org
Changes v3 => v4:
1. Add helper reset_direct. (Steven)
2. Add Reviewed-by from Jiri.
3. Fix minor typo in comments.
v3: https://lore.kernel.org/bpf/20251026205445.1639632-1-song@kernel.org/
Changes v2 => v3:
1. Incorporate feedback by AI, which also fixes build error reported by
Steven and kernel test robot.
v2: https://lore.kernel.org/bpf/20251024182901.3247573-1-song@kernel.org/
Changes v1 => v2:
1. Target bpf tree. (Alexei)
2. Bring back the FTRACE_WARN_ON in __ftrace_hash_update_ipmodify
for valid code paths. (Steven)
3. Update selftests with cleaner way to find livepatch-sample.ko.
(offlline discussion with Ihor)
v1: https://lore.kernel.org/bpf/20251024071257.3956031-1-song@kernel.org/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Raw IP packets have no MAC header, leaving skb->mac_header uninitialized.
This can trigger kernel panics on ARM64 when xfrm or other subsystems
access the offset due to strict alignment checks.
Initialize the MAC header to prevent such crashes.
This can trigger kernel panics on ARM when running IPsec over the
qmimux0 interface.
Example trace:
Internal error: Oops: 000000009600004f [#1] SMP
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.34-gbe78e49cb433 #1
Hardware name: LS1028A RDB Board (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : xfrm_input+0xde8/0x1318
lr : xfrm_input+0x61c/0x1318
sp : ffff800080003b20
Call trace:
xfrm_input+0xde8/0x1318
xfrm6_rcv+0x38/0x44
xfrm6_esp_rcv+0x48/0xa8
ip6_protocol_deliver_rcu+0x94/0x4b0
ip6_input_finish+0x44/0x70
ip6_input+0x44/0xc0
ipv6_rcv+0x6c/0x114
__netif_receive_skb_one_core+0x5c/0x8c
__netif_receive_skb+0x18/0x60
process_backlog+0x78/0x17c
__napi_poll+0x38/0x180
net_rx_action+0x168/0x2f0
Fixes: c6adf77953 ("net: usb: qmi_wwan: add qmap mux protocol support")
Signed-off-by: Qendrim Maxhuni <qendrim.maxhuni@garderos.com>
Link: https://patch.msgid.link/20251029075744.105113-1-qendrim.maxhuni@garderos.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The devm_kcalloc() function never return error pointers, it returns NULL
on failure. Also delete the netdev_err() printk. These allocation
functions already have debug output built-in some the extra error message
is not required.
Fixes: efabce2901 ("octeontx2-pf: AF_XDP zero copy receive support")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aQYKkrGA12REb2sj@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both livepatch and BPF trampoline use ftrace. Special attention is needed
when livepatch and fexit program touch the same function at the same
time, because livepatch updates a kernel function and the BPF trampoline
need to call into the right version of the kernel function.
Use samples/livepatch/livepatch-sample.ko for the test.
The test covers two cases:
1) When a fentry program is loaded first. This exercises the
modify_ftrace_direct code path.
2) When a fentry program is loaded first. This exercises the
register_ftrace_direct code path.
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251027175023.1521602-4-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
ftrace_hash_ipmodify_enable() checks IPMODIFY and DIRECT ftrace_ops on
the same kernel function. When needed, ftrace_hash_ipmodify_enable()
calls ops->ops_func() to prepare the direct ftrace (BPF trampoline) to
share the same function as the IPMODIFY ftrace (livepatch).
ftrace_hash_ipmodify_enable() is called in register_ftrace_direct() path,
but not called in modify_ftrace_direct() path. As a result, the following
operations will break livepatch:
1. Load livepatch to a kernel function;
2. Attach fentry program to the kernel function;
3. Attach fexit program to the kernel function.
After 3, the kernel function being used will not be the livepatched
version, but the original version.
Fix this by adding __ftrace_hash_update_ipmodify() to
__modify_ftrace_direct() and adjust some logic around the call.
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251027175023.1521602-3-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When livepatch is attached to the same function as bpf trampoline with
a fexit program, bpf trampoline code calls register_ftrace_direct()
twice. The first time will fail with -EAGAIN, and the second time it
will succeed. This requires register_ftrace_direct() to unregister
the address on the first attempt. Otherwise, the bpf trampoline cannot
attach. Here is an easy way to reproduce this issue:
insmod samples/livepatch/livepatch-sample.ko
bpftrace -e 'fexit:cmdline_proc_show {}'
ERROR: Unable to attach probe: fexit:vmlinux:cmdline_proc_show...
Fix this by cleaning up the hash when register_ftrace_function_nolock hits
errors.
Also, move the code that resets ops->func and ops->trampoline to the error
path of register_ftrace_direct(); and add a helper function reset_direct()
in register_ftrace_direct() and unregister_ftrace_direct().
Fixes: d05cb47066 ("ftrace: Fix modification of direct_function hash while in use")
Cc: stable@vger.kernel.org # v6.6+
Reported-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Closes: https://lore.kernel.org/live-patching/c5058315a39d4615b333e485893345be@crowdstrike.com/
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251027175023.1521602-2-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The TSO path called ionic_tx_map_skb() before preparing the TCP pseudo
checksum (ionic_tx_tcp_[inner_]pseudo_csum()), which may perform
skb_cow_head() and might modifies bytes in the linear header area.
Mapping first and then mutating the header risks:
- Using a stale DMA address if skb_cow_head() relocates the head, and/or
- Device reading stale header bytes on weakly-ordered systems
(CPU writes after mapping are not guaranteed visible without an
explicit dma_sync_single_for_device()).
Reorder the TX path to perform all header mutations (including
skb_cow_head()) *before* DMA mapping. Mapping is now done only after the
skb layout and header contents are final. This removes the need for any
post-mapping dma_sync and prevents on-wire corruption observed under
VLAN+TSO load after repeated runs.
This change is purely an ordering fix; no functional behavior change
otherwise.
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20251031155203.203031-2-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The TX path currently writes descriptors and then immediately writes to
the MMIO doorbell register to notify the NIC. On weakly ordered
architectures, descriptor writes may still be pending in CPU or DMA
write buffers when the doorbell is issued, leading to the device
fetching stale or incomplete descriptors.
Add a dma_wmb() in ionic_txq_post() to ensure all descriptor writes are
visible to the device before the doorbell MMIO write.
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Link: https://patch.msgid.link/20251031155203.203031-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Wiehler says:
====================
Fix SCTP diag locking issues
- Hold RCU read lock while iterating over address list in
inet_diag_msg_sctpaddrs_fill()
- Prevent TOCTOU out-of-bounds write
- Hold sock lock while iterating over address list in sctp_sock_dump_one()
====================
Link: https://patch.msgid.link/20251028161506.3294376-1-stefan.wiehler@nokia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Horatiu Vultur says:
====================
net: phy: micrel: lan8842 erratas
Add two erratas to the lan8842. The errata document can be found here [1]
The two erratas are:
- module 2 ("Analog front-end not optimized for PHY-side shorted center taps").
- module 7 ("1000BASE-T PMA EEE TX wake timer is non-compliant")
====================
Link: https://patch.msgid.link/20251031121629.814935-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonas Gorski says:
====================
net: dsa: b53: minor fdb related fixes
While investigating and fixing/implenting proper ARL support for
bcm63xx, I encountered multiple minor issues in the current ARL
implementation:
* The ARL multicast support was not properly enabled for older chips,
and instead a potentially reserved bit was toggled.
* While traversing the ARL table, "Search done" triggered one final
entry which will be invalid for 4 ARL bin chips, and failed to stop
the search on chips with only one result register.
* For chips where we have only one result register, we only traversed at
most half the maximum entries.
I also had a fix for IVL_SVL_SELECT which is only valid for some chips,
but since this would only have an effect for !vlan_enabled, and we
always have that enabled, it isn't really worth fixing (and rather drop
the !vlan_enabled paths).
====================
Link: https://patch.msgid.link/20251102100758.28352-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When iterating over the ARL table we stop at max ARL entries / 2, but
this is only valid if the chip actually returns 2 results at once. For
chips with only one result register we will stop before reaching the end
of the table if it is more than half full.
Fix this by only dividing the maximum results by two if we have a chip
with more than one result register (i.e. those with 4 ARL bins).
Fixes: cd169d799b ("net: dsa: b53: Bound check ARL searches")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251102100758.28352-4-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The switch clears the ARL_SRCH_STDN bit when the search is done, i.e. it
finished traversing the ARL table.
This means that there will be no valid result, so we should not attempt
to read and process any further entries.
We only ever check the validity of the entries for 4 ARL bin chips, and
only after having passed the first entry to the b53_fdb_copy().
This means that we always pass an invalid entry at the end to the
b53_fdb_copy(). b53_fdb_copy() does check the validity though before
passing on the entry, so it never gets passed on.
On < 4 ARL bin chips, we will even continue reading invalid entries
until we reach the result limit.
Fixes: 1da6df85c6 ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251102100758.28352-3-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the New Control register bit 1 is either reserved, or has a different
function:
Out of Range Error Discard
When enabled, the ingress port discards any frames
if the Length field is between 1500 and 1536
(excluding 1500 and 1536) and with good CRC.
The actual bit for enabling IP multicast is bit 0, which was only
explicitly enabled for BCM5325 so far.
For older switch chips, this bit defaults to 0, so we want to enable it
as well, while newer switch chips default to 1, and their documentation
says "It is illegal to set this bit to zero."
So drop the wrong B53_IPMC_FWD_EN define, enable the IP multicast bit
also for other switch chips. While at it, rename it to (B53_)IP_MC as
that is how it is called in Broadcom code.
Fixes: 63cc54a6f0 ("net: dsa: b53: Fix egress flooding settings")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251102100758.28352-2-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonas Gorski says:
====================
net: dsa: b53: fix bcm63xx rgmii user ports with speed < 1g
It seems that the integrated switch in bcm63xx does not support polling
external PHYs for link configuration. While the appropriate registers
seem to exist with expected content, changing them does nothing.
This results in user ports with external PHYs only working in 1000/fd,
and not in other modes, despite linking up.
Fix this by writing the link result into the port state override
register, like we already do for fixed links.
With this, ports with lower speeds can successfully transmit and receive
packets.
This also aligns the behaviour with the old bcm63xx_enetsw driver for
those ports.
====================
Link: https://patch.msgid.link/20251101132807.50419-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BCM63XX's switch does not support MDIO scanning of external phys, so its
MACs needs to be manually configured for autonegotiated link speeds.
So b53_force_port_config() and b53_force_link() accordingly also when
mode is MLO_AN_PHY for those ports.
Fixes lower speeds than 1000/full on rgmii ports 4 - 7.
This aligns the behaviour with the old bcm63xx_enetsw driver for those
ports.
Fixes: 967dd82ffc ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251101132807.50419-3-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is no guarantee that the port state override registers have their
default values, as not all switches support being reset via register or
have a reset GPIO.
So when forcing port config, we need to make sure to clear all fields,
which we currently do not do for the speed and flow control
configuration. This can cause flow control stay enabled, or in the case
of speed becoming an illegal value, e.g. configured for 1G (0x2), then
setting 100M (0x1), results in 0x3 which is invalid.
For PORT_OVERRIDE_SPEED_2000M we need to make sure to only clear it on
supported chips, as the bit can have different meanings on other chips,
e.g. for BCM5389 this controls scanning PHYs for link/speed
configuration.
Fixes: 5e004460f8 ("net: dsa: b53: Add helper to set link parameters")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251101132807.50419-2-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The call to device_node_to_regmap() in airoha_mdio_probe() can return
an ERR_PTR() if regmap initialization fails. Currently, the driver
stores the pointer without validation, which could lead to a crash
if it is later dereferenced.
Add an IS_ERR() check and return the corresponding error code to make
the probe path more robust.
Fixes: 67e3ba9783 ("net: mdio: Add MDIO bus controller for Airoha AN7583")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251031161607.58581-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull EDAC fix from Borislav Petkov:
- Fix an off-by-one error in versalnet_edac
* tag 'edac_urgent_for_v6.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/versalnet: Fix off by one in handle_error()
The `rustdoc` modifiers bug [1] was fixed in Rust 1.90.0 [2], for which
we added a workaround in commit abbf9a4494 ("rust: workaround `rustdoc`
target modifiers bug").
However, `rustdoc`'s doctest generation still has a similar issue [3],
being fixed at [4], which does not affect us because we apply the
workaround to both, and now, starting with Rust 1.91.0 (released
2025-10-30), `-Zsanitizer` is a target modifier too [5], which means we
fail with:
RUSTDOC TK rust/kernel/lib.rs
error: mixing `-Zsanitizer` will cause an ABI mismatch in crate `kernel`
--> rust/kernel/lib.rs:3:1
|
3 | //! The `kernel` crate.
| ^
|
= help: the `-Zsanitizer` flag modifies the ABI so Rust crates compiled with different values of this flag cannot be used together safely
= note: unset `-Zsanitizer` in this crate is incompatible with `-Zsanitizer=kernel-address` in dependency `core`
= help: set `-Zsanitizer=kernel-address` in this crate or unset `-Zsanitizer` in `core`
= help: if you are sure this will not cause problems, you may use `-Cunsafe-allow-abi-mismatch=sanitizer` to silence this error
A simple way around is to add the sanitizer to the list in the existing
workaround (especially if we had not started to pass the sanitizer
flags in the previous commit, since in that case that would not be
necessary). However, that still applies the workaround in more cases
than necessary.
Instead, only modify the doctests flags to ignore the check for
sanitizers, so that it is more local (and thus the compiler keeps checking
it for us in the normal `rustdoc` calls). Since the previous commit
already treated the `rustdoc` calls as kernel objects, this should allow
us in the future to easily remove this workaround when the time comes.
By the way, the `-Cunsafe-allow-abi-mismatch` flag overwrites previous
ones rather than appending, so it needs to be all done in the same flag.
Moreover, unknown modifiers are rejected, and thus we have to gate based
on the version too.
Finally, `-Zsanitizer-cfi-normalize-integers` is not affected (in Rust
1.91.0), so it is not needed in the workaround for the moment.
Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust/issues/144521 [1]
Link: https://github.com/rust-lang/rust/pull/144523 [2]
Link: https://github.com/rust-lang/rust/issues/146465 [3]
Link: https://github.com/rust-lang/rust/pull/148068 [4]
Link: https://github.com/rust-lang/rust/pull/138736 [5]
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Link: https://patch.msgid.link/20251102212853.1505384-2-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Even if normally `build_error` isn't a kernel object, it should still
be treated as such so that we pass the same flags. Similarly, `rustdoc`
targets are never kernel objects, but we need to treat them as such.
Otherwise, starting with Rust 1.91.0 (released 2025-10-30), `rustc`
will complain about missing sanitizer flags since `-Zsanitizer` is a
target modifier too [1]:
error: mixing `-Zsanitizer` will cause an ABI mismatch in crate `build_error`
--> rust/build_error.rs:3:1
|
3 | //! Build-time error.
| ^
|
= help: the `-Zsanitizer` flag modifies the ABI so Rust crates compiled with different values of this flag cannot be used together safely
= note: unset `-Zsanitizer` in this crate is incompatible with `-Zsanitizer=kernel-address` in dependency `core`
= help: set `-Zsanitizer=kernel-address` in this crate or unset `-Zsanitizer` in `core`
= help: if you are sure this will not cause problems, you may use `-Cunsafe-allow-abi-mismatch=sanitizer` to silence this error
Thus explicitly mark them as kernel objects.
Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust/pull/138736 [1]
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Link: https://patch.msgid.link/20251102212853.1505384-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
I started seeing this in recent Fedora 42 kernels:
root@x1:~# uname -a
Linux x1 6.17.4-200.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 19 18:47:49 UTC 2025 x86_64 GNU/Linux
root@x1:~#
root@x1:~# perf test 1
1: vmlinux symtab matches kallsyms : FAILED!
root@x1:~#
Related to:
root@x1:~# grep ' 1 ' /proc/kallsyms
ffffffffb098bc00 1 __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
ffffffffb098bc10 1 _RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
root@x1:~#
That is found in:
root@x1:~# pahole --running_kernel_vmlinux
/usr/lib/debug/lib/modules/6.17.4-200.fc42.x86_64/vmlinux
root@x1:~#
root@x1:~# readelf -sW /usr/lib/debug/lib/modules/6.17.4-200.fc42.x86_64/vmlinux | grep __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
150649: ffffffff81f8bc00 16 FUNC LOCAL DEFAULT 1 __pfx__RNCINvNtNtNtCsfwaGRd4cjqE_4core4iter8adapters3map12map_try_foldjNtCskFudTml27HW_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
root@x1:~#
But was being filtered out when reading /proc/kallsyms, as the '1'
symbol type was not being handled, do it, there are just two of them at
this point.
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Benno Lossin <lossin@kernel.org>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Trevor Gross <tmgross@umich.edu>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Revert commit 690de2902d ("i2c: muxes: pca954x: Use reset controller
only") and its dependent commit 94c2967764 ("i2c: muxes: pca954x:
Reset if (de)select fails") because the first breaks all users of the
driver, by requiring a completely optional reset-gpio driver. These
commits cause that mux driver simply stops working when optional
reset-gpio is not included, but that reset-gpio is not pulled anyhow.
Driver cannot remove legacy reset-gpios handling.
Fixes: 690de2902d ("i2c: muxes: pca954x: Use reset controller only")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
To pick the changes in:
885df2d210 ("KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel")
c42856af8f ("KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL)")
That makes 'perf kvm-stat' aware of these new TDCALL and
MSR_{READ,WRITE}_IMM exit reasons, thus addressing the following perf
build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Please see tools/include/uapi/README for further details.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Xin Li <xin@zytor.com>
Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
b8c3c9f5d0 ("x86/apic: Initialize Secure AVIC APIC backing page")
That triggers:
CC /tmp/build/perf-tools/arch/x86/util/kvm-stat.o
LD /tmp/build/perf-tools/arch/x86/util/perf-util-in.o
LD /tmp/build/perf-tools/arch/x86/perf-util-in.o
LD /tmp/build/perf-tools/arch/perf-util-in.o
LD /tmp/build/perf-tools/perf-util-in.o
AR /tmp/build/perf-tools/libperf-util.a
LINK /tmp/build/perf-tools/perf
But this time causes no changes in tooling results, as the introduced
SVM_VMGEXIT_SAVIC exit reason wasn't added to SVM_EXIT_REASONS, that is
used in kvm-stat.c.
And addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
Please see tools/include/uapi/README for further details.
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
fddd07626b ("KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors")
f2f5519aa4 ("KVM: x86: Define Control Protection Exception (#CP) vector")
9d6812d415 ("KVM: x86: Enable guest SSP read/write interface with new uAPIs")
06f2969c6a ("KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support")
That just rebuilds kvm-stat.c on x86, no change in functionality.
This silences these perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Please see tools/include/uapi/README for further details.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When tick values are large, the multiplication by NSEC_PER_SEC is larger
than 64 bits and results in bad conversions.
The issue is seen in PMU busyness counters that look like they have
wrapped around due to bad conversion. i915 PMU implementation returns
monotonically increasing counters. If a count is lesser than previous
one, it will only return the larger value until the smaller value
catches up. The user will see this as zero delta between two
measurements even though the engines are busy.
Fix it by using mul_u64_u32_div()
Fixes: 77cdd054dd ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14955
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/20251016000350.1152382-2-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 2ada9cb1df)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Added the Fixes tag while cherry-picking to fixes]
To pick the changes in:
fe2bf6234e ("KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not set")
d2042d8f96 ("KVM: Rework KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS")
3d3a04fad2 ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the 'perf trace' ioctl syscall argument beautifiers.
This addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Please see tools/include/uapi/README for further details.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is a better way to handle the problem IORING_REGISTER_ZCRX_REFILL
solves. The uapi can also be slightly adjusted to accommodate future
extensions. Remove the feature for now, it'll be reworked for the next
release.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As part of a recent move go exclusively listing git.kernel.org trees
for the block and io_uring development, the "BLOCK LAYER" entry wasn't
updated as it already used git.kernel.org. However, outside of just
moving from git.kernel.dk to git.kernel.org, the "block" part of the
trees was also dropped, as the tree serves both block and io_uring
development trees.
Fix up the "BLOCK LAYER" entry so they all use the same tree.
Reported-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Picking the changes from:
0864197382 ("drm: Move drm_gem ioctl kerneldoc to uapi file")
53096728b8 ("drm: Add DRM prime interface to reassign GEM handle")
Addressing these perf build warnings:
Warning: Kernel ABI header differences:
Now 'perf trace' and other code that might use the tools/perf/trace/beauty
autogenerated tables will be able to translate this new ioctl command into
a string:
$ tools/perf/trace/beauty/drm_ioctl.sh > before
$ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h
$ tools/perf/trace/beauty/drm_ioctl.sh > after
$ diff -u before after
--- before 2025-11-03 09:57:34.832553174 -0300
+++ after 2025-11-03 09:57:47.969409428 -0300
@@ -111,6 +111,7 @@
[0xCF] = "SYNCOBJ_EVENTFD",
[0xD0] = "MODE_CLOSEFB",
[0xD1] = "SET_CLIENT_NAME",
+ [0xD2] = "GEM_CHANGE_HANDLE",
[DRM_COMMAND_BASE + 0x00] = "I915_INIT",
[DRM_COMMAND_BASE + 0x01] = "I915_FLUSH",
[DRM_COMMAND_BASE + 0x02] = "I915_FLIP",
$
Please see tools/include/uapi/README for further details.
Cc: Christian König <christian.koenig@amd.com>
Cc: David Francis <David.Francis@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Looking up a GPIO controller by label that is the name of the software
node is wonky at best - the GPIO controller driver is free to set
a different label than the name of its firmware node. We're already being
passed a firmware node handle attached to the GPIO device to
swnode_get_gpio_device() so use it instead for a more precise lookup.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Fixes: e7f9ff5dc9 ("gpiolib: add support for software nodes")
Link: https://lore.kernel.org/r/20251103-reset-gpios-swnodes-v4-4-6461800b6775@linaro.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Update the text for firmware file naming to show that the l?u? suffix is
supported on CS35L56 B0 silicon and ampN was only used on early firmware.
The previous version of this text only said that B0 silicon used the ampN
suffix. Since kernel 6.16 the driver supports both the old ampN and
new l?u? suffix for B0 silicon. New firmwares will use the l?u? suffix.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20251103115809.33953-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Recent AMD node rework removed the "search and count" method of caching AMD
root devices. This depended on the value from a Data Fabric register that was
expected to hold the PCI bus of one of the root devices attached to that
fabric.
However, this expectation is incorrect. The register, when read from PCI
config space, returns the bitwise-OR of the buses of all attached root
devices.
This behavior is benign on AMD reference design boards, since the bus numbers
are aligned. This results in a bitwise-OR value matching one of the buses. For
example, 0x00 | 0x40 | 0xA0 | 0xE0 = 0xE0.
This behavior breaks on boards where the bus numbers are not exactly aligned.
For example, 0x00 | 0x07 | 0xE0 | 0x15 = 0x1F.
The examples above are for AMD node 0. The first root device on other nodes
will not be 0x00. The first root device for other nodes will depend on the
total number of root devices, the system topology, and the specific PCI bus
number assignment.
For example, a system with 2 AMD nodes could have this:
Node 0 : 0x00 0x07 0x0e 0x15
Node 1 : 0x1c 0x23 0x2a 0x31
The bus numbering style in the reference boards is not a requirement. The
numbering found in other boards is not incorrect. Therefore, the root device
caching method needs to be adjusted.
Go back to the "search and count" method used before the recent rework.
Search for root devices using PCI class code rather than fixed PCI IDs.
This keeps the goal of the rework (remove dependency on PCI IDs) while being
able to support various board designs.
Merge helper functions to reduce code duplication.
[ bp: Reflow comment. ]
Fixes: 40a5f6ffdf ("x86/amd_nb: Simplify root device search")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/all/20251028-fix-amd-root-v2-1-843e38f8be2c@amd.com
This was supposed to pass "onenand" instead of "&onenand" with the
ampersand. Passing a random stack address which will be gone when the
function ends makes no sense. However the good thing is that the pointer
is never used, so this doesn't cause a problem at run time.
Fixes: e23abf4b77 ("mtd: OneNAND: S5PC110: Implement DMA interrupt method")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
cpu-clock usage by the async-profiler tool can trigger a system hang,
which got bisected back to the following commit by Octavia Togami:
18dbcbfabf ("perf: Fix the POLL_HUP delivery breakage") causes this issue
The root cause of the hang is that cpu-clock is a special type of SW
event which relies on hrtimers. The __perf_event_overflow() callback
is invoked from the hrtimer handler for cpu-clock events, and
__perf_event_overflow() tries to call cpu_clock_event_stop()
to stop the event, which calls htimer_cancel() to cancel the hrtimer.
But that's a recursion into the hrtimer code from a hrtimer handler,
which (unsurprisingly) deadlocks.
To fix this bug, use hrtimer_try_to_cancel() instead, and set
the PERF_HES_STOPPED flag, which causes perf_swevent_hrtimer()
to stop the event once it sees the PERF_HES_STOPPED flag.
[ mingo: Fixed the comments and improved the changelog. ]
Closes: https://lore.kernel.org/all/CAHPNGSQpXEopYreir+uDDEbtXTBvBvi8c6fYXJvceqtgTPao3Q@mail.gmail.com/
Fixes: 18dbcbfabf ("perf: Fix the POLL_HUP delivery breakage")
Reported-by: Octavia Togami <octavia.togami@gmail.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Octavia Togami <octavia.togami@gmail.com>
Cc: stable@vger.kernel.org
Link: https://github.com/lucko/spark/issues/530
Link: https://patch.msgid.link/20251015051828.12809-1-dapeng1.mi@linux.intel.com
In the past two years, I have focused on EROFS and contributed features
including the reserved buffer pool, configurable global buffer pool, and
the ongoing direct I/O support for compressed data.
I would like to continue contributing to EROFS and help with code
reviews. Please CC me on EROFS-related changes.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Chao Yu <chao@kernel.org>
Acked-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
The future move of pin-init to `syn` uncovers the following broken
intra-doc link:
error: unresolved link to `crate::pin_init`
--> rust/kernel/sync/condvar.rs:39:40
|
39 | /// instances is with the [`pin_init`](crate::pin_init!) and [`new_condvar`] macros.
| ^^^^^^^^^^^^^^^^ no item named `pin_init` in module `kernel`
|
= note: `-D rustdoc::broken-intra-doc-links` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(rustdoc::broken_intra_doc_links)]`
Currently, when rendered, the link points to a literal `crate::pin_init!`
URL.
Thus fix it.
Cc: stable@vger.kernel.org
Fixes: 129e97be8e ("rust: pin-init: fix documentation links")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251029073344.349341-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
The future move of pin-init to `syn` uncovers the following private
intra-doc link:
error: public documentation for `Devres` links to private item `Self::inner`
--> rust/kernel/devres.rs:106:7
|
106 | /// [`Self::inner`] is guaranteed to be initialized and is always accessed read-only.
| ^^^^^^^^^^^ this item is private
|
= note: this link will resolve properly if you pass `--document-private-items`
= note: `-D rustdoc::private-intra-doc-links` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(rustdoc::private_intra_doc_links)]`
Currently, when rendered, the link points to "nowhere" (an inexistent
anchor for a "method").
Thus fix it.
Cc: stable@vger.kernel.org
Fixes: f5d3ef25d2 ("rust: devres: get rid of Devres' inner Arc")
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20251029071406.324511-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
To pick the changes from:
e19c062199 ("x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC)")
7b59c73fd6 ("x86/cpufeatures: Add SNP Secure TSC")
3c7cb84145 ("x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions")
2f8f173413 ("x86/vmscape: Add conditional IBPB mitigation")
a508cec6e5 ("x86/vmscape: Enumerate VMSCAPE bug")
This causes these perf files to be rebuilt and brings some X86_FEATURE
that may be used by:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Please see tools/include/uapi/README for further details.
Cc: Babu Moger <babu.moger@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Xin Li <xin@zytor.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in this cset:
56101b69c9 ("uprobes/x86: Add uprobe syscall to speed up uprobe")
That add support for this new 'uprobe' syscall in tools such as 'perf trace'.
Now it is possible to do a system wide 'perf trace' to look if this new
syscall is being used:
root@number:~# perf trace -v -e uprobe
<SNIP>
event qualifier tracepoint filter: (common_pid != 33989) && (id == 336)
^C
root@number#
$ grep -w uprobe tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
336 common uprobe sys_uprobe
$
This addresses these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
Please see tools/include/uapi/README for further details.
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in this cset:
e83f0b5d10 ("nsfs: support exhaustive file handles")
That doesn't introduce anything of interest for tools/, just addresses
these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Please see tools/include/uapi/README for further details.
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in these csets:
8cdc4d2701 ("mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED")
9dc21bbd62 ("prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE")
That don't introduce anything of interest for the tools/, just
addressing these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h
Please see tools/include/uapi/README for further details.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up changes from:
db2ab24a34 ("Add RWF_NOSIGNAL flag for pwritev2")
These are used to beautify fs syscall arguments, albeit the changes in
this update are not affecting those beautifiers.
This addresses these tools/ build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lauri Vasama <git@vasama.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is a race between operations that iterate over the userdata
cg_children list and concurrent add/remove of userdata items through
configfs. The update_userdata() function iterates over the
nt->userdata_group.cg_children list, and count_extradata_entries() also
iterates over this same list to count nodes.
Quoting from Documentation/filesystems/configfs.rst:
> A subsystem can navigate the cg_children list and the ci_parent pointer
> to see the tree created by the subsystem. This can race with configfs'
> management of the hierarchy, so configfs uses the subsystem mutex to
> protect modifications. Whenever a subsystem wants to navigate the
> hierarchy, it must do so under the protection of the subsystem
> mutex.
Without proper locking, if a userdata item is added or removed
concurrently while these functions are iterating, the list can be
accessed in an inconsistent state. For example, the list_for_each() loop
can reach a node that is being removed from the list by list_del_init()
which sets the nodes' .next pointer to point to itself, so the loop will
never end (or reach the WARN_ON_ONCE in update_userdata() ).
Fix this by holding the configfs subsystem mutex (su_mutex) during all
operations that iterate over cg_children.
This includes:
- userdatum_value_store() which calls update_userdata() to iterate over
cg_children
- All sysdata_*_enabled_store() functions which call
count_extradata_entries() to iterate over cg_children
The su_mutex must be acquired before dynamic_netconsole_mutex to avoid
potential lock ordering issues, as configfs operations may already hold
su_mutex when calling into our code.
Fixes: df03f830d0 ("net: netconsole: cache userdata formatted string in netconsole_target")
Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Link: https://patch.msgid.link/20251029-netconsole-fix-warn-v1-1-0d0dd4622f48@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After registering a VLAN device and setting its feature flags, we need to
synchronize the VLAN features with the lower device. For example, the VLAN
device does not have the NETIF_F_LRO flag, it should be synchronized with
the lower device based on the NETIF_F_UPPER_DISABLES definition.
As the dev->vlan_features has changed, we need to call
netdev_update_features(). The caller must run after netdev_upper_dev_link()
links the lower devices, so this patch adds the netdev_update_features()
call in register_vlan_dev().
Fixes: fd867d51f8 ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251030073539.133779-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The script "ethtool-common.sh" is not installed in INSTALL_PATH, and
triggers some errors when I try to run the test
'drivers/net/netdevsim/ethtool-coalesce.sh':
TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-coalesce.sh
# ./ethtool-coalesce.sh: line 4: ethtool-common.sh: No such file or directory
# ./ethtool-coalesce.sh: line 25: make_netdev: command not found
# ethtool: bad command line argument(s)
# ./ethtool-coalesce.sh: line 124: check: command not found
# ./ethtool-coalesce.sh: line 126: [: -eq: unary operator expected
# FAILED /0 checks
not ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh # exit=1
Install this file to avoid this error. After this patch:
TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-coalesce.sh
# PASSED all 22 checks
ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh
Fixes: fbb8531e58 ("selftests: extract common functions in ethtool-common.sh")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20251030040340.3258110-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In hfcsusb_probe(), the memory allocated for ctrl_urb gets leaked when
setup_instance() fails with an error code. Fix that by freeing the urb
before freeing the hw structure. Also change the error paths to use the
goto ladder style.
Compile tested only. Issue found using a prototype static analysis tool.
Fixes: 69f52adb2d ("mISDN: Add HFC USB driver")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Link: https://patch.msgid.link/20251030042524.194812-1-nihaal@cse.iitm.ac.in
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The GRO self-test, gro.c, currently constructs IPv6 packets containing a
Hop-by-Hop Options header (IPPROTO_HOPOPTS) to ensure the GRO path
correctly handles IPv6 extension headers.
However, network elements may be configured to drop packets with the
Hop-by-Hop Options header (HBH). This causes the self-test to fail
in environments where such network elements are present.
To improve the robustness and reliability of this test in diverse
network environments, switch from using IPPROTO_HOPOPTS to
IPPROTO_DSTOPTS (Destination Options).
The Destination Options header is less likely to be dropped by
intermediate routers and still serves the core purpose of the test:
validating GRO's handling of an IPv6 extension header. This change
ensures the test can execute successfully without being incorrectly
failed by network policies outside the kernel's control.
Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030060436.1556664-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to the gro_sender sending data packets and FIN packets
in very quick succession, these are received almost simultaneously
by the gro_receiver. FIN packets are sometimes processed before the
data packets leading to intermittent (~1/100) test failures.
This change adds a delay of 100ms before sending FIN packets
in gro:tcp test to avoid the out-of-order delivery. The same
mitigation already exists for the gro:ip test.
Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030062818.1562228-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The internal switch on BCM63XX SoCs will unconditionally add 802.1Q VLAN
tags on egress to CPU when 802.1Q mode is enabled. We do this
unconditionally since commit ed409f3bba ("net: dsa: b53: Configure
VLANs while not filtering").
This is fine for VLAN aware bridges, but for standalone ports and vlan
unaware bridges this means all packets are tagged with the default VID,
which is 0.
While the kernel will treat that like untagged, this can break userspace
applications processing raw packets, expecting untagged traffic, like
STP daemons.
This also breaks several bridge tests, where the tcpdump output then
does not match the expected output anymore.
Since 0 isn't a valid VID, just strip out the VLAN tag if we encounter
it, unless the priority field is set, since that would be a valid tag
again.
Fixes: 964dbf186e ("net: dsa: tag_brcm: add support for legacy tags")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20251027194621.133301-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The PTP core falls back to gettimex64 and getcrosststamp when
getcycles64 or getcyclesx64 are not implemented. This causes the CYCLES
ioctls to retrieve PHC real time instead of free-running cycles.
Reject PTP_SYS_OFFSET_{PRECISE,EXTENDED}_CYCLES for clocks without
free-running counter support since the result would represent PHC real
time and system time rather than cycles and system time.
Fixes: faf23f54d3 ("ptp: Add ioctl commands to expose raw cycle counter values")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251029083813.2276997-1-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tim Hostetler says:
====================
gve: Fix NULL dereferencing with PTP clock
This patch series fixes NULL dereferences that are possible with gve's
PTP clock due to not stubbing certain ptp_clock_info callbacks.
====================
Link: https://patch.msgid.link/20251029184555.3852952-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In commit 296602b8e5 ("arm64: dts: rockchip: Move RK3399 OPPs to dtsi
files for SoC variants"), everything shared between variants of RK3399
was put into rk3399-base.dtsi and the rest in variant-specific DTSI,
such as rk3399-t, rk3399-op1, rk3399, etc.
Therefore, the variant-specific DTSI should include rk3399-base.dtsi and
not another variant's DTSI.
rk3399-op1 wrongly includes rk3399 (a variant) DTSI instead of
rk3399-base DTSI, let's fix this oversight by including the intended
DTSI.
Fortunately, this had no impact on the resulting DTB since all nodes
were named the same and all node properties were overridden in
rk3399-op1.dtsi. This was checked by doing a checksum of rk3399-op1 DTBs
before and after this commit.
No intended change in behavior.
Fixes: 296602b8e5 ("arm64: dts: rockchip: Move RK3399 OPPs to dtsi files for SoC variants")
Cc: stable@vger.kernel.org
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
Reviewed-by: Dragan Simic <dsimic@manjaro.org>
Link: https://patch.msgid.link/20251029-rk3399-op1-include-v1-1-2472ee60e7f8@cherry.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btrtl: Fix memory leak in rtlbt_parse_firmware_v2()
- MGMT: Fix OOB access in parse_adv_monitor_pattern()
- hci_event: validate skb length for unknown CC opcode
* tag 'for-net-2025-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Fix OOB access in parse_adv_monitor_pattern()
Bluetooth: btrtl: Fix memory leak in rtlbt_parse_firmware_v2()
Bluetooth: hci_event: validate skb length for unknown CC opcode
====================
Link: https://patch.msgid.link/20251031170959.590470-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Couple of new fixes:
- ath10k: revert a patch that had caused issues on some devices
- cfg80211/mac80211: use hrtimers for some things where the
precise timing matters
- zd1211rw: fix a long-standing potential leak
* tag 'wireless-2025-10-30' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: zd1211rw: fix potential memory leak in __zd_usb_enable_rx()
wifi: mac80211: use wiphy_hrtimer_work for csa.switch_work
wifi: mac80211: use wiphy_hrtimer_work for ml_reconf_work
wifi: mac80211: use wiphy_hrtimer_work for ttlm_work
wifi: cfg80211: add an hrtimer based delayed work item
Revert "wifi: ath10k: avoid unnecessary wait for service ready message"
====================
Link: https://patch.msgid.link/20251030104919.12871-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the parse_adv_monitor_pattern() function, the value of
the 'length' variable is currently limited to HCI_MAX_EXT_AD_LENGTH(251).
The size of the 'value' array in the mgmt_adv_pattern structure is 31.
If the value of 'pattern[i].length' is set in the user space
and exceeds 31, the 'patterns[i].value' array can be accessed
out of bound when copied.
Increasing the size of the 'value' array in
the 'mgmt_adv_pattern' structure will break the userspace.
Considering this, and to avoid OOB access revert the limits for 'offset'
and 'length' back to the value of HCI_MAX_AD_LENGTH.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: db08722fc7 ("Bluetooth: hci_core: Fix missing instances using HCI_MAX_AD_LENGTH")
Cc: stable@vger.kernel.org
Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The memory allocated for ptr using kvmalloc() is not freed on the last
error path. Fix that by freeing it on that error path.
Fixes: 9a24ce5e29 ("Bluetooth: btrtl: Firmware format v2 support")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Commit e0762fd26a ("rtc: cpcap: Fix initial enable_irq/disable_irq
balance") set 'alarm_enabled' prior to calling the function
devm_request_threaded_irq() because this enables the IRQ. However, right
after calling devm_request_threaded_irq(), the driver calls
disable_irq() to disable the IRQ and so now 'alarm_enabled' will be true
but the IRQ is actually disabled. Revert this commit to fix the
'alarm_enabled' state.
Fixes: e0762fd26a ("rtc: cpcap: Fix initial enable_irq/disable_irq balance")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20251031103741.945460-2-jonathanh@nvidia.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Commit 1502fe0e97 ("rtc: tps6586x: Fix initial enable_irq/disable_irq
balance") breaks the wake-up alarm for the tps6586x. After this commit
was added RTC wake ups from suspend stopped working on the Tegra20
Ventana platform.
The problem is that this change set the 'irq_en' variable to true prior
to calling devm_request_threaded_irq() to indicate that the IRQ is
enabled, however, it was over looked that the flag IRQ_NOAUTOEN is
already set meaning that the IRQ is not enabled by default. This
prevents the IRQ from being enabled as expected. Revert this change to
fix this.
Fixes: 1502fe0e97 ("rtc: tps6586x: Fix initial enable_irq/disable_irq balance")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20251031103741.945460-1-jonathanh@nvidia.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Use a scope-based cleanup helper for the buffer allocated with kmalloc()
in ntrig_report_version() to simplify the cleanup logic and prevent
memory leaks (specifically the !hid_is_usb()-case one).
[jkosina@suse.com: elaborate on the actual existing leak]
Fixes: 185c926283 ("HID: hid-ntrig: fix unable to handle page fault in ntrig_report_version()")
Signed-off-by: Masami Ichikawa <masami256@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
In btrfs_fallocate(), when the allocated range overlaps with a prealloc
extent and the extent starts after i_size, the range doesn't get marked
dirty in file_extent_tree. This results in persisting an incorrect
disk_i_size for the inode when not using the no-holes feature.
This is reproducible since commit 41a2ee75aa ("btrfs: introduce
per-inode file extent tree"), then became hidden since commit 3d7db6e8bd
("btrfs: don't allocate file extent tree for non regular files") and then
visible again after commit 8679d2687c ("btrfs: initialize
inode::file_extent_tree after i_mode has been set"), which fixes the
previous commit.
The following reproducer triggers the problem:
$ cat test.sh
MNT=/mnt/test
DEV=/dev/vdb
mkdir -p $MNT
mkfs.btrfs -f -O ^no-holes $DEV
mount $DEV $MNT
touch $MNT/file1
fallocate -n -o 1M -l 2M $MNT/file1
umount $MNT
mount $DEV $MNT
len=$((1 * 1024 * 1024))
fallocate -o 1M -l $len $MNT/file1
du --bytes $MNT/file1
umount $MNT
mount $DEV $MNT
du --bytes $MNT/file1
umount $MNT
Running the reproducer gives the following result:
$ ./test.sh
(...)
2097152 /mnt/test/file1
1048576 /mnt/test/file1
The difference is exactly 1048576 as we assigned.
Fix by adding a call to btrfs_inode_set_file_extent_range() in
btrfs_fallocate_update_isize().
Fixes: 41a2ee75aa ("btrfs: introduce per-inode file extent tree")
Signed-off-by: austinchang <austinchang@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we are logging a new name make sure our inode has the runtime flag
BTRFS_INODE_COPY_EVERYTHING set so that at btrfs_log_inode() we will find
new inode refs/extrefs in the subvolume tree and copy them into the log
tree.
We are currently doing it when adding a new link but we are missing it
when renaming.
An example where this makes a new name not persisted:
1) create symlink with name foo in directory A
2) fsync directory A, which persists the symlink
3) rename the symlink from foo to bar
4) fsync directory A to persist the new symlink name
Step 4 isn't working correctly as it's not logging the new name and also
leaving the old inode ref in the log tree, so after a power failure the
symlink still has the old name of "foo". This is because when we first
fsync directoy A we log the symlink's inode (as it's a new entry) and at
btrfs_log_inode() we set the log mode to LOG_INODE_ALL and then because
we are using that mode and the inode has the runtime flag
BTRFS_INODE_NEEDS_FULL_SYNC set, we clear that flag as well as the flag
BTRFS_INODE_COPY_EVERYTHING. That means the next time we log the inode,
during the rename through the call to btrfs_log_new_name() (calling
btrfs_log_inode_parent() and then btrfs_log_inode()), we will not search
the subvolume tree for new refs/extrefs and jump directory to the
'log_extents' label.
Fix this by making sure we set BTRFS_INODE_COPY_EVERYTHING on an inode
when we are about to log a new name. A test case for fstests will follow
soon.
Reported-by: Vyacheslav Kovalevsky <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/ac949c74-90c2-4b9a-b7fd-1ffc5c3175c7@gmail.com/
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When btrfs_add_qgroup_relation() is called with invalid qgroup levels
(src >= dst), the function returns -EINVAL directly without freeing the
preallocated qgroup_list structure passed by the caller. This causes a
memory leak because the caller unconditionally sets the pointer to NULL
after the call, preventing any cleanup.
The issue occurs because the level validation check happens before the
mutex is acquired and before any error handling path that would free
the prealloc pointer. On this early return, the cleanup code at the
'out' label (which includes kfree(prealloc)) is never reached.
In btrfs_ioctl_qgroup_assign(), the code pattern is:
prealloc = kzalloc(sizeof(*prealloc), GFP_KERNEL);
ret = btrfs_add_qgroup_relation(trans, sa->src, sa->dst, prealloc);
prealloc = NULL; // Always set to NULL regardless of return value
...
kfree(prealloc); // This becomes kfree(NULL), does nothing
When the level check fails, 'prealloc' is never freed by either the
callee or the caller, resulting in a 64-byte memory leak per failed
operation. This can be triggered repeatedly by an unprivileged user
with access to a writable btrfs mount, potentially exhausting kernel
memory.
Fix this by freeing prealloc before the early return, ensuring prealloc
is always freed on all error paths.
Fixes: 4addc1ffd6 ("btrfs: qgroup: preallocate memory before adding a relation")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
During development of a minor feature (make sure all btrfs_bio::end_io()
is called in task context), I noticed a crash in generic/388, where
metadata writes triggered new works after btrfs_stop_all_workers().
It turns out that it can even happen without any code modification, just
using RAID5 for metadata and the same workload from generic/388 is going
to trigger the use-after-free.
[CAUSE]
If btrfs hits an error, the fs is marked as error, no new
transaction is allowed thus metadata is in a frozen state.
But there are some metadata modifications before that error, and they are
still in the btree inode page cache.
Since there will be no real transaction commit, all those dirty folios
are just kept as is in the page cache, and they can not be invalidated
by invalidate_inode_pages2() call inside close_ctree(), because they are
dirty.
And finally after btrfs_stop_all_workers(), we call iput() on btree
inode, which triggers writeback of those dirty metadata.
And if the fs is using RAID56 metadata, this will trigger RMW and queue
new works into rmw_workers, which is already stopped, causing warning
from queue_work() and use-after-free.
[FIX]
Add a special handling for write_one_eb(), that if the fs is already in
an error state, immediately mark the bbio as failure, instead of really
submitting them.
Then during close_ctree(), iput() will just discard all those dirty
tree blocks without really writing them back, thus no more new jobs for
already stopped-and-freed workqueues.
The extra discard in write_one_eb() also acts as an extra safenet.
E.g. the transaction abort is triggered by some extent/free space
tree corruptions, and since extent/free space tree is already corrupted
some tree blocks may be allocated where they shouldn't be (overwriting
existing tree blocks). In that case writing them back will further
corrupting the fs.
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's currently no verification for host issued ranges in most of the
pKVM memory transitions. The end boundary might therefore be subject to
overflow and later checks could be evaded.
Close this loophole with an additional pfn_range_is_valid() check on a
per public function basis. Once this check has passed, it is safe to
convert pfn and nr_pages into a phys_addr_t and a size.
host_unshare_guest transition is already protected via
__check_host_shared_guest(), while assert_host_shared_guest() callers
are already ignoring host checks.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20251016164541.3771235-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
get-reg-list includes ZCR_EL2 in the list of EL2 registers that it looks
for when NV is enabled but does not have any feature gate for this register,
meaning that testing any combination of features that includes EL2 but does
not include SVE will result in a test failure due to a missing register
being reported:
| The following lines are missing registers:
|
| ARM64_SYS_REG(3, 4, 1, 2, 0),
Add ZCR_EL2 to feat_id_regs so that the test knows not to expect to see it
without SVE being enabled.
Fixes: 3a90b6f279 ("KVM: arm64: selftests: get-reg-list: Add base EL2 registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since GITS_TYPER.PTA == 0, the ITS MAPC command demands a CPU ID,
rather than a physical redistributor address, for its RDbase
command argument.
As such, when MAPC-ing guest ITS collections, vgic_lpi_stress iterates
over CPU IDs in the range [0, nr_cpus), passing them as the RDbase
vcpu_id argument to its_send_mapc_cmd().
However, its_encode_target() in the its_send_mapc_cmd() selftest
handler expects RDbase arguments to be formatted with a 16 bit
offset, as shown by the 16-bit target_addr right shift its implementation:
its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16)
At the moment, all CPU IDs passed into its_send_mapc_cmd() have no
offset, therefore becoming 0x0 after the bit shift. Thus, when
vgic_its_cmd_handle_mapc() receives the ITS command in vgic-its.c,
it always interprets the RDbase target CPU as CPU 0. All interrupts
sent to collections will be processed by vCPU 0, which defeats the
purpose of this multi-vCPU test.
Fix by creating procnum_to_rdbase() helper function, which left-shifts
the vCPU parameter received by its_send_mapc_cmd 16 bits before passing
it to its_encode_target for encoding.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://patch.msgid.link/20251020145946.48288-1-mdittgen@amazon.de
Signed-off-by: Marc Zyngier <maz@kernel.org>
SONiX AK870 PRO keyboard pretends to be an apple keyboard by VID:PID,
rendering function keys not treated properly. Despite being a
SONiX USB DEVICE, it uses a different name, so adding it to the list.
Signed-off-by: April Grimoire <april@aprilg.moe>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Add a check to ensure locally generated packets (skb->sk != NULL) do
not use direct output in tunnel mode, as these packets require proper
L2 header setup that is handled by the normal XFRM processing path.
Fixes: 5eddd76ec2 ("xfrm: fix tunnel mode TX datapath in packet offload mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The GSO segmentation functions for ESP tunnel mode
(xfrm4_tunnel_gso_segment and xfrm6_tunnel_gso_segment) were
determining the inner packet's L2 protocol type by checking the static
x->inner_mode.family field from the xfrm state.
This is unreliable. In tunnel mode, the state's actual inner family
could be defined by x->inner_mode.family or by
x->inner_mode_iaf.family. Checking only the former can lead to a
mismatch with the actual packet being processed, causing GSO to create
segments with the wrong L2 header type.
This patch fixes the bug by deriving the inner mode directly from the
packet's inner protocol stored in XFRM_MODE_SKB_CB(skb)->protocol.
Instead of replicating the code, this patch modifies the
xfrm_ip2inner_mode helper function. It now correctly returns
&x->inner_mode if the selector family (x->sel.family) is already
specified, thereby handling both specific and AF_UNSPEC cases
appropriately.
With this change, ESP GSO can use xfrm_ip2inner_mode to get the
correct inner mode. It doesn't affect existing callers, as the updated
logic now mirrors the checks they were already performing externally.
Fixes: 26dbd66eab ("esp: choose the correct inner protocol for GSO on inter address family tunnels")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In the output path, xfrm_dev_offload_ok and xfrm_get_inner_ipproto
need to determine the protocol family of the inner packet (skb) before
it gets encapsulated.
In xfrm_dev_offload_ok, the code checked x->inner_mode.family. This is
unreliable because, for states handling both IPv4 and IPv6, the
relevant inner family could be either x->inner_mode.family or
x->inner_mode_iaf.family. Checking only the former can lead to a
mismatch with the actual packet being processed.
In xfrm_get_inner_ipproto, the code checked x->outer_mode.family. This
is also incorrect for tunnel mode, as the inner packet's family can be
different from the outer header's family.
At both of these call sites, the skb variable holds the original inner
packet. The most direct and reliable source of truth for its protocol
family is its destination entry. This patch fixes the issue by using
skb_dst(skb)->ops->family to ensure protocol-specific headers are only
accessed for the correct packet type.
Fixes: 91d8a53db2 ("xfrm: fix offloading of cross-family tunnels")
Fixes: 45a98ef492 ("net/xfrm: IPsec tunnel mode fix inner_ipproto setting in sec_path")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The keyboard of this device has the following in its report description
for Usage (Keyboard) in Collection (Application):
# 0x15, 0x00, // Logical Minimum (0) 52
# 0x25, 0x65, // Logical Maximum (101) 54
# 0x05, 0x07, // Usage Page (Keyboard) 56
# 0x19, 0x00, // Usage Minimum (0) 58
# 0x29, 0xdd, // Usage Maximum (221) 60
# 0x81, 0x00, // Input (Data,Arr,Abs) 62
Since the Usage Min/Max range exceeds the Logical Min/Max range,
keypresses outside the Logical range are not recognized. This includes,
for example, the Japanese language keyboard variant's keys for |, _ and
\.
Fixup the report description to make the Logical range match the Usage
range, fixing the interpretation of keypresses above 101 on this device.
Signed-off-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Based on available evidence, the USB ID 4c4a:4155 used by multiple
devices has been attributed to Jieli. The commit 1a8953f4f7
("HID: Add IGNORE quirk for SMARTLINKTECHNOLOGY") affected touchscreen
functionality. Added checks for manufacturer and serial number to
maintain microphone compatibility, enabling both devices to function
properly.
[jkosina@suse.com: edit shortlog]
Fixes: 1a8953f4f7 ("HID: Add IGNORE quirk for SMARTLINKTECHNOLOGY")
Cc: stable@vger.kernel.org
Tested-by: staffan.melin@oscillator.se
Reviewed-by: Terry Junge <linuxhid@cosmicgizmosystems.com>
Signed-off-by: Zhang Heng <zhangheng@kylinos.cn>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
After DME Link Startup, the error return value is set to the MIPI UniPro
GenericErrorCode which can be 0 (SUCCESS) or 1 (FAILURE). Upon failure
during driver probe, the error code 1 is propagated back to the driver
probe function which must return a negative value to indicate an error,
but 1 is not negative, so the probe is considered to be successful even
though it failed. Subsequently, removing the driver results in an oops
because it is not in a valid state.
This happens because none of the callers of ufshcd_init() expect a
non-negative error code.
Fix the return value and documentation to match actual usage.
Fixes: 69f5eb78d4 ("scsi: ufs: core: Move the ufshcd_device_init(hba, true) call")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251024085918.31825-5-adrian.hunter@intel.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ufshcd_link_startup() has a facility (link_startup_again) to issue
DME_LINKSTARTUP a 2nd time even though the 1st time was successful.
Some older hardware benefits from that, however the behaviour is
non-standard, and has been found to cause link startup to be unreliable
for some Intel Alder Lake based host controllers.
Add UFSHCD_QUIRK_PERFORM_LINK_STARTUP_ONCE to suppress
link_startup_again, in preparation for setting the quirk for affected
controllers.
Fixes: 7dc9fb47bc ("scsi: ufs: ufs-pci: Add support for Intel ADL")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251024085918.31825-3-adrian.hunter@intel.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Intel platforms with UFS, can support Suspend-to-Idle (S0ix) and
Suspend-to-RAM (S3). For S0ix the link state should be HIBERNATE. For
S3, state is lost, so the link state must be OFF. Driver policy,
expressed by spm_lvl, can be 3 (link HIBERNATE, device SLEEP) for S0ix
but must be changed to 5 (link OFF, device POWEROFF) for S3.
Fix support for S0ix/S3 by switching spm_lvl as needed. During suspend
->prepare(), if the suspend target state is not Suspend-to-Idle, ensure
the spm_lvl is at least 5 to ensure that resume will be possible from
deep sleep states. During suspend ->complete(), restore the spm_lvl to
its original value that is suitable for S0ix.
This fix is first needed in Intel Alder Lake based controllers.
Fixes: 7dc9fb47bc ("scsi: ufs: ufs-pci: Add support for Intel ADL")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251024085918.31825-2-adrian.hunter@intel.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Patch "Make HID attributes visible" is needed for older kernel versions
(e.g. 6.12) where ufs_get_device_desc() is called from ufshcd_probe_hba().
In these older kernel versions ufshcd_get_device_desc() may be called
after the sysfs attributes have been added. In the upstream kernel however
ufshcd_get_device_desc() is called before ufs_sysfs_add_nodes(). See also
the ufshcd_device_params_init() call from ufshcd_init(). Hence, calling
sysfs_update_group() is not necessary.
See also commit 69f5eb78d4 ("scsi: ufs: core: Move the
ufshcd_device_init(hba, true) call") in kernel v6.13.
This patch fixes the following kernel warning:
sysfs: cannot create duplicate filename '/devices/platform/3c2d0000.ufs/hid'
Workqueue: async async_run_entry_fn
Call trace:
dump_backtrace+0xfc/0x17c
show_stack+0x18/0x28
dump_stack_lvl+0x40/0x104
dump_stack+0x18/0x3c
sysfs_warn_dup+0x6c/0xc8
internal_create_group+0x1c8/0x504
sysfs_create_groups+0x38/0x9c
ufs_sysfs_add_nodes+0x20/0x58
ufshcd_init+0x1114/0x134c
ufshcd_pltfrm_init+0x728/0x7d8
ufs_google_probe+0x30/0x84
platform_probe+0xa0/0xe0
really_probe+0x114/0x454
__driver_probe_device+0xa4/0x160
driver_probe_device+0x44/0x23c
__device_attach_driver+0x15c/0x1f4
bus_for_each_drv+0x10c/0x168
__device_attach_async_helper+0x80/0xf8
async_run_entry_fn+0x4c/0x17c
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
ufshcd 3c2d0000.ufs: ufs_sysfs_add_nodes: sysfs groups creation failed (err = -17)
Cc: Daniel Lee <chullee@google.com>
Cc: Peter Wang <peter.wang@mediatek.com>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Fixes: bb7663dec6 ("scsi: ufs: sysfs: Make HID attributes visible")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Fixes: bb7663dec6 ("scsi: ufs: sysfs: Make HID attributes visible")
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://patch.msgid.link/20251028222433.1108299-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When building drivers/net/ethernet/intel/idpf/xsk.c for ARCH=arm with
CONFIG_CFI=y using a version of LLVM prior to 22.0.0, there is a
BUILD_BUG_ON failure:
$ cat arch/arm/configs/repro.config
CONFIG_BPF_SYSCALL=y
CONFIG_CFI=y
CONFIG_IDPF=y
CONFIG_XDP_SOCKETS=y
$ make -skj"$(nproc)" ARCH=arm LLVM=1 clean defconfig repro.config drivers/net/ethernet/intel/idpf/xsk.o
In file included from drivers/net/ethernet/intel/idpf/xsk.c:4:
include/net/libeth/xsk.h:205:2: error: call to '__compiletime_assert_728' declared with 'error' attribute: BUILD_BUG_ON failed: !__builtin_constant_p(tmo == libeth_xsktmo)
205 | BUILD_BUG_ON(!__builtin_constant_p(tmo == libeth_xsktmo));
| ^
...
libeth_xdp_tx_xmit_bulk() indirectly calls libeth_xsk_xmit_fill_buf()
but these functions are marked as __always_inline so that the compiler
can turn these indirect calls into direct ones and see that the tmo
parameter to __libeth_xsk_xmit_fill_buf_md() is ultimately libeth_xsktmo
from idpf_xsk_xmit().
Unfortunately, the generic kCFI pass in LLVM expands the kCFI bundles
from the indirect calls in libeth_xdp_tx_xmit_bulk() in such a way that
later optimizations cannot turn these calls into direct ones, making the
BUILD_BUG_ON fail because it cannot be proved at compile time that tmo
is libeth_xsktmo.
Disable the generic kCFI pass for libeth_xdp_tx_xmit_bulk() to ensure
these indirect calls can always be turned into direct calls to avoid
this error.
Closes: https://github.com/ClangBuiltLinux/linux/issues/2124
Fixes: 9705d6552f ("idpf: implement Rx path for AF_XDP")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251025-idpf-fix-arm-kcfi-build-error-v1-3-ec57221153ae@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
There are two different ways that LLVM can expand kCFI operand bundles
in LLVM IR: generically in the middle end or using an architecture
specific sequence when lowering LLVM IR to machine code in the backend.
The generic pass allows any architecture to take advantage of kCFI but
the expansion of these bundles in the middle end can mess with
optimizations that may turn indirect calls into direct calls when the
call target is known at compile time, such as after inlining.
Add __nocfi_generic, dependent on an architecture selecting
CONFIG_ARCH_USES_CFI_GENERIC_LLVM_PASS, to disable kCFI bundle
generation in functions where only the generic kCFI pass may cause
problems.
Link: https://github.com/ClangBuiltLinux/linux/issues/2124
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251025-idpf-fix-arm-kcfi-build-error-v1-1-ec57221153ae@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
scx_bpf_cpuperf_set() has a typo where it dereferences the local
variable @sch, instead of the global @scx_root pointer. Fix by
dereferencing the correct variable.
Fixes: 956f2b11a8 ("sched_ext: Drop kf_cpu_valid()")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
When a process tries to access an entry in /afs, normally what happens is
that an automount dentry is created by ->lookup() and then triggered, which
jumps through the ->d_automount() op. Currently, afs_dynroot_lookup() does
not do cell DNS lookup, leaving that to afs_d_automount() to perform -
however, it is possible to use access() or stat() on the automount point,
which will always return successfully, have briefly created an afs_cell
record if one did not already exist.
This means that something like:
test -d "/afs/.west" && echo Directory exists
will print "Directory exists" even though no such cell is configured. This
breaks the "west" python module available on PIP as it expects this access
to fail.
Now, it could be possible to make afs_dynroot_lookup() perform the DNS[*]
lookup, but that would make "ls --color /afs" do this for each cell in /afs
that is listed but not yet probed. kafs-client, probably wrongly, preloads
the entire cell database and all the known cells are then listed in /afs -
and doing ls /afs would be very, very slow, especially if any cell supplied
addresses but was wholly inaccessible.
[*] When I say "DNS", actually read getaddrinfo(), which could use any one
of a host of mechanisms. Could also use static configuration.
To fix this, make the following changes:
(1) Create an enum to specify the origination point of a call to
afs_lookup_cell() and pass this value into that function in place of
the "excl" parameter (which can be derived from it). There are six
points of origination:
- Cell preload through /proc/net/afs/cells
- Root cell config through /proc/net/afs/rootcell
- Lookup in dynamic root
- Automount trigger
- Direct mount with mount() syscall
- Alias check where YFS tells us the cell name is different
(2) Add an extra state into the afs_cell state machine to indicate a cell
that's been initialised, but not yet looked up. This is separate from
one that can be considered active and has been looked up at least
once.
(3) Make afs_lookup_cell() vary its behaviour more, depending on where it
was called from:
If called from preload or root cell config, DNS lookup will not happen
until we definitely want to use the cell (dynroot mount, automount,
direct mount or alias check). The cell will appear in /afs but stat()
won't trigger DNS lookup.
If the cell already exists, dynroot will not wait for the DNS lookup
to complete. If the cell did not already exist, dynroot will wait.
If called from automount, direct mount or alias check, it will wait
for the DNS lookup to complete.
(4) Make afs_lookup_cell() return an error if lookup failed in one way or
another. We try to return -ENOENT if the DNS says the cell does not
exist and -EDESTADDRREQ if we couldn't access the DNS.
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220685
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/1784747.1761158912@warthog.procyon.org.uk
Fixes: 1d0b929fc0 ("afs: Change dynroot to create contents on demand")
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add keycodes for hotkeys toggling the electronic privacy screen found on
some laptops on/off.
There already is an API for eprivacy screens as kernel-mode-setting drm
connector object properties:
https://www.kernel.org/doc/html/latest/gpu/drm-kms.html#standard-connector-properties
this API also supports reporting when the eprivacy screen is turned on/off
by the embedded-controller (EC) in response to hotkey presses.
But on some laptops (e.g. the Dell Latitude 7300) the firmware does not
allow querying the presence nor the status of the eprivacy screen at boot.
This makes it impossible to implement the drm connector properties API
since drm objects do not allow adding new properties after creation and
the presence of the eprivacy cannot be detected at boot.
The first notice of the presence of an eprivacy screen on these laptops is
an EC generated (WMI) event when the eprivacy screen hotkeys are pressed.
In this case the new keycodes this change adds can be generated to notify
userspace of the eprivacy screen on/off hotkeys being pressed, so that
userspace can show the usual on-screen-display (OSD) notification for eprivacy
screen on/off to the user. This is similar to how e.g. touchpad on/off
keycodes are used to show the touchpad on/off OSD.
Signed-off-by: Hans de Goede <hansg@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://patch.msgid.link/20251020152331.52870-2-hansg@kernel.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
regulator_unregister() already frees the associated GPIO device. On
ThinkPad X9 (Lunar Lake), this causes a double free issue that leads to
random failures when other drivers (typically Intel THC) attempt to
allocate interrupts. The root cause is that the reference count of the
pinctrl_intel_platform module unexpectedly drops to zero when this
driver defers its probe.
This behavior can also be reproduced by unloading the module directly.
Fix the issue by removing the redundant release of the GPIO device
during regulator unregistration.
Cc: stable@vger.kernel.org
Fixes: 1e5d088a52 ("platform/x86: int3472: Stop using devm_gpiod_get()")
Signed-off-by: Qiu Wenbo <qiuwenbo@kylinsec.com.cn>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Link: https://patch.msgid.link/20251028063009.289414-1-qiuwenbo@gnome.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
The normal timer mechanism assume that timeout further in the future
need a lower accuracy. As an example, the granularity for a timer
scheduled 4096 ms in the future on a 1000 Hz system is already 512 ms.
This granularity is perfectly sufficient for e.g. timeouts, but there
are other types of events that will happen at a future point in time and
require a higher accuracy.
Add a new wiphy_hrtimer_work type that uses an hrtimer internally. The
API is almost identical to the existing wiphy_delayed_work and it can be
used as a drop-in replacement after minor adjustments. The work will be
scheduled relative to the current time with a slack of 1 millisecond.
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251028125710.7f13a2adc5eb.I01b5af0363869864b0580d9c2a1770bafab69566@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The VBUS supply regulator is currently assigned to the PHY node.
This causes the VBUS to be always on, even when the controller
needs to be switched to peripheral mode.
Fix the OTG role switching by adding a connector node and moving
the VBUS supply regulator to that node. This way the VBUS gets
correctly switched according to the current role.
Fixes: 946ab10e3f ("arm64: dts: Add support for Kontron OSM-S i.MX8MP SoM and BL carrier board")
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The assembler has .insn for building custom instructions
now, so change the .4byte to .insn. This ensures the output
is marked as an instruction and not as data which may
confuse both debuggers and anything else that relies on
this sort of marking.
Add an ASM_INSN_I() wrapper in asm.h to allow the selecting
of how this is output so older assemblers are still good.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://lore.kernel.org/r/20251024171640.65232-1-ben.dooks@codethink.co.uk
Signed-off-by: Paul Walmsley <pjw@kernel.org>
The current code directly overwrites the scratch pointer with the
return value of kvrealloc(). If kvrealloc() fails and returns NULL,
the original buffer becomes unreachable, causing a memory leak.
Fix this by using a temporary variable to store kvrealloc()'s return
value and only update the scratch pointer on success.
Found via static anlaysis and this is similar to commit 42378a9ca5
("bpf, verifier: Fix memory leak in array reallocation for stack state")
Fixes: be17c0df67 ("riscv: module: Optimize PLT/GOT entry counting")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20251026091912.39727-1-linmq006@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
The pt_dump_seq_puts() macro incorrectly uses seq_printf() instead of
seq_puts(). This is both a performance issue and conceptually wrong,
as the macro name suggests plain string output (puts) but the
implementation uses formatted output (printf).
The macro is used in ptdump.c:301 to output a newline character. Using
seq_printf() adds unnecessary overhead for format string parsing when
outputting this constant string.
This bug was introduced in commit 59c4da8640 ("riscv: Add support to
dump the kernel page tables") in 2020, which copied the implementation
pattern from other architectures that had the same bug.
Fixes: 59c4da8640 ("riscv: Add support to dump the kernel page tables")
Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Link: https://lore.kernel.org/r/20251018170451.3355496-1-hi@josie.lol
Signed-off-by: Paul Walmsley <pjw@kernel.org>
When QP wraps around, WQE data from the previous use at the same
position still remains as driver does not clear it. The WQE field
layout differs across different opcodes, causing that the fields
that are not explicitly assigned for the current opcode retain
stale values, and are issued to HW by mistake. Such fields are as
follows:
* MSG_START_SGE_IDX field in ATOMIC WQE
* BLOCK_SIZE and ZBVA fields in FRMR WQE
* DirectWQE fields when DirectWQE not used
For ATOMIC WQE, always set the latest sge index in MSG_START_SGE_IDX
as required by HW.
For FRMR WQE and DirectWQE, clear only those unassigned fields
instead of the entire WQE to avoid performance penalty.
Fixes: 68a997c5d2 ("RDMA/hns: Add FRMR support for hip08")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251016114051.1963197-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Currently driver enforces affinity between QP cache and send CQ
cache, which helps improve the performance of sending, but doesn't
set affinity with recv CQ cache, resulting in suboptimal performance
of receiving.
Use one CQ bank per context to ensure the affinity among QP, send CQ
and recv CQ. For kernel ULP, CQ bank is fixed to 0.
Fixes: 9e03dbea2b ("RDMA/hns: Fix CQ and QP cache affinity")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251016114051.1963197-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The msi-map property was incorrectly applied to pcie0-ep instead of
pcie1-ep. Correct the msi-map for both pcie0-ep and pcie1-ep nodes.
Fixes: bbe4b2f7d6 ("arm64: dts: imx95: Add msi-map for pci-ep device")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The gpio0_mipi_csi DT nodes are enabled by default, but they are
dependent on the irqsteer_csi nodes, which are not enabled. This causes
the gpio0_mipi_csi GPIOs to be probe deferred. Since these GPIOs can be
used independently of the CSI controller, enable irqsteer_csi by default
too to prevent them from being deferred and to ensure they work out of
the box.
Fixes: 2217f82437 ("arm64: dts: imx8: add capture controller for i.MX8's img subsystem")
Signed-off-by: João Paulo Gonçalves <joao.goncalves@toradex.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
In former times, ext4 was enabled implicitely by enabling ext3 but with
the ext3 fs gone, it does not get enabled, which lets devices fail to
mount root on non-initrd based boots with an ext4 root.
Fixes: d6ace46c82 ("ext4: remove obsolete EXT3 config options")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
During DT-binding review for extending the V4L2 camera sensor privacy LED
support to systems using devicetree, it has come up that having a "-led"
suffix for the LED name / con_id is undesirable since it already is clear
that it is a LED.
Drop the "-led" suffix from the con_id in both the lookup table in
the int3472 code, as well as from the con_id led_get() argument in
the v4l2-subdev code.
Signed-off-by: Hans de Goede <hansg@kernel.org>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Some systems, e.g. Rock 4D, have a pluggable UFS module. Link startup
fails systematically on these systems. If no UFS module has been plugged
in, more than fourty lines are logged after the "link startup failed"
message. Avoid this by reducing link startup failure logging.
An intended side effect of this patch is that scsi_host_busy() is not
called before scsi_add_host() is called.
Commit 995412e23b ("blk-mq: Replace tags->lock with SRCU for tag
iterators") introduced a regression - the warning shown below is
triggered during every boot. This patch fixes that regression.
Call trace:
__srcu_read_lock+0x30/0x80 (P)
blk_mq_tagset_busy_iter+0x44/0x300
scsi_host_busy+0x38/0x70
ufshcd_print_host_state+0x34/0x1bc
ufshcd_link_startup.constprop.0+0xe4/0x2e0
ufshcd_init+0x944/0xf80
ufshcd_pltfrm_init+0x504/0x820
ufs_rockchip_probe+0x2c/0x88
platform_probe+0x5c/0xa4
really_probe+0xc0/0x38c
__driver_probe_device+0x7c/0x150
driver_probe_device+0x40/0x120
__driver_attach+0xc8/0x1e0
bus_for_each_dev+0x7c/0xdc
driver_attach+0x24/0x30
bus_add_driver+0x110/0x230
driver_register+0x68/0x130
__platform_driver_register+0x20/0x2c
ufs_rockchip_pltform_init+0x1c/0x28
do_one_initcall+0x60/0x1e0
kernel_init_freeable+0x248/0x2c4
kernel_init+0x20/0x140
ret_from_fork+0x10/0x20
Reported-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Closes: https://lore.kernel.org/linux-block/pnezafputodmqlpumwfbn644ohjybouveehcjhz2hmhtcf2rka@sdhoiivync4y/
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251014200118.3390839-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ufs_sysfs_add_nodes() is called concurrently with ufs_get_device_desc().
This may cause the following code to be called before
ufs_sysfs_add_nodes():
sysfs_update_group(&hba->dev->kobj, &ufs_sysfs_hid_group);
If this happens, ufs_sysfs_add_nodes() triggers a kernel warning and
fails. Fix this by calling ufs_sysfs_add_nodes() before SCSI LUNs are
scanned since the sysfs_update_group() call happens from the context of
thread that executes ufshcd_async_scan(). This patch fixes the following
kernel warning:
sysfs: cannot create duplicate filename '/devices/platform/3c2d0000.ufs/hid'
Workqueue: async async_run_entry_fn
Call trace:
dump_backtrace+0xfc/0x17c
show_stack+0x18/0x28
dump_stack_lvl+0x40/0x104
dump_stack+0x18/0x3c
sysfs_warn_dup+0x6c/0xc8
internal_create_group+0x1c8/0x504
sysfs_create_groups+0x38/0x9c
ufs_sysfs_add_nodes+0x20/0x58
ufshcd_init+0x1114/0x134c
ufshcd_pltfrm_init+0x728/0x7d8
ufs_google_probe+0x30/0x84
platform_probe+0xa0/0xe0
really_probe+0x114/0x454
__driver_probe_device+0xa4/0x160
driver_probe_device+0x44/0x23c
__device_attach_driver+0x15c/0x1f4
bus_for_each_drv+0x10c/0x168
__device_attach_async_helper+0x80/0xf8
async_run_entry_fn+0x4c/0x17c
process_one_work+0x26c/0x65c
worker_thread+0x33c/0x498
kthread+0x110/0x134
ret_from_fork+0x10/0x20
ufshcd 3c2d0000.ufs: ufs_sysfs_add_nodes: sysfs groups creation failed (err = -17)
Cc: Daniel Lee <chullee@google.com>
Fixes: bb7663dec6 ("scsi: ufs: sysfs: Make HID attributes visible")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251014200118.3390839-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
According to UFS specifications, the power-off sequence for a UFS device
includes:
- Sending an SSU command with Power_Condition=3 and await a response.
- Asserting RST_N low.
- Turning off REF_CLK.
- Turning off VCC.
- Turning off VCCQ/VCCQ2.
As part of ufs shutdown, after the SSU command completion, asserting
hardware reset (HWRST) triggers the device firmware to wake up and
execute its reset routine. This routine initializes hardware blocks and
takes a few milliseconds to complete. During this time, the ICCQ draws a
large current.
This large ICCQ current may cause issues for the regulator which is
supplying power to UFS, because the turn off request from UFS driver to
the regulator framework will be immediately followed by low power
mode(LPM) request by regulator framework. This is done by framework
because UFS which is the only client is requesting for disable. So if
the rail is still in the process of shutting down while ICCQ exceeds LPM
current thresholds, and LPM mode is activated in hardware during this
state, it may trigger an overcurrent protection (OCP) fault in the
regulator.
To prevent this, a 10ms delay is added after asserting HWRST. This
allows the reset operation to complete while power rails remain active
and in high-power mode.
Currently there is no way for Host to query whether the reset is
completed or not and hence this the delay is based on experiments with
Qualcomm UFS controllers across multiple UFS vendors.
Signed-off-by: Nitin Rawat <nitin.rawat@oss.qualcomm.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251012173828.9880-1-nitin.rawat@oss.qualcomm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The "groups" property can hold multiple entries (e.g.
toshiba/tmpv7708-rm-mbrc.dts file), so allow that by dropping incorrect
type (pinmux-node.yaml schema already defines that as string-array) and
adding constraints for items. This fixes dtbs_check warnings like:
toshiba/tmpv7708-rm-mbrc.dtb: pinctrl@24190000 (toshiba,tmpv7708-pinctrl):
pwm-pins:groups: ['pwm0_gpio16_grp', 'pwm1_gpio17_grp', 'pwm2_gpio18_grp', 'pwm3_gpio19_grp'] is too long
Fixes: 1825c1fe00 ("pinctrl: Add DT bindings for Toshiba Visconti TMPV7700 SoC")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The pinctrl-rtd driver uses 'devm_regmap_init_mmio', which requires
'REGMAP_MMIO' to be enabled.
Without this selection, the build fails with an undefined reference:
aarch64-none-linux-gnu-ld: drivers/pinctrl/realtek/pinctrl-rtd.o: in
function rtd_pinctrl_probe': pinctrl-rtd.c:(.text+0x5a0): undefined
reference to __devm_regmap_init_mmio_clk'
Fix this by selecting 'REGMAP_MMIO' in the Kconfig.
Fixes: e99ce78030 ("pinctrl: realtek: Add common pinctrl driver for Realtek DHC RTD SoCs")
Signed-off-by: Yu-Chun Lin <eleanor.lin@realtek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The DMA device pointer `dma_dev` was being dereferenced before ensuring
that `cdns_ctrl->dmac` is properly initialized.
Move the assignment of `dma_dev` after successfully acquiring the DMA
channel to ensure the pointer is valid before use.
Fixes: d76d22b509 ("mtd: rawnand: cadence: use dma_map_resource for sdma address")
Cc: stable@vger.kernel.org
Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
While the user manual states that the PLL's rate should be between 180
MHz and 3 GHz in the register defninition section, it also says the
actual operating frequency is 22.5792*4 MHz in the PLL features table.
22.5792*4 MHz is one of the actual clock rates that we want and is
is available in the SDM table. Lower the minimum clock rate to 90 MHz
so that both rates in the SDM table can be used.
Fixes: 7cae1e2b55 ("clk: sunxi-ng: Add support for the A523/T527 CCU PLLs")
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251020171059.2786070-7-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
The "bus-r-dma" clock in the A523's PRCM clock controller is also
referred to as "DMA_CLKEN_SW" or "DMA ADB400 gating". It is unclear how
this ties into the DMA controller MBUS clock gate; however if the clock
is not enabled, the DMA controller in the MCU block will fail to access
DRAM, even failing to retrieve the DMA descriptors.
Mark this clock as critical. This sort of mirrors what is done for the
main DMA controller's MBUS clock, which has a separate toggle that is
currently left out of the main clock controller driver.
Fixes: 8cea339cfb ("clk: sunxi-ng: add support for the A523/T527 PRCM CCU")
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20251020171059.2786070-6-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
If of_genpd_add_provider_onecell() fails during probe, the previously
created generic power domains are not removed, leading to a memory leak
and potential kernel crash later in genpd_debug_add().
Add proper error handling to unwind the initialized domains before
returning from probe to ensure all resources are correctly released on
failure.
Example crash trace observed without this fix:
| Unable to handle kernel paging request at virtual address fffffffffffffc70
| CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc1 #405 PREEMPT
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : genpd_debug_add+0x2c/0x160
| lr : genpd_debug_init+0x74/0x98
| Call trace:
| genpd_debug_add+0x2c/0x160 (P)
| genpd_debug_init+0x74/0x98
| do_one_initcall+0xd0/0x2d8
| do_initcall_level+0xa0/0x140
| do_initcalls+0x60/0xa8
| do_basic_setup+0x28/0x40
| kernel_init_freeable+0xe8/0x170
| kernel_init+0x2c/0x140
| ret_from_fork+0x10/0x20
Fixes: 898216c97e ("firmware: arm_scmi: add device power domain support using genpd")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The rtl_ecc_engine_ops structure is only used to provide a set of
callback functions and is never modified after initialization.
Mark it as const so it can be placed in the read-only section, which
improves safety and allows better compiler optimization.
Signed-off-by: Li Qiang <liqiang01@kylinos.cn>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
The "req.start" and "req.len" variables are u64 values that come from the
user at the start of the function. We mask away the high 32 bits of
"req.len" so that's capped at U32_MAX but the "req.start" variable can go
up to U64_MAX which means that the addition can still integer overflow.
Use check_add_overflow() to fix this bug.
Fixes: 095bb6e44e ("mtdchar: add MEMREAD ioctl")
Fixes: 6420ac0af9 ("mtdchar: prevent unbounded allocation in MEMWRITE ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
In statmount_string(), most flags assign an output offset pointer (offp)
which is later updated with the string offset. However, the
STATMOUNT_MNT_UIDMAP and STATMOUNT_MNT_GIDMAP cases directly set the
struct fields instead of using offp. This leaves offp uninitialized,
leading to a possible uninitialized dereference when *offp is updated.
Fix it by assigning offp for UIDMAP and GIDMAP as well, keeping the code
path consistent.
Fixes: 37c4a9590e ("statmount: allow to retrieve idmappings")
Fixes: e52e97f09f ("statmount: let unset strings be empty")
Cc: stable@vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Link: https://patch.msgid.link/20251013114151.664341-1-zhen.ni@easystack.cn
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Enabling compile testing should not enable every individual driver (we
have "allyesconfig" for that).
Fixes: 7cd8db0fb0 ("mmc: add COMPILE_TEST to multiple drivers")
Cc: Mikko Rapeli <mikko.rapeli@linaro.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The current hlist_empty checks only test the first bucket of each
hashtable, ignoring any other bucket. They should be caught by the
WARN_ON for state_all, but better to make all the checks accurate.
Fixes: 73d189dce4 ("netns xfrm: per-netns xfrm_state_bydst hash")
Fixes: d320bbb306 ("netns xfrm: per-netns xfrm_state_bysrc hash")
Fixes: b754a4fd8f ("netns xfrm: per-netns xfrm_state_byspi hash")
Fixes: fe9f1d8779 ("xfrm: add state hashtable keyed by seq")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
xfrm_state_construct can fail without setting an error if the
requested pcpu_num value is too big. Set err and add an extack message
to avoid confusing userspace.
Fixes: 1ddf9916ac ("xfrm: Add support for per cpu xfrm state handling.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In case xfrm_state_migrate fails after calling xfrm_dev_state_add, we
directly release the last reference and destroy the new state, without
calling xfrm_dev_state_delete (this only happens in
__xfrm_state_delete, which we're not calling on this path, since the
state was never added).
Call xfrm_dev_state_delete on error when an offload configuration was
provided.
Fixes: ab244a394c ("xfrm: Migrate offload configuration")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In commit b441cf3f8c ("xfrm: delete x->tunnel as we delete x"), I
missed the case where state creation fails between full
initialization (->init_state has been called) and being inserted on
the lists.
In this situation, ->init_state has been called, so for IPcomp
tunnels, the fallback tunnel has been created and added onto the
lists, but the user state never gets added, because we fail before
that. The user state doesn't go through __xfrm_state_delete, so we
don't call xfrm_state_delete_tunnel for those states, and we end up
leaking the FB tunnel.
There are several codepaths affected by this: the add/update paths, in
both net/key and xfrm, and the migrate code (xfrm_migrate,
xfrm_state_migrate). A "proper" rollback of the init_state work would
probably be doable in the add/update code, but for migrate it gets
more complicated as multiple states may be involved.
At some point, the new (not-inserted) state will be destroyed, so call
xfrm_state_delete_tunnel during xfrm_state_gc_destroy. Most states
will have their fallback tunnel cleaned up during __xfrm_state_delete,
which solves the issue that b441cf3f8c (and other patches before it)
aimed at. All states (including FB tunnels) will be removed from the
lists once xfrm_state_fini has called flush_work(&xfrm_state_gc_work).
Reported-by: syzbot+999eb23467f83f9bf9bf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=999eb23467f83f9bf9bf
Fixes: b441cf3f8c ("xfrm: delete x->tunnel as we delete x")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We're not updating x1, but we still need to put() it.
Fixes: a4a87fa4e9 ("xfrm: Add Direction to the SA in or out")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add VMX exit handlers for SEAMCALL and TDCALL to inject a #UD if a non-TD
guest attempts to execute SEAMCALL or TDCALL. Neither SEAMCALL nor TDCALL
is gated by any software enablement other than VMXON, and so will generate
a VM-Exit instead of e.g. a native #UD when executed from the guest kernel.
Note! No unprivileged DoS of the L1 kernel is possible as TDCALL and
SEAMCALL #GP at CPL > 0, and the CPL check is performed prior to the VMX
non-root (VM-Exit) check, i.e. userspace can't crash the VM. And for a
nested guest, KVM forwards unknown exits to L1, i.e. an L2 kernel can
crash itself, but not L1.
Note #2! The Intel® Trust Domain CPU Architectural Extensions spec's
pseudocode shows the CPL > 0 check for SEAMCALL coming _after_ the VM-Exit,
but that appears to be a documentation bug (likely because the CPL > 0
check was incorrectly bundled with other lower-priority #GP checks).
Testing on SPR and EMR shows that the CPL > 0 check is performed before
the VMX non-root check, i.e. SEAMCALL #GPs when executed in usermode.
Note #3! The aforementioned Trust Domain spec uses confusing pseudocode
that says that SEAMCALL will #UD if executed "inSEAM", but "inSEAM"
specifically means in SEAM Root Mode, i.e. in the TDX-Module. The long-
form description explicitly states that SEAMCALL generates an exit when
executed in "SEAM VMX non-root operation". But that's a moot point as the
TDX-Module injects #UD if the guest attempts to execute SEAMCALL, as
documented in the "Unconditionally Blocked Instructions" section of the
TDX-Module base specification.
Cc: stable@vger.kernel.org
Cc: Kai Huang <kai.huang@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20251016182148.69085-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename the 'ssi2' and 'aud3' nodes to 'mux-ssi2' and 'mux-aud3' in the
audmux configuration of imx51-zii-rdu1.dts to comply with the naming
convention in imx-audmux.yaml.
This fixes the following dt-schema warning:
imx51-zii-rdu1.dtb: audmux@83fd0000 (fsl,imx51-audmux): 'aud3', 'ssi2'
do not match any of the regexes: '^mux-[0-9a-z]*$', '^pinctrl-[0-9]+$'
Fixes: ceef0396f3 ("ARM: dts: imx: add ZII RDU1 board")
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The 'report-rate-hz' property for the edt-ft5x06 driver was added and
handled in the Linux kernel by me with patches [1] and [2] for this
specific board.
The v1 upstream version, which was the one applied to the customer's
kernel, used the 'report-rate' property, which was written directly to
the controller register. During review, the 'hz' suffix was added,
changing its handling so that writing the value directly to the register
was no longer possible for the M06 controller.
Once the patches were accepted in mainline, I did not reapply them to
the customer's kernel, and when upstreaming the DTS for this board, I
forgot to correct the 'report-rate-hz' property value.
The property must be set to 60 because this board uses the M06 controller,
which expects the report rate in units of 10 Hz, meaning the actual value
written to the register is 6.
[1] 625f829586 ("dt-bindings: input: touchscreen: edt-ft5x06: add report-rate-hz")
[2] 5bcee83a40 ("Input: edt-ft5x06 - set report rate by dts property")
Fixes: ffea3cac94 ("ARM: dts: imx6ul: support Engicam MicroGEA RMM board")
Co-developed-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
In `UVERBS_METHOD_CQ_CREATE`, umem should be released if anything goes
wrong. Currently, if `create_cq_umem` fails, umem would not be
released or referenced, causing a possible leak.
In this patch, we release umem at `UVERBS_METHOD_CQ_CREATE`, the driver
should not release umem if it returns an error code.
Fixes: 1a40c362ae ("RDMA/uverbs: Add a common way to create CQ with umem")
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Link: https://patch.msgid.link/aOh1le4YqtYwj-hH@osx.local
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The driver maintains a CQ table that is used to ensure that a CQ is
still valid when processing CQ related AEs. When a CQ is destroyed,
the table entry is cleared, using irdma_cq.cq_num as the index. This
field was never being set, so it was just always clearing out entry
0.
Additionally, the cq_num field size was increased to accommodate HW
supporting more than 64K CQs.
Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20250923142439.943930-1-jmoroni@google.com
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
In some cases, it is possible for pble_rsrc->next_fpm_addr to be
larger than u32, so remove the u32 cast to avoid unintentional
truncation.
This fixes the following error that can be observed when registering
massive memory regions:
[ 447.227494] (NULL ib_device): cqp opcode = 0x1f maj_err_code = 0xffff min_err_code = 0x800c
[ 447.227505] (NULL ib_device): [Update PE SDs Cmd Error][op_code=21] status=-5 waiting=1 completion_err=1 maj=0xffff min=0x800c
Fixes: e8c4dbc2fc ("RDMA/irdma: Add PBLE resource manager")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20250923190850.1022773-1-jmoroni@google.com
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The current error handling path in bnxt_re_destroy_gsi_sqp() could lead
to a resource leak. When bnxt_qplib_destroy_qp() fails, the function
jumps to the 'fail' label and returns immediately, skipping the call
to bnxt_qplib_free_qp_res().
Continue the resource teardown even if bnxt_qplib_destroy_qp() fails,
which aligns with the driver's general error handling strategy and
prevents the potential leak.
Fixes: 8dae419f9e ("RDMA/bnxt_re: Refactor queue pair creation code")
Signed-off-by: YanLong Dai <daiyanlong@kylinos.cn>
Link: https://patch.msgid.link/20250924061444.11288-1-daiyanlong@kylinos.cn
Signed-off-by: Leon Romanovsky <leon@kernel.org>
In the pegasus_notetaker driver, the pegasus_probe() function allocates
the URB transfer buffer using the wMaxPacketSize value from
the endpoint descriptor. An attacker can use a malicious USB descriptor
to force the allocation of a very small buffer.
Subsequently, if the device sends an interrupt packet with a specific
pattern (e.g., where the first byte is 0x80 or 0x42),
the pegasus_parse_packet() function parses the packet without checking
the allocated buffer size. This leads to an out-of-bounds memory access.
Fixes: 1afca2b66a ("Input: add Pegasus Notetaker tablet driver")
Signed-off-by: Seungjin Bae <eeodqql09@gmail.com>
Link: https://lore.kernel.org/r/20251007214131.3737115-2-eeodqql09@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
When executing kvm_riscv_vcpu_aia_has_interrupts, the vCPU may have
migrated and the IMSIC VS-file have not been updated yet, currently
the HGEIP CSR should be read from the imsic->vsfile_cpu ( the pCPU
before migration ) via on_each_cpu_mask, but this will trigger an
IPI call and repeated IPI within a period of time is expensive in
a many-core systems.
Just let the vCPU execute and update the correct IMSIC VS-file via
kvm_riscv_vcpu_aia_imsic_update may be a simple solution.
Fixes: 4cec89db80 ("RISC-V: KVM: Move HGEI[E|P] CSR access to IMSIC virtualization")
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251016012659.82998-1-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup@brainfault.org>
"mac3" controller was removed from the initial version of fuji-data64
dts because the rgmii setting is incorrect, but dropping mac3 leads to
regression in the existing fuji platform, because fuji.dts simply
includes fuji-data64.dts.
This patch adds mac3 back to fuji-data64.dts to fix the fuji regression[1],
and rgmii settings need to be fixed later.
Fixes: b0f294fdfc ("ARM: dts: aspeed: facebook-fuji: Include facebook-fuji-data64.dts")
Link: https://lore.kernel.org/all/79ddc7b9-ef26-4959-9a16-aa4e006eb145@roeck-us.net/ [1]
Signed-off-by: Tao Ren <rentao.bupt@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Mark the RTL8211F PHY as a wakeup source for the Jetson Xavier NX.
This allows the reworked RTL8211F driver to know that the PHY is
wired to wakeup capable hardware, and thus to expose WoL capabilities.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Unify the naming of the existing GPU OPP table nodes found in the RK3588
and RK3588J SoC dtsi files with the other SoC's GPU OPP nodes, following
the more "modern" node naming scheme.
Fixes: a7b2070505 ("arm64: dts: rockchip: Split GPU OPPs of RK3588 and RK3588j")
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
[opp-table also is way too generic on systems with like 4-5 opp-tables]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
The 'rockchip,grf' property for tsadc in rk3328 wasn't actually used in
the driver and is no longer allowed in the DT since commit
e881662aa0 ("dt-bindings: thermal: rockchip: Tighten grf requirements")
So remove that property which fixes the following DT validation issue
tsadc@ff250000 (rockchip,rk3328-tsadc): rockchip,grf: False schema does not allow [[58]]
Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Enable proper pin multiplexing for the I2S1 8-channel transmit interface by
adding the default pinctrl configuration which esures correct signal routing
and avoids pinmux conflicts during audio playback.
Changes fix the error
[ 116.856643] [ T782] rockchip-pinctrl pinctrl: pin gpio1-10 already requested by affinity_hint; cannot claim for fe410000.i2s
[ 116.857567] [ T782] rockchip-pinctrl pinctrl: error -EINVAL: pin-42 (fe410000.i2s)
[ 116.857618] [ T782] rockchip-pinctrl pinctrl: error -EINVAL: could not request pin 42 (gpio1-10) from group i2s1m0-sdi1 on device rockchip-pinctrl
[ 116.857659] [ T782] rockchip-i2s-tdm fe410000.i2s: Error applying setting, reverse things back
I2S1 on the M1 to the codec in the RK809 only uses the SCLK, LRCK, SDI0
and SDO0 signals, so limit the claimed pins to those.
With this change audio output works as expected:
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: HDMI [HDMI], device 0: fe400000.i2s-i2s-hifi i2s-hifi-0 [fe400000.i2s-i2s-hifi i2s-hifi-0]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: RK817 [Analog RK817], device 0: fe410000.i2s-rk817-hifi rk817-hifi-0 [fe410000.i2s-rk817-hifi rk817-hifi-0]
Subdevices: 1/1
Subdevice #0: subdevice #0
Fixes: 78f858447c ("arm64: dts: rockchip: Add analog audio on ODROID-M1")
Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
[adapted the commit message a bit]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Since commit 9ba9d11544 ("media: ivtv: Access v4l2_fh from file")
all ioctl handlers have been ported to operate on the file * first
function argument.
The ivtv DVB layer calls ivtv_init_on_first_open() when the driver
needs to start streaming. This function calls the s_input() and
s_frequency() ioctl handlers directly, but being called from the driver
context, it doesn't have a valid file * to pass them. This causes the
ioctl handlers to deference an invalid pointer.
Fix this by moving the implementation of those ioctls to two helper
functions.
The ivtv_do_s_input() helper accepts a struct ivtv * as first argument,
which is easily accessible in ivtv_init_on_first_open() as well as from
the file * argument of the ioctl handler.
The ivtv_s_frequency() takes an ivtv_stream * instead. The stream * can
safely be accessed in ivtv_init_on_first_open() where it is hard-coded
to the IVTV_ENC_STREAM_TYPE_MPG stream type, as well as from the ioctl
handler as a valid stream type is associated to each open file handle
depending on which video device node has been opened in the ivtv_open()
file operation.
The bug has been reported by Smatch.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aKL4OMWsESUdX8KQ@stanley.mountain/
Fixes: 9ba9d11544 ("media: ivtv: Access v4l2_fh from file")
Cc: stable@vger.kernel.org
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Sice commit 7b9eb53e85 ("media: cx18: Access v4l2_fh from file")
all ioctl handlers have been ported to operate on the file * first
function argument.
The cx18 DVB layer calls cx18_init_on_first_open() when the driver needs
to start streaming. This function calls the s_input(), s_std() and
s_frequency() ioctl handlers directly, but being called from the driver
context, it doesn't have a valid file * to pass them. This causes
the ioctl handlers to deference an invalid pointer.
Fix this by moving the implementation of those ioctls to functions that
take a cx18 pointer instead of a file pointer, and turn the V4L2 ioctl
handlers into wrappers that get the cx18 from the file. When calling
from cx18_init_on_first_open(), pass the cx18 pointer directly. This
allows removing the fake fh in cx18_init_on_first_open().
The bug has been reported by Smatch:
--> 1223 cx18_s_input(NULL, &fh, video_input);
The patch adds a new dereference of "file" but some of the callers pass a
NULL pointer.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aKL4OMWsESUdX8KQ@stanley.mountain/
Fixes: 7b9eb53e85 ("media: cx18: Access v4l2_fh from file")
Cc: stable@vger.kernel.org
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
The reset line is being set to input on non-ACPI devices apparently to
save power. This isn't being done on ACPI devices as it's been found
that some ACPI devices don't have a pull-up resistor fitted. This can
also be the case for non-ACPI devices, resulting in:
[ 941.672207] Goodix-TS 1-0014: Error reading 10 bytes from 0x814e: -110
[ 942.696168] Goodix-TS 1-0014: Error reading 10 bytes from 0x814e: -110
[ 945.832208] Goodix-TS 1-0014: Error reading 10 bytes from 0x814e: -110
This behaviour appears to have been initialing introduced in
ec6e1b4082. This doesn't seem to be based on information in either the
GT911 or GT9271 datasheets cited as sources of information for this
change. Thus it seems likely that it is based on functionality in the
Android driver which it also lists. This behaviour may be viable in very
specific instances where the hardware is well known, but seems unwise in
the upstream kernel where such hardware requirements can't be
guaranteed.
Remove this over optimisation to improve reliability on non-ACPI
devices.
Signed-off-by: Martyn Welch <martyn.welch@collabora.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20251009134138.686215-1-martyn.welch@collabora.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The priv->mci[] array has NUM_CONTROLLERS so this > comparison needs to be >=
to prevent an out of bounds access.
Fixes: d5fe2fec6c ("EDAC: Add a driver for the AMD Versal NET DDR controller")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
The mt8189-pinctrl driver requires to probe that a device tree uses
in the device node the same names than mt8189_pinctrl_register_base_names
array. But they are not matching the required ones in the
"mediatek,mt8189-pinctrl" dt-bindings, leading to possible dtbs check
issues. The mt8189_pinctrl_register_base_names entry order is also
different.
So, align all mt8189_pinctrl_register_base_names entry names and order
on dt-bindings.
Fixes: a3fe1324c3 ("pinctrl: mediatek: Add pinctrl driver for mt8189")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The mt8196-pinctrl driver requires to probe that a device tree uses
in the device node the same names than mt8196_pinctrl_register_base_names
array. But they are not matching the required ones in the
"mediatek,mt8196-pinctrl" dt-bindings, leading to possible dtbs check
issues.
So, align all mt8196_pinctrl_register_base_names entries on dt-bindings
ones.
Fixes: f7a29377c2 ("pinctrl: mediatek: Add pinctrl driver on mt8196")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-10-13 13:07:36 +02:00
833 changed files with 9272 additions and 4137 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.