linux

mirror of https://github.com/torvalds/linux.git synced 2025-12-07 20:06:24 +00:00

Author	SHA1	Message	Date
Dmitry Safonov	21f4d45eba	net/ip6_tunnel: Prevent perpetual tunnel growth Similarly to ipv4 tunnel, ipv6 version updates dev->needed_headroom, too. While ipv4 tunnel headroom adjustment growth was limited in commit `5ae1e9922b` ("net: ip_tunnel: prevent perpetual headroom growth"), ipv6 tunnel yet increases the headroom without any ceiling. Reflect ipv4 tunnel headroom adjustment limit on ipv6 version. Credits to Francesco Ruggeri, who was originally debugging this issue and wrote local Arista-specific patch and a reproducer. Fixes: `8eb30be035` ("ipv6: Create ip6_tnl_xmit") Cc: Florian Westphal <fw@strlen.de> Cc: Francesco Ruggeri <fruggeri05@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://patch.msgid.link/20251009-ip6_tunnel-headroom-v2-1-8e4dbd8f7e35@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-13 17:43:46 -07:00
Kamil Horák - 2N	e4d0c909bf	net: phy: bcm54811: Fix GMII/MII/MII-Lite selection The Broadcom bcm54811 is hardware-strapped to select among RGMII and GMII/MII/MII-Lite modes. However, the corresponding bit, RGMII Enable in Miscellaneous Control Register must be also set to select desired RGMII or MII(-lite)/GMII mode. Fixes: `3117a11fff` ("net: phy: bcm54811: PHY initialization") Signed-off-by: Kamil Horák - 2N <kamilh@axis.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251009130656.1308237-2-kamilh@axis.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-13 17:36:20 -07:00
Linmao Li	70f92ab970	r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H After resume from S4 (hibernate), RTL8168H/RTL8111H truncates incoming packets. Packet captures show messages like "IP truncated-ip - 146 bytes missing!". The issue is caused by RxConfig not being properly re-initialized after resume. Re-initializing the RxConfig register before the chip re-initialization sequence avoids the truncation and restores correct packet reception. This follows the same pattern as commit `ef9da46dde` ("r8169: fix data corruption issue on RTL8402"). Fixes: `6e1d0b8988` ("r8169:add support for RTL8168H and RTL8107E") Signed-off-by: Linmao Li <lilinmao@kylinos.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251009122549.3955845-1-lilinmao@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-13 17:33:56 -07:00
Sebastian Andrzej Siewior	25718fdcbd	net: gro_cells: Use nested-BH locking for gro_cell The gro_cell data structure is per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Reported-by: syzbot+8715dd783e9b0bef43b1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68c6c3b1.050a0220.2ff435.0382.GAE@google.com/ Fixes: `3253cb49cb` ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20251009094338.j1jyKfjR@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-13 17:33:32 -07:00
Ivan Vecera	fcb8b32a68	dpll: zl3073x: Handle missing or corrupted flash configuration If the internal flash contains missing or corrupted configuration, basic communication over the bus still functions, but the device is not capable of normal operation (for example, using mailboxes). This condition is indicated in the info register by the ready bit. If this bit is cleared, the probe procedure times out while fetching the device state. Handle this case by checking the ready bit value in zl3073x_dev_start() and skipping DPLL device and pin registration if it is cleared. Do not report this condition as an error, allowing the devlink device to be registered and enabling the user to flash the correct configuration. Prior this patch: [ 31.112299] zl3073x-i2c 1-0070: Failed to fetch input state: -ETIMEDOUT [ 31.116332] zl3073x-i2c 1-0070: error -ETIMEDOUT: Failed to start device [ 31.136881] zl3073x-i2c 1-0070: probe with driver zl3073x-i2c failed with error -110 After this patch: [ 41.011438] zl3073x-i2c 1-0070: FW not fully ready - missing or corrupted config Fixes: `75a71ecc24` ("dpll: zl3073x: Register DPLL devices and pins") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251008141445.841113-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-13 17:24:36 -07:00
Jaegeuk Kim	9d5c4f5c7a	f2fs: fix wrong block mapping for multi-devices Assuming the disk layout as below, disk0: 0 --- 0x00035abfff disk1: 0x00035ac000 --- 0x00037abfff disk2: 0x00037ac000 --- 0x00037ebfff and we want to read data from offset=13568 having len=128 across the block devices, we can illustrate the block addresses like below. 0 .. 0x00037ac000 ------------------- 0x00037ebfff, 0x00037ec000 ------- \| ^ ^ ^ \| fofs 0 13568 13568+128 \| ------------------------------------------------------ \| LBA 0x37e8aa9 0x37ebfa9 0x37ec029 --- map 0x3caa9 0x3ffa9 In this example, we should give the relative map of the target block device ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by 0x37ebfff + 1 - 0x37ebfa9. In the below equation, however, map->m_pblk was supposed to be the original address instead of the one from the target block address. - map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk); Cc: stable@vger.kernel.org Fixes: `71f2c82062` ("f2fs: multidevice: support direct IO") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-10-13 23:55:44 +00:00
Mateusz Guzik	1ee889fdf4	f2fs: don't call iput() from f2fs_drop_inode() iput() calls the problematic routine, which does a ->i_count inc/dec cycle. Undoing it with iput() recurses into the problem. Note f2fs should not be playing games with the refcount to begin with, but that will be handled later. Right now solve the immediate regression. Fixes: `bc986b1d75` ("fs: stop accessing ->i_count directly in f2fs and gfs2") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509301450.138b448f-lkp@intel.com Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-10-13 23:55:44 +00:00
Hans Zhang	d6fc45100a	PCI: cadence: Search for MSI Capability with correct ID `907912c1da` ("PCI: cadence: Use cdns_pcie_find_capability() to avoid hardcoding offsets") incorrectly searched for the MSI-X Capability ID instead of the MSI Capability ID in cdns_pcie_ep_get_msi(). Search for PCI_CAP_ID_MSI, not PCI_CAP_ID_MSIX, to fix this problem. Fixes: `907912c1da` ("PCI: cadence: Use cdns_pcie_find_capability() to avoid hardcoding offsets") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/r/aOfMk9BW8BH2P30V@laps/ Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251010144307.12979-1-18255117159@163.com	2025-10-13 16:42:29 -05:00
Ingo Molnar	83b0177a6c	x86/mm: Fix SMP ordering in switch_mm_irqs_off() Stephen noted that it is possible to not have an smp_mb() between the loaded_mm store and the tlb_gen load in switch_mm(), meaning the ordering against flush_tlb_mm_range() goes out the window, and it becomes possible for switch_mm() to not observe a recent tlb_gen update and fail to flush the TLBs. [ dhansen: merge conflict fixed by Ingo ] Fixes: `209954cbc7` ("x86/mm/tlb: Update mm_cpumask lazily") Reported-by: Stephen Dolan <sdolan@janestreet.com> Closes: https://lore.kernel.org/all/CAHDw0oGd0B4=uuv8NGqbUQ_ZVmSheU2bN70e4QhFXWvuAZdt2w@mail.gmail.com/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>	2025-10-13 13:55:53 -07:00
Rik van Riel	f25785f9b0	x86/mm: Fix overflow in __cpa_addr() The change to have cpa_flush() call flush_kernel_pages() introduced a bug where __cpa_addr() can access an address one larger than the largest one in the cpa->pages array. KASAN reports the issue like this: BUG: KASAN: slab-out-of-bounds in __cpa_addr arch/x86/mm/pat/set_memory.c:309 [inline] BUG: KASAN: slab-out-of-bounds in __cpa_addr+0x1d3/0x220 arch/x86/mm/pat/set_memory.c:306 Read of size 8 at addr ffff88801f75e8f8 by task syz.0.17/5978 This bug could cause cpa_flush() to not properly flush memory, which somehow never showed any symptoms in my tests, possibly because cpa_flush() is called so rarely, but could potentially cause issues for other people. Fix the issue by directly calculating the flush end address from the start address. Fixes: `86e6815b31` ("x86/mm: Change cpa_flush() to call flush_kernel_range() directly") Reported-by: syzbot+afec6555eef563c66c97@syzkaller.appspotmail.com Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kiryl Shutsemau <kas@kernel.org> Link: https://lore.kernel.org/all/68e2ff90.050a0220.2c17c1.0038.GAE@google.com/	2025-10-13 13:55:48 -07:00
Dave Jiang	a375246fcf	cxl/features: Add check for no entries in cxl_feature_info cxl EDAC calls cxl_feature_info() to get the feature information and if the hardware has no Features support, cxlfs may be passed in as NULL. [ 51.957498] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 51.965571] #PF: supervisor read access in kernel mode [ 51.971559] #PF: error_code(0x0000) - not-present page [ 51.977542] PGD 17e4f6067 P4D 0 [ 51.981384] Oops: Oops: 0000 [#1] SMP NOPTI [ 51.986300] CPU: 49 UID: 0 PID: 3782 Comm: systemd-udevd Not tainted 6.17.0dj test+ #64 PREEMPT(voluntary) [ 51.997355] Hardware name: <removed> [ 52.009790] RIP: 0010:cxl_feature_info+0xa/0x80 [cxl_core] Add a check for cxlfs before dereferencing it and return -EOPNOTSUPP if there is no cxlfs created due to no hardware support. Fixes: `eb5dfcb9e3` ("cxl: Add support to handle user feature commands for set feature") Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2025-10-13 13:47:49 -07:00
Gustavo A. R. Silva	8aec9dbf2d	btrfs: send: fix -Wflex-array-member-not-at-end warning in struct send_ctx The warning -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Fix the following warning: fs/btrfs/send.c:181:24: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] and move the declaration of send_ctx::cur_inode_path to the end. Notice that struct fs_path contains a flexible array member inline_buf, but also a padding array and a limit calculated for the usable space of inline_buf (FS_PATH_INLINE_SIZE). It is not the pattern where flexible array is in the middle of a structure and could potentially overwrite other members. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:36:38 +02:00
Dan Carpenter	e92c294120	btrfs: tree-checker: fix bounds check in check_inode_extref() The parentheses for the unlikely() annotation were put in the wrong place so it means that the condition is basically never true and the bounds checking is skipped. Fixes: `aab9458b9f` ("btrfs: tree-checker: add inode extref checks") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:35:51 +02:00
Miquel Sabaté Solà	fec9b9d3ce	btrfs: fix memory leaks when rejecting a non SINGLE data profile without an RST At the end of btrfs_load_block_group_zone_info() the first thing we do is to ensure that if the mapping type is not a SINGLE one and there is no RAID stripe tree, then we return early with an error. Doing that, though, prevents the code from running the last calls from this function which are about freeing memory allocated during its run. Hence, in this case, instead of returning early, we set the ret value and fall through the rest of the cleanup code. Fixes: `5906333cc4` ("btrfs: zoned: don't skip block group profile checks on conventional zones") CC: stable@vger.kernel.org # 6.8+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:35:14 +02:00
Boris Burkov	8ab2fa6969	btrfs: fix incorrect readahead expansion length The intent of btrfs_readahead_expand() was to expand to the length of the current compressed extent being read. However, "ram_bytes" is not that, in the case where a single physical compressed extent is used for multiple file extents. Consider this case with a large compressed extent C and then later two non-compressed extents N1 and N2 written over C, leaving C1 and C2 pointing to offset/len pairs of C: [ C ] [ N1 ][ C1 ][ N2 ][ C2 ] In such a case, ram_bytes for both C1 and C2 is the full uncompressed length of C. So starting readahead in C1 will expand the readahead past the end of C1, past N2, and into C2. This will then expand readahead again, to C2_start + ram_bytes, way past EOF. First of all, this is totally undesirable, we don't want to read the whole file in arbitrary chunks of the large underlying extent if it happens to exist. Secondly, it results in zeroing the range past the end of C2 up to ram_bytes. This is particularly unpleasant with fs-verity as it can zero and set uptodate pages in the verity virtual space past EOF. This incorrect readahead behavior can lead to verity verification errors, if we iterate in a way that happens to do the wrong readahead. Fix this by using em->len for readahead expansion, not em->ram_bytes, resulting in the expected behavior of stopping readahead at the extent boundary. Reported-by: Max Chernoff <git@maxchernoff.ca> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2399898 Fixes: `9e9ff875e4` ("btrfs: use readahead_expand() on compressed extents") CC: stable@vger.kernel.org # 6.17 Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:34:08 +02:00
Filipe Manana	a5a51bf4e9	btrfs: do not assert we found block group item when creating free space tree Currently, when building a free space tree at populate_free_space_tree(), if we are not using the block group tree feature, we always expect to find block group items (either extent items or a block group item with key type BTRFS_BLOCK_GROUP_ITEM_KEY) when we search the extent tree with btrfs_search_slot_for_read(), so we assert that we found an item. However this expectation is wrong since we can have a new block group created in the current transaction which is still empty and for which we still have not added the block group's item to the extent tree, in which case we do not have any items in the extent tree associated to the block group. The insertion of a new block group's block group item in the extent tree happens at btrfs_create_pending_block_groups() when it calls the helper insert_block_group_item(). This typically is done when a transaction handle is released, committed or when running delayed refs (either as part of a transaction commit or when serving tickets for space reservation if we are low on free space). So remove the assertion at populate_free_space_tree() even when the block group tree feature is not enabled and update the comment to mention this case. Syzbot reported this with the following stack trace: BTRFS info (device loop3 state M): rebuilding free space tree assertion failed: ret == 0 :: 0, in fs/btrfs/free-space-tree.c:1115 ------------[ cut here ]------------ kernel BUG at fs/btrfs/free-space-tree.c:1115! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 UID: 0 PID: 6352 Comm: syz.3.25 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:populate_free_space_tree+0x700/0x710 fs/btrfs/free-space-tree.c:1115 Code: ff ff e8 d3 (...) RSP: 0018:ffffc9000430f780 EFLAGS: 00010246 RAX: 0000000000000043 RBX: ffff88805b709630 RCX: fea61d0e2e79d000 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffffc9000430f8b0 R08: ffffc9000430f4a7 R09: 1ffff92000861e94 R10: dffffc0000000000 R11: fffff52000861e95 R12: 0000000000000001 R13: 1ffff92000861f00 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007f424d9fe6c0(0000) GS:ffff888125afc000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd78ad212c0 CR3: 0000000076d68000 CR4: 00000000003526f0 Call Trace: <TASK> btrfs_rebuild_free_space_tree+0x1ba/0x6d0 fs/btrfs/free-space-tree.c:1364 btrfs_start_pre_rw_mount+0x128f/0x1bf0 fs/btrfs/disk-io.c:3062 btrfs_remount_rw fs/btrfs/super.c:1334 [inline] btrfs_reconfigure+0xaed/0x2160 fs/btrfs/super.c:1559 reconfigure_super+0x227/0x890 fs/super.c:1076 do_remount fs/namespace.c:3279 [inline] path_mount+0xd1a/0xfe0 fs/namespace.c:4027 do_mount fs/namespace.c:4048 [inline] __do_sys_mount fs/namespace.c:4236 [inline] __se_sys_mount+0x313/0x410 fs/namespace.c:4213 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f424e39066a Code: d8 64 89 02 (...) RSP: 002b:00007f424d9fde68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f424d9fdef0 RCX: 00007f424e39066a RDX: 0000200000000180 RSI: 0000200000000380 RDI: 0000000000000000 RBP: 0000200000000180 R08: 00007f424d9fdef0 R09: 0000000000000020 R10: 0000000000000020 R11: 0000000000000246 R12: 0000200000000380 R13: 00007f424d9fdeb0 R14: 0000000000000000 R15: 00002000000002c0 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- Reported-by: syzbot+884dc4621377ba579a6f@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/68dc3dab.a00a0220.102ee.004e.GAE@google.com/ Fixes: `a5ed918285` ("Btrfs: implement the free space B-tree") CC: <stable@vger.kernel.org> # 6.1.x: `1961d20f6f`: btrfs: fix assertion when building free space tree CC: <stable@vger.kernel.org> # 6.1.x Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:33:22 +02:00
Qu Wenruo	42d3a055d9	btrfs: do not use folio_test_partial_kmap() in ASSERT()s [BUG] Syzbot reported an ASSERT() triggered inside scrub: BTRFS info (device loop0): scrub: started on devid 1 assertion failed: !folio_test_partial_kmap(folio) :: 0, in fs/btrfs/scrub.c:697 ------------[ cut here ]------------ kernel BUG at fs/btrfs/scrub.c:697! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6077 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:scrub_stripe_get_kaddr+0x1bb/0x1c0 fs/btrfs/scrub.c:697 Call Trace: <TASK> scrub_bio_add_sector fs/btrfs/scrub.c:932 [inline] scrub_submit_initial_read+0xf21/0x1120 fs/btrfs/scrub.c:1897 submit_initial_group_read+0x423/0x5b0 fs/btrfs/scrub.c:1952 flush_scrub_stripes+0x18f/0x1150 fs/btrfs/scrub.c:1973 scrub_stripe+0xbea/0x2a30 fs/btrfs/scrub.c:2516 scrub_chunk+0x2a3/0x430 fs/btrfs/scrub.c:2575 scrub_enumerate_chunks+0xa70/0x1350 fs/btrfs/scrub.c:2839 btrfs_scrub_dev+0x6e7/0x10e0 fs/btrfs/scrub.c:3153 btrfs_ioctl_scrub+0x249/0x4b0 fs/btrfs/ioctl.c:3163 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> ---[ end trace 0000000000000000 ]--- Which doesn't make much sense, as all the folios we allocated for scrub should not be highmem. [CAUSE] Thankfully syzbot has a detailed kernel config file, showing that CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is set to y. And that debug option will force all folio_test_partial_kmap() to return true, to improve coverage on highmem tests. But in our case we really just want to make sure the folios we allocated are not highmem (and they are indeed not). Such incorrect result from folio_test_partial_kmap() is just screwing up everything. [FIX] Replace folio_test_partial_kmap() to folio_test_highmem() so that we won't bother those highmem specific debuging options. Fixes: `5fbaae4b85` ("btrfs: prepare scrub to support bs > ps cases") Reported-by: syzbot+bde59221318c592e6346@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:31:36 +02:00
Qu Wenruo	b7fdfd29a1	btrfs: only set the device specific options after devices are opened [BUG] With v6.17-rc kernels, btrfs will always set 'ssd' mount option even if the block device is not a rotating one: # cat /sys/block/sdd/queue/rotational 1 # cat /etc/fstab: LABEL=DATA2 /data2 btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/,nofail,nosuid,nodev 0 0 # mount [...] /dev/sdd on /data2 type btrfs (rw,nosuid,nodev,relatime,ssd,space_cache=v2,subvolid=5,subvol=/) [CAUSE] The 'ssd' mount option is set by set_device_specific_options(), and it expects that if there is any rotating device in the btrfs, it will set fs_devices::rotating. However after commit `bddf57a707` ("btrfs: delay btrfs_open_devices() until super block is created"), the device opening is delayed until the super block is created. But the timing of set_device_specific_options() is still left as is, this makes the function be called without any device opened. Since no device is opened, thus fs_devices::rotating will never be set, making btrfs incorrectly set 'ssd' mount option. [FIX] Only call set_device_specific_options() after btrfs_open_devices(). Also only call set_device_specific_options() after a new mount, if we're mounting a mounted btrfs, there is no need to set the device specific mount options again. Reported-by: HAN Yuwei <hrx@bupt.moe> Link: https://lore.kernel.org/linux-btrfs/C8FF75669DFFC3C5+5f93bf8a-80a0-48a6-81bf-4ec890abc99a@bupt.moe/ Fixes: `bddf57a707` ("btrfs: delay btrfs_open_devices() until super block is created") CC: stable@vger.kernel.org # 6.17 Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:29:53 +02:00
Miquel Sabaté Solà	53a4acbfc1	btrfs: fix memory leak on duplicated memory in the qgroup assign ioctl On 'btrfs_ioctl_qgroup_assign' we first duplicate the argument as provided by the user, which is kfree'd in the end. But this was not the case when allocating memory for 'prealloc'. In this case, if it somehow failed, then the previous code would go directly into calling 'mnt_drop_write_file', without freeing the string duplicated from the user space. Fixes: `4addc1ffd6` ("btrfs: qgroup: preallocate memory before adding a relation") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:29:27 +02:00
Filipe Manana	7e5a5983ed	btrfs: fix clearing of BTRFS_FS_RELOC_RUNNING if relocation already running When starting relocation, at reloc_chunk_start(), if we happen to find the flag BTRFS_FS_RELOC_RUNNING is already set we return an error (-EINPROGRESS) to the callers, however the callers call reloc_chunk_end() which will clear the flag BTRFS_FS_RELOC_RUNNING, which is wrong since relocation was started by another task and still running. Finding the BTRFS_FS_RELOC_RUNNING flag already set is an unexpected scenario, but still our current behaviour is not correct. Fix this by never calling reloc_chunk_end() if reloc_chunk_start() has returned an error, which is what logically makes sense, since the general widespread pattern is to have end functions called only if the counterpart start functions succeeded. This requires changing reloc_chunk_start() to clear BTRFS_FS_RELOC_RUNNING if there's a pending cancel request. Fixes: `907d2710d7` ("btrfs: add cancellable chunk relocation support") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:29:03 +02:00
Shuicheng Lin	9f64b3cd05	drm/xe/guc: Check GuC running state before deregistering exec queue In normal operation, a registered exec queue is disabled and deregistered through the GuC, and freed only after the GuC confirms completion. However, if the driver is forced to unbind while the exec queue is still running, the user may call exec_destroy() after the GuC has already been stopped and CT communication disabled. In this case, the driver cannot receive a response from the GuC, preventing proper cleanup of exec queue resources. Fix this by directly releasing the resources when GuC is not running. Here is the failure dmesg log: " [ 468.089581] ---[ end trace 0000000000000000 ]--- [ 468.089608] pci 0000:03:00.0: [drm] ERROR GT0: GUC ID manager unclean (1/65535) [ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535 [ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1 [ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1) [ 468.092716] ------------[ cut here ]------------ [ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe] " v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled(). As CT may go down and come back during VF migration. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com (cherry picked from commit `9b42321a02`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:26 -07:00
Vinay Belgaumkar	1852d27aa9	drm/xe: Enable media sampler power gating Where applicable, enable media sampler power gating. Also, add it to the powergate_info debugfs. v2: Remove the sampler powergate status since it is cleared quickly anyway. v3: Use vcs mask (Rodrigo) and fix the version check for media v4: Remove extra spaces v5: Media samplers are independent of vcs mask, use Media version 1255 (Matt Roper) Fixes: `38e8c4184e` ("drm/xe: Enable Coarse Power Gating") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `4cbc08649a`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:20 -07:00
Matthew Brost	7413e9f2be	drm/xe: Handle mixed mappings and existing VRAM on atomic faults Moving to VRAM will fail if mixed mappings are present or if the page is already located in VRAM. Atomic faults that require a move to VRAM currently retry without attempting to evict mixed mappings or locate existing VRAM mappings. This patch fixes the issue by attempting to evict mixed mappings or find existing VRAM pages when a move to VRAM fails during atomic fault handling. Fixes: `a9ac0fa455` ("drm/xe: Strict migration policy for atomic SVM faults") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251009130629.3531962-1-matthew.brost@intel.com (cherry picked from commit `75188605c5`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:15 -07:00
Thomas Hellström	1117e7d1e8	drm/xe/migrate: Fix an error path The exhaustive eviction accidently changed an error path goto to a return. Fix this. Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `381f1ed151`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:08 -07:00
Lucas De Marchi	d30203739b	drm/xe: Move rebar to be done earlier There may be cases in which the BAR0 also needs to move to accommodate the bigger BAR2. However if it's not released, the BAR2 resize fails. During the vram probe it can't be released as it's already in use by xe_mmio for early register access. Add a new function in xe_vram and let xe_pci call it directly before even early device probe. This allows the BAR2 to resize in cases BAR0 also needs to move, assuming there aren't other reasons to hold that move: [] xe 0000:03:00.0: vgaarb: deactivate vga console [] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned [] pcieport 0000:00:01.0: PCI bridge to [bus 01-04] [] pcieport 0000:00:01.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:00:01.0: bridge window [mem 0x4000000000-0x44007fffff 64bit pref] [] pcieport 0000:01:00.0: PCI bridge to [bus 02-04] [] pcieport 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] pcieport 0000:02:01.0: PCI bridge to [bus 03] [] pcieport 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff] [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] xe 0000:03:00.0: [drm] BAR2 resized to 16384M [] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ... For BMG there are additional fix needed in the PCI side, but this helps getting it to a working resize. All the rebar logic is more pci-specific than xe-specific and can be done very early in the probe sequence. In future it would be good to move it out of xe_vram.c, but this refactor is left for later. Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: stable@vger.kernel.org # 6.12+ Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250918-xe-pci-rebar-2-v1-2-6c094702a074@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `45e33f220f`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:03:03 -07:00
Matthew Brost	7ac74613e5	drm/xe: Don't allow evicting of BOs in same VM in array of VM binds An array of VM binds can potentially evict other buffer objects (BOs) within the same VM under certain conditions, which may lead to NULL pointer dereferences later in the bind pipeline. To prevent this, clear the allow_res_evict flag in the xe_bo_validate call. v2: - Invert polarity of no_res_evict (Thomas) - Add comment in code explaining issue (Thomas) Cc: stable@vger.kernel.org Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6268 Fixes: `774b5fa509` ("drm/xe: Avoid evicting object of the same vm in none fault mode") Fixes: `77f2ef3f16` ("drm/xe: Lock all gpuva ops during VM bind IOCTL") Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251009110618.3481870-1-matthew.brost@intel.com (cherry picked from commit `8b9ba8d6d9`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:02:58 -07:00
Kenneth Graunke	e5ae8d1eb0	drm/xe: Increase global invalidation timeout to 1000us The previous timeout of 500us seems to be too small; panning the map in the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered timeouts within a few seconds of usage, causing the monitor to freeze and the following to be printed to dmesg: [Jul30 13:44] xe 0000:03:00.0: [drm] ERROR GT0: Global invalidation timeout [Jul30 13:48] xe 0000:03:00.0: [drm] ERROR [CRTC:82:pipe A] flip_done timed out I haven't hit a single timeout since increasing it to 1000us even after several multi-hour testing sessions. Fixes: `0dd2dd0182` ("drm/xe: Move DSB l2 flush to a more sensible place") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `146046907b`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 13:02:50 -07:00
Tetsuo Handa	93a27b5891	can: j1939: add missing calls in NETDEV_UNREGISTER notification handler Currently NETDEV_UNREGISTER event handler is not calling j1939_cancel_active_session() and j1939_sk_queue_drop_all(). This will result in these calls being skipped when j1939_sk_release() is called. And I guess that the reason syzbot is still reporting unregister_netdevice: waiting for vcan0 to become free. Usage count = 2 is caused by lack of these calls. Calling j1939_cancel_active_session(priv, sk) from j1939_sk_release() can be covered by calling j1939_cancel_active_session(priv, NULL) from j1939_netdev_notify(). Calling j1939_sk_queue_drop_all() from j1939_sk_release() can be covered by calling j1939_sk_netdev_event_netdown() from j1939_netdev_notify(). Therefore, we can reuse j1939_cancel_active_session(priv, NULL) and j1939_sk_netdev_event_netdown(priv) for NETDEV_UNREGISTER event handler. Fixes: `7fcbe5b2c6` ("can: j1939: implement NETDEV_UNREGISTER notification handler") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/3ad3c7f8-5a74-4b07-a193-cb0725823558@I-love.SAKURA.ne.jp Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-10-13 21:26:31 +02:00
Marc Kleine-Budde	a3c35f8270	Merge patch series "can: add Transmitter Delay Compensation (TDC) documentation" Vincent Mailhol <mailhol@kernel.org> says: TDC was added to the kernel in 2021 but I never took time to update the documentation. The year is now 2025... As we say: "better late than never"! The first patch is a small clean up which fixes an incorrect statement concerning the CAN DLC, the second patch is the real thing and adds the documentation of how to use the ip tool to configure the TDC. Link: https://patch.msgid.link/20251013-can-fd-doc-v2-0-5d53bdc8f2ad@kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-10-13 21:26:08 +02:00
Vincent Mailhol	b5746b3e8e	can: add Transmitter Delay Compensation (TDC) documentation Back in 2021, support for CAN TDC was added to the kernel in series [1] and in iproute2 in series [2]. However, the documentation was never updated. Add a new sub-section under CAN-FD driver support to document how to configure the TDC using the "ip tool". [1] add the netlink interface for CAN-FD Transmitter Delay Compensation (TDC) Link: https://lore.kernel.org/all/20210918095637.20108-1-mailhol.vincent@wanadoo.fr/ [2] iplink_can: cleaning, fixes and adding TDC support Link: https://lore.kernel.org/all/20211103164428.692722-1-mailhol.vincent@wanadoo.fr/ Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Link: https://patch.msgid.link/20251013-can-fd-doc-v2-2-5d53bdc8f2ad@kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-10-13 21:26:01 +02:00
Vincent Mailhol	c282993ccd	can: remove false statement about 1:1 mapping between DLC and length The CAN-FD section of can.rst still states that there is a 1:1 mapping between the Classical CAN DLC and its length. This is only true for the DLC values up to 8. Beyond that point, the length remains at 8. For reference, the mapping between the CAN DLC and the length is given in below table [1]: DLC value CBFF and CEFF FBFF and FEFF [decimal] [byte] [byte] ---------------------------------------------- 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 8 12 10 8 16 11 8 20 12 8 24 13 8 32 14 8 48 15 8 64 Remove the erroneous statement. Instead just state that the length of a Classical CAN frame ranges from 0 to 8. [1] ISO 11898-1:2024, Table 5 -- DLC: coding of the four LSB Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Link: https://patch.msgid.link/20251013-can-fd-doc-v2-1-5d53bdc8f2ad@kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-10-13 21:26:01 +02:00
Babu Moger	15292f1b4c	x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID Users can create as many monitoring groups as the number of RMIDs supported by the hardware. However, on AMD systems, only a limited number of RMIDs are guaranteed to be actively tracked by the hardware. RMIDs that exceed this limit are placed in an "Unavailable" state. When a bandwidth counter is read for such an RMID, the hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable remains set on first read after tracking re-starts and is clear on all subsequent reads as long as the RMID is tracked. resctrl miscounts the bandwidth events after an RMID transitions from the "Unavailable" state back to being tracked. This happens because when the hardware starts counting again after resetting the counter to zero, resctrl in turn compares the new count against the counter value stored from the previous time the RMID was tracked. This results in resctrl computing an event value that is either undercounting (when new counter is more than stored counter) or a mistaken overflow (when new counter is less than stored counter). Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to zero whenever the RMID is in the "Unavailable" state to ensure accurate counting after the RMID resets to zero when it starts to be tracked again. Example scenario that results in mistaken overflow ================================================== 1. The resctrl filesystem is mounted, and a task is assigned to a monitoring group. $mount -t resctrl resctrl /sys/fs/resctrl $mkdir /sys/fs/resctrl/mon_groups/test1/ $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_/mbm_total_bytes 21323 <- Total bytes on domain 0 "Unavailable" <- Total bytes on domain 1 Task is running on domain 0. Counter on domain 1 is "Unavailable". 2. The task runs on domain 0 for a while and then moves to domain 1. The counter starts incrementing on domain 1. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_/mbm_total_bytes 7345357 <- Total bytes on domain 0 4545 <- Total bytes on domain 1 3. At some point, the RMID in domain 0 transitions to the "Unavailable" state because the task is no longer executing in that domain. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_/mbm_total_bytes "Unavailable" <- Total bytes on domain 0 434341 <- Total bytes on domain 1 4. Since the task continues to migrate between domains, it may eventually return to domain 0. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_/mbm_total_bytes 17592178699059 <- Overflow on domain 0 3232332 <- Total bytes on domain 1 In this case, the RMID on domain 0 transitions from "Unavailable" state to active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when the counter is read and begins tracking the RMID counting from 0. Subsequent reads succeed but return a value smaller than the previously saved MSR value (7345357). Consequently, the resctrl's overflow logic is triggered, it compares the previous value (7345357) with the new, smaller value and incorrectly interprets this as a counter overflow, adding a large delta. In reality, this is a false positive: the counter did not overflow but was simply reset when the RMID transitioned from "Unavailable" back to active state. Here is the text from APM [1] available from [2]. "In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the first QM_CTR read when it begins tracking an RMID that it was not previously tracking. The U bit will be zero for all subsequent reads from that RMID while it is still tracked by the hardware. Therefore, a QM_CTR read with the U bit set when that RMID is in use by a processor can be considered 0 when calculating the difference with a subsequent read." [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory Bandwidth (MBM). [ bp: Split commit message into smaller paragraph chunks for better consumption. ] Fixes: `4d05bf71f1` ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org # needs adjustments for <= v6.17 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]	2025-10-13 21:24:39 +02:00
Pauli Virtanen	2b929b6eec	ALSA: usb-audio: add mixer_playback_min_mute quirk for Logitech H390 ID 046d:0a8f Logitech, Inc. H390 headset with microphone is reported to have muted min playback volume. Apply quirk for that. Link: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4929 Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2025-10-13 21:03:42 +02:00
Li Ming	02e7567f5d	cxl/port: Avoid missing port component registers setup port->nr_dports is used to represent how many dports added to the cxl port, it will increase in add_dport() when a new dport is being added to the cxl port, but it will not be reduced when a dport is removed from the cxl port. Currently, when the first dport is added to a cxl port, it will trigger component registers setup on the cxl port, the implementation is using port->nr_dports to confirm if the dport is the first dport. A corner case here is that adding dport could fail after port->nr_dports updating and before checking port->nr_dports for component registers setup. If the failure happens during the first dport attaching, it will cause that CXL subsystem has not chance to execute component registers setup for the cxl port. the failure flow like below: port->nr_dports = 0 dport 1 adding to the port: add_dport() # port->nr_dports: 1 failed on devm_add_action_or_reset() or sysfs_create_link() return error # port->nr_dports: 1 dport 2 adding to the port: add_dport() # port->nr_dports: 2 no failure skip component registers setup because of port->nr_dports is 2 The solution here is that moving component registers setup closer to add_dport(), so if add_dport() is executed correctly for the first dport, component registers setup on the port will be executed immediately after that. Fixes: `f6ee24913d` ("cxl: Move port register setup to when first dport appear") Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2025-10-13 12:01:01 -07:00
Joshua Watt	9bb3baa9d1	NFS4: Fix state renewals missing after boot Since the last renewal time was initialized to 0 and jiffies start counting at -5 minutes, any clients connected in the first 5 minutes after a reboot would have their renewal timer set to a very long interval. If the connection was idle, this would result in the client state timing out on the server and the next call to the server would return NFS4ERR_BADSESSION. Fix this by initializing the last renewal time to the current jiffies instead of 0. Signed-off-by: Joshua Watt <jpewhacker@gmail.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-10-13 14:33:00 -04:00
Scott Mayhew	9ff022f382	NFS: check if suid/sgid was cleared after a write as needed I noticed xfstests generic/193 and generic/355 started failing against knfsd after commit `e7a8ebc305` ("NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE"). I ran those same tests against ONTAP (which has had write delegation support for a lot longer than knfsd) and they fail there too... so while it's a new failure against knfsd, it isn't an entirely new failure. Add the NFS_INO_REVAL_FORCED flag so that the presence of a delegation doesn't keep the inode from being revalidated to fetch the updated mode. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-10-13 14:33:00 -04:00
Joshua Watt	7a84394f02	NFS4: Apply delay_retrans to async operations The setting of delay_retrans is applied to synchronous RPC operations because the retransmit count is stored in same struct nfs4_exception that is passed each time an error is checked. However, for asynchronous operations (READ, WRITE, LOCKU, CLOSE, DELEGRETURN), a new struct nfs4_exception is made on the stack each time the task callback is invoked. This means that the retransmit count is always zero and thus delay_retrans never takes effect. Apply delay_retrans to these operations by tracking and updating their retransmit count. Change-Id: Ieb33e046c2b277cb979caa3faca7f52faf0568c9 Signed-off-by: Joshua Watt <jpewhacker@gmail.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-10-13 14:33:00 -04:00
Mike Snitzer	8db4a1d146	NFSv4/flexfiles: fix to allocate mirror->dss before use Move mirror_array's dss_count initialization and dss allocation to ff_layout_alloc_mirror(), just before the loop that initializes each nfs4_ff_layout_ds_stripe's nfs_file_localio. Also handle NULL return from kcalloc() and remove one level of indent in ff_layout_alloc_mirror(). This commit fixes dangling nfsd_serv refcount issues seen when using NFS LOCALIO and then attempting to stop the NFSD service. Fixes: `20b1d75fb8` ("NFSv4/flexfiles: Add support for striped layouts") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-10-13 14:33:00 -04:00
Jens Axboe	927069c4ac	Revert "io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()" This reverts commit `90bfb28d5f`. Kevin reports that this commit causes an issue for him with LVM snapshots, most likely because of turning off NOWAIT support while a snapshot is being created. This makes -EOPNOTSUPP bubble back through the completion handler, where io_uring read/write handling should just retry it. Reinstate the previous check removed by the referenced commit. Cc: stable@vger.kernel.org Fixes: `90bfb28d5f` ("io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()") Reported-by: Salvatore Bonaccorso <carnil@debian.org> Reported-by: Kevin Lumik <kevin@xf.ee> Link: https://lore.kernel.org/io-uring/cceb723c-051b-4de2-9a4c-4aa82e1619ee@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-10-13 12:16:40 -06:00
Jonathan Kim	079ae5118e	drm/amdkfd: fix suspend/resume all calls in mes based eviction path Suspend/resume all gangs should be done with the device lock is held. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	277bb0f83e	drm/amdgpu: enable suspend/resume all for gfx 12 Suspend/resume all gangs has been available for GFX12 for a while now so enable it. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	0ef930e1fa	drm/amdgpu: fix hung reset queue array memory allocation By design the MES will return an array result that is twice the number of hung doorbells it can report. i.e. if up k reported doorbells are supported, then the second half of the array, also of length k, holds the HQD information (type/queue/pipe) where queue 1 corresponds to index 0 and k, queue 2 corresponds to index 1 and k + 1 etc ... The driver will use the HDQ info to target queue/pipe reset for hardware scheduled user compute queues. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	8745ca5efb	drm/amdgpu: fix initialization of doorbell array for detect and hang Initialized doorbells should be set to invalid rather than 0 to prevent driver from over counting hung doorbells since it checks against the invalid value to begin with. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:28 -04:00
Jonathan Kim	d0de79f66a	drm/amdgpu: fix gfx12 mes packet status return check GFX12 MES uses low 32 bits of status return for success (1 or 0) and high bits for debug information if low bits are 0. GFX11 MES doesn't do this so checking full 64-bit status return for 1 or 0 is still valid. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-10-13 14:14:16 -04:00
Jesse.Zhang	883f309add	drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices Previously, APU platforms (and other scenarios with uninitialized VRAM managers) triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root cause is not that the `struct ttm_resource_manager man` pointer itself is NULL, but that `man->bdev` (the backing device pointer within the manager) remains uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to a kernel OOPS. 1. amdgpu_cs.c: Extend the existing bandwidth control check in `amdgpu_cs_get_threshold_for_moves()` to include a check for `ttm_resource_manager_used()`. If the manager is not used (uninitialized `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific logic that would trigger the NULL dereference. 2. amdgpu_kms.c: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info reporting to use a conditional: if the manager is used, return the real VRAM usage; otherwise, return 0. This avoids accessing `man->bdev` when it is NULL. 3. amdgpu_virt.c*: Modify the vf2pf (virtual function to physical function) data write path. Use `ttm_resource_manager_used()` to check validity: if the manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set `fb_usage` to 0 (APUs have no discrete framebuffer to report). This approach is more robust than APU-specific checks because it: - Works for all scenarios where the VRAM manager is uninitialized (not just APUs), - Aligns with TTM's design by using its native helper function, - Preserves correct behavior for discrete GPUs (which have fully initialized `man->bdev` and pass the `ttm_resource_manager_used()` check). v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Christian König	33cc891b56	drm/amdgpu: hide VRAM sysfs attributes on GPUs without VRAM Otherwise accessing them can cause a crash. Signed-off-by: Christian König <christian.koenig@amd.com> Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Sathishkumar S	74de0eaa00	drm/amdgpu: fix bit shift logic BIT_ULL(n) sets nth bit, remove explicit shift and set the position Fixes: `a7a411e246` ("drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set") Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Timur Kristóf	6917112af2	drm/amd/powerplay: Fix CIK shutdown temperature Remove extra multiplication. CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case the shutdown temperature is hardcoded in smu7_init_dpm_defaults and is already multiplied by 1000. The value was mistakenly multiplied another time by smu7_get_thermal_temperature_range. Fixes: `4ba082572a` ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Alex Deucher	ef38b4eab1	drm/amdgpu: drop unused structures in amdgpu_drm.h These were never used and are duplicated with the interface that is used. Maybe leftovers from a previous revision of the patch that added them. Fixes: `90c448fef3` ("drm/amdgpu: add new AMDGPU_INFO subquery for userq objects") Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00
Gui-Dong Han	6df8e84aa6	drm/amdgpu: use atomic functions with memory barriers for vm fault info The atomic variable vm_fault_info_updated is used to synchronize access to adev->gmc.vm_fault_info between the interrupt handler and get_vm_fault_info(). The default atomic functions like atomic_set() and atomic_read() do not provide memory barriers. This allows for CPU instruction reordering, meaning the memory accesses to vm_fault_info and the vm_fault_info_updated flag are not guaranteed to occur in the intended order. This creates a race condition that can lead to inconsistent or stale data being used. The previous implementation, which used an explicit mb(), was incomplete and inefficient. It failed to account for all potential CPU reorderings, such as the access of vm_fault_info being reordered before the atomic_read of the flag. This approach is also more verbose and less performant than using the proper atomic functions with acquire/release semantics. Fix this by switching to atomic_set_release() and atomic_read_acquire(). These functions provide the necessary acquire and release semantics, which act as memory barriers to ensure the correct order of operations. It is also more efficient and idiomatic than using explicit full memory barriers. Fixes: `b97dfa27ef` ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:15 -04:00

... 3 4 5 6 7 ...

1397031 Commits