Commit 86e25f40aa ("net: napi: Add napi_config") moved napi->napi_id
assignment to a later point in time (napi_hash_add_with_id). This breaks
__xdp_rxq_info_reg which copies napi_id at an earlier time and now
stores 0 napi_id. It also makes sk_mark_napi_id_once_xdp and
__sk_mark_napi_id_once useless because they now work against 0 napi_id.
Since sk_busy_loop requires valid napi_id to busy-poll on, there is no way
to busy-poll AF_XDP sockets anymore.
Bring back the ability to busy-poll on XSK by resolving socket's napi_id
at bind time. This relies on relatively recent netif_queue_set_napi,
but (assume) at this point most popular drivers should have been converted.
This also removes per-tx/rx cycles which used to check and/or set
the napi_id value.
Confirmed by running a busy-polling AF_XDP socket
(github.com/fomichev/xskrtt) on mlx5 and looking at BusyPollRxPackets
from /proc/net/netstat.
Fixes: 86e25f40aa ("net: napi: Add napi_config")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250109003436.2829560-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recalculate features when XDP is detached.
Before:
# ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp
# ip li set dev eth0 xdp off
# ethtool -k eth0 | grep gro
rx-gro-hw: off [requested on]
After:
# ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp
# ip li set dev eth0 xdp off
# ethtool -k eth0 | grep gro
rx-gro-hw: on
The fact that HW-GRO doesn't get re-enabled automatically is just
a minor annoyance. The real issue is that the features will randomly
come back during another reconfiguration which just happens to invoke
netdev_update_features(). The driver doesn't handle reconfiguring
two things at a time very robustly.
Starting with commit 98ba1d931f ("bnxt_en: Fix RSS logic in
__bnxt_reserve_rings()") we only reconfigure the RSS hash table
if the "effective" number of Rx rings has changed. If HW-GRO is
enabled "effective" number of rings is 2x what user sees.
So if we are in the bad state, with HW-GRO re-enablement "pending"
after XDP off, and we lower the rings by / 2 - the HW-GRO rings
doing 2x and the ethtool -L doing / 2 may cancel each other out,
and the:
if (old_rx_rings != bp->hw_resc.resv_rx_rings &&
condition in __bnxt_reserve_rings() will be false.
The RSS map won't get updated, and we'll crash with:
BUG: kernel NULL pointer dereference, address: 0000000000000168
RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0
bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180
__bnxt_setup_vnic_p5+0x58/0x110
bnxt_init_nic+0xb72/0xf50
__bnxt_open_nic+0x40d/0xab0
bnxt_open_nic+0x2b/0x60
ethtool_set_channels+0x18c/0x1d0
As we try to access a freed ring.
The issue is present since XDP support was added, really, but
prior to commit 98ba1d931f ("bnxt_en: Fix RSS logic in
__bnxt_reserve_rings()") it wasn't causing major issues.
Fixes: 1054aee823 ("bnxt_en: Use NETIF_F_GRO_HW.")
Fixes: 98ba1d931f ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As pointed out in the original comment, lookup in sockmap can return a TCP
ESTABLISHED socket. Such TCP socket may have had SO_ATTACH_REUSEPORT_EBPF
set before it was ESTABLISHED. In other words, a non-NULL sk_reuseport_cb
does not imply a non-refcounted socket.
Drop sk's reference in both error paths.
unreferenced object 0xffff888101911800 (size 2048):
comm "test_progs", pid 44109, jiffies 4297131437
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
80 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 9336483b):
__kmalloc_noprof+0x3bf/0x560
__reuseport_alloc+0x1d/0x40
reuseport_alloc+0xca/0x150
reuseport_attach_prog+0x87/0x140
sk_reuseport_attach_bpf+0xc8/0x100
sk_setsockopt+0x1181/0x1990
do_sock_setsockopt+0x12b/0x160
__sys_setsockopt+0x7b/0xc0
__x64_sys_setsockopt+0x1b/0x30
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 64d85290d7 ("bpf: Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250110-reuseport-memleak-v1-1-fa1ddab0adfe@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When jumping to 'martian_destination' a drop reason is always set but
that label falls-through the 'e_nobufs' one, overriding the value.
The behavior was introduced by the mentioned commit. The logic went
from,
goto martian_destination;
...
martian_destination:
...
e_inval:
err = -EINVAL;
goto out;
e_nobufs:
err = -ENOBUFS;
goto out;
to,
reason = ...;
goto martian_destination;
...
martian_destination:
...
e_nobufs:
reason = SKB_DROP_REASON_NOMEM;
goto out;
A 'goto out' is clearly missing now after 'martian_destination' to avoid
overriding the drop reason.
Fixes: 5b92112acd ("net: ip: make ip_route_input_slow() return drop reasons")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250108165725.404564-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CPSW ALE has 75-bit ALE entries stored across three 32-bit words.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions support
ALE field entries spanning up to two words at the most.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions work as
expected when ALE field spanned across word1 and word2, but fails when
ALE field spanned across word2 and word3.
For example, while reading the ALE field spanned across word2 and word3
(i.e. bits 62 to 64), the word3 data shifted to an incorrect position
due to the index becoming zero while flipping.
The same issue occurred when setting an ALE entry.
This issue has not been seen in practice but will be an issue in the future
if the driver supports accessing ALE fields spanning word2 and word3
Fix the methods to handle getting/setting fields spanning up to two words.
Fixes: b685f1a589 ("net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()")
Signed-off-by: Sudheer Kumar Doredla <s-doredla@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20250108172433.311694-1-s-doredla@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, Bluetooth and WPAN.
No outstanding fixes / investigations at this time.
Current release - new code bugs:
- eth: fbnic: revert HWMON support, it doesn't work at all and revert
is similar size as the fixes
Previous releases - regressions:
- tcp: allow a connection when sk_max_ack_backlog is zero
- tls: fix tls_sw_sendmsg error handling
Previous releases - always broken:
- netdev netlink family:
- prevent accessing NAPI instances from another namespace
- don't dump Tx and uninitialized NAPIs
- net: sysctl: avoid using current->nsproxy, fix null-deref if task
is exiting and stick to opener's netns
- sched: sch_cake: add bounds checks to host bulk flow fairness
counts
Misc:
- annual cleanup of inactive maintainers"
* tag 'net-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
sctp: sysctl: udp_port: avoid using current->nsproxy
sctp: sysctl: auth_enable: avoid using current->nsproxy
sctp: sysctl: rto_min/max: avoid using current->nsproxy
sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
mptcp: sysctl: sched: avoid using current->nsproxy
mptcp: sysctl: avail sched: remove write access
MAINTAINERS: remove Lars Povlsen from Microchip Sparx5 SoC
MAINTAINERS: remove Noam Dagan from AMAZON ETHERNET
MAINTAINERS: remove Ying Xue from TIPC
MAINTAINERS: remove Mark Lee from MediaTek Ethernet
MAINTAINERS: mark stmmac ethernet as an Orphan
MAINTAINERS: remove Andy Gospodarek from bonding
MAINTAINERS: update maintainers for Microchip LAN78xx
MAINTAINERS: mark Synopsys DW XPCS as Orphan
net/mlx5: Fix variable not being completed when function returns
rtase: Fix a check for error in rtase_alloc_msix()
net: stmmac: dwmac-tegra: Read iommu stream id from device tree
...
Pull btrfs fixes from David Sterba:
"A few more fixes.
Besides the one-liners in Btrfs there's fix to the io_uring and
encoded read integration (added in this development cycle). The update
to io_uring provides more space for the ongoing command that is then
used in Btrfs to handle some cases.
- io_uring and encoded read:
- provide stable storage for io_uring command data
- make a copy of encoded read ioctl call, reuse that in case the
call would block and will be called again
- properly initialize zlib context for hardware compression on s390
- fix max extent size calculation on filesystems with non-zoned
devices
- fix crash in scrub on crafted image due to invalid extent tree"
* tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path
btrfs: zoned: calculate max_extent_size properly on non-zoned setup
btrfs: avoid NULL pointer dereference if no valid extent tree
btrfs: don't read from userspace twice in btrfs_uring_encoded_read()
io_uring: add io_uring_cmd_get_async_data helper
io_uring/cmd: add per-op data to struct io_uring_cmd_data
io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix imbalance between flowtable BIND and UNBIND calls to configure
hardware offload, this fixes a possible kmemleak.
2) Clamp maximum conntrack hashtable size to INT_MAX to fix a possible
WARN_ON_ONCE splat coming from kvmalloc_array(), only possible from
init_netns.
* tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: clamp maximum hashtable size to INT_MAX
netfilter: nf_tables: imbalance in flowtable binding
====================
Link: https://patch.msgid.link/20250109123532.41768-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
net: sysctl: avoid using current->nsproxy
As pointed out by Al Viro and Eric Dumazet in [1], using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns as it is usually done. This could cause
unexpected issues when other operations are done on the wrong netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' or 'pernet' structure can be obtained from the table->data
using container_of().
Note that table->data could also be used directly in more places, but
that would increase the size of this fix to replace all accesses via
'net'. Probably best to avoid that for fixes.
Patches 2-9 remove access of net via current->nsproxy in sysfs handlers
in MPTCP, SCTP and RDS. There are multiple patches doing almost the same
thing, but the reason is to ease the backports.
Patch 1 is not directly linked to this, but it is a small fix for MPTCP
available_schedulers sysctl knob to explicitly mark it as read-only.
Please note that this series does not address Al's comment [2]. In SCTP,
some sysctl knobs set other sysfs-exposed variables for the min/max: two
processes could then write two linked values at the same time, resulting
in new values being outside the new boundaries. It would be great if
SCTP developers can look at this problem.
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Link: https://lore.kernel.org/20250105211158.GL1977892@ZenIV [2]
====================
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-0-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes: c6a58ffed5 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-9-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.probe_interval' is
used.
Fixes: d1e462a7a5 ("sctp: add probe_interval in sysctl and sock/asoc/transport")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-8-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
Fixes: 046c052b47 ("sctp: enable udp tunneling socks")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-7-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
Fixes: b14878ccb7 ("net: sctp: cache auth_enable per endpoint")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-6-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.
Fixes: 4f3fdf3bc5 ("sctp: add check rto_min and rto_max in sysctl")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-5-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.sctp_hmac_alg' is
used.
Fixes: 3c68198e75 ("sctp: Make hmac algorithm selection for cookie generation dynamic")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-4-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Even though we fixed a logic error in the commit cited below, syzbot
still managed to trigger an underflow of the per-host bulk flow
counters, leading to an out of bounds memory access.
To avoid any such logic errors causing out of bounds memory accesses,
this commit factors out all accesses to the per-host bulk flow counters
to a series of helpers that perform bounds-checking before any
increments and decrements. This also has the benefit of improving
readability by moving the conditional checks for the flow mode into
these helpers, instead of having them spread out throughout the
code (which was the cause of the original logic error).
As part of this change, the flow quantum calculation is consolidated
into a helper function, which means that the dithering applied to the
ost load scaling is now applied both in the DRR rotation and when a
sparse flow's quantum is first initiated. The only user-visible effect
of this is that the maximum packet size that can be sent while a flow
stays sparse will now vary with +/- one byte in some cases. This should
not make a noticeable difference in practice, and thus it's not worth
complicating the code to preserve the old behaviour.
Fixes: 546ea84d07 ("sched: sch_cake: fix bulk flow accounting logic for host fairness")
Reported-by: syzbot+f63600d288bfb7057424@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Link: https://patch.msgid.link/20250107120105.70685-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it
is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when
resizing hashtable because __GFP_NOWARN is unset. See:
0708a0afe2 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls")
Note: hashtable resize is only possible from init_netns.
Fixes: 9cc1c73ad6 ("netfilter: conntrack: avoid integer overflow when resizing")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
All these cases cause imbalance between BIND and UNBIND calls:
- Delete an interface from a flowtable with multiple interfaces
- Add a (device to a) flowtable with --check flag
- Delete a netns containing a flowtable
- In an interactive nft session, create a table with owner flag and
flowtable inside, then quit.
Fix it by calling FLOW_BLOCK_UNBIND when unregistering hooks, then
remove late FLOW_BLOCK_UNBIND call when destroying flowtable.
Fixes: ff4bf2f42a ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Reported-by: Phil Sutter <phil@nwl.cc>
Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We found a timeout problem with the pldm command on our system. The
reason is that the MCTP-I3C driver has a race condition when receiving
multiple-packet messages in multi-thread, resulting in a wrong packet
order problem.
We identified this problem by adding a debug message to the
mctp_i3c_read function.
According to the MCTP spec, a multiple-packet message must be composed
in sequence, and if there is a wrong sequence, the whole message will be
discarded and wait for the next SOM.
For example, SOM → Pkt Seq #2 → Pkt Seq #1 → Pkt Seq #3 → EOM.
Therefore, we try to solve this problem by adding a mutex to the
mctp_i3c_read function. Before the modification, when a command
requesting a multiple-packet message response is sent consecutively, an
error usually occurs within 100 loops. After the mutex, it can go
through 40000 loops without any error, and it seems to run well.
Fixes: c8755b29b5 ("mctp i3c: MCTP I3C driver")
Signed-off-by: Leo Yang <Leo-Yang@quantatw.com>
Link: https://patch.msgid.link/20250107031529.3296094-1-Leo-Yang@quantatw.com
[pabeni@redhat.com: dropped already answered question from changelog]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-01-07 (ice, igc)
For ice:
Arkadiusz corrects mask value being used to determine DPLL phase range.
Przemyslaw corrects frequency value for E823 devices.
For igc:
En-Wei Wu adds a check and, early, return for failed register read.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: return early when failing to read EECD register
ice: fix incorrect PHY settings for 100 GB/s
ice: fix max values for dpll pin phase adjust
====================
Link: https://patch.msgid.link/20250107190150.1758577-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btmtk: Fix failed to send func ctrl for MediaTek devices.
- hci_sync: Fix not setting Random Address when required
- MGMT: Fix Add Device to responding before completing
- btnxpuart: Fix driver sending truncated data
* tag 'for-net-2025-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btmtk: Fix failed to send func ctrl for MediaTek devices.
Bluetooth: btnxpuart: Fix driver sending truncated data
Bluetooth: MGMT: Fix Add Device to responding before completing
Bluetooth: hci_sync: Fix not setting Random Address when required
====================
Link: https://patch.msgid.link/20250108162627.1623760-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull SCSI fixes from James Bottomley:
"Four driver fixes in UFS, mostly to do with power management"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom: Power down the controller/device during system suspend for SM8550/SM8650 SoCs
scsi: ufs: qcom: Allow passing platform specific OF data
scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers
scsi: ufs: qcom: Power off the PHY if it was already powered on in ufs_qcom_power_up_sequence()
The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs
1024-1279 are in different BAR space addresses. However,
hclge_fetch_pf_reg does not distinguish the tqp space information when
reading the tqp space information. When the number of TQPs is greater
than 1024, access bar space overwriting occurs.
The problem of different segments has been considered during the
initialization of tqp.io_base. Therefore, tqp.io_base is directly used
when the queue is read in hclge_fetch_pf_reg.
The error message:
Unable to handle kernel paging request at virtual address ffff800037200000
pc : hclge_fetch_pf_reg+0x138/0x250 [hclge]
lr : hclge_get_regs+0x84/0x1d0 [hclge]
Call trace:
hclge_fetch_pf_reg+0x138/0x250 [hclge]
hclge_get_regs+0x84/0x1d0 [hclge]
hns3_get_regs+0x2c/0x50 [hns3]
ethtool_get_regs+0xf4/0x270
dev_ethtool+0x674/0x8a0
dev_ioctl+0x270/0x36c
sock_do_ioctl+0x110/0x2a0
sock_ioctl+0x2ac/0x530
__arm64_sys_ioctl+0xa8/0x100
invoke_syscall+0x4c/0x124
el0_svc_common.constprop.0+0x140/0x15c
do_el0_svc+0x30/0xd0
el0_svc+0x1c/0x2c
el0_sync_handler+0xb0/0xb4
el0_sync+0x168/0x180
Fixes: 939ccd107f ("net: hns3: move dump regs function to a separate file")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250106143642.539698-7-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch modifies the implementation of debugfs:
When the user process stops unexpectedly, not all data of the file system
is read. In this case, the save_buf pointer is not released. When the
user process is called next time, save_buf is used to copy the cached
data to the user space. As a result, the queried data is stale.
To solve this problem, this patch implements .open() and .release() handler
for debugfs file_operations. moving allocation buffer and execution
of the cmd to the .open() handler and freeing in to the .release() handler.
Allocate separate buffer for each reader and associate the buffer
with the file pointer.
When different user read processes no longer share the buffer,
the stale data problem is fixed.
Fixes: 5e69ea7ee2 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangwei Zhang <zhangwangwei6@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250106143642.539698-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a reset type that is not supported by the driver is input, a reset
pending flag bit of the HNAE3_NONE_RESET type is generated in
reset_pending. The driver does not have a mechanism to clear this type
of error. As a result, the driver considers that the reset is not
complete. This patch provides a mechanism to clear the
HNAE3_NONE_RESET flag and the parameter of
hnae3_ae_ops.set_default_reset_request is verified.
The error message:
hns3 0000:39:01.0: cmd failed -16
hns3 0000:39:01.0: hclge device re-init failed, VF is disabled!
hns3 0000:39:01.0: failed to reset VF stack
hns3 0000:39:01.0: failed to reset VF(4)
hns3 0000:39:01.0: prepare reset(2) wait done
hns3 0000:39:01.0 eth4: already uninitialized
Use the crash tool to view struct hclgevf_dev:
struct hclgevf_dev {
...
default_reset_request = 0x20,
reset_level = HNAE3_NONE_RESET,
reset_pending = 0x100,
reset_type = HNAE3_NONE_RESET,
...
};
Fixes: 720bd5837e ("net: hns3: add set_default_reset_request in the hnae3_ae_ops")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250106143642.539698-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The NAPI IDs were not fully exposed to user space prior to the netlink
API, so they were never namespaced. The netlink API must ensure that
at the very least NAPI instance belongs to the same netns as the owner
of the genl sock.
napi_by_id() can become static now, but it needs to move because of
dev_get_by_napi_id().
Cc: stable@vger.kernel.org
Fixes: 1287c1ae0f ("netdev-genl: Support setting per-NAPI config values")
Fixes: 27f91aaf49 ("netdev-genl: Add netlink framework functions for napi")
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250106180137.1861472-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull device mapper fixes from Mikulas Patocka:
- dm-array fixes
- dm-verity forward error correction fixes
- remove the flag DM_TARGET_PASSES_INTEGRITY from dm-ebs
- dm-thin RCU list fix
* tag 'for-6.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: make get_first_thin use rcu-safe list first function
dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY
dm-verity FEC: Avoid copying RS parity bytes twice.
dm-verity FEC: Fix RS FEC repair for roots unaligned to block size (take 2)
dm array: fix cursor index when skipping across block boundaries
dm array: fix unreleased btree blocks on closing a faulty array cursor
dm array: fix releasing a faulty array block twice in dm_array_cursor_end