Pull powerpc fixes from Michael Ellerman:
- Three fixes for various bogosity in our linker script, revealed
by the recent commit which changed discard behaviour with some
toolchains.
* tag 'powerpc-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/vmlinux.lds: Don't discard .comment
powerpc/vmlinux.lds: Don't discard .rela* for relocatable builds
powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT
Pull memblock fixes from Mike Rapoport:
"Small fixes in kernel-doc and tests:
- Fix kernel-doc for memblock_phys_free() to use correct names for
the counterpart allocation methods
- Fix compilation error in memblock tests"
* tag 'fixes-2023-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: Fix doc for memblock_phys_free
memblock tests: Fix compilation error.
Pull NFS client fixes from Trond Myklebust:
- Fix a race in the RPCSEC_GSS upcall code that causes hung RPC calls
- Fix a broken coalescing test in the pNFS file layout driver
- Ensure that the access cache rcu path also applies the login test
- Fix up for a sparse warning
* tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix up a sparse warning
NFS: Judge the file access cache's timestamp in rcu path
pNFS/filelayout: Fix coalescing test for single DS
SUNRPC: ensure the matching upcall is in-flight upon downcall
Pull cifs fixes from Steve French:
"cifs/smb3 client fixes:
- two multichannel fixes
- three reconnect fixes
- unmap fix"
* tag '6.2-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix interface count calculation during refresh
cifs: refcount only the selected iface during interface update
cifs: protect access of TCP_Server_Info::{dstaddr,hostname}
cifs: fix race in assemble_neg_contexts()
cifs: ignore ipc reconnect failures during dfs failover
cifs: Fix kmap_local_page() unmapping
Pull devicetree fixes from Rob Herring:
- Fix DT memory scanning for some MIPS boards when memory is not
specified in DT
- Redo CONFIG_CMDLINE* handling for missing /chosen node. The first
attempt broke PS3 (and possibly other PPC platforms).
- Fix constraints in QCom Soundwire schema
* tag 'devicetree-fixes-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fdt: Honor CONFIG_CMDLINE* even without /chosen node, take 2
Revert "of: fdt: Honor CONFIG_CMDLINE* even without /chosen node"
dt-bindings: soundwire: qcom,soundwire: correct sizes related to number of ports
of/fdt: run soc memory setup when early_init_dt_scan_memory fails
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 6.2-rc3 that resolve some
reported issues. They include:
- of-reported ulpi problem, so the offending commit is reverted
- dwc3 driver bugfixes for recent changes
- fotg210 fixes
Most of these have been in linux-next for a while, the last few were
on the mailing list for a long time and passed all the 0-day bot
testing so all should be fine with them as well"
* tag 'usb-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc3: gadget: Ignore End Transfer delay on teardown
usb: dwc3: xilinx: include linux/gpio/consumer.h
usb: fotg210-udc: fix error return code in fotg210_udc_probe()
usb: fotg210: fix OTG-only build
Revert "usb: ulpi: defer ulpi_register on ulpi_read_id timeout"
Pull rdma fixes from Jason Gunthorpe:
"Most noticeable is that Yishai found a big data corruption regression
due to a change in the scatterlist:
- Do not wrongly combine non-contiguous pages in scatterlist
- Fix compilation warnings on gcc 13
- Oops when using some mlx5 stats
- Bad enforcement of atomic responder resources in mlx5"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
lib/scatterlist: Fix to merge contiguous pages into the last SG properly
RDMA/mlx5: Fix validation of max_rd_atomic caps for DC
RDMA/mlx5: Fix mlx5_ib_get_hw_stats when used for device
RDMA/srp: Move large values to a new enum for gcc13
Pull Kbuild fixes from Masahiro Yamada:
- Fix single *.ko build
- Fix module builds when vmlinux.o or Module.symver is missing
* tag 'kbuild-fixes-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: readd -w option when vmlinux.o or Module.symver is missing
kbuild: fix single *.ko build
Pull drm fixes from Daniel Vetter:
"Still not much, but more than last week. Dave should be back next week
from the beaching.
drivers:
- i915-gvt fixes
- amdgpu/kfd fixes
- panfrost bo refcounting fix
- meson afbc corruption fix
- imx plane width fix
core:
- drm/sched fixes
- drm/mm kunit test fix
- dma-buf export error handling fixes"
* tag 'drm-fixes-2023-01-06' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/amd/display: Enable Freesync Video Mode by default"
drm/i915/gvt: fix double free bug in split_2MB_gtt_entry
drm/i915/gvt: use atomic operations to change the vGPU status
drm/i915/gvt: fix vgpu debugfs clean in remove
drm/i915/gvt: fix gvt debugfs destroy
drm/i915: unpin on error in intel_vgpu_shadow_mm_pin()
drm/amd/display: Uninitialized variables causing 4k60 UCLK to stay at DPM1 and not DPM0
drm/amdkfd: Fix kernel warning during topology setup
drm/scheduler: Fix lockup in drm_sched_entity_kill()
drm/imx: ipuv3-plane: Fix overlay plane width
drm/scheduler: Fix lockup in drm_sched_entity_kill()
drm/virtio: Fix memory leak in virtio_gpu_object_create()
drm/meson: Reduce the FIFO lines held when AFBC is not used
drm/tests: reduce drm_mm_test stack usage
drm/panfrost: Fix GEM handle creation ref-counting
drm/plane-helper: Add the missing declaration of drm_atomic_state
dma-buf: fix dma_buf_export init order v2
TPM 1 is sometimes broken across system suspends, due to races or
locking issues or something else that haven't been diagnosed or fixed
yet, most likely having to do with concurrent reads from the TPM's
hardware random number generator driver. These issues prevent the system
from actually suspending, with errors like:
tpm tpm0: A TPM error (28) occurred continue selftest
...
tpm tpm0: A TPM error (28) occurred attempting get random
...
tpm tpm0: Error (28) sending savestate before suspend
tpm_tis 00:08: PM: __pnp_bus_suspend(): tpm_pm_suspend+0x0/0x80 returns 28
tpm_tis 00:08: PM: dpm_run_callback(): pnp_bus_suspend+0x0/0x10 returns 28
tpm_tis 00:08: PM: failed to suspend: error 28
PM: Some devices failed to suspend, or early wake event detected
This issue was partially fixed by 23393c6461 ("char: tpm: Protect
tpm_pm_suspend with locks"), in a last minute 6.1 commit that Linus took
directly because the TPM maintainers weren't available. However, it
seems like this just addresses the most common cases of the bug, rather
than addressing it entirely. So there are more things to fix still,
apparently.
In lieu of actually fixing the underlying bug, just allow system suspend
to continue, so that laptops still go to sleep fine. Later, this can be
reverted when the real bug is fixed.
Link: https://lore.kernel.org/lkml/7cbe96cf-e0b5-ba63-d1b4-f63d2e826efa@suse.cz/
Cc: stable@vger.kernel.org # 6.1+
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Luigi Semenzato <semenzato@chromium.org>
Cc: Peter Huewe <peterhuewe@gmx.de>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 55d1cbbbb2 ("hfs/hfsplus: use WARN_ON for sanity check") fixed
a build warning by turning a comment into a WARN_ON(), but it turns out
that syzbot then complains because it can trigger said warning with a
corrupted hfs image.
The warning actually does warn about a bad situation, but we are much
better off just handling it as the error it is. So rather than warn
about us doing bad things, stop doing the bad things and return -EIO.
While at it, also fix a memory leak that was introduced by an earlier
fix for a similar syzbot warning situation, and add a check for one case
that historically wasn't handled at all (ie neither comment nor
subsequent WARN_ON).
Reported-by: syzbot+7bb7cd3595533513a9e7@syzkaller.appspotmail.com
Fixes: 55d1cbbbb2 ("hfs/hfsplus: use WARN_ON for sanity check")
Fixes: 8d824e69d9 ("hfs: fix OOB Read in __hfs_brec_find")
Link: https://lore.kernel.org/lkml/000000000000dbce4e05f170f289@google.com/
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block fixes from Jens Axboe:
"The big change here is obviously the revert of the pktcdvd driver
removal. Outside of that, just minor tweaks. In detail:
- Re-instate the pktcdvd driver, which necessitates adding back
bio_copy_data_iter() and the fops->devnode() hook for now (me)
- Fix for splitting of a bio marked as NOWAIT, causing either nowait
reads or writes to error with EAGAIN even if parts of the IO
completed (me)
- Fix for ublk, punting management commands to io-wq as they can all
easily block for extended periods of time (Ming)
- Removal of SRCU dependency for the block layer (Paul)"
* tag 'block-2023-01-06' of git://git.kernel.dk/linux:
block: Remove "select SRCU"
Revert "pktcdvd: remove driver."
Revert "block: remove devnode callback from struct block_device_operations"
Revert "block: bio_copy_data_iter"
ublk: honor IO_URING_F_NONBLOCK for handling control command
block: don't allow splitting of a REQ_NOWAIT bio
block: handle bio_split_to_limits() NULL return
Pull io_uring fixes from Jens Axboe:
"A few minor fixes that should go into the 6.2 release:
- Fix for a memory leak in io-wq worker creation, if we ultimately
end up canceling the worker creation before it gets created (me)
- lockdep annotations for the CQ locking (Pavel)
- A regression fix for CQ timeout handling (Pavel)
- Ring pinning around deferred task_work fix (Pavel)
- A trivial member move in struct io_ring_ctx, saving us some memory
(me)"
* tag 'io_uring-2023-01-06' of git://git.kernel.dk/linux:
io_uring: fix CQ waiting timeout handling
io_uring: move 'poll_multi_queue' bool in io_ring_ctx
io_uring: lockdep annotate CQ locking
io_uring: pin context while queueing deferred tw
io_uring/io-wq: free worker if task_work creation is canceled
Pull arm TIF_NOTIFY_SIGNAL fixup from Jens Axboe:
"Hui Tang reported a performance regressions with _TIF_WORK_MASK in
newer kernels, which he tracked to a change that went into 5.11. After
this change, we'll call do_work_pending() more often than we need to,
because we're now testing bits 0..15 rather than just 0..7.
Shuffle the bits around to avoid this"
* tag 'tif-notify-signal-2023-01-06' of git://git.kernel.dk/linux:
ARM: renumber bits related to _TIF_WORK_MASK
Pull ceph fixes from Ilya Dryomov:
"Two file locking fixes from Xiubo"
* tag 'ceph-for-6.2-rc3' of https://github.com/ceph/ceph-client:
ceph: avoid use-after-free in ceph_fl_release_lock()
ceph: switch to vfs_inode_has_locks() to fix file lock bug
Pull UDF fixes from Jan Kara:
"Two fixups of the UDF changes that went into 6.2-rc1"
* tag 'fixes_for_v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: initialize newblock to 0
udf: Fix extension of the last extent in the file
Pull btrfs fixes from David Sterba:
"A few more regression and regular fixes:
- regressions:
- fix assertion condition using = instead of ==
- fix false alert on bad tree level check
- fix off-by-one error in delalloc search during lseek
- fix compat ro feature check at read-write remount
- handle case when read-repair happens with ongoing device replace
- updated error messages"
* tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix compat_ro checks against remount
btrfs: always report error in run_one_delayed_ref()
btrfs: handle case when repair happens with dev-replace
btrfs: fix off-by-one in delalloc search during lseek
btrfs: fix false alert on bad tree level check
btrfs: add error message for metadata level mismatch
btrfs: fix ASSERT em->len condition in btrfs_get_extent
Pull RISC-V fixes from Palmer Dabbelt:
- use the correct mask for c.jr/c.jalr when decoding instructions
- build fix for get_user() to avoid a sparse warning
* tag 'riscv-for-linus-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: uaccess: fix type of 0 variable on error in get_user()
riscv, kprobes: Stricter c.jr/c.jalr decoding
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix segfault when trying to process tracepoints present in a
perf.data file and not linked with libtraceevent.
- Fix build on uClibc systems by adding missing sys/types.h include,
that was being obtained indirectly which stopped being the case when
tools/lib/traceevent was removed.
- Don't show commands in 'perf help' that depend on linking with
libtraceevent when not building with that library, which is now a
possibility since we no longer ship a copy in tools/lib/traceevent.
- Fix failure in 'perf test' entry testing the combination of 'perf
probe' user space function + 'perf record' + 'perf script' where it
expects a backtrace leading to glibc's inet_pton() from 'ping' that
now happens more than once with glibc 2.35 for IPv6 addreses.
- Fix for the inet_pton perf test on s/390 where
'text_to_binary_address' now appears on the backtrace.
- Fix build error on riscv due to missing header for 'struct
perf_sample'.
- Fix 'make -C tools perf_install' install variant by not propagating
the 'subdir' to submakes for the 'install_headers' targets.
- Fix handling of unsupported cgroup events when using BPF counters in
'perf stat'.
- Count all cgroups, not just the last one when using 'perf stat' and
combining --for-each-cgroup with --bpf-counters.
This makes the output using BPF counters match the output without
using it, which was the intention all along, the output should be the
same using --bpf-counters or not.
- Fix 'perf lock contention' core dump related to not finding the
"__sched_text_end" symbol on s/390.
- Fix build failure when HEAD is signed: exclude the signature from the
version string.
- Add missing closedir() calls to in perf_data__open_dir(), plugging a
fd leak.
* tag 'perf-tools-fixes-for-v6.2-1-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf tools: Fix build on uClibc systems by adding missing sys/types.h include
perf stat: Fix handling of --for-each-cgroup with --bpf-counters to match non BPF mode
perf stat: Fix handling of unsupported cgroup events when using BPF counters
perf test record_probe_libc_inet_pton: Fix test on s/390 where 'text_to_binary_address' now appears on the backtrace
perf lock contention: Fix core dump related to not finding the "__sched_text_end" symbol on s/390
perf build: Don't propagate subdir to submakes for install_headers
perf test record_probe_libc_inet_pton: Fix failure due to extra inet_pton() backtrace in glibc >= 2.35
perf tools: Fix segfault when trying to process tracepoints in perf.data and not linked with libtraceevent
perf tools: Don't include signature in version strings
perf help: Use HAVE_LIBTRACEEVENT to filter out unsupported commands
perf tools riscv: Fix build error on riscv due to missing header for 'struct perf_sample'
perf tools: Fix resources leak in perf_data__open_dir()
Pull perf fix from Ingo Molnar:
"Intel RAPL updates for new model IDs"
* tag 'perf-urgent-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Add support for Intel Emerald Rapids
perf/x86/rapl: Add support for Intel Meteor Lake
perf/x86/rapl: Treat Tigerlake like Icelake
Pull crypto fixes from Herbert Xu:
"This fixes a CFI crash in arm64/sm4 as well as a regression in the
caam driver"
* tag 'v6.2-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/sm4 - fix possible crash with CFI enabled
crypto: caam - fix CAAM io mem access in blob_gen
If we delay sending End Transfer for Setup TRB to be prepared, we need
to check if the End Transfer was in preparation for a driver
teardown/soft-disconnect. In those cases, just send the End Transfer
command without delay.
In the case of soft-disconnect, there's a very small chance the command
may not go through immediately. But should it happen, the Setup TRB will
be prepared during the polling of the controller halted state, allowing
the command to go through then.
In the case of disabling endpoint due to reconfiguration (e.g.
set_interface(alt-setting) or usb reset), then it's driven by the host.
Typically the host wouldn't immediately cancel the control request and
send another control transfer to trigger the End Transfer command
timeout.
Fixes: 4db0fbb601 ("usb: dwc3: gadget: Don't delay End Transfer on delayed_status")
Cc: stable@vger.kernel.org
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/f1617a323e190b9cc408fb8b65456e32b5814113.1670546756.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The newly added gpio consumer calls cause a build failure in configurations
that fail to include the right header implicitly:
drivers/usb/dwc3/dwc3-xilinx.c: In function 'dwc3_xlnx_init_zynqmp':
drivers/usb/dwc3/dwc3-xilinx.c:207:22: error: implicit declaration of function 'devm_gpiod_get_optional'; did you mean 'devm_clk_get_optional'? [-Werror=implicit-function-declaration]
207 | reset_gpio = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
| ^~~~~~~~~~~~~~~~~~~~~~~
| devm_clk_get_optional
Fixes: ca05b38252 ("usb: dwc3: xilinx: Add gpio-reset support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230103121755.956027-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The clang build reports this error
fs/udf/inode.c:805:6: error: variable 'newblock' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
if (*err < 0)
^~~~~~~~
newblock is never set before error handling jump.
Initialize newblock to 0 and remove redundant settings.
Fixes: d8b39db5fab8 ("udf: Handle error when adding extent to a file")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20221230175341.1629734-1-trix@redhat.com>
When extending the last extent in the file within the last block, we
wrongly computed the length of the last extent. This is mostly a
cosmetical problem since the extent does not contain any data and the
length will be fixed up by following operations but still.
Fixes: 1f3868f068 ("udf: Fix extending file within last block")
Signed-off-by: Jan Kara <jack@suse.cz>
Pull thermal control fix from Rafael Wysocki:
"Add a missing sysfs attribute to the int340x thermal driver (Srinivas
Pandruvada)"
* tag 'thermal-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: Add missing attribute for data rate base
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons, avoid
null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized
descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch and
kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with
WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal
document"
* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
caif: fix memory leak in cfctrl_linkup_request()
inet: control sockets should not use current thread task_frag
net/ulp: prevent ULP without clone op from entering the LISTEN status
qed: allow sleep in qed_mcp_trace_dump()
MAINTAINERS: Update maintainers for ptp_vmw driver
usb: rndis_host: Secure rndis_query check against int overflow
net: dpaa: Fix dtsec check for PCS availability
octeontx2-pf: Fix lmtst ID used in aura free
drivers/net/bonding/bond_3ad: return when there's no aggregator
netfilter: ipset: Rework long task execution when adding/deleting entries
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
net: sparx5: Fix reading of the MAC address
vxlan: Fix memory leaks in error path
net: sched: htb: fix htb_classify() kernel-doc
net: sched: cbq: dont intepret cls results when asked to drop
net: sched: atm: dont intepret cls results when asked to drop
dt-bindings: net: marvell,orion-mdio: Fix examples
dt-bindings: net: sun8i-emac: Add phy-supply property
net: ipa: use proper endpoint mask for suspend
selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
...
In the compressed instruction extension, c.jr, c.jalr, c.mv, and c.add
is encoded the following way (each instruction is 16b):
---+-+-----------+-----------+--
100 0 rs1[4:0]!=0 00000 10 : c.jr
100 1 rs1[4:0]!=0 00000 10 : c.jalr
100 0 rd[4:0]!=0 rs2[4:0]!=0 10 : c.mv
100 1 rd[4:0]!=0 rs2[4:0]!=0 10 : c.add
The following logic is used to decode c.jr and c.jalr:
insn & 0xf007 == 0x8002 => instruction is an c.jr
insn & 0xf007 == 0x9002 => instruction is an c.jalr
When 0xf007 is used to mask the instruction, c.mv can be incorrectly
decoded as c.jr, and c.add as c.jalr.
Correct the decoding by changing the mask from 0xf007 to 0xf07f.
Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230102160748.1307289-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull gpio fixes from Bartosz Golaszewski:
"A reference leak fix, two fixes for using uninitialized variables and
more drivers converted to using immutable irqchips:
- fix a reference leak in gpio-sifive
- fix a potential use of an uninitialized variable in core gpiolib
- fix a potential use of an uninitialized variable in gpio-pca953x
- make GPIO irqchips immutable in gpio-pmic-eic-sprd, gpio-eic-sprd
and gpio-sprd"
* tag 'gpio-fixes-for-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sifive: Fix refcount leak in sifive_gpio_probe
gpio: sprd: Make the irqchip immutable
gpio: pmic-eic-sprd: Make the irqchip immutable
gpio: eic-sprd: Make the irqchip immutable
gpio: pca953x: avoid to use uninitialized value pinctrl
gpiolib: Fix using uninitialized lookup-flags on ACPI platforms
When sg_alloc_append_table_from_pages() calls to pages_are_mergeable() in
its 'sgt_append->prv' flow to check whether it can merge contiguous pages
into the last SG, it passes the page arguments in the wrong order.
The first parameter should be the next candidate page to be merged to
the last page and not the opposite.
The current code leads to a corrupted SG which resulted in OOPs and
unexpected errors when non-contiguous pages are merged wrongly.
Fix to pass the page parameters in the right order.
Fixes: 1567b49d1a ("lib/scatterlist: add check when merging zone device pages")
Link: https://lore.kernel.org/r/20230105112339.107969-1-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull fbdev fixes from Helge Deller:
- Fix Matrox G200eW initialization failure
- Fix build failure of offb driver when built as module
- Optimize stack usage in omapfb
* tag 'fbdev-for-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: omapfb: avoid stack overflow warning
fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB
fbdev: atyfb: use strscpy() to instead of strncpy()
fbdev: omapfb: use strscpy() to instead of strncpy()
fbdev: make offb driver tristate
Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in selecting it. Therefore, remove the "select SRCU"
Kconfig statements.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Although the powerpc linker script mentions .comment in the DISCARD
section, that has never actually caused it to be discarded, because the
earlier ELF_DETAILS macro (previously STABS_DEBUG) explicitly includes
.comment.
However commit 99cb0d917f ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro. With binutils < 2.36 that causes the DISCARD directives later in
the script to be applied earlier, causing .comment to actually be
discarded.
It's confusing to explicitly include and discard .comment, and even more
so if the behaviour depends on the toolchain version. So don't discard
.comment in order to maintain the existing behaviour in all cases.
Fixes: 83a092cf95 ("powerpc: Link warning for orphan sections")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-3-mpe@ellerman.id.au
Relocatable kernels must not discard relocations, they need to be
processed at runtime. As such they are included for CONFIG_RELOCATABLE
builds in the powerpc linker script (line 340).
However they are also unconditionally discarded later in the
script (line 414). Previously that worked because the earlier inclusion
superseded the discard.
However commit 99cb0d917f ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro (line 137). With binutils < 2.36 that causes the DISCARD
directives later in the script to be applied earlier, causing .rela* to
actually be discarded at link time, leading to build warnings and a
kernel that doesn't boot:
ld: warning: discarding dynamic section .rela.init.rodata
Fix it by conditionally discarding .rela* only when CONFIG_RELOCATABLE
is disabled.
Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-2-mpe@ellerman.id.au
The powerpc linker script explicitly includes .exit.text, because
otherwise the link fails due to references from __bug_table and
__ex_table. The code is freed (discarded) at runtime along with
.init.text and data.
That has worked in the past despite powerpc not defining
RUNTIME_DISCARD_EXIT because DISCARDS appears late in the powerpc linker
script (line 410), and the explicit inclusion of .exit.text
earlier (line 280) supersedes the discard.
However commit 99cb0d917f ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro (line 136). With binutils < 2.36 that causes the DISCARD
directives later in the script to be applied earlier [1], causing
.exit.text to actually be discarded at link time, leading to build
errors:
'.exit.text' referenced in section '__bug_table' of crypto/algboss.o: defined in
discarded section '.exit.text' of crypto/algboss.o
'.exit.text' referenced in section '__ex_table' of drivers/nvdimm/core.o: defined in
discarded section '.exit.text' of drivers/nvdimm/core.o
Fix it by defining RUNTIME_DISCARD_EXIT, which causes the generic
DISCARDS macro to not include .exit.text at all.
1: https://lore.kernel.org/lkml/87fscp2v7k.fsf@igel.home/
Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-1-mpe@ellerman.id.au
The dsi_irq_stats structure is a little too big to fit on the
stack of a 32-bit task, depending on the specific gcc options:
fbdev/omap2/omapfb/dss/dsi.c: In function 'dsi_dump_dsidev_irqs':
fbdev/omap2/omapfb/dss/dsi.c:1621:1: error: the frame size of 1064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Since this is only a debugfs file, performance is not critical,
so just dynamically allocate it, and print an error message
in there in place of a failure code when the allocation fails.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Commit 63ffe00d8c ("kbuild: Fix running modpost with musl libc")
accidentally turned the unresolved symbol warnings into errors when
vmlinux.o (for in-tree builds) or Module.symver (for external module
builds) is missing.
In those cases, unresolved symbols are expected, but the -w option
is not set because 'missing-input' is referenced before set.
Move $(missing-input) back to the original place. This should be fine
for musl libc because vmlinux.o and -w are not added at the same time.
With this change, -w may be passed twice, but it is not a big deal.
Link: https://lore.kernel.org/all/b56a03b8-2a2a-f833-a5d2-cdc50a7ca2bb@cschramm.eu/
Fixes: 63ffe00d8c ("kbuild: Fix running modpost with musl libc")
Reported-by: Christopher Schramm <debian@cschramm.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Samuel Holland <samuel@sholland.org>
The single *.ko build is broken since commit f65a486821 ("kbuild:
change module.order to list *.o instead of *.ko").
Fixes: f65a486821 ("kbuild: change module.order to list *.o instead of *.ko")
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
The last fix to iface_count did fix the overcounting issue.
However, during each refresh, we could end up undercounting
the iface_count, if a match was found.
Fixing this by doing increments and decrements instead of
setting it to 0 before each parsing of server interfaces.
Fixes: 096bbeec7b ("smb3: interface count displayed incorrectly")
Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
When the server interface for a channel is not active anymore,
we have the logic to select an alternative interface. However
this was not breaking out of the loop as soon as a new alternative
was found. As a result, some interfaces may get refcounted unintentionally.
There was also a bug in checking if we found an alternate iface.
Fixed that too.
Fixes: b54034a73b ("cifs: during reconnect, update interface if necessary")
Cc: stable@vger.kernel.org # 5.19+
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
When an ULP-enabled socket enters the LISTEN status, the listener ULP data
pointer is copied inside the child/accepted sockets by sk_clone_lock().
The relevant ULP can take care of de-duplicating the context pointer via
the clone() operation, but only MPTCP and SMC implement such op.
Other ULPs may end-up with a double-free at socket disposal time.
We can't simply clear the ULP data at clone time, as TLS replaces the
socket ops with custom ones assuming a valid TLS ULP context is
available.
Instead completely prevent clone-less ULP sockets from entering the
LISTEN status.
Fixes: 734942cc4e ("tcp: ULP infrastructure")
Reported-by: slipper <slipper.alive@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4b80c3d1dbe3d0ab072f80450c202d9bc88b4b03.1672740602.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By default, qed_mcp_cmd_and_union() delays 10us at a time in a loop
that can run 500K times, so calls to qed_mcp_nvm_rd_cmd()
may block the current thread for over 5s.
We observed thread scheduling delays over 700ms in production,
with stacktraces pointing to this code as the culprit.
qed_mcp_trace_dump() is called from ethtool, so sleeping is permitted.
It already can sleep in qed_mcp_halt(), which calls qed_mcp_cmd().
Add a "can sleep" parameter to qed_find_nvram_image() and
qed_nvram_read() so they can sleep during qed_mcp_trace_dump().
qed_mcp_trace_get_meta_info() and qed_mcp_trace_read_meta(),
called only by qed_mcp_trace_dump(), allow these functions to sleep.
I can't tell if the other caller (qed_grc_dump_mcp_hw_dump()) can sleep,
so keep b_can_sleep set to false when it calls these functions.
An example stacktrace from a custom warning we added to the kernel
showing a thread that has not scheduled despite long needing resched:
[ 2745.362925,17] ------------[ cut here ]------------
[ 2745.362941,17] WARNING: CPU: 23 PID: 5640 at arch/x86/kernel/irq.c:233 do_IRQ+0x15e/0x1a0()
[ 2745.362946,17] Thread not rescheduled for 744 ms after irq 99
[ 2745.362956,17] Modules linked in: ...
[ 2745.363339,17] CPU: 23 PID: 5640 Comm: lldpd Tainted: P O 4.4.182+ #202104120910+6d1da174272d.61x
[ 2745.363343,17] Hardware name: FOXCONN MercuryB/Quicksilver Controller, BIOS H11P1N09 07/08/2020
[ 2745.363346,17] 0000000000000000 ffff885ec07c3ed8 ffffffff8131eb2f ffff885ec07c3f20
[ 2745.363358,17] ffffffff81d14f64 ffff885ec07c3f10 ffffffff81072ac2 ffff88be98ed0000
[ 2745.363369,17] 0000000000000063 0000000000000174 0000000000000074 0000000000000000
[ 2745.363379,17] Call Trace:
[ 2745.363382,17] <IRQ> [<ffffffff8131eb2f>] dump_stack+0x8e/0xcf
[ 2745.363393,17] [<ffffffff81072ac2>] warn_slowpath_common+0x82/0xc0
[ 2745.363398,17] [<ffffffff81072b4c>] warn_slowpath_fmt+0x4c/0x50
[ 2745.363404,17] [<ffffffff810d5a8e>] ? rcu_irq_exit+0xae/0xc0
[ 2745.363408,17] [<ffffffff817c99fe>] do_IRQ+0x15e/0x1a0
[ 2745.363413,17] [<ffffffff817c7ac9>] common_interrupt+0x89/0x89
[ 2745.363416,17] <EOI> [<ffffffff8132aa74>] ? delay_tsc+0x24/0x50
[ 2745.363425,17] [<ffffffff8132aa04>] __udelay+0x34/0x40
[ 2745.363457,17] [<ffffffffa04d45ff>] qed_mcp_cmd_and_union+0x36f/0x7d0 [qed]
[ 2745.363473,17] [<ffffffffa04d5ced>] qed_mcp_nvm_rd_cmd+0x4d/0x90 [qed]
[ 2745.363490,17] [<ffffffffa04e1dc7>] qed_mcp_trace_dump+0x4a7/0x630 [qed]
[ 2745.363504,17] [<ffffffffa04e2556>] ? qed_fw_asserts_dump+0x1d6/0x1f0 [qed]
[ 2745.363520,17] [<ffffffffa04e4ea7>] qed_dbg_mcp_trace_get_dump_buf_size+0x37/0x80 [qed]
[ 2745.363536,17] [<ffffffffa04ea881>] qed_dbg_feature_size+0x61/0xa0 [qed]
[ 2745.363551,17] [<ffffffffa04eb427>] qed_dbg_all_data_size+0x247/0x260 [qed]
[ 2745.363560,17] [<ffffffffa0482c10>] qede_get_regs_len+0x30/0x40 [qede]
[ 2745.363566,17] [<ffffffff816c9783>] ethtool_get_drvinfo+0xe3/0x190
[ 2745.363570,17] [<ffffffff816cc152>] dev_ethtool+0x1362/0x2140
[ 2745.363575,17] [<ffffffff8109bcc6>] ? finish_task_switch+0x76/0x260
[ 2745.363580,17] [<ffffffff817c2116>] ? __schedule+0x3c6/0x9d0
[ 2745.363585,17] [<ffffffff810dbd50>] ? hrtimer_start_range_ns+0x1d0/0x370
[ 2745.363589,17] [<ffffffff816c1e5b>] ? dev_get_by_name_rcu+0x6b/0x90
[ 2745.363594,17] [<ffffffff816de6a8>] dev_ioctl+0xe8/0x710
[ 2745.363599,17] [<ffffffff816a58a8>] sock_do_ioctl+0x48/0x60
[ 2745.363603,17] [<ffffffff816a5d87>] sock_ioctl+0x1c7/0x280
[ 2745.363608,17] [<ffffffff8111f393>] ? seccomp_phase1+0x83/0x220
[ 2745.363612,17] [<ffffffff811e3503>] do_vfs_ioctl+0x2b3/0x4e0
[ 2745.363616,17] [<ffffffff811e3771>] SyS_ioctl+0x41/0x70
[ 2745.363619,17] [<ffffffff817c6ffe>] entry_SYSCALL_64_fastpath+0x1e/0x79
[ 2745.363622,17] ---[ end trace f6954aa440266421 ]---
Fixes: c965db4446 ("qed: Add support for debug data collection")
Signed-off-by: Caleb Sander <csander@purestorage.com>
Acked-by: Alok Prasad <palok@marvell.com>
Link: https://lore.kernel.org/r/20230103233021.1457646-1-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexei Starovoitov says:
====================
bpf 2023-01-04
We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 5 files changed, 112 insertions(+), 18 deletions(-).
The main changes are:
1) Always use maximal size for copy_array in the verifier to fix
KASAN tracking, from Kees.
2) Fix bpf task iterator walking through dead tasks, from Kui-Feng.
3) Make sure livepatch and bpf fexit can coexist, from Chuang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Always use maximal size for copy_array()
selftests/bpf: add a test for iter/task_vma for short-lived processes
bpf: keep a reference to the mm, in case the task is dead.
selftests/bpf: Temporarily disable part of btf_dump:var_data test.
bpf: Fix panic due to wrong pageattr of im->image
====================
Link: https://lore.kernel.org/r/20230104215500.79435-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I do not read a strict requirement on /chosen node in either ePAPR or in
Documentation/devicetree. Help text for CONFIG_CMDLINE and
CONFIG_CMDLINE_EXTEND doesn't make their behavior explicitly dependent on
the presence of /chosen or the presense of /chosen/bootargs.
However the early check for /chosen and bailing out in
early_init_dt_scan_chosen() skips CONFIG_CMDLINE handling which is not
really related to /chosen node or the particular method of passing cmdline
from bootloader.
This leads to counterintuitive combinations (assuming
CONFIG_CMDLINE_EXTEND=y):
a) bootargs="foo", CONFIG_CMDLINE="bar" => cmdline=="foo bar"
b) /chosen missing, CONFIG_CMDLINE="bar" => cmdline==""
c) bootargs="", CONFIG_CMDLINE="bar" => cmdline==" bar"
Rework early_init_dt_scan_chosen() so that the cmdline config options are
always handled.
[commit msg written by Alexander Sverdlin]
Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Tested-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Link: https://lore.kernel.org/r/20230103-dt-cmdline-fix-v1-2-7038e88b18b6@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Pull virtio updates from Michael Tsirkin:
"Mostly fixes all over the place, a couple of cleanups"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (32 commits)
virtio_blk: Fix signedness bug in virtblk_prep_rq()
vdpa_sim_net: should not drop the multicast/broadcast packet
vdpasim: fix memory leak when freeing IOTLBs
vdpa: conditionally fill max max queue pair for stats
vdpa/vp_vdpa: fix kfree a wrong pointer in vp_vdpa_remove
vduse: Validate vq_num in vduse_validate_config()
tools/virtio: remove smp_read_barrier_depends()
tools/virtio: remove stray characters
vhost_vdpa: fix the crash in unmap a large memory
virtio: Implementing attribute show with sysfs_emit
virtio-crypto: fix memory leak in virtio_crypto_alg_skcipher_close_session()
tools/virtio: Variable type completion
vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
virtio_blk: use UINT_MAX instead of -1U
vhost-vdpa: fix an iotlb memory leak
vhost: fix range used in translate_desc()
vringh: fix range used in iotlb_translate()
vhost/vsock: Fix error handling in vhost_vsock_init()
vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
tools: Delete the unneeded semicolon after curly braces
...
The cacheline section holding this variable has two gaps, where one is
caused by this bool not packing well with structs. This causes it to
blow into the next cacheline. Move the variable, shrinking io_ring_ctx
by a full cacheline in size.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we split a bio marked with REQ_NOWAIT, then we can trigger spurious
EAGAIN if constituent parts of that split bio end up failing request
allocations. Parts will complete just fine, but just a single failure
in one of the chained bios will yield an EAGAIN final result for the
parent bio.
Return EAGAIN early if we end up needing to split such a bio, which
allows for saner recovery handling.
Cc: stable@vger.kernel.org # 5.15+
Link: https://github.com/axboe/liburing/issues/766
Reported-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull misc x86 fixes from Ingo Molnar:
"Fix a double-free bug, a binutils warning, a header namespace clash
and a bug in ib_prctl_set()"
* tag 'x86-urgent-2023-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Flush IBP in ib_prctl_set()
x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type
x86/asm: Fix an assembler warning with current binutils
x86/kexec: Fix double-free of elf header buffer
Pull f2fs fixes from Jaegeuk Kim:
- fix a null pointer dereference in f2fs_issue_flush, which occurs by
the combination of mount/remount options.
- fix a bug in per-block age-based extent_cache newly introduced in
6.2-rc1, which reported a wrong age information in extent_cache.
- fix a kernel panic if extent_tree was not created, which was caught
by a wrong BUG_ON
* tag 'f2fs-fix-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: let's avoid panic if extent_tree is not created
f2fs: should use a temp extent_info for lookup
f2fs: don't mix to use union values in extent_info
f2fs: initialize extent_cache parameter
f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()
Not all libc implementations define ssize_t as part of stdio.h like
glibc does since the standard only requires this type to be defined by
unistd.h and sys/types.h. For this reason the perf build is currently
broken for toolchains based on uClibc, for instance.
Include sys/types.h explicitly to fix that.
Committer notes:
In addition, in the past this worked in uClibc test systems as there was
another way to get to sys/types.h that got removed in that cset:
tools/perf/util/trace-event.h
/usr/include/traceevent/event_parse.h # This got removed from util/trace-event.h in 378ef0f5d9
/usr/include/regex.h
/usr/include/sys/types.h
typedef __ssize_t ssize_t;
So the size_t that is used in tools/perf/util/trace-event.h was being
obtained indirectly, by chance.
Fixes: 378ef0f5d9 ("perf build: Use libtraceevent from the system")
Signed-off-by: Jesus Sanchez-Palencia <jesussanp@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20230104193414.606905-1-jesussanp@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull nfsd fixes from Chuck Lever:
- Fix a filecache UAF during NFSD shutdown
- Avoid exposing automounted mounts on NFS re-exports
* tag 'nfsd-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix handling of readdir in v4root vs. mount upcall timeout
nfsd: shut down the NFSv4 state objects before the filecache
This can't happen right now, but in preparation for allowing
bio_split_to_limits() returning NULL if it ended the bio, check for it
in all the callers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If intel_gvt_dma_map_guest_page failed, it will call
ppgtt_invalidate_spt, which will finally free the spt.
But the caller function ppgtt_populate_spt_by_guest_entry
does not notice that, it will free spt again in its error
path.
Fix this by canceling the mapping of DMA address and freeing sub_spt.
Besides, leave the handle of spt destroy to caller function instead
of callee function when error occurs.
Fixes: b901b252b6 ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20221229165641.1192455-1-zyytlz.wz@163.com
Use the appropriate locks to protect access of hostname and dstaddr
fields in cifs_tree_connect() as they might get changed by other
tasks.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
The --for-each-cgroup can have the same cgroup multiple times, but this
confuses BPF counters (since they have the same cgroup id), making only
the last cgroup events to be counted.
Let's check the cgroup name before adding a new entry to the cgroups
list.
Before:
$ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
Performance counter stats for 'system wide':
<not counted> msec cpu-clock /
<not counted> context-switches /
<not counted> cpu-migrations /
<not counted> page-faults /
<not counted> cycles /
<not counted> instructions /
<not counted> branches /
<not counted> branch-misses /
8,016.04 msec cpu-clock / # 7.998 CPUs utilized
6,152 context-switches / # 767.461 /sec
250 cpu-migrations / # 31.187 /sec
442 page-faults / # 55.139 /sec
613,111,487 cycles / # 0.076 GHz
280,599,604 instructions / # 0.46 insn per cycle
57,692,724 branches / # 7.197 M/sec
3,385,168 branch-misses / # 5.87% of all branches
1.002220125 seconds time elapsed
After it becomes similar to the non-BPF mode:
$ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
Performance counter stats for 'system wide':
8,013.38 msec cpu-clock / # 7.998 CPUs utilized
6,859 context-switches / # 855.944 /sec
334 cpu-migrations / # 41.680 /sec
345 page-faults / # 43.053 /sec
782,326,119 cycles / # 0.098 GHz
471,645,724 instructions / # 0.60 insn per cycle
94,963,430 branches / # 11.851 M/sec
3,685,511 branch-misses / # 3.88% of all branches
1.001864539 seconds time elapsed
Committer notes:
As a reminder, to test with BPF counters one has to use BUILD_BPF_SKEL=1
in the make command line and have clang/llvm installed when building
perf, otherwise the --bpf-counters option will not be available:
# perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
Error: unknown option `bpf-counters'
Usage: perf stat [<options>] [<command>]
-a, --all-cpus system-wide collection from all CPUs
<SNIP>
#
Fixes: bb1c15b60b ("perf stat: Support regex pattern in --for-each-cgroup")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20230104064402.1551516-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When --for-each-cgroup option is used, it fails when any of events is
not supported and exits immediately. This is not how 'perf stat'
handles unsupported events.
Let's ignore the failure and proceed with others so that the output is
similar to when BPF counters are not used:
Before:
$ sudo ./perf stat -a --bpf-counters -e L1-icache-loads,L1-dcache-loads --for-each-cgroup system.slice,user.slice sleep 1
Failed to open first cgroup events
$
After it shows output similat to when --bpf-counters isn't specified:
$ sudo ./perf stat -a --bpf-counters -e L1-icache-loads,L1-dcache-loads --for-each-cgroup system.slice,user.slice sleep 1
Performance counter stats for 'system wide':
<not supported> L1-icache-loads system.slice
29,892,418 L1-dcache-loads system.slice
<not supported> L1-icache-loads user.slice
52,497,220 L1-dcache-loads user.slice
$
Fixes: 944138f048 ("perf stat: Enable BPF counter with --for-each-cgroup")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20230104064402.1551516-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test '84: probe libc's inet_pton & backtrace it with ping' fails on
s390. Debugging revealed a changed stack trace for the ping command
using probes:
ping 35729 [002] 8006.365063: probe_libc:inet_pton: (3ff9603e7c0)
13e7c0 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6)
---> 104371 text_to_binary_address+0xef1 (inlined)
104371 gaih_inet+0xef1 (inlined)
104371 __GI_getaddrinfo+0xef1 (inlined)
5d4b main+0x139b (/usr/bin/ping)
The line "---> text_to_binary_address ..." is new. It was introduced
with glibc version 2.36.7.2 released with Fedora 37 for s390.
Output before
# perf test inet_pton
84: probe libc's inet_pton & backtrace it with ping : FAILED!
#
Output after:
# perf test inet_pton
84: probe libc's inet_pton & backtrace it with ping : Ok
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20221228145704.2702487-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit cf4694be2b ("tools: Add atomic_test_and_set_bit()") changed
tools/arch/x86/include/asm/atomic.h to include <asm/asm.h>, which causes
'make -C tools/testing/memblock' to fail with:
In file included from ../../include/asm/atomic.h:6,
from ../../include/linux/atomic.h:5,
from ./linux/mmzone.h:5,
from ../../include/linux/mm.h:5,
from ../../include/linux/pfn.h:5,
from ./linux/memory_hotplug.h:6,
from ./linux/init.h:7,
from ./linux/memblock.h:11,
from tests/common.h:8,
from tests/basic_api.h:5,
from main.c:2:
../../include/asm/../../arch/x86/include/asm/atomic.h:11:10: fatal error: asm/asm.h: No such file or directory
11 | #include <asm/asm.h>
| ^~~~~~~~~~~
Create a symlink to asm/asm.h in the same manner as the existing one to
asm/cmpxchg.h.
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/010101857c402765-96e2dbc6-b82b-47e2-a437-4834dbe0b96b-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Vivek has decided to transfer the maintainership of the VMware virtual
PTP clock driver (ptp_vmw) to Srivatsa and Deep. Update the
MAINTAINERS file to reflect this change, and also add Alexey as a
reviewer for the driver.
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Acked-by: Vivek Thampi <vivek@vivekthampi.com>
Acked-by: Deep Shah <sdeep@vmware.com>
Acked-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Serialise access of TCP_Server_Info::hostname in
assemble_neg_contexts() by holding the server's mutex otherwise it
might end up accessing an already-freed hostname pointer from
cifs_reconnect() or cifs_resolve_server().
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
If it failed to reconnect ipc used for getting referrals, we can just
ignore it as it is not required for reconnecting the share. The worst
case would be not being able to detect or chase nested links as long
as dfs root server is unreachable.
Before patch:
$ mount.cifs //root/dfs/link /mnt -o echo_interval=10,...
-> target share: /fs0/share
disconnect root & fs0
$ ls /mnt
ls: cannot access '/mnt': Host is down
connect fs0
$ ls /mnt
ls: cannot access '/mnt': Resource temporarily unavailable
After patch:
$ mount.cifs //root/dfs/link /mnt -o echo_interval=10,...
-> target share: /fs0/share
disconnect root & fs0
$ ls /mnt
ls: cannot access '/mnt': Host is down
connect fs0
$ ls /mnt
bar.rtf dir1 foo
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
There are several properties depending on number of ports. Some of them
had maximum limit of 5 and some of 8. SM8450 AudioReach comes with 8
ports, so fix the limits:
sm8450-sony-xperia-nagara-pdx224.dtb: soundwire-controller@3250000: qcom,ports-word-length: 'oneOf' conditional failed, one must be fixed:
[[255, 255, 255, 255, 255, 255, 255, 255]] is too short
[255, 255, 255, 255, 255, 255, 255, 255] is too long
Fixes: febc50b82b ("dt-bindings: soundwire: Convert text bindings to DT Schema")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221223132159.81211-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
If memory has been found early_init_dt_scan_memory now returns 1. If
it hasn't found any memory it will return 0, allowing other memory
setup mechanisms to carry on.
Previously early_init_dt_scan_memory always returned 0 without
distinguishing between any kind of memory setup being done or not. Any
code path after the early_init_dt_scan memory call in the ramips
plat_mem_setup code wouldn't be executed anymore. Making
early_init_dt_scan_memory the only way to initialize the memory.
Some boards, including my mt7621 based Cudy X6 board, depend on memory
initialization being done via the soc_info.mem_detect function
pointer. Those wouldn't be able to obtain memory and panic the kernel
during early bootup with the message "early_init_dt_alloc_memory_arch:
Failed to allocate 12416 bytes align=0x40".
Fixes: 1f012283e9 ("of/fdt: Rework early_init_dt_scan_memory() to call directly")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Rammhold <andreas@rammhold.de>
Link: https://lore.kernel.org/r/20221223112748.2935235-1-andreas@rammhold.de
Signed-off-by: Rob Herring <robh@kernel.org>
The test case perf lock contention dumps core on s390. Run the following
commands:
# ./perf lock record -- ./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 2.799 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ]
#
# ./perf lock contention
Segmentation fault (core dumped)
#
The function call stack is lengthy, here are the top 5 functions:
# gdb ./perf core.24048
GNU gdb (GDB) Fedora Linux 12.1-6.fc37
Core was generated by `./perf lock contention'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356
3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start);
(gdb) where
#0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356
#1 0x000000000109f244 in callchain_id (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:957
#2 0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290, addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:586
#3 0x000000000109f4d0 in report_lock_contention_begin_event (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1004
#4 0x00000000010a00ae in evsel__process_contention_begin (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1254
#5 0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, event=0x3ff85601ef8, sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464
.....
The issue is in function machine__is_lock_function() in file
./util/machine.c lines 3355:
/* should not fail from here */
sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end", &kmap);
machine->sched.text_end = kmap->unmap_ip(kmap, sym->start)
On s390 the symbol __sched_text_end is *NOT* in the symbol list and the
resulting pointer sym is set to NULL. The sym->start is then a NULL pointer
access and generates the core dump.
The reason why __sched_text_end is not in the symbol list on s390 is
simple:
When the symbol list is created at perf start up with function calls
dso__load
+--> dso__load_vmlinux_path
+--> dso__load_vmlinux
+--> dso__load_sym
+--> dso__load_sym_internal (reads kernel symbols)
+--> symbols__fixup_end
+--> symbols__fixup_duplicate
The issue is in function symbols__fixup_duplicate(). It deletes all
symbols with have the same address. On s390:
# nm -g ~/linux/vmlinux| fgrep c68390
0000000000c68390 T __cpuidle_text_start
0000000000c68390 T __sched_text_end
#
two symbols have identical addresses and __sched_text_end is considered
duplicate (in ascending sort order) and removed from the symbol list.
Therefore it is missing and an invalid pointer reference occurs. The
code checks for symbol __sched_text_start and when it exists assumes
symbol __sched_text_end is also in the symbol table. However this is not
the case on s390.
Same situation exists for symbol __lock_text_start:
0000000000c68770 T __cpuidle_text_end
0000000000c68770 T __lock_text_start
This symbol is also removed from the symbol table but used in function
machine__is_lock_function().
To fix this and keep duplicate symbols in the symbol table, set
symbol_conf.allow_aliases to true. This prevents the removal of
duplicate symbols in function symbols__fixup_duplicate().
Output After:
# ./perf lock contention
contended total wait max wait avg wait type caller
48 124.39 ms 123.99 ms 2.59 ms rwsem:W unlink_anon_vmas+0x24a
47 83.68 ms 83.26 ms 1.78 ms rwsem:W free_pgtables+0x132
5 41.22 us 10.55 us 8.24 us rwsem:W free_pgtables+0x140
4 40.12 us 20.55 us 10.03 us rwsem:W copy_process+0x1ac8
#
Fixes: 0d2997f750 ("perf lock: Look up callchain for the contended locks")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20221230102627.2410847-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
subdir is added to the OUTPUT which fails as part of building
install_headers when passed from "make -C tools perf_install".
Committer testing:
The original reporter (see the Link: below) had trouble with this:
$ make -C tools perf_install
That ended up with errors like this:
/var/home/acme/git/perf-urgent/tools/scripts/Makefile.include:17: *** output directory "/var/home/acme/git/perf-urgent/tools/perf/libperf/perf/" does not exist. Stop.
With this patch applied we now get it installed at:
INSTALL /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h
As expected:
$ ls -la /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h
-rw-r--r--. 1 acme acme 1146 Jan 3 15:42 /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h
And if we clean tools with:
$ make -C tools clean
it gets cleaned up:
$ ls -la /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h
ls: cannot access '/var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h': No such file or directory
$
Fixes: 746bd29e34 ("perf build: Use tools/lib headers from install path")
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/fa4b3115-d555-3d7f-54d1-018002e99350@secunet.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Both <linux/mmiotrace.h> and <asm/insn-eval.h> define various MMIO_ enum constants,
whose namespace overlaps.
Rename the <asm/insn-eval.h> ones to have a INSN_ prefix, so that the headers can be
used from the same source file.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230101162910.710293-2-Jason@zx2c4.com
With below two cases, it will cause NULL pointer dereference when
accessing SM_I(sbi)->fcc_info in f2fs_issue_flush().
a) If kthread_run() fails in f2fs_create_flush_cmd_control(), it will
release SM_I(sbi)->fcc_info,
- mount -o noflush_merge /dev/vda /mnt/f2fs
- mount -o remount,flush_merge /dev/vda /mnt/f2fs -- kthread_run() fails
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync
b) we will never allocate memory for SM_I(sbi)->fcc_info w/ below
testcase,
- mount -o ro /dev/vda /mnt/f2fs
- mount -o rw,remount /dev/vda /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync
In order to fix this issue, let change as below:
- fix error path handling in f2fs_create_flush_cmd_control().
- allocate SM_I(sbi)->fcc_info even if readonly is on.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[BUG]
Even with commit 81d5d61454 ("btrfs: enhance unsupported compat RO
flags handling"), btrfs can still mount a fs with unsupported compat_ro
flags read-only, then remount it RW:
# btrfs ins dump-super /dev/loop0 | grep compat_ro_flags -A 3
compat_ro_flags 0x403
( FREE_SPACE_TREE |
FREE_SPACE_TREE_VALID |
unknown flag: 0x400 )
# mount /dev/loop0 /mnt/btrfs
mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
^^^ RW mount failed as expected ^^^
# dmesg -t | tail -n5
loop0: detected capacity change from 0 to 1048576
BTRFS: device fsid cb5b82f5-0fdd-4d81-9b4b-78533c324afa devid 1 transid 7 /dev/loop0 scanned by mount (1146)
BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm
BTRFS info (device loop0): using free space tree
BTRFS error (device loop0): cannot mount read-write because of unknown compat_ro features (0x403)
BTRFS error (device loop0): open_ctree failed
# mount /dev/loop0 -o ro /mnt/btrfs
# mount -o remount,rw /mnt/btrfs
^^^ RW remount succeeded unexpectedly ^^^
[CAUSE]
Currently we use btrfs_check_features() to check compat_ro flags against
our current mount flags.
That function get reused between open_ctree() and btrfs_remount().
But for btrfs_remount(), the super block we passed in still has the old
mount flags, thus btrfs_check_features() still believes we're mounting
read-only.
[FIX]
Replace the existing @sb argument with @is_rw_mount.
As originally we only use @sb to determine if the mount is RW.
Now it's callers' responsibility to determine if the mount is RW, and
since there are only two callers, the check is pretty simple:
- caller in open_ctree()
Just pass !sb_rdonly().
- caller in btrfs_remount()
Pass !(*flags & SB_RDONLY), as our check should be against the new
flags.
Now we can correctly reject the RW remount:
# mount /dev/loop0 -o ro /mnt/btrfs
# mount -o remount,rw /mnt/btrfs
mount: /mnt/btrfs: mount point not mounted or bad option.
dmesg(1) may have more information after failed mount system call.
# dmesg -t | tail -n 1
BTRFS error (device loop0: state M): cannot mount read-write because of unknown compat_ro features (0x403)
Reported-by: Chung-Chiang Cheng <shepjeng@gmail.com>
Fixes: 81d5d61454 ("btrfs: enhance unsupported compat RO flags handling")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but
if end users hit such problem, there will be no chance that
btrfs_debug() is enabled. This can lead to very little useful info for
debugging.
This patch will:
- Add extra info for error reporting
Including:
* logical bytenr
* num_bytes
* type
* action
* ref_mod
- Replace the btrfs_debug() with btrfs_err()
- Move the error reporting into run_one_delayed_ref()
This is to avoid use-after-free, the @node can be freed in the caller.
This error should only be triggered at most once.
As if run_one_delayed_ref() failed, we trigger the error message, then
causing the call chain to error out:
btrfs_run_delayed_refs()
`- btrfs_run_delayed_refs()
`- btrfs_run_delayed_refs_for_head()
`- run_one_delayed_ref()
And we will abort the current transaction in btrfs_run_delayed_refs().
If we have to run delayed refs for the abort transaction,
run_one_delayed_ref() will just cleanup the refs and do nothing, thus no
new error messages would be output.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that a BUG_ON() in btrfs_repair_io_failure()
(originally repair_io_failure() in v6.0 kernel) got triggered when
replacing a unreliable disk:
BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39624704 csum 0xb0d18c75 expected csum 0x4dae9c5e mirror 3
kernel BUG at fs/btrfs/extent_io.c:2380!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 3614331 Comm: kworker/u257:2 Tainted: G OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
Call Trace:
<TASK>
clean_io_failure+0x14d/0x180 [btrfs]
end_bio_extent_readpage+0x412/0x6e0 [btrfs]
? __switch_to+0x106/0x420
process_one_work+0x1c7/0x380
worker_thread+0x4d/0x380
? rescuer_thread+0x3a0/0x3a0
kthread+0xe9/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
[CAUSE]
Before the BUG_ON(), we got some read errors from the replace target
first, note the mirror number (3, which is beyond RAID1 duplication,
thus it's read from the replace target device).
Then at the BUG_ON() location, we are trying to writeback the repaired
sectors back the failed device.
The check looks like this:
ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
&map_length, &bioc, mirror_num);
if (ret)
goto out_counter_dec;
BUG_ON(mirror_num != bioc->mirror_num);
But inside btrfs_map_block(), we can modify bioc->mirror_num especially
for dev-replace:
if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
!need_full_stripe(op) && dev_replace->tgtdev != NULL) {
ret = get_extra_mirror_from_replace(fs_info, logical, *length,
dev_replace->srcdev->devid,
&mirror_num,
&physical_to_patch_in_first_stripe);
patch_the_first_stripe_for_dev_replace = 1;
}
Thus if we're repairing the replace target device, we're going to
trigger that BUG_ON().
But in reality, the read failure from the replace target device may be
that, our replace hasn't reached the range we're reading, thus we're
reading garbage, but with replace running, the range would be properly
filled later.
Thus in that case, we don't need to do anything but let the replace
routine to handle it.
[FIX]
Instead of a BUG_ON(), just skip the repair if we're repairing the
device replace target device.
Reported-by: 小太 <nospam@kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYyJGQZ+yvjzxA1Nn2LuqkYqTCcUH43S=+wXhyf8S00Ag@mail.gmail.com/
CC: stable@vger.kernel.org # 6.0+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
During lseek, when searching for delalloc in a range that represents a
hole and that range has a length of 1 byte, we end up not doing the actual
delalloc search in the inode's io tree, resulting in not correctly
reporting the offset with data or a hole. This actually only happens when
the start offset is 0 because with any other start offset we round it down
by sector size.
Reproducer:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt/sdc
$ xfs_io -f -c "pwrite -q 0 1" /mnt/sdc/foo
$ xfs_io -c "seek -d 0" /mnt/sdc/foo
Whence Result
DATA EOF
It should have reported an offset of 0 instead of EOF.
Fix this by updating btrfs_find_delalloc_in_range() and count_range_bits()
to deal with inclusive ranges properly. These functions are already
supposed to work with inclusive end offsets, they just got it wrong in a
couple places due to off-by-one mistakes.
A test case for fstests will be added later.
Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/20221223020509.457113-1-joanbrugueram@gmail.com/
Fixes: b6e833567e ("btrfs: make hole and data seeking a lot more efficient")
CC: stable@vger.kernel.org # 6.1
Tested-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that on a RAID0 NVMe btrfs system, under heavy
write load the filesystem can flip RO randomly.
With extra debugging, it shows some tree blocks failed to pass their
level checks, and if that happens at critical path of a transaction, we
abort the transaction:
BTRFS error (device nvme0n1p3): level verify failed on logical 5446121209856 mirror 1 wanted 0 found 1
BTRFS error (device nvme0n1p3: state A): Transaction aborted (error -5)
BTRFS: error (device nvme0n1p3: state A) in btrfs_finish_ordered_io:3343: errno=-5 IO failure
BTRFS info (device nvme0n1p3: state EA): forced readonly
[CAUSE]
The reporter has already bisected to commit 947a629988 ("btrfs: move
tree block parentness check into validate_extent_buffer()").
And with extra debugging, it shows we can have btrfs_tree_parent_check
filled with all zeros in the following call trace:
submit_one_bio+0xd4/0xe0
submit_extent_page+0x142/0x550
read_extent_buffer_pages+0x584/0x9c0
? __pfx_end_bio_extent_readpage+0x10/0x10
? folio_unlock+0x1d/0x50
btrfs_read_extent_buffer+0x98/0x150
read_tree_block+0x43/0xa0
read_block_for_search+0x266/0x370
btrfs_search_slot+0x351/0xd30
? lock_is_held_type+0xe8/0x140
btrfs_lookup_csum+0x63/0x150
btrfs_csum_file_blocks+0x197/0x6c0
? sched_clock_cpu+0x9f/0xc0
? lock_release+0x14b/0x440
? _raw_read_unlock+0x29/0x50
btrfs_finish_ordered_io+0x441/0x860
btrfs_work_helper+0xfe/0x400
? lock_is_held_type+0xe8/0x140
process_one_work+0x294/0x5b0
worker_thread+0x4f/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xf5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
Currently we only copy the btrfs_tree_parent_check structure into bbio
at read_extent_buffer_pages() after we have assembled the bbio.
But as shown above, submit_extent_page() itself can already submit the
bbio, leaving the bbio->parent_check uninitialized, and cause the false
alert.
[FIX]
Instead of copying @check into bbio after bbio is assembled, we pass
@check in btrfs_bio_ctrl::parent_check, and copy the content of
parent_check in submit_one_bio() for metadata read.
By this we should be able to pass the needed info for metadata endio
verification, and fix the false alert.
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com/
Fixes: 947a629988 ("btrfs: move tree block parentness check into validate_extent_buffer()")
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The em->len value is supposed to be verified in the assertion condition
though we expect it to be same as the sectorsize.
Fixes: a196a8944f ("btrfs: do not reset extent map members for inline extents read")
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Tanmay Bhushan <007047221b@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Starting with glibc 2.35 there are extra inet_pton() calls when doing a
IPv6 ping as in one of the 'perf test' entry, which makes it fail:
# perf test inet_pton
89: probe libc's inet_pton & backtrace it with ping : FAILED!
#
If we look at what this script is expecting (commenting out the removal
of the temporary files in it):
# cat /tmp/expected.aT6
ping[][0-9 \.:]+probe_libc:inet_pton: \([[:xdigit:]]+\)
.*inet_pton\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib64/libc.so.6|inlined\)$
getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib64/libc.so.6\)$
.*(\+0x[[:xdigit:]]+|\[unknown\])[[:space:]]\(.*/bin/ping.*\)$
#
And looking at what we are getting out of 'perf script', to match with
the above:
# cat /tmp/perf.script.IUC
ping 623883 [006] 265438.471610: probe_libc:inet_pton: (7f32bcf314c0)
1314c0 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6)
29510 __libc_start_call_main+0x80 (/usr/lib64/libc.so.6)
ping 623883 [006] 265438.471664: probe_libc:inet_pton: (7f32bcf314c0)
1314c0 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6)
fa6c6 getaddrinfo+0x126 (/usr/lib64/libc.so.6)
491e [unknown] (/usr/bin/ping)
#
We see that its just the first call to inet_pton() that didn't came thru
getaddrinfo(), so if we ignore the first the script matches what it
expects, testing that using 'perf probe' + 'perf record' + 'perf script'
with callchains on userspace targets is producing the expected results.
Since we don't have a 'perf script --skip' to help us here, use tac +
grep to do that, resulting in a one liner that makes this script work on
both older glibc versions as well as with 2.35.
With it, on fedora 36, x86, glibc 2.35:
# perf test inet_pton
90: probe libc's inet_pton & backtrace it with ping : Ok
# perf test -v inet_pton
90: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 627197
ping 627220 1 267956.962402: probe_libc:inet_pton_1: (7f488bf314c0)
1314c0 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6)
fa6c6 getaddrinfo+0x126 (/usr/lib64/libc.so.6)
491e n (/usr/bin/ping)
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
#
And on Ubuntu 22.04.1 LTS on a Libre Computer ROC-RK3399-PC arm64 system:
Before this patch it works (see that the script used has no 'tac' to
remove the first event):
root@roc-rk3399-pc:~# dpkg -l | grep libc-bin
ii libc-bin 2.35-0ubuntu3.1 arm64 GNU C Library: Binaries
root@roc-rk3399-pc:~# grep -w tac ~acme/libexec/perf-core/tests/shell/record+probe_libc_inet_pton.sh
root@roc-rk3399-pc:~# perf test inet_pton
86: probe libc's inet_pton & backtrace it with ping : Ok
root@roc-rk3399-pc:~# perf test -v inet_pton
86: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 1375
ping 1399 [000] 4114.417450: probe_libc:inet_pton: (ffffb3e26120)
106120 inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc.so.6)
d18bc getaddrinfo+0xec (/usr/lib/aarch64-linux-gnu/libc.so.6)
2b68 [unknown] (/usr/bin/ping)
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
root@roc-rk3399-pc:~#
And after it continues to work:
root@roc-rk3399-pc:~# grep -w tac ~acme/libexec/perf-core/tests/shell/record+probe_libc_inet_pton.sh
perf script -i $perf_data | tac | grep -m1 ^ping -B9 | tac > $perf_script
root@roc-rk3399-pc:~# perf test inet_pton
86: probe libc's inet_pton & backtrace it with ping : Ok
root@roc-rk3399-pc:~# perf test -v inet_pton
86: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 6995
ping 7019 [005] 4832.160741: probe_libc:inet_pton: (ffffa62e6120)
106120 inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc.so.6)
d18bc getaddrinfo+0xec (/usr/lib/aarch64-linux-gnu/libc.so.6)
2b68 [unknown] (/usr/bin/ping)
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
root@roc-rk3399-pc:~#
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/Y7QyPkPlDYip3cZH@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The drm_sched_entity_kill() is invoked twice by drm_sched_entity_destroy()
while userspace process is exiting or being killed. First time it's invoked
when sched entity is flushed and second time when entity is released. This
causes a lockup within wait_for_completion(entity_idle) due to how completion
API works.
Calling wait_for_completion() more times than complete() was invoked is a
error condition that causes lockup because completion internally uses
counter for complete/wait calls. The complete_all() must be used instead
in such cases.
This patch fixes lockup of Panfrost driver that is reproducible by killing
any application in a middle of 3d drawing operation.
Fixes: 2fdb8a8f07 ("drm/scheduler: rework entity flush, kill and fini")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://patchwork.freedesktop.org/patch/msgid/20221123001303.533968-1-dmitry.osipenko@collabora.com
Variables off and len typed as uint32 in rndis_query function
are controlled by incoming RNDIS response message thus their
value may be manipulated. Setting off to a unexpectetly large
value will cause the sum with len and 8 to overflow and pass
the implemented validation step. Consequently the response
pointer will be referring to a location past the expected
buffer boundaries allowing information leakage e.g. via
RNDIS_OID_802_3_PERMANENT_ADDRESS OID.
Fixes: ddda086240 ("USB: rndis_host, various cleanups")
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to fail if the PCS is not available, not if it is available. Fix
this condition.
Fixes: 5d93cfcf73 ("net: dpaa: Convert to phylink")
Reported-by: Christian Zigotzky <info@xenosoft.de>
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code uses per_cpu pointer to get the lmtst_id mapped to
the core on which aura_free() is executed. Using per_cpu pointer
without preemption disable causing mismatch between lmtst_id and
core on which pointer gets freed. This patch fixes the issue by
disabling preemption around aura_free.
Fixes: ef6c8da71e ("octeontx2-pf: cn10K: Reserve LMTST lines per core")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we would dereference a NULL aggregator pointer when calling
__set_agg_ports_ready on the line below.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Use signed integer in ipv6_skip_exthdr() called from nf_confirm().
Reported by static analysis tooling, patch from Florian Westphal.
2) Missing set type checks in nf_tables: Validate that set declaration
matches the an existing set type, otherwise bail out with EEXIST.
Currently, nf_tables silently accepts the re-declaration with a
different type but it bails out later with EINVAL when the user adds
entries to the set. This fix is relatively large because it requires
two preparation patches that are included in this batch.
3) Do not ignore updates of timeout and gc_interval parameters in
existing sets.
4) Fix a hang when 0/0 subnets is added to a hash:net,port,net type of
ipset. Except hash:net,port,net and hash:net,iface, the set types don't
support 0/0 and the auxiliary functions rely on this fact. So 0/0 needs
a special handling in hash:net,port,net which was missing (hash:net,iface
was not affected by this bug), from Jozsef Kadlecsik.
5) When adding/deleting large number of elements in one step in ipset,
it can take a reasonable amount of time and can result in soft lockup
errors. This patch is a complete rework of the previous version in order
to use a smaller internal batch limit and at the same time removing
the external hard limit to add arbitrary number of elements in one step.
Also from Jozsef Kadlecsik.
Except for patch #1, which fixes a bug introduced in the previous net-next
development cycle, anything else has been broken for several releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If we cancel the task_work, the worker will never come into existance.
As this is the last reference to it, ensure that we get it freed
appropriately.
Cc: stable@vger.kernel.org
Reported-by: 진호 <wnwlsgh98@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull btrfs fixes from David Sterba:
"First batch of regression and regular fixes:
- regressions:
- fix error handling after conversion to qstr for paths
- fix raid56/scrub recovery caused by uninitialized variable
after conversion to error bitmaps
- restore qgroup backref lookup behaviour after recent
refactoring
- fix leak of device lists at module exit time
- fix resolving backrefs for inline extent followed by prealloc
- reset defrag ioctl buffer on memory allocation error"
* tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix fscrypt name leak after failure to join log transaction
btrfs: scrub: fix uninitialized return value in recover_scrub_rbio
btrfs: fix resolving backrefs for inline extent followed by prealloc
btrfs: fix trace event name typo for FLUSH_DELAYED_REFS
btrfs: restore BTRFS_SEQ_LAST when looking up qgroup backref lookup
btrfs: fix leak of fs devices after removing btrfs module
btrfs: fix an error handling path in btrfs_defrag_leaves()
btrfs: fix an error handling path in btrfs_rename()
After
b3e34a47f9 ("x86/kexec: fix memory leak of elf header buffer"),
freeing image->elf_headers in the error path of crash_load_segments()
is not needed because kimage_file_post_load_cleanup() will take
care of that later. And not clearing it could result in a double-free.
Drop the superfluous vfree() call at the error path of
crash_load_segments().
Fixes: b3e34a47f9 ("x86/kexec: fix memory leak of elf header buffer")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221122115122.13937-1-tiwai@suse.de
When we have a perf.data file with tracepoints, such as:
# perf evlist -f
probe_perf:lzma_decompress_to_file
# Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
#
We end up segfaulting when using perf built with NO_LIBTRACEEVENT=1 by
trying to find an evsel with a NULL 'event_name' variable:
(gdb) run report --stdio -f
Starting program: /root/bin/perf report --stdio -f
Program received signal SIGSEGV, Segmentation fault.
0x000000000055219d in find_evsel (evlist=0xfda7b0, event_name=0x0) at util/sort.c:2830
warning: Source file is more recent than executable.
2830 if (event_name[0] == '%') {
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-11.fc36.x86_64 cyrus-sasl-lib-2.1.27-18.fc36.x86_64 elfutils-debuginfod-client-0.188-3.fc36.x86_64 elfutils-libelf-0.188-3.fc36.x86_64 elfutils-libs-0.188-3.fc36.x86_64 glibc-2.35-20.fc36.x86_64 keyutils-libs-1.6.1-4.fc36.x86_64 krb5-libs-1.19.2-12.fc36.x86_64 libbrotli-1.0.9-7.fc36.x86_64 libcap-2.48-4.fc36.x86_64 libcom_err-1.46.5-2.fc36.x86_64 libcurl-7.82.0-12.fc36.x86_64 libevent-2.1.12-6.fc36.x86_64 libgcc-12.2.1-4.fc36.x86_64 libidn2-2.3.4-1.fc36.x86_64 libnghttp2-1.51.0-1.fc36.x86_64 libpsl-0.21.1-5.fc36.x86_64 libselinux-3.3-4.fc36.x86_64 libssh-0.9.6-4.fc36.x86_64 libstdc++-12.2.1-4.fc36.x86_64 libunistring-1.0-1.fc36.x86_64 libunwind-1.6.2-2.fc36.x86_64 libxcrypt-4.4.33-4.fc36.x86_64 libzstd-1.5.2-2.fc36.x86_64 numactl-libs-2.0.14-5.fc36.x86_64 opencsd-1.2.0-1.fc36.x86_64 openldap-2.6.3-1.fc36.x86_64 openssl-libs-3.0.5-2.fc36.x86_64 slang-2.3.2-11.fc36.x86_64 xz-libs-5.2.5-9.fc36.x86_64 zlib-1.2.11-33.fc36.x86_64
(gdb) bt
#0 0x000000000055219d in find_evsel (evlist=0xfda7b0, event_name=0x0) at util/sort.c:2830
#1 0x0000000000552416 in add_dynamic_entry (evlist=0xfda7b0, tok=0xffb6eb "trace", level=2) at util/sort.c:2976
#2 0x0000000000552d26 in sort_dimension__add (list=0xf93e00 <perf_hpp_list>, tok=0xffb6eb "trace", evlist=0xfda7b0, level=2) at util/sort.c:3193
#3 0x0000000000552e1c in setup_sort_list (list=0xf93e00 <perf_hpp_list>, str=0xffb6eb "trace", evlist=0xfda7b0) at util/sort.c:3227
#4 0x00000000005532fa in __setup_sorting (evlist=0xfda7b0) at util/sort.c:3381
#5 0x0000000000553cdc in setup_sorting (evlist=0xfda7b0) at util/sort.c:3608
#6 0x000000000042eb9f in cmd_report (argc=0, argv=0x7fffffffe470) at builtin-report.c:1596
#7 0x00000000004aee7e in run_builtin (p=0xf64ca0 <commands+288>, argc=3, argv=0x7fffffffe470) at perf.c:330
#8 0x00000000004af0f2 in handle_internal_command (argc=3, argv=0x7fffffffe470) at perf.c:384
#9 0x00000000004af241 in run_argv (argcp=0x7fffffffe29c, argv=0x7fffffffe290) at perf.c:428
#10 0x00000000004af5fc in main (argc=3, argv=0x7fffffffe470) at perf.c:562
(gdb)
So check if we have tracepoint events in add_dynamic_entry() and bail
out instead:
# perf report --stdio -f
This perf binary isn't linked with libtraceevent, can't process probe_perf:lzma_decompress_to_file
Error:
Unknown --sort key: `trace'
#
Fixes: 378ef0f5d9 ("perf build: Use libtraceevent from the system")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/Y7MDb7kRaHZB6APC@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If v4 READDIR operation hits a mountpoint and gets back an error,
then it will include that entry in the reply and set RDATTR_ERROR for it
to the error.
That's fine for "normal" exported filesystems, but on the v4root, we
need to be more careful to only expose the existence of dentries that
lead to exports.
If the mountd upcall times out while checking to see whether a
mountpoint on the v4root is exported, then we have no recourse other
than to fail the whole operation.
Cc: Steve Dickson <steved@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
Reported-by: JianHong Yin <yin-jianhong@163.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Commands such as kmem, kwork, lock, sched, trace and timechart depend on
libtraceevent, these commands need to be isolated using HAVE_LIBTRACEEVENT
macro when cmdlist generation.
The output of the generate-cmdlist.sh script is as follows:
# ./util/generate-cmdlist.sh
/* Automatically generated by ./util/generate-cmdlist.sh */
struct cmdname_help
{
char name[16];
char help[80];
};
static struct cmdname_help common_cmds[] = {
{"annotate", "Read perf.data (created by perf record) and display annotated code"},
{"archive", "Create archive with object files with build-ids found in perf.data file"},
{"bench", "General framework for benchmark suites"},
{"buildid-cache", "Manage build-id cache."},
{"buildid-list", "List the buildids in a perf.data file"},
{"c2c", "Shared Data C2C/HITM Analyzer."},
{"config", "Get and set variables in a configuration file."},
{"daemon", "Run record sessions on background"},
{"data", "Data file related processing"},
{"diff", "Read perf.data files and display the differential profile"},
{"evlist", "List the event names in a perf.data file"},
{"ftrace", "simple wrapper for kernel's ftrace functionality"},
{"inject", "Filter to augment the events stream with additional information"},
{"iostat", "Show I/O performance metrics"},
{"kallsyms", "Searches running kernel for symbols"},
{"kvm", "Tool to trace/measure kvm guest os"},
{"list", "List all symbolic event types"},
{"mem", "Profile memory accesses"},
{"record", "Run a command and record its profile into perf.data"},
{"report", "Read perf.data (created by perf record) and display the profile"},
{"script", "Read perf.data (created by perf record) and display trace output"},
{"stat", "Run a command and gather performance counter statistics"},
{"test", "Runs sanity tests."},
{"top", "System profiling tool."},
{"version", "display the version of perf binary"},
#ifdef HAVE_LIBELF_SUPPORT
{"probe", "Define new dynamic tracepoints"},
#endif /* HAVE_LIBELF_SUPPORT */
#if defined(HAVE_LIBTRACEEVENT) && (defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT))
{"trace", "strace inspired tool"},
#endif /* HAVE_LIBTRACEEVENT && (HAVE_LIBAUDIT_SUPPORT || HAVE_SYSCALL_TABLE_SUPPORT) */
#ifdef HAVE_LIBTRACEEVENT
{"kmem", "Tool to trace/measure kernel memory properties"},
{"kwork", "Tool to trace/measure kernel work properties (latencies)"},
{"lock", "Analyze lock events"},
{"sched", "Tool to trace/measure scheduler properties (latencies)"},
{"timechart", "Tool to visualize total system behavior during a workload"},
#endif /* HAVE_LIBTRACEEVENT */
};
Fixes: 378ef0f5d9 ("perf build: Use libtraceevent from the system")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221226085703.95081-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 62d89a7d49 ("video: fbdev: matroxfb: set maxvram of vbG200eW to
the same as vbG200 to avoid black screen") accidently decreases the
maximum memory size for the Matrox G200eW (102b:0532) from 8 MB to 1 MB
by missing one zero. This caused the driver initialization to fail with
the messages below, as the minimum required VRAM size is 2 MB:
[ 9.436420] matroxfb: Matrox MGA-G200eW (PCI) detected
[ 9.444502] matroxfb: cannot determine memory size
[ 9.449316] matroxfb: probe of 0000:0a:03.0 failed with error -1
So, add the missing 0 to make it the intended 16 MB. Successfully tested on
the Dell PowerEdge R910/0KYD3D, BIOS 2.10.0 08/29/2013, that the warning is
gone.
While at it, add a leading 0 to the maxdisplayable entry, so it’s aligned
properly. The value could probably also be increased from 8 MB to 16 MB, as
the G200 uses the same values, but I have not checked any datasheet.
Note, matroxfb is obsolete and superseded by the maintained DRM driver
mga200, which is used by default on most systems where both drivers are
available. Therefore, on most systems it was only a cosmetic issue.
Fixes: 62d89a7d49 ("video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen")
Link: https://lore.kernel.org/linux-fbdev/972999d3-b75d-5680-fcef-6e6905c52ac5@suse.de/T/#mb6953a9995ebd18acc8552f99d6db39787aec775
Cc: it+linux-fbdev@molgen.mpg.de
Cc: Z. Liu <liuzx@knownsec.com>
Cc: Rich Felker <dalias@libc.org>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The drm_sched_entity_kill() is invoked twice by drm_sched_entity_destroy()
while userspace process is exiting or being killed. First time it's invoked
when sched entity is flushed and second time when entity is released. This
causes a lockup within wait_for_completion(entity_idle) due to how completion
API works.
Calling wait_for_completion() more times than complete() was invoked is a
error condition that causes lockup because completion internally uses
counter for complete/wait calls. The complete_all() must be used instead
in such cases.
This patch fixes lockup of Panfrost driver that is reproducible by killing
any application in a middle of 3d drawing operation.
Fixes: 2fdb8a8f07 ("drm/scheduler: rework entity flush, kill and fini")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://patchwork.freedesktop.org/patch/msgid/20221123001303.533968-1-dmitry.osipenko@collabora.com
When adding/deleting large number of elements in one step in ipset, it can
take a reasonable amount of time and can result in soft lockup errors. The
patch 5f7b51bf09 ("netfilter: ipset: Limit the maximal range of
consecutive elements to add/delete") tried to fix it by limiting the max
elements to process at all. However it was not enough, it is still possible
that we get hung tasks. Lowering the limit is not reasonable, so the
approach in this patch is as follows: rely on the method used at resizing
sets and save the state when we reach a smaller internal batch limit,
unlock/lock and proceed from the saved state. Thus we can avoid long
continuous tasks and at the same time removed the limit to add/delete large
number of elements in one step.
The nfnl mutex is held during the whole operation which prevents one to
issue other ipset commands in parallel.
Fixes: 5f7b51bf09 ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The hash:net,port,net set type supports /0 subnets. However, the patch
commit 5f7b51bf09 titled "netfilter: ipset: Limit the maximal range
of consecutive elements to add/delete" did not take into account it and
resulted in an endless loop. The bug is actually older but the patch
5f7b51bf09 brings it out earlier.
Handle /0 subnets properly in hash:net,port,net set types.
Fixes: 5f7b51bf09 ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There is an issue with the checking of the return value of
'of_get_mac_address', which returns 0 on success and negative value on
failure. The driver interpretated the result the opposite way. Therefore
if there was a MAC address defined in the DT, then the driver was
generating a random MAC address otherwise it would use address 0.
Fix this by checking correctly the return value of 'of_get_mac_address'
Fixes: b74ef9f9cb ("net: sparx5: Do not use mac_addr uninitialized in mchp_sparx5_probe()")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim says:
====================
net: dont intepret cls results when asked to drop
It is possible that an error in processing may occur in tcf_classify() which
will result in res.classid being some garbage value. Example of such a code path
is when the classifier goes into a loop due to bad policy. See patch 1/2
for a sample splat.
While the core code reacts correctly and asks the caller to drop the packet
(by returning TC_ACT_SHOT) some callers first intepret the res.class as
a pointer to memory and end up dropping the packet only after some activity with
the pointer. There is likelihood of this resulting in an exploit. So lets fix
all the known qdiscs that behave this way.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume
res.class contains a valid pointer
Fixes: b0188d4dbe ("[NET_SCHED]: sch_atm: Lindent")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
of_irq_find_parent() returns a node pointer with refcount incremented,
We should use of_node_put() on it when not needed anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 96868dce64 ("gpio/sifive: Add GPIO driver for SiFive SoCs")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
When ceph releasing the file_lock it will try to get the inode pointer
from the fl->fl_file, which the memory could already be released by
another thread in filp_close(). Because in VFS layer the fl->fl_file
doesn't increase the file's reference counter.
Will switch to use ceph dedicate lock info to track the inode.
And in ceph_fl_release_lock() we should skip all the operations if the
fl->fl_u.ceph.inode is not set, which should come from the request
file_lock. And we will set fl->fl_u.ceph.inode when inserting it to the
inode lock list, which is when copying the lock.
Link: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
For the POSIX locks they are using the same owner, which is the
thread id. And multiple POSIX locks could be merged into single one,
so when checking whether the 'file' has locks may fail.
For a file where some openers use locking and others don't is a
really odd usage pattern though. Locks are like stoplights -- they
only work if everyone pays attention to them.
Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.
Fixes: ff5d913dfc ("ceph: return -EIO if read/write against filp that lost file locks")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Having a bigger number of FIFO lines held after vsync is only useful to
SoCs using AFBC to give time to the AFBC decoder to be reset, configured
and enabled again.
For SoCs not using AFBC this, on the contrary, is causing on some
displays issues and a few pixels vertical offset in the displayed image.
Conditionally increase the number of lines held after vsync only for
SoCs using AFBC, leaving the default value for all the others.
Fixes: 24e0d4058e ("drm/meson: hold 32 lines after vsync to give time for AFBC start")
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
[narmstrong: added fixes tag]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221216-afbc_s905x-v1-0-033bebf780d9@baylibre.com
If the user's login time is newer than the cache's timestamp,
we expect the cache may be stale and need to clear.
The stale cache will remain in the list's tail if no other
users operate on that inode.
Once the user accesses the inode, the stale cache will be
returned in rcu path.
Signed-off-by: Chengen Du <chengen.du@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
As stated in marvell-orion-mdio.txt deleted in commit 0781434af8
("dt-bindings: net: orion-mdio: Convert to JSON schema") if
'interrupts' property is present, width of 'reg' should be 0x84.
Otherwise, width of 'reg' should be 0x4. Fix 'examples:' and add
constraints checking whether 'interrupts' property is present
and validate it against fixed values in reg.
Signed-off-by: Michał Grzelak <mig@semihalf.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This property has always been supported by the Linux driver; see
commit 9f93ac8d40 ("net-next: stmmac: Add dwmac-sun8i"). In fact, the
original driver submission includes the phy-supply code but no mention
of it in the binding, so the omission appears to be accidental. In
addition, the property is documented in the binding for the previous
hardware generation, allwinner,sun7i-a20-gmac.
Document phy-supply in the binding to fix devicetree validation for the
25+ boards that already use this property.
Fixes: 0441bde003 ("dt-bindings: net-next: Add DT bindings documentation for Allwinner dwmac-sun8i")
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is now possible for a system to have more than 32 endpoints. As
a result, registers related to endpoint suspend are parameterized,
with 32 endpoints represented in one more registers.
In ipa_interrupt_suspend_control(), the IPA_SUSPEND_EN register
offset is determined properly, but the bit mask used still assumes
the number of enpoints won't exceed 32. This is a bug. Fix it.
Fixes: f298ba785e ("net: ipa: add a parameter to suspend registers")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Po-Hsu Lin says:
====================
selftests: net: fix for arp_ndisc_evict_nocarrier test
This patchset will fix a false-positive issue caused by the command in
cleanup_v6() of the arp_ndisc_evict_nocarrier test.
Also, it will make the test to return a non-zero value for any failure
reported in the test for us to avoid false-negative results.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Return non-zero return value if there is any failure reported in this
script during the test. Otherwise it can only reflect the status of
the last command.
Fixes: f86ca07eb5 ("selftests: net: add arp_ndisc_evict_nocarrier")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cleanup_v6() will cause the arp_ndisc_evict_nocarrier script exit
with 255 (No such file or directory), even the tests are good:
# selftests: net: arp_ndisc_evict_nocarrier.sh
# run arp_evict_nocarrier=1 test
# RTNETLINK answers: File exists
# ok
# run arp_evict_nocarrier=0 test
# RTNETLINK answers: File exists
# ok
# run all.arp_evict_nocarrier=0 test
# RTNETLINK answers: File exists
# ok
# run ndisc_evict_nocarrier=1 test
# ok
# run ndisc_evict_nocarrier=0 test
# ok
# run all.ndisc_evict_nocarrier=0 test
# ok
not ok 1 selftests: net: arp_ndisc_evict_nocarrier.sh # exit=255
This is because it's trying to modify the parameter for ipv4 instead.
Also, tests for ipv6 (run_ndisc_evict_nocarrier_enabled() and
run_ndisc_evict_nocarrier_disabled() are working on veth1, reflect
this fact in cleanup_v6().
Fixes: f86ca07eb5 ("selftests: net: add arp_ndisc_evict_nocarrier")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that phylink no longer calls phy_get_rate_matching with
PHY_INTERFACE_MODE_NA, phys no longer need to support it. Remove the
documentation mandating support.
Fixes: 7642cc28fd ("net: phylink: fix PHY validation with rate adaption")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi says:
====================
net: dsa: qca8k: multiple fix on mdio read/write
Due to some problems in reading the Documentation and elaborating it
some wrong assumption were done. The error was reported and notice only
now due to how things are setup in the code flow.
First 2 patch fix mgmt eth where the lenght calculation is very
confusing and in step of word size. (the related commit description have
an extensive description about how this mess works)
Last 3 patch revert the broken mdio cache and apply a correct version
that should still save some extra mdio in phy poll secnario.
These 5 patch fix each related problem and apply what the Documentation
actually say.
Changes v2:
- Add cover letter
- Fix typo in revert patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve mdio master read/write by using singe mii read/write lo/hi.
In a read and write we need to poll the mdio master regs in a busy loop
to check for a specific bit present in the upper half of the reg. We can
ignore the other half since it won't contain useful data. This will save
an additional useless read for each read and write operation.
In a read operation the returned data is present in the mdio master reg
lower half. We can ignore the other half since it won't contain useful
data. This will save an additional useless read for each read operation.
In a read operation it's needed to just set the hi half of the mdio
master reg as the lo half will be replaced by the result. This will save
an additional useless write for each read operation.
Tested-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It may be useful to read/write just the lo or hi half of a reg.
This is especially useful for phy poll with the use of mdio master.
The mdio master reg is composed by the first 16 bit related to setup and
the other half with the returned data or data to write.
Refactor the mii function to permit single mii read/write of lo or hi
half of the reg.
Tested-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2481d206fa.
The Documentation is very confusing about the topic.
The cache logic for hi and lo is wrong and actually miss some regs to be
actually written.
What the Documentation actually intended was that it's possible to skip
writing hi OR lo if half of the reg is not needed to be written or read.
Revert the change in favor of a better and correct implementation.
Reported-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org # v5.18+
Signed-off-by: David S. Miller <davem@davemloft.net>
It was discovered that MGMT_DATA2 can contain up to 28 bytes of data
instead of the 12 bytes written in the Documentation by accounting the
limit of 16 bytes declared in Documentation subtracting the first 4 byte
in the packet header.
Update the define with the real world value.
Tested-by: Ronald Wahl <ronald.wahl@raritan.com>
Fixes: c2ee8181fd ("net: dsa: tag_qca: add define for handling mgmt Ethernet packet")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org # v5.18+
Signed-off-by: David S. Miller <davem@davemloft.net>
The assumption that Documentation was right about how this value work was
wrong. It was discovered that the length value of the mgmt header is in
step of word size.
As an example to process 4 byte of data the correct length to set is 2.
To process 8 byte 4, 12 byte 6, 16 byte 8...
Odd values will always return the next size on the ack packet.
(length of 3 (6 byte) will always return 8 bytes of data)
This means that a value of 15 (0xf) actually means reading/writing 32 bytes
of data instead of 16 bytes. This behaviour is totally absent and not
documented in the switch Documentation.
In fact from Documentation the max value that mgmt eth can process is
16 byte of data while in reality it can process 32 bytes at once.
To handle this we always round up the length after deviding it for word
size. We check if the result is odd and we round another time to align
to what the switch will provide in the ack packet.
The workaround for the length limit of 15 is still needed as the length
reg max value is 0xf(15)
Reported-by: Ronald Wahl <ronald.wahl@raritan.com>
Tested-by: Ronald Wahl <ronald.wahl@raritan.com>
Fixes: 90386223f4 ("net: dsa: qca8k: add support for larger read/write size with mgmt Ethernet")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org # v5.18+
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when mlx5_ib_get_hw_stats() is used for device (port_num = 0),
there is a special handling in order to use the correct counters, but,
port_num is being passed down the stack without any change. Also, some
functions assume that port_num >=1. As a result, the following oops can
occur.
BUG: unable to handle page fault for address: ffff89510294f1a8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP
CPU: 8 PID: 1382 Comm: devlink Tainted: G W 6.1.0-rc4_for_upstream_base_2022_11_10_16_12 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:_raw_spin_lock+0xc/0x20
Call Trace:
<TASK>
mlx5_ib_get_native_port_mdev+0x73/0xe0 [mlx5_ib]
do_get_hw_stats.constprop.0+0x109/0x160 [mlx5_ib]
mlx5_ib_get_hw_stats+0xad/0x180 [mlx5_ib]
ib_setup_device_attrs+0xf0/0x290 [ib_core]
ib_register_device+0x3bb/0x510 [ib_core]
? atomic_notifier_chain_register+0x67/0x80
__mlx5_ib_add+0x2b/0x80 [mlx5_ib]
mlx5r_probe+0xb8/0x150 [mlx5_ib]
? auxiliary_match_id+0x6a/0x90
auxiliary_bus_probe+0x3c/0x70
? driver_sysfs_add+0x6b/0x90
really_probe+0xcd/0x380
__driver_probe_device+0x80/0x170
driver_probe_device+0x1e/0x90
__device_attach_driver+0x7d/0x100
? driver_allows_async_probing+0x60/0x60
? driver_allows_async_probing+0x60/0x60
bus_for_each_drv+0x7b/0xc0
__device_attach+0xbc/0x200
bus_probe_device+0x87/0xa0
device_add+0x404/0x940
? dev_set_name+0x53/0x70
__auxiliary_device_add+0x43/0x60
add_adev+0x99/0xe0 [mlx5_core]
mlx5_attach_device+0xc8/0x120 [mlx5_core]
mlx5_load_one_devl_locked+0xb2/0xe0 [mlx5_core]
devlink_reload+0x133/0x250
devlink_nl_cmd_reload+0x480/0x570
? devlink_nl_pre_doit+0x44/0x2b0
genl_family_rcv_msg_doit.isra.0+0xc2/0x110
genl_rcv_msg+0x180/0x2b0
? devlink_nl_cmd_region_read_dumpit+0x540/0x540
? devlink_reload+0x250/0x250
? devlink_put+0x50/0x50
? genl_family_rcv_msg_doit.isra.0+0x110/0x110
netlink_rcv_skb+0x54/0x100
genl_rcv+0x24/0x40
netlink_unicast+0x1f6/0x2c0
netlink_sendmsg+0x237/0x490
sock_sendmsg+0x33/0x40
__sys_sendto+0x103/0x160
? handle_mm_fault+0x10e/0x290
? do_user_addr_fault+0x1c0/0x5f0
__x64_sys_sendto+0x25/0x30
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fix it by setting port_num to 1 in order to get device status and remove
unused variable.
Fixes: aac4492ef2 ("IB/mlx5: Update counter implementation for dual port RoCE")
Link: https://lore.kernel.org/r/98b82994c3cd3fa593b8a75ed3f3901e208beb0f.1672231736.git.leonro@nvidia.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
There is a variable pinctrl declared without initializer. And then
has the case (switch operation chose the default case) to directly
use this uninitialized value, this is not a safe behavior. So here
initialize the pinctrl as 0 to avoid this issue.
This is reported by Coverity.
Fixes: 13c5d4ce80 ("gpio: pca953x: Add support for PCAL6534")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Commit 8eb1f71e7a ("gpiolib: consolidate GPIO lookups") refactors
fwnode_get_named_gpiod() and gpiod_get_index() into a unified
gpiod_find_and_request() helper.
The old functions both initialized their local lookupflags variable to
GPIO_LOOKUP_FLAGS_DEFAULT, but the new code leaves it uninitialized.
This is a problem for at least ACPI platforms, where acpi_find_gpio()
only does a bunch of *lookupflags |= GPIO_* statements and thus relies
on the variable being initialized.
The variable not being initialized leads to:
1. Potentially the wrong flags getting used
2. The check for conflicting lookup flags in gpiod_configure_flags():
"multiple pull-up, pull-down or pull-disable enabled, invalid config"
sometimes triggering, making the GPIO unavailable
Restore the initialization of lookupflags to GPIO_LOOKUP_FLAGS_DEFAULT
to fix this.
Fixes: 8eb1f71e7a ("gpiolib: consolidate GPIO lookups")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
The SM4 CCM/GCM assembly functions for encryption and decryption is
called via indirect function calls. Therefore they need to use
SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause its type hash
to be emitted when the kernel is built with CONFIG_CFI_CLANG=y.
Otherwise, the code crashes with a CFI failure (if the compiler didn't
happen to optimize out the indirect call).
Fixes: 67fa3a7fdf ("crypto: arm64/sm4 - add CE implementation for CCM mode")
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
IO memory access has to be done with accessors defined in caam/regs.h
as there are little-endian architectures with a big-endian CAAM unit.
Fixes: 6a83830f64 ("crypto: caam - warn if blob_gen key is insecure")
Signed-off-by: Nikolaus Voss <nikolaus.voss@haag-streit.com>
Reviewed-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
of_phy_find_device() return device node with refcount incremented.
Call put_device() to relese it when not needed anymore.
Fixes: ab4e6ee578 ("net: phy: xgmiitorgmii: Check phy_driver ready before accessing")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device supports a PCIe optimization hint, which indicates on
which NUMA the queue is currently processed. This hint is utilized
by PCIe in order to reduce its access time by accessing the
correct NUMA resources and maintaining cache coherence.
The driver calls the register update for the hint (called TPH -
TLP Processing Hint) during the NAPI loop.
Though the update is expected upon a NUMA change (when a queue
is moved from one NUMA to the other), the current logic performs
a register update when the queue is moved to a different CPU,
but the CPU is not necessarily in a different NUMA.
The changes include:
1. Performing the TPH update only when the queue has switched
a NUMA node.
2. Moving the TPH update call to be triggered only when NAPI was
scheduled from interrupt context, as opposed to a busy-polling loop.
This is due to the fact that during busy-polling, the frequency
of CPU switches for a particular queue is significantly higher,
thus, the likelihood to switch NUMA is much higher. Therefore,
providing the frequent updates to the device upon a NUMA update
are unlikely to be beneficial.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RX ring can be NULL in XDP use cases where only TX queues
are configured. In this scenario, the RX interrupt moderation
value sent to the device remains in its default value of 0.
In this change, setting the default value of the RX interrupt
moderation to be the same as of the TX.
Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the upper bound on rx_copybreak tighter, by
making sure it is smaller than the minimum of mtu and
ENA_PAGE_SIZE. With the current upper bound of mtu,
rx_copybreak can be larger than a page. Such large
rx_copybreak will not bring any performance benefit to
the user and therefore makes no sense.
In addition, the value update was only reflected in
the adapter structure, but not applied for each ring,
causing it to not take effect.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redirecting packets with XDP Redirect is done in two phases:
1. A packet is passed by the driver to the kernel using
xdp_do_redirect().
2. After finishing polling for new packets the driver lets the kernel
know that it can now process the redirected packet using
xdp_do_flush_map().
The packets' redirection is handled in the napi context of the
queue that called xdp_do_redirect()
To avoid calling xdp_do_flush_map() each time the driver first checks
whether any packets were redirected, using
xdp_flags |= xdp_verdict;
and
if (xdp_flags & XDP_REDIRECT)
xdp_do_flush_map()
essentially treating XDP instructions as a bitmask, which isn't the case:
enum xdp_action {
XDP_ABORTED = 0,
XDP_DROP,
XDP_PASS,
XDP_TX,
XDP_REDIRECT,
};
Given the current possible values of xdp_action, the current design
doesn't have a bug (since XDP_REDIRECT = 100b), but it is still
flawed.
This patch makes the driver use a bitmask instead, to avoid future
issues.
Fixes: a318c70ad1 ("net: ena: introduce XDP redirect implementation")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size of packets that were forwarded or dropped by XDP wasn't added
to the total processed bytes statistic.
Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the queues aren't destroyed when we only exchange XDP programs,
there's no need to re-register them again.
Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On driver initialization, RSS hash initial value is set to zero,
instead of the default value. This happens because we pass NULL as
the RSS key parameter, which caused us to never initialize
the RSS hash value.
This patch fixes it by making sure the initial value is set, no matter
what the value of the RSS key is.
Fixes: 91a65b7d3e ("net: ena: fix potential crash when rxfh key is NULL")
Signed-off-by: Nati Koler <nkoler@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver does not call tasklet_kill in several places.
Add the calls to fix it.
Fixes: 85b85c8534 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
Signed-off-by: Jiguang Xiao <jiguang.xiao@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the PF check the VF alive by the KEEP_ALVE
mailbox from VF. VF keep sending the mailbox per 2
seconds. Once PF lost the mailbox for more than 8
seconds, it will regards the VF is abnormal, and stop
notifying the state change to VF, include link state,
vf mac, reset, even though it receives the KEEP_ALIVE
mailbox again. It's inreasonable.
This patch fixes it. PF will record the state change which
need to notify VF when lost the VF's KEEP_ALIVE mailbox.
And notify VF when receive the mailbox again. Introduce a
new flag HCLGE_VPORT_STATE_INITED, used to distinguish the
case whether VF driver loaded or not. For VF will query
these states when initializing, so it's unnecessary to
notify it in this case.
Fixes: aa5c4f175b ("net: hns3: add reset handling for VF when doing PF reset")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A remove callback just returning 0 is equivalent to no remove callback
at all. So drop the useless function.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
A remove callback just returning 0 is equivalent to no remove callback
at all. So drop the useless function.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima says:
===================
tcp: Fix bhash2 and TIME_WAIT regression.
We forgot to add twsk to bhash2. Therefore TIME_WAIT sockets cannot
prevent bind() to the same local address and port.
Changes:
v1:
* Patch 1:
* Add tw_bind2_node in inet_timewait_sock instead of
moving sk_bind2_node from struct sock to struct
sock_common.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bhash2 split the bind() validation logic into wildcard and non-wildcard
cases. Let's add a test to catch future regression.
Before the previous patch:
# ./bind_timewait
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN bind_timewait.localhost.1 ...
# bind_timewait.c:87:1:Expected ret (0) == -1 (-1)
# 1: Test terminated by assertion
# FAIL bind_timewait.localhost.1
not ok 1 bind_timewait.localhost.1
# RUN bind_timewait.addrany.1 ...
# OK bind_timewait.addrany.1
ok 2 bind_timewait.addrany.1
# FAILED: 1 / 2 tests passed.
# Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0
After:
# ./bind_timewait
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN bind_timewait.localhost.1 ...
# OK bind_timewait.localhost.1
ok 1 bind_timewait.localhost.1
# RUN bind_timewait.addrany.1 ...
# OK bind_timewait.addrany.1
ok 2 bind_timewait.addrany.1
# PASSED: 2 / 2 tests passed.
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby reported regression of bind() with a simple repro. [0]
The repro creates a TIME_WAIT socket and tries to bind() a new socket
with the same local address and port. Before commit 28044fc1d4 ("net:
Add a bhash2 table hashed by port and address"), the bind() failed with
-EADDRINUSE, but now it succeeds.
The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
requests if the address is not a wildcard one.
The straight option is to move sk_bind2_node from struct sock to struct
sock_common to add twsk to bhash2 as implemented as RFC. [1] However, the
binary layout change in the struct sock could affect performances moving
hot fields on different cachelines.
To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check
it while validating bind().
[0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
[1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/
Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since gcc13, each member of an enum has the same type as the enum [1]. And
that is inherited from its members. Provided these two:
SRP_TAG_NO_REQ = ~0U,
SRP_TAG_TSK_MGMT = 1U << 31
all other members are unsigned ints.
Esp. with SRP_MAX_SGE and SRP_TSK_MGMT_SQ_SIZE and their use in min(),
this results in the following warnings:
include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast
drivers/infiniband/ulp/srp/ib_srp.c:563:42: note: in expansion of macro 'min'
include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast
drivers/infiniband/ulp/srp/ib_srp.c:2369:27: note: in expansion of macro 'min'
So move the large values away to a separate enum, so that they don't
affect other members.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113
Link: https://lore.kernel.org/r/20221212120411.13750-1-jirislaby@kernel.org
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kui-Feng Lee says:
====================
This issue is related to task iterators over vma. A system crash can
occur when a task iterator travels through vma of tasks as the death
of a task will clear the pointer to its mm, even though the
task_struct is still held. As a result, an unexpected crash happens
due to a null pointer. To address this problem, a reference to mm is
kept on the iterator to make sure that the pointer is always
valid. This patch set provides a solution for this crash by properly
referencing mm on task iterators over vma.
The major changes from v1 are:
- Fix commit logs of the test case.
- Use reverse Christmas tree coding style.
- Remove unnecessary error handling for time().
v1: https://lore.kernel.org/bpf/20221216015912.991616-1-kuifeng@meta.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When a task iterator traverses vma(s), it is possible task->mm might
become invalid in the middle of traversal and this may cause kernel
misbehave (e.g., crash)
This test case creates iterators repeatedly and forks short-lived
processes in the background to detect this bug. The test will last
for 3 seconds to get the chance to trigger the issue.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221216221855.4122288-3-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix the system crash that happens when a task iterator travel through
vma of tasks.
In task iterators, we used to access mm by following the pointer on
the task_struct; however, the death of a task will clear the pointer,
even though we still hold the task_struct. That can cause an
unexpected crash for a null pointer when an iterator is visiting a
task that dies during the visit. Keeping a reference of mm on the
iterator ensures we always have a valid pointer to mm.
Co-developed-by: Song Liu <song@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Reported-by: Nathan Slingerland <slinger@meta.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221216221855.4122288-2-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 7443b296e6 ("x86/percpu: Move cpu_number next to current_task")
moved global per_cpu variable 'cpu_number' into pcpu_hot structure.
Therefore this part of var_data test is no longer valid.
Disable it until better solution is found.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In the scenario where livepatch and kretfunc coexist, the pageattr of
im->image is rox after arch_prepare_bpf_trampoline in
bpf_trampoline_update, and then modify_fentry or register_fentry returns
-EAGAIN from bpf_tramp_ftrace_ops_func, the BPF_TRAMP_F_ORIG_STACK flag
will be configured, and arch_prepare_bpf_trampoline will be re-executed.
At this time, because the pageattr of im->image is rox,
arch_prepare_bpf_trampoline will read and write im->image, which causes
a fault. as follows:
insmod livepatch-sample.ko # samples/livepatch/livepatch-sample.c
bpftrace -e 'kretfunc:cmdline_proc_show {}'
BUG: unable to handle page fault for address: ffffffffa0206000
PGD 322d067 P4D 322d067 PUD 322e063 PMD 1297e067 PTE d428061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 2 PID: 270 Comm: bpftrace Tainted: G E K 6.1.0 #5
RIP: 0010:arch_prepare_bpf_trampoline+0xed/0x8c0
RSP: 0018:ffffc90001083ad8 EFLAGS: 00010202
RAX: ffffffffa0206000 RBX: 0000000000000020 RCX: 0000000000000000
RDX: ffffffffa0206001 RSI: ffffffffa0206000 RDI: 0000000000000030
RBP: ffffc90001083b70 R08: 0000000000000066 R09: ffff88800f51b400
R10: 000000002e72c6e5 R11: 00000000d0a15080 R12: ffff8880110a68c8
R13: 0000000000000000 R14: ffff88800f51b400 R15: ffffffff814fec10
FS: 00007f87bc0dc780(0000) GS:ffff88803e600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0206000 CR3: 0000000010b70000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
bpf_trampoline_update+0x25a/0x6b0
__bpf_trampoline_link_prog+0x101/0x240
bpf_trampoline_link_prog+0x2d/0x50
bpf_tracing_prog_attach+0x24c/0x530
bpf_raw_tp_link_attach+0x73/0x1d0
__sys_bpf+0x100e/0x2570
__x64_sys_bpf+0x1c/0x30
do_syscall_64+0x5b/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
With this patch, when modify_fentry or register_fentry returns -EAGAIN
from bpf_tramp_ftrace_ops_func, the pageattr of im->image will be reset
to nx+rw.
Cc: stable@vger.kernel.org
Fixes: 00963a2e75 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221224133146.780578-1-nashuiliang@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The cited patch added support of matching on geneve option by setting
geneve_tlv_option_0_data mask and key but didn't set geneve_tlv_option_0_exist
bit which is required on some HWs when matching geneve_tlv_option_0_data parameter,
this may cause in some cases for packets to wrongly match on rules with different
geneve option.
Example of such case is packet with geneve_tlv_object class=789 and data=456
will wrongly match on rule with match geneve_tlv_object class=123 and data=456.
Fix it by setting geneve_tlv_option_0_exist bit when supported by the HW when matching
on geneve_tlv_option_0_data parameter.
Fixes: 9272e3df30 ("net/mlx5e: Geneve, Add support for encap/decap flows offload")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Current xdp xmit functions logic (mlx5e_xmit_xdp_frame_mpwqe or
mlx5e_xmit_xdp_frame), validates xdp packet length by comparing it to
hw mtu (configured at xdp sq allocation) before xmiting it. This check
does not account for ethernet fcs length (calculated and filled by the
nic). Hence, when we try sending packets with length > (hw-mtu -
ethernet-fcs-size), the device port drops it and tx_errors_phy is
incremented. Desired behavior is to catch these packets and drop them
by the driver.
Fix this behavior in XDP SQ allocation function (mlx5e_alloc_xdpsq) by
subtracting ethernet FCS header size (4 Bytes) from current hw mtu
value, since ethernet FCS is calculated and written to ethernet frames
by the nic.
Fixes: d8bec2b29a ("net/mlx5e: Support bpf_xdp_adjust_head()")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit introduced a bug for multiple encapsulations flow.
If one dest encap becomes invalid, the flow is set slow path flag.
But when other dests encap become invalid, they are not cleared due
to slow path flag of the flow. When neigh-update-add is running, it
will use invalid encap.
Fix it by checking slow path flag after clearing dest encap.
Fixes: 9a5f9cc794 ("net/mlx5e: Fix possible use-after-free deleting fdb rule")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Need to use sprintf to build a string instead of sscanf. Otherwise
dirname is null and both "ct_nic" and "ct_fdb" won't be created.
But its redundant anyway as driver could be in switchdev mode but
still add nic rules. So use "ct" as folder name.
Fixes: 77422a8f6f ("net/mlx5e: CT: Add ct driver counters")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
RX reporter mistakenly reads from the regular (inactive) RQ
when XSK RQ is active. Fix it here.
Fixes: 3db4c85cde ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_build_nic_params will turn CQE compression on if the hardware
capability is enabled and the slow_pci_heuristic condition is detected.
As IPoIB doesn't support CQE compression, make sure to disable the
feature in the IPoIB profile init.
Please note that the feature is not exposed to the user for IPoIB
interfaces, so it can't be subsequently turned on.
Fixes: b797a684b0 ("net/mlx5e: Enable CQE compression when PCI is slower than link")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5 PF can disable RoCE for its VFs and SFs. In such case RoCE is
marked as unsupported on those VFs/SFs.
The cited patch added an option for disable (and enable) RoCE at HCA
level. However, that commit didn't check whether RoCE is supported on
the HCA and enabled user to try and set RoCE to on.
Fix it by checking whether the HCA supports RoCE.
Fixes: fbfa97b4d7 ("net/mlx5: Disable roce at HCA level")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, recovery is done without considering whether the device is
still in probe flow.
This may lead to recovery before device have finished probed
successfully. e.g.: while mlx5_init_one() is running. Recovery flow is
using functionality that is loaded only by mlx5_init_one(), and there
is no point in running recovery without mlx5_init_one() finished
successfully.
Fix it by waiting for probe flow to finish and checking whether the
device is probed before trying to perform recovery.
Fixes: 51d138c261 ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
io_eq_size and event_eq_size params are of param type
DEVLINK_PARAM_TYPE_U32. But, the validation callback is addressing them
as DEVLINK_PARAM_TYPE_U16.
This cause mismatch in validation in big-endian systems, in which
values in range were rejected while 268500991 was accepted.
Fix it by checking the U32 value in the validation callback.
Fixes: 0844fa5f7b ("net/mlx5: Let user configure io_eq_size param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There are two cleanup calls missing in mlx5_init_once() error path.
Add them making the error path flow to be the same as
mlx5_cleanup_once().
Fixes: 52ec462eca ("net/mlx5: Add reserved-gids support")
Fixes: 7c39afb394 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.
In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.
Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c3 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Rudi reports a compilation failure on x86_64 when CONFIG_NET_CLS or
CONFIG_NET_CLS_ACT is not set but CONFIG_RETPOLINE is set.
A misplaced '#endif' was causing the issue.
Fixes: 7f0e810220 ("net/sched: add retpoline wrapper for tc")
Tested-by: Rudi Heitbaum <rudi@heitbaum.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow the advice of the Documentation/filesystems/sysfs.rst
and show() should only use sysfs_emit() or sysfs_emit_at()
when formatting the value to be returned to user space.
Signed-off-by: Xuezhi Zhang <zhangxuezhi1@coolpad.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chunhao Lin says:
====================
r8169: fix dmar pte write access is not set error
This series fixes dmar pte write access is not set error.
Chunhao Lin (2):
r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
r8169: fix dmar pte write access is not set error
v2:
-update commit message
-adjust the code according to current kernel code
v3:
-update title and commit message
-split the patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When close device, if wol is enabled, rx will be enabled. When open
device it will cause rx packet to be dma to the wrong memory address
after pci_set_master() and system log will show blow messages.
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [02:00.0] PASID ffffffff fault addr
ffdd4000 [fault reason 05] PTE Write access is not set
In this patch, driver disable tx/rx when close device. If wol is
enabled, only enable rx filter and disable rxdv_gate(if support) to
let hardware only receive packet to fifo but not to dma it.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no functional change. Moving these two functions for following
patch "r8169: fix dmar pte write access is not set error".
Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniil Tatianin says:
====================
net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers
This series fixes a potential NULL dereference in ethtool_get_phy_stats
while also attempting to refactor/split said function into multiple
helpers so that it's easier to reason about what's going on.
I've taken Andrew Lunn's suggestions on the previous version of this
patch and added a bit of my own.
Changes since v1:
- Remove an extra newline in the first patch
- Move WARN_ON_ONCE into the if check as it already returns the
result of the comparison
- Actually split ethtool_get_phy_stats instead of attempting to
refactor it
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
So that it's easier to follow and make sense of the branching and
various conditions.
Stats retrieval has been split into two separate functions
ethtool_get_phy_stats_phydev & ethtool_get_phy_stats_ethtool.
The former attempts to retrieve the stats using phydev & phy_ops, while
the latter uses ethtool_ops.
Actual n_stats validation & array allocation has been moved into a new
ethtool_vzalloc_stats_array helper.
This also fixes a potential NULL dereference of
ops->get_ethtool_phy_stats where it was getting called in an else branch
unconditionally without making sure it was actually present.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we always early return if we don't have any stats we can remove
these checks as they're no longer necessary.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not very useful to copy back an empty ethtool_stats struct and
return 0 if we didn't actually have any stats. This also allows for
further simplification of this function in the future commits.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the device without multiqueue feature, we will read 0 as
max_virtqueue_pairs from the config. So if we fill
VDPA_ATTR_DEV_NET_CFG_MAX_VQP with the value we read from the config
we will confuse the user.
Fixing this by only filling the value when multiqueue is offered by
the device so userspace can assume 1 when the attr is not provided.
Fixes: 13b00b135665c("vdpa: Add support for querying vendor statistics")
Cc: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220907060110.4511-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
In vp_vdpa_remove(), the code kfree(&vp_vdpa_mgtdev->mgtdev.id_table) uses
a reference of pointer as the argument of kfree, which is the wrong pointer
and then may hit crash like this:
Unable to handle kernel paging request at virtual address 00ffff003363e30c
Internal error: Oops: 96000004 [#1] SMP
Call trace:
rb_next+0x20/0x5c
ext4_readdir+0x494/0x5c4 [ext4]
iterate_dir+0x168/0x1b4
__se_sys_getdents64+0x68/0x170
__arm64_sys_getdents64+0x24/0x30
el0_svc_common.constprop.0+0x7c/0x1bc
do_el0_svc+0x2c/0x94
el0_svc+0x20/0x30
el0_sync_handler+0xb0/0xb4
el0_sync+0x160/0x180
Code: 54000220 f9400441 b4000161 aa0103e0 (f9400821)
SMP: stopping secondary CPUs
Starting crashdump kernel...
Fixes: ffbda8e9df ("vdpa/vp_vdpa : add vdpa tool support in vp_vdpa")
Signed-off-by: Rong Wang <wangrong68@huawei.com>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Message-Id: <20221207120813.2837529-1-sunnanyong@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cindy Lu <lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Add a limit to 'config->vq_num' which is user controlled data which
comes from an vduse_ioctl to prevent large memory allocations.
Micheal says - This limit is somewhat arbitrary.
However, currently virtio pci and ccw are limited to a 16 bit vq number.
While MMIO isn't it is also isn't used with lots of VQs due to
current lack of support for per-vq interrupts.
Thus, the 0xffff limit on number of VQs corresponding
to a 16-bit VQ number seems sufficient for now.
This is found using static analysis with smatch.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Message-Id: <20221128155717.2579992-1-harshit.m.mogalapalli@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This gets rid of the last references to smp_read_barrier_depends()
which for the kernel side was removed in v5.9. The serialization
required for Alpha is done inside READ_ONCE() instead of having
users deal with it. Simply use a full barrier, the architecture
does not have rmb in the first place.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Message-Id: <20221128034347.990-3-dave@stgolabs.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
When we initialize vringh, we should pass the features and the
number of elements in the virtqueue negotiated with the driver,
otherwise operations with vringh may fail.
This was discovered in a case where the driver sets a number of
elements in the virtqueue different from the value returned by
.get_vq_num_max().
In vdpasim_vq_reset() is safe to initialize the vringh with
default values, since the virtqueue will not be used until
vdpasim_queue_ready() is called again.
Fixes: 2c53d0f64c ("vdpasim: vDPA device simulator")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221110141335.62171-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Before commit 3d56987938 ("vhost-vdpa: introduce asid based IOTLB")
we called vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1) during
release to free all the resources allocated when processing user IOTLB
messages through vhost_vdpa_process_iotlb_update().
That commit changed the handling of IOTLB a bit, and we accidentally
removed some code called during the release.
We partially fixed this with commit 037d430556 ("vhost-vdpa: call
vhost_vdpa_cleanup during the release") but a potential memory leak is
still there as showed by kmemleak if the application does not send
VHOST_IOTLB_INVALIDATE or crashes:
unreferenced object 0xffff888007fbaa30 (size 16):
comm "blkio-bench", pid 914, jiffies 4294993521 (age 885.500s)
hex dump (first 16 bytes):
40 73 41 07 80 88 ff ff 00 00 00 00 00 00 00 00 @sA.............
backtrace:
[<0000000087736d2a>] kmem_cache_alloc_trace+0x142/0x1c0
[<0000000060740f50>] vhost_vdpa_process_iotlb_msg+0x68c/0x901 [vhost_vdpa]
[<0000000083e8e205>] vhost_chr_write_iter+0xc0/0x4a0 [vhost]
[<000000008f2f414a>] vhost_vdpa_chr_write_iter+0x18/0x20 [vhost_vdpa]
[<00000000de1cd4a0>] vfs_write+0x216/0x4b0
[<00000000a2850200>] ksys_write+0x71/0xf0
[<00000000de8e720b>] __x64_sys_write+0x19/0x20
[<0000000018b12cbb>] do_syscall_64+0x3f/0x90
[<00000000986ec465>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Let's fix this calling vhost_vdpa_iotlb_unmap() on the whole range in
vhost_vdpa_remove_as(). We move that call before vhost_dev_cleanup()
since we need a valid v->vdev.mm in vhost_vdpa_pa_unmap().
vhost_iotlb_reset() call can be removed, since vhost_vdpa_iotlb_unmap()
on the whole range removes all the entries.
The kmemleak log reported was observed with a vDPA device that has `use_va`
set to true (e.g. VDUSE). This patch has been tested with both types of
devices.
Fixes: 037d430556 ("vhost-vdpa: call vhost_vdpa_cleanup during the release")
Fixes: 3d56987938 ("vhost-vdpa: introduce asid based IOTLB")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221109154213.146789-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
vhost_iotlb_itree_first() requires `start` and `last` parameters
to search for a mapping that overlaps the range.
In translate_desc() we cyclically call vhost_iotlb_itree_first(),
incrementing `addr` by the amount already translated, so rightly
we move the `start` parameter passed to vhost_iotlb_itree_first(),
but we should hold the `last` parameter constant.
Let's fix it by saving the `last` parameter value before incrementing
`addr` in the loop.
Fixes: a9709d6874 ("vhost: convert pre sorted vhost memory array to interval tree")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221109102503.18816-3-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_iotlb_itree_first() requires `start` and `last` parameters
to search for a mapping that overlaps the range.
In iotlb_translate() we cyclically call vhost_iotlb_itree_first(),
incrementing `addr` by the amount already translated, so rightly
we move the `start` parameter passed to vhost_iotlb_itree_first(),
but we should hold the `last` parameter constant.
Let's fix it by saving the `last` parameter value before incrementing
`addr` in the loop.
Fixes: 9ad9c49cfe ("vringh: IOTLB support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221109102503.18816-2-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A problem about modprobe vhost_vsock failed is triggered with the
following log given:
modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy
The reason is that vhost_vsock_init() returns misc_register() directly
without checking its return value, if misc_register() failed, it returns
without calling vsock_core_unregister() on vhost_transport, resulting the
vhost_vsock can never be installed later.
A simple call graph is shown as below:
vhost_vsock_init()
vsock_core_register() # register vhost_transport
misc_register()
device_create_with_groups()
device_create_groups_vargs()
dev = kzalloc(...) # OOM happened
# return without unregister vhost_transport
Fix by calling vsock_core_unregister() when misc_register() returns error.
Fixes: 433fc58e6b ("VSOCK: Introduce vhost_vsock.ko")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Message-Id: <20221108101705.45981-1-yuancan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Virtio_crypto use max_data_queues+1 to setup vqs,
we use vp_modern_get_num_queues to protect the vq range in setup_vq.
We could enter index >= vp_modern_get_num_queues(mdev) in setup_vq
if common->num_queues is not set well,and it return -ENOENT.
It is better to use -EINVAL instead.
Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com>
Message-Id: <20221101111655.1947-1-angus.chen@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When qemu uses different address spaces for data and control virtqueues,
the current code would overwrite the control virtqueue iotlb through the
dup_iotlb call. Fix this by referring to the address space identifier
and the group to asid mapping to determine which mapping needs to be
updated. We also move the address space logic from mlx5 net to core
directory.
Reported-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-6-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
event_handler runs under atomic context and may not acquire reslock. We
can still guarantee that the handler won't be called after suspend by
clearing nb_registered, unregistering the handler and flushing the
workqueue.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-5-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Set the VLAN id to the header values field instead of overwriting the
headers criteria field.
Before this fix, VLAN filtering would not really work and tagged packets
would be forwarded unfiltered to the TIR.
Fixes: baf2ad3f6a ("vdpa/mlx5: Add RX MAC VLAN filter support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The fotg210 module combines the HCD and OTG drivers, which then
fails to build when only the USB gadget support is enabled
in the kernel but host support is not:
aarch64-linux-ld: drivers/usb/fotg210/fotg210-core.o: in function `fotg210_init':
fotg210-core.c:(.init.text+0xc): undefined reference to `usb_disabled'
Move the check for usb_disabled() after the check for the HCD module,
and let the OTG driver still be probed in this configuration.
A nicer approach might be to have the common portion built as a
library module, with the two platform other files registering
their own platform_driver instances separately.
Fixes: ddacd6ef44 ("usb: fotg210: Fix Kconfig for USB host modules")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20221215165728.2062984-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Chan says:
====================
bnxt_en: Bug fixes
This series fixes a devlink bug and several XDP related bugs. The
devlink bug causes a kernel crash on VF devices. The XDP driver
patches fix and clean up the RX XDP path and re-enable header-data
split that was disabled by mistake when adding the XDP multi-buffer
support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent XDP multi-buffer feature has introduced regressions in the
setting of HDS and jumbo thresholds. HDS was accidentally disabled in
the nornmal mode without XDP. This patch restores jumbo HDS placement
when not in XDP mode. In XDP multi-buffer mode, HDS should be disabled
and the jumbo threshold should be set to the usable page size in the
first page buffer.
Fixes: 3286123619 ("bnxt: change receive ring space parameters")
Reviewed-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size of the first buffer is always page size, and the useable
space is the page size minus the offset and the skb_shared_info size.
Make sure SKB and XDP buf sizes match so that the skb_shared_info
is at the same offset seen from the SKB and XDP_BUF.
build_skb() should be passed PAGE_SIZE. xdp_init_buff() should
be passed PAGE_SIZE as well. xdp_get_shared_info_from_buff() will
automatically deduct the skb_shared_info size if the XDP buffer
has frags. There is no need to keep bp->xdp_has_frags.
Change BNXT_PAGE_MODE_BUF_SIZE to BNXT_MAX_PAGE_MODE_MTU_SBUF
since this constant is really the MTU with ethernet header size
subtracted.
Also fix the BNXT_MAX_PAGE_MODE_MTU macro with proper parentheses.
Fixes: 3286123619 ("bnxt: change receive ring space parameters")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The XDP program can change the starting address of the RX data buffer and
this information needs to be passed back from bnxt_rx_xdp() to
bnxt_rx_pkt() for the XDP_PASS case so that the SKB can point correctly
to the modified buffer address. Add back the data_ptr parameter to
bnxt_rx_xdp() to make this work.
Fixes: b231c3f341 ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda says:
====================
net: ethernet: renesas: rswitch: Fix minor issues
This patch series is based on v6.2-rc2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To get mac address from device tree which is from each ethernet-port,
fix the first argument of of_get_ethdev_address().
Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If rswitch_init() returns non-zero and this driver is re-probed,
the following error happens:
renesas_eth_sw e6880000.ethernet: Unbalanced pm_runtime_enable!
So, fix error path in renesas_eth_sw_probe().
Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can merge VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES with
VDPA_ATTR_DEV_FEATURES which is functionally equivalent.
While at it, tweak the comment in header file to make
user provioned device features distinguished from those
supported by the parent mgmtdev device: the former of
which can be inherited as a whole from the latter, or
can be a subset of the latter if explicitly specified.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1665422823-18364-1-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Jakub Kicinski says:
====================
netdev doc de-FAQization
We have outgrown the FAQ format for our process doc.
I often find myself struggling to locate information in this doc,
because the questions do not serve well as section headers.
Reformat the document.
v2: update the headers
v1: https://lore.kernel.org/all/20221221184007.1170384-1-kuba@kernel.org/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The netdev-FAQ document has grown over the years to the point
where finding information in it is somewhat challenging.
The length of the questions prevents readers from locating
content that's relevant at a glance.
Convert to a more standard documentation format with sections
and sub-sections rather than questions and answers.
The content edits are limited to what's necessary to change
the format, and very minor clarifications.
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent changes will reformat the doc away from FAQ.
To make that more readable perform the pure section moves now.
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the end of rxrpc_recvmsg(), if a call is found, the call is put and then
a trace line is emitted referencing that call in a couple of places - but
the call may have been deallocated by the time those traces happen.
Fix this by stashing the call debug_id in a variable and passing that to
the tracepoint rather than the call pointer.
Fixes: 849979051c ("rxrpc: Add a tracepoint to follow what recvmsg does")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL-terminated strings.
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Signed-off-by: Helge Deller <deller@gmx.de>
The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL-terminated strings.
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Make the offb (Open Firmware frame buffer) driver tristate,
i.e., so that it can be built as a loadable module.
However, it still depends on the setting of DRM_OFDRM
so that both of these drivers cannot be builtin at the same time
nor can one be builtin and the other one a loadable module.
Build-tested successfully with all combination of DRM_OFDRM and FB_OF.
This fixes a build issue that Michal reported when FB_OF=y and
DRM_OFDRM=m:
powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x58): undefined reference to `cfb_fillrect'
powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x60): undefined reference to `cfb_copyarea'
powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x68): undefined reference to `cfb_imageblit'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Fix for uninitialized variable warning.
Addresses-Coverity: ("Uninitialized scalar variable")
Signed-off-by: Anuradha Weeraman <anuradha@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfc_get_device() take reference for the device, add missing
nfc_put_device() to release it when not need anymore.
Also fix the style warnning by use error EOPNOTSUPP instead of
ENOTSUPP.
Fixes: 5ce3f32b52 ("NFC: netlink: SE API implementation")
Fixes: 29e76924cf ("nfc: netlink: Add capability to reply to vendor_cmd with data")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTP hardware timestamping related objects are not linked when PTP
support for MV88E6xxx (NET_DSA_MV88E6XXX_PTP) is disabled, therefore
NET_DSA_MV88E6XXX should not depend on PTP_1588_CLOCK_OPTIONAL
regardless of NET_DSA_MV88E6XXX_PTP.
Instead, condition more strictly on how NET_DSA_MV88E6XXX_PTP's
dependencies are met, making sure that it cannot be enabled when
NET_DSA_MV88E6XXX=y and PTP_1588_CLOCK=m.
In other words, this commit allows NET_DSA_MV88E6XXX to be built-in
while PTP_1588_CLOCK is a module, as long as NET_DSA_MV88E6XXX_PTP is
prevented from being enabled.
Fixes: e5f3155267 ("ethernet: fix PTP_1588_CLOCK dependencies")
Signed-off-by: Johnny S. Lee <foss@jsl.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
adapter->dcb would get silently freed inside qlcnic_dcb_enable() in
case qlcnic_dcb_attach() would return an error, which always happens
under OOM conditions. This would lead to use-after-free because both
of the existing callers invoke qlcnic_dcb_get_info() on the obtained
pointer, which is potentially freed at that point.
Propagate errors from qlcnic_dcb_enable(), and instead free the dcb
pointer at callsite using qlcnic_dcb_free(). This also removes the now
unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper around
kfree() also causing memory leaks for partially initialized dcb.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Fixes: 3c44bba1d2 ("qlcnic: Disable DCB operations from SR-IOV VFs")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
[<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
[<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
[<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
[<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
[<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
[<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
[<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
[<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
[<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
[<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
[<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
[<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
[<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
[<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
[<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
[<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
[<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
[<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
[<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
[<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
[<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================
Kernel uses tcindex_change() to change an existing
filter properties.
Yet the problem is that, during the process of changing,
if `old_r` is retrieved from `p->perfect`, then
kernel uses tcindex_alloc_perfect_hash() to newly
allocate filter results, uses tcindex_filter_result_init()
to clear the old filter result, without destroying
its tcf_exts structure, which triggers the above memory leak.
To be more specific, there are only two source for the `old_r`,
according to the tcindex_lookup(). `old_r` is retrieved from
`p->perfect`, or `old_r` is retrieved from `p->h`.
* If `old_r` is retrieved from `p->perfect`, kernel uses
tcindex_alloc_perfect_hash() to newly allocate the
filter results. Then `r` is assigned with `cp->perfect + handle`,
which is newly allocated. So condition `old_r && old_r != r` is
true in this situation, and kernel uses tcindex_filter_result_init()
to clear the old filter result, without destroying
its tcf_exts structure
* If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL
according to the tcindex_lookup(). Considering that `cp->h`
is directly copied from `p->h` and `p->perfect` is NULL,
`r` is assigned with `tcindex_lookup(cp, handle)`, whose value
should be the same as `old_r`, so condition `old_r && old_r != r`
is false in this situation, kernel ignores using
tcindex_filter_result_init() to clear the old filter result.
So only when `old_r` is retrieved from `p->perfect` does kernel use
tcindex_filter_result_init() to clear the old filter result, which
triggers the above memory leak.
Considering that there already exists a tc_filter_wq workqueue
to destroy the old tcindex_data by tcindex_partial_destroy_work()
at the end of tcindex_set_parms(), this patch solves
this memory leak bug by removing this old filter result
clearing part and delegating it to the tc_filter_wq workqueue.
Note that this patch doesn't introduce any other issues. If
`old_r` is retrieved from `p->perfect`, this patch just
delegates old filter result clearing part to the
tc_filter_wq workqueue; If `old_r` is retrieved from `p->h`,
kernel doesn't reach the old filter result clearing part, so
removing this part has no effect.
[Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
and Dmitry Vyukov]
Fixes: b9a24bb76b ("net_sched: properly handle failure case of tcf_exts_init()")
Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net* tree.
We've added 7 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 231 insertions(+), 3 deletions(-).
The main changes are:
1) Fix a splat in bpf_skb_generic_pop() under CHECKSUM_PARTIAL due to
misuse of skb_postpull_rcsum(), from Jakub Kicinski with test case
from Martin Lau.
2) Fix BPF verifier's nullness propagation when registers are of
type PTR_TO_BTF_ID, from Hao Sun.
3) Fix bpftool build for JIT disassembler under statically built
libllvm, from Anton Protopopov.
4) Fix warnings reported by resolve_btfids when building vmlinux
with CONFIG_SECURITY_NETWORK disabled, from Hou Tao.
5) Minor fix up for BPF selftest gitignore, from Stanislav Fomichev.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The check_reserve_boundaries function uses a lot of kernel stack,
and it gets inlined by clang, which makes __drm_test_mm_reserve
use even more of it, to the point of hitting the warning limit:
drivers/gpu/drm/tests/drm_mm_test.c:344:12: error: stack frame size (1048) exceeds limit (1024) in '__drm_test_mm_reserve' [-Werror,-Wframe-larger-than]
When building with gcc, this does not happen, but the structleak
plugin can similarly increase the stack usage and needs to be
disabled, as we do for all other kunit users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20221215163511.266214-1-arnd@kernel.org
Hao Lan says:
====================
net: hns3: fix some bug for hns3
There are some bugfixes for the HNS3 ethernet driver. patch#1 fix miss
checking for rx packet. patch#2 fixes VF promisc mode not update
when mac table full bug, and patch#3 fixes a nterrupts not
initialization in VF FLR bug.
====================
Link: https://lore.kernel.org/r/20221222064343.61537-1-lanhao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, it missed set HCLGE_VPORT_STATE_PROMISC_CHANGE
flag for VF when vport->overflow_promisc_flags changed.
So the VF won't check whether to update promisc mode in
this case. So add it.
Fixes: 1e6e76101f ("net: hns3: configure promisc mode for VF asynchronously")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For device supports RXD advanced layout, the driver will
return directly if the hardware finish the checksum
calculate. It cause missing L3E checking for ip packets.
Fixes it.
Fixes: 1ddc028ac8 ("net: hns3: refactor out RX completion checksum")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently keep alive message between PF and VF may be lost and the VF is
unalive in PF. So the VF will not do reset during PF FLR reset process.
This would make the allocated interrupt resources of VF invalid and VF
would't receive or respond to PF any more.
So this patch adds VF interrupts re-initialization during VF FLR for VF
recovery in above cases.
Fixes: 862d969a3a ("net: hns3: do VF's pci re-initialization while PF doing FLR")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After befae75856, the verifier would propagate null information after
JEQ/JNE, e.g., if two pointers, one is maybe_null and the other is not,
the former would be marked as non-null in eq path. However, as comment
"PTR_TO_BTF_ID points to a kernel struct that does not need to be null
checked by the BPF program ... The verifier must keep this in mind and
can make no assumptions about null or non-null when doing branch ...".
If one pointer is maybe_null and the other is PTR_TO_BTF, the former is
incorrectly marked non-null. The following BPF prog can trigger a
null-ptr-deref, also see this report for more details[1]:
0: (18) r1 = map_fd ; R1_w=map_ptr(ks=4, vs=4)
2: (79) r6 = *(u64 *)(r1 +8) ; R6_w=bpf_map->inner_map_data
; R6 is PTR_TO_BTF_ID
; equals to null at runtime
3: (bf) r2 = r10
4: (07) r2 += -4
5: (62) *(u32 *)(r2 +0) = 0
6: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null
7: (1d) if r6 == r0 goto pc+1
8: (95) exit
; from 7 to 9: R0=map_value R6=ptr_bpf_map
9: (61) r0 = *(u32 *)(r0 +0) ; null-ptr-deref
10: (95) exit
So, make the verifier propagate nullness information for reg to reg
comparisons only if neither reg is PTR_TO_BTF_ID.
[1] https://lore.kernel.org/bpf/CACkBjsaFJwjC5oiw-1KXvcazywodwXo4zGYsRHwbr2gSG9WcSw@mail.gmail.com/T/#u
Fixes: befae75856 ("bpf: propagate nullness information for reg to reg comparisons")
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221222024414.29539-1-sunhao.th@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Since the commit eb9d1acf63 ("bpftool: Add LLVM as default library for
disassembling JIT-ed programs") we might link the bpftool program with the
libllvm library. This works fine when a shared libllvm library is available,
but fails if we want to link bpftool with a statically built LLVM:
[...]
/usr/bin/ld: /usr/local/lib/libLLVMSupport.a(CrashRecoveryContext.cpp.o): in function `llvm::CrashRecoveryContextCleanup::~CrashRecoveryContextCleanup()':
CrashRecoveryContext.cpp:(.text._ZN4llvm27CrashRecoveryContextCleanupD0Ev+0x17): undefined reference to `operator delete(void*, unsigned long)'
/usr/bin/ld: /usr/local/lib/libLLVMSupport.a(CrashRecoveryContext.cpp.o): in function `llvm::CrashRecoveryContext::~CrashRecoveryContext()':
CrashRecoveryContext.cpp:(.text._ZN4llvm20CrashRecoveryContextD2Ev+0xc8): undefined reference to `operator delete(void*, unsigned long)'
[...]
So in the case of static libllvm we need to explicitly link bpftool with
required libraries, namely, libstdc++ and those provided by the `llvm-config
--system-libs` command. We can distinguish between the shared and static cases
by using the `llvm-config --shared-mode` command.
Fixes: eb9d1acf63 ("bpftool: Add LLVM as default library for disassembling JIT-ed programs")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221222102627.1643709-1-aspsk@isovalent.com
Currently, we shut down the filecache before trying to clean up the
stateids that depend on it. This leads to the kernel trying to free an
nfsd_file twice, and a refcount overput on the nf_mark.
Change the shutdown procedure to tear down all of the stateids prior
to shutting down the filecache.
Reported-and-tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Fixes: 5e113224c1 ("nfsd: nfsd_file cache entries should be per net namespace")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
When AF_XDP is used on on a veth interface the RX ring is updated in two
steps. veth_xdp_rcv() removes packet descriptors from the FILL ring
fills them and places them in the RX ring updating the cached_prod
pointer. Later xdp_do_flush() syncs the RX ring prod pointer with the
cached_prod pointer allowing user-space to see the recently filled in
descriptors. The rings are intended to be SPSC, however the existing
order in veth_poll allows the xdp_do_flush() to run concurrently with
another CPU creating a race condition that allows user-space to see old
or uninitialized descriptors in the RX ring. This bug has been observed
in production systems.
To summarize, we are expecting this ordering:
CPU 0 __xsk_rcv_zc()
CPU 0 __xsk_map_flush()
CPU 2 __xsk_rcv_zc()
CPU 2 __xsk_map_flush()
But we are seeing this order:
CPU 0 __xsk_rcv_zc()
CPU 2 __xsk_rcv_zc()
CPU 0 __xsk_map_flush()
CPU 2 __xsk_map_flush()
This occurs because we rely on NAPI to ensure that only one napi_poll
handler is running at a time for the given veth receive queue.
napi_schedule_prep() will prevent multiple instances from getting
scheduled. However calling napi_complete_done() signals that this
napi_poll is complete and allows subsequent calls to
napi_schedule_prep() and __napi_schedule() to succeed in scheduling a
concurrent napi_poll before the xdp_do_flush() has been called. For the
veth driver a concurrent call to napi_schedule_prep() and
__napi_schedule() can occur on a different CPU because the veth xmit
path can additionally schedule a napi_poll creating the race.
The fix as suggested by Magnus Karlsson, is to simply move the
xdp_do_flush() call before napi_complete_done(). This syncs the
producer ring pointers before another instance of napi_poll can be
scheduled on another CPU. It will also slightly improve performance by
moving the flush closer to when the descriptors were placed in the
RX ring.
Fixes: d1396004dd ("veth: Add XDP TX and REDIRECT")
Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com>
Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When the PCS was taken out of reset, we were changing by mistake also
the speed to 100 Mbit. But in case the link was going down, the link
up routine was setting correctly the link speed. If the link was not
getting down then the speed was forced to run at 100 even if the
speed was something else.
On lan966x, to set the speed link to 1G or 2.5G a value of 1 needs to be
written in DEV_CLOCK_CFG_LINK_SPEED. This is similar to the procedure in
lan966x_port_init.
The issue was reproduced using 1000base-x sfp module using the commands:
ip link set dev eth2 up
ip link addr add 10.97.10.2/24 dev eth2
ethtool -s eth2 speed 1000 autoneg off
Fixes: d28d6d2e37 ("net: lan966x: add port module support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Link: https://lore.kernel.org/r/20221221093315.939133-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Set timeout and garbage collection interval updates are ignored on
updates. Add transaction to update global set element timeout and
garbage collection interval.
Fixes: 96518518cc ("netfilter: add nftables")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Mat Martineau says:
====================
mptcp: Locking fixes
Two separate locking fixes for the networking tree:
Patch 1 addresses a MPTCP fastopen error-path deadlock that was found
with syzkaller.
Patch 2 works around a lockdep false-positive between MPTCP listening and
non-listening sockets at socket destruct time.
====================
Link: https://lore.kernel.org/r/20221220195215.238353-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MattB reported a lockdep splat in the mptcp listener code cleanup:
WARNING: possible circular locking dependency detected
packetdrill/14278 is trying to acquire lock:
ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069)
but task is already holding lock:
ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973)
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
__lock_acquire (kernel/locking/lockdep.c:5055)
lock_acquire (kernel/locking/lockdep.c:466)
lock_sock_nested (net/core/sock.c:3463)
mptcp_worker (net/mptcp/protocol.c:2614)
process_one_work (kernel/workqueue.c:2294)
worker_thread (include/linux/list.h:292)
kthread (kernel/kthread.c:376)
ret_from_fork (arch/x86/entry/entry_64.S:312)
-> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
check_prev_add (kernel/locking/lockdep.c:3098)
validate_chain (kernel/locking/lockdep.c:3217)
__lock_acquire (kernel/locking/lockdep.c:5055)
lock_acquire (kernel/locking/lockdep.c:466)
__flush_work (kernel/workqueue.c:3070)
__cancel_work_timer (kernel/workqueue.c:3160)
mptcp_cancel_work (net/mptcp/protocol.c:2758)
mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817)
__mptcp_close_ssk (net/mptcp/protocol.c:2363)
mptcp_destroy_common (net/mptcp/protocol.c:3170)
mptcp_destroy (include/net/sock.h:1495)
__mptcp_destroy_sock (net/mptcp/protocol.c:2886)
__mptcp_close (net/mptcp/protocol.c:2959)
mptcp_close (net/mptcp/protocol.c:2974)
inet_release (net/ipv4/af_inet.c:432)
__sock_release (net/socket.c:651)
sock_close (net/socket.c:1367)
__fput (fs/file_table.c:320)
task_work_run (kernel/task_work.c:181 (discriminator 1))
exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49)
syscall_exit_to_user_mode (kernel/entry/common.c:130)
do_syscall_64 (arch/x86/entry/common.c:87)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_INET);
lock((work_completion)(&msk->work));
lock(sk_lock-AF_INET);
lock((work_completion)(&msk->work));
*** DEADLOCK ***
The report is actually a false positive, since the only existing lock
nesting is the msk socket lock acquired by the mptcp work.
cancel_work_sync() is invoked without the relevant socket lock being
held, but under a different (the msk listener) socket lock.
We could silence the splat adding a per workqueue dynamic lockdep key,
but that looks overkill. Instead just tell lockdep the msk socket lock
is not held around cancel_work_sync().
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/322
Fixes: 30e51b923e ("mptcp: fix unreleased socket in accept queue")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MatM reported a deadlock at fastopening time:
INFO: task syz-executor.0:11454 blocked for more than 143 seconds.
Tainted: G S 6.1.0-rc5-03226-gdb0157db5153 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:25104 pid:11454 ppid:424 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5191 [inline]
__schedule+0x5c2/0x1550 kernel/sched/core.c:6503
schedule+0xe8/0x1c0 kernel/sched/core.c:6579
__lock_sock+0x142/0x260 net/core/sock.c:2896
lock_sock_nested+0xdb/0x100 net/core/sock.c:3466
__mptcp_close_ssk+0x1a3/0x790 net/mptcp/protocol.c:2328
mptcp_destroy_common+0x16a/0x650 net/mptcp/protocol.c:3171
mptcp_disconnect+0xb8/0x450 net/mptcp/protocol.c:3019
__inet_stream_connect+0x897/0xa40 net/ipv4/af_inet.c:720
tcp_sendmsg_fastopen+0x3dd/0x740 net/ipv4/tcp.c:1200
mptcp_sendmsg_fastopen net/mptcp/protocol.c:1682 [inline]
mptcp_sendmsg+0x128a/0x1a50 net/mptcp/protocol.c:1721
inet6_sendmsg+0x11f/0x150 net/ipv6/af_inet6.c:663
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xf7/0x190 net/socket.c:734
____sys_sendmsg+0x336/0x970 net/socket.c:2476
___sys_sendmsg+0x122/0x1c0 net/socket.c:2530
__sys_sendmmsg+0x18d/0x460 net/socket.c:2616
__do_sys_sendmmsg net/socket.c:2645 [inline]
__se_sys_sendmmsg net/socket.c:2642 [inline]
__x64_sys_sendmmsg+0x9d/0x110 net/socket.c:2642
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5920a75e7d
RSP: 002b:00007f59201e8028 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f5920bb4f80 RCX: 00007f5920a75e7d
RDX: 0000000000000001 RSI: 0000000020002940 RDI: 0000000000000005
RBP: 00007f5920ae7593 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000020004050 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f5920bb4f80 R15: 00007f59201c8000
</TASK>
In the error path, tcp_sendmsg_fastopen() ends-up calling
mptcp_disconnect(), and the latter tries to close each
subflow, acquiring the socket lock on each of them.
At fastopen time, we have a single subflow, and such subflow
socket lock is already held by the called, causing the deadlock.
We already track the 'fastopen in progress' status inside the msk
socket. Use it to address the issue, making mptcp_disconnect() a
no op when invoked from the fastopen (error) path and doing the
relevant cleanup after releasing the subflow socket lock.
While at the above, rename the fastopen status bit to something
more meaningful.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/321
Fixes: fa9e57468a ("mptcp: fix abba deadlock on fastopen")
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload
support") added support for encapsulation offload. However, the
pathc did not report correctly the csum_level for encapsulated packet.
This patch fixes this issue by reporting correct csum level for the
encapsulated packet.
Fixes: dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload support")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Peng Li <lpeng@vmware.com>
Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Multicast packets received on an interface bound to a VRF are marked as
belonging to the VRF and the skb device is updated to point to the VRF
device itself. This was fine even when a route was associated to a
device as when performing a fib table lookup 'oif' in fib6_table_lookup
(coming from 'skb->dev->ifindex' in ip6_route_input) was set to 0 when
FLOWI_FLAG_SKIP_NH_OIF was set.
With commit 40867d74c3 ("net: Add l3mdev index to flow struct and
avoid oif reset for port devices") this is not longer true and multicast
traffic is not received on the original interface.
Instead of adding back a similar check in fib6_table_lookup determine
the dst using the original ifindex for multicast VRF traffic. To make
things consistent across the function do the above for all strict
packets, which was the logic before commit 6f12fa7755 ("vrf: mark skb
for multicast or link-local as enslaved to VRF"). Note that reverting to
this behavior should be fine as the change was about marking packets
belonging to the VRF, not about their dst.
Fixes: 40867d74c3 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20221220171825.1172237-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously ice XDP xmit routine was changed in a way that it avoids
xdp_buff->xdp_frame conversion as it is simply not needed for handling
XDP_TX action and what is more it saves us CPU cycles. This routine is
re-used on ZC driver to handle XDP_TX action.
Although for XDP_TX on Rx ZC xdp_buff that comes from xsk_buff_pool is
converted to xdp_frame, xdp_frame itself is not stored inside
ice_tx_buf, we only store raw data pointer. Casting this pointer to
xdp_frame and calling against it xdp_return_frame in
ice_clean_xdp_tx_buf() results in undefined behavior.
To fix this, simply call page_frag_free() on tx_buf->raw_buf.
Later intention is to remove the buff->frame conversion in order to
simplify the codebase and improve XDP_TX performance on ZC.
Fixes: 126cdfe100 ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
Reported-and-tested-by: Robin Cowley <robin.cowley@thehutgroup.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@.intel.com>
Link: https://lore.kernel.org/r/20221220175448.693999-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless fixes for v6.2
First set of fixes for v6.2. Fix for a link error in mt76, fix for an
iwlwifi firmware crash and two cleanups.
* tag 'wireless-2022-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: ath9k: use proper statements in conditionals
wifi: mt76: mt7996: select CONFIG_RELAY
wifi: iwlwifi: fw: skip PPAG for JF
wifi: ti: remove obsolete lines in the Makefile
====================
Link: https://lore.kernel.org/r/20221221180808.96A8AC433EF@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the bpf_skb_adjust_room() shrinks the skb such that its csum_start
is invalid, the skb->ip_summed should be reset from CHECKSUM_PARTIAL to
CHECKSUM_NONE.
The commit 54c3f1a814 ("bpf: pull before calling skb_postpull_rcsum()")
fixed it.
This patch adds a test to ensure the skb->ip_summed changed from
CHECKSUM_PARTIAL to CHECKSUM_NONE after bpf_skb_adjust_room().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221221185653.1589961-1-martin.lau@linux.dev
If a ruleset declares a set name that matches an existing set in the
kernel, then validate that this declaration really refers to the same
set, otherwise bail out with EEXIST.
Currently, the kernel reports success when adding a set that already
exists in the kernel. This usually results in EINVAL errors at a later
stage, when the user adds elements to the set, if the set declaration
mismatches the existing set representation in the kernel.
Add a new function to check that the set declaration really refers to
the same existing set in the kernel.
Fixes: 96518518cc ("netfilter: add nftables")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a helper function to allocate and initialize the stateful expressions
that are defined in a set.
This patch allows to reuse this code from the set update path, to check
that type of the update matches the existing set in the kernel.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add the following fields to the set description:
- key type
- data type
- object type
- policy
- gc_int: garbage collection interval)
- timeout: element timeout
This prepares for stricter set type checks on updates in a follow up
patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
smatch warnings:
net/netfilter/nf_conntrack_proto.c:167 nf_confirm() warn: unsigned 'protoff' is never less than zero.
We need to check if ipv6_skip_exthdr() returned an error, but protoff is
unsigned. Use a signed integer for this.
Fixes: a70e483460 ("netfilter: conntrack: merge ipv4+ipv6 confirm functions")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
panfrost_gem_create_with_handle() previously returned a BO but with the
only reference being from the handle, which user space could in theory
guess and release, causing a use-after-free. Additionally if the call to
panfrost_gem_mapping_get() in panfrost_ioctl_create_bo() failed then
a(nother) reference on the BO was dropped.
The _create_with_handle() is a problematic pattern, so ditch it and
instead create the handle in panfrost_ioctl_create_bo(). If the call to
panfrost_gem_mapping_get() fails then this means that user space has
indeed gone behind our back and freed the handle. In which case just
return an error code.
Reported-by: Rob Clark <robdclark@chromium.org>
Fixes: f3ba91228e ("drm/panfrost: Add initial panfrost driver")
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221219140130.410578-1-steven.price@arm.com
Anand hit a BUG() when pulling off headers on egress to a SW tunnel.
We get to skb_checksum_help() with an invalid checksum offset
(commit d7ea0d9df2 ("net: remove two BUG() from skb_checksum_help()")
converted those BUGs to WARN_ONs()).
He points out oddness in how skb_postpull_rcsum() gets used.
Indeed looks like we should pull before "postpull", otherwise
the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not
be able to do its job:
if (skb->ip_summed == CHECKSUM_PARTIAL &&
skb_checksum_start_offset(skb) < 0)
skb->ip_summed = CHECKSUM_NONE;
Reported-by: Anand Parthasarathy <anpartha@meta.com>
Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When logging a new name, we don't expect to fail joining a log transaction
since we know at least one of the inodes was logged before in the current
transaction. However if we fail for some unexpected reason, we end up not
freeing the fscrypt name we previously allocated. So fix that by freeing
the name in case we failed to join a log transaction.
Fixes: ab3c5c18e8 ("btrfs: setup qstr from dentrys using fscrypt helper")
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 75b4703329 ("btrfs: raid56: migrate recovery and scrub recovery
path to use error_bitmap") introduced an uninitialized return variable.
This can be caught by gcc 12.1 by -Wmaybe-uninitialized:
CC [M] fs/btrfs/raid56.o
fs/btrfs/raid56.c: In function ‘scrub_rbio’:
fs/btrfs/raid56.c:2801:15: warning: ‘ret’ may be used uninitialized [-Wmaybe-uninitialized]
2801 | ret = recover_scrub_rbio(rbio);
| ^~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/raid56.c:2649:13: note: ‘ret’ was declared here
2649 | int ret;
The warning is disabled by default so we haven't caught that.
Due to the bug the raid56 scrub fstests have been failing since the
patch was merged, so initialize that.
Fixes: 75b4703329 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When there is a single DS no striping constraints need to be placed on
the IO. When such constraint is applied then buffered reads don't
coalesce to the DS's rsize.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
A previous cleanup patch accidentally broke some conditional
expressions by replacing the safe "do {} while (0)" constructs
with empty macros. gcc points this out when extra warnings
are enabled:
drivers/net/wireless/ath/ath9k/hif_usb.c: In function 'ath9k_skb_queue_complete':
drivers/net/wireless/ath/ath9k/hif_usb.c:251:57: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
251 | TX_STAT_INC(hif_dev, skb_failed);
Make both sets of macros proper expressions again.
Fixes: d7fc76039b ("ath9k: htc: clean up statistics macros")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221215165553.1950307-1-arnd@kernel.org
Without CONFIG_RELAY, the driver fails to link:
ERROR: modpost: "relay_flush" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
ERROR: modpost: "relay_switch_subbuf" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
ERROR: modpost: "relay_open" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
ERROR: modpost: "relay_reset" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
ERROR: modpost: "relay_file_operations" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
The same change was done in mt7915 for the corresponding copy of the code.
Fixes: 98686cd216 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
See-also: 988845c936 ("mt76: mt7915: add support for passing chip/firmware debug data to user space")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221215163133.4152299-1-arnd@kernel.org
There are warnings reported from resolve_btfids when building vmlinux
with CONFIG_SECURITY_NETWORK disabled:
WARN: resolve_btfids: unresolved symbol bpf_lsm_sk_free_security
WARN: resolve_btfids: unresolved symbol bpf_lsm_sk_alloc_security
So only define BTF IDs for these LSM hooks when CONFIG_SECURITY_NETWORK
is enabled.
Fixes: c0c852dd18 ("bpf: Do not mark certain LSM hook arguments as trusted")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221217062144.2507222-1-houtao@huaweicloud.com
Commit 9130b8dbc6 ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced `auth` argument to
__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
since it (and auth->service) was not (yet) determined.
When multiple upcalls with the same uid and different service are
ongoing, it could happen that __gss_find_upcall(), which returns the
first match found in the pipe->in_downcall list, could not find the
correct gss_msg corresponding to the downcall we are looking for.
Moreover, it might return a msg which is not sent to rpc.gssd yet.
We could see mount.nfs process hung in D state with multiple mount.nfs
are executed in parallel. The call trace below is of CentOS 7.9
kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/
elrepo kernel-ml-6.0.7-1.el7.
PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs"
#0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
#1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
#2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
#3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc
[sunrpc]
#4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
#5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
#6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
#7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
#8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
#9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]
The scenario is like this. Let's say there are two upcalls for
services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.
When rpc.gssd reads pipe to get the upcall msg corresponding to
service B from pipe->pipe and then writes the response, in
gss_pipe_downcall the msg corresponding to service A will be picked
because only uid is used to find the msg and it is before the one for
B in pipe->in_downcall. And the process waiting for the msg
corresponding to service A will be woken up.
Actual scheduing of that process might be after rpc.gssd processes the
next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A).
The process is scheduled to see gss_msg->ctx == NULL and
gss_msg->msg.errno == 0, therefore it cannot break the loop in
gss_create_upcall and is never woken up after that.
This patch adds a simple check to ensure that a msg which is not
sent to rpc.gssd yet is not chosen as the matching upcall upon
receiving a downcall.
Signed-off-by: minoura makoto <minoura@valinux.co.jp>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 9130b8dbc6 ("SUNRPC: allow for upcalls for same uid but different gss service")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In the patch a2c8d27e5e ("btrfs: use a structure to pass arguments to
backref walking functions") Filipe converted everybody to using a new
context struct to use for backref lookups, but accidentally dropped the
BTRFS_SEQ_LAST usage that exists for qgroups. Add this back so we have
the previous behavior.
Fixes: a2c8d27e5e ("btrfs: use a structure to pass arguments to backref walking functions")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When removing the btrfs module we are not calling btrfs_cleanup_fs_uuids()
which results in leaking btrfs_fs_devices structures and other resources.
This is a regression recently introduced by a refactoring of the module
initialization and exit sequence, which simply removed the call to
btrfs_cleanup_fs_uuids() in the exit path, resulting in the leaks.
So fix this by calling btrfs_cleanup_fs_uuids() at exit_btrfs_fs().
Fixes: 5565b8e0ad ("btrfs: make module init/exit match their sequence")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All error handling paths end to 'out', except this memory allocation
failure.
This is spurious. So branch to the error handling path also in this case.
It will add a call to:
memset(&root->defrag_progress, 0,
sizeof(root->defrag_progress));
Fixes: 6702ed490c ("Btrfs: Add run time btree defrag, and an ioctl to force btree defrag")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If new_whiteout_inode() fails, some resources need to be freed.
Add the missing goto to the error handling path.
Fixes: ab3c5c18e8 ("btrfs: setup qstr from dentrys using fscrypt helper")
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The init order and resulting error handling in dma_buf_export
was pretty messy.
Subordinate objects like the file and the sysfs kernel objects
were initializing and wiring itself up with the object in the
wrong order resulting not only in complicating and partially
incorrect error handling, but also in publishing only halve
initialized DMA-buf objects.
Clean this up thoughtfully by allocating the file independent
of the DMA-buf object. Then allocate and initialize the DMA-buf
object itself, before publishing it through sysfs. If everything
works as expected the file is then connected with the DMA-buf
object and publish it through debugfs.
Also adds the missing dma_resv_fini() into the error handling.
v2: add some missing changes to dma_bug_getfile() and a missing NULL
check in dma_buf_file_release()
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221209071535.933698-1-christian.koenig@amd.com
2022-12-13 08:31:45 +01:00
298 changed files with 6420 additions and 1499 deletions
@@ -2767,7 +2767,8 @@ static int hclgevf_pci_reset(struct hclgevf_dev *hdev)
structpci_dev*pdev=hdev->pdev;
intret=0;
if(hdev->reset_type==HNAE3_VF_FULL_RESET&&
if((hdev->reset_type==HNAE3_VF_FULL_RESET||
hdev->reset_type==HNAE3_FLR_RESET)&&
test_bit(HCLGEVF_STATE_IRQ_INITED,&hdev->state)){
hclgevf_misc_irq_uninit(hdev);
hclgevf_uninit_msi(hdev);
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.